Re: HP Proliant won't reboot
On Wed, Apr 2, 2014 at 9:45 AM, D'Arcy J.M. Cain wrote:
> On Wed, 2 Apr 2014 07:56:04 -0400
> "D'Arcy J.M. Cain" wrote:
>> On Tue, 1 Apr 2014 08:45:36 -0700
>> Andy Ruhl wrote:
>> > It would be interesting to know if other versions of NetBSD work,
>> > current especially.
>>
>> I will try that shortly as soon as I have built a new kernel.
>
> I tried to build a new kernel but I am getting a build error. I need
> to investigate. Meanwhile I grabbed a snapshot from today and it still
> fails.
>
> At this point I don't even know where to start debugging this. Can
> someone with clue please suggest?

You've gotta get more messages somehow. If it was mine, I would set ddb.fromconsole=1, set up a serial console, and try to get it into the debugger from the serial console. From there, someone can get an idea of what's going on.

I see lots of diagnostic and debug options in the kernel. Based on my own digging, I think you want to set DEBUG. There are probably others. Hopefully someone else will speak up.

Andy
Re: HP Proliant won't reboot
On Wed, 2 Apr 2014 07:56:04 -0400 "D'Arcy J.M. Cain" wrote:
> On Tue, 1 Apr 2014 08:45:36 -0700
> Andy Ruhl wrote:
> > It would be interesting to know if other versions of NetBSD work,
> > current especially.
>
> I will try that shortly as soon as I have built a new kernel.

I tried to build a new kernel but I am getting a build error. I need to investigate. Meanwhile I grabbed a snapshot from today and it still fails.

At this point I don't even know where to start debugging this. Can someone with clue please suggest?

--
D'Arcy J.M. Cain http://www.NetBSD.org/ IM:da...@vex.net
Re: Proposal for kernel clock changes
On Apr 1, 2014, at 1:50 PM, David Laight wrote:
> This may mean that you can (effectively) count the ticks on all your
> clocks since 'boot' and then scale the frequency of each to give the
> same 'time since boot' - even though that will slightly change the
> relationship between old timestamps taken on different clocks.
> Possibly you do need a small offset for each clock to avoid
> discrepancies in the 'current time' when you recalculate the clocks
> frequency.

If the underlying clock moves in frequency, you need to have both a scale on the frequency and a time-to-count adjustment as well. Otherwise, on long-running systems you accumulate a fair amount of error. It doesn’t take much more than 1ppm of error to accumulate a second of error in 10 days if you don’t have ‘on time’ marks that integrate all of time up to that point. With such marks, the error in phase will be related to the time since the last phase sync, rather than the time since boot.

Warner
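As a rough check of the arithmetic above, here is a minimal Python sketch. The function names are made up for illustration and do not come from any kernel code; it assumes a constant frequency error and no phase corrections over the whole interval.

```python
SECONDS_PER_DAY = 86_400


def accumulated_error(ppm: float, days: float) -> float:
    """Seconds of phase error built up by a constant frequency error of `ppm`."""
    return ppm * 1e-6 * days * SECONDS_PER_DAY


def ppm_for_error(error_seconds: float, days: float) -> float:
    """Constant frequency error (in ppm) that yields `error_seconds` after `days`."""
    return error_seconds / (days * SECONDS_PER_DAY) * 1e6


if __name__ == "__main__":
    # 1 ppm over 10 days: a bit under one second of accumulated error.
    print(f"{accumulated_error(1.0, 10):.3f} s")
    # Frequency error needed to reach a full second in 10 days.
    print(f"{ppm_for_error(1.0, 10):.3f} ppm")
```

Under these assumptions 1 ppm gives roughly 0.86 s in 10 days, so a full second needs only slightly more than 1 ppm. If the phase is resynchronised periodically, `days` becomes the time since the last sync rather than the time since boot.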
Re: Proposal for kernel clock changes
On 1 Apr, 2014, at 12:50, David Laight wrote:
> On Fri, Mar 28, 2014 at 06:16:23PM -0400, Dennis Ferguson wrote:
>> I would like to rework the clock support in the kernel a bit to correct
>> some deficiencies which exist now, and to provide new functionality. The
>> issues I would like to try to address include:
>
> A few comments, I've deleted the body so they aren't hidden!

Thanks very much for looking at it. I know that reading about clocks is, for most people, a good way to put oneself to sleep at night.

> One problem I do see is knowing which counter to trust most.
> You are trying to cross synchronise values and it might be that
> the clock with the best long term accuracy is a very slow one
> with a lot of jitter (NTP over dialup anyone?).
> Whereas the fastest clock is likely to have the least jitter, but
> may not have the long term stability.

This is true, but when considering the quality of non-special-purpose computer clock hardware running on its own, either on the CPU board or on an ethernet card, what you'll effectively end up trying to determine by this is whether the clock is just crappy, or is crappier than that. The stability of cheap, uncompensated free-running crystals is always poor; you shouldn't trust any of these unless you have no choice, and life is too short to worry about trying to measure degrees of crappiness. Since all the clocks in your system are likely to be crappy if left running free, the "best" clock in the system will always be the one which is making the most accurate measurements of the most accurate external time source you have available and steering itself to that. The only important "quality" of a clock is how well it is measuring its time source and how good that time source is.
The measurement clocks are only useful if you have an application which is interested in taking and processing those measurements, and if that application is not broken it will certainly come to some opinion about which of those clocks is the best one based on those measurements. That will be the clock the time comes from; the polling is the mechanism to get it to the others. The kernel itself will see the polling and see adjustments being made to clocks, but it will be the application which knows why that is being done and which way the time is moving. If there are no external time sources, however, you'll probably just live with whatever your chosen system clock does and not worry about the measurement clocks.

> There are places where you are only interested in the difference
> between timestamps - rather than needing them converting to absolute
> times.

I'm not quite sure how to read that, but I'll guess. I over-simplified the description of what is being maintained a bit. I'm fond of, and the system call interface I like makes use of, the two timescales the kernel maintains now, i.e.

    time = uptime + boottime;

where `time' has a UTC-aligned epoch, `uptime's epoch is around the time the machine was booted, and boottime is a mostly-constant value which expresses uptime's epoch in terms of time's epoch. uptime is maintained to advance at the same rate as time but to be phase continuous, which means that uptime will advance at as close to the rate of the SI second as we can determine it (since it advances at the same rate as time, which advances at the rate of UTC, which advances at the rate of the SI second) but is unaffected by step changes made to time to bring it into phase alignment with UTC (boottime changes instead). uptime hence tracks UTC's frequency but not its phase.
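To illustrate the two-timescale relationship described above, here is a toy Python model. The class and method names (`TwoTimescales`, `advance`, `step`) are hypothetical and do not correspond to any kernel interface; the sketch only assumes that time = uptime + boottime and that step corrections are applied to boottime.

```python
from dataclasses import dataclass


@dataclass
class TwoTimescales:
    """Toy model: `time` is derived, `uptime` is phase continuous."""

    uptime: float = 0.0    # seconds since (roughly) boot
    boottime: float = 0.0  # uptime's epoch expressed on the UTC-aligned scale

    @property
    def time(self) -> float:
        # The UTC-aligned timescale is always uptime + boottime.
        return self.uptime + self.boottime

    def advance(self, seconds: float) -> None:
        # Normal passage of time moves uptime (and therefore time).
        self.uptime += seconds

    def step(self, delta: float) -> None:
        # A phase correction to time is absorbed by boottime, not uptime.
        self.boottime += delta


if __name__ == "__main__":
    clk = TwoTimescales(boottime=1_396_400_000.0)
    clk.advance(100.0)
    u0, t0 = clk.uptime, clk.time

    clk.step(-5.0)      # time was found to be 5 s ahead of UTC
    clk.advance(60.0)
    u1, t1 = clk.uptime, clk.time

    print("interval from uptime:", u1 - u0)  # unaffected by the step
    print("interval from time:  ", t1 - t0)  # includes the 5 s step
```

In this model an interval computed from two uptime values straddling the step still reflects the 60 elapsed seconds, while the same interval computed from the UTC-aligned values is off by the size of the step.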
If you want to measure the interval between timestamps, then, I think you would take your timestamps in terms of uptime and then compute

    interval = uptime[1] - uptime[0];

which should reliably give you the system's best estimate of the elapsed number of SI seconds between the times the two stamps were acquired. I like to record event timestamps in terms of uptime as well since it makes it unambiguous when the events occurred even if someone calls clock_settime() in between. Also, the tuple describing a conversion from a tickcount_t tc to a systime_t, which I over-simplified, actually maintains the pair of timescales by maintaining two `c' values, so that

    time = (tc << s) * r + c_time;
    uptime = (tc << s) * r + c_uptime;

and

    boottime = c_time - c_uptime;

So if "absolute time" means UTC, in the form of UTC-aligned `time', then I agree. You can't reliably compute time intervals from two UTC timestamps since, almost unavoidably, some day the system's estimate of UTC will be wrong and will require a step change to fix, and you'll compute a bogus time interval if your timestamps straddle that. On the other hand, if avoiding "needing them converted to absolute times" means hanging on to the raw tickstamp/tickcount for an extended period then I don't see the point. The conversion isn't very expensive, and a pair of uptime timestamps taken from the system clock will rel
Re: asymmetric smp
On 2014-04-02 16:10, John Nemeth wrote:
> On Apr 2, 1:55pm, Johnny Billquist wrote:
> } The root fs is on nfs, as I'm running the machine diskless. Disk is
> } served from a -current NetBSD/alpha system sitting right next to it. And
> } I have changed the Alpha to run at 10 MB/s half duplex, and I have 2k
> } block size for NFS. Login is obviously already running, since that is
> } what also prompts for the username, and doing it twice should even put
> } some stuff in local cache.
>
> Uh, actually getty does the initial prompt for username on the console. After collecting the username, getty execs login.

Hmm. My mistake in that case. So we have image activation at that point. Hmm... Thanks for pointing it out.

Johnny
Re: asymmetric smp
On 2014-04-02 15:38, Anders Magnusson wrote:
> Martin Husemann wrote 2014-04-02 15:33:
>> On Wed, Apr 02, 2014 at 03:13:19PM +0200, Johnny Billquist wrote:
>>> What model of VAX do you have, and how long does it take to boot, to the point where you get the login prompt on the console?
>>
>> VS4000/M96 with 128 MB, and local scsi disk - very nice machine ;-)
>> Didn't measure exactly right now, but on the order of 30s.
>
> Heh, that machine is like 30 times faster than Johnny's :-)

Indeed. You can't get much faster VAXen than that.

If it takes about 3s from pressing enter after the username until the password prompt appears, then 30s on this 3500 seems very reasonable.

Johnny
Re: asymmetric smp
On Apr 2, 1:55pm, Johnny Billquist wrote:
} On 2014-04-01 23:04, Warner Losh wrote:
} > On Apr 1, 2014, at 5:49 AM, Johnny Billquist wrote:
} >
} >> Good points.
} >> Is this the right time to ask why booting NetBSD on a VAX (a 3500) now takes more than 15 minutes? What is the system doing all that time???
} >
} > FreeBSD used to take forever to boot on certain low-end ARM CPUs with /etc/rc.d after it was imported from NetBSD. This was due to crappy root-device performance (100kB/s is enough for anybody, right?) and crappy, at the time, pmap code that caused excess page traffic in the /etc/rc.d environment. Perhaps those areas would be fruitful to profile? Also, there were some inefficiencies that were either the result of a botched port, or were basic to the system that got fixed. Between fixing all these things, the boot time went from 10 minutes down to ~20s.
}
} Always nice with some ideas. The problem here is that this used to be
} way faster in the past, but has slowed down recently.
}
} The time between entering a username and getting the password prompt in
} the same 3500 with the latest release is something like 30 seconds.
}
} This is on an otherwise idle system, where boot has completed. 30
} seconds (approximately, I should time it) just from pressing enter after
} the username, until I just get the "Password:" prompt seems incredible
} to me.
}
} The root fs is on nfs, as I'm running the machine diskless. Disk is
} served from a -current NetBSD/alpha system sitting right next to it. And
} I have changed the Alpha to run at 10 MB/s half duplex, and I have 2k
} block size for NFS. Login is obviously already running, since that is
} what also prompts for the username, and doing it twice should even put
} some stuff in local cache.

Uh, actually getty does the initial prompt for username on the console. After collecting the username, getty execs login.

}-- End of excerpt from Johnny Billquist
Re: asymmetric smp
Martin Husemann wrote 2014-04-02 15:33:
> On Wed, Apr 02, 2014 at 03:13:19PM +0200, Johnny Billquist wrote:
>> What model of VAX do you have, and how long does it take to boot, to the point where you get the login prompt on the console?
>
> VS4000/M96 with 128 MB, and local scsi disk - very nice machine ;-)
> Didn't measure exactly right now, but on the order of 30s.

Heh, that machine is like 30 times faster than Johnny's :-)

-- R
Re: asymmetric smp
On Wed, Apr 02, 2014 at 03:13:19PM +0200, Johnny Billquist wrote:
> What model of VAX do you have, and how long does it take to boot, to the
> point where you get the login prompt on the console?

VS4000/M96 with 128 MB, and local scsi disk - very nice machine ;-)
Didn't measure exactly right now, but on the order of 30s.

Martin
Re: asymmetric smp
On 2014-04-02 14:00, Martin Husemann wrote:
> On Wed, Apr 02, 2014 at 01:55:04PM +0200, Johnny Billquist wrote:
>> The time between entering a username and getting the password prompt in the same 3500 with the latest release is something like 30 seconds.
>
> Mostly it needs someone with an affected machine to debug it. My bet would be on PAM being related to this issue.
>
> (FWIW, it takes < 3 seconds on my vax)

Hmm. This is with a very default installation that I have not even started trying to customize. But thanks, PAM is definitely an interesting point. I should look. And even more thanks in saying that it's not the same for you. That implies that there is something bad in the config, and not something totally generic.

What model of VAX do you have, and how long does it take to boot, to the point where you get the login prompt on the console?

Johnny
Re: asymmetric smp
On Wed, Apr 02, 2014 at 01:55:04PM +0200, Johnny Billquist wrote:
> The time between entering a username and getting the password prompt in
> the same 3500 with the latest release is something like 30 seconds.

Mostly it needs someone with an affected machine to debug it. My bet would be on PAM being related to this issue.

(FWIW, it takes < 3 seconds on my vax)

Martin
Re: HP Proliant won't reboot
On Wed, 02 Apr 2014 09:17:02 +0900 (JST) Jun Ebihara wrote:
> > and stops. NumLock doesn't change the light and the system won't
> > respond to the keyboard or the network. Luckily it has already
> > unmounted the drive so I can safely power cycle but that's going to
> > be extremely inconvenient in a lights out situation.
>
> Try to cold boot. If it works, try to update iLO firmware if you can.

Do you mean cold boot to get it started again? Sure, that works. I have had it off for more than three minutes too. I did update the BIOS with HP's upgrade ISO.

> 1. plug off power cable from power supply.
> 2. wait 3 min and pray.

Funny, that's the name I gave it. :-)

> 3. plug in power cable and boot.

Same issue. When I reboot it hangs. It's fine as long as it is running.

--
D'Arcy J.M. Cain http://www.NetBSD.org/ IM:da...@vex.net
Re: asymmetric smp
On 2014-04-01 23:04, Warner Losh wrote:
> On Apr 1, 2014, at 5:49 AM, Johnny Billquist wrote:
>
>> Good points.
>> Is this the right time to ask why booting NetBSD on a VAX (a 3500) now takes more than 15 minutes? What is the system doing all that time???
>
> FreeBSD used to take forever to boot on certain low-end ARM CPUs with /etc/rc.d after it was imported from NetBSD. This was due to crappy root-device performance (100kB/s is enough for anybody, right?) and crappy, at the time, pmap code that caused excess page traffic in the /etc/rc.d environment. Perhaps those areas would be fruitful to profile? Also, there were some inefficiencies that were either the result of a botched port, or were basic to the system that got fixed. Between fixing all these things, the boot time went from 10 minutes down to ~20s.

Always nice with some ideas. The problem here is that this used to be way faster in the past, but has slowed down recently.

The time between entering a username and getting the password prompt in the same 3500 with the latest release is something like 30 seconds.

This is on an otherwise idle system, where boot has completed. 30 seconds (approximately, I should time it) just from pressing enter after the username, until I just get the "Password:" prompt seems incredible to me.

The root fs is on nfs, as I'm running the machine diskless. Disk is served from a -current NetBSD/alpha system sitting right next to it. And I have changed the Alpha to run at 10 MB/s half duplex, and I have 2k block size for NFS. Login is obviously already running, since that is what also prompts for the username, and doing it twice should even put some stuff in local cache.

The 3500 has 16 megs of memory, and looking at the system once logged in, I'm not using much CPU, nor is all memory committed yet.
I have (on and off) complained about perceived slowing down of NetBSD/vax over the years, but it's been a little while since I last tried, and my experience now on the 3500 is so horrible that I'd say it is very close to impossible to run NetBSD on it anymore. And actually, the Alpha is pretty saggy with -current as well. No idea why, but occasionally it spends a whole lot of time in system mode. Might be related. Or not... Johnny
Re: HP Proliant won't reboot
On Tue, 1 Apr 2014 08:45:36 -0700 Andy Ruhl wrote:
> > I am downloading the BIOS upgrade DVD now.
>
> That seems more like a kernel issue to me, but it can't hurt to update
> the firmware.

I did the upgrade but it still acts the same. Yes, it does seem like a kernel bug since both FreeBSD and Ubuntu reboot just fine.

> It would be interesting to know if other versions of NetBSD work,
> current especially.

I will try that shortly as soon as I have built a new kernel.

> Maybe try with ACPIVERBOSE enabled just see what's happening?

Is there any way to do that other than rebuilding a kernel with the option? I couldn't find a boot.cfg option for it.

--
D'Arcy J.M. Cain http://www.NetBSD.org/ IM:da...@vex.net