Re: HP Proliant won't reboot

2014-04-02 Thread Andy Ruhl
On Wed, Apr 2, 2014 at 9:45 AM, D'Arcy J.M. Cain wrote:
> On Wed, 2 Apr 2014 07:56:04 -0400
> "D'Arcy J.M. Cain" wrote:
>> On Tue, 1 Apr 2014 08:45:36 -0700
>> Andy Ruhl wrote:
>> > It would be interesting to know if other versions of NetBSD work,
>> > current especially.
>>
>> I will try that shortly as soon as I have built a new kernel.
>
> I tried to build a new kernel but I am getting a build error.  I need
> to investigate.  Meanwhile I grabbed a snapshot from today and it still
> fails.
>
> At this point I don't even know where to start debugging this.  Can
> someone with clue please suggest?

You've gotta get more messages somehow. If it was mine, I would set
ddb.fromconsole=1, set up a serial console, and try to get it into the
debugger from the serial console. From there, someone can get an idea
of what's going on.

I see lots of diagnostic and debug options in the kernel. Based on my
own digging, I think you want to set DEBUG. There are probably others.
Hopefully someone else will speak up.

Andy


Re: HP Proliant won't reboot

2014-04-02 Thread D'Arcy J.M. Cain
On Wed, 2 Apr 2014 07:56:04 -0400
"D'Arcy J.M. Cain" wrote:
> On Tue, 1 Apr 2014 08:45:36 -0700
> Andy Ruhl wrote:
> > It would be interesting to know if other versions of NetBSD work,
> > current especially.
> 
> I will try that shortly as soon as I have built a new kernel.

I tried to build a new kernel but I am getting a build error.  I need
to investigate.  Meanwhile I grabbed a snapshot from today and it still
fails.

At this point I don't even know where to start debugging this.  Can
someone with clue please suggest?

-- 
D'Arcy J.M. Cain 
http://www.NetBSD.org/ IM:da...@vex.net


Re: Proposal for kernel clock changes

2014-04-02 Thread Warner Losh

On Apr 1, 2014, at 1:50 PM, David Laight wrote:

> This may mean that you can (effectively) count the ticks on all your
> clocks since 'boot' and then scale the frequency of each to give the
> same 'time since boot' - even though that will slightly change the
> relationship between old timestamps taken on different clocks.
> Possibly you do need a small offset for each clock to avoid
> discrepancies in the 'current time' when you recalculate the clock's
> frequency.

If the underlying clock moves in frequency, you need to have both a
scale on the frequency and a time-to-count adjustment as well. Otherwise
on long-running systems you accumulate a fair amount of error. It doesn’t
take much more than 1ppm of error to accumulate a second of error in 10
days if you don’t have ‘on time’ marks that integrate all of time up to that
point. Then the error in phase will be related to the time since last phase
sync, rather than since time of boot.

Warner



Re: Proposal for kernel clock changes

2014-04-02 Thread Dennis Ferguson

On 1 Apr, 2014, at 12:50 , David Laight wrote:
> On Fri, Mar 28, 2014 at 06:16:23PM -0400, Dennis Ferguson wrote:
>> I would like to rework the clock support in the kernel a bit to correct
>> some deficiencies which exist now, and to provide new functionality.  The
>> issues I would like to try to address include:
> 
> A few comments, I've deleted the body so they aren't hidden!

Thanks very much for looking at it.  I know that reading about
clocks is, for most people, a good way to put oneself to sleep
at night.

> One problem I do see is knowing which counter to trust most.
> You are trying to cross synchronise values and it might be that
> the clock with the best long term accuracy is a very slow one
> with a lot of jitter (NTP over dialup anyone?).
> Whereas the fastest clock is likely to have the least jitter, but
> may not have the long term stability.

This is true but when considering the quality of non-special-purpose
computer clock hardware running on its own, either on the CPU board
or on an ethernet card, what you'll effectively end up trying to
determine by this is whether the clock is just crappy, or is crappier
than that. The stability of cheap, uncompensated free-running
crystals is always poor; you shouldn't trust any of these unless
you have no choice, and life is too short to worry about trying to
measure degrees of crappiness.

Since all the clocks in your system are likely to be crappy if left
running free, the "best" clock in the system will always be the one
which is making the most accurate measurements of the most accurate
external time source you have available and steering itself to that.
The only important "quality" of a clock is how well it is measuring
its time source and how good that time source is.  The measurement
clocks are only useful if you have an application which is interested
in taking and processing those measurements, and if that application
is not broken it will certainly come to some opinion about which of
those clocks is the best one based on those measurements.  That will be
the clock the time comes from; the polling is the mechanism to get it
to the others. The kernel itself will see the polling and see adjustments
being made to clocks but it will be the application which knows why that
is being done and which way the time is moving.  If there are no
external time sources, however, you'll probably just live with whatever
your chosen system clock does and not worry about the measurement clocks.

> There are places where you are only interested in the difference
> between timestamps - rather than needing them converting to absolute
> times.

I'm not quite sure how to read that, but I'll guess.  I over-simplified
the description of what is being maintained a bit.  I'm fond of, and
the system call interface I like makes use of, the two timescales the
kernel maintains now, i.e.

time = uptime + boottime;

where `time' has a UTC-aligned epoch, `uptime's epoch is around the
time the machine was booted, and boottime is a mostly-constant value
which expresses uptime's epoch in terms of time's epoch.  uptime is
maintained to advance at the same rate as time but to be phase
continuous, which means that uptime will advance at as close to the
rate of the SI second as we can determine it (since it advances
at the same rate as time, which advances at the rate of UTC, which
advances at the rate of the SI second) but is unaffected by step
changes made to time to bring it into phase alignment with UTC
(boottime changes instead).  uptime hence tracks UTC's frequency but
not its phase.

If you want to measure the interval between timestamps, then, I think
you would take your timestamps in terms of uptime and then compute

interval = uptime[1] - uptime[0];

which should reliably give you the system's best estimate of the elapsed
number of SI seconds between the times the two stamps were acquired.
I like to record event timestamps in terms of uptime as well since it
makes it unambiguous when the events occurred even if someone calls
clock_settime() in between.  Also, the tuple describing a conversion
from a tickcount_t tc to a systime_t, which I over-simplified, actually
maintains the pair of timescales by maintaining two `c' values, so that

time = (tc << s) * r + c_time;
uptime = (tc << s) * r + c_uptime;

and

boottime = c_time - c_uptime;

So if "absolute time" means UTC, in the form of UTC-aligned `time',
then I agree.  You can't reliably compute time intervals from two
UTC timestamps since, almost unavoidably, some day the system's
estimate of UTC will be wrong and will require a step change to
fix, and you'll compute a bogus time interval if your timestamps
straddle that.  On the other hand, if avoiding "needing them
converted to absolute times" means hanging on to the raw
tickstamp/tickcount for an extended period then I don't see
the point.  The conversion isn't very expensive, and a pair of
uptime timestamps taken from the system clock will rel

Re: asymmetric smp

2014-04-02 Thread Johnny Billquist

On 2014-04-02 16:10, John Nemeth wrote:
> On Apr 2,  1:55pm, Johnny Billquist wrote:
> } The root fs is on NFS, as I'm running the machine diskless. Disk is
> } served from a -current NetBSD/alpha system sitting right next to it. And
> } I have changed the Alpha to run at 10 Mb/s half duplex, and I have 2k
> } block size for NFS. Login is obviously already running, since that is
> } what also prompts for the username, and doing it twice should even put
> } some stuff in local cache.
>
>   Uh, actually getty does the initial prompt for username on
> the console.  After collecting the username, getty execs login.

Hmm. My mistake in that case. So we have image activation at that point.
Hmm...

Thanks for pointing it out.

Johnny



Re: asymmetric smp

2014-04-02 Thread Johnny Billquist

On 2014-04-02 15:38, Anders Magnusson wrote:
> Martin Husemann wrote on 2014-04-02 15:33:
>> On Wed, Apr 02, 2014 at 03:13:19PM +0200, Johnny Billquist wrote:
>>> What model of VAX do you have, and how long does it take to boot, to the
>>> point where you get the login prompt on the console?
>>
>> VS4000/M96 with 128 MB, and local scsi disk - very nice machine ;-)
>> Didn't measure exactly right now, but on the order of 30s.
>
> Heh, that machine is like 30 times faster than Johnny's :-)

Indeed. You can't get much faster VAXen than that. If it takes about 3s
from pressing enter after the username until the password prompt
appears, then 30s on this 3500 seems very reasonable.

Johnny



Re: asymmetric smp

2014-04-02 Thread John Nemeth
On Apr 2,  1:55pm, Johnny Billquist wrote:
} On 2014-04-01 23:04, Warner Losh wrote:
} > On Apr 1, 2014, at 5:49 AM, Johnny Billquist wrote:
} >
} >> Good points.
} >> Is this the right time to ask why booting NetBSD on a VAX (a 3500) now
} >> takes more than 15 minutes? What is the system doing all that time???
} >
} > FreeBSD used to take forever to boot on certain low-end ARM CPUs with
} > /etc/rc.d after it was imported from NetBSD. This was due to crappy
} > root-device performance (100kB/s is enough for anybody, right?) and
} > crappy, at the time, pmap code that caused excess page traffic in the
} > /etc/rc.d environment. Perhaps those areas would be fruitful to profile?
} > Also, there were some inefficiencies that were either the result of a
} > botched port, or were basic to the system that got fixed. Between fixing
} > all these things, the boot time went from 10 minutes down to ~20s.
} 
} Always nice to get some ideas. The problem here is that this used to be
} way faster in the past, but has slowed down recently.
} 
} The time between entering a username and getting the password prompt in 
} the same 3500 with the latest release is something like 30 seconds.
} 
} This is on an otherwise idle system, where boot has completed. 30 
} seconds (approximately, I should time it) just from pressing enter after 
} the username, until I just get the "Password:" prompt seems incredible 
} to me.
} 
} The root fs is on NFS, as I'm running the machine diskless. Disk is
} served from a -current NetBSD/alpha system sitting right next to it. And
} I have changed the Alpha to run at 10 Mb/s half duplex, and I have 2k
} block size for NFS. Login is obviously already running, since that is 
} what also prompts for the username, and doing it twice should even put 
} some stuff in local cache.

 Uh, actually getty does the initial prompt for username on
the console.  After collecting the username, getty execs login.

}-- End of excerpt from Johnny Billquist


Re: asymmetric smp

2014-04-02 Thread Anders Magnusson

Martin Husemann wrote on 2014-04-02 15:33:
> On Wed, Apr 02, 2014 at 03:13:19PM +0200, Johnny Billquist wrote:
>> What model of VAX do you have, and how long does it take to boot, to the
>> point where you get the login prompt on the console?
>
> VS4000/M96 with 128 MB, and local scsi disk - very nice machine ;-)
> Didn't measure exactly right now, but on the order of 30s.

Heh, that machine is like 30 times faster than Johnny's :-)

-- R


Re: asymmetric smp

2014-04-02 Thread Martin Husemann
On Wed, Apr 02, 2014 at 03:13:19PM +0200, Johnny Billquist wrote:
> What model of VAX do you have, and how long does it take to boot, to the 
> point where you get the login prompt on the console?

VS4000/M96 with 128 MB, and local scsi disk - very nice machine ;-)
Didn't measure exactly right now, but on the order of 30s.

Martin


Re: asymmetric smp

2014-04-02 Thread Johnny Billquist

On 2014-04-02 14:00, Martin Husemann wrote:
> On Wed, Apr 02, 2014 at 01:55:04PM +0200, Johnny Billquist wrote:
>> The time between entering a username and getting the password prompt in
>> the same 3500 with the latest release is something like 30 seconds.
>
> Mostly it needs someone with an affected machine to debug it. My bet
> would be on PAM being related to this issue.
>
> (FWIW, it takes < 3 seconds on my vax)

Hmm. This is with a very default installation that I have not even
started trying to customize. But thanks, PAM is definitely an
interesting point. I should look.

And even more thanks for saying that it's not the same for you. That
implies that there is something bad in the config, and not something
totally generic.

What model of VAX do you have, and how long does it take to boot, to the
point where you get the login prompt on the console?


Johnny



Re: asymmetric smp

2014-04-02 Thread Martin Husemann
On Wed, Apr 02, 2014 at 01:55:04PM +0200, Johnny Billquist wrote:
> The time between entering a username and getting the password prompt in 
> the same 3500 with the latest release is something like 30 seconds.

Mostly it needs someone with an affected machine to debug it. My bet
would be on PAM being related to this issue.

(FWIW, it takes < 3 seconds on my vax)

Martin


Re: HP Proliant won't reboot

2014-04-02 Thread D'Arcy J.M. Cain
On Wed, 02 Apr 2014 09:17:02 +0900 (JST)
Jun Ebihara wrote:
> > and stops.  NumLock doesn't change the light and the system won't
> > respond to the keyboard or the network.  Luckily it has already
> > unmounted the drive so I can safely power cycle but that's going to
> > be extremely inconvenient in a lights out situation.
> 
> Try a cold boot. If it works, try to update the iLO firmware if you can.

Do you mean cold boot to get it started again?  Sure, that works.  I
have had it off for more than three minutes too.  I did update the BIOS
with HP's upgrade ISO.

> 1. unplug the power cable from the power supply.
> 2. wait 3 min and pray.

Funny, that's the name I gave it.  :-)

> 3. plug in power cable and boot.

Same issue.  When I reboot it hangs.  It's fine as long as it is
running.

-- 
D'Arcy J.M. Cain 
http://www.NetBSD.org/ IM:da...@vex.net


Re: asymmetric smp

2014-04-02 Thread Johnny Billquist

On 2014-04-01 23:04, Warner Losh wrote:
> On Apr 1, 2014, at 5:49 AM, Johnny Billquist wrote:
>> Good points.
>> Is this the right time to ask why booting NetBSD on a VAX (a 3500) now
>> takes more than 15 minutes? What is the system doing all that time???
>
> FreeBSD used to take forever to boot on certain low-end ARM CPUs with
> /etc/rc.d after it was imported from NetBSD. This was due to crappy
> root-device performance (100kB/s is enough for anybody, right?) and
> crappy, at the time, pmap code that caused excess page traffic in the
> /etc/rc.d environment. Perhaps those areas would be fruitful to profile?
> Also, there were some inefficiencies that were either the result of a
> botched port, or were basic to the system that got fixed. Between fixing
> all these things, the boot time went from 10 minutes down to ~20s.

Always nice to get some ideas. The problem here is that this used to be
way faster in the past, but has slowed down recently.

The time between entering a username and getting the password prompt in
the same 3500 with the latest release is something like 30 seconds.

This is on an otherwise idle system, where boot has completed. 30
seconds (approximately, I should time it) just from pressing enter after
the username until I get the "Password:" prompt seems incredible
to me.

The root fs is on NFS, as I'm running the machine diskless. Disk is
served from a -current NetBSD/alpha system sitting right next to it. And
I have changed the Alpha to run at 10 Mb/s half duplex, and I have 2k
block size for NFS. Login is obviously already running, since that is
what also prompts for the username, and doing it twice should even put
some stuff in local cache.

The 3500 has 16 megs of memory, and looking at the system once logged
in, I'm not using much CPU, nor is all memory committed yet.

I have (on and off) complained about perceived slowing down of
NetBSD/vax over the years, but it's been a little while since I last
tried, and my experience now on the 3500 is so horrible that I'd say it
is very close to impossible to run NetBSD on it anymore.

And actually, the Alpha is pretty sluggish with -current as well. No idea
why, but occasionally it spends a whole lot of time in system mode.
Might be related. Or not...


Johnny



Re: HP Proliant won't reboot

2014-04-02 Thread D'Arcy J.M. Cain
On Tue, 1 Apr 2014 08:45:36 -0700
Andy Ruhl wrote:
> > I am downloading the BIOS upgrade DVD now.
> 
> That seems more like a kernel issue to me, but it can't hurt to update
> the firmware.

I did the upgrade but it still acts the same.  Yes, it does seem like a
kernel bug since both FreeBSD and Ubuntu reboot just fine.

> It would be interesting to know if other versions of NetBSD work,
> current especially.

I will try that shortly as soon as I have built a new kernel.

> Maybe try with ACPIVERBOSE enabled just see what's happening?

Is there any way to do that other than rebuilding a kernel with the
option?  I couldn't find a boot.cfg option for it.

-- 
D'Arcy J.M. Cain 
http://www.NetBSD.org/ IM:da...@vex.net