Daniel Ouellet wrote:
Hi,

Any idea how I could boot the system step by step to isolate where this bug comes from?

I have stripped the boot process as much as possible. This is a very old issue, but maybe there is a way to find out more about it. Looking at it more, I think it's possibly in the kernel scheduler. So far I can see this problem only on Sun systems, either the X1 or the V100.

Rebooting the system will give you either a load of 1.08 to 1.12 or 0.08 to 0.12.

I have now disabled as many daemons as I can at startup to show it clearly.

# cat /etc/rc.conf.local
sshd_flags=NO
sendmail_flags=NO
syslogd_flags=NO
inetd=NO                # almost always needed

As you can see, there isn't anything running on the system to justify this load.

# ps -auxwk
USER       PID %CPU %MEM   VSZ   RSS TT  STAT  STARTED       TIME COMMAND
root         3 99.0  0.0     0     0 ??  DK     6:15PM    7:08.13 (idle0)
root         8  0.0  0.0     0     0 ??  DK     6:15PM    0:00.00 (pagedaemon)
root         9  0.0  0.0     0     0 ??  DK     6:15PM    0:00.26 (reaper)
root        12  0.0  0.0     0     0 ??  DK     6:15PM    0:00.00 (aiodoned)
root        11  0.0  0.0     0     0 ??  DK     6:15PM    0:00.00 (update)
root        10  0.0  0.0     0     0 ??  DK     6:15PM    0:00.00 (cleaner)
root        13  0.0  0.0     0     0 ??  DK     6:15PM    0:00.00 (crypto)
root         0  0.0  0.0     0     0 ??  DKs    6:15PM    0:00.00 (swapper)
root         4  0.0  0.0     0     0 ??  DK     6:15PM    0:00.00 (syswq)
root         2  0.0  0.0     0     0 ??  DK     6:15PM    0:00.00 (kmthread)
root         1  0.0  0.1   616   408 ??  Is     6:15PM    0:00.01 /sbin/init
root         7  0.0  0.0     0     0 ??  DK     6:15PM    0:00.00 (pfpurge)
root         6  0.0  0.0     0     0 ??  DK     6:15PM    0:00.00 (usbtask)
root         5  0.0  0.0     0     0 ??  DK     6:15PM    0:00.01 (usb0)
root      6772  0.0  0.2   664  1040 ??  Ss     6:15PM    0:00.02 cron
root     11233  0.0  0.1   552   528 00  Ss     6:15PM    0:00.08 -ksh (ksh)
root     32400  0.0  0.1   416   352 00  R+     6:22PM    0:00.00 ps -auxwk


However, you get this:

# uptime
 6:22PM  up 8 mins, 1 user, load averages: 1.08, 0.89, 0.48
# sysctl vm.loadavg
vm.loadavg=1.08 0.89 0.48
# sysctl kern.nprocs
kern.nprocs=17
# sysctl kern.version
kern.version=OpenBSD 4.4 (GENERIC) #1714: Wed Aug  6 13:31:49 MDT 2008
    [EMAIL PROTECTED]:/usr/src/sys/arch/sparc64/compile/GENERIC

# sysctl hw.model
hw.model=SUNW,UltraSPARC-IIe (rev 3.3) @ 548 MHz
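
As a rough sanity check on the numbers above, the sketch below models a load average as three exponentially damped averages of the runnable count, sampled at a fixed interval. The 5-second interval, the 1/5/15-minute windows, and the function name simulate_loadavg are assumptions made for illustration, not taken from the OpenBSD source. If the model is roughly right, a single entry counted as runnable from boot onwards would account for a steady offset of 1.

```python
import math

SAMPLE_INTERVAL = 5.0            # seconds between samples (assumed)
WINDOWS = (60.0, 300.0, 900.0)   # 1, 5 and 15 minute windows (assumed)


def simulate_loadavg(nrun, uptime_seconds):
    """Return modelled [1min, 5min, 15min] averages after uptime_seconds,
    starting from zero, with a constant runnable count of nrun."""
    decay = [math.exp(-SAMPLE_INTERVAL / w) for w in WINDOWS]
    load = [0.0, 0.0, 0.0]
    for _ in range(int(uptime_seconds // SAMPLE_INTERVAL)):
        # Each sample pulls the average toward nrun by a factor (1 - decay).
        load = [l * d + nrun * (1.0 - d) for l, d in zip(load, decay)]
    return load


if __name__ == "__main__":
    # Compare a "good" boot (about 0.08) with an "off by one" boot (about 1.08).
    for nrun in (0.08, 1.08):
        print(nrun, [round(x, 2) for x in simulate_loadavg(nrun, 8 * 60)])
```

With a constant count of 1.08 and about 8 to 9 minutes of uptime, this model gives values close to the 1.08, 0.89, 0.48 reported by uptime, which would fit one extra entry being counted continuously since boot rather than a burst of real work.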


I tried a few different things with boot -c, but so far I can't isolate where this comes from.

The only thing I know is that it is ONLY and ALWAYS present from the start of the system.

So on any given boot, the load is either off by one or correct.

It may take about 5 reboots to get a correct reading (not off by 1), but you do get one eventually.

Any suggestions on how I could get more details to dig into this?

I was thinking of maybe putting some kind of delay in the scheduler, in case that makes it possible to isolate the problem, but I am not sure how to do it.

Or maybe logging from the scheduler to see which processes are added to or removed from the load average, but I have had no success doing that yet.
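
Short of instrumenting the scheduler, one user-space approximation is to filter ps output for entries whose state could plausibly be counted. The sketch below assumes the column layout shown in the ps -auxwk output earlier in this message, and treats any STAT beginning with R (runnable) or D (uninterruptible wait) as a candidate. That rule and the function name load_candidates are assumptions for illustration. The kernel's real criteria are not visible from ps, so this narrows the list rather than identifying the culprit.

```python
def load_candidates(ps_output):
    """Return (pid, stat, command) tuples for ps rows in state R or D.

    Expects text in the 11-column layout of `ps -auxwk`:
    USER PID %CPU %MEM VSZ RSS TT STAT STARTED TIME COMMAND
    """
    candidates = []
    lines = ps_output.strip().splitlines()
    for line in lines[1:]:               # skip the header row
        fields = line.split(None, 10)    # COMMAND may contain spaces
        if len(fields) < 11:
            continue                     # ignore malformed rows
        pid, stat, command = int(fields[1]), fields[7], fields[10]
        if stat[0] in "RD":
            candidates.append((pid, stat, command))
    return candidates
```

Feeding it the text captured from ps -auxwk on an affected boot and on a good boot, then comparing the two lists, may show whether the set of candidates differs between the two cases.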

This is not really broken hardware, as I can reproduce it on well over 20 different systems here.

This might also affect other things in the scheduler: looking at the code, it appears some processes are scheduled based on their load and how long they have run. So if the data is wrong, it may well lead to other issues.

Any suggestions for digging into this further and getting more useful information?

One thing for sure, it's always either right, or off by one when present.

Thanks

Daniel


I had the same issue with an X1 at work; disabling USB with boot -c or config would eliminate the problem.

It's been over 6 months since I worked on this, and I won't be able to verify until Thursday, but I recall that leaving USB enabled and keeping a USB device such as a mouse or RS232 adapter plugged in would also bring the load back to 0. It seems the load would go back up by 1 once I unplugged the device, but it's been a while.

I assumed it was buggy USB hardware causing something USB-related in the kernel to block on I/O and raise the load by 1.

Hopefully this gives you something to go on.
