Re: Why is my load ave so high now? [Now I know why!]

2009-07-28 Thread Rick Stevens

Kevin J. Cummings wrote:

On 07/27/2009 02:26 PM, Rick Stevens wrote:

You see a bunch of NFS-related things in a D state and you wonder why
it's slow?


Yes.  Mostly because the machine accessing the NFS mounts has been
re-booted a couple of times.


If you have processes in an I/O wait (a.k.a. D) state, that'll bog
stuff down badly...especially if the NFS mounts are mounted hard.


Well, tonight I rebooted the server with NFS turned off.  When it
booted, I saw a load average between 1 and 2.  That's all.  When it
re-booted, ivtv started back up, despite my blacklisting it and removing
it from modprobe.conf.  However, ivtvfb did not get installed.
I also noticed that BOINC started right up again.  With astropulse
grabbing all the idle cpu time, my load average was still between 1 and 2.

So, I decided that NFS was my problem, but I'm still not sure why.

So, I tried a couple of things.  My laptop references a few directories
on my server via NFS and autofs.

So, I started nfs again on the server (service nfs start)

Load average remains between 1 and 2.  So far so good.


From the laptop, I did a cd /net/kjc386.  I can then do an ls and see
all of the exported filesystems.  Continues to look good.

ls home lists the directories in the server's exported /home dir.
nfs does the work, and disappears from the top -i that I have running.
Great.

Next I do a ls c: to look at the old WINDOWS partition on my server.
HANG!  I can't interrupt the ls with ^C or ^Z.  I have to kill it from
another process.  When I do, the hung nfs processes on the server stay
hung.  After it collects all 8 allowed nfs processes, nothing more nfs
works to the server, and the load average climbs roughly 1 per nfs
process (I watched the load average increase with each new nfs process
that appeared).

So, I guess my question is what's broken with NFS between my F11 laptop
and the F10 server?


I could see where ls c: might be interpreted by the system as trying
to find an NFS machine called c.  An NFS mount command is:

mount -t nfs server:/sharename /mountpoint

Perhaps F11 is trying to invoke an automount of an NFS share from server
c to satisfy your ls command.  That'd be wild!

I haven't tried this.  Perhaps you've found a very subtle bug in F11's
NFS client implementation.  Could you run a wireshark or tcpdump and
watch for NFS traffic when you do that ls c: command?  If you do,
then I'd file a bugzilla PDQ (pretty damned quick).
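Rick's point about hard mounts can be checked directly on the client. Here is a minimal sketch, assuming a Linux client that exposes /proc/mounts; `nfs_mount_modes` is a hypothetical helper, and it treats any NFS mount without an explicit `soft` option as hard, which is the NFS client's default.

```shell
#!/bin/sh
# Minimal sketch: list NFS mounts and whether each is "hard" or "soft".
# Input: /proc/mounts-style lines (device mountpoint fstype options dump pass).
# Assumption: a mount without an explicit "soft" option is hard (the NFS default).
nfs_mount_modes() {
    awk '$3 == "nfs" || $3 == "nfs4" {
        mode = "hard"
        n = split($4, opts, ",")
        for (i = 1; i <= n; i++)
            if (opts[i] == "soft")
                mode = "soft"
        print $2, mode
    }'
}

# Example: nfs_mount_modes < /proc/mounts
```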

--
- Rick Stevens, Systems Engineer  ri...@nerd.com -
- AIM/Skype: therps2    ICQ: 22643734    Yahoo: origrps2 -
--
- People tell me I look at the dark side.  That's not true.  I have -
-   the heart of a small boy..in a jar right here on my desk.   -
--- Stephen King -
--

--
fedora-list mailing list
fedora-list@redhat.com
To unsubscribe: https://www.redhat.com/mailman/listinfo/fedora-list
Guidelines: http://fedoraproject.org/wiki/Communicate/MailingListGuidelines


Re: Why is my load ave so high now? [Now I know why!]

2009-07-28 Thread Kevin J. Cummings
On 07/28/2009 02:00 PM, Rick Stevens wrote:
 So, I guess my question is what's broken with NFS between my F11 laptop
 and the F10 server?
 
 I could see where ls c: might be interpreted by the system as trying
 to find an NFS machine called c.  An NFS mount command is:
 
 mount -t nfs server:/sharename /mountpoint
 
 Perhaps F11 is trying to invoke an automount of an NFS share from server
 c to satisfy your ls command.  That'd be wild!
 
 I haven't tried this.  Perhaps you've found a very subtle bug in F11's
 NFS client implementation.  Could you run a wireshark or tcpdump and
 watch for NFS traffic when you do that ls c: command?  If you do,
 then I'd file a bugzilla PDQ (pretty damned quick).

Well, since my cwd at the time is /net/kjc386, I fully expect ls c: to
generate NFS traffic, because (through the autofs stuff) it's trying to
access kjc386:/c:, which is one of the exported directories from the
server kjc386.  Did I misinterpret what you were trying to say?

I know what you are trying to say, and this naming convention, which I
have been using for years now, has only tripped up emacs's readdir stuff
in the past, never ls.  I suppose I could try changing the directory's
mount point from c: to c and see if that helps.

-- 
Kevin J. Cummings
kjch...@rcn.com
cummi...@kjchome.homeip.net
cummi...@kjc386.framingham.ma.us
Registered Linux User #1232 (http://counter.li.org)



Re: Why is my load ave so high now?

2009-07-27 Thread Bill Davidsen

Kevin J. Cummings wrote:

On 07/25/2009 09:58 AM, Aaron Konstam wrote:

Well, two things, one positive and one negative. The r column tells us
there are not many processes waiting for run time, which we normally
associate with a low load average. However, your number of interrupts per
second (in) is rather high. Some kernel action seems to be really
beating your machine over the head, so to speak. How you find out which
processes are causing the interrupts is not clear to me,
however. Is your primary process doing a lot of I/O?


No, astropulse should be CPU bound, not IO bound.  It reads in some 
data, performs *lots* of calculations on it (hours worth) and then 
writes the results out to a file which it then sends back to SETI, and 
downloads another work unit.


I'm very much interested in how I can figure out where the interrupts are
coming from.


so I ran 2 copies of cat /proc/interrupts 10 seconds apart, and here
are the delta interrupts in that time:



           CPU0
  0:          0   IO-APIC-edge      timer
  1:          0   IO-APIC-edge      i8042
  4:          0   IO-APIC-edge
  6:          0   IO-APIC-edge      floppy
  7:          0   IO-APIC-edge      parport0
  8:          0   IO-APIC-edge      rtc0
  9:          0   IO-APIC-fasteoi   acpi
 12:          0   IO-APIC-edge      i8042
 14:          0   IO-APIC-edge      pata_amd
 15:        209   IO-APIC-edge      pata_amd
 16:       2790   IO-APIC-fasteoi   ivtv0
 18:          9   IO-APIC-fasteoi   aic7xxx, cx88[0], cx88[0], cx88[0], eth0
 20:         60   IO-APIC-fasteoi   ohci_hcd:usb2, NVidia CK8
 21:        114   IO-APIC-fasteoi   ehci_hcd:usb1, ohci_hcd:usb3
 22:         21   IO-APIC-fasteoi   sata_nv
NMI:          0   Non-maskable interrupts
LOC:        246   Local timer interrupts
RES:          0   Rescheduling interrupts
CAL:          0   function call interrupts
TLB:          0   TLB shootdowns
TRM:          0   Thermal event interrupts
SPU:          0   Spurious interrupts
ERR:          0
MIS:          0


Could it be my ivtv0 (PVR-350) board?  It's not supposed to be doing
anything at the moment!  There's nothing plugged into it, and it's not
configured under MythTV right now (cable went all digital).


I'll try removing the driver module and see if that helps.  At worst, 
I'll remove the board entirely.


Looking at the original 'top' output, all the CPU was going to nice processing,
presumably SETI. When you kill that, you note the load average is still high;
could we see the top few lines again to see the distribution? I note that hi/si
are low, and the load average indicates runnable processes (my first guess was that
SETI went threaded). So 'top' with the 'i' display option (only show runnable tasks)
should show what's running.
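One detail worth adding: on Linux the load average counts tasks in uninterruptible sleep (D state) as well as runnable (R) ones, so a nearly idle CPU can still report a load of 11. A minimal sketch for counting both from ps output; `count_load_tasks` is a hypothetical name, and the ps process doing the listing will itself show up as one R task.

```shell
#!/bin/sh
# Minimal sketch: count the tasks that Linux folds into the load average.
# Input: one process state per line, e.g. from "ps -eo stat=".
# Only the first letter matters: R = runnable, D = uninterruptible sleep.
count_load_tasks() {
    awk '{
        s = substr($1, 1, 1)
        if (s == "R") r++
        else if (s == "D") d++
    }
    END { printf "R=%d D=%d total=%d\n", r, d, r + d }'
}

# Example, next to the 1-minute load average:
#   ps -eo stat= | count_load_tasks; cut -d' ' -f1 /proc/loadavg
```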


--
Bill Davidsen david...@tmr.com
  We have more to fear from the bungling of the incompetent than from
the machinations of the wicked.  - from Slashdot



Re: Why is my load ave so high now?

2009-07-27 Thread Kevin J. Cummings
On 07/27/2009 12:04 PM, Bill Davidsen wrote:
 Kevin J. Cummings wrote:
 Could it be my ivtv0 (PVR-350) board?  It's not supposed to be doing
 anything at the moment!  There's nothing plugged into it, and it's not
 configured under MythTV right now (cable went all digital).

 I'll try removing the driver module and see if that helps.  At worst,
 I'll remove the board entirely.

I ended up doing rmmod ivtv and ivtvfb, and it didn't help.  Yes, the number
of ints dropped noticeably, but the load average remains 10+.

 Looking at the original 'top' output, all the CPU was going to nice
 processing, presumably SETI. When you kill that, you note the load
 average is still high; could we see the top few lines again to see the
 distribution? I note that hi/si are low, and the load average indicates
 runnable processes (my first guess was that SETI went threaded). So 'top'
 with the 'i' display option (only show runnable tasks) should show what's
 running.

(I learn something new every day!)

Sure, here it is for top -i:

top - 12:58:12 up 6 days,  9:02,  4 users,  load average: 11.30, 11.15, 11.10
Cpu(s):  0.7%us,  0.7%sy,  0.0%ni, 98.7%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   2074172k total,  1961024k used,   113148k free,   199680k buffers
Swap:  3911816k total,      412k used,  3911404k free,   935716k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
14743 root      20   0  2560 1152  836 R  0.7  0.1   0:00.08 top
 2506 root      20   0 15068  860  592 R  0.0  0.0   0:32.72 apcupsd
 2547 root      15  -5     0    0    0 D  0.0  0.0   0:00.00 nfsd
 2548 root      15  -5     0    0    0 D  0.0  0.0   0:00.00 nfsd
 2549 root      15  -5     0    0    0 D  0.0  0.0   0:00.00 nfsd
 2550 root      15  -5     0    0    0 D  0.0  0.0   0:00.00 nfsd
 2551 root      15  -5     0    0    0 D  0.0  0.0   0:00.00 nfsd
 2552 root      15  -5     0    0    0 D  0.0  0.0   0:00.00 nfsd
 2553 root      15  -5     0    0    0 D  0.0  0.0   0:00.00 nfsd
 2554 root      15  -5     0    0    0 D  0.0  0.0   0:00.00 nfsd
17427 root      20   0 77040  72m  752 D  0.0  3.6   0:02.07 clamscan
24904 root      20   0  2492  964  704 D  0.0  0.0   0:01.87 find
28703 root      39  19  1900  652  540 D  0.0  0.0   0:00.00 updatedb

That's the entire top output

-- 
Kevin J. Cummings
kjch...@rcn.com
cummi...@kjchome.homeip.net
cummi...@kjc386.framingham.ma.us
Registered Linux User #1232 (http://counter.li.org)



Re: Why is my load ave so high now?

2009-07-27 Thread Rick Stevens

Kevin J. Cummings wrote:

On 07/27/2009 12:04 PM, Bill Davidsen wrote:

Kevin J. Cummings wrote:

Could it be my ivtv0 (PVR-350) board?  It's not supposed to be doing
anything at the moment!  There's nothing plugged into it, and it's not
configured under MythTV right now (cable went all digital).

I'll try removing the driver module and see if that helps.  At worst,
I'll remove the board entirely.


I ended up doing rmmod ivtv and ivtvfb, and it didn't help.  Yes, the number
of ints dropped noticeably, but the load average remains 10+.


Looking at the original 'top' output, all the CPU was going to nice
processing, presumably SETI. When you kill that, you note the load
average is still high; could we see the top few lines again to see the
distribution? I note that hi/si are low, and the load average indicates
runnable processes (my first guess was that SETI went threaded). So 'top'
with the 'i' display option (only show runnable tasks) should show what's
running.


(I learn something new every day!)

Sure, here it is for top -i:


top - 12:58:12 up 6 days,  9:02,  4 users,  load average: 11.30, 11.15, 11.10
Cpu(s):  0.7%us,  0.7%sy,  0.0%ni, 98.7%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   2074172k total,  1961024k used,   113148k free,   199680k buffers
Swap:  3911816k total,      412k used,  3911404k free,   935716k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
14743 root      20   0  2560 1152  836 R  0.7  0.1   0:00.08 top
 2506 root      20   0 15068  860  592 R  0.0  0.0   0:32.72 apcupsd
 2547 root      15  -5     0    0    0 D  0.0  0.0   0:00.00 nfsd
 2548 root      15  -5     0    0    0 D  0.0  0.0   0:00.00 nfsd
 2549 root      15  -5     0    0    0 D  0.0  0.0   0:00.00 nfsd
 2550 root      15  -5     0    0    0 D  0.0  0.0   0:00.00 nfsd
 2551 root      15  -5     0    0    0 D  0.0  0.0   0:00.00 nfsd
 2552 root      15  -5     0    0    0 D  0.0  0.0   0:00.00 nfsd
 2553 root      15  -5     0    0    0 D  0.0  0.0   0:00.00 nfsd
 2554 root      15  -5     0    0    0 D  0.0  0.0   0:00.00 nfsd
17427 root      20   0 77040  72m  752 D  0.0  3.6   0:02.07 clamscan
24904 root      20   0  2492  964  704 D  0.0  0.0   0:01.87 find
28703 root      39  19  1900  652  540 D  0.0  0.0   0:00.00 updatedb


That's the entire top output


You see a bunch of NFS-related things in a D state and you wonder why
it's slow?

If you have processes in an I/O wait (a.k.a. D) state, that'll bog
stuff down badly...especially if the NFS mounts are mounted hard.
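To see exactly which tasks are stuck in D state and which kernel function they are sleeping in, the ps output can be filtered. A minimal sketch; `d_state_tasks` is a hypothetical helper, while `pid,stat,wchan:32,comm` are ordinary procps ps format keywords.

```shell
#!/bin/sh
# Minimal sketch: keep the header plus any task whose state starts with D
# (uninterruptible sleep).  Intended input:
#   ps -eo pid,stat,wchan:32,comm
# where WCHAN names the kernel function the task is sleeping in.
d_state_tasks() {
    awk 'NR == 1 || $2 ~ /^D/'
}

# Example: ps -eo pid,stat,wchan:32,comm | d_state_tasks
```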
--
- Rick Stevens, Systems Engineer  ri...@nerd.com -
- AIM/Skype: therps2    ICQ: 22643734    Yahoo: origrps2 -
--
- Squawk!  Pieces of Seven!  Pieces of Seven!  Parity Error! -
--



Re: Why is my load ave so high now? [Now I know why!]

2009-07-27 Thread Kevin J. Cummings
On 07/27/2009 02:26 PM, Rick Stevens wrote:
 You see a bunch of NFS-related things in a D state and you wonder why
 it's slow?

Yes.  Mostly because the machine accessing the NFS mounts has been
re-booted a couple of times.

 If you have processes in an I/O wait (a.k.a. D) state, that'll bog
 stuff down badly...especially if the NFS mounts are mounted hard.

Well, tonight I rebooted the server with NFS turned off.  When it
booted, I saw a load average between 1 and 2.  That's all.  When it
re-booted, ivtv started back up, despite my blacklisting it and removing
it from modprobe.conf.  However, ivtvfb did not get installed.
I also noticed that BOINC started right up again.  With astropulse
grabbing all the idle cpu time, my load average was still between 1 and 2.

So, I decided that NFS was my problem, but I'm still not sure why.

So, I tried a couple of things.  My laptop references a few directories
on my server via NFS and autofs.

So, I started nfs again on the server (service nfs start)

Load average remains between 1 and 2.  So far so good.

From the laptop, I did a cd /net/kjc386.  I can then do an ls and see
all of the exported filesystems.  Continues to look good.

ls home lists the directories in the server's exported /home dir.
nfs does the work, and disappears from the top -i that I have running.
Great.

Next I do a ls c: to look at the old WINDOWS partition on my server.
HANG!  I can't interrupt the ls with ^C or ^Z.  I have to kill it from
another process.  When I do, the hung nfs processes on the server stay
hung.  After it collects all 8 allowed nfs processes, nothing more nfs
works to the server, and the load average climbs roughly 1 per nfs
process (I watched the load average increase with each new nfs process
that appeared).

So, I guess my question is what's broken with NFS between my F11 laptop
and the F10 server?

-- 
Kevin J. Cummings
kjch...@rcn.com
cummi...@kjchome.homeip.net
cummi...@kjc386.framingham.ma.us
Registered Linux User #1232 (http://counter.li.org)



Re: Why is my load ave so high now?

2009-07-25 Thread Kevin J. Cummings

On 07/25/2009 09:58 AM, Aaron Konstam wrote:

Well, two things, one positive and one negative. The r column tells us
there are not many processes waiting for run time, which we normally
associate with a low load average. However, your number of interrupts per
second (in) is rather high. Some kernel action seems to be really
beating your machine over the head, so to speak. How you find out which
processes are causing the interrupts is not clear to me,
however. Is your primary process doing a lot of I/O?


No, astropulse should be CPU bound, not IO bound.  It reads in some 
data, performs *lots* of calculations on it (hours worth) and then 
writes the results out to a file which it then sends back to SETI, and 
downloads another work unit.


I'm very much interested in how I can figure out where the interrupts are
coming from.


so I ran 2 copies of cat /proc/interrupts 10 seconds apart, and here
are the delta interrupts in that time:



           CPU0
  0:          0   IO-APIC-edge      timer
  1:          0   IO-APIC-edge      i8042
  4:          0   IO-APIC-edge
  6:          0   IO-APIC-edge      floppy
  7:          0   IO-APIC-edge      parport0
  8:          0   IO-APIC-edge      rtc0
  9:          0   IO-APIC-fasteoi   acpi
 12:          0   IO-APIC-edge      i8042
 14:          0   IO-APIC-edge      pata_amd
 15:        209   IO-APIC-edge      pata_amd
 16:       2790   IO-APIC-fasteoi   ivtv0
 18:          9   IO-APIC-fasteoi   aic7xxx, cx88[0], cx88[0], cx88[0], eth0
 20:         60   IO-APIC-fasteoi   ohci_hcd:usb2, NVidia CK8
 21:        114   IO-APIC-fasteoi   ehci_hcd:usb1, ohci_hcd:usb3
 22:         21   IO-APIC-fasteoi   sata_nv
NMI:          0   Non-maskable interrupts
LOC:        246   Local timer interrupts
RES:          0   Rescheduling interrupts
CAL:          0   function call interrupts
TLB:          0   TLB shootdowns
TRM:          0   Thermal event interrupts
SPU:          0   Spurious interrupts
ERR:          0
MIS:          0
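The snapshot-and-subtract step above can be scripted. A minimal sketch, assuming two saved copies of /proc/interrupts; `irq_delta` is a hypothetical helper that sums every numeric CPU column on a line, so it also works on multi-CPU machines, and it only prints IRQs whose count went up.

```shell
#!/bin/sh
# Minimal sketch: per-IRQ interrupt counts accumulated between two
# snapshots of /proc/interrupts, summed over all CPU columns.
# Lines that did not change (delta 0) are omitted.
irq_delta() {
    # $1 = earlier snapshot file, $2 = later snapshot file
    awk '
    function total(   i, s) {
        s = 0
        for (i = 2; i <= NF && $i ~ /^[0-9]+$/; i++)
            s += $i
        return s
    }
    NR == FNR { before[$1] = total(); next }
    ($1 in before) {
        d = total() - before[$1]
        if (d > 0) print $1, d
    }' "$1" "$2"
}

# Example:
#   cat /proc/interrupts > /tmp/irq.1; sleep 10
#   cat /proc/interrupts > /tmp/irq.2; irq_delta /tmp/irq.1 /tmp/irq.2
```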


Could it be my ivtv0 (PVR-350) board?  It's not supposed to be doing
anything at the moment!  There's nothing plugged into it, and it's not
configured under MythTV right now (cable went all digital).


I'll try removing the driver module and see if that helps.  At worst, 
I'll remove the board entirely.


Thanks Aaron.

--
Kevin J. Cummings
kjch...@rcn.com
cummi...@kjchome.homeip.net
cummi...@kjc386.framingham.ma.us
Registered Linux User #1232 (http://counter.li.org)



Re: Why is my load ave so high now?

2009-07-24 Thread Aaron Konstam
On Thu, 2009-07-23 at 22:51 -0400, Kevin J. Cummings wrote:
 When I was running F8, my server averaged a load ave of around 4.

 Now that I'm running F10, and bittorrent is no longer running, in fact,
 not much of anything besides s...@home  (BOINC client running
 astro_pulse), my load average is up around 11 and frequently exceeds 12
 (and of course when it exceeds 12, it stops receiving emails).  Here's a
 5 second snapshot from top:

 top - 22:48:06 up 2 days, 18:51,  5 users,  load average: 11.15, 11.33, 11.63
 Tasks: 250 total,   2 running, 247 sleeping,   0 stopped,   1 zombie
 Cpu(s):  5.9%us,  1.8%sy, 92.0%ni,  0.0%id,  0.0%wa,  0.3%hi,  0.0%si,  0.0%st
 Mem:   2074172k total,  1932680k used,   141492k free,   108872k buffers
 Swap:  3911816k total,      552k used,  3911264k free,   977568k cached

   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
  3234 root      39  19 50292  46m 2072 R 91.7  2.3   3520:10 astropulse_5.06

What is astropulse that is using 91.7% of your cpu? You also seem to
have a large number of tasks running and sleeping, which might push up your
load average.

  3378 root      20   0  324m  36m 8496 S  3.6  1.8  14:34.20 Xorg
 16271 cummings  20   0  172m  60m  21m S  1.1  3.0   0:37.77 thunderbird-bin
  4026 cummings  20   0 22412  11m 7612 S  0.5  0.6   0:09.28 metacity
  4108 cummings  20   0 27276  11m 8748 S  0.4  0.6  18:41.47 multiload-apple
  4126 cummings  20   0 74844  18m  10m S  0.4  0.9   0:07.54 gnome-terminal
  4027 cummings  20   0 63536  20m  10m S  0.3  1.0   0:27.59 gnome-panel
  4030 cummings  20   0 34104  13m 5500 S  0.3  0.7   0:24.95 gnome-screensav
 16252 root      20   0  2560 1184  844 S  0.3  0.1   0:00.97 top
  4086 cummings  20   0 57960  14m 9716 S  0.2  0.7   0:06.57 wnck-applet
  4093 cummings  20   0 34464  14m  10m S  0.2  0.7   5:31.08 clock-applet
   196 root      15  -5     0    0    0 S  0.1  0.0   1:08.89 ata/0
   971 root      15  -5     0    0    0 S  0.1  0.0   3:38.03 scsi_eh_5
  2838 root      20   0  3332  520  364 S  0.1  0.0   0:05.12 lircd
  3138 root      20   0  3624 1032  912 S  0.1  0.0   0:14.66 hald-addon-stor
  3175 root      20   0

Re: Why is my load ave so high now?

2009-07-24 Thread Kevin J. Cummings

On 07/24/2009 10:15 AM, Aaron Konstam wrote:

On Thu, 2009-07-23 at 22:51 -0400, Kevin J. Cummings wrote:

When I was running F8, my server averaged a load ave of around 4.

Now that I'm running F10, and bittorrent is no longer running, in fact,
not much of anything besides s...@home  (BOINC client running
astro_pulse), my load average is up around 11 and frequently exceeds 12
(and of course when it exceeds 12, it stops receiving emails).  Here's a
5 second snapshot from top:


top - 22:48:06 up 2 days, 18:51,  5 users,  load average: 11.15, 11.33, 11.63
Tasks: 250 total,   2 running, 247 sleeping,   0 stopped,   1 zombie
Cpu(s):  5.9%us,  1.8%sy, 92.0%ni,  0.0%id,  0.0%wa,  0.3%hi,  0.0%si,  0.0%st
Mem:   2074172k total,  1932680k used,   141492k free,   108872k buffers
Swap:  3911816k total,      552k used,  3911264k free,   977568k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 3234 root      39  19 50292  46m 2072 R 91.7  2.3   3520:10 astropulse_5.06

What is astropulse that is using 91.7% of your cpu? You also seem to
have a large number of tasks running and sleeping, which might push up your
load average.


astropulse is the s...@home BOINC client that I run (NICEd to 19).
It only uses excess cycles, and in the past my load average has never
exceeded the 3-5 range, except when I was doing real work on the system
(like running firefox, thunderbird, and other real programs), something
I almost never do anymore since I bought myself a laptop.


Yep, but not consuming any real CPU resources.

I guess what I'm asking is if only 1 job is grabbing most of the CPU, 
then what's causing the system to thrash?  (Is a load average of 12 
considered a thrashing system?  sendmail thinks it is.)



 3378 root      20   0  324m  36m 8496 S  3.6  1.8  14:34.20 Xorg
16271 cummings  20   0  172m  60m  21m S  1.1  3.0   0:37.77 thunderbird-bin
 4026 cummings  20   0 22412  11m 7612 S  0.5  0.6   0:09.28 metacity
 4108 cummings  20   0 27276  11m 8748 S  0.4  0.6  18:41.47 multiload-apple
 4126 cummings  20   0 74844  18m  10m S  0.4  0.9   0:07.54 gnome-terminal
 4027 cummings  20   0 63536  20m  10m S  0.3  1.0   0:27.59 gnome-panel
 4030 cummings  20   0 34104  13m 5500 S  0.3  0.7   0:24.95 gnome-screensav
16252 root      20   0  2560 1184  844 S  0.3  0.1   0:00.97 top
 4086 cummings  20   0 57960  14m 9716 S  0.2  0.7   0:06.57 wnck-applet
 4093 cummings  20   0 34464  14m  10m S  0.2  0.7   5:31.08 clock-applet
  196 root      15  -5     0    0    0 S  0.1  0.0   1:08.89 ata/0
  971 root      15  -5     0    0    0 S  0.1  0.0   3:38.03 scsi_eh_5
 2838 root      20   0  3332  520  364 S  0.1  0.0   0:05.12 lircd
 3138 root      20   0  3624 1032  912 S  0.1  0.0   0:14.66 hald-addon-stor
 3175 root      20   0  3624 1032  912 S  0.1  0.0   0:37.94 hald-addon-stor
 3194 mailman   20   0 13620 7000 2820 S  0.1  0.3   0:53.14 python
 3879 cummings  20   0 58964  30m 8092 S  0.1  1.5  21:56.13 gnome-settings-
 4478 cummings  20   0 10044 4432 2376 S  0.1  0.2   0:01.48 xterm


--
Kevin J. Cummings
kjch...@rcn.com
cummi...@kjchome.homeip.net
cummi...@kjc386.framingham.ma.us
Registered Linux User #1232 (http://counter.li.org)



Re: Why is my load ave so high now?

2009-07-24 Thread Andrew Parker
On Fri, Jul 24, 2009 at 11:08 AM, Kevin J. Cummings cummi...@kjchome.homeip.net wrote:
 astropulse is the s...@home BOINC client that I run (NICEd to 19).
 It only uses excess cycles, and in the past my load average has never
 exceeded the 3-5 range, except when I was doing real work on the system
 (like running firefox, thunderbird, and other real programs), something I
 almost never do anymore since I bought myself a laptop.

kill it off, wait a few minutes and see if the load average comes down

 I guess what I'm asking is if only 1 job is grabbing most of the CPU, then
 what's causing the system to thrash?  (Is a load average of 12 considered a
 thrashing system?  sendmail thinks it is.)

12 is high.  is the system responsive?  if it is, then this again
points to something that has been nice'd (such as seti), in which case
it's not a problem, except for sendmail - which I would then configure
for higher limits.
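A common rule of thumb is to compare the load average with the number of CPUs rather than read it in isolation. Kevin's /proc/interrupts listing shows only CPU0, so a load of 12 there is about twelve tasks per CPU. A minimal sketch; `load_per_cpu` is a hypothetical helper and `getconf _NPROCESSORS_ONLN` is assumed to be available (it is on glibc systems).

```shell
#!/bin/sh
# Minimal sketch: divide the 1-, 5- and 15-minute load averages by the
# number of online CPUs.  $1 is a /proc/loadavg-style file, $2 the CPU count.
load_per_cpu() {
    awk -v n="$2" '{ printf "%.2f %.2f %.2f\n", $1 / n, $2 / n, $3 / n }' "$1"
}

# Example: load_per_cpu /proc/loadavg "$(getconf _NPROCESSORS_ONLN)"
```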

if it's not seti and the system is not very responsive, then it could be
something continuously spawning short-lived processes.  these can be
hard to spot; run the following and see if the process IDs differ by
much:

bash -c 'echo $$'
sleep 5
bash -c 'echo $$'

if they differ by much, then something is creating processes too
quickly.  you can usually spot these by running pstree -plan a couple
of times and seeing what the differences are; it may take a few tries
to catch whatever is causing the problems.
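A swapped-in alternative to comparing PIDs (not what Andrew proposed): the `processes` line in /proc/stat counts forks since boot, so sampling it twice gives a creation rate directly and is not confused by PID wraparound. A minimal sketch; the function names and the `STAT_FILE` override are hypothetical.

```shell
#!/bin/sh
# Minimal sketch: estimate process creations per second from the
# "processes" line of /proc/stat (forks since boot).
# STAT_FILE is a hypothetical override; it defaults to /proc/stat.
STAT_FILE=${STAT_FILE:-/proc/stat}

forks_since_boot() {
    awk '$1 == "processes" { print $2 }' "$STAT_FILE"
}

fork_rate() {
    # $1 = sampling interval in seconds (default 5, like the sleep 5 above)
    interval=${1:-5}
    before=$(forks_since_boot)
    sleep "$interval"
    after=$(forks_since_boot)
    echo $(( (after - before) / interval ))
}

# Example: fork_rate 5
```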



Re: Why is my load ave so high now?

2009-07-24 Thread Kevin J. Cummings

On 07/24/2009 02:04 PM, Andrew Parker wrote:

12 is high.  is the system responsive?  if it is, then this again
points to something that has been nice'd (such as seti), in which case
it's not a problem, except for sendmail - which I would then configure
for higher limits.


Mostly responsive.  In the past, I've loaded it heavily enough that it
swapped mightily, and waiting for things to swap back in could take a
while.  This is definitely not the case right now.



if it's not seti and the system is not very responsive, then it could be
something continuously spawning short-lived processes.  these can be
hard to spot; run the following and see if the process IDs differ by
much:

bash -c 'echo $$'
sleep 5
bash -c 'echo $$'



pids differ by 2-3 numbers, but if I run it again, it usually starts 
where the last one left off



if they differ by much, then something is creating processes too
quickly.  you can usually spot these by running pstree -plan a couple
of times and seeing what the differences are; it may take a few tries
to catch whatever is causing the problems.


Diffing the pstree output shows only pstree itself changing.
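The pstree comparison can be automated with a small wrapper that runs any command twice and diffs the results. A minimal sketch; `snapshot_diff` is a hypothetical helper. Because pstree lists itself with a fresh PID on every run, its own entry always appears in the diff, which is consistent with what Kevin saw.

```shell
#!/bin/sh
# Minimal sketch: run a command twice, a few seconds apart, and diff the
# two outputs.  Returns diff's status (0 = no change, 1 = changed).
snapshot_diff() {
    # $1 = interval in seconds; remaining arguments = command to snapshot
    interval=$1
    shift
    tmp=$(mktemp -d) || return 2
    "$@" > "$tmp/before"
    sleep "$interval"
    "$@" > "$tmp/after"
    diff "$tmp/before" "$tmp/after"
    status=$?
    rm -rf "$tmp"
    return "$status"
}

# Example: snapshot_diff 5 pstree -plan
```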

--
Kevin J. Cummings
kjch...@rcn.com
cummi...@kjchome.homeip.net
cummi...@kjc386.framingham.ma.us
Registered Linux User #1232 (http://counter.li.org)



Re: Why is my load ave so high now?

2009-07-24 Thread Aaron Konstam
On Fri, 2009-07-24 at 11:08 -0400, Kevin J. Cummings wrote:
 On 07/24/2009 10:15 AM, Aaron Konstam wrote:
  On Thu, 2009-07-23 at 22:51 -0400, Kevin J. Cummings wrote:
  When I was running F8, my server averaged a load ave of around 4.

  Now that I'm running F10, and bittorrent is no longer running, in fact,
  not much of anything besides s...@home  (BOINC client running
  astro_pulse), my load average is up around 11 and frequently exceeds 12
  (and of course when it exceeds 12, it stops receiving emails).  Here's a
  5 second snapshot from top:

  top - 22:48:06 up 2 days, 18:51,  5 users,  load average: 11.15, 11.33, 11.63
  Tasks: 250 total,   2 running, 247 sleeping,   0 stopped,   1 zombie
  Cpu(s):  5.9%us,  1.8%sy, 92.0%ni,  0.0%id,  0.0%wa,  0.3%hi,  0.0%si,  0.0%st
  Mem:   2074172k total,  1932680k used,   141492k free,   108872k buffers
  Swap:  3911816k total,      552k used,  3911264k free,   977568k cached

    PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
   3234 root      39  19 50292  46m 2072 R 91.7  2.3   3520:10 astropulse_5.06
  What is astropulse that is using 91.7% of your cpu? You also seem to
  have a large number of tasks running and sleeping, which might push up your
  load average.

 astropulse is the s...@home BOINC client that I run (NICEd to 19).
 It only uses excess cycles, and in the past my load average has never
 exceeded the 3-5 range, except when I was doing real work on the system
 (like running firefox, thunderbird, and other real programs), something
 I almost never do anymore since I bought myself a laptop.

 Yep, but not consuming any real CPU resources.

 I guess what I'm asking is if only 1 job is grabbing most of the CPU,
 then what's causing the system to thrash?  (Is a load average of 12
 considered a thrashing system?  sendmail thinks it is.)
Two suggestions:
1. run vmstat 2 30
to see how many context switches are occurring, the wait time for
processes, etc. A load average of 11 means there are a large number of
processes waiting for cpu time. I think it is inaccurate to say no cpu
resources are being used.

2. If you can, get hold of the article on the Real Time Scheduler in
the August 2009 issue of Linux Journal. To me, astropulse is running
away with your CPU time and nothing else can get adequate cpu time.
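In procps vmstat, r is the number of runnable processes and b the number in uninterruptible sleep, so averaging those columns over a run shows whether a high load average comes from CPU contention or from blocked tasks. A minimal sketch; `vmstat_summary` is a hypothetical helper, and it assumes the 17-column layout used in this thread (newer vmstat versions may add columns after st, which it ignores).

```shell
#!/bin/sh
# Minimal sketch: average the r, b, in and cs columns of "vmstat DELAY COUNT"
# output.  The first data row (averages since boot) is skipped.
# Column order assumed: r b swpd free buff cache si so bi bo in cs us sy id wa st
vmstat_summary() {
    awk '$1 ~ /^[0-9]+$/ {
        rows++
        if (rows == 1) next
        r += $1; b += $2; intr += $11; cs += $12; n++
    }
    END {
        if (n == 0) exit 1
        printf "samples=%d avg_r=%.1f avg_b=%.1f avg_in=%.0f avg_cs=%.0f\n",
               n, r / n, b / n, intr / n, cs / n
    }'
}

# Example: vmstat 2 30 | vmstat_summary
```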
--
===
About the only thing we have left that actually discriminates in favor
of the plain people is the stork.
===
Aaron Konstam telephone: (210) 656-0355 e-mail: akons...@sbcglobal.net



Re: Why is my load ave so high now?

2009-07-24 Thread Bill Davidsen

Kevin J. Cummings wrote:

When I was running F8, my server averaged a load ave of around 4.

Now that I'm running F10, and bittorrent is no longer running, in fact, 
not much of anything besides s...@home  (BOINC client running 
astro_pulse), my load average is up around 11 and frequently exceeds 12 
(and of course when it exceeds 12, it stops receiving emails).  Here's a 
5 second snapshot from top:


top - 22:48:06 up 2 days, 18:51,  5 users,  load average: 11.15, 11.33, 11.63
Tasks: 250 total,   2 running, 247 sleeping,   0 stopped,   1 zombie
Cpu(s):  5.9%us,  1.8%sy, 92.0%ni,  0.0%id,  0.0%wa,  0.3%hi,  0.0%si,  0.0%st
Mem:   2074172k total,  1932680k used,   141492k free,   108872k buffers
Swap:  3911816k total,      552k used,  3911264k free,   977568k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 3234 root      39  19 50292  46m 2072 R 91.7  2.3   3520:10 astropulse_5.06

[___snip___]
I'm open to any WAGs right now as to the cause.  It's probably something
I haven't yet fixed since my preupgrade (from F8).


You think 91% of your CPU going to astropulse has something to do with it? Try
turning on the display of threads; I don't see how you would get that load unless you
had multiple threads running. And you can attach a .txt file, so the lines
don't get wrapped. In any case, it's low priority and niced, so if you have
anything useful running it should get the CPU.
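Besides toggling the thread display in top, the thread count of a single process can be read from the `Threads:` line of /proc/PID/status. A minimal sketch; `thread_count` and the `PROC_ROOT` override are hypothetical, and `pgrep -o` (oldest matching process) is from procps.

```shell
#!/bin/sh
# Minimal sketch: print the thread count of a process from the "Threads:"
# line of /proc/PID/status.  PROC_ROOT is a hypothetical override
# (default /proc), used here only to make the function testable.
thread_count() {
    # $1 = PID
    awk '$1 == "Threads:" { print $2 }' "${PROC_ROOT:-/proc}/$1/status"
}

# Example: thread_count "$(pgrep -o astropulse)"
```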


--
Bill Davidsen david...@tmr.com
  We have more to fear from the bungling of the incompetent than from
the machinations of the wicked.  - from Slashdot



Re: Why is my load ave so high now?

2009-07-24 Thread Kevin J. Cummings

On 07/24/2009 04:41 PM, Aaron Konstam wrote:

Two suggestions:
1. run vmstat 2 30
to see how many context switches are occurring, the wait time for
processes, etc. A load average of 11 means there are a large number of
processes waiting for cpu time. I think it is inaccurate to say no cpu
resources are being used.


Output here:

# vmstat 2 30
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 3  0    444 173168  78832 911216    0    0    32   103  122   55 96  4  0  0  0
 1  0    444 173168  78840 911216    0    0     0   326 1427  446 98  3  0  0  0
 1  0    444 173168  78840 911216    0    0     0     0 1370  219 99  2  0  0  0
 1  0    444 173044  78860 911216    0    0     0   162 1362  242 98  2  0  0  0
 1  0    444 173044  78860 911216    0    0     0     0 1374  449 98  2  0  0  0
 1  0    444 173044  78860 911216    0    0     0     0 1372  214 99  1  0  0  0
 1  0    444 173044  78876 911216    0    0     0   156 1377  448 98  2  0  0  0
 1  0    444 173044  78876 911216    0    0     0     0 1367  234 99  2  0  0  0
 1  0    444 175648  78888 911216    0    0     0    38 1383  239 98  2  0  0  0
 1  0    444 175648  78888 911216    0    0     0     0 1366  424 98  2  0  0  0
 1  0    444 175648  78888 911216    0    0     0     0 1373  222 99  1  0  0  0
 1  0    444 175648  78904 911216    0    0     0    80 1383  346 98  2  0  0  0
 1  0    444 175648  78904 911216    0    0     0     0 1366  212 100  1  0  0  0
 1  0    444 174656  78912 911208    0    0     0    30 1371  228 98  2  0  0  0
 1  0    444 174656  78912 911216    0    0     0     0 1371  450 98  3  0  0  0
 1  0    444 174656  78912 911216    0    0     0     0 1367  210 100  1  0  0  0
 1  0    444 174656  78924 911216    0    0     0    34 1374  451 98  2  0  0  0
 1  0    444 174656  78924 911216    0    0     0     0 1365  221 99  1  0  0  0
 2  0    444 173184  78932 911216    0    0     0   204 1427  556 97  4  0  0  0
 1  0    444 172548  78932 911224    0    0     0     0 1386  635 96  4  0  0  0
 1  0    444 173168  78940 911224    0    0     0   150 1356  221 97  2  0  1  0
 1  0    444 173168  78964 911224    0    0     0    34 1370  314 99  2  0  0  0
 1  0    444 173168  78976 911224    0    0     0  1324 1375  228 99  1  0  0  0
 1  0    444 172400  78984 911216    0    0     0    14 1377  246 98  2  0  0  0
 1  0    444 172424  78984 911224    0    0     0     0 1364  433 99  1  0  0  0
 1  0    444 172424  78984 911224    0    0     0     0 1371  220 99  1  0  0  0
 1  0    444 172424  78992 911224    0    0     0    30 1375  461 98  3  0  0  0
 1  0    444 172424  78992 911224    0    0     0     0 1365  207 100  1  0  0  0
 1  0    444 172512  79000 911216    0    0     0    34 1377  237 98  2  0  0  0
 1  0    444 172548  79000 911224    0    0     0     0 1376  451 97  3  0  0  0


Do you see anything in the above output?  The load average was 11.45 when I 
started.
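To put vmstat's r (runnable) and b (blocked) columns side by side with the 11.45 load average, a small Python sketch can average them. It assumes the procps column layout shown above, with r and b as the first two fields of each data row; the function name mean_r_b is hypothetical:

```python
from statistics import fmean


def mean_r_b(vmstat_text: str) -> tuple[float, float]:
    """Average the r and b columns of saved vmstat output.

    Header lines (first field not an integer) are skipped. Raises
    statistics.StatisticsError if there are no data rows at all.
    """
    r_vals, b_vals = [], []
    for line in vmstat_text.splitlines():
        fields = line.split()
        if len(fields) < 2 or not fields[0].isdigit():
            continue  # "procs ..." or " r  b ..." header, or blank line
        r_vals.append(int(fields[0]))
        b_vals.append(int(fields[1]))
    return fmean(r_vals), fmean(b_vals)


sample = """\
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 3  0    444 173168  78832 911216    0    0    32   103  122   55 96  4  0  0  0
 1  0    444 173168  78840 911216    0    0     0   326 1427  446 98  3  0  0  0
 2  0    444 173184  78932 911216    0    0     0   204 1427  556 97  4  0  0  0
"""

print(mean_r_b(sample))  # (2.0, 0.0)
```

Applied to all 30 rows above, the same calculation gives a mean r of about 1.1 and a mean b of 0, well below the load average reported at the time.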



2. If you can, get hold of the article on the Real Time Scheduler in
the August 2009 issue of Linux Journal. To me, astropulse is running
away with your CPU time and nothing else can get adequate CPU time.


But it's only July 2009!   B^)

I'm no longer an LJ subscriber, so I'll have to either find/buy the 
issue, or wait a while for it to become available on their WWW site.


Thanks Aaron!

--
Kevin J. Cummings
kjch...@rcn.com
cummi...@kjchome.homeip.net
cummi...@kjc386.framingham.ma.us
Registered Linux User #1232 (http://counter.li.org)



Re: Why is my load ave so high now?

2009-07-24 Thread Kevin J. Cummings

On 07/24/2009 04:52 PM, Bill Davidsen wrote:


You think 91% of your CPU going to astropulse has something to do with
it? Try turning on viewing of threads; I don't see how you would get that


I'll say it again.  BOINC is niced to 19.  It only runs when there is 
nothing else to run.  It is not even consuming all memory (little or no 
swap is in use), so other processes still remain resident.
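How much CPU a nice-19 task actually receives depends on the scheduler's weighting, and a runnable niced task still counts as 1 toward the load average regardless of its priority. Under the Linux CFS scheduler, CPU time is shared in proportion to per-task weights taken from a nice-to-weight table. A minimal Python sketch, assuming the table values 1024 for nice 0 and 15 for nice 19 (as in the kernel's prio_to_weight table); the function cpu_share is hypothetical and models a single CPU with all tasks runnable:

```python
# Assumed subset of the kernel's CFS nice-to-weight table:
# nice 0 -> 1024, nice 19 -> 15.
NICE_TO_WEIGHT = {0: 1024, 19: 15}


def cpu_share(nice: int, competitors: list[int]) -> float:
    """Fraction of one CPU a task at `nice` gets while `competitors` run.

    CFS divides CPU time among runnable tasks in proportion to weight.
    """
    mine = NICE_TO_WEIGHT[nice]
    total = mine + sum(NICE_TO_WEIGHT[n] for n in competitors)
    return mine / total


# A nice-19 task competing with one CPU-bound nice-0 task:
print(f"{cpu_share(19, [0]):.1%}")  # 1.4%
# The same task with nothing else runnable gets the whole CPU:
print(f"{cpu_share(19, []):.1%}")   # 100.0%
```

This matches the point that a niced BOINC job yields to real work, while still showing up as one runnable task in the load average whenever it is on the run queue.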


Under F8 the same mix of programs (actually more programs running 
concurrently) had a lower load average by a factor of 3!  The answer 
must be something else.


However, something Aaron said got me thinking.  The kernel may be using 
a different process scheduler in F10 than it was in F8.  That is 
definitely something for me to look into.



load unless you had multiple threads running. And you can attach a
.txt file, so the lines don't get wrapped. In any case, it's low
priority and niced, so if you have anything useful running it should get
the CPU.


I agree.  In fact, the 2 ssh connections I have to it from my laptop 
remain very responsive.  It's just the large number that bothers me.




Re: Why is my load ave so high now?

2009-07-24 Thread Andrew Parker
On Fri, Jul 24, 2009 at 2:04 PM, Andrew Parker <gbofs...@gmail.com> wrote:
 On Fri, Jul 24, 2009 at 11:08 AM, Kevin J. Cummings <cummi...@kjchome.homeip.net> wrote:
 astropulse is the s...@home BOINC client that I run (NICEd to 19).
 It only uses excess cycles and in the past my load average has never
 exceeded the 3-5 range, except when I was doing real work on the system
 (like running firefox, thunderbird, and other real programs), something I
 almost never do anymore since I bought myself a laptop.

 kill it off, wait a few minutes and see if the load average comes down

did you try this yet?



Re: Why is my load ave so high now?

2009-07-24 Thread Kevin J. Cummings

On 07/24/2009 02:04 PM, Andrew Parker wrote:

kill it off, wait a few minutes and see if the load average comes down


I killed off boinc and astropulse with kill -9.  After waiting 10 
minutes (or more), the load average had dropped to 10.35.
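Ten minutes is long enough for the 1-minute load average to forget a killed task almost completely, so a figure still near 10 suggests roughly 10 other tasks were still being counted as runnable or uninterruptible. A rough floating-point model of the kernel's exponentially damped average illustrates this: every 5 seconds the 1-minute figure is multiplied by exp(-5/60) (the kernel uses the fixed-point constant EXP_1 = 1884/2048) and the current count of R plus D tasks is blended in. The starting value and the residual task count below are illustrative assumptions, and simulate_loadavg is a hypothetical helper:

```python
import math

SAMPLE_INTERVAL = 5.0  # seconds between kernel load samples
# Per-sample decay for the 1-minute average (~0.92).
DECAY_1MIN = math.exp(-SAMPLE_INTERVAL / 60.0)


def simulate_loadavg(start: float, active_tasks: int, seconds: float) -> float:
    """Float model of the 1-minute load average.

    Every 5 s: load = load * decay + active * (1 - decay), where
    `active` is the number of runnable (R) plus uninterruptible (D) tasks.
    """
    load = start
    for _ in range(int(seconds // SAMPLE_INTERVAL)):
        load = load * DECAY_1MIN + active_tasks * (1.0 - DECAY_1MIN)
    return load


# Hypothetical: start at 11.45, kill the one CPU-bound task, and leave
# 10 tasks still counted. After 10 minutes the average sits at ~10.
print(round(simulate_loadavg(11.45, 10, 600), 2))  # 10.0
```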




Why is my load ave so high now?

2009-07-23 Thread Kevin J. Cummings

When I was running F8, my server averaged a load average of around 4.

Now that I'm running F10, and bittorrent is no longer running, in fact, 
not much of anything besides s...@home  (BOINC client running 
astro_pulse), my load average is up around 11 and frequently exceeds 12 
(and of course when it exceeds 12, it stops receiving emails).  Here's a 
5 second snapshot from top:



top - 22:48:06 up 2 days, 18:51,  5 users,  load average: 11.15, 11.33, 11.63
Tasks: 250 total,   2 running, 247 sleeping,   0 stopped,   1 zombie
Cpu(s):  5.9%us,  1.8%sy, 92.0%ni,  0.0%id,  0.0%wa,  0.3%hi,  0.0%si,  0.0%st
Mem:   2074172k total,  1932680k used,   141492k free,   108872k buffers
Swap:  3911816k total,      552k used,  3911264k free,   977568k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 3234 root      39  19 50292  46m 2072 R 91.7  2.3   3520:10 astropulse_5.06
 3378 root      20   0  324m  36m 8496 S  3.6  1.8  14:34.20 Xorg
16271 cummings  20   0  172m  60m  21m S  1.1  3.0   0:37.77 thunderbird-bin
 4026 cummings  20   0 22412  11m 7612 S  0.5  0.6   0:09.28 metacity
 4108 cummings  20   0 27276  11m 8748 S  0.4  0.6  18:41.47 multiload-apple
 4126 cummings  20   0 74844  18m  10m S  0.4  0.9   0:07.54 gnome-terminal
 4027 cummings  20   0 63536  20m  10m S  0.3  1.0   0:27.59 gnome-panel
 4030 cummings  20   0 34104  13m 5500 S  0.3  0.7   0:24.95 gnome-screensav
16252 root      20   0  2560 1184  844 S  0.3  0.1   0:00.97 top
 4086 cummings  20   0 57960  14m 9716 S  0.2  0.7   0:06.57 wnck-applet
 4093 cummings  20   0 34464  14m  10m S  0.2  0.7   5:31.08 clock-applet
  196 root      15  -5     0    0    0 S  0.1  0.0   1:08.89 ata/0
  971 root      15  -5     0    0    0 S  0.1  0.0   3:38.03 scsi_eh_5
 2838 root      20   0  3332  520  364 S  0.1  0.0   0:05.12 lircd
 3138 root      20   0  3624 1032  912 S  0.1  0.0   0:14.66 hald-addon-stor
 3175 root      20   0  3624 1032  912 S  0.1  0.0   0:37.94 hald-addon-stor
 3194 mailman   20   0 13620 7000 2820 S  0.1  0.3   0:53.14 python

RE: Why is my load ave so high now?

2009-07-23 Thread Joseph L. Casale
   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
  3234 root      39  19 50292  46m 2072 R 91.7  2.3   3520:10 astropulse_5.06

I'm open to any WAGs right now as to the cause.

FFS, would seti's load not have anything to do with it? :)



Re: Why is my load ave so high now?

2009-07-23 Thread Kevin J. Cummings

On 07/23/2009 11:09 PM, Joseph L. Casale wrote:

   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
  3234 root      39  19 50292  46m 2072 R 91.7  2.3   3520:10 astropulse_5.06

I'm open to any WAGs right now as to the cause.


FFS, would seti's load not have anything to do with it? :)



It didn't use to.
