Re: Why is my load ave so high now? [Now I know why!]
Kevin J. Cummings wrote: On 07/27/2009 02:26 PM, Rick Stevens wrote: You see a bunch of NFS-related things in a D state and you wonder why it's slow? Yes. Mostly because the machine accessing the NFS mounts has been re-booted a couple of times. If you have processes in an I/O wait (a.k.a. D) state, that'll bog stuff down badly...especially if the NFS mounts are mounted hard. Well, tonight I rebooted the server with NFS turned off. When it booted, I saw a load average between 1 and 2. That's all. When it re-booted, ivtv started back up, despite my blacklisting it and removing it from modprobe.conf. However, ivtvfb did not get installed. I also noticed that BOINC started right up again. With astropulse grabbing all the idle cpu time, my load average was still between 1 and 2. So, I decided that NFS was my problem, but I'm still not sure why. So, I tried a couple of things. My laptop references a few directories on my server via NFS and autofs. So, I started nfs again on the server (service nfs start) Load average remains between 1 and 2. So far so good. From the laptop, I did a cd /net/kjc386. I can then do an ls and see all of the exported filesystems. Continues to look good. ls home lists the directories in the server's exported /home dir. nfs does the work, and disappears from the top -i that I have running. Great. Next I do a ls c: to look at the old WINDOWS partition on my server. HANG! I can't interrupt the ls with ^C nor ^Z. I have to kill it from another process. When I do, the hung nfs processes on the server stay hung. After it collects all 8 allowed nfs processes, nothing more nfs works to the server, and the load average climbs roughly 1 per nfs process (I watched the load average increase with each new nfs process that appeared). So, I guess my question is what's broken with NFS between my F11 laptop and the F10 server I could see where ls c: might be interpreted by the system as trying to find an NFS machine called c. 
An NFS mount command is:

    mount -t nfs server:/sharename /mountpoint

Perhaps F11 is trying to invoke an automount of an NFS share from server c to satisfy your ls command. That'd be wild! I haven't tried this. Perhaps you've found a very subtle bug in F11's NFS client implementation. Could you run wireshark or tcpdump and watch for NFS traffic when you do that ls c: command? If you do, then I'd file a bugzilla PDQ (pretty damned quick).

--
Rick Stevens, Systems Engineer ri...@nerd.com
AIM/Skype: therps2  ICQ: 22643734  Yahoo: origrps2
People tell me I look at the dark side. That's not true. I have the heart of a small boy...in a jar right here on my desk. -- Stephen King

--
fedora-list mailing list fedora-list@redhat.com To unsubscribe: https://www.redhat.com/mailman/listinfo/fedora-list Guidelines: http://fedoraproject.org/wiki/Communicate/MailingListGuidelines
Re: Why is my load ave so high now? [Now I know why!]
On 07/28/2009 02:00 PM, Rick Stevens wrote:
>> So, I guess my question is what's broken with NFS between my F11 laptop and the F10 server
>
> I could see where ls c: might be interpreted by the system as trying to find an NFS machine called c. An NFS mount command is:
>
>     mount -t nfs server:/sharename /mountpoint
>
> Perhaps F11 is trying to invoke an automount of an NFS share from server c to satisfy your ls command. That'd be wild! I haven't tried this. Perhaps you've found a very subtle bug in F11's NFS client implementation. Could you run wireshark or tcpdump and watch for NFS traffic when you do that ls c: command? If you do, then I'd file a bugzilla PDQ (pretty damned quick).

Well, since my cwd at the time is /net/kjc386, I fully expect ls c: to generate NFS traffic, because (through the autofs stuff) it's trying to access kjc386:/c: which is one of the exported directories from the server kjc386. Did I misinterpret what you were trying to say?

I know what you are trying to say, and this naming convention, which I have been using for years now, has only tripped up emacs's readdir stuff in the past, never ls. I suppose I could try changing the directory's mount point from c: to c and see if that helps.

--
Kevin J. Cummings
kjch...@rcn.com
cummi...@kjchome.homeip.net
cummi...@kjc386.framingham.ma.us
Registered Linux User #1232 (http://counter.li.org)
Re: Why is my load ave so high now?
Kevin J. Cummings wrote: On 07/25/2009 09:58 AM, Aaron Konstam wrote: Wwll two things, one positive and one negative. The r column tells us there are not many processes waiting for run time which we normally associate with a low load average, However your number of interrupts per second (in) are rather high. Some kernel action seems to be really beating your machine over the head so to speak. How you find out what processes these are that are that are interrupting is not clear to me, however. Is your primary process doing a lot of I/O? No, astropulse should be CPU bound, not IO bound. It reads in some data, performs *lots* of calculations on it (hours worth) and then writes the results out to a file which it then sends back to SETI, and downloads another work unit. I'm very much intrested in how I can figure out where the interrupts are coming from so I ran 2 copies of cat /proc/interurupts 10 seconds apart, and here are the delta interrupts in that time CPU0 0: 0 IO-APIC-edge timer 1: 0 IO-APIC-edge i8042 4: 0 IO-APIC-edge 6: 0 IO-APIC-edge floppy 7: 0 IO-APIC-edge parport0 8: 0 IO-APIC-edge rtc0 9: 0 IO-APIC-fasteoi acpi 12: 0 IO-APIC-edge i8042 14: 0 IO-APIC-edge pata_amd 15:209 IO-APIC-edge pata_amd 16: 2790 IO-APIC-fasteoi ivtv0 18: 9 IO-APIC-fasteoi aic7xxx, cx88[0], cx88[0], cx88[0], eth0 20: 60 IO-APIC-fasteoi ohci_hcd:usb2, NVidia CK8 21:114 IO-APIC-fasteoi ehci_hcd:usb1, ohci_hcd:usb3 22: 21 IO-APIC-fasteoi sata_nv NMI: 0 Non-maskable interrupts LOC:246 Local timer interrupts RES: 0 Rescheduling interrupts CAL: 0 function call interrupts TLB: 0 TLB shootdowns TRM: 0 Thermal event interrupts SPU: 0 Spurious interrupts ERR: 0 MIS: 0 Could it be my ivtv0 (PVR-350) board? Its not supposed to be doing anything at the moment! There's nothing plugged into it, and its not configured under MythTV right now (cable went all digital) I'll try removing the driver module and see if that helps. At worst, I'll remove the board entirely. 
Looking at the original 'top' output, all the CPU was going to nice processing, presumably SETI. When you kill that, you note the load average is still high; could we see the top few lines again to see the distribution? I note that hi/si are low, and the load average indicates runnable processes (my first guess was that SETI went threaded). So 'top' with the 'i' visual option (only show runnable tasks) should show what's running.

--
Bill Davidsen david...@tmr.com
We have more to fear from the bungling of the incompetent than from the machinations of the wicked. - from Slashdot
Re: Why is my load ave so high now?
On 07/27/2009 12:04 PM, Bill Davidsen wrote:
> Kevin J. Cummings wrote:
>> Could it be my ivtv0 (PVR-350) board? It's not supposed to be doing anything at the moment! There's nothing plugged into it, and it's not configured under MythTV right now (cable went all digital). I'll try removing the driver module and see if that helps. At worst, I'll remove the board entirely.

I ended up running rmmod on ivtv and ivtvfb, and it didn't help. Yes, the number of ints dropped noticeably, but the load average remains 10+.

> Looking at the original 'top' output, all the CPU was going to nice processing, presumably SETI. When you kill that, you note the load average is still high; could we see the top few lines again to see the distribution? I note that hi/si are low, and the load average indicates runnable processes (my first guess was that SETI went threaded). So 'top' with the 'i' visual option (only show runnable tasks) should show what's running.

(I learn something new every day!) Sure, here it is for top -i:

    top - 12:58:12 up 6 days,  9:02,  4 users,  load average: 11.30, 11.15, 11.10
    Cpu(s):  0.7%us,  0.7%sy,  0.0%ni, 98.7%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
    Mem:   2074172k total,  1961024k used,   113148k free,   199680k buffers
    Swap:  3911816k total,      412k used,  3911404k free,   935716k cached

      PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
    14743 root      20   0  2560 1152  836 R  0.7  0.1   0:00.08 top
     2506 root      20   0 15068  860  592 R  0.0  0.0   0:32.72 apcupsd
     2547 root      15  -5     0    0    0 D  0.0  0.0   0:00.00 nfsd
     2548 root      15  -5     0    0    0 D  0.0  0.0   0:00.00 nfsd
     2549 root      15  -5     0    0    0 D  0.0  0.0   0:00.00 nfsd
     2550 root      15  -5     0    0    0 D  0.0  0.0   0:00.00 nfsd
     2551 root      15  -5     0    0    0 D  0.0  0.0   0:00.00 nfsd
     2552 root      15  -5     0    0    0 D  0.0  0.0   0:00.00 nfsd
     2553 root      15  -5     0    0    0 D  0.0  0.0   0:00.00 nfsd
     2554 root      15  -5     0    0    0 D  0.0  0.0   0:00.00 nfsd
    17427 root      20   0 77040  72m  752 D  0.0  3.6   0:02.07 clamscan
    24904 root      20   0  2492  964  704 D  0.0  0.0   0:01.87 find
    28703 root      39  19  1900  652  540 D  0.0  0.0   0:00.00 updatedb

That's the entire top output.

--
Kevin J. Cummings
kjch...@rcn.com
cummi...@kjchome.homeip.net
cummi...@kjc386.framingham.ma.us
Registered Linux User #1232 (http://counter.li.org)
Re: Why is my load ave so high now?
Kevin J. Cummings wrote: On 07/27/2009 12:04 PM, Bill Davidsen wrote: Kevin J. Cummings wrote: Could it be my ivtv0 (PVR-350) board? Its not supposed to be doing anything at the moment! There's nothing plugged into it, and its not configured under MythTV right now (cable went all digital) I'll try removing the driver module and see if that helps. At worst, I'll remove the board entirely. I ended up rmmod ivtvt and ivtvfb, and it didn't help. Yes, the number of ints propped noticeably, but the load average remains 10+ Looking at the original 'top' output, all the CPU was going to nice processing, presumable SETI. When you kill that you note the load average is still high, could we see the top few lines again to see the distribution? I note that hi/si are low, and load average indicates runable process (my first guess was the seti went threaded). So 'top' with the 'i' visual option (only show runnable tasks) should show what's running. (I learn something new everyday!) Sure, here it is for top -i: top - 12:58:12 up 6 days, 9:02, 4 users, load average: 11.30, 11.15, 11.10 Cpu(s): 0.7%us, 0.7%sy, 0.0%ni, 98.7%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Mem: 2074172k total, 1961024k used, 113148k free, 199680k buffers Swap: 3911816k total, 412k used, 3911404k free, 935716k cached PID USER PR NI VIRT RES SHR S %CPU %MEMTIME+ COMMAND 14743 root 20 0 2560 1152 836 R 0.7 0.1 0:00.08 top 2506 root 20 0 15068 860 592 R 0.0 0.0 0:32.72 apcupsd 2547 root 15 -5 000 D 0.0 0.0 0:00.00 nfsd 2548 root 15 -5 000 D 0.0 0.0 0:00.00 nfsd 2549 root 15 -5 000 D 0.0 0.0 0:00.00 nfsd 2550 root 15 -5 000 D 0.0 0.0 0:00.00 nfsd 2551 root 15 -5 000 D 0.0 0.0 0:00.00 nfsd 2552 root 15 -5 000 D 0.0 0.0 0:00.00 nfsd 2553 root 15 -5 000 D 0.0 0.0 0:00.00 nfsd 2554 root 15 -5 000 D 0.0 0.0 0:00.00 nfsd 17427 root 20 0 77040 72m 752 D 0.0 3.6 0:02.07 clamscan 24904 root 20 0 2492 964 704 D 0.0 0.0 0:01.87 find 28703 root 39 19 1900 652 540 D 0.0 0.0 0:00.00 updatedb That's the entire top output You see a 
bunch of NFS-related things in a D state and you wonder why it's slow? If you have processes in an I/O wait (a.k.a. D) state, that'll bog stuff down badly...especially if the NFS mounts are mounted hard.

--
Rick Stevens, Systems Engineer ri...@nerd.com
AIM/Skype: therps2  ICQ: 22643734  Yahoo: origrps2
Squawk! Pieces of Seven! Pieces of Seven! Parity Error!
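
Rick's point can be checked without top: on Linux, the load average counts tasks in uninterruptible sleep (state D) as well as runnable ones, which is why eight hung nfsd threads can add about eight to the figure while the CPU sits idle. What follows is only a rough sketch of listing D-state processes by reading /proc/<pid>/stat. The function names are made up for illustration, it assumes a Linux-style /proc, and it looks only at each process's main thread.

```python
import os


def parse_stat_line(line):
    """Return (pid, comm, state) from the text of one /proc/<pid>/stat file."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')', so split on the first '(' and the LAST ')'.
    lparen = line.index("(")
    rparen = line.rindex(")")
    pid = int(line[:lparen])
    comm = line[lparen + 1:rparen]
    state = line[rparen + 1:].split()[0]
    return pid, comm, state


def uninterruptible_tasks(proc_root="/proc"):
    """List (pid, comm) for every process currently in state D."""
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        try:
            with open(os.path.join(proc_root, entry, "stat")) as fh:
                pid, comm, state = parse_stat_line(fh.read())
        except OSError:
            continue  # the process exited between listdir() and open()
        if state == "D":
            found.append((pid, comm))
    return sorted(found)


if __name__ == "__main__":
    for pid, comm in uninterruptible_tasks():
        print(pid, comm)
```

On a box in the state shown in the top -i output above, this would be expected to list the nfsd threads along with clamscan, find and updatedb.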
Re: Why is my load ave so high now? [Now I know why!]
On 07/27/2009 02:26 PM, Rick Stevens wrote:
> You see a bunch of NFS-related things in a D state and you wonder why it's slow?

Yes. Mostly because the machine accessing the NFS mounts has been re-booted a couple of times.

> If you have processes in an I/O wait (a.k.a. D) state, that'll bog stuff down badly...especially if the NFS mounts are mounted hard.

Well, tonight I rebooted the server with NFS turned off. When it booted, I saw a load average between 1 and 2. That's all. When it re-booted, ivtv started back up, despite my blacklisting it and removing it from modprobe.conf. However, ivtvfb did not get installed. I also noticed that BOINC started right up again. With astropulse grabbing all the idle CPU time, my load average was still between 1 and 2.

So, I decided that NFS was my problem, but I'm still not sure why. So, I tried a couple of things.

My laptop references a few directories on my server via NFS and autofs. So, I started nfs again on the server (service nfs start). Load average remains between 1 and 2. So far so good.

From the laptop, I did a cd /net/kjc386. I can then do an ls and see all of the exported filesystems. Continues to look good. ls home lists the directories in the server's exported /home dir. nfs does the work, and disappears from the top -i that I have running. Great.

Next I do an ls c: to look at the old WINDOWS partition on my server. HANG! I can't interrupt the ls with ^C or ^Z. I have to kill it from another process. When I do, the hung nfs processes on the server stay hung. Once all 8 allowed nfs processes are hung, nothing more over NFS works to the server, and the load average climbs by roughly 1 per nfs process (I watched the load average increase with each new nfs process that appeared).

So, I guess my question is: what's broken with NFS between my F11 laptop and the F10 server?

--
Kevin J. Cummings
kjch...@rcn.com
cummi...@kjchome.homeip.net
cummi...@kjc386.framingham.ma.us
Registered Linux User #1232 (http://counter.li.org)
Re: Why is my load ave so high now?
On 07/25/2009 09:58 AM, Aaron Konstam wrote:
> Well, two things, one positive and one negative. The r column tells us there are not many processes waiting for run time, which we normally associate with a low load average. However, your number of interrupts per second (in) is rather high. Some kernel action seems to be really beating your machine over the head, so to speak. How you find out which processes are doing the interrupting is not clear to me, however. Is your primary process doing a lot of I/O?

No, astropulse should be CPU bound, not I/O bound. It reads in some data, performs *lots* of calculations on it (hours' worth), then writes the results out to a file which it sends back to SETI, and downloads another work unit.

I'm very much interested in how I can figure out where the interrupts are coming from, so I ran cat /proc/interrupts twice, 10 seconds apart, and here are the delta interrupts in that time:

               CPU0
      0:          0   IO-APIC-edge      timer
      1:          0   IO-APIC-edge      i8042
      4:          0   IO-APIC-edge
      6:          0   IO-APIC-edge      floppy
      7:          0   IO-APIC-edge      parport0
      8:          0   IO-APIC-edge      rtc0
      9:          0   IO-APIC-fasteoi   acpi
     12:          0   IO-APIC-edge      i8042
     14:          0   IO-APIC-edge      pata_amd
     15:        209   IO-APIC-edge      pata_amd
     16:       2790   IO-APIC-fasteoi   ivtv0
     18:          9   IO-APIC-fasteoi   aic7xxx, cx88[0], cx88[0], cx88[0], eth0
     20:         60   IO-APIC-fasteoi   ohci_hcd:usb2, NVidia CK8
     21:        114   IO-APIC-fasteoi   ehci_hcd:usb1, ohci_hcd:usb3
     22:         21   IO-APIC-fasteoi   sata_nv
    NMI:          0   Non-maskable interrupts
    LOC:        246   Local timer interrupts
    RES:          0   Rescheduling interrupts
    CAL:          0   function call interrupts
    TLB:          0   TLB shootdowns
    TRM:          0   Thermal event interrupts
    SPU:          0   Spurious interrupts
    ERR:          0
    MIS:          0

Could it be my ivtv0 (PVR-350) board? It's not supposed to be doing anything at the moment! There's nothing plugged into it, and it's not configured under MythTV right now (cable went all digital). I'll try removing the driver module and see if that helps. At worst, I'll remove the board entirely.

Thanks, Aaron.

--
Kevin J. Cummings
kjch...@rcn.com
cummi...@kjchome.homeip.net
cummi...@kjc386.framingham.ma.us
Registered Linux User #1232 (http://counter.li.org)
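
The delta Kevin worked out by hand from two cat /proc/interrupts runs could be scripted along these lines. This is a sketch only: the helper names are invented for illustration, the parser assumes the usual layout (a header row of CPU names, then one "label: counts description" row per interrupt source), and the 10-second interval just mirrors the message above.

```python
import time


def parse_interrupts(text):
    """Map each IRQ label to (count summed over all CPUs, description)."""
    lines = text.strip().splitlines()
    ncpus = len(lines[0].split())  # header row: CPU0 CPU1 ...
    table = {}
    for line in lines[1:]:
        label, _, rest = line.partition(":")
        fields = rest.split()
        counts = []
        for field in fields[:ncpus]:
            if not field.isdigit():
                break  # rows such as ERR/MIS carry fewer count columns
            counts.append(int(field))
        desc = " ".join(fields[len(counts):])
        table[label.strip()] = (sum(counts), desc)
    return table


def interrupt_deltas(before, after):
    """Return (delta, label, description) tuples, busiest source first."""
    old = parse_interrupts(before)
    new = parse_interrupts(after)
    deltas = []
    for label, (count, desc) in new.items():
        diff = count - old.get(label, (0, ""))[0]
        if diff:
            deltas.append((diff, label, desc))
    return sorted(deltas, reverse=True)


def read_interrupts(path="/proc/interrupts"):
    with open(path) as fh:
        return fh.read()


if __name__ == "__main__":
    first = read_interrupts()
    time.sleep(10)  # same sampling interval as in the message above
    second = read_interrupts()
    for diff, label, desc in interrupt_deltas(first, second):
        print(f"{label:>4}: {diff:>8}  {desc}")
```

Sorting by delta puts the noisiest source at the top, which in the figures above would be IRQ 16 (ivtv0).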
Re: Why is my load ave so high now?
On Thu, 2009-07-23 at 22:51 -0400, Kevin J. Cummings wrote: When I was running F8, my server averaged a load ave oof around 4. Now that I'm running F10, and bittorrent is no longer running, in fact, not much of anything besides s...@home (BOINC client running astro_pulse), my load average is up around 11 and frequently exceeds 12 (and of course when it exceeds 12, it stops receiving emails). Here's a 5 second snapshot from top: top - 22:48:06 up 2 days, 18:51, 5 users, load average: 11.15, 11.33, 11.63 Tasks: 250 total, 2 running, 247 sleeping, 0 stopped, 1 zombie Cpu(s): 5.9%us, 1.8%sy, 92.0%ni, 0.0%id, 0.0%wa, 0.3%hi, 0.0%si, 0.0%st Mem: 2074172k total, 1932680k used, 141492k free, 108872k buffers Swap: 3911816k total, 552k used, 3911264k free, 977568k cached PID USER PR NI VIRT RES SHR S %CPU %MEMTIME+ COMMAND 3234 root 39 19 50292 46m 2072 R 91.7 2.3 3520:10 astropulse_5.06 What is astropulse that is using 91.7% of your cpu? You also seem to have a large number of tasks running and sleeping which might up your load average. 3378 root 20 0 324m 36m 8496 S 3.6 1.8 14:34.20 Xorg 16271 cummings 20 0 172m 60m 21m S 1.1 3.0 0:37.77 thunderbird-bin 4026 cummings 20 0 22412 11m 7612 S 0.5 0.6 0:09.28 metacity 4108 cummings 20 0 27276 11m 8748 S 0.4 0.6 18:41.47 multiload-apple 4126 cummings 20 0 74844 18m 10m S 0.4 0.9 0:07.54 gnome-terminal 4027 cummings 20 0 63536 20m 10m S 0.3 1.0 0:27.59 gnome-panel 4030 cummings 20 0 34104 13m 5500 S 0.3 0.7 0:24.95 gnome-screensav 16252 root 20 0 2560 1184 844 S 0.3 0.1 0:00.97 top 4086 cummings 20 0 57960 14m 9716 S 0.2 0.7 0:06.57 wnck-applet 4093 cummings 20 0 34464 14m 10m S 0.2 0.7 5:31.08 clock-applet 196 root 15 -5 000 S 0.1 0.0 1:08.89 ata/0 971 root 15 -5 000 S 0.1 0.0 3:38.03 scsi_eh_5 2838 root 20 0 3332 520 364 S 0.1 0.0 0:05.12 lircd 3138 root 20 0 3624 1032 912 S 0.1 0.0 0:14.66 hald-addon-stor 3175 root 20 0
Re: Why is my load ave so high now?
On 07/24/2009 10:15 AM, Aaron Konstam wrote: On Thu, 2009-07-23 at 22:51 -0400, Kevin J. Cummings wrote: When I was running F8, my server averaged a load ave oof around 4. Now that I'm running F10, and bittorrent is no longer running, in fact, not much of anything besides s...@home (BOINC client running astro_pulse), my load average is up around 11 and frequently exceeds 12 (and of course when it exceeds 12, it stops receiving emails). Here's a 5 second snapshot from top: top - 22:48:06 up 2 days, 18:51, 5 users, load average: 11.15, 11.33, 11.63 Tasks: 250 total, 2 running, 247 sleeping, 0 stopped, 1 zombie Cpu(s): 5.9%us, 1.8%sy, 92.0%ni, 0.0%id, 0.0%wa, 0.3%hi, 0.0%si, 0.0%st Mem: 2074172k total, 1932680k used, 141492k free, 108872k buffers Swap: 3911816k total, 552k used, 3911264k free, 977568k cached PID USER PR NI VIRT RES SHR S %CPU %MEMTIME+ COMMAND 3234 root 39 19 50292 46m 2072 R 91.7 2.3 3520:10 astropulse_5.06 What is astropulse that is using 91.7% of your cpu? You also seem to have a large number of tasks running and sleeping which might up your load average. astropulse is the s...@home BOINC client that I run (NICEd to 19). It only uses excess cycles and in the past my load average has never exceed the 3-5 range, except when I was doing real work on the system (like running firefox, thunderbird, and other real programs), something I almost never do anymore since I bought myself a laptop. Yeup, but not consuming any real CPU resources. I guess what I'm asking is if only 1 job is grabbing most of the CPU, then what's causing the system to thrash? (Is a load average of 12 considered a thrashing system? sendmail thinks it is.) 
     3378 root      20   0  324m  36m 8496 S  3.6  1.8  14:34.20 Xorg
    16271 cummings  20   0  172m  60m  21m S  1.1  3.0   0:37.77 thunderbird-bin
     4026 cummings  20   0 22412  11m 7612 S  0.5  0.6   0:09.28 metacity
     4108 cummings  20   0 27276  11m 8748 S  0.4  0.6  18:41.47 multiload-apple
     4126 cummings  20   0 74844  18m  10m S  0.4  0.9   0:07.54 gnome-terminal
     4027 cummings  20   0 63536  20m  10m S  0.3  1.0   0:27.59 gnome-panel
     4030 cummings  20   0 34104  13m 5500 S  0.3  0.7   0:24.95 gnome-screensav
    16252 root      20   0  2560 1184  844 S  0.3  0.1   0:00.97 top
     4086 cummings  20   0 57960  14m 9716 S  0.2  0.7   0:06.57 wnck-applet
     4093 cummings  20   0 34464  14m  10m S  0.2  0.7   5:31.08 clock-applet
      196 root      15  -5     0    0    0 S  0.1  0.0   1:08.89 ata/0
      971 root      15  -5     0    0    0 S  0.1  0.0   3:38.03 scsi_eh_5
     2838 root      20   0  3332  520  364 S  0.1  0.0   0:05.12 lircd
     3138 root      20   0  3624 1032  912 S  0.1  0.0   0:14.66 hald-addon-stor
     3175 root      20   0  3624 1032  912 S  0.1  0.0   0:37.94 hald-addon-stor
     3194 mailman   20   0 13620 7000 2820 S  0.1  0.3   0:53.14 python
     3879 cummings  20   0 58964  30m 8092 S  0.1  1.5  21:56.13 gnome-settings-
     4478 cummings  20   0 10044 4432 2376 S  0.1  0.2   0:01.48 xterm

--
Kevin J. Cummings
kjch...@rcn.com
cummi...@kjchome.homeip.net
cummi...@kjc386.framingham.ma.us
Registered Linux User #1232 (http://counter.li.org)
Re: Why is my load ave so high now?
On Fri, Jul 24, 2009 at 11:08 AM, Kevin J. Cummings <cummi...@kjchome.homeip.net> wrote:
> astropulse is the s...@home BOINC client that I run (NICEd to 19). It only uses excess cycles and in the past my load average has never exceeded the 3-5 range, except when I was doing real work on the system (like running firefox, thunderbird, and other real programs), something I almost never do anymore since I bought myself a laptop.

kill it off, wait a few minutes and see if the load average comes down

> I guess what I'm asking is if only 1 job is grabbing most of the CPU, then what's causing the system to thrash? (Is a load average of 12 considered a thrashing system? sendmail thinks it is.)

12 is high. is the system responsive? if it is, then this again points to something that has been nice'd (such as seti), in which case it's not a problem, except for sendmail, which I would then configure for higher limits.

if it's not seti and the system is not very responsive, then it could be something continuously spawning short-lived processes. these can be hard to spot. run the following and see if the process IDs differ by much:

    bash -c 'echo $$'
    sleep 5
    bash -c 'echo $$'

if they differ by much, then something is creating processes too quickly. you can usually spot these by running pstree -plan a couple of times and seeing what the differences are. you may have to do that a couple of times to spot what is causing the problems.
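
For repeated use, the PID-gap check above might be wrapped up roughly as follows. This is a hedged sketch rather than anything from the thread: the function names are invented, it assumes the kernel hands out PIDs roughly sequentially, and the wraparound handling is approximate because Linux does not restart PID numbering at exactly zero. The fallback of 32768 is the traditional Linux default for pid_max and is only used where /proc/sys/kernel/pid_max cannot be read.

```python
import subprocess
import sys
import time


def read_pid_max(path="/proc/sys/kernel/pid_max", default=32768):
    """Return the kernel's PID ceiling, or a fallback where it is unreadable."""
    try:
        with open(path) as fh:
            return int(fh.read())
    except (OSError, ValueError):
        return default


def pid_gap(first, second, pid_max=32768):
    """Approximate number of PIDs handed out between two samples."""
    # The modulo keeps the result sensible if numbering wrapped in between.
    return (second - first) % pid_max


def sample_pid():
    """Start a throwaway child process and report the PID it was given."""
    child = subprocess.Popen([sys.executable, "-c", "pass"])
    child.wait()
    return child.pid


if __name__ == "__main__":
    first = sample_pid()
    time.sleep(5)  # nothing is spawned here, unlike the shell 'sleep 5'
    second = sample_pid()
    # On a quiet system the gap is about 1: just the second sample itself.
    print("PIDs allocated in between:", pid_gap(first, second, read_pid_max()))
```

A gap in the hundreds over five seconds would point at something forking rapidly, which is the case the pstree comparison is meant to track down.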
Re: Why is my load ave so high now?
On 07/24/2009 02:04 PM, Andrew Parker wrote:
> 12 is high. is the system responsive? if it is, then this again points to something that has been nice'd (such as seti), in which case it's not a problem, except for sendmail, which I would then configure for higher limits.

Mostly responsive. In the past, I've loaded it heavily enough that it swapped mightily, and waiting for things to swap back in could take a while. This is definitely not the case right now.

> if it's not seti and the system is not very responsive, then it could be something continuously spawning short-lived processes. these can be hard to spot. run the following and see if the process IDs differ by much:
>
>     bash -c 'echo $$'
>     sleep 5
>     bash -c 'echo $$'

The PIDs differ by 2-3, but if I run it again, it usually starts where the last one left off.

> if they differ by much, then something is creating processes too quickly. you can usually spot these by running pstree -plan a couple of times and seeing what the differences are. you may have to do that a couple of times to spot what is causing the problems.

Diffing the pstree output shows only pstree itself changing.

--
Kevin J. Cummings
kjch...@rcn.com
cummi...@kjchome.homeip.net
cummi...@kjc386.framingham.ma.us
Registered Linux User #1232 (http://counter.li.org)
Re: Why is my load ave so high now?
On Fri, 2009-07-24 at 11:08 -0400, Kevin J. Cummings wrote: On 07/24/2009 10:15 AM, Aaron Konstam wrote: On Thu, 2009-07-23 at 22:51 -0400, Kevin J. Cummings wrote: When I was running F8, my server averaged a load ave oof around 4. Now that I'm running F10, and bittorrent is no longer running, in fact, not much of anything besides s...@home (BOINC client running astro_pulse), my load average is up around 11 and frequently exceeds 12 (and of course when it exceeds 12, it stops receiving emails). Here's a 5 second snapshot from top: top - 22:48:06 up 2 days, 18:51, 5 users, load average: 11.15, 11.33, 11.63 Tasks: 250 total, 2 running, 247 sleeping, 0 stopped, 1 zombie Cpu(s): 5.9%us, 1.8%sy, 92.0%ni, 0.0%id, 0.0%wa, 0.3%hi, 0.0%si, 0.0%st Mem: 2074172k total, 1932680k used, 141492k free, 108872k buffers Swap: 3911816k total, 552k used, 3911264k free, 977568k cached PID USER PR NI VIRT RES SHR S %CPU %MEMTIME+ COMMAND 3234 root 39 19 50292 46m 2072 R 91.7 2.3 3520:10 astropulse_5.06 What is astropulse that is using 91.7% of your cpu? You also seem to have a large number of tasks running and sleeping which might up your load average. astropulse is the s...@home BOINC client that I run (NICEd to 19). It only uses excess cycles and in the past my load average has never exceed the 3-5 range, except when I was doing real work on the system (like running firefox, thunderbird, and other real programs), something I almost never do anymore since I bought myself a laptop. Yeup, but not consuming any real CPU resources. I guess what I'm asking is if only 1 job is grabbing most of the CPU, then what's causing the system to thrash? (Is a load average of 12 considered a thrashing system? sendmail thinks it is.) Two suggestions: 1. run vmstat 2 30 to see how many context switches are occurring and the wait time for processes etc. A load time of 11 means there are a large number of processes waiting for cpu time. I think it is inaccurate to say no cpu resources are being used. 2. 
If you can, get hold of the article on the Real Time Scheduler in the August 2009 issue of Linux Journal. To me, astropulse is running away with your CPU time and nothing else can get adequate CPU time.

--
About the only thing we have left that actually discriminates in favor of the plain people is the stork.
Aaron Konstam telephone: (210) 656-0355 e-mail: akons...@sbcglobal.net
Re: Why is my load ave so high now?
Kevin J. Cummings wrote: When I was running F8, my server averaged a load ave oof around 4. Now that I'm running F10, and bittorrent is no longer running, in fact, not much of anything besides s...@home (BOINC client running astro_pulse), my load average is up around 11 and frequently exceeds 12 (and of course when it exceeds 12, it stops receiving emails). Here's a 5 second snapshot from top: top - 22:48:06 up 2 days, 18:51, 5 users, load average: 11.15, 11.33, 11.63 Tasks: 250 total, 2 running, 247 sleeping, 0 stopped, 1 zombie Cpu(s): 5.9%us, 1.8%sy, 92.0%ni, 0.0%id, 0.0%wa, 0.3%hi, 0.0%si, 0.0%st Mem: 2074172k total, 1932680k used, 141492k free, 108872k buffers Swap: 3911816k total, 552k used, 3911264k free, 977568k cached PID USER PR NI VIRT RES SHR S %CPU %MEMTIME+ COMMAND 3234 root 39 19 50292 46m 2072 R 91.7 2.3 3520:10 astropulse_5.06 [___snip___] I'm open to any WAGs right now as to the cause. Its probably something I haven't yet fixed from my preupgrade(from F8) You think 91% of your CPU going to astropulse has something to do with it? Try turning viewing of threads, I don't see how you would get that load unless you had multiple threads running. And you can attach a .txt file, so the lines don't get wrapped. In any case, it's low priority and nice so if you have anything useful running it should get the CPU. -- Bill Davidsen david...@tmr.com We have more to fear from the bungling of the incompetent than from the machinations of the wicked. - from Slashdot -- fedora-list mailing list fedora-list@redhat.com To unsubscribe: https://www.redhat.com/mailman/listinfo/fedora-list Guidelines: http://fedoraproject.org/wiki/Communicate/MailingListGuidelines
Re: Why is my load ave so high now?
On 07/24/2009 04:41 PM, Aaron Konstam wrote:
> Two suggestions:
> 1. run vmstat 2 30 to see how many context switches are occurring and the wait time for processes etc. A load time of 11 means there are a large number of processes waiting for cpu time. I think it is inaccurate to say no cpu resources are being used.

Output here:

    # vmstat 2 30
    procs ----------memory----------- ---swap-- -----io---- --system- ------cpu------
     r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs  us sy id wa st
     3  0    444 173168  78832 911216    0    0    32   103  122   55  96  4  0  0  0
     1  0    444 173168  78840 911216    0    0     0   326 1427  446  98  3  0  0  0
     1  0    444 173168  78840 911216    0    0     0     0 1370  219  99  2  0  0  0
     1  0    444 173044  78860 911216    0    0     0   162 1362  242  98  2  0  0  0
     1  0    444 173044  78860 911216    0    0     0     0 1374  449  98  2  0  0  0
     1  0    444 173044  78860 911216    0    0     0     0 1372  214  99  1  0  0  0
     1  0    444 173044  78876 911216    0    0     0   156 1377  448  98  2  0  0  0
     1  0    444 173044  78876 911216    0    0     0     0 1367  234  99  2  0  0  0
     1  0    444 175648      7 911216    0    0     0    38 1383  239  98  2  0  0  0
     1  0    444 175648      7 911216    0    0     0     0 1366  424  98  2  0  0  0
     1  0    444 175648      7 911216    0    0     0     0 1373  222  99  1  0  0  0
     1  0    444 175648  78904 911216    0    0     0    80 1383  346  98  2  0  0  0
     1  0    444 175648  78904 911216    0    0     0     0 1366  212 100  1  0  0  0
     1  0    444 174656  78912 911208    0    0     0    30 1371  228  98  2  0  0  0
     1  0    444 174656  78912 911216    0    0     0     0 1371  450  98  3  0  0  0
     1  0    444 174656  78912 911216    0    0     0     0 1367  210 100  1  0  0  0
     1  0    444 174656  78924 911216    0    0     0    34 1374  451  98  2  0  0  0
     1  0    444 174656  78924 911216    0    0     0     0 1365  221  99  1  0  0  0
     2  0    444 173184  78932 911216    0    0     0   204 1427  556  97  4  0  0  0
     1  0    444 172548  78932 911224    0    0     0     0 1386  635  96  4  0  0  0
     1  0    444 173168  78940 911224    0    0     0   150 1356  221  97  2  0  1  0
     1  0    444 173168  78964 911224    0    0     0    34 1370  314  99  2  0  0  0
     1  0    444 173168  78976 911224    0    0     0  1324 1375  228  99  1  0  0  0
     1  0    444 172400  78984 911216    0    0     0    14 1377  246  98  2  0  0  0
     1  0    444 172424  78984 911224    0    0     0     0 1364  433  99  1  0  0  0
     1  0    444 172424  78984 911224    0    0     0     0 1371  220  99  1  0  0  0
     1  0    444 172424  78992 911224    0    0     0    30 1375  461  98  3  0  0  0
     1  0    444 172424  78992 911224    0    0     0     0 1365  207 100  1  0  0  0
     1  0    444 172512  79000 911216    0    0     0    34 1377  237  98  2  0  0  0
     1  0    444 172548  79000 911224    0    0     0     0 1376  451  97  3  0  0  0

Do you see anything in the above output? Load ave was 11.45 when I started.

> 2. If you can get a hold of the article on the Real Time Scheduler found in the August 2009 issue of Linux Journal. To me astropulse is running away with your CPU time and nothing else can get adequate cpu time.

But it's only July 2009! B^) I'm no longer an LJ subscriber, so I'll have to either find/buy the issue, or wait a while for it to become available on their WWW site.

Thanks Aaron!

--
Kevin J. Cummings
kjch...@rcn.com
cummi...@kjchome.homeip.net
cummi...@kjc386.framingham.ma.us
Registered Linux User #1232 (http://counter.li.org)
Re: Why is my load ave so high now?
On 07/24/2009 04:52 PM, Bill Davidsen wrote:

You think 91% of your CPU going to astropulse has something to do with it? Try turning on viewing of threads, I don't see how you would get that

I'll say it again. BOINC is niced to 19. It only runs when there is nothing else to run. It is not even consuming all memory (no, or very little, swap is in use), so other processes still remain resident. Under F8 the same mix of programs (actually more programs running concurrently) had a load average lower by a factor of 3! The answer must be something else.

However, something Aaron said got me thinking. The kernel may be using a different process scheduler in F10 than it was in F8. That is definitely something for me to look into.

load unless you had multiple threads running. And you can attach a .txt file, so the lines don't get wrapped. In any case, it's low priority and nice so if you have anything useful running it should get the CPU.

I agree. In fact, the 2 ssh connections I have to it from my laptop remain very responsive. It's just the large number that bothers me.

--
Kevin J. Cummings
kjch...@rcn.com
cummi...@kjchome.homeip.net
cummi...@kjc386.framingham.ma.us
Registered Linux User #1232 (http://counter.li.org)
Re: Why is my load ave so high now?
On Fri, Jul 24, 2009 at 2:04 PM, Andrew Parker gbofs...@gmail.com wrote:

On Fri, Jul 24, 2009 at 11:08 AM, Kevin J. Cummings cummi...@kjchome.homeip.net wrote:

astropulse is the s...@home BOINC client that I run (NICEd to 19). It only uses excess cycles and in the past my load average has never exceeded the 3-5 range, except when I was doing real work on the system (like running firefox, thunderbird, and other real programs), something I almost never do anymore since I bought myself a laptop.

kill it off, wait a few minutes and see if the load average comes down

did you try this yet?
Re: Why is my load ave so high now?
On 07/24/2009 02:04 PM, Andrew Parker wrote:

kill it off, wait a few minutes and see if the load average comes down

Killed off boinc and astropulse with kill -9. After waiting for 10 minutes (or more), the load average dropped to 10.35.

--
Kevin J. Cummings
kjch...@rcn.com
cummi...@kjchome.homeip.net
cummi...@kjc386.framingham.ma.us
Registered Linux User #1232 (http://counter.li.org)
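To put the "wait a few minutes" test in perspective: Linux recomputes each load figure roughly every five seconds as an exponentially damped average over a 1, 5 or 15 minute window. The sketch below is a floating-point approximation of that update (the kernel itself uses fixed-point arithmetic); the function name damped_load is made up for illustration, and the assumption that the number of contributing tasks stays constant after the kill is mine, not something the thread establishes. The starting value 11.45 is the figure quoted elsewhere in the thread.

```python
import math


def damped_load(start, active, seconds, window=60.0, step=5.0):
    """Approximate a load-average figure after `seconds` have passed.

    start  -- load figure at the moment of the change
    active -- assumed constant count of runnable plus uninterruptible tasks
    window -- averaging window in seconds (60, 300 or 900)
    step   -- update interval in seconds
    """
    decay = math.exp(-step / window)
    load = start
    for _ in range(int(seconds // step)):
        # Exponentially damped moving average: old value fades, new sample blends in.
        load = load * decay + active * (1.0 - decay)
    return load


if __name__ == "__main__":
    # Hypothetical case: nothing at all contributes to the load after the kill.
    for window in (60.0, 300.0, 900.0):
        print(int(window), "s window:", round(damped_load(11.45, 0, 600, window), 2))
```

Under those assumptions, even the 15-minute figure would fall to roughly half its starting value within ten minutes if nothing else were contributing. A reading of 10.35 after that wait therefore suggests that astropulse accounted for only about one unit of load and that something else is holding the rest.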
Why is my load ave so high now?
When I was running F8, my server averaged a load ave of around 4. Now that I'm running F10, with bittorrent no longer running and, in fact, not much of anything besides s...@home (BOINC client running astro_pulse), my load average is up around 11 and frequently exceeds 12 (and of course when it exceeds 12, it stops receiving emails).

Here's a 5 second snapshot from top:

top - 22:48:06 up 2 days, 18:51,  5 users,  load average: 11.15, 11.33, 11.63
Tasks: 250 total,   2 running, 247 sleeping,   0 stopped,   1 zombie
Cpu(s):  5.9%us,  1.8%sy, 92.0%ni,  0.0%id,  0.0%wa,  0.3%hi,  0.0%si,  0.0%st
Mem:   2074172k total,  1932680k used,   141492k free,   108872k buffers
Swap:  3911816k total,      552k used,  3911264k free,   977568k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM     TIME+ COMMAND
 3234 root      39  19 50292  46m 2072 R 91.7  2.3   3520:10 astropulse_5.06
 3378 root      20   0  324m  36m 8496 S  3.6  1.8  14:34.20 Xorg
16271 cummings  20   0  172m  60m  21m S  1.1  3.0   0:37.77 thunderbird-bin
 4026 cummings  20   0 22412  11m 7612 S  0.5  0.6   0:09.28 metacity
 4108 cummings  20   0 27276  11m 8748 S  0.4  0.6  18:41.47 multiload-apple
 4126 cummings  20   0 74844  18m  10m S  0.4  0.9   0:07.54 gnome-terminal
 4027 cummings  20   0 63536  20m  10m S  0.3  1.0   0:27.59 gnome-panel
 4030 cummings  20   0 34104  13m 5500 S  0.3  0.7   0:24.95 gnome-screensav
16252 root      20   0  2560 1184  844 S  0.3  0.1   0:00.97 top
 4086 cummings  20   0 57960  14m 9716 S  0.2  0.7   0:06.57 wnck-applet
 4093 cummings  20   0 34464  14m  10m S  0.2  0.7   5:31.08 clock-applet
  196 root      15  -5     0    0    0 S  0.1  0.0   1:08.89 ata/0
  971 root      15  -5     0    0    0 S  0.1  0.0   3:38.03 scsi_eh_5
 2838 root      20   0  3332  520  364 S  0.1  0.0   0:05.12 lircd
 3138 root      20   0  3624 1032  912 S  0.1  0.0   0:14.66 hald-addon-stor
 3175 root      20   0  3624 1032  912 S  0.1  0.0   0:37.94 hald-addon-stor
 3194 mailman   20   0 13620 7000 2820 S  0.1  0.3   0:53.14 python
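Since the Linux load average counts tasks in uninterruptible sleep (state D) as well as runnable ones (state R), a listing of just those two states can be more telling than a snapshot sorted by CPU use. Here is a minimal Python sketch that filters ps output for them. The function names load_contributors and live_ps_output are made up for illustration, and the sample listing is hypothetical except for the astropulse row; in particular the nfsd entries and their PIDs are invented to show what a D-state task looks like.

```python
import subprocess


def load_contributors(ps_output):
    """Return (state, pid, command) for tasks whose state starts with R or D.

    Expects the text produced by `ps -eo stat,pid,comm`, header row included.
    """
    found = []
    for line in ps_output.splitlines()[1:]:  # skip the header row
        parts = line.split(None, 2)
        if len(parts) == 3 and parts[0][0] in "RD":
            found.append((parts[0], int(parts[1]), parts[2]))
    return found


def live_ps_output():
    """Fetch the listing from the running system (needs procps ps)."""
    result = subprocess.run(["ps", "-eo", "stat,pid,comm"],
                            capture_output=True, text=True, check=True)
    return result.stdout


# Hypothetical listing: only the astropulse row is taken from the thread.
SAMPLE = """\
STAT   PID COMMAND
RN    3234 astropulse_5.06
Ss    3378 Xorg
D     2901 nfsd
D     2902 nfsd
"""

for state, pid, command in load_contributors(SAMPLE):
    print(state, pid, command)
```

On a live machine, passing live_ps_output() instead of SAMPLE would list the tasks currently counted toward the load average.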
RE: Why is my load ave so high now?
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM     TIME+ COMMAND
 3234 root      39  19 50292  46m 2072 R 91.7  2.3   3520:10 astropulse_5.06

I'm open to any WAGs right now as to the cause.

FFS, would seti's load not have anything to do with it? :)
Re: Why is my load ave so high now?
On 07/23/2009 11:09 PM, Joseph L. Casale wrote:

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM     TIME+ COMMAND
 3234 root      39  19 50292  46m 2072 R 91.7  2.3   3520:10 astropulse_5.06

I'm open to any WAGs right now as to the cause.

FFS, would seti's load not have anything to do with it? :)

It didn't use to.

--
Kevin J. Cummings
kjch...@rcn.com
cummi...@kjchome.homeip.net
cummi...@kjc386.framingham.ma.us
Registered Linux User #1232 (http://counter.li.org)