Re: Weird behaviour on System under high load
On 5/28/23 03:09, Christian wrote:
> > Original Message
> > From: David Christensen
> > To: debian-user@lists.debian.org
> > Subject: Re: Weird behaviour on System under high load
> > Date: Sat, 27 May 2023 16:30:05 -0700
> >
> > On 5/27/23 15:28, Christian wrote:
> > > New day, new tests. Got a crash again, however with the message
> > > "AHCI controller unavailable". Figured that was the SATA drives not
> > > being plugged in in the right order. Corrected that, and a 3:30h
> > > stress test went so far without any issues besides this old bug:
> > > https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=947685
> > >
> > > Seems that I am just jumping from one error to the next...
> >
> > 3 hours and 30 minutes? Yikes! Please stop before you fry your
> > computer. 10 seconds should be enough to see a problem; 1 minute is
> > more than enough.
>
> Sadly not always. My crashes before would occur between a few minutes
> and 1 hour of load. Now I hope everything is stable. Crashes are gone;
> only the network error seems to be unresolved (even though there is a
> workaround).

Repeatable crashes from a reported issue indicate your hardware is okay.

> With the undervolting / overclocking on the 12-core stress test, the
> system stays below 65°C (on Smbusmaster0), so there should be no risk
> of damage.

It is your computer and your decision.

At this point, I would start adding the software stack, one piece at a
time, testing between each piece. The challenge is devising or finding
tests. Spot testing by hand can reveal bugs, but that gets tiresome.
The best approach is an automated/scripted test suite. If you are using
Debian packages, you might want to look for test suites in the
corresponding source packages.

And/or, you can use building from source as a stress test. Compiling
the Linux kernel should provide your processor, memory, and storage
with a good workout.

> Thanks for the help!

YW. :-)

David
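The automated/scripted load test David recommends can be sketched in plain shell. This is an illustrative stand-in for stress(1) or his Perl script, not either actual tool; the duration and the busy-wait workers are assumptions:

```shell
# Minimal scripted CPU load generator: one busy-loop worker per core
# for a fixed duration, then clean up. Rough stand-in for
# `stress --cpu $(nproc) --timeout N`.
DURATION=2                 # seconds; use minutes for a real burn-in
NCPU=$(nproc)              # number of workers = number of cores
pids=""
for i in $(seq "$NCPU"); do
    ( while :; do :; done ) &    # pure CPU busy-wait worker
    pids="$pids $!"
done
sleep "$DURATION"
kill $pids 2>/dev/null           # stop all workers
echo "loaded $NCPU cores for ${DURATION}s"
```

Wrapping this in a loop that also tails the kernel log between runs would give the repeatable, unattended test cycle described above.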
Re: Weird behaviour on System under high load
> Original Message
> From: David Christensen
> To: debian-user@lists.debian.org
> Subject: Re: Weird behaviour on System under high load
> Date: Sat, 27 May 2023 16:30:05 -0700
>
> On 5/27/23 15:28, Christian wrote:
> > New day, new tests. Got a crash again, however with the message
> > "AHCI controller unavailable". Figured that was the SATA drives not
> > being plugged in in the right order. Corrected that, and a 3:30h
> > stress test went so far without any issues besides this old bug:
> > https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=947685
> >
> > Seems that I am just jumping from one error to the next...
>
> 3 hours and 30 minutes? Yikes! Please stop before you fry your
> computer. 10 seconds should be enough to see a problem; 1 minute is
> more than enough.

Sadly not always. My crashes before would occur between a few minutes
and 1 hour of load. Now I hope everything is stable. Crashes are gone;
only the network error seems to be unresolved (even though there is a
workaround).

With the undervolting / overclocking on the 12-core stress test, the
system stays below 65°C (on Smbusmaster0), so there should be no risk
of damage.

Thanks for the help!

> David
Re: Weird behaviour on System under high load
On 5/27/23 15:28, Christian wrote:
> New day, new tests. Got a crash again, however with the message "AHCI
> controller unavailable". Figured that was the SATA drives not being
> plugged in in the right order. Corrected that, and a 3:30h stress test
> went so far without any issues besides this old bug:
> https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=947685
>
> Seems that I am just jumping from one error to the next...

3 hours and 30 minutes? Yikes! Please stop before you fry your
computer. 10 seconds should be enough to see a problem; 1 minute is
more than enough.

David
Re: Weird behaviour on System under high load
> Original Message
> From: David Christensen
> To: debian-user@lists.debian.org
> Subject: Re: Weird behaviour on System under high load
> Date: Fri, 26 May 2023 18:22:17 -0700
>
> On 5/26/23 16:08, Christian wrote:
> > Good and bad things: I started to test different setups (always with
> > the full 12-core stress test), booting from a USB liveCD (only
> > stress and s-tui installed):
> >
> > - All disks disconnected other than the M.2. Standard BIOS
> > - All disks disconnected other than the M.2. Proper memory profile for timing
> > - All disks disconnected other than the M.2. Memory profile, undervolted and overclocked with burst limited to 4 GHz
> > - All disks connected. Memory profile, undervolted and overclocked with burst limited to 4 GHz
> >
> > All settings so far are stable. :-/
> > Will see tomorrow any differences with non-free firmware and kernel
> > modules and test again.
> >
> > Very strange...
>
> If everything is stable, including undervolting and overclocking, I
> would consider that good. I think your hardware is good.
>
> When you say "USB liveCD", is that a USB optical drive with a live CD,
> a USB flash drive with a bootable OS on it, or something else? If it
> is something that can change, I suggest taking an image of the raw
> blocks with dd(1) so that you can easily get back to this point as you
> continue testing.

New day, new tests. Got a crash again, however with the message "AHCI
controller unavailable". Figured that was the SATA drives not being
plugged in in the right order. Corrected that, and a 3:30h stress test
went so far without any issues besides this old bug:
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=947685

Seems that I am just jumping from one error to the next...

> AIUI Debian can include microcode patches (depending upon processor).
> If you are using such, I suggest adding that to your test agenda
> first.
>
> Firmware and kernel modules seem like the right next steps.
>
> David
Re: Weird behaviour on System under high load
On 5/26/23 16:08, Christian wrote:
> Good and bad things: I started to test different setups (always with
> the full 12-core stress test), booting from a USB liveCD (only stress
> and s-tui installed):
>
> - All disks disconnected other than the M.2. Standard BIOS
> - All disks disconnected other than the M.2. Proper memory profile for timing
> - All disks disconnected other than the M.2. Memory profile, undervolted and overclocked with burst limited to 4 GHz
> - All disks connected. Memory profile, undervolted and overclocked with burst limited to 4 GHz
>
> All settings so far are stable. :-/
> Will see tomorrow any differences with non-free firmware and kernel
> modules and test again.
>
> Very strange...

If everything is stable, including undervolting and overclocking, I
would consider that good. I think your hardware is good.

When you say "USB liveCD", is that a USB optical drive with a live CD,
a USB flash drive with a bootable OS on it, or something else? If it is
something that can change, I suggest taking an image of the raw blocks
with dd(1) so that you can easily get back to this point as you
continue testing.

AIUI Debian can include microcode patches (depending upon processor).
If you are using such, I suggest adding that to your test agenda first.

Firmware and kernel modules seem like the right next steps.

David
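The dd(1) suggestion above looks roughly like the following sketch. On real hardware the input would be the stick's device node (e.g. /dev/sdX, found with lsblk); here a small file stands in for the device so the example is safe to run anywhere:

```shell
# Image the raw blocks of a (stand-in) USB stick so a known-good test
# environment can be restored later. STICK is a file-backed stand-in
# for a real device node such as /dev/sdX.
STICK=stick.img
dd if=/dev/urandom of="$STICK" bs=1M count=4 status=none  # fake stick contents

# 1. Back up the raw blocks:
dd if="$STICK" of=stick-backup.img bs=1M status=none

# 2. Later, restore the stick to exactly this state:
dd if=stick-backup.img of="$STICK" bs=1M status=none

# 3. Verify the round trip is bit-identical:
cmp -s "$STICK" stick-backup.img && echo "images match"
```

On a real stick, run the backup with the stick unmounted, and double-check the device name before the restore step: dd will overwrite whatever it is pointed at.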
Re: Weird behaviour on System under high load
> Original Message
> From: David Christensen
> To: debian-user@lists.debian.org
> Subject: Re: Weird behaviour on System under high load
> Date: Sun, 21 May 2023 15:04:44 -0700
>
> > > > What stresstest are you using?
> >
> > ... the package and command "s-tui" and "stress"
> >
> > s-tui gives you an overview of power usage, fan control, temps, core
> > frequencies, and core utilization on the console.
> >
> > stress just produces load on a selected # of CPUs; it can be
> > integrated in s-tui.
>
> Thanks -- I like tools and will play with it:
>
> https://packages.debian.org/bullseye/s-tui
>
> > > Okay. Put my Perl script on your liveUSB. Also put some tool for
> > > monitoring CPU temperature, such as sensors(1).
> >
> > Will have time again in a few days and check.
>
> Please let us know what you find.

Good and bad things: I started to test different setups (always with
the full 12-core stress test), booting from a USB liveCD (only stress
and s-tui installed):

- All disks disconnected other than the M.2. Standard BIOS
- All disks disconnected other than the M.2. Proper memory profile for timing
- All disks disconnected other than the M.2. Memory profile, undervolted and overclocked with burst limited to 4 GHz
- All disks connected. Memory profile, undervolted and overclocked with burst limited to 4 GHz

All settings so far are stable. :-/
Will see tomorrow any differences with non-free firmware and kernel
modules and test again.

Very strange...
Re: Weird behaviour on System under high load
On 5/21/23 14:46, Christian wrote:
> > > Heat sinks, heat pipes, water blocks, radiators, fans, ducts, etc..
>
> It is quite simple:
> - Noctua NH-L9a-AM4 for CPU
> - Chassis 12cm fan
> - PSU integrated fans

I like the Noctua. :-)

> > > What stresstest are you using?
>
> ... the package and command "s-tui" and "stress"
>
> s-tui gives you an overview of power usage, fan control, temps, core
> frequencies, and core utilization on the console.
>
> stress just produces load on a selected # of CPUs; it can be
> integrated in s-tui.

Thanks -- I like tools and will play with it:

https://packages.debian.org/bullseye/s-tui

> > Okay. Put my Perl script on your liveUSB. Also put some tool for
> > monitoring CPU temperature, such as sensors(1).
>
> Will have time again in a few days and check.

Please let us know what you find.

David
Re: Weird behaviour on System under high load
> Original Message
> From: David Christensen
> To: debian-user@lists.debian.org
> Subject: Re: Weird behaviour on System under high load
> Date: Sun, 21 May 2023 14:22:22 -0700
>
> On 5/21/23 06:31, Christian wrote:
> > > Please use inline posting style and proper indentation.
> >
> > Phew... will be quite hard to read. But here you go.
>
> It is not hard when you delete the portions that you are not
> responding to.
>
> > > > > Have you cleaned the system interior, filters, fans,
> > > > > heatsinks, ducts, etc., recently?
> > > As written in OP, the system is new. Only the PSU is used. So it
> > > is clean.
>
> Okay.
>
> > What is a thermal solution?
>
> Heat sinks, heat pipes, water blocks, radiators, fans, ducts, etc..

It is quite simple:
- Noctua NH-L9a-AM4 for CPU
- Chassis 12cm fan
- PSU integrated fans

> > > What stresstest are you using?
> >
> > stress running in s-tui
>
> Do you mean "in situ"?
>
> https://www.merriam-webster.com/dictionary/in%20situ

No, it is the package and command "s-tui" and "stress".

s-tui gives you an overview of power usage, fan control, temps, core
frequencies, and core utilization on the console.

stress just produces load on a selected # of CPUs; it can be integrated
in s-tui.

> I prefer a tool that I can control. That is why I wrote the previously
> attached Perl script. It is public domain; you and everyone are free
> to use, modify, distribute, etc., as you see fit.
>
> > > Have you tested the power supply recently?
> > It was working before without issues, so not explicitly tested.
> > I am not building regularly, so would need to borrow such equipment
> > somewhere.
>
> Understand that an ATX PSU has multiple stages that produce +12 VDC,
> +5 VDC, +5 VDC standby, +3.3 VDC, and -12 VDC ("rails"). It is common
> for one or more rails to fail and the others to continue working.
> Computers exhibit "weird behaviour" when this happens.
>
> Just spend the US$20.
>
> > > Have you tested the memory recently?
> > > Did you do multi-threaded/ stress tests?
> > Yes, stress is running multiple threads. Only on 2 threads was it
> > stable so far. However, it takes longer for the errors to come up
> > when using fewer threads; might be that I did not test long enough.
>
> I use Memtest86+ 5.01 on a bootable USB stick. In the "Configuration"
> menu, I can choose "Core Selection". It appears the default is
> "Parallel (all)". Other choices include "Round Robin" and
> "Sequential". Memtest86+ 5.01 also displays the CPU temperature.
> Running it on an Intel Core i7-2600S with the matching factory heat
> sink and fan for 30+ minutes, the current CPU temperature is 50 C.
> This leads me to believe that the memory is loaded to 100%, but the
> CPU is less (perhaps 60%?).
>
> https://memtest.org/
>
> I recommend that you run Memtest86+ in parallel mode for at least one
> pass. I have seen computers go for 20+ hours before encountering a
> memory error.
>
> > > Did you see the problems when running Debian stable OOTB, before
> > > adding anything?
> > I would need to do this with a liveUSB, to have it run OOTB.
>
> Okay. Put my Perl script on your liveUSB. Also put some tool for
> monitoring CPU temperature, such as sensors(1).

Will have time again in a few days and check.

> David
Re: Weird behaviour on System under high load
On 5/21/23 06:31, Christian wrote:
> > Please use inline posting style and proper indentation.
>
> Phew... will be quite hard to read. But here you go.

It is not hard when you delete the portions that you are not responding
to.

> > > Have you cleaned the system interior, filters, fans, heatsinks,
> > > ducts, etc., recently?
>
> As written in OP, the system is new. Only the PSU is used. So it is
> clean.

Okay.

> What is a thermal solution?

Heat sinks, heat pipes, water blocks, radiators, fans, ducts, etc..

> > What stresstest are you using?
>
> stress running in s-tui

Do you mean "in situ"?

https://www.merriam-webster.com/dictionary/in%20situ

I prefer a tool that I can control. That is why I wrote the previously
attached Perl script. It is public domain; you and everyone are free to
use, modify, distribute, etc., as you see fit.

> > Have you tested the power supply recently?
>
> It was working before without issues, so not explicitly tested.
> I am not building regularly, so would need to borrow such equipment
> somewhere.

Understand that an ATX PSU has multiple stages that produce +12 VDC, +5
VDC, +5 VDC standby, +3.3 VDC, and -12 VDC ("rails"). It is common for
one or more rails to fail and the others to continue working. Computers
exhibit "weird behaviour" when this happens.

Just spend the US$20.

> > Have you tested the memory recently?
> > Did you do multi-threaded/ stress tests?
>
> Yes, stress is running multiple threads. Only on 2 threads was it
> stable so far. However, it takes longer for the errors to come up when
> using fewer threads; might be that I did not test long enough.

I use Memtest86+ 5.01 on a bootable USB stick. In the "Configuration"
menu, I can choose "Core Selection". It appears the default is
"Parallel (all)". Other choices include "Round Robin" and "Sequential".
Memtest86+ 5.01 also displays the CPU temperature. Running it on an
Intel Core i7-2600S with the matching factory heat sink and fan for 30+
minutes, the current CPU temperature is 50 C. This leads me to believe
that the memory is loaded to 100%, but the CPU is less (perhaps 60%?).

https://memtest.org/

I recommend that you run Memtest86+ in parallel mode for at least one
pass. I have seen computers go for 20+ hours before encountering a
memory error.

> > Did you see the problems when running Debian stable OOTB, before
> > adding anything?
>
> I would need to do this with a liveUSB, to have it run OOTB.

Okay. Put my Perl script on your liveUSB. Also put some tool for
monitoring CPU temperature, such as sensors(1).

David
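Besides sensors(1), CPU temperature can also be spot-checked straight from the kernel's sysfs thermal interface, which works on a liveUSB without installing anything. This is a sketch; thermal zone names and availability vary by machine, and inside a VM or container there may be none, which the fallback reports:

```shell
# Print each Linux thermal zone's current temperature.
# Sysfs reports millidegrees Celsius, hence the /1000.
found=0
for zone in /sys/class/thermal/thermal_zone*/temp; do
    [ -r "$zone" ] || continue          # glob may not match anything
    t=$(cat "$zone")
    echo "$zone: $((t / 1000)) C"
    found=1
done
if [ "$found" -eq 0 ]; then
    echo "no thermal zones exposed"
fi
```

Run in a loop (e.g. under watch(1)) next to a stress run, this gives a crude temperature log even when lm-sensors is not set up.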
Re: Weird behaviour on System under high load
On 5/21/23 06:26, songbird wrote:
> David Christensen wrote:
> ...
> > Measuring actual power supply output and system usage would involve
> > building or buying suitable test equipment. The cost would be
> > non-trivial.
> ...
>
> it depends upon how accurate you want to be and how much power.
>
> for my system it was a simple matter of buying a reasonably sized
> battery backup unit which includes in its display the amount of power
> being drawn in watts. on sale the backup unit cost about $150 USD.
>
> if i want to see what something draws i have a power cord set up to
> use for that and just plug it in and watch the display as it operates.
> if the device is a computer part i can plug it in to my motherboard or
> via usb or ... as long as it gets done with a grounding strip and i do
> the power turn off and turn back on as is appropriate for the device
> (and within ratings of my power supply).
>
> also use this setup to figure out how much power the various wall
> warts are eating. :( switches on all of them are worth the expense.
>
> songbird

Yes, there are a variety of price/performance options for measuring
currents and voltages between the AC power outlet and an AC load (such
as a computer). But I was talking about measuring currents and voltages
between a computer power supply's outputs and the various components
inside the computer.

David
Re: Weird behaviour on System under high load
David Christensen wrote:
...
> Measuring actual power supply output and system usage would involve
> building or buying suitable test equipment. The cost would be
> non-trivial.
...

it depends upon how accurate you want to be and how much power.

for my system it was a simple matter of buying a reasonably sized
battery backup unit which includes in its display the amount of power
being drawn in watts. on sale the backup unit cost about $150 USD.

if i want to see what something draws i have a power cord set up to use
for that and just plug it in and watch the display as it operates. if
the device is a computer part i can plug it in to my motherboard or via
usb or ... as long as it gets done with a grounding strip and i do the
power turn off and turn back on as is appropriate for the device (and
within ratings of my power supply).

also use this setup to figure out how much power the various wall warts
are eating. :( switches on all of them are worth the expense.

songbird
Re: Weird behaviour on System under high load
> Original Message
> From: David Christensen
> To: debian-user@lists.debian.org
> Subject: Re: Weird behaviour on System under high load
> Date: Sun, 21 May 2023 03:11:43 -0700
>
> On 5/21/23 01:14, Christian wrote:
> > > Original Message
> > > From: David Christensen
> > > To: debian-user@lists.debian.org
> > > Subject: Re: Weird behaviour on System under high load
> > > Date: Sat, 20 May 2023 18:00:48 -0700
> > >
> > > On 5/20/23 14:46, Christian wrote:
> > > > Hi there,
> > > >
> > > > I am having trouble with a newly built system. It works normally
> > > > and is stable until I put extreme stress on it, e.g. using all
> > > > 12 cores with the stress tool.
> > > >
> > > > The system will suddenly lose its network connection and become
> > > > unresponsive. Only a reset works. I am not sure what is going
> > > > on, but it is reproducible: put stress on the system and it
> > > > fails. It seems that something is getting out of step.
> > > >
> > > > Stuff below I found in the logs. I tried quite a bit, even
> > > > upgraded to bookworm, to see if the newer kernel works.
> > > >
> > > > If anyone knows how to analyze this issue, it would be very
> > > > helpful.
> > >
> > > Please use inline posting style and proper indentation.

Phew... will be quite hard to read. But here you go.

> > > Have you verified that your PSU has sufficient capacity for the
> > > load on each and every rail?
> >
> > Hi there,
> >
> > Let's go through the different topics:
> > - Setup: It is an AMD 5600G
> > https://www.amd.com/en/products/apu/amd-ryzen-5-5600g
> > 65 W
> > on an ASRock B550M-ITX/ac,
> > https://www.asrock.com/mb/AMD/B550M-ITXac/index.asp
> > powered by a BeQuiet SP7 300W
> >
> > - Power: From the specifications it should fit. As it takes 5-20
> > minutes for the error to occur, I would take that as an indication
> > that the power supply is ok. Otherwise I would expect it to fail
> > right away? Is there a way to measure/test if there is any issue
> > with it? I also tested limiting PPT to 45W, which also makes no
> > difference.
>
> If all you have is a motherboard, a 65W CPU, and an SSD, that looks
> like a good quality 300W PSU and I would think it should support
> long-term full loading of the CPU. But, there is no substitute for
> doing the engineering.
>
> I do PSU calculations using a spreadsheet. This requires finding power
> specifications (or making estimates) for everything in the system,
> which can be tough.
>
> BeQuiet has a PSU calculator. I suggest using it:
>
> https://www.bequiet.com/en/psucalculator
>
> Measuring actual power supply output and system usage would involve
> building or buying suitable test equipment. The cost would be
> non-trivial.
>
> An easy A/B test would be to connect a known-good, high-quality PSU
> with a higher power rating (say, 500-1000W). I use:
>
> https://www.fractal-design.com/products/power-supplies/ion/ion-2-platinum-660w/black/

Used the calculator; however, it might be that the onboard graphics is
not properly accounted for. Will see that I get a 500W PSU for testing.

> > > Have you cleaned the system interior, filters, fans, heatsinks,
> > > ducts, etc., recently?
> ?

As written in OP, the system is new. Only the PSU is used. So it is
clean.

> > > Have you tested the thermal solution(s) recently?
> > - Thermal: I am observing the temperatures during the stress test.
> > If I am correct in reading Smbusmaster0, temps haven't been above
> > 71°C, but the error also occurs earlier, way below 70.
>
> Okay.
>
> What is your CPU thermal solution?

What is a thermal solution?

> What stresstest are you using?

stress running in s-tui

> > > Have you tested the power supply recently?

It was working before without issues, so not explicitly tested.

> I suffered a rash of bad PSUs recently. I was able to figure it out
> because I bought an inexpensive PSU tester years ago. It has saved my
> sanity more than once. I suggest that you buy something like it:
>
> https://www.ebay.com/sch/i.html?_from=R40&_t
Re: Weird behaviour on System under high load
On 5/21/23 01:14, Christian wrote:
> > Original Message
> > From: David Christensen
> > To: debian-user@lists.debian.org
> > Subject: Re: Weird behaviour on System under high load
> > Date: Sat, 20 May 2023 18:00:48 -0700
> >
> > On 5/20/23 14:46, Christian wrote:
> > > Hi there,
> > >
> > > I am having trouble with a newly built system. It works normally
> > > and is stable until I put extreme stress on it, e.g. using all 12
> > > cores with the stress tool.
> > >
> > > The system will suddenly lose its network connection and become
> > > unresponsive. Only a reset works. I am not sure what is going on,
> > > but it is reproducible: put stress on the system and it fails. It
> > > seems that something is getting out of step.
> > >
> > > Stuff below I found in the logs. I tried quite a bit, even
> > > upgraded to bookworm, to see if the newer kernel works.
> > >
> > > If anyone knows how to analyze this issue, it would be very
> > > helpful.
> >
> > Please use inline posting style and proper indentation.
> >
> > Have you verified that your PSU has sufficient capacity for the load
> > on each and every rail?
>
> Hi there,
>
> Let's go through the different topics:
> - Setup: It is an AMD 5600G

https://www.amd.com/en/products/apu/amd-ryzen-5-5600g

65 W

> on an ASRock B550M-ITX/ac,

https://www.asrock.com/mb/AMD/B550M-ITXac/index.asp

> powered by a BeQuiet SP7 300W
>
> - Power: From the specifications it should fit. As it takes 5-20
> minutes for the error to occur, I would take that as an indication
> that the power supply is ok. Otherwise I would expect it to fail right
> away? Is there a way to measure/test if there is any issue with it?
> I also tested limiting PPT to 45W, which also makes no difference.

If all you have is a motherboard, a 65W CPU, and an SSD, that looks
like a good quality 300W PSU and I would think it should support
long-term full loading of the CPU. But, there is no substitute for
doing the engineering.

I do PSU calculations using a spreadsheet. This requires finding power
specifications (or making estimates) for everything in the system,
which can be tough.

BeQuiet has a PSU calculator. I suggest using it:

https://www.bequiet.com/en/psucalculator

Measuring actual power supply output and system usage would involve
building or buying suitable test equipment. The cost would be
non-trivial.

An easy A/B test would be to connect a known-good, high-quality PSU
with a higher power rating (say, 500-1000W). I use:

https://www.fractal-design.com/products/power-supplies/ion/ion-2-platinum-660w/black/

> > Have you cleaned the system interior, filters, fans, heatsinks,
> > ducts, etc., recently?

?

> > Have you tested the thermal solution(s) recently?
>
> - Thermal: I am observing the temperatures during the stress test. If
> I am correct in reading Smbusmaster0, temps haven't been above 71°C,
> but the error also occurs earlier, way below 70.

Okay.

What is your CPU thermal solution?

What stresstest are you using?

> > Have you tested the power supply recently?

I suffered a rash of bad PSUs recently. I was able to figure it out
because I bought an inexpensive PSU tester years ago. It has saved my
sanity more than once. I suggest that you buy something like it:

https://www.ebay.com/sch/i.html?_from=R40&_trksid=m570.l1313&_nkw=antec+atx12+tester&_sacat=0

> > Have you tested the memory recently?
>
> - Memory: Yes, was tested right after the build with no errors.

Okay.

Did you do multi-threaded/ stress tests?

Are you running Debian stable? Are you running Debian stable packages
only? Were they all installed with the same package manager?

> - OS: I was running Debian stable in quite a minimal configuration
> (fresh install, as most services are dockerized) when I first
> observed the error. Now moved to Debian 12/Bookworm to see if it
> makes any difference with a higher kernel (it does not). Also
> exchanged r8169 for the r8168. It changes the error messages;
> however, the system instability stays.

Did you see the problems when running Debian stable OOTB, before adding
anything?

Did you stress test the system before adding anything (other than the
stress test)?

If all of the above are okay and the system is still locking up, I
would disable or remove all disks in the system, install a zeroed SSD,
install Debian stable choosing only "SSH server" and "standard system
utilities", install only the stable packages required for your
workload, put the workload on it, and see what happens.

> I could disconnect the disks and see if it makes any difference.
> However, when reproducing this error, disks other than the system
> disk were unmounted. So I would guess this would be a test to see if
> it is about power?

Stripping the system down to minimum hardware and software is a good
starting point. You will need a tool to load the system and some means
to watch what happens. Assuming the base configuration passes all
tests, then add something, test, and repeat until testing fails.
Re: Weird behaviour on System under high load
Hi there,

Let's go through the different topics:

- Setup: It is an AMD 5600G on an ASRock B550M-ITX/ac, powered by a
BeQuiet SP7 300W

- Power: From the specifications it should fit. As it takes 5-20
minutes for the error to occur, I would take that as an indication that
the power supply is ok. Otherwise I would expect it to fail right away?
Is there a way to measure/test if there is any issue with it? I also
tested limiting PPT to 45W, which also makes no difference.

- Memory: Yes, was tested right after the build with no errors

- Thermal: I am observing the temperatures during the stress test. If I
am correct in reading Smbusmaster0, temps haven't been above 71°C, but
the error also occurs earlier, way below 70.

- OS: I was running Debian stable in quite a minimal configuration
(fresh install, as most services are dockerized) when I first observed
the error. Now moved to Debian 12/Bookworm to see if it makes any
difference with a higher kernel (it does not). Also exchanged r8169 for
the r8168. It changes the error messages; however, the system
instability stays.

I could disconnect the disks and see if it makes any difference.
However, when reproducing this error, disks other than the system disk
were unmounted. So I would guess this would be a test to see if it is
about power?

Original Message
From: David Christensen
To: debian-user@lists.debian.org
Subject: Re: Weird behaviour on System under high load
Date: Sat, 20 May 2023 18:00:48 -0700

On 5/20/23 14:46, Christian wrote:
> Hi there,
>
> I am having trouble with a newly built system. It works normally and
> is stable until I put extreme stress on it, e.g. using all 12 cores
> with the stress tool.
>
> The system will suddenly lose its network connection and become
> unresponsive. Only a reset works. I am not sure what is going on, but
> it is reproducible: put stress on the system and it fails. It seems
> that something is getting out of step.
>
> Stuff below I found in the logs. I tried quite a bit, even upgraded
> to bookworm, to see if the newer kernel works.
>
> If anyone knows how to analyze this issue, it would be very helpful.
>
> Kind regards
> Christian
>
> 2023-05-20T20:12:17.054224+02:00 diskstation kernel: [ 1303.236428] ------------[ cut here ]------------
> 2023-05-20T20:12:17.054234+02:00 diskstation kernel: [ 1303.236430] NETDEV WATCHDOG: enp3s0 (r8169): transmit queue 0 timed out
> 2023-05-20T20:12:17.054235+02:00 diskstation kernel: [ 1303.236437] WARNING: CPU: 5 PID: 2411 at net/sched/sch_generic.c:525 dev_watchdog+0x207/0x210
> 2023-05-20T20:12:17.054236+02:00 diskstation kernel: [ 1303.236442] Modules linked in: eq3_char_loop(OE) rpi_rf_mod_led(OE) ledtrig_timer ledtrig_default_on xt_MASQUERADE nf_conntrack_netlink xfrm_user xfrm_algo xt_addrtype br_netfilter bridge stp llc overlay ip6t_rt nft_chain_nat nf_nat xt_set xt_tcpmss xt_tcpudp xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nft_compat nf_tables ip_set_hash_ip ip_set binfmt_misc nfnetlink nls_ascii nls_cp437 vfat fat amdgpu iwlmvm btusb intel_rapl_msr btrtl intel_rapl_common btbcm btintel edac_mce_amd btmtk mac80211 snd_hda_codec_realtek bluetooth snd_hda_codec_generic ledtrig_audio snd_hda_codec_hdmi gpu_sched kvm_amd drm_buddy libarc4 snd_hda_intel drm_display_helper snd_intel_dspcfg snd_intel_sdw_acpi iwlwifi kvm cec snd_hda_codec jitterentropy_rng irqbypass rc_core snd_hda_core cfg80211 snd_hwdep drm_ttm_helper snd_pcm ttm drbg wmi_bmof rapl ccp snd_timer ansi_cprng drm_kms_helper sp5100_tco snd pcspkr ecdh_generic rng_core i2c_algo_bit watchdog soundcore k10temp rfkill hb_rf_usb_2(OE) ecc
> 2023-05-20T20:12:17.054240+02:00 diskstation kernel: [ 1303.236494] generic_raw_uart(OE) acpi_cpufreq button joydev evdev sg nct6775 nct6775_core drm hwmon_vid fuse loop efi_pstore configfs efivarfs ip_tables x_tables autofs4 ext4 crc16 mbcache jbd2 btrfs blake2b_generic xor raid6_pq zstd_compress libcrc32c crc32c_generic dm_crypt dm_mod hid_generic usbhid hid sd_mod crc32_pclmul crc32c_intel ahci ghash_clmulni_intel sha512_ssse3 libahci xhci_pci sha512_generic xhci_hcd r8169 nvme realtek libata aesni_intel nvme_core t10_pi crypto_simd mdio_devres usbcore scsi_mod crc64_rocksoft_generic cryptd libphy crc64_rocksoft crc_t10dif i2c_piix4 crct10dif_generic crct10dif_pclmul crc64 crct10dif_common usb_common scsi_common video wmi gpio_amdpt gpio_generic
> 2023-05-20T20:12:17.054241+02:00 diskstation kernel: [ 1303.236534] CPU: 5 PID: 2411 Comm: stress Tainted: G OE 6.1.0-9-amd64 #1 Debian 6.1.27-1
> 2023-05-20T20:12:17.054241+02:00 diskstation kernel: [ 1303.236536] Hardware name: To Be Filled By O.E.M. B550M-ITX/ac/B550M-ITX/ac, BIOS L2.62 01/31/2023
> 2023-05-20T20:12:17.
Re: Weird behaviour on System under high load
On 5/20/23 14:46, Christian wrote:

Hi there,

I am having trouble with a newly built system. It works normally and is
stable until I put extreme stress on it, e.g. using all 12 cores with
the stress tool.

The system will suddenly lose its network connection and become
unresponsive. Only a reset works. I am not sure what is going on, but
it is reproducible: put stress on the system and it fails. It seems
that something is getting out of step.

Stuff below I found in the logs. I tried quite a bit, even upgraded to
bookworm, to see if the newer kernel works.

If anyone knows how to analyze this issue, it would be very helpful.

Kind regards
Christian

2023-05-20T20:12:17.054224+02:00 diskstation kernel: [ 1303.236428] ------------[ cut here ]------------
2023-05-20T20:12:17.054234+02:00 diskstation kernel: [ 1303.236430] NETDEV WATCHDOG: enp3s0 (r8169): transmit queue 0 timed out
2023-05-20T20:12:17.054235+02:00 diskstation kernel: [ 1303.236437] WARNING: CPU: 5 PID: 2411 at net/sched/sch_generic.c:525 dev_watchdog+0x207/0x210
2023-05-20T20:12:17.054236+02:00 diskstation kernel: [ 1303.236442] Modules linked in: eq3_char_loop(OE) rpi_rf_mod_led(OE) ledtrig_timer ledtrig_default_on xt_MASQUERADE nf_conntrack_netlink xfrm_user xfrm_algo xt_addrtype br_netfilter bridge stp llc overlay ip6t_rt nft_chain_nat nf_nat xt_set xt_tcpmss xt_tcpudp xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nft_compat nf_tables ip_set_hash_ip ip_set binfmt_misc nfnetlink nls_ascii nls_cp437 vfat fat amdgpu iwlmvm btusb intel_rapl_msr btrtl intel_rapl_common btbcm btintel edac_mce_amd btmtk mac80211 snd_hda_codec_realtek bluetooth snd_hda_codec_generic ledtrig_audio snd_hda_codec_hdmi gpu_sched kvm_amd drm_buddy libarc4 snd_hda_intel drm_display_helper snd_intel_dspcfg snd_intel_sdw_acpi iwlwifi kvm cec snd_hda_codec jitterentropy_rng irqbypass rc_core snd_hda_core cfg80211 snd_hwdep drm_ttm_helper snd_pcm ttm drbg wmi_bmof rapl ccp snd_timer ansi_cprng drm_kms_helper sp5100_tco snd pcspkr ecdh_generic rng_core i2c_algo_bit watchdog soundcore k10temp rfkill hb_rf_usb_2(OE) ecc
2023-05-20T20:12:17.054240+02:00 diskstation kernel: [ 1303.236494] generic_raw_uart(OE) acpi_cpufreq button joydev evdev sg nct6775 nct6775_core drm hwmon_vid fuse loop efi_pstore configfs efivarfs ip_tables x_tables autofs4 ext4 crc16 mbcache jbd2 btrfs blake2b_generic xor raid6_pq zstd_compress libcrc32c crc32c_generic dm_crypt dm_mod hid_generic usbhid hid sd_mod crc32_pclmul crc32c_intel ahci ghash_clmulni_intel sha512_ssse3 libahci xhci_pci sha512_generic xhci_hcd r8169 nvme realtek libata aesni_intel nvme_core t10_pi crypto_simd mdio_devres usbcore scsi_mod crc64_rocksoft_generic cryptd libphy crc64_rocksoft crc_t10dif i2c_piix4 crct10dif_generic crct10dif_pclmul crc64 crct10dif_common usb_common scsi_common video wmi gpio_amdpt gpio_generic
2023-05-20T20:12:17.054241+02:00 diskstation kernel: [ 1303.236534] CPU: 5 PID: 2411 Comm: stress Tainted: G OE 6.1.0-9-amd64 #1 Debian 6.1.27-1
2023-05-20T20:12:17.054241+02:00 diskstation kernel: [ 1303.236536] Hardware name: To Be Filled By O.E.M. B550M-ITX/ac/B550M-ITX/ac, BIOS L2.62 01/31/2023
2023-05-20T20:12:17.054242+02:00 diskstation kernel: [ 1303.236537] RIP: 0010:dev_watchdog+0x207/0x210
2023-05-20T20:12:17.054242+02:00 diskstation kernel: [ 1303.236540] Code: 00 e9 40 ff ff ff 48 89 df c6 05 ff 5f 3d 01 01 e8 be 79 f9 ff 44 89 e9 48 89 de 48 c7 c7 c8 16 9b a8 48 89 c2 e8 09 d2 86 ff <0f> 0b e9 22 ff ff ff 66 90 0f 1f 44 00 00 55 53 48 89 fb 48 8b 6f
2023-05-20T20:12:17.054243+02:00 diskstation kernel: [ 1303.236541] RSP: :a831c345fdc8 EFLAGS: 00010286
2023-05-20T20:12:17.054243+02:00 diskstation kernel: [ 1303.236543] RAX: RBX: 91a3c141 RCX:
2023-05-20T20:12:17.054243+02:00 diskstation kernel: [ 1303.236544] RDX: 0103 RSI: a893fa66 RDI:
2023-05-20T20:12:17.054244+02:00 diskstation kernel: [ 1303.236545] RBP: 91a3c1410488 R08: R09: a831c345fc38
2023-05-20T20:12:17.054244+02:00 diskstation kernel: [ 1303.236546] R10: 0003 R11: 91aafe27afe8 R12: 91a3c14103dc
2023-05-20T20:12:17.054245+02:00 diskstation kernel: [ 1303.236547] R13: R14: a7e2e7a0 R15: 91a3c1410488
2023-05-20T20:12:17.054245+02:00 diskstation kernel: [ 1303.236548] FS: 7f169849d740() GS:91aade34() knlGS:
2023-05-20T20:12:17.054246+02:00 diskstation kernel: [ 1303.236550] CS: 0010 DS: ES: CR0: 80050033
2023-05-20T20:12:17.054246+02:00 diskstation kernel: [ 1303.236551] CR2: 55d05c3f4000 CR3: 000103cf2000 CR4: 00750ee0
2023-05-20T20:12:17.054246+02:00 diskstation kernel: [ 1303.236552] PKRU: 5554
2023-05-20T20:12:17.054247+02:00 diskstation kernel: [ 1303.236553] Call Trace:
2023-05-20T20:12:17.054247+02:00 diskstation kernel: [ 1303.236554]
2023-05-20T20:12:17.054248+02:00 diskstation kernel: [
Weird behaviour on System under high load
Hi there, I am having trouble with a newly built system. It works normally and stably until I put extreme stress on it, e.g. using all 12 cores with the stress tool. The system will suddenly lose its network connection and become unresponsive; only a reset helps. I am not sure what is going on, but it is reproducible: put stress on the system and it fails. It seems that something is getting out of step. Below is what I found in the logs. I tried quite a bit, even upgraded to bookworm, to see whether the newer kernel helps. If anyone knows how to analyze this issue, it would be very helpful. Kind regards Christian

2023-05-20T20:12:17.054224+02:00 diskstation kernel: [ 1303.236428] ------------[ cut here ]------------
2023-05-20T20:12:17.054234+02:00 diskstation kernel: [ 1303.236430] NETDEV WATCHDOG: enp3s0 (r8169): transmit queue 0 timed out
2023-05-20T20:12:17.054235+02:00 diskstation kernel: [ 1303.236437] WARNING: CPU: 5 PID: 2411 at net/sched/sch_generic.c:525 dev_watchdog+0x207/0x210
2023-05-20T20:12:17.054236+02:00 diskstation kernel: [ 1303.236442] Modules linked in: eq3_char_loop(OE) rpi_rf_mod_led(OE) ledtrig_timer ledtrig_default_on xt_MASQUERADE nf_conntrack_netlink xfrm_user xfrm_algo xt_addrtype br_netfilter bridge stp llc overlay ip6t_rt nft_chain_nat nf_nat xt_set xt_tcpmss xt_tcpudp xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nft_compat nf_tables ip_set_hash_ip ip_set binfmt_misc nfnetlink nls_ascii nls_cp437 vfat fat amdgpu iwlmvm btusb intel_rapl_msr btrtl intel_rapl_common btbcm btintel edac_mce_amd btmtk mac80211 snd_hda_codec_realtek bluetooth snd_hda_codec_generic ledtrig_audio snd_hda_codec_hdmi gpu_sched kvm_amd drm_buddy libarc4 snd_hda_intel drm_display_helper snd_intel_dspcfg snd_intel_sdw_acpi iwlwifi kvm cec snd_hda_codec jitterentropy_rng irqbypass rc_core snd_hda_core cfg80211 snd_hwdep drm_ttm_helper snd_pcm ttm drbg wmi_bmof rapl ccp snd_timer ansi_cprng drm_kms_helper sp5100_tco snd pcspkr ecdh_generic rng_core i2c_algo_bit watchdog soundcore k10temp rfkill hb_rf_usb_2(OE) ecc
2023-05-20T20:12:17.054240+02:00 diskstation kernel: [ 1303.236494] generic_raw_uart(OE) acpi_cpufreq button joydev evdev sg nct6775 nct6775_core drm hwmon_vid fuse loop efi_pstore configfs efivarfs ip_tables x_tables autofs4 ext4 crc16 mbcache jbd2 btrfs blake2b_generic xor raid6_pq zstd_compress libcrc32c crc32c_generic dm_crypt dm_mod hid_generic usbhid hid sd_mod crc32_pclmul crc32c_intel ahci ghash_clmulni_intel sha512_ssse3 libahci xhci_pci sha512_generic xhci_hcd r8169 nvme realtek libata aesni_intel nvme_core t10_pi crypto_simd mdio_devres usbcore scsi_mod crc64_rocksoft_generic cryptd libphy crc64_rocksoft crc_t10dif i2c_piix4 crct10dif_generic crct10dif_pclmul crc64 crct10dif_common usb_common scsi_common video wmi gpio_amdpt gpio_generic
2023-05-20T20:12:17.054241+02:00 diskstation kernel: [ 1303.236534] CPU: 5 PID: 2411 Comm: stress Tainted: G OE 6.1.0-9-amd64 #1 Debian 6.1.27-1
2023-05-20T20:12:17.054241+02:00 diskstation kernel: [ 1303.236536] Hardware name: To Be Filled By O.E.M. B550M-ITX/ac/B550M-ITX/ac, BIOS L2.62 01/31/2023
2023-05-20T20:12:17.054242+02:00 diskstation kernel: [ 1303.236537] RIP: 0010:dev_watchdog+0x207/0x210
2023-05-20T20:12:17.054242+02:00 diskstation kernel: [ 1303.236540] Code: 00 e9 40 ff ff ff 48 89 df c6 05 ff 5f 3d 01 01 e8 be 79 f9 ff 44 89 e9 48 89 de 48 c7 c7 c8 16 9b a8 48 89 c2 e8 09 d2 86 ff <0f> 0b e9 22 ff ff ff 66 90 0f 1f 44 00 00 55 53 48 89 fb 48 8b 6f
2023-05-20T20:12:17.054243+02:00 diskstation kernel: [ 1303.236541] RSP: :a831c345fdc8 EFLAGS: 00010286
2023-05-20T20:12:17.054243+02:00 diskstation kernel: [ 1303.236543] RAX: RBX: 91a3c141 RCX:
2023-05-20T20:12:17.054243+02:00 diskstation kernel: [ 1303.236544] RDX: 0103 RSI: a893fa66 RDI:
2023-05-20T20:12:17.054244+02:00 diskstation kernel: [ 1303.236545] RBP: 91a3c1410488 R08: R09: a831c345fc38
2023-05-20T20:12:17.054244+02:00 diskstation kernel: [ 1303.236546] R10: 0003 R11: 91aafe27afe8 R12: 91a3c14103dc
2023-05-20T20:12:17.054245+02:00 diskstation kernel: [ 1303.236547] R13: R14: a7e2e7a0 R15: 91a3c1410488
2023-05-20T20:12:17.054245+02:00 diskstation kernel: [ 1303.236548] FS: 7f169849d740() GS:91aade34() knlGS:
2023-05-20T20:12:17.054246+02:00 diskstation kernel: [ 1303.236550] CS: 0010 DS: ES: CR0: 80050033
2023-05-20T20:12:17.054246+02:00 diskstation kernel: [ 1303.236551] CR2: 55d05c3f4000 CR3: 000103cf2000 CR4: 00750ee0
2023-05-20T20:12:17.054246+02:00 diskstation kernel: [ 1303.236552] PKRU: 5554
2023-05-20T20:12:17.054247+02:00 diskstation kernel: [ 1303.236553] Call Trace:
2023-05-20T20:12:17.054247+02:00 diskstation kernel: [ 1303.236554]
2023-05-20T20:12:17.054248+02:00 diskstation kernel: [ 1303.236557] ?
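The decisive line in a dump like this is the NETDEV WATCHDOG one; the rest is context. A small sketch (assuming a log saved in the format shown above) to pull the interface and driver name out of it:

```shell
# Extract the interface and driver from a NETDEV WATCHDOG line
# (sample line taken from the log above).
line='NETDEV WATCHDOG: enp3s0 (r8169): transmit queue 0 timed out'
printf '%s\n' "$line" |
  sed -n 's/.*NETDEV WATCHDOG: \([^ ]*\) (\([^)]*\)).*/interface=\1 driver=\2/p'
```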
Re: NFS on Raspberry Pi high load
Bob Proulx b...@proulx.com wrote: I don't know about the new Raspberry quad core. Does it have the same limited usb chip as the original?

It does. But because the CPU is more powerful (and you have 4 cores) you can squeeze about 95 MBit/s out of it. Right now I am dd'ing a 600MB file over NFS (the Raspi 2 is the client) to /dev/null and the transfer rate (measured on the server) is stable at 96.7 MBit/s, but one core is fully occupied with the transfer and the dd process is mostly in the D state and does not use much CPU at all (about 1% according to top).

Final results (with no special blocksize setting):
1273709+1 records in
1273709+1 records out
652139386 bytes (621.9MB) copied, 55.460434 seconds, 11.2MB/s
real 0m 55.46s
user 0m 0.75s
sys 0m 6.31s

(with bs=4M):
155+1 records in
155+1 records out
652139386 bytes (621.9MB) copied, 55.431787 seconds, 11.2MB/s
real 0m 55.44s
user 0m 0.00s
sys 0m 2.68s

Copying the same file to the SDHC card takes a little bit longer, but not much:
real 1m 1.91s
user 0m 0.13s
sys 0m 8.12s

Regards, Sven. -- Sigmentation fault. Core dumped. -- To UNSUBSCRIBE, email to debian-user-requ...@lists.debian.org with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org Archive: https://lists.debian.org/2bna0mm4b...@mids.svenhartge.de
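As a sanity check on those numbers: dd reports binary megabytes, while wire speeds are usually quoted in decimal megabits. A quick sketch, with the figures copied from the transfer above:

```shell
# 652139386 bytes in 55.46 s: dd's "11.2MB/s" is MiB/s; on the wire that
# is roughly 94 Mbit/s, consistent with the ~96.7 MBit/s measured on the
# server (which also sees protocol overhead).
awk 'BEGIN { b = 652139386; s = 55.46
  printf "%.1f MiB/s, %.1f Mbit/s\n", b / s / 1048576, b * 8 / s / 1e6 }'
```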
Re: NFS on Raspberry Pi high load
Sven Hartge wrote: Reco wrote: Sven Hartge wrote: Maybe the USB hardware implementation is better in the N900? The one in the Pi is quite bad and finicky.

I am coming to this discussion late but I had to confirm that the USB chip in the Raspberry Pi is very limiting. It has a maximum bandwidth of around 50Mbps and everything, including ethernet, goes through it. This means that if you have one data stream, it will get a maximum of 50Mbps. If you have two streams, such as if using the Raspberry Pi for a router and it is routing packets in one interface and out a different one, then the maximum throughput is 25Mbps, with one stream in and one stream out. I have a good friend who has worked on the drivers for the pi and he told me that the usb chip generated a minimum of 2000 interrupts per second even at idle. I will pass that along as hearsay because it seems plausible. For example this CEC limitation makes the Raspberry Pi acceptable for an 802.11b WiFi access point at 11Mbps but not able to keep up with 'g' or 'n' speeds. It is simply hardware limited. If you have 8 nfsds running and, let's say, each of them tries to use only one data stream, then each will get only 6.25Mbps maximum. They will spend a lot of time in the run queue, blocked on I/O, waiting for the network to respond.

basti wrote: Per default nfs starts with 8 servers root@raspberrypi:~# head -n 2 /etc/default/nfs-kernel-server # Number of servers to start up RPCNFSDCOUNT=8

As you have found, doing any type of real transfer will immediately consume 8 processes, because each daemon will be in the run queue ready to run but waiting for I/O. The biggest problem is that each daemon will consume *memory*. The daemon won't consume cpu while it is blocked waiting on I/O. But it will consume memory. Memory for the nfs daemons. Memory for the kernel to track the multiple network streams active. Memory for file system buffer cache. Everything takes memory. The Raspberry Pi has a limited 512MB of RAM.

I like seeing the bar graph of the memory visualization from htop. I suggest installing htop and looking at the memory bar graph displaying the amount of consumed memory and the amount available for cache.

basti wrote: So I try to transfer a 3GB file from the raspberry to my laptop via WLAN(n). This operation kills my raspberry. I get a load of 12 and more. 10 minutes after I interrupt this operation the load was still at 10.

In addition to the 8 processes consuming memory from the 8 nfsds, there will need to be additional cpu to deal with the driver for the usb chip. It will need to handle the accounting for the multiple network streams. A single stream will take fewer resources than 8 streams. And anything else that happens along. That extra accounts for the load of 10 you are seeing. But the real problem is probably the lack of memory. The many processes stacked up and the I/O buffers will likely have consumed everything.

basti wrote: So I decide to reduce the number of servers to 2. Now it's a bit better, the load is only around 5.

That was a good configuration modification. About the best you can do.

basti wrote: Can somebody reproduce this behavior?

Yes. Easily! It is simply a natural consequence of the limited hardware of the Raspberry Pi. I have become a fan of the newer Banana Pi. It is very Raspberry-like but has a different CEC and doesn't have that very limited 50Mbps usb chip found on the Raspberry. On the Banana Pi there is 1G of ram, twice that of the Raspberry. It is a dual core arm, again twice the Raspberry. It is an armv7l architecture and therefore runs stock Debian. And best yet for your purposes it has much higher speed I/O. On the Banana Pi I can routinely get 750Mbps through a single ethernet connection. That is about the same performance as an Intel Atom D525. The Banana Pi makes a much better practical machine than the Raspberry. The price of the Banana is currently running around US $42, only $7 more than the Raspberry. It is a much more capable machine.
I don't know about the new Raspberry quad core. Does it have the same limited usb chip as the original? Bob
Re: NFS on Raspberry Pi high load
Reco recovery...@gmail.com wrote: On Fri, 19 Jun 2015 20:38:12 +0200 Sven Hartge s...@svenhartge.de wrote: Maybe the USB hardware implementation is better in the N900? The one in the Pi is quite bad and finicky. I happen to have a Pi too. Not that I need an NFS server on it, NFS client is sufficient for my needs, but still. In addition to that, data transfer via USB is quite CPU-intensive, as Petter wrote, and overwhelms the single CPU core of the Pi if it needs to drive the SD card at the same time. Hm. I plugged an Ethernet cable into it, read and wrote a big file via NFS. Got consistent 50mbps.

Where did you write the file to and from? You said your Pi is an NFS client, so I assume you wrote a file to a server and read it back from there.

According to iperf, I could go as high as 82.2 mbps. Not the fair gigabit I have on this LAN, but close to the theoretical 100mbit limit of the NIC.

iperf does no file I/O, so nearly every CPU cycle can be used for the USB transfer.

During the NFS test, two kernel threads were the worst CPU consumers, kworker/0 and ksoftirqd/0. During the iperf test, the worst CPU consumers were iperf itself and ksoftirqd/0. According to /proc/interrupts, the top interrupt consumer was IRQ32, which is: dwc_otg, dwc_otg_pcd, dwc_otg_hcd:usb1

That is the driver for the USB port, a DesignWare OnTheGo USB controller. The controller is able to drive the USB port as either a host or a client. This chip and the driver are a constant work in progress, and depending on the kernel version and the firmware, your luck with the USB port on the Pi might be better or worse. For example: http://ludovicrousseau.blogspot.de/2014/04/usb-issues-with-raspberry-pi.html So maybe by updating the bootloader and GPU firmware to the latest from https://github.com/raspberrypi/firmware one might be able to improve the situation. Regards, Sven. -- Sigmentation fault. Core dumped.
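Since the counters in /proc/interrupts are cumulative, the useful figure is the delta between two snapshots. A sketch with synthetic two-line sample data in the same layout (on the Pi, snapshot the real /proc/interrupts a few seconds apart instead; the file names and counts below are made up for illustration):

```shell
# Find which interrupt source grew the most between two snapshots.
before=$(mktemp); after=$(mktemp)
printf '32: 1000 dwc_otg\n49: 50 mmc0\n' > "$before"
printf '32: 9000 dwc_otg\n49: 60 mmc0\n' > "$after"
awk 'NR==FNR { a[$1] = $2; next }
     { d = $2 - a[$1]; if (d > max) { max = d; name = $1 " " $3 } }
     END { print name, max }' "$before" "$after"
rm -f "$before" "$after"
```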
Re: NFS on Raspberry Pi high load
On 19.06.2015 at 14:47, Petter Adsen wrote: On Fri, 19 Jun 2015 14:09:45 +0200 basti black.flederm...@arcor.de wrote: The Problem is not the speed of 3 MB/s it's the load of 12 and more. On 19.06.2015 14:03, Sven Hartge wrote: basti black.flederm...@arcor.de wrote: iotop show me a read speed around 3 MB/s, there is a Class 10 UHS card (10-15 MB/s read, 9-5 MB/s write I guess). More than 3MByte/s is not really achievable with a Pi-1, because the CPU is very weak and the Ethernet-Chip is attached via USB. Under the best conditions you may be able to transfer up to 45MBit/s, but a maximum transfer rate of about 35MBit/s is normal. The load is so high because USB is very CPU-intensive. If you were to use the on-board Ethernet, you would not see such a high load.

The Pi has no on-board Ethernet. The Ethernet port is attached via USB. -- Why is it that all of the instruments seeking intelligent life in the universe are pointed away from Earth?
Re: NFS on Raspberry Pi high load
Hi. On Fri, Jun 19, 2015 at 02:47:20PM +0200, Petter Adsen wrote: On Fri, 19 Jun 2015 14:09:45 +0200 basti black.flederm...@arcor.de wrote: The Problem is not the speed of 3 MB/s it's the load of 12 and more. On 19.06.2015 14:03, Sven Hartge wrote: basti black.flederm...@arcor.de wrote: iotop show me a read speed around 3 MB/s, there is a Class 10 UHS card (10-15 MB/s read, 9-5 MB/s write I guess). More than 3MByte/s is not really achievable with a Pi-1, because the CPU is very weak and the Ethernet-Chip is attached via USB. Under the best conditions you may be able to transfer up to 45MBit/s, but a maximum transfer rate of about 35MBit/s is normal. The load is so high because USB is very CPU-intensive. If you were to use the on-board Ethernet, you would not see such a high load.

What? Are you serious? I have this Nokia N900 lying behind me which is connected by IP-via-USB (aka usbnet aka g_ether) and with an order-of-magnitude slower ARM CPU it reliably shows 40mbps with no noticeable load. There are countless things I'd blame in this situation (large amounts of sync I/O from knfsd, the relatively small amount of memory for an NFS server, the huge read/write latency of the MMC card), but blaming the type of Ethernet connection is the last thing I'd do. Regardless, there's a way to see the cause of all this trouble. A relatively new, but demonstrative one:

perf record -a
perf report -i perf.data

Reco
Re: NFS on Raspberry Pi high load
basti black.flederm...@arcor.de wrote: iotop show me a read speed around 3 MB/s, there is a Class 10 UHS card (10-15 MB/s read, 9-5 MB/s write I guess).

More than 3MByte/s is not really achievable with a Pi-1, because the CPU is very weak and the Ethernet-Chip is attached via USB. Under the best conditions you may be able to transfer up to 45MBit/s, but a maximum transfer rate of about 35MBit/s is normal. Regards, Sven. -- Sigmentation fault. Core dumped.
Re: NFS on Raspberry Pi high load
On Fri, 19 Jun 2015 14:09:45 +0200 basti black.flederm...@arcor.de wrote: The Problem is not the speed of 3 MB/s it's the load of 12 and more. On 19.06.2015 14:03, Sven Hartge wrote: basti black.flederm...@arcor.de wrote: iotop show me a read speed around 3 MB/s, there is a Class 10 UHS card (10-15 MB/s read, 9-5 MB/s write I guess). More than 3MByte/s is not really achievable with a Pi-1, because the CPU is very weak and the Ethernet-Chip is attached via USB. Under the best conditions you may be able to transfer up to 45MBit/s, but a maximum transfer rate of about 35MBit/s is normal. The load is so high because USB is very CPU-intensive. If you were to use the on-board Ethernet, you would not see such a high load. Petter -- I'm ionized Are you sure? I'm positive.
NFS on Raspberry Pi high load
Hello, perhaps that's a bit OT, but I couldn't find a Raspbian or Raspberry Pi related mailing list.

Per default nfs starts with 8 servers:
root@raspberrypi:~# head -n 2 /etc/default/nfs-kernel-server
# Number of servers to start up
RPCNFSDCOUNT=8

So I try to transfer a 3GB file from the raspberry to my laptop via WLAN(n). This operation kills my raspberry: I get a load of 12 and more, and 10 minutes after I interrupt this operation the load is still at 10. So I decide to reduce the number of servers to 2. Now it's a bit better, the load is only around 5. iotop shows me a read speed around 3 MB/s; there is a Class 10 UHS card (10-15 MB/s read, 9-5 MB/s write I guess). Tested on a Pi 1 Model B with 512MB RAM. Can somebody reproduce this behavior? Thanks a lot. Regards Basti
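Lowering RPCNFSDCOUNT is just an edit to the file quoted above plus a service restart. A sketch that rehearses the edit on a scratch copy first (the real path is /etc/default/nfs-kernel-server; the restart command assumes the sysvinit-style service wrapper used on Raspbian/Debian of that era):

```shell
# Rehearse the edit on a scratch copy of the config file.
conf=$(mktemp)
printf '# Number of servers to start up\nRPCNFSDCOUNT=8\n' > "$conf"
sed -i 's/^RPCNFSDCOUNT=.*/RPCNFSDCOUNT=2/' "$conf"
grep '^RPCNFSDCOUNT' "$conf"
rm -f "$conf"
# For real: apply the same sed to /etc/default/nfs-kernel-server, then
#   service nfs-kernel-server restart
```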
Re: NFS on Raspberry Pi high load
The problem is not the speed of 3 MB/s, it's the load of 12 and more.

On 19.06.2015 14:03, Sven Hartge wrote: basti black.flederm...@arcor.de wrote: iotop show me a read speed around 3 MB/s, there is a Class 10 UHS card (10-15 MB/s read, 9-5 MB/s write I guess). More than 3MByte/s is not really achievable with a Pi-1, because the CPU is very weak and the Ethernet-Chip is attached via USB. Under the best conditions you may be able to transfer up to 45MBit/s, but a maximum transfer rate of about 35MBit/s is normal. Regards, Sven.
Re: NFS on Raspberry Pi high load
Hi. On Fri, 19 Jun 2015 20:38:12 +0200 Sven Hartge s...@svenhartge.de wrote: Reco recovery...@gmail.com wrote: On Fri, Jun 19, 2015 at 02:47:20PM +0200, Petter Adsen wrote: On Fri, 19 Jun 2015 14:09:45 +0200 basti black.flederm...@arcor.de wrote: On 19.06.2015 14:03, Sven Hartge wrote: basti black.flederm...@arcor.de wrote: iotop show me a read speed around 3 MB/s, there is a Class 10 UHS card (10-15 MB/s read, 9-5 MB/s write I guess). More than 3MByte/s is not really achievable with a Pi-1, because the CPU is very weak and the Ethernet-Chip is attached via USB. Under the best conditions you may be able to transfer up to 45MBit/s, but a maximum transfer rate of about 35MBit/s is normal. The Problem is not the speed of 3 MB/s it's the load of 12 and more. The load is so high because USB is very CPU-intensive. If you were to use the on-board Ethernet, you would not see such a high load. What? Are you serious? I have this Nokia N900 lying behind me which is connected by IP-via-USB (aka usbnet aka g_ether) and with an order-of-magnitude slower ARM CPU it reliably shows 40mbps with no noticeable load. Maybe the USB hardware implementation is better in the N900? The one in the Pi is quite bad and finicky.

I happen to have a Pi too. Not that I need an NFS server on it, NFS client is sufficient for my needs, but still.

In addition to that, data transfer via USB is quite CPU-intensive, as Petter wrote, and overwhelms the single CPU core of the Pi if it needs to drive the SD card at the same time.

Hm. I plugged an Ethernet cable into it, read and wrote a big file via NFS. Got consistent 50mbps. According to iperf, I could go as high as 82.2 mbps. Not the fair gigabit I have on this LAN, but close to the theoretical 100mbit limit of the NIC. During the NFS test, two kernel threads were the worst CPU consumers, kworker/0 and ksoftirqd/0. During the iperf test, the worst CPU consumers were iperf itself and ksoftirqd/0. According to /proc/interrupts, the top interrupt consumer was IRQ32, which is: dwc_otg, dwc_otg_pcd, dwc_otg_hcd:usb1

On the other hand, a simple cat /dev/zero to a file test provided me with 100% iowait, but no actual CPU usage. Perf mysteriously failed on me. It did record something, but 'perf report' refused to show me anything. Must be something with this custom Raspbian kernel. So, I agree that using the Pi's Ethernet interface eats CPU, but saying 'USB eats CPU' is oversimplifying things quite a bit. Specifically if NFS is involved. What I suspect was happening with your NFS server is the multiple knfsd threads in D-state (i.e. blocked by iowait from the slow MMC card) *plus* these USB Ethernet interrupts. I'd start with lowering the knfsd count.

If the source or destination of the transmitted data is on an USB medium it gets even worse because all USB ports share the same root port on the SoC.

I'm too lazy to check it, so I'll trust you on this.

Besides: I always found the load on Linux NFS servers to be higher than on a Samba server with equal throughput. I guess the calculation of the load is different for the NFS kernel server process than for userland file services.

I have to trust you on this too. Never bothered myself with inferior network filesystems (Samba) due to the existence of a superior one (NFS4). And, speaking of those network filesystems: have you tried to use iSCSI to do whatever you're trying to do with NFS? What about a simple sshfs?

Reco
Re: NFS on Raspberry Pi high load
Reco recovery...@gmail.com wrote: On Fri, 19 Jun 2015 20:38:12 +0200 Sven Hartge s...@svenhartge.de wrote: What I suspect was happening with your NFS server is the multiple knfsd threads in D-state (i.e. blocked by iowait from the slow MMC card) *plus* these USB Ethernet interrupts. I'd start with lowering the knfsd count.

That would also be my first step. I would lower RPCNFSDCOUNT to 2.

If the source or destination of the transmitted data is on an USB medium it gets even worse because all USB ports share the same root port on the SoC. I'm too lazy to check it, so I'll trust you on this.

Data enters the SoC through USB from the ethernet chip and then is pushed out on the same shared bus to the USB disk. This absolutely kills the Pi.

Besides: I always found the load on Linux NFS servers to be higher than on a Samba server with equal throughput. I guess the calculation of the load is different for the NFS kernel server process than for userland file services. I have to trust you on this too. Never bothered myself with inferior network filesystems (Samba) due to the existence of a superior one (NFS4).

Well, if you want to serve files to many different operating systems, you cannot always use the tools you want if you are not able to control the protocol the client wants or needs to speak.

And, speaking of those network filesystems. Have you tried to use iSCSI to do whatever you're trying to do with NFS? What about a simple sshfs?

sshfs couples the problems of the USB network port with the slow ARM CPU doing crypto stuff. You won't win any speed records with that combination. Regards, Sven. -- Sigmentation fault. Core dumped.
Re: NFS on Raspberry Pi high load
Reco recovery...@gmail.com wrote: On Fri, Jun 19, 2015 at 02:47:20PM +0200, Petter Adsen wrote: On Fri, 19 Jun 2015 14:09:45 +0200 basti black.flederm...@arcor.de wrote: On 19.06.2015 14:03, Sven Hartge wrote: basti black.flederm...@arcor.de wrote: iotop show me a read speed around 3 MB/s, there is a Class 10 UHS card (10-15 MB/s read, 9-5 MB/s write I guess). More than 3MByte/s is not really achievable with a Pi-1, because the CPU is very weak and the Ethernet-Chip is attached via USB. Under the best conditions you may be able to transfer up to 45MBit/s, but a maximum transfer rate of about 35MBit/s is normal. The Problem is not the speed of 3 MB/s it's the load of 12 and more. The load is so high because USB is very CPU-intensive. If you were to use the on-board Ethernet, you would not see such a high load. What? Are you serious? I have this Nokia N900 lying behind me which is connected by IP-via-USB (aka usbnet aka g_ether) and with an order-of-magnitude slower ARM CPU it reliably shows 40mbps with no noticeable load.

Maybe the USB hardware implementation is better in the N900? The one in the Pi is quite bad and finicky. In addition to that, data transfer via USB is quite CPU-intensive, as Petter wrote, and overwhelms the single CPU core of the Pi if it needs to drive the SD card at the same time. If the source or destination of the transmitted data is on an USB medium it gets even worse, because all USB ports share the same root port on the SoC.

Besides: I always found the load on Linux NFS servers to be higher than on a Samba server with equal throughput. I guess the calculation of the load is different for the NFS kernel server process than for userland file services. Regards, Sven. -- Sigmentation fault. Core dumped.
Re: Re: MySQL slow and high load with Debian Wheezy (was: [whole mail text])
Found this thread searching for a solution to my problem (which sounds similar). My solution was barrier=0 in /etc/fstab; see https://wiki.archlinux.org/index.php/Ext4 Specifically, my problem was that loading large mysql files took forever and would often end with mysql losing the connection (local mysql daemon). Smaller files were still slow (especially in the context of running unit tests that do lots of mysql queries via sql files). iostat (apt-get install sysstat; iostat -x -d sda 5;) showed very high %util, but very low writes. Interestingly, when I pointed the mysql server to store the data on slower usb-mounted drives I had better performance. Anyway, my old workstation running squeeze gives good performance; Wheezy gave bad performance until barrier=0 was added. Before using barrier=0 I played around with stuff such as wrapping sql in set autocommit=0; and commit; I also played with innodb mysql server settings. But, for me, barrier=0 is awesome! (Using a laptop, so I have a battery in case power fails...). Daniel
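For reference, barrier=0 is just an extra mount option in /etc/fstab. An illustrative entry (the device and the options other than barrier=0 are placeholders and must match your own system; disabling write barriers trades crash safety for speed, so the battery/UPS caveat above matters):

```shell
# Illustrative /etc/fstab entry -- adjust device and mount point to yours.
# <file system>  <mount point>  <type>  <options>                             <dump> <pass>
# /dev/sda1      /              ext4    defaults,errors=remount-ro,barrier=0  0      1
```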
Re: High Load/Interrupts on Wheezy
On Tue, Jul 02, 2013 at 08:54:06PM -0400, Will Platnick wrote: I am experiencing some issues with load after upgrading some of my Squeeze boxes to Wheezy. I have 7 app servers, all with identical hardware with identical packages and code. I upgraded one of my boxes to wheezy, along with the custom packages we use for Python, PHP, etc… Same versions of the software, just built on Wheezy instead of Squeeze. My problem is that my Wheezy boxes have a load of over 3 and are not staying up during our peak time, whereas our squeeze boxes have a load of less than 1. The interesting part is that despite the high load, my wheezy boxes are actually performing quite well, and are outperforming my squeeze boxes by 2-3 ms. Nevertheless, the high load is giving us cause for concern and is stopping us from migrating completely, and we're wondering if anybody else is seeing the same thing or can give us some assistance on where to go from here. I believe I have tracked down the issue with our load to be an interrupt issue. My interrupts on wheezy are way higher. CPU, I/O, Memory and Context Switches are all the same (measured with top, atop, iotop, vmstat). It doesn't appear to be a hardware issue, as I deployed wheezy and our code base to a different and faster motherboard/cpu combo, and the issue remained. The item that stands out is that my Rescheduling Interrupts and timer are interrupting like crazy on wheezy compared to squeeze. Here is my output of total interrupts on Squeeze vs Wheezy for two different machines, rebooted and placed into service at the exact same time, with traffic distributed to them via round robin, so it should be fairly equal. Rescheduling Interrupts: 4109580 on Wheezy vs 67418 on Squeeze. Timer: 504238 on Wheezy vs 50 on Squeeze. Thoughts? Suggestions?

This was the first search result for Rescheduling Interrupts. The advice should apply to Debian equally well. https://help.ubuntu.com/community/ReschedulingInterrupts
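The per-CPU columns of /proc/interrupts make totals like those hard to eyeball. A sketch that sums one counter row across CPU columns, fed with a made-up RES line in the same layout (point the awk at the real /proc/interrupts on the affected box instead):

```shell
# Sum a Rescheduling-interrupts row across CPU columns
# (the sample numbers are invented; the layout mirrors /proc/interrupts).
printf 'RES: 4109580 3000000 Rescheduling interrupts\n' |
  awk '{ s = 0; for (i = 2; i <= NF; i++) if ($i ~ /^[0-9]+$/) s += $i; print s }'
```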
Re: High Load/Interrupts on Wheezy
I followed those. I got nothing. — Sent from Mailbox for iPhone On Wed, Jul 3, 2013 at 5:24 AM, Darac Marjal mailingl...@darac.org.uk wrote: On Tue, Jul 02, 2013 at 08:54:06PM -0400, Will Platnick wrote: I am experiencing some issues with load after upgrading some of my Squeeze boxes to Wheezy. I have 7 app servers, all with identical hardware with identical packages and code. I upgraded one of my boxes to wheezy, along with the custom packages we use for Python, PHP, etc… Same versions of the software, just built on Wheezy instead of Squeeze. My problem is that my Wheezy boxes have a load of over 3 and are not staying up during our peak time, whereas our squeeze boxes have a load of less than 1. The interesting part, is that despite the high load, my wheezy boxes are actually performing quite well, and are outperforming my squeeze boxes by 2-3 ms. Never the less, the high load is giving us cause for concern and is stopping us from migrating completely, and we're wondering if anybody else is seeing the same thing or can give us some assistance on where to go from here. I believe I have tracked down the issue with our load to be an interrupt issue. My interrupts on wheezy are way higher. CPU, I/O, Memory and Context Switches are all the same (measured with top, atop, iotop, vmstat). It doesn't appear to be a hardware issue, as I deployed wheezy and our code base to a different and faster motherboard/cpu combo, and the issue remained. The items that stands out is that my Rescheduling Interrupts and timer are interrupting like crazy on wheezy compared to squeeze. Here is my output of total interrupts on Squeeze vs Wheezy for two different machines, rebooted and placed into service at the exact same time, with traffic distributed to them via round robin, so it should be fairly equal. Rescheduling Interrupts: 4109580 on Wheezy vs 67418 on Squeeze. Timer: 504238 on Wheezy vs 50 on Squeeze. Thoughts? Suggestions? 
This was the first search result for Rescheduling Interrupts. The advice should apply to Debian equally well. https://help.ubuntu.com/community/ReschedulingInterrupts
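Since the counters in /proc/interrupts only ever grow from boot, absolute totals mostly reflect uptime; sampling twice and diffing gives a rate that is easier to compare across machines. A minimal sketch, assuming the relevant line's label contains "Rescheduling", and summing every numeric column so it works for any CPU count:

```shell
# Sum the per-CPU counters on the rescheduling-IPI line.  Numeric fields
# are assumed to be per-CPU counts; everything else is label text.
sum_resched() {
  awk '/[Rr]escheduling/ {
    s = 0
    for (i = 2; i <= NF; i++) if ($i + 0 == $i) s += $i
    print s
  }' /proc/interrupts
}

# Sample twice, 5 seconds apart, and print the per-second rate.
a=$(sum_resched); sleep 5; b=$(sum_resched)
echo "rescheduling IPIs/sec: $(( (${b:-0} - ${a:-0}) / 5 ))"
```

Run on both a Squeeze and a Wheezy box under comparable traffic, the two rates are directly comparable even if the machines booted at different times.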
Re: High Load/Interrupts on Wheezy
Something else I just noticed now that I'm on a screen high enough to show all of /proc/interrupts on one line: non-maskable interrupts are happening on Wheezy whereas they didn't on Squeeze. Additionally, it seems Non-maskable interrupts and Performance monitoring are the same value all the time.

-- Will Platnick
Sent with Airmail

On July 3, 2013 at 7:17:04 AM, Will Platnick (wplatn...@gmail.com) wrote:

I followed those. I got nothing.

— Sent from Mailbox for iPhone
Re: High Load/Interrupts on Wheezy
More troubleshooting steps: Built and installed the latest 3.10 kernel, no change in interrupts. Built and installed the latest 2.6.32 kernel, and I am back to Squeeze-level loads, with no high timer, rescheduling, non-maskable or performance interrupts present. So, does anybody have any idea what changed in the 3.2+ series that could cause this?

On Wed, Jul 3, 2013 at 8:06 AM, Will Platnick wplatn...@gmail.com wrote:

Something else I just noticed now that I'm on a screen high enough to show all of /proc/interrupts on one line: non-maskable interrupts are happening on Wheezy whereas they didn't on Squeeze. Additionally, it seems Non-maskable interrupts and Performance monitoring are the same value all the time.

-- Will Platnick

On July 3, 2013 at 7:17:04 AM, Will Platnick (wplatn...@gmail.com) wrote:

I followed those. I got nothing.

— Sent from Mailbox for iPhone
Re: High Load/Interrupts on Wheezy
On 04/07/13 00:30, Will Platnick wrote:

More troubleshooting steps: Built and installed latest 3.10 kernel, no change in interrupts. Built and installed latest 2.6.32 kernel, and I am back to Squeeze level loads and no high timer, rescheduling, non-maskable or performance interrupts are present. So, does anybody have any idea what changed in the 3.2+ series that could cause this?

Nope. But I've been experiencing the same thing, so I'm following your posts. I've set up two identical LAMP servers hosting identical sites using Virtualmin: one pure, standard (untweaked) Squeeze, the other pure, standard (untweaked) Wheezy built from the same package list. The Squeeze one runs fine in 256MB of RAM; the Wheezy one takes nearly three times as long to boot, likewise to shut down, *even* when given 512MB of RAM. Identical virtual machine setups. I've logged the output of ps aux and will compare them tomorrow night - if I find anything obvious I'll post them. Everything else is similar to your results.

On Wed, Jul 3, 2013 at 8:06 AM, Will Platnick wplatn...@gmail.com wrote:

Something else I just noticed now that I'm on a screen high enough to show all of /proc/interrupts on one line: snipped

Kind regards
-- Iceweasel/Firefox/Chrome/Chromium/Iceape/IE extensions for finding answers to Debian questions:
https://addons.mozilla.org/en-US/firefox/collections/Scott_Ferguson/debian/
Re: High Load/Interrupts on Wheezy
Same issue here exactly, and I have noticed this since upgrading to Wheezy. We have also delayed upgrading the rest of our servers until this gets fixed.

On Wed, Jul 3, 2013 at 10:45 AM, Scott Ferguson scott.ferguson.debian.u...@gmail.com wrote: snipped
Re: High Load/Interrupts on Wheezy
So, since there seem to be a few of us having this issue, are there any Debian or Linux kernel engineers out there who are willing to help? Is this the best place for that?

On Wed, Jul 3, 2013 at 3:50 PM, David Mckisick mckis...@gmail.com wrote: snipped
High Load/Interrupts on Wheezy
I am experiencing some issues with load after upgrading some of my Squeeze boxes to Wheezy. I have 7 app servers, all with identical hardware with identical packages and code. I upgraded one of my boxes to Wheezy, along with the custom packages we use for Python, PHP, etc. Same versions of the software, just built on Wheezy instead of Squeeze. My problem is that my Wheezy boxes have a load of over 3 and are not staying up during our peak time, whereas our Squeeze boxes have a load of less than 1. The interesting part is that despite the high load, my Wheezy boxes are actually performing quite well, and are outperforming my Squeeze boxes by 2-3 ms. Nevertheless, the high load is giving us cause for concern and is stopping us from migrating completely, and we're wondering if anybody else is seeing the same thing or can give us some assistance on where to go from here.

I believe I have tracked down the issue with our load to be an interrupt issue. My interrupts on Wheezy are way higher. CPU, I/O, memory and context switches are all the same (measured with top, atop, iotop, vmstat). It doesn't appear to be a hardware issue, as I deployed Wheezy and our code base to a different and faster motherboard/CPU combo, and the issue remained. The items that stand out are that my "Rescheduling interrupts" and "timer" are interrupting like crazy on Wheezy compared to Squeeze. Here is my output of total interrupts on Squeeze vs Wheezy for two different machines, rebooted and placed into service at the exact same time, with traffic distributed to them via round robin, so it should be fairly equal.

Rescheduling interrupts: 4109580 on Wheezy vs 67418 on Squeeze.
Timer: 504238 on Wheezy vs 50 on Squeeze.

Thoughts? Suggestions?
Here is my squeeze box interrupts:

# sudo cat /proc/interrupts | awk '{ print $18, $19, $2+$3+$4+$5+$6+$7+$8+$9+$10+$11+$12+$13+$14+$15+$16+$17 }'
0 IO-APIC-edge timer 50
IO-APIC-edge i8042 2
IO-APIC-edge serial 8
IO-APIC-edge rtc0 1
IO-APIC-fasteoi acpi 0
IO-APIC-edge i8042 4
IO-APIC-fasteoi uhci_hcd:usb2 0
IO-APIC-fasteoi ehci_hcd:usb1, 2
IO-APIC-fasteoi ata_piix, 24014
IO-APIC-fasteoi uhci_hcd:usb4 48
IO-APIC-fasteoi ehci_hcd:usb3, 0
PCI-MSI-edge eth0 1
PCI-MSI-edge eth0-TxRx-0 919924
PCI-MSI-edge eth0-TxRx-1 1206377
PCI-MSI-edge eth0-TxRx-2 1208344
PCI-MSI-edge eth0-TxRx-3 817225
PCI-MSI-edge eth0-TxRx-4 1165734
PCI-MSI-edge eth0-TxRx-5 1314252
PCI-MSI-edge eth0-TxRx-6 998115
PCI-MSI-edge eth0-TxRx-7 1229384
PCI-MSI-edge eth1 1
PCI-MSI-edge eth1-TxRx-0 4776
PCI-MSI-edge eth1-TxRx-1
PCI-MSI-edge eth1-TxRx-2 5557
PCI-MSI-edge eth1-TxRx-3 5344
PCI-MSI-edge eth1-TxRx-4 5827
PCI-MSI-edge eth1-TxRx-5 5060
PCI-MSI-edge eth1-TxRx-6 4078
PCI-MSI-edge eth1-TxRx-7 4317
PCI-MSI-edge ioat-msix 3
PCI-MSI-edge ioat-msix 3
PCI-MSI-edge ioat-msix 3
PCI-MSI-edge ioat-msix 3
PCI-MSI-edge ioat-msix 3
PCI-MSI-edge ioat-msix 3
PCI-MSI-edge ioat-msix 3
PCI-MSI-edge ioat-msix 3
Non-maskable interrupts 0
Local timer 3968846
Spurious interrupts 0
Performance monitoring 0
Performance pending 0
Rescheduling interrupts 67418
Function call 16404
TLB shootdowns 33915
Thermal event 0
Threshold APIC 0
Machine check 0
Machine check 128

Here is my wheezy interrupts:

# sudo cat /proc/interrupts | awk '{ print $18, $19, $2+$3+$4+$5+$6+$7+$8+$9+$10+$11+$12+$13+$14+$15+$16+$17 }'
IO-APIC-edge timer 504238
IO-APIC-edge i8042 3
IO-APIC-edge serial 12
IO-APIC-edge rtc0 1
IO-APIC-fasteoi acpi 0
IO-APIC-edge i8042 4
IO-APIC-fasteoi uhci_hcd:usb3 0
IO-APIC-fasteoi ehci_hcd:usb1, 2
IO-APIC-fasteoi ata_piix, 21189
IO-APIC-fasteoi uhci_hcd:usb4 47
IO-APIC-fasteoi ehci_hcd:usb2, 0
PCI-MSI-edge eth0 1
PCI-MSI-edge eth0-TxRx-0 1506134
PCI-MSI-edge eth0-TxRx-1 1102085
PCI-MSI-edge eth0-TxRx-2 1399087
PCI-MSI-edge eth0-TxRx-3 1123149
PCI-MSI-edge eth0-TxRx-4 849678
PCI-MSI-edge eth0-TxRx-5 1428705
PCI-MSI-edge eth0-TxRx-6 897420
PCI-MSI-edge eth0-TxRx-7 1321820
PCI-MSI-edge eth1 1
PCI-MSI-edge eth1-TxRx-0 4290
PCI-MSI-edge eth1-TxRx-1 4217
PCI-MSI-edge eth1-TxRx-2 3685
PCI-MSI-edge eth1-TxRx-3 4081
PCI-MSI-edge eth1-TxRx-4 5532
PCI-MSI-edge eth1-TxRx-5 6604
PCI-MSI-edge eth1-TxRx-6 3996
PCI-MSI-edge eth1-TxRx-7 4560
PCI-MSI-edge ioat-msix 3
PCI-MSI-edge ioat-msix 3
PCI-MSI-edge ioat-msix 3
PCI-MSI-edge ioat-msix 3
PCI-MSI-edge ioat-msix 3
PCI-MSI-edge ioat-msix 3
PCI-MSI-edge ioat-msix 3
PCI-MSI-edge ioat-msix 3
Non-maskable interrupts 3847
Local timer 3846061
Spurious interrupts 0
Performance monitoring 3847
IRQ work 0
Rescheduling interrupts 4109580
Function call 13442
TLB shootdowns 1745
Thermal event 0
Threshold APIC 0
Machine check 0
Machine check 128
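The awk one-liner used above hard-codes 16 per-CPU columns ($2 through $17) and the label positions ($18, $19), so it silently breaks on a machine with a different CPU count. A sketch of a column-count-independent version, under the same assumption that numeric fields are per-CPU counters and everything else is label text:

```shell
# Sum every numeric field on each /proc/interrupts line and print the
# total next to whatever non-numeric text remains (the IRQ label).
# NR > 1 skips the "CPU0 CPU1 ..." header row.
awk 'NR > 1 {
  total = 0; label = ""
  for (i = 2; i <= NF; i++) {
    if ($i + 0 == $i) total += $i        # numeric: a per-CPU counter
    else label = label " " $i            # text: part of the description
  }
  printf "%10d %s\n", total, label
}' /proc/interrupts
```

The same output format sorts cleanly with `sort -rn` to put the hottest interrupt sources first.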
Re: MySQL slow and high load with Debian Wheezy (was: [whole mail text])
Hi Andrei,

I'm not sure how KMail managed to reply from just the subject, but I strongly second Lisi's notion of putting legible text into the mail body and using a descriptive, short enough subject for the mail.

On Thursday, 23 May 2013, 11:15:29, Andrei Hristow wrote:

Hi, I have a serious problem with Debian 7. The system is very slow; working with MySQL databases is slow and painful. On Debian 6.0.7 the system is very fast and stable; it works on ext3, and on ext4 on Debian 7. I have 8 GB of RAM and use the AMD64 version. The CPU is a 2.133 GHz Intel Core 2 Duo. The mainboard is a Gigabyte GA-EP45-UD3R, socket 775. The hard drive uses the ata_piix driver on Debian 6 and Debian 7. Where could the problem be? Could it be because of ext4? Or should I use the i386 version of Debian 7 with the PAE kernel? The difference in performance between Debian 6 and Debian 7 is huge! I use Wine and the system load reached 10.0. What are your tips? What is better to use: Debian 7 i386 or Debian 7 AMD64? And is it better to use ext3 or ext4?

Lots of information is missing in there. What does "slow" mean? How do you notice it's slow? Do you have any numbers? What is the workload? How are memory, CPU, disk usage and so on?

But just a rough guess: are you by chance using the -486 kernel? That will give you *one* CPU and I think a maximum of 1 GB of RAM (not sure about the latter). With any current x86 hardware, 686-pae is suitable for 32-bit; for 64-bit it's amd64.

Ciao,
-- Martin 'Helios' Steigerwald - http://www.Lichtvoll.de GPG: 03B0 0D6C 0040 0710 4AFA B82F 991B EAAC A599 84C7

-- To UNSUBSCRIBE, email to debian-user-requ...@lists.debian.org with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org Archive: http://lists.debian.org/3206881.4rnBuzsIDg@merkaba
Intermittent high load average after upgrade to lenny / 2.6.26-2 ?
Hi Chaps, I've upgraded a server running our database connection pool software from etch on 2.6.18 to lenny on 2.6.26 and I'm now seeing intermittent high load averages. I don't see anything CPU- or IO-bound on the machine using top and vmstat, and I'm absolutely baffled by it. Normal load average is below 1, but every so often, totally out of the blue, I'll see it jump up to 20! This didn't happen on 2.6.18; what else should I look at before suspecting it's a bug somewhere in CFS? Ta, Glyn
Re: Regular high load peaks on servers
Julien wrote:

Hi, For quite a long time now, we have observed the same phenomenon on three web servers we have in two different places. They regularly have high load peaks, up to 20 to 50. These peaks happen very regularly (from once a day to several per hour), and we can't explain why. It doesn't seem to be linked to a special increase in traffic or web requests. Two of the three servers are load-balanced web frontends running apache with nfs mounts. The third is an autonomous server with web, mail, mysql... services, without nfs. All three run Debian Lenny. I know nobody could really solve this problem without access to the machines and logs, but I wonder if someone has already experienced this sort of regular load peak. Thanks in advance for any help,

Try installing sysstat, and use the iostat utility to check your disks' usage during those peaks. High load is caused by high CPU utilization and by I/O wait. Also try stopping cron for several hours, to be sure that no cron job causes the load.

G.
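To follow the advice above, it helps to log the load average with timestamps so a peak can be lined up against cron entries and iostat output after the fact. A minimal sketch (the sample count and log file name are arbitrary; iostat itself comes from the sysstat package mentioned above and would be logged alongside):

```shell
# Append a timestamped 1-minute load average sample once per second.
# Field 1 of /proc/loadavg is the 1-minute load average.
log_load() {
  n=$1                                   # number of samples to take
  while [ "$n" -gt 0 ]; do
    printf '%s %s\n' "$(date '+%F %T')" "$(cut -d' ' -f1 /proc/loadavg)"
    n=$((n - 1))
    sleep 1
  done
}

log_load 3 >> loadavg-samples.log
```

In practice this would run from a long-lived shell (or with a much larger count) across a few expected peak windows; matching the timestamps against /etc/crontab and /etc/cron.d entries quickly confirms or rules out cron.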
Regular high load peaks on servers
Hi, For quite a long time now, we have observed the same phenomenon on three web servers we have in two different places. They regularly have high load peaks, up to 20 to 50. These peaks happen very regularly (from once a day to several per hour), and we can't explain why. It doesn't seem to be linked to a special increase in traffic or web requests. Two of the three servers are load-balanced web frontends running apache with nfs mounts. The third is an autonomous server with web, mail, mysql... services, without nfs. All three run Debian Lenny. I know nobody could really solve this problem without access to the machines and logs, but I wonder if someone has already experienced this sort of regular load peak. Thanks in advance for any help, -- Julien
etch testing bug 341055 spamassassin and exim - high load
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=341055
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=4590

Anyone have a workaround? The --round-robin option from the above link has lessened the issue; however, it is still creating a load average of over 12.0! I tried downgrading to sarge/stable for sa, sa-exim and exim-daemon-heavy and ended up with "TLS cache read failed". What is that, and how can the cache be fixed? Just recently did an apt-get update; apt-get dist-upgrade on etch. Was already on etch. This brought in: spamassassin 3.1.0a-2, exim-daemon-heavy 4.61-1, sa-exim 4.2.1-2
Re: Woody: High load average, but no processes hogging...
On Tue, Jun 07, 2005 at 02:54:37PM +1200, Simon wrote: [snip] I have noticed high(ish) load averages (currently 2.08, last week it was 17!!), but there are no processes hogging the CPU, nor are we using any [snip]

Check the output of ps(1) and look for processes in the 'D' state. Also, check I/O with: vmstat 5 (don't forget to discard the first line of info from that command.)

-- asg
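The 'D'-state check suggested above is easy to script; a small sketch:

```shell
# List processes currently in uninterruptible sleep ('D' state).  These
# count toward the load average without consuming CPU, so anything that
# shows up here repeatedly is usually stuck waiting on I/O (disk, NFS).
ps -eo state,pid,comm | awk '$1 ~ /^D/ { print }'
```

Running it a few times in a row matters more than a single snapshot: a process that is momentarily in D is normal, one that stays there is the suspect.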
Re: Woody: High load average, but no processes hogging...
Adam Garside wrote:

I have noticed high(ish) load averages (currently 2.08, last week it was 17!!), but there are no processes hogging the CPU, nor are we using any [snip]

Check the output of ps(1) and look for processes in the 'D' state.

Nothing there. All seems fine.

Also, check I/O with: vmstat 5 (don't forget to discard the first line of info from that command.)

The load average is currently at 2.12. It looked a bit much cut-and-pasted, so here is the result: http://gremin.orcon.net.nz/vmstat.html
gateway pppd, syslog high load
Hello!

I recently set up a gateway (ADSL PPTP, iptables). My problem is that the load is too high.

--- top ---
11:12:49 up 12 days, 19:01, 1 user, load average: 1.65, 1.45, 1.42
29 processes: 24 sleeping, 4 running, 1 zombie, 0 stopped
CPU states: 75.6% user, 24.4% system, 0.0% nice, 0.0% idle
Mem: 192188K total, 124172K used, 68016K free, 46264K buffers
Swap: 248968K total, 0K used, 248968K free, 34356K cached

  PID USER   PRI NI SIZE  RSS SHARE STAT %CPU %MEM   TIME COMMAND
  348 root    16  0  936  932   772 R    57.8  0.4  3841m pppd
16821 root    12  0  596  596   488 R    42.2  0.3 244:56 syslogd
    1 root     8  0  484  484   424 S     0.0  0.2   0:05 init
    2 root     9  0    0    0     0 SW    0.0  0.0   0:00 keventd
    3 root    19 19    0    0     0 SWN   0.0  0.0   0:00 ksoftirqd_CPU0
    4 root     9  0    0    0     0 SW    0.0  0.0   0:00 kswapd
    5 root     9  0    0    0     0 SW    0.0  0.0   0:00 bdflush
    6 root     9  0    0    0     0 SW    0.0  0.0   0:00 kupdated
    7 root     9  0    0    0     0 SW    0.0  0.0   0:01 kjournald
  163 root     9  0 1084 1084   408 S     0.0  0.5   0:00 klogd
  210 root     9  0 1208 1208  1072 S     0.0  0.6   0:00 sshd
  213 daemon   9  0  580  580   504 S     0.0  0.3   0:00 atd
  216 root     8  0  684  684   564 S     0.0  0.3   0:00 cron
  218 root     9  0 3628 3628  1324 S     0.0  1.8   2:20 ddclient
  220 root     9  0  468  468   408 S     0.0  0.2   0:00 getty
  221 root     9  0  468  468   408 S     0.0  0.2   0:00 getty
--- top ---

Software: OS: Debian 3.0 r2, kernel: 2.4.25.

The hardware shouldn't be the problem, though: 256 MB and an Athlon 1800+ are actually oversized for a router, and the network isn't too big either, about 10 clients. I also noticed that the load was completely normal for the first 3 days. When I set top's refresh rate to 0.1, it becomes apparent that syslog keeps using 99% of the CPU in short bursts.

Regards, Alex
Re: gateway pppd, syslog high load
On Wed, 14 Apr 2004 09:19:12 +0200, Alex Handle [EMAIL PROTECTED] wrote:

The hardware shouldn't be the problem: 256 MB and an Athlon 1800+ are actually oversized for a router, and the network isn't too big either, about 10 clients. I also noticed that the load was completely normal for the first 3 days. When I set top's refresh rate to 0.1, it becomes apparent that syslog keeps using 99% of the CPU in short bursts.

Hmm... disk or partition full, so that syslogd can't write any more?

Sunny greetings, Timo.
Re: gateway pppd, syslog high load
No, I have a 20 GB disk and only about 400 MB are on it.

On Wednesday 14 April 2004 09:49, Timo Eckert wrote: [snip]
Re: gateway pppd, syslog high load
I already had the same problem on another machine; I don't think it's the hardware.

On Wednesday 14 April 2004 09:49, Timo Eckert wrote: [snip]
Re: gateway pppd, syslog high load
On Wed, 14 Apr 2004 10:12:08 +0200, Alex Handle [EMAIL PROTECTED] wrote:

No, I have a 20 GB disk and only about 400 MB are on it.

Have you tried restarting syslogd?

Sunny greetings, Timo.
Re: gateway pppd, syslog high load
I've stopped syslog now and the load is going down slightly: load average: 1.04, 1.29, 1.38. After a restart the load goes back up... Maybe it's the dhcpd; it writes quite a lot to daemon.log:

-- /var/log/daemon.log --
Apr 13 23:47:28 router dhcpd-2.2.x: DHCPREQUEST for 192.168.2.14 from 00:0b:6a:18:a6:92 via eth0
Apr 13 23:47:28 router dhcpd-2.2.x: DHCPACK on 192.168.2.14 to 00:0b:6a:18:a6:92 via eth0
Apr 13 23:50:39 router dhcpd-2.2.x: DHCPREQUEST for 192.168.2.13 from 00:0b:6a:2a:8f:31 via eth0
Apr 13 23:50:39 router dhcpd-2.2.x: DHCPACK on 192.168.2.13 to 00:0b:6a:2a:8f:31 via eth0
Apr 13 23:51:06 router dhcpd-2.2.x: DHCPREQUEST for 192.168.2.12 from 00:0b:6a:2a:94:00 via eth0
Apr 13 23:51:06 router dhcpd-2.2.x: DHCPACK on 192.168.2.12 to 00:0b:6a:2a:94:00 via eth0
Apr 13 23:52:29 router dhcpd-2.2.x: DHCPREQUEST for 192.168.2.14 from 00:0b:6a:18:a6:92 via eth0
Apr 13 23:52:29 router dhcpd-2.2.x: DHCPACK on 192.168.2.14 to 00:0b:6a:18:a6:92 via eth0
-- /var/log/daemon.log --

On Wednesday 14 April 2004 10:23, Timo Eckert wrote: [snip]
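If dhcpd's chatter is indeed the cause, one standard knob on sysklogd is the "-" prefix in /etc/syslog.conf: it stops syslogd from syncing the file to disk after every message, and syncing on every line is a common reason for syslogd burning CPU under a chatty daemon. A sketch (dhcpd logs to the daemon facility, as the excerpt above shows; note that on many Debian installs the daemon.log entry already carries the dash, in which case the sheer log volume, not syncing, is the cost):

```
# /etc/syslog.conf -- the leading "-" tells syslogd not to sync the
# file after every message written to it.
daemon.*    -/var/log/daemon.log
```

After editing, restart the daemon (on this era of Debian: /etc/init.d/sysklogd restart) and watch whether syslogd's CPU bursts in top disappear.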
Re: gateway pppd, syslog high load
On Wed, 14 Apr 2004 10:45:59 +0200, Alex Handle [EMAIL PROTECTED] wrote:

I've stopped syslog now and the load is going down slightly: load average: 1.04, 1.29, 1.38.

Well, still above 1... What does 'dmesg' say? Any errors?

Sunny greetings, Timo.
Re: gateway pppd, syslog high load
I also noticed that ppp and pptp are each started twice:

router:~# ps aux | grep pptp
root 16556 0.0 0.2 1316 524 ?     S Apr13   0:04 /usr/sbin/pptp
root 16558 0.0 0.2 1316 552 ?     S Apr13   0:00 /usr/sbin/pptp
router:~# ps aux | grep ppp
root   348 21.1 0.4 2008 932 ?    R Apr01 3924:37 /usr/sbin/pppd /dev/pts/0 38400 persist maxfail 0
root 16560  0.0 0.4 2008 916 pts/1 S Apr13  0:00 /usr/sbin/pppd /dev/pts/1 38400 persist maxfail 0

Is that even normal...?

On Wednesday 14 April 2004 10:23, Timo Eckert wrote: [snip]
Re: gateway pppd, syslog high load
I found this in kern.log; it doesn't look good either:

-- kern.log --
Apr 12 17:50:40 router kernel: eth1: Oversized Ethernet frame spanned multiple buffers, entry 0xd length 0 status 0400!
Apr 12 17:50:40 router kernel: eth1: Oversized Ethernet frame cbaa90d0 vs cbaa90d0.
Apr 12 17:50:40 router kernel: eth1: Oversized Ethernet frame spanned multiple buffers, entry 0xe length 0 status 0400!
Apr 12 17:50:40 router kernel: eth1: Oversized Ethernet frame cbaa90e0 vs cbaa90e0.
Apr 12 17:50:40 router kernel: eth1: Oversized Ethernet frame spanned multiple buffers, entry 0xf length 0 status 0581!
Apr 12 17:50:40 router kernel: eth1: Oversized Ethernet frame cbaa90f0 vs cbaa90f0.
-- kern.log --

On Wednesday 14 April 2004 09:19, Alex Handle wrote: [snip]
high load but no cpu usage
Hi, I seem to have a strange problem. I have a server which is showing a load average of around 1 but CPU usage of 0.6% over two CPUs. What bothers me is that the load average used to stay under 0.16 previously - nothing has changed. I have already tried to see if there are any processes blocking using ps auxwww, but they all seem to be in state S, with a few in SW and two in SWN. %cpu in iowait is also 0% according to top. Also, iostat tells me the following (iostat -k):

avg-cpu:  %user  %nice  %sys  %idle
           8.31   0.00  0.75  90.94

Device:   tps  kB_read/s  kB_wrtn/s  kB_read   kB_wrtn
dev8-0   1.72       0.06      19.45   137015  41971011

It's really only running postgresql, since it's running as a DB server. Any help in getting to the bottom of this is appreciated.

Shri

-- Shri Shrikumar, U R Byte Solutions, Edinburgh, Scotland. Tel: 0845 644 4745, Mob: 0773 980 3499, Web: www.urbyte.com, Email: [EMAIL PROTECTED]
Re: high load but no cpu usage
Hi, I seem to have a strange problem. I have a server which is showing a load average of around 1 but cpu usage of 0.6% over two cpus.

This would imply I/O wait to me. What sort of disks does it have?

What bothers me is that load average used to stay under 0.16 previously - nothing has changed. I have already tried to see if there are any processes blocking using ps auxwww but they all seem to be in State S with a few in SW and two in SWN.

Run vmstat 1 for a few minutes and post it here.

Rgds, Rus
-- w: http://www.jvds.com | Dedicated FreeBSD, Debian and RedHat Servers | Donations made to Debian, FreeBSD and Slackware | t: +44 7919 373537 | t: 1-888-327-6330
Re: high load but no cpu usage
On Sat, 2003-10-04 at 19:44, Rus Foster wrote:

Hi, I seem to have a strange problem. I have a server which is showing a load average of around 1 but cpu usage of 0.6% over two cpus.

This would imply I/O wait for me. What sort of disks does it have?

That's what I thought, but this same machine has handled twice the load at just 0.16 load average. Also, it's running off 2 SCSI disks which are mirrored.

Run vmstat 1 for a few minutes and post it here

I have attached the output of vmstat. The load average of the machine was around 1 throughout. Please let me know if you want me to add a longer vmstat run.

Best wishes, Shri
high load average
the other day I was moving several gigs of files from one ide drive to another on the same ide chain (the secondary channel is broken) and my load average went up to around 7 (no, not 0.07). The machine would become unresponsive for several seconds at a time. This is a uniprocessor machine, both drives are ext2 filesystems. Is this normal? I don't seem to remember having ide performance issues like this before (this is a new install). -jason
Re: high load average
Have you checked your dma settings? hdparm/hwtools?

Ramon Kagan, York University, Computing and Network Services, Unix Team - Intermediate System Administrator, (416)736-2100 #20263, [EMAIL PROTECTED]

"I have not failed. I have just found 10,000 ways that don't work." - Thomas Edison

On Mon, 23 Sep 2002, Jason Pepas wrote: [snip]
Re: high load average
Jason Pepas said:

the other day I was moving several gigs of files from one ide drive to another on the same ide chain [snip] Is this normal?

This is normal (in my experience) if DMA is not enabled on one or more of the IDE drives in use. Some broken IDE chipsets (e.g. VIA) don't work well in DMA mode, and the driver may automatically revert to PIO mode (even if you told it to use DMA) if it encounters problems in DMA mode (which prompted me to start using Promise IDE controllers on VIA boards a couple of years ago).

nate
Re: high load average
Is this normal? I don't seem to remember having ide performance issues like this before (this is a new install). This is normal if DMA is not enabled. It isn't enabled by default in Debian. To enable it, install hdparm and then run hdparm -d1 /dev/hdx as root, where x is a, b, c, or d depending on the IDE device. Hopefully that will work and your problem will be solved. If you're really lucky like me you can do something like hdparm -c3d1m16X66 /dev/hda to enable other options such as ATA-66. Just do man hdparm and check out the options. You might want to make a script to run hdparm on boot. You can put such a script in /etc/rc.boot; it would look something like:

#!/bin/sh
hdparm -options /dev/1stdevice
hdparm -options /dev/2nddevice
...

Hope that helps, Bijan
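Spelled out a little further, such a boot script might look like the sketch below. The device list and the exact flags are assumptions, not a recommendation for any particular drive; check man hdparm and your controller before copying -X or -m values.

```shell
#!/bin/sh
# Sketch of an /etc/rc.boot script that enables DMA (-d1) and 32-bit
# I/O (-c1) on each listed drive at boot. Device names and flags are
# assumptions; adjust for your hardware.
DISKS="/dev/hda /dev/hdb"
for d in $DISKS; do
    [ -e "$d" ] || continue                     # skip drives that are absent
    hdparm -d1 -c1 "$d" || echo "hdparm tuning failed for $d" >&2
done
```

Skipping absent devices keeps the script safe to run unmodified on a machine with fewer drives than the list names.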
Re: high load average
Or just get hwtools; it creates a basic init.d script where you put your hdparm flags. Bijan Soleymani wrote: [original message quoted in full; snip]
Re: high load average
Bijan Soleymani [EMAIL PROTECTED] writes: Is this normal? I don't seem to remember having ide performance issues like this before (this is a new install). This is normal if dma is not enabled. It isn't enabled by default in Debian. To enable it install hdparm and then run hdparm -d1 /dev/hdx as root where x is either a,b,c,d depending on the ide device. Hopefully that will work and your problem will be solved. If you're really lucky like me you can do something like hdparm -c3d1m16X66 /dev/hda to enable other options such as ATA-66. Just do man hdparm and check out the options. This sounds like a problem I'm having. I tried everything I could figure out to enable DMA on my IDE drive, but it still won't take the enable command... [joq@sulphur] ~/ $ sudo hdparm -d 1 /dev/hda /dev/hda: setting using_dma to 1 (on) HDIO_SET_DMA failed: Operation not permitted using_dma= 0 (off) I'm running woody. I built a kernel to turn on IDE DMA... [joq@sulphur] ~/ $ grep IDEDMA /usr/src/kernel-source-2.4.18/.config CONFIG_BLK_DEV_IDEDMA_PCI=y CONFIG_IDEDMA_PCI_AUTO=y CONFIG_BLK_DEV_IDEDMA=y # CONFIG_IDEDMA_PCI_WIP is not set # CONFIG_IDEDMA_NEW_DRIVE_LISTINGS is not set CONFIG_IDEDMA_AUTO=y # CONFIG_IDEDMA_IVB is not set Here's what hdparm reports on my hardware... 
[joq@sulphur] ~/ $ sudo hdparm -i /dev/hda /dev/hda: Model=IC35L040AVVA07-0, FwRev=VA2OA52A, SerialNo=VNC202A2L1SU7A Config={ HardSect NotMFM HdSw15uSec Fixed DTR10Mbs } RawCHS=16383/16/63, TrkSize=0, SectSize=0, ECCbytes=52 BuffType=DualPortCache, BuffSize=1863kB, MaxMultSect=16, MultSect=16 CurCHS=16383/16/63, CurSects=16514064, LBA=yes, LBAsects=80418240 IORDY=on/off, tPIO={min:240,w/IORDY:120}, tDMA={min:120,rec:120} PIO modes: pio0 pio1 pio2 pio3 pio4 DMA modes: mdma0 mdma1 mdma2 udma0 udma1 udma2 udma3 udma4 *udma5 AdvancedPM=yes: disabled (255) WriteCache=enabled Drive Supports : ATA/ATAPI-5 T13 1321D revision 1 : ATA-2 ATA-3 ATA-4 ATA-5 [joq@sulphur] ~/ $ sudo hdparm /dev/hda /dev/hda: multcount= 16 (on) I/O support = 1 (32-bit) unmaskirq= 1 (on) using_dma= 0 (off) keepsettings = 0 (off) nowerr = 0 (off) readonly = 0 (off) readahead= 8 (on) geometry = 5005/255/63, sectors = 80418240, start = 0 busstate = 1 (on) My mobo is an ASUS A7V333... [joq@sulphur] ~/ $ sudo lspci -v 00:00.0 Host bridge: VIA Technologies, Inc. VT8367 [KT266] Subsystem: Asustek Computer, Inc.: Unknown device 807f Flags: bus master, 66Mhz, medium devsel, latency 0 Memory at e000 (32-bit, prefetchable) [size=64M] Capabilities: [a0] AGP version 2.0 Capabilities: [c0] Power Management version 2 00:01.0 PCI bridge: VIA Technologies, Inc. VT8367 [KT266 AGP] (prog-if 00 [Normal decode]) Flags: bus master, 66Mhz, medium devsel, latency 0 Bus: primary=00, secondary=01, subordinate=01, sec-latency=0 Memory behind bridge: dc80-dddf Prefetchable memory behind bridge: ddf0-dfff Capabilities: [80] Power Management version 2 snip: other devices 00:11.1 IDE interface: VIA Technologies, Inc. 
Bus Master IDE (rev 06) (prog-if 8a [Master SecP PriP]) Subsystem: Asustek Computer, Inc.: Unknown device 808c Flags: bus master, medium devsel, latency 32 I/O ports at b400 [size=16] Capabilities: [c0] Power Management version 2 snip: other devices

[joq@sulphur] ~/ $ cat /proc/ide/hda/driver
ide-disk version 1.10
[joq@sulphur] ~/ $ cat /proc/ide/hda/model
IC35L040AVVA07-0
[joq@sulphur] ~/ $ sudo cat /proc/ide/hda/cache
1863
[joq@sulphur] ~/ $ sudo cat /proc/ide/hda/settings
name               value  min  max    mode
----               -----  ---  ---    ----
bios_cyl           5005   0    65535  rw
bios_head          255    0    255    rw
bios_sect          63     0    63     rw
breada_readahead   4      0    127    rw
bswap              0      0    1      r
current_speed      0      0    69     rw
failures           0      0    65535  rw
file_readahead     124    0    16384  rw
ide_scsi           0      0    1      rw
init_speed         0      0    69     rw
io_32bit           1      0    3      rw
keepsettings       0      0    1      rw
lun                0      0    7      rw
max_failures       1      0    65535  rw
max_kb_per_request 127    1    127    rw
multcount          8
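The grep for the CONFIG_*IDEDMA* options shown above can be turned into a small check; a sketch that flags any IDE-DMA option left unset in a kernel .config (the default path is an assumption, so pass your own source tree as the first argument):

```shell
#!/bin/sh
# Report IDEDMA-related kernel options that are not enabled (=y).
# Sketch only: the CONFIG names come from the 2.4.x tree quoted above,
# and the default .config path is an assumption.
CONFIG="${1:-/usr/src/linux/.config}"
grep 'IDEDMA' "$CONFIG" 2>/dev/null | grep -v '=y$' \
    || echo "all IDEDMA options enabled (or file missing)"
```

Lines like "# CONFIG_IDEDMA_IVB is not set" fail the =y filter and are printed, which is exactly the list of options worth re-examining when DMA refuses to switch on.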
Samba problem: dead smbd, high load, kill -9 doesn't work
Hello list, last night my file server somehow accumulated over 70 smbd connections from various machines on my network. That in itself would be fine, but none of them are active any more, yet they still show up in smbstatus. They cannot be terminated with kill -9. netstat shows CLOSE_WAIT for all of them. The load is currently at 75, and that also blocks e.g. sendmail. Rebooting is unfortunately not an option: the server is in a locked room and for three days now it no longer boots unattended (a Promise controller that waits for input). Why can't I kill the processes, and how can I bring the load back down? Thanks, PDU -- GMX - Die Kommunikationsplattform im Internet. http://www.gmx.net -- To UNSUBSCRIBE, send a mail to [EMAIL PROTECTED] with the subject unsubscribe. Problems? Mail to [EMAIL PROTECTED] (engl)
Re: Samba problem: dead smbd, high load, kill -9 doesn't work
On Tue, Jun 18, 2002 at 04:59:43PM +0200, Proud Debian-User wrote: Hello list, Hello Proud Debian-User, [ Samba has high load - processes cannot be killed ] Rebooting is unfortunately not an option: the server is in a locked room and for three days now it no longer boots unattended (a Promise controller that waits for input). Have you already tried stopping Samba and then starting it again? -- Greetz Johannes Athmer
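Before restarting Samba, it can help to confirm how many sockets are actually stuck; a sketch that counts connections in the CLOSE_WAIT state PDU reported (assumes net-tools netstat output; with iproute2, "ss -t state close-wait" gives the same information):

```shell
#!/bin/sh
# Count TCP connections stuck in CLOSE_WAIT, the state reported above.
# Sketch: field 6 of "netstat -tn" output is the connection state.
netstat -tn 2>/dev/null | awk '$6 == "CLOSE_WAIT"' | wc -l
```

A large and non-shrinking count after clients have disconnected is a sign the server processes are wedged rather than merely slow.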
High Load Average
Just a question: Is there any reason in particular for a Debian box to keep its load average always over 6? It is an AMD Athlon 750 Mhz with 256 Megs of RAM, running potato and 2.2.19, compiled to run on i686. It has the patches Debian puts on the stock kernel, and the new-style raid patches, although no RAIDs are set up yet. Sometimes the load average goes over 10, making sendmail refuse connections. It is running sendmail, IMAP, POP3, apache+perl, Radius (cistron) and that's it. What can possibly be wrong? Sidenote: We had another similar machine (the processor was a PIII 550 Mhz) running the same stuff, but with Slackware. Load was never that high, and the machine swapped all the time, at least 25 Megs. The new Debian box never swaps, but always has a high load. Any thoughts? Jordi S. Bunster [EMAIL PROTECTED]
Re: High Load Average
what is running on it? have you checked top for processes? -- Forrest English http://truffula.net When we have nothing left to give There will be no reason for us to live But when we have nothing left to lose You will have nothing left to use -Fugazi On Sun, 3 Jun 2001, Jordi S. Bunster wrote: [original message quoted in full; snip]
Re: High Load Average
hi ya jordi you have a run away process and/or a memory leak ( amd and intel cpu behave slightly differently for ( the same code... what apps is running??? top -i ps axuw c ya alvin On Sun, 3 Jun 2001, Jordi S. Bunster wrote: Just a question: Is there any reason in particular for a Debian Box keep its load average always over 6? It is a AMD Athlon 750 Mhz with 256 Megs of RAM, running potato and 2.2.19, compiled to run on i686. It has the Patches Debian puts on the stock kernel, and the new-style raid patches, although no RAIDs are set up yet. Sometimes the Load Average goes over 10, making sendmail refuse connections. It is running sendmail, IMAP, POP3, apache+perl, Radius(cistron) and that's it. What can possibly be wrong? Sidenote: We had another similar machine (processor was a PIII 550 Mhz) running the same stuff, but with Slackware. Load was never that high, and the machine swapped all the time, at least 25 Megs. The new Debian Box never swaps, but has a high load always.
Re: High Load Average
On Sun, 3 Jun 2001 22:51:51 -0300 (BRT) Jordi S. Bunster [EMAIL PROTECTED] wrote: Just a question: Is there any reason in particular for a Debian Box keep its load average always over 6? Not really. Did you try top to find out which processes are doing that? Maybe you were running a Netscape/Mozilla client and some java stuff keeps running after a crash... -- Christoph Simon [EMAIL PROTECTED] --- ^X^C q quit :q ^C end x exit ZZ ^D ? help shit .
Re: High Load Average
hi ay or you could have a hacker running an irc on your machine -- if the rest of your lan/machines is fine... than probably not c ya alvin On Sun, 3 Jun 2001, Alvin Oga wrote: hi ya jordi you have a run away process and/or a memory leak ( amd and intel cpu behave slightly differently for ( the same code... what apps is running??? top -i ps axuw Just a question: Is there any reason in particular for a Debian Box keep its load average always over 6?
Re: High Load Average
you have a run away process and/or a memory leak ( amd and intel cpu behave slightly differently for ( the same code... Mmm .. speaking about internal programs, we only have some perl scripts. Perl is the compiled one, right? what apps is running??? We JUST installed the server. I mean, there's nothing hand compiled, except for Amavis. But it doesn't eat that much CPU time. In fact, top reveals that everyone uses CPU all the time. A ipop3d session easily goes for 18%, and a apache or sendmail one goes for 47% ~ 56%. It is just like everyone is using the machine at its most. Look:

91 processes: 89 sleeping, 2 running, 0 zombie, 0 stopped
CPU states: 68.7% user, 31.2% system, 0.0% nice, 0.0% idle
Mem: 257856K av, 229104K used, 28752K free, 103600K shrd, 73192K buff
Swap: 128484K av, 0K used, 128484K free, 86696K cached

  PID USER     PRI  NI SIZE  RSS SHARE STAT LIB %CPU %MEM  TIME COMMAND
  170 root       0   0  632  632   516 S      0  5.3  0.2  6:00 syslogd
13533 root      10   0 1124 1120   780 S      0  4.3  0.4  0:00 scanmails
12172 jsb        8   0 1192 1192   688 R      0  4.1  0.4  0:03 top
13532 root       0   0 1552 1552  1200 S      0  0.7  0.6  0:00 sendmail
  177 root       0   0 4160 4160   804 S      0  0.5  1.6  2:30 named
11006 www-data   0   0 7632 7632  4776 S      0  0.5  2.9  0:01 apache
13271 www-data   0   0 4804 4804  4656 S      0  0.5  1.8  0:00 apache
13673 root      11   0  464  464   296 R      0  0.5  0.1  0:00 file
11825 root       0   0 1480 1480  1220 S      0  0.3  0.5  0:00 sshd
12136 rosanak    1   0 1460 1460   972 S      0  0.3  0.5  0:00 ipop3d
13529 root       0   0 1412 1412  1200 S      0  0.3  0.5  0:00 sendmail
13627 root       0   0 1396 1396  1160 S      0  0.3  0.5  0:00 sendmail
  357 root       0   0 1212 1212  1072 S      0  0.1  0.4  0:06 sendmail
15976 thomas     0   0 5752 5752   972 S      0  0.1  2.2  0:06 ipop3d
11525 www-data   0   0 4828 4828  4680 S      0  0.1  1.8  0:00 apache
    1 root       0   0  472  472   400 S      0  0.0  0.1  0:12 init
    2 root       0   0    0    0     0 SW     0  0.0  0.0  0:00 kflushd
    3 root       0   0    0    0     0 SW     0  0.0  0.0  0:03 kupdate
    4 root       0   0    0    0     0 SW     0  0.0  0.0  0:02 kswapd
    5 root       0   0    0    0     0 SW     0  0.0  0.0  0:00 keventd
    6 root     -20 -20    0    0     0 SW     0  0.0  0.0  0:00 mdrecoveryd
  102 daemon     0   0  492  492   408 S      0  0.0  0.1  0:00 portmap
  172 root       0   0  760  760   384 S      0  0.0  0.2  0:00 klogd
  230 root       0   0  440  440   376 S      0  0.0  0.1  0:00 gpm
  241 root       0   0  560  560   476 S      0  0.0  0.2  0:00 lpd
  356 root       0   0 1188 1188   832 S      0  0.0  0.4  0:02 nmbd
  371 root       0   0 1204 1204   532 S      0  0.0  0.4  0:00 xfs
  380 root       0   0 1548 1548  1320 S      0  0.0  0.6  0:00 ntpd
  398 root       0   0  848  844   684 S      0  0.0  0.3  0:00 radwatch
  399 root       0   0  856  856   792 S      0  0.0  0.3  0:02 radiusd
  438 root       0   0  844  844   788 S      0  0.0  0.3  0:13 radiusd
  469 root       0   0  616  616   512 S      0  0.0  0.2  0:00 cron
 6613 root       0   0  440  440   376 S      0  0.0  0.1  0:00 getty
22159 root       0   0  584  584   500 S      0  0.0  0.2  0:03 inetd
31116 root       0   0 1224 1224   644 S      0  0.0  0.4  0:00 smbmount-2.2
31129 root       0   0 1216 1216   744 S      0  0.0  0.4  0:00 smbmount-2.2
31141 root       0   0 1220 1220   744 S      0  0.0  0.4  0:00 smbmount-2.2
31159 root       0   0 1220 1220   744 S      0  0.0  0.4  0:00 smbmount-2.2
31172 root       0   0 1220 1220   744 S      0  0.0  0.4  0:00 smbmount-2.2

At this moment, Load is a little bit lower (about 4), but idle is still 0%. Quite weird uh? If any command output is helpful, please let me know. Jordi S. Bunster [EMAIL PROTECTED]
Re: High Load Average
On Sun, 3 Jun 2001, Jordi S. Bunster wrote: JSB you have a run away process and/or a memory leak JSB JSB ( amd and intel cpu behave slightly differently for JSB ( the same code... JSB JSB Mmm .. speaking about internal programs, we only have some perl JSB scripts. Perl is the compiled one, right? JSB JSB what apps is running??? JSB JSB We JUST installed the server. I mean, there's nothing hand JSB compiled, except for Amavis. But it doesn't eat that much CPU JSB time. In fact, top reveals that everyone uses CPU all the time. A JSB ipop3d session easily goes for 18%, and a apache or sendmail one JSB goes for 47% ~ 56%. It is just like everyone is using the machine JSB at its most. JSB JSB Look: JSB JSB 91 processes: 89 sleeping, 2 running, 0 zombie, 0 stopped JSB CPU states: 68.7% user, 31.2% system, 0.0% nice, 0.0% idle JSB Mem: 257856K av, 229104K used, 28752K free, 103600K shrd, JSB 73192K buff JSB Swap: 128484K av, 0K used, 128484K free JSB 86696K cached check your bios settings, it looks like you have disabled external or internal cache .. they should be both enabled .. and all other memory region shadowing/caching should be disabled. Dingo. ).|.( '.'___'.' ' '(~)' ' -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-ooO-=(_)=-Ooo-=-=-=-=-=-=-=-=-=-=-=-=-=- Petr [Dingo] Dvorak [EMAIL PROTECTED] Coder - Purple Dragon MUD pdragon.org port -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-[ 369D93 ]=-=- Debian version 2.2.18pre21, up 4 days, 13 users, load average: 1.00 -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Re: High Load Average
On Sun, Jun 03, 2001 at 11:18:41PM -0300, Jordi S. Bunster wrote: 91 processes: 89 sleeping, 2 running, 0 zombie, 0 stopped CPU states: 68.7% user, 31.2% system, 0.0% nice, 0.0% idle Mem: 257856K av, 229104K used, 28752K free, 103600K shrd, 73192K buff Swap: 128484K av, 0K used, 128484K free 86696K cached This (along with the process list copied from 'top') is not enough. I suspect that you have several processes in state 'D' which is uninterruptible sleep. Run 'ps auxwww' and search for any 'D's in the STAT column. They won't be using any CPU, but if they're hanging out in that state they could indicate some other kind of problem, possibly hardware related. noah -- ___ | Web: http://web.morgul.net/~frodo/ | PGP Public Key: http://web.morgul.net/~frodo/mail.html
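Noah's suggestion can be scripted; a minimal sketch that keeps the header and any D-state rows (assumes a procps-style ps; the exact set of STAT letters varies by kernel):

```shell
#!/bin/sh
# Show processes in uninterruptible sleep (STAT starting with "D"),
# the state that drives up load average without using any CPU.
# Sketch only; assumes procps-style ps(1) output columns.
FILTER='NR == 1 || $3 ~ /^D/'          # keep the header plus D-state rows
ps -eo pid,ppid,stat,comm 2>/dev/null | awk "$FILTER"
```

An empty result (just the header) means nothing is blocked in the kernel; rows that stay in D across repeated runs point at stuck I/O, often a failing disk or an unresponsive NFS server.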
Re: High Load Average
On Sun, 3 Jun 2001 23:18:41 -0300 (BRT) Jordi S. Bunster [EMAIL PROTECTED] wrote: you have a run away process and/or a memory leak ( amd and intel cpu behave slightly differently for ( the same code... Mmm .. speaking about internal programs, we only have some perl scripts. Perl is the compiled one, right? what apps is running??? We JUST installed the server. I mean, there's nothing hand compiled, except for Amavis. But it doesn't eat that much CPU time. In fact, top reveals that everyone uses CPU all the time. A ipop3d session easily goes for 18%, and a apache or sendmail one goes for 47% ~ 56%. It is just like everyone is using the machine at its most. Look: 91 processes: 89 sleeping, 2 running, 0 zombie, 0 stopped CPU states: 68.7% user, 31.2% system, 0.0% nice, 0.0% idle Mem: 257856K av, 229104K used, 28752K free, 103600K shrd, 73192K buff Swap: 128484K av, 0K used, 128484K free 86696K cached PID USER PRI NI SIZE RSS SHARE STAT LIB %CPU %MEM TIME COMMAND 170 root 0 0 632 632 516 S 0 5.3 0.2 6:00 syslogd 13533 root 10 0 1124 1120 780 S 0 4.3 0.4 0:00 scanmails 12172 jsb8 0 1192 1192 688 R 0 4.1 0.4 0:03 top [...] At this moment, Load is a little bit lower (about 4), but idle is still 0%. Quite weird uh? If any command output is helpful, please let me know. Your table isn't very meaningful, as it doesn't show even 20% of load. It might take a while to see. But as you say that all are usually high, maybe you've got a kernel problem, maybe due to a hardware (IRQ?) conflict. Just a quick guess. -- Christoph Simon [EMAIL PROTECTED] --- ^X^C q quit :q ^C end x exit ZZ ^D ? help shit .
Re: High Load Average
Jordi S. Bunster wrote: We JUST installed the server. I mean, there's nothing hand compiled, except for Amavis. But it doesn't eat that much CPU amavis is VERY cpu intensive i run it on many systems. is there a lot of mail going through the system? is there a lot of big attachments? one of my mail servers didn't dip below load of 8 until i upgraded the system's hardware. amavis is great..but if you got a lotta mail you need more horsepower. also if you're using something like UW Imap that can be a cause for very high load as well. i suggest switching to something else like CYRUS which reduces load by a factor of 100-200 (it did for me anyways) i'm sure there are other good IMAP servers like courier(sp?) but i haven't tried them. same goes for POP3. if you're using qpopper or ipop3d those can be causes of high load as well (cyrus has a pop3 server as well, does not cause high load) if you have a lot of mail going through i suggest setting up a raid0 array (or raid 10) for /var/spool and have amavis scan mail off that drive. get SCSI if you can for this. sample mail server config: Average KB/hour of mail: 873kB/H Max KB/hour of mail: 9944.7kB/H Average Mail/hour: 71 Max Mail/hour: 1081 Average System Load: 1.02 Max System Load: 4.09 (Statistics gathered from MRTG over the past ~6 weeks or so) System config: Dual P3-800Mhz Dual 15k RPM Ultra160 SCSI drives raid1 /var/spool single 15k RPM Ultra160 drive (no raid) / 256MB ram 256MB swap Uptime: 74 days Time spent idle: 87.0% Linux 2.2.17+many patches (openwall included) Debian GNU/Linux 2.2r3 sendmail 8.9.3 + amavis Cyrus IMAP/POP Apache Apache+ssl Squirrelmail (webmail front end) Mcafee Antivirus 4.0.70 running amavis 0.2.1. hope this gives you an idea of what to expect when using amavis as far as load goes. nate -- ::: ICQ: 75132336 http://www.aphroland.org/ http://www.linuxpowered.net/ [EMAIL PROTECTED]
kernel 2.4.2 and high load = machine freezes?
I installed kernel 2.4.2 and while it works ok most of the time there were two occasions when computer (almost) froze, load being 100% and almost nothing worked for about an hour or more. both times this high load attack happened I opened xv (the thumbs view) on a directory with large number of files (about 2000). I did the same thing using old (2.2.17) kernel and it never caused significant problems. even with 2.4.2 kernel it only happens rarely, other times it works... first time it happened the mouse still moved, very slowly and after about 30 min. I saw that the focus starts to move from one window to another (title bar of one window changed color) second time it happened I couldn't do anything but ping the machine (it responded immediately, but ssh did not work) and switch from VT to VT - the switching between VTs was fast and the text screen appeared immediately but I could not type anything (well, I could type but nothing appeared on the screen, no keyboard combination worked (not even ctrl-alt-del) except of alt-Fn). it looks like it's caused or at least triggered by xv but I am quite sure it wasn't updated since quite some time before it used to work (these two freezes just happened within last week or so), the binary has date May 12 2000. if it happens I cannot even run top (it took about 15 - 30 sec. for load to build up to the level when machine was completely unusable) to see where the time is spent - it might be kernel. the disk seems to be working most of the time but not constantly. as far as I can see the memory usage does not go up, only the load (that's what the gkrellm show while it works). something similar happened before with netscape (one of those extra bad builds around version 4.0x), at that time I only had 16MB RAM so it wasn't hard to choke the system, but it woke up (killed netscape) eventually (it took few hours).
however this time most of the software is the same as it was before I installed 2.4.2 I didn't find any suspicious messages in syslog or messages... any ideas on what's going on? system: debian testing, kernel 2.4.2, X 4.0.2, pentium 1GHz, 128MB RAM, plenty of disk space (MB, RAM and processor are new so it might be HW problem) TIA erik
Re: kernel 2.4.2 and high load = machine freezes?
Erik Steffl wrote: any ideas on what's going on? login on an xterm from another machine and run top while you try that. recently i upgraded my firewall from a k6-3 400 to a p3-800 and doubled the memory to 512MB. but it was still much slower!! turns out the VIA ide chipset on the p3 board(asus) didn't play well with the drivers(Even the most updated ones from linux-ide.org). DMA was disabled, when doing a lot of file access load would get up to 10 to 15 making even typing in an terminal(either local or remote) very difficult to do. once i turned it on things improved. but i have since disabled the VIA controller and got a promise controller instead it seems to have much better drivers, or is a better ide chip..whichever. no more problems! course i run 2.2.x, but im sure the DMA problem can happen in 2.4.x as i've seen stuff on the kernel mailing list about it. nate -- ::: ICQ: 75132336 http://www.aphroland.org/ http://www.linuxpowered.net/ [EMAIL PROTECTED]
Re: high load average
on Mon, Mar 05, 2001 at 11:12:16PM -0500, MaD dUCK ([EMAIL PROTECTED]) wrote: [cc'ing this to PLUG because it seems interesting...] also sprach kmself@ix.netcom.com (on Mon, 05 Mar 2001 08:02:51PM -0800): It's not 200% loaded. There are two processes in the run queue. I'd do huh? is that what 2.00 means? the average length of the run queue? Yep. that would explain it because i found two STAT = D processes which i cannot kill (any hints what to do when kill -9 doesn't work and the /proc/`pidof` directory cannot be removed?). that's why 2.00. Find their parents and kill them. Easiest way IMO is to use pstree: $ pstree -p ...search for the PIDs of the defunct processes, locate parent(s), kill same. Report back if problems. -- Karsten M. Self kmself@ix.netcom.com http://kmself.home.netcom.com/ What part of Gestalt don't you understand? There is no K5 cabal http://gestalt-system.sourceforge.net/ http://www.kuro5hin.org
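The parent lookup can also be done without pstree; a Linux-specific sketch that reads the PPID straight from /proc (field 4 of /proc/PID/stat; with procps installed, "ps -o ppid= -p PID" gives the same answer):

```shell
#!/bin/sh
# Look up the parent PID of a stuck process so the parent can be
# signalled instead. Sketch: field 4 of /proc/PID/stat is the PPID on
# Linux; this assumes the process name contains no spaces.
parent_of() {
    awk '{ print $4 }' "/proc/$1/stat" 2>/dev/null
}
parent_of $$    # example: the parent of this shell
```

Once the parent is known, try a plain kill (then kill -9) on it; a D-state child itself ignores all signals until its I/O completes, which is why kill -9 appears to do nothing.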
Re: high load average
on Tue, Mar 06, 2001 at 11:21:07AM -0600, Dave Sherohman ([EMAIL PROTECTED]) wrote: On Tue, Mar 06, 2001 at 06:09:41PM +0100, Joris Lambrecht wrote: isn't 2.00 more like 2% ? It is US notation where . is a decimal separator. Not ? You have the notation correct, but load average and CPU utilization are not directly related. Load average is the average number of processes that are waiting on system resources over a certain time period; they could be waiting for CPU, for I/O, or for other resources. (CPU does tend to be the biggest bottleneck, though, so a basic rule of thumb is that you usually don't want load to be much greater than the number of CPUs in the box. It *is* CPU. These are processes in the run queue. A process blocked for I/O or another resource is blocked, not runnable (I think, I'm not positive, but I'll bet my morning coffee on it -- which I *really* like, and you'll want to give it to me anyway if I don't get it). The significance of load average is that if you have more runnable processes than CPUs, you have identified a system bottleneck: it's now possible to increase total system throughput by providing either more and/or faster processors. Excessive swapping indicates the system is memory bound. This isn't to say that having a large amount of swapped memory is bad (it may or may not be), but having a large number of processes swapping in and out of memory is bad. Not sure what the metric for I/O bound is. Under Solaris, top would report on I/O wait. I could crack the O'Reilly system performance tuning book and see what it says. If none of the above are evident and things are still too slow, then start optimizing your program(s). The machine I'm using starts killing off processes if load exceeds 6 or 7; I wouldn't want to see it hit 100...) It may not be all bad. In certain cases, I believe Apache will spawn large numbers of processes which manage to count against load average. 
However, total system performance isn't actually negatively affected too much. I once took my UMP PII/180 box to a load of about 30 by running multiple instances of computer v. computer gnuchess. That took a while to clean up. -- Karsten M. Self kmself@ix.netcom.com http://kmself.home.netcom.com/ What part of Gestalt don't you understand? There is no K5 cabal http://gestalt-system.sourceforge.net/ http://www.kuro5hin.org
Re: high load average
On Thu, Mar 08, 2001 at 10:55:10PM -0800, kmself@ix.netcom.com wrote: on Tue, Mar 06, 2001 at 11:21:07AM -0600, Dave Sherohman ([EMAIL PROTECTED]) wrote: You have the notation correct, but load average and CPU utilization are not directly related. Load average is the average number of processes that are waiting on system resources over a certain time period; they could be waiting for CPU, for I/O, or for other resources. It *is* CPU. These are processes in the run queue. A process blocked for I/O or another resource is blocked, not runnable OK, now I'm confused... My statements were based on my memory of a thread from last May (was it that long ago?) on this very list titled (ot) What is load average?. Checking back on the messages I saved from that conversation, I see one from kmself@ix.netcom.com stating that load average is | Number of processes in the run queue, averaged over time. Often | confused with CPU utilization, which it is not. Load average either is CPU or it isn't, right? So you can't have been correct both times. Now, you may have been wrong last year and since realized that it's more CPU-related than you had thought, but (aside from this thread's original question describing a situation with a long-term consistent load average of 2.00 and low-to-no CPU utilization) last May's thread also included a message from [EMAIL PROTECTED] stating that ] It is the average number of processes in the 'R' (running/runnable) state ] (or blocked on I/O). and ] The load average is most directly related to CPU. Two CPU-intensive ] processes running will result in a load average of 2, etc. But I/O ] intensive processes spend so much time active that they can drive up the ] load average also. In addition if more than one process is blocked on I/O ] then the load average will go up very quickly, as both processes count ] toward the load even if only one can access the disk at a time.
Based on my observations of load and CPU readings on my boxes and the messages from last May that I quoted above, I'm inclined to maintain my earlier statement that processes waiting on any resource (not just CPU) contribute to load. But, if that's not the case, I'm willing to be corrected. -- Linux will do for applications what the Internet did for networks. - IBM, Peace, Love, and Linux Geek Code 3.1: GCS d? s+: a- C++ UL++$ P+ L+++ E- W--(++) N+ o+ !K w---$ O M- V? PS+ PE Y+ PGP t 5++ X+ R++ tv b+ DI D G e* h+ r y+
Re: high load average
on Fri, Mar 09, 2001 at 01:27:50AM -0600, Dave Sherohman ([EMAIL PROTECTED]) wrote: On Thu, Mar 08, 2001 at 10:55:10PM -0800, kmself@ix.netcom.com wrote: on Tue, Mar 06, 2001 at 11:21:07AM -0600, Dave Sherohman ([EMAIL PROTECTED]) wrote: You have the notation correct, but load average and CPU utilization are not directly related. Load average is the average number of processes that are waiting on system resources over a certain time period; they could be waiting for CPU, for I/O, or for other resources. It *is* CPU. These are processes in the run queue. A process blocked for I/O or another resource is blocked, not runnable OK, now I'm confused... I'm also somewhat fallible. So, we'll get to the source of the question this time. In particular, a job blocked for I/O *is* runnable. My error. My statements were based on my memory of a thread from last May (was it that long ago?) on this very list titled (ot) What is load average?. Checking back on the messages I saved from that conversation, I see a one from kmself@ix.netcom.com stating that load average is | Number of processes in the run queue, averaged over time. Often | confused with CPU utilization, which it is not. Load average either is CPU or it isn't, right? Percent of clock ticks being utilized is CPU utilization. Number of jobs in runnable state is load average. Related, but not identical metrics. My own statement: Load average is a measure of _average current requests for CPU processing_ over some time interval. While we're at it, let's pull in a more authoritative definition, this from _System Performance Tuning_, by Mike Loukides, O'Reilly, 1990: The _system load average_ provides a convenient way to summarize the activity on a system. It is the first statistic you should look at when performance seems to be poor. UNIX defines load average as the average number of processes in the kernel's run queue during an interval. A _process_ is a single stream of instructions. 
Most programs run as a single process, but some spawn (UNIX terminology: _fork_) other processes as they run. A process is in the run queue if it is: * Not waiting for any external event (e.g., not waiting for someone to type a character at a terminal). * Not waiting of its own accord (e.g., the job hasn't called 'wait'.) * Not stopped (e.g., the job hasn't been stopped by CTRL-Z). Processes cannot be stopped on XENIX and versions of System V.2. The ability to stop processes has been added to System V.4 and some versions of V.3. While the load average is convenient, it may not give you an accurate picture of the system's load. There are two primary reasons for this inaccuracy: * The load average counts as runnable all jobs waiting for disk I/O. This includes processes that are waiting for disk operations to complete across NFS. If an NFS server is not responding (e.g., if the network is faulty or the server has crashed), a process can wait for hours for an NFS operation to complete. It is considered runnable the entire time even though nothing is happening; therefore, the load average climbs when NFS servers crash, even though the system isn't really doing any more work. * The load average does not account for scheduling priority. It does not differentiate between jobs that have been niced (i.e., placed at a lower priority and therefore not consuming much CPU time) or jobs that are running at a high priority. Hopefully, that clarifies a few misperceptions and sloppy statements (my own included). Specific to GNU/Linux, the count of active tasks is computed in kernel/sched.c as:

static unsigned long count_active_tasks(void)
{
        struct task_struct *p;
        unsigned long nr = 0;

        read_lock(&tasklist_lock);
        for_each_task(p) {
                if ((p->state == TASK_RUNNING ||
                     p->state == TASK_UNINTERRUPTIBLE ||
                     p->state == TASK_SWAPPING))
                        nr += FIXED_1;
        }
        read_unlock(&tasklist_lock);
        return nr;
}

So you can't have been correct both times. No, I am.
You're just not reading me consistently ;-) My admonition in the current thread that load average is a metric of CPU utilization is just that: load average is concerned with CPU, it is *not* concerned with memory, disk I/O (though I/O blocking can affect it), etc. However, as I clarify in this current post, and my prior thread, load average is not equivalent to CPU _utilization_. To put it in different terms:

- Load average is how often you're asking for it.
- CPU utilization is how often you're getting it.

High load average means you've got more requests than you can handle.
Re: high load average
On Fri, Mar 09, 2001 at 03:25:24PM -0800, kmself@ix.netcom.com wrote: The clarification is given in the O'Reilly citation. Runnable processes, not waiting on other resources, I/O blocking excepted. Excellent - thanks! -- Linux will do for applications what the Internet did for networks. - IBM, Peace, Love, and Linux Geek Code 3.1: GCS d? s+: a- C++ UL++$ P+ L+++ E- W--(++) N+ o+ !K w---$ O M- V? PS+ PE Y+ PGP t 5++ X+ R++ tv b+ DI D G e* h+ r y+
RE: high load average
Dear dUCK, isn't 2.00 more like 2% ? It is US notation where . is a decimal separator. Not ?

-Original Message-
From: MaD dUCK [mailto:[EMAIL PROTECTED]
Sent: Tuesday, March 06, 2001 3:38 AM
To: debian users
Subject: high load average

someone explain this to me:

albatross:~$ uname -a
Linux albatross 2.2.17 #2 Mon Sep 04 20:49:27 CET 2000 i586 unknown
albatross:~$ uptime
2:56am up 174 days, 5:50, 1 user, load average: 2.00, 2.05, 2.01

# processes sorted by decreasing cpu usage
albatross:~$ ps aux | head -1
albatross:~$ ps aux | sort -nrk3 | head -5
USER       PID %CPU %MEM   VSZ   RSS TTY STAT START  TIME COMMAND
root     15889  0.2  1.6  2720  1536 ?   S    02:50  0:02 /usr/sbin/sshd
root      1646  0.1  0.9  1672   864 ?   S    2000  32:01 /usr/sbin/diald -
xfs       1776  0.0  1.0  2060  1020 ?   S    2000   0:00 xfs -droppriv -da
squid     1748  0.0  0.3  1088   332 ?   S    2000   0:06 (unlinkd)
squid     1742  0.0 19.2 20048 18440 ?   S    2000  15:01 (squid) -D
root     25890  0.0  0.7  1652   764 ?   D    00:01  0:00 sh /etc/ppp/ip-up
root     25889  0.0  0.7  1644   752 ?   S    00:01  0:00 bash /etc/ppp/ip-

the load average displayed by uptime has been very consistently above 2.00 and the output of ps aux has been pretty much the same for the past two weeks. no hung jobs. no traffic. the server basically *isn't being used*, especially not during the last 1, 5, or 15 minutes. and cron isn't running, there are *only* 35 running jobs. why, oh why then is it 200% loaded???

martin

[greetings from the heart of the sun]# echo [EMAIL PROTECTED]:1:[EMAIL PROTECTED]@@@.net
--
the web site you seek cannot be located but endless others exist.

--
To UNSUBSCRIBE, email to [EMAIL PROTECTED] with a subject of unsubscribe. Trouble? Contact [EMAIL PROTECTED]
Re: high load average
On Tue, Mar 06, 2001 at 06:09:41PM +0100, Joris Lambrecht wrote: isn't 2.00 more like 2% ? It is US notation where . is a decimal separator. Not ? You have the notation correct, but load average and CPU utilization are not directly related. Load average is the average number of processes that are waiting on system resources over a certain time period; they could be waiting for CPU, for I/O, or for other resources. (CPU does tend to be the biggest bottleneck, though, so a basic rule of thumb is that you usually don't want load to be much greater than the number of CPUs in the box. The machine I'm using starts killing off processes if load exceeds 6 or 7; I wouldn't want to see it hit 100...) -- Linux will do for applications what the Internet did for networks. - IBM, Peace, Love, and Linux Geek Code 3.1: GCS d? s+: a- C++ UL++$ P+ L+++ E- W--(++) N+ o+ !K w---$ O M- V? PS+ PE Y+ PGP t 5++ X+ R++ tv b+ DI D G e* h+ r y+
high load average
someone explain this to me:

albatross:~$ uname -a
Linux albatross 2.2.17 #2 Mon Sep 04 20:49:27 CET 2000 i586 unknown
albatross:~$ uptime
2:56am up 174 days, 5:50, 1 user, load average: 2.00, 2.05, 2.01

# processes sorted by decreasing cpu usage
albatross:~$ ps aux | head -1
albatross:~$ ps aux | sort -nrk3 | head -5
USER       PID %CPU %MEM   VSZ   RSS TTY STAT START  TIME COMMAND
root     15889  0.2  1.6  2720  1536 ?   S    02:50  0:02 /usr/sbin/sshd
root      1646  0.1  0.9  1672   864 ?   S    2000  32:01 /usr/sbin/diald -
xfs       1776  0.0  1.0  2060  1020 ?   S    2000   0:00 xfs -droppriv -da
squid     1748  0.0  0.3  1088   332 ?   S    2000   0:06 (unlinkd)
squid     1742  0.0 19.2 20048 18440 ?   S    2000  15:01 (squid) -D
root     25890  0.0  0.7  1652   764 ?   D    00:01  0:00 sh /etc/ppp/ip-up
root     25889  0.0  0.7  1644   752 ?   S    00:01  0:00 bash /etc/ppp/ip-

the load average displayed by uptime has been very consistently above 2.00 and the output of ps aux has been pretty much the same for the past two weeks. no hung jobs. no traffic. the server basically *isn't being used*, especially not during the last 1, 5, or 15 minutes. and cron isn't running, there are *only* 35 running jobs. why, oh why then is it 200% loaded???

martin

[greetings from the heart of the sun]# echo [EMAIL PROTECTED]:1:[EMAIL PROTECTED]@@@.net
--
the web site you seek cannot be located but endless others exist.
Re: high load average
On Mon, Mar 05, 2001 at 09:37:36PM -0500, MaD dUCK wrote: the load average displayed by uptime has been very consistently above 2.00 and the output of ps aux has been pretty much the same for the past two weeks. no hung jobs. no traffic. the server basically *isn't being used*, especially not during the last 1, 5, or 15 minutes. and cron isn't running, there are *only* 35 running jobs. why, oh why then is it 200% loaded??? Load average is not an indication of how busy the CPU is. A busy CPU can *cause* a high load average, but so can other stuff. In this case, I would guess that the high load is caused by processes being blocked while waiting for IO routines to complete. In the ps output or in top, look for processes in state 'D'. I suspect you'll find 2 of them (one was a ppp process visible in the ps output you posted). Figure out why they're blocking and you'll be able to do something to fix it. noah -- ___ | Web: http://web.morgul.net/~frodo/ | PGP Public Key: http://web.morgul.net/~frodo/mail.html
Re: high load average
also sprach Noah L. Meyerhans (on Mon, 05 Mar 2001 09:51:53PM -0500): Load average is not an indication of how busy the CPU is. A busy CPU can *cause* a high load average, but so can other stuff. good point. so i found two offending processes in state D: root 24520 0.0 0.9 1652 904 ?DFeb25 0:00 /bin/gawk root 25890 0.0 0.7 1652 764 ?D00:01 0:00 sh /etc/ppp/ip-up however, a kill -9 on either one doesn't delete them, i cannot delete it's directory in /proc (as works on solaris 2.6), and to the best of my knowledge, these processes won't go away. any tips, other than to reboot? martin [greetings from the heart of the sun]# echo [EMAIL PROTECTED]:1:[EMAIL PROTECTED]@@@.net -- oxymoron: micro$oft works
Re: high load average
on Mon, Mar 05, 2001 at 09:37:36PM -0500, MaD dUCK ([EMAIL PROTECTED]) wrote:

someone explain this to me:

albatross:~$ uname -a
Linux albatross 2.2.17 #2 Mon Sep 04 20:49:27 CET 2000 i586 unknown
albatross:~$ uptime
2:56am up 174 days, 5:50, 1 user, load average: 2.00, 2.05, 2.01

# processes sorted by decreasing cpu usage
albatross:~$ ps aux | head -1
albatross:~$ ps aux | sort -nrk3 | head -5
USER       PID %CPU %MEM   VSZ   RSS TTY STAT START  TIME COMMAND
root     15889  0.2  1.6  2720  1536 ?   S    02:50  0:02 /usr/sbin/sshd
root      1646  0.1  0.9  1672   864 ?   S    2000  32:01 /usr/sbin/diald -
xfs       1776  0.0  1.0  2060  1020 ?   S    2000   0:00 xfs -droppriv -da
squid     1748  0.0  0.3  1088   332 ?   S    2000   0:06 (unlinkd)
squid     1742  0.0 19.2 20048 18440 ?   S    2000  15:01 (squid) -D
root     25890  0.0  0.7  1652   764 ?   D    00:01  0:00 sh /etc/ppp/ip-up
root     25889  0.0  0.7  1644   752 ?   S    00:01  0:00 bash /etc/ppp/ip-

the load average displayed by uptime has been very consistently above 2.00 and the output of ps aux has been pretty much the same for the past two weeks. no hung jobs. no traffic. the server basically *isn't being used*, especially not during the last 1, 5, or 15 minutes. and cron isn't running, there are *only* 35 running jobs. why, oh why then is it 200% loaded???

It's not 200% loaded. There are two processes in the run queue. I'd do a 'ps aux' and look at what's runnable (STAT = 'R'). You might have to do this repeatedly to find out what's there. If it's the same processes consistently, you might look to see what they or their children are doing. Note that you list *no* runnable processes in your ps output.

--
Karsten M. Self kmself@ix.netcom.com  http://kmself.home.netcom.com/
What part of Gestalt don't you understand? There is no K5 cabal
http://gestalt-system.sourceforge.net/ http://www.kuro5hin.org
Re: high load average
[cc'ing this to PLUG because it seems interesting...] also sprach kmself@ix.netcom.com (on Mon, 05 Mar 2001 08:02:51PM -0800): It's not 200% loaded. There are two processes in the run queue. I'd do huh? is that what 2.00 means? the average length of the run queue? that would explain it because i found two STAT = D processes which i cannot kill (any hints what to do when kill -9 doesn't work and the /proc/`pidof` directory cannot be removed?). that's why 2.00. thanks, martin [greetings from the heart of the sun]# echo [EMAIL PROTECTED]:1:[EMAIL PROTECTED]@@@.net -- micro$oft is to operating systems security what mcdonalds is to gourmet cuisine.
Re: High load
Suresh Kumar posts: I have never seen load averages going above 2 earlier with redhat installation. On a similar setup while running Netscape ? Please install libc5 and libg++272 found in /oldlibs of the Debian 'slink' CD. ragOO, VU2RGU. Kochi, INDIA. Keeping the Air-Waves FREE.Amateur Radio Keeping the W W W FREE..Debian GNU/Linux
High load
Hi, I recently installed a debian 2.1 on my machine which was earlier running redhat 5.2 (pentium 100MHz, 16mb ram). The machine becomes very very slow and unusable when I run netscape. I have a dialup connection. The load average goes to 100 and more. I have never seen load averages going above 2 earlier with the redhat installation. I tried issuing the top command to find out who the culprit is. I could not make much sense of the listing. It showed multiple entries of syslogd. Any ideas on how to make the system useful ? Suresh - Suresh Kumar.R Email: [EMAIL PROTECTED] Dept of Electronics Communication College of Engineering, Trivandrum - 695 016 INDIA
RE: High load
Recent versions of netscape will slow a 16Mb system to a crawl. How does the system respond when you aren't running netscape? What window manager are you using? What else are you running at the time? Check your netscape memory cache size. I would be willing to bet the problem lies in the (lack of) RAM. Bryan On 28-Apr-2000 Suresh Kumar.R wrote: Hi, I recently installed a debian 2.1 on my machine which was earlier running redhat 5.2. (pentium 100MHz, 16mb ram). The machine becomes very very slow and unusable when I run netscape. I have dialup connection. The load average goes 100 and more. I have never seen load averages going above 2 earlier with redhat installation. I tried issuing top command to know who the culprit is. I could not find much sense from the listing. It showed multiple entries of syslogd. Any ideas on how to make the system useful ? Suresh - Suresh Kumar.R Email: [EMAIL PROTECTED] Dept of Electronics Communication College of Engineering, Trivandrum - 695 016 INDIA -- Unsubscribe? mail -s unsubscribe [EMAIL PROTECTED] /dev/null
high load but idle CPU
I have a dual-CPU system running potato with kernel 2.2.3. Here's what top reports:

6:30pm up 36 days, 20:55, 10 users, load average: 5.22, 5.28, 5.17
152 processes: 147 sleeping, 2 running, 2 zombie, 1 stopped
CPU states: 0.4% user, 1.5% system, 0.0% nice, 97.9% idle
Mem:  516688K av, 480208K used,  36480K free, 96664K shrd, 167100K buff
Swap: 513968K av,      0K used, 513968K free,             134748K cached

How can the load be above 5 while the CPU is 97.9% idle? This has been the case over the last week. The load stays very high even when there are hours of very low CPU activity. Any clues? Thanks, Max
--
The hopeful depend on a world without end Whatever the hopeless may say
Neil Peart, 1985
Re: high load but idle CPU
* George Bonser [EMAIL PROTECTED] [05/26/99 18:59] wrote: Do a ps -ax and see how many processes you have stuck in D state ;). Then go and get 2.2.9. Yup, that explains it! I have 5 sxid processes in D state. Hmmm... could it have something to do with the fact that I installed arla 5 days ago and sxid is trying to traverse the entire AFS tree? :) I guess I'll have to wait till the next reboot to clear these D processes out. In the meantime, editing sxid.conf is a good idea. :) Thanks, Max
--
The hopeful depend on a world without end Whatever the hopeless may say
Neil Peart, 1985
Re: high load but idle CPU
George Bonser wrote: Any process involved with heavy net activity in an SMP system with 2.2.3 will do this. I had problems with web servers doing it. 2.2.9 seems OK. 2.2.6/7 were disasters. 2.2.5 seemed to work, though. Hm, could you expand on that? I've been using 2.2.7 for a while, what problems does it have? -- see shy jo
Extremely High Load
I'm running a Debian 1.3.1 system and find that when the machine is put into our production environment here, after a little while the machine's load rises, and keeps on going. It was so bad it got up to 150+ once. At any rate, I ran top one time and nothing was using any large amount of CPU, nor was the hard drive going crazy or any significant amount of memory being used. This machine is slated to replace our current shell machine, which is currently handling shell services, e-mail, dns, and www for our customers. The machine is a Pentium 233 w/ 128MB ram (side note: I made sure I gave the kernel the mem=128M param), with a 2.0.33 kernel. Some of the other significant software we run (as in, the stuff that gets hit the most) is sendmail 8.8.8 (I rolled my own), qpopper 2.2 (my own compile), and apache 1.2.4 (again, my own compile). We also run cgiwrap, but I don't think that would cause the problem since I disabled it when I first started seeing the problem. I was also running process accounting. Sometimes the system doesn't get bad right away; it made it 25 minutes of uptime once before getting the skyrocketing load, but usually it will start jumping within a few minutes (the last time I tried it, the load went up as soon as the machine booted). I wouldn't be totally concerned with the load except that once the problem starts, the machine is almost totally unresponsive to interactive use and I have to do a reset to restart the system. Our old setup is an old Slackware distribution with a 1.2.13 kernel. Unfortunately, I didn't set up the old system, so the old admin may have made modifications to the kernel that I don't know about to deal with the amount of load it gets (we can have like 25-30 sendmail processes + 30-50 apache processes running at once on the old machine with little load). Could a kernel limit be getting hit (such as file descriptors or open sockets maybe)? If anyone has any suggestions, please let me know. This is a problem that has me at my wits' end!
Thanks! -Leigh - Leigh Koven CyberComm Online Services [EMAIL PROTECTED] http://www.cybercomm.net/ http://www.thegovernment.net/ (732) 818- You can check out any time you like, but you can never leave - The Eagles - -- TO UNSUBSCRIBE FROM THIS MAILING LIST: e-mail the word unsubscribe to [EMAIL PROTECTED] . Trouble? e-mail to [EMAIL PROTECTED] .
Re: Extremely High Load
From personal experience this is a tad much for one machine. DNS can fill up some memory w/ cache and is a constant hit. Really should be its own 486 or so w/ some memory tossed in. Shell services can be dangerous, and a user could easily peg out a system.
Re: Extremely High Load
On Sat, 3 Jan 1998, Shaleh wrote: From personal experience this is a tad much for one machine. DNS can fill up some memory w/ cache and is a constant hit. Really should be its own 486 or so w/ some memory tossed in. Shell services can be dangerous, and a user could easily peg out a system. Eventually we plan to move everything to their own machines, but we're just not seeing this problem with the same load on the old machine (also a Pentium 233, but it was running as a P5-100 a month ago). -Leigh - Leigh Koven CyberComm Online Services [EMAIL PROTECTED] http://www.cybercomm.net/ http://www.thegovernment.net/ (732) 818- You can check out any time you like, but you can never leave - The Eagles -
Re: Extremely High Load
From personal experience this is a tad much for one machine. DNS can fill up some memory w/ cache and is a constant hit. Really should be its own 486 or so w/ some memory tossed in. Shell services can be dangerous, and a user could easily peg out a system. We run a shell machine, a dns server, and an e-mail/web server. We also have another machine running secondary names in addition to its usual load.