Re: Weird behaviour on System under high load

2023-05-28 Thread David Christensen

On 5/28/23 03:09, Christian wrote:

 Original Message 
From: David Christensen 
To: debian-user@lists.debian.org
Subject: Re: Weird behaviour on System under high load
Date: Sat, 27 May 2023 16:30:05 -0700

On 5/27/23 15:28, Christian wrote:


New day, new tests. Got a crash again, however with the message "AHCI
controller unavailable".
Figured that is the SATA drives not being plugged in the right order.
Corrected that and a 3:30h stress test went so far without any issues
besides this old bug
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=947685

Seems that I am just jumping from one error to the next...



3 hours and 30 minutes?  Yikes!  Please stop before you fry your
computer.  10 seconds should be enough to see a problem; 1 minute is
more than enough.


Sadly not always. My crashes before would occur after anywhere between
a few minutes and 1 hour of load. Now I hope everything is stable.
Crashes are gone; only the network error seems to be unresolved (even
though there is some workaround).



Repeatable crashes from a reported issue indicate your hardware is okay.



With the undervolting/overclocking on the 12-core stress test, the system
stays below 65°C (on Smbusmaster0), so there should be no risk of damage.



It is your computer and your decision.


At this point, I would start adding the software stack, one piece at a 
time, testing between each piece.  The challenge is devising or finding 
tests.  Spot testing by hand can reveal bugs, but that gets tiresome. 
The best approach is an automated/ scripted test suite.  If you are 
using Debian packages, you might want to look for test suites in the 
corresponding source packages.  And/or, you can use building from source 
as a stress test.  Compiling the Linux kernel should provide your 
processor, memory, and storage with a good workout.
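
For example, a minimal sketch (assuming deb-src entries are enabled,
the usual kernel build dependencies are installed, and the package
names are adjusted to taste):

  apt-get source linux          # fetch the Debian kernel source package
  cd linux-*/
  make defconfig                # generate a small default configuration
  time make -j"$(nproc)"        # build on all cores; watch temps meanwhile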




Thanks for the help!


YW.  :-)


David



Re: Weird behaviour on System under high load

2023-05-28 Thread Christian
>  Original Message 
> From: David Christensen 
> To: debian-user@lists.debian.org
> Subject: Re: Weird behaviour on System under high load
> Date: Sat, 27 May 2023 16:30:05 -0700
> 
> On 5/27/23 15:28, Christian wrote:
> 
> > New day, new tests. Got a crash again, however with the message "AHCI
> > controller unavailable".
> > Figured that is the SATA drives not being plugged in the right order.
> > Corrected that and a 3:30h stress test went so far without any issues
> > besides this old bug
> > https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=947685
> > 
> > Seems that I am just jumping from one error to the next...
> 
> 
> 3 hours and 30 minutes?  Yikes!  Please stop before you fry your 
> computer.  10 seconds should be enough to see a problem; 1 minute is 
> more than enough.
> 
Sadly not always. My crashes before would occur after anywhere between
a few minutes and 1 hour of load. Now I hope everything is stable.
Crashes are gone; only the network error seems to be unresolved (even
though there is some workaround).

With the undervolting/overclocking on the 12-core stress test, the system
stays below 65°C (on Smbusmaster0), so there should be no risk of damage.

Thanks for the help!
> 
> David
> 
> 
> 



Re: Weird behaviour on System under high load

2023-05-27 Thread David Christensen

On 5/27/23 15:28, Christian wrote:


New day, new tests. Got a crash again, however with the message "AHCI
controller unavailable".
Figured that is the SATA drives not being plugged in the right order.
Corrected that and a 3:30h stress test went so far without any issues
besides this old bug
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=947685

Seems that I am just jumping from one error to the next...



3 hours and 30 minutes?  Yikes!  Please stop before you fry your 
computer.  10 seconds should be enough to see a problem; 1 minute is 
more than enough.



David




Re: Weird behaviour on System under high load

2023-05-27 Thread Christian
>  Original Message 
> From: David Christensen 
> To: debian-user@lists.debian.org
> Subject: Re: Weird behaviour on System under high load
> Date: Fri, 26 May 2023 18:22:17 -0700
> 
> On 5/26/23 16:08, Christian wrote:
> 
> > Good and bad things:
> > I started to test different setups (always with full 12 core stress
> > test). Boot from USB liveCD (only stress and s-tui installed):
> > 
> > - All disks disconnected, other than M2. Standard BIOS
> > - All disks disconnected, other than M2. Proper Memory profile for
> > timing
> > - All disks disconnected, other than M2. Memory profile, undervolted
> > and overclocked with limited burst to 4 GHz
> > - All disks connected. Memory profile, undervolted and overclocked
> > with limited burst to 4 GHz
> > 
> > All settings so far are stable. :-/
> > Will see tomorrow any differences in non-free firmware and kernel
> > modules and test again.
> > 
> > Very strange...
> 
> 
> If everything is stable, including undervoltage and overclocking, I 
> would consider that good.  I think your hardware is good.
> 
> 
> When you say "USB liveCD", is that a USB optical drive with a live
> CD, a 
> USB flash drive with a bootable OS on it, or something else?  If it
> is 
> something that can change, I suggest taking an image of the raw blocks
> with dd(1) so that you can easily get back to this point as you
> continue 
> testing.
> 

New day, new tests. Got a crash again, however with the message "AHCI
controller unavailable".
Figured that is the SATA drives not being plugged in the right order.
Corrected that and a 3:30h stress test went so far without any issues
besides this old bug
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=947685

Seems that I am just jumping from one error to the next...

> 
> AIUI Debian can include microcode patches (depending upon processor).
> If you are using such, I suggest adding that to your test agenda
> first.
> 
> 
> Firmware and kernel modules seem like the right next steps.
> 
> 
> David
> 
> 



Re: Weird behaviour on System under high load

2023-05-26 Thread David Christensen

On 5/26/23 16:08, Christian wrote:


Good and bad things:
I started to test different setups (always with full 12 core stress
test). Boot from USB liveCD (only stress and s-tui installed):

- All disks disconnected, other than M2. Standard BIOS
- All disks disconnected, other than M2. Proper Memory profile for
timing
- All disks disconnected, other than M2. Memory profile, undervolted
and overclocked with limited burst to 4 GHz
- All disks connected. Memory profile, undervolted and overclocked with
limited burst to 4 GHz

All settings so far are stable. :-/
Will see tomorrow any differences in non-free firmware and kernel
modules and test again.

Very strange...



If everything is stable, including undervoltage and overclocking, I 
would consider that good.  I think your hardware is good.



When you say "USB liveCD", is that a USB optical drive with a live CD, a 
USB flash drive with a bootable OS on it, or something else?  If it is 
something that can change, I suggest taking an image of the raw blocks 
with dd(1) so that you can easily get back to this point as you continue 
testing.
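
For example, a minimal sketch, where /dev/sdX is a placeholder for the
actual device node of the stick:

  dd if=/dev/sdX of=liveusb.img bs=4M status=progress   # save raw blocks
  dd if=liveusb.img of=/dev/sdX bs=4M status=progress   # restore later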



AIUI Debian can include microcode patches (depending upon processor). 
If you are using such, I suggest adding that to your test agenda first.



Firmware and kernel modules seem like the right next steps.


David



Re: Weird behaviour on System under high load

2023-05-26 Thread Christian
>  Original Message 
> From: David Christensen 
> To: debian-user@lists.debian.org
> Subject: Re: Weird behaviour on System under high load
> Date: Sun, 21 May 2023 15:04:44 -0700
> 
> 
> > > > > What stress test are you using?
> 
> > ... the package and command "s-tui" and "stress".
> > s-tui gives you an overview of power usage, fan control, temps, core
> > frequencies and core utilization on the console.
> > 
> > stress is just producing load on a selected # of CPUs; it can be
> > integrated in s-tui.
> 
> 
> Thanks -- I like tools and will play with it:
> 
> https://packages.debian.org/bullseye/s-tui
> 
> 
> > > Okay.  Put my Perl script on your liveUSB.  Also put some tool for
> > > monitoring CPU temperature, such as sensors(1).
> > 
> > Will have time again in a few days and check.
> 
> 
> Please let us know what you find.
> 
Good and bad things: 
I started to test different setups (always with full 12 core stress
test). Boot from USB liveCD (only stress and s-tui installed):

- All disks disconnected, other than M2. Standard BIOS
- All disks disconnected, other than M2. Proper Memory profile for
timing
- All disks disconnected, other than M2. Memory profile, undervolted
and overclocked with limited burst to 4 GHz
- All disks connected. Memory profile, undervolted and overclocked with
limited burst to 4 GHz

All settings so far are stable. :-/
Will see tomorrow any differences in non-free firmware and kernel
modules and test again.

Very strange...



Re: Weird behaviour on System under high load

2023-05-21 Thread David Christensen

On 5/21/23 14:46, Christian wrote:

David Christensen Sun, 21 May 2023 14:22:22 -0700

On 5/21/23 06:31, Christian wrote:

David Christensen Sun, 21 May 2023 03:11:43 -0700

David Christensen Sat, 20 May 2023 18:00:48 -0700



Heat sinks, heat pipes, water blocks, radiators, fans, ducts, etc.


It is quite simple:
- Noctua NH-L9a-AM4 for CPU
- Chassis 12cm fan
- PSU Integrated fans



I like the Noctua.  :-)



What stress test are you using?



... the package and command "s-tui" and "stress".
s-tui gives you an overview of power usage, fan control, temps, core
frequencies and core utilization on the console.

stress is just producing load on a selected # of CPUs; it can be
integrated in s-tui.



Thanks -- I like tools and will play with it:

https://packages.debian.org/bullseye/s-tui



Okay.  Put my Perl script on your liveUSB.  Also put some tool for
monitoring CPU temperature, such as sensors(1).


Will have time again in a few days and check.



Please let us know what you find.


David



Re: Weird behaviour on System under high load

2023-05-21 Thread Christian
>  Original Message 
> From: David Christensen 
> To: debian-user@lists.debian.org
> Subject: Re: Weird behaviour on System under high load
> Date: Sun, 21 May 2023 14:22:22 -0700
> 
> On 5/21/23 06:31, Christian wrote:
> > David Christensen Sun, 21 May 2023 03:11:43 -0700
> 
>  >>> David Christensen Sat, 20 May 2023 18:00:48 -0700
> 
> > > Please use inline posting style and proper indentation.
> > 
> > Phew... will be quite hard to read. But here you go.
> 
> 
> It is not hard when you delete the portions that you are not
> responding to.
> 
> 
> > > > > Have you cleaned the system interior, filters, fans, heatsinks,
> > > > > ducts, etc., recently?
> 
> > As written in OP, the system is new. Only the PSU is used. So it is
> > clean.
> 
> 
> Okay.
> 
> 
> > What is a thermal solution?
> 
> 
> Heat sinks, heat pipes, water blocks, radiators, fans, ducts, etc.
> 
It is quite simple:
- Noctua NH-L9a-AM4 for CPU
- Chassis 12cm fan
- PSU Integrated fans
> 
> > What stress test are you using?
> > > 
> > stress running in s-tui
> 
> 
> Do you mean "in situ"?
> 
> https://www.merriam-webster.com/dictionary/in%20situ
> 
No, it is the package and command "s-tui" and "stress".
s-tui gives you an overview of power usage, fan control, temps, core
frequencies and core utilization on the console.

stress is just producing load on a selected # of CPUs; it can be
integrated in s-tui.
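
A minimal sketch of how they can be run together (the thread count and
duration here are just examples):

  s-tui                            # dashboard: temps, frequencies, power
  # in another terminal, or via s-tui's integrated stress mode:
  stress --cpu 12 --timeout 1800   # load 12 CPU threads for 30 minutes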

> I prefer a tool that I can control.  That is why I wrote the
> previously 
> attached Perl script.  It is public domain; you and everyone are free
> to 
> use, modify, distribute, etc., as you see fit.
> 
> 
> > > > > Have you tested the power supply recently?
> 
> > It was working before without issues, so not explicitly tested.
> 
> > I am not building regularly, so would need to borrow such equipment
> > somewhere
> 
> 
> Understand that an ATX PSU has multiple stages that produce +12 VDC,
> +5 
> VDC, +5 VDC standby, +3.3 VDC, and -12 VDC ("rails").  It is common
> for 
> one or more rails to fail and the others to continue working. 
> Computers 
> exhibit "weird behaviour" when this happens.
> 
> 
> Just spend the US$20.
> 
> 
> > > > > Have you tested the memory recently?
> 
> > > Did you do multi-threaded/ stress tests?
> > > 
> > Yes, stress is running multiple threads. Only on 2 threads was it
> > stable so far. However, it takes longer for the errors to come up
> > when using fewer threads; might be that I did not test long enough.
> 
> 
> I use Memtest86+ 5.01 on a bootable USB stick.  In the
> "Configuration" 
> menu, I can choose "Core Selection".  It appears the default is 
> "Parallel (all)".  Other choices include "Round Robin" and
> "Sequential". 
>   Memtest 5.01 also displays the CPU temperature.  Running it on an
> Intel 
> Core i7-2600S with matching factory heat sink and fan for 30+
> minutes, 
> the current CPU temperature is 50 C.  This leads me to believe that
> the 
> memory is loaded to 100%, but the CPU is less (perhaps 60%?).
> 
> https://memtest.org/
> 
> 
> I recommend that you run Memtest86+ in parallel mode for at least one
> pass.  I have seen computers go for 20+ hours before encountering a 
> memory error.
> 
> 
> > > Did you see the problems when running Debian stable OOTB, before
> > > adding
> > > anything?
> 
> > I would need to do this with a liveUSB, to have it run OOTB
> 
> 
> Okay.  Put my Perl script on your liveUSB.  Also put some tool for 
> monitoring CPU temperature, such as sensors(1).

Will have time again in a few days and check.

> 
> 
> David
> 
> 



Re: Weird behaviour on System under high load

2023-05-21 Thread David Christensen

On 5/21/23 06:31, Christian wrote:

David Christensen Sun, 21 May 2023 03:11:43 -0700


>>> David Christensen Sat, 20 May 2023 18:00:48 -0700


Please use inline posting style and proper indentation.


Phew... will be quite hard to read. But here you go.



It is not hard when you delete the portions that you are not responding to.



Have you cleaned the system interior, filters, fans, heatsinks,
ducts, etc., recently?



As written in OP, the system is new. Only the PSU is used. So it is clean.



Okay.



What is a thermal solution?



Heat sinks, heat pipes, water blocks, radiators, fans, ducts, etc.



What stress test are you using?


stress running in s-tui



Do you mean "in situ"?

https://www.merriam-webster.com/dictionary/in%20situ


I prefer a tool that I can control.  That is why I wrote the previously 
attached Perl script.  It is public domain; you and everyone are free to 
use, modify, distribute, etc., as you see fit.




Have you tested the power supply recently?



It was working before without issues, so not explicitly tested.



I am not building regularly, so would need to borrow such equipment
somewhere



Understand that an ATX PSU has multiple stages that produce +12 VDC, +5 
VDC, +5 VDC standby, +3.3 VDC, and -12 VDC ("rails").  It is common for 
one or more rails to fail and the others to continue working.  Computers 
exhibit "weird behaviour" when this happens.



Just spend the US$20.



Have you tested the memory recently?



Did you do multi-threaded/ stress tests?


Yes, stress is running multiple threads. Only on 2 threads was it
stable so far. However, it takes longer for the errors to come up when
using fewer threads; might be that I did not test long enough.



I use Memtest86+ 5.01 on a bootable USB stick.  In the "Configuration" 
menu, I can choose "Core Selection".  It appears the default is 
"Parallel (all)".  Other choices include "Round Robin" and "Sequential". 
 Memtest 5.01 also displays the CPU temperature.  Running it on an Intel 
Core i7-2600S with matching factory heat sink and fan for 30+ minutes, 
the current CPU temperature is 50 C.  This leads me to believe that the 
memory is loaded to 100%, but the CPU is less (perhaps 60%?).


https://memtest.org/


I recommend that you run Memtest86+ in parallel mode for at least one 
pass.  I have seen computers go for 20+ hours before encountering a 
memory error.




Did you see the problems when running Debian stable OOTB, before
adding anything?



I would need to do this with a liveUSB, to have it run OOTB



Okay.  Put my Perl script on your liveUSB.  Also put some tool for 
monitoring CPU temperature, such as sensors(1).
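
For example, a minimal sketch assuming the lm-sensors package:

  sensors-detect       # one-time hardware probe (as root); answer the prompts
  watch -n 2 sensors   # refresh all temperature readings every 2 seconds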



David



Re: Weird behaviour on System under high load

2023-05-21 Thread David Christensen

On 5/21/23 06:26, songbird wrote:

David Christensen wrote:
...

Measuring actual power supply output and system usage would involve
building or buying suitable test equipment.  The cost would be non-trivial.


...

   it depends upon how accurate you want to be and
how much power.

   for my system it was a simple matter of buying a
reasonably sized battery backup unit which includes
in its display the amount of power being drawn in
watts.

   on sale the backup unit cost about $150 USD.  if
i want to see what something draws i have a power
cord set up to use for that and just plug it in
and watch the display as it operates.  if the
device is a computer part i can plug it in to my
motherboard or via usb or ...  as long as it gets
done with a grounding strip and i do the power
turn off and turn back on as is appropriate for
the device (and within ratings of my power supply).

   also use this setup to figure out how much power
the various wall warts are eating.  :(  switches on
all of them are worth the expense.


   songbird



Yes, there are a variety of price/performance options for measuring 
currents and voltages between the AC power outlet and an AC load (such as 
a computer).



But, I was talking about measuring currents and voltages between a 
computer power supply output and the various components inside the 
computer.



David



Re: Weird behaviour on System under high load

2023-05-21 Thread songbird
David Christensen wrote:
...
> Measuring actual power supply output and system usage would involve 
> building or buying suitable test equipment.  The cost would be non-trivial.

...

  it depends upon how accurate you want to be and
how much power.

  for my system it was a simple matter of buying a
reasonably sized battery backup unit which includes
in its display the amount of power being drawn in
watts.

  on sale the backup unit cost about $150 USD.  if
i want to see what something draws i have a power 
cord set up to use for that and just plug it in
and watch the display as it operates.  if the 
device is a computer part i can plug it in to my
motherboard or via usb or ...  as long as it gets
done with a grounding strip and i do the power 
turn off and turn back on as is appropriate for
the device (and within ratings of my power supply).

  also use this setup to figure out how much power
the various wall warts are eating.  :(  switches on
all of them are worth the expense.


  songbird



Re: Weird behaviour on System under high load

2023-05-21 Thread Christian
>  Original Message 
> From: David Christensen 
> To: debian-user@lists.debian.org
> Subject: Re: Weird behaviour on System under high load
> Date: Sun, 21 May 2023 03:11:43 -0700
> 
> On 5/21/23 01:14, Christian wrote:
> 
> > >  Original Message 
> > > From: David Christensen 
> > > To: debian-user@lists.debian.org
> > > Subject: Re: Weird behaviour on System under high load
> > > Date: Sat, 20 May 2023 18:00:48 -0700
> > > 
> > > On 5/20/23 14:46, Christian wrote:
> > > > Hi there,
> > > > 
> > > > I am having trouble with a new build system. It works normally
> > > > and stably until I put extreme stress on it, e.g. using all 12
> > > > cores with the stress tool.
> > > > 
> > > > System will suddenly lose network connection and become
> > > > unresponsive. Only a reset works. I am not sure what is going
> > > > on, but it is reproducible: Put stress on the system and it
> > > > fails. It seems that something is getting out of step.
> > > > 
> > > > Stuff below I found in the logs. I tried quite a bit, even
> > > > upgraded to bookworm, to see if the newer kernel works.
> > > > 
> > > > If anyone knows how to analyze this issue, it would be very
> > > > helpful.
> 
> 
> Please use inline posting style and proper indentation.

Phew... will be quite hard to read. But here you go.

> 
> 
> > > Have you verified that your PSU has sufficient capacity for the
> > > load on
> > > each and every rail?
> 
>  > Hi there,
>  >
>  > Let's go through the different topics:
>  > - Setup: It is an AMD 5600G
> 
> https://www.amd.com/en/products/apu/amd-ryzen-5-5600g
> 
> 65 W
> 
> 
>  > on an ASRock B550M-ITX/ac,
> 
> 
> https://www.asrock.com/mb/AMD/B550M-ITXac/index.asp
> 
> 
>  > powered by a BeQuiet SP7 300W
>  >
>  > - Power: From the specifications it should fit. As it takes 5-20
>  > minutes for the error to occur, I would take that as an indication
>  > that the power supply is ok. Otherwise I would expect it to fail
>  > right away? Is there a way to measure/test if there is any issue
>  > with it? I also tested limiting PPT to 45W, which also makes no
>  > difference.
> 
> 
> If all you have is a motherboard, a 65W CPU, and an SSD, that looks
> like a good quality 300W PSU and I would think it should support
> long-term full loading of the CPU.  But, there is no substitute for
> doing the engineering.
> 
> 
> I do PSU calculations using a spreadsheet.  This requires finding
> power 
> specifications (or making estimates) for everything in the system,
> which 
> can be tough.
> 
> 
> BeQuiet has a PSU calculator.  I suggest using it:
> 
> https://www.bequiet.com/en/psucalculator
> 
> 
> Measuring actual power supply output and system usage would involve 
> building or buying suitable test equipment.  The cost would be non-
> trivial.
> 
> 
> An easy A/B test would be to connect a known-good, high-quality PSU
> with 
> a higher power rating (say, 500-1000W).  I use:
> 
> https://www.fractal-design.com/products/power-supplies/ion/ion-2-platinum-660w/black/
> 
Used the calculator; however, it might be that the onboard graphics is
not properly accounted for. Will see that I get a 500W PSU for testing.
> 
> > > Have you cleaned the system interior, filters, fans, heatsinks,
> > > ducts,
> > > etc., recently?
> 
> 
> ?
As written in OP, the system is new. Only the PSU is used. So it is clean.
> 
> 
> > > Have you tested the thermal solution(s) recently?
> 
>  > - Thermal: I am observing the temperatures during the stress test.
>  > If I am correct in reading Smbusmaster0, temps haven't been above
>  > 71°C, but the error also occurs earlier, way below 70.
> 
> 
> Okay.
> 
> 
> What is your CPU thermal solution?
> 
What is a thermal solution?
> 
> What stress test are you using?
> 
stress running in s-tui
> 
> > > Have you tested the power supply recently?
> 
It was working before without issues, so not explicitly tested.
> 
> I suffered a rash of bad PSUs recently.  I was able to figure it out
> because I bought an inexpensive PSU tester years ago.  It has saved
> my 
> sanity more than once.  I suggest that you buy something like it:
> 
> https://www.ebay.com/sch/i.html?_from=R40&_t

Re: Weird behaviour on System under high load

2023-05-21 Thread David Christensen

On 5/21/23 01:14, Christian wrote:


 Original Message 
From: David Christensen 
To: debian-user@lists.debian.org
Subject: Re: Weird behaviour on System under high load
Date: Sat, 20 May 2023 18:00:48 -0700

On 5/20/23 14:46, Christian wrote:

Hi there,

I am having trouble with a new build system. It works normally and
stably until I put extreme stress on it, e.g. using all 12 cores with
the stress tool.

System will suddenly lose network connection and become unresponsive.
Only a reset works. I am not sure what is going on, but it is
reproducible: Put stress on the system and it fails. It seems that
something is getting out of step.

Stuff below I found in the logs. I tried quite a bit, even upgraded to
bookworm, to see if the newer kernel works.

If anyone knows how to analyze this issue, it would be very helpful.



Please use inline posting style and proper indentation.



Have you verified that your PSU has sufficient capacity for the load on
each and every rail?


> Hi there,
>
> Let's go through the different topics:
> - Setup: It is an AMD 5600G

https://www.amd.com/en/products/apu/amd-ryzen-5-5600g

65 W


> on an ASRock B550M-ITX/ac,


https://www.asrock.com/mb/AMD/B550M-ITXac/index.asp


> powered by a BeQuiet SP7 300W
>
> - Power: From the specifications it should fit. As it takes 5-20
> minutes for the error to occur, I would take that as an indication
> that the power supply is ok. Otherwise I would expect it to fail right
> away? Is there a way to measure/test if there is any issue with it?
> I also tested limiting PPT to 45W, which also makes no difference.


If all you have is a motherboard, a 65W CPU, and an SSD, that looks like a 
good quality 300W PSU and I would think it should support long-term full 
loading of the CPU.  But, there is no substitute for doing the engineering.



I do PSU calculations using a spreadsheet.  This requires finding power 
specifications (or making estimates) for everything in the system, which 
can be tough.



BeQuiet has a PSU calculator.  I suggest using it:

https://www.bequiet.com/en/psucalculator


Measuring actual power supply output and system usage would involve 
building or buying suitable test equipment.  The cost would be non-trivial.



An easy A/B test would be to connect a known-good, high-quality PSU with 
a higher power rating (say, 500-1000W).  I use:


https://www.fractal-design.com/products/power-supplies/ion/ion-2-platinum-660w/black/



Have you cleaned the system interior, filters, fans, heatsinks, ducts,
etc., recently?



?



Have you tested the thermal solution(s) recently?


> - Thermal: I am observing the temperatures during the stress test. If I
> am correct in reading Smbusmaster0, temps haven't been above 71°C, but
> the error also occurs earlier, way below 70.


Okay.


What is your CPU thermal solution?


What stress test are you using?



Have you tested the power supply recently?



I suffered a rash of bad PSUs recently.  I was able to figure it out 
because I bought an inexpensive PSU tester years ago.  It has saved my 
sanity more than once.  I suggest that you buy something like it:


https://www.ebay.com/sch/i.html?_from=R40&_trksid=m570.l1313&_nkw=antec+atx12+tester&_sacat=0



Have you tested the memory recently?


> - Memory: Yes, it was tested right after the build with no errors


Okay.


Did you do multi-threaded/ stress tests?



Are you running Debian stable?


Are you running Debian stable packages only?  Were they all installed
with the same package manager?


> - OS: I was running Debian stable in quite a minimal configuration
> (fresh install as most services are dockerized) when I first observed
> the error. Now moved to Debian 12/Bookworm to see if it makes any
> difference with a newer kernel (it does not). Also exchanged r8169 for
> the r8168. It changes the error messages; however, system instability
> stays.


Did you see the problems when running Debian stable OOTB, before adding 
anything?



Did you stress test the system before adding anything (other than the 
stress test)?




If all of the above are okay and the system is still locking up, I would
disable or remove all disks in the system, install a zeroed SSD, install
Debian stable choosing only "SSH server" and "standard system
utilities", install only the stable packages required for your workload,
put the workload on it, and see what happens.


> I could disconnect the disks and see if it makes any difference.
> However, when reproducing this error, disks other than the system disk
> were unmounted. So I would guess this would be a test to see if it is
> about power?


Stripping the system down to minimum hardware and software is a good 
starting point.  You will need a tool to load the system and some means 
to watch what happens.  Assuming the base configuration passes all 
tests, then add something, test, and repeat until testing fails.
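
One minimal sketch of "load plus watch", assuming stress and lm-sensors
are installed (the duration and log path are placeholders):

  stress --cpu 12 --timeout 3600 &                      # background load
  while sleep 10; do date; sensors; done >> temps.log   # timestamped temps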




Re: Weird behaviour on System under high load

2023-05-21 Thread Christian
Hi there,

Let's go through the different topics:
- Setup: It is an AMD 5600G on an ASRock B550M-ITX/ac, powered by a
BeQuiet SP7 300W

- Power: From the specifications it should fit. As it takes 5-20
minutes for the error to occur, I would take that as an indication
that the power supply is ok. Otherwise I would expect it to fail right
away? Is there a way to measure/test if there is any issue with it?
I also tested limiting PPT to 45W, which also makes no difference.

- Memory: Yes, it was tested right after the build with no errors

- Thermal: I am observing the temperatures during the stress test. If I
am correct in reading Smbusmaster0, temps haven't been above 71°C, but
the error also occurs earlier, way below 70.

- OS: I was running Debian stable in quite a minimal configuration
(fresh install as most services are dockerized) when I first observed
the error. Now moved to Debian 12/Bookworm to see if it makes any
difference with a newer kernel (it does not). Also exchanged r8169 for
the r8168. It changes the error messages; however, system instability
stays.

I could disconnect the disks and see if it makes any difference.
However, when reproducing this error, disks other than the system disk
were unmounted. So I would guess this would be a test to see if it is
about power?

 Original Message 
From: David Christensen 
To: debian-user@lists.debian.org
Subject: Re: Weird behaviour on System under high load
Date: Sat, 20 May 2023 18:00:48 -0700

On 5/20/23 14:46, Christian wrote:
> Hi there,
> 
> I am having trouble with a new build system. It works normally and
> stably until I put extreme stress on it, e.g. using all 12 cores with
> the stress tool.
> 
> System will suddenly lose network connection and become unresponsive.
> Only a reset works. I am not sure what is going on, but it is
> reproducible: Put stress on the system and it fails. It seems that
> something is getting out of step.
> 
> Stuff below I found in the logs. I tried quite a bit, even upgraded to
> bookworm, to see if the newer kernel works.
> 
> If anyone knows how to analyze this issue, it would be very helpful.
> 
> Kind regards
>    Christian
> 
> 
> 2023-05-20T20:12:17.054224+02:00 diskstation kernel: [ 1303.236428] -
> --
> -[ cut here ]
> 2023-05-20T20:12:17.054234+02:00 diskstation kernel: [ 1303.236430]
> NETDEV WATCHDOG: enp3s0 (r8169): transmit queue 0 timed out
> 2023-05-20T20:12:17.054235+02:00 diskstation kernel: [ 1303.236437]
> WARNING: CPU: 5 PID: 2411 at net/sched/sch_generic.c:525
> dev_watchdog+0x207/0x210
> 2023-05-20T20:12:17.054236+02:00 diskstation kernel: [ 1303.236442]
> Modules linked in: eq3_char_loop(OE) rpi_rf_mod_led(OE) ledtrig_timer
> ledtrig_default_on xt_MASQUERADE nf_conntrack_netlink xfrm_user
> xfrm_algo xt_addrtype br_netfilter bridge stp llc overlay ip6t_rt
> nft_chain_nat nf_nat xt_set xt_tcpmss xt_tcpudp xt_conntrack
> nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nft_compat nf_tables
> ip_set_hash_ip ip_set binfmt_misc nfnetlink nls_ascii nls_cp437 vfat
> fat amdgpu iwlmvm btusb intel_rapl_msr btrtl intel_rapl_common btbcm
> btintel edac_mce_amd btmtk mac80211 snd_hda_codec_realtek bluetooth
> snd_hda_codec_generic ledtrig_audio snd_hda_codec_hdmi gpu_sched
> kvm_amd drm_buddy libarc4 snd_hda_intel drm_display_helper
> snd_intel_dspcfg snd_intel_sdw_acpi iwlwifi kvm cec snd_hda_codec
> jitterentropy_rng irqbypass rc_core snd_hda_core cfg80211 snd_hwdep
> drm_ttm_helper snd_pcm ttm drbg wmi_bmof rapl ccp snd_timer
> ansi_cprng
> drm_kms_helper sp5100_tco snd pcspkr ecdh_generic rng_core
> i2c_algo_bit
> watchdog soundcore k10temp rfkill hb_rf_usb_2(OE) ecc
> 2023-05-20T20:12:17.054240+02:00 diskstation kernel: [ 1303.236494]
> generic_raw_uart(OE) acpi_cpufreq button joydev evdev sg nct6775
> nct6775_core drm hwmon_vid fuse loop efi_pstore configfs efivarfs
> ip_tables x_tables autofs4 ext4 crc16 mbcache jbd2 btrfs
> blake2b_generic xor raid6_pq zstd_compress libcrc32c crc32c_generic
> dm_crypt dm_mod hid_generic usbhid hid sd_mod crc32_pclmul
> crc32c_intel
> ahci ghash_clmulni_intel sha512_ssse3 libahci xhci_pci sha512_generic
> xhci_hcd r8169 nvme realtek libata aesni_intel nvme_core t10_pi
> crypto_simd mdio_devres usbcore scsi_mod crc64_rocksoft_generic
> cryptd
> libphy crc64_rocksoft crc_t10dif i2c_piix4 crct10dif_generic
> crct10dif_pclmul crc64 crct10dif_common usb_common scsi_common video
> wmi gpio_amdpt gpio_generic
> 2023-05-20T20:12:17.054241+02:00 diskstation kernel: [ 1303.236534]
> CPU: 5 PID: 2411 Comm: stress Tainted: G   OE  6.1.0-9-
> amd64 #1  Debian 6.1.27-1
> 2023-05-20T20:12:17.054241+02:00 diskstation kernel: [ 1303.236536]
> Hardware name: To Be Filled By O.E.M. B550M-ITX/ac/B550M-ITX/ac, BIOS
> L2.62 01/31/2023
> 2023-05-20T20:12:17.

Re: Weird behaviour on System under high load

2023-05-20 Thread David Christensen

On 5/20/23 14:46, Christian wrote:

Hi there,

I am having trouble with a new build system. It works normally and stably
until I put extreme stress on it, e.g. using all 12 cores with the stress
tool.

System will suddenly lose network connection and become unresponsive.
Only a reset works. I am not sure what is going on, but it is
reproducible: Put stress on the system and it fails. It seems that
something is getting out of step.

Stuff below I found in the logs. I tried quite a bit, even upgraded to
bookworm, to see if the newer kernel works.

If anyone knows how to analyze this issue, it would be very helpful.

Kind regards
   Christian


2023-05-20T20:12:17.054224+02:00 diskstation kernel: [ 1303.236428] ---
-[ cut here ]
2023-05-20T20:12:17.054234+02:00 diskstation kernel: [ 1303.236430]
NETDEV WATCHDOG: enp3s0 (r8169): transmit queue 0 timed out
2023-05-20T20:12:17.054235+02:00 diskstation kernel: [ 1303.236437]
WARNING: CPU: 5 PID: 2411 at net/sched/sch_generic.c:525
dev_watchdog+0x207/0x210
2023-05-20T20:12:17.054236+02:00 diskstation kernel: [ 1303.236442]
Modules linked in: eq3_char_loop(OE) rpi_rf_mod_led(OE) ledtrig_timer
ledtrig_default_on xt_MASQUERADE nf_conntrack_netlink xfrm_user
xfrm_algo xt_addrtype br_netfilter bridge stp llc overlay ip6t_rt
nft_chain_nat nf_nat xt_set xt_tcpmss xt_tcpudp xt_conntrack
nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nft_compat nf_tables
ip_set_hash_ip ip_set binfmt_misc nfnetlink nls_ascii nls_cp437 vfat
fat amdgpu iwlmvm btusb intel_rapl_msr btrtl intel_rapl_common btbcm
btintel edac_mce_amd btmtk mac80211 snd_hda_codec_realtek bluetooth
snd_hda_codec_generic ledtrig_audio snd_hda_codec_hdmi gpu_sched
kvm_amd drm_buddy libarc4 snd_hda_intel drm_display_helper
snd_intel_dspcfg snd_intel_sdw_acpi iwlwifi kvm cec snd_hda_codec
jitterentropy_rng irqbypass rc_core snd_hda_core cfg80211 snd_hwdep
drm_ttm_helper snd_pcm ttm drbg wmi_bmof rapl ccp snd_timer ansi_cprng
drm_kms_helper sp5100_tco snd pcspkr ecdh_generic rng_core i2c_algo_bit
watchdog soundcore k10temp rfkill hb_rf_usb_2(OE) ecc
2023-05-20T20:12:17.054240+02:00 diskstation kernel: [ 1303.236494]
generic_raw_uart(OE) acpi_cpufreq button joydev evdev sg nct6775
nct6775_core drm hwmon_vid fuse loop efi_pstore configfs efivarfs
ip_tables x_tables autofs4 ext4 crc16 mbcache jbd2 btrfs
blake2b_generic xor raid6_pq zstd_compress libcrc32c crc32c_generic
dm_crypt dm_mod hid_generic usbhid hid sd_mod crc32_pclmul crc32c_intel
ahci ghash_clmulni_intel sha512_ssse3 libahci xhci_pci sha512_generic
xhci_hcd r8169 nvme realtek libata aesni_intel nvme_core t10_pi
crypto_simd mdio_devres usbcore scsi_mod crc64_rocksoft_generic cryptd
libphy crc64_rocksoft crc_t10dif i2c_piix4 crct10dif_generic
crct10dif_pclmul crc64 crct10dif_common usb_common scsi_common video
wmi gpio_amdpt gpio_generic
2023-05-20T20:12:17.054241+02:00 diskstation kernel: [ 1303.236534]
CPU: 5 PID: 2411 Comm: stress Tainted: G   OE  6.1.0-9-
amd64 #1  Debian 6.1.27-1
2023-05-20T20:12:17.054241+02:00 diskstation kernel: [ 1303.236536]
Hardware name: To Be Filled By O.E.M. B550M-ITX/ac/B550M-ITX/ac, BIOS
L2.62 01/31/2023
2023-05-20T20:12:17.054242+02:00 diskstation kernel: [ 1303.236537]
RIP: 0010:dev_watchdog+0x207/0x210
2023-05-20T20:12:17.054242+02:00 diskstation kernel: [ 1303.236540]
Code: 00 e9 40 ff ff ff 48 89 df c6 05 ff 5f 3d 01 01 e8 be 79 f9 ff 44
89 e9 48 89 de 48 c7 c7 c8 16 9b a8 48 89 c2 e8 09 d2 86 ff <0f> 0b e9
22 ff ff ff 66 90 0f 1f 44 00 00 55 53 48 89 fb 48 8b 6f
2023-05-20T20:12:17.054243+02:00 diskstation kernel: [ 1303.236541]
RSP: :a831c345fdc8 EFLAGS: 00010286
2023-05-20T20:12:17.054243+02:00 diskstation kernel: [ 1303.236543]
RAX:  RBX: 91a3c141 RCX: 
2023-05-20T20:12:17.054243+02:00 diskstation kernel: [ 1303.236544]
RDX: 0103 RSI: a893fa66 RDI: 
2023-05-20T20:12:17.054244+02:00 diskstation kernel: [ 1303.236545]
RBP: 91a3c1410488 R08:  R09: a831c345fc38
2023-05-20T20:12:17.054244+02:00 diskstation kernel: [ 1303.236546]
R10: 0003 R11: 91aafe27afe8 R12: 91a3c14103dc
2023-05-20T20:12:17.054245+02:00 diskstation kernel: [ 1303.236547]
R13:  R14: a7e2e7a0 R15: 91a3c1410488
2023-05-20T20:12:17.054245+02:00 diskstation kernel: [ 1303.236548] FS:
7f169849d740() GS:91aade34() knlGS:
2023-05-20T20:12:17.054246+02:00 diskstation kernel: [ 1303.236550] CS:
0010 DS:  ES:  CR0: 80050033
2023-05-20T20:12:17.054246+02:00 diskstation kernel: [ 1303.236551]
CR2: 55d05c3f4000 CR3: 000103cf2000 CR4: 00750ee0
2023-05-20T20:12:17.054246+02:00 diskstation kernel: [ 1303.236552]
PKRU: 5554
2023-05-20T20:12:17.054247+02:00 diskstation kernel: [ 1303.236553]
Call Trace:
2023-05-20T20:12:17.054247+02:00 diskstation kernel: [ 1303.236554]

2023-05-20T20:12:17.054248+02:00 diskstation kernel: [ 

Weird behaviour on System under high load

2023-05-20 Thread Christian
Hi there,

I am having trouble with a new build system. It works normally and stably
until I put extreme stress on it, e.g. using all 12 cores with the stress
tool.

System will suddenly lose network connection and become unresponsive.
Only a reset works. I am not sure what is going on, but it is
reproducible: Put stress on the system and it fails. It seems that
something is getting out of step.

Stuff below I found in the logs. I tried quite a bit, even upgraded to
bookworm, to see if the newer kernel works.

If anyone knows how to analyze this issue, it would be very helpful.

Kind regards
  Christian


2023-05-20T20:12:17.054224+02:00 diskstation kernel: [ 1303.236428] ---
-[ cut here ]
2023-05-20T20:12:17.054234+02:00 diskstation kernel: [ 1303.236430]
NETDEV WATCHDOG: enp3s0 (r8169): transmit queue 0 timed out
2023-05-20T20:12:17.054235+02:00 diskstation kernel: [ 1303.236437]
WARNING: CPU: 5 PID: 2411 at net/sched/sch_generic.c:525
dev_watchdog+0x207/0x210
2023-05-20T20:12:17.054236+02:00 diskstation kernel: [ 1303.236442]
Modules linked in: eq3_char_loop(OE) rpi_rf_mod_led(OE) ledtrig_timer
ledtrig_default_on xt_MASQUERADE nf_conntrack_netlink xfrm_user
xfrm_algo xt_addrtype br_netfilter bridge stp llc overlay ip6t_rt
nft_chain_nat nf_nat xt_set xt_tcpmss xt_tcpudp xt_conntrack
nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nft_compat nf_tables
ip_set_hash_ip ip_set binfmt_misc nfnetlink nls_ascii nls_cp437 vfat
fat amdgpu iwlmvm btusb intel_rapl_msr btrtl intel_rapl_common btbcm
btintel edac_mce_amd btmtk mac80211 snd_hda_codec_realtek bluetooth
snd_hda_codec_generic ledtrig_audio snd_hda_codec_hdmi gpu_sched
kvm_amd drm_buddy libarc4 snd_hda_intel drm_display_helper
snd_intel_dspcfg snd_intel_sdw_acpi iwlwifi kvm cec snd_hda_codec
jitterentropy_rng irqbypass rc_core snd_hda_core cfg80211 snd_hwdep
drm_ttm_helper snd_pcm ttm drbg wmi_bmof rapl ccp snd_timer ansi_cprng
drm_kms_helper sp5100_tco snd pcspkr ecdh_generic rng_core i2c_algo_bit
watchdog soundcore k10temp rfkill hb_rf_usb_2(OE) ecc
2023-05-20T20:12:17.054240+02:00 diskstation kernel: [ 1303.236494] 
generic_raw_uart(OE) acpi_cpufreq button joydev evdev sg nct6775
nct6775_core drm hwmon_vid fuse loop efi_pstore configfs efivarfs
ip_tables x_tables autofs4 ext4 crc16 mbcache jbd2 btrfs
blake2b_generic xor raid6_pq zstd_compress libcrc32c crc32c_generic
dm_crypt dm_mod hid_generic usbhid hid sd_mod crc32_pclmul crc32c_intel
ahci ghash_clmulni_intel sha512_ssse3 libahci xhci_pci sha512_generic
xhci_hcd r8169 nvme realtek libata aesni_intel nvme_core t10_pi
crypto_simd mdio_devres usbcore scsi_mod crc64_rocksoft_generic cryptd
libphy crc64_rocksoft crc_t10dif i2c_piix4 crct10dif_generic
crct10dif_pclmul crc64 crct10dif_common usb_common scsi_common video
wmi gpio_amdpt gpio_generic
2023-05-20T20:12:17.054241+02:00 diskstation kernel: [ 1303.236534]
CPU: 5 PID: 2411 Comm: stress Tainted: G   OE  6.1.0-9-
amd64 #1  Debian 6.1.27-1
2023-05-20T20:12:17.054241+02:00 diskstation kernel: [ 1303.236536]
Hardware name: To Be Filled By O.E.M. B550M-ITX/ac/B550M-ITX/ac, BIOS
L2.62 01/31/2023
2023-05-20T20:12:17.054242+02:00 diskstation kernel: [ 1303.236537]
RIP: 0010:dev_watchdog+0x207/0x210
2023-05-20T20:12:17.054242+02:00 diskstation kernel: [ 1303.236540]
Code: 00 e9 40 ff ff ff 48 89 df c6 05 ff 5f 3d 01 01 e8 be 79 f9 ff 44
89 e9 48 89 de 48 c7 c7 c8 16 9b a8 48 89 c2 e8 09 d2 86 ff <0f> 0b e9
22 ff ff ff 66 90 0f 1f 44 00 00 55 53 48 89 fb 48 8b 6f
2023-05-20T20:12:17.054243+02:00 diskstation kernel: [ 1303.236541]
RSP: :a831c345fdc8 EFLAGS: 00010286
2023-05-20T20:12:17.054243+02:00 diskstation kernel: [ 1303.236543]
RAX:  RBX: 91a3c141 RCX: 
2023-05-20T20:12:17.054243+02:00 diskstation kernel: [ 1303.236544]
RDX: 0103 RSI: a893fa66 RDI: 
2023-05-20T20:12:17.054244+02:00 diskstation kernel: [ 1303.236545]
RBP: 91a3c1410488 R08:  R09: a831c345fc38
2023-05-20T20:12:17.054244+02:00 diskstation kernel: [ 1303.236546]
R10: 0003 R11: 91aafe27afe8 R12: 91a3c14103dc
2023-05-20T20:12:17.054245+02:00 diskstation kernel: [ 1303.236547]
R13:  R14: a7e2e7a0 R15: 91a3c1410488
2023-05-20T20:12:17.054245+02:00 diskstation kernel: [ 1303.236548] FS:
7f169849d740() GS:91aade34() knlGS:
2023-05-20T20:12:17.054246+02:00 diskstation kernel: [ 1303.236550] CS:
0010 DS:  ES:  CR0: 80050033
2023-05-20T20:12:17.054246+02:00 diskstation kernel: [ 1303.236551]
CR2: 55d05c3f4000 CR3: 000103cf2000 CR4: 00750ee0
2023-05-20T20:12:17.054246+02:00 diskstation kernel: [ 1303.236552]
PKRU: 5554
2023-05-20T20:12:17.054247+02:00 diskstation kernel: [ 1303.236553]
Call Trace:
2023-05-20T20:12:17.054247+02:00 diskstation kernel: [ 1303.236554] 

2023-05-20T20:12:17.054248+02:00 diskstation kernel: [ 1303.236557]  ?

Re: NFS on Raspberry Pi high load

2015-06-21 Thread Sven Hartge
Bob Proulx b...@proulx.com wrote:

 I don't know about the new Raspberry quad core.  Does it have the same
 limited usb chip as the original?

It does. But because the CPU is more powerful (and you have 4 cores) you
can squeeze about 95MBit/s out of it.

Right now I am dd'ing a 600MB file over NFS (the Raspi2 is the client) to
/dev/null and the transfer rate (measured on the server) is stable at
96.7 MBit/s, but one core is fully occupied with the transfer and the
dd-process is mostly in the D-state and does not use much CPU at all
(about 1% according to top).
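
For reference, a sketch of the kind of command being described here;
the NFS mount point and file name are placeholders:

  dd if=/mnt/nfs/bigfile of=/dev/null           # default block size
  dd if=/mnt/nfs/bigfile of=/dev/null bs=4M     # with bs=4M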

Final results:

(with no special blocksize setting):
1273709+1 records in
1273709+1 records out
652139386 bytes (621.9MB) copied, 55.460434 seconds, 11.2MB/s
real    0m 55.46s
user    0m 0.75s
sys     0m 6.31s

(with bs=4M):
155+1 records in
155+1 records out
652139386 bytes (621.9MB) copied, 55.431787 seconds, 11.2MB/s
real    0m 55.44s
user    0m 0.00s
sys     0m 2.68s

Copying the same file to the SDHC card takes a little bit longer, but
not much:
real    1m 1.91s
user    0m 0.13s
sys     0m 8.12s

Regards,
Sven.

-- 
Sigmentation fault. Core dumped.





Re: NFS on Raspberry Pi high load

2015-06-21 Thread Bob Proulx
Sven Hartge wrote:
 Reco wrote:
  Sven Hartge wrote:
  Maybe the USB hardware implementation is better in the N900? The one
  in the Pi is quite bad and finicky.

I am coming to this discussion late but I had to confirm that the USB
chip in the Raspberry Pi is very limiting.  It has a maximum bandwidth
of around 50Mbps and everything including ethernet goes through it.
This means that if you have one data stream it will get a maximum
of 50Mbps.

If you have two streams, such as if using the Raspberry Pi for a
router and it is routing packets in one interface and out a different
one, then the maximum throughput is 25Mbps with one stream in and one
stream out.  I have a good friend who has worked on the drivers for
the pi and he told me that the usb chip generated a minimum of 2000
interrupts per second even at idle.  I will pass that along as hearsay
because it seems plausible.

For example this CEC limitation makes the Raspberry Pi acceptable for
an 802.11b WiFi access point at 11Mbps but not able to keep up with
'g' or 'n' speeds.  It is simply hardware limited.

If you have 8 nfsds running and let's say each of them tries to use only
one data stream then each will get only 6.25Mbps maximum.  They will
spend a lot of time in the run queue blocked on I/O waiting for the
network to respond.

basti wrote:
 Per default nfs starts with 8 servers
 root@raspberrypi:~# head -n 2 /etc/default/nfs-kernel-server
 # Number of servers to start up
 RPCNFSDCOUNT=8

As you have found, doing any type of real transfer will immediately
consume 8 processes because each daemon will be in the run queue
ready to run but waiting for I/O.  The biggest problem is that each
daemon will consume *memory*.  The daemon won't consume cpu while it
is blocked waiting on I/O.  But it will consume memory.  Memory for
the nfs daemons.  Memory for the kernel to track the multiple network
streams active.  Memory for file system buffer cache.  Everything
takes memory.  The Raspberry Pi has a limited 512M of RAM.

I like seeing the bar graph of the memory visualization from htop.  I
suggest installing htop and looking at the memory bar graph displaying
the amount of consumed memory and the amount available for cache.
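
A quick sketch:

  apt-get install htop   # as root
  htop                   # the Mem bar separates used memory from buffers/cache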

 So I try to transfer a 3GB file from the raspberry to my laptop via
 WLAN(n).  This operation kills my raspberry.  I get a load of 12 and
 more. 10 Minutes after I interrupt this operation the load was still
 at 10.

In addition to the 8 processes consuming memory from the 8 nfsds, there
will need to be additional cpu to deal with the driver for the usb
chip.  It will need to handle the accounting for the multiple network
streams.  A single stream will take less resources than 8 streams.
And anything else that happens along.  That extra accounts for the
load of 10 you are seeing.

But the real problem is probably the lack of memory.  The many
processes stacked up and the I/O buffers will likely have consumed
everything.

 So I decide to reduce the number of servers to 2. Now it's a bit
 better, the load is only around 5.

That was a good configuration modification.  About the best you can do.
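
For anyone following along, a minimal sketch of that change (file path
per Debian's nfs-kernel-server package):

  # drop the NFS server thread count from 8 to 2, then restart
  sed -i 's/^RPCNFSDCOUNT=.*/RPCNFSDCOUNT=2/' /etc/default/nfs-kernel-server
  service nfs-kernel-server restart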

 Can somebody reproduce this behavior?

Yes.  Easily!  It is simply a natural consequence of the limited
hardware of the Raspberry Pi.

I have become a fan of the newer Banana Pi.  It is very Raspberry-like
but has a different CEC and doesn't have that very limited 50Mbps usb
chip found on the Raspberry.  On the Banana Pi there is 1G of ram,
twice that of the Raspberry.  It is a dual core arm, again twice the
Raspberry.  It is an armv7l architecture and therefore runs stock
Debian.  And best yet for your purposes it has much higher speed I/O.

On the Banana Pi I can routinely get 750Mbps through a single ethernet
connection.  That is about the same performance as an Intel Atom D525.
The Banana Pi makes a much better practical machine than the Raspberry.
The price of the Banana is currently running around US $42, only $7
more than the Raspberry.  It is a much more capable machine.

I don't know about the new Raspberry quad core.  Does it have the same
limited usb chip as the original?

Bob




Re: NFS on Raspberry Pi high load

2015-06-19 Thread Sven Hartge
Reco recovery...@gmail.com wrote:
 On Fri, 19 Jun 2015 20:38:12 +0200 Sven Hartge s...@svenhartge.de wrote:


 Maybe the USB hardware implementation is better in the N900? The one
 in the Pi is quite bad and finicky.

 I happen to have Pi too. Not that I need an NFS server on it, NFS
 client is sufficient for my needs, but still.

  
 In addition to that, data transfer via USB is quite CPU-intensive, as
 Petter wrote and overwhelms the single CPU core of the Pi if it needs to
 drive the SD card at the same time.

 Hm. I plugged an Ethernet cable into it, read and wrote a big file via
 NFS. Got consistent 50mbps.

Where did you write the file to and from? You said your Pi is an
NFS client, so I assume you wrote a file to a server and read it back
from there.

 According to iperf, I could go as high as 82.2 mbps. Not the fair
 gigabit I have on this LAN, but close to theoretical 100mbit limit of
 the NIC.

iperf does no file I/O, so nearly every CPU cycle can be used for the USB
transfer.

 During the NFS test, two kernel threads were the worst CPU
 consumers, kworker/0 and ksoftirqd/0.

 During the iperf test, the worst CPU consumers were iperf itself and
 ksoftirqd/0.

 According to the /proc/interrupts, the top interrupt consumer was
 IRQ32, which is:

 dwc_otg, dwc_otg_pcd, dwc_otg_hcd:usb1

That is the driver for the USB port, a DesignWare OnTheGo USB
controller. The controller is able to drive the USB port as either a
host or a client.

This chip and the driver are a constant work in progress and depending
on the kernel version and the firmware your luck with the USB port on
the Pi might be better or worse.

For example:
http://ludovicrousseau.blogspot.de/2014/04/usb-issues-with-raspberry-pi.html

So maybe by updating the bootloader and GPU firmware to the latest from
https://github.com/raspberrypi/firmware one might be able to improve the
situation.

Regards,
Sven.

-- 
Sigmentation fault. Core dumped.





Re: NFS on Raspberry Pi high load

2015-06-19 Thread Michael Biebl
On 19.06.2015 at 14:47, Petter Adsen wrote:
 On Fri, 19 Jun 2015 14:09:45 +0200
 basti black.flederm...@arcor.de wrote:
 
 The problem is not the speed of 3 MB/s, it's the load of 12 and more.

 On 19.06.2015 14:03, Sven Hartge wrote:
 basti black.flederm...@arcor.de wrote:

 iotop show me a read speed around 3 MB/s, there is a Class 10 UHS card
 (10-15 MB/s read, 9-5 MB/s write I guess).
 More than 3MByte/s is not really achievable with a Pi-1, because the CPU
 is very weak and the Ethernet-Chip is attached via USB.

 Under the best conditions you may be able to transfer up to 45MBit/s,
 but a maximum transfer rate of about 35MBit/s is normal.
 
 The load is so high because USB is very CPU-intensive. If you were to
 use the on-board Ethernet, you would not see such a high load.

The pi has no on-board ethernet. The ethernet port is attached via USB.


-- 
Why is it that all of the instruments seeking intelligent life in the
universe are pointed away from Earth?



signature.asc
Description: OpenPGP digital signature


Re: NFS on Raspberry Pi high load

2015-06-19 Thread Reco
 Hi.

On Fri, Jun 19, 2015 at 02:47:20PM +0200, Petter Adsen wrote:
 On Fri, 19 Jun 2015 14:09:45 +0200
 basti black.flederm...@arcor.de wrote:
 
  The Problem is not the speed of 3 MB/s it's the load of 12 and more.
  
  On 19.06.2015 14:03, Sven Hartge wrote:
   basti black.flederm...@arcor.de wrote:
  
   iotop show me a read speed around 3 MB/s, there is a Class 10 UHS card
   (10-15 MB/s read, 9-5 MB/s write I guess).
   More than 3MByte/s is not really achievable with a Pi-1, because the CPU
   is very weak and the Ethernet-Chip is attached via USB.
  
   Under the best conditions you may be able to transfer up to 45MBit/s,
   but a maximum transfer rate of about 35MBit/s is normal.
 
 The load is so high because USB is very CPU-intensive. If you were to
 use the on-board Ethernet, you would not see such a high load.

What? Are you serious? I have this Nokia N900 lying behind me which is
connected by IP-via-USB (aka usbnet aka g_ether) and with an order of
magnitude slower ARM CPU it reliably shows 40mbps with no noticeable
load.

There are countless things I'd blame in this situation (large amounts of
sync I/O from knfsd, relatively small amount of memory for an NFS server,
HUEG read/write latency of MMC card), but blaming the type of Ethernet
connection is the last thing I'd do.

Regardless, there's a way to see the cause of all this trouble.
Relatively new, but a demonstrative one:

perf record -a     # sample all CPUs; stop with Ctrl-C
perf report        # reads ./perf.data by default

Reco





Re: NFS on Raspberry Pi high load

2015-06-19 Thread Sven Hartge
basti black.flederm...@arcor.de wrote:

 iotop shows me a read speed around 3 MB/s, there is a Class 10 UHS card
 (10-15 MB/s read, 9-5 MB/s write I guess).

More than 3MByte/s is not really achievable with a Pi-1, because the CPU
is very weak and the Ethernet chip is attached via USB.

Under the best conditions you may be able to transfer up to 45MBit/s,
but a maximum transfer rate of about 35MBit/s is normal.

Regards,
Sven.

-- 
Sigmentation fault. Core dumped.





Re: NFS on Raspberry Pi high load

2015-06-19 Thread Petter Adsen
On Fri, 19 Jun 2015 14:09:45 +0200
basti black.flederm...@arcor.de wrote:

 The problem is not the speed of 3 MB/s, it's the load of 12 and more.
 
 On 19.06.2015 14:03, Sven Hartge wrote:
  basti black.flederm...@arcor.de wrote:
 
  iotop show me a read speed around 3 MB/s, there is a Class 10 UHS card
  (10-15 MB/s read, 9-5 MB/s write I guess).
  More than 3MByte/s is not really achievable with a Pi-1, because the CPU
  is very weak and the Ethernet-Chip is attached via USB.
 
  Under the best conditions you may be able to transfer up to 45MBit/s,
  but a maximum transfer rate of about 35MBit/s is normal.

The load is so high because USB is very CPU-intensive. If you were to
use the on-board Ethernet, you would not see such a high load.

Petter

-- 
I'm ionized
Are you sure?
I'm positive.




NFS on Raspberry Pi high load

2015-06-19 Thread basti
Hello,
perhaps that's a bit OT, but I can't find a Raspbian or Raspberry Pi
related mailing list.

By default nfs starts with 8 servers

root@raspberrypi:~# head -n 2 /etc/default/nfs-kernel-server
# Number of servers to start up
RPCNFSDCOUNT=8

So I try to transfer a 3GB file from the raspberry to my laptop via WLAN(n).
This operation kills my raspberry.
I get a load of 12 and more. Ten minutes after I interrupt this operation
the load was still at 10.
So I decide to reduce the number of servers to 2. Now it's a bit better,
the load is only around 5.

iotop shows me a read speed around 3 MB/s, there is a Class 10 UHS card
(10-15 MB/s read, 9-5 MB/s write I guess).

Test on Pi 1 model B with 512MB RAM.

Can somebody reproduce this behavior?

Thanks a lot.
Regards Basti





Re: NFS on Raspberry Pi high load

2015-06-19 Thread basti
The problem is not the speed of 3 MB/s, it's the load of 12 and more.

On 19.06.2015 14:03, Sven Hartge wrote:
 basti black.flederm...@arcor.de wrote:

 iotop show me a read speed around 3 MB/s, there is a Class 10 UHS card
 (10-15 MB/s read, 9-5 MB/s write I guess).
 More than 3MByte/s is not really achievable with a Pi-1, because the CPU
 is very weak and the Ethernet-Chip is attached via USB.

 Under the best conditions you may be able to transfer up to 45MBit/s,
 but a maximum transfer rate of about 35MBit/s is normal.

 Grüße,
 Sven.






Re: NFS on Raspberry Pi high load

2015-06-19 Thread Reco
 Hi.

On Fri, 19 Jun 2015 20:38:12 +0200
Sven Hartge s...@svenhartge.de wrote:

 Reco recovery...@gmail.com wrote:
  On Fri, Jun 19, 2015 at 02:47:20PM +0200, Petter Adsen wrote:
  On Fri, 19 Jun 2015 14:09:45 +0200
  basti black.flederm...@arcor.de wrote:
  On 19.06.2015 14:03, Sven Hartge wrote:
  basti black.flederm...@arcor.de wrote:
 
  iotop shows me a read speed of around 3 MB/s; it is a Class 10 UHS
  card (10-15 MB/s read, 9-5 MB/s write, I guess).
 
  More than 3MByte/s is not really achievable with a Pi-1, because
  the CPU is very weak and the Ethernet chip is attached via USB.
 
  Under the best conditions you may be able to transfer up to
  45MBit/s, but a maximum transfer rate of about 35MBit/s is normal.
 
  The problem is not the speed of 3 MB/s, it's the load of 12 and more.
  
  The load is so high because USB is very CPU-intensive. If you were to
  use the on-board Ethernet, you would not see such a high load.
 
  What? Are you serious? I have this Nokia N900 lying behind me which is
  connected by IP-via-USB (aka usbnet aka g_ether), and with an order of
  magnitude slower ARM CPU it reliably shows 40mbps with no noticeable
  load.
 
 Maybe the USB hardware implementation is better in the N900? The one in
 the Pi is quite bad and finicky.

I happen to have a Pi too. Not that I need an NFS server on it (an NFS
client is sufficient for my needs), but still.

 
 In addition to that, data transfer via USB is quite CPU-intensive, as
 Petter wrote and overwhelms the single CPU core of the Pi if it needs to
 drive the SD card at the same time.

Hm. I plugged an Ethernet cable into it, read and wrote a big file via
NFS. Got consistent 50mbps.

According to iperf, I could go as high as 82.2 mbps. Not the fair
gigabit I have on this LAN, but close to the theoretical 100mbit limit of
the NIC.

During the NFS test, two kernel threads were the worst CPU
consumers, kworker/0 and ksoftirqd/0.

During the iperf test, the worst CPU consumers were iperf itself and
ksoftirqd/0.

According to /proc/interrupts, the top interrupt consumer was
IRQ32, which is:

dwc_otg, dwc_otg_pcd, dwc_otg_hcd:usb1


On the other hand, a simple 'cat /dev/zero > file' test provided me with
100% iowait, but no actual CPU usage. 


Perf mysteriously failed on me. It did record something, but 'perf
report' refused to show me anything. Must be something with this custom
Raspbian kernel.

So, I agree that using the Pi's Ethernet interface eats CPU, but saying
'USB eats CPU' is oversimplifying things quite a bit, specifically if
NFS is involved.


What I suspect was happening with your NFS server is multiple knfsd
threads in D-state (i.e. blocked in iowait by the slow MMC card) *plus*
the USB Ethernet interrupts. I'd start with lowering the knfsd count.
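
For illustration, a minimal sketch of that change on the Debian/Raspbian of this era (file path as shown in the original post):

# drop the kernel NFS thread count from 8 to 2
sed -i 's/^RPCNFSDCOUNT=8/RPCNFSDCOUNT=2/' /etc/default/nfs-kernel-server
/etc/init.d/nfs-kernel-server restart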


  If the source or destination of the transmitted data is on a USB medium
  it gets even worse because all USB ports share the same root port on the
  SoC.

I'm too lazy to check it, so I'll trust you on this.


  Besides: I always found the load on Linux NFS servers to be higher than
  on a Samba server with equal throughput. I guess the calculation of the
  load is different for the NFS kernel server process than for userland
  file services.

I have to trust you on this too. Never bothered myself with inferior
network filesystems (Samba) due to the existence of a superior one (NFS4).

And, speaking of those network filesystems: have you tried using iSCSI
to do whatever you're trying to do with NFS? What about a simple sshfs?
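
For reference, a minimal sshfs mount; user, host and paths here are made up for the example:

# mount the Pi's export on the client over SSH; 'reconnect' survives drops
sshfs pi@raspberrypi:/srv/export /mnt/pi -o reconnect
# unmount when done
fusermount -u /mnt/pi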

Reco





Re: NFS on Raspberry Pi high load

2015-06-19 Thread Sven Hartge
Reco recovery...@gmail.com wrote:
 On Fri, 19 Jun 2015 20:38:12 +0200 Sven Hartge s...@svenhartge.de wrote:

 What I suspect was happening with your NFS server is multiple
 knfsd threads in D-state (i.e. blocked in iowait by the slow MMC card)
 *plus* the USB Ethernet interrupts. I'd start with lowering the knfsd
 count.

That would also be my first step. I would lower RPCNFSDCOUNT to 2.

 If the source or destination of the transmitted data is on a USB
 medium it gets even worse because all USB ports share the same root
 port on the SoC.

 I'm too lazy to check it, so I'll trust you on this.

Data enters the SoC through USB from the ethernet chip and then is
pushed out on the same shared bus to the USB disk. This absolutely kills
the Pi.

 Besides: I always found the load on Linux NFS servers to be higher
 than on a Samba server with equal throughput. I guess the calculation
 of the load is different for the NFS kernel server process than for
 userland file services.

 I have to trust you on this too. Never bothered myself with inferior
 network filesystems (Samba) due to the existence of a superior one
 (NFS4).

Well, if you want to serve files to many different operating systems you
cannot always use the tools you want if you are not able to control the
protocol the client wants or needs to speak.

 And, speaking of those network filesystems: have you tried using iSCSI
 to do whatever you're trying to do with NFS? What about a simple sshfs?

sshfs couples the problems of the USB network port with the slow ARM CPU
doing crypto stuff. You won't win any speed records with that
combination.

Regards,
Sven.

-- 
Sigmentation fault. Core dumped.





Re: NFS on Raspberry Pi high load

2015-06-19 Thread Sven Hartge
Reco recovery...@gmail.com wrote:
 On Fri, Jun 19, 2015 at 02:47:20PM +0200, Petter Adsen wrote:
 On Fri, 19 Jun 2015 14:09:45 +0200
 basti black.flederm...@arcor.de wrote:
 On 19.06.2015 14:03, Sven Hartge wrote:
 basti black.flederm...@arcor.de wrote:

 iotop shows me a read speed of around 3 MB/s; it is a Class 10 UHS
 card (10-15 MB/s read, 9-5 MB/s write, I guess).

 More than 3MByte/s is not really achievable with a Pi-1, because
 the CPU is very weak and the Ethernet chip is attached via USB.

 Under the best conditions you may be able to transfer up to
 45MBit/s, but a maximum transfer rate of about 35MBit/s is normal.

 The problem is not the speed of 3 MB/s, it's the load of 12 and more.
 
 The load is so high because USB is very CPU-intensive. If you were to
 use the on-board Ethernet, you would not see such a high load.

 What? Are you serious? I have this Nokia N900 lying behind me which is
 connected by IP-via-USB (aka usbnet aka g_ether), and with an order of
 magnitude slower ARM CPU it reliably shows 40mbps with no noticeable
 load.

Maybe the USB hardware implementation is better in the N900? The one in
the Pi is quite bad and finicky.

In addition to that, data transfer via USB is quite CPU-intensive, as
Petter wrote and overwhelms the single CPU core of the Pi if it needs to
drive the SD card at the same time.

If the source or destination of the transmitted data is on a USB medium
it gets even worse because all USB ports share the same root port on the
SoC.

Besides: I always found the load on Linux NFS servers to be higher than
on a Samba server with equal throughput. I guess the calculation of the
load is different for the NFS kernel server process than for userland
file services.

Regards,
Sven.

-- 
Sigmentation fault. Core dumped.





Re: Re: MySQL slow and high load with Debian Wheezy (was: [whole mail text])

2013-09-07 Thread Daniel Enright
Found this thread searching for a solution to my problem (which sounds 
similar).


My solution was barrier=0 in /etc/fstab see

https://wiki.archlinux.org/index.php/Ext4

Uh, specifically my problem was that loading large MySQL files took 
forever and would often end with MySQL losing the connection (local 
MySQL daemon). Smaller files were still slow (especially in the 
context of running unit tests that do lots of MySQL queries via SQL files).


iostat (apt-get install sysstat; iostat -x -d sda 5;) showed very high 
%util, but very low writes. Interestingly, when I pointed the MySQL 
server to store the data on slower USB-mounted drives I had better 
performance.  Anyway, my old workstation running Squeeze gives good 
performance; Wheezy gave bad performance until barrier=0 was added.
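
For reference, a hypothetical fstab entry with barriers disabled on ext4 (the UUID is a placeholder; without a battery or UPS this trades crash safety for speed):

# /etc/fstab
UUID=xxxxxxxx-xxxx  /  ext4  errors=remount-ro,barrier=0  0  1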


Before using barrier=0 I played around with things such as wrapping the SQL 
in 'set autocommit=0;' and 'commit;'. I also played with InnoDB MySQL server 
settings.


But, for me barrier=0 is awesome! (Using a laptop, so I have a battery in 
case the power fails...)


Daniel





Re: High Load/Interrupts on Wheezy

2013-07-03 Thread Darac Marjal
On Tue, Jul 02, 2013 at 08:54:06PM -0400, Will Platnick wrote:
 I am experiencing some issues with load after upgrading some of my Squeeze 
 boxes to Wheezy. I have 7 app servers, all with identical hardware with 
 identical packages and code. I upgraded one of my boxes to wheezy, along with 
 the custom packages we use for Python, PHP, etc… Same versions of the 
 software, just built on Wheezy instead of Squeeze. My problem is that my 
 Wheezy boxes have a load of over 3 and are not staying up during our peak 
 time, whereas our squeeze boxes have a load of less than 1. 
 The interesting part is that despite the high load, my wheezy boxes are 
 actually performing quite well, and are outperforming my squeeze boxes by 2-3 
 ms. Nevertheless, the high load is giving us cause for concern and is 
 stopping us from migrating completely, and we're wondering if anybody else is 
 seeing the same thing or can give us some assistance on where to go from here.
 I believe I have tracked down the issue with our load to be an interrupt 
 issue. My interrupts on wheezy are way higher. CPU, I/O, Memory and Context 
 Switches are all the same (measured with top, atop, iotop, vmstat). It 
 doesn't appear to be a hardware issue, as I deployed wheezy and our code base 
 to a different and faster motherboard/cpu combo, and the issue remained.
 The item that stands out is that my Rescheduling Interrupts and timer 
 are interrupting like crazy on wheezy compared to squeeze. Here is my output 
 of total interrupts on Squeeze vs Wheezy for two different machines, rebooted 
 and placed into service at the exact same time, with traffic distributed to 
 them via round robin, so it should be fairly equal.
 Rescheduling Interrupts: 4109580 on Wheezy vs 67418 on Squeeze.
 Timer: 504238 on Wheezy vs 50 on Squeeze.
 
 Thoughts? Suggestions?

This was the first search result for Rescheduling Interrupts. The
advice should apply to Debian equally well.
https://help.ubuntu.com/community/ReschedulingInterrupts
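
One hedged way to watch those counters move in real time (the labels are as they appear in /proc/interrupts on x86):

# sample rescheduling (RES), local timer (LOC) and NMI counters every second
watch -n1 "grep -E 'NMI|LOC|RES' /proc/interrupts"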





Re: High Load/Interrupts on Wheezy

2013-07-03 Thread Will Platnick
I followed those. I got nothing.
—
Sent from Mailbox for iPhone

On Wed, Jul 3, 2013 at 5:24 AM, Darac Marjal mailingl...@darac.org.uk
wrote:

 On Tue, Jul 02, 2013 at 08:54:06PM -0400, Will Platnick wrote:
 I am experiencing some issues with load after upgrading some of my Squeeze 
 boxes to Wheezy. I have 7 app servers, all with identical hardware with 
 identical packages and code. I upgraded one of my boxes to wheezy, along 
 with the custom packages we use for Python, PHP, etc… Same versions of the 
 software, just built on Wheezy instead of Squeeze. My problem is that my 
 Wheezy boxes have a load of over 3 and are not staying up during our peak 
 time, whereas our squeeze boxes have a load of less than 1. 
 The interesting part is that despite the high load, my wheezy boxes are 
 actually performing quite well, and are outperforming my squeeze boxes by 
 2-3 ms. Nevertheless, the high load is giving us cause for concern and is 
 stopping us from migrating completely, and we're wondering if anybody else 
 is seeing the same thing or can give us some assistance on where to go from 
 here.
 I believe I have tracked down the issue with our load to be an interrupt 
 issue. My interrupts on wheezy are way higher. CPU, I/O, Memory and Context 
 Switches are all the same (measured with top, atop, iotop, vmstat). It 
 doesn't appear to be a hardware issue, as I deployed wheezy and our code 
 base to a different and faster motherboard/cpu combo, and the issue remained.
 The item that stands out is that my Rescheduling Interrupts and timer 
 are interrupting like crazy on wheezy compared to squeeze. Here is my output 
 of total interrupts on Squeeze vs Wheezy for two different machines, 
 rebooted and placed into service at the exact same time, with traffic 
 distributed to them via round robin, so it should be fairly equal.
 Rescheduling Interrupts: 4109580 on Wheezy vs 67418 on Squeeze.
 Timer: 504238 on Wheezy vs 50 on Squeeze.
 
 Thoughts? Suggestions?
 This was the first search result for Rescheduling Interrupts. The
 advice should apply to Debian equally well.
 https://help.ubuntu.com/community/ReschedulingInterrupts

Re: High Load/Interrupts on Wheezy

2013-07-03 Thread Will Platnick
Something else I just noticed now that I'm on a screen high enough to show
all of /proc/interrupts on one line: Non-maskable interrupts are happening
on Wheezy whereas they didn't on Squeeze. Additionally, it seems
Non-maskable interrupts and Performance monitoring are the same value all
the time.

--
Will Platnick
Sent with Airmail

On July 3, 2013 at 7:17:04 AM, Will Platnick (wplatn...@gmail.com) wrote:

 I followed those. I got nothing.
 Sent from Mailbox for iPhone

 On Wed, Jul 3, 2013 at 5:24 AM, Darac Marjal mailingl...@darac.org.uk wrote:

Re: High Load/Interrupts on Wheezy

2013-07-03 Thread Will Platnick
More troubleshooting steps:

Built and installed latest 3.10 kernel, no change in interrupts
Built and installed latest 2.6.32 kernel, and I am back to Squeeze level
loads and no high timer, rescheduling, non-maskable or performance
interrupts are present.

So, does anybody have any idea what changed in the 3.2+ series that could
cause this?

On Wed, Jul 3, 2013 at 8:06 AM, Will Platnick wplatn...@gmail.com wrote:

 Something else I just noticed now that I'm on a screen high enough to show
 all of /proc/interrupts on one line:


 Non-maskable interrupts are happening on Wheezy whereas they didn't on
 Squeeze. Additionally, it seems Non-maskable interrupts and Performance
 monitoring are the same value all the time.


 --
 Will Platnick
 Sent with Airmail http://airmailapp.info/tracking

 On July 3, 2013 at 7:17:04 AM, Will Platnick (wplatn...@gmail.com) wrote:

 I followed those. I got nothing.
 —
 Sent from Mailbox https://www.dropbox.com/mailbox for iPhone


 On Wed, Jul 3, 2013 at 5:24 AM, Darac Marjal mailingl...@darac.org.ukwrote:






Re: High Load/Interrupts on Wheezy

2013-07-03 Thread Scott Ferguson
On 04/07/13 00:30, Will Platnick wrote:
 More troubleshooting steps:
 
 Built and installed latest 3.10 kernel, no change in interrupts
 Built and installed latest 2.6.32 kernel, and I am back to Squeeze level
 loads and no high timer, rescheduling, non-maskable or performance
 interrupts are present.
 
 So, does anybody have any idea what changed in the 3.2+ series that
 could cause this?

Nope. But I've been experiencing the same thing so following your posts.
I've set up two identical LAMP servers hosting identical sites using
Virtualmin, one pure, standard (untweaked) Squeeze, the other pure,
standard (untweaked) Wheezy built from the same package list.  The
Squeeze one runs fine in 256MB of RAM; the Wheezy one takes nearly three
times as long to boot, likewise to shut down, *even* when given 512MB of
RAM. Identical virtual machine setups.

I've logged the output of ps aux and will compare them tomorrow night -
if I find anything obvious I'll post them.

Everything else is similar to your results.

 
 On Wed, Jul 3, 2013 at 8:06 AM, Will Platnick wplatn...@gmail.com
 mailto:wplatn...@gmail.com wrote:
 
 Something else I just noticed now that I'm on a screen high enough
 to show all of /proc/interrupts on one line:
 
snipped

Kind regards

-- 
Iceweasel/Firefox/Chrome/Chromium/Iceape/IE extensions for finding
answers to Debian questions:-
https://addons.mozilla.org/en-US/firefox/collections/Scott_Ferguson/debian/





Re: High Load/Interrupts on Wheezy

2013-07-03 Thread David Mckisick
Same issue here exactly and have noticed this since upgrading to Wheezy. We
have also delayed upgrading the rest of our servers until this gets fixed.


On Wed, Jul 3, 2013 at 10:45 AM, Scott Ferguson 
scott.ferguson.debian.u...@gmail.com wrote:

 On 04/07/13 00:30, Will Platnick wrote:
  More troubleshooting steps:
 
  Built and installed latest 3.10 kernel, no change in interrupts
  Built and installed latest 2.6.32 kernel, and I am back to Squeeze level
  loads and no high timer, rescheduling, non-maskable or performance
  interrupts are present.
 
  So, does anybody have any idea what changed in the 3.2+ series that
  could cause this?

 Nope. But I've been experiencing the same thing so following your posts.
 I've set up two identical LAMP servers hosting identical sites using
 Virtualmin, one pure, standard (untweaked) Squeeze, the other pure,
 standard (untweaked) Wheezy built from the same package list.  The
  Squeeze one runs fine in 256MB of RAM; the Wheezy one takes nearly three
  times as long to boot, likewise to shut down, *even* when given 512MB of
  RAM. Identical virtual machine setups.

 I've logged the output of ps aux and will compare them tomorrow night -
 if I find anything obvious I'll post them.

 Everything else is similar to your results.

 
  On Wed, Jul 3, 2013 at 8:06 AM, Will Platnick wplatn...@gmail.com
  mailto:wplatn...@gmail.com wrote:
 
  Something else I just noticed now that I'm on a screen high enough
  to show all of /proc/interrupts on one line:
 
 snipped

 Kind regards

 --
 Iceweasel/Firefox/Chrome/Chromium/Iceape/IE extensions for finding
 answers to Debian questions:-
 https://addons.mozilla.org/en-US/firefox/collections/Scott_Ferguson/debian/




Re: High Load/Interrupts on Wheezy

2013-07-03 Thread Will Platnick
So since there seem to be a few of us having this issue, are there any
Debian or Linux kernel engineers out there who are willing to help? Is this
the best place for that?

On Wed, Jul 3, 2013 at 3:50 PM, David Mckisick mckis...@gmail.com wrote:

 Same issue here exactly and have noticed this since upgrading to Wheezy.
 We have also delayed upgrading the rest of our servers until this gets
 fixed.


 On Wed, Jul 3, 2013 at 10:45 AM, Scott Ferguson 
 scott.ferguson.debian.u...@gmail.com wrote:

 On 04/07/13 00:30, Will Platnick wrote:
  More troubleshooting steps:
 
  Built and installed latest 3.10 kernel, no change in interrupts
  Built and installed latest 2.6.32 kernel, and I am back to Squeeze level
  loads and no high timer, rescheduling, non-maskable or performance
  interrupts are present.
 
  So, does anybody have any idea what changed in the 3.2+ series that
  could cause this?

 Nope. But I've been experiencing the same thing so following your posts.
 I've set up two identical LAMP servers hosting identical sites using
 Virtualmin, one pure, standard (untweaked) Squeeze, the other pure,
 standard (untweaked) Wheezy built from the same package list.  The
  Squeeze one runs fine in 256MB of RAM; the Wheezy one takes nearly three
  times as long to boot, likewise to shut down, *even* when given 512MB of
  RAM. Identical virtual machine setups.

 I've logged the output of ps aux and will compare them tomorrow night -
 if I find anything obvious I'll post them.

 Everything else is similar to your results.

 
  On Wed, Jul 3, 2013 at 8:06 AM, Will Platnick wplatn...@gmail.com
  mailto:wplatn...@gmail.com wrote:
 
  Something else I just noticed now that I'm on a screen high enough
  to show all of /proc/interrupts on one line:
 
 snipped

 Kind regards

 --
 Iceweasel/Firefox/Chrome/Chromium/Iceape/IE extensions for finding
 answers to Debian questions:-

 https://addons.mozilla.org/en-US/firefox/collections/Scott_Ferguson/debian/





High Load/Interrupts on Wheezy

2013-07-02 Thread Will Platnick
I am experiencing some issues with load after upgrading some of my Squeeze
boxes to Wheezy. I have 7 app servers, all with identical hardware and with
identical packages and code. I upgraded one of my boxes to Wheezy, along with
the custom packages we use for Python, PHP, etc… Same versions of the
software, just built on Wheezy instead of Squeeze. My problem is that my
Wheezy boxes have a load of over 3 and are not staying up during our peak
time, whereas our Squeeze boxes have a load of less than 1.

The interesting part is that despite the high load, my Wheezy boxes are
actually performing quite well, and are outperforming my Squeeze boxes by
2-3 ms. Nevertheless, the high load is giving us cause for concern and is
stopping us from migrating completely, and we're wondering if anybody else
is seeing the same thing or can give us some assistance on where to go from
here.

I believe I have tracked down the issue with our load to be an interrupt
issue. My interrupts on Wheezy are way higher. CPU, I/O, memory and context
switches are all the same (measured with top, atop, iotop, vmstat). It
doesn't appear to be a hardware issue, as I deployed Wheezy and our code
base to a different and faster motherboard/CPU combo, and the issue
remained.

The item that stands out is that my "Rescheduling Interrupts" and "timer"
are interrupting like crazy on Wheezy compared to Squeeze. Here is my output
of total interrupts on Squeeze vs Wheezy for two different machines,
rebooted and placed into service at the exact same time, with traffic
distributed to them via round robin, so it should be fairly equal.

Rescheduling interrupts: 4109580 on Wheezy vs 67418 on Squeeze.
Timer: 504238 on Wheezy vs 50 on Squeeze.

Thoughts? Suggestions?

Here is my squeeze box interrupts:
# sudo cat /proc/interrupts | awk '{ print $18, $19, $2+$3+$4+$5+$6+$7+$8+$9+$10+$11+$12+$13+$14+$15+$16+$17 }'
  0
IO-APIC-edge timer 50
IO-APIC-edge i8042 2
IO-APIC-edge serial 8
IO-APIC-edge rtc0 1
IO-APIC-fasteoi acpi 0
IO-APIC-edge i8042 4
IO-APIC-fasteoi uhci_hcd:usb2 0
IO-APIC-fasteoi ehci_hcd:usb1, 2
IO-APIC-fasteoi ata_piix, 24014
IO-APIC-fasteoi uhci_hcd:usb4 48
IO-APIC-fasteoi ehci_hcd:usb3, 0
PCI-MSI-edge eth0 1
PCI-MSI-edge eth0-TxRx-0 919924
PCI-MSI-edge eth0-TxRx-1 1206377
PCI-MSI-edge eth0-TxRx-2 1208344
PCI-MSI-edge eth0-TxRx-3 817225
PCI-MSI-edge eth0-TxRx-4 1165734
PCI-MSI-edge eth0-TxRx-5 1314252
PCI-MSI-edge eth0-TxRx-6 998115
PCI-MSI-edge eth0-TxRx-7 1229384
PCI-MSI-edge eth1 1
PCI-MSI-edge eth1-TxRx-0 4776
PCI-MSI-edge eth1-TxRx-1 
PCI-MSI-edge eth1-TxRx-2 5557
PCI-MSI-edge eth1-TxRx-3 5344
PCI-MSI-edge eth1-TxRx-4 5827
PCI-MSI-edge eth1-TxRx-5 5060
PCI-MSI-edge eth1-TxRx-6 4078
PCI-MSI-edge eth1-TxRx-7 4317
PCI-MSI-edge ioat-msix 3
PCI-MSI-edge ioat-msix 3
PCI-MSI-edge ioat-msix 3
PCI-MSI-edge ioat-msix 3
PCI-MSI-edge ioat-msix 3
PCI-MSI-edge ioat-msix 3
PCI-MSI-edge ioat-msix 3
PCI-MSI-edge ioat-msix 3
Non-maskable interrupts 0
Local timer 3968846
Spurious interrupts 0
Performance monitoring 0
Performance pending 0
Rescheduling interrupts 67418
Function call 16404
TLB shootdowns 33915
Thermal event 0
Threshold APIC 0
Machine check 0
Machine check 128

Here is my wheezy interrupts:
# sudo cat /proc/interrupts | awk '{ print $18, $19, $2+$3+$4+$5+$6+$7+$8+$9+$10+$11+$12+$13+$14+$15+$16+$17 }'
IO-APIC-edge timer 504238
IO-APIC-edge i8042 3
IO-APIC-edge serial 12
IO-APIC-edge rtc0 1
IO-APIC-fasteoi acpi 0
IO-APIC-edge i8042 4
IO-APIC-fasteoi uhci_hcd:usb3 0
IO-APIC-fasteoi ehci_hcd:usb1, 2
IO-APIC-fasteoi ata_piix, 21189
IO-APIC-fasteoi uhci_hcd:usb4 47
IO-APIC-fasteoi ehci_hcd:usb2, 0
PCI-MSI-edge eth0 1
PCI-MSI-edge eth0-TxRx-0 1506134
PCI-MSI-edge eth0-TxRx-1 1102085
PCI-MSI-edge eth0-TxRx-2 1399087
PCI-MSI-edge eth0-TxRx-3 1123149
PCI-MSI-edge eth0-TxRx-4 849678
PCI-MSI-edge eth0-TxRx-5 1428705
PCI-MSI-edge eth0-TxRx-6 897420
PCI-MSI-edge eth0-TxRx-7 1321820
PCI-MSI-edge eth1 1
PCI-MSI-edge eth1-TxRx-0 4290
PCI-MSI-edge eth1-TxRx-1 4217
PCI-MSI-edge eth1-TxRx-2 3685
PCI-MSI-edge eth1-TxRx-3 4081
PCI-MSI-edge eth1-TxRx-4 5532
PCI-MSI-edge eth1-TxRx-5 6604
PCI-MSI-edge eth1-TxRx-6 3996
PCI-MSI-edge eth1-TxRx-7 4560
PCI-MSI-edge ioat-msix 3
PCI-MSI-edge ioat-msix 3
PCI-MSI-edge ioat-msix 3
PCI-MSI-edge ioat-msix 3
PCI-MSI-edge ioat-msix 3
PCI-MSI-edge ioat-msix 3
PCI-MSI-edge ioat-msix 3
PCI-MSI-edge ioat-msix 3
Non-maskable interrupts 3847
Local timer 3846061
Spurious interrupts 0
Performance monitoring 3847
IRQ work 0
Rescheduling interrupts 4109580
Function call 13442
TLB shootdowns 1745
Thermal event 0
Threshold APIC 0
Machine check 0
Machine check 128
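
A small sketch for turning those totals into deltas, which makes the Squeeze vs Wheezy comparison cleaner (interval chosen arbitrarily):

# snapshot the suspect counters, wait, snapshot again, and diff
grep -E 'NMI|LOC|RES|PMI' /proc/interrupts > /tmp/irq.before
sleep 10
grep -E 'NMI|LOC|RES|PMI' /proc/interrupts > /tmp/irq.after
diff /tmp/irq.before /tmp/irq.after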

Re: MySQL slow and high load with Debian Wheezy (was: [whole mail text])

2013-05-24 Thread Martin Steigerwald
Hi Andrei,

Odd that KMail can answer from the subject if it is marked, but I strongly 
second Lisi's notion of putting legible text into the mail body and using a 
descriptive and short enough subject for the mail.

Am Donnerstag, 23. Mai 2013, 11:15:29 schrieb Andrei Hristow:
 Hi, I have a serious problem with Debian 7. The system is very slow; working
 with MySQL databases is slow and painful. On Debian 6.0.7 the system is very
 fast and stable; it works on ext3, and on ext4 on Debian 7. I have 8 GB of RAM
 and use the AMD64 version. CPU is a 2.133 GHz Intel Core 2 Duo. Mainboard is a
 Gigabyte GA-EP45-UD3R, socket 775. The hard drive uses the ata_piix driver on
 Debian 6 and Debian 7. Where could the problem be? Could it be because of ext4?
 Or should I use the i386 version of Debian 7 with the PAE kernel? The difference
 in performance between Debian 6 and Debian 7 is huge! I use Wine and the system
 load reached 10.0. What are your tips? What is better to use: Debian 7 i386
 or Debian 7 AMD64? And is it better to use ext3 or ext4?

Lots of information is missing in there. What does slow mean? How do you 
notice it's slow? Do you have any numbers? What is the workload? How are memory, 
cpu, disk usage and so on…

But just a rough guess:

Are you by chance using the -486 kernel? Well, that will give you *one* CPU and 
I think a maximum of 1 GB of RAM (not sure about the latter).

With any current x86 hardware, 686-pae is suitable for 32-bit; for 64-bit it's 
amd64.

Ciao,
-- 
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA  B82F 991B EAAC A599 84C7





Intermittent high load average after upgrade to lenny / 2.6.26-2 ?

2009-11-09 Thread Glyn Astill
Hi Chaps,

I've upgraded a server running our database connection pool software from etch 
on 2.6.18 to lenny on 2.6.26 and I'm now seeing intermittent high load averages.

I don't see anything CPU or IO bound on the machine using top and vmstat, and 
I'm absolutely baffled by it.  Normal load average is below 1, but every so 
often totally out of the blue I'll see it jump up to 20!

Didn't happen on 2.6.18, what else should I look at before suspecting it's a 
bug somewhere in CFS?

Ta
Glyn








Re: Regular high load peaks on servers

2009-10-21 Thread Γιώργος Πάλλας

Julien wrote:

Hi,

For quite a long time now, we have observed the same phenomenon on three
web servers we have in two different places. They regularly have 
high load peaks, up to 20 to 50. These peaks happen very regularly

(from once a day to several per hour), and we can't explain why. It
doesn't seem to be linked to a special increase in traffic or web
requests.

Two of the three servers are load-balanced web frontends running
apache with nfs mounts. The third is an autonomous server with web,
mail, mysql… services, without nfs. The three run under Debian Lenny.

I know nobody could really solve this problem without access to the
machines and logs, but I wonder if someone already experienced this
sort of regular load peaks.

Thanks in advance for any help,

  


Try installing sysstat, and use the iostat utility to check your disks' 
usage during those peaks. High load is caused by high CPU 
utilization and/or by I/O wait.


Try also to stop cron for several hours, in order to be sure that no 
cron job causes the load.
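
A minimal sketch of that test on Lenny (run as root, and re-enable cron afterwards):

# stop cron during a known peak window, then compare the load graphs
/etc/init.d/cron stop
# see what cron would have run in the meantime
ls /etc/cron.hourly /etc/cron.daily /etc/cron.d
crontab -l
/etc/init.d/cron start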


G.





Regular high load peaks on servers

2009-10-20 Thread Julien
Hi,

For quite a long time now, we have observed the same phenomenon on three
web servers we have in two different places. They regularly have 
high load peaks, up to 20 to 50. These peaks happen very regularly
(from once a day to several per hour), and we can't explain why. It
doesn't seem to be linked to a special increase in traffic or web
requests.

Two of the three servers are load-balanced web frontends running
apache with nfs mounts. The third is an autonomous server with web,
mail, mysql… services, without nfs. The three run under Debian Lenny.

I know nobody could really solve this problem without access to the
machines and logs, but I wonder if someone already experienced this
sort of regular load peaks.

Thanks in advance for any help,

-- 
Julien





etch testing bug 341055 spamassassin and exim - high load

2006-04-25 Thread hanasaki
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=341055
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=4590

Anyone have a workaround?  The --round-robin from the above link has
lessened the issue; however, it is still creating a load average of over 12.0!
I tried downgrading to sarge/stable for sa, sa-exim and exim-daemon-heavy
and ended up with "TLS cache read failed".  What is that, and how can the
cache be fixed?

Just recently did an apt-get update; apt-get dist-upgrade on etch.  Was
already on etch.  This brought in:
spamassassin 3.1.0a-2
exim-daemon-heavy 4.61-1
sa-exim 4.2.1-2





Re: Woody: High load average, but no processes hogging...

2005-06-07 Thread Adam Garside
On Tue, Jun 07, 2005 at 02:54:37PM +1200, Simon wrote:
[snip]
 I have noticed high(ish) load averages (currently 2.08, last week it was 
 17!!), but there are no processes hogging the CPU, nor are we using any 
[snip]

Check the output of ps(1) and look for processes in the 'D' state. Also,
check I/O with:

vmstat 5

(don't forget to discard the first line of info from that command.)
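
As a sketch, the D-state check can be scripted rather than eyeballed (standard procps options):

# list processes in uninterruptible sleep, keeping the header row
ps -eo pid,stat,wchan:32,cmd | awk 'NR==1 || $2 ~ /^D/'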

-- asg





Re: Woody: High load average, but no processes hogging...

2005-06-07 Thread Simon

Adam Garside wrote:

I have noticed high(ish) load averages (currently 2.08, last week it was 
17!!), but there are no processes hogging the CPU, nor are we using any 


[snip]

Check the output of ps(1) and look for processes in the 'D' state.


Nothing there. All seems fine.

Also,

check I/O with:

vmstat 5

(don't forget to discard the first line of info from that command.)


The load average is currently at 2.12.

It looked a bit much cut-n-pasted, but here is the result:
http://gremin.orcon.net.nz/vmstat.html






gateway pppd, syslog high load

2004-04-14 Thread Alex Handle
Hello!

I recently set up a gateway (ADSL PPTP, iptables). My problem is the
excessively high load.


---top--
11:12:49 up 12 days, 19:01,  1 user,  load average: 1.65, 1.45, 1.42
29 processes: 24 sleeping, 4 running, 1 zombie, 0 stopped
CPU states:  75.6% user,  24.4% system,   0.0% nice,   0.0% idle
Mem:   192188K total,   124172K used,    68016K free,    46264K buffers
Swap:  248968K total,        0K used,   248968K free,    34356K cached

  PID USER PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME COMMAND
  348 root  16   0   936  932   772 R57.8  0.4  3841m pppd
16821 root  12   0   596  596   488 R42.2  0.3 244:56 syslogd
1 root   8   0   484  484   424 S 0.0  0.2   0:05 init
2 root   9   0 00 0 SW0.0  0.0   0:00 keventd
3 root  19  19 00 0 SWN   0.0  0.0   0:00 ksoftirqd_CPU0
4 root   9   0 00 0 SW0.0  0.0   0:00 kswapd
5 root   9   0 00 0 SW0.0  0.0   0:00 bdflush
6 root   9   0 00 0 SW0.0  0.0   0:00 kupdated
7 root   9   0 00 0 SW0.0  0.0   0:01 kjournald
  163 root   9   0  1084 1084   408 S 0.0  0.5   0:00 klogd
  210 root   9   0  1208 1208  1072 S 0.0  0.6   0:00 sshd
  213 daemon 9   0   580  580   504 S 0.0  0.3   0:00 atd
  216 root   8   0   684  684   564 S 0.0  0.3   0:00 cron
  218 root   9   0  3628 3628  1324 S 0.0  1.8   2:20 ddclient
  220 root   9   0   468  468   408 S 0.0  0.2   0:00 getty
  221 root   9   0   468  468   408 S 0.0  0.2   0:00 getty
--top--

Software
OS: Debian 3.0 r2
Kernel: 2.4.25

The hardware should not be the problem, though:
256 MB and an Athlon 1800+ are actually oversized for a router,
and the network is not too big either, about 10 clients.
I also noticed that the load was completely normal for the first 3 days.
When I set the refresh rate in top to 0.1, it becomes apparent that syslog
repeatedly consumes 99% of the CPU for short periods.

Regards,
Alex



Re: gateway pppd, syslog high load

2004-04-14 Thread Timo Eckert
On Wed, 14 Apr 2004 09:19:12 +0200
Alex Handle [EMAIL PROTECTED] wrote:

 The hardware should not be the problem, though:
 256 MB and an Athlon 1800+ are actually oversized for a router,
 and the network is not too big either, about 10 clients.
 I also noticed that the load was completely normal for the first 3 days.
 When I set the refresh rate in top to 0.1, it becomes apparent that syslog
 repeatedly consumes 99% of the CPU for short periods.

Hrm...

Disk or partition full, so that syslogd can no longer write?

Sunny regards,
Timo.



Re: gateway pppd, syslog high load

2004-04-14 Thread Alex Handle
No, I have a 20 GB disk and there is only about 400 MB on it.

On Wednesday 14 April 2004 09:49, Timo Eckert wrote:
 On Wed, 14 Apr 2004 09:19:12 +0200

 Alex Handle [EMAIL PROTECTED] wrote:
  The hardware should not be the problem, though:
  256 MB and an Athlon 1800+ are actually oversized for a router,
  and the network is not too big either, about 10 clients.
  I also noticed that the load was completely normal for the first
  3 days. When I set the refresh rate in top to 0.1, it becomes apparent
  that syslog repeatedly consumes 99% of the CPU for short periods.

 Hrm...

 Disk or partition full, so that syslogd can no longer write?

 Sunny regards,
 Timo.



Re: gateway pppd, syslog high load

2004-04-14 Thread Alex Handle
I already had the same problem on another machine;
I don't think it is the hardware.

On Wednesday 14 April 2004 09:49, Timo Eckert wrote:
 On Wed, 14 Apr 2004 09:19:12 +0200

 Alex Handle [EMAIL PROTECTED] wrote:
  The hardware should not be the problem, though:
  256 MB and an Athlon 1800+ are actually oversized for a router,
  and the network is not too big either, about 10 clients.
  I also noticed that the load was completely normal for the first
  3 days. When I set the refresh rate in top to 0.1, it becomes apparent
  that syslog repeatedly consumes 99% of the CPU for short periods.

 Hrm...

 Disk or partition full, so that syslogd can no longer write?

 Sunny regards,
 Timo.



Re: gateway pppd, syslog high load

2004-04-14 Thread Timo Eckert
On Wed, 14 Apr 2004 10:12:08 +0200
Alex Handle [EMAIL PROTECTED] wrote:

 No, I have a 20 GB disk and there is only about 400 MB on it.

Have you tried restarting syslogd?

Sunny regards,
Timo.



Re: gateway pppd, syslog high load

2004-04-14 Thread Alex Handle
I have now stopped syslog, and the load is going down slightly:
 load average: 1.04, 1.29, 1.38

After a start, the load goes back up ...

Maybe it is dhcpd; it writes quite a lot to daemon.log

-- /var/log/daemon.log --
Apr 13 23:47:28 router dhcpd-2.2.x: DHCPREQUEST for 192.168.2.14 from 
00:0b:6a:18:a6:92 via eth0
Apr 13 23:47:28 router dhcpd-2.2.x: DHCPACK on 192.168.2.14 to 
00:0b:6a:18:a6:92 via eth0
Apr 13 23:50:39 router dhcpd-2.2.x: DHCPREQUEST for 192.168.2.13 from 
00:0b:6a:2a:8f:31 via eth0
Apr 13 23:50:39 router dhcpd-2.2.x: DHCPACK on 192.168.2.13 to 
00:0b:6a:2a:8f:31 via eth0
Apr 13 23:51:06 router dhcpd-2.2.x: DHCPREQUEST for 192.168.2.12 from 
00:0b:6a:2a:94:00 via eth0
Apr 13 23:51:06 router dhcpd-2.2.x: DHCPACK on 192.168.2.12 to 
00:0b:6a:2a:94:00 via eth0
Apr 13 23:52:29 router dhcpd-2.2.x: DHCPREQUEST for 192.168.2.14 from 
00:0b:6a:18:a6:92 via eth0
Apr 13 23:52:29 router dhcpd-2.2.x: DHCPACK on 192.168.2.14 to 
00:0b:6a:18:a6:92 via eth0
-- /var/log/daemon.log --
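
Not in the original thread, but on the sysklogd of that era one relevant knob: a leading '-' before the file name in /etc/syslog.conf stops syslogd from syncing the file after every line, which is expensive when dhcpd logs this often:

# /etc/syslog.conf
daemon.*        -/var/log/daemon.log
# then restart: /etc/init.d/sysklogd restart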



On Wednesday 14 April 2004 10:23, Timo Eckert wrote:
 On Wed, 14 Apr 2004 10:12:08 +0200

 Alex Handle [EMAIL PROTECTED] wrote:
  No, I have a 20 GB disk and there is only about 400 MB on it.

 Have you tried restarting syslogd?

 Sunny regards,
 Timo.



Re: gateway pppd, syslog high load

2004-04-14 Thread Timo Eckert
On Wed, 14 Apr 2004 10:45:59 +0200
Alex Handle [EMAIL PROTECTED] wrote:

 I have now stopped syslog, and the load is going down slightly:
  load average: 1.04, 1.29, 1.38

Well, but still over 1...

What does 'dmesg' say?
Any errors?

Sunny regards,
Timo.



Re: gateway pppd, syslog high load

2004-04-14 Thread Alex Handle
I also noticed that ppp and pptp have been started twice:

router:~# ps aux | grep pptp
root 16556  0.0  0.2  1316  524 ?SApr13   0:04 /usr/sbin/pptp
root 16558  0.0  0.2  1316  552 ?SApr13   0:00 /usr/sbin/pptp

router:~# ps aux | grep ppp
root   348 21.1  0.4  2008  932 ?RApr01 
3924:37 /usr/sbin/pppd /dev/pts/0 38400 persist maxfail 0
root 16560  0.0  0.4  2008  916 pts/1SApr13   
0:00 /usr/sbin/pppd /dev/pts/1 38400 persist maxfail 0

Is that even normal ...


On Wednesday 14 April 2004 10:23, Timo Eckert wrote:
 On Wed, 14 Apr 2004 10:12:08 +0200

 Alex Handle [EMAIL PROTECTED] wrote:
  No, I have a 20 GB disk and there is only about 400 MB on it.

 Have you tried restarting syslogd?

 Sunny regards,
 Timo.



Re: gateway pppd, syslog high load

2004-04-14 Thread Alex Handle
I found this in kern.log; it does not look good either:

-- kern.log --
Apr 12 17:50:40 router kernel: eth1: Oversized Ethernet frame spanned multiple 
buffers, entry 0xd length 0 status 0400!
Apr 12 17:50:40 router kernel: eth1: Oversized Ethernet frame cbaa90d0 vs 
cbaa90d0.
Apr 12 17:50:40 router kernel: eth1: Oversized Ethernet frame spanned multiple 
buffers, entry 0xe length 0 status 0400!
Apr 12 17:50:40 router kernel: eth1: Oversized Ethernet frame cbaa90e0 vs 
cbaa90e0.
Apr 12 17:50:40 router kernel: eth1: Oversized Ethernet frame spanned multiple 
buffers, entry 0xf length 0 status 0581!
Apr 12 17:50:40 router kernel: eth1: Oversized Ethernet frame cbaa90f0 vs 
cbaa90f0.
-- kern.log

On Wednesday 14 April 2004 09:19, Alex Handle wrote:
 Hello!

 I recently set up a gateway (ADSL PPTP, iptables). My problem is the
 excessively high load.

 ---top--
 snipped: top output, same as in the original mail above
 --top--

 Software
 OS: Debian 3.0 r2
 Kernel: 2.4.25

 The hardware should not be the problem, though:
 256 MB and an Athlon 1800+ are actually oversized for a router,
 and the network is not too big either, about 10 clients.
 I also noticed that the load was completely normal for the first 3 days.
 When I set the refresh rate in top to 0.1, it becomes apparent that syslog
 repeatedly consumes 99% of the CPU for short periods.

 Regards,
 Alex



high load but no cpu usage

2003-10-04 Thread Shri Shrikumar
Hi,

I seem to have a strange problem. I have a server which is showing a
load average of around 1 but cpu usage of 0.6% over two cpus.

What bothers me is that the load average used to stay under 0.16 previously
- nothing has changed. I have already tried to see if there are any
processes blocking using ps auxwww, but they all seem to be in state S
with a few in SW and two in SWN.

%cpu in iowait is also 0% according to top. 

also, iostat tells me the following (iostat -k)

avg-cpu:  %user   %nice    %sys   %idle
           8.31    0.00    0.75   90.94

Device:    tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
dev8-0    1.72         0.06        19.45     137015   41971011


It's really only running PostgreSQL, since it is running as a DB server.

Any help in getting to the bottom of this is appreciated.

Shri

-- 

Shri Shrikumar   U R Byte Solutions   Tel:   0845 644 4745
I.T. Consultant  Edinburgh, Scotland  Mob:   0773 980 3499
 Web: www.urbyte.com  Email: [EMAIL PROTECTED]




Re: high load but no cpu usage

2003-10-04 Thread Rus Foster
 Hi,

 I seem to have a strange problem. I have a server which is showing a
 load average of around 1 but cpu usage of 0.6% over two cpus.

This would imply I/O wait for me. What sort of disks does it have?

 What bothers me is that load average used to stay under 0.16 previously
 - nothing has changed. I have already tried to see if there are any
 processes blocking using ps auxwww but they all seem to be in State S
 with a few in SW and two in SWN.


Run vmstat 1 for a few minutes and post it here

Rgds

Rus
-- 
w: http://www.jvds.com  | Dedicated FreeBSD,Debian and RedHat Servers
e: [EMAIL PROTECTED]| Donations made to Debian, FreeBSD
t: +44 7919 373537  | and Slackware
t: 1-888-327-6330   | email: [EMAIL PROTECTED]






Re: high load but no cpu usage

2003-10-04 Thread Shri Shrikumar
On Sat, 2003-10-04 at 19:44, Rus Foster wrote:
  Hi,
 
  I seem to have a strange problem. I have a server which is showing a
  load average of around 1 but cpu usage of 0.6% over two cpus.
 
 This would imply I/O wait for me. What sort of disks does it have?

That's what I thought, but this same machine has handled twice the load at
just 0.16 load average. Also, it's running off 2 SCSI disks which are
mirrored.

 Run vmstat 1 for a few minutes and post it here

I have attached the output of vmstat. The load average of the machine
was around 1 throughout. Please let me know if you want me to add a
longer vmstat run.

Best wishes,

Shri

-- 

Shri Shrikumar   U R Byte Solutions   Tel:   0845 644 4745
I.T. Consultant  Edinburgh, Scotland  Mob:   0773 980 3499
 Web: www.urbyte.com  Email: [EMAIL PROTECTED]


[attachment: vmstat.gz - the vmstat output mentioned above]


high load average

2002-09-23 Thread Jason Pepas

the other day I was moving several gigs of files from one ide drive to 
another on the same ide chain (the secondary channel is broken) and my load 
average went up to around 7 (no, not 0.07).  The machine would become 
unresponsive for several seconds at a time.  This is a uniprocessor machine, 
both drives are ext2 filesystems.  

Is this normal?  I don't seem to remember having ide performance issues like 
this before (this is a new install).

-jason






Re: high load average

2002-09-23 Thread Ramon Kagan

Have you checked your dma settings?  hdparm/hwtools?
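
For completeness, a minimal check-then-enable sketch (device name assumed; adjust to the drive in question):

hdparm /dev/hda        # look at the using_dma flag
hdparm -d1 /dev/hda    # try to switch DMA on
hdparm -t /dev/hda     # time buffered reads before and after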

Ramon Kagan
York University, Computing and Network Services
Unix Team -  Intermediate System Administrator
(416)736-2100 #20263
[EMAIL PROTECTED]

-
I have not failed.  I have just
found 10,000 ways that don't work.
- Thomas Edison
-

On Mon, 23 Sep 2002, Jason Pepas wrote:

 the other day I was moving several gigs of files from one ide drive to
 another on the same ide chain (the secondary channel is broken) and my load
 average went up to around 7 (no, not 0.07).  The machine would become
 unresponsive for several seconds at a time.  This is a uniprocessor machine,
 both drives are ext2 filesystems.

 Is this normal?  I don't seem to remember having ide performance issues like
 this before (this is a new install).

 -jason









Re: high load average

2002-09-23 Thread nate

Jason Pepas said:
 the other day I was moving several gigs of files from one ide drive to
 another on the same ide chain (the secondary channel is broken) and my
 load  average went up to around 7 (no, not 0.07).  The machine would
 become  unresponsive for several seconds at a time.  This is a
 uniprocessor machine,  both drives are ext2 filesystems.

 Is this normal?  I don't seem to remember having ide performance issues
 like  this before (this is a new install).


this is normal (in my experience) if DMA is not enabled on one or
more of the IDE drives in use.

some broken IDE chipsets (e.g. VIA) don't work well in DMA mode and
the driver may automatically revert to PIO mode (even if you told it to
use DMA) if it encounters problems in DMA mode (which prompted me to
start using Promise IDE controllers on VIA boards a couple years ago)
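
A quick hedged check for that silent fallback (the exact message text varies by kernel):

# look for the driver complaining about DMA and dropping to PIO
dmesg | grep -i -E 'dma|pio'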

nate








Re: high load average

2002-09-23 Thread Bijan Soleymani

 Is this normal?  I don't seem to remember having ide performance issues like 
 this before (this is a new install).
 
This is normal if dma is not enabled.
It isn't enabled by default in Debian.
To enable it install hdparm and then
run hdparm -d1 /dev/hdx as root
where x is either a,b,c,d depending on the
ide device.

Hopefully that will work and your problem
will be solved. If you're really lucky
like me you can do something like
hdparm -c3d1m16X66 /dev/hda to enable
other options such as ATA-66. Just
do man hdparm and check out the options.

You might want to make a script to run
hdparm on boot. You can put such
a script in /etc/rc.boot
it would look something like:

#!/bin/sh
# enable DMA (plus any other options) on each drive at boot
hdparm -d1 /dev/hda
hdparm -d1 /dev/hdb

Hope that helps,
Bijan






Re: high load average

2002-09-23 Thread Quenten Griffith

Or just get hwtools; it creates a basic init.d script where you put your 
hdparm flags.

Bijan Soleymani wrote:

Is this normal?  I don't seem to remember having ide performance issues like 
this before (this is a new install).



This is normal if dma is not enabled.
It isn't enabled by default in Debian.
To enable it install hdparm and then
run hdparm -d1 /dev/hdx as root
where x is either a,b,c,d depending on the
ide device.

Hopefully that will work and your problem
will be solved. If you're really lucky
like me you can do something like
hdparm -c3d1m16X66 /dev/hda to enable
other options such as ATA-66. Just
do man hdparm and check out the options.

You might want to make a script to run
hdparm on boot. You can put such
a script in /etc/rc.boot
it would look something like:

#!/bin/sh
# enable DMA (plus any other options) on each drive at boot
hdparm -d1 /dev/hda
hdparm -d1 /dev/hdb

Hope that helps,
Bijan


  








Re: high load average

2002-09-23 Thread Jack O'Quin

Bijan Soleymani [EMAIL PROTECTED] writes:

  Is this normal?  I don't seem to remember having ide performance issues like 
  this before (this is a new install).
  
 This is normal if dma is not enabled.
 It isn't enabled by default in Debian.
 To enable it install hdparm and then
 run hdparm -d1 /dev/hdx as root
 where x is either a,b,c,d depending on the
 ide device.
 
 Hopefully that will work and your problem
 will be solved. If you're really lucky
 like me you can do something like
 hdparm -c3d1m16X66 /dev/hda to enable
 other options such as ATA-66. Just
 do man hdparm and check out the options.


This sounds like a problem I'm having.  I tried everything I could
figure out to enable DMA on my IDE drive, but it still won't take the
enable command...

[joq@sulphur] ~/ $ sudo hdparm -d 1 /dev/hda

/dev/hda:
 setting using_dma to 1 (on)
 HDIO_SET_DMA failed: Operation not permitted
 using_dma=  0 (off)

I'm running woody.  I built a kernel to turn on IDE DMA...

[joq@sulphur] ~/ $ grep IDEDMA /usr/src/kernel-source-2.4.18/.config
CONFIG_BLK_DEV_IDEDMA_PCI=y
CONFIG_IDEDMA_PCI_AUTO=y
CONFIG_BLK_DEV_IDEDMA=y
# CONFIG_IDEDMA_PCI_WIP is not set
# CONFIG_IDEDMA_NEW_DRIVE_LISTINGS is not set
CONFIG_IDEDMA_AUTO=y
# CONFIG_IDEDMA_IVB is not set

Here's what hdparm reports on my hardware...

[joq@sulphur] ~/ $ sudo hdparm -i /dev/hda

/dev/hda:

 Model=IC35L040AVVA07-0, FwRev=VA2OA52A, SerialNo=VNC202A2L1SU7A
 Config={ HardSect NotMFM HdSw15uSec Fixed DTR10Mbs }
 RawCHS=16383/16/63, TrkSize=0, SectSize=0, ECCbytes=52
 BuffType=DualPortCache, BuffSize=1863kB, MaxMultSect=16, MultSect=16
 CurCHS=16383/16/63, CurSects=16514064, LBA=yes, LBAsects=80418240
 IORDY=on/off, tPIO={min:240,w/IORDY:120}, tDMA={min:120,rec:120}
 PIO modes: pio0 pio1 pio2 pio3 pio4 
 DMA modes: mdma0 mdma1 mdma2 udma0 udma1 udma2 udma3 udma4 *udma5 
 AdvancedPM=yes: disabled (255) WriteCache=enabled
 Drive Supports : ATA/ATAPI-5 T13 1321D revision 1 : ATA-2 ATA-3 ATA-4 ATA-5 

[joq@sulphur] ~/ $ sudo hdparm /dev/hda

/dev/hda:
 multcount= 16 (on)
 I/O support  =  1 (32-bit)
 unmaskirq=  1 (on)
 using_dma=  0 (off)
 keepsettings =  0 (off)
 nowerr   =  0 (off)
 readonly =  0 (off)
 readahead=  8 (on)
 geometry = 5005/255/63, sectors = 80418240, start = 0
 busstate =  1 (on)

My mobo is an ASUS A7V333...

[joq@sulphur] ~/ $ sudo lspci -v
00:00.0 Host bridge: VIA Technologies, Inc. VT8367 [KT266]
Subsystem: Asustek Computer, Inc.: Unknown device 807f
Flags: bus master, 66Mhz, medium devsel, latency 0
Memory at e000 (32-bit, prefetchable) [size=64M]
Capabilities: [a0] AGP version 2.0
Capabilities: [c0] Power Management version 2

00:01.0 PCI bridge: VIA Technologies, Inc. VT8367 [KT266 AGP] (prog-if 00 [Normal 
decode])
Flags: bus master, 66Mhz, medium devsel, latency 0
Bus: primary=00, secondary=01, subordinate=01, sec-latency=0
Memory behind bridge: dc80-dddf
Prefetchable memory behind bridge: ddf0-dfff
Capabilities: [80] Power Management version 2

snip: other devices

00:11.1 IDE interface: VIA Technologies, Inc. Bus Master IDE (rev 06) (prog-if 8a 
[Master SecP PriP])
Subsystem: Asustek Computer, Inc.: Unknown device 808c
Flags: bus master, medium devsel, latency 32
I/O ports at b400 [size=16]
Capabilities: [c0] Power Management version 2

snip: other devices

[joq@sulphur] ~/ $ cat /proc/ide/hda/driver 
ide-disk version 1.10
[joq@sulphur] ~/ $ cat /proc/ide/hda/model
IC35L040AVVA07-0
[joq@sulphur] ~/ $ sudo cat /proc/ide/hda/cache
1863
[joq@sulphur] ~/ $ sudo cat /proc/ide/hda/settings
namevalue   min max mode
-   --- --- 
bios_cyl50050   65535   rw
bios_head   255 0   255 rw
bios_sect   63  0   63  rw
breada_readahead4   0   127 rw
bswap   0   0   1   r
current_speed   0   0   69  rw
failures0   0   65535   rw
file_readahead  124 0   16384   rw
ide_scsi0   0   1   rw
init_speed  0   0   69  rw
io_32bit1   0   3   rw
keepsettings0   0   1   rw
lun 0   0   7   rw
max_failures1   0   65535   rw
max_kb_per_request  127 1   127 rw
multcount   8   

Samba Problem: dead smbd, high load, kill -9 funktioniert nicht

2002-06-18 Thread Proud Debian-User

Hello list,

Strangely, my file server got over 70 smbd connections last night, from
various machines on my network. So far okay, but none of them are active
any more, yet they still show up in smbstatus. They cannot be terminated
with kill -9. netstat shows CLOSE_WAIT for all of them. The load is
currently at 75, and that also blocks sendmail, for example.
Rebooting is unfortunately not an option; the server is in a locked room
and has not booted automatically for the last three days (a Promise
controller that waits for input).

Why can't I kill the processes?
And how can I push the load back down?

Thanks
PDU

-- 
GMX - The communication platform on the Internet.
http://www.gmx.net






Re: Samba Problem: dead smbd, high load, kill -9 funktioniert nicht

2002-06-18 Thread Johannes Athmer

On Tue, Jun 18, 2002 at 04:59:43PM +0200, Proud Debian-User wrote:
 Hello list,

Hello Proud Debian-User,
 
 [ Samba has high load - processes cannot be killed ]
 Rebooting is unfortunately not an option; the server is in a locked room
 and has not booted automatically for the last three days (a Promise
 controller that waits for input).

Have you tried stopping Samba and then starting it again?
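
For reference, on a Debian of that era that would be roughly (init script name assumed):

/etc/init.d/samba stop
smbstatus              # check whether the stale entries are gone
/etc/init.d/samba start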
-- 
Greetz

Johannes Athmer





High Load Average

2001-06-03 Thread Jordi S. Bunster

Just a question: Is there any reason in particular for a Debian
box to keep its load average always over 6?

It is an AMD Athlon 750 MHz with 256 Megs of RAM, running potato
and 2.2.19, compiled to run on i686. It has the patches Debian
puts on the stock kernel, and the new-style RAID patches,
although no RAIDs are set up yet.

Sometimes the load average goes over 10, making sendmail refuse
connections. It is running sendmail, IMAP, POP3, apache+perl,
Radius (Cistron), and that's it. What can possibly be wrong?

Sidenote: We had another similar machine (processor was a PIII
550 Mhz) running the same stuff, but with Slackware. Load was
never that high, and the machine swapped all the time, at least
25 Megs. The new Debian Box never swaps, but has a high load
always.


Any thoughts?


  Jordi S. Bunster
[EMAIL PROTECTED]




Re: High Load Average

2001-06-03 Thread Forrest English
What is running on it? Have you checked top for processes?


--
Forrest English
http://truffula.net

When we have nothing left to give
There will be no reason for us to live
But when we have nothing left to lose
You will have nothing left to use
-Fugazi 

On Sun, 3 Jun 2001, Jordi S. Bunster wrote:

 
 Just a question: Is there any reason in particular for a Debian
 box to keep its load average always over 6?
 
 It is an AMD Athlon 750 MHz with 256 Megs of RAM, running potato
 and 2.2.19, compiled to run on i686. It has the Patches Debian
 puts on the stock kernel, and the new-style raid patches,
 although no RAIDs are set up yet.
 
 Sometimes the Load Average goes over 10, making sendmail refuse
 connections. It is running sendmail, IMAP, POP3, apache+perl,
 Radius(cistron) and that's it. What can possibly be wrong?
 
 Sidenote: We had another similar machine (processor was a PIII
 550 Mhz) running the same stuff, but with Slackware. Load was
 never that high, and the machine swapped all the time, at least
 25 Megs. The new Debian Box never swaps, but has a high load
 always.
 
 
 Any thoughts?
 
 
   Jordi S. Bunster
 [EMAIL PROTECTED]
 
 
 
 



Re: High Load Average

2001-06-03 Thread Alvin Oga

hi ya jordi

you have a run away process and/or a memory leak

( amd and intel cpu behave slightly differently for 
( the same code...

what apps is running???

top -i
ps axuw

c ya
alvin


On Sun, 3 Jun 2001, Jordi S. Bunster wrote:

 
 Just a question: Is there any reason in particular for a Debian
 Box keep its load average always over 6?
 
 It is a AMD Athlon 750 Mhz with 256 Megs of RAM, running potato
 and 2.2.19, compiled to run on i686. It has the Patches Debian
 puts on the stock kernel, and the new-style raid patches,
 although no RAIDs are set up yet.
 
 Sometimes the Load Average goes over 10, making sendmail refuse
 connections. It is running sendmail, IMAP, POP3, apache+perl,
 Radius(cistron) and that's it. What can possibly be wrong?
 
 Sidenote: We had another similar machine (processor was a PIII
 550 Mhz) running the same stuff, but with Slackware. Load was
 never that high, and the machine swapped all the time, at least
 25 Megs. The new Debian Box never swaps, but has a high load
 always.
 



Re: High Load Average

2001-06-03 Thread Christoph Simon

On Sun, 3 Jun 2001 22:51:51 -0300 (BRT)
Jordi S. Bunster [EMAIL PROTECTED] wrote:

 Just a question: Is there any reason in particular for a Debian
 Box keep its load average always over 6?

Not really. Did you try top to find out which processes are doing
that? Maybe you were running a Netscape/Mozilla client and some java
stuff keeps running after a crash...

--
Christoph Simon
[EMAIL PROTECTED]
---
^X^C
q
quit
:q
^C
end
x
exit
ZZ
^D
?
help
shit
.



Re: High Load Average

2001-06-03 Thread Alvin Oga

hi ya

or you could have a hacker running an irc on your machine
-- if the rest of your lan/machines is fine...
   then probably not

c ya
alvin


On Sun, 3 Jun 2001, Alvin Oga wrote:

 
 hi ya jordi
 
 you have a run away process and/or a memory leak
 
 ( amd and intel cpu behave slightly differently for 
 ( the same code...
 
 what apps is running???
 
 top -i
 ps axuw
 
  Just a question: Is there any reason in particular for a Debian
  Box keep its load average always over 6?



Re: High Load Average

2001-06-03 Thread Jordi S. Bunster
 you have a run away process and/or a memory leak
 
 ( amd and intel cpu behave slightly differently for 
 ( the same code...

Mmm .. speaking about internal programs, we only have some perl
scripts. Perl is the compiled one, right?

 what apps is running???

We JUST installed the server. I mean, there's nothing hand
compiled, except for Amavis. But it doesn't eat that much CPU
time. In fact, top reveals that everyone uses CPU all the time. An
ipop3d session easily goes for 18%, and an apache or sendmail one
goes for 47% ~ 56%. It is just like everyone is using the machine
at its most.

Look:

91 processes: 89 sleeping, 2 running, 0 zombie, 0 stopped
CPU states: 68.7% user, 31.2% system,  0.0% nice,  0.0% idle
Mem:  257856K av, 229104K used,  28752K free, 103600K shrd,  73192K buff
Swap: 128484K av,      0K used, 128484K free                 86696K cached

  PID USER     PRI  NI  SIZE  RSS SHARE STAT  LIB %CPU %MEM   TIME COMMAND
  170 root       0   0   632  632   516 S       0  5.3  0.2   6:00 syslogd
13533 root      10   0  1124 1120   780 S       0  4.3  0.4   0:00 scanmails
12172 jsb        8   0  1192 1192   688 R       0  4.1  0.4   0:03 top
13532 root       0   0  1552 1552  1200 S       0  0.7  0.6   0:00 sendmail
  177 root       0   0  4160 4160   804 S       0  0.5  1.6   2:30 named
11006 www-data   0   0  7632 7632  4776 S       0  0.5  2.9   0:01 apache
13271 www-data   0   0  4804 4804  4656 S       0  0.5  1.8   0:00 apache
13673 root      11   0   464  464   296 R       0  0.5  0.1   0:00 file
11825 root       0   0  1480 1480  1220 S       0  0.3  0.5   0:00 sshd
12136 rosanak    1   0  1460 1460   972 S       0  0.3  0.5   0:00 ipop3d
13529 root       0   0  1412 1412  1200 S       0  0.3  0.5   0:00 sendmail
13627 root       0   0  1396 1396  1160 S       0  0.3  0.5   0:00 sendmail
  357 root       0   0  1212 1212  1072 S       0  0.1  0.4   0:06 sendmail
15976 thomas     0   0  5752 5752   972 S       0  0.1  2.2   0:06 ipop3d
11525 www-data   0   0  4828 4828  4680 S       0  0.1  1.8   0:00 apache
    1 root       0   0   472  472   400 S       0  0.0  0.1   0:12 init
    2 root       0   0     0    0     0 SW      0  0.0  0.0   0:00 kflushd
    3 root       0   0     0    0     0 SW      0  0.0  0.0   0:03 kupdate
    4 root       0   0     0    0     0 SW      0  0.0  0.0   0:02 kswapd
    5 root       0   0     0    0     0 SW      0  0.0  0.0   0:00 keventd
    6 root     -20 -20     0    0     0 SW<     0  0.0  0.0   0:00 mdrecoveryd
  102 daemon     0   0   492  492   408 S       0  0.0  0.1   0:00 portmap
  172 root       0   0   760  760   384 S       0  0.0  0.2   0:00 klogd
  230 root       0   0   440  440   376 S       0  0.0  0.1   0:00 gpm
  241 root       0   0   560  560   476 S       0  0.0  0.2   0:00 lpd
  356 root       0   0  1188 1188   832 S       0  0.0  0.4   0:02 nmbd
  371 root       0   0  1204 1204   532 S       0  0.0  0.4   0:00 xfs
  380 root       0   0  1548 1548  1320 S       0  0.0  0.6   0:00 ntpd
  398 root       0   0   848  844   684 S       0  0.0  0.3   0:00 radwatch
  399 root       0   0   856  856   792 S       0  0.0  0.3   0:02 radiusd
  438 root       0   0   844  844   788 S       0  0.0  0.3   0:13 radiusd
  469 root       0   0   616  616   512 S       0  0.0  0.2   0:00 cron
 6613 root       0   0   440  440   376 S       0  0.0  0.1   0:00 getty
22159 root       0   0   584  584   500 S       0  0.0  0.2   0:03 inetd
31116 root       0   0  1224 1224   644 S       0  0.0  0.4   0:00 smbmount-2.2
31129 root       0   0  1216 1216   744 S       0  0.0  0.4   0:00 smbmount-2.2
31141 root       0   0  1220 1220   744 S       0  0.0  0.4   0:00 smbmount-2.2
31159 root       0   0  1220 1220   744 S       0  0.0  0.4   0:00 smbmount-2.2
31172 root       0   0  1220 1220   744 S       0  0.0  0.4   0:00 smbmount-2.2


At this moment, the load is a little bit lower (about 4), but idle is
still 0%. Quite weird, huh?

If any command output is helpful, please let me know.

  Jordi S. Bunster
[EMAIL PROTECTED]




Re: High Load Average

2001-06-03 Thread Petr \[Dingo\] Dvorak
On Sun, 3 Jun 2001, Jordi S. Bunster wrote:

JSB  you have a run away process and/or a memory leak
JSB  
JSB  ( amd and intel cpu behave slightly differently for 
JSB  ( the same code...
JSB 
JSB Mmm .. speaking about internal programs, we only have some perl
JSB scripts. Perl is the compiled one, right?
JSB 
JSB  what apps is running???
JSB 
JSB We JUST installed the server. I mean, there's nothing hand
JSB compiled, except for Amavis. But it doesn't eat that much CPU
JSB time. In fact, top reveals that everyone uses CPU all the time. A
JSB ipop3d session easily goes for 18%, and a apache or sendmail one
JSB goes for 47% ~ 56%. It is just like everyone is using the machine
JSB at its most.
JSB 
JSB Look:
JSB 
JSB 91 processes: 89 sleeping, 2 running, 0 zombie, 0 stopped
JSB CPU states: 68.7% user, 31.2% system,  0.0% nice,  0.0% idle
JSB Mem:  257856K av, 229104K used,  28752K free, 103600K shrd,  73192K buff
JSB Swap: 128484K av,      0K used, 128484K free                 86696K cached

check your bios settings, it looks like you have disabled external or internal
cache .. they should both be enabled .. and all other memory
region shadowing/caching should be disabled.

Dingo.


  ).|.(
'.'___'.'
   ' '(~)' '
   -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-ooO-=(_)=-Ooo-=-=-=-=-=-=-=-=-=-=-=-=-=-
Petr [Dingo] Dvorak [EMAIL PROTECTED]
Coder - Purple Dragon MUD   pdragon.org port 
   -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-[ 369D93 ]=-=-
 Debian version 2.2.18pre21, up 4 days, 13 users, load average: 1.00
   -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-




Re: High Load Average

2001-06-03 Thread Noah L. Meyerhans
On Sun, Jun 03, 2001 at 11:18:41PM -0300, Jordi S. Bunster wrote:
 91 processes: 89 sleeping, 2 running, 0 zombie, 0 stopped
 CPU states: 68.7% user, 31.2% system,  0.0% nice,  0.0% idle
 Mem:  257856K av, 229104K used,  28752K free, 103600K shrd,  73192K buff
 Swap: 128484K av,      0K used, 128484K free                 86696K cached

This (along with the process list copied from 'top') is not enough.  I
suspect that you have several processes in state 'D' which is
uninterruptible sleep.  Run 'ps auxwww' and search for any 'D's in the
STAT column.  They won't be using any CPU, but if they're hanging out in
that state they could indicate some other kind of problem, possibly
hardware related.
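
A quick way to pull those entries out (a sketch; in BSD-style 'ps aux'
output the STAT column is field 8, but adjust the field number if your
procps lays the columns out differently):

    ps auxwww | awk 'NR == 1 || $8 ~ /^D/'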

noah

-- 
 ___
| Web: http://web.morgul.net/~frodo/
| PGP Public Key: http://web.morgul.net/~frodo/mail.html 





Re: High Load Average

2001-06-03 Thread Christoph Simon

On Sun, 3 Jun 2001 23:18:41 -0300 (BRT)
Jordi S. Bunster [EMAIL PROTECTED] wrote:

 
  you have a run away process and/or a memory leak
  
  ( amd and intel cpu behave slightly differently for 
  ( the same code...
 
 Mmm .. speaking about internal programs, we only have some perl
 scripts. Perl is the compiled one, right?
 
  what apps is running???
 
 We JUST installed the server. I mean, there's nothing hand
 compiled, except for Amavis. But it doesn't eat that much CPU
 time. In fact, top reveals that everyone uses CPU all the time. A
 ipop3d session easily goes for 18%, and a apache or sendmail one
 goes for 47% ~ 56%. It is just like everyone is using the machine
 at its most.
 
 Look:
 
 91 processes: 89 sleeping, 2 running, 0 zombie, 0 stopped
 CPU states: 68.7% user, 31.2% system,  0.0% nice,  0.0% idle
 Mem:  257856K av, 229104K used,  28752K free, 103600K shrd,  73192K buff
 Swap: 128484K av,      0K used, 128484K free                 86696K cached
 
   PID USER     PRI  NI  SIZE  RSS SHARE STAT  LIB %CPU %MEM   TIME COMMAND
   170 root       0   0   632  632   516 S       0  5.3  0.2   6:00 syslogd
 13533 root      10   0  1124 1120   780 S       0  4.3  0.4   0:00 scanmails
 12172 jsb        8   0  1192 1192   688 R       0  4.1  0.4   0:03 top
[...]
 
 At this moment, Load is a little bit lower (about 4), but idle is
 still 0%. Quite weird uh?
 
 If any command output is helpful, please let me know.

Your table isn't very meaningful, as it doesn't show even 20% of
load. It might take a while to see. But as you say that all are
usually high, maybe you've got a kernel problem, maybe due to a
hardware (IRQ?) conflict. Just a quick guess.

--
Christoph Simon
[EMAIL PROTECTED]
---
^X^C
q
quit
:q
^C
end
x
exit
ZZ
^D
?
help
shit
.



Re: High Load Average

2001-06-03 Thread Nate Amsden
Jordi S. Bunster wrote:

 We JUST installed the server. I mean, there's nothing hand
 compiled, except for Amavis. But it doesn't eat that much CPU

amavis is VERY cpu intensive; i run it on many systems. is there a lot
of mail going through the system? are there a lot of big attachments?
one of my mail servers didn't dip below a load of 8 until i upgraded the
system's hardware. amavis is great.. but if you've got a lotta mail you
need more horsepower.

also if you're using something like UW Imap that can be a cause for very
high load as well. i suggest switching to something else like CYRUS which
reduces load by a factor of 100-200 (it did for me anyways). i'm sure there
are other good IMAP servers like courier(sp?) but i haven't tried them.

same goes for POP3. if you're using qpopper or ipop3d those can be causes
of high load as well (cyrus has a pop3 server as well, and it does not
cause high load)

if you have a lot of mail going through i suggest setting up a raid0
array(or raid 10) for /var/spool and have amavis scan mail off that
drive. get SCSI if you can for this.

sample mail server config:
Average KB/hour of mail: 873kB/H
Max KB/hour of mail: 9944.7kB/H
Average Mail/hour: 71
Max Mail/hour: 1081

Average System Load: 1.02
Max System Load: 4.09

(Statistics gathered from MRTG over the past ~6 weeks or so)

System config:
Dual P3-800Mhz
Dual 15k RPM Ultra160 SCSI drives raid1 /var/spool
single 15k RPM Ultra160 drive (no raid) /
256MB ram
256MB swap
Uptime: 74 days
Time spent idle: 87.0%
Linux 2.2.17+many patches (openwall included)
Debian GNU/Linux 2.2r3
sendmail 8.9.3 + amavis
Cyrus IMAP/POP
Apache
Apache+ssl
Squirrelmail (webmail front end)
Mcafee Antivirus 4.0.70

running amavis 0.2.1. hope this gives you an idea of what to expect
when using amavis as far as load goes.

nate

-- 
:::
ICQ: 75132336
http://www.aphroland.org/
http://www.linuxpowered.net/
[EMAIL PROTECTED]



kernel 2.4.2 and high load = machine freezes?

2001-03-29 Thread Erik Steffl
  I installed kernel 2.4.2 and while it works ok most of the time, there
were two occasions when the computer (almost) froze, the load being 100% and
almost nothing working for about an hour or more.

  both times this high load attack happened I opened xv (the thumbs
view) on a directory with large number of files (about 2000). I did the
same thing using old (2.2.17) kernel and it never caused significant
problems. even with 2.4.2 kernel it only happens rarely, other times it
works...

  first time it happened the mouse still moved, very slowly and after
about 30 min. I saw that the focus starts to move from one window to
another (title bar of one window changed color)

  second time it happened I couldn't do anything but ping the machine
(it responded immediately, but ssh did not work) and switch from VT to
VT - the switching between VTs was fast and the text screen appeared
immediately but I could not type anything (well, I could type but
nothing appeared on the screen, no keyboard combination worked (not even
ctrl-alt-del) except of alt-Fn).

  it looks like it's caused, or at least triggered, by xv, but I am quite
sure xv hasn't been updated in quite some time, and it used to work
(these two freezes just happened within the last week or so); the binary is
dated May 12  2000.

  if it happens I cannot even run top (it took about 15 - 30 sec. for
load to build up to the level where the machine was completely unusable) to
see where the time is spent - it might be kernel. the disk seems to be
working most of the time but not constantly.

  as far as I can see the memory usage does not go up, only the load
(that's what gkrellm shows while it works).

  something similar happened before with netscape (one of those extra
bad builds around version 4.0x), at that time I only had 16MB RAM so it
wasn't hard to choke the system, but it woke up (killed netscape)
eventually (it took few hours). however this time most of the software
is the same as it was before I installed 2.4.2

  I didn't find any suspicious messages in syslog or messages...

  any ideas on what's going on?

  system: debian testing, kernel 2.4.2, X 4.0.2, pentium 1GHz, 128MB
RAM, plenty of disk space (MB, RAM and processor are new, so it might be a
HW problem)

  TIA

erik



Re: kernel 2.4.2 and high load = machine freezes?

2001-03-29 Thread Nate Amsden
Erik Steffl wrote:

   any ideas on what's going on?

login on an xterm from another machine and run top while you try that.
recently i upgraded my firewall from a k6-3 400 to a p3-800 and doubled the
memory to 512MB. but it was still much slower!! turns out the VIA ide
chipset on the p3 board (asus) didn't play well with the drivers (even the
most updated ones from linux-ide.org). DMA was disabled; when doing a lot
of file access, load would get up to 10 to 15, making even typing in a
terminal (either local or remote) very difficult. once i turned DMA on,
things improved. but i have since disabled the VIA controller and got a
promise controller instead. it seems to have much better drivers, or is a
better ide chip.. whichever. no more problems!

course i run 2.2.x, but im sure the DMA problem can happen in 2.4.x as i've
seen stuff on the kernel mailing list about it.
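
For reference, checking and toggling DMA with hdparm might look like the
sketch below (/dev/hda is a placeholder for the disk in question, and -d1
can lock up boxes with buggy chipsets, so try it on an idle machine first):

    hdparm -d /dev/hda      # show whether using_dma is currently on
    hdparm -d1 /dev/hda     # attempt to enable DMA
    hdparm -tT /dev/hda     # crude throughput test, before and after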

nate

-- 
:::
ICQ: 75132336
http://www.aphroland.org/
http://www.linuxpowered.net/
[EMAIL PROTECTED]



Re: high load average

2001-03-09 Thread kmself
on Mon, Mar 05, 2001 at 11:12:16PM -0500, MaD dUCK ([EMAIL PROTECTED]) wrote:
 [cc'ing this to PLUG because it seems interesting...]
 
 also sprach kmself@ix.netcom.com (on Mon, 05 Mar 2001 08:02:51PM -0800):
  It's not 200% loaded.  There are two processes in the run queue.  I'd do
 
 huh? is that what 2.00 means? the average length of the run queue?

Yep.

 that would explain it because i found two STAT = D processes which i
 cannot kill (any hints what to do when kill -9 doesn't work and the
 /proc/`pidof` directory cannot be removed?). that's why 2.00.

Find their parents and kill them.  Easiest way IMO is to use pstree:

$ pstree -p

...search for the PIDs of the defunct processes, locate parent(s), kill
same.  Report back if problems.
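
A minimal sketch of that hunt (the PIDs are placeholders; substitute the
stuck process's PID and the parent PID the tools report):

    pstree -p | less        # locate the defunct PID in the tree
    ps -o ppid= -p 24520    # or print its parent PID directly (24520 = stuck PID)
    kill 1234               # then signal the parent found above (1234 = that PPID)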

-- 
Karsten M. Self kmself@ix.netcom.comhttp://kmself.home.netcom.com/
 What part of Gestalt don't you understand?   There is no K5 cabal
  http://gestalt-system.sourceforge.net/ http://www.kuro5hin.org




Re: high load average

2001-03-09 Thread kmself
on Tue, Mar 06, 2001 at 11:21:07AM -0600, Dave Sherohman ([EMAIL PROTECTED]) 
wrote:
 On Tue, Mar 06, 2001 at 06:09:41PM +0100, Joris Lambrecht wrote:
  isn't 2.00 more like 2% ? It is US notation where . is a decimal separator.
  Not ?
 
 You have the notation correct, but load average and CPU utilization are not
 directly related.  Load average is the average number of processes that are
 waiting on system resources over a certain time period; they could be waiting
 for CPU, for I/O, or for other resources.  (CPU does tend to be the biggest
 bottleneck, though, so a basic rule of thumb is that you usually don't want
 load to be much greater than the number of CPUs in the box.  

It *is* CPU.  These are processes in the run queue.  A process blocked
for I/O or another resource is blocked, not runnable (I think, I'm not
positive, but I'll bet my morning coffee on it -- which I *really* like,
and you'll want to give it to me anyway if I don't get it).

The significance of load average is that if you have more runnable
processes than CPUs, you have identified a system bottleneck:  it's now
possible to increase total system throughput by providing either more
and/or faster processors. 

Excessive swapping indicates the system is memory bound.  This isn't to
say that having a large amount of swapped memory is bad (it may or may
not be), but having a large number of processes swapping in and out of
memory is bad.

Not sure what the metric for I/O bound is.  Under Solaris, top would
report on I/O wait.  I could crack the O'Reilly system performance
tuning book and see what it says.
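
For the memory side, a sketch with vmstat sampling every 5 seconds (column
meanings are roughly those of procps vmstat):

    vmstat 5
    # si/so: memory swapped in/out per second; persistently non-zero
    #        values suggest the box is memory bound
    # b:     processes in uninterruptible sleep, i.e. blocked on I/O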

If none of the above are evident and things are still too slow, then
start optimizing your program(s).

 The machine I'm using starts killing off processes if load exceeds 6
 or 7; I wouldn't want to see it hit 100...)

It may not be all bad.  In certain cases, I believe Apache will spawn
large numbers of processes which manage to count against load average.
However, total system performance isn't actually negatively affected too
much.  I once took my UMP PII/180 box to a load of about 30 by running
multiple instances of computer v. computer gnuchess.  That took a
while to clean up.

-- 
Karsten M. Self kmself@ix.netcom.comhttp://kmself.home.netcom.com/
 What part of Gestalt don't you understand?   There is no K5 cabal
  http://gestalt-system.sourceforge.net/ http://www.kuro5hin.org




Re: high load average

2001-03-09 Thread Dave Sherohman
On Thu, Mar 08, 2001 at 10:55:10PM -0800, kmself@ix.netcom.com wrote:
 on Tue, Mar 06, 2001 at 11:21:07AM -0600, Dave Sherohman ([EMAIL PROTECTED]) 
 wrote:
  You have the notation correct, but load average and CPU utilization are not
  directly related.  Load average is the average number of processes that are
  waiting on system resources over a certain time period; they could be 
  waiting
  for CPU, for I/O, or for other resources.

 It *is* CPU.  These are processes in the run queue.  A process blocked
 for I/O or another resource is blocked, not runnable

OK, now I'm confused...

My statements were based on my memory of a thread from last May (was it
that long ago?) on this very list titled "(ot) What is load average?".
Checking back on the messages I saved from that conversation, I see
one from kmself@ix.netcom.com stating that load average is

| Number of processes in the run queue, averaged over time.  Often
| confused with CPU utilization, which it is not.

Load average either is CPU or it isn't, right?  So you can't have been
correct both times.  Now, you may have been wrong last year and since
realized that it's more CPU-related than you had thought, but (aside from
this thread's original question describing a situation with a long-term
consistent load average of 2.00 and low-to-no CPU utilization) last
May's thread also included a message from [EMAIL PROTECTED] stating that

] It is the average number of processes in the 'R' (running/runnable) state
] (or blocked on I/O).

and

] The load average is most directly related to CPU.  Two CPU-intensive
] processes running will result in a load average of 2, etc.  But I/O
] intensive processes spend so much time active that they can drive up the
] load average also.  In addition if more than one process is blocked on I/O
] then the load average will go up very quickly, as both processes count
] toward the load even if only one can access the disk at a time.

Based on my observations of load and CPU readings on my boxes and the
messages from last May that I quoted above, I'm inclined to maintain
my earlier statement that processes waiting on any resource (not just
CPU) contribute to load.  But, if that's not the case, I'm willing to
be corrected.

-- 
Linux will do for applications what the Internet did for networks. 
- IBM, Peace, Love, and Linux
Geek Code 3.1:  GCS d? s+: a- C++ UL++$ P+ L+++ E- W--(++) N+ o+
!K w---$ O M- V? PS+ PE Y+ PGP t 5++ X+ R++ tv b+ DI D G e* h+ r y+



Re: high load average

2001-03-09 Thread kmself
on Fri, Mar 09, 2001 at 01:27:50AM -0600, Dave Sherohman ([EMAIL PROTECTED]) 
wrote:
 On Thu, Mar 08, 2001 at 10:55:10PM -0800, kmself@ix.netcom.com wrote:
  on Tue, Mar 06, 2001 at 11:21:07AM -0600, Dave Sherohman ([EMAIL 
  PROTECTED]) wrote:
   You have the notation correct, but load average and CPU
   utilization are not directly related.  Load average is the average
   number of processes that are waiting on system resources over a
   certain time period; they could be waiting for CPU, for I/O, or
   for other resources.
 
  It *is* CPU.  These are processes in the run queue.  A process
  blocked for I/O or another resource is blocked, not runnable
 
 OK, now I'm confused...

I'm also somewhat fallible.  So, we'll get to the source of the question
this time.

In particular, a job blocked for I/O *is* runnable.  My error.

 My statements were based on my memory of a thread from last May (was
 it that long ago?) on this very list titled (ot) What is load
 average?.  Checking back on the messages I saved from that
 conversation, I see a one from kmself@ix.netcom.com stating that load
 average is
 
 | Number of processes in the run queue, averaged over time.  Often
 | confused with CPU utilization, which it is not.
 
 Load average either is CPU or it isn't, right?  

Percent of clock ticks being utilized is CPU utilization.  Number of
jobs in runnable state is load average.  Related, but not identical
metrics.

My own statement:

Load average is a measure of _average current requests for CPU
processing_ over some time interval.

While we're at it, let's pull in a more authoritative definition, this
from _System Performance Tuning_, by Mike Loukides, O'Reilly, 1990:

The _system load average_ provides a convenient way to summarize the
activity on a system.  It is the first statistic you should look at
when performance seems to be poor.  UNIX defines load average as the
average number of processes in the kernel's run queue during an
interval.  A _process_ is a single stream of instructions.  Most
programs run as a single process, but some spawn (UNIX terminology:
_fork_) other processes as they run.  A process is in the run queue
if it is:

  * Not waiting for any external event (e.g., not waiting for
someone to type a character at a terminal).
  
  * Not waiting of its own accord (e.g., the job hasn't called 'wait'.)

  * Not stopped (e.g., the job hasn't been stopped by CTRL-Z).
Processes cannot be stopped on XENIX and versions of System V.2.
The ability to stop processes has been added to System V.4 and
some versions of V.3.

While the load average is convenient, it may not give you an
accurate picture of the system's load.  There are two primary
reasons for this inaccuracy:

  * The load average counts as runnable all jobs waiting for disk
I/O.  This includes processes that are waiting for disk
operations to complete across NFS.  If an NFS server is not
responding (e.g., if the network is faulty or the server has
crashed), a process can wait for hours for an NFS operation to
complete.  It is considered runnable the entire time even though
nothing is happening; therefore, the load average climbs when
NFS servers crash, even though the system isn't really doing any
more work.

  * The load average does not account for scheduling priority.  It
does not differentiate between jobs that have been niced (i.e.,
placed at a lower priority and therefore not consuming much CPU
time) or jobs that are running at a high priority.

Hopefully, that clarifies a few misperceptions and sloppy statements (my
own included).

Specific to GNU/Linux, the count of active tasks is computed in
kernel/sched.c as:

static unsigned long count_active_tasks(void)
{
        struct task_struct *p;
        unsigned long nr = 0;

        read_lock(&tasklist_lock);
        for_each_task(p) {
                /* runnable, uninterruptible-sleep, and swapping tasks
                 * all count toward the load average */
                if ((p->state == TASK_RUNNING ||
                     p->state == TASK_UNINTERRUPTIBLE ||
                     p->state == TASK_SWAPPING))
                        nr += FIXED_1;
        }
        read_unlock(&tasklist_lock);
        return nr;
}
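
The kernel samples that count every 5 seconds and folds it into three
exponentially decaying averages (the CALC_LOAD macro; the fixed-point
details are omitted here). As a rough floating-point illustration of why
two permanently stuck tasks read as a steady 2.00, here is a sketch in awk:

    awk 'BEGIN {
        e = exp(-5/60)             # per-sample decay for the 1-minute average
        n = 2                      # two tasks counted active at every sample
        load = 0
        for (i = 0; i < 120; i++)  # 120 samples of 5 seconds = 10 minutes
            load = load * e + n * (1 - e)
        printf "%.2f\n", load      # prints 2.00
    }'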

 So you can't have been correct both times.  

No, I am.  You're just not reading me consistently ;-)

My admonition in the current thread that load average is a metric of CPU
utilization is just that:  load average is concerned with CPU, it is
*not* concerned with memory, disk I/O (though I/O blocking can effect it),
etc.  However, as I clarify in this current post, and my prior thread,
load average is not equivalent to CPU _utilization_.

To put it in different terms:

   - Load average is how often you're asking for it.
   - CPU utilization is how often you're getting it.

High load average means you've got more requests than you can handle

Re: high load average

2001-03-09 Thread Dave Sherohman
On Fri, Mar 09, 2001 at 03:25:24PM -0800, kmself@ix.netcom.com wrote:
 The clarification is given in the O'Reilly citation.  Runnable
 processes, not waiting on other resources, I/O blocking excepted.

Excellent - thanks!

-- 
Linux will do for applications what the Internet did for networks. 
- IBM, Peace, Love, and Linux
Geek Code 3.1:  GCS d? s+: a- C++ UL++$ P+ L+++ E- W--(++) N+ o+
!K w---$ O M- V? PS+ PE Y+ PGP t 5++ X+ R++ tv b+ DI D G e* h+ r y+



RE: high load average

2001-03-06 Thread Joris Lambrecht
Dear dUCK,

isn't 2.00 more like 2% ? It is US notation where . is a decimal separator.
Not ?

-Original Message-
From: MaD dUCK [mailto:[EMAIL PROTECTED]
Sent: Tuesday, March 06, 2001 3:38 AM
To: debian users
Subject: high load average


someone explain this to me:

albatross:~$ uname -a
Linux albatross 2.2.17 #2 Mon Sep 04 20:49:27 CET 2000 i586 unknown

albatross:~$ uptime
  2:56am  up 174 days,  5:50,  1 user,  load average: 2.00, 2.05, 2.01

# processes sorted by decreasing cpu usage
albatross:~$ ps aux | head -1 && ps aux | sort -nrk3 | head -5
USER       PID %CPU %MEM   VSZ   RSS TTY STAT START   TIME COMMAND
root     15889  0.2  1.6  2720  1536 ?   S    02:50   0:02 /usr/sbin/sshd
root      1646  0.1  0.9  1672   864 ?   S    2000   32:01 /usr/sbin/diald -
xfs       1776  0.0  1.0  2060  1020 ?   S    2000    0:00 xfs -droppriv -da
squid     1748  0.0  0.3  1088   332 ?   S    2000    0:06 (unlinkd)
squid     1742  0.0 19.2 20048 18440 ?   S    2000   15:01 (squid) -D
root     25890  0.0  0.7  1652   764 ?   D    00:01   0:00 sh /etc/ppp/ip-up
root     25889  0.0  0.7  1644   752 ?   S    00:01   0:00 bash /etc/ppp/ip-

the load average displayed by uptime has been very consistently above
2.00 and the output of ps aux has been pretty much the same for the
past two weeks. no hung jobs. no traffic. the server basically *isn't
being used*, especially not during the last 1, 5, or 15 minutes. and
cron isn't running, there are *only* 35 running jobs. why, oh why then
is it 200% loaded???

martin

[greetings from the heart of the sun]# echo [EMAIL PROTECTED]:1:[EMAIL 
PROTECTED]@@@.net
-- 
the web site you seek
cannot be located but
endless others exist.





Re: high load average

2001-03-06 Thread Dave Sherohman
On Tue, Mar 06, 2001 at 06:09:41PM +0100, Joris Lambrecht wrote:
 isn't 2.00 more like 2% ? It is US notation where . is a decimal separator.
 Not ?

You have the notation correct, but load average and CPU utilization are not
directly related.  Load average is the average number of processes that are
waiting on system resources over a certain time period; they could be waiting
for CPU, for I/O, or for other resources.  (CPU does tend to be the biggest
bottleneck, though, so a basic rule of thumb is that you usually don't want
load to be much greater than the number of CPUs in the box.  The machine I'm
using starts killing off processes if load exceeds 6 or 7; I wouldn't want to
see it hit 100...)

-- 
Linux will do for applications what the Internet did for networks. 
- IBM, Peace, Love, and Linux
Geek Code 3.1:  GCS d? s+: a- C++ UL++$ P+ L+++ E- W--(++) N+ o+
!K w---$ O M- V? PS+ PE Y+ PGP t 5++ X+ R++ tv b+ DI D G e* h+ r y+



high load average

2001-03-05 Thread MaD dUCK
someone explain this to me:

albatross:~$ uname -a
Linux albatross 2.2.17 #2 Mon Sep 04 20:49:27 CET 2000 i586 unknown

albatross:~$ uptime
  2:56am  up 174 days,  5:50,  1 user,  load average: 2.00, 2.05, 2.01

# processes sorted by decreasing cpu usage
albatross:~$ ps aux | head -1 && ps aux | sort -nrk3 | head -5
USER       PID %CPU %MEM   VSZ   RSS TTY STAT START   TIME COMMAND
root     15889  0.2  1.6  2720  1536 ?   S    02:50   0:02 /usr/sbin/sshd
root      1646  0.1  0.9  1672   864 ?   S    2000   32:01 /usr/sbin/diald -
xfs       1776  0.0  1.0  2060  1020 ?   S    2000    0:00 xfs -droppriv -da
squid     1748  0.0  0.3  1088   332 ?   S    2000    0:06 (unlinkd)
squid     1742  0.0 19.2 20048 18440 ?   S    2000   15:01 (squid) -D
root     25890  0.0  0.7  1652   764 ?   D    00:01   0:00 sh /etc/ppp/ip-up
root     25889  0.0  0.7  1644   752 ?   S    00:01   0:00 bash /etc/ppp/ip-

the load average displayed by uptime has been very consistently above
2.00 and the output of ps aux has been pretty much the same for the
past two weeks. no hung jobs. no traffic. the server basically *isn't
being used*, especially not during the last 1, 5, or 15 minutes. and
cron isn't running, there are *only* 35 running jobs. why, oh why then
is it 200% loaded???

martin

[greetings from the heart of the sun]# echo [EMAIL PROTECTED]:1:[EMAIL 
PROTECTED]@@@.net
-- 
the web site you seek
cannot be located but
endless others exist.



Re: high load average

2001-03-05 Thread Noah L. Meyerhans
On Mon, Mar 05, 2001 at 09:37:36PM -0500, MaD dUCK wrote:
 the load average displayed by uptime has been very consistently above
 2.00 and the output of ps aux has been pretty much the same for the
 past two weeks. no hung jobs. no traffic. the server basically *isn't
 being used*, especially not during the last 1, 5, or 15 minutes. and
 cron isn't running, there are *only* 35 running jobs. why, oh why then
 is it 200% loaded???

Load average is not an indication of how busy the CPU is.  A busy CPU
can *cause* a high load average, but so can other stuff.

In this case, I would guess that the high load is caused by processes
being blocked while waiting for IO routines to complete.  In the ps
output or in top, look for processes in state 'D'.  I suspect you'll
find 2 of them (one was a ppp process visible in the ps output you
posted).  Figure out why they're blocking and you'll be able to do
something to fix it.
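
One way to see where a D-state process is stuck is the wchan field, which
names the kernel function the process is sleeping in (a sketch; on some
setups wchan prints only a raw address unless kernel symbols are available):

    ps -eo pid,stat,wchan:25,args | awk '$2 ~ /^D/'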

noah

-- 
 ___
| Web: http://web.morgul.net/~frodo/
| PGP Public Key: http://web.morgul.net/~frodo/mail.html 





Re: high load average

2001-03-05 Thread MaD dUCK
also sprach Noah L. Meyerhans (on Mon, 05 Mar 2001 09:51:53PM -0500):
 Load average is not an indication of how busy the CPU is.  A busy CPU
 can *cause* a high load average, but so can other stuff.

good point. so i found two offending processes in state D:

root     24520  0.0  0.9  1652  904 ?   D    Feb25   0:00 /bin/gawk
root     25890  0.0  0.7  1652  764 ?   D    00:01   0:00 sh /etc/ppp/ip-up

however, a kill -9 on either one doesn't delete them, i cannot delete
it's directory in /proc (as works on solaris 2.6), and to the best of
my knowledge, these processes won't go away.

any tips, other than to reboot?

martin

[greetings from the heart of the sun]# echo [EMAIL PROTECTED]:1:[EMAIL 
PROTECTED]@@@.net
-- 
oxymoron: micro$oft works



Re: high load average

2001-03-05 Thread kmself
on Mon, Mar 05, 2001 at 09:37:36PM -0500, MaD dUCK ([EMAIL PROTECTED]) wrote:
 someone explain this to me:
 
 albatross:~$ uname -a
 Linux albatross 2.2.17 #2 Mon Sep 04 20:49:27 CET 2000 i586 unknown
 
 albatross:~$ uptime
   2:56am  up 174 days,  5:50,  1 user,  load average: 2.00, 2.05, 2.01
 
 # processes sorted by decreasing cpu usage
 albatross:~$ ps aux | head -1 && ps aux | sort -nrk3 | head -5
 USER       PID %CPU %MEM   VSZ   RSS TTY STAT START   TIME COMMAND
 root     15889  0.2  1.6  2720  1536 ?   S    02:50   0:02 /usr/sbin/sshd
 root      1646  0.1  0.9  1672   864 ?   S    2000   32:01 /usr/sbin/diald -
 xfs       1776  0.0  1.0  2060  1020 ?   S    2000    0:00 xfs -droppriv -da
 squid     1748  0.0  0.3  1088   332 ?   S    2000    0:06 (unlinkd)
 squid     1742  0.0 19.2 20048 18440 ?   S    2000   15:01 (squid) -D
 root     25890  0.0  0.7  1652   764 ?   D    00:01   0:00 sh /etc/ppp/ip-up
 root     25889  0.0  0.7  1644   752 ?   S    00:01   0:00 bash /etc/ppp/ip-
 
 the load average displayed by uptime has been very consistently above
 2.00 and the output of ps aux has been pretty much the same for the
 past two weeks. no hung jobs. no traffic. the server basically *isn't
 being used*, especially not during the last 1, 5, or 15 minutes. and
 cron isn't running, there are *only* 35 running jobs. why, oh why then
 is it 200% loaded???

It's not 200% loaded.  There are two processes in the run queue.  I'd do
a 'ps aux' and look at what's runnable (STAT = 'R').  You might have to
do this repeatedly to find out what's there.  If it's the same processes
consistently, you might look to see what they or their children are
doing.

Note that you list *no* runnable processes in your ps output.

-- 
Karsten M. Self kmself@ix.netcom.comhttp://kmself.home.netcom.com/
 What part of Gestalt don't you understand?   There is no K5 cabal
  http://gestalt-system.sourceforge.net/ http://www.kuro5hin.org




Re: high load average

2001-03-05 Thread MaD dUCK
[cc'ing this to PLUG because it seems interesting...]

also sprach kmself@ix.netcom.com (on Mon, 05 Mar 2001 08:02:51PM -0800):
 It's not 200% loaded.  There are two processes in the run queue.  I'd do

huh? is that what 2.00 means? the average length of the run queue?

that would explain it because i found two STAT = D processes which i
cannot kill (any hints what to do when kill -9 doesn't work and the
/proc/`pidof` directory cannot be removed?). that's why 2.00.

thanks,
martin

[greetings from the heart of the sun]# echo [EMAIL PROTECTED]:1:[EMAIL 
PROTECTED]@@@.net
-- 
micro$oft is to operating systems  security
what mcdonalds is to gourmet cuisine.



Re: High load

2000-05-01 Thread Raghavendra Bhat
Suresh Kumar posts:

 I have never seen load averages going above 2
 earlier with redhat installation. 
 

On a similar setup while running Netscape ?  Please
install libc5 and libg++272 found in /oldlibs of the
Debian 'slink' CD.


ragOO, VU2RGU. Kochi, INDIA.
Keeping the Air-Waves FREE.Amateur Radio
Keeping the W W W FREE..Debian GNU/Linux


High load

2000-04-28 Thread Suresh Kumar.R
Hi,

I recently installed a debian 2.1 on my machine which was earlier running
redhat 5.2 (pentium 100MHz, 16mb ram). The machine becomes very, very slow
and unusable when I run netscape. I have a dialup connection. The load
average goes to 100 and more. I have never seen load averages going above 2
earlier with the redhat installation.

I tried issuing the top command to find out who the culprit is. I could not
make much sense of the listing. It showed multiple entries of syslogd.

Any ideas on how to make the system useful ?

Suresh
-
Suresh Kumar.R  Email: [EMAIL PROTECTED]
Dept of Electronics  Communication
College of Engineering, Trivandrum - 695 016
INDIA






RE: High load

2000-04-28 Thread Bryan Scaringe
Recent versions of netscape will slow a 16Mb system to a crawl.  How does the
system respond when you aren't running netscape?  What window manager
are you using?  What else are you running at the time?  Check your netscape
memory cache size.

I would be willing to bet the problem lies in the (lack of) RAM.

Bryan


On 28-Apr-2000 Suresh Kumar.R wrote:
 Hi,
 
 I recently installed a debian 2.1 on my machine which was earlier running
 redhat 5.2. (pentium 100MHz, 16mb ram). The machine becomes very very slow
 and unusable when I run netscape. I have dialup connection. The load
 average goes 100 and more. I have never seen load averages going above 2
 earlier with redhat installation. 
 
 I tried issuing top command to know who the culprit is. I could not find
 much sense from the listing. It showed multiple entries of syslogd.
 
 Any ideas on how to make the system useful ?
 
 Suresh
 -
 Suresh Kumar.REmail: [EMAIL PROTECTED]
 Dept of Electronics  Communication
 College of Engineering, Trivandrum - 695 016
 INDIA
 
 
 
 
 
 


high load but idle CPU

1999-05-27 Thread Max
I have a dual-CPU system running potato with kernel 2.2.3.

Here's what top reports:

 6:30pm  up 36 days, 20:55, 10 users,  load average: 5.22, 5.28, 5.17
152 processes: 147 sleeping, 2 running, 2 zombie, 1 stopped
CPU states:  0.4% user,  1.5% system,  0.0% nice, 97.9% idle
Mem:  516688K av, 480208K used,  36480K free,  96664K shrd, 167100K buff
Swap: 513968K av,      0K used, 513968K free                134748K cached

How can the load be above 5 while the CPU is 97.9% idle?  This has
been the case over the last week.  The load stays very high even when
there are hours of very low CPU activity.  Any clues?

Thanks,
Max

-- 
The hopeful depend on a world without end
Whatever the hopeless may say
 Neil Peart, 1985




Re: high load but idle CPU

1999-05-27 Thread Max
* George Bonser [EMAIL PROTECTED] [05/26/99 18:59] wrote:
 Do a ps -ax and see how many processes you have stuck in D state ;). Then
 go and get 2.2.9

Yup, that explains it!  I have 5 sxid processes in D state.
Hmmm... could it have something to do with the fact that I installed
arla 5 days ago and sxid is trying to traverse the entire AFS tree? :)
I guess I'll have to wait till the next reboot to clear these D
processes out.  In the meantime, editing sxid.conf is a good idea. :)

Thanks,
Max

-- 
The hopeful depend on a world without end
Whatever the hopeless may say
 Neil Peart, 1985




Re: high load but idle CPU

1999-05-27 Thread Joey Hess
George Bonser wrote:
 Any process involved with heavy net activity in an SMP system with 2.2.3
 will do this. I had problems with web servers doing it. 2.2.9 seems OK.
 2.2.6/7 were disasters. 2.2.5 seemed to work, though.

Hm, could you expand on that? I've been using 2.2.7 for a while, what
problems does it have?

-- 
see shy jo


Extremely High Load

1998-01-03 Thread LeighK
I'm running a Debian 1.3.1 system and find that, when the machine is put
into our production environment here, after a little while its load starts
to rise, and keeps on going. It was so bad it got up to 150+ once. At
any rate, I ran top one time and nothing was using any large amount of CPU,
nor was the hard drive going crazy or any significant amount of memory
being used. This machine is slated to replace our current shell machine,
which is currently handling shell services, e-mail, dns, and www for our
customers. The machine is a Pentium 233 /w 128MB ram (side note: I made
sure I gave the kernel the mem=128M param), with a 2.0.33 kernel. Some of
the other significant software we run (as in, the stuff that gets hit the
most) is sendmail 8.8.8 (I rolled my own), qpopper 2.2 (my own
compile), and apache 1.2.4 (again, my own compile). We also run cgiwrap,
but I don't think that would cause the problem since I disabled it when I
first started seeing the problem. I was also running process
accounting. 

Sometimes the system doesn't get bad right away; it made it to 25 minutes of
uptime once before the load skyrocketed, but usually it will
start jumping within a few minutes (the last time I tried it, the load
went up as soon as the machine booted). I wouldn't be totally concerned
with the load except once the problem starts, the machine is almost
totally unresponsive to interactive use and I have to do a reset to
restart the system. Our old setup is an old Slackware distribution
with a 1.2.13 kernel. Unfortunately, I didn't set up the old system, so
the old admin may have made modifications to the kernel that I don't know
about to deal with the amount of load it gets (we can have like 25-30
sendmail processes + 30-50 apache processes running at once on the old
machine with little load). Could a kernel limit be getting hit (such as
file descriptors or open sockets maybe)? If anyone has any suggestions,
please let me know. This is a problem that has me at my wits' end!
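
One thing worth checking is whether a kernel file-handle ceiling is being
hit (a sketch; the /proc paths shown are the 2.2-era ones and differ on
older 2.0 kernels, so treat them as an assumption):

    cat /proc/sys/fs/file-nr            # handles: allocated / free / max
    cat /proc/sys/fs/file-max           # the current ceiling
    echo 8192 > /proc/sys/fs/file-max   # raise it, as root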

Thanks!

-Leigh

-
Leigh Koven CyberComm Online Services
[EMAIL PROTECTED] http://www.cybercomm.net/
http://www.thegovernment.net/  (732) 818-
 You can check out any time you like, but you can never leave - The Eagles
-




Re: Extremely High Load

1998-01-03 Thread Shaleh
From personal experience this is a tad much for one machine.  DNS can
fill up some memory w/ cache and is a constant hit.  Really should be
its own 486 or so w/ some memory tossed in.  Shell services can be
dangerous, and a user could easily peg out a system.




Re: Extremely High Load

1998-01-03 Thread LeighK
On Sat, 3 Jan 1998, Shaleh wrote:

 From personal experience this is a tad much for one machine.  DNS can
 fill up some memory w/ cache and is a constant hit.  Really should be
 its own 486 or so w/ some memory tossed in.  Shell services can be
 dangerous, and a user could easily peg out a system.

Eventually we plan to move everything to their own machines, but we're
just not seeing this problem with the same load on the old machine (also a
Pentium 233, but it was running as a P5-100 a month ago).

-Leigh

-
Leigh Koven CyberComm Online Services
[EMAIL PROTECTED] http://www.cybercomm.net/
http://www.thegovernment.net/  (732) 818-
 You can check out any time you like, but you can never leave - The Eagles
-




Re: Extremely High Load

1998-01-03 Thread Shaleh
From personal experience this is a tad much for one machine.  DNS can
fill up some memory w/ cache and is a constant hit.  Really should be
its own 486 or so w/ some memory tossed in.  Shell services can be
dangerous, and a user could easily peg out a system.  We run a shell
machine, a dns server, and a e-mail/web server.  We also have another
machine running secondary names in addition to its usual load.

