Re: Strange load issue with 2.4.17

2014-10-14 Thread Michael Menge

Hi,

Quoting Sebastian Hagedorn haged...@uni-koeln.de:

--On 13. Oktober 2014 17:39:23 +0200 Simon Matter  
simon.mat...@invoca.ch wrote:



--On 13. Oktober 2014 17:35:25 +0200 Simon Matter
simon.mat...@invoca.ch wrote:


Hi,

for the last week we have seen strange load issues on our Cyrus server.
All of a sudden the load increases to several thousands, user CPU goes down
to basically zero, system CPU spikes. In the past we've had trouble with
poor I/O performance, but that went along with an increase in Wait I/O. We
don't see that now. vmstat shows a massive increase in context switches.
When the system reaches this state, all we can do is restart Cyrus or
reboot the machine if that doesn't work anymore.

I'm attaching a Ganglia screenshot that shows the problem clearly. When
the problem exists, there's not much we can do to analyze it. A colleague
suggested that what we see could be related to this bug:

https://bugzilla.cyrusimap.org/show_bug.cgi?id=3744

It was reported for 2.4.16, and it sounds as if it has been fixed, but is
that fix really part of 2.4.17? Any other ideas?


Is this a physical host or running virtualized?


It's virtualized, but it's been that way for more than a year.


Is this by any chance running on KVM, maybe on an AMD cpu?


No, it's VMware ESX on Intel CPUs.


How is the memory usage? Is the system swapping?
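
One way to answer the swap question while the load is climbing is to sample the kernel's swap counters twice and compare them. The following is a minimal sketch, assuming a Linux guest that exposes `pswpin` and `pswpout` in /proc/vmstat; the function names and the 5-second interval are illustrative and not part of any Cyrus tooling.

```python
import time


def parse_counters(text):
    """Parse 'name value' lines (the /proc/vmstat layout) into a dict."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters


def swap_activity(before, after):
    """Pages swapped in and out between two parsed samples."""
    return {key: after.get(key, 0) - before.get(key, 0)
            for key in ("pswpin", "pswpout")}


def sample_swap(interval=5.0, path="/proc/vmstat"):
    """Read the counters twice, `interval` seconds apart (Linux only)."""
    with open(path) as handle:
        before = parse_counters(handle.read())
    time.sleep(interval)
    with open(path) as handle:
        after = parse_counters(handle.read())
    return swap_activity(before, after)
```

If `sample_swap()` keeps returning zero for both counters during a spike, swapping is probably not what drives the load.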



M.Menge                          Tel.: (49) 7071/29-70316
Universität Tübingen             Fax.: (49) 7071/29-5912
Zentrum für Datenverarbeitung    mail: michael.me...@zdv.uni-tuebingen.de

Wächterstraße 76
72074 Tübingen


Cyrus Home Page: http://www.cyrusimap.org/
List Archives/Info: http://lists.andrew.cmu.edu/pipermail/info-cyrus/
To Unsubscribe:
https://lists.andrew.cmu.edu/mailman/listinfo/info-cyrus

Re: Strange load issue with 2.4.17

2014-10-14 Thread lst_hoe02


Hello,

I'm not a Cyrus expert, but to my knowledge a high %sys load points to
CPU time spent in kernel space. One common cause is slow or overloaded
I/O, but that would show up in %wait, at least as long as I/O is making
any progress at all. So from my point of view you either have a hardware
problem where the kernel is busy-waiting for something it should not
have to wait for, or you have hit a kernel bug somewhere. This would
also be in line with your observation that the system is not responsive
at all, while the number of processes, the memory usage and %user are
normal or lower than expected.
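
To put numbers on the %sys versus %wait distinction, the aggregate counters in /proc/stat can be sampled twice and turned into percentages, together with the context-switch delta that vmstat reports. This is only a sketch under the assumption of a Linux kernel with the usual `cpu` and `ctxt` lines; the helper names are made up for illustration.

```python
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def parse_proc_stat(text):
    """Return (cpu_ticks, context_switches) from /proc/stat content."""
    cpu = {}
    ctxt = 0
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue
        if parts[0] == "cpu":
            # Aggregate line only; per-core lines are named cpu0, cpu1, ...
            values = [int(v) for v in parts[1:1 + len(FIELDS)]]
            cpu = dict(zip(FIELDS, values))
        elif parts[0] == "ctxt":
            ctxt = int(parts[1])
    return cpu, ctxt


def cpu_breakdown(before, after):
    """Share of elapsed ticks per category between two samples, in percent."""
    deltas = {key: after[key] - before.get(key, 0) for key in after}
    total = sum(deltas.values())
    if total <= 0:
        return {key: 0.0 for key in deltas}
    return {key: 100.0 * value / total for key, value in deltas.items()}


def read_sample(path="/proc/stat"):
    """Read and parse one live sample (Linux only)."""
    with open(path) as handle:
        return parse_proc_stat(handle.read())
```

Taking two `read_sample()` results a few seconds apart and passing the tick dictionaries to `cpu_breakdown` gives the `system` and `iowait` shares for that window. A large `system` share next to a near-zero `iowait` share would match the kernel-side explanation above.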


Do you have any events in the kernel/system log or on the console at
the time the problem starts?


Regards

Andreas





Re: Strange load issue with 2.4.17

2014-10-13 Thread Geoff Winkless
Apologies if I'm misreading, but that bug suggests many processes are
created over a period of time. In contrast your grab shows the number of
processes hasn't grown but the load has grown exponentially.

I'd say it's not the same bug.

The grab shows system CPU staying around the same, contrary to your
description - which of them is correct? If load has increased while the CPU
has dropped, I'd say you're still waiting on IO.
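
On Linux the load average counts tasks that are runnable (state R) as well as tasks in uninterruptible sleep (state D, typically waiting on I/O), so tallying process states during a spike can help tell the two cases apart. The following is a rough sketch, assuming the usual /proc/<pid>/stat layout; the function names are illustrative only.

```python
import os
from collections import Counter


def state_from_stat_line(line):
    """Extract the one-letter state from a /proc/<pid>/stat line."""
    # The command name sits in parentheses and may itself contain spaces
    # or ')', so split on the last ')' and take the first field after it.
    return line.rsplit(")", 1)[1].split()[0]


def count_states(stat_lines):
    """Tally process states, skipping lines without a command field."""
    return Counter(state_from_stat_line(line)
                   for line in stat_lines if ")" in line)


def read_all_stat_lines(proc="/proc"):
    """Collect the stat line of every process currently visible (Linux only)."""
    lines = []
    for name in os.listdir(proc):
        if name.isdigit():
            try:
                with open(os.path.join(proc, name, "stat")) as handle:
                    lines.append(handle.read())
            except OSError:
                pass  # the process exited between listdir() and open()
    return lines
```

Running `count_states(read_all_stat_lines())` while the load is high shows whether the count is dominated by `D` (pointing towards I/O waits) or by `R` (pointing towards CPU or scheduler contention).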

On 13 October 2014 15:35, Sebastian Hagedorn haged...@uni-koeln.de wrote:

 Hi,

 for the last week we have seen strange load issues on our Cyrus server.
 All of a sudden the load increases to several thousands, user CPU goes down
 to basically zero, system CPU spikes. In the past we've had trouble with
 poor I/O performance, but that went along with an increase in Wait I/O. We
 don't see that now. vmstat shows a massive increase in context switches.
 When the system reaches this state, all we can do is restart Cyrus or
 reboot the machine if that doesn't work anymore.

 I'm attaching a Ganglia screenshot that shows the problem clearly. When
 the problem exists, there's not much we can do to analyze it. A colleague
 suggested that what we see could be related to this bug:

 https://bugzilla.cyrusimap.org/show_bug.cgi?id=3744

 It was reported for 2.4.16, and it sounds as if it has been fixed, but is
 that fix really part of 2.4.17? Any other ideas?

 Thanks
 Sebastian
 --
.:.Sebastian Hagedorn - Weyertal 121 (Gebäude 133), Zimmer 2.02.:.
 .:.Regionales Rechenzentrum (RRZK).:.
   .:.Universität zu Köln / Cologne University - ✆ +49-221-470-89578.:.
 

Re: Strange load issue with 2.4.17

2014-10-13 Thread Sebastian Hagedorn

Thanks for your reply. I agree that it doesn't really look like bug 3744.

System CPU *does* increase when the load starts to spike. It only goes to 
about 20 percent, but it's still a notable increase. Most of the system 
*appears* to be idle, but interactively you have to wait for each character 
you type to appear on the screen. My first assumption when dealing with a 
mail server is that all issues are I/O-related, but usually we would see an 
increase in Wait I/O when something was up ...


Cheers
Sebastian

--On 13. Oktober 2014 15:48:03 +0100 Geoff Winkless cy...@geoff.dj wrote:


Apologies if I'm misreading, but that bug suggests many processes are
created over a period of time. In contrast your grab shows the number of
processes hasn't grown but the load has grown exponentially.

I'd say it's not the same bug.

The grab shows system CPU staying around the same, contrary to your
description - which of them is correct? If load has increased while the
CPU has dropped, I'd say you're still waiting on IO.


--
   .:.Sebastian Hagedorn - Weyertal 121 (Gebäude 133), Zimmer 2.02.:.
.:.Regionales Rechenzentrum (RRZK).:.
  .:.Universität zu Köln / Cologne University - ✆ +49-221-470-89578.:.


Re: Strange load issue with 2.4.17

2014-10-13 Thread Simon Matter
 Hi,

 for the last week we have seen strange load issues on our Cyrus server.
 All of a sudden the load increases to several thousands, user CPU goes down
 to basically zero, system CPU spikes. In the past we've had trouble with
 poor I/O performance, but that went along with an increase in Wait I/O. We
 don't see that now. vmstat shows a massive increase in context switches.
 When the system reaches this state, all we can do is restart Cyrus or
 reboot the machine if that doesn't work anymore.

 I'm attaching a Ganglia screenshot that shows the problem clearly. When
 the problem exists, there's not much we can do to analyze it. A colleague
 suggested that what we see could be related to this bug:

 https://bugzilla.cyrusimap.org/show_bug.cgi?id=3744

 It was reported for 2.4.16, and it sounds as if it has been fixed, but is
 that fix really part of 2.4.17? Any other ideas?

Is this a physical host or running virtualized?

Simon




Re: Strange load issue with 2.4.17

2014-10-13 Thread Sebastian Hagedorn
--On 13. Oktober 2014 17:35:25 +0200 Simon Matter simon.mat...@invoca.ch 
wrote:



Hi,

for the last week we have seen strange load issues on our Cyrus server.
All of a sudden the load increases to several thousands, user CPU goes down
to basically zero, system CPU spikes. In the past we've had trouble with
poor I/O performance, but that went along with an increase in Wait I/O. We
don't see that now. vmstat shows a massive increase in context switches.
When the system reaches this state, all we can do is restart Cyrus or
reboot the machine if that doesn't work anymore.

I'm attaching a Ganglia screenshot that shows the problem clearly. When
the problem exists, there's not much we can do to analyze it. A colleague
suggested that what we see could be related to this bug:

https://bugzilla.cyrusimap.org/show_bug.cgi?id=3744

It was reported for 2.4.16, and it sounds as if it has been fixed, but is
that fix really part of 2.4.17? Any other ideas?


Is this a physical host or running virtualized?


It's virtualized, but it's been that way for more than a year.

--
   .:.Sebastian Hagedorn - Weyertal 121 (Gebäude 133), Zimmer 2.02.:.
.:.Regionales Rechenzentrum (RRZK).:.
  .:.Universität zu Köln / Cologne University - ✆ +49-221-470-89578.:.


Re: Strange load issue with 2.4.17

2014-10-13 Thread Simon Matter
 --On 13. Oktober 2014 17:35:25 +0200 Simon Matter simon.mat...@invoca.ch
 wrote:

 Hi,

 for the last week we have seen strange load issues on our Cyrus server.
 All of a sudden the load increases to several thousands, user CPU goes down
 to basically zero, system CPU spikes. In the past we've had trouble with
 poor I/O performance, but that went along with an increase in Wait I/O. We
 don't see that now. vmstat shows a massive increase in context switches.
 When the system reaches this state, all we can do is restart Cyrus or
 reboot the machine if that doesn't work anymore.

 I'm attaching a Ganglia screenshot that shows the problem clearly. When
 the problem exists, there's not much we can do to analyze it. A colleague
 suggested that what we see could be related to this bug:

 https://bugzilla.cyrusimap.org/show_bug.cgi?id=3744

 It was reported for 2.4.16, and it sounds as if it has been fixed, but is
 that fix really part of 2.4.17? Any other ideas?

 Is this a physical host or running virtualized?

 It's virtualized, but it's been that way for more than a year.

Is this by any chance running on KVM, maybe on an AMD cpu?




Re: Strange load issue with 2.4.17

2014-10-13 Thread Sebastian Hagedorn
--On 13. Oktober 2014 17:39:23 +0200 Simon Matter simon.mat...@invoca.ch 
wrote:



--On 13. Oktober 2014 17:35:25 +0200 Simon Matter
simon.mat...@invoca.ch wrote:


Hi,

for the last week we have seen strange load issues on our Cyrus server.
All of a sudden the load increases to several thousands, user CPU goes down
to basically zero, system CPU spikes. In the past we've had trouble with
poor I/O performance, but that went along with an increase in Wait I/O. We
don't see that now. vmstat shows a massive increase in context switches.
When the system reaches this state, all we can do is restart Cyrus or
reboot the machine if that doesn't work anymore.

I'm attaching a Ganglia screenshot that shows the problem clearly. When
the problem exists, there's not much we can do to analyze it. A colleague
suggested that what we see could be related to this bug:

https://bugzilla.cyrusimap.org/show_bug.cgi?id=3744

It was reported for 2.4.16, and it sounds as if it has been fixed, but is
that fix really part of 2.4.17? Any other ideas?


Is this a physical host or running virtualized?


It's virtualized, but it's been that way for more than a year.


Is this by any chance running on KVM, maybe on an AMD cpu?


No, it's VMware ESX on Intel CPUs.
--
   .:.Sebastian Hagedorn - Weyertal 121 (Gebäude 133), Zimmer 2.02.:.
.:.Regionales Rechenzentrum (RRZK).:.
  .:.Universität zu Köln / Cologne University - ✆ +49-221-470-89578.:.
