Thanks for the data.

The /proc/stat format is this: 

    cpu <user> <nice> <system> <idle> <wait> <irq> <softirq>

The values are cumulative counters of cpu time (in jiffies, typically 1/100 s), 
so if we subtract each sample from the following one in your output, we get this:

            user  nice  system  idle  wait  irq  softirq  |  total
09:57:35       1     0       1    99     0    0        0  |    101
09:57:36       1     0       0    98     0    0        0  |     99
09:57:37      25     0      16    59     1    0        0  |    101
09:57:38       1     0       2    98     0    0        0  |    101

=> at 09:57:37 the cpu usage was:

user   = 24.75%
system = 15.84%
wait   =  0.99%

This corresponds to the previous vmstat output. Monit computes the cpu usage the 
same way as above and doesn't modify these values => your monit really does report 
a wrong cpu usage (reported 50% system vs. real ~16%).
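
For reference, the subtraction above can be reproduced with a short script. This is only a minimal sketch of the arithmetic, not monit's actual code; the function names are made up for illustration, and it assumes the seven-field order given above (any further fields on the "cpu" line are ignored).

```python
# Field order of the aggregate "cpu" line, as listed above (assumed).
FIELDS = ("user", "nice", "system", "idle", "wait", "irq", "softirq")


def parse_cpu_line(line):
    """Return the first seven counters of a /proc/stat "cpu" line as a dict."""
    parts = line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    return dict(zip(FIELDS, (int(v) for v in parts[1:1 + len(FIELDS)])))


def cpu_usage_percent(prev_line, curr_line):
    """Share of the interval spent in each state between two samples, in %."""
    prev = parse_cpu_line(prev_line)
    curr = parse_cpu_line(curr_line)
    # The counters are cumulative, so only the difference describes the interval.
    delta = {name: curr[name] - prev[name] for name in FIELDS}
    total = sum(delta.values())
    if total <= 0:
        raise ValueError("samples are identical or out of order")
    return {name: 100.0 * delta[name] / total for name in FIELDS}


# The 09:57:36 and 09:57:37 samples from the output you sent.
before = "cpu  207062 501 103543 49452451 25303 83 1569 0 0"
after = "cpu  207087 501 103559 49452510 25304 83 1569 0 0"

usage = cpu_usage_percent(before, after)
for name in ("user", "system", "wait"):
    print(f"{name:<6} = {usage[name]:5.2f}%")
```

Feeding it any two consecutive "cpu" lines gives the per-interval percentages, which can be compared directly with the value in the monit alert.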

What's the origin of your monit binary? Did you compile it from the original source 
code or from a 3rd party source distribution (such as the RHEL or Fedora 
repository)? Or do you use the pre-compiled binaries from www.mmonit.com, or 
some 3rd party binary, patches or source code from another site?

Could you please run monit in verbose mode and provide the full output?

   1.) stop monit
   2.) run monit in foreground with verbose mode enabled:
       ./monit -vI
   3.) after the problem happens, stop monit with "^C" and send output

I can also prepare a debug version which will dump the cpu usage related 
information, or, if you can provide remote access to the system, I can 
troubleshoot the problem remotely.


Regards,
Martin



On Dec 7, 2011, at 11:07 AM, Lawrence, Wayne wrote:

> Hi Martin,
>  
> this is the output of the commands you requested.
>  
> 1.) uname -m
>  
> x86_64
>  
> 2.) file `which monit`
>  
>  ELF 64-bit LSB executable, x86-64, version 1 (GNU/Linux), dynamically linked (uses shared libs), for GNU/Linux 2.6.18, not stripped
>  
> I ran the command you supplied to get the cpu usage directly as well, while 
> restarting the httpd service, as I know this will generate an alert.
>  
>  
>       Date:        Wed, 07 Dec 2011 09:57:37
>       Action:      exec
>       Host:        <hostname removed>
>       Description: cpu system usage of 50.0% matches resource limit [cpu system usage>30.0%]
> 
> Wed Dec  7 09:57:34 GMT 2011
> cpu  207060 501 103542 49452254 25303 83 1569 0 0
> Wed Dec  7 09:57:35 GMT 2011
> cpu  207061 501 103543 49452353 25303 83 1569 0 0
> Wed Dec  7 09:57:36 GMT 2011
> cpu  207062 501 103543 49452451 25303 83 1569 0 0
> Wed Dec  7 09:57:37 GMT 2011
> cpu  207087 501 103559 49452510 25304 83 1569 0 0
> Wed Dec  7 09:57:38 GMT 2011
> cpu  207088 501 103561 49452608 25304 83 1569 0 0
> Wed Dec  7 09:57:40 GMT 2011
> If my understanding of /proc/stat is correct, this still doesn't make any sense, 
> but I may be wrong.
>  
> Regards
>  
> Wayne
>  
> 
>  
> On 7 December 2011 09:37, Martin Pala <[email protected]> wrote:
> Please can you check that your monit binary matches the system architecture? 
> (i.e. for example 64-bit monit binary on 64-bit system - not 32-bit monit on 
> 64-bit system) 
> 
> To verify, please provide the output of the following commands:
> 1.) uname -m
> 2.) file `which monit`
> 
> Monit takes the statistics from the /proc/stat kernel interface. You can 
> collect the statistics manually like this - for example to fetch the state in 
> 1 second intervals (30 samples):
> 
> $ for ((i=0; i<30; i++)); do date; grep "cpu " /proc/stat; sleep 1; done
> 
> Note: monit takes the first /proc/stat line ("cpu"), which contains the 
> overall cpu usage in the system (summary of all cpus). /proc/stat also 
> contains per-cpu statistics. If you want to collect all the statistics, 
> simply replace the "grep 'cpu '" with "cat".
> 
> Regards,
> Martin
> 
> 
> On Dec 7, 2011, at 10:04 AM, Lawrence, Wayne wrote:
> 
>> Hi Martin,
>>  
>> I have tried various methods to identify the cause of this and took your 
>> advice and used vmstat. I simply restarted the httpd process from the monit 
>> web interface while the command was running and got the following warning.
>>  
>>       Description: cpu system usage of 50.0% matches resource limit [cpu system usage>30.0%]
>>  
>> But vmstat doesn't show that level of usage at the point of the alert. As you can 
>> see, there is some usage in the 3rd line of the output, when I restarted the 
>> httpd service, but it doesn't seem enough to trigger an alert.
>>  
>> vmstat 1 10
>> procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
>>  r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
>>  0  0      0 859596 114684 856908    0    0     4     6   81   77  0  0 99  0  0
>>  0  0      0 859448 114684 856916    0    0     0     0  100   94  1  0 99  0  0
>>  0  0      0 898352 114692 815600    0    0     0   168  555  605 23 15 61  1  0
>>  
>> Not sure if there are any other tests i can run to narrow this down a bit 
>> further as it still isn't making sense.
>>  
>> Regards
>>  
>> Wayne
>>  
>>  
>> 
>> 
>>  
>> On 7 December 2011 08:27, Martin Pala <[email protected]> wrote:
>> Hi Lawrence,
>> 
>> the test which triggers the alert is "system" cpu => it's the time the 
>> system spends in kernel mode. The cpu usage could be caused by some 
>> background kernel task. To verify that the monit report matches the system cpu 
>> usage, you should use either "vmstat" or "top" instead of "ps".
>> 
>> Best regards,
>> Martin 
>> 
>> 
>> On Dec 6, 2011, at 1:19 PM, Lawrence, Wayne wrote:
>> 
>>> Hi Igor,
>>>  
>>> the operating system is RHEL6 and monit version is 5.3.1
>>>  
>>> this is what i have in my config
>>>  
>>>     if cpu usage (user) > 70% then alert
>>>     if cpu usage (system) > 30% then alert
>>>     if cpu usage (wait) > 20% then alert
>>> 
>>> this is one of the errors
>>> Description: cpu system usage of 50.0% matches resource limit [cpu system usage>30.0%]
>>>  
>>> this is what i get in /var/log/messages
>>> Dec  6 12:01:29 <hostname-removed> monit[864]: <hostname-removed> cpu system usage of 50.0% matches resource limit [cpu system usage>30.0%]
>>> Dec  6 12:02:29 <hostname-removed> monit[864]: <hostname-removed><hostname-removed>' cpu system usage check succeeded [current cpu system usage=0.9%]
>>>  
>>> this is the output of ps --no-headers -A -o "%cpu sz ucomm" | sort -k1nr | 
>>> head -20
>>>  
>>>  12:01:29 up 4 days, 20:24,  2 users,  load average: 0.04, 0.01, 0.00
>>>              total       used       free     shared    buffers     cached
>>> Mem:       2055108    1092176     962932          0      53156     811864
>>> -/+ buffers/cache:     227156    1827952
>>> Swap:      4128760          0    4128760
>>>  1.2 44308 perl
>>>  0.0     0 aio/0
>>>  0.0     0 async/mgr
>>>  0.0     0 ata/0
>>>  0.0     0 ata_aux
>>>  0.0     0 bdi-default
>>>  0.0     0 cpuset
>>>  0.0     0 crypto/0
>>>  0.0     0 events/0
>>>  0.0     0 ext4-dio-unwrit
>>>  0.0     0 flush-253:0
>>>  0.0     0 jbd2/dm-0-8
>>>  0.0     0 kacpi_hotplug
>>>  0.0     0 kacpi_notify
>>>  0.0     0 kacpid
>>>  0.0     0 kauditd
>>>  0.0     0 kblockd/0
>>>  0.0     0 kdmflush
>>>  0.0     0 khelper
>>>  0.0     0 khubd
>>> 
>>> Have to say i am at a total loss as there is no way the usage figures are 
>>> accurate.
>>> If there is any other info i can supply that will be useful please let me 
>>> know.
>>>  
>>> Regards
>>>  
>>> Wayne
>>> 
>>> 
>>> On 6 December 2011 12:03, Igor Homyakov <[email protected]> 
>>> wrote:
>>> Hi Lawrence,
>>> 
>>> Could you be a little bit more specific? Please provide information
>>> about your operating system, the monit version on which the problem
>>> occurred and so on.
>>> 
>>> Regards
>>> Igor Homyakov
>>> 
>>> On Tue, Dec 6, 2011 at 15:35, Lawrence, Wayne
>>> <[email protected]> wrote:
>>> > Hi,
>>> >
>>> > I have a few CPU usage checks in my monitrc but it seems monit is
>>> > misreporting the usage.
>>> >
>>> > I have run several tests and it seems that monit is multiplying the actual
>>> > usage by 10.
>>> >
>>> > I ran a process with top running in another shell and CPU usage for the user
>>> > was never above 10% yet monit informed me that there was 100% cpu usage.
>>> >
>>> > I have tried various configurations including the one that came with the
>>> > default config for system cpu monitoring and all seem to demonstrate the
>>> > same issue.
>>> >
>>> > Any advice welcomed on this
>>> >
>>> > Regards
>>> >
>>> > Wayne Lawrence

