Re: [Nagios-users] how to fix excessive latency

2010-06-29 Thread Andreas Ericsson
On 06/29/2010 03:57 AM, wwanghongrui wrote:
 Thanks for your reply. We are writing to a MySQL database via NDOUtils. We
 don't use NSCA. We have not set external_command_buffer_slots.
 status_update_interval=15
 
 I use vmstate to capture system performance, as shown below. Maybe the
 bottleneck is not the system.
 

Endeavour not to run Nagios on a virtual server. If you must use a virtual
server, make very sure that your checkresult spool directory and status data
files are on a ramdisk, or you will certainly run into trouble.
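
A minimal sketch of how one might verify the ramdisk advice above, assuming a
Linux host where the mount table is readable in /proc/mounts format. The
helper names, the sample mount table and the checkresult spool path are
assumptions made for illustration; only the status file path appears in this
thread.

```python
import os


def mount_for(path, mounts_text):
    """Return (mount_point, fs_type) for the longest mount point containing path."""
    path = os.path.abspath(path)
    best = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mount_point, fs_type = fields[1], fields[2]
        prefix = mount_point.rstrip("/") + "/"
        if path == mount_point or (path + "/").startswith(prefix):
            # Prefer the most specific (longest) matching mount point.
            if best is None or len(mount_point) > len(best[0]):
                best = (mount_point, fs_type)
    return best


def on_ramdisk(path, mounts_text):
    """True if path lives on a memory-backed filesystem (tmpfs or ramfs)."""
    found = mount_for(path, mounts_text)
    return found is not None and found[1] in ("tmpfs", "ramfs")


# Hypothetical mount table; on a real host read it with open("/proc/mounts").read()
SAMPLE_MOUNTS = """\
/dev/sda1 / ext3 rw 0 0
tmpfs /dev/shm tmpfs rw 0 0
tmpfs /usr/local/nagios/var/spool tmpfs rw,size=256m 0 0
"""

if __name__ == "__main__":
    for p in ("/usr/local/nagios/var/spool/checkresults",  # assumed spool path
              "/usr/local/nagios/var/status.log"):         # status_file from this thread
        print(p, "ramdisk" if on_ramdisk(p, SAMPLE_MOUNTS) else "disk")
```

The sketch ignores the octal escapes that /proc/mounts uses for spaces in
mount points, which is fine for typical Nagios paths.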

-- 
Andreas Ericsson   andreas.erics...@op5.se
OP5 AB www.op5.se
Tel: +46 8-230225  Fax: +46 8-230231

Considering the successes of the wars on alcohol, poverty, drugs and
terror, I think we should give some serious thought to declaring war
on peace.

--
This SF.net email is sponsored by Sprint
What will you do first with EVO, the first 4G phone?
Visit sprint.com/first -- http://p.sf.net/sfu/sprint-com-first
___
Nagios-users mailing list
Nagios-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting 
any issue. 
::: Messages without supporting info will risk being sent to /dev/null


Re: [Nagios-users] how to fix excessive latency

2010-06-29 Thread Giorgio Zarrelli
I agree, it is better not to run Nagios on a virtual machine. The I/O layer of
VMs performs poorly.

Ciao,

Giorgio



Re: [Nagios-users] how to fix excessive latency

2010-06-29 Thread Max
Clock skew can be an issue as well depending on the virtualization platform.



Re: [Nagios-users] how to fix excessive latency

2010-06-29 Thread wwanghongrui

I am sorry for my bad English. My Nagios server is not running on a virtual
server. It is Nagios 3.2.0 + SUSE 10 SP2 x86_64 + 8 GB RAM + 4 x Xeon(R) CPU
E7420 @ 2.13GHz, and I think this hardware is sufficient.
What I called "vmstate" is vmstat, a command on SUSE 10 that I used to
capture system performance. It is not a virtual server.

My configuration is below. I don't know which parameters I should tune.
Could you give me some suggestions? Thanks.

cfg_file=/usr/local/nagios/etc/hosts.cfg
cfg_file=/usr/local/nagios/etc/services.cfg
cfg_file=/usr/local/nagios/etc/misccommands.cfg
cfg_file=/usr/local/nagios/etc/checkcommands.cfg
cfg_file=/usr/local/nagios/etc/contactgroups.cfg
cfg_file=/usr/local/nagios/etc/contacts.cfg
cfg_file=/usr/local/nagios/etc/hostgroups.cfg
cfg_file=/usr/local/nagios/etc/servicegroups.cfg
cfg_file=/usr/local/nagios/etc/timeperiods.cfg
cfg_file=/usr/local/nagios/etc/escalations.cfg
cfg_file=/usr/local/nagios/etc/dependencies.cfg
cfg_file=/usr/local/nagios/etc/hostextinfo.cfg
cfg_file=/usr/local/nagios/etc/serviceextinfo.cfg
cfg_file=/usr/local/nagios/etc/meta_commands.cfg
cfg_file=/usr/local/nagios/etc/meta_contactgroup.cfg
cfg_file=/usr/local/nagios/etc/meta_contact.cfg
cfg_file=/usr/local/nagios/etc/meta_dependencies.cfg
cfg_file=/usr/local/nagios/etc/meta_escalations.cfg
cfg_file=/usr/local/nagios/etc/meta_hostgroup.cfg
cfg_file=/usr/local/nagios/etc/meta_host.cfg
cfg_file=/usr/local/nagios/etc/meta_services.cfg
cfg_file=/usr/local/nagios/etc/meta_timeperiod.cfg
resource_file=/usr/local/nagios/etc//resource.cfg
log_file=/usr/local/nagios/var/nagios.log
temp_file=/usr/local/nagios/var/nagios.tmp
status_file=/usr/local/nagios/var/status.log
p1_file=/usr/local/nagios/bin/p1.pl
status_update_interval=15
nagios_user=nagios
nagios_group=nagios
enable_notifications=1
execute_service_checks=1
accept_passive_service_checks=1
execute_host_checks=1
accept_passive_host_checks=1
enable_event_handlers=1
log_rotation_method=d
log_archive_path=/usr/local/nagios/var/archives/
check_external_commands=1
command_check_interval=1s
command_file=/usr/local/nagios/var/rw/nagios.cmd
lock_file=/usr/local/nagios/var/nagios.lock
retain_state_information=1
retention_update_interval=60
use_retained_program_state=1
use_retained_scheduling_info=1
use_syslog=1
log_notifications=1
log_service_retries=1
log_host_retries=1
log_event_handlers=1
log_initial_states=1
log_external_commands=1
sleep_time=1
service_inter_check_delay_method=s
service_interleave_factor=s
max_concurrent_checks=2000
service_reaper_frequency=5
interval_length=60
use_agressive_host_checking=1
enable_flap_detection=0
low_service_flap_threshold=25.0
high_service_flap_threshold=50.0
low_host_flap_threshold=25.0
high_host_flap_threshold=50.0
service_check_timeout=60
host_check_timeout=10
event_handler_timeout=30
notification_timeout=30
ocsp_timeout=5
ochp_timeout=5
perfdata_timeout=5
process_performance_data=1
host_perfdata_command=107
service_perfdata_command=process-service-perfdata
host_perfdata_file=/usr/local/pnp4nagios/var/host-perfdata
service_perfdata_file=/usr/local/pnp4nagios/var/service-perfdata
host_perfdata_file_template=DATATYPE::HOSTPERFDATA\tTIMET::$TIMET$\tHOSTNAME::$HOSTNAME$\tHOSTPERFDATA::$HOSTPERFDATA$\tHOSTCHECKCOMMAND::$HOSTCHECKCOMMAND$\tHOSTSTATE::$HOSTSTATE$\tHOSTSTATETYPE::$HOSTSTATETYPE$
service_perfdata_file_template=DATATYPE::SERVICEPERFDATA\tTIMET::$TIMET$\tHOSTNAME::$HOSTNAME$\tSERVICEDESC::$SERVICEDESC$\tSERVICEPERFDATA::$SERVICEPERFDATA$\tSERVICECHECKCOMMAND::$SERVICECHECKCOMMAND$\tHOSTSTATE::$HOSTSTATE$\tHOSTSTATETYPE::$HOSTSTATETYPE$\tSERVICESTATE::$SERVICESTATE$\tSERVICESTATETYPE::$SERVICESTATETYPE$
host_perfdata_file_mode=a
service_perfdata_file_mode=a
host_perfdata_file_processing_interval=30
service_perfdata_file_processing_interval=30
host_perfdata_file_processing_command=process-host-perfdata-file
service_perfdata_file_processing_command=process-service-perfdata-file
check_service_freshness=1
date_format=euro
illegal_object_name_chars=~!$%^*|'?,()=
illegal_macro_output_chars=`~$^|'
admin_email=admin
admin_pager=ad...@localhost
broker_module=/usr/local/nagios/bin/ndomod-3x.o 
config_file=/usr/local/nagios/etc/ndomod.cfg
event_broker_options=-1
use_large_installation_tweaks=1
child_processes_fork_twice=0
enable_environment_macros=0
debug_file=/usr/local/centreon/log/Debug-Graphs.log
debug_level=-1
max_debug_file_size=6
check_result_reaper_frequency=10
max_check_result_reaper_time=20
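
With a configuration this long, it can help to pull out only the settings
that other posts in this thread ask about. The following is a minimal sketch,
assuming the usual key=value layout of nagios.cfg with "#" comment lines. The
function names and the WATCHED list are illustrative choices, not part of
Nagios.

```python
from collections import defaultdict

# Settings that are asked about or tuned elsewhere in this thread.
WATCHED = (
    "status_update_interval",
    "external_command_buffer_slots",
    "check_result_reaper_frequency",
    "max_check_result_reaper_time",
    "use_large_installation_tweaks",
    "debug_level",
)


def parse_nagios_cfg(text):
    """Parse key=value lines; repeated keys such as cfg_file collect into lists."""
    settings = defaultdict(list)
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()].append(value.strip())
    return dict(settings)


def report(settings, keys=WATCHED):
    """Return one 'key = value' line per watched key, flagging unset ones."""
    lines = []
    for key in keys:
        values = settings.get(key)
        lines.append("%s = %s" % (key, values[-1] if values else "(not set)"))
    return lines


# Excerpt of the configuration posted above.
SAMPLE = """\
# excerpt only
cfg_file=/usr/local/nagios/etc/hosts.cfg
cfg_file=/usr/local/nagios/etc/services.cfg
status_update_interval=15
use_large_installation_tweaks=1
debug_level=-1
check_result_reaper_frequency=10
max_check_result_reaper_time=20
"""

if __name__ == "__main__":
    for line in report(parse_nagios_cfg(SAMPLE)):
        print(line)
```

To run it against the real file, pass open("/usr/local/nagios/etc/nagios.cfg").read()
instead of SAMPLE; that path is an assumption based on the cfg_file entries above.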



Regards

HongRui Wang
Mail:wwanghong...@cebbank.com
2010-06-30





From: Andreas Ericsson
Sent: 2010-06-29 20:24:12
To: wwanghongrui; Nagios Users List
Cc: shadih rahman
Subject: Re: [Nagios-users] how to fix excessive latency

On 06/29/2010 03:57 AM, wwanghongrui wrote:
 Thanks for your reply. We are writing to a MySQL database via NDOUtils. We
 don't use NSCA. We have not set external_command_buffer_slots.
 status_update_interval=15
 
 I use vmstate to capture system performance

Re: [Nagios-users] how to fix excessive latency

2010-06-28 Thread shadih rahman
Something is definitely not right here.  We have about 1 checks
and the performance is a lot better.  Anyhow, we are using the following values:
check_result_reaper_frequency=10
max_check_result_reaper_time=20


You should enable debug mode and check the debug logs.  Are you writing to
any backend database?  Are you using NSCA to transfer service information to
a remote location?  What is the value of your status_update_interval?  What is
your external_command_buffer_slots?



2010/6/28 wwanghongrui wwanghong...@cebbank.com

 Hi, guys,

 Our Nagios server environment: Nagios 3.2.0 + SUSE 10 SP2 x86_64 + 8 GB RAM +
 4 x Xeon(R) CPU E7420 @ 2.13GHz.
 We have 500+ actively checked hosts and 3k+ actively checked services.  I have
 adjusted some performance parameters in nagios.cfg, as shown below:
 use_large_installation_tweaks=1
 child_processes_fork_twice=0
 enable_environment_macros=0
 check_result_reaper_frequency=5
 max_check_result_reaper_time=30

 But Nagios performance is still bad, as shown below:

 Services Actively Checked:
 Time Frame            Services Checked
 <= 1 minute:          271 (9.4%)
 <= 5 minutes:         1749 (60.4%)
 <= 15 minutes:        2824 (97.4%)
 <= 1 hour:            2898 (100.0%)
 Since program start:  2869 (99.0%)

 Metric                 Min.      Max.        Average
 Check Execution Time:  0.09 sec  32.23 sec   1.113 sec
 Check Latency:         1.12 sec  212.59 sec  116.329 sec
 Percent State Change:  0.00%     23.88%      0.05%

 Hosts Actively Checked:
 Time Frame            Hosts Checked
 <= 1 minute:          32 (5.5%)
 <= 5 minutes:         419 (71.5%)
 <= 15 minutes:        586 (100.0%)
 <= 1 hour:            586 (100.0%)
 Since program start:  586 (100.0%)

 Metric                 Min.      Max.        Average
 Check Execution Time:  0.08 sec  4.29 sec    3.035 sec
 Check Latency:         0.00 sec  135.25 sec  116.420 sec
 Percent State Change:  0.00%     11.32%      0.09%

 How can I find which service checks or host checks are causing this serious
 check latency?


 Regards

 HongRui Wang
 mail: wwanghong...@cebbank.com
 2010-06-28
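
One way to approach the question in the quoted post is to rank checks by the
per-check figures Nagios writes to its status file. The following is a minimal
sketch, assuming the Nagios 3 status file layout of "hoststatus {" and
"servicestatus {" blocks containing key=value lines such as check_latency and
check_execution_time. The host names and the pairing of values in the sample
are hypothetical.

```python
def parse_status(text):
    """Split a Nagios status file into a list of dicts, one per '{ ... }' block."""
    blocks = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if current is None:
            if line.endswith("{"):
                current = {"_type": line[:-1].strip()}
        elif line == "}":
            blocks.append(current)
            current = None
        elif "=" in line:
            key, value = line.split("=", 1)
            current[key] = value
    return blocks


def slowest(blocks, field="check_latency", top=5):
    """Return (host, service, value) tuples for the largest values of field."""
    checks = [b for b in blocks
              if b["_type"] in ("hoststatus", "servicestatus") and field in b]
    checks.sort(key=lambda b: float(b[field]), reverse=True)
    return [(b.get("host_name", "?"),
             b.get("service_description", "(host check)"),
             float(b[field]))
            for b in checks[:top]]


# Hypothetical excerpt; the real file is whatever status_file points to.
SAMPLE_STATUS = """\
hoststatus {
    host_name=host-a
    check_execution_time=3.035
    check_latency=116.420
    }

servicestatus {
    host_name=host-a
    service_description=PING
    check_execution_time=0.090
    check_latency=1.120
    }

servicestatus {
    host_name=host-b
    service_description=HTTP
    check_execution_time=32.230
    check_latency=212.590
    }
"""

if __name__ == "__main__":
    for host, service, value in slowest(parse_status(SAMPLE_STATUS), top=2):
        print(host, service, value)
```

Passing field="check_execution_time" ranks the checks that take longest to
run instead. That may be the more useful view if the latency turns out to be
spread evenly across all checks.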








-- 
Cordially,
Shadhin Rahman

Re: [Nagios-users] how to fix excessive latency

2010-06-28 Thread wwanghongrui
 r  b  swpd    free    buff    cache  si  so    bi    bo    in    cs  us  sy  id  wa  st
...                                                    0   853  2947   7  22  70   0   0
 4  0   160  187860  289248  6040148   0   0     0     0   564  2748  12  25  64   0   0
 4  0   160  202880  289248  6040148   0   0     0     0   432  2336   5  22  73   0   0
 5  0   160  189956  289248  6040148   0   0     0   416   824  2762   7  24  69   1   0
 2  0   160  195912  289248  6041176   0   0    52  1224   789  2332   5  15  78   2   0
 1  0   160  205060  289248  6041176   0   0     0     8   343  1718   2   8  90   0   0
 1  0   160  205076  289248  6041176   0   0     0     0   320  1177   0   6  93   0   0
 1  0   160  213844  289248  6041176   0   0     0     0   315  1100   0   7  92   0   0
 1  0   160  226900  289248  6041176   0   0     0     0   305  1210   0   8  92   0   0
 2  0   160  227188  289248  6041176   0   0     0   956   556   901   0   4  92   3   0
 1  0   160  228924  289248  6041176   0   0     0     0   294  1034   1   6  93   0   0
 1  0   160  229740  289248  6041176   0   0     0     0   292  1235   1   6  93   0   0
 1  0   160  230228  289248  6041176   0   0     0     0   287  1696   1   6  93   0   0
 3  1   160  230456  289248  6041176   0   0     0   128   288  1307   1   6  93   0   0
 1  1   160  228756  289248  6042204   0   0  3052  4944   921  1673   5   7  84   4   0
 1  1   160  229004  289248  6042204   0   0     0  1676  1061  1122   1   6  87   6   0
 1  1   160  229004  289248  6042204   0   0     0  1672  1081  1093   0   6  87   6   0
 1  1   160  230788  289248  6042204   0   0     0  1856  1171  1198   1   6  87   6   0
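
To judge whether the system itself is the bottleneck, the vmstat samples can
be averaged per column. The following is a minimal sketch, assuming the
standard 17-column vmstat layout (r b swpd free buff cache si so bi bo in cs
us sy id wa st). Several columns are fused together in the archived output,
so the three sample rows below use a reconstructed column split, which is an
assumption.

```python
VMSTAT_COLUMNS = "r b swpd free buff cache si so bi bo in cs us sy id wa st".split()


def parse_vmstat(text):
    """Return one dict per data row; header and malformed lines are skipped."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != len(VMSTAT_COLUMNS):
            continue
        if not all(f.isdigit() for f in fields):
            continue  # header row or garbled line
        rows.append(dict(zip(VMSTAT_COLUMNS, (int(f) for f in fields))))
    return rows


def average(rows, column):
    """Arithmetic mean of one column across all parsed rows."""
    return sum(r[column] for r in rows) / len(rows)


# Three rows from the output above, with the column split reconstructed.
SAMPLE = """\
 r  b swpd   free   buff   cache si so bi   bo   in   cs us sy id wa st
 4  0  160 187860 289248 6040148  0  0  0    0  564 2748 12 25 64  0  0
 1  0  160 205060 289248 6041176  0  0  0    8  343 1718  2  8 90  0  0
 1  1  160 229004 289248 6042204  0  0  0 1676 1061 1122  1  6 87  6  0
"""

if __name__ == "__main__":
    rows = parse_vmstat(SAMPLE)
    for column in ("us", "sy", "id", "wa", "bo"):
        print(column, round(average(rows, column), 1))
```

High average idle (id) together with low I/O wait (wa) would support the
suspicion that the hardware is not the limiting factor.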

Regards

HongRui Wang
Mail:wwanghong...@cebbank.com
2010-06-29



From: shadih rahman
Sent: 2010-06-29 00:57:24
To: wwanghongrui; Nagios Users List
Cc:
Subject: Re: [Nagios-users] how to fix excessive latency

Something is definitely not right here.  We have about 1 checks and
the performance is a lot better.  Anyhow, we are using the following values:

check_result_reaper_frequency=10
max_check_result_reaper_time=20


You should enable debug mode and check the debug logs.  Are you writing to any
backend database?  Are you using NSCA to transfer service information to a
remote location?  What is the value of your status_update_interval?  What is
your external_command_buffer_slots?




