Re: [Dovecot] High Load Average on POP/IMAP.
Hi,

if you run the following command while the server has a high load:

# ps -ostat,pid,time,wchan='WCHAN-',cmd ax | grep D

do you get back something like this?

STAT   PID TIME     WCHAN-           CMD
D    18713 00:00:00 synchronize_srcu dovecot/imap
D    18736 00:00:00 synchronize_srcu dovecot/imap
D    18775 00:00:05 synchronize_srcu dovecot/imap
D    20330 00:00:00 synchronize_srcu dovecot/imap
D    20357 00:00:00 synchronize_srcu dovecot/imap
D    20422 00:00:00 synchronize_srcu dovecot/imap
D    20687 00:00:00 synchronize_srcu dovecot/imap
S+   20913 00:00:00 pipe_wait        grep D

If yes, it could be a problem with inotify in your kernel. You can try to disable inotify in the kernel with:

echo 0 > /proc/sys/fs/inotify/max_user_watches
echo 0 > /proc/sys/fs/inotify/max_user_instances

Full article: http://thread.gmane.org/gmane.linux.kernel/1315430

For me this resolved the problem; the load went down to 1.00.

Regards
Urban

On 21.08.2013 12:37, Kavish Karkera wrote:
> Hi,
>
> We have a serious issue on our POP/IMAP servers these days. The load
> average (as reported by uptime) spikes up to 400-500 at particular
> times of day, mostly around noon and in the evening, but it lasts
> only a few minutes.
>
> We have 2 servers running Dovecot 1.1.20 behind a load balancer; we
> use keepalived (1.1.13) for load balancing.
>
> Server specification:
> Operating System: CentOS 5.5 64bit
> CPU cores: 16
> RAM: 8GB
>
> Mail and indexes are mounted on NFS (NetApp).
>
> Below is the dovecot -n ...
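The check above can be wrapped in a small script. This is a sketch (the helper name is made up here): it filters on the STAT column instead of grepping the whole line, so the grep process itself is not counted among the stuck ones.

```shell
#!/bin/sh
# Sketch: count processes stuck in uninterruptible sleep (state D).
# The filter reads "ps" output on stdin so it can be tested separately;
# the function name is illustrative, not from this thread.
list_dstate() {
    awk '$1 ~ /^D/ { print; n++ }
         END       { print (n + 0), "process(es) in D state" }'
}

# Many dovecot/imap processes waiting on synchronize_srcu would point
# at the inotify problem described above.
ps ax -o stat= -o pid= -o time= -o wchan= -o args= | list_dstate
```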
> (top results during high spike)
>
> # 1.1.20: /usr/local/etc/dovecot.conf
> # OS: Linux 2.6.28 x86_64 CentOS release 5.5 (Final)
> log_path: /var/log/dovecot-info.log
> info_log_path: /var/log/dovecot-info.log
> syslog_facility: local1
> protocols: imap imaps pop3 pop3s
> listen(default): *:143
> listen(imap): *:143
> listen(pop3): *:110
> ssl_listen(default): *:993
> ssl_listen(imap): *:993
> ssl_listen(pop3): *:995
> ssl_cert_file: /usr/local/etc/ssl/certs/dovecot.pem
> ssl_key_file: /usr/local/etc/ssl/private/dovecot.pem
> disable_plaintext_auth: no
> login_dir: /usr/local/var/run/dovecot/login
> login_executable(default): /usr/local/libexec/dovecot/imap-login
> login_executable(imap): /usr/local/libexec/dovecot/imap-login
> login_executable(pop3): /usr/local/libexec/dovecot/pop3-login
> login_greeting: Welcome to Popserver.
> login_process_per_connection: no
> max_mail_processes: 1024
> mail_max_userip_connections(default): 100
> mail_max_userip_connections(imap): 100
> mail_max_userip_connections(pop3): 50
> verbose_proctitle: yes
> first_valid_uid: 99
> first_valid_gid: 99
> mail_location: maildir:~/Maildir:INDEX=/indexes/%h:CONTROL=/indexes/%h
> mmap_disable: yes
> mail_nfs_storage: yes
> mail_nfs_index: yes
> lock_method: dotlock
> mail_executable(default): /usr/local/libexec/dovecot/imap
> mail_executable(imap): /usr/local/libexec/dovecot/imap
> mail_executable(pop3): /usr/local/libexec/dovecot/pop3
> mail_plugins(default): quota imap_quota
> mail_plugins(imap): quota imap_quota
> mail_plugins(pop3): quota
> mail_plugin_dir(default): /usr/local/lib/dovecot/imap
> mail_plugin_dir(imap): /usr/local/lib/dovecot/imap
> mail_plugin_dir(pop3): /usr/local/lib/dovecot/pop3
> pop3_no_flag_updates(default): no
> pop3_no_flag_updates(imap): no
> pop3_no_flag_updates(pop3): yes
> pop3_lock_session(default): no
> pop3_lock_session(imap): no
> pop3_lock_session(pop3): yes
> pop3_client_workarounds(default):
> pop3_client_workarounds(imap):
> pop3_client_workarounds(pop3): outlook-no-nuls
> lda:
>   postmaster_address: ad...@research.com
>   mail_plugins: cmusieve quota mail_log
>   mail_plugin_dir: /usr/local/lib/dovecot/lda
>   auth_socket_path: /var/run/dovecot/auth-master
> auth default:
>   worker_max_count: 15
>   passdb:
>     driver: sql
>     args: /usr/local/etc/dovecot-mysql.conf
>   userdb:
>     driver: sql
>     args: /usr/local/etc/dovecot-mysql.conf
>   userdb:
>     driver: prefetch
>   socket:
>     type: listen
>     client:
>       path: /var/run/dovecot/auth-client
>       mode: 432
>       user: nobody
>       group: nobody
>     master:
>       path: /var/run/dovecot/auth-master
>       mode: 384
>       user: nobody
>       group: nobody
> plugin:
>   quota_warning: storage=95%% /usr/local/bin/quota-warning.sh 95 %u
>   quota_warning2: storage=80%% /usr/local/bin/quota-warning.sh 80 %u
>   quota: maildir:storage=64
>
> ##
> ##
> top - 12:08:31 up 206 days, 10:45, 3 users, load average: 189.88, 82.07, 55.97
> Tasks: 771 total, 1 running, 767 sleeping, 1 stopped, 2 zombie
> Cpu(s): 8.3%us, 7.6%sy, 0.0%ni, 8.3%id, 75.0%wa, 0.0%hi, 0.8%si, 0.0%st
> Mem: 16279824k total, 11913788k used, 4366036k free, 334308k buffers
> Swap:
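Before turning inotify off globally, it can be worth confirming that the imap processes are in fact heavy inotify users. A minimal sketch, assuming Linux with a readable /proc (the helper name is invented here, not a standard tool):

```shell
#!/bin/sh
# Sketch: count open inotify instances per process by looking for
# "anon_inode:inotify" among each process's file descriptors.
# Requires Linux /proc; run as root to see other users' processes.

# Count inotify fds in "ls -l /proc/PID/fd" output read from stdin.
count_inotify() {
    grep -c 'anon_inode:inotify'
}

for p in /proc/[0-9]*; do
    n=$(ls -l "$p/fd" 2>/dev/null | count_inotify)
    if [ "${n:-0}" -gt 0 ]; then
        printf '%s\t%s\n' "$n" "$(tr '\0' ' ' < "$p/cmdline" 2>/dev/null)"
    fi
done | sort -rn
```

A long tail of dovecot/imap entries here would support disabling inotify (Dovecot falls back to polling for mailbox changes).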
Re: [Dovecot] High Load Average on POP/IMAP.
Thanks Urban, will try this and will let you know.

Regards,
Kavish Karkera

From: Urban Loesch b...@enas.net
To: dovecot@dovecot.org
Sent: Wednesday, 21 August 2013 5:34 PM
Subject: Re: [Dovecot] High Load Average on POP/IMAP.

> [quoted message snipped]
Re: [Dovecot] High Load Average on POP/IMAP.
On 8/21/2013 5:37 AM, Kavish Karkera wrote:
> We have a serious issue on our POP/IMAP servers these days. The load
> average (as reported by uptime) spikes up to 400-500 at particular
> times of day, mostly around noon and in the evening, but it lasts
> only a few minutes.
>
> We have 2 servers running Dovecot 1.1.20 behind a load balancer; we
> use keepalived (1.1.13) for load balancing.
>
> Server specification:
> Operating System: CentOS 5.5 64bit
> CPU cores: 16
> RAM: 8GB
>
> Mail and indexes are mounted on NFS (NetApp).
> ...
> Cpu(s): 8.3%us, 7.6%sy, 0.0%ni, 8.3%id, 75.0%wa, 0.0%hi, 0.8%si, 0.0%st
                                          ^^^^^^^
>   PID USER  PR NI VIRT RES  SHR S %CPU %MEM    TIME+ COMMAND
>   408 mysql 18  0 384m 38m 4412 S 52.8  0.2 42221:44 mysqld

This doesn't seem to be a Dovecot issue. mysql apparently has 8 (or more) threads on 8 cores all blocking on IO. I see a few possible causes:

1. The NetApp is unable to keep up with the request rate because:

   a. There are too few spindles in the RAID set backing this NFS volume,
      and/or the file(s) aren't properly striped across all spindles.

   b. The RAID level is inappropriate. The mysql job is apparently doing
      large table updates, and you're experiencing massive read-modify-write
      (RMW) latency from RAID5/6. This is why one should never put a
      transactional database, or one that sees large, frequent table
      updates, on a parity RAID volume--unless the disks are SSDs. SSDs
      have no mechanical parts, so RMW latency is almost nonexistent.

2. Apparently 8 (or more) threads are concurrently accessing the same file
   or files, so the massive iowait could simply be the result of filesystem
   and/or NFS locking, NFS client caching issues, etc.

The cause of the massive iowait could be one or all of the above, or
something else entirely; these are just the typical causes. You seem to
have a database job scheduled to run twice daily that triggers the problem.
Identify this job, figure out what it does, why it does it, and how necessary it is, and see whether it can be scheduled to run at off-peak hours. If it can, you may want to simply do that, since fixing the IO latency problem itself may be expensive in hardware and/or labor dollars.

-- Stan
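To attribute the iowait to specific processes before touching the storage layout, one option is to rank processes by cumulative block-I/O delay. This is a sketch assuming Linux with task delay accounting enabled (the helper name is invented here, not a standard tool):

```shell
#!/bin/sh
# Sketch: rank processes by cumulative block-I/O delay, using
# delayacct_blkio_ticks (field 42 of /proc/PID/stat). The value may
# read 0 everywhere if the kernel's delay accounting is disabled.

# Parse /proc/PID/stat lines from stdin; print "<ticks> <comm>".
rank_blkio() {
    awk '{
        comm = $0
        sub(/^[0-9]+ \(/, "", comm); sub(/\).*/, "", comm)
        rest = $0
        sub(/.*\) /, "", rest)      # greedy: strips through the last ") "
        split(rest, f, " ")         # f[1] is the state field (field 3)
        ticks = f[40] + 0           # overall field 42: delayacct_blkio_ticks
        if (ticks > 0) printf "%12d %s\n", ticks, comm
    }' | sort -rn
}

cat /proc/[0-9]*/stat 2>/dev/null | rank_blkio | head -20
```

If mysqld dominates this list during the noon spike, that corroborates the diagnosis above without needing iostat on the NetApp side.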