I've had a two-director ring running under production load on 2.1.8,
with around 10,000 active connections, for two weeks. Everything had
been working great until this morning.

There isn't anything obvious in the logs beyond the fact that the
director connections started bouncing. Reloads, restarts, and an
upgrade to 2.1.9 (directors only) did not resolve it.

I've dropped one of the servers out of the ring to limit the suffering,
but this is a less than ideal situation.

Any idea what is going on? Today is a major US holiday, so load is
consistent with a low weekend load; this doesn't appear to be a
load-related issue.
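
To put a number on the bouncing, something like the following could tally
the "Director ... disconnected" errors per minute, per reporting host, and
per ring side. It is only a rough sketch: it assumes the syslog lines are
unwrapped (one entry per line, unlike the mail-wrapped excerpt further
down), and the names DISCONNECT_RE and tally_disconnects are made up for
this example, not part of Dovecot.

```python
import re
from collections import Counter

# Matches entries such as (one line in the real syslog):
#   Sep 3 09:22:42 b.director. b dovecot: director: Error: Director 10.10.10.71:9321/right disconnected
# An optional leading ">" (mail quoting) is tolerated.
DISCONNECT_RE = re.compile(
    r"^>?\s*\w{3}\s+\d+\s+(?P<hhmm>\d{2}:\d{2}):\d{2}\s+"
    r"(?P<reporter>\S+)\s.*?Error: Director\s+"
    r"(?P<peer>[\d.]+):\d+/(?P<side>\w+)\s+disconnected"
)


def tally_disconnects(lines):
    """Count disconnect errors keyed by (HH:MM, reporting host, side)."""
    counts = Counter()
    for line in lines:
        m = DISCONNECT_RE.search(line)
        if m:
            counts[(m["hhmm"], m["reporter"], m["side"])] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "Sep 3 09:22:42 b.director. b dovecot: director: Error: "
        "Director 10.10.10.71:9321/right disconnected",
        "Sep 3 09:22:49 b.director. b dovecot: director: Error: "
        "Director 10.10.10.71:9321/left disconnected",
    ]
    for key, count in sorted(tally_disconnects(sample).items()):
        print(key, count)
```

Feeding it the full log file instead of the two sample entries would show
whether the disconnects cluster on one side of the ring or on one host.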
auth_master_user_separator = *
auth_username_format = %Ln
auth_verbose = yes
default_client_limit = 10000
director_mail_servers = 1.1.1.1 1.1.1.2 1.1.1.3 1.1.1.4
director_servers = a.director.foo b.director.foo
disable_plaintext_auth = no
doveadm_proxy_port = 1842
login_trusted_networks = 10.0.0.1
mbox_write_locks = fcntl
passdb {
  args = /etc/dovecot/master-users
  driver = passwd-file
  master = yes
  pass = yes
}
passdb {
  args = proxy=y nopassword=y
  driver = static
}
service anvil {
  client_limit = 20103
}
service auth {
  client_limit = 41704
}
service director {
  fifo_listener login/proxy-notify {
    mode = 0666
  }
  inet_listener {
    port = 9321
  }
  unix_listener login/director {
    mode = 0666
  }
}
service imap-login {
  executable = imap-login director
  process_limit = 20000
  process_min_avail = 32
}
service imap {
  process_limit = 20480
}
service pop3-login {
  executable = pop3-login director
  process_limit = 20000
  process_min_avail = 32
}
ssl_ca = </etc/dovecot/ssl/gd_bundle.crt
ssl_cert = </etc/dovecot/ssl/imap.sonic.net.crt
ssl_key = </etc/dovecot/ssl/imap.sonic.net.key
ssl_parameters_regenerate = 1 days
syslog_facility = local0
protocol imap {
  ssl_ca = </etc/dovecot/ssl/gd_bundle.crt
  ssl_cert = </etc/dovecot/ssl/imap.sonic.net.crt
  ssl_key = </etc/dovecot/ssl/imap.sonic.net.key
}
protocol pop3 {
  ssl_ca = </etc/dovecot/ssl/pop.sonic.net.ca-bundle
  ssl_cert = </etc/dovecot/ssl/pop.sonic.net.crt
  ssl_key = </etc/dovecot/ssl/pop.sonic.net.key
}
>Sep 3 09:22:42 b.director. b dovecot: director: Error: Director
10.10.10.71:9321/right disconnected
>Sep 3 09:22:45 a.director. a dovecot: director: Error: Director
10.10.10.37:9321/left disconnected
>Sep 3 09:22:49 b.director. b dovecot: director: Error: Director
10.10.10.71:9321/left disconnected
>Sep 3 09:22:53 b.director. b dovecot: director: Error: Director
10.10.10.71:9321/left disconnected
>Sep 3 09:22:54 a.director. a dovecot: director: Error: Director
10.10.10.37:9321/left disconnected
>Sep 3 09:22:59 b.director. b dovecot: director: Error: Director
10.10.10.71:9321/left disconnected
>Sep 3 09:23:02 a.director. a dovecot: director: Error: Director
10.10.10.37:9321/right disconnected
>Sep 3 09:23:02 a.director. a dovecot: director: Warning: director:
Couldn't connect to right side, we must be the only director left
>Sep 3 09:23:02 b.director. b dovecot: director: Error: Director
10.10.10.71:9321/right disconnected
>Sep 3 09:23:02 b.director. b dovecot: director: Warning: director:
Couldn't connect to right side, we must be the only director left
>Sep 3 09:23:32 a.director. a dovecot: director: Error: director: User
foo host lookup failed: Timeout - queued for 47 secs (Ring synced for 30
secs, weak user, user refreshed 47 secs ago)
>Sep 3 09:23:32 a.director. a dovecot: director: Error: director: User
bar host lookup failed: Timeout - queued for 38 secs (Ring synced for 30
secs, weak user, user refreshed 38 secs ago)
>Sep 3 09:23:32 a.director. a dovecot: director: Error: director: User
bla host lookup failed: Timeout - queued for 30 secs (Ring synced for 30
secs)
>Sep 3 09:23:32 a.director. a dovecot: director: Error: director: User
bla2 host lookup failed: Timeout - queued for 30 secs (Ring synced for
30 secs)
>Sep 3 09:23:32 a.director. a dovecot: director: Error: Director
10.10.10.37:9321/right disconnected
>Sep 3 09:23:32 a.director. a dovecot: director: Warning: director:
Couldn't connect to right side, we must be the only director left
>Sep 3 09:23:32 a.director. a dovecot: director: Warning: director:
Couldn't connect to right side, we must be the only director left
>Sep 3 09:23:32 b.director. b dovecot: director: Error: Director
10.10.10.71:9321/out disconnected before handshake finished
>Sep 3 09:23:32 b.director. b dovecot: director: Warning: director:
Couldn't connect to right side, we must be the only director left
>Sep 3 09:24:02 b.director. b dovecot: director: Error: Director
10.10.10.71:9321/out disconnected before handshake finished
>Sep 3 09:24:02 b.director. b dovecot: director: Warning: director:
Couldn't connect to right side, we must be the only director left
>Sep 3 09:24:05 a.director. a dovecot: director: Warning: Delaying new
user requests until ring is synced
>Sep 3 09:24:32 a.director. a dovecot: director: Warning: Ring is
synced, continuing delayed requests
>Sep 3 09:24:41 b.director. b dovecot: director: Error: Director
10.10.10.71:9321/right disconnected
>Sep 3 09:24:41 b.director. b dovecot: director: Warning: director:
Couldn't connect to right side, we must be the only director left
>Sep 3 09:24:41 a.director. a dovecot: director: Error: Director
10.10.10.37:9321/right disconnected
>Sep 3 09:24:41 a.director. a dovecot: director: Warning: director:
Couldn't connect to right side, we must be the only director left
>Sep 3 09:25:11 b.director. b dovecot: director: Error: User hash
2285697953 is being redirected to two hosts: 10.10.10.39 and 10.10.10.76
(old_ts=1346689481,handshaking,recv_ts=1346689467)
>Sep 3 09:25:12 b.director. b dovecot: director: Error: User hash
623192092 is being redirected to two hosts: 10.10.10.76 and 10.10.10.39
(old_ts=1346689481,handshaking,recv_ts=1346689468)
>Sep 3 09:25:12 b.director. b dovecot: director: Error: User hash
1683990717 is being redirected to two hosts: 10.10.10.43 and 10.10.10.76
(old_ts=1346689481,handshaking,recv_ts=1346689468)
>Sep 3 09:25:12 a.director. a dovecot: director: Error: Director
10.10.10.37:9321/right disconnected
>Sep 3 09:25:12 a.director. a dovecot: director: Warning: director:
Couldn't connect to right side, we must be the only director left
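
The old_ts and recv_ts values in the "redirected to two hosts" errors look
like Unix epoch seconds. That is an assumption; nothing in the log says so.
Assuming also that the directors log in US Pacific Daylight Time (UTC-7),
a quick sketch like this converts them so they can be lined up against the
syslog timestamps. The helper name epoch_to_local is made up for this
example.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the servers log in PDT (UTC-7). Adjust if they log in
# another zone.
PDT = timezone(timedelta(hours=-7), "PDT")


def epoch_to_local(ts, tz=PDT):
    """Format an epoch-seconds value as local wall-clock time."""
    return datetime.fromtimestamp(ts, tz).strftime("%Y-%m-%d %H:%M:%S")


if __name__ == "__main__":
    # Values copied from the "User hash ... redirected to two hosts" errors.
    for name, ts in [
        ("old_ts", 1346689481),
        ("recv_ts", 1346689467),
        ("recv_ts", 1346689468),
    ]:
        print(name, ts, epoch_to_local(ts))
```

Under those assumptions old_ts comes out as 09:24:41, the same second as
the 09:24:41 disconnects logged on both directors, and the recv_ts values
fall about 14 seconds earlier.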
--
Kelsey Cummings - k...@corp.sonic.net sonic.net, inc.
System Architect 2260 Apollo Way
707.522.1000 Santa Rosa, CA 95407