Can you somehow reproduce this issue with auth_debug=yes and mail_debug=yes and provide those logs?
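
For reference, a minimal sketch of how those two settings might be enabled on a backend. The file location is an assumption (any file included from dovecot.conf should work), and debug logging is verbose, so it is probably best switched off again once the logs are captured.

```
# Assumed location: /etc/dovecot/conf.d/10-logging.conf
# Verbose logging from the auth process (passdb/userdb lookups)
auth_debug = yes
# Verbose logging from the mail processes
mail_debug = yes
```
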
Aki

On 02.11.2017 10:55, Ralf Becker wrote:
> Does no one have any idea?
>
> Replication into wrong mailboxes caused by an unavailable proxy dict
> backend is a serious privacy and/or security problem!
>
> Ralf
>
> On 30.10.17 at 10:05, Ralf Becker wrote:
>> It has now happened twice that replication created folders and mails in the
>> wrong mailbox :(
>>
>> Here's the architecture we use:
>> - 2 Dovecot (2.2.32) backends in two different datacenters replicating
>>   via a VPN connection
>> - Dovecot directors in both datacenters talk to both backends with
>>   vhost_count of 100 vs 1 for local vs remote backend
>> - backends use proxy dict via a unix domain socket and socat to talk via
>>   tcp to a dict on a different server (kubernetes cluster)
>> - backends have a local SQLite userdb for iteration (also containing
>>   home directories, as just iteration is not possible)
>> - serving around 7000 mailboxes in roughly 200 different domains
>>
>> Everything works as expected until the dict is not reachable, e.g. due to a
>> server failure or a planned reboot of a node of the Kubernetes cluster.
>> In that situation it can happen that some requests are not answered,
>> even with Kubernetes running multiple instances of the dict.
>> I can only speculate what happens then: it seems the connection failure
>> to the remote dict is not correctly handled and leads to a situation in
>> which the last mailbox/home directory is used for the replication :(
>>
>> When it happened the first time, we attributed it to the fact that the
>> SQLite database at that time contained no home directory information,
>> which we fixed afterwards. This first time (server failure) took a couple of
>> minutes and led to many mailboxes containing mostly folders but also
>> some newly arrived mails belonging to other mailboxes/users. We could only
>> resolve that situation by rolling back to a ZFS snapshot taken before the
>> downtime.
>>
>> The second time was last Friday night during a (much shorter) reboot of
>> a Kubernetes node and led only to a single mailbox containing folders
>> and mails of other mailboxes. That was verified by looking at timestamps
>> of directories below $home/mdbox/mailboxes and files in $home/mdbox/storage.
>> I cannot tell whether adding the home directory to the SQLite database or
>> the shorter duration of the failure limited the wrong replication to a
>> single mailbox.
>>
>> Can someone with more knowledge of the Dovecot code please check/verify
>> how replication deals with failures in proxy dict? I'm of course happy to
>> provide more information about our configuration if needed.
>>
>> Here is an excerpt of our configuration (full doveconf -n is attached):
>>
>> passdb {
>>   args = /etc/dovecot/dovecot-dict-master-auth.conf
>>   driver = dict
>>   master = yes
>> }
>> passdb {
>>   args = /etc/dovecot/dovecot-dict-auth.conf
>>   driver = dict
>> }
>> userdb {
>>   driver = prefetch
>> }
>> userdb {
>>   args = /etc/dovecot/dovecot-dict-auth.conf
>>   driver = dict
>> }
>> userdb {
>>   args = /etc/dovecot/dovecot-sql.conf
>>   driver = sql
>> }
>>
>> dovecot-dict-auth.conf:
>> uri = proxy:/var/run/dovecot_auth_proxy/socket:backend
>> password_key = passdb/%u/%w
>> user_key = userdb/%u
>> iterate_disable = yes
>>
>> dovecot-dict-master-auth.conf:
>> uri = proxy:/var/run/dovecot_auth_proxy/socket:backend
>> password_key = master/%{login_user}/%u/%w
>> iterate_disable = yes
>>
>> dovecot-sql.conf:
>> driver = sqlite
>> connect = /etc/dovecot/users.sqlite
>> user_query = SELECT home,NULL AS uid,NULL AS gid FROM users WHERE userid = '%n' AND domain = '%d'
>> iterate_query = SELECT userid AS username, domain FROM users