Re: [Dovecot] dsync replication errors
On 08 Sep 2015, at 11:20, Sergey Schwartz wrote:
>
> I use mdbox and probably have a similar issue, but in my case only shared
> mailboxes were affected.

Yes, shared mailboxes don't work nicely with replication. Replication locks only the original user, so for shared mailboxes multiple dsyncs can be running in parallel and messing things up. This is a bit troublesome to fix. I've had this issue happening with our own mail for a couple of years now and I haven't bothered fixing it, so it's unlikely I'll do it anytime soon. Although I haven't seen that many duplicates of the mails, just 10 or so.
Re: [Dovecot] dsync replication errors
On 02/17/2013 03:21 AM, Timo Sirainen wrote: Although there's still some mail duplication problem with maildir that doesn't log any errors about it. I'm not sure why that happens. While you're around, Timo :-) I've had such an issue recently with 2.2.18, using Maildir, where emails were being replicated circularly creating more and more duplicate copies. Replication should have been unidirectional in reality since changes were being made on one side only. Nothing coherent was being logged. Only "Warning: Maildir /srv/mail/domains/.../Maildir: Expunged message reappeared, giving a new UID .. " appearing on the receiving side. Is there any intelligence on the matter, or should I isolate this down and report it from scratch?
Re: [Dovecot] dsync replication errors
On 08 Sep 2015, at 01:16, Gedalya wrote:
>
> On 02/17/2013 03:21 AM, Timo Sirainen wrote:
>> Although there's still some mail
>> duplication problem with maildir that doesn't log any errors about it.
>> I'm not sure why that happens.
>
> While you're around, Timo :-)
>
> I've had such an issue recently with 2.2.18, using Maildir, where emails were
> being replicated circularly creating more and more duplicate copies.
> Replication should have been unidirectional in reality since changes were
> being made on one side only.
> Nothing coherent was being logged. Only "Warning: Maildir
> /srv/mail/domains/.../Maildir: Expunged message reappeared, giving a new UID
> .. " appearing on the receiving side.
> Is there any intelligence on the matter, or should I isolate this down and
> report it from scratch?

dsync bugs usually take a lot of time to debug. Unless there's an easily reproducible way to break it, I try to avoid spending time on it. Also, in this case the bug might be in the Maildir code instead of the dsync code.
Re: [Dovecot] dsync replication errors
On 2013-02-18 10:39 PM, Timo Sirainen t...@iki.fi wrote: On 18.2.2013, at 23.50, Michael Grimm trash...@odo.in-berlin.de wrote: With doveconf -H dovecot%9d I end up with tons of reported collisions like
...
| doveconf: Error: Duplicate host hashes: dovecot1368344 and dovecot2055005
| doveconf: Error: Duplicate host hashes: dovecot2042008 and dovecot2056918
| doveconf: Error: Duplicate host hashes: dovecot1844965 and dovecot2058312
Sure, there are going to be hash collisions at some point, but I highly doubt you're going to create a million-server Dovecot cluster. :)

I've been following this thread with interest (or mostly out of curiosity, as I will have no need for running multiple machines, except possibly to run one secondary machine as a 'hot spare'), but here I'm confused (and my ignorance is apparently showing)... How are any of the above 'collisions'? The hashes are different.

--
Best regards,
Charles
Re: [Dovecot] dsync replication errors
On 19.2.2013, at 13.48, Charles Marcus cmar...@media-brokers.com wrote: On 2013-02-18 10:39 PM, Timo Sirainen t...@iki.fi wrote: On 18.2.2013, at 23.50, Michael Grimm trash...@odo.in-berlin.de wrote: With doveconf -H dovecot%9d I end up with tons of reported collisions like
...
| doveconf: Error: Duplicate host hashes: dovecot1368344 and dovecot2055005
| doveconf: Error: Duplicate host hashes: dovecot2042008 and dovecot2056918
| doveconf: Error: Duplicate host hashes: dovecot1844965 and dovecot2058312
Sure, there are going to be hash collisions at some point, but I highly doubt you're going to create a million-server Dovecot cluster. :) I've been following this thread with interest (or mostly out of curiosity, as I will have no need for running multiple machines, except possibly to run one secondary machine as a 'hot spare'), but here I'm confused (and my ignorance is apparently showing)... How are any of the above 'collisions'? The hashes are different.

Dovecot uses the last 32 bits of the SHA1 of the name. So, for example, these collide:
% printf dovecot1368344 | sha1sum | awk '{print $1}' | cut -c 33-
bd593aec
% printf dovecot2055005 | sha1sum | awk '{print $1}' | cut -c 33-
bd593aec
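Timo's pipeline can be wrapped into a small script that shows why two different names still "collide": their full SHA-1 digests differ, but the last 32 bits (the last 8 hex digits) are equal. This is a minimal sketch, assuming GNU coreutils `sha1sum` is available (substitute an equivalent SHA-1 tool elsewhere); `sha1_hex`, `host_hash` and `compare_hosts` are hypothetical helper names, not Dovecot commands. It uses `printf '%s'` so a name is never interpreted as a printf format string.

```sh
#!/bin/sh
# Hypothetical helpers, not part of Dovecot. Each hashes the name exactly
# as given (no trailing newline), like "printf NAME | sha1sum" above.

# sha1_hex NAME: full 160-bit SHA-1 of NAME as 40 hex digits
sha1_hex() {
    printf '%s' "$1" | sha1sum | cut -c 1-40
}

# host_hash NAME: last 32 bits of SHA-1(NAME), i.e. hex digits 33..40
host_hash() {
    printf '%s' "$1" | sha1sum | cut -c 33-40
}

# compare_hosts A B: print both full digests, then report whether the
# truncated 32-bit host hashes are equal
compare_hosts() {
    printf '%s  %s\n' "$(sha1_hex "$1")" "$1"
    printf '%s  %s\n' "$(sha1_hex "$2")" "$2"
    if [ "$(host_hash "$1")" = "$(host_hash "$2")" ]; then
        printf '32-bit host hashes collide: %s\n' "$(host_hash "$1")"
    else
        printf 'no collision\n'
    fi
}
```

Running `compare_hosts dovecot1368344 dovecot2055005` prints two different 40-digit digests; going by the output quoted above, the last line should then report a collision on bd593aec.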
Re: [Dovecot] dsync replication errors
On 18.02.2013, at 07:49, Timo Sirainen t...@iki.fi wrote: On Sun, 2013-02-17 at 12:30 +0200, Timo Sirainen wrote: (So yeah, ideally there should be checks for detecting hostname hash collisions..) Added to v2.2 hg:
% doveconf -H dovecot%2d
No duplicate host hashes in dovecot0 .. dovecot99

With doveconf -H dovecot%9d I end up with tons of reported collisions like
...
| doveconf: Error: Duplicate host hashes: dovecot1368344 and dovecot2055005
| doveconf: Error: Duplicate host hashes: dovecot2042008 and dovecot2056918
| doveconf: Error: Duplicate host hashes: dovecot1844965 and dovecot2058312
(No wonder, I have been running the 2.1 replicator with identical local hostnames for some time now.) ... and ending with:
| Killed

Without the template, doveconf -H attempts to detect it from the current hostname.
mail doveconf -H
doveconf: Fatal: Hostname 'xxx.yyy.tld' has no digits, can't verify

JFTR and regards,
Michael
Re: [Dovecot] dsync replication errors
On 18.02.2013, at 07:07, Timo Sirainen t...@iki.fi wrote: On 17.2.2013, at 22.04, Michael Grimm trash...@odo.in-berlin.de wrote: First of all: whenever you referred to hostname in this thread, you have been using it as a synonym for the local part [1] of a FQDN, right? I mean what the gethostname() function returns, which is what the hostname command usually also returns. And yes, I think it's always the local part.

I am not familiar with the gethostname() function within FreeBSD, but the hostname command normally returns your FQDN, if set. That has been the case because I didn't configure my service jails with FQDNs, so hostname couldn't return anything other than the local hostname.

Given that all my interpretations of your statements are correct, I have difficulty understanding why generic communication between Dovecot servers should be limited to enforcing different local parts for all Dovecot servers instead of different FQDNs. That would make much more sense regarding uniqueness in hostnames, IMHO. Two servers like dovecot.forget-about.it and dovecot.you-name.it should be able to communicate generically, again: IMHO. I think systems named like that would belong to different clusters and wouldn't need to communicate with each other.

Well, now I understand my misunderstanding: I had considered replication between different clusters a generic communication between Dovecot servers as well.

I looked through the code. The hostname (without domain) is currently used for:
* maildir filenames
* temporary filenames
* authentication challenge strings in some auth mechanisms
* logging
So I think the hostname uniqueness matters mainly when using a shared filesystem (e.g. NFS).

So, I'm confident that I can stick with identical local hostnames on both of my servers.

Thanks and with kind regards,
Michael
Re: [Dovecot] dsync replication errors
On 18.2.2013, at 23.50, Michael Grimm trash...@odo.in-berlin.de wrote:
% doveconf -H dovecot%2d
No duplicate host hashes in dovecot0 .. dovecot99
With doveconf -H dovecot%9d I end up with tons of reported collisions like
...
| doveconf: Error: Duplicate host hashes: dovecot1368344 and dovecot2055005
| doveconf: Error: Duplicate host hashes: dovecot2042008 and dovecot2056918
| doveconf: Error: Duplicate host hashes: dovecot1844965 and dovecot2058312

Sure, there are going to be hash collisions at some point, but I highly doubt you're going to create a million-server Dovecot cluster. :)
Re: [Dovecot] dsync replication errors
On Sat, 2013-02-16 at 19:32 +0100, Oli Schacher wrote: There seems to be an issue left when expunging a large number of messages from the Trash. I managed to trigger it twice so far by expunging ~3k messages. I'll try to create a reproducible test script for this scenario. For now I can only provide the log output from clicking around. Version is current hg, e63d1cf19ec7. First time it happened:
Feb 16 18:49:48 doco2 dovecot: imap(user1): Warning: Maildir /mailstore/user1/maildir/.Trash: Expunged message reappeared, giving a new UID (old uid=1221, file=1361035457.M728795P6220.doco1,S=2476,W=2555:2,Sa)

These errors should be gone now in hg. Although there's still some mail duplication problem with maildir that doesn't log any errors about it. I'm not sure why that happens.

Feb 16 18:50:14 doco2 dovecot: doveadm: Error: dsync-remote(user1): Error: dsync(local): Received unexpected input S != H

I also fixed this error, which happened on a locking failure.

Feb 16 19:13:08 doco1 dovecot: doveadm: Error: dsync-remote(user1): Panic: file mail-transaction-log-view.c: line 72 (mail_transaction_log_view_set): assertion failed: (min_file_seq <= max_file_seq)

Not sure about this one. But usually this happens only once and a retry works.
Re: [Dovecot] dsync replication errors
On 17.02.2013, at 06:23, Timo Sirainen t...@iki.fi wrote: On 17.2.2013, at 7.06, Timo Sirainen t...@iki.fi wrote: On 17.2.2013, at 0.12, Michael Grimm trash...@odo.in-berlin.de wrote: Hmm. Both jails run on distinct servers. ssh replication uses different domains, though. But both jails are named identically, "test", and both resolve to the identical hostname "test" when using hostname. However, hostname -f fails to return test.mx1.invalid and test.mx2.invalid, respectively (although nslookup test does). Hmm, do you think I need to provide different hostnames in both jails? That's most likely the problem. I'd guess Dovecot sees both servers as having "test" as the hostname, and each server thinks it's the one that should be doing the locking and not the other. See if this helps: http://hg.dovecot.org/dovecot-2.2/rev/e7aabd79c9d5

Good news! Those identical hostnames on both servers broke the replicator. Now, with v2.2.beta1 (1dd1e88ba0a2), I cannot break the replicator any longer, no matter how many messages I inject on both servers simultaneously. (Tested a couple of times with up to 2000 mails on each server.)

Although even if it does, other parts of Dovecot still use only the hostname part to guarantee global uniqueness of things. So better to have unique hostnames.

What parts of Dovecot would be involved? I'm curious because my production mail servers have used identical hostnames in their jails ever since I started running Dovecot (starting with 1.x).

Thanks for the new replicator code, I really appreciate your work! And from my point of view, I consider the replicator in v2.2 ready for production.

With kind regards,
Michael
Re: [Dovecot] dsync replication errors
On Sun, 2013-02-17 at 10:44 +0100, Michael Grimm wrote: Although even if it does, other parts of Dovecot still use only the hostname part to guarantee global uniqueness of things. So better to have unique hostnames. What parts of Dovecot would be involved? I'm curious because my production mail servers have used identical hostnames in their jails ever since I started running Dovecot (starting with 1.x).

Mainly that maildir filenames are used as GUIDs. If two have the same name, they are assumed to be identical. That's why the maildir filenames include the hostname, to make sure that the GUID is different even if two mails happen to be delivered at exactly the same time with the same PID and same size to two different servers. So pretty unlikely, but better to be safe. :)

There may be some other features that require unique hostnames in the future: anything where multiple Dovecot servers need to communicate with each other. If some day there is such generic communication between Dovecot servers, I'm planning on enforcing this requirement.
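To see where the hostname sits in a maildir filename, the sketch below takes apart the file name from the "Expunged message reappeared" warning quoted in this thread (1361035457.M728795P6220.doco1,S=2476,W=2555:2,Sa). It assumes the layout `<secs>.M<usecs>P<pid>.<host>`, optionally followed by `,S=<size>,W=<vsize>` and `:2,<flags>`, and it treats the part before the first `,` or `:` as the identifying base name. That split is an assumption for illustration; Dovecot's own parser may differ, and `maildir_base` and `maildir_host` are hypothetical names.

```sh
#!/bin/sh
# Hypothetical helpers (not part of Dovecot) for a maildir filename of
# the assumed form <secs>.M<usecs>P<pid>.<host>[,S=..,W=..][:2,<flags>]

# maildir_base NAME: drop the ":2,<flags>" suffix and the ",S=/,W=" size
# fields, leaving the time/PID/host part
maildir_base() {
    base=${1%%:*}                # cut at the first ':'
    printf '%s\n' "${base%%,*}"  # cut at the first ','
}

# maildir_host NAME: print only the <host> field
maildir_host() {
    rest=$(maildir_base "$1")
    rest=${rest#*.}              # drop "<secs>."
    printf '%s\n' "${rest#*.}"   # drop "M<usecs>P<pid>."
}
```

Two servers that both report the hostname "test" would produce identical base names whenever the seconds, microseconds and PID happen to coincide, which is exactly the case the hostname field is meant to rule out.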
Re: [Dovecot] dsync replication errors
On 17.02.2013, at 11:08, Timo Sirainen t...@iki.fi wrote: On Sun, 2013-02-17 at 10:44 +0100, Michael Grimm wrote: Although even if it does, other parts of Dovecot still use only the hostname part to guarantee global uniqueness of things. So better to have unique hostnames. What parts of Dovecot would be involved? I'm curious because my production mail servers have used identical hostnames in their jails ever since I started running Dovecot (starting with 1.x). Mainly that maildir filenames are used as GUIDs. If two have the same name, they are assumed to be identical. That's why the maildir filenames include the hostname, to make sure that the GUID is different even if two mails happen to be delivered at exactly the same time with the same PID and same size to two different servers. So pretty unlikely, but better to be safe. :)

Ok, that won't hit me for the time being because I am using mdbox.

There may be some other features that require unique hostnames in the future: anything where multiple Dovecot servers need to communicate with each other. If some day there is such generic communication between Dovecot servers, I'm planning on enforcing this requirement.

Thanks for that clarification. So I will need to think about different hostnames, although that means I can no longer just copy config files between both servers, since they assume identical hostnames at both sites ;-)

Regards,
Michael
Re: [Dovecot] dsync replication errors
On 17.2.2013, at 12.19, Michael Grimm trash...@odo.in-berlin.de wrote: On 17.02.2013, at 11:08, Timo Sirainen t...@iki.fi wrote: On Sun, 2013-02-17 at 10:44 +0100, Michael Grimm wrote: Although even if it does, other parts of Dovecot still use only the hostname part to guarantee global uniqueness of things. So better to have unique hostnames. What parts of Dovecot would be involved? I'm curious because my production mail servers have used identical hostnames in their jails ever since I started running Dovecot (starting with 1.x). Mainly that maildir filenames are used as GUIDs. If two have the same name, they are assumed to be identical. That's why the maildir filenames include the hostname, to make sure that the GUID is different even if two mails happen to be delivered at exactly the same time with the same PID and same size to two different servers. So pretty unlikely, but better to be safe. :) Ok, that won't hit me for the time being because I am using mdbox.

It's basically the same with mdbox, except that instead of the actual hostname it uses a 32-bit hash of it. (So yeah, ideally there should be checks for detecting hostname hash collisions..)
Re: [Dovecot] dsync replication errors
On 17.02.2013 11:08, Timo Sirainen wrote: What parts of Dovecot would be involved? I'm curious because my production mail servers have used identical hostnames in their jails ever since I started running Dovecot (starting with 1.x). Mainly that maildir filenames are used as GUIDs. If two have the same name, they are assumed to be identical. That's why the maildir filenames include the hostname, to make sure that the GUID is different even if two mails happen to be delivered at exactly the same time with the same PID and same size to two different servers. So pretty unlikely, but better to be safe. :) There may be some other features that require unique hostnames in the future: anything where multiple Dovecot servers need to communicate with each other. If some day there is such generic communication between Dovecot servers, I'm planning on enforcing this requirement.

Postfix has been enforcing this since forever ("greeted me with my own hostname"). Hostnames inside a network should always be unique.
Re: [Dovecot] dsync replication errors
On 17.02.2013, at 11:08, Timo Sirainen t...@iki.fi wrote: There may be some other features that require unique hostnames in the future: anything where multiple Dovecot servers need to communicate with each other.

I'd like to come back to that issue in order to understand your statement cited below. First of all: whenever you referred to hostname in this thread, you have been using it as a synonym for the local part [1] of a FQDN, right? I have both of my servers configured to use identical local parts (test) but different FQDNs (aka test.domainA.tldA and test.domainB.tldB). Your fix has been to replace my_hostname by my_hostdomain(), thus using test.domainA.tldA and test.domainB.tldB instead of test, right?

If some day there is such generic communication between Dovecot servers, I'm planning on enforcing this requirement.

Given that all my interpretations of your statements are correct, I have difficulty understanding why generic communication between Dovecot servers should be limited to enforcing different local parts for all Dovecot servers instead of different FQDNs. That would make much more sense regarding uniqueness in hostnames, IMHO. Two servers like dovecot.forget-about.it and dovecot.you-name.it should be able to communicate generically, again: IMHO.

BTW: I had defined hostname= in dovecot.conf identically on both servers, using completely different *but* identical FQDNs mail.my-domain.tld because of:
| conf.d/15-lda.conf:
| # Hostname to use in various parts of sent mails, eg. in Message-Id.
| # Default is the system's real hostname.
| #hostname =
At least my_hostdomain() doesn't care about that setting, right?

Again, I can live with mandatory different local hostname parts, but I would love to understand why ...

With kind regards,
Michael

[1] http://en.wikipedia.org/wiki/Hostname
Re: [Dovecot] dsync replication errors
On 17.02.2013 21:04, Michael Grimm wrote: On 17.02.2013, at 11:08, Timo Sirainen t...@iki.fi wrote: There may be some other features that require unique hostnames in the future: anything where multiple Dovecot servers need to communicate with each other. I'd like to come back to that issue in order to understand your statement cited below. First of all: whenever you referred to hostname in this thread, you have been using it as a synonym for the local part [1] of a FQDN, right? I have both of my servers configured to use identical local parts (test) but different FQDNs (aka test.domainA.tldA and test.domainB.tldB). Your fix has been to replace my_hostname by my_hostdomain(), thus using test.domainA.tldA and test.domainB.tldB instead of test, right? If some day there is such generic communication between Dovecot servers, I'm planning on enforcing this requirement. Given that all my interpretations of your statements are correct, I have difficulty understanding why generic communication between Dovecot servers should be limited to enforcing different local parts for all Dovecot servers instead of different FQDNs. That would make much more sense regarding uniqueness in hostnames, IMHO. Two servers like dovecot.forget-about.it and dovecot.you-name.it should be able to communicate generically, again: IMHO.

The better design would be for Dovecot to generate some UUID at first startup and store it in /etc/dovecot/uuid.conf if the file does not exist, because that would make hostnames completely meaningless AND give you the option, if you know what you are doing, to replace a machine with a newer one by rsyncing the data directories and the whole /etc/dovecot/.
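The "generate once, then reuse" idea in the proposal above can be sketched in a few lines. This is only an illustration of the proposal, not an existing Dovecot feature: `instance_id` is a hypothetical name, the file path is whatever you pass in (the proposal suggests /etc/dovecot/uuid.conf), and for simplicity it stores 128 random bits from /dev/urandom as 32 hex digits rather than a formatted RFC 4122 UUID. It does no locking, so two processes starting at the same moment could race.

```sh
#!/bin/sh
# instance_id FILE: hypothetical sketch of the proposal above. If FILE is
# missing or empty, write 128 random bits as 32 lowercase hex digits;
# then print the stored identifier. Not concurrency-safe.
instance_id() {
    id_file=$1
    if [ ! -s "$id_file" ]; then
        # od prints the 16 bytes as space-separated hex; tr joins them
        od -An -N16 -tx1 /dev/urandom | tr -d ' \n' > "$id_file" || return 1
        echo >> "$id_file"
    fi
    head -n 1 "$id_file"
}
```

Because the identifier lives in a file rather than being derived from the hostname, copying /etc/dovecot/ to a replacement machine carries the identity over, while two machines set up independently get different values even if their hostnames match.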
Re: [Dovecot] dsync replication errors
On 17.02.2013, at 21:04, Michael Grimm trash...@odo.in-berlin.de wrote: BTW: I had had defined hostname= in dovecot.conf identically using completely different *but* identical FQDNs mail.my-domain.tld because of: s/using completely different/using completely different to locally reported by resolver/g Regards, Michael
Re: [Dovecot] dsync replication errors
On 17.2.2013, at 22.04, Michael Grimm trash...@odo.in-berlin.de wrote: On 17.02.2013, at 11:08, Timo Sirainen t...@iki.fi wrote: There may be some other features that require unique hostnames in the future: anything where multiple Dovecot servers need to communicate with each other. I'd like to come back to that issue in order to understand your statement cited below. First of all: whenever you referred to hostname in this thread, you have been using it as a synonym for the local part [1] of a FQDN, right?

I mean what the gethostname() function returns, which is what the hostname command usually also returns. And yes, I think it's always the local part.

I have both of my servers configured to use identical local parts (test) but different FQDNs (aka test.domainA.tldA and test.domainB.tldB). Your fix has been to replace my_hostname by my_hostdomain(), thus using test.domainA.tldA and test.domainB.tldB instead of test, right?

Yes.

If some day there is such generic communication between Dovecot servers, I'm planning on enforcing this requirement. Given that all my interpretations of your statements are correct, I have difficulty understanding why generic communication between Dovecot servers should be limited to enforcing different local parts for all Dovecot servers instead of different FQDNs. That would make much more sense regarding uniqueness in hostnames, IMHO. Two servers like dovecot.forget-about.it and dovecot.you-name.it should be able to communicate generically, again: IMHO.

I think systems named like that would belong to different clusters and wouldn't need to communicate with each other.

I looked through the code. The hostname (without domain) is currently used for:
* maildir filenames
* temporary filenames
* authentication challenge strings in some auth mechanisms
* logging
So I think the hostname uniqueness matters mainly when using a shared filesystem (e.g. NFS).
BTW: I had defined hostname= in dovecot.conf identically on both servers, using completely different *but* identical FQDNs mail.my-domain.tld because of:
| conf.d/15-lda.conf:
| # Hostname to use in various parts of sent mails, eg. in Message-Id.
| # Default is the system's real hostname.
| #hostname =
At least my_hostdomain() doesn't care about that setting, right?

Right. I updated the comment a bit: http://hg.dovecot.org/dovecot-2.2/rev/6a67a1440e15 lda_hostname would have been a better name for the setting.
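Since Dovecot uses only the local part (what gethostname() returns, without the domain), hosts such as test.mx1.invalid and test.mx2.invalid from this thread look identical to it even though their FQDNs differ. The following minimal sketch flags such clashes across a list of cluster hostnames. It assumes the local part is everything before the first dot; `duplicate_local_parts` is a hypothetical helper, not a Dovecot command.

```sh
#!/bin/sh
# duplicate_local_parts: hypothetical check, not part of Dovecot.
# Reads one hostname (FQDN or short name) per line on stdin and prints
# every local part (text before the first dot) seen on more than one line.
duplicate_local_parts() {
    cut -d. -f1 | LC_ALL=C sort | uniq -d
}
```

For example, `printf '%s\n' test.mx1.invalid test.mx2.invalid | duplicate_local_parts` prints `test`, the situation that broke the replicator earlier in this thread.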
Re: [Dovecot] dsync replication errors
On Sun, 2013-02-17 at 12:30 +0200, Timo Sirainen wrote: (So yeah, ideally there should be checks for detecting hostname hash collisions..)

Added to v2.2 hg:
% doveconf -H dovecot%d
No duplicate host hashes in dovecot0 .. dovecot9
% doveconf -H dovecot%2d
No duplicate host hashes in dovecot0 .. dovecot99
% doveconf -H dovecot%02d
No duplicate host hashes in dovecot00 .. dovecot99

Without the template, doveconf -H attempts to detect it from the current hostname.
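The kind of check doveconf -H performs can be approximated outside Dovecot, for example to vet a planned list of hostnames. This is a minimal sketch, assuming the host hash is the last 32 bits of SHA-1 as in the `sha1sum | cut -c 33-` pipeline quoted in this thread (so the hex values may not match Dovecot's internal representation byte for byte) and that GNU `sha1sum` is available. `host_hash` and `find_duplicate_hashes` are hypothetical helpers; the output wording merely imitates doveconf's messages.

```sh
#!/bin/sh
# Hypothetical helpers, not part of Dovecot.

# host_hash NAME: last 32 bits of SHA-1(NAME) as 8 hex digits
host_hash() {
    printf '%s' "$1" | sha1sum | cut -c 33-40
}

# find_duplicate_hashes: read one hostname per line on stdin, sort the
# "hash name" pairs, and print a line for each neighbouring pair of
# names whose hashes are equal
find_duplicate_hashes() {
    while IFS= read -r name; do
        printf '%s %s\n' "$(host_hash "$name")" "$name"
    done | LC_ALL=C sort | awk '
        $1 == prev_hash { print "Duplicate host hashes: " prev_name " and " $2 }
        { prev_hash = $1; prev_name = $2 }'
}
```

To mimic `doveconf -H dovecot%2d`, feed it the hundred names generated by `awk 'BEGIN { for (i = 0; i < 100; i++) print "dovecot" i }'`; empty output corresponds to "No duplicate host hashes". Spawning one sha1sum per name is fine for a cluster-sized list but far too slow for the million-name `%9d`-style ranges discussed above.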
Re: [Dovecot] dsync replication errors
I did a bunch of dsync fixes today in hg. With the new locking behavior (and other fixes) you shouldn't be able to break it anymore.

On Fri, 2013-02-01 at 21:53 +0100, Michael Grimm wrote: [Sorry Oli for sending my previous mail only to your address. Resent here.] Oli Schacher dove...@lists.wgwh.ch wrote: There still seems to be a problem when changes to both mailboxes at the same time are involved. I can confirm your observation, although triggered by a different test scenario, similar to the one I used with the 2.1 replicator before (http://www.dovecot.org/list/dovecot/2012-March/064354.html). This is v2.2.beta1 (78bdcb6642c7) with freshly created mailboxes "test" on both servers mx1 and mx2, and the replicator uses ssh for remote access. Both servers run a recent Postfix, use LMTP for local delivery, and "test" is a virtual user.

Test script to produce local test mails of equal size on mx1:
| #!/bin/csh
| set INDEX = 101
| set endINDEX = 200
| while ( $INDEX <= $endINDEX )
|   echo $INDEX
|   echo test | mail -s $INDEX test@mx1
|   if ( $INDEX % 1000 == 0 ) then
|     sleep 1
|   endif
|   @ INDEX = $INDEX + 1
| end
| exit 0

Test script to produce test mails of equal size on mx2:
| #!/bin/csh
| set INDEX = 1101
| set endINDEX = 1200
| while ( $INDEX <= $endINDEX )
|   echo $INDEX
|   echo test | mail -s $INDEX test@mx2
|   if ( $INDEX % 1000 == 0 ) then
|     sleep 1
|   endif
|   @ INDEX = $INDEX + 1
| end
| exit 0

All tests are run with vanilla mailboxes, after restarting Dovecot, and without IMAP connections from an MUA:
1) Simultaneous mailbomb approach: run both scripts simultaneously, and you'll end up with numerous duplicates in the mailboxes "test". Very often you'll find multiples.
2) Mailbomb approach: run one script on one server only, and all mails will be synchronised perfectly.
3) Modify both scripts to use ( $INDEX % 1 == 0 ), which adds a one-second wait after every mail injection, and run them simultaneously on both servers: you'll end up with significantly fewer duplicates and no more multiples.
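The test scripts above use the counter value as the Subject, so after a run every subject should appear exactly once in the mailbox. The sketch below turns "duplicates" and "multiples" into numbers. It assumes you have already exported the Subject of every message in INBOX, one per line (how you export them is left open here), and `count_duplicates` is a hypothetical helper name.

```sh
#!/bin/sh
# count_duplicates: hypothetical helper. Reads one Subject per line on
# stdin and prints how many messages there are in total, how many
# distinct subjects, and how many subjects occur more than once.
count_duplicates() {
    LC_ALL=C sort | uniq -c | awk '
        { total += $1; unique++; if ($1 > 1) dup++ }
        END { printf "total=%d unique=%d duplicated=%d\n", total, unique, dup }'
}
```

With both scripts run once, a clean result would be total=200 unique=200 duplicated=0; any total above 200 counts the extra copies dsync created.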
Feb 1 07:12:52 doco1 dovecot: dsync-local(user1): Error: Mailbox INBOX: Remote didn't send mail GUID=7a30ff22af5b0b510f0c960042f4 (UID=211)
Feb 1 07:12:54 doco2 dovecot: dsync-local(user1): Error: Importing mailbox INBOX failed
Feb 1 07:13:24 doco2 dovecot: dsync-local(user1): Error: Remote command process isn't dying, killing it

I see those error messages as well, and in addition numerous ones like these:
| dovecot: dsync-local(test): Error: Mailbox INBOX: Unexpected GUID mismatch for UID=7153: 82c5df0a4ffa0b5141e36a0d5a02 != 29cc9f284ffa0b5141c236abecbd
| doveadm: Error: dsync-remote(test): Error: Mailbox INBOX: Unexpected GUID mismatch for UID=7153: 82c5df0a4ffa0b5141e36a0d5a02 != 29cc9f284ffa0b5141c236abecbd
| dovecot: lmtp(49752, test): Error: Corrupted index cache file /.../test/mailboxes/INBOX/dbox-Mails/dovecot.index.cache: File too small
| Feb 1 18:35:16 mail.err mx1 dovecot: imap(test) BXeiKq3UBgBd3DLy: Error: Mailbox INBOX: Corrupted index, uidvalidity=0
| Feb 1 18:35:16 mail.warn mx1 dovecot: imap(test) BXeiKq3UBgBd3DLy: Warning: fscking index file /.../test/storage/dovecot.map.index
| Feb 1 18:35:16 mail.err mx1 dovecot: imap(test) BXeiKq3UBgBd3DLy: Error: Mailbox INBOX: Corrupted index, uidvalidity=0
| Feb 1 18:35:16 mail.err mx1 dovecot: imap(test) BXeiKq3UBgBd3DLy: Error: mdbox /.../test/mailboxes/INBOX/dbox-Mails: Storage keeps breaking
| Feb 1 18:35:16 mail.err mx1 dovecot: imap(test) BXeiKq3UBgBd3DLy: Error: Mailbox INBOX: Corrupted index, uidvalidity=0
| Feb 1 18:35:16 mail.warn mx1 dovecot: imap(test) BXeiKq3UBgBd3DLy: Warning: fscking index file /.../test/storage/dovecot.map.index
| Feb 1 18:35:16 mail.warn mx1 dovecot: imap(test) BXeiKq3UBgBd3DLy: Warning: mdbox /.../test/storage: rebuilding indexes
| Feb 1 18:35:17 mail.warn mx1 dovecot: imap(test) BXeiKq3UBgBd3DLy: Warning: fscking index file /.../test/storage/dovecot.map.index
| Feb 1 18:35:17 mail.warn mx1 dovecot: imap(test) BXeiKq3UBgBd3DLy: Warning: fscking index file /.../test/storage/dovecot.map.index
| Feb 1 18:35:17 mail.warn mx1 dovecot: imap(test) BXeiKq3UBgBd3DLy: Warning: mdbox /.../test/storage: rebuilding indexes
| Feb 1 18:35:17 mail.warn mx1 dovecot: imap(test) BXeiKq3UBgBd3DLy: Warning: fscking index file /.../test/storage/dovecot.map.index
| Feb 1 18:35:17 mail.warn mx1 dovecot: imap(test) BXeiKq3UBgBd3DLy: Warning: fscking index file /.../test/storage/dovecot.map.index
| Feb 1 18:35:17 mail.warn mx1 dovecot: imap(test) BXeiKq3UBgBd3DLy: Warning: mdbox /.../test/storage: rebuilding indexes
| Feb 1 18:35:18 mail.warn mx1 dovecot: imap(test) BXeiKq3UBgBd3DLy: Warning: fscking index file /.../test/storage/dovecot.map.index
| Feb 1 18:35:18 mail.warn mx1 dovecot: imap(test) BXeiKq3UBgBd3DLy: Warning: fscking
Re: [Dovecot] dsync replication errors
Timo Sirainen t...@iki.fi wrote: I did a bunch of dsync fixes today in hg. With the new locking behavior (and other fixes) you shouldn't be able to break it anymore.

Sorry to say, but I am still able to break the replicator with v2.2.beta1 (35194cf0693e) under the conditions outlined below.

On 2013-02-01 Michael Grimm wrote: This is v2.2.beta1 (78bdcb6642c7) with freshly created mailboxes test at both servers mx1 and mx2, and replicator uses ssh for remote access. Both servers run a recent postfix, use lmtp for local delivery, and test is a virtual user.

I might add that both servers run inside FreeBSD jails (in case that makes a difference compared to your test setup).

All tests are run with vanilla mailboxes, after restarting dovecot, and without imap connections by MUA:

This time I even restarted both service jails before every test. And I used both Mail.app and Roundcube as MUAs to check the results (in case Mail.app might have screwed up INBOX ...)

1) Simultaneous mailbomb approach: run both scripts simultaneously, and you'll end up with numerous duplicates in mailboxes test. Very often you'll find multiples.

Still a lot of duplicates and multiples. The numbers are not reproducible: 240 (best case) up to 340 (worst case) messages instead of 200 (after 10 tests).
Here is one logfile example of a triplicated mail injected at mx1: logfile at mx1: | Feb 16 19:03:12 mail.info mx1 postfix/pickup[33958]: 3Z7fMh1PYMz5Ng: uid=0 from=root | Feb 16 19:03:12 mail.info mx1 postfix/cleanup[34320]: 3Z7fMh1PYMz5Ng: message-id=3Z7fMh1PYMz5Ng@test.mx1.invalid | Feb 16 19:03:12 mail.info mx1 postfix/qmgr[33959]: 3Z7fMh1PYMz5Ng: from=root@mx1.invalid, size=310, nrcpt=1 (queue active) | Feb 16 19:03:12 mail.info mx1 dovecot: lmtp(34456, test): copy from : box=INBOX, uid=12, msgid=3Z7fMh1PYMz5Ng@test.mx1.invalid, size=544, from=root@mx1.invalid (admin), flags=() | Feb 16 19:03:12 mail.info mx1 dovecot: lmtp(34456, test): nVlIDeDJH1GYhgAAag1aAg: sieve: msgid=3Z7fMh1PYMz5Ng@test.mx1.invalid: stored mail into mailbox 'INBOX' | Feb 16 19:03:12 mail.info mx1 postfix/lmtp[34453]: 3Z7fMh1PYMz5Ng: to=test@mx1.invalid, orig_to=tt@mx1.invalid, relay=test.mx1.invalid[private/dovecot-lmtp], delay=0.29, delays=0.08/0/0/0.21, dsn=2.0.0, status=sent (250 2.0.0 test@mx1.invalid nVlIDeDJH1GYhgAAag1aAg Saved) | Feb 16 19:03:12 mail.info mx1 postfix/qmgr[33959]: 3Z7fMh1PYMz5Ng: removed | Feb 16 19:03:13 mail.info mx1 dovecot: dsync-local(test): copy from INBOX: box=INBOX, uid=42, msgid=3Z7fMh1PYMz5Ng@test.mx1.invalid, size=544, from=root@mx1.invalid (admin), flags=() | Feb 16 19:03:13 mail.info mx1 dovecot: dsync-local(test): expunge: box=INBOX, uid=12, msgid=3Z7fMh1PYMz5Ng@test.mx1.invalid, size=544, from=root@mx1.invalid (admin), flags=(\Recent) | Feb 16 19:03:16 mail.info mx1 dovecot: dsync-local(test): copy from INBOX: box=INBOX, uid=164, msgid=3Z7fMh1PYMz5Ng@test.mx1.invalid, size=544, from=root@mx1.invalid (admin), flags=() | Feb 16 19:03:16 mail.info mx1 dovecot: dsync-local(test): copy from INBOX: box=INBOX, uid=263, msgid=3Z7fMh1PYMz5Ng@test.mx1.invalid, size=544, from=root@mx1.invalid (admin), flags=() | Feb 16 19:03:16 mail.info mx1 dovecot: dsync-local(test): expunge: box=INBOX, uid=118, msgid=3Z7fMh1PYMz5Ng@test.mx1.invalid, size=544, 
from=root@mx1.invalid (admin), flags=(\Recent) | Feb 16 19:03:16 mail.info mx1 dovecot: dsync-local(test): expunge: box=INBOX, uid=42, msgid=3Z7fMh1PYMz5Ng@test.mx1.invalid, size=544, from=root@mx1.invalid (admin), flags=(\Recent) after reading those three messages at mx1: | Feb 16 19:04:22 mail.info mx1 dovecot: imap(test) hQjfUNvVPwBd3Cqw: flag_change: box=INBOX, uid=372, msgid=3Z7fMh1PYMz5Ng@test.mx1.invalid, size=544, from=root@mx1.invalid (admin), flags=(\Seen \Recent) | Feb 16 19:05:40 mail.info mx1 dovecot: imap(test) hQjfUNvVPwBd3Cqw: flag_change: box=INBOX, uid=263, msgid=3Z7fMh1PYMz5Ng@test.mx1.invalid, size=544, from=root@mx1.invalid (admin), flags=(\Seen \Recent) | Feb 16 19:05:41 mail.info mx1 dovecot: imap(test) hQjfUNvVPwBd3Cqw: flag_change: box=INBOX, uid=164, msgid=3Z7fMh1PYMz5Ng@test.mx1.invalid, size=544, from=root@mx1.invalid (admin), flags=(\Seen \Recent) logfile at mx2: | Feb 16 19:03:13 mail.info mx2 dovecot: dsync-local(test): save: box=INBOX, uid=50, msgid=3Z7fMh1PYMz5Ng@test.mx1.invalid, size=544, from=root@mx1.invalid (admin), flags=() | Feb 16 19:03:17 mail.info mx2 dovecot: dsync-local(test): copy from INBOX: box=INBOX, uid=372, msgid=3Z7fMh1PYMz5Ng@test.mx1.invalid, size=544, from=root@mx1.invalid (admin), flags=() 2) Mailbomb approach: run one script at one server only, and all mails will become perfectly well synchronised. Same results here. 3) Modify both scripts to ( $INDEX % 1 == 0 ) to add a second waiting between every mail injection, and run them simultaneously at both servers, and you'll end up with significantly less duplicates and no more multiples. Same results here. Good: I cannot
Re: [Dovecot] dsync replication errors
On Sat, 16 Feb 2013 17:20:22 +0200 Timo Sirainen t...@iki.fi wrote: I did a bunch of dsync fixes today in hg. With the new locking behavior (and other fixes) you shouldn't be able to break it anymore.

Thanks for the fixes, Timo! I can confirm I'm no longer able to break anything with the tests I've mentioned so far (mass appending, simultaneous append and delete on both mailboxes): no more errors, no more dupes. I can also confirm the doveadm-server crash I reported in http://dovecot.markmail.org/thread/fb3qjnsdhtcpirg3 is now gone.

There seems to be an issue left when expunging a large number of messages from the Trash. I've hit it twice so far by expunging ~3k messages. I'll try to create a reproducible test script for this scenario; for now I can only provide the log output from clicking around. Version is current hg, e63d1cf19ec7.

First time it happened:
Feb 16 18:49:48 doco2 dovecot: imap(user1): Warning: Maildir /mailstore/user1/maildir/.Trash: Expunged message reappeared, giving a new UID (old uid=1221, file=1361035457.M728795P6220.doco1,S=2476,W=2555:2,Sa)
Feb 16 18:49:48 doco2 dovecot: imap(user1): Warning: Maildir /mailstore/user1/maildir/.Trash: Expunged message reappeared, giving a new UID (old uid=1222, file=1361035458.M501466P6220.doco1,S=2477,W=2556:2,Sa)
Feb 16 18:49:48 doco2 dovecot: imap(user1): Warning: Maildir /mailstore/user1/maildir/.Trash: Expunged message reappeared, giving a new UID (old uid=1223, file=1361035458.M988177P6220.doco1,S=2520,W=2599:2,Sa)
Feb 16 18:49:48 doco2 dovecot: imap(user1): Warning: Maildir /mailstore/user1/maildir/.Trash: Expunged message reappeared, giving a new UID (old uid=1224, file=1361035459.M254031P6220.doco1,S=2483,W=2562:2,Sa)
Feb 16 18:49:49 doco2 dovecot: imap(user1): Warning: Maildir /mailstore/user1/maildir/.Trash: Expunged message reappeared, giving a new UID (old uid=1225, file=1361035459.M431911P6220.doco1,S=2490,W=2569:2,Sa)
Feb 16 18:49:49 doco2 dovecot: imap(user1): Warning: Maildir /mailstore/user1/maildir/.Trash: Expunged message reappeared, giving a new UID (old uid=1226, file=1361035459.M959244P6220.doco1,S=2482,W=2561:2,Sa)
Feb 16 18:50:14 doco2 dovecot: doveadm: Error: dsync-remote(user1): Error: Couldn't lock /mailstore/user1/.dovecot-sync.lock: Interrupted system call
Feb 16 18:50:14 doco2 dovecot: doveadm: Error: dsync-remote(user1): Error: dsync(local): Received unexpected input S != H
Feb 16 18:50:14 doco2 dovecot: dsync-local(user1): Error: read(vmail@192.168.23.61) failed: EOF
Feb 16 18:50:14 doco2 dovecot: dsync-local(user1): Error: Remote command returned error 75
Feb 16 18:50:44 doco2 dovecot: doveadm: Error: dsync-remote(user1): Error: Couldn't lock /mailstore/user1/.dovecot-sync.lock: Interrupted system call
Feb 16 18:50:44 doco2 dovecot: doveadm: Error: dsync-remote(user1): Error: dsync(local): Received unexpected input N != H
Feb 16 18:50:44 doco2 dovecot: dsync-local(user1): Error: read(vmail@192.168.23.61) failed: EOF
Feb 16 18:50:44 doco2 dovecot: dsync-local(user1): Error: Remote command returned error 75

2nd time (no reappeared messages this time):
Feb 16 19:08:13 doco2 dovecot: imap-login: Login: user=user1, method=PLAIN, rip=192.168.23.130, lip=192.168.23.62, mpid=4794, session=DZ8RYNvVyADAqBeC
Feb 16 19:08:44 doco2 dovecot: doveadm: Error: dsync-remote(user1): Error: Couldn't lock /mailstore/user1/.dovecot-sync.lock: Interrupted system call
Feb 16 19:08:44 doco2 dovecot: doveadm: Error: dsync-remote(user1): Error: dsync(local): Received unexpected input S != H
Feb 16 19:08:44 doco2 dovecot: dsync-local(user1): Error: read(vmail@192.168.23.61) failed: EOF
Feb 16 19:08:44 doco2 dovecot: dsync-local(user1): Error: Remote command returned error 75

A while later on the other server:
Feb 16 19:13:08 doco1 dovecot: doveadm: Error: dsync-remote(user1): Panic: file mail-transaction-log-view.c: line 72 (mail_transaction_log_view_set): assertion failed: (min_file_seq <= max_file_seq)
Feb 16 19:13:08 doco1 dovecot: doveadm: Error: dsync-remote(user1): Error: Raw backtrace: /usr/lib64/dovecot/libdovecot.so.0(+0x5dc2a) [0x7f305f325c2a] -> /usr/lib64/dovecot/libdovecot.so.0(default_fatal_handler+0x32) [0x7f305f325d12] -> /usr/lib64/dovecot/libdovecot.so.0(+0x1f80a) [0x7f305f2e780a] -> /usr/lib64/dovecot/libdovecot-storage.so.0(mail_transaction_log_view_set+0x580) [0x7f305f64e3f0] -> /usr/bin/doveadm() [0x43786b] -> /usr/bin/doveadm(dsync_transaction_log_scan_init+0x8c) [0x43791c] -> /usr/bin/doveadm(dsync_brain_sync_mailbox_open+0x5e) [0x42724e] -> /usr/bin/doveadm(dsync_brain_slave_recv_mailbox+0x123) [0x427c63] -> /usr/bin/doveadm(dsync_brain_run+0x178) [0x425ff8] -> /usr/bin/doveadm() [0x4265d1] -> /usr/bin/doveadm() [0x4357f0] -> /usr/lib64/dovecot/libdovecot.so.0(io_loop_call_io+0x36) [0x7f305f334bd6] -> /usr/lib64/dovecot/libdovecot.so.0(io_loop_handler_run+0xa7) [0x7f305f335c67] -> /usr/lib64/dovecot/libdovecot.so.0(io_loop_run+0x28) [0x7f305f334b78] -> /usr/bin/doveadm()
Re: [Dovecot] dsync replication errors
On 16.2.2013, at 20.26, Michael Grimm trash...@odo.in-berlin.de wrote: Timo Sirainen t...@iki.fi wrote: I did a bunch of dsync fixes today in hg. With the new locking behavior (and other fixes) you shouldn't be able to break it anymore. Sorry to say, but I am still able to break replicator with v2.2.beta1 (35194cf0693e) under the conditions outlined below. I wonder if locking is working correctly in your setup. Your users have home directories, right? Dovecot should be creating .dovecot-sync.lock files in there during the sync. This is v2.2.beta1 (78bdcb6642c7) with freshly created mailboxes test at both servers mx1 and mx2, and replicator uses ssh for remote access. Both servers run a recent postfix, use lmtp for local delivery, and test is a virtual user. I might add that both servers run inside FreeBSD jails (if that might make a difference compared to your test setup). Inside the jail, does Dovecot see two different hostnames (same as the hostname command)? Good: I cannot find any Error: entries in both logfiles any longer. What about Warning?
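The purpose of a per-user sync lock such as .dovecot-sync.lock is that a second dsync run for the same user waits, or gives up after a timeout, instead of running in parallel with the first. The following is a minimal, hypothetical Python sketch of that idea using flock(2) (Unix only); it is not Dovecot's implementation, and the function names, timeout and polling interval are made up for illustration.

```python
import fcntl
import os
import time


def acquire_sync_lock(home, timeout=5.0, poll=0.05):
    """Try to take an exclusive flock() on <home>/.dovecot-sync.lock.

    Returns the open file descriptor on success, or None if someone else
    still held the lock after `timeout` seconds. (Hypothetical helper.)
    """
    path = os.path.join(home, ".dovecot-sync.lock")
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    deadline = time.monotonic() + timeout
    try:
        while True:
            try:
                # Non-blocking attempt; BlockingIOError means "held elsewhere".
                fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
                return fd
            except BlockingIOError:
                pass
            if time.monotonic() >= deadline:
                break
            time.sleep(poll)
    except BaseException:
        os.close(fd)
        raise
    # Timed out: give up instead of syncing without the lock.
    os.close(fd)
    return None


def release_sync_lock(fd):
    """Drop the lock and close the descriptor (the lock file itself stays)."""
    fcntl.flock(fd, fcntl.LOCK_UN)
    os.close(fd)
```

flock() locks belong to the open file description, so two independent os.open() calls on the same path contend even inside one process, which makes the behaviour easy to try out locally.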
Re: [Dovecot] dsync replication errors
On 16.02.2013, at 20:09, Timo Sirainen t...@iki.fi wrote: On 16.2.2013, at 20.26, Michael Grimm trash...@odo.in-berlin.de wrote: Sorry to say, but I am still able to break replicator with v2.2.beta1 (35194cf0693e) under the conditions outlined below. I wonder if locking is working correctly in your setup. Your users have home directories, right? Yes, I do have homedirs, ... Dovecot should be creating .dovecot-sync.lock files in there during the sync. ... and I double-checked that a .dovecot-sync.lock lockfile is being created during replication, and yes, it is. I might add that both servers run inside FreeBSD jails (if that might make a difference compared to your test setup). Inside the jail, does Dovecot see two different hostnames (same as the hostname command)? Hmm. Both jails run on distinct servers. ssh replication uses different domains, though. But both jails are named identically test, and both jails resolve to the identical hostname test when using hostname. However, hostname -f fails to return test.mx1.invalid and test.mx2.invalid, respectively (although nslookup test does). Hmm, do you think I need to give the two jails different hostnames? Good: I cannot find any Error: entries in both logfiles any longer. What about Warning? I only see these few messages on both servers:
| dovecot: doveadm(test): Warning: fscking index file /.../test/storage/dovecot.map.index
| dovecot: doveadm(test): Warning: fscking index file /.../test/storage/dovecot.map.index
| dovecot: doveadm(test): Warning: mdbox /.../test/storage: rebuilding indexes

Please let me know what you want me to test next. I really do appreciate your efforts. With kind regards, Michael
Re: [Dovecot] dsync replication errors
On 17.2.2013, at 0.12, Michael Grimm trash...@odo.in-berlin.de wrote: I might add that both servers run inside FreeBSD jails (if that might make a difference compared to your test setup). Inside the jail, does Dovecot see two different hostnames (same as the hostname command)? Hmm. Both jails run on distinct servers. ssh replication uses different domains, though. But both jails are named identically test, and both jails resolve to the identical hostname test when using hostname. However, hostname -f fails to return test.mx1.invalid and test.mx2.invalid, respectively (although nslookup test does). Hmm, do you think I need to give the two jails different hostnames? That's most likely the problem. I'd guess Dovecot sees both servers as having test as the hostname, and each server thinks it's the one that should be doing the locking and not the other. See if this helps: http://hg.dovecot.org/dovecot-2.2/rev/e7aabd79c9d5
Re: [Dovecot] dsync replication errors
On 17.2.2013, at 7.06, Timo Sirainen t...@iki.fi wrote: On 17.2.2013, at 0.12, Michael Grimm trash...@odo.in-berlin.de wrote: Hmm. Both jails run on distinct servers. ssh replication uses different domains, though. But both jails are named identically test, and both jails resolve to the identical hostname test when using hostname. However, hostname -f fails to return test.mx1.invalid and test.mx2.invalid, respectively (although nslookup test does). Hmm, do you think I need to give the two jails different hostnames? That's most likely the problem. I'd guess Dovecot sees both servers as having test as the hostname, and each server thinks it's the one that should be doing the locking and not the other. See if this helps: http://hg.dovecot.org/dovecot-2.2/rev/e7aabd79c9d5 Although even if it does, other parts of Dovecot still use only the hostname part to guarantee global uniqueness of things. So it's better to have unique hostnames.
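Timo's explanation can be illustrated with a toy model. Assume, purely hypothetically (this is not Dovecot's actual rule), that each sync decides which machine holds the lock by comparing the two hostnames, with ties going to the initiating side. With unique names both replication directions agree on one lock holder; with identical names each side ends up locking only itself, so two syncs for the same user can run in parallel.

```python
def lock_host(initiator: str, peer: str, names: dict) -> str:
    """Return the machine whose lock a sync started on `initiator` uses.

    Hypothetical rule for illustration only: the side whose hostname sorts
    first holds the lock, and a tie goes to the initiating side.
    """
    if names[initiator] <= names[peer]:
        return initiator
    return peer


def directions_agree(names: dict) -> bool:
    """True if syncs started on mx1 and on mx2 would use the same lock."""
    return lock_host("mx1", "mx2", names) == lock_host("mx2", "mx1", names)


# Machines are identified by "mx1"/"mx2"; `names` maps each machine to the
# hostname it reports.
short_names = {"mx1": "test", "mx2": "test"}
fqdn_names = {"mx1": "test.mx1.invalid", "mx2": "test.mx2.invalid"}

print(directions_agree(short_names))  # False: each side locks only itself
print(directions_agree(fqdn_names))   # True: both directions lock on mx1
```

Any deterministic tie-break has the same weakness: if both sides report the same name, neither can tell which of them is "first", which is why unique hostnames matter regardless of the exact rule.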
Re: [Dovecot] dsync replication errors
On Thu, 31 Jan 2013 22:17:28 +0200 Timo Sirainen t...@iki.fi wrote: On Thu, 2013-01-31 at 21:51 +0200, Timo Sirainen wrote: On 31.1.2013, at 19.41, Oli Schacher dove...@lists.wgwh.ch wrote: Jan 31 17:13:11 doco1 dovecot: dsync-local(user1): Error: Mailbox INBOX: Remote didn't send mail GUID=33dabe0f11980a51200c960042f4 (UID=104) I guess there's some bug that causes this to happen in some situations.. But the reason for mail duplication should be fixed by: http://hg.dovecot.org/dovecot-2.2/rev/138f1c76c0ec Except that shouldn't have been necessary. doveadm-server returns success before it has finished running dsync. Not sure why, need to debug it further. Fixed with a bit of a kludge: http://hg.dovecot.org/dovecot-2.2/rev/e9e6a95cea21

I can confirm that it has become significantly harder to produce errors with the latest patches. There still seems to be a problem when changes to both mailboxes at the same time are involved. However, today I didn't have time to test scientifically; I just updated to the latest hg and clicked around, so this report probably won't be of much use to you, sorry. I'll try to make reproducible tests again next week. I'll post the errors from my clicking session anyway; maybe it helps you figure out what went wrong even without knowing how to reproduce it. At least the "Operation not permitted" error below when killing the dsync process sounds unintended? Log output is from changeset 78bdcb6642c7 running on both servers.
Server 1:
Feb 1 07:12:52 doco1 dovecot: dsync-local(user1): Error: Mailbox INBOX: Remote didn't send mail GUID=7a30ff22af5b0b510f0c960042f4 (UID=211)
Feb 1 07:12:52 doco1 dovecot: dsync-local(user1): Error: Mailbox INBOX: Remote didn't send mail GUID=7a30ff22af5b0b510f0c960042f4 (UID=205)
Feb 1 07:12:52 doco1 dovecot: dsync-local(user1): Error: Mailbox INBOX: Remote didn't send mail GUID=7a30ff22af5b0b510f0c960042f4 (UID=208)
Feb 1 07:12:54 doco1 dovecot: doveadm: Error: dsync-remote(user1): Error: Mailbox INBOX: Unexpected GUID mismatch for UID=205: 7a30ff22af5b0b510f0c960042f4 != 8230ff22af5b0b510f0c960042f4
Feb 1 07:12:54 doco1 dovecot: doveadm: Error: dsync-remote(user1): Error: Mailbox INBOX: Remote didn't send mail GUID=7b30ff22af5b0b510f0c960042f4 (UID=228)
[...]
Feb 1 07:12:55 doco1 dovecot: doveadm: Error: dsync-remote(user1): Error: Importing mailbox INBOX failed
Feb 1 07:12:56 doco1 dovecot: dsync-local(user1): Error: read(vmail@192.168.23.62) failed: EOF
Feb 1 07:12:56 doco1 dovecot: dsync-local(user1): Error: read(vmail@192.168.23.62) failed: Broken pipe
Feb 1 07:12:56 doco1 dovecot: dsync-local(user1): Error: Remote command returned error 75
[...]
Feb 1 07:12:57 doco1 dovecot: doveadm: Error: dsync-remote(user1): Error: Mailbox INBOX: Unexpected GUID mismatch for UID=291: 7b30ff22af5b0b510f0c960042f4 != 8d30ff22af5b0b510f0c960042f4
Feb 1 07:12:57 doco1 dovecot: doveadm: Error: dsync-remote(user1): Panic: file dsync-mailbox-import.c: line 1112 (dsync_mailbox_import_change): assertion failed: (change->type == DSYNC_MAIL_CHANGE_TYPE_SAVE)
Feb 1 07:12:57 doco1 dovecot: doveadm: Error: dsync-remote(user1): Error: Raw backtrace: /usr/lib64/dovecot/libdovecot.so.0(+0x5d4ea) [0x7f19cf5954ea] -> /usr/lib64/dovecot/libdovecot.so.0(default_fatal_handler+0x32) [0x7f19cf5955d2] -> /usr/lib64/dovecot/libdovecot.so.0(+0x1f6ca) [0x7f19cf5576ca] -> /usr/bin/doveadm(dsync_mailbox_import_change+0x501) [0x42c881] -> /usr/bin/doveadm(dsync_brain_sync_mails+0x3a2) [0x4290c2] -> /usr/bin/doveadm(dsync_brain_run+0x169) [0x425e29] -> /usr/bin/doveadm() [0x426380] -> /usr/bin/doveadm() [0x434aa0] -> /usr/lib64/dovecot/libdovecot.so.0(io_loop_call_io+0x36) [0x7f19cf5a4076] -> /usr/lib64/dovecot/libdovecot.so.0(io_loop_handler_run+0xa7) [0x7f19cf5a5107] -> /usr/lib64/dovecot/libdovecot.so.0(io_loop_run+0x28) [0x7f19cf5a4018] -> /usr/bin/doveadm() [0x424134] -> /usr/bin/doveadm() [0x40fe4f] -> /usr/bin/doveadm() [0x41067d] -> /usr/bin/doveadm(doveadm_mail_try_run+0x141) [0x410ba1] -> /usr/bin/doveadm(main+0x3f1) [0x417bc1] -> /lib64/libc.so.6(__libc_start_main+0xfd) [0x7f19cf1c3cdd] -> /usr/bin/doveadm() [0x40f839]
Feb 1 07:12:57 doco1 dovecot: dsync-local(user1): Error: read(vmail@192.168.23.62) failed: EOF

Server 2:
Feb 1 07:12:54 doco2 dovecot: dsync-local(user1): Error: Mailbox INBOX: Unexpected GUID mismatch for UID=205: 7a30ff22af5b0b510f0c960042f4 != 8230ff22af5b0b510f0c960042f4
Feb 1 07:12:54 doco2 dovecot: dsync-local(user1): Error: Mailbox INBOX: Remote didn't send mail GUID=7b30ff22af5b0b510f0c960042f4 (UID=228)
Feb 1 07:12:54 doco2 dovecot: dsync-local(user1): Error: Mailbox INBOX: Remote didn't send mail GUID=7b30ff22af5b0b510f0c960042f4 (UID=234)
Feb 1 07:12:54 doco2 dovecot: dsync-local(user1): Error: Mailbox INBOX: Remote didn't send mail GUID=7b30ff22af5b0b510f0c960042f4 (UID=238)
Feb 1 07:12:54 doco2 dovecot: dsync-local(user1): Error: Mailbox INBOX:
Re: [Dovecot] dsync replication errors
[Sorry Oli, my previous mail went to your address only. Resent here.]

Oli Schacher dove...@lists.wgwh.ch wrote: There still seems to be a problem when changes to both mailboxes at the same time are involved

I can confirm your observation, although triggered by a different test scenario, similar to the one I used with the 2.1 replicator before (http://www.dovecot.org/list/dovecot/2012-March/064354.html).

This is v2.2.beta1 (78bdcb6642c7) with freshly created mailboxes test at both servers mx1 and mx2, and replicator uses ssh for remote access. Both servers run a recent postfix, use lmtp for local delivery, and test is a virtual user.

Test script to produce local testmails of equal size at mx1:
| #!/bin/csh
| set INDEX = 101
| set endINDEX = 200
| while ( $INDEX <= $endINDEX )
|   echo $INDEX
|   echo test | mail -s $INDEX test@mx1
|   if ( $INDEX % 1000 == 0 ) then
|     sleep 1
|   endif
|   @ INDEX = $INDEX + 1
| end
| exit 0

Test script to produce testmails of equal size at mx2:
| #!/bin/csh
| set INDEX = 1101
| set endINDEX = 1200
| while ( $INDEX <= $endINDEX )
|   echo $INDEX
|   echo test | mail -s $INDEX test@mx2
|   if ( $INDEX % 1000 == 0 ) then
|     sleep 1
|   endif
|   @ INDEX = $INDEX + 1
| end
| exit 0

All tests are run with vanilla mailboxes, after restarting dovecot, and without any IMAP connections from a MUA:
1) Simultaneous mailbomb approach: run both scripts simultaneously, and you'll end up with numerous duplicates in mailboxes test. Very often you'll find multiples.
2) Mailbomb approach: run one script at one server only, and all mails will become perfectly well synchronised.
3) Modify both scripts to ( $INDEX % 1 == 0 ) to add a one-second wait between every mail injection, and run them simultaneously at both servers, and you'll end up with significantly fewer duplicates and no more multiples.
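To put numbers on "duplicates" versus "multiples" after each run, one option is to collect the Subject header of every message in INBOX on each server (for example with an IMAP client) and tally them, since the scripts above use the counter as the subject. A minimal Python sketch, assuming the subjects are already in a list; how they are fetched is left open, and the split into "duplicate" (exactly two copies) and "multiple" (three or more) is an assumed reading of the terms used above.

```python
from collections import Counter


def tally_copies(subjects, expected):
    """Classify test mails by how many copies of each Subject exist.

    subjects: every Subject header found in the mailbox after a test run.
    expected: the set of subjects the test scripts sent.
    Returns (missing, duplicates, multiples), each sorted numerically.
    Here "duplicate" means exactly two copies and "multiple" three or more.
    """
    # Ignore unrelated mail that happens to be in the mailbox.
    counts = Counter(s for s in subjects if s in expected)
    missing = sorted((s for s in expected if counts[s] == 0), key=int)
    duplicates = sorted((s for s, n in counts.items() if n == 2), key=int)
    multiples = sorted((s for s, n in counts.items() if n > 2), key=int)
    return missing, duplicates, multiples


# The mx1 script sends subjects 101..200, the mx2 script 1101..1200.
expected = {str(i) for i in range(101, 201)} | {str(i) for i in range(1101, 1201)}
```

After a clean run on both servers all three lists should be empty; running the tally on each server separately also shows whether the two replicas diverged.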
Feb 1 07:12:52 doco1 dovecot: dsync-local(user1): Error: Mailbox INBOX: Remote didn't send mail GUID=7a30ff22af5b0b510f0c960042f4 (UID=211)
Feb 1 07:12:54 doco2 dovecot: dsync-local(user1): Error: Importing mailbox INBOX failed
Feb 1 07:13:24 doco2 dovecot: dsync-local(user1): Error: Remote command process isn't dying, killing it

I see those error messages as well, plus numerous ones like these:
| dovecot: dsync-local(test): Error: Mailbox INBOX: Unexpected GUID mismatch for UID=7153: 82c5df0a4ffa0b5141e36a0d5a02 != 29cc9f284ffa0b5141c236abecbd
| doveadm: Error: dsync-remote(test): Error: Mailbox INBOX: Unexpected GUID mismatch for UID=7153: 82c5df0a4ffa0b5141e36a0d5a02 != 29cc9f284ffa0b5141c236abecbd
| dovecot: lmtp(49752, test): Error: Corrupted index cache file /.../test/mailboxes/INBOX/dbox-Mails/dovecot.index.cache: File too small
| Feb 1 18:35:16 mail.err mx1 dovecot: imap(test) BXeiKq3UBgBd3DLy: Error: Mailbox INBOX: Corrupted index, uidvalidity=0
| Feb 1 18:35:16 mail.warn mx1 dovecot: imap(test) BXeiKq3UBgBd3DLy: Warning: fscking index file /.../test/storage/dovecot.map.index
| Feb 1 18:35:16 mail.err mx1 dovecot: imap(test) BXeiKq3UBgBd3DLy: Error: Mailbox INBOX: Corrupted index, uidvalidity=0
| Feb 1 18:35:16 mail.err mx1 dovecot: imap(test) BXeiKq3UBgBd3DLy: Error: mdbox /.../test/mailboxes/INBOX/dbox-Mails: Storage keeps breaking
| Feb 1 18:35:16 mail.err mx1 dovecot: imap(test) BXeiKq3UBgBd3DLy: Error: Mailbox INBOX: Corrupted index, uidvalidity=0
| Feb 1 18:35:16 mail.warn mx1 dovecot: imap(test) BXeiKq3UBgBd3DLy: Warning: fscking index file /.../test/storage/dovecot.map.index
| Feb 1 18:35:16 mail.warn mx1 dovecot: imap(test) BXeiKq3UBgBd3DLy: Warning: mdbox /.../test/storage: rebuilding indexes
| Feb 1 18:35:17 mail.warn mx1 dovecot: imap(test) BXeiKq3UBgBd3DLy: Warning: fscking index file /.../test/storage/dovecot.map.index
| Feb 1 18:35:17 mail.warn mx1 dovecot: imap(test) BXeiKq3UBgBd3DLy: Warning: fscking index file /.../test/storage/dovecot.map.index
| Feb 1 18:35:17 mail.warn mx1 dovecot: imap(test) BXeiKq3UBgBd3DLy: Warning: mdbox /.../test/storage: rebuilding indexes
| Feb 1 18:35:17 mail.warn mx1 dovecot: imap(test) BXeiKq3UBgBd3DLy: Warning: fscking index file /.../test/storage/dovecot.map.index
| Feb 1 18:35:17 mail.warn mx1 dovecot: imap(test) BXeiKq3UBgBd3DLy: Warning: fscking index file /.../test/storage/dovecot.map.index
| Feb 1 18:35:17 mail.warn mx1 dovecot: imap(test) BXeiKq3UBgBd3DLy: Warning: mdbox /.../test/storage: rebuilding indexes
| Feb 1 18:35:18 mail.warn mx1 dovecot: imap(test) BXeiKq3UBgBd3DLy: Warning: fscking index file /.../test/storage/dovecot.map.index
| Feb 1 18:35:18 mail.warn mx1 dovecot: imap(test) BXeiKq3UBgBd3DLy: Warning: fscking index file /.../test/storage/dovecot.map.index
| Feb 1 18:35:18 mail.warn mx1 dovecot: imap(test) BXeiKq3UBgBd3DLy: Warning: mdbox /.../test/storage: rebuilding indexes
| Feb 1 18:35:18 mail.warn mx1 dovecot: imap(test) BXeiKq3UBgBd3DLy: Warning: fscking index file
[Dovecot] dsync replication errors
Hi,

I'm trying to build a cluster of two servers with dsync replication (based on http://wiki2.dovecot.org/Replication). My test setup works fine for very simple tests: I can log in to both servers, copy a message to one of the servers, and it successfully appears in the other account. But if I try to copy a large number of messages at once to one of the accounts, my maillogs get flooded with errors (see below), the mailboxes seem to get out of sync, and messages are duplicated over and over again (I originally copied 100 messages and ended up with thousands in both mailboxes until I killed dovecot).

I'd appreciate it if someone could have a look at my config and tell me what I did wrong.

dovecot.conf of both servers; they are identical except for the target IP in mail_replica:

dovecot -n
# 2.2.beta1 (070ca24e5846+): /etc/dovecot/dovecot.conf
# OS: Linux 2.6.32-279.19.1.el6.x86_64 x86_64 CentOS release 6.3 (Final)
disable_plaintext_auth = no
mail_plugins = notify replication
namespace {
  inbox = yes
  location =
  prefix =
  separator = /
  type = private
}
passdb {
  args = /etc/dovecot/dovecot-sql.conf
  driver = sql
}
plugin {
  mail_replica = remote:vmail@192.168.23.62
}
protocols = pop3 imap
service aggregator {
  fifo_listener replication-notify-fifo {
    user = vmail
  }
  unix_listener replication-notify {
    user = vmail
  }
}
service auth {
  unix_listener auth-master {
    group = vmail
    mode = 0660
    user = vmail
  }
  user = root
}
service replicator {
  process_min_avail = 1
}
ssl = no
userdb {
  args = /etc/dovecot/dovecot-sql.conf
  driver = sql
}

Log on server1 after I copied 100 messages to an account on that server:
Jan 31 10:41:04 doco1 dovecot: imap-login: Login: user=user1, method=PLAIN, rip=192.168.23.130, lip=192.168.23.61, mpid=1432, session=OdjlbJLUmwDAqBeC
Jan 31 10:42:12 doco1 dovecot: doveadm: Error: dsync-remote(user1): Warning: Maildir /mailstore/user1/maildir: Expunged message reappeared, giving a new UID (old uid=72, file=1359625327.M621257P1432.doco1,S=2472,W=2547:2,)
Jan 31 10:42:12 doco1 dovecot: dsync-local(user1): Error: Recent flags state corrupted for mailbox INBOX
Jan 31 10:42:12 doco1 dovecot: doveadm: Error: dsync-remote(user1): Warning: Maildir /mailstore/user1/maildir: Expunged message reappeared, giving a new UID (old uid=73, file=1359625327.M740847P1432.doco1,S=2417,W=2492:2,)
Jan 31 10:42:12 doco1 dovecot: doveadm: Error: dsync-remote(user1): Warning: Maildir /mailstore/user1/maildir: Expunged message reappeared, giving a new UID (old uid=74, file=1359625328.M206735P1432.doco1,S=2400,W=2474:2,)
Jan 31 10:42:12 doco1 dovecot: doveadm: Error: dsync-remote(user1): Warning: Maildir /mailstore/user1/maildir: Expunged message reappeared, giving a new UID (old uid=75, file=1359625328.M668118P1432.doco1,S=2421,W=2496:2,)
Jan 31 10:42:12 doco1 dovecot: doveadm: Error: dsync-remote(user1): Warning: Maildir /mailstore/user1/maildir: Expunged message reappeared, giving a new UID (old uid=76, file=1359625329.M167578P1432.doco1,S=2480,W=2559:2,)
Jan 31 10:42:13 doco1 dovecot: doveadm: Error: dsync-remote(user1): Warning: Maildir /mailstore/user1/maildir: Expunged message reappeared, giving a new UID (old uid=77, file=1359625329.M520528P1432.doco1,S=2525,W=2604:2,)
Jan 31 10:42:14 doco1 dovecot: doveadm(user1): Warning: /mailstore/user1/maildir/dovecot-uidlist: Duplicate file entry at line 132: 1359625329.M520528P1432.doco1,S=2525,W=2604 (uid 77 -> 133)
Jan 31 10:42:14 doco1 dovecot: doveadm(user1): Warning: /mailstore/user1/maildir/dovecot-uidlist: Duplicate file entry at line 133: 1359625327.M621257P1432.doco1,S=2472,W=2547 (uid 72 -> 134)
Jan 31 10:42:14 doco1 dovecot: doveadm(user1): Warning: /mailstore/user1/maildir/dovecot-uidlist: Duplicate file entry at line 134: 1359625327.M740847P1432.doco1,S=2417,W=2492 (uid 73 -> 135)
Jan 31 10:42:14 doco1 dovecot: doveadm(user1): Warning: /mailstore/user1/maildir/dovecot-uidlist: Duplicate file entry at line 135: 1359625328.M206735P1432.doco1,S=2400,W=2474 (uid 74 -> 136)
Jan 31 10:42:14 doco1 dovecot: doveadm(user1): Warning: /mailstore/user1/maildir/dovecot-uidlist: Duplicate file entry at line 136: 1359625328.M668118P1432.doco1,S=2421,W=2496 (uid 75 -> 137)
Jan 31 10:42:14 doco1 dovecot: doveadm(user1): Warning: /mailstore/user1/maildir/dovecot-uidlist: Duplicate file entry at line 137: 1359625329.M167578P1432.doco1,S=2480,W=2559 (uid 76 -> 138)
Jan 31 10:42:14 doco1 dovecot: doveadm(user1): Warning: /mailstore/user1/maildir/dovecot-uidlist: Duplicate file entry at line 139: 1359625329.M782065P1432.doco1,S=2461,W=2539 (uid 78 -> 140)
Jan 31 10:42:14 doco1 dovecot: doveadm(user1): Warning: /mailstore/user1/maildir/dovecot-uidlist: Duplicate file entry at line 140: 1359625329.M973834P1432.doco1,S=2523,W=2602 (uid 79 -> 141)
Jan 31 10:42:14 doco1 dovecot: doveadm(user1): Warning: /mailstore/user1/maildir/dovecot-uidlist: Duplicate file entry at line 141:
Re: [Dovecot] dsync replication errors
On 31.1.2013, at 12.27, Oli Schacher dove...@lists.wgwh.ch wrote: I'm trying to build a cluster of two servers with dsync replication (based on http://wiki2.dovecot.org/Replication). My test setup works fine for very simple tests, I can log in to both servers, copy a message to one of the servers and it successfully appears in the other account. But, if I try to copy a large amount of messages at once to one of the accounts, my maillogs get flooded with errors (see below) and the mailboxes seem to get out of sync and messages are duplicated over and over again (I originally copied 100 messages and ended up with thousands in both mailboxes until I killed dovecot) .. Jan 31 10:42:12 doco1 dovecot: doveadm: Error: dsync-remote(user1): Warning: Maildir /mailstore/user1/maildir: Expunged message reappeared, giving a new UID (old uid=72, file=1359625327.M621257P1432.doco1,S=2472,W=2547:2,)

Looks like some bug. Possibilities:
a) Use mdbox format instead of maildir. It works better with dsync.
b) Switch to v2.2 (latest hg version). It has a rewritten dsync that works better.
Ideally do both. :)
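The file= value in the quoted warning is a maildir file name. Assuming the layout these log lines show, <time>.M<usec>P<pid>.<host>,S=<size>,W=<vsize> optionally followed by :2,<flags>, a small Python parser makes it easy to see where a "reappeared" message was originally written. For example, every file name in the original post carries host doco1 and P1432, the same PID as the mpid=1432 IMAP session that did the copy. The regular expression is an assumption fitted to these samples, not a complete maildir grammar; other writers use other layouts, which it simply rejects.

```python
import re
from typing import NamedTuple, Optional

# Fitted to names like 1359625327.M621257P1432.doco1,S=2472,W=2547:2,
# The ":2,<flags>" suffix is optional because dovecot-uidlist entries
# (as in the "Duplicate file entry" warnings) omit it.
_MAILDIR_NAME = re.compile(
    r"^(?P<time>\d+)\.M(?P<usec>\d+)P(?P<pid>\d+)\.(?P<host>[^,:]+)"
    r",S=(?P<size>\d+),W=(?P<vsize>\d+)(?::2,(?P<flags>[A-Za-z]*))?$"
)


class MaildirName(NamedTuple):
    time: int    # Unix timestamp when the file was created
    usec: int    # microseconds part of the creation time
    pid: int     # PID of the creating process
    host: str    # hostname of the creating server
    size: int    # S=: file size in bytes
    vsize: int   # W=: "virtual" size (with CRLF line endings)
    flags: str   # maildir info flags, e.g. "Sa"; "" if none


def parse_maildir_name(name: str) -> Optional[MaildirName]:
    """Split a maildir file name into its parts, or return None."""
    m = _MAILDIR_NAME.match(name)
    if m is None:
        return None
    return MaildirName(int(m["time"]), int(m["usec"]), int(m["pid"]),
                       m["host"], int(m["size"]), int(m["vsize"]),
                       m["flags"] or "")
```

The host component is also what keeps names created on two different servers from colliding, which is one more reason to give replicas distinct hostnames.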
Re: [Dovecot] dsync replication errors
a) Use mdbox format instead of maildir. It works better with dsync.
OK, I'll try that (although I was hoping I could avoid migrating all boxes on the server where I was planning to use this feature).
b) Switch to v2.2 (latest hg version). It has a rewritten dsync that works better.
The test setup is already on 2.2 hg.
Thanks
-- message transmitted on 100% recycled electrons
Re: [Dovecot] dsync replication errors
On 31.1.2013, at 14.06, Oli Schacher dove...@lists.wgwh.ch wrote: b) Switch to v2.2 (latest hg version). It has a rewritten dsync that works better. The test setup is already on 2.2 hg. Oh. But it's still beta1. There are several fixes done to dsync since beta1, including a fix for these maildir errors. I should release beta2 or maybe rc1 soon.
Re: [Dovecot] dsync replication errors
On Thu, 31 Jan 2013 14:27:08 +0200 Timo Sirainen t...@iki.fi wrote: Oh. But it's still beta1. There are several fixes done to dsync since beta1, including a fix for these maildir errors. I should release beta2 or maybe rc1 soon.

hmm.. actually I think I built it from the latest hg (but I must admit I'm not really familiar with Mercurial, so maybe I f*ckd up). dovecot -n tells me:
# 2.2.beta1 (070ca24e5846+): /etc/dovecot/dovecot.conf
and 070ca24e5846 seems to be the latest commit according to http://hg.dovecot.org/dovecot-2.2/ (14 hours ago). Not exactly sure why it says something about beta1.

I tried with mdbox now.. same problem, although I don't see "Expunged message reappeared" anymore, but still tons of these:

Server1:
Jan 31 13:38:05 doco1 dovecot: doveadm: Error: dsync-remote(user1): Error: Mailbox INBOX: Remote didn't send mail GUID=caec8e2a84650a518107960042f4 (UID=136)
Jan 31 13:38:05 doco1 dovecot: doveadm: Error: dsync-remote(user1): Error: Mailbox INBOX: Remote didn't send mail GUID=cbec8e2a84650a518107960042f4 (UID=135)
Jan 31 13:38:05 doco1 dovecot: dsync-local(user1): Error: Mailbox INBOX: Remote didn't send mail GUID=caec8e2a84650a518107960042f4 (UID=148)
Jan 31 13:38:05 doco1 dovecot: dsync-local(user1): Error: Mailbox INBOX: Remote didn't send mail GUID=caec8e2a84650a518107960042f4 (UID=156)
Jan 31 13:38:05 doco1 dovecot: dsync-local(user1): Error: Mailbox INBOX: Remote didn't send mail GUID=cbec8e2a84650a518107960042f4 (UID=147)
[...]

Server2:
Jan 31 13:38:03 doco2 dovecot: dsync-local(user1): Error: Mailbox INBOX: Remote didn't send mail GUID=caec8e2a84650a518107960042f4 (UID=80)
Jan 31 13:38:03 doco2 dovecot: dsync-local(user1): Error: Mailbox INBOX: Remote didn't send mail GUID=cbec8e2a84650a518107960042f4 (UID=79)
Jan 31 13:38:04 doco2 dovecot: doveadm: Error: dsync-remote(user1): Error: Mailbox INBOX: Remote didn't send mail GUID=d0ec8e2a84650a518107960042f4 (UID=81)
Jan 31 13:38:05 doco2 dovecot: doveadm: Error: dsync-remote(user1): Error: Mailbox INBOX: Remote didn't send mail GUID=d0ec8e2a84650a518107960042f4 (UID=119)
Jan 31 13:38:05 doco2 dovecot: doveadm: Error: dsync-remote(user1): Error: Mailbox INBOX: Remote didn't send mail GUID=d0ec8e2a84650a518107960042f4 (UID=128)
Jan 31 13:38:05 doco2 dovecot: doveadm: Error: dsync-remote(user1): Error: Mailbox INBOX: Remote didn't send mail GUID=d0ec8e2a84650a518107960042f4 (UID=130)
Jan 31 13:38:05 doco2 dovecot: doveadm: Error: dsync-remote(user1): Error: Mailbox INBOX: Remote didn't send mail GUID=d0ec8e2a84650a518107960042f4 (UID=112)
Jan 31 13:38:05 doco2 dovecot: doveadm: Error: dsync-remote(user1): Error: Mailbox INBOX: Remote didn't send mail GUID=d3ec8e2a84650a518107960042f4 (UID=133)
Jan 31 13:38:05 doco2 dovecot: doveadm: Error: dsync-remote(user1): Error: Mailbox INBOX: Remote didn't send mail GUID=d2ec8e2a84650a518107960042f4 (UID=131)
Jan 31 13:38:05 doco2 dovecot: doveadm: Error: dsync-remote(user1): Error: Mailbox INBOX: Remote didn't send mail GUID=d1ec8e2a84650a518107960042f4 (UID=132)
Jan 31 13:38:06 doco2 dovecot: dsync-local(user1): Error: Mailbox INBOX: Remote didn't send mail GUID=caec8e2a84650a518107960042f4 (UID=136)
Jan 31 13:38:06 doco2 dovecot: dsync-local(user1): Error: Mailbox INBOX: Remote didn't send mail GUID=cbec8e2a84650a518107960042f4 (UID=135)
[...]
-- message transmitted on 100% recycled electrons
Re: [Dovecot] dsync replication errors
On 31.1.2013, at 14.46, Oli Schacher dove...@lists.wgwh.ch wrote: On Thu, 31 Jan 2013 14:27:08 +0200 Timo Sirainen t...@iki.fi wrote: Oh. But it's still beta1. There are several fixes done to dsync since beta1, including a fix for these maildir errors. I should release beta2 or maybe rc1 soon. hmm.. actually I think I built it from the latest hg (but I must admit I'm not really familiar with Mercurial, so maybe I f*ckd up) dovecot -n tells me # 2.2.beta1 (070ca24e5846+): /etc/dovecot/dovecot.conf and 070ca24e5846 seems to be the latest commit according to http://hg.dovecot.org/dovecot-2.2/ (14 hours ago). Not exactly sure why it says something about beta1. So it seems. Looks like I've been browsing through your mails too quickly to pay attention. :) I tried with mdbox now.. same problem, although I don't see "Expunged message reappeared" anymore, but still tons of these: Server1: Jan 31 13:38:05 doco1 dovecot: doveadm: Error: dsync-remote(user1): Error: Mailbox INBOX: Remote didn't send mail GUID=caec8e2a84650a518107960042f4 (UID=136) But there's no duplication now and it gets fixed eventually, right? And you can easily reproduce this by simply copying 100 mails from one folder to another? I'll see if I can reproduce it.
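One way to start answering the duplication question from the logs alone is to group the "Remote didn't send mail" errors by GUID: a GUID reported under several UIDs may point at the same message being known under more than one UID. A minimal Python sketch, assuming log lines in the format quoted above; it only reads text and does not query Dovecot. Because a single server's log mixes dsync-local and dsync-remote lines, it is simplest to feed it one server's log (or one process type) at a time.

```python
import re
from collections import defaultdict

# Matches the error text quoted above, e.g.
# "Remote didn't send mail GUID=caec8e2a84650a518107960042f4 (UID=148)"
_MISSING = re.compile(
    r"Remote didn't send mail GUID=(?P<guid>[0-9a-f]+) \(UID=(?P<uid>\d+)\)"
)


def uids_per_guid(log_lines):
    """Group "Remote didn't send mail" errors by GUID.

    Returns {guid: [uid, ...]} with UIDs sorted ascending; lines that
    don't match the pattern are ignored.
    """
    seen = defaultdict(set)
    for line in log_lines:
        m = _MISSING.search(line)
        if m:
            seen[m["guid"]].add(int(m["uid"]))
    return {guid: sorted(uids) for guid, uids in seen.items()}
```

Applied to the Server1 dsync-local lines above, for instance, GUID caec8e2a84650a518107960042f4 shows up under both UID=148 and UID=156.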
Re: [Dovecot] dsync replication errors
On 31.1.2013, at 15.10, Oli Schacher dove...@lists.wgwh.ch wrote: connect Thunderbird to account user1 on server1; result: login ok, mdbox visible on disk, 0 messages in Thunderbird; copy exactly 100 messages from a spambox to user1's inbox on server1. The spambox isn't on server1? So not the IMAP COPY command, but APPEND?
Re: [Dovecot] dsync replication errors
On Thu, 31 Jan 2013 15:24:06 +0200 Timo Sirainen t...@iki.fi wrote: On 31.1.2013, at 15.10, Oli Schacher dove...@lists.wgwh.ch wrote: connect Thunderbird to account user1 on server1; result: login ok, mdbox visible on disk, 0 messages in Thunderbird; copy exactly 100 messages from a spambox to user1's inbox on server1. The spambox isn't on server1? So not the IMAP COPY command, but APPEND? Yes, APPEND: the spambox I got the messages from is on a completely different server. Sorry for not mentioning that earlier.
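Since copying between accounts on different servers reaches server1 as a series of IMAP APPEND commands rather than COPY, the scenario can be scripted without Thunderbird. A minimal Python sketch using the standard library's imaplib (IMAP4.append(mailbox, flags, date_time, message)); the host, credentials, addresses and message body are placeholders, and the helper names are hypothetical.

```python
from email.message import EmailMessage


def build_test_message(i: int, sender: str, rcpt: str) -> bytes:
    """A tiny text/plain message whose Subject is just the counter."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = rcpt
    msg["Subject"] = str(i)
    msg.set_content("test\n")
    return msg.as_bytes()


def append_test_messages(conn, mailbox: str, count: int,
                         sender: str = "root@localhost",
                         rcpt: str = "user1@localhost") -> int:
    """APPEND `count` messages to `mailbox`, one IMAP command each.

    `conn` is a logged-in imaplib.IMAP4 (or IMAP4_SSL) object, or anything
    with the same append(mailbox, flags, date_time, message) method.
    Returns how many APPENDs the server answered with OK.
    """
    ok = 0
    for i in range(1, count + 1):
        # No flags and no internal date: let the server pick both.
        typ, _data = conn.append(mailbox, None, None,
                                 build_test_message(i, sender, rcpt))
        if typ == "OK":
            ok += 1
    return ok

# Example against a test server (host and password are placeholders):
#   import imaplib
#   conn = imaplib.IMAP4("192.168.23.61")
#   conn.login("user1", "secret")
#   print(append_test_messages(conn, "INBOX", 100))
#   conn.logout()
```

Taking the connection as a parameter keeps the loop independent of imaplib, so the same function can be pointed at either replica, or at a stub when checking the script itself.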
Re: [Dovecot] dsync replication errors
On 31.1.2013, at 15.36, Oli Schacher dove...@lists.wgwh.ch wrote: On Thu, 31 Jan 2013 15:24:06 +0200 Timo Sirainen t...@iki.fi wrote: On 31.1.2013, at 15.10, Oli Schacher dove...@lists.wgwh.ch wrote: connect Thunderbird to account user1 on server1; result: login ok, mdbox visible on disk, 0 messages in Thunderbird; copy exactly 100 messages from a spambox to user1's inbox on server1. The spambox isn't on server1? So not the IMAP COPY command, but APPEND? Yes, APPEND: the spambox I got the messages from is on a completely different server. Sorry for not mentioning that earlier. See if http://hg.dovecot.org/dovecot-2.2/rev/1d88f01ba2aa helps?
Re: [Dovecot] dsync replication errors
On Thu, 31 Jan 2013 17:09:20 +0200 Timo Sirainen t...@iki.fi wrote:
> See if http://hg.dovecot.org/dovecot-2.2/rev/1d88f01ba2aa helps?

I updated to the latest hg, including the "remote cmd exit wait" update. It looks better now, but I still manage to break things :-)

# test 1: append 1000 messages with thunderbird, mdbox
- ok, no more errors, sync ok

# test 2: append only 100 messages, but use maildir again instead of mdbox.
still produces errors and starts duplicating, even saw an assertion error this time, but I can't reproduce it always

Jan 31 16:57:34 doco1 dovecot: imap-login: Login: user=user1, method=PLAIN, rip=192.168.23.130, lip=192.168.23.61, mpid=2684, session=4tper5fU8gDAqBeC
Jan 31 16:57:35 doco1 dovecot: doveadm: Error: dsync-remote(user1): Panic: file dsync-mailbox-tree-fill.c: line 72 (dsync_mailbox_tree_add): assertion failed: (status.uidvalidity != 0)
Jan 31 16:57:35 doco1 dovecot: doveadm: Error: dsync-remote(user1): Error: Raw backtrace: /usr/lib64/dovecot/libdovecot.so.0(+0x5ce8a) [0x7f65aa39de8a] -> /usr/lib64/dovecot/libdovecot.so.0(default_fatal_handler+0x32) [0x7f65aa39df72] -> /usr/lib64/dovecot/libdovecot.so.0(+0x1f55a) [0x7f65aa36055a] -> /usr/bin/doveadm(dsync_mailbox_tree_fill+0x4cf) [0x42f5cf] -> /usr/bin/doveadm(dsync_brain_mailbox_trees_init+0x180) [0x428630] -> /usr/bin/doveadm(dsync_brain_run+0x393) [0x426033] -> /usr/bin/doveadm() [0x426331] -> /usr/bin/doveadm() [0x434780] -> /usr/lib64/dovecot/libdovecot.so.0(io_loop_call_io+0x36) [0x7f65aa3aca16] -> /usr/lib64/dovecot/libdovecot.so.0(io_loop_handler_run+0xa7) [0x7f65aa3adaa7] -> /usr/lib64/dovecot/libdovecot.so.0(io_loop_run+0x28) [0x7f65aa3ac9b8] -> /usr/bin/doveadm() [0x424114] -> /usr/bin/doveadm() [0x40fe4f] -> /usr/bin/doveadm() [0x41067d] -> /usr/bin/doveadm(doveadm_mail_try_run+0x141) [0x410ba1] -> /usr/bin/doveadm(main+0x3f1) [0x417ba1] -> /lib64/libc.so.6(__libc_start_main+0xfd) [0x7f65a9fcccdd] -> /usr/bin/doveadm() [0x40f839]
Jan 31 16:57:35 doco1 dovecot: dsync-local(user1): Error: read(vmail@192.168.23.62) failed: EOF
Jan 31 16:57:35 doco1 dovecot: dsync-local(user1): Error: Remote command returned error 255
Jan 31 16:58:06 doco1 dovecot: dsync-local(user1): Error: Recent flags state corrupted for mailbox INBOX
Jan 31 16:58:06 doco1 dovecot: doveadm(user1): Warning: /mailstore/user1/maildir/dovecot-uidlist: Duplicate file entry at line 59: 1359647883.M823994P2684.doco1,S=2483,W=2562 (uid 18 -> 58)
Jan 31 16:58:06 doco1 dovecot: doveadm(user1): Warning: /mailstore/user1/maildir/dovecot-uidlist: Duplicate file entry at line 60: 1359647883.M382644P2684.doco1,S=2533,W=2610 (uid 15 -> 59)
[...]

# test 3: mdbox again, append 1000 messages with claws mail, but have thunderbird connected at the same time to both accounts while doing so.
this leads to the same problem as before (duplication, errors). I guess thunderbird wants to set a seen flag, and modifying the mailbox while it's being synced is probably a bad idea, but you never know what users are going to do :-)

Jan 31 17:13:11 doco1 dovecot: dsync-local(user1): Error: Mailbox INBOX: Remote didn't send mail GUID=33dabe0f11980a51200c960042f4 (UID=104)
Jan 31 17:13:11 doco1 dovecot: dsync-local(user1): Error: Mailbox INBOX: Remote didn't send mail GUID=33dabe0f11980a51200c960042f4 (UID=114)
Jan 31 17:13:11 doco1 dovecot: dsync-local(user1): Error: Mailbox INBOX: Remote didn't send mail GUID=33dabe0f11980a51200c960042f4 (UID=118)
Jan 31 17:13:11 doco1 dovecot: dsync-local(user1): Error: Mailbox INBOX: Remote didn't send mail GUID=33dabe0f11980a51200c960042f4 (UID=123)

Let me know if you need more info/tests.

-- 
message transmitted on 100% recycled electrons
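In test 3 the same GUID is reported missing under several different UIDs, which is consistent with the duplication Oli describes. A small Python sketch that groups these log lines by mailbox and GUID so such repeats stand out; the regular expression is written against the lines quoted in this thread and assumes mailbox names contain no spaces, and the function names are hypothetical:

```python
import re
from collections import defaultdict

# Matches e.g. "Mailbox INBOX: Remote didn't send mail GUID=<hex> (UID=104)".
MISSING_RE = re.compile(
    r"Mailbox (?P<mailbox>\S+): Remote didn't send mail "
    r"GUID=(?P<guid>[0-9a-f]+) \(UID=(?P<uid>\d+)\)"
)


def missing_mails_by_guid(lines):
    """Return {(mailbox, guid): sorted UIDs} for every matching log line."""
    found = defaultdict(set)
    for line in lines:
        m = MISSING_RE.search(line)
        if m:
            found[(m["mailbox"], m["guid"])].add(int(m["uid"]))
    return {key: sorted(uids) for key, uids in found.items()}


def repeated_guids(lines):
    """Keep only the GUIDs that were reported under more than one UID."""
    return {k: v for k, v in missing_mails_by_guid(lines).items() if len(v) > 1}
```

Any iterable of strings works as input, for example an open log file; the log path depends on the local syslog setup.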
Re: [Dovecot] dsync replication errors
On 31.1.2013, at 18.37, Oli Schacher dove...@lists.wgwh.ch wrote:
> I updated to the latest hg, including the "remote cmd exit wait" update. It looks better now, but I still manage to break things :-)
>
> # test 2: append only 100 messages, but use maildir again instead of mdbox.
> still produces errors and starts duplicating, even saw an assertion error this time, but I can't reproduce it always
>
> Jan 31 16:57:34 doco1 dovecot: imap-login: Login: user=user1, method=PLAIN, rip=192.168.23.130, lip=192.168.23.61, mpid=2684, session=4tper5fU8gDAqBeC
> Jan 31 16:57:35 doco1 dovecot: doveadm: Error: dsync-remote(user1): Panic: file dsync-mailbox-tree-fill.c: line 72 (dsync_mailbox_tree_add): assertion failed: (status.uidvalidity != 0)

http://hg.dovecot.org/dovecot-2.2/rev/86629f621fe4 should fix this crash.

The duplication happens because maildir somehow messes itself up. I guess I should look into it.

> test 3: mdbox again, append 1000 messages with claws mail, but have thunderbird connected at the same time to both accounts while doing so. this leads to the same problem as before (duplication, errors). I guess thunderbird wants to set a seen flag, and modifying the mailbox while it's being synced is probably a bad idea, but you never know what users are going to do :-)
>
> Jan 31 17:13:11 doco1 dovecot: dsync-local(user1): Error: Mailbox INBOX: Remote didn't send mail GUID=33dabe0f11980a51200c960042f4 (UID=104)

All of the clients and changes are done only to one side, not to both sides?
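On the maildir side, the visible symptom in Oli's test 2 log is the "Duplicate file entry" warning for dovecot-uidlist, where one maildir file name is listed under two UIDs. A Python sketch that extracts those entries from a log, assuming the warning format quoted in this thread; the function name is hypothetical, and the separator between the two UIDs is matched as either "->" or a bare "-":

```python
import re

# e.g. ".../dovecot-uidlist: Duplicate file entry at line 59:
#       1359647883.M823994P2684.doco1,S=2483,W=2562 (uid 18 -> 58)"
DUP_RE = re.compile(
    r"(?P<uidlist>\S+/dovecot-uidlist): Duplicate file entry at line (?P<line>\d+): "
    r"(?P<filename>\S+) \(uid (?P<old>\d+) (?:->|-) (?P<new>\d+)\)"
)


def duplicate_uidlist_entries(lines):
    """Return (uidlist path, line no., file name, old uid, new uid) per warning."""
    entries = []
    for line in lines:
        m = DUP_RE.search(line)
        if m:
            entries.append(
                (m["uidlist"], int(m["line"]), m["filename"], int(m["old"]), int(m["new"]))
            )
    return entries
```

Each tuple names a message file that has been assigned a new UID, which clients would then see as an extra copy of the same mail.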
Re: [Dovecot] dsync replication errors
On Thu, 31 Jan 2013 18:49:18 +0200 Timo Sirainen t...@iki.fi wrote:
> http://hg.dovecot.org/dovecot-2.2/rev/86629f621fe4 should fix this crash.
>
> The duplication happens because maildir somehow messes itself up. I guess I should look into it.

thanks, much appreciated!

>> test 3: mdbox again, append 1000 messages with claws mail, but have thunderbird connected at the same time to both accounts while doing so. this leads to the same problem as before (duplication, errors). I guess thunderbird wants to set a seen flag, and modifying the mailbox while it's being synced is probably a bad idea, but you never know what users are going to do :-)
>>
>> Jan 31 17:13:11 doco1 dovecot: dsync-local(user1): Error: Mailbox INBOX: Remote didn't send mail GUID=33dabe0f11980a51200c960042f4 (UID=104)
>
> All of the clients and changes are done only to one side, not to both sides?

In my previous tests I had thunderbird connected to both servers, without actually doing anything, just watching the mailbox unread counter go up. It could be that it tried to update both mailboxes. I don't know what thunderbird does in the background when you're not actually clicking on a mailbox. The errors were visible in both maillogs (server1 and server2).

But I can reproduce the problem by connecting only to server1; in that case, the errors show up in server1's log only. The current test scenario looks like this:
- both servers: empty mail store, configuration set to mdbox
- start server 1
- start server 2
- connect claws mail to server1
- connect thunderbird to server1 too
- in claws mail, copy a few hundred mails from a remote box to server1
- I can see the unread counter go up in thunderbird
- "Remote didn't send mail" errors start popping up, but only in server1's maillog this time
- mails are duplicated

In one testrun I also saw the assert failure below, but again, I can't reproduce this one:

Jan 31 18:10:11 doco1 dovecot: doveadm: Error: dsync-remote(user1): Panic: file dsync-mailbox-import.c: line 1080 (dsync_mailbox_import_change): assertion failed: (change->type == DSYNC_MAIL_CHANGE_TYPE_SAVE)
Jan 31 18:10:11 doco1 dovecot: doveadm: Error: dsync-remote(user1): Error: Raw backtrace: /usr/lib64/dovecot/libdovecot.so.0(+0x5ce8a) [0x7f0ac3602e8a] -> /usr/lib64/dovecot/libdovecot.so.0(default_fatal_handler+0x32) [0x7f0ac3602f72] -> /usr/lib64/dovecot/libdovecot.so.0(+0x1f55a) [0x7f0ac35c555a] -> /usr/bin/doveadm(dsync_mailbox_import_change+0x501) [0x42c631] -> /usr/bin/doveadm(dsync_brain_sync_mails+0x3a2) [0x4290a2] -> /usr/bin/doveadm(dsync_brain_run+0x169) [0x425e09] -> /usr/bin/doveadm() [0x426360] -> /usr/bin/doveadm() [0x434780] -> /usr/lib64/dovecot/libdovecot.so.0(io_loop_call_io+0x36) [0x7f0ac3611a16] -> /usr/lib64/dovecot/libdovecot.so.0(io_loop_handler_run+0xa7) [0x7f0ac3612aa7] -> /usr/lib64/dovecot/libdovecot.so.0(io_loop_run+0x28) [0x7f0ac36119b8] -> /usr/bin/doveadm() [0x424114] -> /usr/bin/doveadm() [0x40fe4f] -> /usr/bin/doveadm() [0x41067d] -> /usr/bin/doveadm(doveadm_mail_try_run+0x141) [0x410ba1] -> /usr/bin/doveadm(main+0x3f1) [0x417ba1] -> /lib64/libc.so.6(__libc_start_main+0xfd) [0x7f0ac3231cdd] -> /usr/bin/doveadm() [0x40f839]

-- 
message transmitted on 100% recycled electrons
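Two different assertion failures appear in this thread, each logged with the source file, line, function and failed expression. When reporting crashes like these it can help to list the distinct crash sites; a small Python sketch, assuming the `Panic: file ...: line ... (...): assertion failed: (...)` format quoted here (the function name is hypothetical):

```python
import re

PANIC_RE = re.compile(
    r"Panic: file (?P<file>\S+): line (?P<line>\d+) "
    r"\((?P<func>\w+)\): assertion failed: \((?P<expr>.*)\)\s*$"
)


def distinct_panics(lines):
    """Return sorted unique (file, line, function, expression) tuples."""
    sites = set()
    for line in lines:
        m = PANIC_RE.search(line)
        if m:
            sites.add((m["file"], int(m["line"]), m["func"], m["expr"]))
    return sorted(sites)
```

Repeated occurrences of the same assertion collapse into one entry, so an intermittent crash like the one above shows up once however often it fired.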
Re: [Dovecot] dsync replication errors
On 31.1.2013, at 19.41, Oli Schacher dove...@lists.wgwh.ch wrote:
> Jan 31 17:13:11 doco1 dovecot: dsync-local(user1): Error: Mailbox INBOX: Remote didn't send mail GUID=33dabe0f11980a51200c960042f4 (UID=104)

I guess there's some bug that causes this to happen in some situations.. But the reason for mail duplication should be fixed by: http://hg.dovecot.org/dovecot-2.2/rev/138f1c76c0ec

Except that shouldn't have been necessary. doveadm-server returns success before it has finished running dsync. Not sure why, need to debug it further.

> in one testrun I also saw the assert failure below, but again, I can't reproduce this one:
>
> Jan 31 18:10:11 doco1 dovecot: doveadm: Error: dsync-remote(user1): Panic: file dsync-mailbox-import.c: line 1080 (dsync_mailbox_import_change): assertion failed: (change->type == DSYNC_MAIL_CHANGE_TYPE_SAVE)

Related to incremental syncing. Have to debug it further also.
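Timo's diagnosis is that doveadm-server reported success before dsync had actually finished running. Dovecot is written in C and the real fixes are the linked changesets; purely to illustrate the general pattern, here is a Python sketch that reports success only after the child command has exited with status 0. `run_sync_command` and the command it runs are hypothetical:

```python
import subprocess


def run_sync_command(cmd, timeout=300.0):
    """Run cmd and report success only after it has exited with status 0.

    subprocess.run() blocks until the child process terminates, so the
    caller never sees "success" while the command is still working.
    If the timeout expires, run() kills the child and raises
    subprocess.TimeoutExpired.
    """
    result = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    if result.returncode != 0:
        # Same idea as the "Remote command returned error 255" log line:
        # a non-zero exit status is turned into an explicit failure.
        raise RuntimeError(
            f"command returned error {result.returncode}: {result.stderr.strip()}"
        )
    return result.stdout
```

The key design point is ordering: the exit status is inspected only after the process is gone, so success cannot be reported for work that is still in progress.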
Re: [Dovecot] dsync replication errors
On Thu, 2013-01-31 at 21:51 +0200, Timo Sirainen wrote:
> On 31.1.2013, at 19.41, Oli Schacher dove...@lists.wgwh.ch wrote:
>> Jan 31 17:13:11 doco1 dovecot: dsync-local(user1): Error: Mailbox INBOX: Remote didn't send mail GUID=33dabe0f11980a51200c960042f4 (UID=104)
>
> I guess there's some bug that causes this to happen in some situations.. But the reason for mail duplication should be fixed by: http://hg.dovecot.org/dovecot-2.2/rev/138f1c76c0ec
>
> Except that shouldn't have been necessary. doveadm-server returns success before it has finished running dsync. Not sure why, need to debug it further.

Fixed with a bit of a kludge: http://hg.dovecot.org/dovecot-2.2/rev/e9e6a95cea21