Re: [Dovecot] dsync replication errors

2015-09-08 Thread Timo Sirainen
On 08 Sep 2015, at 11:20, Sergey Schwartz wrote:
> 
> I use mdbox and probably have similar issue, but in my case only shared 
> mailboxes were affected.

Yes, shared mailboxes don't work nicely with replication. Replication is 
locking only the original user, so for shared mailboxes multiple dsyncs can be 
running in parallel and messing things up. A bit troublesome to fix this. I've 
had this issue happening for a couple of years now for our mails and I haven't 
bothered fixing it, so it's unlikely I'll do it anytime soon.. Although I 
haven't seen that many duplicates of the mails - just 10 or so.


Re: [Dovecot] dsync replication errors

2015-09-07 Thread Gedalya

On 02/17/2013 03:21 AM, Timo Sirainen wrote:

Although there's still some mail
duplication problem with maildir that doesn't log any errors about it.
I'm not sure why that happens.


While you're around, Timo :-)

I've had such an issue recently with 2.2.18, using Maildir, where emails 
were being replicated circularly creating more and more duplicate copies.
Replication should have been unidirectional in reality since changes 
were being made on one side only.
Nothing coherent was being logged. Only "Warning: Maildir 
/srv/mail/domains/.../Maildir: Expunged message reappeared, giving a new 
UID .. " appearing on the receiving side.
Is there any intelligence on the matter, or should I isolate this down 
and report it from scratch?


Re: [Dovecot] dsync replication errors

2015-09-07 Thread Timo Sirainen
On 08 Sep 2015, at 01:16, Gedalya wrote:
> 
> On 02/17/2013 03:21 AM, Timo Sirainen wrote:
>> Although there's still some mail
>> duplication problem with maildir that doesn't log any errors about it.
>> I'm not sure why that happens.
> 
> While you're around, Timo :-)
> 
> I've had such an issue recently with 2.2.18, using Maildir, where emails were 
> being replicated circularly creating more and more duplicate copies.
> Replication should have been unidirectional in reality since changes were 
> being made on one side only.
> Nothing coherent was being logged. Only "Warning: Maildir 
> /srv/mail/domains/.../Maildir: Expunged message reappeared, giving a new UID 
> .. " appearing on the receiving side.
> Is there any intelligence on the matter, or should I isolate this down and 
> report it from scratch?

dsync bugs usually take a lot of time to debug. Unless there's an easily 
reproducible way to break it, I try to avoid spending time on it. Also in this 
case the bug might be in Maildir code instead of dsync code.


Re: [Dovecot] dsync replication errors

2013-02-19 Thread Charles Marcus

On 2013-02-18 10:39 PM, Timo Sirainen t...@iki.fi wrote:

On 18.2.2013, at 23.50, Michael Grimm trash...@odo.in-berlin.de wrote:

With doveconf -H dovecot%9d I do end in tons of reported collisions like ...
| doveconf: Error: Duplicate host hashes: dovecot1368344 and dovecot2055005
| doveconf: Error: Duplicate host hashes: dovecot2042008 and dovecot2056918
| doveconf: Error: Duplicate host hashes: dovecot1844965 and dovecot2058312

Sure there are going to be hash collisions at some point, but I highly doubt 
you're going to create a million server Dovecot cluster. :)


I've been following this thread with interest (or mostly out of 
curiosity, as I will have no need for running multiple machines, except 
possibly to run one secondary machine as a 'hot spare'), but here I'm 
confused (and my ignorance is apparently showing)...


How are any of the above 'collisions'? The hashes are different.

--

Best regards,

*/Charles/*



Re: [Dovecot] dsync replication errors

2013-02-19 Thread Timo Sirainen
On 19.2.2013, at 13.48, Charles Marcus cmar...@media-brokers.com wrote:

 On 2013-02-18 10:39 PM, Timo Sirainen t...@iki.fi wrote:
 On 18.2.2013, at 23.50, Michael Grimm trash...@odo.in-berlin.de wrote:
 With doveconf -H dovecot%9d I do end in tons of reported collisions like 
 ...
 | doveconf: Error: Duplicate host hashes: dovecot1368344 and dovecot2055005
 | doveconf: Error: Duplicate host hashes: dovecot2042008 and dovecot2056918
 | doveconf: Error: Duplicate host hashes: dovecot1844965 and dovecot2058312
 Sure there are going to be hash collisions at some point, but I highly doubt 
 you're going to create a million server Dovecot cluster. :)
 
 I've been following this thread with interest (or mostly out of curiosity, as 
 I will have no need for running multiple machines, except possibly to run one 
 secondary machine as a 'hot spare'), but here I'm confused (and my ignorance 
 is apparently showing)...
 
 How are any of the above 'collisions'? The hashes are different.

Dovecot uses last 32 bits of SHA1 of the name. So collisions for example:

% printf dovecot1368344| sha1sum | awk '{print $1}' | cut -c 33-
bd593aec
% printf dovecot2055005| sha1sum | awk '{print $1}' | cut -c 33-
bd593aec
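
The same check can be sketched in Python. This only illustrates "last 32 bits of SHA1 of the name" as described above; the helper name host_hash is made up for this example and is not a Dovecot API.

```python
import hashlib


def host_hash(name: str) -> str:
    """Return the last 32 bits of SHA-1(name) as 8 hex digits.

    Hypothetical helper, not part of Dovecot. It mirrors the
    `sha1sum | cut -c 33-` pipeline shown above.
    """
    digest = hashlib.sha1(name.encode("ascii")).hexdigest()  # 40 hex characters
    return digest[32:]  # characters 33..40, i.e. the last 4 bytes


if __name__ == "__main__":
    # The two names reported as duplicates by doveconf in this thread.
    for name in ("dovecot1368344", "dovecot2055005"):
        print(name, host_hash(name))
```

Two names count as a collision when the printed 8-digit values are equal, even though the names themselves differ.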



Re: [Dovecot] dsync replication errors

2013-02-18 Thread Michael Grimm
On 18.02.2013, at 07:49, Timo Sirainen t...@iki.fi wrote:
 On Sun, 2013-02-17 at 12:30 +0200, Timo Sirainen wrote:

 (So yeah, ideally there should be checks for detecting hostname hash 
 collisions..)
 
 Added to v2.2 hg:
 
 % doveconf -H dovecot%2d
 No duplicate host hashes in dovecot0 .. dovecot99

With doveconf -H dovecot%9d I do end in tons of reported collisions like ...
| doveconf: Error: Duplicate host hashes: dovecot1368344 and dovecot2055005
| doveconf: Error: Duplicate host hashes: dovecot2042008 and dovecot2056918
| doveconf: Error: Duplicate host hashes: dovecot1844965 and dovecot2058312

(No wonder, I am running 2.1 replicator with identical local hostnames for some 
time now.)

... and ending with:
| Killed
 
 doveconf -H without the template it attempts to detect it from the
 current hostname.

mail doveconf -H
doveconf: Fatal: Hostname 'xxx.yyy.tld' has no digits, can't verify

JFTR and regards,
Michael



Re: [Dovecot] dsync replication errors

2013-02-18 Thread Michael Grimm
On 18.02.2013, at 07:07, Timo Sirainen t...@iki.fi wrote:
 On 17.2.2013, at 22.04, Michael Grimm trash...@odo.in-berlin.de wrote:

 First of all: whenever you referred to hostname in this thread you have 
 been using it as a synonym for the local part [1] of a FQDN, right?
 
 I mean what gethostname() function returns, which is what hostname command 
 usually also returns. And yes, I think it's the local part always.

I am not familiar with the gethostname() function within FreeBSD, but the 
hostname command normally returns your FQDN, if set. That has been the case 
because I didn't configure my service jails with FQDNs, thus a hostname 
couldn't return something else than the local hostname.  

 Given that all my interpretations of your statements are correct I do have 
 difficulties in understanding why a generic communication between Dovecot 
 servers should be limited to enforcing different local parts of all Dovecot 
 servers implied instead of different FQDN? That would make much more sense 
 regarding uniqueness in hostnames, IMHO. Two servers like 
 dovecot.forget-about.it and dovecot.you-name.it should be able to 
 communicate generically, again: IMHO.
 
 I think systems named those would belong to different clusters and wouldn't 
 need to communicate with each others.

Well, now I do understand my misunderstanding: I did consider replication 
between different clusters a generic communication between Dovecot servers, 
as well.

 I looked through the code. The hostname (without domain) are currently used 
 for:
 
 * maildir filenames
 * temporary filenames
 * authentication challenge strings in some auth mechanisms
 * logging
 
 So I think the hostname uniqueness matters mainly when using a shared 
 filesystem (e.g. NFS).

So, I'm confident that I may stick to identical local hostnames regarding both 
servers of mine.

Thanks and with kind regards,
Michael

Re: [Dovecot] dsync replication errors

2013-02-18 Thread Timo Sirainen
On 18.2.2013, at 23.50, Michael Grimm trash...@odo.in-berlin.de wrote:

 % doveconf -H dovecot%2d
 No duplicate host hashes in dovecot0 .. dovecot99
 
 With doveconf -H dovecot%9d I do end in tons of reported collisions like ...
 | doveconf: Error: Duplicate host hashes: dovecot1368344 and dovecot2055005
 | doveconf: Error: Duplicate host hashes: dovecot2042008 and dovecot2056918
 | doveconf: Error: Duplicate host hashes: dovecot1844965 and dovecot2058312

Sure there are going to be hash collisions at some point, but I highly doubt 
you're going to create a million server Dovecot cluster. :)
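
How quickly collisions appear can be estimated with the usual birthday approximation. The sketch below assumes the 32-bit hashes are uniformly distributed, which is an assumption and not something stated in this thread; the function names are made up for the example.

```python
import math

HASH_BITS = 32  # the thread states that 32 bits of the SHA-1 are kept


def expected_colliding_pairs(n: int, bits: int = HASH_BITS) -> float:
    """Expected number of name pairs sharing a hash, assuming uniform hashes."""
    return n * (n - 1) / 2 / 2**bits


def collision_probability(n: int, bits: int = HASH_BITS) -> float:
    """Approximate chance of at least one collision among n names."""
    return -math.expm1(-expected_colliding_pairs(n, bits))


if __name__ == "__main__":
    for n in (100, 10_000, 1_000_000):
        print(n, expected_colliding_pairs(n), collision_probability(n))
```

Under this approximation, a hundred hostnames collide with a probability on the order of one in a million, while scanning millions of candidate names, as a `%9d` template does, is expected to turn up many colliding pairs.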



Re: [Dovecot] dsync replication errors

2013-02-17 Thread Timo Sirainen
On Sat, 2013-02-16 at 19:32 +0100, Oli Schacher wrote:

 There seems to be an issue left when expunging a large amount of
 messages from the Trash. I managed to get it twice so far by expunging
 ~3k messages. I'll try to create a reproducible test script for this
 scenario. I can currently only provide my clicking around log output.
 Version is current hg, e63d1cf19ec7.
 
 First time it happened:
 Feb 16 18:49:48 doco2 dovecot: imap(user1): Warning: Maildir 
 /mailstore/user1/maildir/.Trash: Expunged message reappeared, giving a new 
 UID (old uid=1221, file=1361035457.M728795P6220.doco1,S=2476,W=2555:2,Sa)

These errors should be gone now in hg. Although there's still some mail
duplication problem with maildir that doesn't log any errors about it.
I'm not sure why that happens.

 Feb 16 18:50:14 doco2 dovecot: doveadm: Error: dsync-remote(user1): Error: 
 dsync(local): Received unexpected input S != H

Fixed also this error that happened on locking failure.

 Feb 16 19:13:08 doco1 dovecot: doveadm: Error: dsync-remote(user1): Panic: 
 file mail-transaction-log-view.c: line 72 (mail_transaction_log_view_set): 
 assertion failed: (min_file_seq <= max_file_seq)

Not sure about this one. But usually this happens only once and retry
works.




Re: [Dovecot] dsync replication errors

2013-02-17 Thread Michael Grimm
On 17.02.2013, at 06:23, Timo Sirainen t...@iki.fi wrote:
 On 17.2.2013, at 7.06, Timo Sirainen t...@iki.fi wrote:
 On 17.2.2013, at 0.12, Michael Grimm trash...@odo.in-berlin.de wrote:

 Hmm. Both jails run at distinct servers. ssh replication uses different 
 domains, though. But, both jails are named identically test, and both 
 jails resolve to identical hostnames test if using hostname. But, a 
 hostname -f is lacking to return test.mx1.invalid and 
 test.mx2.invalid, respectively (although a nslookup test does). Hmm, do 
 you think I should need to provide different hostnames in both jails? 
 
 That's the problem most likely. I'd guess Dovecot sees both servers as 
 having test as the hostname and each server thinks it's the one that 
 should be doing the locking and not the other.
 
 See if this helps: http://hg.dovecot.org/dovecot-2.2/rev/e7aabd79c9d5
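
 Why identical names can defeat such a scheme is easy to show with a toy model. The rule below (the side whose name sorts first, or ties, takes the lock) is purely hypothetical and is NOT Dovecot's actual locking code; it only illustrates how a decision based on comparing names breaks down when both names are equal.

```python
from typing import List


def takes_lock(local_name: str, remote_name: str) -> bool:
    """Hypothetical rule: the side whose name sorts first (or ties) locks."""
    return local_name <= remote_name


def lock_holders(name_a: str, name_b: str) -> List[str]:
    """Return the sides that believe they are responsible for locking."""
    holders = []
    if takes_lock(name_a, name_b):
        holders.append("A:" + name_a)
    if takes_lock(name_b, name_a):
        holders.append("B:" + name_b)
    return holders


if __name__ == "__main__":
    # Identical short hostnames: both sides reach the same conclusion.
    print(lock_holders("test", "test"))
    # Distinct names: exactly one side is responsible.
    print(lock_holders("test.mx1.invalid", "test.mx2.invalid"))
```

 With two identical names both sides decide they are the locker; with distinct names only one does.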

Good news! Those identical hostnames at both servers broke replicator. Now, 
with v2.2.beta1 (1dd1e88ba0a2), I can no longer break replicator, no matter how 
many messages I inject at both servers simultaneously. (Tested a couple of 
times with up to 2000 mails at every server.)

 Although even if it does, other parts of Dovecot still use only the hostname 
 part to guarantee global uniqueness of things. So better to have unique 
 hostnames.

What parts of Dovecot would be involved? I'm curious because my production 
mailservers use identical hostnames in their jails ever since running Dovecot 
(starting 1.x).

Thanks for the new replicator code, I really appreciate your work! And, from my 
point of view I will consider replicator v2.2 ready for production.

With kind regards,
Michael



Re: [Dovecot] dsync replication errors

2013-02-17 Thread Timo Sirainen
On Sun, 2013-02-17 at 10:44 +0100, Michael Grimm wrote:

  Although even if it does, other parts of Dovecot still use only the 
  hostname part to guarantee global uniqueness of things. So better to have 
  unique hostnames.
 
 What parts of Dovecot would be involved? I'm curious because my production 
 mailservers use identical hostnames in their jails ever since running Dovecot 
 (starting 1.x).

Mainly that maildir filenames are used as GUIDs. If two have the same
name, they are assumed to be identical. That's why the maildir filenames
include the hostname in them, to make sure that the GUID is different
even if two mails happen to be delivered at exactly the same time with
the same PID and same size to two different servers. So pretty unlikely,
but better to be safe. :)
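
To make the components concrete, here is a small sketch that splits a maildir filename quoted elsewhere in this thread into its parts. The pattern is an assumption based on that one example (in particular, reading the M field as microseconds); it is not taken from Dovecot's source, and parse_maildir_name is a made-up helper.

```python
import re
from typing import NamedTuple, Optional


class MaildirName(NamedTuple):
    timestamp: int       # delivery time, seconds since the epoch
    microseconds: int    # assumed meaning of the M field
    pid: int             # delivering process ID (P field)
    hostname: str        # delivering host, the part that must be unique
    size: Optional[int]  # S= field, if present


# Assumed layout: <time>.M<usec>P<pid>.<hostname>[,S=<size>]...
_PATTERN = re.compile(
    r"^(?P<ts>\d+)\.M(?P<usec>\d+)P(?P<pid>\d+)\.(?P<host>[^,:]+)"
    r"(?:,S=(?P<size>\d+))?"
)


def parse_maildir_name(filename: str) -> MaildirName:
    """Split a maildir filename into the fields that make it unique."""
    match = _PATTERN.match(filename)
    if match is None:
        raise ValueError("unrecognised maildir filename: %r" % filename)
    size = match.group("size")
    return MaildirName(
        timestamp=int(match.group("ts")),
        microseconds=int(match.group("usec")),
        pid=int(match.group("pid")),
        hostname=match.group("host"),
        size=int(size) if size is not None else None,
    )


if __name__ == "__main__":
    print(parse_maildir_name("1361035457.M728795P6220.doco1,S=2476,W=2555:2,Sa"))
```

If two servers share a hostname, the hostname field no longer distinguishes their deliveries, and uniqueness rests on time, PID and size alone.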

There may be some other features that require unique hostnames in
future. Anything where multiple Dovecot servers need to communicate
between each others. If some day there is such generic communication
between Dovecot servers I'm planning on enforcing this requirement.




Re: [Dovecot] dsync replication errors

2013-02-17 Thread Michael Grimm
On 17.02.2013, at 11:08, Timo Sirainen t...@iki.fi wrote:
 On Sun, 2013-02-17 at 10:44 +0100, Michael Grimm wrote:

 Although even if it does, other parts of Dovecot still use only the 
 hostname part to guarantee global uniqueness of things. So better to have 
 unique hostnames.
 
 What parts of Dovecot would be involved? I'm curious because my production 
 mailservers use identical hostnames in their jails ever since running 
 Dovecot (starting 1.x).
 
 Mainly that maildir filenames are used as GUIDs. If two have the same
 name, they are assumed to be identical. That's why the maildir filenames
 include the hostname in them, to make sure that the GUID is different
 even if two mails happen to be delivered at exactly the same time with
 the same PID and same size to two different servers. So pretty unlikely,
 but better to be safe. :)

Ok, that won't hit me for the time being because I am using mdbox.

 There may be some other features that require unique hostnames in
 future. Anything where multiple Dovecot servers need to communicate
 between each others. If some day there is such generic communication
 between Dovecot servers I'm planning on enforcing this requirement.

Thanks for that clarification. Thus I will need to think about different 
hostnames, although that implies no more just copying config files between 
both servers that imply identical hostnames at both sites ;-)

Regards,
Michael



Re: [Dovecot] dsync replication errors

2013-02-17 Thread Timo Sirainen
On 17.2.2013, at 12.19, Michael Grimm trash...@odo.in-berlin.de wrote:

 On 17.02.2013, at 11:08, Timo Sirainen t...@iki.fi wrote:
 On Sun, 2013-02-17 at 10:44 +0100, Michael Grimm wrote:
 
 Although even if it does, other parts of Dovecot still use only the 
 hostname part to guarantee global uniqueness of things. So better to have 
 unique hostnames.
 
 What parts of Dovecot would be involved? I'm curious because my production 
 mailservers use identical hostnames in their jails ever since running 
 Dovecot (starting 1.x).
 
 Mainly that maildir filenames are used as GUIDs. If two have the same
 name, they are assumed to be identical. That's why the maildir filenames
 include the hostname in them, to make sure that the GUID is different
 even if two mails happen to be delivered at exactly the same time with
 the same PID and same size to two different servers. So pretty unlikely,
 but better to be safe. :)
 
 Ok, that won't hit me for the time being because I am using mdbox.

It's basically the same with mdbox, except instead of using actual hostname 
it's using a 32bit hash of it. (So yeah, ideally there should be checks for 
detecting hostname hash collisions..)



Re: [Dovecot] dsync replication errors

2013-02-17 Thread Reindl Harald


Am 17.02.2013 11:08, schrieb Timo Sirainen:
 What parts of Dovecot would be involved? I'm curious because my production 
 mailservers use identical hostnames in their jails ever since running 
 Dovecot (starting 1.x).
 
 Mainly that maildir filenames are used as GUIDs. If two have the same
 name, they are assumed to be identical. That's why the maildir filenames
 include the hostname in them, to make sure that the GUID is different
 even if two mails happen to be delivered at exactly the same time with
 the same PID and same size to two different servers. So pretty unlikely,
 but better to be safe. :)
 
 There may be some other features that require unique hostnames in
 future. Anything where multiple Dovecot servers need to communicate
 between each others. If some day there is such generic communication
 between Dovecot servers I'm planning on enforcing this requirement.

Postfix has been enforcing this since forever
("greeted me with my own hostname")

hostnames inside a network should always be unique





Re: [Dovecot] dsync replication errors

2013-02-17 Thread Michael Grimm
On 17.02.2013, at 11:08, Timo Sirainen t...@iki.fi wrote:

 There may be some other features that require unique hostnames in
 future. Anything where multiple Dovecot servers need to communicate
 between each others.

I'd like to come back to that issue in order to understand your statement cited 
below.

First of all: whenever you referred to hostname in this thread you have been 
using it as a synonym for the local part [1] of a FQDN, right?

I have both servers of mine configured to use identical local parts (test) 
but different FQDN (aka test.domainA.tldA and test.domainB.tldB). Your fix 
has been to replace my_hostname by my_hostdomain(), thus using 
test.domainA.tldA and test.domainB.tldB instead of test, right?

 If some day there is such generic communication between Dovecot servers
 I'm planning on enforcing this requirement.

Given that all my interpretations of your statements are correct I do have 
difficulties in understanding why a generic communication between Dovecot 
servers should be limited to enforcing different local parts of all Dovecot 
servers implied instead of different FQDN? That would make much more sense 
regarding uniqueness in hostnames, IMHO. Two servers like 
dovecot.forget-about.it and dovecot.you-name.it should be able to 
communicate generically, again: IMHO.

BTW: I had defined hostname= in dovecot.conf identically using completely 
different *but* identical FQDNs mail.my-domain.tld because of:

| conf.d/15-lda.conf:

| # Hostname to use in various parts of sent mails, eg. in Message-Id.
| # Default is the system's real hostname.
| #hostname = 

At least my_hostdomain() doesn't care about that setting, right? 

Again, I can live with mandatory different local hostname parts, but I would 
love to understand why ...

With kind regards,
Michael


[1] http://en.wikipedia.org/wiki/Hostname

Re: [Dovecot] dsync replication errors

2013-02-17 Thread Reindl Harald


Am 17.02.2013 21:04, schrieb Michael Grimm:
 On 17.02.2013, at 11:08, Timo Sirainen t...@iki.fi wrote:
 
 There may be some other features that require unique hostnames in
 future. Anything where multiple Dovecot servers need to communicate
 between each others.
 
 I'd like to come back to that issue in order to understand your statement 
 cited below.
 
 First of all: whenever you referred to hostname in this thread you have 
 been using it as a synonym for the local part [1] of a FQDN, right?
 
 I have both servers of mine configured to use identical local parts (test) 
 but different FQDN (aka test.domainA.tldA and test.domainB.tldB). Your 
 fix has been to replace my_hostname by my_hostdomain(), thus using 
 test.domainA.tldA and test.domainB.tldB instead of test, right?
 
 If some day there is such generic communication between Dovecot servers
 I'm planning on enforcing this requirement.
 
 Given that all my interpretations of your statements are correct I do have 
 difficulties in understanding why a generic communication between Dovecot 
 servers should be limited to enforcing different local parts of all Dovecot 
 servers implied instead of different FQDN? That would make much more sense 
 regarding uniqueness in hostnames, IMHO. Two servers like 
 dovecot.forget-about.it and dovecot.you-name.it should be able to 
 communicate generically, again: IMHO.


the better design would be if dovecot generated a UUID at first startup
in /etc/dovecot/uuid.conf if the file does not exist, because that would
make hostnames meaningless altogether AND give you the option, if you
know what you are doing, to replace a machine with a newer one by rsyncing
the datadirs and the whole /etc/dovecot/





Re: [Dovecot] dsync replication errors

2013-02-17 Thread Michael Grimm
On 17.02.2013, at 21:04, Michael Grimm trash...@odo.in-berlin.de wrote:

 BTW: I had defined hostname= in dovecot.conf identically using 
 completely different *but* identical FQDNs mail.my-domain.tld because of:
 
s/using completely different/using completely different to locally reported by 
resolver/g

Regards,
Michael



Re: [Dovecot] dsync replication errors

2013-02-17 Thread Timo Sirainen
On 17.2.2013, at 22.04, Michael Grimm trash...@odo.in-berlin.de wrote:

 On 17.02.2013, at 11:08, Timo Sirainen t...@iki.fi wrote:
 
 There may be some other features that require unique hostnames in
 future. Anything where multiple Dovecot servers need to communicate
 between each others.
 
 I'd like to come back to that issue in order to understand your statement 
 cited below.
 
 First of all: whenever you referred to hostname in this thread you have 
 been using it as a synonym for the local part [1] of a FQDN, right?

I mean what gethostname() function returns, which is what hostname command 
usually also returns. And yes, I think it's the local part always.
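
For anyone who wants to compare the two notions on their own machines, here is a small Python sketch. It uses the standard socket module, which wraps the same system calls; the helper names are made up for the example, and what getfqdn() returns depends on the local resolver setup.

```python
import socket
from typing import Dict


def local_part(hostname: str) -> str:
    """Return the part of a hostname before the first dot."""
    return hostname.split(".", 1)[0]


def describe_host() -> Dict[str, str]:
    """Collect the names this machine reports for itself."""
    name = socket.gethostname()  # what gethostname() returns
    return {
        "gethostname": name,
        "local_part": local_part(name),
        "fqdn": socket.getfqdn(),  # resolver-dependent
    }
```

Calling describe_host() in each jail shows whether the two servers differ only in the domain part, as in the test.mx1.invalid / test.mx2.invalid case discussed in this thread.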

 I have both servers of mine configured to use identical local parts (test) 
 but different FQDN (aka test.domainA.tldA and test.domainB.tldB). Your 
 fix has been to replace my_hostname by my_hostdomain(), thus using 
 test.domainA.tldA and test.domainB.tldB instead of test, right?

Yes.

 If some day there is such generic communication between Dovecot servers
 I'm planning on enforcing this requirement.
 
 Given that all my interpretations of your statements are correct I do have 
 difficulties in understanding why a generic communication between Dovecot 
 servers should be limited to enforcing different local parts of all Dovecot 
 servers implied instead of different FQDN? That would make much more sense 
 regarding uniqueness in hostnames, IMHO. Two servers like 
 dovecot.forget-about.it and dovecot.you-name.it should be able to 
 communicate generically, again: IMHO.

I think systems named those would belong to different clusters and wouldn't 
need to communicate with each others.

I looked through the code. The hostname (without domain) are currently used for:

 * maildir filenames
 * temporary filenames
 * authentication challenge strings in some auth mechanisms
 * logging

So I think the hostname uniqueness matters mainly when using a shared 
filesystem (e.g. NFS).

 BTW: I had defined hostname= in dovecot.conf identically using 
 completely different *but* identical FQDNs mail.my-domain.tld because of:
 
 | conf.d/15-lda.conf:
 
 | # Hostname to use in various parts of sent mails, eg. in Message-Id.
 | # Default is the system's real hostname.
 | #hostname = 
 
 At least my_hostdomain() doesn't care about that setting, right? 

Right. I updated the comment a bit: 
http://hg.dovecot.org/dovecot-2.2/rev/6a67a1440e15

lda_hostname would have been a better name for the settings.



Re: [Dovecot] dsync replication errors

2013-02-17 Thread Timo Sirainen
On Sun, 2013-02-17 at 12:30 +0200, Timo Sirainen wrote:

 (So yeah, ideally there should be checks for detecting hostname hash 
 collisions..)

Added to v2.2 hg:

% doveconf -H dovecot%d
No duplicate host hashes in dovecot0 .. dovecot9
% doveconf -H dovecot%2d
No duplicate host hashes in dovecot0 .. dovecot99
% doveconf -H dovecot%02d
No duplicate host hashes in dovecot00 .. dovecot99

doveconf -H without the template it attempts to detect it from the
current hostname.
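
A rough Python sketch of what such a check involves, for readers without a v2.2 build at hand. It is not doveconf's implementation: the template handling is inferred only from the three example outputs above, the hash is the "last 32 bits of SHA1" described elsewhere in this thread, and the function names are made up.

```python
import hashlib
import re
from typing import Dict, Iterable, Iterator, List, Tuple


def expand(template: str) -> Iterator[str]:
    """Yield host names for a template containing one %d, %Nd or %0Nd."""
    match = re.search(r"%(0?)(\d*)d", template)
    if match is None:
        raise ValueError("template needs a %d placeholder")
    zero_pad = match.group(1) == "0"
    width = int(match.group(2) or 1)
    for i in range(10 ** width):
        # Pad with zeros only for %0Nd, matching the outputs shown above.
        number = str(i).zfill(width) if zero_pad else str(i)
        yield template[:match.start()] + number + template[match.end():]


def find_duplicates(names: Iterable[str]) -> List[Tuple[str, str]]:
    """Return pairs of names whose 32-bit hashes are equal."""
    seen: Dict[str, str] = {}
    duplicates = []
    for name in names:
        key = hashlib.sha1(name.encode("ascii")).hexdigest()[32:]
        if key in seen:
            duplicates.append((seen[key], name))
        else:
            seen[key] = name
    return duplicates


if __name__ == "__main__":
    print(find_duplicates(expand("dovecot%02d")))
```

Note that this sketch keeps one dictionary entry per name, so wide templates need a correspondingly large amount of memory.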




Re: [Dovecot] dsync replication errors

2013-02-16 Thread Timo Sirainen
I did a bunch of dsync fixes today in hg. With the new locking behavior
(and other fixes) you shouldn't be able to break it anymore.

On Fri, 2013-02-01 at 21:53 +0100, Michael Grimm wrote:
 [Sorry Oli for my previous mail to your address, only. Resent here]
 
 Oli Schacher dove...@lists.wgwh.ch wrote:
 
  There still seems to be a problem when changes to both mailboxes at
  the same time are involved
 
 I can confirm your observation, although triggered by a different test
 scenario, similar to the one I did use with 2.1 replicator before
 (http://www.dovecot.org/list/dovecot/2012-March/064354.html).
 
 This is v2.2.beta1 (78bdcb6642c7) with freshly created mailboxes test
 at both servers mx1 and mx2, and replicator uses ssh for remote
 access. Both servers run a recent postfix, use lmtp for local delivery,
 and test is a virtual user.
 
 Test script to produce local testmails of equal size at mx1:
 | #!/bin/csh
 | set INDEX    = 101
 | set endINDEX = 200
 | while ( $INDEX <= $endINDEX )
 |    echo $INDEX
 |    echo test | mail -s $INDEX test@mx1
 |    if ( $INDEX % 1000 == 0 ) then
 |       sleep 1
 |    endif
 |    @ INDEX = $INDEX + 1
 | end
 | exit 0
 
 Test script to produce testmails of equal size at mx2:
 | #!/bin/csh
 | set INDEX    = 1101
 | set endINDEX = 1200
 | while ( $INDEX <= $endINDEX )
 |    echo $INDEX
 |    echo test | mail -s $INDEX test@mx2
 |    if ( $INDEX % 1000 == 0 ) then
 |       sleep 1
 |    endif
 |    @ INDEX = $INDEX + 1
 | end
 | exit 0
 
 All tests are run with vanilla mailboxes, after restarting dovecot, and
 without imap connections by MUA:
 
 1) Simultaneous mailbomb approach: run both scripts simultaneously, and
   you'll end up with numerous duplicates in mailboxes test. Very often
   you'll find multiples.
 
 2) Mailbomb approach: run one script at one server only, and all mails
   will become perfectly well synchronised.
 
 3) Modify both scripts to ( $INDEX % 1 == 0 ) to add a second waiting
   between every mail injection, and run them simultaneously at both
   servers, and you'll end up with significantly less duplicates and no
   more multiples.
 
  Feb  1 07:12:52 doco1 dovecot: dsync-local(user1): Error: Mailbox INBOX: 
  Remote didn't send mail GUID=7a30ff22af5b0b510f0c960042f4 (UID=211)
 
  Feb  1 07:12:54 doco2 dovecot: dsync-local(user1): Error: Importing mailbox 
  INBOX failed
 
  Feb  1 07:13:24 doco2 dovecot: dsync-local(user1): Error: Remote command 
  process isn't dying, killing it
 
 I do see those error messages as well, and in addition numerous of those:
 
 | dovecot: dsync-local(test): Error: Mailbox INBOX: Unexpected GUID mismatch 
 for UID=7153: 82c5df0a4ffa0b5141e36a0d5a02 != 
 29cc9f284ffa0b5141c236abecbd
 
 | doveadm: Error: dsync-remote(test): Error: Mailbox INBOX: Unexpected GUID 
 mismatch for UID=7153: 82c5df0a4ffa0b5141e36a0d5a02 != 
 29cc9f284ffa0b5141c236abecbd
 
 | dovecot: lmtp(49752, test): Error: Corrupted index cache file 
 /.../test/mailboxes/INBOX/dbox-Mails/dovecot.index.cache: File too small
 
 | Feb  1 18:35:16 mail.err  mx1 dovecot: imap(test) BXeiKq3UBgBd3DLy: 
 Error: Mailbox INBOX: Corrupted index, uidvalidity=0
 | Feb  1 18:35:16 mail.warn mx1 dovecot: imap(test) BXeiKq3UBgBd3DLy: 
 Warning: fscking index file /.../test/storage/dovecot.map.index
 | Feb  1 18:35:16 mail.err  mx1 dovecot: imap(test) BXeiKq3UBgBd3DLy: 
 Error: Mailbox INBOX: Corrupted index, uidvalidity=0
 | Feb  1 18:35:16 mail.err  mx1 dovecot: imap(test) BXeiKq3UBgBd3DLy: 
 Error: mdbox /.../test/mailboxes/INBOX/dbox-Mails: Storage keeps breaking
 | Feb  1 18:35:16 mail.err  mx1 dovecot: imap(test) BXeiKq3UBgBd3DLy: 
 Error: Mailbox INBOX: Corrupted index, uidvalidity=0
 | Feb  1 18:35:16 mail.warn mx1 dovecot: imap(test) BXeiKq3UBgBd3DLy: 
 Warning: fscking index file /.../test/storage/dovecot.map.index
 | Feb  1 18:35:16 mail.warn mx1 dovecot: imap(test) BXeiKq3UBgBd3DLy: 
 Warning: mdbox /.../test/storage: rebuilding indexes
 | Feb  1 18:35:17 mail.warn mx1 dovecot: imap(test) BXeiKq3UBgBd3DLy: 
 Warning: fscking index file /.../test/storage/dovecot.map.index
 | Feb  1 18:35:17 mail.warn mx1 dovecot: imap(test) BXeiKq3UBgBd3DLy: 
 Warning: fscking index file /.../test/storage/dovecot.map.index
 | Feb  1 18:35:17 mail.warn mx1 dovecot: imap(test) BXeiKq3UBgBd3DLy: 
 Warning: mdbox /.../test/storage: rebuilding indexes
 | Feb  1 18:35:17 mail.warn mx1 dovecot: imap(test) BXeiKq3UBgBd3DLy: 
 Warning: fscking index file /.../test/storage/dovecot.map.index
 | Feb  1 18:35:17 mail.warn mx1 dovecot: imap(test) BXeiKq3UBgBd3DLy: 
 Warning: fscking index file /.../test/storage/dovecot.map.index
 | Feb  1 18:35:17 mail.warn mx1 dovecot: imap(test) BXeiKq3UBgBd3DLy: 
 Warning: mdbox /.../test/storage: rebuilding indexes
 | Feb  1 18:35:18 mail.warn mx1 dovecot: imap(test) BXeiKq3UBgBd3DLy: 
 Warning: fscking index file /.../test/storage/dovecot.map.index
 | Feb  1 18:35:18 mail.warn mx1 dovecot: imap(test) BXeiKq3UBgBd3DLy: 
 Warning: fscking 

Re: [Dovecot] dsync replication errors

2013-02-16 Thread Michael Grimm
Timo Sirainen t...@iki.fi wrote:

 I did a bunch of dsync fixes today in hg. With the new locking
 behavior (and other fixes) you shouldn't be able to break it anymore.

Sorry to say, but I am still able to break replicator with v2.2.beta1
(35194cf0693e) under the conditions outlined below.

 On 2013-02-01 Michael Grimm wrote:
 
 This is v2.2.beta1 (78bdcb6642c7) with freshly created mailboxes test
 at both servers mx1 and mx2, and replicator uses ssh for remote
 access. Both servers run a recent postfix, use lmtp for local delivery,
 and test is a virtual user.

I might add that both servers run inside FreeBSD jails (if that might make
the difference to your test setup).

 All tests are run with vanilla mailboxes, after restarting dovecot, and
 without imap connections by MUA:

This time I did even restart both service jails before every test. And, I
did use both Mail.app and roundcube as MUA to check the results (if Mail.app
might have screwed INBOX ...)

 1) Simultaneous mailbomb approach: run both scripts simultaneously, and
   you'll end up with numerous duplicates in mailboxes test. Very often
   you'll find multiples.

Still a lot of duplicates and multiples. Those numbers are not reproducible,
240 (best case) up to 340 (worst case) instead of 200 messages (after 10 
tests).
 
Here is one logfile example of a triplicated mail injected at mx1:

logfile at mx1:
| Feb 16 19:03:12 mail.info mx1 postfix/pickup[33958]: 3Z7fMh1PYMz5Ng: uid=0 
from=root
| Feb 16 19:03:12 mail.info mx1 postfix/cleanup[34320]: 3Z7fMh1PYMz5Ng: 
message-id=3Z7fMh1PYMz5Ng@test.mx1.invalid
| Feb 16 19:03:12 mail.info mx1 postfix/qmgr[33959]: 3Z7fMh1PYMz5Ng: 
from=root@mx1.invalid, size=310, nrcpt=1 (queue active)
| Feb 16 19:03:12 mail.info mx1 dovecot: lmtp(34456, test): copy from : 
box=INBOX, uid=12, msgid=3Z7fMh1PYMz5Ng@test.mx1.invalid, size=544, 
from=root@mx1.invalid (admin), flags=()
| Feb 16 19:03:12 mail.info mx1 dovecot: lmtp(34456, test): 
nVlIDeDJH1GYhgAAag1aAg: sieve: msgid=3Z7fMh1PYMz5Ng@test.mx1.invalid: stored 
mail into mailbox 'INBOX'
| Feb 16 19:03:12 mail.info mx1 postfix/lmtp[34453]: 3Z7fMh1PYMz5Ng: 
to=test@mx1.invalid, orig_to=tt@mx1.invalid, 
relay=test.mx1.invalid[private/dovecot-lmtp], delay=0.29, delays=0.08/0/0/0.21, 
dsn=2.0.0, status=sent (250 2.0.0 test@mx1.invalid nVlIDeDJH1GYhgAAag1aAg 
Saved)
| Feb 16 19:03:12 mail.info mx1 postfix/qmgr[33959]: 3Z7fMh1PYMz5Ng: removed
| Feb 16 19:03:13 mail.info mx1 dovecot: dsync-local(test): copy from INBOX: 
box=INBOX, uid=42, msgid=3Z7fMh1PYMz5Ng@test.mx1.invalid, size=544, 
from=root@mx1.invalid (admin), flags=()
| Feb 16 19:03:13 mail.info mx1 dovecot: dsync-local(test): expunge: 
box=INBOX, uid=12, msgid=3Z7fMh1PYMz5Ng@test.mx1.invalid, size=544, 
from=root@mx1.invalid (admin), flags=(\Recent)
| Feb 16 19:03:16 mail.info mx1 dovecot: dsync-local(test): copy from INBOX: 
box=INBOX, uid=164, msgid=3Z7fMh1PYMz5Ng@test.mx1.invalid, size=544, 
from=root@mx1.invalid (admin), flags=()
| Feb 16 19:03:16 mail.info mx1 dovecot: dsync-local(test): copy from INBOX: 
box=INBOX, uid=263, msgid=3Z7fMh1PYMz5Ng@test.mx1.invalid, size=544, 
from=root@mx1.invalid (admin), flags=()
| Feb 16 19:03:16 mail.info mx1 dovecot: dsync-local(test): expunge: 
box=INBOX, uid=118, msgid=3Z7fMh1PYMz5Ng@test.mx1.invalid, size=544, 
from=root@mx1.invalid (admin), flags=(\Recent)
| Feb 16 19:03:16 mail.info mx1 dovecot: dsync-local(test): expunge: 
box=INBOX, uid=42, msgid=3Z7fMh1PYMz5Ng@test.mx1.invalid, size=544, 
from=root@mx1.invalid (admin), flags=(\Recent)

after reading those three messages at mx1:
| Feb 16 19:04:22 mail.info mx1 dovecot: imap(test) hQjfUNvVPwBd3Cqw: 
flag_change: box=INBOX, uid=372, msgid=3Z7fMh1PYMz5Ng@test.mx1.invalid, 
size=544, from=root@mx1.invalid (admin), flags=(\Seen \Recent)
| Feb 16 19:05:40 mail.info mx1 dovecot: imap(test) hQjfUNvVPwBd3Cqw: 
flag_change: box=INBOX, uid=263, msgid=3Z7fMh1PYMz5Ng@test.mx1.invalid, 
size=544, from=root@mx1.invalid (admin), flags=(\Seen \Recent)
| Feb 16 19:05:41 mail.info mx1 dovecot: imap(test) hQjfUNvVPwBd3Cqw: 
flag_change: box=INBOX, uid=164, msgid=3Z7fMh1PYMz5Ng@test.mx1.invalid, 
size=544, from=root@mx1.invalid (admin), flags=(\Seen \Recent)

logfile at mx2:
| Feb 16 19:03:13 mail.info mx2 dovecot: dsync-local(test): save: box=INBOX, 
uid=50, msgid=3Z7fMh1PYMz5Ng@test.mx1.invalid, size=544, 
from=root@mx1.invalid (admin), flags=()
| Feb 16 19:03:17 mail.info mx2 dovecot: dsync-local(test): copy from INBOX: 
box=INBOX, uid=372, msgid=3Z7fMh1PYMz5Ng@test.mx1.invalid, size=544, 
from=root@mx1.invalid (admin), flags=()
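
To put a number on the duplication visible in excerpts like the one above, the live copies per Message-ID can be tallied from one server's log. What follows is a minimal Python sketch, not part of Dovecot. It assumes log records shaped like the lines quoted here (an event name followed by box=, uid=, msgid=), and the names EVENT_RE, live_uids_by_msgid and duplicated are made up for illustration. Any non-expunge event is taken as evidence that a UID exists; an expunge removes it. Mailbox names are ignored for brevity, so it suits single-mailbox tests such as this one.

```python
import re
from collections import defaultdict

# Assumed record shape: "<event>: box=<name>, uid=<n>, msgid=<id>, ..."
EVENT_RE = re.compile(
    r"(?P<event>save|copy from [^:]*|expunge|flag_change):\s+"
    r"box=(?P<box>[^,]+),\s+uid=(?P<uid>\d+),\s+msgid=(?P<msgid>[^,]+),"
)

def live_uids_by_msgid(log_text):
    """Return {msgid: set of UIDs that were seen and not expunged later}."""
    # The archive wraps records across lines, so collapse all whitespace first.
    flat = " ".join(log_text.split())
    live = defaultdict(set)
    for m in EVENT_RE.finditer(flat):
        uid = int(m.group("uid"))
        if m.group("event") == "expunge":
            live[m.group("msgid")].discard(uid)
        else:
            live[m.group("msgid")].add(uid)
    return dict(live)

def duplicated(log_text):
    """Message-IDs that still have more than one live UID, with those UIDs."""
    return {
        msgid: sorted(uids)
        for msgid, uids in live_uids_by_msgid(log_text).items()
        if len(uids) > 1
    }
```

Feeding it one server's log at a time keeps the UID bookkeeping simple; a Message-ID that shows up in the result of duplicated() has more than one surviving copy on that side.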

 2) Mailbomb approach: run one script at one server only, and all mails
   will become perfectly well synchronised.

Same results here.

 3) Modify both scripts to ( $INDEX % 1 == 0 ) to add a second waiting
   between every mail injection, and run them simultaneously at both
   servers, and you'll end up with significantly less duplicates and no
   more multiples.

Same results here.

Good: I cannot 

Re: [Dovecot] dsync replication errors

2013-02-16 Thread Oli Schacher
On Sat, 16 Feb 2013 17:20:22 +0200
Timo Sirainen t...@iki.fi wrote:

 I did a bunch of dsync fixes today in hg. With the new locking
 behavior (and other fixes) you shouldn't be able to break it anymore.
 

Thanks for the fixes, Timo! 

I can confirm I'm no longer able to break anything
with the tests I've mentioned so far (mass appending, simultaneous
append and delete on both mailboxes): no more errors, no more dupes.

I can also confirm the doveadm-server crash I reported in
http://dovecot.markmail.org/thread/fb3qjnsdhtcpirg3 is now gone.

There seems to be an issue left when expunging a large number of
messages from the Trash. I managed to trigger it twice so far by expunging
~3k messages. I'll try to create a reproducible test script for this
scenario. For now I can only provide the log output from my clicking around.
Version is current hg, e63d1cf19ec7.

First time it happened:
Feb 16 18:49:48 doco2 dovecot: imap(user1): Warning: Maildir 
/mailstore/user1/maildir/.Trash: Expunged message reappeared, giving a new UID 
(old uid=1221, file=1361035457.M728795P6220.doco1,S=2476,W=2555:2,Sa)
Feb 16 18:49:48 doco2 dovecot: imap(user1): Warning: Maildir 
/mailstore/user1/maildir/.Trash: Expunged message reappeared, giving a new UID 
(old uid=1222, file=1361035458.M501466P6220.doco1,S=2477,W=2556:2,Sa)
Feb 16 18:49:48 doco2 dovecot: imap(user1): Warning: Maildir 
/mailstore/user1/maildir/.Trash: Expunged message reappeared, giving a new UID 
(old uid=1223, file=1361035458.M988177P6220.doco1,S=2520,W=2599:2,Sa)
Feb 16 18:49:48 doco2 dovecot: imap(user1): Warning: Maildir 
/mailstore/user1/maildir/.Trash: Expunged message reappeared, giving a new UID 
(old uid=1224, file=1361035459.M254031P6220.doco1,S=2483,W=2562:2,Sa)
Feb 16 18:49:49 doco2 dovecot: imap(user1): Warning: Maildir 
/mailstore/user1/maildir/.Trash: Expunged message reappeared, giving a new UID 
(old uid=1225, file=1361035459.M431911P6220.doco1,S=2490,W=2569:2,Sa)
Feb 16 18:49:49 doco2 dovecot: imap(user1): Warning: Maildir 
/mailstore/user1/maildir/.Trash: Expunged message reappeared, giving a new UID 
(old uid=1226, file=1361035459.M959244P6220.doco1,S=2482,W=2561:2,Sa)
Feb 16 18:50:14 doco2 dovecot: doveadm: Error: dsync-remote(user1): Error: 
Couldn't lock /mailstore/user1/.dovecot-sync.lock: Interrupted system call
Feb 16 18:50:14 doco2 dovecot: doveadm: Error: dsync-remote(user1): Error: 
dsync(local): Received unexpected input S != H
Feb 16 18:50:14 doco2 dovecot: dsync-local(user1): Error: 
read(vmail@192.168.23.61) failed: EOF
Feb 16 18:50:14 doco2 dovecot: dsync-local(user1): Error: Remote command 
returned error 75
Feb 16 18:50:44 doco2 dovecot: doveadm: Error: dsync-remote(user1): Error: 
Couldn't lock /mailstore/user1/.dovecot-sync.lock: Interrupted system call
Feb 16 18:50:44 doco2 dovecot: doveadm: Error: dsync-remote(user1): Error: 
dsync(local): Received unexpected input N != H
Feb 16 18:50:44 doco2 dovecot: dsync-local(user1): Error: 
read(vmail@192.168.23.61) failed: EOF
Feb 16 18:50:44 doco2 dovecot: dsync-local(user1): Error: Remote command 
returned error 75


2nd time: (no reappeared messages this time)
Feb 16 19:08:13 doco2 dovecot: imap-login: Login: user=user1, method=PLAIN, 
rip=192.168.23.130, lip=192.168.23.62, mpid=4794, session=DZ8RYNvVyADAqBeC
Feb 16 19:08:44 doco2 dovecot: doveadm: Error: dsync-remote(user1): Error: 
Couldn't lock /mailstore/user1/.dovecot-sync.lock: Interrupted system call
Feb 16 19:08:44 doco2 dovecot: doveadm: Error: dsync-remote(user1): Error: 
dsync(local): Received unexpected input S != H
Feb 16 19:08:44 doco2 dovecot: dsync-local(user1): Error: 
read(vmail@192.168.23.61) failed: EOF
Feb 16 19:08:44 doco2 dovecot: dsync-local(user1): Error: Remote command 
returned error 75


A while later on the other server:
Feb 16 19:13:08 doco1 dovecot: doveadm: Error: dsync-remote(user1): Panic: file 
mail-transaction-log-view.c: line 72 (mail_transaction_log_view_set): assertion 
failed: (min_file_seq <= max_file_seq)
Feb 16 19:13:08 doco1 dovecot: doveadm: Error: dsync-remote(user1): Error: Raw 
backtrace: /usr/lib64/dovecot/libdovecot.so.0(+0x5dc2a) [0x7f305f325c2a] - 
/usr/lib64/dovecot/libdovecot.so.0(default_fatal_handler+0x32) [0x7f305f325d12] 
- /usr/lib64/dovecot/libdovecot.so.0(+0x1f80a) [0x7f305f2e780a] - 
/usr/lib64/dovecot/libdovecot-storage.so.0(mail_transaction_log_view_set+0x580) 
[0x7f305f64e3f0] - /usr/bin/doveadm() [0x43786b] - 
/usr/bin/doveadm(dsync_transaction_log_scan_init+0x8c) [0x43791c] - 
/usr/bin/doveadm(dsync_brain_sync_mailbox_open+0x5e) [0x42724e] - 
/usr/bin/doveadm(dsync_brain_slave_recv_mailbox+0x123) [0x427c63] - 
/usr/bin/doveadm(dsync_brain_run+0x178) [0x425ff8] - /usr/bin/doveadm() 
[0x4265d1] - /usr/bin/doveadm() [0x4357f0] - 
/usr/lib64/dovecot/libdovecot.so.0(io_loop_call_io+0x36) [0x7f305f334bd6] - 
/usr/lib64/dovecot/libdovecot.so.0(io_loop_handler_run+0xa7) [0x7f305f335c67] 
- /usr/lib64/dovecot/libdovecot.so.0(io_loop_run+0x28) [0x7f305f334b78] -
  /usr/bin/doveadm() 

Re: [Dovecot] dsync replication errors

2013-02-16 Thread Timo Sirainen
On 16.2.2013, at 20.26, Michael Grimm trash...@odo.in-berlin.de wrote:

 Timo Sirainen t...@iki.fi wrote:
 
 I did a bunch of dsync fixes today in hg. With the new locking
 behavior (and other fixes) you shouldn't be able to break it anymore.
 
 Sorry to say, but I am still able to break replicator with v2.2.beta1
 (35194cf0693e) under the conditions outlined below.

I wonder if locking is working correctly in your setup. Your users have home 
directories, right? Dovecot should be creating .dovecot-sync.lock files in 
there during the sync.
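
One way to check this is to poll the user's home directory while a sync is triggered. The sketch below is a rough Python helper, not a Dovecot tool; the function name saw_lockfile and its parameters are hypothetical, and only the file name .dovecot-sync.lock comes from this thread. Polling can miss a lock that exists for less than one interval, so a False result is weaker evidence than a True one.

```python
import os
import time

def saw_lockfile(home, timeout=10.0, interval=0.05, name=".dovecot-sync.lock"):
    """Poll `home` until the lock file shows up or `timeout` seconds pass.

    Returns True as soon as the file is seen, False if it never appears.
    """
    path = os.path.join(home, name)
    deadline = time.monotonic() + timeout
    while True:
        if os.path.exists(path):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Start it against the user's home directory (for example the /mailstore/user1 path seen in the logs of this thread), then deliver a test mail so that replication kicks in.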

 This is v2.2.beta1 (78bdcb6642c7) with freshly created mailboxes test
 at both servers mx1 and mx2, and replicator uses ssh for remote
 access. Both servers run a recent postfix, use lmtp for local delivery,
 and test is a virtual user.
 
 I might add that both servers run inside FreeBSD jails (if that might make
 a difference to your test setup).

Inside jail Dovecot sees two different hostnames (same as hostname command)?

 Good: I cannot find any Error: entries in both logfiles any longer.

What about Warning?



Re: [Dovecot] dsync replication errors

2013-02-16 Thread Michael Grimm
On 16.02.2013, at 20:09, Timo Sirainen t...@iki.fi wrote:
 On 16.2.2013, at 20.26, Michael Grimm trash...@odo.in-berlin.de wrote:

 Sorry to say, but I am still able to break replicator with v2.2.beta1
 (35194cf0693e) under the conditions outlined below.
 
 I wonder if locking is working correctly in your setup. Your users have home 
 directories, right?

Yes, I do have homedirs, ...

 Dovecot should be creating .dovecot-sync.lock files in there during the sync.

... and I double-checked that a .dovecot-sync.lock lockfile is being created 
during replication, and yes, it is. 

 I might add that both servers run inside FreeBSD jails (if that might make
 a difference to your test setup).
 
 Inside jail Dovecot sees two different hostnames (same as hostname command)?

Hmm. Both jails run on distinct servers, and ssh replication uses different 
domains. But both jails are named identically (test), and both resolve to the 
identical hostname test when using hostname. Also, hostname -f fails to 
return test.mx1.invalid and test.mx2.invalid, respectively (although 
nslookup test does). Hmm, do you think I need to provide different 
hostnames in both jails? 

 Good: I cannot find any Error: entries in both logfiles any longer.
 
 What about Warning?

I do see only those few messages at both servers:

| dovecot: doveadm(test): Warning: fscking index file 
/.../test/storage/dovecot.map.index
| dovecot: doveadm(test): Warning: fscking index file 
/.../test/storage/dovecot.map.index
| dovecot: doveadm(test): Warning: mdbox /.../test/storage: rebuilding indexes

Please let me know what you want me to test next.

I really do appreciate your efforts. With kind regards,
Michael



Re: [Dovecot] dsync replication errors

2013-02-16 Thread Timo Sirainen
On 17.2.2013, at 0.12, Michael Grimm trash...@odo.in-berlin.de wrote:

 I might add that both servers run inside FreeBSD jails (if that might make
 a difference to your test setup).
 
 Inside jail Dovecot sees two different hostnames (same as hostname 
 command)?
 
 Hmm. Both jails run at distinct servers. ssh replication uses different 
 domains, though. But, both jails are named identically test, and both jails 
 resolve to identical hostnames test if using hostname. But, a hostname 
 -f is lacking to return test.mx1.invalid and test.mx2.invalid, 
 respectively (although a nslookup test does). Hmm, do you think I should 
 need to provide different hostnames in both jails? 

That's the problem most likely. I'd guess Dovecot sees both servers as having 
test as the hostname and each server thinks it's the one that should be doing 
the locking and not the other.

See if this helps: http://hg.dovecot.org/dovecot-2.2/rev/e7aabd79c9d5



Re: [Dovecot] dsync replication errors

2013-02-16 Thread Timo Sirainen
On 17.2.2013, at 7.06, Timo Sirainen t...@iki.fi wrote:

 On 17.2.2013, at 0.12, Michael Grimm trash...@odo.in-berlin.de wrote:
 
 Hmm. Both jails run at distinct servers. ssh replication uses different 
 domains, though. But, both jails are named identically test, and both 
 jails resolve to identical hostnames test if using hostname. But, a 
 hostname -f is lacking to return test.mx1.invalid and 
 test.mx2.invalid, respectively (although a nslookup test does). Hmm, do 
 you think I should need to provide different hostnames in both jails? 
 
 That's the problem most likely. I'd guess Dovecot sees both servers as having 
 test as the hostname and each server thinks it's the one that should be 
 doing the locking and not the other.
 
 See if this helps: http://hg.dovecot.org/dovecot-2.2/rev/e7aabd79c9d5

Although even if it does, other parts of Dovecot still use only the hostname 
part to guarantee global uniqueness of things. So better to have unique 
hostnames.
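
A quick way to spot this situation is to collect what each replica reports as its hostname and compare only the part before the first dot, since that is the part said above to be used for uniqueness. The following Python sketch is illustrative only; short_hostname, colliding_hosts and local_hostname are made-up names, and the server labels are whatever you choose to call your machines.

```python
import socket
from collections import defaultdict

def local_hostname():
    """What this machine reports; run this on each replica to collect values."""
    return socket.gethostname()

def short_hostname(name):
    """Host part before the first dot, e.g. 'test' for 'test.mx1.invalid'."""
    return name.split(".", 1)[0]

def colliding_hosts(reported):
    """Given {server label: reported hostname}, return {short name: [labels]}
    for every short name that more than one server shares."""
    groups = defaultdict(list)
    for label, name in reported.items():
        groups[short_hostname(name)].append(label)
    return {
        short: sorted(labels)
        for short, labels in groups.items()
        if len(labels) > 1
    }
```

For the setup described in this thread, colliding_hosts({"mx1": "test", "mx2": "test"}) reports both servers under the short name test, while distinct names such as mx1 and mx2 give an empty result.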



Re: [Dovecot] dsync replication errors

2013-02-01 Thread Oli Schacher
On Thu, 31 Jan 2013 22:17:28 +0200
Timo Sirainen t...@iki.fi wrote:

 On Thu, 2013-01-31 at 21:51 +0200, Timo Sirainen wrote:
  On 31.1.2013, at 19.41, Oli Schacher dove...@lists.wgwh.ch wrote:
  
   Jan 31 17:13:11 doco1 dovecot: dsync-local(user1): Error:
   Mailbox INBOX: Remote didn't send mail
   GUID=33dabe0f11980a51200c960042f4 (UID=104)
  
  I guess there's some bug that causes this to happen in some
  situations.. But the reason for mail duplication should be fixed
  by: http://hg.dovecot.org/dovecot-2.2/rev/138f1c76c0ec
  
  Except that shouldn't have been necessary. doveadm-server returns
  success before it has finished running dsync. Not sure why, need to
  debug it further.
 
 Fixed with a bit of a kludge:
 http://hg.dovecot.org/dovecot-2.2/rev/e9e6a95cea21
 
 

I can confirm that it has become significantly harder to produce errors
with the latest patches. There still seems to be a problem when changes
to both mailboxes at the same time are involved. However, today I didn't
have time to test scientifically; I just updated to the latest hg and
clicked around, so this report probably won't be of much use to
you, sorry. I'll try to make reproducible tests again next week.

I'll post the errors from my clicking session anyway; maybe they help you
figure out what went wrong even without knowing how to reproduce it. At
least the Operation not permitted error below when killing the
dsync process sounds unintended?


Log output is from changeset 78bdcb6642c7 running on both servers.


Server 1:
Feb  1 07:12:52 doco1 dovecot: dsync-local(user1): Error: Mailbox
INBOX: Remote didn't send mail GUID=7a30ff22af5b0b510f0c960042f4
(UID=211)
Feb  1 07:12:52 doco1 dovecot: dsync-local(user1): Error: Mailbox
INBOX: Remote didn't send mail GUID=7a30ff22af5b0b510f0c960042f4
(UID=205)
Feb  1 07:12:52 doco1 dovecot: dsync-local(user1): Error: Mailbox
INBOX: Remote didn't send mail GUID=7a30ff22af5b0b510f0c960042f4
(UID=208)
Feb  1 07:12:54 doco1 dovecot: doveadm: Error: dsync-remote(user1):
Error: Mailbox INBOX: Unexpected GUID mismatch for UID=205:
7a30ff22af5b0b510f0c960042f4 != 8230ff22af5b0b510f0c960042f4
Feb  1 07:12:54 doco1 dovecot: doveadm: Error: dsync-remote(user1):
Error: Mailbox INBOX: Remote didn't send mail
GUID=7b30ff22af5b0b510f0c960042f4 (UID=228)
[...]
Feb  1 07:12:55 doco1 dovecot: doveadm: Error: dsync-remote(user1):
Error: Importing mailbox INBOX failed
Feb  1 07:12:56 doco1 dovecot: dsync-local(user1): Error:
read(vmail@192.168.23.62) failed: EOF
Feb  1 07:12:56 doco1 dovecot: dsync-local(user1): Error:
read(vmail@192.168.23.62) failed: Broken pipe
Feb  1 07:12:56 doco1 dovecot: dsync-local(user1): Error: Remote
command returned error 75
[...]
Feb  1 07:12:57 doco1 dovecot: doveadm: Error: dsync-remote(user1):
Error: Mailbox INBOX: Unexpected GUID mismatch for UID=291:
7b30ff22af5b0b510f0c960042f4 != 8d30ff22af5b0b510f0c960042f4
Feb  1 07:12:57 doco1 dovecot: doveadm: Error: dsync-remote(user1):
Panic: file dsync-mailbox-import.c: line 1112
(dsync_mailbox_import_change): assertion failed: (change->type == 
DSYNC_MAIL_CHANGE_TYPE_SAVE)
Feb  1 07:12:57 doco1 dovecot: doveadm: Error: dsync-remote(user1):
Error: Raw backtrace: /usr/lib64/dovecot/libdovecot.so.0(+0x5d4ea) 
[0x7f19cf5954ea]
- /usr/lib64/dovecot/libdovecot.so.0(default_fatal_handler+0x32) 
[0x7f19cf5955d2] - /usr/lib64/dovecot/libdovecot.so.0(+0x1f6ca) 
[0x7f19cf5576ca] - /usr/bin/doveadm(dsync_mailbox_import_change+0x501) 
[0x42c881] - /usr/bin/doveadm(dsync_brain_sync_mails+0x3a2) 
[0x4290c2] - /usr/bin/doveadm(dsync_brain_run+0x169) 
[0x425e29] - /usr/bin/doveadm() [0x426380] - /usr/bin/doveadm() 
[0x434aa0] - /usr/lib64/dovecot/libdovecot.so.0(io_loop_call_io+0x36) 
[0x7f19cf5a4076]
- /usr/lib64/dovecot/libdovecot.so.0(io_loop_handler_run+0xa7) 
[0x7f19cf5a5107]
- /usr/lib64/dovecot/libdovecot.so.0(io_loop_run+0x28)
[0x7f19cf5a4018] - /usr/bin/doveadm() 
[0x424134] - /usr/bin/doveadm() 
[0x40fe4f] - /usr/bin/doveadm() 
[0x41067d] - /usr/bin/doveadm(doveadm_mail_try_run+0x141) 
[0x410ba1] - /usr/bin/doveadm(main+0x3f1) 
[0x417bc1] - /lib64/libc.so.6(__libc_start_main+0xfd) 
[0x7f19cf1c3cdd] - /usr/bin/doveadm() [0x40f839]
Feb  1 07:12:57 doco1 dovecot: dsync-local(user1): Error:
read(vmail@192.168.23.62) failed: EOF


Server 2:
Feb  1 07:12:54 doco2 dovecot: dsync-local(user1): Error: Mailbox
INBOX: Unexpected GUID mismatch for UID=205:
7a30ff22af5b0b510f0c960042f4 != 8230ff22af5b0b510f0c960042f4
Feb  1 07:12:54 doco2 dovecot: dsync-local(user1): Error: Mailbox
INBOX: Remote didn't send mail GUID=7b30ff22af5b0b510f0c960042f4
(UID=228)
Feb  1 07:12:54 doco2 dovecot: dsync-local(user1): Error: Mailbox
INBOX: Remote didn't send mail GUID=7b30ff22af5b0b510f0c960042f4
(UID=234)
Feb  1 07:12:54 doco2 dovecot: dsync-local(user1): Error: Mailbox
INBOX: Remote didn't send mail GUID=7b30ff22af5b0b510f0c960042f4
(UID=238)
Feb  1 07:12:54 doco2 dovecot: dsync-local(user1): Error: Mailbox
INBOX: 

Re: [Dovecot] dsync replication errors

2013-02-01 Thread Michael Grimm
[Sorry Oli for my previous mail to your address, only. Resent here]

Oli Schacher dove...@lists.wgwh.ch wrote:

 There still seems to be a problem when changes to both mailboxes at
 the same time are involved

I can confirm your observation, although it was triggered by a different test
scenario, similar to the one I used with the 2.1 replicator before
(http://www.dovecot.org/list/dovecot/2012-March/064354.html).

This is v2.2.beta1 (78bdcb6642c7) with freshly created mailboxes test
at both servers mx1 and mx2, and replicator uses ssh for remote
access. Both servers run a recent postfix, use lmtp for local delivery,
and test is a virtual user.

Test script to produce local testmails of equal size at mx1:
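| #!/bin/csh
| # Inject 100 test mails (subjects 101..200) into test@mx1.
| set INDEX = 101
| set endINDEX = 200
| while ( $INDEX <= $endINDEX )
|    echo $INDEX
|    echo test | mail -s $INDEX test@mx1
|    # Pause only on multiples of 1000, i.e. never within 101..200.
|    if ( $INDEX % 1000 == 0 ) then
|       sleep 1
|    endif
|    @ INDEX = $INDEX + 1
| end
| exit 0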

Test script to produce testmails of equal size at mx2:
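| #!/bin/csh
| # Inject 100 test mails (subjects 1101..1200) into test@mx2.
| set INDEX = 1101
| set endINDEX = 1200
| while ( $INDEX <= $endINDEX )
|    echo $INDEX
|    echo test | mail -s $INDEX test@mx2
|    # Pause only on multiples of 1000, i.e. never within 1101..1200.
|    if ( $INDEX % 1000 == 0 ) then
|       sleep 1
|    endif
|    @ INDEX = $INDEX + 1
| end
| exit 0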

All tests are run with vanilla mailboxes, after restarting dovecot, and
without imap connections by MUA:

1) Simultaneous mailbomb approach: run both scripts simultaneously, and
  you'll end up with numerous duplicates in mailboxes test. Very often
  you'll find multiples.

2) Mailbomb approach: run one script at one server only, and all mails
  will become perfectly well synchronised.

3) Modify both scripts to ( $INDEX % 1 == 0 ) to add a second waiting
  between every mail injection, and run them simultaneously at both
  servers, and you'll end up with significantly fewer duplicates and no
  more multiples.
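
The difference between the scenarios comes down to the modulus in the scripts' if clause. As a small illustration (a Python sketch with a hypothetical helper name, mirroring the loop bounds of the csh scripts above), this lists the indices after which the loop would sleep for a second:

```python
def paused_indices(start, end, modulus):
    """Indices in start..end (inclusive) after which the loop sleeps."""
    return [i for i in range(start, end + 1) if i % modulus == 0]

# Modulus 1000 never matches within 101..200 or 1101..1200, so scenarios
# 1) and 2) inject all 100 mails back to back.
no_pauses = paused_indices(101, 200, 1000)

# Modulus 1 matches every index, so scenario 3) waits a second after each mail.
all_pauses = paused_indices(101, 200, 1)
```

With the unmodified scripts the sleep branch is therefore dead code for these index ranges, and scenario 3) spreads the same 100 injections over at least 100 seconds per server.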

 Feb  1 07:12:52 doco1 dovecot: dsync-local(user1): Error: Mailbox INBOX: 
 Remote didn't send mail GUID=7a30ff22af5b0b510f0c960042f4 (UID=211)

 Feb  1 07:12:54 doco2 dovecot: dsync-local(user1): Error: Importing mailbox 
 INBOX failed

 Feb  1 07:13:24 doco2 dovecot: dsync-local(user1): Error: Remote command 
 process isn't dying, killing it

I do see those error messages as well, and in addition many of these:

| dovecot: dsync-local(test): Error: Mailbox INBOX: Unexpected GUID mismatch 
for UID=7153: 82c5df0a4ffa0b5141e36a0d5a02 != 
29cc9f284ffa0b5141c236abecbd

| doveadm: Error: dsync-remote(test): Error: Mailbox INBOX: Unexpected GUID 
mismatch for UID=7153: 82c5df0a4ffa0b5141e36a0d5a02 != 
29cc9f284ffa0b5141c236abecbd

| dovecot: lmtp(49752, test): Error: Corrupted index cache file 
/.../test/mailboxes/INBOX/dbox-Mails/dovecot.index.cache: File too small

| Feb  1 18:35:16 mail.err  mx1 dovecot: imap(test) BXeiKq3UBgBd3DLy: Error: 
Mailbox INBOX: Corrupted index, uidvalidity=0
| Feb  1 18:35:16 mail.warn mx1 dovecot: imap(test) BXeiKq3UBgBd3DLy: 
Warning: fscking index file /.../test/storage/dovecot.map.index
| Feb  1 18:35:16 mail.err  mx1 dovecot: imap(test) BXeiKq3UBgBd3DLy: Error: 
Mailbox INBOX: Corrupted index, uidvalidity=0
| Feb  1 18:35:16 mail.err  mx1 dovecot: imap(test) BXeiKq3UBgBd3DLy: Error: 
mdbox /.../test/mailboxes/INBOX/dbox-Mails: Storage keeps breaking
| Feb  1 18:35:16 mail.err  mx1 dovecot: imap(test) BXeiKq3UBgBd3DLy: Error: 
Mailbox INBOX: Corrupted index, uidvalidity=0
| Feb  1 18:35:16 mail.warn mx1 dovecot: imap(test) BXeiKq3UBgBd3DLy: 
Warning: fscking index file /.../test/storage/dovecot.map.index
| Feb  1 18:35:16 mail.warn mx1 dovecot: imap(test) BXeiKq3UBgBd3DLy: 
Warning: mdbox /.../test/storage: rebuilding indexes
| Feb  1 18:35:17 mail.warn mx1 dovecot: imap(test) BXeiKq3UBgBd3DLy: 
Warning: fscking index file /.../test/storage/dovecot.map.index
| Feb  1 18:35:17 mail.warn mx1 dovecot: imap(test) BXeiKq3UBgBd3DLy: 
Warning: fscking index file /.../test/storage/dovecot.map.index
| Feb  1 18:35:17 mail.warn mx1 dovecot: imap(test) BXeiKq3UBgBd3DLy: 
Warning: mdbox /.../test/storage: rebuilding indexes
| Feb  1 18:35:17 mail.warn mx1 dovecot: imap(test) BXeiKq3UBgBd3DLy: 
Warning: fscking index file /.../test/storage/dovecot.map.index
| Feb  1 18:35:17 mail.warn mx1 dovecot: imap(test) BXeiKq3UBgBd3DLy: 
Warning: fscking index file /.../test/storage/dovecot.map.index
| Feb  1 18:35:17 mail.warn mx1 dovecot: imap(test) BXeiKq3UBgBd3DLy: 
Warning: mdbox /.../test/storage: rebuilding indexes
| Feb  1 18:35:18 mail.warn mx1 dovecot: imap(test) BXeiKq3UBgBd3DLy: 
Warning: fscking index file /.../test/storage/dovecot.map.index
| Feb  1 18:35:18 mail.warn mx1 dovecot: imap(test) BXeiKq3UBgBd3DLy: 
Warning: fscking index file /.../test/storage/dovecot.map.index
| Feb  1 18:35:18 mail.warn mx1 dovecot: imap(test) BXeiKq3UBgBd3DLy: 
Warning: mdbox /.../test/storage: rebuilding indexes
| Feb  1 18:35:18 mail.warn mx1 dovecot: imap(test) BXeiKq3UBgBd3DLy: 
Warning: fscking index file 

[Dovecot] dsync replication errors

2013-01-31 Thread Oli Schacher
Hi

I'm trying to build a cluster of two servers with dsync replication
(based on http://wiki2.dovecot.org/Replication). My test setup works
fine for very simple tests: I can log in to both servers, copy a
message to one of the servers, and it successfully appears in the other
account. But if I try to copy a large number of messages at once to
one of the accounts, my mail logs get flooded with errors (see below),
the mailboxes seem to get out of sync, and messages are duplicated over
and over again (I originally copied 100 messages and ended up with
thousands in both mailboxes until I killed dovecot).

I'd appreciate if someone could have a look at my config and tell me
what I did wrong.

dovecot.conf of both servers, they are identical except for the target
ip in mail_replica:

dovecot -n
# 2.2.beta1 (070ca24e5846+): /etc/dovecot/dovecot.conf
# OS: Linux 2.6.32-279.19.1.el6.x86_64 x86_64 CentOS release 6.3 (Final)
disable_plaintext_auth = no
mail_plugins =  notify replication
namespace {
  inbox = yes
  location = 
  prefix = 
  separator = /
  type = private
}
passdb {
  args = /etc/dovecot/dovecot-sql.conf
  driver = sql
}
plugin {
  mail_replica = remote:vmail@192.168.23.62
}
protocols = pop3 imap
service aggregator {
  fifo_listener replication-notify-fifo {
    user = vmail
  }
  unix_listener replication-notify {
    user = vmail
  }
}
service auth {
  unix_listener auth-master {
    group = vmail
    mode = 0660
    user = vmail
  }
  user = root
}
service replicator {
  process_min_avail = 1
}
ssl = no
userdb {
  args = /etc/dovecot/dovecot-sql.conf
  driver = sql
}



Log on server1 after I copied 100 messages to an account on that server:

Jan 31 10:41:04 doco1 dovecot: imap-login: Login: user=user1, method=PLAIN, 
rip=192.168.23.130, lip=192.168.23.61, mpid=1432, session=OdjlbJLUmwDAqBeC
Jan 31 10:42:12 doco1 dovecot: doveadm: Error: dsync-remote(user1): Warning: 
Maildir /mailstore/user1/maildir: Expunged message reappeared, giving a new UID 
(old uid=72, file=1359625327.M621257P1432.doco1,S=2472,W=2547:2,)
Jan 31 10:42:12 doco1 dovecot: dsync-local(user1): Error: Recent flags state 
corrupted for mailbox INBOX
Jan 31 10:42:12 doco1 dovecot: doveadm: Error: dsync-remote(user1): Warning: 
Maildir /mailstore/user1/maildir: Expunged message reappeared, giving a new UID 
(old uid=73, file=1359625327.M740847P1432.doco1,S=2417,W=2492:2,)
Jan 31 10:42:12 doco1 dovecot: doveadm: Error: dsync-remote(user1): Warning: 
Maildir /mailstore/user1/maildir: Expunged message reappeared, giving a new UID 
(old uid=74, file=1359625328.M206735P1432.doco1,S=2400,W=2474:2,)
Jan 31 10:42:12 doco1 dovecot: doveadm: Error: dsync-remote(user1): Warning: 
Maildir /mailstore/user1/maildir: Expunged message reappeared, giving a new UID 
(old uid=75, file=1359625328.M668118P1432.doco1,S=2421,W=2496:2,)
Jan 31 10:42:12 doco1 dovecot: doveadm: Error: dsync-remote(user1): Warning: 
Maildir /mailstore/user1/maildir: Expunged message reappeared, giving a new UID 
(old uid=76, file=1359625329.M167578P1432.doco1,S=2480,W=2559:2,)
Jan 31 10:42:13 doco1 dovecot: doveadm: Error: dsync-remote(user1): Warning: 
Maildir /mailstore/user1/maildir: Expunged message reappeared, giving a new UID 
(old uid=77, file=1359625329.M520528P1432.doco1,S=2525,W=2604:2,)
Jan 31 10:42:14 doco1 dovecot: doveadm(user1): Warning: 
/mailstore/user1/maildir/dovecot-uidlist: Duplicate file entry at line 132: 
1359625329.M520528P1432.doco1,S=2525,W=2604 (uid 77 -> 133)
Jan 31 10:42:14 doco1 dovecot: doveadm(user1): Warning: 
/mailstore/user1/maildir/dovecot-uidlist: Duplicate file entry at line 133: 
1359625327.M621257P1432.doco1,S=2472,W=2547 (uid 72 -> 134)
Jan 31 10:42:14 doco1 dovecot: doveadm(user1): Warning: 
/mailstore/user1/maildir/dovecot-uidlist: Duplicate file entry at line 134: 
1359625327.M740847P1432.doco1,S=2417,W=2492 (uid 73 -> 135)
Jan 31 10:42:14 doco1 dovecot: doveadm(user1): Warning: 
/mailstore/user1/maildir/dovecot-uidlist: Duplicate file entry at line 135: 
1359625328.M206735P1432.doco1,S=2400,W=2474 (uid 74 -> 136)
Jan 31 10:42:14 doco1 dovecot: doveadm(user1): Warning: 
/mailstore/user1/maildir/dovecot-uidlist: Duplicate file entry at line 136: 
1359625328.M668118P1432.doco1,S=2421,W=2496 (uid 75 -> 137)
Jan 31 10:42:14 doco1 dovecot: doveadm(user1): Warning: 
/mailstore/user1/maildir/dovecot-uidlist: Duplicate file entry at line 137: 
1359625329.M167578P1432.doco1,S=2480,W=2559 (uid 76 -> 138)
Jan 31 10:42:14 doco1 dovecot: doveadm(user1): Warning: 
/mailstore/user1/maildir/dovecot-uidlist: Duplicate file entry at line 139: 
1359625329.M782065P1432.doco1,S=2461,W=2539 (uid 78 -> 140)
Jan 31 10:42:14 doco1 dovecot: doveadm(user1): Warning: 
/mailstore/user1/maildir/dovecot-uidlist: Duplicate file entry at line 140: 
1359625329.M973834P1432.doco1,S=2523,W=2602 (uid 79 -> 141)
Jan 31 10:42:14 doco1 dovecot: doveadm(user1): Warning: 
/mailstore/user1/maildir/dovecot-uidlist: Duplicate file entry at line 141: 

Re: [Dovecot] dsync replication errors

2013-01-31 Thread Timo Sirainen
On 31.1.2013, at 12.27, Oli Schacher dove...@lists.wgwh.ch wrote:

 I'm trying to build a cluster of two servers with dsync replication
 (based on http://wiki2.dovecot.org/Replication). My test setup works
 fine for very simple tests: I can log in to both servers, copy a
 message to one of the servers, and it successfully appears in the other
 account. But if I try to copy a large number of messages at once to
 one of the accounts, my mail logs get flooded with errors (see below),
 the mailboxes seem to get out of sync, and messages are duplicated over
 and over again (I originally copied 100 messages and ended up with
 thousands in both mailboxes until I killed dovecot).
..
 Jan 31 10:42:12 doco1 dovecot: doveadm: Error: dsync-remote(user1): Warning: 
 Maildir /mailstore/user1/maildir: Expunged message reappeared, giving a new 
 UID (old uid=72, file=1359625327.M621257P1432.doco1,S=2472,W=2547:2,)

Looks like some bug. Possibilities:

a) Use mdbox format instead of maildir. It works better with dsync.

b) Switch to v2.2 (latest hg version). It has a rewritten dsync that works 
better.

Ideally do both. :)



Re: [Dovecot] dsync replication errors

2013-01-31 Thread Oli Schacher

 a) Use mdbox format instead of maildir. It works better with dsync.

ok, I'll try that 

(although I was hoping I could avoid migrating all boxes on the server
I was planning to use this feature)


 
 b) Switch to v2.2 (latest hg version). It has a rewritten dsync that
 works better.

the testsetup is already on 2.2 hg


Thanks

-- 
message transmitted on 100% recycled electrons


Re: [Dovecot] dsync replication errors

2013-01-31 Thread Timo Sirainen
On 31.1.2013, at 14.06, Oli Schacher dove...@lists.wgwh.ch wrote:

 b) Switch to v2.2 (latest hg version). It has a rewritten dsync that
 works better.
 
 the testsetup is already on 2.2 hg

Oh. But it's still beta1. There are several fixes done to dsync since beta1, 
including a fix for these maildir errors. I should release beta2 or maybe rc1 
soon.



Re: [Dovecot] dsync replication errors

2013-01-31 Thread Oli Schacher
On Thu, 31 Jan 2013 14:27:08 +0200
Timo Sirainen t...@iki.fi wrote:

 Oh. But it's still beta1. There are several fixes done to dsync since
 beta1, including a fix for these maildir errors. I should release
 beta2 or maybe rc1 soon.
 

hmm.. actually I think I built it from the latest hg (but I must admit
I'm not really familiar with mercurial, so maybe I f*ckd up)

dovecot -n tells me
# 2.2.beta1 (070ca24e5846+): /etc/dovecot/dovecot.conf

and 070ca24e5846 seems to be the latest commit according to
http://hg.dovecot.org/dovecot-2.2/ (14 hours ago). not exactly sure why
it says something about beta1.


I tried with mdbox now. Same problem: I don't see Expunged
message reappeared anymore, but there are still tons of these:

Server1:
Jan 31 13:38:05 doco1 dovecot: doveadm: Error: dsync-remote(user1): Error: 
Mailbox INBOX: Remote didn't send mail GUID=caec8e2a84650a518107960042f4 
(UID=136)
Jan 31 13:38:05 doco1 dovecot: doveadm: Error: dsync-remote(user1): Error: 
Mailbox INBOX: Remote didn't send mail GUID=cbec8e2a84650a518107960042f4 
(UID=135)
Jan 31 13:38:05 doco1 dovecot: dsync-local(user1): Error: Mailbox INBOX: Remote 
didn't send mail GUID=caec8e2a84650a518107960042f4 (UID=148)
Jan 31 13:38:05 doco1 dovecot: dsync-local(user1): Error: Mailbox INBOX: Remote 
didn't send mail GUID=caec8e2a84650a518107960042f4 (UID=156)
Jan 31 13:38:05 doco1 dovecot: dsync-local(user1): Error: Mailbox INBOX: Remote 
didn't send mail GUID=cbec8e2a84650a518107960042f4 (UID=147)
[...]

Server2:
Jan 31 13:38:03 doco2 dovecot: dsync-local(user1): Error: Mailbox INBOX: Remote 
didn't send mail GUID=caec8e2a84650a518107960042f4 (UID=80)
Jan 31 13:38:03 doco2 dovecot: dsync-local(user1): Error: Mailbox INBOX: Remote 
didn't send mail GUID=cbec8e2a84650a518107960042f4 (UID=79)
Jan 31 13:38:04 doco2 dovecot: doveadm: Error: dsync-remote(user1): Error: 
Mailbox INBOX: Remote didn't send mail GUID=d0ec8e2a84650a518107960042f4 
(UID=81)
Jan 31 13:38:05 doco2 dovecot: doveadm: Error: dsync-remote(user1): Error: 
Mailbox INBOX: Remote didn't send mail GUID=d0ec8e2a84650a518107960042f4 
(UID=119)
Jan 31 13:38:05 doco2 dovecot: doveadm: Error: dsync-remote(user1): Error: 
Mailbox INBOX: Remote didn't send mail GUID=d0ec8e2a84650a518107960042f4 
(UID=128)
Jan 31 13:38:05 doco2 dovecot: doveadm: Error: dsync-remote(user1): Error: 
Mailbox INBOX: Remote didn't send mail GUID=d0ec8e2a84650a518107960042f4 
(UID=130)
Jan 31 13:38:05 doco2 dovecot: doveadm: Error: dsync-remote(user1): Error: 
Mailbox INBOX: Remote didn't send mail GUID=d0ec8e2a84650a518107960042f4 
(UID=112)
Jan 31 13:38:05 doco2 dovecot: doveadm: Error: dsync-remote(user1): Error: 
Mailbox INBOX: Remote didn't send mail GUID=d3ec8e2a84650a518107960042f4 
(UID=133)
Jan 31 13:38:05 doco2 dovecot: doveadm: Error: dsync-remote(user1): Error: 
Mailbox INBOX: Remote didn't send mail GUID=d2ec8e2a84650a518107960042f4 
(UID=131)
Jan 31 13:38:05 doco2 dovecot: doveadm: Error: dsync-remote(user1): Error: 
Mailbox INBOX: Remote didn't send mail GUID=d1ec8e2a84650a518107960042f4 
(UID=132)
Jan 31 13:38:06 doco2 dovecot: dsync-local(user1): Error: Mailbox INBOX: Remote 
didn't send mail GUID=caec8e2a84650a518107960042f4 (UID=136)
Jan 31 13:38:06 doco2 dovecot: dsync-local(user1): Error: Mailbox INBOX: Remote 
didn't send mail GUID=cbec8e2a84650a518107960042f4 (UID=135)
[...]
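
In the excerpts above the same GUID is reported under several different UIDs, which is consistent with the duplication being discussed. A minimal sketch for summarizing such a log, assuming the error wording shown above and allowing for entries wrapped across lines (the function names are illustrative, not part of Dovecot):

```python
import re
from collections import defaultdict

# Matches the dsync error quoted in the logs above. Entries may be wrapped
# across lines, so whitespace inside the message is matched loosely.
PATTERN = re.compile(
    r"Remote\s+didn't\s+send\s+mail\s+GUID=([0-9a-f]+)\s+\(UID=(\d+)\)"
)


def uids_by_guid(log_text):
    """Map each GUID to the sorted list of UIDs it was reported under."""
    found = defaultdict(set)
    for guid, uid in PATTERN.findall(log_text):
        found[guid].add(int(uid))
    return {guid: sorted(uids) for guid, uids in found.items()}


def duplicated_guids(log_text):
    """Return only the GUIDs that were reported under more than one UID."""
    return {g: u for g, u in uids_by_guid(log_text).items() if len(u) > 1}
```

Feeding the whole mail log to `duplicated_guids()` gives a quick list of the messages worth inspecting on each server.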


-- 
message transmitted on 100% recycled electrons


Re: [Dovecot] dsync replication errors

2013-01-31 Thread Timo Sirainen
On 31.1.2013, at 14.46, Oli Schacher dove...@lists.wgwh.ch wrote:

 On Thu, 31 Jan 2013 14:27:08 +0200
 Timo Sirainen t...@iki.fi wrote:
 
 Oh. But it's still beta1. There are several fixes done to dsync since
 beta1, including a fix for these maildir errors. I should release
 beta2 or maybe rc1 soon.
 
 
 hmm.. actually I think I built it from the latest hg (but I must admit
 I'm not really familiar with mercurial, so maybe I f*ckd up)
 
 dovecot -n tells me
 # 2.2.beta1 (070ca24e5846+): /etc/dovecot/dovecot.conf
 
 and 070ca24e5846 seems to be the latest commit according to
 http://hg.dovecot.org/dovecot-2.2/ (14 hours ago). not exactly sure why
 it says something about beta1.

So it seems. Looks like I've been browsing through your mails too quickly to 
pay attention. :)

 I tried with mdbox now.. same problem, although I don't see Expunged
 message reappeared anymore, but still tons of these:
 
 Server1:
 Jan 31 13:38:05 doco1 dovecot: doveadm: Error: dsync-remote(user1): Error: 
 Mailbox INBOX: Remote didn't send mail GUID=caec8e2a84650a518107960042f4 
 (UID=136)

But there's no duplication now and it gets fixed eventually, right?

And you can easily reproduce this by simply copying 100 mails from one folder 
to another? I'll see if I can reproduce.



Re: [Dovecot] dsync replication errors

2013-01-31 Thread Timo Sirainen
On 31.1.2013, at 15.10, Oli Schacher dove...@lists.wgwh.ch wrote:

 connect thunderbird to account user1 on server1
 result: login ok, mdbox visible on disk, 0 messages 
 
 in thunderbird copy exactly 100 messages from a spambox to user1's
 inbox on server1

spambox not being in server1? So not IMAP COPY command, but APPEND?



Re: [Dovecot] dsync replication errors

2013-01-31 Thread Oli Schacher
On Thu, 31 Jan 2013 15:24:06 +0200
Timo Sirainen t...@iki.fi wrote:

 On 31.1.2013, at 15.10, Oli Schacher dove...@lists.wgwh.ch wrote:
 
  connect thunderbird to account user1 on server1
  result: login ok, mdbox visible on disk, 0 messages 
  
  in thunderbird copy exactly 100 messages from a spambox to user1's
  inbox on server1
 
 spambox not being in server1? So not IMAP COPY command, but APPEND?
 

yes APPEND, the spambox where I got the messages from is on a completely
different server.  sorry for not mentioning that earlier.
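
The COPY/APPEND distinction matters because COPY happens entirely on one server, while APPEND uploads each message from the client. A rough sketch of both with Python's standard `imaplib`, assuming `conn` is an already logged-in connection with a mailbox selected (the helper names are made up for illustration):

```python
import imaplib
import time


def copy_on_server(conn, uid_set, dest):
    """Server-side copy via IMAP UID COPY. Only possible when the source
    and destination mailboxes are on the same server."""
    typ, _ = conn.uid("COPY", uid_set, dest)
    return typ == "OK"


def append_messages(conn, dest, raw_messages):
    """Upload messages that come from somewhere else. Each message is a
    separate IMAP APPEND, as when a client moves mail between servers.
    Returns the number of messages the server accepted."""
    saved = 0
    for raw in raw_messages:
        internal_date = imaplib.Time2Internaldate(time.time())
        typ, _ = conn.append(dest, None, internal_date, raw)
        if typ == "OK":
            saved += 1
    return saved
```

Calling `append_messages()` with a list of 100 raw messages (bytes) fetched from another server approximates the test described above; host names and credentials are left out on purpose.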



Re: [Dovecot] dsync replication errors

2013-01-31 Thread Timo Sirainen
On 31.1.2013, at 15.36, Oli Schacher dove...@lists.wgwh.ch wrote:

 On Thu, 31 Jan 2013 15:24:06 +0200
 Timo Sirainen t...@iki.fi wrote:
 
 On 31.1.2013, at 15.10, Oli Schacher dove...@lists.wgwh.ch wrote:
 
 connect thunderbird to account user1 on server1
 result: login ok, mdbox visible on disk, 0 messages 
 
 in thunderbird copy exactly 100 messages from a spambox to user1's
 inbox on server1
 
 spambox not being in server1? So not IMAP COPY command, but APPEND?
 
 
 yes APPEND, the spambox where I got the messages from is on a completely
 different server.  sorry for not mentioning that earlier.

See if http://hg.dovecot.org/dovecot-2.2/rev/1d88f01ba2aa helps?



Re: [Dovecot] dsync replication errors

2013-01-31 Thread Oli Schacher
On Thu, 31 Jan 2013 17:09:20 +0200
Timo Sirainen t...@iki.fi wrote:


 
 See if http://hg.dovecot.org/dovecot-2.2/rev/1d88f01ba2aa helps?
 

I updated to the latest hg, including the remote cmd exit wait update.

It looks better now, but I still manage to break things :-)

#
test 1: append 1000 messages with thunderbird, mdbox
- ok, no more errors, sync ok  


#
test 2: append only 100 messages, but use maildir again instead of
mdbox.
still produces errors and starts duplicating; I even saw an
assertion error this time, but I can't always reproduce it

Jan 31 16:57:34 doco1 dovecot: imap-login: Login: user=user1,
method=PLAIN, rip=192.168.23.130, lip=192.168.23.61, mpid=2684,
session=4tper5fU8gDAqBeC
Jan 31 16:57:35 doco1 dovecot: doveadm: Error: dsync-remote(user1):
Panic: file dsync-mailbox-tree-fill.c: line 72
(dsync_mailbox_tree_add): assertion failed: (status.uidvalidity != 0)
Jan 31 16:57:35 doco1 dovecot: doveadm: Error: dsync-remote(user1):
Error: Raw backtrace: /usr/lib64/dovecot/libdovecot.so.0(+0x5ce8a) [0x7f65aa39de8a]
-> /usr/lib64/dovecot/libdovecot.so.0(default_fatal_handler+0x32) [0x7f65aa39df72]
-> /usr/lib64/dovecot/libdovecot.so.0(+0x1f55a) [0x7f65aa36055a]
-> /usr/bin/doveadm(dsync_mailbox_tree_fill+0x4cf) [0x42f5cf]
-> /usr/bin/doveadm(dsync_brain_mailbox_trees_init+0x180) [0x428630]
-> /usr/bin/doveadm(dsync_brain_run+0x393) [0x426033]
-> /usr/bin/doveadm() [0x426331]
-> /usr/bin/doveadm() [0x434780]
-> /usr/lib64/dovecot/libdovecot.so.0(io_loop_call_io+0x36) [0x7f65aa3aca16]
-> /usr/lib64/dovecot/libdovecot.so.0(io_loop_handler_run+0xa7) [0x7f65aa3adaa7]
-> /usr/lib64/dovecot/libdovecot.so.0(io_loop_run+0x28) [0x7f65aa3ac9b8]
-> /usr/bin/doveadm() [0x424114]
-> /usr/bin/doveadm() [0x40fe4f]
-> /usr/bin/doveadm() [0x41067d]
-> /usr/bin/doveadm(doveadm_mail_try_run+0x141) [0x410ba1]
-> /usr/bin/doveadm(main+0x3f1) [0x417ba1]
-> /lib64/libc.so.6(__libc_start_main+0xfd) [0x7f65a9fcccdd]
-> /usr/bin/doveadm() [0x40f839]
Jan 31 16:57:35 doco1 dovecot: dsync-local(user1): Error:
read(vmail@192.168.23.62) failed: EOF
Jan 31 16:57:35 doco1 dovecot: dsync-local(user1): Error: Remote
command returned error 255
Jan 31 16:58:06 doco1 dovecot: dsync-local(user1): Error: Recent flags
state corrupted for mailbox INBOX
Jan 31 16:58:06 doco1 dovecot: doveadm(user1):
Warning: /mailstore/user1/maildir/dovecot-uidlist: Duplicate file entry
at line 59: 1359647883.M823994P2684.doco1,S=2483,W=2562 (uid 18 -> 58)
Jan 31 16:58:06 doco1 dovecot: doveadm(user1):
Warning: /mailstore/user1/maildir/dovecot-uidlist: Duplicate file entry
at line 60: 1359647883.M382644P2684.doco1,S=2533,W=2610 (uid 15 -> 59)
[...]
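
To gauge how many mails were duplicated in a maildir test like this one, one option is to count repeated Message-ID headers. The sketch below uses Python's standard `mailbox.Maildir` and only reads the top-level folder. It assumes that Message-ID is a good enough key for test mail; Dovecot itself tracks messages by GUID, which this does not look at.

```python
import mailbox
from collections import defaultdict


def duplicate_message_ids(maildir_path):
    """Return {Message-ID: count} for IDs that occur more than once in the
    top-level folder (cur/ and new/) of the given Maildir."""
    counts = defaultdict(int)
    # create=False: never create directories if the path is wrong.
    box = mailbox.Maildir(maildir_path, create=False)
    try:
        for msg in box:
            msg_id = msg.get("Message-ID")
            if msg_id:
                counts[msg_id.strip()] += 1
    finally:
        box.close()
    return {msg_id: n for msg_id, n in counts.items() if n > 1}
```

For the setup in the log above the call would be `duplicate_message_ids("/mailstore/user1/maildir")`, run once on each server and compared.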



#
test 3: mdbox again,  append 1000 messages with claws mail, but have
thunderbird connected at the same time to both accounts while doing so.
this leads to the same problem as before (duplication, errors). I guess
thunderbird wants to set a seen flag and modifying the mailbox while
it's being synced is probably a bad idea, but you never know
what users are going to do :-)

Jan 31 17:13:11 doco1 dovecot: dsync-local(user1): Error: Mailbox
INBOX: Remote didn't send mail GUID=33dabe0f11980a51200c960042f4
(UID=104)
Jan 31 17:13:11 doco1 dovecot: dsync-local(user1): Error: Mailbox
INBOX: Remote didn't send mail GUID=33dabe0f11980a51200c960042f4
(UID=114)
Jan 31 17:13:11 doco1 dovecot: dsync-local(user1): Error: Mailbox
INBOX: Remote didn't send mail GUID=33dabe0f11980a51200c960042f4
(UID=118)
Jan 31 17:13:11 doco1 dovecot: dsync-local(user1): Error: Mailbox
INBOX: Remote didn't send mail GUID=33dabe0f11980a51200c960042f4
(UID=123)


Let me know if you need more info/tests.

-- 
message transmitted on 100% recycled electrons


Re: [Dovecot] dsync replication errors

2013-01-31 Thread Timo Sirainen
On 31.1.2013, at 18.37, Oli Schacher dove...@lists.wgwh.ch wrote:

 I updated to the latest hg, including the remote cmd exit wait update.
 
 It looks better now, but I still manage to break things :-)
 
 #
 test 2: append only 100 messages, but use maildir again instead of
 mdbox.
 still produces errors and starts duplicating; I even saw an
 assertion error this time, but I can't always reproduce it
 
 Jan 31 16:57:34 doco1 dovecot: imap-login: Login: user=user1,
 method=PLAIN, rip=192.168.23.130, lip=192.168.23.61, mpid=2684,
 session=4tper5fU8gDAqBeC
 Jan 31 16:57:35 doco1 dovecot: doveadm: Error: dsync-remote(user1):
 Panic: file dsync-mailbox-tree-fill.c: line 72
 (dsync_mailbox_tree_add): assertion failed: (status.uidvalidity != 0)

http://hg.dovecot.org/dovecot-2.2/rev/86629f621fe4 should fix this crash.

The duplication happens because maildir somehow messes up itself. I guess I 
should look into it.

 test 3: mdbox again,  append 1000 messages with claws mail, but have
 thunderbird connected at the same time to both accounts while doing so.
 this leads to the same problem as before (duplication, errors). I guess
 thunderbird wants to set a seen flag and modifying the mailbox while
 it's being synced is probably a bad idea, but you never know
 what users are going to do :-)
 
 Jan 31 17:13:11 doco1 dovecot: dsync-local(user1): Error: Mailbox
 INBOX: Remote didn't send mail GUID=33dabe0f11980a51200c960042f4
 (UID=104)

All of the clients and changes are done only to one side, not to both sides?



Re: [Dovecot] dsync replication errors

2013-01-31 Thread Oli Schacher
On Thu, 31 Jan 2013 18:49:18 +0200
Timo Sirainen t...@iki.fi wrote:

 
 http://hg.dovecot.org/dovecot-2.2/rev/86629f621fe4 should fix this
 crash.
 
 The duplication happens because maildir somehow messes up itself. I
 guess I should look into it.
 

thanks, much appreciated!

  test 3: mdbox again,  append 1000 messages with claws mail, but have
  thunderbird connected at the same time to both accounts while doing
  so. this leads to the same problem as before (duplication, errors).
  I guess thunderbird wants to set a seen flag and modifying the
  mailbox while it's being synced is probably a bad idea, but you
  never know what users are going to do :-)
  
  Jan 31 17:13:11 doco1 dovecot: dsync-local(user1): Error: Mailbox
  INBOX: Remote didn't send mail GUID=33dabe0f11980a51200c960042f4
  (UID=104)
 
 All of the clients and changes are done only to one side, not to both
 sides?
 

In my previous tests I had thunderbird connected to both servers,
without actually doing anything, just watching the mailbox unread
counter go up. It could be it tried to update both mailboxes. I don't
know what thunderbird does in the background when you're not actually
clicking on a mailbox. The errors were visible in both maillogs
(server1 and server2).

But I can reproduce the problem by connecting only to server1, in that
case, the errors show up in server1's log only:

the current test scenario looks like:

- both servers empty mail store, configuration set to mdbox
- start server 1
- start server 2
- connect claws mail to server1
- connect thunderbird to server1 too
- in claws mail copy a few hundred mails from a remote box to server1
- I can see the unread counter go up in thunderbird
- Remote didn't send mail errors start popping up, but only in
  server1's maillog this time 
- mails are duplicated

in one test run I also saw the assert failure below, but again, I can't
reproduce this one:

Jan 31 18:10:11 doco1 dovecot: doveadm: Error: dsync-remote(user1):
Panic: file dsync-mailbox-import.c: line 1080
(dsync_mailbox_import_change): assertion failed: (change->type ==
DSYNC_MAIL_CHANGE_TYPE_SAVE)
Jan 31 18:10:11 doco1 dovecot: doveadm: Error: dsync-remote(user1):
Error: Raw backtrace: /usr/lib64/dovecot/libdovecot.so.0(+0x5ce8a) [0x7f0ac3602e8a]
-> /usr/lib64/dovecot/libdovecot.so.0(default_fatal_handler+0x32) [0x7f0ac3602f72]
-> /usr/lib64/dovecot/libdovecot.so.0(+0x1f55a) [0x7f0ac35c555a]
-> /usr/bin/doveadm(dsync_mailbox_import_change+0x501) [0x42c631]
-> /usr/bin/doveadm(dsync_brain_sync_mails+0x3a2) [0x4290a2]
-> /usr/bin/doveadm(dsync_brain_run+0x169) [0x425e09]
-> /usr/bin/doveadm() [0x426360]
-> /usr/bin/doveadm() [0x434780]
-> /usr/lib64/dovecot/libdovecot.so.0(io_loop_call_io+0x36) [0x7f0ac3611a16]
-> /usr/lib64/dovecot/libdovecot.so.0(io_loop_handler_run+0xa7) [0x7f0ac3612aa7]
-> /usr/lib64/dovecot/libdovecot.so.0(io_loop_run+0x28) [0x7f0ac36119b8]
-> /usr/bin/doveadm() [0x424114]
-> /usr/bin/doveadm() [0x40fe4f]
-> /usr/bin/doveadm() [0x41067d]
-> /usr/bin/doveadm(doveadm_mail_try_run+0x141) [0x410ba1]
-> /usr/bin/doveadm(main+0x3f1) [0x417ba1]
-> /lib64/libc.so.6(__libc_start_main+0xfd) [0x7f0ac3231cdd]
-> /usr/bin/doveadm() [0x40f839]


-- 
message transmitted on 100% recycled electrons


Re: [Dovecot] dsync replication errors

2013-01-31 Thread Timo Sirainen
On 31.1.2013, at 19.41, Oli Schacher dove...@lists.wgwh.ch wrote:

 Jan 31 17:13:11 doco1 dovecot: dsync-local(user1): Error: Mailbox
 INBOX: Remote didn't send mail GUID=33dabe0f11980a51200c960042f4
 (UID=104)

I guess there's some bug that causes this to happen in some situations.. But 
the reason for mail duplication should be fixed by: 
http://hg.dovecot.org/dovecot-2.2/rev/138f1c76c0ec

Except that shouldn't have been necessary. doveadm-server returns success 
before it has finished running dsync. Not sure why, need to debug it further.
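
The symptom described here, success being reported before the work has actually finished, can be illustrated generically. The following is not Dovecot code, just a small Python sketch with `subprocess` in which the command is a stand-in for the remote dsync run:

```python
import subprocess


def run_and_report(cmd):
    """Start a helper command and report success only after it has exited
    with status 0. Returning right after Popen() would be the premature
    'success' described above."""
    proc = subprocess.Popen(cmd)
    returncode = proc.wait()  # block until the child has really exited
    return returncode == 0
```

A caller that starts the next sync as soon as it sees success would, without the `wait()`, overlap with a run that is still modifying the mailbox.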

 in one testrun I also saw the assert failure below, but again, I can't
 reproduce this one :
 
 Jan 31 18:10:11 doco1 dovecot: doveadm: Error: dsync-remote(user1):
 Panic: file dsync-mailbox-import.c: line 1080
 (dsync_mailbox_import_change): assertion failed: (change->type ==
 DSYNC_MAIL_CHANGE_TYPE_SAVE)

Related to incremental syncing. Have to debug it further also.



Re: [Dovecot] dsync replication errors

2013-01-31 Thread Timo Sirainen
On Thu, 2013-01-31 at 21:51 +0200, Timo Sirainen wrote:
 On 31.1.2013, at 19.41, Oli Schacher dove...@lists.wgwh.ch wrote:
 
  Jan 31 17:13:11 doco1 dovecot: dsync-local(user1): Error: Mailbox
  INBOX: Remote didn't send mail GUID=33dabe0f11980a51200c960042f4
  (UID=104)
 
 I guess there's some bug that causes this to happen in some situations.. But 
 the reason for mail duplication should be fixed by: 
 http://hg.dovecot.org/dovecot-2.2/rev/138f1c76c0ec
 
 Except that shouldn't have been necessary. doveadm-server returns success 
 before it has finished running dsync. Not sure why, need to debug it further.

Fixed with a bit of a kludge:
http://hg.dovecot.org/dovecot-2.2/rev/e9e6a95cea21