On 26.02.2013, at 10:55, Timo Sirainen <t...@iki.fi> wrote:

> I can't reproduce this. Some interesting questions:
> 
> * If you include hostname+counter in the message, what do the mailboxes look 
> like in the different sides? Did they skip over some numbers or did they both 
> stop at some specific remote counter and continue the local counters until 
> the end?

(I have reduced my tests to 100 messages injected at mx1 and mx2 
simultaneously; this is with Dovecot v2.2.rc1 (ef7eb84d9a3a).)

Each inbox contains all 100 messages injected at its own site: all 100 
messages injected at mx1 show up in mx1's inbox, and all 100 messages 
injected at mx2 show up in mx2's inbox. The few remaining messages are the 
replicated ones, e.g. 22 injected at mx2 can be found in mx1's inbox, and 23 
injected at mx1 can be found in mx2's inbox. Thus, replication stops early.

> * Is it even trying to run doveadm sync commands at the end? (e.g. make 
> dsync_remote_cmd execute some wrapper script that logs something)

The wrapper script shows 23 invocations each at mx1 and mx2.
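
A logging wrapper of the kind suggested above can be sketched as follows. This 
is an illustration only, not the script used in this test: the function name 
log_and_run and the log file path are hypothetical, and how the wrapper is 
hooked into dsync_remote_cmd is not shown here.

```shell
#!/bin/sh
# Hypothetical helper: append one line per invocation to a log file,
# then run the given command and return its exit status.
log_and_run() {
    logfile=$1
    shift
    # Timestamp, PID of this shell, and the full command line.
    printf '%s pid=%s cmd=%s\n' \
        "$(date '+%Y-%m-%d %H:%M:%S')" "$$" "$*" >> "$logfile"
    "$@"
}

# Example (assumed path): count invocations logged so far.
#   wc -l < /tmp/dsync-wrapper.log
```

In a standalone wrapper script the last step would typically be `exec "$@"`, 
so that the real command replaces the wrapper process.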

> * If the doveadm syncs continue, try saving rawlogs from them to see what 
> they're doing (-r /tmp/rawlog parameter to doveadm dsync-server).

I do have rawlogs, but I am at a loss when it comes to interpreting them. :-(

Perhaps of importance:

| mx1> grep @test /tmp/rawlog | grep I: | wc
|      22      88    1650
| mx1> grep @test /tmp/rawlog | grep O: | wc
|       1       4      74

| mx2> grep @test /tmp/rawlog | grep I: | wc
|      22      88    1628
| mx2> grep @test /tmp/rawlog | grep O: | wc
|       0       0       0

 
BUT: It looks as if I haven't waited long enough for replication to finish, 
sorry :-(

Actually, while going through all those files and writing this mail, all 
missing messages appeared in my MUA, and I do find in both maillogs:

@mx1:
| dovecot: dsync-local(test): Error: dsync(vm...@mx2.tld): I/O has stalled, no activity for 600 seconds
| dovecot: dsync-local(test): Error: Remote command process isn't dying, killing it

@mx2:
| dovecot: dsync-local(test): Error: dsync(vm...@mx1.tld): I/O has stalled, no activity for 600 seconds
| dovecot: dsync-local(test): Error: Remote command process isn't dying, killing it

And in rawlog I do now find ...

| mx1> grep @test /tmp/rawlog | grep I: | wc
|      22      88    1650
| mx1> grep @test /tmp/rawlog | grep O: | wc
|       1       4      74

| mx2> grep @test /tmp/rawlog | grep I: | wc
|      99     396    7326
| mx2> grep @test /tmp/rawlog | grep O: | wc
|      78     312    5850

... thus, all mails were replicated after that 600-second timeout.
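
The paired greps above can be wrapped into one small helper for repeated 
checks. This is only a sketch: the function name rawlog_counts is made up, and 
it assumes nothing about the rawlog format beyond what the greps above already 
match (lines containing "I:" or "O:").

```shell
#!/bin/sh
# Hypothetical helper: count the lines of a rawlog file that contain a
# fixed string, split by the "I:" and "O:" markers used in the greps above.
rawlog_counts() {
    pattern=$1
    file=$2
    # grep -c prints 0 but exits non-zero when nothing matches; "|| true"
    # keeps the helper usable under "set -e".
    in_count=$(grep -F -e "$pattern" "$file" | grep -c 'I:' || true)
    out_count=$(grep -F -e "$pattern" "$file" | grep -c 'O:' || true)
    echo "I: $in_count O: $out_count"
}

# Example, using the same arguments as the greps above:
#   rawlog_counts @test /tmp/rawlog
```

Running it on both hosts before and after the timeout gives the same line 
counts as the first column of the wc output.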

But why do I run into timeouts when the mails are injected second by second, 
but not when they are injected without any waiting time?

Do you have any idea what I should do next?

Regards,
Michael
