Hello,

I'm trying to set up automated backup recovery using "doveadm import" and "doveadm deduplicate". During testing I noticed that a single deduplicate run deletes only some of the duplicates, and it has to be called multiple times to remove them all. Here's what I've been trying (as shell commands):

First, expunge the inbox (the end result is the same even if you delete only some of the messages):

# doveadm expunge -u test mailbox inbox all
# ls /home/mailboxes/test/cur | wc -l
0

Then import the data from the backup twice, so that duplicates are created (again, the resulting behaviour is the same if you don't delete all messages and run the import only once):

# doveadm import -u test maildir:/home/test "" mailbox INBOX
# doveadm import -u test maildir:/home/test "" mailbox INBOX
# ls /home/mailboxes/test/cur | wc -l
1046

Then try to deduplicate:

# doveadm deduplicate -u test mailbox INBOX
# ls /home/mailboxes/test/cur | wc -l
1040

And again:

# doveadm deduplicate -u test mailbox INBOX
# ls /home/mailboxes/test/cur | wc -l
1029

And so on, until the message count settles at 523.

Each run removes roughly 10 to 30 duplicates, so eventually all duplicates are removed if "doveadm deduplicate" is called enough times in a row. I also noticed that when I repeat the test (import the backup again and call deduplicate), the step sizes, i.e. how many messages are removed per run, are the same each time. That is, I start with 1046 messages in the mailbox, after the first run there are 1040, then 1029, and so on. My guess is that the behaviour depends on what is stored in the mailbox, but that's pretty much all I can figure out on my own at this time.
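
For reference, the repeated calls can be scripted. This is only a minimal sketch of the workaround, not a fix: the helper name repeat_until_stable and the two wrapper functions are made up for illustration, and the paths and user are the ones from the test above. It assumes that an unchanged message count means there is nothing left to deduplicate.

```shell
#!/bin/sh
# Hypothetical helper (not part of Dovecot): run the function named in $2
# repeatedly until the number printed by the function named in $1 stops
# changing, then print the final number.
repeat_until_stable() {
    count_fn=$1
    action_fn=$2
    prev=$("$count_fn")
    while :; do
        "$action_fn"
        cur=$("$count_fn")
        # Stop as soon as a run leaves the count unchanged.
        [ "$cur" -eq "$prev" ] && break
        prev=$cur
    done
    echo "$cur"
}

# Wrappers around the commands used in the test above.
count_msgs() { ls /home/mailboxes/test/cur | wc -l; }
dedup_inbox() { doveadm deduplicate -u test mailbox INBOX; }

# Usage (left commented out so that sourcing this file has no side effects):
# repeat_until_stable count_msgs dedup_inbox
```

With the mailbox from this test, the loop would be expected to stop once the count reaches 523 and one further run changes nothing.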

My question is: is this intended behaviour, i.e. are you supposed to keep running doveadm deduplicate for as long as the number of messages in the mailbox keeps changing? Or is it a bug? I tried Googling for the answer but had no luck, so thanks for any answers.

Tested on Dovecot version 2.2.9 and 2.2.12 (both from Debian repositories.)
