RE: doveadm-deduplicate deletes non-duplicates
> On 14/06/2022 18:12 Marc wrote:
>
> > Aki> We have released 2.3.19.1 instead, and should be fixed now.
>
> It is not my intention to hijack this thread, and to be honest it would be
> nice to see some statistics on the existence of duplicates, triplicates etc.
> But when I create a duplicate of a message, it is often with the intention
> that I do not want to lose it and have a copy in a separate folder/mailbox.
> If this is then undone on the backend, on the server, and there is
> corruption on the storage in that specific area, I have still lost 'all'
> messages. So I am wondering if it is really worth doing this de-duplication?

If you don't want deduplication, don't run doveadm deduplicate. It's an admin-run task, not default behaviour.

Aki
RE: doveadm-deduplicate deletes non-duplicates
> Aki> We have released 2.3.19.1 instead, and should be fixed now.

It is not my intention to hijack this thread, and to be honest it would be nice to see some statistics on the existence of duplicates, triplicates etc. But when I create a duplicate of a message, it is often with the intention that I do not want to lose it and have a copy in a separate folder/mailbox. If this is then undone on the backend, on the server, and there is corruption on the storage in that specific area, I have still lost 'all' messages. So I am wondering if it is really worth doing this de-duplication?
Re: doveadm-deduplicate deletes non-duplicates
Aki> We have released 2.3.19.1 instead, and should be fixed now.

Thanks!
Re: doveadm-deduplicate deletes non-duplicates
We have released 2.3.19.1 instead, and should be fixed now.

Aki

On 06/13/2022 6:24 PM John Stoffel wrote:

> "Aki" == Aki Tuomi writes:
>
> Will 2.3.20 be released ASAP with this fix?
>
> Aki> This has now been fixed in main with
> Aki> https://github.com/dovecot/core/commit/2780f106e3b185981dd7aaf5cbf2e88daa2f7c64.patch
Re: doveadm-deduplicate deletes non-duplicates
> "Aki" == Aki Tuomi writes:

Will 2.3.20 be released ASAP with this fix?

Aki> This has now been fixed in main with
Aki> https://github.com/dovecot/core/commit/2780f106e3b185981dd7aaf5cbf2e88daa2f7c64.patch
Re: doveadm-deduplicate deletes non-duplicates
This has now been fixed in main with
https://github.com/dovecot/core/commit/2780f106e3b185981dd7aaf5cbf2e88daa2f7c64.patch

Aki

> On 13/06/2022 10:43 gravitini wrote:
>
> Please consider as critical (data loss) and recommend a warning is
> issued for 2.3.19 users.
>
> On 13/06/22 5:25 pm, Aki Tuomi wrote:
> > Thank you both for the report, we'll look into this!
> >
> > Aki
Re: doveadm-deduplicate deletes non-duplicates
> On 13/06/2022 02:09 gravitini wrote:
>
> Replying to: https://dovecot.org/pipermail/dovecot/2022-May/124816.html
>
> Hi,
>
> Looking at the code (and tested via local build from source) it looks
> like doveadm deduplicate in 2.3.19 can cause significant data loss.
>
> A 2022-02-11 commit removed key duplication resulting in undefined
> behaviour which is often truncation of a mailbox to 67 entries.
> (HASH_TABLE_MIN_SIZE)
>
> https://github.com/dovecot/core/commit/320844f50cd669b602d30210e2e5216f65d2050f?diff=split#diff-5842cf9d4248dc515d80ebb45575341b7d76832f979a8ac5f602784cb5b03f2cL121
>
> diff --git a/src/doveadm/doveadm-mail-deduplicate.c b/src/doveadm/doveadm-mail-deduplicate.c
> index caec758112..2152482876 100644
> --- a/src/doveadm/doveadm-mail-deduplicate.c
> +++ b/src/doveadm/doveadm-mail-deduplicate.c
> @@ -63,8 +63,10 @@ cmd_deduplicate_box(struct doveadm_mail_cmd_context *_ctx,
>  		if (key != NULL && *key != '\0') {
>  			if (hash_table_lookup(hash, key) != NULL)
>  				mail_expunge(mail);
> -			else
> +			else {
> +				key = p_strdup(pool, key);
>  				hash_table_insert(hash, key,
>  						  POINTER_CAST(1));
> +			}
>  		}
>  	}

Thank you both for the report, we'll look into this!

Aki
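For readers following the patch quoted in this thread, here is a minimal toy sketch of why inserting a key pointer without copying it can make distinct GUIDs look like duplicates. It is not Dovecot code. The names seen_set, seen_lookup, dup_string and count_expunged are hypothetical, the set is a linear scan rather than Dovecot's hash table, and a single reused buffer stands in for key storage that is only valid while the current message is processed. The real bug was reported as undefined behaviour, so its exact effect (such as the truncation to 67 entries) is not reproduced here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_KEYS 128

/* Toy "seen" set that stores key pointers without copying them.
 * Lookup compares string contents with a linear scan (not a hash table). */
struct seen_set {
	const char *keys[MAX_KEYS];
	size_t count;
};

static bool seen_lookup(const struct seen_set *set, const char *key)
{
	for (size_t i = 0; i < set->count; i++) {
		if (strcmp(set->keys[i], key) == 0)
			return true;
	}
	return false;
}

/* Hypothetical helper: heap copy of a string, standing in for p_strdup(). */
static char *dup_string(const char *s)
{
	char *copy = malloc(strlen(s) + 1);

	if (copy == NULL)
		abort();
	strcpy(copy, s);
	return copy;
}

/* Returns how many of the n GUIDs would be expunged as "duplicates".
 * Every GUID is first written into one reused buffer. With copy_keys
 * false, the set keeps a pointer into that buffer, so the stored key
 * silently changes when the next GUID is written. */
size_t count_expunged(const char *const *guids, size_t n, bool copy_keys)
{
	struct seen_set set = { .count = 0 };
	char *copies[MAX_KEYS];
	size_t ncopies = 0, expunged = 0;
	char buf[128];

	assert(n <= MAX_KEYS);
	for (size_t i = 0; i < n; i++) {
		snprintf(buf, sizeof(buf), "%s", guids[i]);
		const char *key = buf;

		if (seen_lookup(&set, key)) {
			expunged++;
		} else {
			if (copy_keys)
				key = copies[ncopies++] = dup_string(key);
			set.keys[set.count++] = key;
		}
	}
	for (size_t i = 0; i < ncopies; i++)
		free(copies[i]);
	return expunged;
}
```

In this toy model, three distinct GUIDs give count_expunged(guids, 3, false) == 2, because every later message matches the aliased first key, while count_expunged(guids, 3, true) == 0. Copying the key before insertion, as the patch does with p_strdup(), is what keeps stored keys stable.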
doveadm-deduplicate deletes non-duplicates
Hi,

I've been trying to use `doveadm deduplicate` to deduplicate mailboxes. According to doveadm-deduplicate(1), "deduplication will be done by message GUIDs". However, deduplication deletes messages with distinct message GUIDs, i.e., it deletes messages that are not duplicates.

Is this a case of user error, do I have some form of corruption going on, or am I running into a bug?

In case it helps, I'm including:

1) the list of GUIDs for messages in my INBOX before a deduplication run (as documented in doveadm-deduplicate(1)),
2) the output of `doveadm -D deduplicate -u rak mailbox INBOX`,
3) the list of GUIDs after deduplication,
4) a diff of (1) and (3),
5) the output of doveconf -n.

Thanks,
Ryan

$ doveadm -f table fetch -u rak 'guid uid' mailbox INBOX | sort
0b3bee1414118f6282db226807b0 7100
0b4b681c18168f622094226807b0 7104
0b83ac1d79cf2962eb3a0100226807b0 6153
0b97791e83108f6282db226807b0 7095
1614451513.M180742P77875.hades.rak.ac,S=3516,W=3600 22
1614451513.M180779P77875.hades.rak.ac,S=5252,W=5370 52
1614452870.M315137P88362.hades.rak.ac,S=5516,W=5623 68
1614452870.M315152P88362.hades.rak.ac,S=5977,W=6085 69
23a5a0179e108f6282db226807b0 7097
33318e0686108f6282db226807b0 7096
3351581c9c0e8f624cd5226807b0 7091
3359032360108f6282db226807b0 7093
3b3927352be48c62b65b226807b0 7072
3ba2252814118f62a2540100226807b0 7101
4154be0c2fc77761c255226807b0 4050
5b3a7327d73b866299cd226807b0 7013
638e073255235d62a5280100226807b0 6660
63b6422e14118f6282db226807b0 7102
8b9b942909118f6282db226807b0 7099
8bc95639a6168f62ad4e226807b0 7105
a316ee28980d8f627c46226807b0 7090
ab802c1e37ac3d6214680100226807b0 6343
b3d12f21cd108f6282db226807b0 7098
bb89431b6e728d623092226807b0 7076
c386310f062adb61d198226807b0 5306
cb56992484478d62f3090100226807b0 7074
cb79bf3503128f6275220100226807b0 7103
eb2818367d108f62484b0100a558518d 7094
eb709020e70f8f62292e226807b0 7092
f19a8c06409a7d61271c226807b0 4128
f338ad1ff65e8562ff390100226807b0 7007
f9cd03363c457c613a0e226807b0 4107
guid uid
$ doveadm -D deduplicate -u rak mailbox INBOX
Debug: Loading modules from directory: /usr/local/lib/dovecot/doveadm
Debug: Skipping module doveadm_acl_plugin, because dlopen() failed: Cannot load specified object (this is usually intentional, so just ignore this message)
Debug: Skipping module doveadm_quota_plugin, because dlopen() failed: Cannot load specified object (this is usually intentional, so just ignore this message)
Debug: Module loaded: /usr/local/lib/dovecot/doveadm/lib10_doveadm_sieve_plugin.so
Debug: Skipping module doveadm_fts_plugin, because dlopen() failed: Cannot load specified object (this is usually intentional, so just ignore this message)
Debug: Skipping module doveadm_fts_flatcurve_plugin, because dlopen() failed: Cannot load specified object (this is usually intentional, so just ignore this message)
Debug: Skipping module doveadm_mail_crypt_plugin, because dlopen() failed: Cannot load specified object (this is usually intentional, so just ignore this message)
May 31 12:50:35 Debug: Loading modules from directory: /usr/local/lib/dovecot
May 31 12:50:35 Debug: Module loaded: /usr/local/lib/dovecot/lib15_notify_plugin.so
May 31 12:50:35 Debug: Module loaded: /usr/local/lib/dovecot/lib20_replication_plugin.so
May 31 12:50:35 Debug: Module loaded: /usr/local/lib/dovecot/lib20_virtual_plugin.so
May 31 12:50:35 Debug: Loading modules from directory: /usr/local/lib/dovecot/doveadm
May 31 12:50:35 Debug: Skipping module doveadm_acl_plugin, because dlopen() failed: Cannot load specified object (this is usually intentional, so just ignore this message)
May 31 12:50:35 Debug: Skipping module doveadm_quota_plugin, because dlopen() failed: Cannot load specified object (this is usually intentional, so just ignore this message)
May 31 12:50:35 Debug: Skipping module doveadm_fts_plugin, because dlopen() failed: Cannot load specified object (this is usually intentional, so just ignore this message)
May 31 12:50:35 Debug: Skipping module doveadm_fts_flatcurve_plugin, because dlopen() failed: Cannot load specified object (this is usually intentional, so just ignore this message)
May 31 12:50:35 Debug: Skipping module doveadm_mail_crypt_plugin, because dlopen() failed: Cannot load specified object (this is usually intentional, so just ignore this message)
May 31 12:50:35