RE: doveadm-deduplicate deletes non-duplicates

2022-06-14 Thread Aki Tuomi


> On 14/06/2022 18:12 Marc  wrote:
> 
>  
> > 
> > 
> > Aki> We have released 2.3.19.1 instead, and should be fixed now.
> > 
> 
> It is not my intention to hijack this thread, and to be honest it would be
> nice to see some statistics on the existence of duplicates, triplicates etc.
> But when I create a duplicate of a message, it is often because I do not
> want to lose it and want a copy in a separate folder/mailbox. If this is
> then undone on the server backend, and the storage is corrupted in that
> specific area, I have still lost 'all' messages. So I am wondering whether
> this de-duplication is really worth doing?

If you don't want deduplication, don't run doveadm deduplicate. It's an
admin-run task, not default behaviour.

Aki


RE: doveadm-deduplicate deletes non-duplicates

2022-06-14 Thread Marc
> 
> 
> Aki> We have released 2.3.19.1 instead, and should be fixed now.
> 

It is not my intention to hijack this thread, and to be honest it would be nice
to see some statistics on the existence of duplicates, triplicates etc.
But when I create a duplicate of a message, it is often because I do not want
to lose it and want a copy in a separate folder/mailbox. If this is then undone
on the server backend, and the storage is corrupted in that specific area, I
have still lost 'all' messages. So I am wondering whether this de-duplication
is really worth doing?


Re: doveadm-deduplicate deletes non-duplicates

2022-06-14 Thread John Stoffel



Aki> We have released 2.3.19.1 instead, and should be fixed now.

Thanks!


Re: doveadm-deduplicate deletes non-duplicates

2022-06-14 Thread Aki Tuomi


 
 
  
We have released 2.3.19.1 instead, and should be fixed now.

Aki

On 06/13/2022 6:24 PM John Stoffel wrote:

> "Aki" == Aki Tuomi writes:

Will 2.3.20 be released ASAP with this fix?

Aki> This has now been fixed in main with
Aki> https://github.com/dovecot/core/commit/2780f106e3b185981dd7aaf5cbf2e88daa2f7c64.patch

Aki> Aki

>> On 13/06/2022 10:43 gravitini wrote:
>>
>> Please consider as critical (data loss) and recommend a warning is
>> issued for 2.3.19 users.
>>
>> On 13/06/22 5:25 pm, Aki Tuomi wrote:
>> >> On 13/06/2022 02:09 gravitini wrote:
>> >>
>> >> Replying to: https://dovecot.org/pipermail/dovecot/2022-May/124816.html
>> >>
>> >> Hi,
>> >>
>> >> Looking at the code (and tested via local build from source) it looks
>> >> like doveadm deduplicate in 2.3.19 can cause significant data loss.
>> >>
>> >> A 2022-02-11 commit removed key duplication resulting in undefined
>> >> behaviour which is often truncation of a mailbox to 67 entries.
>> >> (HASH_TABLE_MIN_SIZE)
>> >>
>> >> https://github.com/dovecot/core/commit/320844f50cd669b602d30210e2e5216f65d2050f?diff=split#diff-5842cf9d4248dc515d80ebb45575341b7d76832f979a8ac5f602784cb5b03f2cL121
>> >>
>> >> diff --git a/src/doveadm/doveadm-mail-deduplicate.c b/src/doveadm/doveadm-mail-deduplicate.c
>> >> index caec758112..2152482876 100644
>> >> --- a/src/doveadm/doveadm-mail-deduplicate.c
>> >> +++ b/src/doveadm/doveadm-mail-deduplicate.c
>> >> @@ -63,8 +63,10 @@ cmd_deduplicate_box(struct doveadm_mail_cmd_context *_ctx,
>> >>      if (key != NULL && *key != '\0') {
>> >>          if (hash_table_lookup(hash, key) != NULL)
>> >>              mail_expunge(mail);
>> >> -        else
>> >> +        else {
>> >> +            key = p_strdup(pool, key);
>> >>              hash_table_insert(hash, key, POINTER_CAST(1));
>> >> +        }
>> >>      }
>> >>  }
>> > Thank you both for the report, we'll look into this!
>> >
>> > Aki

  
 



Re: doveadm-deduplicate deletes non-duplicates

2022-06-13 Thread John Stoffel
> "Aki" == Aki Tuomi  writes:

Will 2.3.20 be released ASAP with this fix?  

Aki> This has now been fixed in main with
Aki> https://github.com/dovecot/core/commit/2780f106e3b185981dd7aaf5cbf2e88daa2f7c64.patch

Aki> Aki

>> On 13/06/2022 10:43 gravitini  wrote:
>> 
>> 
>> Please consider as critical (data loss) and recommend a warning is 
>> issued for 2.3.19 users.
>> 
>> 
>> On 13/06/22 5:25 pm, Aki Tuomi wrote:
>> >> On 13/06/2022 02:09 gravitini  wrote:
>> >>
>> >>   
>> >> Replying to: https://dovecot.org/pipermail/dovecot/2022-May/124816.html
>> >>
>> >>
>> >> Hi,
>> >>
>> >> Looking at the code (and tested via local build from source) it looks
>> >> like doveadm deduplicate in 2.3.19 can cause significant data loss.
>> >>
>> >> A 2022-02-11 commit removed key duplication resulting in undefined
>> >> behaviour which is often truncation of a mailbox to 67 entries.
>> >> (HASH_TABLE_MIN_SIZE)
>> >>
>> >> https://github.com/dovecot/core/commit/320844f50cd669b602d30210e2e5216f65d2050f?diff=split#diff-5842cf9d4248dc515d80ebb45575341b7d76832f979a8ac5f602784cb5b03f2cL121
>> >>
>> >> diff --git a/src/doveadm/doveadm-mail-deduplicate.c b/src/doveadm/doveadm-mail-deduplicate.c
>> >> index caec758112..2152482876 100644
>> >> --- a/src/doveadm/doveadm-mail-deduplicate.c
>> >> +++ b/src/doveadm/doveadm-mail-deduplicate.c
>> >> @@ -63,8 +63,10 @@ cmd_deduplicate_box(struct doveadm_mail_cmd_context *_ctx,
>> >>      if (key != NULL && *key != '\0') {
>> >>          if (hash_table_lookup(hash, key) != NULL)
>> >>              mail_expunge(mail);
>> >> -        else
>> >> +        else {
>> >> +            key = p_strdup(pool, key);
>> >>              hash_table_insert(hash, key, POINTER_CAST(1));
>> >> +        }
>> >>      }
>> >>  }
>> > Thank you both for the report, we'll look into this!
>> >
>> > Aki


Re: doveadm-deduplicate deletes non-duplicates

2022-06-13 Thread Aki Tuomi
This has now been fixed in main with 
https://github.com/dovecot/core/commit/2780f106e3b185981dd7aaf5cbf2e88daa2f7c64.patch

Aki

> On 13/06/2022 10:43 gravitini  wrote:
> 
>  
> Please consider as critical (data loss) and recommend a warning is 
> issued for 2.3.19 users.
> 
> 
> On 13/06/22 5:25 pm, Aki Tuomi wrote:
> >> On 13/06/2022 02:09 gravitini  wrote:
> >>
> >>   
> >> Replying to: https://dovecot.org/pipermail/dovecot/2022-May/124816.html
> >>
> >>
> >> Hi,
> >>
> >> Looking at the code (and tested via local build from source) it looks
> >> like doveadm deduplicate in 2.3.19 can cause significant data loss.
> >>
> >> A 2022-02-11 commit removed key duplication resulting in undefined
> >> behaviour which is often truncation of a mailbox to 67 entries.
> >> (HASH_TABLE_MIN_SIZE)
> >>
> >> https://github.com/dovecot/core/commit/320844f50cd669b602d30210e2e5216f65d2050f?diff=split#diff-5842cf9d4248dc515d80ebb45575341b7d76832f979a8ac5f602784cb5b03f2cL121
> >>
> >> diff --git a/src/doveadm/doveadm-mail-deduplicate.c b/src/doveadm/doveadm-mail-deduplicate.c
> >> index caec758112..2152482876 100644
> >> --- a/src/doveadm/doveadm-mail-deduplicate.c
> >> +++ b/src/doveadm/doveadm-mail-deduplicate.c
> >> @@ -63,8 +63,10 @@ cmd_deduplicate_box(struct doveadm_mail_cmd_context *_ctx,
> >>      if (key != NULL && *key != '\0') {
> >>          if (hash_table_lookup(hash, key) != NULL)
> >>              mail_expunge(mail);
> >> -        else
> >> +        else {
> >> +            key = p_strdup(pool, key);
> >>              hash_table_insert(hash, key, POINTER_CAST(1));
> >> +        }
> >>      }
> >>  }
> > Thank you both for the report, we'll look into this!
> >
> > Aki


Re: doveadm-deduplicate deletes non-duplicates

2022-06-12 Thread Aki Tuomi


> On 13/06/2022 02:09 gravitini  wrote:
> 
>  
> Replying to: https://dovecot.org/pipermail/dovecot/2022-May/124816.html
> 
> 
> Hi,
> 
> Looking at the code (and tested via local build from source) it looks 
> like doveadm deduplicate in 2.3.19 can cause significant data loss.
> 
> A 2022-02-11 commit removed key duplication resulting in undefined 
> behaviour which is often truncation of a mailbox to 67 entries. 
> (HASH_TABLE_MIN_SIZE)
> 
> https://github.com/dovecot/core/commit/320844f50cd669b602d30210e2e5216f65d2050f?diff=split#diff-5842cf9d4248dc515d80ebb45575341b7d76832f979a8ac5f602784cb5b03f2cL121
> 
> diff --git a/src/doveadm/doveadm-mail-deduplicate.c b/src/doveadm/doveadm-mail-deduplicate.c
> index caec758112..2152482876 100644
> --- a/src/doveadm/doveadm-mail-deduplicate.c
> +++ b/src/doveadm/doveadm-mail-deduplicate.c
> @@ -63,8 +63,10 @@ cmd_deduplicate_box(struct doveadm_mail_cmd_context *_ctx,
>      if (key != NULL && *key != '\0') {
>          if (hash_table_lookup(hash, key) != NULL)
>              mail_expunge(mail);
> -        else
> +        else {
> +            key = p_strdup(pool, key);
>              hash_table_insert(hash, key, POINTER_CAST(1));
> +        }
>      }
>  }

Thank you both for the report, we'll look into this!

Aki
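
To illustrate the failure mode described in the report quoted above, here is a
minimal sketch of why a "seen" table must store its own copy of each key. It is
not Dovecot's code: `seen_set`, `seen_contains`, `copy_key` and
`count_expunged` are hypothetical names, and the reused `scratch` buffer is
only an assumed stand-in for a temporary key string. In the real code the
stale pointer is undefined behaviour; this model is well defined and just shows
how non-duplicates can come to look like duplicates.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define MAX_KEYS 128

/* Toy "seen" set (hypothetical, not Dovecot's hash table): it stores key
 * pointers and compares keys by string content. */
struct seen_set {
    char *keys[MAX_KEYS];
    size_t count;
};

bool seen_contains(const struct seen_set *set, const char *key)
{
    for (size_t i = 0; i < set->count; i++) {
        if (strcmp(set->keys[i], key) == 0)
            return true;
    }
    return false;
}

/* Private copy of a key, standing in for p_strdup(pool, key) in the diff. */
char *copy_key(const char *key)
{
    size_t len = strlen(key) + 1;
    char *copy = malloc(len);

    if (copy == NULL)
        abort();
    memcpy(copy, key, len);
    return copy;
}

/* Counts how many of the n GUIDs a deduplication pass would expunge.
 * copy_keys == false stores a pointer to the shared scratch buffer (the
 * assumed bug model); copy_keys == true stores a private copy (the fix). */
size_t count_expunged(const char *const *guids, size_t n, bool copy_keys)
{
    struct seen_set set = { .count = 0 };
    char scratch[128]; /* overwritten for every message */
    size_t expunged = 0;

    assert(n <= MAX_KEYS);
    for (size_t i = 0; i < n; i++) {
        strncpy(scratch, guids[i], sizeof(scratch) - 1);
        scratch[sizeof(scratch) - 1] = '\0';

        if (seen_contains(&set, scratch))
            expunged++;
        else
            set.keys[set.count++] = copy_keys ? copy_key(scratch) : scratch;
    }
    if (copy_keys) {
        for (size_t i = 0; i < set.count; i++)
            free(set.keys[i]);
    }
    return expunged;
}
```

For the GUID list a, b, c, a, `count_expunged(guids, 4, true)` returns 1 (only
the real duplicate), while `count_expunged(guids, 4, false)` returns 3, because
the single stored pointer always reads back as whatever key is currently being
checked.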


doveadm-deduplicate deletes non-duplicates

2022-05-31 Thread Ryan Kavanagh
Hi,

I've been trying to use `doveadm deduplicate` to deduplicate mailboxes.
According to doveadm-deduplicate(1), "deduplication will be done by
message GUIDs". However, deduplication deletes messages with distinct
message GUIDs, i.e., it deletes messages that are not duplicates. Is
this a case of user error, do I have some form of corruption going on,
or am I running into a bug?

In case it helps, I'm including:

1) the list of GUIDs for messages in my INBOX before a deduplication run
   (as documented in doveadm-deduplicate(1)),
2) the output of `doveadm -D deduplicate -u rak mailbox INBOX`,
3) the list of GUIDs after deduplication,
4) a diff of (1) and (3),
5) the output of doveconf -n.

Thanks,
Ryan

$ doveadm -f table fetch -u rak 'guid uid' mailbox INBOX | sort
0b3bee1414118f6282db226807b0 7100
0b4b681c18168f622094226807b0 7104
0b83ac1d79cf2962eb3a0100226807b0 6153
0b97791e83108f6282db226807b0 7095
1614451513.M180742P77875.hades.rak.ac,S=3516,W=3600 22
1614451513.M180779P77875.hades.rak.ac,S=5252,W=5370 52
1614452870.M315137P88362.hades.rak.ac,S=5516,W=5623 68
1614452870.M315152P88362.hades.rak.ac,S=5977,W=6085 69
23a5a0179e108f6282db226807b0 7097
33318e0686108f6282db226807b0 7096
3351581c9c0e8f624cd5226807b0 7091
3359032360108f6282db226807b0 7093
3b3927352be48c62b65b226807b0 7072
3ba2252814118f62a2540100226807b0 7101
4154be0c2fc77761c255226807b0 4050
5b3a7327d73b866299cd226807b0 7013
638e073255235d62a5280100226807b0 6660
63b6422e14118f6282db226807b0 7102
8b9b942909118f6282db226807b0 7099
8bc95639a6168f62ad4e226807b0 7105
a316ee28980d8f627c46226807b0 7090
ab802c1e37ac3d6214680100226807b0 6343
b3d12f21cd108f6282db226807b0 7098
bb89431b6e728d623092226807b0 7076
c386310f062adb61d198226807b0 5306
cb56992484478d62f3090100226807b0 7074
cb79bf3503128f6275220100226807b0 7103
eb2818367d108f62484b0100a558518d 7094
eb709020e70f8f62292e226807b0 7092
f19a8c06409a7d61271c226807b0 4128
f338ad1ff65e8562ff390100226807b0 7007
f9cd03363c457c613a0e226807b0 4107
guid uid

$ doveadm -D deduplicate -u rak mailbox INBOX
Debug: Loading modules from directory: /usr/local/lib/dovecot/doveadm
Debug: Skipping module doveadm_acl_plugin, because dlopen() failed: Cannot load specified object (this is usually intentional, so just ignore this message)
Debug: Skipping module doveadm_quota_plugin, because dlopen() failed: Cannot load specified object (this is usually intentional, so just ignore this message)
Debug: Module loaded: /usr/local/lib/dovecot/doveadm/lib10_doveadm_sieve_plugin.so
Debug: Skipping module doveadm_fts_plugin, because dlopen() failed: Cannot load specified object (this is usually intentional, so just ignore this message)
Debug: Skipping module doveadm_fts_flatcurve_plugin, because dlopen() failed: Cannot load specified object (this is usually intentional, so just ignore this message)
Debug: Skipping module doveadm_mail_crypt_plugin, because dlopen() failed: Cannot load specified object (this is usually intentional, so just ignore this message)
May 31 12:50:35 Debug: Loading modules from directory: /usr/local/lib/dovecot
May 31 12:50:35 Debug: Module loaded: /usr/local/lib/dovecot/lib15_notify_plugin.so
May 31 12:50:35 Debug: Module loaded: /usr/local/lib/dovecot/lib20_replication_plugin.so
May 31 12:50:35 Debug: Module loaded: /usr/local/lib/dovecot/lib20_virtual_plugin.so
May 31 12:50:35 Debug: Loading modules from directory: /usr/local/lib/dovecot/doveadm
May 31 12:50:35 Debug: Skipping module doveadm_acl_plugin, because dlopen() failed: Cannot load specified object (this is usually intentional, so just ignore this message)
May 31 12:50:35 Debug: Skipping module doveadm_quota_plugin, because dlopen() failed: Cannot load specified object (this is usually intentional, so just ignore this message)
May 31 12:50:35 Debug: Skipping module doveadm_fts_plugin, because dlopen() failed: Cannot load specified object (this is usually intentional, so just ignore this message)
May 31 12:50:35 Debug: Skipping module doveadm_fts_flatcurve_plugin, because dlopen() failed: Cannot load specified object (this is usually intentional, so just ignore this message)
May 31 12:50:35 Debug: Skipping module doveadm_mail_crypt_plugin, because dlopen() failed: Cannot load specified object (this is usually intentional, so just ignore this message)
May 31 12:50:35