Re: sis deduplication broken from 2.2.16 upwards

2016-04-12 Thread Alexander Moisseev

On 11.03.2016 3:56, Timo Sirainen wrote:

>> So, after the fix is applied, does dovecot silently delete the
>> duplicated files, or is there a command that needs to be run manually?
>
> You'd have to do it manually in some way. A script that does something like:
>
> Go through all attachment directories and for each file:
>  - Sort files by filename
>  - Identify files A and B that are the same (their filenames begin with
>    the same hash) but have different inodes
>  - ln A B.tmp && mv B.tmp B



The problem turned out to be a bit more complicated than that.

I finally came up with this script:
https://github.com/moisseev/doveadm-tools/blob/master/bin/dsisck

It assumes Dovecot is not running.


Re: sis deduplication broken from 2.2.16 upwards

2016-03-21 Thread Alexander Moisseev

On 11.03.16 3:56, Timo Sirainen wrote:



>> On 11 Mar 2016, at 02:37, Charles Marcus wrote:
>>
>> On 3/9/2016 9:02 PM, Timo Sirainen wrote:
>>> On 08 Mar 2016, at 01:50, Pavel Stano wrote:
>>>> sis attachment deduplication is broken in 2.2.16 upwards.
>>>> It is caused by this commit:
>>>> https://github.com/dovecot/core/commit/664bf3e236c214aee86294483c379e4fa66c2e63
>>>>
>>>> In src/lib-fs/fs-sis.c, the function fs_sis_try_link() compares the
>>>> inodes of the hash files. After that commit, fs_stat() uses fstat() on
>>>> the open fd of the temporary file instead of stat() on the filename,
>>>> but that temporary file has a different inode.
>>>>
>>>> It does not cause any corruption, but it does not save any space,
>>>> because every duplicate attachment ends up in a separate file.
>>> Thanks, fixed:
>>> https://github.com/dovecot/core/commit/3b39022ea0513363241cf852b7d454c841584ea1
>> So, after the fix is applied, does dovecot silently delete the
>> duplicated files, or is there a command that needs to be run manually?
> You'd have to do it manually in some way. A script that does something like:
>
> Go through all attachment directories and for each file:
>  - Sort files by filename
>  - Identify files A and B that are the same (their filenames begin with
>    the same hash) but have different inodes
>  - ln A B.tmp && mv B.tmp B

 
I've also found that many of the /hashes/ directories are missing.


# ll /tank1/vmail/attachments/1f/1f
total 3300
-rw---  1 vmail  vmail   403976 12 Nov 00:20 1f1f504c582600a2af94b39c088692aba714fe72-c53b9e1508b14356797d0100d09efc50
-rw---  1 vmail  vmail   403976 12 Nov 00:20 1f1f504c582600a2af94b39c088692aba714fe72-c93b9e1508b14356797d0100d09efc50
-rw---  1 vmail  vmail   403976 12 Nov 00:20 1f1f504c582600a2af94b39c088692aba714fe72-f2a777181eb14356807d0100d09efc50
-rw---  1 vmail  vmail   403976 12 Nov 00:20 1f1f504c582600a2af94b39c088692aba714fe72-f31a5e2917b143567e7d0100d09efc50
-rw---  1 vmail  vmail  2582016  3 Nov 00:20 1f1f97880e8cddc2dfe3c4ad2654b9da937226b7-94c53d358bd33756d614d09efc50

Is this related to the same bug, or is it a separate issue?
Is it safe to delete attachment files if there is no file with the same hash in
the /hashes/ directory, or if there is no /hashes/ directory at all?


Re: sis deduplication broken from 2.2.16 upwards

2016-03-11 Thread Charles Marcus
On 3/10/2016 7:56 PM, Timo Sirainen wrote:
>> On 11 Mar 2016, at 02:37, Charles Marcus wrote:
>>
>> On 3/9/2016 9:02 PM, Timo Sirainen wrote:
>>> On 08 Mar 2016, at 01:50, Pavel Stano wrote:
>>>> sis attachment deduplication is broken in 2.2.16 upwards.
>>>> It is caused by this commit:
>>>> https://github.com/dovecot/core/commit/664bf3e236c214aee86294483c379e4fa66c2e63
>>>>
>>>> In src/lib-fs/fs-sis.c, the function fs_sis_try_link() compares the
>>>> inodes of the hash files. After that commit, fs_stat() uses fstat() on
>>>> the open fd of the temporary file instead of stat() on the filename,
>>>> but that temporary file has a different inode.
>>>>
>>>> It does not cause any corruption, but it does not save any space,
>>>> because every duplicate attachment ends up in a separate file.
>>> Thanks, fixed:
>>> https://github.com/dovecot/core/commit/3b39022ea0513363241cf852b7d454c841584ea1
>> So, after the fix is applied, does dovecot silently delete the
>> duplicated files, or is there a command that needs to be run manually?
> You'd have to do it manually in some way. A script that does something like:
>
> Go through all attachment directories and for each file:
>  - Sort files by filename
>  - Identify files A and B that are the same (their filenames begin with
>    the same hash) but have different inodes
>  - ln A B.tmp && mv B.tmp B

Ugh... ok thanks, but it seems like that would be much safer as a
doveadm command...


Re: sis deduplication broken from 2.2.16 upwards

2016-03-11 Thread Harald Leithner

On 11.03.2016 at 01:56, Timo Sirainen wrote:



>> On 11 Mar 2016, at 02:37, Charles Marcus wrote:
>>
>> On 3/9/2016 9:02 PM, Timo Sirainen wrote:
>>> On 08 Mar 2016, at 01:50, Pavel Stano wrote:
>>>> sis attachment deduplication is broken in 2.2.16 upwards.
>>>> It is caused by this commit:
>>>> https://github.com/dovecot/core/commit/664bf3e236c214aee86294483c379e4fa66c2e63
>>>>
>>>> In src/lib-fs/fs-sis.c, the function fs_sis_try_link() compares the
>>>> inodes of the hash files. After that commit, fs_stat() uses fstat() on
>>>> the open fd of the temporary file instead of stat() on the filename,
>>>> but that temporary file has a different inode.
>>>>
>>>> It does not cause any corruption, but it does not save any space,
>>>> because every duplicate attachment ends up in a separate file.
>>> Thanks, fixed:
>>> https://github.com/dovecot/core/commit/3b39022ea0513363241cf852b7d454c841584ea1
>> So, after the fix is applied, does dovecot silently delete the
>> duplicated files, or is there a command that needs to be run manually?
> You'd have to do it manually in some way. A script that does something like:
>
> Go through all attachment directories and for each file:
>  - Sort files by filename
>  - Identify files A and B that are the same (their filenames begin with
>    the same hash) but have different inodes
>  - ln A B.tmp && mv B.tmp B



This is how it works in sis-queue, correct?

Wouldn't it be nice to adapt doveadm sis deduplicate to handle this?

regards

--
Harald Leithner

ITronic
Wiedner Hauptstraße 120/5.1, 1050 Wien, Austria
Tel: +43-1-545 0 604
Mobil: +43-699-123 78 4 78
Mail: leith...@itronic.at | itronic.at


Re: sis deduplication broken from 2.2.16 upwards

2016-03-10 Thread Timo Sirainen

> On 11 Mar 2016, at 02:37, Charles Marcus wrote:
> 
> On 3/9/2016 9:02 PM, Timo Sirainen wrote:
>> On 08 Mar 2016, at 01:50, Pavel Stano wrote:
>>> 
>>> sis attachment deduplication is broken in 2.2.16 upwards.
>>> It is caused by this commit:
>>> https://github.com/dovecot/core/commit/664bf3e236c214aee86294483c379e4fa66c2e63
>>> 
>>> In src/lib-fs/fs-sis.c, the function fs_sis_try_link() compares the
>>> inodes of the hash files. After that commit, fs_stat() uses fstat() on
>>> the open fd of the temporary file instead of stat() on the filename,
>>> but that temporary file has a different inode.
>>> 
>>> It does not cause any corruption, but it does not save any space,
>>> because every duplicate attachment ends up in a separate file.
>> Thanks, fixed: 
>> https://github.com/dovecot/core/commit/3b39022ea0513363241cf852b7d454c841584ea1
> 
> So, after the fix is applied, does dovecot silently delete the
> duplicated files, or is there a command that needs to be run manually?

You'd have to do it manually in some way. A script that does something like:

Go through all attachment directories and for each file:
 - Sort files by filename
 - Identify files A and B that are the same (their filenames begin with
   the same hash) but have different inodes
 - ln A B.tmp && mv B.tmp B
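
The steps above could be sketched roughly as follows in Python. This is only an illustration under several assumptions, not a tested migration tool: the function name relink_duplicates is made up here, the "<hash>-<guid>" filename layout is inferred from the directory listings posted in this thread, the byte-for-byte comparison is an extra safety check that the steps above do not ask for, and the hashes/ directories are ignored entirely. It should only be tried on a copy of the attachment storage, or with Dovecot stopped.

```python
import filecmp
import os
from collections import defaultdict


def relink_duplicates(attachment_dir, dry_run=True):
    """Hard-link attachment files whose names start with the same hash.

    Assumes names of the form "<hash>-<guid>" (an assumption taken from
    the listings in this thread). Returns the paths that were, or in
    dry-run mode would be, replaced by hard links.
    """
    groups = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(attachment_dir):
        for name in sorted(filenames):  # step 1: sort files by filename
            if "-" not in name or name.endswith(".tmp"):
                continue  # skips e.g. entries under hashes/
            content_hash = name.split("-", 1)[0]
            groups[(dirpath, content_hash)].append(os.path.join(dirpath, name))

    relinked = []
    for paths in groups.values():
        keeper = paths[0]  # file "A"
        keeper_stat = os.stat(keeper)
        for other in paths[1:]:  # each candidate file "B"
            st = os.stat(other)
            # step 2: same hash prefix but a different inode
            if (st.st_dev, st.st_ino) == (keeper_stat.st_dev, keeper_stat.st_ino):
                continue
            # extra caution (not in the steps above): compare contents
            if not filecmp.cmp(keeper, other, shallow=False):
                continue
            relinked.append(other)
            if not dry_run:
                tmp = other + ".tmp"
                os.link(keeper, tmp)    # ln A B.tmp
                os.replace(tmp, other)  # mv B.tmp B (atomic rename)
    return relinked
```

Calling relink_duplicates("/path/to/attachments") with the default dry_run=True only reports what would change; passing dry_run=False performs the link-and-rename.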


Re: sis deduplication broken from 2.2.16 upwards

2016-03-10 Thread Charles Marcus
On 3/9/2016 9:02 PM, Timo Sirainen wrote:
> On 08 Mar 2016, at 01:50, Pavel Stano wrote:
>>
>> sis attachment deduplication is broken in 2.2.16 upwards.
>> It is caused by this commit:
>> https://github.com/dovecot/core/commit/664bf3e236c214aee86294483c379e4fa66c2e63
>>
>> In src/lib-fs/fs-sis.c, the function fs_sis_try_link() compares the
>> inodes of the hash files. After that commit, fs_stat() uses fstat() on
>> the open fd of the temporary file instead of stat() on the filename,
>> but that temporary file has a different inode.
>>
>> It does not cause any corruption, but it does not save any space,
>> because every duplicate attachment ends up in a separate file.
> Thanks, fixed: 
> https://github.com/dovecot/core/commit/3b39022ea0513363241cf852b7d454c841584ea1

So, after the fix is applied, does dovecot silently delete the
duplicated files, or is there a command that needs to be run manually?


Re: sis deduplication broken from 2.2.16 upwards

2016-03-09 Thread Timo Sirainen
On 08 Mar 2016, at 01:50, Pavel Stano wrote:
> 
> sis attachment deduplication is broken in 2.2.16 upwards.
> It is caused by this commit:
> https://github.com/dovecot/core/commit/664bf3e236c214aee86294483c379e4fa66c2e63
> 
> In src/lib-fs/fs-sis.c, the function fs_sis_try_link() compares the
> inodes of the hash files. After that commit, fs_stat() uses fstat() on
> the open fd of the temporary file instead of stat() on the filename,
> but that temporary file has a different inode.
> 
> It does not cause any corruption, but it does not save any space,
> because every duplicate attachment ends up in a separate file.

Thanks, fixed: 
https://github.com/dovecot/core/commit/3b39022ea0513363241cf852b7d454c841584ea1


sis deduplication broken from 2.2.16 upwards

2016-03-07 Thread Pavel Stano

Hi,

sis attachment deduplication is broken in 2.2.16 upwards.
It is caused by this commit:
https://github.com/dovecot/core/commit/664bf3e236c214aee86294483c379e4fa66c2e63

In src/lib-fs/fs-sis.c, the function fs_sis_try_link() compares the
inodes of the hash files. After that commit, fs_stat() uses fstat() on
the open fd of the temporary file instead of stat() on the filename,
but that temporary file has a different inode.

It does not cause any corruption, but it does not save any space,
because every duplicate attachment ends up in a separate file.
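
To illustrate the inode point in general terms, here is a small Python sketch. It does not reproduce Dovecot's code; the function name inode_comparison and the file names are made up for the example. It only shows that fstat() on a separately created temporary file reports a different inode from the hash file even when the contents are identical, while stat() on a hard-linked name reports the same inode.

```python
import os


def inode_comparison(workdir):
    """Compare inodes of a 'hash' file, a temp copy, and a hard link."""
    hash_path = os.path.join(workdir, "hashfile")
    temp_path = os.path.join(workdir, "tempfile")
    link_path = os.path.join(workdir, "linked")

    # Two independent files with identical contents.
    for path in (hash_path, temp_path):
        with open(path, "wb") as f:
            f.write(b"same attachment bytes")

    # A hard link to the hash file: a second name for the same inode.
    os.link(hash_path, link_path)

    # fstat() on the open fd of the temporary file.
    with open(temp_path, "rb") as f:
        temp_ino = os.fstat(f.fileno()).st_ino

    # stat() on filenames.
    hash_ino = os.stat(hash_path).st_ino
    link_ino = os.stat(link_path).st_ino

    return {
        "temp_matches_hash": temp_ino == hash_ino,
        "link_matches_hash": link_ino == hash_ino,
    }
```

On a filesystem that supports hard links, the temporary file does not match the hash file's inode while the linked name does, so an inode comparison made against the temporary file can never report that the link already exists.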

-- 
Pavel Stano | Troubleshooter

http://WebSupport.sk