Re: Corrupted sizes in cache once again

2023-02-02 Thread Stuart Henderson
On 2023-02-02, Tim Evers  wrote:
>
> On 02.02.23 at 16:23, Aki Tuomi wrote:
>> For bug reports, we do ask that you try to reproduce it with 2.3.20 (current 
>> latest). You can get packages from https://repo.dovecot.org/, and it would 
>> be nice if you could provide steps to reproduce this issue.
>>
>> Also please see https://www.dovecot.org/bugreport-mail/ if you have not yet.
>>
>> Aki
>
> This is not a bug report (yet) - I am asking for ideas to narrow down 
> the issue myself. I would not expect something this obvious and 
> prevalent to be a bug in a 10+ year old subsystem (zlib).

Some of the errors seen that were fixed in the period that I mentioned
looked *exactly* like this.




RE: Corrupted sizes in cache once again

2023-02-02 Thread Marc
> 
> Maybe I was a bit unclear: I have about 1000 error messages per day from
> random accounts (about 500 in total so far) on all clusters. These are
> transparent to the user, so it's more like background noise at the
> moment.

Do you have ECC memory?

> No VM involved. All machines are baremetal DRBD two-node clusters.

How old are your drives?
Do you scrub the RAIDs?
How reliable is your DRBD setup? Does DRBD even sync RAID fixes?
Do you have networking issues?

Connect an HDD directly and use it for a few accounts. Do you still have the 
problem? I bet not.

Why are you running this on bare metal? Do you need the performance? If not, 
switch to reliable storage that performs a bit less well. You will get 
headaches from these multiple DRBD setups.

> As far as I see it I can not nail it down to specific accounts, POP3 vs.
> IMAP, LMTP delivery vs. IMAP store or Sieve vs. non-Sieve etc.
> 

It is not Dovecot; otherwise it would be reported here more often.




Re: Corrupted sizes in cache once again

2023-02-02 Thread Tim Evers
Maybe I was a bit unclear: I have about 1000 error messages per day from 
random accounts (about 500 in total so far) on all clusters. These are 
transparent to the user, so it's more like background noise at the moment.
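
One way to check whether the errors really are spread evenly over accounts is to count them per index cache file. The following is only a rough sketch: it assumes each error sits on a single log line in the form quoted in this thread, and the function name `count_errors_by_cache_file` is made up for illustration.

```python
import re
from collections import Counter

# Matches the path that follows "Corrupted record in index cache file".
# Assumption: one error per log line, path without whitespace.
CACHE_FILE_RE = re.compile(
    r"Corrupted record in index cache file\s+(\S+?):\s+UID\s+\d+"
)


def count_errors_by_cache_file(lines):
    """Return a Counter mapping cache file path -> number of errors seen."""
    counts = Counter()
    for line in lines:
        match = CACHE_FILE_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

Feeding it a day's worth of log lines and looking at `counts.most_common(20)` would show whether a handful of mailboxes dominate or whether the errors are genuinely random.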


No VM involved. All machines are baremetal DRBD two-node clusters.

As far as I can see, I cannot narrow it down to specific accounts, POP3 vs. 
IMAP, LMTP delivery vs. IMAP store, Sieve vs. non-Sieve, etc.


Tim

On 02.02.23 at 17:55, Christopher Wensink wrote:
> Can you isolate the problem account on a separate VM to see if the
> problem follows the account or the original VM?
>
> Chris
>
> On 2/2/2023 9:58 AM, Tim Evers wrote:
>> Good point - these are 8 different DRBD clusters. I failed over one
>> testing this theory. Problem persists.
>>
>> So I would rule out underlying issues.
>>
>> Especially since the "wrong" value is suspiciously often the on-disk
>> size rather than a random value one would expect if there is
>> corruption underneath.
>>
>> Tim
>>
>> On 02.02.23 at 16:43, Christopher Wensink wrote:
>>> Something to try, this all could be happening because of underlying
>>> disk failure on the array it is running on.  If this is a VM, can
>>> you move the operation to another host or data store to rule out
>>> hardware issues?


Re: Corrupted sizes in cache once again

2023-02-02 Thread Christopher Wensink
Can you isolate the problem account on a separate VM to see if the 
problem follows the account or the original VM?


Chris

On 2/2/2023 9:58 AM, Tim Evers wrote:
> Good point - these are 8 different DRBD clusters. I failed over one
> testing this theory. Problem persists.
>
> So I would rule out underlying issues.
>
> Especially since the "wrong" value is suspiciously often the on-disk
> size rather than a random value one would expect if there is
> corruption underneath.
>
> Tim
>
> On 02.02.23 at 16:43, Christopher Wensink wrote:
>> Something to try, this all could be happening because of underlying
>> disk failure on the array it is running on.  If this is a VM, can you
>> move the operation to another host or data store to rule out hardware
>> issues?


--
Christopher Wensink
IS Administrator
Five Star Plastics, Inc
1339 Continental Drive
Eau Claire, WI 54701
Office:  715-831-1682
Mobile:  715-563-3112
Fax:  715-831-6075
cwens...@five-star-plastics.com
www.five-star-plastics.com



Re: Corrupted sizes in cache once again

2023-02-02 Thread Tim Evers
Good point - these are 8 different DRBD clusters. I failed over one 
to test this theory. The problem persists.


So I would rule out underlying issues.

Especially since the "wrong" value is suspiciously often the on-disk 
size rather than a random value one would expect if there is corruption 
underneath.
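
To check that observation for a single message, the three sizes involved can be put side by side: the `S=` value in the Maildir filename, the size of the file on disk, and the size after decompression. This is only a sketch under the assumption that the file is gzip-compressed (as stated earlier in the thread); the helper name `compare_sizes` is made up, and nothing here touches Dovecot's index or cache files.

```python
import gzip
import os
import re


def compare_sizes(path):
    """Return the S= value from the filename, the on-disk size and the
    uncompressed size of a gzip-compressed Maildir message file."""
    match = re.search(r",S=(\d+)", os.path.basename(path))
    s_from_name = int(match.group(1)) if match else None

    on_disk = os.path.getsize(path)

    # Stream the file so large messages are not loaded into memory at once.
    # gzip raises an error here if the file is not valid gzip data.
    uncompressed = 0
    with gzip.open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            uncompressed += len(chunk)

    return {
        "s_from_name": s_from_name,
        "on_disk": on_disk,
        "uncompressed": uncompressed,
    }
```

If the "cached" number in the log error equals `on_disk` while `uncompressed` equals `s_from_name`, the file itself is intact and only the recorded size is wrong.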


Tim

On 02.02.23 at 16:43, Christopher Wensink wrote:
> Something to try, this all could be happening because of underlying
> disk failure on the array it is running on.  If this is a VM, can you
> move the operation to another host or data store to rule out hardware
> issues?


Re: Corrupted sizes in cache once again

2023-02-02 Thread Tim Evers



On 02.02.23 at 16:23, Aki Tuomi wrote:
> For bug reports, we do ask that you try to reproduce it with 2.3.20 (current
> latest). You can get packages from https://repo.dovecot.org/, and it would
> be nice if you could provide steps to reproduce this issue.
>
> Also please see https://www.dovecot.org/bugreport-mail/ if you have not yet.
>
> Aki


This is not a bug report (yet) - I am asking for ideas to narrow down 
the issue myself. I would not expect something this obvious and 
prevalent to be a bug in a 10+ year old subsystem (zlib).


Tim



RE: Corrupted sizes in cache once again

2023-02-02 Thread Marc
Could even be memory. I once had a faulty memory module (without ECC) in an 
office machine, and it caused the md5sums of files on TrueCrypt USB backup 
drives to change constantly. After I removed the module, the issues were gone.


> 
> Something to try, this all could be happening because of underlying disk
> failure on the array it is running on.  If this is a VM, can you move
> the operation to another host or data store to rule out hardware issues?
> 
> On 2/2/2023 9:19 AM, Stuart Henderson wrote:
> > On 2023-02-01, Tim Evers  wrote:
> >> I run a fairly large Dovecot Installation (around 100k mailboxes) on
> >> several servers.
> >>
> >> gzip compression is on.
> >>
> >> Every once in a while I get the dreaded "cache corruption" messages in
> >> the log:
> >>
> >> Error: Corrupted record in index cache file
> >> /[redacted]/Maildir/dovecot.index.cache: UID 3868: Broken physical size
> >> in mailbox INBOX:
> >> read(zlib(/[redacted]/Maildir/cur/1674129792.M797543P21755.node2,S=8099,W=8276:2,))
> >> failed: Cached message size smaller than expected (2877 < 8099,
> >> box=INBOX, UID=3868)
> >>
> >> Error: Corrupted record in index cache file
> >> /[redacted]/Maildir/dovecot.index.cache: UID 3875: Broken physical size
> >> in mailbox INBOX:
> >> read(zlib(/[redacted]/Maildir/cur/1674212201.M985809P29112.node2,S=13907,W=14121:2,))
> >> failed: Cached message size smaller than expected (5533 < 8192,
> >> box=INBOX, UID=3875)
> >>
> >> The first entry shows 2877 (size on disk) vs. 8099 (real size unzipped,
> >> also in the filename: S=8099).
> >>
> >> The second entry shows 5533 (size on disk) vs. 8192 - this is not
> >> correct in any way. Size on disk is 13907 as noted in the filename.
> >>
> >> Both mails were delivered through LMTP and retrieved by the POP3 service.
> >>
> >> Anyone with an idea what might be happening here? I've read all
> >> available info in the doc and in the previous discussions / bug reports,
> >> but nothing seems to match my case. And where does that 8192 come from -
> >> it looks suspicious?
> >>
> >> Version is 2.3.7.2 (Ubuntu 20.04)
> > 2.3.7.2 is rather old now. There were definitely fixes regarding compression
> > around the 2.3.10-2.3.12 timeframe or thereabouts (I forget all the details
> > but it took a release or two before some remaining issues were sorted out
> > after changes in the area). I'd be looking to get it updated to a current
> > version first.
> >


Re: Corrupted sizes in cache once again

2023-02-02 Thread Christopher Wensink
Something to try: this could all be happening because of an underlying disk 
failure on the array it is running on.  If this is a VM, can you move 
the operation to another host or data store to rule out hardware issues?


--
Christopher Wensink
IS Administrator
Five Star Plastics, Inc
1339 Continental Drive
Eau Claire, WI 54701
Office:  715-831-1682
Mobile:  715-563-3112
Fax:  715-831-6075
cwens...@five-star-plastics.com
www.five-star-plastics.com



Re: Corrupted sizes in cache once again

2023-02-02 Thread Aki Tuomi


> On 02/02/2023 17:19 EET Stuart Henderson  wrote:
> 
>  
> On 2023-02-01, Tim Evers  wrote:
> > I run a fairly large Dovecot Installation (around 100k mailboxes) on 
> > several servers.
> >
> > gzip compression is on.
> >
> > Every once in a while I get the dreaded "cache corruption" messages in 
> > the log:
> >
> > Error: Corrupted record in index cache file 
> > /[redacted]/Maildir/dovecot.index.cache: UID 3868: Broken physical size 
> > in mailbox INBOX: 
> > read(zlib(/[redacted]/Maildir/cur/1674129792.M797543P21755.node2,S=8099,W=8276:2,))
> >  
> > failed: Cached message size smaller than expected (2877 < 8099, 
> > box=INBOX, UID=3868)
> >
> > Error: Corrupted record in index cache file 
> > /[redacted]/Maildir/dovecot.index.cache: UID 3875: Broken physical size 
> > in mailbox INBOX: 
> > read(zlib(/[redacted]/Maildir/cur/1674212201.M985809P29112.node2,S=13907,W=14121:2,))
> >  
> > failed: Cached message size smaller than expected (5533 < 8192, 
> > box=INBOX, UID=3875)
> >
> > The first entry shows 2877 (size on disk) vs. 8099 (real size unzipped, 
> > also in the filename: S=8099).
> >
> > The second entry shows 5533 (size on disk) vs. 8192 - this is not 
> > correct in any way. Size on disk is 13907 as noted in the filename.
> >
> > Both mails were delivered through LMTP and retrieved by the POP3 service.
> >
> > Anyone with an idea what might be happening here? I've read all 
> > available info in the doc and in the previous discussions / bug reports, 
> > but nothing seems to match my case. And where does that 8192 come from - 
> > it looks suspicious?
> >
> > Version is 2.3.7.2 (Ubuntu 20.04)
> 
> 2.3.7.2 is rather old now. There were definitely fixes regarding compression
> around the 2.3.10-2.3.12 timeframe or thereabouts (I forget all the details
> but it took a release or two before some remaining issues were sorted out
> after changes in the area). I'd be looking to get it updated to a current
> version first.

For bug reports, we do ask that you try to reproduce it with 2.3.20 (current 
latest). You can get packages from https://repo.dovecot.org/, and it would be 
nice if you could provide steps to reproduce this issue.

Also please see https://www.dovecot.org/bugreport-mail/ if you have not yet.

Aki


Re: Corrupted sizes in cache once again

2023-02-02 Thread Stuart Henderson
On 2023-02-01, Tim Evers  wrote:
> I run a fairly large Dovecot Installation (around 100k mailboxes) on 
> several servers.
>
> gzip compression is on.
>
> Every once in a while I get the dreaded "cache corruption" messages in 
> the log:
>
> Error: Corrupted record in index cache file 
> /[redacted]/Maildir/dovecot.index.cache: UID 3868: Broken physical size 
> in mailbox INBOX: 
> read(zlib(/[redacted]/Maildir/cur/1674129792.M797543P21755.node2,S=8099,W=8276:2,))
>  
> failed: Cached message size smaller than expected (2877 < 8099, 
> box=INBOX, UID=3868)
>
> Error: Corrupted record in index cache file 
> /[redacted]/Maildir/dovecot.index.cache: UID 3875: Broken physical size 
> in mailbox INBOX: 
> read(zlib(/[redacted]/Maildir/cur/1674212201.M985809P29112.node2,S=13907,W=14121:2,))
>  
> failed: Cached message size smaller than expected (5533 < 8192, 
> box=INBOX, UID=3875)
>
> The first entry shows 2877 (size on disk) vs. 8099 (real size unzipped, 
> also in the filename: S=8099).
>
> The second entry shows 5533 (size on disk) vs. 8192 - this is not 
> correct in any way. Size on disk is 13907 as noted in the filename.
>
> Both mails were delivered through LMTP and retrieved by the POP3 service.
>
> Anyone with an idea what might be happening here? I've read all 
> available info in the doc and in the previous discussions / bug reports, 
> but nothing seems to match my case. And where does that 8192 come from - 
> it looks suspicious?
>
> Version is 2.3.7.2 (Ubuntu 20.04)

2.3.7.2 is rather old now. There were definitely fixes regarding compression
around the 2.3.10-2.3.12 timeframe or thereabouts (I forget all the details
but it took a release or two before some remaining issues were sorted out
after changes in the area). I'd be looking to get it updated to a current
version first.
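
For anyone wanting to compare the numbers in these errors systematically, here is a minimal sketch that pulls the fields out of one log line. It assumes the error is on a single line in exactly the form quoted above; the regular expression and the function name `parse_size_error` are illustrative and not part of Dovecot.

```python
import re

# Fields are named after what the log message itself prints:
# "(cached < expected" and the S=/W= values embedded in the filename.
ERROR_RE = re.compile(
    r"UID (?P<uid>\d+): Broken physical size.*?"
    r"read\(zlib\((?P<path>[^)]*?),S=(?P<s>\d+),W=(?P<w>\d+)[^)]*\)\).*?"
    r"\((?P<cached>\d+) < (?P<expected>\d+),",
    re.DOTALL,
)


def parse_size_error(line):
    """Return the sizes from one 'Broken physical size' error, or None."""
    m = ERROR_RE.search(line)
    if m is None:
        return None
    s_value = int(m.group("s"))
    expected = int(m.group("expected"))
    return {
        "uid": int(m.group("uid")),
        "path": m.group("path"),
        "s": s_value,
        "w": int(m.group("w")),
        "cached": int(m.group("cached")),
        "expected": expected,
        # True when the "expected" size equals S= from the filename.
        "expected_matches_s": expected == s_value,
    }
```

Applied to the two errors above, the first has an expected size equal to `S=` (8099), while the second does not (8192 vs. `S=13907`), which is the discrepancy the original post asks about.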





Special authentication use case

2023-02-02 Thread Philippe MARASSE

Folks,

I'm trying to configure Dovecot SASL for two use cases:
  - First, with XOAUTH2: I've managed to get it working pretty much right 
out of the box; the developers have done a great job :-)

  - Second, with a client TLS certificate: no luck so far.

Let me explain: the certificate presented by the client does not contain 
the associated email address, so I have to check that the username (= email) 
sent by the client really is related to some information included in the 
certificate (I have to extract the OU and then look it up in a table of 
authorized email addresses for that OU).


Is it possible to do that with Dovecot? I think so, but I'm looking for 
pointers on how to achieve it. Lua, maybe?
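
To make the intended check concrete, here is a rough sketch of the logic only, written in Python for readability. It is not Dovecot configuration and uses no Dovecot API; the DN format, the table contents and the names `extract_ou`, `AUTHORIZED_BY_OU` and `is_authorized` are all assumptions made for illustration. The same steps would have to be ported to whatever mechanism (Lua or otherwise) ends up being used.

```python
# Hypothetical table: OU from the certificate -> email addresses allowed
# to authenticate with a certificate carrying that OU.
AUTHORIZED_BY_OU = {
    "Cardiology": {"alice@example.org", "bob@example.org"},
    "Radiology": {"carol@example.org"},
}


def extract_ou(subject_dn):
    """Return the first OU value from a comma-separated subject DN.

    Naive parsing: assumes no escaped commas inside attribute values.
    """
    for part in subject_dn.split(","):
        key, _, value = part.strip().partition("=")
        if key.upper() == "OU":
            return value.strip()
    return None


def is_authorized(subject_dn, username):
    """True if the username (= email) is listed for the certificate's OU."""
    ou = extract_ou(subject_dn)
    if ou is None:
        return False
    return username.lower() in AUTHORIZED_BY_OU.get(ou, set())
```

For example, `is_authorized("CN=device42, OU=Cardiology, O=Hospital", "alice@example.org")` passes the check, while the same certificate combined with `carol@example.org` does not.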


Our configuration :
  - OS : Debian 11

$ /usr/sbin/dovecot --version
2.3.13 (89f716dc2)

Regards.

--
Philippe MARASSE

Responsable pôle Infrastructures - DSIO
Centre Hospitalier Henri Laborit
CS 10587 - 370 avenue Jacques Cœur
86021 Poitiers Cedex
Tel : 05.49.44.57.19