Re: Corrupted sizes in cache once again
On 2023-02-02, Tim Evers wrote:
> On 02.02.23 at 16:23, Aki Tuomi wrote:
>> For bug reports, we do ask that you try to reproduce it with 2.3.20 (current
>> latest), you can get packages from https://repo.dovecot.org/ and would be
>> nice if you can provide steps to reproduce this issue.
>>
>> Also please see https://www.dovecot.org/bugreport-mail/ if you have not yet.
>>
>> Aki
>
> This is not a bug report (yet) - I am asking for ideas to narrow down
> the issue myself. I would not expect something this obvious and
> prevalent being a bug in a 10+ years old subsystem (zlib).

Some of the errors that were fixed in the period I mentioned looked *exactly* like this.
RE: Corrupted sizes in cache once again
> Maybe I was a bit unclear: I have about 1000 error messages per day from
> random accounts (about 500 in total so far) on all clusters. These are
> transparent to the user, so it's more like background noise at the
> moment.

Do you have ECC memory?

> No VM involved. All machines are baremetal DRBD two-node clusters.

How old are your drives? Do you scrub the RAIDs? How reliable is your DRBD setup? Does DRBD even sync RAID fixes? Do you have networking issues?

Connect an HDD directly and use that for a few accounts. Do you still have the problem? I bet not.

Why are you running on bare metal? Do you need the performance? If not, switch to reliable storage that performs a little worse. You will get headaches from these multiple DRBD setups.

> As far as I see it I can not nail it down to specific accounts, POP3 vs.
> IMAP, LMTP delivery vs. IMAP store or Sieve vs. non-Sieve etc.

It is not Dovecot; otherwise it would be reported here more often.
Re: Corrupted sizes in cache once again
Maybe I was a bit unclear: I have about 1000 error messages per day from random accounts (about 500 in total so far) on all clusters. These are transparent to the user, so it's more like background noise at the moment.

No VM involved. All machines are bare-metal DRBD two-node clusters.

As far as I can see, I cannot nail it down to specific accounts, POP3 vs. IMAP, LMTP delivery vs. IMAP store, Sieve vs. non-Sieve, etc.

Tim

On 02.02.23 at 17:55, Christopher Wensink wrote:
> Can you isolate the problem account on a separate VM to see if the problem
> follows the account or the original VM?
>
> Chris
Re: Corrupted sizes in cache once again
Can you isolate the problem account on a separate VM to see if the problem follows the account or the original VM?

Chris

On 2/2/2023 9:58 AM, Tim Evers wrote:
> Good point - these are 8 different DRBD clusters. I failed over one
> testing this theory. Problem persists. So I would rule out underlying
> issues. Especially since the "wrong" value is suspiciously often the
> on-disk size rather than a random value one would expect if there is
> corruption underneath.
>
> Tim

--
Christopher Wensink
IS Administrator
Five Star Plastics, Inc
1339 Continental Drive
Eau Claire, WI 54701
Office: 715-831-1682
Mobile: 715-563-3112
Fax: 715-831-6075
cwens...@five-star-plastics.com
www.five-star-plastics.com
Re: Corrupted sizes in cache once again
Good point - these are 8 different DRBD clusters. I failed over one of them to test this theory. The problem persists, so I would rule out underlying issues, especially since the "wrong" value is suspiciously often the on-disk size rather than the random value one would expect if there were corruption underneath.

Tim

On 02.02.23 at 16:43, Christopher Wensink wrote:
> Something to try, this all could be happening because of underlying disk
> failure on the array it is running on. If this is a VM, can you move
> the operation to another host or data store to rule out hardware issues?
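The observation that the "wrong" value is often the on-disk size could be checked across a whole log rather than by eye. Below is a minimal Python sketch, assuming the log lines look exactly like the two samples quoted in this thread. The names parse_cache_error, classify and tally are hypothetical helpers, not part of Dovecot.

```python
import os
import re
from collections import Counter

# Pattern derived only from the two sample log lines in this thread (assumption).
LOG_RE = re.compile(
    r"read\(zlib\((?P<path>[^,]+),S=(?P<s>\d+),W=(?P<w>\d+)[^)]*\)\) failed: "
    r"Cached message size smaller than expected "
    r"\((?P<cached>\d+) < (?P<expected>\d+), box=(?P<box>[^,]+), UID=(?P<uid>\d+)\)"
)

SAMPLE_LINES = [
    "Error: Corrupted record in index cache file /[redacted]/Maildir/dovecot.index.cache: "
    "UID 3868: Broken physical size in mailbox INBOX: "
    "read(zlib(/[redacted]/Maildir/cur/1674129792.M797543P21755.node2,S=8099,W=8276:2,)) "
    "failed: Cached message size smaller than expected (2877 < 8099, box=INBOX, UID=3868)",
    "Error: Corrupted record in index cache file /[redacted]/Maildir/dovecot.index.cache: "
    "UID 3875: Broken physical size in mailbox INBOX: "
    "read(zlib(/[redacted]/Maildir/cur/1674212201.M985809P29112.node2,S=13907,W=14121:2,)) "
    "failed: Cached message size smaller than expected (5533 < 8192, box=INBOX, UID=3875)",
]


def parse_cache_error(line):
    """Extract the sizes from one log line, or return None if it does not match."""
    m = LOG_RE.search(line)
    if m is None:
        return None
    rec = {k: int(m.group(k)) for k in ("s", "w", "cached", "expected", "uid")}
    rec["path"] = m.group("path")
    rec["box"] = m.group("box")
    return rec


def classify(rec, getsize=os.path.getsize):
    """Label a record: does 'expected' equal S=, and does 'cached' equal the file size?"""
    labels = ["expected==S" if rec["expected"] == rec["s"] else "expected!=S"]
    try:
        on_disk = getsize(rec["path"])
    except OSError:
        on_disk = None  # file is gone or the path is redacted; skip the second label
    if on_disk is not None:
        labels.append("cached==on_disk" if rec["cached"] == on_disk else "cached!=on_disk")
    return tuple(labels)


def tally(lines, getsize=os.path.getsize):
    """Count how often each label combination occurs in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        rec = parse_cache_error(line)
        if rec is not None:
            counts[classify(rec, getsize)] += 1
    return counts


if __name__ == "__main__":
    for labels, count in tally(SAMPLE_LINES).items():
        print(count, labels)
```

Run on the mail server itself, the default getsize looks up each file named in the log, so the tally also shows how often the cached value really equals the compressed on-disk size.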
Re: Corrupted sizes in cache once again
On 02.02.23 at 16:23, Aki Tuomi wrote:
> For bug reports, we do ask that you try to reproduce it with 2.3.20 (current
> latest), you can get packages from https://repo.dovecot.org/ and would be
> nice if you can provide steps to reproduce this issue.
>
> Also please see https://www.dovecot.org/bugreport-mail/ if you have not yet.
>
> Aki

This is not a bug report (yet) - I am asking for ideas to narrow down the issue myself. I would not expect something this obvious and prevalent to be a bug in a 10+ year old subsystem (zlib).

Tim
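One way to narrow this down without touching Dovecot is to check the stored files directly: decompress each one and compare the result with the S= value in its filename. This is a minimal Python sketch, assuming the files are gzip-compressed (as stated in the original post) and that S= is meant to hold the uncompressed size. check_maildir_file is a hypothetical helper name.

```python
import gzip
import os
import re

S_RE = re.compile(r",S=(\d+)")
GZIP_MAGIC = b"\x1f\x8b"


def check_maildir_file(path):
    """Compare on-disk size, S= from the filename, and the real uncompressed size."""
    on_disk = os.path.getsize(path)
    m = S_RE.search(os.path.basename(path))
    s_value = int(m.group(1)) if m else None

    with open(path, "rb") as fh:
        raw = fh.read()

    # Only decompress if the file starts with the gzip magic bytes;
    # otherwise treat it as stored uncompressed.
    if raw[:2] == GZIP_MAGIC:
        uncompressed = len(gzip.decompress(raw))
    else:
        uncompressed = len(raw)

    return {
        "on_disk": on_disk,
        "s_value": s_value,
        "uncompressed": uncompressed,
        "consistent": s_value is not None and s_value == uncompressed,
    }
```

If a file named in an error message comes back as consistent, the file itself is fine and the mismatch lies in what was cached or read, not in what was stored.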
RE: Corrupted sizes in cache once again
Could even be memory. I once had a faulty memory module (without ECC) in an office machine, and it caused the md5sum of files on TrueCrypt USB backup drives to change constantly. I removed the module, and there were no more issues.

> Something to try, this all could be happening because of underlying disk
> failure on the array it is running on. If this is a VM, can you move
> the operation to another host or data store to rule out hardware issues?
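The symptom described here, a checksum that changes between runs on an unchanged file, is easy to test for. This is a minimal Python sketch using only the standard library; file_digest and digest_is_stable are hypothetical helper names. Note that repeated reads may be served from the page cache, so a stable result does not prove the hardware is healthy.

```python
import hashlib


def file_digest(path, algo="sha256", chunk_size=1 << 20):
    """Hash a file in chunks so large files do not need to fit in memory."""
    h = hashlib.new(algo)
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def digest_is_stable(path, rounds=5):
    """Hash the same file several times; more than one distinct digest is a red flag."""
    digests = {file_digest(path) for _ in range(rounds)}
    return len(digests) == 1, digests
```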
Re: Corrupted sizes in cache once again
Something to try: this could all be happening because of an underlying disk failure on the array it is running on. If this is a VM, can you move the operation to another host or data store to rule out hardware issues?
Re: Corrupted sizes in cache once again
> On 02/02/2023 17:19 EET Stuart Henderson wrote:
>
> 2.3.7.2 is rather old now. There were definitely fixes regarding compression
> around the 2.3.10-2.3.12 timeframe or thereabouts (I forget all the details
> but it took a release or two before some remaining issues were sorted out
> after changes in the area). I'd be looking to get it updated to a current
> version first.

For bug reports, we do ask that you try to reproduce it with 2.3.20 (the current latest release). You can get packages from https://repo.dovecot.org/, and it would be nice if you could provide steps to reproduce this issue.

Also, please see https://www.dovecot.org/bugreport-mail/ if you have not yet.

Aki
Re: Corrupted sizes in cache once again
On 2023-02-01, Tim Evers wrote:
> I run a fairly large Dovecot installation (around 100k mailboxes) on
> several servers.
>
> gzip compression is on.
>
> Every once in a while I get the dreaded "cache corruption" messages in
> the log:
>
> Error: Corrupted record in index cache file
> /[redacted]/Maildir/dovecot.index.cache: UID 3868: Broken physical size
> in mailbox INBOX:
> read(zlib(/[redacted]/Maildir/cur/1674129792.M797543P21755.node2,S=8099,W=8276:2,))
> failed: Cached message size smaller than expected (2877 < 8099,
> box=INBOX, UID=3868)
>
> Error: Corrupted record in index cache file
> /[redacted]/Maildir/dovecot.index.cache: UID 3875: Broken physical size
> in mailbox INBOX:
> read(zlib(/[redacted]/Maildir/cur/1674212201.M985809P29112.node2,S=13907,W=14121:2,))
> failed: Cached message size smaller than expected (5533 < 8192,
> box=INBOX, UID=3875)
>
> The first entry shows 2877 (size on disk) vs. 8099 (real size unzipped,
> also in the filename: S=8099).
>
> The second entry shows 5533 (size on disk) vs. 8192 - this is not
> correct in any way. The real (uncompressed) size is 13907, as noted in
> the filename.
>
> Both mails were delivered through LMTP and retrieved by the POP3 service.
>
> Anyone with an idea what might be happening here? I've read all
> available info in the doc and in the previous discussions / bug reports,
> but nothing seems to match my case. And where does that 8192 come from -
> it looks suspicious?
>
> Version is 2.3.7.2 (Ubuntu 20.04)

2.3.7.2 is rather old now. There were definitely fixes regarding compression around the 2.3.10-2.3.12 timeframe or thereabouts (I forget all the details, but it took a release or two before some remaining issues were sorted out after changes in the area). I'd be looking to get it updated to a current version first.
Special authentication use case
Folks,

I'm trying to configure Dovecot SASL for two use cases:

- First, with XOAUTH2: I've managed to get it working pretty much out of the box; the developers have done a great job :-)
- Second, with a client TLS certificate: no luck so far.

Let me explain: the certificate presented by the client carries no indication of the associated email address. I have to check that the username (= email) sent by the client really corresponds to information included in the certificate (I have to extract the OU and then look it up in a table of authorized email addresses for that OU).

Is it possible to do that with Dovecot? I think so, but I'm looking for pointers on how to achieve it. Lua, maybe?

Our configuration:

- OS: Debian 11

$ /usr/sbin/dovecot --version
2.3.13 (89f716dc2)

Regards.

--
Philippe MARASSE
Responsable pôle Infrastructures - DSIO
Centre Hospitalier Henri Laborit
CS 10587 - 370 avenue Jacques Cœur
86021 Poitiers Cedex
Tel : 05.49.44.57.19
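To make the requested check concrete, here is a minimal Python sketch of the lookup logic only: extract the OU from a certificate subject and test whether the username is listed for that OU. It is not Dovecot configuration, and how the subject would reach such a check inside Dovecot is not shown. The table contents, the example addresses and the names extract_ou and is_authorized are all hypothetical. The sketch assumes a simple comma-separated subject such as "CN=...,OU=...,O=..." and does not handle escaped commas.

```python
# Hypothetical table: OU value -> set of email addresses allowed for that OU.
AUTHORIZED = {
    "Cardiology": {"alice@example.org", "bob@example.org"},
    "Radiology": {"carol@example.org"},
}


def extract_ou(subject_dn):
    """Return the first OU value from a comma-separated subject DN, or None."""
    for part in subject_dn.split(","):
        key, sep, value = part.strip().partition("=")
        if sep and key.strip().upper() == "OU":
            return value.strip()
    return None


def is_authorized(subject_dn, username, table=AUTHORIZED):
    """True only if the certificate's OU exists in the table and lists this username."""
    ou = extract_ou(subject_dn)
    if ou is None:
        return False
    return username.strip().lower() in table.get(ou, set())


if __name__ == "__main__":
    dn = "CN=workstation-12,OU=Cardiology,O=Example Hospital,C=FR"
    print(is_authorized(dn, "alice@example.org"))
    print(is_authorized(dn, "carol@example.org"))
```

The check fails closed: a subject without an OU, or an OU missing from the table, is rejected rather than allowed.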