Dear John, Thank you very much for this reply! You saved me really a lot of hours of work!
I have managed to index those mailboxes finally after applying your patch. I don’t know why this patch never got any attention and didn’t find its way to the upstream... But it became clear that in current configuration dovecot sends as message body tons of useless binary information that not just hard to index but also affects search performance badly. Moreover due to this on some mailboxes up to 1% of all messages are now ignored completely (it’s of course better than stuck index and overloaded IO of the whole system - thanks again!) I have to wait for non-business hours to run some more tests, but it seems it happens really often when dovecot sends binary attachments of any kind for indexing. Once done I’ll try to submit more detailed separate bug report. At the moment I am thinking about developing a small proxy that I gonna put between dovecot and solr and strip unnecessary body parts on that stage… > On 16. Jan 2021, at 19:16, John Fawcett <j...@voipsupport.it> wrote: > > On 15/01/2021 11:39, Alexey Panov wrote: >> I guess I was indeed correct. As message body is the part of mail index >> dovecot indeed sends all those binary messages as a part of message body: >> >>> doveadm fetch -u test.u...@interpont.com <mailto:test.u...@interpont.com> >>> "body" mailbox KS2 uid 2 | wc --bytes >>> 21549696 >> >> This affects FTS indexing dramatically I guess… >> >>> On 15. Jan 2021, at 17:10, Alexey Panov <programmie...@icloud.com >>> <mailto:programmie...@icloud.com>> wrote: >>> >>> I could find messages causing problems as solr log indeed shows the IMAP >>> UID. >>> >>> Reducing batch_size to a single message I could identify one of those >>> messages causing the problem. >>> >>> Now I can confirm the following behaviour: there are some SINGLE messages >>> where 60 seconds is NOT ENOUGH to index. The message I am looking at now is >>> 20MB big (compressed 17MB) and has a lot of inline pasted screenshots. I >>> guess that dovecot might send those pasted images as a part of the message >>> body to solr? >>> >>> Here are some logs for better understanding: >>> >>> Folder contents (single message): >>> >>>> doveadm search -u u@d mailbox KS2 >>>> e836ff1d20640160340b0000532cee39 2 >>> >>> >>> Force indexing: >>> >>>> doveadm index -u u@d KS2 >>>> doveadm(u@d): Error: fts_solr: Indexing failed: Request timed out (Request >>>> queued 61.105 secs ago, 1 send attempts in 61.089 secs, 60.071 in http >>>> ioloop, 0.000 in other ioloops, connected 61.089 secs ago) >>>> doveadm(u@d): Error: Mailbox KS2: Transaction commit failed: FTS >>>> transaction commit failed: backend deinit >>> >>> >>> Corresponding solr logs: >>> >>>> solr-mailcow_1 | 2021-01-15 10:02:34.901 INFO (qtp524223214-2127) [ >>>> x:dovecot-fts] o.a.s.u.p.LogUpdateProcessorFactory [dovecot-fts] >>>> webapp=/solr path=/update >>>> params={}{add=[2/e836ff1d20640160340b0000532cee39/u@d >>>> (1688946475272241152)]} 0 80683 >>> >>> I would greatly appreciate any help. My users using pasted images heavily >>> (designers exchanging mockups etc) and if this truly is a problem then I am >>> screwed :) >>> >>> >>>> On 14. Jan 2021, at 23:46, Alexey Panov <programmie...@icloud.com >>>> <mailto:programmie...@icloud.com>> wrote: >>>> >>>> Hello, >>>> >>>> I am running recent dovecot 2.3.13 (89f716dc2) within dockerized mailcow >>>> setup. >>>> >>>> During just another update routine I decided to reindex all mailboxes and >>>> some of them fail (timeout) on exactly SAME messages in SAME mailboxes: >>>> >>>>> doveadm(u@d): Error: fts_solr: Indexing failed: Request timed out >>>>> (Request queued 73.213 secs ago, 1 send attempts in 73.098 secs, 60.597 >>>>> in http ioloop, 0.000 in other ioloops, connected 127.875 secs ago) >>>>> doveadm(u@d): Error: Mailbox Sent: Mail search failed: Internal error >>>>> occurred. Refer to server log for more information. [2021-01-14 11:23:46] >>>>> doveadm(u@d): Error: Mailbox Sent: Transaction commit failed: FTS >>>>> transaction commit failed: backend deinit >>>> >>>> This error is persistent and affects only some, but always same folders of >>>> same mailboxes (i mean solr can't finish indexing them at all). >>>> >>>> I/O is in perfect condition and has no corruptions. I also tried to do fts >>>> rescan which made no change. RAM is fine, I haven't seen OOM conditions. >>>> >>>> During my research I found this discussion >>>> <https://www.mail-archive.com/dovecot@dovecot.org/msg76944.html> which >>>> lead me idea of reducing the batch_size to as low as 10 (for testing >>>> purposes). Observing SOLR logs during reindexing on the mailbox/folder >>>> causing the error I noticed the following strange behaviour. >>>> >>>> Firstly, it goes very fast but then always kinda freezes on these messages: >>>> >>>>> solr-mailcow_1 | 2021-01-14 16:05:09.615 INFO (qtp524223214-1151) >>>>> [ x:dovecot-fts] o.a.s.u.p.LogUpdateProcessorFactory [dovecot-fts] >>>>> webapp=/solr path=/update >>>>> params={}{add=[6883/78a0ec27b091da5e4054000013681708/u@d >>>>> (1688878773188952064), 6884/78a0ec27b091da5e4054000013681708/u@d >>>>> (1688878773203632128), 6885/78a0ec27b091da5e4054000013681708/u@d >>>>> (1688878773218312192), 6886/78a0ec27b091da5e4054000013681708/u@d >>>>> (1688878773227749376), 6887/78a0ec27b091da5e4054000013681708/u@d >>>>> (1688878773237186560), 6888/78a0ec27b091da5e4054000013681708/u@d >>>>> (1688878773244526592), 6889/78a0ec27b091da5e4054000013681708/u@d >>>>> (1688878773250818048), 6890/78a0ec27b091da5e4054000013681708/u@d >>>>> (1688878773259206656), 6891/78a0ec27b091da5e4054000013681708/u@d >>>>> (1688878773272838144), 6892/78a0ec27b091da5e4054000013681708/u@d >>>>> (1688878773288566784)]} 0 101 >>>>> solr-mailcow_1 | 2021-01-14 16:06:42.540 INFO (qtp524223214-1143) >>>>> [ x:dovecot-fts] o.a.s.u.p.LogUpdateProcessorFactory [dovecot-fts] >>>>> webapp=/solr path=/update >>>>> params={}{add=[6893/78a0ec27b091da5e4054000013681708/u@d >>>>> (1688878773329461248), 6894/78a0ec27b091da5e4054000013681708/u@d >>>>> (1688878773332606976), 6895/78a0ec27b091da5e4054000013681708/u@d >>>>> (1688878774361260032), 6896/78a0ec27b091da5e4054000013681708/u@d >>>>> (1688878850290745344), 6897/78a0ec27b091da5e4054000013681708/u@d >>>>> (1688878851347709952), 6898/78a0ec27b091da5e4054000013681708/u@d >>>>> (1688878852404674560), 6899/78a0ec27b091da5e4054000013681708/u@d >>>>> (1688878853586419712), 6900/78a0ec27b091da5e4054000013681708/u@d >>>>> (1688878870704422912), 6901/78a0ec27b091da5e4054000013681708/u@d >>>>> (1688878870709665792), 6902/78a0ec27b091da5e4054000013681708/u@d >>>>> (1688878870727491584)]} 0 92891 >>>> >>>> >>>> It takes almost one and a half minute to get those ten messages >>>> (6893-6902) indexed. It freezes exactly on that position every time I >>>> restart indexing. Actually only reducing the batch_size to 10 made it >>>> through. It was timing out with the default batch_size exactly here each >>>> time. >>>> >>>> Than it goes further again blazing fast until it reaches here: >>>> >>>>> solr-mailcow_1 | 2021-01-14 16:07:26.148 INFO (qtp524223214-1151) >>>>> [ x:dovecot-fts] o.a.s.u.p.LogUpdateProcessorFactory [dovecot-fts] >>>>> webapp=/solr path=/update >>>>> params={}{add=[7443/78a0ec27b091da5e4054000013681708/u@d >>>>> (1688878884286627840), 7444/78a0ec27b091da5e4054000013681708/u@d >>>>> (1688878884293967872), 7445/78a0ec27b091da5e4054000013681708/u@d >>>>> (1688878884304453632), 7446/78a0ec27b091da5e4054000013681708/u@d >>>>> (1688878884502634496), 7447/78a0ec27b091da5e4054000013681708/u@d >>>>> (1688878900331937792), 7448/78a0ec27b091da5e4054000013681708/u@d >>>>> (1688878900508098560), 7449/78a0ec27b091da5e4054000013681708/u@d >>>>> (1688878916409753600), 7450/78a0ec27b091da5e4054000013681708/u@d >>>>> (1688878916424433664), 7451/78a0ec27b091da5e4054000013681708/u@d >>>>> (1688878916438065152), 7452/78a0ec27b091da5e4054000013681708/u@d >>>>> (1688878916448550912)]} 0 30682 >>>>> solr-mailcow_1 | 2021-01-14 16:09:40.277 INFO (qtp524223214-1325) >>>>> [ x:dovecot-fts] o.a.s.u.p.LogUpdateProcessorFactory [dovecot-fts] >>>>> webapp=/solr path=/update >>>>> params={}{add=[7463/78a0ec27b091da5e4054000013681708/u@d >>>>> (1688879057103486976)]} 0 1 >>>>> solr-mailcow_1 | 2021-01-14 16:10:00.628 INFO (qtp524223214-1009) >>>>> [ x:dovecot-fts] o.a.s.u.p.LogUpdateProcessorFactory [dovecot-fts] >>>>> webapp=/solr path=/update >>>>> params={}{add=[7453/78a0ec27b091da5e4054000013681708/u@d >>>>> (1688878916486299648), 7454/78a0ec27b091da5e4054000013681708/u@d >>>>> (1688878916489445376), 7455/78a0ec27b091da5e4054000013681708/u@d >>>>> (1688878916496785408), 7456/78a0ec27b091da5e4054000013681708/u@d >>>>> (1688878917584158720), 7457/78a0ec27b091da5e4054000013681708/u@d >>>>> (1688878994974310400), 7458/78a0ec27b091da5e4054000013681708/u@d >>>>> (1688879073177108480), 7459/78a0ec27b091da5e4054000013681708/u@d >>>>> (1688879074234073088), 7460/78a0ec27b091da5e4054000013681708/u@d >>>>> (1688879075288940544), 7461/78a0ec27b091da5e4054000013681708/u@d >>>>> (1688879076339613696), 7462/78a0ec27b091da5e4054000013681708/u@d >>>>> (1688879077392384000)]} 0 154455 >>>> >>>> This results in the following error on the console where I start >>>> reindexing: >>>> >>>>> $ docker-compose exec dovecot-mailcow doveadm index -u u@d 'Sent' >>>>> doveadm(u@d): Error: fts_solr: Indexing failed: Request timed out >>>>> (Request queued 134.122 secs ago, 1 send attempts in 134.098 secs, >>>>> 131.923 in http ioloop, 0.000 in other ioloops, connected 376.408 secs >>>>> ago) >>>>> doveadm(u@d): Error: Mailbox Sent: Mail search failed: Internal error >>>>> occurred. Refer to server log for more information. [2021-01-14 17:09:40] >>>>> doveadm(u@d): Error: Mailbox Sent: Transaction commit failed: FTS >>>>> transaction commit failed: backend deinit >>>> >>>> >>>> The last message about indexed block of ten messages (2021-01-14 >>>> 16:10:00.628 7453-7462) arrives already after reindexing error on the >>>> other console occurs and doveadm exits. Interestingly, exactly at the time >>>> mentioned in the error message (2021-01-14 17:09:40, i.e. 16:06:40 - I >>>> don’t know why timezone differs) a message about indexing of a single >>>> message 7463 arrives (which is actually NEXT message to the block that >>>> gets indexed 20 seconds later). >>>> >>>> The whole mailbox consists of 8530 messages and doveadm/solr can’t finish >>>> indexing it at all starting over and over. >>>> >>>> Can you please help me understanding the reason of this behaviour? >>>> >>>> P.S. Is there any possibility to determine what message in vmail maildir >>>> corresponds to solr log, e.g. to 7463/78a0ec27b091da5e4054000013681708/u@d >>>> (1688879057103486976) ? >>>> >>>> Thanks, Programmierus >>> >> > I previously posted a patch (7 Dec 2020) you might find useful in this > circumstance which includes an additional setting, for example: > > plugin { > > ... > fts_max_size = 2M > ... > } > > fts_max_size - do not parse message bodies if the message size exceeds > this value. A value of 0 indicates no limit. When the message body is not > parsed, attachments are also not parsed. > > John