Don't use mbox. It is a very slow format when mails need to be deleted from the middle, because the whole mbox file is basically rewritten each time.
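To give a rough feel for that rewrite cost, here is a small Python sketch. It is only a model, not a measurement of Dovecot's actual I/O: it assumes that every expunge rewrites all the data following the removed message, and the mailbox size (600 mails of 15 kB each) and function names are made up for illustration.

```python
def bytes_rewritten_single_file(sizes):
    """Model of a single-file mailbox (mbox-like).

    Assumption: expunging a message rewrites every byte stored after it.
    Messages are expunged one at a time, front to back, as happens when
    each mail is moved out of the source mailbox in turn.
    """
    total = 0
    tail = sum(sizes)
    for size in sizes:
        tail -= size   # data that remains after this message is removed
        total += tail  # ...and has to be rewritten
    return total


def bytes_written_file_per_message(sizes):
    """Model of a one-file-per-message layout: nothing is rewritten,
    each message is written exactly once."""
    return sum(sizes)


if __name__ == "__main__":
    sizes = [15_000] * 600  # hypothetical batch: 600 mails, 15 kB each
    print("single file     :", bytes_rewritten_single_file(sizes))
    print("file per message:", bytes_written_file_per_message(sizes))
```

With these made-up numbers the single-file model rewrites about 2.7 GB in total to empty a 9 MB mailbox, while the one-file-per-message model (which is how sdbox stores mail) writes each mail once. The gap grows quadratically with the number of messages in the batch.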
Use sdbox instead.

Sami

> On 12 Sep 2019, at 9.57, Zenaan Harkness via dovecot <dovecot@dovecot.org>
> wrote:
>
> I am wondering why sieve-filter is so slow compared to gnu sieve.
>
> I run mpop (like getmail) to download from a pop3 server to a local
> mbox file: ~/mail/email-incoming-unsorted
>
> This step is very fast.
>
> The next step, I throw the email-incoming-unsorted mbox file at a
> sieve processor, to sort the emails from that mbox, into other
> mboxes, according to the sieve rules file.
>
> Up until a couple days ago I was using Gnu sieve.
>
> Gnu sieve balks on emails which have no x-message-id (?? something
> like this) header field, so after a few years, I finally decided to
> switch "up" to Dovecot/Pigeonhole's "sieve-filter" command.
>
> Using Gnu sieve, this mbox sorting step was even faster than mpop (/
> getmail) - and mpop and getmail are really fast (compared with
> fetchmail), since they pipeline the email downloads.
>
> Even with 100s of emails, Gnu sieve would take only 10 to 20 seconds
> at most. Super fast.
>
> Using sieve-filter, all emails are being processed - including those
> without "message id header". This is good.
>
> But also, using sieve filter, is really slower - slower than the
> download step by an order of magnitude or two.
>
> See below for details, any ideas appreciated.
>
> To add to the below, I added:
>
> mbox_very_dirty_syncs = yes
>
> to the sieve-filter config, which slightly improves performance, but
> not by much (in comparison with Gnu sieve).
>
> TIA,
>
> ----- Forwarded message from Zenaan Harkness <zen...@freedbms.net> -----
>
> From: Zenaan Harkness <zen...@freedbms.net>
> To: debian-u...@lists.debian.org
> Date: Thu, 12 Sep 2019 08:06:12 +1000
> Subject: Re: Gnu sieve vs Dovecot sieve-filter - sieve-filter extremely slow
> at lda (writing emails to local mbox files)
>
> On Thu, Sep 12, 2019 at 07:55:23AM +1000, Zenaan Harkness wrote:
>> Why is Gnu sieve so extremely fast to batch process an mbox file, but
>> while Dovecot's sieve-filter is an order of magnitude slower?
>>
>> Sequence:
>>
>> - mpop or getmail to pipeline download emails into temp mbox file
>> - filter that file
>>
>> Gnu sieve just flies through a local mbox file and saving emails to
>> other local mbox files.
>>
>> Gnu sieve rejects too many emails with "malformed" errors, so after a
>> few years I bit the bullet and upgraded to Dovecot's sieve-filter.
>>
>> Dovecot's sieve-filter, at present, is an order of magnitude slower.
>>
>> Here's my filter command (one line):
>>
>> /usr/bin/sieve-filter -veW -c $HOME/etc/email/sieve-dovecot-config.conf -o
>> mail_location=mbox:~/mail:INBOX=~/mail/Inbox:INDEX=:UTF-8:VOLATILEDIR=/tmp/dovecot-volatile/%2.256Nu/%u:SUBSCRIPTIONS=dovecot_subscriptions
>> ~/etc/email/sieve.rc email-incoming-unsorted
>>
>> The sieve script is fine now that I have the correct "require"
>> clauses (hint: "capability strings").
>>
>> File ~/etc/email/sieve-dovecot-config.conf:
>>
>> protocols = pop
>> lda_mailbox_autocreate = yes
>> lda_mailbox_autosubscribe = yes
>> mail_fsync = never
>>
>> There's no re-sending of emails into my local Postfix SMTP server - I
>> checked the system logs and confirmed this (journalctl -f).
>>
>> I suspect that Gnu sieve was directly writing each email to the
>> appropriate sieve-determined mbox file (perhaps with only a sync at
>> the end of a single batch process - what I've attempted to achieve
>> above with sieve-filter), and that sieve-filter is instead passing
>> each email through some (dovecot) lda?
>>
>> Here's the output for a sieve-filter batch processing of 11 emails:
>>
>> $ /usr/bin/sieve-filter -veW -c
>> /home/zen/etc/email/sieve-dovecot-config.conf -o
>> mail_location=mbox:/home/zen/mail:INBOX=/home/zen/mail/Inbox:INDEX=:UTF-8:VOLATILEDIR=/tmp/dovecot-volatile/%2.256Nu/%u:SUBSCRIPTIONS=dovecot_subscriptions
>> /home/zen/etc/email/sieve.rc email-incoming-unsorted
>> # PS0 Timestamp: 20190912@07:02:23
>> info: filtering: [Tue, 3 Sep 2019 05:17:16 -0500; 10240 bytes] `Re:
>> VentureBeat: The death of disk? H...'.
>> info:
>> msgid=<CAMjeLr91T9R7APsuxQVuM3WbqDsxAfwn4=oydedx4fmcord...@mail.gmail.com>:
>> stored mail into mailbox 'l/cp/cp'.
>> info: message expunged from source mailbox upon successful move.
>> info: filtering: [Tue, 3 Sep 2019 07:29:53 -0400; 12968 bytes] `[zfs-devel]
>> xattr naming format in Zo...'.
>> info: msgid=<15675101930.d5ba2e.12...@composer.zfsonlinux.topicbox.com>:
>> stored mail into mailbox 'l/z/zdev'.
>> info: message expunged from source mailbox upon successful move.
>> info: filtering: [Tue, 03 Sep 2019 15:29:09 +0300; 20461 bytes] `Re:
>> [zfs-devel] xattr naming format i...'.
>> info: msgid=<23955051567513...@sas1-02732547ccc0.qloud-c.yandex.net>: stored
>> mail into mailbox 'l/z/zdev'.
>> info: message expunged from source mailbox upon successful move.
>> info: filtering: [Tue, 3 Sep 2019 18:20:42 +0530; 18065 bytes] `Re:
>> [Gluster-users] Issues with Geo-r...'.
>> info:
>> msgid=<CADmkyZMxrfOANrAP+_URAHJcMqCqh=igdajtszkfq5pczsu...@mail.gmail.com>:
>> stored mail into mailbox 'l/gl/user'.
>> info: message expunged from source mailbox upon successful move.
>> info: filtering: [Tue, 3 Sep 2019 09:34:20 -0400; 13342 bytes] `Re: tasksel'.
>> info: msgid=<20190903133420.gs6...@eeg.ccf.org>: stored mail into mailbox
>> 'l/deb/user'.
>> info: message expunged from source mailbox upon successful move.
>> info: filtering: [Tue, 3 Sep 2019 06:56:07 -0700 (PDT); 12390 bytes]
>> `[awx-project] Re: AWX on Kubernetes m...'.
>> info: msgid=<0715adb7-540f-4cff-9282-e1252c53c...@googlegroups.com>: stored
>> mail into mailbox 'l/ansible/awx'.
>> info: message expunged from source mailbox upon successful move.
>> info: filtering: [Tue, 3 Sep 2019 07:01:27 -0700 (PDT); 12220 bytes]
>> `[awx-project] Re: AWX on Kubernetes m...'.
>> info: msgid=<949b2c17-4254-49f1-83b4-cd54d15aa...@googlegroups.com>: stored
>> mail into mailbox 'l/ansible/awx'.
>> info: message expunged from source mailbox upon successful move.
>> info: filtering: [Tue, 3 Sep 2019 10:14:58 -0400; 25313 bytes] `Re:
>> [zfs-devel] xattr naming format i...'.
>> info:
>> msgid=<cab5c7xphcdfx1w3ya9fyrl-kq8buicr4jbidqrufjj9nogk...@mail.gmail.com>:
>> stored mail into mailbox 'l/z/zdev'.
>> info: message expunged from source mailbox upon successful move.
>> info: filtering: [Tue, 3 Sep 2019 17:10:22 +0200; 7567 bytes] `Re:
>> [asterisk-users] Playing MP3's in...'.
>> info: msgid=<20190903151022.354xpe6ds2vglher@red.localdomain>: stored mail
>> into mailbox 'l/as/users'.
>> info: message expunged from source mailbox upon successful move.
>> info: filtering: [Wed, 4 Sep 2019 01:04:49 +0900; 14858 bytes] `Re:
>> [Hyperledger Fabric] a primitive ...'.
>> info: msgid=<160901d8-b903-9e9a-91ac-267571b0e...@gmx.com>: stored mail into
>> mailbox 'l/hl/fabric'.
>> info: message expunged from source mailbox upon successful move.
>> info: filtering: [Tue, 3 Sep 2019 09:55:22 -0700 (PDT); 13337 bytes]
>> `[awx-project] Re: AWX on Kubernetes m...'.
>> info: msgid=<f9bc4e6a-8445-4b34-927a-35f577ffc...@googlegroups.com>: stored
>> mail into mailbox 'l/ansible/awx'.
>> info: message expunged from source mailbox upon successful move.
>> 2 ▶︎️ zen@eye 20190912@07:02:30 ~ $
>>
>>
>> So about 3/4 of a second is spent by dovecot's sieve-filter, on each
>> email that it processes - watching it is painful given how fast Gnu
>> sieve has been for the last few years - it's almost (but not quite)
>> as slow as my previous fetchmail email download per-email time.
>>
>> Attached is a -D debug run of sieve-filter on 20 emails - slightly
>> longer than the above, and took roughly 15 seconds to run.
>>
>> Any help appreciated...
>
>
> On another test run of ~600 emails, sieve-filter is consistently
> running ~100% of one CPU (for about 4 minutes) to process these
> emails, which leads to the conclusion that despite what looks like
> should be a batch process, sieve-filter is perhaps reloading the
> rules for every single email that it processes, even though I gave it
> a whole mbox, and not a single email, to process.
>
> Can sieve-filter work the way it should / the way I want it / batch
> process a whole mbox - without reloading the sieve rules for every
> email?
>
> ----- End forwarded message -----