Don't use mbox.

mbox is a very slow format when mails need to be deleted from the middle:
each deletion basically rewrites the whole mbox file.
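
As a rough illustration of why this hurts in a batch run, here is a minimal
Python sketch of the cost. It is a simplified model, not Dovecot's actual
code: it assumes every expunge rewrites everything that follows the removed
message, and the batch of 600 messages at 12 KiB each is a hypothetical
figure loosely based on the numbers quoted below. The function names are
made up for the example.

```python
def bytes_rewritten_by_front_expunges(sizes):
    """Bytes copied when messages are expunged one at a time from the front
    of a single-file mailbox, assuming each expunge rewrites everything
    that follows the removed message (a simplified model of mbox)."""
    total = 0
    tail = sum(sizes)
    for size in sizes:
        tail -= size      # this message is removed...
        total += tail     # ...and the rest of the file is rewritten
    return total


def bytes_rewritten_one_file_per_message(sizes):
    """With one file per message, removing a message is an unlink;
    no other message data is rewritten in this model."""
    return 0


if __name__ == "__main__":
    # Hypothetical batch: 600 messages of 12 KiB each.
    batch = [12 * 1024] * 600
    print("single-file model:",
          bytes_rewritten_by_front_expunges(batch), "bytes")
    print("file-per-message model:",
          bytes_rewritten_one_file_per_message(batch), "bytes")
```

In this model the rewritten volume grows with the square of the number of
messages, while a one-file-per-message layout such as sdbox rewrites nothing
on delete.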

Use sdbox instead.

Sami


> On 12 Sep 2019, at 9.57, Zenaan Harkness via dovecot <dovecot@dovecot.org> 
> wrote:
> 
> I am wondering why sieve-filter is so slow compared to gnu sieve.
> 
> I run mpop (like getmail) to download from a pop3 server to a local
> mbox file: ~/mail/email-incoming-unsorted
> 
> This step is very fast.
> 
> The next step, I throw the email-incoming-unsorted mbox file at a
> sieve processor, to sort the emails from that mbox, into other
> mboxes, according to the sieve rules file.
> 
> Up until a couple days ago I was using Gnu sieve.
> 
> Gnu sieve balks on emails which have no x-message-id (?? something
> like this) header field, so after a few years, I finally decided to
> switch "up" to Dovecot/Pigeonhole's "sieve-filter" command.
> 
> Using Gnu sieve, this mbox sorting step was even faster than mpop (/
> getmail) - and mpop and getmail are really fast (compared with
> fetchmail), since they pipeline the email downloads.
> 
> Even with 100s of emails, Gnu sieve would take only 10 to 20 seconds
> at most. Super fast.
> 
> Using sieve-filter, all emails are being processed - including those
> without "message id header". This is good.
> 
> But sieve-filter is also much slower - slower than the
> download step by an order of magnitude or two.
> 
> See below for details, any ideas appreciated.
> 
> To add to the below, I added:
> 
> mbox_very_dirty_syncs = yes
> 
> to the sieve-filter config, which slightly improves performance, but
> not by much (in comparison with Gnu sieve).
> 
> TIA,
> 
> 
> 
> ----- Forwarded message from Zenaan Harkness <zen...@freedbms.net> -----
> 
> From: Zenaan Harkness <zen...@freedbms.net>
> To: debian-u...@lists.debian.org
> Date: Thu, 12 Sep 2019 08:06:12 +1000
> Subject: Re: Gnu sieve vs Dovecot sieve-filter - sieve-filter extremely slow 
> at lda (writing emails to local mbox files)
> 
> On Thu, Sep 12, 2019 at 07:55:23AM +1000, Zenaan Harkness wrote:
>> Why is Gnu sieve so extremely fast at batch processing an mbox file,
>> while Dovecot's sieve-filter is an order of magnitude slower?
>> 
>> Sequence:
>> 
>> - mpop or getmail to pipeline download emails into temp mbox file
>> - filter that file
>> 
>> Gnu sieve just flies through a local mbox file, saving emails to
>> other local mbox files.
>> 
>> Gnu sieve rejects too many emails with "malformed" errors, so after a
>> few years I bit the bullet and upgraded to Dovecot's sieve-filter.
>> 
>> Dovecot's sieve-filter, at present, is an order of magnitude slower.
>> 
>> Here's my filter command (one line):
>> 
>> /usr/bin/sieve-filter -veW -c $HOME/etc/email/sieve-dovecot-config.conf -o mail_location=mbox:~/mail:INBOX=~/mail/Inbox:INDEX=:UTF-8:VOLATILEDIR=/tmp/dovecot-volatile/%2.256Nu/%u:SUBSCRIPTIONS=dovecot_subscriptions ~/etc/email/sieve.rc email-incoming-unsorted
>> 
>> The sieve script is fine now that I have the correct "require"
>> clauses (hint: "capability strings").
>> 
>> File ~/etc/email/sieve-dovecot-config.conf:
>> 
>>  protocols = pop
>>  lda_mailbox_autocreate = yes
>>  lda_mailbox_autosubscribe = yes
>>  mail_fsync = never
>> 
>> There's no re-sending of emails into my local Postfix SMTP server - I
>> checked the system logs and confirmed this (journalctl -f).
>> 
>> I suspect that Gnu sieve was directly writing each email to the
>> appropriate sieve-determined mbox file (perhaps with only a sync at
>> the end of a single batch process - what I've attempted to achieve
>> above with sieve-filter), and that sieve-filter is instead passing
>> each email through some (dovecot) lda?
>> 
>> Here's the output for a sieve-filter batch processing of 11 emails:
>> 
>> $ /usr/bin/sieve-filter -veW -c /home/zen/etc/email/sieve-dovecot-config.conf -o mail_location=mbox:/home/zen/mail:INBOX=/home/zen/mail/Inbox:INDEX=:UTF-8:VOLATILEDIR=/tmp/dovecot-volatile/%2.256Nu/%u:SUBSCRIPTIONS=dovecot_subscriptions /home/zen/etc/email/sieve.rc email-incoming-unsorted
>> # PS0 Timestamp: 20190912@07:02:23
>> info: filtering: [Tue, 3 Sep 2019 05:17:16 -0500; 10240 bytes] `Re: VentureBeat: The death of disk? H...'.
>> info: msgid=<CAMjeLr91T9R7APsuxQVuM3WbqDsxAfwn4=oydedx4fmcord...@mail.gmail.com>: stored mail into mailbox 'l/cp/cp'.
>> info: message expunged from source mailbox upon successful move.
>> info: filtering: [Tue, 3 Sep 2019 07:29:53 -0400; 12968 bytes] `[zfs-devel] xattr naming format in Zo...'.
>> info: msgid=<15675101930.d5ba2e.12...@composer.zfsonlinux.topicbox.com>: stored mail into mailbox 'l/z/zdev'.
>> info: message expunged from source mailbox upon successful move.
>> info: filtering: [Tue, 03 Sep 2019 15:29:09 +0300; 20461 bytes] `Re: [zfs-devel] xattr naming format i...'.
>> info: msgid=<23955051567513...@sas1-02732547ccc0.qloud-c.yandex.net>: stored mail into mailbox 'l/z/zdev'.
>> info: message expunged from source mailbox upon successful move.
>> info: filtering: [Tue, 3 Sep 2019 18:20:42 +0530; 18065 bytes] `Re: [Gluster-users] Issues with Geo-r...'.
>> info: msgid=<CADmkyZMxrfOANrAP+_URAHJcMqCqh=igdajtszkfq5pczsu...@mail.gmail.com>: stored mail into mailbox 'l/gl/user'.
>> info: message expunged from source mailbox upon successful move.
>> info: filtering: [Tue, 3 Sep 2019 09:34:20 -0400; 13342 bytes] `Re: tasksel'.
>> info: msgid=<20190903133420.gs6...@eeg.ccf.org>: stored mail into mailbox 'l/deb/user'.
>> info: message expunged from source mailbox upon successful move.
>> info: filtering: [Tue, 3 Sep 2019 06:56:07 -0700 (PDT); 12390 bytes] `[awx-project] Re: AWX on Kubernetes m...'.
>> info: msgid=<0715adb7-540f-4cff-9282-e1252c53c...@googlegroups.com>: stored mail into mailbox 'l/ansible/awx'.
>> info: message expunged from source mailbox upon successful move.
>> info: filtering: [Tue, 3 Sep 2019 07:01:27 -0700 (PDT); 12220 bytes] `[awx-project] Re: AWX on Kubernetes m...'.
>> info: msgid=<949b2c17-4254-49f1-83b4-cd54d15aa...@googlegroups.com>: stored mail into mailbox 'l/ansible/awx'.
>> info: message expunged from source mailbox upon successful move.
>> info: filtering: [Tue, 3 Sep 2019 10:14:58 -0400; 25313 bytes] `Re: [zfs-devel] xattr naming format i...'.
>> info: msgid=<cab5c7xphcdfx1w3ya9fyrl-kq8buicr4jbidqrufjj9nogk...@mail.gmail.com>: stored mail into mailbox 'l/z/zdev'.
>> info: message expunged from source mailbox upon successful move.
>> info: filtering: [Tue, 3 Sep 2019 17:10:22 +0200; 7567 bytes] `Re: [asterisk-users] Playing MP3's in...'.
>> info: msgid=<20190903151022.354xpe6ds2vglher@red.localdomain>: stored mail into mailbox 'l/as/users'.
>> info: message expunged from source mailbox upon successful move.
>> info: filtering: [Wed, 4 Sep 2019 01:04:49 +0900; 14858 bytes] `Re: [Hyperledger Fabric] a primitive ...'.
>> info: msgid=<160901d8-b903-9e9a-91ac-267571b0e...@gmx.com>: stored mail into mailbox 'l/hl/fabric'.
>> info: message expunged from source mailbox upon successful move.
>> info: filtering: [Tue, 3 Sep 2019 09:55:22 -0700 (PDT); 13337 bytes] `[awx-project] Re: AWX on Kubernetes m...'.
>> info: msgid=<f9bc4e6a-8445-4b34-927a-35f577ffc...@googlegroups.com>: stored mail into mailbox 'l/ansible/awx'.
>> info: message expunged from source mailbox upon successful move.
>> 2 ▶︎️ zen@eye 20190912@07:02:30 ~ $
>> 
>> 
>> So about 3/4 of a second is spent by dovecot's sieve-filter, on each
>> email that it processes - watching it is painful given how fast Gnu
>> sieve has been for the last few years - it's almost (but not quite)
>> as slow as my previous fetchmail email download per-email time.
>> 
>> Attached is a -D debug run of sieve-filter on 20 emails - slightly
>> longer than the above, and took roughly 15 seconds to run.
>> 
>> Any help appreciated...
> 
> 
> On another test run of ~600 emails, sieve-filter consistently
> runs at ~100% of one CPU (for about 4 minutes) to process these
> emails. This suggests that, despite what looks like it should be a
> batch process, sieve-filter is perhaps reloading the rules for every
> single email that it processes, even though I gave it a whole mbox,
> and not a single email, to process.
> 
> Can sieve-filter work the way it should / the way I want it / batch
> process a whole mbox - without reloading the sieve rules for every
> email?
> 
> ----- End forwarded message -----
