I am wondering why sieve-filter is so slow compared to Gnu sieve.

I run mpop (like getmail) to download from a pop3 server to a local
mbox file: ~/mail/email-incoming-unsorted

This step is very fast.

In the next step, I throw the email-incoming-unsorted mbox file at a
sieve processor, to sort the emails from that mbox into other
mboxes, according to the sieve rules file.

Up until a couple days ago I was using Gnu sieve.

Gnu sieve balks on emails which have no x-message-id (?? something
like this) header field, so after a few years, I finally decided to
switch "up" to Dovecot/Pigeonhole's "sieve-filter" command.

Using Gnu sieve, this mbox sorting step was even faster than mpop (or
getmail), and mpop and getmail are really fast (compared with
fetchmail), since they pipeline the email downloads.

Even with hundreds of emails, Gnu sieve would take only 10 to 20
seconds at most. Super fast.

Using sieve-filter, all emails are being processed - including those
without "message id header". This is good.
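
To see how many messages in the unsorted mbox are affected, the mbox can
be scanned directly. The following is a minimal sketch using Python's
standard mailbox module. It assumes the header in question is the
standard Message-ID header (the exact header name Gnu sieve objects to
is not confirmed above), and the function name is made up for
illustration.

```python
import mailbox


def count_missing_message_id(mbox_path):
    """Return (total, missing) for the mbox file at mbox_path.

    'missing' counts messages with no Message-ID header at all.
    Assumption: Message-ID is the header that matters here.
    """
    # create=False: raise an error instead of creating an empty mbox.
    box = mailbox.mbox(mbox_path, create=False)
    try:
        total = 0
        missing = 0
        for message in box:
            total += 1
            # Header lookup is case-insensitive; None means "not present".
            if message["Message-ID"] is None:
                missing += 1
        return total, missing
    finally:
        box.close()
```

Calling it on the path to ~/mail/email-incoming-unsorted before the
filter step gives the number of messages that would be at risk.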

But sieve-filter is also much slower: slower than the download step
by an order of magnitude or two.

See below for details, any ideas appreciated.

To add to the below, I added:

mbox_very_dirty_syncs = yes

to the sieve-filter config, which slightly improves performance, but
not by much (in comparison with Gnu sieve).
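
Comparisons like this are easier to judge with wall-clock numbers for
each configuration. The following is a rough sketch of a timing wrapper,
not a benchmark tool. The function names are made up for illustration,
and the argument list simply repeats the sieve-filter command quoted in
the forwarded message below. Since -e and -W move messages out of the
source mbox, each timed run needs a fresh copy of the same input.

```python
import subprocess
import time


def time_command(argv):
    """Run argv and return (elapsed_seconds, return_code)."""
    start = time.perf_counter()
    result = subprocess.run(
        argv, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    return time.perf_counter() - start, result.returncode


def sieve_filter_argv(config, script, source="email-incoming-unsorted"):
    """Build the sieve-filter command line from the forwarded message.

    The mail_location value is copied verbatim; no shell expansion is
    applied to it here, so "~" is passed through unchanged.
    """
    mail_location = (
        "mbox:~/mail:INBOX=~/mail/Inbox:INDEX=:UTF-8:"
        "VOLATILEDIR=/tmp/dovecot-volatile/%2.256Nu/%u:"
        "SUBSCRIPTIONS=dovecot_subscriptions"
    )
    return [
        "/usr/bin/sieve-filter", "-veW",
        "-c", config,
        "-o", "mail_location=" + mail_location,
        script, source,
    ]
```

Timing one run with and one without mbox_very_dirty_syncs in the config
file, via time_command(sieve_filter_argv(config, script)), would show
how much of the total that setting accounts for.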

TIA,



----- Forwarded message from Zenaan Harkness <zen...@freedbms.net> -----

From: Zenaan Harkness <zen...@freedbms.net>
To: debian-u...@lists.debian.org
Date: Thu, 12 Sep 2019 08:06:12 +1000
Subject: Re: Gnu sieve vs Dovecot sieve-filter - sieve-filter extremely slow at lda (writing emails to local mbox files)

On Thu, Sep 12, 2019 at 07:55:23AM +1000, Zenaan Harkness wrote:
> Why is Gnu sieve so extremely fast to batch process an mbox file,
> while Dovecot's sieve-filter is an order of magnitude slower?
> 
> Sequence:
> 
>  - mpop or getmail to pipeline download emails into temp mbox file
>  - filter that file
> 
> Gnu sieve just flies through a local mbox file, saving emails to
> other local mbox files.
> 
> Gnu sieve rejects too many emails with "malformed" errors, so after a
> few years I bit the bullet and upgraded to Dovecot's sieve-filter.
> 
> Dovecot's sieve-filter, at present, is an order of magnitude slower.
> 
> Here's my filter command (one line):
> 
> /usr/bin/sieve-filter -veW -c $HOME/etc/email/sieve-dovecot-config.conf -o mail_location=mbox:~/mail:INBOX=~/mail/Inbox:INDEX=:UTF-8:VOLATILEDIR=/tmp/dovecot-volatile/%2.256Nu/%u:SUBSCRIPTIONS=dovecot_subscriptions ~/etc/email/sieve.rc email-incoming-unsorted
> 
> The sieve script is fine now that I have the correct "require"
> clauses (hint: "capability strings").
> 
> File ~/etc/email/sieve-dovecot-config.conf:
> 
>   protocols = pop
>   lda_mailbox_autocreate = yes
>   lda_mailbox_autosubscribe = yes
>   mail_fsync = never
> 
> There's no re-sending of emails into my local Postfix SMTP server - I
> checked the system logs and confirmed this (journalctl -f).
> 
> I suspect that Gnu sieve was directly writing each email to the
> appropriate sieve-determined mbox file (perhaps with only a sync at
> the end of a single batch process - what I've attempted to achieve
> above with sieve-filter), and that sieve-filter is instead passing
> each email through some (dovecot) lda?
> 
> Here's the output for a sieve-filter batch processing of 11 emails:
> 
> $ /usr/bin/sieve-filter -veW -c /home/zen/etc/email/sieve-dovecot-config.conf -o mail_location=mbox:/home/zen/mail:INBOX=/home/zen/mail/Inbox:INDEX=:UTF-8:VOLATILEDIR=/tmp/dovecot-volatile/%2.256Nu/%u:SUBSCRIPTIONS=dovecot_subscriptions /home/zen/etc/email/sieve.rc email-incoming-unsorted
> # PS0 Timestamp: 20190912@07:02:23
> info: filtering: [Tue, 3 Sep 2019 05:17:16 -0500; 10240 bytes] `Re: VentureBeat: The death of disk? H...'.
> info: msgid=<CAMjeLr91T9R7APsuxQVuM3WbqDsxAfwn4=oydedx4fmcord...@mail.gmail.com>: stored mail into mailbox 'l/cp/cp'.
> info: message expunged from source mailbox upon successful move.
> info: filtering: [Tue, 3 Sep 2019 07:29:53 -0400; 12968 bytes] `[zfs-devel] xattr naming format in Zo...'.
> info: msgid=<15675101930.d5ba2e.12...@composer.zfsonlinux.topicbox.com>: stored mail into mailbox 'l/z/zdev'.
> info: message expunged from source mailbox upon successful move.
> info: filtering: [Tue, 03 Sep 2019 15:29:09 +0300; 20461 bytes] `Re: [zfs-devel] xattr naming format i...'.
> info: msgid=<23955051567513...@sas1-02732547ccc0.qloud-c.yandex.net>: stored mail into mailbox 'l/z/zdev'.
> info: message expunged from source mailbox upon successful move.
> info: filtering: [Tue, 3 Sep 2019 18:20:42 +0530; 18065 bytes] `Re: [Gluster-users] Issues with Geo-r...'.
> info: msgid=<CADmkyZMxrfOANrAP+_URAHJcMqCqh=igdajtszkfq5pczsu...@mail.gmail.com>: stored mail into mailbox 'l/gl/user'.
> info: message expunged from source mailbox upon successful move.
> info: filtering: [Tue, 3 Sep 2019 09:34:20 -0400; 13342 bytes] `Re: tasksel'.
> info: msgid=<20190903133420.gs6...@eeg.ccf.org>: stored mail into mailbox 'l/deb/user'.
> info: message expunged from source mailbox upon successful move.
> info: filtering: [Tue, 3 Sep 2019 06:56:07 -0700 (PDT); 12390 bytes] `[awx-project] Re: AWX on Kubernetes m...'.
> info: msgid=<0715adb7-540f-4cff-9282-e1252c53c...@googlegroups.com>: stored mail into mailbox 'l/ansible/awx'.
> info: message expunged from source mailbox upon successful move.
> info: filtering: [Tue, 3 Sep 2019 07:01:27 -0700 (PDT); 12220 bytes] `[awx-project] Re: AWX on Kubernetes m...'.
> info: msgid=<949b2c17-4254-49f1-83b4-cd54d15aa...@googlegroups.com>: stored mail into mailbox 'l/ansible/awx'.
> info: message expunged from source mailbox upon successful move.
> info: filtering: [Tue, 3 Sep 2019 10:14:58 -0400; 25313 bytes] `Re: [zfs-devel] xattr naming format i...'.
> info: msgid=<cab5c7xphcdfx1w3ya9fyrl-kq8buicr4jbidqrufjj9nogk...@mail.gmail.com>: stored mail into mailbox 'l/z/zdev'.
> info: message expunged from source mailbox upon successful move.
> info: filtering: [Tue, 3 Sep 2019 17:10:22 +0200; 7567 bytes] `Re: [asterisk-users] Playing MP3's in...'.
> info: msgid=<20190903151022.354xpe6ds2vglher@red.localdomain>: stored mail into mailbox 'l/as/users'.
> info: message expunged from source mailbox upon successful move.
> info: filtering: [Wed, 4 Sep 2019 01:04:49 +0900; 14858 bytes] `Re: [Hyperledger Fabric] a primitive ...'.
> info: msgid=<160901d8-b903-9e9a-91ac-267571b0e...@gmx.com>: stored mail into mailbox 'l/hl/fabric'.
> info: message expunged from source mailbox upon successful move.
> info: filtering: [Tue, 3 Sep 2019 09:55:22 -0700 (PDT); 13337 bytes] `[awx-project] Re: AWX on Kubernetes m...'.
> info: msgid=<f9bc4e6a-8445-4b34-927a-35f577ffc...@googlegroups.com>: stored mail into mailbox 'l/ansible/awx'.
> info: message expunged from source mailbox upon successful move.
> 2 ▶︎️ zen@eye 20190912@07:02:30 ~ $
> 
> 
> So about 3/4 of a second is spent by dovecot's sieve-filter, on each
> email that it processes - watching it is painful given how fast Gnu
> sieve has been for the last few years - it's almost (but not quite)
> as slow as my previous fetchmail email download per-email time.
> 
> Attached is a -D debug run of sieve-filter on 20 emails - slightly
> longer than the above, and took roughly 15 seconds to run.
> 
> Any help appreciated...


On another test run of ~600 emails, sieve-filter consistently runs at
~100% of one CPU (for about 4 minutes) to process these emails. This
suggests that, although this looks like it should be a batch process,
sieve-filter is perhaps reloading the rules for every single email
that it processes, even though I gave it a whole mbox, and not a
single email, to process.
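
For reference, the per-message cost implied by the three runs reported
here can be worked out with a few lines of Python. This is only
arithmetic on the figures quoted above: the 11-message run uses the two
shell prompt timestamps, and the other two use the approximate
durations given. The helper name is made up for illustration.

```python
from datetime import datetime


def seconds_per_message(start, end, count, fmt="%Y%m%d@%H:%M:%S"):
    """Elapsed seconds between two prompt timestamps, divided by count."""
    elapsed = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return elapsed.total_seconds() / count


runs = {
    # Prompt timestamps before and after the 11-message run.
    "11 messages": seconds_per_message(
        "20190912@07:02:23", "20190912@07:02:30", 11
    ),
    # Approximate durations as reported: ~15 s and ~4 minutes.
    "20 messages": 15 / 20,
    "~600 messages": (4 * 60) / 600,
}

for label, rate in runs.items():
    print(f"{label}: {rate:.2f} s/message")
# prints:
# 11 messages: 0.64 s/message
# 20 messages: 0.75 s/message
# ~600 messages: 0.40 s/message
```

By the same arithmetic, Gnu sieve's 10 to 20 seconds for hundreds of
emails works out to roughly 0.1 s per message or less.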

Can sieve-filter work the way it should / the way I want it / batch
process a whole mbox - without reloading the sieve rules for every
email?

----- End forwarded message -----
