Re: Gnu sieve vs Dovecot sieve-filter - sieve-filter extremely slow at lda (writing emails to local mbox files)

2019-09-12 Thread @lbutlr via dovecot
On Sep 12, 2019, at 12:57 AM, Zenaan Harkness wrote:
> The next step, I throw the email-incoming-unsorted mbox file at a
> sieve processor, to sort the emails from that mbox, into other
> mboxes, according to the sieve rules file.

I would expect mbox to be the worst possible format choice for this.

> Gnu sieve balks on emails which have no x-message-id (?? something
> like this) header field, so after a few years, I finally decided to
> switch "up" to Dovecot/Pigeonhole's "sieve-filter" command.
> 
> Using Gnu sieve, this mbox sorting step was even faster than mpop (/
> getmail) - and mpop and getmail are really fast (compared with
> fetchmail), since they pipeline the email downloads.

Perhaps because relying on that header is what lets it index the messages?

> Even with 100s of emails, Gnu sieve would take only 10 to 20 seconds
> at most. Super fast.

That doesn’t sound fast. I processed a few thousand messages through sieve in 
less than 10 seconds, if I recall correctly.

> See below for details, any ideas appreciated.

The first thing I would do is download to Maildir and see what the difference 
is.
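
A minimal sketch of how that comparison could be set up, assuming
Python's standard mailbox module is acceptable for a one-off conversion.
The function name is hypothetical; it copies every message from an mbox
file into a Maildir so that the same sieve run can be timed against both
formats.

```python
import mailbox


def mbox_to_maildir(mbox_path, maildir_path):
    """Copy every message from an mbox file into a Maildir; return the count."""
    # create=False: fail loudly if the source mbox does not exist.
    src = mailbox.mbox(str(mbox_path), create=False)
    # create=True makes the cur/, new/ and tmp/ subdirectories if missing.
    dest = mailbox.Maildir(str(maildir_path), create=True)
    copied = 0
    try:
        for message in src:
            dest.add(message)  # each message becomes its own file
            copied += 1
    finally:
        dest.close()
        src.close()
    return copied
```

For example, mbox_to_maildir("/home/zen/mail/email-incoming-unsorted",
"/home/zen/Maildir-test") would leave the source file untouched; the
Maildir path is only a placeholder.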



-- 
What we have here is a failure to communicate.



Re: Gnu sieve vs Dovecot sieve-filter - sieve-filter extremely slow at lda (writing emails to local mbox files)

2019-09-12 Thread Zenaan Harkness via dovecot
Oh, one last bit for now regarding piping:

Given my current sieve-filter command:

MLOC="mail_location=mbox:~/mail:INBOX=~/mail/Inbox:INDEX=:UTF-8:VOLATILEDIR=/tmp/dovecot-volatile/%2.256Nu/%u:SUBSCRIPTIONS=dovecot_subscriptions"
SCRIPT=~/etc/email/sieve.rc

sieve-filter -veWD -c $SIEVE_CONF -o $MLOC $SCRIPT emails-incoming

I can imagine trying a pipe as suggested, as follows:

cat ~/mail/emails-incoming | sieve-filter -veW -c $SIEVE_CONF -o $MLOC $SCRIPT

But I see nothing in the sieve-filter man page suggesting that this
would work. ISTM that sieve-filter is simply not designed to work in a
local mbox email environment.
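
If a pipe is wanted, the mbox has to be split first, because a delivery
agent that reads standard input expects one message per invocation. A
rough sketch of that splitting step, assuming Python's standard mailbox
module; the function name is hypothetical and the delivery command is
left entirely to the caller, since its arguments depend on the local
setup.

```python
import mailbox
import subprocess


def pipe_each_message(mbox_path, command):
    """Run `command` once per message, feeding the raw message on stdin.

    `command` is an argument list such as ["some-lda", "--flag"].
    Returns the exit codes, one per message, in mbox order.
    """
    box = mailbox.mbox(str(mbox_path), create=False)
    exit_codes = []
    try:
        for key in box.iterkeys():
            # get_bytes() omits the mbox "From " separator line by default.
            raw = box.get_bytes(key)
            result = subprocess.run(command, input=raw, check=False)
            exit_codes.append(result.returncode)
    finally:
        box.close()
    return exit_codes
```

This starts one process per message, so it trades the single batch run
for per-message process startup cost; whether that ends up faster than
sieve-filter is untested.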


Re: Gnu sieve vs Dovecot sieve-filter - sieve-filter extremely slow at lda (writing emails to local mbox files)

2019-09-12 Thread Zenaan Harkness via dovecot
(I did subscribe to this mailing list, albeit with zen at
freedbms.net, so either way I'm getting all your emails - thank you
-so- much for replying...)

MUA is mutt, reading email in a terminal (sorry, forgot to mention this before).

Over many years my collection of email folders (mbox files) has grown
to many GiB, mostly mailing lists.

If I am to change email storage format, it should be mutt compatible;
looking at https://wiki2.dovecot.org/MailboxFormat I see that only
DJB's Maildir is compatible with both Dovecot ("a reliable choice"
says the wiki), and mutt.

I can imagine that sdbox or mdbox could be made "mutt compatible" so
to speak, by running some sort of local IMAP server, and accessing my
email from mutt that way; this is undesirable to my mind because this
would require:

 1) a new learning curve wrt mutt and reading email on IMAP servers
 2) a new learning curve to set up a local IMAP server (securely)
 3) the inability to use mutt without a local IMAP server to read my local email

but such a setup would also have some quite desirable benefits:

 1) once set up, multiple MUAs could be used, and I'd have a beginning
grasp on setting up an IMAP server and front ends (this is something
on my bucket list, to assist my local church with)
 2) simpler remote "online" access to my local "offline" email store
(e.g. using my mobile phone when on the road) by setting up a webmail
server (much simpler (read "possible" to use on a mobile phone) than
using a vpn and mutt...), thus freeing me up from the behemoth web
email providers...

Next, I do not know how to "pipe the messages to the dovecot lda".
After downloading from my POP3 provider into a local mbox file (this
is my step 1), I sort the emails (this is my step 2) with the
following command, which should be on a single line:

/usr/bin/sieve-filter -veW -c
$HOME/etc/email/sieve-dovecot-config.conf -o
mail_location=mbox:~/mail:INBOX=~/mail/Inbox:INDEX=:UTF-8:VOLATILEDIR=/tmp/dovecot-volatile/%2.256Nu/%u:SUBSCRIPTIONS=dovecot_subscriptions
~/etc/email/sieve.rc email-incoming-unsorted

As you can see from the above command, sieve-filter is given the name
of the mbox ("mail folder") to sort as its very last argument on the
command line. So in this instance sieve-filter really has no excuse,
and should not be re-reading the sieve rules script for each email.
Perhaps that's not happening; I only assumed it because a CPU hits
100% for a minute or two just to process a few hundred emails...

What could also be happening (again, an assumption) is that
sieve-filter is written to assume that Dovecot index files exist.

I disabled those with the "INDEX=" clause you see in the command
above, which obviously has been given no value.

I figured out how to disable the creation of the indexes in the .imap
directories because, for my setup, Gnu sieve has proven that I should
not need such indexes: with mbox files, just append each email to the
end of the target "mailbox folder" mbox file, and we're done! This
literally should not cost 100% CPU, even for one millisecond! More
importantly, my working email folder is ~30GiB, and without disabling
this index creation step, sieve-filter forced the creation of indexes,
which took so long that I gave up and hit CTRL-C; that did not work,
so I kill -9'ed sieve-filter and whatever other process was not
stopping.
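
To make the "just append" idea concrete, here is a toy sketch using
Python's standard mailbox module. It is not a Sieve interpreter: the
rules are hypothetical (header, substring, folder) tuples standing in
for a sieve script, and the function name is made up. Each message is
appended to the end of its target mbox, and every target is opened only
once per run.

```python
import mailbox
from pathlib import Path


def sort_by_header(src_path, dest_dir, rules, default="Inbox"):
    """Append each message in src_path to the mbox picked by the first matching rule.

    rules: list of (header_name, substring, folder_name) tuples.
    dest_dir must already exist; target mbox files are created on demand.
    Returns a dict mapping folder name to the number of messages appended.
    """
    dest_dir = Path(dest_dir)
    src = mailbox.mbox(str(src_path), create=False)
    targets = {}  # folder name -> open mbox, opened once per run
    counts = {}
    try:
        for message in src:
            folder = default
            for header, needle, name in rules:
                if needle.lower() in str(message.get(header, "")).lower():
                    folder = name
                    break
            if folder not in targets:
                targets[folder] = mailbox.mbox(str(dest_dir / folder))
            targets[folder].add(message)  # written at the end of the target file
            counts[folder] = counts.get(folder, 0) + 1
    finally:
        for box in targets.values():
            box.flush()  # sync the appended data for this target
            box.close()
        src.close()
    return counts
```

The source mbox is only read here, never rewritten, which is the part of
the job that should be cheap.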

Last year someone on debian-user recommended I upgrade to using
Dovecot/Pigeonhole's sieve-filter (rather than Gnu sieve) due to the
issues with Gnu sieve.

I am starting to think that I should perhaps try to figure out if it's
possible to (re)process the emails Gnu sieve has a problem with, to
massage them into a shape that Gnu sieve accepts - then my immediate
problem would certainly be solved...
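
A sketch of what such a pre-processing pass might look like, assuming
the header Gnu sieve wants is Message-ID (I am not certain which header
it is) and using Python's standard mailbox module. The function name
and the default domain are hypothetical, and whether Gnu sieve then
accepts the messages is untested.

```python
import mailbox
from email.utils import make_msgid


def add_missing_message_ids(src_path, dest_path, domain="localhost.invalid"):
    """Copy src mbox to dest mbox, adding a Message-ID where none exists.

    Returns the number of messages that needed a generated Message-ID.
    """
    src = mailbox.mbox(str(src_path), create=False)
    dest = mailbox.mbox(str(dest_path))
    fixed = 0
    try:
        for message in src:
            if not message.get("Message-ID"):
                # Drop any empty Message-ID header before adding a fresh one.
                del message["Message-ID"]
                message["Message-ID"] = make_msgid(domain=domain)
                fixed += 1
            dest.add(message)
    finally:
        dest.flush()
        dest.close()
        src.close()
    return fixed
```

The source file is left as it was; the repaired copy goes to a separate
mbox that can then be handed to the sieve step.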

Thank you all again..
Zenaan


Re: Gnu sieve vs Dovecot sieve-filter - sieve-filter extremely slow at lda (writing emails to local mbox files)

2019-09-12 Thread Sami Ketola via dovecot
Don't use mbox.

It is a very slow format when mails need to be deleted from the middle: 
basically the whole mbox file is rewritten each time.

Use sdbox instead.
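
To put a number on that cost, here is a simplified model in Python. It
assumes that expunging a message means rewriting every byte that
follows it in the file, and that messages are moved out one at a time
from the front; it is an illustration of the scaling, not a description
of Dovecot's actual I/O.

```python
def bytes_rewritten(message_sizes):
    """Total bytes shifted when messages are expunged from the front, one at a time.

    Simplified model: removing a message from an mbox means rewriting
    every byte that follows it in the file.
    """
    total = 0
    remaining = sum(message_sizes)
    for size in message_sizes:
        remaining -= size   # bytes that follow the expunged message
        total += remaining  # all of them have to be moved up
    return total
```

Under this model, 300 messages of 10240 bytes each come to about 3 MB
of mail but roughly 459 MB of rewriting, because the cost grows with
the square of the message count.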

Sami


> On 12 Sep 2019, at 9.57, Zenaan Harkness via dovecot wrote:
> 
> I am wondering why sieve-filter is so slow compared to gnu sieve.

Re: Gnu sieve vs Dovecot sieve-filter - sieve-filter extremely slow at lda (writing emails to local mbox files)

2019-09-12 Thread Zenaan Harkness via dovecot
I am wondering why sieve-filter is so slow compared to gnu sieve.

I run mpop (like getmail) to download from a pop3 server to a local
mbox file: ~/mail/email-incoming-unsorted

This step is very fast.

The next step, I throw the email-incoming-unsorted mbox file at a
sieve processor, to sort the emails from that mbox, into other
mboxes, according to the sieve rules file.

Up until a couple days ago I was using Gnu sieve.

Gnu sieve balks on emails which have no x-message-id (?? something
like this) header field, so after a few years, I finally decided to
switch "up" to Dovecot/Pigeonhole's "sieve-filter" command.

Using Gnu sieve, this mbox sorting step was even faster than mpop (/
getmail) - and mpop and getmail are really fast (compared with
fetchmail), since they pipeline the email downloads.

Even with 100s of emails, Gnu sieve would take only 10 to 20 seconds
at most. Super fast.

Using sieve-filter, all emails are being processed - including those
without "message id header". This is good.

But sieve-filter is also much slower - slower than the
download step by an order of magnitude or two.

See below for details, any ideas appreciated.

To add to the below, I added:

mbox_very_dirty_syncs = yes

to the sieve-filter config, which slightly improves performance, but
not by much (in comparison with Gnu sieve).

TIA,



----- Forwarded message from Zenaan Harkness -----

From: Zenaan Harkness 
To: debian-u...@lists.debian.org
Date: Thu, 12 Sep 2019 08:06:12 +1000
Subject: Re: Gnu sieve vs Dovecot sieve-filter - sieve-filter extremely slow at 
lda (writing emails to local mbox files)

On Thu, Sep 12, 2019 at 07:55:23AM +1000, Zenaan Harkness wrote:
> Why is Gnu sieve so extremely fast at batch processing an mbox file,
> while Dovecot's sieve-filter is an order of magnitude slower?
> 
> Sequence:
> 
>  - mpop or getmail to pipeline download emails into temp mbox file
>  - filter that file
> 
> Gnu sieve just flies through a local mbox file, saving emails to
> other local mbox files.
> 
> Gnu sieve rejects too many emails with "malformed" errors, so after a
> few years I bit the bullet and upgraded to Dovecot's sieve-filter.
> 
> Dovecot's sieve-filter, at present, is an order of magnitude slower.
> 
> Here's my filter command (one line):
> 
> /usr/bin/sieve-filter -veW -c $HOME/etc/email/sieve-dovecot-config.conf -o mail_location=mbox:~/mail:INBOX=~/mail/Inbox:INDEX=:UTF-8:VOLATILEDIR=/tmp/dovecot-volatile/%2.256Nu/%u:SUBSCRIPTIONS=dovecot_subscriptions ~/etc/email/sieve.rc email-incoming-unsorted
> 
> The sieve script is fine now that I have the correct "require"
> clauses (hint: "capability strings").
> 
> File ~/etc/email/sieve-dovecot-config.conf:
> 
>   protocols = pop
>   lda_mailbox_autocreate = yes
>   lda_mailbox_autosubscribe = yes
>   mail_fsync = never
> 
> There's no re-sending of emails into my local Postfix SMTP server - I
> checked the system logs and confirmed this (journalctl -f).
> 
> I suspect that Gnu sieve was directly writing each email to the
> appropriate sieve-determined mbox file (perhaps with only a sync at
> the end of a single batch process - what I've attempted to achieve
> above with sieve-filter), and that sieve-filter is instead passing
> each email through some (dovecot) lda?
> 
> Here's the output for a sieve-filter batch processing of 11 emails:
> 
> $ /usr/bin/sieve-filter -veW -c /home/zen/etc/email/sieve-dovecot-config.conf -o mail_location=mbox:/home/zen/mail:INBOX=/home/zen/mail/Inbox:INDEX=:UTF-8:VOLATILEDIR=/tmp/dovecot-volatile/%2.256Nu/%u:SUBSCRIPTIONS=dovecot_subscriptions /home/zen/etc/email/sieve.rc email-incoming-unsorted
> # PS0 Timestamp: 20190912@07:02:23
> info: filtering: [Tue, 3 Sep 2019 05:17:16 -0500; 10240 bytes] `Re: VentureBeat: The death of disk? H...'.
> info: msgid=: stored mail into mailbox 'l/cp/cp'.
> info: message expunged from source mailbox upon successful move.
> info: filtering: [Tue, 3 Sep 2019 07:29:53 -0400; 12968 bytes] `[zfs-devel] xattr naming format in Zo...'.
> info: msgid=<15675101930.d5ba2e.12...@composer.zfsonlinux.topicbox.com>: stored mail into mailbox 'l/z/zdev'.
> info: message expunged from source mailbox upon successful move.
> info: filtering: [Tue, 03 Sep 2019 15:29:09 +0300; 20461 bytes] `Re: [zfs-devel] xattr naming format i...'.
> info: msgid=<23955051567513...@sas1-02732547ccc0.qloud-c.yandex.net>: stored mail into mailbox 'l/z/zdev'.
> info: message expunged from source mailbox upon successful move.
> info: filtering: [Tue, 3 Sep 2019 18:20:42 +0530; 18065 bytes] `Re: [Gluster-users] Issues with Geo-r