I’ve asked a related question on this list before, but I now have a much better 
handle on what I’m doing and I realize that I still don’t know the answer, so 
I’m going to ask it again in a slightly different form.

I’m writing a spam filter, so obviously I need to feed incoming mail to it 
somehow.  The “obvious” way to do this is with a Sieve script using the pipe 
extension (vnd.dovecot.pipe).  There are two problems with this:

1.  This always pipes the entire message, no matter how big it is.  The filter 
will often need only the headers, or only the first part of a multipart MIME 
message, rather than the whole body.  Is there any way to let my filter open 
the file in which the message is stored, rather than piping it a copy of the 
message?

2.  Once the filter has processed the message and decided whether it’s spam, it 
still needs to move the message to the appropriate folder (INBOX or Junk).  To 
do this it has to correlate the *content* of the message that was piped to it 
with the UID of the message that needs to be moved.  One way to do this is to 
pull out the Message-ID header and then use doveadm to find the file 
containing the message with that Message-ID, but there are two problems with 
this.  First, not all messages have Message-IDs.  I can work around this by 
adding my own Message-ID to messages that don’t already have one, but that 
just feels wrong.  Second, unless Dovecot keeps an index of Message-IDs 
(does it?), this will be horribly inefficient, because it will essentially 
have to grep for the Message-ID every time I want to move a message.  So it 
seems like there has to be a better way, but I can’t think of what that would 
be.
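For concreteness, the Message-ID approach from point 2 might look roughly like this in Python.  This is a minimal sketch, assuming the filter receives the raw message bytes on stdin and that `doveadm move` accepts a search query of the form `mailbox <name> header Message-ID <id>` (check doveadm-search-query(7) for your Dovecot version).  The function names `extract_message_id` and `doveadm_move_argv` are hypothetical, and the sketch only builds the command line rather than running it.

```python
from email import policy
from email.parser import BytesHeaderParser
from typing import List, Optional


def extract_message_id(raw: bytes) -> Optional[str]:
    """Return the Message-ID header of a raw message, or None if absent."""
    # Parse the header block only; the body is kept as an unparsed payload.
    headers = BytesHeaderParser(policy=policy.default).parsebytes(raw)
    value = headers.get("Message-ID")
    return str(value).strip() if value else None


def doveadm_move_argv(user: str, msg_id: str,
                      src: str = "INBOX", dest: str = "Junk") -> List[str]:
    """Build (but do not run) a doveadm command moving the matching message.

    Assumed syntax: doveadm move -u <user> <dest> <search query>, where the
    query restricts to one mailbox and matches on the Message-ID header.
    """
    return ["doveadm", "move", "-u", user, dest,
            "mailbox", src, "header", "Message-ID", msg_id]


if __name__ == "__main__":
    import sys
    msg_id = extract_message_id(sys.stdin.buffer.read())
    if msg_id is None:
        print("no Message-ID; cannot correlate this way", file=sys.stderr)
    else:
        print(doveadm_move_argv("rg", msg_id))
```

The resulting list could be handed to `subprocess.run`.  It does not solve either problem above: a message without a Message-ID simply yields `None`, and how fast the header search runs depends on what Dovecot has indexed or cached.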

I figure this has to be a solved problem, because I am obviously not the first 
person to write a spam filter for Dovecot.  What is the Right Way to do this?

Thanks,
rg