419 scams in .doc and .rtf attachments

2009-06-16 Thread Rosenbaum, Larry M.
We get a significant number of 419 scam letters where the actual spam text is 
in a Word (.doc or .rtf) or PDF attachment.  Example:

http://pastebin.com/m4a161daa

It would be really great if there was an SA plugin to extract the text from the 
attachment and then feed the text to the regular SA body rules.  Has anybody 
looked at that possibility?

Thanks, Larry


Re: 419 scams in .doc and .rtf attachments

2009-06-16 Thread SM

At 10:41 16-06-2009, Rosenbaum, Larry M. wrote:
> We get a significant number of 419 scam letters where the actual
> spam text is in a Word (.doc or .rtf) or PDF attachment.  Example:


Don't limit yourself to that.  Think of the next step.

> It would be really great if there was an SA plugin to extract the
> text from the attachment and then feed the text to the regular SA
> body rules.  Has anybody looked at that possibility?


See http://wiki.apache.org/spamassassin/FuzzyOcrPlugin as a starting
point.  It is possible to modify that plugin to call the wv library to
extract the content.  If you want to use the regular rules, you would
have to render the content as text before passing the modified message
to SpamAssassin.
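
The "render first, then scan" idea could be sketched roughly as below.
This is a stand-alone pre-filter, not a SpamAssassin plugin, and it is
only a sketch under assumptions: the function names run_converter and
render_attachments are made up for illustration, and the command-line
tools antiword, unrtf and pdftotext are swapped in for the wv library
and are assumed to be installed.  The extracted text is appended as an
inline text/plain part; whether the regular body rules then see it
should be checked against your own SpamAssassin setup.

```python
import email
import os
import subprocess
import tempfile
from email import policy

SUPPORTED = (".doc", ".rtf", ".pdf")


def run_converter(path, ext):
    """Return the plain text of the file at path, or "" on any failure.

    Assumption: these external tools are installed and write to stdout.
    """
    if ext == ".pdf":
        cmd = ["pdftotext", path, "-"]      # "-" sends the text to stdout
    elif ext == ".rtf":
        cmd = ["unrtf", "--text", path]
    elif ext == ".doc":
        cmd = ["antiword", path]
    else:
        return ""
    try:
        result = subprocess.run(cmd, capture_output=True, timeout=30)
    except (OSError, subprocess.TimeoutExpired):
        return ""                           # tool missing or hung
    if result.returncode != 0:
        return ""
    return result.stdout.decode("utf-8", "replace")


def render_attachments(raw_bytes, convert=run_converter):
    """Return the message with attachment text added as inline text parts."""
    msg = email.message_from_bytes(raw_bytes, policy=policy.default)
    extracted = []
    for part in msg.iter_attachments():
        name = part.get_filename() or ""
        ext = os.path.splitext(name)[1].lower()
        if ext not in SUPPORTED:
            continue
        payload = part.get_payload(decode=True) or b""
        # Write the decoded attachment to a temporary file for the tool.
        with tempfile.NamedTemporaryFile(suffix=ext, delete=False) as tmp:
            tmp.write(payload)
        try:
            text = convert(tmp.name, ext)
        finally:
            os.unlink(tmp.name)
        if text.strip():
            extracted.append(text)
    # Add the new parts only after iteration over the attachments is done.
    for text in extracted:
        msg.add_attachment(text, disposition="inline")
    return msg.as_bytes()
```

The convert argument is there so the extraction step can be replaced,
for example by a wrapper around wv, without touching the MIME
handling.  The bytes returned by render_attachments would then be
handed to SpamAssassin in place of the original message.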


Regards,
-sm