Re: Spamassassin not parsing email messages

2012-12-29 Thread Martin Gregorie
On Fri, 2012-12-28 at 21:48 -0800, Sean Tout wrote:

> I have practically given up on the original
> Perl code since I'm unable to find out the issue. With spamc, I can get
> decent performance.
 
IMO, unless you need the extra facilities of amavisd-new or one of the
other smart wrappers for SA and ClamAV, you're almost always better off
using spamc/spamd for the reasons already given. FYI, amavisd-new is
written in Perl and works by loading the SA code so it can directly pass
messages to SA and read its responses.

However, don't mistake using spamc/spamd for 'not using the original
Perl code' - it isn't. Although spamc is a simple, purpose-built, fast C
program which adds minimal runtime overhead, spamd is little more than
a simple daemon launcher wrapped round the standard SA code. Look at it
with less and you'll see what I mean...


Martin





Re: Spamassassin not parsing email messages

2012-12-29 Thread RW
On Fri, 28 Dec 2012 21:48:25 -0800 (PST)
Sean Tout wrote:

> Hi Martin,
>
> You certainly did not miss anything... but I did! Being new to
> SpamAssassin, I was only familiar with the spamassassin command, which
> was awfully slow for a large number of emails. But now that I'm using
> spamc, I'm getting 5+ messages per second.
>
> Thank you very much for the advice. I have practically given up on the
> original Perl code since I'm unable to find out the issue. With
> spamc, I can get decent performance.
 


Using spamc avoids repeated initialisation, but if I want it to be
really fast I do something like this:


   for m in /home/sean/code/spam/spfiles/*
   do
      spamc < "$m" > ... &
      [ $(( n=(n+1) % 20 )) -eq 0 ] && spamc -K > /dev/null
   done

It puts spamc processes into the background to run in parallel. Occasionally
running spamc -K in the foreground prevents unnecessary timeouts by
limiting the number of spamc processes waiting to be assigned to a spamd
child process.

At the very least there's a speed-up from using all CPU cores, but with slow
or unreliable network tests the speed-up can be enormous. You need to
set --max-children in spamd appropriately.
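RW's loop bounds the backlog by running spamc -K in the foreground every 20 messages. A different way to get a similar effect, sketched here in Perl as an illustration rather than anything posted in the thread, is to cap concurrency directly: fork one child per message file, never run more than $max_jobs at once, and wait for a child to exit before starting the next. run_batch is a hypothetical helper; it assumes spamd is already running, spamc is on the PATH, and $max_jobs is kept at or below spamd's --max-children so clients are not left queuing for a free spamd child. Each report is written to FILE.out next to its input.

```perl
use strict;
use warnings;
use IO::Handle;
use POSIX ();

# Hypothetical helper (not part of SpamAssassin): run @cmd, spamc by default,
# once per file, with the file on STDIN and STDOUT sent to "$file.out".
# At most $max_jobs children run at once. Returns how many exited non-zero.
sub run_batch {
    my ($files, $max_jobs, @cmd) = @_;
    @cmd      = ('spamc') unless @cmd;
    $max_jobs = 1 if $max_jobs < 1;
    my ($running, $failed) = (0, 0);

    # Block until any child exits, then record whether it failed.
    my $reap = sub {
        wait();
        $failed++ if $? != 0;
        $running--;
    };

    for my $file (@$files) {
        $reap->() while $running >= $max_jobs;   # wait for a free slot
        STDOUT->flush;                           # avoid duplicating buffered output in the child
        my $pid = fork();
        die "fork failed: $!" unless defined $pid;
        if ($pid == 0) {
            # Child: redirect STDIN/STDOUT, then replace this process with @cmd.
            # POSIX::_exit skips END blocks and destructors copied from the parent.
            open(STDIN,  '<', $file)       or POSIX::_exit(126);
            open(STDOUT, '>', "$file.out") or POSIX::_exit(126);
            exec { $cmd[0] } @cmd          or POSIX::_exit(127);
        }
        $running++;
    }
    $reap->() while $running > 0;                # collect the remaining children
    return $failed;
}

# Usage sketch, assuming spamd is running and spamc is on the PATH:
# my $failed = run_batch([glob('/home/sean/code/spam/spfiles/*')], 4);
```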



Re: Spamassassin not parsing email messages

2012-12-28 Thread Henrik K
On Fri, Dec 28, 2012 at 12:45:03AM -0800, Sean Tout wrote:
> Hello,
>
> I wrote a short Perl program that reads email from an existing mbox
> formatted file, passes each individual email to SpamAssassin to parse and
> score, then prints a report for each email. The strange thing is that I keep
> getting the same report score for all messages. I did confirm that I'm
> reading each message by printing it after reading it. I tried the code below
> on many different emails (spam and ham), yet I get the same report score for
> all of them. What am I doing wrong?

You need to completely destroy SpamAssassin after usage.

Change this:

   my $spamtest = Mail::SpamAssassin->new();

   # This is the main loop. It's executed once for each email
   while(!$folder_reader->end_of_file())
   {
     $email = $folder_reader->read_next_email();
     $mail = $spamtest->parse($email);
     $status = $spamtest->check($mail);
     print RFILE $status->get_report();
     print RFILE "\n";
   }

To something like this:

while(!$folder_reader->end_of_file())
{
  my $email = $folder_reader->read_next_email();
  my $spamtest = Mail::SpamAssassin->new();
  my $mail = $spamtest->parse($email);
  my $status = $spamtest->check($mail);
  print RFILE $status->get_report();
  print RFILE "\n";
  $status->finish(); # important
  $mail->finish(); # important
  $spamtest->finish(); # important
}

I can't remember off the top of my head whether $spamtest can be reused after
finish(), but at least this should work 100%.



Re: Spamassassin not parsing email messages

2012-12-28 Thread Sean Tout
Hi Henrik,

Thank you very much for the prompt response and pointers. I ran the Perl script
with the code you pasted below, but still got the same report scores for all
emails! By the way, when I also tried to print the contents of the emails using
$status->get_content_preview(), I got [...] I'm unable to print any portions
of the email messages using $status = $spamtest->check($mail); however, I can
print any portion using $folder_reader->read_next_email().

Regards,

Sean.




--
View this message in context: 
http://spamassassin.1065346.n5.nabble.com/Spamassassin-not-parsing-email-messages-tp102770p102772.html
Sent from the SpamAssassin - Users mailing list archive at Nabble.com.


Re: Spamassassin not parsing email messages

2012-12-28 Thread Sean Tout
Hi Henrik & Jeff,

One more input that might shed more light. I copied one of the emails from
the above 3 emails into its own file and ran spamassassin from the command
line in test mode against it, and it worked fine. The command is:
spamassassin --test-mode < /spamemails/singleemail.spam

where singleemail.spam contains a single spam email.

Regards,

-Sean.




--
View this message in context: 
http://spamassassin.1065346.n5.nabble.com/Spamassassin-not-parsing-email-messages-tp102770p102782.html
Sent from the SpamAssassin - Users mailing list archive at Nabble.com.


Re: Spamassassin not parsing email messages

2012-12-28 Thread Dave Funk

That implies that whatever mechanism you're using in the original process
is adding a blank line (or bare 'nl' or 'cr') to the beginning of the
message that you're then handing to SA.

Idiot question: are you doing (or not doing) a chomp in the initial read
process?
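On the chomp question: Perl's chomp removes only a trailing input record separator (by default the final newline), so it cannot remove a blank line at the start of a message. A minimal Perl sketch, with strip_leading_blank_lines as a hypothetical helper name, shows the difference:

```perl
use strict;
use warnings;

# Hypothetical helper: remove any blank lines (LF or CRLF, optionally with
# spaces/tabs) that precede the first header line; the rest is untouched.
sub strip_leading_blank_lines {
    my ($text) = @_;
    $text =~ s/\A(?:[ \t]*\r?\n)+//;
    return $text;
}

my $msg = "\nFrom: alice\@example.com\nSubject: hi\n\nbody\n";

my $chomped = $msg;
chomp($chomped);                      # only the final "\n" is removed
print "chomp left a leading blank line\n" if $chomped =~ /\A\n/;

my $fixed = strip_leading_blank_lines($msg);
print "first line now: ", (split /\n/, $fixed)[0], "\n";
```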



--
Dave Funk  University of Iowa
dbfunk (at) engineering.uiowa.edu    College of Engineering
319/335-5751   FAX: 319/384-0549   1256 Seamans Center
Sys_admin/Postmaster/cell_admin    Iowa City, IA 52242-1527
#include <std_disclaimer.h>
Better is not better, 'standard' is better. B{


Re: Spamassassin not parsing email messages

2012-12-28 Thread Sean Tout
Hi Dave,

That's most likely the case. But I'm not sure what's going on in there and how
to get rid of it. I tried with and without chomp() but got the same results.
Below is a snippet with chomp, which I applied before parsing the email with
SpamAssassin.

my $spamtest = Mail::SpamAssassin->new();

  # This is the main loop. It's executed once for each email
  while(!$folder_reader->end_of_file())
  {
    $email = $folder_reader->read_next_email();
    chomp($email);
    $mail = $spamtest->parse($email);
    $status = $spamtest->check($mail);
  #rest of code per above.
  }

Regards,

-Sean.




--
View this message in context: 
http://spamassassin.1065346.n5.nabble.com/Spamassassin-not-parsing-email-messages-tp102770p102784.html
Sent from the SpamAssassin - Users mailing list archive at Nabble.com.


Re: Spamassassin not parsing email messages

2012-12-28 Thread John Hardin

On Fri, 28 Dec 2012, Sean Tout wrote:


> That's most likely the case. But I'm not sure what's going on in there and how
> to get rid of it. I tried with and without chomp() but got the same results.
> Below is a snippet with chomp, which I applied before parsing the email with
> SpamAssassin.
>
> my $spamtest = Mail::SpamAssassin->new();
>
>  # This is the main loop. It's executed once for each email
>  while(!$folder_reader->end_of_file())
>  {
>    $email = $folder_reader->read_next_email();


Write $email to a file here and take a look at it.


>    chomp($email);
>    $mail = $spamtest->parse($email);
>    $status = $spamtest->check($mail);
>  #rest of code per above.
>  }


--
 John Hardin KA7OHZ    http://www.impsec.org/~jhardin/
 jhar...@impsec.org    FALaholic #11174     pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
---
  How can you reason with someone who thinks we're on a glidepath to
  a police state and yet their solution is to grant the government a
  monopoly on force? They are insane.
---
 211 days since the first successful private support mission to ISS (SpaceX)


Re: Spamassassin not parsing email messages

2012-12-28 Thread Sean Tout
Hi John,

I wrote every email read to an output file. The output file is identical to
the input file I'm reading the emails from according to diff! 

Regards,

-Sean.




--
View this message in context: 
http://spamassassin.1065346.n5.nabble.com/Spamassassin-not-parsing-email-messages-tp102770p102786.html
Sent from the SpamAssassin - Users mailing list archive at Nabble.com.


Re: Spamassassin not parsing email messages

2012-12-28 Thread John Hardin

On Fri, 28 Dec 2012, Sean Tout wrote:


> Hi John,
>
> I wrote every email read to an output file. The output file is identical to
> the input file I'm reading the emails from according to diff!


The concern is the format of the single mail object being sent to 
SpamAssassin for scanning. Having the very first line of that object be a 
blank line would explain the misformatted message rule hits you've 
reported.


Capturing the entire mailbox and running a diff is certainly suggestive, 
but to be *sure* you want to look at the messages individually.


If you capture that one mail object to a file, and it is a 
properly-formatted RFC-822 message with no leading blank lines, and you 
can successfully pipe that file through SA and get a sensible score, then 
the problem is not in the data, it's how it's being fed to SpamAssassin 
within that script.
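John's point is to look at each message object on its own rather than diffing the whole mailbox. A minimal Perl sketch of that, under stated assumptions: write_messages is a hypothetical helper that writes every captured message to its own numbered file (msg-0001.eml, msg-0002.eml and so on, an illustrative naming scheme), so each one can be fed to spamassassin --test-mode individually and its score compared with what the script reports. Such a dump would also expose one specific mistake: if a file contains something like SCALAR(0x...) instead of headers, the variable being written holds a Perl reference, not the message text.

```perl
use strict;
use warnings;

# Hypothetical helper: write each message string to its own numbered file,
# "$dir/msg-0001.eml", "$dir/msg-0002.eml", ..., and return the paths, so
# every message can be inspected and run through spamassassin on its own.
sub write_messages {
    my ($dir, @messages) = @_;
    my @paths;
    my $n = 0;
    for my $text (@messages) {
        my $path = sprintf('%s/msg-%04d.eml', $dir, ++$n);
        open my $out, '>', $path or die "can't write $path: $!";
        binmode $out;                       # write the bytes exactly as held
        print {$out} $text;
        close $out or die "can't close $path: $!";
        push @paths, $path;
    }
    return @paths;
}

# Usage sketch (the directory name is hypothetical):
#   push @captured, $email;                 # inside the reading loop
#   write_messages('/tmp/sa-debug', @captured);   # after the loop
```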




Re: Spamassassin not parsing email messages

2012-12-28 Thread Sean Tout
Hi John,

Per your response below, here is what I did to confirm it's not a content
problem.
open (RFILE, ">$reportfile_name");
while(!$folder_reader->end_of_file())
  {
    $email = $folder_reader->read_next_email();
    chomp($email);
    $mail = $spamtest->parse($email);
    $status = $spamtest->check($mail);
    print RFILE $$email;
}

Then I issued the following command:
spamassassin --test-mode < /home/stout/spam/reportfile_in.txt

The above worked just fine. The contents of reportfile_in.txt are created by
print RFILE $$email.

Thoughts?
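One difference between what is written to the file and what is scanned stands out: the file is written from $$email, a dereference, while parse() is handed $email itself. Mail::Mbox::MessageParser's read_next_email() returns a reference to the message text, and the end_of_file()/read_next_email() calls suggest that is the reader in use, although the thread never names the class. If so, parse() receives a reference rather than the message, and chomp($email) has nothing to remove. Whether parse() accepts a reference to a string may depend on the SpamAssassin version (check perldoc Mail::SpamAssassin for the installed release); passing the dereferenced text sidesteps the question. This is a hypothesis consistent with every message getting the same score, not something confirmed in the thread. A minimal Perl sketch with a hypothetical message_text helper that turns either form into a plain string:

```perl
use strict;
use warnings;

# Hypothetical helper: return the message as a plain string whether the caller
# holds the text itself or a reference to it (as read_next_email appears to
# return, given the $$email dereference in the loop above).
sub message_text {
    my ($email) = @_;
    return ref($email) eq 'SCALAR' ? $$email : $email;
}

my $raw = "From: alice\@example.com\nSubject: test\n\nbody\n";
my $ref = \$raw;

print "as passed: $ref\n";                 # a reference stringifies as SCALAR(0x...)
print "as text:   ", (split /\n/, message_text($ref))[0], "\n";

# In the SpamAssassin loop the change would be (untested sketch):
#   my $mail   = $spamtest->parse(message_text($email));
#   my $status = $spamtest->check($mail);
```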

Regards,

-Sean.





--
View this message in context: 
http://spamassassin.1065346.n5.nabble.com/Spamassassin-not-parsing-email-messages-tp102770p102789.html
Sent from the SpamAssassin - Users mailing list archive at Nabble.com.


Re: Spamassassin not parsing email messages

2012-12-28 Thread John Hardin

On Fri, 28 Dec 2012, Sean Tout wrote:


> Hi John,
>
> Per your response below, here is what I did to confirm it's not a content
> problem.
> open (RFILE, ">$reportfile_name");
> while(!$folder_reader->end_of_file())
>  {
>    $email = $folder_reader->read_next_email();
>    chomp($email);
>    $mail = $spamtest->parse($email);
>    $status = $spamtest->check($mail);
>    print RFILE $$email;
> }
>
> Then I issued the following command:
> spamassassin --test-mode < /home/stout/spam/reportfile_in.txt
>
> The above worked just fine. The contents of reportfile_in.txt are created by
> print RFILE $$email.
>
> Thoughts?


Unfortunately that's all I can recommend. I am not familiar with using the 
SpamAssassin libraries directly from Perl. If I were in your situation I'd 
do something hackish like system(spamc $RFILE) or an equally ugly shell 
script... :)


Sorry.



Re: Spamassassin not parsing email messages

2012-12-28 Thread Sean Tout
Hi John,

Thank you very much for the help. I have been trying to avoid executing
spamassassin shell commands from Perl since it takes a significant amount of
time (~12 seconds) for each email. I have tried the script below, which works,
but not in a favorable time frame, especially for processing 20,000+ emails in
the spfiles folder.

@files = </home/sean/code/spam/spfiles/*>;
my $outfile = 'mailrep_out.txt';
open (MYFILE, ">$outfile");
foreach $file (@files) {
   $cmd = "spamassassin --test-mode < ".$file." >> mail_out.txt";
   system ($cmd);
}
close(MYFILE);

Regards,

-Sean.




--
View this message in context: 
http://spamassassin.1065346.n5.nabble.com/Spamassassin-not-parsing-email-messages-tp102770p102791.html
Sent from the SpamAssassin - Users mailing list archive at Nabble.com.


Re: Spamassassin not parsing email messages

2012-12-28 Thread Martin Gregorie
On Fri, 2012-12-28 at 16:51 -0800, Sean Tout wrote:
> Hi John,
>
> Thank you very much for the help. I have been trying to avoid executing
> spamassassin shell commands from Perl since it takes a significant amount of
> time (~12 seconds) for each email. I have tried the script below, which works,
> but not in a favorable time frame, especially for processing 20,000+ emails in
> the spfiles folder.
>
> @files = </home/sean/code/spam/spfiles/*>;
> my $outfile = 'mailrep_out.txt';
> open (MYFILE, ">$outfile");
> foreach $file (@files) {
>    $cmd = "spamassassin --test-mode < ".$file." >> mail_out.txt";
>    system ($cmd);
> }
> close(MYFILE);
>
> Regards,
>
> -Sean.
 
Since, from this, it seems that you already have the messages held as
individual files in the /home/sean/code/spam/spfiles/ directory, why not
feed them directly to spamd with a small bash script:

for m in /home/sean/code/spam/spfiles/*
do
   spamc < "$m" | <pipeline to analyse and store spamd replies>
done

which should run a lot faster than calling spamassassin directly because
spamd will only need to be loaded once at the start of the run.

... or did I miss something obvious?
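For the "analyse and store" stage left open above, one option (an assumption, not Martin's suggestion) is spamc's -c (--check) mode, which prints a single score/threshold line per message instead of the processed message, and 0/0 if there was an error. A small Perl sketch of a hypothetical parser for that line; it treats score >= threshold as spam, which matches how SpamAssassin compares a message's score against required_score:

```perl
use strict;
use warnings;

# Hypothetical helper: parse the "score/threshold" line printed by spamc -c
# (e.g. "7.3/5.0"). Returns (score, threshold, is_spam), or an empty list for
# "0/0" (spamc's error marker) and anything it does not recognise.
sub parse_check_line {
    my ($line) = @_;
    $line =~ s/\s+\z//;                          # drop trailing newline/spaces
    my ($score, $threshold) =
        $line =~ m{\A(-?\d+(?:\.\d+)?)/(-?\d+(?:\.\d+)?)\z} or return;
    return if $score == 0 && $threshold == 0;    # spamc reports errors as 0/0
    return ($score, $threshold, $score >= $threshold ? 1 : 0);
}

for my $line ("7.3/5.0\n", "-1.2/5.0\n", "0/0\n") {
    my @r    = parse_check_line($line);
    my $desc = @r ? "score=$r[0] threshold=$r[1] spam=$r[2]" : 'error';
    chomp(my $shown = $line);
    print "$shown => $desc\n";
}
```

Each line could come from running spamc -c with one message file on its standard input, for example inside Martin's loop.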


Martin





Re: Spamassassin not parsing email messages

2012-12-28 Thread Sean Tout
Hi Martin,

You certainly did not miss anything... but I did! Being new to SpamAssassin,
I was only familiar with the spamassassin command, which was awfully slow for a
large number of emails. But now that I'm using spamc, I'm getting 5+ messages
per second.

Thank you very much for the advice. I have practically given up on the original
Perl code since I'm unable to find out the issue. With spamc, I can get
decent performance.

Regards,

-Sean.




--
View this message in context: 
http://spamassassin.1065346.n5.nabble.com/Spamassassin-not-parsing-email-messages-tp102770p102801.html
Sent from the SpamAssassin - Users mailing list archive at Nabble.com.


Parsing Email

2006-10-11 Thread Tyler Nally
Hello,

I have a project that I need to solve.  Fax machines (for a client)
have been replaced with the phone company's fax server, which e-mails
the incoming fax (.tif) images to a specific e-address at the client's
place of business.

It just so happens that the e-mail passes through a mail server that will
inspect it for e-viri as well as run it through SpamAssassin before
it forwards it on to their machine.  That mail server that pre-processes
the client's e-mail is a machine I administer.

What I'd like to do is capture the contents of these particular
fax e-mails as they pass through the machine I administer and either:

  1- copy the fax images (detach the images from e-mail messages)
  and store these images on that server (whether as a file
  or put into a database as a blob)
  2- create a database record that will essentially catalog the
  incoming fax to associate a fax file image (or db blob ID)

  A- and also search a database for existing origination fax #'s
  so that the fax can be associated with the right company
  that sent it.  In this case, the DB used is a MySQL
  database that exists on this particular machine as well.


Now.. what I need help in understanding... is ... assuming that
I can handle each e-mail separately as it comes through, how do I
parse the e-mail (like the way Spamassassin does) to have the
ability to pull the component parts from the e-mail (from:,
subject:, and MIME-encapsulated fax image) in order to be able
to use these pieces (somehow) for the customer care module.

I'm well versed in PHP... I used to do a lot of perl (many moons
ago) and I'd like to make this work without too awful much pain.

I think ultimately I'll let the normal copy of the e-mail
go on to the customer's destination.  I'd add an extra Cc: to
a specific e-mail account on the server, where anything
that is delivered to this account is strained through this e-mail
parsing program, which will split the e-mail up into its pieces
and distribute/use the chunks in a manner that I can manipulate
later in the process.

Any help to point me in the right direction?

Thanks a lot

Tyler Nally


Re: Parsing Email

2006-10-11 Thread Theo Van Dinter
On Wed, Oct 11, 2006 at 05:24:46PM -0400, Tyler Nally wrote:
> Now.. what I need help in understanding... is ... assuming that
> I can handle each e-mail separately as it comes through, how do I
> parse the e-mail (like the way Spamassassin does) to have the
> ability to pull the component parts from the e-mail (from:,
> subject:, and MIME-encapsulated fax image) in order to be able
> to use these pieces (somehow) for the customer care module.

:)  I answered this kind of question for someone on IRC a week or two ago,
here's a quick example of how to use Mail::SpamAssassin::Message:

use Mail::SpamAssassin::Message;
my $msg = Mail::SpamAssassin::Message->new() || die "Message error?";
my $count = 0;
foreach my $p ($msg->find_parts(qr/^image\b/i, 1)) {
  open(OUT, ">message.".$count++) || die "can't write file message.$count: $!";
  binmode OUT;
  print OUT $p->decode();
  close(OUT);
}


So that parses a message from STDIN, goes through and finds all image parts,
and writes them out to files called message.#.

Use perldoc Mail::SpamAssassin::Message and perldoc
Mail::SpamAssassin::Message::Node for more information about functions and
such. :)

-- 
Randomly Selected Tagline:
Zero equals Zero   - Prof. Farr




Re: Parsing Email

2006-10-11 Thread Vincent Li

On Wed, 11 Oct 2006, Theo Van Dinter wrote:


> On Wed, Oct 11, 2006 at 05:24:46PM -0400, Tyler Nally wrote:
>
>> Now.. what I need help in understanding... is ... assuming that
>> I can handle each e-mail separately as it comes through, how do I
>> parse the e-mail (like the way Spamassassin does) to have the
>> ability to pull the component parts from the e-mail (from:,
>> subject:, and MIME-encapsulated fax image) in order to be able
>> to use these pieces (somehow) for the customer care module.
>
>
> :)  I answered this kind of question for someone on IRC a week or two ago,
> here's a quick example of how to use Mail::SpamAssassin::Message:


Yeah, I learned to use Message.pm from felicity :)



> use Mail::SpamAssassin::Message;
> my $msg = Mail::SpamAssassin::Message->new() || die "Message error?";
> my $count = 0;
> foreach my $p ($msg->find_parts(qr/^image\b/i, 1)) {
>  open(OUT, ">message.".$count++) || die "can't write file message.$count: $!";
>  binmode OUT;
>  print OUT $p->decode();
>  close(OUT);
> }
>
>
> So that parses a message from STDIN, goes through and finds all image parts,
> and writes them out to files called message.#.


I used the code below to retrieve spam forwarded as an attachment from
SquirrelMail and feed it to sa-learn:

---

#!/usr/bin/perl

use strict;
use warnings;

my $fh;
open $fh, '<', shift or die "Can't open input: $!";
my @message = <$fh>;

use Mail::SpamAssassin::Message;
my $msg = Mail::SpamAssassin::Message->new(
{
  'message' => \@message,
}
) || die "Message error?";

#foreach my $p ($msg->find_parts(qr/^(text|image|application)\b/i, 1)) {
foreach my $p ($msg->find_parts(qr/^message\b/i, 0)) {
eval {
   no warnings ;
   my $type = $p->{'type'};
   my $attachname = $p->{'name'};
   print "Content type is: $type\n";
   print "write file name: $attachname\n";
   # "or" (not "||") so the die applies to a failed open, not to $attachname
   open my $out, '>', $attachname or die "Can't write file $attachname: $!";

   binmode $out;
   print $out $p->decode();
   close $out;
};
#warn $@ if $@;
}
__END__



> Use perldoc Mail::SpamAssassin::Message and perldoc
> Mail::SpamAssassin::Message::Node for more information about functions and
> such. :)




Vincent Li    http://pingpongit.homelinux.com
Opensource  .Implementation. .Consulting.
Platform.Fedora. .Debian. .Mac OS X.
Blog    http://bl0g.blogdns.com


Re: Parsing Email

2006-10-11 Thread Theo Van Dinter
On Wed, Oct 11, 2006 at 02:48:28PM -0700, Vincent Li wrote:
> my $fh;
> open $fh, '<', shift;
> my @message = <$fh>;
>
> use Mail::SpamAssassin::Message;
> my $msg = Mail::SpamAssassin::Message->new(
> {
>   'message' => \@message,
> }

FYI, new() accepts a file handle, an array, a scalar, or undef (which
causes it to use \*STDIN).  So you don't need to slurp the message data
in first. :)



Re: Parsing Email

2006-10-11 Thread Kelson

Tyler Nally wrote:

>   1- copy the fax images (detach the images from e-mail messages)
>   and store these images on that server (whether as a file
>   or put into a database as a blob)


If you're running Sendmail, you can use MIMEDefang (www.mimedefang.org)
for this.  It has a built-in function, action_replace_with_url, which
does exactly what you want.


--
Kelson Vibber
SpeedGate Communications www.speed.net