Re: Spamassassin not parsing email messages

2012-12-29 Thread Martin Gregorie
On Fri, 2012-12-28 at 21:48 -0800, Sean Tout wrote:

> I have practically given up on the original
> Perl code since I'm unable to find the issue. With spamc, I can get
> decent performance.
 
IMO, unless you need the extra facilities of amavisd-new or one of the
other smart wrappers for SA and ClamAV, you're almost always better off
using spamc/spamd for the reasons already given. FYI, amavisd-new is
written in Perl and works by loading the SA code so it can pass messages
directly to SA and read its responses.

However, don't mistake using spamc/spamd for 'not using the original
Perl code' - it isn't. Although spamc is a simple, fast, purpose-built C
program which adds minimal runtime overhead, spamd is little more than a
simple daemon launcher wrapped round the standard SA code. Look at it
with less and you'll see what I mean...


Martin





Re: Spamassassin not parsing email messages

2012-12-29 Thread RW
On Fri, 28 Dec 2012 21:48:25 -0800 (PST)
Sean Tout wrote:

> Hi Martin,
>
> You certainly did not miss anything... but I did! Being new to
> spamassassin, I was only familiar with the spamassassin command, which
> was awfully slow for a large number of emails. But now that I use
> spamc, I'm getting 5+ messages per second.
>
> Thank you very much for the advice. I have practically given up on the
> original Perl code since I'm unable to find the issue. With
> spamc, I can get decent performance.
 


Using spamc avoids repeated initialisation, but if I want it to be
really fast I do something like this:


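   n=0
   for m in /home/sean/code/spam/spfiles/*
   do
     # scan each message in the background; "..." is wherever the result goes
     spamc < "$m" > ... &
     # every 20th message, block on a keep-alive check to throttle the queue
     [ $(( n=(n+1) % 20 )) -eq 0 ] && spamc -K > /dev/null
   done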

It puts the spamc processes into the background so they run in parallel.
Periodically running spamc -K in the foreground (here, on every 20th
message) prevents unnecessary timeouts by limiting the number of spamc
processes waiting to be assigned to a spamd child process.

At the very least there's a speed-up from using all CPU cores, but with
slow or unreliable network tests the speed-up can be enormous. You need to
set --max-children in spamd appropriately.
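The throttling in the loop above hinges on the shell arithmetic $(( n=(n+1) % 20 )), which wraps n back to 0 on every 20th message; only then does the foreground spamc -K run. A minimal sketch of just that counter logic, written in Perl to match the rest of the thread, with both spamc calls replaced by counters (nothing here talks to spamd; the batch size of 20 and the 65 sample messages are arbitrary):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Mirrors the counter in: [ $(( n=(n+1) % 20 )) -eq 0 ] && spamc -K > /dev/null
# The spamc calls are replaced by counters, so no spamd is needed.
my $every = 20;    # how often the foreground keep-alive check runs
my $n     = 0;     # the shell loop's n
my ($scans, $keepalives) = (0, 0);

for my $msg (1 .. 65) {
    $scans++;                              # stand-in for: spamc < "$m" > ... &
    if (($n = ($n + 1) % $every) == 0) {
        $keepalives++;                     # stand-in for: spamc -K (blocks)
    }
}

print "scans=$scans keepalives=$keepalives\n";
```

With 65 messages the foreground check fires after messages 20, 40 and 60, so at most 20 new background spamc clients are started between two checks.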



Re: Spamassassin not parsing email messages

2012-12-28 Thread Henrik K
On Fri, Dec 28, 2012 at 12:45:03AM -0800, Sean Tout wrote:
> Hello,
>
> I wrote a short Perl program that reads email from an existing mbox
> formatted file, passes each individual email to Spamassassin for parse and
> score, then prints a report for each email. The strange thing is that I keep
> getting the same report score for all messages. I did confirm that I'm
> reading each message by printing it after reading it. I tried the below code
> on many different emails (spam and ham) yet I get the same report score for
> all of them. What am I doing wrong?

You need to completely destroy the SpamAssassin objects after each use.

Change this:

   my $spamtest = Mail::SpamAssassin->new();

   # This is the main loop. It's executed once for each email
   while (!$folder_reader->end_of_file())
   {
     $email = $folder_reader->read_next_email();
     $mail = $spamtest->parse($email);
     $status = $spamtest->check($mail);
     print RFILE $status->get_report();
     print RFILE "\n";
   }

To something like this:

use Mail::SpamAssassin;

while (!$folder_reader->end_of_file())
{
  my $email = $folder_reader->read_next_email();
  my $spamtest = Mail::SpamAssassin->new();
  my $mail = $spamtest->parse($email);
  my $status = $spamtest->check($mail);
  print RFILE $status->get_report();
  print RFILE "\n";
  $status->finish();   # important: the status object can now be destroyed
  $mail->finish();     # important: lets the parsed message be freed
  $spamtest->finish(); # important: destroys the SpamAssassin object
}

I can't remember off the top of my head whether $spamtest can be reused
after finish(), but at least this should work 100%.
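A separate thing worth ruling out in this loop: the "# This is the main loop. It's executed once for each email" comment appears to come from the Mail::Mbox::MessageParser synopsis, and in that module read_next_email() returns a reference to the message text rather than the text itself (Sean's later test prints it with "print RFILE $$email", which only works on a reference). As far as I know, Mail::SpamAssassin's parse() is documented to accept a string, an array reference of lines, or a file glob, so passing it $email instead of $$email may mean SpamAssassin never sees the real message, which would fit identical scores for every email. A minimal core-Perl sketch of normalising the reader's return value; message_text is a hypothetical helper and the message is a stand-in:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of SpamAssassin or Mail::Mbox::MessageParser):
# turn what the mbox reader returns into the plain string parse() expects.
sub message_text {
    my ($email) = @_;
    return $$email if ref($email) eq 'SCALAR';   # reference to the message text
    die 'unexpected ' . ref($email) . " reference\n" if ref($email);
    return $email;                               # already a plain string
}

# Stand-in for $folder_reader->read_next_email(), assumed to return \$text.
my $raw   = "From: alice\@example.com\nSubject: test\n\nHello\n";
my $email = \$raw;

print 'reader returned a ', ref($email), " reference\n";
print 'first line for parse(): ', (split /\n/, message_text($email))[0], "\n";

# In the real loop the call would become:
#   my $mail = $spamtest->parse(message_text($email));
```

If this is the cause, each parse($email) call in the loops quoted in this thread would become parse($$email), or parse(message_text($email)).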



Re: Spamassassin not parsing email messages

2012-12-28 Thread Sean Tout
Hi Henrik,

Thank you very much for the prompt response and pointers. I ran the Perl
script with the code you pasted below, but still got the same report scores
for all emails! By the way, when I also tried to print the contents of the
emails using $status->get_content_preview(), I got [...]. I'm unable to print
any portion of the email messages using $status = $spamtest->check($mail);
however, I can print any portion using $folder_reader->read_next_email().

Regards,

Sean.




--
View this message in context: 
http://spamassassin.1065346.n5.nabble.com/Spamassassin-not-parsing-email-messages-tp102770p102772.html
Sent from the SpamAssassin - Users mailing list archive at Nabble.com.


Re: Spamassassin not parsing email messages

2012-12-28 Thread Sean Tout
Hi Henrik & Jeff,

One more input that might shed more light. I copied one of the emails from
the above 3 emails into its own file and ran spamassassin from the command
line in test mode against it, and it worked fine. The command is:
spamassassin --test-mode < /spamemails/singleemail.spam

where singleemail.spam contains a single spam email.

Regards,

-Sean.




--
View this message in context: 
http://spamassassin.1065346.n5.nabble.com/Spamassassin-not-parsing-email-messages-tp102770p102782.html
Sent from the SpamAssassin - Users mailing list archive at Nabble.com.


Re: Spamassassin not parsing email messages

2012-12-28 Thread Dave Funk

That implies that whatever mechanism you're using in the original process
is adding a blank line (or a bare 'nl' or 'cr') to the beginning of the
message that you're then handing to SA.

Idiot question: are you doing (or not doing) a chomp in the initial read
process?
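Note that chomp only removes a trailing line ending, so it cannot remove a blank line at the start of a message. A small core-Perl sketch showing the difference on a hypothetical message that has picked up a leading newline; the substitution strips any leading blank (or whitespace-only) lines, LF or CRLF, so the headers come first:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical message with a stray blank line in front of its headers.
my $msg = "\nFrom: alice\@example.com\nSubject: test\n\nHello\n";

my $chomped = $msg;
chomp($chomped);    # removes only the final "\n"; the leading one stays
print 'after chomp, leading blank line: ',
    ($chomped =~ /\A\r?\n/ ? 'yes' : 'no'), "\n";

# Remove any number of leading blank (or whitespace-only) lines.
(my $fixed = $msg) =~ s/\A(?:[ \t]*\r?\n)+//;
print 'after strip, first line: ', (split /\r?\n/, $fixed)[0], "\n";
```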



On Fri, 28 Dec 2012, Sean Tout wrote:


> Hi Henrik & Jeff,
>
> One more input that might shed more light. I copied one of the emails from
> the above 3 emails into its own file and ran spamassassin from the command
> line in test mode against it, and it worked fine. The command is:
> spamassassin --test-mode < /spamemails/singleemail.spam
>
> where singleemail.spam contains a single spam email.
>
> Regards,
>
> -Sean.







--
Dave Funk                                  University of Iowa
dbfunk (at) engineering.uiowa.edu          College of Engineering
319/335-5751   FAX: 319/384-0549           1256 Seamans Center
Sys_admin/Postmaster/cell_admin            Iowa City, IA 52242-1527
#include <std_disclaimer.h>
Better is not better, 'standard' is better. B{


Re: Spamassassin not parsing email messages

2012-12-28 Thread Sean Tout
Hi Dave,

That's most likely the case, but I'm not sure what's going on in there or how
to get rid of it. I tried with and without chomp() but got the same results.
Below is a snippet with chomp, which I applied before parsing the email with
spamassassin.

my $spamtest = Mail::SpamAssassin->new();

  # This is the main loop. It's executed once for each email
  while (!$folder_reader->end_of_file())
  {
    $email = $folder_reader->read_next_email();
    chomp($email);
    $mail = $spamtest->parse($email);
    $status = $spamtest->check($mail);
    # rest of code per above.
  }

Regards,

-Sean.




--
View this message in context: 
http://spamassassin.1065346.n5.nabble.com/Spamassassin-not-parsing-email-messages-tp102770p102784.html
Sent from the SpamAssassin - Users mailing list archive at Nabble.com.


Re: Spamassassin not parsing email messages

2012-12-28 Thread John Hardin

On Fri, 28 Dec 2012, Sean Tout wrote:


> That's most likely the case, but I'm not sure what's going on in there or how
> to get rid of it. I tried with and without chomp() but got the same results.
> Below is a snippet with chomp, which I applied before parsing the email with
> spamassassin.
>
> my $spamtest = Mail::SpamAssassin->new();
>
>   # This is the main loop. It's executed once for each email
>   while (!$folder_reader->end_of_file())
>   {
>     $email = $folder_reader->read_next_email();

Write $email to a file here and take a look at it.

>     chomp($email);
>     $mail = $spamtest->parse($email);
>     $status = $spamtest->check($mail);
>     # rest of code per above.
>   }
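One way to follow this suggestion is to save each message to its own numbered file, so each one can be inspected, or fed to spamassassin --test-mode or spamc, individually rather than diffing the whole mailbox. A minimal sketch; dump_message and the msg-NNNNN.eml naming are hypothetical, and a temporary directory plus two stand-in messages replace the real reader loop:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

# Hypothetical helper: write one message to <dir>/msg-00001.eml, msg-00002.eml, ...
sub dump_message {
    my ($dir, $n, $text) = @_;
    my $path = File::Spec->catfile($dir, sprintf('msg-%05d.eml', $n));
    open(my $fh, '>', $path) or die "Cannot write $path: $!";
    print {$fh} $text;
    close($fh) or die "Cannot close $path: $!";
    return $path;
}

# Stand-ins for the messages the mbox reader would return one at a time.
my @messages = (
    "From: alice\@example.com\nSubject: one\n\nfirst\n",
    "From: bob\@example.com\nSubject: two\n\nsecond\n",
);

my $dir   = tempdir(CLEANUP => 1);   # removed again when the script exits
my $n     = 0;
my @paths = map { dump_message($dir, ++$n, $_) } @messages;
print "$_\n" for @paths;
```

In Sean's loop the call would be dump_message($dir, ++$n, $$email), assuming (as his "print RFILE $$email" suggests) that read_next_email() returns a reference.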


--
 John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
 jhar...@impsec.org    FALaholic #11174    pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
---
  How can you reason with someone who thinks we're on a glidepath to
  a police state and yet their solution is to grant the government a
  monopoly on force? They are insane.
---
 211 days since the first successful private support mission to ISS (SpaceX)


Re: Spamassassin not parsing email messages

2012-12-28 Thread Sean Tout
Hi John,

I wrote every email read to an output file. The output file is identical to
the input file I'm reading the emails from according to diff! 

Regards,

-Sean.




--
View this message in context: 
http://spamassassin.1065346.n5.nabble.com/Spamassassin-not-parsing-email-messages-tp102770p102786.html
Sent from the SpamAssassin - Users mailing list archive at Nabble.com.


Re: Spamassassin not parsing email messages

2012-12-28 Thread John Hardin

On Fri, 28 Dec 2012, Sean Tout wrote:


> Hi John,
>
> I wrote every email read to an output file. The output file is identical to
> the input file I'm reading the emails from according to diff!


The concern is the format of the single mail object being sent to 
SpamAssassin for scanning. Having the very first line of that object be a 
blank line would explain the misformatted message rule hits you've 
reported.


Capturing the entire mailbox and running a diff is certainly suggestive, 
but to be *sure* you want to look at the messages individually.


If you capture that one mail object to a file, and it is a 
properly-formatted RFC-822 message with no leading blank lines, and you 
can successfully pipe that file through SA and get a sensible score, then 
the problem is not in the data, it's how it's being fed to SpamAssassin 
within that script.
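A rough way to run that check on each message in Perl, before the text is handed to SpamAssassin: flag a leading blank line, and flag a first line that is neither a header field ("Name: value") nor an mbox "From " separator. This is a heuristic sketch, not a full RFC 822 parser; check_message is a hypothetical helper:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical heuristic check; returns a list of problems (empty = looks ok).
sub check_message {
    my ($text) = @_;
    my @problems;
    push @problems, 'leading blank line' if $text =~ /\A[ \t]*\r?\n/;
    my ($first) = $text =~ /\A([^\r\n]*)/;
    # Header field names are printable ASCII except ':' (RFC 822/5322).
    push @problems, 'first line is not a header'
        unless $first =~ /\A[!-9;-~]+:/ || $first =~ /\AFrom /;
    return @problems;
}

for my $sample ("Subject: hi\n\nbody\n", "\nSubject: hi\n\nbody\n") {
    my @p = check_message($sample);
    print((@p ? join(', ', @p) : 'looks ok'), "\n");
}
```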




Re: Spamassassin not parsing email messages

2012-12-28 Thread Sean Tout
Hi John,

Per your response below, here is what I did to confirm it's not a content
problem:

open(RFILE, '>', $reportfile_name) or die "Cannot open $reportfile_name: $!";
while (!$folder_reader->end_of_file())
{
  $email = $folder_reader->read_next_email();
  chomp($email);
  $mail = $spamtest->parse($email);
  $status = $spamtest->check($mail);
  print RFILE $$email;
}

Then I issued the following command:
spamassassin --test-mode < /home/stout/spam/reportfile_in.txt

The above worked just fine. The contents of reportfile_in.txt are created by
print RFILE $$email.

Thoughts?

Regards,

-Sean.





--
View this message in context: 
http://spamassassin.1065346.n5.nabble.com/Spamassassin-not-parsing-email-messages-tp102770p102789.html
Sent from the SpamAssassin - Users mailing list archive at Nabble.com.


Re: Spamassassin not parsing email messages

2012-12-28 Thread John Hardin

On Fri, 28 Dec 2012, Sean Tout wrote:


> Hi John,
>
> Per your response below, here is what I did to confirm it's not a content
> problem:
>
> open(RFILE, '>', $reportfile_name) or die "Cannot open $reportfile_name: $!";
> while (!$folder_reader->end_of_file())
> {
>   $email = $folder_reader->read_next_email();
>   chomp($email);
>   $mail = $spamtest->parse($email);
>   $status = $spamtest->check($mail);
>   print RFILE $$email;
> }
>
> Then I issued the following command:
> spamassassin --test-mode < /home/stout/spam/reportfile_in.txt
>
> The above worked just fine. The contents of reportfile_in.txt are created by
> print RFILE $$email.
>
> Thoughts?


Unfortunately, that's all I can recommend. I am not familiar with using the
SpamAssassin libraries directly from Perl. If I were in your situation I'd
do something hackish like system("spamc < $RFILE") or an equally ugly shell
script... :)


Sorry.



Re: Spamassassin not parsing email messages

2012-12-28 Thread Sean Tout
Hi John,

Thank you very much for the help. I have been trying to avoid executing
spamassassin shell commands from Perl since it takes a significant amount of
time (~12 seconds) for each email. I have tried the script below, which works,
but not at a favorable speed, especially for processing the 20,000+ emails in
the spfiles folder.

my @files = glob('/home/sean/code/spam/spfiles/*');
my $outfile = 'mailrep_out.txt';
open(MYFILE, '>', $outfile) or die "Cannot open $outfile: $!";
foreach my $file (@files) {
   # one full spamassassin start-up per message: this is the slow part
   my $cmd = "spamassassin --test-mode < $file >> mail_out.txt";
   system($cmd);
}
close(MYFILE);

Regards,

-Sean.




--
View this message in context: 
http://spamassassin.1065346.n5.nabble.com/Spamassassin-not-parsing-email-messages-tp102770p102791.html
Sent from the SpamAssassin - Users mailing list archive at Nabble.com.


Re: Spamassassin not parsing email messages

2012-12-28 Thread Martin Gregorie
On Fri, 2012-12-28 at 16:51 -0800, Sean Tout wrote:
> Hi John,
>
> Thank you very much for the help. I have been trying to avoid executing
> spamassassin shell commands from Perl since it takes a significant amount of
> time (~12 seconds) for each email. I have tried the script below, which works,
> but not at a favorable speed, especially for processing the 20,000+ emails in
> the spfiles folder.
>
> my @files = glob('/home/sean/code/spam/spfiles/*');
> my $outfile = 'mailrep_out.txt';
> open(MYFILE, '>', $outfile) or die "Cannot open $outfile: $!";
> foreach my $file (@files) {
>    # one full spamassassin start-up per message: this is the slow part
>    my $cmd = "spamassassin --test-mode < $file >> mail_out.txt";
>    system($cmd);
> }
> close(MYFILE);
>
> Regards,
>
> -Sean.
>
 
Since it seems from this that you already have the messages held as
individual files in the /home/sean/code/spam/spfiles/ directory, why not
feed them directly to spamd with a small bash script:

for m in /home/sean/code/spam/spfiles/*
do
spamc < "$m" | pipeline to analyse and store spamd replies
done

which should run a lot faster than calling spamassassin directly because
spamd will only need to be loaded once at the start of the run.
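For the "pipeline to analyse and store spamd replies" part, one option is to run spamc with -c, which, as I understand its manual, prints the result as score/threshold and prints 0/0 when it could not get a result. A minimal Perl sketch that summarises lines of the form "<file> <score>/<threshold>", such as a loop like: for m in spfiles/*; do echo "$m $(spamc -c < "$m")"; done > scores.txt would produce. The file layout and the summarize helper are assumptions, and the sample lines are invented; a message counts as spam when its score reaches the threshold:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical post-processing of "<file> <score>/<threshold>" lines.
sub summarize {
    my @lines = @_;
    my %count = (spam => 0, ham => 0, error => 0);
    for my $line (@lines) {
        unless ($line =~ m{\A(\S+)\s+(-?\d+(?:\.\d+)?)/(-?\d+(?:\.\d+)?)\s*\z}) {
            $count{error}++;                     # not in the expected format
            next;
        }
        my ($score, $threshold) = ($2, $3);
        if    ($score == 0 && $threshold == 0) { $count{error}++ }  # spamc's 0/0
        elsif ($score >= $threshold)           { $count{spam}++ }
        else                                   { $count{ham}++ }
    }
    return %count;
}

# Invented sample lines; in practice read them from the scores file.
my @sample = (
    'spfiles/0001 7.3/5.0',
    'spfiles/0002 -1.2/5.0',
    'spfiles/0003 0/0',
    'garbage',
);
my %totals = summarize(@sample);
print join(' ', map { "$_=$totals{$_}" } sort keys %totals), "\n";
```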

... or did I miss something obvious?


Martin





Re: Spamassassin not parsing email messages

2012-12-28 Thread Sean Tout
Hi Martin,

You certainly did not miss anything... but I did! Being new to spamassassin,
I was only familiar with the spamassassin command, which was awfully slow for a
large number of emails. But now that I use spamc, I'm getting 5+ messages
per second.

Thank you very much for the advice. I have practically given up on the original
Perl code since I'm unable to find the issue. With spamc, I can get decent
performance.

Regards,

-Sean.




--
View this message in context: 
http://spamassassin.1065346.n5.nabble.com/Spamassassin-not-parsing-email-messages-tp102770p102801.html
Sent from the SpamAssassin - Users mailing list archive at Nabble.com.