Re: Spamassassin not parsing email messages
On Fri, 2012-12-28 at 21:48 -0800, Sean Tout wrote:
> I have practically given up on the original perl code since I'm unable to find out the issue. With spamc, I can get a decent performance.

IMO, unless you need the extra facilities of amavisd-new or one of the other smart wrappers for SA and ClamAV, you're almost always better off using spamc/spamd for the reasons already given. FYI, amavisd-new is written in Perl and works by loading the SA code, so it can pass messages directly to SA and read its responses.

However, don't mistake using spamc/spamd for 'not using the original Perl code' - it isn't. Although spamc is a simple, purpose-built, fast C program which adds minimal runtime overhead, spamd is little more than a simple daemon launcher wrapped round the standard SA code. Look at it with less and you'll see what I mean...

Martin
Re: Spamassassin not parsing email messages
On Fri, 28 Dec 2012 21:48:25 -0800 (PST) Sean Tout wrote:
> Hi Martin,
>
> You certainly did not miss anything, but I did! Being new to spamassassin, I was only familiar with the spamassassin command, which was awfully slow for a large number of emails. But now that I have used spamc, I'm getting 5+ messages per second. Thank you very much for the advice.
>
> I have practically given up on the original perl code since I'm unable to find out the issue. With spamc, I can get a decent performance.

Using spamc avoids repeated initialisation, but if I want it to be really fast I do something like this:

    for m in /home/sean/code/spam/spfiles/*
    do
        spamc < "$m" ... &
        [ $(( n=(n+1) % 20 )) -eq 0 ] && spamc -K > /dev/null
    done

This puts the spamc processes into the background, so they run in parallel. Occasionally running spamc -K in the foreground prevents unnecessary timeouts by limiting the number of spamc processes waiting to be assigned to a spamd child process.

At the very least there's a speed-up from using all CPU cores, but with slow or unreliable network tests the speed-up can be enormous. You need to set --max-children in spamd appropriately.
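For readers who want to try the batching idea above, here is a minimal sketch under stated assumptions. It is not the poster's script: it throttles with the shell's `wait` builtin after every batch instead of a foreground `spamc -K`, and the names `scan_dir` and `SCAN_CMD` and the `.out` file naming are made up for illustration. It assumes spamd is already running and that each file holds one message.

```shell
#!/bin/bash
# Sketch: scan every file in a directory with spamc, at most $3 jobs at a time.
# scan_dir, SCAN_CMD and the ".out" naming are illustrative, not part of SpamAssassin.
scan_dir() {
    local in_dir="$1" out_dir="$2" batch="${3:-20}"
    local n=0 m
    mkdir -p "$out_dir" || return 1
    for m in "$in_dir"/*; do
        [ -f "$m" ] || continue
        # spamc reads the message on stdin and writes spamd's reply to stdout.
        ${SCAN_CMD:-spamc} < "$m" > "$out_dir/$(basename "$m").out" &
        n=$((n + 1))
        # After each full batch, wait for the background jobs to finish.
        # This stands in for the periodic foreground "spamc -K" of the original.
        if [ $((n % batch)) -eq 0 ]; then
            wait
        fi
    done
    wait        # collect the final, possibly partial, batch
    echo "$n"   # number of messages submitted
}
```

A call such as `scan_dir /home/sean/code/spam/spfiles ./replies 20` would leave one reply file per message in `./replies`. Waiting for a whole batch is simpler than the `spamc -K` trick but idles while the slowest job of each batch finishes, so the batch size is probably best kept close to spamd's `--max-children` setting.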
Re: Spamassassin not parsing email messages
On Fri, Dec 28, 2012 at 12:45:03AM -0800, Sean Tout wrote:
> Hello,
>
> I wrote a short Perl program that reads email from an existing mbox-formatted file, passes each individual email to SpamAssassin to parse and score, then prints a report for each email. The strange thing is that I keep getting the same report score for all messages. I did confirm that I'm reading each message by printing it after reading it. I tried the code below on many different emails (spam and ham), yet I get the same report score for all of them.
>
> What am I doing wrong?

You need to completely destroy SpamAssassin after usage. Change this:

    my $spamtest = Mail::SpamAssassin->new();
    # This is the main loop. It's executed once for each email
    while (!$folder_reader->end_of_file()) {
        $email = $folder_reader->read_next_email();
        $mail = $spamtest->parse($email);
        $status = $spamtest->check($mail);
        print RFILE $status->get_report();
        print RFILE "\n";
    }

To something like this:

    while (!$folder_reader->end_of_file()) {
        my $email = $folder_reader->read_next_email();
        my $spamtest = Mail::SpamAssassin->new();
        my $mail = $spamtest->parse($email);
        my $status = $spamtest->check($mail);
        print RFILE $status->get_report();
        print RFILE "\n";
        $status->finish();   # important
        $mail->finish();     # important
        $spamtest->finish(); # important
    }

I can't remember off the top of my head if $spamtest can be reused after finish(), but at least this should work 100%.
Re: Spamassassin not parsing email messages
Hi Henrik,

Thank you very much for the prompt response and points. I ran the Perl script with the code you pasted, but still got the same report scores for all emails! By the way, when I also tried to print the contents of the emails using $status->get_content_preview(), I got [...]

I'm unable to print any portion of the email messages using $status = $spamtest->check($mail); however, I can print any portion using $folder_reader->read_next_email().

Regards,
Sean.

--
View this message in context: http://spamassassin.1065346.n5.nabble.com/Spamassassin-not-parsing-email-messages-tp102770p102772.html
Sent from the SpamAssassin - Users mailing list archive at Nabble.com.
Re: Spamassassin not parsing email messages
Hi Henrik & Jeff,

One more input that might shed more light. I copied one of the above 3 emails into its own file and ran spamassassin from the command line in test mode against it, and it worked fine. The command is

    spamassassin --test-mode /spamemails/singleemail.spam

where singleemail.spam contains a single spam email.

Regards,
-Sean.
Re: Spamassassin not parsing email messages
That implies that whatever mechanism you're using in the original process is adding a blank line (or bare 'nl' or 'cr') to the beginning of the message that you're then handing to SA.

Idiot question: are you doing (or not) a chomp in the initial read process?

On Fri, 28 Dec 2012, Sean Tout wrote:
> Hi Henrik & Jeff,
>
> One more input that might shed more light. I copied one of the above 3 emails into its own file and ran spamassassin from the command line in test mode against it, and it worked fine. The command is
>
>     spamassassin --test-mode /spamemails/singleemail.spam
>
> where singleemail.spam contains a single spam email.
>
> Regards,
> -Sean.

--
Dave Funk                                  University of Iowa
<dbfunk (at) engineering.uiowa.edu>        College of Engineering
319/335-5751   FAX: 319/384-0549           1256 Seamans Center
Sys_admin/Postmaster/cell_admin            Iowa City, IA 52242-1527
#include <std_disclaimer.h>
Better is not better, 'standard' is better. B{
Re: Spamassassin not parsing email messages
Hi Dave,

That's most likely the case. But I'm not sure what's going on in there or how to get rid of it. I tried with and without chomp() but got the same results. Below is a snippet with chomp, which I applied before parsing the email with spamassassin.

    my $spamtest = Mail::SpamAssassin->new();
    # This is the main loop. It's executed once for each email
    while (!$folder_reader->end_of_file()) {
        $email = $folder_reader->read_next_email();
        chomp($email);
        $mail = $spamtest->parse($email);
        $status = $spamtest->check($mail);
        #rest of code per above.
    }

Regards,
-Sean.
Re: Spamassassin not parsing email messages
On Fri, 28 Dec 2012, Sean Tout wrote:
> That's most likely the case. But I'm not sure what's going on in there or how to get rid of it. I tried with and without chomp() but got the same results. Below is a snippet with chomp, which I applied before parsing the email with spamassassin.
>
>     my $spamtest = Mail::SpamAssassin->new();
>     # This is the main loop. It's executed once for each email
>     while (!$folder_reader->end_of_file()) {
>         $email = $folder_reader->read_next_email();

Write $email to a file here and take a look at it.

>         chomp($email);
>         $mail = $spamtest->parse($email);
>         $status = $spamtest->check($mail);
>         #rest of code per above.
>     }

--
John Hardin KA7OHZ    http://www.impsec.org/~jhardin/
jhar...@impsec.org    FALaholic #11174    pgpk -a jhar...@impsec.org
key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79
---
How can you reason with someone who thinks we're on a glidepath to a police state and yet their solution is to grant the government a monopoly on force? They are insane.
---
211 days since the first successful private support mission to ISS (SpaceX)
Re: Spamassassin not parsing email messages
Hi John,

I wrote every email read to an output file. The output file is identical to the input file I'm reading the emails from, according to diff!

Regards,
-Sean.
Re: Spamassassin not parsing email messages
On Fri, 28 Dec 2012, Sean Tout wrote:
> Hi John,
>
> I wrote every email read to an output file. The output file is identical to the input file I'm reading the emails from, according to diff!

The concern is the format of the single mail object being sent to SpamAssassin for scanning. Having the very first line of that object be a blank line would explain the misformatted-message rule hits you've reported.

Capturing the entire mailbox and running a diff is certainly suggestive, but to be *sure* you want to look at the messages individually.

If you capture that one mail object to a file, and it is a properly-formatted RFC-822 message with no leading blank lines, and you can successfully pipe that file through SA and get a sensible score, then the problem is not in the data; it's in how the data is being fed to SpamAssassin within that script.

--
John Hardin
Re: Spamassassin not parsing email messages
Hi John,

Per your response below, here is what I did to confirm it's not a content problem.

    open (RFILE, ">$reportfile_name");
    while (!$folder_reader->end_of_file()) {
        $email = $folder_reader->read_next_email();
        chomp($email);
        $mail = $spamtest->parse($email);
        $status = $spamtest->check($mail);
        print RFILE $$email;
    }

I then issued the following command:

    spamassassin --test-mode /home/stout/spam/reportfile_in.txt

The above worked just fine. The contents of reportfile_in.txt are created by print RFILE $$email.

Thoughts?

Regards,
-Sean.
Re: Spamassassin not parsing email messages
On Fri, 28 Dec 2012, Sean Tout wrote:
> Hi John,
>
> Per your response below, here is what I did to confirm it's not a content problem.
>
>     open (RFILE, ">$reportfile_name");
>     while (!$folder_reader->end_of_file()) {
>         $email = $folder_reader->read_next_email();
>         chomp($email);
>         $mail = $spamtest->parse($email);
>         $status = $spamtest->check($mail);
>         print RFILE $$email;
>     }
>
> I then issued the following command:
>
>     spamassassin --test-mode /home/stout/spam/reportfile_in.txt
>
> The above worked just fine. The contents of reportfile_in.txt are created by print RFILE $$email.
>
> Thoughts?

Unfortunately that's all I can recommend. I am not familiar with using the SpamAssassin libraries directly from Perl. If I were in your situation I'd do something hackish like system("spamc < $RFILE") or an equally ugly shell script... :)

Sorry.

--
John Hardin
Re: Spamassassin not parsing email messages
Hi John,

Thank you very much for the help. I have been trying to avoid executing spamassassin shell commands from Perl since it takes a significant amount of time, about 12 seconds for each email. I have tried the script below, which works, but of course not in a favorable time, especially for processing the 20,000+ emails in the spfiles folder.

    @files = </home/sean/code/spam/spfiles/*>;
    my $outfile = 'mailrep_out.txt';
    open (MYFILE, ">>$outfile");
    foreach $file (@files) {
        $cmd = "spamassassin --test-mode ".$file." >> mail_out.txt";
        system ($cmd);
    }
    close(MYFILE);

Regards,
-Sean.
Re: Spamassassin not parsing email messages
On Fri, 2012-12-28 at 16:51 -0800, Sean Tout wrote:
> Hi John,
>
> Thank you very much for the help. I have been trying to avoid executing spamassassin shell commands from Perl since it takes a significant amount of time, about 12 seconds for each email. I have tried the script below, which works, but of course not in a favorable time, especially for processing the 20,000+ emails in the spfiles folder.
>
>     @files = </home/sean/code/spam/spfiles/*>;
>     my $outfile = 'mailrep_out.txt';
>     open (MYFILE, ">>$outfile");
>     foreach $file (@files) {
>         $cmd = "spamassassin --test-mode ".$file." >> mail_out.txt";
>         system ($cmd);
>     }
>     close(MYFILE);
>
> Regards,
> -Sean.

As, from this, it seems that you have already got the messages held as individual files in the /home/sean/code/spam/spfiles/ directory, why not feed them directly to spamd with a small bash script:

    for m in /home/sean/code/spam/spfiles/*
    do
        spamc < "$m" | pipeline to analyse and store spamd replies
    done

which should run a lot faster than calling spamassassin directly, because the SA code only needs to be loaded once, when spamd starts, rather than once per message.

... or did I miss something obvious?

Martin
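One possible shape for the "analyse and store" step in the loop above is to let spamc do the analysis with its `-c` (check) option, which prints `score/threshold` and exits with status 1 when the message is spam. The sketch below is only an illustration under that assumption: `score_dir`, `SCORE_CMD` and the tab-separated log format are invented names and choices, and spamd must already be running.

```shell
#!/bin/bash
# Sketch: score each message file with "spamc -c" and log one line per message.
# score_dir, SCORE_CMD and the log layout are illustrative assumptions.
score_dir() {
    local in_dir="$1" log="$2" m result
    : > "$log"                      # start with an empty log file
    for m in "$in_dir"/*; do
        [ -f "$m" ] || continue
        # "spamc -c" prints "score/threshold" and exits 1 for spam, so
        # "|| true" keeps a spam verdict from aborting scripts run with "set -e".
        result=$(${SCORE_CMD:-spamc -c} < "$m") || true
        # One tab-separated record per message: score/threshold, then the path.
        printf '%s\t%s\n' "$result" "$m" >> "$log"
    done
}
```

After `score_dir /home/sean/code/spam/spfiles scores.log`, something like `sort -rn scores.log | head` would list the highest-scoring messages first. This version runs serially; it trades the parallel speed-up for a log that is trivially ordered and easy to post-process.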
Re: Spamassassin not parsing email messages
Hi Martin,

You certainly did not miss anything, but I did! Being new to spamassassin, I was only familiar with the spamassassin command, which was awfully slow for a large number of emails. But now that I have used spamc, I'm getting 5+ messages per second. Thank you very much for the advice.

I have practically given up on the original perl code since I'm unable to find out the issue. With spamc, I can get a decent performance.

Regards,
-Sean.