Re: Spamassassin not parsing email messages
On Fri, 2012-12-28 at 21:48 -0800, Sean Tout wrote:
> I have practically given up on the original perl code since I'm unable
> to find out the issue. With spamc, I can get a decent performance.

IMO, unless you need the extra facilities of amavisd-new or one of the other smart wrappers for SA and ClamAV, you're almost always better off using spamc/spamd for the reasons already given. FYI, amavisd-new is written in Perl and works by loading the SA code so it can pass messages directly to SA and read its responses.

However, don't mistake using spamc/spamd for 'not using the original Perl code' - it isn't. Although spamc is a simple, purpose-built, fast C program which adds minimal runtime overhead, spamd is little more than a simple daemon launcher wrapped round the standard SA code. Look at it with less and you'll see what I mean...

Martin
Re: Spamassassin not parsing email messages
On Fri, 28 Dec 2012 21:48:25 -0800 (PST) Sean Tout wrote:
> Hi Martin,
>
> You certainly did not miss anything, but I did! Being new to
> spamassassin, I was only familiar with the spamassassin command, which
> was awfully slow for a large number of emails. But now that I used
> spamc, I'm getting 5+ messages per second. Thank you much for the
> advice.
>
> I have practically given up on the original perl code since I'm unable
> to find out the issue. With spamc, I can get a decent performance.

Using spamc avoids repeated initialisation, but if I want it to be really fast I do it something like this:

    for m in /home/sean/code/spam/spfiles/*
    do
        spamc <$m ... &
        [ $(( n=(n+1) % 20 )) -eq 0 ] && spamc -K >/dev/null
    done

It puts the spamc processes into the background so they run in parallel. Occasionally running spamc -K in the foreground prevents unnecessary timeouts by limiting the number of spamc processes waiting to be assigned to a spamd child process.

At the very least there's a speed-up from using all CPU cores, but with slow or unreliable network tests the speed-up can be enormous. You need to set --max-children in spamd appropriately.
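The batching idea above can be sketched as a small shell function. This is only an illustration under stated assumptions: `scan_dir` and the `SCANNER` variable are hypothetical names, spamd is assumed to be running already, and the throttle here is a plain `wait` after every batch rather than the `spamc -K` call used above.

```shell
# scan_dir DIR OUTDIR [BATCH]   (hypothetical helper, not part of SpamAssassin)
# Feeds every regular file in DIR to the scanner in the background and
# pauses after each BATCH jobs so only a bounded number run at once.
# SCANNER is an assumed override hook; it defaults to spamc.
scan_dir() {
    local dir=$1 outdir=$2 batch=${3:-20} n=0 m
    mkdir -p "$outdir"
    for m in "$dir"/*; do
        [ -f "$m" ] || continue
        # One background job per message; the reply is stored per file.
        "${SCANNER:-spamc}" <"$m" >"$outdir/${m##*/}.out" &
        n=$((n + 1))
        # Swapped-in throttle: wait for the whole batch to finish.
        if [ $((n % batch)) -eq 0 ]; then
            wait
        fi
    done
    wait    # collect the final, possibly partial, batch
}
```

Something like `scan_dir /home/sean/code/spam/spfiles results 20` would then leave one `.out` file per message in `results`. The batch size would need to be kept in line with spamd's --max-children setting.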
Re: Spamassassin not parsing email messages
On Fri, Dec 28, 2012 at 12:45:03AM -0800, Sean Tout wrote:
> Hello,
>
> I wrote a short Perl program that reads email from an existing mbox
> formatted file, passes each individual email to Spamassassin for parse
> and score, then prints a report for each email. The strange thing is
> that I keep getting the same report score for all messages. I did
> confirm that I'm reading each message by printing it after reading it.
> I tried the below code on many different emails (spam and ham) yet I
> get the same report score for all of them. What am I doing wrong?

You need to completely destroy SpamAssassin after usage.

Change this:

    my $spamtest = Mail::SpamAssassin->new();

    # This is the main loop. It's executed once for each email
    while (!$folder_reader->end_of_file()) {
        $email = $folder_reader->read_next_email();
        $mail = $spamtest->parse($email);
        $status = $spamtest->check($mail);
        print RFILE $status->get_report();
        print RFILE "\n";
    }

To something like this:

    while (!$folder_reader->end_of_file()) {
        my $email = $folder_reader->read_next_email();
        my $spamtest = Mail::SpamAssassin->new();
        my $mail = $spamtest->parse($email);
        my $status = $spamtest->check($mail);
        print RFILE $status->get_report();
        print RFILE "\n";
        $status->finish();   # important
        $mail->finish();     # important
        $spamtest->finish(); # important
    }

I can't remember off the top of my head if $spamtest can be reused after finish(), but at least this should work 100%.
Re: Spamassassin not parsing email messages
Hi Henrik,

Thank you much for the prompt response and points. I ran the Perl script with the code you pasted below, but still got the same report scores for all emails!

By the way, when I also tried to print the contents of the emails using $status->get_content_preview(), I got [...]

I'm unable to print any portion of the email messages using $status = $spamtest->check($mail); however, I can print any portion using $folder_reader->read_next_email().

Regards,
Sean.

-- 
View this message in context: http://spamassassin.1065346.n5.nabble.com/Spamassassin-not-parsing-email-messages-tp102770p102772.html
Sent from the SpamAssassin - Users mailing list archive at Nabble.com.
Re: Spamassassin not parsing email messages
Hi Henrik & Jeff,

One more input that might shed more light. I copied one of the emails from the above 3 emails into its own file and ran spamassassin from the command line in test mode against it, and it worked fine. The command is

    spamassassin --test-mode /spamemails/singleemail.spam

where singleemail.spam contains a single spam email.

Regards,
-Sean.

-- 
View this message in context: http://spamassassin.1065346.n5.nabble.com/Spamassassin-not-parsing-email-messages-tp102770p102782.html
Sent from the SpamAssassin - Users mailing list archive at Nabble.com.
Re: Spamassassin not parsing email messages
That implies that whatever mechanism you're using in the original process is adding a blank line (or bare 'nl' or 'cr') to the beginning of the message that you're then handing to SA.

Idiot question: are you doing (or not) a chomp in the initial read process?

On Fri, 28 Dec 2012, Sean Tout wrote:
> Hi Henrik & Jeff,
>
> One more input that might shed more light. I copied one of the emails
> from the above 3 emails into its own file and ran spamassassin from
> the command line in test mode against it, and it worked fine. The
> command is
>
>     spamassassin --test-mode /spamemails/singleemail.spam
>
> where singleemail.spam contains a single spam email.
>
> Regards,
> -Sean.

-- 
Dave Funk                                  University of Iowa
<dbfunk (at) engineering.uiowa.edu>        College of Engineering
319/335-5751   FAX: 319/384-0549           1256 Seamans Center
Sys_admin/Postmaster/cell_admin            Iowa City, IA 52242-1527
#include <std_disclaimer.h>
Better is not better, 'standard' is better. B{
Re: Spamassassin not parsing email messages
Hi Dave,

That's most likely the case. But I'm not sure what's going on in there and how to get rid of it. I tried with and without chomp() but got the same results. Below is a snippet with chomp, which I applied before parsing the email with spamassassin.

    my $spamtest = Mail::SpamAssassin->new();

    # This is the main loop. It's executed once for each email
    while (!$folder_reader->end_of_file()) {
        $email = $folder_reader->read_next_email();
        chomp($email);
        $mail = $spamtest->parse($email);
        $status = $spamtest->check($mail);
        # rest of code per above.
    }

Regards,
-Sean.

-- 
View this message in context: http://spamassassin.1065346.n5.nabble.com/Spamassassin-not-parsing-email-messages-tp102770p102784.html
Sent from the SpamAssassin - Users mailing list archive at Nabble.com.
Re: Spamassassin not parsing email messages
On Fri, 28 Dec 2012, Sean Tout wrote:
> That's most likely the case. But I'm not sure what's going on in there
> and how to get rid of it. I tried with and without chomp() but got the
> same results. Below is a snippet with chomp, which I applied before
> parsing the email with spamassassin.
>
> my $spamtest = Mail::SpamAssassin->new();
>
> # This is the main loop. It's executed once for each email
> while (!$folder_reader->end_of_file()) {
>     $email = $folder_reader->read_next_email();

Write $email to a file here and take a look at it.

>     chomp($email);
>     $mail = $spamtest->parse($email);
>     $status = $spamtest->check($mail);
>     # rest of code per above.
> }

-- 
 John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
 jhar...@impsec.org    FALaholic #11174     pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
  How can you reason with someone who thinks we're on a glidepath to
  a police state and yet their solution is to grant the government a
  monopoly on force? They are insane.
-----------------------------------------------------------------------
 211 days since the first successful private support mission to ISS (SpaceX)
Re: Spamassassin not parsing email messages
Hi John,

I wrote every email read to an output file. According to diff, the output file is identical to the input file I'm reading the emails from!

Regards,
-Sean.

-- 
View this message in context: http://spamassassin.1065346.n5.nabble.com/Spamassassin-not-parsing-email-messages-tp102770p102786.html
Sent from the SpamAssassin - Users mailing list archive at Nabble.com.
Re: Spamassassin not parsing email messages
On Fri, 28 Dec 2012, Sean Tout wrote:
> Hi John,
>
> I wrote every email read to an output file. The output file is
> identical to the input file I'm reading the emails from according to
> diff!

The concern is the format of the single mail object being sent to SpamAssassin for scanning. Having the very first line of that object be a blank line would explain the misformatted-message rule hits you've reported.

Capturing the entire mailbox and running a diff is certainly suggestive, but to be *sure* you want to look at the messages individually.

If you capture that one mail object to a file, and it is a properly-formatted RFC-822 message with no leading blank lines, and you can successfully pipe that file through SA and get a sensible score, then the problem is not in the data; it's in how it's being fed to SpamAssassin within that script.

-- 
 John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
 jhar...@impsec.org    FALaholic #11174     pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
  The more you believe you can create heaven on earth the more
  likely you are to set up guillotines in the public square to
  hasten the process.                             -- James Lileks
-----------------------------------------------------------------------
 211 days since the first successful private support mission to ISS (SpaceX)
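The per-message check described above (does the captured mail object start with a header rather than a blank line?) can be sketched in shell. This is a rough illustration, not a full RFC-822 validator: `starts_with_header` is a hypothetical name, and it only inspects the first line of a file that is assumed to hold exactly one captured message.

```shell
# starts_with_header FILE   (hypothetical helper)
# Succeeds when the first line of FILE is an mbox "From " separator or
# looks like a header field ("Name: value"); fails on an empty file or
# a leading blank line.
starts_with_header() {
    local first cr
    cr=$(printf '\r')
    # Read only the first line; an empty file fails here.
    IFS= read -r first <"$1" || [ -n "$first" ] || return 1
    first=${first%"$cr"}            # tolerate CRLF line endings
    case $first in
        "From "*) return 0 ;;       # mbox separator line
    esac
    # Field names are printable ASCII without spaces or colons.
    printf '%s\n' "$first" | LC_ALL=C grep -Eq '^[!-9;-~]+:'
}
```

A call such as `starts_with_header one_message.txt || echo "leading junk"` (file name assumed) would flag a message that begins with a blank line before it is handed to SpamAssassin.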
Re: Spamassassin not parsing email messages
Hi John,

Per your response below, here is what I did to confirm it's not a content problem.

    open(RFILE, ">$reportfile_name");
    while (!$folder_reader->end_of_file()) {
        $email = $folder_reader->read_next_email();
        chomp($email);
        $mail = $spamtest->parse($email);
        $status = $spamtest->check($mail);
        print RFILE $$email;
    }

I then issued the following command:

    spamassassin --test-mode /home/stout/spam/reportfile_in.txt

The above worked just fine. The contents of reportfile_in.txt are created by print RFILE $$email. Thoughts?

Regards,
-Sean.

-- 
View this message in context: http://spamassassin.1065346.n5.nabble.com/Spamassassin-not-parsing-email-messages-tp102770p102789.html
Sent from the SpamAssassin - Users mailing list archive at Nabble.com.
Re: Spamassassin not parsing email messages
On Fri, 28 Dec 2012, Sean Tout wrote:
> Hi John,
>
> Per your response below, here is what I did to confirm it's not a
> content problem.
>
> open(RFILE, ">$reportfile_name");
> while (!$folder_reader->end_of_file()) {
>     $email = $folder_reader->read_next_email();
>     chomp($email);
>     $mail = $spamtest->parse($email);
>     $status = $spamtest->check($mail);
>     print RFILE $$email;
> }
>
> I then issued the following command:
>
>     spamassassin --test-mode /home/stout/spam/reportfile_in.txt
>
> The above worked just fine. The contents of reportfile_in.txt are
> created by print RFILE $$email. Thoughts?

Unfortunately that's all I can recommend. I am not familiar with using the SpamAssassin libraries directly from Perl. If I were in your situation I'd do something hackish like system("spamc <$RFILE") or an equally ugly shell script... :)

Sorry.

-- 
 John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
 jhar...@impsec.org    FALaholic #11174     pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
  Justice is justice, whereas "social justice" is code for one set
  of rules for the rich, another for the poor; one set for whites,
  another set for minorities; one set for straight men, another for
  women and gays. In short, it's the opposite of actual justice.
                                                    -- Burt Prelutsky
-----------------------------------------------------------------------
 211 days since the first successful private support mission to ISS (SpaceX)
Re: Spamassassin not parsing email messages
Hi John,

Thank you much for the help. I have been trying to avoid executing spamassassin shell commands from Perl since it takes a significant amount of time (about 12 seconds for each email). I have tried the below script, which works, but of course not in a favorable time, especially for processing the 20,000+ emails in the spfiles folder.

    @files = </home/sean/code/spam/spfiles/*>;
    my $outfile = 'mailrep_out.txt';
    open(MYFILE, ">$outfile");
    foreach $file (@files) {
        $cmd = "spamassassin --test-mode <".$file." >>mail_out.txt";
        system($cmd);
    }
    close(MYFILE);

Regards,
-Sean.

-- 
View this message in context: http://spamassassin.1065346.n5.nabble.com/Spamassassin-not-parsing-email-messages-tp102770p102791.html
Sent from the SpamAssassin - Users mailing list archive at Nabble.com.
Re: Spamassassin not parsing email messages
On Fri, 2012-12-28 at 16:51 -0800, Sean Tout wrote:
> Hi John,
>
> Thank you much for the help. I have been trying to avoid executing
> spamassassin shell commands from Perl since it takes a significant
> amount of time (about 12 seconds for each email). I have tried the
> below script, which works, but of course not in a favorable time,
> especially for processing the 20,000+ emails in the spfiles folder.
>
> @files = </home/sean/code/spam/spfiles/*>;
> my $outfile = 'mailrep_out.txt';
> open(MYFILE, ">$outfile");
> foreach $file (@files) {
>     $cmd = "spamassassin --test-mode <".$file." >>mail_out.txt";
>     system($cmd);
> }
> close(MYFILE);
>
> Regards,
> -Sean.

Since, from this, it seems that you already have the messages held as individual files in the /home/sean/code/spam/spfiles/ directory, why not feed them directly to spamd with a small bash script:

    for m in /home/sean/code/spam/spfiles/*
    do
        spamc <$m | pipeline to analyse and store spamd replies
    done

which should run a lot faster than calling spamassassin directly, because spamd only needs to be loaded once at the start of the run.

... or did I miss something obvious?

Martin
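One possible shape for the "pipeline to analyse and store spamd replies" placeholder above is sketched here. It is an assumption, not what Martin had in mind: it relies on spamc's -c (check) option, which prints score/threshold instead of the rewritten message, and `score_dir` and the `SPAMC` override variable are hypothetical names.

```shell
# score_dir DIR   (hypothetical helper)
# Prints one "path<TAB>score/threshold" line per message file in DIR.
# SPAMC is an assumed override hook; by default "spamc -c" is used,
# which needs a running spamd.
score_dir() {
    local dir=$1 m result
    for m in "$dir"/*; do
        [ -f "$m" ] || continue
        # Deliberately unquoted so the command and its options split.
        # spamc -c exits non-zero for spam; only its output is used here.
        result=$(${SPAMC:-spamc -c} <"$m")
        printf '%s\t%s\n' "$m" "$result"
    done
}
```

Running `score_dir /home/sean/code/spam/spfiles >scores.tsv` would then give a table that can be sorted or filtered afterwards (the output file name is illustrative).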
Re: Spamassassin not parsing email messages
Hi Martin,

You certainly did not miss anything, but I did! Being new to spamassassin, I was only familiar with the spamassassin command, which was awfully slow for a large number of emails. But now that I used spamc, I'm getting 5+ messages per second. Thank you much for the advice.

I have practically given up on the original perl code since I'm unable to find out the issue. With spamc, I can get a decent performance.

Regards,
-Sean.

-- 
View this message in context: http://spamassassin.1065346.n5.nabble.com/Spamassassin-not-parsing-email-messages-tp102770p102801.html
Sent from the SpamAssassin - Users mailing list archive at Nabble.com.
Parsing Email
Hello,

I have a project that I need to solve. Fax machines (for a client) have been replaced with the phone company's fax server, which e-mails the incoming fax (.tif) images to a specific e-mail address at the client's place of business. It just so happens that the e-mail passes through a mail server that inspects it for e-viruses and runs it through spamassassin before forwarding it on to their machine. That mail server, which pre-processes the client's e-mail, is a machine I administer.

What I'd like to do is capture the contents of these particular fax e-mails as they pass through the machine I administer and:

1- copy the fax images (detach the images from e-mail messages) and store these images on that server (whether as a file or put into a database as a blob)

2- create a database record that will essentially catalog the incoming fax to associate a fax file image (or db blob ID)

   A- and also search a database for existing origination fax #'s so that the fax can be associated with the right company that sent it.

In this case, the DB used is a MySQL database that exists on this particular machine as well.

Now, what I need help in understanding is: assuming that I can handle each e-mail separately as it comes through, how do I parse the e-mail (the way Spamassassin does) so that I can pull the component parts from it (From:, Subject:, and the MIME-encapsulated fax image) and use these pieces (somehow) for the customer care module?

I'm well versed in PHP... I used to do a lot of perl (many moons ago), and I'd like to make this work without too much pain.

I think ultimately I'll probably let the normal copy of the e-mail go on to the customer's destination. I'd cause an extra Cc: to go to a specific e-mail account on the server, where anything delivered to this account is strained by this e-mail parsing program, which will split the e-mail up into its pieces and distribute/use the chunks in a manner that lets me manipulate them later in the process.

Any help to point me in the right direction?

Thanks a lot,
Tyler Nally
Re: Parsing Email
On Wed, Oct 11, 2006 at 05:24:46PM -0400, Tyler Nally wrote:
> Now.. what I need help in understanding... is ... assuming that I can
> handle each e-mail separately as it comes through, how do I parse the
> e-mail (like the way Spamassassin does) to have the ability to pull
> the component parts from the e-mail (from:, subject:, and
> MIME-encapsulated fax image) in order to be able to use these pieces
> (somehow) for the customer care module.

:) I answered this kind of question for someone on IRC a week or two ago; here's a quick example of how to use Mail::SpamAssassin::Message:

    use Mail::SpamAssassin::Message;

    my $msg = Mail::SpamAssassin::Message->new() || die "Message error?";

    my $count = 0;
    foreach my $p ($msg->find_parts(qr/^image\b/i, 1)) {
        open(OUT, ">message.".$count++) || die "can't write file message.$count: $!";
        binmode OUT;
        print OUT $p->decode();
        close(OUT);
    }

So that parses a message from STDIN, goes through and finds all image parts, and writes them out to files called "message.#".

Use "perldoc Mail::SpamAssassin::Message" and "perldoc Mail::SpamAssassin::Message::Node" for more information about functions and such. :)

-- 
Randomly Selected Tagline:
Zero equals Zero - Prof. Farr
Re: Parsing Email
On Wed, 11 Oct 2006, Theo Van Dinter wrote:
> On Wed, Oct 11, 2006 at 05:24:46PM -0400, Tyler Nally wrote:
> > Now.. what I need help in understanding... is ... assuming that I
> > can handle each e-mail separately as it comes through, how do I
> > parse the e-mail (like the way Spamassassin does) to have the
> > ability to pull the component parts from the e-mail (from:,
> > subject:, and MIME-encapsulated fax image) in order to be able to
> > use these pieces (somehow) for the customer care module.
>
> :) I answered this kind of question for someone on IRC a week or two
> ago, here's a quick example of how to use
> Mail::SpamAssassin::Message:

Yeah, I learned to use Message.pm from felicity :)

> use Mail::SpamAssassin::Message;
>
> my $msg = Mail::SpamAssassin::Message->new() || die "Message error?";
>
> my $count = 0;
> foreach my $p ($msg->find_parts(qr/^image\b/i, 1)) {
>     open(OUT, ">message.".$count++) || die "can't write file message.$count: $!";
>     binmode OUT;
>     print OUT $p->decode();
>     close(OUT);
> }
>
> So that parses a message from STDIN, goes through and finds all image
> parts, and writes them out to files called "message.#".

I used the code below to retrieve spam forwarded as an attachment from squirrelmail and feed that spam to sa-learn:

    #!/usr/bin/perl
    use strict;
    use warnings;

    my $fh;
    open $fh, "<", shift;
    my @message = <$fh>;

    use Mail::SpamAssassin::Message;

    my $msg = Mail::SpamAssassin::Message->new(
        {
            'message' => \@message,
        }
    ) || die "Message error?";

    #foreach my $p ($msg->find_parts(qr/^(text|image|application)\b/i, 1)) {
    foreach my $p ($msg->find_parts(qr/^message\b/i, 0)) {
        eval {
            no warnings;
            my $type = $p->{'type'};
            my $attachname = $p->{'name'};
            print "Content type is: $type\n";
            print "write file name: $attachname\n";
            # "or" rather than "||" so that die applies to open()
            open my $out, ">", $attachname or die "Can't write file $attachname:$!";
            binmode $out;
            print $out $p->decode();
        };
        #warn $@ if $@;
    }
    __END__

> Use "perldoc Mail::SpamAssassin::Message" and "perldoc
> Mail::SpamAssassin::Message::Node" for more information about
> functions and such. :)
>
> --
> Randomly Selected Tagline:
> Zero equals Zero - Prof. Farr

Vincent Li
http://pingpongit.homelinux.com
Opensource .Implementation. .Consulting.
Platform .Fedora. .Debian. .Mac OS X.
Blog http://bl0g.blogdns.com
Re: Parsing Email
On Wed, Oct 11, 2006 at 02:48:28PM -0700, Vincent Li wrote:
> my $fh;
> open $fh, "<", shift;
> my @message = <$fh>;
>
> use Mail::SpamAssassin::Message;
>
> my $msg = Mail::SpamAssassin::Message->new(
>     {
>         'message' => \@message,
>     }

FYI, new() accepts a file handle, an array, a scalar, or undef (which causes it to use \*STDIN). So you don't need to slurp the message data in first. :)

-- 
Randomly Selected Tagline:
All in a days work for Confuse-a-Cat.
Re: Parsing Email
Tyler Nally wrote:
> 1- copy the fax images (detach the images from e-mail messages) and
>    store these images on that server (whether as a file or put into a
>    database as a blob)

If you're running Sendmail, you can use MIMEDefang <www.mimedefang.org> for this. It has a built-in function, action_replace_with_url, which does exactly what you want.

-- 
Kelson Vibber
SpeedGate Communications <www.speed.net>