Re: sa-learn --forget
[EMAIL PROTECTED] wrote:
> I got a message that was tagged as spam. It received a score of 5.2.
> This mail is ham for me/us, so I ran --forget and received this:
>
>   sa-learn --forget --mbox /var/opt/hula/netmail/users/forget
>   Forgot tokens from 0 message(s) (1 message(s) examined)
>
> There was only 1 message in this folder. I expected to see
>
>   Forgot tokens from 1 message(s) (1 message(s) examined)
>
> but this was not the case. What did I do wrong?
>
> SA 3.2.1 on SLES 9, with spamd running without any options.

--forget only works if that specific message has already been learned (here, as spam) by the Bayes subsystem. Just because a message is tagged as spam does not mean that the Bayes autolearner trained on it.

What you really want is --ham, not --forget. --ham explicitly adds information to the database that the message is not spam. --forget only negates any information that resulted from learning that message; it doesn't change the database in any other way.

Generally I would avoid using --forget; it's really a special-case tool. If a message was marked incorrectly, feed it to --spam or --ham as needed.

(And no, running mis-learned messages through --forget first doesn't change anything. If a message was learned as spam and you feed it to sa-learn --ham, SA is smart enough to do a forget and learn as ham in one pass.)
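To make the difference between forgetting and relearning concrete, here is a toy Python model of the behaviour described in the reply. It is only a sketch and not SpamAssassin's implementation: the class `ToyBayesStore`, its methods and the message IDs are all hypothetical, and a real Bayes database tracks token counts rather than one label per message.

```python
class ToyBayesStore:
    """Toy stand-in for a Bayes database: remembers how each message was learned."""

    def __init__(self):
        self.learned = {}  # message id -> "spam" or "ham"

    def forget(self, msg_id):
        """Undo earlier learning; returns True only if there was something to undo."""
        return self.learned.pop(msg_id, None) is not None

    def learn(self, msg_id, label):
        """Learn a message; if it was learned the other way, forget that first."""
        previous = self.learned.get(msg_id)
        if previous == label:
            return False          # already learned this way, nothing to do
        if previous is not None:
            self.forget(msg_id)   # implicit forget, then relearn in the same pass
        self.learned[msg_id] = label
        return True


store = ToyBayesStore()

# Tagged as spam by the rules but never trained: forgetting has nothing to undo.
forgot_untrained = store.forget("msg-1")

# Learned as spam earlier, then corrected in one pass (the --ham case).
store.learn("msg-2", "spam")
relearned = store.learn("msg-2", "ham")

print(forgot_untrained, relearned, store.learned["msg-2"])
```

In this model the first call corresponds to the "Forgot tokens from 0 message(s)" result: the message was examined, but there was no earlier training to remove.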
Re: Rule suggestion - smtp sanity
Matus UHLAR - fantomas wrote:
> On 13.07.07 17:04, arni wrote:
>> From large providers I sometimes receive messages through encrypted SMTP; the header looks something like this (qmail):
>>
>>   ... with (AES256-SHA encrypted) SMTP; ...
>>
>> Would it be a good idea to give a minimal negative score, -0.1 or -0.2, if this happens on the last hop? It proves that the sending SMTP server is very protocol sane, which spambots usually are not.
>
> It just proves that the mail was sent through a sane server, but there could be a spambot behind it. -0.1 and -0.2 are very small numbers. Do you encounter any case where that would help?

Autolearning.
Re: PDFText Plugin for PDF file scoring - not for PDF images
Dallas Engelken wrote, on 14/07/07 12:17 AM:
> James MacLean wrote:
>> Hi folks,
>>
>> Regrets if this is the wrong list.
>>
>> I wanted to be able to score on text found in PDF files. I did not see any obvious route, so I made a plugin that calls XPDF's pdfinfo and pdftotext to get the text, which is then scored.
>>
>> A sample local.cf could be:
>>
>>   pdftotext_cmd /usr/local/bin/pdftotext
>>   pdfinfo_cmd /usr/local/bin/pdfinfo
>>   body PDF_TO_TEXT eval:check_pdftext(^Error,sex,drugs,'Title:\s+stock_tmp.pdf:4','Creator:\s+OpenOffice.org 1.1.4:4')
>>
>> Notice that a :4 suffix gives a match of that regex 4 points.
>>
>> I really don't know if this was the right road to follow, as I copied AntiVirus.pm and came up with this:
>>
>>   http://support.ednet.ns.ca/SpamAssassin/PDFText.pm
>>
>> So far it appears to work as expected and didn't take down a pretty busy server ;).
>>
>> I'd enjoy hearing any positive criticisms :).
>
> I did this the other day with CAM::PDF, but Theo recommended that this work be done in the post_message_parse() plugin call. Then you could just write body rules against the text, URIs would get checked by the uribldns plugin, etc.
>
> --
> Dallas Engelken
> [EMAIL PROTECTED]
> http://uribl.com

I did start out keeping it all in Perl, but when I tested my first spam with the CAM::PDF utilities, the result was just a bunch of space-separated letters :(. Interested in getting something working, I switched to the XPDF utilities. Maybe getpdftext.pl is not a good example of how the modules work?

Where do I find information on hooking into post_message_parse()? I tried grepping in the module area with no luck :(.

I certainly agree it would be better to get the text out and let everyone at it :). I couldn't see how to do that when I started down this road. At first I was even trying to see if Exim could add another attachment to the e-mail containing the output of pdftotext, but again, I wanted to get something running, so I opted for what is there now :(.

Thanks,
JES
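As an illustration of the "regex:points" argument convention in the sample local.cf above, here is a small Python sketch of how such arguments could be split and scored. It is not taken from PDFText.pm: the function names are hypothetical, and the default of 1 point for a pattern without a suffix is an assumption made only for this example.

```python
import re


def parse_pattern(arg, default_points=1.0):
    """Split 'regex:points' into (compiled regex, points).

    A trailing ':<number>' is treated as the score; otherwise the whole
    argument is the regex. The default of 1 point is an assumption.
    """
    m = re.match(r"^(.*):(\d+(?:\.\d+)?)$", arg, flags=re.S)
    if m:
        return re.compile(m.group(1)), float(m.group(2))
    return re.compile(arg), default_points


def score_text(text, args):
    """Add up the points of every pattern that matches the extracted text."""
    total = 0.0
    for arg in args:
        regex, points = parse_pattern(arg)
        if regex.search(text):
            total += points
    return total


# Made-up pdfinfo-style output for a PDF like the one named in the rule.
sample = "Title:          stock_tmp.pdf\nCreator:        OpenOffice.org 1.1.4\n"
rules = [
    "^Error", "sex", "drugs",
    r"Title:\s+stock_tmp.pdf:4",
    r"Creator:\s+OpenOffice.org 1.1.4:4",
]
total = score_text(sample, rules)
print(total)
```

Here only the two patterns with a :4 suffix match the sample text, so they alone contribute to the total.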
tests=[none]
Daily, at least 2 or 3 spam messages show the above on my ISP's markup line. In the case of the one above, my own markup shows:

  X-Spam-Virus: Yes (Email.Spam.Gen983.Sanesecurity.07071002)
  X-Spam-Seen: Tokens 131
  X-Spam-New: Tokens 164
  X-Spam-Remote: Host localhost.localdomain
  X-Spam-ASN: AS4355 207.69.195.0/24
  X-Spam-Flag: YES
  X-Spam-Checker-Version: SpamAssassin 3.2.1 (2007-05-02) on cpollock.localdomain
  X-Spam-Hammy: Tokens 0
  X-Spam-Status: Yes, score=24.4 required=5.0 tests=BAYES_99=5,CLAMAV=10,
      DATE_IN_PAST_03_06=0.044,DCC_CHECK=2.17,DIGEST_MULTIPLE=0.001,
      DKIM_POLICY_SIGNSOME=0,PYZOR_CHECK=3.7,RAZOR2_CF_RANGE_51_100=0.5,
      RAZOR2_CF_RANGE_E4_51_100=1.5,RAZOR2_CHECK=0.5,SAGREY=1,STOX_REPLY_TYPE=0.001
      autolearn=disabled version=3.2.1
  X-Spam-Spammy: Tokens 33
  X-Spam-Pyzor: Reported 677 times.
  X-Spam-DCC: cpollock 104; Body=many Fuz1=many Fuz2=many

Yet their markup shows:

  X-Virus-Scanned: amavisd-new at
  Old-X-Spam-Score: 0
  Old-X-Spam-Level:
  Old-X-Spam-Status: No, score=0 tagged_above=-10 required=6 tests=[none]

Their explanation for this is:

  It's not that they had no tests run, it's that they had all the tests run and the score came out as ZERO so no header was added.
  Jim...

It just doesn't sound right to me that all possible tests were run and there were no hits, but I guess it's possible.

--
Chris
KeyID 0xE372A7DA98E6705C
Re: Rule suggestion - smtp sanity
Most likely, Johnny Spammer monitoring this list will just add a FAKE header to take advantage of such a rule.

Matt Kettler wrote:
> Matus UHLAR - fantomas wrote:
>> On 13.07.07 17:04, arni wrote:
>>> From large providers I sometimes receive messages through encrypted SMTP; the header looks something like this (qmail):
>>>
>>>   ... with (AES256-SHA encrypted) SMTP; ...
>>>
>>> Would it be a good idea to give a minimal negative score, -0.1 or -0.2, if this happens on the last hop? It proves that the sending SMTP server is very protocol sane, which spambots usually are not.
>>
>> It just proves that the mail was sent through a sane server, but there could be a spambot behind it. -0.1 and -0.2 are very small numbers. Do you encounter any case where that would help?
>
> Autolearning.
Re: Rule suggestion - smtp sanity
On Sat, 14 Jul 2007, Dave Koontz wrote:
> Most likely, Johnny Spammer monitoring this list will just add a FAKE header to take advantage of such a rule.

You would only check it in the header that your MTA added.

--
 John Hardin KA7OHZ    http://www.impsec.org/~jhardin/
 [EMAIL PROTECTED]    FALaholic #11174    pgpk -a [EMAIL PROTECTED]
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79
---
 Where We Want You To Go Today 07/05/07: Microsoft patents in-OS adware architecture incorporating spyware, profiling, competitor suppression and delivery confirmation (U.S. Patent #20070157227)
---
 10 days until the 38th anniversary of the Apollo 11 landing on the Moon
Re: Rule suggestion - smtp sanity
1) That won't help him any. You'd want to check this only against headers generated by trusted relays.

2) Even if he does, who cares? At such a small score it's unlikely to help the spammer any.

However, email which is marginally above the autolearn threshold will be helped. (Personally, I get a reasonable amount of low-scoring ham in the 0.1 to 0.3 range. I find very little spam near the 5.0 threshold, and most of that is just under it anyway.)

Dave Koontz wrote:
> Most likely, Johnny Spammer monitoring this list will just add a FAKE header to take advantage of such a rule.
>
> Matt Kettler wrote:
>> Matus UHLAR - fantomas wrote:
>>> On 13.07.07 17:04, arni wrote:
>>>> From large providers I sometimes receive messages through encrypted SMTP; the header looks something like this (qmail):
>>>>
>>>>   ... with (AES256-SHA encrypted) SMTP; ...
>>>>
>>>> Would it be a good idea to give a minimal negative score, -0.1 or -0.2, if this happens on the last hop? It proves that the sending SMTP server is very protocol sane, which spambots usually are not.
>>>
>>> It just proves that the mail was sent through a sane server, but there could be a spambot behind it. -0.1 and -0.2 are very small numbers. Do you encounter any case where that would help?
>>
>> Autolearning.
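The point about trusted relays can be sketched in Python as follows. This is only an illustration of the idea, not a SpamAssassin rule: the function name, the `trusted_hops` parameter and the sample messages are hypothetical, and the pattern covers only the qmail-style wording quoted in this thread.

```python
import re
from email import message_from_string

# Pattern for the qmail-style marker quoted in the thread; other MTAs word this differently.
ENCRYPTED_SMTP = re.compile(r"with \(\S+ encrypted\) SMTP")


def last_hop_encrypted(raw_message, trusted_hops=1):
    """Check only the topmost `trusted_hops` Received headers for the marker.

    MTAs prepend their Received header, so the first one(s) in the message
    were written by our own servers; anything further down may be forged.
    """
    msg = message_from_string(raw_message)
    received = msg.get_all("Received") or []
    return any(ENCRYPTED_SMTP.search(h) for h in received[:trusted_hops])


# A message whose sender inserted a fake "encrypted" Received header.
forged = (
    "Received: from mail.example.net (192.0.2.10)\n"
    "  by mx.example.org with SMTP; 14 Jul 2007 10:00:00 -0000\n"
    "Received: from fake (unknown) by fake with (AES256-SHA encrypted) SMTP;\n"
    "  14 Jul 2007 09:59:00 -0000\n"
    "Subject: hello\n\nbody\n"
)
# The same message, but with the marker written by our own MTA on the last hop.
genuine = forged.replace("with SMTP;", "with (AES256-SHA encrypted) SMTP;", 1)

print(last_hop_encrypted(forged), last_hop_encrypted(genuine))
```

Because only the header added by the receiving MTA is inspected, the forged header lower down in the first message is ignored.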
Re: tests=[none]
At 07:34 14-07-2007, Chris wrote:
> Daily, at least 2 or 3 spam messages show the above on my ISP's markup line. In the case of the one above, my own markup shows:
>
>   X-Spam-Checker-Version: SpamAssassin 3.2.1 (2007-05-02) on cpollock.localdomain
>   X-Spam-Hammy: Tokens 0
>   X-Spam-Status: Yes, score=24.4 required=5.0 tests=BAYES_99=5,CLAMAV=10,
>       DATE_IN_PAST_03_06=0.044,DCC_CHECK=2.17,DIGEST_MULTIPLE=0.001,
>       DKIM_POLICY_SIGNSOME=0,PYZOR_CHECK=3.7,RAZOR2_CF_RANGE_51_100=0.5,
>       RAZOR2_CF_RANGE_E4_51_100=1.5,RAZOR2_CHECK=0.5,SAGREY=1,STOX_REPLY_TYPE=0.001
>       autolearn=disabled version=3.2.1
>   X-Spam-Spammy: Tokens 33
>   X-Spam-Pyzor: Reported 677 times.
>   X-Spam-DCC: cpollock 104; Body=many Fuz1=many Fuz2=many
>
> Yet their markup shows:
>
>   X-Virus-Scanned: amavisd-new at
>   Old-X-Spam-Score: 0
>   Old-X-Spam-Level:
>   Old-X-Spam-Status: No, score=0 tagged_above=-10 required=6 tests=[none]
>
> Their explanation for this is:
>
>   It's not that they had no tests run, it's that they had all the tests run and the score came out as ZERO so no header was added.
>   Jim...
>
> It just doesn't sound right to me that all possible tests were run and there were no hits, but I guess it's possible.

Are you assuming that the two configurations are identical? Yours has Bayes, DKIM verification, Pyzor and DCC enabled. They may not be using those plugins.

Regards,
-sm
Re: tests=[none]
On Saturday 14 July 2007 10:48 am, SM wrote:
>> Yet their markup shows:
>>
>>   X-Virus-Scanned: amavisd-new at
>>   Old-X-Spam-Score: 0
>>   Old-X-Spam-Level:
>>   Old-X-Spam-Status: No, score=0 tagged_above=-10 required=6 tests=[none]
>>
>> Their explanation for this is:
>>
>>   It's not that they had no tests run, it's that they had all the tests run and the score came out as ZERO so no header was added.
>>   Jim...
>>
>> It just doesn't sound right to me that all possible tests were run and there were no hits, but I guess it's possible.
>
> Are you assuming that the two configurations are identical? Yours has Bayes, DKIM verification, Pyzor and DCC enabled. They may not be using those plugins.
>
> Regards,
> -sm

I know they're not using Bayes, because it was so inaccurate that they quit using it. I realize they're not using the same tests or plug-ins as I am; it just doesn't make sense to me that an ISP could run all possible tests and have none of them hit.

--
Chris
KeyID 0xE372A7DA98E6705C
Re: tests=[none]
At 09:36 AM 7/14/2007, Chris wrote:
> I realize they're not using the same tests or plug-ins as I am; it just doesn't make sense to me that an ISP could run all possible tests and have none of them hit.

I just removed the maximum message size limit for scanning from amavisd-new, because I came in today to a mailbox stuffed full of huge spam messages from some Asian company. None of them had any tests run, due to their size. At least they used their real name; they're now in my Postfix sender-reject file.

I wonder how much scanning large messages will slow the server down. At least we don't service huge numbers of accounts like most of you do.

--
Jerry Durand, Durand Interstellar, Inc.
www.interstellar.com
tel: +1 408 356-3886, USA toll free: 1 866 356-3886
Skype: jerrydurand
Re: PDFText Plugin for PDF file scoring - not for PDF images
On Sat, Jul 14, 2007 at 09:54:36AM -0300, James MacLean wrote:
> Where do I find information on hooking into post_message_parse()? I tried grepping in the module area with no luck :(.
>
> I certainly agree it would be better to get the text out and let everyone at it :).

You can ask. :) But yes, I didn't do a good job of fully documenting how this is supposed to work -- you have to know about the plugin call, then hunt around Message and Message::Node, etc. Sorry.

Here are the basics:

First, create a plugin with the post_message_parse method. In there, use $msg->find_parts() to find the parts that you're looking for (find_parts() is pretty well documented). Then simply take the data from $part->decode() and do something to convert it to text. Finally, take that text and call $part->set_rendered($text).

Later on, when SA looks for the text to use for body rules, URI parsing, etc., it takes anything that has rendered text.

So here's a quick n' dirty sample that takes parts of type image/theo and renders them into "The plugin works!\n":

  package Mail::SpamAssassin::Plugin::RenderExample;

  use Mail::SpamAssassin::Plugin;
  use strict;
  use warnings;

  use vars qw(@ISA);
  @ISA = qw(Mail::SpamAssassin::Plugin);

  # Standard plugin constructor.
  sub new {
    my $class = shift;
    my $mailsaobject = shift;

    $class = ref($class) || $class;
    my $self = $class->SUPER::new($mailsaobject);
    bless ($self, $class);

    return $self;
  }

  # Called after the message has been parsed into MIME parts.
  sub post_message_parse {
    my ($self, $opts) = @_;

    my $msg = $opts->{'message'};
    foreach my $p ( $msg->find_parts(qr!^image/theo$!, 1) ) {
      # Replace the part's rendered text with a fixed string.
      $p->set_rendered("The plugin works!\n");
    }
  }

  1;

--
Randomly Selected Tagline:
"I'm a programmer: I don't buy software, I write it." - Tom Christiansen
announce: urlx utility for spamassassin
Most systems that I'm familiar with nowadays have users put spam emails that get past the filters into a special folder (directory), so that they can be examined to make the spam filter more effective.

In pursuit of that idea, I've written urlx. Urlx is designed to extract URLs, both clear and obfuscated, from those spam emails and convert them into SpamAssassin rules automatically. (Note: when I say automatically, I still expect a human to apply a sanity check somewhere.)

Urlx is not yet released to the general public, but if you're interested in helping test it, please drop me an email.

Mike-
--
If you're not confused, you're not trying hard enough.
--
Please note - Due to the intense volume of spam, we have installed site-wide spam filters at catherders.com. If email from you bounces, try non-HTML, non-encoded, non-attachments,
plugin to test attachments from unknown senders
Like other folks, I've been getting hit pretty hard with the PDF spam. I think the way to solve this, and image spam in general, is a plugin that does the following:

 1) looks in the message to see if there is a binary attachment

 2) looks in the AWL to see if the sender tuple is known

 3) if (1==true) && (2==false), fires a score

I've been meaning to adapt my SAGREY plugin [1] for this, but I have not had time and may not have time for a while yet, so I thought I'd throw this out there to see if anybody else is interested in doing it.

[1] http://www.ntrg.com/misc/sagrey/

--
Eric A. Hall                                        http://www.ehsco.com/
Internet Core Protocols          http://www.oreilly.com/catalog/coreprot/
Re: Rule suggestion - smtp sanity
On 7/13/2007 11:04 AM, arni wrote:
> From large providers I sometimes receive messages through encrypted SMTP; the header looks something like this (qmail):
>
>   ... with (AES256-SHA encrypted) SMTP; ...
>
> Would it be a good idea to give a minimal negative score, -0.1 or -0.2, if this happens on the last hop? It proves that the sending SMTP server is very protocol sane, which spambots usually are not.

It's a good idea to look at the last-hop transfer and see whether it used STARTTLS, whether the certificate was valid, etc., and it is something I've got on my to-do list for future development. The big problem is that there is no real standard, and every MTA records the details differently.

--
Eric A. Hall                                        http://www.ehsco.com/
Internet Core Protocols          http://www.oreilly.com/catalog/coreprot/
RE: plugin to test attachments from unknown senders
Aren't spammer tuples in the AWL too? I thought that it averaged both ways; Country AND Western.

Dan

-----Original Message-----
From: Eric A. Hall [mailto:[EMAIL PROTECTED]
Sent: Saturday, July 14, 2007 3:49 PM
To: users@spamassassin.apache.org
Subject: plugin to test attachments from unknown senders

Like other folks, I've been getting hit pretty hard with the PDF spam. I think the way to solve this, and image spam in general, is a plugin that does the following:

 1) looks in the message to see if there is a binary attachment

 2) looks in the AWL to see if the sender tuple is known

 3) if (1==true) && (2==false), fires a score

I've been meaning to adapt my SAGREY plugin [1] for this, but I have not had time and may not have time for a while yet, so I thought I'd throw this out there to see if anybody else is interested in doing it.

[1] http://www.ntrg.com/misc/sagrey/

--
Eric A. Hall                                        http://www.ehsco.com/
Internet Core Protocols          http://www.oreilly.com/catalog/coreprot/
Help with a multi-line mode rule
Hi all,

I hope someone can help me with a rule I'm trying to write.

My understanding of multi-line mode, with the /m switch at the end, is this: in this mode, the caret (^) and dollar ($) match just after and just before newlines inside the string, not only at its start and end. Is that correct?

I believe this is the correct method for allowing me to use a "full" rule (i.e. one that searches the entire undecoded message) while also using carets and dollars within the regex, right?

So I think this should mean that I can have some text like this, for example:

  Subject: this is a test
  From: [EMAIL PROTECTED]
  X-Return-Path: [EMAIL PROTECTED]

...and create a rule like the following, which should hit on it:

  full MYRULE /^Subject:.* test$(?:\s(?!X-Return-Path).*)+\sX-Return-Path: [EMAIL PROTECTED]/m

Right?

If I test this rule using the Regex Coach tool at http://weitz.de/regex-coach/ (I'm on Windows) with the 'm' switch enabled, the rule works fine. But when I test it with SpamAssassin, it doesn't work, and I believe it's due to the caret and dollar.

However, I want to require that the word "test" is at the very end of the Subject line; hence, I want to have the $ after it. I also want to require that the X-Return-Path header is present, which is why the rest of the rule is the way it is, but that's not the issue.

What am I doing wrong?

(Of course, in reality I'm not searching for the above strings; I'm trying to catch a particular spam sign. This is just a simple example of the method I'm using.)

Cheers,
Jeremy
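The anchor behaviour asked about here can be checked in a few lines of Python, whose re.MULTILINE flag treats ^ and $ the same way as Perl's /m for this purpose. The sketch also shows one situation in which such a rule can fail on raw message text: CRLF line endings. Whether that is what happens in Jeremy's setup is an assumption; the thread does not establish it. The example.com addresses are placeholders for the redacted ones.

```python
import re

headers_lf = (
    "Subject: this is a test\n"
    "From: sender@example.com\n"
    "X-Return-Path: bounce@example.com\n"
)
# The same headers with CRLF line endings, as they appear on the wire.
headers_crlf = headers_lf.replace("\n", "\r\n")

end_anchor = re.compile(r"^Subject:.* test$", re.MULTILINE)

# With bare LF line endings, ^ and $ match at the line boundaries as expected.
lf_hit = bool(end_anchor.search(headers_lf))

# With CRLF, $ only matches just before "\n", so the "\r" after "test" blocks it.
crlf_hit = bool(end_anchor.search(headers_crlf))

# Allowing an optional "\r" before the end of the line matches both forms.
tolerant = re.compile(r"^Subject:.* test\r?$", re.MULTILINE)
tolerant_hits = [bool(tolerant.search(h)) for h in (headers_lf, headers_crlf)]

print(lf_hit, crlf_hit, tolerant_hits)
```

If line endings do turn out to be the cause, adding `\r?` before the `$` keeps the requirement that "test" ends the Subject line while accepting either form.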
Re: plugin to test attachments from unknown senders
At 12:49 14-07-2007, Eric A. Hall wrote:
> Like other folks, I've been getting hit pretty hard with the PDF spam. I think the way to solve this, and image spam in general, is a plugin that does the following:
>
>  1) looks in the message to see if there is a binary attachment
>
>  2) looks in the AWL to see if the sender tuple is known
>
>  3) if (1==true) && (2==false), fires a score

You might also check the AWL score in step 2, and fire step 3 if that score is above an arbitrary value. Note that your rule may trigger false positives for one-time senders.

Regards,
-sm
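The three steps quoted above, together with the score check suggested in this reply, can be sketched in Python as follows. This is a rough illustration only, not an existing plugin: the function names, the `max_mean_score` parameter and the dict standing in for the AWL (sender tuple mapped to a message count and total score) are all assumptions made for the example.

```python
from email import message_from_string


def has_binary_attachment(msg):
    """Step 1: true if any leaf MIME part is not text/*."""
    return any(
        not part.is_multipart() and part.get_content_maintype() != "text"
        for part in msg.walk()
    )


def should_fire(raw_message, sender_key, awl, max_mean_score=None):
    """Steps 2 and 3, with the optional score check.

    `awl` is a plain dict standing in for the auto-whitelist:
    sender tuple -> (message count, total score).
    """
    msg = message_from_string(raw_message)
    if not has_binary_attachment(msg):
        return False
    entry = awl.get(sender_key)
    if entry is None:
        return True                          # unknown sender with an attachment
    if max_mean_score is None:
        return False                         # known sender, no score check requested
    count, total = entry
    return total / count > max_mean_score    # known sender with a poor history


pdf_mail = (
    "From: someone@example.com\n"
    "MIME-Version: 1.0\n"
    'Content-Type: multipart/mixed; boundary="b"\n'
    "\n"
    "--b\n"
    "Content-Type: text/plain\n"
    "\n"
    "see attached\n"
    "--b\n"
    'Content-Type: application/pdf; name="stock.pdf"\n'
    "Content-Transfer-Encoding: base64\n"
    "\n"
    "JVBERi0xLjQK\n"
    "--b--\n"
)
awl = {
    ("friend@example.com", "192.0"): (10, -5.0),
    ("spammer@example.com", "198.51"): (4, 80.0),
}

unknown = should_fire(pdf_mail, ("someone@example.com", "203.0"), awl)
known_good = should_fire(pdf_mail, ("friend@example.com", "192.0"), awl)
known_bad = should_fire(pdf_mail, ("spammer@example.com", "198.51"), awl, max_mean_score=5.0)
print(unknown, known_good, known_bad)
```

The mean-score check is what lets the rule still fire for a sender who is in the AWL only because of earlier spam. A first message from a legitimate one-time sender would still hit, which is the false-positive risk noted above.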
Re: RDNS_NONE and Qmail?
Matthew Yette wrote:
> I'm currently running qmail 1.03, SA 3.2.0 with qmail-scanner 1.25st. Every single piece of mail that runs through the system gets hit with RDNS_NONE, which adds 0.1 points to the score. Not a major deal - and if there isn't a fix, it wouldn't be a problem - but I figured I'd try to make things perfect if possible. :)

There was a change in SA around 3.2.1 whereby it no longer relies on its own code to do PTR (rDNS) lookups of the MTAs that appear in the Received: headers. Instead, it relies on the local MTA to have done the lookup and written the result into the header field.

By default, qmail doesn't do rDNS lookups (for performance reasons), so you need to change tcpserver to do them, which then makes SA happy again. I.e. you want "tcpserver -h" instead of "tcpserver -H".

--
Cheers

Jason Haar
Information Security Manager, Trimble Navigation Ltd.
Phone: +64 3 9635 377 Fax: +64 3 9635 417
PGP Fingerprint: 7A2E 0407 C9A6 CAF6 2B9F 8422 C063 5EBB FE1D 66D1