Re: sa-learn --forget

2007-07-14 Thread Matt Kettler
[EMAIL PROTECTED] wrote:
 I got a message that was tagged as spam. It received a score of 5.2. This
 mail is ham for me/us. So I ran --forget and received this:
 sa-learn --forget --mbox /var/opt/hula/netmail/users/forget
 Forgot tokens from 0 message(s) (1 message(s) examined)
 There was only 1 message/email in this folder. I expected to see "Forgot
 tokens from 1 message(s) (1 message(s) examined)" but this was not the
 case. What did I do wrong?
 SA 3.2.1 with sles9 and spamd running without any options
   
--forget only works if that specific message has been learned as spam by
the bayes subsystem. And, just because a message is tagged as spam, it
does not mean that the bayes autolearner caused it to be trained.

Really, what you would want to do is --ham, not --forget.

--ham will explicitly add information to the database that the message
is not spam.


--forget will only negate any information resulting from learning that
message, but doesn't change the database in any other way. Generally I
would avoid using forget, it's really a special-case tool only. If a
message was marked incorrectly, feed it to --spam or --ham as needed.

(and no, running a mis-learned message through --forget first doesn't
change anything. If a message was learned as spam, and you feed it to
sa-learn --ham, SA is smart enough to do a forget and learn as ham in one
pass.)
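
As a rough illustration of the point above, the following toy model in Perl mimics the bookkeeping: a forget only undoes earlier training, while learning a message the other way undoes and relearns in one pass. Everything here (the %seen and %tokens hashes, the learn and forget subs, the message ID and the tokens) is hypothetical and far simpler than SpamAssassin's actual Bayes store.

```perl
use strict;
use warnings;

# Toy model only: %seen records how each message was learned,
# %tokens holds per-token [spam_count, ham_count]. Both are hypothetical
# stand-ins, not SpamAssassin's actual Bayes database layout.
my (%seen, %tokens);

# Forget a message: undo its earlier training, if there was any.
# Returns 1 if something was forgotten, 0 if the message was never learned.
sub forget {
    my ($id, @toks) = @_;
    my $as = delete $seen{$id} or return 0;
    my $col = $as eq 'spam' ? 0 : 1;
    $tokens{$_}[$col]-- for @toks;
    return 1;
}

# Learn a message as 'spam' or 'ham'. If it was previously learned the
# other way, the old training is undone first, in the same pass.
sub learn {
    my ($id, $as, @toks) = @_;
    return 0 if ($seen{$id} // '') eq $as;   # already learned this way
    forget($id, @toks);
    my $col = $as eq 'spam' ? 0 : 1;
    $tokens{$_}[$col]++ for @toks;
    $seen{$id} = $as;
    return 1;
}

my @toks = qw(cheap meds);

# Tagged as spam by rules, but never trained: nothing to forget.
printf "forgot: %d\n", forget('msg1', @toks);     # forgot: 0

# Learned as spam by mistake, then corrected with a single ham learn.
learn('msg1', 'spam', @toks);
learn('msg1', 'ham',  @toks);
print "cheap spam/ham: @{ $tokens{cheap} }\n";    # cheap spam/ham: 0 1
```

The first call corresponds to the "Forgot tokens from 0 message(s)" output quoted above: the message was examined, but there was no earlier training to undo.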





Re: Rule suggestion - smtp sanity

2007-07-14 Thread Matt Kettler
Matus UHLAR - fantomas wrote:
 On 13.07.07 17:04, arni wrote:
   
 From large providers I sometimes receive messages through encrypted 
 smtp, the header looks something like this (qmail):

 ...  with (AES256-SHA encrypted) SMTP; ...


 Would it be a good idea to give a minimal negative score on this -0.1 or 
 -0.2 if this happens on the last hop? - It proves that the sending smtp 
 server is very protocol sane, which spambots are usually not.
 

 it just proves that the mail was sent through a sane server, but there could
 be a spambot behind it.

 -0.1 and -0.2 are very small numbers. Do you encounter any case where that
 would help?

   
Autolearning.


Re: PDFText Plugin for PDF file scoring - not for PDF images

2007-07-14 Thread James MacLean

Dallas Engelken wrote, on 14/07/07 12:17 AM:

James MacLean wrote:

Hi folks,

Regrets if this is the wrong list.

Wanted to be able to score on text found in PDF files. Did not see 
any obvious route, so made a plugin that calls XPDF's pdfinfo and 
pdftotext to get the text that is then scored.


Sample local.cf could be :

pdftotext_cmd /usr/local/bin/pdftotext
pdfinfo_cmd /usr/local/bin/pdfinfo
body PDF_TO_TEXT eval:check_pdftext(^Error,sex,drugs,'Title:\s+stock_tmp.pdf:4','Creator:\s+OpenOffice.org 1.1.4:4')


Note that a :4 suffix gives a match of that regex 4 points.

Really don't know if this was the right road to follow, as I copied 
the AntiVirus.pm and came up with this:

http://support.ednet.ns.ca/SpamAssassin/PDFText.pm

So far... it appears to work as expected and didn't take down a 
pretty busy server ;).


Enjoy hearing any positive criticisms :).


I did this the other day with CAM::PDF, but Theo recommended this work 
should be done in the post_message_parse() plugin call.   Then you 
could just write body rules against the text, URIs would get checked 
by the URIDNSBL plugin, etc.


--
Dallas Engelken
[EMAIL PROTECTED]
http://uribl.com

I did start with keeping it all in Perl, but when I tested my first SPAM 
with the CAM::PDF utils, it resulted in just a bunch of space separated 
letters :(. Interested in getting something working, I switched to the 
XPDF utils. Maybe getpdftext.pl is not a good example of how the modules 
work?


Where do I find information on hooking into post_message_parse()? Tried 
grepping in the module area with no luck :(. Certainly agree it would be 
better to get the text out and let everyone at it :). I couldn't see how 
to do that when I started down this road. I was even first trying to see 
if Exim would add another attachment to the e-mail which would be the 
output of pdftotext, but again, wanted to get something running, so 
opted for what is there now :(.


Thanks,
JES


tests=[none]

2007-07-14 Thread Chris
Daily, at least 2 or 3 spam messages show tests=[none] on my ISP's markup
line. For one such message, my own SpamAssassin shows:

  X-Spam-Virus: Yes (Email.Spam.Gen983.Sanesecurity.07071002)
  X-Spam-Seen: Tokens 131
  X-Spam-New: Tokens 164
  X-Spam-Remote: Host localhost.localdomain
  X-Spam-ASN: AS4355 207.69.195.0/24
  X-Spam-Flag: YES
  X-Spam-Checker-Version: SpamAssassin 3.2.1 (2007-05-02) on cpollock.localdomain
  X-Spam-Hammy: Tokens 0
  X-Spam-Status: Yes, score=24.4 required=5.0 tests=BAYES_99=5,CLAMAV=10,
    DATE_IN_PAST_03_06=0.044,DCC_CHECK=2.17,DIGEST_MULTIPLE=0.001,
    DKIM_POLICY_SIGNSOME=0,PYZOR_CHECK=3.7,RAZOR2_CF_RANGE_51_100=0.5,
    RAZOR2_CF_RANGE_E4_51_100=1.5,RAZOR2_CHECK=0.5,SAGREY=1,STOX_REPLY_TYPE=0.001
    autolearn=disabled version=3.2.1
  X-Spam-Spammy: Tokens 33
  X-Spam-Pyzor: Reported 677 times.
  X-Spam-DCC: cpollock 104; Body=many Fuz1=many Fuz2=many

Yet their markup shows:

 X-Virus-Scanned: amavisd-new at
  Old-X-Spam-Score: 0
  Old-X-Spam-Level: 
  Old-X-Spam-Status: No, score=0 tagged_above=-10 required=6 tests=[none]

Their explanation for this is:

 It's not that they had no tests run, it's that they had all the tests run
 and the score came out as ZERO so no header was added.
 Jim...
 

That just doesn't sound right to me that all possible tests were run and there 
were no hits, but I guess it's possible.

-- 
Chris
KeyID 0xE372A7DA98E6705C




Re: Rule suggestion - smtp sanity

2007-07-14 Thread Dave Koontz
Most likely, Johnny Spammer monitoring this list will just add a FAKE
header to take advantage of such a rule.




Re: Rule suggestion - smtp sanity

2007-07-14 Thread John D. Hardin
On Sat, 14 Jul 2007, Dave Koontz wrote:

 Most likely, Johnny Spammer monitoring this list will just add a
 FAKE header to take advantage of such a rule.

You would only check it in the header that your MTA added.
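
A minimal Perl sketch of that idea follows. It assumes the receiving MTA prepends its own Received header, so only the topmost one is inspected, and it assumes the qmail-style wording from the example earlier in the thread. The sub name last_hop_cipher, the hostnames and the addresses are made up for illustration.

```perl
use strict;
use warnings;

# Return the cipher named in the topmost Received header, or undef.
# Assumption: the receiving MTA prepends its own Received header, so the
# first one in the message is the only one not under the sender's control.
sub last_hop_cipher {
    my ($headers) = @_;
    # Unfold continuation lines so each header field is one logical line.
    (my $unfolded = $headers) =~ s/\r?\n[ \t]+/ /g;
    my ($first) = $unfolded =~ /^Received:(.*)$/mi or return;
    my ($cipher) = $first =~ /\bwith \(([A-Za-z0-9_-]+) encrypted\) SMTP;/;
    return $cipher;
}

# Hypothetical headers: the last hop really was encrypted.
my $genuine = <<'EOT';
Received: from mail.example.net (192.0.2.10)
  by mx.example.org with (AES256-SHA encrypted) SMTP; 14 Jul 2007 10:00:00 -0000
Subject: hello
EOT

# Hypothetical headers: the sender forged an "encrypted" hop further down.
my $forged = <<'EOT';
Received: from bot.example.com (198.51.100.7)
  by mx.example.org with SMTP; 14 Jul 2007 10:05:00 -0000
Received: from nowhere by nowhere with (AES256-SHA encrypted) SMTP; 14 Jul 2007 09:00:00 -0000
Subject: hello
EOT

printf "%s\n", last_hop_cipher($genuine) // 'none';   # AES256-SHA
printf "%s\n", last_hop_cipher($forged)  // 'none';   # none
```

Because only the first Received header is examined, a forged line inserted by the sender further down has no effect on the result.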

--
 John Hardin KA7OHZ    http://www.impsec.org/~jhardin/
 [EMAIL PROTECTED]    FALaholic #11174    pgpk -a [EMAIL PROTECTED]
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
---
  Where We Want You To Go Today 07/05/07: Microsoft patents in-OS
  adware architecture incorporating spyware, profiling, competitor
  suppression and delivery confirmation (U.S. Patent #20070157227)
---
 10 days until The 38th anniversary of Apollo 11 landing on the Moon



Re: Rule suggestion - smtp sanity

2007-07-14 Thread Matt Kettler
1) that won't help any. You'd want to check this against headers
generated by trusted relays.

2) Even if he does, who cares. At such a small score it's unlikely to
help the spammer any. However, email which is marginally above the
autolearn threshold will be helped. (Personally, I get a reasonable
amount of low-scoring ham in the 0.1 to 0.3 range. I find very little
spam near the 5.0 threshold, and most of that is just under anyway.)
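
The arithmetic behind point 2 can be sketched in Perl as below. This is a simplification under stated assumptions: 0.1 is the documented default of bayes_auto_learn_threshold_nonspam, the real autolearner applies further conditions beyond a single comparison, and the sub name and sample scores are hypothetical.

```perl
use strict;
use warnings;

# Assumption: ham is auto-learned when the score falls below this value.
# 0.1 is the documented default of bayes_auto_learn_threshold_nonspam;
# check your own configuration.
my $ham_threshold = 0.1;
my $tls_bonus     = -0.1;   # the small negative score proposed in this thread

# Simplified check; the real autolearner has additional conditions.
sub autolearns_as_ham {
    my ($score, $threshold) = @_;
    return $score < $threshold ? 1 : 0;
}

for my $score (0.15, 0.3, 4.9) {      # hypothetical message scores
    my $adjusted = $score + $tls_bonus;
    printf "%.2f -> %.2f  before: %s  after: %s\n",
        $score, $adjusted,
        (autolearns_as_ham($score,    $ham_threshold) ? 'learn' : 'skip'),
        (autolearns_as_ham($adjusted, $ham_threshold) ? 'learn' : 'skip');
}
# 0.15 -> 0.05  before: skip  after: learn
# 0.30 -> 0.20  before: skip  after: skip
# 4.90 -> 4.80  before: skip  after: skip
```

Only the message sitting just above the ham threshold changes outcome, while the one near 5.0 is classified the same either way.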

Dave Koontz wrote:
 Most likely, Johnny Spammer monitoring this list will just add a FAKE
 header to take advantage of such a rule.




Re: tests=[none]

2007-07-14 Thread SM

At 07:34 14-07-2007, Chris wrote:

That just doesn't sound right to me that all possible tests were run and there
were no hits, but I guess it's possible.


Are you assuming that the two configurations are identical?  Yours 
has Bayes, DKIM verification, Pyzor and DCC enabled.  They may not be 
using those plugins.


Regards,
-sm 



Re: tests=[none]

2007-07-14 Thread Chris
On Saturday 14 July 2007 10:48 am, SM wrote:

 Are you assuming that the two configurations are identical?  Yours
 has Bayes, DKIM verification, Pyzor and DCC enabled.  They may not be
 using those plugins.


I know they're not using Bayes because it was so inaccurate that they quit 
using it. I realize they're not using the same tests or plug-ins as I am, it 
just doesn't make sense to me that an ISP could run all possible tests and 
have none of them hit.

-- 
Chris
KeyID 0xE372A7DA98E6705C




Re: tests=[none]

2007-07-14 Thread Jerry Durand

At 09:36 AM 7/14/2007, Chris wrote:

I realize they're not using the same tests or plug-ins as I am, it
just doesn't make sense to me that an ISP could run all possible tests and
have none of them hit.


I just removed the maximum size limit for scanning messages from 
amavisd-new because I came in today to a mailbox stuffed full of huge 
spam messages from some Asian company.  All had no tests due to the 
size.  At least they used their real name, they're now in my Postfix 
sender-reject file.


I wonder how much this will slow the server down, scanning large 
messages?  At least we don't service huge numbers of accounts like 
most of you do.



--
Jerry Durand, Durand Interstellar, Inc.  www.interstellar.com
tel: +1 408 356-3886, USA toll free: 1 866 356-3886
Skype:  jerrydurand



Re: PDFText Plugin for PDF file scoring - not for PDF images

2007-07-14 Thread Theo Van Dinter
On Sat, Jul 14, 2007 at 09:54:36AM -0300, James MacLean wrote:
 Where do I find information on hooking into post_message_parse()? Tried 
 grepping in the module area with no luck :(. Certainly agree it would be 
 better to get the text out and let everyone at it :).

You can ask. :)  But yes, I didn't do a good job of fully documenting how
this is supposed to work -- you have to know about the plugin call, then
hunt around Message and Message::Node, etc.  Sorry.  Here's the basics:

First, create a plugin with the post_message_parse method.  Then in
there, use $msg->find_parts() to find the parts that you're looking
for (find_parts() is pretty well documented).  Then, you simply take
the data from $part->decode() and do something to convert it to text.
Then you take that text and call $part->set_rendered($text).

Later on, when SA looks for the text to use for body rules, uri parsing,
etc, it takes anything that has rendered text.

So here's a quick n' dirty sample that takes parts of image/theo and
renders them into "The plugin works!\n":


package Mail::SpamAssassin::Plugin::RenderExample;

use Mail::SpamAssassin::Plugin;
use strict;
use warnings;

use vars qw(@ISA);
@ISA = qw(Mail::SpamAssassin::Plugin);

# Standard plugin constructor: defer to the base class, then re-bless.
sub new {
  my $class = shift;
  my $mailsaobject = shift;
  $class = ref($class) || $class;
  my $self = $class->SUPER::new($mailsaobject);
  bless ($self, $class);
  return $self;
}

# Called once the message has been parsed into its MIME parts.
sub post_message_parse {
  my ($self, $opts) = @_;
  my $msg = $opts->{'message'};
  # Find leaf parts of type image/theo and give each one rendered text.
  foreach my $p ( $msg->find_parts(qr!^image/theo$!, 1) ) {
    $p->set_rendered("The plugin works!\n");
  }
}

1;
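
Applied to the PDF case from earlier in the thread, the same hook might look roughly like the Perl sketch below. It is untested against a live SpamAssassin and rests on several assumptions: the subs convert_bytes and render_pdf_parts and the '{}' placeholder are invented here, the parts are assumed to be typed application/pdf, and the pdftotext path is taken from the sample local.cf above. The only SpamAssassin calls used are the three named in this message.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Run an external converter over raw attachment bytes and return its output.
# '{}' in @cmd is replaced by the name of a temp file holding the bytes.
sub convert_bytes {
    my ($bytes, @cmd) = @_;
    my ($fh, $file) = tempfile(UNLINK => 1);
    binmode $fh;
    print {$fh} $bytes;
    close $fh or return;
    my @argv = map { $_ eq '{}' ? $file : $_ } @cmd;
    open(my $out, '-|', @argv) or return;   # list form: no shell involved
    local $/;                               # slurp everything at once
    my $text = <$out>;
    close $out;
    return $text;
}

# Meant to be called from a plugin's post_message_parse with
# $opts->{'message'}. Uses only find_parts(), decode() and set_rendered().
sub render_pdf_parts {
    my ($msg, @cmd) = @_;
    my $count = 0;
    foreach my $p ( $msg->find_parts(qr!^application/pdf$!i, 1) ) {
        my $text = convert_bytes($p->decode(), @cmd);
        next unless defined $text && length $text;
        $p->set_rendered($text);
        $count++;
    }
    return $count;   # number of parts that received rendered text
}

# Assumed converter command; pdftotext writes to stdout when the output
# file is given as '-'. Adjust the path for your system.
my @pdftotext = ('/usr/local/bin/pdftotext', '{}', '-');
```

Inside post_message_parse the call would be render_pdf_parts($opts->{'message'}, @pdftotext). Keeping the converter command as a parameter makes it easy to try the helper with a harmless command such as cat before pointing it at real attachments.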


-- 
Randomly Selected Tagline:
I'm a programmer: I don't buy software, I write it. - Tom Christiansen




announce: urlx utility for spamassassin

2007-07-14 Thread Michael W Cocke
Most systems that I'm familiar with nowadays have the users put spam
emails that manage to get past the filters into a special folder
(directory) so they can be examined, in order to make the spam filter
system more effective. In pursuit of that idea, I've written urlx.

Urlx is designed to extract URLs, both clear and obfuscated, from
those spam emails and convert them into SpamAssassin rules
automatically (Note: When I say automatic, I still expect a human to
apply a sanity check somewhere).

Urlx is not yet released to the general public, but if you're
interested in helping test, please drop me an email.

Mike-
--
If you're not confused, you're not trying hard enough.
--
Please note - Due to the intense volume of spam, we have installed 
site-wide spam filters at catherders.com.  If email from you bounces,
try non-HTML, non-encoded, non-attachments,


plugin to test attachments from unknown senders

2007-07-14 Thread Eric A. Hall

Like other folks I've been getting hit with the PDF spam pretty hard. I
think the way to solve this and the image spam in general is to do a
plugin that does two things:

 1) looks in the message to see if there is a binary attachment

 2) looks in the AWL to see if the sender tuple is known

 3) if (1==true) && (2==false) fire a score

I've been meaning to adapt my SAGREY plugin [1] for this but have not had
time and may not have time for a while yet, so I thought I'd throw this
out there to see if anybody else is interested in doing it

[1] http://www.ntrg.com/misc/sagrey/

-- 
Eric A. Hall                                    http://www.ehsco.com/
Internet Core Protocols  http://www.oreilly.com/catalog/coreprot/


Re: Rule suggestion - smtp sanity

2007-07-14 Thread Eric A. Hall

On 7/13/2007 11:04 AM, arni wrote:
  From large providers I sometimes receive messages through encrypted 
 smtp, the header looks something like this (qmail):
 
 ...  with (AES256-SHA encrypted) SMTP; ...
 
 
 Would it be a good idea to give a minimal negative score on this -0.1 or 
 -0.2 if this happens on the last hop? - It proves that the sending smtp 
 server is very protocol sane, which spambots are usually not.

It's a good idea to look at last-hop transfer and see if it used STARTTLS,
if the certificate was valid, etc., and is something I've got on my to-do
list for future development.

The big problem is that there is no real standard and every MTA records
the details differently.

-- 
Eric A. Hall                                    http://www.ehsco.com/
Internet Core Protocols  http://www.oreilly.com/catalog/coreprot/


RE: plugin to test attachments from unknown senders

2007-07-14 Thread Dan Barker
Aren't spammer tuples in the AWL too? I thought that it averaged both ways;
Country AND Western.

Dan 




Help with a multi-line mode rule

2007-07-14 Thread Jeremy Fairbrass
Hi all,
I hope someone can help me with a rule I'm trying to write. My understanding of
the multi-line mode, with the /m switch at the end, is this: in this mode, the
caret (^) and dollar ($) match before and after newlines in the string. Is that
correct?

I believe this is the correct method for allowing me to use a "full" rule (i.e.
searching the entire undecoded message) while also using carets and dollars
within the regex, right?

So I think this should mean that I can have some text like this, for example:

Subject: this is a test
From: [EMAIL PROTECTED]
X-Return-Path: [EMAIL PROTECTED]

...and create a rule like the following which should hit on it:

full MYRULE /^Subject:.* test$(?:\s(?!X-Return-Path).*)+\sX-Return-Path: [EMAIL PROTECTED]/m

Right? If I test this rule using the Regex Coach tool at
http://weitz.de/regex-coach/ (I'm on Windows), with the 'm' switch enabled,
the rule works fine. But when I test it with SpamAssassin, it doesn't work, and
I believe it's due to the caret and dollar.

However, I specifically want to require that the word "test" be at the very
end of the Subject line; hence, I want to have the $ after it. I also want to
require that the X-Return-Path be there, which is why I have the rest of the
rule the way it is, but that's not the issue.

What am I doing wrong?

(Of course in reality I'm not searching for the above strings, I'm trying to
catch a particular spam sign, but this is a simple example of the method I'm
using.)
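
For reference, the /m behaviour described above can be checked in plain Perl, outside SpamAssassin, with a sketch like the one below. The address user@example.com is a hypothetical stand-in for the redacted ones, and the hits sub is made up for the demonstration. The CRLF lines at the end only show a general property of $ under /m; they are something to check, not a diagnosis of the problem described here.

```perl
use strict;
use warnings;

# Hypothetical sample text; user@example.com stands in for the
# redacted addresses in the message above.
my $msg = join "\n",
    'Subject: this is a test',
    'From: user@example.com',
    'X-Return-Path: user@example.com',
    '';

# Single-quoted so that Perl leaves $, \s and @ alone before the regex
# engine sees them.
my $rule = '^Subject:.* test$(?:\s(?!X-Return-Path).*)+\sX-Return-Path: user@example\.com';

# Return 1 if the text matches the compiled regex, otherwise 0.
sub hits { my ($text, $re) = @_; return $text =~ $re ? 1 : 0 }

printf "with /m:    %d\n", hits($msg, qr/$rule/m);   # 1
printf "without /m: %d\n", hits($msg, qr/$rule/);    # 0

# Under /m, $ matches only just before "\n" (or at the very end), so a
# carriage return left over from CRLF line endings sits in the way.
(my $crlf = $msg) =~ s/\n/\r\n/g;
printf "CRLF input: %d\n", hits($crlf, qr/^Subject:.* test$/m);       # 0
printf "CRLF, \\r?: %d\n", hits($crlf, qr/^Subject:.* test\r?$/m);    # 1
```

Without /m the rule fails because $ can then only match at the end of the whole string, which is the difference the question is about.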

Cheers,
Jeremy 





Re: plugin to test attachments from unknown senders

2007-07-14 Thread SM

At 12:49 14-07-2007, Eric A. Hall wrote:


Like other folks I've been getting hit with the PDF spam pretty hard. I
think the way to solve this and the image spam in general is to do a
plugin that does two things:

 1) looks in the message to see if there is a binary attachment

 2) looks in the AWL to see if the sender tuple is known

 3) if (1==true) && (2==false) fire a score


You might also verify the AWL score in step 2 and fire step 3 if 
that score is above an arbitrary value.  Note that your rule may 
trigger false positives for one-time senders.
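
The three steps, together with the score check suggested here, could be sketched in Perl as follows. This is only one reading of the proposal: the %awl hash is a hypothetical stand-in for a real AWL lookup, and the tuple format, the should_fire sub and the threshold of 5 are all invented for the example.

```perl
use strict;
use warnings;

# Hypothetical stand-in for an AWL lookup: maps a sender tuple
# ("address|network") to the message count and total score seen so far.
my %awl = (
    'friend@example.org|192.0.2'   => { count => 12, totscore => -6.0 },
    'pills@example.com|198.51.100' => { count => 3,  totscore => 45.0 },
);

# Decide whether the proposed rule should fire.
#   $has_binary - true if the message carries a binary attachment (step 1)
#   $tuple      - sender tuple to look up (step 2)
#   $max_mean   - optional: also fire for known senders whose mean
#                 score is above this value
sub should_fire {
    my ($has_binary, $tuple, $max_mean) = @_;
    return 0 unless $has_binary;
    my $entry = $awl{$tuple};
    return 1 unless $entry && $entry->{count} > 0;     # unknown sender
    return 0 unless defined $max_mean;
    return ($entry->{totscore} / $entry->{count}) > $max_mean ? 1 : 0;
}

printf "%d\n", should_fire(1, 'stranger@example.net|203.0.113');    # 1
printf "%d\n", should_fire(1, 'friend@example.org|192.0.2');        # 0
printf "%d\n", should_fire(1, 'pills@example.com|198.51.100');      # 0
printf "%d\n", should_fire(1, 'pills@example.com|198.51.100', 5);   # 1
printf "%d\n", should_fire(0, 'stranger@example.net|203.0.113');    # 0
```

The third and fourth calls show why the mean-score check matters: a sender who is in the AWL only because of earlier spam counts as "known" unless the stored score is also consulted. The first call is the one-time-sender case that can produce a false positive.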


Regards,
-sm 



Re: RDNS_NONE and Qmail?

2007-07-14 Thread Jason Haar
Matthew Yette wrote:
 I'm currently running qmail 1.03, SA 3.20 with qmail-scanner 1.25st.
 Every single piece of mail that runs through the system gets hit with
 RDNS_NONE, which adds 0.1 points to the score. Not a major deal - and
 if there isn't a fix, it wouldn't be a problem - but I figured I'd try
 to make things perfect if possible. :)
  
There was a change in SA around 3.2.1 whereby it no longer relies on its
own code to do PTR lookups (rDNS) of the MTAs showing in the Received:
headers. Instead it relies on the local MTA to have done it and written
it into the header field.

By default Qmail doesn't do rDNS lookups (performance reasons), so you
need to change tcpserver to do them - which then makes SA happy again.

i.e. you want "tcpserver -h" instead of "tcpserver -H"


-- 
Cheers

Jason Haar
Information Security Manager, Trimble Navigation Ltd.
Phone: +64 3 9635 377 Fax: +64 3 9635 417
PGP Fingerprint: 7A2E 0407 C9A6 CAF6 2B9F 8422 C063 5EBB FE1D 66D1