date:20061103


John Rudd wrote:


I've put up a new version of Relay checker, in

...
I expect I might, at some point, switch from using a dynamic score in 
the plugin, to a normal score.  But that's the only change I expect to 
make, aside from bug fixes (if there are any), and/or a switch to using 
Net::DNS.


I wonder if there is any way for a plugin to hook into SA's DNS routines.  That might be better than 
calling Net::DNS directly.

RE: BIG increase in spam today

2006-11-03 Thread Bret Miller

 Am Donnerstag, 2. November 2006 16:04 schrieb Amos:
 (...)
  Actually, it's getting to the extent that some at work are raising
  questions as to whether our SA setup will be able to
 maintain adequate
  protection from this growing onslaught.
 
  Amos

 Only AFTER adequate initial RBL filtering. Spamhaus does a
 great job here.


It's not doing as well as it used to here. The amount of spam that SA
is processing is about 4X what it was in January. If this keeps up, we'll
have to look at other possible options, maybe more RBLs?

Bret

Re: BIZ_TLD and INFO_TLD

2006-11-03 Thread Loren Wilton

Still seem to be mostly spammers here.  There is a slight increase in ham, 
but I don't think it would really change the scores all that much.  I have 
both of these domains scored at 5 with no problems.


   Loren

Re: Block wrote: spams

2006-11-03 Thread Loren Wilton




I haven't seen any of these. But if the spams universally have 
"single word wrote: stuff" as the subject then I'd consider a 
more stringent rule:

 /^\w+\s+wrote:/i

or
 /^(?:\w+\s+){1,2}wrote:/i

or
 /^(?:re:\s*|fw:\s*){0,20}(?:\w+\s+){1,2}wrote:/i

  Loren
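
A rough way to try patterns like these outside SpamAssassin is sketched below. It is only an illustration: the helper name `subject_matches` and the sample subjects are made up, and it uses POSIX `grep -E` classes (`[[:alnum:]_]`, `[[:space:]]`) as a stand-in for Perl's `\w` and `\s`, so it approximates what SA's regex engine would do rather than reproducing it.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the subject ($1) matches the
# "one or two words, then wrote:" pattern, translated to POSIX ERE.
subject_matches() {
    printf '%s\n' "$1" |
        grep -E -i -q '^([[:alnum:]_]+[[:space:]]+){1,2}wrote:'
}

# Made-up sample subjects for illustration only.
for subject in 'Bob wrote:' 'Mary Ann wrote: hello' 'Re: what Bob wrote: yesterday'; do
    if subject_matches "$subject"; then
        echo "HIT   $subject"
    else
        echo "miss  $subject"
    fi
done
```

One thing worth checking in the original rule: in `/\bwrote\:\b/i` the trailing `\b` only matches when a word character directly follows the colon, so a subject where the colon is followed by a space would probably not hit.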


  - Original Message - 
  From: Juan Mas 
  To: MIKE YRABEDRA 
  Cc: spamassassin-users 
  Sent: Friday, November 03, 2006 7:15 AM
  Subject: Re: Block "wrote:" spams

  Ive been getting the same and just wrote a rule for it today. Ive got what
  you have listed below. Havent tested it though.

  On 11/3/06, MIKE YRABEDRA  [EMAIL PROTECTED] wrote:

  I am getting a lot of these "Bob wrote: " spams

  Anyone know a way to write the rule so if the subject has "wrote:" in the
  subject, tag it?

  Here is what I have?

  header WROTE_SUB  Subject =~ /\bwrote\:\b/i
  describe WROTE_SUB  Wrote in Subject
  score WROTE_SUB   3.0

  --
  Mike Yrabedra B^)

  -- 
  -Juan

Re: BIZ_TLD and INFO_TLD

at 2006. november 3. 18.20 Loren Wilton wrote:
 Still seem to be mostly spammers here.  There is a slight increase in ham,
 but I don't think it would really change the scores all that much.  I have
 both of these domains scored at 5 with no problems.
Why don't you use the simplex algorithm (or similar) to compute optimal scores?
-- 
With regards: Imre Péntek
E-Mail: [EMAIL PROTECTED]

Bayesian scores

Hello,

Why does BAYES_99 have a score of only 3.5 when 5.0 is required to identify a mail 
as spam? I think this rule should have a score of about 5.1 (or anything greater 
than 5.0).
-- 
With regards: Imre Péntek
E-Mail: [EMAIL PROTECTED]

Re: Bayesian scores

2006-11-03 Thread Jim Maul


Péntek Imre wrote:

Hello,

Why BAYES_99 have only the score 3.5 while 5.0 is required to identify a mail 
as spam? I think this rule should have a score about 5.1 (or anything greater 
than 5.0).


Because if it's wrong in its classification, then that one rule alone will 
cause an FP.  The whole idea is that no single rule should cause a message to be 
tagged either way (except maybe whitelist/blacklist).


Anyway, if you want, change the score of the rule.  I've upped the 
scores on almost all bayes rules here because history has shown it to be 
incredibly accurate here.


-Jim

Re: Block wrote: spams

2006-11-03 Thread Justin Mason


there's a rule that matches them in 3.1.x sa-update, fwiw.

--j.

Loren Wilton writes:
 I haven't seen any of these.  But if the spams universally have single 
 word wrote: stuff as the subject then I'd consider a more stringent rule:
 
 /^\w+\s+wrote:/i
 
 or
 /^(?:\w+\s+){1,2}wrote:/i
 
 or
 /^(?:re:\s*|fw:\s*){0,20}(?:\w+\s+){1,2}wrote:/i
 
 Loren
 
   - Original Message - 
   From: Juan Mas 
   To: MIKE YRABEDRA 
   Cc: spamassassin-users 
   Sent: Friday, November 03, 2006 7:15 AM
   Subject: Re: Block wrote: spams
 
 
   Ive been getting the same and just wrote a rule for it today.  Ive got what 
 you have listed below.  Havent tested it though.
 
 
   On 11/3/06, MIKE YRABEDRA  [EMAIL PROTECTED] wrote:
 
 
 I am getting a lot of these Bob wrote:  spams 
 
 Anyone know a way to write the rule so if the subject has wrote: in the
 subject, tag it?
 
 Here is what I have?
 
 header WROTE_SUB  Subject =~ /\bwrote\:\b/i
 describe WROTE_SUB  Wrote in Subject 
 score WROTE_SUB   3.0
 
 
 
 
 --
 Mike Yrabedra B^)
 
 
 
 
 
 
 
   -- 
   -Juan

blocking mail gateways

2006-11-03 Thread dragin33


I have started to receive a flood of spam that is getting through
SpamAssassin on my server.  I have my score set to 4, which I don't think is too
high, but this spam is coming through sometimes with scores of .5 or 1.  I
want to be able to block the email gateways these things are being sent
from.  I only have limited configuration of the SpamAssassin server through
a web interface (I'm on a shared hosted server).  It has a score blank and
I was wondering if there is something I can put in there to tell it to block
emails coming through these addresses.  Thanks.

Re: Bayesian scores

Jim Maul wrote:
 I've upped the scores on almost all bayes rules here because history has
 shown it to be incredibly accurate here.
Yes. BTW, so far I've had no FPs but still get false negatives with BAYES_99 
at score 3.5, using this database:
[5816] dbg: bayes: corpus size: nspam = 2757, nham = 1403
Built from scratch by myself, still growing.
With a database this large there is very little chance of a mistaken Bayesian 
score, and as I built it from scratch, I can say that the same held while the 
database was still small. So I will use a score of 5.1 for BAYES_99, and I 
still suggest using this in the SA distribution too. 
Thanks for helping me anyway.
-- 
With regards: Imre Péntek
E-Mail: [EMAIL PROTECTED]

Re: Bayesian scores

2006-11-03 Thread Jim Maul


Péntek Imre wrote:

Jim Maul wrote:

I've upped the scores on almost all bayes rules here because history has
shown it to be incredibly accurate here.
Yes. BTW so far I've got no FP but still get false negatives with score 3.5, 
BAYES_99, using this database:

[5816] dbg: bayes: corpus size: nspam = 2757, nham = 1403
Built from scratch by myself, still growing.
As I have so big database there's very little possibility of mistaken bayesian 
score, but as I've built this database from scratch, I can also state that 
the same stands for little bayesian databases too. So I will use score 5.1 
for BAYES_99, and still suggest to use this in the SA distribution too. 
Thanks for helping me anyways.



If you are getting false negatives with 3.5 then you need to find a way 
to get more rules to hit.  My average spam score here is 16.1 which is 
way over my 5.0 threshold.  The trick is to increase the distance 
between your average spam and ham scores as much as possible and then 
you can run with a higher spam threshold.  If you have spam not getting 
tagged, you should increase rules that trigger, not lower your threshold.


Are you using network tests, razor, surbl, add on rules from sare, etc?

-Jim
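
To see how far apart the average spam and ham scores actually are, something like the following sketch could be used. It assumes the SA 3.x header layout (`X-Spam-Status: Yes, score=16.1 required=5.0 ...`); the helper name `avg_scores` and the sample header lines are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: read X-Spam-Status header lines on stdin and print
# the mean score of tagged ("Yes") and untagged ("No") messages.
# Assumes the SA 3.x layout: "X-Spam-Status: Yes, score=16.1 required=5.0 ...".
avg_scores() {
    awk '
        /^X-Spam-Status:/ {
            verdict = ($2 ~ /^Yes/) ? "spam" : "ham"
            for (i = 3; i <= NF; i++) {
                if ($i ~ /^score=/) {
                    sub(/^score=/, "", $i)
                    sum[verdict] += $i
                    count[verdict]++
                }
            }
        }
        END {
            if (count["spam"]) printf "spam avg: %.1f\n", sum["spam"] / count["spam"]
            if (count["ham"])  printf "ham avg: %.1f\n",  sum["ham"]  / count["ham"]
        }
    '
}

# Made-up sample headers; real input would come from stored mail.
avg_scores <<'EOF'
X-Spam-Status: Yes, score=16.1 required=5.0 tests=BAYES_99
X-Spam-Status: No, score=0.4 required=5.0 tests=BAYES_00
EOF
```

On a real mailbox the input could come from a `grep -h '^X-Spam-Status:'` over the stored messages; where those live depends on the setup.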

Re: Enabling/testing SPF?

2006-11-03 Thread Henry Kwan

Ramprasad ram at netcore.co.in writes:
 
 
 spamassassin -D < file 2>&1 | grep -i spf 
 
 check the output
 
 which MTA do you use ? Your MTA must insert an X-Envelope-From: header
 ( or similar )
 
 Thanks
 Ram
 
 

Hi.

I'm using sendmail so I see that I have to modify sendmail.cf by adding
H?l?X-Envelope-From: $f.  By the way, how can I add
that bit via sendmail.mc instead of modifying sendmail.cf directly?

Anyway, this is what I get with the sample non-spam message:

[EMAIL PROTECTED] Mail-SpamAssassin-3.1.7]# spamassassin -D < sample-nonspam.txt 2>&1 | grep -i spf
[25342] dbg: config: read file /usr/share/spamassassin/25_spf.cf
[25342] dbg: config: read file /usr/share/spamassassin/60_whitelist_spf.cf

Even with a piece of mail that I had saved from a domain that has a confirmed
SPF record (and a X-Envelope-From: header), I get
the same output as above.

Thanks.

--Henry

P.S.  Sorry if this is a dupe.  Wasn't sure if this got sent as Pine complained
about my mailbox when I tried to send it earlier.

Re: Bayesian scores

Jim Maul wrote:
 Are you using network tests, razor, surbl, add on rules from sare, etc?
I can only guess, as I don't know how to check for sure.
I can find several spams marked with:
RCVD_IN_BL_SPAMCOP_NET
UNPARSEABLE_RELAY
URIBL_AB_SURB
Do these mean I am also using network tests?
As far as I can see I don't use Razor; I will read the wiki page about it.
-- 
With regards: Imre Péntek Jr.
E-Mail: [EMAIL PROTECTED]

R: BIZ_TLD and INFO_TLD

2006-11-03 Thread Giampaolo Tomassoni

 at 2006. november 3. 18.20 Loren Wilton wrote:
  Still seem to be mostly spammers here.  There is a slight 
 increase in ham,
  but I don't think it would really change the scores all that 
 much.  I have
  both of these domains scored at 5 with no problems.
 Why don't you use simplex algorithm (or similar) to compute 
 optimal scores?

I don't have a reliable ham corpus: my customers mostly use pop3...

---
Giampaolo Tomassoni - IT Consultant
Piazza VIII Aprile 1948, 4
I-53044 Chiusi (SI) - Italy
Ph: +39-0578-21100

MAI inviare una e-mail a:
NEVER send an e-mail to:
 [EMAIL PROTECTED]

 -- 
 With regards: Imre Péntek
 E-Mail: [EMAIL PROTECTED]

Re: Bayesian scores

2006-11-03 Thread Jim Maul


Péntek Imre wrote:

Jim Maul wrote:

Are you using network tests, razor, surbl, add on rules from sare, etc?

I can just guess, as I don't know how to get to be sure.
I can find several spams marked with:
RCVD_IN_BL_SPAMCOP_NET
UNPARSEABLE_RELAY
URIBL_AB_SURB
Are these mean I also use network tests?



I am not sure.  It would seem so to me.  Make sure you do not have -L 
being passed when starting spamd.



As I see I don't use razor, I will read the wikipage about it.


Definitely! Razor is by far one of the top performing rules on many SA 
setups.  It works great.


-Jim

Re: R: BIG increase in spam today


Federico Giannici wrote:

François Rousseau wrote:

Greylisting is not always good...

Greylisting inserts a delay in delivery, and sometimes the email has 
to be delivered fast. 


I don't trust DNSBLs enough to completely block an email based only on 
them.


What about combining BlackListing and GreyListing?
I'd like to use GreyLists (with long delay) for BlackListed emails only.

Has anybody already implemented it?
Is there already something able to implement it?


This was asked on the Postfix list recently:

http://groups.google.com/group/list.postfix.users/browse_thread/thread/5146269c41c5ca9d

The best answer was:

http://www.orangegroove.net/code/marbl/

Re: Bayesian scores

Jim Maul wrote:
 I am not sure.  It would seem so to me.  Make sure you do not have -L
 being passed when starting spamd.
I've started reading that wikipage, so now I can test for sure:
$ spamassassin -t -D < spam > output 2>&1
$ grep network output
[6639] dbg: pyzor: network tests on, attempting Pyzor
[6639] dbg: reporter: network tests on, attempting SpamCop

Thanks for the suggestions.
-- 
With regards: Imre Péntek
E-Mail: [EMAIL PROTECTED]

Re: sa-learn training question(s)

2006-11-03 Thread Jason Wellman

Thanks for the feedback. One last question that I am currently tossing
around: sitewide vs individual learning. I have a small domain, fewer
than 50 users. Should I be looking at setting up a sitewide bayes
database instead of individual ones? Again I find conflicting
information when I dig into it on the web. I find myself thinking that
one person's spam may be another's legitimate advertising...

- Jason

On 11/3/06, Bowie Bailey [EMAIL PROTECTED] wrote:
 Matt Kettler wrote:
 Jason Wellman wrote:
 ...
 I have all incoming mail that is tagged as Spam delivered to a
 CaughtSpam IMAP box for each user.
 ...

 Should I also have sa-learn from the CaughtSpam folder? I have read
 some places that say yes, and some that say no.

 YES. Those that say no clearly do not know what they're talking about.

 Ummm... Do you really want to sa-learn from an unverified spam folder?

 Lets face it.. if there was no point in learning tagged spam, why does
 the autolearner only kick in on high-scoring spam?

 The autolearner kicks in only on high-scoring spam to avoid learning
 from false positives. Learning from the CaughtSpam folder is like
 dropping the autolearn threshold down to 5.0 and removing the
 header/body score requirements.

 That said, it will only learn the caught spam that wasn't already
 autolearned, but this is actually quite valuable as it will generally
 contain more of the borderline spam which is important for bayes to
 know about.

 You do want to learn from as much spam (and ham) as possible, but you
 want a human to sort it first. I would say that you should only learn
 from the IsSpam folder and encourage your users to copy the spam
 over from the CaughtSpam folder to the IsSpam folder after they've
 verified that there are no false positives.

 Second question. It is easy to tell a user (and some of mine are
 non-tech folks) to put Spam in the IsSpam folder, but there isn't a
 way to really tell them that they need to put HAM in a certain folder,
 they just don't understand it. So my second question is how are
 people feeding sa-learn good HAM?

 That depends a lot on the user. Some are good, some not so good. Most
 will generally do this only when they're getting FPs, but that's still
 handy.

 Agreed. Just tell them to do it. If they do, great! If not, you
 still get the false positives. In the end, they are the ones
 responsible for making the Bayes DB effective. If they don't want to
 help, there's not much you can do about it.

 I was toying with the idea of feeding in peoples Sent folders along
 with all messages from their INBOX and Trash that were marked as read
 (I can pull these out using mboxgrep). This would also give me a
 larger sample of HAM than Spam which I understand is a good thing.
 Can anyone poke holes in my logic on this, or point out a better
 source for me to scrape HAM to feed sa-learn?

 Well, doing inbox and trash, you'll autolearn any false-negatives that
 your user happened to read and did not move to the IsSpam.. If you
 don't trust them to force-feed good ham, this might not be a good
 idea. Sent would appear to be fine.. unless your users are really dumb
 and frequently reply to spam.

 Ditto. Sent might be ok, but Trash is probably a bad idea.

 --
 Bowie

Re: Enabling/testing SPF?

2006-11-03 Thread Henry Kwan

Ramprasad ram at netcore.co.in writes:

 
 spamassassin -D < file 2>&1 | grep -i spf 
 
 check the output
 
 which MTA do you use ? Your MTA must insert an X-Envelope-From: header
 ( or similar )
 
 Thanks
 Ram
 

Hi,

After some more banging my head against the wall, I discovered that SPF checking
was disabled because I wasn't loading the plugin in my init.pre.  Apparently my
init.pre is so old that it never included a section on SPF.  Every time I
upgraded, the new version of SA would not replace my old init.pre, so the SPF
plugin was never getting loaded.  After I inserted the loadplugin section into
init.pre and restarted spamd, SPF checking is now working.

Doh!

Thanks for your help.
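
For anyone hitting the same problem, a quick check along these lines may help. It is a sketch only: the helper name `spf_plugin_loaded` is made up, and the default path is an assumption, since init.pre lives wherever the site configuration directory is.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the given .pre file has an active
# (uncommented) loadplugin line for the SPF plugin.
spf_plugin_loaded() {
    grep -E -q '^[[:space:]]*loadplugin[[:space:]]+Mail::SpamAssassin::Plugin::SPF([[:space:]]|$)' "$1"
}

# The default path is an assumption; pass the real one as the first argument.
pre_file=${1:-/etc/mail/spamassassin/init.pre}
if [ -r "$pre_file" ]; then
    if spf_plugin_loaded "$pre_file"; then
        echo "SPF plugin is loaded in $pre_file"
    else
        echo "no active SPF loadplugin line in $pre_file"
    fi
fi
```

If the line is missing or commented out, adding `loadplugin Mail::SpamAssassin::Plugin::SPF` to init.pre and restarting spamd is the fix described above.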

How to disable IADB

2006-11-03 Thread Henk van Lingen


Hi,

One of my users gets lots of similar UCE, and learning doesn't help
a bit. Investigating the report headers, it seems the mails trigger
'IADB' rules, which seem to be an RBL whitelist
(70_iadb.cf & 20_dnsbl_tests.cf).

Is there a way to disable this 'feature' without editing those files?

Regards,
-- 
Henk van Lingen, Systems & Network Administrator
Dept. of Computer Science, Utrecht University.
phone: +31-30-2535278
http://henk.vanlingen.net/ http://www.tuxtown.net/netiquette/

Re: How to disable IADB

2006-11-03 Thread Theo Van Dinter

On Fri, Nov 03, 2006 at 09:02:46PM +0100, Henk van Lingen wrote:
 Is there a way to disable this 'feature' without editing those files?

Set the rule scores to 0.

-- 
Randomly Selected Tagline:
She's gonna say my name!
 
--Ralph Wiggum
  Lisa Gets an A (Episode AABF03)



Re: SA TIMED OUT message debian sarge

2006-11-03 Thread Simon


On 11/3/06, Mark Martinec [EMAIL PROTECTED] wrote:

On Friday November 3 2006 05:23, Matt Kettler wrote:
 I believe the option is $sa_timeout
 Not sure what the default is, probably 30. Which should be enough to
 prevent that problem, unless you have a LOT of sa instances contending
 for the AWL database.
 Try adding a $sa_timeout = 60 to your Amavisd.conf and  lock_method
 flock to your spamassassin/local.cf (if you don't use NFS for DB storage.)


Thanks for all the replies on this topic. With a combination of the
answers, I *seem* to have it sorted, along with a couple of good hints
to increase speed etc.

Thanks again.

Re: Relay Checker plugin v0.2

2006-11-03 Thread John Rudd


Stuart Johnston wrote:

John Rudd wrote:


I've put up a new version of Relay checker, in

...
I expect I might, at some point, switch from using a dynamic score in 
the plugin, to a normal score.  But that's the only change I expect to 
make, aside from bug fixes (if there are any), and/or a switch to 
using Net::DNS.


I wonder if there is any way for a plugin to hook into SA's DNS 
routines.  That might be better than calling Net::DNS directly.



If anyone knows of a way, I'd look into it.   I need to do both fwd and 
reverse lookups though.

Re: How to disable IADB

2006-11-03 Thread Henk van Lingen

On Fri, Nov 03, 2006 at 03:06:10PM -0500, Theo Van Dinter wrote:
   On Fri, Nov 03, 2006 at 09:02:46PM +0100, Henk van Lingen wrote:
Is there a way to disable this 'feature' without editing those files?
   
   Set the rule scores to 0.

  Oke, of course. There are however 28 such rules at the moment.

  grep IADB /var/lib/spamassassin/3.001007/*/* | grep score | wc
       28      87    2879

  They all get tested every time.

  I'd hoped for a 'skip_rbl_checks alike' check, or something.

  Thanks anyways,

-- 
Henk van Lingen, Systems & Network Administrator
Dept. of Computer Science, Utrecht University.
phone: +31-30-2535278
http://henk.vanlingen.net/ http://www.tuxtown.net/netiquette/

RE: sa-learn training question(s)

2006-11-03 Thread Bowie Bailey

Jason Wellman wrote:
 Thanks for the feedback.  One last question that I am currently
 tossing around.  Sitewide vs individual learning... I have a small
 domain, less then 50 users.  Should I be looking at setting up a
 sitewide bayes database instead of individual ones?  Again I find
 conflicting information when I dig into it on the web.  I find myself
 thinking that one persons Spam may be another's legitimate
 advertising...  

In general, individual databases are better than site-wide *IF* they
are well trained.  A well trained site-wide database is better than a
bunch of individual databases that only get autolearning.

Also, keep in mind that each database will require learning from 200
ham and 200 spam before it becomes operational.  So make sure your
users aren't expecting an overnight improvement.

-- 
Bowie
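
Whether a given database has reached that minimum can be read off the "corpus size" line in the debug output. The sketch below assumes the line format seen elsewhere in this thread (`dbg: bayes: corpus size: nspam = 2757, nham = 1403`); the helper name `bayes_ready` is made up.

```shell
#!/bin/sh
# Hypothetical helper: read SpamAssassin debug output on stdin, find the
# "corpus size" line, and report whether both counts reach the minimum
# (default 200, per the figure quoted above).
bayes_ready() {
    min=${1:-200}
    awk -v min="$min" '
        /bayes: corpus size:/ {
            # Line looks like: "... corpus size: nspam = 2757, nham = 1403"
            line = $0
            sub(/.*nspam = /, "", line); nspam = line + 0
            sub(/.*nham = /, "", line);  nham = line + 0
            found = 1
        }
        END {
            if (!found) { print "no corpus size line seen"; exit 2 }
            printf "nspam=%d nham=%d\n", nspam, nham
            ok = (nspam >= min && nham >= min)
            exit (ok ? 0 : 1)
        }
    '
}

# Sample line taken from earlier in this thread.
echo '[5816] dbg: bayes: corpus size: nspam = 2757, nham = 1403' | bayes_ready
```

In practice the input would be the output of a debug run such as `spamassassin -D < message 2>&1`, piped into the function.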

Re: Spam

2006-11-03 Thread Markus Braun




you will get a format that's more suitable to put in the headers.
What do you mean? What do these two options do? I found nothing on the
SpamAssassin site.


At the moment I use Bayes and the emails are marked like this in the header:

But some emails come through the SpamAssassin filter with headers like 
this:


Return-path: [EMAIL PROTECTED]
Envelope-to: [EMAIL PROTECTED]
Received: from xdsl-10369.wroclaw.dialog.net.pl ([84.40.242.129])
by 89-149-XXX-125.internetserviceteam.com with esmtp (Exim 4.50)
id 1GfnGR-0002Fg-Nw
for [EMAIL PROTECTED]; Fri, 03 Nov 2006 01:50:36 +0100
Message-ID: [EMAIL PROTECTED]
From:   cases [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Subject: me configure instead editing
Date:   Fri, 3 Nov 2006 01:51:19 +0100
MIME-Version: 1.0
Content-Type: multipart/related;
type=multipart/alternative;
boundary==_NextPart_000_0006_01C6FEEA.8EE1C180
X-Priority: 3
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook Express 6.00.2900.2869
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.2962

It contains a picture in the email body and this text:



Function active am way! Destroy is without am closing in underlying handles 
odd even? Mutual exclusion see it exec?

Its is Looking am direction? It exec why of earth a we say ill.
Hash sha Melissa Schrumpf gtgtgti in gtgtgtthe. Without closing underlying 
handles odd even in actively redirected anyways.

Active way Sciences Division in Technology. Output actually appears in.
Gtgt yes similar issue program somehow or got confused a state. Gravereaux 
may tue is Tues Begin pgp Signed Hash sha.
Appear am first remove. Way Sciences Division! More Wish console or 
unresponse when script of started from.

Posted in will email visible? Parents a call them freaky true Thats.

So what can I do here?

thx
marcus


Ham Learning

2006-11-03 Thread Markus Braun


Hello,

When I use sa-learn to learn some emails as ham, I get this error message:

Parsing of undecoded UTF-8 will give garbage when decoding entities at 
/usr/share/perl5/Mail/SpamAssassin/HTML.pm line 182.



Can somebody explain what this means?

bye marcus


Re: How to disable IADB

2006-11-03 Thread Theo Van Dinter

On Fri, Nov 03, 2006 at 09:38:27PM +0100, Henk van Lingen wrote:
   Oke, of course. There are however 28 such rules at the moment.

Technically the only one that matters is __RCVD_IN_IADB:

score __RCVD_IN_IADB 0

The rest look at the results generated by that rule, so if that rule doesn't
run ...

   I'd hoped for a 'skip_rbl_checks alike' check, or something.

patches to make rule groups welcome. :)

-- 
Randomly Selected Tagline:
It was real. At least, if it wasn't real, it did support 
 them, and as that is what sofas are supposed to do, this, 
 by any test that mattered, was a real sofa. 



Re: Amazon / RFCI false positives

2006-11-03 Thread Brian Godette

Seems pretty accurate to me since I have accounts that have been 
returning 550: User Unknown smtp rejects for 2+ years that still receive 
mail from Amazon on a weekly/monthly basis. Same thing for several airline 
mileage programs, big name stock brokerages, etc.

On Friday 03 November 2006 08:23, Tony Finch wrote:
 On Fri, 3 Nov 2006, Ralf Hildebrandt wrote:
  * Tony Finch [EMAIL PROTECTED]:
   Amazon.co.uk was listed by RFC-Ignorant at the start of this week, and
   it is now scoring more than 5: DNS_FROM_RFC_DSN 2.87, DNS_FROM_RFC_POST
   1.44, FROM_EXCESS_BASE64 1.05.
 
  Amazon.co.uk is not listed:
  http://www.rfc-ignorant.org/tools/lookup.php?domain=Amazon.co.uk

 My mistake: I cited the wrong domain. Try bounces.amazon.com which they
 use in the return path of their messages (I guess for all their
 international domains)
 http://www.rfc-ignorant.org/tools/lookup.php?domain=bounces.amazon.com

 Tony.

Re: How to disable IADB


Henk van Lingen wrote:

On Fri, Nov 03, 2006 at 03:06:10PM -0500, Theo Van Dinter wrote:
   On Fri, Nov 03, 2006 at 09:02:46PM +0100, Henk van Lingen wrote:
Is there a way to disable this 'feature' without editing those files?
   
   Set the rule scores to 0.


  Oke, of course. There are however 28 such rules at the moment.

  grep IADB /var/lib/spamassassin/3.001007/*/* | grep score | wc
       28      87    2879

  They all get tested every time.

  I'd hoped for a 'skip_rbl_checks alike' check, or something.

  Thanks anyways,


How about:

perl -n -e 'if(/(score RCVD_IN_IADB\w*)/){ print "$1 0\n" }' \
/var/lib/spamassassin/3.001003/updates_spamassassin_org/70_iadb.cf > \
/etc/mail/spamassassin/disable_iadb.cf
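
The same idea can be sketched with sed, which makes it easy to preview the generated lines before writing them anywhere. The helper name `zero_iadb_scores` and the sample input lines (names and scores alike) are made up for illustration; real input would be the 70_iadb.cf file from the update directory.

```shell
#!/bin/sh
# Hypothetical helper: read rule-file text on stdin and, for every
# "score RCVD_IN_IADB..." line, print the same rule name with a score of 0.
zero_iadb_scores() {
    sed -n -E 's/^[[:space:]]*score[[:space:]]+(RCVD_IN_IADB[A-Za-z0-9_]*).*/score \1 0/p'
}

# Made-up sample lines standing in for 70_iadb.cf.
zero_iadb_scores <<'EOF'
score RCVD_IN_IADB_VOUCHED -2.2
describe RCVD_IN_IADB_VOUCHED sample description
score RCVD_IN_IADB_DK -0.1
score SOME_OTHER_RULE 1.0
EOF
```

Once the preview looks right, the output can be redirected into a local .cf file and checked with `spamassassin --lint` before restarting spamd.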

SA TIMED OUT message debian sarge (new error)

2006-11-03 Thread Simon


Hi There,

Looks like I've solved one issue, and another crops up! I think that
I may need to move to a MySQL storage engine here? Approx 17,000
messages a day are incoming on this server.

Any pointers here? - Thanks!!

Nov  4 11:39:40 mx1 amavis[32148]: (32148-07) SA TIMED OUT, backtrace:
at /usr/share/perl5/Mail/SpamAssassin/DBBasedAddrList.pm line
171\n\teval {...} called at /usr/share/perl5/Ma
il/SpamAssassin/DBBasedAddrList.pm line
171\n\tMail::SpamAssassin::DBBasedAddrList::remove_entry('Mail::SpamAssassin::DBBasedAddrList=HASH(0xa881df0)',
'HASH(0xa6bc474)') called at
/usr/share/perl5/Mail/SpamAssassin/AutoWhitelist.pm line
134\n\tMail::SpamAssassin::AutoWhitelist::check_address('Mail::SpamAssassin::AutoWhitelist=HASH(0xa87eba8)',
'[EMAIL PROTECTED]
adv.com', 82.227.79.148) called at
/usr/share/perl5/Mail/SpamAssassin/Plugin/AWL.pm line 355\n\teval
{...} called at /usr/share/perl5/Mail/SpamAssassin/Plugin/AWL.pm line
351\n\tMa
il::SpamAssassin::Plugin::AWL::check_from_in_auto_whitelist('Mail::SpamAssassin::Plugin::AWL=HASH(0xa09da08)',
'Mail::SpamAssassin::PerMsgStatus=HASH(0xa67060c)') called at (eval 2
80) line 7\n\tMail::SpamAssassin::PerMsgStatus::check_f...

RE: Amazon / RFCI false positives

2006-11-03 Thread Michael Scheidell

 -Original Message-
 From: Tony Finch [mailto:[EMAIL PROTECTED] On Behalf Of 
 Tony Finch
 Sent: Friday, November 03, 2006 9:59 AM
 To: users@spamassassin.apache.org
 Subject: Amazon / RFCI false positives

 Amazon.co.uk was listed by RFC-Ignorant at the start of this 
 week, and it is now scoring more than 5: DNS_FROM_RFC_DSN 
 2.87, DNS_FROM_RFC_POST 1.44,
 FROM_EXCESS_BASE64 1.05.

Not a false positive if their servers are broken.

Looks like their servers are broken.

They can either fix their servers, or you can disable the tests.

RE: Amazon / RFCI false positives

2006-11-03 Thread Michael Scheidell

  -Original Message-
 From: Michael Scheidell 
 Sent: Friday, November 03, 2006 6:32 PM
 To: Tony Finch; users@spamassassin.apache.org
 Subject: RE: Amazon / RFCI false positives

  -Original Message-
  From: Tony Finch [mailto:[EMAIL PROTECTED] On Behalf Of Tony 
  Finch
  Sent: Friday, November 03, 2006 9:59 AM
  To: users@spamassassin.apache.org
  Subject: Amazon / RFCI false positives

  Amazon.co.uk was listed by RFC-Ignorant at the start of 
 this week, and 
  it is now scoring more than 5: DNS_FROM_RFC_DSN 2.87, 
  DNS_FROM_RFC_POST 1.44,
  FROM_EXCESS_BASE64 1.05.

 Not a false positive if their servers are broken.

 Looks like their servers are broken.

 They can either fix their servers, or you can disable the tests.

Yep, still broken:

host -t mx bounces.amazon.com
bounces.amazon.com mail is handled by 10 bounces-0101.amazon.com.
bounces.amazon.com mail is handled by 10 bounces-2102.amazon.com.
bounces.amazon.com mail is handled by 10 bounces-2101.amazon.com.
bounces.amazon.com mail is handled by 10 bounces-0102.amazon.com.
smb-250# telnet bounces-0101.amazon.com 25
Trying 207.171.178.149...
telnet: connect to address 207.171.178.149: Connection refused
telnet: Unable to connect to remote host
smb-250# telnet bounces-2102.amazon.com 25
Trying 207.171.160.55...
telnet: connect to address 207.171.160.55: Connection refused
telnet: Unable to connect to remote host
smb-250# telnet bounces-2101.amazon.com 25
Trying 207.171.160.54...
telnet: connect to address 207.171.160.54: Connection refused
telnet: Unable to connect to remote host
smb-250# telnet bounces-0102.amazon.com 25
Trying 207.171.178.150...
telnet: connect to address 207.171.178.150: Connection refused
telnet: Unable to connect to remote host
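
The manual check above could be scripted roughly as follows. This is a sketch under assumptions: the helper names `mx_hosts` and `probe_mx` are made up, and `probe_mx` assumes that `host` and a netcat supporting `-z` (scan only) and `-w` (timeout) are installed, which varies between systems.

```shell
#!/bin/sh
# Hypothetical helper: read `host -t mx` output on stdin and print each
# mail exchanger name without its trailing dot.
mx_hosts() {
    awk '/mail is handled by/ { sub(/\.$/, "", $NF); print $NF }'
}

# Hypothetical helper: probe port 25 on every MX of the domain in $1.
# Assumes `host` and a netcat with -z and -w; defined here but not run.
probe_mx() {
    host -t mx "$1" | mx_hosts | while read -r mx; do
        if nc -z -w 5 "$mx" 25 2>/dev/null; then
            echo "$mx: port 25 open"
        else
            echo "$mx: port 25 refused or timed out"
        fi
    done
}

# Parsing demo on one line of the output shown above (no network needed).
echo 'bounces.amazon.com mail is handled by 10 bounces-0101.amazon.com.' | mx_hosts
```

Calling `probe_mx bounces.amazon.com` would then repeat the four telnet attempts in one go.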

Re: Relay Checker plugin v0.2


John Rudd wrote:

Stuart Johnston wrote:

John Rudd wrote:


I've put up a new version of Relay checker, in

...
I expect I might, at some point, switch from using a dynamic score in 
the plugin, to a normal score.  But that's the only change I expect 
to make, aside from bug fixes (if there are any), and/or a switch to 
using Net::DNS.


I wonder if there is any way for a plugin to hook into SA's DNS 
routines.  That might be better than calling Net::DNS directly.



If anyone knows of a way, I'd look into it.   I need to do both fwd and 
reverse lookups though.


The simple version might look like:

# Get resolver
my $dns = $pms->{parser_dns_pms};

# Reverse
$hostname = $dns->lookup_ptr ($ip);

# Forward
my @addrs = $dns->lookup_a ($hostname);

I'm not sure if the above code is really in any way better than the way you have it now.  There are 
also functions for doing dns in the background but I don't know if that would be practical or 
helpful for your plugin.


You also might consider using the rdns that SA has already calculated to save 
one query:

$hostname = $relay->{rdns};

RE: SA TIMED OUT message debian sarge (new error)

2006-11-03 Thread Gary V


Hi There,

Looks like ive solved one issue, and another crops up!... I think that
i may need to move to a mysql storage engine here? approx 17,000
messages a day incoming on this server.

Any pointers here? - Thanks!!

Nov  4 11:39:40 mx1 amavis[32148]: (32148-07) SA TIMED OUT, backtrace:
at /usr/share/perl5/Mail/SpamAssassin/DBBasedAddrList.pm line
171\n\teval {...} called at /usr/share/perl5/Ma
il/SpamAssassin/DBBasedAddrList.pm line
171\n\tMail::SpamAssassin::DBBasedAddrList::remove_entry('Mail::SpamAssassin::DBBasedAddrList=HASH(0xa881df0)',
'HASH(0xa6bc474)') called at
/usr/share/perl5/Mail/SpamAssassin/AutoWhitelist.pm line
134\n\tMail::SpamAssassin::AutoWhitelist::check_address('Mail::SpamAssassin::AutoWhitelist=HASH(0xa87eba8)',
'[EMAIL PROTECTED]
adv.com', 82.227.79.148) called at
/usr/share/perl5/Mail/SpamAssassin/Plugin/AWL.pm line 355\n\teval
{...} called at /usr/share/perl5/Mail/SpamAssassin/Plugin/AWL.pm line
351\n\tMa
il::SpamAssassin::Plugin::AWL::check_from_in_auto_whitelist('Mail::SpamAssassin::Plugin::AWL=HASH(0xa09da08)',
'Mail::SpamAssassin::PerMsgStatus=HASH(0xa67060c)') called at (eval 2
80) line 7\n\tMail::SpamAssassin::PerMsgStatus::check_f...


This could be simply what spamassassin was doing at the point you ran out of 
time. One possible reason for timeouts is sa-learn is running an expiry, and 
possibly learning a message at the same time. The Debian package of 
amavisd-new has a cron entry that runs --force-expire once a day 
(/etc/cron.daily/amavisd-new). You can disable opportunistic expiry by 
setting:

bayes_auto_expire 0
in local.cf, but MAKE SURE the script works or Bayes will grow forever. 
Simply run it. If it takes a minute to run, it's very likely working. The 
script may be outdated also. The important part should read something like:

su - amavis -c '/usr/bin/sa-learn --sync --force-expire >/dev/null'

Moving to MySQL helps considerably:
http://www200.pair.com/mecham/spam/debian-spamassassin-sql.html

Gary V


Re: SA TIMED OUT message debian sarge (new error)

2006-11-03 Thread Mark Martinec

Simon,

 Looks like ive solved one issue, and another crops up!... I think that
 i may need to move to a mysql storage engine here? approx 17,000
 messages a day incoming on this server.
 Any pointers here? - Thanks!!

 Nov  4 11:39:40 mx1 amavis[32148]: (32148-07) SA TIMED OUT, backtrace:
 at /usr/share/perl5/Mail/SpamAssassin/DBBasedAddrList.pm line 171
 ... /usr/share/perl5/Mail/SpamAssassin/AutoWhitelist.pm line 134
 ... /usr/share/perl5/Mail/SpamAssassin/Plugin/AWL.pm line 355

Move AWL to SQL, if you haven't already. Starting from scratch with an
empty AWL database is not too bad; it is probably not worth salvaging
your existing AWL.

  Mark
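
For reference, a minimal sketch of what pointing the AWL at SQL might look
like in local.cf. The option names are the ones documented for the AWL
plugin; the DSN, database name, user name and password are placeholders,
not values from this thread. The table has to be created first (the
SpamAssassin distribution ships a schema for it in its sql/ directory).

```
# local.cf: keep the auto-whitelist in SQL instead of a DB file.
# Database name, host, user and password below are placeholders.
auto_whitelist_factory  Mail::SpamAssassin::SQLBasedAddrList
user_awl_dsn            DBI:mysql:spamassassin:localhost
user_awl_sql_username   sa_user
user_awl_sql_password   changeme
user_awl_sql_table      awl
```

Starting with an empty table, as suggested above, means there is nothing to
migrate; the AWL simply rebuilds itself from new mail.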

RE: Amazon / RFCI false positives

2006-11-03 Thread Tony Finch

On Fri, 3 Nov 2006, Michael Scheidell wrote:

 Not a false positive if their servers are broken.

True from the RFCI point of view, but NOT true from the SpamAssassin point
of view. These messages are wanted by their recipients so should not be
scored as spam by SpamAssassin.

Tony.
-- 
f.a.n.finch  [EMAIL PROTECTED]  http://dotat.at/
LUNDY FASTNET: EAST 3 OR 4, BECOMING VARIABLE 3 OR LESS LATER. SLIGHT
OCCASIONALLY MODERATE. FAIR. GOOD.

Re: sa-learn training question(s)

2006-11-03 Thread Matt Kettler

Bowie Bailey wrote:
 Matt Kettler wrote:
   
 Jason Wellman wrote:
 
 ...
 I have all incoming mail that is tagged as Spam
 delivered to a CaughtSpam IMAP box for each user.
 ...

 Should I also have sa-learn from the CaughtSpam folder?  I have
 read some places that say yes, and some that say no.
   
 YES. Those that say no clearly do not know what they're talking about.
 

 Ummm...Do you really want to sa-learn from an unverified spam folder?
   
Good point. I took that to be a question about whether to avoid
sa-learning mail that was already tagged. A massive misread on my
part; I need more coffee.
   
 Let's face it: if there was no point in learning tagged spam, why does
 the autolearner only kick in on high-scoring spam?
 

 The autolearner kicks in only on high-scoring spam to avoid learning
 from false positives.  Learning from the CaughtSpam folder is like
 dropping the autolearn threshold down to 5.0 and removing the
 header/body score requirements.
   
Again, I was off base, but if you re-read it in the context of how I
originally read it, it makes sense.
   
 That said, it will only learn the caught spam that wasn't already
 autolearned, but this is actually quite valuable as it will generally
 contain more of the borderline spam which is important for bayes to
 know about.
 

 You do want to learn from as much spam (and ham) as possible, but you
 want a human to sort it first.
Aye.
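
To put numbers on the thresholds discussed above, here is a sketch of the
relevant local.cf settings. The values shown are what the stock 3.x
configuration uses as far as I know; treat them as an assumption and check
the Mail::SpamAssassin::Conf documentation for your version.

```
# local.cf: autolearn settings (stock defaults, shown for reference)
bayes_auto_learn                    1
bayes_auto_learn_threshold_spam     12.0
bayes_auto_learn_threshold_nonspam  0.1
```

On top of the spam threshold, autolearning as spam also requires a few
points each from header and body rules. Feeding an unverified CaughtSpam
folder to sa-learn bypasses both safeguards, which is why a human should
sort it first.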

Re: Ham Learning

2006-11-03 Thread Matt Kettler

Markus Braun wrote:
 Hello,

 When I learn some emails as ham with sa-learn, I get this error message:

 Parsing of undecoded UTF-8 will give garbage when decoding entities at
 /usr/share/perl5/Mail/SpamAssassin/HTML.pm line 182.


 Can somebody explain to me what this means?

It's normal, but AFAIK that message should be suppressed in reasonably
recent versions of SA.

AFAIK only early SA 3.0 or 2.6 should generate that message.

Re: Amazon / RFCI false positives


From: Tony Finch [EMAIL PROTECTED]


On Fri, 3 Nov 2006, Michael Scheidell wrote:


Not a false positive if their servers are broken.


True from the RFCI point of view, but NOT true from the SpamAssassin point
of view. These messages are wanted by their recipients so should not be
scored as spam by SpamAssassin.


Kinda tough, ain't it?

You could set up a whitelist_from_rcvd for Amazon, though.

{^_^}
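
A sketch of what such an entry might look like. The syntax is the standard
whitelist_from_rcvd form (sender address pattern, then the reverse-DNS
domain of the relay); the relay domain shown is an assumption, so check the
Received headers of real Amazon mail before using it.

```
# local.cf: whitelist Amazon senders only when relayed by an Amazon host.
# The relay domain "amazon.com" is an assumption; verify it first.
whitelist_from_rcvd  *@amazon.com  amazon.com
```

This only works if trusted_networks is set correctly, since the relay check
relies on Received headers added by hosts you trust.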

Re: Bayesian scores


Modify the score if you think that is appropriate. (I do. I score it at
5.1. The .1 is so I can be obnoxious in arguments about this,
like the argument which may start with your message.)

If your Bayes is VERY well trained with VERY few hams that come in
BAYES_99, like 1 in 1000 or less, then raising the score may be called
for. Raise the score in modest steps until you see BAYES_99 on ham
messages. Then back off a little.

{^_-}
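
For anyone following along, the override described above is a one-liner.
This is only a sketch of the local setting, not a recommendation for the
stock distribution.

```
# local.cf (or user_prefs): override the stock BAYES_99 score
score BAYES_99 5.1
```

With the required score at 5.0, a BAYES_99 hit alone is then enough to tag
a message, so watch for ham hitting BAYES_99 before and after the change.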
- Original Message - 
From: Péntek Imre [EMAIL PROTECTED]



Hello,

Why does BAYES_99 have a score of only 3.5 while 5.0 is required to identify
a mail as spam? I think this rule should have a score of about 5.1 (or
anything greater than 5.0).
--
With regards: Imre Péntek
E-Mail: [EMAIL PROTECTED]

Re: Bayesian scores


From: Jim Maul [EMAIL PROTECTED]

Péntek Imre wrote:

Jim Maul wrote:

I've upped the scores on almost all bayes rules here because history has
shown it to be incredibly accurate here.
Yes. BTW so far I've got no FP but still get false negatives with score 3.5, BAYES_99, 
using this database:

[5816] dbg: bayes: corpus size: nspam = 2757, nham = 1403
Built from scratch by myself, still growing.
Since my database is this big, there is very little chance of a mistaken Bayesian score, 
and since I built it from scratch I can also state that the same holds for 
small Bayesian databases. So I will use a score of 5.1 for BAYES_99, and I still suggest 
using this in the SA distribution too. Thanks for helping me anyway.



If you are getting false negatives with 3.5 then you need to find a way
to get more rules to hit.  My average spam score here is 16.1 which is
way over my 5.0 threshold.  The trick is to increase the distance
between your average spam and ham scores as much as possible and then
you can run with a higher spam threshold.  If you have spam not getting
tagged, you should increase rules that trigger, not lower your threshold.

Are you using network tests, razor, surbl, add on rules from sare, etc?


 Jim, if a rule has a history of hitting wrong once in 1000 or 1
times the score should be moved up from what the perceptron shows modulo
your mail flow. At 1000 messages a day finding one or even two hams in
the spam folder because of a rule scored too high is not severely annoying.
You can discover it. You can fix it. This goes for a low volume email
system with per user rules and Bayes. For a largish ISP different rules
of thumb must apply. Still, a really REALLY good rule can score pretty
high before it reveals itself as a problem with false positives and you
have to lower the score a bit. BAYES_99 on a well trained system is one
such rule. Tweak scores gently until your tolerance for false positives
is exceeded. Then back off a bit, maybe even two notches.

{^_^}

Re: Block wrote: spams


And I would restart spamd after installing the rule.
{^_-}
- Original Message - 
From: Loren Wilton [EMAIL PROTECTED]



I haven't seen any of these.  But if the spams universally have "single word wrote: 
stuff" as the subject then I'd consider a more stringent rule:


   /^\w+\s+wrote:/i

or
   /^(?:\w+\s+){1,2}wrote:/i

or
   /^(?:re:\s*|fw:\s*){0,20}(?:\w+\s+){1,2}wrote:/i

   Loren
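
Dropped into a complete rule, the last pattern above might look like the
sketch below. The rule name and score are taken from the original rule
quoted further down; the description text is made up for illustration.
Note that the original pattern ends in \b after the colon, which only
matches when a word character follows the colon directly, so a subject
like "Bob wrote: hello" (space after the colon) would not hit. The
anchored patterns here do not have that problem.

```
# local.cf: tag subjects of the form "<word> wrote: ...", with optional
# Re:/Fw: prefixes. Description text is illustrative only.
header   WROTE_SUB  Subject =~ /^(?:re:\s*|fw:\s*){0,20}(?:\w+\s+){1,2}wrote:/i
describe WROTE_SUB  Subject starts with "<word> wrote:"
score    WROTE_SUB  3.0
```

Running spamassassin --lint after editing checks the syntax, and spamd
needs a restart before it picks up the new rule.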

 - Original Message - 
 From: Juan Mas



 I've been getting the same and just wrote a rule for it today.  I've got what you have 
listed below.  Haven't tested it though.



 On 11/3/06, MIKE YRABEDRA  [EMAIL PROTECTED] wrote:


   I am getting a lot of these "Bob wrote:" spams.

   Anyone know a way to write the rule so that if the subject has "wrote:"
   in it, it gets tagged?

   Here is what I have:

   header WROTE_SUB  Subject =~ /\bwrote\:\b/i
   describe WROTE_SUB  Wrote in Subject
   score WROTE_SUB   3.0




   --
   Mike Yrabedra B^)







 -- 
 -Juan

Re: BIZ_TLD and INFO_TLD


From: Péntek Imre [EMAIL PROTECTED]

Still seem to be mostly spammers here.  There is a slight increase in ham,
but I don't think it would really change the scores all that much.  I have
both of these domains scored at 5 with no problems.

Why don't you use the simplex algorithm (or similar) to compute optimal scores?

 Local experience and laziness. When it becomes a problem we lower it
a little. I don't score them as high as he does, though. That's one of
the joys of per user scores, rules, and bayes.

{^_^}

Re: BIZ_TLD and INFO_TLD