Re: RCVD_IN_BSP_TRUSTED

2005-03-11 Thread Robert Menschel
Hello R,

Thursday, March 10, 2005, 12:28:51 AM, you wrote:

RM> From: Alana Craig <[EMAIL PROTECTED]>
RM> Subject: Updating my address book

 >> I would like to include your contact information in an address book I am
 >> creating for myself. Please enter your particulars using the link you see
 >> below:
 >> http://www.bebo.com/fr1/10076492a285606901b140803462c883765683d20

RM> X-Spam-Status: No, score=-3.8 required=5.0
RM> tests=BAYES_40,DNS_FROM_RFC_POST,
RM>  RCVD_IN_BSP_TRUSTED autolearn=failed version=3.0.2

RM> -4.3 RCVD_IN_BSP_TRUSTED RBL: Sender is in Bonded Sender Program
RM> (trusted relay)
RM>[IronPort Bonded Sender - 
RM> ]

RM> should this obvious spam be allowed in BSP??

Yes, since the sending site,
> Received: from mail01.birthdayalarm.com ([65.19.128.185])
is bonded.  However, because they're bonded, report it to
bondedsender, and let birthdayalarm.com pay the bonding penalty.

BTW, I'm collecting samples of this type of spam (we're getting a fair
amount of it here also), and hope to have rules ready for specific.cf
eventually.  It's tricky, since there ARE valid emails with almost
identical characteristics...

(As for is it spam -- chances are yes, from what I've seen. Identical
to non-spam, except non-spam often will have more personal content.
The purpose of the spam, as far as I can guess, is to obtain valid
email addresses from gullible suckers.)

Bob Menschel





Re: Rule for downwards writing spam

2005-03-11 Thread Robert Menschel
Hello Matthew,

Thursday, March 10, 2005, 6:19:48 AM, you wrote:

MN> I've put together the following rule to try and catch the
MN> read-downwards type spam shown below. Could someone with a decent
MN> size corpus check it for me please? :-) (or if you see any obvious
MN> errors or improvements; it seems to work here)

No, since it would take me weeks to run these rules:

MN> body __UOLCC_DOWN2
MN> /\bc\b.*\b[il\|]\b.*\ba\b.*\b[il\|]\b.*\b[il\|]\b.*\bs\b/si

Any such use of .* in a body rule will cause such regex thrashing that
the rules will be unworkable for any sizeable system.

If you can get similar rules to work by replacing those .* with
.{1,10} or some similar limit, then perhaps I can afford to test them
for you...
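
To illustrate the difference, here is a minimal Python sketch (not
SpamAssassin rule syntax) that builds the same kind of pattern with a
bounded gap in place of .* between the letters. The function name, the
max_gap parameter and the sample strings are made up for illustration,
and the | alternative from the original character class is left out to
keep the word-boundary logic simple.

```python
import re


def downward_word_pattern(letter_classes, max_gap=10):
    """Join single-letter patterns with a bounded gap instead of '.*'."""
    gap = r".{1,%d}" % max_gap
    parts = [r"\b" + cls + r"\b" for cls in letter_classes]
    # DOTALL lets '.' cross newlines, like the /s flag on the original rule.
    return re.compile(gap.join(parts), re.IGNORECASE | re.DOTALL)


# Letter classes loosely following the rule quoted above ('i' or 'l' look alike).
CLASSES = ["c", "[il]", "a", "[il]", "[il]", "s"]
BOUNDED = downward_word_pattern(CLASSES)

# One letter per line, the "read downwards" layout.
downward_sample = "c\ni\na\nl\ni\ns\n"

# Same letters, but the first two are separated by far more than 10 characters.
spread_out_sample = "c " + "x" * 50 + " i a l i s"

print(bool(BOUNDED.search(downward_sample)))
print(bool(BOUNDED.search(spread_out_sample)))
```

With a bounded gap the regex engine can give up on a candidate after at
most max_gap characters per step, rather than scanning to the end of the
message body for every possible starting point.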

Bob Menschel





Re: Whitelist collection project

2005-03-11 Thread Daryl C. W. O'Shea
Robert Menschel wrote:
> And that leads to the second question: what's the best way for an "end
> user" to obtain/verify SPF records? I have all the capabilities of XP
> (shudder) and Cygwin readily available, and can get Linux command-line
> capabilities via SSH to SARE's server, I believe.

Whatever your favourite way of retrieving DNS records is will work.

On Windows you could use nslookup, at a command prompt:

    nslookup
    set type=txt
    somedomain.com
    another.ca
    exit            (when you're done)

On a *nix system you could use:

    dig txt somedomain.com

Daryl


Re[2]: Whitelist collection project

2005-03-11 Thread Robert Menschel
Hello Daryl,

Thursday, March 10, 2005, 5:51:26 PM, you wrote:

DCWOS> Robert Menschel wrote:
>> And that leads to the second question: what's the best way for an "end
>> user" to obtain/verify SPF records? I have all the capabilities of XP
>> (shudder) and Cygwin readily available, and can get Linux command-line
>> capabilities via SSH to SARE's server, I believe.

DCWOS> Whatever your favourite way of retrieving DNS records is, will work.
DCWOS> On Windows you could use nslookup, at a command prompt: ...

Thanks.  That's a good start.  Now, how will I know when a domain has
an SPF record to validate upon?  What do they look like?

When I do this on your domain, I see a TXT record that begins
> v=spf1
and has what appears to be two SMTP system addresses (a:), and an
~all.

Would I be correct to read it as,
> If you receive email that you can verify comes from one of these two
> SMTP machines, then you can be confident that the email does indeed
> come from this domain.
> However, the ~all indicates that we do not limit all email users to
> these two machines, and you could receive valid domain email from
> other sources.

Thanks again.

Bob Menschel





Re: Whitelist collection project

2005-03-11 Thread Daryl C. W. O'Shea
Robert Menschel wrote:
> Hello Daryl,
>
> Thursday, March 10, 2005, 5:51:26 PM, you wrote:
>
> DCWOS> Whatever your favourite way of retrieving DNS records is, will work.
> DCWOS> On Windows you could use nslookup, at a command prompt: ...
>
> Thanks.  That's a good start.  Now, how will I know when a domain has
> an SPF record to validate upon?  What do they look like?
>
> When I do this on your domain, I see a TXT record that begins
> > v=spf1
> and has what appears to be two SMTP system addresses (a:), and an
> ~all.

My own domain's SPF record doesn't end in ~all, so I'm not sure which
domain you queried.

> Would I be correct to read it as,
> > If you receive email that you can verify comes from one of these two
> > SMTP machines, then you can be confident that the email does indeed
> > come from this domain.
> > However, the ~all indicates that we do not limit all email users to
> > these two machines, and you could receive valid domain email from
> > other sources.

You've pretty much got it. ~all stands for soft fail, which does mean
that some mail may come from unlisted hosts, but most should come from
the listed hosts.

Check out http://spf.pobox.com for more info. If you go to that page and
enter a domain that already has an SPF record in the text box on the left
side of the page, it will explain what each part of the record means
halfway down the page.
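
For reading a record by hand, a rough Python sketch along these lines
may help. It only splits an SPF TXT string into mechanisms and maps each
qualifier character to its result name; it does no DNS lookups and is
not a full SPF evaluator. The function name and the example record (with
example.com hosts) are hypothetical, chosen to resemble the record
described above.

```python
import re

# Qualifier characters and the result each yields when its mechanism matches.
QUALIFIERS = {"+": "pass", "-": "fail", "~": "softfail", "?": "neutral"}

# Modifiers such as redirect=... or exp=... have the form name=value.
MODIFIER = re.compile(r"^[A-Za-z][A-Za-z0-9_.-]*=")


def parse_spf(record):
    """Return a list of (result, mechanism) pairs from an SPF TXT string."""
    terms = record.split()
    if not terms or terms[0].lower() != "v=spf1":
        raise ValueError("not an SPF version 1 record")
    parsed = []
    for term in terms[1:]:
        if MODIFIER.match(term):
            continue  # this sketch ignores modifiers
        if term[0] in QUALIFIERS:
            qualifier, mechanism = term[0], term[1:]
        else:
            qualifier, mechanism = "+", term  # a missing qualifier means "+"
        parsed.append((QUALIFIERS[qualifier], mechanism))
    return parsed


# Hypothetical record: two listed hosts, soft fail for everything else.
EXAMPLE = "v=spf1 a:mail1.example.com a:mail2.example.com ~all"

for result, mechanism in parse_spf(EXAMPLE):
    print(f"{mechanism:25} -> {result}")
```

Read this way, the two a: entries say "mail from these hosts passes",
and the trailing ~all says "anything else is a soft fail", which is
weaker than the hard fail that -all would give.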

Daryl


RE: RCVD_IN_BSP_TRUSTED

2005-03-11 Thread Greg Allen
I have a fix for that

score RCVD_IN_BSP_TRUSTED 0

I don't give big negative points to anyone. To each his own though.










Header Tagging with # instead of *

2005-03-11 Thread Peter Guhl
Hello all

Our mail client treats * in filter rules as a wildcard. I tried to
change the subject tagging to # (as I have seen in other SpamAssassin
results), but # is the comment character, so --lint fails.
Experimenting with escaping resulted in \\#SPAM\\# (using
\#SPAM\# in local.cf) or in " (using "#SPAM#").

Now... how did those people manage to tag spam with #SPAM#? Any idea?

Somebody suggested to use SPAM. Of course, that's easy - but
nobody else does it and I don't want to invent my own tagging-standard
if I can avoid it.

Regards
 Peter



imbalance in bayes numbers

2005-03-11 Thread R McGlue
how much will the following imbalance skew the bayes algorithms (if at all)
bash-2.03$ sa-learn --dump magic
0.000  0          3  0  non-token data: bayes db version
0.000  0      54265  0  non-token data: nspam
0.000  0     206342  0  non-token data: nham
0.000  0     250698  0  non-token data: ntokens
0.000  0 1110469760  0  non-token data: oldest atime
0.000  0 1110541102  0  non-token data: newest atime
0.000  0 1110540952  0  non-token data: last journal sync atime
0.000  0 1110513186  0  non-token data: last expiry atime
0.000  0      43200  0  non-token data: last expire atime delta
0.000  0     197193  0  non-token data: last expire reduction count

i take it this is a standard snapshot more ham than spam...
ronan


Issue with bayes and users

2005-03-11 Thread Matt
I'm trying to use a global bayes database, but when users try to
feed it spam and ham with sa-learn it does the following. Right now
I even have the bayes directory set to 777 just to debug. What am I
doing wrong?

debug: Score set 0 chosen.
debug: running in taint mode? yes
debug: Running in taint mode, removing unsafe env vars, and resetting PATH
debug: PATH included '/usr/bin', keeping.
debug: PATH included '/bin', keeping.
debug: PATH included '/usr/local/bin', keeping.
debug: PATH included '/usr/X11R6/bin', keeping.
debug: PATH included '/home/matth/bin', which doesn't exist, dropping.
debug: Final PATH set to: /usr/bin:/bin:/usr/local/bin:/usr/X11R6/bin
debug: using "/etc/mail/spamassassin" for default rules dir
debug: using "/etc/mail/spamassassin" for site rules dir
debug: using "/root/.spamassassin/user_prefs" for user prefs file
debug: bayes: 9236 tie-ing to DB file R/O
/etc/mail/spamassassin/bayes/bayes_toks
Cannot open bayes databases /etc/mail/spamassassin/bayes/bayes_* R/O:
tie failed: Permission denied
debug: Score set 0 chosen.
debug: Initialising learner
debug: Initialising learner
debug: Syncing Bayes journal and expiring old tokens...
debug: lock: 9236 created
/etc/mail/spamassassin/bayes/bayes.lock.smtp2-ha.chilitech.net.9236
debug: lock: 9236 trying to get lock on
/etc/mail/spamassassin/bayes/bayes with 0 retries
debug: lock: 9236 link to /etc/mail/spamassassin/bayes/bayes.lock: link ok
debug: bayes: 9236 tie-ing to DB file R/W
/etc/mail/spamassassin/bayes/bayes_toks
debug: unlock: 9236 unlink /etc/mail/spamassassin/bayes/bayes.lock
Cannot open bayes databases /etc/mail/spamassassin/bayes/bayes_* R/W:
tie failed: Permission denied
debug: Syncing complete.
debug: Learning Ham
debug: uri tests: Done uriRE
debug: lock: 9236 created
/etc/mail/spamassassin/bayes/bayes.lock.smtp2-ha.chilitech.net.9236
debug: lock: 9236 trying to get lock on
/etc/mail/spamassassin/bayes/bayes with 0 retries
debug: lock: 9236 link to /etc/mail/spamassassin/bayes/bayes.lock: link ok
debug: bayes: 9236 tie-ing to DB file R/W
/etc/mail/spamassassin/bayes/bayes_toks
debug: unlock: 9236 unlink /etc/mail/spamassassin/bayes/bayes.lock
Cannot open bayes databases /etc/mail/spamassassin/bayes/bayes_* R/W:
tie failed: Permission denied
Learned from 0 message(s) (1 message(s) examined).
debug: bayes: 9236 untie-ing
debug: bayes: 9236 untie-ing db_toks
ERROR: the Bayes learn function returned an error, please re-run with
-D for more information


Is it possible to use SURBL without enabling all network tests?

2005-03-11 Thread DNI Support Department
Greetings:
If we disable network tests by using "--local" in our start up of spamd, 
spam assassin averages 0.1 to 0.3 seconds per email to process its rules.

If we enable network tests, then spam assassin averages 11 to 15 seconds 
per email to process its rules.

Of all the network tests, we find SURBL -- http://www.surbl.org/ -- to be 
the most productive.

Is there a way to enable network tests for just SURBL (we have a local, 
kept up to date with rsync, copy)?

Thank you.

Peter M. Abraham
Support and Customer Care Department
Dynamic Net, Inc.
Helping companies do business on the Net
420 Park Road; Suite 201
Wyomissing  PA  19610
Toll Free Voice:1-888-887-6727
International:  1-610-736-3795
FAX:1-610-736-3798
Support Email:  [EMAIL PROTECTED]
Company Email:  [EMAIL PROTECTED]
Web:http://www.dynamicnet.net/
http://www.wemanageservers.com/



Re: Is it possible to use SURBL without enabling all network tests?

2005-03-11 Thread Raymond Dijkxhoorn
Hi!

> If we disable network tests by using "--local" in our start up of spamd, spam
> assassin averages 0.1 to 0.3 seconds per email to process its rules.
>
> If we enable network tests, then spam assassin averages 11 to 15 seconds per
> email to process its rules.
>
> Of all the network tests, we find SURBL -- http://www.surbl.org/ -- to be the
> most productive.
>
> Is there a way to enable network tests for just SURBL (we have a local, kept
> up to date with rsync, copy)?

You can score the other lookups at 0, so they will be skipped.

Bye,
Raymond.


Spam Assassin pattern help for regular expression

2005-03-11 Thread DNI Support Department
Greetings:
While it has never been pleasant, we regularly review spam including the 
HTML source code behind the spam to help us adjust our system-wide spam 
tagging rules.

We've noticed a lot of sick porn spam being left untagged.
The tests that raised the score, though not high enough, were as follows:
HTML_IMAGE_ONLY_12,HTML_MESSAGE,MPART_ALT_DIFF
These tests are too generic to raise the score higher through customization.
However, I did notice in the HTML source code a common theme:
http://yamanekohm.com/9d70188c4e7971b6d3b1e2fa8/Nf3KZuBf0T/file_name" alt="rundowns" border="0">
http://yamanekohm.com/9d70188c4e7971b6d3b1e2fa8/file_name" border='0'>
http://yamanekohm.com/9d70188c4e7971b6d3b1e2fa8/Y/eZ/file_namef" alt="ouch" border="0">

http://tatighk.com/ae3019e288e5a5902958a62de/IWutqQ/filename" border=0>
http://tatighk.com/ae3019e288e5a5902958a62de/filename" alt="Antipas" border=0>
http://tatighk.com/ae3019e288e5a5902958a62de/fT66kl/KK0tcw71p/filename" alt="strengthen" border=0>

http://muoniofgj.net/6481ddc2353481dae6c63affa/YriLMz/filename" border="0">
http://muoniofgj.net/6481ddc2353481dae6c63affa/filename" border='0'>
http://muoniofgj.net/6481ddc2353481dae6c63affa/txU/t1q/filename" border=0>

Where the common theme appears to be the directory structure right after 
the domain name.

For the pattern experts out there, is there a way to craft a regular 
expression to catch the directory pattern used?

Specifically the directory pattern right after the domain name.
Thank you.

Peter M. Abraham
Support and Customer Care Department
Dynamic Net, Inc.
Helping companies do business on the Net
420 Park Road; Suite 201
Wyomissing  PA  19610
Toll Free Voice:1-888-887-6727
International:  1-610-736-3795
FAX:1-610-736-3798
Support Email:  [EMAIL PROTECTED]
Company Email:  [EMAIL PROTECTED]
Web:http://www.dynamicnet.net/
http://www.wemanageservers.com/



Re: Spam Assassin pattern help for regular expression

2005-03-11 Thread Jeff Chan
On Friday, March 11, 2005, 6:01:58 AM, DNI Department wrote:
> However, I did notice in the HTML source code a common theme:

> src="http://yamanekohm.com/9d70188c4e7971b6d3b1e2fa8/Nf3KZuBf0T/file_name" alt="rundowns" border="0">
> http://yamanekohm.com/9d70188c4e7971b6d3b1e2fa8/file_name" border='0'>
> http://yamanekohm.com/9d70188c4e7971b6d3b1e2fa8/Y/eZ/file_namef" alt="ouch" border="0">

> http://tatighk.com/ae3019e288e5a5902958a62de/IWutqQ/filename" border=0>
> http://tatighk.com/ae3019e288e5a5902958a62de/filename" alt="Antipas" border=0>
> src="http://tatighk.com/ae3019e288e5a5902958a62de/fT66kl/KK0tcw71p/filename" alt="strengthen" border=0>

> http://muoniofgj.net/6481ddc2353481dae6c63affa/YriLMz/filename" border="0">
> http://muoniofgj.net/6481ddc2353481dae6c63affa/filename" border='0'>
> http://muoniofgj.net/6481ddc2353481dae6c63affa/txU/t1q/filename" border=0>

These three domains appear to belong to the same spammer.
Joker shut down tatighk.com for having an invalid address on
the registration, but the other two remain up at Tucows and
Primus Domain/Planetdomain.

Jeff C.
-- 
Jeff Chan
mailto:[EMAIL PROTECTED]
http://www.surbl.org/



Re: Spam Assassin pattern help for regular expression

2005-03-11 Thread DNI Support Department
Greetings Jeff:
These are live examples; but it appears the porn spam all follow the same 
hex (?) directory structure after the domain name.

Hence, wanting a pattern for that purpose.
Thank you.

Peter M. Abraham
Support and Customer Care Department
Dynamic Net, Inc.
Helping companies do business on the Net
420 Park Road; Suite 201
Wyomissing  PA  19610
Toll Free Voice:1-888-887-6727
International:  1-610-736-3795
FAX:1-610-736-3798
Support Email:  [EMAIL PROTECTED]
Company Email:  [EMAIL PROTECTED]
Web:http://www.dynamicnet.net/
http://www.wemanageservers.com/



Re: Spam Assassin pattern help for regular expression

2005-03-11 Thread Jeff Chan
On Friday, March 11, 2005, 6:17:21 AM, DNI Department wrote:
> Greetings Jeff:

> These are live examples; but it appears the porn spam all follow the same 
> hex (?) directory structure after the domain name.

> Hence, wanting a pattern for that purpose.

I'll let others comment on expressions.

How about reporting the spams to Tucows and Primus to get them to
shut down the domains like Joker did?

Jeff C.
-- 
Jeff Chan
mailto:[EMAIL PROTECTED]
http://www.surbl.org/



Re: Spam Assassin pattern help for regular expression

2005-03-11 Thread Duncan Hill
On Friday 11 March 2005 14:17, DNI Support Department typed:
> Greetings Jeff:
>
> These are live examples; but it appears the porn spam all follow the same
> hex (?) directory structure after the domain name.
>
> Hence, wanting a pattern for that purpose.

> > > http://yamanekohm.com/9d70188c4e7971b6d3b1e2fa8/file_name";

Hex is limited to the character class [a-f0-9].  If the directories are all
a constant length, a pattern of
uri FOO /\/[a-f0-9]{25}\//
would match on a hex string 25 characters long.  Bit of a loose check though,
as it might match some other random 25 character string that's a-f0-9 only.
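
A quick way to try such a pattern before putting it in a rule is a few
lines of Python. This is only a sketch: the spam URLs are taken from the
samples earlier in the thread, whose directory names are 25 hex
characters long (hence {25} here), while the two example.com URLs and
the helper name are made up for the negative cases.

```python
import re

# A complete path segment of exactly 25 lowercase hex characters.
HEX_DIR = re.compile(r"/[a-f0-9]{25}/")

# URLs copied from the samples posted in this thread.
SPAM_URLS = [
    "http://yamanekohm.com/9d70188c4e7971b6d3b1e2fa8/Nf3KZuBf0T/file_name",
    "http://tatighk.com/ae3019e288e5a5902958a62de/filename",
    "http://muoniofgj.net/6481ddc2353481dae6c63affa/txU/t1q/filename",
]

# Hypothetical URLs that should not be caught.
OTHER_URLS = [
    "http://www.example.com/images/logo.gif",
    "http://www.example.com/deadbeef/index.html",  # hex, but far too short
]


def has_hex_directory(url):
    """True if the URL contains a 25-character hex path segment."""
    return HEX_DIR.search(url) is not None


for url in SPAM_URLS + OTHER_URLS:
    print(has_hex_directory(url), url)
```

Requiring the slash on both sides keeps the check from firing on longer
hex strings, but as noted above it can still hit any legitimate URL that
happens to use a 25-character hex directory name.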


Re: imbalance in bayes numbers

2005-03-11 Thread Matt Kettler
At 06:40 AM 3/11/2005, R McGlue wrote:
> how much will the following imbalance skew the bayes algorithms (if at all)

Very little. It will bias the scores very slightly towards higher bayes
scores, but the chi-squared combining tends to make this effect not very
noticeable unless the training imbalance gets severe.
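
To put a number on the imbalance in question, the nspam and nham lines
of the "sa-learn --dump magic" output can be pulled out with a short
Python sketch like the one below. The parsing is an assumption based
purely on the column layout shown in the original post (count in the
third column, label after "non-token data:"); the function name is made
up. For the posted figures it works out to about 3.8 ham messages per
spam message.

```python
# The first lines of the dump posted in the original message.
DUMP = """\
0.000  0          3  0  non-token data: bayes db version
0.000  0      54265  0  non-token data: nspam
0.000  0     206342  0  non-token data: nham
0.000  0     250698  0  non-token data: ntokens
"""

LABEL_PREFIX = "non-token data: "


def magic_counts(dump_text):
    """Map each 'non-token data' label to the integer in the third column."""
    counts = {}
    for line in dump_text.splitlines():
        fields = line.split(None, 4)
        if len(fields) == 5 and fields[4].startswith(LABEL_PREFIX):
            name = fields[4][len(LABEL_PREFIX):].strip()
            counts[name] = int(fields[2])
    return counts


counts = magic_counts(DUMP)
nspam, nham = counts["nspam"], counts["nham"]
ham_per_spam = nham / nspam
spam_share = nspam / (nspam + nham)
print(f"{ham_per_spam:.1f} ham per spam; spam is {spam_share:.1%} of trained messages")
```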



RE: RCVD_IN_BSP_TRUSTED

2005-03-11 Thread Gray, Richard
I believe that this domain is in fact legitimate and the messages in
question are *not* spam. My little sister signed up for it and I got
this crap in my inbox as a result.

Basically she signed up and put in a list of everyone whose email
address she knows. Birthdayalarm sends out a message to each person
saying 'hello this is a message from . They want to know this stuff
about you'. You enter the information, and it gives you the opportunity
to do the same thing and build up a little trust/friendshippy thing.
Very Viral :)

Anyway, something to think about. 












Re: Is it possible to use SURBL without enabling all network tests?

2005-03-11 Thread Vivek Khera
On Mar 11, 2005, at 8:57 AM, DNI Support Department wrote:
> Is there a way to enable network tests for just SURBL (we have a
> local, kept up to date with rsync, copy)?

In your preferences file,

    skip_rbl_checks 1

will turn off the RBL checks but leave SURBL checks on.

Vivek Khera, Ph.D.
+1-301-869-4449 x806


Re: Is it possible to use SURBL without enabling all network tests?

2005-03-11 Thread Jeff Chan
On Friday, March 11, 2005, 7:48:26 AM, Vivek Khera wrote:

> On Mar 11, 2005, at 8:57 AM, DNI Support Department wrote:

>> Is there a way to enable network tests for just SURBL (we have a 
>> local, kept up to date with rsync, copy)?
>>

> in your preferences file,

> skip_rbl_checks 1

> will turn off the RBL checks but leave SURBL checks on.

> Vivek Khera, Ph.D.
> +1-301-869-4449 x806

Hmm, but is that a good thing or an inconsistency?  In any case
setting the scores of the regular RBL checks to 0 will definitely
do the right thing, and is arguably safer.

Jeff C.
-- 
Jeff Chan
mailto:[EMAIL PROTECTED]
http://www.surbl.org/



Re: Is it possible to use SURBL without enabling all network tests?

2005-03-11 Thread Jeff Chan
On Friday, March 11, 2005, 7:51:45 AM, Jeff Chan wrote:
> On Friday, March 11, 2005, 7:48:26 AM, Vivek Khera wrote:

>> in your preferences file,

>> skip_rbl_checks 1

>> will turn off the RBL checks but leave SURBL checks on.

>> Vivek Khera, Ph.D.
>> +1-301-869-4449 x806

> Hmm, but is that a good thing or an inconsistency?  In any case
> setting the scores of the regular RBL checks to 0 will definitely
> do the right thing, and is arguably safer.

I.e., in terms of networking the DNS queries, shouldn't SURBLs
and RBLs be handled similarly?  The actual DNS lookups against
the lists are pretty similar, aside from the lack of needing to
resolve the wild domains into IPs first when using SURBLs.

(This is a question for the developers really.  :-)

Jeff C.
-- 
Jeff Chan
mailto:[EMAIL PROTECTED]
http://www.surbl.org/



Bayes Autolearn Threshold - different scoring?

2005-03-11 Thread greg
Hello all,
Let me start out by saying I've been searching for a couple of days on the
web on this subject but to no avail, so I would appreciate any help.

I have been using SA for more than a year and right now I'm running 3.0.1
on linux (bayes corpus size: nspam = 19482, nham = 3249). My filter
behaves very well, I only get about one false positive a month and 2-3
false negatives (averaging about 100 spams a day,
http://www.amnesiak.com/spam/ if you're curious). I'm invoking SA through
procmailrc with | spamassassin -p /home/greg/.spamassassin/user_prefs .

My problem is this: I'm using squirrelmail, and to keep an eye on false
negatives (I define those as real mails that get shuttled to spam, just to
keep things clear) I have a 'spam' folder. As anyone that uses sqmail
knows, it gets very slow when any folder contains more than a few hundred
messages. But, since my filter is trained very well, I'd like to send
autolearned spams to /mail/Trash (ultimately to /dev/null) so I don't have
to deal with those. I figured just setting bayes_auto_learn_threshold_spam
6 would work great. It really does not do much of anything. I've decreased
it to 3, and to 1, but it really doesn't make a difference. I found these
relevant lines in a debug:

debug: running full-text regexp tests; score so far=4.648
debug: auto-learn: currently using scoreset 3, recomputing score based on
scoreset 1.
debug: auto-learn: message score: 4.648, computed score for autolearn: 3.987
debug: auto-learn? ham=0.1, spam=1, body-points=0, head-points=-2.82,
learned-points=1.886
debug: auto-learn? no: scored as spam but too few body points (0 < 3)
debug: is spam? score=4.648 required=1

What, exactly, is going on here? The head points I can explain (this is a
spam I saved that had already come to me) but the body points - I don't
understand. It also wasn't clear to me until this debug that the autolearn
had its own scoring system.
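
Reading the debug lines above literally, the auto-learn decision looks
roughly like the Python sketch below. This is a guess at the logic from
the log output only, not SpamAssassin's actual code: the function name
is made up, the 3-point minimum is taken from the "(0 < 3)" in the log,
and the matching check on head points is an assumption that it mirrors
the body-points check.

```python
def autolearn_decision(score, ham_threshold, spam_threshold,
                       body_points, head_points, min_points=3.0):
    """Sketch of the auto-learn decision suggested by the debug output."""
    if score < ham_threshold:
        return "ham"
    if score >= spam_threshold:
        # Clearing the spam threshold is not enough on its own: the log
        # shows a separate minimum for points from body tests.
        if body_points < min_points:
            return "no: too few body points"
        # Assumed to mirror the body check; the log never gets this far.
        if head_points < min_points:
            return "no: too few head points"
        return "spam"
    return "no: score between thresholds"


# Values from the debug output: autolearn score 3.987, ham=0.1, spam=1,
# body-points=0, head-points=-2.82.
print(autolearn_decision(3.987, 0.1, 1.0, body_points=0, head_points=-2.82))
```

If that reading is right, lowering bayes_auto_learn_threshold_spam below
the sum of the two minimums cannot have any effect, which would explain
why 6, 3 and 1 all behave the same.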

Any help or clarification would be great!

Thanks,
-Greg



Re: Is it possible to use SURBL without enabling all network tests?

2005-03-11 Thread Stuart Johnston
DNI Support Department wrote:
> Is there a way to enable network tests for just SURBL (we have a local,
> kept up to date with rsync, copy)?

One possible problem with doing this is that it will switch you to the
network score sets giving you lower scores for other tests.  Without the
other network tests to balance things out, you may end up with lower
scores overall.


Re: Is it possible to use SURBL without enabling all network tests?

2005-03-11 Thread Jeff Chan
On Friday, March 11, 2005, 9:34:15 AM, Stuart Johnston wrote:
> DNI Support Department wrote:
>> 
>> Is there a way to enable network tests for just SURBL (we have a local, 
>> kept up to date with rsync, copy)?

> One possible problem with doing this is that it will switch you to the
> network score sets giving you lower scores for other tests.  Without the
> other network tests to balance things out, you may end up with lower
> scores overall.

Or you could boost the SURBL scores, or lower your spam
threshold.  :-)

Jeff C.
-- 
Jeff Chan
mailto:[EMAIL PROTECTED]
http://www.surbl.org/



Re: Is it possible to use SURBL without enabling all network tests?

2005-03-11 Thread DNI Support Department
Greetings:

SURBL does not appear to work with network tests totally disabled (i.e.
using --local in the spamd startup).

No network tests:
    0.1 to 0.6 seconds to score emails as spam or ham
    Approximately 90% accuracy on tagging spam correctly
    Approximately 2% false positives tagging ham as spam

Network tests (for SURBL):
    11.0 to over 16 seconds to score emails as spam or ham
    Approximately 95% accuracy on tagging spam correctly
    Approximately 1 to 2% false positives tagging ham as spam

Thanks to input from this mailing list, we've disabled almost all of the
network tests.

This has helped tremendously, bringing the scoring time down to an average of
3 to 4 seconds per email, with times ranging from 0.4 seconds to 8 seconds.

Thank you.

Peter M. Abraham
Support and Customer Care Department
Dynamic Net, Inc.
Helping companies do business on the Net
420 Park Road; Suite 201
Wyomissing  PA  19610
Toll Free Voice:1-888-887-6727
International:  1-610-736-3795
FAX:1-610-736-3798
Support Email:  [EMAIL PROTECTED]
Company Email:  [EMAIL PROTECTED]
Web:http://www.dynamicnet.net/
http://www.wemanageservers.com/



Re: Is it possible to use SURBL without enabling all network tests?

2005-03-11 Thread Martin Hepworth
Are you running a caching name server locally on the machine? This helps
a lot in reducing the DNS traffic for RBLs and URI RBLs.

I normally process emails in under 2 seconds using a couple of RBLs,
pyzor and all of the surbl.org URI lookups.

--
Martin Hepworth
Snr Systems Administrator
Solid State Logic
Tel: +44 (0)1865 842300



Re: Is it possible to use SURBL without enabling all network tests?

2005-03-11 Thread Jeff Chan
On Friday, March 11, 2005, 9:57:45 AM, Martin Hepworth wrote:
> Are you running a caching name server locally on the machine? This helps
> a lot in reducing the DNS traffic for RBLs and URI RBLs.

> I normally process emails in under 2 seconds using a couple of RBLs,
> pyzor and all of the surbl.org URI lookups.

I believe they are using rsynced local zone files.  (I'm hoping
they're using rbldnsd since it's so much faster and more
efficient than BIND.)

  http://www3.surbl.org/rsync-signup.html

Jeff C.
-- 
Jeff Chan
mailto:[EMAIL PROTECTED]
http://www.surbl.org/



Re: Is it possible to use SURBL without enabling all network tests?

2005-03-11 Thread Jeff Chan
On Friday, March 11, 2005, 9:53:19 AM, DNI Department wrote:
> Greetings:

> SURBL does not appear to work with network tests totally disabled (i.e. 
> using --local in the spamd startup).

It may be worth noting that disabling network tests at the
command line with --local and "skip_rbl_checks 1" inside
the configs may not produce the same results.  The former may
disable all network tests (including Razor, DCC, RBLs, SURBLs,
etc.), while the latter may disable only RBL checks.

Also Stuart Johnston's comment about the scoring being affected
by having network tests disabled is important.  IIRC there is
a matrix of 4 possible scores, with and without Bayes and with
and without network tests.
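
The four-way score matrix mentioned above can be sketched as follows. This is only an illustration of how one of four per-rule scores gets picked, assuming the usual ordering of the score sets (0: no Bayes, no network; 1: no Bayes, network; 2: Bayes, no network; 3: Bayes and network). The function names and the example scores are hypothetical and are not taken from SpamAssassin itself.

```python
def score_set_index(use_bayes: bool, use_network: bool) -> int:
    """Return the index (0..3) of the score set that applies.

    Assumed ordering: bit 1 is "Bayes enabled", bit 0 is "network tests enabled".
    """
    return (2 if use_bayes else 0) + (1 if use_network else 0)


# Hypothetical rule carrying four scores, listed in score-set order 0..3.
EXAMPLE_SCORES = (1.0, 0.8, 0.6, 0.5)


def effective_score(scores, use_bayes: bool, use_network: bool) -> float:
    """Pick the score that would apply for the given configuration."""
    return scores[score_set_index(use_bayes, use_network)]
```

Under these assumptions, turning network tests on moves every rule from column 0 to 1 (or from 2 to 3), which is why enabling them for SURBL alone can shift the scores of unrelated rules.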

Jeff C.
-- 
Jeff Chan
mailto:[EMAIL PROTECTED]
http://www.surbl.org/



Re: Spam Assassin pattern help for regular expression

2005-03-11 Thread List Mail User
>...
>Greetings:
>
>While it has never been pleasant, we regularly review spam including the 
>HTML source code behind the spam to help us adjust our system-wide spam 
>tagging rules.
>
>We've noticed a lot of sick porn spam being left untagged.
>
>The tests that raised the score, though not high enough were as follows:
>
>HTML_IMAGE_ONLY_12,HTML_MESSAGE,MPART_ALT_DIFF
>
>These tests are too generic to raise the score higher through customization.
>
>However, I did notice in the HTML source code a common theme:
>
>
>src="http://yamanekohm.com/9d70188c4e7971b6d3b1e2fa8/Nf3KZuBf0T/file_name"
>alt="rundowns" border="0">
>src="http://yamanekohm.com/9d70188c4e7971b6d3b1e2fa8/file_name"
>border='0'>
>src="http://yamanekohm.com/9d70188c4e7971b6d3b1e2fa8/Y/eZ/file_namef"
>alt="ouch" border="0">
>
>src="http://tatighk.com/ae3019e288e5a5902958a62de/IWutqQ/filename"
>border=0>
>src="http://tatighk.com/ae3019e288e5a5902958a62de/filename"
>alt="Antipas" border=0>
>src="http://tatighk.com/ae3019e288e5a5902958a62de/fT66kl/KK0tcw71p/filename"
>alt="strengthen" border=0>
>
>
>src="http://muoniofgj.net/6481ddc2353481dae6c63affa/YriLMz/filename"
>border="0">
>src="http://muoniofgj.net/6481ddc2353481dae6c63affa/filename"
>border='0'>
>src="http://muoniofgj.net/6481ddc2353481dae6c63affa/txU/t1q/filename"
>border=0>
>
>
>Where the common theme appears to be the directory structure right after 
>the domain name.
>
>For the pattern experts out there, is there a way to craft a regular 
>expression to catch the directory pattern used?
>
>Specifically the directory pattern right after the domain name.
>
>Thank you.
>
These are part of the same porn/spam group that came up yesterday.
They are (again) using a new person's name in Texas, "GEORGE BAKER".  Also,
note that Joker has (again) identified the new registration address as
invalid (just like all the previous ones).  Any 'whois' will show the name
servers:

  NS1.ANWOO-munged.COM
  NS1.BOMOFO-munged.COM
  NS1.EPOBOY-munged.COM
  NS1.MYNAMESERVER-munged.CA

Paul Shupak
[EMAIL PROTECTED]

P.S. Time to implement Bugzilla #4106 (maybe I'll get some time this weekend).
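
On the regular-expression question in the quoted message: in all three sample domains the first directory after the host name is a run of 25 lowercase hexadecimal characters. A minimal sketch of a pattern for that, assuming the run length and character set stay fixed (which only these samples suggest), might look like the following. The names `HEX_DIR_URL` and `has_hex_directory` are made up for illustration.

```python
import re

# Assumption: the directory right after the host is exactly 25 lowercase hex
# characters, as in the sample URLs quoted above. Spammers may change this.
HEX_DIR_URL = re.compile(r"https?://[A-Za-z0-9.-]+/[0-9a-f]{25}/")


def has_hex_directory(html: str) -> bool:
    """Return True if the text contains a URL with the 25-hex-char directory."""
    return HEX_DIR_URL.search(html) is not None
```

The same expression could presumably be tried in a SpamAssassin `uri` rule; the rule name and score would be local choices, and the pattern should be checked against a ham corpus first, since long hex path components also appear in legitimate tracking links.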


Local.cf does this look right?

2005-03-11 Thread jimsheffer
Hi everyone-

I've been using SpamAssassin on another mail server for about a year.
I've installed the new SpamAssassin version on another mail server, and will
be testing this weekend.

I've got what I believe is a correct local.cf file set up, but want to make
sure with all the new syntax and everything.

Does anything here stand out as wrong?

Thanks for any help!

add_header spam Flag _YESNOCAPS_
add_header all Status _YESNO_, score=_SCORE_ required=_REQD_ tests=_TESTS_ autolearn=_AUTOLEARN_ version=_VERSION_
add_header all Level _STARS(*)_
add_header all Checker-Version SpamAssassin _VERSION_ (_SUBVERSION_) on _HOSTNAME_
ok_language en
ok_locales all
trusted_networks xxx.xxx.xxx.xxx
use_razor2 1
razor_timeout 10
rbl_timeout 15
dns_available test: domain1.tld domain2.tld domain3.tld
use_bayes 1
use_bayes_rules 1
auto_whitelist_factor 0.5
bayes_auto_learn 1
bayes_auto_learn_threshold_nonspam 0.1
bayes_auto_learn_threshold_spam 12
bayes_min_ham_num 200
bayes_min_spam_num 200
bayes_learn_during_report 1
bayes_expiry_max_db_size 15
bayes_auto_expire 1
report_safe 0

Jim Sheffer,

OmniPilot Software    http://www.omnipilot.com
Systems Administrator [EMAIL PROTECTED]




Re: Bayes Autolearn Threshold - different scoring?

2005-03-11 Thread Kris Deugau
[EMAIL PROTECTED] wrote:
> My problem is this: I'm using squirrelmail,

As your only email access?

> and to keep an eye on false negatives (I define those as real mails
> that get shuttled to spam, just to keep things clear) I have a 'spam'
> folder. As anyone that uses sqmail knows, it gets very slow when any
> folder contains more than a few hundred messages.

  Try several thousand, as a number of customers have reported to
me...

Actually, it's only spewed out error messages in a very few cases.

> But, since my
> filter is trained very well, I'd like to send autolearned spams to
> /mail/Trash (ultimately to /dev/null) so I don't have to deal with
> those.

Mmm.  Dangerous - I've seen FPs get autolearned as spam once or twice. 
:(

What I do on my accounts is set up a "big-spam" folder, and rely on the
X-Spam-Level header to move mail there.  Anything scoring 15 or higher
gets 15 or more stars in X-Spam-Level, and I have this:

:0:
* ^X-Spam-Level:.\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*
/home/kdeugau/mail/bigspam

before the check that files spam in my "main" spam folder.

With the well-tuned 2.64+SURBL systems I have, ~80% of the spam usually
ends up in the "big-spam" folder.
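
For anyone filtering outside procmail, the star-counting idea in the recipe above can be sketched in Python. This is an illustration only: it assumes the X-Spam-Level header carries one `*` per whole point, as described above, and the function names and the default of 15 are simply carried over from this message rather than from any SpamAssassin API.

```python
import re

# Match the run of stars that follows the header name on the same line.
STAR_RUN = re.compile(r"^X-Spam-Level:[ \t]*(\*+)", re.MULTILINE)


def spam_stars(headers: str) -> int:
    """Count the stars in the X-Spam-Level header; 0 if the header is absent."""
    match = STAR_RUN.search(headers)
    return len(match.group(1)) if match else 0


def is_big_spam(headers: str, min_stars: int = 15) -> bool:
    """True when the message has at least min_stars stars (score >= min_stars)."""
    return spam_stars(headers) >= min_stars
```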

> I figured just setting bayes_auto_learn_threshold_spam 6 would
> work great. It really does not do much of anything. I've decreased
> it to 3, and to 1, but it really doesn't make a difference. I found
> these relevant lines in a debug:

[snip]
> debug: auto-learn? ham=0.1, spam=1, body-points=0, head-points=-2.82,
> learned-points=1.886
> debug: auto-learn? no: scored as spam but too few body points (0 < 3)

These two entries are the critical ones;  note the body-points and
head-points.  To be autolearned as spam, a message must hit tests worth
a total of 3 points or more on header tests, and a total of 3 points or
more on body tests.
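
A rough sketch of that decision, as I understand it from the debug output, is below. It is a simplification rather than the actual SpamAssassin code: it leaves out the learned-points check, and the function name and return strings are invented. The default thresholds of 0.1 and 12 are the values that appear in configurations posted to this list; the 3-point minimums are the ones described above.

```python
REQUIRED_HEAD_POINTS = 3.0  # minimum total from header tests
REQUIRED_BODY_POINTS = 3.0  # minimum total from body tests


def autolearn_decision(score: float, head_points: float, body_points: float,
                       ham_threshold: float = 0.1,
                       spam_threshold: float = 12.0) -> str:
    """Simplified autolearn decision; 'score' is the no-Bayes autolearn score."""
    if score < ham_threshold:
        return "ham"
    if score < spam_threshold:
        return "no: inside thresholds"
    # Scored as spam: both header and body evidence are still required.
    if body_points < REQUIRED_BODY_POINTS:
        return "no: too few body points"
    if head_points < REQUIRED_HEAD_POINTS:
        return "no: too few head points"
    return "spam"
```

With this shape, lowering `spam_threshold` alone cannot help a message that has no body hits, which matches what the debug lines show.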

I notice you're still using the default autolearn-as-ham setting;  this
is dangerous as very low-scoring spam can get autolearned incorrectly.
I've dropped it to -0.01 on my systems to prevent this.

> What, exactly, is going on here? The head points I can explain (this
> is a spam I saved that had already come to me) but the body points -
> I don't understand. It also wasn't clear to me until this debug that
> the autolearn had its own scoring system.

Not entirely;  to decide whether to autolearn a message one of the
"no-Bayes" score sets is used to calculate the scores, depending on
whether you've got network tests disabled or not.

-kgd
-- 
Get your mouse off of there!  You don't know where that email has been!


Re: Local.cf does this look right?

2005-03-11 Thread Matt Kettler
At 01:27 PM 3/11/2005, jimsheffer wrote:
> Does anything here stand out as wrong?

> dns_available test: domain1.tld domain2.tld domain3.tld

I do hope that you just modified the actual values and you don't literally have 
"domain1.tld" in there..

Other than that, it looks fine, but rather than ask us, why not ask 
spamassassin?
Just run:
spamassassin --lint

If it runs quietly, you're likely OK. If it complains about a line, you 
have something to fix, and you know which line number to fix. 



report to spamcop errors. was: error during report: Insecure dependency

2005-03-11 Thread Matias Lopez Bergero
Hi.
When I report spam via spamassassin (3.0.2) I get this error or warning 
message:

% spamassassin -D -r --mbox spam
[...]
Insecure dependency in connect while running with -T switch at
/usr/lib/perl5/5.8.0/i386-linux-thread-multi/IO/Socket.pm line 114.

I can see that the reports to dcc, pyzor and razor were completed 
successfully, but I am not sure how many of the mails in the 
file (mbox) were reported.

How can I avoid the spamcop warning message and submit a good spam report?
BR,
Matías.
ps. My setup: Linux 2.4, SA 3.0.2, sendmail 8.13, milter-spamc 0.25, 
Razor Agents 2.67, Pyzor 0.4.0 and DCC 1.2.69.




Re: Bayes Autolearn Threshold - different scoring?

2005-03-11 Thread greg
> As your only email access?
pretty much, yes.

>   Try several thousand, as a number of customers have reported to
> me...

oh, I've been there - I'm just trying to avoid going there again. :)

> Mmm.  Dangerous - I've seen FPs get autolearned as spam once or twice.
> :(

I realize that. With my system on my spam the way it is now, my spam
threshold is set to one. I have not seen a FP >=3.0 in several months. So,
I know there's a risk.

> What I do on my accounts is set up a "big-spam" folder, and rely on the
> X-Spam-Level header to move mail there.  Anything scoring 15 or higher
> gets 15 or more stars in X-Spam-Level, and I have this:
>
> :0:
> * ^X-Spam-Level:.\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*
> /home/kdeugau/mail/bigspam
>
> before the check that files spam in my "main" spam folder.
>
> With the well-tuned 2.64+SURBL systems I have, ~80% of the spam usually
> ends up in the "big-spam" folder.

If I did that with a threshold of 3.0 on my system I would have had 84% of
the total 'spams' I've gotten in the last week end up in the big-spam
folder, with no FPs.

> [snip]
>> debug: auto-learn? ham=0.1, spam=1, body-points=0, head-points=-2.82,
>> learned-points=1.886
>> debug: auto-learn? no: scored as spam but too few body points (0 < 3)
>
> These two entries are the critical ones;  note the body-points and
> head-points.  To be autolearned as spam, a message must hit tests worth
> a total of 3 points or more on header tests, and a total of 3 points or
> more on body tests.

I'm sure that's the problem. Here's a different sample spam, minus the
bayes score (which isn't counted on the autolearn body tests, correct?)
 2.2 RCVD_HELO_IP_MISMATCH  Received: HELO and IP do not match, but should
 3.0 DATE_IN_FUTURE_12_24   Date: is 12 to 24 hours after Received: date
 1.2 RCVD_NUMERIC_HELO  Received: contains an IP address used for HELO
 2.7 FORGED_YAHOO_RCVD  'From' yahoo.com does not match 'Received' headers

No body hits there... So basically, I'm getting what I want from the
headers, and from what bayes already knows. How do I tweak the thresholds
that the autolearner uses, for example, either setting the body threshold
to 0 or eliminating that check entirely? I realize this might produce
unwanted results, so I'd probably give it a week or so initial experiment.

> I notice you're still using the default autolearn-as-ham setting;  this
> is dangerous as very low-scoring spam can get autolearned incorrectly.
> I've dropped it to -0.01 on my systems to prevent this.

That's a good tip, I'll implement that.

Thanks!




Re: Is it possible to use SURBL without enabling all network tests?

2005-03-11 Thread DNI Support Department
Greetings Martin:
We have rbldnsd running on a private IP, with BIND/DNS forwarding queries for 
the various SURBL lists to that name server.  We are approved to rsync the 
data from SURBL, and that's been working well.

Our primary mail server is on a physical server running several network 
applications besides mail; we will be moving it to its own physical server 
in a few months; hopefully then we can match your under 2 second benchmark.

On a side note, we are a paid spamcop.net subscriber; and we do report all 
spam to spamcop (under 8 hour average reporting time); so we do what we can 
to prevent spam in the first place.

Thank you.
At 12:57 PM 3/11/2005, you wrote:
Are you running a caching name server locally on the machine? This helps 
a lot in reducing the DNS traffic for RBLs and URI RBLs.

I normally process emails in under 2 seconds using a couple of RBLs, pyzor 
and all of the surbl.org URI lookups.

--
Martin Hepworth
Snr Systems Administrator
Solid State Logic
Tel: +44 (0)1865 842300




Re: Bayes Autolearn Threshold - different scoring?

2005-03-11 Thread Kris Deugau
[EMAIL PROTECTED] wrote:
> I'm sure that's the problem. Here's a different sample spam, minus
> the bayes score (which isn't counted on the autolearn body tests,
> correct?)

Correct.  But keep in mind that the autolearn process actually uses
different scores.

>  2.2 RCVD_HELO_IP_MISMATCH  Received: HELO and IP do not match, but
> should

From scoreset 3 (2.178);  autolearn will use set 1 (score: 0.618)

>  3.0 DATE_IN_FUTURE_12_24   Date: is 12 to 24 hours after Received:
> date

Set 1 score is 2.329.

>  1.2 RCVD_NUMERIC_HELO  Received: contains an IP address used for
> HELO

Set 1 score is 1.531.

>  2.7 FORGED_YAHOO_RCVD  'From' yahoo.com does not match
> 'Received' headers

Set 1 score is 2.174.

All together, that's well over the minimum 3 points from headers...  but
no body score.
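
To make the arithmetic explicit, here is the header total using the set 1 scores quoted above. It assumes, as this message does, that all four rules count as header tests; the variable names are just for illustration.

```python
# Set 1 (no Bayes, network tests on) scores quoted in this message.
SET1_HEADER_SCORES = {
    "RCVD_HELO_IP_MISMATCH": 0.618,
    "DATE_IN_FUTURE_12_24": 2.329,
    "RCVD_NUMERIC_HELO": 1.531,
    "FORGED_YAHOO_RCVD": 2.174,
}

head_points = sum(SET1_HEADER_SCORES.values())  # about 6.652
body_points = 0.0                               # no body rules hit

print(f"head={head_points:.3f} body={body_points:.3f}")
```

The header side clears the 3-point minimum with room to spare, so the missing body points are the only thing blocking autolearning here.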

> No body hits there... So basically, I'm getting what I want from the
> headers, and from what bayes already knows. How do I tweak the
> thresholds that the autolearner uses, for example, either setting the
> body threshold to 0 or eliminating that check entirely?

Hack the code.  There's no option I've heard of, and nothing noted in
the man page IIRC to allow that.

> I realize this might produce
> unwanted results, so I'd probably give it a week or so initial
> experiment.

I don't know how the current setup was decided on, but I'd imagine that
other methods have been tried - for general use, the 3+3 minimum in the
distributed SA is probably ideal.  For some specific mail streams
(yours, perhaps?)  this may not be optimal and may need to be tweaked.

-kgd
-- 
Get your mouse off of there!  You don't know where that email has been!


Re: SA addr tests need to be updated

2005-03-11 Thread Eric A. Hall

On 3/9/2005 1:38 PM, Eric A. Hall wrote:

> I think the four affected rules are RCVD_HELO_IP_MISMATCH,
> RCVD_NUMERIC_HELO, RCVD_ILLEGAL_IP, RCVD_BY_IP

Extending the problem report--it seems that these rules don't fire in some
instances. I haven't really checked this out yet, but addresses with a
leading octet of 111, 123, and some others at or below ~130 seem to get
skipped entirely (so does 99 and a few other two-digit numbers). Further,
in keeping with the notion that all-numeric is illegal, high-numbered
decimals (eg, 789) don't trip the RCVD_NUMERIC_HELO rule either.

Let me know what the plan is on this as I can add these kinds of tests
to my private set, but would rather not if they'll be in the core set.

-- 
Eric A. Hall             http://www.ehsco.com/
Internet Core Protocols  http://www.oreilly.com/catalog/coreprot/


Re: Bayes Autolearn Threshold - different scoring?

2005-03-11 Thread Greg Daly
Kris, thanks for your help and insight. From what I can see, the settings
are in PerMsgStatus.pm, line 308/309 (my version of course).

my $required_body_points = 3;
my $required_head_points = 3;

I'll try changing those around, and update my status to this list in a while.

Again, thanks!
-g





Re: SA addr tests need to be updated

2005-03-11 Thread Justin Mason

Eric A. Hall writes:
> On 3/9/2005 1:38 PM, Eric A. Hall wrote:
> 
> > I think the four affected rules are RCVD_HELO_IP_MISMATCH,
> > RCVD_NUMERIC_HELO, RCVD_ILLEGAL_IP, RCVD_BY_IP
> 
> Extending the problem report--it seems that these rules don't fire in some
> instances. I haven't really checked this out yet, but addresses with a
> leading octet of 111, 123, and some others at or below ~130 seem to get
> skipped entirely (so does 99 and a few other two-digit numbers).

That certainly sounds like a bug.

> Further,
> in keeping with the notion that all-numeric is illegal, high-numbered
> decimals (eg, 789) don't trip the RCVD_NUMERIC_HELO rule either.
> Let me know what the plan is on this as I can add these kinds of tests
> to my private set, but would rather not if they'll be in the core set.

I'd recommend opening those as 2 bugs in our BZ, and if there are bugs
in existing rules based on what they should be doing, we can fix them;
or if there are additional rules that catch *other* cases that aren't
matching what we should already be catching, we can add new ones.

putting them in the bz means we can use the nifty auto-mass-check
functionality to get them quickly tested on the large, 5-person,
nightly-mass-check corpora.

--j.



Re: SA addr tests need to be updated

2005-03-11 Thread Theo Van Dinter
On Fri, Mar 11, 2005 at 03:25:06PM -0500, Eric A. Hall wrote:
> Extending the problem report--it seems that these rules don't fire in some
> instances. I haven't really checked this out yet, but addresses with a
> leading octet of 111, 123, and some others at or below ~130 seem to get
> skipped entirely (so does 99 and a few other two-digit numbers).

Yeah, they're all listed as "reserved".  See M::SA::Constants for more detail...

> in keeping with the notion that all-numeric is illegal, high-numbered
> decimals (eg, 789) don't trip the RCVD_NUMERIC_HELO rule either.

hrm.  Sounds like it looks for a real looking IP and not just generic numbers.

-- 
Randomly Generated Tagline:
Asleep at the switch!  I wasn't asleep!  I was drunk!
 
-- Homer Simpson
   Homer the Vigilante




Re: SA addr tests need to be updated

2005-03-11 Thread Eric A. Hall

On 3/11/2005 3:42 PM, Theo Van Dinter wrote:
> On Fri, Mar 11, 2005 at 03:25:06PM -0500, Eric A. Hall wrote:
> 
>> Extending the problem report--it seems that these rules don't fire in
>> some instances. I haven't really checked this out yet, but addresses
>> with a leading octet of 111, 123, and some others at or below ~130
>> seem to get skipped entirely (so does 99 and a few other two-digit
>> numbers).
> 
> Yeah, they're all listed as "reserved".  See M::SA::Constants for more
> detail...

I suspected as much. But even then, RCVD_NUMERIC_HELO should match in all
cases because all-numeric is always illegal (regardless of the number
itself, any number is illegal period). Furthermore, they should be firing
on RCVD_ILLEGAL_IP since they are also illegal--bonus ratware sign.
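
A small sketch of the distinction being argued here, separating "HELO is all-numeric" from "HELO looks like a valid dotted-quad IP". It only illustrates the check proposed in this message and is not how the existing RCVD_NUMERIC_HELO rule is written; both function names are hypothetical.

```python
import re

# Digits, optionally separated by dots: "789", "111.22.33.44", "789.1.2.3".
ALL_NUMERIC = re.compile(r"[0-9]+(?:\.[0-9]+)*")


def is_all_numeric_helo(helo: str) -> bool:
    """True if the HELO argument consists only of digits and dots."""
    return ALL_NUMERIC.fullmatch(helo.strip()) is not None


def looks_like_ipv4(helo: str) -> bool:
    """True only for four dot-separated decimal octets in the range 0..255."""
    parts = helo.strip().split(".")
    return (len(parts) == 4
            and all(re.fullmatch(r"[0-9]{1,3}", p) and int(p) <= 255
                    for p in parts))
```

A rule built on the first test would fire for `789.1.2.3` and for addresses in reserved ranges alike, while a rule built on the second skips anything that is not a plausible IP, which seems to be the behaviour reported in this thread.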

-- 
Eric A. Hall             http://www.ehsco.com/
Internet Core Protocols  http://www.oreilly.com/catalog/coreprot/