Re: Grouping input

2005-05-25 Thread Matt Kettler
Robert Menschel wrote:

 MK However, these attempts are only going to be effective against the bayes
 MK portion of SA.
 
 As I've said before, my opinion is that these attempts are NOT
 effective against SpamAssassin's Bayes system.
 
 As a rule, we do NOT receive hams which contain such extracted text.
 No matter where the spammers extract their text from, they're going to
 extract words that are not found in ham, and Bayes is going to learn
 that the presence of such words means S P A M.

I agree, mostly; however, I have found that SOME emails with extracted text
collide with our ham profile. Not all, not even many, but some do collide.

Really this is entirely a function of how well the spammer can match your ham
profile with his extraction. If he can match it accurately, this technique will
be very effective against your bayes. If he can't match your ham profile, it
won't work at all.
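
To illustrate why matching matters, here is a toy sketch. It is NOT SA's
actual Bayes code: it uses a simple Graham-style product combiner, and every
per-token probability below is made up for the example. The point is only that
pad words which look like your ham pull the combined result down, while pad
words that never show up in your ham do not.

```python
from math import prod  # Python 3.8+

def combined_spam_probability(token_probs):
    """Combine per-token spam probabilities with a naive product rule."""
    spam = prod(token_probs)
    ham = prod(1.0 - p for p in token_probs)
    return spam / (spam + ham)

# Hypothetical per-token spam probabilities (illustrative values only).
spam_tokens = [0.99, 0.95, 0.90]             # the one-line spam payload
matching_pad = [0.05, 0.10, 0.10, 0.05]      # pad words common in this site's ham
non_matching_pad = [0.80, 0.85, 0.90, 0.80]  # pad words never seen in ham here

p_payload_only = combined_spam_probability(spam_tokens)
p_matching = combined_spam_probability(spam_tokens + matching_pad)
p_non_matching = combined_spam_probability(spam_tokens + non_matching_pad)

print(f"payload only:     {p_payload_only:.4f}")
print(f"matching pad:     {p_matching:.4f}")
print(f"non-matching pad: {p_non_matching:.4f}")
```

With these made-up numbers, only the pad that matches the ham profile drags
the combined probability below 0.5.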


Just today I got one email with this hit list:

score=17.817, required 5, autolearn=spam, AB_URI_RBL 1.00, BAYES_10 -0.91,
BLACK_URI_RBL 2.00, DRUGS_ERECTILE 1.00, INFO_GREYLIST_NOTDELAYED -0.00,
RAZOR2_CF_RANGE_51_100 0.20, RAZOR2_CHECK 1.05, RCVD_IN_BL_SPAMCOP_NET 1.50,
RCVD_IN_XBL 4.92, SPAMCOP_URI_RBL 3.00, VIAGRA_ONLINE 4.06


It got the BAYES_10 because the extracted text closely matches the general
language style of my end users. The spam content was 1 line and a url. The
extracted text was 4 lines.
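
For anyone who wants to check the arithmetic, here is a small sketch that adds
up the hits listed above. The rule names and scores are copied from that hit
list; the variable names are just mine. The displayed per-rule scores are
rounded to two decimals, so the sum only agrees with the reported 17.817 to
within rounding.

```python
# Per-rule scores as shown in the hit list above.
hits = {
    "AB_URI_RBL": 1.00,
    "BAYES_10": -0.91,
    "BLACK_URI_RBL": 2.00,
    "DRUGS_ERECTILE": 1.00,
    "INFO_GREYLIST_NOTDELAYED": -0.00,
    "RAZOR2_CF_RANGE_51_100": 0.20,
    "RAZOR2_CHECK": 1.05,
    "RCVD_IN_BL_SPAMCOP_NET": 1.50,
    "RCVD_IN_XBL": 4.92,
    "SPAMCOP_URI_RBL": 3.00,
    "VIAGRA_ONLINE": 4.06,
}
required = 5.0

total = sum(hits.values())
without_bayes = total - hits["BAYES_10"]

print(f"total score:         {total:.2f}")
print(f"score without bayes: {without_bayes:.2f}")
print(f"still spam:          {total >= required}")
```

So even with bayes voting "ham" on this one, the message is still well over
the required score on the other rules alone.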





Re: Grouping input

2005-05-24 Thread Matt Kettler
John August wrote:
 I've noticed spam which has a section of extracted text after the spam
 content. It seems to me that by taking things line by line, you'll reach
 a point at which the spam index peaks, and then trails off after. This
 is a pattern which would remain even if the overall spam index is low.
 
 Does the current spam assassin implement such an approach ? Or is the 
 algorithm sufficiently subtle to null out these attempts ?

AFAIK, no part of SA takes such an approach.

However, these attempts are only going to be effective against the bayes portion
of SA.

Since the rest of SA (all the static rules, SURBLs, etc.) is completely
unaffected by the addition of pad text, SA is overall fairly resistant to this
kind of attack.

It might be an interesting development for SA's bayes subsystem to do a
partial-text analysis and see if it improves accuracy, but right now it does a
full-text analysis.
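
A rough sketch of what such a partial-text check might look like, purely as an
illustration of John's "peaks, then trails off" idea. Nothing here exists in
SA: the function names, the thresholds, the simple averaging and the per-token
probabilities are all hypothetical.

```python
def line_scores(lines, token_probs, default=0.5):
    """Average the per-token spam probability of each line separately."""
    scores = []
    for line in lines:
        probs = [token_probs.get(w, default) for w in line.lower().split()]
        scores.append(sum(probs) / len(probs) if probs else default)
    return scores

def peak_then_trail(scores, high=0.8, low=0.4):
    """True if the highest-scoring line is spammy and the lines after it are not."""
    if len(scores) < 2:
        return False
    peak = max(range(len(scores)), key=scores.__getitem__)
    tail = scores[peak + 1:]
    if not tail:
        return False
    return scores[peak] >= high and sum(tail) / len(tail) <= low

# Hypothetical token probabilities, for illustration only.
token_probs = {
    "cheap": 0.95, "viagra": 0.99, "online": 0.90,
    "the": 0.50, "meeting": 0.05, "schedule": 0.05,
    "report": 0.10, "attached": 0.10,
}
message = ["cheap viagra online", "the meeting schedule", "report attached"]

scores = line_scores(message, token_probs)
full_text = line_scores([" ".join(message)], token_probs)[0]

print("per-line scores:", [round(s, 3) for s in scores])
print("full-text score:", round(full_text, 3))
print("peak then trail:", peak_then_trail(scores))
```

In this toy case the full-text average lands near the middle, while the
per-line view still shows one very spammy line followed by ham-like padding.
Whether that would actually improve accuracy on real mail is an open question.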