Robert Menschel wrote:
MK However, these attempts are only going to be effective against the bayes
MK portion of SA.
As I've said before, my opinion is that these attempts are NOT
effective against SpamAssassin's Bayes system.
As a rule, we do NOT receive hams which contain such extracted text.
No matter where the spammers extract their text from, they're going to
extract words that are not found in ham, and Bayes is going to learn
that the presence of such words means S P A M.
I agree, mostly. However, I have found that SOME emails with extracted text
collide with our ham profile. Not all, not even many, but some do collide.
Really, this is entirely a function of how well the spammer can match your ham
profile with the extracted text. If they can match it accurately, this
technique will be very effective against your Bayes. If they can't match your
ham profile, it won't work at all.
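To make the mechanism concrete, here is a rough sketch of how padding text shifts a Bayes-style score. It uses a plain naive Bayes combination with equal priors, which is NOT SpamAssassin's actual combiner, and every per-token probability below is invented for illustration. The names combine, payload, poor_match and good_match are hypothetical.

```python
import math

def combine(token_spam_probs):
    """Naive Bayes combination of per-token spam probabilities (equal priors).

    Simplified stand-in for illustration only, not SpamAssassin's combiner.
    """
    log_spam = sum(math.log(p) for p in token_spam_probs)
    log_ham = sum(math.log(1.0 - p) for p in token_spam_probs)
    return 1.0 / (1.0 + math.exp(log_ham - log_spam))

# Hypothetical per-token spam probabilities, made up for this sketch.
payload = [0.95, 0.90]                  # the short spam pitch plus its URL
poor_match = [0.80, 0.85, 0.90, 0.75]   # extracted words rarely seen in our ham
good_match = [0.10, 0.15, 0.20, 0.10]   # extracted words common in our ham

p_payload = combine(payload)
p_poor = combine(payload + poor_match)
p_good = combine(payload + good_match)

print(f"payload only:          {p_payload:.3f}")
print(f"plus poorly matched:   {p_poor:.3f}")
print(f"plus well matched:     {p_good:.3f}")
```

With these made-up numbers, padding that misses the ham profile pushes the result further toward spam, while padding that matches it drags the result toward ham. Which case applies depends entirely on the site's own training data.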
Just today I got one email with this hit list:
score=17.817, required 5, autolearn=spam, AB_URI_RBL 1.00, BAYES_10 -0.91,
BLACK_URI_RBL 2.00, DRUGS_ERECTILE 1.00, INFO_GREYLIST_NOTDELAYED -0.00,
RAZOR2_CF_RANGE_51_100 0.20, RAZOR2_CHECK 1.05, RCVD_IN_BL_SPAMCOP_NET 1.50,
RCVD_IN_XBL 4.92, SPAMCOP_URI_RBL 3.00, VIAGRA_ONLINE 4.06
It got the BAYES_10 because the extracted text closely matches the general
language style of my end users. The spam content was one line and a URL. The
extracted text was four lines.
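As a quick check of the arithmetic, the sketch below sums the per-rule scores exactly as listed in the hit list above. The variable names are mine, and the listed scores are rounded to two decimals, so the sum can differ slightly from the reported 17.817.

```python
# Per-rule scores copied from the hit list above (rounded as reported).
hits = {
    "AB_URI_RBL": 1.00,
    "BAYES_10": -0.91,
    "BLACK_URI_RBL": 2.00,
    "DRUGS_ERECTILE": 1.00,
    "INFO_GREYLIST_NOTDELAYED": -0.00,
    "RAZOR2_CF_RANGE_51_100": 0.20,
    "RAZOR2_CHECK": 1.05,
    "RCVD_IN_BL_SPAMCOP_NET": 1.50,
    "RCVD_IN_XBL": 4.92,
    "SPAMCOP_URI_RBL": 3.00,
    "VIAGRA_ONLINE": 4.06,
}
required = 5.0  # "required 5" from the report line

total = sum(hits.values())
bayes_contribution = hits["BAYES_10"]

print(f"total = {total:.2f}, required = {required}")
print(f"BAYES_10 contributed {bayes_contribution:+.2f}")
```

In this message the Bayes rule was the only one that scored in the ham direction; the other rules carried it well past the threshold.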