Re: phish/bayes

2005-08-29 Thread Craig McLean

(Note: CC: changed to users@spamassassin.apache.org -
@incubator.apache.org address is deprecated).

Sander Holthaus - Orange XL wrote:
[snip]

| But couldn't some 'simple' rules fix this? One metafilter that looks for
| valid links (images, href's, email-addresses) to ebay, amazon, banks,
| etc. and another meta-rule that looks for links that point to non-ebay,
| non-amazon, non-bank links. A phisers will always need to point the
| users to a site that is under his control and it shouldn't be too hard
| to recognize this site.

I've been using the attached for a while to catch paypal phishing scams,
and am in the process of modifying it to catch ebay account scams too.

Caveat: It has never produced a false positive (FP) for me, but there is
plenty of potential for one.

Anyway, feel free to use/adapt/whatever to suit.
Kind Regards,
Craig.
#
# Rules to catch PayPal phishing attempts.
#
# Checks for common paypal update your account phrases, or unauthorised
# access phrases. Confirms that the mail came from @paypal and contains 
# only paypal.com links, otherwise throws scores.
#
# Craig McLean - 2005/05/22

header __LOCAL_PP_ISFROMPP  From:addr =~ /[EMAIL PROTECTED]/i
header __LOCAL_PP_S_UPD Subject: =~ m'(?:confirm|update) (?:your|the) (?:billing)?(?:records?|information|account)'i
header __LOCAL_PP_S_AUT Subject: =~ m'unauthori[sz]ed access'i
body __LOCAL_PP_B_UPD  m'(?:confirm|updated?|verify|restore) (?:your|the) (?:account|current|billing|personal)? ?(?:records?|information|account|identity|access|data)'i
body __LOCAL_PP_B_ATT  m'one or more attempts'i
body __LOCAL_PP_B_ACT  m'unusual activity'i
uri __LOCAL_PP_PPCGIURL m'https?://www\.paypal\.com/([A-Za-z0-9-_]+/)?cgi-bin/webscr\?'i
uri __LOCAL_PP_NONPPURL m'https?://(?:[A-Za-z0-9-_]+)\.(?!(paypal)\.com)(?:[A-Za-z0-9-_\.]+)'i

meta LOCAL_PP_UPD_BADURL (__LOCAL_PP_ISFROMPP && ((__LOCAL_PP_S_AUT || __LOCAL_PP_B_ATT || __LOCAL_PP_B_ACT || __LOCAL_PP_B_UPD || __LOCAL_PP_S_UPD) || __LOCAL_PP_PPCGIURL) && __LOCAL_PP_NONPPURL)
meta LOCAL_PP_UPD_BADADDR (!__LOCAL_PP_ISFROMPP && ((__LOCAL_PP_S_AUT || __LOCAL_PP_B_ATT || __LOCAL_PP_B_ACT || __LOCAL_PP_B_UPD || __LOCAL_PP_S_UPD) && __LOCAL_PP_PPCGIURL))

describe LOCAL_PP_UPD_BADURL paypal/ebay account update, but has bad URL
describe LOCAL_PP_UPD_BADADDR paypal/ebay account update, but from bad email

score LOCAL_PP_UPD_BADURL 4
score LOCAL_PP_UPD_BADADDR 4
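
For readers who want to experiment with the logic outside SpamAssassin, here
is a minimal Python sketch of what the two meta rules appear to test. It is
not part of the attachment and makes several assumptions: the gaps between
terms in the meta rules are read as logical AND, "from PayPal" is taken to
mean an @paypal.com sender (the real pattern is obscured by the archive), and
a plain host comparison stands in for the negative lookahead used in
__LOCAL_PP_NONPPURL. The function names are hypothetical.

```python
import re
from urllib.parse import urlsplit

# Same idea as __LOCAL_PP_PPCGIURL: a link to PayPal's webscr CGI.
PP_CGI_RE = re.compile(
    r"https?://www\.paypal\.com/(?:[A-Za-z0-9_-]+/)?cgi-bin/webscr\?", re.I
)


def is_paypal_host(url):
    """True if the URL's host is paypal.com or a subdomain of it."""
    host = (urlsplit(url).hostname or "").lower()
    return host == "paypal.com" or host.endswith(".paypal.com")


def classify(from_addr, urls, has_phrase):
    """Return the names of the meta rules that would fire (sketch only)."""
    # Assumption: "from PayPal" means an @paypal.com sender address.
    from_pp = from_addr.lower().endswith("@paypal.com")
    pp_cgi = any(PP_CGI_RE.search(u) for u in urls)
    non_pp = any(not is_paypal_host(u) for u in urls)
    hits = []
    # Claims to be PayPal, looks like an account notice, links elsewhere.
    if from_pp and (has_phrase or pp_cgi) and non_pp:
        hits.append("LOCAL_PP_UPD_BADURL")
    # Looks like an account notice with a PayPal CGI link, wrong sender.
    if not from_pp and has_phrase and pp_cgi:
        hits.append("LOCAL_PP_UPD_BADADDR")
    return hits
```

Comparing the parsed host, rather than matching on the text of the URL, means
a host such as www.paypal.com.evil.example is treated as non-PayPal.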


RE: phish/bayes

2005-08-28 Thread Greg Allen



I wouldn't worry about it. You can whitelist the real eBay servers with SA.

Also, if you want to catch more of the phish messages, you can install the
ClamAV plugin for SA; it does very well at finding phish. You also have to
install ClamAV itself, but that is fairly simple to do.

On a side note, eBay is not too smart, IMO. Their real emails sometimes look
a lot like phish, which must confuse the heck out of their customers. I am
sure the bad guys like it, though.





RE: phish/bayes

2005-08-28 Thread Sander Holthaus - Orange XL



I wouldn't count too much on ClamAV to protect you from phishing. I supplied
them with various phishing samples, but only a select few have been added to
the database. Beyond that, I wonder how well suited ClamAV is for this job.

But couldn't some 'simple' rules fix this? One meta-rule that looks for
valid links (images, hrefs, email addresses) to ebay, amazon, banks, etc.,
and another meta-rule that looks for links that point to non-ebay,
non-amazon, non-bank sites. A phisher will always need to point the user to
a site that is under his control, and it shouldn't be too hard to recognize
this site.

Kind Regards,
Sander Holthaus


  
  


Re: phish/bayes

2005-08-26 Thread Loren Wilton



> MY question is as follows: Given that so many valid tokens from
> ebay/paypal sites exist in phish emails, am I correct in saying that it is
> imperative to avoid phish emails entering the bayes database?

Probably not. A lot of them use links from ebay/paypal/whoever, but a
lot of them pick up the links from Geocities or the like. Some of the
better ones do a good job of getting the text right, and that could be a
problem. But the vast majority are written by non-English speakers, and
the results are close to butchered jabber. That ought to make some really
nice bayes tokens associated only with spam and maybe the leet-speak crowd.

OK, I just looked at some real PayPal mails. They all get BAYES_00.
Looking at three recent PayPal phish, they are all getting BAYES_50 to
BAYES_60. Of course, I don't auto-train, and I don't know that I've ever
bothered feeding PayPal phish to bayes specifically, although it has likely
seen the occasional one in a batch of spam.

  Loren



Re: phish/bayes

2005-08-26 Thread Matt Kettler

At 06:49 PM 8/25/2005, satalk (sent by Nabble.com) wrote:

> MY question is as follows:
> Given that so many valid tokens from ebay/paypal sites
> exist in phish emails, am I correct in saying that it is
> imperative to avoid phish emails entering the bayes database?


I would say it's imperative NOT to avoid training phish mails. To avoid 
training them is to intentionally poison your database.


Don't ever avoid training a spam because it's got ham-like content. This
includes phish mails, bayes poison, etc. Train them all. If it is spam,
train it as spam. Period.


Remember, your bayes DB can only be as accurate as your training is. If 
your training isn't realistic, your bayes db won't work well on realistic 
email.


It's a common misconception that training ham-like spam will poison your 
bayes db. This problem might exist in very crude bayes implementations, but 
most bayes implementations, including SA, are largely immune to this.


SA's use of chi-squared combining makes it very resistant to being 
poisoned into creating FPs by training nonspam text inside spam. Most 
tokens that are seen in both spam and ham are given very little weight by 
the chi-squared combining.


On the other hand, failing to train those same messages makes SA very weak 
to having them FN in the future. If a token is only ever seen in ham it's 
given a very strong weight in the chi-squared combining.
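
To make the argument above concrete, here is a small Python sketch of
Fisher-style chi-squared combining. It is a simplified illustration and not
SpamAssassin's actual code: the per-token probabilities (0.02 for a token seen
only in ham, 0.5 for one seen equally in both, 0.9 for a scam-flavoured token)
and the function names are assumptions chosen for the example.

```python
import math


def chi2q(x2, v):
    """Survival function of the chi-squared distribution for even v."""
    m = x2 / 2.0
    term = total = math.exp(-m)
    for i in range(1, v // 2):
        term *= m / i
        total += term
    return min(total, 1.0)


def combine(probs):
    """Combine per-token spam probabilities into one score in [0, 1]."""
    probs = [min(max(p, 1e-6), 1 - 1e-6) for p in probs]  # avoid log(0)
    n = len(probs)
    spamminess = 1.0 - chi2q(-2.0 * sum(math.log(1.0 - p) for p in probs), 2 * n)
    hamminess = 1.0 - chi2q(-2.0 * sum(math.log(p) for p in probs), 2 * n)
    return (spamminess - hamminess + 1.0) / 2.0


scam = [0.9] * 5  # tokens typical of the scam text itself

# Phish never trained: the borrowed PayPal tokens have only been seen in ham.
untrained = combine([0.02] * 5 + scam)

# Phish trained as spam: the same tokens now sit near 0.5 and carry little weight.
trained = combine([0.5] * 5 + scam)

print(f"untrained: {untrained:.2f}  trained: {trained:.2f}")
```

With these made-up numbers the untrained case lands below 0.5 (leaning ham)
and the trained case lands well above it, which is the false-negative risk
described above.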






Re: phish/bayes

2005-08-26 Thread jdow

From: Matt Kettler [EMAIL PROTECTED]


> At 06:49 PM 8/25/2005, satalk (sent by Nabble.com) wrote:
>
>> MY question is as follows:
>> Given that so many valid tokens from ebay/paypal sites
>> exist in phish emails, am I correct in saying that it is
>> imperative to avoid phish emails entering the bayes database?
>
> I would say it's imperative NOT to avoid training phish mails. To avoid
> training them is to intentionally poison your database.
>
> Don't ever avoid training a spam because it's got ham like content. This
> includes phish mails, bayes poison etc. Train them all. If it is spam,
> train it as spam. Period.
>
> Remember, your bayes DB can only be as accurate as your training is. If
> your training isn't realistic, your bayes db won't work well on realistic
> email.
>
> It's a common misconception that training ham-like spam will poison your
> bayes db. This problem might exist in very crude bayes implementations,
> but most bayes implementations, including SA, are largely immune to this.
>
> SA's use of chi-squared combining makes it very resistant to being
> poisoned into creating FPs by training nonspam text inside spam. Most
> tokens that are seen in both spam and ham are given very little weight by
> the chi-squared combining.
>
> On the other hand, failing to train those same messages makes SA very weak
> to having them FN in the future. If a token is only ever seen in ham it's
> given a very strong weight in the chi-squared combining.


I modify that a little. I see no huge benefit and potential bad side
effects from indiscriminately training on every spam that comes through.
Instead I look at the low scoring spams, the ones that just barely were
caught. If they are not BAYES_99 already and have anything to train on
other than a single URL I train on them. I also train on missed spam
that has anything within it that can distinguish it. (That's anything
beyond a single URL.)
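
The selection policy described above could be sketched in Python roughly as
follows. This is only an illustration: the `Spam` record, the `margin` that
defines "just barely caught", and the function name are hypothetical, and
5.0 is used as the spam threshold only because it is SpamAssassin's default
required score.

```python
from dataclasses import dataclass


@dataclass
class Spam:
    score: float           # SpamAssassin score the message received
    hit_bayes_99: bool     # True if BAYES_99 already fired
    single_url_only: bool  # True if the body is nothing but one URL


def should_train(msg, required=5.0, margin=3.0):
    """Decide whether to hand-train a known spam message (sketch only)."""
    if msg.single_url_only:
        return False  # nothing distinctive to learn; leave it to the BLs
    if msg.score < required:
        return True   # missed spam with something to learn from
    barely_caught = msg.score < required + margin
    return barely_caught and not msg.hit_bayes_99
```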

I figure single URL emails are best caught on their second time around
by the BLs in use. So far they always are. (Of course, most of them are
caught by the specific geocities rule, anyway.) SARE and Bayes are
highly synergistic in making a reliable SpamAssassin, I find. (So is no
automatic anything in the local.cf settings. Manual training uber alles.)

{^_-} 





phish/bayes

2005-08-25 Thread satalk (sent by Nabble.com)

I could not find any email in this forum addressing this issue - it does not
mean there is not one - I just couldn't find it :)

MY question is as follows:
Given that so many valid tokens from ebay/paypal sites 
exist in phish emails, am I correct in saying that it is 
imperative to avoid phish emails entering the bayes database?

Anthony

Sent from the SpamAssassin - Users forum at Nabble.com.


Re: phish/bayes

2005-08-25 Thread Thomas Cameron
On Thu, 2005-08-25 at 15:49 -0700, satalk (sent by Nabble.com) wrote:
> I could not find any email in this forum addressing this issue - it
> does not mean there is not one - I just could'nt find it :)
>
> MY question is as follows:
> Given that so many valid tokens from ebay/paypal sites
> exist in phish emails, am I correct in saying that it is
> imperative to avoid phish emails entering the bayes database?

It has been my experience that the more of them I teach Bayes, the fewer
get through.  None of my legit eBay/PayPal e-mail has been tagged.

Thomas



RE: phish/bayes

2005-08-25 Thread Herb Martin
> From: Thomas Cameron [mailto:[EMAIL PROTECTED]
> Sent: Thursday, August 25, 2005 6:03 PM
> To: users@spamassassin.apache.org
> Subject: Re: phish/bayes
>
> On Thu, 2005-08-25 at 15:49 -0700, satalk (sent by Nabble.com) wrote:
> > I could not find any email in this forum addressing this issue - it
> > does not mean there is not one - I just could'nt find it :)
> >
> > MY question is as follows:
> > Given that so many valid tokens from ebay/paypal sites exist in phish
> > emails, am I correct in saying that it is imperative to avoid phish
> > emails entering the bayes database?
>
> It has been my experience that the more of them I teach
> Bayes, the less get through.  None of my legit eBay/PayPal
> e-mail has been tagged.

Mine too -- and we likely need to remind the original
poster that it is VERY important to also train some
VALID emails from the real source that such phishes
are targeting.

This puts the real mails' words in as tokens and means
that the words found in both types will not be strong
indicators of spam (or ham), so other differences will be
used to make the estimate.

--
Herb Martin
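
As a rough illustration of the point about training both sides, the Python
sketch below estimates a single token's spam probability from training
counts, using a Robinson-style smoothed ratio. It is an assumption-laden toy,
not SpamAssassin's implementation: the function name, the `strength` and
`prior` parameters, and the counts are all made up for the example.

```python
def token_spam_prob(spam_hits, ham_hits, n_spam, n_ham, strength=1.0, prior=0.5):
    """Smoothed estimate of P(spam | token) from training counts.

    Assumes n_spam and n_ham (messages trained on each side) are positive.
    """
    n = spam_hits + ham_hits
    if n == 0:
        return prior  # never seen: no evidence either way
    spam_rate = spam_hits / n_spam
    ham_rate = ham_hits / n_ham
    raw = spam_rate / (spam_rate + ham_rate)
    # Pull the raw ratio toward the prior when there is little evidence.
    return (strength * prior + n * raw) / (strength + n)


# A PayPal token seen in 20 of 100 ham and never in spam looks very hammy.
print(token_spam_prob(0, 20, 100, 100))

# Once phish containing it are trained as spam too, it becomes neutral.
print(token_spam_prob(20, 20, 100, 100))
```

Once a word is neutral, it contributes little to the combined score and the
remaining tokens decide the outcome.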