Re: Lots of comment in mail, how to score

2012-02-08 Thread Martin Gregorie
On Wed, 2012-02-08 at 03:04 +, Martin Gregorie wrote:
> If you cut and paste this example as a file and feed it to your browser,
> you should see the first body line in bold red letters. I've tested this
> with FireFox and Lynx, which work as I expected.

Correction: FireFox and Opera. Lynx ignores style specs and shows plain
text.

Martin




Re: Lots of comment in mail, how to score

2012-02-07 Thread Joseph Brennan




body  __SR1  /<html>\s{0,2}<!--/
body  __SR2  /-->\s{0,2}<body>/


does not work since body rules strip html comments

with rawbody it ignore limits but hits on both



And don't score too high.

Example: Confirmations from Travelocity contain a 28 KB comment.

Joseph Brennan
Columbia University Information Technology




Re: Lots of comment in mail, how to score

2012-02-07 Thread Kris Deugau

Joseph Brennan wrote:




body __SR1 /<html>\s{0,2}<!--/
body __SR2 /-->\s{0,2}<body>/


does not work since body rules strip html comments

with rawbody it ignore limits but hits on both



And don't score too high.

Example: Confirmations from Travelocity contain a 28 KB comment.


Eugh.

Any idea what's in that comment?

-kgd


Re: Lots of comment in mail, how to score

2012-02-07 Thread Martin Gregorie
On Tue, 2012-02-07 at 11:04 -0500, Kris Deugau wrote:
> Joseph Brennan wrote:
>
> > body __SR1 /<html>\s{0,2}<!--/
> > body __SR2 /-->\s{0,2}<body>/
> >
> > does not work since body rules strip html comments
> >
> > with rawbody it ignore limits but hits on both
> >
> > And don't score too high.
> >
> > Example: Confirmations from Travelocity contain a 28 KB comment.
 
BUT is that comment between the <html> and <body> tags in a Travelocity
confirmation? It is in the example mail and, since I've never seen a
comment there in mail or on a web page, this seemed like a fairly
safe thing to trigger on.

> Eugh.

Kindly note that my suggestion has been misquoted, probably by Joe
Brennan. As he quoted it, it's missing the meta, which is somewhat
important in this case. Corrected to do a rawbody scan, it
should be:

rawbody __SR1 /<html>\s{0,2}<!--/
rawbody __SR2 /-->\s{0,2}<body>/
meta    RULE  (__SR1 && __SR2)

which is actually quite specific since it won't fire unless the comment
is between just those tags and separated from them by at most two
whitespace characters. 
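
One way to sanity-check the two sub-patterns offline is a short Python sketch. It assumes the patterns were written with literal <html>, <!--, --> and <body> delimiters, and it runs Python's re over one whole string, which is not necessarily how SpamAssassin hands rawbody text to a rule. The function name and the sample strings are made up for illustration.

```python
import re

# Assumed reconstruction of the two sub-rule patterns.
SR1 = re.compile(r"<html>\s{0,2}<!--")
SR2 = re.compile(r"-->\s{0,2}<body>")


def comment_between_html_and_body(raw_html: str) -> bool:
    """Mimic the meta rule: true only when both sub-patterns match."""
    return bool(SR1.search(raw_html)) and bool(SR2.search(raw_html))


# Invented samples: a comment wedged between <html> and <body>,
# and a comment that sits inside the body instead.
spam_like = (
    "<html>\n<!-- axe (elsewhere) zoo this (whenever numeric) -->\n"
    "<body><p>Buy now</p></body></html>"
)
travelocity_like = (
    "<html><body><!-- reservation data --><p>Your trip</p></body></html>"
)

print(comment_between_html_and_body(spam_like))
print(comment_between_html_and_body(travelocity_like))
```

Note that the meta only requires both ends to appear somewhere in the message; it does not prove they belong to the same comment.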

> Any idea what's in that comment?
 
a huge amount of garbage consisting of English words grouped by matched
parens, something like this: axe (elsewhere) zoo this (whenever
numeric) ... with nothing showing an obvious pattern except the
paired parens with text between them. I suppose you could use something
like:

rawbody RULE2 /\([\s\w]{1,30}\)/
tflags  RULE2 multiple

which would be specific to this garbage, but would you really want to
run that across more than 80kb of comment? I suggested the approach of
matching each end of the comment and using a meta to ensure both are
present because that should run a lot faster than anything I could dream
up that matched against the guts of the comment.
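
To get a feel for what a multiple-hit rule like RULE2 would count, the pattern can be tried outside SpamAssassin. The following Python sketch is only an illustration: the sample strings are invented, and Python's re stands in for Perl's regex engine.

```python
import re

# Same pattern as RULE2: a parenthesised run of 1-30 word/space characters.
PAREN_GROUP = re.compile(r"\([\s\w]{1,30}\)")


def count_paren_groups(text: str) -> int:
    """Count non-overlapping hits, roughly what a multiple-hit rule tallies."""
    return len(PAREN_GROUP.findall(text))


# Invented samples: comment-style filler versus an ordinary sentence.
garbage = "axe (elsewhere) zoo this (whenever numeric) lamp (under) (over there)"
ordinary = "Your flight (see itinerary) departs at noon."

print(count_paren_groups(garbage), count_paren_groups(ordinary))
```

Ordinary text hits the pattern too, so a rule built this way would presumably need a threshold on the number of hits rather than firing on the first one.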

Martin




Re: Lots of comment in mail, how to score

2012-02-07 Thread Kris Deugau

Martin Gregorie wrote:

BUT is that comment between the <html> and <body> tags in a Travelocity
confirmation? It is in the example mail and, since I've never seen a
comment there in mail or on a web page, this seemed like a fairly
safe thing to trigger on.


*nod*  I should have just trimmed the quote down;  I wasn't referring 
specifically to those potential rules.



Kindly note that my suggestion has been misquoted, probably by Joe
Brennan. As he quoted it, it's missing the meta, which is somewhat
important in this case. Corrected to do a rawbody scan, it
should be:

rawbody __SR1 /<html>\s{0,2}<!--/
rawbody __SR2 /-->\s{0,2}<body>/
meta    RULE  (__SR1 && __SR2)


*nod*  I can't say I recall if I've seen comments arranged like that; 
I've paid more attention to the length and lack of useful content in the 
spamples I've come across.



Any idea what's in that comment?


a huge amount of garbage consisting of English words grouped by matched
parens, something like this: axe (elsewhere) zoo this (whenever
numeric) ... with nothing showing an obvious pattern except the
paired parens with text between them.


*nod*  Yeah, I've been seeing those.

I've got a number of rules targeting strange things in HTML comments 
generally:


rawbody LONG_COMMENT        m|<!--[^{};]{200,}-->|
rawbody DUMB_COMMENT_1      m|<!--\n?\s*\d+\s*\n?-->|
rawbody DUMB_COMMENT_2      m|<!--\n?\s*(?:-{72}\n){2,}-+\n?\s*-->|
rawbody BACK2BACK_COMMENT   m|--!><!--[\n\s\w]{0,200}--!><!--|
rawbody FILLER_COMMENT      m|<!--\n?\s*(?:\(?[\w.]{2,14}\)?\s{0,2}/\s{0,2}){8}|

Note the first one started at ~60 chars, then I kept having to bump it 
up due to Outlook's bizarre HTML generation.
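
The effect of the {200,} threshold and the [^{};] character class in LONG_COMMENT can be checked with a few lines of Python. This is a sketch under assumptions: the pattern is taken to start with <!-- and end with -->, the test strings are invented, and the guess that excluding {, } and ; is meant to skip commented-out CSS or script is an assumption, not something stated above.

```python
import re

# Assumed form of LONG_COMMENT: 200 or more characters, none of them
# '{', '}' or ';', between the comment delimiters.
LONG_COMMENT = re.compile(r"<!--[^{};]{200,}-->")


def has_long_comment(raw_html: str) -> bool:
    return LONG_COMMENT.search(raw_html) is not None


# Invented samples.
filler = "<!--" + "lorem ipsum " * 30 + "-->"    # 360 characters of word filler
short = "<!-- navigation starts here -->"        # well under 200 characters
css = "<!--" + " p {color: red;}" * 30 + "-->"   # long, but CSS-like

print(has_long_comment(filler), has_long_comment(short), has_long_comment(css))
```

Raising or lowering the 200 in the pattern is the knob that had to be bumped; the character class stays the same.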


The other oddity I've tripped over is excessively long <style></style>
blocks;  legit email seems to use as much as ~3K, but I've seen spams put
all kinds of non-CSS garbage in there, up to 20-30K in length.


-kgd


Re: Lots of comment in mail, how to score

2012-02-07 Thread Joseph Brennan

Martin Gregorie <mar...@gregorie.org> wrote:


 Example: Confirmations from Travelocity contain a 28 KB comment.


BUT is that comment between the <html> and <body> tags in a Travelocity
confirmation? It is in the example mail and, since I've never seen a
comment there in mail or on a web page, this seemed like a fairly
safe thing to trigger on.


No, it was inside <body> .. </body> at least.  We noticed it a couple
of years ago, and I have only a note on file about it being 28 KB,
without an example.  I don't remember exactly what was in it, but it
was some kind of content that seemed to be about the reservation.

Most likely a comment before <body> begins is unique to spam, but... you
never know.  It sounds like valid HTML, so some web programmer might
find a reason to put it in mail output.


Now <style> ... </style> with garbage in it is interesting.  That
would never be in real mail.  Or so you'd think!


Joseph Brennan
Columbia University Information Technology





Re: Lots of comment in mail, how to score

2012-02-07 Thread John Hardin

On Tue, 7 Feb 2012, Joseph Brennan wrote:

Now <style> ... </style> with garbage in it is interesting.  That would
never be in real mail.  Or so you'd think!


I do have a rule for garbage styles that is doing fairly well in 
masschecks:


  http://ruleqa.spamassassin.org/rule=STYLE_GIBBERISH

--
 John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
 jhar...@impsec.org    FALaholic #11174     pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
---
  Your mouse has moved. Your Windows Operating System must be
  relicensed due to this hardware change. Please contact Microsoft
  to obtain a new activation key. If this hardware change results in
  added functionality you may be subject to additional license fees.
  Your system will now shut down. Thank you for choosing Microsoft.
---
 5 days until Abraham Lincoln's and Charles Darwin's 203rd Birthdays


Re: Lots of comment in mail, how to score

2012-02-07 Thread Martin Gregorie
On Tue, 2012-02-07 at 20:13 -0500, Joseph Brennan wrote:
> Now <style> ... </style> with garbage in it is interesting.  That
> would never be in real mail.  Or so you'd think!

Maybe, maybe not. I think spammers have found that you can put any old
junk between <style></style> tags. I base this on screwing up styles
when I was learning to use them and noticing that anything the browser
can't parse in there is silently ignored.

For fun I kicked this together:
=
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN">

<html>
<head>
  <meta name="generator" content=
  "HTML Tidy for Linux/x86 (vers 25 March 2009), see www.w3.org">

  <title>Big red test</title>
  <style type="text/css">
Maybe, maybe not. As a pure guess, I think spammers may have found that
  you can put any old junk between [style] and [/style] tags. I base
this on
  screwing up styles when I was learning to use them and noticing that
  anything the browser can't parse in there is silently ignored.
  </style>
  <style type="text/css">
p.c1 {color: red; font-size: xx-large; font-weight: bold}
  </style>
  <style type="text/css">
Maybe, maybe not. As a pure guess, I think spammers may have found that
  you can put any old junk between [style] and [/style] tags. I base
this on
  screwing up styles when I was learning to use them and noticing that
  anything the browser can't parse in there is silently ignored.
  p.c1 {color: red; font-size: xx-large; font-weight: bold}
  </style>
</head>

<body>
  <p class="c1">Big red test</p>

  <p>Heading should be red</p>
</body>
</html>
=

I used three style sections because, when I put the junk text into one
style section in front of the actual style definition, the definition
was ignored as well.

If you cut and paste this example as a file and feed it to your browser,
you should see the first body line in bold red letters. I've tested this
with FireFox and Lynx, which work as I expected. As you can see, the
file has been passed through HTML Tidy, which says it is valid
HTML.


Martin




Lots of comment in mail, how to score

2012-02-06 Thread Mynabbler

I seem to remember we discussed a way to figure out how much HTML comment is
in a message, but I am not able to find a decent ruleset that is trying to
count the amount of comment.

Let me elaborate with an example: http://pastebin.com/AS6kvLH2

I do realize the spamvertized site (way way down the message) is at the
moment in blacklists. But it was not at the time the message was received.
And I reckon a fresh domain will be spammed in the next batch. But they
typically all have _pages_ of comment, and behind that scattering of words,
a small block with the payload.

What would be the best way to score such an unusual amount of HTML comment in
a message?
-- 
View this message in context: 
http://old.nabble.com/Lots-of-comment-in-mail%2C-how-to-score-tp33272106p33272106.html
Sent from the SpamAssassin - Users mailing list archive at Nabble.com.



Re: Lots of comment in mail, how to score

2012-02-06 Thread Benny Pedersen



Let me elaborate with an example: http://pastebin.com/AS6kvLH2


 1.0 RCVD_IN_CSS            RBL: Received via a relay in Spamhaus CSS
                            [64.120.212.26 listed in zen.spamhaus.org]
 1.3 RCVD_IN_BL_SPAMCOP_NET RBL: Received via a relay in bl.spamcop.net
                            [Blocked - see http://www.spamcop.net/bl.shtml?64.120.212.26]
 1.3 RCVD_IN_RP_RNBL        RBL: Relay in RNBL,
                            https://senderscore.org/blacklistlookup/
                            [64.120.212.26 listed in bl.score.senderscore.com]
 1.4 RCVD_IN_BRBL_LASTEXT   RBL: RCVD_IN_BRBL_LASTEXT
                            [64.120.212.26 listed in bb.barracudacentral.org]
 1.7 URIBL_DBL_SPAM         Contains an URL listed in the DBL blocklist
                            [URIs: universmallmail.com]
 1.6 URIBL_WS_SURBL         Contains an URL listed in the WS SURBL blocklist
                            [URIs: universmallmail.com]
 1.7 URIBL_BLACK            Contains an URL listed in the URIBL blacklist
                            [URIs: universmallmail.com]
 3.5 BAYES_99               BODY: Bayes spam probability is 99 to 100%
                            [score: 0.9997]
 0.0 RELAY_US               Relayed through United States
 1.7 RCVD_IN_HOSTKARMA_BL   RBL: HostKarma: relay in black list
                            [64.120.212.26 listed in hostkarma.junkemailfilter.com]
 0.8 SPF_NEUTRAL            SPF: sender does not match SPF record (neutral)
 0.1 SPF_HELO_NEUTRAL       SPF: HELO does not match SPF record (neutral)
 0.0 HTML_MESSAGE           BODY: HTML included in message
 0.7 MIME_HTML_ONLY         BODY: Message only has text/html MIME parts
 0.1 KHOP_DNSBL_BUMP        Hits a trusted non-overlapping DNSBL
 0.4 MAY_BE_FORGED          Relay IP's reverse DNS does not resolve to IP
 1.0 KHOP_DYNAMIC2          Relay looks like a dynamic address

seems wasted :)




Re: Lots of comment in mail, how to score

2012-02-06 Thread Mynabbler


Benny Pedersen wrote:
 
>  1.0 RCVD_IN_CSS            RBL: Received via a relay in Spamhaus CSS
>  1.6 URIBL_WS_SURBL         Contains an URL listed in the WS SURBL blocklist
>                             [URIs: universmallmail.com]
>
> seems wasted :)
 

As I said, sure they are in RBL now. They were not when this message was
delivered. That's the whole point of coming up with a different approach here,
the amount of comment in the message.



Re: Lots of comment in mail, how to score

2012-02-06 Thread Benny Pedersen


As I said, sure they are in RBL now. They were not when this message was
delivered. That's the whole point of coming up with a different approach here,
the amount of comment in the message.


i got bayes_99 on this unknown spam

meta SPF_SPAM_AS_NEUTRAL (SPF_NEUTRAL && SPF_HELO_NEUTRAL)

and set score on this

If you want to write rules against HTML comments you need rawbody, and I try
to keep away from that need.


Re: Lots of comment in mail, how to score

2012-02-06 Thread Rob McEwen
On 2/6/2012 12:57 PM, Mynabbler wrote:
> As I said, sure they are in RBL now. They were not when this message was
> delivered.

Looking at the date/time stamps, I'm almost positive that this URI was
blacklisted in BOTH uribl-BLACK and ivmURI *hours* before your sample
message arrived.

But, of course, your question is still valid! Having rules in place in SA
to deal with this kind of attempt at getting around bayes-filtering is a
good idea!

-- 
Rob McEwen
http://dnsbl.invaluement.com/
r...@invaluement.com
+1 (478) 475-9032



Re: Lots of comment in mail, how to score

2012-02-06 Thread Dave Funk

On Mon, 6 Feb 2012, Benny Pedersen wrote:




As I said, sure they are in RBL now. They were not when this message was
delivered. That's the whole point of coming up with a different approach here,
the amount of comment in the message.


i got bayes_99 on this unknown spam

meta SPF_SPAM_AS_NEUTRAL (SPF_NEUTRAL && SPF_HELO_NEUTRAL)

and set score on this

if you like to make rules on html comments you need rawbody, and i try keep 
away from this needs


As currently implemented, true. However, SA already has some kind of HTML
rendering engine, so it knows the sizes of the raw and rendered message.
If there were some easy way to extract those numbers, calculate the
ratio, and make it available to the rules processor, then a score could be
generated at very little cost.
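
As a rough illustration of that ratio idea, here is a Python sketch that compares the raw size of an HTML part with the amount of visible text in it. It does not use SpamAssassin's own HTML renderer; Python's html.parser stands in for it, and the class name, function name and sample messages are all hypothetical.

```python
from html.parser import HTMLParser


class VisibleTextSizer(HTMLParser):
    """Count the non-whitespace characters a reader would actually see."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self._skip_depth = 0      # >0 while inside <style> or <script>
        self.visible_chars = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("style", "script"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("style", "script") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Comments never reach handle_data, so they add to the raw size only.
        if not self._skip_depth:
            self.visible_chars += len("".join(data.split()))


def raw_to_visible_ratio(raw_html: str) -> float:
    sizer = VisibleTextSizer()
    sizer.feed(raw_html)
    sizer.close()
    return len(raw_html) / max(sizer.visible_chars, 1)


# Invented samples: a comment-padded message and an ordinary one.
padded = "<html><!--" + "junk " * 2000 + "--><body><p>Buy now</p></body></html>"
plain = "<html><body><p>" + "hello " * 200 + "</p></body></html>"

print(raw_to_visible_ratio(padded), raw_to_visible_ratio(plain))
```

Where to draw the line between the two is an open question; legitimate mail with heavy markup or long style blocks would also push the ratio up.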



--
Dave Funk                                  University of Iowa
dbfunk (at) engineering.uiowa.edu          College of Engineering
319/335-5751   FAX: 319/384-0549           1256 Seamans Center
Sys_admin/Postmaster/cell_admin            Iowa City, IA 52242-1527
#include <std_disclaimer.h>
Better is not better, 'standard' is better. B{


Re: Lots of comment in mail, how to score

2012-02-06 Thread Benny Pedersen


But, of course, your question is still valid! Having rules in place in SA
to deal with this kind of attempt at getting around bayes-filtering is a
good idea!


IMHO Bayes does not see HTML comments, but it still got BAYES_99 here.
What did I miss?




Re: Lots of comment in mail, how to score

2012-02-06 Thread Martin Gregorie
On Mon, 2012-02-06 at 09:57 -0800, Mynabbler wrote:
> As I said, sure they are in RBL now. They were not when this message was
> delivered. That's the whole point of coming up with a different approach here,
> the amount of comment in the message.

Something like this might work:

body  __SR1  /<html>\s{0,2}<!--/
body  __SR2  /-->\s{0,2}<body>/
meta  RULE   (__SR1 && __SR2)
score RULE   3.5

on the grounds that I've never seen a comment in valid HTML that
immediately follows an <html> tag or immediately precedes a <body> tag.

CAUTION: this has been neither syntax checked nor tested.

It would also be quite reasonable to point a rule at the in-body URL, on
which somebody has gone to the trouble of setting up MX records for the
domain, and so may feature in more spam in the future. The URL
references a single, zero length main page called index.html - not a
normal feature of a legitimate site. If many of the spams have this URL
in common, it is definitely worth a few points.

Martin
 







Re: Lots of comment in mail, how to score

2012-02-06 Thread Benny Pedersen



body  __SR1  /<html>\s{0,2}<!--/
body  __SR2  /-->\s{0,2}<body>/


does not work since body rules strip html comments

with rawbody it ignore limits but hits on both