Re: Lots of comment in mail, how to score
On Wed, 2012-02-08 at 03:04 +, Martin Gregorie wrote:
> If you cut and paste this example as a file and feed it to your
> browser, you should see the first body line in bold red letters. I've
> tested this with Firefox and Lynx, which work as I expected.

Correction: Firefox and Opera. Lynx ignores style specs and shows plain text.

Martin
Re: Lots of comment in mail, how to score
> body __SR1 /<html>\s{0,2}<!--/
> body __SR2 /-->\s{0,2}<body>/
>
> does not work since body rules strip html comments
> with rawbody it ignores the limits but hits on both

And don't score too high. Example: Confirmations from Travelocity contain a 28 KB comment.

Joseph Brennan
Columbia University Information Technology
Re: Lots of comment in mail, how to score
Joseph Brennan wrote:
>> body __SR1 /<html>\s{0,2}<!--/
>> body __SR2 /-->\s{0,2}<body>/
>>
>> does not work since body rules strip html comments
>> with rawbody it ignores the limits but hits on both
>
> And don't score too high. Example: Confirmations from Travelocity
> contain a 28 KB comment.

Eugh. Any idea what's in that comment?

-kgd
Re: Lots of comment in mail, how to score
On Tue, 2012-02-07 at 11:04 -0500, Kris Deugau wrote:
> Joseph Brennan wrote:
>>> body __SR1 /<html>\s{0,2}<!--/
>>> body __SR2 /-->\s{0,2}<body>/
>>>
>>> does not work since body rules strip html comments
>>> with rawbody it ignores the limits but hits on both
>>
>> And don't score too high. Example: Confirmations from Travelocity
>> contain a 28 KB comment.

BUT is that comment between <html> and <body> tags in a Travelocity confirmation? It is in the example mail and, since I've never seen a comment there in mail or on a web page, this seemed like a fairly safe thing to trigger on.

> Eugh.

Kindly note that my suggestion has been misquoted, probably by Joe Brennan. As he quoted it, it's missing the meta, which is somewhat important in this case. With the correction to do a rawbody scan it should be:

rawbody __SR1 /<html>\s{0,2}<!--/
rawbody __SR2 /-->\s{0,2}<body>/
meta    RULE  (__SR1 && __SR2)

which is actually quite specific, since it won't fire unless the comment is between just those tags and separated from them by at most two whitespace characters.

> Any idea what's in that comment?

A huge amount of garbage consisting of English words grouped by matched parens, something like this:

axe (elsewhere) zoo this (whenever numeric) ...

with nothing showing an obvious pattern except the paired parens with text between them. I suppose you could use something like:

body   RULE2 /\([\s\w]{1,30}\)/
tflags RULE2 multiple

which would be specific to this garbage, but would you really want to run that across more than 80 KB of comment?

I suggested the approach of matching each end of the comment and using a meta to ensure both are present because that should run a lot faster than anything I could dream up that matched against the guts of the comment.

Martin
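For anyone who wants to try the "match both ends, then AND them" idea against a saved sample before writing rules, here is a rough Python sketch. It is not SpamAssassin code: the function name and the two sample strings are made up for illustration, and Python's re module scans the whole string at once, which may differ from how rawbody rules see a message.

```python
import re

# Python approximations of the two rawbody patterns in the message above.
# Like the rules as posted, they are case-sensitive.
COMMENT_AFTER_HTML = re.compile(r"<html>\s{0,2}<!--")
COMMENT_BEFORE_BODY = re.compile(r"-->\s{0,2}<body>")


def comment_between_html_and_body(raw_html: str) -> bool:
    """Mimic the meta: true only when BOTH ends of the comment match."""
    starts_after_html = COMMENT_AFTER_HTML.search(raw_html) is not None
    ends_before_body = COMMENT_BEFORE_BODY.search(raw_html) is not None
    return starts_after_html and ends_before_body


# Hypothetical samples: a comment wedged between <html> and <body>,
# and a comment that sits inside the body instead.
spam_like = (
    "<html>\n<!-- axe (elsewhere) zoo this (whenever numeric) -->\n"
    "<body><p>Buy now</p></body></html>"
)
ham_like = (
    "<html><head></head><body><!-- reservation data -->"
    "<p>Itinerary</p></body></html>"
)

print(comment_between_html_and_body(spam_like))
print(comment_between_html_and_body(ham_like))
```

Neither pattern looks at the comment's contents, so the cost does not grow with the amount of junk inside it, which is the point of anchoring on the two ends.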
Re: Lots of comment in mail, how to score
Martin Gregorie wrote:
> BUT is that comment between <html> and <body> tags in a Travelocity
> confirmation? It is in the example mail and, since I've never seen a
> comment there in mail or on a web page, this seemed like a fairly safe
> thing to trigger on.

*nod* I should have just trimmed the quote down; I wasn't referring specifically to those potential rules.

> Kindly note that my suggestion has been misquoted, probably by Joe
> Brennan. As he quoted it, it's missing the meta, which is somewhat
> important in this case. With the correction to do a rawbody scan it
> should be:
>
> rawbody __SR1 /<html>\s{0,2}<!--/
> rawbody __SR2 /-->\s{0,2}<body>/
> meta    RULE  (__SR1 && __SR2)

*nod* I can't say I recall if I've seen comments arranged like that; I've paid more attention to the length and lack of useful content in the spamples I've come across.

>> Any idea what's in that comment?
>
> A huge amount of garbage consisting of English words grouped by matched
> parens, something like this:
>
> axe (elsewhere) zoo this (whenever numeric) ...
>
> with nothing showing an obvious pattern except the paired parens with
> text between them.

*nod* Yeah, I've been seeing those. I've got a number of rules targeting strange things in HTML comments generally:

rawbody LONG_COMMENT      m|<!--[^{};]{200,}-->|
rawbody DUMB_COMMENT_1    m|<!--\n?\s*\d+\s*\n?-->|
rawbody DUMB_COMMENT_2    m|<!--\n?\s*(?:-{72}\n){2,}-+\n?\s*-->|
rawbody BACK2BACK_COMMENT m|--!!--[\n\s\w]{,200}--!!--|
rawbody FILLER_COMMENT    m|<!--\n?\s*(?:\(?[\w.]{2,14}\)?\s{0,2}/\s{0,2}){8}|

Note the first one started at ~60 chars, then I kept having to bump it up due to Outlook's bizarre HTML generation.

The other oddity I've tripped over is excessively long <style></style> blocks; legit email seems to use as much as ~3K, but I've seen spams put all kinds of non-CSS garbage in there, up to 20-30K in length.

-kgd
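A quick way to sanity-check the two length heuristics in the message above (a long comment free of CSS/script punctuation, and an oversized style block) against saved samples is a small Python script. This is only a sketch: the first pattern is a Python rendering of the LONG_COMMENT regex as I read it, while the style regex, the helper names and the 3000-character cutoff are my own assumptions based on the "~3K" figure, not anything taken from a real ruleset.

```python
import re

# Comment of 200+ characters containing none of { } ; so that CSS or
# script wrapped in comment markers does not trigger it.
LONG_COMMENT = re.compile(r"<!--[^{};]{200,}-->")

# Assumed pattern for a <style> block; group 1 is the block's content.
STYLE_BLOCK = re.compile(r"<style\b[^>]*>(.*?)</style>", re.DOTALL | re.IGNORECASE)

# Hypothetical cutoff, taken from the "~3K" seen in legitimate mail.
MAX_STYLE_CHARS = 3000


def has_long_comment(raw_html: str) -> bool:
    """True if any comment matches the LONG_COMMENT-style pattern."""
    return LONG_COMMENT.search(raw_html) is not None


def longest_style_block(raw_html: str) -> int:
    """Length in characters of the largest <style> block, or 0 if none."""
    return max((len(m.group(1)) for m in STYLE_BLOCK.finditer(raw_html)), default=0)


def has_oversized_style(raw_html: str) -> bool:
    """True if the largest <style> block exceeds the assumed cutoff."""
    return longest_style_block(raw_html) > MAX_STYLE_CHARS
```

Reading a message's HTML part into a string and calling has_long_comment() and has_oversized_style() on it shows which of the two heuristics would have fired; the cutoffs would need tuning against real ham, as the note about Outlook suggests.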
Re: Lots of comment in mail, how to score
Martin Gregorie <mar...@gregorie.org> wrote:
>> Example: Confirmations from Travelocity contain a 28 KB comment.
>
> BUT is that comment between <html> and <body> tags in a Travelocity
> confirmation? It is in the example mail and, since I've never seen a
> comment there in mail or on a web page, this seemed like a fairly safe
> thing to trigger on.

No, it was inside <body> .. </body> at least. We noticed it a couple of years ago, and I have only a note on file about it being 28 KB, without an example. I don't remember exactly what was in it, but it was some kind of content that seemed to be about the reservation.

Most likely a comment before <body> begins is unique to spam, but... you never know. It sounds like valid HTML, so some web programmer might find a reason to put it in mail output.

Now <style> ... </style> with garbage in it is interesting. That would never be in real mail. Or so you'd think!

Joseph Brennan
Columbia University Information Technology
Re: Lots of comment in mail, how to score
On Tue, 7 Feb 2012, Joseph Brennan wrote:
> Now <style> ... </style> with garbage in it is interesting. That would
> never be in real mail. Or so you'd think!

I do have a rule for garbage styles that is doing fairly well in masschecks:

http://ruleqa.spamassassin.org/rule=STYLE_GIBBERISH

--
 John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
 jhar...@impsec.org    FALaholic #11174     pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
 Your mouse has moved. Your Windows Operating System must be
 relicensed due to this hardware change. Please contact Microsoft
 to obtain a new activation key. If this hardware change results in
 added functionality you may be subject to additional license fees.
 Your system will now shut down. Thank you for choosing Microsoft.
-----------------------------------------------------------------------
 5 days until Abraham Lincoln's and Charles Darwin's 203rd Birthdays
Re: Lots of comment in mail, how to score
On Tue, 2012-02-07 at 20:13 -0500, Joseph Brennan wrote:
> Now <style> ... </style> with garbage in it is interesting. That would
> never be in real mail. Or so you'd think!

Maybe, maybe not. I think spammers have found that you can put any old junk between <style></style> tags. I base this on screwing up styles when I was learning to use them and noticing that anything the browser can't parse in there is silently ignored. For fun I kicked this together:

==========
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN">
<html>
<head>
<meta name="generator" content="HTML Tidy for Linux/x86 (vers 25 March 2009), see www.w3.org">
<title>Big red test</title>
<style type="text/css">
Maybe, maybe not. As a pure guess, I think spammers may have found that
you can put any old junk between [style] and [/style] tags. I base this
on screwing up styles when I was learning to use them and noticing that
anything the browser can't parse in there is silently ignored.
</style>
<style type="text/css">
p.c1 {color: red; font-size: xx-large; font-weight: bold}
</style>
<style type="text/css">
Maybe, maybe not. As a pure guess, I think spammers may have found that
you can put any old junk between [style] and [/style] tags. I base this
on screwing up styles when I was learning to use them and noticing that
anything the browser can't parse in there is silently ignored.
p.c1 {color: red; font-size: xx-large; font-weight: bold}
</style>
</head>
<body>
<p class="c1">Big red test</p>
<p>Heading should be red</p>
</body>
</html>
==========

I used three style sections because, when I put the junk text into one style section in front of the actual style definition, that definition got ignored.

If you cut and paste this example as a file and feed it to your browser, you should see the first body line in bold red letters. I've tested this with Firefox and Lynx, which work as I expected.

As you can see, the file has been passed through HTML Tidy, which says it is valid HTML.

Martin
Lots of comment in mail, how to score
I seem to remember we discussed a way to figure out how much HTML comment is in a message, but I am not able to find a decent ruleset that tries to count the amount of comment.

Let me elaborate with an example: http://pastebin.com/AS6kvLH2

I do realize the spamvertized site (way, way down the message) is in blacklists at the moment. But it was not at the time the message was received, and I reckon a fresh domain will be spammed in the next batch. These messages typically all have _pages_ of comment and, behind that scattering of words, a small block with the payload.

What would be the best way to score such an unusual amount of HTML comment in a message?

--
View this message in context: http://old.nabble.com/Lots-of-comment-in-mail%2C-how-to-score-tp33272106p33272106.html
Sent from the SpamAssassin - Users mailing list archive at Nabble.com.
Re: Lots of comment in mail, how to score
> Let me elaborate with an example: http://pastebin.com/AS6kvLH2

 1.0 RCVD_IN_CSS            RBL: Received via a relay in Spamhaus CSS
                            [64.120.212.26 listed in zen.spamhaus.org]
 1.3 RCVD_IN_BL_SPAMCOP_NET RBL: Received via a relay in bl.spamcop.net
                            [Blocked - see http://www.spamcop.net/bl.shtml?64.120.212.26]
 1.3 RCVD_IN_RP_RNBL        RBL: Relay in RNBL, https://senderscore.org/blacklistlookup/
                            [64.120.212.26 listed in bl.score.senderscore.com]
 1.4 RCVD_IN_BRBL_LASTEXT   RBL: RCVD_IN_BRBL_LASTEXT
                            [64.120.212.26 listed in bb.barracudacentral.org]
 1.7 URIBL_DBL_SPAM         Contains an URL listed in the DBL blocklist
                            [URIs: universmallmail.com]
 1.6 URIBL_WS_SURBL         Contains an URL listed in the WS SURBL blocklist
                            [URIs: universmallmail.com]
 1.7 URIBL_BLACK            Contains an URL listed in the URIBL blacklist
                            [URIs: universmallmail.com]
 3.5 BAYES_99               BODY: Bayes spam probability is 99 to 100%
                            [score: 0.9997]
 0.0 RELAY_US               Relayed through United States
 1.7 RCVD_IN_HOSTKARMA_BL   RBL: HostKarma: relay in black list
                            [64.120.212.26 listed in hostkarma.junkemailfilter.com]
 0.8 SPF_NEUTRAL            SPF: sender does not match SPF record (neutral)
 0.1 SPF_HELO_NEUTRAL       SPF: HELO does not match SPF record (neutral)
 0.0 HTML_MESSAGE           BODY: HTML included in message
 0.7 MIME_HTML_ONLY         BODY: Message only has text/html MIME parts
 0.1 KHOP_DNSBL_BUMP        Hits a trusted non-overlapping DNSBL
 0.4 MAY_BE_FORGED          Relay IP's reverse DNS does not resolve to IP
 1.0 KHOP_DYNAMIC2          Relay looks like a dynamic address

seems wasted :)
Re: Lots of comment in mail, how to score
Benny Pedersen wrote:
> 1.0 RCVD_IN_CSS RBL: Received via a relay in Spamhaus CSS
> 1.6 URIBL_WS_SURBL Contains an URL listed in the WS SURBL blocklist
> [URIs: universmallmail.com]
>
> seems wasted :)

As I said, sure they are in RBLs now. They were not when this message was delivered. That's the whole point of coming up with a different approach here: the amount of comment in the message.
Re: Lots of comment in mail, how to score
> As I said, sure they are in RBLs now. They were not when this message
> was delivered. That's the whole point of coming up with a different
> approach here: the amount of comment in the message.

i got bayes_99 on this unknown spam

meta SPF_SPAM_AS_NEUTRAL (SPF_NEUTRAL && SPF_HELO_NEUTRAL)

and set a score on this

if you want to make rules on html comments you need rawbody, and i try to keep away from needing that
Re: Lots of comment in mail, how to score
On 2/6/2012 12:57 PM, Mynabbler wrote:
> As I said, sure they are in RBLs now. They were not when this message
> was delivered.

Looking at the date/time stamps, I'm almost positive that this URI was blacklisted in BOTH uribl-BLACK and ivmURI *hours* before your sample message arrived.

But, of course, your question is still valid! Having rules in place in SA to deal with this kind of attempt at getting around bayes-filtering is a good idea!

--
Rob McEwen
http://dnsbl.invaluement.com/
r...@invaluement.com
+1 (478) 475-9032
Re: Lots of comment in mail, how to score
On Mon, 6 Feb 2012, Benny Pedersen wrote:
>> As I said, sure they are in RBLs now. They were not when this message
>> was delivered. That's the whole point of coming up with a different
>> approach here: the amount of comment in the message.
>
> i got bayes_99 on this unknown spam
>
> meta SPF_SPAM_AS_NEUTRAL (SPF_NEUTRAL && SPF_HELO_NEUTRAL)
>
> and set a score on this
>
> if you want to make rules on html comments you need rawbody, and i try
> to keep away from needing that

As currently implemented, true. However, SA already has some kind of HTML rendering engine, so it knows the size of the raw and rendered message. If there was some easy way to extract those numbers, calculate the ratio, and make it available to the rules processor, then a score could be generated at very little cost.

--
Dave Funk                                  University of Iowa
dbfunk (at) engineering.uiowa.edu          College of Engineering
319/335-5751   FAX: 319/384-0549           1256 Seamans Center
Sys_admin/Postmaster/cell_admin            Iowa City, IA 52242-1527
#include <std_disclaimer.h>
Better is not better, 'standard' is better. B{
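To make the ratio idea concrete, here is a minimal sketch of what such a calculation could look like, written in Python with the standard-library html.parser rather than inside SpamAssassin. It says nothing about how SA's own HTML handling works or what it exposes; the class name, the function name and the choice to skip style and script content are all assumptions made for illustration.

```python
from html.parser import HTMLParser


class SizeCounter(HTMLParser):
    """Tally visible-text characters and comment characters in raw HTML."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.text_chars = 0
        self.comment_chars = 0
        self._skip_depth = 0  # > 0 while inside <style> or <script>

    def handle_starttag(self, tag, attrs):
        if tag in ("style", "script"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("style", "script") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Count only text a reader would see, ignoring surrounding whitespace.
        if not self._skip_depth:
            self.text_chars += len(data.strip())

    def handle_comment(self, data):
        self.comment_chars += len(data)


def size_ratios(raw_html: str):
    """Return (visible_text / raw_size, comment / raw_size) as floats."""
    counter = SizeCounter()
    counter.feed(raw_html)
    counter.close()
    raw_size = len(raw_html) or 1  # avoid dividing by zero on empty input
    return counter.text_chars / raw_size, counter.comment_chars / raw_size
```

A message that is mostly comment would show a comment ratio close to 1 and a small text ratio. Where to put the cutoff is an open question, given the 28 KB Travelocity comment mentioned elsewhere in this thread.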
Re: Lots of comment in mail, how to score
> But, of course, your question is still valid! Having rules in place in
> SA to deal with this kind of attempt at getting around bayes-filtering
> is a good idea!

imho bayes does not see html comments, but still here it got bayes_99

what did i miss?
Re: Lots of comment in mail, how to score
On Mon, 2012-02-06 at 09:57 -0800, Mynabbler wrote:
> As I said, sure they are in RBLs now. They were not when this message
> was delivered. That's the whole point of coming up with a different
> approach here: the amount of comment in the message.

Something like this might work:

body  __SR1 /<html>\s{0,2}<!--/
body  __SR2 /-->\s{0,2}<body>/
meta  RULE  (__SR1 && __SR2)
score RULE  3.5

on the grounds that I've never seen a comment in valid HTML that immediately follows an <html> tag or immediately precedes a <body> tag.

CAUTION: this has been neither syntax checked nor tested.

It would also be quite reasonable to point a rule at the in-body URL: somebody has gone to the trouble of setting up MX records for the domain, so it may feature in more spam in the future. The URL references a single, zero-length main page called index.html, which is not a normal feature of a legitimate site. If many of the spams have this URL in common, it is definitely worth a few points.

Martin
Re: Lots of comment in mail, how to score
> body __SR1 /<html>\s{0,2}<!--/
> body __SR2 /-->\s{0,2}<body>/

does not work since body rules strip html comments

with rawbody it ignores the limits but hits on both