Re: html experts: empty style tags.
Kai Schaetzl wrote: As I understand even those clients that produce empty style tags do this in the header and not in the body. There's a chance that you FP on body/style sections that appear in text/plain parts (e.g. samples) - AFAIK there is no test that matches only in text/html parts, so you can't avoid that. And the rule might be a heavy one as the expression may need to gulp a lot of non-matching text between body and style tag. Which is why I think it should be in one of those html_eval plugins, like the ones that check the ratio of HTML to text, check for extra close tags, etc. An easy way to check body vs. head:
rawbody __IN_BODY /<body/
rawbody __RULES_THAT_SHOULD_NOT_BE_IN_BODY /<style/
meta RULES_THAT_SHOULD_NOT_BE_IN_BODY __RULES_THAT_SHOULD_NOT_BE_IN_BODY && __IN_BODY
-- Michael Scheidell, CTO Phone: 561-999-5000, x 1259 | SECNAP Network Security Corporation * Certified SNORT Integrator * King of Spam Filters, SC Magazine 2008 * Information Security Award 2008, Info Security Products Guide * CRN Magazine Top 40 Emerging Security Vendors * Finalist 2009 Network Products Guide Hot Companies. This email has been scanned and certified safe by SpammerTrap(r). For Information please see http://www.secnap.com/products/spammertrap/
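Below is a minimal Python sketch of the two approaches discussed in this exchange. It rests on three assumptions: the archived patterns originally began with "<", matching is case-insensitive, and the raw body can be searched as one string (SpamAssassin may process it in chunks). The sketch shows that a meta which ANDs the two sub-rules only needs both strings somewhere in the body, so an ordinary head stylesheet plus any body tag satisfies it. The order-aware pattern avoids that, but its lazy .*? has to scan all the text between the tags, which is the cost Kai mentions. All names here are hypothetical.

```python
import re

# Hypothetical stand-ins for the two rawbody sub-rules (assumes "<" was
# stripped from the archived patterns and that matching is case-insensitive).
IN_BODY = re.compile(r"<body", re.IGNORECASE)
STYLE_TAG = re.compile(r"<style", re.IGNORECASE)

# Order-aware alternative: a <style tag somewhere after the opening <body tag.
# The lazy .*? must walk every character between the two tags.
STYLE_AFTER_BODY = re.compile(r"<body\b.*?<style\b", re.IGNORECASE | re.DOTALL)


def meta_fires(raw: str) -> bool:
    """Both sub-rules hit anywhere in the body, in any order (the meta's logic)."""
    return bool(IN_BODY.search(raw)) and bool(STYLE_TAG.search(raw))


def style_after_body(raw: str) -> bool:
    """True only if a <style tag follows the opening <body tag."""
    return STYLE_AFTER_BODY.search(raw) is not None


head_only = "<html><head><style>p {color:red}</style></head><body>hi</body></html>"
in_body = "<html><head></head><body>Vi<style>junk</style>agra</body></html>"

print(meta_fires(head_only), style_after_body(head_only))  # True False
print(meta_fires(in_body), style_after_body(in_body))      # True True
```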
Re: html experts: empty style tags.
Kenneth Porter wrote on Sat, 31 Jan 2009 13:59:54 -0800: A simple-minded autodetect system would just look at the first tokens to spot HTML tags, like html, body, div, or p. An initial paragraph of plain text would be enough to prevent it from interpreting later HTML examples as making the whole message part HTML. Yeah, would ;-) I just wrote that reply as a general reminder of why it wouldn't work well. You can come up with a lot of woulds that complicate this process. Anyway, there isn't even a Microsoft client doing this, for good reasons. And it's absolutely not standards compatible, anyway. So, just forget this path. And now back to Michael's first posting: <body><style>iihdpuvikzxwdivdidulauqqgbjwkpgxfsufxkmnjkcn</style> There was no confirmation, but this sequence was obviously found in a text/html MIME part and not in a text/plain part. So, if I understand SA's processing correctly, a body rule would see exactly of the above for content checks, or in the other example it would see Va . "The 'body' in this case is the textual parts of the message body; any non-text MIME parts are stripped, and the message decoded from Quoted-Printable or Base-64-encoded format if necessary. The message Subject header is considered part of the body and becomes the first paragraph when running the rules. All HTML tags and line breaks will be removed before matching." (This doesn't clarify whether it removes *all* HTML tags or only the ones in the text/html part. It's also not clear whether it removes the content of style tags in the body or just the tags themselves. It may remove the head completely, which would eliminate any style tags and content in the normal location as well. So, it might just remove the style tag if it encounters one in the body but keep the content. In this case an SA body rule would be able to match against it.) About display in the client: none of the major clients will display this as part of a text/html part. 
With the exception of maybe the very latest Outlook as this moved from IE to Office for the HTML rendering engine and I don't know how this behaves. If this is used in spam messages, it's misguided and won't fulfill what they want. For spam testing: you could indeed try to match against style tags of all kinds (empty or not, garbage or not) that appear in a body section with a rawbody rule. As I understand even those clients that produce empty style tags do this in the header and not in the body. There's a chance that you FP on body/style sections that appear in text/plain parts (e.g. samples) - AFAIK there is no test that matches only in text/html parts, so you can't avoid that. And the rule might be a heavy one as the expression may need to gulp a lot of non-matching text between body and style tag. Kai -- Kai Schätzl, Berlin, Germany Get your web at Conactive Internet Services: http://www.conactive.com
Re: html experts: empty style tags.
Michael Scheidell wrote on Sun, 01 Feb 2009 11:27:50 -0500: which is why I think it should be in one of those html_eval plugins, I agree, it would be more helpful and less resource-hungry there. Kai
Re: html experts: empty style tags.
Matus UHLAR - fantomas wrote on Fri, 30 Jan 2009 16:41:51 +0100: Aren't there any MUAs that try to autodetect the right content type? Even from microsoft? No. If they did, you couldn't send any plain text messages that *discuss* HTML code with examples. Kai
Re: html experts: empty style tags.
--On Saturday, January 31, 2009 10:31 PM +0100 Kai Schaetzl mailli...@conactive.com wrote: Aren't there any MUAs that try to autodetect the right content type? Even from microsoft? No. If they would then you couldn't send any plain text messages that *discuss* HTML code with examples. A simple-minded autodetect system would just look at the first tokens to spot HTML tags, like html, body, div, or p. An initial paragraph of plain text would be enough to prevent it from interpreting later HTML examples as making the whole message part HTML.
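A short Python sketch of the first-token heuristic Kenneth describes. It is hypothetical: as Kai notes, no real MUA is known to do this. A part counts as HTML only if its very first token is one of the listed opening tags, so a leading paragraph of plain text keeps later HTML examples from triggering it.

```python
import re

# Hypothetical heuristic: classify a part as HTML only if its *first* token
# is one of the common opening tags Kenneth lists. Leading whitespace is skipped.
LEADING_TAG = re.compile(r"\s*<(html|body|div|p)\b", re.IGNORECASE)


def looks_like_html(text: str) -> bool:
    return LEADING_TAG.match(text) is not None


print(looks_like_html("<html><body>Hello</body></html>"))     # True
print(looks_like_html("Here is an example:\n<p>Hello</p>\n"))  # False
```

As written, a part that starts with a <!DOCTYPE declaration would not be recognized. That is one more of the "woulds" that make such a scheme fragile.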
Re: html experts: empty style tags.
Ned Slider wrote on Thu, 29 Jan 2009 19:02:19 +: Also, I have a low scoring generic 'body' rule for common drug names that should have hit on Dan's mail (and your reply) if SA did strip that junk, but it obviously doesn't (at least not for me). It will not work on these messages as they are not HTML. In a text/plain message the Vi<style>sdfghjnkrdfbn</style>Ag<style>ghbfghfgh</style>ra will just appear as it appears here. I don't know how SA works in this respect, but it wouldn't make sense to remove the markup from text/plain. However, I think for text/html it first removes the tags before evaluating the content. I remember that this behavior has been stressed and explained several times in the past. And it wouldn't make sense to leave the contents of a script or style block, so I'm sure it removes them as well, whether or not they contain garbage. So the parser should then indeed see a plain V... Maybe Justin or Karsten can confirm? Kai
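A minimal Python sketch of the behavior Kai expects. It uses the standard library's html.parser and is not SpamAssassin's own HTML renderer, which it does not reproduce. For a text/html part, all tags are dropped and the contents of style and script blocks are skipped. A text/plain part passes through untouched, which matches Ned's observation that his drug-name rule did not fire on plain-text mail.

```python
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Collects text outside <style>/<script> elements and drops all tags."""

    SKIP = {"style", "script"}

    def __init__(self):
        super().__init__()
        self.depth = 0   # > 0 while inside a skipped element
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if not self.depth:
            self.parts.append(data)


def render(part: str, content_type: str) -> str:
    """Hypothetical renderer: text/plain is returned as-is, text/html is stripped."""
    if content_type != "text/html":
        return part
    parser = VisibleText()
    parser.feed(part)
    parser.close()
    return "".join(parser.parts)


sample = "Buy Vi<style>sdfghjnkrdfbn</style>Ag<style>ghbfghfgh</style>ra!"
print(render(sample, "text/html"))   # Buy ViAgra!
print(render(sample, "text/plain"))  # unchanged, tags and all
```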
Re: html experts: empty style tags.
Ned Slider wrote on Thu, 29 Jan 2009 19:02:19 +: Also, I have a low scoring generic 'body' rule for common drug names that should have hit on Dan's mail (and your reply) if SA did strip that junk, but it obviously doesn't (at least not for me). On 30.01.09 16:31, Kai Schaetzl wrote: It will not work on these messages as they are not HTML. In a text/plain message the Vi<style>sdfghjnkrdfbn</style>Ag<style>ghbfghfgh</style>ra will just appear like it appears here. Aren't there any MUAs that try to autodetect the right content type? Even from Microsoft? -- Matus UHLAR - fantomas, uh...@fantomas.sk ; http://www.fantomas.sk/ Warning: I wish NOT to receive e-mail advertising to this address. Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu. Micro$oft random number generator: 0, 0, 0, 4.33e+67, 0, 0, 0...
Re: html experts: empty style tags.
Kai Schaetzl wrote: Ned Slider wrote on Thu, 29 Jan 2009 19:02:19 +: Also, I have a low scoring generic 'body' rule for common drug names that should have hit on Dan's mail (and your reply) if SA did strip that junk, but it obviously doesn't (at least not for me). It will not work on these messages as they are not HTML. In a text/plain message the Vi<style>sdfghjnkrdfbn</style>Ag<style>ghbfghfgh</style>ra will just appear like it appears here. I don't know how SA works in this respect, but it wouldn't make sense to remove the markup from text/plain. However, I think for text/html it first removes the tags before evaluating the content. I remember that this behavior has been stressed and explained several times in the past. And it wouldn't make sense to leave the contents of a script or style block, so I'm sure it removes them as well, no matter if it contains garbage or not. So, the parser should then indeed see a plain V... Maybe Justin or Karsten can confirm? Kai Good point. I'll try running some tests of my own, sending the test mails HTML-formatted. Thanks for the explanation :)
Re: html experts: empty style tags.
On Thu, 29 Jan 2009 18:00:47 -0800, Kelson kel...@speed.net wrote: On the subject of <style> vs <style type=text/css>: *Technically* the TYPE attribute is required in HTML 4, but in practice, no one really uses anything other than CSS, and most browsers will assume it. The current draft of HTML 5 recognizes this, and makes TYPE explicitly optional for STYLE, defaulting to text/css if not present: http://www.whatwg.org/specs/web-apps/current-work/#the-style-element So in HTML 5, this is perfectly valid:
<style> h1 {font-family: Arial} </style>
It is only allowed within HEAD (though again in practice, most browsers are lenient about this), but if I'm reading the HTML 5 spec correctly, it will also allow <style> within the body, but *only* if it contains the SCOPED attribute, and only at the beginning of a section, like this:
<div> <style scoped> h2 {color: green} </style> Bunch of content </div>
But this would not be:
<div> Some content <style scoped> h2 {color: red} </style> More content </div>
As far as I was aware, style within the body is only valid as an attribute of an element, e.g. <p style="font-family: serif;">some text</p>. It's my understanding that you'd only have a <style> element with dir/lang/media/title/type= attributes inline in something like a PHP page... which would be a tad pointless. Not entirely sure what my point is here, but it filled up some time until dinner was ready :-D Best to all Nigel
Re: html experts: empty style tags.
--On Friday, January 30, 2009 4:41 PM +0100 Matus UHLAR - fantomas uh...@fantomas.sk wrote: Aren't there any MUAs that try to autodetect the right content type? Even from microsoft? IE had a nasty habit of ignoring the MIME type in HTTP headers and rendering HTML even when one wanted it displayed as text/plain. So it wouldn't surprise me if Outlook (Express) had the same annoying helpfulness.
Re: html experts: empty style tags.
On Fri, 2009-01-30 at 12:56 -0800, Kenneth Porter wrote: IE had a nasty habit of ignoring the MIME type in HTTP headers and rendering HTML even when one wanted it displayed as text/plain. So it wouldn't surprise me if Outlook (Express) had the same annoying helpfulness. I've wasted more time than I care to remember sorting out the so-called HTML in MS LookOut messages I've wanted to save for later reference. Almost without exception they fail HTMLtidy verification in spectacular fashion. Now I manually annotate the plaintext part because that's quicker than fixing the HTML part. The problem is independent of LookOut version: MicroSerfs just don't 'get' HTML. Martin
html experts: empty style tags.
Is it EVER acceptable to have an empty style tag? (It appears that anything inside an empty <style></style> is not displayed.) I see more and more of this in spam. I can deal with this with a rawbody check, but how about adding it to the official SA HTML checks? <body><style>iihdpuvikzxwdivdidulauqqgbjwkpgxfsufxkmnjkcn</style> Best I can tell from research, this is valid: <style type=text/css> h1 {color:red} p {color:blue} </style> This is NOT valid: <style>garbage that won't show up </style> http://www.w3schools.com/TAGS/tag_style.asp This should catch it:
rawbody T_HTML_ILLEGAL_STYLE /<style>/i
/ruletest.pl styletest.cf t.eml
Hit Body (or Subject line) Rules
Content Filter Analysis Details: (0.0 points)
 pts rule name              description
---- ---------------------- --------------------------------------------------
 0.0 T_HTML_ILLEGAL_STYLE   RAW: T_HTML_ILLEGAL_STYLE
Subtests Hit: none
-- Michael Scheidell, CTO
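A Python sketch of what T_HTML_ILLEGAL_STYLE would match. It assumes the pattern the archive flattened to "/style/i" was originally a bare <style> tag with the /i (case-insensitive) flag. The typed stylesheet from the post does not match, but the garbage block does, and so does an uppercase empty pair like the ones later replies attribute to Microsoft mailers. That fits Phil's report that the rule hits a lot of ham.

```python
import re

# Assumed reconstruction of T_HTML_ILLEGAL_STYLE's pattern: a bare <style>
# tag with no attributes, matched case-insensitively (the /i flag).
BARE_STYLE = re.compile(r"<style>", re.IGNORECASE)

samples = {
    "valid": '<style type="text/css"> h1 {color:red} p {color:blue} </style>',
    "garbage": "<style>garbage that won't show up </style>",
    "ms_empty": "<STYLE></STYLE>",
}

for name, html in samples.items():
    print(name, bool(BARE_STYLE.search(html)))
# valid False
# garbage True
# ms_empty True
```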
Re: html experts: empty style tags.
Michael Scheidell wrote on Thu, 29 Jan 2009 07:21:32 -0500: Is it EVER acceptable to have an empty style tag? It's not valid HTML, but what mail client does send valid HTML? (appears that anything inside an empty <style></style> is not displayed.) Same goes for a style tag with a type. <body><style>iihdpuvikzxwdivdidulauqqgbjwkpgxfsufxkmnjkcn</style> I may be wrong, but I think a style section in the body is illegal and rather unlikely to come from a legit email client (more unlikely than a style tag without a type attribute). If it doesn't display, what is it good for? Faking Bayes? <style>iihdpuvikzxwdivdidulauqqgbjwkpgxfsufxkmnjkcn</style> You could check for style tags (of any kind) that don't include {, } and :, as these are absolutely necessary for a CSS rule. Kai
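Kai's heuristic, sketched in Python as a hypothetical helper. It finds every style block (with or without attributes) whose contents lack any of "{", "}" or ":". A regex over the raw HTML is an assumption made for brevity, and it ignores edge cases such as comments inside the block.

```python
import re

# A <style ...> ... </style> block, capturing its contents.
STYLE_BLOCK = re.compile(r"<style\b[^>]*>(.*?)</style\s*>", re.IGNORECASE | re.DOTALL)


def non_css_style_blocks(html: str) -> list:
    """Return style-block contents that lack any of '{', '}' or ':'."""
    return [body for body in STYLE_BLOCK.findall(html)
            if not all(ch in body for ch in "{}:")]


print(non_css_style_blocks('<style type="text/css"> h1 {color:red} </style>'))  # []
print(non_css_style_blocks("Vi<style>sdfghjnkrdfbn</style>agra"))  # ['sdfghjnkrdfbn']
```

Taken literally, the heuristic also flags an empty <STYLE></STYLE> (it returns ['']). According to later replies, that pattern is common in legitimate Microsoft-generated mail.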
Re: html experts: empty style tags.
On Thu, 2009-01-29 at 15:31 +0100, Kai Schaetzl wrote: Michael Scheidell wrote on Thu, 29 Jan 2009 07:21:32 -0500: If it doesn't display what is it good for? Faking bayes? No, obfuscating the actual display: Buy Vi<style>sdfghjnkrdfbn</style>Ag<style>ghbfghfgh</style>ra! -- Daniel J McDonald, CCIE #2495, CISSP #78281, CNX Austin Energy http://www.austinenergy.com
Re: html experts: empty style tags.
On Thu, 2009-01-29 at 07:21 -0500, Michael Scheidell wrote: Is it EVER acceptable to have an empty style tag? (appears that anything inside an empty <style></style> is not displayed.) see more and more of this in spam. can deal with this with a raw body check, but how about adding it to the official SA html checks? What exactly do you mean? I guess this type of rule can only be done using rawbody checks. HTML::Parser, for example, doesn't display the content, just like any browser or MUA. Or so I hope. ;) <body><style>iihdpuvikzxwdivdidulauqqgbjwkpgxfsufxkmnjkcn</style>
* required attribute type not specified
* document type does not allow element style here
See the W3C Markup Validation Service: http://validator.w3.org/ Yup, not valid. Same result for XHTML 1.1, XHTML 1.0, and HTML 4.01 Strict and Transitional. -- char *t=\10pse\0r\0dtu...@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4; main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;il;i++){ i%8? c=1: (c=*++x); c128 (s+=h); if (!(h=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}
RE: html experts: empty style tags.
It hits an awful lot of ham here. Cheers, Phil -- Phil Randal | Networks Engineer Herefordshire Council | Deputy Chief Executive's Office | I.C.T. Services Division Thorn Office Centre, Rotherwas, Hereford, HR2 6JT Tel: 01432 260160 email: pran...@herefordshire.gov.uk Any opinion expressed in this e-mail or any attached files are those of the individual and not necessarily those of Herefordshire Council. This e-mail and any attached files are confidential and intended solely for the use of the addressee. This communication may contain material protected by law from being passed on. If you are not the intended recipient and have received this e-mail in error, you are advised that any use, dissemination, forwarding, printing or copying of this e-mail is strictly prohibited. If you have received this e-mail in error please contact the sender immediately and destroy all copies of it. From: Michael Scheidell [mailto:scheid...@secnap.net] Sent: 29 January 2009 12:22 To: SpamAssassin Users List Cc: Wazir Shpoon; Jose Montero Subject: html experts: empty style tags. Is it EVER acceptable to have an empty style tag? (appears that anything inside an empty <style></style> is not displayed.) see more and more of this in spam. can deal with this with a raw body check, but how about adding it to the official SA html checks?
Re: html experts: empty style tags.
On Thu, 29 Jan 2009, Michael Scheidell wrote: (appears that anything inside an empty <style></style> is not displayed.) see more and more of this in spam. can deal with this with a raw body check, but how about adding it to the official SA html checks? For a long time I have had local rules that score on empty STYLE, FONT, STRONG, SPAN and A tags, and strings of adjacent FONT tags. Unfortunately they hit often enough on legitimate mail sent by braindead MUAs (or, more precisely, MUAs with braindead HTML editors/generators) that they cannot be scored very strongly. They might be useful as META fodder to magnify the score of other spam signs, though. -- John Hardin KA7OHZ http://www.impsec.org/~jhardin/ jhar...@impsec.org FALaholic #11174 pgpk -a jhar...@impsec.org key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79 --- Gun Control laws cannot reduce violent crime, because gun control laws focus obsessively on a tool a criminal might use to commit a crime rather than the criminal himself and his act of violence. --- 3 days until the 6th anniversary of the loss of STS-107 Columbia
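John doesn't post his rules, so here is a hypothetical Python approximation of the two checks he describes. The first finds empty STYLE, FONT, STRONG, SPAN and A elements (attributes and whitespace allowed). The second finds runs of adjacent FONT opening tags. The run length of three is an arbitrary assumption.

```python
import re

# Opening tag (attributes allowed), optional whitespace, then the matching
# close tag; \1 ties the close to the open, case-insensitively.
EMPTY_TAG = re.compile(r"<(style|font|strong|span|a)\b[^>]*>\s*</\1\s*>", re.IGNORECASE)

# Three or more consecutive <font ...> opening tags (threshold is an assumption).
FONT_RUN = re.compile(r"(?:<font\b[^>]*>\s*){3,}", re.IGNORECASE)


def empty_tag_hits(html: str) -> list:
    return [m.group(1).lower() for m in EMPTY_TAG.finditer(html)]


sample = '<font face="Arial"></font><span> </span><b>x</b><a href="#"></a><STYLE></STYLE>'
print(empty_tag_hits(sample))  # ['font', 'span', 'a', 'style']
print(bool(FONT_RUN.search('<font size=2><font face="Arial"><font color=red>hi')))  # True
```

Given the ham hits John reports, hits like these seem better suited as low-weight inputs to a meta rule than as a standalone score.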
Re: html experts: empty style tags.
Dan McDonald wrote on Thu, 29 Jan 2009 08:56:03 -0600: No, obfuscating the actual display: Buy Vi<style>sdfghjnkrdfbn</style>Ag<style>ghbfghfgh</style>ra! But SA strips all HTML away before content processing, including that garbage within the style tags. And from Michael's description it doesn't sound like it is used like that, anyway. Kai
Re: html experts: empty style tags.
Kai Schaetzl wrote: Dan McDonald wrote on Thu, 29 Jan 2009 08:56:03 -0600: No, obfuscating the actual display: Buy Vi<style>sdfghjnkrdfbn</style>Ag<style>ghbfghfgh</style>ra! but SA strips all HTML away before content processing, including that garbage within the style tags. And from Michael's description it doesn't sound like it is used like that, anyway. Kai I've seen it used like that. Also, I have a low-scoring generic 'body' rule for common drug names that should have hit on Dan's mail (and your reply) if SA did strip that junk, but it obviously doesn't (at least not for me).
Re: html experts: empty style tags.
John Hardin wrote: Unfortunately they hit often enough on legitimate mail sent by braindead MUAs (or, more precisely, MUAs with braindead HTML editors/generators) that they cannot be scored very strongly. You have LEGIT EMAIL with this in it? <style> -- Michael Scheidell, CTO
Re: html experts: empty style tags.
Michael Scheidell wrote: John Hardin wrote: Unfortunately they hit often enough on legitimate mail sent by braindead MUAs (or, more precisely, MUAs with braindead HTML editors/generators) that they cannot be scored very strongly. you have LEGIT EMAIL with this in it? <style> I do too. AFAICT, it's Microsoft related. /Per Jessen, Zürich
Re: html experts: empty style tags.
On Thu, Jan 29, 2009 at 08:50:32PM +0100, Per Jessen wrote: you have LEGIT EMAIL with this in it? <style> I do too. AFAICT, it's Microsoft related. Taking a look at my January corpus, there are relatively many hits for that, including things like <STYLE></STYLE>. A lot of the mails, as mentioned above, seem to have this (QP-encoded): <meta name=3DGenerator content=3DMicrosoft Word 12 (filtered medium)> -- Randomly Selected Tagline: At least it had heated rear windows--so your hands would stay warm while you pushed. - Unknown about the Yugo
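A Python sketch that spots the generator tag in a quoted-printable body. It uses the standard library's quopri codec to decode =3D back to "=". The sample line is an assumption: the archive stripped its brackets and quotes, so the regex accepts the attribute values quoted or unquoted. generator() is a hypothetical helper name.

```python
import quopri
import re

# Matches the generator meta tag with or without quoted attribute values.
GENERATOR = re.compile(rb'<meta\s+name="?Generator"?\s+content="?([^">]*)', re.IGNORECASE)


def generator(qp_body: bytes):
    """Decode a quoted-printable body and return its Generator meta value, if any."""
    decoded = quopri.decodestring(qp_body)
    match = GENERATOR.search(decoded)
    return match.group(1).decode("ascii", "replace").strip() if match else None


sample = b'<meta name=3DGenerator content=3D"Microsoft Word 12 (filtered medium)">'
print(generator(sample))  # Microsoft Word 12 (filtered medium)
```

A check like this might be combined with the empty-style hits in a meta rule so that Word-generated mail is not penalized. That combination is a suggestion and was not tested in the thread.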
Re: html experts: empty style tags.
On Thu, 29 Jan 2009, Michael Scheidell wrote: John Hardin wrote: Unfortunately they hit often enough on legitimate mail sent by braindead MUAs (or, more precisely, MUAs with braindead HTML editors/generators) that they cannot be scored very strongly. you have LEGIT EMAIL with this in it? <style> I was speaking of all the empty-tag rules together. I like the idea of checking for a style tag with obvious non-style data in it... -- John Hardin
Re: html experts: empty style tags.
--On Thursday, January 29, 2009 8:34 AM -0800 John Hardin jhar...@impsec.org wrote: For a long time I have had local rules that score on empty STYLE, FONT, STRONG, SPAN and A tags, and strings of adjacent FONT tags. Unfortunately they hit often enough on legitimate mail sent by braindead MUAs (or, more precisely, MUAs with braindead HTML editors/generators) that they cannot be scored very strongly. They might be useful as META fodder to magnify the score of other spam signs, though. Another problem I see is mismatched tags (e.g. <foo> without </foo> or vice versa, or improper nesting). I briefly looked into scoring it, but it's very prevalent in ham. One way to test for it is to feed the message to tidy and then count the error messages. Divide by the size of the message to normalize the error rate, and score based on errors per character. I don't have any working code. I just hand-fed some messages to tidy and Perl and fed the resulting counts to my calculator. I wish there were some way we could shame Microsoft into generating at least a valid nesting of tags, even ones that violate the DTD, so that we could score for errors. If MS made the same number of errors in the SMTP protocol, we'd never accept their messages. Perhaps a plugin could modify the HTML content to add an error report at the end. That would certainly get all the suits upset that their messages, created with their favorite vendor's software, were arriving full of bad markup.
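Kenneth used tidy and had no working code. As a stand-in, here is a hypothetical Python sketch that swaps tidy for the standard library's html.parser. It counts only nesting problems: stray close tags, elements closed out of order, and elements still open at the end. It then normalizes by message length, as he describes. It is much cruder than tidy. For example, it also counts legally omitted closing tags such as </p> as errors.

```python
from html.parser import HTMLParser

# Elements that never take a closing tag in HTML.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input", "link", "meta", "source", "wbr"}


class NestingChecker(HTMLParser):
    """Counts stray close tags and elements left unclosed or closed out of order."""

    def __init__(self):
        super().__init__()
        self.stack = []
        self.errors = 0

    def handle_starttag(self, tag, attrs):
        if tag not in VOID:
            self.stack.append(tag)

    def handle_startendtag(self, tag, attrs):
        pass  # self-closing tags such as <br/> need no matching close

    def handle_endtag(self, tag):
        if tag in VOID:
            return
        if tag not in self.stack:
            self.errors += 1  # close tag with no matching open tag
            return
        # Anything opened after the matching tag was never closed.
        while self.stack[-1] != tag:
            self.stack.pop()
            self.errors += 1
        self.stack.pop()


def nesting_errors(html: str) -> int:
    checker = NestingChecker()
    checker.feed(html)
    checker.close()
    return checker.errors + len(checker.stack)  # plus tags still open at EOF


def error_rate(html: str) -> float:
    """Errors per character, the normalization Kenneth describes."""
    return nesting_errors(html) / max(len(html), 1)


print(nesting_errors("<div><b>bold</b></div>"))  # 0
print(nesting_errors("<div><b>bold</div></i>"))  # 2
```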
Re: html experts: empty style tags.
--On Thursday, January 29, 2009 2:09 PM -0500 Michael Scheidell scheid...@secnap.net wrote: John Hardin wrote: Unfortunately they hit often enough on legitimate mail sent by braindead MUAs (or, more precisely, MUAs with braindead HTML editors/generators) that they cannot be scored very strongly. you have LEGIT EMAIL with this in it? Microsoft products regularly have <STYLE></STYLE> for no obvious reason. However, lower-case <style></style> is unusual, but not unheard of. Joseph Brennan Columbia University Information Technology
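SpamAssassin rules use Perl regexes, which (like Python's) are case-sensitive unless a flag such as /i is given. Michael's T_HTML_ILLEGAL_STYLE uses /i, so it also matches the uppercase pairs Joseph attributes to Microsoft products. A minimal Python sketch of the alternative follows. Dropping case-insensitivity restricts the match to the lowercase form; allowing whitespace between the tags is an assumption.

```python
import re

# No re.IGNORECASE: only lowercase tags match, so the uppercase pairs that
# Microsoft products emit are skipped. Whitespace between tags is allowed.
LOWER_EMPTY_STYLE = re.compile(r"<style>\s*</style>")

print(bool(LOWER_EMPTY_STYLE.search("<STYLE></STYLE>")))  # False
print(bool(LOWER_EMPTY_STYLE.search("<style></style>")))  # True
```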
Re: html experts: empty style tags.
On the subject of <style> vs <style type=text/css>: *Technically* the TYPE attribute is required in HTML 4, but in practice, no one really uses anything other than CSS, and most browsers will assume it. The current draft of HTML 5 recognizes this, and makes TYPE explicitly optional for STYLE, defaulting to text/css if not present: http://www.whatwg.org/specs/web-apps/current-work/#the-style-element So in HTML 5, this is perfectly valid:
<style> h1 {font-family: Arial} </style>
It is only allowed within HEAD (though again in practice, most browsers are lenient about this), but if I'm reading the HTML 5 spec correctly, it will also allow <style> within the body, but *only* if it contains the SCOPED attribute, and only at the beginning of a section, like this:
<div> <style scoped> h2 {color: green} </style> Bunch of content </div>
But this would not be:
<div> Some content <style scoped> h2 {color: red} </style> More content </div>
-- Kelson Vibber SpeedGate Communications www.speed.net
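A simplified Python sketch of the placement rule as Kelson reads the HTML 5 draft. It flags any style element inside body that lacks the scoped attribute, or that appears after its parent element already has text or child elements. It uses the standard library's html.parser. StylePlacement and style_violations are hypothetical names, and the sketch tracks only this one rule, not full HTML validity.

```python
from html.parser import HTMLParser

VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input", "link", "meta", "source", "wbr"}


class StylePlacement(HTMLParser):
    """Counts <style> elements in <body> that break the rule Kelson describes."""

    def __init__(self):
        super().__init__()
        self.stack = []        # [tag, content_seen] for each open element
        self.in_body = False
        self.in_style = False
        self.violations = 0

    def _mark_parent_started(self):
        if self.stack:
            self.stack[-1][1] = True

    def handle_starttag(self, tag, attrs):
        if tag == "style":
            scoped = any(name == "scoped" for name, _ in attrs)
            parent_started = bool(self.stack) and self.stack[-1][1]
            if self.in_body and (not scoped or parent_started):
                self.violations += 1
            self.in_style = True
            return
        if tag == "body":
            self.in_body = True
        self._mark_parent_started()
        if tag not in VOID:
            self.stack.append([tag, False])

    def handle_endtag(self, tag):
        if tag == "style":
            self.in_style = False
            return
        if tag == "body":
            self.in_body = False
        # Pop back to the nearest matching open element, if there is one.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        if not self.in_style and data.strip():
            self._mark_parent_started()


def style_violations(html: str) -> int:
    checker = StylePlacement()
    checker.feed(html)
    checker.close()
    return checker.violations


ok = ("<html><head><style>h1 {font-family: Arial}</style></head><body>"
      "<div><style scoped>h2 {color: green}</style>Bunch of content</div></body></html>")
bad = "<body><div>Some content<style scoped>h2 {color: red}</style>More content</div></body>"
unscoped = "<body><style>p {color: blue}</style></body>"

print(style_violations(ok), style_violations(bad), style_violations(unscoped))  # 0 1 1
```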