Re: HTML Validator
--On Friday, March 10, 2006 5:08 PM -0800 Kenneth Porter [EMAIL PROTECTED] wrote:

> Anyone know of a good validator that can be run over a MIME part to report on the quality of the HTML? This might be used as a go/no-go filter at milter level, or it could be used as an SA plugin to assign a variable score based on the quality of the HTML. For mailing lists catering to newbies who love HTML and can't understand why us old-timers hate it, we can set the list to exclude all invalid HTML. Sure, we'll accept your HTML. But only if it's really HTML. Not that crap that most MUA's write.

I was trying to remember a web page I found that counseled not to use DOCTYPE and HTML tags around email to escape spam filters (pretty weird advice IMO) and I ran across indications that AOL is rejecting mail that fails to pass validation:

http://www.petefreitag.com/item/307.cfm
http://info.aol.co.uk/about/spam/mailer-daemon.adp
http://postmaster.info.aol.com/errors/554hvufo.html
http://www.clickz.com/showPage.html?page=3490146
Re: HTML Validator
Theo Van Dinter wrote:

> On Wed, Mar 15, 2006 at 09:58:52PM -0700, Philip Prindeville wrote:
>> Ok, does anyone have *recent* statistical analysis (i.e. not almost a year old) on this? It could be that the people using this boneheaded construct have realized the error of their ways, and stopped doing it.
>
> Unfortunately not. I updated the ticket (http://issues.apache.org/SpamAssassin/show_bug.cgi?id=4255) with new stats and a plugin that implements the check so people can play with it. The best version was comparing domains:
>
>   MSECS    SPAM%     HAM%     S/O    RANK   SCORE  NAME
>       0    28446     5023   0.850    0.00    0.00  (all messages)
>     0.0  84.9921  15.0079   0.850    0.00    0.00  (all messages as %)
>   0.302   0.3340   0.1195   0.737    0.00    0.01  T_HTTPS_HTTP_MISMATCH
>
> If people want to play with the plugin and can improve the hit rate to a usable level (or if you find a bug in the code), please let us know! But otherwise this rule sucks pretty badly. :(

Hmm. Thanks. Trying out the attachment, but having issues. Using 3.1.0 on FC3 Linux. Updated the bug.

-Philip
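For anyone reading the stats above: the S/O column appears to be the spam share of a rule's hits, i.e. spam% / (spam% + ham%). That reading is an assumption on my part, but it reproduces the quoted figures to within rounding. A minimal Python sketch (the function name `s_o_ratio` is made up for illustration):

```python
def s_o_ratio(spam_pct, ham_pct):
    """Spam share of a rule's hits, assuming S/O = spam% / (spam% + ham%)."""
    total = spam_pct + ham_pct
    # A rule that never fires has no meaningful ratio; report 0.0 for it.
    return spam_pct / total if total else 0.0


# Percentages copied from the table quoted above.
whole_corpus = s_o_ratio(84.9921, 15.0079)
mismatch_rule = s_o_ratio(0.3340, 0.1195)
print(f"corpus S/O ~ {whole_corpus:.2f}, T_HTTPS_HTTP_MISMATCH S/O ~ {mismatch_rule:.2f}")
```

On that reading, the rule's hits are a smaller spam share than the corpus as a whole, which fits the verdict that it is not usable as it stands.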
Re: HTML Validator
On Thu, Mar 16, 2006 at 12:50:34PM -0700, Philip Prindeville wrote:

> Hmm. Thanks. Trying out the attachment, but having issues. Using 3.1.0 on FC3 Linux. Updated the bug.

In general, it's bad to have the same conversation in multiple locations. I'd prefer to discuss issues with the plugin here as opposed to bugzilla since the plugin was put there so that people in the future can easily access it. Debugging problems and such I'd prefer to talk about here.

I also responded to your issue in the ticket. It essentially came down to: yes, the plugin works fine with 3.1.0. The errors you saw indicate that you're not using 3.1.x.

--
Randomly Generated Tagline:
Diversity is God's way of amusing himself.
Re: HTML Validator
Kenneth Porter wrote:

> On Friday, March 10, 2006 9:43 PM -0700 Philip Prindeville [EMAIL PROTECTED] wrote:
>> Do you mean: http://validator.w3.org/source/
>
> I thought that was just a web form-based validator. I'll have to look at it to see if the validator can be run over an attachment (ie. an HTML MIME part) from a separate mail filter (eg. MIMEDefang).

I'm wondering what would be involved in putting in an HTML parser that could call various rules to check things, like the case of:

<a href="http://www.foo.com/xyzzy">http://www.bar.com/aardvark</a>

where the link disagrees with the text between the anchor tags (yeah, you could limit it to partial matches on the host-portion)...

This seems to be the Korean Chase issue that Chris encountered.

-Philip
Re: HTML Validator
On Wed, Mar 15, 2006 at 08:13:48PM -0700, Philip Prindeville wrote:

> I'm wondering what would be involved in putting in an HTML parser that could call various rules to check things, like the case of:

Well, you wouldn't call various rules, you'd look for a behavior while parsing and flag it for later detection by a rule. The current code means modifications have to be made to HTML.pm.

> <a href="http://www.foo.com/xyzzy">http://www.bar.com/aardvark</a>

This kind of rule actually doesn't need to be in the HTML parser, you could easily write a plugin that uses the already parsed anchor information.

FWIW though, this rule has previously been discussed and dismissed as being non-useful (too many FPs). Earlier today on this list even. ;)

--
Randomly Generated Tagline:
You can lead a bigot to water, but if you don't tie him up you can't make him drown. - The Psychodots
Re: HTML Validator
Philip Prindeville wrote:

> I'm wondering what would be involved in putting in an HTML parser that could call various rules to check things, like the case of:
>
> <a href="http://www.foo.com/xyzzy">http://www.bar.com/aardvark</a>
>
> where the link disagrees with the text between the anchor tags (yeah, you could limit it to partial matches on the host-portion)...

This is the functional equivalent of pissing in the wind. If you are downwind, you are going to get wet.

Anchor text in too many/most cases will not match the HREF. grep is good, but it isn't good enough to catch all cases without significant overhead.

Anchor text is a descriptor, nothing more than that. It is not a regurgitation of the link HREF.
Re: HTML Validator
Craig Morrison wrote:

> Philip Prindeville wrote:
>> I'm wondering what would be involved in putting in an HTML parser that could call various rules to check things, like the case of:
>>
>> <a href="http://www.foo.com/xyzzy">http://www.bar.com/aardvark</a>
>>
>> where the link disagrees with the text between the anchor tags (yeah, you could limit it to partial matches on the host-portion)...
>
> This is the functional equivalent of pissing in the wind. If you are downwind, you are going to get wet.
>
> Anchor text in too many/most cases will not match the HREF. grep is good, but it isn't good enough to catch all cases without significant overhead.
>
> Anchor text is a descriptor, nothing more than that. It is not a regurgitation of the link HREF.

Usually it's not. That's the point. It's when the anchor text tries to look like a URL that one needs to be suspicious.

At the very least, if the anchor text starts with "https://" but the anchor URL looks like "http://", I'd say that this is a definite spam.

Does anyone have a way of doing a statistical analysis of ham that contains http(s?):// as the beginning of the anchor text?

-Philip
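To make the check discussed above concrete, here is a rough sketch in Python using only the standard library. It is not the SpamAssassin plugin from the ticket and not how HTML.pm works; the names `AnchorCollector` and `suspicious_anchors` are invented for illustration. It ignores descriptive anchor text entirely and only flags anchors whose text itself starts with http:// or https:// and then disagrees with the href on host, or shows https while the href is plain http.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


class AnchorCollector(HTMLParser):
    """Collect (href, visible text) pairs for every <a> element."""

    def __init__(self):
        super().__init__()
        self.anchors = []   # finished (href, text) pairs
        self._href = None   # href of the anchor currently open, if any
        self._text = []     # text fragments seen inside that anchor

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href") or ""
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.anchors.append((self._href, "".join(self._text).strip()))
            self._href = None


def suspicious_anchors(html):
    """Return anchors whose text looks like a URL but disagrees with the href."""
    parser = AnchorCollector()
    parser.feed(html)
    parser.close()
    hits = []
    for href, text in parser.anchors:
        if not text.lower().startswith(("http://", "https://")):
            continue  # descriptive anchor text: nothing to compare
        try:
            shown, target = urlsplit(text), urlsplit(href)
            shown_host, target_host = shown.hostname or "", target.hostname or ""
        except ValueError:
            continue  # unparseable URL; a real filter might score this separately
        scheme_mismatch = shown.scheme == "https" and target.scheme == "http"
        if scheme_mismatch or shown_host != target_host:
            hits.append((href, text))
    return hits


sample = '<a href="http://www.foo.com/xyzzy">http://www.bar.com/aardvark</a>'
print(suspicious_anchors(sample))
```

Comparing whole hostnames is the strictest choice; a partial match on the host portion, as suggested earlier in the thread, would flag fewer anchors. Whether either variant avoids the false positives mentioned above is exactly what the ham statistics would have to show.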
Re: HTML Validator
On Wed, Mar 15, 2006 at 08:40:51PM -0700, Philip Prindeville wrote:

> Does anyone have a way of doing a statistical analysis of ham that contains http(s?):// as the beginning of the anchor text?

So for the second time today: http://issues.apache.org/SpamAssassin/show_bug.cgi?id=4255

--
Randomly Generated Tagline:
We are what we pretend to be. -- Kurt Vonnegut, Jr.
Re: HTML Validator
Theo Van Dinter wrote:

> On Wed, Mar 15, 2006 at 08:40:51PM -0700, Philip Prindeville wrote:
>> Does anyone have a way of doing a statistical analysis of ham that contains http(s?):// as the beginning of the anchor text?
>
> So for the second time today: http://issues.apache.org/SpamAssassin/show_bug.cgi?id=4255

Ok, does anyone have *recent* statistical analysis (i.e. not almost a year old) on this? It could be that the people using this boneheaded construct have realized the error of their ways, and stopped doing it.

-Philip
Re: HTML Validator
Kenneth Porter wrote:

> On Wednesday, March 08, 2006 6:46 PM -0800 Kenneth Porter [EMAIL PROTECTED] wrote:
>> Makes me wonder about installing outbound filters that run a validator and reject anything that fails. I often see flame wars on mailing lists about allowing HTML posts to the list, but I wonder how the arguments would change if one allowed only *validated* HTML. I'll bet most who insist on using HTML would immediately be rejected by the validator.
>>
>> Sorry, your message was rejected because your MUA vendor writes garbage that we can't parse, and makes you look like a spammer. ;)
>
> Anyone know of a good validator that can be run over a MIME part to report on the quality of the HTML? This might be used as a go/no-go filter at milter level, or it could be used as an SA plugin to assign a variable score based on the quality of the HTML. For mailing lists catering to newbies who love HTML and can't understand why us old-timers hate it, we can set the list to exclude all invalid HTML. Sure, we'll accept your HTML. But only if it's really HTML. Not that crap that most MUA's write.

I have never used it in a mail context; but tidy (from our friends at w3 http://www.w3.org/People/Raggett/tidy/) is a very nice validator. Might be too big a load for SA, tho.

I think you will also find that M$ html output from OE is probably full of errors anyway...
Re: HTML Validator
Eric W. Bates wrote:

> I have never used it in a mail context; but tidy (from our friends at w3 http://www.w3.org/People/Raggett/tidy/) is a very nice validator. Might be too big a load for SA, tho.
>
> I think you will also find that M$ html output from OE is probably full of errors anyway...

All the better. Maybe they can be shamed into fixing it. ;-)

And maybe pigs will grow wings... Sigh.

-Philip
HTML Validator (was: Interesting Phishing Trick)
On Wednesday, March 08, 2006 6:46 PM -0800 Kenneth Porter [EMAIL PROTECTED] wrote:

> Makes me wonder about installing outbound filters that run a validator and reject anything that fails. I often see flame wars on mailing lists about allowing HTML posts to the list, but I wonder how the arguments would change if one allowed only *validated* HTML. I'll bet most who insist on using HTML would immediately be rejected by the validator.
>
> Sorry, your message was rejected because your MUA vendor writes garbage that we can't parse, and makes you look like a spammer. ;)

Anyone know of a good validator that can be run over a MIME part to report on the quality of the HTML? This might be used as a go/no-go filter at milter level, or it could be used as an SA plugin to assign a variable score based on the quality of the HTML.

For mailing lists catering to newbies who love HTML and can't understand why us old-timers hate it, we can set the list to exclude all invalid HTML. Sure, we'll accept your HTML. But only if it's really HTML. Not that crap that most MUA's write.
Re: HTML Validator
Kenneth Porter wrote:

> Anyone know of a good validator that can be run over a MIME part to report on the quality of the HTML? This might be used as a go/no-go filter at milter level, or it could be used as an SA plugin to assign a variable score based on the quality of the HTML.
>
> For mailing lists catering to newbies who love HTML and can't understand why us old-timers hate it, we can set the list to exclude all invalid HTML. Sure, we'll accept your HTML. But only if it's really HTML. Not that crap that most MUA's write.

Do you mean: http://validator.w3.org/source/

-Philip
Re: HTML Validator
On Friday, March 10, 2006 9:43 PM -0700 Philip Prindeville [EMAIL PROTECTED] wrote:

> Do you mean: http://validator.w3.org/source/

I thought that was just a web form-based validator. I'll have to look at it to see if the validator can be run over an attachment (ie. an HTML MIME part) from a separate mail filter (eg. MIMEDefang).
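As a rough idea of what "run a validator over an HTML MIME part" could look like outside SA, here is a small Python sketch. It pulls the text/html parts out of a raw message with the standard `email` package and hands each one to the `tidy` command-line tool mentioned elsewhere in this thread. The function names are invented, nothing here is MIMEDefang or milter code, and it assumes a `tidy` binary on the PATH that accepts `-quiet`, `-errors` and `-utf8` and exits 0 when clean, 1 on warnings and 2 on errors.

```python
import shutil
import subprocess
from email import message_from_bytes, policy


def html_parts(raw_message):
    """Yield the decoded text of every text/html part in a raw message."""
    msg = message_from_bytes(raw_message, policy=policy.default)
    for part in msg.walk():
        if part.get_content_type() == "text/html":
            yield part.get_content()


def tidy_status(html):
    """Run tidy on one HTML string; return its exit code, or None if tidy is missing."""
    if shutil.which("tidy") is None:
        return None  # assumption: the filter treats "no validator" as "no opinion"
    result = subprocess.run(
        ["tidy", "-quiet", "-errors", "-utf8"],
        input=html.encode("utf-8"),
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode


def passes_validation(raw_message):
    """Go/no-go: reject only when tidy reports hard errors (exit code 2)."""
    return all(tidy_status(html) != 2 for html in html_parts(raw_message))
```

A filter could call `passes_validation(raw_bytes)` for a go/no-go decision, or use the exit code from `tidy_status` to assign a variable score instead. Spawning a process per HTML part is the load concern raised earlier in the thread, so this is better suited to experimenting than to a busy mail server.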