Sidney Markowitz writes:
> Sidney Markowitz wrote, On 2/5/07 7:22 AM:
> > EXTRA_MPART_TYPE && __OE_MUA && !__FORGED_OE
>
> I've come up with some information and some questions about this after
> looking at the results of a set of rules T_SIDNEY_* that I put into my
> sandbox.
>
> Here is the situation: EXTRA_MPART_TYPE looks for a Content-Type header
> that contains both a content-type multipart/ specification and another
> "type=" content-type specification. At first glance that seems wrong and
> redundant and a good spam sign given its good S/O ratio and rank.
>
> However, it turns out that RFC 2387 specifies Content-Type
> multipart/related as having a type= field that describes the
> content-type of its root MIME section. The EXTRA_MPART_TYPE rule will
> fire on any RFC-compliant multipart/related message. It is the correct
> MIME type to use for a message that includes components referenced by
> other components. The common example would be an HTML message that
> includes images that are not external links.

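To make the overlap concrete, here is a rough Python sketch of the two conditions described above. It only approximates the quoted description of EXTRA_MPART_TYPE; it is not the actual SpamAssassin rule, and the function name and result keys are made up for illustration.

```python
from email import message_from_string


def classify_content_type(raw_message: str) -> dict:
    """Report which of the two conditions from the discussion a message meets.

    Hypothetical helper: it mirrors the prose description of the rule,
    not the regular expression SpamAssassin actually uses.
    """
    msg = message_from_string(raw_message)

    is_multipart = msg.get_content_maintype() == "multipart"
    # get_param() returns None when the Content-Type header has no type= parameter.
    has_type_param = msg.get_param("type") is not None
    is_related = msg.get_content_subtype() == "related"

    return {
        # multipart/* plus an extra type= parameter, as the rule is described
        "extra_mpart_type": is_multipart and has_type_param,
        # the RFC 2387 shape: multipart/related carrying a type= parameter
        "rfc2387_related": is_multipart and is_related and has_type_param,
    }


if __name__ == "__main__":
    sample = (
        'Content-Type: multipart/related; type="text/html"; boundary="b"\n'
        "\n"
        "body\n"
    )
    print(classify_content_type(sample))
```

Under these assumptions, a well-formed multipart/related message satisfies both conditions, which is the overlap being pointed out; a message that meets the first condition but not the second would be the interesting case.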
Well, don't forget -- RFC-compliant != nonspam. We're a spam-detection
tool, not an RFC-compliance detector, so sometimes an RFC-compliant
feature is still worth using as a rule.

Having said that, EXTRA_MPART_TYPE is a pretty scary rule, and the whole
area of ham FPs on mails with inline GIFs is, I suspect, pretty vast. :(
This is why we locked its score to 1.0, after all. It'd be great to
sort this out.

> Please look at past discussion on this list and in bug 5224 about
> OE_MULTIPART_RELATED. That rule was proposed in that bug and turned out
> to have a good S/O ratio. However, it was pointed out that there are
> legitimate emails that trigger it and there are no signs that can be
> used to distinguish the multipart/related header of Outlook Express mail
> that is spam and that is ham. The end result of the discussion was that
> Justin agreed that the rule should not be promoted out of testing.

It looks like in that bug, the rule was added into testing -- was it
removed later, after that point?

> Which brings me to EXTRA_MPART_TYPE. That rule also matches something
> which is legitimate RFC-compliant recommended usage when you want to
> send HTML mail with embedded images. If it doesn't get quite as good S/O
> as OE_MULTIPART_RELATED it's perhaps because there is a bit more ham
> that does that without using OE or forged OE. That does mean that you
> would see a more accurate slightly lower S/O for OE_MULTIPART_RELATED by
> removing from the hits anything that also hit FORGED_OE.
>
> So should we really be using the EXTRA_MPART_TYPE rule?
>
> To get a more fine-grained idea about what is going on with it, see the
> T_SIDNEY* rules from my sandbox. The names show what they are testing,
> with "OE" meaning Outlook Express excluding forged OE, HTML matching
> messages with HTML, EMPT meaning messages that match EXTRA_MPART_TYPE,
> and an "N" prefix to any of those three being a "Not".
>
> I also just added T_SIDNEY_EMPT_NMPREL, T_SIDNEY_OE_EMPT_NMPREL,
> T_SIDNEY_NOE_EMPT_NMPREL to see if there are any EXTRA_MPART_TYPE emails
> that are not actually RFC2387 multipart/related messages. That hasn't
> been run through mass test yet as I type this.

I'd be fine with deprecating EXTRA_MPART_TYPE and replacing it with a
better rule/rules, I think. Go for it ;)

--j.
