On Fri, May 30, 2003 at 01:58:40PM +0200, Sven Luther wrote:
>On Thu, May 29, 2003 at 11:53:32AM -0400, David Dawes wrote:
>> On Thu, May 29, 2003 at 07:34:28AM +0200, Sven Luther wrote:
>> >On Thu, May 29, 2003 at 12:00:22AM -0400, Mike A. Harris wrote:
>> >> On Wed, 28 May 2003, Sven Luther wrote:
>> >>
>> >> >> > I was being sarcastic, his message was encoded with koi8-r, which, along
>> >> >> > with being html, is one of the indescriminate reasons people block email
>> >> >> > (and get a good number of false positives)
>> >> >>
>> >> >> however, foreign language encoding is separate from html email.
>> >> >>
>> >> >> blocking based on foreign language encodings is not such a good idea.
>> >> >> blocking html is not so bad, though.
>> >> >
>> >> >You need to block multi-part mails with only one html part too though,
>> >> >which is not so easy to do, i think.
>> >>
>> >> This filter doesn't catch *everything*, but for the last 6 years
>> >> or so, it has had zero false positives for me while subscribed to
>> >> limitless numbers of mailing lists.
>> >>
>> >> :0:
>> >> * ^Content-Type:.*text/html
>> >> HTML
>> >
>> >Yep, i have this too, but half the html spam i get pass trough this, and
>> >because it is :
>> >
>> >Content-Type: multipart/alternative;
>> >        boundary="E_BBFDE6F0B.95CA_CC.D7."
>> >...
>> >This is a multi-part message in MIME format.
>> >
>> >--E_BBFDE6F0B.95CA_CC.D7.
>> >Content-Type: text/html
>> >Content-Transfer-Encoding: quoted-printable
>> >...
>> >--E_BBFDE6F0B.95CA_CC.D7.--
>> >
>> >On the other hand i don't want to catch the emails which have a text and
>> >an html section, since they are mostly valid ones.
>>
>> The XFree86 mailing list filtering checks for a few different types of
>> html-only messages, including a few levels deep of nesting (which I've
>> seen in some spam). It does catch the occasional false-positive, but
>> it's fairly rare, and a reasonable tradeoff given its effectiveness.
>
>Are they available somewhere so i can take a look ?

No, but the Perl MIME-tools package makes it easy to break down an email
message recursively. This is getting off-topic for this list, but here's
a code snippet:

  use MIME::Parser;
  use MIME::WordDecoder;

  ...

    $nparts = int($ent->parts);
    if ($nparts == 0) {
        $misc = $ent->head->get('content-type');
        if ($misc =~ /text\/html/i) {
            return "single part HTML message (1)";
        }
    } elsif ($nparts == 1) {
        my $e = ($ent->parts)[0];
        $nparts = int($e->parts);
        if ($nparts == 0) {
            $misc = $e->head->get('content-type');
            if ($misc =~ /text\/html/i) {
                return "single part HTML message (2)";
            }
        } elsif ($nparts == 1) {
            # Maybe this should be done recursively.
            my $e2 = ($e->parts)[0];
            $nparts = int($e2->parts);
            if ($nparts == 0) {
                $misc = $e2->head->get('content-type');
                if ($misc =~ /text\/html/i) {
                    return "single part HTML message (3)";
                }
            }
        }
    }

>> >Anyway, i have almost managed to write a sed script doing this, but i am
>> >not sure if it is possible to get the value of the boundary and match on
>> >it in the address pattern when using sed.
>>
>> If you're prepared to use perl, there are packages for breaking out the
>> mime structure.
>
>I would rather not use perl, if anything, i would write a small ocaml
>program to do it or maybe extend spamoracle which i already call. The
>execution cose per mail would be lower this way.

I used perl because there was a nice package available that took care of
the MIME parsing for me.

David
--
David Dawes
Founder/committer/developer                     The XFree86 Project
www.XFree86.org/~dawes

_______________________________________________
Dri-devel mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/dri-devel
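
The comment in the Perl snippet ("Maybe this should be done recursively") suggests a recursive form of the same check. What follows is only a minimal sketch of that idea, not the XFree86 list filter itself. It is written in Python with the standard-library email package rather than Perl MIME-tools, so that it runs without extra modules. The function name html_only_reason and the sample message are made up for illustration. The rule is assumed to be the same as in the snippet: a leaf of type text/html is flagged, a container with exactly one part is descended into, and anything with two or more parts (such as text plus html) is left alone.

```python
from email import message_from_string
from email.message import Message
from typing import Optional


def html_only_reason(msg: Message, depth: int = 1) -> Optional[str]:
    """Return a reason string if msg boils down to one HTML part, else None.

    Hypothetical helper: mirrors the nested if/elsif chain of the Perl
    snippet, but recurses instead of stopping at three levels.
    """
    if not msg.is_multipart():
        # Leaf part: flag it only when its declared type is text/html.
        if msg.get_content_type() == "text/html":
            return f"single part HTML message ({depth})"
        return None

    parts = msg.get_payload()  # list of sub-messages for a container
    if len(parts) == 1:
        # A wrapper around exactly one part: look inside it.
        return html_only_reason(parts[0], depth + 1)

    # Two or more parts (e.g. text + html): treated as legitimate.
    return None


# Sample shaped like the spam quoted earlier in the thread
# (the HTML body is invented for the example).
SAMPLE = """\
Content-Type: multipart/alternative;
 boundary="E_BBFDE6F0B.95CA_CC.D7."

This is a multi-part message in MIME format.

--E_BBFDE6F0B.95CA_CC.D7.
Content-Type: text/html
Content-Transfer-Encoding: quoted-printable

<html><body>example</body></html>
--E_BBFDE6F0B.95CA_CC.D7.--
"""

if __name__ == "__main__":
    print(html_only_reason(message_from_string(SAMPLE)))
```

In a real filter the recursion would probably want a depth limit, and the result would be used to file or reject the message. Both are left out here.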