On Fri, May 30, 2003 at 01:58:40PM +0200, Sven Luther wrote:
>On Thu, May 29, 2003 at 11:53:32AM -0400, David Dawes wrote:
>> On Thu, May 29, 2003 at 07:34:28AM +0200, Sven Luther wrote:
>> >On Thu, May 29, 2003 at 12:00:22AM -0400, Mike A. Harris wrote:
>> >> On Wed, 28 May 2003, Sven Luther wrote:
>> >> 
>> >> >> > I was being sarcastic, his message was encoded with koi8-r, which, along
>> >> >> > with being html, is one of the indiscriminate reasons people block email
>> >> >> > (and get a good number of false positives)
>> >> >> 
>> >> >> however, foreign language encoding is separate from html email.
>> >> >> 
>> >> >> blocking based on foreign language encodings is not such a good idea.
>> >> >> blocking html is not so bad, though.
>> >> >
>> >> >You need to block multi-part mails with only one html part too though,
>> >> >which is not so easy to do, i think.
>> >> 
>> >> This filter doesn't catch *everything*, but for the last 6 years 
>> >> or so, it has had zero false positives for me while subscribed to 
>> >> limitless numbers of mailing lists.
>> >> 
>> >> :0:
>> >> * ^Content-Type:.*text/html
>> >> HTML
>> >
>> >Yep, i have this too, but half the html spam i get passes through it,
>> >because it looks like this:
>> >
>> >Content-Type: multipart/alternative;
>> >        boundary="E_BBFDE6F0B.95CA_CC.D7."
>> >...
>> >This is a multi-part message in MIME format.
>> >
>> >--E_BBFDE6F0B.95CA_CC.D7.
>> >Content-Type: text/html
>> >Content-Transfer-Encoding: quoted-printable
>> >...
>> >--E_BBFDE6F0B.95CA_CC.D7.--
>> >
>> >On the other hand i don't want to catch the emails which have a text and
>> >an html section, since they are mostly valid ones.
>> 
>> The XFree86 mailing list filtering checks for a few different types of
>> html-only messages, including a few levels deep of nesting (which I've
>> seen in some spam).  It does catch the occasional false-positive, but
>> it's fairly rare, and a reasonable tradeoff given its effectiveness.
>
>Are they available somewhere so i can take a look ?

No, but the Perl MIME-tools package makes it easy to break down an email
message recursively.

This is getting off-topic for this list, but here's a code snippet:

use MIME::Parser;
use MIME::WordDecoder;

  ...

    # Number of direct sub-parts; 0 means this entity is a leaf.
    $nparts = int($ent->parts);
    if ($nparts == 0) {
        # Level 1: the message itself is text/html.
        $misc = $ent->head->get('content-type');
        if ($misc =~ /text\/html/i) {
            return "single part HTML message (1)";
        }
    } elsif ($nparts == 1) {
        # Level 2: a multipart wrapper around exactly one part.
        my $e = ($ent->parts)[0];
        $nparts = int($e->parts);
        if ($nparts == 0) {
            $misc = $e->head->get('content-type');
            if ($misc =~ /text\/html/i) {
                return "single part HTML message (2)";
            }
        } elsif ($nparts == 1) {
            # Level 3: one more layer of single-part nesting.
            # Maybe this should be done recursively.
            my $e2 = ($e->parts)[0];
            $nparts = int($e2->parts);
            if ($nparts == 0) {
                $misc = $e2->head->get('content-type');
                if ($misc =~ /text\/html/i) {
                    return "single part HTML message (3)";
                }
            }
        }
    }
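
The nested checks above could probably be folded into one recursive
routine. What follows is only a rough sketch of that idea, not what the
list filter actually runs. The name html_only_reason and the sample
message are made up for illustration, and it uses MIME::Entity's
mime_type method rather than matching the raw Content-Type header. Note
that it does not cap the nesting depth at three levels the way the
snippet above does.

```perl
use strict;
use warnings;
use MIME::Parser;

# Hypothetical helper: return a reason string if the entity boils down to
# a single text/html leaf (possibly wrapped in single-part containers),
# otherwise return undef.
sub html_only_reason {
    my ($ent, $depth) = @_;
    $depth //= 1;
    my @parts = $ent->parts;
    if (@parts == 0) {
        # Leaf: mime_type gives the lowercased type without parameters.
        return $ent->mime_type eq 'text/html'
            ? "single part HTML message ($depth)"
            : undef;
    }
    # A container with exactly one child: descend into it.
    return html_only_reason($parts[0], $depth + 1) if @parts == 1;
    # Two or more parts (e.g. text plus html): leave the message alone.
    return undef;
}

# Made-up sample: a multipart wrapper holding only an HTML part.
my $sample = <<'EOM';
Content-Type: multipart/alternative; boundary="b1"

--b1
Content-Type: text/html

<html><body>buy now</body></html>
--b1--
EOM

my $parser = MIME::Parser->new;
$parser->output_to_core(1);    # keep decoded bodies in memory
my $ent    = $parser->parse_data($sample);
my $reason = html_only_reason($ent);
print defined $reason ? "$reason\n" : "not HTML-only\n";
```

In a real filter the message would come from a filehandle instead, for
example via $parser->parse(\*STDIN).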


>> >Anyway, i have almost managed to write a sed script doing this, but i am
>> >not sure if it is possible to get the value of the boundary and match on
>> >it in the address pattern when using sed.
>> 
>> If you're prepared to use perl, there are packages for breaking out the
>> mime structure.
>
>I would rather not use perl, if anything, i would write a small ocaml
>program to do it or maybe extend spamoracle which i already call. The
>execution cost per mail would be lower this way.

I used perl because there was a nice package available that took care
of the MIME parsing for me.

David
-- 
David Dawes
Founder/committer/developer                     The XFree86 Project
www.XFree86.org/~dawes


_______________________________________________
Dri-devel mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/dri-devel