On Thu, 15 Jan 2004 18:15:40 +0200 (EET) Nerijus Baliunas <[EMAIL PROTECTED]> wrote:

NB> On Thu, 15 Jan 2004 16:27:35 +0100 (Romance Standard Time) Vadim Zeitlin <[EMAIL PROTECTED]> wrote:
NB> 
NB> VZ> XN> 3. Iterate with a script over all the messages in the current folder,
NB> VZ> XN>    and potentially move them into another folder.
NB> VZ> 
NB> VZ>  Yes, this should be possible. I didn't try it but I did try other things
NB> VZ> with Python and at least the possibility to write custom filters is very
NB> VZ> useful and they do work. I used this to write a filter to catch all this
NB> VZ> gibberish spam with 3 lines of random words in each message. It is quite
NB> VZ> simple to detect in Python after getting the msg text using MailFolder
NB> VZ> methods
NB> 
NB> Could you please post your script here?

 Here it is, slightly improved (I had to export a new class to Python
to do this and it took me about 3 minutes in all, including testing on 2
platforms!).

        import re
        import Message
        import MimePart
        import MimeType
        #import MDialogs

        def isgibberish(msg_):
            "Detect if the message is a gibberish spam meant to foil Bayesian filters"
            msg = Message.MessagePtr(msg_)

            # apparently they also appear with other subjects but this form is the
            # most frequent one
            if not re.match("Re: [A-Z]{3,8},", msg.Subject()):
                #MDialogs.Status("Not the right subject form")
                return 0

            # they're also always multipart/alternative with text and html inside
            partTop = msg.GetTopMimePart()
            if partTop.GetType().GetFull() != "MULTIPART/ALTERNATIVE":
                #MDialogs.Status("Not MULTIPART/ALTERNATIVE")
                return 0

            # the text part comes first, as usual, but check for this
            partText = partTop.GetNested()
            if partText.GetType().GetFull() != "TEXT/PLAIN":
                #MDialogs.Status("Not TEXT/PLAIN")
                return 0

            # and they have exactly 3 lines of gibberish in the text part
            if partText.GetNumberOfLines() != 3:
                #MDialogs.Status("Not 3 lines")
                return 0

            # yes, it does look like spam
            return 1

You may decide not to do the subject test, and the lines with MDialogs are
there only for debugging. To use this, just put it in a spam.py file somewhere
where M can find it and add a filter test with kind == Python and
argument == "spam.isgibberish" (spam is the name of the .py file). The
action may be whatever you want, although tarring and feathering spammers is
unfortunately not supported by M yet :-/
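
 For trying the same heuristic outside of M, here is a minimal sketch that
uses Python's standard email module in place of the Message/MimePart classes.
It is only an approximation of the filter above, not what M does internally:
looks_like_gibberish, check_subject and SAMPLE are names made up for this
illustration, and the line count is taken from the undecoded text part, so
base64 or quoted-printable bodies would need decoding first.

```python
import re
from email import message_from_string

# Same subject pattern as in isgibberish() above.
SUBJECT_RE = re.compile(r"Re: [A-Z]{3,8},")


def looks_like_gibberish(raw, check_subject=True):
    """Approximate isgibberish() for a raw message given as a string."""
    msg = message_from_string(raw)

    # Optional subject test; pass check_subject=False to skip it.
    if check_subject and not SUBJECT_RE.match(msg.get("Subject", "")):
        return False

    # The top-level part must be multipart/alternative.
    if msg.get_content_type() != "multipart/alternative" or not msg.is_multipart():
        return False

    # The first nested part must be plain text.
    parts = msg.get_payload()
    if not parts or parts[0].get_content_type() != "text/plain":
        return False

    # Exactly 3 lines in the text part (assumes an unencoded body).
    return len(parts[0].get_payload().splitlines()) == 3


# Hypothetical sample message, hand-written for this sketch.
SAMPLE = (
    "Subject: Re: QWERT, hello\n"
    "MIME-Version: 1.0\n"
    'Content-Type: multipart/alternative; boundary="XX"\n'
    "\n"
    "--XX\n"
    "Content-Type: text/plain\n"
    "\n"
    "alpha beta gamma\n"
    "delta epsilon zeta\n"
    "eta theta iota\n"
    "--XX\n"
    "Content-Type: text/html\n"
    "\n"
    "<p>alpha</p>\n"
    "--XX--\n"
)
```

Calling looks_like_gibberish(SAMPLE) exercises all four checks, and passing
check_subject=False corresponds to dropping the subject test as mentioned
above.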

NB> VZ> and it doesn't make sense to add a test for this to C++ code as in
NB> VZ> a few months such spams probably will have disappeared anyhow.
NB> 
NB> Well, who knows? They are designed to fool bayesian filters, so IMHO
NB> they will continue to evolve...

 Yes, what I meant was that they were surely going to change, so it doesn't
make sense to put such tests in the main program permanently.

 Regards,
VZ



_______________________________________________
Mahogany-Developers mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/mahogany-developers
