Thanks a lot, Stephen, for all the suggestions :)

Avik Pal
Bengal Engineering & Science University, Shibpur
github: https://github.com/avikpal
IRC: irc://freenode/avikp,isnick
twitter: https://twitter.com/avikpalme

On 17 April 2013 22:36, Stephen J. Turnbull <[email protected]> wrote:

> Avik Pal writes:
>
> > Meanwhile It would be much appreciated if someone can direct me to
> > a labeled dataset available on line.
>
> By "labelled" you mean pre-classified into spam vs ham? I see you
> already found one, but you could also check the SpamBayes and
> SpamAssassin distributions.
>
> > Here I have a suggestion, after submitting, whenever an email is
> > classified as Spam, we store it in a separate archive and after the
> > end of the day send them a mail telling "this is the digest for all
> > the mails that Mailman thinks to be Spam" the subscriber may go
> > there and can view them and also can mark them as not Spam,
>
> I suggest that you present this as an option for users who want to
> tune the filters, and as something that can be used pre-release to
> develop the initial parameters for the distributed classifier.
> Although Bayesian classifiers do offer the option to train or tune
> your personal classifier on a local corpus, most users just stick with
> the distribution parameters plus self-training. It's pretty effective
> (surprisingly so to me). I guess the logic is that spammers aren't
> terribly creative.
>
> > Emails which stay as Spam will be dropped after a month
>
> Let's think carefully about that. Everybody deletes the spam; that's
> why you started by asking for a labelled dataset, because nobody keeps
> one around. Somebody really ought to do the public service of
> collecting a corpus. Of course, if you do arrange to keep it around,
> it's going to need to be an option that sites and list owners can
> disable.
