-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

On 13-11-13 21:58, ML mail wrote:
> Hi Tom,
>
> Thanks for your answer. Well yes, indeed I do receive ham and spam,
> but my goal here would be to try and build up a good starting
> global corpus to avoid as much spam as possible from the beginning,
> as I will be migrating quite a lot of domains and mail accounts to
> this new mail server using dspam. I would like to avoid the
> situation where user accounts get migrated and as soon as they are
> on the new server they receive a lot of spam messages until they
> train their account themselves by marking the messages as spam.

So collect the spam corpus from existing accounts that you trust, for
instance your own. The last half year of spam in your junk folder
should be a lot better than the ancient SA corpus, even if it
contains only a few hundred messages. You could use Sent folder
contents as ham (most IMAP users keep their outgoing mail on the
server).

Use dspam_train on the collected mail, then migrate some accounts that
you own to the new machine. See if the filtering is OK; if not, train
some more using a feedback loop built on the messages that people do
send to designated mail addresses, or by hooking re-training directly
into the IMAP server [1].

[1] http://wiki2.dovecot.org/Plugins/Antispam

>
> The problem is that people are lazy and sometimes won't even bother
> to move their spam mails into the spam folder or send them to a
> special spam@ email address. So I am afraid that users will not
> train dspam and they will keep on getting a lot of spam.

As long as you have some users doing that, and they are working on a
global user (and mail contents from users don't differ too much), you
should be fine.

>
> Now this makes me think, can dspam somehow autotrain? I mean get
> better at spam detection over time without having an end
> user having to report the FN and FP to dspam? Or am I just dreaming
> of science fiction stuff here ;-)

Dspam is a heuristic system, which means that it doesn't know
anything, but will learn from a teacher. The teacher should know what
to teach (i.e. what is spam and what is ham). You (and the retraining
users) are the teacher(s). If the teacher doesn't tell the student
that it learnt the wrong thing, how will the student correct its work
and actually get better? The above suggestions are only tools for
teachers, to minimize the effort of doing the required teaching work.
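
The collection step suggested above (junk folder as spam, Sent folder
as ham) could be scripted roughly as follows. This is only a minimal
sketch and not part of dspam: it assumes the mail is stored in Maildir
format, and the function name export_folder, the output directories
and the .eml file naming are made up for illustration. All it does is
write one file per message into a directory, which as far as I know is
the kind of input a corpus trainer such as dspam_train is pointed at.

```python
import mailbox
from pathlib import Path


def export_folder(maildir_path, out_dir, limit=None):
    """Copy each message of a Maildir folder into its own file in out_dir.

    Hypothetical helper for illustration; returns the number of
    messages written. `limit` optionally caps how many are exported.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    # create=False: raise an error on a mistyped path instead of
    # silently creating a new, empty Maildir there.
    box = mailbox.Maildir(str(maildir_path), factory=None, create=False)

    count = 0
    for key in box.iterkeys():
        if limit is not None and count >= limit:
            break
        # Write the raw message bytes unchanged, headers included.
        (out / f"{count:06d}.eml").write_bytes(box.get_bytes(key))
        count += 1
    return count
```

For example, export_folder("/home/me/Maildir/.Junk", "corpus/spam")
and export_folder("/home/me/Maildir/.Sent", "corpus/ham") (both paths
hypothetical) would leave two directories that can then be handed to
dspam_train for the global user; check its usage output for the exact
arguments.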
:)

FWIW: I use dovecot-antispam to retrain ham and spam, so literally all
I have to do to train a FP is 'move ham mail from Spam to Inbox folder
in imap client', and vice versa for FN.

>
> Cheers ML
>
>
> On Wednesday, November 13, 2013 9:22 PM, Tom Hendrikx
> <[email protected]> wrote:
>
>
> On 13-11-13 20:49, ML mail wrote:
>> Hello,
>
>> I would like to know what you guys recommend as the best method for
>> training a global user which will then be used for every account
>> as a starting base. I have defined my global user as such in the
>> group file:
>
>> Unfortunately I have the impression that my globaluser is not
>> well trained, as still more than 50% of the spam is being seen as
>> innocent. I somehow suspect this has to do with the fact that
>> the spam and ham mails from spamassassin are very old (10
>> years).
>
>
>> How would you train a global user? And with which data? Is there
>> any public spam/ham data somewhere which can be used?
>
>
> You are receiving spam and ham, right? Use that for training: a
> public ham corpus doesn't look in any way like your regular ham,
> and a public spam corpus from 10 years ago doesn't resemble actual
> recent spam messages.
>
> If using (ancient) public corpora was a good solution to spam
> filtering, we could drop heuristic engines and just release a
> binary database dump every month based on public corpora :)
>
> Regards, Tom
>
> ------------------------------------------------------------------------------
> DreamFactory - Open Source REST & JSON Services for HTML5 & Native Apps
> OAuth, Users, Roles, SQL, NoSQL, BLOB Storage and External API
> Access Free app hosting. Or install the open source package on any
> LAMP server. Sign up and see examples for AngularJS, jQuery,
> Sencha Touch and Native!
> http://pubads.g.doubleclick.net/gampad/clk?id=63469471&iu=/4140/ostg.clktrk
>
> _______________________________________________
> Dspam-user mailing list [email protected]
> https://lists.sourceforge.net/lists/listinfo/dspam-user
>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.14 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iQIcBAEBCAAGBQJSg/QiAAoJEJPfMZ19VO/14+gQAJjMqncUGBkQt1bgbzLOO5OS
kx4ZvYRBbjuaX7HJtya4Xz1aNHEYGxXZmsA91sQk+nMf80ULitZ2UjCm5zSqtmWm
ut0ZUlcIsL/aZVToDHbpt4BF0eNypltOGDasdmxqeX/7j4SDIG3dDjqRR5JhW3ni
ZJSZE/VFfuHU5Z4krzbZzPQgUymQMs9mDsCenRXJi0f1X8odNLGo3iYuKTdaN2du
+k9pkCeyJ1FTKQXeJ4CSAUUTOQp0YoPPtftOaWrCm7yzguC6S6hto94MC0mRWpxO
XFia8oiqCff5YZb5ggaZdsXJw6a0qyEZxu4r2zT58gfUlt4bY63dGuVQlVuIW1LK
w1sGKh2Cp+fJFZQK1ra4U2j9xjs8bCm5Z+xRZ176STOcjuKNbvyXdKw/z/8roiNF
fMasnj9F+gFKexaqKOKGpkg8Lc2jEWO9b1nX72QoefrdD91iFRz4Q7EM10Wp7Cot
7ZMKM70Tz+kXgtSsEilbyHYzyPq9ULRMUznQZUSgk1grM8oYaHl9mNUXPb6ujJ/3
y8+0sS8mVNovRjIWW2jTZZKpgSpkYyBx8dn8/Lw3qJSYzzNs+Jcs3XiwHp9wslpm
baEK3EvU74tMAeU6pSLw2pi9FmWnWarx2zMJPwsEpz9T0I6cFworYJAd5R31CrZQ
1h0/tuly60YyXqAZPZSW
=F6PM
-----END PGP SIGNATURE-----
