Raj wrote, on 03-08-2007 06:18:
> I had a question concerning dspam training ...
> I use a shared group -- one single user "common" for the entire server, with
> toe mode
Same here: a shared group, on 2 sites (with entirely differently configured
Postfix MTAs), but on both I use teft rather than toe. One of the sites
is my own PC with Postfix/Fetchmail, where few of Postfix's configurable
anti-spam features are possible; the other is a production site for 1500+
users, on which Postfix/policyd refuses a massive (and daily increasing)
amount of stuff before it ever gets to dspam.
> I train dspam using aliases -- i.e. I just forward to spam / not-spam aliases
I train dspam by having users drag incorrectly judged messages (spam or
non-spam) into a "misjudged" folder, on which a cron job runs every
hour. Same at both sites.
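For anyone wanting to set up something similar, here is a rough Python sketch of how an hourly job might decide what to do with each message in a "misjudged" folder. It is not the actual cron job described above. It assumes that dspam's X-DSPAM-Result header is still present on the message and simply retrains as the opposite class. The function names are made up for illustration, and the dspam options shown (--user, --class, --source=error) should be checked against your own dspam man page before use.

```python
import email


def retrain_class(raw_message: bytes):
    """Return the class a misjudged message should be retrained as.

    Assumption: the message still carries dspam's X-DSPAM-Result header.
    Whatever dspam decided was wrong, so the opposite class is returned.
    Returns None when the header is missing or unrecognised.
    """
    msg = email.message_from_bytes(raw_message)
    result = (msg.get("X-DSPAM-Result") or "").strip().lower()
    if result in ("innocent", "whitelisted"):
        return "spam"
    if result == "spam":
        return "innocent"
    return None


def retrain_command(raw_message: bytes, user: str = "common"):
    """Build an argv list for retraining; the message itself goes to stdin.

    The user name "common" and the exact flags are assumptions.
    """
    cls = retrain_class(raw_message)
    if cls is None:
        return None
    return ["dspam", "--user", user, f"--class={cls}", "--source=error"]
```

The cron job would then loop over the folder, run the returned command with the raw message on standard input (for example via subprocess.run), and move the message out of the folder afterwards. Messages for which no class can be determined are better left alone for a human to look at.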
> I have not done any corpus training till today
The school site has had a massive corpus training. The home site didn't
at first, but after a while the results were so unsatisfactory that I
fed it as much spam and non-spam as I could, with dspam_train. The stats
don't show what was trained that way as corpusfed, though.
> I have never purged the dspam database
I purge both sites every week with 'dspam_clean -p', 'cos I don't trust
purge-4.1.sql.
> I have noticed a few emails (HTML text) of absolutely the same type come into
> my mailbox undetected as spam. This is rare, but it happens, i.e. once in
> around 2-3 days.
> The major part of the body content of the spam email, i.e. the HTML code
> behind the scenes, is exactly the same. All that varies is the hyperlink at
> the bottom, which points to a different website every time.
> You can see them here:
> http://24x7server.net/spam.html
Unfortunately, the code renders in my Firefox 2.0.0.6 and all I see is
the spam message :)
However, I have 2 of these, from 22-05 and 26-05, in my own site's spam
folder and can look at them there. My policy is to move every spam that
gets into my inbox, and that I therefore have to retrain, into the spam
folder after training. Everything that dspam judges correctly (80-90 per
day) I chuck. The fact that I only have two of these in my spam
folder would tend to show that dspam has learned very quickly.
> I want to know your experience in this matter ... any tips would be helpful
Change toe to teft. Turn on debugging and go through the debug output
for the messages that you're interested in, to see on what grounds spam is
being detected. If you don't immediately know what some of the criteria
mean, post here. Make sure logrotate is switched on for your debug
output, with compress on. Purging old data does no harm; it doesn't affect
dspam's accuracy negatively. I don't think that my spams can help you,
since dspam uses the recipient's name in its judgement, even with a shared
group. But if you want them, I can offer a tarball on my ftp site.
> my dspam stats
> common:
> TP True Positives: 40383
> TN True Negatives: 81087
> FP False Positives: 41
> FN False Negatives: 813
> SC Spam Corpusfed: 759
> NC Nonspam Corpusfed: 0
> TL Training Left: 0
> SHR Spam Hit Rate: 98.03%
> HSR Ham Strike Rate: 0.05%
> OCA Overall Accuracy: 99.30%
That's better than my home site, but not good enough. My home site is:
TP True Positives: 3465
TN True Negatives: 21215
FP False Positives: 4
FN False Negatives: 323
SC Spam Corpusfed: 74
NC Nonspam Corpusfed: 7
TL Training Left: 0
SHR Spam Hit Rate: 91.47%
HSR Ham Strike Rate: 0.02%
OCA Overall Accuracy: 98.69%
The school's site is:
TP True Positives: 20963
TN True Negatives: 111208
FP False Positives: 508
FN False Negatives: 408
SC Spam Corpusfed: 3486
NC Nonspam Corpusfed: 3002
TL Training Left: 0
SHR Spam Hit Rate: 98.09%
HSR Ham Strike Rate: 0.45%
OCA Overall Accuracy: 99.31%
I'm content with that.
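For comparing sites, the three percentages can be recomputed from the four counters. The formulas in this small Python sketch are inferred from the figures above rather than taken from the dspam_stats source, and the function name is made up, but they do reproduce all three sets of numbers in this mail.

```python
def dspam_rates(tp: int, tn: int, fp: int, fn: int):
    """Recompute SHR, HSR and OCA (in percent) from the dspam counters.

    Assumed definitions, inferred from the posted stats:
      SHR = spam caught / all spam        = TP / (TP + FN)
      HSR = ham flagged as spam / all ham = FP / (FP + TN)
      OCA = correct verdicts / all mail   = (TP + TN) / (TP + TN + FP + FN)
    """
    shr = 100.0 * tp / (tp + fn)
    hsr = 100.0 * fp / (fp + tn)
    oca = 100.0 * (tp + tn) / (tp + tn + fp + fn)
    return round(shr, 2), round(hsr, 2), round(oca, 2)


if __name__ == "__main__":
    # Counters from the three stats blocks in this thread.
    for name, counts in [
        ("common", (40383, 81087, 41, 813)),
        ("home", (3465, 21215, 4, 323)),
        ("school", (20963, 111208, 508, 408)),
    ]:
        print(name, dspam_rates(*counts))
```

Seen this way, the gap at the home site is almost entirely in the spam hit rate (323 misses out of 3788 spams), not in false positives.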
Best,
--Tonni
--
Tony Earnshaw
Email: tonni at hetnet dot nl