I have "TrainingMode tum" set in dspam.conf, and I get the impression
that this setting is being applied even when learning from corpora
with --source=corpus.

If I learn the ham corpus before the spam corpus, a lot of
ubiquitous tokens, like "Received*LMTPA", end up scoring <0.01; if I
learn the corpora in the opposite order, they score >0.99. If I learn
the corpora with --mode=teft, everything starts to behave sensibly.

I'm not sure if this is intentional (though poorly documented) or
if it's a bug. If it is a bug, I suspect the following if statement
is missing a test for "CTX->source == DSS_CORPUS":


From libdspam.c:

    if (ds_term->type == 'D' &&
        ( CTX->training_mode != DST_TUM  ||
          CTX->source == DSS_ERROR       ||
          CTX->source == DSS_INOCULATION ||
          ds_term->s.spam_hits + ds_term->s.innocent_hits < 50 ||
          ds_term->key == diction->whitelist_token             ||
          CTX->confidence < 0.70))
    {
        ds_term->s.status |= TST_DIRTY;
    }
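
To make the suggestion concrete, here is a minimal standalone sketch of
the condition with and without the extra DSS_CORPUS test. It is not a
patch against libdspam.c: the struct term_view, the enum values, and
the function names are hypothetical stand-ins I made up so the logic
can be compiled on its own. Only the constant names and the thresholds
(50 hits, 0.70 confidence) come from the excerpt above.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in values only; the real constants are defined by libdspam. */
enum { DST_TEFT, DST_TOE, DST_TUM };
enum { DSS_NONE, DSS_ERROR, DSS_CORPUS, DSS_INOCULATION };

/* Hypothetical flattened view of the fields the if statement reads. */
struct term_view {
    char   type;               /* ds_term->type */
    int    training_mode;      /* CTX->training_mode */
    int    source;             /* CTX->source */
    long   spam_hits;          /* ds_term->s.spam_hits */
    long   innocent_hits;      /* ds_term->s.innocent_hits */
    bool   is_whitelist_token; /* ds_term->key == diction->whitelist_token */
    double confidence;         /* CTX->confidence */
};

/* The condition as quoted from libdspam.c. */
bool marks_dirty_original(const struct term_view *t)
{
    return t->type == 'D' &&
           (t->training_mode != DST_TUM ||
            t->source == DSS_ERROR ||
            t->source == DSS_INOCULATION ||
            t->spam_hits + t->innocent_hits < 50 ||
            t->is_whitelist_token ||
            t->confidence < 0.70);
}

/* The same condition with the suggested extra test for corpus training. */
bool marks_dirty_with_corpus(const struct term_view *t)
{
    return t->type == 'D' &&
           (t->source == DSS_CORPUS || marks_dirty_original(t));
}

/* A mature token (80 hits) seen with high confidence during TUM corpus
 * training: skipped by the quoted condition, updated by the variant. */
void check_corpus_case(void)
{
    struct term_view t = { 'D', DST_TUM, DSS_CORPUS, 40, 40, false, 0.95 };
    assert(!marks_dirty_original(&t));
    assert(marks_dirty_with_corpus(&t));
}
```

If the real code behaves like marks_dirty_original() here, a token that
passes 50 hits while the first corpus is being learned would stop being
updated once confidence reaches 0.70, which would match the skewed
scores I am seeing. I have not verified this against a running dspam.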



_______________________________________________
Dspam-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dspam-devel
