I have "TrainingMode tum" set in dspam.conf, and I get the impression
that it is being applied even when learning from corpora with
--source=corpus.
If I learn the ham corpus before the spam corpus, a lot of ubiquitous
tokens, like "Received*LMTPA", end up scoring <0.01; if I learn the
corpora in the other order, the same tokens score >0.99. If I learn the
corpora with --mode=teft, everything starts to behave sensibly.
I'm not sure whether this is intentional (though poorly documented) or
a bug. If it is a bug, I suspect the following if statement
is missing a test for "CTX->source == DSS_CORPUS".
From libdspam.c (lines 2530-2539):
    if (ds_term->type == 'D' &&
        ( CTX->training_mode != DST_TUM ||
          CTX->source == DSS_ERROR ||
          CTX->source == DSS_INOCULATION ||
          ds_term->s.spam_hits + ds_term->s.innocent_hits < 50 ||
          ds_term->key == diction->whitelist_token ||
          CTX->confidence < 0.70))
    {
      ds_term->s.status |= TST_DIRTY;
    }
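To make the suggestion concrete, here is a rough, standalone sketch of the condition with the extra test added. It is not a patch against libdspam: the struct names (ctx_sketch, term_sketch, stat_sketch), the helper should_mark_dirty() and the numeric values of the constants are stand-ins I made up so the snippet compiles on its own. Only the shape of the condition is taken from the code quoted above, and I have not checked whether adding the test has side effects elsewhere.

```c
#include <stdbool.h>

/* Stand-in constants: the names mirror libdspam, but the values here are
 * placeholders, not the ones from the real headers. */
enum { DST_TEFT, DST_TOE, DST_TUM };
enum { DSS_ERROR, DSS_CORPUS, DSS_INOCULATION, DSS_NONE };

/* Hypothetical, cut-down versions of the structures used by the condition. */
struct stat_sketch { long spam_hits; long innocent_hits; };
struct term_sketch { unsigned long long key; char type; struct stat_sketch s; };
struct ctx_sketch  { int training_mode; int source; float confidence; };

/* Returns true when the token would be flagged TST_DIRTY (i.e. written back).
 * Same tests as the quoted if statement, plus the proposed DSS_CORPUS test. */
bool should_mark_dirty(const struct ctx_sketch *CTX,
                       const struct term_sketch *ds_term,
                       unsigned long long whitelist_token)
{
  return ds_term->type == 'D' &&
         ( CTX->training_mode != DST_TUM ||
           CTX->source == DSS_ERROR ||
           CTX->source == DSS_CORPUS ||        /* proposed addition */
           CTX->source == DSS_INOCULATION ||
           ds_term->s.spam_hits + ds_term->s.innocent_hits < 50 ||
           ds_term->key == whitelist_token ||
           CTX->confidence < 0.70 );
}
```

With this change, a token that already has 50 or more hits would still be updated during corpus training in TUM mode, instead of being skipped once the message is classified with confidence of 0.70 or higher.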
_______________________________________________
Dspam-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dspam-devel