https://issues.apache.org/SpamAssassin/show_bug.cgi?id=5861
Henrik Krohns <[EMAIL PROTECTED]> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |REOPENED
         Resolution|FIXED                       |
--- Comment #8 from Henrik Krohns <[EMAIL PROTECTED]> 2008-04-10 03:50:18 PST ---
I'm not comfortable closing this bug yet.
(In reply to comment #7)
> (In reply to comment #5)
> > So is there something that can help with these short messages, that don't
> > create many tokens? When there aren't enough body tokens, by default all
> > those hammy header tokens are sure to prevent correct scoring. It forces
> > me to ignore such headers.
>
> Training on error should help -- train mostly on FPs and FNs from now on.
How can this help? If it wasn't obvious, of course I trained it. It didn't help.
A mail from Gmail had so many hammy tokens that it is impossible to train
without other, more specific tokens.
Isn't there more stuff you can create tokens from, like filenames? What if you
get a mass of spam from Gmail, containing only a .doc attachment and no body?
It will still score BAYES_50 or something, because all the hammy Gmail tokens
will prevent better scores! I demonstrated this already in my first post. At
least my DKIM patch should help remove some of the excess tokens. I'll try to
test what effect it has.
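
To make the problem concrete, here is a rough sketch of how a handful of hammy
header tokens can swamp the few spammy tokens a short message offers. It
assumes a Robinson/Fisher-style chi-squared combiner; it is not SpamAssassin's
actual code, the names `chi2q` and `combine` are made up for illustration, and
the token probabilities are invented.

```python
import math

def chi2q(x2, dof):
    """Chi-squared survival function for an even number of degrees of freedom."""
    m = x2 / 2.0
    term = math.exp(-m)
    total = term
    for i in range(1, dof // 2):
        term *= m / i
        total += term
    return min(total, 1.0)

def combine(probs):
    """Fisher-style combining of per-token spam probabilities (each 0 < p < 1)."""
    n = len(probs)
    spamminess = 1.0 - chi2q(-2.0 * sum(math.log(1.0 - p) for p in probs), 2 * n)
    hamminess = 1.0 - chi2q(-2.0 * sum(math.log(p) for p in probs), 2 * n)
    return (spamminess - hamminess + 1.0) / 2.0

# Invented numbers: two strongly spammy tokens (say, from an attachment name)
# against ten hammy header tokens of the kind a Gmail message carries.
spammy_only = combine([0.99, 0.99])
with_hammy_headers = combine([0.99, 0.99] + [0.05] * 10)
print(round(spammy_only, 3), round(with_hammy_headers, 3))
```

With these made-up inputs the two spammy tokens alone give a high score, while
adding the hammy header tokens pulls the combined score below neutral.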
I know you guys are busy, but I think this isn't something to just shrug off.
Or is it just something that is rare and "gotta live with it"? Is there any
interest from your side in enhancing the Bayes engine, or does it have to come
from contributions? You are the ones that know the system best.
> > Also what's the deal with saving those X-Spam-Relays-Internal tokens? I
> > ignored it since I can't figure out any purpose to bloat my db.
>
> Consider a site with 2 MXes -- a primary and secondary MX. Both are listed as
> IPs in internal_networks. For some reason, spammers tend to like sending spam
> via the secondary. The presence of that MX's IP in the
> 'X-Spam-Relays-Internal' hdr therefore becomes a spam sign, for that site.
>
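
As I read the explanation above, the relay IP works like any other token: if
it shows up mostly in spam, it gets a high spam probability. A minimal sketch
of that idea, using a crude unsmoothed estimate and an invented corpus (the
function name `token_spam_prob` and all counts are hypothetical, not taken
from SpamAssassin):

```python
def token_spam_prob(spam_hits, ham_hits, n_spam, n_ham):
    """Crude per-token spam probability from training counts (no smoothing)."""
    spam_rate = spam_hits / n_spam
    ham_rate = ham_hits / n_ham
    if spam_rate + ham_rate == 0:
        return 0.5  # never-seen token: no evidence either way
    return spam_rate / (spam_rate + ham_rate)

# Invented corpus: 100 spam and 100 ham messages at a site with two MXes.
# Assumption: most spam arrives via the secondary MX, most ham via the primary.
n_spam, n_ham = 100, 100
secondary = token_spam_prob(spam_hits=80, ham_hits=5, n_spam=n_spam, n_ham=n_ham)
primary = token_spam_prob(spam_hits=20, ham_hits=95, n_spam=n_spam, n_ham=n_ham)
print(round(secondary, 3), round(primary, 3))
```

Under these assumed counts the secondary MX token comes out clearly spammy and
the primary MX token clearly hammy, which is the per-site signal described.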
There is still at least one question unanswered. Why is the _unique_ mail id
recorded as a token? I understand the IP, but not that.
If you don't have time, then please answer when you have it. It seems you just
try to blaze through as fast as you can.
I will try to analyze and help with this, but I could really use some
insightful input.