https://issues.apache.org/SpamAssassin/show_bug.cgi?id=7022

--- Comment #18 from Ivo Truxa <[email protected]> ---
(In reply to John Hardin from comment #5)
> If this is done globally we'll lose the ability to detect some forms of
> obfuscation. On the flip side, discarding the accents may have the effect of
> making that obfuscation pointless.
> 
> How does that balance out? Do we gain more from discarding all accents than
> we lose from being able to tell whether or not accents are being used to
> obfuscate a common word, which is a fairly strong spam sign?

I am coming back to this comment once more. As I wrote, it needs to be tested
to see how it works in reality, but I am convinced it can only be an
improvement.

You need to ask yourself why spammers obfuscate some words. Certainly not
because the words are hammy, but because every anti-spam filter would catch
them immediately. They obfuscate the spammiest words. So the fear that removing
the obfuscation loses the advantage of a strong spam marker is unfounded. Quite
the contrary: the original unobfuscated spam word will become even spammier
than before (thanks to many more hits), and will make the spam easier to catch.

Only when the obfuscated word transliterates into something other than the
original spam word can the score of the original word not be used (and
increased), but in the vast majority of such cases you would get a new
nonsense word that becomes as strong a spam marker as its obfuscated version.
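
To make the argument concrete, here is a rough sketch in Python (not
SpamAssassin code, and not a proposed implementation). The token list, the
fold() helper and the EXTRA_MAP table are all made up for illustration. It
only shows how accent folding merges obfuscated variants into one token with
more hits, and how a variant that transliterates to something else still ends
up as its own consistent nonsense token.

```python
import unicodedata
from collections import Counter

# Hypothetical extra transliteration for a letter that has no Unicode
# decomposition (an assumption for this sketch, not a real rule set).
EXTRA_MAP = str.maketrans({"ł": "l"})

def fold(text):
    # Decompose characters (NFD), drop the combining marks, then apply
    # the extra transliteration table.
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.translate(EXTRA_MAP)

# Made-up tokens as they might appear across several spam messages.
tokens = ["viagra", "víagra", "vìagrà", "viâgra", "vłagra", "vłagra"]

raw_counts = Counter(tokens)                      # every variant counted apart
folded_counts = Counter(fold(t) for t in tokens)  # variants merged after folding

print(raw_counts)
print(folded_counts)
```

Before folding, each accented variant is a separate token with a single hit.
After folding, the four accent variants all count towards "viagra", while
"vłagra" becomes "vlagra", a nonsense token that keeps collecting its own hits.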

Ivo
