https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6229

--- Comment #5 from Mark Martinec 2011-05-04 19:56:33 UTC ---
> If you mean...
> $word = lc($word) if $word =~ /[a-zA-ZöäåÖÄÅ]{4}/;

Yes, something like that.

> I'm fine with that, as long as it handles the special chars. But isn't
> that locale dependent? It doesn't seem to work for me.

It is locale-dependent, I believe; another argument for Bug 3062.
It should be documented somewhere that SpamAssassin should be
run under a C locale.

> I guess more of the special chars would need to be handled in any case;
> I just went with the Finnish ones and it worked well for me.

I only tried it with my installed version of Perl under a C locale.
It seems that lc() handles such characters well, but the string must
first be decoded from the correct character set; lc() does not work
on raw octets, as it has no idea that these can be interpreted as
ISO Latin-1.
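A minimal Perl sketch of the raw-octets vs. decoded difference, assuming the word arrives as ISO-8859-1 octets; the sample word and variable names are illustrative only. Without `use locale` or the `unicode_strings` feature, lc() on a byte string folds only ASCII letters, while after Encode::decode() it applies Unicode case rules regardless of the current locale.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical input: "ÖÄÅO" as raw ISO-8859-1 octets, spelled with \x
# escapes so this file stays 7-bit ASCII.
my $octets = "\xD6\xC4\xC5O";

# On raw octets (no 'use locale', no 'unicode_strings' feature) lc()
# only folds ASCII letters: the result is "\xD6\xC4\xC5o".
my $lc_raw = lc $octets;

# Decoding first turns the octets into Perl characters, so lc() applies
# Unicode case mapping: the result is "\xF6\xE4\xE5o" ("öäåo").
my $chars      = decode('ISO-8859-1', $octets);
my $lc_decoded = lc $chars;

# Print code points in hex to avoid depending on the terminal encoding.
printf "raw:     %vX\n", $lc_raw;       # D6.C4.C5.6F
printf "decoded: %vX\n", $lc_decoded;   # F6.E4.E5.6F
```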

OK, so this is probably too much of a change for a minor release;
I am backing off my suggestion.

What will happen with 8-bit characters in the source code?
So far there is no such case, as far as I can tell.
Maybe these should be encoded in the source as \ooo or \x{hh}
escapes to stay on the safe side (so as not to depend on a locale).
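A sketch of the escaping idea applied to the character class from the quoted patch. The helper name lc_if_word and the sample word are hypothetical, and the input is assumed to be ISO-8859-1 octets that get decoded before matching; both the \x{hh} and the \ooo spellings of the class are shown.

```perl
use strict;
use warnings;
use Encode qw(decode);

# The class from the quoted patch, [a-zA-ZöäåÖÄÅ], written with \x{hh}
# escapes: the source stays 7-bit ASCII, so its meaning no longer depends
# on the encoding the file was saved in or on the current locale.
my $word_re = qr/[a-zA-Z\x{E4}\x{E5}\x{F6}\x{C4}\x{C5}\x{D6}]{4}/;

# The same class in octal (\ooo) form; equivalent for these code points.
my $word_re_octal = qr/[a-zA-Z\344\345\366\304\305\326]{4}/;

# Hypothetical helper: decode ISO-8859-1 octets into characters, then
# lowercase the word if it contains a run of four matching letters.
sub lc_if_word {
    my ($octets) = @_;
    my $word = decode('ISO-8859-1', $octets);
    $word = lc($word) if $word =~ $word_re;
    return $word;
}

my $result = lc_if_word("\xC4\xC4NI");   # "ÄÄNI" as Latin-1 octets
printf "%vX\n", $result;                 # prints E4.E4.6E.69
```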

Other common uppercase letters with diacritics from Latin-1
should be included in the set too, I suppose.
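One possible way to widen the set, assuming the goal is every Latin-1 letter rather than only the Finnish ones; the class is a hypothetical illustration, written with \x{hh} ranges so the source stays 7-bit, and the sample word is arbitrary. The loop checks that each decoded Latin-1 uppercase letter lowercases to the code point 0x20 above it.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical wider class covering all Latin-1 letters: 0xC0-0xD6 and
# 0xD8-0xDE are uppercase, 0xDF-0xF6 and 0xF8-0xFF lowercase; 0xD7
# (multiplication sign) and 0xF7 (division sign) are left out.
my $latin1_word_re = qr/[a-zA-Z\x{C0}-\x{D6}\x{D8}-\x{F6}\x{F8}-\x{FF}]{4}/;

# Sanity check: once decoded, each Latin-1 uppercase letter lowercases
# to the code point 0x20 above it, independent of the current locale.
my @upper = (0xC0 .. 0xD6, 0xD8 .. 0xDE);
my @bad = grep { lc(decode('ISO-8859-1', chr $_)) ne chr($_ + 0x20) } @upper;
print @bad ? "unexpected: @bad\n" : "all uppercase Latin-1 letters fold\n";

# Example: "ÉTÉS" as Latin-1 octets matches the wider class.
my $word = decode('ISO-8859-1', "\xC9T\xC9S");
print "matches\n" if $word =~ $latin1_word_re;
```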
