https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6042
John Hardin <[email protected]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |[email protected]

--- Comment #3 from John Hardin <[email protected]> 2010-09-25 14:15:30 UTC ---
(In reply to comment #2)
> Help, I get
> warn: Malformed UTF-8 character (unexpected continuation byte 0xac,
> with no preceding start byte) in pattern match (m//) at
> /home/jidanni/.spamassassin/user_prefs, rule J_BODY_US_BIG5, line 1.
> And that rule is not meant to be UTF-8 at all. That rule is
> body J_BODY_US_BIG5
> /\xBFn\xA5\xFD\xA5\xCD|\xA4\xA6(\xA5\xA7|\xA5\xFD\xA5\xCD)|\xAC\xD5\xA5\xC9|\xA4G\xAB\xD7\xA4\xC0\xB1a|\xAA\xEA\xA4l\xA4s|\xBD\xBA\xB6\xE9/

Perl can sometimes get confused by REs like that, and it is not consistent
about it either.

The safest thing to do when coding runs of 8-bit characters like that is to
enclose each 8-bit character in its own pair of square brackets, making it a
single-character class. This prevents Perl from trying to interpret adjacent
bytes as a UTF-8 character. Plain ASCII letters that follow such a byte (the
"n", "G", "a", "l" and "s" in this rule) stay outside the brackets; putting
them inside would turn the class into "this byte or that letter" and change
what the rule matches.

For example:

body J_BODY_US_BIG5 /[\xBF]n[\xA5][\xFD][\xA5][\xCD]|[\xA4][\xA6](?:[\xA5][\xA7]|[\xA5][\xFD][\xA5][\xCD])|[\xAC][\xD5][\xA5][\xC9]|[\xA4]G[\xAB][\xD7][\xA4][\xC0][\xB1]a|[\xAA][\xEA][\xA4]l[\xA4]s|[\xBD][\xBA][\xB6][\xE9]/
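
Doing that bracketing by hand is error-prone for long rules, so here is a
minimal sketch of a helper that does it mechanically. It is written in Python
only for illustration; the function name bracket_hex_escapes is hypothetical
and is not part of SpamAssassin. It assumes the pattern uses two-digit \xNN
escapes, has no escaped backslash directly before an "x", and contains no
existing multi-character classes with \xNN escapes inside them.

```python
import re

# One \xNN escape that does not directly follow "[" or directly precede "]",
# so escapes already wrapped as [\xNN] are left alone.
HEX_ESCAPE = re.compile(r"(?<!\[)\\x[0-9A-Fa-f]{2}(?!\])")


def bracket_hex_escapes(pattern: str) -> str:
    """Wrap each bare \\xNN escape in its own single-character class.

    Hypothetical helper for illustration; ASCII literals such as the "n"
    after \\xBF are deliberately left outside the brackets.
    """
    return HEX_ESCAPE.sub(lambda m: "[" + m.group(0) + "]", pattern)


if __name__ == "__main__":
    # First two alternatives of the rule body from comment #2.
    original = r"\xBFn\xA5\xFD\xA5\xCD|\xA4\xA6(\xA5\xA7|\xA5\xFD\xA5\xCD)"
    print(bracket_hex_escapes(original))
```

Running the output through the helper a second time leaves it unchanged, so
it should be safe to apply to a rule that is already partly bracketed. The
result still has to be pasted into user_prefs and checked with the usual lint
run; the sketch only rewrites the text of the pattern.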
