http://issues.apache.org/SpamAssassin/show_bug.cgi?id=3787





------- Additional Comments From [EMAIL PROTECTED]  2005-12-15 07:35 -------
Here is another workaround that seems to work in the demo script at
http://taint.org/xfer/2005/demo_utf8_bug.pl and doesn't require changing all the
rule patterns:

Insert  use encoding 'utf8';  at the beginning of the block that defines the
patterns containing bytes such as \xC4, i.e.,

  sub run_regexp {
    use encoding 'utf8';

I haven't tried adding that to the code that processes the rule regexps. Can
someone who knows where that would go put it in and see if it fixes the problem?

Alternatively,

  utf8::encode($text);

on the output from HTML::Parser solves the problem from the other direction.
That is, the first changes the pattern and the second changes the parsed text
string; either one gets them to match each other. I'm just more wary about
changing the parsed text string because that seems more likely to have side
effects. I'm open to comments from someone who better understands the
internals of Perl strings, UTF-8, and other Unicode encodings.
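
To make the two directions concrete, here is a minimal sketch of the mismatch. It is not the demo script linked above and not SpamAssassin code. The variable names and the sample character U+0105 (UTF-8 bytes C4 85) are assumptions chosen for illustration, and the pattern-side half uses utf8::decode on the pattern bytes as a stand-in for the effect of  use encoding 'utf8';  rather than that pragma itself.

```perl
use strict;
use warnings;

# Hypothetical stand-in for HTML::Parser output: a decoded character
# string holding the single character U+0105.
my $text = "\x{105}";

# Hypothetical rule pattern written as raw UTF-8 bytes (C4 85).
my $byte_pattern = qr/\xC4\x85/;

# Character string vs. byte pattern: the pattern looks for the two
# characters U+00C4 U+0085, which are not in $text.
my $mismatch = ($text =~ $byte_pattern) ? 1 : 0;

# Direction 1 (stand-in for changing the pattern): decode the pattern
# bytes into characters so they line up with the decoded text.
my $pattern_chars = "\xC4\x85";
utf8::decode($pattern_chars);
my $pattern_side = ($text =~ /\Q$pattern_chars\E/) ? 1 : 0;

# Direction 2 (changing the text): encode the parsed text in place into
# UTF-8 bytes so the original byte pattern applies.
my $encoded = $text;
utf8::encode($encoded);
my $text_side = ($encoded =~ $byte_pattern) ? 1 : 0;

print "mismatch=$mismatch pattern_side=$pattern_side text_side=$text_side\n";
# prints: mismatch=0 pattern_side=1 text_side=1
```

The sketch encodes a copy rather than $text itself, which keeps the side effect I am wary of confined to the matching step.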



