[Bug 5691] Slow rules on Russian spam

bugzilla-daemon Wed, 24 Oct 2007 07:48:04 -0700

http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5691






------- Additional Comments From [EMAIL PROTECTED]  2007-10-24 07:45 -------
(In reply to comment #9)
> I wonder if perhaps this bug should be closed and the Bug 4636
> reopened, as it doesn't do its job well.
> 
> I'm not sure I know what is the true purpose of utf8::downgrade
> in r366594 (comment no.12 in Bug 4636). If the intention was to
> clear the utf8 flag for speed, it doesn't work this way, as the
> utf8::downgrade aborts its operation leaving the utf8 flag on
> (with or without throwing a signal, depending on its second arg)
> if a string contains a character that does not have a representation
> in the current locale charset

aha.  that's not good. :(
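
For anyone following along, the failure mode described above can be reproduced with a small standalone script. This is only a sketch and not the code from r366594: the sample strings are made up, and it assumes (per my reading of the utf8 perldoc) that the failure is triggered by any code point above 0xFF.

```perl
use strict;
use warnings;

# Character string with Cyrillic code points (all above 0xFF).
my $cyrillic = "\x{43f}\x{440}\x{438}\x{432}\x{435}\x{442}";

# Character string whose code points all fit in one byte, but which
# is stored internally with the utf8 flag on.
my $latin = "caf\x{e9}";
utf8::upgrade($latin);

# A true second argument makes utf8::downgrade return false on
# failure instead of dying.
my $ok_cyrillic = utf8::downgrade($cyrillic, 1);
my $ok_latin    = utf8::downgrade($latin, 1);

printf "cyrillic: downgrade %s, utf8 flag %s\n",
    ($ok_cyrillic ? "succeeded" : "failed"),
    (utf8::is_utf8($cyrillic) ? "on" : "off");
printf "latin:    downgrade %s, utf8 flag %s\n",
    ($ok_latin ? "succeeded" : "failed"),
    (utf8::is_utf8($latin) ? "on" : "off");
```

On the Cyrillic string the call fails and the flag stays on, so any regexp applied afterwards still runs against a character string.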


> My proposed patch just makes sure that a conversion is made
> from characters to utf-8 octets, thus making sure the regexps
> in rules need not deal with 16-bit characters, which makes them
> terribly slow.
> 
> If the intention of the original utf8::downgrade call was to really
> convert to a locale character set, it needs to be documented and
> probably the target charset needs to be configurable. Also not to forget
> that utf8::downgrade is unsupported/experimental in Perl and needs
> to be replaced with something else anyway.
> 
> Reopening Bug 4636 or keeping this topic here?

I would suggest

(a) retitling this bug to be more specific about utf8::downgrade
causing the issue,

and (b) commenting on bug 4636, indicating that there's a
bug in that code, and pointing at this one.

That assumes that, if we replace the utf8::downgrade usage, we
have fixed this bug in its entirety -- is that the case?  If not,
then the utf8::downgrade bug should be opened as a new issue.
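
To make the question concrete, here is roughly what I understand "replace the utf8::downgrade usage" to mean. It is a sketch under assumptions, not the proposed patch itself: to_utf8_octets is a hypothetical helper name, and leaving unflagged strings untouched is my guess at the intended behaviour.

```perl
use strict;
use warnings;

# Hypothetical helper (not SpamAssassin's actual code): turn a
# character string into UTF-8 octets so that rules match bytes.
sub to_utf8_octets {
    my ($str) = @_;                 # work on a copy of the argument
    # utf8::encode converts in place and clears the utf8 flag.
    # Strings without the flag are assumed to be octets already.
    utf8::encode($str) if utf8::is_utf8($str);
    return $str;
}

my $text   = "\x{441}\x{43f}\x{430}\x{43c}";   # 4 Cyrillic characters
my $octets = to_utf8_octets($text);

printf "chars: %d, octets: %d, utf8 flag: %s\n",
    length($text), length($octets),
    (utf8::is_utf8($octets) ? "on" : "off");
# prints "chars: 4, octets: 8, utf8 flag: off"
```

Unlike utf8::downgrade, utf8::encode cannot fail on characters above 0xFF, so the flag is always cleared on the returned string.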





------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
