[Bug 7091] UTF-8 characters don't work in rules

bugzilla-daemon Mon, 23 Feb 2015 18:15:07 -0800

https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7091


--- Comment #4 from John Hardin <[email protected]> ---
(In reply to Mark Martinec from comment #3)
> > > body CRAZY_EURO /€uro/
> > > header SUBJ_CREDIT_FR Subject =~ /crédit/
> > 
> > So... how do we make rules aware of whether or not normalize_charset is
> > enabled?
> 
> The same way as making them aware of original encoding on a text - you can't.
> 
> I have been asking myself the same question - and I think the question
> is wrong. There is no difference (from rules viewpoint) between text
> that is originally encoded as UTF-8 (or plain US-ASCII) and a text that
> is transcoded into UTF-8 from some other character set by normalize_charset.

I apologize; it appears my question was unclear. Let me try again.

normalize_charset is a local configuration option - it can be disabled.

A rule written for use when normalize_charset is enabled will generally be
simpler than one that needs to directly deal with multiple encodings. Is there
a way to write rule alternatives such that one will be used when the
normalize_charset option is enabled and the other when it is not? I'm wondering
if there is something similar to the way rule variants that do or don't use a
perl-5.10-ism are switched by
"if can(Mail::SpamAssassin::Conf::perl_min_version_5010000)".
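To make the trade-off above concrete, here is a small Python sketch. Python is used only for illustration: SpamAssassin itself is Perl, and decoding with a known charset is a simplified stand-in for what normalize_charset actually does. It compares a rule written for normalized text with one written for raw bytes in UTF-8 and ISO-8859-1; the charset list and the rule variants are assumptions chosen for the example.

```python
import re

# "crédit" as it might arrive in two different mail charsets.
UTF8_BYTES = "crédit".encode("utf-8")         # b"cr\xc3\xa9dit"
LATIN1_BYTES = "crédit".encode("iso-8859-1")  # b"cr\xe9dit"

# Variant A: written for normalized text. One literal, matched only after
# the message has been decoded to Unicode (a stand-in for normalize_charset).
NORMALIZED_RULE = re.compile("crédit")

def matches_normalized(raw: bytes, charset: str) -> bool:
    """Decode using the declared charset, then apply the simple rule."""
    return NORMALIZED_RULE.search(raw.decode(charset)) is not None

# Variant B: written for raw, undecoded bytes. It must enumerate every
# byte sequence "é" can take in the charsets it wants to cover.
RAW_RULE = re.compile(rb"cr(?:\xc3\xa9|\xe9)dit")

def matches_raw(raw: bytes) -> bool:
    return RAW_RULE.search(raw) is not None

# Variant A's literal applied to raw bytes (normalization disabled) only
# matches the UTF-8 form and silently misses the Latin-1 one.
NAIVE_RAW_RULE = re.compile("crédit".encode("utf-8"))

if __name__ == "__main__":
    for label, raw, charset in [("utf-8", UTF8_BYTES, "utf-8"),
                                ("latin-1", LATIN1_BYTES, "iso-8859-1")]:
        print(label,
              matches_normalized(raw, charset),
              matches_raw(raw),
              NAIVE_RAW_RULE.search(raw) is not None)
```

Run as a script, it prints `utf-8 True True True` and `latin-1 True True False`: the normalized-text literal, applied to undecoded bytes, misses the Latin-1 message, while the raw-byte rule has to spell out each encoding by hand.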

Is there no way to intelligently choose between different rules based on such
configuration options? That kinda leaves us with unwelcome alternatives: write
for one mode and ignore the other (which will probably be broken if you write
to normalized text, or inefficient and complex if you write to non-normalized
text), or write two rules (which doubles the work to scan, so not recommended
at all).

Do we need a "can(Mail::SpamAssassin::Conf::normalize_enabled)" or some such?
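One way to picture the proposed switch is a load-time filter like the one sketched below. It is a toy Python parser, not SpamAssassin's own config parser: `normalize_enabled` is the hypothetical name proposed above, and the `if !can(...)` negation is this sketch's own syntax, so check the Mail::SpamAssassin::Conf documentation for what real `if` expressions accept. Because the condition is evaluated once when the rules are loaded, only one variant of SUBJ_CREDIT_FR survives, and each message is scanned once rather than twice.

```python
import re

# Hypothetical flag name: this is the "normalize_enabled" test proposed in
# this comment, not an existing SpamAssassin method.
FLAG = "Mail::SpamAssassin::Conf::normalize_enabled"

# Illustrative rule file with one variant per normalization mode.
CONF = r"""
if can(Mail::SpamAssassin::Conf::normalize_enabled)
  header SUBJ_CREDIT_FR Subject =~ /crédit/
endif
if !can(Mail::SpamAssassin::Conf::normalize_enabled)
  header SUBJ_CREDIT_FR Subject =~ /cr(?:\xc3\xa9|\xe9)dit/
endif
"""

# Matches "if can(Name)" and this toy's negated form "if !can(Name)".
IF_CAN = re.compile(r"if\s+(!?)can\(([\w:]+)\)")

def load_rules(conf_text: str, capabilities: set[str]) -> list[str]:
    """Return only the rule lines whose enclosing if-blocks are all true.

    The condition is evaluated once, at load time, so only one variant of
    each rule is kept and every message is scanned once per rule.
    """
    active = [True]  # stack of nested if-block states
    kept = []
    for line in conf_text.splitlines():
        line = line.strip()
        m = IF_CAN.fullmatch(line)
        if m:
            negate, name = m.groups()
            truth = (name in capabilities) != (negate == "!")
            active.append(active[-1] and truth)
        elif line == "endif":
            active.pop()
        elif line and active[-1]:
            kept.append(line)
    return kept

if __name__ == "__main__":
    print(load_rules(CONF, {FLAG}))
    print(load_rules(CONF, set()))
```

With the flag present, `load_rules(CONF, {FLAG})` keeps only the `/crédit/` line; with an empty capability set it keeps only the raw-byte variant.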

-- 
You are receiving this mail because:
You are the assignee for the bug.
