https://bz.apache.org/SpamAssassin/show_bug.cgi?id=8370
Bug ID: 8370
Summary: Non-ASCII log entries unconditionally turned into octets
Product: Spamassassin
Version: 4.0.2
Hardware: PC
OS: FreeBSD
Status: NEW
Severity: normal
Priority: P2
Component: Libraries
Assignee: [email protected]
Reporter: [email protected]
Target Milestone: Undefined
I've created a plugin that submits (the textual parts of) emails to a local AI
via an HTTP POST, asking its opinion on whether the email is spam or not. In
addition to a one-word verdict ("SPAM", "HAM", or "UNSURE"), the plugin requests
a one-line explanation.
The explanation is logged and can also be added to the email headers.
The language of the explanation is configurable. The default is English, but
it can be set to any language the AI is expected to understand.
It all (almost) works, but when I set the language to Ukrainian, the logging
framework turns the Cyrillic text of the explanation into incomprehensible
gibberish of escaped octets, and that's a shame... The logging ought to attempt
to convert the string(s) from UTF-8 into whatever the local charset is (my LANG
environment variable is set to uk_UA.KOI8-U, for example) and log THAT. Maybe,
if the conversion fails, it could fall back to the current method of logging
hex; or, maybe, not even then.
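The proposed behaviour (encode into the locale's charset first, fall back to hex only on failure) can be sketched as below. This is a minimal illustration in Python rather than SpamAssassin's Perl, under stated assumptions: `encode_for_log` and `_hex_escape_utf8` are hypothetical names, not part of SpamAssassin's logging API, and the `\xNN` fallback only approximates the current hex output.

```python
import locale
from typing import Optional


def _hex_escape_utf8(text: str) -> bytes:
    # Hypothetical fallback approximating the current behaviour: keep printable
    # ASCII as-is and render every other UTF-8 octet as a \xNN escape.
    out = []
    for b in text.encode("utf-8"):
        if 0x20 <= b < 0x7F:
            out.append(chr(b))
        else:
            out.append(f"\\x{b:02X}")
    return "".join(out).encode("ascii")


def encode_for_log(text: str, charset: Optional[str] = None) -> bytes:
    # Hypothetical helper: use the locale's charset (e.g. KOI8-U under
    # LANG=uk_UA.KOI8-U) unless a charset is given explicitly.
    if charset is None:
        charset = locale.getpreferredencoding(False)
    try:
        # Preferred path: readable text in the local charset.
        return text.encode(charset)
    except (UnicodeEncodeError, LookupError):
        # Unknown charset, or a character the charset cannot represent.
        return _hex_escape_utf8(text)
```

With `charset="koi8-u"`, a Ukrainian explanation such as `"Це спам"` comes out as readable KOI8-U bytes; with `charset="ascii"` the same text degrades to `\xNN` escapes. Falling back for the whole message, rather than per character, keeps the log line in a single consistent representation.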