https://bz.apache.org/SpamAssassin/show_bug.cgi?id=8370

            Bug ID: 8370
           Summary: Non-ASCII log entries unconditionally turned into
                    octets
           Product: Spamassassin
           Version: 4.0.2
          Hardware: PC
                OS: FreeBSD
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Libraries
          Assignee: [email protected]
          Reporter: [email protected]
  Target Milestone: Undefined

I've created a plugin that submits (the textual parts of) emails to a local AI
via an HTTP POST, asking for its opinion on whether each email is spam. In
addition to a one-word verdict ("SPAM", "HAM", or "UNSURE"), the plugin requests
a one-line explanation.
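Below is a minimal Python sketch of how such a reply might be split into verdict and explanation. The report does not describe the reply format, so this assumes the model puts the verdict as the first word of its first line, optionally followed by ":" or "-" and then the explanation. parse_reply is a hypothetical helper. It is not part of SpamAssassin or of the author's plugin, which, like SpamAssassin itself, would be written in Perl.

```python
import re
from typing import Optional, Tuple

# Assumed reply shape: "<VERDICT>[:|-] <explanation>" on the first line.
_REPLY = re.compile(r"^\s*(SPAM|HAM|UNSURE)\b\s*[:\-]?\s*(.*)$", re.IGNORECASE)


def parse_reply(reply: str) -> Optional[Tuple[str, str]]:
    """Return (verdict, explanation), or None if the reply does not start
    with one of the three expected verdicts."""
    lines = reply.strip().splitlines()
    if not lines:
        return None
    match = _REPLY.match(lines[0])  # only the first line is considered
    if match is None:
        return None
    return match.group(1).upper(), match.group(2).strip()
```

The explanation is returned as a Python str, so Cyrillic text such as "SPAM: Фішинг" survives parsing intact. Any mangling happens later, at the point where the string is written to the log.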

The explanation is logged and can also be added to the email headers.

The language of the explanation is configurable. Although the default is
English, it can be set to any language the AI is expected to understand.

It all (almost) works, but when I set the language to Ukrainian, the logging
framework turns the Cyrillic text of the explanation into an incomprehensible
string of hex-escaped octets, which is a shame. The logging ought to attempt to
convert the string(s) from UTF-8 into whatever the local charset is (my LANG
environment variable is set to uk_UA.KOI8-U, for example) and log THAT. If the
conversion fails, it could perhaps fall back to the current method of logging
hex, or maybe not even then.
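The proposed behaviour can be sketched as follows. This is in Python and purely illustrative: SpamAssassin's logger is Perl, and none of these names exist in it. locale_charset, hex_escape and to_log_bytes are hypothetical helpers. hex_escape only approximates the current octet escaping, and SpamAssassin's exact format may differ. Reading the charset out of LANG is also a simplification. A real implementation would more likely ask the C library via nl_langinfo(CODESET).

```python
import os
from typing import Optional


def locale_charset(lang: Optional[str] = None) -> str:
    """Extract the charset from a POSIX locale name such as 'uk_UA.KOI8-U'.

    Simplification: a locale without an explicit charset (e.g. 'C') is
    treated as plain ASCII.
    """
    if lang is None:
        lang = os.environ.get("LANG", "")
    if "." not in lang:
        return "ascii"
    charset = lang.split(".", 1)[1]
    return charset.split("@", 1)[0]  # drop modifiers such as '@euro'


def hex_escape(data: bytes) -> str:
    """Approximation of the current behaviour: every byte outside
    printable ASCII becomes a \\xNN escape."""
    return "".join(
        chr(b) if 0x20 <= b < 0x7F else "\\x%02X" % b for b in data
    )


def to_log_bytes(raw: bytes, charset: str) -> bytes:
    """Re-encode a UTF-8 explanation into the local charset for logging,
    falling back to hex escapes if any step fails."""
    try:
        text = raw.decode("utf-8")    # the explanation arrives as UTF-8
        return text.encode(charset)   # e.g. KOI8-U for uk_UA.KOI8-U
    except (UnicodeDecodeError, UnicodeEncodeError, LookupError):
        # Invalid UTF-8, an unrepresentable character, or an unknown charset
        return hex_escape(raw).encode("ascii")


if __name__ == "__main__":
    msg = "Лист містить фішингове посилання".encode("utf-8")
    print(to_log_bytes(msg, locale_charset("uk_UA.KOI8-U")))  # KOI8-U bytes
    print(to_log_bytes(msg, locale_charset("C")))             # hex escapes
```

As written, the sketch falls back for the whole message as soon as a single character cannot be represented in the target charset. A gentler, per-character alternative is text.encode(charset, errors="backslashreplace"). It keeps every convertible character readable and escapes only the rest.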
