http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5083





------- Additional Comments From [EMAIL PROTECTED]  2006-09-05 15:55 -------
This idiom:

  { use bytes; $len = length($msg_resp) }

is actually the recommended way to get the length in bytes of a Unicode string.
We definitely need to be using that, alright.
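
To illustrate the difference between character length and byte length, here is
a minimal sketch.  The contents of $msg_resp are made up for the example, and
the Encode-based variant is shown only as an alternative for comparison, not
as what the SA code does:

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical response body: "caf" plus U+00E9 (e with acute accent).
my $msg_resp = "caf\x{e9}";
utf8::upgrade($msg_resp);    # force the internal UTF-8 representation

# Without "use bytes", length() counts characters.
my $char_len = length($msg_resp);

# With "use bytes" in scope, length() counts the bytes of the internal buffer.
my $len;
{ use bytes; $len = length($msg_resp) }

# Alternative (an assumption, not from the bug report): encode explicitly and
# measure the resulting octet string.
my $encoded_len = length(encode('UTF-8', $msg_resp));

print "chars=$char_len bytes=$len encoded=$encoded_len\n";
# prints "chars=4 bytes=5 encoded=5"
```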

The difficulty with manually doing the UTF-8 conversions in our own code is that
there are other places where a string will be "upgraded" automatically by
other parts of the perl API.  For example, concatenating a marked-as-UTF-8
string and a marked-as-non-UTF-8 string upgrades the latter as if it were
Latin-1, so any UTF-8 byte sequences it already holds end up double-encoded,
iirc.
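
A small sketch of that failure mode, using made-up variable names.  It assumes
the non-flagged string already holds UTF-8 encoded bytes, which is the case
where the automatic upgrade does damage:

```perl
use strict;
use warnings;
use Encode qw(encode);

# Character string, marked as UTF-8 internally (U+263A, a smiley).
my $marked = "\x{263a}";

# Byte string holding the UTF-8 encoding of U+00E9, not marked as UTF-8.
my $raw = "\xc3\xa9";

# Concatenation upgrades $raw as if each byte were a Latin-1 character,
# so the result has three characters: U+263A, U+00C3, U+00A9.
my $joined = $marked . $raw;

# Encoding the joined string now encodes the two bytes a second time.
my $double = encode('UTF-8', $joined);
my $wanted = encode('UTF-8', $marked) . $raw;

printf "joined chars=%d double=%d bytes wanted=%d bytes\n",
    length($joined), length($double), length($wanted);
# prints "joined chars=3 double=7 bytes wanted=5 bytes"
```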

However, the idea of explicitly turning off the UTF-8 layer on the spamd/spamc
socket, then performing the UTF-8 downgrade, is not a bad one, I think.  That
may work well, but we'd have to be sure to do this at the last moment, just
before writing the strings to the sockets.
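
Roughly, that approach could look like the sketch below.  Everything here is
an assumption for illustration: write_octets() is a hypothetical helper, an
in-memory filehandle stands in for the spamd/spamc socket, the "downgrade" is
interpreted as an explicit Encode::encode() to UTF-8 octets, and the header is
a simplified Content-length line rather than the full protocol:

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: write a character string to a handle as raw octets.
sub write_octets {
    my ($fh, $text) = @_;

    binmode($fh);    # plain binary handle, no encoding layer

    # Convert to octets at the last moment, just before writing, so that no
    # later string operation can upgrade the data again.
    my $octets = encode('UTF-8', $text);

    # The advertised length is the octet count, not the character count.
    my $header = sprintf("Content-length: %d\r\n\r\n", length($octets));

    print {$fh} $header, $octets or die "write failed: $!";
    return length($octets);
}

# Stand-in for the socket: an in-memory filehandle.
my $wire = '';
open(my $out, '>', \$wire) or die "open failed: $!";
my $sent = write_octets($out, "caf\x{e9} \x{263a}");
close($out);

print "sent $sent octets\n";    # prints "sent 9 octets"
```

Keeping the conversion inside the write helper means the rest of the code can
continue to work with character strings.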

(BTW, other SA developers with experience in working with utf-8 strings --
particularly in the SA code, or in spamd -- please speak up here...)


