On 01.08.2017 19:30, Ben RUBSON wrote:
Hi,

The following UTF-8 :
warn("warn with special char ééèè");
$r->log->error("log with special char ééèè");

Produces :
warn with special char ééèè at ...
[Tue Aug 01 19:25:28.914947 2017] [perl:error] [pid 56938] [client 127.0.0.1:59952] log with special char \xc3\xa9\xc3\xa9\xc3\xa8\xc3\xa8

Why all these \x symbols ?

These represent the *bytes* of the UTF-8 encoding of your "special" characters above. For example, the character "é" has the Unicode code point 233 (decimal), or E9 (hexadecimal). In UTF-8, it is encoded as the 2 bytes C3 A9 (hexadecimal). The "\x" prefix is a common way to indicate that the two characters which follow are the hexadecimal value of one byte.
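
To check those numbers yourself, here is a small sketch in plain perl (nothing mod_perl-specific). It only assumes the core Encode module; the variable names are mine, chosen for illustration:

```perl
use strict;
use warnings;
use Encode qw(encode);

# "é" given by its code point, so the example does not depend on
# the encoding of this source file.
my $char = "\x{E9}";

# The code point of the character: 233 decimal, E9 hexadecimal.
my $codepoint = ord($char);

# The same character as a string of bytes in the UTF-8 encoding.
my $bytes = encode('UTF-8', $char);

# Each byte shown as two hexadecimal digits.
my $hex_bytes = join(' ', map { sprintf('%02X', ord($_)) } split(//, $bytes));

printf("code point : %d (hex %X)\n", $codepoint, $codepoint);
printf("UTF-8 bytes: %s\n", $hex_bytes);
```

This should print the code point as 233 (hex E9) and the bytes as C3 A9, which are the same two bytes that show up as \xc3\xa9 in your logfile.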

The exact reason why $r->log->error chooses to represent these characters in such a way in the logfile (instead of just printing them as the bytes that constitute their UTF-8 encoding) is not really known to me, but I can make a guess :

Internally, perl "knows" that these characters are Unicode. But when it writes them out to a file (such as the Apache logfile here), it does not necessarily know that the file was opened "in UTF-8 mode" and that it can send the characters as they are.
So it "escapes" them in a way that keeps them readable by a human, no matter what (*).
Those escapes are the \x.. (pure ASCII) representations that you see in the logfile.
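
To illustrate what such an escaping does, here is a minimal sketch. It is not the actual code used by Apache or mod_perl, only my reconstruction of the effect; escape_non_ascii is a hypothetical helper, and the rule "escape everything outside printable ASCII" is an assumption:

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: replace every byte outside the printable ASCII
# range (0x20 to 0x7E) with a \xHH escape, leaving the rest untouched.
sub escape_non_ascii {
    my ($bytes) = @_;
    (my $escaped = $bytes) =~ s/([^\x20-\x7E])/sprintf('\\x%02x', ord($1))/ge;
    return $escaped;
}

# The message from the example above, encoded to UTF-8 bytes first.
my $message = encode('UTF-8', "log with special char \x{E9}\x{E9}\x{E8}\x{E8}");

my $escaped_message = escape_non_ascii($message);
print "$escaped_message\n";
```

With the message from your example, this prints the same text as the logfile line: log with special char \xc3\xa9\xc3\xa9\xc3\xa8\xc3\xa8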

The "warn()" that you also use above is different: there, perl writes directly to its own STDERR. Because perl opened that file itself, it knows that it can handle UTF-8, so it writes the characters as they are.

How to avoid them ?

In this case, I don't know, because it may depend on the way that Apache handles its logfiles, and not only on perl/mod_perl.


(*) for example, no matter which text editor you later use to view the logfile. All text editors can handle ASCII, but not necessarily UTF-8.

Ah, and I just saw your follow-up message, and between that and the above, we should have some reasonable explanation together.
