Re: Log and special characters

Ben RUBSON Wed, 02 Aug 2017 02:00:14 -0700

> On 02 Aug 2017, at 10:52, André Warnier (tomcat) <a...@ice-sa.com> wrote:
> 
> On 01.08.2017 19:30, Ben RUBSON wrote:
>> Hi,
>> 
>> The following UTF-8 :
>> warn("warn with special char ééèè");
>> $r->log->error("log with special char ééèè");
>> 
>> Produces :
>> warn with special char ééèè at ...
>> [Tue Aug 01 19:25:28.914947 2017] [perl:error] [pid 56938] [client 
>> 127.0.0.1:59952] log with special char \xc3\xa9\xc3\xa9\xc3\xa8\xc3\xa8
>> 
>> Why all these \x symbols ?
> 
> These represent the *bytes* which correspond to the UTF-8 encoding of your 
> "special" characters above. E.g. the character "é" has the Unicode codepoint 
> 233 (decimal) or E9 (hexadecimal). When encoded using the UTF-8 encoding, 
> this is represented by 2 bytes C3 A9 (hexadecimal). The "\x" prefix is a 
> common way to indicate that the symbols which follow should be interpreted as 
> a hexadecimal number.
> 
> The exact reason why $r->log->error chooses to represent these characters in 
> such a way in the logfile (instead of just printing them as the bytes that 
> constitute their UTF-8 encoding) is not really known to me, but I can make a 
> guess :
> 
> Internally, perl "knows" that these characters are Unicode.  But when it 
> writes them out to a file (such as here the logfile of Apache), it does not 
> necessarily know that this file itself is opened "in UTF-8 mode" and that it 
> can just send the characters that way.
> So it "escapes" them in a way that will make them readable by a human, no 
> matter what (*).
> And those are the \x.. (pure ASCII) representations that you see in the 
> logfile.
> 
> On the other hand, the "warn()" that you also use above, that is perl writing 
> directly to its STDERR. And because that is a file that perl opened itself, 
> it knows that it can handle UTF-8, so it writes these characters directly 
> that way.
> 
>> How to avoid them ?
> 
> In this case, I don't know, because it may depend on the way that Apache 
> handles its logfiles, and not only on perl/mod_perl.
> 
>> 
> (*) for example, no matter which text editor you later use to view the 
> logfile. All text editors can handle ASCII, but not necessarily UTF-8.
> 
> Ah, and I just saw your follow-up message, and between that and the above, we 
> should have some reasonable explanation together.


Thank you very much for your detailed answer, André!
Yes, Perl must certainly escape UTF-8 characters as you just explained.
If we convert the string to ASCII first (using Encode), these special 
characters are not displayed correctly either, this time because of Apache's 
ap_escape_errorlog_item() function.
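
For the archives, here is a minimal sketch of where the \x sequences come 
from. The helper escape_like_errorlog() is hypothetical and is not Apache's 
actual ap_escape_errorlog_item(); it only assumes that every byte outside 
printable ASCII gets written as \xHH, which matches the log line quoted above.

```perl
use strict;
use warnings;
use utf8;                      # this source file is itself UTF-8
use Encode qw(encode);

# Hypothetical helper, NOT Apache's code: write every byte outside
# printable ASCII (0x20-0x7e) as \xHH. Simplified on purpose: it does
# not treat backslashes or quotes specially.
sub escape_like_errorlog {
    my ($bytes) = @_;
    (my $out = $bytes) =~ s/([^\x20-\x7e])/sprintf('\\x%02x', ord $1)/ge;
    return $out;
}

my $text  = "log with special char ééèè";
my $bytes = encode('UTF-8', $text);   # 'é' (U+00E9) becomes bytes C3 A9

print escape_like_errorlog($bytes), "\n";
```

This prints "log with special char \xc3\xa9\xc3\xa9\xc3\xa8\xc3\xa8", i.e. 
two bytes per accented character, each shown in hexadecimal.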

Best thing is then to avoid them :)

Many thanks !