> On 02 Aug 2017, at 10:52, André Warnier (tomcat) <a...@ice-sa.com> wrote:
>
> On 01.08.2017 19:30, Ben RUBSON wrote:
>> Hi,
>>
>> The following UTF-8 :
>> warn("warn with special char ééèè");
>> $r->log->error("log with special char ééèè");
>>
>> Produces :
>> warn with special char ééèè at ...
>> [Tue Aug 01 19:25:28.914947 2017] [perl:error] [pid 56938] [client
>> 127.0.0.1:59952] log with special char \xc3\xa9\xc3\xa9\xc3\xa8\xc3\xa8
>>
>> Why all these \x symbols ?
>
> These represent the *bytes* which correspond to the UTF-8 encoding of your
> "special" characters above. E.g. the character "é" has the Unicode codepoint
> 233 (decimal) or E9 (hexadecimal). When encoded using the UTF-8 encoding,
> this is represented by 2 bytes C3 A9 (hexadecimal). The "\x" prefix is a
> common way to indicate that the symbols which follow should be interpreted as
> a hexadecimal number.
>
> The exact reason why $r->log->error chooses to represent these characters in
> such a way in the logfile (instead of just printing them as the bytes that
> constitute their UTF-8 encoding) is not really known to me, but I can make a
> guess :
>
> Internally, perl "knows" that these characters are Unicode. But when it
> writes them out to a file (such as here the logfile of Apache), it does not
> necessarily know that this file itself is opened "in UTF-8 mode" and that it
> can just send the characters that way.
> So it "escapes" them in a way that will make them readable by a human, no
> matter what (*).
> And those are the \x.. (pure ASCII) representations that you see in the
> logfile.
>
> On the other hand, the "warn()" that you also use above, that is perl writing
> directly to its STDERR. And because that is a file that perl opened itself,
> it knows that it can handle UTF-8, so it writes these characters directly
> that way.
>
>> How to avoid them ?
>
> In this case, I don't know, because it may depend on the way that Apache
> handles its logfiles, and not only on perl/mod_perl.
>
> (*) for example, no matter which text editor you later use to view the
> logfile. All text editors can handle ASCII, but not necessarily UTF-8.
>
> Ah, and I just saw your follow-up message, and between that and the above, we
> should have some reasonable explanation together.

Thank you very much for your detailed answer, André!

Yes, Perl must certainly escape the UTF-8 characters as you just explained.
If we convert the string to ASCII first (using Encode), these special
characters are still not displayed correctly, this time because of Apache's
ap_escape_errorlog_item() function.

The best thing is then to avoid them :)

Many thanks!
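
For the archives, here is a minimal sketch of where the \x sequences come from. It encodes the test string to UTF-8 with Encode and then prints every byte outside printable ASCII as \xHH. The escape_bytes() helper is hypothetical and only imitates what ends up in the log; it is not the code of ap_escape_errorlog_item(), and it makes no claim about which layer does the escaping.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper (an assumption for illustration, NOT Apache's code):
# replace every byte outside printable ASCII with its \xHH form.
sub escape_bytes {
    my ($bytes) = @_;
    (my $out = $bytes) =~ s/([^\x20-\x7e])/sprintf('\\x%02x', ord $1)/ge;
    return $out;
}

# "ééèè" written with codepoint escapes, so the result does not depend
# on the encoding this script is saved in.
my $text  = "log with special char \x{e9}\x{e9}\x{e8}\x{e8}";
my $bytes = encode('UTF-8', $text);    # character string -> UTF-8 bytes

printf "codepoint of e-acute: %d (0x%X)\n", ord("\x{e9}"), ord("\x{e9}");
print escape_bytes($bytes), "\n";
# prints: log with special char \xc3\xa9\xc3\xa9\xc3\xa8\xc3\xa8
```

The second line of output matches the tail of the error log entry quoted above: two bytes (C3 A9 or C3 A8) per accented character.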