[
https://issues.apache.org/jira/browse/LOG4J2-255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13659476#comment-13659476
]
Nick Williams commented on LOG4J2-255:
--------------------------------------
Okay, I think I understand all of this better. The 100% correct solution that
will work all of the time is to change all of the computers in the world to
have a default UTF-8 platform encoding. Too bad we don't have the power to do
that... ;-)
Here's what I think should be happening:
Internally, absolutely everything should be handled as UTF-8 for consistency's
sake. However, when dealing with external resources:
- Data transmitted over the wire or interprocess (such as net, flume, etc.)
should use UTF-8 exclusively.
- XML written to a file or other non-network output stream should use UTF-8
exclusively.
- Data read from files or other non-network input streams should detect the
file encoding (is this possible? do we have to just rely on the platform
default here?) and read in that file encoding, converting to Unicode upon
reading (which should happen automatically, since all Strings in Java are
Unicode). My understanding of XML is that you SHOULD always encode it in a
Unicode encoding such as UTF-8, UTF-16, etc., but not everybody does.
- Data written to files or other output streams (including the Console) should
use the platform default encoding if no explicit encoding is specified. Every
AbstractStringLayout should provide a way to specify an encoding that overrides
the platform default encoding. AbstractStringLayout already does this by having
a mandatory constructor that takes a Charset. However, it doesn't account for
the possibility that it is constructed with a null Charset. IMO, it should be
setting the Charset to the platform default if it's constructed with a null
Charset. Furthermore, every class that extends AbstractStringLayout should use
this Charset /except/ XMLLayout, which should ALWAYS use UTF-8. The
`@PluginAttr("charset") String charsetName` parameter for
XMLLayout#createLayout should be removed, the `Charset charset` parameter for
XMLLayout#XMLLayout should be removed, and UTF-8 should be hardcoded as the
value for super(). (In fact, right now the XMLLayout is broken, because it
accepts a user-supplied Charset but the header is hard-coded to <?xml
version="1.0" encoding="UTF-8"?>.)
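
To make the proposal above concrete, here is a rough sketch of what I mean. The class names (StringLayoutSketch, PatternLikeLayout, XmlLikeLayout) are hypothetical stand-ins, not the real Log4j classes, and the real constructors take more arguments than this. It only illustrates the two ideas: a null Charset falls back to the platform default, and the XML layout hardcodes UTF-8 so the header can never disagree with the bytes written.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetFallbackSketch {

    /** Hypothetical stand-in for AbstractStringLayout; not the real Log4j class. */
    abstract static class StringLayoutSketch {
        private final Charset charset;

        StringLayoutSketch(final Charset charset) {
            // A null Charset means "nothing specified": use the platform default.
            this.charset = charset == null ? Charset.defaultCharset() : charset;
        }

        Charset getCharset() {
            return charset;
        }

        byte[] toBytes(final String text) {
            return text.getBytes(charset);
        }
    }

    /** Any ordinary layout: honors the configured Charset, or the default if null. */
    static final class PatternLikeLayout extends StringLayoutSketch {
        PatternLikeLayout(final Charset charset) {
            super(charset);
        }
    }

    /** XML layout: no Charset parameter at all, UTF-8 is passed to super(). */
    static final class XmlLikeLayout extends StringLayoutSketch {
        XmlLikeLayout() {
            super(StandardCharsets.UTF_8);
        }

        String header() {
            // Derived from the layout's Charset, so header and bytes always agree.
            return "<?xml version=\"1.0\" encoding=\"" + getCharset().name() + "\"?>";
        }
    }

    public static void main(final String[] args) {
        System.out.println(new PatternLikeLayout(null).getCharset());
        System.out.println(new XmlLikeLayout().header());
    }
}
```

The first line printed depends on the machine it runs on, which is exactly the point of the fallback.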
(Side note: Strings in Java are Unicode, not UTF-8. Some of the people
commenting here have used these terms interchangeably, but they are not
interchangeable. Unicode is the system of assigning numbers (code points) to
characters. UTF-8, UTF-16, UTF-32, etc. are different encodings that map those
code points to and from bytes. See
http://stackoverflow.com/questions/643694/utf-8-vs-unicode)
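
A small illustration of the Unicode vs. UTF-8 distinction, in case it helps. This is just a sketch using the first three characters of the test string from the issue, written as Unicode escapes so the source file's own encoding doesn't matter. The class and method names are made up for the example.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class UnicodeVsEncoding {

    // Number of bytes the given encoding needs for the text.
    static int encodedLength(final String text, final Charset charset) {
        return text.getBytes(charset).length;
    }

    // Encode with one charset, decode with another; mismatches garble the text.
    static String recode(final String text, final Charset writeAs, final Charset readAs) {
        return new String(text.getBytes(writeAs), readAs);
    }

    public static void main(final String[] args) {
        final String text = "\u65E5\u672C\u8A9E"; // three code points: U+65E5 U+672C U+8A9E

        // The same three code points, different byte counts per encoding.
        System.out.println(text.codePointCount(0, text.length()));          // 3
        System.out.println(encodedLength(text, StandardCharsets.UTF_8));    // 9
        System.out.println(encodedLength(text, StandardCharsets.UTF_16BE)); // 6

        // Same charset on both sides round-trips; a mismatch does not.
        System.out.println(text.equals(
                recode(text, StandardCharsets.UTF_8, StandardCharsets.UTF_8)));      // true
        System.out.println(text.equals(
                recode(text, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1))); // false
    }
}
```

The last case is the same kind of failure as the scrambled output in this issue: bytes written in one encoding and interpreted in another.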
> Multi-byte character strings are scrambled in log output
> --------------------------------------------------------
>
> Key: LOG4J2-255
> URL: https://issues.apache.org/jira/browse/LOG4J2-255
> Project: Log4j 2
> Issue Type: Bug
> Components: Appenders, Core
> Affects Versions: 2.0-beta6
> Reporter: Remko Popma
> Assignee: Remko Popma
> Priority: Blocker
> Fix For: 2.0-beta7
>
>
> When I tried to log a Japanese string the output was scrambled in both the
> Console and a log file.
> For example,
> logger.warn("日本語テスト"); // (Japanese test)
> came out as
> 15:07:00.184 [main] WARN test.JapaneseTest - 譌・譛ャ隱槭ユ繧ケ繝?
> This is the log4j2.xml configuration:
> <?xml version="1.0" encoding="UTF-8"?>
> <configuration status="warn">
>   <appenders>
>     <Console name="Console" target="SYSTEM_OUT">
>       <PatternLayout>
>         <pattern>%d{HH:mm:ss.SSS} [%t] %-5level %logger{36} - %msg%n</pattern>
>       </PatternLayout>
>     </Console>
>     <File name="tracelog" fileName="trace-log.txt" immediateFlush="true" append="false">
>       <PatternLayout pattern="%d{HH:mm:ss.SSS} [%t] %-5level %logger{36} - %msg%n"/>
>     </File>
>   </appenders>
>
>   <loggers>
>     <root level="trace">
>       <appender-ref ref="Console"/>
>       <appender-ref ref="tracelog"/>
>     </root>
>   </loggers>
> </configuration>