DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG
RELATED COMMENTS THROUGH THE WEB INTERFACE AVAILABLE AT
<http://issues.apache.org/bugzilla/show_bug.cgi?id=43736>.
ANY REPLY MADE TO THIS MESSAGE WILL NOT BE COLLECTED AND
INSERTED IN THE BUG DATABASE.



[EMAIL PROTECTED] changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED




------- Additional Comments From [EMAIL PROTECTED]  2007-11-03 10:44 -------
The presence of an encoding in the XML declaration of the string
representation of the document is ignored by the parser. It is a relic of the
document once having been encoded as a byte stream: by the time the parser
sees it, the byte stream has already been decoded into a string by the
InputStreamReader, which is oblivious to any encoding declaration in the byte
stream.
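A small sketch of this behavior, assuming the JDK's built-in JAXP parser (the
class and method names are hypothetical). The SAX InputSource documentation
states that a character stream is read directly, disregarding any encoding
declaration in it. Here the string claims ISO-8859-1, yet it parses to exactly
the characters it already held, including a euro sign that ISO-8859-1 cannot
represent, because there are no bytes left to decode.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class DeclaredEncodingIgnored {

    // Parses a document that is already a Java String (decoded characters),
    // even though its XML declaration claims ISO-8859-1.
    static String messageText() throws Exception {
        // U+20AC (euro sign) has no ISO-8859-1 encoding; it survives only
        // because the parser never decodes anything here.
        String xml = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>"
                + "<msg>caf\u00e9 \u20ac</msg>";
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        return doc.getDocumentElement().getTextContent();
    }

    public static void main(String[] args) throws Exception {
        // Compare strings instead of printing them, to avoid console-encoding noise.
        System.out.println(messageText().equals("caf\u00e9 \u20ac")); // true
    }
}
```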

An external parsed entity (such as our and JUL's XML log files) that is not in
UTF-8 or UTF-16 requires an explicit text declaration
(http://www.w3.org/TR/2006/REC-xml-20060816/#charencoding). Without a text
declaration, a parser will sniff the file to determine whether it is UTF-16BE
or UTF-16LE from a byte order mark or the presence of alternating 0 bytes and,
if not, will assume that it is UTF-8. More detail at
http://www.w3.org/TR/2006/REC-xml-20060816/#sec-guessing. It will never
consult the platform default encoding.
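A simplified sketch of that sniffing in Java. The helper name is hypothetical,
and the UCS-4 and EBCDIC cases from the spec's autodetection appendix are
omitted; a real parser would also go on to read any encoding declaration once
it has picked the encoding family.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingSniffer {

    // Hypothetical helper: guesses the encoding family from the first bytes
    // of an entity, following a simplified version of the spec's rules.
    static Charset sniff(byte[] head) {
        int b0 = head.length > 0 ? head[0] & 0xFF : -1;
        int b1 = head.length > 1 ? head[1] & 0xFF : -1;
        int b2 = head.length > 2 ? head[2] & 0xFF : -1;

        // Byte order marks.
        if (b0 == 0xFE && b1 == 0xFF) return StandardCharsets.UTF_16BE;
        if (b0 == 0xFF && b1 == 0xFE) return StandardCharsets.UTF_16LE;
        if (b0 == 0xEF && b1 == 0xBB && b2 == 0xBF) return StandardCharsets.UTF_8;

        // No BOM: a '<' (0x3C) paired with a 0 byte reveals the UTF-16 byte order.
        if (b0 == 0x00 && b1 == 0x3C) return StandardCharsets.UTF_16BE;
        if (b0 == 0x3C && b1 == 0x00) return StandardCharsets.UTF_16LE;

        // Otherwise assume UTF-8. Charset.defaultCharset() is never consulted.
        return StandardCharsets.UTF_8;
    }

    public static void main(String[] args) {
        System.out.println(sniff("<a/>".getBytes(StandardCharsets.UTF_16LE)));   // UTF-16LE
        System.out.println(sniff("<a/>".getBytes(StandardCharsets.UTF_16)));     // UTF-16BE (BOM)
        System.out.println(sniff("<a/>".getBytes(StandardCharsets.ISO_8859_1))); // UTF-8
    }
}
```

Note that an ISO-8859-1 file without a text declaration lands in the UTF-8
branch, which is exactly why such a file needs the declaration.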

However, if it were written in the platform encoding with the proper text 
declaration, like:

<?xml encoding="ISO-8859-1"?>
<log4j:event../>
<log4j:event../>

the current XML decoders would fail to read the file, since they would just
append that text to the string; the text declaration would then be in the
wrong place, leading to a parsing error.
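A sketch of that failure, assuming the decoder wraps the file text in a start
and end tag before parsing. The wrapper here is a simplified, hypothetical
stand-in, not the decoders' actual header. Once the text declaration ends up
after the start tag, it is just a processing instruction with the reserved
target "xml", which the parser rejects.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class MisplacedTextDeclaration {

    // Hypothetical file contents: log events preceded by a text declaration.
    static final String FILE_TEXT =
            "<?xml encoding=\"ISO-8859-1\"?>\n"
            + "<log4j:event logger=\"demo\"/>\n"
            + "<log4j:event logger=\"demo\"/>\n";

    // Returns true if parsing the naively wrapped text fails.
    static boolean wrappedParseFails() throws Exception {
        // Simplified stand-in for the decoders' wrapping of the file contents.
        String wrapped = "<log4j:eventSet>" + FILE_TEXT + "</log4j:eventSet>";
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // DefaultHandler rethrows fatal errors without printing them to stderr.
        builder.setErrorHandler(new DefaultHandler());
        try {
            builder.parse(new InputSource(new StringReader(wrapped)));
            return false;
        } catch (SAXException e) {
            // The JDK parser typically reports that a processing instruction
            // target matching "[xX][mM][lL]" is not allowed.
            return true;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(wrappedParseFails()); // true
    }
}
```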

The only way to do it right is to rewrite the decoders, which I'm willing to
do after getting log4cxx out the door. But there is never any case where using
the platform encoding would get you the right content and using "UTF-8" would
get you the wrong content.
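One possible shape for such a rewrite, sketched as an assumption rather than a
plan: hand the log file to the parser as an external parsed entity, so the
parser reads the raw bytes itself and honors the text declaration. The temp
file, the entity name "data", and the class are hypothetical, and this relies
on the JDK parser's default of resolving external entities, which hardened
configurations may disable.

```java
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class ExternalEntityDecoding {

    // Writes a hypothetical ISO-8859-1 log file with a text declaration, then
    // lets the parser read its bytes as an external parsed entity.
    static String firstMessage() throws Exception {
        Path log = Files.createTempFile("events", ".xml");
        try {
            String events = "<?xml encoding=\"ISO-8859-1\"?>\n"
                    + "<log4j:event><log4j:message>caf\u00e9</log4j:message></log4j:event>\n";
            // 'é' becomes the single byte 0xE9, which is not valid UTF-8 here,
            // so the parser must honor the text declaration to read it.
            Files.write(log, events.getBytes(StandardCharsets.ISO_8859_1));

            // The wrapper only references the file; it never decodes its bytes.
            String wrapper = "<!DOCTYPE log4j:eventSet [<!ENTITY data SYSTEM \""
                    + log.toUri() + "\">]>"
                    + "<log4j:eventSet>&data;</log4j:eventSet>";
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(wrapper)));
            return doc.getElementsByTagName("log4j:message").item(0).getTextContent();
        } finally {
            Files.deleteIfExists(log);
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(firstMessage().equals("caf\u00e9")); // true
    }
}
```

The design choice here is that no Reader is ever built with an assumed
encoding; the only decoding happens inside the parser, under the rules quoted
above.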

