DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG-RELATED
COMMENTS THROUGH THE WEB INTERFACE AVAILABLE AT
<http://issues.apache.org/bugzilla/show_bug.cgi?id=43736>.
ANY REPLY MADE TO THIS MESSAGE WILL NOT BE COLLECTED AND
INSERTED IN THE BUG DATABASE.

------- Additional Comments From [EMAIL PROTECTED]  2007-11-03 13:37 -------
I suggest looking at what the Rome developers have done with their XmlReader 
class [1].  It goes through an elaborate process to figure out the proper 
charset for the document, as explained in [2] and [3].  It's copyrighted by 
Sun Microsystems, but it's under an Apache license.  I adapted it for the XMLC 
project [4] and modified it to use gnu-regexp instead of the JDK 1.4 regexp 
package, since my project depends on JDK 1.3, not 1.4.  I also slightly 
modified a couple of constructors to make it easier to provide a per-instance 
defaultEncoding in case none can be detected.  I use it like this:

try {
    InputSource inputSource = new ClosingInputSource(url);
    try {
        XmlReader reader = new XmlReader(
                InputSourceOps.openSystemId(url), false, defaultEncoding);
        inputSource.setCharacterStream(reader);
        inputSource.setEncoding(reader.getEncoding());
    } catch (XmlReaderException xre) {
        // This is somewhat unlikely to happen, but it doesn't hurt to have
        // an extra fallback, which XmlReader conveniently allows for by
        // providing access to the original unconsumed InputStream via
        // the XmlReaderException.
        inputSource.setByteStream(xre.getInputStream());
        inputSource.setEncoding(defaultEncoding);
    }
    return inputSource;
} catch (IOException ioe) {
    throw new XMLCError("Couldn't load file "+url, ioe);
}
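
For anyone who wants a feel for what that detection involves before reading 
the Rome source, here is a much-reduced sketch.  It is not the Rome or XMLC 
code: the class name XmlCharsetSniffer and the method detect are made up for 
illustration, and it only checks for a byte order mark and then for an 
encoding pseudo-attribute in the XML declaration before falling back to the 
default.  As far as I know the real XmlReader does considerably more (for 
example, it can take an HTTP Content-Type into account).  The sketch also uses 
java.util.regex and StandardCharsets, so it needs a modern JDK, unlike the 
JDK 1.3 port described above.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical, simplified charset sniffer; not the Rome XmlReader. */
class XmlCharsetSniffer {

    // Matches the encoding pseudo-attribute of an XML declaration that
    // starts at the very beginning of the document.
    private static final Pattern ENCODING = Pattern.compile(
            "^<\\?xml[^>]*?\\sencoding\\s*=\\s*[\"']([A-Za-z0-9._\\-]+)[\"']");

    /**
     * Guesses the charset from the first bytes of a document.
     *
     * @param head            the leading bytes of the stream (a few hundred is plenty)
     * @param defaultEncoding returned when nothing can be detected
     */
    static String detect(byte[] head, String defaultEncoding) {
        // Step 1: a byte order mark, if present, settles the question.
        if (startsWith(head, 0xEF, 0xBB, 0xBF)) {
            return "UTF-8";
        }
        if (startsWith(head, 0xFE, 0xFF)) {
            return "UTF-16BE";
        }
        if (startsWith(head, 0xFF, 0xFE)) {
            return "UTF-16LE";
        }

        // Step 2: look for an XML declaration. ISO-8859-1 maps every byte to
        // one char, so the decoding itself cannot fail. This step only works
        // for ASCII-compatible encodings; UTF-16 without a BOM is not handled.
        String text = new String(head, StandardCharsets.ISO_8859_1);
        Matcher m = ENCODING.matcher(text);
        if (m.find()) {
            return m.group(1);
        }

        // Step 3: nothing found, so use the caller's default.
        return defaultEncoding;
    }

    // True if data begins with the given unsigned byte values.
    private static boolean startsWith(byte[] data, int... prefix) {
        if (data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if ((data[i] & 0xFF) != prefix[i]) {
                return false;
            }
        }
        return true;
    }
}
```

A caller would read the first chunk of the stream into a byte array, pass it 
to XmlCharsetSniffer.detect(head, defaultEncoding), and then wrap the stream 
(rewound or re-opened) in an InputStreamReader using the returned name.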

The Rome team put a lot of thought into this.  It seems like it would make 
sense to reuse it in Log4j rather than reinvent the wheel with something that 
is likely not nearly as robust.

Jake

[1] https://rome.dev.java.net/source/browse/rome/src/java/com/sun/syndication/io/XmlReader.java
[2] http://wiki.java.net/bin/view/Javawsxml/Rome05CharsetEncoding
[3] http://blogs.sun.com/tucu/entry/detecting_xml_charset_encoding_again
[4] http://cvs.xmlc.forge.objectweb.org/cgi-bin/viewcvs.cgi/xmlc/xmlc/xmlc/modules/xmlc/src/org/enhydra/xml/io/XmlReader.java


