[
https://issues.apache.org/jira/browse/SOLR-2347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12990564#comment-12990564
]
Mark Miller commented on SOLR-2347:
-----------------------------------
Thanks for looking into this, Uwe!
> Use InputStream and not Reader for XML parsing
> ----------------------------------------------
>
> Key: SOLR-2347
> URL: https://issues.apache.org/jira/browse/SOLR-2347
> Project: Solr
> Issue Type: Bug
> Reporter: Uwe Schindler
> Assignee: Uwe Schindler
>
> Followup to SOLR-96:
> Solr mostly uses java.io.Reader and passes this Reader to the XML parser.
> According to the XML spec, an XML file should initially be treated as a binary
> stream with a default charset of UTF-8 or another charset given by the network
> protocol (like the Content-Type header in HTTP). Importantly, this default
> charset is only a "hint" to the parser; the charset declared in the XML
> header is authoritative. Because of this, the parser must be able
> to change the charset when reading the XML header (possibly also when seeing
> a BOM). This is not possible if the XML parser gets a java.io.Reader
> instead of a java.io.InputStream. SOLR-96 already fixed this for the
> XmlUpdateRequestHandler and the DocumentAnalysisRequestHandler. This issue
> should fix the rest to conform to the XML spec (open schema.xml, config.xml,
> and others as InputStream, not Reader).
> This change would not break anything in Solr (perhaps only backwards
> compatibility in the API), as the default used by XML parsers is UTF-8.
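
The difference described in the issue can be illustrated with a small standalone sketch. It is not Solr code: the class name XmlCharsetDemo, the helper methods, and the sample document are hypothetical, and only standard JAXP calls are used. The sample bytes are encoded as ISO-8859-1 and say so in the XML header. Given an InputStream, the parser can read that header and pick the charset itself; given a Reader, the bytes have already been decoded (here as UTF-8, standing in for a caller's default) before the parser sees them.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class, not part of Solr.
class XmlCharsetDemo {

    // Sample document (an assumption for illustration): the header declares
    // ISO-8859-1 and the bytes really are ISO-8859-1, including the
    // non-ASCII character U+00E9.
    static final byte[] XML_BYTES =
        "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><doc>caf\u00e9</doc>"
            .getBytes(StandardCharsets.ISO_8859_1);

    // Hands the raw bytes to the parser, which detects the charset from the
    // XML header (or a BOM) on its own.
    static String parseFromStream(byte[] bytes) throws Exception {
        DocumentBuilder builder =
            DocumentBuilderFactory.newInstance().newDocumentBuilder();
        try (InputStream in = new ByteArrayInputStream(bytes)) {
            Document doc = builder.parse(in);
            return doc.getDocumentElement().getTextContent();
        }
    }

    // Decodes the bytes as UTF-8 before the parser sees them. The parser only
    // receives characters and can no longer switch to the declared charset.
    static String parseFromReader(byte[] bytes) throws Exception {
        DocumentBuilder builder =
            DocumentBuilderFactory.newInstance().newDocumentBuilder();
        try (Reader reader = new InputStreamReader(
                new ByteArrayInputStream(bytes), StandardCharsets.UTF_8)) {
            Document doc = builder.parse(new InputSource(reader));
            return doc.getDocumentElement().getTextContent();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("InputStream: " + parseFromStream(XML_BYTES));
        System.out.println("Reader:      " + parseFromReader(XML_BYTES));
    }
}
```

With the stream variant the element text should come back with the accented character intact, while the Reader variant is expected to return a damaged string, because the ISO-8859-1 byte is not valid UTF-8 and is replaced during decoding.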