Use InputStream and not Reader for XML parsing
----------------------------------------------
Key: SOLR-2347
URL: https://issues.apache.org/jira/browse/SOLR-2347
Project: Solr
Issue Type: Bug
Reporter: Uwe Schindler
Assignee: Uwe Schindler
Followup to SOLR-96:
Solr mostly uses java.io.Reader and passes this Reader to the XML parser.
According to the XML spec, an XML file should initially be treated as a binary
stream with a default charset of UTF-8, or another charset given by the
network protocol (such as the Content-Type header in HTTP). Importantly, this
default charset is only a "hint" to the parser; the charset named in the
encoding declaration of the XML header is mandatory. Because of this, the
parser must be able to change the charset while reading the XML header (and
possibly also when it sees a BOM). This is not possible if the XML parser gets
a java.io.Reader instead of a java.io.InputStream. SOLR-96 already fixed this
for the
XmlUpdateRequestHandler and the DocumentAnalysisRequestHandler. This issue
should fix the remaining places so that they conform to the XML spec (open
schema.xml and config.xml as InputStream rather than Reader, among others).
This change would not break anything in Solr (except perhaps backwards
compatibility in the API), as the default used by XML parsers is UTF-8.
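
To illustrate the difference described above, here is a minimal sketch using
only the JDK's JAXP DOM parser. It is not Solr code: the class name
XmlEncodingDemo, its helper methods, and the sample document are hypothetical
and exist only for this example. The document is declared and encoded as
ISO-8859-1. Given the raw bytes, the parser can honor the declaration; given
a Reader that was already decoded as UTF-8, it cannot.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class, not part of Solr.
final class XmlEncodingDemo {

    // Sample document (an assumption for this sketch): declares ISO-8859-1
    // and contains one non-ASCII character.
    static final String XML =
        "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><doc>caf\u00e9</doc>";

    // The bytes as they would sit on disk or arrive over the network.
    static byte[] sampleBytes() {
        return XML.getBytes(StandardCharsets.ISO_8859_1);
    }

    // Parser receives raw bytes, so it can read the encoding declaration
    // and pick the matching decoder itself.
    static String parseFromStream(byte[] bytes) throws Exception {
        DocumentBuilder builder =
            DocumentBuilderFactory.newInstance().newDocumentBuilder();
        try (InputStream in = new ByteArrayInputStream(bytes)) {
            Document doc = builder.parse(in);
            return doc.getDocumentElement().getTextContent();
        }
    }

    // Caller decodes the bytes up front with an assumed charset (UTF-8).
    // The parser only ever sees characters, so it has no way to go back
    // and re-decode them with the declared charset.
    static String parseFromReader(byte[] bytes) throws Exception {
        DocumentBuilder builder =
            DocumentBuilderFactory.newInstance().newDocumentBuilder();
        try (Reader reader = new InputStreamReader(
                new ByteArrayInputStream(bytes), StandardCharsets.UTF_8)) {
            Document doc = builder.parse(new InputSource(reader));
            return doc.getDocumentElement().getTextContent();
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] bytes = sampleBytes();
        System.out.println("InputStream: " + parseFromStream(bytes));
        System.out.println("Reader:      " + parseFromReader(bytes));
    }
}
```

With the InputStream variant the text content should come back intact, while
the Reader variant is expected to return a damaged last character, because
the byte for that character is not valid UTF-8 and was replaced before the
parser ever saw it.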