[ 
https://issues.apache.org/jira/browse/SOLR-96?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated SOLR-96:
------------------------------

    Attachment: SOLR-96.patch

Here is a patch to fix this.

The whole problem *everywhere* in Solr (even for config files) is that XML 
files, per spec, are not intended to be handled as "text"; they are binary!!! 
(This is why the MIME type is application/xml, and text/xml was deprecated by 
IANA.)

The APIs provided by Java that take java.io.Reader are only convenience methods 
to support parsing strings or database contents that are already text with a 
known charset. XML files from an unknown source must always be parsed as a 
byte stream. Charsets determined from HTTP headers may only be used as a hint 
to the parser.
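
To illustrate the difference, here is a minimal sketch (not part of the patch) 
that parses the same bytes once as a byte stream and once through a 
java.io.Reader. The class and method names are made up for this example; only 
the standard JAXP calls are real. It assumes the JDK's default parser, which 
reads the encoding declaration when given bytes and ignores it when given a 
Reader.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;

// Hypothetical demo class, not taken from Solr.
public class ByteStreamVsReader {

    // Byte stream: the parser inspects the XML declaration and picks the
    // encoding itself.
    static String parseBytes(byte[] xml) throws Exception {
        DocumentBuilder builder =
            DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(new ByteArrayInputStream(xml))
                      .getDocumentElement().getTextContent();
    }

    // Reader: the caller has already decoded the bytes, so the encoding
    // declaration inside the document no longer has any effect.
    static String parseReader(byte[] xml, Charset assumed) throws Exception {
        DocumentBuilder builder =
            DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Reader reader =
            new InputStreamReader(new ByteArrayInputStream(xml), assumed);
        return builder.parse(new InputSource(reader))
                      .getDocumentElement().getTextContent();
    }

    public static void main(String[] args) throws Exception {
        // An ISO-8859-1 document that declares its own encoding.
        byte[] xml =
            "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><f>caf\u00e9</f>"
                .getBytes(StandardCharsets.ISO_8859_1);

        // Parsed from bytes, the accented character survives.
        System.out.println(parseBytes(xml));
        // Decoded as UTF-8 up front, the accented character is damaged.
        System.out.println(parseReader(xml, StandardCharsets.UTF_8));
    }
}
```

The Reader variant only gives the right result if the caller happens to guess 
the charset correctly, which is the situation described above.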

The patch changes the XmlUpdateRequestHandler to use the byte stream and pass 
the charset from the Content-Type header as a hint to the parser.
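
As a rough sketch of that idea (this is not the code in the attached patch), 
the charset parameter can be pulled from the Content-Type header and set on an 
org.xml.sax.InputSource that wraps the raw byte stream. The class name and the 
charsetFromContentType helper are hypothetical and the header parsing is 
deliberately simplistic. Note that, per the InputSource documentation, an 
encoding set this way is used by the parser when only a byte stream is 
present; if none is set, the parser autodetects.

```java
import java.io.InputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper class, not taken from Solr.
public class CharsetHint {

    // Extract the charset parameter from a Content-Type value such as
    // "text/xml; charset=ISO-8859-1". Returns null if there is none.
    static String charsetFromContentType(String contentType) {
        if (contentType == null) {
            return null;
        }
        for (String part : contentType.split(";")) {
            String p = part.trim();
            if (p.regionMatches(true, 0, "charset=", 0, 8)) {
                String value = p.substring(8).trim();
                // Strip optional surrounding quotes.
                if (value.length() >= 2
                        && value.startsWith("\"") && value.endsWith("\"")) {
                    value = value.substring(1, value.length() - 1);
                }
                return value.isEmpty() ? null : value;
            }
        }
        return null;
    }

    // Parse the request body as bytes, passing along the HTTP charset
    // when the header supplies one.
    static Document parse(InputStream body, String contentType)
            throws Exception {
        InputSource source = new InputSource(body);
        String charset = charsetFromContentType(contentType);
        if (charset != null) {
            source.setEncoding(charset);
        }
        return DocumentBuilderFactory.newInstance()
                                     .newDocumentBuilder()
                                     .parse(source);
    }
}
```

With no charset in the header, the InputSource carries no encoding and the 
parser falls back to the detection rules of the XML specification.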

This patch still lacks a test.

In general, we should review all XML parsing in Solr and never ever use 
java.io.Reader!!!

> Solr should support alternate charsets for XML update messages
> --------------------------------------------------------------
>
>                 Key: SOLR-96
>                 URL: https://issues.apache.org/jira/browse/SOLR-96
>             Project: Solr
>          Issue Type: Improvement
>          Components: update
>            Reporter: Hoss Man
>            Assignee: Uwe Schindler
>         Attachments: SOLR-96.patch
>
>
> At the moment, the XML messages sent to Solr to add/delete documents must be 
> in UTF-8.  The input processing should be changed to determine the charset 
> based on the HTTP header info, or the XML contents.
> Background and reference material...
> http://www.nabble.com/double-curl-calls-in-post.sh--tf2287469.html#a6369448
> http://www.nabble.com/wana-use-CJKAnalyzer-tf2303256.html#a6451918
> http://www.ietf.org/rfc/rfc3023.txt
> http://www.w3.org/TR/REC-xml/#sec-guessing

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
