Hi Shawn,

There was already an idea to make the URL encoding configurable through the 
URL itself, similar to how Google handles this case: an additional URL 
parameter called &ie=CHARSET (ie = input encoding). This parameter is 
quasi-standardized across web services from several providers, so it would be 
ideal for users and quite easy to implement. This was already noted in 
https://issues.apache.org/jira/browse/SOLR-4283 (last comment). We could open 
an issue; the implementation is quite easy (it just needs a two-step decode: 
start with US-ASCII, search for &ie=..., change encoding, restart). The code is 
already available in my head :-)
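
To illustrate the two-step decode, here is a minimal sketch in Python (not 
Solr's Java, and not the actual implementation). The function names 
detect_input_charset and parse_query are hypothetical, and the fallback to 
UTF-8 when no ie parameter is present is an assumption based on the default 
discussed in this thread.

```python
from urllib.parse import parse_qs, unquote


def detect_input_charset(raw_query: str, default: str = "utf-8") -> str:
    """Step 1: scan the raw query string as ASCII and look for ie=CHARSET."""
    for part in raw_query.split("&"):
        name, _, value = part.partition("=")
        # Parameter names and charset names are plain ASCII, so this scan
        # does not need to know the real encoding of the other values yet.
        if unquote(name, encoding="ascii") == "ie" and value:
            return unquote(value, encoding="ascii")
    return default  # assumption: no ie parameter means UTF-8


def parse_query(raw_query: str, default: str = "utf-8") -> dict:
    """Step 2: restart and decode the whole query with the detected charset."""
    charset = detect_input_charset(raw_query, default)
    # errors="strict" raises on bytes that are invalid in the chosen charset;
    # an unknown charset name raises LookupError.
    return parse_qs(raw_query, encoding=charset, errors="strict")
```

With this sketch, parse_query("q=caf%E9&ie=ISO-8859-1") decodes the %E9 byte 
as ISO-8859-1, while a request without ie is still decoded as UTF-8.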

The user could then require his clients to append "&ie=ISO-8859-1" to their 
URLs (or use mod_rewrite in his installation to do it automatically).

The big problem with changing the *default* charset to something other than 
UTF-8 is that it would break all of Solr Cloud, because Solr Cloud internally 
uses UTF-8 for cross-node communication. This was also one of the reasons why 
we enforced UTF-8, so there is no way around making the default charset UTF-8.

One addition: The charset for URL encoding is configurable if you send POST 
requests: For POST requests you can still send the charset as part

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: [email protected]


> -----Original Message-----
> From: Shawn Heisey [mailto:[email protected]]
> Sent: Saturday, July 27, 2013 8:32 AM
> To: [email protected]
> Subject: Solr user intentionally wants container to NOT use UTF-8
> 
> We have a user on the solr-user mailing list that has been bitten by enforced
> UTF-8 encoding by SOLR-4265.  Their client sends queries in
> ISO-8859-1 so they need Tomcat to handle that charset, and presumably they
> also index in ISO-8859-1.
> 
> Everything's fine in Solr 3.5, but Solr 4.3 is overriding the Tomcat
> configuration and interpreting the incoming data as UTF-8.  This is all
> intentional, but the user needs the old behavior.
> 
> I think we need to offer a solrconfig option to configure the character set
> rather than hard-coding it to UTF-8.  The example config should be
> commented, and when the config is not present, Solr should default to UTF-8.
> 
> If I open an issue, is that something that is likely to happen?  I don't know
> if I'd be able to tackle that project without some extensive research.
> 
> Thanks,
> Shawn
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [email protected] For additional
> commands, e-mail: [email protected]

