[ 
https://issues.apache.org/jira/browse/SOLR-2381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13004546#comment-13004546
 ] 

Uwe Schindler commented on SOLR-2381:
-------------------------------------

Hi Bernd,
we know where the problem in Jetty is: they buffer 512 chars without respecting 
surrogates. When they then convert those buffered chars to UTF-8, the output is 
broken at the buffer boundaries. This bug in Jetty may also affect JSON output, 
but JSON is much more compact and may not hit this buffer issue as easily, 
because it does not feed Strings to the writer; the broken method in Jetty is 
the one handling Writer.write(String,...).
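
To illustrate the kind of corruption described above, here is a minimal sketch. 
It is not Jetty's actual code: encodeNaively and sample are hypothetical helpers 
that only mimic the idea of encoding fixed-size char chunks independently, so 
that a surrogate pair straddling a 512-char boundary gets split.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class SurrogateSplitDemo {
    // Hypothetical stand-in for a buffering writer: encodes each fixed-size
    // chunk of chars on its own, without checking for surrogate pairs.
    static byte[] encodeNaively(String s, int chunkSize) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < s.length(); i += chunkSize) {
            int end = Math.min(s.length(), i + chunkSize);
            byte[] bytes = s.substring(i, end).getBytes(StandardCharsets.UTF_8);
            out.write(bytes, 0, bytes.length);
        }
        return out.toByteArray();
    }

    // 511 ASCII chars followed by U+1D11E, so the surrogate pair occupies
    // char indexes 511 and 512 and straddles a 512-char boundary.
    static String sample() {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 511; i++) {
            sb.append('a');
        }
        sb.appendCodePoint(0x1D11E);
        return sb.toString();
    }

    public static void main(String[] args) {
        String text = sample();
        byte[] correct = text.getBytes(StandardCharsets.UTF_8);
        byte[] broken = encodeNaively(text, 512);
        System.out.println("matches correct UTF-8: " + Arrays.equals(correct, broken));
    }
}
```

With a chunk size of 512 the two byte arrays differ, because each half of the 
surrogate pair is encoded alone and cannot form a valid UTF-8 sequence. With a 
chunk size larger than the whole string they are identical.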

In general, we are discussing whether to stop using the Readers and Writers supplied by the 
Servlet Container. As HTTP is a byte-based protocol, code should only use 
InputStreams and OutputStreams to communicate with the client. Writers and 
Readers are only provided for convenience with JSP engines.

The input part of Solr no longer uses Readers; it always passes 
InputStreams around. I uploaded a patch a week ago to do the same on the output 
side of Solr: SOLR-ServletOutputWriter.patch
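
As a rough sketch of the stream-based approach (this is not the content of the 
patch; StreamOutputSketch and encodeViaStream are hypothetical names, and a 
ByteArrayOutputStream stands in for the container's OutputStream), the 
application can wrap the byte stream in its own UTF-8 writer instead of asking 
the container for one:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

class StreamOutputSketch {
    // Writes the body through an application-owned UTF-8 writer on top of a
    // plain OutputStream. In a servlet, "out" would be the response's byte
    // stream (assumption); here any OutputStream works.
    static void writeUtf8(OutputStream out, String body) {
        try {
            Writer writer = new OutputStreamWriter(out, StandardCharsets.UTF_8);
            writer.write(body);
            writer.flush(); // push encoded bytes out; the caller owns the stream
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Convenience wrapper for the demo: returns the bytes that were written.
    static byte[] encodeViaStream(String body) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        writeUtf8(sink, body);
        return sink.toByteArray();
    }

    public static void main(String[] args) {
        // U+1D11E lies outside the BMP and is a surrogate pair in a Java String.
        String body = "<str>" + new String(Character.toChars(0x1D11E)) + "</str>";
        System.out.println(encodeViaStream(body).length + " bytes written");
    }
}
```

The point of the design is that the char-to-byte conversion happens in code the 
application controls, so the container only ever sees bytes.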

Please note: As JSP pages use Jetty's writers, analysis.jsp may still produce 
corrupt output.

Can you apply that patch to your Solr? Your problems should then disappear for 
all OutputHandler-generated content except the JSP pages in Solr. We are 
thinking about optimizing this internally, but the above patch already removes 
all use of the container-supplied Writers from Solr's output side. 
As far as I know, the patch is against trunk.

> The included jetty server does not support UTF-8
> ------------------------------------------------
>
>                 Key: SOLR-2381
>                 URL: https://issues.apache.org/jira/browse/SOLR-2381
>             Project: Solr
>          Issue Type: Bug
>            Reporter: Robert Muir
>            Assignee: Robert Muir
>            Priority: Blocker
>             Fix For: 3.1, 4.0
>
>         Attachments: SOLR-2381.patch, SOLR-2381_xmltest.patch, 
> SOLR-ServletOutputWriter.patch, jetty-6.1.26-patched-JETTY-1340.jar, 
> jetty-util-6.1.26-patched-JETTY-1340.jar, post_utf8enhanced.sh, 
> utf8enhanced.xml
>
>
> Some background here: 
> http://www.lucidimagination.com/search/document/6babe83bd4a98b64/which_unicode_version_is_supported_with_lucene
> Some possible solutions:
> * wait and see if we get resolution on 
> http://jira.codehaus.org/browse/JETTY-1340. To be honest, I am not even sure 
> where jetty is being maintained (there is a separate jetty project at 
> eclipse.org with another bugtracker, but the older releases are at codehaus).
> * include a patched version of jetty with correct utf-8, using that patch.
> * remove jetty and include a different container instead.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
