[
https://issues.apache.org/jira/browse/SOLR-2381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13004546#comment-13004546
]
Uwe Schindler commented on SOLR-2381:
-------------------------------------
Hi Bernd,
we know where the problem in Jetty is: it buffers 512 chars without respecting
surrogates. When those buffered chars are then converted to UTF-8, the output
is broken at the buffer boundaries. This bug in Jetty may also affect JSON
output, but JSON is much more compact and may not easily hit this buffer issue,
as it does not use Strings to feed the writer (the broken method in Jetty is
the one handling Writer.write(String,...)).
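
To illustrate the class of bug described above, here is a minimal sketch in
plain Java. It is not Jetty's code: the class and method names are made up for
this example, and the chunk size is 2 instead of 512 to keep it short. It
encodes a string chunk by chunk, so a surrogate pair that straddles a chunk
boundary is encoded as two unpaired halves.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class SurrogateSplitDemo {
    // Hypothetical helper: encodes each fixed-size char chunk independently,
    // ignoring whether a chunk ends in the middle of a surrogate pair.
    static byte[] encodeInChunks(String text, int chunkSize) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < text.length(); i += chunkSize) {
            String chunk = text.substring(i, Math.min(text.length(), i + chunkSize));
            byte[] bytes = chunk.getBytes(StandardCharsets.UTF_8);
            out.write(bytes, 0, bytes.length);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // U+1D11E lies outside the BMP, so it is a surrogate pair in a Java String.
        String text = "a" + new String(Character.toChars(0x1D11E));
        byte[] good = text.getBytes(StandardCharsets.UTF_8);
        // Chunk size 2 puts the high surrogate in chunk one and the low in chunk two.
        byte[] bad = encodeInChunks(text, 2);
        System.out.println(Arrays.equals(good, bad)); // prints false
    }
}
```

With String.getBytes each lone surrogate is replaced by '?'; a hand-written
encoder may emit different but equally invalid bytes.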
In general, we are discussing no longer using the Readers and Writers supplied
by the Servlet Container. As HTTP is a byte-based protocol, code should only use
InputStreams and OutputStreams to communicate with the client. Writers and
Readers are only provided for convenience with JSP engines.
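
A minimal sketch of the stream-only idea, assuming nothing about the actual
patch: the application wraps the raw OutputStream in the JDK's own
OutputStreamWriter and so controls the UTF-8 encoding itself. The class and
method names are hypothetical, and a ByteArrayOutputStream stands in for the
servlet response stream so the example runs without a container.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Utf8StreamWriterDemo {
    // Hypothetical helper: build our own UTF-8 writer on top of a byte stream
    // instead of asking the container for a Writer.
    static Writer utf8Writer(OutputStream out) {
        return new OutputStreamWriter(out, StandardCharsets.UTF_8);
    }

    // Writes two strings in separate write() calls and returns the bytes produced.
    static byte[] writeInTwoParts(String first, String second) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (Writer w = utf8Writer(sink)) {
            w.write(first);
            w.write(second);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toByteArray();
    }

    public static void main(String[] args) {
        String text = "a" + new String(Character.toChars(0x1D11E));
        byte[] expected = text.getBytes(StandardCharsets.UTF_8);
        // Split between the high and the low surrogate on purpose.
        byte[] actual = writeInTwoParts(text.substring(0, 2), text.substring(2));
        System.out.println(Arrays.equals(expected, actual));
    }
}
```

The JDK encoder behind OutputStreamWriter holds back a trailing high surrogate
until the next write, so the split between the two calls does not corrupt the
output.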
The input part of Solr no longer uses Readers; it always passes
InputStreams around. I uploaded a patch a week ago to do the same on the output
side of Solr: SOLR-ServletOutputWriter.patch
Please note: As JSP pages use Jetty's writers, analysis.jsp may still produce
corrupt output.
Can you patch your Solr with that one? Your problems should then disappear for
all OutputHandler-generated content except JSP pages in Solr. We are thinking
about optimizing this internally, but the above patch removes all use of the
container's Writers from Solr. The patch is against trunk as far as I know.
> The included jetty server does not support UTF-8
> ------------------------------------------------
>
> Key: SOLR-2381
> URL: https://issues.apache.org/jira/browse/SOLR-2381
> Project: Solr
> Issue Type: Bug
> Reporter: Robert Muir
> Assignee: Robert Muir
> Priority: Blocker
> Fix For: 3.1, 4.0
>
> Attachments: SOLR-2381.patch, SOLR-2381_xmltest.patch,
> SOLR-ServletOutputWriter.patch, jetty-6.1.26-patched-JETTY-1340.jar,
> jetty-util-6.1.26-patched-JETTY-1340.jar, post_utf8enhanced.sh,
> utf8enhanced.xml
>
>
> Some background here:
> http://www.lucidimagination.com/search/document/6babe83bd4a98b64/which_unicode_version_is_supported_with_lucene
> Some possible solutions:
> * wait and see if we get resolution on
> http://jira.codehaus.org/browse/JETTY-1340. To be honest, I am not even sure
> where jetty is being maintained (there is a separate jetty project at
> eclipse.org with another bugtracker, but the older releases are at codehaus).
> * include a patched version of jetty with correct utf-8, using that patch.
> * remove jetty and include a different container instead.