[ https://issues.apache.org/jira/browse/SOLR-2381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13004546#comment-13004546 ]
Uwe Schindler commented on SOLR-2381:
-------------------------------------

Hi Bernd,

we know where the problem in Jetty is: it buffers 512 chars without respecting surrogate pairs. When those buffered chars are then converted to UTF-8, the output is broken at the buffer boundaries. This bug in Jetty may also affect JSON output, but JSON is much more compact and may not hit the buffer issue as easily, because it does not feed Strings to the writer (the broken method in Jetty is the one handling Writer.write(String,...)).

In general, we are discussing no longer using the Readers and Writers supplied by the servlet container. As HTTP is a byte-based protocol, code should only use InputStreams and OutputStreams to communicate with the client. Writers and Readers are only provided for convenience with JSP engines. The input side of Solr no longer uses Readers; it always passes InputStreams around. I uploaded a patch a week ago to do the same on the output side of Solr: SOLR-ServletOutputWriter.patch

Please note: as JSP pages use Jetty's writers, analysis.jsp may still produce corrupt output.

Can you patch your Solr with that one? Your problems should then disappear for all OutputHandler-generated content except the JSP pages in Solr. We are thinking about optimizing this internally, but the above patch removes all use of the servlet container's Writers from Solr's output side. The patch is against trunk, as far as I know.
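To make the failure mode concrete, here is a minimal Java sketch. It is not Jetty's code and not the contents of SOLR-ServletOutputWriter.patch. The class and method names (`SurrogateChunkDemo`, `encodeInChunks`, `encodeViaStream`) are hypothetical, and the chunked encoder is only an assumed stand-in for "buffer 512 chars, then convert each buffer to UTF-8 on its own". The second method shows the alternative discussed above: keep the byte stream and do the UTF-8 encoding in our own code.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class SurrogateChunkDemo {

    // Hypothetical stand-in for the faulty behaviour: each fixed-size chunk of
    // chars is encoded independently, so a surrogate pair that straddles a
    // chunk boundary is seen as two lone (malformed) surrogates.
    static byte[] encodeInChunks(String s, int chunkSize) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < s.length(); i += chunkSize) {
            String chunk = s.substring(i, Math.min(s.length(), i + chunkSize));
            byte[] bytes = chunk.getBytes(StandardCharsets.UTF_8);
            out.write(bytes, 0, bytes.length);
        }
        return out.toByteArray();
    }

    // Byte-oriented alternative: wrap the OutputStream ourselves and let a
    // single UTF-8 encoder see the whole character sequence.
    static byte[] encodeViaStream(String s) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Writer w = new OutputStreamWriter(out, StandardCharsets.UTF_8)) {
            w.write(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 511; i++) {
            sb.append('a');
        }
        // U+1D11E needs two chars in UTF-16; they land at indexes 511 and 512,
        // i.e. on either side of a 512-char boundary.
        sb.appendCodePoint(0x1D11E);
        String text = sb.toString();

        byte[] expected = text.getBytes(StandardCharsets.UTF_8);
        System.out.println(Arrays.equals(expected, encodeInChunks(text, 512)));
        System.out.println(Arrays.equals(expected, encodeViaStream(text)));
    }
}
```

In this sketch only the stream-based variant reproduces the expected UTF-8 bytes. The chunked variant damages the character at the boundary, which matches the symptom described in this issue, although the exact bytes a real container emits may differ.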
> The included jetty server does not support UTF-8
> ------------------------------------------------
>
> Key: SOLR-2381
> URL: https://issues.apache.org/jira/browse/SOLR-2381
> Project: Solr
> Issue Type: Bug
> Reporter: Robert Muir
> Assignee: Robert Muir
> Priority: Blocker
> Fix For: 3.1, 4.0
>
> Attachments: SOLR-2381.patch, SOLR-2381_xmltest.patch, SOLR-ServletOutputWriter.patch, jetty-6.1.26-patched-JETTY-1340.jar, jetty-util-6.1.26-patched-JETTY-1340.jar, post_utf8enhanced.sh, utf8enhanced.xml
>
>
> Some background here:
> http://www.lucidimagination.com/search/document/6babe83bd4a98b64/which_unicode_version_is_supported_with_lucene
> Some possible solutions:
> * wait and see if we get resolution on http://jira.codehaus.org/browse/JETTY-1340. To be honest, I am not even sure where jetty is being maintained (there is a separate jetty project at eclipse.org with another bugtracker, but the older releases are at codehaus).
> * include a patched version of jetty with correct utf-8, using that patch.
> * remove jetty and include a different container instead.