[ https://issues.apache.org/jira/browse/SOLR-2381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Robert Muir updated SOLR-2381:
------------------------------
Attachment: SOLR-2381_xmltest.patch
Attached is a unit test. If you disable 'case 4', so that it only generates code points that take 1, 2, or 3 bytes in UTF-8, the test always passes.
Additionally, it only fails with the XML response format (the default binary format is fine). The test chooses a different response format across iterations.
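The attached test is not reproduced here. To make 'case 4' concrete, here is a rough Java sketch that picks random code points by the length of their UTF-8 encoding. The class `RandomUnicode` and its methods are hypothetical and not taken from SOLR-2381_xmltest.patch. Only case 4 yields supplementary characters (U+10000 and above), which a Java `String` stores as two `char`s (a surrogate pair); capping `maxUtf8Bytes` at 3 corresponds to disabling case 4.

```java
import java.nio.charset.StandardCharsets;
import java.util.Random;

public class RandomUnicode {

    // Hypothetical helper: return a random code point whose UTF-8 encoding
    // is exactly utf8Bytes long. Case 4 is the supplementary range.
    static int randomCodePoint(Random r, int utf8Bytes) {
        switch (utf8Bytes) {
            case 1:
                return 0x20 + r.nextInt(0x7F - 0x20);           // printable ASCII U+0020..U+007E
            case 2:
                return 0x80 + r.nextInt(0x800 - 0x80);          // U+0080..U+07FF
            case 3: {
                int cp;
                do {
                    cp = 0x800 + r.nextInt(0xFFFE - 0x800);     // U+0800..U+FFFD
                } while (cp >= 0xD800 && cp <= 0xDFFF);         // skip the surrogate block
                return cp;
            }
            case 4:
                return 0x10000 + r.nextInt(0x110000 - 0x10000); // U+10000..U+10FFFF
            default:
                throw new IllegalArgumentException("utf8Bytes must be 1..4");
        }
    }

    // Build a string of `length` code points, choosing the UTF-8 length of each
    // one uniformly from 1..maxUtf8Bytes (3 means "case 4 disabled").
    static String randomString(Random r, int length, int maxUtf8Bytes) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < length; i++) {
            sb.appendCodePoint(randomCodePoint(r, 1 + r.nextInt(maxUtf8Bytes)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Random r = new Random();
        String s = randomString(r, 10, 4);
        System.out.println(s.codePointCount(0, s.length()) + " code points, "
                + s.length() + " UTF-16 chars, "
                + s.getBytes(StandardCharsets.UTF_8).length + " UTF-8 bytes");
    }
}
```

With case 4 enabled, the UTF-16 length of the string usually exceeds its code point count, which is exactly the situation a writer that handles one `char` at a time can get wrong.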
{noformat}
junit-sequential:
[junit] Testsuite: org.apache.solr.client.solrj.embedded.SolrExampleJettyTest
[junit] Tests run: 1, Failures: 0, Errors: 1, Time elapsed: 3.829 sec
[junit]
[junit] ------------- Standard Error -----------------
[junit] NOTE: reproduce with: ant test -Dtestcase=SolrExampleJettyTest -Dtestmethod=testUnicode -Dtests.seed=-8507816048970822444:1424998400651628841
[junit] WARNING: test class left thread running: Thread[MultiThreadedHttpConnectionManager cleanup,5,main]
[junit] RESOURCE LEAK: test class left 1 thread(s) running
[junit] NOTE: test params are: codec=PreFlex, locale=es_GT, timezone=Asia/Hovd
[junit] NOTE: all tests run in this JVM:
[junit] [SolrExampleJettyTest]
[junit] NOTE: Windows Vista 6.0 x86/Sun Microsystems Inc. 1.6.0_23 (32-bit)/cpus=4,threads=2,free=9760576,total=16252928
[junit] ------------- ---------------- ---------------
[junit] Testcase: testUnicode(org.apache.solr.client.solrj.embedded.SolrExampleJettyTest): Caused an ERROR
[junit] Error executing query
[junit] org.apache.solr.client.solrj.SolrServerException: Error executing query
[junit] at org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:95)
[junit] at org.apache.solr.client.solrj.SolrServer.query(SolrServer.java:119)
[junit] at org.apache.solr.client.solrj.SolrExampleTests.testUnicode(SolrExampleTests.java:290)
[junit] at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1213)
[junit] at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1145)
[junit] Caused by: org.apache.solr.common.SolrException: parsing error
[junit] at org.apache.solr.client.solrj.impl.XMLResponseParser.processResponse(XMLResponseParser.java:145)
[junit] at org.apache.solr.client.solrj.impl.XMLResponseParser.processResponse(XMLResponseParser.java:106)
[junit] at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:478)
[junit] at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:245)
[junit] at org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:89)
[junit] Caused by: com.ctc.wstx.exc.WstxIOException: Invalid UTF-8 character 0xdf05(a surrogate character) at char #2475, byte #127)
[junit] at com.ctc.wstx.sr.StreamScanner.throwFromIOE(StreamScanner.java:708)
[junit] at com.ctc.wstx.sr.BasicStreamReader.next(BasicStreamReader.java:1086)
[junit] at org.apache.solr.client.solrj.impl.XMLResponseParser.readNamedList(XMLResponseParser.java:218)
[junit] at org.apache.solr.client.solrj.impl.XMLResponseParser.readNamedList(XMLResponseParser.java:244)
[junit] at org.apache.solr.client.solrj.impl.XMLResponseParser.processResponse(XMLResponseParser.java:130)
[junit] Caused by: java.io.CharConversionException: Invalid UTF-8 character 0xdf05(a surrogate character) at char #2475, byte #127)
[junit] at com.ctc.wstx.io.UTF8Reader.reportInvalid(UTF8Reader.java:335)
[junit] at com.ctc.wstx.io.UTF8Reader.read(UTF8Reader.java:247)
[junit] at com.ctc.wstx.io.MergedReader.read(MergedReader.java:101)
[junit] at com.ctc.wstx.io.ReaderSource.readInto(ReaderSource.java:84)
[junit] at com.ctc.wstx.io.BranchingReaderSource.readInto(BranchingReaderSource.java:57)
[junit] at com.ctc.wstx.sr.StreamScanner.loadMore(StreamScanner.java:992)
[junit] at com.ctc.wstx.sr.StreamScanner.getNext(StreamScanner.java:763)
[junit] at com.ctc.wstx.sr.BasicStreamReader.nextFromTree(BasicStreamReader.java:2721)
[junit] at com.ctc.wstx.sr.BasicStreamReader.next(BasicStreamReader.java:1019)
[junit]
[junit]
{noformat}
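The "surrogate character" in the error suggests that the server emitted a UTF-16 surrogate as its own 3-byte sequence instead of encoding the whole surrogate pair as one 4-byte UTF-8 sequence. The Java sketch below shows the difference. It rests on an assumption: the container's writer handles one `char` at a time. The report itself only says the bundled Jetty does not support UTF-8 correctly. `encodePerChar` is a hypothetical stand-in for such a writer. U+1D305 is an arbitrary example: its low surrogate happens to be 0xDF05, the value in the log, but the test's actual input is unknown.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.StandardCharsets;

public class SurrogateEncodingDemo {

    // Hypothetical stand-in for a broken writer: encodes each UTF-16 char on its
    // own, so a surrogate pair becomes two 3-byte sequences (CESU-8 style).
    static byte[] encodePerChar(String s) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < 0x80) {
                out.write(c);
            } else if (c < 0x800) {
                out.write(0xC0 | (c >> 6));
                out.write(0x80 | (c & 0x3F));
            } else {
                // Surrogates (U+D800..U+DFFF) also land here; the result
                // is not well-formed UTF-8.
                out.write(0xE0 | (c >> 12));
                out.write(0x80 | ((c >> 6) & 0x3F));
                out.write(0x80 | (c & 0x3F));
            }
        }
        return out.toByteArray();
    }

    // Strict check: a fresh CharsetDecoder reports malformed input by default.
    static boolean isWellFormedUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder().decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // U+1D305 is the surrogate pair \uD834\uDF05 in UTF-16.
        String s = new String(Character.toChars(0x1D305));
        byte[] correct = s.getBytes(StandardCharsets.UTF_8); // F0 9D 8C 85
        byte[] broken = encodePerChar(s);                     // ED A0 B4 ED BC 85
        System.out.println("correct: " + correct.length + " bytes, well-formed=" + isWellFormedUtf8(correct));
        System.out.println("broken:  " + broken.length + " bytes, well-formed=" + isWellFormedUtf8(broken));
    }
}
```

The JDK's own encoder produces 4 well-formed bytes. The per-char encoding produces 6 bytes that a strict decoder rejects, the same class of input that the XML parser refuses with a "surrogate character" error. The binary response format does not go through an XML parser, which may explain why it is unaffected.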
> The included jetty server does not support UTF-8
> ------------------------------------------------
>
> Key: SOLR-2381
> URL: https://issues.apache.org/jira/browse/SOLR-2381
> Project: Solr
> Issue Type: Bug
> Reporter: Robert Muir
> Assignee: Robert Muir
> Priority: Blocker
> Fix For: 3.1, 4.0
>
> Attachments: SOLR-2381.patch, SOLR-2381_xmltest.patch,
> SOLR-ServletOutputWriter.patch, jetty-6.1.26-patched-JETTY-1340.jar,
> jetty-util-6.1.26-patched-JETTY-1340.jar
>
>
> Some background here:
> http://www.lucidimagination.com/search/document/6babe83bd4a98b64/which_unicode_version_is_supported_with_lucene
> Some possible solutions:
> * Wait and see whether http://jira.codehaus.org/browse/JETTY-1340 gets resolved. To be honest, I am not even sure where Jetty is being maintained (there is a separate Jetty project at eclipse.org with another bug tracker, but the older releases are at Codehaus).
> * Include a patched version of Jetty with correct UTF-8 handling, built with the patch from that issue.
> * Remove Jetty and include a different container instead.