[ https://issues.apache.org/jira/browse/SOLR-2381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Robert Muir updated SOLR-2381:
------------------------------
    Attachment: SOLR-2381_xmltest.patch

Attached is a unit test. If you disable 'case 4' so that the test only uses code points that encode to 1, 2, or 3 bytes in UTF-8, it always passes. Additionally, it only fails with the XML response format (the default binary format is fine). The test chooses a different format for each iteration.

{noformat}
junit-sequential:
    [junit] Testsuite: org.apache.solr.client.solrj.embedded.SolrExampleJettyTest
    [junit] Tests run: 1, Failures: 0, Errors: 1, Time elapsed: 3.829 sec
    [junit]
    [junit] ------------- Standard Error -----------------
    [junit] NOTE: reproduce with: ant test -Dtestcase=SolrExampleJettyTest -Dtestmethod=testUnicode -Dtests.seed=-8507816048970822444:1424998400651628841
    [junit] WARNING: test class left thread running: Thread[MultiThreadedHttpConnectionManager cleanup,5,main]
    [junit] RESOURCE LEAK: test class left 1 thread(s) running
    [junit] NOTE: test params are: codec=PreFlex, locale=es_GT, timezone=Asia/Hovd
    [junit] NOTE: all tests run in this JVM:
    [junit] [SolrExampleJettyTest]
    [junit] NOTE: Windows Vista 6.0 x86/Sun Microsystems Inc. 1.6.0_23 (32-bit)/cpus=4,threads=2,free=9760576,total=16252928
    [junit] ------------- ---------------- ---------------
    [junit] Testcase: testUnicode(org.apache.solr.client.solrj.embedded.SolrExampleJettyTest): Caused an ERROR
    [junit] Error executing query
    [junit] org.apache.solr.client.solrj.SolrServerException: Error executing query
    [junit]     at org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:95)
    [junit]     at org.apache.solr.client.solrj.SolrServer.query(SolrServer.java:119)
    [junit]     at org.apache.solr.client.solrj.SolrExampleTests.testUnicode(SolrExampleTests.java:290)
    [junit]     at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1213)
    [junit]     at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1145)
    [junit] Caused by: org.apache.solr.common.SolrException: parsing error
    [junit]     at org.apache.solr.client.solrj.impl.XMLResponseParser.processResponse(XMLResponseParser.java:145)
    [junit]     at org.apache.solr.client.solrj.impl.XMLResponseParser.processResponse(XMLResponseParser.java:106)
    [junit]     at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:478)
    [junit]     at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:245)
    [junit]     at org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:89)
    [junit] Caused by: com.ctc.wstx.exc.WstxIOException: Invalid UTF-8 character 0xdf05(a surrogate character) at char #2475, byte #127)
    [junit]     at com.ctc.wstx.sr.StreamScanner.throwFromIOE(StreamScanner.java:708)
    [junit]     at com.ctc.wstx.sr.BasicStreamReader.next(BasicStreamReader.java:1086)
    [junit]     at org.apache.solr.client.solrj.impl.XMLResponseParser.readNamedList(XMLResponseParser.java:218)
    [junit]     at org.apache.solr.client.solrj.impl.XMLResponseParser.readNamedList(XMLResponseParser.java:244)
    [junit]     at org.apache.solr.client.solrj.impl.XMLResponseParser.processResponse(XMLResponseParser.java:130)
    [junit] Caused by: java.io.CharConversionException: Invalid UTF-8 character 0xdf05(a surrogate character) at char #2475, byte #127)
    [junit]     at com.ctc.wstx.io.UTF8Reader.reportInvalid(UTF8Reader.java:335)
    [junit]     at com.ctc.wstx.io.UTF8Reader.read(UTF8Reader.java:247)
    [junit]     at com.ctc.wstx.io.MergedReader.read(MergedReader.java:101)
    [junit]     at com.ctc.wstx.io.ReaderSource.readInto(ReaderSource.java:84)
    [junit]     at com.ctc.wstx.io.BranchingReaderSource.readInto(BranchingReaderSource.java:57)
    [junit]     at com.ctc.wstx.sr.StreamScanner.loadMore(StreamScanner.java:992)
    [junit]     at com.ctc.wstx.sr.StreamScanner.getNext(StreamScanner.java:763)
    [junit]     at com.ctc.wstx.sr.BasicStreamReader.nextFromTree(BasicStreamReader.java:2721)
    [junit]     at com.ctc.wstx.sr.BasicStreamReader.next(BasicStreamReader.java:1019)
    [junit]
    [junit]
{noformat}

> The included jetty server does not support UTF-8
> ------------------------------------------------
>
>                 Key: SOLR-2381
>                 URL: https://issues.apache.org/jira/browse/SOLR-2381
>             Project: Solr
>          Issue Type: Bug
>            Reporter: Robert Muir
>            Assignee: Robert Muir
>            Priority: Blocker
>             Fix For: 3.1, 4.0
>
>         Attachments: SOLR-2381.patch, SOLR-2381_xmltest.patch,
>                      SOLR-ServletOutputWriter.patch,
>                      jetty-6.1.26-patched-JETTY-1340.jar,
>                      jetty-util-6.1.26-patched-JETTY-1340.jar
>
>
> Some background here:
> http://www.lucidimagination.com/search/document/6babe83bd4a98b64/which_unicode_version_is_supported_with_lucene
> Some possible solutions:
> * wait and see if we get resolution on http://jira.codehaus.org/browse/JETTY-1340.
>   To be honest, I am not even sure where jetty is being maintained (there is a
>   separate jetty project at eclipse.org with another bugtracker, but the older
>   releases are at codehaus).
> * include a patched version of jetty with correct UTF-8, using that patch.
> * remove jetty and include a different container instead.
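
For readers unfamiliar with why only 4-byte code points trigger the failure, here is a minimal, self-contained sketch. It is not the attached test and not Jetty's code: the class name `SupplementaryUtf8Demo`, its methods, and the choice of U+10305 (picked only because its low surrogate happens to be 0xDF05, the value in the trace above) are all assumptions for illustration. It shows that a strict decoder rejects the output of a writer that encodes each UTF-16 surrogate separately, which is one plausible way to end up with an "Invalid UTF-8 character ... (a surrogate character)" error.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of Solr, SolrJ, or Jetty.
final class SupplementaryUtf8Demo {

    /** Correct UTF-8: a supplementary code point becomes one 4-byte sequence. */
    static byte[] encodeCorrect(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    /**
     * Assumed buggy behaviour: every UTF-16 char is encoded on its own, so each
     * half of a surrogate pair is written as a separate 3-byte sequence.
     */
    static byte[] encodePerChar(String s) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < 0x80) {
                out.write(c);
            } else if (c < 0x800) {
                out.write(0xC0 | (c >> 6));
                out.write(0x80 | (c & 0x3F));
            } else {
                out.write(0xE0 | (c >> 12));
                out.write(0x80 | ((c >> 6) & 0x3F));
                out.write(0x80 | (c & 0x3F));
            }
        }
        return out.toByteArray();
    }

    /** Returns true only if the bytes decode as UTF-8 with errors reported, not replaced. */
    static boolean isStrictlyValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // U+10305 is an arbitrary supplementary code point (surrogates D800 DF05).
        String s = new String(Character.toChars(0x10305));
        byte[] good = encodeCorrect(s);
        byte[] bad = encodePerChar(s);
        System.out.println("correct:  " + good.length + " bytes, valid=" + isStrictlyValidUtf8(good));
        System.out.println("per-char: " + bad.length + " bytes, valid=" + isStrictlyValidUtf8(bad));
    }
}
```

Code points that fit in 1, 2, or 3 UTF-8 bytes are a single Java `char`, so both encoders agree on them; only supplementary code points expose the difference, which matches the observation that disabling 'case 4' makes the test pass.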