[ 
https://issues.apache.org/jira/browse/LUCENE-3327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13067144#comment-13067144
 ] 

Michael McCandless commented on LUCENE-3327:
--------------------------------------------

Hmm, I don't think this is quite right: in the BYTE1 case, these are the bytes 
from the term, and we shouldn't pretend they are Unicode code points (which is 
what UnicodeUtil.newString expects).

I.e., we really do need the inputMode to be passed to inputToString.
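
As a rough sketch of the idea (JDK-only, not the actual TestFSTs code; the class name, the BYTE1/BYTE4 constant values, and the plain int[] parameter are assumptions made for illustration), the mode decides whether the ints are decoded as UTF-8 bytes or taken as code points:

```java
import java.nio.charset.StandardCharsets;

final class InputToStringSketch {
    // Hypothetical mode constants; the real test may encode the mode differently.
    static final int BYTE1 = 0; // each int holds one byte of the term
    static final int BYTE4 = 1; // each int holds one Unicode code point

    static String inputToString(int inputMode, int[] term) {
        if (inputMode == BYTE1) {
            // The ints are bytes: narrow them and decode the sequence as UTF-8.
            byte[] bytes = new byte[term.length];
            for (int i = 0; i < term.length; i++) {
                bytes[i] = (byte) term[i];
            }
            return new String(bytes, StandardCharsets.UTF_8);
        } else {
            // The ints are code points: build the string directly from them.
            return new String(term, 0, term.length);
        }
    }
}
```

With this, the bytes 0xC3 0xA9 come out as a single "é" in BYTE1 mode, whereas treating the same two ints as code points would produce two unrelated characters.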

Really, this test pretends a term is always a UTF-8 byte sequence, which in 
general is not the case (terms are arbitrary byte[]). It's just that this test 
only ever operates on terms that are in fact UTF-8 byte sequences (I think?).
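
One way a verbose printout could cope with arbitrary byte[] terms is to decode strictly and fall back to a hex dump when the bytes are not valid UTF-8. This is only a sketch using the JDK's CharsetDecoder; the class and method names are hypothetical and nothing here is taken from the test or the patch:

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class TermToStringSketch {
    static String termToString(byte[] term) {
        try {
            // REPORT makes the decoder throw on malformed input instead of
            // silently substituting replacement characters.
            return StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(term))
                .toString();
        } catch (CharacterCodingException e) {
            // Not valid UTF-8: print the raw bytes in hex instead.
            StringBuilder sb = new StringBuilder("[");
            for (int i = 0; i < term.length; i++) {
                if (i > 0) {
                    sb.append(' ');
                }
                sb.append(Integer.toHexString(term[i] & 0xff));
            }
            return sb.append(']').toString();
        }
    }
}
```

A valid term prints as text, while something like {0xff, 0x01} prints as its bytes rather than throwing.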

Indeed, I'm also hitting an AIOOBE (ant test-core -Dtestcase=TestFSTs 
-Dtestmethod=testRandomWords 
-Dtests.seed=-3451527662631579719:-3355372777860187201):

{noformat}
There was 1 failure:
1) testRandomWords(org.apache.lucene.util.fst.TestFSTs)
java.lang.ArrayIndexOutOfBoundsException: 44
        at org.apache.lucene.util.UnicodeUtil.UTF8toUTF16(UnicodeUtil.java:586)
        at org.apache.lucene.util.BytesRef.utf8ToString(BytesRef.java:203)
        at org.apache.lucene.util.fst.TestFSTs.inputToString(TestFSTs.java:989)
        at org.apache.lucene.util.fst.TestFSTs.access$000(TestFSTs.java:53)
        at org.apache.lucene.util.fst.TestFSTs$FSTTester.verifyPruned(TestFSTs.java:833)
        at org.apache.lucene.util.fst.TestFSTs$FSTTester.doTest(TestFSTs.java:507)
        at org.apache.lucene.util.fst.TestFSTs$FSTTester.doTest(TestFSTs.java:366)
        at org.apache.lucene.util.fst.TestFSTs.doTest(TestFSTs.java:214)
        at org.apache.lucene.util.fst.TestFSTs.testRandomWords(TestFSTs.java:963)
        at org.apache.lucene.util.fst.TestFSTs.testRandomWords(TestFSTs.java:938)
{noformat}

Spooky, because this test supposedly creates random valid Unicode strings 
(_TestUtil.randomRealisticUnicodeString)... hmmm.

> TestFSTs.testRandomWords throws AIOBE when "verbose"=true
> ---------------------------------------------------------
>
>                 Key: LUCENE-3327
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3327
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: core/FSTs
>    Affects Versions: 4.0
>            Reporter: James Dyer
>            Priority: Trivial
>         Attachments: LUCENE-3327.patch
>
>
> Seems like invalid UTF-8 sometimes gets passed to BytesRef.utf8ToString() in 
> the verbose "println"s.

