[ https://issues.apache.org/jira/browse/LUCENE-4583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13657491#comment-13657491 ]
David Smiley commented on LUCENE-4583:
--------------------------------------

Aha; thanks for the clarification. I see it now. And I see that after I commented out the limit check, the assertion was hit. I didn't hit this assertion with Barakat's patch when I last ran it; weird, but whatever.

BTW, ByteBlockPool doesn't really have this limit, notwithstanding the bug that Barakat fixed in his patch. It's not a hard limit because BBP.append() and readBytes() will conveniently loop for you, whereas code that uses PagedBytes could loop on fillSlice() itself to support big values. It is a bona fide bug in ByteBlockPool that it didn't implement that loop correctly, and it should be fixed, if not in this issue then in another.

So a DocValues codec that supports large binary values could be nearly identical to the current codec but call fillSlice() in a loop, and only for variable-sized binary values (just like BBP's algorithm); that would basically be the only change. Do you support such a change? If not, why not (a technical reason, please)? If you can't support such a change, would you also object to the addition of a new codec that simply lifted this limit, as I proposed? Note that it would potentially include a bunch of duplicated code just to call fillSlice() in a loop; I propose it would be simpler and more maintainable to not limit binary docvalues to 32k.

> StraightBytesDocValuesField fails if bytes > 32k
> ------------------------------------------------
>
>                 Key: LUCENE-4583
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4583
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: core/index
>    Affects Versions: 4.0, 4.1, 5.0
>            Reporter: David Smiley
>            Priority: Critical
>             Fix For: 4.4
>
>         Attachments: LUCENE-4583.patch, LUCENE-4583.patch, LUCENE-4583.patch, LUCENE-4583.patch, LUCENE-4583.patch
>
>
> I didn't observe any limitations on the size of a bytes based DocValues field value in the docs.
> It appears that the limit is 32k, although I didn't get any friendly error telling me that was the limit. 32k is kind of small IMO; I suspect this limit is unintended and as such is a bug. The following test fails:
>
> {code:java}
> public void testBigDocValue() throws IOException {
>   Directory dir = newDirectory();
>   IndexWriter writer = new IndexWriter(dir, writerConfig(false));
>   Document doc = new Document();
>   BytesRef bytes = new BytesRef((4 + 4) * 4097); // 4096 works
>   bytes.length = bytes.bytes.length; // byte data doesn't matter
>   doc.add(new StraightBytesDocValuesField("dvField", bytes));
>   writer.addDocument(doc);
>   writer.commit();
>   writer.close();
>
>   DirectoryReader reader = DirectoryReader.open(dir);
>   DocValues docValues = MultiDocValues.getDocValues(reader, "dvField");
>   // FAILS IF BYTES IS BIG!
>   docValues.getSource().getBytes(0, bytes);
>   reader.close();
>   dir.close();
> }
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
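As an addendum to the comment above: the technique it describes (values larger than one block are written and read by looping across fixed-size blocks, the way BBP.append()/readBytes() do, or by calling fillSlice() repeatedly) can be illustrated with a standalone sketch. The class, method names, and tiny block size below are illustrative only; this is not Lucene's actual ByteBlockPool or PagedBytes API.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Standalone sketch of the block-looping idea: a pool of fixed-size
 * blocks where append and read both loop across block boundaries, so
 * no single value is limited to one block's size. Illustrative names,
 * not Lucene's real classes.
 */
public class BlockPoolSketch {
    static final int BLOCK_SIZE = 8; // deliberately tiny so the loop is exercised

    private final List<byte[]> blocks = new ArrayList<>();
    private long size = 0; // total bytes written so far

    /** Appends len bytes from src, spilling into new blocks as needed; returns the start offset. */
    public long append(byte[] src, int off, int len) {
        long start = size;
        while (len > 0) {
            int blockIndex = (int) (size / BLOCK_SIZE);
            int blockOffset = (int) (size % BLOCK_SIZE);
            if (blockIndex == blocks.size()) {
                blocks.add(new byte[BLOCK_SIZE]); // allocate the next block lazily
            }
            // Copy only as much as fits in the current block, then loop.
            int chunk = Math.min(len, BLOCK_SIZE - blockOffset);
            System.arraycopy(src, off, blocks.get(blockIndex), blockOffset, chunk);
            off += chunk;
            len -= chunk;
            size += chunk;
        }
        return start;
    }

    /** Reads len bytes starting at offset, looping across block boundaries. */
    public byte[] readBytes(long offset, int len) {
        byte[] out = new byte[len];
        int copied = 0;
        while (copied < len) {
            int blockIndex = (int) ((offset + copied) / BLOCK_SIZE);
            int blockOffset = (int) ((offset + copied) % BLOCK_SIZE);
            // Each iteration is the analogue of one fillSlice() call.
            int chunk = Math.min(len - copied, BLOCK_SIZE - blockOffset);
            System.arraycopy(blocks.get(blockIndex), blockOffset, out, copied, chunk);
            copied += chunk;
        }
        return out;
    }

    public static void main(String[] args) {
        BlockPoolSketch pool = new BlockPoolSketch();
        byte[] big = new byte[20]; // larger than BLOCK_SIZE: spans three blocks
        for (int i = 0; i < big.length; i++) big[i] = (byte) i;
        long start = pool.append(big, 0, big.length);
        byte[] back = pool.readBytes(start, big.length);
        System.out.println(java.util.Arrays.equals(big, back)); // prints "true"
    }
}
```

The point of the sketch is that the reading side needs only this small loop; a codec that reads variable-sized binary values the same way would not need a per-value size cap.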