[ 
https://issues.apache.org/jira/browse/LUCENE-4583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13657491#comment-13657491
 ] 

David Smiley commented on LUCENE-4583:
--------------------------------------

Aha; thanks for the clarification.  I see it now.  And I see that after I 
commented out the limit check, the assertion was hit.  I didn't hit this 
assertion with Barakat's patch when I last ran it; odd, but no matter.

BTW, ByteBlockPool doesn't really have this limit, notwithstanding the bug that 
Barakat fixed in his patch. It's not a hard limit: BBP.append() and 
readBytes() will conveniently loop over blocks for you, whereas code that uses 
PagedBytes could loop on fillSlice() itself to support big values.  It is a 
bona-fide bug in ByteBlockPool that it didn't implement that loop correctly, 
and it should be fixed, if not in this issue then in another.
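To make the fillSlice()-in-a-loop idea concrete, here is a toy, self-contained sketch (hypothetical class and method names, not the real PagedBytes/ByteBlockPool API): a single slice-style read can only serve bytes up to a block boundary, so a small loop stitches chunks together for values larger than one block.

```java
import java.util.Arrays;

// Toy model of paged byte storage. Hypothetical names, NOT the real Lucene
// API; it only illustrates the loop being proposed: a single "fillSlice"-style
// read stops at a block boundary, so reading a value larger than one block
// means assembling block-sized chunks in a loop.
public class PagedReadSketch {
    static final int BLOCK_SIZE = 8; // tiny for illustration; real blocks are e.g. 32k

    private final byte[][] blocks;

    PagedReadSketch(byte[] data) {
        int nBlocks = (data.length + BLOCK_SIZE - 1) / BLOCK_SIZE;
        blocks = new byte[nBlocks][];
        for (int i = 0; i < nBlocks; i++) {
            int start = i * BLOCK_SIZE;
            blocks[i] = Arrays.copyOfRange(data, start,
                    Math.min(start + BLOCK_SIZE, data.length));
        }
    }

    // Analogue of a single-block slice read: copies bytes from ONE block only
    // and returns how many bytes it served before hitting the block boundary.
    int fillSlice(byte[] dest, int destOff, long start, int length) {
        int block = (int) (start / BLOCK_SIZE);
        int offsetInBlock = (int) (start % BLOCK_SIZE);
        int chunk = Math.min(length, BLOCK_SIZE - offsetInBlock);
        System.arraycopy(blocks[block], offsetInBlock, dest, destOff, chunk);
        return chunk;
    }

    // The proposed loop: keep calling the single-block read until the whole
    // (possibly multi-block) value has been assembled; no per-value size cap.
    byte[] readBytes(long start, int length) {
        byte[] out = new byte[length];
        int copied = 0;
        while (copied < length) {
            copied += fillSlice(out, copied, start + copied, length - copied);
        }
        return out;
    }
}
```

With BLOCK_SIZE of 8, readBytes(5, 20) spans four blocks; the loop makes one fillSlice() call per block touched, which is exactly the shape a codec-side fix would take.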

So a DocValues codec that supports large binary values could be nearly 
identical to the current codec, except that it would call fillSlice() in a 
loop, and only for variable-sized binary values (just like BBP's algorithm); 
that would basically be the only change. Do you support such a change? If not, 
why not (a technical reason, please)?  If you can't support such a change, 
would you also object to the addition of a new codec that simply lifts this 
limit, as I proposed?  Note that it would potentially include a bunch of 
duplicated code just to call fillSlice() in a loop; I maintain it would be 
simpler and more maintainable not to limit binary docvalues to 32k.
                
> StraightBytesDocValuesField fails if bytes > 32k
> ------------------------------------------------
>
>                 Key: LUCENE-4583
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4583
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: core/index
>    Affects Versions: 4.0, 4.1, 5.0
>            Reporter: David Smiley
>            Priority: Critical
>             Fix For: 4.4
>
>         Attachments: LUCENE-4583.patch, LUCENE-4583.patch, LUCENE-4583.patch, 
> LUCENE-4583.patch, LUCENE-4583.patch
>
>
> I didn't observe any limitations on the size of a bytes-based DocValues field 
> value in the docs.  It appears that the limit is 32k, although I didn't get 
> any friendly error telling me that was the limit.  32k is kind of small IMO; 
> I suspect this limit is unintended and as such is a bug.  The following 
> test fails:
> {code:java}
>   public void testBigDocValue() throws IOException {
>     Directory dir = newDirectory();
>     IndexWriter writer = new IndexWriter(dir, writerConfig(false));
>     Document doc = new Document();
>     BytesRef bytes = new BytesRef((4+4)*4097);//4096 works
>     bytes.length = bytes.bytes.length;//byte data doesn't matter
>     doc.add(new StraightBytesDocValuesField("dvField", bytes));
>     writer.addDocument(doc);
>     writer.commit();
>     writer.close();
>     DirectoryReader reader = DirectoryReader.open(dir);
>     DocValues docValues = MultiDocValues.getDocValues(reader, "dvField");
>     //FAILS IF BYTES IS BIG!
>     docValues.getSource().getBytes(0, bytes);
>     reader.close();
>     dir.close();
>   }
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
