[
https://issues.apache.org/jira/browse/LUCENE-4583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13509044#comment-13509044
]
Barakat Barakat edited comment on LUCENE-4583 at 12/3/12 9:54 PM:
------------------------------------------------------------------
The limitation comes from PagedBytes. When a PagedBytes is created, it is given a
number of bits to use per block, and blockSize is set to (1 << blockBits). From
what I've seen, classes that use PagedBytes usually pass in 15 as blockBits,
which leads to the 32768-byte limit.
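To make the arithmetic concrete, here is a minimal sketch of how a block size and a position inside a block follow from blockBits. The class and method names (BlockMath, blockSize, blockIndex, offsetInBlock) are hypothetical and are not part of Lucene; only the (1 << blockBits) relationship comes from the description above.

```java
// Hypothetical helper, not Lucene code: illustrates the block arithmetic only.
final class BlockMath {
    private BlockMath() {}

    /** Size in bytes of one block for the given number of bits. */
    static int blockSize(int blockBits) {
        return 1 << blockBits;
    }

    /** Index of the block that contains the absolute byte offset. */
    static int blockIndex(long offset, int blockBits) {
        return (int) (offset >> blockBits);
    }

    /** Position of the absolute byte offset inside its block. */
    static int offsetInBlock(long offset, int blockBits) {
        return (int) (offset & (blockSize(blockBits) - 1));
    }

    public static void main(String[] args) {
        int blockBits = 15;
        System.out.println(blockSize(blockBits));            // 32768
        System.out.println(blockIndex(32776L, blockBits));   // 1
        System.out.println(offsetInBlock(32776L, blockBits)); // 8
    }
}
```

With blockBits = 15, the value in the failing test from the issue, (4+4)*4097 = 32776 bytes, is 8 bytes larger than one block, while (4+4)*4096 = 32768 bytes fits exactly.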
The fillSlice function of PagedBytes.Reader returns a slice of bytes
that is either inside one block or overlapping two blocks. If you give
it a length greater than the block size, it hits an out-of-bounds
exception. For the project I am working on, we need more than 32k bytes for our
DocValues. We rarely need that much, but when we do, the search still has to
keep functioning. I fixed this for our project by changing fillSlice to this:
http://pastebin.com/raw.php?i=TCY8zjAi
Unit test:
http://pastebin.com/raw.php?i=Uy29BGGJ
After placing this in our Solr instance, the search no longer crashes and
returns the correct values when a document has a DocValues field larger than
32k bytes. As far as I know there is no limit now. I haven't noticed a
performance hit, and it shouldn't really affect performance unless you have
many of these large DocValues fields. Thank you to David for his help with this.
Edit: This only works when start == 0. Seeing if I can fix it.
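For illustration only, a copy that spans any number of blocks and accepts a non-zero start could look roughly like the sketch below. This is not the pastebin patch and not Lucene's PagedBytes.Reader; MultiBlockReader, fromBytes, and copy are hypothetical names, and the sketch always copies into a new array rather than returning a reference into a single block.

```java
// Hypothetical stand-in for a paged byte store; not the actual Lucene class.
final class MultiBlockReader {
    private final byte[][] blocks;
    private final int blockBits;
    private final int blockSize;
    private final int blockMask;

    MultiBlockReader(byte[][] blocks, int blockBits) {
        this.blocks = blocks;
        this.blockBits = blockBits;
        this.blockSize = 1 << blockBits;
        this.blockMask = blockSize - 1;
    }

    /** Splits a flat array into fixed-size blocks (the last one may be shorter). */
    static MultiBlockReader fromBytes(byte[] data, int blockBits) {
        int blockSize = 1 << blockBits;
        int count = (data.length + blockSize - 1) / blockSize;
        byte[][] blocks = new byte[count][];
        for (int i = 0; i < count; i++) {
            int from = i * blockSize;
            int len = Math.min(blockSize, data.length - from);
            blocks[i] = new byte[len];
            System.arraycopy(data, from, blocks[i], 0, len);
        }
        return new MultiBlockReader(blocks, blockBits);
    }

    /** Copies length bytes starting at absolute offset start, crossing blocks as needed. */
    byte[] copy(long start, int length) {
        byte[] out = new byte[length];
        int index = (int) (start >> blockBits);  // block containing start
        int offset = (int) (start & blockMask);  // position of start inside that block
        int copied = 0;
        while (copied < length) {
            // Take whatever is left in the current block, capped at what is still needed.
            int chunk = Math.min(length - copied, blockSize - offset);
            System.arraycopy(blocks[index], offset, out, copied, chunk);
            copied += chunk;
            index++;
            offset = 0; // every block after the first is read from its beginning
        }
        return out;
    }
}
```

The loop handles start != 0 because only the first block is read from a non-zero offset. The trade-off is an extra copy for every multi-block value, which should only matter if there are many such large values.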
> StraightBytesDocValuesField fails if bytes > 32k
> ------------------------------------------------
>
> Key: LUCENE-4583
> URL: https://issues.apache.org/jira/browse/LUCENE-4583
> Project: Lucene - Core
> Issue Type: Bug
> Components: core/index
> Affects Versions: 4.0, 4.1, 5.0
> Reporter: David Smiley
> Priority: Critical
>
> I didn't observe any limitations on the size of a bytes based DocValues field
> value in the docs. It appears that the limit is 32k, although I didn't get
> any friendly error telling me that was the limit. 32k is kind of small IMO;
> I suspect this limit is unintended and as such is a bug. The following
> test fails:
> {code:java}
> public void testBigDocValue() throws IOException {
>   Directory dir = newDirectory();
>   IndexWriter writer = new IndexWriter(dir, writerConfig(false));
>   Document doc = new Document();
>   BytesRef bytes = new BytesRef((4+4)*4097); // 4096 works
>   bytes.length = bytes.bytes.length; // byte data doesn't matter
>   doc.add(new StraightBytesDocValuesField("dvField", bytes));
>   writer.addDocument(doc);
>   writer.commit();
>   writer.close();
>   DirectoryReader reader = DirectoryReader.open(dir);
>   DocValues docValues = MultiDocValues.getDocValues(reader, "dvField");
>   // FAILS IF BYTES IS BIG!
>   docValues.getSource().getBytes(0, bytes);
>   reader.close();
>   dir.close();
> }
> {code}