[jira] [Comment Edited] (LUCENE-4583) StraightBytesDocValuesField fails if bytes > 32k

Barakat Barakat (JIRA) Mon, 03 Dec 2012 13:56:00 -0800

    [ 
https://issues.apache.org/jira/browse/LUCENE-4583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13509044#comment-13509044
 ]


Barakat Barakat edited comment on LUCENE-4583 at 12/3/12 9:54 PM:
------------------------------------------------------------------

The limitation comes from PagedBytes. When a PagedBytes is created, it is given a 
number of bits to use per block, and the blockSize is set to (1 << blockBits). From 
what I've seen, classes that use PagedBytes usually pass in 15 as the 
blockBits, which gives the 32768-byte limit (1 << 15 = 32768).
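
To make the arithmetic concrete, here is a minimal standalone sketch of how a 
bits-per-block value turns into a block size, and how an absolute byte position 
maps to a block index and an offset inside that block. This is plain Java, not 
Lucene code: the class BlockMath and its method names are made up for 
illustration.

```java
// Illustrative only: BlockMath is a hypothetical helper, not part of Lucene.
class BlockMath {
    // Block size in bytes for a given number of bits per block.
    static int blockSize(int blockBits) {
        return 1 << blockBits;
    }

    // Index of the block that contains the absolute byte position.
    static int blockIndex(long position, int blockBits) {
        return (int) (position >>> blockBits);
    }

    // Offset of the absolute byte position inside its block.
    static int blockOffset(long position, int blockBits) {
        return (int) (position & (blockSize(blockBits) - 1));
    }

    public static void main(String[] args) {
        int blockBits = 15;
        System.out.println(blockSize(blockBits));           // 32768
        System.out.println(blockIndex(40000L, blockBits));  // 1
        System.out.println(blockOffset(40000L, blockBits)); // 7232
    }
}
```

With blockBits = 15, the value in the failing test from the issue, (4+4)*4097 = 
32776 bytes, is just over one block, while (4+4)*4096 = 32768 bytes is exactly 
one block.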

The fillSlice function of PagedBytes.Reader returns a slice of bytes 
that is either inside one block or spans two adjacent blocks. If you give 
it a length that is larger than the block size, it hits an out-of-bounds 
exception. For the project I am working on, we need more than 32k bytes for our 
DocValues. We rarely need that much, but when we do, the search has to keep 
functioning. I fixed this for our project by changing fillSlice to this:

http://pastebin.com/raw.php?i=TCY8zjAi

Unit test:
http://pastebin.com/raw.php?i=Uy29BGGJ

After placing this in our Solr instance, the search no longer crashes and 
returns the correct values when a document has a DocValues field larger than 
32k bytes. As far as I know there is no limit now. I haven't noticed a 
performance hit, and it shouldn't really affect performance unless you have many of 
these large DocValues fields. Thank you to David for his help with this.

Edit: This only works when start == 0. Seeing if I can fix it.
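
For illustration, one way a slice copy can handle any start position and any 
number of blocks is sketched below. This is not the patch linked above and not 
Lucene's code: the class MultiBlockSlice, the method copySlice, and the plain 
byte[][] layout are assumptions made for the example.

```java
import java.util.Arrays;

// Illustrative only: a hypothetical stand-in for block-based storage.
class MultiBlockSlice {
    // Copies `length` bytes starting at absolute position `start` out of
    // fixed-size blocks, crossing as many block boundaries as needed.
    static byte[] copySlice(byte[][] blocks, int blockBits, long start, int length) {
        int blockSize = 1 << blockBits;
        int blockMask = blockSize - 1;
        byte[] out = new byte[length];
        int copied = 0;
        while (copied < length) {
            long pos = start + copied;
            int index = (int) (pos >>> blockBits); // block holding pos
            int offset = (int) (pos & blockMask);  // offset of pos in that block
            // Copy up to the end of the current block, or less if we are done.
            int chunk = Math.min(length - copied, blockSize - offset);
            System.arraycopy(blocks[index], offset, out, copied, chunk);
            copied += chunk;
        }
        return out;
    }

    public static void main(String[] args) {
        int blockBits = 2; // tiny 4-byte blocks to keep the example readable
        byte[][] blocks = { {0, 1, 2, 3}, {4, 5, 6, 7}, {8, 9, 10, 11} };
        // Start in the middle of block 0 and run into block 2.
        System.out.println(Arrays.toString(copySlice(blocks, blockBits, 3, 7)));
        // prints [3, 4, 5, 6, 7, 8, 9]
    }
}
```

The sketch always copies into a new array. Copying has a cost, so a real change 
would probably want to avoid it for slices that fit inside a single block.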
                
> StraightBytesDocValuesField fails if bytes > 32k
> ------------------------------------------------
>
>                 Key: LUCENE-4583
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4583
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: core/index
>    Affects Versions: 4.0, 4.1, 5.0
>            Reporter: David Smiley
>            Priority: Critical
>
> I didn't observe any limitations on the size of a bytes based DocValues field 
> value in the docs.  It appears that the limit is 32k, although I didn't get 
> any friendly error telling me that was the limit.  32k is kind of small IMO; 
> I suspect this limit is unintended and as such is a bug.    The following 
> test fails:
> {code:java}
>   public void testBigDocValue() throws IOException {
>     Directory dir = newDirectory();
>     IndexWriter writer = new IndexWriter(dir, writerConfig(false));
>     Document doc = new Document();
>     BytesRef bytes = new BytesRef((4+4)*4097);//4096 works
>     bytes.length = bytes.bytes.length;//byte data doesn't matter
>     doc.add(new StraightBytesDocValuesField("dvField", bytes));
>     writer.addDocument(doc);
>     writer.commit();
>     writer.close();
>     DirectoryReader reader = DirectoryReader.open(dir);
>     DocValues docValues = MultiDocValues.getDocValues(reader, "dvField");
>     //FAILS IF BYTES IS BIG!
>     docValues.getSource().getBytes(0, bytes);
>     reader.close();
>     dir.close();
>   }
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
