[jira] [Commented] (LUCENE-4583) StraightBytesDocValuesField fails if bytes > 32k

2013-05-14 Thread Barakat Barakat (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13657498#comment-13657498
 ] 

Barakat Barakat commented on LUCENE-4583:
-

Just to clarify, before 4.3 I was working around the bug by patching 
PagedBytes#fillSlice with the first patch I posted in this issue.

 StraightBytesDocValuesField fails if bytes > 32k
 

 Key: LUCENE-4583
 URL: https://issues.apache.org/jira/browse/LUCENE-4583
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/index
Affects Versions: 4.0, 4.1, 5.0
Reporter: David Smiley
Priority: Critical
 Fix For: 4.4

 Attachments: LUCENE-4583.patch, LUCENE-4583.patch, LUCENE-4583.patch, 
 LUCENE-4583.patch, LUCENE-4583.patch


 I didn't observe any limitations on the size of a bytes-based DocValues field 
 value in the docs.  It appears that the limit is 32k, although I didn't get 
 any friendly error telling me that was the limit.  32k is kind of small IMO; 
 I suspect this limit is unintended and as such is a bug.  The following 
 test fails:
 {code:java}
   public void testBigDocValue() throws IOException {
     Directory dir = newDirectory();
     IndexWriter writer = new IndexWriter(dir, writerConfig(false));
     Document doc = new Document();
     BytesRef bytes = new BytesRef((4+4)*4097); // 4096 works
     bytes.length = bytes.bytes.length; // byte data doesn't matter
     doc.add(new StraightBytesDocValuesField(dvField, bytes));
     writer.addDocument(doc);
     writer.commit();
     writer.close();
     DirectoryReader reader = DirectoryReader.open(dir);
     DocValues docValues = MultiDocValues.getDocValues(reader, dvField);
     // FAILS IF BYTES IS BIG!
     docValues.getSource().getBytes(0, bytes);
     reader.close();
     dir.close();
   }
 {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-4583) StraightBytesDocValuesField fails if bytes > 32k

2013-05-08 Thread Barakat Barakat (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barakat Barakat updated LUCENE-4583:


Attachment: LUCENE-4583.patch

I recently switched from 4.1 to 4.3, and my patch needed to be updated because 
of the changes to DocValues. The problem was almost fixed for BinaryDocValues, 
but it just needed one little change. I've attached a patch that removes the 
BinaryDocValues exception when the length is over BYTE_BLOCK_SIZE (32k), fixes 
ByteBlockPool#readBytes:348, and changes the 
TestDocValuesIndexing#testTooLargeBytes test to check for accuracy.


 StraightBytesDocValuesField fails if bytes > 32k
 

 Key: LUCENE-4583
 URL: https://issues.apache.org/jira/browse/LUCENE-4583
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/index
Affects Versions: 4.0, 4.1, 5.0
Reporter: David Smiley
Priority: Critical
 Fix For: 4.3

 Attachments: LUCENE-4583.patch, LUCENE-4583.patch





[jira] [Updated] (LUCENE-4583) StraightBytesDocValuesField fails if bytes > 32k

2012-12-04 Thread Barakat Barakat (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barakat Barakat updated LUCENE-4583:


Attachment: LUCENE-4583.patch

I updated the code to work when start isn't zero. The code can still crash if 
you ask for a length that goes beyond the total size of the paged bytes, but 
I'm not sure how you guys like to prevent things like that. The code seems to 
be working fine with our Solr core so far. I am new to posting patches and 
writing unit tests in Java, so please let me know if there is anything wrong 
with the code.
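
One possible way to guard against a request that runs past the end of the 
paged bytes is an explicit range check before any copying happens. The sketch 
below is only an illustration of that idea, not what the attached patch or 
Lucene itself does; the class name, method name, and the totalSize parameter 
are all hypothetical.

```java
final class SliceBoundsSketch {
    /**
     * Hypothetical guard: rejects a slice request that would read past
     * totalSize bytes. The comparison is written as start > totalSize - length
     * so that start + length cannot overflow.
     */
    static void checkSlice(long totalSize, long start, int length) {
        if (start < 0 || length < 0 || start > totalSize - length) {
            throw new IllegalArgumentException(
                "slice out of range: start=" + start
                    + " length=" + length
                    + " totalSize=" + totalSize);
        }
    }

    public static void main(String[] args) {
        checkSlice(100, 90, 10); // ends exactly at the last byte: accepted
        try {
            checkSlice(100, 91, 10); // would read one byte past the end
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Whether a hard failure like this or a clamped length is preferable depends on 
how callers are expected to use the reader.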

 StraightBytesDocValuesField fails if bytes > 32k
 

 Key: LUCENE-4583
 URL: https://issues.apache.org/jira/browse/LUCENE-4583
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/index
Affects Versions: 4.0, 4.1, 5.0
Reporter: David Smiley
Priority: Critical
 Fix For: 4.1

 Attachments: LUCENE-4583.patch





[jira] [Commented] (LUCENE-4583) StraightBytesDocValuesField fails if bytes > 32k

2012-12-03 Thread Barakat Barakat (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13509044#comment-13509044
 ] 

Barakat Barakat commented on LUCENE-4583:
-

The limitation comes from PagedBytes. When PagedBytes is created it is given a 
number of bits to use per block. The blockSize is set to (1 << blockBits). From 
what I've seen, classes that use PagedBytes usually pass in 15 as the 
blockBits. This leads to the 32768 byte limit.
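
To make the arithmetic concrete, here is a small standalone sketch in plain 
Java. It uses no Lucene classes, and the class and method names are made up 
for illustration. It shows how 15 block bits gives 32768 bytes per block, and 
why the (4+4)*4097 value from the test in the issue description is over that 
size while (4+4)*4096 is not.

```java
final class BlockSizeArithmetic {
    // Block size as described above: 1 << blockBits.
    static int blockSize(int blockBits) {
        return 1 << blockBits;
    }

    // True if a value of this length is larger than one block.
    static boolean exceedsBlockSize(int length, int blockBits) {
        return length > blockSize(blockBits);
    }

    public static void main(String[] args) {
        int blockBits = 15;
        System.out.println(blockSize(blockBits));                          // 32768
        System.out.println((4 + 4) * 4096);                                // 32768
        System.out.println((4 + 4) * 4097);                                // 32776
        System.out.println(exceedsBlockSize((4 + 4) * 4096, blockBits));   // false
        System.out.println(exceedsBlockSize((4 + 4) * 4097, blockBits));   // true
    }
}
```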

The fillSlice function of PagedBytes.Reader returns a slice of bytes that is 
either inside one block or overlapping two blocks. If you give it a length 
that is over the block size, it will hit an out-of-bounds exception. For the 
project I am working on, we need more than 32k bytes for our DocValues. We 
need that much rarely, but we still need that much to keep the search 
functioning. I fixed this for our project by changing fillSlice to this:

http://pastebin.com/raw.php?i=TCY8zjAi

Unit test:
http://pastebin.com/raw.php?i=Uy29BGGJ

After placing this in our Solr instance, the search no longer crashes and 
returns the correct values when the document has a DocValues field more than 
32k bytes. As far as I know there is no limit now. I haven't noticed a 
performance hit. It shouldn't really affect performance unless you have many of 
these large DocValues fields. Thank you to David for his help with this.
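
For readers without access to the pastebin links, the general idea of reading 
a slice that spans more than two blocks can be sketched as below. This is not 
the patch itself and does not use the PagedBytes API. It is a standalone 
illustration over a plain byte[][] that assumes every block is exactly 
1 << blockBits bytes long, and the class and method names are hypothetical.

```java
final class MultiBlockSliceSketch {
    /**
     * Copies length bytes starting at logical offset start out of
     * fixed-size blocks, crossing as many block boundaries as needed.
     */
    static byte[] copySlice(byte[][] blocks, int blockBits, long start, int length) {
        int blockSize = 1 << blockBits;
        int blockMask = blockSize - 1;
        byte[] out = new byte[length];
        int copied = 0;
        int blockIndex = (int) (start >> blockBits);   // block holding the first byte
        int offsetInBlock = (int) (start & blockMask); // position inside that block
        while (copied < length) {
            // Copy up to the end of the current block, or less if we are done.
            int chunk = Math.min(length - copied, blockSize - offsetInBlock);
            System.arraycopy(blocks[blockIndex], offsetInBlock, out, copied, chunk);
            copied += chunk;
            blockIndex++;
            offsetInBlock = 0; // later blocks are read from their start
        }
        return out;
    }

    public static void main(String[] args) {
        int blockBits = 4; // tiny 16-byte blocks keep the example readable
        byte[][] blocks = new byte[4][16];
        for (int i = 0; i < 64; i++) {
            blocks[i >> 4][i & 15] = (byte) i; // byte at logical offset i has value i
        }
        // 40 bytes starting at offset 5 touch blocks 0, 1 and 2.
        byte[] slice = copySlice(blocks, blockBits, 5, 40);
        System.out.println(slice.length + " " + slice[0] + " " + slice[39]); // 40 5 44
    }
}
```

Unlike a reader that hands back a reference into an existing block, this 
approach always copies, which is the extra cost large values would pay.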

 StraightBytesDocValuesField fails if bytes > 32k
 

 Key: LUCENE-4583
 URL: https://issues.apache.org/jira/browse/LUCENE-4583
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/index
Affects Versions: 4.0, 4.1, 5.0
Reporter: David Smiley
Priority: Critical




[jira] [Comment Edited] (LUCENE-4583) StraightBytesDocValuesField fails if bytes > 32k

2012-12-03 Thread Barakat Barakat (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13509044#comment-13509044
 ] 

Barakat Barakat edited comment on LUCENE-4583 at 12/3/12 9:54 PM:
--

The limitation comes from PagedBytes. When PagedBytes is created it is given a 
number of bits to use per block. The blockSize is set to (1 << blockBits). From 
what I've seen, classes that use PagedBytes usually pass in 15 as the 
blockBits. This leads to the 32768 byte limit.

The fillSlice function of PagedBytes.Reader returns a slice of bytes that is 
either inside one block or overlapping two blocks. If you give it a length 
that is over the block size, it will hit an out-of-bounds exception. For the 
project I am working on, we need more than 32k bytes for our DocValues. We 
need that much rarely, but we still need that much to keep the search 
functioning. I fixed this for our project by changing fillSlice to this:

http://pastebin.com/raw.php?i=TCY8zjAi

Unit test:
http://pastebin.com/raw.php?i=Uy29BGGJ

After placing this in our Solr instance, the search no longer crashes and 
returns the correct values when the document has a DocValues field more than 
32k bytes. As far as I know there is no limit now. I haven't noticed a 
performance hit. It shouldn't really affect performance unless you have many of 
these large DocValues fields. Thank you to David for his help with this.

Edit: This only works when start == 0. Seeing if I can fix it.

  
 StraightBytesDocValuesField fails if bytes > 32k
 

 Key: LUCENE-4583
 URL: https://issues.apache.org/jira/browse/LUCENE-4583
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/index
Affects Versions: 4.0, 4.1, 5.0
Reporter: David Smiley
Priority: Critical
