[jira] [Commented] (LUCENE-4583) StraightBytesDocValuesField fails if bytes > 32k

[ https://issues.apache.org/jira/browse/LUCENE-4583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13657498#comment-13657498 ]

Barakat Barakat commented on LUCENE-4583:

Just to clarify, before 4.3 I was fixing the bug by replacing PagedBytes#fillSlice with the version from the first patch I posted in this issue.

StraightBytesDocValuesField fails if bytes > 32k

Key: LUCENE-4583
URL: https://issues.apache.org/jira/browse/LUCENE-4583
Project: Lucene - Core
Issue Type: Bug
Components: core/index
Affects Versions: 4.0, 4.1, 5.0
Reporter: David Smiley
Priority: Critical
Fix For: 4.4
Attachments: LUCENE-4583.patch, LUCENE-4583.patch, LUCENE-4583.patch, LUCENE-4583.patch, LUCENE-4583.patch

I didn't observe any limitations on the size of a bytes-based DocValues field value in the docs. It appears that the limit is 32k, although I didn't get any friendly error telling me that was the limit. 32k is kind of small IMO; I suspect this limit is unintended and as such is a bug.

The following test fails:

{code:java}
public void testBigDocValue() throws IOException {
  Directory dir = newDirectory();
  IndexWriter writer = new IndexWriter(dir, writerConfig(false));

  Document doc = new Document();
  BytesRef bytes = new BytesRef((4+4)*4097); // 4096 works
  bytes.length = bytes.bytes.length; // byte data doesn't matter
  doc.add(new StraightBytesDocValuesField(dvField, bytes));
  writer.addDocument(doc);
  writer.commit();
  writer.close();

  DirectoryReader reader = DirectoryReader.open(dir);
  DocValues docValues = MultiDocValues.getDocValues(reader, dvField);
  // FAILS IF BYTES IS BIG!
  docValues.getSource().getBytes(0, bytes);

  reader.close();
  dir.close();
}
{code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira

To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
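
As a quick check of the sizes used in the test above, the following sketch compares the two array lengths against a 32768-byte block. The block size of 1 << 15 is an assumption taken from the discussion on this issue, and the class and method names are hypothetical, not part of Lucene.

```java
// Hypothetical helper for illustration only; not Lucene code.
final class DocValueSizeCheck {
    // Assumed block size: 1 << 15, the value reported in this issue's discussion.
    static final int BLOCK_BITS = 15;
    static final int BLOCK_SIZE = 1 << BLOCK_BITS;

    /** Size in bytes of the value the test allocates for a given multiplier. */
    static int valueSize(int n) {
        return (4 + 4) * n;
    }

    /** True when a value of the given size no longer fits in one block. */
    static boolean exceedsBlock(int size) {
        return size > BLOCK_SIZE;
    }

    public static void main(String[] args) {
        for (int n : new int[] {4096, 4097}) {
            int size = valueSize(n);
            System.out.println(n + " -> " + size + " bytes, exceeds block: " + exceedsBlock(size));
        }
    }
}
```

With these assumptions, 4096 gives exactly 32768 bytes (one full block), while 4097 gives 32776 bytes, which is 8 bytes over.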
[jira] [Updated] (LUCENE-4583) StraightBytesDocValuesField fails if bytes > 32k

[ https://issues.apache.org/jira/browse/LUCENE-4583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Barakat Barakat updated LUCENE-4583:

Attachment: LUCENE-4583.patch

I recently switched from 4.1 to 4.3, and my patch needed to be updated because of the changes to DocValues. The problem was almost fixed for BinaryDocValues, but it just needed one little change. I've attached a patch that removes the BinaryDocValues exception when the length is over BYTE_BLOCK_SIZE (32k), fixes ByteBlockPool#readBytes:348, and changes the TestDocValuesIndexing#testTooLargeBytes test to check for accuracy.

StraightBytesDocValuesField fails if bytes > 32k

Key: LUCENE-4583
URL: https://issues.apache.org/jira/browse/LUCENE-4583
Project: Lucene - Core
Issue Type: Bug
Components: core/index
Affects Versions: 4.0, 4.1, 5.0
Reporter: David Smiley
Priority: Critical
Fix For: 4.3
Attachments: LUCENE-4583.patch, LUCENE-4583.patch
[jira] [Updated] (LUCENE-4583) StraightBytesDocValuesField fails if bytes > 32k

[ https://issues.apache.org/jira/browse/LUCENE-4583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Barakat Barakat updated LUCENE-4583:

Attachment: LUCENE-4583.patch

I updated the code to work when start isn't zero. The code can still crash if you ask for a length that goes beyond the total size of the paged bytes, but I'm not sure how you guys like to prevent things like that. The code seems to be working fine with our Solr core so far. I am new to posting patches and writing unit tests in Java, so please let me know if there is anything wrong with the code.

StraightBytesDocValuesField fails if bytes > 32k

Key: LUCENE-4583
URL: https://issues.apache.org/jira/browse/LUCENE-4583
Project: Lucene - Core
Issue Type: Bug
Components: core/index
Affects Versions: 4.0, 4.1, 5.0
Reporter: David Smiley
Priority: Critical
Fix For: 4.1
Attachments: LUCENE-4583.patch
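
One possible way to guard against a request that runs past the end of the stored bytes is an explicit range check before any copying. This is only a sketch: SliceBounds, checkSlice and totalSize are hypothetical names, and it is not necessarily how Lucene prefers to handle this case.

```java
// Hypothetical helper for illustration only; not part of Lucene or of the attached patch.
final class SliceBounds {
    /**
     * Throws IndexOutOfBoundsException unless the range [start, start + length)
     * lies entirely inside [0, totalSize). Assumes totalSize is non-negative.
     */
    static void checkSlice(long start, int length, long totalSize) {
        // Written as start > totalSize - length so that start + length cannot overflow.
        if (start < 0 || length < 0 || start > totalSize - length) {
            throw new IndexOutOfBoundsException(
                "start=" + start + " length=" + length + " totalSize=" + totalSize);
        }
    }

    public static void main(String[] args) {
        checkSlice(0, 10, 10); // whole range: accepted
        try {
            checkSlice(5, 6, 10); // one byte past the end: rejected
        } catch (IndexOutOfBoundsException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Failing early with a message that names the offending values is easier to diagnose than an array index error raised deep inside the copy loop.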
[jira] [Commented] (LUCENE-4583) StraightBytesDocValuesField fails if bytes > 32k

[ https://issues.apache.org/jira/browse/LUCENE-4583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13509044#comment-13509044 ]

Barakat Barakat commented on LUCENE-4583:

The limitation comes from PagedBytes. When PagedBytes is created it is given a number of bits to use per block. The blockSize is set to (1 << blockBits). From what I've seen, classes that use PagedBytes usually pass in 15 as the blockBits. This leads to the 32768 byte limit.

The fillSlice function of the PagedBytes.Reader will return a block of bytes that is either inside one block or overlapping two blocks. If you try to give it a length that is over the block size it will hit the out of bounds exception. For the project I am working on, we need more than 32k bytes for our DocValues. We need that much rarely, but we still need that much to keep the search functioning. I fixed this for our project by changing fillSlice to this:

http://pastebin.com/raw.php?i=TCY8zjAi

Unit test:

http://pastebin.com/raw.php?i=Uy29BGGJ

After placing this in our Solr instance, the search no longer crashes and returns the correct values when the document has a DocValues field more than 32k bytes. As far as I know there is no limit now. I haven't noticed a performance hit. It shouldn't really affect performance unless you have many of these large DocValues fields. Thank you to David for his help with this.

StraightBytesDocValuesField fails if bytes > 32k

Key: LUCENE-4583
URL: https://issues.apache.org/jira/browse/LUCENE-4583
Project: Lucene - Core
Issue Type: Bug
Components: core/index
Affects Versions: 4.0, 4.1, 5.0
Reporter: David Smiley
Priority: Critical
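
The block arithmetic described in the comment above can be illustrated with a small stand-alone sketch. It is not Lucene's PagedBytes and not the patch linked in the comment; PagedBytesSketch and readSlice are hypothetical names. The loop copies a slice across as many blocks as it spans instead of assuming at most two, and it deliberately omits bounds checking.

```java
// Hypothetical stand-in for a paged byte store; for illustration only.
final class PagedBytesSketch {
    private final int blockBits;
    private final int blockSize;
    private final int blockMask;
    private final byte[][] blocks;

    /** Splits data into blocks of (1 << blockBits) bytes; the last block may be shorter. */
    PagedBytesSketch(byte[] data, int blockBits) {
        this.blockBits = blockBits;
        this.blockSize = 1 << blockBits;
        this.blockMask = blockSize - 1;
        int numBlocks = (data.length + blockSize - 1) >>> blockBits;
        this.blocks = new byte[numBlocks][];
        for (int i = 0; i < numBlocks; i++) {
            int from = i << blockBits;
            int len = Math.min(blockSize, data.length - from);
            blocks[i] = new byte[len];
            System.arraycopy(data, from, blocks[i], 0, len);
        }
    }

    /** Copies length bytes starting at start, crossing block boundaries as needed. */
    byte[] readSlice(long start, int length) {
        byte[] out = new byte[length];
        int blockIndex = (int) (start >>> blockBits); // which block the slice starts in
        int offset = (int) (start & blockMask);       // position inside that block
        int copied = 0;
        while (copied < length) {
            int chunk = Math.min(length - copied, blockSize - offset);
            System.arraycopy(blocks[blockIndex], offset, out, copied, chunk);
            copied += chunk;
            blockIndex++;
            offset = 0; // every block after the first is read from its beginning
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] data = new byte[(4 + 4) * 4097]; // 32776 bytes, just over one 32768-byte block
        for (int i = 0; i < data.length; i++) {
            data[i] = (byte) i;
        }
        PagedBytesSketch paged = new PagedBytesSketch(data, 15);
        byte[] all = paged.readSlice(0, data.length);
        System.out.println(java.util.Arrays.equals(all, data)); // prints true
    }
}
```

Because each iteration copies at most the remainder of the current block, the same loop handles a slice inside one block, a slice overlapping two blocks, and a slice larger than the block size.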
[jira] [Comment Edited] (LUCENE-4583) StraightBytesDocValuesField fails if bytes > 32k

[ https://issues.apache.org/jira/browse/LUCENE-4583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13509044#comment-13509044 ]

Barakat Barakat edited comment on LUCENE-4583 at 12/3/12 9:54 PM:

The limitation comes from PagedBytes. When PagedBytes is created it is given a number of bits to use per block. The blockSize is set to (1 << blockBits). From what I've seen, classes that use PagedBytes usually pass in 15 as the blockBits. This leads to the 32768 byte limit.

The fillSlice function of the PagedBytes.Reader will return a block of bytes that is either inside one block or overlapping two blocks. If you try to give it a length that is over the block size it will hit the out of bounds exception. For the project I am working on, we need more than 32k bytes for our DocValues. We need that much rarely, but we still need that much to keep the search functioning. I fixed this for our project by changing fillSlice to this:

http://pastebin.com/raw.php?i=TCY8zjAi

Unit test:

http://pastebin.com/raw.php?i=Uy29BGGJ

After placing this in our Solr instance, the search no longer crashes and returns the correct values when the document has a DocValues field more than 32k bytes. As far as I know there is no limit now. I haven't noticed a performance hit. It shouldn't really affect performance unless you have many of these large DocValues fields. Thank you to David for his help with this.

Edit: This only works when start == 0. Seeing if I can fix it.

was (Author: barakatx2):

The limitation comes from PagedBytes. When PagedBytes is created it is given a number of bits to use per block. The blockSize is set to (1 << blockBits). From what I've seen, classes that use PagedBytes usually pass in 15 as the blockBits. This leads to the 32768 byte limit.

The fillSlice function of the PagedBytes.Reader will return a block of bytes that is either inside one block or overlapping two blocks. If you try to give it a length that is over the block size it will hit the out of bounds exception. For the project I am working on, we need more than 32k bytes for our DocValues. We need that much rarely, but we still need that much to keep the search functioning. I fixed this for our project by changing fillSlice to this:

http://pastebin.com/raw.php?i=TCY8zjAi

Unit test:

http://pastebin.com/raw.php?i=Uy29BGGJ

After placing this in our Solr instance, the search no longer crashes and returns the correct values when the document has a DocValues field more than 32k bytes. As far as I know there is no limit now. I haven't noticed a performance hit. It shouldn't really affect performance unless you have many of these large DocValues fields. Thank you to David for his help with this.

StraightBytesDocValuesField fails if bytes > 32k

Key: LUCENE-4583
URL: https://issues.apache.org/jira/browse/LUCENE-4583
Project: Lucene - Core
Issue Type: Bug
Components: core/index
Affects Versions: 4.0, 4.1, 5.0
Reporter: David Smiley
Priority: Critical