[ https://issues.apache.org/jira/browse/LUCENE-4583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13657509#comment-13657509 ]
Robert Muir commented on LUCENE-4583:
-------------------------------------

No, I don't support changing this codec. It's an all-in-memory one (which is an unfortunate default, but it must stay the default until various algorithms in the grouping/join/etc. packages are improved so that we can safely use something more like DiskDV as the default).

Other all-in-memory implementations such as DirectPostingsFormat and MemoryPostings have similar limitations, and so does the specialized faceting one (e.g. an entire segment cannot have more than 2GB of total bytes).

I don't want to add a bunch of stuff in a loop here or anything like that, because it only adds complexity for the normal case. I also think it's unreasonable to use a RAM docvalues implementation if you have more than *32KB* of per-document cost anyway. Sorry, that's just crazy, and I don't think we should add another trappy codec to support it.

So if you want ridiculously huge per-document values, just use DiskDV, which supports that. These abuse cases are extreme: if you really, really want all of that in RAM, use DiskDV with FileSwitchDirectory.

I mentioned before that I was worried about this issue spinning out of control, and it appears this has happened. Given these developments, I'd rather we not change the current limit at all.

> StraightBytesDocValuesField fails if bytes > 32k
> ------------------------------------------------
>
>                 Key: LUCENE-4583
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4583
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: core/index
>    Affects Versions: 4.0, 4.1, 5.0
>            Reporter: David Smiley
>            Priority: Critical
>             Fix For: 4.4
>
>         Attachments: LUCENE-4583.patch, LUCENE-4583.patch, LUCENE-4583.patch,
>                      LUCENE-4583.patch, LUCENE-4583.patch
>
>
> I didn't observe any limitations on the size of a bytes-based DocValues field
> value in the docs. It appears that the limit is 32k, although I didn't get
> any friendly error telling me that was the limit.
> 32k is kind of small IMO; I suspect this limit is unintended and as such is a bug.
> The following test fails:
> {code:java}
> public void testBigDocValue() throws IOException {
>   Directory dir = newDirectory();
>   // writerConfig(false) is a helper from the enclosing test class (not shown)
>   IndexWriter writer = new IndexWriter(dir, writerConfig(false));
>   Document doc = new Document();
>   // 8 * 4097 = 32,776 bytes fails; 8 * 4096 = 32,768 bytes works
>   BytesRef bytes = new BytesRef((4+4)*4097);
>   bytes.length = bytes.bytes.length; // use the whole buffer; the byte contents don't matter
>   doc.add(new StraightBytesDocValuesField("dvField", bytes));
>   writer.addDocument(doc);
>   writer.commit();
>   writer.close();
>   DirectoryReader reader = DirectoryReader.open(dir);
>   DocValues docValues = MultiDocValues.getDocValues(reader, "dvField");
>   // FAILS IF BYTES IS BIG! (reading back a value larger than 32 KB)
>   docValues.getSource().getBytes(0, bytes);
>   reader.close();
>   dir.close();
> }
> {code}
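Part of the original report is that nothing surfaced the limit at indexing time; the failure only showed up on read. The sketch that follows shows a check that would flag an oversized value up front. It is plain Java with no Lucene dependency. The class DocValueSizeCheck and the constant MAX_VALUE_BYTES are hypothetical and are not part of Lucene. The 32,768-byte cap is an assumption inferred from the test's comment that 8 * 4096 bytes works while 8 * 4097 bytes fails. The exact limit may differ between Lucene versions and codecs.

```java
/**
 * Hypothetical pre-flight check for values headed into an in-memory
 * bytes DocValues field. Not a Lucene API: MAX_VALUE_BYTES is an
 * assumption inferred from the LUCENE-4583 test (8 * 4096 bytes works,
 * 8 * 4097 bytes fails).
 */
final class DocValueSizeCheck {
    // Assumed cap: 32 KB, i.e. 2^15 = 32,768 bytes.
    static final int MAX_VALUE_BYTES = 1 << 15;

    private DocValueSizeCheck() {}

    /** Returns true if a value of the given length is within the assumed cap. */
    static boolean fits(int length) {
        return length >= 0 && length <= MAX_VALUE_BYTES;
    }

    /**
     * Throws a descriptive IllegalArgumentException at indexing time
     * instead of letting an oversized value fail later when it is read back.
     */
    static void checkLength(String field, int length) {
        if (!fits(length)) {
            throw new IllegalArgumentException(
                "DocValues field \"" + field + "\": value is " + length
                + " bytes, but the in-memory format is assumed to accept at most "
                + MAX_VALUE_BYTES + " bytes");
        }
    }
}
```

In the test, calling DocValueSizeCheck.checkLength("dvField", bytes.length) just before doc.add(...) would throw an exception naming the field and the 32,776-byte size. Such a check only makes the existing limit visible; per the discussion above, values that genuinely need more space belong in a disk-based format such as DiskDV.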