[ https://issues.apache.org/jira/browse/LUCENE-4764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13575189#comment-13575189 ]
Shai Erera commented on LUCENE-4764:
------------------------------------

bq. i wonder how it would perform if it wrote and kept in ram packed ints, since it knows whats in the byte[]

We've tried that in the past. I don't remember which issue we posted the results on, but they were not compelling. What we tried was keeping the ints as int[] vs. packed ints: int[] performed (IIRC) ~50% faster than the baseline, while packed ints were only ~6-10% faster. Also, their RAM footprint was very close.

The problem is that packed ints only pay off if you know something about the numbers, i.e. their size, distribution, etc. But with category ordinals, on this Wikipedia index, there's nothing "special" about them. Really, every document holds close to arbitrary integers between 1 and 2.2M ...

If the following math holds -- 25 ords per document (that's 100 bytes/doc) x 6.6M documents -- that's going to be ~660MB (offsets not included). I suspect that packed ints will consume approximately the same amount of RAM (at least, per past results) but won't yield significantly better performance. Therefore, if we want to cache anything at the int level, we should write an int[] caching aggregator.

Mike, correct me if I'm wrong.

> Faster but more RAM/Disk consuming DocValuesFormat for facets
> -------------------------------------------------------------
>
>                 Key: LUCENE-4764
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4764
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>             Fix For: 4.2, 5.0
>
>         Attachments: LUCENE-4764.patch
>
>
> The new default DV format for binary fields has much more
> RAM-efficient encoding of the address for each document ... but it's
> also a bit slower at decode time, which affects facets because we
> decode for every collected docID.
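The int[] caching aggregator suggested in the comment could look roughly like the minimal Java sketch below. It is a sketch under stated assumptions, not Lucene's facet API: the class `IntArrayOrdinalCache` and its methods are hypothetical, and it assumes a segment's ordinals have already been decoded from the binary doc values into one int[] per document. All ordinals go into one flat int[] plus an offsets array, so aggregating a collected docID is plain array reads with no per-document decoding. The `ordinalBytes` helper reproduces the ~660MB estimate (ordinals only, offsets excluded).

```java
import java.util.Arrays;

/**
 * Hypothetical int[]-based ordinal cache for one segment.
 * Names are illustrative only; this is not part of Lucene's API.
 */
public final class IntArrayOrdinalCache {
    // ords[offsets[doc] .. offsets[doc + 1]) holds doc's category ordinals
    private final int[] offsets;
    private final int[] ords;

    /** Assumes ordinals were already decoded: one int[] per document. */
    public IntArrayOrdinalCache(int[][] ordsPerDoc) {
        offsets = new int[ordsPerDoc.length + 1];
        for (int doc = 0; doc < ordsPerDoc.length; doc++) {
            offsets[doc + 1] = offsets[doc] + ordsPerDoc[doc].length;
        }
        ords = new int[offsets[ordsPerDoc.length]];
        for (int doc = 0; doc < ordsPerDoc.length; doc++) {
            System.arraycopy(ordsPerDoc[doc], 0, ords, offsets[doc], ordsPerDoc[doc].length);
        }
    }

    /** Counts the ordinals of one collected doc using plain array reads. */
    public void aggregate(int doc, int[] counts) {
        for (int i = offsets[doc]; i < offsets[doc + 1]; i++) {
            counts[ords[i]]++;
        }
    }

    /** Bytes for the ordinals alone: 4 bytes per int, offsets excluded. */
    public static long ordinalBytes(long numDocs, long ordsPerDoc) {
        return numDocs * ordsPerDoc * Integer.BYTES;
    }

    public static void main(String[] args) {
        // Tiny example: 3 docs; doc 1 has no categories.
        IntArrayOrdinalCache cache =
            new IntArrayOrdinalCache(new int[][] {{1, 3}, {}, {3, 4, 1}});
        int[] counts = new int[5];
        for (int doc = 0; doc < 3; doc++) {
            cache.aggregate(doc, counts);
        }
        System.out.println(Arrays.toString(counts)); // [0, 2, 0, 2, 1]
        // The estimate from the comment: 6.6M docs x 25 ords/doc x 4 bytes.
        System.out.println(ordinalBytes(6_600_000L, 25)); // 660000000
    }
}
```

The offsets array adds another 4 bytes per document (about 26MB for 6.6M documents), and because a single flat Java int[] is indexed by int, this layout caps one segment at roughly 2^31 cached ordinals.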