[ https://issues.apache.org/jira/browse/LUCENE-4609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13563689#comment-13563689 ]
Adrien Grand commented on LUCENE-4609:
--------------------------------------

bq. Also, you need to use a tasks file that adds +dateFacets or +allFacets to each task ... and if you want to use +allFacets you need to pull down the latest line file docs that has the added facet dimensions ...

OK, this is what I was missing. :-)

bq. I had to change the PackedBytes.get to take a [reused] IntsRef in, else I was hitting thread-safety issues (AIOOBE)...

Oops, I didn't know it would be called from multiple threads.

bq. So we finally have something faster than dGap(vInt)!

Good news! However the patch has a nocommit because it uses a byte[] to store data, so it cannot grow beyond 2G. I hope that the paging won't make it too much slower. (But maybe it could help reduce memory usage, if each page can have a different number of bits per value?)

I think I'll open a separate issue to make encoding to byte[] and decoding from byte[] byte-aligned (it is long-aligned today).

I ran a benchmark with the Lucene41 PF and the deltas were small (probably noise).

> Write a PackedIntsEncoder/Decoder for facets
> --------------------------------------------
>
>                 Key: LUCENE-4609
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4609
>             Project: Lucene - Core
>          Issue Type: New Feature
>          Components: modules/facet
>            Reporter: Shai Erera
>            Priority: Minor
>         Attachments: LUCENE-4609.patch, LUCENE-4609.patch, LUCENE-4609.patch,
> LUCENE-4609.patch, LUCENE-4609.patch
>
>
> Today the facets API lets you write IntEncoder/Decoder to encode/decode the
> category ordinals. We have several such encoders, including VInt (default),
> and block encoders.
> It would be interesting to implement and benchmark a
> PackedIntsEncoder/Decoder, with potentially two variants: (1) receives
> bitsPerValue up front, when you e.g. know that you have a small taxonomy and
> the max value you can see and (2) one that decides for each doc on the
> optimal bitsPerValue, writes it as a header in the byte[] or something.
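
For readers following along, variant (2) from the issue description could look roughly like the sketch below. It is not the attached patch and does not use the Lucene IntEncoder or PackedInts APIs: the class name PackedOrdinalsSketch, the one-byte header, and the MSB-first byte-aligned layout are all assumptions made for illustration. It also assumes the decoder is told how many values to read, which a real encoder would have to store or derive.

```java
import java.util.Arrays;

/** Hypothetical illustration only; not the Lucene facets API or the attached patch. */
final class PackedOrdinalsSketch {

  /** Number of bits needed for the largest value in the array (at least 1). */
  static int bitsRequired(int[] values) {
    int max = 0;
    for (int v : values) {
      if (v < 0) {
        throw new IllegalArgumentException("ordinals must be non-negative");
      }
      max |= v; // OR keeps the highest set bit, which is all we need
    }
    return Math.max(1, 32 - Integer.numberOfLeadingZeros(max));
  }

  /**
   * Assumed layout: [bitsPerValue: 1 byte][values packed MSB-first,
   * zero-padded up to the next byte boundary].
   */
  static byte[] encode(int[] values) {
    int bpv = bitsRequired(values);
    long totalBits = (long) bpv * values.length;
    byte[] out = new byte[1 + (int) ((totalBits + 7) / 8)];
    out[0] = (byte) bpv; // per-document header
    int bitPos = 8;      // first bit after the header byte
    for (int v : values) {
      for (int b = bpv - 1; b >= 0; b--, bitPos++) {
        if (((v >>> b) & 1) != 0) {
          out[bitPos >>> 3] |= (byte) (0x80 >>> (bitPos & 7));
        }
      }
    }
    return out;
  }

  /** Reads valueCount values back; the count is supplied by the caller (assumption). */
  static int[] decode(byte[] packed, int valueCount) {
    int bpv = packed[0];
    int[] values = new int[valueCount];
    int bitPos = 8;
    for (int i = 0; i < valueCount; i++) {
      int v = 0;
      for (int b = 0; b < bpv; b++, bitPos++) {
        int bit = (packed[bitPos >>> 3] >>> (7 - (bitPos & 7))) & 1;
        v = (v << 1) | bit;
      }
      values[i] = v;
    }
    return values;
  }

  public static void main(String[] args) {
    int[] ordinals = {1, 2, 3, 5, 7, 4, 6, 0}; // small taxonomy: every ordinal fits in 3 bits
    byte[] packed = encode(ordinals);
    System.out.println("bitsPerValue=" + packed[0] + ", bytes=" + packed.length);
    System.out.println(Arrays.toString(decode(packed, ordinals.length)));
  }
}
```

Variant (1) would drop the header byte and take bitsPerValue as a constructor argument instead. The bit-at-a-time loops are kept for readability; a serious implementation would shift whole words, which is where the byte-aligned versus long-aligned question above comes in.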