[
https://issues.apache.org/jira/browse/LUCENE-4764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13575199#comment-13575199
]
Shai Erera commented on LUCENE-4764:
------------------------------------
I think it would actually be interesting to test *only* VInt, without dgap.
Because the ords seem to be arbitrary, I'm not even sure what dgap buys us.
Mike, can you try that? Index with a Sorting(Unique(VInt8)) and modify
FastCountingFacetsAggregator to not do dgap? It would be interesting to see the
effects on compression as well as speed. Dgap is something you want to do if
you suspect that a document will have, e.g., high ordinals that are close to
each other, so that the gaps between them compress better than the absolute
values ...
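To make the trade-off concrete, here is a minimal sketch of VInt encoding with
and without dgap. It is not the facet module's VInt8 encoder or
FastCountingFacetsAggregator; the class VIntDgapSketch and its methods are
hypothetical names, and the byte order (low 7 bits first) is an assumption
chosen for simplicity. It only illustrates how the encoded size depends on
whether a document's ordinals are clustered.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

// Hypothetical illustration class, not part of Lucene.
public class VIntDgapSketch {

    // Writes one non-negative int as a variable-length int: 7 payload bits
    // per byte, high bit set on every byte except the last.
    public static void writeVInt(ByteArrayOutputStream out, int value) {
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value);
    }

    // Encodes sorted, unique ordinals. With dgap=true each value is stored
    // as the difference from its predecessor; otherwise as the absolute value.
    public static byte[] encode(int[] sortedOrds, boolean dgap) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int prev = 0;
        for (int ord : sortedOrds) {
            writeVInt(out, dgap ? ord - prev : ord);
            prev = ord;
        }
        return out.toByteArray();
    }

    // Decodes what encode() produced; dgap must match the encoding side.
    public static int[] decode(byte[] buf, boolean dgap) {
        int[] tmp = new int[buf.length]; // at most one value per byte
        int count = 0;
        int pos = 0;
        int prev = 0;
        while (pos < buf.length) {
            int value = 0;
            int shift = 0;
            byte b;
            do {
                b = buf[pos++];
                value |= (b & 0x7F) << shift;
                shift += 7;
            } while ((b & 0x80) != 0);
            if (dgap) {
                value += prev; // the extra add per ordinal that dgap costs
            }
            prev = value;
            tmp[count++] = value;
        }
        return Arrays.copyOf(tmp, count);
    }

    public static void main(String[] args) {
        int[] clustered = {100000, 100003, 100010};
        int[] scattered = {5, 20000, 3000000};
        System.out.println("clustered plain: " + encode(clustered, false).length);
        System.out.println("clustered dgap:  " + encode(clustered, true).length);
        System.out.println("scattered plain: " + encode(scattered, false).length);
        System.out.println("scattered dgap:  " + encode(scattered, true).length);
    }
}
```

With these made-up inputs, the clustered ordinals take 9 bytes plain and 5
bytes with dgap, while the scattered ones take 8 bytes either way, so dgap
only adds decode work in the second case.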
Robert, if I understand your proposal correctly, what you suggest is to encode:
  int[]  -- pairs of highest/lowest ordinal in a document + length
            (#additional ords)
  byte[] -- a packed-int of deltas for all documents (but deltas are computed
            off the absolute ord in the int[])
Why would that be better than a single byte[] (packed-ints) + offsets?
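For reference, the "single byte[] + offsets" alternative could look roughly
like the sketch below. This is only an assumed layout for discussion, not the
actual DocValuesFormat code: the class SingleBufferOrdsSketch is a hypothetical
name, the per-document bytes are whatever the ordinal encoder produced, and the
offsets are kept in a plain int[] rather than a packed structure.

```java
import java.util.Arrays;

// Hypothetical illustration class, not part of Lucene.
public class SingleBufferOrdsSketch {
    private final byte[] bytes;   // all documents' encoded ordinals, back to back
    private final int[] offsets;  // doc's bytes live in [offsets[doc], offsets[doc + 1])

    public SingleBufferOrdsSketch(byte[][] perDocEncoded) {
        int numDocs = perDocEncoded.length;
        offsets = new int[numDocs + 1];
        for (int doc = 0; doc < numDocs; doc++) {
            offsets[doc + 1] = offsets[doc] + perDocEncoded[doc].length;
        }
        bytes = new byte[offsets[numDocs]];
        for (int doc = 0; doc < numDocs; doc++) {
            System.arraycopy(perDocEncoded[doc], 0, bytes, offsets[doc],
                             perDocEncoded[doc].length);
        }
    }

    // Start position of the document's encoded ordinals in the shared buffer.
    public int start(int doc) {
        return offsets[doc];
    }

    // Number of encoded bytes for the document (0 if it has no ordinals).
    public int length(int doc) {
        return offsets[doc + 1] - offsets[doc];
    }

    // Copy of the document's encoded bytes; a real aggregator would decode
    // in place instead of copying.
    public byte[] copyOf(int doc) {
        return Arrays.copyOfRange(bytes, offsets[doc], offsets[doc + 1]);
    }
}
```

Here the per-document lookup is two array reads, and there is only one
structure to address per collected docID.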
> Faster but more RAM/Disk consuming DocValuesFormat for facets
> -------------------------------------------------------------
>
> Key: LUCENE-4764
> URL: https://issues.apache.org/jira/browse/LUCENE-4764
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Michael McCandless
> Assignee: Michael McCandless
> Fix For: 4.2, 5.0
>
> Attachments: LUCENE-4764.patch
>
>
> The new default DV format for binary fields has much more
> RAM-efficient encoding of the address for each document ... but it's
> also a bit slower at decode time, which affects facets because we
> decode for every collected docID.