[ 
https://issues.apache.org/jira/browse/LUCENE-4609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13536379#comment-13536379
 ] 

Michael McCandless commented on LUCENE-4609:
--------------------------------------------

bq. Well, it's an 'end point' encoder, meaning it encodes whatever values it 
receives directly to the output.

Ahh right, OK.
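A minimal sketch of that distinction. The interface and class names are hypothetical and are not the facet module's actual encoder API. An 'end point' encoder writes the values it is given straight to the output. A filter-style step, here a d-gap transform, rewrites the values and delegates to the next encoder in the chain.

```java
import java.util.ArrayList;
import java.util.List;

public class EncoderChainSketch {
    /** Hypothetical encoder contract for this sketch (not Lucene's actual IntEncoder API). */
    interface SketchEncoder {
        void encode(int[] values, List<Integer> out);
    }

    /** 'End point' encoder: writes every value it receives, unchanged, to the output. */
    static final class PassThroughEncoder implements SketchEncoder {
        @Override
        public void encode(int[] values, List<Integer> out) {
            for (int v : values) {
                out.add(v);
            }
        }
    }

    /** Filter: turns sorted values into d-gaps, then hands them to the next encoder. */
    static final class DGapFilter implements SketchEncoder {
        private final SketchEncoder next;

        DGapFilter(SketchEncoder next) {
            this.next = next;
        }

        @Override
        public void encode(int[] sortedValues, List<Integer> out) {
            int[] gaps = new int[sortedValues.length];
            int prev = 0;
            for (int i = 0; i < sortedValues.length; i++) {
                gaps[i] = sortedValues[i] - prev; // first gap is relative to 0
                prev = sortedValues[i];
            }
            next.encode(gaps, out);
        }
    }

    public static void main(String[] args) {
        List<Integer> out = new ArrayList<>();
        new DGapFilter(new PassThroughEncoder()).encode(new int[] {3, 7, 8, 20}, out);
        System.out.println(out); // [3, 4, 1, 12]
    }
}
```

The end point encoder never sees the original ordinals, only whatever the filters in front of it produce. This is why a packed-ints end point would mostly be packing small d-gaps.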

bq. PForDelta is indeed slower. But we've encountered scenarios in which most dgaps are 
small, hence the NOnes and the Four/Eight Flag encoders. 

OK makes sense.

{quote}
If indeed most values are small (say, they fit in 4 bits) but there are also one 
or two larger values that require 12 or 14 bits, we could benefit greatly 
here.
This is all relevant only when there is a large number of categories per 
document.
{quote}
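A back-of-the-envelope sketch of that scenario under a deliberately simplified cost model. Fixed-width packing spends the widest value's bit count on every value. A patched (PForDelta-like) layout uses a small width plus a per-exception cost. The 24-bit exception cost (an 8-bit position plus a 16-bit full value) and the sample values are hypothetical assumptions for illustration only. They do not describe how any Lucene encoder actually stores exceptions.

```java
public class PatchedSizeSketch {
    /** Hypothetical cost per exception: an 8-bit position plus a 16-bit full value. */
    static final int EXCEPTION_BITS = 8 + 16;

    /** Bits needed for a non-negative value (at least 1). */
    static int bitsRequired(long v) {
        return Math.max(1, 64 - Long.numberOfLeadingZeros(v));
    }

    /** Fixed-width packing: every value pays for the widest value. */
    static long fixedWidthBits(int[] values) {
        int bpv = 1;
        for (int v : values) {
            bpv = Math.max(bpv, bitsRequired(v));
        }
        return (long) bpv * values.length;
    }

    /** Simplified patched layout: b bits per slot, plus EXCEPTION_BITS for each value wider than b. */
    static long patchedBits(int[] values, int b) {
        long exceptions = 0;
        for (int v : values) {
            if (bitsRequired(v) > b) {
                exceptions++;
            }
        }
        return (long) b * values.length + exceptions * EXCEPTION_BITS;
    }

    public static void main(String[] args) {
        int[] values = new int[100];
        for (int i = 0; i < values.length; i++) {
            values[i] = 1 + (i % 15); // 1..15: fits in 4 bits
        }
        values[10] = 3000;  // needs 12 bits
        values[60] = 12000; // needs 14 bits
        System.out.println(fixedWidthBits(values)); // 1400
        System.out.println(patchedBits(values, 4)); // 448
    }
}
```

Under these assumptions, the patched layout takes roughly a third of the space (448 vs. 1400 bits). Whether that saving justifies the slower decode is the trade-off raised in the reply.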

Right ... I'm just wondering how often this happens in "typical" (if there is 
such a thing) facet apps. Decode speed trumps compression ratios here, I think.

bq. That is right. To be frank, I'm not 100% sure what PackedInts does, nor 
how large its header is.

The header is very large ... really, you should only need 1) bpv and 2) 
bytes.length (which I think you already have, via both payloads and DocValues). 
If the PackedInts API isn't flexible enough to let you feed it just bpv and 
bytes.length, then let's fix that!
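A minimal sketch of the "bpv plus bytes.length" idea. It is written from scratch rather than on top of PackedInts, so PackedInts' actual API and format are not assumed. The layout is hypothetical: a single header byte holds bpv, the values follow packed MSB-first and zero-padded to a byte boundary, and the decoder infers the value count from the array length. The inferred count is exact whenever bpv >= 8, because the padding in the last byte is then shorter than one value. For bpv < 8 it can be ambiguous, which is the concern raised in the following quoted comment.

```java
import java.util.Arrays;

public class HeaderPackedSketch {
    /** Bits needed for a non-negative value (at least 1). */
    static int bitsRequired(long maxValue) {
        return Math.max(1, 64 - Long.numberOfLeadingZeros(maxValue));
    }

    /** Hypothetical layout: [bpv byte][values packed MSB-first, zero-padded to a byte]. */
    static byte[] encode(int[] values) {
        int bpv = 1;
        for (int v : values) {
            bpv = Math.max(bpv, bitsRequired(v)); // values assumed non-negative
        }
        int totalBits = bpv * values.length;
        byte[] out = new byte[1 + (totalBits + 7) / 8];
        out[0] = (byte) bpv;
        int bitPos = 0;
        for (int v : values) {
            for (int b = bpv - 1; b >= 0; b--, bitPos++) {
                if (((v >>> b) & 1) != 0) {
                    out[1 + (bitPos >>> 3)] |= (byte) (0x80 >>> (bitPos & 7));
                }
            }
        }
        return out;
    }

    /** Reads bpv from the header; the count comes from the length (exact only if bpv >= 8). */
    static int[] decode(byte[] in) {
        int bpv = in[0];
        int count = ((in.length - 1) * 8) / bpv;
        int[] values = new int[count];
        int bitPos = 0;
        for (int i = 0; i < count; i++) {
            int v = 0;
            for (int b = 0; b < bpv; b++, bitPos++) {
                int bit = (in[1 + (bitPos >>> 3)] >>> (7 - (bitPos & 7))) & 1;
                v = (v << 1) | bit;
            }
            values[i] = v;
        }
        return values;
    }

    public static void main(String[] args) {
        int[] values = {300, 5, 17, 260}; // max 300 -> bpv = 9
        byte[] encoded = encode(values);
        System.out.println(encoded.length);                   // 6 (1 header + 5 payload bytes)
        System.out.println(Arrays.toString(decode(encoded))); // [300, 5, 17, 260]
    }
}
```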

bq. For bits-per-value smaller than the size of a byte, there's a need to know 
how many bits should be left out from the last read byte.

Hopefully you don't need to separately encode "leftover unused bits" ... i.e. 
byte[].length (which is "free" here, since the codec already stores it) should 
suffice.
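The arithmetic behind the "leftover bits" question, as a small sketch. It assumes zero-filled padding and a value count inferred as floor(payloadBits / bpv). For bpv >= 8, the padding (at most 7 bits) can never hold a whole value, so byte[].length suffices. For bpv < 8 it sometimes can, and each such phantom value decodes as 0.

```java
public class PaddingSketch {
    /** Payload bytes needed to hold count values of bpv bits each (zero-padded). */
    static int payloadBytes(int count, int bpv) {
        return (count * bpv + 7) / 8;
    }

    /** Value count a decoder would infer from the payload length alone. */
    static int inferredCount(int payloadBytes, int bpv) {
        return payloadBytes * 8 / bpv;
    }

    /** True if byte[].length alone recovers the exact count. */
    static boolean lengthSuffices(int count, int bpv) {
        return inferredCount(payloadBytes(count, bpv), bpv) == count;
    }

    public static void main(String[] args) {
        // 3 values at 3 bpv: 9 bits -> 2 bytes -> 16 / 3 = 5 inferred (2 phantom zeros).
        System.out.println(inferredCount(payloadBytes(3, 3), 3)); // 5
        // 3 values at 9 bpv: 27 bits -> 4 bytes -> 32 / 9 = 3 inferred (exact).
        System.out.println(inferredCount(payloadBytes(3, 9), 9)); // 3
    }
}
```

One possible workaround relies on an assumption: that the encoder never emits a 0 (for example, hypothetically, d-gaps of distinct ordinals that are all greater than zero). Under that assumption, the decoder could simply drop trailing zero values, because phantom values consist only of padding bits.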
                
> Write a PackedIntsEncoder/Decoder for facets
> --------------------------------------------
>
>                 Key: LUCENE-4609
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4609
>             Project: Lucene - Core
>          Issue Type: New Feature
>          Components: modules/facet
>            Reporter: Shai Erera
>            Priority: Minor
>         Attachments: LUCENE-4609.patch
>
>
> Today the facets API lets you write IntEncoder/Decoder to encode/decode the 
> category ordinals. We have several such encoders, including VInt (default), 
> and block encoders.
> It would be interesting to implement and benchmark a 
> PackedIntsEncoder/Decoder, with potentially two variants: (1) one that 
> receives bitsPerValue up front, e.g. when you have a small taxonomy and 
> know the max value you can see, and (2) one that decides the optimal 
> bitsPerValue for each doc and writes it as a header in the byte[] or similar.
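A rough size comparison of the two proposed variants, sketched under stated assumptions. Variant (1) derives bitsPerValue once from the largest ordinal the taxonomy can produce and writes no header. Variant (2) picks bitsPerValue per document and, in this hypothetical layout, stores it in a one-byte header. The numbers are made up for illustration: a taxonomy whose largest ordinal is 1000, and a document whose encoded d-gaps are 3, 4, 1 and 12.

```java
public class VariantSizeSketch {
    /** Bits needed for a non-negative value (at least 1). */
    static int bitsRequired(long maxValue) {
        return Math.max(1, 64 - Long.numberOfLeadingZeros(maxValue));
    }

    /** Variant (1): bpv fixed up front from the taxonomy's largest ordinal; no per-document header. */
    static int fixedBpvBytes(int[] docValues, int maxTaxonomyOrdinal) {
        int bpv = bitsRequired(maxTaxonomyOrdinal); // d-gaps never exceed the largest ordinal
        return (docValues.length * bpv + 7) / 8;
    }

    /** Variant (2): bpv chosen per document, stored in a (hypothetical) one-byte header. */
    static int perDocBpvBytes(int[] docValues) {
        int bpv = 1;
        for (int v : docValues) {
            bpv = Math.max(bpv, bitsRequired(v));
        }
        return 1 + (docValues.length * bpv + 7) / 8;
    }

    public static void main(String[] args) {
        int[] docGaps = {3, 4, 1, 12};
        System.out.println(fixedBpvBytes(docGaps, 1000)); // 5 (4 values x 10 bits = 40 bits)
        System.out.println(perDocBpvBytes(docGaps));      // 3 (1 header byte + 4 x 4 bits)
    }
}
```

In this toy model, the per-document header pays off only when a document's values are noticeably narrower than the taxonomy-wide bitsPerValue. For documents with very few values, the extra header byte can outweigh the savings.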

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
