[jira] Commented: (LUCENE-1990) Add unsigned packed int impls in oal.util

Paul Elschot (JIRA) Wed, 20 Jan 2010 05:31:19 -0800

    [ 
https://issues.apache.org/jira/browse/LUCENE-1990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12802829#action_12802829
 ]


Paul Elschot commented on LUCENE-1990:
--------------------------------------

I've made a remark at LUCENE-1410 (a first attempt at a PFOR implementation) 
about the header structure for encoding this.
One thing that is not covered here is how to deal with input arrays of 
intermediate length, shorter than 32 but longer than 3 or 4. Shorter 
ones can easily be encoded as vByte.
Simple9 might be a solution, but it has only 28 data bits and 9 different 
encoding cases, so it appears to be somewhat small.
There is a first attempt at Simple9 at LUCENE-2189.
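
For readers unfamiliar with Simple9, the following is a minimal sketch of the 
general technique, not the code attached to LUCENE-2189. It assumes the usual 
layout of 4 selector bits plus 28 data bits per int, with the nine cases 
28x1, 14x2, 9x3, 7x4, 5x5, 4x7, 3x9, 2x14 and 1x28 (values x bits). The class 
name Simple9Sketch and its methods are hypothetical, chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class Simple9Sketch {
    // The nine cases: how many values share the 28 data bits, and bits per value.
    static final int[] COUNT = {28, 14, 9, 7, 5, 4, 3, 2, 1};
    static final int[] BITS  = { 1,  2, 3, 4, 5, 7, 9, 14, 28};

    /** Packs non-negative values below 2^28 into ints (selector in the top 4 bits). */
    public static int[] encode(int[] values) {
        List<Integer> out = new ArrayList<>();
        int pos = 0;
        while (pos < values.length) {
            boolean packed = false;
            // Try the densest case first; fall back to wider cases as needed.
            for (int sel = 0; sel < COUNT.length && !packed; sel++) {
                int n = COUNT[sel];
                int bits = BITS[sel];
                if (pos + n > values.length) {
                    continue; // not enough values left to fill this case
                }
                boolean fits = true;
                for (int i = 0; i < n; i++) {
                    if ((values[pos + i] >>> bits) != 0) {
                        fits = false;
                        break;
                    }
                }
                if (!fits) {
                    continue;
                }
                int word = sel << 28;
                for (int i = 0; i < n; i++) {
                    word |= values[pos + i] << (i * bits);
                }
                out.add(word);
                pos += n;
                packed = true;
            }
            if (!packed) {
                throw new IllegalArgumentException(
                    "value does not fit in 28 bits: " + values[pos]);
            }
        }
        int[] result = new int[out.size()];
        for (int i = 0; i < result.length; i++) {
            result[i] = out.get(i);
        }
        return result;
    }

    /** Unpacks exactly total values from words produced by encode. */
    public static int[] decode(int[] words, int total) {
        int[] values = new int[total];
        int pos = 0;
        for (int word : words) {
            int sel = word >>> 28;
            int n = COUNT[sel];
            int bits = BITS[sel];
            int mask = (1 << bits) - 1;
            for (int i = 0; i < n && pos < total; i++) {
                values[pos++] = (word >>> (i * bits)) & mask;
            }
        }
        return values;
    }
}
```

Every encoded word is exactly one int, so the output is int aligned by 
construction. The sketch is greedy and makes no attempt to choose the best 
split when few values remain.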

Since the discussion here is on alignment (int/long), I'm wondering how (and 
whether) to go from the current byte aligned structures to int aligned ones. Using 
aligned ints would save the shifting done in IndexInput.readInt(), which reads 4 
bytes and shifts them into place to create an int from them.
Simple9 can be int aligned and I'd like to add bigger variations of that, but 
preferably only ones that add a multiple of 4 bytes.

So would it make sense to add functionality to IndexInput and IndexOutput to allow 
int aligned access?
Are Java's data streams and/or nio buffers smart enough to avoid the byte 
shifting for ints in such cases?
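
To make the two access styles concrete, here is a small sketch that contrasts 
assembling an int from 4 bytes by shifting with reading it through a 
java.nio IntBuffer view. The class and method names are hypothetical, and 
big-endian order is assumed so that both methods agree. The sketch only shows 
the two APIs side by side; it does not answer whether the JVM actually avoids 
the shifting for the buffer view.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.IntBuffer;

public class IntReadSketch {
    /** Builds a big-endian int from 4 bytes by masking and shifting each byte. */
    public static int readIntByShifting(byte[] b, int offset) {
        return ((b[offset] & 0xFF) << 24)
             | ((b[offset + 1] & 0xFF) << 16)
             | ((b[offset + 2] & 0xFF) << 8)
             |  (b[offset + 3] & 0xFF);
    }

    /**
     * Reads the int at position intIndex through an IntBuffer view of the
     * same bytes. This only works for int aligned data: the byte offset is
     * intIndex * 4.
     */
    public static int readIntViaView(byte[] b, int intIndex) {
        IntBuffer ints = ByteBuffer.wrap(b)
                                   .order(ByteOrder.BIG_ENDIAN)
                                   .asIntBuffer();
        return ints.get(intIndex);
    }
}
```

For int aligned data, readIntByShifting(b, 4 * i) and readIntViaView(b, i) 
return the same value; which one is faster would have to be measured.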


> Add unsigned packed int impls in oal.util
> -----------------------------------------
>
>                 Key: LUCENE-1990
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1990
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>            Reporter: Michael McCandless
>            Priority: Minor
>         Attachments: LUCENE-1990.patch, 
> LUCENE-1990_PerformanceMeasurements20100104.zip
>
>
> There are various places in Lucene that could take advantage of an
> efficient packed unsigned int/long impl.  EG the terms dict index in
> the standard codec in LUCENE-1458 could substantially reduce its RAM
> usage.  FieldCache.StringIndex could as well.  And I think "load into
> RAM" codecs like the one in TestExternalCodecs could use this too.
> I'm picturing something very basic like:
> {code}
> interface PackedUnsignedLongs  {
>   long get(long index);
>   void set(long index, long value);
> }
> {code}
> Plus maybe an iterator for getting and maybe also for setting.  If it
> helps, most of the usages of this inside Lucene will be "write once"
> so eg the set could make that an assumption/requirement.
> And a factory somewhere:
> {code}
>   PackedUnsignedLongs create(int count, long maxValue);
> {code}
> I think we should simply autogen the code (we can start from the
> autogen code in LUCENE-1410), or, if there is a good existing impl
> that has a compatible license that'd be great.
> I don't have time near-term to do this... so if anyone has the itch,
> please jump!
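
As an illustration of the interface proposed in the description above, the 
following is a minimal sketch of a bit-packed array backed by a long[]. It is 
not the attached patch. The class name PackedLongsSketch is hypothetical, the 
constructor stands in for the proposed create(count, maxValue) factory, and 
maxValue is assumed to be non-negative, so at most 63 bits per value are used.

```java
public class PackedLongsSketch {
    private final long[] blocks;     // packed storage, 64 bits per block
    private final int bitsPerValue;  // derived from maxValue
    private final long mask;         // low bitsPerValue bits set

    public PackedLongsSketch(int count, long maxValue) {
        if (count < 0 || maxValue < 0) {
            throw new IllegalArgumentException("count and maxValue must be >= 0");
        }
        // Bits needed for maxValue; use at least 1 so that 0 is storable.
        this.bitsPerValue = Math.max(1, 64 - Long.numberOfLeadingZeros(maxValue));
        this.mask = (1L << bitsPerValue) - 1;
        long totalBits = (long) count * bitsPerValue;
        this.blocks = new long[(int) ((totalBits + 63) >>> 6)];
    }

    public long get(long index) {
        long bitPos = index * bitsPerValue;
        int block = (int) (bitPos >>> 6);
        int shift = (int) (bitPos & 63);
        long value = blocks[block] >>> shift;
        if (shift + bitsPerValue > 64) {
            // The value spills into the next block; fetch its high bits.
            value |= blocks[block + 1] << (64 - shift);
        }
        return value & mask;
    }

    public void set(long index, long value) {
        long bitPos = index * bitsPerValue;
        int block = (int) (bitPos >>> 6);
        int shift = (int) (bitPos & 63);
        long v = value & mask;
        blocks[block] = (blocks[block] & ~(mask << shift)) | (v << shift);
        if (shift + bitsPerValue > 64) {
            int spill = 64 - shift; // number of bits stored in the first block
            blocks[block + 1] =
                (blocks[block + 1] & ~(mask >>> spill)) | (v >>> spill);
        }
    }

    public int getBitsPerValue() {
        return bitsPerValue;
    }
}
```

With maxValue = 1000, for example, each value takes 10 bits instead of 64. 
Values may straddle two blocks here, which keeps the storage compact at the 
cost of an extra branch in get and set.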

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


