[jira] Commented: (LUCENE-1990) Add unsigned packed int impls in oal.util

Michael McCandless (JIRA) Wed, 13 Jan 2010 04:10:21 -0800

    [ 
https://issues.apache.org/jira/browse/LUCENE-1990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12799707#action_12799707
 ]


Michael McCandless commented on LUCENE-1990:
--------------------------------------------

How about something like this API, for writing packed ints:

{code}
import java.io.IOException;

abstract class Writer {
  public abstract void add(long v) throws IOException;
  public abstract void finish() throws IOException;
}
{code}
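Here is a minimal sketch of what a fully packed Writer could look like. It makes some assumptions. java.io.DataOutput stands in for IndexOutput. The layout is least-significant-bit-first, with value i occupying bits i*nbits through (i+1)*nbits-1 of a stream of 64-bit blocks. PackedWriter and this layout are hypothetical, not an agreed format:

```java
import java.io.DataOutput;
import java.io.IOException;

abstract class Writer {
  public abstract void add(long v) throws IOException;
  public abstract void finish() throws IOException;
}

// Hypothetical "Packed" writer: value i occupies bits [i*nbits, (i+1)*nbits)
// of a stream of 64-bit blocks, least significant bit first, so a value may
// straddle two blocks. DataOutput is an assumed stand-in for IndexOutput.
class PackedWriter extends Writer {
  private final DataOutput out;
  private final int bitsPerValue;
  private long current; // block currently being filled
  private int used;     // low bits of current already filled (0..63)

  PackedWriter(DataOutput out, int bitsPerValue) {
    if (bitsPerValue < 1 || bitsPerValue > 64) {
      throw new IllegalArgumentException("bitsPerValue must be in 1..64");
    }
    this.out = out;
    this.bitsPerValue = bitsPerValue;
  }

  @Override
  public void add(long v) throws IOException {
    if (bitsPerValue < 64 && (v >>> bitsPerValue) != 0) {
      throw new IllegalArgumentException("value does not fit in " + bitsPerValue + " bits");
    }
    current |= v << used;
    int free = 64 - used;
    if (bitsPerValue >= free) {
      // Block is full: flush it and carry the value's remaining high bits over.
      out.writeLong(current);
      current = (free == 64) ? 0L : v >>> free; // Java masks shift counts to 0..63
      used = bitsPerValue - free;
    } else {
      used += bitsPerValue;
    }
  }

  @Override
  public void finish() throws IOException {
    if (used > 0) {
      out.writeLong(current); // flush the partially filled last block
      current = 0L;
      used = 0;
    }
  }
}
```

With nbits=4, writing 1, 2, 3 produces the single block 0x321. With nbits=7, the tenth value starts at bit 63, so it straddles the first and second blocks.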

then a factory:

{code}
enum Mode {Packed, Aligned, FixedArray}

public static Writer getWriter(IndexOutput out, int valueCount,
                               long maxValue, Mode mode);
{code}

(we can iterate on the names... always the hardest part).
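On the factory side, nbits would presumably be derived from maxValue. Here is a sketch, where bitsRequired is a hypothetical helper name that treats maxValue as unsigned:

```java
// Hypothetical helper: the smallest nbits (1..64) that can hold every value
// in [0, maxValue], with maxValue interpreted as an unsigned 64-bit number.
final class PackedBits {
  private PackedBits() {}

  static int bitsRequired(long maxValue) {
    // Long.numberOfLeadingZeros(0) == 64, so clamp to at least 1 bit.
    return Math.max(1, 64 - Long.numberOfLeadingZeros(maxValue));
  }
}
```

For example, maxValue=255 needs 8 bits, 256 needs 9, and -1L (all bits set, read as unsigned) needs 64.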

Packed means full bit packing (most space-efficient, but slowest
decode time). Aligned might waste some bits: e.g. nbits=4 is
naturally aligned, but for nbits=7 only 9 values fit in a long, so
we'd waste 1 bit per long. FixedArray (which would use byte[],
short[], int[] or long[]) would potentially waste the most bits but
have the fastest decode.
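To make the tradeoff concrete, here is a rough size estimate per mode. It assumes 64-bit blocks for Packed and Aligned and the narrowest primitive array for FixedArray, and it ignores array headers. The ModeSizes name is hypothetical:

```java
// Hypothetical size estimates (in bytes) for valueCount values of nbits each.
final class ModeSizes {
  private ModeSizes() {}

  // Packed: values laid back to back, so only the final long has slack.
  static long packedBytes(int valueCount, int nbits) {
    long bits = (long) valueCount * nbits;
    return ((bits + 63) / 64) * 8;
  }

  // Aligned: floor(64 / nbits) whole values per long; no value spans two longs.
  static long alignedBytes(int valueCount, int nbits) {
    int perLong = 64 / nbits;
    return ((valueCount + (long) perLong - 1) / perLong) * 8;
  }

  // FixedArray: one byte/short/int/long slot per value, whichever is narrowest.
  static long fixedArrayBytes(int valueCount, int nbits) {
    int width = nbits <= 8 ? 1 : nbits <= 16 ? 2 : nbits <= 32 ? 4 : 8;
    return (long) valueCount * width;
  }
}
```

For 1000 values at nbits=7, this gives 880 bytes for Packed, 896 for Aligned and 1000 for FixedArray (byte[]). At nbits=4, Packed and Aligned both come to 504 bytes, since 4 divides 64.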

If nbits happens to be 8, 16, 32 or 64, the factory should just always
use FixedArray, I think?  And of course powers of two will automatically
be Aligned (with the per-nbits specialized code).
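The selection rule above might look roughly like the following. effectiveMode is a hypothetical name. The rule relies on 1, 2 and 4 dividing 64 evenly, so a Packed layout for those widths never splits a value across longs and is identical to Aligned:

```java
enum Mode {Packed, Aligned, FixedArray}

final class ModeChooser {
  private ModeChooser() {}

  // Hypothetical factory rule: byte/short/int/long widths always use
  // FixedArray; the remaining powers of two (1, 2, 4) make Packed == Aligned.
  static Mode effectiveMode(Mode requested, int nbits) {
    if (nbits == 8 || nbits == 16 || nbits == 32 || nbits == 64) {
      return Mode.FixedArray;
    }
    if (requested == Mode.Packed && Integer.bitCount(nbits) == 1) {
      return Mode.Aligned;
    }
    return requested;
  }
}
```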

We can also default impls to an underlying int[] vs long[] backing
store depending on whether the JRE is 64 or 32 bit, and on nbits.  If
the JRE is 32 bit but nbits is over 32, I think we just use long[] backing.
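Here is a sketch of one way to encode that backing-store rule. The property-based 64-bit detection is a heuristic assumption, since sun.arch.data.model is HotSpot-specific and may be missing. Both method names are hypothetical:

```java
final class Backing {
  private Backing() {}

  // Heuristic JVM bitness check (assumption): "sun.arch.data.model" is a
  // HotSpot-specific property and may be absent, so fall back to os.arch.
  static boolean jreIs64Bit() {
    String model = System.getProperty("sun.arch.data.model");
    if (model != null) {
      return model.contains("64");
    }
    String arch = System.getProperty("os.arch");
    return arch != null && arch.contains("64");
  }

  // long[] on a 64-bit JRE; on a 32-bit JRE prefer int[] unless a single
  // value would not fit in 32 bits.
  static boolean useLongBacking(boolean jre64, int nbits) {
    return jre64 || nbits > 32;
  }
}
```

A caller would then do something like `Backing.useLongBacking(Backing.jreIs64Bit(), nbits)` when picking the impl.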

For reading, a similar API:

{code}
abstract class Reader {
  public abstract long get(int index);
}

public static Reader getReader(IndexInput in);
{code}
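Here is a sketch of the matching decode for the Packed case. It assumes the blocks have already been read from the IndexInput into a long[]. It also assumes a least-significant-bit-first layout in which value i starts at bit i*nbits. PackedReader and its pack test helper are hypothetical:

```java
abstract class Reader {
  public abstract long get(int index);
}

// Hypothetical Packed reader over blocks already loaded into a long[].
class PackedReader extends Reader {
  private final long[] blocks;
  private final int bitsPerValue;
  private final long mask;

  PackedReader(long[] blocks, int bitsPerValue) {
    this.blocks = blocks;
    this.bitsPerValue = bitsPerValue;
    this.mask = bitsPerValue == 64 ? -1L : (1L << bitsPerValue) - 1;
  }

  @Override
  public long get(int index) {
    long bitPos = (long) index * bitsPerValue;
    int block = (int) (bitPos >>> 6); // which 64-bit block
    int shift = (int) (bitPos & 63);  // offset inside that block
    long value = blocks[block] >>> shift;
    int bitsInFirst = 64 - shift;
    if (bitsInFirst < bitsPerValue) {
      // The value straddles two blocks: pull its high bits from the next one.
      value |= blocks[block + 1] << bitsInFirst;
    }
    return value & mask;
  }

  // Test helper: packs values (each < 2^bitsPerValue) into the same layout.
  static long[] pack(long[] values, int bitsPerValue) {
    long[] blocks = new long[(int) (((long) values.length * bitsPerValue + 63) >>> 6)];
    for (int i = 0; i < values.length; i++) {
      long bitPos = (long) i * bitsPerValue;
      int block = (int) (bitPos >>> 6);
      int shift = (int) (bitPos & 63);
      blocks[block] |= values[i] << shift;
      int bitsInFirst = 64 - shift;
      if (bitsInFirst < bitsPerValue) {
        blocks[block + 1] |= values[i] >>> bitsInFirst;
      }
    }
    return blocks;
  }
}
```

Each get costs a shift and a mask, plus a second block read only when a value straddles a block boundary. That extra read is presumably part of why Packed decodes more slowly than Aligned.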


> Add unsigned packed int impls in oal.util
> -----------------------------------------
>
>                 Key: LUCENE-1990
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1990
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>            Reporter: Michael McCandless
>            Priority: Minor
>         Attachments: LUCENE-1990_PerformanceMeasurements20100104.zip
>
>
> There are various places in Lucene that could take advantage of an
> efficient packed unsigned int/long impl.  EG the terms dict index in
> the standard codec in LUCENE-1458 could substantially reduce its RAM
> usage.  FieldCache.StringIndex could as well.  And I think "load into
> RAM" codecs like the one in TestExternalCodecs could use this too.
> I'm picturing something very basic like:
> {code}
> interface PackedUnsignedLongs  {
>   long get(long index);
>   void set(long index, long value);
> }
> {code}
> Plus maybe an iterator for getting and maybe also for setting.  If it
> helps, most of the usages of this inside Lucene will be "write once"
> so eg the set could make that an assumption/requirement.
> And a factory somewhere:
> {code}
>   PackedUnsignedLongs create(int count, long maxValue);
> {code}
> I think we should simply autogen the code (we can start from the
> autogen code in LUCENE-1410), or, if there is a good existing impl
> that has a compatible license that'd be great.
> I don't have time near-term to do this... so if anyone has the itch,
> please jump!
