[ https://issues.apache.org/jira/browse/LUCENE-1990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12799707#action_12799707 ]
Michael McCandless commented on LUCENE-1990:
--------------------------------------------

How about something like this API, for writing packed ints:

{code}
abstract class Writer {
  public abstract void add(long v) throws IOException;
  public abstract void finish() throws IOException;
}
{code}

then a factory:

{code}
enum Mode {Packed, Aligned, FixedArray};
public static Writer getWriter(IndexOutput out, int valueCount, long maxValue, Mode mode);
{code}

(we can iterate on the names... always the hardest part).

Packed means full bit packing (most space efficient, but slowest decode time). Aligned might waste some bits (eg for nbits=4, that's naturally aligned, but for nbits=7, we'd waste 1 bit per long). FixedArray (which'd use byte[], short[], int[], long[]) would potentially waste the most bits but have the fastest decode.

If nbits happens to be 8, 16, 32, 64, the factory should just always use FixedArray I think? And of course powers of two will automatically be Aligned (with the per-nbits specialized code).

We can also default impls to an underlying int[] vs long[] backing store depending on 64/32 bit jre, and nbits. If jre is 32 bit but nbits is > 32 I think we just use long[] backing.

For reading, a similar API:

{code}
abstract class Reader {
  public abstract long get(int index);
}
public static Reader getReader(IndexInput in);
{code}

> Add unsigned packed int impls in oal.util
> -----------------------------------------
>
>                 Key: LUCENE-1990
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1990
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>            Reporter: Michael McCandless
>            Priority: Minor
>         Attachments: LUCENE-1990_PerformanceMeasurements20100104.zip
>
>
> There are various places in Lucene that could take advantage of an
> efficient packed unsigned int/long impl. EG the terms dict index in
> the standard codec in LUCENE-1458 could substantially reduce its RAM
> usage. FieldCache.StringIndex could as well.
> And I think "load into RAM" codecs like the one in
> TestExternalCodecs could use this too.
> I'm picturing something very basic like:
> {code}
> interface PackedUnsignedLongs {
>   long get(long index);
>   void set(long index, long value);
> }
> {code}
> Plus maybe an iterator for getting and maybe also for setting. If it
> helps, most of the usages of this inside Lucene will be "write once"
> so eg the set could make that an assumption/requirement.
> And a factory somewhere:
> {code}
> PackedUnsignedLongs create(int count, long maxValue);
> {code}
> I think we should simply autogen the code (we can start from the
> autogen code in LUCENE-1410), or, if there is a good existing impl
> that has a compatible license that'd be great.
> I don't have time near-term to do this... so if anyone has the itch,
> please jump!
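
To make the three modes from the comment above a bit more concrete, here is a minimal Java sketch. It is not the patch on this issue and not Lucene's API: the names PackedIntsSketch, bitsRequired, alignedWastePerLong, fitsFixedArray and Packed are all hypothetical, and it assumes an in-memory long[] backing store instead of IndexOutput/IndexInput. It only shows how nbits might be derived from maxValue, how many bits an Aligned layout would leave unused per long, and how a fully Packed layout has to handle values that straddle two longs.

```java
// Hypothetical sketch only; none of these names come from Lucene or the issue.
final class PackedIntsSketch {

    /** Bits needed to represent every value in [0, maxValue]; never less than 1. */
    static int bitsRequired(long maxValue) {
        if (maxValue < 0) {
            throw new IllegalArgumentException("maxValue must be >= 0");
        }
        return Math.max(1, 64 - Long.numberOfLeadingZeros(maxValue));
    }

    /** Bits left unused in each long when values never cross a long boundary ("Aligned"). */
    static int alignedWastePerLong(int nbits) {
        return 64 % nbits;
    }

    /** True when byte[], short[], int[] or long[] holds the values with no wasted bits. */
    static boolean fitsFixedArray(int nbits) {
        return nbits == 8 || nbits == 16 || nbits == 32 || nbits == 64;
    }

    /** Fully packed store ("Packed"): a value may straddle two adjacent longs. */
    static final class Packed {
        private final long[] blocks;
        private final int nbits;   // at most 63, since maxValue is a non-negative long
        private final long mask;   // low nbits set

        Packed(int valueCount, long maxValue) {
            nbits = bitsRequired(maxValue);
            mask = (1L << nbits) - 1;
            long totalBits = (long) valueCount * nbits;
            blocks = new long[(int) ((totalBits + 63) >>> 6)];
        }

        void set(int index, long value) {
            long bitPos = (long) index * nbits;
            int block = (int) (bitPos >>> 6);
            int shift = (int) (bitPos & 63);
            long v = value & mask;
            // Clear the slot in the first long, then write the low part of the value.
            blocks[block] = (blocks[block] & ~(mask << shift)) | (v << shift);
            int spill = shift + nbits - 64; // number of bits that fall into the next long
            if (spill > 0) {
                int kept = nbits - spill;   // bits that stayed in the first long
                long highMask = mask >>> kept;
                blocks[block + 1] = (blocks[block + 1] & ~highMask) | (v >>> kept);
            }
        }

        long get(int index) {
            long bitPos = (long) index * nbits;
            int block = (int) (bitPos >>> 6);
            int shift = (int) (bitPos & 63);
            long v = blocks[block] >>> shift;
            int spill = shift + nbits - 64;
            if (spill > 0) {
                // Pull the remaining high bits from the next long.
                v |= blocks[block + 1] << (nbits - spill);
            }
            return v & mask;
        }
    }
}
```

With maxValue=100 this gives nbits=7, and alignedWastePerLong(7) comes out as the 1 bit per long mentioned in the comment. The extra branch in get and set for straddling values is the decode cost that the Aligned and FixedArray modes would avoid.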