[ 
https://issues.apache.org/jira/browse/LUCENE-1990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12802780#action_12802780
 ] 

Michael McCandless commented on LUCENE-1990:
--------------------------------------------

bq. Introducing yet another level of indirection and making a 
byte/short/int/long-provider detached from the implementation of the packed 
values is tempting.

You mean the layer that stores the minValue, so that the full range is
supported?  I actually think we should absorb that into packed ints,
so it's only one method call per lookup, and specialize the "positive
only" cases to avoid the extra add per lookup.
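
To make that concrete, here is a rough sketch of what I mean by absorbing
minValue and specializing the positive-only case.  All names here
(IntReader, DirectReader, OffsetReader, Readers.of) are made up for
illustration and are not from the patch, and a plain long[] stands in for
the real packed storage:

```java
// Hypothetical names throughout; this is not the API from the LUCENE-1990 patch.
interface IntReader {
  long get(int index);
}

/** "Positive only" case: values are stored as-is, so a lookup needs no add. */
final class DirectReader implements IntReader {
  private final long[] values; // stand-in for a real packed store

  DirectReader(long[] values) {
    this.values = values;
  }

  public long get(int index) {
    return values[index];
  }
}

/** Full-range case: stores (value - minValue) and adds minValue back per lookup. */
final class OffsetReader implements IntReader {
  private final long[] deltas; // stand-in for a real packed store
  private final long minValue;

  OffsetReader(long[] deltas, long minValue) {
    this.deltas = deltas;
    this.minValue = minValue;
  }

  public long get(int index) {
    return minValue + deltas[index];
  }
}

final class Readers {
  /** Picks the specialized reader; either way it is one method call per lookup. */
  static IntReader of(long[] values) {
    long min = Long.MAX_VALUE;
    for (long v : values) {
      min = Math.min(min, v);
    }
    if (values.length == 0 || min >= 0) {
      return new DirectReader(values.clone());
    }
    // Assumes (max - min) fits in 63 bits; a real impl would have to check this.
    long[] deltas = new long[values.length];
    for (int i = 0; i < values.length; i++) {
      deltas[i] = values[i] - min;
    }
    return new OffsetReader(deltas, min);
  }
}
```

The caller only ever sees IntReader.get, so the extra add is paid only when
negative values are actually present.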

With that fix, it's still a method call per lookup, but I don't see
how we can get away from that, unless we allow for exposure of the raw
array for the no-packing cases (which we could consider...).

Remember we use packed ints in places where we can accept some loss of
CPU perf. for improvements in RAM usage (see the comment I just added
to LUCENE-2186).

bq.  However, as the Reader must have (fast) random access, wouldn't it make 
sense to make it possible to update values?

Yeah, we do eventually want CSF to be updateable, but I don't think we
need this for phase 1?  Likewise, I think all we need now for Lucene
is a "WriteOnceWriter", not a "RandomAccessWriter".  Ie, you open a
writer, you add (sequentially) all values, you close.
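
Something along these lines is what I have in mind, as a minimal sketch
only.  The class name, constructor arguments and the in-memory long[]
target are my assumptions; a real writer would presumably stream to an
IndexOutput instead:

```java
// Hypothetical sketch of a write-once packed writer; not the API from the patch.
final class WriteOnceWriter {
  private final long[] blocks;
  private final int bitsPerValue;
  private final int valueCount;
  private int written = 0;
  private boolean closed = false;

  WriteOnceWriter(int valueCount, int bitsPerValue) {
    if (bitsPerValue < 1 || bitsPerValue > 64) {
      throw new IllegalArgumentException("bitsPerValue must be in 1..64");
    }
    this.valueCount = valueCount;
    this.bitsPerValue = bitsPerValue;
    long totalBits = (long) valueCount * bitsPerValue;
    this.blocks = new long[(int) ((totalBits + 63) >>> 6)];
  }

  /** Appends the next value; there is no index argument, so no random access. */
  void add(long value) {
    if (closed) {
      throw new IllegalStateException("writer is closed");
    }
    if (written == valueCount) {
      throw new IllegalStateException("all values already written");
    }
    if (bitsPerValue < 64 && (value >>> bitsPerValue) != 0) {
      throw new IllegalArgumentException("value does not fit in bitsPerValue");
    }
    long bitPos = (long) written * bitsPerValue;
    int block = (int) (bitPos >>> 6);
    int shift = (int) (bitPos & 63);
    blocks[block] |= value << shift;
    int spill = shift + bitsPerValue - 64; // bits that overflow into the next long
    if (spill > 0) {
      blocks[block + 1] |= value >>> (bitsPerValue - spill);
    }
    written++;
  }

  /** Finishes the write and hands back the packed bits. */
  long[] close() {
    if (written != valueCount) {
      throw new IllegalStateException("expected " + valueCount + " values, got " + written);
    }
    closed = true;
    return blocks;
  }
}
```

Because values only ever arrive in order, add never has to clear bits that
were written earlier, which is part of what keeps this simpler than a
random-access writer.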

bq. ...should the index also be a long?

I would stick with int now (we are doing this for Lucene, whose docIDs
are still ints...).  Design for today.

bq. The whole 32bit vs. 64bit as backing array does present a bit of a problem 
with persistence. We'll be in a situation where the index will be optimized for 
the architecture used for building, not the one used for searching. Leaving the 
option of a future mmap open means that it is not possible to do a conversion 
when retrieving the bits, so I have no solution for this (other than doing 
memory-only).

I'm confused -- a future mmap impl shouldn't put pressure on the file
format used by packed ints today?  Ie, a future mmap impl can use a
totally different format than the designed-to-be-slurped-into-RAM
format for packed ints, today?

Also, what do you mean by optimized for building not searching?

Note that on 32 bit machines, if there is actually a gain, we can make
a backing store with ints yet still allow for storage of nbits>32?  It
"just" means a value may be split across 2 or 3 values?
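
Roughly like this, as an untuned sketch (the class name IntBackedPacked
and its methods are hypothetical, and whether this actually beats a long[]
backing store on 32 bit JREs would need measuring):

```java
// Hypothetical sketch: packed values of up to 64 bits stored in an int[].
final class IntBackedPacked {
  private final int[] blocks;
  private final int bitsPerValue;

  IntBackedPacked(int valueCount, int bitsPerValue) {
    if (bitsPerValue < 1 || bitsPerValue > 64) {
      throw new IllegalArgumentException("bitsPerValue must be in 1..64");
    }
    this.bitsPerValue = bitsPerValue;
    long totalBits = (long) valueCount * bitsPerValue;
    this.blocks = new int[(int) ((totalBits + 31) >>> 5)];
  }

  /** Stores the low bitsPerValue bits of value, spilling across ints as needed. */
  void set(int index, long value) {
    long bitPos = (long) index * bitsPerValue;
    int block = (int) (bitPos >>> 5);
    int offset = (int) (bitPos & 31);
    int remaining = bitsPerValue;
    while (remaining > 0) {
      int take = Math.min(32 - offset, remaining);
      int mask = take == 32 ? -1 : (1 << take) - 1;
      int chunk = (int) value & mask;
      // Clear the target bits first so that overwriting an old value works.
      blocks[block] = (blocks[block] & ~(mask << offset)) | (chunk << offset);
      value >>>= take;
      remaining -= take;
      block++;
      offset = 0;
    }
  }

  /** Reassembles a value from the 1 to 3 ints it was split across. */
  long get(int index) {
    long bitPos = (long) index * bitsPerValue;
    int block = (int) (bitPos >>> 5);
    int offset = (int) (bitPos & 31);
    long result = 0;
    int done = 0;
    while (done < bitsPerValue) {
      int take = Math.min(32 - offset, bitsPerValue - done);
      long mask = (1L << take) - 1;
      long chunk = (blocks[block] >>> offset) & mask;
      result |= chunk << done;
      done += take;
      block++;
      offset = 0;
    }
    return result;
  }
}
```

With nbits=35, for example, the value at index 10 starts at bit 350, so it
takes 2 bits from one int, all 32 of the next, and 1 bit of a third.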


> Add unsigned packed int impls in oal.util
> -----------------------------------------
>
>                 Key: LUCENE-1990
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1990
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>            Reporter: Michael McCandless
>            Priority: Minor
>         Attachments: LUCENE-1990.patch, 
> LUCENE-1990_PerformanceMeasurements20100104.zip
>
>
> There are various places in Lucene that could take advantage of an
> efficient packed unsigned int/long impl.  EG the terms dict index in
> the standard codec in LUCENE-1458 could substantially reduce its RAM
> usage.  FieldCache.StringIndex could as well.  And I think "load into
> RAM" codecs like the one in TestExternalCodecs could use this too.
> I'm picturing something very basic like:
> {code}
> interface PackedUnsignedLongs  {
>   long get(long index);
>   void set(long index, long value);
> }
> {code}
> Plus maybe an iterator for getting and maybe also for setting.  If it
> helps, most of the usages of this inside Lucene will be "write once"
> so eg the set could make that an assumption/requirement.
> And a factory somewhere:
> {code}
>   PackedUnsignedLongs create(int count, long maxValue);
> {code}
> I think we should simply autogen the code (we can start from the
> autogen code in LUCENE-1410), or, if there is a good existing impl
> that has a compatible license that'd be great.
> I don't have time near-term to do this... so if anyone has the itch,
> please jump!
