[ 
https://issues.apache.org/jira/browse/LUCENE-1990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12795861#action_12795861
 ] 

Toke Eskildsen commented on LUCENE-1990:
----------------------------------------

The first section is for 1M values in the structure; the second is for 10M. As 
the CPU on the test machine (Intel T2400) has only 2MB of level 2 cache, the 
longer processing time for what is nominally the same work is an effect of 
more cache misses.

Caching also explains why the packed version is sometimes faster than the 
aligned one. For values that need 9 or 17 bits, the aligned version uses 16 
and 32 bits respectively. In the case with 1M values, the packed version uses 
1.1MB and 2.1MB for 9 and 17 bits respectively, while the aligned version uses 
2MB and 4MB. The simpler logic of the aligned version does not compensate for 
the extra trips to main memory.

I did not generate any specialized code for the aligned case: no matter the 
number of bits/value, the number of shifts, masks and ORs is always the same. 
If the number of bits/value is known beforehand, specialized implementations 
should be used (I made a factory that selects between packed, aligned and 
direct (#3), depending on the number of bits/value). The reason for not doing 
so in the first place is that I wanted the structure to auto-adjust its 
bits/value when a new value is added. Encapsulating different implementations 
in the same class means another level of indirection or more conditionals, 
both of which I wanted to avoid for performance reasons. That said, I haven't 
measured how large that penalty would be.
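To make the packed/aligned distinction concrete, here is a minimal sketch of both layouts on an int[] backing array. It is not the code from the patch: the class names are hypothetical, the aligned variant is assumed to round the width up to a power of two so that no value straddles two ints, and the packed variant always reads a two-int window instead of branching. The generic aligned get/set performs the same shift-and-mask steps for every width; a width-specialized version would replace the computed shift and mask with constants.

```java
/** Hypothetical sketch: packed vs. aligned unsigned storage on an int[] backing array. */
public class PackedVsAligned {

    /** Packed: values of 1..32 bits laid out back to back; a value may straddle two ints. */
    static final class Packed {
        private final int bits;
        private final long mask;
        private final int[] blocks;

        Packed(int count, int bits) {
            if (bits < 1 || bits > 32) throw new IllegalArgumentException("bits must be 1..32");
            this.bits = bits;
            this.mask = (1L << bits) - 1;
            // One spare int so that reading a two-int window never runs off the end.
            this.blocks = new int[(int) (((long) count * bits + 31) >>> 5) + 1];
        }

        /** Two adjacent ints combined into one unsigned 64-bit window. */
        private long window(int block) {
            return (blocks[block] & 0xFFFFFFFFL) | ((long) blocks[block + 1] << 32);
        }

        long get(int index) {
            long bitPos = (long) index * bits;
            return (window((int) (bitPos >>> 5)) >>> (bitPos & 31)) & mask;
        }

        void set(int index, long value) {
            long bitPos = (long) index * bits;
            int block = (int) (bitPos >>> 5);
            int shift = (int) (bitPos & 31);
            long w = (window(block) & ~(mask << shift)) | ((value & mask) << shift);
            blocks[block] = (int) w;
            blocks[block + 1] = (int) (w >>> 32);
        }
    }

    /** Aligned: width rounded up to 1, 2, 4, 8, 16 or 32 bits, so no value straddles two ints. */
    static final class Aligned {
        private final int bits;
        private final int mask;
        private final int[] blocks;

        Aligned(int count, int requestedBits) {
            if (requestedBits < 1 || requestedBits > 32) throw new IllegalArgumentException("bits must be 1..32");
            this.bits = requestedBits == 1 ? 1 : Integer.highestOneBit(requestedBits - 1) << 1;
            this.mask = bits == 32 ? -1 : (1 << bits) - 1;
            this.blocks = new int[(int) (((long) count * bits + 31) >>> 5)];
        }

        long get(int index) {
            long bitPos = (long) index * bits;
            // Same shift and mask steps for every width; only one int is ever read.
            return ((blocks[(int) (bitPos >>> 5)] >>> (int) (bitPos & 31)) & mask) & 0xFFFFFFFFL;
        }

        void set(int index, long value) {
            long bitPos = (long) index * bits;
            int block = (int) (bitPos >>> 5);
            int shift = (int) (bitPos & 31);
            blocks[block] = (blocks[block] & ~(mask << shift)) | (((int) value & mask) << shift);
        }
    }
}
```

The packed get always touches two ints, which is why it keeps a spare block at the end; the aligned get touches exactly one, at the cost of the wasted bits computed earlier.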

The standard use case seems to be some sort of update round, after which no 
further updates are performed. A cleanupAndOptimize call that potentially 
creates a new, optimized structure would fit this pattern well and would avoid 
the indirection/conditional penalty.
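A minimal sketch of the update-round-then-optimize pattern, assuming a hypothetical Builder whose cleanupAndOptimize() returns an immutable Reader. To stay short, it stores values as plain longs during the update round (rather than auto-adjusting a packed width) and, at optimization time, picks the narrowest direct array (byte[], short[], int[] or long[]) that holds the largest value. Because the choice is made once, the returned get() contains no width checks.

```java
import java.util.Arrays;

/** Hypothetical sketch of an update round followed by a cleanupAndOptimize() call. */
public class UpdateThenOptimize {

    /** Read-only view handed out after the update round. */
    interface Reader {
        long get(int index);
        int size();
    }

    /** Mutable phase: accepts any non-negative long and remembers the largest one. */
    static final class Builder {
        private long[] values = new long[16];
        private int size;
        private long max;

        void add(long value) {
            if (value < 0) throw new IllegalArgumentException("unsigned values only");
            if (size == values.length) values = Arrays.copyOf(values, size * 2);
            values[size++] = value;
            max = Math.max(max, value);
        }

        /** Picks the narrowest direct array that can hold max; get() is then branch-free. */
        Reader cleanupAndOptimize() {
            final int n = size;
            if (max <= 0xFFL) {
                final byte[] a = new byte[n];
                for (int i = 0; i < n; i++) a[i] = (byte) values[i];
                return new Reader() {
                    public long get(int i) { return a[i] & 0xFFL; }
                    public int size() { return n; }
                };
            }
            if (max <= 0xFFFFL) {
                final short[] a = new short[n];
                for (int i = 0; i < n; i++) a[i] = (short) values[i];
                return new Reader() {
                    public long get(int i) { return a[i] & 0xFFFFL; }
                    public int size() { return n; }
                };
            }
            if (max <= 0xFFFFFFFFL) {
                final int[] a = new int[n];
                for (int i = 0; i < n; i++) a[i] = (int) values[i];
                return new Reader() {
                    public long get(int i) { return a[i] & 0xFFFFFFFFL; }
                    public int size() { return n; }
                };
            }
            final long[] a = Arrays.copyOf(values, n);
            return new Reader() {
                public long get(int i) { return a[i]; }
                public int size() { return n; }
            };
        }
    }
}
```

A packed or aligned Reader could be returned from the same method in the same way; the point is only that the decision happens once, after the last update.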

A whole other matter is longs vs. ints. I've tried using long[] instead of 
int[] as the backing array, and the penalty on my 32-bit processor was very 
high (I need to run more tests on this). If it must be possible to set and get 
longs, it's hard to avoid using long[] as the internal structure; but if ints 
are accepted as the only valid values, selecting long[] as the backing array on 
64-bit machines and int[] on 32-bit machines might be the solution.
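For the case where get and set must handle full longs, here is a sketch of a packed layout on a long[] backing array that supports 1 to 64 bits per value. It is an illustrative assumption, not the patch's implementation; an int[]-backed variant would follow the same pattern with 32-bit blocks, and the relative speed of the two on 32-bit and 64-bit hardware is exactly what would need measuring.

```java
/** Hypothetical sketch: packed unsigned values of 1..64 bits on a long[] backing array. */
public class Packed64Sketch {
    private final int bits;
    private final long mask;
    private final long[] blocks;

    public Packed64Sketch(int count, int bits) {
        if (bits < 1 || bits > 64) throw new IllegalArgumentException("bits must be 1..64");
        this.bits = bits;
        this.mask = bits == 64 ? -1L : (1L << bits) - 1;
        this.blocks = new long[(int) (((long) count * bits + 63) >>> 6)];
    }

    public long get(int index) {
        long bitPos = (long) index * bits;
        int block = (int) (bitPos >>> 6);
        int shift = (int) (bitPos & 63);
        long value = blocks[block] >>> shift;
        int spill = shift + bits - 64;              // number of bits stored in the next block
        if (spill > 0) {
            value |= blocks[block + 1] << (64 - shift);
        }
        return value & mask;
    }

    public void set(int index, long value) {
        long bitPos = (long) index * bits;
        int block = (int) (bitPos >>> 6);
        int shift = (int) (bitPos & 63);
        value &= mask;
        // Low part: clear the target bits in this block, then write them.
        blocks[block] = (blocks[block] & ~(mask << shift)) | (value << shift);
        int spill = shift + bits - 64;
        if (spill > 0) {
            // High part: the remaining 'spill' bits go into the low end of the next block.
            long lowBits = (1L << spill) - 1;
            blocks[block + 1] = (blocks[block + 1] & ~lowBits) | (value >>> (64 - shift));
        }
    }
}
```

The two-block branch is the price of supporting arbitrary widths; with 64-bit blocks it is taken less often than with 32-bit ones, since fewer values straddle a block boundary.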

All this calls for a factory approach to hide the fairly complex task of 
choosing the right implementation.
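A sketch of the decision such a factory would make, e.g. inside the create(count, maxValue) method proposed in the issue description quoted below. The Kind enum, the maxWasteFraction threshold and the selection rule itself are assumptions for illustration; the comment only states that the choice between packed, aligned and direct depends on the number of bits/value.

```java
/** Hypothetical sketch of the policy part of a packed-ints factory. */
public class ImplementationChooser {

    enum Kind { DIRECT, ALIGNED, PACKED }

    static final class Choice {
        final Kind kind;
        final int bitsPerValue;   // storage width actually used per value

        Choice(Kind kind, int bitsPerValue) {
            this.kind = kind;
            this.bitsPerValue = bitsPerValue;
        }

        @Override
        public String toString() {
            return kind + "/" + bitsPerValue;
        }
    }

    /** Bits needed to represent a non-negative maxValue as unsigned (at least 1). */
    static int bitsRequired(long maxValue) {
        return Math.max(1, 64 - Long.numberOfLeadingZeros(maxValue));
    }

    /**
     * Assumed rule: plain arrays when the width is exactly 8, 16, 32 or 64 bits;
     * aligned (power-of-two width, int blocks, so at most 32 bits) when rounding up
     * wastes at most maxWasteFraction of the packed width; packed otherwise.
     */
    static Choice choose(long maxValue, double maxWasteFraction) {
        int bits = bitsRequired(maxValue);
        if (bits == 8 || bits == 16 || bits == 32 || bits == 64) {
            return new Choice(Kind.DIRECT, bits);
        }
        int aligned = bits == 1 ? 1 : Integer.highestOneBit(bits - 1) << 1;
        if (aligned <= 32 && (aligned - bits) <= maxWasteFraction * bits) {
            return new Choice(Kind.ALIGNED, aligned);
        }
        return new Choice(Kind.PACKED, bits);
    }
}
```

With maxWasteFraction = 0.25, a maxValue of 511 (9 bits) gets PACKED/9, 32767 (15 bits) gets ALIGNED/16 and 65535 gets DIRECT/16. Given the cache effects described above, any such threshold would have to be tuned by benchmarking rather than fixed up front.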

> Add unsigned packed int impls in oal.util
> -----------------------------------------
>
>                 Key: LUCENE-1990
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1990
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>            Reporter: Michael McCandless
>            Priority: Minor
>
> There are various places in Lucene that could take advantage of an
> efficient packed unsigned int/long impl.  EG the terms dict index in
> the standard codec in LUCENE-1458 could substantially reduce its RAM
> usage.  FieldCache.StringIndex could as well.  And I think "load into
> RAM" codecs like the one in TestExternalCodecs could use this too.
> I'm picturing something very basic like:
> {code}
> interface PackedUnsignedLongs  {
>   long get(long index);
>   void set(long index, long value);
> }
> {code}
> Plus maybe an iterator for getting and maybe also for setting.  If it
> helps, most of the usages of this inside Lucene will be "write once"
> so eg the set could make that an assumption/requirement.
> And a factory somewhere:
> {code}
>   PackedUnsignedLongs create(int count, long maxValue);
> {code}
> I think we should simply autogen the code (we can start from the
> autogen code in LUCENE-1410), or, if there is a good existing impl
> that has a compatible license that'd be great.
> I don't have time near-term to do this... so if anyone has the itch,
> please jump!

-- 
This message is automatically generated by JIRA.