[ https://issues.apache.org/jira/browse/LUCENE-1990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12839852#action_12839852 ]
Toke Eskildsen commented on LUCENE-1990:
----------------------------------------

Some thoughts on avoiding the generic division by experimenting with reciprocal multiplication:

For aligned, the sane numbers of values/block are [3, 5, 6, 7, 8, 9, 10, 16, 21, 32, 64]. I tested indexes from 0 to Integer.MAX_VALUE with these divisors and reciprocal multiplication. It worked perfectly for all divisors except [5, 7, 9, 10, 21]. Unfortunately it already fails for divisor 21 at index 252645140, which makes it useless as a full replacement.

If one were so inclined, it would be possible to select the aligned implementation based on valueCount, with fallback to the "slow" version. The gain from using fast division seems quite substantial, as it makes aligned 14-40% faster than packed (note: only tested on a single machine). However, re-introducing aligned with four different implementations (Aligned32, Aligned32Fast, Aligned64, Aligned64Fast) is rather daunting and would make the selection code really messy.

I can see that there are well-known tricks to get around the rounding errors. Some are described at http://www.cs.uiowa.edu/~jones/bcd/divide.html#fixed . I don't know whether these extra tricks would negate the 14-40% speed gain, though.

Since I would like to get the patch out of the door, I vote for keeping aligned disabled and just noting that more bit fiddling might make it attractive at some point.
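For readers unfamiliar with the technique, here is a minimal sketch of one of the rounding-safe variants of reciprocal multiplication. It is not the code that was tested above and it has not been benchmarked: the class name ReciprocalDivider is hypothetical, and the choice of a 64-bit multiply with a rounded-up magic constant and a shift of 31 + ceil(log2(divisor)) is an assumption made for illustration. With that choice the result matches index / divisor for every non-negative int index, at the cost of a long multiplication, which may or may not eat into the reported gain.

```java
/**
 * Hypothetical helper: replaces index / divisor with a multiply and a shift.
 * Assumes a positive divisor and non-negative int indexes.
 */
public final class ReciprocalDivider {
    private final int divisor;
    private final long magic; // rounded-up fixed-point reciprocal of divisor
    private final int shift;  // number of fractional bits in magic

    public ReciprocalDivider(int divisor) {
        if (divisor < 1) {
            throw new IllegalArgumentException("divisor must be >= 1");
        }
        this.divisor = divisor;
        // ceil(log2(divisor)); evaluates to 0 for divisor == 1
        int ceilLog2 = 32 - Integer.numberOfLeadingZeros(divisor - 1);
        // 31 bits cover every non-negative int; the extra ceilLog2 bits keep
        // the rounding error of magic too small to change the quotient.
        this.shift = 31 + ceilLog2;
        // ceil(2^shift / divisor); never exceeds 2^32, so index * magic
        // stays below 2^63 and cannot overflow a signed long.
        this.magic = ((1L << shift) + divisor - 1) / divisor;
    }

    /** Same result as index / divisor for 0 <= index <= Integer.MAX_VALUE. */
    public int divide(int index) {
        return (int) ((index * magic) >>> shift);
    }

    /** Same result as index % divisor for 0 <= index <= Integer.MAX_VALUE. */
    public int remainder(int index) {
        return index - divide(index) * divisor;
    }
}
```

In an aligned implementation, divide(index) would give the block number and remainder(index) the position of the value inside that block, with one ReciprocalDivider created per values/block setting.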
> Add unsigned packed int impls in oal.util
> -----------------------------------------
>
>                 Key: LUCENE-1990
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1990
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>    Affects Versions: Flex Branch
>            Reporter: Michael McCandless
>            Priority: Minor
>             Fix For: Flex Branch
>
>         Attachments: generated_performance-te20100226.txt,
> LUCENE-1990-te20100122.patch, LUCENE-1990-te20100210.patch,
> LUCENE-1990-te20100212.patch, LUCENE-1990-te20100223.patch,
> LUCENE-1990-te20100226.patch, LUCENE-1990-te20100226b.patch,
> LUCENE-1990-te20100226c.patch, LUCENE-1990-te20100301.patch,
> LUCENE-1990.patch, LUCENE-1990_PerformanceMeasurements20100104.zip,
> perf-mkm-20100227.txt, performance-20100301.txt, performance-te20100226.txt
>
>
> There are various places in Lucene that could take advantage of an
> efficient packed unsigned int/long impl. EG the terms dict index in
> the standard codec in LUCENE-1458 could substantially reduce its RAM
> usage. FieldCache.StringIndex could as well. And I think "load into
> RAM" codecs like the one in TestExternalCodecs could use this too.
> I'm picturing something very basic like:
> {code}
>   interface PackedUnsignedLongs {
>     long get(long index);
>     void set(long index, long value);
>   }
> {code}
> Plus maybe an iterator for getting and maybe also for setting. If it
> helps, most of the usages of this inside Lucene will be "write once"
> so eg the set could make that an assumption/requirement.
> And a factory somewhere:
> {code}
>   PackedUnsignedLongs create(int count, long maxValue);
> {code}
> I think we should simply autogen the code (we can start from the
> autogen code in LUCENE-1410), or, if there is a good existing impl
> that has a compatible license that'd be great.
> I don't have time near-term to do this... so if anyone has the itch,
> please jump!

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.