[
https://issues.apache.org/jira/browse/LUCENE-3037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13021757#comment-13021757
]
Steven Rowe commented on LUCENE-3037:
-------------------------------------
I implemented the theoretically O(log log n) complexity algorithm described
[here|http://bonsaicode.wordpress.com/2010/05/07/programming-praxis-integer-logarithms/]
and compared timing for Robert's, Yonik's, and my implementation. Yonik's is
fastest :).
Timings (log1 is Robert's, log2 is Yonik's, and log3 is mine):
{noformat}
log1: 2384ms for 100000000 iterations.
log2: 1068ms for 100000000 iterations.
log3: 1697ms for 100000000 iterations.
{noformat}
Here's the test, which also compares log2 and log3 against log1 for
correctness, or at least consistency ({{-Xmx4g}} required to avoid OOMs with
100M iterations):
{code:java}
public void testLogMethodPerformance() {
Random r = new Random();
int iterations = 100000000;
int[] docFreqs = new int[iterations];
int[] skipIntervals = new int[iterations];
int[] log1Results = new int[iterations];
int[] log2Results = new int[iterations];
int[] log3Results = new int[iterations];
for (int i = 0 ; i < iterations ; ++i) {
docFreqs[i] = r.nextInt(1000000000);
skipIntervals[i] = r.nextInt(1023) + 2;
}
long start = System.currentTimeMillis();
for (int i = 0 ; i < iterations ; ++i) {
log1Results[i] = MultiLevelSkipListReader.log(docFreqs[i],
skipIntervals[i]);
}
long stop = System.currentTimeMillis();
System.err.println("log1: " + (stop - start) + "ms for " + iterations + "
iterations.");
start = System.currentTimeMillis();
for (int i = 0 ; i < iterations ; ++i) {
log2Results[i] = MultiLevelSkipListReader.log2(docFreqs[i],
skipIntervals[i]);
}
stop = System.currentTimeMillis();
System.err.println("log2: " + (stop - start) + "ms for " + iterations + "
iterations.");
start = System.currentTimeMillis();
for (int i = 0 ; i < iterations ; ++i) {
log3Results[i] = MultiLevelSkipListReader.log3(docFreqs[i],
skipIntervals[i]);
}
stop = System.currentTimeMillis();
System.err.println("log3: " + (stop - start) + "ms for " + iterations + "
iterations.");
for (int i = 0 ; i < iterations ; ++i) {
assertEquals(log1Results[i], log2Results[i]);
assertEquals(log1Results[i], log3Results[i]);
}
}
{code}
Here's my implementation:
{code:java}
public static int log3(int x, int b) {
long b_lo = 1;
long b_hi = b;
long b_mid;
int lo = 0;
int hi = 1;
int mid;
// Bracket the solution by recursively squaring the base
// until the result exceeds x
while (b_hi < x) {
b_lo = b_hi;
b_hi *= b_hi;
lo = hi;
hi <<= 1;
}
// Find the solution by performing a binary search between
// the bracketing values (lo,hi) found above
while (hi - lo > 1) {
mid = (lo + hi) >> 1;
// b_mid = b_lo * b**(mid-lo)
// Java has no integer pow() method - use a loop instead.
// Yes, Math.pow(double,double) would work, but it's slower than this.
b_mid = b_lo;
for (int i = 0 ; i < mid - lo ; ++i) {
b_mid *= b;
}
if (b_mid > x) {
hi = mid;
b_hi = b_mid;
} else if (b_mid < x) {
lo = mid;
b_lo = b_mid;
} else {
return mid;
}
}
return b_hi == x ? hi : lo;
}
{code}
> TestFSTs.testRealTerms produces a corrupt index
> -----------------------------------------------
>
> Key: LUCENE-3037
> URL: https://issues.apache.org/jira/browse/LUCENE-3037
> Project: Lucene - Java
> Issue Type: Bug
> Reporter: Robert Muir
> Fix For: 4.0
>
> Attachments: LUCENE-3037.patch, LUCENE-3037_test.patch, index.7z.001,
> index.7z.002, index.7z.003
>
>
> seems to be prox/skip related: the test passes, but the checkindex upon
> closing fails.
> ant test-core -Dtestcase=TestFSTs -Dtests.seed=-4012305283315171209:0
> -Dtests.multiplier=3 -Dtests.nightly=true
> -Dtests.linedocsfile=c:/data/enwiki.random.lines.txt.gz
> Note: to get the enwiki.random.lines.txt.gz you have to fetch it from hudson
> (warning 1 gigabyte file).
> you also have to run the test a few times to trigger it.
> ill upload the index this thing makes to this issue.
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]