[ https://issues.apache.org/jira/browse/LUCENE-855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12487897 ]
Andy Liu commented on LUCENE-855:
---------------------------------
Hey Matt, I get this exception when running your newest FieldCacheRangeFilter
(FCRF) patch with the performance test. Can you check whether you get it too?
java.lang.ArrayIndexOutOfBoundsException: 100000
    at org.apache.lucene.search.FieldCacheRangeFilter$5.get(FieldCacheRangeFilter.java:231)
    at org.apache.lucene.search.IndexSearcher$1.collect(IndexSearcher.java:136)
    at org.apache.lucene.search.Scorer.score(Scorer.java:49)
    at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:146)
    at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:113)
    at org.apache.lucene.search.Hits.getMoreDocs(Hits.java:74)
    at org.apache.lucene.search.Hits.<init>(Hits.java:53)
    at org.apache.lucene.search.Searcher.search(Searcher.java:46)
    at org.apache.lucene.misc.TestRangeFilterPerformanceComparison$Benchmark.go(TestRangeFilterPerformanceComparison.java:312)
    at org.apache.lucene.misc.TestRangeFilterPerformanceComparison.testPerformance(TestRangeFilterPerformanceComparison.java:201)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:585)
    at junit.framework.TestCase.runTest(TestCase.java:154)
    at junit.framework.TestCase.runBare(TestCase.java:127)
    at junit.framework.TestResult$1.protect(TestResult.java:106)
    at junit.framework.TestResult.runProtected(TestResult.java:124)
    at junit.framework.TestResult.run(TestResult.java:109)
    at junit.framework.TestCase.run(TestCase.java:118)
    at junit.framework.TestSuite.runTest(TestSuite.java:208)
    at junit.framework.TestSuite.run(TestSuite.java:203)
    at org.eclipse.jdt.internal.junit.runner.junit3.JUnit3TestReference.run(JUnit3TestReference.java:128)
    at org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:460)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:673)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:386)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:196)

> MemoryCachedRangeFilter to boost performance of Range queries
> -------------------------------------------------------------
>
> Key: LUCENE-855
> URL: https://issues.apache.org/jira/browse/LUCENE-855
> Project: Lucene - Java
> Issue Type: Improvement
> Components: Search
> Affects Versions: 2.1
> Reporter: Andy Liu
> Assigned To: Otis Gospodnetic
> Attachments: FieldCacheRangeFilter.patch,
> FieldCacheRangeFilter.patch, FieldCacheRangeFilter.patch,
> FieldCacheRangeFilter.patch, MemoryCachedRangeFilter.patch,
> MemoryCachedRangeFilter_1.4.patch, TestRangeFilterPerformanceComparison.java,
> TestRangeFilterPerformanceComparison.java
>
>
> Currently RangeFilter uses TermEnum and TermDocs to find documents that fall
> within the specified range. This requires iterating through every single
> term in the index and can get rather slow for large document sets.
> MemoryCachedRangeFilter reads all <docId, value> pairs of a given field,
> sorts by value, and stores in a SortedFieldCache. During bits(), binary
> searches are used to find the start and end indices of the lower and upper
> bound values. The BitSet is populated by all the docId values that fall in
> between the start and end indices.
> TestMemoryCachedRangeFilterPerformance creates a 100K RAMDirectory-backed
> index with random date values within a 5 year range. Executing bits() 1000
> times on standard RangeQuery using random date intervals took 63904ms. Using
> MemoryCachedRangeFilter, it took 876ms. The performance increase is less
> dramatic when a field has fewer unique terms or the index has fewer
> documents.
> Currently MemoryCachedRangeFilter only works with numeric values (values are
> stored in a long[] array), but it can easily be changed to support Strings. A
> side "benefit" of storing the values as longs is that there's no longer any
> need to make the values lexicographically comparable, i.e. by padding numeric
> values with zeros.
> The downside of using MemoryCachedRangeFilter is there's a fairly significant
> memory requirement. So it's designed to be used in situations where range
> filter performance is critical and memory consumption is not an issue. The
> memory requirements are: (sizeof(int) + sizeof(long)) * numDocs.
> MemoryCachedRangeFilter also requires a warmup step, which can take a while to
> run on large datasets (it took 40s to run on a 3M document corpus). Warmup
> can be called explicitly or is automatically called the first time
> MemoryCachedRangeFilter is applied using a given field.
> So in summary, MemoryCachedRangeFilter can be useful when:
> - Performance is critical
> - Memory is not an issue
> - The field contains many unique numeric values
> - The index contains a large number of documents
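
For readers following along, the approach in the quoted description (sort the <docId, value> pairs by value, binary-search for the lower and upper bounds, then set the bits in between) can be sketched roughly as below. This is only an illustration and not the code from the attached patches: the class name SortedFieldCacheSketch, its methods, and the choice of inclusive bounds are all assumptions, and it uses plain arrays and java.util.BitSet rather than any Lucene API. The estimatedBytes helper simply restates the (sizeof(int) + sizeof(long)) * numDocs figure from the description.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.Comparator;

// Hypothetical stand-in for the SortedFieldCache idea; not the patch's code.
class SortedFieldCacheSketch {
    private final int[] docIds;   // doc ids, ordered by ascending field value
    private final long[] values;  // field values, ascending (parallel to docIds)
    private final int numDocs;

    // valuesByDoc[d] is the numeric field value of document d.
    SortedFieldCacheSketch(long[] valuesByDoc) {
        numDocs = valuesByDoc.length;
        Integer[] order = new Integer[numDocs];
        for (int i = 0; i < numDocs; i++) {
            order[i] = i;
        }
        // "Warmup": sort doc ids by their field value.
        Arrays.sort(order, Comparator.comparingLong(d -> valuesByDoc[d]));
        docIds = new int[numDocs];
        values = new long[numDocs];
        for (int i = 0; i < numDocs; i++) {
            docIds[i] = order[i];
            values[i] = valuesByDoc[order[i]];
        }
    }

    // First index i with a[i] >= key, or a.length if there is none.
    private static int lowerBound(long[] a, long key) {
        int lo = 0, hi = a.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (a[mid] < key) lo = mid + 1; else hi = mid;
        }
        return lo;
    }

    // First index i with a[i] > key, or a.length if there is none.
    private static int upperBound(long[] a, long key) {
        int lo = 0, hi = a.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (a[mid] <= key) lo = mid + 1; else hi = mid;
        }
        return lo;
    }

    // Bits for all docs whose value lies in [lower, upper] (inclusive: an assumption).
    BitSet bits(long lower, long upper) {
        BitSet bits = new BitSet(numDocs);
        int start = lowerBound(values, lower);
        int end = upperBound(values, upper);
        for (int i = start; i < end; i++) {
            bits.set(docIds[i]);
        }
        return bits;
    }

    // Memory estimate from the description: (sizeof(int) + sizeof(long)) * numDocs.
    static long estimatedBytes(int numDocs) {
        return (long) (Integer.BYTES + Long.BYTES) * numDocs;
    }
}
```

With this sketch, building the cache costs one sort over all documents, while each range lookup costs two binary searches plus one pass over the matching entries. By the formula above, a 3M document corpus would need about 36 MB for the two arrays (12 bytes per document), not counting any other overhead.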