[jira] Commented: (LUCENE-855) MemoryCachedRangeFilter to boost performance of Range queries
[ https://issues.apache.org/jira/browse/LUCENE-855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12560912#action_12560912 ]

vivek commented on LUCENE-855:
------------------------------

Any plans to have this part of Lucene 2.3?

> MemoryCachedRangeFilter to boost performance of Range queries
> -------------------------------------------------------------
>
>                 Key: LUCENE-855
>                 URL: https://issues.apache.org/jira/browse/LUCENE-855
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>    Affects Versions: 2.1
>            Reporter: Andy Liu
>            Assignee: Otis Gospodnetic
>         Attachments: contrib-filters.tar.gz, FieldCacheRangeFilter.patch,
> FieldCacheRangeFilter.patch, FieldCacheRangeFilter.patch,
> FieldCacheRangeFilter.patch, FieldCacheRangeFilter.patch,
> FieldCacheRangeFilter.patch, MemoryCachedRangeFilter.patch,
> MemoryCachedRangeFilter_1.4.patch, TestRangeFilterPerformanceComparison.java,
> TestRangeFilterPerformanceComparison.java
>
>
> Currently RangeFilter uses TermEnum and TermDocs to find documents that fall
> within the specified range. This requires iterating through every single
> term in the index and can get rather slow for large document sets.
>
> MemoryCachedRangeFilter reads all (docId, value) pairs of a given field,
> sorts them by value, and stores them in a SortedFieldCache. During bits(),
> binary searches are used to find the start and end indices of the lower and
> upper bound values. The BitSet is populated with all the docId values that
> fall between the start and end indices.
>
> TestMemoryCachedRangeFilterPerformance creates a 100K RAMDirectory-backed
> index with random date values within a 5 year range. Executing bits() 1000
> times on standard RangeQuery using random date intervals took 63904ms. Using
> MemoryCachedRangeFilter, it took 876ms. The performance increase is less
> dramatic when a field has fewer unique terms or the index has fewer
> documents.
>
> Currently MemoryCachedRangeFilter only works with numeric values (values are
> stored in a long[] array) but it can be easily changed to support Strings. A
> side "benefit" of storing the values as longs is that there's no longer any
> need to make the values lexicographically comparable, i.e. padding numeric
> values with zeros.
>
> The downside of using MemoryCachedRangeFilter is that there's a fairly
> significant memory requirement. So it's designed to be used in situations
> where range filter performance is critical and memory consumption is not an
> issue. The memory requirements are: (sizeof(int) + sizeof(long)) * numDocs.
>
> MemoryCachedRangeFilter also requires a warmup step which can take a while to
> run on large datasets (it took 40s to run on a 3M document corpus). Warmup
> can be called explicitly or is automatically called the first time
> MemoryCachedRangeFilter is applied using a given field.
>
> So in summary, MemoryCachedRangeFilter can be useful when:
> - Performance is critical
> - Memory is not an issue
> - Field contains many unique numeric values
> - Index contains a large number of documents

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
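
For readers following LUCENE-855, the approach in the issue description (sort the field values once, then binary-search the bounds on each bits() call) can be sketched roughly as below. This is not the code from the attached patches: the class name SortedRangeCacheSketch, its methods, and the plain long[] input are all hypothetical, and no Lucene classes are used. The sketch assumes every document has exactly one numeric value and that both bounds are inclusive.

```java
import java.util.Arrays;
import java.util.BitSet;

/** Hypothetical illustration of the sorted-cache idea; not the patch's code. */
final class SortedRangeCacheSketch {
    private final long[] values; // field values in ascending order
    private final int[] docIds;  // docIds[i] is the document that holds values[i]

    /** Warmup step: valueByDoc[d] is the numeric field value of document d. */
    SortedRangeCacheSketch(long[] valueByDoc) {
        int n = valueByDoc.length;
        Integer[] order = new Integer[n];
        for (int d = 0; d < n; d++) {
            order[d] = d;
        }
        // Sort the document ids by their field value.
        Arrays.sort(order, (a, b) -> Long.compare(valueByDoc[a], valueByDoc[b]));
        values = new long[n];
        docIds = new int[n];
        for (int i = 0; i < n; i++) {
            docIds[i] = order[i];
            values[i] = valueByDoc[order[i]];
        }
    }

    /** First index i with values[i] >= key, or values.length if there is none. */
    private int lowerBound(long key) {
        int lo = 0, hi = values.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (values[mid] < key) lo = mid + 1; else hi = mid;
        }
        return lo;
    }

    /** First index i with values[i] > key, or values.length if there is none. */
    private int upperBound(long key) {
        int lo = 0, hi = values.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (values[mid] <= key) lo = mid + 1; else hi = mid;
        }
        return lo;
    }

    /** Documents whose value v satisfies lower <= v <= upper. */
    BitSet bits(long lower, long upper) {
        BitSet result = new BitSet(docIds.length);
        int start = lowerBound(lower);
        int end = upperBound(upper);
        for (int i = start; i < end; i++) {
            result.set(docIds[i]);
        }
        return result;
    }

    /** (sizeof(int) + sizeof(long)) * numDocs, ignoring JVM array overhead. */
    static long estimatedBytes(int numDocs) {
        return (long) (Integer.BYTES + Long.BYTES) * numDocs;
    }

    public static void main(String[] args) {
        SortedRangeCacheSketch cache =
                new SortedRangeCacheSketch(new long[] {50, 10, 30, 20, 40});
        System.out.println(cache.bits(20, 40));        // prints {2, 3, 4}
        System.out.println(estimatedBytes(3_000_000)); // prints 36000000
    }
}
```

Each bits() call costs two binary searches plus one pass over the matching slice, rather than a walk over every term in the range. By the formula quoted in the issue, the two arrays for the 3M document corpus would take about 36 MB (12 bytes per document), not counting JVM overhead.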
[jira] Issue Comment Edited: (LUCENE-390) Contribution: LuceneIndexAccessor
[ https://issues.apache.org/jira/browse/LUCENE-390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12559694#action_12559694 ]

vivash edited comment on LUCENE-390 at 1/16/08 2:33 PM:
--------------------------------------------------------

Is there any plan to incorporate this IndexAccessor package in Lucene 2.3? I see this library has been updated since the last comments in this Jira:
http://www.gossamer-threads.com/lists/lucene/java-user/53117?search_string=LuceneIndexAccessor%20;#53251
(the package was last updated 09/2007: http://myhardshadow.com/indexaccessorapi/ )

We have quite a similar requirement:
1) We have two writer threads (they run at the same time every 5 minutes), and each writes to a temporary index.
2) The two temporary indexes are then merged into a master index, using another IndexWriter.
3) Currently, we open a searcher for every new search, but we want to be able to cache the searcher and get a new one only if there is a change in the indexes (like every 5 minutes).

I think IndexAccessor is a good addition, unless something similar already exists in the Lucene package that I'm not aware of.

      was (Author: vivash):
    Is there any plan to incorporate this IndexAccessor package in Lucene 2.3? I see this library has been updated since the last comments in this Jira:
http://www.gossamer-threads.com/lists/lucene/java-user/53117?search_string=LuceneIndexAccessor%20;#53251

We have quite a similar requirement:
1) We have two writer threads (they run at the same time every 5 minutes), and each writes to a temporary index.
2) The two temporary indexes are then merged into a master index, using another IndexWriter.
3) Currently, we open a searcher for every new search, but we want to be able to cache the searcher and get a new one only if there is a change in the indexes (like every 5 minutes).

I think IndexAccessor is a good addition, unless something similar already exists in the Lucene package that I'm not aware of.

> Contribution: LuceneIndexAccessor
> ---------------------------------
>
>                 Key: LUCENE-390
>                 URL: https://issues.apache.org/jira/browse/LUCENE-390
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>    Affects Versions: unspecified
>         Environment: Operating System: other
> Platform: Other
>            Reporter: Maik Schreiber
>            Assignee: Lucene Developers
>            Priority: Minor
>         Attachments: lucene-indexaccess-0.2.0.zip
>
>
> As per this post:
> http://mail-archives.apache.org/mod_mbox/lucene-java-dev/200505.mbox/[EMAIL PROTECTED]
> I'm attaching the LuceneIndexAccessor source here. Copyright is now 2005 The
> Apache Software Foundation.
> Please note that it won't compile out of the box, but that should be fairly
> easy to fix using a CVS version of Lucene. Also it makes use of Log4J.
> I'm fine with moving the classes to any package you like.
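
Requirement 3 in the comment above (keep one searcher and replace it only when the index has changed) can be sketched as follows. This is not the LuceneIndexAccessor API: CachedSearcherSketch, its two callbacks, and the String stand-in for a searcher are hypothetical names chosen for illustration. In a real setup the version callback might be backed by something like IndexReader.isCurrent() or an index version number, but that mapping is an assumption rather than anything stated in the issue.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/** Hypothetical illustration of a cached searcher; not the IndexAccessor API. */
final class CachedSearcherSketch<S> {
    private final LongSupplier indexVersion; // assumed: reports the current index version
    private final Supplier<S> opener;        // assumed: opens a fresh searcher
    private S cached;
    private long cachedVersion;

    CachedSearcherSketch(LongSupplier indexVersion, Supplier<S> opener) {
        this.indexVersion = indexVersion;
        this.opener = opener;
    }

    /** Returns the cached searcher, reopening only if the version changed. */
    synchronized S get() {
        long current = indexVersion.getAsLong();
        if (cached == null || current != cachedVersion) {
            // The old instance is simply dropped here. Closing it safely while
            // other threads may still be searching needs reference counting,
            // which this sketch leaves out.
            cached = opener.get();
            cachedVersion = current;
        }
        return cached;
    }

    public static void main(String[] args) {
        long[] version = {1L}; // stands in for the index version
        int[] opens = {0};     // counts how often a searcher is opened
        CachedSearcherSketch<String> cache = new CachedSearcherSketch<>(
                () -> version[0],
                () -> "searcher-" + (++opens[0]));
        System.out.println(cache.get()); // first call opens a searcher
        System.out.println(cache.get()); // same version, so the cached one is reused
        version[0] = 2L;                 // e.g. after the 5-minute merge
        System.out.println(cache.get()); // version changed, so a new one is opened
    }
}
```

The get() method is synchronized so that concurrent searches arriving right after a merge trigger a single reopen rather than one per thread.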
[jira] Commented: (LUCENE-390) Contribution: LuceneIndexAccessor
[ https://issues.apache.org/jira/browse/LUCENE-390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12559694#action_12559694 ]

vivek commented on LUCENE-390:
------------------------------

Is there any plan to incorporate this IndexAccessor package in Lucene 2.3? I see this library has been updated since the last comments in this Jira:
http://www.gossamer-threads.com/lists/lucene/java-user/53117?search_string=LuceneIndexAccessor%20;#53251

We have quite a similar requirement:
1) We have two writer threads (they run at the same time every 5 minutes), and each writes to a temporary index.
2) The two temporary indexes are then merged into a master index, using another IndexWriter.
3) Currently, we open a searcher for every new search, but we want to be able to cache the searcher and get a new one only if there is a change in the indexes (like every 5 minutes).

I think IndexAccessor is a good addition, unless something similar already exists in the Lucene package that I'm not aware of.

> Contribution: LuceneIndexAccessor
> ---------------------------------
>
>                 Key: LUCENE-390
>                 URL: https://issues.apache.org/jira/browse/LUCENE-390
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>    Affects Versions: unspecified
>         Environment: Operating System: other
> Platform: Other
>            Reporter: Maik Schreiber
>            Assignee: Lucene Developers
>            Priority: Minor
>         Attachments: lucene-indexaccess-0.2.0.zip
>
>
> As per this post:
> http://mail-archives.apache.org/mod_mbox/lucene-java-dev/200505.mbox/[EMAIL PROTECTED]
> I'm attaching the LuceneIndexAccessor source here. Copyright is now 2005 The
> Apache Software Foundation.
> Please note that it won't compile out of the box, but that should be fairly
> easy to fix using a CVS version of Lucene. Also it makes use of Log4J.
> I'm fine with moving the classes to any package you like.