[jira] Commented: (LUCENE-855) MemoryCachedRangeFilter to boost performance of Range queries

2008-01-20 Thread vivek (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12560912#action_12560912
 ] 

vivek commented on LUCENE-855:
--

Any plans to have this part of Lucene 2.3?

> MemoryCachedRangeFilter to boost performance of Range queries
> -
>
> Key: LUCENE-855
> URL: https://issues.apache.org/jira/browse/LUCENE-855
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Affects Versions: 2.1
>Reporter: Andy Liu
>Assignee: Otis Gospodnetic
> Attachments: contrib-filters.tar.gz, FieldCacheRangeFilter.patch, 
> FieldCacheRangeFilter.patch, FieldCacheRangeFilter.patch, 
> FieldCacheRangeFilter.patch, FieldCacheRangeFilter.patch, 
> FieldCacheRangeFilter.patch, MemoryCachedRangeFilter.patch, 
> MemoryCachedRangeFilter_1.4.patch, TestRangeFilterPerformanceComparison.java, 
> TestRangeFilterPerformanceComparison.java
>
>
> Currently RangeFilter uses TermEnum and TermDocs to find documents that fall 
> within the specified range.  This requires iterating through every single 
> term in the index and can get rather slow for large document sets.
> MemoryCachedRangeFilter reads all <docId, value> pairs of a given field, 
> sorts by value, and stores in a SortedFieldCache.  During bits(), binary 
> searches are used to find the start and end indices of the lower and upper 
> bound values.  The BitSet is populated by all the docId values that fall in 
> between the start and end indices.
> TestMemoryCachedRangeFilterPerformance creates a 100K RAMDirectory-backed 
> index with random date values within a 5 year range.  Executing bits() 1000 
> times on standard RangeQuery using random date intervals took 63904ms.  Using 
> MemoryCachedRangeFilter, it took 876ms.  The performance increase is less 
> dramatic when a field has fewer unique terms or the index holds fewer 
> documents.
> Currently MemoryCachedRangeFilter only works with numeric values (values are 
> stored in a long[] array) but it can be easily changed to support Strings.  A 
> side "benefit" of storing the values as longs is that there is no longer any 
> need to make the values lexicographically comparable, i.e. by padding 
> numeric values with zeros.
> The downside of using MemoryCachedRangeFilter is there's a fairly significant 
> memory requirement.  So it's designed to be used in situations where range 
> filter performance is critical and memory consumption is not an issue.  The 
> memory requirements are: (sizeof(int) + sizeof(long)) * numDocs.  
> MemoryCachedRangeFilter also requires a warmup step which can take a while to 
> run in large datasets (it took 40s to run on a 3M document corpus).  Warmup 
> can be called explicitly or is automatically called the first time 
> MemoryCachedRangeFilter is applied using a given field.
> So in summary, MemoryCachedRangeFilter can be useful when:
> - Performance is critical
> - Memory is not an issue
> - Field contains many unique numeric values
> - Index contains a large number of documents
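
The approach in the quoted description (sort the docId/value pairs by value, binary-search the two bounds, then set a bit for every docId in between) can be sketched in plain Java as below. This is only an illustration and not the code from the attached patches: the class name SortedRangeSketch, its methods, and the choice of inclusive bounds are assumptions, and it has no Lucene dependency. estimatedBytes() just restates the (sizeof(int) + sizeof(long)) * numDocs formula from the description.

```java
import java.util.Arrays;
import java.util.BitSet;

/** Hypothetical sketch of the sorted-cache idea; not the class from the patch. */
class SortedRangeSketch {
    private final long[] sortedValues; // field values in ascending order
    private final int[] sortedDocIds;  // docId stored at the same position

    /** valueByDocId[d] is the numeric field value of document d. */
    public SortedRangeSketch(long[] valueByDocId) {
        int n = valueByDocId.length;
        Integer[] order = new Integer[n];
        for (int i = 0; i < n; i++) order[i] = i;
        // Sort docIds by their field value (the "warmup" step).
        Arrays.sort(order, (a, b) -> Long.compare(valueByDocId[a], valueByDocId[b]));
        sortedValues = new long[n];
        sortedDocIds = new int[n];
        for (int i = 0; i < n; i++) {
            sortedDocIds[i] = order[i];
            sortedValues[i] = valueByDocId[order[i]];
        }
    }

    /** Index of the first value >= key, or length if there is none. */
    private int lowerBound(long key) {
        int lo = 0, hi = sortedValues.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (sortedValues[mid] < key) lo = mid + 1; else hi = mid;
        }
        return lo;
    }

    /** Index of the first value > key, or length if there is none. */
    private int upperBound(long key) {
        int lo = 0, hi = sortedValues.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (sortedValues[mid] <= key) lo = mid + 1; else hi = mid;
        }
        return lo;
    }

    /** Docs whose value lies in [lower, upper]; inclusive bounds are an assumption. */
    public BitSet bits(long lower, long upper) {
        BitSet result = new BitSet(sortedDocIds.length);
        int start = lowerBound(lower);
        int end = upperBound(upper);
        for (int i = start; i < end; i++) result.set(sortedDocIds[i]);
        return result;
    }

    /** (sizeof(int) + sizeof(long)) * numDocs, as stated in the description. */
    public static long estimatedBytes(long numDocs) {
        return (Integer.BYTES + Long.BYTES) * numDocs;
    }

    public static void main(String[] args) {
        // doc0=50, doc1=10, doc2=40, doc3=20, doc4=30
        SortedRangeSketch cache = new SortedRangeSketch(new long[] {50, 10, 40, 20, 30});
        System.out.println(cache.bits(20, 40));          // prints {2, 3, 4}
        System.out.println(estimatedBytes(3_000_000L));  // prints 36000000
    }
}
```

By that formula, the 3M document corpus mentioned above would need roughly 36 MB per cached field, not counting object overhead.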

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



[jira] Issue Comment Edited: (LUCENE-390) Contribution: LuceneIndexAccessor

2008-01-16 Thread vivek (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12559694#action_12559694
 ] 

vivash edited comment on LUCENE-390 at 1/16/08 2:33 PM:
---

Is there any plan to incorporate this IndexAccessor package in Lucene 2.3? I 
see this library has been updated since last comments in this Jira,

 
http://www.gossamer-threads.com/lists/lucene/java-user/53117?search_string=LuceneIndexAccessor%20;#53251

(the package last updated 09/2007: http://myhardshadow.com/indexaccessorapi/ )

We have quite a similar requirement:

1) We have two writer threads (they come at the same time every 5 minutes) and 
write to temporary index
2) The two temporary indexes are then merged into a master index - using 
another IndexWriter
3) Currently, we open searcher for every new search, but we want to be able to 
cache the searcher and get a new one only if there is a change in the indexes 
(like every 5 minutes).

I think IndexAccessor is a good addition, unless there already exists something 
similar in Lucene package, which I'm not aware of.
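
For illustration, requirement 3 (reuse a cached searcher and only open a new one when the index has changed) could look roughly like the sketch below. It is not the IndexAccessor API and uses no Lucene classes: VersionedCache, the version supplier and the opener are all hypothetical names, and it assumes that something in the application can report a version number that changes whenever the master index is updated.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/** Hypothetical helper; not part of Lucene or of the IndexAccessor package. */
class VersionedCache<T> {
    private final LongSupplier currentVersion; // reports the current index version
    private final Supplier<T> opener;          // opens a fresh searcher-like resource
    private long cachedVersion;
    private T cached;

    public VersionedCache(LongSupplier currentVersion, Supplier<T> opener) {
        this.currentVersion = currentVersion;
        this.opener = opener;
    }

    /** Returns the cached instance, reopening only if the version has changed. */
    public synchronized T get() {
        long version = currentVersion.getAsLong();
        if (cached == null || version != cachedVersion) {
            // The old instance is not closed here: in-flight searches may
            // still be using it, so its cleanup is left to the caller.
            cached = opener.get();
            cachedVersion = version;
        }
        return cached;
    }

    public static void main(String[] args) {
        AtomicLong indexVersion = new AtomicLong(1); // stands in for the real index version
        AtomicInteger opens = new AtomicInteger();
        VersionedCache<String> searchers =
            new VersionedCache<>(indexVersion::get, () -> "searcher-" + opens.incrementAndGet());

        System.out.println(searchers.get()); // prints searcher-1
        System.out.println(searchers.get()); // prints searcher-1 (reused)
        indexVersion.incrementAndGet();      // simulate the 5-minute merge
        System.out.println(searchers.get()); // prints searcher-2
    }
}
```

With the two writer threads described above, the version would change once per merge into the master index, so searches between merges would share one searcher.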

  was (Author: vivash):
Is there any plan to incorporate this IndexAccessor package in Lucene 2.3? 
I see this library has been updated since last comments in this Jira,

 
http://www.gossamer-threads.com/lists/lucene/java-user/53117?search_string=LuceneIndexAccessor%20;#53251

We have quite a similar requirement:

1) We have two writer threads (they come at the same time every 5 minutes) and 
write to temporary index
2) The two temporary indexes are then merged into a master index - using 
another IndexWriter
3) Currently, we open searcher for every new search, but we want to be able to 
cache the searcher and get a new one only if there is a change in the indexes 
(like every 5 minutes).

I think IndexAccessor is a good addition, unless there already exists something 
similar in Lucene package, which I'm not aware of.
  
> Contribution: LuceneIndexAccessor
> -
>
> Key: LUCENE-390
> URL: https://issues.apache.org/jira/browse/LUCENE-390
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Index
>Affects Versions: unspecified
> Environment: Operating System: other
> Platform: Other
>Reporter: Maik Schreiber
>Assignee: Lucene Developers
>Priority: Minor
> Attachments: lucene-indexaccess-0.2.0.zip
>
>
> As per this post:
> http://mail-archives.apache.org/mod_mbox/lucene-java-dev/200505.mbox/[EMAIL 
> PROTECTED]
> I'm attaching the LuceneIndexAccessor source here. Copyright is now 2005 The
> Apache Software Foundation.
> Please note that it won't compile out of the box, but that should be fairly 
> easy
> to fix using a CVS version of Lucene. Also it makes use of Log4J.
> I'm fine with moving the classes to any package you like.




[jira] Commented: (LUCENE-390) Contribution: LuceneIndexAccessor

2008-01-16 Thread vivek (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12559694#action_12559694
 ] 

vivek commented on LUCENE-390:
--

Is there any plan to incorporate this IndexAccessor package in Lucene 2.3? I 
see this library has been updated since last comments in this Jira,

 
http://www.gossamer-threads.com/lists/lucene/java-user/53117?search_string=LuceneIndexAccessor%20;#53251

We have quite a similar requirement:

1) We have two writer threads (they come at the same time every 5 minutes) and 
write to temporary index
2) The two temporary indexes are then merged into a master index - using 
another IndexWriter
3) Currently, we open searcher for every new search, but we want to be able to 
cache the searcher and get a new one only if there is a change in the indexes 
(like every 5 minutes).

I think IndexAccessor is a good addition, unless there already exists something 
similar in Lucene package, which I'm not aware of.

> Contribution: LuceneIndexAccessor
> -
>
> Key: LUCENE-390
> URL: https://issues.apache.org/jira/browse/LUCENE-390
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Index
>Affects Versions: unspecified
> Environment: Operating System: other
> Platform: Other
>Reporter: Maik Schreiber
>Assignee: Lucene Developers
>Priority: Minor
> Attachments: lucene-indexaccess-0.2.0.zip
>
>
> As per this post:
> http://mail-archives.apache.org/mod_mbox/lucene-java-dev/200505.mbox/[EMAIL 
> PROTECTED]
> I'm attaching the LuceneIndexAccessor source here. Copyright is now 2005 The
> Apache Software Foundation.
> Please note that it won't compile out of the box, but that should be fairly 
> easy
> to fix using a CVS version of Lucene. Also it makes use of Log4J.
> I'm fine with moving the classes to any package you like.



