[jira] Commented: (LUCENE-2447) Add support for subsets of searchables inside a MultiSearcher/ParallelMultiSearcher instance's methods at runtime

Edward Drapkin (JIRA) Wed, 05 May 2010 12:17:26 -0700

    [ 
https://issues.apache.org/jira/browse/LUCENE-2447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12864454#action_12864454
 ]


Edward Drapkin commented on LUCENE-2447:
----------------------------------------

It's not entirely the fact that creating a MultiSearcher per request is too 
heavy.  if you'll look at 2440, I also modified ParallelMultiSearcher to 
support a fixed thread pool;  what I'm worried about is, even with a fixed 
thread pool of something small like 4 threads, the concurrent request count 
could spiral the amount of threads that the JVM has to deal with out of 
control.  If I can use the same ParallelMultiSearcher across requests, with a 
fixed thread pool of something sane like 16 or 24 threads, then I can be 
reasonably sure that this particular class isn't going to spiral thread counts 
out of control.  

As far as stuffing everything into the same index, we've looked into that and 
determined that it isn't a real possibility because the size of the indexes - 
there's quite a few ranging from a few MB to a few GB of data - would make the 
merge process relatively expensive and coupled with the fact that the indexes 
themselves are built and maintained separately, we'd be needing to run the 
merging process too frequently for it to be feasible.  

> Add support for subsets of searchables inside a 
> MultiSearcher/ParallelMultiSearcher instance's methods at runtime
> -----------------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-2447
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2447
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>    Affects Versions: 3.0.1
>         Environment: Irrelevant
>            Reporter: Edward Drapkin
>            Priority: Minor
>         Attachments: LUCENE-2447.patch
>
>   Original Estimate: 0h
>  Remaining Estimate: 0h
>
> Here's the situation: We have a site with a fair few amount of indexes that 
> we're using MultiSearcher/ParallelMultiSearcher for, but the users can select 
> an arbitrary permutation of indexes to search.  For example (contrived, but 
> illustratory): the site has indexes numbered 1 - 10; user A wants to search 
> in all 10; user B wants to search indexes 1, 2 and 3, user C wants to search 
> even-numbered indexes.  From Lucene 3.0.1, the only way to do this is to 
> continually instantiate a new MultiSearcher based on every permutation of 
> indexes that a user wants, which is not ideal at all.
> What I've done is add a new parameter to all methods in MultiSearcher that 
> use the searchables array (docFreq, search, rewrite and 
> createDocFrequencyMap), a Set<Searchable> which is checked for isEmpty() and 
> contains() for every iteration over the searchables[].  The actual logic has 
> been moved into these methods and the old methods have become overloads that 
> pass a Collections.emptySet() into those methods, so I do not expect there to 
> be a very noticeable performance impact as a result of this modification, if 
> it's measurable at all.
> I didn't modify the test for MultiSearcher very much, just enough to 
> illustrate the that subsetting of the search results works, since no other 
> logic has changed.  If I need to do more for the testing, let me know and 
> I'll do it.
> I've attached the patches for MultiSearcher.java, ParallelMultiSearcher.java 
> and TestMultiSearcher.java.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[jira] Commented: (LUCENE-2447) Add support for subsets of searchables inside a MultiSearcher/ParallelMultiSearcher instance's methods at runtime

Reply via email to