[jira] Commented: (LUCENE-2447) Add support for subsets of searchables inside a MultiSearcher/ParallelMultiSearcher instance's methods at runtime

Edward Drapkin (JIRA) Wed, 05 May 2010 18:55:15 -0700

    [ 
https://issues.apache.org/jira/browse/LUCENE-2447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12864619#action_12864619
 ]


Edward Drapkin commented on LUCENE-2447:
----------------------------------------

I think that's where I have a misunderstanding and disagreement with the 
function of the multisearcher.  Because it's mostly stateless, I can't help but 
thing that it's designed to be instantiated on demand, every time that it needs 
to be used.  While with the "traditional" MultiSearcher, this doesn't seem to 
be much of a problem (3us is no time at all), it gets progressively heavier and 
more cumbersome to keep creating Executors every time with 
ParallelMultiSearcher.  OTOH, given the ability to specify an Executor at 
object creation "solves" this issue, making ParallelMultiSearcher almost as 
light as MultiSearcher itself.

Not to digress into architectural theory, but I have a disagreement with the 
way that these classes are designed to be used.  In order to make MultiSearcher 
even functional, the application needs to maintain its list of Searchables 
outside MultiSearcher itself; given the ability to specify an Executor at 
construction time (for ParallelMultiSearcher), you're now maintaining an array 
of Searchables (because we it's expensive and wasteful to create _those_ every 
time they're needed) and a thread pool management object in the calling object. 
 This reeks of state leakage to me, where the state of [Parallel]MultiSearcher 
is being maintained by an external, calling object and is being re-created 
every time it's needed, violating encapsulation conventions and practice.  
Furthermore, (with the caveat that I'm relatively new to lucene) MultiSearcher 
is itself a Searchable and this behavior is inconsistent with the way that I've 
seen other Searchables handled.  I'm not sure how much sense it makes to trade 
maintaining a reference to one Searchable (that encapsulates several) to 
maintaining a list/array/collection of references to other Searchables... 
especially when you look into multi-threaded apps and non-final collections of 
Searchables that may start to modify the state (however transient) of your 
MultiSearcher outside of the class itself.

I'm not sure how clear it is from my previous comments and the code itself, but 
the idea behind the patch was that the user (in this case, me) wouldn't be 
maintaining anything for the state of the [Parallel]MultiSearcher except for 
the instance of the class itself.  Right now, it's possible to do this (not 
keep any permanent references to anything that's fed into the MultiSearcher 
constructor) but only if you intend to always search all of your Searchables.  
The patch takes a few steps in the direction of making keeping references to 
the Searchables outside of the MultiSearcher unnecessary (although, come to 
think about it, if this is the direction this class heads in, getSearchables() 
needs to return a [deep] clone of multiSearcher.searchables rather than the 
array itself), but without any method defined in Searchable that allows you to 
identify a Searchable without a reference, I'm not sure that there is much more 
that can be done.

> Add support for subsets of searchables inside a 
> MultiSearcher/ParallelMultiSearcher instance's methods at runtime
> -----------------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-2447
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2447
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>    Affects Versions: 3.0.1
>         Environment: Irrelevant
>            Reporter: Edward Drapkin
>            Priority: Minor
>         Attachments: LUCENE-2447-predicate.patch, LUCENE-2447.patch
>
>   Original Estimate: 0h
>  Remaining Estimate: 0h
>
> Here's the situation: We have a site with a fair few amount of indexes that 
> we're using MultiSearcher/ParallelMultiSearcher for, but the users can select 
> an arbitrary permutation of indexes to search.  For example (contrived, but 
> illustratory): the site has indexes numbered 1 - 10; user A wants to search 
> in all 10; user B wants to search indexes 1, 2 and 3, user C wants to search 
> even-numbered indexes.  From Lucene 3.0.1, the only way to do this is to 
> continually instantiate a new MultiSearcher based on every permutation of 
> indexes that a user wants, which is not ideal at all.
> What I've done is add a new parameter to all methods in MultiSearcher that 
> use the searchables array (docFreq, search, rewrite and 
> createDocFrequencyMap), a Set<Searchable> which is checked for isEmpty() and 
> contains() for every iteration over the searchables[].  The actual logic has 
> been moved into these methods and the old methods have become overloads that 
> pass a Collections.emptySet() into those methods, so I do not expect there to 
> be a very noticeable performance impact as a result of this modification, if 
> it's measurable at all.
> I didn't modify the test for MultiSearcher very much, just enough to 
> illustrate the that subsetting of the search results works, since no other 
> logic has changed.  If I need to do more for the testing, let me know and 
> I'll do it.
> I've attached the patches for MultiSearcher.java, ParallelMultiSearcher.java 
> and TestMultiSearcher.java.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] Commented: (LUCENE-2447) Add support for subsets of searchables inside a MultiSearcher/ParallelMultiSearcher instance's methods at runtime

Reply via email to