[ https://issues.apache.org/jira/browse/LUCENE-2447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12864619#action_12864619 ]
Edward Drapkin commented on LUCENE-2447: ---------------------------------------- I think that's where I have a misunderstanding and disagreement with the function of the multisearcher. Because it's mostly stateless, I can't help but thing that it's designed to be instantiated on demand, every time that it needs to be used. While with the "traditional" MultiSearcher, this doesn't seem to be much of a problem (3us is no time at all), it gets progressively heavier and more cumbersome to keep creating Executors every time with ParallelMultiSearcher. OTOH, given the ability to specify an Executor at object creation "solves" this issue, making ParallelMultiSearcher almost as light as MultiSearcher itself. Not to digress into architectural theory, but I have a disagreement with the way that these classes are designed to be used. In order to make MultiSearcher even functional, the application needs to maintain its list of Searchables outside MultiSearcher itself; given the ability to specify an Executor at construction time (for ParallelMultiSearcher), you're now maintaining an array of Searchables (because we it's expensive and wasteful to create _those_ every time they're needed) and a thread pool management object in the calling object. This reeks of state leakage to me, where the state of [Parallel]MultiSearcher is being maintained by an external, calling object and is being re-created every time it's needed, violating encapsulation conventions and practice. Furthermore, (with the caveat that I'm relatively new to lucene) MultiSearcher is itself a Searchable and this behavior is inconsistent with the way that I've seen other Searchables handled. I'm not sure how much sense it makes to trade maintaining a reference to one Searchable (that encapsulates several) to maintaining a list/array/collection of references to other Searchables... especially when you look into multi-threaded apps and non-final collections of Searchables that may start to modify the state (however transient) of your MultiSearcher outside of the class itself. I'm not sure how clear it is from my previous comments and the code itself, but the idea behind the patch was that the user (in this case, me) wouldn't be maintaining anything for the state of the [Parallel]MultiSearcher except for the instance of the class itself. Right now, it's possible to do this (not keep any permanent references to anything that's fed into the MultiSearcher constructor) but only if you intend to always search all of your Searchables. The patch takes a few steps in the direction of making keeping references to the Searchables outside of the MultiSearcher unnecessary (although, come to think about it, if this is the direction this class heads in, getSearchables() needs to return a [deep] clone of multiSearcher.searchables rather than the array itself), but without any method defined in Searchable that allows you to identify a Searchable without a reference, I'm not sure that there is much more that can be done. > Add support for subsets of searchables inside a > MultiSearcher/ParallelMultiSearcher instance's methods at runtime > ----------------------------------------------------------------------------------------------------------------- > > Key: LUCENE-2447 > URL: https://issues.apache.org/jira/browse/LUCENE-2447 > Project: Lucene - Java > Issue Type: Improvement > Components: Search > Affects Versions: 3.0.1 > Environment: Irrelevant > Reporter: Edward Drapkin > Priority: Minor > Attachments: LUCENE-2447-predicate.patch, LUCENE-2447.patch > > Original Estimate: 0h > Remaining Estimate: 0h > > Here's the situation: We have a site with a fair few amount of indexes that > we're using MultiSearcher/ParallelMultiSearcher for, but the users can select > an arbitrary permutation of indexes to search. For example (contrived, but > illustratory): the site has indexes numbered 1 - 10; user A wants to search > in all 10; user B wants to search indexes 1, 2 and 3, user C wants to search > even-numbered indexes. From Lucene 3.0.1, the only way to do this is to > continually instantiate a new MultiSearcher based on every permutation of > indexes that a user wants, which is not ideal at all. > What I've done is add a new parameter to all methods in MultiSearcher that > use the searchables array (docFreq, search, rewrite and > createDocFrequencyMap), a Set<Searchable> which is checked for isEmpty() and > contains() for every iteration over the searchables[]. The actual logic has > been moved into these methods and the old methods have become overloads that > pass a Collections.emptySet() into those methods, so I do not expect there to > be a very noticeable performance impact as a result of this modification, if > it's measurable at all. > I didn't modify the test for MultiSearcher very much, just enough to > illustrate the that subsetting of the search results works, since no other > logic has changed. If I need to do more for the testing, let me know and > I'll do it. > I've attached the patches for MultiSearcher.java, ParallelMultiSearcher.java > and TestMultiSearcher.java. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. --------------------------------------------------------------------- To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org