[jira] Commented: (LUCENE-2447) Add support for subsets of searchables inside a MultiSearcher/ParallelMultiSearcher instance's methods at runtime

Edward Drapkin (JIRA) Wed, 05 May 2010 14:32:29 -0700

    [ 
https://issues.apache.org/jira/browse/LUCENE-2447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12864519#action_12864519
 ]


Edward Drapkin commented on LUCENE-2447:
----------------------------------------

Ah, cool, regarding LUCENE-2440 :)

You mention that the current API can already accomplish this by instantiating 
a MultiSearcher per request. That's true, but I think this approach is much 
simpler. It does add to the API, but in a consistent way that's easy to 
understand and use, and it doesn't break BC. If the proposed API differs too 
much from the current one, splitting the change into a new class might be the 
solution (i.e. two classes: MultiSearcher and SplittableMultiSearcher).  
Either way, under the current API, calls look like this:


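  public void doSearch() throws IOException {
    // faux method
    Set<Searchable> searchables = this.getSearchablesFromRequestParams();
    // MultiSearcher takes Searchable varargs, so convert the set to an array
    MultiSearcher mSearcher =
        new MultiSearcher(searchables.toArray(new Searchable[0]));
    mSearcher.search(someQuery, 1000);
    //...
  }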

Compare with, under my proposed API:

  public void doSearch() {
    this.mSearcher.search(
        this.getSearchablesFromRequestParams(), someQuery, 1000);
    //...
  }
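
The delegation pattern behind that call, as described in the quoted issue text 
below (old methods become overloads that pass an empty set, and every 
iteration checks isEmpty() and contains()), could be sketched roughly as 
follows. This is a minimal illustration, not the code from the attached patch: 
Source is a hypothetical stand-in for Lucene's Searchable, SubsetMultiSearcher 
is an invented class name, and only docFreq is shown.

```java
import java.util.Collections;
import java.util.Set;

// Hypothetical stand-in for Lucene's Searchable, reduced to a single method.
interface Source {
    int docFreq(String term);
}

// Illustrative sketch only; names and structure are assumptions.
class SubsetMultiSearcher {
    private final Source[] sources;

    SubsetMultiSearcher(Source... sources) {
        this.sources = sources;
    }

    // Old signature: delegates with an empty set, meaning "use every source".
    int docFreq(String term) {
        return docFreq(Collections.<Source>emptySet(), term);
    }

    // New signature: only sources contained in the subset are consulted.
    int docFreq(Set<Source> subset, String term) {
        int total = 0;
        for (Source source : sources) {
            if (subset.isEmpty() || subset.contains(source)) {
                total += source.docFreq(term);
            }
        }
        return total;
    }
}
```

With this shape, existing callers keep using docFreq(term) unchanged, while 
per-request code passes the subset it wants. One consequence of treating the 
empty set as "everything" is that a caller cannot ask for "no sources" through 
the subset overload.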


I don't think this is an entirely esoteric/niche requirement (surely I can't 
be the only one who has this issue), and the change doesn't break any existing 
code or significantly increase its execution time.  The end result is much 
cleaner code (from userland) that's also less resource intensive, especially 
regarding memory usage and garbage collection times.  However cheap it may be 
to instantiate a MultiSearcher (on my completely idle Q9300 it takes about 
3us, or 20us for ParallelMultiSearcher*), it's still more expensive than 
keeping one instance around, especially in a heavily trafficked environment.

* I created 100 indexes, each with 10,000 documents (each of which had 100 
fields named name1, name2, etc. with 128 bytes of random string) and then 
tested that - each index was ~60MB.  I can paste the code I used if you would 
like.

> Add support for subsets of searchables inside a 
> MultiSearcher/ParallelMultiSearcher instance's methods at runtime
> -----------------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-2447
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2447
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>    Affects Versions: 3.0.1
>         Environment: Irrelevant
>            Reporter: Edward Drapkin
>            Priority: Minor
>         Attachments: LUCENE-2447.patch
>
>   Original Estimate: 0h
>  Remaining Estimate: 0h
>
> Here's the situation: We have a site with a fair number of indexes that 
> we're using MultiSearcher/ParallelMultiSearcher for, but the users can select 
> an arbitrary subset of indexes to search.  For example (contrived, but 
> illustrative): the site has indexes numbered 1 - 10; user A wants to search 
> in all 10; user B wants to search indexes 1, 2 and 3; user C wants to search 
> even-numbered indexes.  As of Lucene 3.0.1, the only way to do this is to 
> continually instantiate a new MultiSearcher for every subset of 
> indexes that a user wants, which is not ideal at all.
> What I've done is add a new parameter to all methods in MultiSearcher that 
> use the searchables array (docFreq, search, rewrite and 
> createDocFrequencyMap), a Set<Searchable> which is checked for isEmpty() and 
> contains() for every iteration over the searchables[].  The actual logic has 
> been moved into these methods and the old methods have become overloads that 
> pass a Collections.emptySet() into those methods, so I do not expect there to 
> be a very noticeable performance impact as a result of this modification, if 
> it's measurable at all.
> I didn't modify the test for MultiSearcher very much, just enough to 
> illustrate that subsetting of the search results works, since no other 
> logic has changed.  If I need to do more for the testing, let me know and 
> I'll do it.
> I've attached the patches for MultiSearcher.java, ParallelMultiSearcher.java 
> and TestMultiSearcher.java.
