[jira] [Updated] (LUCENE-3421) PayloadTermQuery's explain is broken when span score is not included

2011-09-07 Thread Edward Drapkin (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Drapkin updated LUCENE-3421:
---

Summary: PayloadTermQuery's explain is broken when span score is not 
included  (was: PayloadTermQuery's explain is broken when span score is )

> PayloadTermQuery's explain is broken when span score is not included
> 
>
> Key: LUCENE-3421
> URL: https://issues.apache.org/jira/browse/LUCENE-3421
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: core/search
>Affects Versions: 3.3
> Environment: irrelevant
>Reporter: Edward Drapkin
>
> When setting includeSpanScore to false with PayloadTermQuery, the explain is 
> broken.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (LUCENE-3421) PayloadTermQuery's explain is broken when span score is

2011-09-07 Thread Edward Drapkin (JIRA)
PayloadTermQuery's explain is broken when span score is 


 Key: LUCENE-3421
 URL: https://issues.apache.org/jira/browse/LUCENE-3421
 Project: Lucene - Java
  Issue Type: Bug
  Components: core/search
Affects Versions: 3.3
 Environment: irrelevant
Reporter: Edward Drapkin


When setting includeSpanScore to false with PayloadTermQuery, the explain is 
broken.




[jira] [Updated] (LUCENE-2997) PayloadQueryParser addition

2011-03-28 Thread Edward Drapkin (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Drapkin updated LUCENE-2997:
---

Description: 
I recently needed to deploy payloads for my search system and ran into a small 
wall: there was no query parser available for use with Payloads.  I threw 
this one together, extending the new modular QueryParser structure.

Attached is the class file.  I didn't know what package this would belong in 
(whether with the query parser or with the rest of the payload functionality in 
contrib/analyzers), so it's in the default package for now.

I know this is a little, simple thing, but it seemed like something that should 
probably be included.

  was:
I recently needed to deploy payloads for my search system and ran into a small 
wall: there was no query parser available for use with Payloads.  I threw 
this one together, extending the new modular QueryParser structure.

Attached is the class file.  I didn't know what package this would belong in 
(whether with the query parser or with the rest of the payload functionality in 
contrib/analyzers), so it's in the default package for now.


> PayloadQueryParser addition
> ---
>
> Key: LUCENE-2997
> URL: https://issues.apache.org/jira/browse/LUCENE-2997
> Project: Lucene - Java
>  Issue Type: New Feature
>  Components: contrib/*, QueryParser
>Affects Versions: 3.0.1
> Environment: n/a
>Reporter: Edward Drapkin
>  Labels: features, patch
> Fix For: 3.0.4
>
> Attachments: PayloadQueryParser.java
>
>   Original Estimate: 0h
>  Remaining Estimate: 0h




[jira] [Updated] (LUCENE-2997) PayloadQueryParser addition

2011-03-28 Thread Edward Drapkin (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Drapkin updated LUCENE-2997:
---

Attachment: PayloadQueryParser.java

PayloadQueryParser implementation.

> PayloadQueryParser addition
> ---
>
> Key: LUCENE-2997
> URL: https://issues.apache.org/jira/browse/LUCENE-2997
> Project: Lucene - Java
>  Issue Type: New Feature
>  Components: contrib/*, QueryParser
>Affects Versions: 3.0.1
> Environment: n/a
>Reporter: Edward Drapkin
>  Labels: features, patch
> Fix For: 3.0.4
>
> Attachments: PayloadQueryParser.java
>
>   Original Estimate: 0h
>  Remaining Estimate: 0h




[jira] [Created] (LUCENE-2997) PayloadQueryParser addition

2011-03-28 Thread Edward Drapkin (JIRA)
PayloadQueryParser addition
---

 Key: LUCENE-2997
 URL: https://issues.apache.org/jira/browse/LUCENE-2997
 Project: Lucene - Java
  Issue Type: New Feature
  Components: contrib/*, QueryParser
Affects Versions: 3.0.1
 Environment: n/a
Reporter: Edward Drapkin
 Fix For: 3.0.4


I recently needed to deploy payloads for my search system and ran into a small 
wall: there was no query parser available for use with Payloads.  I threw 
this one together, extending the new modular QueryParser structure.

Attached is the class file.  I didn't know what package this would belong in 
(whether with the query parser or with the rest of the payload functionality in 
contrib/analyzers), so it's in the default package for now.




[jira] Updated: (LUCENE-2508) Consolidate Highlighter implementations and a major refactor of the non-termvector highlighter

2010-06-22 Thread Edward Drapkin (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Drapkin updated LUCENE-2508:
---

Attachment: LUCENE-2508.patch

The "first stab" patch.

> Consolidate Highlighter implementations and a major refactor of the 
> non-termvector highlighter
> --
>
> Key: LUCENE-2508
> URL: https://issues.apache.org/jira/browse/LUCENE-2508
> Project: Lucene - Java
>  Issue Type: New Feature
>  Components: contrib/highlighter
> Environment: irrelevant
>Reporter: Edward Drapkin
>Priority: Minor
> Fix For: 4.0
>
> Attachments: LUCENE-2508.patch
>
>   Original Estimate: 336h
>  Remaining Estimate: 336h

[jira] Created: (LUCENE-2508) Consolidate Highlighter implementations and a major refactor of the non-termvector highlighter

2010-06-22 Thread Edward Drapkin (JIRA)
Consolidate Highlighter implementations and a major refactor of the 
non-termvector highlighter
--

 Key: LUCENE-2508
 URL: https://issues.apache.org/jira/browse/LUCENE-2508
 Project: Lucene - Java
  Issue Type: New Feature
  Components: contrib/highlighter
 Environment: irrelevant
Reporter: Edward Drapkin
Priority: Minor
 Fix For: 4.0
 Attachments: LUCENE-2508.patch

Originally, I had planned to create a contrib module to allow people to 
highlight multiple documents in parallel, but after talking to Uwe in IRC about 
it, I realized that it was pretty useless.  However, I was already sitting on 
an iterative highlighting algorithm that was much faster (my tests show 20% - 
40%) and more accurate and, based on that same IRC conversation, I decided to 
not let all the work that I had done go to waste and try to contribute it back 
again.  Uwe had mentioned that "More like this" detects term vectors when 
called and uses the term vector implementation when possible, if I recall 
correctly, so I decided to do that.

The patch that I've attached is my first stab at this.  It's not nearly 
complete, and full disclosure dictates that I say it's not fully documented 
and no unit tests have been written yet.  I wanted to go ahead and open an 
issue to get some feedback on the approach that I've taken; the fact that the 
issue exists will also be a proverbial kick in the pants to keep working on it.

In short, what I've changed:

* Completely rewritten the non-tv highlighter to be faster and cleaner.  There 
is some small loss in functionality for now, namely the loss of the 
GradientHighlighter (I just haven't done this yet) and the lack of exposure of 
TermFragments and their scores (I can expose this if it is deemed necessary, 
this is one of the things I'd like feedback on). 
* Moved org.apache.lucene.search.vectorhighlight and 
org.apache.lucene.search.highlight to a single package with a unified 
interface, search.highlight (with two sub-packages: search.highlight.termvector 
and search.highlight.iterative, respectively).
* Unified the highlighted term formatting into a single interface: 
highlighter/Formatter and both highlighters use this now.  

What I need to do before I personally would consider this finished:

* Finish documentation, most specifically on TermVectorHighlighter.  I haven't 
done this now as I expect things to change up quite a bit before they're 
finalized and I really hate writing documentation that goes to waste, but I do 
intend to complete this bullet :)
* "Flesh out" the API of search.highlight.Highlighter as it's very barebones 
right now
* Continue removing and consolidating duplicate functionality, like I've done 
with the highlighted word tag generation.

What I think I need feedback on, before I can proceed:
* FastTermVectorHighlighter and the iterative highlighters need completely 
different sets of information in order to work.  The approach I've taken is to 
expose a vectorHighlight method and an iterativeHighlight method in the unified 
interface, as well as a single highlight method that takes all the information 
needed for either of them.  I'm unsure if this is the best way to do this.
* The naming of things; I'm not sure if this is a big issue, or even an issue 
at all, but I'd like to not break any conventions that may exist that I'm 
unaware of.
* How big of a deal is exposing the particular score of a segment from the 
highlighting interface and does this need to be extended into the term vector 
highlighting as well?
* There are a lot of methods in the tv implementation that are marked 
deprecated; since this release will almost definitely break backwards 
compatibility anyway, are these safe to remove?
* Any other input anyone else may have :)

I'm going to continue to work on the things that I can, at least unless 
someone tells me I'm wasting my time, and I look forward to hearing your 
feedback! :)

As a side note, because it does seem rather random that I would arbitrarily 
rewrite a working algorithm in the non-tv highlighter: I did it originally 
because I wanted to parallelize the highlighting (which was a failed 
experiment) and simply to see if I could make the algorithm faster, as I find 
that sort of thing particularly fun :)

As a second side note, if anyone would like an explanation of the algorithm for 
the highlighting I devised, and why I feel that it's more accurate, I'd be 
happy to provide them with one (and benchmarks as well).

Thanks,
Eddie

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (LUCENE-2494) Modify ParallelMultiSearcher to use a CompletionService instead of slowly polling for results

2010-06-09 Thread Edward Drapkin (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12877145#action_12877145
 ] 

Edward Drapkin commented on LUCENE-2494:


That's MUCH better than what I had, kudos!

> Modify ParallelMultiSearcher to use a CompletionService instead of slowly 
> polling for results
> -
>
> Key: LUCENE-2494
> URL: https://issues.apache.org/jira/browse/LUCENE-2494
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
> Environment: Irrelevant
>Reporter: Edward Drapkin
>Assignee: Simon Willnauer
> Fix For: 3.1
>
> Attachments: LUCENE-2494.patch, LUCENE-2494.patch
>
>   Original Estimate: 0h
>  Remaining Estimate: 0h




[jira] Created: (LUCENE-2494) Modify ParallelMultiSearcher to use a CompletionService instead of slowly polling for results

2010-06-07 Thread Edward Drapkin (JIRA)
Modify ParallelMultiSearcher to use a CompletionService instead of slowly 
polling for results
-

 Key: LUCENE-2494
 URL: https://issues.apache.org/jira/browse/LUCENE-2494
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Search
 Environment: Irrelevant
Reporter: Edward Drapkin
 Fix For: 3.1
 Attachments: LUCENE-2494.patch

Right now, the parallel multi searcher creates an array/list of Future 
representing each of the searchables that's being concurrently searched (and 
its corresponding search task).

As it stands, once the tasks are all submitted to the executor, the array is 
iterated over, FIFO, and Future.get() is called on each element in turn.  This 
works, but isn't ideal.  It's entirely possible (and a situation I've run into) 
that one of the first searchables represents a large index that takes a long 
time to search, so the results of the other searchables can't be processed 
until the large index is done searching.  In my case, we have two indexes with 
several million records that get searched in front of some other indexes, the 
smallest of which has only a few tens of thousands of entries, and I didn't 
think it was ideal for the results of the other indexes to wait.

I've modified ParallelMultiSearcher to use a CompletionService instead, so that 
results are processed in the order they are completed, rather than the order 
that they are submitted.  All the tests still pass, and to the best of my 
knowledge this won't break anything.  This has several advantages:
1) Speed - the thread owning the executor doesn't have to wait for the first 
submitted task to finish in order to process the results of the other tasks, 
which may have finished first
2) Removed several warnings (even if they are annotated away) due to the 
ugliness of typecasting generic arrays.
3) Decreased the complexity of the code in some cases, usually by removing the 
necessity of allocating and filling arrays.
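
For readers unfamiliar with the pattern, the difference between calling 
Future.get() in submission order and draining a CompletionService can be 
sketched roughly as follows.  This is a minimal illustration and not the 
attached patch: the class name CompletionOrderSketch, the method searchAll 
and the sleep-based fake "searches" are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical stand-in: "searching" an index is simulated by sleeping.
final class CompletionOrderSketch {

    // Submits one fake search per delay and returns the task names
    // in the order in which the tasks actually finished.
    static List<String> searchAll(long[] delaysMillis) {
        ExecutorService executor =
                Executors.newFixedThreadPool(Math.max(1, delaysMillis.length));
        try {
            CompletionService<String> completion =
                    new ExecutorCompletionService<String>(executor);
            for (int i = 0; i < delaysMillis.length; i++) {
                final int index = i;
                final long delay = delaysMillis[i];
                completion.submit(new Callable<String>() {
                    public String call() throws Exception {
                        Thread.sleep(delay); // pretend this is a slow index
                        return "index-" + index;
                    }
                });
            }
            List<String> finished = new ArrayList<String>();
            for (int i = 0; i < delaysMillis.length; i++) {
                try {
                    // take() blocks until any task is done, so a slow first
                    // task no longer delays handling of the faster ones.
                    finished.add(completion.take().get());
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException(e);
                } catch (ExecutionException e) {
                    throw new IllegalStateException(e.getCause());
                }
            }
            return finished;
        } finally {
            executor.shutdown();
        }
    }
}
```

Calling searchAll(new long[] {400, 10}) hands back the fast task first even 
though the slow one was submitted first; iterating over a list of Futures in 
submission order would have blocked on the 400 ms task before touching the 
other result.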

With a primed "cache" of searchables, I was getting 700-1200 ms per search, and 
using the same phrases, with this patch, I am now getting 400-500ms per search 
:)

Patch is attached.




[jira] Updated: (LUCENE-2494) Modify ParallelMultiSearcher to use a CompletionService instead of slowly polling for results

2010-06-07 Thread Edward Drapkin (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Drapkin updated LUCENE-2494:
---

Attachment: LUCENE-2494.patch

> Modify ParallelMultiSearcher to use a CompletionService instead of slowly 
> polling for results
> -
>
> Key: LUCENE-2494
> URL: https://issues.apache.org/jira/browse/LUCENE-2494
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
> Environment: Irrelevant
>Reporter: Edward Drapkin
> Fix For: 3.1
>
> Attachments: LUCENE-2494.patch
>
>   Original Estimate: 0h
>  Remaining Estimate: 0h




[jira] Commented: (LUCENE-2447) Add support for subsets of searchables inside a MultiSearcher/ParallelMultiSearcher instance's methods at runtime

2010-05-05 Thread Edward Drapkin (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12864619#action_12864619
 ] 

Edward Drapkin commented on LUCENE-2447:


I think that's where I have a misunderstanding of, and disagreement with, the 
function of the multisearcher.  Because it's mostly stateless, I can't help but 
think that it's designed to be instantiated on demand, every time that it needs 
to be used.  While with the "traditional" MultiSearcher this doesn't seem to 
be much of a problem (3us is no time at all), it gets progressively heavier and 
more cumbersome to keep creating Executors every time with 
ParallelMultiSearcher.  OTOH, the ability to specify an Executor at object 
creation "solves" this issue, making ParallelMultiSearcher almost as light as 
MultiSearcher itself.
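
To make the Executor point concrete, here is a rough sketch of a searcher 
that borrows a thread pool passed in at construction instead of creating one 
per instance.  It is only an illustration under stated assumptions: 
BorrowedExecutorSearcher and its search(int) method are hypothetical and are 
not the ParallelMultiSearcher API.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;

// Hypothetical stand-in for a parallel searcher that borrows its thread pool.
final class BorrowedExecutorSearcher {
    // Owned by the caller; this object never creates or shuts down a pool.
    private final ExecutorService executor;

    BorrowedExecutorSearcher(ExecutorService executor) {
        this.executor = executor;
    }

    // Runs a fake "search" on the shared pool and returns its hit count.
    int search(final int fakeHits) {
        Future<Integer> result = executor.submit(new Callable<Integer>() {
            public Integer call() {
                return fakeHits;
            }
        });
        try {
            return result.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
    }
}
```

With this shape, constructing a new searcher per request costs little more 
than an object allocation, because the expensive part (the pool) is created 
once by the application and shared.  The trade-off is exactly the one raised 
in this comment: the caller now has to hold on to the pool.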

Not to digress into architectural theory, but I have a disagreement with the 
way that these classes are designed to be used.  In order to make MultiSearcher 
even functional, the application needs to maintain its list of Searchables 
outside MultiSearcher itself; given the ability to specify an Executor at 
construction time (for ParallelMultiSearcher), you're now maintaining an array 
of Searchables (because it's expensive and wasteful to create _those_ every 
time they're needed) and a thread pool management object in the calling object.  
This reeks of state leakage to me: the state of [Parallel]MultiSearcher 
is being maintained by an external, calling object and is being re-created 
every time it's needed, violating encapsulation conventions and practice.  
Furthermore, (with the caveat that I'm relatively new to Lucene) MultiSearcher 
is itself a Searchable and this behavior is inconsistent with the way that I've 
seen other Searchables handled.  I'm not sure how much sense it makes to trade 
maintaining a reference to one Searchable (that encapsulates several) for 
maintaining a list/array/collection of references to other Searchables, 
especially when you look into multi-threaded apps and non-final collections of 
Searchables that may start to modify the state (however transient) of your 
MultiSearcher outside of the class itself.

I'm not sure how clear it is from my previous comments and the code itself, but 
the idea behind the patch was that the user (in this case, me) wouldn't be 
maintaining anything for the state of the [Parallel]MultiSearcher except for 
the instance of the class itself.  Right now, it's possible to do this (not 
keep any permanent references to anything that's fed into the MultiSearcher 
constructor) but only if you intend to always search all of your Searchables.  
The patch takes a few steps toward making it unnecessary to keep references to 
the Searchables outside of the MultiSearcher (although, come to think about it, 
if this is the direction this class heads in, getSearchables() needs to return 
a [deep] clone of multiSearcher.searchables rather than the array itself), but 
without any method defined in Searchable that allows you to identify a 
Searchable without a reference, I'm not sure that there is much more that can 
be done.
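
The getSearchables() remark can be illustrated with a small defensive-copy 
sketch.  The class below is hypothetical (Strings stand in for Searchable 
instances) and only shows the general technique, not code from the patch.

```java
// Hypothetical holder; Strings stand in for Searchable instances.
final class DefensiveCopySketch {
    private final String[] searchables;

    DefensiveCopySketch(String... searchables) {
        // Copy on the way in, so later writes to the caller's array are not seen.
        this.searchables = searchables.clone();
    }

    String[] getSearchables() {
        // Copy on the way out, so callers cannot overwrite our slots.
        // This is a shallow copy: the elements themselves are still shared.
        return searchables.clone();
    }
}
```

Note that array clone() is shallow.  A true deep clone would also need a way 
to copy each element, which is a separate question for objects that wrap open 
index resources.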

> Add support for subsets of searchables inside a 
> MultiSearcher/ParallelMultiSearcher instance's methods at runtime
> -
>
> Key: LUCENE-2447
> URL: https://issues.apache.org/jira/browse/LUCENE-2447
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Affects Versions: 3.0.1
> Environment: Irrelevant
>Reporter: Edward Drapkin
>Priority: Minor
> Attachments: LUCENE-2447-predicate.patch, LUCENE-2447.patch
>
>   Original Estimate: 0h
>  Remaining Estimate: 0h

[jira] Issue Comment Edited: (LUCENE-2447) Add support for subsets of searchables inside a MultiSearcher/ParallelMultiSearcher instance's methods at runtime

2010-05-05 Thread Edward Drapkin (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12864552#action_12864552
 ] 

Edward Drapkin edited comment on LUCENE-2447 at 5/5/10 6:06 PM:


I talked to Uri in IRC, and he suggested making a Predicate utility 
interface and using that instead of forcing people to use Set.  
This seemed like a much better idea, so I went ahead and implemented that and 
this is the patch that reflects that.

(Note: patch name is LUCENE-2447-predicate.patch).

  was (Author: edwardd):
I talked to Uri in IRC, and he suggested making a Predicate utility 
interface and using that instead of forcing people to use Set.  
This seemed like a much better idea, so I went ahead and implemented that and 
this is the patch that reflects that.
  
> Add support for subsets of searchables inside a 
> MultiSearcher/ParallelMultiSearcher instance's methods at runtime
> -
>
> Key: LUCENE-2447
> URL: https://issues.apache.org/jira/browse/LUCENE-2447
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Affects Versions: 3.0.1
> Environment: Irrelevant
>Reporter: Edward Drapkin
>Priority: Minor
> Attachments: LUCENE-2447-predicate.patch, LUCENE-2447.patch
>
>   Original Estimate: 0h
>  Remaining Estimate: 0h
>
> Here's the situation: We have a site with a fair number of indexes that 
> we're using MultiSearcher/ParallelMultiSearcher for, but the users can select 
> an arbitrary subset of indexes to search.  For example (contrived, but 
> illustrative): the site has indexes numbered 1 - 10; user A wants to search 
> in all 10; user B wants to search indexes 1, 2 and 3; user C wants to search 
> even-numbered indexes.  As of Lucene 3.0.1, the only way to do this is to 
> continually instantiate a new MultiSearcher for every combination of 
> indexes that a user wants, which is not ideal at all.
> What I've done is add a new parameter to all methods in MultiSearcher that 
> use the searchables array (docFreq, search, rewrite and 
> createDocFrequencyMap), a Set which is checked for isEmpty() and 
> contains() for every iteration over the searchables[].  The actual logic has 
> been moved into these methods and the old methods have become overloads that 
> pass a Collections.emptySet() into those methods, so I do not expect there to 
> be a very noticeable performance impact as a result of this modification, if 
> it's measurable at all.
> I didn't modify the test for MultiSearcher very much, just enough to 
> illustrate that subsetting of the search results works, since no other 
> logic has changed.  If I need to do more for the testing, let me know and 
> I'll do it.
> I've attached the patches for MultiSearcher.java, ParallelMultiSearcher.java 
> and TestMultiSearcher.java.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-2447) Add support for subsets of searchables inside a MultiSearcher/ParallelMultiSearcher instance's methods at runtime

2010-05-05 Thread Edward Drapkin (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Drapkin updated LUCENE-2447:
---

Attachment: LUCENE-2447-predicate.patch

I talked to Uri in IRC, and he suggested making a Predicate utility 
interface and using that instead of forcing people to use Set.  
This seemed like a much better idea, so I went ahead and implemented that and 
this is the patch that reflects that.




[jira] Commented: (LUCENE-2447) Add support for subsets of searchables inside a MultiSearcher/ParallelMultiSearcher instance's methods at runtime

2010-05-05 Thread Edward Drapkin (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12864519#action_12864519
 ] 

Edward Drapkin commented on LUCENE-2447:


Ah, cool, regarding LUCENE-2440 :)

You mention that it's possible to accomplish the same thing with the current 
API by instantiating a MultiSearcher per request.  That is possible, but I 
think this way would be much simpler: while it increases the complexity of the 
API, it does so in a consistent way that's easy to understand and use (and 
doesn't break BC).  If the proposed API differs too much from the current one, 
maybe splitting the change into a new class would be the solution (i.e. two 
classes: MultiSearcher and SplittableMultiSearcher).  Either way, under the 
current API, calls look like this:


  public void doSearch() {
    // getSearchablesFromRequestParams() is a faux method
    Set searchables = this.getSearchablesFromRequestParams();
    MultiSearcher mSearcher = new MultiSearcher(searchables);
    mSearcher.search(someQuery, 1000);
    //...
  }

Compare with, under my proposed API:

  public void doSearch() {
    this.mSearcher.search(this.getSearchablesFromRequestParams(), someQuery, 1000);
    //...
  }


Keeping in mind that I'm not sure this is an entirely esoteric/niche 
requirement (surely I can't be the only one who has this issue), and that this 
doesn't break any existing code or significantly increase its execution time, 
the end result is much cleaner code (from userland) that's also less resource 
intensive, especially regarding memory usage and garbage collection times.  
However cheap it may be to instantiate MultiSearcher (on my completely idle 
Q9300 it takes about 3us, or 20us for ParallelMultiSearcher*), it's still more 
expensive than keeping one instance around, especially in a heavily trafficked 
environment.

* I created 100 indexes, each with 10,000 documents (each of which had 100 
fields named name1, name2, etc. with 128 bytes of random string) and then 
tested that - each index was ~60MB.  I can paste the code I used if you would 
like.




[jira] Commented: (LUCENE-2447) Add support for subsets of searchables inside a MultiSearcher/ParallelMultiSearcher instance's methods at runtime

2010-05-05 Thread Edward Drapkin (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12864463#action_12864463
 ] 

Edward Drapkin commented on LUCENE-2447:


A quick search (in JIRA and markmail) yielded no results.  Can you link me to 
the issue?  While that may (yet to be seen) be a solution to *my* problem, is 
that a reason to not accept the proposed patches (2440 and this one, 2447)?




[jira] Commented: (LUCENE-2447) Add support for subsets of searchables inside a MultiSearcher/ParallelMultiSearcher instance's methods at runtime

2010-05-05 Thread Edward Drapkin (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12864454#action_12864454
 ] 

Edward Drapkin commented on LUCENE-2447:


Creating a MultiSearcher per request being too heavy is not the whole concern.  
If you look at LUCENE-2440, I also modified ParallelMultiSearcher to support a 
fixed thread pool; what I'm worried about is that, even with a small fixed 
thread pool of something like 4 threads per searcher, the number of concurrent 
requests could make the number of threads the JVM has to deal with spiral out of 
control.  If I can use the same ParallelMultiSearcher across requests, with a 
fixed thread pool of something sane like 16 or 24 threads, then I can be 
reasonably sure that this particular class isn't going to spiral thread counts 
out of control.  
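
To illustrate the bound described above, here is a small sketch using only java.util.concurrent.  It assumes nothing about Lucene itself: SharedPoolDemo and workerThreadsUsed are made-up names, and each task merely records which worker thread ran it.  However many request/index tasks are submitted, a single shared fixed pool never uses more worker threads than its size.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class SharedPoolDemo {
    private SharedPoolDemo() {}

    /**
     * Submits requests * indexesPerRequest trivial tasks to ONE shared fixed pool
     * and returns the names of the distinct worker threads that executed them.
     */
    static Set<String> workerThreadsUsed(int poolSize, int requests, int indexesPerRequest)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        final Set<String> names = Collections.synchronizedSet(new HashSet<String>());
        try {
            List<Future<?>> futures = new ArrayList<Future<?>>();
            for (int i = 0; i < requests * indexesPerRequest; i++) {
                futures.add(pool.submit(new Runnable() {
                    public void run() {
                        names.add(Thread.currentThread().getName());
                    }
                }));
            }
            // Wait for every task so the set of names is complete.
            for (Future<?> future : futures) {
                future.get();
            }
        } finally {
            pool.shutdown();
        }
        return names;
    }

    public static void main(String[] args) throws Exception {
        // 200 concurrent "requests" over 100 "indexes", all sharing 16 workers.
        int used = workerThreadsUsed(16, 200, 100).size();
        System.out.println("worker threads used: " + used + " (never more than 16)");
    }
}
```

The same tasks submitted to one pool per request would instead scale the thread count with the number of in-flight requests, which is the situation the comment is trying to avoid.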

As far as stuffing everything into the same index, we've looked into that and 
determined that it isn't a real possibility because the size of the indexes - 
there's quite a few ranging from a few MB to a few GB of data - would make the 
merge process relatively expensive and coupled with the fact that the indexes 
themselves are built and maintained separately, we'd be needing to run the 
merging process too frequently for it to be feasible.  




[jira] Updated: (LUCENE-2447) Add support for subsets of searchables inside a MultiSearcher/ParallelMultiSearcher instance's methods at runtime

2010-05-05 Thread Edward Drapkin (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Drapkin updated LUCENE-2447:
---

Attachment: LUCENE-2447.patch




[jira] Updated: (LUCENE-2447) Add support for subsets of searchables inside a MultiSearcher/ParallelMultiSearcher instance's methods at runtime

2010-05-05 Thread Edward Drapkin (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Drapkin updated LUCENE-2447:
---

Attachment: (was: LUCENE-2447.patch)




[jira] Updated: (LUCENE-2447) Add support for subsets of searchables inside a MultiSearcher/ParallelMultiSearcher instance's methods at runtime

2010-05-05 Thread Edward Drapkin (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Drapkin updated LUCENE-2447:
---

Attachment: LUCENE-2447.patch

Patch for proposed change.




[jira] Created: (LUCENE-2447) Add support for subsets of searchables inside a MultiSearcher/ParallelMultiSearcher instance's methods at runtime

2010-05-05 Thread Edward Drapkin (JIRA)
Add support for subsets of searchables inside a 
MultiSearcher/ParallelMultiSearcher instance's methods at runtime
-

 Key: LUCENE-2447
 URL: https://issues.apache.org/jira/browse/LUCENE-2447
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Search
Affects Versions: 3.0.1
 Environment: Irrelevant
Reporter: Edward Drapkin
Priority: Minor
 Attachments: LUCENE-2447.patch

Here's the situation: We have a site with a fair number of indexes that 
we're using MultiSearcher/ParallelMultiSearcher for, but the users can select 
an arbitrary subset of indexes to search.  For example (contrived, but 
illustrative): the site has indexes numbered 1 - 10; user A wants to search in 
all 10; user B wants to search indexes 1, 2 and 3; user C wants to search 
even-numbered indexes.  As of Lucene 3.0.1, the only way to do this is to 
continually instantiate a new MultiSearcher for every combination of 
indexes that a user wants, which is not ideal at all.

What I've done is add a new parameter to all methods in MultiSearcher that use 
the searchables array (docFreq, search, rewrite and createDocFrequencyMap), a 
Set which is checked for isEmpty() and contains() for every 
iteration over the searchables[].  The actual logic has been moved into these 
methods and the old methods have become overloads that pass a 
Collections.emptySet() into those methods, so I do not expect there to be a 
very noticeable performance impact as a result of this modification, if it's 
measurable at all.
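
As a rough illustration of the overload pattern described above (not the attached patch itself), the sketch below uses a toy class in which plain strings stand in for Searchables.  SubsetAwareSearcher and visited are hypothetical names; the only behaviour carried over from the description is that an empty Set means "use everything" and a non-empty Set is consulted with contains() on each iteration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Toy stand-in for a multi-searcher; sources are strings, not Lucene Searchables. */
final class SubsetAwareSearcher {
    private final String[] sources;

    SubsetAwareSearcher(String... sources) {
        this.sources = sources.clone();
    }

    /** Old-style entry point: delegates with an empty set, meaning "use every source". */
    List<String> visited() {
        return visited(Collections.<String>emptySet());
    }

    /** New-style entry point: skips any source missing from a non-empty subset. */
    List<String> visited(Set<String> subset) {
        List<String> result = new ArrayList<String>();
        for (String source : sources) {
            if (!subset.isEmpty() && !subset.contains(source)) {
                continue; // caller asked for a subset and this source is not in it
            }
            result.add(source);
        }
        return result;
    }

    public static void main(String[] args) {
        SubsetAwareSearcher searcher = new SubsetAwareSearcher("idx1", "idx2", "idx3", "idx4");
        Set<String> evens = new HashSet<String>(Arrays.asList("idx2", "idx4"));
        System.out.println(searcher.visited());      // every source
        System.out.println(searcher.visited(evens)); // only the requested subset
    }
}
```

One consequence of this convention is that an empty Set cannot mean "search nothing"; callers who need that case would have to handle it before calling.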

I didn't modify the test for MultiSearcher very much, just enough to illustrate 
that subsetting of the search results works, since no other logic has 
changed.  If I need to do more for the testing, let me know and I'll do it.

I've attached the patches for MultiSearcher.java, ParallelMultiSearcher.java 
and TestMultiSearcher.java.





[jira] Updated: (LUCENE-2440) Add support for custom ExecutorServices in ParallelMultiSearcher

2010-05-03 Thread Edward Drapkin (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Drapkin updated LUCENE-2440:
---

Attachment: LUCENE-2440.patch

Patch to add the ticketed support.

> Add support for custom ExecutorServices in ParallelMultiSearcher
> 
>
> Key: LUCENE-2440
> URL: https://issues.apache.org/jira/browse/LUCENE-2440
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Affects Versions: 3.0.1
> Environment: Any
>Reporter: Edward Drapkin
>Priority: Minor
> Attachments: LUCENE-2440.patch
>
>   Original Estimate: 0h
>  Remaining Estimate: 0h
>
> Right now, the ParallelMultiSearcher uses a cachedThreadPool, which is 
> limitless and a poor choice for a web application, given the threaded nature 
> of the requests (say a webapp with tomcat-default 200 threads and 100 indexes 
> could be looking at 2000 searching threads pretty easily).  Support for 
> adding a custom ExecutorService is pretty trivial.  Patch forthcoming.




[jira] Created: (LUCENE-2440) Add support for custom ExecutorServices in ParallelMultiSearcher

2010-05-03 Thread Edward Drapkin (JIRA)
Add support for custom ExecutorServices in ParallelMultiSearcher


 Key: LUCENE-2440
 URL: https://issues.apache.org/jira/browse/LUCENE-2440
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Search
Affects Versions: 3.0.1
 Environment: Any
Reporter: Edward Drapkin
Priority: Minor


Right now, the ParallelMultiSearcher uses a cachedThreadPool, which is 
limitless and a poor choice for a web application, given the threaded nature of 
the requests (say a webapp with tomcat-default 200 threads and 100 indexes 
could be looking at 2000 searching threads pretty easily).  Support for adding 
a custom ExecutorService is pretty trivial.  Patch forthcoming.
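
A minimal sketch of the idea, assuming nothing about the eventual patch: FanOutSearcher and countMatches are hypothetical names, and each "index" is just a list of strings.  The point is only that the class receives its ExecutorService from the caller instead of creating an unbounded pool itself, so the caller decides how many threads exist.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Toy fan-out searcher; each "index" is a list of strings rather than a Lucene index. */
final class FanOutSearcher {
    private final ExecutorService executor;
    private final List<List<String>> indexes;

    /** The caller supplies the executor and remains responsible for shutting it down. */
    FanOutSearcher(ExecutorService executor, List<List<String>> indexes) {
        this.executor = executor;
        this.indexes = indexes;
    }

    /** Counts entries equal to term across all indexes, one task per index. */
    int countMatches(final String term) throws Exception {
        List<Callable<Integer>> tasks = new ArrayList<Callable<Integer>>();
        for (final List<String> index : indexes) {
            tasks.add(new Callable<Integer>() {
                public Integer call() {
                    int hits = 0;
                    for (String doc : index) {
                        if (doc.equals(term)) {
                            hits++;
                        }
                    }
                    return hits;
                }
            });
        }
        int total = 0;
        // invokeAll blocks until every per-index task has completed.
        for (Future<Integer> future : executor.invokeAll(tasks)) {
            total += future.get();
        }
        return total;
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(4); // bounded, chosen by the caller
        try {
            List<List<String>> indexes = new ArrayList<List<String>>();
            indexes.add(Arrays.asList("lucene", "solr", "lucene"));
            indexes.add(Arrays.asList("nutch", "lucene"));
            System.out.println(new FanOutSearcher(pool, indexes).countMatches("lucene"));
        } finally {
            pool.shutdown();
        }
    }
}
```

Because the pool is passed in, several searchers (or several requests) can share one bounded pool rather than each growing its own.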
