[ https://issues.apache.org/jira/browse/LUCENE-1720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12832461#action_12832461 ]

Shai Erera commented on LUCENE-1720:
------------------------------------

I like the idea of adding the projected activity timeout in general, but I'd 
like to question its usefulness in practice (or at least for search 
applications). The way I think of it (and it might be because I'm thinking of 
my use case), there are two problems with such an API:
# It might not be very easy (if possible at all) or efficient to project how much of 
the work has been done. For TermQuery it might be easy to tell (e.g. 
numSeenSoFar / df(term)), but that will add an 'if' to every document that is 
traversed, and possibly more operations. For more complicated queries, I'm 
not sure you'll be able to tell how much of the query has been processed.
# If I am willing to sustain a 10s query, then I guess I'd want to extract as 
much information as I can in those 10s. If after 1s I realize I haven't 
processed even 10% of the data, that doesn't mean I'd want to stop, right? Maybe 
the query/activity will speed up shortly? I think that if I put a cap on the 
query time, it means I don't mind spending that amount of time ... but I also 
recognize this may depend on the application, so it is not a very 
strong argument.
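To make the two points above concrete, here is a minimal sketch of what a projected-timeout check could look like, assuming progress is reported as a fraction of work done and the total time is extrapolated linearly from the elapsed time. The class and method names (ProjectedTimeout, shouldAbort, termQueryProgress) are hypothetical and are not part of Lucene or the attached patch. A check like this would have to run per document (or every N documents) during traversal, which is the extra 'if' mentioned in the first point.

```java
/**
 * Hypothetical sketch of a "projected activity timeout" check.
 * Names are illustrative only; this is not Lucene API or the LUCENE-1720 patch.
 */
public class ProjectedTimeout {

    /**
     * Returns true if, extrapolating linearly from progress so far,
     * the activity is not expected to finish before the deadline.
     *
     * @param startMillis   when the activity started
     * @param nowMillis     current time
     * @param timeoutMillis absolute deadline
     * @param fractionDone  estimated fraction of work completed, in [0, 1]
     */
    public static boolean shouldAbort(long startMillis, long nowMillis,
                                      long timeoutMillis, double fractionDone) {
        if (nowMillis >= timeoutMillis) {
            return true; // hard timeout already reached
        }
        long elapsed = nowMillis - startMillis;
        if (fractionDone <= 0.0) {
            // No measurable progress: the projection is unbounded. Only abort
            // once some time has passed, so a just-started activity survives.
            return elapsed > 0;
        }
        double projectedTotal = elapsed / fractionDone;
        return startMillis + projectedTotal > timeoutMillis;
    }

    /** Progress estimate for a single-term query: documents seen so far / df(term). */
    public static double termQueryProgress(int numSeenSoFar, int docFreq) {
        if (docFreq == 0) {
            return 1.0; // nothing to traverse, so the work is trivially done
        }
        return Math.min(1.0, (double) numSeenSoFar / docFreq);
    }

    public static void main(String[] args) {
        long start = 0;
        long deadline = 10_000; // a 10s budget
        // After 1s with 5% processed: projected total is 20s > 10s, so abort.
        System.out.println(shouldAbort(start, 1_000, deadline, 0.05));
        // After 1s with 20% processed: projected total is 5s, so keep going.
        System.out.println(shouldAbort(start, 1_000, deadline, 0.20));
    }
}
```

With a 10s budget, 5% progress after 1s projects a 20s total and triggers an abort, even though the query might have sped up later; that is the concern raised in the second point. A zero-progress activity (e.g. a thread stuck in a loop) projects an unbounded total, which is how such a check could flag 'hanging' threads.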

I do think this approach is interesting, though, as it can detect 'hanging' threads 
(such as those stuck in infinite loops).

I realize, however, that ActivityTimeMonitor is not search-specific (which makes 
me think it should be moved to o.a.l.util or similar), and therefore the 
projected activity timeout may be useful in other places too.

How about doing it in a separate issue? We still need to write enough tests 
for what exists so far, and turn the Benchmark class into a benchmark task/alg. 
I think that if we can avoid extra functionality (which is likely to introduce more 
bugs to cover), it will be easier to finish this issue, no?
BTW, in order to support this we'll need to store the startTime as well, not 
just the timeoutTime, which means we either add another startTimesThreads 
map or change the map to go from Thread to a Times object that encapsulates 
both times ... Minor thing, though.
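A minimal sketch of the single-map option, assuming a thread-keyed monitor similar in spirit to ActivityTimeMonitor; ThreadTimesMonitor and its methods are hypothetical names, not the code in the attachments. Keeping startTime next to timeoutTime in one Times holder avoids maintaining a parallel startTimesThreads map, and it is what a projected check needs, since elapsed time cannot be derived from the timeout alone.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Hypothetical sketch: a thread-keyed monitor storing both the start time and
 * the timeout time per thread in a small Times holder. This is NOT the
 * ActivityTimeMonitor from the LUCENE-1720 attachments; names are illustrative.
 */
public class ThreadTimesMonitor {

    /** Encapsulates both times so one map suffices instead of two parallel maps. */
    static final class Times {
        final long startTime;
        final long timeoutTime;

        Times(long startTime, long timeoutTime) {
            this.startTime = startTime;
            this.timeoutTime = timeoutTime;
        }
    }

    private final Map<Thread, Times> times = new ConcurrentHashMap<>();

    /** Begin timing the current thread's activity with the given budget. */
    public void start(long maxMillis) {
        long now = System.currentTimeMillis();
        times.put(Thread.currentThread(), new Times(now, now + maxMillis));
    }

    /** Stop timing the current thread's activity. */
    public void stop() {
        times.remove(Thread.currentThread());
    }

    /** Times for the current thread, or null if it is not being timed. */
    Times current() {
        return times.get(Thread.currentThread());
    }

    /** Hard-timeout check: only timeoutTime is needed. */
    public boolean isTimedOut(long now) {
        Times t = current();
        return t != null && now >= t.timeoutTime;
    }

    /** Projected check: needs startTime too, which is why both are stored. */
    public boolean isProjectedToTimeOut(long now, double fractionDone) {
        Times t = current();
        if (t == null) {
            return false; // current thread is not being monitored
        }
        if (now >= t.timeoutTime) {
            return true;
        }
        if (fractionDone <= 0.0) {
            return now > t.startTime; // no progress: projection is unbounded
        }
        double projectedEnd = t.startTime + (now - t.startTime) / fractionDone;
        return projectedEnd > t.timeoutTime;
    }
}
```

Usage would be the familiar start/try/finally-stop pattern around the timed activity; ConcurrentHashMap is used here only so that a separate watchdog thread could read the map safely while worker threads update it.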

Also, is this targeted at 'core' or contrib?

> TimeLimitedIndexReader and associated utility class
> ---------------------------------------------------
>
>                 Key: LUCENE-1720
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1720
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Index
>            Reporter: Mark Harwood
>            Assignee: Mark Harwood
>            Priority: Minor
>         Attachments: ActivityTimedOutException.java, 
> ActivityTimeMonitor.java, ActivityTimeMonitor.java, ActivityTimeMonitor.java, 
> LUCENE-1720.patch, TestTimeLimitedIndexReader.java, 
> TestTimeLimitedIndexReader.java, TimeLimitedIndexReader.java, 
> TimeLimitedIndexReader.java
>
>
> An alternative to TimeLimitedCollector that has the following advantages:
> 1) Any reader activity can be time-limited rather than just single searches 
> e.g. the document retrieve phase.
> 2) Times out faster (i.e. runaway queries such as fuzzies detected quickly 
> before last "collect" stage of query processing)
> Uses new utility timeout class that is independent of IndexReader.
> Initial contribution includes a performance test class, but I have not yet had 
> time to work up a formal JUnit test.
> TimeLimitedIndexReader is coded as JDK1.5 but can easily be undone.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
