[jira] Issue Comment Edited: (JENA-29) cancellation during query execution

Paolo Castagna (JIRA) Wed, 16 Feb 2011 12:46:47 -0800

    [ 
https://issues.apache.org/jira/browse/JENA-29?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12995505#comment-12995505
 ]


Paolo Castagna edited comment on JENA-29 at 2/16/11 8:44 PM:
-------------------------------------------------------------

We (@Talis) run a few public SPARQL endpoints and we want to protect our 
machines from people running very expensive queries. Cancelling or timing out a 
long-running query is therefore very useful to us. We currently do it in a 
separate thread, via a Callable and an ExecutorService, which gives us a Future 
object; we enforce the timeout by calling get(long timeout, TimeUnit unit) on 
it.

If one of the QueryExecution.execX() methods fails to return within the 
timeout, we send back an HTTP 500 error code and call QueryExecution.abort(). 
Once we start streaming back results (i.e. an HTTP 200 status code has been 
sent to the client) we never time out or interrupt.
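
A minimal sketch of the timeout approach described above, using only 
java.util.concurrent. The AbortableQuery interface, runWithTimeout() and 
sleepingQuery() are hypothetical names introduced for illustration; 
AbortableQuery merely stands in for the execX() and abort() methods of 
QueryExecution. Mapping a failed query to HTTP 500 is also an assumption.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicBoolean;

final class QueryTimeoutSketch {

    /** Hypothetical stand-in for the two QueryExecution operations used here. */
    interface AbortableQuery<T> {
        T exec() throws Exception; // plays the role of an execX() method
        void abort();              // plays the role of QueryExecution.abort()
    }

    /** Returns the HTTP status to send: 200 if exec() finished in time, else 500. */
    static <T> int runWithTimeout(AbortableQuery<T> query, long timeout, TimeUnit unit) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            Future<T> future = executor.submit((Callable<T>) query::exec);
            try {
                future.get(timeout, unit); // blocks for at most `timeout`
                return 200;
            } catch (TimeoutException e) {
                query.abort();             // ask the query to stop
                future.cancel(true);       // interrupt the worker thread
                return 500;
            } catch (ExecutionException e) {
                return 500;                // assumption: a failed query is also a 500
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                query.abort();
                return 500;
            }
        } finally {
            executor.shutdownNow();
        }
    }

    /** Demo query that sleeps for `millis` and records whether abort() was called. */
    static AbortableQuery<String> sleepingQuery(final long millis, final AtomicBoolean aborted) {
        return new AbortableQuery<String>() {
            @Override public String exec() throws InterruptedException {
                Thread.sleep(millis);
                return "done";
            }
            @Override public void abort() { aborted.set(true); }
        };
    }

    public static void main(String[] args) {
        AtomicBoolean aborted = new AtomicBoolean(false);
        int status = runWithTimeout(sleepingQuery(5000, aborted), 100, TimeUnit.MILLISECONDS);
        System.out.println(status + " aborted=" + aborted.get());
    }
}
```

A real endpoint would presumably share one executor across requests rather 
than create one per query. Whether abort() actually stops the work depends on 
the query engine.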

Typically, we see timeouts for large CONSTRUCT or DESCRIBE queries. It's rare 
for us to time out a SELECT query unless it has a large sort (i.e. ORDER BY). 
However, even after we send the HTTP 500 error code back to the client, the 
thread running the sort continues until completion (wasting RAM and 
resources). This is where we would greatly benefit from JENA-29 or 
JENA-29+JENA-44.

We do not want or need to send partial results to people. We expect this to be 
very confusing and difficult to explain to users. They might fall into the 
trap of assuming that, if they ask for the 20 largest cities in Europe and we 
send them only 10, those 10 are the 10 largest cities in Europe, while in 
reality they might not even be among the 20 largest.

Being able to set a timeout directly in ARQ seems useful and it potentially 
removes the need for a separate thread. However, what exactly does the timeout 
represent? Is it the time to get to the first result? Or what? If it's not the 
time to get to the first result, will it be possible to cancel/reset a timeout 
once it has been set and the query execution has started?
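
One possible reading of "cancel/reset a timeout" can be sketched with a 
ScheduledExecutorService. This is not ARQ's API: the class 
ResettableTimeoutSketch and its set(), clear() and demo() methods are 
hypothetical names, and the callback stands in for whatever aborts the query. 
The timeout is armed when execution starts and cleared once the first result 
is about to be streamed.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

final class ResettableTimeoutSketch implements AutoCloseable {
    private final ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
    private final Runnable onTimeout;   // e.g. a call that aborts the query
    private ScheduledFuture<?> pending; // the currently armed timeout, if any

    ResettableTimeoutSketch(Runnable onTimeout) { this.onTimeout = onTimeout; }

    /** Arms the timeout, replacing any timeout that is already pending. */
    synchronized void set(long delay, TimeUnit unit) {
        clear();
        pending = timer.schedule(onTimeout, delay, unit);
    }

    /** Cancels the pending timeout, if any, without interrupting a running callback. */
    synchronized void clear() {
        if (pending != null) {
            pending.cancel(false);
            pending = null;
        }
    }

    @Override public void close() { timer.shutdownNow(); }

    /** Demo: returns true if the timeout callback ran within roughly 300 ms. */
    static boolean demo(boolean clearBeforeDeadline) {
        final AtomicBoolean aborted = new AtomicBoolean(false);
        try (ResettableTimeoutSketch timeout =
                     new ResettableTimeoutSketch(() -> aborted.set(true))) {
            timeout.set(50, TimeUnit.MILLISECONDS);
            if (clearBeforeDeadline) {
                timeout.clear(); // e.g. the first result is about to be streamed
            }
            Thread.sleep(300);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return aborted.get();
    }

    public static void main(String[] args) {
        System.out.println("fired without clear: " + demo(false));
        System.out.println("fired after clear:   " + demo(true));
    }
}
```

Calling set() again while a timeout is pending replaces it, which covers the 
"reset" case as well as the "cancel" case.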

Finally, a few (personal) comments:

@Andy: "Timing out a query is (I think) going to be the #1 use case."

It is for us at Talis.

@Andy: "It might as well be possible to set/reset during query execution."

Indeed. We would like to be able to forget about the timeout once we start 
streaming back a large result set.

@Andy: "We'll need to timeout on sorting and grouping as well, and maybe any 
materializing iterators."

Yes.

@Andy: 3/ Buffer - get all the results before sending any, then the HTTP status 
code can be set. (3) is ideal functionally but loses streaming.

Yep. Ideal functionally, but not in practice with limited resources and 
multiple concurrent SPARQL queries.

@Stephen "I argue that there should be no expectation of returning any results 
that might be "queued up" in an iterator after cancellation has been requested."

I tend to agree, since it is unclear to me exactly which use cases Simon was 
referring to.

Perhaps the way to move forward on this is to split the issue in two: one 
about timing out queries and the other about delivering partial results when a 
query times out.

> cancellation during query execution
> -----------------------------------
>
>                 Key: JENA-29
>                 URL: https://issues.apache.org/jira/browse/JENA-29
>             Project: Jena
>          Issue Type: Improvement
>          Components: ARQ, TDB
>            Reporter: Simon Helsen
>            Assignee: Andy Seaborne
>         Attachments: JENA-29_ARQ_r8489.patch, JENA-29_TDB_r8489.patch, 
> JENA-29_tests_ARQ_r8489.patch, jena.patch, jenaAddition.patch, 
> queryIterRepeatApply.patch
>
>
> The requested improvement and proposed patch are made by Simon Helsen on 
> behalf of IBM.
> ARQ query execution currently does not have a satisfactory way to cancel a 
> running query in a safe way. Moreover, cancel (unlike a hard abort) is 
> especially useful if it is able to provide partial result sets (i.e. all the 
> results it managed to compute up to when the cancellation was requested). 
> Although the exact cancellation behavior depends on the capabilities of the 
> underlying triple store, the proposed patch merely relies on the iterators 
> used by ARQ.
> Here is a more detailed explanation of the proposed changes:
> 1) the cancel() method in the QueryIterator initiates a cancellation request 
> (first boolean flag). In analogy with closeIterator(), it propagates through 
> all chained iterators, so the entire calculation is aware that a cancellation 
> is requested
> 2) to ensure thread-safe semantics, the cancelRequest becomes a real cancel 
> once nextBinding() has been called. It sets the second boolean which is used 
> in hasNext(). This 2-phase approach is critical since the cancel() method can 
> be called at any time during a query execution by the external thread. And 
> because the behavior of hasNext() is such that it has to return the *same* 
> value until next() is called, this is the only way to guarantee semantic 
> safety when cancel() is invoked (let me re-phrase this: it is the only way I 
> was able to make it actually work)
> 3) cancel() does not close anything since it allows execution to finish 
> normally and the client is responsible to call close() just like with a 
> regular execution. Note that the client has to call cancel() explicitly 
> (typically in another thread) and has to assume that the returning result set 
> may be incomplete if this method is called (it is undetermined whether the 
> result is _actually_ incomplete)
> 4) in order to deal with order-by and groups, I had to make two more changes. 
> First, I had to make QueryIterSort and QueryIterGroup slightly more 
> lazy. Currently, the full result set is calculated during plan calculation. 
> With my proposed adjustments, this full result set is calculated on the first 
> call to any of its Iterator methods (e.g. hasNext). This change does not 
> AFAIK affect the semantics. Second, because the desired behavior of 
> cancelling a sort or group query is to make sure everything is sorted/grouped 
> even if the total result set is not completed, I added an exception which 
> reverses the cancellation request of the encompassing iterator (as an example 
> see cancel() in QueryIterSort). This makes sure that the entire subset of 
> found and sorted elements is returned, not just the first element. However, 
> it also implies in the case of sort that when a query is cancelled, it will 
> first sort the partially complete result set before returning to the client.
> The attached patch is based on ARQ 2.8.5 (and a few classes in TDB 0.8.7; 
> the other triple store implementations may need adjustment as well).
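
The two-phase cancellation in points 1) and 2) of the description can be 
sketched as follows. This is an illustration under stated assumptions, not the 
code of the attached patch: TwoPhaseCancelIterator is a hypothetical class 
that wraps a plain java.util.Iterator rather than ARQ's QueryIterator, and 
next() plays the role of nextBinding().

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

final class TwoPhaseCancelIterator<T> implements Iterator<T> {
    private final Iterator<T> source;
    // Phase 1: the request flag, which any thread may set at any time.
    private volatile boolean cancelRequested = false;
    // Phase 2: the effective flag, changed only inside next() by the consumer.
    private boolean cancelled = false;

    TwoPhaseCancelIterator(Iterator<T> source) { this.source = source; }

    /** Only records the request; it closes nothing and does not affect hasNext() yet. */
    void cancel() { cancelRequested = true; }

    @Override public boolean hasNext() {
        // Reads only the phase-2 flag, so the answer is stable between next() calls.
        return !cancelled && source.hasNext();
    }

    @Override public T next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        T item = source.next();
        if (cancelRequested) {
            cancelled = true; // promote the request to a real cancel
        }
        return item;
    }

    public static void main(String[] args) {
        TwoPhaseCancelIterator<Integer> it =
                new TwoPhaseCancelIterator<>(Arrays.asList(1, 2, 3, 4).iterator());
        List<Integer> seen = new ArrayList<>();
        while (it.hasNext()) {
            seen.add(it.next());
            if (seen.size() == 2) {
                it.cancel(); // in practice this would come from another thread
            }
        }
        System.out.println(seen); // a partial result: iteration stops before the end
    }
}
```

In this sketch a cancel() issued between hasNext() and next() lets one more 
element through before iteration stops, which is the price of keeping 
hasNext() stable. The sort/group exception described in point 4) is not 
modelled here.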

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira
