[ https://issues.apache.org/jira/browse/JENA-29?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12995505#comment-12995505 ]
Paolo Castagna edited comment on JENA-29 at 2/16/11 8:44 PM:
-------------------------------------------------------------
We (@Talis) run a few public SPARQL end-points and we want to protect our
machines from people running very expensive queries. Cancelling or timing out a
long-running query is therefore very useful to us. We currently do it by running
the query in a separate thread: a Callable submitted to an ExecutorService gives
us a Future, and we enforce the timeout by calling get(long timeout, TimeUnit
unit) on it.
If one of the QueryExecution.execX() methods fails to return within the timeout,
we send back an HTTP 500 error code and call QueryExecution.abort(). Once we
start streaming back results (i.e. an HTTP 200 status code has been sent to the
client) we never time out or interrupt.
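
For illustration, here is a minimal sketch of the arrangement described above. It is not our production code: the class name QueryTimeoutSketch, the Outcome type and the abort hook are made up for this example, and the Runnable merely stands in for a call to QueryExecution.abort(). Only the java.util.concurrent calls are real API.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper: runs a query-like task in a worker thread and
// maps the result to the HTTP status the front end would send.
final class QueryTimeoutSketch {

    /** Status code plus result (null when the task did not complete). */
    static final class Outcome<T> {
        final int httpStatus;
        final T result;

        Outcome(int httpStatus, T result) {
            this.httpStatus = httpStatus;
            this.result = result;
        }
    }

    static <T> Outcome<T> runWithTimeout(ExecutorService pool,
                                         Callable<T> exec,   // stands in for execX()
                                         Runnable abort,     // stands in for abort()
                                         long timeout,
                                         TimeUnit unit) {
        Future<T> future = pool.submit(exec);
        try {
            // Block for at most `timeout`; nothing has been sent to the client yet.
            return new Outcome<>(200, future.get(timeout, unit));
        } catch (TimeoutException e) {
            abort.run();
            // Interrupts the worker, but a CPU-bound task that never checks
            // the interrupt flag will keep running regardless.
            future.cancel(true);
            return new Outcome<>(500, null);
        } catch (ExecutionException e) {
            // The task itself failed.
            return new Outcome<>(500, null);
        } catch (InterruptedException e) {
            // The waiting thread was interrupted; restore the flag and give up.
            Thread.currentThread().interrupt();
            future.cancel(true);
            return new Outcome<>(500, null);
        }
    }
}
```

The weak point is the timeout branch: neither the abort hook nor future.cancel(true) can force a worker to stop if the code it is running never checks for cancellation.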
Typically, we see timeouts for large CONSTRUCT or DESCRIBE queries. It's rare
for us to time out a SELECT query, unless it has a large sort (i.e. ORDER BY).
In that case, we do time out. However, even though we send the HTTP 500 error
code back to the client, the thread running the sort continues until completion
(wasting RAM and resources). This is where we would greatly benefit from
JENA-29 or JENA-29+JENA-44.
We do not want or need to send partial results to people. We expect this to be
very confusing and difficult to explain to users. They might fall into the
trap of assuming that, if they ask for the 20 largest cities in Europe and we
send them only 10, those 10 are the 10 largest cities in Europe, while in
reality they might not even be among the top 20.
Being able to set a timeout directly in ARQ seems useful, and it potentially
removes the need for a separate thread. However, what exactly does the timeout
represent? Is it the time to get to the first result? Or something else? If
it's not the time to get to the first result, will it be possible to
cancel/reset a timeout once it has been set and the query execution has started?
Finally, a few (personal) comments:
@Andy: "Timing out a query is (I think) going to be the #1 use case."
It is for us at Talis.
@Andy: "It might as well be possible to set/reset during query execution."
Indeed. We would like to be able to forget about the timeout once we start
streaming back a large result set.
@Andy: "We'll need to timeout on sorting and grouping as well, and maybe any
materializing iterators."
Yes.
@Andy: "3/ Buffer - get all the results before sending any, then the HTTP status
code can be set. (3) is ideal functionally but loses streaming."
Yep. Ideal functionally, but not in practice with limited resources and
multiple concurrent SPARQL queries.
@Stephen: "I argue that there should be no expectation of returning any results
that might be "queued up" in an iterator after cancellation has been requested."
I tend to agree, since it is unclear to me exactly which use cases Simon was
referring to.
Perhaps the way forward is to split this issue into two: one about timing out
queries and the other about delivering partial results when a query times out.
> cancellation during query execution
> -----------------------------------
>
> Key: JENA-29
> URL: https://issues.apache.org/jira/browse/JENA-29
> Project: Jena
> Issue Type: Improvement
> Components: ARQ, TDB
> Reporter: Simon Helsen
> Assignee: Andy Seaborne
> Attachments: JENA-29_ARQ_r8489.patch, JENA-29_TDB_r8489.patch,
> JENA-29_tests_ARQ_r8489.patch, jena.patch, jenaAddition.patch,
> queryIterRepeatApply.patch
>
>
> The requested improvement and proposed patch are made by Simon Helsen on
> behalf of IBM.
> ARQ query execution currently does not have a satisfactory way to cancel a
> running query in a safe way. Moreover, cancel (unlike a hard abort) is
> especially useful if it is able to provide partial result sets (i.e. all the
> results it managed to compute up to when the cancellation was requested).
> Although the exact cancellation behavior depends on the capabilities of the
> underlying triple store, the proposed patch merely relies on the iterators
> used by ARQ.
> Here is a more detailed explanation of the proposed changes:
> 1) the cancel() method in the QueryIterator initiates a cancellation request
> (first boolean flag). In analogy with closeIterator(), it propagates through
> all chained iterators, so the entire calculation is aware that a cancellation
> is requested
> 2) to ensure a thread-safe semantics, the cancelRequest becomes a real cancel
> once nextBinding() has been called. It sets the second boolean which is used
> in hasNext(). This 2-phase approach is critical since the cancel() method can
> be called at any time during a query execution by the external thread. And
> because the behavior of hasNext() is such that it has to return the *same*
> value until next() is called, this is the only way to guarantee semantic
> safety when cancel() is invoked (let me re-phrase this: it is the only way I
> was able to make it actually work)
> 3) cancel() does not close anything since it allows execution to finish
> normally and the client is responsible to call close() just like with a
> regular execution. Note that the client has to call cancel() explicitly
> (typically in another thread) and has to assume that the returning result set
> may be incomplete if this method is called (it is undetermined whether the
> result is _actually_ incomplete)
> 4) in order to deal with order-by and groups, I had to make two more changes.
> First, I had to make QueryIterSort and QueryIterGroup slightly more
> lazy. Currently, the full result set is calculated during plan calculation.
> With my proposed adjustments, this full result set is computed on the first
> call to any of its Iterator methods (e.g. hasNext). This change does not
> AFAIK affect the semantics. Second, because the desired behavior of
> cancelling a sort or group query is to make sure everything is sorted/grouped
> even if the total result set is not completed, I added an exception which
> reverses the cancellation request of the encompassing iterator (as an example
> see cancel() in QueryIterSort). This makes sure that the entire subset of
> found and sorted elements is returned, not just the first element. However,
> it also implies in the case of sort that when a query is cancelled, it will
> first sort the partially complete result set before returning to the client.
> The attached patch is based on ARQ 2.8.5 (and a few classes in TDB 0.8.7 ->
> possibly the other triple store implementations need adjustment as well)
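
To make points 1) and 2) of the quoted description concrete, here is a rough sketch of the two-flag scheme as I read it. It is not taken from the attached patches: the class TwoPhaseCancelIterator is invented for this example and wraps a plain java.util.Iterator rather than ARQ's QueryIterator, and it assumes the wrapped iterator's own hasNext() is stable between calls to next().

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical wrapper showing a cancel request (phase 1) that only
// takes effect at the next call to next() (phase 2).
final class TwoPhaseCancelIterator<T> implements Iterator<T> {
    private final Iterator<T> source;

    // Phase 1: may be set by any thread at any time.
    private volatile boolean cancelRequested = false;

    // Phase 2: written only by the consuming thread, inside next().
    private boolean cancelled = false;

    TwoPhaseCancelIterator(Iterator<T> source) {
        this.source = source;
    }

    /** Request cancellation; safe to call from another thread. */
    void cancel() {
        cancelRequested = true;
    }

    @Override
    public boolean hasNext() {
        // Only the phase-2 flag is consulted, so a concurrent cancel()
        // cannot change the answer between hasNext() and next().
        return !cancelled && source.hasNext();
    }

    @Override
    public T next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        T item = source.next();
        if (cancelRequested) {
            // Promote the request to a real cancel: this item is still
            // returned, but hasNext() reports false from now on.
            cancelled = true;
        }
        return item;
    }
}
```

With this shape, a consumer that has already seen hasNext() return true is still guaranteed one more element after cancel() is called, and the iteration then ends cleanly so the caller can close() as usual.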
--
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira