cancellation during query execution
-----------------------------------

                 Key: JENA-29
                 URL: https://issues.apache.org/jira/browse/JENA-29
             Project: Jena
          Issue Type: Improvement
          Components: ARQ, TDB
            Reporter: Simon Helsen
         Attachments: jena.patch

The requested improvement and proposed patch is made by Simon Helsen on behalf 
of IBM

ARQ query execution currently does not have a satisfactory way to cancel a 
running query in a safe way. Moreover, cancel (unlike a hard abort) is 
especially useful if it is able to provide partial result sets (i.e. all the 
results it managed to compute up to when the cancellation was requested). 
Although the exact cancellation behavior depends on the capabilities of the 
underlying triple store, the proposed patch merely relies on the iterators used 
by ARQ.

Here is a more detailed explanation of the proposed changes:

1) the cancel() method in the QueryIterator initiates a cancellation request 
(first boolean flag). In analogy with closeIterator(), it propagates through 
all chained iterators, so the entire calculation is aware that a cancellation 
is requested
2) to ensure a thread-safe semantics, the cancelRequest becomes a real cancel 
once nextBinding() has been called. It sets the second boolean which is used in 
hasNext(). This 2-phase approach is critical since the cancel() method can be 
called at any time during a query execution by the external thread. And because 
the behavior of hasNext() is such that it has to return the *same* value until 
next() is called, this is the only way to guarantee semantic safety when 
cancel() is invoked (let me re-phrase this: it is the only way I was able to 
make it actually work)
3) cancel() does not close anything since it allows execution to finish 
normally and the client is responsible to call close() just like with a regular 
execution. Note that the client has to call cancel() explicitly (typically in 
another thread) and has to assume that the returning result set may be 
incomplete if this method is called (it is undetermined whether the result is 
_actually_ incomplete)
4) in order to deal with order-by and groups, I had to make two more changes. 
First, I had to make QueryIterSort and QueryIterGroup a slightly bit more lazy. 
Currently, the full result set is calculated during plan calculation. With my 
proposed adjustments, this full result set is called on the first call to any 
of its Iterator methods (e.g. hasNext). This change does not AFAIK affect the 
semantics. Second, because the desired behavior of cancelling a sort or group 
query is to make sure everything is sorted/grouped even if the total result set 
is not completed, I added an exception which reverses the cancellation request 
of the encompassing iterator (as an example see cancel() in QueryIterSort). 
This makes sure that the entire subset of found and sorted elements is 
returned, not just the first element. However, it also implies in the case of 
sort that when a query is cancelled, it will first sort the partially complete 
result set before returning to the client.

the attached patch is based on ARQ 2.8.5 (and a few classes in TDB 0.8.7 -> 
possibly the other triple store implementations need adjustement as well)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply via email to