[
https://issues.apache.org/jira/browse/JENA-29?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12995492#comment-12995492
]
Simon Helsen commented on JENA-29:
----------------------------------
first, let me generally say that it is not because in some cases the "do your
best" approach does not work well, you should discard the entire approach. In
some cases, it *does* work well, and those cases are based on real use cases
and that is what matters to us. So, I would really appreciate if you take the
academic approach out of this since it won't help and is irrelevant.
Concrete:
1) subqueries: I think, as discussed in JENA-48, if a subquery is sorting, it
should give up right away. It should be simple to figure out if a query is a
subquery and enrich the logic to behave differently there.
2) This I don't get at all. If the QueryExecutionBase keeps track of whether a
query was requested to cancel, any outside client program can request that
information, that is, when partial results are desired. If *NO* partial results
are desired, don't use the cancel() the way I provided it but use abort() or
close(). What is problematic here?
3) again, same as above. If you don't care about partial results, then don't
use cancel() ! Instead use abort() or close(). And if you want to enrich
abort() in a thread-safe manner, be my guest
Stephen, again, you keep questioning our use-cases because it does not fit in a
generic theoretical understanding that it works well in all cases. If it works
well in the relevant cases, I don't see why it cannot be there. And I agree
that it is useful to have a version which just aborts, perhaps throws an
exception or whatever, but when I provided the cancel() patch, it was not
designed to do this.
Perhaps we could agree that we keep cancel() the way I suggested with small
improvements in the style of JENA-48 to take care of special subcases and at
the same time provide a thread-safe way to stop query execution (perhaps using
the same 2-stage technique I introduced with cancel()) to abort a query. I
think abort() could easily be adapted to that. As long as the javadoc is clear
about the difference between cancel() /** I try to give you partial results as
good as I can and stop the query as soon as I can */ and abort() /** I will
stop asap and throw an exception indicating I aborted */
But I have to insist that we need and use the "do your best" approach the way I
introduced it and that it works well in the cases our clients are using. If
someone expects unreasonable outcomes, they can be pointed out that the
contract does not guarantee this unreasonable expectation. And a top-level sort
is not unreasonable IMO
Could this be a way to resolve the difference of opinion? I.e. provide 2 ways
to stop a query with different behavior?
> cancellation during query execution
> -----------------------------------
>
> Key: JENA-29
> URL: https://issues.apache.org/jira/browse/JENA-29
> Project: Jena
> Issue Type: Improvement
> Components: ARQ, TDB
> Reporter: Simon Helsen
> Assignee: Andy Seaborne
> Attachments: JENA-29_ARQ_r8489.patch, JENA-29_TDB_r8489.patch,
> JENA-29_tests_ARQ_r8489.patch, jena.patch, jenaAddition.patch,
> queryIterRepeatApply.patch
>
>
> The requested improvement and proposed patch is made by Simon Helsen on
> behalf of IBM
> ARQ query execution currently does not have a satisfactory way to cancel a
> running query in a safe way. Moreover, cancel (unlike a hard abort) is
> especially useful if it is able to provide partial result sets (i.e. all the
> results it managed to compute up to when the cancellation was requested).
> Although the exact cancellation behavior depends on the capabilities of the
> underlying triple store, the proposed patch merely relies on the iterators
> used by ARQ.
> Here is a more detailed explanation of the proposed changes:
> 1) the cancel() method in the QueryIterator initiates a cancellation request
> (first boolean flag). In analogy with closeIterator(), it propagates through
> all chained iterators, so the entire calculation is aware that a cancellation
> is requested
> 2) to ensure a thread-safe semantics, the cancelRequest becomes a real cancel
> once nextBinding() has been called. It sets the second boolean which is used
> in hasNext(). This 2-phase approach is critical since the cancel() method can
> be called at any time during a query execution by the external thread. And
> because the behavior of hasNext() is such that it has to return the *same*
> value until next() is called, this is the only way to guarantee semantic
> safety when cancel() is invoked (let me re-phrase this: it is the only way I
> was able to make it actually work)
> 3) cancel() does not close anything since it allows execution to finish
> normally and the client is responsible to call close() just like with a
> regular execution. Note that the client has to call cancel() explicitly
> (typically in another thread) and has to assume that the returning result set
> may be incomplete if this method is called (it is undetermined whether the
> result is _actually_ incomplete)
> 4) in order to deal with order-by and groups, I had to make two more changes.
> First, I had to make QueryIterSort and QueryIterGroup a slightly bit more
> lazy. Currently, the full result set is calculated during plan calculation.
> With my proposed adjustments, this full result set is called on the first
> call to any of its Iterator methods (e.g. hasNext). This change does not
> AFAIK affect the semantics. Second, because the desired behavior of
> cancelling a sort or group query is to make sure everything is sorted/grouped
> even if the total result set is not completed, I added an exception which
> reverses the cancellation request of the encompassing iterator (as an example
> see cancel() in QueryIterSort). This makes sure that the entire subset of
> found and sorted elements is returned, not just the first element. However,
> it also implies in the case of sort that when a query is cancelled, it will
> first sort the partially complete result set before returning to the client.
> the attached patch is based on ARQ 2.8.5 (and a few classes in TDB 0.8.7 ->
> possibly the other triple store implementations need adjustement as well)
--
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira