[ 
https://issues.apache.org/jira/browse/JENA-44?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13092106#comment-13092106
 ] 

Stephen Allen commented on JENA-44:
-----------------------------------

I did not include a cancellation mechanism in the DataBags themselves because 
it was not clear to me that it would be necessary.

The only point at which a significant amount of time can be spent in the 
DataBag code is in the add() method right as a spill is occurring.  The program 
execution may be in Arrays.sort() (SortedDataBag and DistinctDataBag) or it may 
be in the process of serializing tuples to disk.  Given anticipated spill 
thresholds (1,000-100,000 tuples or memory in the 10-100 MB range), and the 
fact that disk I/O is sequential (and thus fast), it seemed like an unnecessary 
complication to support cancellation since those operations would complete in 
the tens-of-seconds range.  Any physical query operator using the DataBag would 
then be able to cancel immediately after the spill finished (QueryIterSort 
passes the cancel request to its embedded iterator which will then throw the 
QueryCancellationException on the next iteration).
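
To illustrate the point about where cancellation takes effect, here is a 
minimal sketch.  It is not the actual DataBag or QueryIterSort code: the class 
CancelBetweenAddsSketch, its CancelledException, and the method names are all 
hypothetical, and the "spill" only sorts a run with Arrays.sort() and keeps it 
in memory instead of serializing it to disk.  The add() call has no 
cancellation check, so a cancel request is only noticed on the next iteration 
of the consuming loop.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical illustration only; none of these names come from Jena.
class CancelBetweenAddsSketch {
    // Stand-in for a query cancellation exception.
    static class CancelledException extends RuntimeException {
        CancelledException(String message) { super(message); }
    }

    private final int spillThreshold;
    private final List<String> buffer = new ArrayList<>();
    private final List<String[]> runs = new ArrayList<>();
    private volatile boolean cancelRequested = false;

    CancelBetweenAddsSketch(int spillThreshold) {
        this.spillThreshold = spillThreshold;
    }

    // May be called from another thread while consume() is running.
    void requestCancel() { cancelRequested = true; }

    int runCount() { return runs.size(); }

    // No cancellation check here: once a spill starts, it runs to completion.
    private void add(String tuple) {
        buffer.add(tuple);
        if (buffer.size() >= spillThreshold) {
            String[] run = buffer.toArray(new String[0]);
            Arrays.sort(run);   // the potentially slow step
            runs.add(run);      // a real bag would write the run to disk here
            buffer.clear();
        }
    }

    // The consuming operator checks the flag once per input tuple, so a
    // cancel request takes effect right after any spill in progress finishes.
    int consume(Iterator<String> input) {
        int added = 0;
        while (input.hasNext()) {
            if (cancelRequested) {
                throw new CancelledException("cancelled after " + added + " tuples");
            }
            add(input.next());
            added++;
        }
        return added;
    }
}
```

With a spill threshold of 2, consuming five tuples produces two sorted runs and 
leaves one tuple buffered; calling requestCancel() makes the next consume() 
call throw before it adds anything.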

After the add phase is complete, and the QueryIterSort starts returning 
results, cancellation will be handled by the super class (QueryIteratorBase).

Porting the tests meant that they would test the QueryIterSort with the 
embedded DataBag to be sure that the temporary files were cleaned up when the 
iterator was cancelled.  So it's not really testing cancellation on the DataBag 
per se, but rather the new QueryIterSort.
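
The property those tests check can be sketched roughly as follows.  This is 
not the ported test code: SpillFileCleanupSketch and its methods are 
hypothetical stand-ins, and the only real APIs used are the standard 
java.nio.file calls.  The idea is simply that every spill file created is 
tracked, and cancelling deletes all of them.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not the Jena DataBag implementation.
class SpillFileCleanupSketch {
    private final List<Path> spillFiles = new ArrayList<>();

    // Writes one run of tuples to a new temporary file and remembers it.
    Path spill(List<String> sortedTuples) {
        try {
            Path file = Files.createTempFile("databag-sketch", ".spill");
            Files.write(file, sortedTuples);
            spillFiles.add(file);
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // On cancellation, delete every temporary file created so far.
    void cancel() {
        for (Path file : spillFiles) {
            try {
                Files.deleteIfExists(file);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        spillFiles.clear();
    }

    int trackedFileCount() { return spillFiles.size(); }
}
```

A test in this style would spill a couple of runs, confirm the files exist, 
call cancel(), and then confirm that none of the files remain on disk.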


> Support external sorting of bindings in ARQ
> -------------------------------------------
>
>                 Key: JENA-44
>                 URL: https://issues.apache.org/jira/browse/JENA-44
>             Project: Jena
>          Issue Type: New Feature
>          Components: ARQ
>            Reporter: Sam Tunnicliffe
>            Assignee: Paolo Castagna
>            Priority: Minor
>         Attachments: JENA-44-0.patch, 
> JENA-44-Depends-on-JENA-99-r1157891.patch, JENA-44_ARQ_r1156212.patch, 
> JENA-44_ARQ_r8531.patch, JENA-44_ARQ_r8724.patch
>
>
> In QueryIterSort, the sorting of the contents of an Iterator<Binding> is done 
> in memory, using Arrays.sort. This can be problematic where the set to be 
> sorted is large. A possible solution could be to use an external, disk-backed 
> algorithm. A hybrid approach may be better, whereby we attempt the in-memory 
> sort, but when the number of bindings encountered goes over a certain number, 
> resort to the disk-backed variant.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
