[jira] [Commented] (JENA-119) Eliminate memory bounds during query execution

Stephen Allen (Commented) (JIRA) Fri, 30 Sep 2011 07:46:10 -0700

    [ https://issues.apache.org/jira/browse/JENA-119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13118099#comment-13118099 ]


Stephen Allen commented on JENA-119:
------------------------------------

I disagree about allowing separate symbols to control each operator 
individually.  Is it going to be a common use case to give ORDER BY 10,000 
bindings and MINUS, say, 1,000?  How would you know that even made sense?  
Separate symbols mean a user now has to configure up to 10 different options, 
with very little context or knowledge of what to set each one to.  I don't 
even know that *I* can come up with good values for those numbers, much less 
ask a user to do so.

Would "tmpTableCount" be better?  I want to avoid "tmpTableSize", since "size" 
may imply memory size and we may also want to use that name in the future for 
JENA-126.
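
To make the single-setting idea concrete, here is a rough sketch in which 
every operator reads the same count threshold.  The SpillSettings class, the 
plain Map standing in for the query context, and the NO_SPILL sentinel are all 
assumptions for illustration, not ARQ's actual API; "tmpTableCount" is only 
the name floated above.

```java
import java.util.Map;

final class SpillSettings {
    // Hypothetical key, using the name proposed in this comment; not an existing ARQ symbol.
    static final String TMP_TABLE_COUNT = "tmpTableCount";

    // Sentinel meaning "no limit configured, keep everything in memory".
    static final long NO_SPILL = -1L;

    private SpillSettings() {}

    // One threshold (a number of bindings, not bytes) shared by every operator.
    static long spillThreshold(Map<String, Object> context) {
        Object value = context.get(TMP_TABLE_COUNT);
        return (value instanceof Number) ? ((Number) value).longValue() : NO_SPILL;
    }

    // ORDER BY, MINUS, DISTINCT, etc. would all ask the same question.
    static boolean shouldSpill(Map<String, Object> context, long bindingsHeld) {
        long threshold = spillThreshold(context);
        return threshold >= 0 && bindingsHeld >= threshold;
    }
}
```

With this shape a user sets one number, and an unset value simply means the 
current all-in-memory behaviour.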
                
> Eliminate memory bounds during query execution
> ----------------------------------------------
>
>                 Key: JENA-119
>                 URL: https://issues.apache.org/jira/browse/JENA-119
>             Project: Jena
>          Issue Type: New Feature
>          Components: ARQ
>            Reporter: Stephen Allen
>         Attachments: JENA-119-r1177090-Fuseki-Construct.patch, 
> JENA-119-r1177452-ARQ-Construct.patch
>
>
> It would be nice to eliminate all memory bounds on queries.  Similar to 
> JENA-44, it would involve modifying all of the QueryIterator objects that 
> maintain unbounded collections of Bindings.
> The ones I've identified (let me know if I've missed any):
> + QueryIterSort
>       Complete!
> + QueryIterGroup
>       Probably one of the more complicated implementations.  I think it can 
> be done with a DistinctDataBag.
> + QueryIterDistinct
>       Can be implemented trivially using DistinctDataBag, but would lose 
> streaming capability.  We could do streaming just until the first spill, 
> which would be a little more difficult but not bad.  If we wanted streaming 
> even after spilling, then we would need an on-disk hashtable or b-tree, which 
> could get expensive for limited benefit (do you really need streaming 
> after 10,000 results?).
> + QueryIteratorCopy
>       Only appears to be used by QueryIterService.  Simple implementation 
> using DefaultDataBag.
> + QueryIteratorCaching
>       Does not match DataBag's assumption of completing all writes before 
> iterating.  But it isn't used anywhere, so maybe we remove it?
> + QueryIterDiff
> + QueryIterMinus
>       Both of these materialize the RHS into a collection.  Can be 
> implemented with DefaultDataBag.  As an aside, is this necessary to do for 
> all queries?  What if the RHS is cheap (e.g. a single TriplePattern)?
> + QueryIterJoin
> + QueryIterLeftJoin
>      Both materialize RHS.  Are they used anywhere?  I was under the 
> impression that ARQ only considered left-deep plans with indexed joins on the 
> RHS TriplePatterns.
> + SubQueries
>      I'm not sure how this is handled.  Are these materialized somewhere?
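
For readers unfamiliar with the data-bag idea the description relies on, the 
following is a minimal sketch of a count-based spilling bag.  It is not the 
DefaultDataBag implementation: the SpillingBag class and its methods are 
hypothetical, items are plain strings rather than Bindings, and it assumes 
items contain no line breaks.  It does mirror the assumption mentioned under 
QueryIteratorCaching, that all writes finish before iteration starts.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical illustration only; not a Jena class.
final class SpillingBag implements AutoCloseable {
    private final int threshold;                       // max items held in memory
    private final List<String> memory = new ArrayList<>();
    private final List<Path> spillFiles = new ArrayList<>();
    private boolean reading = false;

    SpillingBag(int threshold) {
        if (threshold < 1) throw new IllegalArgumentException("threshold must be >= 1");
        this.threshold = threshold;
    }

    void add(String item) {
        if (reading) throw new IllegalStateException("all writes must finish before iterating");
        memory.add(item);
        if (memory.size() >= threshold) spill();
    }

    // Write the in-memory items to a temp file, one per line, then release them.
    private void spill() {
        try {
            Path file = Files.createTempFile("spilling-bag", ".tmp");
            Files.write(file, memory, StandardCharsets.UTF_8);
            spillFiles.add(file);
            memory.clear();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    int spillCount() { return spillFiles.size(); }

    // Stream spilled items back one line at a time, then the in-memory remainder.
    void forEach(Consumer<String> action) {
        reading = true;
        for (Path file : spillFiles) {
            try (BufferedReader in = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
                for (String line = in.readLine(); line != null; line = in.readLine()) {
                    action.accept(line);
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        memory.forEach(action);
    }

    @Override
    public void close() {
        for (Path file : spillFiles) {
            try { Files.deleteIfExists(file); } catch (IOException ignored) { /* best effort */ }
        }
        spillFiles.clear();
        memory.clear();
    }
}
```

A distinct or sorted variant would need more work on the read side (for 
example merging sorted spill files), which is where the extra complexity in 
the description comes from.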

