[jira] [Updated] (JENA-119) Eliminate memory bounds during query execution

Stephen Allen (JIRA) Mon, 19 Sep 2011 16:25:32 -0700

     [ https://issues.apache.org/jira/browse/JENA-119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Stephen Allen updated JENA-119:
-------------------------------

    Description: 
It would be nice to eliminate all memory bounds on queries.  Similar to 
JENA-44, it would involve modifying all of the QueryIterator objects that 
maintain unbounded collections of Bindings.


The ones I've identified (let me know if I've missed any):

+ QueryIterSort
      Complete!

+ QueryIterGroup
      Probably one of the more complicated implementations.  I think it can be 
done with a DistinctDataBag.

+ QueryIterDistinct
      Can be implemented trivially using DistinctDataBag, but would lose 
streaming capability.  We could do streaming just until the first spill, which 
would be a little more difficult but not bad.  If we wanted streaming even 
after spilling, then we would need an on-disk hashtable or b-tree (which could 
get expensive for maybe limited benefit; do you really need streaming after 
10,000 results?).

+ QueryIteratorCopy
      Only appears to be used by QueryIterService.  Simple implementation using 
DefaultDataBag.

+ QueryIteratorCaching
      Does not match DataBag's assumption of completing all writes before 
iterating.  But it isn't used anywhere, so maybe we remove it?

+ QueryIterDiff
+ QueryIterMinus
      Both of these materialize the RHS into a collection.  Can be implemented 
with DefaultDataBag.  As an aside, is this necessary to do for all queries?  
What if the RHS is cheap (e.g. a single TriplePattern)?

+ QueryIterJoin
+ QueryIterLeftJoin
     Both materialize RHS.  Are they used anywhere?  I was under the impression 
that ARQ only considered left-deep plans with indexed joins on the RHS 
TriplePatterns.

+ SubQueries
     I'm not sure how this is handled.  Are these materialized somewhere?
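
To make the QueryIterDistinct option above concrete (stream until the first 
spill, then fall back to disk), here is a minimal Java sketch.  It does not use 
ARQ or the DataBag classes.  The class name SpillingDistinctSketch, the use of 
single-line strings in place of Bindings, and the sorted-run merge applied after 
the spill are all assumptions made for illustration, not the planned 
implementation.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.Consumer;

/** Hypothetical sketch, not ARQ code: DISTINCT that streams until the first spill. */
final class SpillingDistinctSketch {

    /** One open sorted run on disk plus its smallest unread line. */
    private static final class Run {
        final BufferedReader reader;
        String head;
        Run(Path file) throws IOException {
            reader = Files.newBufferedReader(file, StandardCharsets.UTF_8);
            head = reader.readLine();
        }
        void advance() throws IOException { head = reader.readLine(); }
    }

    /**
     * Passes each distinct row of input to out exactly once.  Rows are plain
     * single-line strings here; each in-memory buffer holds at most threshold rows.
     */
    static void distinct(Iterator<String> input, int threshold, Consumer<String> out) {
        try {
            // Phase 1: stream.  A row is emitted the moment it is first seen.
            Set<String> emitted = new HashSet<>();
            while (input.hasNext() && emitted.size() < threshold) {
                String row = input.next();
                if (emitted.add(row)) out.accept(row);
            }
            // Phase 2: memory budget reached.  Buffer the rest into sorted,
            // de-duplicated runs on disk; nothing is emitted during this phase.
            List<Path> runs = new ArrayList<>();
            TreeSet<String> buffer = new TreeSet<>();
            while (input.hasNext()) {
                String row = input.next();
                if (emitted.contains(row)) continue;   // already streamed out
                buffer.add(row);
                if (buffer.size() >= threshold) { runs.add(writeRun(buffer)); buffer.clear(); }
            }
            if (!buffer.isEmpty()) runs.add(writeRun(buffer));
            // Phase 3: k-way merge of the runs, skipping repeats of the last row emitted.
            mergeRuns(runs, out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private static Path writeRun(TreeSet<String> sortedRows) throws IOException {
        Path file = Files.createTempFile("distinct-run", ".txt");
        file.toFile().deleteOnExit();
        try (BufferedWriter w = Files.newBufferedWriter(file, StandardCharsets.UTF_8)) {
            for (String row : sortedRows) { w.write(row); w.newLine(); }
        }
        return file;
    }

    private static void mergeRuns(List<Path> files, Consumer<String> out) throws IOException {
        PriorityQueue<Run> heap = new PriorityQueue<>((a, b) -> a.head.compareTo(b.head));
        for (Path f : files) {
            Run r = new Run(f);
            if (r.head != null) heap.add(r); else r.reader.close();
        }
        String last = null;
        while (!heap.isEmpty()) {
            Run r = heap.poll();
            if (!r.head.equals(last)) { last = r.head; out.accept(last); }
            r.advance();
            if (r.head != null) heap.add(r); else r.reader.close();
        }
    }
}
```

The trade-off matches the one described for QueryIterDistinct: the first 
threshold distinct rows reach the consumer immediately, while everything after 
the spill is held back until the input is exhausted and then comes out in 
sorted order.  Streaming past that point would need the on-disk hashtable or 
b-tree mentioned above.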



  was:
It would be nice to eliminate all memory bounds on queries.  Similar to 
JENA-44, it would involve modifying all of the QueryIterator objects that 
maintain unbounded collections of Bindings.


The ones I've identified (let me know if I've missed any):

QueryIterSort
    Complete!

QueryIterGroup
    Probably one of the more complicated implementations.  I think it can be 
done with a DistinctDataBag.

QueryIterDistinct
    Can be implemented trivially using DistinctDataBag, but would lose 
streaming capability.  We could do streaming just until the first spill, which 
would be a little more difficult but not bad.  If we wanted streaming even 
after spilling, then we would need an on-disk hashtable or b-tree (which could 
get expensive for maybe limited benefit, do you really need streaming after 
10,000 results?).

QueryIteratorCopy
    Only appears to be used by QueryIterService.  Simple implementation using 
DefaultDataBag.

QueryIteratorCaching
    Does not match DataBag's assumption of completing all writes before 
iterating.  But it isn't used anywhere, so maybe we remove it?

QueryIterDiff
QueryIterMinus
    Both of these materialize the RHS into a collection.  Can be implemented 
with DefaultDataBag.  As an aside, is this necessary to do for all queries?  
What if the RHS is cheap (i.e. a single TriplePattern)?

QueryIterJoin
QueryIterLeftJoin
   Both materialize RHS.  Are they used anywhere?  I was under the impression 
that ARQ only considered left-deep plans with indexed joins on the RHS 
TriplePatterns.

SubQueries
   I'm not sure how this is handled.  Are these materialized somewhere?




> Eliminate memory bounds during query execution
> ----------------------------------------------
>
>                 Key: JENA-119
>                 URL: https://issues.apache.org/jira/browse/JENA-119
>             Project: Jena
>          Issue Type: New Feature
>          Components: ARQ
>            Reporter: Stephen Allen

