Eliminate memory bounds during query execution
----------------------------------------------
Key: JENA-119
URL: https://issues.apache.org/jira/browse/JENA-119
Project: Jena
Issue Type: New Feature
Components: ARQ
Reporter: Stephen Allen

It would be nice to eliminate all memory bounds on queries. Similar to JENA-44, it would involve modifying all of the QueryIterator objects that maintain unbounded collections of Bindings. The ones I've identified (let me know if I've missed any):

QueryIterSort
    Complete!

QueryIterGroup
    Probably one of the more complicated implementations. I think it can be done with a DistinctDataBag.

QueryIterDistinct
    Can be implemented trivially using DistinctDataBag, but would lose streaming capability. We could stream just until the first spill, which would be a little more difficult but not bad. If we wanted streaming even after spilling, we would need an on-disk hashtable or B-tree (which could get expensive for maybe limited benefit; do you really need streaming after 10,000 results?).

QueryIteratorCopy
    Only appears to be used by QueryIterService. Simple implementation using DefaultDataBag.

QueryIteratorCaching
    Does not match DataBag's assumption that all writes complete before iteration begins. But it isn't used anywhere, so maybe we remove it?

QueryIterDiff
QueryIterMinus
    Both of these materialize the RHS into a collection. Can be implemented with DefaultDataBag. As an aside, is this necessary for all queries? What if the RHS is cheap (e.g. a single TriplePattern)?

QueryIterJoin
QueryIterLeftJoin
    Both materialize the RHS. Are they used anywhere? I was under the impression that ARQ only considered left-deep plans with indexed joins on the RHS TriplePatterns.

SubQueries
    I'm not sure how these are handled. Are they materialized somewhere?
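
To make the spill-to-disk idea concrete, here is a minimal sketch of a bag that keeps items in memory up to a threshold and writes full buffers to temporary files. It is an illustration only and is not ARQ's DataBag API: the class name SpillingBag, the threshold parameter, and the use of single-line strings in place of Bindings are all assumptions made for this example. It also enforces the "all writes before iterating" assumption mentioned for QueryIteratorCaching above.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical stand-in for a spill-to-disk bag; NOT ARQ's DataBag API.
// Items are single-line strings (no embedded newlines) so serialization stays trivial.
final class SpillingBag implements AutoCloseable {
    private final int threshold;                          // max items buffered in memory
    private final List<String> memory = new ArrayList<>();
    private final List<Path> spillFiles = new ArrayList<>();
    private boolean readStarted = false;

    SpillingBag(int threshold) {
        if (threshold < 1) throw new IllegalArgumentException("threshold must be >= 1");
        this.threshold = threshold;
    }

    // Adds one item; once the buffer is full it is written to a temp file and cleared.
    void add(String item) {
        if (readStarted) throw new IllegalStateException("all writes must finish before iteration");
        memory.add(item);
        if (memory.size() >= threshold) spill();
    }

    private void spill() {
        try {
            Path file = Files.createTempFile("spilling-bag", ".tmp");
            spillFiles.add(file);
            Files.write(file, memory, StandardCharsets.UTF_8);
            memory.clear();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Visits every item: spilled files first (in write order), then the in-memory tail.
    void forEach(Consumer<String> action) {
        readStarted = true;
        try {
            for (Path file : spillFiles) {
                try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
                    String line;
                    while ((line = reader.readLine()) != null) action.accept(line);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        memory.forEach(action);
    }

    int spillCount() { return spillFiles.size(); }

    // Deletes the temp files; the bag should not be used afterwards.
    @Override
    public void close() {
        try {
            for (Path file : spillFiles) Files.deleteIfExists(file);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        spillFiles.clear();
        memory.clear();
    }

    public static void main(String[] args) {
        try (SpillingBag bag = new SpillingBag(2)) {
            for (String s : new String[] {"a", "b", "c", "d", "e"}) bag.add(s);
            System.out.println("spill files: " + bag.spillCount());
            bag.forEach(System.out::println);
        }
    }
}
```

In this sketch an iterator such as QueryIterMinus would add every RHS binding to the bag, then scan it once per LHS binding, so memory use is bounded by the threshold rather than by the size of the RHS. A distinct variant would additionally need to sort or hash the spilled data to drop duplicates, which this sketch does not attempt.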