Improvements to Bindings
------------------------

                 Key: JENA-121
                 URL: https://issues.apache.org/jira/browse/JENA-121
             Project: Jena
          Issue Type: Improvement
          Components: ARQ
            Reporter: Stephen Allen
            Priority: Minor


The Binding interface is a key object for query execution.  Its current design 
has a few issues that may make it worth revisiting.  Some thoughts:


1) Bindings should be immutable (in the strong Java sense: 
http://www.ibm.com/developerworks/java/library/j-jtp02183/index.html)

2) Add a BindingPair class that represents a Var/Node value (could be called 
something else, BindingValue?)
2a) Binding constructor/factory method takes an Iterable<BindingPair> to 
initialize it
2b) Binding can now implement Iterable<BindingPair> which would be more 
efficient than iterating over the variables then looking up each node

3) An implementation that has better memory usage than BindingMap (a HashMap 
may be overkill here, if we can use the iterator from 2b in more places)

4) An implementation that copies parent BindingPairs instead of maintaining a 
reference.  If the parent bindings are not referenced elsewhere after being 
incorporated into a child, we can save memory by copying and letting the 
parent be GCed (indeed in the common join case, this appears to be what 
happens).  We would also get speed benefits from storing the BindingPairs in a 
single data structure, making iterating and looking up values faster.  
Additionally, more Binding objects die young instead of being held as part of a 
higher level algebra collection (like sort or distinct), which can help with GC 
overhead.

5) Expose an iterator of BindingPairs ordered by variable.  This is needed for 
BindingComparator, and may be an option for Algebra.merge()/compatible() if we 
eliminate fast get(Var) lookups of nodes (as a consequence of 3).  The ordering 
could be determined at construction or be initialized lazily.

6) A method for estimating the memory size of a binding.  This would be very 
useful for setting threshold policies for DataBags, although it may be tough to 
do, especially if Nodes are shared between bindings.
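
To make points 1, 2, 2a, 2b and 4 more concrete, here is a minimal sketch of 
what such a binding might look like.  It is not ARQ code: plain Strings stand 
in for Var and Node so that it compiles on its own, and the names ArrayBinding, 
of() and extend() are hypothetical.  It also does not check for duplicate 
variables, which a real implementation would have to decide how to handle.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

// Stand-in for a Var/Node pair (point 2). Strings replace ARQ's Var and Node.
final class BindingPair {
    private final String var;
    private final String node;

    BindingPair(String var, String node) {
        this.var = var;
        this.node = node;
    }

    String var()  { return var; }
    String node() { return node; }
}

// Hypothetical immutable binding backed by one array instead of a HashMap.
final class ArrayBinding implements Iterable<BindingPair> {
    private final BindingPair[] pairs;   // never exposed, never modified

    private ArrayBinding(BindingPair[] pairs) {
        this.pairs = pairs;
    }

    // Point 2a: initialise from an Iterable<BindingPair>. The pairs are copied,
    // so later changes to the source collection cannot affect the binding.
    static ArrayBinding of(Iterable<BindingPair> source) {
        List<BindingPair> copy = new ArrayList<>();
        for (BindingPair p : source) {
            copy.add(p);
        }
        return new ArrayBinding(copy.toArray(new BindingPair[0]));
    }

    // Point 4: copy the parent's pairs rather than holding a reference to the
    // parent, so the parent can be collected once nothing else refers to it.
    static ArrayBinding extend(ArrayBinding parent, BindingPair extra) {
        BindingPair[] merged = Arrays.copyOf(parent.pairs, parent.pairs.length + 1);
        merged[parent.pairs.length] = extra;
        return new ArrayBinding(merged);
    }

    // Linear scan; returns null when the variable is not bound.
    String get(String var) {
        for (BindingPair p : pairs) {
            if (p.var().equals(var)) {
                return p.node();
            }
        }
        return null;
    }

    int size() {
        return pairs.length;
    }

    // Point 2b: iterate over the pairs directly. The unmodifiable view makes
    // Iterator.remove() throw UnsupportedOperationException.
    @Override
    public Iterator<BindingPair> iterator() {
        return Collections.unmodifiableList(Arrays.asList(pairs)).iterator();
    }
}
```

The linear get() is the trade-off mentioned in point 3: it gives up constant 
time lookup in exchange for a smaller footprint, which only pays off if most 
callers can iterate instead of looking up by variable.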


Several of these points need more investigation and careful profiling to 
ensure that they are beneficial, especially 3, 4, and 5.
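
As a starting point for investigating point 5, the sketch below shows one way 
a lazily sorted view of the pairs could support a compatibility check without 
any get(Var) lookups.  It is an illustration only and does not reflect how 
Algebra.compatible() is implemented: Var/Node pairs are modelled as 
Map.Entry<String, String>, and the names OrderedBinding, pair() and 
orderedByVar() are hypothetical.  The lazy initialisation is not synchronised.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical binding that can expose its pairs ordered by variable name.
final class OrderedBinding {
    private final List<Map.Entry<String, String>> pairs;   // construction order
    private List<Map.Entry<String, String>> sorted;        // built on first use

    OrderedBinding(List<Map.Entry<String, String>> pairs) {
        this.pairs = Collections.unmodifiableList(new ArrayList<>(pairs));
    }

    // Convenience factory for an immutable var/node pair.
    static Map.Entry<String, String> pair(String var, String node) {
        return new SimpleImmutableEntry<>(var, node);
    }

    // Sorts a copy of the pairs by variable the first time it is called and
    // reuses that copy afterwards.
    List<Map.Entry<String, String>> orderedByVar() {
        if (sorted == null) {
            List<Map.Entry<String, String>> tmp = new ArrayList<>(pairs);
            tmp.sort(Map.Entry.comparingByKey());
            sorted = Collections.unmodifiableList(tmp);
        }
        return sorted;
    }

    // Two bindings are compatible when every variable bound in both has the
    // same value. Walking both sorted lists in step finds the shared variables
    // in a single pass.
    static boolean compatible(OrderedBinding a, OrderedBinding b) {
        List<Map.Entry<String, String>> x = a.orderedByVar();
        List<Map.Entry<String, String>> y = b.orderedByVar();
        int i = 0;
        int j = 0;
        while (i < x.size() && j < y.size()) {
            int cmp = x.get(i).getKey().compareTo(y.get(j).getKey());
            if (cmp < 0) {
                i++;                      // variable only in a
            } else if (cmp > 0) {
                j++;                      // variable only in b
            } else {
                if (!x.get(i).getValue().equals(y.get(j).getValue())) {
                    return false;         // shared variable, different values
                }
                i++;
                j++;
            }
        }
        return true;
    }
}
```

Whether the one-off sort is cheaper than hashing depends on how often each 
binding is compared, which is exactly the kind of question the profiling 
should answer.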

