Marko A. Rodriguez created TINKERPOP3-866:
---------------------------------------------

             Summary: GroupStep and Traversal-Based Reductions
                 Key: TINKERPOP3-866
                 URL: https://issues.apache.org/jira/browse/TINKERPOP3-866
             Project: TinkerPop 3
          Issue Type: Improvement
          Components: process
    Affects Versions: 3.0.1-incubating, 3.0.0-incubating
            Reporter: Marko A. Rodriguez
            Assignee: Marko A. Rodriguez
             Fix For: 3.1.0-incubating


Right now {{GroupStep}} is defined as:

{code}
public final class GroupStep<S, K, V, R> extends ReducingBarrierStep<S, Map<K, R>> implements MapReducer, TraversalParent {
    private Traversal.Admin<S, K> keyTraversal = null;
    private Traversal.Admin<S, V> valueTraversal = null;
    private Traversal.Admin<Collection<V>, R> reduceTraversal = null;
...
{code}

Look at {{reduceTraversal}}. It takes a {{Collection<V>}} of "values" and 
reduces them to a "reduction" {{R}}. Why are we using {{Collection<V>}}? Why is 
this not:

{code}
private Traversal.Admin<V, R> reduceTraversal = null;
{code}

Now, when a new {{K}} is created (and a reduce is defined), we clone 
{{reduceTraversal}}. Thus, each key has its own {{reduceTraversal}} (identical 
clones) that operates in a stream-like fashion on {{V}} to yield {{R}}. This 
enables us to remove the {{Collection<V>}} (a memory hog) and allows us to 
define {{GroupCountStep}} in terms of {{GroupStep}} without (?limited?) 
computational cost. HOWEVER, this changes the API, as people who did this:

{code}
g.V.group.by(label()).by(outE().count()).by(sum(local))
{code}

would now have to do this:

{code}
g.V.group.by(label()).by(outE().count()).by(sum())
{code}

It's very minor, given the speedup we would gain and the ability for us to now 
do "groupCount" efficiently on arbitrary values -- not just bulks (e.g. sacks).
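
To make the memory argument concrete, here is a minimal sketch in plain Java of the two reduction styles discussed above. It does not use the TinkerPop API: {{StreamingGroupSketch}}, {{Vertex}}, {{groupBuffered}}, {{groupStreaming}}, and the sample data are all hypothetical names invented for illustration, and the {{seed}} supplier merely stands in for "clone {{reduceTraversal}} for each new key".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiFunction;
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical illustration only; none of these names exist in TinkerPop.
public class StreamingGroupSketch {

    /** Stand-in for a vertex: a label plus its out-edge count. */
    public static final class Vertex {
        public final String label;
        public final long outDegree;

        public Vertex(String label, long outDegree) {
            this.label = label;
            this.outDegree = outDegree;
        }
    }

    /** Made-up sample data, not taken from any real graph. */
    public static List<Vertex> sample() {
        return Arrays.asList(
                new Vertex("person", 3), new Vertex("person", 0),
                new Vertex("person", 2), new Vertex("person", 1),
                new Vertex("software", 0), new Vertex("software", 0));
    }

    /** Collection-based style: buffer every value per key, reduce at the end. */
    public static <S, K, V, R> Map<K, R> groupBuffered(
            List<S> input, Function<S, K> key, Function<S, V> value,
            Function<List<V>, R> reduce) {
        Map<K, List<V>> buffers = new HashMap<>();
        for (S s : input) {
            buffers.computeIfAbsent(key.apply(s), k -> new ArrayList<>())
                   .add(value.apply(s));
        }
        Map<K, R> result = new HashMap<>();
        buffers.forEach((k, vs) -> result.put(k, reduce.apply(vs)));
        return result;
    }

    /** Streaming style: one accumulator per key, folded as each value arrives. */
    public static <S, K, V, R> Map<K, R> groupStreaming(
            List<S> input, Function<S, K> key, Function<S, V> value,
            Supplier<R> seed, BiFunction<R, V, R> fold) {
        Map<K, R> accumulators = new HashMap<>();
        for (S s : input) {
            K k = key.apply(s);
            // A fresh seed per new key plays the role of a cloned reducer.
            R current = accumulators.containsKey(k) ? accumulators.get(k) : seed.get();
            accumulators.put(k, fold.apply(current, value.apply(s)));
        }
        return accumulators;
    }

    public static void main(String[] args) {
        List<Vertex> vs = sample();
        Function<Vertex, String> byLabel = v -> v.label;
        Function<Vertex, Long> outDegree = v -> v.outDegree;

        // Sum of out-degrees per label, computed both ways.
        Map<String, Long> buffered = groupBuffered(vs, byLabel, outDegree,
                xs -> xs.stream().mapToLong(Long::longValue).sum());
        Map<String, Long> streamed =
                groupStreaming(vs, byLabel, outDegree, () -> 0L, Long::sum);

        // "groupCount" expressed as a streaming group: each element contributes 1.
        Map<String, Long> counts =
                groupStreaming(vs, byLabel, v -> 1L, () -> 0L, Long::sum);

        System.out.println(buffered.equals(streamed));
        System.out.println(counts);
    }
}
```

Both sum variants should agree on the same input. The difference is that the streaming one holds a single accumulator per key rather than a list of values per key, which is also why a count can be written as just another streaming reduction.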






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
