Marko A. Rodriguez created TINKERPOP-1192:
---------------------------------------------
Summary: TraversalSideEffects should support registered reducers
(binary operators).
Key: TINKERPOP-1192
URL: https://issues.apache.org/jira/browse/TINKERPOP-1192
Project: TinkerPop
Issue Type: Improvement
Components: process
Affects Versions: 3.1.1-incubating
Reporter: Marko A. Rodriguez
Fix For: 3.2.0-incubating
TinkerPop 3.2.0-SNAPSHOT has made leaps and bounds towards completely aligning
Gremlin OLTP and Gremlin OLAP. What has got me really excited is that there is
such a strong conceptual alignment between the following components:
{code}
VertexProgram <=> Traversal
Iteration <=> Step
Messages <=> Traversers
MessageCombiner <=> TraverserSet ("bulking")
BSP <=> Barrier
Workers <=> Parallel Steps
Master <=> Sequential Steps
Memory <=> SideEffects
{code}
TraversalVertexProgram is very clean -- its lays atop the GraphComputer API in
a natural, effortless way.
However, there is one last pairing that needs some better alignment:
GraphComputer Memory and Traversal SideEffects. A Memory slot has the notion of
a key, a value, and a reducer (binary operator). A Traversal SideEffect as the
notion of a key and a value. I think we should enable Traversal SideEffects to
support registered reducers. If we do this, then there is perfect alignment
between the two models and we won't have to have "if(onGraphComputer)"-type
logic in our side-effect steps.
Right now in GroupCountSideEffectStep we do this:
{code}
public void sideEffect(final Traverser<S> traverser) {
Map<E,Long> groupCountMap =
this.getTraversal().getSideEffects().get(this.sideEffectKey);
MapHelper.incr(traverser.get(), traverser.bulk(), groupCountMap)
}
{code}
We are explicitly getting the Map from the sideEffects and updating it. This
model will not generally work in OLAP because groupCountMap is a distributed
data structure and thus, local updates to a Map don't distribute. I have it
working currently in master/, but at the cost of not being able to read the
sideEffect, only write to it. To make TraversalSideEffects consistent across
both OLTP and OLAP, I think we should express GroupCountSideEffectStep like
this (*** analogously for GroupSideEffectStep, TreeSideEffectStep, etc.):
{code}
public void sideEffect(final Traverser<S> traverser) {
this.getTraversal().getSideEffects().add(this.sideEffectKey,
Collections.singletonMap(traverser.get(), traverser.bulk());
}
{code}
Moreover, TraversalSideEffects should have the following method:
{code}
TraversalSideEffects.register(final String key, final Supplier<A>
initialValue, final BinaryOperator<A> reducer)
{code}
Note that we already have:
https://github.com/apache/incubator-tinkerpop/blob/master/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/TraversalSideEffects.java#L88
We can deprecate the current registerSupplier() in support of register().
Moreover, for backwards compatibility, BinaryOperator<A> reducer would simply
be Operator.assign.
https://github.com/apache/incubator-tinkerpop/blob/master/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Operator.java#L59-L62
Thus, this would not be a breaking change and it will ensure a natural
congruence between these two related computing structures -- Memory and
TraversalSideEffects.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)