GitHub user okram opened a pull request:
https://github.com/apache/incubator-tinkerpop/pull/251
TINKERPOP-1192 & TINKERPOP-1193 & TINKERPOP-951 & TINKERPOP-958: The future
is now.
https://issues.apache.org/jira/browse/TINKERPOP-1192
https://issues.apache.org/jira/browse/TINKERPOP-1193
https://issues.apache.org/jira/browse/TINKERPOP-951
https://issues.apache.org/jira/browse/TINKERPOP-958
Summary: `TraversalSideEffects` and `GraphComputer` `Memory` are aligned.
The are unified via `MemoryTraversalSideEffects` which gives us the flexibility
to have a single `TraversalSideEffects` across an arbitrary number of OLAP jobs
and finally makes it so that OLTP and OLAP use of sideEffects are consistent.
This comes at the cost of a few API changes to `TraversalSideEffects` and some
semantics changes. For 99% of users, they won't notice. Only those that make
heavy use of `sideEffect{x = ...}` may have to change their usage.
```
g.V().out("created").
group("m").by(label).
pageRank(1.0).
by("pageRank").
by(inE()).
times(1).
in("created").
group("m").by("pageRank").
cap("m")
```
The result is:
[{2.0=[v[4], v[4], v[4], v[4]], 1.0=[v[6], v[6], v[6], v[1], v[1], v[1]],
software=[v[5], v[3], v[3], v[3]]}]
Do you see why this traversal is so cool? Check out what it is doing ---
1. Normal out("created").group(). Fine and dandy all the software
vertices are grouped.
2. But then we kick off another OLAP job that computes a single-step
spreading activation starting at the software vertices.
3. Then, from the software vertices, it goes to the people who created
that software and groups those people by their pageRank value.
First, (already known), this traversal compiles to 3 OLAP jobs and it "just
works." Slammin'.
Second, as of this afternoon, TraversalSideEffects (GraphComputer.Memory)
are preserved across OLAP jobs and across TraversalVertexPrograms.
--- it truly is just one traversal.
------
CHANGELOG
```
* Fixed a severe `Traversal` cloning issue that caused inconsistent
`TraversalSideEffects`.
* `TraversalSideEffects` remain consistent and useable across multiple
chained OLAP jobs.
* Added `MemoryTraversalSideEffects` which backs a `TraversalSideEffects`
by `Memory` in OLAP.
* `TraversalSideEffects` are now fully functional in OLAP save that an
accurate global view is possible after an iteration.
* Updated the `TraversalSideEffects` API to support registered reducers and
updated `get()`-semantics. (*breaking*)
* Split existing `profile()` into `ProfileStep` and
`ProfileSideEffectStep`.
* The `profile()`-step acts like a reducing barrier and emits
`TraversalMetrics` without the need for `cap()`. (*breaking*)
* Added `LocalBarrier` to allow traversers to remain distributed during an
iteration so as to limit cluster traffic.
* An OLAP-based `Barrier` synchronization bug has been fixed.
```
UPDATE
```
TraversalSideEffect Update
^^^^^^^^^^^^^^^^^^^^^^^^^^
There were significant changes to `TraversalSideEffect` both at the
semantic level and at the API level. Users that have traversals of the form
`sideEffect{â¦}` that use global side-effects should read the following
carefully. If the user only uses side-effects via non-lambda steps (e.g.
`groupCount("m")`) and with `g.withSideEffects()`, then the changes below will
not effect them. Providers will not be effected by the changes save any tests
cases that use `Traversal.getSideEffects().get()`. This method no longer
returns `Optional<V>`, but instead just the value `<V>`.
TraversalSideEffect Get API Change
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
`TraversalSideEffects` can now logically operate within a distributed OLAP
environment. In order to make this possible, it is necessary that each
side-effect be registered with a reducing `BinaryOperator`. This binary
operator will combine distributed updates into a single global side-effect at
the master traversal. Many of the methods in `TraversalSideEffect` have been
`Deprecated`, but they are backwards compatible save that
`TraversalSideEffects.get()` no longer returns an `Optional`, but instead
throws an `IllegalArgumentException`. While the `Optional` semantics could have
remained, it was deemed best to directly return the side-effect value to reduce
object creation costs and because all side-effects must be registered apriori
and therefore there is never a reason why an unknown side-effect key would be
used.
TraversalSideEffect Registration Requirement
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
All `TraversalSideEffects` must be registered upfront. This is because, in
OLAP, side-effects map to `Memory` compute keys and as such, must be declared
prior to the execution of the `TraversalVertexProgram`. If a user's traversal
creates a side-effect mid-traversal, it will fail. The traversal must use
`GraphTraversalSource.withSideEffect()` to declare the side-effects it will use
during its execution lifetime.
TraversalSideEffect Add Requirement
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In a distributed environment, a side-effect can not be mutated and be
expected to exist in the mutated form at the final, aggregated, master
traversal. For instance, if a side-effect "myCount" references a `Long`, the
`Long` can not be updated directly via `sideEffects.set("myCount",
sideEffects.get("myCount") + 1)`. Instead, it must rely on the reducer to do
the merging and thus, the `Step` must do `sideEffect.add("mySet",1)` where the
registered reducer is `Operator.sum`. Note that
`Traverser.sideEffect(key,value)` will use `TraversalSideEffect.add()`. Thus,
the below will increment "a". If no operator was provided, then the operator is
assumed `Operator.assign` and the final result of "a" would be 1.
`g.withSideEffect("a",0,sum).V().out().sideEffect(t -> t.sideEffect("a",1))`
See
link:https://issues.apache.org/jira/browse/TINKERPOP-1192[TINKERPOP-1192]
ProfileStep Update and GraphTraversal API Change
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The `profile()`-step has been refactored into 2 steps -- `ProfileStep` and
`ProfileSideEffectStep`. Users who previously used the `profile()` in
conjunction with `cap(TraversalMetrics.METRICS_KEY)` can now simply omit the
cap step. Users who retrieved `TraversalMetrics` from the side-effects after
iteration can still do so, but will need to specify a side-effect key when
using the `profile()`. For example, `profile("myMetrics")`.
See link:https://issues.apache.org/jira/browse/TINKERPOP-958[TINKERPOP-958]
```
VOTE +1.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/apache/incubator-tinkerpop TINKERPOP-1192
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/incubator-tinkerpop/pull/251.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #251
----
commit 8e807eec73a72b4bbbde3156e6aa8b237d099b42
Author: rjbriody <[email protected]>
Date: 2016-02-24T20:08:25Z
TINKERPOP-958 split .profile() into ProfileSideEffectStep and ProfileStep
(which is a MapStep).
commit 856bb564baad3a28754aa37a77803ab3d2f9b8a5
Author: rjbriody <[email protected]>
Date: 2016-02-25T16:07:17Z
Condense ProfileSideEffectTest into ProfileTest.
commit 7374ee45cd918f0770df3966476e98fac10794ae
Author: rjbriody <[email protected]>
Date: 2016-02-26T14:52:20Z
Fix computer issues for profile step.
commit 86941ee5c6ff37f2bc4f88ea520a530e9372ba79
Author: rjbriody <[email protected]>
Date: 2016-03-01T13:35:45Z
Update traversal docs for Profile step changes.
commit 06cdff86c21a9d2c20f528d9c620604dde1b6a07
Author: rjbriody <[email protected]>
Date: 2016-03-01T13:39:34Z
Update traversal docs for Profile step changes.
commit 6f6c9a5698373291c51e9b12af86a5dc3e09ff0e
Author: rjbriody <[email protected]>
Date: 2016-03-01T13:45:38Z
Revert "Update traversal docs for Profile step changes."
This reverts commit 06cdff86c21a9d2c20f528d9c620604dde1b6a07.
commit ebb6143e9dffbc0980dcb1445a18088f3e7fd4fa
Author: rjbriody <[email protected]>
Date: 2016-03-01T13:45:46Z
Revert "Update traversal docs for Profile step changes."
This reverts commit 86941ee5c6ff37f2bc4f88ea520a530e9372ba79.
commit 50c56563157337e71f74a940b9f6c17718860dd3
Author: rjbriody <[email protected]>
Date: 2016-03-01T13:49:22Z
Update traversal docs for Profile step changes.
commit 7013a9746cc04da32bb63fe0e6932a468c47ff0d
Author: rjbriody <[email protected]>
Date: 2016-03-01T13:50:37Z
merge master
commit 5f6ab602c2539f12c35c3be607f287dd689375af
Author: Marko A. Rodriguez <[email protected]>
Date: 2016-03-02T21:23:20Z
Created MemoryTraversalSideEffects which backs a TraversalSideEffects by
GraphComputer.Memory. Everything works except AggregateStep and ProfileStep.
Need more thinking. I found a fundamental bug in Step.clone() where SideEffects
in the cloned traversal referenced the original traversal. It had to do with
the fast the cloning of a step is like this -- step.clone(),
step.setTraversal(traversal). As such, for all TraversalParent steps, I had to
Override setTraversal(traversal) and it is there where
TraversalParent.integrateChild(...) is called. However, once that was done,
everything with MemoryTraversalSideEffects just worked (except Aggregate and
Profile --- of course). This is a really sweet model. Removed
VertexTraversalSideEffects which was the old model which stored sideEffects as
traverser properties which were then processed and aggregated via MapReduce
jobs. With MemoryTraversalSideEffects, the sideEffects are contained within the
TraversalVertexProgram job --- its so slic
k (and way more efficient).
commit b63f62e7008614782bcd3c4ef5a8b4ea7e6eb830
Author: Marko A. Rodriguez <[email protected]>
Date: 2016-03-02T23:30:54Z
through hell and back. TraversalSideEffects are clean and proper. Added
LocalBarrier interface which tells GraphComputer not to aggregate the
traversers to the master traversal, but instead to keep them distributed across
the cluster, but block -- that is, barrier. This is a way to force a
sychronization without having to move data. Currently only AggregateStep
implements LocalBarrier. However, other steps such as NoOpCollectingBarrierStep
and SupplyingBarrierStep can use it. All test cases pass except for the
ProfileTest stuff. Going to merge in the @rjbriody branch into this branch and
get everything smooth and slick and then PR both from this branch.
commit 08780c4b8ceaa4231b4a8421fea064c0f8122f84
Author: Marko A. Rodriguez <[email protected]>
Date: 2016-03-03T14:25:19Z
Figured out why Groovy traversals were failing. I was overriding the
SideEffects in ScriptTraversal. I needed to use the SideEffects from the parent
traversal. Commented out the ProfileTests for now. Going to now try and merge
@rjbriody work into this branch and then if all is gravy, PR both from here.
commit e4ea17395478132e7f78bd53a9921ef803ab6a96
Author: Marko A. Rodriguez <[email protected]>
Date: 2016-03-03T14:28:44Z
Merge branch 'TINKERPOP-958' of
https://github.com/rjbriody/incubator-tinkerpop into TINKERPOP-1192
Conflicts:
gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/step/sideEffect/ProfileSideEffectStep.java
gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/structure/io/gryo/GryoMapper.java
gremlin-groovy-test/src/main/java/org/apache/tinkerpop/gremlin/process/GroovyProcessComputerSuite.java
gremlin-test/src/main/java/org/apache/tinkerpop/gremlin/process/ProcessComputerSuite.java
commit ff7f7d2b74a42d7c9574a0ebbb8cc4a6897195f5
Author: Marko A. Rodriguez <[email protected]>
Date: 2016-03-03T17:03:09Z
added @rjbriody ProfileStep updates.
commit 5ec7dd063087dfd8c7cc1894bf86fa82a80a75ba
Author: Marko A. Rodriguez <[email protected]>
Date: 2016-03-03T19:11:02Z
Finalized the TravarsalSideEffect API work. There is one breaking change
that was needed -- its small. However, the semantics of TraversalSideEffect now
require all side-effects to be registered upfront via withSideEffect(). Users
can no longer create a side-effect mid-traversal. This ensures that OLTP and
OLAP behave identically. This occurs when people did something like ---
g.V().sideEffect{it.sideEffect('a',new Set())}.out().out().... that sideEffect
will need to be created at g.withSideEffect('a',Set::new).... Likewise. All
other methods work but are deprecated and assume Operator.assign if now
registered Reducer exists --- this will be okay for OLTP, but will lead to bad
results in OLAP. In short, peoeople should really register reducers with their
sideEffects -- g.V.withSideEffect(a,Set::new,Operator.addAll)......Added a
pretty insan-o nested, sideEffect usage test to GroupCountTest -- OLTP and OLAP
are happy.
commit ff390381b9896a16c73a056f167e84f2beb8f1d6
Author: Marko A. Rodriguez <[email protected]>
Date: 2016-03-03T20:07:52Z
added test to make sure declared SideEffect is ultimate returned class.
commit f778b5b0ab13c3860c30e39eadd64dcdd7af1c18
Author: Marko A. Rodriguez <[email protected]>
Date: 2016-03-03T22:11:40Z
Added some crazy tests to SideEffectTest which verify that registered
operators and suppliers as well as mid-traversal side-effect calls behave as
expected in both OLTP and OLAP. This required lots of tinkering with
DefaultTraversalSideEffects to get the logic right around registeration.
Removed a pointless TraversalSideEffectsTest we had. Doing all the complex
logic in SideEffectTest.
commit 731789637d2b47306bed3a2f3008c9bf063a0f3a
Author: Marko A. Rodriguez <[email protected]>
Date: 2016-03-03T23:58:00Z
fixed a bug in DefaultTraversalSideEffects.mergedInto(). Added THE MOST
INSANE test case to PageRankTest. SideEffects are preserved across
GraphComputer jobs. Insane.
commit 02a551a5f544f8f3b2344631252c21c03a51f117
Author: Marko A. Rodriguez <[email protected]>
Date: 2016-03-04T02:14:18Z
added some nice test cases that ensure cloning and compiling respect
TraversalSideEffect propagation. Added test cases that verify that the new
TraversalSideEffect API changes are correctly implemented by
DefaultTraversalSideEffect. All in all, just more tests cases to give us
confidence.
commit 1c174ba5e5dfa025b41df541ba21b3637e17901e
Author: Marko A. Rodriguez <[email protected]>
Date: 2016-03-04T02:31:34Z
realized a very simple (and wildly efficient) optimization to
TraversalVertexProgram around when to and not to return halted traversers to
the master traversal. Okay. Done for the night.
----
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---