Added some more details on collections recipe for List CTR

Project: http://git-wip-us.apache.org/repos/asf/tinkerpop/repo
Commit: http://git-wip-us.apache.org/repos/asf/tinkerpop/commit/4eb80bd4
Tree: http://git-wip-us.apache.org/repos/asf/tinkerpop/tree/4eb80bd4
Diff: http://git-wip-us.apache.org/repos/asf/tinkerpop/diff/4eb80bd4

Branch: refs/heads/TINKERPOP-1489
Commit: 4eb80bd4878d455bff761c091b8fcd8414d59660
Parents: 586e432
Author: Stephen Mallette <sp...@genoprime.com>
Authored: Tue Jan 16 14:31:55 2018 -0500
Committer: Stephen Mallette <sp...@genoprime.com>
Committed: Tue Jan 16 14:31:55 2018 -0500

----------------------------------------------------------------------
 docs/src/recipes/collections.asciidoc | 282 ++++++++++++++++++++++++++++-
 1 file changed, 279 insertions(+), 3 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/tinkerpop/blob/4eb80bd4/docs/src/recipes/collections.asciidoc
----------------------------------------------------------------------
diff --git a/docs/src/recipes/collections.asciidoc 
b/docs/src/recipes/collections.asciidoc
index 446ed46..5db1e6b 100644
--- a/docs/src/recipes/collections.asciidoc
+++ b/docs/src/recipes/collections.asciidoc
@@ -21,9 +21,237 @@ image:gremlin-collections.png[width=400]
 
 Lists and maps form the basis for much of the processing in Gremlin 
traversals. They are core to how side-effects
 are typically held and how results are generally produced. Being able to pick 
them apart and reformat them is sometimes
-required.
+required. This need to shape the date within a traversal may arise both at the
+link:http://tinkerpop.apache.org/docs/current/reference/#terminal-steps[terminal
 step] of the traversal (technically
+just prior to the terminal step) or in the middle of a traversal. Considering 
the former, a transformation just prior
+to iteration will get the result into the form required by the application 
which would remove the need for additional
+application level manipulation. Moreover, a transformation at this stage may 
reduce the size of the payload being
+returned which could be useful in remote applications. Examining the latter, 
there may be times where a `List` or `Map`
+requires some mid-traversal transformation so as to continue with the general 
logic of the traversal itself. For
+example, a traversal at some point might produce a `Map` of `List` objects 
where the lists contain vertices, where each
+`List` might need to be sorted by some criteria and then the top item for each 
extracted to become the basis for the
+continued traversal. Executing transformations for either of these types of 
situations can be made possible with the
+patterns described in this section.
 
-One of the most common ways to end up with a `Map` is with `valueMap()`:
+The appearance of a `List` as a traverser in Gremlin usually arises as a 
result of a `fold()` operation, but may also
+appear by way of some side-effect steps like `store()`:
+
+[gremlin-groovy,modern]
+----
+g.V().fold()
+g.V().store('a').cap('a')
+----
+
+It is worth noting that while a `Path` is not technically a `List` it does 
present like one and can be manipulated in
+similar fashion to lists:
+
+[gremlin-groovy,modern]
+----
+g.V().out().out().path()
+----
+
+These examples are obviously trivial and there are other ways that a traverser 
might end up in a `List` form, but, at
+this moment, the point here is to focus less on how to get a `List` and more 
on how to manipulate one within the
+Gremlin language. The examples going forward will also be similarly contrived 
insofar as producing a usable `List` to
+manipulate. Bear in mind that it may be quite possible to get the same end 
results of these examples using more direct
+means than what is demonstrated.
+
+It may seem simple, but the most obvious choice to modifying what is in a list 
is to simply `unfold()` the `List`:
+
+[gremlin-groovy,modern]
+----
+g.V().fold().unfold().values('name')
+g.V().store('a').cap('a').unfold().values('name')
+----
+
+The above examples show that `unfold()` works quite well when you want don't 
want to preserve the `List` structure of
+the traverser as it just flattens `List` traversers to the traversal stream. 
The above examples only have one `List`
+as a result, but consider what happens when there is more than one:
+
+[gremlin-groovy,modern]
+----
+g.V().union(fold(),fold())
+g.V().union(fold(),fold()).unfold().values('name')
+----
+
+The two separate `List` traversers are flattened to a single traversal stream 
and all the results are mixed together.
+While this approach may be acceptable, there are many cases where it might not 
be so. To preserve the individual
+structure of the `List` traversers "locally" `unfold()` the lists to transform 
them:
+
+[gremlin-groovy,modern]
+----
+g.V().
+  union(fold(),fold()).
+  local(unfold().values('name').fold())
+----
+
+The call to `local()` executes its anonymous sub-traversal over each 
individual `List` iterator and as the
+sub-traversal ends with a `fold()` step, the results are reduced back into a 
`List` to preserve the original structure,
+thus maintaining two traverser results.
+
+This pattern for unfolding and folding `List` traversers ends up having other 
applications:
+
+[gremlin-groovy,modern]
+----
+g.V().union(limit(3).fold(),tail(3).fold())    <1>
+g.V().union(limit(3).fold(),tail(3).fold()).
+  local(unfold().                              <2>
+        order().
+          by(bothE().count(),decr).
+        limit(1).
+        fold())
+g.V().union(limit(3).fold(),tail(3).fold()).   <3>
+  local(unfold().
+        has('age',gte(29)).
+        values('age').
+        mean())
+----
+
+<1> The output consists of two `List` traversers.
+<2> For each `List` of vertices, order them by their number of edges, and 
choose the first one which will be the one
+with the highest degree (i.e. number of edges). By ending with `fold()` the 
`List` traverser structure is preserved
+thus returning two `List` objects. Consider this a method for choosing a "max" 
or a highly ranked vertex. In this case
+the rank was determined by the number of edges, but it could have just as 
easily been determined by a vertex property,
+edge property, a calculated value, etc. - one simply needs to alter the `by()` 
step modulator to `order()`.
+<3> For each `List` of vertices, filter that `List` to only include vertices 
that have an "age" property with a
+value greater than or equal to "29" and then average the results of each list. 
More generally, consider how this
+approach performs some kind of reducing calculation on each `List` traverser. 
In this case, an average was calculated,
+but it might also have been a `sum()`, `count()` or similar operation that 
reduced the list to a single calculated
+value.
+
+So far, this section has focused on what to do with a `List` traverser once 
there is one present and there have been
+fairly contrived examples for how to produce one in the first place. The use 
of `fold()` has been used most frequently
+at this point to achieve list creation and that step should be recalled 
whenever there is a need to reduce some
+traversal stream to an actual `List`. Of course, it may become necessary to 
more manually construct a `List`,
+especially in cases where the expected output of the traversal is composed of 
one or more ordered results in the
+form of a `List`. For example, consider the following three traversals:
+
+[gremlin-groovy,modern]
+----
+g.V().has('name','marko').values('age')     <1>
+g.V().has('name','marko').                  <2>
+  repeat(out()).
+    until(has('lang','java')).
+  path().
+    by('name')
+g.V().has('name','marko').                  <3>
+  repeat(outE().inV()).
+    until(has('lang','java')).
+  path().
+  local(unfold().
+        has('weight').
+        values('weight').
+        mean())
+----
+
+<1> Get the age of "marko"
+<2> get the "name" values of the vertices in the collected paths that traverse 
out from "marko" to any vertex with
+the "lang" of "java".
+<3> Get the average of the "weight" values of edges in the collected paths 
that traverse out from "marko" to any vertex
+with the "lang" of "java". Note the use of the earlier defined pattern that 
used `local()` in conjunction with
+`unfold()`. In this case it filters out vertices from the `Path` as they are 
not relevant as the concern is only with
+the "weight" property on the edges.
+
+For purposes of this example, the three traversals above happen to represent 
three pieces of data that are required by
+an application. It is plain to note that all of the above traversals hold a 
similar pattern that starts with
+"getting 'marko'" and, in the case of the latter two, traversing on outgoing 
edges away from him and collecting data
+from that path. Ideally, all three of these traversals should execute as one 
to prevent having to submit three separate
+traversals, thus incurring additional query execution costs for what amounts 
to be largely the same underlying data but
+with different transformations applied. The goal here would be to return the 
results of this data as a `List` with
+three results (i.e. triple) that could then be submitted once by the 
application. The following example demonstrates
+the use of `store()` to aid in construction of this `List`:
+
+[gremlin-groovy,modern]
+----
+g.V().has('name','marko').as('v').                             <1>
+  store('a').                                                  <2>
+    by('age').
+  repeat(outE().as('e').inV().as('v')).                        <3>
+    until(has('lang','java')).
+  aggregate('b').                                              <4>
+    by(select(all,'v').unfold().values('name').fold()).
+  aggregate('c').                                              <5>
+    by(select(all,'e').unfold().values('weight').mean()).
+  fold().                                                      <6>
+  store('a').                                                  <7>
+    by(cap('b')).
+  store('a').                                                  <8>
+    by(cap('c')).
+  cap('a')
+----
+
+<1> Get the "marko" vertex and label that step as "v".
+<2> Store the first "age" of "marko" as the first item in the `List` called 
"a", which will ultimately be the result.
+<3> Execute the traversal away from "marko" and continue to traverse on 
outgoing edges until the vertex has the value
+of "java" for the "lang" property. Note the labels of "e" and "v". Note that 
"e" will contain a `List` of all of the
+edges that have been traversed and "v" will contain a `List` of all the 
vertices that have been traversed.
+<4> The incoming traverser to `aggregate('b')` are vertices that terminate the 
`repeat()` (i.e. those with the "lang"
+of "java"). Note however that the `by()` modulator overrides that traverser 
completely by starting a fresh stream of
+the list of vertices in "v". Those vertices are unfolded to retrieve the name 
property from each and then are reduced
+with `fold()` back into a list to be stored in the side-effected named "b".
+<5> A similar use of `aggregate()` as the previous step, though this one turns 
"e" into a stream of edges to calculate
+the `mean()` to store in a `List` called "c". Note that `aggregate()` was used 
here instead of `store()`, as
+the former is an eager collection of the elements in the stream (`store()` is 
lazy) and will force the traversal to be
+iterated up to that point before moving forward. Without that eager 
collection, "v" and "e" would not contain the
+complete information required for the production of "b" and "c".
+<6> Adding `fold()` step here is a bit of a trick. To see the trick, copy and 
paste all lines of Gremlin up to but
+not including this `fold()` step and run them against the "modern" graph. The 
output is three vertices and if the
+`profile()` step was added one would also see that the traversal contained 
three traversers. These three traversers
+with a vertex in each one were produced from the `repeat()` step (i.e. those 
vertices that had the "lang" of "java"
+when traversing away from "marko"). The `aggregate()` steps are side-effects 
and just allow the traversers to pass
+through them unchanged. The `fold()` obviously converts those three traversers 
to a single `List` to make one
+traverser with a `List` inside. That means that the remaining steps following 
the `fold()` will only be executed one
+time each instead of three, which, as will be shown, is critical to the proper 
result.
+<7> The single traverser with the `List` of three vertices in it passes to 
`store()`. The `by()` modulator presents an
+override telling Gremlin to ignore the `List` of three vertices and simply 
grab the "b" side effect created earlier and
+stick that into "a" as part of the result. The `List` with three vertices 
passes out unchanged as `store()` is a
+side-effect step.
+<8> Again, the single traverser with the `List` of three vertices passes to 
`store()` and again, the `by()` modulator
+presents an override to include "c" into the result.
+
+All of the above code and explanation show that `store()` can be used to 
construct `List` objects as side-effects
+which can then be used as a result. Note that `aggregate()` can be used to 
similar effect, should it make sense that
+lazy `List` creation is not acceptable with respect to the nature of the 
traversal. An interesting sub-pattern that
+emerges here is that the `by()` step can modulate its step to completely 
override the current traverser and ignore its
+contents for purpose of that step. This ability to override a traverser acts 
as a powerful and flexible tool as it
+means that each traverser can effectively become a completely different object 
as determined by a sub-traversal.
+
+Another interesting method for `List` creation was demonstrated a bit earlier 
but not examined in detail - the use of
+`union()`. It was shown earlier in the following context where it helped 
create a `List` of two lists of three
+vertices each:
+
+[gremlin-groovy,modern]
+----
+g.V().union(limit(3).fold(),tail(3).fold())
+----
+
+By folding the results of `union()`, it becomes possible to essentially 
construct lists with arbitrary traversal
+results.
+
+[gremlin-groovy,modern]
+----
+g.V().
+  local(union(identity(),                   <1>
+              bothE().count()).
+        fold())
+g.V().
+  store('x').
+    by(union(select('x').count(local),      <2>
+             identity(),
+             bothE().count()).
+             fold()).
+  cap('x')
+----
+
+<1> For each vertex, create a "pair" (i.e. a `List` of two objects) of the 
vertex itself and its edge count.
+<2> For each vertex, create a "triple" (i.e. a `List`of three objects) of the 
index of the vertex (starting at zero),
+the vertex itself and its edge count.
+
+The pattern here is to use `union()` in conjunction with `fold()`. As 
explained earlier, the `fold()` operation reduces
+the stream from `union()` to a single `List` that is then fed forward to the 
next step in the traversal.
+
+Now that `List` patterns have been explained, there can now be some attention 
on `Map`. One of the most common ways
+to end up with a `Map` is with `valueMap()`:
 
 [gremlin-groovy,modern]
 ----
@@ -64,4 +292,52 @@ on that sub-traversal (it does not iterate the entire 
thing).
 Generally speaking, a `Map` constructed as part of `group()` or `project()` 
will already be in the form required as
 the `by()` modulators would be written in such a fashion as to produce that 
final output. It would be unnecessary to
 deconstruct/reconstruct it. Be certain that there isn't a way to re-write the 
`group()` or `project()` to get the
-desired output before taking this approach.
\ No newline at end of file
+desired output before taking this approach.
+
+In the following case, `project()` is used to create a `Map` that does not 
meet this requirement as it contains some
+unavoidable extraneous keys in the output `Map`:
+
+[gremlin-groovy,modern]
+----
+g.V().
+  project('name','age','lang').
+    by('name').
+    by(coalesce(values('age'),constant('n/a'))).
+    by(coalesce(values('lang'),constant('n/a')))
+----
+
+The use of `coalesce()` works around the problem where "age" and "lang" are 
not necessarily property keys present on
+every single vertex in the traversal stream. When the "age" or "lang" are not 
present, the constant of "n/a" is
+supplied. While this may be an acceptable output, it is possible to shape the 
`Map` to be "nicer":
+
+[gremlin-groovy,modern]
+----
+g.V().
+  project('name','age','lang').
+    by('name').
+    by(coalesce(values('age'),constant('n/a'))).
+    by(coalesce(values('lang'),constant('n/a'))).
+  local(unfold().
+        filter(select(values).is(P.neq('n/a'))).
+        group().
+          by(keys).
+          by(values))
+----
+
+The addition steps above `unfold()` the `Map` to key-value entries and filter 
the values for "n/a" and remove them
+prior to reconstructing the `Map` with the method shown earlier. To go a step 
further, apply the pattern presented
+earlier to flatten `List` values within a `Map`:
+
+[gremlin-groovy,modern]
+----
+g.V().
+  project('name','age','lang').
+    by('name').
+    by(coalesce(values('age'),constant('n/a'))).
+    by(coalesce(values('lang'),constant('n/a'))).
+  local(unfold().
+        filter(select(values).is(P.neq('n/a'))).
+        group().
+          by(keys).
+          by(select(values).unfold()))
+----
\ No newline at end of file

Reply via email to