On Jul 12, 2016 4:22 PM, "Marko Rodriguez" <[email protected]> wrote:
>
> Hello,
>
> I believe that the most important aspect of Gremlin is the concept of
bulking. With graph traversals leading to combinatoric explosions in both
time and space, the best way to battle such explosions is to realize that
even if the number of unique paths is increasing rapidly, there is an upper
limit on the number of unique vertices touched. In order to project an
exponential space to a constant space, Gremlin traverses have the concept
of a bulk. If two traversers “are equals,” then when they meet each other
at the same step in the traversal and location in the graph, they are
merged and their respective bulks are sum’d. Thus, two traverser become
one. Excellent. This is why we can do massive branch traversals like below
in blazing fast time:
>
> gremlin> clockWithResult(10){g.V().repeat(out()).times(10).count().next()}
> ==>32.8619489 // 33 milliseconds
> ==>4947356572210703772 // 4 quintillion paths explored
>
> However, this model breaks down when traversers are no longer “equal.”
While two traversers may have the same step and graph location, they may
have different path histories. Thus, for traversals that leverage path
information, it is very difficult to get traverser equality and thus,
because the number of unique paths grows exponentially and traverser
equality is determined by path equality (when paths are required for a
traversal), you get exponential traverser growth.
>
> gremlin>
clockWithResult(10){g.V().repeat(out()).times(10).path().count().next()}
> // this will not complete within the life of universe
>
> Thus, in order to make Gremlin fast, its all about doing our best to
reduce the amount of information each traverser maintains which, in
principle, increases the likelihood that any two traversers can be
merged/bulked (increases the likelihood of “equality”).
>
> Ted Wilmes led the development of
https://issues.apache.org/jira/browse/TINKERPOP-1254 <
https://issues.apache.org/jira/browse/TINKERPOP-1254>. With this ticket,
traversers are able to shed path information they will no longer need down
the line. Thus, not only reducing their path footprint (space), but also
increasing the likelihood of bulking (space+time). How does it work? Take
the following example traversal:
>
> g.V().as(‘a’).out().as(‘b’).where(‘a’,eq(‘b’)).select(‘a’).out() // get
the adjacent neighbors of those traversers that have themselves as a
neighbor
>
> Prior to PathRetractionStrategy, a traverser’s path history would look
like this:
>
> [v[1]]                          // V
> [v[1], v[2]]                    // out
> [v[1], v[2]]                    // where
> [v[1], v[2], v[1]]              // select
> [v[1], v[2], v[1], v[3]]        // out
>
> Now with PathRetractionStrategy.
>
> [v[1]]                          // V
> [v[1], v[2]]                    // out
> [v[1]]                          // where
> []                              // select
> []                              // out
>
> Once where() is done with “b”, it wipes the “b”-path information for the
traverser’s path history. Next, once select() is done with “a”, it wipes
“a”-path information from the traverser’s path history. Thus, after
select(), there is no longer path information required for the traversal to
faithfully execute. Moreover, after select(), bulking becomes extremely
likely and thus, instead of calculating out() on an exponential set of
traversers, we will only calculate it on the upper limit of traverses which
is |V| (the total number of vertices in the graph). Next, what is really
cool is that MatchStep is smart about pruning path information dynamically
as needed based upon which patterns still required for the traverser to
complete its computation. Here is a match()-traversal that finds all
vertices that are in a triangle relationship. Note that we don’t care about
triangles, just the vertices involved in one as we only select(“a”).
However, by only selecting(“a”) this means that MatchStep is able to
dynamically prune path information as it is no longer needed — “b” and “c".
>
> g.V().has(“performances”,gt(500)).match(
>   as("a").out().as("b"),
>   as(“b”).out().as("c”),
>   as(“c”).out().as(“a”)).select("a").profile()
>
>
> This leads to the following runtimes below. Notice how
PathRetractionStrategy is able to increase the likelihood of bulking in
MatchStep — as path information is shed (no longer needed for the remaining
patterns to check), traversers get merged and thus, less computation is
required. However, if you need to select(“a”,”b”,”c”) from the match()
patterns, well, then PathRetractionStrategy provides no benefit. Though, if
you just need “a” and “b”, then PathRetractionStrategy helps. In short, as
long as intermediate results can be shed along the way, then
PathRetractionStrategy will capitalize on this fact and increase the
likelihood of bulking.
>
> OLTP w/o PathRetractionStrategy
>
> Traversal Metrics
> Step                                                               Count
Traversers       Time (ms)    % Dur
>
=============================================================================================================
> TinkerGraphStep(vertex,[performances.gt(500)])                         9
         9          20.937     1.42
> MatchStep(AND,[[MatchStartStep(a), ProfileStep,...                  9546
      9546        1435.432    97.36
>   MatchStartStep(a)                                                    9
         9          35.414
>   VertexStep(OUT,vertex)                                             603
       603          23.232
>   MatchEndStep(b)                                                    603
       603          44.727
>   MatchStartStep(b)                                                  603
       602           8.148
>   VertexStep(OUT,vertex)                                           24279
     24279          31.825
>   MatchEndStep(c)                                                  24279
     24279          41.708
>   MatchStartStep(c)                                                24279
     24248          31.367
>   VertexStep(OUT,vertex)                                         1073459
   1073459         749.078
>   MatchEndStep(a)                                                   9546
      9546         350.763
> SelectOneStep(a)                                                    9546
      9546          18.016     1.22
>                                             >TOTAL                     -
         -        1474.386        -
>
> OLTP w/ PathRetractionStrategy
>
> Traversal Metrics
> Step                                                               Count
Traversers       Time (ms)    % Dur
>
=============================================================================================================
> TinkerGraphStep(vertex,[performances.gt(500)])                         9
         9           1.441     0.48
> MatchStep(AND,[[MatchStartStep(a), ProfileStep,...                  9546
         9         295.555    99.20
>   MatchStartStep(a)                                                    9
         9           2.385
>   VertexStep(OUT,vertex)                                             603
       603           2.006
>   MatchEndStep(b)                                                    603
       603           5.554
>   MatchStartStep(b)                                                  603
       602           3.245
>   VertexStep(OUT,vertex)                                           24279
     24279          17.903
>   MatchEndStep(c)                                                  24279
     24279         112.304
>   MatchStartStep(c)                                                24279
      2574           4.445
>   VertexStep(OUT,vertex)                                         1073459
     64061          39.700
>   MatchEndStep(a)                                                   9546
       619          26.218
> SelectOneStep(a)                                                    9546
         9           0.603     0.20
> NoOpBarrierStep(2500)                                               9546
         9           0.349     0.12
>                                             >TOTAL                     -
         -         297.950        -
>
> OLAP w/o PathRetractionStrategy
>
> Traversal Metrics
> Step                                                               Count
Traversers       Time (ms)    % Dur
>
=============================================================================================================
> GraphStep(vertex,[])                                                 808
       808          15.306    32.30
> HasStep([performances.gt(500)])                                        9
         9           0.048     0.10
> MatchStep(AND,[[MatchStartStep(a), ProfileStep,...                  9546
      9546          21.210    44.76
>   MatchStartStep(a)                                                    9
         9           0.037
>   VertexStep(OUT,vertex)                                             603
       602           4.915
>   MatchEndStep(b) (profiling ignored)
                      0.000
>   MatchStartStep(b)                                                  603
       602           5.604
>   VertexStep(OUT,vertex)                                           24279
     24248          28.272
>   MatchEndStep(c) (profiling ignored)
                      0.000
>   MatchStartStep(c)                                                24279
     24248          26.132
>   VertexStep(OUT,vertex)                                         1073459
   1072266        1478.613
>   MatchEndStep(a) (profiling ignored)
                      0.000
> SelectOneStep(a)                                                    9546
      9546          10.826    22.84
>                                             >TOTAL                     -
         -          47.392        -
>
> OLTP w/ PathRetractionStrategy
>
> Traversal Metrics
> Step                                                               Count
Traversers       Time (ms)    % Dur
>
=============================================================================================================
> GraphStep(vertex,[])                                                 808
       808           5.633    94.99
> HasStep([performances.gt(500)])                                        9
         9           0.059     1.00
> MatchStep(AND,[[MatchStartStep(a), ProfileStep,...                  9546
         9           0.145     2.45
>   MatchStartStep(a)                                                    9
         9           0.041
>   VertexStep(OUT,vertex)                                             603
       602           3.854
>   MatchEndStep(b) (profiling ignored)
                      0.000
>   MatchStartStep(b)                                                  603
       602           1.891
>   VertexStep(OUT,vertex)                                           24279
     24248          78.803
>   MatchEndStep(c) (profiling ignored)
                      0.000
>   MatchStartStep(c)                                                24279
      2574           5.581
>   VertexStep(OUT,vertex)                                         1073459
     63887         101.209
>   MatchEndStep(a) (profiling ignored)
                      0.000
> SelectOneStep(a)                                                    9546
         9           0.092     1.56
>                                             >TOTAL                     -
         -           5.930        -
>
>
> Enjoy,
> Marko.
>
> http://markorodriguez.com
>
>
>

Reply via email to