Hello,

I believe that the most important aspect of Gremlin is the concept of bulking. 
With graph traversals leading to combinatoric explosions in both time and 
space, the best way to battle such explosions is to realize that even if the 
number of unique paths is increasing rapidly, there is an upper limit on the 
number of unique vertices touched. In order to project an exponential space to 
a constant space, Gremlin traverses have the concept of a bulk. If two 
traversers “are equals,” then when they meet each other at the same step in the 
traversal and location in the graph, they are merged and their respective bulks 
are sum’d. Thus, two traverser become one. Excellent. This is why we can do 
massive branch traversals like below in blazing fast time:

gremlin> clockWithResult(10){g.V().repeat(out()).times(10).count().next()}
==>32.8619489 // 33 milliseconds
==>4947356572210703772 // 4 quintillion paths explored

However, this model breaks down when traversers are no longer “equal.” While 
two traversers may have the same step and graph location, they may have 
different path histories. Thus, for traversals that leverage path information, 
it is very difficult to get traverser equality and thus, because the number of 
unique paths grows exponentially and traverser equality is determined by path 
equality (when paths are required for a traversal), you get exponential 
traverser growth.

gremlin> 
clockWithResult(10){g.V().repeat(out()).times(10).path().count().next()}
// this will not complete within the life of universe

Thus, in order to make Gremlin fast, its all about doing our best to reduce the 
amount of information each traverser maintains which, in principle, increases 
the likelihood that any two traversers can be merged/bulked (increases the 
likelihood of “equality”).

Ted Wilmes led the development of 
https://issues.apache.org/jira/browse/TINKERPOP-1254 
<https://issues.apache.org/jira/browse/TINKERPOP-1254>. With this ticket, 
traversers are able to shed path information they will no longer need down the 
line. Thus, not only reducing their path footprint (space), but also increasing 
the likelihood of bulking (space+time). How does it work? Take the following 
example traversal:

g.V().as(‘a’).out().as(‘b’).where(‘a’,eq(‘b’)).select(‘a’).out() // get the 
adjacent neighbors of those traversers that have themselves as a neighbor

Prior to PathRetractionStrategy, a traverser’s path history would look like 
this:
        
[v[1]]                          // V
[v[1], v[2]]                    // out
[v[1], v[2]]                    // where
[v[1], v[2], v[1]]              // select
[v[1], v[2], v[1], v[3]]        // out

Now with PathRetractionStrategy.

[v[1]]                          // V
[v[1], v[2]]                    // out
[v[1]]                          // where
[]                              // select
[]                              // out

Once where() is done with “b”, it wipes the “b”-path information for the 
traverser’s path history. Next, once select() is done with “a”, it wipes 
“a”-path information from the traverser’s path history. Thus, after select(), 
there is no longer path information required for the traversal to faithfully 
execute. Moreover, after select(), bulking becomes extremely likely and thus, 
instead of calculating out() on an exponential set of traversers, we will only 
calculate it on the upper limit of traverses which is |V| (the total number of 
vertices in the graph). Next, what is really cool is that MatchStep is smart 
about pruning path information dynamically as needed based upon which patterns 
still required for the traverser to complete its computation. Here is a 
match()-traversal that finds all vertices that are in a triangle relationship. 
Note that we don’t care about triangles, just the vertices involved in one as 
we only select(“a”). However, by only selecting(“a”) this means that MatchStep 
is able to dynamically prune path information as it is no longer needed — “b” 
and “c".

g.V().has(“performances”,gt(500)).match(
  as("a").out().as("b"),
  as(“b”).out().as("c”),
  as(“c”).out().as(“a”)).select("a").profile()


This leads to the following runtimes below. Notice how PathRetractionStrategy 
is able to increase the likelihood of bulking in MatchStep — as path 
information is shed (no longer needed for the remaining patterns to check), 
traversers get merged and thus, less computation is required. However, if you 
need to select(“a”,”b”,”c”) from the match() patterns, well, then 
PathRetractionStrategy provides no benefit. Though, if you just need “a” and 
“b”, then PathRetractionStrategy helps. In short, as long as intermediate 
results can be shed along the way, then PathRetractionStrategy will capitalize 
on this fact and increase the likelihood of bulking.

OLTP w/o PathRetractionStrategy 

Traversal Metrics
Step                                                               Count  
Traversers       Time (ms)    % Dur
=============================================================================================================
TinkerGraphStep(vertex,[performances.gt(500)])                         9        
   9          20.937     1.42
MatchStep(AND,[[MatchStartStep(a), ProfileStep,...                  9546        
9546        1435.432    97.36
  MatchStartStep(a)                                                    9        
   9          35.414
  VertexStep(OUT,vertex)                                             603        
 603          23.232
  MatchEndStep(b)                                                    603        
 603          44.727
  MatchStartStep(b)                                                  603        
 602           8.148
  VertexStep(OUT,vertex)                                           24279       
24279          31.825
  MatchEndStep(c)                                                  24279       
24279          41.708
  MatchStartStep(c)                                                24279       
24248          31.367
  VertexStep(OUT,vertex)                                         1073459     
1073459         749.078
  MatchEndStep(a)                                                   9546        
9546         350.763
SelectOneStep(a)                                                    9546        
9546          18.016     1.22
                                            >TOTAL                     -        
   -        1474.386        -

OLTP w/ PathRetractionStrategy 

Traversal Metrics
Step                                                               Count  
Traversers       Time (ms)    % Dur
=============================================================================================================
TinkerGraphStep(vertex,[performances.gt(500)])                         9        
   9           1.441     0.48
MatchStep(AND,[[MatchStartStep(a), ProfileStep,...                  9546        
   9         295.555    99.20
  MatchStartStep(a)                                                    9        
   9           2.385
  VertexStep(OUT,vertex)                                             603        
 603           2.006
  MatchEndStep(b)                                                    603        
 603           5.554
  MatchStartStep(b)                                                  603        
 602           3.245
  VertexStep(OUT,vertex)                                           24279       
24279          17.903
  MatchEndStep(c)                                                  24279       
24279         112.304
  MatchStartStep(c)                                                24279        
2574           4.445
  VertexStep(OUT,vertex)                                         1073459       
64061          39.700
  MatchEndStep(a)                                                   9546        
 619          26.218
SelectOneStep(a)                                                    9546        
   9           0.603     0.20
NoOpBarrierStep(2500)                                               9546        
   9           0.349     0.12
                                            >TOTAL                     -        
   -         297.950        -

OLAP w/o PathRetractionStrategy 

Traversal Metrics
Step                                                               Count  
Traversers       Time (ms)    % Dur
=============================================================================================================
GraphStep(vertex,[])                                                 808        
 808          15.306    32.30
HasStep([performances.gt(500)])                                        9        
   9           0.048     0.10
MatchStep(AND,[[MatchStartStep(a), ProfileStep,...                  9546        
9546          21.210    44.76
  MatchStartStep(a)                                                    9        
   9           0.037
  VertexStep(OUT,vertex)                                             603        
 602           4.915
  MatchEndStep(b) (profiling ignored)                                           
               0.000
  MatchStartStep(b)                                                  603        
 602           5.604
  VertexStep(OUT,vertex)                                           24279       
24248          28.272
  MatchEndStep(c) (profiling ignored)                                           
               0.000
  MatchStartStep(c)                                                24279       
24248          26.132
  VertexStep(OUT,vertex)                                         1073459     
1072266        1478.613
  MatchEndStep(a) (profiling ignored)                                           
               0.000
SelectOneStep(a)                                                    9546        
9546          10.826    22.84
                                            >TOTAL                     -        
   -          47.392        -

OLTP w/ PathRetractionStrategy 

Traversal Metrics
Step                                                               Count  
Traversers       Time (ms)    % Dur
=============================================================================================================
GraphStep(vertex,[])                                                 808        
 808           5.633    94.99
HasStep([performances.gt(500)])                                        9        
   9           0.059     1.00
MatchStep(AND,[[MatchStartStep(a), ProfileStep,...                  9546        
   9           0.145     2.45
  MatchStartStep(a)                                                    9        
   9           0.041
  VertexStep(OUT,vertex)                                             603        
 602           3.854
  MatchEndStep(b) (profiling ignored)                                           
               0.000
  MatchStartStep(b)                                                  603        
 602           1.891
  VertexStep(OUT,vertex)                                           24279       
24248          78.803
  MatchEndStep(c) (profiling ignored)                                           
               0.000
  MatchStartStep(c)                                                24279        
2574           5.581
  VertexStep(OUT,vertex)                                         1073459       
63887         101.209
  MatchEndStep(a) (profiling ignored)                                           
               0.000
SelectOneStep(a)                                                    9546        
   9           0.092     1.56
                                            >TOTAL                     -        
   -           5.930        -


Enjoy,
Marko.

http://markorodriguez.com



Reply via email to