[ 
https://issues.apache.org/jira/browse/TINKERPOP3-700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14582463#comment-14582463
 ] 

Marko A. Rodriguez commented on TINKERPOP3-700:
-----------------------------------------------

{{XMatchStep}} now works for both OLAP and OLTP by leveraging the nice 
constructs in {{ComputerAwareStep}}.
  
https://github.com/apache/incubator-tinkerpop/blob/15e3f77c96c38bbceadd1b977e1afc58913d076b/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/step/filter/exp/XMatchStep.java

So, now I think we need to add something like:

{code}
XMatchStep.setMatchAlgorithm(final BiFunction<Traverser, 
List<Traversal>,Optional<Traversal>> matchAlgorithm);
{code}

In other words, given a Traverser and a list of Traversals, which traversal 
should it take next? With an {{Optional.empty()}} return meaning, "the 
traverser is done, send him to the next step."

What we have right now would be called {{GreedyMatchAlgorithm}} which just 
takes the first traversal in the list that it can legally take and has not 
already taken. If someone did {{BudgetMatchAlgorithm}}, then it could be an 
object that maintained state such as statistics about how many traversers are 
taking per branch (trying to choose that don't generate too many traversers), 
how many traversers are being filtered (choose super selective branches first), 
how long a traversal is taking, etc. Then it can be smart about picking which 
traversal pattern to go down. What is crazy about the {{XMatchStep}} 
implementation is that a {{BudgetMatchAlgorithm}} with global state would work 
(however, it would only have the global statistics about its particular graph 
partition (worker split)). In this way, budget match would work in OLAP.

Finally, AND/ORing appears trivial as we can just nest MatchSteps with an 
optimization being trying to get all ANDs to be "parallel" so the 
{{XXXMatchAlgorithm}} has as many choices to choose from. I will try and get 
that done next.


> WhereStep should "MatchStep" and ConjunctionP should use the BudgetAlgorithm
> ----------------------------------------------------------------------------
>
>                 Key: TINKERPOP3-700
>                 URL: https://issues.apache.org/jira/browse/TINKERPOP3-700
>             Project: TinkerPop 3
>          Issue Type: Improvement
>          Components: process
>            Reporter: Marko A. Rodriguez
>
> {code}
> g.V.as('a').where(a.knows.b
>                   a.knows.c
>                   b.knows.c)
> {code}
> The above can be written as an OrP of the form:
> {code}
> g.V.as('a').or(select(a).knows.b.select(b).knows.c.select(a).knows.where(eq(c)),
>                
> select(a).knows.c.select(a).knows.b.select(b).knows.where(eq(c)));
> {code}
> In essence, the where-statements are rewritten in terms of every possible 
> permutation. When these permutations are put into an OrP (via 
> or(traversals…)), then if any branch returns a result, then the original 'a' 
> is emitted (as {{WhereStep}} is a {{FilterStep}}). If {{OrP}} is under the 
> BudgetAlgorithm, then {{OrP}} will "thread between" its traversals until a 
> value is yielded. *And given that all permutations are the same semantics -- 
> if one fails, they all fail!*
> What is nice about this, is that arbitrary nesting comes "for free."
> {code}
> g.V.as('a').where(a.knows.b
>                   a.uncle.b
>                     and(a.worksFor.c
>                           b.worksFor.c))
> {code}
> This is rewritten as:
> {code}
> g.V.as('a').or(select(a).knows.b.or(select(a).worksFor.c.select(b).worksFor.where(eq(c)),select(b).worksFor.c.select(a).worksFor.where(eq(c))).select(a).uncle.where(eq(b)),
>                
> select(a).uncle.b.select(a).knows.where(eq(b)).or(select(a).worksFor.c.select(b).worksFor.where(eq(c)),select(b).worksFor.c.select(a).worksFor.where(eq(c))))
> {code}
> *IMPORTANT* This is not a "match" in the {{MatchStep}} sense as it doesn't 
> return all permutations that bind, it only filters based on a single match.
> What is interesting about this approach:
>   1. The rewrite algorithm seems simple as its just concatenation given 
> {{select()}}-projections and {{where(eq)}}-tails.
>   2. The cool thing about the rewrite in all possible permutations is that if 
> any one {{FastNoSuchElementException}}, its booted from the {{ConjunctionP}} 
> analysis.
>   3. {{ConjunctionP}} has the BudgetAlgorithm and thus can be used for ANY 
> step that has conjunctions -- {{HasStep}}, {{IsStep}}, etc.
>   4. It uses the path data structure to maintain the variable bindings. 
> {{WhereStep}} has no state! Its all about {{OrP}}. 
>   5. Given that the path data is the variable bindings, then this also works 
> for OLAP as the traverser contains all the information it needs (no central 
> location of analysis!)
>       - However, you would only pick one permutation to do as `or()` does not 
> exist in OLAP. 
>       - and with one permutation, {{where().select()}} is then {{MatchStep}} 
> which would then work in OLAP!
>               - thus, Gremlin OLAP can rewrite {{match()}} to the 
> {{where().select()}} form and TADA!
>   
> *IMPORTANT* 4 and 5 above are pretty insane consequences. And if any, we 
> should at least use this realization to make {{match()}} work in OLAP.
> Next, realize that how {{where()}} should work is that if an {{as()}} is NOT 
> in the path data structure, then its a variable bindings for rewrite. 
> Moreover, if you don't provide a start {{as()}}, it is assumed to be the 
> incoming object (currently how {{where()}} works). For example:
> {code}
> g.V.where(knows.b
>           knows.c
>           b.knows.c)
> {code}
> This is rewritten as:
> {code}
> g.V.or(x.select(x).knows.b.or(select(b).worksFor.where(eq(a)),select(x).worksFor.where(eq(b))).select(x).uncle.where(eq(b)),
>        
> x.select(x).uncle.b.select(x).knows.where(eq(b)).or(select(b).worksFor.where(eq(x)),select(x).worksFor.where(eq(b))))
>   
> {code}
> To be sure, the {{as('a').select('a')}} fragments can of course be optimized 
> out to just {{as('a')}}.
>           



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to