[ 
https://issues.apache.org/jira/browse/TINKERPOP-1254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15337025#comment-15337025
 ] 

Marko A. Rodriguez commented on TINKERPOP-1254:
-----------------------------------------------

I have reviewed your code and overall this looks really good. Here are some 
notes both positive and negative in no particular order.

* {{PrunePathStrategy}} does "inherent parent traversal labels" .. That is 
smart. I was wondering how that was going to be done.
* {{MatchStep}} dynamically chooses internal traversals to execute based on 
runtime statistics. I don't see in your code how you are dropping labels 
dynamically based upon what is no longer needed given which traversals are left 
to execute (for that traverser, based on its tags).
* Dynamically pruning path labels in {{MatchStep}} is going to make things 
scale big time -- especially in OLAP. We will be able to see significant 
improvement even on the Grateful Dead graph.
* {{PrunePathStep}} should go away. (I think you know this)
* {{GraphTraversal.prunePath()}} should go away. (I think you know this)
* You have numerous exposed {{System.out.println(labels)}} in the various 
{{PathProcessor}} steps.

I do not thoroughly understand your {{ImmutablePath.retract()}} algorithm as 
{{ImmutablePath}} is a crazy data structure that is hard to reason on. 

Questions:
  1. Does the test suite pass?
  2. Do you have an benchmarks?

In terms of testing your strategy, please see how we test the other strategies 
and then introspect into the traversal post-compilation. For instance, 
something like this:

  
https://github.com/apache/tinkerpop/blob/master/gremlin-core/src/test/java/org/apache/tinkerpop/gremlin/process/traversal/strategy/optimization/OrderLimitStrategyTest.java

I compile the traversal than expect some result from its introspection. You 
would expect some {{Set<String>}} or perhaps even better {{List<Set<String>>}} 
for the keep labels of each step in the ordered traversal. 

Hope that helps.


> Support dropping traverser path information when it is no longer needed.
> ------------------------------------------------------------------------
>
>                 Key: TINKERPOP-1254
>                 URL: https://issues.apache.org/jira/browse/TINKERPOP-1254
>             Project: TinkerPop
>          Issue Type: Improvement
>          Components: process
>    Affects Versions: 3.1.1-incubating
>            Reporter: Marko A. Rodriguez
>            Assignee: Ted Wilmes
>
> The most expensive traversals (especially in OLAP) are those that can not be 
> "bulked." There are various reasons why two traversers at the same object can 
> not be bulked, but the primary reason is {{PATH}} or {{LABELED_PATH}}. That 
> is, when the history of the traverser is required, the probability of two 
> traversers having the same history is low.
> A key to making traversals more efficient is to do as a much as possible to 
> remove historic information from a traverser so it can get bulked. How does 
> one do this? 
> {code}
> g.V.as('a').out().as('b').out().where(neq('a').and().neq('b')).both().name
> {code}
> The {{LABELED_PATH}} of "a" and "b" are required up to the {{where()}} and at 
> which point, at {{both()}}, they are no longer required. It would be smart to 
> support:
> {code}
> traverser.dropLabels(Set<String>)
> traverser.dropPath()
> {code}
> We would then, via a {{TraversalOptimizationStrategy}} insert a step between 
> {{where()}} and {{both()}} called {{PathPruneStep}} which would be a 
> {{SideEffectStep}}. The strategy would know which labels were no longer 
> needed (via forward lookahead) and then do:
> {code}
> public class PathPruneStep {
>   final Set<String> dropLabels = ...
>   final boolean dropPath = ...
>   public void sideEffect(final Traverser<S> traverser) {
>     final Traverser<S> start = this.starts.next();
>     if(this.dropPath) start.dropPath();
>     else start.dropLabels(labels); 
>   }
> }
> {code}
> Again, the more we can prune historic path data no longer needed, the higher 
> the probability of bulking. Think about this in terms of {{match()}}.
> {code}
> g.V().match(
>   a.out.b,
>   b.out.c,
>   c.neq.a,
>   c.out.b,
> ).select("a")
> {code}
> All we need is "a" at the end. Thus, once a pattern has been passed and no 
> future patterns require that label, drop it! 
> This idea is related to TINKERPOP-331, but I don't think we should deal with 
> manipulating the species. Thus, I think 331 is too "low level."



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to