[jira] [Commented] (TINKERPOP-1801) OLAP profile() step return incorrect timing
[ https://issues.apache.org/jira/browse/TINKERPOP-1801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16226946#comment-16226946 ] ASF GitHub Bot commented on TINKERPOP-1801: --- Github user asfgit closed the pull request at: https://github.com/apache/tinkerpop/pull/734 > OLAP profile() step return incorrect timing > > > Key: TINKERPOP-1801 > URL: https://issues.apache.org/jira/browse/TINKERPOP-1801 > Project: TinkerPop > Issue Type: Bug > Components: hadoop >Affects Versions: 3.3.0, 3.2.6 >Reporter: Artem Aliev > > Graph ProfileStep calculates time of next()/hasNext() calls, expecting > recursion. > But Message passing/RDD joins is used by GraphComputer. > So next() does not recursively call next steps, but message is generated. And > most of the time is taken by message passing (RDD join). > Thus on graph computer the time between ProfileStep should be measured, not > inside it. > The other approach is to get Spark statistics with SparkListener and add > spark stages timings into profiler metrics. that will work only for spark but > will give better representation of step costs. > The simple fix is measuring time between OLAP iterations and add it to the > profiler step. > This will not take into account computer setup time, but will be precise > enough for long running queries. > To reproduce: > tinkerPop 3.2.6 gremlin: > {code} > plugin activated: tinkerpop.server > plugin activated: tinkerpop.utilities > plugin activated: tinkerpop.spark > plugin activated: tinkerpop.tinkergraph > gremlin> graph = > GraphFactory.open('conf/hadoop/hadoop-grateful-gryo.properties') > gremlin> g = graph.traversal().withComputer(SparkGraphComputer) > ==>graphtraversalsource[hadoopgraph[gryoinputformat->gryooutputformat], > sparkgraphcomputer] > gremlin> g.V().out().out().count().profile() > ==>Traversal Metrics > Step Count > Traversers Time (ms)% Dur > = > GraphStep(vertex,[]) 808 >808 2.02518.35 > VertexStep(OUT,vertex) 8049 >562 4.43040.14 > VertexStep(OUT,edge) 327370 > 7551 4.58141.50 > CountGlobalStep1 > 1 0.001 0.01 > >TOTAL - > - 11.038- > gremlin> clock(1){g.V().out().out().count().next() } > ==>3421.92758 > gremlin> > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (TINKERPOP-1801) OLAP profile() step return incorrect timing
[ https://issues.apache.org/jira/browse/TINKERPOP-1801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16225848#comment-16225848 ] ASF GitHub Bot commented on TINKERPOP-1801: --- Github user spmallette commented on the issue: https://github.com/apache/tinkerpop/pull/734 @okram if all looks good to you after artem's changes you should be free to merge this now: All tests pass with `docker/build.sh -t -n -i` VOTE +1 > OLAP profile() step return incorrect timing > > > Key: TINKERPOP-1801 > URL: https://issues.apache.org/jira/browse/TINKERPOP-1801 > Project: TinkerPop > Issue Type: Bug > Components: hadoop >Affects Versions: 3.3.0, 3.2.6 >Reporter: Artem Aliev > > Graph ProfileStep calculates time of next()/hasNext() calls, expecting > recursion. > But Message passing/RDD joins is used by GraphComputer. > So next() does not recursively call next steps, but message is generated. And > most of the time is taken by message passing (RDD join). > Thus on graph computer the time between ProfileStep should be measured, not > inside it. > The other approach is to get Spark statistics with SparkListener and add > spark stages timings into profiler metrics. that will work only for spark but > will give better representation of step costs. > The simple fix is measuring time between OLAP iterations and add it to the > profiler step. > This will not take into account computer setup time, but will be precise > enough for long running queries. > To reproduce: > tinkerPop 3.2.6 gremlin: > {code} > plugin activated: tinkerpop.server > plugin activated: tinkerpop.utilities > plugin activated: tinkerpop.spark > plugin activated: tinkerpop.tinkergraph > gremlin> graph = > GraphFactory.open('conf/hadoop/hadoop-grateful-gryo.properties') > gremlin> g = graph.traversal().withComputer(SparkGraphComputer) > ==>graphtraversalsource[hadoopgraph[gryoinputformat->gryooutputformat], > sparkgraphcomputer] > gremlin> g.V().out().out().count().profile() > ==>Traversal Metrics > Step Count > Traversers Time (ms)% Dur > = > GraphStep(vertex,[]) 808 >808 2.02518.35 > VertexStep(OUT,vertex) 8049 >562 4.43040.14 > VertexStep(OUT,edge) 327370 > 7551 4.58141.50 > CountGlobalStep1 > 1 0.001 0.01 > >TOTAL - > - 11.038- > gremlin> clock(1){g.V().out().out().count().next() } > ==>3421.92758 > gremlin> > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (TINKERPOP-1801) OLAP profile() step return incorrect timing
[ https://issues.apache.org/jira/browse/TINKERPOP-1801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16224914#comment-16224914 ] ASF GitHub Bot commented on TINKERPOP-1801: --- Github user artem-aliev commented on the issue: https://github.com/apache/tinkerpop/pull/734 I have fixed test failures. TinkerPopComputer does not call ComputerPorgram.execute methods if spit has no vertices. For example: modern graph has 6 vertices but computer has 8 cores, there will be two empty splits. TraversalVertexProgram use execute step to setup next profiling step, so it is not setup side effects properly for empty splits. So tests did not filed in docker but failed on computer with more then 6 cores. The fix add check that profile side effects were regester properly before using > OLAP profile() step return incorrect timing > > > Key: TINKERPOP-1801 > URL: https://issues.apache.org/jira/browse/TINKERPOP-1801 > Project: TinkerPop > Issue Type: Bug > Components: hadoop >Affects Versions: 3.3.0, 3.2.6 >Reporter: Artem Aliev > > Graph ProfileStep calculates time of next()/hasNext() calls, expecting > recursion. > But Message passing/RDD joins is used by GraphComputer. > So next() does not recursively call next steps, but message is generated. And > most of the time is taken by message passing (RDD join). > Thus on graph computer the time between ProfileStep should be measured, not > inside it. > The other approach is to get Spark statistics with SparkListener and add > spark stages timings into profiler metrics. that will work only for spark but > will give better representation of step costs. > The simple fix is measuring time between OLAP iterations and add it to the > profiler step. > This will not take into account computer setup time, but will be precise > enough for long running queries. > To reproduce: > tinkerPop 3.2.6 gremlin: > {code} > plugin activated: tinkerpop.server > plugin activated: tinkerpop.utilities > plugin activated: tinkerpop.spark > plugin activated: tinkerpop.tinkergraph > gremlin> graph = > GraphFactory.open('conf/hadoop/hadoop-grateful-gryo.properties') > gremlin> g = graph.traversal().withComputer(SparkGraphComputer) > ==>graphtraversalsource[hadoopgraph[gryoinputformat->gryooutputformat], > sparkgraphcomputer] > gremlin> g.V().out().out().count().profile() > ==>Traversal Metrics > Step Count > Traversers Time (ms)% Dur > = > GraphStep(vertex,[]) 808 >808 2.02518.35 > VertexStep(OUT,vertex) 8049 >562 4.43040.14 > VertexStep(OUT,edge) 327370 > 7551 4.58141.50 > CountGlobalStep1 > 1 0.001 0.01 > >TOTAL - > - 11.038- > gremlin> clock(1){g.V().out().out().count().next() } > ==>3421.92758 > gremlin> > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (TINKERPOP-1801) OLAP profile() step return incorrect timing
[ https://issues.apache.org/jira/browse/TINKERPOP-1801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16217150#comment-16217150 ] ASF GitHub Bot commented on TINKERPOP-1801: --- Github user artem-aliev commented on the issue: https://github.com/apache/tinkerpop/pull/734 Something strange happened, during rebases. I'll fix. > OLAP profile() step return incorrect timing > > > Key: TINKERPOP-1801 > URL: https://issues.apache.org/jira/browse/TINKERPOP-1801 > Project: TinkerPop > Issue Type: Bug > Components: hadoop >Affects Versions: 3.3.0, 3.2.6 >Reporter: Artem Aliev > > Graph ProfileStep calculates time of next()/hasNext() calls, expecting > recursion. > But Message passing/RDD joins is used by GraphComputer. > So next() does not recursively call next steps, but message is generated. And > most of the time is taken by message passing (RDD join). > Thus on graph computer the time between ProfileStep should be measured, not > inside it. > The other approach is to get Spark statistics with SparkListener and add > spark stages timings into profiler metrics. that will work only for spark but > will give better representation of step costs. > The simple fix is measuring time between OLAP iterations and add it to the > profiler step. > This will not take into account computer setup time, but will be precise > enough for long running queries. > To reproduce: > tinkerPop 3.2.6 gremlin: > {code} > plugin activated: tinkerpop.server > plugin activated: tinkerpop.utilities > plugin activated: tinkerpop.spark > plugin activated: tinkerpop.tinkergraph > gremlin> graph = > GraphFactory.open('conf/hadoop/hadoop-grateful-gryo.properties') > gremlin> g = graph.traversal().withComputer(SparkGraphComputer) > ==>graphtraversalsource[hadoopgraph[gryoinputformat->gryooutputformat], > sparkgraphcomputer] > gremlin> g.V().out().out().count().profile() > ==>Traversal Metrics > Step Count > Traversers Time (ms)% Dur > = > GraphStep(vertex,[]) 808 >808 2.02518.35 > VertexStep(OUT,vertex) 8049 >562 4.43040.14 > VertexStep(OUT,edge) 327370 > 7551 4.58141.50 > CountGlobalStep1 > 1 0.001 0.01 > >TOTAL - > - 11.038- > gremlin> clock(1){g.V().out().out().count().next() } > ==>3421.92758 > gremlin> > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (TINKERPOP-1801) OLAP profile() step return incorrect timing
[ https://issues.apache.org/jira/browse/TINKERPOP-1801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16215019#comment-16215019 ] ASF GitHub Bot commented on TINKERPOP-1801: --- Github user spmallette commented on the issue: https://github.com/apache/tinkerpop/pull/734 This doesn't build for me with just regular `mvn clean install` > OLAP profile() step return incorrect timing > > > Key: TINKERPOP-1801 > URL: https://issues.apache.org/jira/browse/TINKERPOP-1801 > Project: TinkerPop > Issue Type: Bug > Components: hadoop >Affects Versions: 3.3.0, 3.2.6 >Reporter: Artem Aliev > > Graph ProfileStep calculates time of next()/hasNext() calls, expecting > recursion. > But Message passing/RDD joins is used by GraphComputer. > So next() does not recursively call next steps, but message is generated. And > most of the time is taken by message passing (RDD join). > Thus on graph computer the time between ProfileStep should be measured, not > inside it. > The other approach is to get Spark statistics with SparkListener and add > spark stages timings into profiler metrics. that will work only for spark but > will give better representation of step costs. > The simple fix is measuring time between OLAP iterations and add it to the > profiler step. > This will not take into account computer setup time, but will be precise > enough for long running queries. > To reproduce: > tinkerPop 3.2.6 gremlin: > {code} > plugin activated: tinkerpop.server > plugin activated: tinkerpop.utilities > plugin activated: tinkerpop.spark > plugin activated: tinkerpop.tinkergraph > gremlin> graph = > GraphFactory.open('conf/hadoop/hadoop-grateful-gryo.properties') > gremlin> g = graph.traversal().withComputer(SparkGraphComputer) > ==>graphtraversalsource[hadoopgraph[gryoinputformat->gryooutputformat], > sparkgraphcomputer] > gremlin> g.V().out().out().count().profile() > ==>Traversal Metrics > Step Count > Traversers Time (ms)% Dur > = > GraphStep(vertex,[]) 808 >808 2.02518.35 > VertexStep(OUT,vertex) 8049 >562 4.43040.14 > VertexStep(OUT,edge) 327370 > 7551 4.58141.50 > CountGlobalStep1 > 1 0.001 0.01 > >TOTAL - > - 11.038- > gremlin> clock(1){g.V().out().out().count().next() } > ==>3421.92758 > gremlin> > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (TINKERPOP-1801) OLAP profile() step return incorrect timing
[ https://issues.apache.org/jira/browse/TINKERPOP-1801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16214622#comment-16214622 ] ASF GitHub Bot commented on TINKERPOP-1801: --- Github user dkuppitz commented on the issue: https://github.com/apache/tinkerpop/pull/734 VOTE: +1 > OLAP profile() step return incorrect timing > > > Key: TINKERPOP-1801 > URL: https://issues.apache.org/jira/browse/TINKERPOP-1801 > Project: TinkerPop > Issue Type: Bug > Components: hadoop >Affects Versions: 3.3.0, 3.2.6 >Reporter: Artem Aliev > > Graph ProfileStep calculates time of next()/hasNext() calls, expecting > recursion. > But Message passing/RDD joins is used by GraphComputer. > So next() does not recursively call next steps, but message is generated. And > most of the time is taken by message passing (RDD join). > Thus on graph computer the time between ProfileStep should be measured, not > inside it. > The other approach is to get Spark statistics with SparkListener and add > spark stages timings into profiler metrics. that will work only for spark but > will give better representation of step costs. > The simple fix is measuring time between OLAP iterations and add it to the > profiler step. > This will not take into account computer setup time, but will be precise > enough for long running queries. > To reproduce: > tinkerPop 3.2.6 gremlin: > {code} > plugin activated: tinkerpop.server > plugin activated: tinkerpop.utilities > plugin activated: tinkerpop.spark > plugin activated: tinkerpop.tinkergraph > gremlin> graph = > GraphFactory.open('conf/hadoop/hadoop-grateful-gryo.properties') > gremlin> g = graph.traversal().withComputer(SparkGraphComputer) > ==>graphtraversalsource[hadoopgraph[gryoinputformat->gryooutputformat], > sparkgraphcomputer] > gremlin> g.V().out().out().count().profile() > ==>Traversal Metrics > Step Count > Traversers Time (ms)% Dur > = > GraphStep(vertex,[]) 808 >808 2.02518.35 > VertexStep(OUT,vertex) 8049 >562 4.43040.14 > VertexStep(OUT,edge) 327370 > 7551 4.58141.50 > CountGlobalStep1 > 1 0.001 0.01 > >TOTAL - > - 11.038- > gremlin> clock(1){g.V().out().out().count().next() } > ==>3421.92758 > gremlin> > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (TINKERPOP-1801) OLAP profile() step return incorrect timing
[ https://issues.apache.org/jira/browse/TINKERPOP-1801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16213938#comment-16213938 ] ASF GitHub Bot commented on TINKERPOP-1801: --- Github user okram commented on the issue: https://github.com/apache/tinkerpop/pull/734 VOTE +1 > OLAP profile() step return incorrect timing > > > Key: TINKERPOP-1801 > URL: https://issues.apache.org/jira/browse/TINKERPOP-1801 > Project: TinkerPop > Issue Type: Bug > Components: hadoop >Affects Versions: 3.3.0, 3.2.6 >Reporter: Artem Aliev > > Graph ProfileStep calculates time of next()/hasNext() calls, expecting > recursion. > But Message passing/RDD joins is used by GraphComputer. > So next() does not recursively call next steps, but message is generated. And > most of the time is taken by message passing (RDD join). > Thus on graph computer the time between ProfileStep should be measured, not > inside it. > The other approach is to get Spark statistics with SparkListener and add > spark stages timings into profiler metrics. that will work only for spark but > will give better representation of step costs. > The simple fix is measuring time between OLAP iterations and add it to the > profiler step. > This will not take into account computer setup time, but will be precise > enough for long running queries. > To reproduce: > tinkerPop 3.2.6 gremlin: > {code} > plugin activated: tinkerpop.server > plugin activated: tinkerpop.utilities > plugin activated: tinkerpop.spark > plugin activated: tinkerpop.tinkergraph > gremlin> graph = > GraphFactory.open('conf/hadoop/hadoop-grateful-gryo.properties') > gremlin> g = graph.traversal().withComputer(SparkGraphComputer) > ==>graphtraversalsource[hadoopgraph[gryoinputformat->gryooutputformat], > sparkgraphcomputer] > gremlin> g.V().out().out().count().profile() > ==>Traversal Metrics > Step Count > Traversers Time (ms)% Dur > = > GraphStep(vertex,[]) 808 >808 2.02518.35 > VertexStep(OUT,vertex) 8049 >562 4.43040.14 > VertexStep(OUT,edge) 327370 > 7551 4.58141.50 > CountGlobalStep1 > 1 0.001 0.01 > >TOTAL - > - 11.038- > gremlin> clock(1){g.V().out().out().count().next() } > ==>3421.92758 > gremlin> > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (TINKERPOP-1801) OLAP profile() step return incorrect timing
[ https://issues.apache.org/jira/browse/TINKERPOP-1801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16212766#comment-16212766 ] ASF GitHub Bot commented on TINKERPOP-1801: --- Github user artem-aliev commented on the issue: https://github.com/apache/tinkerpop/pull/734 The fix add iteration time to the appropriate step. Iteration time is a time between workerIterationStart() and workerIterationEnd() callbacks. So before fix the timing looks like ``` gremlin> g.V().out().out().count().profile() ==>Traversal Metrics Step Count Traversers Time (ms)% Dur = GraphStep(vertex,[])9962 9962 70.87348.95 VertexStep(OUT,vertex) 1012657 3745 37.13225.65 VertexStep(OUT,edge) 2101815 6192 36.75125.39 CountGlobalStep1 1 0.018 0.01 >TOTAL - - 144.775- ``` While query runs 10s seconds. After the fix: ``` gremlin> g.V().out().out().count().profile() ==>Traversal Metrics Step Count Traversers Time (ms)% Dur = GraphStep(vertex,[])9962 9962 14186.80997.43 VertexStep(OUT,vertex) 1012657 3745 340.051 2.34 VertexStep(OUT,edge) 2101815 6192 33.684 0.23 CountGlobalStep1 1 0.004 0.00 >TOTAL - - 14560.549- ``` That shows that most of the time for this OLAP query was spend in the initial iteration (actually star graph creation). There still could be some inconsistencies because 1. No computer specific setup time is measured 2. Spark has a lot of lazy staff, so most of Spark RDD setup are counted as a first step. 3. The algorithm could fail to assign timing to the right step in bas of sophisticated queries. By the way new timing is pretty close to the wall clock time. > OLAP profile() step return incorrect timing > > > Key: TINKERPOP-1801 > URL: https://issues.apache.org/jira/browse/TINKERPOP-1801 > Project: TinkerPop > Issue Type: Bug > Components: hadoop >Affects Versions: 3.3.0, 3.2.6 >Reporter: Artem Aliev > > Graph ProfileStep calculates time of next()/hasNext() calls, expecting > recursion. > But Message passing/RDD joins is used by GraphComputer. > So next() does not recursively call next steps, but message is generated. And > most of the time is taken by message passing (RDD join). > Thus on graph computer the time between ProfileStep should be measured, not > inside it. > The other approach is to get Spark statistics with SparkListener and add > spark stages timings into profiler metrics. that will work only for spark but > will give better representation of step costs. > The simple fix is measuring time between OLAP iterations and add it to the > profiler step. > This will not take into account computer setup time, but will be precise > enough for long running queries. > To reproduce: > tinkerPop 3.2.6 gremlin: > {code} > plugin activated: tinkerpop.server > plugin activated: tinkerpop.utilities > plugin activated: tinkerpop.spark > plugin activated: tinkerpop.tinkergraph > gremlin> graph = > GraphFactory.open('conf/hadoop/hadoop-grateful-gryo.properties') > gremlin> g = graph.traversal().withComputer(SparkGraphComputer) > ==>graphtraversalsource[hadoopgraph[gryoinputformat->gryooutputformat], > sparkgraphcomputer] > gremlin> g.V().out().out().count().profile() > ==>Traversal Metrics > Step Count > Traversers Time (ms)% Dur > = > GraphStep(vertex,[]) 808 >808 2.02518.35 > VertexStep(OUT,vertex) 8049
[jira] [Commented] (TINKERPOP-1801) OLAP profile() step return incorrect timing
[ https://issues.apache.org/jira/browse/TINKERPOP-1801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16209481#comment-16209481 ] ASF GitHub Bot commented on TINKERPOP-1801: --- Github user okram commented on the issue: https://github.com/apache/tinkerpop/pull/734 This is a nice update @artem-aliev because it doesn't change API and it is general for all `GraphComputer` implementations. Great! A couple things please for a solid VOTE. 1. Please update the `CHANGELOG.asciidoc` with the change you made. 2. In this PR discussion, please provide a `CUT/PASTE` of what the new metrics `toString()` looks like so people can judge its merits. Thank you. > OLAP profile() step return incorrect timing > > > Key: TINKERPOP-1801 > URL: https://issues.apache.org/jira/browse/TINKERPOP-1801 > Project: TinkerPop > Issue Type: Bug >Affects Versions: 3.3.0, 3.2.6 >Reporter: Artem Aliev > > Graph ProfileStep calculates time of next()/hasNext() calls, expecting > recursion. > But Message passing/RDD joins is used by GraphComputer. > So next() does not recursively call next steps, but message is generated. And > most of the time is taken by message passing (RDD join). > Thus on graph computer the time between ProfileStep should be measured, not > inside it. > The other approach is to get Spark statistics with SparkListener and add > spark stages timings into profiler metrics. that will work only for spark but > will give better representation of step costs. > The simple fix is measuring time between OLAP iterations and add it to the > profiler step. > This will not take into account computer setup time, but will be precise > enough for long running queries. > To reproduce: > tinkerPop 3.2.6 gremlin: > {code} > plugin activated: tinkerpop.server > plugin activated: tinkerpop.utilities > plugin activated: tinkerpop.spark > plugin activated: tinkerpop.tinkergraph > gremlin> graph = > GraphFactory.open('conf/hadoop/hadoop-grateful-gryo.properties') > gremlin> g = graph.traversal().withComputer(SparkGraphComputer) > ==>graphtraversalsource[hadoopgraph[gryoinputformat->gryooutputformat], > sparkgraphcomputer] > gremlin> g.V().out().out().count().profile() > ==>Traversal Metrics > Step Count > Traversers Time (ms)% Dur > = > GraphStep(vertex,[]) 808 >808 2.02518.35 > VertexStep(OUT,vertex) 8049 >562 4.43040.14 > VertexStep(OUT,edge) 327370 > 7551 4.58141.50 > CountGlobalStep1 > 1 0.001 0.01 > >TOTAL - > - 11.038- > gremlin> clock(1){g.V().out().out().count().next() } > ==>3421.92758 > gremlin> > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (TINKERPOP-1801) OLAP profile() step return incorrect timing
[ https://issues.apache.org/jira/browse/TINKERPOP-1801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16208076#comment-16208076 ] ASF GitHub Bot commented on TINKERPOP-1801: --- Github user artem-aliev closed the pull request at: https://github.com/apache/tinkerpop/pull/733 > OLAP profile() step return incorrect timing > > > Key: TINKERPOP-1801 > URL: https://issues.apache.org/jira/browse/TINKERPOP-1801 > Project: TinkerPop > Issue Type: Bug >Affects Versions: 3.3.0, 3.2.6 >Reporter: Artem Aliev > > Graph ProfileStep calculates time of next()/hasNext() calls, expecting > recursion. > But Message passing/RDD joins is used by GraphComputer. > So next() does not recursively call next steps, but message is generated. And > most of the time is taken by message passing (RDD join). > Thus on graph computer the time between ProfileStep should be measured, not > inside it. > The other approach is to get Spark statistics with SparkListener and add > spark stages timings into profiler metrics. that will work only for spark but > will give better representation of step costs. > The simple fix is measuring time between OLAP iterations and add it to the > profiler step. > This will not take into account computer setup time, but will be precise > enough for long running queries. > To reproduce: > tinkerPop 3.2.6 gremlin: > {code} > plugin activated: tinkerpop.server > plugin activated: tinkerpop.utilities > plugin activated: tinkerpop.spark > plugin activated: tinkerpop.tinkergraph > gremlin> graph = > GraphFactory.open('conf/hadoop/hadoop-grateful-gryo.properties') > gremlin> g = graph.traversal().withComputer(SparkGraphComputer) > ==>graphtraversalsource[hadoopgraph[gryoinputformat->gryooutputformat], > sparkgraphcomputer] > gremlin> g.V().out().out().count().profile() > ==>Traversal Metrics > Step Count > Traversers Time (ms)% Dur > = > GraphStep(vertex,[]) 808 >808 2.02518.35 > VertexStep(OUT,vertex) 8049 >562 4.43040.14 > VertexStep(OUT,edge) 327370 > 7551 4.58141.50 > CountGlobalStep1 > 1 0.001 0.01 > >TOTAL - > - 11.038- > gremlin> clock(1){g.V().out().out().count().next() } > ==>3421.92758 > gremlin> > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (TINKERPOP-1801) OLAP profile() step return incorrect timing
[ https://issues.apache.org/jira/browse/TINKERPOP-1801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16208077#comment-16208077 ] ASF GitHub Bot commented on TINKERPOP-1801: --- Github user artem-aliev commented on the issue: https://github.com/apache/tinkerpop/pull/733 See #734 > OLAP profile() step return incorrect timing > > > Key: TINKERPOP-1801 > URL: https://issues.apache.org/jira/browse/TINKERPOP-1801 > Project: TinkerPop > Issue Type: Bug >Affects Versions: 3.3.0, 3.2.6 >Reporter: Artem Aliev > > Graph ProfileStep calculates time of next()/hasNext() calls, expecting > recursion. > But Message passing/RDD joins is used by GraphComputer. > So next() does not recursively call next steps, but message is generated. And > most of the time is taken by message passing (RDD join). > Thus on graph computer the time between ProfileStep should be measured, not > inside it. > The other approach is to get Spark statistics with SparkListener and add > spark stages timings into profiler metrics. that will work only for spark but > will give better representation of step costs. > The simple fix is measuring time between OLAP iterations and add it to the > profiler step. > This will not take into account computer setup time, but will be precise > enough for long running queries. > To reproduce: > tinkerPop 3.2.6 gremlin: > {code} > plugin activated: tinkerpop.server > plugin activated: tinkerpop.utilities > plugin activated: tinkerpop.spark > plugin activated: tinkerpop.tinkergraph > gremlin> graph = > GraphFactory.open('conf/hadoop/hadoop-grateful-gryo.properties') > gremlin> g = graph.traversal().withComputer(SparkGraphComputer) > ==>graphtraversalsource[hadoopgraph[gryoinputformat->gryooutputformat], > sparkgraphcomputer] > gremlin> g.V().out().out().count().profile() > ==>Traversal Metrics > Step Count > Traversers Time (ms)% Dur > = > GraphStep(vertex,[]) 808 >808 2.02518.35 > VertexStep(OUT,vertex) 8049 >562 4.43040.14 > VertexStep(OUT,edge) 327370 > 7551 4.58141.50 > CountGlobalStep1 > 1 0.001 0.01 > >TOTAL - > - 11.038- > gremlin> clock(1){g.V().out().out().count().next() } > ==>3421.92758 > gremlin> > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (TINKERPOP-1801) OLAP profile() step return incorrect timing
[ https://issues.apache.org/jira/browse/TINKERPOP-1801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16208075#comment-16208075 ] ASF GitHub Bot commented on TINKERPOP-1801: --- Github user artem-aliev commented on the issue: https://github.com/apache/tinkerpop/pull/733 that should be against tp32 branch > OLAP profile() step return incorrect timing > > > Key: TINKERPOP-1801 > URL: https://issues.apache.org/jira/browse/TINKERPOP-1801 > Project: TinkerPop > Issue Type: Bug >Affects Versions: 3.3.0, 3.2.6 >Reporter: Artem Aliev > > Graph ProfileStep calculates time of next()/hasNext() calls, expecting > recursion. > But Message passing/RDD joins is used by GraphComputer. > So next() does not recursively call next steps, but message is generated. And > most of the time is taken by message passing (RDD join). > Thus on graph computer the time between ProfileStep should be measured, not > inside it. > The other approach is to get Spark statistics with SparkListener and add > spark stages timings into profiler metrics. that will work only for spark but > will give better representation of step costs. > The simple fix is measuring time between OLAP iterations and add it to the > profiler step. > This will not take into account computer setup time, but will be precise > enough for long running queries. > To reproduce: > tinkerPop 3.2.6 gremlin: > {code} > plugin activated: tinkerpop.server > plugin activated: tinkerpop.utilities > plugin activated: tinkerpop.spark > plugin activated: tinkerpop.tinkergraph > gremlin> graph = > GraphFactory.open('conf/hadoop/hadoop-grateful-gryo.properties') > gremlin> g = graph.traversal().withComputer(SparkGraphComputer) > ==>graphtraversalsource[hadoopgraph[gryoinputformat->gryooutputformat], > sparkgraphcomputer] > gremlin> g.V().out().out().count().profile() > ==>Traversal Metrics > Step Count > Traversers Time (ms)% Dur > = > GraphStep(vertex,[]) 808 >808 2.02518.35 > VertexStep(OUT,vertex) 8049 >562 4.43040.14 > VertexStep(OUT,edge) 327370 > 7551 4.58141.50 > CountGlobalStep1 > 1 0.001 0.01 > >TOTAL - > - 11.038- > gremlin> clock(1){g.V().out().out().count().next() } > ==>3421.92758 > gremlin> > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (TINKERPOP-1801) OLAP profile() step return incorrect timing
[ https://issues.apache.org/jira/browse/TINKERPOP-1801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16208074#comment-16208074 ] ASF GitHub Bot commented on TINKERPOP-1801: --- GitHub user artem-aliev opened a pull request: https://github.com/apache/tinkerpop/pull/734 TINKERPOP-1801: fix profile() timing in OLAP by adding worker iterati… …on timings to step metrics this is a simple fix that do not change any API You can merge this pull request into a Git repository by running: $ git pull https://github.com/artem-aliev/tinkerpop TINKERPOP-1801 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/tinkerpop/pull/734.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #734 commit 827ea9cfd57202612518e5e6bcff18f601dd2018 Author: artemalievDate: 2017-10-17T18:00:31Z TINKERPOP-1801: fix profile() timing in OLAP by adding worker iteration timings to step metrics this is a simple fix that do not change any API > OLAP profile() step return incorrect timing > > > Key: TINKERPOP-1801 > URL: https://issues.apache.org/jira/browse/TINKERPOP-1801 > Project: TinkerPop > Issue Type: Bug >Affects Versions: 3.3.0, 3.2.6 >Reporter: Artem Aliev > > Graph ProfileStep calculates time of next()/hasNext() calls, expecting > recursion. > But Message passing/RDD joins is used by GraphComputer. > So next() does not recursively call next steps, but message is generated. And > most of the time is taken by message passing (RDD join). > Thus on graph computer the time between ProfileStep should be measured, not > inside it. > The other approach is to get Spark statistics with SparkListener and add > spark stages timings into profiler metrics. that will work only for spark but > will give better representation of step costs. > The simple fix is measuring time between OLAP iterations and add it to the > profiler step. > This will not take into account computer setup time, but will be precise > enough for long running queries. > To reproduce: > tinkerPop 3.2.6 gremlin: > {code} > plugin activated: tinkerpop.server > plugin activated: tinkerpop.utilities > plugin activated: tinkerpop.spark > plugin activated: tinkerpop.tinkergraph > gremlin> graph = > GraphFactory.open('conf/hadoop/hadoop-grateful-gryo.properties') > gremlin> g = graph.traversal().withComputer(SparkGraphComputer) > ==>graphtraversalsource[hadoopgraph[gryoinputformat->gryooutputformat], > sparkgraphcomputer] > gremlin> g.V().out().out().count().profile() > ==>Traversal Metrics > Step Count > Traversers Time (ms)% Dur > = > GraphStep(vertex,[]) 808 >808 2.02518.35 > VertexStep(OUT,vertex) 8049 >562 4.43040.14 > VertexStep(OUT,edge) 327370 > 7551 4.58141.50 > CountGlobalStep1 > 1 0.001 0.01 > >TOTAL - > - 11.038- > gremlin> clock(1){g.V().out().out().count().next() } > ==>3421.92758 > gremlin> > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (TINKERPOP-1801) OLAP profile() step return incorrect timing
[ https://issues.apache.org/jira/browse/TINKERPOP-1801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16208065#comment-16208065 ] Artem Aliev commented on TINKERPOP-1801: That is simple way to fix it, without new API. Let's discuss better apporaches, I did not add new tests, I find set of them in a test suite. I slide have mine one but it is unstable because of timings. > OLAP profile() step return incorrect timing > > > Key: TINKERPOP-1801 > URL: https://issues.apache.org/jira/browse/TINKERPOP-1801 > Project: TinkerPop > Issue Type: Bug >Affects Versions: 3.3.0, 3.2.6 >Reporter: Artem Aliev > > Graph ProfileStep calculates time of next()/hasNext() calls, expecting > recursion. > But Message passing/RDD joins is used by GraphComputer. > So next() does not recursively call next steps, but message is generated. And > most of the time is taken by message passing (RDD join). > Thus on graph computer the time between ProfileStep should be measured, not > inside it. > The other approach is to get Spark statistics with SparkListener and add > spark stages timings into profiler metrics. that will work only for spark but > will give better representation of step costs. > The simple fix is measuring time between OLAP iterations and add it to the > profiler step. > This will not take into account computer setup time, but will be precise > enough for long running queries. > To reproduce: > tinkerPop 3.2.6 gremlin: > {code} > plugin activated: tinkerpop.server > plugin activated: tinkerpop.utilities > plugin activated: tinkerpop.spark > plugin activated: tinkerpop.tinkergraph > gremlin> graph = > GraphFactory.open('conf/hadoop/hadoop-grateful-gryo.properties') > gremlin> g = graph.traversal().withComputer(SparkGraphComputer) > ==>graphtraversalsource[hadoopgraph[gryoinputformat->gryooutputformat], > sparkgraphcomputer] > gremlin> g.V().out().out().count().profile() > ==>Traversal Metrics > Step Count > Traversers Time (ms)% Dur > = > GraphStep(vertex,[]) 808 >808 2.02518.35 > VertexStep(OUT,vertex) 8049 >562 4.43040.14 > VertexStep(OUT,edge) 327370 > 7551 4.58141.50 > CountGlobalStep1 > 1 0.001 0.01 > >TOTAL - > - 11.038- > gremlin> clock(1){g.V().out().out().count().next() } > ==>3421.92758 > gremlin> > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (TINKERPOP-1801) OLAP profile() step return incorrect timing
[ https://issues.apache.org/jira/browse/TINKERPOP-1801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16208050#comment-16208050 ] ASF GitHub Bot commented on TINKERPOP-1801: --- GitHub user artem-aliev opened a pull request: https://github.com/apache/tinkerpop/pull/733 TINKERPOP-1801: fix profile() timing in OLAP by adding worker iterati… …on timings to step metrics this is a simple fix that do not change any API You can merge this pull request into a Git repository by running: $ git pull https://github.com/artem-aliev/tinkerpop TINKERPOP-1801 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/tinkerpop/pull/733.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #733 commit 827ea9cfd57202612518e5e6bcff18f601dd2018 Author: artemalievDate: 2017-10-17T18:00:31Z TINKERPOP-1801: fix profile() timing in OLAP by adding worker iteration timings to step metrics this is a simple fix that do not change any API > OLAP profile() step return incorrect timing > > > Key: TINKERPOP-1801 > URL: https://issues.apache.org/jira/browse/TINKERPOP-1801 > Project: TinkerPop > Issue Type: Bug >Affects Versions: 3.3.0, 3.2.6 >Reporter: Artem Aliev > > Graph ProfileStep calculates time of next()/hasNext() calls, expecting > recursion. > But Message passing/RDD joins is used by GraphComputer. > So next() does not recursively call next steps, but message is generated. And > most of the time is taken by message passing (RDD join). > Thus on graph computer the time between ProfileStep should be measured, not > inside it. > The other approach is to get Spark statistics with SparkListener and add > spark stages timings into profiler metrics. that will work only for spark but > will give better representation of step costs. > The simple fix is measuring time between OLAP iterations and add it to the > profiler step. > This will not take into account computer setup time, but will be precise > enough for long running queries. > To reproduce: > tinkerPop 3.2.6 gremlin: > {code} > plugin activated: tinkerpop.server > plugin activated: tinkerpop.utilities > plugin activated: tinkerpop.spark > plugin activated: tinkerpop.tinkergraph > gremlin> graph = > GraphFactory.open('conf/hadoop/hadoop-grateful-gryo.properties') > gremlin> g = graph.traversal().withComputer(SparkGraphComputer) > ==>graphtraversalsource[hadoopgraph[gryoinputformat->gryooutputformat], > sparkgraphcomputer] > gremlin> g.V().out().out().count().profile() > ==>Traversal Metrics > Step Count > Traversers Time (ms)% Dur > = > GraphStep(vertex,[]) 808 >808 2.02518.35 > VertexStep(OUT,vertex) 8049 >562 4.43040.14 > VertexStep(OUT,edge) 327370 > 7551 4.58141.50 > CountGlobalStep1 > 1 0.001 0.01 > >TOTAL - > - 11.038- > gremlin> clock(1){g.V().out().out().count().next() } > ==>3421.92758 > gremlin> > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)