[jira] [Commented] (TINKERPOP-1800) Remote connect issue
[ https://issues.apache.org/jira/browse/TINKERPOP-1800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16210480#comment-16210480 ]

Loveneet kumar commented on TINKERPOP-1800:
---
Thank you... it was my mistake. I will read the documentation carefully next time.

> Remote connect issue
>
> Key: TINKERPOP-1800
> URL: https://issues.apache.org/jira/browse/TINKERPOP-1800
> Project: TinkerPop
> Issue Type: Bug
> Components: groovy
> Affects Versions: 3.3.0
> Environment: WINDOWS 10
> Reporter: Loveneet kumar
>
> In a Windows 10 environment I am facing this issue:
> {color:#59afe1}gremlin> remote connect tinkerpop.server conf/remote.yaml
> groovysh_parse: 2: expecting EOF, found 'conf' @ line 2, column 33.
> remote connect tinkerpop.server conf/remote.yaml
>                                 ^
> 1 error
> Type ':help' or ':h' for help.
> Display stack trace? [yN]
> gremlin> :d
> Buffer is empty
> gremlin> :{color}
> After changing the slash direction:
> {color:#59afe1}gremlin> remote connect tinkerpop.server conf\remote.yaml
> groovysh_parse: 2: unexpected char: '\' @ line 2, column 37.
> remote connect tinkerpop.server conf\remote.yaml{color}
>                                     ^
> 1 error

-- This message was sent by Atlassian JIRA (v6.4.14#64029)
[GitHub] tinkerpop pull request #735: TINKERPOP-1803: inject() doesn't re-attach with...
GitHub user okram opened a pull request:

https://github.com/apache/tinkerpop/pull/735

TINKERPOP-1803: inject() doesn't re-attach with remote traversals

https://issues.apache.org/jira/browse/TINKERPOP-1803

Fixed an "attachment" bug in `InjectStep` with a solution generalized to `StartStep`.

VOTE +1

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/apache/tinkerpop TINKERPOP-1803

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/tinkerpop/pull/735.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

This closes #735

commit 79b621c9a0ddc2d96f951c54ee3f1db3c8490d4c
Author: Marko A. Rodriguez
Date: 2017-10-18T22:45:16Z

Fixed an attachement-bug in `InjectStep` with a solution generalized to `StartStep`.

---
[jira] [Commented] (TINKERPOP-1803) inject() doesn't re-attach with remote traversals
[ https://issues.apache.org/jira/browse/TINKERPOP-1803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16210246#comment-16210246 ]

ASF GitHub Bot commented on TINKERPOP-1803:
---
GitHub user okram opened a pull request: https://github.com/apache/tinkerpop/pull/735
TINKERPOP-1803: inject() doesn't re-attach with remote traversals

> inject() doesn't re-attach with remote traversals
>
> Key: TINKERPOP-1803
> URL: https://issues.apache.org/jira/browse/TINKERPOP-1803
> Project: TinkerPop
> Issue Type: Bug
> Components: process
> Affects Versions: 3.2.6
> Reporter: stephen mallette
> Assignee: Marko A. Rodriguez
> Priority: Critical
>
> In the console we get this:
> {code}
> gremlin> v2 = g.V(2).next()
> ==>v[2]
> gremlin> g.V(1).out().inject(v2).values("name")
> ==>vadas
> ==>lop
> ==>vadas
> ==>josh
> {code}
> From gremlin-python we can see:
> {code}
> >>> v2 = g.V(2).next()
> >>> g.V(1).out().inject(v2).values("name").toList()
> [u'lop', u'vadas', u'josh']
> {code}
> and using {{withRemote()}} in Java:
> {code}
> gremlin> v2 = g.V(2).next()
> ==>v[2]
> gremlin> g.V(1).out().inject(v2).values("name")
> ==>lop
> ==>vadas
> ==>josh
> {code}
> Since {{inject()}} doesn't re-attach the vertex, when {{values()}} gets called
> it acts on a reference vertex with no properties and returns nothing.

-- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Assigned] (TINKERPOP-1803) inject() doesn't re-attach with remote traversals
[ https://issues.apache.org/jira/browse/TINKERPOP-1803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Marko A. Rodriguez reassigned TINKERPOP-1803:
---
Assignee: Marko A. Rodriguez

-- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Closed] (TINKERPOP-1797) LambdaRestrictionStrategy and LambdaMapStep in `by()`-modulation.
[ https://issues.apache.org/jira/browse/TINKERPOP-1797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Marko A. Rodriguez closed TINKERPOP-1797.
---
Resolution: Fixed
Fix Version/s: 3.3.1
               3.2.7

> LambdaRestrictionStrategy and LambdaMapStep in `by()`-modulation.
>
> Key: TINKERPOP-1797
> URL: https://issues.apache.org/jira/browse/TINKERPOP-1797
> Project: TinkerPop
> Issue Type: Bug
> Components: process
> Affects Versions: 3.2.6
> Reporter: Marko A. Rodriguez
> Assignee: Marko A. Rodriguez
> Fix For: 3.2.7, 3.3.1
>
> {code}
> gremlin> g.V().groupCount().by(label).order(local).by(values)
> The provided step contains a lambda comparator: OrderLocalStep([[[LambdaMapStep(values)@[~gremlin.incidentToAdjacent, ~gremlin.pathRetraction]], incr]])
> Type ':help' or ':h' for help.
> Display stack trace? [yN]
> {code}

-- This message was sent by Atlassian JIRA (v6.4.14#64029)
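The traversal in this report counts vertices per label and then orders the resulting map entries by their values. The pure-JDK sketch below shows what `groupCount().by(label).order(local).by(values)` is meant to compute; it does not use TinkerPop. `ColumnSketch` is a hypothetical stand-in that only mirrors the idea that `values` is a named function token (an enum constant implementing `Function`) rather than an opaque user lambda, which is presumably why such a `by()` should not trip `LambdaRestrictionStrategy`. The `isNamedToken` check is likewise an assumption for illustration, not the strategy's actual logic.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical named function token standing in for `values`: an enum constant that
// implements Function, as opposed to an anonymous user-supplied lambda.
enum ColumnSketch implements Function<Map.Entry<String, Long>, Long> {
    values;

    @Override
    public Long apply(Map.Entry<String, Long> entry) {
        return entry.getValue();
    }
}

class GroupCountOrderSketch {
    // groupCount().by(label): count occurrences of each label, keeping first-seen order.
    static Map<String, Long> groupCount(List<String> labels) {
        Map<String, Long> counts = new LinkedHashMap<>();
        for (String label : labels) {
            counts.merge(label, 1L, Long::sum);
        }
        return counts;
    }

    // order(local).by(values): sort the map entries in ascending order of the extracted value.
    static List<Map.Entry<String, Long>> orderLocalBy(Map<String, Long> map,
                                                      Function<Map.Entry<String, Long>, Long> by) {
        List<Map.Entry<String, Long>> entries = new ArrayList<>(map.entrySet());
        entries.sort(Comparator.comparing(by));
        return entries;
    }

    // Hypothetical check: an enum constant is a known, named function, not an anonymous lambda.
    static boolean isNamedToken(Object function) {
        return function instanceof Enum;
    }

    public static void main(String[] args) {
        // Labels of the six vertices in the "modern" toy graph (four person, two software).
        List<String> labels = List.of("person", "person", "software", "person", "software", "person");
        System.out.println(orderLocalBy(groupCount(labels), ColumnSketch.values)); // [software=2, person=4]
        System.out.println(isNamedToken(ColumnSketch.values));                     // true
        System.out.println(isNamedToken((Function<Object, Object>) x -> x));       // false
    }
}
```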
[jira] [Commented] (TINKERPOP-1797) LambdaRestrictionStrategy and LambdaMapStep in `by()`-modulation.
[ https://issues.apache.org/jira/browse/TINKERPOP-1797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16209852#comment-16209852 ]

ASF GitHub Bot commented on TINKERPOP-1797:
---
Github user asfgit closed the pull request at:

https://github.com/apache/tinkerpop/pull/730

-- This message was sent by Atlassian JIRA (v6.4.14#64029)
[GitHub] tinkerpop pull request #730: TINKERPOP-1797: LambdaRestrictionStrategy and L...
Github user asfgit closed the pull request at: https://github.com/apache/tinkerpop/pull/730 ---
[jira] [Commented] (TINKERPOP-1797) LambdaRestrictionStrategy and LambdaMapStep in `by()`-modulation.
[ https://issues.apache.org/jira/browse/TINKERPOP-1797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16209845#comment-16209845 ]

ASF GitHub Bot commented on TINKERPOP-1797:
---
Github user spmallette commented on the issue:

https://github.com/apache/tinkerpop/pull/730

All tests pass with `docker/build.sh -t -n -i`

VOTE +1

-- This message was sent by Atlassian JIRA (v6.4.14#64029)
[GitHub] tinkerpop issue #730: TINKERPOP-1797: LambdaRestrictionStrategy and LambdaMa...
Github user spmallette commented on the issue: https://github.com/apache/tinkerpop/pull/730 All tests pass with `docker/build.sh -t -n -i` VOTE +1 ---
[jira] [Created] (TINKERPOP-1803) inject() doesn't re-attach with remote traversals
stephen mallette created TINKERPOP-1803:
---

Summary: inject() doesn't re-attach with remote traversals
Key: TINKERPOP-1803
URL: https://issues.apache.org/jira/browse/TINKERPOP-1803
Project: TinkerPop
Issue Type: Bug
Components: process
Affects Versions: 3.2.6
Reporter: stephen mallette
Priority: Critical

In the console we get this:
{code}
gremlin> v2 = g.V(2).next()
==>v[2]
gremlin> g.V(1).out().inject(v2).values("name")
==>vadas
==>lop
==>vadas
==>josh
{code}
From gremlin-python we can see:
{code}
>>> v2 = g.V(2).next()
>>> g.V(1).out().inject(v2).values("name").toList()
[u'lop', u'vadas', u'josh']
{code}
and using {{withRemote()}} in Java:
{code}
gremlin> v2 = g.V(2).next()
==>v[2]
gremlin> g.V(1).out().inject(v2).values("name")
==>lop
==>vadas
==>josh
{code}
Since {{inject()}} doesn't re-attach the vertex, when {{values()}} gets called it acts on a reference vertex with no properties and returns nothing.

-- This message was sent by Atlassian JIRA (v6.4.14#64029)
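The failure mode described above can be modeled without TinkerPop: a remote client only holds a reference (id, no properties), so `values("name")` on the injected object yields nothing unless the reference is first resolved against the graph. The sketch below is a hypothetical in-memory model; `SimpleVertex`, `reference`, `attach` and `injectThenOut` are illustrative names, not TinkerPop classes or methods, and the hard-coded `out()` result simply reproduces the order seen in the console output.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class InjectReattachSketch {
    // Hypothetical minimal vertex: an id plus a property map (not TinkerPop's Vertex).
    static final class SimpleVertex {
        final int id;
        final Map<String, Object> properties;

        SimpleVertex(int id, Map<String, Object> properties) {
            this.id = id;
            this.properties = properties;
        }
    }

    // Server-side graph: v[2] plus the three vertices that g.V(1).out() reaches.
    static final Map<Integer, SimpleVertex> GRAPH = Map.of(
            2, new SimpleVertex(2, Map.of("name", "vadas")),
            3, new SimpleVertex(3, Map.of("name", "lop")),
            4, new SimpleVertex(4, Map.of("name", "josh")));

    // What a remote client holds after v2 = g.V(2).next(): the id, but no properties.
    static SimpleVertex reference(int id) {
        return new SimpleVertex(id, Map.of());
    }

    // Re-attachment: resolve the reference's id against the graph to recover the full vertex.
    static SimpleVertex attach(SimpleVertex vertex) {
        return GRAPH.getOrDefault(vertex.id, vertex);
    }

    // g.V(1).out().inject(v2): the injected object comes first, then lop, vadas, josh.
    static List<SimpleVertex> injectThenOut(SimpleVertex injected, boolean reattach) {
        List<SimpleVertex> traversers = new ArrayList<>();
        traversers.add(reattach ? attach(injected) : injected);
        traversers.add(GRAPH.get(3));
        traversers.add(GRAPH.get(2));
        traversers.add(GRAPH.get(4));
        return traversers;
    }

    // values(key): emit the property value of every traverser that has it; the others emit nothing.
    static List<Object> values(List<SimpleVertex> traversers, String key) {
        List<Object> result = new ArrayList<>();
        for (SimpleVertex v : traversers) {
            Object value = v.properties.get(key);
            if (value != null) {
                result.add(value);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        SimpleVertex v2 = reference(2);
        System.out.println(values(injectThenOut(v2, false), "name")); // [lop, vadas, josh]
        System.out.println(values(injectThenOut(v2, true), "name"));  // [vadas, lop, vadas, josh]
    }
}
```

The first line matches the gremlin-python and `withRemote()` output, the second the embedded console output.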
[jira] [Created] (TINKERPOP-1802) hasId() fails for empty collections
Daniel Kuppitz created TINKERPOP-1802:
---

Summary: hasId() fails for empty collections
Key: TINKERPOP-1802
URL: https://issues.apache.org/jira/browse/TINKERPOP-1802
Project: TinkerPop
Issue Type: Bug
Components: process
Affects Versions: 3.2.6, 3.3.0
Reporter: Daniel Kuppitz
Assignee: Daniel Kuppitz

{noformat}
gremlin> g.V().hasId(within([]))
0
Type ':help' or ':h' for help.
Display stack trace? [yN]
{noformat}

-- This message was sent by Atlassian JIRA (v6.4.14#64029)
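Presumably the expected behavior is that `within` over an empty collection matches no id at all, so `hasId(within([]))` should simply produce an empty result instead of failing. A minimal pure-JDK sketch of those semantics follows; it models ids as plain integers and `hasId` as a stream filter, and does not touch TinkerPop's own `P` or `HasContainer` classes.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;
import java.util.stream.Collectors;

class HasIdWithinSketch {
    // within(ids): true when the tested id is a member of the collection.
    // An empty collection has no members, so the predicate is false for every id.
    static Predicate<Object> within(Collection<?> ids) {
        Set<Object> members = new HashSet<>(ids);
        return members::contains;
    }

    // hasId(predicate) modeled as a plain filter over vertex ids.
    static List<Integer> hasId(List<Integer> vertexIds, Predicate<Object> predicate) {
        return vertexIds.stream().filter(predicate).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> ids = List.of(1, 2, 3, 4, 5, 6);
        System.out.println(hasId(ids, within(List.of(1, 4)))); // [1, 4]
        System.out.println(hasId(ids, within(List.of())));     // []
    }
}
```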
[jira] [Commented] (TINKERPOP-1752) Gremlin.Net: Generate completely type-safe methods
[ https://issues.apache.org/jira/browse/TINKERPOP-1752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16209601#comment-16209601 ]

ASF GitHub Bot commented on TINKERPOP-1752:
---
Github user FlorianHockmann commented on the issue:

https://github.com/apache/tinkerpop/pull/712

I finally found some time to work on this again and fixed the issues mentioned by @jorgebay. However, I left the `Bindings` implementation unchanged despite the problems with concurrent access, as it still seems to be the best solution. (I'm of course open to suggestions on how this can be improved.)

I also noticed that my changes broke the `WithoutStrategies` source step, as it now correctly expects to get the `Types` of the Strategies to exclude, and Gremlin.Net had no serializer for `Types`. So I added a serializer that works much like the corresponding one in gremlin-python: it simply creates an instance of the `Type`, and that object is then serialized as before. A unit test ensures that all Strategies have a parameterless constructor, as we can't serialize their `Type` otherwise.

Honestly, I was a bit surprised that I had to serialize `Types` by serializing an empty object of that `Type`, although the IO docs show that the GraphSON type `g:Class` [can be serialized like this](http://tinkerpop.apache.org/docs/3.3.0/dev/io/#_class):

```json
{
  "@type" : "g:Class",
  "@value" : "java.io.File"
}
```

but the Gremlin Server couldn't deserialize the Strategy class when I serialized it like this. So is the documentation wrong here? @spmallette: Could you clear up my confusion here? Also, the IO docs don't mention how `TraversalStrategies` are serialized in general. Should we add that?

The build is currently failing, but that seems to be caused by travis-ci/travis-ci#8607. I built Gremlin.Net locally and executed the tests without any problems.

BTW: Would it make sense to create a separate pull request for `master`, or can we simply execute `generate.groovy` later when this is merged from `tp32` into `master`?

> Gremlin.Net: Generate completely type-safe methods
>
> Key: TINKERPOP-1752
> URL: https://issues.apache.org/jira/browse/TINKERPOP-1752
> Project: TinkerPop
> Issue Type: Improvement
> Components: dotnet
> Affects Versions: 3.2.5
> Reporter: Florian Hockmann
> Priority: Minor
>
> Currently the generated traversal methods in Gremlin.Net take {{params object[] args}} as an argument, which allows the user to provide an arbitrary
> number of arguments of any type. While this makes the generation rather simple, it doesn't tell the user which arguments are actually valid, so users
> can submit completely invalid traversals like:
> {code}
> g.V(1).AddE(1234, "invalidArgument2").Next()
> {code}
> Type-safe methods could also use the original argument names to tell users something about what kind of values the methods expect. Consider, for
> example, the following method signatures for the C# step {{AddE}}, which are basically a 1:1 representation of the original Java {{addE}} step:
> {code}
> public GraphTraversal<S, Edge> AddE(Direction direction, string firstVertexKeyOrEdgeLabel, string edgeLabelOrSecondVertexKey, params object[] propertyKeyValues);
> public GraphTraversal<S, Edge> AddE(string edgeLabel);
> {code}
> Implementing this should make TINKERPOP-1725 obsolete and also resolve TINKERPOP-1751.

-- This message was sent by Atlassian JIRA (v6.4.14#64029)
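For reference, the documented `g:Class` shape quoted above is just the fully qualified class name wrapped in a typed GraphSON value. The hypothetical helper below only illustrates producing that shape from a Java `Class`; it does not reproduce Gremlin Server's deserializer and does not settle whether the server accepts this form for strategies (the comment reports that it did not).

```java
class GraphSONClassSketch {
    // Builds the g:Class shape shown in the IO docs: the value is the fully qualified class name.
    // Plain string concatenation is only acceptable here because typical Java class names need
    // no JSON escaping; a real serializer should use a JSON library.
    static String toGraphSONClass(Class<?> type) {
        return "{\"@type\":\"g:Class\",\"@value\":\"" + type.getName() + "\"}";
    }

    public static void main(String[] args) {
        // prints {"@type":"g:Class","@value":"java.io.File"}
        System.out.println(toGraphSONClass(java.io.File.class));
    }
}
```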
[GitHub] tinkerpop issue #712: TINKERPOP-1752: Gremlin.Net: Generate completely type-...
Github user FlorianHockmann commented on the issue:

https://github.com/apache/tinkerpop/pull/712

I finally found some time to work on this again and fixed the issues mentioned by @jorgebay. However, I left the `Bindings` implementation unchanged despite the problems with concurrent access, as it still seems to be the best solution. (I'm of course open to suggestions on how this can be improved.)

I also noticed that my changes broke the `WithoutStrategies` source step, as it now correctly expects to get the `Types` of the Strategies to exclude, and Gremlin.Net had no serializer for `Types`. So I added a serializer that works much like the corresponding one in gremlin-python: it simply creates an instance of the `Type`, and that object is then serialized as before. A unit test ensures that all Strategies have a parameterless constructor, as we can't serialize their `Type` otherwise.

Honestly, I was a bit surprised that I had to serialize `Types` by serializing an empty object of that `Type`, although the IO docs show that the GraphSON type `g:Class` [can be serialized like this](http://tinkerpop.apache.org/docs/3.3.0/dev/io/#_class):

```json
{
  "@type" : "g:Class",
  "@value" : "java.io.File"
}
```

but the Gremlin Server couldn't deserialize the Strategy class when I serialized it like this. So is the documentation wrong here? @spmallette: Could you clear up my confusion here? Also, the IO docs don't mention how `TraversalStrategies` are serialized in general. Should we add that?

The build is currently failing, but that seems to be caused by travis-ci/travis-ci#8607. I built Gremlin.Net locally and executed the tests without any problems.

BTW: Would it make sense to create a separate pull request for `master`, or can we simply execute `generate.groovy` later when this is merged from `tp32` into `master`?

---
[jira] [Updated] (TINKERPOP-1801) OLAP profile() step return incorrect timing
[ https://issues.apache.org/jira/browse/TINKERPOP-1801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stephen mallette updated TINKERPOP-1801:
---
Component/s: hadoop

> OLAP profile() step return incorrect timing
>
> Key: TINKERPOP-1801
> URL: https://issues.apache.org/jira/browse/TINKERPOP-1801
> Project: TinkerPop
> Issue Type: Bug
> Components: hadoop
> Affects Versions: 3.3.0, 3.2.6
> Reporter: Artem Aliev
>
> ProfileStep measures the time spent in next()/hasNext() calls, assuming that these calls recurse into the following steps.
> A GraphComputer, however, uses message passing (RDD joins in Spark). There, next() does not recursively call the next steps;
> instead a message is generated, and most of the time is spent in message passing (the RDD join).
> Thus, on a GraphComputer the time between ProfileSteps should be measured, not the time inside them.
> The other approach is to collect Spark statistics with a SparkListener and add the Spark stage timings to the profiler metrics.
> That will work only for Spark, but will give a better representation of step costs.
> The simple fix is to measure the time between OLAP iterations and add it to the profiler step.
> This will not take computer setup time into account, but will be precise enough for long-running queries.
> To reproduce:
> TinkerPop 3.2.6 gremlin:
> {code}
> plugin activated: tinkerpop.server
> plugin activated: tinkerpop.utilities
> plugin activated: tinkerpop.spark
> plugin activated: tinkerpop.tinkergraph
> gremlin> graph = GraphFactory.open('conf/hadoop/hadoop-grateful-gryo.properties')
> gremlin> g = graph.traversal().withComputer(SparkGraphComputer)
> ==>graphtraversalsource[hadoopgraph[gryoinputformat->gryooutputformat], sparkgraphcomputer]
> gremlin> g.V().out().out().count().profile()
> ==>Traversal Metrics
> Step                       Count  Traversers  Time (ms)  % Dur
> ==============================================================
> GraphStep(vertex,[])         808         808      2.025  18.35
> VertexStep(OUT,vertex)      8049         562      4.430  40.14
> VertexStep(OUT,edge)      327370        7551      4.581  41.50
> CountGlobalStep                1           1      0.001   0.01
>                   >TOTAL       -           -     11.038      -
> gremlin> clock(1){g.V().out().out().count().next() }
> ==>3421.92758
> gremlin>
> {code}

-- This message was sent by Atlassian JIRA (v6.4.14#64029)
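The "simple fix" proposed above (charge the wall-clock time between OLAP iteration boundaries to the profiled step, instead of timing `next()`/`hasNext()` inside the step) can be sketched as follows. Everything here is hypothetical: `IterationTimerSketch` is not a TinkerPop class, the clock is injectable only so the example is deterministic, and mapping each iteration to a single step is a simplification of how traversers actually spread across steps in a `GraphComputer`.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.LongSupplier;

class IterationTimerSketch {
    private final LongSupplier clock;                              // e.g. System::nanoTime
    private final Map<String, Long> elapsedByStep = new LinkedHashMap<>();
    private long iterationStart;

    IterationTimerSketch(LongSupplier clock) {
        this.clock = clock;
    }

    // Called when an OLAP iteration begins.
    void startIteration() {
        iterationStart = clock.getAsLong();
    }

    // Called when the iteration ends: the whole span, including message passing,
    // is charged to the step that the iteration was processing.
    void endIteration(String stepId) {
        elapsedByStep.merge(stepId, clock.getAsLong() - iterationStart, Long::sum);
    }

    Map<String, Long> elapsedByStep() {
        return elapsedByStep;
    }

    long total() {
        long sum = 0;
        for (long t : elapsedByStep.values()) {
            sum += t;
        }
        return sum;
    }

    public static void main(String[] args) {
        // Fake clock readings (arbitrary units) standing in for System.nanoTime().
        long[] readings = {0, 120, 120, 470, 470, 930};
        int[] next = {0};
        IterationTimerSketch timer = new IterationTimerSketch(() -> readings[next[0]++]);
        for (String step : new String[] {"GraphStep", "VertexStep(OUT,vertex)", "VertexStep(OUT,edge)"}) {
            timer.startIteration();
            timer.endIteration(step);
        }
        System.out.println(timer.elapsedByStep()); // {GraphStep=120, VertexStep(OUT,vertex)=350, VertexStep(OUT,edge)=460}
        System.out.println(timer.total());         // 930
    }
}
```

As the report notes, a timer like this would still miss computer setup time that falls outside the measured iterations.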
[jira] [Updated] (TINKERPOP-1800) Remote connect issue
[ https://issues.apache.org/jira/browse/TINKERPOP-1800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stephen mallette updated TINKERPOP-1800:
---
Fix Version/s: (was: 3.3.0)

-- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (TINKERPOP-1801) OLAP profile() step return incorrect timing
[ https://issues.apache.org/jira/browse/TINKERPOP-1801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16209481#comment-16209481 ]

ASF GitHub Bot commented on TINKERPOP-1801:
---
Github user okram commented on the issue:

https://github.com/apache/tinkerpop/pull/734

This is a nice update @artem-aliev because it doesn't change the API and it is general for all `GraphComputer` implementations. Great!

A couple of things, please, for a solid VOTE:

1. Please update the `CHANGELOG.asciidoc` with the change you made.
2. In this PR discussion, please provide a `CUT/PASTE` of what the new metrics `toString()` looks like so people can judge its merits.

Thank you.

-- This message was sent by Atlassian JIRA (v6.4.14#64029)
[GitHub] tinkerpop issue #734: TINKERPOP-1801: fix profile() timing in OLAP by adding...
Github user okram commented on the issue:

https://github.com/apache/tinkerpop/pull/734

This is a nice update @artem-aliev because it doesn't change the API and it is general for all `GraphComputer` implementations. Great!

A couple of things, please, for a solid VOTE:

1. Please update the `CHANGELOG.asciidoc` with the change you made.
2. In this PR discussion, please provide a `CUT/PASTE` of what the new metrics `toString()` looks like so people can judge its merits.

Thank you.

---
[jira] [Commented] (TINKERPOP-1786) Recipe and missing manifest items for Spark on Yarn
[ https://issues.apache.org/jira/browse/TINKERPOP-1786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16209465#comment-16209465 ]

ASF GitHub Bot commented on TINKERPOP-1786:
---
Github user vtslab commented on the issue:

https://github.com/apache/tinkerpop/pull/721

I am fine with the PR now. Build server needs a check, though.

> Recipe and missing manifest items for Spark on Yarn
>
> Key: TINKERPOP-1786
> URL: https://issues.apache.org/jira/browse/TINKERPOP-1786
> Project: TinkerPop
> Issue Type: Improvement
> Components: hadoop
> Affects Versions: 3.3.0, 3.1.8, 3.2.6
> Environment: gremlin-console
> Reporter: Marc de Lignie
> Priority: Minor
> Fix For: 3.2.7, 3.3.1
>
> Thorough documentation for running OLAP queries on Spark on Yarn has been missing, keeping some users from getting the benefits
> of this nice feature of the TinkerPop stack and resulting in a significant number of questions on the gremlin users list.

-- This message was sent by Atlassian JIRA (v6.4.14#64029)
[GitHub] tinkerpop issue #721: TINKERPOP-1786 Recipe and missing manifest items for S...
Github user vtslab commented on the issue: https://github.com/apache/tinkerpop/pull/721 I am fine with the PR now. Build server needs a check, though. ---
Re: Notes on TraverserSet and Sqlg optimizations
Yes, the hashCode() and equals() are correct. They are, however, slightly heavier operations than in TinkerGraph, as Sqlg's Element id is a more complex object holding the label and its id.

I should have mentioned that in Sqlg the traverser is always a B_LP_O_P_S_SE_SL_Traverser. As Sqlg returns multiple VertexSteps in one go, I use the path information to reconstruct the JDBC ResultSet from the db. This makes the hashCode() and equals() operations heavier, as they are called on B_LP_O_P_S_SE_SL_Traverser, which calls hashCode() and equals() on Path, and those in turn are non-trivial operations.

Cheers
Pieter

On 17/10/2017 23:58, Marko Rodriguez wrote:

…do your vertices implement hashCode() and equals() “correctly”?

Marko.

On Oct 17, 2017, at 2:40 PM, Stephen Mallette wrote:

> So if I understand correctly the map is only needed for bulking so quite often is not needed.

afaik, it is only used for bulking though it's hard to characterize how often it is used - i suppose it all depends on the types of traversals you write and the nature of the data being traversed.

> A significant difference.

The performance numbers are interesting. You don't get a speedup in sqlg though when bulking would be enacted though - only when bulking would have no effect - correct?

On Fri, Oct 13, 2017 at 3:48 PM, pieter gmail wrote:

Hi,

Doing step optimizations, I am noticing a rather severe performance hit in TraverserSet.

Sqlg does a secondary optimization on steps that it cannot optimize from the GraphStep. Before the secondary optimization, these steps will execute at least one query for each incoming start. The optimization caches the incoming start traversers, and the step is executed for all incoming traversers in one go. This has the effect of changing the semantics into a breadth-first traversal, as opposed to the default depth-first.
So basically, the replaced step's code looks as follows:

    @Override
    protected Traverser.Admin processNextStart() throws NoSuchElementException {
        if (this.first) {
            this.first = false;
            while (this.starts.hasNext()) {
                Traverser.Admin start = this.starts.next();
                this.traversal.addStart(start);
            }
            // ... (remainder of the method not shown)
        }
        // ...
    }

The performance hit is in this.traversal.addStart(start), which ends up putting the start into the TraverserSet's internal LinkedHashMap. So if I understand correctly, the map is only needed for bulking, so quite often it is not needed. Replacing the map with an ArrayList improves the performance drastically.

For the test, the optimization does the following: I replace the TraversalFilterStep with a custom SqlgTraversalFilterStep, which extends a custom SqlgAbstractStep. The custom SqlgAbstractStep in turn replaces the ExpandableStepIterator with a custom SqlgExpandableStepIterator, which is a copy of ExpandableStepIterator except that it replaces the TraverserSet with a

    List traversers = new ArrayList<>();

    @Test
    public void testSqlgTraversalFilterStepPerformance() {
        this.sqlgGraph.tx().normalBatchModeOn();
        int count = 1;
        for (int i = 0; i < count; i++) {
            Vertex a1 = this.sqlgGraph.addVertex(T.label, "A", "name", "a1");
            Vertex b1 = this.sqlgGraph.addVertex(T.label, "B", "name", "b1");
            a1.addEdge("ab", b1);
        }
        this.sqlgGraph.tx().commit();

        StopWatch stopWatch = new StopWatch();
        for (int i = 0; i < 1000; i++) {
            stopWatch.start();
            GraphTraversal<Vertex, Vertex> traversal = this.sqlgGraph.traversal()
                    .V().hasLabel("A")
                    .where(__.out().hasLabel("B"));
            List<Vertex> vertices = traversal.toList();
            Assert.assertEquals(count, vertices.size());
            stopWatch.stop();
            System.out.println(stopWatch.toString());
            stopWatch.reset();
        }
    }

Without the ArrayList optimization the output is:

0:00:12.198
0:00:09.756
0:00:09.435
0:00:14.466
0:00:10.197
0:00:04.937
0:00:02.974
0:00:02.942
0:00:02.977
0:00:03.142
0:00:03.207

With the ArrayList optimization the output is:

0:00:00.334
0:00:00.147
0:00:00.114
0:00:00.100
... time for jit
0:00:00.055
0:00:00.056
0:00:00.054
0:00:00.053
0:00:00.054
0:00:00.055

A significant difference.

For TinkerGraph, this test's optimization is moot, as TraversalFilterStep resets its child traversal for every incoming traverser, leaving the TraverserSet's map empty, so the traversers' equals() method is never called. I'm not sure if there are scenarios where this optimization would be any good for TinkerGraph, but I thought I'd let you know how I am optimizing steps.

A concern is that I am now replacing core steps, which moves Sqlg further away from the reference implementation, making it fragile to changes in TinkerPop and harder to keep up with upstream changes. Perhaps there is a way to make TraverserSet's current behavior configurable?

Cheers
Pieter
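The trade-off discussed in this thread can be sketched without TinkerPop. A `TraverserSet`-style collection backed by a `LinkedHashMap` must hash every incoming traverser and compare it with `equals()` when hashes match, which for path-carrying traversers (as in Sqlg) means walking the whole path; in exchange, equal traversers are merged into one entry with a summed bulk. A plain `ArrayList` skips that work but never bulks. `PathTraverser` is a hypothetical stand-in, not TinkerPop's `Traverser.Admin`, and the sketch illustrates the idea only; it says nothing about the timings measured above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

class TraverserCollectionSketch {
    // Hypothetical traverser: the element it stands on plus the path that led there.
    // Equality and hashing cover the whole path, as with a path-carrying traverser.
    static final class PathTraverser {
        final String element;
        final List<String> path;

        PathTraverser(String element, List<String> path) {
            this.element = element;
            this.path = path;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof PathTraverser)) return false;
            PathTraverser other = (PathTraverser) o;
            return element.equals(other.element) && path.equals(other.path); // walks the path
        }

        @Override
        public int hashCode() {
            return Objects.hash(element, path); // hashes every path entry
        }

        @Override
        public String toString() {
            return element + path;
        }
    }

    // TraverserSet-style add: every start is hashed (and compared with equals() on a hash match);
    // equal traversers collapse into one key whose value is the summed bulk.
    static Map<PathTraverser, Long> addAllBulking(List<PathTraverser> starts) {
        Map<PathTraverser, Long> bulked = new LinkedHashMap<>();
        for (PathTraverser t : starts) {
            bulked.merge(t, 1L, Long::sum);
        }
        return bulked;
    }

    // List-based alternative: no hashing or equality checks, but also no bulking.
    static List<PathTraverser> addAllList(List<PathTraverser> starts) {
        return new ArrayList<>(starts);
    }

    public static void main(String[] args) {
        List<PathTraverser> starts = List.of(
                new PathTraverser("b1", List.of("a1", "b1")),
                new PathTraverser("b2", List.of("a1", "b2")),
                new PathTraverser("b1", List.of("a1", "b1")));
        System.out.println(addAllBulking(starts));     // {b1[a1, b1]=2, b2[a1, b2]=1}
        System.out.println(addAllList(starts).size()); // 3
    }
}
```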
Re: Notes on TraverserSet and Sqlg optimizations
Currently Sqlg's optimization strategies remove bulking, as it does not work with Sqlg's way of accessing the database. Sqlg fetches many VertexSteps in one go, while bulking needs them to be processed one by one. Bulking is still possible, but only by removing Sqlg's strategies from the traversal.

The way I understood bulking, it is only of use for a particular graph shape: graphs with lots of references from the same label back to itself. For the kind of graphs I work on, and hopefully those of most of my users, the graphs are more like trees, where bulking is less useful.

Later I hope to look at bulking and see if it's possible to predict whether a query would be better off with bulking.

Cheers
Pieter

On 17/10/2017 22:40, Stephen Mallette wrote:

> So if I understand correctly the map is only needed for bulking so quite often is not needed.

afaik, it is only used for bulking though it's hard to characterize how often it is used - i suppose it all depends on the types of traversals you write and the nature of the data being traversed.

> A significant difference.

The performance numbers are interesting. You don't get a speedup in sqlg though when bulking would be enacted though - only when bulking would have no effect - correct?
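Pieter's point that bulking pays off mainly for particular graph shapes can be illustrated with a small pure-JDK sketch. Without bulking there is one traverser per path. With bulking, traversers standing on the same vertex collapse into one entry with a count. The adjacency maps below are made-up toy graphs, and `stepUnbulked`/`stepBulked` are hypothetical helpers, not Sqlg or TinkerPop code.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class BulkingShapeSketch {
    // One out() hop without bulking: one traverser per path, duplicates kept.
    static List<Integer> stepUnbulked(List<Integer> traversers, Map<Integer, List<Integer>> out) {
        List<Integer> next = new ArrayList<>();
        for (int v : traversers) {
            next.addAll(out.getOrDefault(v, List.of()));
        }
        return next;
    }

    // One out() hop with bulking: traversers on the same vertex collapse into one entry with a count.
    static Map<Integer, Long> stepBulked(Map<Integer, Long> traversers, Map<Integer, List<Integer>> out) {
        Map<Integer, Long> next = new LinkedHashMap<>();
        for (Map.Entry<Integer, Long> e : traversers.entrySet()) {
            for (int w : out.getOrDefault(e.getKey(), List.of())) {
                next.merge(w, e.getValue(), Long::sum);
            }
        }
        return next;
    }

    // Number of traverser objects alive after `hops` steps from `start`, without bulking.
    static int unbulkedCount(int start, int hops, Map<Integer, List<Integer>> out) {
        List<Integer> traversers = List.of(start);
        for (int i = 0; i < hops; i++) traversers = stepUnbulked(traversers, out);
        return traversers.size();
    }

    // Number of bulked entries alive after `hops` steps from `start`.
    static int bulkedCount(int start, int hops, Map<Integer, List<Integer>> out) {
        Map<Integer, Long> traversers = Map.of(start, 1L);
        for (int i = 0; i < hops; i++) traversers = stepBulked(traversers, out);
        return traversers.size();
    }

    public static void main(String[] args) {
        // Tree: every vertex is reached by exactly one path, so bulking merges nothing.
        Map<Integer, List<Integer>> tree = Map.of(0, List.of(1, 2), 1, List.of(3, 4), 2, List.of(5, 6));
        // Dense: vertices 1, 2 and 3 all point to the same two vertices.
        Map<Integer, List<Integer>> dense = Map.of(0, List.of(1, 2, 3),
                1, List.of(4, 5), 2, List.of(4, 5), 3, List.of(4, 5));
        System.out.println(unbulkedCount(0, 2, tree) + " vs " + bulkedCount(0, 2, tree));   // 4 vs 4
        System.out.println(unbulkedCount(0, 2, dense) + " vs " + bulkedCount(0, 2, dense)); // 6 vs 2
    }
}
```

On the tree, the bulked and unbulked counts are equal, so the cost of hashing buys nothing; on the dense graph, bulking reduces six traversers to two entries.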