[jira] [Created] (TINKERPOP-2081) PersistedOutputRDD materialises rdd lazily with Spark 2.x
Artem Aliev created TINKERPOP-2081: -- Summary: PersistedOutputRDD materialises rdd lazily with Spark 2.x Key: TINKERPOP-2081 URL: https://issues.apache.org/jira/browse/TINKERPOP-2081 Project: TinkerPop Issue Type: Bug Affects Versions: 3.3.4 Reporter: Artem Aliev PersistedOutputRDD does not actually persist the RDD in Spark memory; it only marks it for lazy caching in the future. It looks like caching was eager in Spark 1.6, but in Spark 2.0 it is lazy. Lazy caching is wrong for this case: the source graph could be changed after the snapshot is created, and the snapshot should not be affected by those changes. The fix itself is simple: PersistedOutputRDD should call any Spark action, for example count(), to trigger eager caching. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
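The semantics described above can be illustrated without Spark. The following is a minimal Java sketch (not TinkerPop or Spark code; all names are illustrative) showing why a lazily materialised "snapshot" of a mutable source leaks mutations made after the snapshot was requested, while an eagerly materialised one stays isolated:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Analogy for the bug: a lazy "persist" defers the copy until first use,
// so it observes mutations made to the source after the snapshot was taken.
public class LazySnapshotDemo {
    public static void main(String[] args) {
        List<String> source = new ArrayList<>(List.of("v1", "v2"));

        // Lazy "persist": nothing is copied until the supplier is invoked,
        // analogous to Spark 2.x marking an RDD cached without computing it.
        Supplier<List<String>> lazySnapshot = () -> new ArrayList<>(source);

        // Eager "persist": materialise now, analogous to calling an action
        // such as count() immediately after persist().
        List<String> eagerSnapshot = new ArrayList<>(source);

        source.add("v3"); // the source graph changes after the snapshot

        if (lazySnapshot.get().size() != 3)
            throw new AssertionError("expected the lazy view to see the later mutation");
        if (eagerSnapshot.size() != 2)
            throw new AssertionError("expected the eager snapshot to be isolated");
        System.out.println("eager snapshot isolated; lazy view leaked the mutation");
    }
}
```

This is why forcing any action right after persist() restores snapshot semantics: the copy happens before the source can change.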
[jira] [Created] (TINKERPOP-1871) Exception handling is slow in ReferenceElement creation
Artem Aliev created TINKERPOP-1871: -- Summary: Exception handling is slow in ReferenceElement creation Key: TINKERPOP-1871 URL: https://issues.apache.org/jira/browse/TINKERPOP-1871 Project: TinkerPop Issue Type: Improvement Affects Versions: 3.3.1 Reporter: Artem Aliev The following exception happens for each vertex in OLAP and takes ~10% of execution time: [https://github.com/apache/tinkerpop/blob/master/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/structure/util/reference/ReferenceElement.java#L48] The exception is always thrown for the ComputerGraph.ComputerAdjacentVertex class, so a check for that class could be added to improve performance. This is a 3.3.x-only issue; 3.2 does not have this problem. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
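The optimisation proposed above boils down to replacing exception-driven control flow with an explicit type check. A hedged sketch (the class names below are simplified stand-ins, not the actual TinkerPop types): throwing and filling in a stack trace on every vertex is far more expensive than an instanceof test, while the result is the same.

```java
// Stand-in types: PlainVertex models the expected case, and
// ComputerAdjacentVertex models the class that always failed the cast.
public class ReferenceCheckDemo {
    interface Element { Object id(); }
    record PlainVertex(Object id) implements Element {}
    record ComputerAdjacentVertex(Object id) implements Element {}

    // Exception-driven variant: a cast fails and is caught per element.
    static Object idSlow(Element e) {
        try {
            PlainVertex v = (PlainVertex) e; // throws for every adjacent vertex
            return v.id();
        } catch (ClassCastException ex) {
            return e.id();
        }
    }

    // Checked variant: the common failing case is handled without a throw.
    static Object idFast(Element e) {
        if (e instanceof PlainVertex v) return v.id();
        return e.id();
    }

    public static void main(String[] args) {
        Element adjacent = new ComputerAdjacentVertex(42);
        if (!idSlow(adjacent).equals(idFast(adjacent)))
            throw new AssertionError("both variants must return the same id");
        System.out.println("variants agree; the checked path avoids the throw");
    }
}
```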
[jira] [Updated] (TINKERPOP-1870) n^2 synchronous operation in OLAP WorkerExecutor.execute() method
[ https://issues.apache.org/jira/browse/TINKERPOP-1870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Artem Aliev updated TINKERPOP-1870: --- Affects Version/s: 3.2.7 3.3.1 > n^2 synchronous operation in OLAP WorkerExecutor.execute() method > -- > > Key: TINKERPOP-1870 > URL: https://issues.apache.org/jira/browse/TINKERPOP-1870 > Project: TinkerPop > Issue Type: Improvement >Affects Versions: 3.2.7, 3.3.1 >Reporter: Artem Aliev >Priority: Major > Attachments: findTraverser1.png, findTraverser2.png, > findTraverserFixed.png > > > [https://github.com/apache/tinkerpop/blob/master/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/computer/traversal/WorkerExecutor.java#L80-L93] > This block of code iterates over all remote traversers to select the one related > to the current vertex and removes it. The operation is repeated for the next > vertex, and so on. For the following example query this means n^2 operations (n is > the number of vertices), all of them inside a synchronized block, so a multi-core > Spark executor will perform these operations serially. > {code} > g.V().emit().repeat(both().dedup()).count().next() > {code} > See the jvisualvm screenshots. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (TINKERPOP-1870) n^2 synchronous operation in OLAP WorkerExecutor.execute() method
[ https://issues.apache.org/jira/browse/TINKERPOP-1870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16326941#comment-16326941 ] Artem Aliev commented on TINKERPOP-1870: I wrapped the block into a findVertexTraverser() method to see its timing in the profiler; see the attached profiler screenshots. It takes 20-30% of execution time on a single 6-core executor. The performance was greatly improved on my 10k-vertex graph. Before the fix: {code} gremlin> g.V().count() ==>1 gremlin> g.E().count() ==>16 gremlin> clock(1) \{g.V().emit().repeat(both().dedup()).count().next()} ==>52349.640981 gremlin> clock(1) \{g.V().emit().repeat(both().dedup()).count().next()} ==>53800.89875495 gremlin> clock(1) \{g.V().emit().repeat(both().dedup()).count().next()} ==>50643.744645 {code} After the fix: {code} gremlin> clock(1) \{g.V().emit().repeat(both().dedup()).count().next()} ==>42062.945477 gremlin> clock(1) \{g.V().emit().repeat(both().dedup()).count().next()} ==>38419.46317196 gremlin> clock(1) \{g.V().emit().repeat(both().dedup()).count().next()} ==>34336.707208 {code} {code} >mvn clean install [INFO] BUILD SUCCESS {code} > n^2 synchronous operation in OLAP WorkerExecutor.execute() method > -- > > Key: TINKERPOP-1870 > URL: https://issues.apache.org/jira/browse/TINKERPOP-1870 > Project: TinkerPop > Issue Type: Improvement >Reporter: Artem Aliev >Priority: Major > Attachments: findTraverser1.png, findTraverser2.png, > findTraverserFixed.png > > > [https://github.com/apache/tinkerpop/blob/master/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/computer/traversal/WorkerExecutor.java#L80-L93] > This block of code iterates over all remote traversers to select the one related > to the current vertex and removes it. The operation is repeated for the next > vertex, and so on. For the following example query this means n^2 operations (n is > the number of vertices), all of them inside a synchronized block, so a multi-core > Spark executor will perform these operations serially.
> {code} > g.V().emit().repeat(both().dedup()).count().next() > {code} > See the jvisualvm screenshots. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (TINKERPOP-1870) n^2 synchronous operation in OLAP WorkerExecutor.execute() method
[ https://issues.apache.org/jira/browse/TINKERPOP-1870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Artem Aliev updated TINKERPOP-1870: --- Attachment: findTraverserFixed.png findTraverser2.png findTraverser1.png > n^2 synchronous operation in OLAP WorkerExecutor.execute() method > -- > > Key: TINKERPOP-1870 > URL: https://issues.apache.org/jira/browse/TINKERPOP-1870 > Project: TinkerPop > Issue Type: Improvement >Reporter: Artem Aliev >Priority: Major > Attachments: findTraverser1.png, findTraverser2.png, > findTraverserFixed.png > > > [https://github.com/apache/tinkerpop/blob/master/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/computer/traversal/WorkerExecutor.java#L80-L93] > This block of code iterates over all remote traversers to select the one related > to the current vertex and removes it. The operation is repeated for the next > vertex, and so on. For the following example query this means n^2 operations (n is > the number of vertices), all of them inside a synchronized block, so a multi-core > Spark executor will perform these operations serially. > {code} > g.V().emit().repeat(both().dedup()).count().next() > {code} > See the jvisualvm screenshots. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (TINKERPOP-1870) n^2 synchronous operation in OLAP WorkerExecutor.execute() method
[ https://issues.apache.org/jira/browse/TINKERPOP-1870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16326356#comment-16326356 ] Artem Aliev commented on TINKERPOP-1870: The fix I provided could be simplified. VertexRemoteSet extends RemoteSet now for backward compatibility, just in case someone uses it directly in VertexPrograms. If it were a fully internal structure, it could become a simple synchronized multi-value hash map. The map preserves traverser order for each vertex. > n^2 synchronous operation in OLAP WorkerExecutor.execute() method > -- > > Key: TINKERPOP-1870 > URL: https://issues.apache.org/jira/browse/TINKERPOP-1870 > Project: TinkerPop > Issue Type: Improvement >Reporter: Artem Aliev >Priority: Major > > [https://github.com/apache/tinkerpop/blob/master/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/computer/traversal/WorkerExecutor.java#L80-L93] > This block of code iterates over all remote traversers to select the one related > to the current vertex and removes it. The operation is repeated for the next > vertex, and so on. For the following example query this means n^2 operations (n is > the number of vertices), all of them inside a synchronized block, so a multi-core > Spark executor will perform these operations serially. > {code} > g.V().emit().repeat(both().dedup()).count().next() > {code} > See the jvisualvm screenshots. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
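The multi-value map described in the comment can be sketched as follows. This is a hedged illustration, not the actual TinkerPop implementation: the class and key/value types are stand-ins, but it shows the two properties the comment asks for, i.e. O(1) per-vertex lookup instead of a linear scan, and preserved traverser order within each vertex.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Illustrative synchronized multi-value map keyed by vertex id. Each vertex
// gets its own FIFO queue, so per-vertex insertion order is preserved and
// looking up the next traverser for a vertex no longer scans all traversers.
public class VertexTraverserMap {
    private final Map<Object, Deque<String>> byVertex = new HashMap<>();

    public synchronized void add(Object vertexId, String traverser) {
        byVertex.computeIfAbsent(vertexId, k -> new ArrayDeque<>()).add(traverser);
    }

    // Replaces the O(n) "scan the flat set and remove" pattern with a
    // constant-time map lookup plus queue poll.
    public synchronized String poll(Object vertexId) {
        Deque<String> q = byVertex.get(vertexId);
        return q == null ? null : q.poll();
    }

    public static void main(String[] args) {
        VertexTraverserMap m = new VertexTraverserMap();
        m.add("v1", "t1");
        m.add("v2", "t2");
        m.add("v1", "t3");
        if (!"t1".equals(m.poll("v1")) || !"t3".equals(m.poll("v1")))
            throw new AssertionError("per-vertex order must be preserved");
        System.out.println("order preserved per vertex; lookup is O(1)");
    }
}
```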
[jira] [Created] (TINKERPOP-1870) n^2 synchronous operation in OLAP WorkerExecutor.execute() method
Artem Aliev created TINKERPOP-1870: -- Summary: n^2 synchronous operation in OLAP WorkerExecutor.execute() method Key: TINKERPOP-1870 URL: https://issues.apache.org/jira/browse/TINKERPOP-1870 Project: TinkerPop Issue Type: Improvement Reporter: Artem Aliev [https://github.com/apache/tinkerpop/blob/master/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/computer/traversal/WorkerExecutor.java#L80-L93] This block of code iterates over all remote traversers to select the one related to the current vertex and removes it. The operation is repeated for the next vertex, and so on. For the following example query this means n^2 operations (n is the number of vertices), all of them inside a synchronized block, so a multi-core Spark executor will perform these operations serially. {code} g.V().emit().repeat(both().dedup()).count().next() {code} See the jvisualvm screenshots. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
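The quadratic behaviour can be reproduced in isolation. Below is a minimal model of the pattern described above (identifiers are illustrative, not TinkerPop's): for each vertex, a linear search over one shared synchronized collection finds and removes the matching traverser, so with n vertices the total work is n + (n-1) + ... + 1 element inspections, i.e. O(n^2), all under one lock.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Counts element inspections to show the triangular (quadratic) growth of
// the "scan the whole remote set per vertex" pattern.
public class QuadraticScanDemo {
    record Traverser(int vertexId) {}

    static int scans = 0;

    // Mirrors the WorkerExecutor pattern: linear search + remove per vertex,
    // all inside one synchronized block.
    static Traverser findAndRemove(List<Traverser> remote, int vertexId) {
        synchronized (remote) {
            for (Iterator<Traverser> it = remote.iterator(); it.hasNext(); ) {
                scans++;
                Traverser t = it.next();
                if (t.vertexId() == vertexId) { it.remove(); return t; }
            }
        }
        return null;
    }

    public static void main(String[] args) {
        int n = 100;
        List<Traverser> remote = new ArrayList<>();
        for (int v = 0; v < n; v++) remote.add(new Traverser(v));
        // Worst case: each lookup walks the whole remaining list.
        for (int v = n - 1; v >= 0; v--) findAndRemove(remote, v);
        if (scans != n * (n + 1) / 2)
            throw new AssertionError("expected n(n+1)/2 inspections, got " + scans);
        System.out.println("element inspections for n=100: " + scans);
    }
}
```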
[jira] [Commented] (TINKERPOP-1801) OLAP profile() step returns incorrect timing
[ https://issues.apache.org/jira/browse/TINKERPOP-1801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16208065#comment-16208065 ] Artem Aliev commented on TINKERPOP-1801: That is a simple way to fix it, without a new API. Let's discuss better approaches. I did not add new tests; I found a set of them in the test suite. I still have my own test, but it is unstable because of timings. > OLAP profile() step returns incorrect timing > > > Key: TINKERPOP-1801 > URL: https://issues.apache.org/jira/browse/TINKERPOP-1801 > Project: TinkerPop > Issue Type: Bug >Affects Versions: 3.3.0, 3.2.6 >Reporter: Artem Aliev > > The ProfileStep measures the time of next()/hasNext() calls, expecting > recursion, but the GraphComputer uses message passing/RDD joins. > So next() does not recursively call the next steps; a message is generated instead, and > most of the time is taken by message passing (the RDD join). > Thus on a graph computer the time between ProfileSteps should be measured, not > inside them. > Another approach is to get Spark statistics with a SparkListener and add > Spark stage timings to the profiler metrics; that would work only for Spark but > would give a better representation of step costs. > The simple fix is to measure the time between OLAP iterations and add it to the > profiler step. > This will not take into account computer setup time, but will be precise > enough for long-running queries.
> To reproduce (TinkerPop 3.2.6 Gremlin Console):
> {code}
> plugin activated: tinkerpop.server
> plugin activated: tinkerpop.utilities
> plugin activated: tinkerpop.spark
> plugin activated: tinkerpop.tinkergraph
> gremlin> graph = GraphFactory.open('conf/hadoop/hadoop-grateful-gryo.properties')
> gremlin> g = graph.traversal().withComputer(SparkGraphComputer)
> ==>graphtraversalsource[hadoopgraph[gryoinputformat->gryooutputformat], sparkgraphcomputer]
> gremlin> g.V().out().out().count().profile()
> ==>Traversal Metrics
> Step                          Count  Traversers  Time (ms)  % Dur
> =================================================================
> GraphStep(vertex,[])            808         808      2.025  18.35
> VertexStep(OUT,vertex)         8049         562      4.430  40.14
> VertexStep(OUT,edge)         327370        7551      4.581  41.50
> CountGlobalStep                   1           1      0.001   0.01
>                         TOTAL     -           -     11.038      -
> gremlin> clock(1){g.V().out().out().count().next() }
> ==>3421.92758
> gremlin>
> {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (TINKERPOP-1801) OLAP profile() step returns incorrect timing
Artem Aliev created TINKERPOP-1801: -- Summary: OLAP profile() step returns incorrect timing Key: TINKERPOP-1801 URL: https://issues.apache.org/jira/browse/TINKERPOP-1801 Project: TinkerPop Issue Type: Bug Affects Versions: 3.2.6, 3.3.0 Reporter: Artem Aliev The ProfileStep measures the time of next()/hasNext() calls, expecting recursion, but the GraphComputer uses message passing/RDD joins. So next() does not recursively call the next steps; a message is generated instead, and most of the time is taken by message passing (the RDD join). Thus on a graph computer the time between ProfileSteps should be measured, not inside them. Another approach is to get Spark statistics with a SparkListener and add Spark stage timings to the profiler metrics; that would work only for Spark but would give a better representation of step costs. The simple fix is to measure the time between OLAP iterations and add it to the profiler step. This will not take into account computer setup time, but will be precise enough for long-running queries. To reproduce (TinkerPop 3.2.6 Gremlin Console):
{code}
plugin activated: tinkerpop.server
plugin activated: tinkerpop.utilities
plugin activated: tinkerpop.spark
plugin activated: tinkerpop.tinkergraph
gremlin> graph = GraphFactory.open('conf/hadoop/hadoop-grateful-gryo.properties')
gremlin> g = graph.traversal().withComputer(SparkGraphComputer)
==>graphtraversalsource[hadoopgraph[gryoinputformat->gryooutputformat], sparkgraphcomputer]
gremlin> g.V().out().out().count().profile()
==>Traversal Metrics
Step                          Count  Traversers  Time (ms)  % Dur
=================================================================
GraphStep(vertex,[])            808         808      2.025  18.35
VertexStep(OUT,vertex)         8049         562      4.430  40.14
VertexStep(OUT,edge)         327370        7551      4.581  41.50
CountGlobalStep                   1           1      0.001   0.01
                        TOTAL     -           -     11.038      -
gremlin> clock(1){g.V().out().out().count().next() }
==>3421.92758
gremlin>
{code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
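The "simple fix" described above can be sketched in plain Java. This is a hedged illustration of the idea only; ProfilerMetric is a hypothetical stand-in, not the TinkerPop API. Instead of timing inside next() (which under a GraphComputer only generates a message), record the wall clock between OLAP iterations and attribute the elapsed time to the profiler metric:

```java
// Attribute the time *between* iteration boundaries to the metric, so the
// cost of each message-passing round (RDD join) is captured.
public class IterationTimingDemo {
    static class ProfilerMetric {            // hypothetical metric holder
        long totalNanos = 0;
        void addNanos(long n) { totalNanos += n; }
    }

    public static void main(String[] args) throws InterruptedException {
        ProfilerMetric metric = new ProfilerMetric();
        long previous = System.nanoTime();
        for (int iteration = 0; iteration < 3; iteration++) {
            Thread.sleep(5); // stands in for one message-passing round
            long now = System.nanoTime();
            metric.addNanos(now - previous); // time between iterations
            previous = now;
        }
        // Three rounds of >=5 ms each must account for at least ~10 ms.
        if (metric.totalNanos < 10_000_000L)
            throw new AssertionError("metric should cover the three rounds");
        System.out.println("attributed ms: " + metric.totalNanos / 1_000_000);
    }
}
```

As the issue notes, this misses computer setup time, but for long-running queries the per-iteration join time dominates.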
[jira] [Commented] (TINKERPOP-1783) PageRank gives incorrect results for graphs with sinks
[ https://issues.apache.org/jira/browse/TINKERPOP-1783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16167499#comment-16167499 ] Artem Aliev commented on TINKERPOP-1783: The workaround I proposed is incorrect. The correct behaviour is that the user goes to a random vertex from the sink vertex. > PageRank gives incorrect results for graphs with sinks > -- > > Key: TINKERPOP-1783 > URL: https://issues.apache.org/jira/browse/TINKERPOP-1783 > Project: TinkerPop > Issue Type: Bug > Components: process >Affects Versions: 3.3.0, 3.1.8, 3.2.6 >Reporter: Artem Aliev > > {quote} Sink vertices (those with no outgoing edges) should evenly distribute > their rank to the entire graph, but in the current implementation it is just > lost. > {quote} > Wiki: https://en.wikipedia.org/wiki/PageRank#Simplified_algorithm > {quote} In the original form of PageRank, the sum of PageRank over all pages > was the total number of pages on the web at that time > {quote} > I found the issue while comparing results with Spark GraphX, > so this is a copy of https://issues.apache.org/jira/browse/SPARK-18847 > How to reproduce: > {code} > gremlin> graph = TinkerFactory.createModern() > gremlin> g = graph.traversal().withComputer() > gremlin> > g.V().pageRank(0.85).times(40).by('pageRank').values('pageRank').sum() > ==>1.318625 > gremlin> g.V().pageRank(0.85).times(1).by('pageRank').values('pageRank').sum() > ==>3.4497 > # initial values: > gremlin> g.V().pageRank(0.85).times(0).by('pageRank').values('pageRank').sum() > ==>6.0 > {code} > GraphX fixed the issue by normalising values after each step. > Another way to fix it is to send the message to itself (stay on the same > page). > To work around the problem, just add self-pointing edges: > {code} > gremlin> g.V().as('B').addE('knows').from('B') > {code} > Then you will always get the correct sum. But I'm not sure it is a proper > assumption. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (TINKERPOP-1783) PageRank gives incorrect results for graphs with sinks
Artem Aliev created TINKERPOP-1783: -- Summary: PageRank gives incorrect results for graphs with sinks Key: TINKERPOP-1783 URL: https://issues.apache.org/jira/browse/TINKERPOP-1783 Project: TinkerPop Issue Type: Bug Affects Versions: 3.2.6, 3.1.8, 3.3.0 Reporter: Artem Aliev {quote} Sink vertices (those with no outgoing edges) should evenly distribute their rank to the entire graph, but in the current implementation it is just lost. {quote} Wiki: https://en.wikipedia.org/wiki/PageRank#Simplified_algorithm {quote} In the original form of PageRank, the sum of PageRank over all pages was the total number of pages on the web at that time {quote} I found the issue while comparing results with Spark GraphX, so this is a copy of https://issues.apache.org/jira/browse/SPARK-18847 How to reproduce: {code} gremlin> graph = TinkerFactory.createModern() gremlin> g = graph.traversal().withComputer() gremlin> g.V().pageRank(0.85).times(40).by('pageRank').values('pageRank').sum() ==>1.318625 gremlin> g.V().pageRank(0.85).times(1).by('pageRank').values('pageRank').sum() ==>3.4497 # initial values: gremlin> g.V().pageRank(0.85).times(0).by('pageRank').values('pageRank').sum() ==>6.0 {code} GraphX fixed the issue by normalising values after each step. Another way to fix it is to send the message to itself (stay on the same page). To work around the problem, just add self-pointing edges: {code} gremlin> g.V().as('B').addE('knows').from('B') {code} Then you will always get the correct sum. But I'm not sure it is a proper assumption. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
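The rank-loss mechanism and the proposed fix can be shown on a two-vertex example. The sketch below is illustrative only (it is not the TinkerPop PageRankVertexProgram): v0 has a single edge to v1, and v1 is a sink. Dropping the sink's outgoing mass shrinks the total rank each step, while evenly redistributing it over all vertices (the "user goes to a random vertex from the sink" behaviour) keeps the sum constant.

```java
import java.util.Arrays;

// One PageRank iteration on the graph v0 -> v1, with and without
// redistributing the sink vertex v1's rank mass.
public class SinkRankDemo {
    static double[] step(double[] rank, boolean redistributeSinks) {
        double damping = 0.85;
        int n = rank.length;
        double[] next = new double[n];
        Arrays.fill(next, (1 - damping) / n);  // teleport term
        next[1] += damping * rank[0];          // v0's single edge points at v1
        if (redistributeSinks) {               // v1 is a sink: spread its mass
            for (int i = 0; i < n; i++) next[i] += damping * rank[1] / n;
        }                                      // else: v1's mass is simply lost
        return next;
    }

    static double sum(double[] r) { return Arrays.stream(r).sum(); }

    public static void main(String[] args) {
        double[] lossy = {0.5, 0.5}, fixed = {0.5, 0.5};
        for (int i = 0; i < 20; i++) {
            lossy = step(lossy, false);
            fixed = step(fixed, true);
        }
        if (Math.abs(sum(fixed) - 1.0) > 1e-9)
            throw new AssertionError("redistribution must conserve total rank");
        if (sum(lossy) >= sum(fixed) - 1e-9)
            throw new AssertionError("dropped sink mass should shrink the sum");
        System.out.printf("sum without fix: %.4f, with fix: %.4f%n",
                sum(lossy), sum(fixed));
    }
}
```

With redistribution, each step's total is (1 - d) + d * (previous total), so a sum of 1 is a fixed point, matching the Wikipedia property quoted above.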
[jira] [Created] (TINKERPOP-1754) Spark cannot deserialise some ScriptRecordReader parse exceptions
Artem Aliev created TINKERPOP-1754: -- Summary: Spark cannot deserialise some ScriptRecordReader parse exceptions Key: TINKERPOP-1754 URL: https://issues.apache.org/jira/browse/TINKERPOP-1754 Project: TinkerPop Issue Type: Bug Components: hadoop Affects Versions: 3.3.0 Reporter: Artem Aliev Priority: Minor ScriptException refers to a Groovy exception that can point to a "Script" class that is not available to the system class loader. Spark cannot deserialise the exception, so the user does not get the parse error. To fix the problem, ScriptRecordReader should not try to propagate the whole chain of cause exceptions but only the message with the parse error. Spark output:
{code}
WARN [task-result-getter-0] 2017-08-16 11:11:41,777 TaskEndReason.scala:192 - Task exception could not be deserialized
java.lang.ClassNotFoundException: Script1
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381) ~[na:1.8.0_40]
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424) ~[na:1.8.0_40]
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331) ~[na:1.8.0_40]
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357) ~[na:1.8.0_40]
    at java.lang.Class.forName0(Native Method) ~[na:1.8.0_40]
    at java.lang.Class.forName(Class.java:348) ~[na:1.8.0_40]
    at org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:67) ~[spark-core_2.11-2.2.0.0-bb4c2a9.jar:2.2.0.0-bb4c2a9]
    at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1613) [na:1.8.0_40]
    at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1518) [na:1.8.0_40]
    at java.io.ObjectInputStream.readClass(ObjectInputStream.java:1484) [na:1.8.0_40]
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1334) [na:1.8.0_40]
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1993) [na:1.8.0_40]
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1918) [na:1.8.0_40]
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801) [na:1.8.0_40]
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351) [na:1.8.0_40]
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1993) [na:1.8.0_40]
    at java.io.ObjectInputStream.defaultReadObject(ObjectInputStream.java:501) [na:1.8.0_40]
    at java.lang.Throwable.readObject(Throwable.java:914) ~[na:1.8.0_40]
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[na:1.8.0_40]
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[na:1.8.0_40]
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:1.8.0_40]
    at java.lang.reflect.Method.invoke(Method.java:497) ~[na:1.8.0_40]
    at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017) [na:1.8.0_40]
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1896) [na:1.8.0_40]
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801) [na:1.8.0_40]
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351) [na:1.8.0_40]
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1993) [na:1.8.0_40]
    at java.io.ObjectInputStream.defaultReadObject(ObjectInputStream.java:501) [na:1.8.0_40]
    at java.lang.Throwable.readObject(Throwable.java:914) ~[na:1.8.0_40]
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[na:1.8.0_40]
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[na:1.8.0_40]
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:1.8.0_40]
    at java.lang.reflect.Method.invoke(Method.java:497) ~[na:1.8.0_40]
    at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017) [na:1.8.0_40]
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1896) [na:1.8.0_40]
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801) [na:1.8.0_40]
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351) [na:1.8.0_40]
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:371) [na:1.8.0_40]
    at org.apache.spark.ThrowableSerializationWrapper.readObject(TaskEndReason.scala:193) ~[spark-core_2.11-2.2.0.0-bb4c2a9.jar:2.2.0.0-bb4c2a9]
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[na:1.8.0_40]
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[na:1.8.0_40]
    at
[jira] [Commented] (TINKERPOP-1271) SparkContext should be restarted if Killed and using Persistent Context
[ https://issues.apache.org/jira/browse/TINKERPOP-1271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15851300#comment-15851300 ] Artem Aliev commented on TINKERPOP-1271: mvn clean install -DskipIntegrationTests=false passed, but I see the following output; is it OK? {code} ... Running org.apache.tinkerpop.gremlin.spark.SparkGremlinGryoSerializerTest [ERROR] org.apache.tinkerpop.gremlin.AbstractGremlinSuite - The SparkGremlinSuite will run for this Graph as it is testing a Gremlin flavor but the Graph does not publicly acknowledged it yet with the @OptIn annotation. Tests run: 7, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 13.273 sec - in org.apache.tinkerpop.gremlin.spark.SparkGremlinGryoSerializerTest ... {code} {code} ... [WARN] org.apache.tinkerpop.gremlin.hadoop.groovy.plugin.HadoopGremlinPlugin - Be sure to set the environmental variable: HADOOP_GREMLIN_LIBS [WARN] org.apache.tinkerpop.gremlin.hadoop.process.computer.AbstractHadoopGraphComputer - /Users/artemaliev/git/tinkerpop.ali/spark-gremlin/target/test-case-data/HadoopGremlinPluginCheck/shouldGracefullyHandleBadGremlinHadoopLibs/ does not reference a valid directory -- proceeding regardless ... {code} > SparkContext should be restarted if Killed and using Persistent Context > --- > > Key: TINKERPOP-1271 > URL: https://issues.apache.org/jira/browse/TINKERPOP-1271 > Project: TinkerPop > Issue Type: Bug > Components: hadoop >Affects Versions: 3.2.0-incubating, 3.1.2-incubating >Reporter: Russell Spitzer > > If the persisted Spark Context is killed by the user via the Spark UI or is > terminated for some other error, the Gremlin Console/Server is left with a > stopped Spark Context. This could be caught and the Spark context recreated. > Oddly enough, if you simply wait, the context will "reset" itself or possibly > get GC'd out of the system, and everything works again.
> ##Repo > {code} > gremlin> g.V().count() > WARN org.apache.tinkerpop.gremlin.spark.process.computer.SparkGraphComputer > - HADOOP_GREMLIN_LIBS is not set -- proceeding regardless > ==>6 > gremlin> ERROR org.apache.spark.scheduler.cluster.SparkDeploySchedulerBackend > - Application has been killed. Reason: Master removed our application: KILLED > ERROR org.apache.spark.scheduler.TaskSchedulerImpl - Lost executor 0 on > 10.150.0.180: Remote RPC client disassociated. Likely due to containers > exceeding thresholds, or network issues. Check driver logs for WARN messages. > // Driver has been killed here via the Master UI > gremlin> graph = GraphFactory.open('conf/hadoop/hadoop-gryo.properties') > ==>hadoopgraph[gryoinputformat->gryooutputformat] > gremlin> g.V().count() > WARN org.apache.tinkerpop.gremlin.spark.process.computer.SparkGraphComputer > - HADOOP_GREMLIN_LIBS is not set -- proceeding regardless > java.lang.IllegalStateException: Cannot call methods on a stopped > SparkContext. > This stopped SparkContext was created at: > org.apache.spark.SparkContext.getOrCreate(SparkContext.scala) > org.apache.tinkerpop.gremlin.spark.structure.Spark.create(Spark.java:53) > org.apache.tinkerpop.gremlin.spark.structure.io.SparkContextStorage.open(SparkContextStorage.java:60) > org.apache.tinkerpop.gremlin.spark.process.computer.SparkGraphComputer.lambda$submitWithExecutor$1(SparkGraphComputer.java:122) > java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590) > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > java.lang.Thread.run(Thread.java:745) > The currently active SparkContext was created at: > org.apache.spark.SparkContext.getOrCreate(SparkContext.scala) > org.apache.tinkerpop.gremlin.spark.structure.Spark.create(Spark.java:53) > 
org.apache.tinkerpop.gremlin.spark.structure.io.SparkContextStorage.open(SparkContextStorage.java:60) > org.apache.tinkerpop.gremlin.spark.process.computer.SparkGraphComputer.lambda$submitWithExecutor$1(SparkGraphComputer.java:122) > java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590) > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > java.lang.Thread.run(Thread.java:745) > {code} > Full trace from TP > {code} > at > org.apache.spark.SparkContext.org$apache$spark$SparkContext$$assertNotStopped(SparkContext.scala:106) > at > org.apache.spark.SparkContext$$anonfun$newAPIHadoopRDD$1.apply(SparkContext.scala:1130) > at > org.apache.spark.SparkContext$$anonfun$newAPIHadoopRDD$1.apply(SparkContext.scala:1129) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150) > at >
[jira] [Commented] (TINKERPOP-1271) SparkContext should be restarted if Killed and using Persistent Context
[ https://issues.apache.org/jira/browse/TINKERPOP-1271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15850209#comment-15850209 ] Artem Aliev commented on TINKERPOP-1271: "mvn install" tests passed. I have tested it manually on master with Spark 2.0 and a back-ported SPARK-19362 to check that stop works. > SparkContext should be restarted if Killed and using Persistent Context > --- > > Key: TINKERPOP-1271 > URL: https://issues.apache.org/jira/browse/TINKERPOP-1271 > Project: TinkerPop > Issue Type: Bug > Components: hadoop >Affects Versions: 3.2.0-incubating, 3.1.2-incubating >Reporter: Russell Spitzer > > If the persisted Spark Context is killed by the user via the Spark UI or is > terminated for some other error, the Gremlin Console/Server is left with a > stopped Spark Context. This could be caught and the Spark context recreated. > Oddly enough, if you simply wait, the context will "reset" itself or possibly > get GC'd out of the system, and everything works again. > ##Repo > {code} > gremlin> g.V().count() > WARN org.apache.tinkerpop.gremlin.spark.process.computer.SparkGraphComputer > - HADOOP_GREMLIN_LIBS is not set -- proceeding regardless > ==>6 > gremlin> ERROR org.apache.spark.scheduler.cluster.SparkDeploySchedulerBackend > - Application has been killed. Reason: Master removed our application: KILLED > ERROR org.apache.spark.scheduler.TaskSchedulerImpl - Lost executor 0 on > 10.150.0.180: Remote RPC client disassociated. Likely due to containers > exceeding thresholds, or network issues. Check driver logs for WARN messages. > // Driver has been killed here via the Master UI > gremlin> graph = GraphFactory.open('conf/hadoop/hadoop-gryo.properties') > ==>hadoopgraph[gryoinputformat->gryooutputformat] > gremlin> g.V().count() > WARN org.apache.tinkerpop.gremlin.spark.process.computer.SparkGraphComputer > - HADOOP_GREMLIN_LIBS is not set -- proceeding regardless > java.lang.IllegalStateException: Cannot call methods on a stopped > SparkContext.
> This stopped SparkContext was created at: > org.apache.spark.SparkContext.getOrCreate(SparkContext.scala) > org.apache.tinkerpop.gremlin.spark.structure.Spark.create(Spark.java:53) > org.apache.tinkerpop.gremlin.spark.structure.io.SparkContextStorage.open(SparkContextStorage.java:60) > org.apache.tinkerpop.gremlin.spark.process.computer.SparkGraphComputer.lambda$submitWithExecutor$1(SparkGraphComputer.java:122) > java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590) > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > java.lang.Thread.run(Thread.java:745) > The currently active SparkContext was created at: > org.apache.spark.SparkContext.getOrCreate(SparkContext.scala) > org.apache.tinkerpop.gremlin.spark.structure.Spark.create(Spark.java:53) > org.apache.tinkerpop.gremlin.spark.structure.io.SparkContextStorage.open(SparkContextStorage.java:60) > org.apache.tinkerpop.gremlin.spark.process.computer.SparkGraphComputer.lambda$submitWithExecutor$1(SparkGraphComputer.java:122) > java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590) > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > java.lang.Thread.run(Thread.java:745) > {code} > Full trace from TP > {code} > at > org.apache.spark.SparkContext.org$apache$spark$SparkContext$$assertNotStopped(SparkContext.scala:106) > at > org.apache.spark.SparkContext$$anonfun$newAPIHadoopRDD$1.apply(SparkContext.scala:1130) > at > org.apache.spark.SparkContext$$anonfun$newAPIHadoopRDD$1.apply(SparkContext.scala:1129) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111) > at 
org.apache.spark.SparkContext.withScope(SparkContext.scala:714) > at > org.apache.spark.SparkContext.newAPIHadoopRDD(SparkContext.scala:1129) > at > org.apache.spark.api.java.JavaSparkContext.newAPIHadoopRDD(JavaSparkContext.scala:507) > at > org.apache.tinkerpop.gremlin.spark.structure.io.InputFormatRDD.readGraphRDD(InputFormatRDD.java:42) > at > org.apache.tinkerpop.gremlin.spark.process.computer.SparkGraphComputer.lambda$submitWithExecutor$1(SparkGraphComputer.java:195) > at > java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > {code} > If
[jira] [Commented] (TINKERPOP-1271) SparkContext should be restarted if Killed and using Persistent Context
[ https://issues.apache.org/jira/browse/TINKERPOP-1271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15837700#comment-15837700 ] Artem Aliev commented on TINKERPOP-1271: I filed a Spark bug for it, SPARK-19362, but then found it was already fixed by https://issues.apache.org/jira/browse/SPARK-18751 in Spark 2.1. > SparkContext should be restarted if Killed and using Persistent Context > --- > > Key: TINKERPOP-1271 > URL: https://issues.apache.org/jira/browse/TINKERPOP-1271 > Project: TinkerPop > Issue Type: Bug > Components: hadoop >Affects Versions: 3.2.0-incubating, 3.1.2-incubating >Reporter: Russell Spitzer > > If the persisted Spark Context is killed by the user via the Spark UI or is > terminated for some other error, the Gremlin Console/Server is left with a > stopped Spark Context. This could be caught and the Spark context recreated. > Oddly enough, if you simply wait, the context will "reset" itself or possibly > get GC'd out of the system, and everything works again. > ##Repo > {code} > gremlin> g.V().count() > WARN org.apache.tinkerpop.gremlin.spark.process.computer.SparkGraphComputer > - HADOOP_GREMLIN_LIBS is not set -- proceeding regardless > ==>6 > gremlin> ERROR org.apache.spark.scheduler.cluster.SparkDeploySchedulerBackend > - Application has been killed. Reason: Master removed our application: KILLED > ERROR org.apache.spark.scheduler.TaskSchedulerImpl - Lost executor 0 on > 10.150.0.180: Remote RPC client disassociated. Likely due to containers > exceeding thresholds, or network issues. Check driver logs for WARN messages.
> // Driver has been killed here via the Master UI > gremlin> graph = GraphFactory.open('conf/hadoop/hadoop-gryo.properties') > ==>hadoopgraph[gryoinputformat->gryooutputformat] > gremlin> g.V().count() > WARN org.apache.tinkerpop.gremlin.spark.process.computer.SparkGraphComputer > - HADOOP_GREMLIN_LIBS is not set -- proceeding regardless > java.lang.IllegalStateException: Cannot call methods on a stopped > SparkContext. > This stopped SparkContext was created at: > org.apache.spark.SparkContext.getOrCreate(SparkContext.scala) > org.apache.tinkerpop.gremlin.spark.structure.Spark.create(Spark.java:53) > org.apache.tinkerpop.gremlin.spark.structure.io.SparkContextStorage.open(SparkContextStorage.java:60) > org.apache.tinkerpop.gremlin.spark.process.computer.SparkGraphComputer.lambda$submitWithExecutor$1(SparkGraphComputer.java:122) > java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590) > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > java.lang.Thread.run(Thread.java:745) > The currently active SparkContext was created at: > org.apache.spark.SparkContext.getOrCreate(SparkContext.scala) > org.apache.tinkerpop.gremlin.spark.structure.Spark.create(Spark.java:53) > org.apache.tinkerpop.gremlin.spark.structure.io.SparkContextStorage.open(SparkContextStorage.java:60) > org.apache.tinkerpop.gremlin.spark.process.computer.SparkGraphComputer.lambda$submitWithExecutor$1(SparkGraphComputer.java:122) > java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590) > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > java.lang.Thread.run(Thread.java:745) > {code} > Full trace from TP > {code} > at > org.apache.spark.SparkContext.org$apache$spark$SparkContext$$assertNotStopped(SparkContext.scala:106) > at 
> org.apache.spark.SparkContext$$anonfun$newAPIHadoopRDD$1.apply(SparkContext.scala:1130) > at > org.apache.spark.SparkContext$$anonfun$newAPIHadoopRDD$1.apply(SparkContext.scala:1129) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111) > at org.apache.spark.SparkContext.withScope(SparkContext.scala:714) > at > org.apache.spark.SparkContext.newAPIHadoopRDD(SparkContext.scala:1129) > at > org.apache.spark.api.java.JavaSparkContext.newAPIHadoopRDD(JavaSparkContext.scala:507) > at > org.apache.tinkerpop.gremlin.spark.structure.io.InputFormatRDD.readGraphRDD(InputFormatRDD.java:42) > at > org.apache.tinkerpop.gremlin.spark.process.computer.SparkGraphComputer.lambda$submitWithExecutor$1(SparkGraphComputer.java:195) > at > java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) >
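The "catch and recreate" idea from the issue can be sketched in plain Java. `Context` and `PersistedContextHolder` below are hypothetical stand-ins, not TinkerPop classes: real code would check something like `JavaSparkContext.sc().isStopped()` and rebuild the context via `SparkContext.getOrCreate()` (which is roughly what the SPARK-18751 fix makes work correctly in Spark 2.1).

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

// Hypothetical stand-in for SparkContext, used only to illustrate the pattern.
class Context {
    private boolean stopped = false;
    void stop() { stopped = true; }
    boolean isStopped() { return stopped; }
}

// Holder for a persisted context that recreates it when the cached instance
// has been stopped (e.g. the application was killed from the master UI),
// instead of failing with "Cannot call methods on a stopped SparkContext".
class PersistedContextHolder {
    private final AtomicReference<Context> ref = new AtomicReference<>();
    private final Supplier<Context> factory;

    PersistedContextHolder(Supplier<Context> factory) {
        this.factory = factory;
    }

    // Return the cached context; replace it if it is missing or stopped.
    synchronized Context get() {
        Context ctx = ref.get();
        if (ctx == null || ctx.isStopped()) {
            ctx = factory.get();
            ref.set(ctx);
        }
        return ctx;
    }
}
```

With this guard in the submit path, a killed context is transparently replaced on the next traversal instead of leaving the console stuck until the old context is GC'd.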