[jira] [Created] (TINKERPOP-2081) PersistedOutputRDD materialises rdd lazily with Spark 2.x

2018-10-26 Thread Artem Aliev (JIRA)
Artem Aliev created TINKERPOP-2081:
--

 Summary: PersistedOutputRDD materialises rdd lazily with Spark 2.x
 Key: TINKERPOP-2081
 URL: https://issues.apache.org/jira/browse/TINKERPOP-2081
 Project: TinkerPop
  Issue Type: Bug
Affects Versions: 3.3.4
Reporter: Artem Aliev


PersistedOutputRDD does not actually persist the RDD in Spark memory; it only 
marks it for lazy caching in the future. Caching appears to have been eager in 
Spark 1.6, but in Spark 2.0 it is lazy.
Lazy caching is wrong for this case: the source graph could be changed after 
the snapshot is created, and the snapshot should not be affected by those changes.

The fix itself is simple: PersistedOutputRDD should call any Spark action, for 
example count(), to trigger eager caching.
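The lazy-vs-eager hazard can be sketched in plain Java (an analogy, not Spark code: the Supplier stands in for an unmaterialised RDD lineage, and the eager copy for persist() followed by a count() action):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

public class LazySnapshot {
    public static void main(String[] args) {
        List<Integer> source = new ArrayList<>(List.of(1, 2, 3));

        // "Lazy cache": nothing is copied until the snapshot is first read,
        // so it still observes later changes to the source.
        Supplier<List<Integer>> lazySnapshot = () -> new ArrayList<>(source);

        // "Eager cache": forcing materialisation now fixes the contents,
        // analogous to running an action such as count() after persist().
        List<Integer> eagerSnapshot = new ArrayList<>(source);

        source.add(4); // the source graph changes after the snapshot was taken

        System.out.println(lazySnapshot.get());  // [1, 2, 3, 4] -- polluted
        System.out.println(eagerSnapshot);       // [1, 2, 3]    -- stable
    }
}
```

The same reasoning explains why a count() in PersistedOutputRDD would pin the snapshot before the source can change.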



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (TINKERPOP-1871) Exception handling is slow in ReferenceElement creation

2018-01-16 Thread Artem Aliev (JIRA)
Artem Aliev created TINKERPOP-1871:
--

 Summary: Exception handling is slow in ReferenceElement creation
 Key: TINKERPOP-1871
 URL: https://issues.apache.org/jira/browse/TINKERPOP-1871
 Project: TinkerPop
  Issue Type: Improvement
Affects Versions: 3.3.1
Reporter: Artem Aliev


The following exception occurs for each vertex in OLAP and takes ~10% of 
execution time.

[https://github.com/apache/tinkerpop/blob/master/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/structure/util/reference/ReferenceElement.java#L48]

The exception is always thrown for the ComputerGraph.ComputerAdjacentVertex class, 
so a type check could be added to improve performance. This is a 3.3.x-only issue; 
3.2 does not have this problem.
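An illustrative sketch of the proposed check (the interface and method names are assumptions, not the actual TinkerPop code): exception-driven control flow pays for a stack-trace fill on every element, while an instanceof test does not:

```java
public class ReferenceElementSketch {
    public interface Attachable { Object get(); }

    // Slow variant: relies on a ClassCastException for the common case,
    // so every plain element pays for an exception throw.
    public static Object unwrapViaException(Object element) {
        try {
            return ((Attachable) element).get();
        } catch (ClassCastException e) {  // thrown for every non-Attachable element
            return element;
        }
    }

    // Fast variant: a type check, no exception on the common path.
    public static Object unwrapViaCheck(Object element) {
        return (element instanceof Attachable) ? ((Attachable) element).get() : element;
    }

    public static void main(String[] args) {
        Attachable wrapped = () -> "v[1]";
        System.out.println(unwrapViaCheck(wrapped));   // v[1]
        System.out.println(unwrapViaCheck("plain"));   // plain
    }
}
```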

 





[jira] [Updated] (TINKERPOP-1870) n^2 synchronous operation in OLAP WorkerExecutor.execute() method

2018-01-16 Thread Artem Aliev (JIRA)

 [ 
https://issues.apache.org/jira/browse/TINKERPOP-1870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Aliev updated TINKERPOP-1870:
---
Affects Version/s: 3.2.7
   3.3.1

> n^2 synchronous operation in OLAP WorkerExecutor.execute() method
> --
>
> Key: TINKERPOP-1870
> URL: https://issues.apache.org/jira/browse/TINKERPOP-1870
> Project: TinkerPop
>  Issue Type: Improvement
>Affects Versions: 3.2.7, 3.3.1
>Reporter: Artem Aliev
>Priority: Major
> Attachments: findTraverser1.png, findTraverser2.png, 
> findTraverserFixed.png
>
>
> [https://github.com/apache/tinkerpop/blob/master/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/computer/traversal/WorkerExecutor.java#L80-L93]
> This block of code iterates over all remote traversers to select the one 
> related to the current vertex and remove it. The operation is repeated for the 
> next vertex and so on. For the following example query this means n^2 
> operations (n is the number of vertices), all inside a synchronized block, so 
> a multi-core Spark executor performs them serially. 
> {code}
> g.V().emit().repeat(both().dedup()).count().next()
> {code}
> See jvisualvm screenshot. 





[jira] [Commented] (TINKERPOP-1870) n^2 synchronous operation in OLAP WorkerExecutor.execute() method

2018-01-16 Thread Artem Aliev (JIRA)

[ 
https://issues.apache.org/jira/browse/TINKERPOP-1870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16326941#comment-16326941
 ] 

Artem Aliev commented on TINKERPOP-1870:


I wrapped the block into a findVertexTraverser() method to see its timing in the 
profiler; see the attached profiler screenshots.

It takes 20-30% of execution time on a single 6-core executor. Performance 
was greatly improved on my 10k-vertex graph:

Before fix:
{code}

gremlin> g.V().count()
==>1
gremlin> g.E().count()
==>16

gremlin> clock(1) {g.V().emit().repeat(both().dedup()).count().next()}
==>52349.640981
gremlin> clock(1) {g.V().emit().repeat(both().dedup()).count().next()}
==>53800.89875495
gremlin> clock(1) {g.V().emit().repeat(both().dedup()).count().next()}
==>50643.744645

{code}

After fix:
{code}

gremlin> clock(1) {g.V().emit().repeat(both().dedup()).count().next()}
==>42062.945477
gremlin> clock(1) {g.V().emit().repeat(both().dedup()).count().next()}
==>38419.46317196
gremlin> clock(1) {g.V().emit().repeat(both().dedup()).count().next()}
==>34336.707208

{code}

{code}
>mvn clean install
[INFO] BUILD SUCCESS
{code}

> n^2 synchronous operation in OLAP WorkerExecutor.execute() method
> --
>
> Key: TINKERPOP-1870
> URL: https://issues.apache.org/jira/browse/TINKERPOP-1870
> Project: TinkerPop
>  Issue Type: Improvement
>Reporter: Artem Aliev
>Priority: Major
> Attachments: findTraverser1.png, findTraverser2.png, 
> findTraverserFixed.png
>
>
> [https://github.com/apache/tinkerpop/blob/master/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/computer/traversal/WorkerExecutor.java#L80-L93]
> This block of code iterates over all remote traversers to select the one 
> related to the current vertex and remove it. The operation is repeated for the 
> next vertex and so on. For the following example query this means n^2 
> operations (n is the number of vertices), all inside a synchronized block, so 
> a multi-core Spark executor performs them serially. 
> {code}
> g.V().emit().repeat(both().dedup()).count().next()
> {code}
> See jvisualvm screenshot. 





[jira] [Updated] (TINKERPOP-1870) n^2 synchronous operation in OLAP WorkerExecutor.execute() method

2018-01-16 Thread Artem Aliev (JIRA)

 [ 
https://issues.apache.org/jira/browse/TINKERPOP-1870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Aliev updated TINKERPOP-1870:
---
Attachment: findTraverserFixed.png
findTraverser2.png
findTraverser1.png

> n^2 synchronous operation in OLAP WorkerExecutor.execute() method
> --
>
> Key: TINKERPOP-1870
> URL: https://issues.apache.org/jira/browse/TINKERPOP-1870
> Project: TinkerPop
>  Issue Type: Improvement
>Reporter: Artem Aliev
>Priority: Major
> Attachments: findTraverser1.png, findTraverser2.png, 
> findTraverserFixed.png
>
>
> [https://github.com/apache/tinkerpop/blob/master/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/computer/traversal/WorkerExecutor.java#L80-L93]
> This block of code iterates over all remote traversers to select the one 
> related to the current vertex and remove it. The operation is repeated for the 
> next vertex and so on. For the following example query this means n^2 
> operations (n is the number of vertices), all inside a synchronized block, so 
> a multi-core Spark executor performs them serially. 
> {code}
> g.V().emit().repeat(both().dedup()).count().next()
> {code}
> See jvisualvm screenshot. 





[jira] [Commented] (TINKERPOP-1870) n^2 synchronous operation in OLAP WorkerExecutor.execute() method

2018-01-15 Thread Artem Aliev (JIRA)

[ 
https://issues.apache.org/jira/browse/TINKERPOP-1870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16326356#comment-16326356
 ] 

Artem Aliev commented on TINKERPOP-1870:


The fix I provided could be simplified. VertexRemoteSet extends RemoteSet for 
backward compatibility, in case someone uses it directly in VertexPrograms. If 
it is a fully internal structure, it could become a simple synchronized 
multi-value hash map. The map preserves traverser order for each vertex.
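A sketch of that simplification (an illustration under assumed names, not the actual TinkerPop fix): a synchronized multi-value map keyed by vertex replaces the per-vertex linear scan with an O(1) lookup while preserving per-vertex traverser order:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class TraverserIndex {
    private final Map<Object, List<Object>> byVertex = new ConcurrentHashMap<>();

    public void add(Object vertexId, Object traverser) {
        byVertex.computeIfAbsent(vertexId,
                v -> Collections.synchronizedList(new ArrayList<>()))
            .add(traverser);                      // insertion order kept per vertex
    }

    // Remove and return all traversers for one vertex in a single lookup,
    // instead of scanning the whole remote set inside a synchronized block.
    public List<Object> drain(Object vertexId) {
        List<Object> list = byVertex.remove(vertexId);
        return list == null ? List.of() : list;
    }

    public static void main(String[] args) {
        TraverserIndex index = new TraverserIndex();
        index.add("v1", "t1");
        index.add("v2", "t2");
        index.add("v1", "t3");
        System.out.println(index.drain("v1"));   // [t1, t3]
        System.out.println(index.drain("v1"));   // []
    }
}
```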

> n^2 synchronous operation in OLAP WorkerExecutor.execute() method
> --
>
> Key: TINKERPOP-1870
> URL: https://issues.apache.org/jira/browse/TINKERPOP-1870
> Project: TinkerPop
>  Issue Type: Improvement
>Reporter: Artem Aliev
>Priority: Major
>
> [https://github.com/apache/tinkerpop/blob/master/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/computer/traversal/WorkerExecutor.java#L80-L93]
> This block of code iterates over all remote traversers to select the one 
> related to the current vertex and remove it. The operation is repeated for the 
> next vertex and so on. For the following example query this means n^2 
> operations (n is the number of vertices), all inside a synchronized block, so 
> a multi-core Spark executor performs them serially. 
> {code}
> g.V().emit().repeat(both().dedup()).count().next()
> {code}
> See jvisualvm screenshot. 





[jira] [Created] (TINKERPOP-1870) n^2 synchronous operation in OLAP WorkerExecutor.execute() method

2018-01-15 Thread Artem Aliev (JIRA)
Artem Aliev created TINKERPOP-1870:
--

 Summary: n^2 synchronous operation in OLAP 
WorkerExecutor.execute() method
 Key: TINKERPOP-1870
 URL: https://issues.apache.org/jira/browse/TINKERPOP-1870
 Project: TinkerPop
  Issue Type: Improvement
Reporter: Artem Aliev


[https://github.com/apache/tinkerpop/blob/master/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/computer/traversal/WorkerExecutor.java#L80-L93]

This block of code iterates over all remote traversers to select the one related 
to the current vertex and remove it. The operation is repeated for the next 
vertex and so on. For the following example query this means n^2 operations (n 
is the number of vertices), all inside a synchronized block, so a multi-core 
Spark executor performs them serially. 
{code}
g.V().emit().repeat(both().dedup()).count().next()
{code}

See jvisualvm screenshot. 





[jira] [Commented] (TINKERPOP-1801) OLAP profile() step returns incorrect timing

2017-10-17 Thread Artem Aliev (JIRA)

[ 
https://issues.apache.org/jira/browse/TINKERPOP-1801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16208065#comment-16208065
 ] 

Artem Aliev commented on TINKERPOP-1801:


That is a simple way to fix it, without a new API. Let's discuss better approaches.
I did not add new tests; I found a set of them in the test suite. I still have my 
own test, but it is unstable because of timings.


> OLAP profile() step returns incorrect timing
> 
>
> Key: TINKERPOP-1801
> URL: https://issues.apache.org/jira/browse/TINKERPOP-1801
> Project: TinkerPop
>  Issue Type: Bug
>Affects Versions: 3.3.0, 3.2.6
>Reporter: Artem Aliev
>
> The Graph ProfileStep measures the time spent in next()/hasNext() calls, 
> expecting recursion.
> But GraphComputer uses message passing/RDD joins, so next() does not 
> recursively call the next steps; instead a message is generated, and most of 
> the time is taken by message passing (the RDD join). 
> Thus on the graph computer the time between ProfileSteps should be measured, 
> not inside them.
> Another approach is to collect Spark statistics with a SparkListener and add 
> Spark stage timings to the profiler metrics; that would work only for Spark 
> but would give a better representation of step costs.
> The simple fix is to measure the time between OLAP iterations and add it to 
> the profiler step.
> This does not take computer setup time into account, but is precise enough 
> for long-running queries.
> To reproduce:
> tinkerPop 3.2.6 gremlin:
> {code}
> plugin activated: tinkerpop.server
> plugin activated: tinkerpop.utilities
> plugin activated: tinkerpop.spark
> plugin activated: tinkerpop.tinkergraph
> gremlin> graph = GraphFactory.open('conf/hadoop/hadoop-grateful-gryo.properties')
> gremlin> g = graph.traversal().withComputer(SparkGraphComputer)
> ==>graphtraversalsource[hadoopgraph[gryoinputformat->gryooutputformat], sparkgraphcomputer]
> gremlin> g.V().out().out().count().profile()
> ==>Traversal Metrics
> Step                                   Count  Traversers   Time (ms)    % Dur
> =============================================================================
> GraphStep(vertex,[])                     808         808       2.025    18.35
> VertexStep(OUT,vertex)                  8049         562       4.430    40.14
> VertexStep(OUT,edge)                  327370        7551       4.581    41.50
> CountGlobalStep                            1           1       0.001     0.01
>                             >TOTAL         -           -      11.038        -
> gremlin> clock(1){g.V().out().out().count().next() }
> ==>3421.92758
> gremlin>
> {code}





[jira] [Created] (TINKERPOP-1801) OLAP profile() step returns incorrect timing

2017-10-17 Thread Artem Aliev (JIRA)
Artem Aliev created TINKERPOP-1801:
--

 Summary: OLAP profile() step returns incorrect timing
 Key: TINKERPOP-1801
 URL: https://issues.apache.org/jira/browse/TINKERPOP-1801
 Project: TinkerPop
  Issue Type: Bug
Affects Versions: 3.2.6, 3.3.0
Reporter: Artem Aliev


The Graph ProfileStep measures the time spent in next()/hasNext() calls, 
expecting recursion.
But GraphComputer uses message passing/RDD joins, so next() does not recursively 
call the next steps; instead a message is generated, and most of the time is 
taken by message passing (the RDD join). 
Thus on the graph computer the time between ProfileSteps should be measured, 
not inside them.

Another approach is to collect Spark statistics with a SparkListener and add 
Spark stage timings to the profiler metrics; that would work only for Spark but 
would give a better representation of step costs.
The simple fix is to measure the time between OLAP iterations and add it to the 
profiler step.
This does not take computer setup time into account, but is precise enough 
for long-running queries.
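A minimal sketch of that simple fix (class and method names are assumptions, not the actual TinkerPop profiler code): record wall-clock time at OLAP iteration boundaries and credit it to the profiler, rather than timing next()/hasNext() inside each step, which misses the RDD-join cost:

```java
public class IterationTimer {
    private long lastBoundary = -1;
    private long totalNanos = 0;

    // Call at the end of every OLAP iteration (e.g. from the master compute loop).
    public void markIterationBoundary() {
        long now = System.nanoTime();
        if (lastBoundary >= 0) totalNanos += now - lastBoundary;  // time spent in this round
        lastBoundary = now;
    }

    public double elapsedMillis() { return totalNanos / 1_000_000.0; }

    public static void main(String[] args) throws InterruptedException {
        IterationTimer timer = new IterationTimer();
        timer.markIterationBoundary();
        Thread.sleep(20);                 // stands in for one message-passing round
        timer.markIterationBoundary();
        System.out.println(timer.elapsedMillis() >= 15);  // true
    }
}
```

As the issue notes, this approach ignores computer setup time but captures the dominant message-passing cost.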

To reproduce:
tinkerPop 3.2.6 gremlin:

{code}
plugin activated: tinkerpop.server
plugin activated: tinkerpop.utilities
plugin activated: tinkerpop.spark
plugin activated: tinkerpop.tinkergraph
gremlin> graph = GraphFactory.open('conf/hadoop/hadoop-grateful-gryo.properties')
gremlin> g = graph.traversal().withComputer(SparkGraphComputer)
==>graphtraversalsource[hadoopgraph[gryoinputformat->gryooutputformat], sparkgraphcomputer]
gremlin> g.V().out().out().count().profile()
==>Traversal Metrics
Step                                   Count  Traversers   Time (ms)    % Dur
=============================================================================
GraphStep(vertex,[])                     808         808       2.025    18.35
VertexStep(OUT,vertex)                  8049         562       4.430    40.14
VertexStep(OUT,edge)                  327370        7551       4.581    41.50
CountGlobalStep                            1           1       0.001     0.01
                            >TOTAL         -           -      11.038        -
gremlin> clock(1){g.V().out().out().count().next() }
==>3421.92758
gremlin>
{code}





[jira] [Commented] (TINKERPOP-1783) PageRank gives incorrect results for graphs with sinks

2017-09-15 Thread Artem Aliev (JIRA)

[ 
https://issues.apache.org/jira/browse/TINKERPOP-1783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16167499#comment-16167499
 ] 

Artem Aliev commented on TINKERPOP-1783:


The workaround I proposed is incorrect.
The correct behaviour is that the user jumps to a random vertex from the sink vertex.

> PageRank gives incorrect results for graphs with sinks
> --
>
> Key: TINKERPOP-1783
> URL: https://issues.apache.org/jira/browse/TINKERPOP-1783
> Project: TinkerPop
>  Issue Type: Bug
>  Components: process
>Affects Versions: 3.3.0, 3.1.8, 3.2.6
>Reporter: Artem Aliev
>
> {quote} Sink vertices (those with no outgoing edges) should evenly distribute 
> their rank to the entire graph but in the current implementation it is just 
> lost.
> {quote} 
> Wiki: https://en.wikipedia.org/wiki/PageRank#Simplified_algorithm
> {quote}  In the original form of PageRank, the sum of PageRank over all pages 
> was the total number of pages on the web at that time
> {quote} 
> I found the issue while comparing results with Spark GraphX, so this is a 
> copy of https://issues.apache.org/jira/browse/SPARK-18847
> How to reproduce:
> {code}
> gremlin> graph = TinkerFactory.createModern()
> gremlin> g = graph.traversal().withComputer()
> gremlin> 
> g.V().pageRank(0.85).times(40).by('pageRank').values('pageRank').sum()
> ==>1.318625
> gremlin> g.V().pageRank(0.85).times(1).by('pageRank').values('pageRank').sum()
> ==>3.4497
> # initial values:
> gremlin> g.V().pageRank(0.85).times(0).by('pageRank').values('pageRank').sum()
> ==>6.0
> {code}
> They fixed the issue in GraphX by normalising values after each step.
> The other way to fix it is to have the sink vertex send the message to itself 
> (i.e. stay on the same page).
> To work around the problem, just add self-pointing edges:
> {code}
> gremlin> g.V().as('B').addE('knows').from('B')
> {code}
> Then you will always get the correct sum, but I'm not sure it is a proper 
> assumption. 





[jira] [Created] (TINKERPOP-1783) PageRank gives incorrect results for graphs with sinks

2017-09-14 Thread Artem Aliev (JIRA)
Artem Aliev created TINKERPOP-1783:
--

 Summary: PageRank gives incorrect results for graphs with sinks
 Key: TINKERPOP-1783
 URL: https://issues.apache.org/jira/browse/TINKERPOP-1783
 Project: TinkerPop
  Issue Type: Bug
Affects Versions: 3.2.6, 3.1.8, 3.3.0
Reporter: Artem Aliev


{quote} Sink vertices (those with no outgoing edges) should evenly distribute 
their rank to the entire graph but in the current implementation it is just 
lost.
{quote} 

Wiki: https://en.wikipedia.org/wiki/PageRank#Simplified_algorithm
{quote}  In the original form of PageRank, the sum of PageRank over all pages 
was the total number of pages on the web at that time
{quote} 

I found the issue while comparing results with Spark GraphX, so this is a copy 
of https://issues.apache.org/jira/browse/SPARK-18847

How to reproduce:
{code}
gremlin> graph = TinkerFactory.createModern()
gremlin> g = graph.traversal().withComputer()
gremlin> g.V().pageRank(0.85).times(40).by('pageRank').values('pageRank').sum()
==>1.318625
gremlin> g.V().pageRank(0.85).times(1).by('pageRank').values('pageRank').sum()
==>3.4497
# initial values:
gremlin> g.V().pageRank(0.85).times(0).by('pageRank').values('pageRank').sum()
==>6.0
{code}

They fixed the issue in GraphX by normalising values after each step.
The other way to fix it is to have the sink vertex send the message to itself 
(i.e. stay on the same page).
To work around the problem, just add self-pointing edges:
{code}
gremlin> g.V().as('B').addE('knows').from('B')
{code}
Then you will always get the correct sum, but I'm not sure it is a proper assumption. 
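The sink-redistribution idea can be sketched as a tiny standalone PageRank (an illustration only, not TinkerPop's PageRankVertexProgram; the unnormalised convention where total rank equals the vertex count is assumed): evenly spreading each sink's rank keeps the total constant across iterations:

```java
import java.util.Arrays;

public class SinkAwarePageRank {
    public static double[] rank(int[][] out, double alpha, int iters) {
        int n = out.length;
        double[] rank = new double[n];
        Arrays.fill(rank, 1.0);                       // total rank starts at n
        for (int iter = 0; iter < iters; iter++) {
            double[] next = new double[n];
            double sinkMass = 0;
            for (int v = 0; v < n; v++) {
                if (out[v].length == 0) sinkMass += rank[v];  // rank that would be lost
                else for (int w : out[v]) next[w] += rank[v] / out[v].length;
            }
            for (int v = 0; v < n; v++)
                next[v] = (1 - alpha)                         // teleport term
                        + alpha * (next[v] + sinkMass / n);   // sink mass spread evenly
            rank = next;
        }
        return rank;
    }

    public static void main(String[] args) {
        // adjacency: vertex -> out-neighbours; vertex 2 is a sink
        int[][] out = { {1, 2}, {2}, {} };
        double total = Arrays.stream(rank(out, 0.85, 40)).sum();
        System.out.println(total);   // stays ~n = 3.0, nothing is lost
    }
}
```

Without the `sinkMass / n` term the total shrinks every iteration, which matches the decaying sums (6.0, 3.4497, 1.318625) shown above.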









[jira] [Created] (TINKERPOP-1754) Spark can not deserialise some ScriptRecordReader parse exceptions

2017-08-18 Thread Artem Aliev (JIRA)
Artem Aliev created TINKERPOP-1754:
--

 Summary: Spark can not deserialise some ScriptRecordReader parse 
exceptions
 Key: TINKERPOP-1754
 URL: https://issues.apache.org/jira/browse/TINKERPOP-1754
 Project: TinkerPop
  Issue Type: Bug
  Components: hadoop
Affects Versions: 3.3.0
Reporter: Artem Aliev
Priority: Minor


ScriptException refers to a Groovy exception that can point to a "Script" class 
that is not available to the system class loader. Spark cannot deserialise the 
exception, so the user never gets the parse error.
To fix the problem, ScriptRecordReader should not try to propagate the whole 
cause chain but only the message with the parse error.
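A hedged sketch of the proposed fix (SafeParseError and sanitise are invented names, not the actual ScriptRecordReader code): flatten the cause chain into the message and drop the exception objects, so nothing non-deserialisable crosses the wire:

```java
import java.io.IOException;

public class SafeParseError {
    // Flatten the whole cause chain into plain text; drop the objects, since a
    // cause may reference classes (e.g. a generated Groovy "Script1") that the
    // remote class loader does not have.
    public static IOException sanitise(Throwable parseFailure) {
        StringBuilder msg = new StringBuilder();
        for (Throwable t = parseFailure; t != null; t = t.getCause())
            msg.append(t).append("; ");
        return new IOException(msg.toString());  // no cause attached
    }

    public static void main(String[] args) {
        Exception groovyish = new RuntimeException("unable to parse line 7",
                new ClassNotFoundException("Script1"));
        IOException safe = sanitise(groovyish);
        System.out.println(safe.getMessage());
        System.out.println(safe.getCause());  // null -- nothing left to break deserialisation
    }
}
```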

Spark output:
{code}
WARN  [task-result-getter-0] 2017-08-16 11:11:41,777  TaskEndReason.scala:192 - Task exception could not be deserialized
java.lang.ClassNotFoundException: Script1
	at java.net.URLClassLoader.findClass(URLClassLoader.java:381) ~[na:1.8.0_40]
	at java.lang.ClassLoader.loadClass(ClassLoader.java:424) ~[na:1.8.0_40]
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331) ~[na:1.8.0_40]
	at java.lang.ClassLoader.loadClass(ClassLoader.java:357) ~[na:1.8.0_40]
	at java.lang.Class.forName0(Native Method) ~[na:1.8.0_40]
	at java.lang.Class.forName(Class.java:348) ~[na:1.8.0_40]
	at org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:67) ~[spark-core_2.11-2.2.0.0-bb4c2a9.jar:2.2.0.0-bb4c2a9]
	at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1613) [na:1.8.0_40]
	at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1518) [na:1.8.0_40]
	at java.io.ObjectInputStream.readClass(ObjectInputStream.java:1484) [na:1.8.0_40]
	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1334) [na:1.8.0_40]
	at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1993) [na:1.8.0_40]
	at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1918) [na:1.8.0_40]
	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801) [na:1.8.0_40]
	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351) [na:1.8.0_40]
	at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1993) [na:1.8.0_40]
	at java.io.ObjectInputStream.defaultReadObject(ObjectInputStream.java:501) [na:1.8.0_40]
	at java.lang.Throwable.readObject(Throwable.java:914) ~[na:1.8.0_40]
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[na:1.8.0_40]
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[na:1.8.0_40]
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:1.8.0_40]
	at java.lang.reflect.Method.invoke(Method.java:497) ~[na:1.8.0_40]
	at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017) [na:1.8.0_40]
	at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1896) [na:1.8.0_40]
	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801) [na:1.8.0_40]
	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351) [na:1.8.0_40]
	at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1993) [na:1.8.0_40]
	at java.io.ObjectInputStream.defaultReadObject(ObjectInputStream.java:501) [na:1.8.0_40]
	at java.lang.Throwable.readObject(Throwable.java:914) ~[na:1.8.0_40]
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[na:1.8.0_40]
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[na:1.8.0_40]
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:1.8.0_40]
	at java.lang.reflect.Method.invoke(Method.java:497) ~[na:1.8.0_40]
	at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017) [na:1.8.0_40]
	at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1896) [na:1.8.0_40]
	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801) [na:1.8.0_40]
	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351) [na:1.8.0_40]
	at java.io.ObjectInputStream.readObject(ObjectInputStream.java:371) [na:1.8.0_40]
	at org.apache.spark.ThrowableSerializationWrapper.readObject(TaskEndReason.scala:193) ~[spark-core_2.11-2.2.0.0-bb4c2a9.jar:2.2.0.0-bb4c2a9]
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[na:1.8.0_40]
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[na:1.8.0_40]
	at 

[jira] [Commented] (TINKERPOP-1271) SparkContext should be restarted if Killed and using Persistent Context

2017-02-03 Thread Artem Aliev (JIRA)

[ 
https://issues.apache.org/jira/browse/TINKERPOP-1271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15851300#comment-15851300
 ] 

Artem Aliev commented on TINKERPOP-1271:


mvn clean install -DskipIntegrationTests=false passed, but I see the following 
output; is it OK?
{code}
...
Running org.apache.tinkerpop.gremlin.spark.SparkGremlinGryoSerializerTest
[ERROR] org.apache.tinkerpop.gremlin.AbstractGremlinSuite - The 
SparkGremlinSuite will run for this Graph as it is testing a Gremlin flavor but 
the Graph does not publicly acknowledged it yet with the @OptIn annotation.
Tests run: 7, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 13.273 sec - in 
org.apache.tinkerpop.gremlin.spark.SparkGremlinGryoSerializerTest
...
{code}

{code}
...
[WARN] org.apache.tinkerpop.gremlin.hadoop.groovy.plugin.HadoopGremlinPlugin - 
Be sure to set the environmental variable: HADOOP_GREMLIN_LIBS
[WARN] 
org.apache.tinkerpop.gremlin.hadoop.process.computer.AbstractHadoopGraphComputer
 - 
/Users/artemaliev/git/tinkerpop.ali/spark-gremlin/target/test-case-data/HadoopGremlinPluginCheck/shouldGracefullyHandleBadGremlinHadoopLibs/
 does not reference a valid directory -- proceeding regardless
...
{code}


> SparkContext should be restarted if Killed and using Persistent Context
> ---
>
> Key: TINKERPOP-1271
> URL: https://issues.apache.org/jira/browse/TINKERPOP-1271
> Project: TinkerPop
>  Issue Type: Bug
>  Components: hadoop
>Affects Versions: 3.2.0-incubating, 3.1.2-incubating
>Reporter: Russell Spitzer
>
> If the persisted Spark Context is killed by the user via the Spark UI or is 
> terminated for some other error the Gremlin Console/Server is left with a 
> stopped Spark Context. This could be caught and the spark context recreated. 
> Oddly enough, if you simply wait, the context will "reset" itself or possibly 
> get GC'd out of the system, and everything works again. 
> ## Repro
> {code}
> gremlin> g.V().count()
> WARN  org.apache.tinkerpop.gremlin.spark.process.computer.SparkGraphComputer  
> - HADOOP_GREMLIN_LIBS is not set -- proceeding regardless
> ==>6
> gremlin> ERROR org.apache.spark.scheduler.cluster.SparkDeploySchedulerBackend 
>  - Application has been killed. Reason: Master removed our application: KILLED
> ERROR org.apache.spark.scheduler.TaskSchedulerImpl  - Lost executor 0 on 
> 10.150.0.180: Remote RPC client disassociated. Likely due to containers 
> exceeding thresholds, or network issues. Check driver logs for WARN messages.
> // Driver has been killed here via the Master UI
> gremlin> graph = GraphFactory.open('conf/hadoop/hadoop-gryo.properties')
> ==>hadoopgraph[gryoinputformat->gryooutputformat]
> gremlin> g.V().count()
> WARN  org.apache.tinkerpop.gremlin.spark.process.computer.SparkGraphComputer  
> - HADOOP_GREMLIN_LIBS is not set -- proceeding regardless
> java.lang.IllegalStateException: Cannot call methods on a stopped 
> SparkContext.
> This stopped SparkContext was created at:
> org.apache.spark.SparkContext.getOrCreate(SparkContext.scala)
> org.apache.tinkerpop.gremlin.spark.structure.Spark.create(Spark.java:53)
> org.apache.tinkerpop.gremlin.spark.structure.io.SparkContextStorage.open(SparkContextStorage.java:60)
> org.apache.tinkerpop.gremlin.spark.process.computer.SparkGraphComputer.lambda$submitWithExecutor$1(SparkGraphComputer.java:122)
> java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> java.lang.Thread.run(Thread.java:745)
> The currently active SparkContext was created at:
> org.apache.spark.SparkContext.getOrCreate(SparkContext.scala)
> org.apache.tinkerpop.gremlin.spark.structure.Spark.create(Spark.java:53)
> org.apache.tinkerpop.gremlin.spark.structure.io.SparkContextStorage.open(SparkContextStorage.java:60)
> org.apache.tinkerpop.gremlin.spark.process.computer.SparkGraphComputer.lambda$submitWithExecutor$1(SparkGraphComputer.java:122)
> java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> java.lang.Thread.run(Thread.java:745)
> {code}
> Full trace from TP
> {code}
>   at 
> org.apache.spark.SparkContext.org$apache$spark$SparkContext$$assertNotStopped(SparkContext.scala:106)
>   at 
> org.apache.spark.SparkContext$$anonfun$newAPIHadoopRDD$1.apply(SparkContext.scala:1130)
>   at 
> org.apache.spark.SparkContext$$anonfun$newAPIHadoopRDD$1.apply(SparkContext.scala:1129)
>   at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
>   at 
> 

[jira] [Commented] (TINKERPOP-1271) SparkContext should be restarted if Killed and using Persistent Context

2017-02-02 Thread Artem Aliev (JIRA)

[ 
https://issues.apache.org/jira/browse/TINKERPOP-1271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15850209#comment-15850209
 ] 

Artem Aliev commented on TINKERPOP-1271:


"mvn install" tests passed
 I have test it manually on master with spark 2.0 and back ported SPARK-19362, 
to check stop works.


> SparkContext should be restarted if Killed and using Persistent Context
> ---
>
> Key: TINKERPOP-1271
> URL: https://issues.apache.org/jira/browse/TINKERPOP-1271
> Project: TinkerPop
>  Issue Type: Bug
>  Components: hadoop
>Affects Versions: 3.2.0-incubating, 3.1.2-incubating
>Reporter: Russell Spitzer
>
> If the persisted Spark Context is killed by the user via the Spark UI or is 
> terminated for some other error the Gremlin Console/Server is left with a 
> stopped Spark Context. This could be caught and the spark context recreated. 
> Oddly enough, if you simply wait, the context will "reset" itself or possibly 
> get GC'd out of the system, and everything works again. 
> ## Repro
> {code}
> gremlin> g.V().count()
> WARN  org.apache.tinkerpop.gremlin.spark.process.computer.SparkGraphComputer  
> - HADOOP_GREMLIN_LIBS is not set -- proceeding regardless
> ==>6
> gremlin> ERROR org.apache.spark.scheduler.cluster.SparkDeploySchedulerBackend 
>  - Application has been killed. Reason: Master removed our application: KILLED
> ERROR org.apache.spark.scheduler.TaskSchedulerImpl  - Lost executor 0 on 
> 10.150.0.180: Remote RPC client disassociated. Likely due to containers 
> exceeding thresholds, or network issues. Check driver logs for WARN messages.
> // Driver has been killed here via the Master UI
> gremlin> graph = GraphFactory.open('conf/hadoop/hadoop-gryo.properties')
> ==>hadoopgraph[gryoinputformat->gryooutputformat]
> gremlin> g.V().count()
> WARN  org.apache.tinkerpop.gremlin.spark.process.computer.SparkGraphComputer  
> - HADOOP_GREMLIN_LIBS is not set -- proceeding regardless
> java.lang.IllegalStateException: Cannot call methods on a stopped 
> SparkContext.
> This stopped SparkContext was created at:
> org.apache.spark.SparkContext.getOrCreate(SparkContext.scala)
> org.apache.tinkerpop.gremlin.spark.structure.Spark.create(Spark.java:53)
> org.apache.tinkerpop.gremlin.spark.structure.io.SparkContextStorage.open(SparkContextStorage.java:60)
> org.apache.tinkerpop.gremlin.spark.process.computer.SparkGraphComputer.lambda$submitWithExecutor$1(SparkGraphComputer.java:122)
[jira] [Commented] (TINKERPOP-1271) SparkContext should be restarted if Killed and using Persistent Context

2017-01-25 Thread Artem Aliev (JIRA)

[ 
https://issues.apache.org/jira/browse/TINKERPOP-1271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15837700#comment-15837700
 ] 

Artem Aliev commented on TINKERPOP-1271:


I filed a Spark bug for it, SPARK-19362, but then found it was already fixed by
https://issues.apache.org/jira/browse/SPARK-18751 in Spark 2.1.
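
The recovery the original report asks for ("This could be caught and the spark context recreated") can be sketched in outline. This is a hedged illustration, not TinkerPop's or Spark's actual code: `ContextGuard` and `FakeContext` are stand-ins invented here for a SparkContext-like handle, and a real implementation would use whatever liveness signal the SparkContext exposes rather than the `isStopped()` flag modeled below.

```java
// Hedged sketch of the suggested recovery pattern: cache a context handle,
// but check it before reuse and recreate it when it has been stopped
// externally. FakeContext is a stand-in for a SparkContext-like object.
import java.util.function.Supplier;

public class ContextGuard {
    /** Minimal stand-in for a SparkContext-like handle. */
    static final class FakeContext {
        private boolean stopped = false;
        void stop() { stopped = true; }           // e.g. killed via the Master UI
        boolean isStopped() { return stopped; }
    }

    private FakeContext cached;
    private final Supplier<FakeContext> factory;

    ContextGuard(Supplier<FakeContext> factory) {
        this.factory = factory;
    }

    /** Return the cached context, recreating it if it has been stopped. */
    FakeContext getOrRecreate() {
        if (cached == null || cached.isStopped()) {
            cached = factory.get();               // fresh context replaces the dead one
        }
        return cached;
    }

    public static void main(String[] args) {
        ContextGuard guard = new ContextGuard(FakeContext::new);
        FakeContext first = guard.getOrRecreate();
        first.stop();                             // simulate the external kill
        FakeContext second = guard.getOrRecreate();
        if (first == second || second.isStopped()) {
            throw new AssertionError("stopped context was not replaced");
        }
        System.out.println("recreated after stop: " + (first != second));
    }
}
```

The point of the guard is that the check happens on every access, so a kill that occurs between jobs is absorbed transparently instead of surfacing as "Cannot call methods on a stopped SparkContext" on the next traversal.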

> SparkContext should be restarted if Killed and using Persistent Context
> ---
>
> Key: TINKERPOP-1271
> URL: https://issues.apache.org/jira/browse/TINKERPOP-1271
> Project: TinkerPop
>  Issue Type: Bug
>  Components: hadoop
>Affects Versions: 3.2.0-incubating, 3.1.2-incubating
>Reporter: Russell Spitzer
>
> If the persisted SparkContext is killed by the user via the Spark UI or is 
> terminated by some other error, the Gremlin Console/Server is left with a 
> stopped SparkContext. This could be caught and the SparkContext recreated. 
> Oddly enough, if you simply wait, the context will "reset" itself or possibly 
> get GC'd out of the system and everything works again. 
> ## Repro
> {code}
> gremlin> g.V().count()
> WARN  org.apache.tinkerpop.gremlin.spark.process.computer.SparkGraphComputer  
> - HADOOP_GREMLIN_LIBS is not set -- proceeding regardless
> ==>6
> gremlin> ERROR org.apache.spark.scheduler.cluster.SparkDeploySchedulerBackend 
>  - Application has been killed. Reason: Master removed our application: KILLED
> ERROR org.apache.spark.scheduler.TaskSchedulerImpl  - Lost executor 0 on 
> 10.150.0.180: Remote RPC client disassociated. Likely due to containers 
> exceeding thresholds, or network issues. Check driver logs for WARN messages.
> // Driver has been killed here via the Master UI
> gremlin> graph = GraphFactory.open('conf/hadoop/hadoop-gryo.properties')
> ==>hadoopgraph[gryoinputformat->gryooutputformat]
> gremlin> g.V().count()
> WARN  org.apache.tinkerpop.gremlin.spark.process.computer.SparkGraphComputer  
> - HADOOP_GREMLIN_LIBS is not set -- proceeding regardless
> java.lang.IllegalStateException: Cannot call methods on a stopped 
> SparkContext.
> This stopped SparkContext was created at:
> org.apache.spark.SparkContext.getOrCreate(SparkContext.scala)
> org.apache.tinkerpop.gremlin.spark.structure.Spark.create(Spark.java:53)
> org.apache.tinkerpop.gremlin.spark.structure.io.SparkContextStorage.open(SparkContextStorage.java:60)
> org.apache.tinkerpop.gremlin.spark.process.computer.SparkGraphComputer.lambda$submitWithExecutor$1(SparkGraphComputer.java:122)
> java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> java.lang.Thread.run(Thread.java:745)
> The currently active SparkContext was created at:
> org.apache.spark.SparkContext.getOrCreate(SparkContext.scala)
> org.apache.tinkerpop.gremlin.spark.structure.Spark.create(Spark.java:53)
> org.apache.tinkerpop.gremlin.spark.structure.io.SparkContextStorage.open(SparkContextStorage.java:60)
> org.apache.tinkerpop.gremlin.spark.process.computer.SparkGraphComputer.lambda$submitWithExecutor$1(SparkGraphComputer.java:122)
> java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> java.lang.Thread.run(Thread.java:745)
> {code}
> Full trace from TP
> {code}
>   at 
> org.apache.spark.SparkContext.org$apache$spark$SparkContext$$assertNotStopped(SparkContext.scala:106)
>   at 
> org.apache.spark.SparkContext$$anonfun$newAPIHadoopRDD$1.apply(SparkContext.scala:1130)
>   at 
> org.apache.spark.SparkContext$$anonfun$newAPIHadoopRDD$1.apply(SparkContext.scala:1129)
>   at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
>   at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
>   at org.apache.spark.SparkContext.withScope(SparkContext.scala:714)
>   at 
> org.apache.spark.SparkContext.newAPIHadoopRDD(SparkContext.scala:1129)
>   at 
> org.apache.spark.api.java.JavaSparkContext.newAPIHadoopRDD(JavaSparkContext.scala:507)
>   at 
> org.apache.tinkerpop.gremlin.spark.structure.io.InputFormatRDD.readGraphRDD(InputFormatRDD.java:42)
>   at 
> org.apache.tinkerpop.gremlin.spark.process.computer.SparkGraphComputer.lambda$submitWithExecutor$1(SparkGraphComputer.java:195)
>   at 
> java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>