Re: ERROR Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.

2016-08-01 Thread Ted Yu
Have you seen the following ?
http://stackoverflow.com/questions/27553547/xloggc-not-creating-log-file-if-path-doesnt-exist-for-the-first-time
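
As a side note, the linked answer suggests the JVM will not create the -Xloggc
file if its parent directory does not already exist, and a relative path such
as ./jvm_gc.log resolves against each executor's working directory on the
worker nodes, not the machine the job was submitted from. A minimal sketch of
pointing the GC log at an absolute, pre-existing location (the /tmp path below
is only illustrative), set from PySpark rather than on the spark-submit line:

from pyspark import SparkConf, SparkContext

# Sketch only: write executor GC logs to an absolute path that already exists
# on every worker node. /tmp/executor_gc.log is an illustrative path; note
# that multiple executors on the same node would share (and overwrite) it.
conf = (SparkConf()
        .setAppName("gc-logging-sketch")
        .set("spark.executor.extraJavaOptions",
             "-XX:+UseG1GC -XX:+PrintGCDetails -XX:+PrintGCTimeStamps "
             "-XX:+PrintGCDateStamps -Xloggc:/tmp/executor_gc.log"))
sc = SparkContext(conf=conf)

The same value can equally be passed with --conf on the spark-submit command
line, as in the messages below.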

On Sat, Jul 23, 2016 at 5:18 PM, Ascot Moss  wrote:

> I tried to add -Xloggc:./jvm_gc.log
>
> --conf "spark.executor.extraJavaOptions=-XX:+UseG1GC -XX:+PrintGCDetails
> -XX:+PrintGCTimeStamps -Xloggc:./jvm_gc.log -XX:+PrintGCDateStamps"
>
> however, I could not find ./jvm_gc.log
>
> How can I resolve the OOM and GC log issues?
>
> Regards
>
> On Sun, Jul 24, 2016 at 6:37 AM, Ascot Moss  wrote:
>
>> My JDK is Java 1.8 u40
>>
>> On Sun, Jul 24, 2016 at 3:45 AM, Ted Yu  wrote:
>>
>>> Since you specified +PrintGCDetails, you should be able to get some
>>> more detail from the GC log.
>>>
>>> Also, which JDK version are you using ?
>>>
>>> Please use Java 8 where G1GC is more reliable.
>>>
>>> On Sat, Jul 23, 2016 at 10:38 AM, Ascot Moss 
>>> wrote:
>>>
 Hi,

 I added the following parameter:

 --conf "spark.executor.extraJavaOptions=-XX:+UseG1GC
 -XX:MaxGCPauseMillis=200 -XX:ParallelGCThreads=20 -XX:ConcGCThreads=5
 -XX:InitiatingHeapOccupancyPercent=70 -XX:+PrintGCDetails
 -XX:+PrintGCTimeStamps"

 Still got Java heap space error.

 Any idea how to resolve this? (my Spark is 1.6.1)


 16/07/23 23:31:50 WARN TaskSetManager: Lost task 1.0 in stage 6.0 (TID
 22, n1791): java.lang.OutOfMemoryError: Java heap space   at
 scala.reflect.ManifestFactory$$anon$12.newArray(Manifest.scala:138)

 at
 scala.reflect.ManifestFactory$$anon$12.newArray(Manifest.scala:136)

 at
 scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:248)

 at
 org.apache.spark.util.collection.CompactBuffer.toArray(CompactBuffer.scala:30)

 at
 org.apache.spark.mllib.tree.DecisionTree$.org$apache$spark$mllib$tree$DecisionTree$$findSplits$1(DecisionTree.scala:1009)
 at
 org.apache.spark.mllib.tree.DecisionTree$$anonfun$29.apply(DecisionTree.scala:1042)

 at
 org.apache.spark.mllib.tree.DecisionTree$$anonfun$29.apply(DecisionTree.scala:1042)

 at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)

 at scala.collection.Iterator$class.foreach(Iterator.scala:727)

 at
 scala.collection.AbstractIterator.foreach(Iterator.scala:1157)

 at
 scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)

 at
 scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)

 at
 scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)

 at 
 scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)

 at scala.collection.AbstractIterator.to(Iterator.scala:1157)

 at
 scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265)

 at
 scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157)

 at
 scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252)

 at
 scala.collection.AbstractIterator.toArray(Iterator.scala:1157)

 at
 org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$12.apply(RDD.scala:927)

 at
 org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$12.apply(RDD.scala:927)

 at
 org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1858)

 at
 org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1858)

 at
 org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)

 at org.apache.spark.scheduler.Task.run(Task.scala:89)

 at
 org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)

 at
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)

 at
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)

 at java.lang.Thread.run(Thread.java:745)

 Regards



 On Sat, Jul 23, 2016 at 9:49 AM, Ascot Moss 
 wrote:

> Thanks. Trying with extra conf now.
>
> On Sat, Jul 23, 2016 at 6:59 AM, RK Aduri 
> wrote:
>
>> I can see a large number of collections happening on the driver and,
>> eventually, the driver is running out of memory. (I am not sure whether you
>> have persisted any RDD or DataFrame.) You may want to avoid doing so many
>> collections, or persisting unwanted data in memory.
>>
>> To begin with, you may want to re-run the job with the following
>> config: --conf 

Re: ERROR Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.

2016-07-23 Thread Ascot Moss
I tried to add -Xloggc:./jvm_gc.log

--conf "spark.executor.extraJavaOptions=-XX:+UseG1GC -XX:+PrintGCDetails
-XX:+PrintGCTimeStamps -Xloggc:./jvm_gc.log -XX:+PrintGCDateStamps"

however, I could not find ./jvm_gc.log

How can I resolve the OOM and GC log issues?

Regards

On Sun, Jul 24, 2016 at 6:37 AM, Ascot Moss  wrote:

> My JDK is Java 1.8 u40
>
> On Sun, Jul 24, 2016 at 3:45 AM, Ted Yu  wrote:
>
>> Since you specified +PrintGCDetails, you should be able to get some more
>> detail from the GC log.
>>
>> Also, which JDK version are you using ?
>>
>> Please use Java 8 where G1GC is more reliable.
>>
>> On Sat, Jul 23, 2016 at 10:38 AM, Ascot Moss 
>> wrote:
>>
>>> Hi,
>>>
>>> I added the following parameter:
>>>
>>> --conf "spark.executor.extraJavaOptions=-XX:+UseG1GC
>>> -XX:MaxGCPauseMillis=200 -XX:ParallelGCThreads=20 -XX:ConcGCThreads=5
>>> -XX:InitiatingHeapOccupancyPercent=70 -XX:+PrintGCDetails
>>> -XX:+PrintGCTimeStamps"
>>>
>>> Still got Java heap space error.
>>>
>>> Any idea how to resolve this? (my Spark is 1.6.1)
>>>
>>>
>>> 16/07/23 23:31:50 WARN TaskSetManager: Lost task 1.0 in stage 6.0 (TID
>>> 22, n1791): java.lang.OutOfMemoryError: Java heap space   at
>>> scala.reflect.ManifestFactory$$anon$12.newArray(Manifest.scala:138)
>>>
>>> at
>>> scala.reflect.ManifestFactory$$anon$12.newArray(Manifest.scala:136)
>>>
>>> at
>>> scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:248)
>>>
>>> at
>>> org.apache.spark.util.collection.CompactBuffer.toArray(CompactBuffer.scala:30)
>>>
>>> at
>>> org.apache.spark.mllib.tree.DecisionTree$.org$apache$spark$mllib$tree$DecisionTree$$findSplits$1(DecisionTree.scala:1009)
>>> at
>>> org.apache.spark.mllib.tree.DecisionTree$$anonfun$29.apply(DecisionTree.scala:1042)
>>>
>>> at
>>> org.apache.spark.mllib.tree.DecisionTree$$anonfun$29.apply(DecisionTree.scala:1042)
>>>
>>> at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
>>>
>>> at scala.collection.Iterator$class.foreach(Iterator.scala:727)
>>>
>>> at
>>> scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
>>>
>>> at
>>> scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
>>>
>>> at
>>> scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)
>>>
>>> at
>>> scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)
>>>
>>> at 
>>> scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)
>>>
>>> at scala.collection.AbstractIterator.to(Iterator.scala:1157)
>>>
>>> at
>>> scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265)
>>>
>>> at
>>> scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157)
>>>
>>> at
>>> scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252)
>>>
>>> at
>>> scala.collection.AbstractIterator.toArray(Iterator.scala:1157)
>>>
>>> at
>>> org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$12.apply(RDD.scala:927)
>>>
>>> at
>>> org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$12.apply(RDD.scala:927)
>>>
>>> at
>>> org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1858)
>>>
>>> at
>>> org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1858)
>>>
>>> at
>>> org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
>>>
>>> at org.apache.spark.scheduler.Task.run(Task.scala:89)
>>>
>>> at
>>> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
>>>
>>> at
>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>>>
>>> at
>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>>>
>>> at java.lang.Thread.run(Thread.java:745)
>>>
>>> Regards
>>>
>>>
>>>
>>> On Sat, Jul 23, 2016 at 9:49 AM, Ascot Moss 
>>> wrote:
>>>
 Thanks. Trying with extra conf now.

 On Sat, Jul 23, 2016 at 6:59 AM, RK Aduri 
 wrote:

> I can see a large number of collections happening on the driver and,
> eventually, the driver is running out of memory. (I am not sure whether you
> have persisted any RDD or DataFrame.) You may want to avoid doing so many
> collections, or persisting unwanted data in memory.
>
> To begin with, you may want to re-run the job with the following
> config: --conf "spark.executor.extraJavaOptions=-XX:+UseG1GC
> -XX:+PrintGCDetails -XX:+PrintGCTimeStamps" and this will give you an
> idea of how you are getting the OOM.
>
>
> On Jul 22, 2016, at 3:52 PM, Ascot Moss  wrote:
>
> Hi
>
> Please help!
>
>  When running random forest training phase in cluster mode, I got GC
> overhead 

Re: ERROR Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.

2016-07-23 Thread Ascot Moss
My JDK is Java 1.8 u40

On Sun, Jul 24, 2016 at 3:45 AM, Ted Yu  wrote:

> Since you specified +PrintGCDetails, you should be able to get some more
> detail from the GC log.
>
> Also, which JDK version are you using ?
>
> Please use Java 8 where G1GC is more reliable.
>
> On Sat, Jul 23, 2016 at 10:38 AM, Ascot Moss  wrote:
>
>> Hi,
>>
>> I added the following parameter:
>>
>> --conf "spark.executor.extraJavaOptions=-XX:+UseG1GC
>> -XX:MaxGCPauseMillis=200 -XX:ParallelGCThreads=20 -XX:ConcGCThreads=5
>> -XX:InitiatingHeapOccupancyPercent=70 -XX:+PrintGCDetails
>> -XX:+PrintGCTimeStamps"
>>
>> Still got Java heap space error.
>>
>> Any idea how to resolve this? (my Spark is 1.6.1)
>>
>>
>> 16/07/23 23:31:50 WARN TaskSetManager: Lost task 1.0 in stage 6.0 (TID
>> 22, n1791): java.lang.OutOfMemoryError: Java heap space   at
>> scala.reflect.ManifestFactory$$anon$12.newArray(Manifest.scala:138)
>>
>> at
>> scala.reflect.ManifestFactory$$anon$12.newArray(Manifest.scala:136)
>>
>> at
>> scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:248)
>>
>> at
>> org.apache.spark.util.collection.CompactBuffer.toArray(CompactBuffer.scala:30)
>>
>> at
>> org.apache.spark.mllib.tree.DecisionTree$.org$apache$spark$mllib$tree$DecisionTree$$findSplits$1(DecisionTree.scala:1009)
>> at
>> org.apache.spark.mllib.tree.DecisionTree$$anonfun$29.apply(DecisionTree.scala:1042)
>>
>> at
>> org.apache.spark.mllib.tree.DecisionTree$$anonfun$29.apply(DecisionTree.scala:1042)
>>
>> at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
>>
>> at scala.collection.Iterator$class.foreach(Iterator.scala:727)
>>
>> at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
>>
>> at
>> scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
>>
>> at
>> scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)
>>
>> at
>> scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)
>>
>> at 
>> scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)
>>
>> at scala.collection.AbstractIterator.to(Iterator.scala:1157)
>>
>> at
>> scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265)
>>
>> at
>> scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157)
>>
>> at
>> scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252)
>>
>> at scala.collection.AbstractIterator.toArray(Iterator.scala:1157)
>>
>> at
>> org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$12.apply(RDD.scala:927)
>>
>> at
>> org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$12.apply(RDD.scala:927)
>>
>> at
>> org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1858)
>>
>> at
>> org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1858)
>>
>> at
>> org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
>>
>> at org.apache.spark.scheduler.Task.run(Task.scala:89)
>>
>> at
>> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
>>
>> at
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>>
>> at
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>>
>> at java.lang.Thread.run(Thread.java:745)
>>
>> Regards
>>
>>
>>
>> On Sat, Jul 23, 2016 at 9:49 AM, Ascot Moss  wrote:
>>
>>> Thanks. Trying with extra conf now.
>>>
>>> On Sat, Jul 23, 2016 at 6:59 AM, RK Aduri 
>>> wrote:
>>>
 I can see a large number of collections happening on the driver and,
 eventually, the driver is running out of memory. (I am not sure whether you
 have persisted any RDD or DataFrame.) You may want to avoid doing so many
 collections, or persisting unwanted data in memory.

 To begin with, you may want to re-run the job with the following
 config: --conf "spark.executor.extraJavaOptions=-XX:+UseG1GC
 -XX:+PrintGCDetails -XX:+PrintGCTimeStamps" and this will give you an
 idea of how you are getting the OOM.


 On Jul 22, 2016, at 3:52 PM, Ascot Moss  wrote:

 Hi

 Please help!

  When running random forest training phase in cluster mode, I got GC
 overhead limit exceeded.

 I have used two parameters when submitting the job to cluster

 --driver-memory 64g \

 --executor-memory 8g \

 My Current settings:

 (spark-defaults.conf)

 spark.executor.memory   8g

 (spark-env.sh)

 export SPARK_WORKER_MEMORY=8g

 export HADOOP_HEAPSIZE=8000


 Any idea how to resolve it?

 Regards






 ###  (the error log) ###

 16/07/23 

Re: ERROR Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.

2016-07-23 Thread Ted Yu
Since you specified +PrintGCDetails, you should be able to get some more
detail from the GC log.

Also, which JDK version are you using ?

Please use Java 8 where G1GC is more reliable.

On Sat, Jul 23, 2016 at 10:38 AM, Ascot Moss  wrote:

> Hi,
>
> I added the following parameter:
>
> --conf "spark.executor.extraJavaOptions=-XX:+UseG1GC
> -XX:MaxGCPauseMillis=200 -XX:ParallelGCThreads=20 -XX:ConcGCThreads=5
> -XX:InitiatingHeapOccupancyPercent=70 -XX:+PrintGCDetails
> -XX:+PrintGCTimeStamps"
>
> Still got Java heap space error.
>
> Any idea how to resolve this? (my Spark is 1.6.1)
>
>
> 16/07/23 23:31:50 WARN TaskSetManager: Lost task 1.0 in stage 6.0 (TID 22,
> n1791): java.lang.OutOfMemoryError: Java heap space   at
> scala.reflect.ManifestFactory$$anon$12.newArray(Manifest.scala:138)
>
> at
> scala.reflect.ManifestFactory$$anon$12.newArray(Manifest.scala:136)
>
> at
> scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:248)
>
> at
> org.apache.spark.util.collection.CompactBuffer.toArray(CompactBuffer.scala:30)
>
> at
> org.apache.spark.mllib.tree.DecisionTree$.org$apache$spark$mllib$tree$DecisionTree$$findSplits$1(DecisionTree.scala:1009)
> at
> org.apache.spark.mllib.tree.DecisionTree$$anonfun$29.apply(DecisionTree.scala:1042)
>
> at
> org.apache.spark.mllib.tree.DecisionTree$$anonfun$29.apply(DecisionTree.scala:1042)
>
> at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
>
> at scala.collection.Iterator$class.foreach(Iterator.scala:727)
> at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
>
> at
> scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
>
> at
> scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)
>
> at
> scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)
>
> at 
> scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)
>
> at scala.collection.AbstractIterator.to(Iterator.scala:1157)
>
> at
> scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265)
>
> at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157)
>
> at
> scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252)
>
> at scala.collection.AbstractIterator.toArray(Iterator.scala:1157)
>
> at
> org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$12.apply(RDD.scala:927)
>
> at
> org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$12.apply(RDD.scala:927)
>
> at
> org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1858)
>
> at
> org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1858)
>
> at
> org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
>
> at org.apache.spark.scheduler.Task.run(Task.scala:89)
>
> at
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
>
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>
> at java.lang.Thread.run(Thread.java:745)
>
> Regards
>
>
>
> On Sat, Jul 23, 2016 at 9:49 AM, Ascot Moss  wrote:
>
>> Thanks. Trying with extra conf now.
>>
>> On Sat, Jul 23, 2016 at 6:59 AM, RK Aduri 
>> wrote:
>>
>>> I can see a large number of collections happening on the driver and,
>>> eventually, the driver is running out of memory. (I am not sure whether you
>>> have persisted any RDD or DataFrame.) You may want to avoid doing so many
>>> collections, or persisting unwanted data in memory.
>>>
>>> To begin with, you may want to re-run the job with the following
>>> config: --conf "spark.executor.extraJavaOptions=-XX:+UseG1GC
>>> -XX:+PrintGCDetails -XX:+PrintGCTimeStamps" and this will give you an
>>> idea of how you are getting the OOM.
>>>
>>>
>>> On Jul 22, 2016, at 3:52 PM, Ascot Moss  wrote:
>>>
>>> Hi
>>>
>>> Please help!
>>>
>>>  When running random forest training phase in cluster mode, I got GC
>>> overhead limit exceeded.
>>>
>>> I have used two parameters when submitting the job to cluster
>>>
>>> --driver-memory 64g \
>>>
>>> --executor-memory 8g \
>>>
>>> My Current settings:
>>>
>>> (spark-defaults.conf)
>>>
>>> spark.executor.memory   8g
>>>
>>> (spark-env.sh)
>>>
>>> export SPARK_WORKER_MEMORY=8g
>>>
>>> export HADOOP_HEAPSIZE=8000
>>>
>>>
>>> Any idea how to resolve it?
>>>
>>> Regards
>>>
>>>
>>>
>>>
>>>
>>>
>>> ###  (the error log) ###
>>>
>>> 16/07/23 04:34:04 WARN TaskSetManager: Lost task 2.0 in stage 6.1 (TID
>>> 30, n1794): java.lang.OutOfMemoryError: GC overhead limit exceeded
>>>
>>> at
>>> scala.reflect.ManifestFactory$$anon$12.newArray(Manifest.scala:138)
>>>
>>> at
>>> 

Re: ERROR Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.

2016-07-23 Thread Ascot Moss
Hi,

I added the following parameter:

--conf "spark.executor.extraJavaOptions=-XX:+UseG1GC
-XX:MaxGCPauseMillis=200 -XX:ParallelGCThreads=20 -XX:ConcGCThreads=5
-XX:InitiatingHeapOccupancyPercent=70 -XX:+PrintGCDetails
-XX:+PrintGCTimeStamps"

Still got Java heap space error.

Any idea how to resolve this? (my Spark is 1.6.1)


16/07/23 23:31:50 WARN TaskSetManager: Lost task 1.0 in stage 6.0 (TID 22,
n1791): java.lang.OutOfMemoryError: Java heap space   at
scala.reflect.ManifestFactory$$anon$12.newArray(Manifest.scala:138)

at
scala.reflect.ManifestFactory$$anon$12.newArray(Manifest.scala:136)

at
scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:248)

at
org.apache.spark.util.collection.CompactBuffer.toArray(CompactBuffer.scala:30)

at
org.apache.spark.mllib.tree.DecisionTree$.org$apache$spark$mllib$tree$DecisionTree$$findSplits$1(DecisionTree.scala:1009)
at
org.apache.spark.mllib.tree.DecisionTree$$anonfun$29.apply(DecisionTree.scala:1042)

at
org.apache.spark.mllib.tree.DecisionTree$$anonfun$29.apply(DecisionTree.scala:1042)

at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)

at scala.collection.Iterator$class.foreach(Iterator.scala:727)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)

at
scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)

at
scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)

at
scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)

at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)

at scala.collection.AbstractIterator.to(Iterator.scala:1157)

at
scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265)

at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157)

at
scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252)

at scala.collection.AbstractIterator.toArray(Iterator.scala:1157)

at
org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$12.apply(RDD.scala:927)

at
org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$12.apply(RDD.scala:927)

at
org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1858)

at
org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1858)

at
org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)

at org.apache.spark.scheduler.Task.run(Task.scala:89)

at
org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)

at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)

at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)

at java.lang.Thread.run(Thread.java:745)

Regards
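
The stack trace points into DecisionTree.findSplits, which samples the input
and collects per-feature data back to compute split candidates; the size of
that sample grows with maxBins. One thing that may help, shown here only as a
sketch (the training-data path, numClasses and the tree parameters are
illustrative, not taken from the job above), is to persist the input and train
with a modest maxBins:

from pyspark import SparkContext, StorageLevel
from pyspark.mllib.util import MLUtils
from pyspark.mllib.tree import RandomForest

sc = SparkContext(appName="random-forest-sketch")

# Illustrative input path; replace with the real training set.
data = MLUtils.loadLibSVMFile(sc, "/path/to/training_data")
data.persist(StorageLevel.MEMORY_AND_DISK)

model = RandomForest.trainClassifier(
    data,
    numClasses=2,                  # illustrative
    categoricalFeaturesInfo={},
    numTrees=50,                   # illustrative
    featureSubsetStrategy="auto",
    impurity="gini",
    maxDepth=10,                   # illustrative
    maxBins=32)                    # fewer bins -> smaller sample in findSplits

If that is not enough, the other obvious lever is raising spark.executor.memory
above the 8g mentioned in the original post.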



On Sat, Jul 23, 2016 at 9:49 AM, Ascot Moss  wrote:

> Thanks. Trying with extra conf now.
>
> On Sat, Jul 23, 2016 at 6:59 AM, RK Aduri  wrote:
>
>> I can see a large number of collections happening on the driver and,
>> eventually, the driver is running out of memory. (I am not sure whether you
>> have persisted any RDD or DataFrame.) You may want to avoid doing so many
>> collections, or persisting unwanted data in memory.
>>
>> To begin with, you may want to re-run the job with the following config:
>> --conf "spark.executor.extraJavaOptions=-XX:+UseG1GC -XX:+PrintGCDetails
>> -XX:+PrintGCTimeStamps" and this will give you an idea of how you are
>> getting the OOM.
>>
>>
>> On Jul 22, 2016, at 3:52 PM, Ascot Moss  wrote:
>>
>> Hi
>>
>> Please help!
>>
>>  When running random forest training phase in cluster mode, I got GC
>> overhead limit exceeded.
>>
>> I have used two parameters when submitting the job to cluster
>>
>> --driver-memory 64g \
>>
>> --executor-memory 8g \
>>
>> My Current settings:
>>
>> (spark-defaults.conf)
>>
>> spark.executor.memory   8g
>>
>> (spark-env.sh)
>>
>> export SPARK_WORKER_MEMORY=8g
>>
>> export HADOOP_HEAPSIZE=8000
>>
>>
>> Any idea how to resolve it?
>>
>> Regards
>>
>>
>>
>>
>>
>>
>> ###  (the error log) ###
>>
>> 16/07/23 04:34:04 WARN TaskSetManager: Lost task 2.0 in stage 6.1 (TID
>> 30, n1794): java.lang.OutOfMemoryError: GC overhead limit exceeded
>>
>> at
>> scala.reflect.ManifestFactory$$anon$12.newArray(Manifest.scala:138)
>>
>> at
>> scala.reflect.ManifestFactory$$anon$12.newArray(Manifest.scala:136)
>>
>> at
>> org.apache.spark.util.collection.CompactBuffer.growToSize(CompactBuffer.scala:144)
>>
>> at
>> org.apache.spark.util.collection.CompactBuffer.$plus$plus$eq(CompactBuffer.scala:90)
>>
>> at
>> org.apache.spark.rdd.PairRDDFunctions$$anonfun$groupByKey$1$$anonfun$10.apply(PairRDDFunctions.scala:505)
>>
>> at
>> 

Re: ERROR Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.

2016-07-22 Thread RK Aduri
I can see a large number of collections happening on the driver and, eventually,
the driver is running out of memory. (I am not sure whether you have persisted
any RDD or DataFrame.) You may want to avoid doing so many collections, or
persisting unwanted data in memory.

To begin with, you may want to re-run the job with the following config:
--conf "spark.executor.extraJavaOptions=-XX:+UseG1GC -XX:+PrintGCDetails
-XX:+PrintGCTimeStamps" and this will give you an idea of how you are
getting the OOM.
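
What avoiding so many collections can look like in practice, as a rough sketch
only (the input and output paths below are hypothetical): keep reused data
persisted on the executors and bring back only small summaries to the driver
instead of collect()ing whole RDDs:

from pyspark import SparkContext, StorageLevel

sc = SparkContext(appName="avoid-collect-sketch")

rdd = sc.textFile("/path/to/input")         # hypothetical input path
rdd.persist(StorageLevel.MEMORY_AND_DISK)   # cache data reused across actions

total = rdd.count()                         # aggregate on the executors
sample = rdd.take(10)                       # pull back only a few records
rdd.saveAsTextFile("/path/to/output")       # hypothetical output path; write
                                            # results out instead of collecting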


> On Jul 22, 2016, at 3:52 PM, Ascot Moss  wrote:
> 
> Hi
> 
> Please help!
> 
>  When running random forest training phase in cluster mode, I got GC overhead 
> limit exceeded.
> 
> I have used two parameters when submitting the job to cluster
> --driver-memory 64g \
> 
> --executor-memory 8g \
> 
> 
> My Current settings:
> (spark-defaults.conf)
> 
> spark.executor.memory   8g
> 
> 
> (spark-env.sh)
> export SPARK_WORKER_MEMORY=8g
> 
> export HADOOP_HEAPSIZE=8000
> 
> 
> 
> Any idea how to resolve it?
> 
> Regards
> 
> 
> 
> 
> 
> 
> ###  (the error log) ###
> 16/07/23 04:34:04 WARN TaskSetManager: Lost task 2.0 in stage 6.1 (TID 30, 
> n1794): java.lang.OutOfMemoryError: GC overhead limit exceeded
> 
> at scala.reflect.ManifestFactory$$anon$12.newArray(Manifest.scala:138)
> 
> at scala.reflect.ManifestFactory$$anon$12.newArray(Manifest.scala:136)
> 
> at 
> org.apache.spark.util.collection.CompactBuffer.growToSize(CompactBuffer.scala:144)
> 
> at 
> org.apache.spark.util.collection.CompactBuffer.$plus$plus$eq(CompactBuffer.scala:90)
> 
> at 
> org.apache.spark.rdd.PairRDDFunctions$$anonfun$groupByKey$1$$anonfun$10.apply(PairRDDFunctions.scala:505)
> 
> at 
> org.apache.spark.rdd.PairRDDFunctions$$anonfun$groupByKey$1$$anonfun$10.apply(PairRDDFunctions.scala:505)
> 
> at 
> org.apache.spark.util.collection.ExternalAppendOnlyMap$ExternalIterator.mergeIfKeyExists(ExternalAppendOnlyMap.scala:318)
> 
> at 
> org.apache.spark.util.collection.ExternalAppendOnlyMap$ExternalIterator.next(ExternalAppendOnlyMap.scala:365)
> 
> at 
> org.apache.spark.util.collection.ExternalAppendOnlyMap$ExternalIterator.next(ExternalAppendOnlyMap.scala:265)
> 
> at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
> 
> at scala.collection.Iterator$class.foreach(Iterator.scala:727)
> 
> at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
> 
> at 
> scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
> 
> at 
> scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)
> 
> at 
> scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)
> 
> at scala.collection.TraversableOnce$class.to 
> (TraversableOnce.scala:273)
> 
> at scala.collection.AbstractIterator.to 
> (Iterator.scala:1157)
> 
> at 
> scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265)
> 
> at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157)
> 
> at 
> scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252)
> 
> at scala.collection.AbstractIterator.toArray(Iterator.scala:1157)
> 
> at 
> org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$12.apply(RDD.scala:927)
> 
> at 
> org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$12.apply(RDD.scala:927)
> 
> at 
> org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1858)
> 
> at 
> org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1858)
> 
> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
> 
> at org.apache.spark.scheduler.Task.run(Task.scala:89)
> 
> at 
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
> 
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> 
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> 
> at java.lang.Thread.run(Thread.java:745)
> 




ERROR Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.

2016-07-22 Thread Ascot Moss
Hi

Please help!

 When running random forest training phase in cluster mode, I got GC
overhead limit exceeded.

I have used two parameters when submitting the job to cluster

--driver-memory 64g \

--executor-memory 8g \

My Current settings:

(spark-defaults.conf)

spark.executor.memory   8g

(spark-env.sh)

export SPARK_WORKER_MEMORY=8g

export HADOOP_HEAPSIZE=8000


Any idea how to resolve it?

Regards






###  (the error log) ###

16/07/23 04:34:04 WARN TaskSetManager: Lost task 2.0 in stage 6.1 (TID 30,
n1794): java.lang.OutOfMemoryError: GC overhead limit exceeded

at
scala.reflect.ManifestFactory$$anon$12.newArray(Manifest.scala:138)

at
scala.reflect.ManifestFactory$$anon$12.newArray(Manifest.scala:136)

at
org.apache.spark.util.collection.CompactBuffer.growToSize(CompactBuffer.scala:144)

at
org.apache.spark.util.collection.CompactBuffer.$plus$plus$eq(CompactBuffer.scala:90)

at
org.apache.spark.rdd.PairRDDFunctions$$anonfun$groupByKey$1$$anonfun$10.apply(PairRDDFunctions.scala:505)

at
org.apache.spark.rdd.PairRDDFunctions$$anonfun$groupByKey$1$$anonfun$10.apply(PairRDDFunctions.scala:505)

at
org.apache.spark.util.collection.ExternalAppendOnlyMap$ExternalIterator.mergeIfKeyExists(ExternalAppendOnlyMap.scala:318)

at
org.apache.spark.util.collection.ExternalAppendOnlyMap$ExternalIterator.next(ExternalAppendOnlyMap.scala:365)

at
org.apache.spark.util.collection.ExternalAppendOnlyMap$ExternalIterator.next(ExternalAppendOnlyMap.scala:265)

at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)

at scala.collection.Iterator$class.foreach(Iterator.scala:727)

at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)

at
scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)

at
scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)

at
scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)

at scala.collection.TraversableOnce$class.to
(TraversableOnce.scala:273)

at scala.collection.AbstractIterator.to(Iterator.scala:1157)

at
scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265)

at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157)

at
scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252)

at scala.collection.AbstractIterator.toArray(Iterator.scala:1157)

at
org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$12.apply(RDD.scala:927)

at
org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$12.apply(RDD.scala:927)

at
org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1858)

at
org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1858)

at
org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)

at org.apache.spark.scheduler.Task.run(Task.scala:89)

at
org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)

at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)

at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)

at java.lang.Thread.run(Thread.java:745)


Re: Remote RPC client disassociated

2016-07-01 Thread Akhil Das
Can you try the Cassandra connector 1.5? It is also compatible with Spark
1.6 according to their documentation
https://github.com/datastax/spark-cassandra-connector#version-compatibility
You can also crosspost it over here
https://groups.google.com/a/lists.datastax.com/forum/#!forum/spark-connector-user

On Fri, Jul 1, 2016 at 5:45 PM, Joaquin Alzola <joaquin.alz...@lebara.com>
wrote:

> HI Akhil
>
>
>
> I am using:
>
> Cassandra: 3.0.5
>
> Spark: 1.6.1
>
> Scala 2.10
>
> Spark-cassandra connector: 1.6.0
>
>
>
> *From:* Akhil Das [mailto:ak...@hacked.work]
> *Sent:* 01 July 2016 11:38
> *To:* Joaquin Alzola <joaquin.alz...@lebara.com>
> *Cc:* user@spark.apache.org
> *Subject:* Re: Remote RPC client disassociated
>
>
>
> This looks like a version conflict, which version of spark are you using?
> The Cassandra connector you are using is for Scala 2.10x and Spark 1.6
> version.
>
>
>
> On Thu, Jun 30, 2016 at 6:34 PM, Joaquin Alzola <joaquin.alz...@lebara.com>
> wrote:
>
> HI List,
>
>
>
> I am launching this spark-submit job:
>
>
>
> hadoop@testbedocg:/mnt/spark> bin/spark-submit --packages
> com.datastax.spark:spark-cassandra-connector_2.10:1.6.0 --jars
> /mnt/spark/lib/TargetHolding_pyspark-cassandra-0.3.5.jar spark_v2.py
>
>
>
> spark_v2.py is:
>
> from pyspark_cassandra import CassandraSparkContext, Row
>
> from pyspark import SparkContext, SparkConf
>
> from pyspark.sql import SQLContext
>
> conf = SparkConf().setAppName("test").setMaster("spark://
> 192.168.23.31:7077").set("spark.cassandra.connection.host",
> "192.168.23.31")
>
> sc = CassandraSparkContext(conf=conf)
>
> table =
> sc.cassandraTable("lebara_diameter_codes","nl_lebara_diameter_codes")
>
> food_count = table.select("errorcode2001").groupBy("errorcode2001").count()
>
> food_count.collect()
>
>
>
>
>
> Error I get when running the above command:
>
>
>
> [Stage 0:>  (0 +
> 3) / 7]16/06/30 10:40:36 ERROR TaskSchedulerImpl: Lost executor 0 on as5:
> Remote RPC client disassociated. Likely due to containers exceeding
> thresholds, or network issues. Check driver logs for WARN messages.
>
> [Stage 0:>  (0 +
> 7) / 7]16/06/30 10:40:40 ERROR TaskSchedulerImpl: Lost executor 1 on as4:
> Remote RPC client disassociated. Likely due to containers exceeding
> thresholds, or network issues. Check driver logs for WARN messages.
>
> [Stage 0:>  (0 +
> 5) / 7]16/06/30 10:40:42 ERROR TaskSchedulerImpl: Lost executor 3 on as5:
> Remote RPC client disassociated. Likely due to containers exceeding
> thresholds, or network issues. Check driver logs for WARN messages.
>
> [Stage 0:>  (0 +
> 4) / 7]16/06/30 10:40:46 ERROR TaskSchedulerImpl: Lost executor 4 on as4:
> Remote RPC client disassociated. Likely due to containers exceeding
> thresholds, or network issues. Check driver logs for WARN messages.
>
> 16/06/30 10:40:46 ERROR TaskSetManager: Task 5 in stage 0.0 failed 4
> times; aborting job
>
> Traceback (most recent call last):
>
>   File "/mnt/spark-1.6.1-bin-hadoop2.6/spark_v2.py", line 11, in <module>
>
> food_count =
> table.select("errorcode2001").groupBy("errorcode2001").count()
>
>   File "/mnt/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 1004, in
> count
>
>   File "/mnt/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 995, in sum
>
>   File "/mnt/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 869, in
> fold
>
>   File "/mnt/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 771, in
> collect
>
>   File "/mnt/spark/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line
> 813, in __call__
>
>   File "/mnt/spark/python/lib/py4j-0.9-src.zip/py4j/protocol.py", line
> 308, in get_return_value
>
> py4j.protocol.Py4JJavaError: An error occurred while calling
> z:org.apache.spark.api.python.PythonRDD.collectAndServe.
>
> : org.apache.spark.SparkException: Job aborted due to stage failure: Task
> 5 in stage 0.0 failed 4 times, most recent failure: Lost task 5.3 in stage
> 0.0 (TID 14, as4): ExecutorLostFailure (executor 4 exited caused by one of
> the running tasks) Reason: Remote RPC client disassociated. Likely due to
> containers exceeding thresholds, or network issues. Check driver log

RE: Remote RPC client disassociated

2016-07-01 Thread Joaquin Alzola
HI Akhil

I am using:
Cassandra: 3.0.5
Spark: 1.6.1
Scala 2.10
Spark-cassandra connector: 1.6.0

From: Akhil Das [mailto:ak...@hacked.work]
Sent: 01 July 2016 11:38
To: Joaquin Alzola <joaquin.alz...@lebara.com>
Cc: user@spark.apache.org
Subject: Re: Remote RPC client disassociated

This looks like a version conflict, which version of spark are you using? The 
Cassandra connector you are using is for Scala 2.10x and Spark 1.6 version.

On Thu, Jun 30, 2016 at 6:34 PM, Joaquin Alzola 
<joaquin.alz...@lebara.com> wrote:
HI List,

I am launching this spark-submit job:

hadoop@testbedocg:/mnt/spark> bin/spark-submit --packages 
com.datastax.spark:spark-cassandra-connector_2.10:1.6.0 --jars 
/mnt/spark/lib/TargetHolding_pyspark-cassandra-0.3.5.jar spark_v2.py

spark_v2.py is:
from pyspark_cassandra import CassandraSparkContext, Row
from pyspark import SparkContext, SparkConf
from pyspark.sql import SQLContext
conf = 
SparkConf().setAppName("test").setMaster("spark://192.168.23.31:7077").set("spark.cassandra.connection.host",
 "192.168.23.31")
sc = CassandraSparkContext(conf=conf)
table = sc.cassandraTable("lebara_diameter_codes","nl_lebara_diameter_codes")
food_count = table.select("errorcode2001").groupBy("errorcode2001").count()
food_count.collect()


Error I get when running the above command:

[Stage 0:>              (0 + 3) / 
7]16/06/30 10:40:36 ERROR TaskSchedulerImpl: Lost executor 0 on as5: Remote RPC 
client disassociated. Likely due to containers exceeding thresholds, or network 
issues. Check driver logs for WARN messages.
[Stage 0:>              (0 + 7) / 
7]16/06/30 10:40:40 ERROR TaskSchedulerImpl: Lost executor 1 on as4: Remote RPC 
client disassociated. Likely due to containers exceeding thresholds, or network 
issues. Check driver logs for WARN messages.
[Stage 0:>              (0 + 5) / 
7]16/06/30 10:40:42 ERROR TaskSchedulerImpl: Lost executor 3 on as5: Remote RPC 
client disassociated. Likely due to containers exceeding thresholds, or network 
issues. Check driver logs for WARN messages.
[Stage 0:>                  (0 + 4) / 
7]16/06/30 10:40:46 ERROR TaskSchedulerImpl: Lost executor 4 on as4: Remote RPC 
client disassociated. Likely due to containers exceeding thresholds, or network 
issues. Check driver logs for WARN messages.
16/06/30 10:40:46 ERROR TaskSetManager: Task 5 in stage 0.0 failed 4 times; 
aborting job
Traceback (most recent call last):
  File "/mnt/spark-1.6.1-bin-hadoop2.6/spark_v2.py", line 11, in 
food_count = table.select("errorcode2001").groupBy("errorcode2001").count()
  File "/mnt/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 1004, in count
  File "/mnt/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 995, in sum
  File "/mnt/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 869, in fold
  File "/mnt/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 771, in collect
  File "/mnt/spark/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 813, 
in __call__
  File "/mnt/spark/python/lib/py4j-0.9-src.zip/py4j/protocol.py", line 308, in 
get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling 
z:org.apache.spark.api.python.PythonRDD.collectAndServe.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 5 in 
stage 0.0 failed 4 times, most recent failure: Lost task 5.3 in stage 0.0 (TID 
14, as4): ExecutorLostFailure (executor 4 exited caused by one of the running 
tasks) Reason: Remote RPC client disassociated. Likely due to containers 
exceeding thresholds, or network issues. Check driver logs for WARN messages.
Driver stacktrace:
at 
org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1431)
at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1419)
at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1418)
at 
scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at 
org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1418)
   at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:799)
at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:799)
at scala.Option.foreach(Op

Re: Remote RPC client disassociated

2016-07-01 Thread Akhil Das
This looks like a version conflict, which version of spark are you using?
The Cassandra connector you are using is for Scala 2.10x and Spark 1.6
version.

On Thu, Jun 30, 2016 at 6:34 PM, Joaquin Alzola <joaquin.alz...@lebara.com>
wrote:

> HI List,
>
>
>
> I am launching this spark-submit job:
>
>
>
> hadoop@testbedocg:/mnt/spark> bin/spark-submit --packages
> com.datastax.spark:spark-cassandra-connector_2.10:1.6.0 --jars
> /mnt/spark/lib/TargetHolding_pyspark-cassandra-0.3.5.jar spark_v2.py
>
>
>
> spark_v2.py is:
>
> from pyspark_cassandra import CassandraSparkContext, Row
>
> from pyspark import SparkContext, SparkConf
>
> from pyspark.sql import SQLContext
>
> conf = SparkConf().setAppName("test").setMaster("spark://
> 192.168.23.31:7077").set("spark.cassandra.connection.host",
> "192.168.23.31")
>
> sc = CassandraSparkContext(conf=conf)
>
> table =
> sc.cassandraTable("lebara_diameter_codes","nl_lebara_diameter_codes")
>
> food_count = table.select("errorcode2001").groupBy("errorcode2001").count()
>
> food_count.collect()
>
>
>
>
>
> Error I get when running the above command:
>
>
>
> [Stage 0:>  (0 +
> 3) / 7]16/06/30 10:40:36 ERROR TaskSchedulerImpl: Lost executor 0 on as5:
> Remote RPC client disassociated. Likely due to containers exceeding
> thresholds, or network issues. Check driver logs for WARN messages.
>
> [Stage 0:>  (0 +
> 7) / 7]16/06/30 10:40:40 ERROR TaskSchedulerImpl: Lost executor 1 on as4:
> Remote RPC client disassociated. Likely due to containers exceeding
> thresholds, or network issues. Check driver logs for WARN messages.
>
> [Stage 0:>  (0 +
> 5) / 7]16/06/30 10:40:42 ERROR TaskSchedulerImpl: Lost executor 3 on as5:
> Remote RPC client disassociated. Likely due to containers exceeding
> thresholds, or network issues. Check driver logs for WARN messages.
>
> [Stage 0:>  (0 +
> 4) / 7]16/06/30 10:40:46 ERROR TaskSchedulerImpl: Lost executor 4 on as4:
> Remote RPC client disassociated. Likely due to containers exceeding
> thresholds, or network issues. Check driver logs for WARN messages.
>
> 16/06/30 10:40:46 ERROR TaskSetManager: Task 5 in stage 0.0 failed 4
> times; aborting job
>
> Traceback (most recent call last):
>
>   File "/mnt/spark-1.6.1-bin-hadoop2.6/spark_v2.py", line 11, in <module>
>
> food_count =
> table.select("errorcode2001").groupBy("errorcode2001").count()
>
>   File "/mnt/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 1004, in
> count
>
>   File "/mnt/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 995, in sum
>
>   File "/mnt/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 869, in
> fold
>
>   File "/mnt/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 771, in
> collect
>
>   File "/mnt/spark/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line
> 813, in __call__
>
>   File "/mnt/spark/python/lib/py4j-0.9-src.zip/py4j/protocol.py", line
> 308, in get_return_value
>
> py4j.protocol.Py4JJavaError: An error occurred while calling
> z:org.apache.spark.api.python.PythonRDD.collectAndServe.
>
> : org.apache.spark.SparkException: Job aborted due to stage failure: Task
> 5 in stage 0.0 failed 4 times, most recent failure: Lost task 5.3 in stage
> 0.0 (TID 14, as4): ExecutorLostFailure (executor 4 exited caused by one of
> the running tasks) Reason: Remote RPC client disassociated. Likely due to
> containers exceeding thresholds, or network issues. Check driver logs for
> WARN messages.
>
> Driver stacktrace:
>
> at org.apache.spark.scheduler.DAGScheduler.org
> $apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1431)
>
> at
> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1419)
>
> at
> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1418)
>
> at
> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>
> at
> scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>
> at
> org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1418)
>
>at
> org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:799)
>
>   

RE: Remote RPC client disassociated

2016-06-30 Thread Joaquin Alzola
>>> 16/06/30 10:44:34 ERROR util.Utils: Uncaught exception in thread stdout 
>>> writer for python
java.lang.AbstractMethodError: 
pyspark_cassandra.DeferringRowReader.read(Lcom/datastax/driver/core/Row;Lcom/datastax/spark/connector/CassandraRowMetadata;)Ljava/lang/Object;
>> You are trying to call an abstract method.  Please check the method 
>> DeferringRowReader.read

Do not know how to fix this issue.
Have seen in many tutorials around the net and those ones made the same calling 
I am currently doing

from pyspark_cassandra import CassandraSparkContext, Row
from pyspark import SparkContext, SparkConf
from pyspark.sql import SQLContext
conf = 
SparkConf().setAppName("test").setMaster("spark://192.168.23.31:7077").set("spark.cassandra.connection.host",
 "192.168.23.31")
sc = CassandraSparkContext(conf=conf)
table = sc.cassandraTable("lebara_diameter_codes","nl_lebara_diameter_codes")
food_count = table.select("errorcode2001").groupBy("errorcode2001").count()
food_count.collect()

I am really new to this Spark thing. I was able to configure it correctly and am
now learning the API.
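
One thing that may be worth trying, sketched here only and assuming the
spark-cassandra-connector 1.6 package already on the --packages line provides
its documented DataFrame source: read the table through that source instead of
the third-party pyspark_cassandra RDD reader, which avoids the
DeferringRowReader path entirely.

from pyspark import SparkConf, SparkContext
from pyspark.sql import SQLContext

conf = (SparkConf()
        .setAppName("test")
        .setMaster("spark://192.168.23.31:7077")
        .set("spark.cassandra.connection.host", "192.168.23.31"))
sc = SparkContext(conf=conf)
sqlContext = SQLContext(sc)

# Read the Cassandra table through the connector's DataFrame source.
df = (sqlContext.read
      .format("org.apache.spark.sql.cassandra")
      .options(keyspace="lebara_diameter_codes",
               table="nl_lebara_diameter_codes")
      .load())

df.groupBy("errorcode2001").count().show()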


Re: Remote RPC client disassociated

2016-06-30 Thread Jeff Zhang
>>> 16/06/30 10:44:34 ERROR util.Utils: Uncaught exception in thread stdout
writer for python

java.lang.AbstractMethodError:
pyspark_cassandra.DeferringRowReader.read(Lcom/datastax/driver/core/Row;Lcom/datastax/spark/connector/CassandraRowMetadata;)Ljava/lang/Object;


You are trying to call an abstract method.  Please check the method
DeferringRowReader.read

On Thu, Jun 30, 2016 at 4:34 AM, Joaquin Alzola <joaquin.alz...@lebara.com>
wrote:

> HI List,
>
>
>
> I am launching this spark-submit job:
>
>
>
> hadoop@testbedocg:/mnt/spark> bin/spark-submit --packages
> com.datastax.spark:spark-cassandra-connector_2.10:1.6.0 --jars
> /mnt/spark/lib/TargetHolding_pyspark-cassandra-0.3.5.jar spark_v2.py
>
>
>
> spark_v2.py is:
>
> from pyspark_cassandra import CassandraSparkContext, Row
>
> from pyspark import SparkContext, SparkConf
>
> from pyspark.sql import SQLContext
>
> conf = SparkConf().setAppName("test").setMaster("spark://
> 192.168.23.31:7077").set("spark.cassandra.connection.host",
> "192.168.23.31")
>
> sc = CassandraSparkContext(conf=conf)
>
> table =
> sc.cassandraTable("lebara_diameter_codes","nl_lebara_diameter_codes")
>
> food_count = table.select("errorcode2001").groupBy("errorcode2001").count()
>
> food_count.collect()
>
>
>
>
>
> Error I get when running the above command:
>
>
>
> [Stage 0:>  (0 +
> 3) / 7]16/06/30 10:40:36 ERROR TaskSchedulerImpl: Lost executor 0 on as5:
> Remote RPC client disassociated. Likely due to containers exceeding
> thresholds, or network issues. Check driver logs for WARN messages.
>
> [Stage 0:>  (0 +
> 7) / 7]16/06/30 10:40:40 ERROR TaskSchedulerImpl: Lost executor 1 on as4:
> Remote RPC client disassociated. Likely due to containers exceeding
> thresholds, or network issues. Check driver logs for WARN messages.
>
> [Stage 0:>  (0 +
> 5) / 7]16/06/30 10:40:42 ERROR TaskSchedulerImpl: Lost executor 3 on as5:
> Remote RPC client disassociated. Likely due to containers exceeding
> thresholds, or network issues. Check driver logs for WARN messages.
>
> [Stage 0:>  (0 +
> 4) / 7]16/06/30 10:40:46 ERROR TaskSchedulerImpl: Lost executor 4 on as4:
> Remote RPC client disassociated. Likely due to containers exceeding
> thresholds, or network issues. Check driver logs for WARN messages.
>
> 16/06/30 10:40:46 ERROR TaskSetManager: Task 5 in stage 0.0 failed 4
> times; aborting job
>
> Traceback (most recent call last):
>
>   File "/mnt/spark-1.6.1-bin-hadoop2.6/spark_v2.py", line 11, in <module>
>
> food_count =
> table.select("errorcode2001").groupBy("errorcode2001").count()
>
>   File "/mnt/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 1004, in
> count
>
>   File "/mnt/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 995, in sum
>
>   File "/mnt/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 869, in
> fold
>
>   File "/mnt/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 771, in
> collect
>
>   File "/mnt/spark/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line
> 813, in __call__
>
>   File "/mnt/spark/python/lib/py4j-0.9-src.zip/py4j/protocol.py", line
> 308, in get_return_value
>
> py4j.protocol.Py4JJavaError: An error occurred while calling
> z:org.apache.spark.api.python.PythonRDD.collectAndServe.
>
> : org.apache.spark.SparkException: Job aborted due to stage failure: Task
> 5 in stage 0.0 failed 4 times, most recent failure: Lost task 5.3 in stage
> 0.0 (TID 14, as4): ExecutorLostFailure (executor 4 exited caused by one of
> the running tasks) Reason: Remote RPC client disassociated. Likely due to
> containers exceeding thresholds, or network issues. Check driver logs for
> WARN messages.
>
> Driver stacktrace:
>
> at org.apache.spark.scheduler.DAGScheduler.org
> $apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1431)
>
> at
> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1419)
>
> at
> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1418)
>
> at
> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>
> at
> scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>
> at
> org.apache.s

Remote RPC client disassociated

2016-06-30 Thread Joaquin Alzola
HI List,

I am launching this spark-submit job:

hadoop@testbedocg:/mnt/spark> bin/spark-submit --packages 
com.datastax.spark:spark-cassandra-connector_2.10:1.6.0 --jars 
/mnt/spark/lib/TargetHolding_pyspark-cassandra-0.3.5.jar spark_v2.py

spark_v2.py is:
from pyspark_cassandra import CassandraSparkContext, Row
from pyspark import SparkContext, SparkConf
from pyspark.sql import SQLContext
conf = 
SparkConf().setAppName("test").setMaster("spark://192.168.23.31:7077").set("spark.cassandra.connection.host",
 "192.168.23.31")
sc = CassandraSparkContext(conf=conf)
table = sc.cassandraTable("lebara_diameter_codes","nl_lebara_diameter_codes")
food_count = table.select("errorcode2001").groupBy("errorcode2001").count()
food_count.collect()


Error I get when running the above command:

[Stage 0:>  (0 + 3) / 
7]16/06/30 10:40:36 ERROR TaskSchedulerImpl: Lost executor 0 on as5: Remote RPC 
client disassociated. Likely due to containers exceeding thresholds, or network 
issues. Check driver logs for WARN messages.
[Stage 0:>      (0 + 7) / 
7]16/06/30 10:40:40 ERROR TaskSchedulerImpl: Lost executor 1 on as4: Remote RPC 
client disassociated. Likely due to containers exceeding thresholds, or network 
issues. Check driver logs for WARN messages.
[Stage 0:>          (0 + 5) / 
7]16/06/30 10:40:42 ERROR TaskSchedulerImpl: Lost executor 3 on as5: Remote RPC 
client disassociated. Likely due to containers exceeding thresholds, or network 
issues. Check driver logs for WARN messages.
[Stage 0:>              (0 + 4) / 
7]16/06/30 10:40:46 ERROR TaskSchedulerImpl: Lost executor 4 on as4: Remote RPC 
client disassociated. Likely due to containers exceeding thresholds, or network 
issues. Check driver logs for WARN messages.
16/06/30 10:40:46 ERROR TaskSetManager: Task 5 in stage 0.0 failed 4 times; 
aborting job
Traceback (most recent call last):
  File "/mnt/spark-1.6.1-bin-hadoop2.6/spark_v2.py", line 11, in 
food_count = table.select("errorcode2001").groupBy("errorcode2001").count()
  File "/mnt/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 1004, in count
  File "/mnt/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 995, in sum
  File "/mnt/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 869, in fold
  File "/mnt/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 771, in collect
  File "/mnt/spark/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 813, 
in __call__
  File "/mnt/spark/python/lib/py4j-0.9-src.zip/py4j/protocol.py", line 308, in 
get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling 
z:org.apache.spark.api.python.PythonRDD.collectAndServe.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 5 in 
stage 0.0 failed 4 times, most recent failure: Lost task 5.3 in stage 0.0 (TID 
14, as4): ExecutorLostFailure (executor 4 exited caused by one of the running 
tasks) Reason: Remote RPC client disassociated. Likely due to containers 
exceeding thresholds, or network issues. Check driver logs for WARN messages.
Driver stacktrace:
at 
org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1431)
at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1419)
at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1418)
at 
scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at 
org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1418)
   at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:799)
at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:799)
at scala.Option.foreach(Option.scala:236)
at 
org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:799)
at 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1640)
at 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1599)
at 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1588)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
at 
org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:620)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1832)
at org.apache.spark.Spar

PySpark crashed because "remote RPC client disassociated"

2016-06-29 Thread jw.cmu
I am running my own PySpark application (solving matrix factorization using
Gemulla's DSGD algorithm). The program seemed to work fine on the smaller
MovieLens dataset but failed on the larger Netflix data. It took about 14 hours
to complete two iterations and then lost an executor (I used 8 executors in
total, all on one 16-core machine) because of "remote RPC client disassociated".

Below is the full error message. I would appreciate any pointers on debugging
this problem. Thanks!

16/06/29 12:43:50 WARN TaskSetManager: Lost task 7.0 in stage 2581.0 (TID
9304, no139.nome.nx): TaskKilled (killed intentionally)
py4j.protocol.Py4JJavaError16/06/29 12:43:53 WARN TaskSetManager: Lost task
6.0 in stage 2581.0 (TID 9303, no139.nome.nx): TaskKilled (killed
intentionally)
16/06/29 12:43:53 WARN TaskSetManager: Lost task 2.0 in stage 2581.0 (TID
9299, no139.nome.nx): TaskKilled (killed intentionally)
16/06/29 12:43:53 INFO TaskSchedulerImpl: Removed TaskSet 2581.0, whose
tasks have all completed, from pool
: An error occurred while calling
z:org.apache.spark.api.python.PythonRDD.collectAndServe.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 3
in stage 2581.0 failed 1 times, most recent failure: Lost task 3.0 in stage
2581.0 (TID 9300, no139.nome.nx): ExecutorLostFailure (executor 5 exited
caused by one of the running tasks) Reason: Remote RPC client disassociated.
Likely due to containers exceeding thresholds, or network issues. Check
driver logs for WARN messages.
Driver stacktrace:
at
org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1431)
at
org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1419)
at
org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1418)
at
scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at
scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at
org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1418)
at
org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:799)
at
org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:799)
at scala.Option.foreach(Option.scala:236)
at
org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:799)
at
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1640)
at
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1599)
at
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1588)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
at
org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:620)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1832)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1845)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1858)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1929)
at org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:927)
at
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
at
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:316)
at org.apache.spark.rdd.RDD.collect(RDD.scala:926)
at
org.apache.spark.api.python.PythonRDD$.collectAndServe(PythonRDD.scala:405)
at
org.apache.spark.api.python.PythonRDD.collectAndServe(PythonRDD.scala)
at sun.reflect.GeneratedMethodAccessor96.invoke(Unknown Source)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
at
py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:381)
at py4j.Gateway.invoke(Gateway.java:259)
at
py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:209)
at java.lang.Thread.run(Thread.java:745)
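
For a long iterative job like this, one thing that sometimes reduces the cost
of losing an executor partway through, shown here only as a rough sketch (the
state RDD, the update step and the checkpoint directory are stand-ins, not the
poster's DSGD code): persist the per-iteration state and checkpoint it every
few iterations, so the lineage stays short and a lost executor does not force
recomputing hours of work.

from pyspark import SparkContext, StorageLevel

sc = SparkContext(appName="iterative-checkpoint-sketch")
sc.setCheckpointDir("/tmp/spark-checkpoints")        # hypothetical directory

# Stand-in for the factor/state RDD that gets updated each iteration.
state = sc.parallelize(range(100000)).map(lambda i: (i, 0.0))
state.persist(StorageLevel.MEMORY_AND_DISK)

for it in range(10):                                 # illustrative count
    new_state = state.mapValues(lambda v: v + 1.0)   # stand-in for one update
    new_state.persist(StorageLevel.MEMORY_AND_DISK)
    if it % 3 == 0:
        new_state.checkpoint()                       # truncate lineage
    new_state.count()                                # materialize this iteration
    state.unpersist()                                # drop the previous blocks
    state = new_state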




--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/PySpark-crashed-because-remote-RPC-client-disassociated-tp27248.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
