Re: pyspark streaming crashes

2016-01-04 Thread Antony Mayi
Just for reference: in my case this problem was caused by this bug:
https://issues.apache.org/jira/browse/SPARK-12617 

On Monday, 21 December 2015, 14:32, Antony Mayi  
wrote:

I noticed it might be related to longer GC pauses (1-2 s): the crash usually occurs right after such a pause. Could that be causing the Python-Java gateway to time out?
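If the gateway really is timing out during long pauses, one possible workaround (illustrative, not a verified fix; the script name and values are assumptions) is to raise Spark's network timeouts so a 1-2 s pause is less likely to drop the connection:

```shell
# Illustrative workaround, not a verified fix: raise Spark's network
# timeouts so multi-second GC pauses are less likely to sever the
# Python-Java gateway. Script name and timeout values are assumptions.
spark-submit \
  --master yarn-client \
  --conf spark.network.timeout=300s \
  --conf spark.executor.heartbeatInterval=60s \
  streaming_app.py
```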

On Sunday, 20 December 2015, 23:05, Antony Mayi  
wrote:

Hi,
can anyone please help me troubleshoot this problem? I have a PySpark streaming application (Spark 1.5.2 on yarn-client) that keeps crashing after a few hours. It doesn't seem to be running out of memory on either the driver or the executors.
driver error:
py4j.protocol.Py4JJavaError: An error occurred while calling o1.awaitTermination.
: java.io.IOException: py4j.Py4JException: Error while obtaining a new communication channel
        at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1163)
        at org.apache.spark.streaming.api.python.TransformFunction.writeObject(PythonDStream.scala:77)
        at sun.reflect.GeneratedMethodAccessor17.invoke(Unknown Source)

(all) executors error:
  File "/u04/yarn/local/usercache/das/appcache/application_1450337892069_0336/container_1450337892069_0336_01_08/pyspark.zip/pyspark/worker.py", line 136, in main
    if read_int(infile) == SpecialLengths.END_OF_STREAM:
  File "/u04/yarn/local/usercache/das/appcache/application_1450337892069_0336/container_1450337892069_0336_01_08/pyspark.zip/pyspark/serializers.py", line 545, in read_int
    raise EOFError
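That executor-side EOFError is the PySpark worker noticing that the JVM side of the socket went away mid-protocol. A simplified sketch of the read_int framing logic (an approximation of pyspark/serializers.py, not the exact 1.5.2 source):

```python
import struct
from io import BytesIO

def read_int(stream):
    """Read a big-endian 4-byte int from the worker's input stream.

    Simplified sketch of pyspark/serializers.py:read_int, not the exact
    1.5.2 source. If the JVM closed (or dropped) the connection, the read
    comes back short and the worker raises EOFError, as in the trace above.
    """
    length = stream.read(4)
    if len(length) < 4:
        raise EOFError
    return struct.unpack("!i", length)[0]

# A healthy stream yields the framed integer...
print(read_int(BytesIO(struct.pack("!i", 42))))  # 42
# ...while a connection severed by the JVM surfaces as EOFError.
try:
    read_int(BytesIO(b""))
except EOFError:
    print("EOFError")  # EOFError
```

So the executors' EOFError is most likely a symptom of the driver-side gateway failure, not an independent bug.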


GC log (G1GC) just before the crash:
driver:   [Eden: 2316.0M(2316.0M)->0.0B(2318.0M) Survivors: 140.0M->138.0M 
Heap: 3288.7M(4096.0M)->675.5M(4096.0M)]
executor(s):   [Eden: 2342.0M(2342.0M)->0.0B(2378.0M) Survivors: 52.0M->34.0M 
Heap: 3601.7M(4096.0M)->1242.7M(4096.0M)]
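The heap figures above (dropping from ~3.3 GB to ~0.7 GB on a 4 GB heap after collection) do support the no-OOM observation. A small sketch for pulling those numbers out of G1 log lines, e.g. to correlate long pauses with the crash time (the regex is an assumption fitted to the format shown; the after-size can also appear with a B suffix, which this sketch does not handle):

```python
import re

# Matches heap figures in G1 log lines of the form shown above, e.g.
#   Heap: 3288.7M(4096.0M)->675.5M(4096.0M)
HEAP_RE = re.compile(r"Heap: ([\d.]+)M\(([\d.]+)M\)->([\d.]+)M\(([\d.]+)M\)")

def heap_usage(line):
    """Return (before_mb, after_mb, capacity_mb) from a G1 log line, or None."""
    m = HEAP_RE.search(line)
    if not m:
        return None
    before, _before_cap, after, cap = map(float, m.groups())
    return before, after, cap

line = ("[Eden: 2316.0M(2316.0M)->0.0B(2318.0M) Survivors: 140.0M->138.0M "
        "Heap: 3288.7M(4096.0M)->675.5M(4096.0M)]")
print(heap_usage(line))  # (3288.7, 675.5, 4096.0)
```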

thanks.