[jira] [Commented] (SPARK-10361) model.predictAll() fails at user_product.first()

2015-08-31 Thread Velu nambi (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14723621#comment-14723621
 ] 

Velu nambi commented on SPARK-10361:


Thanks [~srowen].

Is this a known issue? Any suggestions?

> model.predictAll() fails at user_product.first()
> 
>
> Key: SPARK-10361
> URL: https://issues.apache.org/jira/browse/SPARK-10361
> Project: Spark
>  Issue Type: Bug
>  Components: MLlib, PySpark
>Affects Versions: 1.3.1, 1.4.1, 1.5.0
> Environment: Windows 10, Python 2.7, with all three affected versions of Spark
>Reporter: Velu nambi
>
> This code, adapted from the documentation, fails when calling predictAll() after ALS.train():
> 15/08/31 00:11:45 ERROR PythonRDD: Python worker exited unexpectedly (crashed)
> java.net.SocketException: Connection reset by peer: socket write error
>   at java.net.SocketOutputStream.socketWrite0(Native Method)
>   at java.net.SocketOutputStream.socketWrite(Unknown Source)
>   at java.net.SocketOutputStream.write(Unknown Source)
>   at java.io.BufferedOutputStream.write(Unknown Source)
>   at java.io.DataOutputStream.write(Unknown Source)
>   at java.io.FilterOutputStream.write(Unknown Source)
>   at org.apache.spark.api.python.PythonRDD$.org$apache$spark$api$python$PythonRDD$$write$1(PythonRDD.scala:413)
>   at org.apache.spark.api.python.PythonRDD$$anonfun$writeIteratorToStream$1.apply(PythonRDD.scala:425)
>   at org.apache.spark.api.python.PythonRDD$$anonfun$writeIteratorToStream$1.apply(PythonRDD.scala:425)
>   at scala.collection.Iterator$class.foreach(Iterator.scala:727)
>   at org.apache.spark.InterruptibleIterator.foreach(InterruptibleIterator.scala:28)
>   at org.apache.spark.api.python.PythonRDD$.writeIteratorToStream(PythonRDD.scala:425)
>   at org.apache.spark.api.python.PythonRDD$WriterThread$$anonfun$run$3.apply(PythonRDD.scala:248)
>   at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1772)
>   at org.apache.spark.api.python.PythonRDD$WriterThread.run(PythonRDD.scala:208)
> 15/08/31 00:11:45 ERROR PythonRDD: This may have been caused by a prior exception:
> java.net.SocketException: Connection reset by peer: socket write error
>   at java.net.SocketOutputStream.socketWrite0(Native Method)
>   at java.net.SocketOutputStream.socketWrite(Unknown Source)
>   at java.net.SocketOutputStream.write(Unknown Source)
>   at java.io.BufferedOutputStream.write(Unknown Source)
>   at java.io.DataOutputStream.write(Unknown Source)
>   at java.io.FilterOutputStream.write(Unknown Source)
>   at org.apache.spark.api.python.PythonRDD$.org$apache$spark$api$python$PythonRDD$$write$1(PythonRDD.scala:413)
>   at org.apache.spark.api.python.PythonRDD$$anonfun$writeIteratorToStream$1.apply(PythonRDD.scala:425)
>   at org.apache.spark.api.python.PythonRDD$$anonfun$writeIteratorToStream$1.apply(PythonRDD.scala:425)
>   at scala.collection.Iterator$class.foreach(Iterator.scala:727)
>   at org.apache.spark.InterruptibleIterator.foreach(InterruptibleIterator.scala:28)
>   at org.apache.spark.api.python.PythonRDD$.writeIteratorToStream(PythonRDD.scala:425)
>   at org.apache.spark.api.python.PythonRDD$WriterThread$$anonfun$run$3.apply(PythonRDD.scala:248)
>   at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1772)
>   at org.apache.spark.api.python.PythonRDD$WriterThread.run(PythonRDD.scala:208)
> 15/08/31 00:11:45 ERROR Executor: Exception in task 0.0 in stage 187.0 (TID 85)
> java.net.SocketException: Connection reset by peer: socket write error
>   at java.net.SocketOutputStream.socketWrite0(Native Method)
>   at java.net.SocketOutputStream.socketWrite(Unknown Source)
>   at java.net.SocketOutputStream.write(Unknown Source)
>   at java.io.BufferedOutputStream.write(Unknown Source)
>   at java.io.DataOutputStream.write(Unknown Source)
>   at java.io.FilterOutputStream.write(Unknown Source)
>   at org.apache.spark.api.python.PythonRDD$.org$apache$spark$api$python$PythonRDD$$write$1(PythonRDD.scala:413)
>   at org.apache.spark.api.python.PythonRDD$$anonfun$writeIteratorToStream$1.apply(PythonRDD.scala:425)
>   at org.apache.spark.api.python.PythonRDD$$anonfun$writeIteratorToStream$1.apply(PythonRDD.scala:425)
>   at scala.collection.Iterator$class.foreach(Iterator.scala:727)
>   at org.apache.spark.InterruptibleIterator.foreach(InterruptibleIterator.scala:28)
>   at org.apache.spark.api.python.PythonRDD$.writeIteratorToStream(PythonRDD.scala:425)
>   at org.apache.spark.api.python.PythonRDD$WriterThread$$anonfun$run$3.apply(PythonRDD.scala:248)
>   at 
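
The description above mentions code adapted from the MLlib documentation. A minimal sketch of that documented PySpark 1.x ALS flow is shown here for reference; the input path, the rank and iteration values, and the user_product name (taken from the issue title) are illustrative assumptions, not the reporter's actual script.

    from pyspark import SparkContext
    from pyspark.mllib.recommendation import ALS, Rating

    sc = SparkContext("local[*]", "ALSRepro")

    # Hypothetical ratings file: each line is "user,product,rating"
    lines = sc.textFile("data/ratings.csv")
    ratings = lines.map(lambda l: l.split(',')) \
                   .map(lambda p: Rating(int(p[0]), int(p[1]), float(p[2])))

    # Train the model (rank and iteration counts are illustrative)
    model = ALS.train(ratings, rank=10, iterations=10)

    # Build (user, product) pairs and predict; an action such as first()
    # is what launches the predictAll() job and triggers the reported crash
    user_product = ratings.map(lambda r: (r.user, r.product))
    predictions = model.predictAll(user_product)
    print(predictions.first())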

[jira] [Commented] (SPARK-10361) model.predictAll() fails at user_product.first()

2015-08-31 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14723668#comment-14723668
 ] 

Sean Owen commented on SPARK-10361:
---

The tests are passing and I haven't heard anything like this. It does point to 
a local problem. At least, this stack trace is not the problem per se; the 
Python process wasn't able to connect to the JVM. You'd need to see why.
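
One way to see why, independent of MLlib, is a minimal JVM-to-Python round trip such as the sketch below (assumptions: local mode on Windows with a python interpreter on PATH; setting PYSPARK_PYTHON here is only for illustration). If this also fails with a SocketException, the problem is environmental rather than specific to ALS.

    import os
    # Assumption: point the workers at an explicit interpreter; adjust as needed
    os.environ.setdefault("PYSPARK_PYTHON", "python")

    from pyspark import SparkContext

    sc = SparkContext("local[1]", "worker-roundtrip-check")
    # Any action that ships a Python lambda forces a JVM <-> Python worker round trip
    print(sc.parallelize(range(100)).map(lambda x: x * x).sum())
    sc.stop()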


[jira] [Commented] (SPARK-10361) model.predictAll() fails at user_product.first()

2015-08-31 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14723193#comment-14723193
 ] 

Sean Owen commented on SPARK-10361:
---

The error is right here though -- your Python and JVM processes aren't able to 
communicate. 


[jira] [Commented] (SPARK-10361) model.predictAll() fails at user_product.first()

2015-08-31 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14723141#comment-14723141
 ] 

Sean Owen commented on SPARK-10361:
---

Doesn't this just appear to be a network problem, or some other problem specific to your cluster? The error is a SocketException.


[jira] [Commented] (SPARK-10361) model.predictAll() fails at user_product.first()

2015-08-31 Thread Velu nambi (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14723162#comment-14723162
 ] 

Velu nambi commented on SPARK-10361:


[~srowen] I'm running a standalone version of Spark on Windows.
I didn't see any process crash or anything suspicious in the firewall logs -- let me know if I'm missing something?

