How much memory are you giving the executor JVM (--executor-memory)? And what about the driver JVM (--driver-memory)?
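If you haven't pinned both down explicitly, try launching the shell with each bumped up and see whether the failure point moves. A rough sketch, with 4g as a placeholder value (size it to your machine; the path assumes a standard Spark layout on Windows):

    bin\pyspark.cmd --driver-memory 4g --executor-memory 4g

Once the shell is up, you can sanity-check what the driver JVM actually got. sc._jvm is a private PySpark attribute, so treat this as a quick diagnostic rather than supported API:

    >>> # Max heap of the driver JVM, in MB (should be close to what you asked for)
    >>> sc._jvm.java.lang.Runtime.getRuntime().maxMemory() / (1024 * 1024)

One gotcha that may explain why your SparkConf attempts changed nothing: in the shell, the driver JVM is already running by the time your SparkConf values are read, so spark.driver.memory set there has no effect. It has to go on the command line (or in conf/spark-defaults.conf).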
Also check the Windows Event Log for out-of-memory errors from either of those two JVMs.

On Dec 14, 2014 6:04 AM, "genesis fatum" <genesis.fa...@gmail.com> wrote:
> Hi,
>
> My environment is: standalone Spark 1.1.1 on Windows 8.1 Pro.
>
> The following case works fine:
>
> >>> a = [1,2,3,4,5,6,7,8,9]
> >>> b = []
> >>> for x in range(100000):
> ...     b.append(a)
> ...
> >>> rdd1 = sc.parallelize(b)
> >>> rdd1.first()
> [1, 2, 3, 4, 5, 6, 7, 8, 9]
>
> The following case does not work. The only difference is the size of the
> list. Note the loop range: 100K vs. 1M.
>
> >>> a = [1,2,3,4,5,6,7,8,9]
> >>> b = []
> >>> for x in range(1000000):
> ...     b.append(a)
> ...
> >>> rdd1 = sc.parallelize(b)
> >>> rdd1.first()
> 14/12/14 07:52:19 ERROR PythonRDD: Python worker exited unexpectedly (crashed)
> java.net.SocketException: Connection reset by peer: socket write error
>         at java.net.SocketOutputStream.socketWrite0(Native Method)
>         at java.net.SocketOutputStream.socketWrite(Unknown Source)
>         at java.net.SocketOutputStream.write(Unknown Source)
>         at java.io.BufferedOutputStream.flushBuffer(Unknown Source)
>         at java.io.BufferedOutputStream.write(Unknown Source)
>         at java.io.DataOutputStream.write(Unknown Source)
>         at java.io.FilterOutputStream.write(Unknown Source)
>         at org.apache.spark.api.python.PythonRDD$$anonfun$writeIteratorToStream$1.apply(PythonRDD.scala:341)
>         at org.apache.spark.api.python.PythonRDD$$anonfun$writeIteratorToStream$1.apply(PythonRDD.scala:339)
>         at scala.collection.Iterator$class.foreach(Iterator.scala:727)
>         at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
>         at org.apache.spark.api.python.PythonRDD$.writeIteratorToStream(PythonRDD.scala:339)
>         at org.apache.spark.api.python.PythonRDD$WriterThread$$anonfun$run$1.apply$mcV$sp(PythonRDD.scala:209)
>         at org.apache.spark.api.python.PythonRDD$WriterThread$$anonfun$run$1.apply(PythonRDD.scala:184)
>         at org.apache.spark.api.python.PythonRDD$WriterThread$$anonfun$run$1.apply(PythonRDD.scala:184)
>         at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1364)
>         at org.apache.spark.api.python.PythonRDD$WriterThread.run(PythonRDD.scala:183)
>
> What I have tried:
> 1. Replaced the 32-bit JRE with a 64-bit JRE
> 2. Multiple configurations when I start pyspark: --driver-memory,
>    --executor-memory
> 3. Tried to set the SparkConf with different settings
> 4. Tried also with Spark 1.1.0
>
> Being new to Spark, I am sure it is something simple that I am missing,
> and I would appreciate any thoughts.