Re: pyspark on yarn - lost executor

2014-09-21 Thread Sandy Ryza
Hi Oleg, Those parameters control the number and size of Spark's daemons on the cluster. If you're interested in how these daemons relate to each other and interact with YARN, I wrote a post on this a little while ago -

Re: pyspark on yarn - lost executor

2014-09-18 Thread Oleg Ruchovets
Great, the upgrade helped. I still need some input: 1) Are there any log files for Spark job execution? 2) Where can I read about tuning and parameter configuration? For example: --num-executors 12 --driver-memory 4g --executor-memory 2g. What is the meaning of those parameters? Thanks, Oleg. On Thu,
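
As a rough illustration of the flags asked about above: --num-executors is the number of executor processes requested, --executor-memory is the heap size of each executor, and --driver-memory is the heap size of the driver. The sketch below only does the arithmetic for the values quoted in this message. The helper names (parse_gb, requested_heap_gb) are hypothetical, and the numbers ignore the per-container overhead that YARN adds on top of the heap, so treat the result as a lower bound rather than the exact footprint.

```python
def parse_gb(size):
    """Convert a size string such as '4g' into whole gigabytes.

    Hypothetical helper: only the 'g' suffix is handled in this sketch.
    """
    if not size.lower().endswith("g"):
        raise ValueError("this sketch only handles sizes in gigabytes, e.g. '2g'")
    return int(size[:-1])


def requested_heap_gb(num_executors, executor_memory, driver_memory):
    """Return the heap requested for all executors and for the driver, in GB.

    Assumption: YARN memory overhead per container is not included.
    """
    return {
        "executors_total_gb": num_executors * parse_gb(executor_memory),
        "driver_gb": parse_gb(driver_memory),
    }


# Values taken from the command-line flags quoted in the message above.
request = requested_heap_gb(12, "2g", "4g")
print(request)
```

Raising --executor-memory gives each executor more room per task, while raising --num-executors adds parallelism; which one helps depends on whether individual tasks or the job as a whole is short of resources.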

pyspark on yarn - lost executor

2014-09-17 Thread Oleg Ruchovets
Hi, I am running PySpark on YARN. I successfully processed an initial dataset, but now I have grown it to 10 times the size. During execution I keep getting this error: 14/09/17 19:28:50 ERROR cluster.YarnClientClusterScheduler: Lost executor 68 on UCS-NODE1.sms1.local: remote Akka client

Re: pyspark on yarn - lost executor

2014-09-17 Thread Eric Friedman
How many partitions do you have in your input RDD? Are you specifying numPartitions in subsequent calls to groupByKey/reduceByKey? On Sep 17, 2014, at 4:38 AM, Oleg Ruchovets oruchov...@gmail.com wrote: Hi, I am running PySpark on YARN. I successfully processed an initial dataset

Re: pyspark on yarn - lost executor

2014-09-17 Thread Oleg Ruchovets
Sure, I'll post to the mailing list. groupByKey(self, numPartitions=None) (source code: http://spark.apache.org/docs/1.0.2/api/python/pyspark.rdd-pysrc.html#RDD.groupByKey) Group the values for each key in the RDD into a single sequence. Hash-partitions the resulting RDD into numPartitions
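
The behaviour described in the quoted docstring can be sketched in plain Python. This is not Spark's implementation: the function group_by_key below is a hypothetical single-process stand-in that assumes the partition of a key is hash(key) modulo the partition count, and it uses small integer keys so the built-in hash is deterministic.

```python
from collections import defaultdict


def group_by_key(pairs, num_partitions):
    """Group values per key, placing each key in partition hash(key) % num_partitions.

    Simplified, in-memory stand-in for the semantics in the docstring above.
    """
    partitions = [defaultdict(list) for _ in range(num_partitions)]
    for key, value in pairs:
        # Every value for a given key lands in the same partition.
        partitions[hash(key) % num_partitions][key].append(value)
    return [dict(p) for p in partitions]


pairs = [(1, "a"), (2, "b"), (1, "c"), (3, "d")]
for index, partition in enumerate(group_by_key(pairs, 2)):
    print(index, partition)
```

Because all values for one key are collected into a single sequence, the memory needed for a partition grows with both the number of keys it holds and the number of values per key.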

Re: pyspark on yarn - lost executor

2014-09-17 Thread Davies Liu
Maybe the Python worker uses too much memory during groupByKey(); calling groupByKey() with a larger numPartitions can help. Also, can you upgrade your cluster to 1.1? It can spill data to disk if memory cannot hold all the data during groupByKey(). Also, if there is a hot key with dozens of
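
One way to see why a larger numPartitions might relieve memory pressure is to count how many records the busiest partition would receive. The sketch below is a plain-Python approximation under stated assumptions: keys are evenly distributed integers, partitioning is hash(key) modulo the partition count, and largest_partition is a hypothetical helper, not a Spark API.

```python
from collections import Counter


def largest_partition(keys, num_partitions):
    """Return the record count of the fullest partition under hash partitioning."""
    counts = Counter(hash(key) % num_partitions for key in keys)
    return max(counts.values())


# 1000 distinct integer keys, one record each (an assumption for illustration).
keys = list(range(1000))
for num_partitions in (4, 16, 64):
    print(num_partitions, largest_partition(keys, num_partitions))
```

With evenly spread keys the fullest partition shrinks roughly in proportion to the partition count, so each task has less data to hold at once.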