Re: Spark job concurrency problem

2015-05-05 Thread Imran Rashid
Can you give your entire spark-submit command? Are you missing --executor-cores num_cpu? Also, if you intend to use all 6 nodes, you also need --num-executors 6.

On Mon, May 4, 2015 at 2:07 AM, Xi Shen davidshe...@gmail.com wrote: Hi, I have two small RDD, each has about 600 records. In my
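For reference, a full spark-submit invocation carrying the flags mentioned above might look roughly like the one below. The main class, jar name, and the value 8 for --executor-cores are placeholders, not details taken from the thread:

  spark-submit \
    --master yarn-cluster \
    --num-executors 6 \
    --executor-cores 8 \
    --class com.example.MyJob \
    my-job.jar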

Spark job concurrency problem

2015-05-04 Thread Xi Shen
Hi, I have two small RDDs, each with about 600 records. In my code, I did:

  val rdd1 = sc...cache()
  val rdd2 = sc...cache()
  val result = rdd1.cartesian(rdd2).repartition(num_cpu).map {
    case (a, b) => some_expensive_job(a, b)
  }

I ran my job in a YARN cluster with --master yarn-cluster. I have 6
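For anyone reading along, a self-contained sketch of the pattern described above is shown below. The RDD contents, the numCpu value, and the body of someExpensiveJob are placeholder assumptions, since the original code is elided in the archive:

  import org.apache.spark.{SparkConf, SparkContext}

  object CartesianConcurrencyDemo {
    // Stand-in for the expensive per-pair computation in the original post.
    def someExpensiveJob(a: Int, b: Int): Int = a * b

    def main(args: Array[String]): Unit = {
      val sc = new SparkContext(new SparkConf().setAppName("cartesian-demo"))

      val numCpu = 48  // assumed: 6 executors x 8 cores each

      // Two small cached RDDs of ~600 records, as in the original code.
      val rdd1 = sc.parallelize(1 to 600).cache()
      val rdd2 = sc.parallelize(1 to 600).cache()

      // cartesian() produces ~360,000 pairs; repartition(numCpu) spreads them
      // across numCpu partitions so the expensive map can run concurrently,
      // provided enough executor cores were requested at submit time.
      val result = rdd1.cartesian(rdd2)
        .repartition(numCpu)
        .map { case (a, b) => someExpensiveJob(a, b) }

      println(result.count())  // force evaluation
      sc.stop()
    }
  }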