Mohini,

We set that parameter before we played with the number of executors, and it didn't seem to help at all.
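For reference, here is roughly what we ran (spark-shell on Spark 1.6; the table names and join key below are placeholders, not our real schema):

    // In spark-shell (Spark 1.6), sqlContext is predefined.
    // Set the number of post-shuffle partitions before running the join:
    sqlContext.setConf("spark.sql.shuffle.partitions", "1000")

    // Confirm the setting actually took effect:
    println(sqlContext.getConf("spark.sql.shuffle.partitions"))

    // Placeholder tables standing in for our two 100M+ row tables:
    val left = sqlContext.table("table_a")
    val right = sqlContext.table("table_b")
    val joined = left.join(right, left("join_key") === right("join_key"))
    joined.count()  // the progress meter still showed 30,000+ total tasks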
Thanks,
KP

On Tue, Mar 14, 2017 at 3:37 PM, mohini kalamkar <mohini.kalam...@gmail.com> wrote:

> Hi,
>
> Try using this parameter: --conf spark.sql.shuffle.partitions=1000
>
> Thanks,
> Mohini
>
> On Tue, Mar 14, 2017 at 3:30 PM, kpeng1 <kpe...@gmail.com> wrote:
>
>> Hi All,
>>
>> I am currently on Spark 1.6, and I was doing a SQL join on two tables that
>> are over 100 million rows each when I noticed that it spawned 30,000+
>> tasks (this is what the progress meter shows). We tried coalesce,
>> repartition, and the shuffle partitions setting to bring the number of
>> tasks down, because we were getting timeouts due to the number of tasks
>> being spawned, but none of those operations seemed to reduce the task
>> count. The workaround we came up with was to set the number of executors
>> to 50 (--num-executors=50), which capped us at 200 active tasks, but the
>> total number of tasks remained the same. Does anyone know what is going
>> on? Is there an optimal number of executors? I was under the impression
>> that the default dynamic allocation would pick the optimal number of
>> executors for us and that this situation wouldn't happen. Is there
>> something I am missing?
>>
>> --
>> View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Setting-Optimal-Number-of-Spark-Executor-Instances-tp28493.html
>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>
>> ---------------------------------------------------------------------
>> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
>>
>
> --
> Thanks & Regards,
> Mohini Kalamkar
> M: +1 310 567 9329
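A minimal sketch of the difference between the two knobs discussed in this thread (Spark 1.6 Scala, continuing from the sketch in the reply above; `joined` stands in for the join result): spark.sql.shuffle.partitions determines how many tasks the join's shuffle stage produces when the query is planned, whereas repartition and coalesce only change the partitioning of a DataFrame that already exists, so they do not shrink the task count of the stage that computes it.

    // Reduce-side parallelism for SQL joins and aggregations comes from
    // spark.sql.shuffle.partitions at the time the shuffle is planned:
    sqlContext.setConf("spark.sql.shuffle.partitions", "1000")

    // repartition forces a full shuffle into exactly 200 partitions;
    // coalesce merges existing partitions without a full shuffle:
    val reshuffled = joined.repartition(200)
    val merged = joined.coalesce(200)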