Hi Sameer,

You can also use repartition to create a higher number of tasks.
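A common rule of thumb is to aim for roughly 2-3 tasks per available core, so with 10 executors the 104 tasks you see may be leaving cores idle. A rough sizing sketch (the core counts and the tasks-per-core factor here are illustrative assumptions, not values measured from your job):

```scala
// Sketch: choosing a repartition() target for a cluster like the one
// described below. All numbers are illustrative assumptions.
object PartitionSizing {
  // numExecutors * coresPerExecutor * tasksPerCore is a common rule of
  // thumb for spark.default.parallelism / repartition().
  def targetPartitions(numExecutors: Int,
                       coresPerExecutor: Int,
                       tasksPerCore: Int = 3): Int =
    numExecutors * coresPerExecutor * tasksPerCore

  def main(args: Array[String]): Unit = {
    // e.g. 10 executors with (hypothetically) 8 cores each:
    val n = targetPartitions(numExecutors = 10, coresPerExecutor = 8)
    println(n) // 240
    // In the job itself (needs a SparkContext, so shown as comments):
    //   val data = MLUtils.loadLibSVMFile(sc, path).repartition(n).cache()
  }
}
```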
-Jayant

On Fri, Nov 21, 2014 at 12:02 PM, Jayant Shekhar <jay...@cloudera.com> wrote:

> Hi Sameer,
>
> You can try increasing the number of executor-cores.
>
> -Jayant
>
> On Fri, Nov 21, 2014 at 11:18 AM, Sameer Tilak <ssti...@live.com> wrote:
>
>> Hi All,
>>
>> I have been using MLlib's linear regression and I have some questions
>> regarding its performance. We have a cluster of 10 nodes; each node has
>> 24 cores and 148 GB of memory. I am running my app as follows:
>>
>> time spark-submit --class medslogistic.MedsLogistic --master yarn-client
>> --executor-memory 6G --num-executors 10 /pathtomyapp/myapp.jar
>>
>> I am also going to experiment with the number of executors (perhaps
>> reducing it); that may give different results.
>>
>> The input is an 800 MB sparse file in LibSVM format with 150K features
>> in total. The regression takes approximately 70 minutes to finish, yet
>> the job imposes very little load on CPU, memory, network, and disk. The
>> total number of tasks is 104, and the time is divided fairly uniformly
>> across them. I was wondering: is it possible to reduce the execution
>> time further?
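Putting both suggestions together, the original spark-submit command could be extended along these lines; the specific core count and parallelism value are illustrative guesses, not tuned recommendations:

```shell
# Same job as above, but with explicit executor cores and a higher
# default parallelism so more tasks can run concurrently.
# --executor-cores 8 and parallelism 240 are illustrative values.
time spark-submit \
  --class medslogistic.MedsLogistic \
  --master yarn-client \
  --executor-memory 6G \
  --num-executors 10 \
  --executor-cores 8 \
  --conf spark.default.parallelism=240 \
  /pathtomyapp/myapp.jar
```

Whether this helps depends on where the time actually goes; with low CPU, network, and disk load throughout, too few concurrent tasks is a plausible bottleneck.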