Hi,

Where is yarn-client mode specified? You wrote: "However, when I run the job I see that the stage which reads the directory has only 8 tasks." How do you see 8 tasks for that stage? It looks like you're running in local[*] mode on an 8-core machine (like mine), which is why I'm asking such basic questions.
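To make the "takes twice the time" observation concrete, here is a minimal sketch in plain Python (no Spark involved). It assumes every file takes the same time to read and that files are dealt round-robin to tasks; both are simplifications of mine, not how Spark actually packs files into splits. The names assign_files and stage_time are hypothetical helpers made up for this illustration.

```python
from typing import List


def assign_files(num_files: int, num_tasks: int) -> List[List[int]]:
    """Deal file indices round-robin over tasks.

    Simplified model only (assumption): Spark's real input format packs
    files into splits by size, not by round-robin.
    """
    tasks: List[List[int]] = [[] for _ in range(num_tasks)]
    for i in range(num_files):
        tasks[i % num_tasks].append(i)
    return tasks


def stage_time(num_files: int, num_tasks: int, secs_per_file: float = 1.0) -> float:
    """A stage finishes when its busiest task finishes."""
    busiest = max(len(files) for files in assign_files(num_files, num_tasks))
    return busiest * secs_per_file


if __name__ == "__main__":
    # Compare the reported situation (8 tasks) with the desired one (12 tasks).
    for n_tasks in (8, 12):
        print(n_tasks, "tasks:", stage_time(12, n_tasks), "time units")
```

With 12 files and 8 tasks, four tasks in this model end up with two files each, so the stage takes about two file-reads; with 12 tasks it takes one. On the real job, checking the RDD's getNumPartitions() right after wholeTextFiles would show how many partitions Spark actually created.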
Pozdrawiam,
Jacek Laskowski
----
https://medium.com/@jaceklaskowski/
Mastering Apache Spark http://bit.ly/mastering-apache-spark
Follow me at https://twitter.com/jaceklaskowski

On Tue, Jul 26, 2016 at 2:39 PM, Mail.com <pradeep.mi...@mail.com> wrote:
> More of jars and files and app name. It runs on yarn-client mode.
>
> Thanks,
> Pradeep
>
>> On Jul 26, 2016, at 7:10 AM, Jacek Laskowski <ja...@japila.pl> wrote:
>>
>> Hi,
>>
>> What's "<all other stuff>"? What master URL do you use?
>>
>> Pozdrawiam,
>> Jacek Laskowski
>>
>>> On Tue, Jul 26, 2016 at 2:18 AM, Mail.com <pradeep.mi...@mail.com> wrote:
>>> Hi All,
>>>
>>> I have a directory which has 12 files. I want to read the entire file so I
>>> am reading it as wholeTextFiles(dirpath, numPartitions).
>>>
>>> I run spark-submit as <all other stuff> --num-executors 12 --executor-cores
>>> 1 and numPartitions 12.
>>>
>>> However, when I run the job I see that the stage which reads the directory
>>> has only 8 tasks. So some task reads more than one file and takes twice the
>>> time.
>>>
>>> What can I do that the files are read by 12 tasks I.e one file per task.
>>>
>>> Thanks,
>>> Pradeep

---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscr...@spark.apache.org