Yes, the driver keeps a fair amount of metadata to manage scheduling across all your executors. I assume that with 64 nodes you have more executors as well. A simple way to test this is to increase the driver memory.
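For example, a minimal sketch of bumping the driver memory (the 8g value is just a placeholder to tune for your cluster):

import org.apache.spark.{SparkConf, SparkContext}

// Note: spark.driver.memory must be set before the driver JVM starts, so in
// client mode pass it on the command line (spark-submit --driver-memory 8g)
// or in spark-defaults.conf; setting it in SparkConf only takes effect in
// cluster mode. The 8g value is a placeholder to tune for your cluster.
val conf = new SparkConf()
  .setAppName("driver-memory-test")   // illustrative app name
  .set("spark.driver.memory", "8g")
val sc = new SparkContext(conf)

If the job completes with more driver memory, that points at driver-side bookkeeping (task and scheduling metadata) as the culprit.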
On Wed, Jun 22, 2016 at 10:10 AM, Raghava Mutharaju <m.vijayaragh...@gmail.com> wrote:

> It is an iterative algorithm which uses map, mapPartitions, join, union,
> filter, broadcast and count. The goal is to compute a set of tuples, and in
> each iteration a few tuples are added to it. An outline is given below:
>
> 1) Start with an initial set of tuples, T
> 2) In each iteration compute deltaT, and add it to T, i.e., T = T + deltaT
> 3) Stop when the current T size (count) is the same as the previous T size,
> i.e., deltaT is 0.
>
> Do you think something happens on the driver due to the application logic
> when the number of partitions is increased?
>
> Regards,
> Raghava.
>
> On Wed, Jun 22, 2016 at 12:33 PM, Sonal Goyal <sonalgoy...@gmail.com> wrote:
>
>> What does your application do?
>>
>> Best Regards,
>> Sonal
>> Founder, Nube Technologies <http://www.nubetech.co>
>> Reifier at Strata Hadoop World <https://www.youtube.com/watch?v=eD3LkpPQIgM>
>> Reifier at Spark Summit 2015
>> <https://spark-summit.org/2015/events/real-time-fuzzy-matching-with-spark-and-elastic-search/>
>> <http://in.linkedin.com/in/sonalgoyal>
>>
>> On Wed, Jun 22, 2016 at 9:57 PM, Raghava Mutharaju <m.vijayaragh...@gmail.com> wrote:
>>
>>> Hello All,
>>>
>>> We have a Spark cluster where the driver and the master run on the same
>>> node. We are using the Spark Standalone cluster manager. If the number of
>>> nodes (and partitions) is increased, the same dataset that used to run to
>>> completion on a smaller number of nodes now gives an out-of-memory error
>>> on the driver.
>>>
>>> For example, a dataset with the number of partitions set to 256 runs to
>>> completion on 32 nodes, whereas the same dataset run on 64 nodes with 512
>>> partitions gives an OOM on the driver side.
>>>
>>> From what I have read in the Spark documentation and other articles, the
>>> responsibilities of the driver/master are the following:
>>>
>>> 1) create the Spark context
>>> 2) build the DAG of operations
>>> 3) schedule tasks
>>>
>>> I am guessing that 1) and 2) should not change with the number of
>>> nodes/partitions. So is the OOM caused by the driver having to keep track
>>> of a lot more tasks?
>>>
>>> What could be the possible reasons behind the driver-side OOM when the
>>> number of partitions is increased?
>>>
>>> Regards,
>>> Raghava.
>>>
>
> --
> Regards,
> Raghava
> http://raghavam.github.io
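For reference, a minimal Scala sketch of the kind of loop described above (computeDelta and the RDD[(Long, Long)] element type are illustrative stand-ins for the actual map/mapPartitions/join/union/filter pipeline, not your code):

import org.apache.spark.rdd.RDD
import org.apache.spark.storage.StorageLevel

// Iterate until the set of tuples stops growing (deltaT is empty).
// computeDelta is a hypothetical placeholder for the per-iteration logic.
def fixpoint(initial: RDD[(Long, Long)],
             computeDelta: RDD[(Long, Long)] => RDD[(Long, Long)]): RDD[(Long, Long)] = {
  var t = initial.persist(StorageLevel.MEMORY_AND_DISK)
  var prevCount = -1L
  var count = t.count()  // each count() launches a job; the driver tracks
                         // one task per partition, so its bookkeeping grows
                         // with the partition count
  while (count != prevCount) {
    val deltaT = computeDelta(t)
    val next = t.union(deltaT).distinct()       // T = T + deltaT (set semantics)
      .persist(StorageLevel.MEMORY_AND_DISK)
    prevCount = count
    count = next.count()                        // materialize next before dropping t
    t.unpersist()
    t = next
  }
  t
}

One thing worth noting with this pattern: every iteration extends the RDD lineage, and the driver retains that DAG metadata in addition to per-task scheduling state, so both grow with iterations and partitions. Periodic checkpointing (sc.setCheckpointDir(...) followed by t.checkpoint()) truncates the lineage and can ease driver-side memory pressure.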