Re: Custom Hadoop InputSplit, Spark partitions, spark executors/task and Yarn containers

2015-09-24 Thread Adrian Tanase
To: Anfernee Xu Cc: "user@spark.apache.org" Subject: Re: Custom Hadoop InputSplit, Spark partitions, spark executors/task and Yarn containers Hi Anfernee, That's correct that each InputSplit will map to exactly a Spark partition. On YARN, each Spark executor maps to a single YARN container.
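
A minimal sketch of that executor-to-container mapping, assuming a YARN deployment (the app name and resource numbers below are illustrative, not from this thread):

import org.apache.spark.{SparkConf, SparkContext}

object StaticExecutorSizing {
  def main(args: Array[String]): Unit = {
    // The master ("yarn-client" / "yarn-cluster" in Spark 1.x) is usually
    // supplied via spark-submit --master, so it is omitted here.
    val conf = new SparkConf()
      .setAppName("executor-container-demo")
      .set("spark.executor.instances", "10") // 10 executors => 10 YARN containers
      .set("spark.executor.cores", "4")      // each executor runs up to 4 tasks in parallel
      .set("spark.executor.memory", "4g")
    val sc = new SparkContext(conf)
    // ... run jobs; each executor processes many tasks over its lifetime ...
    sc.stop()
  }
}

Each of the 10 containers hosts one executor JVM, and that executor cycles through tasks for the life of the application.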

Re: Custom Hadoop InputSplit, Spark partitions, spark executors/task and Yarn containers

2015-09-24 Thread Sabarish Sasidharan
From: Sandy Ryza > Date: Thursday, September 24, 2015 at 2:43 AM > To: Anfernee Xu > Cc: "user@spark.apache.org" > Subject: Re: Custom Hadoop InputSplit, Spark partitions, spark > executors/task and Yarn containers > > Hi Anfernee, > > That's correct that each InputSplit will map to exactly a Spark partition.

Custom Hadoop InputSplit, Spark partitions, spark executors/task and Yarn containers

2015-09-23 Thread Anfernee Xu
Hi Spark experts, I keep running into these terms and I'm a bit confused; could you please help me understand them better? For instance, I have implemented a Hadoop InputFormat to load my external data into Spark. My custom InputFormat creates a bunch of InputSplits, and my question is how those splits relate to Spark partitions, executors/tasks, and YARN containers.
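
To make the setup concrete, here is a minimal sketch of loading data through a Hadoop InputFormat with newAPIHadoopRDD; TextInputFormat and the input path are placeholders standing in for the custom InputFormat described above:

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat
import org.apache.spark.{SparkConf, SparkContext}

object SplitToPartitionDemo {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("split-to-partition-demo"))

    val hadoopConf = new Configuration()
    // Placeholder path; a custom InputFormat plugs into newAPIHadoopRDD the same way.
    hadoopConf.set("mapreduce.input.fileinputformat.inputdir", "/path/to/input")

    val rdd = sc.newAPIHadoopRDD(hadoopConf, classOf[TextInputFormat],
      classOf[LongWritable], classOf[Text])

    // Spark creates one partition per InputSplit returned by getSplits(),
    // so this count equals the number of splits.
    println(s"partitions = ${rdd.partitions.length}")

    sc.stop()
  }
}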

Re: Custom Hadoop InputSplit, Spark partitions, spark executors/task and Yarn containers

2015-09-23 Thread Sandy Ryza
Hi Anfernee, That's correct that each InputSplit will map to exactly a Spark partition. On YARN, each Spark executor maps to a single YARN container. Each executor can run multiple tasks over its lifetime, both in parallel and sequentially. If you enable dynamic allocation, after the stage is submitted Spark requests executors based on the backlog of pending tasks and releases executors that have been idle past the configured timeout.
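
For reference, a minimal sketch of turning on dynamic allocation (the property names are the standard Spark ones; the bounds and timeout are illustrative):

import org.apache.spark.{SparkConf, SparkContext}

object DynamicAllocationDemo {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("dynamic-allocation-demo")
      .set("spark.dynamicAllocation.enabled", "true")
      .set("spark.shuffle.service.enabled", "true") // external shuffle service is required on YARN
      .set("spark.dynamicAllocation.minExecutors", "2")
      .set("spark.dynamicAllocation.maxExecutors", "20")
      .set("spark.dynamicAllocation.executorIdleTimeout", "60s") // release idle executors after 60s
    val sc = new SparkContext(conf)
    // ... executors (YARN containers) are now requested and released with task demand ...
    sc.stop()
  }
}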