It is already possible to request a specific host for a partition. But you may want to weigh the cost of container allocation, and the need to reset the entire DAG, against the benefits you get from data locality.
-- sent from mobile

On May 9, 2016 2:59 PM, "Chandni Singh" <[email protected]> wrote:
> Hi Pramod,
>
> I thought about this, and IMO one way to achieve this a little more efficiently
> is by providing some support from the platform and intelligent
> partitioning in BlockReader.
>
> 1. Platform support: a partition should be able to express on which node it
> should be created. The Application Master then requests the RM to deploy the
> partition on that node.
>
> 2. Initially just one instance of BlockReader is created. When it receives
> BlockMetadata, it can derive where the new HDFS blocks are, so it can
> create more partitions if there isn't a BlockReader already running on
> that node.
>
> I would like to take it up if there is some consensus to support this.
>
> Chandni
>
> On Mon, May 9, 2016 at 2:56 PM, Sandesh Hegde <[email protected]>
> wrote:
>
> > So the requirement is to mix runtime and deployment decisions.
> > How about allowing the operators to request re-deployment based on the
> > runtime condition?
> >
> > On Mon, May 9, 2016 at 2:33 PM Pramod Immaneni <[email protected]>
> > wrote:
> >
> > > The file splitter / block reader combination allows for parallel reading
> > > of files by multiple partitions by dividing the files into blocks. Does
> > > anyone have any ideas on how to make the block readers data local to the
> > > blocks they are reading?
> > >
> > > I think we will need to spawn block readers on all nodes where the blocks
> > > are present (if the readers are reading multiple files, this could mean
> > > all the nodes in the cluster) and route the block meta information to
> > > the appropriate block reader.
> > >
> > > Thanks
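The routing idea discussed in the thread can be sketched in plain Java. Each block's metadata goes to a reader that is already co-located with one of the block's replicas, and a new reader is created only on a node that lacks one. This is a minimal sketch under stated assumptions. `BlockMetadata`, `LocalityRouter`, `pendingPartitionRequests` and the `reader@host` naming are hypothetical and are not Apex/Malhar APIs. The replica host list is assumed to come from the HDFS block locations. Actually requesting a container on a given host from the RM is left out. Records require Java 16+.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch: route block metadata to a block reader running on a node
 * that holds a replica of the block, creating a new reader only when no replica
 * host has one yet. Names are illustrative, not Apex/Malhar APIs.
 */
public class LocalityRouter {

    /** Assumed minimal shape of the block metadata emitted by the file splitter. */
    public record BlockMetadata(long blockId, List<String> replicaHosts) {}

    // host -> id of the (hypothetical) block reader partition deployed on that host
    private final Map<String, String> readersByHost = new LinkedHashMap<>();
    // hosts for which a new reader partition would have to be requested
    private final List<String> pendingPartitionRequests = new ArrayList<>();

    /** Returns the reader that should read this block, preferring a data-local one. */
    public String route(BlockMetadata block) {
        if (block.replicaHosts().isEmpty()) {
            throw new IllegalArgumentException("block " + block.blockId() + " has no replica hosts");
        }
        // Prefer a reader that already runs on one of the block's replica hosts.
        for (String host : block.replicaHosts()) {
            String reader = readersByHost.get(host);
            if (reader != null) {
                return reader;
            }
        }
        // No local reader yet: greedily pick the first replica host for a new partition.
        String host = block.replicaHosts().get(0);
        String reader = "reader@" + host;
        readersByHost.put(host, reader);
        pendingPartitionRequests.add(host); // would become a host-specific container request
        return reader;
    }

    /** Hosts on which new reader partitions were requested, in request order. */
    public List<String> pendingPartitionRequests() {
        return List.copyOf(pendingPartitionRequests);
    }

    public static void main(String[] args) {
        LocalityRouter router = new LocalityRouter();
        System.out.println(router.route(new BlockMetadata(1, List.of("node1", "node2")))); // reader@node1
        System.out.println(router.route(new BlockMetadata(2, List.of("node3", "node1")))); // reader@node1
        System.out.println(router.route(new BlockMetadata(3, List.of("node3", "node4")))); // reader@node3
        System.out.println(router.pendingPartitionRequests()); // [node1, node3]
    }
}
```

The greedy choice of the first replica host keeps the number of readers small. It does not account for the container-allocation and DAG-reset costs raised at the top of the thread. A real implementation might only request a new partition when the expected locality gain outweighs those costs.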
