It is already possible to request a specific host for a partition.

But you may want to weigh the cost of container allocation, and of having to
reset the entire DAG, against the benefits that you get from data locality.

--
sent from mobile
On May 9, 2016 2:59 PM, "Chandni Singh" <[email protected]> wrote:

> Hi Pramod,
>
> I thought about this, and IMO one way to achieve this a little more
> efficiently is to provide some support from the platform and intelligent
> partitioning in BlockReader.
>
> 1. Platform support: A partition should be able to express on which node it
> should be created. The application master then requests the RM to deploy the
> partition on that node.
>
> 2. Initially just one instance of BlockReader is created. When it receives
> BlockMetadata, it can derive where the new HDFS blocks are. It can then
> create more partitions if a BlockReader isn't already running on that
> node.
>
> I would like to take this up if there is some consensus to support it.
>
> Chandni
>
> On Mon, May 9, 2016 at 2:56 PM, Sandesh Hegde <[email protected]>
> wrote:
>
> > So the requirement is to mix runtime and deployment decisions.
> > How about allowing the operators to request re-deployment based on the
> > runtime condition?
> >
> >
> > On Mon, May 9, 2016 at 2:33 PM Pramod Immaneni <[email protected]>
> > wrote:
> >
> > > The file splitter and block reader combination allows parallel reading of
> > > files by multiple partitions by dividing the files into blocks. Does anyone
> > > have any ideas on how to make the block readers data local to the blocks
> > > they are reading?
> > >
> > > I think we will need to spawn block readers on all nodes where the blocks
> > > are present, which could mean all the nodes in the cluster if the readers
> > > are reading multiple files, and route the block meta information to the
> > > appropriate block reader.
> > >
> > > Thanks
> > >
> >
>
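
To make the trade-off above concrete, here is a rough sketch of the decision
logic only. Nothing in it is an Apex or Malhar API: LocalityPlanner, Block,
Plan and the maxReaders cap are hypothetical names made up for illustration.
It assumes each block's metadata carries the hosts holding a replica, that we
know which hosts already run a reader, and that the cap stands in for the cost
of allocating more containers.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch; none of these types exist in Apex or Malhar. */
final class LocalityPlanner {

    /** Stand-in for block metadata: a block id plus the hosts holding a replica. */
    static final class Block {
        final long id;
        final List<String> replicaHosts;

        Block(long id, List<String> replicaHosts) {
            this.id = id;
            this.replicaHosts = new ArrayList<>(replicaHosts);
        }
    }

    /** Planning result: where each block is routed, and which hosts need a new reader. */
    static final class Plan {
        final Map<Long, String> blockToHost = new LinkedHashMap<>();
        final Set<String> hostsNeedingReader = new LinkedHashSet<>();
    }

    /**
     * Routes each block to a reader host. maxReaders caps the total number of
     * readers, as an assumed proxy for the cost of allocating containers.
     */
    static Plan plan(List<Block> blocks, Set<String> hostsWithReader, int maxReaders) {
        Plan plan = new Plan();
        Set<String> active = new LinkedHashSet<>(hostsWithReader);
        for (Block block : blocks) {
            String target = null;
            // Prefer a replica host that already runs a reader.
            for (String host : block.replicaHosts) {
                if (active.contains(host)) {
                    target = host;
                    break;
                }
            }
            // Otherwise ask for a new reader on the first replica host, if the cap allows.
            if (target == null && !block.replicaHosts.isEmpty() && active.size() < maxReaders) {
                target = block.replicaHosts.get(0);
                active.add(target);
                plan.hostsNeedingReader.add(target);
            }
            // Otherwise fall back to an existing reader and accept a non-local read.
            if (target == null && !active.isEmpty()) {
                target = active.iterator().next();
            }
            // A block is left out of the map only when no reader exists at all.
            if (target != null) {
                plan.blockToHost.put(block.id, target);
            }
        }
        return plan;
    }
}
```

With one reader on n1 and a cap of two readers, a block replicated on n1 stays
on n1, a block on n3 triggers a request for a reader on n3, and any later
block on a third host falls back to a non-local read from n1.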
