Re: Support for Anti-Affinity in Apex

Isha Arkatkar Tue, 19 Jan 2016 14:20:07 -0800

Hi Gaurav,

   I think Pramod agreed with approach 2a.
   For approach 2b, I think we will not initially have the set of nodes
available in the cluster. So we need to try a greedy approach that gets
containers on different nodes in order to build a mapping. Once we have the
mapping, though, we can send requests for specific nodes. I think 2b may be
more applicable to the container re-allocation scenario, provided the node
locality request is honored by YARN. I can test this out.
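
   To make the greedy step concrete, here is a rough sketch in plain Java.
It is only an illustration under assumptions: the names
GreedyAntiAffinitySketch and placeRound are hypothetical, container offers
are modeled as {containerId, host} string pairs instead of real YARN
objects, and no Apex or YARN API is called.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: none of these names exist in Apex or YARN.
final class GreedyAntiAffinitySketch {

    /**
     * Runs one greedy round over a batch of container offers.
     *
     * offers    - each entry is {containerId, host}
     * pending   - operators that must all land on different hosts;
     *             operators placed in this round are removed from it
     * usedHosts - hosts taken in earlier rounds; updated in place
     * placement - operator name to host; updated in place
     *
     * Returns the ids of containers that could not be used and
     * should be released.
     */
    static List<String> placeRound(List<String[]> offers,
                                   List<String> pending,
                                   Set<String> usedHosts,
                                   Map<String, String> placement) {
        List<String> discarded = new ArrayList<>();
        for (String[] offer : offers) {
            String containerId = offer[0];
            String host = offer[1];
            // Discard the offer if nothing is left to place or if the
            // host already runs an operator from this anti-affinity group.
            if (pending.isEmpty() || usedHosts.contains(host)) {
                discarded.add(containerId);
                continue;
            }
            String operator = pending.remove(0);
            usedHosts.add(host);
            placement.put(operator, host);
        }
        return discarded;
    }
}
```

   The caller would invoke placeRound once per allocation response,
release the returned containers, and request again for whatever is still
in pending. The resulting placement map is the node mapping that could
later drive node-specific requests.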


Pramod, could you please elaborate on anti-cluster affinity?

Thanks,
Isha

On Tue, Jan 19, 2016 at 2:14 PM, Thomas Weise <[email protected]>
wrote:

> Gaurav,
>
> Does request for specific node work on FairScheduler now?
>
>
>
> On Tue, Jan 19, 2016 at 2:06 PM, Gaurav Gupta <[email protected]>
> wrote:
>
> > I agree with Pramod that we should go with 2b, and we are already doing
> > node locality, so you can use that feature.
> > Regarding 3, do we need to support relaxed anti-affinity? Anti-affinity
> > will mostly be used where the user wants such segregation of operators on
> > different nodes for his/her app.
> >
> > Thanks
> > - Gaurav
> >
> > > On Jan 19, 2016, at 1:57 PM, Pramod Immaneni <[email protected]>
> > > wrote:
> > >
> > > Sorry I meant distro agnostic (without the not) in the first sentence.
> > >
> > > > On Tue, Jan 19, 2016 at 1:57 PM, Pramod Immaneni <[email protected]>
> > > > wrote:
> > >
> > > >> Isha this sounds great. 2 a. sounds like a good approach that is not
> > > >> distro agnostic. How about also supporting a minor variation of it as
> > > >> an option where it greedily gets the total number of containers and
> > > >> discards ones it can't use and repeats the process for the remaining
> > > >> till everything has been allocated. Also does it make sense to support
> > > >> anti-cluster affinity?
> > >>
> > >> Thanks
> > >>
> > >> On Tue, Jan 19, 2016 at 1:21 PM, Isha Arkatkar <[email protected]>
> > >> wrote:
> > >>
> > >>> Hi all,
> > >>>
> > > >>>   We want to add support for anti-affinity in Apex to allow
> > > >>> applications to launch specific physical operators on different
> > > >>> nodes (APEXCORE-10
> > > >>> <https://issues.apache.org/jira/browse/APEXCORE-10>). We would like
> > > >>> to request your suggestions/ideas for the same!
> > >>>
> > > >>>  The reasons for using anti-affinity in operators could be: to ensure
> > > >>> reliability, for performance reasons (for example, an application may
> > > >>> not want 2 I/O-intensive operators to land on the same node), or for
> > > >>> some application-specific constraints (for example, 2 partitions
> > > >>> cannot be run on the same node since they use the same port number).
> > > >>> This is the general rationale for adding anti-affinity support.
> > >>>
> > > >>> Since YARN does not support anti-affinity yet (YARN-1042
> > > >>> <https://issues.apache.org/jira/browse/YARN-1042>), we need to
> > > >>> implement the logic in the AM. I wanted to get your views on the
> > > >>> following aspects of this implementation:
> > >>>
> > >>> *1. How to specify anti-affinity for physical operators/partitions in
> > >>> application:*
> > > >>>    One way to do this is to have an attribute for setting
> > > >>> anti-affinity at the logical operator context. An operator can set
> > > >>> this attribute with a list of operator names which should not be
> > > >>> collocated.
> > > >>>     Consider a DAG with 3 operators:
> > >>>     TestOperator o1 = dag.addOperator("O1", new TestOperator());
> > >>>     TestOperator o2 = dag.addOperator("O2", new TestOperator());
> > >>>     TestOperator o3 = dag.addOperator("O3", new TestOperator());
> > >>>
> > > >>> To set anti-affinity for O1 operator:
> > > >>>     dag.setAttribute(o1, OperatorContext.ANTI_AFFINITY,
> > > >>>         new ArrayList<String>(Arrays.asList("O2", "O3")));
> > > >>>     This would mean O1 should not be allocated on nodes containing
> > > >>> operators O2 and O3. This applies to all allocated partitions of O1,
> > > >>> O2, O3.
> > >>>
> > > >>>   Also, if an operator's own name is part of its anti-affinity list,
> > > >>> it means partitions of that operator should not be allocated on the
> > > >>> same node. Example:
> > > >>>     dag.setAttribute(o2, OperatorContext.ANTI_AFFINITY,
> > > >>>         new ArrayList<String>(Arrays.asList("O2")));
> > > >>>    This indicates anti-affinity between all partitions of O2, i.e.
> > > >>> all partitions of O2 should be launched on different nodes.
> > >>>
> > > >>>   Based on the anti-affinity attribute specified for the logical
> > > >>> operator, we can add this list to each PTContainer during physical
> > > >>> plan creation. It will in turn be available to Stram for sending
> > > >>> container requests accordingly.
> > >>>
> > >>>   Please suggest if there is a better way to express this intent.
> > >>>
> > >>> *2. How to implement anti-affinity in AM*
> > >>>   There are 2 ways we can implement this:
> > > >>>   *a. Blacklisting of nodes:* We can group the physical container
> > > >>> requests based on anti-affinity requirements and send allocation
> > > >>> requests for containers in groups. After the first group is done,
> > > >>> blacklist its nodes before sending the second group of container
> > > >>> requests. This will ensure that the containers with anti-affinity
> > > >>> requirements will be allocated on different nodes.
> > > >>>   *b. Node-specific container request:* Explore and create a map of
> > > >>> nodes present in the cluster and send an allocation request for a
> > > >>> container on a specific node, honoring anti-affinity. There are a
> > > >>> couple of open YARN JIRAs for node-specific container requests:
> > > >>> YARN-1412 <https://issues.apache.org/jira/browse/YARN-1412>,
> > > >>> YARN-2027 <https://issues.apache.org/jira/browse/YARN-2027>. So we
> > > >>> need to check if this is a plausible approach.
> > >>>
> > >>> *3. Strict Vs Relaxed anti-affinity*
> > > >>>  Depending on cluster resource availability, it may not be possible
> > > >>> to honor all anti-affinity requirements specified.
> > > >>> *Strict Anti-affinity:* AM will keep trying indefinitely to allocate
> > > >>> containers as per the anti-affinity requirements. This behavior will
> > > >>> be similar to how an application stays in the ACCEPTED state until
> > > >>> resources are available to launch it in the cluster.
> > > >>> *Relaxed Anti-affinity:* AM will drop the anti-affinity constraint
> > > >>> after a certain timeout.
> > > >>>
> > > >>> We need a way to set this attribute through the application (either
> > > >>> in the operator context or in DAGContext for an application-wide
> > > >>> setting).
> > >>>
> > >>> *4. How do we unit test this feature*
> > > >>>  We could use Mockito to mock YARN behaviors and test only the AM
> > > >>> implementation, since it may not be easy to simulate some scenarios
> > > >>> manually in a cluster. Please suggest if there are better ways to
> > > >>> test this.
> > >>>
> > >>> Please suggest improvements or any other ideas on all of the above.
> > >>>
> > >>> Thanks!
> > >>> Isha
> > >>>
> > > >>> P.S. Sorry for the long email. Please let me know if I should start
> > > >>> separate threads for any of the above points.
> > >>>
> > >>
> > >>
> >
> >
>
