Nope, it still doesn’t work.

Thanks
- Gaurav
> On Jan 19, 2016, at 2:14 PM, Thomas Weise <[email protected]> wrote:
>
> Gaurav,
>
> Does the request for a specific node work on FairScheduler now?
>
> On Tue, Jan 19, 2016 at 2:06 PM, Gaurav Gupta <[email protected]> wrote:
>
>> I agree with Pramod that we should go with 2b, and we are already doing node locality, so you can use that feature.
>> Regarding 3, do we need to support relaxed anti-affinity? Anti-affinity will mostly be used where the user wants such segregation of operators onto different nodes for his/her app.
>>
>> Thanks
>> - Gaurav
>>
>>> On Jan 19, 2016, at 1:57 PM, Pramod Immaneni <[email protected]> wrote:
>>>
>>> Sorry, I meant distro agnostic (without the "not") in the first sentence.
>>>
>>> On Tue, Jan 19, 2016 at 1:57 PM, Pramod Immaneni <[email protected]> wrote:
>>>
>>>> Isha, this sounds great. 2a sounds like a good approach that is not distro agnostic. How about also supporting a minor variation of it as an option, where it greedily gets the total number of containers, discards the ones it can't use, and repeats the process for the remainder until everything has been allocated? Also, does it make sense to support anti-cluster affinity?
>>>>
>>>> Thanks
>>>>
>>>> On Tue, Jan 19, 2016 at 1:21 PM, Isha Arkatkar <[email protected]> wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> We want to add support for anti-affinity in Apex to allow applications to launch specific physical operators on different nodes (APEXCORE-10 <https://issues.apache.org/jira/browse/APEXCORE-10>). I'd like to request your suggestions and ideas on this!
>>>>>
>>>>> The reasons for using anti-affinity between operators could be: to ensure reliability, for performance (for example, an application may not want two I/O-intensive operators to land on the same node), or for application-specific constraints (for example, two partitions cannot run on the same node because they use the same port number). This is the general rationale for adding anti-affinity support.
>>>>>
>>>>> Since YARN does not support anti-affinity yet (YARN-1042 <https://issues.apache.org/jira/browse/YARN-1042>), we need to implement the logic in the AM. I wanted to get your views on the following aspects of this implementation:
>>>>>
>>>>> *1. How to specify anti-affinity for physical operators/partitions in the application:*
>>>>> One way is to have an attribute for setting anti-affinity in the logical operator context. An operator can set this attribute with a list of operator names that should not be collocated.
>>>>> Consider a DAG with 3 operators:
>>>>>   TestOperator o1 = dag.addOperator("O1", new TestOperator());
>>>>>   TestOperator o2 = dag.addOperator("O2", new TestOperator());
>>>>>   TestOperator o3 = dag.addOperator("O3", new TestOperator());
>>>>>
>>>>> To set anti-affinity for the O1 operator:
>>>>>   dag.setAttribute(o1, OperatorContext.ANTI_AFFINITY, new ArrayList<String>(Arrays.asList("O2", "O3")));
>>>>> This would mean O1 should not be allocated on nodes containing operators O2 and O3. This applies to all allocated partitions of O1, O2, and O3.
>>>>>
>>>>> Also, if the operator's own name is part of its anti-affinity list, it means partitions of that operator should not be allocated on the same node.
>>>>> For example:
>>>>>   dag.setAttribute(o2, OperatorContext.ANTI_AFFINITY, new ArrayList<String>(Arrays.asList("O2")));
>>>>> This indicates anti-affinity between all partitions of O2, i.e., all partitions of O2 should be launched on different nodes.
>>>>>
>>>>> Based on the anti-affinity attribute specified for a logical operator, we can add this list to each PTContainer during physical plan creation. This in turn will be available to Stram for sending container requests accordingly.
>>>>>
>>>>> Please suggest if there is a better way to express this intent.
>>>>>
>>>>> *2. How to implement anti-affinity in the AM*
>>>>> There are two ways we can implement this:
>>>>> *a. Blacklisting of nodes:* We can group the physical container requests based on anti-affinity requirements and send allocation requests for containers in groups. After the first group is done, blacklist those nodes before sending the second group of container requests. This will ensure that containers with anti-affinity requirements are allocated on different nodes.
>>>>> *b. Node-specific container requests:* Explore and create a map of the nodes present in the cluster, and send an allocation request for a container on a specific node, honoring anti-affinity. There are a couple of open YARN JIRAs for node-specific container requests: YARN-1412 <https://issues.apache.org/jira/browse/YARN-1412>, YARN-2027 <https://issues.apache.org/jira/browse/YARN-2027>. So we need to check whether this is a plausible approach.
>>>>>
>>>>> *3. Strict vs. relaxed anti-affinity*
>>>>> Depending on cluster resource availability, it may not be possible to honor all of the anti-affinity requirements specified.
>>>>> *Strict anti-affinity:* The AM will keep trying to allocate containers as per the anti-affinity requirements, indefinitely. This behavior is similar to how an application stays in the ACCEPTED state until resources become available in the cluster.
>>>>> *Relaxed anti-affinity:* The AM will drop the anti-affinity constraint after a certain timeout.
>>>>>
>>>>> We need a way to set this from the application (either in the operator context, or in DAGContext for an application-wide setting).
>>>>>
>>>>> *4. How do we unit test this feature*
>>>>> We could use Mockito to mock YARN behavior and test only the AM implementation, since it may not be easy to simulate some scenarios manually in a cluster. Please suggest if there are better ways to test this.
>>>>>
>>>>> Please suggest improvements or any other ideas on all of the above.
>>>>>
>>>>> Thanks!
>>>>> Isha
>>>>>
>>>>> P.S. Sorry for the long email. Please let me know if I should start separate threads for any of the above points.
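
To make approach 2a (blacklisting) from the thread above concrete, here is a minimal sketch of what the AM-side logic could look like against the plain YARN AMRMClient. This is only an illustration: the class AntiAffinityBlacklistSketch, its method name, and the resource sizes are assumptions, not Apex's actual AM code.

    import java.util.Collections;

    import org.apache.hadoop.yarn.api.records.Priority;
    import org.apache.hadoop.yarn.api.records.Resource;
    import org.apache.hadoop.yarn.client.api.AMRMClient;
    import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;

    // Illustrative helper: once a container of an anti-affinity group has been
    // allocated on a node, blacklist that node before requesting the next
    // container of the same group.
    public class AntiAffinityBlacklistSketch {

      public static void requestNextInGroup(AMRMClient<ContainerRequest> amRmClient,
                                            String nodeAlreadyUsed) {
        // Hide the node that already hosts a member of the group.
        amRmClient.updateBlacklist(Collections.singletonList(nodeAlreadyUsed),
                                   Collections.<String>emptyList());

        // Ask for the next container; the RM will not place it on the
        // blacklisted node, so the anti-affinity constraint is honored.
        Resource capability = Resource.newInstance(1024, 1); // 1 GB, 1 vcore (example)
        amRmClient.addContainerRequest(
            new ContainerRequest(capability, null, null, Priority.newInstance(1)));
      }
    }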
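Approach 2b (node-specific container requests) can be expressed through the existing AMRMClient.ContainerRequest locality API by naming a node and turning locality relaxation off; whether the configured scheduler (e.g. FairScheduler, as asked earlier in the thread) actually honors it still needs to be verified. A sketch, with the class name and the host name passed to it being placeholders:

    import org.apache.hadoop.yarn.api.records.Priority;
    import org.apache.hadoop.yarn.api.records.Resource;
    import org.apache.hadoop.yarn.client.api.AMRMClient;
    import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;

    // Illustrative helper: ask the RM for a container on exactly one node.
    public class NodeSpecificRequestSketch {

      public static void requestOnNode(AMRMClient<ContainerRequest> amRmClient, String host) {
        Resource capability = Resource.newInstance(1024, 1);   // 1 GB, 1 vcore (example)
        ContainerRequest nodeSpecificRequest = new ContainerRequest(
            capability,
            new String[] { host },          // only this node is acceptable
            null,                           // no rack preference
            Priority.newInstance(2),
            false);                         // relaxLocality = false: do not fall back
        amRmClient.addContainerRequest(nodeSpecificRequest);
      }
    }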
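For point 4, a minimal sketch of how Mockito could stand in for YARN so that only the AM-side logic is exercised, shown here against the blacklisting sketch above. JUnit 4 and Mockito on the test classpath are assumed; the test class and node name are illustrative.

    import static org.mockito.Mockito.*;

    import java.util.Collections;

    import org.apache.hadoop.yarn.client.api.AMRMClient;
    import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;
    import org.junit.Test;

    public class AntiAffinityBlacklistSketchTest {

      @Test
      @SuppressWarnings("unchecked")
      public void blacklistsUsedNodeBeforeNextRequest() {
        // Mock the YARN client so no real cluster is needed.
        AMRMClient<ContainerRequest> amRmClient = mock(AMRMClient.class);

        AntiAffinityBlacklistSketch.requestNextInGroup(amRmClient, "node1.example.com");

        // The node already hosting a group member must be blacklisted...
        verify(amRmClient).updateBlacklist(
            Collections.singletonList("node1.example.com"),
            Collections.<String>emptyList());
        // ...and a request for the next container must still be issued.
        verify(amRmClient).addContainerRequest(any(ContainerRequest.class));
      }
    }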
