[ 
https://issues.apache.org/jira/browse/MESOS-5545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15317763#comment-15317763
 ] 

Fan Du edited comment on MESOS-5545 at 6/7/16 3:35 AM:
-------------------------------------------------------

[~vinodkone] Thanks for the comments.

Rack topology information does not fall within the scope of the network 
isolator, because it is not something that can or should be isolated.

Here is why rack topology information can safely be updated:
The state of rack information can only transition from "no rack information" 
to "valid rack information". In other words, tasks may already be using 
resources without rack information when agents later report a rack id to the 
master. The logic could then follow one or all of these design decisions: 
a) notify the corresponding frameworks of the updated rack id for previously 
allocated resources, b) tag subsequent allocations with the agent's rack id, 
c) attach the rack id to resources freed by a framework for the next 
allocation round. This scenario is simpler and cleaner than attribute 
updates. Alternatively, only activate agents for resource allocation once 
they have a valid rack id.
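
To make the transition rule concrete, here is a minimal sketch in Python. 
It is not Mesos code: `Agent`, `report_rack` and `allocatable` are 
hypothetical names, and rejecting a different rack id for an agent that 
already has one is my assumption about how "only one transition" would be 
enforced.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Agent:
    agent_id: str
    rack_id: Optional[str] = None  # None means "no rack information yet"


def report_rack(agent: Agent, reported: Optional[str]) -> bool:
    """Apply a rack id reported by an agent.

    Returns True only when the agent moves from "no rack information"
    to "valid rack information", i.e. when frameworks may need an update.
    """
    if not reported:
        return False  # nothing reported, state is unchanged
    if agent.rack_id is None:
        agent.rack_id = reported  # the single allowed transition
        return True
    if agent.rack_id != reported:
        # Assumption: a rack id, once valid, is never replaced.
        raise ValueError(
            f"rack id of {agent.agent_id} is already {agent.rack_id}")
    return False  # same rack id reported again


def allocatable(agent: Agent) -> bool:
    """The alternative policy: allocate only from agents with a rack id."""
    return agent.rack_id is not None
```

A True return from `report_rack` is the point where options a) to c) above 
would hook in.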

Using attributes is one way to export the rack information, but I don't think 
that is practical in production: at a scale of 10000+ servers, setting 
attributes with rack information from third-party logic before starting each 
agent?! Automatically exposing the rack information could save a lot of 
deployment and maintenance effort. 
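
For comparison, the attribute-based approach amounts to something like the 
sketch below, run for every host before its agent starts. The `inventory` 
mapping and the `agent_attributes` helper are hypothetical stand-ins for the 
third-party logic; only the `--attributes` agent flag and its `name:value` 
syntax come from Mesos itself.

```python
from typing import Dict


def agent_attributes(host: str, inventory: Dict[str, str]) -> str:
    """Build the --attributes flag for one agent from an external inventory.

    `inventory` maps host name to rack id and has to be maintained outside
    of Mesos (hypothetical data source).
    """
    rack = inventory[host]  # KeyError if the host is missing from inventory
    return f"--attributes=rack:{rack}"


# Hypothetical inventory; in production this would cover every server.
inventory = {"host-001": "r1", "host-002": "r1", "host-003": "r2"}
flags = {host: agent_attributes(host, inventory) for host in inventory}
```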

Apologies, it seems I don't quite get the meaning of "first class field". 
Influencing allocation decisions is not the intention of this ticket; I 
believe that part of the work is out of scope for the ticket, which is why I 
put it in the Future section of the design doc. The allocation strategy DOES 
honor DRF. The current implementation does the allocation on a per-agent 
basis, and we could investigate different allocation modes.

In addition, I'd prefer arranging agents on a per-rack basis, because randomly 
shuffling agents at a scale of 10000+ nodes on every allocation iteration is 
not good.
IIRC, this number has been growing.
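
A rough sketch of what I mean by arranging agents per rack, in Python. This 
is not the Mesos allocator; `RackIndex` and its methods are hypothetical 
names. The grouping is maintained when agents register, so an allocation 
iteration walks the racks instead of reshuffling the whole agent list.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Optional, Tuple


class RackIndex:
    """Keeps agents grouped by rack id (None = no rack information)."""

    def __init__(self) -> None:
        self._racks: Dict[Optional[str], List[str]] = defaultdict(list)

    def add(self, agent_id: str, rack_id: Optional[str]) -> None:
        # Done once per agent registration, not once per allocation.
        self._racks[rack_id].append(agent_id)

    def by_rack(self) -> Iterator[Tuple[Optional[str], List[str]]]:
        # Known racks in sorted order, agents without a rack id last.
        for rack_id in sorted(self._racks,
                              key=lambda r: (r is None, r or "")):
            yield rack_id, list(self._racks[rack_id])


index = RackIndex()
index.add("a1", "r2")
index.add("a2", "r1")
index.add("a3", "r2")
index.add("a4", None)
groups = list(index.by_rack())
```

How the racks themselves are ordered between iterations (sorted here for 
simplicity) is a separate decision that belongs with the allocation modes 
mentioned above.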

All in all, IMHO, it's a good feature for Mesos; the question is how to do it 
elegantly. :)



> Add rack awareness support for Mesos resources
> ----------------------------------------------
>
>                 Key: MESOS-5545
>                 URL: https://issues.apache.org/jira/browse/MESOS-5545
>             Project: Mesos
>          Issue Type: Story
>          Components: hadoop, master
>            Reporter: Fan Du
>
> Resources managed by the Mesos master carry no topology information about 
> the cluster, for example rack topology, while many data center applications 
> have rack awareness features that provide data locality, fault tolerance and 
> intelligent task placement. This ticket investigates how to add rack 
> awareness to Mesos resources.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
