[ https://issues.apache.org/jira/browse/STORM-1766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15418155#comment-15418155 ]

ASF GitHub Bot commented on STORM-1766:
---------------------------------------

Github user harshach commented on a diff in the pull request:

    https://github.com/apache/storm/pull/1621#discussion_r74524961
  
    --- Diff: storm-core/src/jvm/org/apache/storm/scheduler/Cluster.java ---
    @@ -103,6 +103,9 @@ public Cluster(Cluster src) {
             this.status.putAll(src.status);
             this.topologyResources.putAll(src.topologyResources);
             this.blackListedHosts.addAll(src.blackListedHosts);
    +        if (src.networkTopography != null) {
    --- End diff ---
    
    Is this supposed to be == null? Why are we creating a new Map if there is one already?
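
    For context, a minimal sketch of the guarded copy under discussion (the field name comes from the diff; the map's exact type and the surrounding constructor are assumptions, not the verbatim patch):

{code}
// Hypothetical excerpt of the Cluster copy constructor. The guard copies the
// source's network topography only when the source actually has one; with
// == null the copy would be skipped exactly when there is something to copy.
if (src.networkTopography != null) {
    this.networkTopography = new HashMap<String, List<String>>(src.networkTopography);
}
{code}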


> A better algorithm for server rack selection in RAS
> ---------------------------------------------------
>
>                 Key: STORM-1766
>                 URL: https://issues.apache.org/jira/browse/STORM-1766
>             Project: Apache Storm
>          Issue Type: Improvement
>            Reporter: Boyang Jerry Peng
>            Assignee: Boyang Jerry Peng
>             Fix For: 2.0.0
>
>
> Currently the getBestClustering algorithm for RAS finds the "best" 
> cluster/rack based on which rack has the most available resources. This may be 
> insufficient and may cause topologies to fail to be scheduled even though the 
> cluster as a whole has enough resources for them. We attempt to find the rack 
> with the most resources by finding the rack with the biggest sum of available 
> memory and available CPU (see the sketch below). This method is not effective 
> since it does not consider the number of available slots. It also fails to 
> identify racks that are unschedulable because one of the resources (memory, 
> CPU, or slots) is exhausted. The current implementation also attempts the 
> initial scheduling on a single rack and does not try the other racks before 
> giving up, which, combined with the shortcomings above, may cause schedulable 
> topologies to fail to be scheduled. Finally, the current method does not 
> consider worker failures. When executors of a topology become unassigned and 
> need to be scheduled again, the logic in getBestClustering may be inadequate, 
> if not completely wrong: it will likely return a rack different from the one 
> where the majority of the topology's executors were originally scheduled.
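> A minimal sketch of the sum-based shortcoming, using the rack figures from the 
> example further below (illustrative code, not the actual getBestClustering 
> implementation):
> {code}
> // Score racks by available memory + available CPU, as the current method does.
> double scoreRack2 = 80000.0 + 0.0;    // rack-2: CPU exhausted,  score 80000.0
> double scoreRack0 = 80000.0 + 4000.0; // rack-0: balanced,       score 84000.0
> double scoreRack1 = 40000.0 + 2000.0; // rack-1: schedulable,    score 42000.0
> // The sum ranks rack-2 well above rack-1 even though nothing can be scheduled
> // on rack-2: a rack can "win" on memory alone while a needed resource is zero.
> {code}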
> Thus, I propose a different strategy/algorithm to find the "best" rack. I have 
> come up with an ordering strategy, which I dub subordinate resource 
> availability ordering (inspired by Dominant Resource Fairness), that sorts 
> racks by their subordinate (not dominant) resource availability.
> For example, given 4 racks with the following resource availabilities:
> {code}
> //generate some supervisors that have a lot of memory but little CPU
> rack-3 Avail [ CPU 100.0 MEM 200000.0 Slots 40 ] Total [ CPU 100.0 MEM 200000.0 Slots 40 ]
> //generate some supervisors that are depleted of one resource
> rack-2 Avail [ CPU 0.0 MEM 80000.0 Slots 40 ] Total [ CPU 0.0 MEM 80000.0 Slots 40 ]
> //generate some supervisors that have a lot of CPU but little memory
> rack-4 Avail [ CPU 6100.0 MEM 10000.0 Slots 40 ] Total [ CPU 6100.0 MEM 10000.0 Slots 40 ]
> //generate another rack of supervisors with fewer resources than rack-0
> rack-1 Avail [ CPU 2000.0 MEM 40000.0 Slots 40 ] Total [ CPU 2000.0 MEM 40000.0 Slots 40 ]
> rack-0 Avail [ CPU 4000.0 MEM 80000.0 Slots 40 ] Total [ CPU 4000.0 MEM 80000.0 Slots 40 ]
> Cluster Overall Avail [ CPU 12200.0 MEM 410000.0 Slots 200 ] Total [ CPU 12200.0 MEM 410000.0 Slots 200 ]
> {code}
> It is clear that rack-0 is the best rack since it is the most balanced and can 
> potentially schedule the most executors, while rack-2 is the worst since it is 
> depleted of CPU, rendering it unschedulable even though it still has other 
> resources available.
> We first calculate the resource availability percentage of every rack, for 
> each resource, by computing:
> {code}
> (resource available on rack) / (resource available in cluster)
> {code}
> We do this calculation to normalize the values; otherwise the raw resource 
> amounts would not be comparable across resource types.
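> A short sketch of this normalization step (illustrative names, not the RAS 
> code):
> {code}
> // Express each of a rack's available resources as a fraction of the
> // cluster-wide availability, so the three dimensions become comparable.
> double[] clusterAvail = {12200.0, 410000.0, 200.0};  // CPU, MEM, slots
> double[] rack0Avail   = {4000.0, 80000.0, 40.0};
> double[] rack0Frac    = new double[3];
> for (int i = 0; i < rack0Frac.length; i++) {
>     rack0Frac[i] = rack0Avail[i] / clusterAvail[i];  // 0.3279, 0.1951, 0.2
> }
> {code}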
> So for our example:
> {code}
> rack-3 Avail [ CPU 0.819672131147541% MEM 48.78048780487805% Slots 20.0% ] effective resources: 0.00819672131147541
> rack-2 Avail [ CPU 0.0% MEM 19.51219512195122% Slots 20.0% ] effective resources: 0.0
> rack-4 Avail [ CPU 50.0% MEM 2.4390243902439024% Slots 20.0% ] effective resources: 0.024390243902439025
> rack-1 Avail [ CPU 16.39344262295082% MEM 9.75609756097561% Slots 20.0% ] effective resources: 0.0975609756097561
> rack-0 Avail [ CPU 32.78688524590164% MEM 19.51219512195122% Slots 20.0% ] effective resources: 0.1951219512195122
> {code}
> The effective resource of a rack, which is also the subordinate resource, is 
> computed by: 
> {code}
> MIN(resource availability percentage of {CPU, Memory, # of free Slots}).
> {code}
> Then we order the racks by their effective resource, in descending order (a 
> runnable sketch of both steps follows the result below).
> Thus for our example:
> {code}
> Sorted rack: [rack-0, rack-1, rack-4, rack-3, rack-2]
> {code}
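> A self-contained sketch that reproduces this ordering (illustrative class and 
> variable names, not the Storm implementation):
> {code}
> import java.util.*;
> 
> public class SubordinateResourceOrdering {
>     public static void main(String[] args) {
>         // Available CPU, memory, and slots per rack, from the example above.
>         Map<String, double[]> avail = new HashMap<>();
>         avail.put("rack-0", new double[]{4000.0, 80000.0, 40});
>         avail.put("rack-1", new double[]{2000.0, 40000.0, 40});
>         avail.put("rack-2", new double[]{0.0, 80000.0, 40});
>         avail.put("rack-3", new double[]{100.0, 200000.0, 40});
>         avail.put("rack-4", new double[]{6100.0, 10000.0, 40});
>         double[] cluster = {12200.0, 410000.0, 200};
> 
>         // Effective (subordinate) resource = MIN over the normalized fractions.
>         Map<String, Double> effective = new HashMap<>();
>         for (Map.Entry<String, double[]> e : avail.entrySet()) {
>             double min = Double.MAX_VALUE;
>             for (int i = 0; i < cluster.length; i++) {
>                 min = Math.min(min, e.getValue()[i] / cluster[i]);
>             }
>             effective.put(e.getKey(), min);
>         }
> 
>         // Sort racks by effective resource, descending.
>         List<String> racks = new ArrayList<>(avail.keySet());
>         racks.sort(Comparator.comparingDouble(
>                 (String r) -> effective.get(r)).reversed());
>         System.out.println(racks); // [rack-0, rack-1, rack-4, rack-3, rack-2]
>     }
> }
> {code}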
> Also, to deal with the presence of failures: if a topology is partially 
> scheduled, we find the rack on which the most of the topology's executors are 
> already scheduled and try to schedule on that rack first.
> Thus, when sorting the racks, we first sort by the number of executors already 
> scheduled on the rack and then by the subordinate resource availability, as in 
> the sketch below.
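> A sketch of that two-level ordering (hypothetical variable names, assuming 
> java.util imports; the real comparator would live in the scheduling strategy):
> {code}
> // Racks hosting more of the topology's executors come first; ties fall back
> // to subordinate resource availability. Both maps are assumed inputs.
> Map<String, Integer> scheduledExecutors = new HashMap<>();
> scheduledExecutors.put("rack-1", 0);
> scheduledExecutors.put("rack-0", 6); // rack-0 already hosts 6 executors
> Map<String, Double> effectiveResource = new HashMap<>();
> effectiveResource.put("rack-1", 0.0975609756097561);
> effectiveResource.put("rack-0", 0.1951219512195122);
> List<String> racks = new ArrayList<>(Arrays.asList("rack-1", "rack-0"));
> racks.sort(Comparator
>         .comparingInt((String r) -> scheduledExecutors.get(r))
>         .thenComparingDouble(r -> effectiveResource.get(r))
>         .reversed());
> System.out.println(racks); // [rack-0, rack-1]: the partially used rack first
> {code}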


