[ https://issues.apache.org/jira/browse/YUNIKORN-21?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17092476#comment-17092476 ]
Tao Yang commented on YUNIKORN-21:
----------------------------------

Thanks [~wangda] for the review. My replies to these questions are as follows:

{quote}
1) What is the difference between node-sorting policy and evaluator? If the node-order is solely based on the evaluator, we can use one single component to replace the two? (Or we only expose sorter interface and the evaluator becomes implementation details).
{quote}
The node-sorting policy is the same as before: it only defines two policies (Fairness and BinPacking) without implementations. The evaluator is used to evaluate nodes and give scores in different dimensions. We can have a simple evaluator as before (just giving a score whose value is the used/capacity ratio), or a more complex evaluator based on multiple scores and weights. I think we can abstract some common components out of the algorithms so that they are loosely coupled and can be reused in different algorithms.

{quote}
2) The scope of incremental sorting algorithm is not very clear to me, are we going to maintain a sorted list for every request? It might be too much if we want to do it on a per-request basis (we could have many different requests).
{quote}
We are not going to maintain a sorted list for every request. We can maintain one common sorted list per partition, which would be enough for most requests without specific node requirements (like affinity, anti-affinity, best-fit). For requests with specific node requirements, my initial thought is to handle them with different strategies: if only a few nodes need to be adjusted, we can take the common sorted list and rearrange some nodes; otherwise we can directly re-sort the nodes based on the merged score (which might be commonScore + dynamicScore). We can discuss this further.

{quote}
3) I don't quite sure about the "Weight" concept, are we going to support a multi node scorer like K8s default scheduler?
I personally don't prefer that way, since a weighted result is not easy to explain the behavior.
{quote}
Yes, that's what I had in mind. I agree that it may not be easy to explain, but I think it is flexible enough for complex scenarios. For example, we may need to treat the GPU resource as the most important factor for requests with GPU requirements. A weight is an easy way to define which factor is more important, and the configured value might differ between clusters. I would be happy to adopt a better approach if anyone has one.

{quote}
4) How fast we can do the resorting? Since node list is keep changing, and node's status also changing fast, are we going to keep an always up-to-date sorting result, or we will have some latencies. (If we need pre-sorted node lists on a per-request basis, there're too many sorted node lists we need to maintain).
{quote}
As the doc describes, we can have multiple algorithms, such as DefaultNodeSortingAlgorithm (sorts all nodes every time) and IncrementalNodeSortingAlgorithm (sorts updated nodes incrementally). Users can choose which one to use or easily add customized algorithms themselves. IncrementalNodeSortingAlgorithm can leverage SubjectManager to keep track of the updated nodes and a cached sorting result, instead of an always up-to-date sorting result (as shown in the first sequence diagram in the doc). The re-sorting only happens when the scheduler is actually allocating for a specific request. The scheduling interval should be tiny and the updated nodes should be few most of the time, so the re-sorting can be quite fast for common requests. According to the benchmark results of scheduler_perf_test.go in my local test, the scheduling throughput improves from 450 to 5000+ in a mock cluster with 1000 nodes.

{quote}
1/2 are not request-related, 3 is request-related, I'm wondering how we deal with these different use cases based on the proposal.
{quote}
For 1/2, users can leverage IncrementalNodeSortingAlgorithm and a NodeEvaluator that flexibly defines multiple static scorers and their weights to get better performance. For 3, sorting all nodes for each request seems unavoidable; DefaultNodeSortingAlgorithm and a NodeEvaluator with one dynamic scorer (which calculates the fix score) should be enough. Does that make sense?

{quote}
Also, it will be important to make sure the node sorting policy can be used by preemption logic.
{quote}
Agreed.

> Revisit node sorting algorithm for fairness
> -------------------------------------------
>
> Key: YUNIKORN-21
> URL: https://issues.apache.org/jira/browse/YUNIKORN-21
> Project: Apache YuniKorn
> Issue Type: Improvement
> Components: core - scheduler
> Reporter: Wangda Tan
> Priority: Major
> Attachments: Improve node sorting algorithm v1.pdf, Improve node sorting algorithm v2.pdf
>
>
> Currently, we're using DominantRatio for the node sorting algorithm
> {code:java}
> func CompUsageShares(left, right *Resource) int {
>     lshares := getShares(left, nil)
>     rshares := getShares(right, nil)
>     return compareShares(lshares, rshares)
> }{code}
> Which is not good, two reasons:
> # Dominate resource compare is about 8X more expensive than single float compares for two resource types.
> # Dominate resource is not stable when we have scarce resource types like GPU. A node with 192GB mem, 32 vcores, and 1 GPU available, compared to 168GB mem, 64 vcore and 8 GPU available; the prior one can go first because of the following logic:
> {code:java}
> if total == nil || total.Resources[k] == 0 {
>     // negative share is logged
>     if v < 0 {
>         log.Logger().Debug("usage is negative no total, share is also negative",
>             zap.Int64("resource quantity", int64(v)))
>     }
>     shares[idx] = float64(v)
>     idx++
>     continue
> }{code}
> I think we should discard dominate resource compare for node resource. Instead, we just use one resource type (like vcores) to compare available resource.
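
For illustration only, the weighted evaluator and the incremental sorting discussed in the reply above could be sketched in Go roughly as follows. This is a minimal sketch under stated assumptions, not the YuniKorn implementation: the Node, Scorer, Evaluator and IncrementalSorter types, the freeRatio helper, the resource names and the weights are all hypothetical, and the dirty map only stands in for the role SubjectManager plays in the design doc.

```go
package main

import (
	"fmt"
	"sort"
)

// Node is a hypothetical, simplified node: used and total quantity per resource type.
type Node struct {
	Name     string
	Used     map[string]float64
	Capacity map[string]float64
}

// Scorer is a hypothetical scoring function; a higher score means a more preferred node.
type Scorer func(n *Node) float64

// freeRatio returns a Scorer giving the free fraction of one resource type
// (0 when the node does not have that resource at all).
func freeRatio(resource string) Scorer {
	return func(n *Node) float64 {
		c := n.Capacity[resource]
		if c == 0 {
			return 0
		}
		return 1 - n.Used[resource]/c
	}
}

type weightedScorer struct {
	score  Scorer
	weight float64
}

// Evaluator combines several scorers into one weighted sum.
type Evaluator struct{ scorers []weightedScorer }

// Add registers a scorer with its weight.
func (e *Evaluator) Add(s Scorer, w float64) {
	e.scorers = append(e.scorers, weightedScorer{s, w})
}

// Score returns the weighted sum of all registered scorers for one node.
func (e *Evaluator) Score(n *Node) float64 {
	total := 0.0
	for _, ws := range e.scorers {
		total += ws.weight * ws.score(n)
	}
	return total
}

// IncrementalSorter caches a sorted node list and repositions only updated nodes.
type IncrementalSorter struct {
	eval   *Evaluator
	sorted []*Node            // cached order, highest score first
	scores map[string]float64 // score each node was last sorted with
	dirty  map[string]*Node   // nodes added or changed since the last Sorted call
}

func NewIncrementalSorter(eval *Evaluator) *IncrementalSorter {
	return &IncrementalSorter{eval: eval, scores: map[string]float64{}, dirty: map[string]*Node{}}
}

// MarkUpdated records a new or changed node; no sorting happens here.
func (s *IncrementalSorter) MarkUpdated(n *Node) { s.dirty[n.Name] = n }

// Sorted applies the pending updates and returns the cached order.
func (s *IncrementalSorter) Sorted() []*Node {
	for name, n := range s.dirty {
		if _, ok := s.scores[name]; ok { // drop the stale entry of a known node
			for i, cur := range s.sorted {
				if cur.Name == name {
					s.sorted = append(s.sorted[:i], s.sorted[i+1:]...)
					break
				}
			}
		}
		score := s.eval.Score(n)
		s.scores[name] = score
		// Binary search for the insertion point in the descending cached order.
		i := sort.Search(len(s.sorted), func(j int) bool {
			return s.scores[s.sorted[j].Name] < score
		})
		s.sorted = append(s.sorted, nil)
		copy(s.sorted[i+1:], s.sorted[i:])
		s.sorted[i] = n
		delete(s.dirty, name)
	}
	return s.sorted
}

// names extracts the node names in order, for printing.
func names(ns []*Node) []string {
	out := make([]string, 0, len(ns))
	for _, n := range ns {
		out = append(out, n.Name)
	}
	return out
}

func main() {
	eval := &Evaluator{}
	eval.Add(freeRatio("vcore"), 1)
	eval.Add(freeRatio("gpu"), 3) // assumed weight: GPU matters three times as much

	a := &Node{"a", map[string]float64{"vcore": 16, "gpu": 0}, map[string]float64{"vcore": 32, "gpu": 1}}
	b := &Node{"b", map[string]float64{"vcore": 8, "gpu": 4}, map[string]float64{"vcore": 64, "gpu": 8}}
	c := &Node{"c", map[string]float64{"vcore": 0}, map[string]float64{"vcore": 32}}

	sorter := NewIncrementalSorter(eval)
	for _, n := range []*Node{a, b, c} {
		sorter.MarkUpdated(n)
	}
	fmt.Println(names(sorter.Sorted())) // scores 3.5, 2.375, 1.0: a, b, c

	b.Used["gpu"] = 0 // node b frees its GPUs; only b is repositioned
	sorter.MarkUpdated(b)
	fmt.Println(names(sorter.Sorted())) // b now scores 3.875: b, a, c
}
```

In this sketch, nothing is sorted when a node changes; the cached list is only adjusted when Sorted is called, which corresponds to re-sorting at allocation time. Each updated node costs a linear removal plus a binary-search insert, so a call with few updated nodes avoids a full sort of the partition.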
--
This message was sent by Atlassian Jira (v8.3.4#803005)