[ 
https://issues.apache.org/jira/browse/YUNIKORN-21?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17092476#comment-17092476
 ] 

Tao Yang commented on YUNIKORN-21:
----------------------------------

Thanks [~wangda] for the review.
My replies to these questions are as follows:
{quote}
1) What is the difference between node-sorting policy and evaluator? If the 
node-order is solely based on the evaluator, we can use one single component to 
replace the two? (Or we only expose sorter interface and the evaluator becomes 
implementation details).
{quote}
The node-sorting policy stays as before: it only defines the two policies 
(Fairness and BinPacking) and carries no implementation. The evaluator is used 
to evaluate nodes and give scores in different dimensions. We can keep a simple 
evaluator as before (a single score whose value is the used/capacity ratio), or 
use a more complex evaluator based on multiple scores and weights. I think we 
can abstract some common components out of the algorithms so that they are 
loosely coupled and can be reused across different algorithms.
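
To make the split between policy and evaluator concrete, here is a minimal 
sketch. It is not the code from the design doc: Node, NodeEvaluator, 
UsageRatioEvaluator, SortingPolicy and SortNodes are hypothetical names, the 
node has a single resource dimension, and I am assuming that Fairness means 
"least used first" and BinPacking means "most used first".

```go
package main

import "sort"

// Node is a hypothetical, simplified node with a single resource dimension.
type Node struct {
	Name     string
	Used     float64
	Capacity float64
}

// NodeEvaluator (hypothetical) gives a node a score; it knows nothing about
// the sorting policy that will consume the score.
type NodeEvaluator interface {
	Score(n Node) float64
}

// UsageRatioEvaluator is the "simple evaluator": score = used/capacity.
type UsageRatioEvaluator struct{}

func (UsageRatioEvaluator) Score(n Node) float64 {
	if n.Capacity == 0 {
		return 1 // assumption: a node without capacity counts as full
	}
	return n.Used / n.Capacity
}

// SortingPolicy only names the ordering; it carries no scoring logic.
type SortingPolicy int

const (
	Fairness   SortingPolicy = iota // assumed: least used node first
	BinPacking                      // assumed: most used node first
)

// SortNodes returns a sorted copy of nodes; the input slice is not modified.
func SortNodes(nodes []Node, policy SortingPolicy, ev NodeEvaluator) []Node {
	sorted := append([]Node(nil), nodes...)
	sort.SliceStable(sorted, func(i, j int) bool {
		si, sj := ev.Score(sorted[i]), ev.Score(sorted[j])
		if policy == BinPacking {
			return si > sj
		}
		return si < sj
	})
	return sorted
}
```

With this shape, swapping in a multi-score evaluator only means passing a 
different NodeEvaluator to SortNodes; the policy code does not change.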

{quote}
2) The scope of incremental sorting algorithm is not very clear to me, are we 
going to maintain a sorted list for every request? It might be too much if we 
want to do it on a per-request basis (we could have many different requests).
{quote}
We are not going to maintain a sorted list for every request. We can maintain 
one common sorted list per partition, which should be enough for most requests, 
i.e. those without specific node requirements (like affinity, anti-affinity, 
best-fit). For requests with specific node requirements, my initial thought is 
to handle them with different strategies: if only a few nodes need to be 
adjusted, we can take the common sorted list and rearrange those nodes; 
otherwise we can directly re-sort the nodes based on a merged score (perhaps 
commonScore + dynamicScore). We can discuss this further.
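
The "re-sort on a merged score" strategy could look roughly like the sketch 
below. All names here (ScoredNode, DynamicScorer, ResortForRequest) are 
hypothetical, and the plain sum commonScore + dynamicScore with "higher score 
first" is only the tentative idea mentioned above, not a settled design.

```go
package main

import "sort"

// ScoredNode is a hypothetical entry of the common per-partition list.
type ScoredNode struct {
	Name        string
	CommonScore float64 // request-independent score
}

// DynamicScorer is a hypothetical request-specific adjustment, for example a
// bonus for nodes that satisfy an affinity requirement.
type DynamicScorer func(nodeName string) float64

// ResortForRequest leaves the common list untouched and returns node names
// ordered by commonScore + dynamicScore, highest merged score first.
func ResortForRequest(common []ScoredNode, dyn DynamicScorer) []string {
	type merged struct {
		name  string
		score float64
	}
	all := make([]merged, 0, len(common))
	for _, n := range common {
		all = append(all, merged{n.Name, n.CommonScore + dyn(n.Name)})
	}
	sort.SliceStable(all, func(i, j int) bool { return all[i].score > all[j].score })
	names := make([]string, len(all))
	for i, m := range all {
		names[i] = m.name
	}
	return names
}
```

Because the common list is only read, several requests with different 
requirements can derive their own order from the same cached list.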

{quote}
3) I don't quite sure about the "Weight" concept, are we going to support a 
multi node scorer like K8s default scheduler? I personally don't prefer that 
way, since a weighted result is not easy to explain the behavior.
{quote}
Yes, that's what I had in mind. I agree that the behavior may not be easy to 
explain, but I think it is flexible enough for complex scenarios. For example, 
we may need to treat the GPU resource as the most important factor for requests 
with GPU requirements. A weight is an easy way to define which factor is more 
important, and the configured value might differ between clusters. I am happy 
to adopt a better approach if anyone has one.
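
As a small illustration of how a weight shifts the result, consider the sketch 
below. The types (Factor, NodeResources, WeightedScorer), the resource names 
and the numbers are all made up for this example; the score is simply the 
weighted average of the available/capacity ratio of each factor.

```go
package main

// Factor is a hypothetical (resource, weight) pair taken from configuration.
type Factor struct {
	Resource string
	Weight   float64
}

// NodeResources is a hypothetical view of one node.
type NodeResources struct {
	Available map[string]float64
	Capacity  map[string]float64
}

// WeightedScorer combines per-resource ratios into a single score.
type WeightedScorer struct {
	Factors []Factor
}

// Score returns the weighted average of available/capacity over all factors.
// Resources the node does not have (capacity <= 0) are skipped.
func (w WeightedScorer) Score(n NodeResources) float64 {
	var sum, totalWeight float64
	for _, f := range w.Factors {
		capacity := n.Capacity[f.Resource]
		if capacity <= 0 || f.Weight <= 0 {
			continue
		}
		sum += f.Weight * n.Available[f.Resource] / capacity
		totalWeight += f.Weight
	}
	if totalWeight == 0 {
		return 0
	}
	return sum / totalWeight
}
```

For two nodes with capacity 8 GPU / 10 vcore / 10 memory, where node C has 
4 / 1 / 1 available and node D has 2 / 9 / 9 available, equal weights rank D 
higher, while a GPU weight of 8 (others 1) ranks C higher. That is the 
flexibility I mean, and also the part that needs explaining to users.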

{quote}
4) How fast we can do the resorting? Since node list is keep changing, and 
node's status also changing fast, are we going to keep an always up-to-date 
sorting result, or we will have some latencies. (If we need pre-sorted node 
lists on a per-request basis, there're too many sorted node lists we need to 
maintain).
{quote}
As the doc describes, we can have multiple algorithms, such as 
DefaultNodeSortingAlgorithm (sorts all nodes every time) and 
IncrementalNodeSortingAlgorithm (sorts updated nodes incrementally). Users can 
choose which one to use, or easily add customized algorithms themselves. 
IncrementalNodeSortingAlgorithm can leverage SubjectManager to keep track of 
the updated nodes and a cached sorting result, instead of keeping an always 
up-to-date sorting result (as shown in the first sequence diagram in the doc). 
The re-sorting only happens when the scheduler is actually allocating for a 
specific request. The scheduling interval should be tiny and the number of 
updated nodes should be small most of the time, so the re-sorting can be quite 
fast for common requests. According to the benchmark results of 
scheduler_perf_test.go in my local test, the scheduling throughput improves 
from 450 to 5000+ in a mock cluster with 1000 nodes.
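
The idea of "cached result plus a set of updated nodes, re-sorted only on 
demand" can be sketched as follows. This is not the 
IncrementalNodeSortingAlgorithm or SubjectManager code from the proposal; 
IncrementalSorter and its methods are hypothetical, and a slice with 
binary-search insertion stands in for whatever structure a real implementation 
would use.

```go
package main

import "sort"

// IncrementalSorter is a hypothetical cache of nodes ordered by ascending
// score, together with the set of nodes updated since the last re-sort.
type IncrementalSorter struct {
	scores map[string]float64
	sorted []string
	dirty  map[string]bool
}

func NewIncrementalSorter() *IncrementalSorter {
	return &IncrementalSorter{scores: map[string]float64{}, dirty: map[string]bool{}}
}

// Update records a new score but does not sort; it only marks the node dirty.
func (s *IncrementalSorter) Update(name string, score float64) {
	s.scores[name] = score
	s.dirty[name] = true
}

// Sorted is called at allocation time. Only the dirty nodes are repositioned;
// the relative order of all other nodes is reused from the cache.
// Callers must not modify the returned slice.
func (s *IncrementalSorter) Sorted() []string {
	if len(s.dirty) == 0 {
		return s.sorted
	}
	// Keep the clean nodes in their cached order.
	kept := make([]string, 0, len(s.scores))
	for _, name := range s.sorted {
		if !s.dirty[name] {
			kept = append(kept, name)
		}
	}
	// Re-insert every dirty node at the position found by binary search.
	for name := range s.dirty {
		score := s.scores[name]
		i := sort.Search(len(kept), func(i int) bool { return s.scores[kept[i]] >= score })
		kept = append(kept, "")
		copy(kept[i+1:], kept[i:])
		kept[i] = name
	}
	s.sorted = kept
	s.dirty = map[string]bool{}
	return s.sorted
}
```

Note that this sketch still makes one linear pass over the cached list and 
shifts elements on insert, so it only saves the comparisons of a full sort; a 
tree or skip list would avoid the linear work as well.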

{quote}
1/2 are not request-related, 3 is request-related, I'm wondering how we deal 
with these different use cases based on the proposal.
{quote}
For 1/2, users can leverage IncrementalNodeSortingAlgorithm together with a 
NodeEvaluator that flexibly defines multiple static scorers and their weights, 
to get better performance. For 3, sorting all nodes for each request seems 
unavoidable; DefaultNodeSortingAlgorithm plus a NodeEvaluator with one dynamic 
scorer (which calculates the fix score) should be enough. Does that make sense?

{quote}
Also, it will be important to make sure the node sorting policy can be used by 
preemption logic.
{quote}
Agreed.

> Revisit node sorting algorithm for fairness
> -------------------------------------------
>
>                 Key: YUNIKORN-21
>                 URL: https://issues.apache.org/jira/browse/YUNIKORN-21
>             Project: Apache YuniKorn
>          Issue Type: Improvement
>          Components: core - scheduler
>            Reporter: Wangda Tan
>            Priority: Major
>         Attachments: Improve node sorting algorithm v1.pdf, Improve node 
> sorting algorithm v2.pdf
>
>
> Currently, we're using DominantRatio for the node sorting algorithm
> {code:java}
> func CompUsageShares(left, right *Resource) int {
>  lshares := getShares(left, nil)
>  rshares := getShares(right, nil)
>  return compareShares(lshares, rshares)
> }{code}
> Which is not good, for two reasons:
>  # Dominant resource comparison is about 8X more expensive than a single 
> float comparison for two resource types.
>  # Dominant resource is not stable when we have scarce resource types like 
> GPU. A node with 192GB mem, 32 vcores, and 1 GPU available, compared to 168GB 
> mem, 64 vcores and 8 GPU available; the former can go first because of the 
> following logic:
> {code:java}
> if total == nil || total.Resources[k] == 0 {
>  // negative share is logged
>  if v < 0 {
>   log.Logger().Debug("usage is negative no total, share is also negative",
>    zap.Int64("resource quantity", int64(v)))
>  }
>  shares[idx] = float64(v)
>  idx++
>  continue
> }{code}
> I think we should discard dominant resource comparison for node resources. 
> Instead, we just use one resource type (like vcores) to compare available 
> resources.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org
For additional commands, e-mail: issues-h...@yunikorn.apache.org
