[ https://issues.apache.org/jira/browse/MESOS-9725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16835725#comment-16835725 ]
Meng Zhu commented on MESOS-9725: --------------------------------- Based on a recent internal test, the sort() does not take much time. And this ticket would introduce some extra complexities. The review above (https://reviews.apache.org/r/70497/) is pretty ready except one issue that still needs to figure out. In the review, we used a hashmap and used double as the key. This worries us because of the double precision issue. A solution is to use rational numbers. Given the benefit and complexity of the patch, we decided to shelve it for now. Move this ticket back to `accepted`. > Perform incremental sorting in the random sorter. > ------------------------------------------------- > > Key: MESOS-9725 > URL: https://issues.apache.org/jira/browse/MESOS-9725 > Project: Mesos > Issue Type: Improvement > Reporter: Meng Zhu > Assignee: Meng Zhu > Priority: Major > Labels: performance, resource-management > > By doing random sampling every time as the caller asks for the next client > (See MESOS-9722) we could avoid the cost of full shuffling and only pay as we > go. > While the hope is to do each random sampling with O(1) cost, the presence of > weights complicates the matter. We will need to pay O(log( n )) for every > sample even with fancy data structures like segment tree or binary index > trees (naive ones will result in O( n ) since we need to look at every node's > weights). And the current full node shuffling is already optimal (nlog( n )) > if all nodes are picked. > However, since the number of *distinct* weights is usually much smaller > comparing to the size of clients, we can minimize the sample cost by picking > a client in two steps: > Step1: randomly pick a group of clients that has the same weight by > generating a weighted random number. > Step2: Once a vector of clients is chosen, randomly sample a specific client > within the group. Since all the clients in the chosen vector have the same > weight, we do not need to consider any weights. > > Since the size of distinct weights is usually much smaller comparing to the > size of clients, this way, we minimize the cost of generating weighted random > numbers which are linear with the size of weights. -- This message was sent by Atlassian JIRA (v7.6.3#76005)