[
https://issues.apache.org/jira/browse/MESOS-9725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16835725#comment-16835725
]
Meng Zhu commented on MESOS-9725:
---------------------------------
Based on a recent internal test, the sort() does not take much time. And this
ticket would introduce some extra complexities.
The review above (https://reviews.apache.org/r/70497/) is pretty ready except
one issue that still needs to figure out. In the review, we used a hashmap and
used double as the key. This worries us because of the double precision issue.
A solution is to use rational numbers.
Given the benefit and complexity of the patch, we decided to shelve it for now.
Move this ticket back to `accepted`.
> Perform incremental sorting in the random sorter.
> -------------------------------------------------
>
> Key: MESOS-9725
> URL: https://issues.apache.org/jira/browse/MESOS-9725
> Project: Mesos
> Issue Type: Improvement
> Reporter: Meng Zhu
> Assignee: Meng Zhu
> Priority: Major
> Labels: performance, resource-management
>
> By doing random sampling every time as the caller asks for the next client
> (See MESOS-9722) we could avoid the cost of full shuffling and only pay as we
> go.
> While the hope is to do each random sampling with O(1) cost, the presence of
> weights complicates the matter. We will need to pay O(log( n )) for every
> sample even with fancy data structures like segment tree or binary index
> trees (naive ones will result in O( n ) since we need to look at every node's
> weights). And the current full node shuffling is already optimal (nlog( n ))
> if all nodes are picked.
> However, since the number of *distinct* weights is usually much smaller
> comparing to the size of clients, we can minimize the sample cost by picking
> a client in two steps:
> Step1: randomly pick a group of clients that has the same weight by
> generating a weighted random number.
> Step2: Once a vector of clients is chosen, randomly sample a specific client
> within the group. Since all the clients in the chosen vector have the same
> weight, we do not need to consider any weights.
>
> Since the size of distinct weights is usually much smaller comparing to the
> size of clients, this way, we minimize the cost of generating weighted random
> numbers which are linear with the size of weights.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)