Re: [PATCH v2 master] Add proposal of a new data collector and new command-line options

'Viktor Bachraty' via ganeti-devel Wed, 03 Aug 2016 10:53:52 -0700

LGTM, I just changed a few typos. Thanks!


On Tuesday, August 2, 2016 at 3:23:34 PM UTC+1, Daniil Leshchev wrote:
>
> From: Daniil Leshchev <[email protected]> 
>
> Introduce new command-line options for configuring 
> balancing process. 
>
> Introduce the data collector for gathering information 
> about network speed. This information can be used in order 
> to optimize time of cluster balancing. 
> --- 
>  doc/design-migration-speed-hbal.rst | 98 
> +++++++++++++++++++++++++++++++++++++ 
>  1 file changed, 98 insertions(+) 
>
> diff --git a/doc/design-migration-speed-hbal.rst 
> b/doc/design-migration-speed-hbal.rst 
> index a0dcfe0..6eba94c 100644 
> --- a/doc/design-migration-speed-hbal.rst 
> +++ b/doc/design-migration-speed-hbal.rst 
> @@ -26,3 +26,101 @@ a compromise between moves speed and optimal scoring. 
> This can be implemented 
>  by introducing ``--avoid-disk-moves *FACTOR*`` option which will admit 
> disk 
>  moves only if the gain in the cluster metrics is *FACTOR* times 
>  higher than the gain achievable by non disk moves. 
> + 
> +Avoiding insignificant long-time solutions 
> +========================================== 
> + 
> +The next step is to estimate an amount of time required to perform a 
> balancing 
> +step and introduce a new term: ``long-time`` solution. 
> + 
> +``--long-solution-threshold`` option will specify a duration in seconds. 
> +A solution exceeding the duration is a ``long-time`` solution by 
> definition. 
> + 
> +With time estimations we will be able to filter Hbal's sequences and 
> +eliminating long-time solutions which don't lead to a sufficient cluster 
> metric 
> +improvement. This can be done by the ``--avoid-long-solutions *FACTOR*`` 
> option 
> +which will allow only long solutions, whose K/N metrics are more than 
> *FACTOR*, 
> +where K is the number of times cluster metric has increased and N is an 
> +estimated time to perform this solution divided by the threshold. 
> + 
> +The default values for the new options are: 
> + 
> +``--long-solution-threshold`` = 1000 seconds 
> +(all adequate solutions are not ``long-time``) 
> + 
> +``--avoid-long-solutions`` = 0.0 
> +(no filtering by time estimations, feature disabled) 
> + 
> +Network bandwidth cluster tags 
> +============================== 
> + 
> +The bandwidth between nodes, node groups and within whole cluster could 
> be 
> +specified "by hand" with the cluster tags. 
> + 
> +Every node contains it's own set of bandwidth tags. Tags from higher 
> level are 
> +inherited by lower levels: nodegroups inherits cluster tags and so nodes 
> +]inherits nodegroups tags. Tags from lower level (if exist) override 
> higher 
> +level tags with the same bandwidth prefixes: node tags override 
> nodegroups tags 
> +as well as nodegroups tags override cluster tags. 
> + 
> +Below are some examples of using bandwidth tags: 
> +(The examples are provided using in the Htools Text backend format) 
> + 
> +1) single nodegroup with 5 nodes, all bandwidths symmetric. 
> + 
> +group-01|...|nic:100MBit| 
> + 
> +*no* node-level bandwidth tags (they are inherited by group tags) 
> + 
> +htools:bandwidth:nic 
> +htools:bandwidth:nic:100MBit::nic:100MBit::100 
> + 
> + 
> +2) 3 nodegroups, within each nodegroup symmetric bandwidths, 
> +between nodegroups different bandwidths. 
> + 
> +group-01|...|nic:1000MBit (overrides cluster tags)| 
> +group-02|...|nic:100MBit (overrides cluster tags)| 
> +group-03|...|inherited by cluster tags (nic:10MBit)| 
> + 
> +*no* node-level bandwidth tags (they are inherited by group tags) 
> + 
> +htools:bandwidth:nic 
> +nic:10MBit 
> +htools:bandwidth:nic:10MBit::nic:10MBit::10 
> +htools:bandwidth:nic:100MBit::nic:100MBit::100 
> +htools:bandwidth:nic:1000MBit::nic:1000MBit::1000 
> +htools:bandwidth:nic:10MBit::nic:100MBit::10 
> +htools:bandwidth:nic:10MBit::nic:1000MBit::10 
> +htools:bandwidth:nic:100MBit::nic:1000MBit::100 
> + 
> + 
> +Network bandwidth estimation 
> +============================ 
> + 
> +Balancing time can be estimated by dividing the amount of data to be 
> moved by 
> +the current network bandwidth between the affected nodes. 
> + 
> +We propose to add a new data collector, that will gather information 
> about 
> +network speed by sending a test file between between nodes. Accounting 
> the 
> +time, we can estimate average network speed between nodegroups in a 
> cluster. 
> + 
> +DataCollector implementation details 
> +==================================== 
> + 
> +The new bandwidth data collector introduces an ability to collect actual 
> +information about network speed between nodegroups in a cluster. We 
> assume, 
> +that the network bandwidth between any nodes within one nodegroup is 
> almost the 
> +same unlike the network speed between different nodegroups. So the 
> proposed 
> +data collector will provide the most necessary data for time estimations. 
> + 
> +For sending packets *scp* utility will be used. The default size of file 
> +to send is 5Mbyte in order to obtain adequate values of the network 
> speed. 
> + 
> +During *dcUpdate* every data collector sends file with known size to a 
> node 
> +from another nodegroup (chosen randomly) and measures time to perform it. 
> So 
> +after, MonD get response of *dcReport* from collectors, it will be able 
> to fill 
> +nodegroup bandwidth map in nodes. The information collected will be used 
> to 
> +estimate the time of balancing steps. In the case of existing network 
> speed 
> +information for some node from bandwidth tags as well as from bandwidth 
> data 
> +collector the last one will be chosen. 
> -- 
> 1.9.1 
>
>

Re: [PATCH v2 master] Add proposal of a new data collector and new command-line options

Reply via email to