On 2016-06-12 21:22:41, [email protected] wrote:
> From: Daniil Leshchev <[email protected]>
> 
> Introduce new command-line options for configuring
> balancing process.
> 
> Introduce the data collector for gathering information
> about network speed. This information can be used in order
> to optimize time of cluster balancing.
> ---
>  doc/design-migration-speed-hbal.rst | 44 +++++++++++++++++++++++++++++++++++++
>  1 file changed, 44 insertions(+)
> 
> diff --git a/doc/design-migration-speed-hbal.rst b/doc/design-migration-speed-hbal.rst
> index a0dcfe0..14b867e 100644
> --- a/doc/design-migration-speed-hbal.rst
> +++ b/doc/design-migration-speed-hbal.rst
> @@ -26,3 +26,47 @@ a compromise between moves speed and optimal scoring. This can be implemented
>  by introducing ``--avoid-disk-moves *FACTOR*`` option which will admit disk
>  moves only if the gain in the cluster metrics is *FACTOR* times
>  higher than the gain achievable by non disk moves.
> +
> +Avoiding insignificant long-time solutions
> +==========================================
> +
> +The next step is to estimate the amount of time required to perform a balancing
> +step and to introduce a new term: a ``long-time`` solution.
> +
> +The ``--long-solution-threshold`` option will specify a duration in seconds.
> +A solution exceeding the duration is a ``long-time`` solution by definition.
> +
> +With time estimations we will be able to filter Hbal's sequences and disallow
> +long-time solutions that do not give enough gain in the cluster metric. This can
> +be done by introducing an ``--avoid-long-solutions *FACTOR*`` option, which will
> +admit only those long-time solutions whose K/N ratio is greater than *FACTOR*, where
> +K is the gain of the solution and N is the estimated time to perform it.
> +
> +As a result, we can achieve almost the same improvement of the cluster metrics
> +after balancing, with a significant decrease in balancing time.

Interesting. The avoid-long-solutions factor makes sense, but basing the
long-solution-threshold on a fixed value makes me wonder whether it's the
appropriate metric. Depending on cluster state, the speed of a given
operation can vary greatly.

Given that the only scenario that takes a long and variable amount of
time is replicating disks when using DRBD, my original plan a few years
back was to add an instance's disk size into the scoring factors, but I
never got around to that.
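The admission rule proposed in the patch can be sketched as follows. This is a minimal Python sketch, not hbal's implementation: `Solution`, `is_long_time`, `admit` and `filter_solutions` are hypothetical names, and it assumes each candidate already carries its gain K and an estimated duration N in seconds.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Solution:
    """One candidate balancing solution (hypothetical model, not hbal's)."""
    gain: float         # K: improvement in the cluster metric
    est_seconds: float  # N: estimated time to perform the solution


def is_long_time(sol: Solution, threshold_s: float) -> bool:
    # A solution exceeding --long-solution-threshold is "long-time".
    return sol.est_seconds > threshold_s


def admit(sol: Solution, threshold_s: float, factor: float) -> bool:
    # Solutions within the threshold are always admitted; long-time ones
    # only when K/N exceeds the --avoid-long-solutions FACTOR.
    if not is_long_time(sol, threshold_s):
        return True
    return sol.gain / sol.est_seconds > factor


def filter_solutions(sols: List[Solution], threshold_s: float,
                     factor: float) -> List[Solution]:
    # Keep only the solutions that pass the admission rule.
    return [s for s in sols if admit(s, threshold_s, factor)]
```

With a 600-second threshold and a factor of 0.001, a 60-second solution is always kept, a one-hour solution with gain 0.5 (K/N of about 0.00014) is dropped, and a 20-minute solution with gain 5.0 (K/N of about 0.0042) is kept.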

> +Network bandwidth estimation
> +============================
> +
> +Balancing time can be estimated from the amount of data to be moved and the
> +current network bandwidth between each pair of affected nodes.
> +
> +We propose to add a new data collector that will gather information about
> +network speed by sending a known amount of data. By measuring the time this takes,
> +we can estimate the average network speed between any two nodes in the cluster.
> +
> +DataCollector implementation details
> +====================================
> +
> +As a first approach, we suggest implementing a dummy data collector whose output
> +can be configured by the user.
> +
> +For a real data collector it's useless to send tiny payloads of less than 100Kb,
> +because the connection establishment time would dominate. Since in almost all implementations
> +of the TCP/IP stack the MTU is limited to approximately 1500 bytes, we also propose not
> +to use the *ping* command, but to implement our own packet-sending process or, for
> +example, to parse the output of the *scp* command.
> +
> +During *dcUpdate*, every data collector sends requests to the other nodes and
> +measures the time it takes to get a response. So after the master node invokes
> +*dcReport* on all collectors, it will have the full graph of network speeds.
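A minimal sketch of how the per-node measurements could be merged into the speed graph and turned into a time estimate. It assumes a hypothetical report format in which each node's collector returns a mapping from peer node name to measured bandwidth in bytes per second; this is not Ganeti's actual *dcReport* output, and all function names are illustrative.

```python
from typing import Dict, Tuple

# Hypothetical report: peer node name -> measured bandwidth (bytes/s).
Report = Dict[str, float]
# Directed graph of link speeds: (source, destination) -> bytes/s.
Graph = Dict[Tuple[str, str], float]


def measured_bandwidth(bytes_sent: int, elapsed_s: float) -> float:
    """Average speed of one probe transfer, in bytes per second."""
    if elapsed_s <= 0:
        raise ValueError("elapsed time must be positive")
    return bytes_sent / elapsed_s


def build_speed_graph(reports: Dict[str, Report]) -> Graph:
    """Merge the per-node reports into one directed speed graph."""
    graph: Graph = {}
    for src, peers in reports.items():
        for dst, bw in peers.items():
            graph[(src, dst)] = bw
    return graph


def estimate_move_seconds(graph: Graph, src: str, dst: str,
                          data_bytes: int) -> float:
    """Time to copy data_bytes from src to dst over the measured link."""
    return data_bytes / graph[(src, dst)]
```

The graph is kept directed because each collector only measures its own outgoing transfers, so the two directions of a link may report different speeds.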

Is this complex setup needed? What kind of network setups do you have in
mind where computing the available replication bandwidth in real time is
useful?

In common Ganeti deployments using DRBD, replication is done over a
private network where only replication and instance migration take
place. As such, the network is fully symmetric and 'dedicated' to
Ganeti; a fixed configuration parameter (node-group level) would
approximate the available bandwidth well enough, I think.
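With such a fixed parameter, the time estimate reduces to simple arithmetic. A minimal sketch, where `group_bandwidth_mibs` stands for a hypothetical node-group parameter (no existing Ganeti setting is implied) giving the replication bandwidth in MiB/s:

```python
from typing import Iterable


def estimate_replication_seconds(disk_sizes_mib: Iterable[float],
                                 group_bandwidth_mibs: float) -> float:
    """Estimate the DRBD resync time for moving the given disks.

    group_bandwidth_mibs is a hypothetical per-node-group setting for
    the bandwidth of the dedicated replication network, in MiB/s.
    """
    if group_bandwidth_mibs <= 0:
        raise ValueError("bandwidth must be positive")
    # Total data to replicate divided by the fixed link speed.
    return sum(disk_sizes_mib) / group_bandwidth_mibs
```

For example, moving disks of 10 GiB and 20 GiB over a 128 MiB/s link gives 30720 / 128 = 240 seconds.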

What do you think?

regards,
iustin
