On Mon, Dec 05, 2016 at 04:56:38PM +0800, Dou Liyang wrote:
> Hi all,
> 
> Currently, in QEMU, we query the block stats by using "info blockstats".
> For each disk, the command calls the "bdrv_query_stats" function in
> "qmp_query_blockstats" to collect the block stats.
> Since commit 13344f3a, Stefan added a mutex to ensure that the dataplane
> IOThread and the main loop's monitor code do not race.
> These are very useful to us.
> 
> Recently, we found that when a guest has a lot of disks, its I/O
> performance can degrade while we use the command above.
> 
> So, we are considering whether the following optimizations could be made:
> 
> 1. Add an optional parameter that identifies a block.  If this parameter is
> not empty, only the stats of the identified block are collected.
> 
> BlockStatsList *qmp_query_blockstats(bool has_query_nodes,
>                                      bool query_nodes,
> +                                    BlockBackend *query_blk,
>                                      Error **errp)
> {
>     BlockStatsList *head = NULL, **p_next = &head;
> 
> 2. Only take the mutex when the 'x-data-plane=on' feature is set.
> 
> 
> The reasoning is as follows:
> 
> 1. Background:
> 
> We have some guests with more than 450 disks, and we use
> "qmp_query_blockstats" to monitor their disk stats in real time
> (2 times/s).
> 
> We found that, while monitoring, the I/O performance of the guests can
> degrade.  We tested (using "dd"):
> 
> disk number | degraded |
> ------------|----------|
>          10 |     1.4% |
>         100 |     3.4% |
>         200 |     5.5% |
>         500 |    13.6% |
> 
> 
> 2. For the optimizations described above, a comparison was made at 500
> disks:
> 
> 2.1. With the block-identifying parameter added: degraded 2.1%
> 
> 2.2. With the lock protection turned off (for testing): degraded 1.6%
> 
> 3. Our guess:
> 
> We took timestamps before and after "qmp_query_blockstats" and found
> that on our host:
> 
> 3.1 getting one disk's stats takes about 4637 ns (not counting the time
>     spent searching the linked list).
> 3.2 getting the stats for 500 disks takes about 901583 ns.
> 3.3 without the mutex, getting one disk's stats takes about 2302 ns.
> 
> The operation consumes resources that would otherwise go to I/O
> processing; as the number of disks increases, the impact becomes
> apparent.
> 
> We think some performance degradation is inevitable, but if we make this
> configurable, allowing users to balance the amount of block stats
> collected against I/O performance, that may also be a good thing.
> 
> We searched the mailing list archive and did not find a relevant
> discussion.  We are not sure whether anyone has experienced this problem
> or mentioned it before.
> 
> So we boldly put this forward and hope to get everyone's advice.
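
For reference, the proposal boils down to filtering inside
qmp_query_blockstats().  A minimal sketch based on the prototype quoted
above (the query_blk argument and the "continue" filter are the proposal,
not code that exists today; the query-nodes branch is omitted for brevity):

/* Sketch only, not an actual patch.  The new query_blk argument and the
 * filter below are the proposal; the rest follows the existing
 * qmp_query_blockstats() loop in block/qapi.c. */
BlockStatsList *qmp_query_blockstats(bool has_query_nodes,
                                     bool query_nodes,
                                     BlockBackend *query_blk,
                                     Error **errp)
{
    BlockStatsList *head = NULL, **p_next = &head;
    BlockBackend *blk;

    for (blk = blk_next(NULL); blk; blk = blk_next(blk)) {
        BlockStatsList *info;
        AioContext *ctx = blk_get_aio_context(blk);

        if (query_blk && blk != query_blk) {
            continue;   /* caller asked for a single device, skip the rest */
        }

        info = g_malloc0(sizeof(*info));
        aio_context_acquire(ctx);
        info->value = bdrv_query_stats(blk, blk_bs(blk), true); /* as today */
        aio_context_release(ctx);

        *p_next = info;
        p_next = &info->next;
    }

    return head;
}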
Adding the BlockBackend *query_blk parameter is a reasonable short-term
workaround.  I wonder if the polling stats APIs need to be rethought in the
longer term.

Regarding the low-level details of block statistics inside QEMU, we can
probably collect statistics without requiring the AioContext lock.  This
means guest I/O processing does not need to be interrupted.  There should be
some RCU-style scheme that can be used to extract the stats.
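To make that concrete, here is a very rough illustration of what such a
scheme could look like.  None of this exists in QEMU today: the snapshot
struct and the publish/read helpers are made-up names; only the RCU
primitives (rcu_read_lock(), atomic_rcu_read()/atomic_rcu_set(),
g_free_rcu()) are real.  The idea is that the I/O thread owns its private
counters and periodically publishes an immutable copy, which the monitor
reads under rcu_read_lock() instead of acquiring the AioContext:

#include "qemu/osdep.h"
#include "qemu/rcu.h"

/* Hypothetical published snapshot of one device's accounting counters */
typedef struct StatsSnapshot {
    struct rcu_head rcu;
    uint64_t rd_bytes, wr_bytes, rd_ops, wr_ops;
} StatsSnapshot;

static StatsSnapshot *published_stats;   /* RCU-protected pointer */

/* I/O thread: publish a new snapshot after updating its private counters */
static void blk_publish_stats(const StatsSnapshot *cur)
{
    StatsSnapshot *fresh = g_new(StatsSnapshot, 1);
    StatsSnapshot *old;

    *fresh = *cur;
    old = atomic_rcu_read(&published_stats);
    atomic_rcu_set(&published_stats, fresh);
    if (old) {
        g_free_rcu(old, rcu);            /* reclaim once readers are done */
    }
}

/* Monitor: read the latest snapshot without interrupting guest I/O */
static void blk_read_stats(StatsSnapshot *out)
{
    StatsSnapshot *s;

    rcu_read_lock();
    s = atomic_rcu_read(&published_stats);
    if (s) {
        *out = *s;
    }
    rcu_read_unlock();
}

Stefan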