On 10.08.20 12:14, Denis V. Lunev wrote:
> There are severe delays with IO requests processing if QEMU is running in
> virtual machine or over software defined storage. Such delays potentially
> results in unpredictable guest behavior. For example, guests over IDE or
> SATA drive could remount filesystem read-only if write is performed
> longer than 10 seconds.
> 
> Such reports are very complex to process. Some good starting point for this
> seems quite reasonable. This patch provides one. It adds logging of such
> potentially dangerous long IO operations.
> 
> Signed-off-by: Denis V. Lunev <d...@openvz.org>
> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsement...@virtuozzo.com>
> CC: Stefan Hajnoczi <stefa...@redhat.com>
> CC: Kevin Wolf <kw...@redhat.com>
> CC: Max Reitz <mre...@redhat.com>
> ---
>  block/accounting.c         | 59 +++++++++++++++++++++++++++++++++++++-
>  blockdev.c                 |  7 ++++-
>  include/block/accounting.h |  5 +++-
>  3 files changed, 68 insertions(+), 3 deletions(-)
> 
> diff --git a/block/accounting.c b/block/accounting.c
> index 8d41c8a83a..24f48c84b8 100644
> --- a/block/accounting.c
> +++ b/block/accounting.c

[...]

> @@ -182,6 +185,58 @@ void block_latency_histograms_clear(BlockAcctStats 
> *stats)
>      }
>  }
>  
> +static const char *block_account_type(enum BlockAcctType type)
> +{
> +    switch (type) {
> +    case BLOCK_ACCT_READ:
> +        return "READ";
> +    case BLOCK_ACCT_WRITE:
> +        return "WRITE";
> +    case BLOCK_ACCT_FLUSH:
> +        return "DISCARD";
> +    case BLOCK_ACCT_UNMAP:
> +        return "TRUNCATE";

I don’t understand how FLUSH translates to DISCARD, and UNMAP to TRUNCATE.

> +    case BLOCK_ACCT_NONE:
> +        return "NONE";
> +    case BLOCK_MAX_IOTYPE:
> +        break;
> +    }
> +    return "UNKNOWN";
> +}
> +
> +static void block_acct_report_long(BlockAcctStats *stats,
> +                                   BlockAcctCookie *cookie, int64_t 
> latency_ns)
> +{
> +    unsigned latency_ms = latency_ns / SCALE_MS;
> +    int64_t start_time_host_s;
> +    char buf[64];
> +    struct tm t;
> +    BlockDriverState *bs;
> +
> +    if (cookie->type == BLOCK_ACCT_NONE) {
> +        return;
> +    }
> +    if (stats->latency_log_threshold_ms == 0) {
> +        return;
> +    }
> +    if (latency_ms < stats->latency_log_threshold_ms) {
> +        return;
> +    }
> +
> +    start_time_host_s = cookie->start_time_ns / NANOSECONDS_PER_SECOND;
> +    strftime(buf, sizeof(buf), "%m-%d %H:%M:%S",
> +             localtime_r(&start_time_host_s, &t));
> +
> +    bs = blk_bs(blk_stats2blk(stats));
> +    qemu_log("long %s[%ld] IO request: %d.03%d since %s.%03d bs: %s(%s, 
> %s)\n",
> +             block_account_type(cookie->type), cookie->bytes,
> +             (int)(latency_ms / 1000), (int)(latency_ms % 1000), buf,

Why not just latency_ms * .001f and %.3f?

> +             (int)((cookie->start_time_ns / 1000000) % 1000),

s/1000000/SCALE_MS/? (But I’m not sure whether that’s what the SCALE_?S
are for.)

> +             bs == NULL ? "unknown" : bdrv_get_node_name(bs),
> +             bs == NULL ? "unknown" : bdrv_get_format_name(bs),
> +             bs == NULL ? "unknown" : bs->filename);

Now that I’m writing this response already, I wonder whether a QMP event
wouldn’t be nice.  (But considering that accounting apparently just
doesn’t with -blockdev, I suppose that’s not that big of a worry.)

> +}
> +
>  static void block_account_one_io(BlockAcctStats *stats, BlockAcctCookie 
> *cookie,
>                                   bool failed)
>  {

[...]

> diff --git a/blockdev.c b/blockdev.c
> index 3848a9c8ab..66158d1292 100644
> --- a/blockdev.c
> +++ b/blockdev.c
> @@ -622,7 +622,8 @@ static BlockBackend *blockdev_init(const char *file, 
> QDict *bs_opts,
>  
>          bs->detect_zeroes = detect_zeroes;
>  
> -        block_acct_setup(blk_get_stats(blk), account_invalid, 
> account_failed);
> +        block_acct_setup(blk_get_stats(blk), account_invalid, account_failed,
> +                qemu_opt_get_number(opts, "latency-log-threshold", 0));

latency_log_threshold_ms is an unsigned int and so this will silently
overflow for values >= 2^32.

(Or for user-specified values < 0, which are wrapped around.)

>  
>          if (!parse_stats_intervals(blk_get_stats(blk), interval_list, errp)) 
> {
>              blk_unref(blk);
> @@ -3727,6 +3728,10 @@ QemuOptsList qemu_common_drive_opts = {
>              .type = QEMU_OPT_BOOL,
>              .help = "whether to account for failed I/O operations "
>                      "in the statistics",
> +        },{
> +            .name = "latency-log-threshold",
> +            .type = QEMU_OPT_NUMBER,
> +            .help = "threshold for long I/O report (disabled if <=0), in ms",

Because of that overflow, negative values will not necessarily disable
reporting, because truncating them to 32 bit may make them small
absolute values again.

In any case, I’d just say “disabled if 0”, and not mention anything
about <0.

Max

Attachment: signature.asc
Description: OpenPGP digital signature

Reply via email to