On 9/17/20 4:56 PM, Max Reitz wrote: > On 10.08.20 12:14, Denis V. Lunev wrote: >> There are severe delays with IO requests processing if QEMU is running in >> virtual machine or over software defined storage. Such delays potentially >> results in unpredictable guest behavior. For example, guests over IDE or >> SATA drive could remount filesystem read-only if write is performed >> longer than 10 seconds. >> >> Such reports are very complex to process. Some good starting point for this >> seems quite reasonable. This patch provides one. It adds logging of such >> potentially dangerous long IO operations. >> >> Signed-off-by: Denis V. Lunev <d...@openvz.org> >> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsement...@virtuozzo.com> >> CC: Stefan Hajnoczi <stefa...@redhat.com> >> CC: Kevin Wolf <kw...@redhat.com> >> CC: Max Reitz <mre...@redhat.com> >> --- >> block/accounting.c | 59 +++++++++++++++++++++++++++++++++++++- >> blockdev.c | 7 ++++- >> include/block/accounting.h | 5 +++- >> 3 files changed, 68 insertions(+), 3 deletions(-) >> >> diff --git a/block/accounting.c b/block/accounting.c >> index 8d41c8a83a..24f48c84b8 100644 >> --- a/block/accounting.c >> +++ b/block/accounting.c > [...] > >> @@ -182,6 +185,58 @@ void block_latency_histograms_clear(BlockAcctStats >> *stats) >> } >> } >> >> +static const char *block_account_type(enum BlockAcctType type) >> +{ >> + switch (type) { >> + case BLOCK_ACCT_READ: >> + return "READ"; >> + case BLOCK_ACCT_WRITE: >> + return "WRITE"; >> + case BLOCK_ACCT_FLUSH: >> + return "DISCARD"; >> + case BLOCK_ACCT_UNMAP: >> + return "TRUNCATE"; > I don’t understand how FLUSH translates to DISCARD, and UNMAP to TRUNCATE. >
Ha, seems I have made copy/pasted and missed this. Nice catch. >> + case BLOCK_ACCT_NONE: >> + return "NONE"; >> + case BLOCK_MAX_IOTYPE: >> + break; >> + } >> + return "UNKNOWN"; >> +} >> + >> +static void block_acct_report_long(BlockAcctStats *stats, >> + BlockAcctCookie *cookie, int64_t >> latency_ns) >> +{ >> + unsigned latency_ms = latency_ns / SCALE_MS; >> + int64_t start_time_host_s; >> + char buf[64]; >> + struct tm t; >> + BlockDriverState *bs; >> + >> + if (cookie->type == BLOCK_ACCT_NONE) { >> + return; >> + } >> + if (stats->latency_log_threshold_ms == 0) { >> + return; >> + } >> + if (latency_ms < stats->latency_log_threshold_ms) { >> + return; >> + } >> + >> + start_time_host_s = cookie->start_time_ns / NANOSECONDS_PER_SECOND; >> + strftime(buf, sizeof(buf), "%m-%d %H:%M:%S", >> + localtime_r(&start_time_host_s, &t)); >> + >> + bs = blk_bs(blk_stats2blk(stats)); >> + qemu_log("long %s[%ld] IO request: %d.03%d since %s.%03d bs: %s(%s, >> %s)\n", >> + block_account_type(cookie->type), cookie->bytes, >> + (int)(latency_ms / 1000), (int)(latency_ms % 1000), buf, > Why not just latency_ms * .001f and %.3f? I personally dislike floating point, this leads to a bit strange output. > >> + (int)((cookie->start_time_ns / 1000000) % 1000), > s/1000000/SCALE_MS/? (But I’m not sure whether that’s what the SCALE_?S > are for.) yes, you right. >> + bs == NULL ? "unknown" : bdrv_get_node_name(bs), >> + bs == NULL ? "unknown" : bdrv_get_format_name(bs), >> + bs == NULL ? "unknown" : bs->filename); > Now that I’m writing this response already, I wonder whether a QMP event > wouldn’t be nice. (But considering that accounting apparently just > doesn’t with -blockdev, I suppose that’s not that big of a worry.) may be. Why not? Though there is some note. Usually after the stall there is a bunch of requests completed at the same time once the storage is unfrozen, thus some aggregation would be required. I think that this should be made separately. >> +} >> + >> static void block_account_one_io(BlockAcctStats *stats, BlockAcctCookie >> *cookie, >> bool failed) >> { > [...] > >> diff --git a/blockdev.c b/blockdev.c >> index 3848a9c8ab..66158d1292 100644 >> --- a/blockdev.c >> +++ b/blockdev.c >> @@ -622,7 +622,8 @@ static BlockBackend *blockdev_init(const char *file, >> QDict *bs_opts, >> >> bs->detect_zeroes = detect_zeroes; >> >> - block_acct_setup(blk_get_stats(blk), account_invalid, >> account_failed); >> + block_acct_setup(blk_get_stats(blk), account_invalid, >> account_failed, >> + qemu_opt_get_number(opts, "latency-log-threshold", 0)); > latency_log_threshold_ms is an unsigned int and so this will silently > overflow for values >= 2^32. > > (Or for user-specified values < 0, which are wrapped around.) > >> >> if (!parse_stats_intervals(blk_get_stats(blk), interval_list, >> errp)) { >> blk_unref(blk); >> @@ -3727,6 +3728,10 @@ QemuOptsList qemu_common_drive_opts = { >> .type = QEMU_OPT_BOOL, >> .help = "whether to account for failed I/O operations " >> "in the statistics", >> + },{ >> + .name = "latency-log-threshold", >> + .type = QEMU_OPT_NUMBER, >> + .help = "threshold for long I/O report (disabled if <=0), in >> ms", > Because of that overflow, negative values will not necessarily disable > reporting, because truncating them to 32 bit may make them small > absolute values again. > > In any case, I’d just say “disabled if 0”, and not mention anything > about <0. will do. Thanks for review! Den