Re: [PATCH 0/2] block: add logging facility for long standing IO requests

2020-07-31 Denis V. Lunev
On 7/31/20 1:25 PM, Stefan Hajnoczi wrote:
> On Fri, Jul 10, 2020 at 08:27:09PM +0300, Denis V. Lunev wrote:
>> IO request processing can suffer severe delays if QEMU is running inside
>> a virtual machine or on top of software-defined storage. Such delays can
>> result in unpredictable guest behavior. For example, a guest using an IDE
>> or SATA drive could remount its filesystem read-only if a write takes
>> longer than 10 seconds.
>>
>> Such reports are very hard to analyze, so having a good starting point
>> for the investigation seems reasonable. This series provides one: it logs
>> such potentially dangerous long-running IO operations.
>>
>> Signed-off-by: Denis V. Lunev 
>> CC: Vladimir Sementsov-Ogievskiy 
>> CC: Kevin Wolf 
>> CC: Max Reitz 
> If I understand correctly, this only reports completed I/Os, so if the
> host hasn't given up on an I/O request yet, QEMU will not report that it
> is taking a long time. In the meantime the guest could start printing
> timeout errors.
>
> I think this patch series is good as it is. In the future, a QMP
> command that lists in-flight I/O requests might be nice. That would help
> when troubleshooting hung I/Os.
We could dump requests at the block level; we do have proper lists,
but the output would be a little bit different.

Anyway, I __DO__ like this idea. Somehow I had missed it :)

Thank you for the suggestion,
    Den



Re: [PATCH 0/2] block: add logging facility for long standing IO requests

2020-07-31 Stefan Hajnoczi
On Fri, Jul 10, 2020 at 08:27:09PM +0300, Denis V. Lunev wrote:
> IO request processing can suffer severe delays if QEMU is running inside
> a virtual machine or on top of software-defined storage. Such delays can
> result in unpredictable guest behavior. For example, a guest using an IDE
> or SATA drive could remount its filesystem read-only if a write takes
> longer than 10 seconds.
> 
> Such reports are very hard to analyze, so having a good starting point
> for the investigation seems reasonable. This series provides one: it logs
> such potentially dangerous long-running IO operations.
> 
> Signed-off-by: Denis V. Lunev 
> CC: Vladimir Sementsov-Ogievskiy 
> CC: Kevin Wolf 
> CC: Max Reitz 

If I understand correctly, this only reports completed I/Os, so if the
host hasn't given up on an I/O request yet, QEMU will not report that it
is taking a long time. In the meantime the guest could start printing
timeout errors.

I think this patch series is good as it is. In the future, a QMP
command that lists in-flight I/O requests might be nice. That would help
when troubleshooting hung I/Os.
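The difference between completion-time logging and listing in-flight requests can be illustrated with a minimal sketch. It assumes each request is linked into a per-device list at submission and unlinked at completion; `InflightReq`, `InflightList`, `inflight_add()`, `inflight_remove()` and `inflight_dump_stale()` are hypothetical names, not QEMU's real block-layer structures, and the QMP command itself is not sketched. Because the scan walks requests that are still pending, it would also report I/O that never completes.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical in-flight request record with intrusive list links;
 * QEMU's real block-layer tracking structures are not shown here. */
typedef struct InflightReq {
    uint64_t offset;
    uint64_t bytes;
    int64_t start_ns;               /* monotonic submission timestamp */
    struct InflightReq *prev, *next;
} InflightReq;

typedef struct InflightList {
    InflightReq *head;
} InflightList;

/* Link a request into the list when it is submitted. */
void inflight_add(InflightList *list, InflightReq *req, int64_t start_ns)
{
    assert(list != NULL && req != NULL);
    req->start_ns = start_ns;
    req->prev = NULL;
    req->next = list->head;
    if (list->head != NULL) {
        list->head->prev = req;
    }
    list->head = req;
}

/* Unlink a request when it completes (successfully or with an error). */
void inflight_remove(InflightList *list, InflightReq *req)
{
    assert(list != NULL && req != NULL);
    if (req->prev != NULL) {
        req->prev->next = req->next;
    } else {
        list->head = req->next;
    }
    if (req->next != NULL) {
        req->next->prev = req->prev;
    }
    req->prev = req->next = NULL;
}

/* Print every request that has been pending for at least threshold_ns
 * and return how many were printed. Requests that never complete stay
 * in the list, so they keep showing up here. */
size_t inflight_dump_stale(const InflightList *list, int64_t now_ns,
                           int64_t threshold_ns, FILE *out)
{
    size_t count = 0;

    for (const InflightReq *r = list->head; r != NULL; r = r->next) {
        int64_t age_ns = now_ns - r->start_ns;
        if (age_ns >= threshold_ns) {
            fprintf(out, "in-flight: offset=%" PRIu64 " bytes=%" PRIu64
                    " age=%" PRId64 " ms\n", r->offset, r->bytes,
                    age_ns / 1000000);
            count++;
        }
    }
    return count;
}
```

A periodic timer or a hypothetical QMP handler could call `inflight_dump_stale()` with the current monotonic time. The cost of this design is the extra list maintenance on every request, which completion-time logging avoids.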

Stefan




Re: [PATCH 0/2] block: add logging facility for long standing IO requests

2020-07-29 Stefan Hajnoczi
On Fri, Jul 10, 2020 at 08:27:09PM +0300, Denis V. Lunev wrote:
> IO request processing can suffer severe delays if QEMU is running inside
> a virtual machine or on top of software-defined storage. Such delays can
> result in unpredictable guest behavior. For example, a guest using an IDE
> or SATA drive could remount its filesystem read-only if a write takes
> longer than 10 seconds.
> 
> Such reports are very hard to analyze, so having a good starting point
> for the investigation seems reasonable. This series provides one: it logs
> such potentially dangerous long-running IO operations.
> 
> Signed-off-by: Denis V. Lunev 
> CC: Vladimir Sementsov-Ogievskiy 
> CC: Kevin Wolf 
> CC: Max Reitz 

This looks useful. It is indeed hard to diagnose soft lockups, I/O
timeouts, etc. inside the guest :). QEMU should print more info. Thanks
for doing this!

Stefan




[PATCH 0/2] block: add logging facility for long standing IO requests

2020-07-10 Denis V. Lunev
IO request processing can suffer severe delays if QEMU is running inside
a virtual machine or on top of software-defined storage. Such delays can
result in unpredictable guest behavior. For example, a guest using an IDE
or SATA drive could remount its filesystem read-only if a write takes
longer than 10 seconds.

Such reports are very hard to analyze, so having a good starting point
for the investigation seems reasonable. This series provides one: it logs
such potentially dangerous long-running IO operations.
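The patches themselves are not part of this thread, so the following is only a minimal sketch of completion-time latency logging, assuming a monotonic timestamp is stored at submission and compared against a threshold at completion. `IORequest`, `io_request_start()`, `io_request_complete()` and `SLOW_IO_THRESHOLD_NS` are hypothetical names; the 10 s value merely mirrors the guest example above and is not the series' actual default.

```c
#define _POSIX_C_SOURCE 200809L /* for clock_gettime() under strict -std=c99 */
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical threshold: mirrors the 10 s guest example above; the
 * series' real default and configuration knob are not shown here. */
#define SLOW_IO_THRESHOLD_NS (10LL * 1000 * 1000 * 1000)

/* Hypothetical per-request record; not QEMU's actual data structure. */
typedef struct IORequest {
    uint64_t offset;
    uint64_t bytes;
    int is_write;
    int64_t start_ns; /* monotonic timestamp taken at submission */
} IORequest;

/* Monotonic clock in nanoseconds, unaffected by wall-clock jumps. */
int64_t monotonic_now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Record the submission time of a request. */
void io_request_start(IORequest *req)
{
    assert(req != NULL);
    req->start_ns = monotonic_now_ns();
}

/* Called on completion with the completion timestamp. Logs the request
 * and returns 1 if it took at least SLOW_IO_THRESHOLD_NS, otherwise
 * returns 0. A request that never completes never reaches this check. */
int io_request_complete(const IORequest *req, int64_t end_ns)
{
    int64_t elapsed_ns;

    assert(req != NULL);
    elapsed_ns = end_ns - req->start_ns;
    if (elapsed_ns < SLOW_IO_THRESHOLD_NS) {
        return 0;
    }
    fprintf(stderr, "slow %s: offset=%" PRIu64 " bytes=%" PRIu64
            " took %" PRId64 " ms\n",
            req->is_write ? "write" : "read",
            req->offset, req->bytes, elapsed_ns / 1000000);
    return 1;
}
```

In a real completion path, `io_request_complete()` would be passed `monotonic_now_ns()`; taking the timestamp as a parameter just keeps the sketch testable without sleeping.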

Signed-off-by: Denis V. Lunev 
CC: Vladimir Sementsov-Ogievskiy 
CC: Kevin Wolf 
CC: Max Reitz