Re: [PATCH 01/10] vhost-user-blk: reconnect on any error during realize

2021-11-29 Thread Raphael Norwitz
As mst said, not for 6.2.

On Thu, Nov 11, 2021 at 06:33:45PM +0300, Roman Kagan wrote:
> vhost-user-blk realize only attempts to reconnect if the previous
> connection attempt failed on "a problem with the connection and not an
> error related to the content (which would fail again the same way in the
> next attempt)".
> 
> However this distinction is very subtle, and may be inadvertently broken
> if the code changes somewhere deep down the stack and a new error gets
> propagated up to here.
> 
> OTOH now that the number of reconnection attempts is limited it seems
> harmless to try reconnecting on any error.
> 
> So relax the condition of whether to retry connecting to check for any
> error.
> 
> This patch amends a527e312b5 "vhost-user-blk: Implement reconnection
> during realize".
> 
> Signed-off-by: Roman Kagan 

Reviewed-by: Raphael Norwitz 

> ---
>  hw/block/vhost-user-blk.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c
> index ba13cb87e5..f9b17f6813 100644
> --- a/hw/block/vhost-user-blk.c
> +++ b/hw/block/vhost-user-blk.c
> @@ -511,7 +511,7 @@ static void vhost_user_blk_device_realize(DeviceState 
> *dev, Error **errp)
>  *errp = NULL;
>  }
>  ret = vhost_user_blk_realize_connect(s, errp);
> -} while (ret == -EPROTO && retries--);
> +} while (ret < 0 && retries--);
>  
>  if (ret < 0) {
>  goto virtio_err;
> -- 
> 2.33.1
> 


Re: [PATCH 01/10] vhost-user-blk: reconnect on any error during realize

2021-11-29 Thread Raphael Norwitz
> > 
> > I see. I hadn't looked at the rest of the series yet because I ran out
> > of time, but now that I'm skimming them, I see quite a few places that
> > use non-EPROTO, but I wonder which of them actually should be
> > reconnected. So far all I saw were presumably persistent errors where a
> > retry won't help. Can you give me some examples?
> 
> E.g. the particular case you mention earlier, -ECONNREFUSED, is not
> unlikely to happen due to the vhost-user server restart for maintenance;
> in this case retying looks like a reasonable thing to do, doesn't it?
>

Seems like a net-positive to me, expecially with the cleanups in the
rest of the series, but I don't feel strongly.

> Thanks,
> Roman.
> 


Re: [PATCH 01/10] vhost-user-blk: reconnect on any error during realize

2021-11-12 Thread Roman Kagan
On Fri, Nov 12, 2021 at 12:37:59PM +0100, Kevin Wolf wrote:
> Am 12.11.2021 um 08:39 hat Roman Kagan geschrieben:
> > On Thu, Nov 11, 2021 at 06:52:30PM +0100, Kevin Wolf wrote:
> > > Am 11.11.2021 um 16:33 hat Roman Kagan geschrieben:
> > > > vhost-user-blk realize only attempts to reconnect if the previous
> > > > connection attempt failed on "a problem with the connection and not an
> > > > error related to the content (which would fail again the same way in the
> > > > next attempt)".
> > > > 
> > > > However this distinction is very subtle, and may be inadvertently broken
> > > > if the code changes somewhere deep down the stack and a new error gets
> > > > propagated up to here.
> > > > 
> > > > OTOH now that the number of reconnection attempts is limited it seems
> > > > harmless to try reconnecting on any error.
> > > > 
> > > > So relax the condition of whether to retry connecting to check for any
> > > > error.
> > > > 
> > > > This patch amends a527e312b5 "vhost-user-blk: Implement reconnection
> > > > during realize".
> > > > 
> > > > Signed-off-by: Roman Kagan 
> > > 
> > > It results in less than perfect error messages. With a modified export
> > > that just crashes qemu-storage-daemon during get_features, I get:
> > > 
> > > qemu-system-x86_64: -device vhost-user-blk-pci,chardev=c: Failed to read 
> > > msg header. Read 0 instead of 12. Original request 1.
> > > qemu-system-x86_64: -device vhost-user-blk-pci,chardev=c: Reconnecting 
> > > after error: vhost_backend_init failed: Protocol error
> > > qemu-system-x86_64: -device vhost-user-blk-pci,chardev=c: Reconnecting 
> > > after error: Failed to connect to '/tmp/vsock': Connection refused
> > > qemu-system-x86_64: -device vhost-user-blk-pci,chardev=c: Reconnecting 
> > > after error: Failed to connect to '/tmp/vsock': Connection refused
> > > qemu-system-x86_64: -device vhost-user-blk-pci,chardev=c: Failed to 
> > > connect to '/tmp/vsock': Connection refused
> > 
> > This patch doesn't change any error messages.  Which ones specifically
> > became less than perfect as a result of this patch?
> 
> But it adds error messages (for each retry), which are different from
> the first error message. As I said this is not the end of the world, but
> maybe a bit more confusing.

Ah, now I see what you mean: it adds reconnection attempts where there
used to be immediate failure return, so now every failed attempt logs
its own message.

> > > I guess this might be tolerable. On the other hand, the patch doesn't
> > > really fix anything either, but just gets rid of possible subtleties.
> > 
> > The remaining patches in the series make other errors beside -EPROTO
> > propagate up to this point, and some (most) of them are retryable.  This
> > was the reason to include this patch at the beginning of the series (I
> > guess I should've mentioned that in the patch log).
> 
> I see. I hadn't looked at the rest of the series yet because I ran out
> of time, but now that I'm skimming them, I see quite a few places that
> use non-EPROTO, but I wonder which of them actually should be
> reconnected. So far all I saw were presumably persistent errors where a
> retry won't help. Can you give me some examples?

E.g. the particular case you mention earlier, -ECONNREFUSED, is not
unlikely to happen due to the vhost-user server restart for maintenance;
in this case retying looks like a reasonable thing to do, doesn't it?

Thanks,
Roman.



Re: [PATCH 01/10] vhost-user-blk: reconnect on any error during realize

2021-11-12 Thread Kevin Wolf
Am 12.11.2021 um 08:39 hat Roman Kagan geschrieben:
> On Thu, Nov 11, 2021 at 06:52:30PM +0100, Kevin Wolf wrote:
> > Am 11.11.2021 um 16:33 hat Roman Kagan geschrieben:
> > > vhost-user-blk realize only attempts to reconnect if the previous
> > > connection attempt failed on "a problem with the connection and not an
> > > error related to the content (which would fail again the same way in the
> > > next attempt)".
> > > 
> > > However this distinction is very subtle, and may be inadvertently broken
> > > if the code changes somewhere deep down the stack and a new error gets
> > > propagated up to here.
> > > 
> > > OTOH now that the number of reconnection attempts is limited it seems
> > > harmless to try reconnecting on any error.
> > > 
> > > So relax the condition of whether to retry connecting to check for any
> > > error.
> > > 
> > > This patch amends a527e312b5 "vhost-user-blk: Implement reconnection
> > > during realize".
> > > 
> > > Signed-off-by: Roman Kagan 
> > 
> > It results in less than perfect error messages. With a modified export
> > that just crashes qemu-storage-daemon during get_features, I get:
> > 
> > qemu-system-x86_64: -device vhost-user-blk-pci,chardev=c: Failed to read 
> > msg header. Read 0 instead of 12. Original request 1.
> > qemu-system-x86_64: -device vhost-user-blk-pci,chardev=c: Reconnecting 
> > after error: vhost_backend_init failed: Protocol error
> > qemu-system-x86_64: -device vhost-user-blk-pci,chardev=c: Reconnecting 
> > after error: Failed to connect to '/tmp/vsock': Connection refused
> > qemu-system-x86_64: -device vhost-user-blk-pci,chardev=c: Reconnecting 
> > after error: Failed to connect to '/tmp/vsock': Connection refused
> > qemu-system-x86_64: -device vhost-user-blk-pci,chardev=c: Failed to connect 
> > to '/tmp/vsock': Connection refused
> 
> This patch doesn't change any error messages.  Which ones specifically
> became less than perfect as a result of this patch?

But it adds error messages (for each retry), which are different from
the first error message. As I said this is not the end of the world, but
maybe a bit more confusing.

> > I guess this might be tolerable. On the other hand, the patch doesn't
> > really fix anything either, but just gets rid of possible subtleties.
> 
> The remaining patches in the series make other errors beside -EPROTO
> propagate up to this point, and some (most) of them are retryable.  This
> was the reason to include this patch at the beginning of the series (I
> guess I should've mentioned that in the patch log).

I see. I hadn't looked at the rest of the series yet because I ran out
of time, but now that I'm skimming them, I see quite a few places that
use non-EPROTO, but I wonder which of them actually should be
reconnected. So far all I saw were presumably persistent errors where a
retry won't help. Can you give me some examples?

Kevin




Re: [PATCH 01/10] vhost-user-blk: reconnect on any error during realize

2021-11-11 Thread Roman Kagan
On Thu, Nov 11, 2021 at 06:52:30PM +0100, Kevin Wolf wrote:
> Am 11.11.2021 um 16:33 hat Roman Kagan geschrieben:
> > vhost-user-blk realize only attempts to reconnect if the previous
> > connection attempt failed on "a problem with the connection and not an
> > error related to the content (which would fail again the same way in the
> > next attempt)".
> > 
> > However this distinction is very subtle, and may be inadvertently broken
> > if the code changes somewhere deep down the stack and a new error gets
> > propagated up to here.
> > 
> > OTOH now that the number of reconnection attempts is limited it seems
> > harmless to try reconnecting on any error.
> > 
> > So relax the condition of whether to retry connecting to check for any
> > error.
> > 
> > This patch amends a527e312b5 "vhost-user-blk: Implement reconnection
> > during realize".
> > 
> > Signed-off-by: Roman Kagan 
> 
> It results in less than perfect error messages. With a modified export
> that just crashes qemu-storage-daemon during get_features, I get:
> 
> qemu-system-x86_64: -device vhost-user-blk-pci,chardev=c: Failed to read msg 
> header. Read 0 instead of 12. Original request 1.
> qemu-system-x86_64: -device vhost-user-blk-pci,chardev=c: Reconnecting after 
> error: vhost_backend_init failed: Protocol error
> qemu-system-x86_64: -device vhost-user-blk-pci,chardev=c: Reconnecting after 
> error: Failed to connect to '/tmp/vsock': Connection refused
> qemu-system-x86_64: -device vhost-user-blk-pci,chardev=c: Reconnecting after 
> error: Failed to connect to '/tmp/vsock': Connection refused
> qemu-system-x86_64: -device vhost-user-blk-pci,chardev=c: Failed to connect 
> to '/tmp/vsock': Connection refused

This patch doesn't change any error messages.  Which ones specifically
became less than perfect as a result of this patch?

> I guess this might be tolerable. On the other hand, the patch doesn't
> really fix anything either, but just gets rid of possible subtleties.

The remaining patches in the series make other errors beside -EPROTO
propagate up to this point, and some (most) of them are retryable.  This
was the reason to include this patch at the beginning of the series (I
guess I should've mentioned that in the patch log).

Thanks,
Roman.



Re: [PATCH 01/10] vhost-user-blk: reconnect on any error during realize

2021-11-11 Thread Kevin Wolf
Am 11.11.2021 um 16:33 hat Roman Kagan geschrieben:
> vhost-user-blk realize only attempts to reconnect if the previous
> connection attempt failed on "a problem with the connection and not an
> error related to the content (which would fail again the same way in the
> next attempt)".
> 
> However this distinction is very subtle, and may be inadvertently broken
> if the code changes somewhere deep down the stack and a new error gets
> propagated up to here.
> 
> OTOH now that the number of reconnection attempts is limited it seems
> harmless to try reconnecting on any error.
> 
> So relax the condition of whether to retry connecting to check for any
> error.
> 
> This patch amends a527e312b5 "vhost-user-blk: Implement reconnection
> during realize".
> 
> Signed-off-by: Roman Kagan 

It results in less than perfect error messages. With a modified export
that just crashes qemu-storage-daemon during get_features, I get:

qemu-system-x86_64: -device vhost-user-blk-pci,chardev=c: Failed to read msg 
header. Read 0 instead of 12. Original request 1.
qemu-system-x86_64: -device vhost-user-blk-pci,chardev=c: Reconnecting after 
error: vhost_backend_init failed: Protocol error
qemu-system-x86_64: -device vhost-user-blk-pci,chardev=c: Reconnecting after 
error: Failed to connect to '/tmp/vsock': Connection refused
qemu-system-x86_64: -device vhost-user-blk-pci,chardev=c: Reconnecting after 
error: Failed to connect to '/tmp/vsock': Connection refused
qemu-system-x86_64: -device vhost-user-blk-pci,chardev=c: Failed to connect to 
'/tmp/vsock': Connection refused

I guess this might be tolerable. On the other hand, the patch doesn't
really fix anything either, but just gets rid of possible subtleties.

Kevin




[PATCH 01/10] vhost-user-blk: reconnect on any error during realize

2021-11-11 Thread Roman Kagan
vhost-user-blk realize only attempts to reconnect if the previous
connection attempt failed on "a problem with the connection and not an
error related to the content (which would fail again the same way in the
next attempt)".

However this distinction is very subtle, and may be inadvertently broken
if the code changes somewhere deep down the stack and a new error gets
propagated up to here.

OTOH now that the number of reconnection attempts is limited it seems
harmless to try reconnecting on any error.

So relax the condition of whether to retry connecting to check for any
error.

This patch amends a527e312b5 "vhost-user-blk: Implement reconnection
during realize".

Signed-off-by: Roman Kagan 
---
 hw/block/vhost-user-blk.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c
index ba13cb87e5..f9b17f6813 100644
--- a/hw/block/vhost-user-blk.c
+++ b/hw/block/vhost-user-blk.c
@@ -511,7 +511,7 @@ static void vhost_user_blk_device_realize(DeviceState *dev, 
Error **errp)
 *errp = NULL;
 }
 ret = vhost_user_blk_realize_connect(s, errp);
-} while (ret == -EPROTO && retries--);
+} while (ret < 0 && retries--);
 
 if (ret < 0) {
 goto virtio_err;
-- 
2.33.1