Re: [PATCH 01/10] vhost-user-blk: reconnect on any error during realize
As mst said, not for 6.2. On Thu, Nov 11, 2021 at 06:33:45PM +0300, Roman Kagan wrote: > vhost-user-blk realize only attempts to reconnect if the previous > connection attempt failed on "a problem with the connection and not an > error related to the content (which would fail again the same way in the > next attempt)". > > However this distinction is very subtle, and may be inadvertently broken > if the code changes somewhere deep down the stack and a new error gets > propagated up to here. > > OTOH now that the number of reconnection attempts is limited it seems > harmless to try reconnecting on any error. > > So relax the condition of whether to retry connecting to check for any > error. > > This patch amends a527e312b5 "vhost-user-blk: Implement reconnection > during realize". > > Signed-off-by: Roman Kagan Reviewed-by: Raphael Norwitz > --- > hw/block/vhost-user-blk.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c > index ba13cb87e5..f9b17f6813 100644 > --- a/hw/block/vhost-user-blk.c > +++ b/hw/block/vhost-user-blk.c > @@ -511,7 +511,7 @@ static void vhost_user_blk_device_realize(DeviceState > *dev, Error **errp) > *errp = NULL; > } > ret = vhost_user_blk_realize_connect(s, errp); > -} while (ret == -EPROTO && retries--); > +} while (ret < 0 && retries--); > > if (ret < 0) { > goto virtio_err; > -- > 2.33.1 >
Re: [PATCH 01/10] vhost-user-blk: reconnect on any error during realize
> > > > I see. I hadn't looked at the rest of the series yet because I ran out > > of time, but now that I'm skimming them, I see quite a few places that > > use non-EPROTO, but I wonder which of them actually should be > > reconnected. So far all I saw were presumably persistent errors where a > > retry won't help. Can you give me some examples? > > E.g. the particular case you mention earlier, -ECONNREFUSED, is not > unlikely to happen due to the vhost-user server restart for maintenance; > in this case retying looks like a reasonable thing to do, doesn't it? > Seems like a net-positive to me, expecially with the cleanups in the rest of the series, but I don't feel strongly. > Thanks, > Roman. >
Re: [PATCH 01/10] vhost-user-blk: reconnect on any error during realize
On Fri, Nov 12, 2021 at 12:37:59PM +0100, Kevin Wolf wrote: > Am 12.11.2021 um 08:39 hat Roman Kagan geschrieben: > > On Thu, Nov 11, 2021 at 06:52:30PM +0100, Kevin Wolf wrote: > > > Am 11.11.2021 um 16:33 hat Roman Kagan geschrieben: > > > > vhost-user-blk realize only attempts to reconnect if the previous > > > > connection attempt failed on "a problem with the connection and not an > > > > error related to the content (which would fail again the same way in the > > > > next attempt)". > > > > > > > > However this distinction is very subtle, and may be inadvertently broken > > > > if the code changes somewhere deep down the stack and a new error gets > > > > propagated up to here. > > > > > > > > OTOH now that the number of reconnection attempts is limited it seems > > > > harmless to try reconnecting on any error. > > > > > > > > So relax the condition of whether to retry connecting to check for any > > > > error. > > > > > > > > This patch amends a527e312b5 "vhost-user-blk: Implement reconnection > > > > during realize". > > > > > > > > Signed-off-by: Roman Kagan > > > > > > It results in less than perfect error messages. With a modified export > > > that just crashes qemu-storage-daemon during get_features, I get: > > > > > > qemu-system-x86_64: -device vhost-user-blk-pci,chardev=c: Failed to read > > > msg header. Read 0 instead of 12. Original request 1. > > > qemu-system-x86_64: -device vhost-user-blk-pci,chardev=c: Reconnecting > > > after error: vhost_backend_init failed: Protocol error > > > qemu-system-x86_64: -device vhost-user-blk-pci,chardev=c: Reconnecting > > > after error: Failed to connect to '/tmp/vsock': Connection refused > > > qemu-system-x86_64: -device vhost-user-blk-pci,chardev=c: Reconnecting > > > after error: Failed to connect to '/tmp/vsock': Connection refused > > > qemu-system-x86_64: -device vhost-user-blk-pci,chardev=c: Failed to > > > connect to '/tmp/vsock': Connection refused > > > > This patch doesn't change any error messages. Which ones specifically > > became less than perfect as a result of this patch? > > But it adds error messages (for each retry), which are different from > the first error message. As I said this is not the end of the world, but > maybe a bit more confusing. Ah, now I see what you mean: it adds reconnection attempts where there used to be immediate failure return, so now every failed attempt logs its own message. > > > I guess this might be tolerable. On the other hand, the patch doesn't > > > really fix anything either, but just gets rid of possible subtleties. > > > > The remaining patches in the series make other errors beside -EPROTO > > propagate up to this point, and some (most) of them are retryable. This > > was the reason to include this patch at the beginning of the series (I > > guess I should've mentioned that in the patch log). > > I see. I hadn't looked at the rest of the series yet because I ran out > of time, but now that I'm skimming them, I see quite a few places that > use non-EPROTO, but I wonder which of them actually should be > reconnected. So far all I saw were presumably persistent errors where a > retry won't help. Can you give me some examples? E.g. the particular case you mention earlier, -ECONNREFUSED, is not unlikely to happen due to the vhost-user server restart for maintenance; in this case retying looks like a reasonable thing to do, doesn't it? Thanks, Roman.
Re: [PATCH 01/10] vhost-user-blk: reconnect on any error during realize
Am 12.11.2021 um 08:39 hat Roman Kagan geschrieben: > On Thu, Nov 11, 2021 at 06:52:30PM +0100, Kevin Wolf wrote: > > Am 11.11.2021 um 16:33 hat Roman Kagan geschrieben: > > > vhost-user-blk realize only attempts to reconnect if the previous > > > connection attempt failed on "a problem with the connection and not an > > > error related to the content (which would fail again the same way in the > > > next attempt)". > > > > > > However this distinction is very subtle, and may be inadvertently broken > > > if the code changes somewhere deep down the stack and a new error gets > > > propagated up to here. > > > > > > OTOH now that the number of reconnection attempts is limited it seems > > > harmless to try reconnecting on any error. > > > > > > So relax the condition of whether to retry connecting to check for any > > > error. > > > > > > This patch amends a527e312b5 "vhost-user-blk: Implement reconnection > > > during realize". > > > > > > Signed-off-by: Roman Kagan > > > > It results in less than perfect error messages. With a modified export > > that just crashes qemu-storage-daemon during get_features, I get: > > > > qemu-system-x86_64: -device vhost-user-blk-pci,chardev=c: Failed to read > > msg header. Read 0 instead of 12. Original request 1. > > qemu-system-x86_64: -device vhost-user-blk-pci,chardev=c: Reconnecting > > after error: vhost_backend_init failed: Protocol error > > qemu-system-x86_64: -device vhost-user-blk-pci,chardev=c: Reconnecting > > after error: Failed to connect to '/tmp/vsock': Connection refused > > qemu-system-x86_64: -device vhost-user-blk-pci,chardev=c: Reconnecting > > after error: Failed to connect to '/tmp/vsock': Connection refused > > qemu-system-x86_64: -device vhost-user-blk-pci,chardev=c: Failed to connect > > to '/tmp/vsock': Connection refused > > This patch doesn't change any error messages. Which ones specifically > became less than perfect as a result of this patch? But it adds error messages (for each retry), which are different from the first error message. As I said this is not the end of the world, but maybe a bit more confusing. > > I guess this might be tolerable. On the other hand, the patch doesn't > > really fix anything either, but just gets rid of possible subtleties. > > The remaining patches in the series make other errors beside -EPROTO > propagate up to this point, and some (most) of them are retryable. This > was the reason to include this patch at the beginning of the series (I > guess I should've mentioned that in the patch log). I see. I hadn't looked at the rest of the series yet because I ran out of time, but now that I'm skimming them, I see quite a few places that use non-EPROTO, but I wonder which of them actually should be reconnected. So far all I saw were presumably persistent errors where a retry won't help. Can you give me some examples? Kevin
Re: [PATCH 01/10] vhost-user-blk: reconnect on any error during realize
On Thu, Nov 11, 2021 at 06:52:30PM +0100, Kevin Wolf wrote: > Am 11.11.2021 um 16:33 hat Roman Kagan geschrieben: > > vhost-user-blk realize only attempts to reconnect if the previous > > connection attempt failed on "a problem with the connection and not an > > error related to the content (which would fail again the same way in the > > next attempt)". > > > > However this distinction is very subtle, and may be inadvertently broken > > if the code changes somewhere deep down the stack and a new error gets > > propagated up to here. > > > > OTOH now that the number of reconnection attempts is limited it seems > > harmless to try reconnecting on any error. > > > > So relax the condition of whether to retry connecting to check for any > > error. > > > > This patch amends a527e312b5 "vhost-user-blk: Implement reconnection > > during realize". > > > > Signed-off-by: Roman Kagan > > It results in less than perfect error messages. With a modified export > that just crashes qemu-storage-daemon during get_features, I get: > > qemu-system-x86_64: -device vhost-user-blk-pci,chardev=c: Failed to read msg > header. Read 0 instead of 12. Original request 1. > qemu-system-x86_64: -device vhost-user-blk-pci,chardev=c: Reconnecting after > error: vhost_backend_init failed: Protocol error > qemu-system-x86_64: -device vhost-user-blk-pci,chardev=c: Reconnecting after > error: Failed to connect to '/tmp/vsock': Connection refused > qemu-system-x86_64: -device vhost-user-blk-pci,chardev=c: Reconnecting after > error: Failed to connect to '/tmp/vsock': Connection refused > qemu-system-x86_64: -device vhost-user-blk-pci,chardev=c: Failed to connect > to '/tmp/vsock': Connection refused This patch doesn't change any error messages. Which ones specifically became less than perfect as a result of this patch? > I guess this might be tolerable. On the other hand, the patch doesn't > really fix anything either, but just gets rid of possible subtleties. The remaining patches in the series make other errors beside -EPROTO propagate up to this point, and some (most) of them are retryable. This was the reason to include this patch at the beginning of the series (I guess I should've mentioned that in the patch log). Thanks, Roman.
Re: [PATCH 01/10] vhost-user-blk: reconnect on any error during realize
Am 11.11.2021 um 16:33 hat Roman Kagan geschrieben: > vhost-user-blk realize only attempts to reconnect if the previous > connection attempt failed on "a problem with the connection and not an > error related to the content (which would fail again the same way in the > next attempt)". > > However this distinction is very subtle, and may be inadvertently broken > if the code changes somewhere deep down the stack and a new error gets > propagated up to here. > > OTOH now that the number of reconnection attempts is limited it seems > harmless to try reconnecting on any error. > > So relax the condition of whether to retry connecting to check for any > error. > > This patch amends a527e312b5 "vhost-user-blk: Implement reconnection > during realize". > > Signed-off-by: Roman Kagan It results in less than perfect error messages. With a modified export that just crashes qemu-storage-daemon during get_features, I get: qemu-system-x86_64: -device vhost-user-blk-pci,chardev=c: Failed to read msg header. Read 0 instead of 12. Original request 1. qemu-system-x86_64: -device vhost-user-blk-pci,chardev=c: Reconnecting after error: vhost_backend_init failed: Protocol error qemu-system-x86_64: -device vhost-user-blk-pci,chardev=c: Reconnecting after error: Failed to connect to '/tmp/vsock': Connection refused qemu-system-x86_64: -device vhost-user-blk-pci,chardev=c: Reconnecting after error: Failed to connect to '/tmp/vsock': Connection refused qemu-system-x86_64: -device vhost-user-blk-pci,chardev=c: Failed to connect to '/tmp/vsock': Connection refused I guess this might be tolerable. On the other hand, the patch doesn't really fix anything either, but just gets rid of possible subtleties. Kevin