Error propagation between the generic vhost code and the specific backends is not quite consistent: some places follow the "return -1 and set errno" convention, while others assume "return negated errno". Furthermore, not enough care is taken to avoid clobbering errno.
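To make the hazard concrete, here is a minimal standalone sketch of the two conventions. It is not code from this series: backend_op_legacy(), backend_op() and cleanup() are hypothetical stand-ins, and cleanup() resets errno explicitly only to model any intervening call that modifies it.

```c
#include <errno.h>

/* Hypothetical backend call using the "return -1 and set errno" convention. */
static int backend_op_legacy(void)
{
    errno = ECONNRESET;
    return -1;
}

/* Hypothetical cleanup step; stands in for any call that modifies errno. */
static void cleanup(void)
{
    errno = 0;
}

/* Fragile: errno is read only after cleanup() has had a chance to change it. */
static int caller_fragile(void)
{
    if (backend_op_legacy() < 0) {
        cleanup();
        return -errno;  /* errno is 0 by now, so the failure looks like success */
    }
    return 0;
}

/* The same hypothetical operation, returning the negated errno directly. */
static int backend_op(void)
{
    return -ECONNRESET;
}

/* Robust: the error code travels in the return value, not in errno. */
static int caller_robust(void)
{
    int ret = backend_op();

    if (ret < 0) {
        cleanup();      /* cannot affect the value already held in ret */
        return ret;
    }
    return 0;
}
```

In this sketch caller_fragile() reports success even though the backend call failed, while caller_robust() still returns -ECONNRESET after the same cleanup step.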
As a result, on certain code paths the errno resulting from a failure may get overwritten by another function call, and the resulting zero errno, indicating success, is then propagated up the stack, so the failure is lost. In particular, we've seen errors in the communication with a vhost-user-blk slave not trigger an immediate connection drop and reconnection, leaving it in a broken state.

Rework error propagation to always return negated errno on errors and correctly pass it up the stack.

Roman Kagan (10):
  vhost-user-blk: reconnect on any error during realize
  chardev/char-socket: tcp_chr_recv: don't clobber errno
  chardev/char-socket: tcp_chr_sync_read: don't clobber errno
  chardev/char-fe: don't allow EAGAIN from blocking read
  vhost-backend: avoid overflow on memslots_limit
  vhost-backend: stick to -errno error return convention
  vhost-vdpa: stick to -errno error return convention
  vhost-user: stick to -errno error return convention
  vhost: stick to -errno error return convention
  vhost-user-blk: propagate error return from generic vhost

 chardev/char-fe.c         |   7 +-
 chardev/char-socket.c     |  17 +-
 hw/block/vhost-user-blk.c |   4 +-
 hw/virtio/vhost-backend.c |   4 +-
 hw/virtio/vhost-user.c    | 401 +++++++++++++++++++++-----------------
 hw/virtio/vhost-vdpa.c    |  37 ++--
 hw/virtio/vhost.c         |  98 +++++-----
 7 files changed, 307 insertions(+), 261 deletions(-)

-- 
2.33.1