Re: [Qemu-devel] [PATCH] vhost: fix crash on virtio_error while device stop
On 14.12.2017 17:31, Ilya Maximets wrote:
> One update for the testing scenario:
>
> No need to kill OVS. The issue is reproducible with a simple 'del-port'
> and 'add-port'. The virtio driver in the guest can crash on both
> operations. In my case it most often crashes on 'add-port' after a
> deletion.
>
> Hi Maxime,
> I had already seen the patches below and the original Linux kernel
> virtio issue, I just hadn't had enough time to test them.
> Now I have tested the patches below and they fix the virtio driver
> crash. Thanks for the suggestion.
>
> Michael,
> I tested "[PATCH] virtio_error: don't invoke status callbacks"
> and it fixes the QEMU crash in the case of a broken guest index.
> Thanks.
>
> Best regards, Ilya Maximets.
>
> P.S. Previously I mentioned that I can not reproduce the virtio driver
> crash with "[PATCH] virtio_error: don't invoke status callbacks"

It should be "[PATCH dontapply] virtio: rework set_status callbacks".
Sorry again.

> applied. I was wrong. I can reproduce it now. The system was
> misconfigured. Sorry.
>
>
> On 14.12.2017 12:01, Maxime Coquelin wrote:
>> Hi Ilya,
>>
>> On 12/14/2017 08:06 AM, Ilya Maximets wrote:
>>> On 13.12.2017 22:48, Michael S. Tsirkin wrote:
>>>> On Wed, Dec 13, 2017 at 04:45:20PM +0300, Ilya Maximets wrote:
>>>>>>> That looks very strange. Some of the functions get 'old_status',
>>>>>>> others the 'new_status'. I'm a bit confused.
>>>>>>
>>>>>> OK, fair enough. Fixed - let's pass the old status everywhere;
>>>>>> users that need the new one can get it from the vdev.
>>>>>>
>>>>>>> And it's not functional in its current state:
>>>>>>>
>>>>>>> hw/net/virtio-net.c:264:28: error: ‘status’ undeclared
>>>>>>
>>>>>> Fixed too. New version below.
>>>>>
>>>>> This doesn't fix the segmentation fault.
>>>>
>>>> Hmm you are right. Looking into it.
>>>>
>>>>> I have exactly the same crash stacktrace:
>>>>>
>>>>> #0 vhost_memory_unmap hw/virtio/vhost.c:446
>>>>> #1 vhost_virtqueue_stop hw/virtio/vhost.c:1155
>>>>> #2 vhost_dev_stop hw/virtio/vhost.c:1594
>>>>> #3 vhost_net_stop_one hw/net/vhost_net.c:289
>>>>> #4 vhost_net_stop hw/net/vhost_net.c:368
>>>>> #5 virtio_net_vhost_status (old_status=15 '\017', n=0x5625f3901100)
>>>>>    at hw/net/virtio-net.c:180
>>>>> #6 virtio_net_set_status (vdev=0x5625f3901100,
>>>>>    old_status=<optimized out>) at hw/net/virtio-net.c:254
>>>>> #7 virtio_set_status (vdev=vdev@entry=0x5625f3901100,
>>>>>    val=<optimized out>) at hw/virtio/virtio.c:1152
>>>>> #8 virtio_error (vdev=0x5625f3901100, fmt=fmt@entry=0x5625f014f688
>>>>>    "Guest says index %u is available") at hw/virtio/virtio.c:2460
>>>>
>>>> BTW what is causing this? Why is the guest avail index corrupted?
>>>
>>> My testing environment for the issue:
>>>
>>> * QEMU 2.10.1
>>
>> Could you try to backport the patches below and try again killing OVS?
>>
>> commit 2ae39a113af311cb56a0c35b7f212dafcef15303
>> Author: Maxime Coquelin
>> Date:   Thu Nov 16 19:48:35 2017 +0100
>>
>>     vhost: restore avail index from vring used index on disconnection
>>
>>     vhost_virtqueue_stop() gets the avail index value from the backend,
>>     except if the backend is not responding.
>>
>>     It happens when the backend crashes, and in this case the internal
>>     state of the virtio queue is inconsistent, making packets corrupt
>>     the vring state.
>>
>>     With a Linux guest, it results in the following error messages on
>>     backend reconnection:
>>
>>     [   22.444905] virtio_net virtio0: output.0:id 0 is not a head!
>>     [   22.446746] net enp0s3: Unexpected TXQ (0) queue failure: -5
>>     [   22.476360] net enp0s3: Unexpected TXQ (0) queue failure: -5
>>
>>     Fixes: 283e2c2adcb8 ("net: virtio-net discards TX data after link down")
>>     Cc: qemu-sta...@nongnu.org
>>     Signed-off-by: Maxime Coquelin
>>     Reviewed-by: Michael S. Tsirkin
>>     Signed-off-by: Michael S. Tsirkin
>>
>> commit 2d4ba6cc741df15df6fbb4feaa706a02e103083a
>> Author: Maxime Coquelin
>> Date:   Thu Nov 16 19:48:34 2017 +0100
>>
>>     virtio: Add queue interface to restore avail index from vring used index
>>
>>     In case of backend crash, it is not possible to restore the
>>     internal avail index from the backend value, as the
>>     vhost_get_vring_base callback fails.
>>
>>     This patch provides a new interface to restore the internal avail
>>     index from the vring used index, as done by some vhost-user
>>     backends on reconnection.
>>
>>     Signed-off-by: Maxime Coquelin
>>     Reviewed-by: Michael S. Tsirkin
>>     Signed-off-by: Michael S. Tsirkin
>>
>> Cheers,
>> Maxime
Re: [Qemu-devel] [PATCH] vhost: fix crash on virtio_error while device stop
One update for the testing scenario:

No need to kill OVS. The issue is reproducible with a simple 'del-port'
and 'add-port'. The virtio driver in the guest can crash on both
operations. In my case it most often crashes on 'add-port' after a
deletion.

Hi Maxime,
I had already seen the patches below and the original Linux kernel
virtio issue, I just hadn't had enough time to test them.
Now I have tested the patches below and they fix the virtio driver
crash. Thanks for the suggestion.

Michael,
I tested "[PATCH] virtio_error: don't invoke status callbacks"
and it fixes the QEMU crash in the case of a broken guest index.
Thanks.

Best regards, Ilya Maximets.

P.S. Previously I mentioned that I can not reproduce the virtio driver
crash with "[PATCH] virtio_error: don't invoke status callbacks"
applied. I was wrong. I can reproduce it now. The system was
misconfigured. Sorry.

On 14.12.2017 12:01, Maxime Coquelin wrote:
> Hi Ilya,
>
> On 12/14/2017 08:06 AM, Ilya Maximets wrote:
>> On 13.12.2017 22:48, Michael S. Tsirkin wrote:
>>> On Wed, Dec 13, 2017 at 04:45:20PM +0300, Ilya Maximets wrote:
>>>>>> That looks very strange. Some of the functions get 'old_status',
>>>>>> others the 'new_status'. I'm a bit confused.
>>>>>
>>>>> OK, fair enough. Fixed - let's pass the old status everywhere;
>>>>> users that need the new one can get it from the vdev.
>>>>>
>>>>>> And it's not functional in its current state:
>>>>>>
>>>>>> hw/net/virtio-net.c:264:28: error: ‘status’ undeclared
>>>>>
>>>>> Fixed too. New version below.
>>>>
>>>> This doesn't fix the segmentation fault.
>>>
>>> Hmm you are right. Looking into it.
>>>
>>>> I have exactly the same crash stacktrace:
>>>>
>>>> #0 vhost_memory_unmap hw/virtio/vhost.c:446
>>>> #1 vhost_virtqueue_stop hw/virtio/vhost.c:1155
>>>> #2 vhost_dev_stop hw/virtio/vhost.c:1594
>>>> #3 vhost_net_stop_one hw/net/vhost_net.c:289
>>>> #4 vhost_net_stop hw/net/vhost_net.c:368
>>>> #5 virtio_net_vhost_status (old_status=15 '\017', n=0x5625f3901100)
>>>>    at hw/net/virtio-net.c:180
>>>> #6 virtio_net_set_status (vdev=0x5625f3901100,
>>>>    old_status=<optimized out>) at hw/net/virtio-net.c:254
>>>> #7 virtio_set_status (vdev=vdev@entry=0x5625f3901100,
>>>>    val=<optimized out>) at hw/virtio/virtio.c:1152
>>>> #8 virtio_error (vdev=0x5625f3901100, fmt=fmt@entry=0x5625f014f688
>>>>    "Guest says index %u is available") at hw/virtio/virtio.c:2460
>>>
>>> BTW what is causing this? Why is the guest avail index corrupted?
>>
>> My testing environment for the issue:
>>
>> * QEMU 2.10.1
>
> Could you try to backport the patches below and try again killing OVS?
>
> commit 2ae39a113af311cb56a0c35b7f212dafcef15303
> Author: Maxime Coquelin
> Date:   Thu Nov 16 19:48:35 2017 +0100
>
>     vhost: restore avail index from vring used index on disconnection
>
>     vhost_virtqueue_stop() gets the avail index value from the backend,
>     except if the backend is not responding.
>
>     It happens when the backend crashes, and in this case the internal
>     state of the virtio queue is inconsistent, making packets corrupt
>     the vring state.
>
>     With a Linux guest, it results in the following error messages on
>     backend reconnection:
>
>     [   22.444905] virtio_net virtio0: output.0:id 0 is not a head!
>     [   22.446746] net enp0s3: Unexpected TXQ (0) queue failure: -5
>     [   22.476360] net enp0s3: Unexpected TXQ (0) queue failure: -5
>
>     Fixes: 283e2c2adcb8 ("net: virtio-net discards TX data after link down")
>     Cc: qemu-sta...@nongnu.org
>     Signed-off-by: Maxime Coquelin
>     Reviewed-by: Michael S. Tsirkin
>     Signed-off-by: Michael S. Tsirkin
>
> commit 2d4ba6cc741df15df6fbb4feaa706a02e103083a
> Author: Maxime Coquelin
> Date:   Thu Nov 16 19:48:34 2017 +0100
>
>     virtio: Add queue interface to restore avail index from vring used index
>
>     In case of backend crash, it is not possible to restore the
>     internal avail index from the backend value, as the
>     vhost_get_vring_base callback fails.
>
>     This patch provides a new interface to restore the internal avail
>     index from the vring used index, as done by some vhost-user
>     backends on reconnection.
>
>     Signed-off-by: Maxime Coquelin
>     Reviewed-by: Michael S. Tsirkin
>     Signed-off-by: Michael S. Tsirkin
>
> Cheers,
> Maxime
Re: [Qemu-devel] [PATCH] vhost: fix crash on virtio_error while device stop
Hi Ilya,

On 12/14/2017 08:06 AM, Ilya Maximets wrote:
> On 13.12.2017 22:48, Michael S. Tsirkin wrote:
>> On Wed, Dec 13, 2017 at 04:45:20PM +0300, Ilya Maximets wrote:
>>>>> That looks very strange. Some of the functions get 'old_status',
>>>>> others the 'new_status'. I'm a bit confused.
>>>>
>>>> OK, fair enough. Fixed - let's pass the old status everywhere;
>>>> users that need the new one can get it from the vdev.
>>>>
>>>>> And it's not functional in its current state:
>>>>>
>>>>> hw/net/virtio-net.c:264:28: error: ‘status’ undeclared
>>>>
>>>> Fixed too. New version below.
>>>
>>> This doesn't fix the segmentation fault.
>>
>> Hmm you are right. Looking into it.
>>
>>> I have exactly the same crash stacktrace:
>>>
>>> #0 vhost_memory_unmap hw/virtio/vhost.c:446
>>> #1 vhost_virtqueue_stop hw/virtio/vhost.c:1155
>>> #2 vhost_dev_stop hw/virtio/vhost.c:1594
>>> #3 vhost_net_stop_one hw/net/vhost_net.c:289
>>> #4 vhost_net_stop hw/net/vhost_net.c:368
>>> #5 virtio_net_vhost_status (old_status=15 '\017', n=0x5625f3901100)
>>>    at hw/net/virtio-net.c:180
>>> #6 virtio_net_set_status (vdev=0x5625f3901100,
>>>    old_status=<optimized out>) at hw/net/virtio-net.c:254
>>> #7 virtio_set_status (vdev=vdev@entry=0x5625f3901100,
>>>    val=<optimized out>) at hw/virtio/virtio.c:1152
>>> #8 virtio_error (vdev=0x5625f3901100, fmt=fmt@entry=0x5625f014f688
>>>    "Guest says index %u is available") at hw/virtio/virtio.c:2460
>>
>> BTW what is causing this? Why is the guest avail index corrupted?
>
> My testing environment for the issue:
>
> * QEMU 2.10.1

Could you try to backport the patches below and try again killing OVS?

commit 2ae39a113af311cb56a0c35b7f212dafcef15303
Author: Maxime Coquelin
Date:   Thu Nov 16 19:48:35 2017 +0100

    vhost: restore avail index from vring used index on disconnection

    vhost_virtqueue_stop() gets the avail index value from the backend,
    except if the backend is not responding.

    It happens when the backend crashes, and in this case the internal
    state of the virtio queue is inconsistent, making packets corrupt
    the vring state.

    With a Linux guest, it results in the following error messages on
    backend reconnection:

    [   22.444905] virtio_net virtio0: output.0:id 0 is not a head!
    [   22.446746] net enp0s3: Unexpected TXQ (0) queue failure: -5
    [   22.476360] net enp0s3: Unexpected TXQ (0) queue failure: -5

    Fixes: 283e2c2adcb8 ("net: virtio-net discards TX data after link down")
    Cc: qemu-sta...@nongnu.org
    Signed-off-by: Maxime Coquelin
    Reviewed-by: Michael S. Tsirkin
    Signed-off-by: Michael S. Tsirkin

commit 2d4ba6cc741df15df6fbb4feaa706a02e103083a
Author: Maxime Coquelin
Date:   Thu Nov 16 19:48:34 2017 +0100

    virtio: Add queue interface to restore avail index from vring used index

    In case of backend crash, it is not possible to restore the internal
    avail index from the backend value, as the vhost_get_vring_base
    callback fails.

    This patch provides a new interface to restore the internal avail
    index from the vring used index, as done by some vhost-user backends
    on reconnection.

    Signed-off-by: Maxime Coquelin
    Reviewed-by: Michael S. Tsirkin
    Signed-off-by: Michael S. Tsirkin

Cheers,
Maxime
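The two commits above hinge on one recovery idea: when `vhost_get_vring_base` cannot be served because the backend died, the internal avail index can be re-seeded from the used index the device last wrote into guest memory, since every descriptor up to that point has already been consumed. A toy model of that idea (the structure and function names below are illustrative only, not the actual QEMU types):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical miniature of the recovery described in the commits above. */

struct used_ring {
    uint16_t idx;               /* device write position, lives in guest RAM */
};

struct vq_shadow {
    uint16_t last_avail_idx;    /* QEMU-internal shadow of the avail index */
    struct used_ring *used;
};

/* On backend crash, GET_VRING_BASE fails and last_avail_idx is stale.
 * The used index is still valid in guest memory, so restart from it. */
void restore_avail_from_used(struct vq_shadow *vq)
{
    vq->last_avail_idx = vq->used->idx;
}
```

The point of restarting from the used index rather than zero is that it is the only value both sides still agree on after the backend vanished mid-operation.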
Re: [Qemu-devel] [PATCH] vhost: fix crash on virtio_error while device stop
On 13.12.2017 22:48, Michael S. Tsirkin wrote:
> On Wed, Dec 13, 2017 at 04:45:20PM +0300, Ilya Maximets wrote:
>>>> That looks very strange. Some of the functions get 'old_status',
>>>> others the 'new_status'. I'm a bit confused.
>>>
>>> OK, fair enough. Fixed - let's pass the old status everywhere;
>>> users that need the new one can get it from the vdev.
>>>
>>>> And it's not functional in its current state:
>>>>
>>>> hw/net/virtio-net.c:264:28: error: ‘status’ undeclared
>>>
>>> Fixed too. New version below.
>>
>> This doesn't fix the segmentation fault.
>
> Hmm you are right. Looking into it.
>
>> I have exactly the same crash stacktrace:
>>
>> #0 vhost_memory_unmap hw/virtio/vhost.c:446
>> #1 vhost_virtqueue_stop hw/virtio/vhost.c:1155
>> #2 vhost_dev_stop hw/virtio/vhost.c:1594
>> #3 vhost_net_stop_one hw/net/vhost_net.c:289
>> #4 vhost_net_stop hw/net/vhost_net.c:368
>> #5 virtio_net_vhost_status (old_status=15 '\017', n=0x5625f3901100)
>>    at hw/net/virtio-net.c:180
>> #6 virtio_net_set_status (vdev=0x5625f3901100,
>>    old_status=<optimized out>) at hw/net/virtio-net.c:254
>> #7 virtio_set_status (vdev=vdev@entry=0x5625f3901100,
>>    val=<optimized out>) at hw/virtio/virtio.c:1152
>> #8 virtio_error (vdev=0x5625f3901100, fmt=fmt@entry=0x5625f014f688
>>    "Guest says index %u is available") at hw/virtio/virtio.c:2460
>
> BTW what is causing this? Why is the guest avail index corrupted?

My testing environment for the issue:

* QEMU 2.10.1
* testpmd from a bit outdated DPDK 16.07.0 in the guest, with
  uio_pci_generic
* OVS 2.8 with DPDK 17.05.2 on the host
* 2 vhost-user ports in the VM in server mode
* testpmd just forwards packets from one port to another with
  --forward-mode=mac

testpmd with the virtio-net driver sometimes crashes after killing OVS
while some heavy traffic flows. After restarting OVS and stopping it
again, QEMU crashes on vhost disconnect and a virtqueue_get_head()
failure. So, the sequence is as follows:

1. Start OVS, start QEMU, start testpmd, start an external packet
   generator.
2. pkill -11 ovs-vswitchd
3. If testpmd did not crash in the guest, go to step #1.
4. Start OVS (testpmd with the virtio driver is still in down state).
5. pkill -11 ovs-vswitchd
6. Observe the QEMU crash.

I suspect that the virtio-net driver from DPDK 16.07.0 corrupts the
vrings just before the crash at step #2. I haven't actually tried to
investigate the virtio driver crash, because it's a bit out of my
scope, I don't have enough time, and the driver is slightly outdated.
But the stability of QEMU itself is a really important thing.

One interesting thing is that I can not reproduce the virtio driver
crash with "virtio: rework set_status callbacks" applied. I had to
break the vrings manually to reproduce the original nested call of
vhost_net_stop().

>> #9 virtqueue_get_head at hw/virtio/virtio.c:543
>> #10 virtqueue_drop_all hw/virtio/virtio.c:984
>> #11 virtio_net_drop_tx_queue_data hw/net/virtio-net.c:240
>> #12 virtio_bus_set_host_notifier at hw/virtio/virtio-bus.c:297
>> #13 vhost_dev_disable_notifiers at hw/virtio/vhost.c:1431
>> #14 vhost_net_stop_one at hw/net/vhost_net.c:290
>> #15 vhost_net_stop at hw/net/vhost_net.c:368
>> #16 virtio_net_vhost_status (old_status=15 '\017', n=0x5625f3901100)
>>    at hw/net/virtio-net.c:180
>> #17 virtio_net_set_status (vdev=0x5625f3901100,
>>    old_status=<optimized out>) at hw/net/virtio-net.c:254
>> #18 qmp_set_link at net/net.c:1430
>> #19 chr_closed_bh at net/vhost-user.c:214
>> #20 aio_bh_call at util/async.c:90
>> #21 aio_bh_poll at util/async.c:118
>> #22 aio_dispatch at util/aio-posix.c:429
>> #23 aio_ctx_dispatch at util/async.c:261
>> #24 g_main_context_dispatch () from /lib64/libglib-2.0.so.0
>> #25 glib_pollfds_poll () at util/main-loop.c:213
>> #26 os_host_main_loop_wait at util/main-loop.c:261
>> #27 main_loop_wait at util/main-loop.c:515
>> #28 main_loop () at vl.c:1917
>> #29 main
>>
>> Actually the logic doesn't change. In function
>> virtio_net_vhost_status():
>>
>> -    if ((virtio_net_started(n, status) && !nc->peer->link_down) ==
>> +    if ((virtio_net_started(n, vdev->status) && !nc->peer->link_down) ==
>>          !!n->vhost_started) {
>>          return;
>>      }
>>
>> Previously the new 'status' was checked, and now the new
>> 'vdev->status' is checked. It's the same condition, and it doesn't
>> work, because the vhost_started flag is still set to 1.
>> Anyway, nc->peer->link_down is true in our case, so there is no
>> difference if we change the vdev->status.
>>
>>>> Signed-off-by: Michael S. Tsirkin
>>>
>>> Still completely untested, sorry about that - hope you can help here.
>>>
>>> diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
>>> index 098bdaa..f5d0ee1 100644
>>> --- a/include/hw/virtio/virtio.h
>>> +++ b/include/hw/virtio/virtio.h
>>> @@ -115,7 +115,7 @@ typedef struct VirtioDeviceClass {
>>>      void (*get_config)(VirtIODevice *vdev, uint8_t *config);
>>>      void (*set_config)(VirtIODevice *vdev, const uint8_t *config);
>>>      void (*reset)(VirtIODevice *vdev);
>>> -    void (*set_status)(VirtIODevice *vdev, uint8_t val);
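The failure mode Ilya describes above — the `vhost_started` check cannot stop the re-entry because the flag is only cleared after `vhost_net_stop()` returns — can be sketched as a toy model in plain C (hypothetical names, not the QEMU code):

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the re-entrancy: virtio_error() fires while the stop
 * routine is still running, calls back into the status handler, and
 * the started flag -- still set -- lets the stop run a second time. */

static bool vhost_started = true;
static int stop_calls;

static void set_status(void);   /* forward: the error path calls back in */

static void net_stop(void)
{
    stop_calls++;
    if (stop_calls == 1) {
        set_status();           /* stand-in for virtio_error() during stop */
    }
}

static void set_status(void)
{
    if (!vhost_started) {
        return;                 /* the guard -- but the flag is still true */
    }
    net_stop();
    vhost_started = false;      /* cleared only after stop returns */
}
```

Running `set_status()` once drives `net_stop()` twice, which is exactly the nested `vhost_net_stop()` visible at frames #4 and #15 of the backtrace.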
Re: [Qemu-devel] [PATCH] vhost: fix crash on virtio_error while device stop
On Wed, Dec 13, 2017 at 04:45:20PM +0300, Ilya Maximets wrote:
> >> That looks very strange. Some of the functions get 'old_status',
> >> others the 'new_status'. I'm a bit confused.
> >
> > OK, fair enough. Fixed - let's pass the old status everywhere;
> > users that need the new one can get it from the vdev.
> >
> >> And it's not functional in its current state:
> >>
> >> hw/net/virtio-net.c:264:28: error: ‘status’ undeclared
> >
> > Fixed too. New version below.
>
> This doesn't fix the segmentation fault.

Hmm you are right. Looking into it.

> I have exactly the same crash stacktrace:
>
> #0 vhost_memory_unmap hw/virtio/vhost.c:446
> #1 vhost_virtqueue_stop hw/virtio/vhost.c:1155
> #2 vhost_dev_stop hw/virtio/vhost.c:1594
> #3 vhost_net_stop_one hw/net/vhost_net.c:289
> #4 vhost_net_stop hw/net/vhost_net.c:368
> #5 virtio_net_vhost_status (old_status=15 '\017', n=0x5625f3901100)
>    at hw/net/virtio-net.c:180
> #6 virtio_net_set_status (vdev=0x5625f3901100,
>    old_status=<optimized out>) at hw/net/virtio-net.c:254
> #7 virtio_set_status (vdev=vdev@entry=0x5625f3901100,
>    val=<optimized out>) at hw/virtio/virtio.c:1152
> #8 virtio_error (vdev=0x5625f3901100, fmt=fmt@entry=0x5625f014f688
>    "Guest says index %u is available") at hw/virtio/virtio.c:2460

BTW what is causing this? Why is the guest avail index corrupted?

> #9 virtqueue_get_head at hw/virtio/virtio.c:543
> #10 virtqueue_drop_all hw/virtio/virtio.c:984
> #11 virtio_net_drop_tx_queue_data hw/net/virtio-net.c:240
> #12 virtio_bus_set_host_notifier at hw/virtio/virtio-bus.c:297
> #13 vhost_dev_disable_notifiers at hw/virtio/vhost.c:1431
> #14 vhost_net_stop_one at hw/net/vhost_net.c:290
> #15 vhost_net_stop at hw/net/vhost_net.c:368
> #16 virtio_net_vhost_status (old_status=15 '\017', n=0x5625f3901100)
>    at hw/net/virtio-net.c:180
> #17 virtio_net_set_status (vdev=0x5625f3901100,
>    old_status=<optimized out>) at hw/net/virtio-net.c:254
> #18 qmp_set_link at net/net.c:1430
> #19 chr_closed_bh at net/vhost-user.c:214
> #20 aio_bh_call at util/async.c:90
> #21 aio_bh_poll at util/async.c:118
> #22 aio_dispatch at util/aio-posix.c:429
> #23 aio_ctx_dispatch at util/async.c:261
> #24 g_main_context_dispatch () from /lib64/libglib-2.0.so.0
> #25 glib_pollfds_poll () at util/main-loop.c:213
> #26 os_host_main_loop_wait at util/main-loop.c:261
> #27 main_loop_wait at util/main-loop.c:515
> #28 main_loop () at vl.c:1917
> #29 main
>
> Actually the logic doesn't change. In function
> virtio_net_vhost_status():
>
> -    if ((virtio_net_started(n, status) && !nc->peer->link_down) ==
> +    if ((virtio_net_started(n, vdev->status) && !nc->peer->link_down) ==
>          !!n->vhost_started) {
>          return;
>      }
>
> Previously the new 'status' was checked, and now the new
> 'vdev->status' is checked. It's the same condition, and it doesn't
> work, because the vhost_started flag is still set to 1.
> Anyway, nc->peer->link_down is true in our case, so there is no
> difference if we change the vdev->status.
>
> >>> Signed-off-by: Michael S. Tsirkin
> >
> > Still completely untested, sorry about that - hope you can help here.
> >
> > diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
> > index 098bdaa..f5d0ee1 100644
> > --- a/include/hw/virtio/virtio.h
> > +++ b/include/hw/virtio/virtio.h
> > @@ -115,7 +115,7 @@ typedef struct VirtioDeviceClass {
> >      void (*get_config)(VirtIODevice *vdev, uint8_t *config);
> >      void (*set_config)(VirtIODevice *vdev, const uint8_t *config);
> >      void (*reset)(VirtIODevice *vdev);
> > -    void (*set_status)(VirtIODevice *vdev, uint8_t val);
> > +    void (*set_status)(VirtIODevice *vdev, uint8_t old_status);
> >      /* For transitional devices, this is a bitmap of features
> >       * that are only exposed on the legacy interface but not
> >       * the modern one.
> > diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
> > index 05d1440..b8b07ba 100644
> > --- a/hw/block/virtio-blk.c
> > +++ b/hw/block/virtio-blk.c
> > @@ -814,15 +814,15 @@ static uint64_t virtio_blk_get_features(VirtIODevice *vdev, uint64_t features,
> >      return features;
> >  }
> >
> > -static void virtio_blk_set_status(VirtIODevice *vdev, uint8_t status)
> > +static void virtio_blk_set_status(VirtIODevice *vdev, uint8_t old_status)
> >  {
> >      VirtIOBlock *s = VIRTIO_BLK(vdev);
> >
> > -    if (!(status & (VIRTIO_CONFIG_S_DRIVER | VIRTIO_CONFIG_S_DRIVER_OK))) {
> > +    if (!(vdev->status & (VIRTIO_CONFIG_S_DRIVER | VIRTIO_CONFIG_S_DRIVER_OK))) {
> >          assert(!s->dataplane_started);
> >      }
> >
> > -    if (!(status & VIRTIO_CONFIG_S_DRIVER_OK)) {
> > +    if (!(vdev->status & VIRTIO_CONFIG_S_DRIVER_OK)) {
> >          return;
> >      }
> >
> > diff --git a/hw/char/virtio-serial-bus.c b/hw/char/virtio-serial-bus.c
> > index 9470bd7..881b1ff 100644
> > --- a/hw/char/virtio-serial-bus.c
> > +++ b/hw/char/virtio-serial-bus.c
> > @@ -616,7 +616,7 @@ static void guest_reset(VirtIOSerial *vser)
> >      }
> >  }
Re: [Qemu-devel] [PATCH] vhost: fix crash on virtio_error while device stop
On 11.12.2017 07:35, Michael S. Tsirkin wrote:
> On Fri, Dec 08, 2017 at 05:54:18PM +0300, Ilya Maximets wrote:
>> On 07.12.2017 20:27, Michael S. Tsirkin wrote:
>>> On Thu, Dec 07, 2017 at 09:39:36AM +0300, Ilya Maximets wrote:
>>>> On 06.12.2017 19:45, Michael S. Tsirkin wrote:
>>>>> On Wed, Dec 06, 2017 at 04:06:18PM +0300, Ilya Maximets wrote:
>>>>>> In case a virtio error occurred after vhost_dev_close(), qemu
>>>>>> will crash in the nested cleanup while checking the IOMMU flag,
>>>>>> because dev->vdev is already set to zero and the resources are
>>>>>> already freed.
>>>>>>
>>>>>> Example:
>>>>>>
>>>>>> Program received signal SIGSEGV, Segmentation fault.
>>>>>> vhost_virtqueue_stop at hw/virtio/vhost.c:1155
>>>>>>
>>>>>> #0 vhost_virtqueue_stop at hw/virtio/vhost.c:1155
>>>>>> #1 vhost_dev_stop at hw/virtio/vhost.c:1594
>>>>>> #2 vhost_net_stop_one at hw/net/vhost_net.c:289
>>>>>> #3 vhost_net_stop at hw/net/vhost_net.c:368
>>>>>>
>>>>>> Nested call to vhost_net_stop(). The first time was at #14.
>>>>>>
>>>>>> #4 virtio_net_vhost_status at hw/net/virtio-net.c:180
>>>>>> #5 virtio_net_set_status (status=79) at hw/net/virtio-net.c:254
>>>>>> #6 virtio_set_status at hw/virtio/virtio.c:1146
>>>>>> #7 virtio_error at hw/virtio/virtio.c:2455
>>>>>>
>>>>>> virtqueue_get_head() failed here.
>>>>>>
>>>>>> #8 virtqueue_get_head at hw/virtio/virtio.c:543
>>>>>> #9 virtqueue_drop_all at hw/virtio/virtio.c:984
>>>>>> #10 virtio_net_drop_tx_queue_data at hw/net/virtio-net.c:240
>>>>>> #11 virtio_bus_set_host_notifier at hw/virtio/virtio-bus.c:297
>>>>>> #12 vhost_dev_disable_notifiers at hw/virtio/vhost.c:1431
>>>>>>
>>>>>> vhost_dev_stop() was executed here. dev->vdev == NULL now.
>>>>>>
>>>>>> #13 vhost_net_stop_one at hw/net/vhost_net.c:290
>>>>>> #14 vhost_net_stop at hw/net/vhost_net.c:368
>>>>>> #15 virtio_net_vhost_status at hw/net/virtio-net.c:180
>>>>>> #16 virtio_net_set_status (status=15) at hw/net/virtio-net.c:254
>>>>>> #17 qmp_set_link ("hostnet0", up=false) at net/net.c:1430
>>>>>> #18 chr_closed_bh at net/vhost-user.c:214
>>>>>> #19 aio_bh_call at util/async.c:90
>>>>>> #20 aio_bh_poll at util/async.c:118
>>>>>> #21 aio_dispatch at util/aio-posix.c:429
>>>>>> #22 aio_ctx_dispatch at util/async.c:261
>>>>>> #23 g_main_context_dispatch
>>>>>> #24 glib_pollfds_poll at util/main-loop.c:213
>>>>>> #25 os_host_main_loop_wait at util/main-loop.c:261
>>>>>> #26 main_loop_wait at util/main-loop.c:515
>>>>>> #27 main_loop () at vl.c:1917
>>>>>> #28 main at vl.c:4795
>>>>>>
>>>>>> The above backtrace was captured from a qemu crash on vhost
>>>>>> disconnect while the virtio driver in the guest was in a failed
>>>>>> state.
>>>>>>
>>>>>> We can just add a check for 'vdev' in 'vhost_dev_has_iommu()',
>>>>>> but it will assert further on, trying to free already freed
>>>>>> ioeventfds. The real problem is that we're allowing nested calls
>>>>>> to 'vhost_net_stop'.
>>>>>>
>>>>>> This patch is aimed to forbid nested calls to 'vhost_net_stop' to
>>>>>> avoid any possible double frees and segmentation faults due to
>>>>>> the use of already freed resources, by setting the
>>>>>> 'vhost_started' flag to zero prior to the 'vhost_net_stop' call.
>>>>>>
>>>>>> Signed-off-by: Ilya Maximets
>>>>>> ---
>>>>>>
>>>>>> This issue was already addressed more than a year ago by the
>>>>>> following patch:
>>>>>> https://lists.gnu.org/archive/html/qemu-devel/2016-03/msg06732.html
>>>>>> but it was dropped without review due to the not yet implemented
>>>>>> re-connection of vhost-user. The re-connection implementation
>>>>>> later fixed most of the nested calls, but a few of them still
>>>>>> exist. For example, the above backtrace was captured after a
>>>>>> 'virtqueue_get_head()' failure on vhost-user disconnection.
>>>>>>
>>>>>>  hw/net/virtio-net.c | 2 +-
>>>>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>>>>
>>>>>> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
>>>>>> index 38674b0..4d95a18 100644
>>>>>> --- a/hw/net/virtio-net.c
>>>>>> +++ b/hw/net/virtio-net.c
>>>>>> @@ -177,8 +177,8 @@ static void virtio_net_vhost_status(VirtIONet *n, uint8_t status)
>>>>>>              n->vhost_started = 0;
>>>>>>          }
>>>>>>      } else {
>>>>>> -        vhost_net_stop(vdev, n->nic->ncs, queues);
>>>>>>          n->vhost_started = 0;
>>>>>> +        vhost_net_stop(vdev, n->nic->ncs, queues);
>>>>>>      }
>>>>>>  }
>>>>>
>>>>> Well the wider context is
>>>>>
>>>>>             n->vhost_started = 1;
>>>>>             r = vhost_net_start(vdev, n->nic->ncs, queues);
>>>>>             if (r < 0) {
>>>>>                 error_report("unable to start vhost net: %d: "
>>>>>                              "falling back on userspace virtio", -r);
>>>>>                 n->vhost_started = 0;
>>>>>             }
>>>>>         } else {
>>>>>             vhost_net_stop(vdev, n->nic->ncs, queues);
>>>>>             n->vhost_started = 0;
>>>>>
>>>>> So we set it to 1 before start, we should clear after stop.
Re: [Qemu-devel] [PATCH] vhost: fix crash on virtio_error while device stop
On Fri, Dec 08, 2017 at 05:54:18PM +0300, Ilya Maximets wrote:
> On 07.12.2017 20:27, Michael S. Tsirkin wrote:
> > On Thu, Dec 07, 2017 at 09:39:36AM +0300, Ilya Maximets wrote:
> >> On 06.12.2017 19:45, Michael S. Tsirkin wrote:
> >>> On Wed, Dec 06, 2017 at 04:06:18PM +0300, Ilya Maximets wrote:
> >>>> In case a virtio error occurred after vhost_dev_close(), qemu
> >>>> will crash in the nested cleanup while checking the IOMMU flag,
> >>>> because dev->vdev is already set to zero and the resources are
> >>>> already freed.
> >>>>
> >>>> Example:
> >>>>
> >>>> Program received signal SIGSEGV, Segmentation fault.
> >>>> vhost_virtqueue_stop at hw/virtio/vhost.c:1155
> >>>>
> >>>> #0 vhost_virtqueue_stop at hw/virtio/vhost.c:1155
> >>>> #1 vhost_dev_stop at hw/virtio/vhost.c:1594
> >>>> #2 vhost_net_stop_one at hw/net/vhost_net.c:289
> >>>> #3 vhost_net_stop at hw/net/vhost_net.c:368
> >>>>
> >>>> Nested call to vhost_net_stop(). The first time was at #14.
> >>>>
> >>>> #4 virtio_net_vhost_status at hw/net/virtio-net.c:180
> >>>> #5 virtio_net_set_status (status=79) at hw/net/virtio-net.c:254
> >>>> #6 virtio_set_status at hw/virtio/virtio.c:1146
> >>>> #7 virtio_error at hw/virtio/virtio.c:2455
> >>>>
> >>>> virtqueue_get_head() failed here.
> >>>>
> >>>> #8 virtqueue_get_head at hw/virtio/virtio.c:543
> >>>> #9 virtqueue_drop_all at hw/virtio/virtio.c:984
> >>>> #10 virtio_net_drop_tx_queue_data at hw/net/virtio-net.c:240
> >>>> #11 virtio_bus_set_host_notifier at hw/virtio/virtio-bus.c:297
> >>>> #12 vhost_dev_disable_notifiers at hw/virtio/vhost.c:1431
> >>>>
> >>>> vhost_dev_stop() was executed here. dev->vdev == NULL now.
> >>>>
> >>>> #13 vhost_net_stop_one at hw/net/vhost_net.c:290
> >>>> #14 vhost_net_stop at hw/net/vhost_net.c:368
> >>>> #15 virtio_net_vhost_status at hw/net/virtio-net.c:180
> >>>> #16 virtio_net_set_status (status=15) at hw/net/virtio-net.c:254
> >>>> #17 qmp_set_link ("hostnet0", up=false) at net/net.c:1430
> >>>> #18 chr_closed_bh at net/vhost-user.c:214
> >>>> #19 aio_bh_call at util/async.c:90
> >>>> #20 aio_bh_poll at util/async.c:118
> >>>> #21 aio_dispatch at util/aio-posix.c:429
> >>>> #22 aio_ctx_dispatch at util/async.c:261
> >>>> #23 g_main_context_dispatch
> >>>> #24 glib_pollfds_poll at util/main-loop.c:213
> >>>> #25 os_host_main_loop_wait at util/main-loop.c:261
> >>>> #26 main_loop_wait at util/main-loop.c:515
> >>>> #27 main_loop () at vl.c:1917
> >>>> #28 main at vl.c:4795
> >>>>
> >>>> The above backtrace was captured from a qemu crash on vhost
> >>>> disconnect while the virtio driver in the guest was in a failed
> >>>> state.
> >>>>
> >>>> We can just add a check for 'vdev' in 'vhost_dev_has_iommu()',
> >>>> but it will assert further on, trying to free already freed
> >>>> ioeventfds. The real problem is that we're allowing nested calls
> >>>> to 'vhost_net_stop'.
> >>>>
> >>>> This patch is aimed to forbid nested calls to 'vhost_net_stop' to
> >>>> avoid any possible double frees and segmentation faults due to
> >>>> the use of already freed resources, by setting the
> >>>> 'vhost_started' flag to zero prior to the 'vhost_net_stop' call.
> >>>>
> >>>> Signed-off-by: Ilya Maximets
> >>>> ---
> >>>>
> >>>> This issue was already addressed more than a year ago by the
> >>>> following patch:
> >>>> https://lists.gnu.org/archive/html/qemu-devel/2016-03/msg06732.html
> >>>> but it was dropped without review due to the not yet implemented
> >>>> re-connection of vhost-user. The re-connection implementation
> >>>> later fixed most of the nested calls, but a few of them still
> >>>> exist. For example, the above backtrace was captured after a
> >>>> 'virtqueue_get_head()' failure on vhost-user disconnection.
> >>>>
> >>>>  hw/net/virtio-net.c | 2 +-
> >>>>  1 file changed, 1 insertion(+), 1 deletion(-)
> >>>>
> >>>> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
> >>>> index 38674b0..4d95a18 100644
> >>>> --- a/hw/net/virtio-net.c
> >>>> +++ b/hw/net/virtio-net.c
> >>>> @@ -177,8 +177,8 @@ static void virtio_net_vhost_status(VirtIONet *n, uint8_t status)
> >>>>              n->vhost_started = 0;
> >>>>          }
> >>>>      } else {
> >>>> -        vhost_net_stop(vdev, n->nic->ncs, queues);
> >>>>          n->vhost_started = 0;
> >>>> +        vhost_net_stop(vdev, n->nic->ncs, queues);
> >>>>      }
> >>>>  }
> >>>
> >>> Well the wider context is
> >>>
> >>>             n->vhost_started = 1;
> >>>             r = vhost_net_start(vdev, n->nic->ncs, queues);
> >>>             if (r < 0) {
> >>>                 error_report("unable to start vhost net: %d: "
> >>>                              "falling back on userspace virtio", -r);
> >>>                 n->vhost_started = 0;
> >>>             }
> >>>         } else {
> >>>             vhost_net_stop(vdev, n->nic->ncs, queues);
> >>>             n->vhost_started = 0;
> >>>
> >>> So we set it to 1 before start, we should clear after stop.
> >>
> >> OK. I agree that clearing after is a bit safer.
Re: [Qemu-devel] [PATCH] vhost: fix crash on virtio_error while device stop
On 07.12.2017 20:27, Michael S. Tsirkin wrote: > On Thu, Dec 07, 2017 at 09:39:36AM +0300, Ilya Maximets wrote: >> On 06.12.2017 19:45, Michael S. Tsirkin wrote: >>> On Wed, Dec 06, 2017 at 04:06:18PM +0300, Ilya Maximets wrote: In case virtio error occured after vhost_dev_close(), qemu will crash in nested cleanup while checking IOMMU flag because dev->vdev already set to zero and resources are already freed. Example: Program received signal SIGSEGV, Segmentation fault. vhost_virtqueue_stop at hw/virtio/vhost.c:1155 #0 vhost_virtqueue_stop at hw/virtio/vhost.c:1155 #1 vhost_dev_stop at hw/virtio/vhost.c:1594 #2 vhost_net_stop_one at hw/net/vhost_net.c:289 #3 vhost_net_stop at hw/net/vhost_net.c:368 Nested call to vhost_net_stop(). First time was at #14. #4 virtio_net_vhost_status at hw/net/virtio-net.c:180 #5 virtio_net_set_status (status=79) at hw/net/virtio-net.c:254 #6 virtio_set_status at hw/virtio/virtio.c:1146 #7 virtio_error at hw/virtio/virtio.c:2455 virtqueue_get_head() failed here. #8 virtqueue_get_head at hw/virtio/virtio.c:543 #9 virtqueue_drop_all at hw/virtio/virtio.c:984 #10 virtio_net_drop_tx_queue_data at hw/net/virtio-net.c:240 #11 virtio_bus_set_host_notifier at hw/virtio/virtio-bus.c:297 #12 vhost_dev_disable_notifiers at hw/virtio/vhost.c:1431 vhost_dev_stop() was executed here. dev->vdev == NULL now. 
#13 vhost_net_stop_one at hw/net/vhost_net.c:290 #14 vhost_net_stop at hw/net/vhost_net.c:368 #15 virtio_net_vhost_status at hw/net/virtio-net.c:180 #16 virtio_net_set_status (status=15) at hw/net/virtio-net.c:254 #17 qmp_set_link ("hostnet0", up=false) at net/net.c:1430 #18 chr_closed_bh at net/vhost-user.c:214 #19 aio_bh_call at util/async.c:90 #20 aio_bh_poll at util/async.c:118 #21 aio_dispatch at util/aio-posix.c:429 #22 aio_ctx_dispatch at util/async.c:261 #23 g_main_context_dispatch #24 glib_pollfds_poll at util/main-loop.c:213 #25 os_host_main_loop_wait at util/main-loop.c:261 #26 main_loop_wait at util/main-loop.c:515 #27 main_loop () at vl.c:1917 #28 main at vl.c:4795 Above backtrace captured from qemu crash on vhost disconnect while virtio driver in guest was in failed state. We can just add checking for 'vdev' in 'vhost_dev_has_iommu()' but it will assert further trying to free already freed ioeventfds. The real problem is that we're allowing nested calls to 'vhost_net_stop'. This patch is aimed to forbid nested calls to 'vhost_net_stop' to avoid any possible double frees and segmentation faults doue to using of already freed resources by setting 'vhost_started' flag to zero prior to 'vhost_net_stop' call. Signed-off-by: Ilya Maximets --- This issue was already addressed more than a year ago by the following patch: https://lists.gnu.org/archive/html/qemu-devel/2016-03/msg06732.html but it was dropped without review due to not yet implemented re-connection of vhost-user. Re-connection implementation lately fixed most of the nested calls, but few of them still exists. For example, above backtrace captured after 'virtqueue_get_head()' failure on vhost-user disconnection. 
>>>>  hw/net/virtio-net.c | 2 +-
>>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>>
>>>> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
>>>> index 38674b0..4d95a18 100644
>>>> --- a/hw/net/virtio-net.c
>>>> +++ b/hw/net/virtio-net.c
>>>> @@ -177,8 +177,8 @@ static void virtio_net_vhost_status(VirtIONet *n, uint8_t status)
>>>>              n->vhost_started = 0;
>>>>          }
>>>>      } else {
>>>> -        vhost_net_stop(vdev, n->nic->ncs, queues);
>>>>          n->vhost_started = 0;
>>>> +        vhost_net_stop(vdev, n->nic->ncs, queues);
>>>>      }
>>>>  }
>>>
>>> Well, the wider context is:
>>>
>>>         n->vhost_started = 1;
>>>         r = vhost_net_start(vdev, n->nic->ncs, queues);
>>>         if (r < 0) {
>>>             error_report("unable to start vhost net: %d: "
>>>                          "falling back on userspace virtio", -r);
>>>             n->vhost_started = 0;
>>>         }
>>>     } else {
>>>         vhost_net_stop(vdev, n->nic->ncs, queues);
>>>         n->vhost_started = 0;
>>>
>>> So we set it to 1 before start; we should clear it after stop.
>>
>> OK. I agree that clearing it after is a bit safer. But in this case we
>> need a separate flag or another way to detect that we're already inside
>> 'vhost_net_stop()'.
>>
>> What do you think about that old patch:
>> https://lists.gnu.org/archive/html/qemu-devel/2016-03/msg06732.html ?
>>
>> It implements the same thing but introduces an additional flag. It
>> even could be still applicable.