> On 13.09.2016 at 20:04, Michael Roth <mdr...@linux.vnet.ibm.com> wrote:
>
> Quoting Peter Lieven (2016-09-13 10:52:04)
>>
>>
>>>> On 13.09.2016 at 17:42, Stefan Hajnoczi <stefa...@redhat.com> wrote:
>>>>
>>>> On Thu, Sep 08, 2016 at 03:58:26PM -0500, Michael Roth wrote:
>>>> Quoting Stefan Hajnoczi (2016-09-05 12:54:35)
>>>>>> On Fri, Aug 26, 2016 at 01:45:56PM +0200, Peter Lieven wrote:
>>>>>>>> On 25.08.2016 at 19:23, Michael Roth wrote:
>>>>>>>> Quoting Peter Lieven (2016-08-25 01:38:13)
>>>>>>>> 7c509d1 virtio: decrement vq->inuse in virtqueue_discard()
>>>>>>>> 700f26b virtio: recalculate vq->inuse after migration
>>>>>>> Looks like these got posted during the freeze :(
>>>>>>>
>>>>>>>> The virtio thing is important because live migration is broken without
>>>>>>>> the fix as 86cc089 is in 2.6.1.
>>>>>>> Not sure I understand the relation to 86cc089. Wouldn't the check
>>>>>>> introduced there always pass due to target initializing inuse to 0?
>>>>>>>
>>>>>>> Or is the issue that the fix introduced in 86cc089 is only partially
>>>>>>> effective due to inuse not being recalculated properly on target? That
>>>>>>> might warrant a 2.6.1.1...
>>>>>>
>>>>>> This is what Stefan wrote in the cover letter to the series:
>>>>>>
>>>>>> "I should mention this is for QEMU 2.7. These fixes are needed if the
>>>>>> CVE-2016-5403 patch has been applied. Without these patches any device
>>>>>> that holds VirtQueueElements across
>>>>>> live migration will terminate with a "Virtqueue size exceeded" error
>>>>>> message. virtio-balloon and virtio-scsi are affected. virtio-blk
>>>>>> probably too but I haven't tested it."
>>>>>>
>>>>>> Maybe
>>>>>
>>>>> The virtio inuse fixes are needed for stable (v2.6.2?) so that the
>>>>> spurious "Virtqueue size exceeded" on migration is solved.
>>>>>
>>>>> The error can be reproduced when there is a VirtQueueElement pending
>>>>> across migration (e.g. virtio-blk s->rq failed request list).
>>>>
>>>> Thanks for clarifying. I'm planning to do a 2.6.2 to capture these, the
>>>> patches Peter mentioned, and some other fixes that came during 2.7 RC
>>>> phase.
>>>>
>>>> I have an initial staging tree at:
>>>>
>>>> https://github.com/mdroth/qemu/commits/stable-2.6-staging
>>>>
>>>> There's still a few PULLs in flight with patches I plan to pull in, but
>>>> hoping to send out the patch round-up early next week and a release the
>>>> following week.
>>>
>>> Two more candidates for stable:
>>>
>>> 4b7f91e virtio: zero vq->inuse in virtio_reset()
>>> 104e70c virtio-balloon: discard virtqueue element on reset
>>>
>>> They also deal with "Virtqueue size exceeded" errors.
>>>
>>> Stefan
>>
>> There also seems to be an regression (segfault) in the VNC server in 2.6.1,
>> but i am still investigating.
>
> Do you have a reproducer? I can try a bisect. Trying to get the initial
> staging tree posted today but want to make sure any known regressions are
> addressed beforehand.

I am out of the office until Monday, but if I remember correctly I saw mutex errors (not segfaults) with 2.6.1 that were not there with 2.5.1.1. They happened while my colleagues were experimenting with a new VNC client, so it is likely that a certain connect/disconnect pattern is the trigger. I am not sure whether the same issue exists in master.

For more details we might have to wait until I am back at the office, sorry. However, I am CC'ing Jan from Kamp; maybe he has a reproducer.

Peter