> >> 2. Unclear semantics. Alex tried to document what the actual semantics >> of hinted pages are. Assume the following in the guest to a previously >> hinted page >> >> /* page was hinted and is reused now */ >> if (page[x] != Y) >> page[x] == Y; >> /* migration ends, we now run on the destination */ >> BUG_ON(page[x] != Y); >> /* BUG, because the content chan >> >> A guest can observe that. And that could be a random driver that just >> allocated a page. >> >> (I *assume* in Linux we might catch that using kasan, but I am not 100% >> sure, also, the actual semantics to document are unclear - e.g., for >> other guests) >> >> As Alex mentioned, it is not even guaranteed in QEMU that we receive a >> zero page on the destination, it could also be something else (e.g., >> previously migrated values). > > So this is only an issue for pages that are pushed out of the balloon > as a part of the shrinker process though. So fixing it would be pretty > straightforward as we would just have to initialize or at least dirty > pages that are leaked as a part of the shrinker. That may have an > impact on performance though as it would result in us dirtying pages > that are freed as a result of the shrinker being triggered. >
It really depends on the desired semantics, which are unclear because there is no doc/spec. Either QEMU is buggy or the kernel is buggy. >> 3. If I am not wrong, the iothread works in >> virtio_ballloon_get_free_page_hints() on the virtqueue only with holding >> the free_page_lock (no BQL). >> >> Assume we're migrating, the iothread is active, and the guest triggers a >> device reset. >> >> virtio_balloon_device_reset() will trigger a >> virtio_balloon_free_page_stop(s). That won't actually wait for the >> iothread to stop, it will only temporarily lock free_page_lock and >> update s->free_page_report_status. >> >> I think there can be a race between the device reset and the iothread. >> Once virtio_balloon_free_page_stop() returned, >> virtio_ballloon_get_free_page_hints() can still call >> - virtio_queue_set_notification(vq, 0); >> - virtio_queue_set_notification(vq, 1); >> - virtio_notify(vdev, vq); >> - virtqueue_pop() >> >> I doubt this is very nice. > > And our conversation had me start looking though reference to > virtio_balloon_free_page_stop. It looks like we call it for when we > unrealize the device or reset the device. It might make more sense for > us to look at pushing the status to DONE and forcing the iothread to > be flushed out. > >> There are other concerns I had regarding the iothread (e.g., while >> reporting is active, virtio_ballloon_get_free_page_hints() is >> essentially a busy loop, in contrast to documented - >> continue_to_get_hints will always be true). >> >>> The appeal of hinting is that it's 0 overhead outside migration, >>> and pains were taken to avoid keeping pages locked while >>> hypervisor is busy. >>> >>> If we are to drop hinting completely we need to show that reporting >>> can be comparable, and we'll probably want to add a mode for >>> reporting that behaves somewhat similarly. >> >> Depends on the actual users. If we're dropping a feature that nobody is >> actively using, I don't think we have to show anything. >> >> This feature obviously saw no proper review. > > I'm pretty sure it had some, as it went through several iterations as > I recall. However I don't think the review of the virtio interface was > very detailed as I think most of the attention was on the kernel > interface. Yes, that‘s what I meant. The kernel side and the migration code (QEMU) got a lot of attention.