Hi Peter,

Apologies for the delay in answering you.
[...]

>> - Doing the above is still not enough, as KVM figures out what
>> operation to perform depending on the current state of the memslots.
>> Assume we already have an existing MR y, and now we get the list
>> DELETE(y) CREATE(y/2) (i.e. reducing y to half its size).
>> In this case the interval tree can't do anything, but it's very hard
>> to understand that the second request in the list is a CREATE, because
>> when KVM performs the check to understand which type of operation it
>> is (before doing any modification at all in the memslot list), it
>> finds that y (the memslot with the same id) exists, but has a
>> different size than what userspace provided, therefore it could
>> either fail, or misinterpret it as another operation (currently it
>> fails with -EINVAL).
>
> Another good question.. I think that can be partly solved by not
> allowing ID duplication in the same batched ioctl, or maybe we can
> fail it. From the qemu perspective, we may need to change the memslot
> id allocation to not reuse IDs of being-deleted memslots until the
> batch is processed.
>
> Something like adding similar INVALID tags to kvm memslots in QEMU
> when we are preparing the batch; then we should only reset
> memory_size==0 and clear the INVALID tag after the ioctl returns.
> Then during the batch, any new slot id taken from kvm_get_free_slot()
> will not duplicate that of a slot being deleted.

First of all, you're right: no interval tree is needed.

I think the approach I am currently using is something like what you
described above: when a DELETE operation is created in QEMU (there is
no MOVE), I set the invalid_slot flag to 1. Then KVM will first process
the requests with invalid_slot == 1 and mark those memslots invalid,
and then process all the others (the invalid ones included, as they
still need the actual DELETE/MOVE operation). See the sketch at the end
of this mail.

>> If we instead already provide the labels, then we can:
>> 1. substitute the memslots pointed to by DELETE/MOVE with invalid
>> ones & swap (this is visible and non-atomic, but we don't care, as
>> we are not deleting anything yet)
>> 2. delete the invalid memslots (in the inactive memslot list)
>> 3. process the other requests (in the inactive memslot list)
>> 4. perform a single, atomic swap (steps 2 and 3 become visible here).
>>
>> What do you think?
>
> Adding some limitation to the new ioctl makes sense to me, especially
> moving the DELETE/MOVE to be handled earlier, rather than a complete
> mixture of all of them.
>
> I'm wondering whether the batch ioctl can be laid out separately per
> operation, so that the sequence of handling each op can be clearly
> documented:
>
>     struct kvm_userspace_memory_region_batch {
>             struct kvm_userspace_memory_region deletes[];
>             struct kvm_userspace_memory_region moves[];
>             struct kvm_userspace_memory_region creates[];
>             struct kvm_userspace_memory_region flags_only[];
>     };
>
> So that the ordering can be very obvious. But I didn't really think
> deeper than that.

No, I think that doesn't work: the order needs to be preserved, and I
see many DELETE+CREATE operations on the same slot id.

> Side note: do we use MOVE at all in QEMU?

Nope :)

>> Bonus question: with this atomic operation, do we really need the
>> invalid memslot logic for MOVE/DELETE?
>
> I think so. Not an expert on that, but iiuc that's to make sure we'll
> zap all shadow paging that maps to the slots being deleted/moved.
>
> Paolo can always help to keep me honest above.

Yes, we need to keep that logic.

v2 is coming soon.

Thank you,
Emanuele
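
P.S. To make the DELETE(y) CREATE(y/2) example above more concrete,
such a batch carries two entries with the same slot id. This is purely
illustrative (slot number, addresses and sizes are made up):

    /* Slot 10 is already installed in KVM as 4G at GPA 0x100000000. */
    struct kvm_userspace_memory_region batch[2] = {
            {       /* DELETE(y): memory_size == 0 means delete */
                    .slot = 10,
                    .guest_phys_addr = 0x100000000,
                    .memory_size = 0,
            },
            {       /* CREATE(y/2): same id, half the size */
                    .slot = 10,
                    .guest_phys_addr = 0x100000000,
                    .memory_size = 0x80000000,
                    .userspace_addr = 0, /* new HVA goes here */
            },
    };

When KVM classifies the second entry before having applied the first,
it still finds slot 10 installed with the old 4G size, so it cannot
recognize the entry as a CREATE, and today it fails with -EINVAL.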
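
And the invalid_slot handling I described could look roughly like the
following two-pass loop. Again, just a sketch: the invalid_slot field
and the kvm_* helpers below are placeholder names, not the actual v2
code:

    struct kvm_userspace_memory_region_entry {
            struct kvm_userspace_memory_region region;
            __u8 invalid_slot;      /* set by QEMU for DELETE (and MOVE) */
            __u8 padding[63];
    };

    struct kvm_userspace_memory_region_batch {
            __u32 nent;
            struct kvm_userspace_memory_region_entry entries[];
    };

    static int kvm_batch_set_memory_regions(struct kvm *kvm,
                    struct kvm_userspace_memory_region_batch *batch)
    {
            u32 i;
            int r;

            /*
             * Pass 1: mark every slot flagged by userspace as invalid,
             * so guest accesses start faulting before anything is torn
             * down. This step is visible and non-atomic, which is fine
             * since nothing is deleted yet.
             */
            for (i = 0; i < batch->nent; i++)
                    if (batch->entries[i].invalid_slot)
                            kvm_mark_memslot_invalid(kvm,
                                    &batch->entries[i].region);

            /*
             * Pass 2: apply all requests (DELETE/MOVE/CREATE/flags),
             * in order, on the inactive memslot list only.
             */
            for (i = 0; i < batch->nent; i++) {
                    r = kvm_process_memslot_request(kvm,
                                    &batch->entries[i].region);
                    if (r)
                            return r;       /* error unwinding elided */
            }

            /* Single atomic swap: pass 2 becomes visible here. */
            kvm_swap_active_memslots(kvm);
            return 0;
    }

The point is just that the DELETE/MOVE targets are made invalid up
front, so the rest of the batch can be applied on the inactive list and
published with one swap.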