On Mon, Dec 02, 2024 at 12:37:19PM +0530, Prasad Pandit wrote: > Hello Peter, > > On Fri, 29 Nov 2024 at 22:16, Peter Xu <pet...@redhat.com> wrote: > > I saw that there's still discussion in the previous version, while this > > cover letter doesn't mention why it was ignored. Especially, at least to > > me, what Fabiano commented on simplifying the flush condition check makes > > senes to me to be addressed. > > * It is not ignored. Simplifying flush conditionals makes sense to me > too, that is why in the 'v0' version of this series I had added the > !migration_in_postcopy() check to the migrate_multifd() function, > right?
As explained, that addition was wrong, because migrate_multifd() should always return the user option only. Again, you can add another helper. > I tried to discuss in the 'v1' thread if there's another way to > simplify conditionals. Not sure if you've followed all comments in the > thread. I'll post a version to clean it up, either we go with Fabiano's, or mine, or a 3rd option. We shouldn't pile up more condition check there. It's growing into something not maintainable. > > * Secondly, as you mention above, I also thought Fabiano is pointing > at the complexity of the 'if' conditionals and thus I replied that his > proposed patch does not seem to solve for that complexity. But in his > subsequent reply Fabiano mentions that it is not just about > conditionals, but larger complexity of how and when multifd threads do > flush and sync amongst them. Yes they're relevant, but I think we can cleanup the whole thing and it's not that complicated, IMHO. We'll see. > > * IMHO, simplifying that larger complexity of how multifd threads do > flush and sync can be done independently, outside the scope of this > patch series, which is about enabling multifd and postcopy together. I assume you're working on the test cases, I hope this won't block you from continuing your work on this series. As mentioned above, I think we need to clean this up before moving on, unfortunately. And I hope things settle already before you have the test cases ready. I appreciate you add the test cases for multifd+postcopy. That's very important. Before that you can keep your patch as-is, and leave that part for us to figure out. Feel free to chime in anytime as well. > > > Meanwhile, before I read into any details, I found that all the tests I > > requested are still missing. Would you please consider adding them? > > > > My previous comment regarding to test is here: > > https://lore.kernel.org/qemu-devel/ZykJBq7ME5jgSzCA@x1n/ > > > > I listed exactly the minimum set of tests that I think we should have. > ... > > In general, any migration new feature must have both doc updates and test > > coverage as long as applicable. > > > > Multifd still has its doc missing, which is unfortunate. We could have one > > doc prior to this work, but it can also be done later. > > > > OTOH on testing: this is not a new feature alone, but it's more dangerous > > than a new feature due to what I mentioned before, that postcopy needs > > atomicity on page movements, and multifd is completely against that from > > design POV due to NIC drivers being able to modify guest pages directly. > > > > It means multifd+postcopy bugs will be extremely hard to debug if we don't > > put it right first. So please be serious on the test coverage on this > > work. > > * I'm yet to get to the test cases. The revised series(v1 and v2) are > posted to share patch updates which were suggested in the previous > reviews. Test cases are a separate/different effort from source > patches. If we want to hold on this patch series till we get the test > cases and documentation in place, that is okay, I'll work on that > next. So we talked about this in our meeting, but still just to keep it a record so whoever work on migration can reference: we do require test cases and it's not separate effort. We request both test cases and docs to present before mering a feature, unless there's good reason to not to. E.g. multifd doesn't yet have doc, so doc is not required for this work yet, however test cases are. Another outlier is VFIO+multifd cannot easily add test case because CI normally doesn't have available hardware environment. However does should apply there to be required at least from migration POV. Thanks, -- Peter Xu