On Tue, Dec 02, 2025 at 11:50:31AM +0000, Nikita Kalyazin wrote:
> > It looks fine indeed, but it looks slightly weird then, as you'll have two
> > ways to populate the page cache. Logically here atomicity is indeed not
> > needed when you trap both MISSING + MINOR.
>
> I reran the test based on the UFFDIO_COPY prototype I had using your series
> [2], and UFFDIO_COPY is slower than write() to populate 512 MiB: 237 vs 202
> ms (+17%). Even though UFFDIO_COPY alone is functionally sufficient, I
> would prefer to have an option to use write() where possible and only
> falling back to UFFDIO_COPY for userspace faults to have better performance.

Yes, write() should be fine.

Especially for gmem, I guess write() support is needed when VMAs cannot be
mapped at all in a strict CoCo context, so it needs to be available one way
or another.

IIUC the difference is because UFFDIO_COPY (or memcpy(), I recall you used
to test that instead) will involve pgtable operations. So I wonder: if the
VMA mapping the gmem will still be accessed at some point later (either
private->shared convertible ones for device DMAs for CoCo, or the fully
shared non-CoCo use case), then the pgtable overhead will simply happen
later for a write()-styled fault resolution.

From that POV, the number above makes sense. Thanks for the extra testing
results.

>
> [2]
> https://lore.kernel.org/all/[email protected]

--
Peter Xu
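
A minimal sketch of the ordering described above, under loud assumptions: an ordinary temporary file stands in for gmem, and neither guest_memfd write() support nor UFFDIO_COPY is exercised. The helper names (populate_with_write, first_byte_of_each_page, demo) are hypothetical. It only illustrates that a write()-style populate fills the page cache without any mapping, and that page-table work is deferred until the range is later touched through a mapping.

```python
import mmap
import os
import tempfile

PAGE = mmap.PAGESIZE
NPAGES = 4  # arbitrary small size for illustration


def populate_with_write(fd, payload):
    """Fill the file through the write() path; no mapping is created here."""
    offset = 0
    while offset < len(payload):
        # pwrite() may write fewer bytes than requested, so loop until done.
        offset += os.pwrite(fd, payload[offset:], offset)


def first_byte_of_each_page(fd, length):
    """Map the file afterwards and touch one byte per page.

    Any page-table setup for this range happens on these accesses,
    not during the earlier write().
    """
    with mmap.mmap(fd, length, access=mmap.ACCESS_READ) as view:
        return [view[i * PAGE] for i in range(length // PAGE)]


def demo():
    # Page i is filled with the byte value i + 1.
    payload = b"".join(bytes([i + 1]) * PAGE for i in range(NPAGES))
    with tempfile.TemporaryFile() as f:
        populate_with_write(f.fileno(), payload)
        return first_byte_of_each_page(f.fileno(), len(payload))


if __name__ == "__main__":
    print(demo())
```

If the mapping is never touched, the second step never runs and its cost is never paid, which is consistent with write() looking cheaper in a populate-only measurement.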

