On 13/09/2025 01:32, Vishal Annapurve wrote:
On Fri, Sep 12, 2025 at 3:35 PM James Houghton <jthough...@google.com> wrote:
+
+ if (folio_test_uptodate(folio)) {
+ folio_unlock(folio);
+ folio_put(folio);
+ return -ENOSPC;
Does it actually matter for the folio not to be uptodate? It seems
unnecessarily restrictive not to be able to overwrite data if we're
saying that this is only usable for unencrypted memory anyway.
In the context of direct map removal [1] it does actually because when
we mark a folio as prepared, we remove it from the direct map making it
inaccessible to the way write() performs the copy. It does not matter
if direct map removal isn't enabled though. Do you think it should be
conditional?
Oh, good point. It's simpler (both to implement and to describe) to
disallow a second write() call in all cases (no matter if the direct
map for the page has been removed or if the contents have been
encrypted), so I'm all for leaving it unconditional like you have now.
Thanks!
Are we deviating from the way read/write semantics work for the other
filesystems? I don't think other filesystems carry this restriction of
one-time-write only. Do we strictly need the differing semantics?
Yes, I believe we are deviating from other "regular" filesystems, but I
don't think what we propose is too uncommon as in "special" filesystems
such as sysfs subsequent calls to attributes like "remove" will likely
fail as well (not due to up-to-date flag though).
Maybe it would be simpler to not overload uptodate flag and just not
allow read/write if folio is not mapped in the direct map for non-conf
VMs (assuming there could be other ways to deduce that information).
The only such interface I'm aware of is kernel_page_present() so the
check may look like:
for (i = 0; i < folio_nr_pages(folio); i++) {
struct page *page = folio_page(folio, i);
if (!kernel_page_present(page)) {
folio_unlock(folio);
folio_put(folio);
return -ENOSPC;
}
}
However, kernel_page_present() is not currently exported to modules.
Alternatively, the same effect can be achieved via checking for both
kvm_gmem_test_no_direct_map(inode) [1] and folio_test_uptodate(folio).
It would be the "conditional" check I mentioned earlier in the thread.
[1]: https://lore.kernel.org/kvm/20250912091708.17502-6-roy...@amazon.co.uk/
Can there be users who want to populate the file ranges multiple times
as it seems more performant?
Yes, you are right, there may be use cases like that. At the same time,
I think they are much less common because it's more typical for the
initial population to cover larger memory ranges and be sensitive to
performance.
[1]: https://lore.kernel.org/kvm/20250828093902.2719-1-roy...@amazon.co.uk