Re:Re: [PATCH v1 1/3] softmmu/physmem: fallback to opening guest RAM file as readonly in a MAP_PRIVATE mapping

2023-08-11 Thread ThinerLogoer
At 2023-08-12 03:00:54, "David Hildenbrand"  wrote:
>On 11.08.23 07:49, ThinerLogoer wrote:
>> At 2023-08-11 05:24:43, "Peter Xu"  wrote:
>>> On Fri, Aug 11, 2023 at 01:06:12AM +0800, ThinerLogoer wrote:
> I think we have the following options (there might be more)
>
> 1) This patch.
>
> 2) New flag for memory-backend-file. We already have "readonly" and
> "share=". I'm having a hard time coming up with a good name that really
> describes the subtle difference.
>
> 3) Glue behavior to the QEMU machine
>

 4) '-deny-private-discard' argv, or environment variable, or both
>>>
>>> I'd personally vote for (2).  How about "fdperm"?  To describe when we want
>>> to use different rw permissions on the file (besides the access permission
>>> of the memory we already provided with "readonly"=XXX).  IIUC the only sane
>>> value will be ro/rw/default, where "default" should just use the same rw
>>> permission as the memory ("readonly"=XXX).
>>>
>>> Would that be relatively clean and also work in this use case?
>>>
>>> (the other thing I'd wish we don't have that fallback is, as long as we
>>> have any of that "fallback" we'll need to be compatible with it since
>>> then, and for ever...)
>> 
>> If it must be (2), I would vote (2) + (4), with (4) adjust the default 
>> behavior of said `fdperm`.
>> Mainly because (private+discard) is itself not a good practice and (4) serves
>> as a good tool to help catch existing (private+discard) problems.
>
>Instead of fdperm, maybe we could find a better name.
>
>The man page of "open" says: The argument flags must include one of the 
>following access modes: O_RDONLY, O_WRONLY, or O_RDWR.  These request 
>opening the file read-only, write-only, or read/write, respectively.
>
>So maybe something a bit more mouthful like "file-access-mode" would be 
>better.

Maybe "fd-mode"or "open-mode" to make it shorter and also non ambiguous.

"open-access-mode" can also be considered.

Btw my chatgpt agent suggested 10 names to me, in case you feel hard to
decide a name:

access-mode
file-mode
open-mode
file-permission
file-access-mode (aha!)
file-access-permission
file-io-access-mode
file-open-permission
file-operation-mode
file-read-write-mode


>
>We could have the options
>*readwrite
>*readonly
>*auto
>
>Whereby "auto" is the default.

I like the "auto" here.

>
>Specifying:
>
>* "readonly=true,file-access-mode=readwrite" would fail.
>* "shared=true,readonly=false,file-access-mode=readonly" would fail.
>* "shared=false,readonly=false,file-access-mode=readonly" would work.

Yeah, sanitizing the arguments here is wise.

>
>In theory, we could adjust the mode of "auto" with compat machines. But 
>maybe it would be sufficient to do something like the following
>
>1) shared=true,readonly=false
>   -> readwrite
>2) shared=true,readonly=true
>   > readonly
>2) shared=false,readonly=true
>   -> readonly
>3) shared=false,readonly=true
>   hugetlb? -> readwrite
>   otherwise -> readonly

Looks like there is some typo here. I assume it was:

1) shared=true,readonly=false
-> readwrite
2) shared=true,readonly=true
-> readonly
3) shared=false,readonly=true
-> readonly
4) shared=false,readonly=false
hugetlb? -> readwrite
otherwise -> readonly


>Reason being the traditional usage of hugetlb with MAP_PRIVATE where I 
>can see possible users with postcopy. Further, VM templating with 
>hugetlb might not be that common ... users could still modify the 
>behavior if they really have to.
>
>That would make your use case work automatically with file-backed memory.

This seems a good idea. I am good with the solution you proposed
here as well.


--

Regards,

logoerthiner


Re:Re: [PATCH v1 1/3] softmmu/physmem: fallback to opening guest RAM file as readonly in a MAP_PRIVATE mapping

2023-08-10 Thread ThinerLogoer
At 2023-08-10 22:19:45, "David Hildenbrand"  wrote:
>>> Most importantly, we won't be corrupting/touching the original file in any
>>> case, because it is R/O.
>>>
>>> If we really want to be careful, we could clue that behavior to compat
>>> machines. I'm not really sure yet if we really have to go down that path.
>>>
>>> Any other alternatives? I'd like to avoid new flags where not really
>>> required.
>> 
>> I was just thinking of a new flag. :) So have you already discussed that
>> possibility and decided that not a good idea?
>
>Not really. I was briefly playing with that idea but already struggled 
>to come up with a reasonable name :)
>
>Less toggles and just have it working nice, if possible.
>
>> 
>> The root issue to me here is we actually have two resources (memory map of
>> the process, and the file) but we only have one way to describe the
>> permissions upon the two objects.  I'd think it makes a lot more sense if a
>> new flag is added, when there's a need to differentiate the two.
>> 
>> Consider if you see a bunch of qemu instances with:
>> 
>>-mem-path $RAM_FILE
>> 
>> On the same host, which can be as weird as it could be to me.. At least
>> '-mem-path' looks still like a way to exclusively own a ram file for an
>> instance. I hesitate the new fallback can confuse people too, while that's
>> so far not the major use case.
>
>Once I learned that this is not a MAP_SHARED mapping, I was extremely 
>confused. For example, vhost-user with "-mem-path" will absolutely not 
>work with "-mem-path", even though the documentation explicitly spells 
>that out (I still have to send a patch to fix that).
>
>I guess "-mem-path" was primarily only used to consume hugetlb. Even for 
>tmpfs it will already result in a double memory consumption, just like 
>when using -memory-backend-memfd,share=no.
>
>I guess deprecating it was the right decision.
>
>But memory-backend-file also defaults to "share=no" ... so the same 
>default behavior unfortunately.
>
>> 
>> Nobody may really rely on any existing behavior of the failure, but
>> changing existing behavior is just always not wanted.  The guideline here
>> to me is: whether we want existing "-mem-path XXX" users to start using the
>> fallback in general?  If it's "no", then maybe it implies a new flag is
>> better?
>
>I think we have the following options (there might be more)
>
>1) This patch.
>
>2) New flag for memory-backend-file. We already have "readonly" and 
>"share=". I'm having a hard time coming up with a good name that really 
>describes the subtle difference.
>
>3) Glue behavior to the QEMU machine
>

4) '-deny-private-discard' argv, or environment variable, or both

I have proposed a 4) earlier in discussion which is to add a global qemu flag 
like
'-deny-private-discard' or '-disallow-private-discard' (let's find a better 
name!)
for some duration until private discard behavior phases out. We do
everything exactly the same as before without the flag, and with this flag,
private CoW mapping files are strictly opened readonly, discard on private
memory backend is brutally denied very early when possibility arises,
and private memory backed file are always opened readonly without
creating any file (so the file must exist, no more nasty edge cases).

This has the benefit that it can also help diagnose and debug all existing
private discard usages, which could be required in the long run. Therefore,
And with this flag we directly solves the immediate demand of
 while delays the hard problem indefinitely.
I think this solution seems most promising and acceptable by most ones.
At least for my use case, I would be glad to insert a such flag to my argv
if it is all I need, since it does not hurt the flexibility I care about.

Note that difference on this option probably should not cause difference
on the machine specification. Otherwise migration will fail because one
machine has this option and the other does not, which is absurd, since
it is a backend implementation flag.

>
>For 3), one option would be to always open a COW file readonly (as 
>Thiner originally proposed). We could leave "-mem-path" behavior alone 
>and only change memory-backend-file semantics. If the COW file does 
>*not* exist yet, we would refuse to create the file like patch 2+3 do. 
>Therefore, no ftruncate() errors, and fallocate() errors would always 
>happen.
>
>
>What are your thoughts?
>
>[...]
>

I would be happy if -mem-path stays supported since in this case I would
not need knowledge of memory backend before migration.

--

Regards,

logoerthiner

Re:Re: [PATCH v1 1/3] softmmu/physmem: fallback to opening guest RAM file as readonly in a MAP_PRIVATE mapping

2023-08-08 Thread ThinerLogoer
At 2023-08-09 05:01:17, "Peter Xu"  wrote:
>On Mon, Aug 07, 2023 at 09:07:32PM +0200, David Hildenbrand wrote:
>> From: Thiner Logoer 
>> 
>> Users may specify
>> * "-mem-path" or
>> * "-object memory-backend-file,share=off,readonly=off"
>> and expect such COW (MAP_PRIVATE) mappings to work, even if the user
>> does not have write permissions to open the file.
>> 
>> For now, we would always fail in that case, always requiring file write
>> permissions. Let's detect when that failure happens and fallback to opening
>> the file readonly.
>> 
>> Warn the user, since there are other use cases where we want the file to
>> be mapped writable: ftruncate() and fallocate() will fail if the file
>> was not opened with write permissions.
>> 
>> Signed-off-by: Thiner Logoer 
>> Co-developed-by: David Hildenbrand 
>> Signed-off-by: David Hildenbrand 
>> ---
>>  softmmu/physmem.c | 26 ++
>>  1 file changed, 18 insertions(+), 8 deletions(-)
>> 
>> diff --git a/softmmu/physmem.c b/softmmu/physmem.c
>> index 3df73542e1..d1ae694b20 100644
>> --- a/softmmu/physmem.c
>> +++ b/softmmu/physmem.c
>> @@ -1289,8 +1289,7 @@ static int64_t get_file_align(int fd)
>>  static int file_ram_open(const char *path,
>>   const char *region_name,
>>   bool readonly,
>> - bool *created,
>> - Error **errp)
>> + bool *created)
>>  {
>>  char *filename;
>>  char *sanitized_name;
>> @@ -1334,10 +1333,7 @@ static int file_ram_open(const char *path,
>>  g_free(filename);
>>  }
>>  if (errno != EEXIST && errno != EINTR) {
>> -error_setg_errno(errp, errno,
>> - "can't open backing store %s for guest RAM",
>> - path);
>> -return -1;
>> +return -errno;
>>  }
>>  /*
>>   * Try again on EINTR and EEXIST.  The latter happens when
>> @@ -1946,9 +1942,23 @@ RAMBlock *qemu_ram_alloc_from_file(ram_addr_t size, 
>> MemoryRegion *mr,
>>  bool created;
>>  RAMBlock *block;
>>  
>> -fd = file_ram_open(mem_path, memory_region_name(mr), readonly, &created,
>> -   errp);
>> +fd = file_ram_open(mem_path, memory_region_name(mr), readonly, 
>> &created);
>> +if (fd == -EACCES && !(ram_flags & RAM_SHARED) && !readonly) {
>> +/*
>> + * We can have a writable MAP_PRIVATE mapping of a readonly file.
>> + * However, some operations like ftruncate() or fallocate() might 
>> fail
>> + * later, let's warn the user.
>> + */
>> +fd = file_ram_open(mem_path, memory_region_name(mr), true, 
>> &created);
>> +if (fd >= 0) {
>> +warn_report("backing store %s for guest RAM (MAP_PRIVATE) 
>> opened"
>> +" readonly because the file is not writable", 
>> mem_path);
>
>I can understand the use case, but this will be slightly unwanted,
>especially the user doesn't yet have a way to predict when will it happen.
>
>Meanwhile this changes the behavior, is it a concern that someone may want
>to rely on current behavior of failing?
>

I am happy to learn if there is some solid evidence supporting this view.

The target of compatibility is "private+discard" which seems itself pathological
practice in early discussion. The only difference in behavior that might be 
unwanted
in your argument lies in "readonly file+private+discard" failure time. Before 
the
patch it fails early, after the patch it fails later and may does additional 
stuff.
Do you think that a bug reporting mechanism which relies on qemu failure
timing is valid?

If someone argues that "readonly file+private+discard" early failure behavior
is all where their system relies, I would be happy to learn why. Actually this
is much easier to solve outside qemu, by checking memory file permission,
compared to the "readonly+private" alternative plan that requires a btrfs.

In the long run the "private+discard" setup may itself be warned and
deprecated, rather than "private+readonly file" which is supported by
linux kernel.

Current patch is already a compromise considering compatibility, as it
always tries open the file in readwrite mode first, to persist behavior on
"readwrite+private+discard" case. Personally I prefer to try opening the file
readonly first, as this will reduce the risk of opening ram file accidentally
with write permission.

However I can accept a second level of compromise, aka adding 
"-disallow-private-discard"
to qemu argument for at least three qemu releases before "private+discard"
get deprecated and removed, if extreme compatibility enthusiast is involved.
With this argument, readonly private is enabled and private
discard fails immediately when discard request is detected, and without this
argument readonly private is disabled and the behavior is unchanged.
This argument is useful also because