Re: [PATCH v2] audio/pw: Report more accurate error when connecting to PipeWire fails

2024-10-10 Thread Michal Prívozník
On 9/18/24 11:29, Marc-André Lureau wrote:
> On Wed, Sep 18, 2024 at 12:17 PM Michal Privoznik  wrote:
>>
>> According to its man page [1], pw_context_connect() sets errno on
>> failure:
>>
>>   Returns a Core on success or NULL with errno set on error.
>>
>> It may be handy to see errno when figuring out why PipeWire
>> failed to connect. That leaves just one possible path that can
>> reach the 'fail_error' label, so the label is moved onto that path
>> and its error message is adjusted slightly.
>>
>> 1: 
>> https://docs.pipewire.org/group__pw__core.html#ga5994e3a54e4ec718094ca02a1234815b
>>
>> Signed-off-by: Michal Privoznik 
> 
> Reviewed-by: Marc-André Lureau 

Thanks, can you merge it too please? I don't have commit access.

Michal
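A minimal sketch of the errno-reporting pattern the patch describes. The helper names and the pw_context_connect() stand-in below are hypothetical, not the actual audio/pw.c code; only the behavior (NULL return with errno set) comes from the PipeWire documentation quoted above:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for pw_context_connect(): returns NULL with
 * errno set on failure, mirroring the documented PipeWire behavior. */
static void *fake_pw_context_connect(void)
{
    errno = ECONNREFUSED;
    return NULL;
}

/* On NULL, include strerror(errno) in the error message instead of a
 * generic "failed to connect", so the user sees the underlying cause. */
static int connect_core(char *err, size_t errlen)
{
    void *core = fake_pw_context_connect();

    if (!core) {
        snprintf(err, errlen, "Failed to connect to PipeWire instance: %s",
                 strerror(errno));
        return -1;
    }
    return 0;
}
```

The real patch does the equivalent with QEMU's Error reporting machinery rather than a plain buffer.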




Re: [PATCH 0/2] i386/sev: Two trivial improvements to sev_get_capabilities()

2024-07-03 Thread Michal Prívozník
On 6/24/24 10:52, Michal Privoznik wrote:
> I've noticed that recent QEMU + libvirt (current HEADs, roughly) behave
> a bit differently than expected. The problem is in a recent change to the
> 'query-sev-capabilities' command (well, sev_get_capabilities() in fact)
> which libvirt uses (see patch 2/2). The first one is trivial.
> 
> Michal Privoznik (2):
>   i386/sev: Fix error message in sev_get_capabilities()
>   i386/sev: Fallback to the default SEV device if none provided in
> sev_get_capabilities()
> 
>  target/i386/sev.c | 12 ++--
>  1 file changed, 6 insertions(+), 6 deletions(-)
> 

Polite ping. Patch 2/2 is not reviewed yet and it causes problems for libvirt.

Michal




Re: [PATCH v3 1/4] osdep: Make qemu_madvise() to set errno in all cases

2024-06-03 Thread Michal Prívozník
On 6/3/24 10:50, Akihiko Odaki wrote:
> On 2024/06/03 16:56, Michal Prívozník wrote:
>> On 6/2/24 08:26, Akihiko Odaki wrote:
>>> On 2024/06/01 0:46, Philippe Mathieu-Daudé wrote:
>>>> On 31/5/24 17:10, Michal Privoznik wrote:
>>>>> The unspoken premise of qemu_madvise() is that errno is set on
>>>>> error. And it is mostly the case except for posix_madvise() which
>>>>> is documented to return either zero (on success) or a positive
>>>>> error number. This means, we must set errno ourselves. And while
>>>>> at it, make the function return a negative value on error, just
>>>>> like other error paths do.
>>>>>
>>>>> Signed-off-by: Michal Privoznik 
>>>>> ---
>>>>>    util/osdep.c | 14 +-
>>>>>    1 file changed, 13 insertions(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/util/osdep.c b/util/osdep.c
>>>>> index e996c4744a..1345238a5c 100644
>>>>> --- a/util/osdep.c
>>>>> +++ b/util/osdep.c
>>>>> @@ -57,7 +57,19 @@ int qemu_madvise(void *addr, size_t len, int
>>>>> advice)
>>>>>    #if defined(CONFIG_MADVISE)
>>>>>    return madvise(addr, len, advice);
>>>>>    #elif defined(CONFIG_POSIX_MADVISE)
>>>>> -    return posix_madvise(addr, len, advice);
>>>>> +    /*
>>>>> + * On Darwin posix_madvise() has the same return semantics as
>>>>> + * plain madvise, i.e. errno is set and -1 is returned.
>>>>> Otherwise,
>>>>> + * a positive error number is returned.
>>>>> + */
>>>>
>>>> Alternative is to guard with #ifdef CONFIG_DARWIN ... #else ... #endif
>>>> which might be clearer.
>>>>
>>>> Although this approach seems reasonable, so:
>>>> Reviewed-by: Philippe Mathieu-Daudé 
>>>
>>> We should use plain madvise() if posix_madvise() is broken. In fact,
>>> QEMU detects the availability of plain madvise() and uses it instead of
>>> posix_madvise() on my MacBook.
>>>
>>> Perhaps it may be better to stop defining CONFIG_POSIX_MADVISE on Darwin
>>> to ensure we never use the broken implementation.
>>>
>>
>> Well, doesn't Darwin have madvise() in the first place?
>>
>> https://opensource.apple.com/source/xnu/xnu-7195.81.3/bsd/man/man2/madvise.2.auto.html
>>
>> I thought that's the reason for posix_madvise() to behave the same as
>> madvise() there.
> 
> It does have madvise() and QEMU on my MacBook uses it instead of
> posix_madvise().
> 

I don't have a Mac myself, but I ran some tests on my colleague's Mac
and yes, posix_madvise() is basically just an alias to madvise(). No
dispute there.

> The behavior of posix_madvise() is probably just a bug (and perhaps it
> is too late for them to fix).
> 

So what does this mean for this patch? Should I resend with the change
you're suggesting, or is this good as is? I mean, posix_madvise() is not
going to be used on macOS anyway.

Michal




Re: [PATCH v3 1/4] osdep: Make qemu_madvise() to set errno in all cases

2024-06-03 Thread Michal Prívozník
On 6/2/24 08:26, Akihiko Odaki wrote:
> On 2024/06/01 0:46, Philippe Mathieu-Daudé wrote:
>> On 31/5/24 17:10, Michal Privoznik wrote:
>>> The unspoken premise of qemu_madvise() is that errno is set on
>>> error. And it is mostly the case except for posix_madvise() which
>>> is documented to return either zero (on success) or a positive
>>> error number. This means, we must set errno ourselves. And while
>>> at it, make the function return a negative value on error, just
>>> like other error paths do.
>>>
>>> Signed-off-by: Michal Privoznik 
>>> ---
>>>   util/osdep.c | 14 +-
>>>   1 file changed, 13 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/util/osdep.c b/util/osdep.c
>>> index e996c4744a..1345238a5c 100644
>>> --- a/util/osdep.c
>>> +++ b/util/osdep.c
>>> @@ -57,7 +57,19 @@ int qemu_madvise(void *addr, size_t len, int advice)
>>>   #if defined(CONFIG_MADVISE)
>>>   return madvise(addr, len, advice);
>>>   #elif defined(CONFIG_POSIX_MADVISE)
>>> -    return posix_madvise(addr, len, advice);
>>> +    /*
>>> + * On Darwin posix_madvise() has the same return semantics as
>>> + * plain madvise, i.e. errno is set and -1 is returned. Otherwise,
>>> + * a positive error number is returned.
>>> + */
>>
>> Alternative is to guard with #ifdef CONFIG_DARWIN ... #else ... #endif
>> which might be clearer.
>>
>> Although this approach seems reasonable, so:
>> Reviewed-by: Philippe Mathieu-Daudé 
> 
> We should use plain madvise() if posix_madvise() is broken. In fact,
> QEMU detects the availability of plain madvise() and uses it instead of
> posix_madvise() on my MacBook.
> 
> Perhaps it may be better to stop defining CONFIG_POSIX_MADVISE on Darwin
> to ensure we never use the broken implementation.
> 

Well, doesn't Darwin have madvise() in the first place?

https://opensource.apple.com/source/xnu/xnu-7195.81.3/bsd/man/man2/madvise.2.auto.html

I thought that's the reason for posix_madvise() to behave the same as madvise() 
there.

Michal




Re: [PATCH v3 1/4] osdep: Make qemu_madvise() to set errno in all cases

2024-06-03 Thread Michal Prívozník
On 5/31/24 17:46, Philippe Mathieu-Daudé wrote:
> On 31/5/24 17:10, Michal Privoznik wrote:
>> The unspoken premise of qemu_madvise() is that errno is set on
>> error. And it is mostly the case except for posix_madvise() which
>> is documented to return either zero (on success) or a positive
>> error number. This means, we must set errno ourselves. And while
>> at it, make the function return a negative value on error, just
>> like other error paths do.
>>
>> Signed-off-by: Michal Privoznik 
>> ---
>>   util/osdep.c | 14 +-
>>   1 file changed, 13 insertions(+), 1 deletion(-)
>>
>> diff --git a/util/osdep.c b/util/osdep.c
>> index e996c4744a..1345238a5c 100644
>> --- a/util/osdep.c
>> +++ b/util/osdep.c
>> @@ -57,7 +57,19 @@ int qemu_madvise(void *addr, size_t len, int advice)
>>   #if defined(CONFIG_MADVISE)
>>   return madvise(addr, len, advice);
>>   #elif defined(CONFIG_POSIX_MADVISE)
>> -    return posix_madvise(addr, len, advice);
>> +    /*
>> + * On Darwin posix_madvise() has the same return semantics as
>> + * plain madvise, i.e. errno is set and -1 is returned. Otherwise,
>> + * a positive error number is returned.
>> + */
> 
> Alternative is to guard with #ifdef CONFIG_DARWIN ... #else ... #endif
> which might be clearer.

That's how I had it written (locally) initially, but then thought: well,
what if there's another OS that behaves the same? This way, we don't
have to care and just do the right thing.

> 
> Although this approach seems reasonable, so:
> Reviewed-by: Philippe Mathieu-Daudé 

Thanks!

Michal




Re: [PATCH v2 1/4] osdep: Make qemu_madvise() to set errno in all cases

2024-05-31 Thread Michal Prívozník
On 5/31/24 11:08, David Hildenbrand wrote:
> On 31.05.24 10:12, Philippe Mathieu-Daudé wrote:
>> On 31/5/24 10:01, David Hildenbrand wrote:
>>> On 31.05.24 09:57, Philippe Mathieu-Daudé wrote:
 Hi Michal,

 On 31/5/24 09:28, Michal Privoznik wrote:
> The unspoken premise of qemu_madvise() is that errno is set on
> error. And it is mostly the case except for posix_madvise() which
> is documented to return either zero (on success) or a positive
> error number.

 Watch out, Linux:

  RETURN VALUE

     On success, posix_madvise() returns 0.  On failure,
     it returns a positive error number.

 but on Darwin:

  RETURN VALUES

     Upon successful completion, a value of 0 is returned.
     Otherwise, a value of -1 is returned and errno is set
     to indicate the error.

 (Haven't checked other POSIX OSes).

>>>
>>> ... but it's supposed to follow the "posix standard" :D Maybe an issue
>>> in the docs?
>>>
>>> FreeBSD seems to follow the standard: "The posix_madvise() interface is
>>> identical, except it returns an error number on error and does not
>>> modify errno, and is provided for standards conformance."
>>>
>>> Same with OpenBSD: "The posix_madvise() interface has the same effect,
>>> but returns the error value instead of only setting errno."
>>
>> On Darwin, MADVISE(2):
>>
>>  The posix_madvise() behaves same as madvise() except that it uses
>>  values with POSIX_ prefix for the advice system call argument.
>>
>>  The posix_madvise function is part of IEEE 1003.1-2001 and was first
>>  implemented in Mac OS X 10.2.
>>
>> Per IEEE 1003.1-2001
>> (https://pubs.opengroup.org/onlinepubs/009695399/functions/posix_madvise.html):
>>
>>     RETURN VALUE
>>
>>   Upon successful completion, posix_madvise() shall return zero;
>>   otherwise, an error number shall be returned to indicate the error.
>>
>> Note the use of "shall" which is described in RFC2119 as:
>>
>>  This word, or the adjective "RECOMMENDED", mean that there
>>  may exist valid reasons in particular circumstances to ignore a
>>  particular item, but the full implications must be understood and
>>  carefully weighed before choosing a different course.
> 
> Agreed, so we have to be careful.
> 
> I do wonder if there would be the option for an automatic approach: for
> example, detect if the errno was/was not changed. Hm.
> 

Firstly, thanks Philippe for this great catch! I did think that the
"posix_" prefix might mean POSIX is followed. Anyway, it looks like the
common denominator is: on success, 0 is returned. Then, on Darwin, errno
is set and -1 is returned; on everything(?) else, a positive value is
returned and errno is left untouched. So I think we can get away with
something like the following:

int rc = posix_madvise(addr, len, advice);
if (rc) {
    if (rc > 0) {
        errno = rc;
    }
    return -1;
}
return 0;

Plus a comment explaining the difference on Darwin.

Michal




Re: [PATCH 3/3] backends/hostmem: Round up memory size for qemu_madvise() and mbind()

2024-05-28 Thread Michal Prívozník
On 5/28/24 18:47, David Hildenbrand wrote:
> Am 28.05.24 um 18:15 schrieb Michal Privoznik:
>> ./build/qemu-system-x86_64 \
>>   -m size=8389632k,slots=16,maxmem=2560k \
>>   -object '{"qom-type":"memory-backend-file","id":"ram-node0","mem-path":"hugepages2M","prealloc":true,"size":8590983168,"host-nodes":[0],"policy":"bind"}' \
>>   -numa node,nodeid=0,cpus=0,memdev=ram-node0
> 
> For DIMMs and friends we now (again) enforce that the size must be
> aligned to the page size:
> 
> commit 540a1abbf0b243e4cfb4333c5d30a041f7080ba4
> Author: David Hildenbrand 
> Date:   Wed Jan 17 14:55:54 2024 +0100
> 
>     memory-device: reintroduce memory region size check
> 
>     We used to check that the memory region size is multiples of the
> overall
>     requested address alignment for the device memory address.
> 
>     We removed that check, because there are cases (i.e., hv-balloon) where
>     devices unconditionally request an address alignment that has a very
> large
>     alignment (i.e., 32 GiB), but the actual memory device size might
> not be
>     multiples of that alignment.
> 
>     However, this change:
> 
>     (a) allows for some practically impossible DIMM sizes, like "1GB+1
> byte".
>     (b) allows for DIMMs that partially cover hugetlb pages, previously
>     reported in [1].
> ...
> 
> Partial hugetlb pages do not particularly make sense; wasting memory. Do
> we expect people to actually have a working setup that we might break when
> disallowing such configurations? Or would we actually help them identify
> that something non-sensical is happening?
> 
> When using memory-backend-memfd, we already do get a proper error:
> 
>  ./build/qemu-system-x86_64 -m 2047m -object
> memory-backend-memfd,id=ram-node0,hugetlb=on,size=2047m,reserve=off
> -numa node,nodeid=0,cpus=0,memdev=ram-node0 -S
> qemu-system-x86_64: failed to resize memfd to 2146435072: Invalid argument
> 
> So this only applies to memory-backend-file, where we maybe should fail
> in a similar way?
> 

Yeah, let's fail in that case. But non-aligned length is just one of
many reasons madvise()/mbind() can fail. What about the others? Should
we make them report an error or just a warning?

Michal




Re: [PATCH] qga: Add an interactive mode to guest-exec via VSOCK for Linux

2024-05-22 Thread Michal Prívozník
On 5/22/24 17:06, Alexander Ivanov wrote:
> Add an interactive mode to the guest-exec command in the QEMU Guest Agent
> using the VSOCK communication mechanism. It enables interactive sessions
> with the executed command in the guest, allowing real-time input/output.
> 
> Introduce "interactive" mode in the GuestExecCaptureOutputMode enumeration
> and add optional "cid" and "port" fields to the guest-exec response. In
> such a way user can execute guest-exec command, get CID and port number
> from the response and connect to the guest server. After connection user
> can communicate with the started process. All the data transmitted to the
> server is redirected to stdin. Data from stdout and stderr is redirected
> to the client. All data blocks are preceded by 32-bit headers (network
> byte order): most significant bit contains a sign of stream (stdout - 0,
> stderr - 1), all the other bits contain the payload size.
> 
> Signed-off-by: Alexander Ivanov 
> ---
>  qga/commands.c   | 272 +--
>  qga/qapi-schema.json |  11 +-
>  2 files changed, 273 insertions(+), 10 deletions(-)

FYI, libvirt recently added support for SSH over VSOCK:

https://libvirt.org/ssh-proxy.html

and it'll be released soon:

https://libvirt.org/news.html

Though, it requires (soon to be released) systemd to automatically set
up SSHD to listen on VSOCK (but it can be configured manually).

Michal
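The 32-bit frame header described in the quoted patch can be sketched as follows. The helper names are illustrative; only the layout (MSB for the stream, low 31 bits for the payload size, network byte order) comes from the patch description:

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdint.h>

/* Encode the header: MSB = stream (0 for stdout, 1 for stderr), the
 * remaining 31 bits = payload size, converted to network byte order. */
static uint32_t frame_header(int is_stderr, uint32_t payload_len)
{
    uint32_t h = (payload_len & 0x7fffffffu) |
                 ((uint32_t)(is_stderr ? 1 : 0) << 31);
    return htonl(h);
}

/* Decode a header received from the wire. */
static void frame_parse(uint32_t wire, int *is_stderr, uint32_t *payload_len)
{
    uint32_t h = ntohl(wire);

    *is_stderr = (int)(h >> 31);
    *payload_len = h & 0x7fffffffu;
}
```

A client would read four bytes, parse them, then read exactly payload_len bytes of stdout or stderr data.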




Re: [PATCH] hostmem: Round up memory size for qemu_madvise() in host_memory_backend_memory_complete()

2023-12-04 Thread Michal Prívozník
On 12/4/23 10:21, David Hildenbrand wrote:
> On 01.12.23 10:07, Michal Prívozník wrote:
>> On 11/27/23 14:55, David Hildenbrand wrote:
>>> On 27.11.23 14:37, David Hildenbrand wrote:
>>>> On 27.11.23 13:32, Michal Privoznik wrote:
>>>>> Simple reproducer:
>>>>> qemu.git $ ./build/qemu-system-x86_64 \
>>>>> -m size=8389632k,slots=16,maxmem=2560k \
>>>>> -object
>>>>> '{"qom-type":"memory-backend-file","id":"ram-node0","mem-path":"/hugepages2M/","prealloc":true,"size":8590983168,"host-nodes":[0],"policy":"bind"}'
>>>>>  \
>>>>> -numa node,nodeid=0,cpus=0,memdev=ram-node0
>>>>>
>>>>> With current master I get:
>>>>>
>>>>> qemu-system-x86_64: cannot bind memory to host NUMA nodes: Invalid
>>>>> argument
>>>>>
>>>>> The problem is that memory size (8193MiB) is not an integer
>>>>> multiple of underlying pagesize (2MiB) which triggers a check
>>>>> inside of madvise(), since we can't really set a madvise() policy
>>>>> just to a fraction of a page.
>>>>
>>>> I thought we would just always fail create something that doesn't
>>>> really
>>>> make any sense.
>>>>
>>>> Why would we want to support that case?
>>>>
>>>> Let me dig, I thought we would have had some check there at some point
>>>> that would make that fail (especially: RAM block not aligned to the
>>>> pagesize).
>>>
>>>
>>> At least memory-backend-memfd properly fails for that case:
>>>
>>> $ ./build/qemu-system-x86_64 -object
>>> memory-backend-memfd,hugetlb=on,size=3m,id=tmp
>>> qemu-system-x86_64: failed to resize memfd to 3145728: Invalid argument
>>>
>>> memory-backend-file ends up creating a new file:
>>>
>>>   $ ./build/qemu-system-x86_64 -object
>>> memory-backend-file,share=on,mem-path=/dev/hugepages/tmp,size=3m,id=tmp
>>>
>>> $ stat /dev/hugepages/tmp
>>>    File: /dev/hugepages/tmp
>>>    Size: 4194304 Blocks: 0  IO Block: 2097152 regular
>>> file
>>>
>>> ... and ends up sizing it properly aligned to the huge page size.
>>>
>>>
>>> Seems to be due to:
>>>
>>>  if (memory < block->page_size) {
>>>  error_setg(errp, "memory size 0x" RAM_ADDR_FMT " must be
>>> equal to "
>>>     "or larger than page size 0x%zx",
>>>     memory, block->page_size);
>>>  return NULL;
>>>  }
>>>
>>>  memory = ROUND_UP(memory, block->page_size);
>>>
>>>  /*
>>>   * ftruncate is not supported by hugetlbfs in older
>>>   * hosts, so don't bother bailing out on errors.
>>>   * If anything goes wrong with it under other filesystems,
>>>   * mmap will fail.
>>>   *
>>>   * Do not truncate the non-empty backend file to avoid corrupting
>>>   * the existing data in the file. Disabling shrinking is not
>>>   * enough. For example, the current vNVDIMM implementation stores
>>>   * the guest NVDIMM labels at the end of the backend file. If the
>>>   * backend file is later extended, QEMU will not be able to find
>>>   * those labels. Therefore, extending the non-empty backend file
>>>   * is disabled as well.
>>>   */
>>>  if (truncate && ftruncate(fd, offset + memory)) {
>>>  perror("ftruncate");
>>>  }
>>>
>>> So we create a bigger file and map the bigger file and also have a
>>> RAMBlock that is bigger. So we'll also consume more memory.
>>>
>>> ... but the memory region is smaller and we tell the VM that it has
>>> less memory. Lot of work with no obvious benefit, and only some
>>> memory waste :)
>>>
>>>
>>> We better should have just rejected such memory backends right from
>>> the start. But now it's likely too late.
>>>
>>> I suspect other things like
>>>   * qemu_madvise(ptr, sz, QEMU_MADV_MERGEABLE);
>>>   * qemu_madvise(ptr, sz, QEMU_MADV_DONTDUMP);
>>>
>>> fail, but we don't care for hugetlb at least regarding merging and
>>> don't even log an error.

Re: [PATCH] hostmem: Round up memory size for qemu_madvise() in host_memory_backend_memory_complete()

2023-12-01 Thread Michal Prívozník
On 11/27/23 14:55, David Hildenbrand wrote:
> On 27.11.23 14:37, David Hildenbrand wrote:
>> On 27.11.23 13:32, Michal Privoznik wrote:
>>> Simple reproducer:
>>> qemu.git $ ./build/qemu-system-x86_64 \
>>> -m size=8389632k,slots=16,maxmem=2560k \
>>> -object
>>> '{"qom-type":"memory-backend-file","id":"ram-node0","mem-path":"/hugepages2M/","prealloc":true,"size":8590983168,"host-nodes":[0],"policy":"bind"}'
>>>  \
>>> -numa node,nodeid=0,cpus=0,memdev=ram-node0
>>>
>>> With current master I get:
>>>
>>> qemu-system-x86_64: cannot bind memory to host NUMA nodes: Invalid
>>> argument
>>>
>>> The problem is that memory size (8193MiB) is not an integer
>>> multiple of underlying pagesize (2MiB) which triggers a check
>>> inside of madvise(), since we can't really set a madvise() policy
>>> just to a fraction of a page.
>>
>> I thought we would just always fail create something that doesn't really
>> make any sense.
>>
>> Why would we want to support that case?
>>
>> Let me dig, I thought we would have had some check there at some point
>> that would make that fail (especially: RAM block not aligned to the
>> pagesize).
> 
> 
> At least memory-backend-memfd properly fails for that case:
> 
> $ ./build/qemu-system-x86_64 -object
> memory-backend-memfd,hugetlb=on,size=3m,id=tmp
> qemu-system-x86_64: failed to resize memfd to 3145728: Invalid argument
> 
> memory-backend-file ends up creating a new file:
> 
>  $ ./build/qemu-system-x86_64 -object
> memory-backend-file,share=on,mem-path=/dev/hugepages/tmp,size=3m,id=tmp
> 
> $ stat /dev/hugepages/tmp
>   File: /dev/hugepages/tmp
>   Size: 4194304 Blocks: 0  IO Block: 2097152 regular file
> 
> ... and ends up sizing it properly aligned to the huge page size.
> 
> 
> Seems to be due to:
> 
>     if (memory < block->page_size) {
>     error_setg(errp, "memory size 0x" RAM_ADDR_FMT " must be equal to "
>    "or larger than page size 0x%zx",
>    memory, block->page_size);
>     return NULL;
>     }
> 
>     memory = ROUND_UP(memory, block->page_size);
> 
>     /*
>  * ftruncate is not supported by hugetlbfs in older
>  * hosts, so don't bother bailing out on errors.
>  * If anything goes wrong with it under other filesystems,
>  * mmap will fail.
>  *
>  * Do not truncate the non-empty backend file to avoid corrupting
>  * the existing data in the file. Disabling shrinking is not
>  * enough. For example, the current vNVDIMM implementation stores
>  * the guest NVDIMM labels at the end of the backend file. If the
>  * backend file is later extended, QEMU will not be able to find
>  * those labels. Therefore, extending the non-empty backend file
>  * is disabled as well.
>  */
>     if (truncate && ftruncate(fd, offset + memory)) {
>     perror("ftruncate");
>     }
> 
> So we create a bigger file and map the bigger file and also have a
> RAMBlock that is bigger. So we'll also consume more memory.
> 
> ... but the memory region is smaller and we tell the VM that it has
> less memory. Lot of work with no obvious benefit, and only some
> memory waste :)
> 
> 
> We better should have just rejected such memory backends right from
> the start. But now it's likely too late.
> 
> I suspect other things like
>  * qemu_madvise(ptr, sz, QEMU_MADV_MERGEABLE);
>  * qemu_madvise(ptr, sz, QEMU_MADV_DONTDUMP);
> 
> fail, but we don't care for hugetlb at least regarding merging
> and don't even log an error.
> 
> But QEMU_MADV_DONTDUMP might also be broken, because that
> qemu_madvise() call will just fail.
> 
> Your fix would be correct. But I do wonder if we want to just let that
> case fail and warn users that they are doing something that doesn't
> make too much sense.
> 

Yeah, what's suspicious is: if the size is smaller than the page size we
error out, but if it's larger (though still not aligned) we accept it.
I'm failing to see the reasoning there. Looks like the ROUND_UP() was added
in v0.13.0-rc0~1201^2~4 (though it's done with some bit magic) and the
check itself was added in v2.8.0-rc0~30^2~23. So it's a bit late, yes.

OTOH - if users want to waste resources, should we stop them? For
instance, when user requests more vCPUs than physical CPUs a warning is
printed:

$ ./build/qemu-system-x86_64 -accel kvm -smp cpus=128
qemu-system-x86_64: -accel kvm: warning: Number of SMP cpus requested
(128) exceeds the recommended cpus supported by KVM (8)

but that's about it. So maybe the error can be demoted to just a warning?

Michal
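The misalignment from the reproducer is easy to check with ROUND_UP()-style arithmetic (a self-contained sketch; the macro mirrors the one in QEMU's headers, and both assume a power-of-two page size). 8590983168 bytes is 8193 MiB, half a 2 MiB hugepage past the nearest boundary:

```c
#include <assert.h>
#include <stdint.h>

/* Round n up to a multiple of the power-of-two d, as QEMU's ROUND_UP
 * macro does for the backend file size. */
#define ROUND_UP(n, d) (((n) + (d) - 1) & ~((uint64_t)(d) - 1))

/* True when size covers whole pages; mbind()/madvise() on a partial
 * page is what triggers EINVAL in the reproducer. */
static int is_backend_size_aligned(uint64_t size, uint64_t page_size)
{
    return (size & (page_size - 1)) == 0;
}
```

With a 2 MiB page size, 8590983168 fails the alignment check and rounds up to 8592031744 (8194 MiB), which is what the backend file ends up sized to.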




Re: [PATCH v2 00/10] meson: replace submodules with wrap files

2023-06-07 Thread Michal Prívozník
On 6/7/23 09:47, Daniel P. Berrangé wrote:
> On Wed, Jun 07, 2023 at 09:41:40AM +0200, Michal Prívozník wrote:
>> On 6/5/23 11:52, Paolo Bonzini wrote:
>>> This series replaces git submodules for bundled libraries with .wrap
>>> files that can be used directly by meson for subprojects. 
>>
>> Pardon my lack of knowledge, but even after I clone a new repo and run:
>>
>>   ./configure --enable-download && make && make test
>>
>> I still see berkeley-softfloat-3 submodule missing:
>>
>>   git submodule status
>>   ...
>>   0c37a43527f0ee2b9584e7fb2fdc805e902635ac roms/vbootrom
>>   fatal: no submodule mapping found in .gitmodules for path
>> 'tests/fp/berkeley-softfloat-3'
>>
>> Is this expected?
> 
> Yet another example of submodules sucking. Once we removed the submodules
> from .gitmodules, git doesn't know what to do with the existing checked
> out submodules from before this time.
> 
> Best thing to do is purge all existing submodules, e.g.
> 
>   git submodule deinit --all --force
> 
> and if there are stale directories left over, manually delete those too,
> so you get back to a more pristine checkout state.

I'm not sure that helps. I mean:

  git clone https://gitlab.com/qemu-project/qemu.git qemu2.git && \
  cd qemu2.git/ && \
  git submodule status

still complains:

  fatal: no submodule mapping found in .gitmodules for path
'tests/fp/berkeley-softfloat-3'

Michal




Re: [PATCH v2 00/10] meson: replace submodules with wrap files

2023-06-07 Thread Michal Prívozník
On 6/5/23 11:52, Paolo Bonzini wrote:
> This series replaces git submodules for bundled libraries with .wrap
> files that can be used directly by meson for subprojects. 

Pardon my lack of knowledge, but even after I clone a new repo and run:

  ./configure --enable-download && make && make test

I still see berkeley-softfloat-3 submodule missing:

  git submodule status
  ...
  0c37a43527f0ee2b9584e7fb2fdc805e902635ac roms/vbootrom
  fatal: no submodule mapping found in .gitmodules for path
'tests/fp/berkeley-softfloat-3'

Is this expected?

Michal




Re: [RFC 0/3] virtio-blk: add iothread-vq-mapping parameter

2023-02-06 Thread Michal Prívozník
On 1/18/23 20:47, Stefan Hajnoczi wrote:
> This is a preview of the iothread-vq-mapping parameter that assigns virtqueues
> to IOThreads. The syntax is implemented but multiple IOThreads are not 
> actually
> supported yet. The purpose of this RFC is to reach agreement on the syntax and
> to prepare for libvirt support.
> 
> virtio-blk and virtio-scsi devices will need a way to specify the
> mapping between IOThreads and virtqueues. At the moment all virtqueues
> are assigned to a single IOThread or the main loop. This single thread
> can be a CPU bottleneck, so it is necessary to allow finer-grained
> assignment to spread the load.
> 
> This series introduces command-line syntax for the new iothread-vq-mapping
> property is as follows:
> 
>   --device 
> '{"driver":"virtio-blk-pci","iothread-vq-mapping":[{"iothread":"iothread0","vqs":[0,1,2]},...]},...'
> 
> IOThreads are specified by name and virtqueues are specified by 0-based
> index.
> 
> It will be common to simply assign virtqueues round-robin across a set
> of IOThreads. A convenient syntax that does not require specifying
> individual virtqueue indices is available:
> 
>   --device 
> '{"driver":"virtio-blk-pci","iothread-vq-mapping":[{"iothread":"iothread0"},{"iothread":"iothread1"},...]},...'
> 
> There is no way to reassign virtqueues at runtime and I expect that to be a
> very rare requirement.
> 
> Perhaps libvirt only needs to support round-robin because specifying 
> individual
> virtqueues is very specific and probably only useful for low-level performance
> investigation. The libvirt domain XML syntax for this could be:
> 
>   
> 
>   
>   
>   
> 
> ...
>   

Just for completeness, this is how the disk XML looks now:

  




  

It corresponds to the following cmd line:

  -blockdev 
'{"driver":"file","filename":"/tmp/data.qcow","node-name":"libvirt-3-storage","auto-read-only":true,"discard":"unmap"}'
 \
  -blockdev 
'{"node-name":"libvirt-3-format","read-only":false,"driver":"qcow2","file":"libvirt-3-storage"}'
 \
  -device 
'{"driver":"virtio-blk-pci","iothread":"iothread1","num-queues":4,"bus":"pci.0","addr":"0x3","drive":"libvirt-3-format","id":"virtio-disk0","bootindex":1}'
 \

We already have the @iothread attribute, so inventing an 
subelement is a bit misleading, because if users query which disk uses
iothreads, they need to change their XPath. Currently they can get away
with:

  //disk[driver/@iothread]/source/@file

but I guess for backwards compatibility, we can put the first iothread
ID into the attribute, e.g.:

  

 
 

  


We've done something similar when introducing multiple listen addresses
for VNC.

Now, an iothread is actually a thread pool. I guess we will never ever
need to assign individual threads from the pool to queues, right?

Michal
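The round-robin assignment mentioned in the quoted RFC (listing IOThreads without per-virtqueue indices) reduces to a simple modulo mapping. A hypothetical helper, not libvirt or QEMU code:

```c
#include <assert.h>

/* Map virtqueue index vq to one of n_iothreads IOThreads round-robin,
 * as implied by a mapping like
 *   [{"iothread":"iothread0"},{"iothread":"iothread1"},...]
 * with no explicit "vqs" arrays. n_iothreads must be non-zero. */
static unsigned vq_to_iothread(unsigned vq, unsigned n_iothreads)
{
    return vq % n_iothreads;
}
```

So with 4 queues and 2 IOThreads, queues 0 and 2 land on iothread0 and queues 1 and 3 on iothread1.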




Re: [PATCH 0/3] util/userfaultfd: Support /dev/userfaultfd

2023-01-31 Thread Michal Prívozník
On 1/31/23 22:01, Peter Xu wrote:
> I'll wait 1-2 more days to see whether Michal has anything to comment.

Yeah, we can go with your patches and leave FD passing for future work.
It's orthogonal after all.

In the end we can have (in the order of precedence):

1) new cmd line argument, say:

   qemu-system-x86_64 -userfaultfd fd=5
   # where FD 5 is passed by libvirt when exec()-ing QEMU, just like
   # other FDs, e.g. -chardev socket,fd=XXX

2) your patches, where qemu opens /dev/userfaultfd

3) current behavior, userfaultfd syscall


Michal




Re: [PATCH 0/3] util/userfaultfd: Support /dev/userfaultfd

2023-01-26 Thread Michal Prívozník
On 1/26/23 16:25, Peter Xu wrote:
> On Thu, Jan 26, 2023 at 02:15:11PM +, Dr. David Alan Gilbert wrote:
>> * Michal Prívozník (mpriv...@redhat.com) wrote:
>>> On 1/25/23 23:40, Peter Xu wrote:
>>>> The new /dev/userfaultfd handle is superior to the system call with a
>>>> better permission control and also works for a restricted seccomp
>>>> environment.
>>>>
>>>> The new device was only introduced in v6.1 so we need a header update.
>>>>
>>>> Please have a look, thanks.
>>>
>>> I was wondering whether it would make sense/be possible for mgmt app
>>> (libvirt) to pass FD for /dev/userfaultfd instead of QEMU opening it
>>> itself. But looking into the code, libvirt would need to do that when
>>> spawning QEMU because that's when QEMU itself initializes internal state
>>> and queries userfaultfd caps.
>>
>> You also have to be careful about what the userfaultfd semantics are; I
>> can't remember them - but if you open it in one process and pass it to
>> another process, which processes address space are you trying to
>> monitor?
> 
> Yes it's a problem.  The kernel always fetches the current mm_struct* which
> represents the current context of virtual address space when creating the
> uffd handle (for either the syscall or the ioctl() approach).

Ah, I did not realize that.

> 
> It works only if Libvirt will invoke QEMU as a thread and they'll share the
> same address space.
> 
> Why libvirt would like to do so?

Well, we tend to pass files as FDs more and more, because it allows us to
give an unprivileged process access to "privileged" files. What I did
not realize is that userfaultfd is different, not just yet another file.

Michal




Re: [PATCH 0/3] util/userfaultfd: Support /dev/userfaultfd

2023-01-26 Thread Michal Prívozník
On 1/25/23 23:40, Peter Xu wrote:
> The new /dev/userfaultfd handle is superior to the system call with a
> better permission control and also works for a restricted seccomp
> environment.
> 
> The new device was only introduced in v6.1 so we need a header update.
> 
> Please have a look, thanks.

I was wondering whether it would make sense/be possible for mgmt app
(libvirt) to pass FD for /dev/userfaultfd instead of QEMU opening it
itself. But looking into the code, libvirt would need to do that when
spawning QEMU because that's when QEMU itself initializes internal state
and queries userfaultfd caps.

Michal




Re: [PATCH v2] hostmem: Honor multiple preferred nodes if possible

2022-12-19 Thread Michal Prívozník
On 12/19/22 10:58, David Hildenbrand wrote:
> 
> I'll fixup. I just queued the fixed-up patch to
> 
> https://github.com/davidhildenbrand/qemu.git mem-next
> 
> Please double-check. Thanks!
> 

Looks good. Sorry for not doing it properly the first time.

Michal




Re: [PATCH v2] hostmem: Honor multiple preferred nodes if possible

2022-12-19 Thread Michal Prívozník
On 12/19/22 10:55, David Hildenbrand wrote:
> On 16.12.22 14:47, Michal Prívozník wrote:
>> On 12/16/22 14:41, David Hildenbrand wrote:
>>> On 15.12.22 10:55, Michal Privoznik wrote:
>>>> If a memory-backend is configured with mode
>>>> HOST_MEM_POLICY_PREFERRED then
>>>> host_memory_backend_memory_complete() calls mbind() as:
>>>>
>>>>     mbind(..., MPOL_PREFERRED, nodemask, ...);
>>>>
>>>> Here, 'nodemask' is a bitmap of host NUMA nodes and corresponds
>>>> to the .host-nodes attribute. Therefore, there can be multiple
>>>> nodes specified. However, the documentation to MPOL_PREFERRED
>>>> says:
>>>>
>>>>     MPOL_PREFERRED
>>>>   This mode sets the preferred node for allocation. ...
>>>>   If nodemask specifies more than one node ID, the first node
>>>>   in the mask will be selected as the preferred node.
>>>>
>>>> Therefore, only the first node is honored and the rest is
>>>> silently ignored. Well, with recent changes to the kernel and
>>>> numactl we can do better.
>>>>
>>>> The Linux kernel added in v5.15 via commit cfcaa66f8032
>>>> ("mm/hugetlb: add support for mempolicy MPOL_PREFERRED_MANY")
>>>> support for MPOL_PREFERRED_MANY, which accepts multiple preferred
>>>> NUMA nodes instead.
>>>>
>>>> Then, numa_has_preferred_many() API was introduced to numactl
>>>> (v2.0.15~26) allowing applications to query kernel support.
>>>>
>>>> Wiring this all together, we can pass MPOL_PREFERRED_MANY to the
>>>> mbind() call instead and stop ignoring multiple nodes, silently.
>>>>
>>>> Signed-off-by: Michal Privoznik 
>>>> ---
>>>
>>> [...]
>>>
>>>> +#ifdef HAVE_NUMA_SET_PREFERRED_MANY
> 
> That should be HAVE_NUMA_HAS_PREFERRED_MANY, right?
> 

Oops, yes. Do you want me to send v3?

Michal




Re: [PATCH v2] hostmem: Honor multiple preferred nodes if possible

2022-12-16 Thread Michal Prívozník
On 12/16/22 14:41, David Hildenbrand wrote:
> On 15.12.22 10:55, Michal Privoznik wrote:
>> If a memory-backend is configured with mode
>> HOST_MEM_POLICY_PREFERRED then
>> host_memory_backend_memory_complete() calls mbind() as:
>>
>>    mbind(..., MPOL_PREFERRED, nodemask, ...);
>>
>> Here, 'nodemask' is a bitmap of host NUMA nodes and corresponds
>> to the .host-nodes attribute. Therefore, there can be multiple
>> nodes specified. However, the documentation to MPOL_PREFERRED
>> says:
>>
>>    MPOL_PREFERRED
>>  This mode sets the preferred node for allocation. ...
>>  If nodemask specifies more than one node ID, the first node
>>  in the mask will be selected as the preferred node.
>>
>> Therefore, only the first node is honored and the rest is
>> silently ignored. Well, with recent changes to the kernel and
>> numactl we can do better.
>>
>> The Linux kernel added in v5.15 via commit cfcaa66f8032
>> ("mm/hugetlb: add support for mempolicy MPOL_PREFERRED_MANY")
>> support for MPOL_PREFERRED_MANY, which accepts multiple preferred
>> NUMA nodes instead.
>>
>> Then, numa_has_preferred_many() API was introduced to numactl
>> (v2.0.15~26) allowing applications to query kernel support.
>>
>> Wiring this all together, we can pass MPOL_PREFERRED_MANY to the
>> mbind() call instead and stop ignoring multiple nodes, silently.
>>
>> Signed-off-by: Michal Privoznik 
>> ---
> 
> [...]
> 
>> +#ifdef HAVE_NUMA_SET_PREFERRED_MANY
>> +    if (mode == MPOL_PREFERRED && numa_has_preferred_many() > 0) {
>> +    /*
>> + * Replace with MPOL_PREFERRED_MANY otherwise the mbind()
>> below
>> + * silently picks the first node.
>> + */
>> +    mode = MPOL_PREFERRED_MANY;
>> +    }
>> +#endif
>> +
> 
> 
> I was curios if we want to warn the user if "mode ==
> MPOL_PREFERRED_MANY" and we were given more than one node.

I was wondering about that, but given that we currently silently ignore
the other nodes, I think it's safe to leave the warning out. Besides, as
users upgrade to newer kernels this is going to be the new norm and the
warning would soon become obsolete anyway.

> 
> 
> Apart from that LGTM, thanks
> 
> Reviewed-by: David Hildenbrand 
> 

Thanks,
Michal




Re: [PATCH v2] seccomp: Get actual errno value from failed seccomp functions

2022-10-27 Thread Michal Prívozník
On 10/26/22 14:33, Daniel P. Berrangé wrote:
> On Wed, Oct 26, 2022 at 09:30:24AM +0200, Michal Privoznik wrote:
>> Upon failure, a libseccomp API returns actual errno value very
>> rarely. Fortunately, after its commit 34bf78ab (contained in
>> 2.5.0 release), the SCMP_FLTATR_API_SYSRAWRC attribute can be set
>> which makes subsequent APIs return true errno on failure.
>>
>> This is especially critical when seccomp_load() fails, because
>> generic -ECANCELED says nothing.
>>
>> Signed-off-by: Michal Privoznik 
>> Reviewed-by: Philippe Mathieu-Daudé 
>> ---
>>
>> v2 of:
>>
>> https://lists.gnu.org/archive/html/qemu-devel/2022-10/msg04509.html
>>
>> diff to v1:
>> - added comment when setting SYSRAWRC attribute per Philippe's
>>   suggestion
>>
>>  meson.build|  9 +
>>  softmmu/qemu-seccomp.c | 13 +
>>  2 files changed, 22 insertions(+)
> 
> Reviewed-by: Daniel P. Berrangé 

Thank you. Can somebody merge this please? I don't have commit access.

Michal




Re: [PATCH] configure: Avoid using strings binary

2022-10-13 Thread Michal Prívozník
On 10/13/22 12:39, Peter Maydell wrote:
> On Thu, 13 Oct 2022 at 09:47, Michal Privoznik  wrote:
>>
>> When determining the endianness of the target architecture we're
>> building for, a small program is compiled which, in an obfuscated
>> way, declares two strings. Then, we look at which string is in
>> correct order (using the strings binary) and deduce the endianness.
>> But using the strings binary is problematic, because it's part of the
>> toolchain (strings is just a symlink to
>> x86_64-pc-linux-gnu-strings or llvm-strings). And when
>> (cross-)compiling, it requires users to set the symlink to the
>> correct toolchain.
>>
>> Fortunately, we have a better alternative anyways. Since we
>> require either clang or gcc we can rely on macros they declare.
>>
>> Bug: https://bugs.gentoo.org/876933
>> Signed-off-by: Michal Privoznik 
> 
> If we can determine this just by looking at C macros, does
> this really need to be a configure test at all ? Paolo?

Yes, because we're using this information to generate a file for meson
that's later used during cross compilation.

Michal




Re: [PATCH] configure: Avoid using strings binary

2022-10-13 Thread Michal Prívozník
On 10/13/22 10:37, Michal Privoznik wrote:
> When determining the endianness of the target architecture we're
> building for, a small program is compiled which, in an obfuscated
> way, declares two strings. Then, we look at which string is in
> correct order (using the strings binary) and deduce the endianness.
> But using the strings binary is problematic, because it's part of the
> toolchain (strings is just a symlink to
> x86_64-pc-linux-gnu-strings or llvm-strings). And when
> (cross-)compiling, it requires users to set the symlink to the
> correct toolchain.
> 
> Fortunately, we have a better alternative anyways. Since we
> require either clang or gcc we can rely on macros they declare.
> 
> Bug: https://bugs.gentoo.org/876933
> Signed-off-by: Michal Privoznik 
> ---
>  configure | 33 ++---
>  1 file changed, 18 insertions(+), 15 deletions(-)
> 
> diff --git a/configure b/configure
> index 45ee6f4eb3..91e04635cb 100755
> --- a/configure
> +++ b/configure
> @@ -1426,27 +1426,30 @@ fi
>  # ---
>  # big/little endian test
>  cat > $TMPC << EOF
> -#include 
> -short big_endian[] = { 0x4269, 0x4765, 0x4e64, 0x4961, 0x4e00, 0, };
> -short little_endian[] = { 0x694c, 0x7454, 0x654c, 0x6e45, 0x6944, 0x6e41, 0, 
> };
> -int main(int argc, char *argv[])
> -{
> -return printf("%s %s\n", (char *)big_endian, (char *)little_endian);
> -}
> +#if defined(__BYTE_ORDER) && __BYTE_ORDER == __BIG_ENDIAN || \

Actually, this needs to be __BYTE_ORDER__ (missing those two underscores
at the end).

> +defined(__BIG_ENDIAN__)
> +# error BIG
> +#endif
> +int main(void) { return 0; }
>  EOF
>  
>  if compile_prog ; then
> -if strings -a $TMPE | grep -q BiGeNdIaN ; then
> -bigendian="yes"
> -elif strings -a $TMPE | grep -q LiTtLeEnDiAn ; then
> -bigendian="no"
> -else
> -echo big/little test failed
> -exit 1
> -fi
> +  bigendian="yes"

And this needs to be "no". Will post v2 shortly.

Michal




Re: [PATCH RFC 0/7] hostmem: NUMA-aware memory preallocation using ThreadContext

2022-09-21 Thread Michal Prívozník
On 9/21/22 16:54, David Hildenbrand wrote:
> On 21.09.22 16:44, Michal Prívozník wrote:
>> On 7/21/22 14:07, David Hildenbrand wrote:
>>>
>>
>> Ping? Is there any plan how to move forward? I have libvirt patches
>> ready to consume this and I'd like to prune my old local branches :-)
> 
> Heh, I was thinking about this series just today. I was distracted with
> all other kind of stuff.
> 
> I'll move forward with this series later this week/early next week.

No rush, it's only that I don't want this to fall into the void. Let me
know if I can help somehow. Meanwhile, here's my aforementioned branch:

https://gitlab.com/MichalPrivoznik/libvirt/-/tree/qemu_thread_context

I've made it so that a ThreadContext is generated whenever
.prealloc-threads AND .host-nodes are used (i.e. no XML-visible config
knob). And I'm generating ThreadContext objects for each memory backend
separately even though they could be reused, but IMO that's an
optimization that can be done later.
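For illustration only, reusing a single context would mean pointing
several backends at the same prealloc-context (hypothetical IDs and
sizes, other backend properties elided):

```
-object '{"qom-type":"thread-context","id":"tc-node0","node-affinity":[2]}' \
-object '{"qom-type":"memory-backend-memfd","id":"ram-node0","size":1073741824,"prealloc":true,"prealloc-context":"tc-node0"}' \
-object '{"qom-type":"memory-backend-memfd","id":"memdimm0","size":1073741824,"prealloc":true,"prealloc-context":"tc-node0"}' \
```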

Michal




Re: [PATCH RFC 0/7] hostmem: NUMA-aware memory preallocation using ThreadContext

2022-09-21 Thread Michal Prívozník
On 7/21/22 14:07, David Hildenbrand wrote:
>

Ping? Is there any plan how to move forward? I have libvirt patches
ready to consume this and I'd like to prune my old local branches :-)

Michal




Re: [PATCH v5 1/3] Update linux headers to 6.0-rc1

2022-08-22 Thread Michal Prívozník
On 8/17/22 04:08, Chenyi Qiang wrote:
> commit 568035b01cfb107af8d2e4bd2fb9aea22cf5b868
> 
> Signed-off-by: Chenyi Qiang 
> ---
>  include/standard-headers/asm-x86/bootparam.h  |   7 +-
>  include/standard-headers/drm/drm_fourcc.h |  73 +++-
>  include/standard-headers/linux/ethtool.h  |  29 +--
>  include/standard-headers/linux/input.h|  12 +-
>  include/standard-headers/linux/pci_regs.h |  30 ++-
>  include/standard-headers/linux/vhost_types.h  |  17 +-
>  include/standard-headers/linux/virtio_9p.h|   2 +-
>  .../standard-headers/linux/virtio_config.h|   7 +-
>  include/standard-headers/linux/virtio_ids.h   |  14 +-
>  include/standard-headers/linux/virtio_net.h   |  34 +++-
>  include/standard-headers/linux/virtio_pci.h   |   2 +
>  linux-headers/asm-arm64/kvm.h |  27 +++
>  linux-headers/asm-generic/unistd.h|   4 +-
>  linux-headers/asm-riscv/kvm.h |  22 +++
>  linux-headers/asm-riscv/unistd.h  |   3 +-
>  linux-headers/asm-s390/kvm.h  |   1 +
>  linux-headers/asm-x86/kvm.h   |  33 ++--
>  linux-headers/asm-x86/mman.h  |  14 --
>  linux-headers/linux/kvm.h | 172 +-
>  linux-headers/linux/userfaultfd.h |  10 +-
>  linux-headers/linux/vduse.h   |  47 +
>  linux-headers/linux/vfio.h|   4 +-
>  linux-headers/linux/vfio_zdev.h   |   7 +
>  linux-headers/linux/vhost.h   |  35 +++-
>  24 files changed, 523 insertions(+), 83 deletions(-)
> 


> diff --git a/linux-headers/asm-x86/kvm.h b/linux-headers/asm-x86/kvm.h
> index bf6e96011d..46de10a809 100644
> --- a/linux-headers/asm-x86/kvm.h
> +++ b/linux-headers/asm-x86/kvm.h
> @@ -198,13 +198,13 @@ struct kvm_msrs {
>   __u32 nmsrs; /* number of msrs in entries */
>   __u32 pad;
>  
> - struct kvm_msr_entry entries[0];
> + struct kvm_msr_entry entries[];
>  };
>  

I don't think it's this simple. I think this needs to go hand in hand with 
kvm_arch_get_supported_msr_feature().

Also, this breaks clang build:

clang -m64 -mcx16 -Ilibqemu-x86_64-softmmu.fa.p -I. -I.. -Itarget/i386 
-I../target/i386 -Iqapi -Itrace -Iui -Iui/shader -I/usr/include/pixman-1 
-I/usr/include/spice-server -I/usr/include/spice-1 -I/usr/include/glib-2.0 
-I/usr/lib64/glib-2.0/include -fcolor-diagnostics -Wall -Winvalid-pch -Werror 
-std=gnu11 -O0 -g -isystem /home/zippy/work/qemu/qemu.git/linux-headers 
-isystem linux-headers -iquote . -iquote /home/zippy/work/qemu/qemu.git -iquote 
/home/zippy/work/qemu/qemu.git/include -iquote 
/home/zippy/work/qemu/qemu.git/tcg/i386 -pthread -D_GNU_SOURCE 
-D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes 
-Wredundant-decls -Wundef -Wwrite-strings -Wmissing-prototypes 
-fno-strict-aliasing -fno-common -fwrapv -Wold-style-definition -Wtype-limits 
-Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wempty-body 
-Wnested-externs -Wendif-labels -Wexpansion-to-defined 
-Wno-initializer-overrides -Wno-missing-include-dirs -Wno-shift-negative-value 
-Wno-string-plus-int -Wno-typedef-redefinition 
-Wno-tautological-type-limit-compare -Wno-psabi -fstack-protector-strong -O0 
-ggdb -fPIE -isystem../linux-headers -isystemlinux-headers -DNEED_CPU_H 
'-DCONFIG_TARGET="x86_64-softmmu-config-target.h"' 
'-DCONFIG_DEVICES="x86_64-softmmu-config-devices.h"' -MD -MQ 
libqemu-x86_64-softmmu.fa.p/target_i386_kvm_kvm.c.o -MF 
libqemu-x86_64-softmmu.fa.p/target_i386_kvm_kvm.c.o.d -o 
libqemu-x86_64-softmmu.fa.p/target_i386_kvm_kvm.c.o -c ../target/i386/kvm/kvm.c
../target/i386/kvm/kvm.c:470:25: error: field 'info' with variable sized type 
'struct kvm_msrs' not at the end of a struct or class is a GNU extension 
[-Werror,-Wgnu-variable-sized-type-not-at-end]
struct kvm_msrs info;
^
../target/i386/kvm/kvm.c:1701:27: error: field 'cpuid' with variable sized type 
'struct kvm_cpuid2' not at the end of a struct or class is a GNU extension 
[-Werror,-Wgnu-variable-sized-type-not-at-end]
struct kvm_cpuid2 cpuid;
  ^
../target/i386/kvm/kvm.c:2868:25: error: field 'info' with variable sized type 
'struct kvm_msrs' not at the end of a struct or class is a GNU extension 
[-Werror,-Wgnu-variable-sized-type-not-at-end]
struct kvm_msrs info;
^
3 errors generated.


Michal




Re: [PATCH RFC 0/7] hostmem: NUMA-aware memory preallocation using ThreadContext

2022-08-09 Thread Michal Prívozník
On 8/9/22 20:06, David Hildenbrand wrote:
> On 09.08.22 12:56, Joao Martins wrote:
>> On 7/21/22 13:07, David Hildenbrand wrote:
>>> This is a follow-up on "util: NUMA aware memory preallocation" [1] by
>>> Michal.
>>>
>>> Setting the CPU affinity of threads from inside QEMU usually isn't
>>> easily possible, because we don't want QEMU -- once started and running
>>> guest code -- to be able to mess up the system. QEMU disallows relevant
>>> syscalls using seccomp, such that any such invocation will fail.
>>>
>>> Especially for memory preallocation in memory backends, the CPU affinity
>>> can significantly increase guest startup time, for example, when running
>>> large VMs backed by huge/gigantic pages, because of NUMA effects. For
>>> NUMA-aware preallocation, we have to set the CPU affinity, however:
>>>
>>> (1) Once preallocation threads are created during preallocation, management
>>> tools cannot intercept anymore to change the affinity. These threads
>>> are created automatically on demand.
>>> (2) QEMU cannot easily set the CPU affinity itself.
>>> (3) The CPU affinity derived from the NUMA bindings of the memory backend
>>> might not necessarily be exactly the CPUs we actually want to use
>>> (e.g., CPU-less NUMA nodes, CPUs that are pinned/used for other VMs).
>>>
>>> There is an easy "workaround". If we have a thread with the right CPU
>>> affinity, we can simply create new threads on demand via that prepared
>>> context. So, all we have to do is setup and create such a context ahead
>>> of time, to then configure preallocation to create new threads via that
>>> environment.
>>>
>>> So, let's introduce a user-creatable "thread-context" object that
>>> essentially consists of a context thread used to create new threads.
>>> QEMU can either try setting the CPU affinity itself ("cpu-affinity",
>>> "node-affinity" property), or upper layers can extract the thread id
>>> ("thread-id" property) to configure it externally.
>>>
>>> Make memory-backends consume a thread-context object
>>> (via the "prealloc-context" property) and use it when preallocating to
>>> create new threads with the desired CPU affinity. Further, to make it
>>> easier to use, allow creation of "thread-context" objects, including
>>> setting the CPU affinity directly from QEMU, *before* enabling the
>>> sandbox option.
>>>
>>>
>>> Quick test on a system with 2 NUMA nodes:
>>>
>>> Without CPU affinity:
>>> time qemu-system-x86_64 \
>>> -object 
>>> memory-backend-memfd,id=md1,hugetlb=on,hugetlbsize=2M,size=64G,prealloc-threads=12,prealloc=on,host-nodes=0,policy=bind
>>>  \
>>> -nographic -monitor stdio
>>>
>>> real0m5.383s
>>> real0m3.499s
>>> real0m5.129s
>>> real0m4.232s
>>> real0m5.220s
>>> real0m4.288s
>>> real0m3.582s
>>> real0m4.305s
>>> real0m5.421s
>>> real0m4.502s
>>>
>>> -> It heavily depends on the scheduler CPU selection
>>>
>>> With CPU affinity:
>>> time qemu-system-x86_64 \
>>> -object thread-context,id=tc1,node-affinity=0 \
>>> -object 
>>> memory-backend-memfd,id=md1,hugetlb=on,hugetlbsize=2M,size=64G,prealloc-threads=12,prealloc=on,host-nodes=0,policy=bind,prealloc-context=tc1
>>>  \
>>> -sandbox enable=on,resourcecontrol=deny \
>>> -nographic -monitor stdio
>>>
>>> real0m1.959s
>>> real0m1.942s
>>> real0m1.943s
>>> real0m1.941s
>>> real0m1.948s
>>> real0m1.964s
>>> real0m1.949s
>>> real0m1.948s
>>> real0m1.941s
>>> real0m1.937s
>>>
>>> On reasonably large VMs, the speedup can be quite significant.
>>>
>> Really awesome work!
> 
> Thanks!
> 
>>
>> I am not sure I picked up this well while reading the series, but it seems 
>> to me that
>> prealloc is still serialized on per memory-backend when solely configured by 
>> command-line
>> right?
> 
> I think it's serialized in any case, even when preallocation is
> triggered manually using prealloc=on. I might be wrong, but any kind of
> object creation or property changes should be serialized by the BQL.
> 
> In theory, we can "easily" preallocate in our helper --
> qemu_prealloc_mem() -- concurrently when we don't have to bother about
> handling SIGBUS -- that is, when the kernel supports
> MADV_POPULATE_WRITE. Without MADV_POPULATE_WRITE on older kernels, we'll
> serialize in there as well.
> 
>>
>> Meaning when we start prealloc we wait until the memory-backend 
>> thread-context action is
>> completed (per-memory-backend) even if other to-be-configured 
>> memory-backends will use a
>> thread-context on a separate set of pinned CPUs on another node ... and 
>> wouldn't in theory
>> "need" to wait until the former prealloc finishes?
> 
> Yes. This series only takes care of NUMA-aware preallocation, but
> doesn't preallocate multiple memory backends in parallel.
> 
> In theory, it would be quite easy to preallocate concurrently: simply

Re: [PATCH RFC 3/7] util: Introduce ThreadContext user-creatable object

2022-08-05 Thread Michal Prívozník
On 7/21/22 14:07, David Hildenbrand wrote:
> Setting the CPU affinity of QEMU threads is a bit problematic, because
> QEMU doesn't always have permissions to set the CPU affinity itself,
> for example, with seccomp after initialized by QEMU:
> -sandbox enable=on,resourcecontrol=deny
> 
> While upper layers are already aware how to handl;e CPU affinities for
> long-lived threads like iothreads or vcpu threads, especially short-lived
> threads, as used for memory-backend preallocation, are more involved to
> handle. These threads are created on demand and upper layers are not even
> able to identify and configure them.
> 
> Introduce the concept of a ThreadContext, that is essentially a thread
> used for creating new threads. All threads created via that context
> thread inherit the configured CPU affinity. Consequently, it's
> sufficient to create a ThreadContext and configure it once, and have all
> threads created via that ThreadContext inherit the same CPU affinity.
> 
> The CPU affinity of a ThreadContext can be configured two ways:
> 
> (1) Obtaining the thread id via the "thread-id" property and setting the
> CPU affinity manually.
> 
> (2) Setting the "cpu-affinity" property and letting QEMU try set the
> CPU affinity itself. This will fail if QEMU doesn't have permissions
> to do so anymore after seccomp was initialized.
> 
> A ThreadContext can be reused, simply be reconfiguring the CPU affinity.
> 
> Signed-off-by: David Hildenbrand 
> ---
>  include/qemu/thread-context.h |  58 +++
>  qapi/qom.json |  16 ++
>  util/meson.build  |   1 +
>  util/oslib-posix.c|   1 +
>  util/thread-context.c | 279 ++
>  5 files changed, 355 insertions(+)
>  create mode 100644 include/qemu/thread-context.h
>  create mode 100644 util/thread-context.c
> 
> diff --git a/include/qemu/thread-context.h b/include/qemu/thread-context.h
> new file mode 100644
> index 00..c799cbe7a1
> --- /dev/null
> +++ b/include/qemu/thread-context.h
> @@ -0,0 +1,58 @@
> +/*
> + * QEMU Thread Context
> + *
> + * Copyright Red Hat Inc., 2022
> + *
> + * Authors:
> + *  David Hildenbrand 
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> + * See the COPYING file in the top-level directory.
> + *
> + */
> +
> +#ifndef SYSEMU_THREAD_CONTEXT_H
> +#define SYSEMU_THREAD_CONTEXT_H
> +
> +#include "qapi/qapi-types-machine.h"
> +#include "qemu/thread.h"
> +#include "qom/object.h"
> +
> +#define TYPE_THREAD_CONTEXT "thread-context"
> +OBJECT_DECLARE_TYPE(ThreadContext, ThreadContextClass,
> +THREAD_CONTEXT)
> +
> +struct ThreadContextClass {
> +ObjectClass parent_class;
> +};
> +
> +struct ThreadContext {
> +/* private */
> +Object parent;
> +
> +/* private */
> +unsigned int thread_id;
> +QemuThread thread;
> +
> +/* Semaphore to wait for context thread action. */
> +QemuSemaphore sem;
> +/* Semaphore to wait for action in context thread. */
> +QemuSemaphore sem_thread;
> +/* Mutex to synchronize requests. */
> +QemuMutex mutex;
> +
> +/* Commands for the thread to execute. */
> +int thread_cmd;
> +void *thread_cmd_data;
> +
> +/* CPU affinity bitmap used for initialization. */
> +unsigned long *init_cpu_bitmap;
> +int init_cpu_nbits;
> +};
> +
> +void thread_context_create_thread(ThreadContext *tc, QemuThread *thread,
> +  const char *name,
> +  void *(*start_routine)(void *), void *arg,
> +  int mode);
> +
> +#endif /* SYSEMU_THREAD_CONTEXT_H */
> diff --git a/qapi/qom.json b/qapi/qom.json
> index 80dd419b39..4775a333ed 100644
> --- a/qapi/qom.json
> +++ b/qapi/qom.json
> @@ -830,6 +830,20 @@
>  'reduced-phys-bits': 'uint32',
>  '*kernel-hashes': 'bool' } }
>  
> +##
> +# @ThreadContextProperties:
> +#
> +# Properties for thread context objects.
> +#
> +# @cpu-affinity: the CPU affinity for all threads created in the thread
> +#context (default: QEMU main thread affinity)
> +#
> +# Since: 7.2
> +##
> +{ 'struct': 'ThreadContextProperties',
> +  'data': { '*cpu-affinity': ['uint16'] } }
> +
> +
>  ##
>  # @ObjectType:
>  #
> @@ -882,6 +896,7 @@
>  { 'name': 'secret_keyring',
>'if': 'CONFIG_SECRET_KEYRING' },
>  'sev-guest',
> +'thread-context',
>  's390-pv-guest',
>  'throttle-group',
>  'tls-creds-anon',
> @@ -948,6 +963,7 @@
>'secret_keyring': { 'type': 'SecretKeyringProperties',
>'if': 'CONFIG_SECRET_KEYRING' },
>'sev-guest':  'SevGuestProperties',
> +  'thread-context': 'ThreadContextProperties',
>'throttle-group': 'ThrottleGroupProperties',
>'tls-creds-anon': 'TlsCredsAnonProperties',
>'tls-creds-psk': 

Re: [PATCH RFC 0/7] hostmem: NUMA-aware memory preallocation using ThreadContext

2022-08-05 Thread Michal Prívozník
On 7/21/22 14:07, David Hildenbrand wrote:
> This is a follow-up on "util: NUMA aware memory preallocation" [1] by
> Michal.
> 
> Setting the CPU affinity of threads from inside QEMU usually isn't
> easily possible, because we don't want QEMU -- once started and running
> guest code -- to be able to mess up the system. QEMU disallows relevant
> syscalls using seccomp, such that any such invocation will fail.
> 
> Especially for memory preallocation in memory backends, the CPU affinity
> can significantly increase guest startup time, for example, when running
> large VMs backed by huge/gigantic pages, because of NUMA effects. For
> NUMA-aware preallocation, we have to set the CPU affinity, however:
> 
> (1) Once preallocation threads are created during preallocation, management
> tools cannot intercept anymore to change the affinity. These threads
> are created automatically on demand.
> (2) QEMU cannot easily set the CPU affinity itself.
> (3) The CPU affinity derived from the NUMA bindings of the memory backend
> might not necessarily be exactly the CPUs we actually want to use
> (e.g., CPU-less NUMA nodes, CPUs that are pinned/used for other VMs).
> 
> There is an easy "workaround". If we have a thread with the right CPU
> affinity, we can simply create new threads on demand via that prepared
> context. So, all we have to do is setup and create such a context ahead
> of time, to then configure preallocation to create new threads via that
> environment.
> 
> So, let's introduce a user-creatable "thread-context" object that
> essentially consists of a context thread used to create new threads.
> QEMU can either try setting the CPU affinity itself ("cpu-affinity",
> "node-affinity" property), or upper layers can extract the thread id
> ("thread-id" property) to configure it externally.
> 
> Make memory-backends consume a thread-context object
> (via the "prealloc-context" property) and use it when preallocating to
> create new threads with the desired CPU affinity. Further, to make it
> easier to use, allow creation of "thread-context" objects, including
> setting the CPU affinity directly from QEMU, *before* enabling the
> sandbox option.
> 
> 
> Quick test on a system with 2 NUMA nodes:
> 
> Without CPU affinity:
> time qemu-system-x86_64 \
> -object 
> memory-backend-memfd,id=md1,hugetlb=on,hugetlbsize=2M,size=64G,prealloc-threads=12,prealloc=on,host-nodes=0,policy=bind
>  \
> -nographic -monitor stdio
> 
> real0m5.383s
> real0m3.499s
> real0m5.129s
> real0m4.232s
> real0m5.220s
> real0m4.288s
> real0m3.582s
> real0m4.305s
> real0m5.421s
> real0m4.502s
> 
> -> It heavily depends on the scheduler CPU selection
> 
> With CPU affinity:
> time qemu-system-x86_64 \
> -object thread-context,id=tc1,node-affinity=0 \
> -object 
> memory-backend-memfd,id=md1,hugetlb=on,hugetlbsize=2M,size=64G,prealloc-threads=12,prealloc=on,host-nodes=0,policy=bind,prealloc-context=tc1
>  \
> -sandbox enable=on,resourcecontrol=deny \
> -nographic -monitor stdio
> 
> real0m1.959s
> real0m1.942s
> real0m1.943s
> real0m1.941s
> real0m1.948s
> real0m1.964s
> real0m1.949s
> real0m1.948s
> real0m1.941s
> real0m1.937s
> 
> On reasonably large VMs, the speedup can be quite significant.
> 

I've timed 'virsh start' with a guest that has 47GB worth of 1GB
hugepages and seen the startup time basically halved (from 10.5s to
5.6s). The host has 4 NUMA nodes and I'm pinning the guest onto two nodes.

I've written the libvirt counterpart (which I'll post as soon as these
are merged). The way it works is that whenever .prealloc-threads= is to
be used AND QEMU is capable of thread-context, a thread-context object
is generated before every memory-backend-*, like this:

-object
'{"qom-type":"thread-context","id":"tc-ram-node0","node-affinity":[2]}' \
-object
'{"qom-type":"memory-backend-memfd","id":"ram-node0","hugetlb":true,"hugetlbsize":1073741824,"share":true,"prealloc":true,"prealloc-threads":16,"size":21474836480,"host-nodes":[2],"policy":"bind","prealloc-context":"tc-ram-node0"}'
\
-numa node,nodeid=0,cpus=0,cpus=2,memdev=ram-node0 \
-object
'{"qom-type":"thread-context","id":"tc-ram-node1","node-affinity":[3]}' \
-object
'{"qom-type":"memory-backend-memfd","id":"ram-node1","hugetlb":true,"hugetlbsize":1073741824,"share":true,"prealloc":true,"prealloc-threads":16,"size":28991029248,"host-nodes":[3],"policy":"bind","prealloc-context":"tc-ram-node1"}'
\


Now, it's not visible in this snippet, but my code does not reuse
thread-context objects. So if there's another memfd, it'll get its own TC:

-object
'{"qom-type":"thread-context","id":"tc-memdimm0","node-affinity":[1]}' \
-object
'{"qom-type":"memory-backend-memfd","id":"memdimm0","hugetlb":true,"hugetlbsize":1073741824,"share":true,"prealloc":true,"prealloc-thread

Re: [PATCH RFC 7/7] vl: Allow ThreadContext objects to be created before the sandbox option

2022-08-05 Thread Michal Prívozník
On 7/21/22 14:07, David Hildenbrand wrote:
> Currently, there is no way to configure a CPU affinity inside QEMU when
> the sandbox option disables it for QEMU as a whole, for example, via:
> -sandbox enable=on,resourcecontrol=deny
> 
> While ThreadContext objects can be created on the QEMU commandline and
> the CPU affinity can be configured externally via the thread-id, this is
> insufficient if a ThreadContext with a certain CPU affinity is already
> required during QEMU startup, before we can intercept QEMU and
> configure the CPU affinity.
> 
> Blocking sched_setaffinity() was introduced in 24f8cdc57224 ("seccomp:
> add resourcecontrol argument to command line"), "to avoid any bigger of the
> process". However, we only care about once QEMU is running, not when
> the instance starting QEMU explicitly requests a certain CPU affinity
> on the QEMU comandline.
> 
> Right now, for NUMA-aware preallocation of memory backends used for initial
> machine RAM, one has to:
> 
> 1) Start QEMU with the memory-backend with "prealloc=off"
> 2) Pause QEMU before it starts the guest (-S)
> 3) Create ThreadContext, configure the CPU affinity using the thread-id
> 4) Configure the ThreadContext as "prealloc-context" of the memory
>backend
> 5) Trigger preallocation by setting "prealloc=on"
> 
> To simplify this handling especially for initial machine RAM,
> allow creation of ThreadContext objects before parsing sandbox options,
> such that the CPU affinity requested on the QEMU commandline alongside the
> sandbox option can be set. As ThreadContext objects essentially only create
> a persistant context thread and set the CPU affinity, this is easily
> possible.
> 
> With this change, we can create a ThreadContext with a CPU affinity on
> the QEMU commandline and use it for preallocation of memory backends
> glued to the machine (simplified example):
> 
> qemu-system-x86_64 -m 1G \
>  -object thread-context,id=tc1,cpu-affinity=3-4 \
>  -object 
> memory-backend-ram,id=pc.ram,size=1G,prealloc=on,prealloc-threads=2,prealloc-context=tc1
>  \
>  -machine memory-backend=pc.ram \
>  -S -monitor stdio -sandbox enable=on,resourcecontrol=deny
> 
> And while we can query the current CPU affinity:
>   (qemu) qom-get tc1 cpu-affinity
>   [
>   3,
>   4
>   ]
> 
> We can no longer change it from QEMU directly:
>   (qemu) qom-set tc1 cpu-affinity 1-2
>   Error: Setting CPU affinity failed: Operation not permitted
> 
> Signed-off-by: David Hildenbrand 
> ---
>  softmmu/vl.c | 30 +-
>  1 file changed, 29 insertions(+), 1 deletion(-)
> 
> diff --git a/softmmu/vl.c b/softmmu/vl.c
> index aabd82e09a..252732cf5d 100644
> --- a/softmmu/vl.c
> +++ b/softmmu/vl.c
> @@ -1761,6 +1761,27 @@ static void object_option_parse(const char *optarg)
>  visit_free(v);
>  }
>  
> +/*
> + * Very early object creation, before the sandbox options have been 
> activated.
> + */
> +static bool object_create_pre_sandbox(const char *type)
> +{
> +/*
> + * Objects should in general not get initialized "too early" without
> + * a reason. If you add one, state the reason in a comment!
> + */
> +
> +/*
> + * Reason: -sandbox on,resourcecontrol=deny disallows setting CPU
> + * affinity of threads.
> + */
> +if (g_str_equal(type, "thread-context")) {
> +return true;
> +}
> +
> +return false;
> +}
> +
>  /*
>   * Initial object creation happens before all other
>   * QEMU data types are created. The majority of objects
> @@ -1775,6 +1796,11 @@ static bool object_create_early(const char *type)
>   * add one, state the reason in a comment!
>   */
>  
> +/* Reason: already created. */
> +if (object_create_pre_sandbox(type)) {
> +return false;
> +}
> +
>  /* Reason: property "chardev" */
>  if (g_str_equal(type, "rng-egd") ||
>  g_str_equal(type, "qtest")) {
> @@ -1895,7 +1921,7 @@ static void qemu_create_early_backends(void)
>   */
>  static bool object_create_late(const char *type)
>  {
> -return !object_create_early(type);
> +return !object_create_early(type) && !object_create_pre_sandbox(type);
>  }
>  
>  static void qemu_create_late_backends(void)
> @@ -2365,6 +2391,8 @@ static int process_runstate_actions(void *opaque, 
> QemuOpts *opts, Error **errp)
>  
>  static void qemu_process_early_options(void)
>  {
> +object_option_foreach_add(object_create_pre_sandbox);
> +
>  #ifdef CONFIG_SECCOMP
>  QemuOptsList *olist = qemu_find_opts_err("sandbox", NULL);
>  if (olist) {

Cool, this is processed before -sandbox, so threads can have their
affinity set. However, it's also processed before -name debug-threads=on,
which means that even though we try to set a thread name in 3/7, it's
effectively dead code because name_threads from
util/qemu-thread-posix.c is still false. Could we shift things a bit?
E.g. like this:

static void qemu_process_early_options(void)
{
qemu_opts_foreach(qemu_find_opts("name"),
  

Re: [PATCH RFC 0/7] hostmem: NUMA-aware memory preallocation using ThreadContext

2022-07-25 Thread Michal Prívozník
On 7/21/22 14:07, David Hildenbrand wrote:
> This is a follow-up on "util: NUMA aware memory preallocation" [1] by
> Michal.

I've skimmed through the patches and haven't spotted anything obviously
wrong. I'll test these more once I write libvirt support for them (which
I plan to do soon).

Michal




Re: [PATCH 1/2] usb: document guest-reset and guest-reset-all

2022-07-12 Thread Michal Prívozník
On 7/11/22 11:44, Gerd Hoffmann wrote:
> Suggested-by: Michal Prívozník 
> Signed-off-by: Gerd Hoffmann 
> ---
>  docs/system/devices/usb.rst | 29 +
>  1 file changed, 29 insertions(+)

Thank you!

Reviewed-by: Michal Privoznik 

Michal




Re: [libvirt PATCH v2] tools: add virt-qmp-proxy for proxying QMP clients to libvirt QEMU guests

2022-07-01 Thread Michal Prívozník
On 6/20/22 19:19, Daniel P. Berrangé wrote:
> Libvirt provides QMP passthrough APIs for the QEMU driver and these are
> exposed in virsh. It is not especially pleasant, however, using the raw
> QMP JSON syntax. QEMU has a tool 'qmp-shell' which can speak QMP and
> exposes a human friendly interactive shell. It is not possible to use
> this with libvirt managed guest, however, since only one client can
> attach to he QMP socket at any point in time.
> 
> The virt-qmp-proxy tool aims to solve this problem. It opens a UNIX
> socket and listens for incoming client connections, speaking QMP on
> the connected socket. It will forward any QMP commands received onto
> the running libvirt QEMU guest, and forward any replies back to the
> QMP client.
> 
>   $ virsh start demo
>   $ virt-qmp-proxy demo demo.qmp &
>   $ qmp-shell demo.qmp
>   Welcome to the QMP low-level shell!
>   Connected to QEMU 6.2.0
> 
>   (QEMU) query-kvm
>   {
>   "return": {
>   "enabled": true,
>   "present": true
>   }
>   }
> 
> Note this tool of course has the same risks as the raw libvirt
> QMP passthrough. It is safe to run query commands to fetch information
> but commands which change the QEMU state risk disrupting libvirt's
> management of QEMU, potentially resulting in data loss/corruption in
> the worst case.
> 
> Signed-off-by: Daniel P. Berrangé 
> ---
> 
> Changed in v2:
> 
>  - Rewrote to not be such a gross hack, specifically
>   - Wired up usage of libvirt event loop for sock I/O
>   - Register with libvirt for QMP events
>   - Incrementally read from socket & try json parsing
> until we get a full command, instead of assuming
> a full command in one read
>   - Forwarding of passed FDs in both directions
> (libvirt -> client untested, since AFAIK, no
> QMP cmd returns FDs currently)
> 
> 
> Other thought
> 
> This patch is against libvirt.git but has a dependancy on the
> libvirt-python.git APIs. If we put this in libvirt-client RPM
> then we get a new dep on python.
> 
> Perhaps better to have this live in libvirt-python.git/examples,
> though I would like it present as a standard tool ? Another
> option is to bundle with virt-install which is a python app
> commonly present on virt hosts ?

Or, we could have it in a separate RPM which would require
libvirt-client and libvirt-python.

> 
>  docs/manpages/meson.build|   1 +
>  docs/manpages/virt-qmp-proxy.rst | 120 +++
>  tools/meson.build|   5 +
>  tools/virt-qmp-proxy | 360 +++
>  4 files changed, 486 insertions(+)
>  create mode 100644 docs/manpages/virt-qmp-proxy.rst
>  create mode 100755 tools/virt-qmp-proxy

Reviewed-by: Michal Privoznik 

Michal




Re: [PATCH v3] qga: add guest-get-diskstats command for Linux guests

2022-05-17 Thread Michal Prívozník
On 5/15/22 11:54, luzhipeng wrote:
> Add a new 'guest-get-diskstats' command for report disk io statistics
> for Linux guests. This can be usefull for getting io flow or handling
> IO fault, no need to enter guests.
> 
> Signed-off-by: luzhipeng 
> ---
> Changes v2->v3:
>  bugfix for memory leak
> Changes v1->v2:
>  v1:https://patchew.org/QEMU/20220512011930.214-1-luzhip...@cestc.cn/
>  
>  qga/commands-posix.c | 99 
>  qga/commands-win32.c |  6 +++
>  qga/qapi-schema.json | 86 ++
>  3 files changed, 191 insertions(+)
> 

Couple of points, see below:

> diff --git a/qga/commands-posix.c b/qga/commands-posix.c
> index 69f209af87..1d89be8a83 100644
> --- a/qga/commands-posix.c
> +++ b/qga/commands-posix.c
> @@ -2783,6 +2783,98 @@ GuestMemoryBlockInfo 
> *qmp_guest_get_memory_block_info(Error **errp)
>  return info;
>  }
>  
> +#define MAX_NAME_LEN 128
> +static GuestDiskStatsInfoList *guest_get_diskstats(Error **errp)
> +{
> +#ifdef CONFIG_LINUX
> +GuestDiskStatsInfoList *head = NULL, **tail = &head;
> +const char *diskstats = "/proc/diskstats";
> +FILE *fp;
> +size_t n;
> +char *line = NULL;
> +char dev_name[MAX_NAME_LEN];
> +int i;
> +unsigned int ios_pgr, tot_ticks, rq_ticks, wr_ticks, dc_ticks, fl_ticks;
> +unsigned long rd_ios, rd_merges_or_rd_sec, rd_ticks_or_wr_sec, wr_ios;
> +unsigned long wr_merges, rd_sec_or_wr_ios, wr_sec;
> +unsigned long dc_ios, dc_merges, dc_sec, fl_ios;
> +unsigned int major, minor;

Nearly all these variables are used solely inside the while() loop below. It
might be worth moving them there to reduce clutter here.

> +
> +fp = fopen(diskstats, "r");
> +if (fp  == NULL) {
> +error_setg_errno(errp, errno, "open(\"%s\")", diskstats);
> +return NULL;
> +}
> +while (getline(&line, &n, fp) != -1) {
> +GuestDiskStatsInfo *diskstatinfo;
> +GuestDiskStats *diskstat;

Empty line after variable declaration block please.

Also, if these are declared as g_autofree, then you don't need to free them 
explicitly in every error path.

> +i = sscanf(line, "%u %u %s %lu %lu %lu"
> +   "%lu %lu %lu %lu %u %u %u %u"
> +   "%lu %lu %lu %u %lu %u",
> +  &major, &minor, dev_name,
> +  &rd_ios, &rd_merges_or_rd_sec, &rd_sec_or_wr_ios,
> +  &rd_ticks_or_wr_sec, &wr_ios, &wr_merges, &wr_sec,
> +  &wr_ticks, &ios_pgr, &tot_ticks, &rq_ticks,
> +  &dc_ios, &dc_merges, &dc_sec, &dc_ticks,
> +  &fl_ios, &fl_ticks);
> +
> +diskstatinfo = g_new0(GuestDiskStatsInfo, 1);
> +diskstat = g_new0(GuestDiskStats, 1);
> +if (i < 7) {
> +g_free(diskstat);
> +g_free(diskstatinfo);
> +continue;
> +}

How about checking whether i < 7 first, and allocating the memory only when it is not?

> +diskstatinfo->name = g_strdup(dev_name);
> +diskstatinfo->major = major;
> +diskstatinfo->minor = minor;
> +if (i == 7) {
> +diskstat->read_ios = rd_ios;
> +diskstat->read_sectors = rd_merges_or_rd_sec;
> +diskstat->write_ios = rd_sec_or_wr_ios;
> +diskstat->write_sectors = rd_ticks_or_wr_sec;
> +}
> +if (i >= 14) {
> +diskstat->read_ios = rd_ios;
> +diskstat->read_sectors = rd_sec_or_wr_ios;
> +diskstat->read_merges = rd_merges_or_rd_sec;
> +diskstat->read_ticks = rd_ticks_or_wr_sec;
> +diskstat->write_ios = wr_ios;
> +diskstat->write_sectors = wr_sec;
> +diskstat->write_merges = wr_merges;
> +diskstat->write_ticks = wr_ticks;
> +diskstat->ios_pgr = ios_pgr;
> +diskstat->total_ticks = tot_ticks;
> +diskstat->weight_ticks = rq_ticks;
> +}
> +if (i >= 18) {
> +diskstat->discard_ios = dc_ios;
> +diskstat->discard_merges = dc_merges;
> +diskstat->discard_sectors = dc_sec;
> +diskstat->discard_ticks = dc_ticks;
> +}
> +if (i >= 20) {
> +diskstat->flush_ios = fl_ios;
> +diskstat->flush_ticks = fl_ticks;
> +}
> +

So this is best effort and some variables are going to be unset. I think we
should give callers a chance to differentiate whether a zero value they are
seeing is the result of us parsing a zero in sscanf() above OR whether we just
did not bother filling the value.

> +diskstatinfo->stats = diskstat;

If you go with g_autofree as suggested above then this could just be:

  diskstatinfo->stats = g_steal_pointer(&diskstat);

> +QAPI_LIST_APPEND(tail, diskstatinfo);
> +}
> +g_free(line);
> +fclose(fp);
> +return head;
> +#else
> +g_debug("disk stats reporting available only for Linux");
> +return NU

Re: [PATCH] util: NUMA aware memory preallocation

2022-05-11 Thread Michal Prívozník
On 5/10/22 11:12, Daniel P. Berrangé wrote:
> On Tue, May 10, 2022 at 08:55:33AM +0200, Michal Privoznik wrote:
>> When allocating large amounts of memory the task is offloaded
>> onto threads. These threads then use various techniques to
>> allocate the memory fully (madvise(), writing into the memory).
>> However, these threads are free to run on any CPU, which becomes
>> problematic on NUMA machines because it may happen that a thread
>> is running on a distant node.
>>
>> Ideally, this is something that a management application would
>> resolve, but we are not anywhere close to that, Firstly, memory
>> allocation happens before monitor socket is even available. But
>> okay, that's what -preconfig is for. But then the problem is that
>> 'object-add' would not return until all memory is preallocated.
>>
>> Long story short, management application has no way of learning
>> TIDs of allocator threads so it can't make them run NUMA aware.
> 
> So I'm wondering what the impact of this problem is for various
> scenarios.

The scenario I tested this with was no CPU pinning configured up front,
but using 'virsh emulatorpin' afterwards to pin the emulator thread
somewhere. For those who are unfamiliar with libvirt, this is about
placing the main qemu TID (with the main event loop) into a CGroup that
restricts which CPUs it can run on.

> 
> The default config for a KVM guest with libvirt is no CPU pinning
> at all. The kernel auto-places CPUs and decides on where RAM is to
> be allocated. So in this case, whether or not libvirt can talk to
> QMP in time to query threads is largely irrelevant, as we don't
> want todo placement in any case.
> 
> In theory the kernel should allocate RAM on the node local to
> where the process is currently executing. So as long as the
> guest RAM fits in available free RAM on the local node, RAM
> should be allocated from the node that matches the CPU running
> the QEMU main thread.
> 
> The challenge is if we spawn N more threads to do pre-alloc,
> these can be spread onto other nodes. I wonder if the kernel
> huas any preference for keeping threads within a process on
> the same NUMA node ?

That's not exactly what I saw. I too would have thought that initially
the prealloc thread could be spawned just anywhere, but that after a few
iterations the scheduler would realize which NUMA node the thread is
close to and automatically schedule it to run there. Well, that didn't
happen.

> 
> Overall, if libvirt is not applying pinning to the QEMU guest,
> then we're 100% reliant on the kernel todo something sensible,
> both for normal QEMU execution and for prealloc. Since we're
> not doing placement of QEMU RAM or CPUs, the logic in this
> patch won't do anything either.
> 
> 
> If the guest has more RAM than can fit on the local NUMA node,
> then we're doomed no matter what, even ignoring prealloc, there
> will be cross-node traffic. This scenario requires the admin to
> setup proper CPU /memory pinning for QEMU in libvirt.
> 
> If libvirt is doing CPU pinning (as instructed by the mgmt app
> above us), then when we first start QEMU, the process thread
> leader will get given affinity by libvirt prior to exec. This
> affinity will be the union of affinity for all CPUs that will
> be later configured.
> 
> The typical case for CPU pinning, is that everything fits in
> one NUMA node, and so in this case, we don't need todo anything
> more. The prealloc threads will already be constrained to the
> right place by the affinity of the QEMU thread leader, so the
> logic in this patch will run, but it won't do anything that
> was not already done.
> 
> So we're left with the hardest case, where the guest is explicitly
> spread across multiple NUMA nodes. In this case the thread leader
> affinity will span many NUMA nodes, and so the prealloc threads
> will freely be placed across any CPU that is in the union of CPUs
> the guest is placed on. Just as with thue non-pinned case, the
> prealloc will be at the mercy of the kernel making sensible
> placement decisions.

Indeed, but it's at least somewhat restricted. NB, in a real scenario
users will map guest NUMA nodes onto host ones with a 1:1 relationship.
And each guest NUMA node will have its own memdev=, i.e. its own set of
threads, so in the end, prealloc threads won't jump between host NUMA
nodes but stay local to the node they are allocating memory on.

> 
> The very last cases is the only one where this patch can potentially
> be beneficial. The problem is that because libvirt is in charge of
> enforcing CPU affinity, IIRC, we explicitly block QEMU from doing
> anything with CPU affinity. So AFAICT, this patch should result in
> an error from sched_setaffinity when run under libvirt.

Yes, I had to disable capability dropping in qemu.conf.

After all, I think maybe the right place to fix this is the kernel? I
mean, why don't prealloc threads converge to the nodes they are working
with?

Michal




Re: [PATCH 3/3] qga/commands-posix: Fix listing ifaces for Solaris

2022-03-21 Thread Michal Prívozník
On 3/20/22 22:38, Andrew Deason wrote:
> The code for guest-network-get-interfaces needs a couple of small
> adjustments for Solaris:
> 
> - The results from SIOCGIFHWADDR are documented as being in ifr_addr,
>   not ifr_hwaddr (ifr_hwaddr doesn't exist on Solaris).
> 
> - The implementation of guest_get_network_stats is Linux-specific, so
>   hide it under #ifdef CONFIG_LINUX. On non-Linux, we just won't
>   provide network interface stats.
> 
> Signed-off-by: Andrew Deason 
> ---
>  qga/commands-posix.c | 7 ++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/qga/commands-posix.c b/qga/commands-posix.c
> index bd0d67f674..c0b00fc488 100644
> --- a/qga/commands-posix.c
> +++ b/qga/commands-posix.c
> @@ -2781,20 +2781,21 @@ guest_find_interface(GuestNetworkInterfaceList *head,
>  return head->value;
>  }
>  }
>  
>  return NULL;
>  }
>  
>  static int guest_get_network_stats(const char *name,
> GuestNetworkInterfaceStat *stats)
>  {
> +#ifdef CONFIG_LINUX
>  int name_len;
>  char const *devinfo = "/proc/net/dev";
>  FILE *fp;
>  char *line = NULL, *colon;
>  size_t n = 0;
>  fp = fopen(devinfo, "r");
>  if (!fp) {
>  return -1;
>  }
>  name_len = strlen(name);
> @@ -2836,20 +2837,21 @@ static int guest_get_network_stats(const char *name,
>  stats->tx_errs = tx_errs;
>  stats->tx_dropped = tx_dropped;
>  fclose(fp);
>  g_free(line);
>  return 0;
>  }
>  }
>  fclose(fp);
>  g_free(line);
>  g_debug("/proc/net/dev: Interface '%s' not found", name);
> +#endif /* CONFIG_LINUX */

I wonder whether we should signal this somehow. I mean, have something
like this:

#else /* !CONFIG_LINUX */
  g_debug("Stats reporting available only for Linux");
#endif /* !CONFIG_LINUX */

>  return -1;
>  }

A counter argument is that if fopen() above fails then -1 is returned
without any error/debug message reported. And stats fetching is best
effort anyway.

Michal




Re: [PATCH 0/3] qga: Implement guest-network-get-interfaces for Solaris

2022-03-21 Thread Michal Prívozník
On 3/20/22 22:38, Andrew Deason wrote:
> This implements the guest agent guest-network-get-interfaces command on
> Solaris. Solaris provides a getifaddrs() that's very similar to the Linux one,
> so the implementation is mostly the same.
> 
> Andrew Deason (3):
>   qga/commands-posix: Use getifaddrs when available
>   qga/commands-posix: Fix iface hw address detection
>   qga/commands-posix: Fix listing ifaces for Solaris
> 
>  meson.build  |   1 +
>  qga/commands-posix.c | 488 
> +++
>  2 files changed, 260 insertions(+), 229 deletions(-)
> 

Reviewed-by: Michal Privoznik 

Michal




Re: [PATCH] x86/amx: compatible with older kernel release

2022-03-18 Thread Michal Prívozník
On 3/18/22 12:55, Yang Zhong wrote:
> The AMX KVM introduced one new ARCH_GET_XCOMP_SUPP system attribute
> API to get host side supported_xcr0 and latest Qemu can decide if it
> can request dynamically enabled XSAVE features permission. But this
> implementation(19db68ca68) did not consider older kernel release.
> This patch can avoid to read this new KVM_GET_DEVICE_ATTR ioctl.
> 
> Signed-off-by: Yang Zhong 
> ---
>  target/i386/kvm/kvm.c | 16 
>  1 file changed, 8 insertions(+), 8 deletions(-)

Tested-by: Michal Privoznik 

Thank you.

Michal




Re: [PULL 15/22] x86: Grant AMX permission for guest

2022-03-18 Thread Michal Prívozník
On 3/16/22 16:57, Peter Krempa wrote:
> On Tue, Mar 08, 2022 at 12:34:38 +0100, Paolo Bonzini wrote:
>> From: Yang Zhong 
>>
>> Kernel allocates 4K xstate buffer by default. For XSAVE features
>> which require large state component (e.g. AMX), Linux kernel
>> dynamically expands the xstate buffer only after the process has
>> acquired the necessary permissions. Those are called dynamically-
>> enabled XSAVE features (or dynamic xfeatures).
>>
>> There are separate permissions for native tasks and guests.
>>
>> Qemu should request the guest permissions for dynamic xfeatures
>> which will be exposed to the guest. This only needs to be done
>> once before the first vcpu is created.
>>
>> KVM implemented one new ARCH_GET_XCOMP_SUPP system attribute API to
>> get host side supported_xcr0 and Qemu can decide if it can request
>> dynamically enabled XSAVE features permission.
>> https://lore.kernel.org/all/20220126152210.3044876-1-pbonz...@redhat.com/
>>
>> Suggested-by: Paolo Bonzini 
>> Signed-off-by: Yang Zhong 
>> Signed-off-by: Jing Liu 
>> Message-Id: <20220217060434.52460-4-yang.zh...@intel.com>
>> Signed-off-by: Paolo Bonzini 
>> ---
>>  target/i386/cpu.c  |  7 +
>>  target/i386/cpu.h  |  4 +++
>>  target/i386/kvm/kvm-cpu.c  | 12 
>>  target/i386/kvm/kvm.c  | 57 ++
>>  target/i386/kvm/kvm_i386.h |  1 +
>>  5 files changed, 75 insertions(+), 6 deletions(-)
> 
> With this commit qemu crashes for me when invoking the following
> QMP command:
> 
> $ ~pipo/git/qemu.git/build/qemu-system-x86_64 -S -no-user-config -nodefaults 
> -nographic -machine none,accel=kvm -qmp stdio
> {"QMP": {"version": {"qemu": {"micro": 90, "minor": 2, "major": 6}, 
> "package": "v7.0.0-rc0-8-g1d60bb4b14"}, "capabilities": ["oob"]}}
> {'execute':'qmp_capabilities'}
> {"return": {}}
> {"execute":"qom-list-properties","arguments":{"typename":"max-x86_64-cpu"},"id":"libvirt-41"}
> qemu-system-x86_64: ../target/i386/kvm/kvm-cpu.c:105: kvm_cpu_xsave_init: 
> Assertion `esa->size == eax' failed.
> Aborted (core dumped)
> 
> Note that the above is on a box with an 'AMD Ryzen 9 3900X'.
> 
> Curiously on a laptop with an Intel chip (Intel(R) Core(TM) i7-10610U)
> it seems to work.
> 
> 

Not trying to beat a dead horse here, but I've just found another
problem with this patch. On my laptop (Linux maggie
5.15.26-gentoo-x86_64 #1 SMP Thu Mar 10 08:55:28 CET 2022 x86_64
Intel(R) Core(TM) i7-10610U CPU @ 1.80GHz GenuineIntel GNU/Linux), when
I start a guest it no longer sees AVX instructions:

  qemu.git $ ./build/qemu-system-x86_64 -accel kvm -cpu host ...

Michal




Re: Call for GSoC and Outreachy project ideas for summer 2022

2022-02-21 Thread Michal Prívozník
On 2/21/22 12:27, Paolo Bonzini wrote:
> On 2/21/22 10:36, Michal Prívozník wrote:
>> Indeed. Libvirt's participating on its own since 2016, IIRC. Since we're
>> still in org acceptance phase we have some time to decide this,
>> actually. We can do the final decision after participating orgs are
>> announced. My gut feeling says that it's going to be more work on QEMU
>> side which would warrant it to be on the QEMU ideas page.
> 
> There are multiple projects that can be done on this topic, some
> QEMU-only, some Libvirt-only.  For now I would announce the project on
> the Libvirt side (and focus on those projects) since you are comentoring.
> 

Alright then. I've listed the project idea here:

https://gitlab.com/libvirt/libvirt/-/issues/276

Please let me know what do you think.

Michal




Re: Call for GSoC and Outreachy project ideas for summer 2022

2022-02-21 Thread Michal Prívozník
On 2/19/22 14:46, Stefan Hajnoczi wrote:
> On Fri, 18 Feb 2022 at 16:03, Paolo Bonzini  wrote:
>>
>> On 2/18/22 12:39, Michal Prívozník wrote:
>>> On 2/17/22 18:52, Paolo Bonzini wrote:
>>>> I would like to co-mentor one or more projects about adding more
>>>> statistics to Mark Kanda's newly-born introspectable statistics
>>>> subsystem in QEMU
>>>> (https://patchew.org/QEMU/20220215150433.2310711-1-mark.ka...@oracle.com/),
>>>> for example integrating "info blockstats"; and/or, to add matching
>>>> functionality to libvirt.
>>>>
>>>> However, I will only be available for co-mentoring unfortunately.
>>>
>>> I'm happy to offer my helping hand in this. I mean the libvirt part,
>>> since I am a libvirt developer.
>>>
>>> I believe this will be listed in QEMU's ideas list, right?
>>
>> Does Libvirt participate to GSoC as an independent organization this
>> year?  If not, I'll add it as a Libvirt project on the QEMU ideas list.
> 
> Libvirt participates as its own GSoC organization. If a project has
> overlap we could do it in either org, or have a QEMU project and a
> libvirt project if the amount of work is large enough.

Indeed. Libvirt's participating on its own since 2016, IIRC. Since we're
still in org acceptance phase we have some time to decide this,
actually. We can do the final decision after participating orgs are
announced. My gut feeling says that it's going to be more work on QEMU
side which would warrant it to be on the QEMU ideas page.

But anyway, we can wait and see. Meanwhile, as Stefan is trying to
compile the list for org application, I'm okay with putting it onto
QEMU's list.

Michal




Re: Call for GSoC and Outreachy project ideas for summer 2022

2022-02-18 Thread Michal Prívozník
On 2/17/22 18:52, Paolo Bonzini wrote:
> On 1/28/22 16:47, Stefan Hajnoczi wrote:
>> Dear QEMU, KVM, and rust-vmm communities,
>> QEMU will apply for Google Summer of Code 2022
>> (https://summerofcode.withgoogle.com/) and has been accepted into
>> Outreachy May-August 2022 (https://www.outreachy.org/). You can now
>> submit internship project ideas for QEMU, KVM, and rust-vmm!
>>
>> If you have experience contributing to QEMU, KVM, or rust-vmm you can
>> be a mentor. It's a great way to give back and you get to work with
>> people who are just starting out in open source.
>>
>> Please reply to this email by February 21st with your project ideas.
> 
> I would like to co-mentor one or more projects about adding more
> statistics to Mark Kanda's newly-born introspectable statistics
> subsystem in QEMU
> (https://patchew.org/QEMU/20220215150433.2310711-1-mark.ka...@oracle.com/),
> for example integrating "info blockstats"; and/or, to add matching
> functionality to libvirt.
> 
> However, I will only be available for co-mentoring unfortunately.

I'm happy to offer my helping hand in this. I mean the libvirt part,
since I am a libvirt developer.

I believe this will be listed in QEMU's ideas list, right?

Michal




Re: [PATCH v1 0/2] virtio-mem: Handle preallocation with migration

2022-01-19 Thread Michal Prívozník
On 1/18/22 16:07, David Hildenbrand wrote:
> While playing with migration of virtio-mem with an ordinary file backing,
> I realized that migration and prealloc doesn't currently work as expected
> for virtio-mem, especially when migrating zeropages or skipping migration
> of some pages.
> 
> In contrast to ordinary memory backend preallocation, virtio-mem
> preallocates memory before plugging blocks to the guest. Consequently,
> when migrating we are not actually preallocating on the destination but
> "only" migrate pages. When migrating the zeropage, we might not end up
> allocating actual backend memory.
> 
> Postcopy needs some extra care, and I realized that prealloc+postcopy is
> shaky in general. Let's at least try to mimic what ordinary
> prealloc+postcopy does: temporarily allocate the memory, discard it, and
> cross fingers that we'll still have sufficient memory when postcopy
> actually tries placing pages.
> 
> For postcopy to work with prealloc=on, we need a matching "requested-size"
> on source and destination, meaning we have to start QEMU on the destination
> with the current "requested-size" on the source. Only that way, we can try
> temporarily allocating the "requested-size" to see if there is a
> fundamental issue. If we detect a mismatch, we don't start postcopy.
> 
> Cc: Juan Quintela 
> Cc: Dr. David Alan Gilbert 
> Cc: Michael S. Tsirkin 
> Cc: Michal Privoznik 
> 
> David Hildenbrand (2):
>   virtio-mem: Warn if a memory backend with "prealloc=on" is used
>   virtio-mem: Handle preallocation with migration
> 
>  hw/virtio/virtio-mem.c | 143 +
>  include/hw/virtio/virtio-mem.h |   6 ++
>  2 files changed, 149 insertions(+)
> 

I don't feel confident enough to review, but I feel confident enough to test:

Tested-by: Michal Privoznik 

Michal




Re: [PATCH] qemu-options: Remove the deprecated -no-quit option

2021-12-15 Thread Michal Prívozník
On 12/15/21 09:24, Thomas Huth wrote:
> This option was just a wrapper around the -display ...,window-close=off
> parameter, and the name "no-quit" is rather confusing compared to
> "window-close" (since there are still other means to quit the emulator),
> so let's remove this now.
> 
> Signed-off-by: Thomas Huth 
> ---
>  docs/about/deprecated.rst   | 6 --
>  docs/about/removed-features.rst | 7 +++
>  qemu-options.hx | 8 
>  softmmu/vl.c| 8 +---
>  4 files changed, 8 insertions(+), 21 deletions(-)

Libvirt has never used it. So from my POV: ACK.

Michal




Re: [PATCH v1 00/12] Add riscv kvm accel support

2021-12-03 Thread Michal Prívozník
On 11/20/21 08:46, Yifei Jiang wrote:
> This series adds both riscv32 and riscv64 kvm support, and implements
> migration based on riscv.

What libvirt does when detecting KVM support is issuing query-kvm
monitor command and checking if both 'present' and 'enabled' bools are
true. If this is what these patches end up with we should get KVM
acceleration in Libvirt for free.

Michal




Re: [PATCH v1 0/8] virtio-mem: Support "prealloc=on"

2021-11-30 Thread Michal Prívozník
On 11/30/21 11:41, David Hildenbrand wrote:
> Based-on: <20211130092838.24224-1-da...@redhat.com>
> 
> Patch #1 - #7 are fully reviewed [1] but did not get picked up yet, so I'm
> sending them along here, as they are required to use os_mem_prealloc() in
> a safe way once the VM is running.
> 
> Support preallocation of memory to make virtio-mem safe to use with
> scarce memory resources such as hugetlb. Before acknowledging a plug
> request from the guest, we'll try preallcoating memory. If that fails,
> we'll fail the request gracefully and warn the usr once.
> 
> To fully support huge pages for shared memory, we'll have to adjust shared
> memory users, such as virtiofsd, to map guest memory via MAP_NORESERVE as
> well, because otherwise, they'll end up overwriting the "reserve=off"
> decision made by QEMU and try reserving huge pages for the sparse memory
> region.
> 
> In the future, we might want to process guest requests, including
> preallocating memory, asynchronously via a dedicated iothread.
> 
> [1] https://lkml.kernel.org/r/20211004120208.7409-1-da...@redhat.com
> 
> Cc: Paolo Bonzini 
> Cc: "Michael S. Tsirkin" 
> Cc: Igor Mammedov 
> Cc: Eduardo Habkost 
> Cc: Dr. David Alan Gilbert 
> Cc: Daniel P. Berrangé 
> Cc: Gavin Shan 
> Cc: Hui Zhu 
> Cc: Sebastien Boeuf 
> Cc: Pankaj Gupta 
> Cc: Michal Prívozník 
> 
> David Hildenbrand (8):
>   util/oslib-posix: Let touch_all_pages() return an error
>   util/oslib-posix: Support MADV_POPULATE_WRITE for os_mem_prealloc()
>   util/oslib-posix: Introduce and use MemsetContext for
> touch_all_pages()
>   util/oslib-posix: Don't create too many threads with small memory or
> little pages
>   util/oslib-posix: Avoid creating a single thread with
> MADV_POPULATE_WRITE
>   util/oslib-posix: Support concurrent os_mem_prealloc() invocation
>   util/oslib-posix: Forward SIGBUS to MCE handler under Linux
>   virtio-mem: Support "prealloc=on" option
> 
>  hw/virtio/virtio-mem.c |  39 +-
>  include/hw/virtio/virtio-mem.h |   4 +
>  include/qemu/osdep.h   |   7 +
>  softmmu/cpus.c |   4 +
>  util/oslib-posix.c | 231 +
>  5 files changed, 226 insertions(+), 59 deletions(-)
> 

Reviewed-by: Michal Privoznik 

Michal




Re: [PATCH v1 2/3] virtio-mem: Support VIRTIO_MEM_F_UNPLUGGED_INACCESSIBLE

2021-11-30 Thread Michal Prívozník
On 11/30/21 10:28, David Hildenbrand wrote:
> With VIRTIO_MEM_F_UNPLUGGED_INACCESSIBLE, we signal the VM that reading
> unplugged memory is not supported. We have to fail feature negotiation
> in case the guest does not support VIRTIO_MEM_F_UNPLUGGED_INACCESSIBLE.
> 
> First, VIRTIO_MEM_F_UNPLUGGED_INACCESSIBLE is required to properly handle
> memory backends (or architectures) without support for the shared zeropage
> in the hypervisor cleanly. Without the shared zeropage, even reading an
> unpopulated virtual memory location can populate real memory and
> consequently consume memory in the hypervisor. We have a guaranteed shared
> zeropage only on MAP_PRIVATE anonymous memory.
> 
> Second, we want VIRTIO_MEM_F_UNPLUGGED_INACCESSIBLE to be the default
> long-term as even populating the shared zeropage can be problematic: for
> example, without THP support (possible) or without support for the shared
> huge zeropage with THP (unlikely), the PTE page tables to hold the shared
> zeropage entries can consume quite some memory that cannot be reclaimed
> easily.
> 
> Third, there are other optimizations+features (e.g., protection of
> unplugged memory, reducing the total memory slot size and bitmap sizes)
> that will require VIRTIO_MEM_F_UNPLUGGED_INACCESSIBLE.
> 
> We really only support x86 targets with virtio-mem for now (and
> Linux similarly only support x86), but that might change soon, so prepare
> for different targets already.
> 
> Add a new "unplugged-inaccessible" tristate property for x86 targets:
> - "off" will keep VIRTIO_MEM_F_UNPLUGGED_INACCESSIBLE unset and legacy
>   guests working.
> - "on" will set VIRTIO_MEM_F_UNPLUGGED_INACCESSIBLE and stop legacy guests
>   from using the device.
> - "auto" selects the default based on support for the shared zeropage.
> 
> Warn in case the property is set to "off" and we don't have support for the
> shared zeropage.
> 
> For existing compat machines, the property will default to "off", to
> not change the behavior but eventually warn about a problematic setup.
> Short-term, we'll set the property default to "auto" for new QEMU machines.
> Mid-term, we'll set the property default to "on" for new QEMU machines.
> Long-term, we'll deprecate the parameter and disallow legacy
> guests completely.
> 
> The property has to match on the migration source and destination. "auto"
> will result in the same VIRTIO_MEM_F_UNPLUGGED_INACCESSIBLE setting as long
> as the qemu command line (esp. memdev) match -- so "auto" is good enough
> for migration purposes and the parameter doesn't have to be migrated
> explicitly.
> 
> Signed-off-by: David Hildenbrand 
> ---
>  hw/virtio/virtio-mem.c | 63 ++
>  include/hw/virtio/virtio-mem.h |  8 +
>  2 files changed, 71 insertions(+)
> 
> diff --git a/hw/virtio/virtio-mem.c b/hw/virtio/virtio-mem.c
> index d5a578142b..1e57156e81 100644
> --- a/hw/virtio/virtio-mem.c
> +++ b/hw/virtio/virtio-mem.c
> @@ -32,6 +32,14 @@
>  #include CONFIG_DEVICES
>  #include "trace.h"
>  
> +/*
> + * We only had legacy x86 guests that did not support
> + * VIRTIO_MEM_F_UNPLUGGED_INACCESSIBLE. Other targets don't have legacy 
> guests.
> + */
> +#if defined(TARGET_X86_64) || defined(TARGET_I386)
> +#define VIRTIO_MEM_HAS_LEGACY_GUESTS
> +#endif
> +
>  /*
>   * Let's not allow blocks smaller than 1 MiB, for example, to keep the 
> tracking
>   * bitmap small.
> @@ -110,6 +118,19 @@ static uint64_t virtio_mem_default_block_size(RAMBlock 
> *rb)
>  return MAX(page_size, VIRTIO_MEM_MIN_BLOCK_SIZE);
>  }
>  
> +#if defined(VIRTIO_MEM_HAS_LEGACY_GUESTS)
> +static bool virtio_mem_has_shared_zeropage(RAMBlock *rb)
> +{
> +/*
> + * We only have a guaranteed shared zeropage on ordinary MAP_PRIVATE
> + * anonymous RAM. In any other case, reading unplugged *can* populate a
> + * fresh page, consuming actual memory.
> + */
> +return !qemu_ram_is_shared(rb) && rb->fd < 0 &&
> +   qemu_ram_pagesize(rb) == qemu_real_host_page_size;
> +}
> +#endif /* VIRTIO_MEM_HAS_LEGACY_GUESTS */
> +
>  /*
>   * Size the usable region bigger than the requested size if possible. Esp.
>   * Linux guests will only add (aligned) memory blocks in case they fully
> @@ -653,15 +674,29 @@ static uint64_t virtio_mem_get_features(VirtIODevice 
> *vdev, uint64_t features,
>  Error **errp)
>  {
>  MachineState *ms = MACHINE(qdev_get_machine());
> +VirtIOMEM *vmem = VIRTIO_MEM(vdev);
>  
>  if (ms->numa_state) {
>  #if defined(CONFIG_ACPI)
>  virtio_add_feature(&features, VIRTIO_MEM_F_ACPI_PXM);
>  #endif
>  }
> +assert(vmem->unplugged_inaccessible != ON_OFF_AUTO_AUTO);
> +if (vmem->unplugged_inaccessible == ON_OFF_AUTO_ON) {
> +virtio_add_feature(&features, VIRTIO_MEM_F_UNPLUGGED_INACCESSIBLE);
> +}
>  return features;
>  }
>  
> +static int virtio_mem_validate_features(VirtIODevice *vdev)
> +{
> +if (virtio_host_has_feature(vd

Re: [PATCH v1 1/3] linux-headers: sync VIRTIO_MEM_F_UNPLUGGED_INACCESSIBLE

2021-11-30 Thread Michal Prívozník
On 11/30/21 10:28, David Hildenbrand wrote:
> TODO: replace by a proper header sync.
> 
> Signed-off-by: David Hildenbrand 
> ---
>  include/standard-headers/linux/virtio_mem.h | 9 ++---
>  1 file changed, 6 insertions(+), 3 deletions(-)
> 
> diff --git a/include/standard-headers/linux/virtio_mem.h 
> b/include/standard-headers/linux/virtio_mem.h
> index 05e5ade75d..18c74c527c 100644
> --- a/include/standard-headers/linux/virtio_mem.h
> +++ b/include/standard-headers/linux/virtio_mem.h
> @@ -68,9 +68,10 @@
>   * explicitly triggered (VIRTIO_MEM_REQ_UNPLUG).
>   *
>   * There are no guarantees what will happen if unplugged memory is
> - * read/written. Such memory should, in general, not be touched. E.g.,
> - * even writing might succeed, but the values will simply be discarded at
> - * random points in time.
> + * read/written. In general, unplugged memory should not be touched, because
> + * the resulting action is undefined. There is one exception: without
> + * VIRTIO_MEM_F_UNPLUGGED_INACCESSIBLE, unplugged memory inside the usable
> + * region can be read, to simplify creation of memory dumps.
>   *
>   * It can happen that the device cannot process a request, because it is
>   * busy. The device driver has to retry later.
> @@ -87,6 +88,8 @@
>  
>  /* node_id is an ACPI PXM and is valid */
>  #define VIRTIO_MEM_F_ACPI_PXM0
> +/* unplugged memory must not be accessed */
> +#define VIRTIO_MEM_F_UNPLUGGED_INACCESSIBLE  1
>  
>  
>  /* --- virtio-mem: guest -> host requests --- */
> 

This is verbatim copy of kernel commit of 61082ad6a6e1f.

Reviewed-by: Michal Privoznik 

Michal




Re: [PATCH v1 3/3] virtio-mem: Set "unplugged-inaccessible=auto" for the 6.2 machine on x86

2021-11-30 Thread Michal Prívozník
On 11/30/21 10:28, David Hildenbrand wrote:
> Set the new default to "auto", keeping it set to "on" for compat

I believe you wanted to say: keeping it set to "off", because that's
what the patch does.

> machines. This property is only available for x86 targets.
> 
> TODO: once 6.2 was released and we have compat machines, target the next
>   QEMU release.
> 
> Signed-off-by: David Hildenbrand 
> ---
>  hw/i386/pc.c   | 1 +
>  hw/virtio/virtio-mem.c | 2 +-
>  2 files changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/i386/pc.c b/hw/i386/pc.c
> index a2ef40ecbc..045ba05431 100644
> --- a/hw/i386/pc.c
> +++ b/hw/i386/pc.c
> @@ -99,6 +99,7 @@ GlobalProperty pc_compat_6_1[] = {
>  { TYPE_X86_CPU, "hv-version-id-major", "0x0006" },
>  { TYPE_X86_CPU, "hv-version-id-minor", "0x0001" },
>  { "ICH9-LPC", "x-keep-pci-slot-hpc", "false" },
> +{ "virtio-mem", "unplugged-inaccessible", "off" },
>  };
>  const size_t pc_compat_6_1_len = G_N_ELEMENTS(pc_compat_6_1);
>  
> diff --git a/hw/virtio/virtio-mem.c b/hw/virtio/virtio-mem.c
> index 1e57156e81..a5d26d414f 100644
> --- a/hw/virtio/virtio-mem.c
> +++ b/hw/virtio/virtio-mem.c
> @@ -1169,7 +1169,7 @@ static Property virtio_mem_properties[] = {
>   TYPE_MEMORY_BACKEND, HostMemoryBackend *),
>  #if defined(VIRTIO_MEM_HAS_LEGACY_GUESTS)
>  DEFINE_PROP_ON_OFF_AUTO(VIRTIO_MEM_UNPLUGGED_INACCESSIBLE_PROP, 
> VirtIOMEM,
> -unplugged_inaccessible, ON_OFF_AUTO_OFF),
> +unplugged_inaccessible, ON_OFF_AUTO_AUTO),
>  #endif
>  DEFINE_PROP_END_OF_LIST(),
>  };
> 

Reviewed-by: Michal Privoznik 

Michal




Re: Odd square bracket encoding in QOM names

2021-11-30 Thread Michal Prívozník
On 11/30/21 09:35, Mark Cave-Ayland wrote:
> Hi all,
> 
> Has there been a recent change as to how square brackets are encoded
> within QOM names? I noticed that the output has changed here in the
> "info qom-tree" output in qemu-system-m68k for the q800 machine.
> 
> The q800 machine has a set of 256 memory region aliases that used to
> appear in the "info qom-tree" output as:
> 
>     /mac_m68k.io[100] (memory-region)
>     /mac_m68k.io[101] (memory-region)
>     /mac_m68k.io[102] (memory-region)
> 
> but they now appear as:
> 
>     /mac_m68k.io\x5b100\x5d[0] (memory-region)
>     /mac_m68k.io\x5b101\x5d[0] (memory-region)
>     /mac_m68k.io\x5b102\x5d[0] (memory-region)
> 
> Is there something that could cause the names to be double-encoded
> before being displayed?

I see the same behavior on x86_64 and qemu-system-x86_64 but with a
different member:

/machine (pc-i440fx-4.0-machine)
  /device-memory[0] (memory-region)
  /fw_cfg (fw_cfg_io)
/\x2from@etc\x2facpi\x2frsdp[0] (memory-region)
/\x2from@etc\x2facpi\x2ftables[0] (memory-region)
/\x2from@etc\x2ftable-loader[0] (memory-region)

And the same happens over QMP too:

virsh qemu-monitor-command $dom qom-list '"path":"/machine/fw_cfg"'

{"return":[{"name":"type","type":"string"},{"name":"parent_bus","type":"link"},{"name":"realized","type":"bool"},{"name":"hotplugged","type":"bool"},{"name":"hotpluggable","type":"bool"},{"name":"acpi-mr-restore","type":"bool"},{"name":"x-file-slots","type":"uint16"},{"name":"dma_enabled","type":"bool"},{"name":"\\x2from@etc\\x2facpi\\x2ftables[0]","type":"child"},{"name":"fwcfg.dma[0]","type":"child"},{"name":"fwcfg[0]","type":"child"},{"name":"\\x2from@etc\\x2facpi\\x2frsdp[0]","type":"child"},{"name":"\\x2from@etc\\x2ftable-loader[0]","type":"child"}],"id":"libvirt-455"}

Michal




Re: RFC: extending QGA guest-network-get-interfaces

2021-11-24 Thread Michal Prívozník
On 11/24/21 15:00, Marc-André Lureau wrote:
> Hi
> 
> Or Shoval filed two RFE for guest-network-get-interfaces:
> - "Guest agent should report the interface Permanent address"
> https://bugzilla.redhat.com/show_bug.cgi?id=2025303
> 

As I've replied in the BZ, this is something that Libvirt can improve
on. What happens when guest changes a MAC address is that
NIC_RX_FILTER_CHANGED event is emitted and libvirt processes it (mostly
to update guest XML, but not limited only to that). But, Libvirt does
not forward this event further to users. Therefore, if a mgmt app on the
top of Libvirt wants to keep track of MAC addresses, it can't do that
simply.

> - "Guest agent should report interfaces from non root network
> namespaces"  https://bugzilla.redhat.com/show_bug.cgi?id=2025296
> 

This is a bit tricky to achieve. There's no easy way to list all nsids
(they don't form a hierarchy like other NSs). We could traverse all
mount points (be aware of mount namespaces), all PIDs (be aware of PID
namespaces) and hope we covered all net NSs (a net namespace can have no
nsid).

IOW, network NSs behave differently from other namespaces.
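A rough sketch of the /proc traversal mentioned above (illustrative Python, not libvirt or QGA code; as noted, it inherently misses namespaces that have no process attached):

```python
import os

def list_net_namespaces(proc="/proc"):
    """Collect distinct network-namespace identities via /proc/<pid>/ns/net.

    NOTE: deliberately incomplete, as discussed above: a namespace kept
    alive only by a bind mount or an open fd, with no process inside it,
    will not show up here.
    """
    seen = set()
    for entry in os.listdir(proc):
        if not entry.isdigit():          # only PID directories
            continue
        link = os.path.join(proc, entry, "ns", "net")
        try:
            seen.add(os.readlink(link))  # e.g. 'net:[4026531992]'
        except OSError:
            continue                     # task exited or not permitted
    return seen
```

Deduplicating on the symlink target (the namespace inode) is what makes two processes sharing one namespace count only once.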

Michal




Re: [PATCH] qmp: Stabilize preconfig

2021-11-10 Thread Michal Prívozník
On 11/3/21 9:02 AM, Markus Armbruster wrote:
> Daniel P. Berrangé  writes:
> 
>> On Mon, Nov 01, 2021 at 03:37:58PM +0100, Michal Prívozník wrote:
>>> On 10/25/21 2:19 PM, Markus Armbruster wrote:
>>>> Michal Privoznik  writes:
>>>>
>>>>> The -preconfig option and exit-preconfig command are around for
>>>>> quite some time now. However, they are still marked as unstable.
>>>>> This is suboptimal because it may block some upper layer in
>>>>> consuming it. In this specific case - Libvirt avoids using
>>>>> experimental features.
>>>>>
>>>>> Signed-off-by: Michal Privoznik 
>>>>
>>>> If I remember correctly, the motivation for -preconfig was NUMA
>>>> configuration via QMP.  More uses may have appeared since.
>>>>
>>>> Back then, I questioned the need for yet another option and yet another
>>>> state: why not -S?
>>>>
>>>> The answer boiled down to
>>>>
>>>> 0. Yes, having just one would be a simpler and cleaner interface, but
>>>>
>>>> 1. the godawful mess QEMU startup has become makes -S unsuitable for
>>>>some things we want to do, so we need -preconfig,
>>>>
>>>> 2. which is in turn unsuitable for other things we want to do, so we
>>>>still need -S".
>>>>
>>>> 3. Cleaning up the mess to the point where "simpler and cleaner" becomes
>>>>viable again is not in the cards right now.
>>>
>>> I see a difference between the two. -preconfig starts QEMU in such a way
>>> that its configuration can still be changed (in my particular use case
>>> vCPUs can be assigned to NUMA nodes), while -S does not allow that. If
>>> we had one state for both, then some commands must be forbidden from
>>> executing as soon as 'cont' is issued. Moreover, those commands would
>>> need to do much more than they are doing now (e.g. regenerate ACPI table
>>> after each run). Subsequently, validating configuration would need to be
>>> postponed until the first 'cont' because with just one state QEMU can't
>>> know when the last config command was issued.
> 
> Doesn't all this apply to x-exit-preconfig already?
> 
> * Some commands are only allowed before x-exit-preconfig,
>   e.g. set-numa-node.
> 
> * The complete (pre-)configuration is only available at
>   x-exit-preconfig.  In particular, ACPI tables can be fixed only then.

So why was preconfig introduced in the first place? I mean, from
libvirt's POV it doesn't really matter whether it needs to use both
-preconfig and -S or just -S (or some new -option). But ideally, we
would start QEMU with nothing but monitor config and then pass whole
configuration via the monitor. I thought it would be simpler for QEMU if
it had three states.

Michal




Re: [PATCH] device_tree: Fix compiler error

2021-11-09 Thread Michal Prívozník
On 11/9/21 9:38 AM, Richard Henderson wrote:
> On 11/8/21 9:07 PM, Stefan Weil wrote:
>> A build with gcc (Debian 10.2.1-6) 10.2.1 20210110 fails:
>>
>> ../../../softmmu/device_tree.c: In function ‘qemu_fdt_add_path’:
>> ../../../softmmu/device_tree.c:560:18: error: ‘retval’ may be used
>> uninitialized in this function [-Werror=maybe-uninitialized]
>>    560 | int namelen, retval;
>>    |  ^~
>>
>> This is not a real error, but the compiler can be satisfied with a
>> small change.
>>
>> Fixes: b863f0b75852 ("device_tree: Add qemu_fdt_add_path")
>> Signed-off-by: Stefan Weil 
> 
> Reviewed-by: Richard Henderson 
> 
> Though I think there's a good deal that could be cleaned up about this
> function:
> 
> (1a) Remove the unused return value?
>  The single use does not check the return.
> 
> (1b) Don't attempt to return a node, merely a success/failure code.
>  Certainly the local documentation here could be improved...
> 
> (1c) Return parent; make retval local to the loop.
> 
> (2) Merge p and path; there's no point retaining the unmodified parameter.
> 
> (3) Move name and namelen inside the loop.


(4) swap if() bodies?

if (retval == -FDT_ERR_NOTFOUND) {
} else if (retval < 0) {
}

Michal




Re: [PATCH] qmp: Stabilize preconfig

2021-11-01 Thread Michal Prívozník
On 10/25/21 2:19 PM, Markus Armbruster wrote:
> Michal Privoznik  writes:
> 
>> The -preconfig option and exit-preconfig command are around for
>> quite some time now. However, they are still marked as unstable.
>> This is suboptimal because it may block some upper layer in
>> consuming it. In this specific case - Libvirt avoids using
>> experimental features.
>>
>> Signed-off-by: Michal Privoznik 
> 
> If I remember correctly, the motivation for -preconfig was NUMA
> configuration via QMP.  More uses may have appeared since.
> 
> Back then, I questioned the need for yet another option and yet another
> state: why not -S?
> 
> The answer boiled down to
> 
> 0. Yes, having just one would be a simpler and cleaner interface, but
> 
> 1. the godawful mess QEMU startup has become makes -S unsuitable for
>some things we want to do, so we need -preconfig,
> 
> 2. which is in turn unsuitable for other things we want to do, so we
>still need -S".
> 
> 3. Cleaning up the mess to the point where "simpler and cleaner" becomes
>viable again is not in the cards right now.

I see a difference between the two. -preconfig starts QEMU in such a way
that its configuration can still be changed (in my particular use case
vCPUs can be assigned to NUMA nodes), while -S does not allow that. If
we had one state for both, then some commands must be forbidden from
executing as soon as 'cont' is issued. Moreover, those commands would
need to do much more than they are doing now (e.g. regenerate ACPI table
after each run). Subsequently, validating configuration would need to be
postponed until the first 'cont' because with just one state QEMU can't
know when the last config command was issued.
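As a toy model, the gating described above could be expressed like this (command names are from this thread; real QEMU gates commands with the 'allow-preconfig' flag in the QAPI schema, so treat the details as illustrative):

```python
ALLOWED = {
    "preconfig": {"set-numa-node", "x-exit-preconfig", "query-status"},
    "prelaunch": {"cont", "query-status"},   # roughly what plain -S gives
    "running":   {"query-status"},
}

class Monitor:
    """Toy model: configuration commands exist only in 'preconfig', and
    leaving that state makes them permanently unavailable."""
    def __init__(self):
        self.state = "preconfig"

    def run(self, command):
        if command not in ALLOWED[self.state]:
            raise RuntimeError("command %r not allowed in state %r"
                               % (command, self.state))
        if command == "x-exit-preconfig":
            self.state = "prelaunch"
        elif command == "cont":
            self.state = "running"
        return "ok"
```

With a single state, the monitor could not tell when the last configuration command had been issued; the explicit state transition is what marks that point.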

Having said all of that, I'm not sure if -preconfig is the way to go or
we want to go the other way. I don't have a strong opinion.

Michal




Re: [PATCH 1/2] chardev: Propagate error from logfile opening

2021-09-06 Thread Michal Prívozník
On 8/17/21 11:54 AM, Marc-André Lureau wrote:
> On Tue, Aug 17, 2021 at 12:56 PM Michal Privoznik 
> wrote:
> 
>> If a chardev has a logfile the file is opened using
>> qemu_open_old() which does the job, but since @errp is not
>> propagated into qemu_open_internal() we lose much more accurate
>> error and just report "Unable to open logfile $errno".  When
>> using plain files, it's probably okay as nothing complex is
>> happening behind the curtains. But the problem becomes more
>> prominent when passing an "/dev/fdset/XXX" path since much more
>> needs to be done.
>>
>> The fix is to use qemu_create() which passes @errp further down.
>>
>> Signed-off-by: Michal Privoznik 
>>
> 
> Reviewed-by: Marc-André Lureau 

Thanks, can you add it to your pull queue please?

Michal




Re: [PATCH 2/2] monitor: Report EBADFD if fdset contains invalid FD

2021-08-17 Thread Michal Prívozník
On 8/17/21 11:59 AM, Marc-André Lureau wrote:
> Hi
> 
> On Tue, Aug 17, 2021 at 12:56 PM Michal Privoznik 
> wrote:
> 
>> When opening a path that starts with "/dev/fdset/" the control
>> jumps into qemu_parse_fdset() and then into
>> monitor_fdset_dup_fd_add(). In here, corresponding fdset is found
>> and then all FDs from the set are iterated over trying to find an
>> FD that matches expected access mode. For instance, if caller
>> wants O_WRONLY then the FD set has to contain an O_WRONLY FD.
>>
>> If no such FD is found then errno is set to EACCES which results
>> in very misleading error messages, for instance:
>>
>>   Could not dup FD for /dev/fdset/3 flags 441: Permission denied
>>
>> There is no permission issue, the problem is that there was no FD
>> within given fdset that was in expected access mode. Therefore,
>> let's set errno to EBADFD, which gives us somewhat better
>> error messages:
>>
>>   Could not dup FD for /dev/fdset/3 flags 441: File descriptor in bad state
>>
>> Signed-off-by: Michal Privoznik 
>>
> 
> I am not sure this is any better. If you try to open a read-only file, the
> system also reports EACCES (Permission denied). This is what the current
> code models, I believe.

Fair enough. Another idea I had was that if an FD that's O_RDWR was
passed but only read or only write access was requested then such FD
could be accepted because it is capable of reading/writing.
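That relaxed matching could look roughly like this (a simplified model with an invented helper, not QEMU's actual monitor_fdset_dup_fd_add()):

```python
import os

O_ACCMODE = 0o3  # access-mode bit mask; Linux value, assumed for illustration

def pick_fd(fdset_modes, wanted):
    """Return the index of an fd whose access mode can serve 'wanted',
    else None.  Unlike the current code, an O_RDWR fd is accepted for a
    read-only or write-only request (the relaxation suggested above)."""
    want = wanted & O_ACCMODE
    for i, mode in enumerate(fdset_modes):
        have = mode & O_ACCMODE
        if have == want or have == os.O_RDWR:
            return i
    return None
```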

But since patch 1/2 was accepted, I guess 2/2 is not that much needed.

Michal




Re: [PATCH 2/2] numa: Parse initiator= attribute before cpus= attribute

2021-07-09 Thread Michal Prívozník
On 7/9/21 11:26 AM, Igor Mammedov wrote:
> On Wed,  7 Jul 2021 15:40:30 +0200
> Michal Privoznik  wrote:
> 
>> When parsing the cpus= attribute of a -numa object, a couple of checks
>> are performed, such as correct initiator setting (see the if()
>> statement at the end of for() loop in
>> machine_set_cpu_numa_node()).
>>
>> However, with the current code cpus= attribute is parsed before
>> initiator= attribute and thus the check may fail even though it
>> is not obvious why. But since parsing the initiator= attribute
>> does not depend on the cpus= attribute we can swap the order of
>> the two.
> 
> Reviewed-by: Igor Mammedov 
> 
> FYI:
> I'm planning to deprecate '-numa node,cpus=' in favor of '-numa cpu'.

Yes, I'm aware of that and I remember I posted some patches for libvirt
that you reviewed. I need to return to them rather sooner than later.
Meanwhile, this patch helps.
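The ordering dependency can be sketched like this (toy model; the attribute names come from the patch, the validation itself is invented for illustration):

```python
def set_numa_node(node, opts):
    """Toy model of the parse-ordering fix: handling 'cpus' validates
    against the node's initiator, so 'initiator' is applied first no
    matter which order the attributes were given in.
    (The validation itself is invented for illustration.)"""
    for key in ("initiator", "cpus"):     # fixed order, not input order
        if key not in opts:
            continue
        if key == "initiator":
            node["initiator"] = opts["initiator"]
        elif node.get("initiator") is None:
            raise ValueError("cpus= checked against an unset initiator")
        else:
            node["cpus"] = list(opts["cpus"])
    return node
```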

Thanks,
Michal




Re: [PATCH] block/nvme: Fix VFIO_MAP_DMA failed: No space left on device

2021-06-14 Thread Michal Prívozník
On 6/14/21 6:03 PM, Philippe Mathieu-Daudé wrote:
> On 6/11/21 1:46 PM, Philippe Mathieu-Daudé wrote:
>> When the NVMe block driver was introduced (see commit bdd6a90a9e5,
>> January 2018), Linux VFIO_IOMMU_MAP_DMA ioctl was only returning
>> -ENOMEM in case of error. The driver was correctly handling the
>> error path to recycle its volatile IOVA mappings.
>>
>> To fix CVE-2019-3882, Linux commit 492855939bdb ("vfio/type1: Limit
>> DMA mappings per container", April 2019) added the -ENOSPC error to
>> signal the user exhausted the DMA mappings available for a container.
> 
> Hmm this commit has been added before v5.1-rc4.
> 
> So while this fixes the behavior of v5.1-rc4+ kernels,
> older kernels using this fix will have the same problem...
> 
> Should I check uname(2)'s utsname.release[]? Is it reliable?

Not at all. What if somebody runs kernel with that commit backported? I
would leave this up to distro maintainers to figure out. They know what
kernel patches are backported and thus what qemu patches should be
backported too. I'm wondering if there's a way we can help them by
letting know (other than mentioning the kernel commit).

Michal




Re: [PATCH] block/nvme: Fix VFIO_MAP_DMA failed: No space left on device

2021-06-14 Thread Michal Prívozník
On 6/11/21 1:46 PM, Philippe Mathieu-Daudé wrote:
> When the NVMe block driver was introduced (see commit bdd6a90a9e5,
> January 2018), Linux VFIO_IOMMU_MAP_DMA ioctl was only returning
> -ENOMEM in case of error. The driver was correctly handling the
> error path to recycle its volatile IOVA mappings.
> 
> To fix CVE-2019-3882, Linux commit 492855939bdb ("vfio/type1: Limit
> DMA mappings per container", April 2019) added the -ENOSPC error to
> signal the user exhausted the DMA mappings available for a container.
> 
> The block driver started to mis-behave:
> 
>   qemu-system-x86_64: VFIO_MAP_DMA failed: No space left on device
>   (qemu)
>   (qemu) info status
>   VM status: paused (io-error)
>   (qemu) c
>   VFIO_MAP_DMA failed: No space left on device
>   qemu-system-x86_64: block/block-backend.c:1968: blk_get_aio_context: 
> Assertion `ctx == blk->ctx' failed.
> 
> Fix by handling the -ENOSPC error when DMA mappings are exhausted;
> other errors (such as -ENOMEM) are still handled later in the same
> function.
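The recycle-and-retry flow being restored can be modelled roughly like this (names invented; in the real driver the retry happens after unmapping its temporary IOVA mappings via VFIO):

```python
import errno

class DmaMapper:
    """Toy model of the recycling behaviour: when the kernel reports
    ENOSPC (per-container mapping limit reached), temporary ("volatile")
    IOVA mappings are torn down and the map is retried once."""

    def __init__(self, limit):
        self.limit = limit
        self.permanent = set()
        self.volatile = set()

    def _kernel_map(self, iova):
        # stand-in for the VFIO_IOMMU_MAP_DMA ioctl
        if len(self.permanent) + len(self.volatile) >= self.limit:
            raise OSError(errno.ENOSPC, "No space left on device")

    def map(self, iova, temporary=True):
        try:
            self._kernel_map(iova)
        except OSError as e:
            if e.errno != errno.ENOSPC or not self.volatile:
                raise                    # -ENOMEM etc. handled elsewhere
            self.volatile.clear()        # recycle volatile mappings...
            self._kernel_map(iova)       # ...and retry once
        (self.volatile if temporary else self.permanent).add(iova)
```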
> 
> An easy way to reproduce this bug is to restrict the DMA mapping
> limit (65535 by default) when loading the VFIO IOMMU module:
> 
>   # modprobe vfio_iommu_type1 dma_entry_limit=666
> 
> Cc: qemu-sta...@nongnu.org
> Reported-by: Michal Prívozník 
> Fixes: bdd6a90a9e5 ("block: Add VFIO based NVMe driver")
> Buglink: https://bugs.launchpad.net/qemu/+bug/186
> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/65
> Signed-off-by: Philippe Mathieu-Daudé 
> ---
> Michal, is it still possible for you to test this (old bug)?
> 

Unfortunately I no longer have access to the machine. But, IIRC it was
fairly easy to reproduce - just pass through any NVMe disk using the
NVMe disk backend (-blockdev '{"driver":"nvme", ...).

Sorry,
Michal




Re: [PULL 12/25] virtio-gpu: move virgl realize + properties

2021-05-21 Thread Michal Prívozník
On 5/10/21 3:20 PM, Gerd Hoffmann wrote:
> Move device init (realize) and properties.
> 
> Drop the virgl property, the virtio-gpu-gl-device has virgl enabled no
> matter what.  Just use virtio-gpu-device instead if you don't want
> enable virgl and opengl.  This simplifies the logic and reduces the test
> matrix.
> 
> Signed-off-by: Gerd Hoffmann 
> Message-id: 20210430113547.1816178-1-kra...@redhat.com
> Message-Id: <20210430113547.1816178-4-kra...@redhat.com>
> ---
>  include/hw/virtio/virtio-gpu.h |  1 +
>  hw/display/virtio-gpu-gl.c | 33 +
>  hw/display/virtio-gpu.c| 23 +--
>  3 files changed, 35 insertions(+), 22 deletions(-)
> 

> @@ -1251,12 +1236,6 @@ static Property virtio_gpu_properties[] = {
>  VIRTIO_GPU_BASE_PROPERTIES(VirtIOGPU, parent_obj.conf),
>  DEFINE_PROP_SIZE("max_hostmem", VirtIOGPU, conf_max_hostmem,
>   256 * MiB),
> -#ifdef CONFIG_VIRGL
> -DEFINE_PROP_BIT("virgl", VirtIOGPU, parent_obj.conf.flags,
> -VIRTIO_GPU_FLAG_VIRGL_ENABLED, true),
> -DEFINE_PROP_BIT("stats", VirtIOGPU, parent_obj.conf.flags,
> -VIRTIO_GPU_FLAG_STATS_ENABLED, false),
> -#endif
>  DEFINE_PROP_END_OF_LIST(),
>  };
>  
> 

Sorry for catching this a bit late, but libvirt is looking for "virgl"
property when guest XML has 3D acceleration enabled:

<video>
  <model type='virtio'>
    <acceleration accel3d='yes'/>
  </model>
</video>

This is the corresponding part of cmd line:

  -device virtio-vga,id=video0,virgl=on,max_outputs=1,bus=pci.0,addr=0x2

The commit message suggests that virtio-gpu-gl-device should be used
instead. Fair enough, so IIUC the cmd line should be changed to:

  -device virtio-gpu-gl-device,id=video0,max_outputs=1,bus=pci.0,addr=0x2


Michal




Re: [PATCH] virtio-blk: drop deprecated scsi=on|off property

2021-04-29 Thread Michal Prívozník
On 4/29/21 5:52 PM, Stefan Hajnoczi wrote:
> The scsi=on|off property was deprecated in QEMU 5.0 and can be removed
> completely at this point.
> 
> Drop the scsi=on|off option. It was only available on Legacy virtio-blk
> devices. Linux v5.6 already dropped support for it.
> 
> Remove the hw_compat_2_4[] property assignment since scsi=on|off no
> longer exists. Old guests with Legacy virtio-blk devices no longer see
> the SCSI host features bit.
> 
> Live migrating old guests from an old QEMU with the SCSI feature bit
> enabled will fail with "Features 0x... unsupported. Allowed features:
> 0x...". We've followed the QEMU deprecation policy so users have been
> warned...
> 
> I have tested that libvirt still works when the property is absent. It
> no longer adds scsi=on|off to the command-line.
> 
> Cc: Markus Armbruster 
> Cc: Christoph Hellwig 
> Cc: Peter Krempa 
> Cc: Dr. David Alan Gilbert 
> Signed-off-by: Stefan Hajnoczi 
> ---
>  docs/specs/tpm.rst   |   2 +-
>  docs/system/deprecated.rst   |  13 ---
>  docs/pci_expander_bridge.txt |   2 +-
>  hw/block/virtio-blk.c| 192 +--
>  hw/core/machine.c|   2 -
>  5 files changed, 3 insertions(+), 208 deletions(-)

Reviewed-by: Michal Privoznik 

Michal




Re: [PULL v8 00/86] Misc QEMU patches for 2020-09-24

2020-10-02 Thread Michal Prívozník

On 10/2/20 6:22 PM, Eduardo Habkost wrote:

On Fri, Oct 02, 2020 at 05:58:55PM +0200, Michal Prívozník wrote:

On 9/30/20 9:58 PM, Paolo Bonzini wrote:


Eduardo Habkost (10):



docs: Create docs/devel/qom.rst


cd442a45db60a1a72fcf980c24bd1227f13f8a87 is the first bad commit

Sorry for not noticing this earlier, but is this known? The build starts failing
for me after this commit:

/usr/bin/sphinx-build -Dversion=5.1.50 -Drelease= -W -Ddepfile=docs/devel.d
-Ddepfile_stamp=docs/devel.stamp -b html -d
/home/zippy/work/qemu/qemu.git/build/docs/devel.p
/home/zippy/work/qemu/qemu.git/docs/devel
/home/zippy/work/qemu/qemu.git/build/docs/devel
Running Sphinx v3.2.1
building [mo]: targets for 0 po files that are out of date
building [html]: targets for 20 source files that are out of date
updating environment: [new config] 20 added, 0 changed, 0 removed
reading sources... [100%] testing




Warning, treated as error:
/home/zippy/work/qemu/qemu.git/docs/../include/qom/object.h:747:Error in
declarator
If declarator-id with parameters (e.g., 'void f(int arg)'):
   Invalid C declaration: Expected identifier in nested name. [error at 24]
 object_initialize_child ( parent,  propname,  child,  type)
 ^
If parenthesis in noptr-declarator (e.g., 'void (*f(int arg))(double)'):
   Error in declarator or parameters
   Invalid C declaration: Expecting "(" in parameters. [error at 32]
 object_initialize_child ( parent,  propname,  child,  type)
 ^

make[1]: *** [Makefile.ninja:9898: docs/devel.stamp] Error 2
make[1]: *** Deleting file 'docs/devel.stamp'
make[1]: Leaving directory '/home/zippy/work/qemu/qemu.git/build'
make: *** [GNUmakefile:11: all] Error 2


I can't reproduce it using Sphinx v2.2.2.  I'm still trying to
understand what exactly the error means.



Same here.


I really wish we used virtualenv + requirements.txt to require a
specific version of Sphinx instead of wasting time dealing a wide
range of Sphinx versions.



I already have a patch that I keep locally to build with v3:

diff --git a/docs/qemu-option-trace.rst.inc b/docs/qemu-option-trace.rst.inc
index 7e09773a9c..ae83f6a1a8 100644
--- a/docs/qemu-option-trace.rst.inc
+++ b/docs/qemu-option-trace.rst.inc
@@ -1,7 +1,7 @@

 Specify tracing options.

-.. option:: [enable=]PATTERN
+.. option:: enable=PATTERN

   Immediately enable events matching *PATTERN*
   (either event name or a globbing pattern).  This option is only


That said, I'm not objecting to requiring v2 for now and switching to v3 
later.



But interestingly, through trial and error I've come across this hack, 
which allows me to build again. I have no idea why it works:


diff --git i/include/qom/object.h w/include/qom/object.h
index 27aaa67e63..59c729ebb7 100644
--- i/include/qom/object.h
+++ w/include/qom/object.h
@@ -762,13 +762,14 @@ bool object_initialize_child_with_propsv(Object 
*parentobj,

  *  child, sizeof(*child), type,
  *  &error_abort, NULL)
  */
-#define object_initialize_child(parent, propname, child, type)  \
-object_initialize_child_internal((parent), (propname),  \
- (child), sizeof(*(child)), (type))
 void object_initialize_child_internal(Object *parent, const char 
*propname,

   void *child, size_t size,
   const char *type);

+#define object_initialize_child(parent, propname, child, type)  \
+object_initialize_child_internal((parent), (propname),  \
+ (child), sizeof(*(child)), (type))
+
 /**
  * object_dynamic_cast:
  * @obj: The object to cast.




Re: [PULL v8 00/86] Misc QEMU patches for 2020-09-24

2020-10-02 Thread Michal Prívozník

On 9/30/20 9:58 PM, Paolo Bonzini wrote:
>

Eduardo Habkost (10):



   docs: Create docs/devel/qom.rst


cd442a45db60a1a72fcf980c24bd1227f13f8a87 is the first bad commit

Sorry for not noticing this earlier, but is this known? The build starts 
failing for me after this commit:


/usr/bin/sphinx-build -Dversion=5.1.50 -Drelease= -W 
-Ddepfile=docs/devel.d -Ddepfile_stamp=docs/devel.stamp -b html -d 
/home/zippy/work/qemu/qemu.git/build/docs/devel.p 
/home/zippy/work/qemu/qemu.git/docs/devel 
/home/zippy/work/qemu/qemu.git/build/docs/devel

Running Sphinx v3.2.1
building [mo]: targets for 0 po files that are out of date
building [html]: targets for 20 source files that are out of date
updating environment: [new config] 20 added, 0 changed, 0 removed
reading sources... [100%] testing 





Warning, treated as error:
/home/zippy/work/qemu/qemu.git/docs/../include/qom/object.h:747:Error in 
declarator

If declarator-id with parameters (e.g., 'void f(int arg)'):
  Invalid C declaration: Expected identifier in nested name. [error at 24]
object_initialize_child ( parent,  propname,  child,  type)
^
If parenthesis in noptr-declarator (e.g., 'void (*f(int arg))(double)'):
  Error in declarator or parameters
  Invalid C declaration: Expecting "(" in parameters. [error at 32]
object_initialize_child ( parent,  propname,  child,  type)
^

make[1]: *** [Makefile.ninja:9898: docs/devel.stamp] Error 2
make[1]: *** Deleting file 'docs/devel.stamp'
make[1]: Leaving directory '/home/zippy/work/qemu/qemu.git/build'
make: *** [GNUmakefile:11: all] Error 2


Michal




Re: [PATCH] configure: Improve zstd test

2020-03-06 Thread Michal Prívozník
On 5. 3. 2020 16:27, Juan Quintela wrote:
> Alex Bennée  wrote:
>> Juan Quintela  writes:
>>
>>> There was one error in the test (missing an s for --exists).
>>> But we really need a recent zstd (0.8.1).
>>> That version was released in 2016, so it is newer than some of our travis
>>> images.  Just check for the version that we need.
>>>
>>> Signed-off-by: Juan Quintela 
>>> Reported-by: Richard Henderson 
>>> ---
>>>  configure | 3 ++-
>>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/configure b/configure
>>> index 7b373bc0bb..1bf48df1ef 100755
>>> --- a/configure
>>> +++ b/configure
>>> @@ -2464,7 +2464,8 @@ fi
>>>  # zstd check
>>>  
>>>  if test "$zstd" != "no" ; then
>>> -if $pkg_config --exist libzstd ; then
>>> +libzstd_minver="0.8.1"
>>> +if $pkg_config --atleast-version=$libzstd_minver libzstd ; then
>>>  zstd_cflags="$($pkg_config --cflags libzstd)"
>>>  zstd_libs="$($pkg_config --libs libzstd)"
>>>  LIBS="$zstd_libs $LIBS"
>>
>> Hmm still breaks with:
>>
>>make docker-test-build@ubuntu J=9 V=1
> 
> Thanks.
> 
>> With:
>>
>>   FY_SOURCE=2 -g   -c -o monitor/qmp.o /tmp/qemu-test/src/monitor/qmp.c
>>   /tmp/qemu-test/src/migration/multifd-zstd.c: In function 
>> 'zstd_send_prepare':
>>   /tmp/qemu-test/src/migration/multifd-zstd.c:125:9: error: unknown type 
>> name 'ZSTD_EndDirective'; did you mean 'ZSTD_DDict'?
>>ZSTD_EndDirective flush = ZSTD_e_continue;
>>^
> 
> Great, more things were introduced later.
> As it would be too easy, the zstd repository is not linear, you need to
> checkout the tag you want to see when something has been introduced.
> 
> Will try to get this fixed.
> 
> Sorry for the inconveniences.
> 
> 
>>   Version: 1.3.8+dfsg-3
>>   Depends: libzstd1 (= 1.3.8+dfsg-3)
>>   Description: fast lossless compression algorithm -- development files
> 
> I don't understand now.
> 
> ZSTD_EndDirective was included in 1.3.0.
> 
> I can just change that for 1.3.9, but I don't know why is that there.
> Could you do a grep ZSTD_EndDirective /usr/lib/zstd.h?

Thing is, they have so-called experimental APIs. You get them only if
you define ZSTD_STATIC_LINKING_ONLY before including zstd.h. So the
plain grep of a symbol tells us nothing. We need to check if it's not in
#ifdef. Looks like 1.3.9 is the minimal version which has everything we
want.
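A crude, line-based illustration of such a check (a real check would need the C preprocessor; this only tracks a single guard macro):

```python
def symbol_in_stable_api(header_text, symbol,
                         guard="ZSTD_STATIC_LINKING_ONLY"):
    """Rough check: does 'symbol' appear outside the experimental
    '#if defined(GUARD)' section?  A toy line-based scan; real headers
    would need proper preprocessing."""
    depth = 0       # nesting depth of #if blocks
    guarded = None  # depth at which the guard opened
    for line in header_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("#if"):
            depth += 1
            if guard in stripped and guarded is None:
                guarded = depth
        elif stripped.startswith("#endif"):
            if guarded == depth:
                guarded = None
            depth -= 1
        elif symbol in line and guarded is None:
            return True
    return False
```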

Michal




[Bug 1863333] [NEW] Assigning NVMe disk to a domain causes VFIO_MAP_DMA errors

2020-02-14 Thread Michal Prívozník
Public bug reported:

I'm seeing some errors when assigning my NVMe disk to qemu. This is the
full command line:


/home/zippy/work/qemu/qemu.git/x86_64-softmmu/qemu-system-x86_64 \
-name guest=fedora,debug-threads=on \
-S \
-object 
secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-2-fedora/master-key.aes
 \
-machine pc-i440fx-4.1,accel=kvm,usb=off,dump-guest-core=off \
-cpu host \
-m size=4194304k,slots=16,maxmem=1099511627776k \
-overcommit mem-lock=off \
-smp 4,sockets=1,dies=1,cores=2,threads=2 \
-object iothread,id=iothread1 \
-object iothread,id=iothread2 \
-object iothread,id=iothread3 \
-object iothread,id=iothread4 \
-mem-prealloc \
-mem-path /hugepages2M/libvirt/qemu/2-fedora \
-numa node,nodeid=0,cpus=0,mem=4096 \
-uuid 63840878-0deb-4095-97e6-fc444d9bc9fa \
-no-user-config \
-nodefaults \
-chardev socket,id=charmonitor,fd=31,server,nowait \
-mon chardev=charmonitor,id=monitor,mode=control \
-rtc base=utc \
-no-shutdown \
-global PIIX4_PM.disable_s3=0 \
-global PIIX4_PM.disable_s4=0 \
-boot menu=on,strict=on \
-device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 \
-device virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x4 \
-device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x5 \
-blockdev 
'{"driver":"file","filename":"/var/lib/libvirt/images/fedora.qcow2","node-name":"libvirt-2-storage","auto-read-only":true,"discard":"unmap"}'
 \
-blockdev 
'{"node-name":"libvirt-2-format","read-only":false,"discard":"unmap","driver":"qcow2","file":"libvirt-2-storage","backing":null}'
 \
-device 
scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,device_id=drive-scsi0-0-0-0,drive=libvirt-2-format,id=scsi0-0-0-0,bootindex=1
 \
-blockdev 
'{"driver":"nvme","device":":02:00.0","namespace":1,"node-name":"libvirt-1-storage","auto-read-only":true,"discard":"unmap"}'
 \
-blockdev 
'{"node-name":"libvirt-1-format","read-only":false,"driver":"raw","file":"libvirt-1-storage"}'
 \
-device 
virtio-blk-pci,scsi=off,bus=pci.0,addr=0x6,drive=libvirt-1-format,id=virtio-disk0
 \
-netdev tap,fd=33,id=hostnet0,vhost=on,vhostfd=34 \
-device 
virtio-net-pci,host_mtu=9000,netdev=hostnet0,id=net0,mac=52:54:00:a4:6f:91,bus=pci.0,addr=0x3
 \
-chardev pty,id=charserial0 \
-device isa-serial,chardev=charserial0,id=serial0 \
-chardev socket,id=charchannel0,fd=35,server,nowait \
-device 
virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=org.qemu.guest_agent.0
 \
-spice port=5900,addr=0.0.0.0,disable-ticketing,seamless-migration=on \
-device virtio-vga,id=video0,virgl=on,max_outputs=1,bus=pci.0,addr=0x2 \
-device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x7 \
-sandbox 
on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny \
-msg timestamp=on

And these are the errors I see:

2020-02-14T09:06:18.183167Z qemu-system-x86_64: VFIO_MAP_DMA failed: Invalid 
argument
2020-02-14T09:10:49.753767Z qemu-system-x86_64: VFIO_MAP_DMA failed: Cannot 
allocate memory
2020-02-14T09:11:04.530344Z qemu-system-x86_64: VFIO_MAP_DMA failed: No space 
left on device
2020-02-14T09:11:04.531087Z qemu-system-x86_64: VFIO_MAP_DMA failed: No space 
left on device
2020-02-14T09:11:04.531230Z qemu-system-x86_64: VFIO_MAP_DMA failed: No space 
left on device


I'm doing nothing with the disk inside the guest, but:

  # dd if=/dev/vda of=/dev/null status=progress

(the disk appears as /dev/vda in the guest). Surprisingly, I do not see
these errors when I use the traditional PCI assignment (-device vfio-
pci). My versions of kernel and qemu:

moe ~ # uname -r
5.4.15-gentoo
moe ~ # /home/zippy/work/qemu/qemu.git/x86_64-softmmu/qemu-system-x86_64 
--version
QEMU emulator version 4.2.50 (v4.2.0-1439-g5d6542bea7-dirty)
Copyright (c) 2003-2019 Fabrice Bellard and the QEMU Project developers

** Affects: qemu
 Importance: Undecided
 Status: New

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/186


Re: [PATCH] x86: Check for machine state object class before typecasting it

2019-12-30 Thread Michal Prívozník
On 12/30/19 10:45 AM, Philippe Mathieu-Daudé wrote:
> On 12/30/19 10:35 AM, Michal Prívozník wrote:
>> On 12/30/19 9:41 AM, Philippe Mathieu-Daudé wrote:
>>> On 12/30/19 9:00 AM, Michal Privoznik wrote:
>>>> In v4.2.0-246-ged9e923c3c the SMM property was moved from PC
>>>
>>> Typo v4.2.0-246-ged9e923c3c -> ed9e923c3c.
>>
>> This depends on how you format the hash :-)
>> I've used 'git describe ed9e923c3c' because I find it more readable for
>> us humans (at least we see what version the commit was introduced in).
>> But I don't know what the practice is in qemu.
> 
> Hmm I never used it. Your explanation makes sense, but the tag confused
> me because I don't have it locally. However git (and gitk) seems clever
> enough to only use the useful part:

The v4.2.0 tag is in origin. I wonder how come you do not have it.

> 
> $ git show randomcrap-ged9e923c3c
> commit ed9e923c3c9a2c50c4e82ba178b3fb1feba56867
> Author: Paolo Bonzini 
> Date:   Thu Dec 12 17:28:01 2019 +0100
> 
>     x86: move SMM property to X86MachineState
> 
> FYI My output is different:
> 
> $ git describe ed9e923c3c
> pull-target-arm-20191216-1-199-ged9e923c3c

You may want to use 'git describe --tags --match "v*" $commit'

But again, feel free to change it to whatever you/committer wants.

Michal




Re: [PATCH v2] accel/kvm: Make "kernel_irqchip" default on

2019-12-30 Thread Michal Prívozník
On 12/28/19 11:43 AM, Xiaoyao Li wrote:
> Commit 11bc4a13d1f4 ("kvm: convert "-machine kernel_irqchip" to an
> accelerator property") moves kernel_irqchip property from "-machine" to
> "-accel kvm", but it forgets to set the default value of
> kernel_irqchip_allowed and kernel_irqchip_split.
> 
> Also cleaning up the three useless members (kernel_irqchip_allowed,
> kernel_irqchip_required, kernel_irqchip_split) in struct MachineState.
> 
> Fixes: 11bc4a13d1f4 ("kvm: convert "-machine kernel_irqchip" to an 
> accelerator property")
> Reported-by: Vitaly Kuznetsov 
> Signed-off-by: Xiaoyao Li 
> ---
> Changes in v2:
>   - Add Reported-by tag;
>   - Initialize kernel_irqchip_split in init_machine();
> ---
>  accel/kvm/kvm-all.c | 3 +++
>  include/hw/boards.h | 3 ---
>  2 files changed, 3 insertions(+), 3 deletions(-)

Huh, I've just converged to the same patch.

Tested-by: Michal Prívozník 

Michal




Re: [PATCH] x86: Check for machine state object class before typecasting it

2019-12-30 Thread Michal Prívozník
On 12/30/19 9:41 AM, Philippe Mathieu-Daudé wrote:
> On 12/30/19 9:00 AM, Michal Privoznik wrote:
>> In v4.2.0-246-ged9e923c3c the SMM property was moved from PC
> 
> Typo v4.2.0-246-ged9e923c3c -> ed9e923c3c.

This depends on how you format the hash :-)
I've used 'git describe ed9e923c3c' because I find it more readable for
us humans (at least we see what version the commit was introduced in).
But I don't know what the practice is in qemu.

> 
>> machine class to x86 machine class. Makes sense, but the change
>> was too aggressive: in target/i386/kvm.c:kvm_arch_init() it
>> altered check which sets SMRAM if given machine has SMM enabled.
>> The line that detects whether given machine object is class of
>> PC_MACHINE was removed from the check. This makes qemu try to
>> enable SMRAM for all machine types, which is not what we want.
>>
> 
> Fixes: ed9e923c3c
> Reviewed-by: Philippe Mathieu-Daudé 

Thanks,
Michal




Re: Making QEMU easier for management tools and applications

2019-12-23 Thread Michal Prívozník
On 12/21/19 10:02 AM, Markus Armbruster wrote:
> Stefan Hajnoczi  writes:
> 


>> 4. Go and Rust bindings would also be useful.  There is
>> https://github.com/intel/govmm but I think it makes sense to keep it
>> in qemu.git and provide an interface similar to our Python modules.
> 
> Mapping QAPI/QMP commands and events to function signatures isn't hard
> (the QAPI code generator does).  Two problems (at least):
> 
> 1. Leads to some pretty ridiculous functions.  Here's one:
> 
> void qmp_blockdev_mirror(bool has_job_id, const char *job_id,
>  const char *device,
>  const char *target,
>  bool has_replaces, const char *replaces,
>  MirrorSyncMode sync,
>  bool has_speed, int64_t speed,
>  bool has_granularity, uint32_t granularity,
>  bool has_buf_size, int64_t buf_size,
>  bool has_on_source_error,
>  BlockdevOnError on_source_error,
>  bool has_on_target_error, BlockdevOnError 
> on_target_error,
>  bool has_filter_node_name, const char 
> *filter_node_name,
>  bool has_copy_mode, MirrorCopyMode copy_mode, 
>  bool has_auto_finalize, bool auto_finalize,
>  bool has_auto_dismiss, bool auto_dismiss,
>  Error **errp);
> 
>   We commonly use 'boxed': true for such beasts, which results in
>   functions like this one:
> 
> void qmp_blockdev_add(BlockdevOptions *arg, Error **errp);
> 
> 2. Many schema changes that are nicely backward compatible in QMP are
>anything but in such an "obvious" C API.  Adding optional arguments,
>for instance, or changing integer type width.  The former is less of
>an issue with 'boxed': true.
> 
> Perhaps less of an issue with dynamic languages.
> 
> I figure a static language would need much more expressive oomph than C
> to be a good target.  No idea how well Go or Rust bindings can work.

This is something that has bothered me for a while now, even though it's
not as bad as it used to be because we are not adding as many wrappers
for monitor commands as we once did. In libvirt, the wrapper for a
monitor command has to be written by hand. Worse, whenever I'm adding a
wrapper I look at the command's QMP schema and let my muscle memory
write the wrapper.

However, it's not only what Markus already mentioned: even if we
generated the wrappers with a script, we would need to be able to
generate them for every single supported version of qemu.

For instance, if qemu version X has a command that accepts some set of
arguments and that set changes in version X+1, then libvirt needs both
wrappers and must decide at runtime (depending on which version it is
talking to) which wrapper to use.

Unfortunately, I don't see any easy way out.
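To make the problem concrete, here is a rough Python sketch of that runtime dispatch. Every function name, argument, and version number below is invented for illustration; this is not libvirt's or QEMU's actual API:

```python
# One hand-written wrapper per schema revision of the same QMP command.
def blockdev_mirror_v1(device, target, sync):
    return {"execute": "blockdev-mirror",
            "arguments": {"device": device, "target": target, "sync": sync}}

def blockdev_mirror_v2(device, target, sync, copy_mode="background"):
    # a (hypothetical) later schema grew an optional 'copy-mode' argument
    return {"execute": "blockdev-mirror",
            "arguments": {"device": device, "target": target, "sync": sync,
                          "copy-mode": copy_mode}}

def blockdev_mirror(qemu_version, **kwargs):
    # pick the wrapper matching the qemu we are actually talking to
    if qemu_version >= (3, 0, 0):
        return blockdev_mirror_v2(**kwargs)
    return blockdev_mirror_v1(**kwargs)
```

The pain point is exactly that both variants must be written and maintained by hand for as long as version X remains supported.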

Michal




Re: [PULL 48/87] x86: move SMM property to X86MachineState

2019-12-23 Thread Michal Prívozník
On 12/23/19 2:38 PM, Paolo Bonzini wrote:
> On 23/12/19 12:40, Michal Prívozník wrote:
>>
>> diff --git i/target/i386/kvm.c w/target/i386/kvm.c
>> index 0b511906e3..7ee3202634 100644
>> --- i/target/i386/kvm.c
>> +++ w/target/i386/kvm.c
>> @@ -2173,6 +2173,7 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
>>  }
>>
>>  if (kvm_check_extension(s, KVM_CAP_X86_SMM) &&
>> +object_dynamic_cast(OBJECT(ms), TYPE_X86_MACHINE) &&
>>  x86_machine_is_smm_enabled(X86_MACHINE(ms))) {
>>  smram_machine_done.notify = register_smram_listener;
>>  qemu_add_machine_init_done_notifier(&smram_machine_done);
> 
> Yes, it's correct.  Is it okay if I just send a patch with your
> Signed-off-by?

ACK.

Michal




Re: [PULL 48/87] x86: move SMM property to X86MachineState

2019-12-23 Thread Michal Prívozník
On 12/23/19 12:33 PM, Daniel P. Berrangé wrote:
> On Mon, Dec 23, 2019 at 12:28:43PM +0100, Michal Prívozník wrote:
>> On 12/18/19 1:02 PM, Paolo Bonzini wrote:
>>> Add it to microvm as well, it is a generic property of the x86
>>> architecture.
>>>
>>> Suggested-by: Sergio Lopez 
>>> Signed-off-by: Paolo Bonzini 
>>> ---
>>>  hw/i386/pc.c  | 49 
>>> -
>>>  hw/i386/pc_piix.c |  6 +++---
>>>  hw/i386/pc_q35.c  |  2 +-
>>>  hw/i386/x86.c | 50 
>>> +-
>>>  include/hw/i386/pc.h  |  3 ---
>>>  include/hw/i386/x86.h |  5 +
>>>  target/i386/kvm.c |  3 +--
>>>  7 files changed, 59 insertions(+), 59 deletions(-)
>>>
>>
>>
>>> diff --git a/target/i386/kvm.c b/target/i386/kvm.c
>>> index ef63f3a..c7ff67a 100644
>>> --- a/target/i386/kvm.c
>>> +++ b/target/i386/kvm.c
>>> @@ -2173,8 +2173,7 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
>>>  }
>>>  
>>>  if (kvm_check_extension(s, KVM_CAP_X86_SMM) &&
>>> -object_dynamic_cast(OBJECT(ms), TYPE_PC_MACHINE) &&
>>> -pc_machine_is_smm_enabled(PC_MACHINE(ms))) {
>>> +x86_machine_is_smm_enabled(X86_MACHINE(ms))) {
>>>  smram_machine_done.notify = register_smram_listener;
>>>  qemu_add_machine_init_done_notifier(&smram_machine_done);
>>>  }
>>>
>>
>> Sorry for not catching this earlier, but I don't think this is right.
>> The @ms is not an instance of X
>>
>>
>> After I refreshed my qemu master I realized that libvirt is unable to
>> fetch capabilities. Libvirt runs the following command:
>>
>>   qemu.git $ ./x86_64-softmmu/qemu-system-x86_64 -S -no-user-config
>> -nodefaults -nographic -machine none,accel=kvm:tcg
> 
> Hmm, it would be good if we can get QEMU CI to launch QEMU  in
> this way, as this isn't the first time some change has broken
> launching of QEMU for probing capabilities.

Agreed.

NB, this diff fixes the issue for me, but I have no idea if it's correct
(it looks correct judging by the way the code looked before):

diff --git i/target/i386/kvm.c w/target/i386/kvm.c
index 0b511906e3..7ee3202634 100644
--- i/target/i386/kvm.c
+++ w/target/i386/kvm.c
@@ -2173,6 +2173,7 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
 }

 if (kvm_check_extension(s, KVM_CAP_X86_SMM) &&
+object_dynamic_cast(OBJECT(ms), TYPE_X86_MACHINE) &&
 x86_machine_is_smm_enabled(X86_MACHINE(ms))) {
 smram_machine_done.notify = register_smram_listener;
 qemu_add_machine_init_done_notifier(&smram_machine_done);
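The one-line guard works because QOM has two kinds of cast: object_dynamic_cast() checks and returns NULL on a type mismatch, while the X86_MACHINE() macro asserts and aborts — which is exactly what '-machine none' was hitting. A loose Python analogue of the fixed logic (class names invented for illustration, not QEMU code):

```python
# Stand-ins for QOM machine classes; only x86 machines carry an SMM knob.
class MachineState:
    pass

class X86MachineState(MachineState):
    smm = "auto"

class NoneMachineState(MachineState):
    pass  # no SMM property at all, like '-machine none'

def smram_listener_needed(ms):
    # checking cast first (object_dynamic_cast), so the asserting cast
    # (X86_MACHINE) is never reached for non-x86 machines
    if not isinstance(ms, X86MachineState):
        return False
    return ms.smm != "off"
```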


Michal




Re: [PULL 48/87] x86: move SMM property to X86MachineState

2019-12-23 Thread Michal Prívozník
On 12/18/19 1:02 PM, Paolo Bonzini wrote:
> Add it to microvm as well, it is a generic property of the x86
> architecture.
> 
> Suggested-by: Sergio Lopez 
> Signed-off-by: Paolo Bonzini 
> ---
>  hw/i386/pc.c  | 49 -
>  hw/i386/pc_piix.c |  6 +++---
>  hw/i386/pc_q35.c  |  2 +-
>  hw/i386/x86.c | 50 +-
>  include/hw/i386/pc.h  |  3 ---
>  include/hw/i386/x86.h |  5 +
>  target/i386/kvm.c |  3 +--
>  7 files changed, 59 insertions(+), 59 deletions(-)
> 


> diff --git a/target/i386/kvm.c b/target/i386/kvm.c
> index ef63f3a..c7ff67a 100644
> --- a/target/i386/kvm.c
> +++ b/target/i386/kvm.c
> @@ -2173,8 +2173,7 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
>  }
>  
>  if (kvm_check_extension(s, KVM_CAP_X86_SMM) &&
> -object_dynamic_cast(OBJECT(ms), TYPE_PC_MACHINE) &&
> -pc_machine_is_smm_enabled(PC_MACHINE(ms))) {
> +x86_machine_is_smm_enabled(X86_MACHINE(ms))) {
>  smram_machine_done.notify = register_smram_listener;
>  qemu_add_machine_init_done_notifier(&smram_machine_done);
>  }
> 

Sorry for not catching this earlier, but I don't think this is right.
The @ms is not an instance of X


After I refreshed my qemu master I realized that libvirt is unable to
fetch capabilities. Libvirt runs the following command:

  qemu.git $ ./x86_64-softmmu/qemu-system-x86_64 -S -no-user-config
-nodefaults -nographic -machine none,accel=kvm:tcg

plus some other (for now) irrelevant args. But qemu fails to initialize:

  qemu.git/target/i386/kvm.c:2176:kvm_arch_init: Object 0x563493f306b0
is not an instance of type x86-machine

and indeed it is not:

#0  0x750acd21 in __GI_raise (sig=sig@entry=6) at
../sysdeps/unix/sysv/linux/raise.c:50
#1  0x75096535 in __GI_abort () at abort.c:79
#2  0x55d23275 in object_dynamic_cast_assert
(obj=0x567846b0, typename=0x55fd42f7 "x86-machine",
file=0x55fd3878 "/home/zippy/work/qemu/qemu.git/target/i386/kvm.c",
line=2176, func=0x55fd4eb0 <__func__.31258> "kvm_arch_init") at
qom/object.c:815
#3  0x55a1c3fb in kvm_arch_init (ms=0x567846b0,
s=0x568a8430) at /home/zippy/work/qemu/qemu.git/target/i386/kvm.c:2176
#4  0x558b4ad7 in kvm_init (ms=0x567846b0) at
/home/zippy/work/qemu/qemu.git/accel/kvm/kvm-all.c:2068
#5  0x55a44f0a in accel_init_machine (accel=0x568a8430,
ms=0x567846b0) at accel/accel.c:55
#6  0x55a3e28d in do_configure_accelerator
(opaque=0x7fffd6c2, opts=0x568a8290, errp=0x566f34f0
) at vl.c:2737
#7  0x55e9b773 in qemu_opts_foreach (list=0x5654ffe0
, func=0x55a3e1a8 ,
opaque=0x7fffd6c2, errp=0x566f34f0 ) at
util/qemu-option.c:1170
#8  0x55a3e4cb in configure_accelerators
(progname=0x7fffdde1
"/home/zippy/work/qemu/qemu.git/x86_64-softmmu/qemu-system-x86_64") at
vl.c:2798
#9  0x55a417a8 in main (argc=7, argv=0x7fffda08,
envp=0x7fffda48) at vl.c:4121


#2  0x55d23275 in object_dynamic_cast_assert
(obj=0x567846b0, typename=0x55fd42f7 "x86-machine",
file=0x55fd3878 "/home/zippy/work/qemu/qemu.git/target/i386/kvm.c",
line=2176, func=0x55fd4eb0 <__func__.31258> "kvm_arch_init") at
qom/object.c:815
815 abort();
object_dynamic_cast_assert 1 $ p obj->class->type->name
$4 = 0x567ad720 "none-machine"


Michal




Re: [PATCH v4 21/24] nvme: support multiple namespaces

2019-12-19 Thread Michal Prívozník
On 12/19/19 2:09 PM, Klaus Jensen wrote:
> This adds support for multiple namespaces by introducing a new 'nvme-ns'
> device model. The nvme device creates a bus named from the device name
> ('id'). The nvme-ns devices then connect to this and registers
> themselves with the nvme device.
> 
> This changes how an nvme device is created. Example with two namespaces:
> 
>   -drive file=nvme0n1.img,if=none,id=disk1
>   -drive file=nvme0n2.img,if=none,id=disk2
>   -device nvme,serial=deadbeef,id=nvme0
>   -device nvme-ns,drive=disk1,bus=nvme0,nsid=1
>   -device nvme-ns,drive=disk2,bus=nvme0,nsid=2
> 
> The drive property is kept on the nvme device to keep the change
> backward compatible, but the property is now optional. Specifying a
> drive for the nvme device will always create the namespace with nsid 1.
> 
> Signed-off-by: Klaus Jensen 
> Signed-off-by: Klaus Jensen 

Klaus, just to make sure I understand correctly, this implements
multiple namespaces for *emulated* NVMe, right? I'm asking because I
just merged libvirt patches to support:

-drive
file.driver=nvme,file.device=:01:00.0,file.namespace=1,format=raw,if=none,id=drive-virtio-disk0
-device
virtio-blk-pci,scsi=off,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1

and seeing these patches made me doubt my design. But if your patches
touch emulated NVMe only, then libvirt's fine because it doesn't expose
that just yet.

Michal




Re: [Qemu-devel] [PATCH] Makefile: Drop $(DESTDIR) from generated FW paths

2019-08-05 Thread Michal Prívozník
On 8/5/19 3:00 PM, Peter Maydell wrote:
> 
> Hi -- this looks like a duplicate of
> https://patchew.org/QEMU/20190530192812.17637-1-o...@aepfle.de/
> 
> (which Philippe has put in a pullreq which I guess is
> destined for 4.1, though I'm still waiting for confirmation
> of that, ie that it really is a for-4.1-worthy bug).
> 

Ah, did not realize that. Sorry. I've replied there.

Michal




Re: [Qemu-devel] [PULL 0/1] EDK2 firmware patches

2019-08-05 Thread Michal Prívozník
On 8/3/19 12:22 PM, Peter Maydell wrote:
> On Sat, 3 Aug 2019 at 09:26, Philippe Mathieu-Daudé  wrote:
>>
>> The following changes since commit 9bcf2dfa163f67b0fec6ee0fe88ad5dc5d69dc59:
>>
>>   Merge remote-tracking branch 
>> 'remotes/elmarco/tags/slirp-CVE-2019-14378-pull-request' into staging 
>> (2019-08-02 13:06:03 +0100)
>>
>> are available in the Git repository at:
>>
>>   https://gitlab.com/philmd/qemu.git tags/edk2-next-20190803
>>
>> for you to fetch changes up to 177cd674d6203d3c1a98e170ea56c5a904ac4ce8:
>>
>>   Makefile: remove DESTDIR from firmware file content (2019-08-03 09:52:32 
>> +0200)
>>
>> 
>> A harmless build-sys patch that fixes a regression affecting Linux
>> distributions packaging QEMU.
>>
>> 
>>
>> Olaf Hering (1):
>>   Makefile: remove DESTDIR from firmware file content
> 
> Is this pullreq intended for 4.1 ?

Please do pull it into 4.1 as the commit it fixes is aiming at 4.1 (just
like the whole feature). If not included then distros will need to
backport it anyway. Just my $0.02. Let's wait for Philippe's confirmation.

Michal



Re: [Qemu-devel] [PATCH 00/10] bundle edk2 platform firmware with QEMU

2019-03-09 Thread Michal Prívozník
On 3/9/19 1:48 AM, Laszlo Ersek wrote:
> Repo:   https://github.com/lersek/qemu.git
> Branch: edk2_build
> 
> This series advances the roms/edk2 submodule to the "edk2-stable201903"
> release, and builds and captures platform firmware binaries from that
> release. At this point they are meant to be used by both end-users and
> by Igor's ACPI unit tests in qtest ("make check").
> 
> Previous discussion:
> 
>   [Qemu-devel] bundling edk2 platform firmware images with QEMU
>   80f0bae3-e79a-bb68-04c4-1c9c684d95b8@redhat.com">http://mid.mail-archive.com/80f0bae3-e79a-bb68-04c4-1c9c684d95b8@redhat.com
>   https://lists.gnu.org/archive/html/qemu-devel/2019-03/msg02601.html
> 
> Note that the series was formatted with "--no-binary" (affecting patch
> #8), therefore it cannot be applied with "git-am". See the remote
> repo/branch reference near the top instead.
> 
> Thanks,
> Laszlo
> 
> Laszlo Ersek (10):
>   roms: lift "edk2-funcs.sh" from "tests/uefi-test-tools/build.sh"
>   roms/edk2-funcs.sh: require gcc-4.8+ for building i386 and x86_64
>   tests/uefi-test-tools/build.sh: work around TianoCore#1607
>   roms/edk2: advance to tag edk2-stable201903
>   roms/edk2-funcs.sh: add the qemu_edk2_get_thread_count() function
>   roms/Makefile: replace the $(EFIROM) target with "edk2-basetools"
>   roms: build edk2 firmware binaries and variable store templates
>   pc-bios: add edk2 firmware binaries and variable store templates
>   pc-bios: document the edk2 firmware images; add firmware descriptors
>   Makefile: install the edk2 firmware images and their descriptors
> 
>  Makefile   |  17 +-
>  pc-bios/README |  11 +
>  pc-bios/descriptors/50-edk2-i386-secure.json   |  34 +++
>  pc-bios/descriptors/50-edk2-x86_64-secure.json |  35 +++
>  pc-bios/descriptors/60-edk2-aarch64.json   |  31 +++
>  pc-bios/descriptors/60-edk2-arm.json   |  31 +++
>  pc-bios/descriptors/60-edk2-i386.json  |  33 +++
>  pc-bios/descriptors/60-edk2-x86_64.json|  34 +++
>  pc-bios/edk2-aarch64-code.fd   | Bin 0 -> 67108864 bytes
>  pc-bios/edk2-arm-code.fd   | Bin 0 -> 67108864 bytes
>  pc-bios/edk2-arm-vars.fd   | Bin 0 -> 67108864 bytes
>  pc-bios/edk2-i386-code.fd  | Bin 0 -> 3653632 bytes
>  pc-bios/edk2-i386-secure-code.fd   | Bin 0 -> 3653632 bytes
>  pc-bios/edk2-i386-vars.fd  | Bin 0 -> 540672 bytes
>  pc-bios/edk2-licenses.txt  | 209 
>  pc-bios/edk2-x86_64-code.fd| Bin 0 -> 3653632 bytes
>  pc-bios/edk2-x86_64-secure-code.fd | Bin 0 -> 3653632 bytes
>  roms/Makefile  |   9 +-
>  roms/Makefile.edk2 | 138 +++
>  roms/edk2  |   2 +-
>  roms/edk2-build.sh |  55 +
>  roms/edk2-funcs.sh | 253 
>  tests/uefi-test-tools/build.sh | 100 +---
>  23 files changed, 897 insertions(+), 95 deletions(-)
>  create mode 100644 pc-bios/descriptors/50-edk2-i386-secure.json
>  create mode 100644 pc-bios/descriptors/50-edk2-x86_64-secure.json
>  create mode 100644 pc-bios/descriptors/60-edk2-aarch64.json
>  create mode 100644 pc-bios/descriptors/60-edk2-arm.json
>  create mode 100644 pc-bios/descriptors/60-edk2-i386.json
>  create mode 100644 pc-bios/descriptors/60-edk2-x86_64.json
>  create mode 100644 pc-bios/edk2-aarch64-code.fd
>  create mode 100644 pc-bios/edk2-arm-code.fd
>  create mode 100644 pc-bios/edk2-arm-vars.fd
>  create mode 100644 pc-bios/edk2-i386-code.fd
>  create mode 100644 pc-bios/edk2-i386-secure-code.fd
>  create mode 100644 pc-bios/edk2-i386-vars.fd
>  create mode 100644 pc-bios/edk2-licenses.txt
>  create mode 100644 pc-bios/edk2-x86_64-code.fd
>  create mode 100644 pc-bios/edk2-x86_64-secure-code.fd
>  create mode 100644 roms/Makefile.edk2
>  create mode 100755 roms/edk2-build.sh
>  create mode 100644 roms/edk2-funcs.sh
> 

Unsure whether my ACK is worth anything on this list, but you have it.

Reviewed-by: Michal Privoznik 

Now that libvirt patches for consuming those JSON descriptions are
almost merged, distros can start picking this up.

Michal



Re: [Qemu-devel] [PATCH 4/5] pr-manager: add query-pr-managers QMP command

2018-06-28 Thread Michal Prívozník
On 06/27/2018 05:44 PM, Paolo Bonzini wrote:
> On 26/06/2018 18:31, Michal Privoznik wrote:
>>>  
>>> +static bool pr_manager_helper_is_connected(PRManager *p)
>>> +{
>>> +PRManagerHelper *pr_mgr = PR_MANAGER_HELPER(p);
>>> +bool result;
>>> +
>>> +qemu_mutex_lock(&pr_mgr->lock);
>>> +result = (pr_mgr->ioc != NULL);
>> I worry it is not that easy. pr_mgr->ioc is unset only when there's
>> PR_IN/PR_OUT command coming from the guest (in pr_manager_helper_run ->
>> pr_manager_helper_write). In fact, after 5/5 that is also the time when
>> the event is delivered. But that might be too late for mgmt app to
>> restart the helper process (although pr_manager_helper_run() tries to
>> reconnect for 5 seconds before giving up).
> 
> That's true, however the important thing IMO is to have a QMP interface
> that libvirt can use; everything else is just quality of implementation.
> 
> qemu-pr-helper anyway does something only when a guests sends it a PR
> command - and with libvirt's per-guest model, that would (hopefully)
> mean that the only case that remains is when someone manually kills the
> qemu-pr-helper process.  In that case there's a certain amount of PEBKAC
> involved... :)

Unless an assert() is triggered ;-)

But since you merged my suggested changes in 5/5, libvirt can catch the
event pretty soon, so in my testing qemu was still left with 3-4
connection retries, which is plenty.
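The pattern in the quoted hunk — reading pr_mgr->ioc only under the helper's lock, so the query cannot race a concurrent (re)connect — looks roughly like this in Python (class and attribute names are made up):

```python
import threading

class PRHelperState:
    """Minimal sketch of connection-state tracking guarded by one lock."""
    def __init__(self):
        self._lock = threading.Lock()
        self._ioc = None              # non-None only while connected

    def connect(self, channel):
        with self._lock:
            self._ioc = channel

    def disconnect(self):
        with self._lock:
            self._ioc = None

    def is_connected(self):
        # take the same lock the (dis)connect paths take, so the answer
        # is never read while the channel is being torn down
        with self._lock:
            return self._ioc is not None
```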

Michal



Re: [Qemu-devel] [PATCH v2 0/2] Avoid using RUN_STATE_PRECONFIG unless explicitly requested with --preconfig

2018-06-11 Thread Michal Prívozník
On 06/04/2018 02:03 PM, Daniel P. Berrangé wrote:
> The change to use RUN_STATE_PRECONFIG as the default initial state, even
> when not requested with --preconfig has caused a number of problems. This
> series introduces a new RUN_STATE_NONE to act as the initial state, so
> that we never use RUN_STATE_PRECONFIG unless the mgmt app has explicitly
> requested todo so.
> 
> Daniel P. Berrangé (2):
>   vl: don't use RUN_STATE_PRECONFIG as initial state
>   vl: fix use of --daemonize with --preconfig
> 
>  qapi/run-state.json |  6 +-
>  vl.c| 49 +++--
>  2 files changed, 35 insertions(+), 20 deletions(-)
> 

So do we have some agreement here? I'm running qemu from git and I'm
still using my patch to make libvirt work.

Michal