Re: [PULL 06/35] hw/acpi: refactor acpi hp modules so that targets can just use what they need

2021-09-06 Thread Ani Sinha
On Mon, Sep 6, 2021 at 4:19 PM Ani Sinha  wrote:
>
> On Mon, Sep 6, 2021 at 3:54 PM Philippe Mathieu-Daudé wrote:
> >
> > On 9/6/21 12:03 PM, Ani Sinha wrote:
> > > On Mon, 6 Sep 2021, Philippe Mathieu-Daudé wrote:
> > >> On 9/4/21 11:36 PM, Michael S. Tsirkin wrote:
> > >>> From: Ani Sinha 
> > >>>
> > >>> Currently various acpi hotplug modules like cpu hotplug, memory
> > >>> hotplug, pci hotplug, nvdimm hotplug are all pulled in when
> > >>> CONFIG_ACPI_X86 is turned on. This brings in support for a whole lot
> > >>> of subsystems that some targets like mips do not need. They are added
> > >>> just to satisfy symbol dependencies. This is ugly and should be
> > >>> avoided. Targets should be able to pull in just what they need and no
> > >>> more. For example, mips only needs support for PIIX4 and does not
> > >>> need acpi pci hotplug support or cpu hotplug support or memory hotplug
> > >>> support etc. This change is an effort to clean this up.
> > >>> In this change, new config variables are added for various acpi hotplug
> > >>> subsystems. Targets like mips can enable only PIIX4 support and not the
> > >>> rest of the other modules which were previously pulled in as a part of
> > >>> CONFIG_ACPI_X86. Function stubs make sure that symbols which piix4
> > >>> needs but which are not required by mips (for example, symbols specific
> > >>> to pci hotplug etc) are available to satisfy the dependencies.
> > >>>
> > >>> Currently, this change only addresses issues with mips malta targets.
> > >>> In future we might be able to clean up other targets which are
> > >>> similarly pulling in a lot of unnecessary hotplug modules by enabling
> > >>> ACPI_X86.
> > >>>
> > >>> This change should also address issues such as the following:
> > >>> https://gitlab.com/qemu-project/qemu/-/issues/221
> > >>> https://gitlab.com/qemu-project/qemu/-/issues/193
> > >>
> > >> FYI per https://docs.gitlab.com/ee/administration/issue_closing_pattern.html
> > >> this should have been:
> > >>
> > >> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/193
> > >> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/221
> > >>
> > >
> > > Ah my apologies. Will do this next time.
> > >
> > >> Can we close these issues manually?
> > >
> > > Since both you and I have verified that those issues get fixed with my
> > > change, yes, we can close them. I do not have a gitlab account. Should I
> > > have one? Are special permissions needed to handle these tickets?
> >
> > Since you are listed in the MAINTAINERS file, long-term you'll
> > eventually use it anyway (e.g. to run the CI pipelines before sending
> > patches, to subscribe to the 'ACPI' label to get notifications, or to
> > comment on ACPI-related issues).
> >
> > The process is quite straightforward: once you have an account, you
> > simply request to become a member of the project via the web UI, then
> > you can help triage the issues (and close these two).
>
> Hmm. I created an account and added a comment to the tickets. However
> I am unable to close them. I requested access to the project.

I could be wrong, but I think only reporters, like yourself, can open and
close bugs on gitlab.



Re: [PATCH] hw/display/ati_2d: Fix buffer overflow in ati_2d_blt (CVE-2021-3638)

2021-09-06 Thread Philippe Mathieu-Daudé
On 9/6/21 9:19 PM, Alexander Bulekov wrote:
> On 210906 2019, Philippe Mathieu-Daudé wrote:
>> (Forgot to Cc Alex for eventual reproducer)
> 
> Here you go. Should we be fuzzing this on OSS-Fuzz?

Should we limit what we fuzz there? All bugs found so far
have been useful. The issues we fixed improved QEMU quality, and
the ones we couldn't fix (yet, mostly due to lack of time)
helped us understand design flaws that we are now wondering how
to address.

> = 8< =
> 
> /*
>  * cat << EOF | ./qemu-system-i386 -display none -machine accel=qtest, -m \
>  * 512M -device ati-vga,romfile= -nodefaults -qtest /dev/null -qtest stdio
>  * outl 0xcf8 0x80001018
>  * outl 0xcfc 0xe100
>  * outl 0xcf8 0x80001004
>  * outw 0xcfc 0x02

Thanks, this was the missing part :>

>  * write 0xe10016c4 0x1 0x04
>  * write 0xe10016e4 0x1 0x58
>  * write 0xe1001438 0x4 0x041a
>  * write 0xe100143c 0x4 0x0115
>  * EOF
>  */
> static void test_fuzz(void)
> {
>     QTestState *s = qtest_init(
>         "-display none , -m 512M -device ati-vga,romfile= -nodefaults "
>         "-qtest /dev/null");
>     qtest_outl(s, 0xcf8, 0x80001018);
>     qtest_outl(s, 0xcfc, 0xe100);
>     qtest_outl(s, 0xcf8, 0x80001004);
>     qtest_outw(s, 0xcfc, 0x02);
>     qtest_bufwrite(s, 0xe10016c4, "\x04", 0x1);
>     qtest_bufwrite(s, 0xe10016e4, "\x58", 0x1);
>     qtest_bufwrite(s, 0xe1001438, "\x04\x00\x00\x1a", 0x4);
>     qtest_bufwrite(s, 0xe100143c, "\x01\x00\x00\x15", 0x4);
>     qtest_quit(s);
> }
> 
> = >8 =
> -Alex
> 
>>
>> On 9/6/21 6:44 PM, Mauro Matteo Cascella wrote:
>>> On Mon, Sep 6, 2021 at 5:31 PM Philippe Mathieu-Daudé wrote:

 When building QEMU with DEBUG_ATI defined then running with
 '-device ati-vga,romfile="" -d unimp,guest_errors -trace ati\*'
 we get:

   ati_mm_write 4 0x16c0 DP_CNTL <- 0x1
   ati_mm_write 4 0x146c DP_GUI_MASTER_CNTL <- 0x2
   ati_mm_write 4 0x16c8 DP_MIX <- 0xff
   ati_mm_write 4 0x16c4 DP_DATATYPE <- 0x2
   ati_mm_write 4 0x224 CRTC_OFFSET <- 0x0
   ati_mm_write 4 0x142c DST_PITCH_OFFSET <- 0xfe0
   ati_mm_write 4 0x1420 DST_Y <- 0x3fff
   ati_mm_write 4 0x1410 DST_HEIGHT <- 0x3fff
   ati_mm_write 4 0x1588 DST_WIDTH_X <- 0x3fff3fff
   ati_2d_blt: vram:0x7fff5fa0 addr:0 ds:0x7fff61273800 stride:2560 bpp:32 rop:0xff
   ati_2d_blt: 0 0 0, 0 127 0, (0,0) -> (16383,16383) 16383x16383 > ^
   ati_2d_blt: pixman_fill(dst:0x7fff5fa0, stride:254, bpp:8, x:16383, y:16383, w:16383, h:16383, xor:0xff00)
   Thread 3 "qemu-system-i38" received signal SIGSEGV, Segmentation fault.
   (gdb) bt
   #0  0x77f62ce0 in sse2_fill.lto_priv () at /lib64/libpixman-1.so.0
   #1  0x77f09278 in pixman_fill () at /lib64/libpixman-1.so.0
   #2  0x57b5a9af in ati_2d_blt (s=0x63128800) at hw/display/ati_2d.c:196
   #3  0x57b4b5a2 in ati_mm_write (opaque=0x63128800, addr=5512, data=1073692671, size=4) at hw/display/ati.c:843
   #4  0x58b90ec4 in memory_region_write_accessor (mr=0x63139cc0, addr=5512, ..., size=4, ...) at softmmu/memory.c:492

 Commit 584acf34cb0 ("ati-vga: Fix reverse bit blts") introduced
 the local dst_x and dst_y which adjust the (x, y) coordinates
 depending on the direction in the SRCCOPY ROP3 operation, but
 forgot to address the same issue for the PATCOPY, BLACKNESS and
 WHITENESS operations, which also call pixman_fill().

 Fix that now by using the adjusted coordinates in the pixman_fill
 call, and update the related debug printf().
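To see why the direction adjustment matters, take the values from the trace above: DP_CNTL is 0x1, so X runs left to right ('>') but Y runs bottom to top ('^'). Assuming the adjusted origin for a bottom-to-top blit is DST_Y + 1 - DST_HEIGHT, the top-left Y of the rectangle is 16383 + 1 - 16383 = 1, not the raw DST_Y of 16383. A bounds check only protects pixman_fill() if both use the same origin. The following is a hypothetical C sketch, not the QEMU code; the Y direction bit value, the BltOrigin type, and the pixel-unit bounds check are assumptions:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Direction bits of DP_CNTL. 0x1 matches the trace above; 0x2 is assumed. */
#define SKETCH_DST_X_LEFT_TO_RIGHT 0x1u
#define SKETCH_DST_Y_TOP_TO_BOTTOM 0x2u

/* Hypothetical type: top-left corner of the destination rectangle. */
typedef struct {
    int x;
    int y;
} BltOrigin;

/*
 * For a right-to-left (bottom-to-top) blit, assume DST_X (DST_Y) names the
 * last column (row), so the top-left corner is x + 1 - width
 * (y + 1 - height).
 */
static BltOrigin blt_origin(uint32_t dp_cntl, int reg_x, int reg_y,
                            int width, int height)
{
    BltOrigin o;

    o.x = (dp_cntl & SKETCH_DST_X_LEFT_TO_RIGHT) ? reg_x : reg_x + 1 - width;
    o.y = (dp_cntl & SKETCH_DST_Y_TOP_TO_BOTTOM) ? reg_y : reg_y + 1 - height;
    return o;
}

/*
 * Check, in pixel units, that the rectangle starting at o fits in a
 * buffer of vram_px pixels with stride_px pixels per row. The same origin
 * must then be passed to the fill, or this check proves nothing.
 */
static bool blt_fits(BltOrigin o, int width, int height,
                     int64_t stride_px, int64_t vram_px)
{
    int64_t end;

    assert(width > 0 && height > 0 && stride_px > 0);
    if (o.x < 0 || o.y < 0 || o.x + width > stride_px) {
        return false;
    }
    /* one past the last pixel written */
    end = (int64_t)(o.y + height - 1) * stride_px + o.x + width;
    return end <= vram_px;
}
```

Computing the origin once and feeding the same BltOrigin to both the check and the fill is the design choice that rules out the mismatch fixed here.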

 Reported-by: Qiang Liu 
 Fixes: 584acf34cb0 ("ati-vga: Fix reverse bit blts")
 Signed-off-by: Philippe Mathieu-Daudé 
 ---
  hw/display/ati_2d.c | 6 +++---
  1 file changed, 3 insertions(+), 3 deletions(-)

 diff --git a/hw/display/ati_2d.c b/hw/display/ati_2d.c
 index 4dc10ea7952..692bec91de4 100644
 --- a/hw/display/ati_2d.c
 +++ b/hw/display/ati_2d.c
 @@ -84,7 +84,7 @@ void ati_2d_blt(ATIVGAState *s)
  DPRINTF("%d %d %d, %d %d %d, (%d,%d) -> (%d,%d) %dx%d %c %c\n",
  s->regs.src_offset, s->regs.dst_offset, s->regs.default_offset,
  s->regs.src_pitch, s->regs.dst_pitch, s->regs.default_pitch,
 -s->regs.src_x, s->regs.src_y, s->regs.dst_x, s->regs.dst_y,
 +s->regs.src_x, s->regs.src_y, dst_x, dst_y,
  s->regs.dst_width, s->regs.dst_height,
  (s->regs.dp_cntl & DST_X_LEFT_TO_RIGHT ? '>' : '<'),
  (s->regs.dp_cntl & DST_Y_TOP_TO_BOTTOM ? 'v' : '^'));
 @@ -180,11 +180,11 @@ void ati_2d_blt(ATIVGAState *s)
  dst_stride /= sizeof(uint32_t);
  DPRINTF("pixman_fill(%p, %d, %d, %d, %d, %d, %d, %x)\n",
  dst_bits, dst_stride, bpp,
 -s->regs.dst_x, s->regs.dst_y,

Re: [PATCH] hw/display/ati_2d: Fix buffer overflow in ati_2d_blt (CVE-2021-3638)

2021-09-06 Thread Philippe Mathieu-Daudé
On 9/6/21 9:52 PM, BALATON Zoltan wrote:
> On Mon, 6 Sep 2021, Philippe Mathieu-Daudé wrote:
>> (Forgot to Cc Alex for eventual reproducer)
>>
>> On 9/6/21 6:44 PM, Mauro Matteo Cascella wrote:
>>> On Mon, Sep 6, 2021 at 5:31 PM Philippe Mathieu-Daudé wrote:

 When building QEMU with DEBUG_ATI defined then running with
 '-device ati-vga,romfile="" -d unimp,guest_errors -trace ati\*'
 we get:

   ati_mm_write 4 0x16c0 DP_CNTL <- 0x1
   ati_mm_write 4 0x146c DP_GUI_MASTER_CNTL <- 0x2
   ati_mm_write 4 0x16c8 DP_MIX <- 0xff
   ati_mm_write 4 0x16c4 DP_DATATYPE <- 0x2
   ati_mm_write 4 0x224 CRTC_OFFSET <- 0x0
   ati_mm_write 4 0x142c DST_PITCH_OFFSET <- 0xfe0
   ati_mm_write 4 0x1420 DST_Y <- 0x3fff
   ati_mm_write 4 0x1410 DST_HEIGHT <- 0x3fff
   ati_mm_write 4 0x1588 DST_WIDTH_X <- 0x3fff3fff
   ati_2d_blt: vram:0x7fff5fa0 addr:0 ds:0x7fff61273800 stride:2560 bpp:32 rop:0xff
   ati_2d_blt: 0 0 0, 0 127 0, (0,0) -> (16383,16383) 16383x16383 > ^
   ati_2d_blt: pixman_fill(dst:0x7fff5fa0, stride:254, bpp:8, x:16383, y:16383, w:16383, h:16383, xor:0xff00)
   Thread 3 "qemu-system-i38" received signal SIGSEGV, Segmentation fault.
   (gdb) bt
   #0  0x77f62ce0 in sse2_fill.lto_priv () at /lib64/libpixman-1.so.0
   #1  0x77f09278 in pixman_fill () at /lib64/libpixman-1.so.0
   #2  0x57b5a9af in ati_2d_blt (s=0x63128800) at hw/display/ati_2d.c:196
   #3  0x57b4b5a2 in ati_mm_write (opaque=0x63128800, addr=5512, data=1073692671, size=4) at hw/display/ati.c:843
   #4  0x58b90ec4 in memory_region_write_accessor (mr=0x63139cc0, addr=5512, ..., size=4, ...) at softmmu/memory.c:492

 Commit 584acf34cb0 ("ati-vga: Fix reverse bit blts") introduced
 the local dst_x and dst_y which adjust the (x, y) coordinates
 depending on the direction in the SRCCOPY ROP3 operation, but
 forgot to address the same issue for the PATCOPY, BLACKNESS and
 WHITENESS operations, which also call pixman_fill().

 Fix that now by using the adjusted coordinates in the pixman_fill
 call, and update the related debug printf().

 Reported-by: Qiang Liu 
 Fixes: 584acf34cb0 ("ati-vga: Fix reverse bit blts")
 Signed-off-by: Philippe Mathieu-Daudé 
 ---
  hw/display/ati_2d.c | 6 +++---
  1 file changed, 3 insertions(+), 3 deletions(-)

 diff --git a/hw/display/ati_2d.c b/hw/display/ati_2d.c
 index 4dc10ea7952..692bec91de4 100644
 --- a/hw/display/ati_2d.c
 +++ b/hw/display/ati_2d.c
 @@ -84,7 +84,7 @@ void ati_2d_blt(ATIVGAState *s)
  DPRINTF("%d %d %d, %d %d %d, (%d,%d) -> (%d,%d) %dx%d %c %c\n",
  s->regs.src_offset, s->regs.dst_offset, s->regs.default_offset,
  s->regs.src_pitch, s->regs.dst_pitch, s->regs.default_pitch,
 -    s->regs.src_x, s->regs.src_y, s->regs.dst_x, s->regs.dst_y,
 +    s->regs.src_x, s->regs.src_y, dst_x, dst_y,
  s->regs.dst_width, s->regs.dst_height,
  (s->regs.dp_cntl & DST_X_LEFT_TO_RIGHT ? '>' : '<'),
  (s->regs.dp_cntl & DST_Y_TOP_TO_BOTTOM ? 'v' : '^'));
 @@ -180,11 +180,11 @@ void ati_2d_blt(ATIVGAState *s)
  dst_stride /= sizeof(uint32_t);
  DPRINTF("pixman_fill(%p, %d, %d, %d, %d, %d, %d, %x)\n",
  dst_bits, dst_stride, bpp,
 -    s->regs.dst_x, s->regs.dst_y,
 +    dst_x, dst_y,
  s->regs.dst_width, s->regs.dst_height,
  filler);
  pixman_fill((uint32_t *)dst_bits, dst_stride, bpp,
 -    s->regs.dst_x, s->regs.dst_y,
 +    dst_x, dst_y,
  s->regs.dst_width, s->regs.dst_height,
  filler);
  if (dst_bits >= s->vga.vram_ptr + s->vga.vbe_start_addr &&
 -- 
 2.31.1

>>>
>>> Tested-by: Mauro Matteo Cascella 
>>
>> Thanks. I wouldn't be surprised if we get another CVE in this code /
>> file / function as soon as this patch gets merged... The code calls for a
>> rewrite, as per the comment at the top of this function:
>>
>> void ati_2d_blt(ATIVGAState *s)
>> {
>>    /* FIXME it is probably more complex than this and may need to be */
>>    /* rewritten but for now as a start just to get some output: */
> 
> It's also currently broken since the previous CVE fixes, when I tried
> to change it to only use unsigned values to avoid underflows and get
> away with only checking for overflows, which simplifies it a bit. But
> it turns out that's wrong: the hardware does allow negative values, and
> while most drivers don't use them (such as Linux and MorphOS, so they
> still work), at least the Solaris driver does, and it now produces a
> broken picture once X starts. (This can be reproduced 

Re: [PATCH 4/5] ebpf_rss_helper: Added helper for eBPF RSS.

2021-09-06 Thread Jason Wang
On Mon, Sep 6, 2021 at 11:50 PM Andrew Melnichenko  wrote:
>
> Hi,
>>
>> I think it's for backward compatibility.
>>
>> E.g. the current code works without mmap(), and users will be surprised
>> that it won't work after upgrading their qemu.
>
> Well, the current code would require additional capabilities with
> "kernel.unprivileged_bpf_disabled=1", which may be the case on Red Hat systems.
> Technically we could have an mmap test showing that mmap for
> BPF_MAP_TYPE_ARRAY works, but on the target system we will only know that
> at runtime.
> If I'm not mistaken, mmap for BPF_MAP_TYPE_ARRAY was added before kernel 5.4,
> and our bpf program requires kernel 5.8+.

Ok, if this is the case, please explain this in the commit log.

Btw, is there any reason that 5.8 is required for our bpf program?

Thanks
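On the runtime question: a version number alone cannot prove that a feature is present, since distribution kernels often backport BPF features, which is one reason to probe at runtime as Andrew suggests. Still, a cheap pre-check against the 5.8 minimum mentioned above could look like the following C sketch. kernel_release_at_least() and running_kernel_at_least() are hypothetical helpers, and only the major.minor prefix of the release string is compared:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <sys/utsname.h>

/*
 * Hypothetical helper: true if a release string such as "5.13.0-rc1+"
 * is at least major.minor. Strings without a "N.N" prefix are rejected.
 */
static bool kernel_release_at_least(const char *release, int major, int minor)
{
    int maj = 0, min = 0;

    assert(release != NULL);
    if (sscanf(release, "%d.%d", &maj, &min) != 2) {
        return false;
    }
    return maj > major || (maj == major && min >= minor);
}

/* Apply the check to the running kernel via uname(2). */
static bool running_kernel_at_least(int major, int minor)
{
    struct utsname u;

    if (uname(&u) != 0) {
        return false;
    }
    return kernel_release_at_least(u.release, major, minor);
}
```

A failed check could skip loading early with a clear message; a passing check still has to be followed by the real load, which is the authoritative test.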

> So there is no reason to add bpf() map updates as a fallback for mmap().
>
> On Wed, Sep 1, 2021 at 9:42 AM Jason Wang  wrote:
>>
>>
>> On 2021/8/31 1:07 AM, Yuri Benditovich wrote:
>> > On Fri, Aug 20, 2021 at 6:41 AM Jason Wang  wrote:
>> >>
>> >> On 2021/7/13 11:37 PM, Andrew Melnychenko wrote:
>> >>> Helper program. Loads the eBPF RSS program and maps and passes them
>> >>> through a unix socket.
>> >>> Libvirt may launch this helper and pass the eBPF fds to qemu virtio-net.
>> >>
>> >> I wonder if this can be done as a helper, like the TAP/bridge helper.
>> >>
>> >> E.g. it's qemu that launches the helper, which is installed set-uid.
>> >>
>> >> Then libvirt won't even need to care about that?
>> >>
>> > There are pros and cons for such a solution with set-uid.
>> > From my point of view, one of the cons is that set-uid is effective
>> > only at install time, so the coexistence of different qemu builds (and
>> > different helpers for each one) is kind of problematic.
>> > With the current solution this does not present any problem: the
>> > developer can have several different builds, each one automatically
>> > has its own helper, and there is no conflict between these builds or
>> > between these builds and the installed qemu package. Changing the
>> > 'emulator' in the libvirt profile automatically brings the proper
>> > helper to work.
>>
>>
>> I'm not sure I get you here. We can still have a default/sample helper to
>> make sure it works for different builds.
>>
>> If we can avoid the involvement of libvirt, that would be better.
>>
>> Thanks
>>
>>
>> >
>> >>> Also, the libbpf dependency is now Linux-only.
>> >>> Libbpf is used for eBPF RSS steering, which is supported only by Linux
>> >>> TAP. There is no reason yet to build the eBPF loader and helper for
>> >>> non-Linux systems, even if libbpf is present.
>> >>>
>> >>> Signed-off-by: Andrew Melnychenko 
>> >>> ---
>> >>>ebpf/qemu-ebpf-rss-helper.c | 130 
>> >>>meson.build |  37 ++
>> >>>2 files changed, 154 insertions(+), 13 deletions(-)
>> >>>create mode 100644 ebpf/qemu-ebpf-rss-helper.c
>> >>>
>> >>> diff --git a/ebpf/qemu-ebpf-rss-helper.c b/ebpf/qemu-ebpf-rss-helper.c
>> >>> new file mode 100644
>> >>> index 00..fe68758f57
>> >>> --- /dev/null
>> >>> +++ b/ebpf/qemu-ebpf-rss-helper.c
>> >>> @@ -0,0 +1,130 @@
>> >>> +/*
>> >>> + * eBPF RSS Helper
>> >>> + *
>> >>> + * Developed by Daynix Computing LTD (http://www.daynix.com)
>> >>> + *
>> >>> + * Authors:
>> >>> + *  Andrew Melnychenko 
>> >>> + *
>> >>> + * This work is licensed under the terms of the GNU GPL, version 2.  See
>> >>> + * the COPYING file in the top-level directory.
>> >>> + *
>> >>> + * Description: This is a helper program for libvirtd.
>> >>> + *  It loads the eBPF RSS program and passes fds through a unix socket.
>> >>> + *  Built by meson, target - 'qemu-ebpf-rss-helper'.
>> >>> + */
>> >>> +
>> >>> +#include 
>> >>> +#include 
>> >>> +#include 
>> >>> +#include 
>> >>> +#include 
>> >>> +#include 
>> >>> +#include 
>> >>> +#include 
>> >>> +
>> >>> +#include "ebpf_rss.h"
>> >>> +
>> >>> +#include "qemu-helper-stamp.h"
>> >>> +
>> >>> +void QEMU_HELPER_STAMP(void) {}
>> >>> +
>> >>> +static int send_fds(int socket, int *fds, int n)
>> >>> +{
>> >>> +    struct msghdr msg = {};
>> >>> +    struct cmsghdr *cmsg = NULL;
>> >>> +    char buf[CMSG_SPACE(n * sizeof(int))];
>> >>> +    char dummy_buffer = 0;
>> >>> +    struct iovec io = { .iov_base = &dummy_buffer,
>> >>> +                        .iov_len = sizeof(dummy_buffer) };
>> >>> +
>> >>> +    memset(buf, 0, sizeof(buf));
>> >>> +
>> >>> +    msg.msg_iov = &io;
>> >>> +    msg.msg_iovlen = 1;
>> >>> +    msg.msg_control = buf;
>> >>> +    msg.msg_controllen = sizeof(buf);
>> >>> +
>> >>> +    cmsg = CMSG_FIRSTHDR(&msg);
>> >>> +    cmsg->cmsg_level = SOL_SOCKET;
>> >>> +    cmsg->cmsg_type = SCM_RIGHTS;
>> >>> +    cmsg->cmsg_len = CMSG_LEN(n * sizeof(int));
>> >>> +
>> >>> +    memcpy(CMSG_DATA(cmsg), fds, n * sizeof(int));
>> >>> +
>> >>> +    return sendmsg(socket, &msg, 0);
>> >>> +}
>> >>> +
>> >>> +static void print_help_and_exit(const char *prog, int exitcode)
>> >>> +{
>> 
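On the receiving end (QEMU or libvirt), the descriptors sent by send_fds() arrive as SCM_RIGHTS ancillary data. A minimal C sketch of both ends follows; the sketch_* names and the eight-descriptor cap are assumptions, not part of the patch. It also wraps the control buffer in a union with struct cmsghdr so the buffer is suitably aligned, which a plain char array, as in the quoted code, does not guarantee:

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

#define SKETCH_MAX_FDS 8   /* assumed upper bound per message */

/* Send n descriptors with one dummy data byte (a message must carry data). */
static ssize_t sketch_send_fds(int sock, const int *fds, int n)
{
    char dummy = 0;
    struct iovec io = { .iov_base = &dummy, .iov_len = sizeof(dummy) };
    union {
        char buf[CMSG_SPACE(SKETCH_MAX_FDS * sizeof(int))];
        struct cmsghdr align;   /* forces cmsghdr alignment of buf */
    } u;
    struct msghdr msg = { 0 };
    struct cmsghdr *cmsg;

    assert(n > 0 && n <= SKETCH_MAX_FDS);
    memset(&u, 0, sizeof(u));
    msg.msg_iov = &io;
    msg.msg_iovlen = 1;
    msg.msg_control = u.buf;
    msg.msg_controllen = CMSG_SPACE(n * sizeof(int));

    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(n * sizeof(int));
    memcpy(CMSG_DATA(cmsg), fds, n * sizeof(int));

    return sendmsg(sock, &msg, 0);
}

/* Receive descriptors; returns how many were stored in fds, or -1. */
static int sketch_recv_fds(int sock, int *fds, int max)
{
    char dummy;
    struct iovec io = { .iov_base = &dummy, .iov_len = sizeof(dummy) };
    union {
        char buf[CMSG_SPACE(SKETCH_MAX_FDS * sizeof(int))];
        struct cmsghdr align;
    } u;
    struct msghdr msg = { 0 };
    struct cmsghdr *cmsg;
    int all[SKETCH_MAX_FDS];
    int n = 0;

    assert(fds != NULL && max > 0);
    msg.msg_iov = &io;
    msg.msg_iovlen = 1;
    msg.msg_control = u.buf;
    msg.msg_controllen = sizeof(u.buf);

    /* MSG_CTRUNC: the sender passed more descriptors than we have room for */
    if (recvmsg(sock, &msg, 0) <= 0 || (msg.msg_flags & MSG_CTRUNC)) {
        return -1;
    }
    for (cmsg = CMSG_FIRSTHDR(&msg); cmsg != NULL;
         cmsg = CMSG_NXTHDR(&msg, cmsg)) {
        if (cmsg->cmsg_level == SOL_SOCKET && cmsg->cmsg_type == SCM_RIGHTS) {
            n = (int)((cmsg->cmsg_len - CMSG_LEN(0)) / sizeof(int));
            memcpy(all, CMSG_DATA(cmsg), n * sizeof(int));
            break;
        }
    }
    /* Hand over at most max descriptors and close any extras. */
    for (int i = 0; i < n; i++) {
        if (i < max) {
            fds[i] = all[i];
        } else {
            close(all[i]);
        }
    }
    return n < max ? n : max;
}
```

The received descriptors are new descriptor numbers in the receiving process that refer to the same open files, so the receiver can use them exactly like descriptors it opened itself.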

Re: [PATCH v4 00/33] Qemu SGX virtualization

2021-09-06 Thread Yang Zhong
On Mon, Sep 06, 2021 at 03:13:08PM +0200, Paolo Bonzini wrote:
> Hi,
> 
> the monitor patches did not pass the test-hmp qtest, and also they
> should be in target/i386/monitor.c (see other commands that were
> implemented there for SEV).  However, I've sent a pull request with
> the rest.
>

  Paolo, thanks for pulling those patches! In fact, the first POC implemented
  the monitor commands in target/i386/monitor.c, but at that time there were
  lots of 'pragma GCC poison' errors during the build, so I had to move them
  to the common monitor. Let me check this again, thanks!

  Yang
 
> Thanks,
> 
> Paolo
> 
> On Mon, Jul 19, 2021 at 1:27 PM Yang Zhong  wrote:
> >
> > Sean Christopherson has left Intel, and I am now responsible for the Qemu SGX
> > upstream work. His @intel.com address will be bouncing; his new email
> > (sea...@google.com) is also in the CC list.
> >
> > This series is the Qemu SGX virtualization implementation rebased on the
> > latest Qemu release. NUMA support for SGX will be sent in another patchset
> > once this basic SGX patchset is merged.
> >
> > You can find Qemu repo here:
> >
> > https://github.com/intel/qemu-sgx.git upstream
> >
> > If you want to try SGX, you can directly install a Linux release (at least
> > 5.13.0-rc1+), since KVM SGX support has been merged into Linux.
> >
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
> >
> > For simplicity, install such a Linux kernel on both host and guest so that
> > both kernels support SGX. I use the reference command below to boot an SGX
> > guest:
> >
> > #qemu-system-x86_64 \
> > .. \
> > -cpu host,+sgx-provisionkey \
> > -object memory-backend-epc,id=mem1,size=64M,prealloc=on \
> > -object memory-backend-epc,id=mem2,size=28M \
> > -M sgx-epc.0.memdev=mem1,sgx-epc.1.memdev=mem2
> >
> > Overview
> > 
> >
> > Intel Software Guard eXtensions (SGX) is a set of instructions and mechanisms
> > for memory accesses that provides secure access for sensitive applications
> > and data. SGX allows an application to use a particular part of its address
> > space as an *enclave*, a protected area that provides confidentiality and
> > integrity even in the presence of privileged malware. Accesses to the enclave
> > memory area from any software not resident in the enclave are prevented,
> > including those from privileged software.
> >
> > SGX virtualization
> > ==
> >
> > KVM SGX creates one new misc device, sgx_vepc, and Qemu opens the
> > '/dev/sgx_vepc' device node to mmap() host EPC memory into the guest. Qemu
> > also adds an 'sgx-epc' device to expose EPC sections to the guest through
> > CPUID and the ACPI table. Qemu SGX also supports multiple virtual EPC
> > sections for the guest; we simply place them physically contiguous for the
> > sake of simplicity. Kernel SGX NUMA support has been merged into the Linux
> > tip tree; we will support this function in the next phase.
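The contiguous placement described above amounts to laying each EPC section directly after the previous one. A hypothetical C sketch, assuming 4 KiB alignment and an arbitrary base address (neither is taken from the series); with the 64M and 28M sections from the example command line, the second section starts 64M after the first and the two span 92M in total:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_EPC_ALIGN 4096ULL   /* assumed page granularity */

/*
 * Place n sections back to back starting at base: section i starts where
 * section i-1 ends. Each size is rounded up to the alignment.
 * Returns the end address (exclusive) of the last section.
 */
static uint64_t layout_epc_sections(uint64_t base, const uint64_t *sizes,
                                    uint64_t *starts, size_t n)
{
    uint64_t addr = base;

    assert(base % SKETCH_EPC_ALIGN == 0);
    for (size_t i = 0; i < n; i++) {
        uint64_t sz = (sizes[i] + SKETCH_EPC_ALIGN - 1) & ~(SKETCH_EPC_ALIGN - 1);

        starts[i] = addr;
        addr += sz;
    }
    return addr;
}
```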
> >
> > Although the current host SGX subsystem cannot support the SGX2 feature, the
> > KVM/Qemu implementation still exposes this feature to the guest. Guest SGX2
> > support doesn't interact with the host kernel SGX driver, so the SGX guest
> > can use those new instructions normally.
> >
> > For detailed information on SGX virtualization, please refer to the
> > docs/intel-sgx.txt document (patch 33).
> >
> > Changelog:
> > =
> >
> > (The changelog here is for global changes; please see each patch's changelog
> > for changes made to a specific patch.)
> >
> > v3-->v4:
> >- Rebased the sgx patches onto the latest Qemu release.
> >- Moved sgx compound property setter/getter from MachineState to
> >  X86MachineState (Paolo).
> >- Re-defined struct SgxEPC, removed the 'id' property and added struct
> >  SgxEPCList for sgx-epc.0.{memdev} (Paolo).
> >- Removed g_malloc0(), and changed 'SGXEPCState *sgx_epc' to
> >  'SGXEPCState sgx_epc' in struct PCMachineState (Paolo).
> >- Changed the SGX compound property cmdline from sgx-epc.{memdev}.0 to
> >  sgx-epc.0.{memdev} (Paolo).
> >- Removed the signature from the 'git format-patch' command (Jarkko).
> >
> > v2-->v3:
> >- Rebased the sgx patches onto the latest Qemu release.
> >- Implemented the compound property for SGX (ref patch5), changing the
> >  command from '-sgx-epc' to '-M' (Paolo).
> >- Moved the sgx common code from sgx-epc.c to sgx.c. The sgx-epc.c is
> >  only responsible for the virtual epc device.
> >- Removed the previous patch13 (linux-headers: Add placeholder for
> >  KVM_CAP_SGX_ATTRIBUTE) because ehabk...@redhat.com updated Linux headers
> >  to 5.13-rc4 with commit 278f064e452.
> >- Updated patch1 because ram_flags were changed by David Hildenbra.
> >- Added patch24, which avoids the reset operation caused by bios reset.
> >- Added one patch25, which make prealloc property consistent with Qemu 

Re: [PATCH 5/5] vfio: defer to commit kvm route in migraiton resume phase

2021-09-06 Thread Longpeng (Mike, Cloud Infrastructure Service Product Dept.)



On 2021/9/4 5:57, Alex Williamson wrote:
> On Wed, 25 Aug 2021 15:56:20 +0800
> "Longpeng(Mike)"  wrote:
> 
>> In the migration resume phase, all unmasked msix vectors need to be
>> set up when loading the VF state. However, the setup operation
>> takes longer if the VF has more unmasked vectors.
>>
>> In our case, the VF has 65 vectors and each one spends at most 0.8ms
>> on the setup operation, so the total cost for the VF is about 8-58ms. For a
>> VM that has 8 VFs of this type, the total cost is more than 250ms.
>>
>> vfio_pci_load_config
>>   vfio_msix_enable
>>     msix_set_vector_notifiers
>>       for (vector = 0; vector < dev->msix_entries_nr; vector++) {
>>         vfio_msix_vector_do_use
>>           vfio_add_kvm_msi_virq
>>             kvm_irqchip_commit_routes <-- expensive
>>       }
>>
>> We can reduce the cost by committing only once, outside the loop. The
>> routes are cached in kvm_state; we commit them first and then bind an
>> irqfd for each vector.
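The saving comes from replacing one routing-table commit per vector with a single commit for all of them. The following C sketch only counts operations; MockKvm and the mock_* functions are hypothetical stand-ins for kvm_irqchip_add_msi_route(), kvm_irqchip_commit_routes() and kvm_irqchip_add_irqfd_notifier_gsi(), and timing is not modeled:

```c
#include <assert.h>

typedef struct {
    int pending_routes;  /* routes cached in the table, not yet committed */
    int commits;         /* expensive commit operations performed */
    int irqfds_bound;
} MockKvm;

static void mock_add_route(MockKvm *k)
{
    k->pending_routes++;
}

static void mock_commit(MockKvm *k)
{
    k->pending_routes = 0;
    k->commits++;
}

static void mock_bind_irqfd(MockKvm *k)
{
    /* a route must be committed before an irqfd is bound to it */
    assert(k->pending_routes == 0);
    k->irqfds_bound++;
}

/* Original flow: every vector commits the routing table on its own. */
static void enable_vectors_eager(MockKvm *k, int nr)
{
    for (int i = 0; i < nr; i++) {
        mock_add_route(k);
        mock_commit(k);
        mock_bind_irqfd(k);
    }
}

/* Deferred flow: cache all routes, commit once, then bind the irqfds. */
static void enable_vectors_deferred(MockKvm *k, int nr)
{
    for (int i = 0; i < nr; i++) {
        mock_add_route(k);
    }
    if (nr > 0) {
        mock_commit(k);
    }
    for (int i = 0; i < nr; i++) {
        mock_bind_irqfd(k);
    }
}
```

With 65 vectors the eager flow commits 65 times and the deferred flow once; the assert in mock_bind_irqfd() encodes the ordering the patch relies on, namely that routes are committed before irqfds are bound.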
>>
>> The test VM has 128 vcpus and 8 VFs (each with 65 vectors enabled);
>> we measured the cost of vfio_msix_enable for each one and
>> can see that 90+% of the cost is eliminated.
>>
>> VF      Original (ms)   With this patch and the
>>                         vfio enable optimization (ms)
>> 1st     8               2
>> 2nd     15              2
>> 3rd     22              2
>> 4th     24              3
>> 5th     36              2
>> 6th     44              3
>> 7th     51              3
>> 8th     58              4
>> Total   258             21
> 
> Almost seems like we should have started here rather than much more
> subtle improvements from patch 3.
> 
>  
>> The optimization can also be applied to the MSI type.
>>
>> Signed-off-by: Longpeng(Mike) 
>> ---
>>  hw/vfio/pci.c | 47 ---
>>  1 file changed, 44 insertions(+), 3 deletions(-)
>>
>> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
>> index 3ab67d6..50e7ec7 100644
>> --- a/hw/vfio/pci.c
>> +++ b/hw/vfio/pci.c
>> @@ -427,12 +427,17 @@ static void vfio_add_kvm_msi_virq(VFIOPCIDevice *vdev, VFIOMSIVector *vector,
>>  return;
>>  }
>>  
>> -virq = kvm_irqchip_add_msi_route(kvm_state, vector_n, &vdev->pdev, false);
>> +virq = kvm_irqchip_add_msi_route(kvm_state, vector_n, &vdev->pdev,
>> + vdev->defer_add_virq);
> 
> See comment on previous patch about these bool function args.
> 
>>  if (virq < 0) {
>>  event_notifier_cleanup(&vector->kvm_interrupt);
>>  return;
>>  }
>>  
>> +if (vdev->defer_add_virq) {
>> +goto out;
>> +}
> 
> See comment on previous patch about this goto flow.
> 
>> +
>>  if (kvm_irqchip_add_irqfd_notifier_gsi(kvm_state, &vector->kvm_interrupt,
>> NULL, virq) < 0) {
>>  kvm_irqchip_release_virq(kvm_state, virq);
>> @@ -440,6 +445,7 @@ static void vfio_add_kvm_msi_virq(VFIOPCIDevice *vdev, VFIOMSIVector *vector,
>>  return;
>>  }
>>  
>> +out:
>>  vector->virq = virq;
>>  }
>>  
>> @@ -577,6 +583,36 @@ static void vfio_msix_vector_release(PCIDevice *pdev, unsigned int nr)
>>  }
>>  }
>>  
>> +static void vfio_commit_kvm_msi_virq(VFIOPCIDevice *vdev)
>> +{
>> +int i;
>> +VFIOMSIVector *vector;
>> +bool commited = false;
>> +
>> +for (i = 0; i < vdev->nr_vectors; i++) {
>> +vector = &vdev->msi_vectors[i];
>> +
>> +if (vector->virq < 0) {
>> +continue;
>> +}
>> +
>> +/* Commit cached route entries to KVM core first if not yet */
>> +if (!commited) {
>> +kvm_irqchip_commit_routes(kvm_state);
>> +commited = true;
>> +}
> 
> Why is this in the loop, shouldn't we just start with:
> 

kvm_irqchip_commit_routes won't be called if all of the vector->virq are -1
originally, so I just wanted to preserve that behavior here.

But there seems to be no side effect if we call it directly; I'll take your
advice in the next version, thanks.

> if (!vdev->nr_vectors) {
>     return;
> }
> 
> kvm_irqchip_commit_routes(kvm_state);
> 
> for (...
> 
>> +
>> +if (kvm_irqchip_add_irqfd_notifier_gsi(kvm_state,
>> +   &vector->kvm_interrupt,
>> +   NULL, vector->virq) < 0) {
>> +kvm_irqchip_release_virq(kvm_state, vector->virq);
>> +event_notifier_cleanup(&vector->kvm_interrupt);
>> +vector->virq = -1;
>> +return;
>> +}
> 
> And all the other vectors we've allocated?  Error logging?
> 

Oh, it's a bug, will fix.
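One possible way to address Alex's question is to handle a bind failure per vector and keep going, so no later vector is left holding a route without an irqfd, and to log the failure. A hypothetical C sketch; SketchVector, the injected failure index and the log text are assumptions, and the next version of the patch may well take a different approach:

```c
#include <assert.h>
#include <stdio.h>

typedef struct {
    int virq;          /* < 0 means no KVM route for this vector */
    int irqfd_bound;
} SketchVector;

/* Hypothetical bind step that fails for the vector at index fail_idx. */
static int sketch_bind_irqfd(SketchVector *v, int idx, int fail_idx)
{
    if (idx == fail_idx) {
        return -1;
    }
    v->irqfd_bound = 1;
    return 0;
}

/* Drop the route so the vector does not keep a route without an irqfd. */
static void sketch_release(SketchVector *v)
{
    v->irqfd_bound = 0;
    v->virq = -1;
}

/* Bind every routed vector; returns the number of failures. */
static int sketch_bind_all(SketchVector *vecs, int nr, int fail_idx)
{
    int failures = 0;

    assert(vecs != NULL || nr == 0);
    for (int i = 0; i < nr; i++) {
        if (vecs[i].virq < 0) {
            continue;            /* no route: nothing to bind */
        }
        if (sketch_bind_irqfd(&vecs[i], i, fail_idx) < 0) {
            fprintf(stderr, "vector %d: failed to bind irqfd, dropping route\n", i);
            sketch_release(&vecs[i]);
            failures++;          /* keep going: later vectors still need irqfds */
        }
    }
    return failures;
}
```

This leaves a failed vector without a KVM route, which is the same end state the quoted single-vector error path produces, instead of abandoning the remaining vectors with routes allocated but no irqfds.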

>> +}
>> +}
>> +
>>  static void vfio_msix_enable(VFIOPCIDevice *vdev)
>>  {
>>  PCIDevice *pdev = &vdev->pdev;
>> @@ -624,6 +660,7 @@ static void vfio_msix_enable(VFIOPCIDevice *vdev)
>>  if (!pdev->msix_function_masked && vdev->defer_add_virq) {
>>  int ret;
>>  vfio_disable_irqindex(&vdev->vbasedev, VFIO_PCI_MSIX_IRQ_INDEX);
>> +vfio_commit_kvm_msi_virq(vdev);
>>  

Re: [PATCH v3 05/15] target/ppc: PMU: add instruction counting

2021-09-06 Thread David Gibson
On Fri, Sep 03, 2021 at 05:31:06PM -0300, Daniel Henrique Barboza wrote:
> The PMU is already counting cycles by calculating time elapsed in
> nanoseconds. Counting instructions is a different matter and requires
> another approach.
> 
> This patch adds the capability of counting completed instructions
> (Perf event PM_INST_CMPL) by counting the number of instructions
> translated in each translation block right before exiting it.
> 
> A new pmu_count_insns() helper in translate.c was added to do that.
> After verifying that the PMU is running (MMCR0_FC bit not set), it calls
> helper_insns_inc(). This new helper from power8_pmu.c will add the
> instructions to the relevant counters. It'll also be responsible for
> triggering counter negative overflows later on.
> 
> Signed-off-by: Daniel Henrique Barboza 
> ---
>  target/ppc/cpu.h |  1 +
>  target/ppc/helper.h  |  1 +
>  target/ppc/helper_regs.c |  3 ++
>  target/ppc/power8_pmu.c  | 70 
>  target/ppc/translate.c   | 46 ++
>  5 files changed, 114 insertions(+), 7 deletions(-)
> 
> diff --git a/target/ppc/cpu.h b/target/ppc/cpu.h
> index 74698a3600..4d4886ac74 100644
> --- a/target/ppc/cpu.h
> +++ b/target/ppc/cpu.h
> @@ -628,6 +628,7 @@ enum {
>  HFLAGS_FP = 13,  /* MSR_FP */
>  HFLAGS_PR = 14,  /* MSR_PR */
>  HFLAGS_PMCCCLEAR = 15, /* PMU MMCR0 PMCC equal to 0b00 */
> +HFLAGS_MMCR0FC = 16, /* MMCR0 FC bit */
>  HFLAGS_VSX = 23, /* MSR_VSX if cpu has VSX */
>  HFLAGS_VR = 25,  /* MSR_VR if cpu has VRE */
>  
> diff --git a/target/ppc/helper.h b/target/ppc/helper.h
> index 5122632784..47dbbe6da1 100644
> --- a/target/ppc/helper.h
> +++ b/target/ppc/helper.h
> @@ -21,6 +21,7 @@ DEF_HELPER_1(hrfid, void, env)
>  DEF_HELPER_2(store_lpcr, void, env, tl)
>  DEF_HELPER_2(store_pcr, void, env, tl)
>  DEF_HELPER_2(store_mmcr0, void, env, tl)
> +DEF_HELPER_2(insns_inc, void, env, i32)
>  #endif
>  DEF_HELPER_1(check_tlb_flush_local, void, env)
>  DEF_HELPER_1(check_tlb_flush_global, void, env)
> diff --git a/target/ppc/helper_regs.c b/target/ppc/helper_regs.c
> index 4c1d9575ac..27d139edd8 100644
> --- a/target/ppc/helper_regs.c
> +++ b/target/ppc/helper_regs.c
> @@ -109,6 +109,9 @@ static uint32_t hreg_compute_hflags_value(CPUPPCState *env)
>  if (((env->spr[SPR_POWER_MMCR0] & MMCR0_PMCC) >> 18) == 0) {
>  hflags |= 1 << HFLAGS_PMCCCLEAR;
>  }
> +if (env->spr[SPR_POWER_MMCR0] & MMCR0_FC) {
> +hflags |= 1 << HFLAGS_MMCR0FC;
> +}
>  
>  #ifndef CONFIG_USER_ONLY
>  if (!env->has_hv_mode || (msr & (1ull << MSR_HV))) {
> diff --git a/target/ppc/power8_pmu.c b/target/ppc/power8_pmu.c
> index 3f7b305f4f..9769c0ff35 100644
> --- a/target/ppc/power8_pmu.c
> +++ b/target/ppc/power8_pmu.c
> @@ -31,10 +31,13 @@ static void update_PMC_PM_CYC(CPUPPCState *env, int sprn,
>  env->spr[sprn] += time_delta;
>  }
>  
> -static void update_programmable_PMC_reg(CPUPPCState *env, int sprn,
> -uint64_t time_delta)
> +static uint8_t get_PMC_event(CPUPPCState *env, int sprn)

I like the idea of splitting out a helper to get the selected event
(might even make sense to move that to the earlier patch).  What would
be even nicer is if it also included handling of the fact that some
events are specific to particular PMCs (like 0xF0 for PMC1).  That
means that all the event selection logic will be here, rather than
having to check the PMC number again in the caller.  Obviously to do
that you'll need some special "bad event" return value, which might
mean changing the return type.
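A sketch of that suggestion in C, returning an explicit invalid value when an event is not valid on the given PMC, so callers no longer re-check the PMC number. The enum and function names are hypothetical, only the 0x1E and 0xF0 events mentioned in this series are handled, and the field shifts (24, 16, 8, 0) follow the MMCR1_PMCnEVT_EXTR defines from patch 4:

```c
#include <assert.h>
#include <stdint.h>

typedef enum {
    SKETCH_EVT_INVALID = -1, /* unsupported, or not valid on this PMC */
    SKETCH_EVT_NONE = 0,     /* no event selected */
    SKETCH_EVT_CYCLES,
} SketchPmcEvent;

/* extract64() shift of the PMCnSEL byte for PMC1..PMC4. */
static const int sketch_sel_shift[4] = { 24, 16, 8, 0 };

/* pmc is 1..4; all PMC-specific event rules live here. */
static SketchPmcEvent sketch_get_pmc_event(uint64_t mmcr1, int pmc)
{
    uint8_t sel;

    assert(pmc >= 1 && pmc <= 4);
    sel = (mmcr1 >> sketch_sel_shift[pmc - 1]) & 0xff;

    switch (sel) {
    case 0x00:
        return SKETCH_EVT_NONE;
    case 0x1E:               /* implementation-specific cycles, PMC1-4 */
        return SKETCH_EVT_CYCLES;
    case 0xF0:               /* architected cycles event, PMC1 only */
        return pmc == 1 ? SKETCH_EVT_CYCLES : SKETCH_EVT_INVALID;
    default:
        return SKETCH_EVT_INVALID;
    }
}
```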

>  {
> -uint8_t event, evt_extr;
> +uint8_t evt_extr = 0;
> +
> +if (env->spr[SPR_POWER_MMCR1] == 0) {
> +return 0;
> +}
>  
>  switch (sprn) {
>  case SPR_POWER_PMC1:
> @@ -50,10 +53,16 @@ static void update_programmable_PMC_reg(CPUPPCState *env, int sprn,
>  evt_extr = MMCR1_PMC4EVT_EXTR;
>  break;
>  default:
> -return;
> +return 0;
>  }
>  
> -event = extract64(env->spr[SPR_POWER_MMCR1], evt_extr, MMCR1_EVT_SIZE);
> +return extract64(env->spr[SPR_POWER_MMCR1], evt_extr, MMCR1_EVT_SIZE);
> +}
> +
> +static void update_programmable_PMC_reg(CPUPPCState *env, int sprn,
> +uint64_t time_delta)
> +{
> +uint8_t event = get_PMC_event(env, sprn);
>  
>  /*
>   * MMCR0_PMC1SEL = 0xF0 is the architected PowerISA v3.1 event
> @@ -99,8 +108,9 @@ void helper_store_mmcr0(CPUPPCState *env, target_ulong value)
>  
>  env->spr[SPR_POWER_MMCR0] = value;
>  
> -/* MMCR0 writes can change HFLAGS_PMCCCLEAR */
> -if ((curr_value & MMCR0_PMCC) != (value & MMCR0_PMCC)) {
> +/* MMCR0 writes can change HFLAGS_PMCCCLEAR and HFLAGS_MMCR0FC */
> +if (((curr_value & MMCR0_PMCC) != (value & MMCR0_PMCC)) ||
> +(curr_FC != new_FC)) {
>  hreg_compute_hflags(env);
>  }
>  
> @@ -123,4 +133,50 @@ 

Re: [PATCH v3 04/15] target/ppc/power8_pmu.c: enable PMC1-PMC4 events

2021-09-06 Thread David Gibson
On Fri, Sep 03, 2021 at 05:31:05PM -0300, Daniel Henrique Barboza wrote:
> This patch enables all PMCs but PMC5 to count cycles. To do that we
> need to implement the MMCR1 bits where the events are stored, retrieve
> them, see if the PMC was configured with a PM_CYC event, and
> calculate cycles if that's the case.
> 
> PowerISA v3.1 defines the following conditions to count cycles:
> 
> - PMC1 set with the event 0xF0;
> - PMC6, which always count cycles
> 
> However, the PowerISA also defines a range of 'implementation dependent'
> events that the chip can use in the 0x01-0xBF range. It turns out that IBM
> POWER chips implement some non-ISA events, and the Linux kernel makes use
> of them. For instance, 0x1E is an implementation-specific event, used by
> the kernel, that counts cycles in PMCs 1-4. Let's also support 0x1E
> to count cycles to allow for existing kernels to behave properly with the
> PMU.
> 
> Signed-off-by: Daniel Henrique Barboza 

Reviewed-by: David Gibson 

> ---
>  target/ppc/cpu.h| 11 +
>  target/ppc/power8_pmu.c | 52 +
>  2 files changed, 63 insertions(+)
> 
> diff --git a/target/ppc/cpu.h b/target/ppc/cpu.h
> index a9b31736af..74698a3600 100644
> --- a/target/ppc/cpu.h
> +++ b/target/ppc/cpu.h
> @@ -352,6 +352,17 @@ typedef struct ppc_v3_pate_t {
>  /* MMCR0 userspace r/w mask */
>  #define MMCR0_UREG_MASK (MMCR0_FC | MMCR0_PMAO | MMCR0_PMAE)
>  
> +#define MMCR1_EVT_SIZE 8
> +/* extract64() does a right shift before extracting */
> +#define MMCR1_PMC1SEL_START 32
> +#define MMCR1_PMC1EVT_EXTR (64 - MMCR1_PMC1SEL_START - MMCR1_EVT_SIZE)
> +#define MMCR1_PMC2SEL_START 40
> +#define MMCR1_PMC2EVT_EXTR (64 - MMCR1_PMC2SEL_START - MMCR1_EVT_SIZE)
> +#define MMCR1_PMC3SEL_START 48
> +#define MMCR1_PMC3EVT_EXTR (64 - MMCR1_PMC3SEL_START - MMCR1_EVT_SIZE)
> +#define MMCR1_PMC4SEL_START 56
> +#define MMCR1_PMC4EVT_EXTR (64 - MMCR1_PMC4SEL_START - MMCR1_EVT_SIZE)
> +
>  /* LPCR bits */
>  #define LPCR_VPM0 PPC_BIT(0)
>  #define LPCR_VPM1 PPC_BIT(1)
> diff --git a/target/ppc/power8_pmu.c b/target/ppc/power8_pmu.c
> index 47de38a99e..3f7b305f4f 100644
> --- a/target/ppc/power8_pmu.c
> +++ b/target/ppc/power8_pmu.c
> @@ -31,10 +31,62 @@ static void update_PMC_PM_CYC(CPUPPCState *env, int sprn,
>  env->spr[sprn] += time_delta;
>  }
>  
> +static void update_programmable_PMC_reg(CPUPPCState *env, int sprn,
> +uint64_t time_delta)
> +{
> +uint8_t event, evt_extr;
> +
> +switch (sprn) {
> +case SPR_POWER_PMC1:
> +evt_extr = MMCR1_PMC1EVT_EXTR;
> +break;
> +case SPR_POWER_PMC2:
> +evt_extr = MMCR1_PMC2EVT_EXTR;
> +break;
> +case SPR_POWER_PMC3:
> +evt_extr = MMCR1_PMC3EVT_EXTR;
> +break;
> +case SPR_POWER_PMC4:
> +evt_extr = MMCR1_PMC4EVT_EXTR;
> +break;
> +default:
> +return;
> +}
> +
> +event = extract64(env->spr[SPR_POWER_MMCR1], evt_extr, MMCR1_EVT_SIZE);
> +
> +/*
> + * MMCR0_PMC1SEL = 0xF0 is the architected PowerISA v3.1 event
> + * that counts cycles using PMC1.
> + *
> + * IBM POWER chips also has support for an implementation dependent
> + * event, 0x1E, that enables cycle counting on PMCs 1-4. The
> + * Linux kernel makes extensive use of 0x1E, so let's also support
> + * it.
> + */
> +switch (event) {
> +case 0xF0:
> +if (sprn == SPR_POWER_PMC1) {
> +update_PMC_PM_CYC(env, sprn, time_delta);
> +}
> +break;
> +case 0x1E:
> +update_PMC_PM_CYC(env, sprn, time_delta);
> +break;
> +default:
> +return;
> +}
> +}
> +
>  static void update_cycles_PMCs(CPUPPCState *env)
>  {
>  uint64_t now = qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL);
>  uint64_t time_delta = now - env->pmu_base_time;
> +int sprn;
> +
> +for (sprn = SPR_POWER_PMC1; sprn < SPR_POWER_PMC5; sprn++) {
> +update_programmable_PMC_reg(env, sprn, time_delta);
> +}
>  
>  update_PMC_PM_CYC(env, SPR_POWER_PMC6, time_delta);
>  }

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature


Re: [PATCH 4/5] kvm: irqchip: support defer to commit the route

2021-09-06 Thread Longpeng (Mike, Cloud Infrastructure Service Product Dept.)



On 2021/9/4 5:57, Alex Williamson wrote:
> On Wed, 25 Aug 2021 15:56:19 +0800
> "Longpeng(Mike)"  wrote:
> 
>> kvm_irqchip_commit_routes() is relatively expensive, so give
>> callers the choice of committing the route immediately or not
>> when they add an MSI/MSI-X route.
>>
>> Signed-off-by: Longpeng(Mike) 
>> ---
>>  accel/kvm/kvm-all.c| 10 +++---
>>  accel/stubs/kvm-stub.c |  3 ++-
>>  hw/misc/ivshmem.c  |  2 +-
>>  hw/vfio/pci.c  |  2 +-
>>  hw/virtio/virtio-pci.c |  2 +-
>>  include/sysemu/kvm.h   |  4 +++-
>>  target/i386/kvm/kvm.c  |  2 +-
>>  7 files changed, 16 insertions(+), 9 deletions(-)
>>
>> diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
>> index 0125c17..1f788a2 100644
>> --- a/accel/kvm/kvm-all.c
>> +++ b/accel/kvm/kvm-all.c
>> @@ -1950,7 +1950,8 @@ int kvm_irqchip_send_msi(KVMState *s, MSIMessage msg)
>>  return kvm_set_irq(s, route->kroute.gsi, 1);
>>  }
>>  
>> -int kvm_irqchip_add_msi_route(KVMState *s, int vector, PCIDevice *dev)
>> +int kvm_irqchip_add_msi_route(KVMState *s, int vector, PCIDevice *dev,
>> +  bool defer_commit)
>>  {
>>  struct kvm_irq_routing_entry kroute = {};
>>  int virq;
>> @@ -1993,7 +1994,9 @@ int kvm_irqchip_add_msi_route(KVMState *s, int vector, 
>> PCIDevice *dev)
>>  
>>  kvm_add_routing_entry(s, );
>>  kvm_arch_add_msi_route_post(, vector, dev);
>> -kvm_irqchip_commit_routes(s);
>> +if (!defer_commit) {
>> +kvm_irqchip_commit_routes(s);
>> +}
> 
> 
> Personally I'd rather rename the function to
> kvm_irqchip_add_deferred_msi_route() and kvm_irqchip_add_msi_route()
> becomes a wrapper appending kvm_irqchip_commit_routes() to that.
> Thanks,
> 

Ok, will do in the next version.

> Alex
> 
>>  
>>  return virq;
>>  }
>> @@ -2151,7 +2154,8 @@ int kvm_irqchip_send_msi(KVMState *s, MSIMessage msg)
>>  abort();
>>  }
>>  
>> -int kvm_irqchip_add_msi_route(KVMState *s, int vector, PCIDevice *dev)
>> +int kvm_irqchip_add_msi_route(KVMState *s, int vector, PCIDevice *dev,
>> +  bool defer_commit)
>>  {
>>  return -ENOSYS;
>>  }
>> diff --git a/accel/stubs/kvm-stub.c b/accel/stubs/kvm-stub.c
>> index 5b1d00a..d5caaca 100644
>> --- a/accel/stubs/kvm-stub.c
>> +++ b/accel/stubs/kvm-stub.c
>> @@ -81,7 +81,8 @@ int kvm_on_sigbus(int code, void *addr)
>>  }
>>  
>>  #ifndef CONFIG_USER_ONLY
>> -int kvm_irqchip_add_msi_route(KVMState *s, int vector, PCIDevice *dev)
>> +int kvm_irqchip_add_msi_route(KVMState *s, int vector, PCIDevice *dev,
>> +  bool defer_commit)
>>  {
>>  return -ENOSYS;
>>  }
>> diff --git a/hw/misc/ivshmem.c b/hw/misc/ivshmem.c
>> index 1ba4a98..98b14cc 100644
>> --- a/hw/misc/ivshmem.c
>> +++ b/hw/misc/ivshmem.c
>> @@ -429,7 +429,7 @@ static void ivshmem_add_kvm_msi_virq(IVShmemState *s, 
>> int vector,
>>  IVSHMEM_DPRINTF("ivshmem_add_kvm_msi_virq vector:%d\n", vector);
>>  assert(!s->msi_vectors[vector].pdev);
>>  
>> -ret = kvm_irqchip_add_msi_route(kvm_state, vector, pdev);
>> +ret = kvm_irqchip_add_msi_route(kvm_state, vector, pdev, false);
>>  if (ret < 0) {
>>  error_setg(errp, "kvm_irqchip_add_msi_route failed");
>>  return;
>> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
>> index ca37fb7..3ab67d6 100644
>> --- a/hw/vfio/pci.c
>> +++ b/hw/vfio/pci.c
>> @@ -427,7 +427,7 @@ static void vfio_add_kvm_msi_virq(VFIOPCIDevice *vdev, 
>> VFIOMSIVector *vector,
>>  return;
>>  }
>>  
>> -virq = kvm_irqchip_add_msi_route(kvm_state, vector_n, >pdev);
>> +virq = kvm_irqchip_add_msi_route(kvm_state, vector_n, >pdev, 
>> false);
>>  if (virq < 0) {
>>  event_notifier_cleanup(>kvm_interrupt);
>>  return;
>> diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
>> index 433060a..7e2d021 100644
>> --- a/hw/virtio/virtio-pci.c
>> +++ b/hw/virtio/virtio-pci.c
>> @@ -684,7 +684,7 @@ static int kvm_virtio_pci_vq_vector_use(VirtIOPCIProxy 
>> *proxy,
>>  int ret;
>>  
>>  if (irqfd->users == 0) {
>> -ret = kvm_irqchip_add_msi_route(kvm_state, vector, >pci_dev);
>> +ret = kvm_irqchip_add_msi_route(kvm_state, vector, >pci_dev, 
>> false);
>>  if (ret < 0) {
>>  return ret;
>>  }
>> diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h
>> index a1ab1ee..1932dc0 100644
>> --- a/include/sysemu/kvm.h
>> +++ b/include/sysemu/kvm.h
>> @@ -473,9 +473,11 @@ void kvm_init_cpu_signals(CPUState *cpu);
>>   *  message.
>>   * @dev:Owner PCI device to add the route. If @dev is specified
>>   *  as @NULL, an empty MSI message will be inited.
>> + * @defer_commit:   Defer to commit new route to the KVM core.
>>   * @return: virq (>=0) when success, errno (<0) when failed.
>>   */
>> -int kvm_irqchip_add_msi_route(KVMState *s, int vector, PCIDevice *dev);
>> +int kvm_irqchip_add_msi_route(KVMState *s, int vector, PCIDevice *dev,
>> +  bool 

Re: [PATCH v3 03/15] target/ppc: PMU basic cycle count for pseries TCG

2021-09-06 Thread David Gibson
On Fri, Sep 03, 2021 at 05:31:04PM -0300, Daniel Henrique Barboza wrote:
> This patch adds the barebones of the PMU logic by enabling cycle
> counting, done via the performance monitor counter 6. The overall logic
> goes as follows:
> 
> - a helper is added to control the PMU state on each MMCR0 write. This
> allows for the PMU to start/stop as the frozen counter bit (MMCR0_FC)
> is cleared or set;
> 
> - MMCR0 reg initial value is set to 0x8000 (MMCR0_FC set) to avoid
> having to spin the PMU right at system init;
> 
> - the intended usage is to freeze the counters by setting MMCR0_FC, do
> any additional setting of events to be counted via MMCR1 (not
> implemented yet) and enable the PMU by zeroing MMCR0_FC. Software must
> freeze counters to read the results - on the fly reading of the PMCs
> will return the starting value of each one.
> 
> Since there will be more PMU exclusive code to be added next, put the
> PMU logic in its own helper to keep all in the same place. The name of
> the new helper file, power8_pmu.c, is an indication that the PMU logic
> has been tested with the IBM POWER chip family, POWER8 being the oldest
> version tested. This doesn't mean that this PMU logic will break with
> any other PPC64 chip that implements Book3s, but since we can't assert
> that this PMU will work with all available Book3s emulated processors
> we're choosing to be explicit.
> 
> Signed-off-by: Daniel Henrique Barboza 

LGTM, except for one nit:
> +#if defined(TARGET_PPC64) && !defined(CONFIG_USER_ONLY)
> +
> +static void update_PMC_PM_CYC(CPUPPCState *env, int sprn,
> +  uint64_t time_delta)
> +{
> +/*
> + * The pseries and pvn clock runs at 1Ghz, meaning that

s/pvn/pnv/ ?




Re: [PATCH v3 02/15] target/ppc: add user write access control for PMU SPRs

2021-09-06 Thread David Gibson
On Fri, Sep 03, 2021 at 05:31:03PM -0300, Daniel Henrique Barboza wrote:
> The PMU needs to enable writing of its uregs to userspace, otherwise
> Perf applications will not be able to set up the counters correctly. This
> patch enables user space writing of all PMU uregs.
> 
> MMCR0 is a special case because its userspace writing access is controlled
> by MMCR0_PMCC bits. There are 4 configurations available (0b00, 0b01,
> 0b10 and 0b11) but for our purposes here we're handling only
> MMCR0_PMCC = 0b00. In this case, if userspace tries to write MMCR0, a
> hypervisor emulation assistance interrupt occurs.
> 
> This is being done by adding HFLAGS_PMCCCLEAR to hflags. This flag
> indicates if MMCR0_PMCC is cleared (0b00), and a new 'pmcc_clear' flag in
> DisasContext allows us to use it in spr_write_MMCR0_ureg().
> 
> Signed-off-by: Daniel Henrique Barboza 
> ---
>  target/ppc/cpu.h |  1 +
>  target/ppc/cpu_init.c| 18 +++---
>  target/ppc/helper_regs.c |  3 +++
>  target/ppc/spr_tcg.h |  3 ++-
>  target/ppc/translate.c   | 53 +++-
>  5 files changed, 67 insertions(+), 11 deletions(-)
> 
> diff --git a/target/ppc/cpu.h b/target/ppc/cpu.h
> index f68bb8d8aa..8dfbb62022 100644
> --- a/target/ppc/cpu.h
> +++ b/target/ppc/cpu.h
> @@ -616,6 +616,7 @@ enum {
>  HFLAGS_SE = 10,  /* MSR_SE -- from elsewhere on embedded ppc */
>  HFLAGS_FP = 13,  /* MSR_FP */
>  HFLAGS_PR = 14,  /* MSR_PR */
> +HFLAGS_PMCCCLEAR = 15, /* PMU MMCR0 PMCC equal to 0b00 */
>  HFLAGS_VSX = 23, /* MSR_VSX if cpu has VSX */
>  HFLAGS_VR = 25,  /* MSR_VR if cpu has VRE */
>  
> diff --git a/target/ppc/cpu_init.c b/target/ppc/cpu_init.c
> index 9efc6c2d87..bb5ea04c61 100644
> --- a/target/ppc/cpu_init.c
> +++ b/target/ppc/cpu_init.c
> @@ -6867,7 +6867,7 @@ static void register_book3s_pmu_sup_sprs(CPUPPCState 
> *env)
>  static void register_book3s_pmu_user_sprs(CPUPPCState *env)
>  {
>  spr_register(env, SPR_POWER_UMMCR0, "UMMCR0",
> - _read_MMCR0_ureg, SPR_NOACCESS,
> + _read_MMCR0_ureg, _write_MMCR0_ureg,
>   _read_ureg, _write_ureg,
>   0x);
>  spr_register(env, SPR_POWER_UMMCR1, "UMMCR1",
> @@ -6875,31 +6875,31 @@ static void register_book3s_pmu_user_sprs(CPUPPCState 
> *env)
>   _read_ureg, _write_ureg,
>   0x);
>  spr_register(env, SPR_POWER_UMMCRA, "UMMCRA",
> - _read_ureg, SPR_NOACCESS,
> + _read_ureg, _write_ureg,
>   _read_ureg, _write_ureg,
>   0x);
>  spr_register(env, SPR_POWER_UPMC1, "UPMC1",
> - _read_ureg, SPR_NOACCESS,
> + _read_ureg, _write_ureg,

Surely this can't be right.  AFAICT spr_write_ureg() will
unconditionally allow full userspace write access.  That can't be
right - otherwise the OS could never safely use the PMU for itself.

>   _read_ureg, _write_ureg,
>   0x);
>  spr_register(env, SPR_POWER_UPMC2, "UPMC2",
> - _read_ureg, SPR_NOACCESS,
> + _read_ureg, _write_ureg,
>   _read_ureg, _write_ureg,
>   0x);
>  spr_register(env, SPR_POWER_UPMC3, "UPMC3",
> - _read_ureg, SPR_NOACCESS,
> + _read_ureg, _write_ureg,
>   _read_ureg, _write_ureg,
>   0x);
>  spr_register(env, SPR_POWER_UPMC4, "UPMC4",
> - _read_ureg, SPR_NOACCESS,
> + _read_ureg, _write_ureg,
>   _read_ureg, _write_ureg,
>   0x);
>  spr_register(env, SPR_POWER_UPMC5, "UPMC5",
> - _read_ureg, SPR_NOACCESS,
> + _read_ureg, _write_ureg,
>   _read_ureg, _write_ureg,
>   0x);
>  spr_register(env, SPR_POWER_UPMC6, "UPMC6",
> - _read_ureg, SPR_NOACCESS,
> + _read_ureg, _write_ureg,
>   _read_ureg, _write_ureg,
>   0x);
>  spr_register(env, SPR_POWER_USIAR, "USIAR",
> @@ -6975,7 +6975,7 @@ static void register_power8_pmu_sup_sprs(CPUPPCState 
> *env)
>  static void register_power8_pmu_user_sprs(CPUPPCState *env)
>  {
>  spr_register(env, SPR_POWER_UMMCR2, "UMMCR2",
> - _read_MMCR2_ureg, SPR_NOACCESS,
> + _read_MMCR2_ureg, _write_ureg,
>   _read_ureg, _write_ureg,
>   0x);
>  spr_register(env, SPR_POWER_USIER, "USIER",
> diff --git a/target/ppc/helper_regs.c b/target/ppc/helper_regs.c
> index 405450d863..4c1d9575ac 100644
> --- a/target/ppc/helper_regs.c
> +++ b/target/ppc/helper_regs.c
> @@ -106,6 +106,9 @@ static uint32_t hreg_compute_hflags_value(CPUPPCState 
> *env)
>  if (env->spr[SPR_LPCR] & LPCR_GTSE) {
>  hflags |= 1 << HFLAGS_GTSE;
>  }
> +

Re: [PATCH 3/5] vfio: defer to enable msix in migration resume phase

2021-09-06 Thread Longpeng (Mike, Cloud Infrastructure Service Product Dept.)



On 2021/9/4 5:56, Alex Williamson wrote:
> On Wed, 25 Aug 2021 15:56:18 +0800
> "Longpeng(Mike)"  wrote:
> 
>> The VF's unmasked MSI-X vectors are enabled one by one in the
>> migration resume phase, and VFIO_DEVICE_SET_IRQS is called
>> for each vector, which gets expensive if the VF has many
>> vectors.
>>
>> We can call VFIO_DEVICE_SET_IRQS once outside the loop of set
>> vector notifiers to reduce the cost.
>>
>> The test VM has 128 vCPUs and 8 VFs (each with 65 vectors enabled).
>> We measured the cost of vfio_msix_enable() for each one, and
>> the cost is reduced by about 10%.
>>
>> Origin  Apply this patch
> 
> Original?
> 
>> 1st 8   4
>> 2nd 15  11
>> 3rd 22  18
>> 4th 24  25
>> 5th 36  33
>> 6th 44  40
>> 7th 51  47
>> 8th 58  54
>> Total   258ms   232ms
> 
> If the values here are ms for execution of vfio_msix_enable() per VF,

Yes.

> why are the values increasing per VF?  Do we have 65 vectors per VF or
> do we have 65 vectors total, weighted towards to higher VFs?

We have 65 vectors per VF.

KVM_SET_GSI_ROUTING scans and updates all of the assigned irqfds
unconditionally, so it takes longer as the number of irqfds grows.

There are 65 irqfds when processing the 1st VF, 130 when processing the
2nd VF, 195 when processing the 3rd VF, and so on, which is why the
values increase.

> This doesn't make sense without the data from the last patch in the
> series.
> 
>>
>> Signed-off-by: Longpeng(Mike) 
>> ---
>>  hw/vfio/pci.c | 22 ++
>>  hw/vfio/pci.h |  1 +
>>  2 files changed, 23 insertions(+)
>>
>> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
>> index 7cc43fe..ca37fb7 100644
>> --- a/hw/vfio/pci.c
>> +++ b/hw/vfio/pci.c
>> @@ -372,6 +372,10 @@ static int vfio_enable_vectors(VFIOPCIDevice *vdev, 
>> bool msix)
>>  int ret = 0, i, argsz;
>>  int32_t *fds;
>>  
>> +if (!vdev->nr_vectors) {
>> +return 0;
>> +}
> 
> How would this occur?  Via the new call below?  But then we'd leave
> vfio_msix_enabled() with MSI-X DISABLED???
> 
>> +
>>  argsz = sizeof(*irq_set) + (vdev->nr_vectors * sizeof(*fds));
>>  
>>  irq_set = g_malloc0(argsz);
>> @@ -495,6 +499,11 @@ static int vfio_msix_vector_do_use(PCIDevice *pdev, 
>> unsigned int nr,
>>  }
>>  }
>>  
>> +if (vdev->defer_add_virq) {
>> +vdev->nr_vectors = MAX(vdev->nr_vectors, nr + 1);
>> +goto clear_pending;
>> +}
> 
> This is a really ugly use of 'goto' to simply jump around code you'd
> like to skip rather than reformat the function with branches to
> conditionalize that code.  Gotos for consolidated error paths, retries,
> hard to break loops are ok, not this.
> 

Got it, thanks.

> 
>> +
>>  /*
>>   * We don't want to have the host allocate all possible MSI vectors
>>   * for a device if they're not in use, so we shutdown and incrementally
>> @@ -524,6 +533,7 @@ static int vfio_msix_vector_do_use(PCIDevice *pdev, 
>> unsigned int nr,
>>  }
>>  }
>>  
>> +clear_pending:
>>  /* Disable PBA emulation when nothing more is pending. */
>>  clear_bit(nr, vdev->msix->pending);
>>  if (find_first_bit(vdev->msix->pending,
>> @@ -608,6 +618,16 @@ static void vfio_msix_enable(VFIOPCIDevice *vdev)
>>  if (msix_set_vector_notifiers(pdev, vfio_msix_vector_use,
>>vfio_msix_vector_release, NULL)) {
>>  error_report("vfio: msix_set_vector_notifiers failed");
>> +return;
>> +}
>> +
>> +if (!pdev->msix_function_masked && vdev->defer_add_virq) {
>> +int ret;
>> +vfio_disable_irqindex(>vbasedev, VFIO_PCI_MSIX_IRQ_INDEX);
>> +ret = vfio_enable_vectors(vdev, true);
>> +if (ret) {
>> +error_report("vfio: failed to enable vectors, %d", ret);
>> +}
>>  }
>>  
>>  trace_vfio_msix_enable(vdev->vbasedev.name);
>> @@ -2456,7 +2476,9 @@ static int vfio_pci_load_config(VFIODevice *vbasedev, 
>> QEMUFile *f)
>>  if (msi_enabled(pdev)) {
>>  vfio_msi_enable(vdev);
>>  } else if (msix_enabled(pdev)) {
>> +vdev->defer_add_virq = true;
>>  vfio_msix_enable(vdev);
>> +vdev->defer_add_virq = false;
> 
> 
> Ick.  Why is this a special case for vfio_msix_enable()?  Wouldn't we
> prefer to always batch vector-use work while we're in the process of
> enabling MSI-X?  
> 

Ok, will do in next version.

In addition, I'll rename the field to 'defer_kvm_irq_routing' as you suggested
in another earlier thread.

'''
> -vfio_add_kvm_msi_virq(vdev, vector, nr, true);
> +if (unlikely(vdev->defer_set_virq)) {

Likewise this could be "vdev->defer_kvm_irq_routing" and we could apply
it to all IRQ types.

> +vector->need_switch = true;
> +} else {
'''

>>  }
>>  
>>  

Re: [PATCH v3 01/15] target/ppc: add user read functions for MMCR0 and MMCR2

2021-09-06 Thread David Gibson
On Fri, Sep 03, 2021 at 05:31:02PM -0300, Daniel Henrique Barboza wrote:
> From: Gustavo Romero 
> 
> We're going to add PMU support for TCG PPC64 chips, based on IBM POWER8+
> emulation and following PowerISA v3.1.
> 
> Let's start by handling the user read of UMMCR0 and UMMCR2. According to
> PowerISA 3.1 these registers omit some of their bits from userspace.
> 
> CC: Gustavo Romero 
> Signed-off-by: Gustavo Romero 
> Signed-off-by: Daniel Henrique Barboza 

LGTM except for one nit...

[snip]
> +void spr_read_MMCR2_ureg(DisasContext *ctx, int gprn, int sprn)
> +{
> +TCGv t0 = tcg_temp_new();
> +
> +/*
> + * On read, filter out all bits that are not FCnP0 bits.
> + * When MMCR0[PMCC] is set to 0b10 or 0b11, providing
> + * problem state programs read/write access to MMCR2,
> + * only the FCnP0 bits can be accessed. All other bits are
> + * not changed when mtspr is executed in problem state, and
> + * all other bits return 0s when mfspr is executed in problem
> + * state, according to ISA v3.1, section 10.4.6 Monitor Mode
> + * Control Register 2, p. 1316, third paragraph.
> + */
> +gen_load_spr(t0, SPR_POWER_MMCR2);
> +tcg_gen_andi_tl(t0, t0, 0x402010080402UL);

A #define for this mask... and #defines with meaningful names for the
various bits it includes would be nice.

> +tcg_gen_mov_tl(cpu_gpr[gprn], t0);
> +
> +tcg_temp_free(t0);
> +}
> +
>  #if defined(TARGET_PPC64) && !defined(CONFIG_USER_ONLY)
>  void spr_write_ureg(DisasContext *ctx, int sprn, int gprn)
>  {



Re: [PATCH 1/5] vfio: use helper to simplify the failure path in vfio_msi_enable

2021-09-06 Thread Longpeng (Mike, Cloud Infrastructure Service Product Dept.)



On 2021/9/4 5:55, Alex Williamson wrote:
> On Wed, 25 Aug 2021 15:56:16 +0800
> "Longpeng(Mike)"  wrote:
> 
>> The main difference between the failure path in vfio_msi_enable and
>> vfio_msi_disable_common is whether INTx is re-enabled.
>>
>> Extend the vfio_msi_disable_common to provide a arg to decide
> 
> "an arg"
> 

Thanks a lot, I'll fix all of the grammatical errors and typos in this series
in the next version.

>> whether need to fallback, and then we can use this helper to
>> instead the redundant code in vfio_msi_enable.
> 
> Do you mean s/instead/replace/?
> 
>>
>> Signed-off-by: Longpeng(Mike) 
>> ---
>>  hw/vfio/pci.c | 34 --
>>  1 file changed, 12 insertions(+), 22 deletions(-)
>>
>> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
>> index e1ea1d8..7cc43fe 100644
>> --- a/hw/vfio/pci.c
>> +++ b/hw/vfio/pci.c
>> @@ -47,6 +47,7 @@
>>  
>>  static void vfio_disable_interrupts(VFIOPCIDevice *vdev);
>>  static void vfio_mmap_set_enabled(VFIOPCIDevice *vdev, bool enabled);
>> +static void vfio_msi_disable_common(VFIOPCIDevice *vdev, bool enable_intx);
>>  
>>  /*
>>   * Disabling BAR mmaping can be slow, but toggling it around INTx can
>> @@ -650,29 +651,17 @@ retry:
>>  if (ret) {
>>  if (ret < 0) {
>>  error_report("vfio: Error: Failed to setup MSI fds: %m");
>> -} else if (ret != vdev->nr_vectors) {
> 
> This part of the change is subtle and not mentioned in the commit log.
> It does seem unnecessary to test against this specific return value
> since any positive return is an error indicating the number of vectors
> we should retry with, but this change should be described in a separate
> patch.
> 
Ok, thanks, I'll split it into a separate patch in the next version.

>> +} else {
>>  error_report("vfio: Error: Failed to enable %d "
>>   "MSI vectors, retry with %d", vdev->nr_vectors, 
>> ret);
>>  }
>>  
>> -for (i = 0; i < vdev->nr_vectors; i++) {
>> -VFIOMSIVector *vector = >msi_vectors[i];
>> -if (vector->virq >= 0) {
>> -vfio_remove_kvm_msi_virq(vector);
>> -}
>> -qemu_set_fd_handler(event_notifier_get_fd(>interrupt),
>> -NULL, NULL, NULL);
>> -event_notifier_cleanup(>interrupt);
>> -}
>> -
>> -g_free(vdev->msi_vectors);
>> -vdev->msi_vectors = NULL;
>> +vfio_msi_disable_common(vdev, false);
>>  
>> -if (ret > 0 && ret != vdev->nr_vectors) {
>> +if (ret > 0) {
>>  vdev->nr_vectors = ret;
>>  goto retry;
>>  }
>> -vdev->nr_vectors = 0;
>>  
>>  /*
>>   * Failing to setup MSI doesn't really fall within any 
>> specification.
>> @@ -680,7 +669,6 @@ retry:
>>   * out to fall back to INTx for this device.
>>   */
>>  error_report("vfio: Error: Failed to enable MSI");
>> -vdev->interrupt = VFIO_INT_NONE;
>>  
>>  return;
>>  }
>> @@ -688,7 +676,7 @@ retry:
>>  trace_vfio_msi_enable(vdev->vbasedev.name, vdev->nr_vectors);
>>  }
>>  
>> -static void vfio_msi_disable_common(VFIOPCIDevice *vdev)
>> +static void vfio_msi_disable_common(VFIOPCIDevice *vdev, bool enable_intx)
> 
> I'd rather avoid these sort of modal options to functions where we can.
> Maybe this suggests instead that re-enabling INTx should be removed
> from the common helper and callers needing to do so should do it
> outside of the common helper.  Thanks,
> 
Ok, thanks.

> Alex
> 
> 
>>  {
>>  Error *err = NULL;
>>  int i;
>> @@ -710,9 +698,11 @@ static void vfio_msi_disable_common(VFIOPCIDevice *vdev)
>>  vdev->nr_vectors = 0;
>>  vdev->interrupt = VFIO_INT_NONE;
>>  
>> -vfio_intx_enable(vdev, );
>> -if (err) {
>> -error_reportf_err(err, VFIO_MSG_PREFIX, vdev->vbasedev.name);
>> +if (enable_intx) {
>> +vfio_intx_enable(vdev, );
>> +if (err) {
>> +error_reportf_err(err, VFIO_MSG_PREFIX, vdev->vbasedev.name);
>> +}
>>  }
>>  }
>>  
>> @@ -737,7 +727,7 @@ static void vfio_msix_disable(VFIOPCIDevice *vdev)
>>  vfio_disable_irqindex(>vbasedev, VFIO_PCI_MSIX_IRQ_INDEX);
>>  }
>>  
>> -vfio_msi_disable_common(vdev);
>> +vfio_msi_disable_common(vdev, true);
>>  
>>  memset(vdev->msix->pending, 0,
>> BITS_TO_LONGS(vdev->msix->entries) * sizeof(unsigned long));
>> @@ -748,7 +738,7 @@ static void vfio_msix_disable(VFIOPCIDevice *vdev)
>>  static void vfio_msi_disable(VFIOPCIDevice *vdev)
>>  {
>>  vfio_disable_irqindex(>vbasedev, VFIO_PCI_MSI_IRQ_INDEX);
>> -vfio_msi_disable_common(vdev);
>> +vfio_msi_disable_common(vdev, true);
>>  
>>  trace_vfio_msi_disable(vdev->vbasedev.name);
>>  }
> 
> 



Re: [PATCH v8 0/7] DEVICE_UNPLUG_GUEST_ERROR QAPI event

2021-09-06 Thread David Gibson
On Mon, Sep 06, 2021 at 09:47:48PM -0300, Daniel Henrique Barboza wrote:
> Hi,
> 
> This new version amends the QAPI doc in patch 5, as suggested
> by David and Markus, and adds all reviewed-by and acked-by
> tags.

I've staged this in the ppc-for-6.2 tree.  Obviously it has some stuff
that isn't purely ppc related, but I'm figuring that should be ok with
the acks from Igor and Markus.  Let me know if there are any
objections.



Re: [PATCH v4 2/5] spapr_numa.c: split FORM1 code into helpers

2021-09-06 Thread David Gibson
On Mon, Sep 06, 2021 at 09:50:36PM -0300, Daniel Henrique Barboza wrote:
> 
> 
> On 9/6/21 9:30 PM, David Gibson wrote:
> > On Fri, Aug 27, 2021 at 06:24:52AM -0300, Daniel Henrique Barboza wrote:
> > > The upcoming FORM2 NUMA affinity will support asymmetric NUMA topologies
> > > and doesn't need to be concerned with all the legacy support for older
> > > pseries FORM1 guests.
> > > 
> > > We're also not going to calculate associativity domains based on numa
> > > distance (via spapr_numa_define_associativity_domains) since the
> > > distances will be written directly into new DT properties.
> > > 
> > > Let's split FORM1 code into its own functions to allow for easier
> > > insertion of FORM2 logic later on.
> > > 
> > > Reviewed-by: Greg Kurz 
> > > Signed-off-by: Daniel Henrique Barboza 
> > > ---
> > >   hw/ppc/spapr_numa.c | 61 +
> > >   1 file changed, 39 insertions(+), 22 deletions(-)
> > > 
> > > diff --git a/hw/ppc/spapr_numa.c b/hw/ppc/spapr_numa.c
> > > index 779f18b994..04a86f9b5b 100644
> > > --- a/hw/ppc/spapr_numa.c
> > > +++ b/hw/ppc/spapr_numa.c
> > > @@ -155,6 +155,32 @@ static void 
> > > spapr_numa_define_associativity_domains(SpaprMachineState *spapr)
> > >   }
> > > +/*
> > > + * Set NUMA machine state data based on FORM1 affinity semantics.
> > > + */
> > > +static void spapr_numa_FORM1_affinity_init(SpaprMachineState *spapr,
> > > +   MachineState *machine)
> > > +{
> > > +bool using_legacy_numa = spapr_machine_using_legacy_numa(spapr);
> > > +
> > > +/*
> > > + * Legacy NUMA guests (pseries-5.1 and older, or guests with only
> > > + * 1 NUMA node) will not benefit from anything we're going to do
> > > + * after this point.
> > > + */
> > > +if (using_legacy_numa) {
> > > +return;
> > > +}
> > 
> > My only concern with this patch is that handling the
> > "using_legacy_numa" case might logically belong outside the FORM1
> > code.  I mean, I'm pretty sure using_legacy_numa implies FORM1 in
> > practice, but conceptually it seems like a more fundamental question
> > than the DT encoding of the NUMA information.
> 
> I'll carry on this suggestion for the next spin, v6, given that the v5 I sent
> a few minutes ago is also verifying legacy numa in FORM1 code.

Ok.  I should note that I'm not saying what you have now is definitely
wrong, it just looks a bit odd to me.  If you have a rationale for
doing it this way, go ahead and tell me, rather than changing it.



Re: [PATCH v5 2/4] spapr_numa.c: split FORM1 code into helpers

2021-09-06 Thread David Gibson
On Mon, Sep 06, 2021 at 09:25:25PM -0300, Daniel Henrique Barboza wrote:
> The upcoming FORM2 NUMA affinity will support asymmetric NUMA topologies
> and doesn't need to be concerned with all the legacy support for older
> pseries FORM1 guests.
> 
> We're also not going to calculate associativity domains based on numa
> distance (via spapr_numa_define_associativity_domains) since the
> distances will be written directly into new DT properties.
> 
> Let's split FORM1 code into its own functions to allow for easier
> insertion of FORM2 logic later on.
> 
> Reviewed-by: Greg Kurz 
> Signed-off-by: Daniel Henrique Barboza 
> ---
>  hw/ppc/spapr_numa.c | 61 +
>  1 file changed, 39 insertions(+), 22 deletions(-)
> 
> diff --git a/hw/ppc/spapr_numa.c b/hw/ppc/spapr_numa.c
> index 9ee4b479fe..84636cb86a 100644
> --- a/hw/ppc/spapr_numa.c
> +++ b/hw/ppc/spapr_numa.c
> @@ -155,6 +155,32 @@ static void 
> spapr_numa_define_associativity_domains(SpaprMachineState *spapr)
>  
>  }
>  
> +/*
> + * Set NUMA machine state data based on FORM1 affinity semantics.
> + */
> +static void spapr_numa_FORM1_affinity_init(SpaprMachineState *spapr,
> +   MachineState *machine)
> +{
> +bool using_legacy_numa = spapr_machine_using_legacy_numa(spapr);
> +
> +/*
> + * Legacy NUMA guests (pseries-5.1 and older, or guests with only
> + * 1 NUMA node) will not benefit from anything we're going to do
> + * after this point.
> + */
> +if (using_legacy_numa) {
> +return;
> +}

As noted on the previous version (sent moments before seeing the new
spin), I'm just slightly uncomfortable with the logic being

   if (form1) {
   if (!legacy) {
  
   }
   }

rather than

   if (!legacy) {
   if (form1) {
   
   }
   }

> +
> +if (!spapr_numa_is_symmetrical(machine)) {
> +error_report("Asymmetrical NUMA topologies aren't supported "
> + "in the pSeries machine");
> +exit(EXIT_FAILURE);
> +}
> +
> +spapr_numa_define_associativity_domains(spapr);
> +}
> +
>  void spapr_numa_associativity_reset(SpaprMachineState *spapr)
>  {
>  SpaprMachineClass *smc = SPAPR_MACHINE_GET_CLASS(spapr);
> @@ -210,22 +236,7 @@ void spapr_numa_associativity_reset(SpaprMachineState 
> *spapr)
>  spapr->numa_assoc_array[i][MAX_DISTANCE_REF_POINTS] = cpu_to_be32(i);
>  }
>  
> -/*
> - * Legacy NUMA guests (pseries-5.1 and older, or guests with only
> - * 1 NUMA node) will not benefit from anything we're going to do
> - * after this point.
> - */
> -if (using_legacy_numa) {
> -return;
> -}
> -
> -if (!spapr_numa_is_symmetrical(machine)) {
> -error_report("Asymmetrical NUMA topologies aren't supported "
> - "in the pSeries machine");
> -exit(EXIT_FAILURE);
> -}
> -
> -spapr_numa_define_associativity_domains(spapr);
> +spapr_numa_FORM1_affinity_init(spapr, machine);
>  }
>  
>  void spapr_numa_write_associativity_dt(SpaprMachineState *spapr, void *fdt,
> @@ -302,12 +313,8 @@ int 
> spapr_numa_write_assoc_lookup_arrays(SpaprMachineState *spapr, void *fdt,
>  return ret;
>  }
>  
> -/*
> - * Helper that writes ibm,associativity-reference-points and
> - * max-associativity-domains in the RTAS pointed by @rtas
> - * in the DT @fdt.
> - */
> -void spapr_numa_write_rtas_dt(SpaprMachineState *spapr, void *fdt, int rtas)
> +static void spapr_numa_FORM1_write_rtas_dt(SpaprMachineState *spapr,
> +   void *fdt, int rtas)
>  {
>  MachineState *ms = MACHINE(spapr);
>  SpaprMachineClass *smc = SPAPR_MACHINE_GET_CLASS(spapr);
> @@ -365,6 +372,16 @@ void spapr_numa_write_rtas_dt(SpaprMachineState *spapr, void *fdt, int rtas)
>   maxdomains, sizeof(maxdomains)));
>  }
>  
> +/*
> + * Helper that writes ibm,associativity-reference-points and
> + * max-associativity-domains in the RTAS pointed by @rtas
> + * in the DT @fdt.
> + */
> +void spapr_numa_write_rtas_dt(SpaprMachineState *spapr, void *fdt, int rtas)
> +{
> +spapr_numa_FORM1_write_rtas_dt(spapr, fdt, rtas);
> +}
> +
>  static target_ulong h_home_node_associativity(PowerPCCPU *cpu,
>SpaprMachineState *spapr,
>target_ulong opcode,

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson




Re: [PATCH v5 1/4] spapr: move NUMA associativity init to machine reset

2021-09-06 Thread David Gibson
On Mon, Sep 06, 2021 at 09:25:24PM -0300, Daniel Henrique Barboza wrote:
> At this moment we only support one form of NUMA affinity, FORM1. This
> allows us to init the internal structures during machine_init(), and
> given that NUMA distances won't change during the guest lifetime we
> don't need to bother with that again.
> 
> We're about to introduce FORM2, a new NUMA affinity mode for pSeries
> guests. This means that we'll only be certain about the affinity mode
> being used after client architecture support. This also means that the
> guest can switch affinity modes in machine reset.
> 
> Let's prepare the ground for the FORM2 support by moving the NUMA
> internal data init from machine_init() to machine_reset(). Change the
> name to spapr_numa_associativity_reset() to make it clearer that this is
> a function that can be called multiple times during the guest lifecycle.
> We're also simplifying its current API since this method will be called
> during CAS time (do_client_architecture_support()) later on and there's no
> MachineState pointer already resolved there.
> 
> Signed-off-by: Daniel Henrique Barboza 

Applied to ppc-for-6.2, thanks.

> ---
>  hw/ppc/spapr.c  | 6 +++---
>  hw/ppc/spapr_numa.c | 4 ++--
>  include/hw/ppc/spapr_numa.h | 9 +
>  3 files changed, 6 insertions(+), 13 deletions(-)
> 
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index d39fd4e644..8e1ff6cd10 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -1621,6 +1621,9 @@ static void spapr_machine_reset(MachineState *machine)
>   */
>  spapr_irq_reset(spapr, &error_fatal);
>  
> +/* Reset numa_assoc_array */
> +spapr_numa_associativity_reset(spapr);
> +
>  /*
>   * There is no CAS under qtest. Simulate one to please the code that
>   * depends on spapr->ov5_cas. This is especially needed to test device
> @@ -2808,9 +2811,6 @@ static void spapr_machine_init(MachineState *machine)
>  
>  spapr->gpu_numa_id = spapr_numa_initial_nvgpu_numa_id(machine);
>  
> -/* Init numa_assoc_array */
> -spapr_numa_associativity_init(spapr, machine);
> -
>  if ((!kvm_enabled() || kvmppc_has_cap_mmu_radix()) &&
>  ppc_type_check_compat(machine->cpu_type, CPU_POWERPC_LOGICAL_3_00, 0,
>spapr->max_compat_pvr)) {
> diff --git a/hw/ppc/spapr_numa.c b/hw/ppc/spapr_numa.c
> index 779f18b994..9ee4b479fe 100644
> --- a/hw/ppc/spapr_numa.c
> +++ b/hw/ppc/spapr_numa.c
> @@ -155,10 +155,10 @@ static void spapr_numa_define_associativity_domains(SpaprMachineState *spapr)
>  
>  }
>  
> -void spapr_numa_associativity_init(SpaprMachineState *spapr,
> -   MachineState *machine)
> +void spapr_numa_associativity_reset(SpaprMachineState *spapr)
>  {
>  SpaprMachineClass *smc = SPAPR_MACHINE_GET_CLASS(spapr);
> +MachineState *machine = MACHINE(spapr);
>  int nb_numa_nodes = machine->numa_state->num_nodes;
>  int i, j, max_nodes_with_gpus;
>  bool using_legacy_numa = spapr_machine_using_legacy_numa(spapr);
> diff --git a/include/hw/ppc/spapr_numa.h b/include/hw/ppc/spapr_numa.h
> index 6f9f02d3de..0e457bba57 100644
> --- a/include/hw/ppc/spapr_numa.h
> +++ b/include/hw/ppc/spapr_numa.h
> @@ -16,14 +16,7 @@
>  #include "hw/boards.h"
>  #include "hw/ppc/spapr.h"
>  
> -/*
> - * Having both SpaprMachineState and MachineState as arguments
> - * feels odd, but it will spare a MACHINE() call inside the
> - * function. spapr_machine_init() is the only caller for it, and
> - * it has both pointers resolved already.
> - */
> -void spapr_numa_associativity_init(SpaprMachineState *spapr,
> -   MachineState *machine);

Nice additional cleanup to the signature, thanks.

> +void spapr_numa_associativity_reset(SpaprMachineState *spapr);
>  void spapr_numa_write_rtas_dt(SpaprMachineState *spapr, void *fdt, int rtas);
>  void spapr_numa_write_associativity_dt(SpaprMachineState *spapr, void *fdt,
> int offset, int nodeid);



Re: [PATCH v5 4/4] spapr: move FORM1 verifications to do_client_architecture_support()

2021-09-06 Thread David Gibson
On Mon, Sep 06, 2021 at 09:25:27PM -0300, Daniel Henrique Barboza wrote:
> FORM2 NUMA affinity is prepared to deal with empty (memory/cpu less)
> NUMA nodes. This is used by the DAX KMEM driver to locate a PAPR SCM
> device that has a different latency than the original NUMA node from the
> regular memory. FORM2 is also able to deal with asymmetric NUMA
> distances gracefully, something that our FORM1 implementation doesn't
> do.
> 
> Move these FORM1 verifications to a new function and wait until after
> CAS, when we're sure that we're sticking with FORM1, to enforce them.
> 
> Signed-off-by: Daniel Henrique Barboza 
> ---
>  hw/ppc/spapr.c  | 33 -
>  hw/ppc/spapr_hcall.c|  6 +
>  hw/ppc/spapr_numa.c | 49 -
>  include/hw/ppc/spapr_numa.h |  1 +
>  4 files changed, 50 insertions(+), 39 deletions(-)
> 
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index 8d98e3b08a..c974c07fb8 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -2797,39 +2797,6 @@ static void spapr_machine_init(MachineState *machine)
>  /* init CPUs */
>  spapr_init_cpus(spapr);
>  
> -/*
> - * check we don't have a memory-less/cpu-less NUMA node
> - * Firmware relies on the existing memory/cpu topology to provide the
> - * NUMA topology to the kernel.
> - * And the linux kernel needs to know the NUMA topology at start
> - * to be able to hotplug CPUs later.
> - */
> -if (machine->numa_state->num_nodes) {
> -for (i = 0; i < machine->numa_state->num_nodes; ++i) {
> -/* check for memory-less node */
> -if (machine->numa_state->nodes[i].node_mem == 0) {
> -CPUState *cs;
> -int found = 0;
> -/* check for cpu-less node */
> -CPU_FOREACH(cs) {
> -PowerPCCPU *cpu = POWERPC_CPU(cs);
> -if (cpu->node_id == i) {
> -found = 1;
> -break;
> -}
> -}
> -/* memory-less and cpu-less node */
> -if (!found) {
> -error_report(
> -   "Memory-less/cpu-less nodes are not supported (node %d)",
> - i);
> -exit(1);
> -}
> -}
> -}
> -
> -}
> -
>  spapr->gpu_numa_id = spapr_numa_initial_nvgpu_numa_id(machine);
>  
>  if ((!kvm_enabled() || kvmppc_has_cap_mmu_radix()) &&
> diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
> index 7efbe93f4b..27ee713600 100644
> --- a/hw/ppc/spapr_hcall.c
> +++ b/hw/ppc/spapr_hcall.c
> @@ -1202,9 +1202,15 @@ target_ulong do_client_architecture_support(PowerPCCPU *cpu,
>   * If the guest chooses FORM2 we need to reset the associativity
>   * information - it is being defaulted to FORM1 during
>   * spapr_machine_reset().
> + *
> + * If we're sure that we'll be using FORM1, verify now if we have
> + * a configuration or condition that is not available for FORM1
> + * (namely asymmetric NUMA topologies and empty NUMA nodes).
>   */
>  if (spapr_ovec_test(spapr->ov5_cas, OV5_FORM2_AFFINITY)) {
>  spapr_numa_associativity_reset(spapr);
> +} else {
> +spapr_numa_check_FORM1_constraints(MACHINE(spapr));

Couldn't you put this call into one of the existing FORM1 functions?

>  }
>  
>  /*
> diff --git a/hw/ppc/spapr_numa.c b/hw/ppc/spapr_numa.c
> index ca276e16cb..0c57d03184 100644
> --- a/hw/ppc/spapr_numa.c
> +++ b/hw/ppc/spapr_numa.c
> @@ -155,6 +155,49 @@ static void spapr_numa_define_associativity_domains(SpaprMachineState *spapr)
>  
>  }
>  
> +void spapr_numa_check_FORM1_constraints(MachineState *machine)
> +{
> +int i;
> +
> +if (!spapr_numa_is_symmetrical(machine)) {
> +error_report("Asymmetrical NUMA topologies aren't supported "
> + "in the pSeries machine");

Error message needs an update since they are now possible with FORM2.

> +exit(EXIT_FAILURE);
> +}
> +
> +/*
> + * check we don't have a memory-less/cpu-less NUMA node
> + * Firmware relies on the existing memory/cpu topology to provide the
> + * NUMA topology to the kernel.
> + * And the linux kernel needs to know the NUMA topology at start
> + * to be able to hotplug CPUs later.
> + */
> +if (machine->numa_state->num_nodes) {
> +for (i = 0; i < machine->numa_state->num_nodes; ++i) {
> +/* check for memory-less node */
> +if (machine->numa_state->nodes[i].node_mem == 0) {
> +CPUState *cs;
> +int found = 0;
> +/* check for cpu-less node */
> +CPU_FOREACH(cs) {
> +PowerPCCPU *cpu = POWERPC_CPU(cs);
> +if (cpu->node_id == i) {
> +found = 1;

Re: [PATCH v5 3/4] spapr_numa.c: base FORM2 NUMA affinity support

2021-09-06 Thread David Gibson
On Mon, Sep 06, 2021 at 09:25:26PM -0300, Daniel Henrique Barboza wrote:
> The main feature of FORM2 affinity support is the separation of NUMA
> distances from ibm,associativity information. This allows for a more
> flexible and straightforward NUMA distance assignment without relying on
> complex associations between several levels of NUMA via
> ibm,associativity matches. Another feature is its extensibility. This base
> support contains the facilities for NUMA distance assignment, but in the
> future more facilities will be added for latency, performance, bandwidth
> and so on.
> 
> This patch implements the base FORM2 affinity support as follows:
> 
> - the use of FORM2 associativity is indicated by using bit 2 of byte 5
> of ibm,architecture-vec-5. A FORM2 aware guest can choose to use FORM1
> or FORM2 affinity. Setting both forms will default to FORM2. We're not
> advertising FORM2 for pseries-6.1 and older machine versions to prevent
> guest visible changes in those;
> 
> - call spapr_numa_associativity_reset() in do_client_architecture_support()
> if FORM2 is chosen. This will avoid re-initializing FORM1 artifacts that
> were already initialized in spapr_machine_reset();
> 
> - ibm,associativity-reference-points has a new semantic. Instead of
> being used to calculate distances via NUMA levels, it's now used to
> indicate the primary domain index in the ibm,associativity domain of
> each resource. In our case it's set to {0x4}, matching the position
> where we already place logical_domain_id;

Hmm... I'm a bit torn on this.  The whole reason the ibm,associativity
things are arrays rather than just numbers was to enable the FORM1
nonsense. So we have a choice here: keep the associativity arrays in
the same form, for simplicity of the code, or reduce the associativity
arrays to one entry for FORM2, to simplify the overall DT contents in
the "modern" case.

> - two new RTAS DT artifacts are introduced: ibm,numa-lookup-index-table

This doesn't really have anything to do with RTAS.

> and ibm,numa-distance-table. The index table is used to list all the
> NUMA logical domains of the platform, in ascending order, and allows for
> spartial NUMA configurations (although QEMU ATM doesn't support
> that).

s/spartial/partial/

Perhaps more to the point, it allows for sparsely allocated domain
IDs.

> ibm,numa-distance-table is an array that contains all the distances from
> the first NUMA node to all other nodes, then the second NUMA node
> distances to all other nodes and so on;
> 
> - spapr_post_load changes: since we're adding a new NUMA affinity that
> isn't compatible with the existing one, migration must be handled
> accordingly because we can't be certain of whether the guest went
> through CAS in the source. The solution chosen is to initialize the NUMA
> associativity data in spapr_post_load() unconditionally. The worst case
> would be to write the DT twice if the guest is in pre-CAS stage.
> Otherwise, we're making sure that a FORM1 guest will have the
> spapr->numa_assoc_array initialized with the proper information based on
> user distance, something that we're not doing with FORM2.
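The distance-table layout described above (all distances from the first domain, then from the second, and so on), together with an ascending lookup index table, lets a consumer find the distance between two domains even when their IDs are sparse. A sketch of that lookup, assuming hypothetical `lookup_index()` and `numa_distance()` helpers and ignoring how the two properties are actually encoded in the device tree:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Position of `domain` in the ascending lookup index table, or -1. */
static int lookup_index(const uint32_t *index_table, size_t nb_domains,
                        uint32_t domain)
{
    assert(index_table != NULL);
    for (size_t i = 0; i < nb_domains; i++) {
        if (index_table[i] == domain) {
            return (int)i;
        }
    }
    return -1;
}

/*
 * Distance between two NUMA domains, read from a row-major table that
 * lists all distances from the first domain, then from the second, etc.
 * Returns -1 if either domain is not listed in the index table.
 */
static int numa_distance(const uint32_t *index_table, size_t nb_domains,
                         const uint8_t *distance_table,
                         uint32_t from, uint32_t to)
{
    int i = lookup_index(index_table, nb_domains, from);
    int j = lookup_index(index_table, nb_domains, to);

    if (i < 0 || j < 0) {
        return -1;
    }
    return distance_table[(size_t)i * nb_domains + (size_t)j];
}
```

Because the table is indexed by position in the index table rather than by domain ID, domain IDs such as 0, 4 and 9 need only a 3x3 table.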
> 
> Signed-off-by: Daniel Henrique Barboza 
> ---
>  hw/ppc/spapr.c  |  24 +++
>  hw/ppc/spapr_hcall.c|  10 +++
>  hw/ppc/spapr_numa.c | 135 +++-
>  include/hw/ppc/spapr.h  |   1 +
>  include/hw/ppc/spapr_ovec.h |   1 +
>  5 files changed, 170 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index 8e1ff6cd10..8d98e3b08a 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -1789,6 +1789,22 @@ static int spapr_post_load(void *opaque, int version_id)
>  return err;
>  }
>  
> +/*
> + * NUMA data init is made in CAS time. There is no reliable
> + * way of telling whether the guest already went through CAS
> + * in the source due to how spapr_ov5_cas_needed works: a
> + * FORM1 guest can be migrated with ov5_cas empty regardless
> + * of going through CAS first.
> + *
> + * One solution is to always call numa_associativity_reset. The
> + * downside is that a guest migrated before CAS will run
> + * numa_associativity_reset again when going through it, but
> + * at least we're making sure spapr->numa_assoc_array will be
> + * initialized and hotplug operations won't fail in both before
> + * and after CAS migration cases.
> + */
> + spapr_numa_associativity_reset(spapr);
> +
>  return err;
>  }
>  
> @@ -2755,6 +2771,11 @@ static void spapr_machine_init(MachineState *machine)
>  
>  spapr_ovec_set(spapr->ov5, OV5_FORM1_AFFINITY);
>  
> +/* Do not advertise FORM2 support for pseries-6.1 and older */
> +if (!smc->pre_6_2_numa_affinity) {
> +spapr_ovec_set(spapr->ov5, OV5_FORM2_AFFINITY);
> +}
> +
>  /* advertise support for dedicated HP event source to guests */
>  if (spapr->use_hotplug_event_source) {
>  

[PATCH v8 7/7] memory_hotplug.c: send DEVICE_UNPLUG_GUEST_ERROR in acpi_memory_hotplug_write()

2021-09-06 Thread Daniel Henrique Barboza
MEM_UNPLUG_ERROR is deprecated since the introduction of
DEVICE_UNPLUG_GUEST_ERROR. Keep emitting both while the deprecation of
MEM_UNPLUG_ERROR is pending.

CC: Michael S. Tsirkin 
CC: Igor Mammedov 
Acked-by: Michael S. Tsirkin 
Reviewed-by: Greg Kurz 
Reviewed-by: David Gibson 
Reviewed-by: Igor Mammedov 
Reviewed-by: Markus Armbruster 
Signed-off-by: Daniel Henrique Barboza 
---
 hw/acpi/memory_hotplug.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/hw/acpi/memory_hotplug.c b/hw/acpi/memory_hotplug.c
index 6a71de408b..d0fffcf787 100644
--- a/hw/acpi/memory_hotplug.c
+++ b/hw/acpi/memory_hotplug.c
@@ -8,6 +8,7 @@
 #include "qapi/error.h"
 #include "qapi/qapi-events-acpi.h"
 #include "qapi/qapi-events-machine.h"
+#include "qapi/qapi-events-qdev.h"
 
 #define MEMORY_SLOTS_NUMBER  "MDNR"
 #define MEMORY_HOTPLUG_IO_REGION "HPMR"
@@ -178,8 +179,16 @@ static void acpi_memory_hotplug_write(void *opaque, hwaddr addr, uint64_t data,
 hotplug_handler_unplug(hotplug_ctrl, dev, &local_err);
 if (local_err) {
 trace_mhp_acpi_pc_dimm_delete_failed(mem_st->selector);
+
+/*
+ * Send both MEM_UNPLUG_ERROR and DEVICE_UNPLUG_GUEST_ERROR
+ * while the deprecation of MEM_UNPLUG_ERROR is
+ * pending.
+ */
 qapi_event_send_mem_unplug_error(dev->id ? : "",
  error_get_pretty(local_err));
+qapi_event_send_device_unplug_guest_error(!!dev->id, dev->id,
+  dev->canonical_path);
 error_free(local_err);
 break;
 }
-- 
2.31.1
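While MEM_UNPLUG_ERROR is deprecated, the patch above sends both events from the same failure path. A minimal sketch of that pattern, with hypothetical counting stand-ins for the QAPI-generated senders (`send_mem_unplug_error()`, `send_device_unplug_guest_error()` and `report_mem_unplug_failure()` are illustrative names only):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical event sinks; they only record what would be emitted. */
static int legacy_events;       /* MEM_UNPLUG_ERROR count */
static int generic_events;      /* DEVICE_UNPLUG_GUEST_ERROR count */
static bool last_had_device;    /* was "device" present last time? */

static void send_mem_unplug_error(const char *device, const char *msg)
{
    (void)msg;
    assert(device != NULL);     /* legacy event always carries a string */
    legacy_events++;
}

static void send_device_unplug_guest_error(bool has_device,
                                           const char *device,
                                           const char *path)
{
    (void)device;
    assert(path != NULL);       /* the QOM path is mandatory */
    last_had_device = has_device;
    generic_events++;
}

/*
 * Guest-reported unplug failure: keep sending the deprecated event
 * (with "" for anonymous devices) alongside the generic one.
 */
static void report_mem_unplug_failure(const char *id, const char *qom_path,
                                      const char *msg)
{
    send_mem_unplug_error(id ? id : "", msg);
    send_device_unplug_guest_error(id != NULL, id, qom_path);
}
```

Here `id != NULL` plays the same role as `!!dev->id` in the patch.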




[PATCH v8 5/7] qapi/qdev.json: add DEVICE_UNPLUG_GUEST_ERROR QAPI event

2021-09-06 Thread Daniel Henrique Barboza
At this moment we only provide one event to report a hotunplug error,
MEM_UNPLUG_ERROR. As of Linux kernel 5.12 and QEMU 6.0.0, the pseries
machine is now able to report unplug errors for other device types, such
as CPUs.

Instead of creating a (device_type)_UNPLUG_ERROR for each new device,
create a generic DEVICE_UNPLUG_GUEST_ERROR event that can be used by all
guest side unplug errors in the future. This event has a similar API to
the existing DEVICE_DELETED event, always providing the QOM path of the
device and dev->id if there's any.

With this new generic event, MEM_UNPLUG_ERROR is now marked as deprecated.

Reviewed-by: David Gibson 
Reviewed-by: Greg Kurz 
Reviewed-by: Markus Armbruster 
Signed-off-by: Daniel Henrique Barboza 
---
 docs/about/deprecated.rst | 10 ++
 qapi/machine.json |  7 ++-
 qapi/qdev.json| 27 ++-
 stubs/qdev.c  |  7 +++
 4 files changed, 49 insertions(+), 2 deletions(-)

diff --git a/docs/about/deprecated.rst b/docs/about/deprecated.rst
index 6d438f1c8d..1a8ffc9381 100644
--- a/docs/about/deprecated.rst
+++ b/docs/about/deprecated.rst
@@ -204,6 +204,16 @@ The ``I7200`` guest CPU relies on the nanoMIPS ISA, which is deprecated
 (the ISA has never been upstreamed to a compiler toolchain). Therefore
 this CPU is also deprecated.
 
+
+QEMU API (QAPI) events
+----------------------
+
+``MEM_UNPLUG_ERROR`` (since 6.2)
+''''''''''''''''''''''''''''''''
+
+Use the more generic event ``DEVICE_UNPLUG_GUEST_ERROR`` instead.
+
+
 System emulator machines
 ------------------------
 
diff --git a/qapi/machine.json b/qapi/machine.json
index 157712f006..cd397f1ee4 100644
--- a/qapi/machine.json
+++ b/qapi/machine.json
@@ -1271,6 +1271,10 @@
 #
 # @msg: Informative message
 #
+# Features:
+# @deprecated: This event is deprecated. Use @DEVICE_UNPLUG_GUEST_ERROR
+#  instead.
+#
 # Since: 2.4
 #
 # Example:
@@ -1283,7 +1287,8 @@
 #
 ##
 { 'event': 'MEM_UNPLUG_ERROR',
-  'data': { 'device': 'str', 'msg': 'str' } }
+  'data': { 'device': 'str', 'msg': 'str' },
+  'features': ['deprecated'] }
 
 ##
 # @SMPConfiguration:
diff --git a/qapi/qdev.json b/qapi/qdev.json
index 0e9cb2ae88..d75e68908b 100644
--- a/qapi/qdev.json
+++ b/qapi/qdev.json
@@ -84,7 +84,9 @@
 #This command merely requests that the guest begin the hot removal
 #process.  Completion of the device removal process is signaled with a
 #DEVICE_DELETED event. Guest reset will automatically complete removal
-#for all devices.
+#for all devices.  If a guest-side error in the hot removal process is
+#detected, the device will not be removed and a DEVICE_UNPLUG_GUEST_ERROR
+#event is sent.  Some errors cannot be detected.
 #
 # Since: 0.14
 #
@@ -124,3 +126,26 @@
 ##
 { 'event': 'DEVICE_DELETED',
   'data': { '*device': 'str', 'path': 'str' } }
+
+##
+# @DEVICE_UNPLUG_GUEST_ERROR:
+#
+# Emitted when a device hot unplug fails due to a guest reported error.
+#
+# @device: the device's ID if it has one
+#
+# @path: the device's QOM path
+#
+# Since: 6.2
+#
+# Example:
+#
+# <- { "event": "DEVICE_UNPLUG_GUEST_ERROR",
+#      "data": { "device": "core1",
+#                "path": "/machine/peripheral/core1" },
+#      "timestamp": { "seconds": 1615570772,
+#                     "microseconds": 202844 } }
+#
+##
+{ 'event': 'DEVICE_UNPLUG_GUEST_ERROR',
+  'data': { '*device': 'str', 'path': 'str' } }
diff --git a/stubs/qdev.c b/stubs/qdev.c
index 92e6143134..28d6d531e6 100644
--- a/stubs/qdev.c
+++ b/stubs/qdev.c
@@ -21,3 +21,10 @@ void qapi_event_send_device_deleted(bool has_device,
 {
 /* Nothing to do. */
 }
+
+void qapi_event_send_device_unplug_guest_error(bool has_device,
+   const char *device,
+   const char *path)
+{
+/* Nothing to do. */
+}
-- 
2.31.1
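Like DEVICE_DELETED, the new event always carries `path`, while `device` is optional and only present when the device has an ID. A sketch of what that means for the event's `data` object, using a hypothetical `format_unplug_guest_error_data()` helper rather than the QAPI-generated sender:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: formats the "data" member of a
 * DEVICE_UNPLUG_GUEST_ERROR-style event. "device" is emitted only when
 * has_device is true; "path" (the QOM path) is always emitted. No JSON
 * escaping is done, so inputs must be plain identifiers and paths.
 * Returns what snprintf() returns.
 */
static int format_unplug_guest_error_data(char *buf, size_t len,
                                          bool has_device,
                                          const char *device,
                                          const char *path)
{
    assert(path != NULL && strlen(path) > 0);  /* QOM path is mandatory */

    if (has_device) {
        return snprintf(buf, len,
                        "{ \"device\": \"%s\", \"path\": \"%s\" }",
                        device, path);
    }
    return snprintf(buf, len, "{ \"path\": \"%s\" }", path);
}
```

Called with `true`, `"core1"` and `"/machine/peripheral/core1"`, it produces the same `data` object as the schema example; called with `!!id` for a NULL `id`, the `device` member is simply left out.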




[PATCH v8 6/7] spapr: use DEVICE_UNPLUG_GUEST_ERROR to report unplug errors

2021-09-06 Thread Daniel Henrique Barboza
Linux Kernel 5.12 is now unisolating CPU DRCs in the device_removal
error path, signalling that the hotunplug process wasn't successful.
This allows us to send a DEVICE_UNPLUG_GUEST_ERROR in drc_unisolate_logical()
to signal this error to the management layer.

We also have another error path in spapr_memory_unplug_rollback() for
configured LMB DRCs. Kernels older than 5.13 will not unisolate the LMBs
in the hotunplug error path, but they will reconfigure them. Let's send
the DEVICE_UNPLUG_GUEST_ERROR event in that code path as well to cover the
case of older kernels.

Acked-by: David Gibson 
Reviewed-by: Greg Kurz 
Reviewed-by: Markus Armbruster 
Signed-off-by: Daniel Henrique Barboza 
---
 hw/ppc/spapr.c | 10 +-
 hw/ppc/spapr_drc.c |  9 +
 2 files changed, 14 insertions(+), 5 deletions(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 4f1ee90e9e..206c536d3a 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -29,6 +29,7 @@
 #include "qemu/datadir.h"
 #include "qapi/error.h"
 #include "qapi/qapi-events-machine.h"
+#include "qapi/qapi-events-qdev.h"
 #include "qapi/visitor.h"
 #include "sysemu/sysemu.h"
 #include "sysemu/hostmem.h"
@@ -3686,11 +3687,18 @@ void spapr_memory_unplug_rollback(SpaprMachineState *spapr, DeviceState *dev)
 
 /*
  * Tell QAPI that something happened and the memory
- * hotunplug wasn't successful.
+ * hotunplug wasn't successful. Keep sending
+ * MEM_UNPLUG_ERROR even while sending
+ * DEVICE_UNPLUG_GUEST_ERROR until the deprecation of
+ * MEM_UNPLUG_ERROR is due.
  */
 qapi_error = g_strdup_printf("Memory hotunplug rejected by the guest "
  "for device %s", dev->id);
+
 qapi_event_send_mem_unplug_error(dev->id ? : "", qapi_error);
+
+qapi_event_send_device_unplug_guest_error(!!dev->id, dev->id,
+  dev->canonical_path);
 }
 
 /* Callback to be called during DRC release. */
diff --git a/hw/ppc/spapr_drc.c b/hw/ppc/spapr_drc.c
index a4d9496f76..f8ac0a10df 100644
--- a/hw/ppc/spapr_drc.c
+++ b/hw/ppc/spapr_drc.c
@@ -17,6 +17,8 @@
 #include "hw/ppc/spapr_drc.h"
 #include "qom/object.h"
 #include "migration/vmstate.h"
+#include "qapi/error.h"
+#include "qapi/qapi-events-qdev.h"
 #include "qapi/visitor.h"
 #include "qemu/error-report.h"
 #include "hw/ppc/spapr.h" /* for RTAS return codes */
@@ -173,10 +175,9 @@ static uint32_t drc_unisolate_logical(SpaprDrc *drc)
  "for device %s", drc->dev->id);
 }
 
-/*
- * TODO: send a QAPI DEVICE_UNPLUG_ERROR event when
- * it is implemented.
- */
+qapi_event_send_device_unplug_guest_error(!!drc->dev->id,
+  drc->dev->id,
+  drc->dev->canonical_path);
 }
 
 return RTAS_OUT_SUCCESS; /* Nothing to do */
-- 
2.31.1




[PATCH v8 2/7] spapr.c: handle dev->id in spapr_memory_unplug_rollback()

2021-09-06 Thread Daniel Henrique Barboza
As done in hw/acpi/memory_hotplug.c, pass an empty string if dev->id
is NULL to qapi_event_send_mem_unplug_error() to avoid relying on
a behavior that can be changed in the future.

Suggested-by: Markus Armbruster 
Reviewed-by: Greg Kurz 
Reviewed-by: David Gibson 
Reviewed-by: Markus Armbruster 
Signed-off-by: Daniel Henrique Barboza 
---
 hw/ppc/spapr.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 81699d4f8b..4f1ee90e9e 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -3690,7 +3690,7 @@ void spapr_memory_unplug_rollback(SpaprMachineState *spapr, DeviceState *dev)
  */
 qapi_error = g_strdup_printf("Memory hotunplug rejected by the guest "
  "for device %s", dev->id);
-qapi_event_send_mem_unplug_error(dev->id, qapi_error);
+qapi_event_send_mem_unplug_error(dev->id ? : "", qapi_error);
 }
 
 /* Callback to be called during DRC release. */
-- 
2.31.1




[PATCH v8 3/7] spapr_drc.c: do not error_report() when drc->dev->id == NULL

2021-09-06 Thread Daniel Henrique Barboza
The error_report() call in drc_unisolate_logical() does not consider
that drc->dev->id can be NULL, and the underlying functions error_report()
calls to do its job (vprintf(), g_strdup_printf() ...) have undefined
behavior when handling "%s" with NULL arguments.

Besides, there is no use in reporting that an unknown device was
rejected by the guest.

Acked-by: David Gibson 
Reviewed-by: Greg Kurz 
Reviewed-by: Markus Armbruster 
Signed-off-by: Daniel Henrique Barboza 
---
 hw/ppc/spapr_drc.c | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/hw/ppc/spapr_drc.c b/hw/ppc/spapr_drc.c
index a2f2634601..a4d9496f76 100644
--- a/hw/ppc/spapr_drc.c
+++ b/hw/ppc/spapr_drc.c
@@ -167,8 +167,11 @@ static uint32_t drc_unisolate_logical(SpaprDrc *drc)
 }
 
 drc->unplug_requested = false;
-error_report("Device hotunplug rejected by the guest "
- "for device %s", drc->dev->id);
+
+if (drc->dev->id) {
+error_report("Device hotunplug rejected by the guest "
+ "for device %s", drc->dev->id);
+}
 
 /*
  * TODO: send a QAPI DEVICE_UNPLUG_ERROR event when
-- 
2.31.1




[PATCH v8 1/7] memory_hotplug.c: handle dev->id = NULL in acpi_memory_hotplug_write()

2021-09-06 Thread Daniel Henrique Barboza
qapi_event_send_mem_unplug_error() deals with @device being NULL by
replacing it with an empty string ("") when emitting the event. Aside
from the fact that this behavior (qapi visitor mapping NULL pointer to
"") can be patched/changed someday, such an event is also of little use
to listeners: all it says is "a memory unplug error happened
somewhere".

In theory we should just avoid emitting this event at all if dev->id is
NULL, but this would be an incompatible change to existing guests.
Instead, let's make the aforementioned behavior explicit: if dev->id is
NULL, pass an empty string to qapi_event_send_mem_unplug_error().

Suggested-by: Markus Armbruster 
Reviewed-by: Igor Mammedov 
Reviewed-by: Greg Kurz 
Reviewed-by: David Gibson 
Reviewed-by: Markus Armbruster 
Signed-off-by: Daniel Henrique Barboza 
---
 hw/acpi/memory_hotplug.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/acpi/memory_hotplug.c b/hw/acpi/memory_hotplug.c
index af37889423..6a71de408b 100644
--- a/hw/acpi/memory_hotplug.c
+++ b/hw/acpi/memory_hotplug.c
@@ -178,7 +178,7 @@ static void acpi_memory_hotplug_write(void *opaque, hwaddr addr, uint64_t data,
 hotplug_handler_unplug(hotplug_ctrl, dev, &local_err);
 if (local_err) {
 trace_mhp_acpi_pc_dimm_delete_failed(mem_st->selector);
-qapi_event_send_mem_unplug_error(dev->id,
+qapi_event_send_mem_unplug_error(dev->id ? : "",
  error_get_pretty(local_err));
 error_free(local_err);
 break;
-- 
2.31.1
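Patches 1/7 to 3/7 handle a NULL `dev->id` in two ways: substitute an empty string where the legacy event must still be sent, or skip the message entirely where it would only say that an unknown device was rejected. A sketch of both, with hypothetical helper names; since `x ? : y` (omitted middle operand) is a GNU C extension, the sketch spells it out portably.

```c
#include <assert.h>
#include <stdio.h>

/* Portable spelling of the patch's `dev->id ? : ""`. */
static const char *id_or_empty(const char *id)
{
    return id ? id : "";
}

/*
 * Hypothetical helper mirroring the guard in drc_unisolate_logical():
 * passing NULL to a "%s" conversion is undefined behavior, and a message
 * about an anonymous device is not useful anyway, so skip it.
 * Returns the number of characters snprintf() would write, or 0 if skipped.
 */
static int format_rejection(char *buf, size_t len, const char *id)
{
    assert(buf != NULL && len > 0);
    if (!id) {
        return 0;
    }
    return snprintf(buf, len,
                    "Device hotunplug rejected by the guest for device %s",
                    id);
}
```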




Re: [PATCH v4 2/5] spapr_numa.c: split FORM1 code into helpers

2021-09-06 Thread Daniel Henrique Barboza
On 9/6/21 9:30 PM, David Gibson wrote:

On Fri, Aug 27, 2021 at 06:24:52AM -0300, Daniel Henrique Barboza wrote:

The upcoming FORM2 NUMA affinity will support asymmetric NUMA topologies
and doesn't need to be concerned with all the legacy support for older
pseries FORM1 guests.

We're also not going to calculate associativity domains based on numa
distance (via spapr_numa_define_associativity_domains) since the
distances will be written directly into new DT properties.

Let's split FORM1 code into its own functions to allow for easier
insertion of FORM2 logic later on.

Reviewed-by: Greg Kurz 
Signed-off-by: Daniel Henrique Barboza 
---
  hw/ppc/spapr_numa.c | 61 +
  1 file changed, 39 insertions(+), 22 deletions(-)

diff --git a/hw/ppc/spapr_numa.c b/hw/ppc/spapr_numa.c
index 779f18b994..04a86f9b5b 100644
--- a/hw/ppc/spapr_numa.c
+++ b/hw/ppc/spapr_numa.c
@@ -155,6 +155,32 @@ static void spapr_numa_define_associativity_domains(SpaprMachineState *spapr)
  
  }
  
+/*
+ * Set NUMA machine state data based on FORM1 affinity semantics.
+ */
+static void spapr_numa_FORM1_affinity_init(SpaprMachineState *spapr,
+   MachineState *machine)
+{
+bool using_legacy_numa = spapr_machine_using_legacy_numa(spapr);
+
+/*
+ * Legacy NUMA guests (pseries-5.1 and older, or guests with only
+ * 1 NUMA node) will not benefit from anything we're going to do
+ * after this point.
+ */
+if (using_legacy_numa) {
+return;
+}


My only concern with this patch is that handling the
"using_legacy_numa" case might logically belong outside the FORM1
code.  I mean, I'm pretty sure using_legacy_numa implies FORM1 in
practice, but conceptually it seems like a more fundamental question
than the DT encoding of the NUMA information.

I'll carry this suggestion over to the next spin, v6, given that the v5 I sent
a few minutes ago also checks for legacy NUMA in the FORM1 code.

Thanks,

Daniel

+
+if (!spapr_numa_is_symmetrical(machine)) {
+error_report("Asymmetrical NUMA topologies aren't supported "
+ "in the pSeries machine");
+exit(EXIT_FAILURE);
+}
+
+spapr_numa_define_associativity_domains(spapr);
+}
+
  void spapr_numa_associativity_init(SpaprMachineState *spapr,
 MachineState *machine)
  {
@@ -210,22 +236,7 @@ void spapr_numa_associativity_init(SpaprMachineState *spapr,
  spapr->numa_assoc_array[i][MAX_DISTANCE_REF_POINTS] = cpu_to_be32(i);
  }
  
-/*
- * Legacy NUMA guests (pseries-5.1 and older, or guests with only
- * 1 NUMA node) will not benefit from anything we're going to do
- * after this point.
- */
-if (using_legacy_numa) {
-return;
-}
-
-if (!spapr_numa_is_symmetrical(machine)) {
-error_report("Asymmetrical NUMA topologies aren't supported "
- "in the pSeries machine");
-exit(EXIT_FAILURE);
-}
-
-spapr_numa_define_associativity_domains(spapr);
+spapr_numa_FORM1_affinity_init(spapr, machine);
  }
  
  void spapr_numa_write_associativity_dt(SpaprMachineState *spapr, void *fdt,
@@ -302,12 +313,8 @@ int spapr_numa_write_assoc_lookup_arrays(SpaprMachineState *spapr, void *fdt,
  return ret;
  }
  
-/*
- * Helper that writes ibm,associativity-reference-points and
- * max-associativity-domains in the RTAS pointed by @rtas
- * in the DT @fdt.
- */
-void spapr_numa_write_rtas_dt(SpaprMachineState *spapr, void *fdt, int rtas)
+static void spapr_numa_FORM1_write_rtas_dt(SpaprMachineState *spapr,
+   void *fdt, int rtas)
  {
  MachineState *ms = MACHINE(spapr);
  SpaprMachineClass *smc = SPAPR_MACHINE_GET_CLASS(spapr);
@@ -365,6 +372,16 @@ void spapr_numa_write_rtas_dt(SpaprMachineState *spapr, void *fdt, int rtas)
   maxdomains, sizeof(maxdomains)));
  }
  
+/*
+ * Helper that writes ibm,associativity-reference-points and
+ * max-associativity-domains in the RTAS pointed by @rtas
+ * in the DT @fdt.
+ */
+void spapr_numa_write_rtas_dt(SpaprMachineState *spapr, void *fdt, int rtas)
+{
+spapr_numa_FORM1_write_rtas_dt(spapr, fdt, rtas);
+}
+
  static target_ulong h_home_node_associativity(PowerPCCPU *cpu,
SpaprMachineState *spapr,
target_ulong opcode,






[PATCH v8 4/7] qapi/qdev.json: fix DEVICE_DELETED parameters doc

2021-09-06 Thread Daniel Henrique Barboza
Clarify that @device is optional and that 'path' is the device
path from QOM.

This change follows Markus' suggestion verbatim, provided in full
context here:

https://lists.gnu.org/archive/html/qemu-devel/2021-07/msg01891.html

Suggested-by: Markus Armbruster 
Reviewed-by: Greg Kurz 
Reviewed-by: Markus Armbruster 
Reviewed-by: David Gibson 
Signed-off-by: Daniel Henrique Barboza 
---
 qapi/qdev.json | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/qapi/qdev.json b/qapi/qdev.json
index b83178220b..0e9cb2ae88 100644
--- a/qapi/qdev.json
+++ b/qapi/qdev.json
@@ -108,9 +108,9 @@
 # At this point, it's safe to reuse the specified device ID. Device removal can
 # be initiated by the guest or by HMP/QMP commands.
 #
-# @device: device name
+# @device: the device's ID if it has one
 #
-# @path: device path
+# @path: the device's QOM path
 #
 # Since: 1.5
 #
-- 
2.31.1




[PATCH v8 0/7] DEVICE_UNPLUG_GUEST_ERROR QAPI event

2021-09-06 Thread Daniel Henrique Barboza
Hi,

This new version amends the QAPI doc in patch 5, as suggested
by David and Markus, and adds all Reviewed-by and Acked-by
tags.

changes from v7:
- patch 5:
  * s/internal guest/guest reported/
- v7 link: https://lists.gnu.org/archive/html/qemu-devel/2021-08/msg04115.html

Daniel Henrique Barboza (7):
  memory_hotplug.c: handle dev->id = NULL in acpi_memory_hotplug_write()
  spapr.c: handle dev->id in spapr_memory_unplug_rollback()
  spapr_drc.c: do not error_report() when drc->dev->id == NULL
  qapi/qdev.json: fix DEVICE_DELETED parameters doc
  qapi/qdev.json: add DEVICE_UNPLUG_GUEST_ERROR QAPI event
  spapr: use DEVICE_UNPLUG_GUEST_ERROR to report unplug errors
  memory_hotplug.c: send DEVICE_UNPLUG_GUEST_ERROR in
acpi_memory_hotplug_write()

 docs/about/deprecated.rst | 10 ++
 hw/acpi/memory_hotplug.c  | 11 ++-
 hw/ppc/spapr.c            | 12 ++--
 hw/ppc/spapr_drc.c        | 16 ++--
 qapi/machine.json         |  7 ++-
 qapi/qdev.json            | 31 ---
 stubs/qdev.c              |  7 +++
 7 files changed, 81 insertions(+), 13 deletions(-)

-- 
2.31.1




Re: [PATCH v4 2/5] spapr_numa.c: split FORM1 code into helpers

2021-09-06 Thread David Gibson
On Fri, Aug 27, 2021 at 06:24:52AM -0300, Daniel Henrique Barboza wrote:
> The upcoming FORM2 NUMA affinity will support asymmetric NUMA topologies
> and doesn't need to be concerned with all the legacy support for older
> pseries FORM1 guests.
> 
> We're also not going to calculate associativity domains based on numa
> distance (via spapr_numa_define_associativity_domains) since the
> distances will be written directly into new DT properties.
> 
> Let's split FORM1 code into its own functions to allow for easier
> insertion of FORM2 logic later on.
> 
> Reviewed-by: Greg Kurz 
> Signed-off-by: Daniel Henrique Barboza 
> ---
>  hw/ppc/spapr_numa.c | 61 +
>  1 file changed, 39 insertions(+), 22 deletions(-)
> 
> diff --git a/hw/ppc/spapr_numa.c b/hw/ppc/spapr_numa.c
> index 779f18b994..04a86f9b5b 100644
> --- a/hw/ppc/spapr_numa.c
> +++ b/hw/ppc/spapr_numa.c
> @@ -155,6 +155,32 @@ static void spapr_numa_define_associativity_domains(SpaprMachineState *spapr)
>  
>  }
>  
> +/*
> + * Set NUMA machine state data based on FORM1 affinity semantics.
> + */
> +static void spapr_numa_FORM1_affinity_init(SpaprMachineState *spapr,
> +   MachineState *machine)
> +{
> +bool using_legacy_numa = spapr_machine_using_legacy_numa(spapr);
> +
> +/*
> + * Legacy NUMA guests (pseries-5.1 and older, or guests with only
> + * 1 NUMA node) will not benefit from anything we're going to do
> + * after this point.
> + */
> +if (using_legacy_numa) {
> +return;
> +}

My only concern with this patch is that handling the
"using_legacy_numa" case might logically belong outside the FORM1
code.  I mean, I'm pretty sure using_legacy_numa implies FORM1 in
practice, but conceptually it seems like a more fundamental question
than the DT encoding of the NUMA information.

> +
> +if (!spapr_numa_is_symmetrical(machine)) {
> +error_report("Asymmetrical NUMA topologies aren't supported "
> + "in the pSeries machine");
> +exit(EXIT_FAILURE);
> +}
> +
> +spapr_numa_define_associativity_domains(spapr);
> +}
> +
>  void spapr_numa_associativity_init(SpaprMachineState *spapr,
> MachineState *machine)
>  {
> @@ -210,22 +236,7 @@ void spapr_numa_associativity_init(SpaprMachineState *spapr,
>  spapr->numa_assoc_array[i][MAX_DISTANCE_REF_POINTS] = cpu_to_be32(i);
>  }
>  
> -/*
> - * Legacy NUMA guests (pseries-5.1 and older, or guests with only
> - * 1 NUMA node) will not benefit from anything we're going to do
> - * after this point.
> - */
> -if (using_legacy_numa) {
> -return;
> -}
> -
> -if (!spapr_numa_is_symmetrical(machine)) {
> -error_report("Asymmetrical NUMA topologies aren't supported "
> - "in the pSeries machine");
> -exit(EXIT_FAILURE);
> -}
> -
> -spapr_numa_define_associativity_domains(spapr);
> +spapr_numa_FORM1_affinity_init(spapr, machine);
>  }
>  
>  void spapr_numa_write_associativity_dt(SpaprMachineState *spapr, void *fdt,
> @@ -302,12 +313,8 @@ int spapr_numa_write_assoc_lookup_arrays(SpaprMachineState *spapr, void *fdt,
>  return ret;
>  }
>  
> -/*
> - * Helper that writes ibm,associativity-reference-points and
> - * max-associativity-domains in the RTAS pointed by @rtas
> - * in the DT @fdt.
> - */
> -void spapr_numa_write_rtas_dt(SpaprMachineState *spapr, void *fdt, int rtas)
> +static void spapr_numa_FORM1_write_rtas_dt(SpaprMachineState *spapr,
> +   void *fdt, int rtas)
>  {
>  MachineState *ms = MACHINE(spapr);
>  SpaprMachineClass *smc = SPAPR_MACHINE_GET_CLASS(spapr);
> @@ -365,6 +372,16 @@ void spapr_numa_write_rtas_dt(SpaprMachineState *spapr, void *fdt, int rtas)
>   maxdomains, sizeof(maxdomains)));
>  }
>  
> +/*
> + * Helper that writes ibm,associativity-reference-points and
> + * max-associativity-domains in the RTAS pointed by @rtas
> + * in the DT @fdt.
> + */
> +void spapr_numa_write_rtas_dt(SpaprMachineState *spapr, void *fdt, int rtas)
> +{
> +spapr_numa_FORM1_write_rtas_dt(spapr, fdt, rtas);
> +}
> +
>  static target_ulong h_home_node_associativity(PowerPCCPU *cpu,
>SpaprMachineState *spapr,
>target_ulong opcode,

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson




[PATCH v5 3/4] spapr_numa.c: base FORM2 NUMA affinity support

2021-09-06 Thread Daniel Henrique Barboza
The main feature of FORM2 affinity support is the separation of NUMA
distances from ibm,associativity information. This allows for a more
flexible and straightforward NUMA distance assignment without relying on
complex associations between several levels of NUMA via
ibm,associativity matches. Another feature is its extensibility. This base
support contains the facilities for NUMA distance assignment, but in the
future more facilities will be added for latency, performance, bandwidth
and so on.

This patch implements the base FORM2 affinity support as follows:

- the use of FORM2 associativity is indicated by using bit 2 of byte 5
of ibm,architecture-vec-5. A FORM2 aware guest can choose to use FORM1
or FORM2 affinity. Setting both forms will default to FORM2. We're not
advertising FORM2 for pseries-6.1 and older machine versions to prevent
guest visible changes in those;

- call spapr_numa_associativity_reset() in do_client_architecture_support()
if FORM2 is chosen. This will avoid re-initializing FORM1 artifacts that
were already initialized in spapr_machine_reset();

- ibm,associativity-reference-points has a new semantic. Instead of
being used to calculate distances via NUMA levels, it's now used to
indicate the primary domain index in the ibm,associativity domain of
each resource. In our case it's set to {0x4}, matching the position
where we already place logical_domain_id;

- two new RTAS DT artifacts are introduced: ibm,numa-lookup-index-table
and ibm,numa-distance-table. The index table is used to list all the
NUMA logical domains of the platform, in ascending order, and allows for
sparse NUMA configurations (although QEMU doesn't support that at the moment).
ibm,numa-distance-table is an array that contains all the distances from
the first NUMA node to all other nodes, then the second NUMA node
distances to all other nodes and so on;

- spapr_post_load changes: since we're adding a new NUMA affinity that
isn't compatible with the existing one, migration must be handled
accordingly because we can't be certain of whether the guest went
through CAS in the source. The solution chosen is to initialize the NUMA
associativity data in spapr_post_load() unconditionally. The worst case
would be to write the DT twice if the guest is in the pre-CAS stage.
Otherwise, we're making sure that a FORM1 guest will have the
spapr->numa_assoc_array initialized with the proper information based on
user distance, something that we're not doing with FORM2.
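The two new RTAS properties can be sketched as plain byte buffers. This is a minimal sketch under stated assumptions: the description above only fixes the ordering (ascending domain IDs, then distances row by row), so the 32-bit big-endian count at the start of each property and the one-byte-per-distance encoding are assumptions here, as are the helper names. Contiguous node IDs 0..N-1 are assumed, consistent with QEMU not supporting sparse configurations.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Store a 32-bit value big-endian, as device tree cells are encoded. */
static void put_be32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
}

/*
 * Hypothetical builder for ibm,numa-lookup-index-table: a 32-bit count
 * followed by one 32-bit cell per NUMA domain ID, in ascending order.
 * With contiguous node IDs the domain IDs are simply 0..nb_nodes-1.
 * Caller frees the buffer; *size receives its length in bytes.
 */
static uint8_t *build_lookup_index_table(uint32_t nb_nodes, size_t *size)
{
    uint8_t *buf;
    uint32_t i;

    *size = sizeof(uint32_t) * (nb_nodes + 1);
    buf = malloc(*size);
    if (!buf) {
        return NULL;
    }
    put_be32(buf, nb_nodes);
    for (i = 0; i < nb_nodes; i++) {
        put_be32(buf + sizeof(uint32_t) * (i + 1), i);
    }
    return buf;
}

/*
 * Hypothetical builder for ibm,numa-distance-table: a 32-bit entry count
 * (nb_nodes * nb_nodes) followed by one byte per distance: all distances
 * from node 0, then all distances from node 1, and so on.
 * @dist is a row-major nb_nodes x nb_nodes matrix.
 */
static uint8_t *build_distance_table(uint32_t nb_nodes, const uint8_t *dist,
                                     size_t *size)
{
    uint32_t entries = nb_nodes * nb_nodes;
    uint8_t *buf;

    *size = sizeof(uint32_t) + entries;
    buf = malloc(*size);
    if (!buf) {
        return NULL;
    }
    put_be32(buf, entries);
    /* Row-major input already matches the "node 0 first" ordering */
    memcpy(buf + sizeof(uint32_t), dist, entries);
    return buf;
}
```

Under these assumptions, two nodes with distances {10, 20, 20, 10} produce the distance property bytes 00 00 00 04 0a 14 14 0a.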

Signed-off-by: Daniel Henrique Barboza 
---
 hw/ppc/spapr.c              |  24 +++
 hw/ppc/spapr_hcall.c        |  10 +++
 hw/ppc/spapr_numa.c         | 135 +++-
 include/hw/ppc/spapr.h      |   1 +
 include/hw/ppc/spapr_ovec.h |   1 +
 5 files changed, 170 insertions(+), 1 deletion(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 8e1ff6cd10..8d98e3b08a 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -1789,6 +1789,22 @@ static int spapr_post_load(void *opaque, int version_id)
 return err;
 }
 
+/*
+ * NUMA data init is made in CAS time. There is no reliable
+ * way of telling whether the guest already went through CAS
+ * in the source due to how spapr_ov5_cas_needed works: a
+ * FORM1 guest can be migrated with ov5_cas empty regardless
+ * of going through CAS first.
+ *
+ * One solution is to always call numa_associativity_reset. The
+ * downside is that a guest migrated before CAS will run
+ * numa_associativity_reset again when going through it, but
+ * at least we're making sure spapr->numa_assoc_array will be
+ * initialized and hotplug operations won't fail in both before
+ * and after CAS migration cases.
+ */
+ spapr_numa_associativity_reset(spapr);
+
 return err;
 }
 
@@ -2755,6 +2771,11 @@ static void spapr_machine_init(MachineState *machine)
 
 spapr_ovec_set(spapr->ov5, OV5_FORM1_AFFINITY);
 
+/* Do not advertise FORM2 support for pseries-6.1 and older */
+if (!smc->pre_6_2_numa_affinity) {
+spapr_ovec_set(spapr->ov5, OV5_FORM2_AFFINITY);
+}
+
 /* advertise support for dedicated HP event source to guests */
 if (spapr->use_hotplug_event_source) {
 spapr_ovec_set(spapr->ov5, OV5_HP_EVT);
@@ -4700,8 +4721,11 @@ DEFINE_SPAPR_MACHINE(6_2, "6.2", true);
  */
 static void spapr_machine_6_1_class_options(MachineClass *mc)
 {
+SpaprMachineClass *smc = SPAPR_MACHINE_CLASS(mc);
+
 spapr_machine_6_2_class_options(mc);
 compat_props_add(mc->compat_props, hw_compat_6_1, hw_compat_6_1_len);
+smc->pre_6_2_numa_affinity = true;
 }
 
 DEFINE_SPAPR_MACHINE(6_1, "6.1", false);
diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
index 0e9a5b2e40..7efbe93f4b 100644
--- a/hw/ppc/spapr_hcall.c
+++ b/hw/ppc/spapr_hcall.c
@@ -11,6 +11,7 @@
 #include "helper_regs.h"
 #include "hw/ppc/spapr.h"
 #include "hw/ppc/spapr_cpu_core.h"
+#include "hw/ppc/spapr_numa.h"
 #include "mmu-hash64.h"
 #include "cpu-models.h"
 

[PATCH v5 2/4] spapr_numa.c: split FORM1 code into helpers

2021-09-06 Thread Daniel Henrique Barboza
The upcoming FORM2 NUMA affinity will support asymmetric NUMA topologies
and doesn't need to be concerned with all the legacy support for older
pseries FORM1 guests.

We're also not going to calculate associativity domains based on numa
distance (via spapr_numa_define_associativity_domains) since the
distances will be written directly into new DT properties.

Let's split FORM1 code into its own functions to allow for easier
insertion of FORM2 logic later on.

Reviewed-by: Greg Kurz 
Signed-off-by: Daniel Henrique Barboza 
---
 hw/ppc/spapr_numa.c | 61 +
 1 file changed, 39 insertions(+), 22 deletions(-)

diff --git a/hw/ppc/spapr_numa.c b/hw/ppc/spapr_numa.c
index 9ee4b479fe..84636cb86a 100644
--- a/hw/ppc/spapr_numa.c
+++ b/hw/ppc/spapr_numa.c
@@ -155,6 +155,32 @@ static void spapr_numa_define_associativity_domains(SpaprMachineState *spapr)
 
 }
 
+/*
+ * Set NUMA machine state data based on FORM1 affinity semantics.
+ */
+static void spapr_numa_FORM1_affinity_init(SpaprMachineState *spapr,
+   MachineState *machine)
+{
+bool using_legacy_numa = spapr_machine_using_legacy_numa(spapr);
+
+/*
+ * Legacy NUMA guests (pseries-5.1 and older, or guests with only
+ * 1 NUMA node) will not benefit from anything we're going to do
+ * after this point.
+ */
+if (using_legacy_numa) {
+return;
+}
+
+if (!spapr_numa_is_symmetrical(machine)) {
+error_report("Asymmetrical NUMA topologies aren't supported "
+ "in the pSeries machine");
+exit(EXIT_FAILURE);
+}
+
+spapr_numa_define_associativity_domains(spapr);
+}
+
 void spapr_numa_associativity_reset(SpaprMachineState *spapr)
 {
 SpaprMachineClass *smc = SPAPR_MACHINE_GET_CLASS(spapr);
@@ -210,22 +236,7 @@ void spapr_numa_associativity_reset(SpaprMachineState *spapr)
 spapr->numa_assoc_array[i][MAX_DISTANCE_REF_POINTS] = cpu_to_be32(i);
 }
 
-/*
- * Legacy NUMA guests (pseries-5.1 and older, or guests with only
- * 1 NUMA node) will not benefit from anything we're going to do
- * after this point.
- */
-if (using_legacy_numa) {
-return;
-}
-
-if (!spapr_numa_is_symmetrical(machine)) {
-error_report("Asymmetrical NUMA topologies aren't supported "
- "in the pSeries machine");
-exit(EXIT_FAILURE);
-}
-
-spapr_numa_define_associativity_domains(spapr);
+spapr_numa_FORM1_affinity_init(spapr, machine);
 }
 
 void spapr_numa_write_associativity_dt(SpaprMachineState *spapr, void *fdt,
@@ -302,12 +313,8 @@ int spapr_numa_write_assoc_lookup_arrays(SpaprMachineState *spapr, void *fdt,
 return ret;
 }
 
-/*
- * Helper that writes ibm,associativity-reference-points and
- * max-associativity-domains in the RTAS pointed by @rtas
- * in the DT @fdt.
- */
-void spapr_numa_write_rtas_dt(SpaprMachineState *spapr, void *fdt, int rtas)
+static void spapr_numa_FORM1_write_rtas_dt(SpaprMachineState *spapr,
+   void *fdt, int rtas)
 {
 MachineState *ms = MACHINE(spapr);
 SpaprMachineClass *smc = SPAPR_MACHINE_GET_CLASS(spapr);
@@ -365,6 +372,16 @@ void spapr_numa_write_rtas_dt(SpaprMachineState *spapr, void *fdt, int rtas)
  maxdomains, sizeof(maxdomains)));
 }
 
+/*
+ * Helper that writes ibm,associativity-reference-points and
+ * max-associativity-domains in the RTAS pointed by @rtas
+ * in the DT @fdt.
+ */
+void spapr_numa_write_rtas_dt(SpaprMachineState *spapr, void *fdt, int rtas)
+{
+spapr_numa_FORM1_write_rtas_dt(spapr, fdt, rtas);
+}
+
 static target_ulong h_home_node_associativity(PowerPCCPU *cpu,
   SpaprMachineState *spapr,
   target_ulong opcode,
-- 
2.31.1




[PATCH v5 1/4] spapr: move NUMA associativity init to machine reset

2021-09-06 Thread Daniel Henrique Barboza
At this moment we only support one form of NUMA affinity, FORM1. This
allows us to init the internal structures during machine_init(), and
given that NUMA distances won't change during the guest lifetime we
don't need to bother with that again.

We're about to introduce FORM2, a new NUMA affinity mode for pSeries
guests. This means that we'll only be certain about the affinity mode
being used after client architecture support. This also means that the
guest can switch affinity modes in machine reset.

Let's prepare the ground for the FORM2 support by moving the NUMA
internal data init from machine_init() to machine_reset(). Change the
name to spapr_numa_associativity_reset() to make it clearer that this is
a function that can be called multiple times during the guest lifecycle.
We're also simplifying its current API since this method will be called
during CAS time (do_client_architecture_support()) later on and there's no
MachineState pointer already resolved there.

Signed-off-by: Daniel Henrique Barboza 
---
 hw/ppc/spapr.c              | 6 +++---
 hw/ppc/spapr_numa.c         | 4 ++--
 include/hw/ppc/spapr_numa.h | 9 +
 3 files changed, 6 insertions(+), 13 deletions(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index d39fd4e644..8e1ff6cd10 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -1621,6 +1621,9 @@ static void spapr_machine_reset(MachineState *machine)
  */
 spapr_irq_reset(spapr, &error_fatal);
 
+/* Reset numa_assoc_array */
+spapr_numa_associativity_reset(spapr);
+
 /*
  * There is no CAS under qtest. Simulate one to please the code that
  * depends on spapr->ov5_cas. This is especially needed to test device
@@ -2808,9 +2811,6 @@ static void spapr_machine_init(MachineState *machine)
 
 spapr->gpu_numa_id = spapr_numa_initial_nvgpu_numa_id(machine);
 
-/* Init numa_assoc_array */
-spapr_numa_associativity_init(spapr, machine);
-
 if ((!kvm_enabled() || kvmppc_has_cap_mmu_radix()) &&
 ppc_type_check_compat(machine->cpu_type, CPU_POWERPC_LOGICAL_3_00, 0,
   spapr->max_compat_pvr)) {
diff --git a/hw/ppc/spapr_numa.c b/hw/ppc/spapr_numa.c
index 779f18b994..9ee4b479fe 100644
--- a/hw/ppc/spapr_numa.c
+++ b/hw/ppc/spapr_numa.c
@@ -155,10 +155,10 @@ static void spapr_numa_define_associativity_domains(SpaprMachineState *spapr)
 
 }
 
-void spapr_numa_associativity_init(SpaprMachineState *spapr,
-   MachineState *machine)
+void spapr_numa_associativity_reset(SpaprMachineState *spapr)
 {
 SpaprMachineClass *smc = SPAPR_MACHINE_GET_CLASS(spapr);
+MachineState *machine = MACHINE(spapr);
 int nb_numa_nodes = machine->numa_state->num_nodes;
 int i, j, max_nodes_with_gpus;
 bool using_legacy_numa = spapr_machine_using_legacy_numa(spapr);
diff --git a/include/hw/ppc/spapr_numa.h b/include/hw/ppc/spapr_numa.h
index 6f9f02d3de..0e457bba57 100644
--- a/include/hw/ppc/spapr_numa.h
+++ b/include/hw/ppc/spapr_numa.h
@@ -16,14 +16,7 @@
 #include "hw/boards.h"
 #include "hw/ppc/spapr.h"
 
-/*
- * Having both SpaprMachineState and MachineState as arguments
- * feels odd, but it will spare a MACHINE() call inside the
- * function. spapr_machine_init() is the only caller for it, and
- * it has both pointers resolved already.
- */
-void spapr_numa_associativity_init(SpaprMachineState *spapr,
-   MachineState *machine);
+void spapr_numa_associativity_reset(SpaprMachineState *spapr);
 void spapr_numa_write_rtas_dt(SpaprMachineState *spapr, void *fdt, int rtas);
 void spapr_numa_write_associativity_dt(SpaprMachineState *spapr, void *fdt,
int offset, int nodeid);
-- 
2.31.1




[PATCH v5 0/4] pSeries FORM2 affinity support

2021-09-06 Thread Daniel Henrique Barboza
Hi,

In this new version, the biggest change is that now we're initializing
NUMA associativity internal data during machine_reset(), instead of
machine_init(), to allow for the guest to switch between FORM1 and
FORM2 during guest reset. All other changes are a consequence of this
design change.

Changes from v4:
- former patch 1:
  * dropped, pseries-6.2 machine type is already available
- new patch 1:
  * move numa associativity init to machine reset
- patch 3:
  * avoid resetting associativity data if FORM1 was chosen
- former patch 4:
  * dropped, folded into patch 1
- patch 4 (former 5):
  * move both FORM1 verifications to post-CAS
- v4 link: https://lists.gnu.org/archive/html/qemu-devel/2021-08/msg04860.html
 

Daniel Henrique Barboza (4):
  spapr: move NUMA associativity init to machine reset
  spapr_numa.c: split FORM1 code into helpers
  spapr_numa.c: base FORM2 NUMA affinity support
  spapr: move FORM1 verifications to do_client_architecture_support()

 hw/ppc/spapr.c              |  63 +-
 hw/ppc/spapr_hcall.c        |  16 +++
 hw/ppc/spapr_numa.c         | 225 +---
 include/hw/ppc/spapr.h      |   1 +
 include/hw/ppc/spapr_numa.h |  10 +-
 include/hw/ppc/spapr_ovec.h |   1 +
 6 files changed, 253 insertions(+), 63 deletions(-)

-- 
2.31.1




[PATCH v5 4/4] spapr: move FORM1 verifications to do_client_architecture_support()

2021-09-06 Thread Daniel Henrique Barboza
FORM2 NUMA affinity is prepared to deal with empty (memory-less/cpu-less)
NUMA nodes. This is used by the DAX KMEM driver to locate a PAPR SCM
device that has a different latency than the original NUMA node from the
regular memory. FORM2 is also able to deal with asymmetric NUMA
distances gracefully, something that our FORM1 implementation doesn't
do.

Move these FORM1 verifications to a new function and wait until after
CAS, when we're sure that we're sticking with FORM1, to enforce them.
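The post-CAS decision described above (reset to FORM2 if the guest asked for it, otherwise enforce the FORM1 constraints) can be sketched as standalone C. Assumptions: the option vector is passed as raw bytes with ov5[0] being byte 1, and "bit 2 of byte 5" is taken as mask 0x20 assuming MSB-0 bit numbering; the distance matrix, per-node memory sizes and per-CPU node IDs are simplified stand-ins for QEMU's MachineState, and all names are hypothetical. Unlike the patch, which calls exit(EXIT_FAILURE), the sketch just returns false.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* ibm,architecture-vec-5 byte 5, bit 2 (assumed MSB-0 numbering) */
#define OV5_BYTE5_FORM2 0x20

enum numa_affinity { AFFINITY_FORM1 = 1, AFFINITY_FORM2 = 2 };

/*
 * FORM2 wins whenever the guest set its bit, even if FORM1 is also set;
 * otherwise stay on FORM1, the default established at machine reset.
 */
static enum numa_affinity choose_affinity(const uint8_t *ov5, size_t len)
{
    if (len >= 5 && (ov5[4] & OV5_BYTE5_FORM2)) {
        return AFFINITY_FORM2;
    }
    return AFFINITY_FORM1;
}

/* FORM1 needs d(i,j) == d(j,i); @dist is row-major nb_nodes x nb_nodes. */
static bool distances_symmetric(const uint8_t *dist, int nb_nodes)
{
    for (int i = 0; i < nb_nodes; i++) {
        for (int j = i + 1; j < nb_nodes; j++) {
            if (dist[i * nb_nodes + j] != dist[j * nb_nodes + i]) {
                return false;
            }
        }
    }
    return true;
}

/* Return the first node with neither memory nor CPUs, or -1 if none. */
static int find_empty_node(const uint64_t *node_mem, int nb_nodes,
                           const int *cpu_node_id, int nb_cpus)
{
    for (int i = 0; i < nb_nodes; i++) {
        bool has_cpu = false;

        if (node_mem[i] != 0) {
            continue;
        }
        for (int c = 0; c < nb_cpus; c++) {
            if (cpu_node_id[c] == i) {
                has_cpu = true;
                break;
            }
        }
        if (!has_cpu) {
            return i;
        }
    }
    return -1;
}

/* Only guests that stay on FORM1 are subject to these constraints. */
static bool affinity_config_ok(const uint8_t *ov5, size_t ov5_len,
                               const uint8_t *dist, const uint64_t *node_mem,
                               int nb_nodes, const int *cpu_node_id,
                               int nb_cpus)
{
    if (choose_affinity(ov5, ov5_len) == AFFINITY_FORM2) {
        return true;
    }
    return distances_symmetric(dist, nb_nodes) &&
           find_empty_node(node_mem, nb_nodes, cpu_node_id, nb_cpus) < 0;
}
```

Deferring the checks this way means an asymmetric or empty-node configuration only becomes fatal once the guest has committed to FORM1.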

Signed-off-by: Daniel Henrique Barboza 
---
 hw/ppc/spapr.c              | 33 -
 hw/ppc/spapr_hcall.c        |  6 +
 hw/ppc/spapr_numa.c         | 49 -
 include/hw/ppc/spapr_numa.h |  1 +
 4 files changed, 50 insertions(+), 39 deletions(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 8d98e3b08a..c974c07fb8 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -2797,39 +2797,6 @@ static void spapr_machine_init(MachineState *machine)
 /* init CPUs */
 spapr_init_cpus(spapr);
 
-/*
- * check we don't have a memory-less/cpu-less NUMA node
- * Firmware relies on the existing memory/cpu topology to provide the
- * NUMA topology to the kernel.
- * And the linux kernel needs to know the NUMA topology at start
- * to be able to hotplug CPUs later.
- */
-if (machine->numa_state->num_nodes) {
-for (i = 0; i < machine->numa_state->num_nodes; ++i) {
-/* check for memory-less node */
-if (machine->numa_state->nodes[i].node_mem == 0) {
-CPUState *cs;
-int found = 0;
-/* check for cpu-less node */
-CPU_FOREACH(cs) {
-PowerPCCPU *cpu = POWERPC_CPU(cs);
-if (cpu->node_id == i) {
-found = 1;
-break;
-}
-}
-/* memory-less and cpu-less node */
-if (!found) {
-error_report(
-   "Memory-less/cpu-less nodes are not supported (node %d)",
- i);
-exit(1);
-}
-}
-}
-
-}
-
 spapr->gpu_numa_id = spapr_numa_initial_nvgpu_numa_id(machine);
 
 if ((!kvm_enabled() || kvmppc_has_cap_mmu_radix()) &&
diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
index 7efbe93f4b..27ee713600 100644
--- a/hw/ppc/spapr_hcall.c
+++ b/hw/ppc/spapr_hcall.c
@@ -1202,9 +1202,15 @@ target_ulong do_client_architecture_support(PowerPCCPU *cpu,
  * If the guest chooses FORM2 we need to reset the associativity
  * information - it is being defaulted to FORM1 during
  * spapr_machine_reset().
+ *
+ * If we're sure that we'll be using FORM1, verify now if we have
+ * a configuration or condition that is not available for FORM1
+ * (namely asymmetric NUMA topologies and empty NUMA nodes).
  */
 if (spapr_ovec_test(spapr->ov5_cas, OV5_FORM2_AFFINITY)) {
 spapr_numa_associativity_reset(spapr);
+} else {
+spapr_numa_check_FORM1_constraints(MACHINE(spapr));
 }
 
 /*
diff --git a/hw/ppc/spapr_numa.c b/hw/ppc/spapr_numa.c
index ca276e16cb..0c57d03184 100644
--- a/hw/ppc/spapr_numa.c
+++ b/hw/ppc/spapr_numa.c
@@ -155,6 +155,49 @@ static void spapr_numa_define_associativity_domains(SpaprMachineState *spapr)
 
 }
 
+void spapr_numa_check_FORM1_constraints(MachineState *machine)
+{
+int i;
+
+if (!spapr_numa_is_symmetrical(machine)) {
+error_report("Asymmetrical NUMA topologies aren't supported "
+ "in the pSeries machine");
+exit(EXIT_FAILURE);
+}
+
+/*
+ * check we don't have a memory-less/cpu-less NUMA node
+ * Firmware relies on the existing memory/cpu topology to provide the
+ * NUMA topology to the kernel.
+ * And the linux kernel needs to know the NUMA topology at start
+ * to be able to hotplug CPUs later.
+ */
+if (machine->numa_state->num_nodes) {
+for (i = 0; i < machine->numa_state->num_nodes; ++i) {
+/* check for memory-less node */
+if (machine->numa_state->nodes[i].node_mem == 0) {
+CPUState *cs;
+int found = 0;
+/* check for cpu-less node */
+CPU_FOREACH(cs) {
+PowerPCCPU *cpu = POWERPC_CPU(cs);
+if (cpu->node_id == i) {
+found = 1;
+break;
+}
+}
+/* memory-less and cpu-less node */
+if (!found) {
+error_report(
+   "Memory-less/cpu-less nodes are not supported (node %d)",
+ i);
+exit(EXIT_FAILURE);
+}
+}
+}
+}
+}
+
 /*
  * Set NUMA machine state data based on 

Re: [PATCH v7 5/7] qapi/qdev.json: add DEVICE_UNPLUG_GUEST_ERROR QAPI event

2021-09-06 Thread David Gibson
On Mon, Sep 06, 2021 at 09:40:47AM -0300, Daniel Henrique Barboza wrote:
> 
> 
> On 9/4/21 8:49 AM, Markus Armbruster wrote:
> > David Gibson  writes:
> > 
> > > On Wed, Sep 01, 2021 at 03:19:26PM +0200, Markus Armbruster wrote:
> > > > Daniel Henrique Barboza  writes:
> > > > 
> > > > > At this moment we only provide one event to report a hotunplug error,
> > > > > MEM_UNPLUG_ERROR. As of Linux kernel 5.12 and QEMU 6.0.0, the pseries
> > > > > machine is now able to report unplug errors for other device types, such
> > > > > as CPUs.
> > > > > 
> > > > > Instead of creating a (device_type)_UNPLUG_ERROR for each new device,
> > > > > create a generic DEVICE_UNPLUG_GUEST_ERROR event that can be used by all
> > > > > guest side unplug errors in the future. This event has a similar API as
> > > > > the existing DEVICE_DELETED event, always providing the QOM path of the
> > > > > device and dev->id if there's any.
> > > > > 
> > > > > With this new generic event, MEM_UNPLUG_ERROR is now marked as deprecated.
> > > > > 
> > > > > Signed-off-by: Daniel Henrique Barboza 
> > > > > ---
> > > > 
> > > > [...]
> > > > 
> > > > > diff --git a/qapi/qdev.json b/qapi/qdev.json
> > > > > index 0e9cb2ae88..8b1a1dd43b 100644
> > > > > --- a/qapi/qdev.json
> > > > > +++ b/qapi/qdev.json
> > > > > @@ -84,7 +84,9 @@
> > > > >   #This command merely requests that the guest begin the hot removal
> > > > >   #process.  Completion of the device removal process is signaled with a
> > > > >   #DEVICE_DELETED event. Guest reset will automatically complete removal
> > > > > -#for all devices.
> > > > > +#for all devices.  If a guest-side error in the hot removal process is
> > > > > +#detected, the device will not be removed and a DEVICE_UNPLUG_GUEST_ERROR
> > > > > +#event is sent.  Some errors cannot be detected.
> > > > >   #
> > > > >   # Since: 0.14
> > > > >   #
> > > > > @@ -124,3 +126,27 @@
> > > > >   ##
> > > > >   { 'event': 'DEVICE_DELETED',
> > > > > 'data': { '*device': 'str', 'path': 'str' } }
> > > > > +
> > > > > +##
> > > > > +# @DEVICE_UNPLUG_GUEST_ERROR:
> > > > > +#
> > > > > +# Emitted when a device hot unplug fails due to an internal guest
> > > > > +# error.
> > > > 
> > > > Suggest to scratch "internal".
> > > 
> > > I'd suggest s/internal guest/guest reported/.  "guest error" is a bit
> > > vague, this doesn't necessarily indicate a bug in the guest.  The key
> > > point is that we're reporting this event because the guest performed
> > > some explicit action to tell us something went wrong with the plug
> > > attempt.
> > 
> > Yes, that's better.
> 
> 
> I agree.  David, let me know if you need another spin with this
> change.

Yes please.  I'm afraid I kind of lost track of the last posting.

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson




Re: [PATCH v3 2/9] qapi: make blockdev-add a coroutine command

2021-09-06 Thread Vladimir Sementsov-Ogievskiy

06.09.2021 22:28, Markus Armbruster wrote:

Vladimir Sementsov-Ogievskiy  writes:


We are going to support nbd reconnect on open in the next commit. This
means that we want to make several connection attempts over some period
of time. This should be done in a coroutine, otherwise we'll get stuck.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
  qapi/block-core.json | 3 ++-
  1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/qapi/block-core.json b/qapi/block-core.json
index 06674c25c9..6e4042530a 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -4219,7 +4219,8 @@
  # <- { "return": {} }
  #
  ##
-{ 'command': 'blockdev-add', 'data': 'BlockdevOptions', 'boxed': true }
+{ 'command': 'blockdev-add', 'data': 'BlockdevOptions', 'boxed': true,
+  'coroutine': true }
  
  ##
  # @blockdev-reopen:


Why is this safe?

Prior discussion:
Message-ID: <87lfq0yp9v@dusky.pond.sub.org>
https://lists.gnu.org/archive/html/qemu-devel/2020-01/msg04921.html



Hmm... I'm afraid I can't prove that it's safe. At the very least it
would mean auditing .bdrv_open() of all block drivers, and nothing
prevents incompatible drivers from being added in the future.

On the other hand, looking at qmp_blockdev_add, bdrv_open() is the only
thing of interest.

And theoretically, bdrv_open() should work in coroutine context: we do
call this function from coroutine_fn functions sometimes. So maybe, if
bdrv_open() turns out to be incompatible with coroutine context in some
circumstances, we can consider that a bug and fix it if it happens?

--
Best regards,
Vladimir



Re: [PATCH] hw/display/ati_2d: Fix buffer overflow in ati_2d_blt (CVE-2021-3638)

2021-09-06 Thread Alexander Bulekov
On 210906 2019, Philippe Mathieu-Daudé wrote:
> (Forgot to Cc Alex for eventual reproducer)

Here you go. Should we be fuzzing this on OSS-Fuzz?

= 8< =

/*
 * cat << EOF | ./qemu-system-i386 -display none -machine accel=qtest, -m \
 * 512M -device ati-vga,romfile= -nodefaults -qtest /dev/null -qtest stdio
 * outl 0xcf8 0x80001018
 * outl 0xcfc 0xe1000000
 * outl 0xcf8 0x80001004
 * outw 0xcfc 0x02
 * write 0xe10016c4 0x1 0x04
 * write 0xe10016e4 0x1 0x58
 * write 0xe1001438 0x4 0x0400001a
 * write 0xe100143c 0x4 0x01000015
 * EOF
 */
static void test_fuzz(void)
{
    QTestState *s = qtest_init(
        "-display none , -m 512M -device ati-vga,romfile= -nodefaults "
        "-qtest /dev/null");
    qtest_outl(s, 0xcf8, 0x80001018);
    qtest_outl(s, 0xcfc, 0xe1000000);
    qtest_outl(s, 0xcf8, 0x80001004);
    qtest_outw(s, 0xcfc, 0x02);
    qtest_bufwrite(s, 0xe10016c4, "\x04", 0x1);
    qtest_bufwrite(s, 0xe10016e4, "\x58", 0x1);
    qtest_bufwrite(s, 0xe1001438, "\x04\x00\x00\x1a", 0x4);
    qtest_bufwrite(s, 0xe100143c, "\x01\x00\x00\x15", 0x4);
    qtest_quit(s);
}

= >8 =
-Alex

> 
> On 9/6/21 6:44 PM, Mauro Matteo Cascella wrote:
> > On Mon, Sep 6, 2021 at 5:31 PM Philippe Mathieu-Daudé  wrote:
> >>
> >> When building QEMU with DEBUG_ATI defined then running with
> >> '-device ati-vga,romfile="" -d unimp,guest_errors -trace ati\*'
> >> we get:
> >>
> >>   ati_mm_write 4 0x16c0 DP_CNTL <- 0x1
> >>   ati_mm_write 4 0x146c DP_GUI_MASTER_CNTL <- 0x2
> >>   ati_mm_write 4 0x16c8 DP_MIX <- 0xff
> >>   ati_mm_write 4 0x16c4 DP_DATATYPE <- 0x2
> >>   ati_mm_write 4 0x224 CRTC_OFFSET <- 0x0
> >>   ati_mm_write 4 0x142c DST_PITCH_OFFSET <- 0xfe0
> >>   ati_mm_write 4 0x1420 DST_Y <- 0x3fff
> >>   ati_mm_write 4 0x1410 DST_HEIGHT <- 0x3fff
> >>   ati_mm_write 4 0x1588 DST_WIDTH_X <- 0x3fff3fff
> >>   ati_2d_blt: vram:0x7fff5fa0 addr:0 ds:0x7fff61273800 stride:2560 bpp:32 rop:0xff
> >>   ati_2d_blt: 0 0 0, 0 127 0, (0,0) -> (16383,16383) 16383x16383 > ^
> >>   ati_2d_blt: pixman_fill(dst:0x7fff5fa0, stride:254, bpp:8, x:16383, y:16383, w:16383, h:16383, xor:0xff00)
> >>   Thread 3 "qemu-system-i38" received signal SIGSEGV, Segmentation fault.
> >>   (gdb) bt
> >>   #0  0x77f62ce0 in sse2_fill.lto_priv () at /lib64/libpixman-1.so.0
> >>   #1  0x77f09278 in pixman_fill () at /lib64/libpixman-1.so.0
> >>   #2  0x57b5a9af in ati_2d_blt (s=0x63128800) at hw/display/ati_2d.c:196
> >>   #3  0x57b4b5a2 in ati_mm_write (opaque=0x63128800, addr=5512, data=1073692671, size=4) at hw/display/ati.c:843
> >>   #4  0x58b90ec4 in memory_region_write_accessor (mr=0x63139cc0, addr=5512, ..., size=4, ...) at softmmu/memory.c:492
> >>
> >> Commit 584acf34cb0 ("ati-vga: Fix reverse bit blts") introduced
> >> the local dst_x and dst_y which adjust the (x, y) coordinates
> >> depending on the direction in the SRCCOPY ROP3 operation, but
> >> forgot to address the same issue for the PATCOPY, BLACKNESS and
> >> WHITENESS operations, which also call pixman_fill().
> >>
> >> Fix that now by using the adjusted coordinates in the pixman_fill
> >> call, and update the related debug printf().
> >>
> >> Reported-by: Qiang Liu 
> >> Fixes: 584acf34cb0 ("ati-vga: Fix reverse bit blts")
> >> Signed-off-by: Philippe Mathieu-Daudé 
> >> ---
> >>  hw/display/ati_2d.c | 6 +++---
> >>  1 file changed, 3 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/hw/display/ati_2d.c b/hw/display/ati_2d.c
> >> index 4dc10ea7952..692bec91de4 100644
> >> --- a/hw/display/ati_2d.c
> >> +++ b/hw/display/ati_2d.c
> >> @@ -84,7 +84,7 @@ void ati_2d_blt(ATIVGAState *s)
> >>  DPRINTF("%d %d %d, %d %d %d, (%d,%d) -> (%d,%d) %dx%d %c %c\n",
> >>  s->regs.src_offset, s->regs.dst_offset, s->regs.default_offset,
> >>  s->regs.src_pitch, s->regs.dst_pitch, s->regs.default_pitch,
> >> -s->regs.src_x, s->regs.src_y, s->regs.dst_x, s->regs.dst_y,
> >> +s->regs.src_x, s->regs.src_y, dst_x, dst_y,
> >>  s->regs.dst_width, s->regs.dst_height,
> >>  (s->regs.dp_cntl & DST_X_LEFT_TO_RIGHT ? '>' : '<'),
> >>  (s->regs.dp_cntl & DST_Y_TOP_TO_BOTTOM ? 'v' : '^'));
> >> @@ -180,11 +180,11 @@ void ati_2d_blt(ATIVGAState *s)
> >>  dst_stride /= sizeof(uint32_t);
> >>  DPRINTF("pixman_fill(%p, %d, %d, %d, %d, %d, %d, %x)\n",
> >>  dst_bits, dst_stride, bpp,
> >> -s->regs.dst_x, s->regs.dst_y,
> >> +dst_x, dst_y,
> >>  s->regs.dst_width, s->regs.dst_height,
> >>  filler);
> >>  pixman_fill((uint32_t *)dst_bits, dst_stride, bpp,
> >> -s->regs.dst_x, s->regs.dst_y,
> >> +dst_x, dst_y,
> >>  s->regs.dst_width, s->regs.dst_height,
> >>  filler);
> >>  if (dst_bits 

Re: [PATCH] hw/display/ati_2d: Fix buffer overflow in ati_2d_blt (CVE-2021-3638)

2021-09-06 Thread BALATON Zoltan

On Mon, 6 Sep 2021, Philippe Mathieu-Daudé wrote:

(Forgot to Cc Alex for a possible reproducer)

On 9/6/21 6:44 PM, Mauro Matteo Cascella wrote:

On Mon, Sep 6, 2021 at 5:31 PM Philippe Mathieu-Daudé  wrote:


When building QEMU with DEBUG_ATI defined then running with
'-device ati-vga,romfile="" -d unimp,guest_errors -trace ati\*'
we get:

  ati_mm_write 4 0x16c0 DP_CNTL <- 0x1
  ati_mm_write 4 0x146c DP_GUI_MASTER_CNTL <- 0x2
  ati_mm_write 4 0x16c8 DP_MIX <- 0xff
  ati_mm_write 4 0x16c4 DP_DATATYPE <- 0x2
  ati_mm_write 4 0x224 CRTC_OFFSET <- 0x0
  ati_mm_write 4 0x142c DST_PITCH_OFFSET <- 0xfe0
  ati_mm_write 4 0x1420 DST_Y <- 0x3fff
  ati_mm_write 4 0x1410 DST_HEIGHT <- 0x3fff
  ati_mm_write 4 0x1588 DST_WIDTH_X <- 0x3fff3fff
  ati_2d_blt: vram:0x7fff5fa0 addr:0 ds:0x7fff61273800 stride:2560 bpp:32 rop:0xff
  ati_2d_blt: 0 0 0, 0 127 0, (0,0) -> (16383,16383) 16383x16383 > ^
  ati_2d_blt: pixman_fill(dst:0x7fff5fa0, stride:254, bpp:8, x:16383, y:16383, w:16383, h:16383, xor:0xff00)
  Thread 3 "qemu-system-i38" received signal SIGSEGV, Segmentation fault.
  (gdb) bt
  #0  0x77f62ce0 in sse2_fill.lto_priv () at /lib64/libpixman-1.so.0
  #1  0x77f09278 in pixman_fill () at /lib64/libpixman-1.so.0
  #2  0x57b5a9af in ati_2d_blt (s=0x63128800) at hw/display/ati_2d.c:196
  #3  0x57b4b5a2 in ati_mm_write (opaque=0x63128800, addr=5512, data=1073692671, size=4) at hw/display/ati.c:843
  #4  0x58b90ec4 in memory_region_write_accessor (mr=0x63139cc0, addr=5512, ..., size=4, ...) at softmmu/memory.c:492

Commit 584acf34cb0 ("ati-vga: Fix reverse bit blts") introduced
the local dst_x and dst_y which adjust the (x, y) coordinates
depending on the direction in the SRCCOPY ROP3 operation, but
forgot to address the same issue for the PATCOPY, BLACKNESS and
WHITENESS operations, which also call pixman_fill().

Fix that now by using the adjusted coordinates in the pixman_fill
call, and update the related debug printf().

Reported-by: Qiang Liu 
Fixes: 584acf34cb0 ("ati-vga: Fix reverse bit blts")
Signed-off-by: Philippe Mathieu-Daudé 
---
 hw/display/ati_2d.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/hw/display/ati_2d.c b/hw/display/ati_2d.c
index 4dc10ea7952..692bec91de4 100644
--- a/hw/display/ati_2d.c
+++ b/hw/display/ati_2d.c
@@ -84,7 +84,7 @@ void ati_2d_blt(ATIVGAState *s)
 DPRINTF("%d %d %d, %d %d %d, (%d,%d) -> (%d,%d) %dx%d %c %c\n",
 s->regs.src_offset, s->regs.dst_offset, s->regs.default_offset,
 s->regs.src_pitch, s->regs.dst_pitch, s->regs.default_pitch,
-s->regs.src_x, s->regs.src_y, s->regs.dst_x, s->regs.dst_y,
+s->regs.src_x, s->regs.src_y, dst_x, dst_y,
 s->regs.dst_width, s->regs.dst_height,
 (s->regs.dp_cntl & DST_X_LEFT_TO_RIGHT ? '>' : '<'),
 (s->regs.dp_cntl & DST_Y_TOP_TO_BOTTOM ? 'v' : '^'));
@@ -180,11 +180,11 @@ void ati_2d_blt(ATIVGAState *s)
 dst_stride /= sizeof(uint32_t);
 DPRINTF("pixman_fill(%p, %d, %d, %d, %d, %d, %d, %x)\n",
 dst_bits, dst_stride, bpp,
-s->regs.dst_x, s->regs.dst_y,
+dst_x, dst_y,
 s->regs.dst_width, s->regs.dst_height,
 filler);
 pixman_fill((uint32_t *)dst_bits, dst_stride, bpp,
-s->regs.dst_x, s->regs.dst_y,
+dst_x, dst_y,
 s->regs.dst_width, s->regs.dst_height,
 filler);
 if (dst_bits >= s->vga.vram_ptr + s->vga.vbe_start_addr &&
--
2.31.1



Tested-by: Mauro Matteo Cascella 


Thanks. I wouldn't be surprised if we got another CVE in this code /
file / function as soon as this patch gets merged... The code calls for a
rewrite, as the comment at the top of the function says:

void ati_2d_blt(ATIVGAState *s)
{
   /* FIXME it is probably more complex than this and may need to be */
   /* rewritten but for now as a start just to get some output: */


It has also been broken since the previous CVE fixes, when I tried to
change it to use only unsigned values, to avoid underflows and get away
with checking only for overflows, which simplifies it a bit. But it turns
out that's wrong: the hardware does allow negative values, and while most
drivers don't use them (e.g. Linux and MorphOS, which therefore still
work), at least the Solaris driver does, and it now produces a broken
picture once X starts. (This can be reproduced with the Solaris 10 x86
ISO, but Solaris also needs more features to be implemented to work, so
fixing this alone is not enough to get past the first screen; text will
still be missing.) To fix this we will need to revert to signed values
and check for both overflow and underflow. I planned to try that
eventually but haven't got around to it yet.
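A minimal Python sketch of the kind of check described above, written as a model rather than as the ati-vga code: `blit_span_ok()` and its parameters are hypothetical, and it assumes a positive pitch and a surface addressed as `base + y * pitch + x * bytes_pp`. Because every quantity stays signed, negative coordinates are accepted as long as the bytes actually touched stay inside VRAM, and both ends of the range are checked.

```python
def blit_span_ok(base: int, x: int, y: int, width: int, height: int,
                 pitch: int, bytes_pp: int, vram_size: int) -> bool:
    """Return True if every byte a blit touches lies in [0, vram_size).

    base     -- byte offset of the surface in VRAM (e.g. a dst offset)
    x, y     -- signed start coordinates; may be negative
    pitch    -- bytes per row (assumed positive)
    bytes_pp -- bytes per pixel
    """
    if width <= 0 or height <= 0:
        return False
    # First and last byte touched by the rectangle, in signed arithmetic.
    first = base + y * pitch + x * bytes_pp
    last = first + (height - 1) * pitch + width * bytes_pp - 1
    # Underflow check on the low end, overflow check on the high end.
    # Python ints never wrap; in C the same sums need a wide signed type.
    return first >= 0 and last < vram_size
```

Negative coordinates relative to a non-zero base pass, while a rectangle that starts before VRAM or ends past it is rejected.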


I don't think assigning a CVE to a bug that is in an experimental and 
largely unused part and happens when one enables debug code really 

Re: [PATCH v3 2/9] qapi: make blockdev-add a coroutine command

2021-09-06 Thread Markus Armbruster
Vladimir Sementsov-Ogievskiy  writes:

> We are going to support nbd reconnect on open in the next commit. This
> means that we want to make several connection attempts over some period
> of time. And this should be done in a coroutine, otherwise we'll get stuck.
>
> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> ---
>  qapi/block-core.json | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/qapi/block-core.json b/qapi/block-core.json
> index 06674c25c9..6e4042530a 100644
> --- a/qapi/block-core.json
> +++ b/qapi/block-core.json
> @@ -4219,7 +4219,8 @@
>  # <- { "return": {} }
>  #
>  ##
> -{ 'command': 'blockdev-add', 'data': 'BlockdevOptions', 'boxed': true }
> +{ 'command': 'blockdev-add', 'data': 'BlockdevOptions', 'boxed': true,
> +  'coroutine': true }
>  
>  ##
>  # @blockdev-reopen:

Why is this safe?

Prior discussion:
Message-ID: <87lfq0yp9v@dusky.pond.sub.org>
https://lists.gnu.org/archive/html/qemu-devel/2020-01/msg04921.html




[PATCH v3 5/9] nbd/client-connection: improve error message of cancelled attempt

2021-09-06 Thread Vladimir Sementsov-Ogievskiy
Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 nbd/client-connection.c | 11 +--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/nbd/client-connection.c b/nbd/client-connection.c
index 722998c985..2bda42641d 100644
--- a/nbd/client-connection.c
+++ b/nbd/client-connection.c
@@ -351,8 +351,15 @@ nbd_co_establish_connection(NBDClientConnection *conn, NBDExportInfo *info,
 if (conn->err) {
 error_propagate(errp, error_copy(conn->err));
 } else {
-error_setg(errp,
-   "Connection attempt cancelled by other operation");
+/*
+ * The only possible case here is cancelling by open_timer
+ * during nbd_open(). So, the error message is for that case.
+ * If we have more use cases, we can refactor
+ * nbd_co_establish_connection_cancel() to take an additional
+ * parameter cancel_reason, that would then be passed to the
+ * caller of cancelled nbd_co_establish_connection().
+ */
+error_setg(errp, "Connection attempt cancelled by timeout");
 }
 
 return NULL;
-- 
2.29.2




[PATCH v3 8/9] iotests.py: add qemu_io_popen()

2021-09-06 Thread Vladimir Sementsov-Ogievskiy
Add a qemu-io Popen constructor wrapper, to be used by the new test
added in the following commit.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 tests/qemu-iotests/iotests.py | 4 
 1 file changed, 4 insertions(+)

diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py
index 3b7b57489a..be53c8d5ec 100644
--- a/tests/qemu-iotests/iotests.py
+++ b/tests/qemu-iotests/iotests.py
@@ -216,6 +216,10 @@ def qemu_io_wrap_args(args: Sequence[str]):
 return qemu_io_args + list(args)
 
 
+def qemu_io_popen(*args):
+return qemu_tool_popen('qemu-io', qemu_io_wrap_args(args))
+
+
 def qemu_io(*args):
 '''Run qemu-io and return the stdout data'''
 return qemu_tool_pipe_and_status('qemu-io', qemu_io_wrap_args(args))[0]
-- 
2.29.2




[PATCH v3 9/9] iotests: add nbd-reconnect-on-open test

2021-09-06 Thread Vladimir Sementsov-Ogievskiy
Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 .../qemu-iotests/tests/nbd-reconnect-on-open  | 71 +++
 .../tests/nbd-reconnect-on-open.out   | 11 +++
 2 files changed, 82 insertions(+)
 create mode 100755 tests/qemu-iotests/tests/nbd-reconnect-on-open
 create mode 100644 tests/qemu-iotests/tests/nbd-reconnect-on-open.out

diff --git a/tests/qemu-iotests/tests/nbd-reconnect-on-open b/tests/qemu-iotests/tests/nbd-reconnect-on-open
new file mode 100755
index 00..7ee9bce947
--- /dev/null
+++ b/tests/qemu-iotests/tests/nbd-reconnect-on-open
@@ -0,0 +1,71 @@
+#!/usr/bin/env python3
+#
+# Test nbd reconnect on open
+#
+# Copyright (c) 2020 Virtuozzo International GmbH
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see .
+#
+
+import time
+
+import iotests
+from iotests import qemu_img_create, file_path, qemu_io_popen, qemu_nbd, \
+qemu_io_log, log
+
+iotests.script_initialize(supported_fmts=['qcow2'])
+
+disk, nbd_sock = file_path('disk', 'nbd-sock')
+
+
+def create_args(open_timeout):
+return ['--image-opts', '-c', 'read 0 1M',
+f'driver=nbd,open-timeout={open_timeout},'
+f'server.type=unix,server.path={nbd_sock}']
+
+
+def check_fail_to_connect(open_timeout):
+log(f'Check fail to connect with {open_timeout} seconds of timeout')
+
+start_t = time.time()
+qemu_io_log(*create_args(open_timeout))
+delta_t = time.time() - start_t
+
+max_delta = open_timeout + 0.2
+if open_timeout <= delta_t <= max_delta:
+log(f'qemu_io finished in {open_timeout}..{max_delta} seconds, OK')
+else:
+note = 'too early' if delta_t < open_timeout else 'too long'
+log(f'qemu_io finished in {delta_t:.1f} seconds, {note}')
+
+
+qemu_img_create('-f', iotests.imgfmt, disk, '1M')
+
+# Start NBD client when NBD server is not yet running. It should not fail, but
+# wait for 5 seconds for the server to be available.
+client = qemu_io_popen(*create_args(5))
+
+time.sleep(1)
+qemu_nbd('-k', nbd_sock, '-f', iotests.imgfmt, disk)
+
+# client should succeed
+log(client.communicate()[0], filters=[iotests.filter_qemu_io])
+
+# Server was started without --persistent flag, so it should be off now. Let's
+# check it and at the same time check that with open-timeout=0 client fails
+# immediately.
+check_fail_to_connect(0)
+
+# Check that we will fail after non-zero timeout if server is still unavailable
+check_fail_to_connect(1)
diff --git a/tests/qemu-iotests/tests/nbd-reconnect-on-open.out b/tests/qemu-iotests/tests/nbd-reconnect-on-open.out
new file mode 100644
index 00..a35ae30ea4
--- /dev/null
+++ b/tests/qemu-iotests/tests/nbd-reconnect-on-open.out
@@ -0,0 +1,11 @@
+read 1048576/1048576 bytes at offset 0
+1 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+
+Check fail to connect with 0 seconds of timeout
+qemu-io: can't open: Failed to connect to 'TEST_DIR/PID-nbd-sock': No such file or directory
+
+qemu_io finished in 0..0.2 seconds, OK
+Check fail to connect with 1 seconds of timeout
+qemu-io: can't open: Failed to connect to 'TEST_DIR/PID-nbd-sock': No such file or directory
+
+qemu_io finished in 1..1.2 seconds, OK
-- 
2.29.2




[PATCH v3 6/9] iotests.py: add qemu_tool_popen()

2021-09-06 Thread Vladimir Sementsov-Ogievskiy
Split qemu_tool_popen() out of qemu_tool_pipe_and_status() so that it can
be used separately.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 tests/qemu-iotests/iotests.py | 13 ++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py
index 89663dac06..b518545c09 100644
--- a/tests/qemu-iotests/iotests.py
+++ b/tests/qemu-iotests/iotests.py
@@ -106,14 +106,21 @@ def unarchive_sample_image(sample, fname):
 shutil.copyfileobj(f_in, f_out)
 
 
+def qemu_tool_popen(tool: str, args: Sequence[str],
+connect_stderr: bool = True) -> subprocess.Popen:
+stderr = subprocess.STDOUT if connect_stderr else None
+return subprocess.Popen(args,
+stdout=subprocess.PIPE,
+stderr=stderr,
+universal_newlines=True)
+
+
 def qemu_tool_pipe_and_status(tool: str, args: Sequence[str],
   connect_stderr: bool = True) -> Tuple[str, int]:
 """
 Run a tool and return both its output and its exit code
 """
-stderr = subprocess.STDOUT if connect_stderr else None
-with subprocess.Popen(args, stdout=subprocess.PIPE,
-  stderr=stderr, universal_newlines=True) as subp:
+with qemu_tool_popen(tool, args, connect_stderr) as subp:
 output = subp.communicate()[0]
 if subp.returncode < 0:
 cmd = ' '.join(args)
-- 
2.29.2
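The point of the split can be illustrated with a small, self-contained Python sketch: `tool_popen()` and `tool_pipe_and_status()` below are simplified, hypothetical mirrors of the iotests helpers, and `sys.executable -c ...` stands in for a QEMU tool. The combined helper blocks until the tool exits, while the bare Popen object lets a caller start the tool, do other work (such as starting a server the tool waits for), and collect the output afterwards.

```python
import subprocess
import sys
from typing import Sequence, Tuple


def tool_popen(args: Sequence[str],
               connect_stderr: bool = True) -> subprocess.Popen:
    """Start a tool with stdout captured and, optionally, stderr merged in."""
    stderr = subprocess.STDOUT if connect_stderr else None
    return subprocess.Popen(list(args), stdout=subprocess.PIPE,
                            stderr=stderr, universal_newlines=True)


def tool_pipe_and_status(args: Sequence[str],
                         connect_stderr: bool = True) -> Tuple[str, int]:
    """Run a tool to completion and return (output, exit code)."""
    with tool_popen(args, connect_stderr) as subp:
        output = subp.communicate()[0]
    return output, subp.returncode


# Synchronous use: block until the tool exits.
out, rc = tool_pipe_and_status([sys.executable, '-c', 'print("done")'])

# Asynchronous use: start the tool now, collect its output later.
proc = tool_popen([sys.executable, '-c',
                   'import sys; print("late"); sys.exit(3)'])
# ... other work would go here, e.g. starting a server ...
late_out = proc.communicate()[0]
```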




[PATCH v3 0/9] nbd reconnect on open

2021-09-06 Thread Vladimir Sementsov-Ogievskiy
Hi all!

After a long delay here is v3.

v3 is rebased on top of the big refactoring of the nbd connection code, and
on top of its last portion, which is not yet merged:
Based-on: <20210902103805.25686-1-vsement...@virtuozzo.com>
   "[PATCH v6 0/5] block/nbd: drop connection_co"

So, the core patch (02) has changed a lot, and a QAPI interface is added.

Vladimir Sementsov-Ogievskiy (9):
  nbd/client-connection: nbd_co_establish_connection(): fix non set errp
  qapi: make blockdev-add a coroutine command
  nbd: allow reconnect on open, with corresponding new options
  nbd/client-connection: nbd_co_establish_connection(): return real
error
  nbd/client-connection: improve error message of cancelled attempt
  iotests.py: add qemu_tool_popen()
  iotests.py: add and use qemu_io_wrap_args()
  iotests.py: add qemu_io_popen()
  iotests: add nbd-reconnect-on-open test

 qapi/block-core.json  | 12 +++-
 block/nbd.c   | 45 +++-
 nbd/client-connection.c   | 56 +++
 tests/qemu-iotests/iotests.py | 39 ++
 .../qemu-iotests/tests/nbd-reconnect-on-open  | 71 +++
 .../tests/nbd-reconnect-on-open.out   | 11 +++
 6 files changed, 203 insertions(+), 31 deletions(-)
 create mode 100755 tests/qemu-iotests/tests/nbd-reconnect-on-open
 create mode 100644 tests/qemu-iotests/tests/nbd-reconnect-on-open.out

-- 
2.29.2




[PATCH v3 4/9] nbd/client-connection: nbd_co_establish_connection(): return real error

2021-09-06 Thread Vladimir Sementsov-Ogievskiy
The only user of errp is the call to nbd_do_establish_connection() in
nbd_open(). The only way to cancel this call is through the open_timer
timeout. In that case, the user will be more interested in a description
of the last failed connection attempt than in
"Connection attempt cancelled by other operation".

So, let's change the behavior on cancel to return the previous failure
error, if available.

Do the same for the non-blocking failure case. There is still no caller
interested in errp in that case, but let's be consistent.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 nbd/client-connection.c | 50 -
 1 file changed, 34 insertions(+), 16 deletions(-)

diff --git a/nbd/client-connection.c b/nbd/client-connection.c
index 695f855754..722998c985 100644
--- a/nbd/client-connection.c
+++ b/nbd/client-connection.c
@@ -39,16 +39,18 @@ struct NBDClientConnection {
 
 QemuMutex mutex;
 
+NBDExportInfo updated_info;
 /*
- * @sioc and @err represent a connection attempt.  While running
- * is true, they are only used by the connection thread, and mutex
- * locking is not needed.  Once the thread finishes,
- * nbd_co_establish_connection then steals these pointers while
- * under the mutex.
+ * @sioc represents a successful result. While thread is running, @sioc is
+ * used only by thread and not protected by mutex. When thread is not
+ * running, @sioc is stolen by nbd_co_establish_connection() under mutex.
  */
-NBDExportInfo updated_info;
 QIOChannelSocket *sioc;
 QIOChannel *ioc;
+/*
+ * @err represents previous attempt. It may be copied by
+ * nbd_co_establish_connection() when it reports failure.
+ */
 Error *err;
 
 /* All further fields are accessed only under mutex */
@@ -170,18 +172,18 @@ static void *connect_thread_func(void *opaque)
 
 qemu_mutex_lock(&conn->mutex);
 while (!conn->detached) {
+Error *local_err = NULL;
+
 assert(!conn->sioc);
 conn->sioc = qio_channel_socket_new();
 
 qemu_mutex_unlock(&conn->mutex);
 
-error_free(conn->err);
-conn->err = NULL;
 conn->updated_info = conn->initial_info;
 
 ret = nbd_connect(conn->sioc, conn->saddr,
   conn->do_negotiation ? &conn->updated_info : NULL,
-  conn->tlscreds, &conn->ioc, &conn->err);
+  conn->tlscreds, &conn->ioc, &local_err);
 
 /*
  * conn->updated_info will finally be returned to the user. Clear the
@@ -194,6 +196,10 @@ static void *connect_thread_func(void *opaque)
 
 qemu_mutex_lock(&conn->mutex);
 
+error_free(conn->err);
+conn->err = NULL;
+error_propagate(&conn->err, local_err);
+
 if (ret < 0) {
 object_unref(OBJECT(conn->sioc));
 conn->sioc = NULL;
@@ -311,14 +317,17 @@ nbd_co_establish_connection(NBDClientConnection *conn, NBDExportInfo *info,
 }
 
 conn->running = true;
-error_free(conn->err);
-conn->err = NULL;
 qemu_thread_create(, "nbd-connect",
   connect_thread_func, conn, QEMU_THREAD_DETACHED);
 }
 
 if (!blocking) {
-error_setg(errp, "No connection at the moment");
+if (conn->err) {
+error_propagate(errp, error_copy(conn->err));
+} else {
+error_setg(errp, "No connection at the moment");
+}
+
 return NULL;
 }
 
@@ -339,14 +348,23 @@ nbd_co_establish_connection(NBDClientConnection *conn, NBDExportInfo *info,
  * attempt as failed, but leave the connection thread running,
  * to reuse it for the next connection attempt.
  */
-error_setg(errp, "Connection attempt cancelled by other operation");
+if (conn->err) {
+error_propagate(errp, error_copy(conn->err));
+} else {
+error_setg(errp,
+   "Connection attempt cancelled by other operation");
+}
+
 return NULL;
 } else {
-error_propagate(errp, conn->err);
-conn->err = NULL;
-if (!conn->sioc) {
+/* Thread finished. There must be either error or sioc */
+assert(!conn->err != !conn->sioc);
+
+if (conn->err) {
+error_propagate(errp, error_copy(conn->err));
 return NULL;
 }
+
 if (conn->do_negotiation) {
 memcpy(info, &conn->updated_info, sizeof(*info));
 if (conn->ioc) {
-- 
2.29.2
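A simplified Python model of the reporting policy described in the commit message above, assuming strings for errors and an opaque object for the socket; `ConnState`, `record_attempt()` and `establish()` are hypothetical names, not the nbd/client-connection.c API. Each attempt overwrites the stored result, and failure paths report the last real error when one exists, falling back to the generic message otherwise.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class ConnState:
    """Hypothetical, simplified model of the connection's result fields."""
    sioc: Optional[object] = None  # set when the last attempt succeeded
    err: Optional[str] = None      # description of the last failed attempt


def record_attempt(conn: ConnState, sioc: Optional[object],
                   err: Optional[str]) -> None:
    """After each attempt, replace the previous result (success clears err)."""
    conn.sioc = sioc
    conn.err = err


def establish(conn: ConnState, blocking: bool,
              cancelled: bool = False) -> Tuple[Optional[object], Optional[str]]:
    """Return (sioc, error message); exactly one of them is not None."""
    if not blocking and conn.sioc is None:
        return None, conn.err or "No connection at the moment"
    if cancelled:
        # The C code hands out error_copy(conn->err) because the connection
        # keeps ownership of its error; Python strings need no copy.
        return None, conn.err or "Connection attempt cancelled by other operation"
    # Attempt finished: there must be either an error or a socket.
    assert (conn.err is None) != (conn.sioc is None)
    if conn.err is not None:
        return None, conn.err
    return conn.sioc, None
```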




[PATCH v3 7/9] iotests.py: add and use qemu_io_wrap_args()

2021-09-06 Thread Vladimir Sementsov-Ogievskiy
Make the qemu_io* functions support the --image-opts argument, which
conflicts with the -f argument from qemu_io_args.

Use the new wrapper for QemuIoInteractive as well, which allows relying
on the default format.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 tests/qemu-iotests/iotests.py | 22 --
 1 file changed, 12 insertions(+), 10 deletions(-)

diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py
index b518545c09..3b7b57489a 100644
--- a/tests/qemu-iotests/iotests.py
+++ b/tests/qemu-iotests/iotests.py
@@ -208,10 +208,17 @@ def img_info_log(filename, filter_path=None, imgopts=False, extra_args=()):
 filter_path = filename
 log(filter_img_info(output, filter_path))
 
+
+def qemu_io_wrap_args(args: Sequence[str]):
+if '-f' in args or '--image-opts' in args:
+return qemu_io_args_no_fmt + list(args)
+else:
+return qemu_io_args + list(args)
+
+
 def qemu_io(*args):
 '''Run qemu-io and return the stdout data'''
-args = qemu_io_args + list(args)
-return qemu_tool_pipe_and_status('qemu-io', args)[0]
+return qemu_tool_pipe_and_status('qemu-io', qemu_io_wrap_args(args))[0]
 
 def qemu_io_log(*args):
 result = qemu_io(*args)
@@ -220,12 +227,7 @@ def qemu_io_log(*args):
 
 def qemu_io_silent(*args):
 '''Run qemu-io and return the exit code, suppressing stdout'''
-if '-f' in args or '--image-opts' in args:
-default_args = qemu_io_args_no_fmt
-else:
-default_args = qemu_io_args
-
-args = default_args + list(args)
+args = qemu_io_wrap_args(args)
 exitcode = subprocess.call(args, stdout=open('/dev/null', 'w'))
 if exitcode < 0:
 sys.stderr.write('qemu-io received signal %i: %s\n' %
@@ -234,14 +236,14 @@ def qemu_io_silent(*args):
 
 def qemu_io_silent_check(*args):
 '''Run qemu-io and return the true if subprocess returned 0'''
-args = qemu_io_args + list(args)
+args = qemu_io_wrap_args(args)
 exitcode = subprocess.call(args, stdout=open('/dev/null', 'w'),
stderr=subprocess.STDOUT)
 return exitcode == 0
 
 class QemuIoInteractive:
 def __init__(self, *args):
-self.args = qemu_io_args_no_fmt + list(args)
+self.args = qemu_io_wrap_args(args)
 # We need to keep the Popen objext around, and not
 # close it immediately. Therefore, disable the pylint check:
 # pylint: disable=consider-using-with
-- 
2.29.2
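A usage sketch of the wrapper logic, with hypothetical stand-ins for the `qemu_io_args` / `qemu_io_args_no_fmt` lists (the real ones are built by iotests from the environment):

```python
from typing import List, Sequence

# Hypothetical stand-ins: iotests builds the real lists from the environment.
QEMU_IO_ARGS_NO_FMT = ['qemu-io']
QEMU_IO_ARGS = QEMU_IO_ARGS_NO_FMT + ['-f', 'qcow2']


def qemu_io_wrap_args(args: Sequence[str]) -> List[str]:
    """Prepend the default qemu-io arguments.

    The default '-f <fmt>' is left out when the caller passes its own -f
    or uses --image-opts, because --image-opts conflicts with -f.
    """
    if '-f' in args or '--image-opts' in args:
        return QEMU_IO_ARGS_NO_FMT + list(args)
    return QEMU_IO_ARGS + list(args)


# The default format applies:
plain = qemu_io_wrap_args(('-c', 'read 0 1M', 'disk.img'))
# --image-opts: no default -f is added.
opts = qemu_io_wrap_args(('--image-opts', '-c', 'read 0 1M',
                          'driver=nbd,server.type=unix,server.path=nbd-sock'))
```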




[PATCH v3 2/9] qapi: make blockdev-add a coroutine command

2021-09-06 Thread Vladimir Sementsov-Ogievskiy
We are going to support nbd reconnect on open in the next commit. This
means that we want to make several connection attempts over some period
of time. And this should be done in a coroutine, otherwise we'll get stuck.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 qapi/block-core.json | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/qapi/block-core.json b/qapi/block-core.json
index 06674c25c9..6e4042530a 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -4219,7 +4219,8 @@
 # <- { "return": {} }
 #
 ##
-{ 'command': 'blockdev-add', 'data': 'BlockdevOptions', 'boxed': true }
+{ 'command': 'blockdev-add', 'data': 'BlockdevOptions', 'boxed': true,
+  'coroutine': true }
 
 ##
 # @blockdev-reopen:
-- 
2.29.2
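QEMU coroutines are not Python coroutines, but the reasoning can be illustrated with an asyncio analogy. In this sketch (all names are hypothetical), a retry loop that waits between connection attempts hands control back to the event loop with `await asyncio.sleep()`; a blocking `time.sleep()` in the same place would stall every other task on that loop until the loop finished, which is the situation the commit message warns about.

```python
import asyncio
from typing import Callable, Tuple


async def connect_with_retry(try_connect: Callable[[], bool],
                             open_timeout: float,
                             retry_interval: float = 0.05) -> bool:
    """Retry try_connect() until it succeeds or open_timeout elapses.

    open_timeout == 0 means a single attempt.
    """
    loop = asyncio.get_running_loop()
    deadline = loop.time() + open_timeout
    while True:
        if try_connect():
            return True
        if loop.time() >= deadline:
            return False
        # Yield to the event loop between attempts instead of blocking it.
        await asyncio.sleep(retry_interval)


async def demo() -> Tuple[bool, int]:
    server_up = False
    ticks = 0

    async def start_server_later() -> None:
        nonlocal server_up
        await asyncio.sleep(0.2)  # the "server" appears after 0.2 s
        server_up = True

    async def heartbeat() -> None:
        nonlocal ticks
        while not server_up:
            ticks += 1  # other work keeps running while the client retries
            await asyncio.sleep(0.01)

    results = await asyncio.gather(
        connect_with_retry(lambda: server_up, open_timeout=5),
        start_server_later(),
        heartbeat(),
    )
    return results[0], ticks


connected, ticks = asyncio.run(demo())
```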




[PATCH v3 3/9] nbd: allow reconnect on open, with corresponding new options

2021-09-06 Thread Vladimir Sementsov-Ogievskiy
This is useful when the start of the VM and the start of the NBD server
are not easy to synchronize.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 qapi/block-core.json |  9 -
 block/nbd.c  | 45 +++-
 2 files changed, 52 insertions(+), 2 deletions(-)

diff --git a/qapi/block-core.json b/qapi/block-core.json
index 6e4042530a..30d491bcd4 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -3994,6 +3994,12 @@
 #   future requests before a successful reconnect will
 #   immediately fail. Default 0 (Since 4.2)
 #
+# @open-timeout: In seconds. If zero, the nbd driver tries the connection
+#only once, and fails to open if the connection fails.
+#If non-zero, the nbd driver will repeat connection attempts
+#until successful or until @open-timeout seconds have elapsed.
+#Default 0 (Since 6.2)
+#
 # Since: 2.9
 ##
 { 'struct': 'BlockdevOptionsNbd',
@@ -4001,7 +4007,8 @@
 '*export': 'str',
 '*tls-creds': 'str',
 '*x-dirty-bitmap': 'str',
-'*reconnect-delay': 'uint32' } }
+'*reconnect-delay': 'uint32',
+'*open-timeout': 'uint32' } }
 
 ##
 # @BlockdevOptionsRaw:
diff --git a/block/nbd.c b/block/nbd.c
index 306b2de9f2..38a503102c 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -80,6 +80,7 @@ typedef struct BDRVNBDState {
 NBDClientState state;
 
 QEMUTimer *reconnect_delay_timer;
+QEMUTimer *open_timer;
 
 NBDClientRequest requests[MAX_NBD_REQUESTS];
 NBDReply reply;
@@ -87,6 +88,7 @@ typedef struct BDRVNBDState {
 
 /* Connection parameters */
 uint32_t reconnect_delay;
+uint32_t open_timeout;
 SocketAddress *saddr;
 char *export, *tlscredsid;
 QCryptoTLSCreds *tlscreds;
@@ -218,6 +220,32 @@ static void nbd_teardown_connection(BlockDriverState *bs)
 s->state = NBD_CLIENT_QUIT;
 }
 
+static void open_timer_del(BDRVNBDState *s)
+{
+if (s->open_timer) {
+timer_free(s->open_timer);
+s->open_timer = NULL;
+}
+}
+
+static void open_timer_cb(void *opaque)
+{
+BDRVNBDState *s = opaque;
+
+nbd_co_establish_connection_cancel(s->conn);
+open_timer_del(s);
+}
+
+static void open_timer_init(BDRVNBDState *s, uint64_t expire_time_ns)
+{
+assert(!s->open_timer);
+s->open_timer = aio_timer_new(bdrv_get_aio_context(s->bs),
+  QEMU_CLOCK_REALTIME,
+  SCALE_NS,
+  open_timer_cb, s);
+timer_mod(s->open_timer, expire_time_ns);
+}
+
 static bool nbd_client_connecting(BDRVNBDState *s)
 {
 NBDClientState state = qatomic_load_acquire(&s->state);
@@ -1737,6 +1765,15 @@ static QemuOptsList nbd_runtime_opts = {
 "future requests before a successful reconnect will "
 "immediately fail. Default 0",
 },
+{
+.name = "open-timeout",
+.type = QEMU_OPT_NUMBER,
+.help = "In seconds. If zero, the nbd driver tries the connection "
+"only once, and fails to open if the connection fails. "
+"If non-zero, the nbd driver will repeat connection "
+"attempts until successful or until @open-timeout seconds "
+"have elapsed. Default 0",
+},
 { /* end of list */ }
 },
 };
@@ -1792,6 +1829,7 @@ static int nbd_process_options(BlockDriverState *bs, QDict *options,
 }
 
 s->reconnect_delay = qemu_opt_get_number(opts, "reconnect-delay", 0);
+s->open_timeout = qemu_opt_get_number(opts, "open-timeout", 0);
 
 ret = 0;
 
@@ -1823,7 +1861,12 @@ static int nbd_open(BlockDriverState *bs, QDict *options, int flags,
 s->conn = nbd_client_connection_new(s->saddr, true, s->export,
 s->x_dirty_bitmap, s->tlscreds);
 
-/* TODO: Configurable retry-until-timeout behaviour. */
+if (s->open_timeout) {
+nbd_client_connection_enable_retry(s->conn);
+open_timer_init(s, qemu_clock_get_ns(QEMU_CLOCK_REALTIME) +
+s->open_timeout * NANOSECONDS_PER_SECOND);
+}
+
 s->state = NBD_CLIENT_CONNECTING_WAIT;
 ret = nbd_do_establish_connection(bs, errp);
 if (ret < 0) {
-- 
2.29.2
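The documented `open-timeout` semantics (try once when zero, otherwise retry until success or until the timeout elapses) can be modelled in a few lines. This is a Python sketch under stated assumptions, not the block/nbd.c implementation: the real driver arms a QEMU timer that cancels the connection attempt, whereas here the deadline is checked in a loop, the fixed one-second retry interval is an assumption, and a fake clock keeps the example deterministic.

```python
from typing import Callable, Optional


def open_with_timeout(try_connect: Callable[[], Optional[str]],
                      open_timeout: int,
                      now: Callable[[], float],
                      sleep: Callable[[float], None],
                      retry_interval: float = 1.0) -> Optional[str]:
    """Return None on success, else the error of the last attempt.

    try_connect() returns None when the connection succeeds and an error
    string when it fails. open_timeout == 0 means exactly one attempt.
    """
    deadline = now() + open_timeout
    while True:
        err = try_connect()
        if err is None or open_timeout == 0:
            return err
        if now() >= deadline:
            return err
        # Wait before retrying, but never past the deadline.
        sleep(min(retry_interval, deadline - now()))


class FakeClock:
    """Deterministic clock so the example never really sleeps."""

    def __init__(self) -> None:
        self.t = 0.0

    def now(self) -> float:
        return self.t

    def sleep(self, seconds: float) -> None:
        self.t += seconds


# The "server" becomes reachable 3 seconds after the client starts.
clock = FakeClock()
result = open_with_timeout(
    lambda: None if clock.t >= 3 else "Connection refused",
    open_timeout=5, now=clock.now, sleep=clock.sleep)
```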




[PATCH v3 1/9] nbd/client-connection: nbd_co_establish_connection(): fix non set errp

2021-09-06 Thread Vladimir Sementsov-Ogievskiy
When we don't have a connection and blocking is false, we return NULL
but don't set errp. That's wrong.

We have two paths for calling nbd_co_establish_connection():

1. nbd_open() -> nbd_do_establish_connection() -> ...
  but that will never set blocking=false

2. nbd_reconnect_attempt() -> nbd_co_do_establish_connection() -> ...
  but that uses errp=NULL

So, in practice the incorrect errp handling in
nbd_co_establish_connection() is harmless. Still, let's fix it.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 nbd/client-connection.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/nbd/client-connection.c b/nbd/client-connection.c
index 7123b1e189..695f855754 100644
--- a/nbd/client-connection.c
+++ b/nbd/client-connection.c
@@ -318,6 +318,7 @@ nbd_co_establish_connection(NBDClientConnection *conn, NBDExportInfo *info,
 }
 
 if (!blocking) {
+error_setg(errp, "No connection at the moment");
 return NULL;
 }
 
-- 
2.29.2




Re: [PULL] qemu-socket unix socket bugfix 2021-09-06

2021-09-06 Thread Michael Tokarev

06.09.2021 21:41, Peter Maydell wrote:
..

Hi. gpg says the key you signed this with has expired:

gpg: Signature made Mon 06 Sep 2021 16:19:32 BST
gpg:using RSA key 7B73BAD68BE7A2C289314B22701B4F6B1A693E59
gpg:issuer "m...@tls.msk.ru"
gpg: Good signature from "Michael Tokarev " [expired]
gpg: aka "Michael Tokarev " [expired]
gpg: aka "Michael Tokarev " [expired]
gpg: Note: This key has expired!
Primary key fingerprint: 6EE1 95D1 886E 8FFB 810D  4324 457C E0A0 8044 65C5
  Subkey fingerprint: 7B73 BAD6 8BE7 A2C2 8931  4B22 701B 4F6B 1A69 3E59


Um.


Assuming you've renewed the key,  can you give me a keyserver I can
download the updated version from, please?


Sure.  I thought I uploaded it long ago.

I just uploaded my key to pgp.mit.edu and keys.openpgp.org.
Hopefully these work properly with subkeys.

And it can definitely be fetched from keyring.debian.org.

Thanks!

/mjt



Re: [RFC PATCH 0/2] riscv: Adding custom CSR related Kconfig options

2021-09-06 Thread Richard Henderson

On 9/6/21 9:05 AM, Alistair Francis wrote:

I honestly don't see a scenario where that happens. The maintenance
overhead and confusion of changing the CPUs at build time is too high.


Yes indeed.  One qemu image should support all cpu variations at once.


I also don't think we should need that for CSR accesses. Custom
instructions are a whole different can of worms.


Custom instruction sets are manageable.

In general, we use separate decodetree instances, and select on them at a high level of 
translation.  We have examples of this in both target/arm/ and target/mips/ dealing with 
different ISAs.


Prod me when you get to that point.


r~



Re: [PULL] qemu-socket unix socket bugfix 2021-09-06

2021-09-06 Thread Peter Maydell
On Mon, 6 Sept 2021 at 16:21, Michael Tokarev  wrote:
>
> The following changes since commit 935efca6c246c108253b0e4e51cc87648fc7ca10:
>
>   Merge remote-tracking branch 'remotes/thuth-gitlab/tags/pull-request-2021-09-06' into staging (2021-09-06 12:38:07 +0100)
>
> are available in the Git repository at:
>
>   git://git.corpit.ru/qemu.git tags/patch-fetch
>
> for you to fetch changes up to 118d527f2e4baec5fe8060b22a6212468b8e4d3f:
>
>   qemu-sockets: fix unix socket path copy (again) (2021-09-06 17:18:54 +0300)
>
> 
> qemu-socket unix socket bugfix 2021-09-06
>
> 
> Michael Tokarev (1):
>   qemu-sockets: fix unix socket path copy (again)

Hi. gpg says the key you signed this with has expired:

gpg: Signature made Mon 06 Sep 2021 16:19:32 BST
gpg:using RSA key 7B73BAD68BE7A2C289314B22701B4F6B1A693E59
gpg:issuer "m...@tls.msk.ru"
gpg: Good signature from "Michael Tokarev " [expired]
gpg: aka "Michael Tokarev " [expired]
gpg: aka "Michael Tokarev " [expired]
gpg: Note: This key has expired!
Primary key fingerprint: 6EE1 95D1 886E 8FFB 810D  4324 457C E0A0 8044 65C5
 Subkey fingerprint: 7B73 BAD6 8BE7 A2C2 8931  4B22 701B 4F6B 1A69 3E59

Assuming you've renewed the key,  can you give me a keyserver I can
download the updated version from, please?

thanks
-- PMM



Re: [PATCH] hw/display/ati_2d: Fix buffer overflow in ati_2d_blt (CVE-2021-3638)

2021-09-06 Thread Philippe Mathieu-Daudé
(Forgot to Cc Alex for a possible reproducer)

On 9/6/21 6:44 PM, Mauro Matteo Cascella wrote:
> On Mon, Sep 6, 2021 at 5:31 PM Philippe Mathieu-Daudé  wrote:
>>
>> When building QEMU with DEBUG_ATI defined then running with
>> '-device ati-vga,romfile="" -d unimp,guest_errors -trace ati\*'
>> we get:
>>
>>   ati_mm_write 4 0x16c0 DP_CNTL <- 0x1
>>   ati_mm_write 4 0x146c DP_GUI_MASTER_CNTL <- 0x2
>>   ati_mm_write 4 0x16c8 DP_MIX <- 0xff
>>   ati_mm_write 4 0x16c4 DP_DATATYPE <- 0x2
>>   ati_mm_write 4 0x224 CRTC_OFFSET <- 0x0
>>   ati_mm_write 4 0x142c DST_PITCH_OFFSET <- 0xfe0
>>   ati_mm_write 4 0x1420 DST_Y <- 0x3fff
>>   ati_mm_write 4 0x1410 DST_HEIGHT <- 0x3fff
>>   ati_mm_write 4 0x1588 DST_WIDTH_X <- 0x3fff3fff
>>   ati_2d_blt: vram:0x7fff5fa0 addr:0 ds:0x7fff61273800 stride:2560 bpp:32 rop:0xff
>>   ati_2d_blt: 0 0 0, 0 127 0, (0,0) -> (16383,16383) 16383x16383 > ^
>>   ati_2d_blt: pixman_fill(dst:0x7fff5fa0, stride:254, bpp:8, x:16383, y:16383, w:16383, h:16383, xor:0xff00)
>>   Thread 3 "qemu-system-i38" received signal SIGSEGV, Segmentation fault.
>>   (gdb) bt
>>   #0  0x77f62ce0 in sse2_fill.lto_priv () at /lib64/libpixman-1.so.0
>>   #1  0x77f09278 in pixman_fill () at /lib64/libpixman-1.so.0
>>   #2  0x57b5a9af in ati_2d_blt (s=0x63128800) at hw/display/ati_2d.c:196
>>   #3  0x57b4b5a2 in ati_mm_write (opaque=0x63128800, addr=5512, data=1073692671, size=4) at hw/display/ati.c:843
>>   #4  0x58b90ec4 in memory_region_write_accessor (mr=0x63139cc0, addr=5512, ..., size=4, ...) at softmmu/memory.c:492
>>
>> Commit 584acf34cb0 ("ati-vga: Fix reverse bit blts") introduced
>> the local dst_x and dst_y which adjust the (x, y) coordinates
>> depending on the direction in the SRCCOPY ROP3 operation, but
>> forgot to address the same issue for the PATCOPY, BLACKNESS and
>> WHITENESS operations, which also call pixman_fill().
>>
>> Fix that now by using the adjusted coordinates in the pixman_fill
>> call, and update the related debug printf().
>>
>> Reported-by: Qiang Liu 
>> Fixes: 584acf34cb0 ("ati-vga: Fix reverse bit blts")
>> Signed-off-by: Philippe Mathieu-Daudé 
>> ---
>>  hw/display/ati_2d.c | 6 +++---
>>  1 file changed, 3 insertions(+), 3 deletions(-)
>>
>> diff --git a/hw/display/ati_2d.c b/hw/display/ati_2d.c
>> index 4dc10ea7952..692bec91de4 100644
>> --- a/hw/display/ati_2d.c
>> +++ b/hw/display/ati_2d.c
>> @@ -84,7 +84,7 @@ void ati_2d_blt(ATIVGAState *s)
>>  DPRINTF("%d %d %d, %d %d %d, (%d,%d) -> (%d,%d) %dx%d %c %c\n",
>>  s->regs.src_offset, s->regs.dst_offset, s->regs.default_offset,
>>  s->regs.src_pitch, s->regs.dst_pitch, s->regs.default_pitch,
>> -s->regs.src_x, s->regs.src_y, s->regs.dst_x, s->regs.dst_y,
>> +s->regs.src_x, s->regs.src_y, dst_x, dst_y,
>>  s->regs.dst_width, s->regs.dst_height,
>>  (s->regs.dp_cntl & DST_X_LEFT_TO_RIGHT ? '>' : '<'),
>>  (s->regs.dp_cntl & DST_Y_TOP_TO_BOTTOM ? 'v' : '^'));
>> @@ -180,11 +180,11 @@ void ati_2d_blt(ATIVGAState *s)
>>  dst_stride /= sizeof(uint32_t);
>>  DPRINTF("pixman_fill(%p, %d, %d, %d, %d, %d, %d, %x)\n",
>>  dst_bits, dst_stride, bpp,
>> -s->regs.dst_x, s->regs.dst_y,
>> +dst_x, dst_y,
>>  s->regs.dst_width, s->regs.dst_height,
>>  filler);
>>  pixman_fill((uint32_t *)dst_bits, dst_stride, bpp,
>> -s->regs.dst_x, s->regs.dst_y,
>> +dst_x, dst_y,
>>  s->regs.dst_width, s->regs.dst_height,
>>  filler);
>>  if (dst_bits >= s->vga.vram_ptr + s->vga.vbe_start_addr &&
>> --
>> 2.31.1
>>
> 
> Tested-by: Mauro Matteo Cascella 

Thanks. I wouldn't be surprised if we get another CVE in this code /
file / function as soon as this patch gets merged... The code calls for a
rewrite, as the comment at the top of the function admits:

void ati_2d_blt(ATIVGAState *s)
{
/* FIXME it is probably more complex than this and may need to be */
/* rewritten but for now as a start just to get some output: */

Regards,

Phil.
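
The coordinate adjustment described in the commit message can be illustrated with a small standalone sketch. It is not QEMU's ati_2d.c code: the BltRegs struct, blt_dst_rect() and the flag values (stand-ins for DST_X_LEFT_TO_RIGHT and DST_Y_TOP_TO_BOTTOM) are assumptions made for illustration, and the range check goes beyond this patch, which only switches pixman_fill() to the adjusted dst_x/dst_y.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative stand-ins for the direction bits in DP_CNTL (assumed values). */
#define SKETCH_X_LEFT_TO_RIGHT 0x1u
#define SKETCH_Y_TOP_TO_BOTTOM 0x2u

/* Hypothetical subset of the destination registers a guest programs. */
typedef struct {
    uint32_t dst_x, dst_y;
    uint32_t dst_width, dst_height;
    uint32_t dp_cntl;
} BltRegs;

/*
 * Compute the top-left corner of the destination rectangle and check that
 * the whole rectangle lies inside a surface of surf_w x surf_h pixels.
 * For a reverse (right-to-left or bottom-to-top) blit the registers name the
 * last pixel, so the origin is found by stepping back (size - 1) pixels.
 * Returns false if the origin would be negative or the rectangle overflows.
 */
static bool blt_dst_rect(const BltRegs *r, uint32_t surf_w, uint32_t surf_h,
                         uint32_t *x, uint32_t *y)
{
    uint64_t x0, y0;

    assert(r != NULL && x != NULL && y != NULL);
    if (r->dp_cntl & SKETCH_X_LEFT_TO_RIGHT) {
        x0 = r->dst_x;
    } else if ((uint64_t)r->dst_x + 1 >= r->dst_width) {
        x0 = (uint64_t)r->dst_x + 1 - r->dst_width;
    } else {
        return false;
    }
    if (r->dp_cntl & SKETCH_Y_TOP_TO_BOTTOM) {
        y0 = r->dst_y;
    } else if ((uint64_t)r->dst_y + 1 >= r->dst_height) {
        y0 = (uint64_t)r->dst_y + 1 - r->dst_height;
    } else {
        return false;
    }
    /* 64-bit sums of 32-bit values cannot wrap. */
    if (x0 + r->dst_width > surf_w || y0 + r->dst_height > surf_h) {
        return false;
    }
    *x = (uint32_t)x0;
    *y = (uint32_t)y0;
    return true;
}
```

With the reproducer's register values (DP_CNTL = 0x1, i.e. left to right but bottom to top, and 0x3fff for the destination position and size) the origin works out to (16383, 1), and the 16383x16383 rectangle is rejected for any plausible surface size instead of reaching pixman_fill().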




Re: [PATCH] linux-user: manage binfmt-misc preserve-arg[0] flag

2021-09-06 Thread Michael Tokarev
> Add --preserve-argv0 in qemu-binfmt-conf.sh to configure the preserve-argv0
> flag.
...
> diff --git a/linux-user/main.c b/linux-user/main.c
> @@ -697,6 +707,20 @@ int main(int argc, char **argv, char **envp)
>  }
>  }
>  
> +/*
> + * get binfmt_misc flags
> + */
> +preserve_argv0 = !!(qemu_getauxval(AT_FLAGS) & AT_FLAGS_PRESERVE_ARGV0);
> +
> +/*
> + * Manage binfmt-misc preserve-arg[0] flag
> + *argv[optind] full path to the binary
> + *argv[optind + 1] original argv[0]
> + */
> +if (optind + 1 < argc && preserve_argv0) {
> +optind++;
> +}

Please note: this code is executed after parse_args(), which is called
way up, and parse_args() may already have messed with the options and optind.

This is sort of a corner case really, but we rely on argv[1][0] being
!= '-'.  I think it is better to explicitly omit the call to parse_args()
for the AT_FLAGS_PRESERVE_ARGV0 case.  But parse_args() is apparently
a misnomer, since it also parses $ENVironment variables; that part
should be run either way.

I noticed this because it interferes with my change in this area that
enables similar functionality (detecting the binfmt usage) but without
requiring any kernel changes, working with any kernel version (it
has been discussed previously).  With both my code and this code in place
and the patched kernel, we update optind TWICE, once in parse_args() and
a second time here.  This has already caused someone's filesystem to be
wiped out, due to the wrong options being used.

Thanks,

/mjt
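
The argv layout described in the patch comment, together with the concern about shifting optind exactly once, can be sketched as a single helper. split_guest_args() and GuestArgs are hypothetical names, not linux-user/main.c code; the sketch assumes the index passed in already points at the first non-option argument left by option parsing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical result of splitting what is left after option parsing. */
typedef struct {
    const char *exec_path;  /* file the emulator should load */
    char **guest_argv;      /* argv as the guest program will see it */
    int guest_argc;
} GuestArgs;

/*
 * first: index of the first non-option argument.
 * With binfmt_misc's preserve-argv[0] flag, argv[first] is the full path
 * of the binary and argv[first + 1] the original argv[0]; without it,
 * argv[first] serves as both.  The shift happens here and nowhere else.
 */
static bool split_guest_args(int argc, char **argv, int first,
                             bool preserve_argv0, GuestArgs *out)
{
    assert(argv != NULL && out != NULL);
    if (first >= argc) {
        return false;               /* no program to run */
    }
    out->exec_path = argv[first];
    if (preserve_argv0 && first + 1 < argc) {
        first++;                    /* skip the path, keep the original argv[0] */
    }
    out->guest_argv = &argv[first];
    out->guest_argc = argc - first;
    return true;
}
```

Keeping the preserve-argv[0] shift in one function like this is one way to make sure a second detection mechanism cannot advance the index again.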



Re: [PATCH v2 2/5] s390x: kvm: topology: interception of PTF instruction

2021-09-06 Thread Thomas Huth

On 22/07/2021 19.42, Pierre Morel wrote:

Interception of the PTF instruction depending on the new
KVM_CAP_S390_CPU_TOPOLOGY KVM extension.

Signed-off-by: Pierre Morel 
---
  hw/s390x/s390-virtio-ccw.c | 45 ++
  include/hw/s390x/s390-virtio-ccw.h |  7 +
  target/s390x/kvm/kvm.c | 21 ++
  3 files changed, 73 insertions(+)

diff --git a/hw/s390x/s390-virtio-ccw.c b/hw/s390x/s390-virtio-ccw.c
index e4b18aef49..500e856974 100644
--- a/hw/s390x/s390-virtio-ccw.c
+++ b/hw/s390x/s390-virtio-ccw.c
@@ -404,6 +404,49 @@ static void s390_pv_prepare_reset(S390CcwMachineState *ms)
  s390_pv_prep_reset();
  }
  
+int s390_handle_ptf(S390CPU *cpu, uint8_t r1, uintptr_t ra)
+{
+S390CcwMachineState *ms = S390_CCW_MACHINE(qdev_get_machine());
+CPUS390XState *env = &cpu->env;
+uint64_t reg = env->regs[r1];
+uint8_t fc = reg & S390_TOPO_FC_MASK;
+
+if (!s390_has_feat(S390_FEAT_CONFIGURATION_TOPOLOGY)) {
+s390_program_interrupt(env, PGM_OPERAND, ra);


I think that should be PGM_OPERATION instead?


+return 0;
+}
+
+if (env->psw.mask & PSW_MASK_PSTATE) {
+s390_program_interrupt(env, PGM_PRIVILEGED, ra);
+return 0;
+}
+
+if (reg & ~S390_TOPO_FC_MASK) {
+s390_program_interrupt(env, PGM_SPECIFICATION, ra);
+return 0;
+}
+
+switch (fc) {
+case 0: /* Horizontal polarization is already set */
+env->regs[r1] = S390_PTF_REASON_DONE;
+return 2;
+case 1: /* Vertical polarization is not supported */
+env->regs[r1] = S390_PTF_REASON_NONE;



This way, you're clearing the bits in the FC field. Is this intended by the 
architecture? If I get the PoP right, it just sets the bits in the RC field, 
but likely it should not clear the 1 in the FC field? Did you try on LPAR or 
z/VM to see what happens there?



+return 2;
+case 2: /* Report if a topology change report is pending */
+if (ms->topology_change_report_pending) {
+ms->topology_change_report_pending = false;
+return 1;
+}
+return 0;
+default:
+s390_program_interrupt(env, PGM_SPECIFICATION, ra);
+break;


Just a matter of taste - but you could drop the break here.


+}
+
+return 0;
+}
+
  static void s390_machine_reset(MachineState *machine)
  {
  S390CcwMachineState *ms = S390_CCW_MACHINE(machine);
@@ -433,6 +476,8 @@ static void s390_machine_reset(MachineState *machine)
  run_on_cpu(cs, s390_do_cpu_ipl, RUN_ON_CPU_NULL);
  break;
  case S390_RESET_MODIFIED_CLEAR:
+/* clear topology_change_report pending condition on subsystem reset */
+ms->topology_change_report_pending = false;
  /*
   * Susbsystem reset needs to be done before we unshare memory
   * and lose access to VIRTIO structures in guest memory.
diff --git a/include/hw/s390x/s390-virtio-ccw.h b/include/hw/s390x/s390-virtio-ccw.h
index 3331990e02..fbde357332 100644
--- a/include/hw/s390x/s390-virtio-ccw.h
+++ b/include/hw/s390x/s390-virtio-ccw.h
@@ -27,9 +27,16 @@ struct S390CcwMachineState {
  bool aes_key_wrap;
  bool dea_key_wrap;
  bool pv;
+bool topology_change_report_pending;
  uint8_t loadparm[8];
  };
  
+#define S390_PTF_REASON_NONE (0x00 << 8)
+#define S390_PTF_REASON_DONE (0x01 << 8)
+#define S390_PTF_REASON_BUSY (0x02 << 8)
+#define S390_TOPO_FC_MASK 0xffUL
+int s390_handle_ptf(S390CPU *cpu, uint8_t r1, uintptr_t ra);
+
  struct S390CcwMachineClass {
  /*< private >*/
  MachineClass parent_class;
diff --git a/target/s390x/kvm/kvm.c b/target/s390x/kvm/kvm.c
index 5b1fdb55c4..9a0c13d4ac 100644
--- a/target/s390x/kvm/kvm.c
+++ b/target/s390x/kvm/kvm.c
@@ -97,6 +97,7 @@
  
  #define PRIV_B9_EQBS 0x9c
  #define PRIV_B9_CLP 0xa0
+#define PRIV_B9_PTF 0xa2
  #define PRIV_B9_PCISTG  0xd0
  #define PRIV_B9_PCILG   0xd2
  #define PRIV_B9_RPCIT   0xd3
@@ -1452,6 +1453,16 @@ static int kvm_mpcifc_service_call(S390CPU *cpu, struct kvm_run *run)
  }
  }
  
+static int kvm_handle_ptf(S390CPU *cpu, struct kvm_run *run)
+{
+uint8_t r1 = (run->s390_sieic.ipb >> 20) & 0x0f;
+uint8_t ret;


Why is ret a uint8_t? s390_handle_ptf() returns an "int".


+ret = s390_handle_ptf(cpu, r1, RA_IGNORED);
+setcc(cpu, ret);
+return 0;
+}


 Thomas
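
To make the review points concrete, here is a standalone model of the PTF function-code dispatch. ptf_model(), ptf_set_reason() and the SKETCH_* constants are hypothetical; the condition code is carried as an int (the type s390_handle_ptf() returns), and the reason code is merged into bits 8-15 while leaving the FC byte alone. Whether FC must in fact be preserved is the open question raised above, so ptf_set_reason() encodes one possible reading of the PoP, not verified hardware behavior. The R1 extraction mirrors the (ipb >> 20) & 0x0f expression in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Sketch constants mirroring S390_TOPO_FC_MASK and the patch's reason codes. */
#define SKETCH_FC_MASK      0xffULL
#define SKETCH_REASON_SHIFT 8
#define SKETCH_REASON_MASK  (0xffULL << SKETCH_REASON_SHIFT)
#define SKETCH_REASON_NONE  0x00
#define SKETCH_REASON_DONE  0x01
#define SKETCH_SPEC_EXCP    (-1)    /* stands in for PGM_SPECIFICATION */

/* R1 field of an RRE-format instruction: bits 20-23 of the IPB word. */
static unsigned ptf_r1_from_ipb(uint32_t ipb)
{
    return (ipb >> 20) & 0x0f;
}

/* Store a reason code in bits 8-15 without clearing the FC in bits 0-7. */
static uint64_t ptf_set_reason(uint64_t reg, uint8_t reason)
{
    return (reg & ~SKETCH_REASON_MASK) |
           ((uint64_t)reason << SKETCH_REASON_SHIFT);
}

/* Returns the condition code, or SKETCH_SPEC_EXCP where the patch
 * would raise a specification exception. */
static int ptf_model(uint64_t *reg, bool *change_pending)
{
    assert(reg != NULL && change_pending != NULL);
    if (*reg & ~SKETCH_FC_MASK) {
        return SKETCH_SPEC_EXCP;            /* reserved bits set */
    }
    switch (*reg & SKETCH_FC_MASK) {
    case 0:                                 /* horizontal polarization already set */
        *reg = ptf_set_reason(*reg, SKETCH_REASON_DONE);
        return 2;
    case 1:                                 /* vertical polarization unsupported */
        *reg = ptf_set_reason(*reg, SKETCH_REASON_NONE);
        return 2;
    case 2:                                 /* topology change report pending? */
        if (*change_pending) {
            *change_pending = false;
            return 1;
        }
        return 0;
    default:
        return SKETCH_SPEC_EXCP;
    }
}
```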




[qemu-web RFC] CONTRIBUTING.md: Mention maintainers

2021-09-06 Thread Hanna Reitz
All patches to the QEMU website should be CC-ed to the website
maintainers, who (right now) are Thomas and Paolo.

Signed-off-by: Hanna Reitz 
---
This is an RFC, first, because I feel bad about sending a patch that
assigns responsibilities to people other than me.  But Thomas seemed to
agree with me that making this requirement explicit would be nice, so I
guess someone has to send a patch for it...

Second, I'm not sure whether this is the ideal place.  Perhaps we should
have a MAINTAINERS file, but on the other hand, maybe that would be a
bit too much.
---
 CONTRIBUTING.md | 5 +
 1 file changed, 5 insertions(+)

diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
index b5209ac..d5cbf07 100644
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -5,6 +5,11 @@ the the developer mailing list:
 
 https://lists.nongnu.org/mailman/listinfo/qemu-devel
 
+You should also CC the website maintainers:
+
+* Thomas Huth 
+* Paolo Bonzini 
+
 For further guidance on sending patches consult:
 
 https://wiki.qemu.org/Contribute/SubmitAPatch
-- 
2.31.1




Re: [PATCH] hw/display/ati_2d: Fix buffer overflow in ati_2d_blt (CVE-2021-3638)

2021-09-06 Thread Mauro Matteo Cascella
On Mon, Sep 6, 2021 at 5:31 PM Philippe Mathieu-Daudé  wrote:
>
> When building QEMU with DEBUG_ATI defined then running with
> '-device ati-vga,romfile="" -d unimp,guest_errors -trace ati\*'
> we get:
>
>   ati_mm_write 4 0x16c0 DP_CNTL <- 0x1
>   ati_mm_write 4 0x146c DP_GUI_MASTER_CNTL <- 0x2
>   ati_mm_write 4 0x16c8 DP_MIX <- 0xff
>   ati_mm_write 4 0x16c4 DP_DATATYPE <- 0x2
>   ati_mm_write 4 0x224 CRTC_OFFSET <- 0x0
>   ati_mm_write 4 0x142c DST_PITCH_OFFSET <- 0xfe0
>   ati_mm_write 4 0x1420 DST_Y <- 0x3fff
>   ati_mm_write 4 0x1410 DST_HEIGHT <- 0x3fff
>   ati_mm_write 4 0x1588 DST_WIDTH_X <- 0x3fff3fff
>   ati_2d_blt: vram:0x7fff5fa0 addr:0 ds:0x7fff61273800 stride:2560 bpp:32 rop:0xff
>   ati_2d_blt: 0 0 0, 0 127 0, (0,0) -> (16383,16383) 16383x16383 > ^
>   ati_2d_blt: pixman_fill(dst:0x7fff5fa0, stride:254, bpp:8, x:16383, y:16383, w:16383, h:16383, xor:0xff00)
>   Thread 3 "qemu-system-i38" received signal SIGSEGV, Segmentation fault.
>   (gdb) bt
>   #0  0x77f62ce0 in sse2_fill.lto_priv () at /lib64/libpixman-1.so.0
>   #1  0x77f09278 in pixman_fill () at /lib64/libpixman-1.so.0
>   #2  0x57b5a9af in ati_2d_blt (s=0x63128800) at hw/display/ati_2d.c:196
>   #3  0x57b4b5a2 in ati_mm_write (opaque=0x63128800, addr=5512, data=1073692671, size=4) at hw/display/ati.c:843
>   #4  0x58b90ec4 in memory_region_write_accessor (mr=0x63139cc0, addr=5512, ..., size=4, ...) at softmmu/memory.c:492
>
> Commit 584acf34cb0 ("ati-vga: Fix reverse bit blts") introduced
> the local dst_x and dst_y which adjust the (x, y) coordinates
> depending on the direction in the SRCCOPY ROP3 operation, but
> forgot to address the same issue for the PATCOPY, BLACKNESS and
> WHITENESS operations, which also call pixman_fill().
>
> Fix that now by using the adjusted coordinates in the pixman_fill
> call, and update the related debug printf().
>
> Reported-by: Qiang Liu 
> Fixes: 584acf34cb0 ("ati-vga: Fix reverse bit blts")
> Signed-off-by: Philippe Mathieu-Daudé 
> ---
>  hw/display/ati_2d.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/hw/display/ati_2d.c b/hw/display/ati_2d.c
> index 4dc10ea7952..692bec91de4 100644
> --- a/hw/display/ati_2d.c
> +++ b/hw/display/ati_2d.c
> @@ -84,7 +84,7 @@ void ati_2d_blt(ATIVGAState *s)
>  DPRINTF("%d %d %d, %d %d %d, (%d,%d) -> (%d,%d) %dx%d %c %c\n",
>  s->regs.src_offset, s->regs.dst_offset, s->regs.default_offset,
>  s->regs.src_pitch, s->regs.dst_pitch, s->regs.default_pitch,
> -s->regs.src_x, s->regs.src_y, s->regs.dst_x, s->regs.dst_y,
> +s->regs.src_x, s->regs.src_y, dst_x, dst_y,
>  s->regs.dst_width, s->regs.dst_height,
>  (s->regs.dp_cntl & DST_X_LEFT_TO_RIGHT ? '>' : '<'),
>  (s->regs.dp_cntl & DST_Y_TOP_TO_BOTTOM ? 'v' : '^'));
> @@ -180,11 +180,11 @@ void ati_2d_blt(ATIVGAState *s)
>  dst_stride /= sizeof(uint32_t);
>  DPRINTF("pixman_fill(%p, %d, %d, %d, %d, %d, %d, %x)\n",
>  dst_bits, dst_stride, bpp,
> -s->regs.dst_x, s->regs.dst_y,
> +dst_x, dst_y,
>  s->regs.dst_width, s->regs.dst_height,
>  filler);
>  pixman_fill((uint32_t *)dst_bits, dst_stride, bpp,
> -s->regs.dst_x, s->regs.dst_y,
> +dst_x, dst_y,
>  s->regs.dst_width, s->regs.dst_height,
>  filler);
>  if (dst_bits >= s->vga.vram_ptr + s->vga.vbe_start_addr &&
> --
> 2.31.1
>

Tested-by: Mauro Matteo Cascella 

Thanks.
-- 
Mauro Matteo Cascella
Red Hat Product Security
PGP-Key ID: BB3410B0




Re: Guest Agent issue with 'guest-get-osinfo' command on Windows

2021-09-06 Thread Konstantin Kostiuk
On Mon, Sep 6, 2021 at 6:59 PM Richard W.M. Jones  wrote:

> On Mon, Sep 06, 2021 at 06:45:08PM +0300, Konstantin Kostiuk wrote:
> > Hi All,
> >
> > I reviewed glib, libguestfs, and libosinfo tools. All tools read the
> > registry to get information about Windows but read different registry
> > values. All information is returned in a localized form.
> > Related key: HKLM\SOFTWARE\Microsoft\Windows NT\CurrentVersion
> > We can get 'pretty-name' from 'ProductName' value (all tools use it).
> > About 'version', there are three variants:
> > 1. Set 'version' equal to 'kernel-version'. libguestfs and libosinfo
> > have this behavior.
> > 2. Read 'version' from 'ReleaseId' value. glib has this behavior. In the
> > case of Windows Server 2022, 'ReleaseId' equals 2009.
> > 3. Read 'version' from 'DisplayVersion' value. In the case of Windows
> > Server 2022, 'DisplayVersion' equals 21H2.
>
> The important point is, however you get it, return the information as
> a libosinfo short value ("win2k22" in this case).
>

To get this string, libguestfs just uses a set of conditions: it returns
"win2k22" if the Windows type is 'server' and ProductName contains 2022.

But as far as I know, the guest agent does not return a short name.


>
> > What do you think about this solution instead of using a conversion
> > matrix?
> > Which version should we use in this case?
>
> If you need to cover old and new versions of Windows then there's no
> good way.  You just need lots of conditionals and to constantly evolve
> the code as new versions come out.
>
> Rich.
>
> --
> Richard Jones, Virtualization Group, Red Hat
> http://people.redhat.com/~rjones
> Read my programming and virtualization blog: http://rwmj.wordpress.com
> Fedora Windows cross-compiler. Compile Windows programs, test, and
> build Windows installers. Over 100 libraries supported.
> http://fedoraproject.org/wiki/MinGW
>
>
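
The libguestfs-style approach described above, a list of conditions on ProductName and the product type, could look roughly like the following C sketch. The rule table, OsinfoRule and osinfo_short_id() are hypothetical, and how the agent learns whether the guest is a server edition is left out. Only "win2k22" comes from this thread; the other short IDs are examples and should be checked against the libosinfo database.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* One illustrative rule: a ProductName substring plus the product type. */
typedef struct {
    const char *needle;     /* substring searched for in ProductName */
    bool server;            /* true for server editions */
    const char *short_id;   /* libosinfo short ID to report */
} OsinfoRule;

/* Illustrative table only; a real one needs an entry per release. */
static const OsinfoRule osinfo_rules[] = {
    { "2022",       true,  "win2k22" },
    { "2019",       true,  "win2k19" },
    { "2016",       true,  "win2k16" },
    { "Windows 10", false, "win10"   },
};

/* Returns the short ID of the first matching rule, or NULL if none match. */
static const char *osinfo_short_id(const char *product_name, bool server)
{
    size_t i;

    assert(product_name != NULL);
    for (i = 0; i < sizeof(osinfo_rules) / sizeof(osinfo_rules[0]); i++) {
        const OsinfoRule *r = &osinfo_rules[i];
        if (r->server == server && strstr(product_name, r->needle) != NULL) {
            return r->short_id;
        }
    }
    return NULL;
}
```

As Richard notes, a table like this has to grow with every new Windows release, and because the first matching rule wins, the order of the entries matters.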


[qemu-web PATCH] Update FUSE block export blog post

2021-09-06 Thread Hanna Reitz
Because I forgot to CC Thomas on the discussion adding this post, it was
merged prematurely.  This patch updates the post to incorporate the
feedback I received on it:

- Title change: This article mostly deals with presenting a guest image
  in one image format as a raw image, so the title should reflect that;
  there is much less focus on exporting block devices from a live VM

- Mention libguestfs, and contrast against it; make a note that
  libguestfs provides security that FUSE exports cannot provide

- Have a full example in the intro, to show where we are going with this
  post

- Some heading depths changed (nesting did not really make sense)

- Be more explicit that by "file mounts" I do not mean a filesystem with
  a root directory and a single file in it

- Explicitly mention that "/" is a directory without a name, to
  illustrate the fact that root nodes do not have names

- Short intro for "QEMU block exports", explaining its place in this
  post

- Make all exports writable

- Use "exp0" as export ID to get shorter lines that fit better into 80
  characters

- Reference the intro example in the intro of "Mounting an image on
  itself"

- Show "qemu-fuse-disk-export.py" in *italic* instead of as `code`
  (because I had all other command names in *italic*)

Signed-off-by: Hanna Reitz 
---
 _posts/2021-08-22-fuse-blkexport.md | 145 ++--
 1 file changed, 117 insertions(+), 28 deletions(-)

diff --git a/_posts/2021-08-22-fuse-blkexport.md b/_posts/2021-08-22-fuse-blkexport.md
index 7e8066e..1db6e74 100644
--- a/_posts/2021-08-22-fuse-blkexport.md
+++ b/_posts/2021-08-22-fuse-blkexport.md
@@ -1,30 +1,55 @@
 ---
 layout: post
-title:  "Exporting block devices as raw image files with FUSE"
+title:  "Presenting guest images as raw image files with FUSE"
 date:   2021-08-22 14:00:00 +0200
 author: Hanna Reitz
 categories: [storage, features, tutorials]
 ---
 Sometimes, there is a VM disk image whose contents you want to manipulate
-without booting the VM.  For raw images, that process is usually fairly simple,
-because most Linux systems bring tools for the job, e.g.:
+without booting the VM.  One way of doing this is to use
+[libguestfs](https://libguestfs.org), which can boot a minimal Linux VM to
+provide the host with secure access to the disk’s contents.  For example,
+[*guestmount*](https://libguestfs.org/guestmount.1.html) allows you to mount a
+guest filesystem on the host, without requiring root rights.
+
+However, maybe you cannot or do not want to use libguestfs, e.g. because you do
+not have KVM available in your environment, and so it becomes too slow; or
+because you do not want to go through a guest OS, but want to access the raw
+image data directly on the host, with minimal overhead.
+
+**Note**: Guest images can generally be arbitrarily modified by VM guests.  If
+you have an image to which an untrusted guest had write access at some point,
+you must treat any data and metadata on this image as potentially having been
+modified in a malicious manner.  Parsing anything must be done carefully and
+with caution.  Note that many existing tools are not careful in this regard, for
+example, filesystem drivers generally deliberately do not have protection
+against maliciously corrupted filesystems.  This is why in contrast accessing an
+image through libguestfs is considered secure, because the actual access happens
+in a libvirt-managed VM guest.
+
+From this point, we assume you are aware of the security caveats and still want
+to access and manipulate image data on the host.
+
+Now, unless your image is already in raw format, you will be faced with the
+problem of getting it into raw format.  The tools that you might want to use for
+image manipulation generally only work on raw images (because that is how block
+device files appear), like:
 * *dd* to just copy data to and from given offsets,
 * *parted* to manipulate the partition table,
 * *kpartx* to present all partitions as block devices,
 * *mount* to access filesystems’ contents.
 
-Sadly, but naturally, such tools only work for raw images, and not for images
-e.g. in QEMU’s qcow2 format.  To access such an image’s content, the format has
-to be translated to create a raw image, for example by:
+So if you want to use such tools on image files e.g. in QEMU’s qcow2 format, you
+will need to translate them into raw images first, for example by:
 * Exporting the image file with `qemu-nbd -c` as an NBD block device file,
 * Converting between image formats using `qemu-img convert`,
 * Accessing the image from a guest, where it appears as a normal block device.
 
 Unfortunately, none of these methods is perfect: `qemu-nbd -c` generally
-requires root rights, converting to a temporary raw copy requires additional
-disk space and the conversion process takes time, and accessing the image from a
-guest is just quite cumbersome in general (and also specifically something that
-we set out to avoid in the first 

Re: [PATCH v6 0/5] hw/arm/virt: Introduce cpu topology support

2021-09-06 Thread Andrew Jones
On Fri, Sep 03, 2021 at 03:38:13PM +0800, wangyanan (Y) wrote:
> 
> On 2021/9/3 15:25, Peter Maydell wrote:
> > On Fri, 3 Sept 2021 at 08:05, wangyanan (Y)  wrote:
> > > 
> > > On 2021/9/2 23:56, Peter Maydell wrote:
> > > > On Tue, 24 Aug 2021 at 13:20, Yanan Wang  wrote:
> > > > > This new version is based on patch series [1] which introduces some
> > > > > fix and improvement for smp parsing.
> > > > > 
> > > > > Description:
> > > > > Once the view of an accurate virtual cpu topology is provided to guest,
> > > > > with a well-designed vCPU pinning to the pCPU we may get a huge benefit,
> > > > > e.g., the scheduling performance improvement. See Dario Faggioli's
> > > > > research and the related performance tests in [2] for reference.
> > > > > 
> > > > > This patch series introduces cpu topology support for ARM platform.
> > > > > Both cpu-map in DT and ACPI PPTT table are introduced to store the
> > > > > topology information. And we only describe the topology information
> > > > > to 6.2 and newer virt machines, considering compatibility.
> > > > > 
> > > > > patches not yet reviewed: #1 and #3.
> > > > > 
> > > > > [1] https://lore.kernel.org/qemu-devel/20210823122804.7692-1-wangyana...@huawei.com/
> > > > > [2] https://kvmforum2020.sched.com/event/eE1y/virtual-topology-for-virtual-machines-friend-or-foe-dario-faggioli-suse
> > > > Hi; this series doesn't apply to current head-of-git. Is it
> > > > intended to be based on some other series ?
> > > > 
> > > Yes, it was based on the -smp parsing changes in [1] which hasn't been
> > > picked yet. Given that [1] somehow affects the topology parsing results
> > > which we will describe to guest, I think it may be better that [1] can be
> > > merged first and then this series follows.
> > OK. I'll ignore this for now; please resend once that other series
> > has been accepted.
> Got it.

Also, you'll likely want to rebase on Igor's acpi refactor series[*]

[*] https://www.mail-archive.com/qemu-devel@nongnu.org/msg822151.html

Thanks,
drew

> 
> Thanks,
> Yanan
> > thanks
> > -- PMM
> > 
> > .
> 




Re: [PATCH] meson.build: Do not look for VNC-related libraries if have_system is not set

2021-09-06 Thread Philippe Mathieu-Daudé
On 9/6/21 5:39 PM, Thomas Huth wrote:
> When running "./configure --static --disable-system" there is currently
> a warning if the static version of libpng is missing:
> 
>  WARNING: Static library 'png16' not found for dependency 'libpng', may not
>  be statically linked
> 
> Since it does not make sense to look for the VNC-related libraries at all
> when we're building without system emulator binaries, let's add a check
> for have_system here to silence this warning.
> 
> Signed-off-by: Thomas Huth 
> ---
>  meson.build | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Reviewed-by: Philippe Mathieu-Daudé 




Re: Guest Agent issue with 'guest-get-osinfo' command on Windows

2021-09-06 Thread Richard W.M. Jones
On Mon, Sep 06, 2021 at 06:45:08PM +0300, Konstantin Kostiuk wrote:
> Hi All,
> 
> I reviewed glib, libguestfs, and libosinfo tools. All tools read the registry
> to get information about Windows but read different registry values. All
> information is returned in a localized form.
> Related key: HKLM\SOFTWARE\Microsoft\Windows NT\CurrentVersion
> We can get 'pretty-name' from 'ProductName' value (all tools use it).
> About 'version', there are three variants:
> 1. Set 'version' equal to 'kernel-version'. libguestfs and libosinfo have
> this behavior.
> 2. Read 'version' from 'ReleaseId' value. glib has this behavior. In the case
> of Windows Server 2022, 'ReleaseId' equals 2009.
> 3. Read 'version' from 'DisplayVersion' value. In the case of Windows Server
> 2022, 'DisplayVersion' equals 21H2.

The important point is, however you get it, return the information as
a libosinfo short value ("win2k22" in this case).

> What do you think about this solution instead of using a conversion matrix?
> Which version should we use in this case?

If you need to cover old and new versions of Windows then there's no
good way.  You just need lots of conditionals and to constantly evolve
the code as new versions come out.

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
Fedora Windows cross-compiler. Compile Windows programs, test, and
build Windows installers. Over 100 libraries supported.
http://fedoraproject.org/wiki/MinGW




Re: [PULL v2 01/36] target/i386: add missing bits to CR4_RESERVED_MASK

2021-09-06 Thread Richard W.M. Jones
On Mon, Sep 06, 2021 at 05:26:57PM +0200, Paolo Bonzini wrote:
> From: Daniel P. Berrangé 
> 
> Booting Fedora kernels with -cpu max hangs very early in boot. Disabling
> the la57 CPUID bit fixes the problem. git bisect traced the regression to
> 
>   commit 213ff024a2f92020290296cb9dc29c2af3d4a221 (HEAD, refs/bisect/bad)
>   Author: Lara Lazier 
>   Date:   Wed Jul 21 17:26:50 2021 +0200
> 
> target/i386: Added consistency checks for CR4
> 
> All MBZ bits in CR4 must be zero. (APM2 15.5)
> Added reserved bitmask and added checks in both
> helper_vmrun and helper_write_crN.
> 
> Signed-off-by: Lara Lazier 
> Message-Id: <20210721152651.14683-2-laramglaz...@gmail.com>
> Signed-off-by: Paolo Bonzini 
> 
> In this commit CR4_RESERVED_MASK is missing CR4_LA57_MASK and
> two others. Adding this lets Fedora kernels boot once again.
> 
> Signed-off-by: Daniel P. Berrangé 
> Tested-by: Richard W.M. Jones 

I tested it again and it still works:

$ LIBGUESTFS_BACKEND_SETTINGS=force_tcg LIBGUESTFS_HV=$PWD/qemu-system-x86_64 \
    libguestfs-test-tool
...
= TEST FINISHED OK =

(versus without the patch, where it appears to hang very early in the kernel)

Rich.

> Message-Id: <20210831175033.175584-1-berra...@redhat.com>
> [Removed VMXE/SMXE, matching the commit message. - Paolo]
> Fixes: 213ff024a2 ("target/i386: Added consistency checks for CR4", 2021-07-22)
> Signed-off-by: Paolo Bonzini 
> ---
>  target/i386/cpu.h | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/target/i386/cpu.h b/target/i386/cpu.h
> index 6c50d3ab4f..21b33fbe2e 100644
> --- a/target/i386/cpu.h
> +++ b/target/i386/cpu.h
> @@ -257,6 +257,7 @@ typedef enum X86Seg {
>  | CR4_DE_MASK | CR4_PSE_MASK | CR4_PAE_MASK \
>  | CR4_MCE_MASK | CR4_PGE_MASK | CR4_PCE_MASK \
>  | CR4_OSFXSR_MASK | CR4_OSXMMEXCPT_MASK |CR4_UMIP_MASK \
> +| CR4_LA57_MASK \
>  | CR4_FSGSBASE_MASK | CR4_PCIDE_MASK | CR4_OSXSAVE_MASK \
>  | CR4_SMEP_MASK | CR4_SMAP_MASK | CR4_PKE_MASK | CR4_PKS_MASK))
>  
> -- 
> 2.31.1

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-df lists disk usage of guests without needing to install any
software inside the virtual machine.  Supports Linux and Windows.
http://people.redhat.com/~rjones/virt-df/
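
A simplified model of why a missing bit in CR4_RESERVED_MASK breaks LA57 guests: judging by the closing parentheses in the diff, the mask appears to be the complement of the bits the emulator understands, so any CR4 bit left out of that list counts as reserved and a write that sets it is rejected. A guest kernel enabling 5-level paging sets CR4.LA57, which matches the report that disabling the la57 CPUID bit avoids the hang. The sketch uses CR4 bit positions from the Intel SDM; the macro and function names are hypothetical, not target/i386 code, and VMXE stays reserved, matching Paolo's note about dropping VMXE/SMXE.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CR4_BIT(n)   (UINT64_C(1) << (n))
#define SK_CR4_PAE   CR4_BIT(5)
#define SK_CR4_PGE   CR4_BIT(7)
#define SK_CR4_LA57  CR4_BIT(12)
#define SK_CR4_VMXE  CR4_BIT(13)

/* Bits a hypothetical CPU model accepts, with LA57 deliberately left out:
 * VME..UMIP (0-11), FSGSBASE, PCIDE, OSXSAVE (16-18), SMEP, SMAP, PKE
 * (20-22) and PKS (24). */
#define SK_CR4_KNOWN_NO_LA57                                   \
    (CR4_BIT(0) | CR4_BIT(1) | CR4_BIT(2) | CR4_BIT(3) |       \
     CR4_BIT(4) | CR4_BIT(5) | CR4_BIT(6) | CR4_BIT(7) |       \
     CR4_BIT(8) | CR4_BIT(9) | CR4_BIT(10) | CR4_BIT(11) |     \
     CR4_BIT(16) | CR4_BIT(17) | CR4_BIT(18) |                 \
     CR4_BIT(20) | CR4_BIT(21) | CR4_BIT(22) | CR4_BIT(24))

_Static_assert((SK_CR4_KNOWN_NO_LA57 & SK_CR4_LA57) == 0,
               "LA57 must not be in the base set");

/* Reserved mask: every bit the model does not list must be zero. */
static uint64_t cr4_reserved_mask(bool la57_known)
{
    return ~(SK_CR4_KNOWN_NO_LA57 | (la57_known ? SK_CR4_LA57 : 0));
}

/* Any reserved bit set means the write faults (#GP) instead of taking effect. */
static bool cr4_write_ok(uint64_t new_cr4, uint64_t reserved_mask)
{
    return (new_cr4 & reserved_mask) == 0;
}
```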




Re: [PATCH 4/5] ebpf_rss_helper: Added helper for eBPF RSS.

2021-09-06 Thread Andrew Melnichenko
Hi,

> I think it's for back-compatibility.
>
> E.g. the current code works without mmap(), and users will be surprised that it
> won't work after upgrading their qemu.
>
Well, the current code would require additional capabilities with
"kernel.unprivileged_bpf_disabled=1", which may be the setting on Red Hat
systems.
Technically we could add an mmap test that shows whether mmap for
BPF_MAP_TYPE_ARRAY works, but on the target system we would only know that
at runtime.
If I'm not mistaken, mmap for BPF_MAP_TYPE_ARRAY was added before kernel
5.4, and our bpf program requires kernel 5.8+.
So there is no reason to add a bpf() map update as a fallback for mmap().

On Wed, Sep 1, 2021 at 9:42 AM Jason Wang  wrote:

>
> On 2021/8/31 1:07 AM, Yuri Benditovich wrote:
> > On Fri, Aug 20, 2021 at 6:41 AM Jason Wang  wrote:
> >>
> >> On 2021/7/13 11:37 PM, Andrew Melnychenko wrote:
> >>> Helper program. Loads eBPF RSS program and maps and passes them
> >>> through unix socket.
> >>> Libvirt may launch this helper and pass eBPF fds to qemu virtio-net.
> >>
> >> I wonder if this can be done as helper for TAP/bridge.
> >>
> >> E.g. it's qemu that launches those helpers with set-uid.
> >>
> >> Then libvirt won't even need to care about that?
> >>
> > There are pros and cons for such a solution with set-uid.
> >  From my point of view one of the cons is that set-uid is efficient
> > only at install time so the coexistence of different qemu builds (and
> > different helpers for each one) is kind of problematic.
> > With the current solution this does not present any problem: the
> > developer can have several different builds, each one automatically
> > has its own helper and there is no conflict between these builds and
> > between these builds and installed qemu package. Changing the
> > 'emulator' in the libvirt profile automatically brings the proper
> > helper to work.
>
>
> I'm not sure I get you here. We can still have default/sample helper to
> make sure it works for different builds.
>
> If we can avoid the involvement of libvirt, that would be better.
>
> Thanks
>
>
> >
> >>> Also, libbpf dependency now exclusively for Linux.
> >>> Libbpf is used for eBPF RSS steering, which is supported only by Linux TAP.
> >>> There is no reason yet to build eBPF loader and helper for non Linux systems,
> >>> even if libbpf is present.
> >>>
> >>> Signed-off-by: Andrew Melnychenko 
> >>> ---
> >>>ebpf/qemu-ebpf-rss-helper.c | 130
> >>>meson.build |  37 ++
> >>>2 files changed, 154 insertions(+), 13 deletions(-)
> >>>create mode 100644 ebpf/qemu-ebpf-rss-helper.c
> >>>
> >>> diff --git a/ebpf/qemu-ebpf-rss-helper.c b/ebpf/qemu-ebpf-rss-helper.c
> >>> new file mode 100644
> >>> index 00..fe68758f57
> >>> --- /dev/null
> >>> +++ b/ebpf/qemu-ebpf-rss-helper.c
> >>> @@ -0,0 +1,130 @@
> >>> +/*
> >>> + * eBPF RSS Helper
> >>> + *
> >>> + * Developed by Daynix Computing LTD (http://www.daynix.com)
> >>> + *
> >>> + * Authors:
> >>> + *  Andrew Melnychenko 
> >>> + *
> >>> + * This work is licensed under the terms of the GNU GPL, version 2.  See
> >>> + * the COPYING file in the top-level directory.
> >>> + *
> >>> + * Description: This is helper program for libvirtd.
> >>> + *  It loads eBPF RSS program and passes fds through unix socket.
> >>> + *  Built by meson, target - 'qemu-ebpf-rss-helper'.
> >>> + */
> >>> +
> >>> +#include 
> >>> +#include 
> >>> +#include 
> >>> +#include 
> >>> +#include 
> >>> +#include 
> >>> +#include 
> >>> +#include 
> >>> +
> >>> +#include "ebpf_rss.h"
> >>> +
> >>> +#include "qemu-helper-stamp.h"
> >>> +
> >>> +void QEMU_HELPER_STAMP(void) {}
> >>> +
> >>> +static int send_fds(int socket, int *fds, int n)
> >>> +{
> >>> +struct msghdr msg = {};
> >>> +struct cmsghdr *cmsg = NULL;
> >>> +char buf[CMSG_SPACE(n * sizeof(int))];
> >>> +char dummy_buffer = 0;
> >>> +struct iovec io = { .iov_base = &dummy_buffer,
> >>> +.iov_len = sizeof(dummy_buffer) };
> >>> +
> >>> +memset(buf, 0, sizeof(buf));
> >>> +
> >>> +msg.msg_iov = &io;
> >>> +msg.msg_iovlen = 1;
> >>> +msg.msg_control = buf;
> >>> +msg.msg_controllen = sizeof(buf);
> >>> +
> >>> +cmsg = CMSG_FIRSTHDR(&msg);
> >>> +cmsg->cmsg_level = SOL_SOCKET;
> >>> +cmsg->cmsg_type = SCM_RIGHTS;
> >>> +cmsg->cmsg_len = CMSG_LEN(n * sizeof(int));
> >>> +
> >>> +memcpy(CMSG_DATA(cmsg), fds, n * sizeof(int));
> >>> +
> >>> +return sendmsg(socket, &msg, 0);
> >>> +}
> >>> +
> >>> +static void print_help_and_exit(const char *prog, int exitcode)
> >>> +{
> >>> +fprintf(stderr, "%s - load eBPF RSS program for qemu and pass eBPF fds"
> >>> +" through unix socket.\n", prog);
> >>> +fprintf(stderr, "\t--fd , -f  - unix socket file descriptor"
> >>> +" used to pass eBPF fds.\n");
> >>> +fprintf(stderr, "\t--help, -h - this help.\n");
> >>> +exit(exitcode);
> >>> +}
> >>> +
> 
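
The helper above only shows the sending side of the descriptor hand-over. A self-contained sketch of both ends of an SCM_RIGHTS exchange on a Unix socket follows; send_fds() and recv_fds() here are standalone variants written for illustration, not the helper's code, and MAX_FDS is an arbitrary cap chosen for the sketch. One deliberate difference: the control buffer is wrapped in a union with struct cmsghdr so that it is suitably aligned for CMSG_FIRSTHDR(), the pattern shown in the cmsg(3) manual page, whereas a plain char array carries no such guarantee.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

#define MAX_FDS 8   /* arbitrary cap chosen for this sketch */

/* Control buffer wrapped in a union so it is aligned for struct cmsghdr. */
typedef union {
    char buf[CMSG_SPACE(MAX_FDS * sizeof(int))];
    struct cmsghdr align;
} FdControl;

/* Send n descriptors (1..MAX_FDS) with one dummy data byte; 0 on success. */
static int send_fds(int sock, const int *fds, int n)
{
    char dummy = 0;
    struct iovec io = { .iov_base = &dummy, .iov_len = sizeof(dummy) };
    FdControl ctrl;
    struct msghdr msg = { 0 };
    struct cmsghdr *cmsg;

    assert(n > 0 && n <= MAX_FDS);
    memset(&ctrl, 0, sizeof(ctrl));
    msg.msg_iov = &io;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = CMSG_SPACE(n * sizeof(int));

    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(n * sizeof(int));
    memcpy(CMSG_DATA(cmsg), fds, n * sizeof(int));

    return sendmsg(sock, &msg, 0) < 0 ? -1 : 0;
}

/* Receive up to MAX_FDS descriptors into fds; returns the count or -1. */
static int recv_fds(int sock, int fds[MAX_FDS])
{
    char dummy;
    struct iovec io = { .iov_base = &dummy, .iov_len = sizeof(dummy) };
    FdControl ctrl;
    struct msghdr msg = { 0 };
    struct cmsghdr *cmsg;
    int n = 0;

    memset(&ctrl, 0, sizeof(ctrl));
    msg.msg_iov = &io;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof(ctrl.buf);

    if (recvmsg(sock, &msg, 0) <= 0) {
        return -1;
    }
    for (cmsg = CMSG_FIRSTHDR(&msg); cmsg != NULL;
         cmsg = CMSG_NXTHDR(&msg, cmsg)) {
        if (cmsg->cmsg_level == SOL_SOCKET && cmsg->cmsg_type == SCM_RIGHTS) {
            n = (int)((cmsg->cmsg_len - CMSG_LEN(0)) / sizeof(int));
            memcpy(fds, CMSG_DATA(cmsg), n * sizeof(int));
            break;
        }
    }
    if (msg.msg_flags & MSG_CTRUNC) {
        /* Some descriptors did not fit: close the ones that did arrive. */
        for (int i = 0; i < n; i++) {
            close(fds[i]);
        }
        return -1;
    }
    return n;
}
```

Whichever process ends up receiving the eBPF descriptors, QEMU or libvirt as debated above, would need a recvmsg() counterpart of this kind.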

Re: [PULL v2 00/36] (Mostly) x86 changes for 2021-09-06

2021-09-06 Thread Peter Maydell
On Mon, 6 Sept 2021 at 16:28, Paolo Bonzini  wrote:
>
> The following changes since commit 935efca6c246c108253b0e4e51cc87648fc7ca10:
>
>   Merge remote-tracking branch 'remotes/thuth-gitlab/tags/pull-request-2021-09-06' into staging (2021-09-06 12:38:07 +0100)
>
> are available in the Git repository at:
>
>   https://gitlab.com/bonzini/qemu.git tags/for-upstream
>
> for you to fetch changes up to e423a6e6467abe2994e70670eb197069cc652782:
>
>   doc: Add the SGX doc (2021-09-06 17:24:38 +0200)
>
> 
> * SGX support (Sean, Yang)
> * vGIF and vVMLOAD/VMSAVE support (Lara)
> * Fix LA57 support in TCG (Daniel)
> 
>
> v1->v2: now entirely x86 - removed gbm patch and added the first one to fix TCG LA57


>  slirp|   2 +-

Nothing slirp-related in the changelog, but the module has
been changed by commit f99ca7795fa6d17
("target/i386: Moved int_ctl into CPUX86State structure")
in your branch. It looks like an accident; could you clean it up
and resubmit, please?

thanks
-- PMM



Re: Guest Agent issue with 'guest-get-osinfo' command on Windows

2021-09-06 Thread Konstantin Kostiuk
Hi All,

I reviewed the glib, libguestfs, and libosinfo tools. All of them read the
registry to get information about Windows, but they read different registry
values. All information is returned in a localized form.
Related key: HKLM\SOFTWARE\Microsoft\Windows NT\CurrentVersion
We can get 'pretty-name' from the 'ProductName' value (all tools use it).
For 'version' there are three variants:
1. Set 'version' equal to 'kernel-version'. libguestfs and libosinfo have
this behavior.
2. Read 'version' from the 'ReleaseId' value. glib has this behavior. In the
case of Windows Server 2022, 'ReleaseId' equals 2009.
3. Read 'version' from the 'DisplayVersion' value. In the case of Windows
Server 2022, 'DisplayVersion' equals 21H2.

What do you think about this solution instead of using a conversion matrix?
Which version should we use in this case?

Best wishes,
Kostiantyn Kostiuk
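The three candidate sources for 'version' listed above can be compared side by side. Below is a minimal C sketch. It assumes the registry values have already been read into strings, with NULL meaning the value is absent. The struct, the enum and pick_version() are hypothetical. Falling back to the kernel version when a value is missing is only one possible policy, not something agreed in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical: values read from
 * HKLM\SOFTWARE\Microsoft\Windows NT\CurrentVersion (NULL if absent). */
struct win_version_info {
    const char *kernel_version;  /* e.g. "10.0" */
    const char *release_id;      /* 'ReleaseId' */
    const char *display_version; /* 'DisplayVersion' */
};

enum version_source {
    VERSION_FROM_KERNEL,          /* variant 1: libguestfs, libosinfo */
    VERSION_FROM_RELEASE_ID,      /* variant 2: glib */
    VERSION_FROM_DISPLAY_VERSION, /* variant 3 */
};

static const char *pick_version(const struct win_version_info *info,
                                enum version_source src)
{
    const char *v = NULL;

    assert(info && info->kernel_version);
    switch (src) {
    case VERSION_FROM_RELEASE_ID:
        v = info->release_id;
        break;
    case VERSION_FROM_DISPLAY_VERSION:
        v = info->display_version;
        break;
    case VERSION_FROM_KERNEL:
        break;
    }
    /* Hypothetical policy: fall back to the kernel version if absent. */
    return v ? v : info->kernel_version;
}
```

With the Windows Server 2022 values quoted in this thread ("10.0", "2009", "21H2"), the three variants return three different strings. That is exactly the ambiguity the question above is about.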


On Thu, Sep 2, 2021 at 5:54 PM Marc-André Lureau 
wrote:

> Hi
>
> On Thu, Sep 2, 2021 at 6:16 PM Konstantin Kostiuk 
> wrote:
>
>> I tried to use glib to get OS info. Glib provides 3 version-related values
>> for Windows:
>> g_get_os_info(G_OS_INFO_KEY_PRETTY_NAME)
>> g_get_os_info(G_OS_INFO_KEY_VERSION)
>> g_get_os_info(G_OS_INFO_KEY_VERSION_ID)
>>
>> Output for Windows Server 2019:
>> PRETTY_NAME = Windows 10 Server 1809
>> VERSION = 10 Server 1809
>> VERSION_ID = 10_server_1809
>>
>> Output for Windows Server 2022:
>> PRETTY_NAME = Windows 10 Server 2009
>> VERSION = 10 Server 2009
>> VERSION_ID = 10_server_2009
>>
>> So, for now, we can't use glib directly.
>>
>
> Ah apparently there is a bug about it:
> https://gitlab.gnome.org/GNOME/glib/-/issues/2443
>
> (we should aim for reusing glib functions, imho)
>
>
>> On Thu, Sep 2, 2021 at 4:55 PM Richard W.M. Jones 
>> wrote:
>>
>>> On Thu, Sep 02, 2021 at 02:36:51PM +0100, Daniel P. Berrangé wrote:
>>> > On Thu, Sep 02, 2021 at 03:36:01PM +0300, Konstantin Kostiuk wrote:
>>> > > Hi Team,
>>> > >
>>> > > We have several bugs related to 'guest-get-osinfo' command in
>>> Windows Guest
>>> > > Agent:
>>> > > https://bugzilla.redhat.com/show_bug.cgi?id=1998919
>>> > > https://bugzilla.redhat.com/show_bug.cgi?id=1972070
>>> > >
>>> > > This command returns the following data:
>>> > > {
>>> > > "name": "Microsoft Windows",
>>> > > "kernel-release": "20344",
>>> > > "version": "N/A",
>>> > > "variant": "server",
>>> > > "pretty-name": "Windows Server 2022 Datacenter",
>>> > > "version-id": "N/A",
>>> > > "variant-id": "server",
>>> > > "kernel-version": "10.0",
>>> > > "machine": "x86_64",
>>> > > "id": "mswindows"
>>> > > }
>>> > >
>>> > > The problem is with "version" and "pretty-name". Windows Server
>>> > > 2016/2019/2022 and Windows 11 have the same MajorVersion
>>> ("kernel-version")
>>> >
>>> > Yes, this is a long standing issue with version mapping Windows
>>> > guests, to which no one has ever come up with a nice solution
>>> > that I know of.
>>> >
>>> > In libosinfo database, we just report the kernel version as the
>>> > OS version, and accept the fact that there's a clash in version
>>> > between various Windows products.
>>> >
>>> >
>>> https://gitlab.com/libosinfo/osinfo-db/-/blob/master/data/os/microsoft.com/win-2k19.xml.in
>>> >
>>> >
>>> https://gitlab.com/libosinfo/osinfo-db/-/blob/master/data/os/microsoft.com/win-10.xml.in
>>> >
>>> > Apps that need to distinguish simply have to look at the
>>> > product name, even if this causes localization pain.
>>> >
>>> > Similarly in libguestfs, the virt-inspector tool just reports
>>> > the kernel version, and product name from the registry:
>>> >
>>> > # virt-inspector -d win2k8r2
>>> > 
>>> > 
>>> >   
>>> > /dev/sda2
>>> > windows
>>> > x86_64
>>> > windows
>>> > Windows Server 2008 R2 Standard
>>> > Server
>>> > 6
>>> > 1
>>> >
>>> >
>>> > # virt-inspector -d win10x64
>>> > 
>>> > 
>>> >   
>>> > /dev/sda2
>>> > windows
>>> > x86_64
>>> > windows
>>> > Windows 10 Pro
>>> > Client
>>> > 10
>>> > 0
>>> > /Windows
>>> >
>>>  ControlSet001
>>> > DESKTOP-GR8HTR3
>>> > win10
>>>
>>> We actually try to turn it into a libosinfo compatible short string as
>>> you can see from Dan's second example above and this code:
>>>
>>>
>>> https://github.com/libguestfs/libguestfs/blob/master/lib/inspect-osinfo.c
>>>
>>> Which is I think what every tool should return.  libosinfo is the only
>>> project that attempts to classify a broad range of OSes and is
>>> constantly being updated.
>>>
>>> > > This solution has several problems: need to update the conversion
>>> matrix
>>> > > for each Windows build, one Windows name can have different build
>>> numbers.
>>> > > For example, Windows Server 2022 (preview) build number is 20344,
>>> Windows
>>> > > Server 2022 build number is 20348.
>>> > >
>>> > > There are two possible solutions:
>>> > > 1. Use build number range instead of one number. Known implementation
>>> > > issue: Microsoft provides a table (
>>> > >
>>> 

Re: [PATCH] meson.build: Do not look for VNC-related libraries if have_system is not set

2021-09-06 Thread Daniel P . Berrangé
On Mon, Sep 06, 2021 at 05:39:39PM +0200, Thomas Huth wrote:
> When running "./configure --static --disable-system" there is currently
> a warning if the static version of libpng is missing:
> 
>  WARNING: Static library 'png16' not found for dependency 'libpng', may not
>  be statically linked
> 
> Since it does not make sense to look for the VNC-related libraries at all
> when we're building without system emulator binaries, let's add a check
> for have_system here to silence this warning.
> 
> Signed-off-by: Thomas Huth 
> ---
>  meson.build | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/meson.build b/meson.build
> index 7e58e6279b..f07236d947 100644
> --- a/meson.build
> +++ b/meson.build
> @@ -931,7 +931,7 @@ vnc = not_found
>  png = not_found
>  jpeg = not_found
>  sasl = not_found
> -if not get_option('vnc').disabled()
> +if have_system and not get_option('vnc').disabled()
>vnc = declare_dependency() # dummy dependency
>png = dependency('libpng', required: get_option('vnc_png'),
> method: 'pkg-config', kwargs: static_kwargs)

Reviewed-by: Daniel P. Berrangé 

Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|




[PATCH] meson.build: Do not look for VNC-related libraries if have_system is not set

2021-09-06 Thread Thomas Huth
When running "./configure --static --disable-system" there is currently
a warning if the static version of libpng is missing:

 WARNING: Static library 'png16' not found for dependency 'libpng', may not
 be statically linked

Since it does not make sense to look for the VNC-related libraries at all
when we're building without system emulator binaries, let's add a check
for have_system here to silence this warning.

Signed-off-by: Thomas Huth 
---
 meson.build | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/meson.build b/meson.build
index 7e58e6279b..f07236d947 100644
--- a/meson.build
+++ b/meson.build
@@ -931,7 +931,7 @@ vnc = not_found
 png = not_found
 jpeg = not_found
 sasl = not_found
-if not get_option('vnc').disabled()
+if have_system and not get_option('vnc').disabled()
   vnc = declare_dependency() # dummy dependency
   png = dependency('libpng', required: get_option('vnc_png'),
method: 'pkg-config', kwargs: static_kwargs)
-- 
2.27.0




Re: arm: Launching EFI-enabled arm32 Linux

2021-09-06 Thread Andre Przywara
On Sat, 4 Sep 2021 21:26:45 +0200
Adam Lackorzynski  wrote:

Hi Adam,

> while trying to launch an EFI-enabled arm32 Linux binary (zImage) I
> noticed I get an undefined instruction exception on the first
> instruction. Now this is a bit special because Linux uses a nop
> instruction there that also is a PE file signature ('MZ') such that the
> CPU runs over it and the file is still recognized as a PE binary. Linux
> uses 0x13105a4d (tstne r0, #0x4d000) as the instruction (see also
> arch/arm/boot/compressed/head.S and efi-header.S in Linux).
> However, QEMU's instruction decoder will only recognize TST with bits
> 12-15 being 0, which this instruction does not satisfy, and thus the
> undef exception. I guess other CPU implementations will allow this
> encoding. So while investigating I was doing the following to make Linux
> proceed. I also believe this was working in a previous version of QEMU.
> 
> diff --git a/target/arm/a32.decode b/target/arm/a32.decode
> index fcd8cd4f7d..222553750e 100644
> --- a/target/arm/a32.decode
> +++ b/target/arm/a32.decode
> @@ -127,7 +127,7 @@ ADD_rri   001 0100 .      
> @s_rri_rot
>  ADC_rri   001 0101 .      @s_rri_rot
>  SBC_rri   001 0110 .      @s_rri_rot
>  RSC_rri   001 0111 .      @s_rri_rot
> -TST_xri   001 1000 1      @S_xri_rot
> +TST_xri   001 1000 1      @S_xri_rot
>  TEQ_xri   001 1001 1      @S_xri_rot
>  CMP_xri   001 1010 1      @S_xri_rot
>  CMN_xri   001 1011 1      @S_xri_rot
> 
> 
> Any thoughts on this?

thanks for the report, I was looking at this and have a kernel patch
to fix this properly as Peter suggested. And while I agree on the
problem, I was struggling to reproduce this in reality: both with
-kernel and when booting through U-Boot the "Z" bit is set, so QEMU
does not even look at the rest of the encoding: the condition
flags don't match, so it just proceeds. If I change the __nop to use "tsteq",
I see it hanging due to the missing exception handler, but not with
"tstne".
So can you say how you spotted this issue? This would be needed as a
justification for patching the guts of the ARM Linux kernel port.

Cheers,
Andre
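The encoding discussed in this thread can be checked by hand. Below is a minimal C sketch that splits 0x13105a4d into its fields. The helpers are hypothetical and handle only the A32 data-processing (immediate) form.

The fields come out as:
- cond 0x1 (NE)
- opcode 0x8 (TST)
- S=1
- Rn=0
- Rd (bits 15-12) = 0x5. Per the report above, QEMU's decoder expects this field to be zero for TST.

The immediate is 0x4d rotated right by 2*0xa = 20 bits, i.e. 0x4d000. Stored little-endian, the first two bytes are 'M' and 'Z'.

```c
#include <assert.h>
#include <stdint.h>

/* Fields of an A32 data-processing (immediate) instruction. */
struct dp_imm {
    unsigned cond;   /* bits 31-28, 0x1 = NE */
    unsigned opcode; /* bits 24-21, 0x8 = TST */
    unsigned s;      /* bit 20 */
    unsigned rn;     /* bits 19-16 */
    unsigned rd;     /* bits 15-12, expected to be zero for TST */
    uint32_t imm;    /* imm8 rotated right by 2 * rot */
};

static uint32_t ror32(uint32_t v, unsigned n)
{
    n &= 31;
    return n ? (v >> n) | (v << (32 - n)) : v;
}

static struct dp_imm decode_dp_imm(uint32_t insn)
{
    struct dp_imm d;

    /* Only the immediate form (bits 27-25 == 001) is handled here. */
    assert(((insn >> 25) & 0x7) == 0x1);
    d.cond = insn >> 28;
    d.opcode = (insn >> 21) & 0xf;
    d.s = (insn >> 20) & 0x1;
    d.rn = (insn >> 16) & 0xf;
    d.rd = (insn >> 12) & 0xf;
    d.imm = ror32(insn & 0xff, 2 * ((insn >> 8) & 0xf));
    return d;
}

/* Byte @i of the instruction word as stored in memory (little-endian). */
static uint8_t le_byte(uint32_t insn, unsigned i)
{
    return (insn >> (8 * i)) & 0xff;
}
```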



Re: [PATCH v3 0/6] qapi: Add support for aliases

2021-09-06 Thread Markus Armbruster
Kevin Wolf  writes:

> This series introduces alias definitions for QAPI object types (structs
> and unions).
>
> This allows using the same QAPI type and visitor even when the syntax
> has some variations between different external interfaces such as QMP
> and the command line.
>
> It also provides a new tool for evolving the schema while maintaining
> backwards compatibility (possibly during a deprecation period).
>
> The first user is intended to be a QAPIfied -chardev command line
> option, for which I'll send a separate series. A git tag is available
> that contains both this series and the chardev changes that make use of
> it:
>
> https://repo.or.cz/qemu/kevin.git qapi-alias-chardev-v3

Review complete.  Let's discuss my findings, decide what we'd rather
improve on top, then see whether the remainder needs a respin.

> v3:
> - Mention the new functions in the big comment in visitor.h. However,
>   since the comment is about users of the visitor rather than the
>   generated code, it seems like the wrong place to go into details.
> - Updated commit message for patch 3 ('Simplify full_name_nth() ...')
> - Patch 4 ('qapi: Apply aliases in qobject-input-visitor'):
> - Multiple matching wildcard aliases are considered conflicting now
> - Improved comments for several functions
> - Renamed bool *implicit_object into *is_alias_prefix, which
>   describes better what it is rather than what it is used for
> - Simplified alias_present() into input_present()
> - Fixed potential use of wrong StackObject in error message
> - Patch 5 ('qapi: Add support for aliases'):
> - Made QAPISchemaAlias a QAPISchemaMember
> - Check validity of alias source paths (must exist in at least one
>   variant, no optional objects in the path of a wildcard alias, no
>   alias loops)

I love this one, thanks!

> - Many new tests cases, both positive and negative, including unit tests
>   of the generated visit functions

Tests look good now.

> - Coding style changes
> - Rebased documentation (.txt -> .rst conversion in master)

[...]
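The tests in this series exercise one basic rule. An alias is an alternative wire name for a member, and giving a value both directly and through an alias is an error. The deliberately simplified C sketch below illustrates that rule. It is not the qobject-input-visitor implementation. The input is assumed to be a flat list of key/value pairs, and only named aliases are modelled (no nesting, no wildcards). All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical flat input: one key/value pair per entry. */
struct kv { const char *key; int value; };

/* Hypothetical alias: @name may be used in place of @target. */
struct alias { const char *name; const char *target; };

/*
 * Look up @member in @in, also accepting any alias that targets it.
 * Returns 1 and stores the value in *out if exactly one source gives it,
 * 0 if it is absent, and -1 if it is given more than once.
 */
static int lookup_member(const struct kv *in, size_t n_in,
                         const struct alias *aliases, size_t n_aliases,
                         const char *member, int *out)
{
    int found = 0;

    assert(member && out);
    for (size_t i = 0; i < n_in; i++) {
        int matches = strcmp(in[i].key, member) == 0;

        for (size_t j = 0; !matches && j < n_aliases; j++) {
            matches = strcmp(aliases[j].target, member) == 0 &&
                      strcmp(aliases[j].name, in[i].key) == 0;
        }
        if (matches) {
            if (found) {
                return -1; /* value already given (directly or via alias) */
            }
            *out = in[i].value;
            found = 1;
        }
    }
    return found;
}
```

With an alias 'bar' for 'foo', the inputs { 'foo': 42 } and { 'bar': 42 } both yield foo = 42. The input { 'foo': 5, 'bar': 42 } is rejected, mirroring test_visitor_in_alias_struct_local in PATCH 6.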




[PATCH] hw/display/ati_2d: Fix buffer overflow in ati_2d_blt (CVE-2021-3638)

2021-09-06 Thread Philippe Mathieu-Daudé
When building QEMU with DEBUG_ATI defined and then running with
'-device ati-vga,romfile="" -d unimp,guest_errors -trace ati\*'
we get:

  ati_mm_write 4 0x16c0 DP_CNTL <- 0x1
  ati_mm_write 4 0x146c DP_GUI_MASTER_CNTL <- 0x2
  ati_mm_write 4 0x16c8 DP_MIX <- 0xff
  ati_mm_write 4 0x16c4 DP_DATATYPE <- 0x2
  ati_mm_write 4 0x224 CRTC_OFFSET <- 0x0
  ati_mm_write 4 0x142c DST_PITCH_OFFSET <- 0xfe0
  ati_mm_write 4 0x1420 DST_Y <- 0x3fff
  ati_mm_write 4 0x1410 DST_HEIGHT <- 0x3fff
  ati_mm_write 4 0x1588 DST_WIDTH_X <- 0x3fff3fff
  ati_2d_blt: vram:0x7fff5fa0 addr:0 ds:0x7fff61273800 stride:2560 bpp:32 
rop:0xff
  ati_2d_blt: 0 0 0, 0 127 0, (0,0) -> (16383,16383) 16383x16383 > ^
  ati_2d_blt: pixman_fill(dst:0x7fff5fa0, stride:254, bpp:8, x:16383, 
y:16383, w:16383, h:16383, xor:0xff00)
  Thread 3 "qemu-system-i38" received signal SIGSEGV, Segmentation fault.
  (gdb) bt
  #0  0x77f62ce0 in sse2_fill.lto_priv () at /lib64/libpixman-1.so.0
  #1  0x77f09278 in pixman_fill () at /lib64/libpixman-1.so.0
  #2  0x57b5a9af in ati_2d_blt (s=0x63128800) at 
hw/display/ati_2d.c:196
  #3  0x57b4b5a2 in ati_mm_write (opaque=0x63128800, addr=5512, 
data=1073692671, size=4) at hw/display/ati.c:843
  #4  0x58b90ec4 in memory_region_write_accessor (mr=0x63139cc0, 
addr=5512, ..., size=4, ...) at softmmu/memory.c:492

Commit 584acf34cb0 ("ati-vga: Fix reverse bit blts") introduced
the local dst_x and dst_y which adjust the (x, y) coordinates
depending on the direction in the SRCCOPY ROP3 operation, but
forgot to address the same issue for the PATCOPY, BLACKNESS and
WHITENESS operations, which also call pixman_fill().

Fix that now by using the adjusted coordinates in the pixman_fill
call, and update the related debug printf().

Reported-by: Qiang Liu 
Fixes: 584acf34cb0 ("ati-vga: Fix reverse bit blts")
Signed-off-by: Philippe Mathieu-Daudé 
---
 hw/display/ati_2d.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/hw/display/ati_2d.c b/hw/display/ati_2d.c
index 4dc10ea7952..692bec91de4 100644
--- a/hw/display/ati_2d.c
+++ b/hw/display/ati_2d.c
@@ -84,7 +84,7 @@ void ati_2d_blt(ATIVGAState *s)
 DPRINTF("%d %d %d, %d %d %d, (%d,%d) -> (%d,%d) %dx%d %c %c\n",
 s->regs.src_offset, s->regs.dst_offset, s->regs.default_offset,
 s->regs.src_pitch, s->regs.dst_pitch, s->regs.default_pitch,
-s->regs.src_x, s->regs.src_y, s->regs.dst_x, s->regs.dst_y,
+s->regs.src_x, s->regs.src_y, dst_x, dst_y,
 s->regs.dst_width, s->regs.dst_height,
 (s->regs.dp_cntl & DST_X_LEFT_TO_RIGHT ? '>' : '<'),
 (s->regs.dp_cntl & DST_Y_TOP_TO_BOTTOM ? 'v' : '^'));
@@ -180,11 +180,11 @@ void ati_2d_blt(ATIVGAState *s)
 dst_stride /= sizeof(uint32_t);
 DPRINTF("pixman_fill(%p, %d, %d, %d, %d, %d, %d, %x)\n",
 dst_bits, dst_stride, bpp,
-s->regs.dst_x, s->regs.dst_y,
+dst_x, dst_y,
 s->regs.dst_width, s->regs.dst_height,
 filler);
 pixman_fill((uint32_t *)dst_bits, dst_stride, bpp,
-s->regs.dst_x, s->regs.dst_y,
+dst_x, dst_y,
 s->regs.dst_width, s->regs.dst_height,
 filler);
 if (dst_bits >= s->vga.vram_ptr + s->vga.vbe_start_addr &&
-- 
2.31.1
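The coordinate adjustment referred to in the commit message can be sketched as follows. The sketch assumes the adjustment is reg + 1 - size when a blit runs right-to-left or bottom-to-top; the helper name is hypothetical. In the trace above the direction is '> ^', so x runs left-to-right but y runs bottom-to-top. The adjusted y is therefore 16383 + 1 - 16383 = 1, whereas the unfixed pixman_fill() call used y:16383 with h:16383.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper: first row/column that pixman_fill() should start
 * from. For a forward blit the register already holds it; for a reverse
 * blit the register holds the last row/column, so step back by size - 1.
 */
static int blt_start_coord(int reg, int size, bool forward)
{
    assert(size > 0);
    return forward ? reg : reg + 1 - size;
}
```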




Re: [PATCH v3 6/6] tests/qapi-schema: Test cases for aliases

2021-09-06 Thread Markus Armbruster
Kevin Wolf  writes:

> Signed-off-by: Kevin Wolf 

[...]

> diff --git a/tests/unit/test-qobject-input-visitor.c 
> b/tests/unit/test-qobject-input-visitor.c
> index e41b91a2a6..f2891b6f5d 100644
> --- a/tests/unit/test-qobject-input-visitor.c
> +++ b/tests/unit/test-qobject-input-visitor.c
> @@ -952,6 +952,214 @@ static void 
> test_visitor_in_list_union_number(TestInputVisitorData *data,
>  g_string_free(gstr_list, true);
>  }
>  
> +static void test_visitor_in_alias_struct_local(TestInputVisitorData *data,
> +   const void *unused)
> +{
> +AliasStruct1 *tmp = NULL;
> +Error *err = NULL;
> +Visitor *v;
> +

Context: the schema makes 'bar' an alias for 'foo'.

> +/* Can still specify the real member name with alias support */
> +v = visitor_input_test_init(data, "{ 'foo': 42 }");
> +visit_type_AliasStruct1(v, NULL, &tmp, &error_abort);
> +g_assert_cmpint(tmp->foo, ==, 42);
> +qapi_free_AliasStruct1(tmp);
> +
> +/* The alias is a working alternative */
> +v = visitor_input_test_init(data, "{ 'bar': 42 }");
> +visit_type_AliasStruct1(v, NULL, &tmp, &error_abort);
> +g_assert_cmpint(tmp->foo, ==, 42);
> +qapi_free_AliasStruct1(tmp);
> +
> +/* But you can't use both at the same time */
> +v = visitor_input_test_init(data, "{ 'foo': 5, 'bar': 42 }");
> +visit_type_AliasStruct1(v, NULL, &tmp, &err);
> +error_free_or_abort(&err);

I double-checked this reports "Value for parameter foo was already given
through an alias", as it should.

Pointing to what exactly is giving values to foo already would be nice.
In this case, 'foo' is obvious, but 'bar' is not.  This is not a demand.

> +}
> +
> +static void test_visitor_in_alias_struct_nested(TestInputVisitorData *data,
> +const void *unused)
> +{
> +AliasStruct2 *tmp = NULL;
> +Error *err = NULL;
> +Visitor *v;
> +

Context: the schema makes 'bar' and 'nested.bar' aliases for
'nested.foo'.

> +/* Can still specify the real member names with alias support */
> +v = visitor_input_test_init(data, "{ 'nested': { 'foo': 42 } }");
> +visit_type_AliasStruct2(v, NULL, &tmp, &error_abort);
> +g_assert_cmpint(tmp->nested->foo, ==, 42);
> +qapi_free_AliasStruct2(tmp);
> +
> +/* The inner alias is a working alternative */
> +v = visitor_input_test_init(data, "{ 'nested': { 'bar': 42 } }");
> +visit_type_AliasStruct2(v, NULL, &tmp, &error_abort);
> +g_assert_cmpint(tmp->nested->foo, ==, 42);
> +qapi_free_AliasStruct2(tmp);
> +
> +/* So is the outer alias */
> +v = visitor_input_test_init(data, "{ 'bar': 42 }");
> +visit_type_AliasStruct2(v, NULL, &tmp, &error_abort);
> +g_assert_cmpint(tmp->nested->foo, ==, 42);
> +qapi_free_AliasStruct2(tmp);
> +
> +/* You can't use more than one option at the same time */
> +v = visitor_input_test_init(data, "{ 'bar': 5, 'nested': { 'foo': 42 } }");
> +visit_type_AliasStruct2(v, NULL, &tmp, &err);
> +error_free_or_abort(&err);

"Value for parameter nested.foo was already given through an alias".
Good.

> +
> +v = visitor_input_test_init(data, "{ 'bar': 5, 'nested': { 'bar': 42 } }");
> +visit_type_AliasStruct2(v, NULL, &tmp, &err);
> +error_free_or_abort(&err);

Likewise.

> +
> +v = visitor_input_test_init(data, "{ 'nested': { 'foo': 42, 'bar': 42 } }");
> +visit_type_AliasStruct2(v, NULL, &tmp, &err);
> +error_free_or_abort(&err);

Likewise.

> +
> +v = visitor_input_test_init(data, "{ 'bar': 5, "
> +  "  'nested': { 'foo': 42, 'bar': 42 } }");
> +visit_type_AliasStruct2(v, NULL, &tmp, &err);
> +error_free_or_abort(&err);

Likewise.

In the second of these four cases, none of the things giving values to
nested.foo is obvious.  Still not a demand.

> +}
> +
> +static void test_visitor_in_alias_wildcard(TestInputVisitorData *data,
> +   const void *unused)
> +{
> +AliasStruct3 *tmp = NULL;
> +Error *err = NULL;
> +Visitor *v;
> +

Context: the schema makes 'foo', 'bar', and 'nested.bar' aliases for
'nested.foo', using a wildcard alias for the former two.

> +/* Can still specify the real member names with alias support */
> +v = visitor_input_test_init(data, "{ 'nested': { 'foo': 42 } }");
> +visit_type_AliasStruct3(v, NULL, &tmp, &error_abort);
> +g_assert_cmpint(tmp->nested->foo, ==, 42);
> +qapi_free_AliasStruct3(tmp);
> +
> +/* The wildcard alias makes it work on the top level */
> +v = visitor_input_test_init(data, "{ 'foo': 42 }");
> +visit_type_AliasStruct3(v, NULL, &tmp, &error_abort);
> +g_assert_cmpint(tmp->nested->foo, ==, 42);
> +qapi_free_AliasStruct3(tmp);
> +
> +/* It makes the inner alias available, too */
> +v = visitor_input_test_init(data, "{ 'bar': 42 }");
> +visit_type_AliasStruct3(v, NULL, &tmp, &error_abort);
> +g_assert_cmpint(tmp->nested->foo, ==, 42);
> +qapi_free_AliasStruct3(tmp);
> +
> +/* You can't use more 

[PULL v2 01/36] target/i386: add missing bits to CR4_RESERVED_MASK

2021-09-06 Thread Paolo Bonzini
From: Daniel P. Berrangé 

Booting Fedora kernels with -cpu max hangs very early in boot. Disabling
the la57 CPUID bit fixes the problem. git bisect traced the regression to

  commit 213ff024a2f92020290296cb9dc29c2af3d4a221 (HEAD, refs/bisect/bad)
  Author: Lara Lazier 
  Date:   Wed Jul 21 17:26:50 2021 +0200

target/i386: Added consistency checks for CR4

All MBZ bits in CR4 must be zero. (APM2 15.5)
Added reserved bitmask and added checks in both
helper_vmrun and helper_write_crN.

Signed-off-by: Lara Lazier 
Message-Id: <20210721152651.14683-2-laramglaz...@gmail.com>
Signed-off-by: Paolo Bonzini 

In this commit CR4_RESERVED_MASK is missing CR4_LA57_MASK and
two others. Adding this lets Fedora kernels boot once again.

Signed-off-by: Daniel P. Berrangé 
Tested-by: Richard W.M. Jones 
Message-Id: <20210831175033.175584-1-berra...@redhat.com>
[Removed VMXE/SMXE, matching the commit message. - Paolo]
Fixes: 213ff024a2 ("target/i386: Added consistency checks for CR4", 2021-07-22)
Signed-off-by: Paolo Bonzini 
---
 target/i386/cpu.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index 6c50d3ab4f..21b33fbe2e 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -257,6 +257,7 @@ typedef enum X86Seg {
 | CR4_DE_MASK | CR4_PSE_MASK | CR4_PAE_MASK \
 | CR4_MCE_MASK | CR4_PGE_MASK | CR4_PCE_MASK \
 | CR4_OSFXSR_MASK | CR4_OSXMMEXCPT_MASK |CR4_UMIP_MASK \
+| CR4_LA57_MASK \
 | CR4_FSGSBASE_MASK | CR4_PCIDE_MASK | CR4_OSXSAVE_MASK \
 | CR4_SMEP_MASK | CR4_SMAP_MASK | CR4_PKE_MASK | CR4_PKS_MASK))
 
-- 
2.31.1
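The failure mode fixed here can be reduced to a few lines. This simplified C sketch assumes that CR4_RESERVED_MASK is the complement of the set of accepted bits, as the closing '))' of the hunk suggests. Any bit left out of that set, such as LA57 (bit 12), then counts as reserved, so a guest write that sets it is rejected. The SKETCH_* macros are hypothetical stand-ins for QEMU's CR4_*_MASK definitions and cover only three bits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* CR4 bit positions from the x86 architecture manuals (small subset). */
#define SKETCH_CR4_PAE   (1ULL << 5)
#define SKETCH_CR4_PGE   (1ULL << 7)
#define SKETCH_CR4_LA57  (1ULL << 12)

/* Reserved bits are the complement of the bits the CPU model accepts. */
static uint64_t cr4_reserved_mask(uint64_t allowed)
{
    return ~allowed;
}

/* A write setting any reserved bit must be rejected (#GP in hardware). */
static bool cr4_write_ok(uint64_t new_cr4, uint64_t allowed)
{
    return (new_cr4 & cr4_reserved_mask(allowed)) == 0;
}
```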



[PULL v2 00/36] (Mostly) x86 changes for 2021-09-06

2021-09-06 Thread Paolo Bonzini
The following changes since commit 935efca6c246c108253b0e4e51cc87648fc7ca10:

  Merge remote-tracking branch 
'remotes/thuth-gitlab/tags/pull-request-2021-09-06' into staging (2021-09-06 
12:38:07 +0100)

are available in the Git repository at:

  https://gitlab.com/bonzini/qemu.git tags/for-upstream

for you to fetch changes up to e423a6e6467abe2994e70670eb197069cc652782:

  doc: Add the SGX doc (2021-09-06 17:24:38 +0200)


* SGX support (Sean, Yang)
* vGIF and vVMLOAD/VMSAVE support (Lara)
* Fix LA57 support in TCG (Daniel)


v1->v2: now entirely x86 - removed gbm patch and added the first one to fix TCG 
LA57

Daniel P. Berrangé (1):
  target/i386: add missing bits to CR4_RESERVED_MASK

Lara Lazier (7):
  target/i386: VMRUN and VMLOAD canonicalizations
  target/i386: Added VGIF feature
  target/i386: Moved int_ctl into CPUX86State structure
  target/i386: Added VGIF V_IRQ masking capability
  target/i386: Added ignore TPR check in ctl_has_irq
  target/i386: Added changed priority check for VIRQ
  target/i386: Added vVMLOAD and vVMSAVE feature

Sean Christopherson (21):
  memory: Add RAM_PROTECTED flag to skip IOMMU mappings
  hostmem: Add hostmem-epc as a backend for SGX EPC
  i386: Add 'sgx-epc' device to expose EPC sections to guest
  vl: Add sgx compound properties to expose SGX EPC sections to guest
  i386: Add primary SGX CPUID and MSR defines
  i386: Add SGX CPUID leaf FEAT_SGX_12_0_EAX
  i386: Add SGX CPUID leaf FEAT_SGX_12_0_EBX
  i386: Add SGX CPUID leaf FEAT_SGX_12_1_EAX
  i386: Add get/set/migrate support for SGX_LEPUBKEYHASH MSRs
  i386: Add feature control MSR dependency when SGX is enabled
  i386: Update SGX CPUID info according to hardware/KVM/user input
  i386: kvm: Add support for exposing PROVISIONKEY to guest
  i386: Propagate SGX CPUID sub-leafs to KVM
  Adjust min CPUID level to 0x12 when SGX is enabled
  hw/i386/fw_cfg: Set SGX bits in feature control fw_cfg accordingly
  hw/i386/pc: Account for SGX EPC sections when calculating device memory
  i386/pc: Add e820 entry for SGX EPC section(s)
  i386: acpi: Add SGX EPC entry to ACPI tables
  q35: Add support for SGX EPC
  i440fx: Add support for SGX EPC
  doc: Add the SGX doc

Yang Zhong (7):
  qom: Add memory-backend-epc ObjectOptions support
  hostmem-epc: Add the reset interface for EPC backend reset
  sgx-epc: Add the reset interface for sgx-epc virt device
  sgx-epc: Avoid bios reset during sgx epc initialization
  hostmem-epc: Make prealloc consistent with qemu cmdline during reset
  Kconfig: Add CONFIG_SGX support
  sgx-epc: Add the fill_device_info() callback support

 backends/hostmem-epc.c   | 118 ++
 backends/meson.build |   1 +
 configs/devices/i386-softmmu/default.mak |   1 +
 docs/intel-sgx.txt   | 167 +++
 hw/i386/Kconfig  |   5 +
 hw/i386/acpi-build.c |  22 +++
 hw/i386/fw_cfg.c |  10 +-
 hw/i386/meson.build  |   2 +
 hw/i386/pc.c |  15 +-
 hw/i386/pc_piix.c|   4 +
 hw/i386/pc_q35.c |   3 +
 hw/i386/sgx-epc.c| 265 +++
 hw/i386/sgx-stub.c   |  13 ++
 hw/i386/sgx.c|  84 ++
 hw/i386/x86.c|  29 
 hw/vfio/common.c |   1 +
 include/exec/memory.h|  15 +-
 include/hw/i386/pc.h |   8 +
 include/hw/i386/sgx-epc.h|  67 
 include/hw/i386/x86.h|   1 +
 monitor/hmp-cmds.c   |  10 ++
 qapi/machine.json|  52 +-
 qapi/qom.json|  19 +++
 qemu-options.hx  |  10 +-
 slirp|   2 +-
 softmmu/memory.c |   5 +
 softmmu/physmem.c|   3 +-
 target/i386/cpu.c| 199 +--
 target/i386/cpu.h|  39 +
 target/i386/kvm/kvm.c|  75 +
 target/i386/kvm/kvm_i386.h   |   2 +
 target/i386/machine.c|  42 -
 target/i386/svm.h|   8 +
 target/i386/tcg/seg_helper.c |   2 +-
 target/i386/tcg/sysemu/excp_helper.c |   2 +-
 target/i386/tcg/sysemu/misc_helper.c |  11 +-
 target/i386/tcg/sysemu/svm_helper.c  | 121 +-
 37 files changed, 1368 insertions(+), 65 deletions(-)
 create mode 100644 backends/hostmem-epc.c
 create mode 100644 

[PULL] qemu-socket unix socket bugfix 2021-09-06

2021-09-06 Thread Michael Tokarev
The following changes since commit 935efca6c246c108253b0e4e51cc87648fc7ca10:

  Merge remote-tracking branch 
'remotes/thuth-gitlab/tags/pull-request-2021-09-06' into staging (2021-09-06 
12:38:07 +0100)

are available in the Git repository at:

  git://git.corpit.ru/qemu.git tags/patch-fetch

for you to fetch changes up to 118d527f2e4baec5fe8060b22a6212468b8e4d3f:

  qemu-sockets: fix unix socket path copy (again) (2021-09-06 17:18:54 +0300)


qemu-socket unix socket bugfix 2021-09-06


Michael Tokarev (1):
  qemu-sockets: fix unix socket path copy (again)

 util/qemu-sockets.c | 13 +
 1 file changed, 5 insertions(+), 8 deletions(-)



Re: [PATCH v3 5/6] qapi: Add support for aliases

2021-09-06 Thread Markus Armbruster
Kevin Wolf  writes:

> Introduce alias definitions for object types (structs and unions). This
> allows using the same QAPI type and visitor for many syntax variations
> that exist in the external representation, like between QMP and the
> command line. It also provides a new tool for evolving the schema while
> maintaining backwards compatibility during a deprecation period.
>
> Signed-off-by: Kevin Wolf 
> ---
>  docs/devel/qapi-code-gen.rst   | 104 +-
>  docs/sphinx/qapidoc.py |   2 +-
>  scripts/qapi/expr.py   |  47 +-
>  scripts/qapi/schema.py | 116 +++--
>  scripts/qapi/types.py  |   4 +-
>  scripts/qapi/visit.py  |  34 +++-
>  tests/qapi-schema/test-qapi.py |   7 +-
>  tests/qapi-schema/double-type.err  |   2 +-
>  tests/qapi-schema/unknown-expr-key.err |   2 +-
>  9 files changed, 297 insertions(+), 21 deletions(-)
>
> diff --git a/docs/devel/qapi-code-gen.rst b/docs/devel/qapi-code-gen.rst
> index 26c62b0e7b..c0883507a8 100644
> --- a/docs/devel/qapi-code-gen.rst
> +++ b/docs/devel/qapi-code-gen.rst
> @@ -262,7 +262,8 @@ Syntax::
> 'data': MEMBERS,
> '*base': STRING,
> '*if': COND,
> -   '*features': FEATURES }
> +   '*features': FEATURES,
> +   '*aliases': ALIASES }
>  MEMBERS = { MEMBER, ... }
>  MEMBER = STRING : TYPE-REF
> | STRING : { 'type': TYPE-REF,
> @@ -312,6 +313,9 @@ the schema`_ below for more on this.
>  The optional 'features' member specifies features.  See Features_
>  below for more on this.
>  
> +The optional 'aliases' member specifies aliases.  See Aliases_ below
> +for more on this.
> +
>  
>  Union types
>  ---
> @@ -321,13 +325,15 @@ Syntax::
>  UNION = { 'union': STRING,
>'data': BRANCHES,
>'*if': COND,
> -  '*features': FEATURES }
> +  '*features': FEATURES,
> +  '*aliases': ALIASES }
>| { 'union': STRING,
>'data': BRANCHES,
>'base': ( MEMBERS | STRING ),
>'discriminator': STRING,
>'*if': COND,
> -  '*features': FEATURES }
> +  '*features': FEATURES,
> +  '*aliases': ALIASES }
>  BRANCHES = { BRANCH, ... }
>  BRANCH = STRING : TYPE-REF
> | STRING : { 'type': TYPE-REF, '*if': COND }
> @@ -437,6 +443,9 @@ the schema`_ below for more on this.
>  The optional 'features' member specifies features.  See Features_
>  below for more on this.
>  
> +The optional 'aliases' member specifies aliases.  See Aliases_ below
> +for more on this.
> +
>  
>  Alternate types
>  ---
> @@ -888,6 +897,95 @@ shows a conditional entity only when the condition is 
> satisfied in
>  this particular build.
>  
>  
> +Aliases
> +---
> +
> +Object types, including structs and unions, can contain alias
> +definitions.
> +
> +Aliases define alternative member names that may be used in wire input
> +to provide a value for a member in the same object or in a nested
> +object.

Explaining intended use would be nice.  From your cover letter:

This allows using the same QAPI type and visitor even when the syntax
has some variations between different external interfaces such as QMP
and the command line.

It also provides a new tool for evolving the schema while maintaining
backwards compatibility (possibly during a deprecation period).

For the second use, we need to be able to tack feature 'deprecated' to
exactly one of the two.

We can already tack it to the "real" member.  The real member's
'deprecated' must not apply to its aliases.

We can't tack it to the alias, yet.  More on that in review of PATCH 6.

> +
> +Syntax::
> +
> +ALIASES = [ ALIAS, ... ]
> +ALIAS = { '*name': STRING,
> +  'source': [ STRING, ... ] }
> +
> +If ``name`` is present, then the single member referred to by ``source``
> +is made accessible with the name given by ``name`` in the type where the
> +alias definition is specified.
> +
> +If ``name`` is not present, then this is a wildcard alias and all
> +members in the object referred to by ``source`` are made accessible in
> +the type where the alias definition is specified with the same name as
> +they have in ``source``.
> +
> +``source`` is a non-empty list of member names representing the path to
> +an object member. The first name is resolved in the same object.  Each
> +subsequent member is resolved in the object named by the preceding
> +member.
> +
> +Do not use optional objects in the path of a wildcard alias unless there
> +is no semantic difference between an empty object and an absent object.
> +Absent objects are implicitly turned into empty ones if an alias could
> +apply and provide a value in the nested object, which is always the case
> +for wildcard 

Re: [PATCH v3 4/6] qapi: Apply aliases in qobject-input-visitor

2021-09-06 Thread Markus Armbruster
Kevin Wolf  writes:

> When looking for an object in a struct in the external representation,
> check not only the currently visited struct, but also whether an alias
> in the current StackObject matches and try to fetch the value from the
> alias then. Providing two values for the same object through different
> aliases is an error.
>
> Signed-off-by: Kevin Wolf 
> ---
>  qapi/qobject-input-visitor.c | 227 +--
>  1 file changed, 218 insertions(+), 9 deletions(-)
>
> diff --git a/qapi/qobject-input-visitor.c b/qapi/qobject-input-visitor.c
> index 16a75442ff..6193df28a5 100644
> --- a/qapi/qobject-input-visitor.c
> +++ b/qapi/qobject-input-visitor.c
> @@ -97,6 +97,8 @@ struct QObjectInputVisitor {
>  QObject *root;
>  bool keyval;/* Assume @root made with keyval_parse() */
>  
> +QDict *empty_qdict; /* Used for implicit objects */

Would

   /* For visiting objects where all members are from aliases */

be clearer?

> +
>  /* Stack of objects being visited (all entries will be either
>   * QDict or QList). */
>  QSLIST_HEAD(, StackObject) stack;
> @@ -169,9 +171,190 @@ static const char *full_name(QObjectInputVisitor *qiv, 
> const char *name)
>  return full_name_so(qiv, name, false, tos);
>  }
>  
> +static bool find_object_member(QObjectInputVisitor *qiv,
> +   StackObject **so, const char **name,
> +   bool *is_alias_prefix, Error **errp);

According to the function's contract below, three cases:

* Input present: update *so, *name, return true.

* Input absent: zap *so, *name, set *is_alias_prefix, return false.

* Error: set *errp, leave *is_alias_prefix undefined, return false.

> +
> +/*
> + * Check whether the member @name in @so, or an alias for it, is
> + * present in the input and can be used to obtain the value.
> + */
> +static bool input_present(QObjectInputVisitor *qiv, StackObject *so,
> +  const char *name)
> +{
> +/*
> + * Check whether the alias member is present in the input
> + * (possibly recursively because aliases are transitive).
> + * The QAPI generator makes sure that aliases cannot form loops, so
> + * the recursion is guaranteed to terminate.
> + */
> +if (!find_object_member(qiv, &so, &name, NULL, NULL)) {

* Input absent: zap @so and @name.

* Error: don't zap.

Since @so and @name aren't used anymore, the difference doesn't matter.
Okay.

> +return false;
> +}
> +
> +/*
> + * Every source can be used only once. If a value in the input
> + * would end up being used twice through aliases, we'll fail the
> + * second access.
> + */
> +if (!g_hash_table_contains(so->h, name)) {
> +return false;
> +}
> +
> +return true;
> +}
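The "every source can be used only once" rule can be modelled as consuming input members. A minimal sketch, assuming each member carries a consumed flag; in the patch the equivalent state appears to be so->h, which (as far as I can tell) holds the members not yet visited. SketchMember and sketch_take() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical input member: a name plus a "value already used" flag. */
typedef struct SketchMember {
    const char *name;
    bool consumed;
} SketchMember;

/*
 * Take the value of @name from the input.  The first successful take
 * marks it consumed, so a second take (e.g. through another alias)
 * sees it as absent.
 */
static bool sketch_take(SketchMember *members, size_t n, const char *name)
{
    assert(members != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (strcmp(members[i].name, name) == 0) {
            if (members[i].consumed) {
                return false;   /* already used via another path */
            }
            members[i].consumed = true;
            return true;
        }
    }
    return false;               /* not in the input at all */
}
```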
> +
> +/*
> + * Check whether the member @name in the object visited by @so can be
> + * specified in the input by using the alias described by @a (which
> + * must be an alias contained in so->aliases).
> + *
> + * If @name is only a prefix of the alias source, but doesn't match
> + * immediately, false is returned and *is_alias_prefix is set to true
> + * if it is non-NULL.  In all other cases, *is_alias_prefix is left
> + * unchanged.
> + */
> +static bool alias_source_matches(QObjectInputVisitor *qiv,
> + StackObject *so, InputVisitorAlias *a,
> + const char *name, bool *is_alias_prefix)
> +{
> +if (a->src[0] == NULL) {
> +assert(a->name == NULL);
> +return true;
> +}
> +
> +if (!strcmp(a->src[0], name)) {
> +if (a->name && a->src[1] == NULL) {
> +/*
> + * We're matching an exact member, the source for this alias is
> + * immediately in @so.
> + */
> +return true;
> +} else if (is_alias_prefix) {
> +/*
> + * We're only looking at a prefix of the source path for the alias.
> + * If the input contains no object of the requested name, we will
> + * implicitly create an empty one so that the alias can still be
> + * used.
> + *
> + * We want to create the implicit object only if the alias is
> + * actually used, but we can't tell here for wildcard aliases (only
> + * a later visitor call will determine this). This means that
> + * wildcard aliases must never have optional keys in their source
> + * path. The QAPI generator checks this condition.
> + */

Double-checking: this actually ensures that we only ever create the
implicit object when it will not remain empty.  Correct?

> +if (!a->name || input_present(qiv, a->alias_so, a->name)) {
> +*is_alias_prefix = true;
> +}
> +}
> +}
> +
> +return false;
> +}
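A simplified, hypothetical model of the matching rules in alias_source_matches(), to make the exact-match versus prefix distinction easier to follow. The source path is a NULL-terminated array of member names, and `wildcard` plays the role of `a->name == NULL`. Unlike the patch, it skips the input_present() check before reporting a prefix.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * - empty path: wildcard alias rooted here, always matches;
 * - non-wildcard alias whose single path element equals @name: exact match;
 * - otherwise, first element equal to @name: only a prefix, reported via
 *   *is_prefix (if non-NULL), and false is returned.
 */
static bool sketch_src_matches(const char *const *src, bool wildcard,
                               const char *name, bool *is_prefix)
{
    if (src[0] == NULL) {
        assert(wildcard);   /* mirrors assert(a->name == NULL) */
        return true;
    }
    if (strcmp(src[0], name) == 0) {
        if (!wildcard && src[1] == NULL) {
            return true;
        }
        if (is_prefix) {
            *is_prefix = true;
        }
    }
    return false;
}
```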
> +
> +/*
> + * Find the place in the input where the 

Re: [PULL 00/13] Testing, build system and misc patches

2021-09-06 Thread Peter Maydell
On Mon, 6 Sept 2021 at 16:08, Paolo Bonzini  wrote:
>
> On 06/09/21 11:51, Thomas Huth wrote:
> > On 03/09/2021 18.49, Peter Maydell wrote:
> >> But I think there is an underlying meson bug here which that kind of
> >> use of an if is merely working around: if we ask for a static library
> >> it should not give us a dynamic library.
> >
> > Agreed. Actually, when I run configure with "--static --disable-system"
> > on my laptop, I'm also getting some warnings:
> >
> > WARNING: Static library 'z' not found for dependency 'zlib', may not be statically linked
> > Run-time dependency zlib found: YES 1.2.11
> > Run-time dependency appleframeworks found: NO (tried framework)
> > Library rt found: YES
> > WARNING: Static library 'png16' not found for dependency 'libpng', may not be statically linked
> > WARNING: Static library 'z' not found for dependency 'libpng', may not be statically linked
> >
> > ... and linking then later fails while running "make".
> >
> > Paolo, could the behavior of meson be changed to fail already at the
> > configuration step in this case, instead of only printing a warning?
>
> The reason why this is just a warning is explained only in the code, and
> it's this:
>
>  # Library wasn't found, maybe we're looking in the wrong
>  # places or the library will be provided with LDFLAGS or
>  # LIBRARY_PATH from the environment (on macOS), and many
>  # other edge cases that we can't account for.
>  #
>  # Add all -L paths and use it as -lfoo
>
> In other words, Meson doesn't really know the library will be used for a
> statically-linked binary (as opposed to just not wanting a shared
> library for whatever reason).  So it looks for a .a file, and forces use
> of a static library by passing a path to that file.  If it cannot
> find one, it warns.

Then Meson needs a feature so we can tell it "yes, we really did mean
that we want a static library, and only a static library will do".

thanks
-- PMM



Re: [RFC PATCH : v3 2/2] Implementation of nvme-mi plugin in nvme-cli

2021-09-06 Thread Mohit Kapoor

On Tue, Aug 03, 2021 at 02:34:46PM +0530, Mohit Kapoor wrote:

From: mohit kapoor 

Subject: [RFC PATCH : v3 2/2] Implementation of nvme-mi plugin in nvme-cli


Hello All,
Kindly provide your feedback on the patches shared
for nvme-mi over QEMU and nvme-cli.
Looking forward to any suggestions and improvements.

Regards,
Mohit Kapoor



Re: [PULL 00/13] Testing, build system and misc patches

2021-09-06 Thread Paolo Bonzini

On 06/09/21 11:51, Thomas Huth wrote:

On 03/09/2021 18.49, Peter Maydell wrote:

On Fri, 3 Sept 2021 at 17:37, Alex Bennée  wrote:

Thomas Huth  writes:

On 03/09/2021 15.22, Peter Maydell wrote:

This provokes a new warning from meson on a linux-static build:
Run-time dependency appleframeworks found: NO (tried framework)
Library rt found: YES
Found pkg-config: /usr/bin/pkg-config (0.29.1)
WARNING: Static library 'gbm' not found for dependency 'gbm', may not be statically linked
Run-time dependency gbm found: YES 20.0.8
Dependency libpng found: YES 1.6.34 (cached)
Dependency libjpeg found: YES unknown (cached)
If we're building statically and we can't find a static
library then (a) we shouldn't print a WARNING and
(b) we shouldn't then conclude that we've found gbm.


Hmmm, no clue what's wrong here, since I basically did declare it like
all other libraries are declared, too (so this problem should have
shown up somewhere else already?)... Paolo, do you have any ideas
what's going on here?


In attempting to replicate I found all the dynamic libs blow up:


   WARNING: Static library 'xkbcommon' not found for dependency 'xkbcommon', may not be statically l

   Run-time dependency xkbcommon found: YES 1.0.3


I do vaguely recall complaining about new meson warnings for
static library detection in the past as well:
https://lore.kernel.org/qemu-devel/CAFEAcA8chPqS0keyGv0vBgNgacnMo95gA3LZDU2QfmteQ=4...@mail.gmail.com/ 

https://lore.kernel.org/qemu-devel/cafeaca_-cnmt-sy3nqngkpuqet86m6-82rf-uv3qkwcr14k...@mail.gmail.com/ 

https://lore.kernel.org/qemu-devel/cafeaca8xhxcghh2hibsdcxzryrru+xcwvsa85o7kl9bsmw7...@mail.gmail.com/ 




So is this a general problem with static libs? BTW I didn't catch this
because I only build user with --static as I thought system --static was
flaky anyway.


I'm not doing a system build in this case... Looking at some of
those older threads, it looks like part of the answer is that
for dependencies that we don't need for linux-user mode we should
guard the test with some suitable if condition so we don't create
the dependency unless we're going to use it, e.g. the brlapi check
uses "if not get_option('brlapi').auto() or have_system", rbd
has a similar thing involving have_block, etc.


Ok, thanks, that seems to work, I'll change the patch accordingly.


But I think there is an underlying meson bug here which that kind of
use of an if is merely working around: if we ask for a static library
it should not give us a dynamic library.


Agreed. Actually, when I run configure with "--static --disable-system" 
on my laptop, I'm also getting some warnings:


WARNING: Static library 'z' not found for dependency 'zlib', may not be statically linked

Run-time dependency zlib found: YES 1.2.11
Run-time dependency appleframeworks found: NO (tried framework)
Library rt found: YES
WARNING: Static library 'png16' not found for dependency 'libpng', may not be statically linked
WARNING: Static library 'z' not found for dependency 'libpng', may not be statically linked


... and linking then later fails while running "make".

Paolo, could the behavior of meson be changed to fail already at the
configuration step in this case, instead of only printing a warning?


The reason why this is just a warning is explained only in the code, and
it's this:

# Library wasn't found, maybe we're looking in the wrong
# places or the library will be provided with LDFLAGS or
# LIBRARY_PATH from the environment (on macOS), and many
# other edge cases that we can't account for.
#
# Add all -L paths and use it as -lfoo

In other words, Meson doesn't really know the library will be used for a
statically-linked binary (as opposed to just not wanting a shared
library for whatever reason).  So it looks for a .a file, and forces use
of a static library by passing a path to that file.  If it cannot
find one, it warns.
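A rough C model of the lookup described above, under stated assumptions: sketch_static_link_arg() is a hypothetical helper, not Meson code (Meson itself is written in Python), and it reduces the fallback ("add all -L paths and use it as -lfoo") to a bare -l<name>. It shows why the result is only a warning: the fallback hands the decision to the linker, which may then pick a shared library.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Prefer lib<name>.a from the search directories and return it by full
 * path; otherwise warn and return "-l<name>".  @buf holds the result.
 */
static const char *sketch_static_link_arg(const char *const *dirs,
                                          const char *name,
                                          char *buf, size_t buflen)
{
    assert(buf != NULL && buflen > 0);

    for (size_t i = 0; dirs[i] != NULL; i++) {
        snprintf(buf, buflen, "%s/lib%s.a", dirs[i], name);
        if (access(buf, R_OK) == 0) {
            return buf;     /* static archive found: force it by path */
        }
    }
    fprintf(stderr, "WARNING: Static library '%s' not found, "
            "may not be statically linked\n", name);
    snprintf(buf, buflen, "-l%s", name);
    return buf;             /* linker decides: may resolve to a .so */
}
```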

Note that pre-Meson we didn't warn for --disable-system (correct) but we
did the wrong thing silently for --enable-system (just like now, except
without a warning).  So Meson's warning forces us to be a bit more verbose
to do only strictly necessary tests, as in:

  pam = not_found
  if not get_option('auth_pam').auto() or have_system
    pam = cc.find_library('pam', has_headers: ['security/pam_appl.h'],
                          required: get_option('auth_pam'),
                          kwargs: static_kwargs)
  endif

but it also catches incorrect setups on the user side and makes the "configure"
step a little faster with --disable-system.

FWIW, in the latest Meson version there's a shortcut for the above pattern,
since it is very common in QEMU; it can be rewritten as follows to avoid the
if/endif:

  pam = cc.find_library('pam', has_headers: ['security/pam_appl.h'],
                        required: get_option('auth_pam').disable_auto_if(not have_system),
                        kwargs: static_kwargs)

Paolo




Re: [PATCH v2 6/8] pc: Add VIOT table for virtio-iommu

2021-09-06 Thread Eric Auger
Hi jean,

On 9/3/21 4:32 PM, Jean-Philippe Brucker wrote:
> The ACPI Virtual I/O Translation table (VIOT) describes the relation
> between a virtio-iommu and the endpoints it manages. When a virtio-iommu
> device is instantiated, add a VIOT table.

As there is no use of pcms->virtio_iommu and virtio_iommu_bdf yet, maybe
squash this into the next patch?

Thanks

Eric

>
> Signed-off-by: Jean-Philippe Brucker 
> ---
>  hw/i386/Kconfig  | 1 +
>  include/hw/i386/pc.h | 2 ++
>  hw/i386/acpi-build.c | 5 +
>  hw/i386/pc.c | 7 +++
>  4 files changed, 15 insertions(+)
>
> diff --git a/hw/i386/Kconfig b/hw/i386/Kconfig
> index ddedcef0b2..13db05d557 100644
> --- a/hw/i386/Kconfig
> +++ b/hw/i386/Kconfig
> @@ -54,6 +54,7 @@ config PC_ACPI
>  select ACPI_X86
>  select ACPI_CPU_HOTPLUG
>  select ACPI_MEMORY_HOTPLUG
> +select ACPI_VIOT
>  select SMBUS_EEPROM
>  select PFLASH_CFI01
>  depends on ACPI_SMBUS
> diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h
> index 88dffe7517..979b8d0b7c 100644
> --- a/include/hw/i386/pc.h
> +++ b/include/hw/i386/pc.h
> @@ -45,6 +45,8 @@ typedef struct PCMachineState {
>  bool pit_enabled;
>  bool hpet_enabled;
>  bool default_bus_bypass_iommu;
> +bool virtio_iommu;
> +uint16_t virtio_iommu_bdf;
>  uint64_t max_fw_size;
>  
>  /* NUMA information: */
> diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
> index a33ac8b91e..078b7f5c6f 100644
> --- a/hw/i386/acpi-build.c
> +++ b/hw/i386/acpi-build.c
> @@ -71,6 +71,7 @@
>  
>  #include "hw/acpi/ipmi.h"
>  #include "hw/acpi/hmat.h"
> +#include "hw/acpi/viot.h"
>  
>  /* These are used to size the ACPI tables for -M pc-i440fx-1.7 and
>   * -M pc-i440fx-2.0.  Even if the actual amount of AML generated grows
> @@ -2559,6 +2560,10 @@ void acpi_build(AcpiBuildTables *tables, MachineState *machine)
>  build_dmar_q35(tables_blob, tables->linker, x86ms->oem_id,
> x86ms->oem_table_id);
>  }
> +} else if (pcms->virtio_iommu) {
> +acpi_add_table(table_offsets, tables_blob);
> +build_viot(tables_blob, tables->linker, pcms->virtio_iommu_bdf,
> +   x86ms->oem_id, x86ms->oem_table_id);
>  }
>  if (machine->nvdimms_state->is_enabled) {
>  nvdimm_build_acpi(table_offsets, tables_blob, tables->linker,
> diff --git a/hw/i386/pc.c b/hw/i386/pc.c
> index c2b9d62a35..694fc9ce07 100644
> --- a/hw/i386/pc.c
> +++ b/hw/i386/pc.c
> @@ -84,6 +84,7 @@
>  #include "hw/i386/intel_iommu.h"
>  #include "hw/net/ne2000-isa.h"
>  #include "standard-headers/asm-x86/bootparam.h"
> +#include "hw/virtio/virtio-iommu.h"
>  #include "hw/virtio/virtio-pmem-pci.h"
>  #include "hw/virtio/virtio-mem-pci.h"
>  #include "hw/mem/memory-device.h"
> @@ -1388,6 +1389,12 @@ static void pc_machine_device_plug_cb(HotplugHandler *hotplug_dev,
>  } else if (object_dynamic_cast(OBJECT(dev), TYPE_VIRTIO_PMEM_PCI) ||
> object_dynamic_cast(OBJECT(dev), TYPE_VIRTIO_MEM_PCI)) {
>  pc_virtio_md_pci_plug(hotplug_dev, dev, errp);
> +} else if (object_dynamic_cast(OBJECT(dev), TYPE_VIRTIO_IOMMU_PCI)) {
> +PCMachineState *pcms = PC_MACHINE(hotplug_dev);
> +PCIDevice *pdev = PCI_DEVICE(dev);
> +
> +pcms->virtio_iommu = true;
> +pcms->virtio_iommu_bdf = pci_get_bdf(pdev);
>  }
>  }
>  
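For reference, pci_get_bdf() yields the 16-bit bus/device/function identifier that build_viot() then uses to name the virtio-iommu. A minimal sketch of that encoding (bus in bits 15:8, slot in bits 7:3, function in bits 2:0); the helper names are hypothetical, and QEMU's own macros for this are PCI_DEVFN() and PCI_BUILD_BDF().

```c
#include <assert.h>
#include <stdint.h>

/* Pack bus, slot (device) and function into a 16-bit BDF. */
static uint16_t sketch_bdf(uint8_t bus, uint8_t slot, uint8_t func)
{
    assert(slot < 32 && func < 8);
    return (uint16_t)((bus << 8) | (slot << 3) | func);
}

/* Unpack the individual fields again. */
static uint8_t sketch_bdf_bus(uint16_t bdf)  { return bdf >> 8; }
static uint8_t sketch_bdf_slot(uint16_t bdf) { return (bdf >> 3) & 0x1f; }
static uint8_t sketch_bdf_func(uint16_t bdf) { return bdf & 0x7; }
```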




Re: [PATCH v2 8/8] docs: Add '-device virtio-iommu' entry

2021-09-06 Thread Eric Auger
Hi,

On 9/3/21 4:32 PM, Jean-Philippe Brucker wrote:
> Document the virtio-iommu device for qemu-system-x86_64. In particular
> note the lack of interrupt remapping, which may be an important
> limitation on x86.
>
> Suggested-by: Eric Auger 
> Signed-off-by: Jean-Philippe Brucker 
> ---
>  qemu-options.hx | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/qemu-options.hx b/qemu-options.hx
> index 83aa59a920..9a1906a748 100644
> --- a/qemu-options.hx
> +++ b/qemu-options.hx
> @@ -976,6 +976,9 @@ SRST
>  Please also refer to the wiki page for general scenarios of VT-d
>  emulation in QEMU: https://wiki.qemu.org/Features/VT-d.
>  
> +``-device virtio-iommu``
> +Enable a paravirtual IOMMU, that manages DMA isolation and remapping
> +for all PCI devices, but does not support interrupt remapping.
maybe you should also document which machines support the
virtio-iommu (as is done for intel-iommu). By the way, intel-iommu is only
supported on q35. Is it the same for virtio-iommu? I think we should
restrict virtio-iommu to q35 too.

Thanks

Eric
>  ERST
>  
>  DEF("name", HAS_ARG, QEMU_OPTION_name,




Re: [PATCH v2 7/8] pc: Allow instantiating a virtio-iommu device

2021-09-06 Thread Eric Auger
Hi Jean,

On 9/3/21 4:32 PM, Jean-Philippe Brucker wrote:
> From: Eric Auger 
>
> Add a hotplug handler for virtio-iommu on x86 and set the necessary
> reserved region property. On x86, the [0xfee0, 0xfeef] DMA
> region is reserved for MSIs. DMA transactions to this range either
> trigger IRQ remapping in the IOMMU or bypasses IOMMU translation.
s/bypasses/bypass.
> Although virtio-iommu does not support IRQ remapping it must be informed
> of the reserved region so that it can forward DMA transactions targeting
> this region.

>
> Signed-off-by: Eric Auger 
Feel free to remove my SoB. I haven't done much here.
> Signed-off-by: Jean-Philippe Brucker 
> ---
>  hw/i386/pc.c | 21 -
>  1 file changed, 20 insertions(+), 1 deletion(-)
>
> diff --git a/hw/i386/pc.c b/hw/i386/pc.c
> index 694fc9ce07..c1e1cffe16 100644
> --- a/hw/i386/pc.c
> +++ b/hw/i386/pc.c
> @@ -797,6 +797,11 @@ void pc_machine_done(Notifier *notifier, void *data)
>   "irqchip support.");
>  exit(EXIT_FAILURE);
>  }
> +
> +if (pcms->virtio_iommu && x86_iommu_get_default()) {
> +error_report("QEMU does not support multiple vIOMMUs for x86 yet.");
> +exit(EXIT_FAILURE);
> +}
>  }

I think you should detect the case of dual instantiation of intel_iommu
and virtio-iommu. Maybe pc_hotplug_allowed() can be used for that. Note
that the two devices can be referred to in either order.
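A minimal sketch of an order-independent check, assuming a single per-machine field records which vIOMMU (if any) has been plugged. The enum, struct and function are hypothetical, not PCMachineState fields. Because the check keys on "any IOMMU already present" rather than on a specific type, it rejects intel-iommu followed by virtio-iommu and the reverse order alike.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum {
    SKETCH_IOMMU_NONE,
    SKETCH_IOMMU_INTEL,
    SKETCH_IOMMU_VIRTIO,
} SketchIommu;

typedef struct SketchMachine {
    SketchIommu iommu;  /* which vIOMMU has been plugged, if any */
} SketchMachine;

/* Returns true and records @kind if no vIOMMU is present yet. */
static bool sketch_iommu_plug(SketchMachine *m, SketchIommu kind)
{
    assert(kind != SKETCH_IOMMU_NONE);
    if (m->iommu != SKETCH_IOMMU_NONE) {
        return false;   /* second vIOMMU, whatever its type or order */
    }
    m->iommu = kind;
    return true;
}
```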
>  
>  void pc_guest_info_init(PCMachineState *pcms)
> @@ -1376,6 +1381,14 @@ static void pc_machine_device_pre_plug_cb(HotplugHandler *hotplug_dev,
>  } else if (object_dynamic_cast(OBJECT(dev), TYPE_VIRTIO_PMEM_PCI) ||
> object_dynamic_cast(OBJECT(dev), TYPE_VIRTIO_MEM_PCI)) {
>  pc_virtio_md_pci_pre_plug(hotplug_dev, dev, errp);
> +} else if (object_dynamic_cast(OBJECT(dev), TYPE_VIRTIO_IOMMU_PCI)) {
> +/* Declare the reserved MSI region */
> +char *resv_prop_str = g_strdup_printf("0xfee0:0xfeef:%d",
> +  VIRTIO_IOMMU_RESV_MEM_T_MSI);
> +
> +qdev_prop_set_uint32(dev, "len-reserved-regions", 1);
> +qdev_prop_set_string(dev, "reserved-regions[0]", resv_prop_str);
> +g_free(resv_prop_str);
>  }
>  }
>  
> @@ -1393,6 +1406,11 @@ static void pc_machine_device_plug_cb(HotplugHandler *hotplug_dev,
>  PCMachineState *pcms = PC_MACHINE(hotplug_dev);
>  PCIDevice *pdev = PCI_DEVICE(dev);
>  
> +if (pcms->virtio_iommu) {
> +error_setg(errp,
> +   "QEMU does not support multiple vIOMMUs for x86 yet.");
> +return;
> +}
>  pcms->virtio_iommu = true;
>  pcms->virtio_iommu_bdf = pci_get_bdf(pdev);
>  }
> @@ -1436,7 +1454,8 @@ static HotplugHandler *pc_get_hotplug_handler(MachineState *machine,
>  if (object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM) ||
>  object_dynamic_cast(OBJECT(dev), TYPE_CPU) ||
>  object_dynamic_cast(OBJECT(dev), TYPE_VIRTIO_PMEM_PCI) ||
> -object_dynamic_cast(OBJECT(dev), TYPE_VIRTIO_MEM_PCI)) {
> +object_dynamic_cast(OBJECT(dev), TYPE_VIRTIO_MEM_PCI) ||
> +object_dynamic_cast(OBJECT(dev), TYPE_VIRTIO_IOMMU_PCI)) {
>  return HOTPLUG_HANDLER(machine);
>  }
>  
Thanks

Eric




Re: [PATCH v2 8/8] docs: Add '-device virtio-iommu' entry

2021-09-06 Thread Daniel P . Berrangé
On Fri, Sep 03, 2021 at 04:32:09PM +0200, Jean-Philippe Brucker wrote:
> Document the virtio-iommu device for qemu-system-x86_64. In particular
> note the lack of interrupt remapping, which may be an important
> limitation on x86.
> 
> Suggested-by: Eric Auger 
> Signed-off-by: Jean-Philippe Brucker 
> ---
>  qemu-options.hx | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/qemu-options.hx b/qemu-options.hx
> index 83aa59a920..9a1906a748 100644
> --- a/qemu-options.hx
> +++ b/qemu-options.hx
> @@ -976,6 +976,9 @@ SRST
>  Please also refer to the wiki page for general scenarios of VT-d
>  emulation in QEMU: https://wiki.qemu.org/Features/VT-d.
>  
> +``-device virtio-iommu``
> +Enable a paravirtual IOMMU, that manages DMA isolation and remapping
> +for all PCI devices, but does not support interrupt remapping.

It would be desirable to document why this is better/worse/equiv to
the intel-iommu device documented just before, so that people have a
better idea of which they should be trying to use.

I'm going to assume intel-iommu is more likely to "just work" out of
the box, since it models real hardware that OSes are likely to already
support? Is that right, though?


Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|




Re: [PATCH v2 8/8] docs: Add '-device virtio-iommu' entry

2021-09-06 Thread Eric Auger
Hi Jean,

On 9/3/21 4:32 PM, Jean-Philippe Brucker wrote:
> Document the virtio-iommu device for qemu-system-x86_64. In particular
Nit: this is not only for qemu-system-x86_64. This also documents the
option usage for aarch64. Only the interrupt remapping note is
x86_64-specific.

I think it would also be worth mentioning the interaction with the
bypass iommu option, as you say "for all PCI devices".

Thanks

Eric
> note the lack of interrupt remapping, which may be an important
> limitation on x86.
>
> Suggested-by: Eric Auger 
> Signed-off-by: Jean-Philippe Brucker 
> ---
>  qemu-options.hx | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/qemu-options.hx b/qemu-options.hx
> index 83aa59a920..9a1906a748 100644
> --- a/qemu-options.hx
> +++ b/qemu-options.hx
> @@ -976,6 +976,9 @@ SRST
>  Please also refer to the wiki page for general scenarios of VT-d
>  emulation in QEMU: https://wiki.qemu.org/Features/VT-d.
>  
> +``-device virtio-iommu``
> +Enable a paravirtual IOMMU, that manages DMA isolation and remapping
> +for all PCI devices, but does not support interrupt remapping.
>  ERST
>  
>  DEF("name", HAS_ARG, QEMU_OPTION_name,




Re: [PATCH v2 4/8] hw/arm/virt: Remove device tree restriction for virtio-iommu

2021-09-06 Thread Eric Auger
Hi Jean,

On 9/3/21 4:32 PM, Jean-Philippe Brucker wrote:
> virtio-iommu is now supported with ACPI VIOT as well as device tree.
> Remove the restriction that prevents from instantiating a virtio-iommu
> device under ACPI.
>
> Signed-off-by: Jean-Philippe Brucker 
Reviewed-by: Eric Auger 

Eric

> ---
>  hw/arm/virt.c| 10 ++
>  hw/virtio/virtio-iommu-pci.c |  7 ---
>  2 files changed, 2 insertions(+), 15 deletions(-)
>
> diff --git a/hw/arm/virt.c b/hw/arm/virt.c
> index 81eda46b0b..b4598d3fe6 100644
> --- a/hw/arm/virt.c
> +++ b/hw/arm/virt.c
> @@ -2551,16 +2551,10 @@ static HotplugHandler *virt_machine_get_hotplug_handler(MachineState *machine,
>  MachineClass *mc = MACHINE_GET_CLASS(machine);
>  
>  if (device_is_dynamic_sysbus(mc, dev) ||
> -   (object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM))) {
> +object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM) ||
> +object_dynamic_cast(OBJECT(dev), TYPE_VIRTIO_IOMMU_PCI)) {
>  return HOTPLUG_HANDLER(machine);
>  }
> -if (object_dynamic_cast(OBJECT(dev), TYPE_VIRTIO_IOMMU_PCI)) {
> -VirtMachineState *vms = VIRT_MACHINE(machine);
> -
> -if (!vms->bootinfo.firmware_loaded || !virt_is_acpi_enabled(vms)) {
> -return HOTPLUG_HANDLER(machine);
> -}
> -}
>  return NULL;
>  }
>  
> diff --git a/hw/virtio/virtio-iommu-pci.c b/hw/virtio/virtio-iommu-pci.c
> index 770c286be7..f30eb16cbf 100644
> --- a/hw/virtio/virtio-iommu-pci.c
> +++ b/hw/virtio/virtio-iommu-pci.c
> @@ -48,16 +48,9 @@ static void virtio_iommu_pci_realize(VirtIOPCIProxy *vpci_dev, Error **errp)
>  VirtIOIOMMU *s = VIRTIO_IOMMU(vdev);
>  
>  if (!qdev_get_machine_hotplug_handler(DEVICE(vpci_dev))) {
> -MachineClass *mc = MACHINE_GET_CLASS(qdev_get_machine());
> -
> -error_setg(errp,
> -   "%s machine fails to create iommu-map device tree bindings",
> -   mc->name);
>  error_append_hint(errp,
>"Check your machine implements a hotplug handler "
>"for the virtio-iommu-pci device\n");
> -error_append_hint(errp, "Check the guest is booted without FW or with "
> -  "-no-acpi\n");
>  return;
>  }
>  for (int i = 0; i < s->nb_reserved_regions; i++) {




Re: [PATCH v2] include/block.h: remove outdated comment

2021-09-06 Thread Stefan Hajnoczi
On Fri, Sep 03, 2021 at 01:38:00PM +0200, Emanuele Giuseppe Esposito wrote:
> There are a couple of errors in bdrv_drained_begin header comment:
> - block_job_pause does not exist anymore, it has been replaced
>   with job_pause in b15de82867
> - job_pause is automatically invoked as a .drained_begin callback
>   (child_job_drained_begin) by the child_job BdrvChildClass struct
>   in blockjob.c. So no additional pause should be required.
> 
> Signed-off-by: Emanuele Giuseppe Esposito 
> ---
> v2:
> + add "block jobs" to the external request sources
> 
>  include/block/block.h | 4 +---
>  1 file changed, 1 insertion(+), 3 deletions(-)

Reviewed-by: Stefan Hajnoczi 




Re: [PATCH v2 5/8] hw/arm/virt: Reject instantiation of multiple IOMMUs

2021-09-06 Thread Eric Auger
Hi Jean,
On 9/3/21 4:32 PM, Jean-Philippe Brucker wrote:
> We do not support instantiating multiple IOMMUs. Before adding a
> virtio-iommu, check that no other IOMMU is present. This will detect
> both "iommu=smmuv3" machine parameter and another virtio-iommu instance.
>
> Signed-off-by: Jean-Philippe Brucker 

You may add
Fixes: 70e89132c9 ("hw/arm/virt: Add the virtio-iommu device tree mappings")
as the problem already exists with dt.

Reviewed-by: Eric Auger 

Thanks

Eric
> ---
>  hw/arm/virt.c | 5 +
>  1 file changed, 5 insertions(+)
>
> diff --git a/hw/arm/virt.c b/hw/arm/virt.c
> index b4598d3fe6..5ca225291f 100644
> --- a/hw/arm/virt.c
> +++ b/hw/arm/virt.c
> @@ -2475,6 +2475,11 @@ static void virt_machine_device_plug_cb(HotplugHandler *hotplug_dev,
>  if (object_dynamic_cast(OBJECT(dev), TYPE_VIRTIO_IOMMU_PCI)) {
>  PCIDevice *pdev = PCI_DEVICE(dev);
>  
> +if (vms->iommu != VIRT_IOMMU_NONE) {
> +error_setg(errp, "virt machine does not support multiple IOMMUs");
> +return;
> +}
> +
>  vms->iommu = VIRT_IOMMU_VIRTIO;
>  vms->virtio_iommu_bdf = pci_get_bdf(pdev);
>  create_virtio_iommu_dt_bindings(vms);




Re: [PATCH] target/i386: add missing bits to CR4_RESERVED_MASK

2021-09-06 Thread Paolo Bonzini

On 31/08/21 19:57, Richard W.M. Jones wrote:

diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index 6c50d3ab4f..ce85f1a29d 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -257,6 +257,7 @@ typedef enum X86Seg {
  | CR4_DE_MASK | CR4_PSE_MASK | CR4_PAE_MASK \
  | CR4_MCE_MASK | CR4_PGE_MASK | CR4_PCE_MASK \
  | CR4_OSFXSR_MASK | CR4_OSXMMEXCPT_MASK |CR4_UMIP_MASK \
+| CR4_LA57_MASK | CR4_VMXE_MASK | CR4_SMXE_MASK \
  | CR4_FSGSBASE_MASK | CR4_PCIDE_MASK | CR4_OSXSAVE_MASK \
  | CR4_SMEP_MASK | CR4_SMAP_MASK | CR4_PKE_MASK | CR4_PKS_MASK))

First thing to say is I tested this locally and it fixes the
problem seen in https://bugzilla.redhat.com/show_bug.cgi?id=1999700.
I will also add this patch to Fedora soon.  So:

Tested-by: Richard W.M. Jones

But my question is, does this mean that every time a new CPU feature
appears we must remember to update this code?


This is used only by TCG, which explains why VMXE/SMXE were not there; 
however, LA57 is missing indeed.


New features must be included both here and in cr4_reserved_bits, but 
only if TCG supports them, otherwise they can be left out.  Since 
VMXE/SMXE are not supported by TCG, they should be either added both 
here and in cr4_reserved_bits (keyed on env->features[FEAT_1_ECX] & 
CPUID_EXT_{VMX,SMX} respectively), or they should not be added to 
CR4_RESERVED_MASK either.  On the other hand LA57 is already handled by 
cr4_reserved_bits, so it is okay to just add it here.


Thanks,

Paolo




Re: [PATCH v2 3/8] hw/arm/virt-acpi-build: Add VIOT table for virtio-iommu

2021-09-06 Thread Eric Auger
Hi Jean,

On 9/3/21 4:32 PM, Jean-Philippe Brucker wrote:
> When a virtio-iommu is instantiated, describe it using the ACPI VIOT
> table.
>
> Signed-off-by: Jean-Philippe Brucker 

Reviewed-by: Eric Auger 

Thanks

Eric

> ---
>  hw/arm/Kconfig   | 1 +
>  hw/arm/virt-acpi-build.c | 7 +++
>  2 files changed, 8 insertions(+)
>
> diff --git a/hw/arm/Kconfig b/hw/arm/Kconfig
> index 4ba0aca067..7da0422446 100644
> --- a/hw/arm/Kconfig
> +++ b/hw/arm/Kconfig
> @@ -29,6 +29,7 @@ config ARM_VIRT
>  select ACPI_HW_REDUCED
>  select ACPI_NVDIMM
>  select ACPI_APEI
> +select ACPI_VIOT
>  
>  config CHEETAH
>  bool
> diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
> index 037cc1fd82..e2fa677d80 100644
> --- a/hw/arm/virt-acpi-build.c
> +++ b/hw/arm/virt-acpi-build.c
> @@ -55,6 +55,7 @@
>  #include "kvm_arm.h"
>  #include "migration/vmstate.h"
>  #include "hw/acpi/ghes.h"
> +#include "hw/acpi/viot.h"
>  
>  #define ARM_SPI_BASE 32
>  
> @@ -849,6 +850,12 @@ void virt_acpi_build(VirtMachineState *vms, AcpiBuildTables *tables)
>  }
>  #endif
>  
> +if (vms->iommu == VIRT_IOMMU_VIRTIO) {
> +acpi_add_table(table_offsets, tables_blob);
> +build_viot(tables_blob, tables->linker, vms->virtio_iommu_bdf,
> +   vms->oem_id, vms->oem_table_id);
> +}
> +
>  /* XSDT is pointed to by RSDP */
>  xsdt = tables_blob->len;
>  build_xsdt(tables_blob, tables->linker, table_offsets, vms->oem_id,




Re: [PULL 18/28] file-posix: try BLKSECTGET on block devices too, do not round to power of 2

2021-09-06 Thread Halil Pasic
On Fri, 25 Jun 2021 16:18:12 +0200
Paolo Bonzini  wrote:

> bs->sg is only true for character devices, but block devices can also
> be used with scsi-block and scsi-generic.  Unfortunately BLKSECTGET
> returns bytes in an int for /dev/sgN devices, and sectors in a short
> for block devices, so account for that in the code.
> 
> The maximum transfer also need not be a power of 2 (for example I have
> seen disks with 1280 KiB maximum transfer) so there's no need to pass
> the result through pow2floor.
> 
> Signed-off-by: Paolo Bonzini 
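A sketch of how a caller might normalise the two BLKSECTGET conventions described in the commit message, modelled on that description rather than copied from the patch; the function names are hypothetical. It is Linux-only, since BLKSECTGET comes from <linux/fs.h>, and on a non-SCSI character device such as /dev/null the ioctl simply fails.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <linux/fs.h>
#include <sys/ioctl.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Maximum transfer size in bytes, or -errno.
 * Block devices: BLKSECTGET fills an unsigned short count of sectors.
 * /dev/sgN character devices: it fills an int count of bytes.
 */
static int sketch_max_transfer_bytes(int fd)
{
    struct stat st;

    if (fstat(fd, &st) < 0) {
        return -errno;
    }
    if (S_ISBLK(st.st_mode)) {
        unsigned short max_sectors = 0;

        if (ioctl(fd, BLKSECTGET, &max_sectors) == 0) {
            return max_sectors * 512;   /* sectors -> bytes */
        }
    } else {
        int max_bytes = 0;

        if (ioctl(fd, BLKSECTGET, &max_bytes) == 0) {
            assert(max_bytes >= 0);
            return max_bytes;           /* already in bytes */
        }
    }
    return -errno;
}

/* Convenience wrapper: open @path read-only and query it. */
static int sketch_max_transfer_for_path(const char *path)
{
    int fd = open(path, O_RDONLY);
    int ret;

    if (fd < 0) {
        return -errno;
    }
    ret = sketch_max_transfer_bytes(fd);
    close(fd);
    return ret;
}
```

Note that, as the commit message says, the result is used as-is rather than rounded down to a power of 2.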

We have found that this patch leads to in-guest I/O errors when a DASD
is used as the source device, i.e., in libvirt domain XML terms, something like:


  
  
  
  
  


I don't think this patch is at fault: it LGTM. But the correlation is
100%; furthermore, the problem seems to be related to the value of
bl.max_iov, which now comes from sysfs.

We are still investigating what is actually wrong. Just wanted to give
everybody a heads-up that this does seem to cause a nasty regression on
s390x, even if the code itself is perfect.

Regards,
Halil



Re: [PATCH 0/2] iothread: cleanup after adding a new parameter to IOThread

2021-09-06 Thread Philippe Mathieu-Daudé
On 7/27/21 4:59 PM, Stefano Garzarella wrote:
> We recently added a new parameter (aio-max-batch) to IOThread.
> This series cleans up the code a bit, no functional changes.
> 
> Stefano Garzarella (2):
>   iothread: rename PollParamInfo to IOThreadParamInfo
>   iothread: use IOThreadParamInfo in iothread_[set|get]_param()
> 
>  iothread.c | 28 +++-
>  1 file changed, 15 insertions(+), 13 deletions(-)

Reviewed-by: Philippe Mathieu-Daudé 




Re: [PULL 00/36] (Mostly) x86 changes for 2021-09-06

2021-09-06 Thread Peter Maydell
On Mon, 6 Sept 2021 at 14:13, Paolo Bonzini  wrote:
>
> The following changes since commit 31ebff513fad11f315377f6b07447169be8d9f86:
>
>   Merge remote-tracking branch 'remotes/armbru/tags/pull-qapi-2021-09-03' into staging (2021-09-04 19:21:19 +0100)
>
> are available in the Git repository at:
>
>   https://gitlab.com/bonzini/qemu.git tags/for-upstream
>
> for you to fetch changes up to 4e3cdb6ce6048bb28d70a438081252a29563b757:
>
>   doc: Add the SGX doc (2021-09-06 04:10:24 -0400)
>
> 
> * SGX support (Sean, Yang)
> * vGIF and vVMLOAD/VMSAVE support (Lara)
> * Move GBM handling to Meson (Thomas)

Hi; this has a merge conflict in meson.build; please rebase and resend.
(I think it's likely the gbm patch that Thomas mentions that's already
in master.)

-- PMM



Re: [PATCH 0/2] iothread: cleanup after adding a new parameter to IOThread

2021-09-06 Thread Stefano Garzarella

Ping :-)

Looks like it fell through the cracks during the feature freeze;
should I resend it?

On Tue, Jul 27, 2021 at 04:59:34PM +0200, Stefano Garzarella wrote:

We recently added a new parameter (aio-max-batch) to IOThread.
This series cleans up the code a bit, no functional changes.

Stefano Garzarella (2):
 iothread: rename PollParamInfo to IOThreadParamInfo
 iothread: use IOThreadParamInfo in iothread_[set|get]_param()

iothread.c | 28 +++-
1 file changed, 15 insertions(+), 13 deletions(-)

--
2.31.1







Re: [PULL v2 00/10] Testing, build system and misc patches

2021-09-06 Thread Peter Maydell
On Mon, 6 Sept 2021 at 12:29, Thomas Huth  wrote:
>
>  Hi Peter!
>
> The following changes since commit 31ebff513fad11f315377f6b07447169be8d9f86:
>
>   Merge remote-tracking branch 'remotes/armbru/tags/pull-qapi-2021-09-03' into staging (2021-09-04 19:21:19 +0100)
>
> are available in the Git repository at:
>
>   https://gitlab.com/thuth/qemu.git tags/pull-request-2021-09-06
>
> for you to fetch changes up to 6695e4c0fd9ef05bf6ab8e3402d5bc95b39c4cf3:
>
>   softmmu/vl: Deprecate the -sdl and -curses option (2021-09-06 10:00:14 +0200)
>
> v2:
>  - Dropped patches that were already merged through Alex' pull request
>  - Fixed the GBM patch to not cause a warning with --static user builds anymore
>
> 
> * Add definitions of terms for CI/testing
> * Fix g_setenv problem discovered by Coverity
> * Gitlab CI improvements
> * Build system improvements (configure script + meson.build)
> * Removal of the show-fixed-bugs.sh script
> * Clean up of the sdl and curses options
>


Applied, thanks.

Please update the changelog at https://wiki.qemu.org/ChangeLog/6.2
for any user-visible changes.

-- PMM



Re: [PULL 08/36] configure / meson: Move the GBM handling to meson.build

2021-09-06 Thread Thomas Huth

On 06/09/2021 15.10, Paolo Bonzini wrote:

From: Thomas Huth 

The GBM library detection does not need to be in the configure script,
since it does not have any user-facing options (there are no
--enable-gbm or --disable-gbm switches). Let's move it to meson.build
instead, so we don't have to clutter config-host.mak with the related
switches.

Additionally, only check for GBM if it is really required, i.e. if we
either compile with OpenGL or with virglrenderer support.

Signed-off-by: Thomas Huth 
Message-Id: <20210713111516.734834-1-th...@redhat.com>
Signed-off-by: Paolo Bonzini 
---
  configure  | 14 --
  contrib/vhost-user-gpu/meson.build |  5 ++---
  meson.build| 14 --
  3 files changed, 10 insertions(+), 23 deletions(-)

diff --git a/configure b/configure
index bd823307a6..8adf2127c3 100755
--- a/configure
+++ b/configure
@@ -3451,13 +3451,6 @@ esac
  ##
  # opengl probe (for sdl2, gtk)
  
-gbm="no"
-if $pkg_config gbm; then
-gbm_cflags="$($pkg_config --cflags gbm)"
-gbm_libs="$($pkg_config --libs gbm)"
-gbm="yes"
-fi
-
  if test "$opengl" != "no" ; then
epoxy=no
if $pkg_config epoxy; then
@@ -4688,13 +4681,6 @@ if test "$opengl" = "yes" ; then
echo "OPENGL_LIBS=$opengl_libs" >> $config_host_mak
  fi
  
-if test "$gbm" = "yes" ; then
-echo "CONFIG_GBM=y" >> $config_host_mak
-echo "GBM_LIBS=$gbm_libs" >> $config_host_mak
-echo "GBM_CFLAGS=$gbm_cflags" >> $config_host_mak
-fi
-
-
  if test "$avx2_opt" = "yes" ; then
echo "CONFIG_AVX2_OPT=y" >> $config_host_mak
  fi
diff --git a/contrib/vhost-user-gpu/meson.build b/contrib/vhost-user-gpu/meson.build
index 4cb52a91d7..92c8f3a86a 100644
--- a/contrib/vhost-user-gpu/meson.build
+++ b/contrib/vhost-user-gpu/meson.build
@@ -1,6 +1,5 @@
-if 'CONFIG_TOOLS' in config_host and virgl.found() \
-and 'CONFIG_GBM' in config_host and 'CONFIG_LINUX' in config_host \
-and pixman.found()
+if 'CONFIG_TOOLS' in config_host and virgl.found() and gbm.found() \
+and 'CONFIG_LINUX' in config_host and pixman.found()
executable('vhost-user-gpu', files('vhost-user-gpu.c', 'virgl.c', 'vugbm.c'),
   dependencies: [qemuutil, pixman, gbm, virgl, vhost_user, opengl],
   install: true,
diff --git a/meson.build b/meson.build
index bf63784812..13df8c37c7 100644
--- a/meson.build
+++ b/meson.build
@@ -472,11 +472,6 @@ if not get_option('zstd').auto() or have_block
  required: get_option('zstd'),
  method: 'pkg-config', kwargs: static_kwargs)
  endif
-gbm = not_found
-if 'CONFIG_GBM' in config_host
-  gbm = declare_dependency(compile_args: config_host['GBM_CFLAGS'].split(),
-   link_args: config_host['GBM_LIBS'].split())
-endif
  virgl = not_found
  if not get_option('virglrenderer').auto() or have_system
virgl = dependency('virglrenderer',
@@ -816,11 +811,17 @@ coreaudio = not_found
  if 'CONFIG_AUDIO_COREAUDIO' in config_host
coreaudio = declare_dependency(link_args: config_host['COREAUDIO_LIBS'].split())
  endif
+
  opengl = not_found
  if 'CONFIG_OPENGL' in config_host
opengl = declare_dependency(compile_args: config_host['OPENGL_CFLAGS'].split(),
link_args: config_host['OPENGL_LIBS'].split())
  endif
+gbm = not_found
+if virgl.found() or 'CONFIG_OPENGL' in config_host
+  gbm = dependency('gbm', method: 'pkg-config',
+   required: false, kwargs: static_kwargs)
+endif


Please drop this version of the patch; it caused a new warning in Peter's
merge tests, see:


 https://lists.gnu.org/archive/html/qemu-devel/2021-09/msg01026.html

I've got a fixed version in my pull request from today.

 Thomas




[PATCH v4] hw/arm/aspeed: Add Fuji machine type

2021-09-06 Thread pdel
From: Peter Delevoryas 

This adds a new machine type "fuji-bmc" based on the following device tree:

https://github.com/torvalds/linux/blob/40cb6373b46/arch/arm/boot/dts/aspeed-bmc-facebook-fuji.dts

Most of the i2c devices are not in the device tree; they're added here:

https://github.com/facebook/openbmc/blob/fb2ed12002fb/meta-facebook/meta-fuji/recipes-utils/openbmc-utils/files/setup_i2c.sh

I tested this by building a Fuji image from Facebook's OpenBMC repo,
booting, and ssh'ing from host-to-guest.
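The alias numbering in fuji_bmc_i2c_init() in the patch below is dense enough that an off-by-eight is easy to miss. As a sanity check, here is a standalone sketch that does not use any QEMU API (add_pca9548(), fuji_bus_layout() and FUJI_NUM_BUSES are hypothetical names). It replays the same get_pca9548_channels() calls, records the parent bus of each alias, and asserts that every alias falls inside the 144-entry i2c[] array and is assigned only once.

```c
#include <assert.h>

#define FUJI_NUM_BUSES 144 /* matches I2CBus *i2c[144] in the patch */

/*
 * Hypothetical helper mirroring get_pca9548_channels(parent, addr, &i2c[first]):
 * the 8 channels of a PCA9548 on bus `parent` become aliases first..first+7.
 * parent_of[] holds -1 for root controller buses and unused aliases.
 */
static void add_pca9548(int parent_of[], int parent, int first)
{
    for (int ch = 0; ch < 8; ch++) {
        assert(first + ch < FUJI_NUM_BUSES);  /* stays inside i2c[144] */
        assert(parent_of[first + ch] == -1);  /* no alias assigned twice */
        parent_of[first + ch] = parent;
    }
}

/* Rebuild the alias layout used by fuji_bmc_i2c_init(). */
static void fuji_bus_layout(int parent_of[FUJI_NUM_BUSES])
{
    for (int i = 0; i < FUJI_NUM_BUSES; i++) {
        parent_of[i] = -1;
    }
    add_pca9548(parent_of, 2, 16);   /* i2c180 == i2c[2]  -> 16..23 */
    add_pca9548(parent_of, 8, 24);   /* i2c480 == i2c[8]  -> 24..31 */
    /* 32..39 are skipped by the device tree alias numbering */
    add_pca9548(parent_of, 11, 40);  /* i2c600 == i2c[11] -> 40..47 */
    add_pca9548(parent_of, 24, 48);  /* muxes nested behind i2c480's mux */
    add_pca9548(parent_of, 25, 56);
    add_pca9548(parent_of, 26, 64);
    add_pca9548(parent_of, 27, 72);
    for (int i = 0; i < 8; i++) {
        add_pca9548(parent_of, 40 + i, 80 + i * 8);  /* 80..143 */
    }
}
```

After fuji_bus_layout(p), p[143] is 47 (the last channel of the mux on i2c[47]) and p[32] through p[39] stay -1, matching the gap noted in the patch. The highest bus the patch touches, i2c[84 + 7 * 8] == i2c[140], is also in range.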

Signed-off-by: Peter Delevoryas 
---
 hw/arm/aspeed.c | 113 
 1 file changed, 113 insertions(+)

diff --git a/hw/arm/aspeed.c b/hw/arm/aspeed.c
index 7a9459340c..95ce4b1670 100644
--- a/hw/arm/aspeed.c
+++ b/hw/arm/aspeed.c
@@ -159,6 +159,10 @@ struct AspeedMachineState {
 #define RAINIER_BMC_HW_STRAP1 0x
 #define RAINIER_BMC_HW_STRAP2 0x
 
+/* Fuji hardware value */
+#define FUJI_BMC_HW_STRAP1 0x
+#define FUJI_BMC_HW_STRAP2 0x
+
 /*
  * The max ram region is for firmwares that scan the address space
  * with load/store to guess how much RAM the SoC has.
@@ -772,6 +776,91 @@ static void rainier_bmc_i2c_init(AspeedMachineState *bmc)
aspeed_eeprom_init(aspeed_i2c_get_bus(&soc->i2c, 15), 0x50, 64 * KiB);
 }
 
+static void get_pca9548_channels(I2CBus *bus, uint8_t mux_addr,
+ I2CBus **channels)
+{
+I2CSlave *mux = i2c_slave_create_simple(bus, "pca9548", mux_addr);
+for (int i = 0; i < 8; i++) {
+channels[i] = pca954x_i2c_get_bus(mux, i);
+}
+}
+
+#define TYPE_LM75 TYPE_TMP105
+#define TYPE_TMP75 TYPE_TMP105
+#define TYPE_TMP422 "tmp422"
+
+static void fuji_bmc_i2c_init(AspeedMachineState *bmc)
+{
+AspeedSoCState *soc = &bmc->soc;
+I2CBus *i2c[144] = {};
+
+for (int i = 0; i < 16; i++) {
+i2c[i] = aspeed_i2c_get_bus(&soc->i2c, i);
+}
+I2CBus *i2c180 = i2c[2];
+I2CBus *i2c480 = i2c[8];
+I2CBus *i2c600 = i2c[11];
+
+get_pca9548_channels(i2c180, 0x70, &i2c[16]);
+get_pca9548_channels(i2c480, 0x70, &i2c[24]);
+/* NOTE: The device tree skips [32, 40) in the alias numbering */
+get_pca9548_channels(i2c600, 0x77, &i2c[40]);
+get_pca9548_channels(i2c[24], 0x71, &i2c[48]);
+get_pca9548_channels(i2c[25], 0x72, &i2c[56]);
+get_pca9548_channels(i2c[26], 0x76, &i2c[64]);
+get_pca9548_channels(i2c[27], 0x76, &i2c[72]);
+for (int i = 0; i < 8; i++) {
+get_pca9548_channels(i2c[40 + i], 0x76, &i2c[80 + i * 8]);
+}
+
+i2c_slave_create_simple(i2c[17], TYPE_LM75, 0x4c);
+i2c_slave_create_simple(i2c[17], TYPE_LM75, 0x4d);
+
+aspeed_eeprom_init(i2c[19], 0x52, 64 * KiB);
+aspeed_eeprom_init(i2c[20], 0x50, 2 * KiB);
+aspeed_eeprom_init(i2c[22], 0x52, 2 * KiB);
+
+i2c_slave_create_simple(i2c[3], TYPE_LM75, 0x48);
+i2c_slave_create_simple(i2c[3], TYPE_LM75, 0x49);
+i2c_slave_create_simple(i2c[3], TYPE_LM75, 0x4a);
+i2c_slave_create_simple(i2c[3], TYPE_TMP422, 0x4c);
+
+aspeed_eeprom_init(i2c[8], 0x51, 64 * KiB);
+i2c_slave_create_simple(i2c[8], TYPE_LM75, 0x4a);
+
+i2c_slave_create_simple(i2c[50], TYPE_LM75, 0x4c);
+aspeed_eeprom_init(i2c[50], 0x52, 64 * KiB);
+i2c_slave_create_simple(i2c[51], TYPE_TMP75, 0x48);
+i2c_slave_create_simple(i2c[52], TYPE_TMP75, 0x49);
+
+i2c_slave_create_simple(i2c[59], TYPE_TMP75, 0x48);
+i2c_slave_create_simple(i2c[60], TYPE_TMP75, 0x49);
+
+aspeed_eeprom_init(i2c[65], 0x53, 64 * KiB);
+i2c_slave_create_simple(i2c[66], TYPE_TMP75, 0x49);
+i2c_slave_create_simple(i2c[66], TYPE_TMP75, 0x48);
+aspeed_eeprom_init(i2c[68], 0x52, 64 * KiB);
+aspeed_eeprom_init(i2c[69], 0x52, 64 * KiB);
+aspeed_eeprom_init(i2c[70], 0x52, 64 * KiB);
+aspeed_eeprom_init(i2c[71], 0x52, 64 * KiB);
+
+aspeed_eeprom_init(i2c[73], 0x53, 64 * KiB);
+i2c_slave_create_simple(i2c[74], TYPE_TMP75, 0x49);
+i2c_slave_create_simple(i2c[74], TYPE_TMP75, 0x48);
+aspeed_eeprom_init(i2c[76], 0x52, 64 * KiB);
+aspeed_eeprom_init(i2c[77], 0x52, 64 * KiB);
+aspeed_eeprom_init(i2c[78], 0x52, 64 * KiB);
+aspeed_eeprom_init(i2c[79], 0x52, 64 * KiB);
+aspeed_eeprom_init(i2c[28], 0x50, 2 * KiB);
+
+for (int i = 0; i < 8; i++) {
+aspeed_eeprom_init(i2c[81 + i * 8], 0x56, 64 * KiB);
+i2c_slave_create_simple(i2c[82 + i * 8], TYPE_TMP75, 0x48);
+i2c_slave_create_simple(i2c[83 + i * 8], TYPE_TMP75, 0x4b);
+i2c_slave_create_simple(i2c[84 + i * 8], TYPE_TMP75, 0x4a);
+}
+}
+
 static bool aspeed_get_mmio_exec(Object *obj, Error **errp)
 {
 return ASPEED_MACHINE(obj)->mmio_exec;
@@ -1070,6 +1159,26 @@ static void aspeed_machine_rainier_class_init(ObjectClass *oc, void *data)
 aspeed_soc_num_cpus(amc->soc_name);
 };
 
+static void aspeed_machine_fuji_class_init(ObjectClass *oc, void *data)
+{
+MachineClass *mc = MACHINE_CLASS(oc);
+AspeedMachineClass *amc = ASPEED_MACHINE_CLASS(oc);
+
+mc->desc = "Facebook 

[PATCH v2] Prevent vhost-user-blk-test hang

2021-09-06 Thread Raphael Norwitz
In the vhost-user-blk-test, as of now there is nothing stopping
vhost-user-blk in QEMU from writing to the socket right after forking off the
storage daemon before it has a chance to come up properly, leaving the
test hanging forever. This intermittently hanging test has caused QEMU
automation failures reported multiple times on the mailing list [1].

This change makes the storage-daemon notify the vhost-user-blk-test
that it is fully initialized and ready to handle client connections by
creating a pidfile on initialization. This ensures that the storage-daemon
backend won't miss vhost-user messages and thereby resolves the hang.

[1] https://lore.kernel.org/qemu-devel/CAFEAcA8kYpz9LiPNxnWJAPSjc=nv532bedyfynabemeohqb...@mail.gmail.com/

Signed-off-by: Raphael Norwitz 
---
 tests/qtest/vhost-user-blk-test.c | 33 ++-
 1 file changed, 32 insertions(+), 1 deletion(-)

diff --git a/tests/qtest/vhost-user-blk-test.c b/tests/qtest/vhost-user-blk-test.c
index 6f108a1b62..78140e6f28 100644
--- a/tests/qtest/vhost-user-blk-test.c
+++ b/tests/qtest/vhost-user-blk-test.c
@@ -24,6 +24,9 @@
 #define TEST_IMAGE_SIZE (64 * 1024 * 1024)
 #define QVIRTIO_BLK_TIMEOUT_US  (30 * 1000 * 1000)
 #define PCI_SLOT_HP 0x06
+#define PIDFILE_RETRIES 5
+
+const char *pidfile_format = "/tmp/daemon-%d";
 
 typedef struct {
 pid_t pid;
@@ -885,7 +888,8 @@ static void start_vhost_user_blk(GString *cmd_line, int vus_instances,
  int num_queues)
 {
 const char *vhost_user_blk_bin = qtest_qemu_storage_daemon_binary();
-int i;
+int i, err, retries;
+char *daemon_pidfile_path;
 gchar *img_path;
 GString *storage_daemon_command = g_string_new(NULL);
 QemuStorageDaemonState *qsd;
@@ -898,6 +902,12 @@ static void start_vhost_user_blk(GString *cmd_line, int vus_instances,
 " -object memory-backend-memfd,id=mem,size=256M,share=on "
 " -M memory-backend=mem -m 256M ");
 
+err = asprintf(&daemon_pidfile_path, pidfile_format, getpid());
+if (err == -1) {
+fprintf(stderr, "Failed to format storage-daemon pidfile name %m");
+abort();
+}
+
 for (i = 0; i < vus_instances; i++) {
 int fd;
 char *sock_path = create_listen_socket();
@@ -914,6 +924,9 @@ static void start_vhost_user_blk(GString *cmd_line, int vus_instances,
i + 1, sock_path);
 }
 
+g_string_append_printf(storage_daemon_command, "--pidfile %s",
+   daemon_pidfile_path);
+
 g_test_message("starting vhost-user backend: %s",
storage_daemon_command->str);
 pid_t pid = fork();
@@ -930,7 +943,25 @@ static void start_vhost_user_blk(GString *cmd_line, int vus_instances,
 execlp("/bin/sh", "sh", "-c", storage_daemon_command->str, NULL);
 exit(1);
 }
+
+/*
+ * Ensure the storage-daemon has come up properly before allowing the
+ * test to proceed.
+ */
+retries = 0;
+while (access(daemon_pidfile_path, F_OK) != 0) {
+if (retries > PIDFILE_RETRIES) {
+fprintf(stderr, "The storage-daemon failed to come up after %d "
+"retries - killing the test\n", retries);
+abort();
+}
+
+retries++;
+usleep(1000);
+}
+
 g_string_free(storage_daemon_command, true);
+free(daemon_pidfile_path);
 
 qsd = g_new(QemuStorageDaemonState, 1);
 qsd->pid = pid;
-- 
2.20.1
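The readiness check in this patch polls for the pidfile with access() and a fixed sleep. Note that usleep(1000) sleeps 1 ms per retry, so PIDFILE_RETRIES = 5 bounds the wait to a few milliseconds rather than the five seconds the original error message mentions. A minimal sketch of the same polling idea with an explicit per-check interval, so the total budget (max_retries × interval_ms) is easy to state: wait_for_pidfile() is a hypothetical helper, not part of the qtest framework, and nanosleep() stands in for usleep().

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical helper: poll until `path` exists, checking up to
 * `max_retries` times and sleeping `interval_ms` between checks.
 * Returns true once the file exists, false when the budget runs out.
 */
static bool wait_for_pidfile(const char *path, int max_retries, long interval_ms)
{
    assert(path != NULL && max_retries > 0 && interval_ms >= 0);

    struct timespec delay = {
        .tv_sec = interval_ms / 1000,
        .tv_nsec = (interval_ms % 1000) * 1000000L,
    };

    for (int attempt = 0; attempt < max_retries; attempt++) {
        if (access(path, F_OK) == 0) {
            return true;
        }
        /* May return early on a signal; harmless for a polling loop. */
        nanosleep(&delay, NULL);
    }
    /* Final check so a file created during the last sleep is not missed. */
    return access(path, F_OK) == 0;
}
```

For instance, wait_for_pidfile(daemon_pidfile_path, 50, 100) would give the daemon about five seconds to create its pidfile before the caller gives up, and the error message can then report that budget accurately.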



[PATCH v2 0/1] hw/arm/aspeed: Initialize AST2600 UART clock selection registers

2021-09-06 Thread pdel
From: Peter Delevoryas 

After fixing some commit log issues in another patch:

https://lore.kernel.org/qemu-devel/2f2f44c6-4817-4d58-a7a0-496446ac7...@fb.com/

I noticed that I had similar issues in another patch I submitted:

https://lore.kernel.org/qemu-devel/20210831142502.279485-1-p...@fb.com/

So I'm resubmitting this patch with the URL in the commit description
changed to a permalink using a commit hash and removing the test
instructions that I had at the end.

Thanks,
Peter

Peter Delevoryas (1):
  hw/arm/aspeed: Initialize AST2600 UART clock selection registers

 hw/misc/aspeed_scu.c | 4 
 1 file changed, 4 insertions(+)

-- 
2.30.2



