date:20231204

Re: [PATCH v4 0/4] Support RISC-V IOPMP

2023-12-04 Thread Ethan Chen via

Ping.
https://patchew.org/QEMU/20231122053251.440723-1-etha...@andestech.com/

On Wed, Nov 22, 2023 at 01:32:47PM +0800, Ethan Chen wrote:
> This series implements IOPMP specification v1.0.0-draft4 rapid-k model.
> The specification url:
> https://github.com/riscv-non-isa/iopmp-spec/blob/main/riscv_iopmp_specification.pdf
> 
> When IOPMP is enabled, a DMA device ATCDMAC300 is added to the RISC-V virt
> platform. This DMA device is connected to the IOPMP and has the
> functionalities required by IOPMP, including:
> - Support for setting up the connection to IOPMP
> - Support for asynchronous I/O to handle stalled transactions
> - Sending transaction information
> 
> IOPMP treats a transaction that only partially matches an entry as a
> partial-hit error. The transaction size depends on the source device, the
> destination device, and the bus.
> 
> A source device can send a transaction_info to IOPMP, and IOPMP uses it to
> check for partial hits. If the source device does not send a
> transaction_info, IOPMP checks the information in the IOMMU and does not
> check for partial hits.
> 
> Changes for v4:
> 
>   - Add descriptions of IOPMP and ATCDMAC300
>   - Refine coding style and comments
>   - config XILINX_AXI does not include file stream.c but selects config STREAM
>     instead.
>   - ATCDMAC300: INT_STATUS is write 1 clear per bit
>   Rename iopmp_address_sink to transcation_info_sink
>   - IOPMP: Refine error message and remove unused variable
>   - VIRT: Document new options
>   atcdmac300 is only added when iopmp is enabled
>   serial setting should not be changed
> 
> Ethan Chen (4):
>   hw/core: Add config stream
>   Add RISC-V IOPMP support
>   hw/dma: Add Andes ATCDMAC300 support
>   hw/riscv/virt: Add IOPMP support
> 
>  docs/system/riscv/virt.rst|  11 +
>  hw/Kconfig|   1 +
>  hw/core/Kconfig   |   3 +
>  hw/core/meson.build   |   2 +-
>  hw/dma/Kconfig|   4 +
>  hw/dma/atcdmac300.c   | 566 ++
>  hw/dma/meson.build|   1 +
>  hw/misc/Kconfig   |   4 +
>  hw/misc/meson.build   |   1 +
>  hw/misc/riscv_iopmp.c | 966 ++
>  hw/riscv/Kconfig  |   2 +
>  hw/riscv/virt.c   |  65 ++
>  include/hw/dma/atcdmac300.h   | 180 
>  include/hw/misc/riscv_iopmp.h | 341 +++
>  .../hw/misc/riscv_iopmp_transaction_info.h|  28 +
>  include/hw/riscv/virt.h   |  10 +-
>  16 files changed, 2183 insertions(+), 2 deletions(-)
>  create mode 100644 hw/dma/atcdmac300.c
>  create mode 100644 hw/misc/riscv_iopmp.c
>  create mode 100644 include/hw/dma/atcdmac300.h
>  create mode 100644 include/hw/misc/riscv_iopmp.h
>  create mode 100644 include/hw/misc/riscv_iopmp_transaction_info.h
> 
> -- 
> 2.34.1
>
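
For readers following along, the partial-hit rule described in the cover letter can be sketched as below. This is only an illustration, not code from the series: it assumes an entry covers a half-open byte range, and the names `Match` and `classify` are hypothetical.

```python
from enum import Enum


class Match(Enum):
    NONE = "none"        # transaction does not touch the entry
    PARTIAL = "partial"  # transaction straddles an entry boundary
    FULL = "full"        # transaction lies entirely inside the entry


def classify(entry_start, entry_end, txn_start, txn_len):
    """Classify a transaction against one entry covering [entry_start, entry_end)."""
    txn_end = txn_start + txn_len
    if txn_end <= entry_start or txn_start >= entry_end:
        return Match.NONE
    if txn_start >= entry_start and txn_end <= entry_end:
        return Match.FULL
    # Overlaps the entry but is not contained in it: a partial hit.
    return Match.PARTIAL
```

The result depends on the transaction length, which is why the cover letter says the check needs the transaction_info supplied by the source device.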

Re: [PATCH] ui/gtk: flush display pipeline before saving vmstate when blob=true

2023-12-04 Thread Marc-André Lureau

Hi

On Tue, Dec 5, 2023 at 6:40 AM Dongwon Kim  wrote:
>
> If the guest is paused before it gets a response for the current
> scanout frame submission (resource-flush), it won't start submitting
> new frames after being restored, as it still waits for the old response,
> which is accepted as a scanout render done signal. So the current
> scanout render pipeline needs to be unblocked before the run state is
> changed, to make sure the guest receives the response for the current
> frame submission.
>
> Cc: Marc-André Lureau 
> Cc: Vivek Kasireddy 
> Signed-off-by: Dongwon Kim 
> ---
>  ui/gtk.c | 12 
>  1 file changed, 12 insertions(+)
>
> diff --git a/ui/gtk.c b/ui/gtk.c
> index 810d7fc796..0f6237dd2f 100644
> --- a/ui/gtk.c
> +++ b/ui/gtk.c
> @@ -678,6 +678,18 @@ static const DisplayGLCtxOps egl_ctx_ops = {
>  static void gd_change_runstate(void *opaque, bool running, RunState state)
>  {
>      GtkDisplayState *s = opaque;
> +    int i;
> +
> +    if (state == RUN_STATE_SAVE_VM) {
> +        for (i = 0; i < s->nb_vcs; i++) {
> +            VirtualConsole *vc = &s->vc[i];
> +
> +            if (vc->gfx.guest_fb.dmabuf) {

&& ..dmabuf->fence_fd >= 0

> +                /* force flushing current scanout blob rendering process */
> +                gd_hw_gl_flushed(vc);

This defeats the purpose of the fence; maybe we should wait for it to
be signaled first. At worst, I suppose the client can have some
glitches, although since the guest is stopped, this is unlikely.

[PATCH v2] qemu: send stop event after bdrv_flush_all

2023-12-04 Thread tianren

From: Tianren Zhang 

The stop process is not finished until bdrv_flush_all
is done. Some users (e.g., libvirt) detect the STOP
event and invoke lock release logic to revoke
the disk lock held by the current QEMU when that event is
emitted. In that case, if bdrv_flush_all runs after
the STOP event, the disk lock may be
released while QEMU is still waiting for I/O.
Therefore, it's better to generate the STOP event
after the whole stop process is done, so we can
guarantee to users that the stop process is finished
when they get the STOP event.

Change-Id: Ia2f95cd55edfdeb71ee2e04005ac216cfabffa22
Signed-off-by: Tianren Zhang 
---
v2: do not call runstate_is_running twice
---
 system/cpus.c | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/system/cpus.c b/system/cpus.c
index a444a747f0..49af0f92b5 100644
--- a/system/cpus.c
+++ b/system/cpus.c
@@ -262,21 +262,24 @@ void cpu_interrupt(CPUState *cpu, int mask)
 static int do_vm_stop(RunState state, bool send_stop)
 {
     int ret = 0;
+    bool do_send_stop = false;
 
     if (runstate_is_running()) {
         runstate_set(state);
         cpu_disable_ticks();
         pause_all_vcpus();
         vm_state_notify(0, state);
-        if (send_stop) {
-            qapi_event_send_stop();
-        }
+        do_send_stop = send_stop;
     }
 
     bdrv_drain_all();
     ret = bdrv_flush_all();
     trace_vm_stop_flush_all(ret);
 
+    if (do_send_stop) {
+        qapi_event_send_stop();
+    }
+
     return ret;
 }
 
-- 
2.41.0
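
As a rough illustration of the ordering this patch is after, the following sketch models the stop sequence as a list of steps. It is a simplified model, not QEMU code: the step names and the `log` list are hypothetical stand-ins for the real calls.

```python
def do_vm_stop(running, send_stop, log):
    """Model of the patched ordering: the STOP event is deferred past the flush."""
    do_send_stop = False
    if running:
        log.append("pause_all_vcpus")
        # Remember that an event is wanted, but do not emit it yet.
        do_send_stop = send_stop
    log.append("bdrv_drain_all")
    log.append("bdrv_flush_all")
    if do_send_stop:
        # Only now may a listener such as libvirt release the disk lock.
        log.append("STOP event")
    return log
```

With `running=True` and `send_stop=True` the event is the last entry in the log; if the VM was not running, no event is recorded at all, matching the behaviour before the patch.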

[PULL 0/1] ufs fix for 2023-12-05

2023-12-04 Thread Jeuk Kim

From: Jeuk Kim 

The following changes since commit 1664d74c50739401c8b40e8b514d12b5fc250067:

  tests/avocado: Update yamon-bin-02.22.zip URL (2023-12-04 08:17:35 -0500)

are available in the Git repository at:

  https://gitlab.com/jeuk20.kim/qemu.git tags/pull-ufs-20231205

for you to fetch changes up to 80a37b039ea9473d038bcef8bb64f4213affeae8:

  hw/ufs: avoid generating the same ID string for different LU devices (2023-12-05 13:57:18 +0900)


ufs fixes for 8.2

- Fix QEMU not starting when creating two UFS host controllers


Akinobu Mita (1):
  hw/ufs: avoid generating the same ID string for different LU devices

 hw/ufs/ufs.c | 8 
 1 file changed, 8 insertions(+)

[PULL 1/1] hw/ufs: avoid generating the same ID string for different LU devices

2023-12-04 Thread Jeuk Kim

From: Akinobu Mita 

QEMU would not start when trying to create two UFS host controllers and
a UFS logical unit for each with the following options:

-device ufs,id=bus0 \
-device ufs-lu,drive=drive1,bus=bus0,lun=0 \
-device ufs,id=bus1 \
-device ufs-lu,drive=drive2,bus=bus1,lun=0 \

This is because the same ID string ("0:0:0/scsi-disk") is generated
for both UFS logical units.

To fix this issue, prepend the parent pci device's path to make
the ID string unique.
(":00:03.0/0:0:0/scsi-disk" and ":00:04.0/0:0:0/scsi-disk")

Resolves: #2018
Fixes: 096434fea13a ("hw/ufs: Modify lu.c to share codes with SCSI subsystem")
Signed-off-by: Akinobu Mita 
Reviewed-by: Jeuk Kim 
Reviewed-by: Philippe Mathieu-Daudé 
Message-Id: <20231204150543.48252-1-akinobu.m...@gmail.com>
Signed-off-by: Jeuk Kim 
---
 hw/ufs/ufs.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/hw/ufs/ufs.c b/hw/ufs/ufs.c
index 68c5f1f6c9..eccdb852a0 100644
--- a/hw/ufs/ufs.c
+++ b/hw/ufs/ufs.c
@@ -1323,9 +1323,17 @@ static bool ufs_bus_check_address(BusState *qbus, DeviceState *qdev,
     return true;
 }
 
+static char *ufs_bus_get_dev_path(DeviceState *dev)
+{
+    BusState *bus = qdev_get_parent_bus(dev);
+
+    return qdev_get_dev_path(bus->parent);
+}
+
 static void ufs_bus_class_init(ObjectClass *class, void *data)
 {
     BusClass *bc = BUS_CLASS(class);
+    bc->get_dev_path = ufs_bus_get_dev_path;
     bc->check_address = ufs_bus_check_address;
 }
 
-- 
2.34.1
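
To make the collision concrete, here is a small sketch of how prepending the parent device path separates the two IDs. It only mimics the string shapes quoted in the commit message; the function `lu_id` and its parameters are hypothetical and do not correspond to a QEMU API.

```python
def lu_id(parent_path, channel, target, lun, suffix="scsi-disk"):
    """Build an ID string, optionally prefixed with the parent device path."""
    local = f"{channel}:{target}:{lun}/{suffix}"
    return f"{parent_path}/{local}" if parent_path else local


# Without a bus path, both logical units get the same ID and collide.
before = {lu_id(None, 0, 0, 0), lu_id(None, 0, 0, 0)}

# With the parent path prepended (strings as quoted in the commit message),
# the two IDs become distinct.
after = {lu_id(":00:03.0", 0, 0, 0), lu_id(":00:04.0", 0, 0, 0)}
```

Here `before` ends up with a single element while `after` has two, which is the uniqueness property the fix relies on.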

Re: [PATCH-for-8.2?] hw/ufs: avoid generating the same ID string for different LU devices

2023-12-04 Thread Jeuk Kim

Hi Philippe,

Thank you for informing me.
I would like this issue to be fixed for the 8.2 release.

I created an issue as you suggested, but I couldn't assign the milestone
(I'm guessing it's a permission problem).
https://gitlab.com/qemu-project/qemu/-/issues/2018
Could you help me with this?

Regardless, I'm going to submit a pull request to fix the issue right away.

Thank you.

On 4/12/23, Philippe Mathieu-Daudé wrote:
> Hi Jeuk,
> 
> On 4/12/23 17:50, Philippe Mathieu-Daudé wrote:
>> On 4/12/23 16:05, Akinobu Mita wrote:
>>> QEMU would not start when trying to create two UFS host controllers and
>>> a UFS logical unit for each with the following options:
>>>
>>> -device ufs,id=bus0 \
>>> -device ufs-lu,drive=drive1,bus=bus0,lun=0 \
>>> -device ufs,id=bus1 \
>>> -device ufs-lu,drive=drive2,bus=bus1,lun=0 \
>>>
>>> This is because the same ID string ("0:0:0/scsi-disk") is generated
>>> for both UFS logical units.
>>>
>>> To fix this issue, prepend the parent pci device's path to make
>>> the ID string unique.
>>> (":00:03.0/0:0:0/scsi-disk" and ":00:04.0/0:0:0/scsi-disk")
>>>
>>> Fixes: 096434fea13a ("hw/ufs: Modify lu.c to share codes with SCSI
>>> subsystem")
>
> If you think this must be fixed for the 8.2 release, please assign
> a release blocker issue to the GitLab 8.2 milestone here:
> https://gitlab.com/qemu-project/qemu/-/milestones/10
>
>>> Signed-off-by: Akinobu Mita 
>>
>> Reviewed-by: Philippe Mathieu-Daudé 
>>
>>> ---
>>>   hw/ufs/ufs.c 8 
>>>   1 file changed, 8 insertions(+)
>>>
>>> diff --git a/hw/ufs/ufs.c b/hw/ufs/ufs.c
>>> index 68c5f1f6c9..eccdb852a0 100644
>>> --- a/hw/ufs/ufs.c
>>> +++ b/hw/ufs/ufs.c
>>> @@ -1323,9 +1323,17 @@ static bool ufs_bus_check_address(BusState
>>> *qbus, DeviceState *qdev,
>>>   return true;
>>>   }
>>> +static char *ufs_bus_get_dev_path(DeviceState *dev)
>>> +{
>>> +    BusState *bus = qdev_get_parent_bus(dev);
>>> +
>>> +    return qdev_get_dev_path(bus->parent);
>>> +}
>>> +
>>>   static void ufs_bus_class_init(ObjectClass *class, void *data)
>>>   {
>>>   BusClass *bc = BUS_CLASS(class);
>>> +    bc->get_dev_path = ufs_bus_get_dev_path;
>>>   bc->check_address = ufs_bus_check_address;
>>>   }

Re: [PATCH-for-8.2?] hw/ufs: avoid generating the same ID string for different LU devices

2023-12-04 Thread Jeuk Kim



On 4/12/23 16:05, Akinobu Mita wrote:
> QEMU would not start when trying to create two UFS host controllers and
> a UFS logical unit for each with the following options:
>
> -device ufs,id=bus0 \
> -device ufs-lu,drive=drive1,bus=bus0,lun=0 \
> -device ufs,id=bus1 \
> -device ufs-lu,drive=drive2,bus=bus1,lun=0 \
>
> This is because the same ID string ("0:0:0/scsi-disk") is generated
> for both UFS logical units.
>
> To fix this issue, prepend the parent pci device's path to make
> the ID string unique.
> (":00:03.0/0:0:0/scsi-disk" and ":00:04.0/0:0:0/scsi-disk")
>
> Fixes: 096434fea13a ("hw/ufs: Modify lu.c to share codes with SCSI subsystem")
> Signed-off-by: Akinobu Mita 

Reviewed-by: Jeuk Kim 

Thank you!

Re: [PATCH v5] accel/kvm: Turn DPRINTF macro use into tracepoints

2023-12-04 Thread JAI ARORA

Hello maintainers,

This is a friendly ping.
Are there any other review comments for this patch?


Thanks,
Jai Arora

On Sat, 2 Dec 2023 at 10:19, Jai Arora  wrote:

> This patch removes the DPRINTF macro and adds multiple tracepoints
> to capture different KVM events.
>
> We also drop the DPRINTFs that add no information beyond what
> trace_kvm_run_exit already provides.
>
> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1827
>
> Signed-off-by: Jai Arora 
> Reviewed-by: Alex Bennée 
> Reviewed-by: Philippe Mathieu-Daudé 
> ---
> v5: Addressed review comments by Philippe Mathieu-Daudé
> Corrects typo DRPINTF in commit message
> Changed %d to PRIu32 in kvm_run_exit_system_event
>
> I am not sure what you meant by keeping the previous tag.
> I think you meant to keep the version tag the same,
> so I will keep the patch tag as v5 again this time.
>
> Thank you for the feedback.
>
>  accel/kvm/kvm-all.c| 28 ++--
>  accel/kvm/trace-events |  7 ++-
>  2 files changed, 12 insertions(+), 23 deletions(-)
>
> diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
> index e39a810a4e..80ac7b35b7 100644
> --- a/accel/kvm/kvm-all.c
> +++ b/accel/kvm/kvm-all.c
> @@ -69,16 +69,6 @@
>  #define KVM_GUESTDBG_BLOCKIRQ 0
>  #endif
>
> -//#define DEBUG_KVM
> -
> -#ifdef DEBUG_KVM
> -#define DPRINTF(fmt, ...) \
> -do { fprintf(stderr, fmt, ## __VA_ARGS__); } while (0)
> -#else
> -#define DPRINTF(fmt, ...) \
> -do { } while (0)
> -#endif
> -
>  struct KVMParkedVcpu {
>  unsigned long vcpu_id;
>  int kvm_fd;
> @@ -331,7 +321,7 @@ static int do_kvm_destroy_vcpu(CPUState *cpu)
>  struct KVMParkedVcpu *vcpu = NULL;
>  int ret = 0;
>
> -DPRINTF("kvm_destroy_vcpu\n");
> +trace_kvm_destroy_vcpu();
>
>  ret = kvm_arch_destroy_vcpu(cpu);
>  if (ret < 0) {
> @@ -341,7 +331,7 @@ static int do_kvm_destroy_vcpu(CPUState *cpu)
>  mmap_size = kvm_ioctl(s, KVM_GET_VCPU_MMAP_SIZE, 0);
>  if (mmap_size < 0) {
>  ret = mmap_size;
> -DPRINTF("KVM_GET_VCPU_MMAP_SIZE failed\n");
> +trace_kvm_failed_get_vcpu_mmap_size();
>  goto err;
>  }
>
> @@ -443,7 +433,6 @@ int kvm_init_vcpu(CPUState *cpu, Error **errp)
> PAGE_SIZE * KVM_DIRTY_LOG_PAGE_OFFSET);
>  if (cpu->kvm_dirty_gfns == MAP_FAILED) {
>  ret = -errno;
> -DPRINTF("mmap'ing vcpu dirty gfns failed: %d\n", ret);
>  goto err;
>  }
>  }
> @@ -2821,7 +2810,7 @@ int kvm_cpu_exec(CPUState *cpu)
>  struct kvm_run *run = cpu->kvm_run;
>  int ret, run_ret;
>
> -DPRINTF("kvm_cpu_exec()\n");
> +trace_kvm_cpu_exec();
>
>  if (kvm_arch_process_async_events(cpu)) {
>  qatomic_set(&cpu->exit_request, 0);
> @@ -2848,7 +2837,7 @@ int kvm_cpu_exec(CPUState *cpu)
>
>  kvm_arch_pre_run(cpu, run);
>  if (qatomic_read(&cpu->exit_request)) {
> -DPRINTF("interrupt exit requested\n");
> +   trace_kvm_interrupt_exit_request();
>  /*
>   * KVM requires us to reenter the kernel after IO exits to complete
>   * instruction emulation. This self-signal will ensure that we
> @@ -2878,7 +2867,7 @@ int kvm_cpu_exec(CPUState *cpu)
>
>  if (run_ret < 0) {
>  if (run_ret == -EINTR || run_ret == -EAGAIN) {
> -DPRINTF("io window exit\n");
> +trace_kvm_io_window_exit();
>  kvm_eat_signals(cpu);
>  ret = EXCP_INTERRUPT;
>  break;
> @@ -2900,7 +2889,6 @@ int kvm_cpu_exec(CPUState *cpu)
>  trace_kvm_run_exit(cpu->cpu_index, run->exit_reason);
>  switch (run->exit_reason) {
>  case KVM_EXIT_IO:
> -DPRINTF("handle_io\n");
>  /* Called outside BQL */
>  kvm_handle_io(run->io.port, attrs,
>(uint8_t *)run + run->io.data_offset,
> @@ -2910,7 +2898,6 @@ int kvm_cpu_exec(CPUState *cpu)
>  ret = 0;
>  break;
>  case KVM_EXIT_MMIO:
> -DPRINTF("handle_mmio\n");
>  /* Called outside BQL */
>  address_space_rw(&address_space_memory,
>   run->mmio.phys_addr, attrs,
> @@ -2920,11 +2907,9 @@ int kvm_cpu_exec(CPUState *cpu)
>  ret = 0;
>  break;
>  case KVM_EXIT_IRQ_WINDOW_OPEN:
> -DPRINTF("irq_window_open\n");
>  ret = EXCP_INTERRUPT;
>  break;
>  case KVM_EXIT_SHUTDOWN:
> -DPRINTF("shutdown\n");
>  qemu_system_reset_request(SHUTDOWN_CAUSE_GUEST_RESET);
>  ret = EXCP_INTERRUPT;
>  break;
> @@ -2959,6 +2944,7 @@ int kvm_cpu_exec(CPUState *cpu)
>  ret = 0;
>  break;
>  case KVM_EXIT_SYSTEM_EVENT:
> +trace_kvm_run_exit_system_event(cpu->cpu_index,
> run->system_event.type);
>  switch (run->system_event.type)

RE: [PATCH 3/9] Hexagon (target/hexagon) Make generators object oriented - gen_helper_protos

2023-12-04 Thread Brian Cain



> -Original Message-
> From: Taylor Simpson 
> Sent: Monday, December 4, 2023 7:53 PM
> To: qemu-devel@nongnu.org
> Cc: Brian Cain ; Matheus Bernardino (QUIC)
> ; Sid Manning ; Marco
> Liebel (QUIC) ; richard.hender...@linaro.org;
> phi...@linaro.org; a...@rev.ng; a...@rev.ng; ltaylorsimp...@gmail.com
> Subject: [PATCH 3/9] Hexagon (target/hexagon) Make generators object
> oriented - gen_helper_protos
> 
> WARNING: This email originated from outside of Qualcomm. Please be wary of
> any links or attachments, and do not enable macros.
> 
> Signed-off-by: Taylor Simpson 
> ---
>  target/hexagon/gen_helper_protos.py | 184 
>  target/hexagon/hex_common.py|  15 +--
>  2 files changed, 55 insertions(+), 144 deletions(-)
> 
> diff --git a/target/hexagon/gen_helper_protos.py b/target/hexagon/gen_helper_protos.py
> index 131043795a..9277199e1d 100755
> --- a/target/hexagon/gen_helper_protos.py
> +++ b/target/hexagon/gen_helper_protos.py
> @@ -22,39 +22,6 @@
>  import string
>  import hex_common
> 
> -##
> -## Helpers for gen_helper_prototype
> -##
> -def_helper_types = {
> -"N": "s32",
> -"O": "s32",
> -"P": "s32",
> -"M": "s32",
> -"C": "s32",
> -"R": "s32",
> -"V": "ptr",
> -"Q": "ptr",
> -}
> -
> -def_helper_types_pair = {
> -"R": "s64",
> -"C": "s64",
> -"S": "s64",
> -"G": "s64",
> -"V": "ptr",
> -"Q": "ptr",
> -}
> -
> -
> -def gen_def_helper_opn(f, tag, regtype, regid, i):
> -if hex_common.is_pair(regid):
> -f.write(f", {def_helper_types_pair[regtype]}")
> -elif hex_common.is_single(regid):
> -f.write(f", {def_helper_types[regtype]}")
> -else:
> -hex_common.bad_register(regtype, regid)
> -
> -
>  ##
>  ## Generate the DEF_HELPER prototype for an instruction
>  ## For A2_add: Rd32=add(Rs32,Rt32)
> @@ -65,116 +32,62 @@ def gen_helper_prototype(f, tag, tagregs, tagimms):
>  regs = tagregs[tag]
>  imms = tagimms[tag]
> 
> -numresults = 0
> +## If there is a scalar result, it is the return type
> +return_type = ""

Should we use `return_type = None` here?

>  numscalarresults = 0
> -numscalarreadwrite = 0
>  for regtype, regid in regs:
> -if hex_common.is_written(regid):
> -numresults += 1
> -if hex_common.is_scalar_reg(regtype):
> +reg = hex_common.get_register(tag, regtype, regid)
> +if reg.is_written() and reg.is_scalar_reg():
> +return_type = reg.helper_proto_type()
>  numscalarresults += 1
> -if hex_common.is_readwrite(regid):
> -if hex_common.is_scalar_reg(regtype):
> -numscalarreadwrite += 1
> +if numscalarresults == 0:
> +return_type = "void"

Should we use `return_type = None` here?

> 
>  if numscalarresults > 1:
> -## The helper is bogus when there is more than one result
> -f.write(f"DEF_HELPER_1({tag}, void, env)\n")
> -else:
> -## Figure out how many arguments the helper will take
> -if numscalarresults == 0:
> -def_helper_size = len(regs) + len(imms) + numscalarreadwrite + 1
> -if hex_common.need_pkt_has_multi_cof(tag):
> -def_helper_size += 1
> -if hex_common.need_pkt_need_commit(tag):
> -def_helper_size += 1
> -if hex_common.need_part1(tag):
> -def_helper_size += 1
> -if hex_common.need_slot(tag):
> -def_helper_size += 1
> -if hex_common.need_PC(tag):
> -def_helper_size += 1
> -if hex_common.helper_needs_next_PC(tag):
> -def_helper_size += 1
> -if hex_common.need_condexec_reg(tag, regs):
> -def_helper_size += 1
> -f.write(f"DEF_HELPER_{def_helper_size}({tag}")
> -## The return type is void
> -f.write(", void")
> -else:
> -def_helper_size = len(regs) + len(imms) + numscalarreadwrite
> -if hex_common.need_pkt_has_multi_cof(tag):
> -def_helper_size += 1
> -if hex_common.need_pkt_need_commit(tag):
> -def_helper_size += 1
> -if hex_common.need_part1(tag):
> -def_helper_size += 1
> -if hex_common.need_slot(tag):
> -def_helper_size += 1
> -if hex_common.need_PC(tag):
> -def_helper_size += 1
> -if hex_common.need_condexec_reg(tag, regs):
> -def_helper_size += 1
> -if hex_common.helper_needs_next_PC(tag):
> -def_helper_size += 1
> -f.write(f"DEF_HELPER_{def_helper_size}({tag}")
> -
> -## Generate the qemu DEF_HELPER type for each result
> -## Iterate over this list twice
> -## - Emit the scalar result
> -## - Emit the vector result
> -i = 0
> -for regtype, re

Re: [PATCH v6 00/16] Support smp.clusters for x86 in QEMU

2023-12-04 Thread Zhao Liu

Hi maintainers,

Just a friendly ping. Do I need to send a refreshed version?

Thanks,
Zhao

On Fri, Nov 17, 2023 at 03:50:50PM +0800, Zhao Liu wrote:
> Date: Fri, 17 Nov 2023 15:50:50 +0800
> From: Zhao Liu 
> Subject: [PATCH v6 00/16] Support smp.clusters for x86 in QEMU
> X-Mailer: git-send-email 2.34.1
> 
> From: Zhao Liu 
> 
> Hi list,
> 
> This is our v6 patch series, rebased on the master branch at the
> commit 34a5cb6d8434 (Merge tag 'pull-tcg-20231114' of
> https://gitlab.com/rth7680/qemu into staging).
> 
> Because the first four patches of v5 [1] have been merged, v6 contains
> the remaining patches, rebased on the latest master.
> 
> There are no changes since v5 except the comment update about the QEMU
> version (see Changelog).
> 
> Your comments are welcome!
> 
> 
> PS: About the idea of implementing a generic smp cache topology, we're
> considering porting the original x-l2-cache-topo option to smp [2].
> Just like:
> 
> -smp cpus=4,sockets=2,cores=2,threads=1, \
>  l3-cache=socket,l2-cache=core,l1-i-cache=core,l1-d-cache=core
> 
> Any feedback about this direction is also welcome! ;-)
> 
> 
> ---
> # Introduction
> 
> This series adds cluster support for the x86 PC machine, which allows
> x86 to use smp.clusters to configure the module-level CPU topology.
> 
> This series is also preparation for letting x86 define a more
> flexible cache topology, such as having multiple cores share the
> same L2 cache at cluster level. (That was what x-l2-cache-topo did,
> and we will explore a generic way.)
> 
> For why we don't share the L2 cache at cluster level directly and instead
> need a way to configure it, please see the section "Why not share L2 cache
> in cluster directly".
> 
> 
> # Background
> 
> The "clusters" parameter in "smp" was introduced by ARM [3], but x86
> doesn't support it yet.
> 
> At present, x86 assumes by default that the L2 cache is shared within one
> core, but this is not enough. On some platforms multiple cores share the
> same L2 cache, e.g., Alder Lake-P shares an L2 cache within one module of
> Atom cores [4], that is, every four Atom cores share one L2 cache.
> Therefore, we need a new CPU topology level (cluster/module).
> 
> Another reason is hybrid architecture. Cluster support not only
> provides another level of topology definition in x86, but also
> provides the code changes required for our future hybrid topology support.
> 
> 
> # Overview
> 
> ## Introduction of module level for x86
> 
> "cluster" in smp is the CPU topology level between "core" and
> "die".
> 
> For x86, the "cluster" in smp corresponds to the module level [4],
> which is above the core level. So we use "module" rather than "cluster"
> in x86 code.
> 
> Please note that x86 already has a CPU topology level also named
> "cluster" [5], which sits above the package level. That
> cluster in the x86 CPU topology is completely different from
> "clusters" as the smp parameter. After the module level is introduced,
> the cluster as the smp parameter will actually refer to the module level
> of x86.
> 
> 
> ## Why not share L2 cache in cluster directly
> 
> Though "clusters" was introduced to help define L2 cache topology
> [3], using cluster to define x86's L2 cache topology would cause a
> compatibility problem:
> 
> Currently, x86 assumes by default that the L2 cache is shared within one
> core, which implies a default setting of "cores per L2 cache is 1" and
> therefore implicitly defaults to having as many L2 caches as cores.
> 
> For example (i386 PC machine):
> -smp 16,sockets=2,dies=2,cores=2,threads=2,maxcpus=16 (*)
> 
> Considering the topology of the L2 cache, this (*) implicitly means "1
> core per L2 cache" and "2 L2 caches per die".
> 
> If we use cluster to configure L2 cache topology with the new default
> setting "clusters per L2 cache is 1", the above semantics will change
> to "2 cores per cluster" and "1 cluster per L2 cache", that is, "2
> cores per L2 cache".
> 
> So the same command (*) would change the L2 cache topology,
> further affecting the performance of the virtual machine.
> 
> Therefore, x86 should only treat cluster as a CPU topology level and,
> for compatibility, avoid using it to change the L2 cache by default.
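
The arithmetic behind the (*) example can be checked with a short sketch. It assumes that an omitted "clusters" value defaults to 1 and that "cores" counts cores per cluster; the dictionary layout and the function `l2_layout` are hypothetical and only serve this illustration.

```python
# Topology from "-smp 16,sockets=2,dies=2,cores=2,threads=2,maxcpus=16".
# "clusters" is not given on the command line; assumed here to default to 1.
TOPOLOGY = {"sockets": 2, "dies": 2, "clusters": 1, "cores": 2, "threads": 2}


def l2_layout(topo, share_level):
    """Return (cores per L2 cache, L2 caches per die) for a sharing level."""
    cores_per_die = topo["clusters"] * topo["cores"]
    if share_level == "core":
        cores_per_l2 = 1
    elif share_level == "cluster":
        cores_per_l2 = topo["cores"]
    else:
        raise ValueError(f"unsupported share level: {share_level}")
    return cores_per_l2, cores_per_die // cores_per_l2


current_default = l2_layout(TOPOLOGY, "core")     # L2 shared within a core
cluster_default = l2_layout(TOPOLOGY, "cluster")  # L2 shared within a cluster
```

Under these assumptions the current default gives 1 core per L2 cache and 2 L2 caches per die, while a cluster-based default would give 2 cores per L2 cache and a single L2 cache per die for the very same command line.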
> 
> 
> ## module level in CPUID
> 
> The Linux kernel (from v6.4, with commit edc0a2b595765 ("x86/topology: Fix
> erroneous smp_num_siblings on Intel Hybrid platforms")) is able to
> handle platforms with the module level enumerated via CPUID.1F.
> 
> Since v3, expose the module level in CPUID[0x1F] (for Intel CPUs) if the
> machine has more than one module.
> 
> 
> ## New cache topology info in CPUCacheInfo
> 
> (This is in preparation for users being able to configure cache topology
> from the cli later on.)
> 
> Currently, by default, the cache topology is encoded as:
> 1. i/d cache is shared in one core.
> 2. L2 cache is shared in one core.
> 3. L3 cache is shared in one die.
> 
> This default general setting has caused a misunderstanding, that is, the
> cache topology is com

Re: [RFC 1/3] scripts/checkpatch: Check common spelling be default

2023-12-04 Thread Zhao Liu

Hi Thomas,

On Mon, Dec 04, 2023 at 10:07:12AM +0100, Thomas Huth wrote:
> Date: Mon, 4 Dec 2023 10:07:12 +0100
> From: Thomas Huth 
> Subject: Re: [RFC 1/3] scripts/checkpatch: Check common spelling be default
> 
> On 04/12/2023 09.29, Zhao Liu wrote:
> > From: Zhao Liu 
> > 
> > Add the check for common spelling mistakes for QEMU, which stole
> > selling.txt from Linux kernel and referenced the Linux kernel's
> 
> You need to sellcheck^Wspellcheck the above line ;-)

Oh no, I didn't expect this place to be wrong...--codespelling isn't
infallible!

> 
> > implementation in checkpatch.pl.
> > 
> > This check covers common spelling mistakes, and can be updated/
> > extended as per QEMU's realities.
> > 
> > Signed-off-by: Zhao Liu 
> > ---
> >   scripts/checkpatch.pl |   44 ++
> >   scripts/spelling.txt  | 1713 +
> >   2 files changed, 1757 insertions(+)
> >   create mode 100644 scripts/spelling.txt
> 
> I like the idea of having spellchecking in checkpatch.pl 

Thanks!

> ... not sure though
> if we need both, this patch and support for codespell. If you ask me, I'd
> just go with the next codespell patch and avoid adding a full spelling.txt
> file here ... but if others like this patch here, too, I'm also fine with
> it.

OK, I'll wait a few days to see if anyone else has comments; if not,
I'll send a v2 and keep only the next patch.

Regards,
Zhao
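
For context, a word-list check of the kind discussed above can be sketched as follows. This is not the checkpatch.pl implementation: it assumes the `mistake||correction` line format used by the Linux kernel's spelling.txt, and the function names and sample entries are made up for the example.

```python
import re


def load_spelling(lines):
    """Parse 'mistake||correction' lines, skipping blanks and '#' comments."""
    fixes = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        bad, sep, good = line.partition("||")
        if sep:
            fixes[bad.lower()] = good
    return fixes


def check_line(text, fixes):
    """Return (word, suggestion) pairs for known misspellings in text."""
    words = re.findall(r"[A-Za-z]+", text)
    return [(w, fixes[w.lower()]) for w in words if w.lower() in fixes]


# Sample entries, invented for this sketch.
FIXES = load_spelling(["# comment", "recieve||receive", "occured||occurred"])
```

A lookup like this only flags words that appear in the list, so a misspelling that happens to be a valid word, such as "selling" for "spelling", passes unnoticed.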

RE: [PATCH 1/9] Hexagon (target/hexagon) Clean up handling of modifier registers

2023-12-04 Thread Brian Cain



> -Original Message-
> From: Taylor Simpson 
> Sent: Monday, December 4, 2023 7:53 PM
> To: qemu-devel@nongnu.org
> Cc: Brian Cain ; Matheus Bernardino (QUIC)
> ; Sid Manning ; Marco
> Liebel (QUIC) ; richard.hender...@linaro.org;
> phi...@linaro.org; a...@rev.ng; a...@rev.ng; ltaylorsimp...@gmail.com
> Subject: [PATCH 1/9] Hexagon (target/hexagon) Clean up handling of modifier
> registers
> 
> Currently, the register number (MuN) for modifier registers is the
> modifier register number rather than the index into hex_gpr.  This
> patch changes MuN to the hex_gpr index, which is consistent with
> the handling of control registers.
> 
> Note that HELPER(fcircadd) needs the CS register corresponding to the
> modifier register specified in the instruction.  We create a TCGv
> variable "CS" to hold the value to pass to the helper.
> 
> Signed-off-by: Taylor Simpson 
> ---
>  target/hexagon/gen_tcg.h|  9 -
>  target/hexagon/macros.h |  3 +--
>  target/hexagon/idef-parser/parser-helpers.c |  8 +++-
>  target/hexagon/gen_tcg_funcs.py | 13 +
>  4 files changed, 17 insertions(+), 16 deletions(-)
> 
> diff --git a/target/hexagon/gen_tcg.h b/target/hexagon/gen_tcg.h
> index d992059fce..1c4391b415 100644
> --- a/target/hexagon/gen_tcg.h
> +++ b/target/hexagon/gen_tcg.h
> @@ -68,15 +68,14 @@
>  do { \
>  TCGv tcgv_siV = tcg_constant_tl(siV); \
>  tcg_gen_mov_tl(EA, RxV); \
> -gen_helper_fcircadd(RxV, RxV, tcgv_siV, MuV, \
> -hex_gpr[HEX_REG_CS0 + MuN]); \
> +gen_helper_fcircadd(RxV, RxV, tcgv_siV, MuV, CS); \
>  } while (0)
>  #define GET_EA_pcr(SHIFT) \
>  do { \
>  TCGv ireg = tcg_temp_new(); \
>  tcg_gen_mov_tl(EA, RxV); \
>  gen_read_ireg(ireg, MuV, (SHIFT)); \
> -gen_helper_fcircadd(RxV, RxV, ireg, MuV, hex_gpr[HEX_REG_CS0 + MuN]); \
> +gen_helper_fcircadd(RxV, RxV, ireg, MuV, CS); \
>  } while (0)
> 
>  /* Instructions with multiple definitions */
> @@ -113,7 +112,7 @@
>  TCGv ireg = tcg_temp_new(); \
>  tcg_gen_mov_tl(EA, RxV); \
>  gen_read_ireg(ireg, MuV, SHIFT); \
> -gen_helper_fcircadd(RxV, RxV, ireg, MuV, hex_gpr[HEX_REG_CS0 + MuN]); \
> +gen_helper_fcircadd(RxV, RxV, ireg, MuV, CS); \
>  LOAD; \
>  } while (0)
> 
> @@ -427,7 +426,7 @@
>  TCGv BYTE G_GNUC_UNUSED = tcg_temp_new(); \
>  tcg_gen_mov_tl(EA, RxV); \
>  gen_read_ireg(ireg, MuV, SHIFT); \
> -gen_helper_fcircadd(RxV, RxV, ireg, MuV, hex_gpr[HEX_REG_CS0 + MuN]); \
> +gen_helper_fcircadd(RxV, RxV, ireg, MuV, CS); \
>  STORE; \
>  } while (0)
> 
> diff --git a/target/hexagon/macros.h b/target/hexagon/macros.h
> index 9a51b5709b..939f22e76b 100644
> --- a/target/hexagon/macros.h
> +++ b/target/hexagon/macros.h
> @@ -462,8 +462,7 @@ static inline TCGv gen_read_ireg(TCGv result, TCGv val, int shift)
>  #define fPM_CIRI(REG, IMM, MVAL) \
>  do { \
>  TCGv tcgv_siV = tcg_constant_tl(siV); \
> -gen_helper_fcircadd(REG, REG, tcgv_siV, MuV, \
> -hex_gpr[HEX_REG_CS0 + MuN]); \
> +gen_helper_fcircadd(REG, REG, tcgv_siV, MuV, CS); \
>  } while (0)
>  #else
>  #define fEA_IMM(IMM)do { EA = (IMM); } while (0)
> diff --git a/target/hexagon/idef-parser/parser-helpers.c b/target/hexagon/idef-parser/parser-helpers.c
> index 4af020933a..95f2b43076 100644
> --- a/target/hexagon/idef-parser/parser-helpers.c
> +++ b/target/hexagon/idef-parser/parser-helpers.c
> @@ -1541,10 +1541,8 @@ void gen_circ_op(Context *c,
>   HexValue *increment,
>   HexValue *modifier)
>  {
> -HexValue cs = gen_tmp(c, locp, 32, UNSIGNED);
>  HexValue increment_m = *increment;
>  increment_m = rvalue_materialize(c, locp, &increment_m);
> -OUT(c, locp, "gen_read_reg(", &cs, ", HEX_REG_CS0 + MuN);\n");
>  OUT(c,
>  locp,
>  "gen_helper_fcircadd(",
> @@ -1555,7 +1553,7 @@ void gen_circ_op(Context *c,
>  &increment_m,
>  ", ",
>  modifier);
> -OUT(c, locp, ", ", &cs, ");\n");
> +OUT(c, locp, ", CS);\n");
>  }
> 
>  HexValue gen_locnt_op(Context *c, YYLTYPE *locp, HexValue *src)
> @@ -2080,9 +2078,9 @@ void emit_arg(Context *c, YYLTYPE *locp, HexValue *arg)
>  char reg_id[5];
>  reg_compose(c, locp, &(arg->reg), reg_id);
>  EMIT_SIG(c, ", %s %s", type, reg_id);
> -/* MuV register requires also MuN to provide its index */
> +/* MuV register requires also CS for circular addressing*/
>  if (arg->reg.type == MODIFIER) {
> -EMIT_SIG(c, ", int MuN");
> +EMIT_SIG(c, ", TCGv CS");
>  }

RE: [PATCH 6/9] Hexagon (target/hexagon) Make generators object oriented - gen_op_regs

2023-12-04 Thread Brian Cain



> -Original Message-
> From: Taylor Simpson 
> Sent: Monday, December 4, 2023 7:53 PM
> To: qemu-devel@nongnu.org
> Cc: Brian Cain ; Matheus Bernardino (QUIC)
> ; Sid Manning ; Marco
> Liebel (QUIC) ; richard.hender...@linaro.org;
> phi...@linaro.org; a...@rev.ng; a...@rev.ng; ltaylorsimp...@gmail.com
> Subject: [PATCH 6/9] Hexagon (target/hexagon) Make generators object
> oriented - gen_op_regs
> 
> Signed-off-by: Taylor Simpson 
> ---
>  target/hexagon/gen_op_regs.py | 6 --
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/target/hexagon/gen_op_regs.py b/target/hexagon/gen_op_regs.py
> index a8a7712129..7b7b33895a 100755
> --- a/target/hexagon/gen_op_regs.py
> +++ b/target/hexagon/gen_op_regs.py
> @@ -70,6 +70,7 @@ def strip_reg_prefix(x):
>  def main():
>  hex_common.read_semantics_file(sys.argv[1])
>  hex_common.read_attribs_file(sys.argv[2])
> +hex_common.init_registers()
>  tagregs = hex_common.get_tagregs(full=True)
>  tagimms = hex_common.get_tagimms()
> 
> @@ -80,11 +81,12 @@ def main():
>  wregs = []
>  regids = ""
>  for regtype, regid, _, numregs in regs:
> -if hex_common.is_read(regid):
> +reg = hex_common.get_register(tag, regtype, regid)
> +if reg.is_read():
>  if regid[0] not in regids:
>  regids += regid[0]
>  rregs.append(regtype + regid + numregs)
> -if hex_common.is_written(regid):
> +if reg.is_written():
>  wregs.append(regtype + regid + numregs)
>  if regid[0] not in regids:
>  regids += regid[0]
> --
> 2.34.1

Reviewed-by: Brian Cain

RE: [PATCH 2/9] Hexagon (target/hexagon) Make generators object oriented - gen_tcg_funcs

2023-12-04 Thread Brian Cain



> -Original Message-
> From: Taylor Simpson 
> Sent: Monday, December 4, 2023 7:53 PM
> To: qemu-devel@nongnu.org
> Cc: Brian Cain ; Matheus Bernardino (QUIC)
> ; Sid Manning ; Marco
> Liebel (QUIC) ; richard.hender...@linaro.org;
> phi...@linaro.org; a...@rev.ng; a...@rev.ng; ltaylorsimp...@gmail.com
> Subject: [PATCH 2/9] Hexagon (target/hexagon) Make generators object
> oriented - gen_tcg_funcs
> 
> The generators are generally a bunch of Python if-then-else
> statements based on the regtype and regid.  Encapsulate regtype/regid
> into a class hierarchy.  Clients look up the register and invoke
> methods.
> 
> This has several advantages for making the code easier to read,
> understand, and maintain:
> - The class name makes it clearer what the operand does
> - All the methods for a given type of operand are together
> - We don't need hex_common.bad_register:
>   if a regtype/regid is missing, the lookup in hex_common.get_register
>   will fail
> - We can remove the functions in hex_common that use regtype/regid
>   (e.g., is_read)
> 
> This patch creates the class hierarchy in hex_common and converts
> gen_tcg_funcs.py.  The other scripts will be converted in subsequent
> patches in this series.
> 
> Signed-off-by: Taylor Simpson 
> ---
>  target/hexagon/gen_tcg_funcs.py | 583 +++-
>  target/hexagon/hex_common.py| 542 +
>  2 files changed, 589 insertions(+), 536 deletions(-)
> 
> diff --git a/target/hexagon/gen_tcg_funcs.py
> b/target/hexagon/gen_tcg_funcs.py
> index 02d93bc5ce..8c2bc03c10 100755
> --- a/target/hexagon/gen_tcg_funcs.py
> +++ b/target/hexagon/gen_tcg_funcs.py
> @@ -23,466 +23,13 @@
>  import hex_common
> 
> 
> -##
> -## Helpers for gen_tcg_func
> -##
> -def gen_decl_ea_tcg(f, tag):
> -f.write("TCGv EA G_GNUC_UNUSED = tcg_temp_new();\n")
> -
> -
> -def genptr_decl_pair_writable(f, tag, regtype, regid, regno):
> -regN = f"{regtype}{regid}N"
> -if regtype == "R":
> -f.write(f"const int {regN} = insn->regno[{regno}];\n")
> -elif regtype == "C":
> -f.write(f"const int {regN} = insn->regno[{regno}] + 
> HEX_REG_SA0;\n")
> -else:
> -hex_common.bad_register(regtype, regid)
> -f.write(f"TCGv_i64 {regtype}{regid}V = " f"get_result_gpr_pair(ctx,
> {regN});\n")
> -
> -
> -def genptr_decl_writable(f, tag, regtype, regid, regno):
> -regN = f"{regtype}{regid}N"
> -if regtype == "R":
> -f.write(f"const int {regN} = insn->regno[{regno}];\n")
> -f.write(f"TCGv {regtype}{regid}V = get_result_gpr(ctx, 
> {regN});\n")
> -elif regtype == "C":
> -f.write(f"const int {regN} = insn->regno[{regno}] + 
> HEX_REG_SA0;\n")
> -f.write(f"TCGv {regtype}{regid}V = get_result_gpr(ctx, 
> {regN});\n")
> -elif regtype == "P":
> -f.write(f"const int {regN} = insn->regno[{regno}];\n")
> -f.write(f"TCGv {regtype}{regid}V = tcg_temp_new();\n")
> -else:
> -hex_common.bad_register(regtype, regid)
> -
> -
> -def genptr_decl(f, tag, regtype, regid, regno):
> -regN = f"{regtype}{regid}N"
> -if regtype == "R":
> -if regid in {"ss", "tt"}:
> -f.write(f"TCGv_i64 {regtype}{regid}V = 
> tcg_temp_new_i64();\n")
> -f.write(f"const int {regN} = insn->regno[{regno}];\n")
> -elif regid in {"dd", "ee", "xx", "yy"}:
> -genptr_decl_pair_writable(f, tag, regtype, regid, regno)
> -elif regid in {"s", "t", "u", "v"}:
> -f.write(
> -f"TCGv {regtype}{regid}V = " 
> f"hex_gpr[insn->regno[{regno}]];\n"
> -)
> -elif regid in {"d", "e", "x", "y"}:
> -genptr_decl_writable(f, tag, regtype, regid, regno)
> -else:
> -hex_common.bad_register(regtype, regid)
> -elif regtype == "P":
> -if regid in {"s", "t", "u", "v"}:
> -f.write(
> -f"TCGv {regtype}{regid}V = " 
> f"hex_pred[insn->regno[{regno}]];\n"
> -)
> -elif regid in {"d", "e", "x"}:
> -genptr_decl_writable(f, tag, regtype, regid, regno)
> -else:
> -hex_common.bad_register(regtype, regid)
> -elif regtype == "C":
> -if regid == "ss":
> -f.write(f"TCGv_i64 {regtype}{regid}V = " 
> f"tcg_temp_new_i64();\n")
> -f.write(f"const int {regN} = insn->regno[{regno}] + "
> "HEX_REG_SA0;\n")
> -elif regid == "dd":
> -genptr_decl_pair_writable(f, tag, regtype, regid, regno)
> -elif regid == "s":
> -f.write(f"TCGv {regtype}{regid}V = tcg_temp_new();\n")
> -f.write(
> -f"const int {regtype}{regid}N = insn->regno[{regno}] + "
> -"HEX_REG_SA0;\n"
> -)
> -eli

[PATCH] ui/gtk: flush display pipeline before saving vmstate when blob=true

2023-12-04 Thread Dongwon Kim

If the guest is paused before it receives a response for the current
scanout frame submission (resource-flush), it will not start submitting
new frames after being restored, because it is still waiting for the old
response, which it treats as the scanout-render-done signal. The current
scanout render pipeline therefore needs to be unblocked before the run
state changes, so that the guest receives the response for the current
frame submission.

Cc: Marc-André Lureau 
Cc: Vivek Kasireddy 
Signed-off-by: Dongwon Kim 
---
 ui/gtk.c | 12 
 1 file changed, 12 insertions(+)

diff --git a/ui/gtk.c b/ui/gtk.c
index 810d7fc796..0f6237dd2f 100644
--- a/ui/gtk.c
+++ b/ui/gtk.c
@@ -678,6 +678,18 @@ static const DisplayGLCtxOps egl_ctx_ops = {
 static void gd_change_runstate(void *opaque, bool running, RunState state)
 {
 GtkDisplayState *s = opaque;
+int i;
+
+if (state == RUN_STATE_SAVE_VM) {
+for (i = 0; i < s->nb_vcs; i++) {
+VirtualConsole *vc = &s->vc[i];
+
+if (vc->gfx.guest_fb.dmabuf) {
+/* force flushing current scanout blob rendering process */
+gd_hw_gl_flushed(vc);
+}
+}
+}

 gd_update_caption(s);
 }
-- 
2.34.1

[PATCH 4/9] Hexagon (target/hexagon) Make generators object oriented - gen_helper_funcs

2023-12-04 Thread Taylor Simpson

Signed-off-by: Taylor Simpson 
---
 target/hexagon/gen_helper_funcs.py | 400 +++--
 target/hexagon/hex_common.py   |  58 -
 2 files changed, 151 insertions(+), 307 deletions(-)

diff --git a/target/hexagon/gen_helper_funcs.py b/target/hexagon/gen_helper_funcs.py
index ce21d3b688..60b7e95e8c 100755
--- a/target/hexagon/gen_helper_funcs.py
+++ b/target/hexagon/gen_helper_funcs.py
@@ -23,181 +23,14 @@
 import hex_common
 
 
-##
-## Helpers for gen_helper_function
-##
-def gen_decl_ea(f):
-f.write("uint32_t EA;\n")
-
-
-def gen_helper_return_type(f, regtype, regid, regno):
-if regno > 1:
-f.write(", ")
-f.write("int32_t")
-
-
-def gen_helper_return_type_pair(f, regtype, regid, regno):
-if regno > 1:
-f.write(", ")
-f.write("int64_t")
-
-
-def gen_helper_arg(f, regtype, regid, regno):
-if regno > 0:
-f.write(", ")
-f.write(f"int32_t {regtype}{regid}V")
-
-
-def gen_helper_arg_new(f, regtype, regid, regno):
-if regno >= 0:
-f.write(", ")
-f.write(f"int32_t {regtype}{regid}N")
-
-
-def gen_helper_arg_pair(f, regtype, regid, regno):
-if regno >= 0:
-f.write(", ")
-f.write(f"int64_t {regtype}{regid}V")
-
-
-def gen_helper_arg_ext(f, regtype, regid, regno):
-if regno > 0:
-f.write(", ")
-f.write(f"void *{regtype}{regid}V_void")
-
-
-def gen_helper_arg_ext_pair(f, regtype, regid, regno):
-if regno > 0:
-f.write(", ")
-f.write(f"void *{regtype}{regid}V_void")
-
-
-def gen_helper_arg_opn(f, regtype, regid, i, tag):
-if hex_common.is_pair(regid):
-if hex_common.is_hvx_reg(regtype):
-gen_helper_arg_ext_pair(f, regtype, regid, i)
-else:
-gen_helper_arg_pair(f, regtype, regid, i)
-elif hex_common.is_single(regid):
-if hex_common.is_old_val(regtype, regid, tag):
-if hex_common.is_hvx_reg(regtype):
-gen_helper_arg_ext(f, regtype, regid, i)
-else:
-gen_helper_arg(f, regtype, regid, i)
-elif hex_common.is_new_val(regtype, regid, tag):
-gen_helper_arg_new(f, regtype, regid, i)
-else:
-hex_common.bad_register(regtype, regid)
-else:
-hex_common.bad_register(regtype, regid)
-
-
-def gen_helper_arg_imm(f, immlett):
-f.write(f", int32_t {hex_common.imm_name(immlett)}")
-
-
-def gen_helper_dest_decl(f, regtype, regid, regno, subfield=""):
-f.write(f"int32_t {regtype}{regid}V{subfield} = 0;\n")
-
-
-def gen_helper_dest_decl_pair(f, regtype, regid, regno, subfield=""):
-f.write(f"int64_t {regtype}{regid}V{subfield} = 0;\n")
-
-
-def gen_helper_dest_decl_ext(f, regtype, regid):
-if regtype == "Q":
-f.write(
-f"/* {regtype}{regid}V is *(MMQReg *)" f"({regtype}{regid}V_void) */\n"
-)
-else:
-f.write(
-f"/* {regtype}{regid}V is *(MMVector *)"
-f"({regtype}{regid}V_void) */\n"
-)
-
-
-def gen_helper_dest_decl_ext_pair(f, regtype, regid, regno):
-f.write(
-f"/* {regtype}{regid}V is *(MMVectorPair *))"
-f"{regtype}{regid}V_void) */\n"
-)
-
-
-def gen_helper_dest_decl_opn(f, regtype, regid, i):
-if hex_common.is_pair(regid):
-if hex_common.is_hvx_reg(regtype):
-gen_helper_dest_decl_ext_pair(f, regtype, regid, i)
-else:
-gen_helper_dest_decl_pair(f, regtype, regid, i)
-elif hex_common.is_single(regid):
-if hex_common.is_hvx_reg(regtype):
-gen_helper_dest_decl_ext(f, regtype, regid)
-else:
-gen_helper_dest_decl(f, regtype, regid, i)
-else:
-hex_common.bad_register(regtype, regid)
-
-
-def gen_helper_src_var_ext(f, regtype, regid):
-if regtype == "Q":
-f.write(
-f"/* {regtype}{regid}V is *(MMQReg *)" f"({regtype}{regid}V_void) */\n"
-)
-else:
-f.write(
-f"/* {regtype}{regid}V is *(MMVector *)"
-f"({regtype}{regid}V_void) */\n"
-)
-
-
-def gen_helper_src_var_ext_pair(f, regtype, regid, regno):
-f.write(
-f"/* {regtype}{regid}V{regno} is *(MMVectorPair *)"
-f"({regtype}{regid}V{regno}_void) */\n"
-)
-
-
-def gen_helper_return(f, regtype, regid, regno):
-f.write(f"return {regtype}{regid}V;\n")
-
-
-def gen_helper_return_pair(f, regtype, regid, regno):
-f.write(f"return {regtype}{regid}V;\n")
-
-
-def gen_helper_dst_write_ext(f, regtype, regid):
-return
-
-
-def gen_helper_dst_write_ext_pair(f, regtype, regid):
-return
-
-
-def gen_helper_return_opn(f, regtype, regid, i):
-if hex_common.is_pair(regid):
-if hex_common.is_hvx_reg(regtype):
-gen_helper_dst_write_ext_pair(f, regtype, regid)
-else:
-gen_helper_return_pair(f, regtype, regid, i)
-elif hex_common.is_single(regid):
-if hex_common.is_hvx_reg(regtype):
-

[PATCH 9/9] Hexagon (target/hexagon) Remove dead functions from hex_common.py

2023-12-04 Thread Taylor Simpson

These functions are no longer used after making the generators
object oriented.

Signed-off-by: Taylor Simpson 
---
 target/hexagon/hex_common.py | 51 
 1 file changed, 51 deletions(-)

diff --git a/target/hexagon/hex_common.py b/target/hexagon/hex_common.py
index 59fed78ab0..90d61a1b16 100755
--- a/target/hexagon/hex_common.py
+++ b/target/hexagon/hex_common.py
@@ -33,9 +33,6 @@
 overrides = {}  # tags with helper overrides
 idef_parser_enabled = {}  # tags enabled for idef-parser
 
-def bad_register(regtype, regid):
-raise Exception(f"Bad register parse: regtype '{regtype}' regid '{regid}'")
-
 # We should do this as a hash for performance,
 # but to keep order let's keep it as a list.
 def uniquify(seq):
@@ -200,46 +197,6 @@ def get_tagimms():
 return dict(zip(tags, list(map(compute_tag_immediates, tags))))
 
 
-def is_pair(regid):
-return len(regid) == 2
-
-
-def is_single(regid):
-return len(regid) == 1
-
-
-def is_written(regid):
-return regid[0] in "dexy"
-
-
-def is_writeonly(regid):
-return regid[0] in "de"
-
-
-def is_read(regid):
-return regid[0] in "stuvwxy"
-
-
-def is_readwrite(regid):
-return regid[0] in "xy"
-
-
-def is_scalar_reg(regtype):
-return regtype in "RPC"
-
-
-def is_hvx_reg(regtype):
-return regtype in "VQ"
-
-
-def is_old_val(regtype, regid, tag):
-return regtype + regid + "V" in semdict[tag]
-
-
-def is_new_val(regtype, regid, tag):
-return regtype + regid + "N" in semdict[tag]
-
-
 def need_slot(tag):
 if (
 "A_CVI_SCATTER" not in attribdict[tag]
@@ -280,14 +237,6 @@ def skip_qemu_helper(tag):
 return tag in overrides.keys()
 
 
-def is_tmp_result(tag):
-return "A_CVI_TMP" in attribdict[tag] or "A_CVI_TMP_DST" in attribdict[tag]
-
-
-def is_new_result(tag):
-return "A_CVI_NEW" in attribdict[tag]
-
-
 def is_idef_parser_enabled(tag):
 return tag in idef_parser_enabled
 
-- 
2.34.1

[PATCH 7/9] Hexagon (target/hexagon) Make generators object oriented - gen_analyze_funcs

2023-12-04 Thread Taylor Simpson

This patch conflicts with
https://lists.gnu.org/archive/html/qemu-devel/2023-11/msg00729.html
If that series goes in first, we'll rework this patch and vice versa.

Signed-off-by: Taylor Simpson 
---
 target/hexagon/gen_analyze_funcs.py | 163 +---
 target/hexagon/hex_common.py| 151 ++
 2 files changed, 157 insertions(+), 157 deletions(-)

diff --git a/target/hexagon/gen_analyze_funcs.py b/target/hexagon/gen_analyze_funcs.py
index c3b521abef..a9af666cef 100755
--- a/target/hexagon/gen_analyze_funcs.py
+++ b/target/hexagon/gen_analyze_funcs.py
@@ -23,162 +23,6 @@
 import hex_common
 
 
-##
-## Helpers for gen_analyze_func
-##
-def is_predicated(tag):
-return "A_CONDEXEC" in hex_common.attribdict[tag]
-
-
-def analyze_opn_old(f, tag, regtype, regid, regno):
-regN = f"{regtype}{regid}N"
-predicated = "true" if is_predicated(tag) else "false"
-if regtype == "R":
-if regid in {"ss", "tt"}:
-f.write(f"const int {regN} = insn->regno[{regno}];\n")
-f.write(f"ctx_log_reg_read_pair(ctx, {regN});\n")
-elif regid in {"dd", "ee", "xx", "yy"}:
-f.write(f"const int {regN} = insn->regno[{regno}];\n")
-f.write(f"ctx_log_reg_write_pair(ctx, {regN}, {predicated});\n")
-elif regid in {"s", "t", "u", "v"}:
-f.write(f"const int {regN} = insn->regno[{regno}];\n")
-f.write(f"ctx_log_reg_read(ctx, {regN});\n")
-elif regid in {"d", "e", "x", "y"}:
-f.write(f"const int {regN} = insn->regno[{regno}];\n")
-f.write(f"ctx_log_reg_write(ctx, {regN}, {predicated});\n")
-else:
-hex_common.bad_register(regtype, regid)
-elif regtype == "P":
-if regid in {"s", "t", "u", "v"}:
-f.write(f"const int {regN} = insn->regno[{regno}];\n")
-f.write(f"ctx_log_pred_read(ctx, {regN});\n")
-elif regid in {"d", "e", "x"}:
-f.write(f"const int {regN} = insn->regno[{regno}];\n")
-f.write(f"ctx_log_pred_write(ctx, {regN});\n")
-else:
-hex_common.bad_register(regtype, regid)
-elif regtype == "C":
-if regid == "ss":
-f.write(
-f"const int {regN} = insn->regno[{regno}] "
-"+ HEX_REG_SA0;\n"
-)
-f.write(f"ctx_log_reg_read_pair(ctx, {regN});\n")
-elif regid == "dd":
-f.write(f"const int {regN} = insn->regno[{regno}] " "+ HEX_REG_SA0;\n")
-f.write(f"ctx_log_reg_write_pair(ctx, {regN}, {predicated});\n")
-elif regid == "s":
-f.write(
-f"const int {regN} = insn->regno[{regno}] "
-"+ HEX_REG_SA0;\n"
-)
-f.write(f"ctx_log_reg_read(ctx, {regN});\n")
-elif regid == "d":
-f.write(f"const int {regN} = insn->regno[{regno}] " "+ HEX_REG_SA0;\n")
-f.write(f"ctx_log_reg_write(ctx, {regN}, {predicated});\n")
-else:
-hex_common.bad_register(regtype, regid)
-elif regtype == "M":
-if regid == "u":
-f.write(f"const int {regN} = insn->regno[{regno}];\n")
-f.write(f"ctx_log_reg_read(ctx, {regN});\n")
-else:
-hex_common.bad_register(regtype, regid)
-elif regtype == "V":
-newv = "EXT_DFL"
-if hex_common.is_new_result(tag):
-newv = "EXT_NEW"
-elif hex_common.is_tmp_result(tag):
-newv = "EXT_TMP"
-if regid in {"dd", "xx"}:
-f.write(f"const int {regN} = insn->regno[{regno}];\n")
-f.write(
-f"ctx_log_vreg_write_pair(ctx, {regN}, {newv}, " f"{predicated});\n"
-)
-elif regid in {"uu", "vv"}:
-f.write(f"const int {regN} = insn->regno[{regno}];\n")
-f.write(f"ctx_log_vreg_read_pair(ctx, {regN});\n")
-elif regid in {"s", "u", "v", "w"}:
-f.write(f"const int {regN} = insn->regno[{regno}];\n")
-f.write(f"ctx_log_vreg_read(ctx, {regN});\n")
-elif regid in {"d", "x", "y"}:
-f.write(f"const int {regN} = insn->regno[{regno}];\n")
-f.write(f"ctx_log_vreg_write(ctx, {regN}, {newv}, " f"{predicated});\n")
-else:
-hex_common.bad_register(regtype, regid)
-elif regtype == "Q":
-if regid in {"d", "e", "x"}:
-f.write(f"const int {regN} = insn->regno[{regno}];\n")
-f.write(f"ctx_log_qreg_write(ctx, {regN});\n")
-elif regid in {"s", "t", "u", "v"}:
-f.write(f"const int {regN} = insn->regno[{regno}];\n")
-f.write(f"ctx_log_qreg_read(ctx, {regN});\n")
-else:
-hex_common.bad_register(regtype, regid)
-elif regtype == "G":
-if regid in {"dd"}:
-f.wri

[PATCH 1/9] Hexagon (target/hexagon) Clean up handling of modifier registers

2023-12-04 Thread Taylor Simpson

Currently, the register number (MuN) for modifier registers is the
modifier register number rather than the index into hex_gpr.  This
patch changes MuN to the hex_gpr index, which is consistent with
the handling of control registers.

Note that HELPER(fcircadd) needs the CS register corresponding to the
modifier register specified in the instruction.  We create a TCGv
variable "CS" to hold the value to pass to the helper.
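
The index arithmetic described above can be sketched in a few lines of
Python. This is only an illustration: the function name modifier_indices is
hypothetical, and the numeric values of HEX_REG_M0 and HEX_REG_CS0 are
placeholders assumed for the example rather than taken from this patch.

```python
# Placeholder values assumed for illustration; the real constants are
# defined by the Hexagon target and may differ.
HEX_REG_M0 = 38
HEX_REG_CS0 = 44


def modifier_indices(insn_regno):
    """Return (MuN, CS index) for the modifier register number insn_regno.

    Hypothetical helper, not part of the patch.
    """
    # With this patch, MuN is an index into hex_gpr, not the raw
    # modifier register number.
    mu_n = insn_regno + HEX_REG_M0
    # The matching CS register sits at the same offset from HEX_REG_CS0.
    cs_n = mu_n - HEX_REG_M0 + HEX_REG_CS0
    return mu_n, cs_n


for regno in (0, 1):
    print(regno, modifier_indices(regno))
```

Keeping MuN as a hex_gpr index means modifier registers are addressed the
same way as control registers, at the cost of one subtraction when the CS
register is looked up.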

Signed-off-by: Taylor Simpson 
---
 target/hexagon/gen_tcg.h|  9 -
 target/hexagon/macros.h |  3 +--
 target/hexagon/idef-parser/parser-helpers.c |  8 +++-
 target/hexagon/gen_tcg_funcs.py | 13 +
 4 files changed, 17 insertions(+), 16 deletions(-)

diff --git a/target/hexagon/gen_tcg.h b/target/hexagon/gen_tcg.h
index d992059fce..1c4391b415 100644
--- a/target/hexagon/gen_tcg.h
+++ b/target/hexagon/gen_tcg.h
@@ -68,15 +68,14 @@
 do { \
 TCGv tcgv_siV = tcg_constant_tl(siV); \
 tcg_gen_mov_tl(EA, RxV); \
-gen_helper_fcircadd(RxV, RxV, tcgv_siV, MuV, \
-hex_gpr[HEX_REG_CS0 + MuN]); \
+gen_helper_fcircadd(RxV, RxV, tcgv_siV, MuV, CS); \
 } while (0)
 #define GET_EA_pcr(SHIFT) \
 do { \
 TCGv ireg = tcg_temp_new(); \
 tcg_gen_mov_tl(EA, RxV); \
 gen_read_ireg(ireg, MuV, (SHIFT)); \
-gen_helper_fcircadd(RxV, RxV, ireg, MuV, hex_gpr[HEX_REG_CS0 + MuN]); \
+gen_helper_fcircadd(RxV, RxV, ireg, MuV, CS); \
 } while (0)
 
 /* Instructions with multiple definitions */
@@ -113,7 +112,7 @@
 TCGv ireg = tcg_temp_new(); \
 tcg_gen_mov_tl(EA, RxV); \
 gen_read_ireg(ireg, MuV, SHIFT); \
-gen_helper_fcircadd(RxV, RxV, ireg, MuV, hex_gpr[HEX_REG_CS0 + MuN]); \
+gen_helper_fcircadd(RxV, RxV, ireg, MuV, CS); \
 LOAD; \
 } while (0)
 
@@ -427,7 +426,7 @@
 TCGv BYTE G_GNUC_UNUSED = tcg_temp_new(); \
 tcg_gen_mov_tl(EA, RxV); \
 gen_read_ireg(ireg, MuV, SHIFT); \
-gen_helper_fcircadd(RxV, RxV, ireg, MuV, hex_gpr[HEX_REG_CS0 + MuN]); \
+gen_helper_fcircadd(RxV, RxV, ireg, MuV, CS); \
 STORE; \
 } while (0)
 
diff --git a/target/hexagon/macros.h b/target/hexagon/macros.h
index 9a51b5709b..939f22e76b 100644
--- a/target/hexagon/macros.h
+++ b/target/hexagon/macros.h
@@ -462,8 +462,7 @@ static inline TCGv gen_read_ireg(TCGv result, TCGv val, int shift)
 #define fPM_CIRI(REG, IMM, MVAL) \
 do { \
 TCGv tcgv_siV = tcg_constant_tl(siV); \
-gen_helper_fcircadd(REG, REG, tcgv_siV, MuV, \
-hex_gpr[HEX_REG_CS0 + MuN]); \
+gen_helper_fcircadd(REG, REG, tcgv_siV, MuV, CS); \
 } while (0)
 #else
 #define fEA_IMM(IMM)do { EA = (IMM); } while (0)
diff --git a/target/hexagon/idef-parser/parser-helpers.c b/target/hexagon/idef-parser/parser-helpers.c
index 4af020933a..95f2b43076 100644
--- a/target/hexagon/idef-parser/parser-helpers.c
+++ b/target/hexagon/idef-parser/parser-helpers.c
@@ -1541,10 +1541,8 @@ void gen_circ_op(Context *c,
  HexValue *increment,
  HexValue *modifier)
 {
-HexValue cs = gen_tmp(c, locp, 32, UNSIGNED);
 HexValue increment_m = *increment;
 increment_m = rvalue_materialize(c, locp, &increment_m);
-OUT(c, locp, "gen_read_reg(", &cs, ", HEX_REG_CS0 + MuN);\n");
 OUT(c,
 locp,
 "gen_helper_fcircadd(",
@@ -1555,7 +1553,7 @@ void gen_circ_op(Context *c,
 &increment_m,
 ", ",
 modifier);
-OUT(c, locp, ", ", &cs, ");\n");
+OUT(c, locp, ", CS);\n");
 }
 
 HexValue gen_locnt_op(Context *c, YYLTYPE *locp, HexValue *src)
@@ -2080,9 +2078,9 @@ void emit_arg(Context *c, YYLTYPE *locp, HexValue *arg)
 char reg_id[5];
 reg_compose(c, locp, &(arg->reg), reg_id);
 EMIT_SIG(c, ", %s %s", type, reg_id);
-/* MuV register requires also MuN to provide its index */
+/* MuV register requires also CS for circular addressing*/
 if (arg->reg.type == MODIFIER) {
-EMIT_SIG(c, ", int MuN");
+EMIT_SIG(c, ", TCGv CS");
 }
 }
 break;
diff --git a/target/hexagon/gen_tcg_funcs.py b/target/hexagon/gen_tcg_funcs.py
index f5246cee6d..02d93bc5ce 100755
--- a/target/hexagon/gen_tcg_funcs.py
+++ b/target/hexagon/gen_tcg_funcs.py
@@ -99,10 +99,15 @@ def genptr_decl(f, tag, regtype, regid, regno):
 hex_common.bad_register(regtype, regid)
 elif regtype == "M":
 if regid == "u":
-f.write(f"const int {regtype}{regid}N = " f"insn->regno[{regno}];\n")
 f.write(
-f"TCGv {regtype}{regid}V = hex_gpr[{regtype}{regid}N + "
-"HEX_REG_M0];\n"
+f"const int {regN} = insn->regno[{regno}] + HEX_REG_M0;\n"
+)
+f.write(
+f"TCGv {

[PATCH 2/9] Hexagon (target/hexagon) Make generators object oriented - gen_tcg_funcs

2023-12-04 Thread Taylor Simpson

The generators are generally a bunch of Python if-then-else
statements based on the regtype and regid.  Encapsulate regtype/regid
into a class hierarchy.  Clients look up the register and invoke
methods.

This has several advantages for making the code easier to read,
understand, and maintain:
- The class name makes it clearer what the operand does
- All the methods for a given type of operand are together
- We don't need hex_common.bad_register:
  if a regtype/regid is missing, the lookup in hex_common.get_register
  will fail
- We can remove the functions in hex_common that use regtype/regid
  (e.g., is_read)

This patch creates the class hierarchy in hex_common and converts
gen_tcg_funcs.py.  The other scripts will be converted in subsequent
patches in this series.
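
As a rough illustration of the approach, the sketch below shows the shape
such a hierarchy could take. It is a simplified assumption, not the code in
this patch: the class names (Register, Source, Dest, ReadWrite) and the
registers table are hypothetical, and only init_registers, get_register,
is_read and is_written mirror names used by the converted scripts.

```python
class Register:
    """Base class: one instance per regtype/regid combination."""

    def __init__(self, regtype, regid):
        self.regtype = regtype
        self.regid = regid

    def is_read(self):
        return False

    def is_written(self):
        return False


class Source(Register):
    """Operand that is only read."""

    def is_read(self):
        return True


class Dest(Register):
    """Operand that is only written."""

    def is_written(self):
        return True


class ReadWrite(Register):
    """Operand that is both read and written."""

    def is_read(self):
        return True

    def is_written(self):
        return True


# Hypothetical lookup table keyed by regtype + regid.
registers = {}


def init_registers():
    for reg in (Dest("R", "d"), Source("R", "s"), ReadWrite("R", "x")):
        registers[reg.regtype + reg.regid] = reg


def get_register(tag, regtype, regid):
    # tag is unused in this sketch. A missing combination raises
    # KeyError, which is what makes a bad_register helper unnecessary.
    return registers[regtype + regid]


init_registers()
reg = get_register("A2_add", "R", "x")
print(reg.is_read(), reg.is_written())
```

A generator then asks the object what it needs, instead of testing
regtype and regid in every script.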

Signed-off-by: Taylor Simpson 
---
 target/hexagon/gen_tcg_funcs.py | 583 +++-
 target/hexagon/hex_common.py| 542 +
 2 files changed, 589 insertions(+), 536 deletions(-)

diff --git a/target/hexagon/gen_tcg_funcs.py b/target/hexagon/gen_tcg_funcs.py
index 02d93bc5ce..8c2bc03c10 100755
--- a/target/hexagon/gen_tcg_funcs.py
+++ b/target/hexagon/gen_tcg_funcs.py
@@ -23,466 +23,13 @@
 import hex_common
 
 
-##
-## Helpers for gen_tcg_func
-##
-def gen_decl_ea_tcg(f, tag):
-f.write("TCGv EA G_GNUC_UNUSED = tcg_temp_new();\n")
-
-
-def genptr_decl_pair_writable(f, tag, regtype, regid, regno):
-regN = f"{regtype}{regid}N"
-if regtype == "R":
-f.write(f"const int {regN} = insn->regno[{regno}];\n")
-elif regtype == "C":
-f.write(f"const int {regN} = insn->regno[{regno}] + HEX_REG_SA0;\n")
-else:
-hex_common.bad_register(regtype, regid)
-f.write(f"TCGv_i64 {regtype}{regid}V = " f"get_result_gpr_pair(ctx, {regN});\n")
-
-
-def genptr_decl_writable(f, tag, regtype, regid, regno):
-regN = f"{regtype}{regid}N"
-if regtype == "R":
-f.write(f"const int {regN} = insn->regno[{regno}];\n")
-f.write(f"TCGv {regtype}{regid}V = get_result_gpr(ctx, {regN});\n")
-elif regtype == "C":
-f.write(f"const int {regN} = insn->regno[{regno}] + HEX_REG_SA0;\n")
-f.write(f"TCGv {regtype}{regid}V = get_result_gpr(ctx, {regN});\n")
-elif regtype == "P":
-f.write(f"const int {regN} = insn->regno[{regno}];\n")
-f.write(f"TCGv {regtype}{regid}V = tcg_temp_new();\n")
-else:
-hex_common.bad_register(regtype, regid)
-
-
-def genptr_decl(f, tag, regtype, regid, regno):
-regN = f"{regtype}{regid}N"
-if regtype == "R":
-if regid in {"ss", "tt"}:
-f.write(f"TCGv_i64 {regtype}{regid}V = tcg_temp_new_i64();\n")
-f.write(f"const int {regN} = insn->regno[{regno}];\n")
-elif regid in {"dd", "ee", "xx", "yy"}:
-genptr_decl_pair_writable(f, tag, regtype, regid, regno)
-elif regid in {"s", "t", "u", "v"}:
-f.write(
-f"TCGv {regtype}{regid}V = " f"hex_gpr[insn->regno[{regno}]];\n"
-)
-elif regid in {"d", "e", "x", "y"}:
-genptr_decl_writable(f, tag, regtype, regid, regno)
-else:
-hex_common.bad_register(regtype, regid)
-elif regtype == "P":
-if regid in {"s", "t", "u", "v"}:
-f.write(
-f"TCGv {regtype}{regid}V = " f"hex_pred[insn->regno[{regno}]];\n"
-)
-elif regid in {"d", "e", "x"}:
-genptr_decl_writable(f, tag, regtype, regid, regno)
-else:
-hex_common.bad_register(regtype, regid)
-elif regtype == "C":
-if regid == "ss":
-f.write(f"TCGv_i64 {regtype}{regid}V = " f"tcg_temp_new_i64();\n")
-f.write(f"const int {regN} = insn->regno[{regno}] + " "HEX_REG_SA0;\n")
-elif regid == "dd":
-genptr_decl_pair_writable(f, tag, regtype, regid, regno)
-elif regid == "s":
-f.write(f"TCGv {regtype}{regid}V = tcg_temp_new();\n")
-f.write(
-f"const int {regtype}{regid}N = insn->regno[{regno}] + "
-"HEX_REG_SA0;\n"
-)
-elif regid == "d":
-genptr_decl_writable(f, tag, regtype, regid, regno)
-else:
-hex_common.bad_register(regtype, regid)
-elif regtype == "M":
-if regid == "u":
-f.write(
-f"const int {regN} = insn->regno[{regno}] + HEX_REG_M0;\n"
-)
-f.write(
-f"TCGv {regtype}{regid}V = hex_gpr[{regN}];\n"
-)
-f.write(
-f"TCGv CS G_GNUC_UNUSED = "
-f"hex_gpr[{regN} - HEX_REG_M0 + HEX_REG_CS0];\n"
-)
-else:
-hex_common.bad_register(regtype, regid)
-elif regtype == "V":
-if regid in {"dd"}:
-f.write(f"const int {regtype}{regid}N = " 
f"insn->regno[{regn

[PATCH 6/9] Hexagon (target/hexagon) Make generators object oriented - gen_op_regs

2023-12-04 Thread Taylor Simpson

Signed-off-by: Taylor Simpson 
---
 target/hexagon/gen_op_regs.py | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/target/hexagon/gen_op_regs.py b/target/hexagon/gen_op_regs.py
index a8a7712129..7b7b33895a 100755
--- a/target/hexagon/gen_op_regs.py
+++ b/target/hexagon/gen_op_regs.py
@@ -70,6 +70,7 @@ def strip_reg_prefix(x):
 def main():
 hex_common.read_semantics_file(sys.argv[1])
 hex_common.read_attribs_file(sys.argv[2])
+hex_common.init_registers()
 tagregs = hex_common.get_tagregs(full=True)
 tagimms = hex_common.get_tagimms()
 
@@ -80,11 +81,12 @@ def main():
 wregs = []
 regids = ""
 for regtype, regid, _, numregs in regs:
-if hex_common.is_read(regid):
+reg = hex_common.get_register(tag, regtype, regid)
+if reg.is_read():
 if regid[0] not in regids:
 regids += regid[0]
 rregs.append(regtype + regid + numregs)
-if hex_common.is_written(regid):
+if reg.is_written():
 wregs.append(regtype + regid + numregs)
 if regid[0] not in regids:
 regids += regid[0]
-- 
2.34.1

Re: [RFC 0/8] Support generic Luks encryption

2023-12-04 Thread Yong Huang

On Tue, Dec 5, 2023 at 1:44 AM Daniel P. Berrangé 
wrote:

> On Tue, Dec 05, 2023 at 01:32:51AM +0800, Yong Huang wrote:
> > On Tue, Dec 5, 2023 at 12:51 AM Daniel P. Berrangé 
> > wrote:
> >
> > > On Tue, Dec 05, 2023 at 12:41:16AM +0800, Yong Huang wrote:
> > > > On Tue, Dec 5, 2023 at 12:24 AM Daniel P. Berrangé <berra...@redhat.com>
> > > > wrote:
> > > >
> > > > > On Tue, Dec 05, 2023 at 12:06:17AM +0800, Hyman Huang wrote:
> > > > > > This functionality was motivated by the following to-do list seen
> > > > > > in crypto documents:
> > > > > > https://wiki.qemu.org/Features/Block/Crypto
> > > > > >
> > > > > > The last chapter says we should "separate header volume":
> > > > > >
> > > > > > The LUKS format has ability to store the header in a separate volume
> > > > > > from the payload. We should extend the LUKS driver in QEMU to support
> > > > > > this use case.
> > > > > >
> > > > > > As a proof-of-concept, I've created this patchset, which I've named
> > > > > > the Gluks: generic luks. As their name suggests, they offer encryption
> > > > > > for any format that QEMU theoretically supports.
> > > > >
> > > > > I don't see the point in creating a new driver.
> > > > >
> > > > > I would expect detached header support to be implemented via an
> > > > > optional new 'header' field in the existing driver. ie
> > > > >
> > > > > diff --git a/qapi/block-core.json b/qapi/block-core.json
> > > > > index ca390c5700..48d1f2a974 100644
> > > > > --- a/qapi/block-core.json
> > > > > +++ b/qapi/block-core.json
> > > > > @@ -3352,11 +3352,15 @@
> > > > >  # decryption key (since 2.6). Mandatory except when doing a
> > > > >  # metadata-only probe of the image.
> > > > >  #
> > > > > > +# @header: optional reference to the location of a blockdev
> > > > > > +# storing a detached LUKS header
> > > > > +#
> > > > >  # Since: 2.9
> > > > >  ##
> > > > >  { 'struct': 'BlockdevOptionsLUKS',
> > > > >'base': 'BlockdevOptionsGenericFormat',
> > > > > -  'data': { '*key-secret': 'str' } }
> > > > > +  'data': { '*key-secret': 'str',
> > > > > > +'*header-file': 'BlockdevRef'} }
> > > > >
> > > > >  ##
> > > > >  # @BlockdevOptionsGenericCOWFormat:
> > > > > @@ -4941,9 +4945,18 @@
> > > > >  #
> > > > >  # Driver specific image creation options for LUKS.
> > > > >  #
> > > > > -# @file: Node to create the image format on
> > > > > +# @file: Node to create the image format on. Mandatory
> > > > > +# unless a detached header file is specified using
> > > > > +# @header.
> > > > >  #
> > > > > -# @size: Size of the virtual disk in bytes
> > > > > +# @size: Size of the virtual disk in bytes.  Mandatory
> > > > > +# unless a detached header file is specified using
> > > > > +# @header.
> > > > > +#
> > > > > +# @header: optional reference to the location of a blockdev
> > > > > > +# storing a detached LUKS header. The @file option is
> > > > > > +# optional when this is given, unless it is desired
> > > > > +# to perform pre-allocation
> > > > >  #
> > > > > >  # @preallocation: Preallocation mode for the new image (since: 4.2)
> > > > >  # (default: off; allowed values: off, metadata, falloc, full)
> > > > > @@ -4952,8 +4965,9 @@
> > > > >  ##
> > > > >  { 'struct': 'BlockdevCreateOptionsLUKS',
> > > > >'base': 'QCryptoBlockCreateOptionsLUKS',
> > > > > -  'data': { 'file': 'BlockdevRef',
> > > > > -'size': 'size',
> > > > > +  'data': { '*file':'BlockdevRef',
> > > > > +'*size':'size',
> > > > > +'*header':  'BlockdevRef'
> > > > >  '*preallocation':   'PreallocMode' } }
> > > > >
> > > > >  ##
> > > > >
> > > > > > It ends up giving basically the same workflow as you outline,
> > > > > without needing the new block driver
> > > > >
> > > >
> > > > How about the design and usage, could it be simpler? Any advice? :)
> > > >
> > > >
> > > > As you can see below, the Gluks format block layer driver's design is
> > > > quite simple.
> > > >
> > > >  virtio-blk/vhost-user-blk...(front-end device)
> > > >   ^
> > > >   |
> > > >  Gluks   (format-like disk node)
> > > >   / \
> > > >file   header (blockdev reference)
> > > > / \
> > > >  filefile (protocol node)
> > > >|   |
> > > >disk data   Luks data
> > >
> > > What I suggested above ends up with the exact same block driver
> > > graph, unless I'm missing something.
> > >
> >
> > I may have overlooked something, or failed to adequately convey the goal
> > of the patchset. :(
> >
> > Indeed, utilizing the same block driver might be effective if our only goal
> > is to divide the header volume, giving us an additional way to use Luks.
> >
> > While supporting encryption for any disk format that QEMU is capable of
> > supporting is another feature of this patchset. This implies th

[PATCH 8/9] Hexagon (target/hexagon) Remove unused WRITES_PRED_REG attribute

2023-12-04 Thread Taylor Simpson

This is the only remaining use of the is_written function.  We will
remove it in the subsequent commit.

Signed-off-by: Taylor Simpson 
---
 target/hexagon/attribs_def.h.inc |  1 -
 target/hexagon/hex_common.py | 11 ---
 2 files changed, 12 deletions(-)

diff --git a/target/hexagon/attribs_def.h.inc b/target/hexagon/attribs_def.h.inc
index 21d457fa4a..87942d46f4 100644
--- a/target/hexagon/attribs_def.h.inc
+++ b/target/hexagon/attribs_def.h.inc
@@ -117,7 +117,6 @@ DEF_ATTRIB(IMPLICIT_READS_P1, "Reads the P1 register", "", "")
 DEF_ATTRIB(IMPLICIT_READS_P2, "Reads the P2 register", "", "")
 DEF_ATTRIB(IMPLICIT_READS_P3, "Reads the P3 register", "", "")
 DEF_ATTRIB(IMPLICIT_WRITES_USR, "May write USR", "", "")
-DEF_ATTRIB(WRITES_PRED_REG, "Writes a predicate register", "", "")
 DEF_ATTRIB(COMMUTES, "The operation is communitive", "", "")
 DEF_ATTRIB(DEALLOCRET, "dealloc_return", "", "")
 DEF_ATTRIB(DEALLOCFRAME, "deallocframe", "", "")
diff --git a/target/hexagon/hex_common.py b/target/hexagon/hex_common.py
index e64d114cf3..59fed78ab0 100755
--- a/target/hexagon/hex_common.py
+++ b/target/hexagon/hex_common.py
@@ -94,10 +94,6 @@ def is_cond_call(tag):
 def calculate_attribs():
 add_qemu_macro_attrib("fREAD_PC", "A_IMPLICIT_READS_PC")
 add_qemu_macro_attrib("fTRAP", "A_IMPLICIT_READS_PC")
-add_qemu_macro_attrib("fWRITE_P0", "A_WRITES_PRED_REG")
-add_qemu_macro_attrib("fWRITE_P1", "A_WRITES_PRED_REG")
-add_qemu_macro_attrib("fWRITE_P2", "A_WRITES_PRED_REG")
-add_qemu_macro_attrib("fWRITE_P3", "A_WRITES_PRED_REG")
 add_qemu_macro_attrib("fSET_OVERFLOW", "A_IMPLICIT_WRITES_USR")
 add_qemu_macro_attrib("fSET_LPCFG", "A_IMPLICIT_WRITES_USR")
 add_qemu_macro_attrib("fLOAD", "A_SCALAR_LOAD")
@@ -122,13 +118,6 @@ def calculate_attribs():
 continue
 macro = macros[macname]
 attribdict[tag] |= set(macro.attribs)
-# Figure out which instructions write predicate registers
-tagregs = get_tagregs()
-for tag in tags:
-regs = tagregs[tag]
-for regtype, regid in regs:
-if regtype == "P" and is_written(regid):
-attribdict[tag].add("A_WRITES_PRED_REG")
 # Mark conditional jumps and calls
 # Not all instructions are properly marked with A_CONDEXEC
 for tag in tags:
-- 
2.34.1
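
For readers following the removal above, here is a minimal sketch of what the
deleted loop in calculate_attribs() computed. The is_written() predicate below
is an assumption made for illustration (register ids starting with d, e, x, or
y are treated as written), and the tag names and tagregs table are toy data,
not taken from the Hexagon sources.

```python
def is_written(regid):
    # Assumption for this sketch: destination-style ids begin with d, e, x, or y.
    return regid[0] in "dexy"


def tags_writing_predicates(tagregs):
    """Return the set of tags that have at least one written predicate (P) register."""
    result = set()
    for tag, regs in tagregs.items():
        for regtype, regid in regs:
            if regtype == "P" and is_written(regid):
                result.add(tag)
    return result


# Toy data (hypothetical tags): a compare that writes Pd, an add that writes Rd.
toy_tagregs = {
    "toy_cmp": [("P", "d"), ("R", "s"), ("R", "t")],
    "toy_add": [("R", "d"), ("R", "s"), ("R", "t")],
}
writers = tags_writing_predicates(toy_tagregs)
```

Only toy_cmp ends up in the result, which is the set of tags the removed code
would have marked with A_WRITES_PRED_REG.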

[PATCH 3/9] Hexagon (target/hexagon) Make generators object oriented - gen_helper_protos

2023-12-04 Thread Taylor Simpson

Signed-off-by: Taylor Simpson 
---
 target/hexagon/gen_helper_protos.py | 184 
 target/hexagon/hex_common.py|  15 +--
 2 files changed, 55 insertions(+), 144 deletions(-)

diff --git a/target/hexagon/gen_helper_protos.py b/target/hexagon/gen_helper_protos.py
index 131043795a..9277199e1d 100755
--- a/target/hexagon/gen_helper_protos.py
+++ b/target/hexagon/gen_helper_protos.py
@@ -22,39 +22,6 @@
 import string
 import hex_common
 
-##
-## Helpers for gen_helper_prototype
-##
-def_helper_types = {
-"N": "s32",
-"O": "s32",
-"P": "s32",
-"M": "s32",
-"C": "s32",
-"R": "s32",
-"V": "ptr",
-"Q": "ptr",
-}
-
-def_helper_types_pair = {
-"R": "s64",
-"C": "s64",
-"S": "s64",
-"G": "s64",
-"V": "ptr",
-"Q": "ptr",
-}
-
-
-def gen_def_helper_opn(f, tag, regtype, regid, i):
-if hex_common.is_pair(regid):
-f.write(f", {def_helper_types_pair[regtype]}")
-elif hex_common.is_single(regid):
-f.write(f", {def_helper_types[regtype]}")
-else:
-hex_common.bad_register(regtype, regid)
-
-
 ##
 ## Generate the DEF_HELPER prototype for an instruction
 ## For A2_add: Rd32=add(Rs32,Rt32)
@@ -65,116 +32,62 @@ def gen_helper_prototype(f, tag, tagregs, tagimms):
 regs = tagregs[tag]
 imms = tagimms[tag]
 
-numresults = 0
+## If there is a scalar result, it is the return type
+return_type = ""
 numscalarresults = 0
-numscalarreadwrite = 0
 for regtype, regid in regs:
-if hex_common.is_written(regid):
-numresults += 1
-if hex_common.is_scalar_reg(regtype):
+reg = hex_common.get_register(tag, regtype, regid)
+if reg.is_written() and reg.is_scalar_reg():
+return_type = reg.helper_proto_type()
 numscalarresults += 1
-if hex_common.is_readwrite(regid):
-if hex_common.is_scalar_reg(regtype):
-numscalarreadwrite += 1
+if numscalarresults == 0:
+return_type = "void"
 
 if numscalarresults > 1:
-## The helper is bogus when there is more than one result
-f.write(f"DEF_HELPER_1({tag}, void, env)\n")
-else:
-## Figure out how many arguments the helper will take
-if numscalarresults == 0:
-def_helper_size = len(regs) + len(imms) + numscalarreadwrite + 1
-if hex_common.need_pkt_has_multi_cof(tag):
-def_helper_size += 1
-if hex_common.need_pkt_need_commit(tag):
-def_helper_size += 1
-if hex_common.need_part1(tag):
-def_helper_size += 1
-if hex_common.need_slot(tag):
-def_helper_size += 1
-if hex_common.need_PC(tag):
-def_helper_size += 1
-if hex_common.helper_needs_next_PC(tag):
-def_helper_size += 1
-if hex_common.need_condexec_reg(tag, regs):
-def_helper_size += 1
-f.write(f"DEF_HELPER_{def_helper_size}({tag}")
-## The return type is void
-f.write(", void")
-else:
-def_helper_size = len(regs) + len(imms) + numscalarreadwrite
-if hex_common.need_pkt_has_multi_cof(tag):
-def_helper_size += 1
-if hex_common.need_pkt_need_commit(tag):
-def_helper_size += 1
-if hex_common.need_part1(tag):
-def_helper_size += 1
-if hex_common.need_slot(tag):
-def_helper_size += 1
-if hex_common.need_PC(tag):
-def_helper_size += 1
-if hex_common.need_condexec_reg(tag, regs):
-def_helper_size += 1
-if hex_common.helper_needs_next_PC(tag):
-def_helper_size += 1
-f.write(f"DEF_HELPER_{def_helper_size}({tag}")
-
-## Generate the qemu DEF_HELPER type for each result
-## Iterate over this list twice
-## - Emit the scalar result
-## - Emit the vector result
-i = 0
-for regtype, regid in regs:
-if hex_common.is_written(regid):
-if not hex_common.is_hvx_reg(regtype):
-gen_def_helper_opn(f, tag, regtype, regid, i)
-i += 1
+raise Exception("numscalarresults > 1")
 
-## Put the env between the outputs and inputs
-f.write(", env")
-i += 1
+declared = []
+declared.append(return_type)
 
-# Second pass
-for regtype, regid in regs:
-if hex_common.is_written(regid):
-if hex_common.is_hvx_reg(regtype):
-gen_def_helper_opn(f, tag, regtype, regid, i)
-i += 1
-
-## For conditional instructions, we pass in the destination register
-if "A_CONDEXEC" in hex_common.attribdict[tag]:
-for regtype, regid in regs:
-if hex_common.is_write

[PATCH 0/9] Hexagon (target/hexagon) Make generators object oriented

2023-12-04 Thread Taylor Simpson

See commit message in second patch

Taylor Simpson (9):
  Hexagon (target/hexagon) Clean up handling of modifier registers
  Hexagon (target/hexagon) Make generators object oriented -
gen_tcg_funcs
  Hexagon (target/hexagon) Make generators object oriented -
gen_helper_protos
  Hexagon (target/hexagon) Make generators object oriented -
gen_helper_funcs
  Hexagon (target/hexagon) Make generators object oriented -
gen_idef_parser_funcs
  Hexagon (target/hexagon) Make generators object oriented - gen_op_regs
  Hexagon (target/hexagon) Make generators object oriented -
gen_analyze_funcs
  Hexagon (target/hexagon) Remove unused WRITES_PRED_REG attribute
  Hexagon (target/hexagon) Remove dead functions from hex_common.py

 target/hexagon/gen_tcg.h|   9 +-
 target/hexagon/macros.h |   3 +-
 target/hexagon/attribs_def.h.inc|   1 -
 target/hexagon/idef-parser/parser-helpers.c |   8 +-
 target/hexagon/gen_analyze_funcs.py | 163 +---
 target/hexagon/gen_helper_funcs.py  | 400 +++---
 target/hexagon/gen_helper_protos.py | 184 ++---
 target/hexagon/gen_idef_parser_funcs.py |  20 +-
 target/hexagon/gen_op_regs.py   |   6 +-
 target/hexagon/gen_tcg_funcs.py | 578 ++
 target/hexagon/hex_common.py| 818 ++--
 11 files changed, 963 insertions(+), 1227 deletions(-)

-- 
2.34.1

[PATCH 5/9] Hexagon (target/hexagon) Make generators object oriented - gen_idef_parser_funcs

2023-12-04 Thread Taylor Simpson

Signed-off-by: Taylor Simpson 
---
 target/hexagon/gen_idef_parser_funcs.py | 20 
 1 file changed, 4 insertions(+), 16 deletions(-)

diff --git a/target/hexagon/gen_idef_parser_funcs.py b/target/hexagon/gen_idef_parser_funcs.py
index f4518e653f..550a48cb7b 100644
--- a/target/hexagon/gen_idef_parser_funcs.py
+++ b/target/hexagon/gen_idef_parser_funcs.py
@@ -46,6 +46,7 @@ def main():
 hex_common.read_semantics_file(sys.argv[1])
 hex_common.read_attribs_file(sys.argv[2])
 hex_common.calculate_attribs()
+hex_common.init_registers()
 tagregs = hex_common.get_tagregs()
 tagimms = hex_common.get_tagimms()
 
@@ -132,22 +133,9 @@ def main():
 
 arguments = []
 for regtype, regid in regs:
-prefix = "in " if hex_common.is_read(regid) else ""
-
-is_pair = hex_common.is_pair(regid)
-is_single_old = hex_common.is_single(regid) and hex_common.is_old_val(
-regtype, regid, tag
-)
-is_single_new = hex_common.is_single(regid) and hex_common.is_new_val(
-regtype, regid, tag
-)
-
-if is_pair or is_single_old:
-arguments.append(f"{prefix}{regtype}{regid}V")
-elif is_single_new:
-arguments.append(f"{prefix}{regtype}{regid}N")
-else:
-hex_common.bad_register(regtype, regid)
+reg = hex_common.get_register(tag, regtype, regid)
+prefix = "in " if reg.is_read() else ""
+arguments.append(f"{prefix}{reg.reg_tcg()}")
 
 for immlett, bits, immshift in imms:
 arguments.append(hex_common.imm_name(immlett))
-- 
2.34.1
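
To illustrate the shape of the refactoring above, here is a rough sketch of a
register object that carries its own is_read() and reg_tcg() logic, so the
caller no longer branches on pair/single/new-value. ToyRegister and
build_arguments are hypothetical names for this example only; the real classes
are defined in hex_common.py and may differ. The read predicate and the
explicit new_value flag are assumptions.

```python
class ToyRegister:
    """Hypothetical stand-in for the object returned by hex_common.get_register()."""

    def __init__(self, regtype, regid, new_value=False):
        self.regtype = regtype
        self.regid = regid
        self.new_value = new_value

    def is_read(self):
        # Assumption: source-style register ids begin with one of these letters.
        return self.regid[0] in "stuvwxy"

    def reg_tcg(self):
        # "N" suffix for new-value operands, "V" otherwise (pairs included),
        # mirroring the strings appended by the removed branches.
        suffix = "N" if self.new_value else "V"
        return f"{self.regtype}{self.regid}{suffix}"


def build_arguments(regs):
    """Build the argument list the way the rewritten loop does."""
    arguments = []
    for reg in regs:
        prefix = "in " if reg.is_read() else ""
        arguments.append(f"{prefix}{reg.reg_tcg()}")
    return arguments


example = build_arguments(
    [ToyRegister("R", "d"), ToyRegister("R", "s"), ToyRegister("N", "t", new_value=True)]
)
```

Moving the suffix decision into the object is what lets the loop in main()
shrink to three lines.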

[RFC PATCH 07/11] hw/acpi: Add ACPI GED support for the sleep event

2023-12-04 Thread Annie Li

From: Miguel Luis 

Add support for ACPI GED sleep event on the ACPI device interface so that
HW-reduced systems can enable guests to sleep.

Signed-off-by: Miguel Luis 
---
 hw/acpi/generic_event_device.c | 9 +
 include/hw/acpi/generic_event_device.h | 1 +
 2 files changed, 10 insertions(+)

diff --git a/hw/acpi/generic_event_device.c b/hw/acpi/generic_event_device.c
index a3d31631fe..97a6f82b35 100644
--- a/hw/acpi/generic_event_device.c
+++ b/hw/acpi/generic_event_device.c
@@ -13,6 +13,7 @@
 #include "qapi/error.h"
 #include "hw/acpi/acpi.h"
 #include "hw/acpi/generic_event_device.h"
+#include "hw/acpi/control_method_device.h"
 #include "hw/irq.h"
 #include "hw/mem/pc-dimm.h"
 #include "hw/mem/nvdimm.h"
@@ -25,6 +26,7 @@ static const uint32_t ged_supported_events[] = {
 ACPI_GED_MEM_HOTPLUG_EVT,
 ACPI_GED_PWR_DOWN_EVT,
 ACPI_GED_NVDIMM_HOTPLUG_EVT,
+ACPI_GED_SLEEP_EVT,
 };
 
 /*
@@ -117,6 +119,11 @@ void build_ged_aml(Aml *table, const char *name, HotplugHandler *hotplug_dev,
aml_notify(aml_name("\\_SB.NVDR"),
   aml_int(0x80)));
 break;
+case ACPI_GED_SLEEP_EVT:
+aml_append(if_ctx,
+   aml_notify(aml_name(ACPI_SLEEP_BUTTON_DEVICE),
+  aml_int(0x80)));
+break;
 default:
 /*
  * Please make sure all the events in ged_supported_events[]
@@ -284,6 +291,8 @@ static void acpi_ged_send_event(AcpiDeviceIf *adev, AcpiEventStatusBits ev)
 sel = ACPI_GED_MEM_HOTPLUG_EVT;
 } else if (ev & ACPI_POWER_DOWN_STATUS) {
 sel = ACPI_GED_PWR_DOWN_EVT;
+} else if (ev & ACPI_SLEEP_STATUS) {
+sel = ACPI_GED_SLEEP_EVT;
 } else if (ev & ACPI_NVDIMM_HOTPLUG_STATUS) {
 sel = ACPI_GED_NVDIMM_HOTPLUG_EVT;
 } else {
diff --git a/include/hw/acpi/generic_event_device.h b/include/hw/acpi/generic_event_device.h
index ba84ce0214..6186bdf368 100644
--- a/include/hw/acpi/generic_event_device.h
+++ b/include/hw/acpi/generic_event_device.h
@@ -95,6 +95,7 @@ OBJECT_DECLARE_SIMPLE_TYPE(AcpiGedState, ACPI_GED)
 #define ACPI_GED_MEM_HOTPLUG_EVT   0x1
 #define ACPI_GED_PWR_DOWN_EVT  0x2
 #define ACPI_GED_NVDIMM_HOTPLUG_EVT 0x4
+#define ACPI_GED_SLEEP_EVT 0x8
 
 typedef struct GEDState {
 MemoryRegion evt;
-- 
2.34.3
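
As a quick model of the selector logic touched above, the following sketch
mirrors the if/else chain in acpi_ged_send_event(). The GED selector values
and the 16/64/128 status bits come from the patches in this series; the value
8 for the memory hotplug status bit is an assumption, and ged_select is a
hypothetical helper name.

```python
# AcpiEventStatusBits (the memory hotplug value is assumed for this sketch)
ACPI_MEMORY_HOTPLUG_STATUS = 8
ACPI_NVDIMM_HOTPLUG_STATUS = 16
ACPI_POWER_DOWN_STATUS = 64
ACPI_SLEEP_STATUS = 128

# GED event selector bits from generic_event_device.h
ACPI_GED_MEM_HOTPLUG_EVT = 0x1
ACPI_GED_PWR_DOWN_EVT = 0x2
ACPI_GED_NVDIMM_HOTPLUG_EVT = 0x4
ACPI_GED_SLEEP_EVT = 0x8


def ged_select(ev):
    """Return the GED selector for a status mask, or None if unsupported."""
    # Same order as the patched C code: the first matching bit wins.
    if ev & ACPI_MEMORY_HOTPLUG_STATUS:
        return ACPI_GED_MEM_HOTPLUG_EVT
    elif ev & ACPI_POWER_DOWN_STATUS:
        return ACPI_GED_PWR_DOWN_EVT
    elif ev & ACPI_SLEEP_STATUS:
        return ACPI_GED_SLEEP_EVT
    elif ev & ACPI_NVDIMM_HOTPLUG_STATUS:
        return ACPI_GED_NVDIMM_HOTPLUG_EVT
    return None
```

Note that because the chain is ordered, a mask carrying both power-down and
sleep bits selects the power-down event only.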

[RFC PATCH 09/11] hw/arm: enable sleep support for arm/virt

2023-12-04 Thread Annie Li

From: Miguel Luis 

Include the ACPI control method device in the arm/virt ACPI tables, together
with the corresponding handling that enables triggering the event.

Signed-off-by: Miguel Luis 
---
 hw/arm/virt-acpi-build.c | 13 +
 hw/arm/virt.c| 13 -
 include/hw/arm/virt.h|  1 +
 3 files changed, 26 insertions(+), 1 deletion(-)

diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
index 8bc35a483c..15e00cc5dc 100644
--- a/hw/arm/virt-acpi-build.c
+++ b/hw/arm/virt-acpi-build.c
@@ -42,6 +42,7 @@
 #include "hw/acpi/pci.h"
 #include "hw/acpi/memory_hotplug.h"
 #include "hw/acpi/generic_event_device.h"
+#include "hw/acpi/control_method_device.h"
 #include "hw/acpi/tpm.h"
 #include "hw/acpi/hmat.h"
 #include "hw/pci/pcie_host.h"
@@ -816,6 +817,17 @@ static void build_fadt_rev6(GArray *table_data, BIOSLinker *linker,
 .rev = 6,
 .minor_ver = 0,
 .flags = 1 << ACPI_FADT_F_HW_REDUCED_ACPI,
+/* ACPI 5.0: 4.8.3.7 Sleep Control and Status Registers */
+.sleep_ctl = {
+.space_id = AML_AS_SYSTEM_MEMORY,
+.bit_width = 8,
+.address = vms->memmap[VIRT_ACPI_GED].base + ACPI_GED_REG_SLEEP_CTL,
+},
+.sleep_sts = {
+.space_id = AML_AS_SYSTEM_MEMORY,
+.bit_width = 8,
+.address = vms->memmap[VIRT_ACPI_GED].base + ACPI_GED_REG_SLEEP_STS,
+},
 .xdsdt_tbl_offset = &dsdt_tbl_offset,
 };
 
@@ -890,6 +902,7 @@ build_dsdt(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
 }
 
 acpi_dsdt_add_power_button(scope);
+acpi_dsdt_add_sleep_button(scope);
 #ifdef CONFIG_TPM
 acpi_dsdt_add_tpm(scope, vms);
 #endif
diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index be2856c018..8b9a328360 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -644,7 +644,7 @@ static inline DeviceState *create_acpi_ged(VirtMachineState *vms)
 DeviceState *dev;
 MachineState *ms = MACHINE(vms);
 int irq = vms->irqmap[VIRT_ACPI_GED];
-uint32_t event = ACPI_GED_PWR_DOWN_EVT;
+uint32_t event = ACPI_GED_PWR_DOWN_EVT | ACPI_GED_SLEEP_EVT;
 
 if (ms->ram_slots) {
 event |= ACPI_GED_MEM_HOTPLUG_EVT;
@@ -931,6 +931,14 @@ static void create_rtc(const VirtMachineState *vms)
 g_free(nodename);
 }
 
+static void virt_sleep_req(Notifier *n, void *opaque)
+{
+VirtMachineState *s = container_of(n, VirtMachineState, sleep_notifier);
+
+if (s->acpi_dev) {
+acpi_send_event(s->acpi_dev, ACPI_SLEEP_STATUS);
+}
+}
 static DeviceState *gpio_key_dev;
 static void virt_powerdown_req(Notifier *n, void *opaque)
 {
@@ -2299,6 +2307,9 @@ static void machvirt_init(MachineState *machine)
 create_gpio_devices(vms, VIRT_SECURE_GPIO, secure_sysmem);
 }
 
+ /* connect sleep request */
+ vms->sleep_notifier.notify = virt_sleep_req;
+
  /* connect powerdown request */
  vms->powerdown_notifier.notify = virt_powerdown_req;
  qemu_register_powerdown_notifier(&vms->powerdown_notifier);
diff --git a/include/hw/arm/virt.h b/include/hw/arm/virt.h
index f69239850e..82598c1879 100644
--- a/include/hw/arm/virt.h
+++ b/include/hw/arm/virt.h
@@ -170,6 +170,7 @@ struct VirtMachineState {
 DeviceState *gic;
 DeviceState *acpi_dev;
 Notifier powerdown_notifier;
+Notifier sleep_notifier;
 PCIBus *bus;
 char *oem_id;
 char *oem_table_id;
-- 
2.34.3
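
The two FADT fields added above are ACPI Generic Address Structures (12 bytes
each). A small sketch of how such a structure is laid out may help when reading
the FACP dump in patch 10/11. The addresses 0x09080000 and 0x09080001 are read
off the raw table bytes in that patch; pack_gas is a hypothetical helper, not a
QEMU function.

```python
import struct

AML_AS_SYSTEM_MEMORY = 0  # Space ID 00 [SystemMemory] in the FACP dump


def pack_gas(space_id, bit_width, address, bit_offset=0, access_width=0):
    """Pack a Generic Address Structure: four 1-byte fields, then a 64-bit address."""
    # "<" selects little-endian with no padding, so the result is exactly 12 bytes.
    return struct.pack("<BBBBQ", space_id, bit_width, bit_offset, access_width, address)


# Values as they appear in the updated FACP table of this series.
sleep_ctl = pack_gas(AML_AS_SYSTEM_MEMORY, 8, 0x09080000)
sleep_sts = pack_gas(AML_AS_SYSTEM_MEMORY, 8, 0x09080001)
```

Printing sleep_ctl.hex() and sleep_sts.hex() gives byte sequences that can be
compared against offsets 0xF4 and 0x100 of the raw table data.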

[RFC PATCH 03/11] test/acpi: allow DSDT table changes

2023-12-04 Thread Annie Li

List various DSDT files allowed to be changed in
tests/qtest/bios-tables-test-allowed-diff.h

Signed-off-by: Annie Li 
---
 tests/qtest/bios-tables-test-allowed-diff.h | 41 +
 1 file changed, 41 insertions(+)

diff --git a/tests/qtest/bios-tables-test-allowed-diff.h b/tests/qtest/bios-tables-test-allowed-diff.h
index dfb8523c8b..eb309b1493 100644
--- a/tests/qtest/bios-tables-test-allowed-diff.h
+++ b/tests/qtest/bios-tables-test-allowed-diff.h
@@ -1 +1,42 @@
 /* List of comma-separated changed AML files to ignore */
+"tests/data/acpi/pc/DSDT",
+"tests/data/acpi/pc/DSDT.acpierst",
+"tests/data/acpi/pc/DSDT.acpihmat",
+"tests/data/acpi/pc/DSDT.bridge",
+"tests/data/acpi/pc/DSDT.cphp",
+"tests/data/acpi/pc/DSDT.dimmpxm",
+"tests/data/acpi/pc/DSDT.hpbridge",
+"tests/data/acpi/pc/DSDT.hpbrroot",
+"tests/data/acpi/pc/DSDT.ipmikcs",
+"tests/data/acpi/pc/DSDT.memhp",
+"tests/data/acpi/pc/DSDT.nohpet",
+"tests/data/acpi/pc/DSDT.numamem",
+"tests/data/acpi/pc/DSDT.roothp",
+"tests/data/acpi/q35/DSDT",
+"tests/data/acpi/q35/DSDT.acpierst",
+"tests/data/acpi/q35/DSDT.acpihmat",
+"tests/data/acpi/q35/DSDT.acpihmat-noinitiator",
+"tests/data/acpi/q35/DSDT.applesmc",
+"tests/data/acpi/q35/DSDT.bridge",
+"tests/data/acpi/q35/DSDT.core-count",
+"tests/data/acpi/q35/DSDT.core-count2",
+"tests/data/acpi/q35/DSDT.cphp",
+"tests/data/acpi/q35/DSDT.cxl",
+"tests/data/acpi/q35/DSDT.dimmpxm",
+"tests/data/acpi/q35/DSDT.ipmibt",
+"tests/data/acpi/q35/DSDT.ipmismbus",
+"tests/data/acpi/q35/DSDT.ivrs",
+"tests/data/acpi/q35/DSDT.memhp",
+"tests/data/acpi/q35/DSDT.mmio64",
+"tests/data/acpi/q35/DSDT.multi-bridge",
+"tests/data/acpi/q35/DSDT.noacpihp",
+"tests/data/acpi/q35/DSDT.nohpet",
+"tests/data/acpi/q35/DSDT.numamem",
+"tests/data/acpi/q35/DSDT.pvpanic-isa",
+"tests/data/acpi/q35/DSDT.thread-count",
+"tests/data/acpi/q35/DSDT.thread-count2",
+"tests/data/acpi/q35/DSDT.tis.tpm12",
+"tests/data/acpi/q35/DSDT.tis.tpm2",
+"tests/data/acpi/q35/DSDT.type4-count",
+"tests/data/acpi/q35/DSDT.viot",
+"tests/data/acpi/q35/DSDT.xapic",
-- 
2.34.3

[RFC PATCH 11/11] arm/virt: enable sleep support

2023-12-04 Thread Annie Li

From: Miguel Luis 

For reference: qmp_system_sleep relies on wakeup support, as reported by
qemu_wakeup_suspend_enabled(), hence the need to call
qemu_register_wakeup_support().

Signed-off-by: Miguel Luis 
---
 hw/arm/virt.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 8b9a328360..6407734105 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -2309,6 +2309,7 @@ static void machvirt_init(MachineState *machine)
 
  /* connect sleep request */
  vms->sleep_notifier.notify = virt_sleep_req;
+ qemu_register_wakeup_support();
 
  /* connect powerdown request */
  vms->powerdown_notifier.notify = virt_powerdown_req;
-- 
2.34.3

[RFC PATCH 10/11] tests/acpi: Update FACP and DSDT tables for sleep button

2023-12-04 Thread Annie Li

From: Miguel Luis 

  *
  * ACPI Data Table [FACP]
  *
  * Format: [HexOffset DecimalOffset ByteLength]  FieldName : FieldValue
  */

 [000h 0000   4]Signature : "FACP" [Fixed ACPI Description Table (FADT)]
 [004h 0004   4] Table Length : 0114
 [008h 0008   1] Revision : 06
-[009h 0009   1] Checksum : 15
+[009h 0009   1] Checksum : E2
 [00Ah 0010   6]   Oem ID : "BOCHS "
 [010h 0016   8] Oem Table ID : "BXPC"
 [018h 0024   4] Oem Revision : 0001
 [01Ch 0028   4]  Asl Compiler ID : "BXPC"
 [020h 0032   4]Asl Compiler Revision : 0001

 [024h 0036   4] FACS Address : 
 [028h 0040   4] DSDT Address : 
 [02Ch 0044   1]Model : 00
 [02Dh 0045   1]   PM Profile : 00 [Unspecified]
 [02Eh 0046   2]SCI Interrupt : 
 [030h 0048   4] SMI Command Port : 
 [034h 0052   1]ACPI Enable Value : 00
 [035h 0053   1]   ACPI Disable Value : 00
 [036h 0054   1]   S4BIOS Command : 00
 [037h 0055   1]  P-State Control : 00
@@ -148,50 +148,50 @@
 [0DCh 0220   1] Space ID : 00 [SystemMemory]
 [0DDh 0221   1]Bit Width : 00
 [0DEh 0222   1]   Bit Offset : 00
 [0DFh 0223   1] Encoded Access Width : 00 [Undefined/Legacy]
 [0E0h 0224   8]  Address : 

 [0E8h 0232  12]   GPE1 Block : [Generic Address Structure]
 [0E8h 0232   1] Space ID : 00 [SystemMemory]
 [0E9h 0233   1]Bit Width : 00
 [0EAh 0234   1]   Bit Offset : 00
 [0EBh 0235   1] Encoded Access Width : 00 [Undefined/Legacy]
 [0ECh 0236   8]  Address : 

 [0F4h 0244  12]   Sleep Control Register : [Generic Address Structure]
 [0F4h 0244   1] Space ID : 00 [SystemMemory]
-[0F5h 0245   1]Bit Width : 00
+[0F5h 0245   1]Bit Width : 08
 [0F6h 0246   1]   Bit Offset : 00
 [0F7h 0247   1] Encoded Access Width : 00 [Undefined/Legacy]
-[0F8h 0248   8]  Address : 0000000000000000
+[0F8h 0248   8]  Address : 0000000009080000

 [100h 0256  12]Sleep Status Register : [Generic Address Structure]
 [100h 0256   1] Space ID : 00 [SystemMemory]
-[101h 0257   1]Bit Width : 00
+[101h 0257   1]Bit Width : 08
 [102h 0258   1]   Bit Offset : 00
 [103h 0259   1] Encoded Access Width : 00 [Undefined/Legacy]
-[104h 0260   8]  Address : 0000000000000000
+[104h 0260   8]  Address : 0000000009080001

 [10Ch 0268   8]Hypervisor ID : 554D4551

 Raw Table Data: Length 276 (0x114)

-0000: 46 41 43 50 14 01 00 00 06 15 42 4F 43 48 53 20  // FACP..BOCHS
+0000: 46 41 43 50 14 01 00 00 06 E2 42 4F 43 48 53 20  // FACP..BOCHS
 0010: 42 58 50 43 20 20 20 20 01 00 00 00 42 58 50 43  // BXPCBXPC
 0020: 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  // 
 0030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  // 
 0040: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  // 
 0050: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  // 
 0060: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  // 
 0070: 00 00 10 00 00 00 00 00 00 00 00 00 00 00 00 00  // 
 0080: 00 03 00 00 00 00 00 00 00 00 00 00 00 00 00 00  // 
 0090: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  // 
 00A0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  // 
 00B0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  // 
 00C0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  // 
 00D0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  // 
 00E0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  // 
-00F0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  // 
-0100: 00 00 00 00 00 00 00 00 00 00 00 00 51 45 4D 55  // QEMU
+00F0: 00 00 00 00 00 08 00 00 00 00 08 09 00 00 00 00  // 
+0100: 00 08 00 00 01 00 08 09 00 00 00 00 51 45 4D 55  // QEMU
 0110: 00 00 00 00  // 

  *
  * Original Table Header:
  * Signature"DSDT"
- * Length   0x144C (5196)
+ * Length   0x149E (5278)
  * Revision 0x02
- * Checksum 0x9F
+ * Checksum 0x2B
  * OEM ID
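
The checksum change in the FACP diff above (15 to E2) can be cross-checked by
hand: an ACPI table checksum is chosen so that all bytes of the table sum to
zero modulo 256. The sketch below shows the rule and applies it to the seven
non-zero bytes that the raw dump shows being added. acpi_checksum is a
hypothetical helper written for this note.

```python
def acpi_checksum(table, checksum_offset=9):
    """Return the checksum byte that makes the whole table sum to 0 mod 256."""
    data = bytearray(table)
    data[checksum_offset] = 0  # ignore whatever checksum is currently stored
    return (-sum(data)) & 0xFF


# Non-zero bytes that replace zeros at offsets 0xF5..0x107 in the raw dump.
ADDED_BYTES = [0x08, 0x08, 0x09, 0x08, 0x01, 0x08, 0x09]
OLD_CHECKSUM = 0x15

# Adding bytes to the table lowers the checksum by their sum (mod 256).
new_checksum = (OLD_CHECKSUM - sum(ADDED_BYTES)) & 0xFF
```

The added bytes sum to 0x33, so the new checksum is 0x15 - 0x33 modulo 256,
which agrees with the value shown in the diff.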

[RFC PATCH 08/11] tests/acpi: allow FACP and DSDT table changes for arm/virt

2023-12-04 Thread Annie Li

From: Miguel Luis 

List the FACP and DSDT table files that are allowed to change for arm/virt.

Signed-off-by: Miguel Luis 
---
 tests/qtest/bios-tables-test-allowed-diff.h | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/tests/qtest/bios-tables-test-allowed-diff.h b/tests/qtest/bios-tables-test-allowed-diff.h
index dfb8523c8b..83d368734c 100644
--- a/tests/qtest/bios-tables-test-allowed-diff.h
+++ b/tests/qtest/bios-tables-test-allowed-diff.h
@@ -1 +1,8 @@
 /* List of comma-separated changed AML files to ignore */
+"tests/data/acpi/virt/DSDT",
+"tests/data/acpi/virt/DSDT.acpihmatvirt",
+"tests/data/acpi/virt/DSDT.memhp",
+"tests/data/acpi/virt/DSDT.pxb",
+"tests/data/acpi/virt/DSDT.topology",
+"tests/data/acpi/virt/FACP",
+
-- 
2.34.3

[RFC PATCH 06/11] acpi: Send the GPE event of suspend and wakeup for x86

2023-12-04 Thread Annie Li

The GPE event is triggered to notify the guest to suspend or
wake itself up. This patch removes the previous handling of
QEMU_WAKEUP_REASON_OTHER, which pretended that the resume was
caused by the power button.

Signed-off-by: Annie Li 
---
 hw/acpi/core.c   | 17 +
 hw/core/machine-qmp-cmds.c   |  2 ++
 include/hw/acpi/acpi.h   |  1 +
 include/hw/acpi/acpi_dev_interface.h |  1 +
 4 files changed, 17 insertions(+), 4 deletions(-)

diff --git a/hw/acpi/core.c b/hw/acpi/core.c
index ec5e127d17..e5c3ff9a54 100644
--- a/hw/acpi/core.c
+++ b/hw/acpi/core.c
@@ -354,6 +354,16 @@ int acpi_get_slic_oem(AcpiSlicOem *oem)
 return -1;
 }
 
+void acpi_send_sleep_wakeup_event(void)
+{
+Object *obj = object_resolve_path_type("", TYPE_ACPI_DEVICE_IF, NULL);
+
+if (obj) {
+/* Send _GPE.L07 event */
+acpi_send_event(DEVICE(obj), ACPI_SLEEP_STATUS);
+}
+}
+
 static void acpi_notify_wakeup(Notifier *notifier, void *data)
 {
 ACPIREGS *ar = container_of(notifier, ACPIREGS, wakeup);
@@ -369,10 +379,9 @@ static void acpi_notify_wakeup(Notifier *notifier, void *data)
 (ACPI_BITMASK_WAKE_STATUS | ACPI_BITMASK_TIMER_STATUS);
 break;
 case QEMU_WAKEUP_REASON_OTHER:
-/* ACPI_BITMASK_WAKE_STATUS should be set on resume.
-   Pretend that resume was caused by power button */
-ar->pm1.evt.sts |=
-(ACPI_BITMASK_WAKE_STATUS | ACPI_BITMASK_POWER_BUTTON_STATUS);
+/* ACPI_BITMASK_WAKE_STATUS should be set on resume. */
+ar->pm1.evt.sts |= ACPI_BITMASK_WAKE_STATUS;
+acpi_send_sleep_wakeup_event();
 break;
 default:
 break;
diff --git a/hw/core/machine-qmp-cmds.c b/hw/core/machine-qmp-cmds.c
index 9f1e636c90..d51802214b 100644
--- a/hw/core/machine-qmp-cmds.c
+++ b/hw/core/machine-qmp-cmds.c
@@ -9,6 +9,7 @@
 
 #include "qemu/osdep.h"
 #include "hw/acpi/vmgenid.h"
+#include "hw/acpi/acpi.h"
 #include "hw/boards.h"
 #include "hw/intc/intc.h"
 #include "hw/mem/memory-device.h"
@@ -264,6 +265,7 @@ void qmp_system_sleep(Error **errp)
"suspend from running is not supported by this guest");
 return;
 }
+acpi_send_sleep_wakeup_event();
 }
 
 void qmp_system_powerdown(Error **errp)
diff --git a/include/hw/acpi/acpi.h b/include/hw/acpi/acpi.h
index e0e51e85b4..07e31aa138 100644
--- a/include/hw/acpi/acpi.h
+++ b/include/hw/acpi/acpi.h
@@ -181,6 +181,7 @@ uint32_t acpi_gpe_ioport_readb(ACPIREGS *ar, uint32_t addr);
 
 void acpi_send_gpe_event(ACPIREGS *ar, qemu_irq irq,
  AcpiEventStatusBits status);
+void acpi_send_sleep_wakeup_event(void);
 
 void acpi_update_sci(ACPIREGS *acpi_regs, qemu_irq irq);
 
diff --git a/include/hw/acpi/acpi_dev_interface.h b/include/hw/acpi/acpi_dev_interface.h
index 68d9d15f50..1cb050cd3a 100644
--- a/include/hw/acpi/acpi_dev_interface.h
+++ b/include/hw/acpi/acpi_dev_interface.h
@@ -13,6 +13,7 @@ typedef enum {
 ACPI_NVDIMM_HOTPLUG_STATUS = 16,
 ACPI_VMGENID_CHANGE_STATUS = 32,
 ACPI_POWER_DOWN_STATUS = 64,
+ACPI_SLEEP_STATUS = 128,
 } AcpiEventStatusBits;
 
 #define TYPE_ACPI_DEVICE_IF "acpi-device-interface"
-- 
2.34.3
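
A small before/after model of the QEMU_WAKEUP_REASON_OTHER case may make the
behavioural change above easier to see. The PM1 status bit positions follow
the ACPI specification (PWRBTN_STS is bit 8, WAK_STS is bit 15); the function
names and the send_event callback are hypothetical and only stand in for the C
code in hw/acpi/core.c.

```python
ACPI_BITMASK_POWER_BUTTON_STATUS = 0x0100  # PM1 PWRBTN_STS, bit 8
ACPI_BITMASK_WAKE_STATUS = 0x8000          # PM1 WAK_STS, bit 15
ACPI_SLEEP_STATUS = 128                    # AcpiEventStatusBits value added by this patch


def wakeup_other_old(sts):
    """Old behaviour: set WAK_STS and pretend the power button caused the resume."""
    return sts | ACPI_BITMASK_WAKE_STATUS | ACPI_BITMASK_POWER_BUTTON_STATUS


def wakeup_other_new(sts, send_event):
    """New behaviour: set WAK_STS only and raise the sleep/wakeup event."""
    sts |= ACPI_BITMASK_WAKE_STATUS
    send_event(ACPI_SLEEP_STATUS)
    return sts


events = []
old_sts = wakeup_other_old(0)
new_sts = wakeup_other_new(0, events.append)
```

After the change the power button status bit stays clear and the guest is
notified through the event path instead.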

[RFC PATCH 05/11] tests/acpi/bios-tables-test: update DSDT tables for Control Method Sleep button

2023-12-04 Thread Annie Li

Update various DSDT tables and empty bios-tables-test-allowed-diff.h

Following steps 5 and 6 in tests/qtest/bios-tables-test.c, the changes
in the tables are:

DSDT:

 /*
  * Intel ACPI Component Architecture
  * AML/ASL+ Disassembler version 20210604 (64-bit version)
  * Copyright (c) 2000 - 2021 Intel Corporation
  *
  * Disassembling to symbolic ASL+ operators
  *
- * Disassembly of tests/data/acpi/pc/DSDT, Mon Dec  4 15:52:25 2023
+ * Disassembly of /tmp/aml-LRMEF2, Mon Dec  4 15:52:25 2023
  *
  * Original Table Header:
  * Signature"DSDT"
- * Length   0x1AAE (6830)
+ * Length   0x1B64 (7012)
  * Revision 0x01  32-bit table (V1), no 64-bit math support
- * Checksum 0x0B
+ * Checksum 0x14
  * OEM ID   "BOCHS "
  * OEM Table ID "BXPC"
  * OEM Revision 0x0001 (1)
  * Compiler ID  "BXPC"
  * Compiler Version 0x0001 (1)
  */
 DefinitionBlock ("", "DSDT", 1, "BOCHS ", "BXPC", 0x0001)
 {
 Scope (\)
 {
 OperationRegion (DBG, SystemIO, 0x0402, One)
 Field (DBG, ByteAcc, NoLock, Preserve)
 {
 DBGB,   8
 }

@@ -488,32 +488,69 @@
 {
 Memory32Fixed (ReadOnly,
 0xFED0, // Address Base
 0x0400, // Address Length
 )
 })
 }
 }

 Scope (_GPE)
 {
 Name (_HID, "ACPI0006" /* GPE Block Device */)  // _HID: Hardware ID
 }

 Scope (_SB)
 {
+Device (\_SB.SLPB)
+{
+Name (_HID, EisaId ("PNP0C0E") /* Sleep Button Device */)  // _HID: Hardware ID
+Name (_PRW, Package (0x02)  // _PRW: Power Resources for Wake
+{
+One,
+0x04
+})
+OperationRegion (\B**, SystemIO, 0x0201, One)
+Field (\B**, ByteAcc, NoLock, WriteAsZeros)
+{
+SBP,1,
+SBW,1
+}
+}
+}
+
+Scope (\_GPE)
+{
+Method (_L07, 0, NotSerialized)  // _Lxx: Level-Triggered GPE, xx=0x00-0xFF
+{
+If (\_SB.SLPB.SBP)
+{
+\_SB.SLPB.SBP = One
+Notify (\_SB.SLPB, 0x80) // Status Change
+}
+
+If (\_SB.SLPB.SBW)
+{
+\_SB.SLPB.SBW = One
+Notify (\_SB.SLPB, 0x02) // Device Wake
+}
+}
+}
+
+Scope (_SB)
+{
 Device (\_SB.PCI0.PRES)
 {
 Name (_HID, EisaId ("PNP0A06") /* Generic Container Device */)  // _HID: Hardware ID
 Name (_UID, "CPU Hotplug resources")  // _UID: Unique ID
 Mutex (CPLK, 0x00)
 Name (_CRS, ResourceTemplate ()  // _CRS: Current Resource Settings
 {
 IO (Decode16,
 0xAF00, // Range Minimum
 0xAF00, // Range Maximum
 0x01,   // Alignment
 0x0C,   // Length
 )
 })
 OperationRegion (PRST, SystemIO, 0xAF00, 0x0C)
 Field (PRST, ByteAcc, NoLock, WriteAsZeros)

Signed-off-by: Annie Li 
---
 tests/data/acpi/pc/DSDT   | Bin 6830 -> 7012 bytes
 tests/data/acpi/pc/DSDT.acpierst  | Bin 6741 -> 6923 bytes
 tests/data/acpi/pc/DSDT.acpihmat  | Bin 8155 -> 8337 bytes
 tests/data/acpi/pc/DSDT.bridge| Bin 13701 -> 13883 bytes
 tests/data/acpi/pc/DSDT.cphp  | Bin 7294 -> 7476 bytes
 tests/data/acpi/pc/DSDT.dimmpxm   | Bin 8484 -> 8666 bytes
 tests/data/acpi/pc/DSDT.hpbridge  | Bin 6781 -> 6963 bytes
 tests/data/acpi/pc/DSDT.hpbrroot  | Bin 3337 -> 3519 bytes
 tests/data/acpi/pc/DSDT.ipmikcs   | Bin 6902 -> 7084 bytes
 tests/data/acpi/pc/DSDT.memhp | Bin 8189 -> 8371 bytes
 tests/data/acpi/pc/DSDT.nohpet| Bin 6688 -> 6870 bytes
 tests/data/acpi/pc/DSDT.numamem   | Bin 6836 -> 7018 bytes
 tests/data/acpi/pc/DSDT.roothp| Bin 10623 -> 10805 bytes
 tests/data/acpi/q35/DSDT  | Bin 8355 -> 8537 bytes
 tests/data/acpi/q35/DSDT.acpierst | Bin 8372 -> 8554 bytes
 tests/data/acpi/q35/DSDT.acpihmat | Bin 9680 -> 9862 bytes
 tests/data/acpi/q35/DSDT.acpihmat-noinitiator | Bin 8634 -> 8816 bytes
 tests/data/acpi/q35/DSDT.applesmc | Bin 8401 -> 8583 bytes
 tests/data/acpi/q35/DSDT.bridge   | Bin 11968 -> 12150 bytes
 tests/data/acpi/q35/DSDT.core-count   | Bin 12913 -> 13095 bytes
 tests/data/acpi/q35/DSDT.core-count2  | Bin 33770 -> 33952 bytes
 tests/data/acpi/q35/DSDT.cphp | Bin 8819 -> 9001 bytes
 tests/data/acpi/q35/DSDT.cxl

[RFC PATCH 04/11] acpi: Support Control Method sleep button for x86

2023-12-04 Thread Annie Li

Add Control Method Sleep button and its GPE event handler for
x86.

Signed-off-by: Annie Li 
---
 hw/i386/acpi-build.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index 80db183b78..75985e1423 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -40,6 +40,7 @@
 #include "hw/acpi/acpi_aml_interface.h"
 #include "hw/input/i8042.h"
 #include "hw/acpi/memory_hotplug.h"
+#include "hw/acpi/control_method_device.h"
 #include "sysemu/tpm.h"
 #include "hw/acpi/tpm.h"
 #include "hw/acpi/vmgenid.h"
@@ -1537,6 +1538,14 @@ build_dsdt(GArray *table_data, BIOSLinker *linker,
 }
 aml_append(dsdt, scope);
 
+sb_scope = aml_scope("_SB");
+acpi_dsdt_add_sleep_button(sb_scope);
+aml_append(dsdt, sb_scope);
+
+scope = aml_scope("\\_GPE");
+acpi_dsdt_add_sleep_gpe_event_handler(scope);
+aml_append(dsdt, scope);
+
 if (pcmc->legacy_cpu_hotplug) {
 build_legacy_cpu_hotplug_aml(dsdt, machine, pm->cpu_hp_io_base);
 } else {
-- 
2.34.3

[RFC PATCH 02/11] acpi: Implement control method sleep button

2023-12-04 Thread Annie Li

The control method sleep button is added, as well as its GPE event
handler.

Co-Developed-by: Miguel Luis 
Signed-off-by: Annie Li 
---
 hw/acpi/control_method_device.c | 49 +
 hw/acpi/meson.build |  1 +
 include/hw/acpi/control_method_device.h | 20 ++
 3 files changed, 70 insertions(+)

diff --git a/hw/acpi/control_method_device.c b/hw/acpi/control_method_device.c
new file mode 100644
index 00..9e4841b8e2
--- /dev/null
+++ b/hw/acpi/control_method_device.c
@@ -0,0 +1,49 @@
+/*
+ * Control method devices
+ *
+ * Copyright (C) 2023 Oracle and/or its affiliates.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+#include "qemu/osdep.h"
+#include "hw/acpi/control_method_device.h"
+#include "hw/mem/nvdimm.h"
+
+void acpi_dsdt_add_sleep_button(Aml *scope)
+{
+Aml *dev = aml_device("\\_SB."ACPI_SLEEP_BUTTON_DEVICE);
+aml_append(dev, aml_name_decl("_HID", aml_eisaid("PNP0C0E")));
+Aml *pkg = aml_package(2);
+aml_append(pkg, aml_int(0x01));
+aml_append(pkg, aml_int(0x04));
+aml_append(dev, aml_name_decl("_PRW", pkg));
+aml_append(dev, aml_operation_region("\\Boo", AML_SYSTEM_IO,
+ aml_int(0x201), 0x1));
+Aml *field = aml_field("\\Boo", AML_BYTE_ACC, AML_NOLOCK,
+   AML_WRITE_AS_ZEROS);
+aml_append(field, aml_named_field("SBP", 1));
+aml_append(field, aml_named_field("SBW", 1));
+aml_append(dev, field);
+aml_append(scope, dev);
+}
+
+void acpi_dsdt_add_sleep_gpe_event_handler(Aml *scope)
+{
+ Aml *method = aml_method("_L07", 0, AML_NOTSERIALIZED);
+ Aml *condition = aml_if(aml_name("\\_SB.SLPB.SBP"));
+ aml_append(condition, aml_store(aml_int(1), aml_name("\\_SB.SLPB.SBP")));
+ aml_append(condition,
+aml_notify(aml_name("\\_SB."ACPI_SLEEP_BUTTON_DEVICE),
+aml_int(0x80)));
+ aml_append(method, condition);
+ condition = aml_if(aml_name("\\_SB.SLPB.SBW"));
+ aml_append(condition, aml_store(aml_int(1), aml_name("\\_SB.SLPB.SBW")));
+ aml_append(condition,
+aml_notify(aml_name("\\_SB."ACPI_SLEEP_BUTTON_DEVICE),
+aml_int(0x2)));
+ aml_append(method, condition);
+ aml_append(scope, method);
+}
diff --git a/hw/acpi/meson.build b/hw/acpi/meson.build
index fc1b952379..486d28cf42 100644
--- a/hw/acpi/meson.build
+++ b/hw/acpi/meson.build
@@ -16,6 +16,7 @@ acpi_ss.add(when: 'CONFIG_ACPI_PCI', if_true: files('pci.c'))
 acpi_ss.add(when: 'CONFIG_ACPI_CXL', if_true: files('cxl.c'), if_false: files('cxl-stub.c'))
 acpi_ss.add(when: 'CONFIG_ACPI_VMGENID', if_true: files('vmgenid.c'))
 acpi_ss.add(when: 'CONFIG_ACPI_HW_REDUCED', if_true: files('generic_event_device.c'))
+acpi_ss.add(when: 'CONFIG_ACPI_HW_REDUCED', if_true: files('control_method_device.c'))
 acpi_ss.add(when: 'CONFIG_ACPI_HMAT', if_true: files('hmat.c'))
 acpi_ss.add(when: 'CONFIG_ACPI_APEI', if_true: files('ghes.c'), if_false: files('ghes-stub.c'))
 acpi_ss.add(when: 'CONFIG_ACPI_PIIX4', if_true: files('piix4.c'))
diff --git a/include/hw/acpi/control_method_device.h 
b/include/hw/acpi/control_method_device.h
new file mode 100644
index 00..bce20512c4
--- /dev/null
+++ b/include/hw/acpi/control_method_device.h
@@ -0,0 +1,20 @@
+/*
+ * Control method devices
+ *
+ * Copyright (C) 2023 Oracle and/or its affiliates.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+
+#ifndef HW_ACPI_CONTROL_METHOD_DEVICE_H
+#define HW_ACPI_CONTROL_METHOD_DEVICE_H
+
+#define ACPI_SLEEP_BUTTON_DEVICE "SLPB"
+
+void acpi_dsdt_add_sleep_button(Aml *scope);
+void acpi_dsdt_add_sleep_gpe_event_handler(Aml *scope);
+
+#endif
-- 
2.34.3

[RFC PATCH 01/11] acpi: hmp/qmp: Add hmp/qmp support for system_sleep

2023-12-04 Thread Annie Li

The following hmp/qmp commands are implemented for pressing the virtual
sleep button:

hmp: system_sleep
qmp: { "execute": "system_sleep" }

These commands put the guest into suspend or other power states
depending on the power settings inside the guest.

Signed-off-by: Annie Li 
---
 hmp-commands.hx| 14 ++
 hw/core/machine-hmp-cmds.c |  5 +
 hw/core/machine-qmp-cmds.c |  9 +
 include/monitor/hmp.h  |  1 +
 qapi/machine.json  | 18 ++
 qapi/pragma.json   |  1 +
 6 files changed, 48 insertions(+)

diff --git a/hmp-commands.hx b/hmp-commands.hx
index 765349ed14..bd01e49ec5 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -652,6 +652,20 @@ SRST
   whether profiling is on or off.
 ERST
 
+{
+.name   = "system_sleep",
+.args_type  = "",
+.params = "",
+.help   = "send ACPI sleep event",
+.cmd = hmp_system_sleep,
+},
+
+SRST
+``system_sleep``
+  Push the virtual sleep button; if supported the system will enter
+  an ACPI sleep state.
+ERST
+
 {
 .name   = "system_reset",
 .args_type  = "",
diff --git a/hw/core/machine-hmp-cmds.c b/hw/core/machine-hmp-cmds.c
index a6ff6a4875..641a365e3e 100644
--- a/hw/core/machine-hmp-cmds.c
+++ b/hw/core/machine-hmp-cmds.c
@@ -185,6 +185,11 @@ void hmp_system_reset(Monitor *mon, const QDict *qdict)
 qmp_system_reset(NULL);
 }
 
+void hmp_system_sleep(Monitor *mon, const QDict *qdict)
+{
+qmp_system_sleep(NULL);
+}
+
 void hmp_system_powerdown(Monitor *mon, const QDict *qdict)
 {
 qmp_system_powerdown(NULL);
diff --git a/hw/core/machine-qmp-cmds.c b/hw/core/machine-qmp-cmds.c
index 3860a50c3b..9f1e636c90 100644
--- a/hw/core/machine-qmp-cmds.c
+++ b/hw/core/machine-qmp-cmds.c
@@ -257,6 +257,15 @@ void qmp_system_reset(Error **errp)
 qemu_system_reset_request(SHUTDOWN_CAUSE_HOST_QMP_SYSTEM_RESET);
 }
 
+void qmp_system_sleep(Error **errp)
+{
+if (!qemu_wakeup_suspend_enabled()) {
+error_setg(errp,
+   "suspend from running is not supported by this guest");
+return;
+}
+}
+
 void qmp_system_powerdown(Error **errp)
 {
 qemu_system_powerdown_request();
diff --git a/include/monitor/hmp.h b/include/monitor/hmp.h
index 13f9a2dedb..d72a3b775c 100644
--- a/include/monitor/hmp.h
+++ b/include/monitor/hmp.h
@@ -45,6 +45,7 @@ void hmp_quit(Monitor *mon, const QDict *qdict);
 void hmp_stop(Monitor *mon, const QDict *qdict);
 void hmp_sync_profile(Monitor *mon, const QDict *qdict);
 void hmp_system_reset(Monitor *mon, const QDict *qdict);
+void hmp_system_sleep(Monitor *mon, const QDict *qdict);
 void hmp_system_powerdown(Monitor *mon, const QDict *qdict);
 void hmp_exit_preconfig(Monitor *mon, const QDict *qdict);
 void hmp_announce_self(Monitor *mon, const QDict *qdict);
diff --git a/qapi/machine.json b/qapi/machine.json
index b6d634b30d..3ac69df92f 100644
--- a/qapi/machine.json
+++ b/qapi/machine.json
@@ -297,6 +297,24 @@
 ##
 { 'command': 'system_reset' }
 
+##
+# @system_sleep:
+#
+# Requests that a guest perform an ACPI sleep transition by pushing a virtual
+# sleep button.
+#
+# Notes: A guest may or may not respond to this command. This command
+#returning does not indicate that a guest has accepted the request
+#or that it has gone to sleep.
+#
+# Example:
+#
+# -> { "execute": "system_sleep" }
+# <- { "return": {} }
+#
+##
+{ 'command': 'system_sleep' }
+
 ##
 # @system_powerdown:
 #
diff --git a/qapi/pragma.json b/qapi/pragma.json
index 0aa4eeddd3..ef15229854 100644
--- a/qapi/pragma.json
+++ b/qapi/pragma.json
@@ -23,6 +23,7 @@
 'set_password',
 'system_powerdown',
 'system_reset',
+'system_sleep',
 'system_wakeup' ],
 # Commands allowed to return a non-dictionary
 'command-returns-exceptions': [
-- 
2.34.3

[RFC PATCH 00/11] Support ACPI Control Method Sleep button

2023-12-04 Thread Annie Li

The ACPI sleep button can be implemented as a fixed hardware button
or Control Method Sleep button.

A patch implementing a fixed hardware sleep button was posted at 1),
and further discussion can be found at 2). The discussion mainly
concerns whether the sleep button should be implemented as a fixed
hardware button or as a Control Method Sleep button. The latter
benefits multiple architectures, since the code can be shared among
them.

This patch set implements the Control Method Sleep button for both x86
and ARM platforms.

For x86, a sleep button GPE event handler is implemented, so a GPE
event is triggered to tell the OSPM that the sleep button has been
pressed. Tests were run on Linux 6.6.0-rc2+ and Windows Server 2016,
and the sleep button works as expected.

For ARM, a GED event is triggered to notify the OSPM. With proper
debug knobs it is possible to see the guest OSPM acknowledge the
sleep event:

[ 268.429495] evregion-0119 ev_address_space_dispa: Entry
[ 268.430480] evrgnini-0043 ev_system_memory_regio: Entry
[ 268.431423] evrgnini-0079 ev_system_memory_regio: Exit- AE_OK
[ 268.432303] evregion-0230 ev_address_space_dispa: Handler 81544775 (@e8f0a66d) Address 0908 [SystemMemory]
[ 268.433943] evregion-0325 ev_address_space_dispa: Exit- AE_OK
[ 268.434793]   evmisc-0132 ev_queue_notify_reques: Dispatching Notify on [SLPB] (Device) Value 0x80 (Status Change) Node ada658b8

But that seems to be all, which suggests that sleep/wakeup for ARM is
broken and that some pieces of the puzzle are still missing.

Nonetheless, we would like to take this RFC as an opportunity to gather
updates on this subject and to discuss possible roadmaps.

1) https://lists.gnu.org/archive/html/qemu-devel/2017-07/msg06478.html
2) https://lore.kernel.org/all/20210920095316.2dd13...@redhat.com/T/#mfe24f89778020deeacfe45083f3eea3cf9f55961

Annie Li (6):
  acpi: hmp/qmp: Add hmp/qmp support for system_sleep
  acpi: Implement control method sleep button
  test/acpi: allow DSDT table changes
  acpi: Support Control Method sleep button for x86
  tests/acpi/bios-tables-test: update DSDT tables for Control Method
Sleep button
  acpi: Send the GPE event of suspend and wakeup for x86

Miguel Luis (5):
  hw/acpi: Add ACPI GED support for the sleep event
  tests/acpi: allow FACP and DSDT table changes for arm/virt
  hw/arm: enable sleep support for arm/virt
  tests/acpi: Update FACP and DSDT tables for sleep button
  arm/virt: enable sleep support

 hmp-commands.hx   |  14 +
 hw/acpi/control_method_device.c   |  49 ++
 hw/acpi/core.c|  17 --
 hw/acpi/generic_event_device.c|   9 
 hw/acpi/meson.build   |   1 +
 hw/arm/virt-acpi-build.c  |  13 +
 hw/arm/virt.c |  14 -
 hw/core/machine-hmp-cmds.c|   5 ++
 hw/core/machine-qmp-cmds.c|  11 
 hw/i386/acpi-build.c  |   9 
 include/hw/acpi/acpi.h|   1 +
 include/hw/acpi/acpi_dev_interface.h  |   1 +
 include/hw/acpi/control_method_device.h   |  20 +++
 include/hw/acpi/generic_event_device.h|   1 +
 include/hw/arm/virt.h |   1 +
 include/monitor/hmp.h |   1 +
 qapi/machine.json |  18 +++
 qapi/pragma.json  |   1 +
 tests/data/acpi/pc/DSDT   | Bin 6830 -> 7012 bytes
 tests/data/acpi/pc/DSDT.acpierst  | Bin 6741 -> 6923 bytes
 tests/data/acpi/pc/DSDT.acpihmat  | Bin 8155 -> 8337 bytes
 tests/data/acpi/pc/DSDT.bridge| Bin 13701 -> 13883 bytes
 tests/data/acpi/pc/DSDT.cphp  | Bin 7294 -> 7476 bytes
 tests/data/acpi/pc/DSDT.dimmpxm   | Bin 8484 -> 8666 bytes
 tests/data/acpi/pc/DSDT.hpbridge  | Bin 6781 -> 6963 bytes
 tests/data/acpi/pc/DSDT.hpbrroot  | Bin 3337 -> 3519 bytes
 tests/data/acpi/pc/DSDT.ipmikcs   | Bin 6902 -> 7084 bytes
 tests/data/acpi/pc/DSDT.memhp | Bin 8189 -> 8371 bytes
 tests/data/acpi/pc/DSDT.nohpet| Bin 6688 -> 6870 bytes
 tests/data/acpi/pc/DSDT.numamem   | Bin 6836 -> 7018 bytes
 tests/data/acpi/pc/DSDT.roothp| Bin 10623 -> 10805 bytes
 tests/data/acpi/q35/DSDT  | Bin 8355 -> 8537 bytes
 tests/data/acpi/q35/DSDT.acpierst | Bin 8372 -> 8554 bytes
 tests/data/acpi/q35/DSDT.acpihmat | Bin 9680 -> 9862 bytes
 tests/data/acpi/q35/DSDT.acpihmat-noinitiator | Bin 8634 -> 8816 bytes
 tests/data/acpi/q35/DSDT.applesmc | Bin 8401 -> 8583 bytes
 tests/data/acpi/q35/DSDT.bridge   | Bin 11968 -> 12150 bytes
 tests/data/acpi/q35/DSDT.core-count   | Bin 12913 -> 13095 bytes
 tests

[PATCH v3 1/2] vhost: Add worker backend callouts

2023-12-04 Thread Mike Christie

This adds the vhost backend callouts for the worker ioctls added in the
6.4 linux kernel commit:

c1ecd8e95007 ("vhost: allow userspace to create workers")

Signed-off-by: Mike Christie 
Reviewed-by: Stefano Garzarella 
Reviewed-by: Stefan Hajnoczi 

---
 hw/virtio/vhost-backend.c | 28 
 include/hw/virtio/vhost-backend.h | 14 ++
 2 files changed, 42 insertions(+)

diff --git a/hw/virtio/vhost-backend.c b/hw/virtio/vhost-backend.c
index 17f3fc6a0823..833804dd40f2 100644
--- a/hw/virtio/vhost-backend.c
+++ b/hw/virtio/vhost-backend.c
@@ -158,6 +158,30 @@ static int vhost_kernel_set_vring_busyloop_timeout(struct vhost_dev *dev,
 return vhost_kernel_call(dev, VHOST_SET_VRING_BUSYLOOP_TIMEOUT, s);
 }
 
+static int vhost_kernel_new_worker(struct vhost_dev *dev,
+   struct vhost_worker_state *worker)
+{
+return vhost_kernel_call(dev, VHOST_NEW_WORKER, worker);
+}
+
+static int vhost_kernel_free_worker(struct vhost_dev *dev,
+struct vhost_worker_state *worker)
+{
+return vhost_kernel_call(dev, VHOST_FREE_WORKER, worker);
+}
+
+static int vhost_kernel_attach_vring_worker(struct vhost_dev *dev,
+struct vhost_vring_worker *worker)
+{
+return vhost_kernel_call(dev, VHOST_ATTACH_VRING_WORKER, worker);
+}
+
+static int vhost_kernel_get_vring_worker(struct vhost_dev *dev,
+ struct vhost_vring_worker *worker)
+{
+return vhost_kernel_call(dev, VHOST_GET_VRING_WORKER, worker);
+}
+
 static int vhost_kernel_set_features(struct vhost_dev *dev,
  uint64_t features)
 {
@@ -313,6 +337,10 @@ const VhostOps kernel_ops = {
 .vhost_set_vring_err = vhost_kernel_set_vring_err,
 .vhost_set_vring_busyloop_timeout =
 vhost_kernel_set_vring_busyloop_timeout,
+.vhost_get_vring_worker = vhost_kernel_get_vring_worker,
+.vhost_attach_vring_worker = vhost_kernel_attach_vring_worker,
+.vhost_new_worker = vhost_kernel_new_worker,
+.vhost_free_worker = vhost_kernel_free_worker,
 .vhost_set_features = vhost_kernel_set_features,
 .vhost_get_features = vhost_kernel_get_features,
 .vhost_set_backend_cap = vhost_kernel_set_backend_cap,
diff --git a/include/hw/virtio/vhost-backend.h b/include/hw/virtio/vhost-backend.h
index a86d103f8245..70c2e8ffeee5 100644
--- a/include/hw/virtio/vhost-backend.h
+++ b/include/hw/virtio/vhost-backend.h
@@ -45,6 +45,8 @@ struct vhost_memory;
 struct vhost_vring_file;
 struct vhost_vring_state;
 struct vhost_vring_addr;
+struct vhost_vring_worker;
+struct vhost_worker_state;
 struct vhost_scsi_target;
 struct vhost_iotlb_msg;
 struct vhost_virtqueue;
@@ -85,6 +87,14 @@ typedef int (*vhost_set_vring_err_op)(struct vhost_dev *dev,
   struct vhost_vring_file *file);
 typedef int (*vhost_set_vring_busyloop_timeout_op)(struct vhost_dev *dev,
struct vhost_vring_state 
*r);
+typedef int (*vhost_attach_vring_worker_op)(struct vhost_dev *dev,
+struct vhost_vring_worker *worker);
+typedef int (*vhost_get_vring_worker_op)(struct vhost_dev *dev,
+ struct vhost_vring_worker *worker);
+typedef int (*vhost_new_worker_op)(struct vhost_dev *dev,
+   struct vhost_worker_state *worker);
+typedef int (*vhost_free_worker_op)(struct vhost_dev *dev,
+struct vhost_worker_state *worker);
 typedef int (*vhost_set_features_op)(struct vhost_dev *dev,
  uint64_t features);
 typedef int (*vhost_get_features_op)(struct vhost_dev *dev,
@@ -172,6 +182,10 @@ typedef struct VhostOps {
 vhost_set_vring_call_op vhost_set_vring_call;
 vhost_set_vring_err_op vhost_set_vring_err;
 vhost_set_vring_busyloop_timeout_op vhost_set_vring_busyloop_timeout;
+vhost_new_worker_op vhost_new_worker;
+vhost_free_worker_op vhost_free_worker;
+vhost_get_vring_worker_op vhost_get_vring_worker;
+vhost_attach_vring_worker_op vhost_attach_vring_worker;
 vhost_set_features_op vhost_set_features;
 vhost_get_features_op vhost_get_features;
 vhost_set_backend_cap_op vhost_set_backend_cap;
-- 
2.34.1

[PATCH v3 0/2] vhost-scsi: Support worker ioctls

2023-12-04 Thread Mike Christie

The following patches allow users to configure the vhost worker threads
for vhost-scsi. With vhost-net we get a worker thread per rx/tx virtqueue
pair, but for vhost-scsi we get one worker for all virtqueues. This
becomes a bottleneck after 2 queues are used.

In the upstream linux kernel commit:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/drivers/vhost/vhost.c?id=c1ecd8e9500797748ae4f79657971955d452d69d

we enabled the vhost layer to be able to create a worker thread and
attach it to a virtqueue.

This patchset adds support to vhost-scsi to use these ioctls so we are
no longer limited to the single worker.

v3:
- Warn user if they have set worker_per_virtqueue=true but the kernel
doesn't support it.
v2:
- Make config option a bool instead of an int.
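
The resulting virtqueue-to-worker mapping can be sketched as below. This
is only an illustration under stated assumptions, not code from the
series: FIXED_VQS, DEFAULT_WORKER and assign_workers() are hypothetical
names, and the value 2 for the fixed (ctl/evt) queues is assumed. It
mirrors the loop in patch 2/2, which starts at the fixed count plus one,
so ctl, evt and the first IO queue stay on the default worker.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical constants for illustration; not taken from QEMU headers. */
#define FIXED_VQS 2        /* assumed: ctl and evt virtqueues */
#define DEFAULT_WORKER 0   /* the worker every queue starts on */

/*
 * Fill worker_of[i] with the id of the worker serving virtqueue i.
 * Returns the total number of workers in use.
 */
static int assign_workers(int nvqs, bool per_virtqueue, int *worker_of)
{
    int next_worker = DEFAULT_WORKER + 1;

    assert(nvqs > FIXED_VQS);

    for (int i = 0; i < nvqs; i++) {
        /* ctl, evt and the first IO queue keep the default worker. */
        if (!per_virtqueue || i <= FIXED_VQS) {
            worker_of[i] = DEFAULT_WORKER;
        } else {
            /* Every further IO queue gets a worker of its own. */
            worker_of[i] = next_worker++;
        }
    }
    return next_worker;
}
```

With six virtqueues and worker_per_virtqueue=true, this model ends up
with four workers: one shared by queues 0..2 and one each for queues 3,
4 and 5.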

[PATCH v3 2/2] vhost-scsi: Add support for a worker thread per virtqueue

2023-12-04 Thread Mike Christie

This adds support for vhost-scsi to be able to create a worker thread
per virtqueue. Right now for vhost-net we get a worker thread per
tx/rx virtqueue pair which scales nicely as we add more virtqueues and
CPUs, but for scsi we get the single worker thread that's shared by all
virtqueues. When trying to send IO to more than 2 virtqueues the single
thread becomes a bottleneck.

This patch adds a new setting, worker_per_virtqueue, which can be set
to:

false: Existing behavior where we get the single worker thread.
true: Create a worker per IO virtqueue.

Signed-off-by: Mike Christie 
Reviewed-by: Stefan Hajnoczi 

---
 hw/scsi/vhost-scsi.c| 62 +
 include/hw/virtio/virtio-scsi.h |  1 +
 2 files changed, 63 insertions(+)

diff --git a/hw/scsi/vhost-scsi.c b/hw/scsi/vhost-scsi.c
index 3126df9e1d9d..08aa7534df51 100644
--- a/hw/scsi/vhost-scsi.c
+++ b/hw/scsi/vhost-scsi.c
@@ -165,6 +165,59 @@ static const VMStateDescription vmstate_virtio_vhost_scsi = {
 .pre_save = vhost_scsi_pre_save,
 };
 
+static int vhost_scsi_set_workers(VHostSCSICommon *vsc, bool per_virtqueue)
+{
+struct vhost_dev *dev = &vsc->dev;
+struct vhost_vring_worker vq_worker;
+struct vhost_worker_state worker;
+int i, ret;
+
+/* Use default worker */
+if (!per_virtqueue || dev->nvqs == VHOST_SCSI_VQ_NUM_FIXED + 1) {
+return 0;
+}
+
+/*
+ * ctl/evt share the first worker since it will be rare for them
+ * to send cmds while IO is running.
+ */
+for (i = VHOST_SCSI_VQ_NUM_FIXED + 1; i < dev->nvqs; i++) {
+memset(&worker, 0, sizeof(worker));
+
+ret = dev->vhost_ops->vhost_new_worker(dev, &worker);
+if (ret == -ENOTTY) {
+/*
+ * worker ioctls are not implemented so just ignore and
+ * continue device setup.
+ */
+warn_report("vhost-scsi: Backend supports a single worker. "
+"Ignoring worker_per_virtqueue=true setting.");
+ret = 0;
+break;
+} else if (ret) {
+break;
+}
+
+memset(&vq_worker, 0, sizeof(vq_worker));
+vq_worker.worker_id = worker.worker_id;
+vq_worker.index = i;
+
+ret = dev->vhost_ops->vhost_attach_vring_worker(dev, &vq_worker);
+if (ret == -ENOTTY) {
+/*
+ * It's a bug for the kernel to have supported the worker creation
+ * ioctl but not attach.
+ */
+dev->vhost_ops->vhost_free_worker(dev, &worker);
+break;
+} else if (ret) {
+break;
+}
+}
+
+return ret;
+}
+
 static void vhost_scsi_realize(DeviceState *dev, Error **errp)
 {
 VirtIOSCSICommon *vs = VIRTIO_SCSI_COMMON(dev);
@@ -232,6 +285,13 @@ static void vhost_scsi_realize(DeviceState *dev, Error **errp)
 goto free_vqs;
 }
 
+ret = vhost_scsi_set_workers(vsc, vs->conf.worker_per_virtqueue);
+if (ret < 0) {
+error_setg(errp, "vhost-scsi: vhost worker setup failed: %s",
+   strerror(-ret));
+goto free_vqs;
+}
+
 /* At present, channel and lun both are 0 for bootable vhost-scsi disk */
 vsc->channel = 0;
 vsc->lun = 0;
@@ -297,6 +357,8 @@ static Property vhost_scsi_properties[] = {
  VIRTIO_SCSI_F_T10_PI,
  false),
 DEFINE_PROP_BOOL("migratable", VHostSCSICommon, migratable, false),
+DEFINE_PROP_BOOL("worker_per_virtqueue", VirtIOSCSICommon,
+ conf.worker_per_virtqueue, false),
 DEFINE_PROP_END_OF_LIST(),
 };
 
diff --git a/include/hw/virtio/virtio-scsi.h b/include/hw/virtio/virtio-scsi.h
index 779568ab5d28..0e9a1867665e 100644
--- a/include/hw/virtio/virtio-scsi.h
+++ b/include/hw/virtio/virtio-scsi.h
@@ -51,6 +51,7 @@ typedef struct virtio_scsi_config VirtIOSCSIConfig;
 struct VirtIOSCSIConf {
 uint32_t num_queues;
 uint32_t virtqueue_size;
+bool worker_per_virtqueue;
 bool seg_max_adjust;
 uint32_t max_sectors;
 uint32_t cmd_per_lun;
-- 
2.34.1

Re: [RFC PATCH-for-8.2?] accel/tcg: Implement tcg_unregister_thread()

2023-12-04 Thread Richard Henderson


On 12/4/23 12:09, Michal Suchánek wrote:
> On Mon, Dec 04, 2023 at 02:50:17PM -0500, Stefan Hajnoczi wrote:
>> On Mon, 4 Dec 2023 at 14:40, Philippe Mathieu-Daudé  wrote:
>>> +void tcg_unregister_thread(void)
>>> +{
>>> +    unsigned int n;
>>> +
>>> +    n = qatomic_fetch_dec(&tcg_cur_ctxs);
>>> +    g_free(tcg_ctxs[n]);
>>> +    qatomic_set(&tcg_ctxs[n], NULL);
>>> +}
>>
>> tcg_ctxs[n] may not be our context, so this looks like it could free
>> another thread's context and lead to undefined behavior.

Correct.

> There is cpu->thread_id so perhaps cpu->thread_ctx could be added as
> well. That would require a bitmap of used threads contexts rather than a
> counter, though.


Or don't free the context at all, but re-use it when incrementing and tcg_ctxs[n] != null 
(i.e. plugging in a replacement vcpu).  After all, there can only be tcg_max_ctxs contexts.



r~
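
A minimal single-threaded sketch of the re-use idea, combined with the
per-slot tracking suggested earlier in the thread. This is not the TCG
code: Ctx, MAX_CTXS, ctx_register() and ctx_unregister() are hypothetical
stand-ins, and real code would need atomics or a lock around the slot
table.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define MAX_CTXS 4   /* hypothetical stand-in for tcg_max_ctxs */

typedef struct Ctx {
    int id;          /* index of the slot this context lives in */
} Ctx;

static Ctx *ctxs[MAX_CTXS];     /* allocated on first use, never freed */
static bool in_use[MAX_CTXS];   /* which slots belong to a live thread */

/* Claim a free slot, re-using an already allocated context if present. */
static Ctx *ctx_register(void)
{
    for (size_t n = 0; n < MAX_CTXS; n++) {
        if (in_use[n]) {
            continue;
        }
        if (ctxs[n] == NULL) {
            ctxs[n] = calloc(1, sizeof(Ctx));
            if (ctxs[n] == NULL) {
                return NULL;    /* allocation failed */
            }
            ctxs[n]->id = (int)n;
        }
        in_use[n] = true;
        return ctxs[n];
    }
    return NULL;                /* every slot is busy */
}

/* Release exactly the caller's slot; the context stays allocated. */
static void ctx_unregister(Ctx *c)
{
    assert(c != NULL && in_use[c->id]);
    in_use[c->id] = false;
}
```

Because a thread releases its own slot rather than the slot at the
current counter value, unregistering can never touch another thread's
context, and a replacement vcpu picks up the idle context instead of
allocating a new one.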

Re: [PATCH V6 13/14] tests/qtest: bootfile per vm

2023-12-04 Thread Peter Xu

On Mon, Dec 04, 2023 at 06:13:36PM -0300, Fabiano Rosas wrote:
> Steve Sistare  writes:
> 
> > Create a separate bootfile for the outgoing and incoming vm, so the block
> > layer can lock the file during the background migration test.  Otherwise,
> > the test fails with:
> >   "Failed to get "write" lock.  Is another process using the image
> >[/tmp/migration-test-WAKPD2/bootsect]?"
> 
> Hm.. what is the background migration even trying to access on the boot
> disk? @Peter?

I didn't yet notice this patch until you asked, but background snapshot is
not designed to be used like this, afaict.

It should normally be used when someone would like to use "savevm", then
background snapshot makes that snapshot save happen with VM running (live)
and mostly as performant as "savevm" due to page write protections (IOW, it
is not dirty tracking, but wr-protect each page so not writable at all
until unprotected).

Another difference (from "savevm") is, instead of storing that image onto
the block images, it stores that image also separately just like migrating
with "file:" as of now.

When the dest QEMU starts it'll try to grab the image lock already because
it should never run with src running, just like when "loadvm" QEMU doesn't
assume the QEMU that ran "savevm" will be running.

> 
> This might be a good use for the -snapshot option. It should stop any
> attempt to get the write lock. Not a lot of difference, but slightly
> simpler.

We don't yet have a background-snapshot test case.  If we ever need,
that'll need to be done in two steps: start src, save snapshot into file,
start dest, load from snapshot file.  We just shouldn't boot two together.

Now after two years when I re-read the snapshot code a bit, I didn't even
find where QEMU took the disk snapshots.. logically it should be done at
the start of live background snapshot when VM was dumping device states,
something like bdrv_all_can_snapshot() should be needed to make sure all
images support snapshot on its own or it should already fail, and take
snapshots to match the image.

IOW, I don't even think current raw disk would be able to support
background snapshot at all, otherwise if VM is live I don't see a way to
match the image (which is still lively updated by the running VM) to a live
snapshot taken.  Copy the author, Andrey, for this question.

Before that is confirmed, maybe the easiest way is we can go without a
background snapshot test case for suspend vm scenario.

Thanks,

-- 
Peter Xu

Re: [PATCH V6 05/14] migration: propagate suspended runstate

2023-12-04 Thread Steven Sistare

On 12/4/2023 12:24 PM, Peter Xu wrote:
> On Fri, Dec 01, 2023 at 11:23:33AM -0500, Steven Sistare wrote:
>>>> @@ -109,6 +117,7 @@ static int global_state_post_load(void *opaque, int version_id)
>>>>          return -EINVAL;
>>>>      }
>>>>      s->state = r;
>>>> +    vm_set_suspended(s->vm_was_suspended || r == RUN_STATE_SUSPENDED);
>>>
>>> IIUC current vm_was_suspended (based on my read of your patch) was not the
>>> same as a boolean representing "whether VM is suspended", but only a
>>> temporary field to remember that for a VM stop request.  To be explicit, I
>>> didn't see this flag set in qemu_system_suspend() in your previous patch.
>>>
>>> If so, we can already do:
>>>
>>>   vm_set_suspended(s->vm_was_suspended);
>>>
>>> Irrelevant of RUN_STATE_SUSPENDED?
>>
>> We need both terms of the expression.
>>
>> If the vm *is* suspended (RUN_STATE_SUSPENDED), then vm_was_suspended = 
>> false.
>> We call global_state_store prior to vm_stop_force_state, so the incoming
>> side sees s->state = RUN_STATE_SUSPENDED and s->vm_was_suspended = false.
> 
> Right.
> 
>> However, the runstate is RUN_STATE_INMIGRATE.  When incoming finishes by
>> calling vm_start, we need to restore the suspended state.  Thus in 
>> global_state_post_load, we must set vm_was_suspended = true.
> 
> With above, shouldn't global_state_get_runstate() (on dest) fetch SUSPENDED
> already?  Then I think it should call vm_start(SUSPENDED) if to start.

The V6 code does not pass a state to vm_start, and knowledge of vm_was_suspended
is confined to the global_state and cpus functions.  IMO this is a more modular
and robust solution, as multiple sites may call vm_start(), and the right thing
happens.  Look at patch 6.  The changes are minimal because vm_start "just 
works".

> Maybe you're talking about the special case where autostart==false?  We
> used to have this (existing process_incoming_migration_bh()):
> 
> if (!global_state_received() ||
> global_state_get_runstate() == RUN_STATE_RUNNING) {
> if (autostart) {
> vm_start();
> } else {
> runstate_set(RUN_STATE_PAUSED);
> }
> }
> 
> If so maybe I get you, because in the "else" path we do seem to lose the
> SUSPENDED state again, but in that case IMHO we should logically set
> vm_was_suspended only when we "lose" it - we didn't lose it during
> migration, but only until we decided to switch to PAUSED (due to
> autostart==false). IOW, change above to something like:
> 
> state = global_state_get_runstate();
> if (!global_state_received() || runstate_is_alive(state)) {
> if (autostart) {
> vm_start(state);
> } else {
> if (runstate_is_suspended(state)) {
> /* Remember suspended state before setting system to STOPed */
> vm_was_suspended = true;
> }
> runstate_set(RUN_STATE_PAUSED);
> }
> }

This is similar to V5 which tested suspended and fiddled with runstate at
multiple call sites in migration and snapshot.  I believe V6 is cleaner.

> It may or may not have a functional difference even if current patch,
> though.  However maybe clearer to follow vm_was_suspended's strict
> definition.
> 
>>
>> If the vm *was* suspended, but is currently stopped (eg RUN_STATE_PAUSED),
>> then vm_was_suspended = true.  Migration from that state sets
>> vm_was_suspended = s->vm_was_suspended = true in global_state_post_load and 
>> ends with runstate_set(RUN_STATE_PAUSED).
>>
>> I will add a comment here in the code.
>>  
>>>>      return 0;
>>>>  }
>>>> @@ -134,6 +143,7 @@ static const VMStateDescription vmstate_globalstate = {
>>>>      .fields = (VMStateField[]) {
>>>>          VMSTATE_UINT32(size, GlobalState),
>>>>          VMSTATE_BUFFER(runstate, GlobalState),
>>>> +        VMSTATE_BOOL(vm_was_suspended, GlobalState),
>>>>          VMSTATE_END_OF_LIST()
>>>>      },
>>>>  };
>>>
>>> I think this will break migration between old/new, unfortunately.  And
>>> since the global state exist mostly for every VM, all VM setup should be
>>> affected, and over all archs.
>>
>> Thanks, I keep forgetting that my binary tricks are no good here.  However,
>> I have one other trick up my sleeve, which is to store vm_was_running in
>> global_state.runstate[strlen(runstate) + 2].  It is forwards and backwards
>> compatible, since that byte is always 0 in older qemu.  It can be implemented
>> with a few lines of code change confined to global_state.c, versus many 
>> lines 
>> spread across files to do it the conventional way using a compat property and
>> a subsection.  Sound OK?  
> 
> Tricky!  But sounds okay to me.  I think you're inventing some of your own
> way of being compatible, not relying on machine type as a benefit.  If go
> this route please document clearly on the layout and also what it looked
> like in old binaries.
> 
> I think maybe it'll be good to keep using strings, so in the new binaries
> we allow >1 strings, t

Re: [PATCH V6 05/14] migration: propagate suspended runstate

2023-12-04 Thread Peter Xu

On Mon, Dec 04, 2023 at 06:09:16PM -0300, Fabiano Rosas wrote:
> Right, I got your point. I just think we could avoid designing this new
> string format by creating new fields with the extra space:
> 
> typedef struct QEMU_PACKED {
> uint32_t size;
> uint8_t runstate[50];
> uint8_t unused[50];
> RunState state;
> bool received;
> } GlobalState;
> 
> In my mind this works seamlessly, or am I mistaken?

I think what you proposed should indeed work.

Currently it's:

.fields = (VMStateField[]) {
VMSTATE_UINT32(size, GlobalState),
VMSTATE_BUFFER(runstate, GlobalState),
VMSTATE_END_OF_LIST()
},

I had a quick look at vmstate_info_buffer, it mostly only get()/put() those
buffers with its sizeof(), so looks all fine.  For sure in all cases we'd
better test it to verify.

One side note is since we so far use qapi_enum_parse() for the runstate, I
think the "size" is not ever used..

If we do want a split, IMHO we can consider making runstate[] even smaller
to just free up the rest spaces all in one shot:

  typedef struct QEMU_PACKED {
  uint32_t size;
  /*
   * Assuming 16 is good enough to fit all possible runstate strings..
   * This field must be a string ending with '\0'.
   */
  uint8_t runstate[16];
  /* 0x00 when QEMU doesn't support it, or "0"/"1" to reflect its state */
  uint8_t vm_was_suspended[1];
  /*
   * Still free of use space.  Note that we only have 99 bytes for use
   * because the last byte (the 100th byte) must be zero due to legacy
   * reasons, if not it may be set to zero after loaded on dest QEMU. 
   */
  uint8_t unused[82];
  RunState state;
  bool received;
  } GlobalState;

Pairs with something like:

.fields = (VMStateField[]) {
/* Used to be "size" but never used on dest, so always ignored */
VMSTATE_UNUSED(4),
VMSTATE_BUFFER(runstate, GlobalState),
VMSTATE_BUFFER(vm_was_suspended, GlobalState),
/*
 * This is actually all zeros, but just to differentiate from the
 * last byte..
 */
VMSTATE_BUFFER(unused, GlobalState),
/*
 * For historical reasons, the last byte must be 0x00 or it'll be
 * overwritten by old qemu otherwise.
 */
VMSTATE_UNUSED(1),
VMSTATE_END_OF_LIST()
},

> 
> In any case, a oneshot hack might be better than both our suggestions
> because we can just clean it up a couple of releases from now as if
> nothing happened.

It can be forgotten forever, then we keep the code less readable.  If we
have a plan to do that and not so awkward, IMHO we should go directly with
that plan.

Thanks,

-- 
Peter Xu
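
To make the byte accounting of the split layout concrete, here is a
small standalone sketch. It assumes the legacy runstate buffer is 100
bytes on the wire, as the comments in the struct imply; pack_state() and
unpack_suspended() are hypothetical helpers written for illustration and
are not QEMU functions.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum {
    RUNSTATE_LEN  = 16,  /* NUL-terminated runstate string */
    SUSPENDED_LEN = 1,   /* 0x00 if unsupported, else '0' or '1' */
    UNUSED_LEN    = 82,  /* still free for future use */
    TAIL_LEN      = 1,   /* last byte, must stay 0x00 for old QEMU */
    WIRE_LEN      = 100  /* assumed size of the legacy buffer */
};

/* The four pieces must cover the legacy buffer exactly. */
_Static_assert(RUNSTATE_LEN + SUSPENDED_LEN + UNUSED_LEN + TAIL_LEN
               == WIRE_LEN, "layout must stay 100 bytes");

/* Fill the wire buffer the way a new QEMU would send it. */
static void pack_state(uint8_t wire[WIRE_LEN], const char *runstate,
                       int was_suspended)
{
    assert(strlen(runstate) < RUNSTATE_LEN);
    memset(wire, 0, WIRE_LEN);
    memcpy(wire, runstate, strlen(runstate));
    wire[RUNSTATE_LEN] = was_suspended ? '1' : '0';
}

/* Read the extra flag back; an old sender leaves this byte at 0x00. */
static int unpack_suspended(const uint8_t wire[WIRE_LEN])
{
    return wire[RUNSTATE_LEN] == '1';
}
```

An old destination that treats the whole buffer as a C string still
reads the runstate name unchanged, because the string is NUL-terminated
within its first 16 bytes, and the final byte stays zero as the comment
in the field list requires.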

QEMU CentOS Stream 8 x86_64 runner is offline

2023-12-04 Thread Stefan Hajnoczi

Hi,
GitLab is reporting that the CentOS Stream 8 x86_64 runner
(#12198892) has been offline for a few hours.

The machine responds to ping so maybe gitlab-runner just needs to be
restarted?

Thanks,
Stefan


Re: [PATCH V6 10/14] tests/qtest: option to suspend during migration

2023-12-04 Thread Fabiano Rosas

Steve Sistare  writes:

> Add an option to suspend the src in a-b-bootblock.S, which puts the guest
> in S3 state after one round of writing to memory.  The option is enabled by
> poking a 1 into the suspend_me word in the boot block prior to starting the
> src vm.  Generate symbol offsets in a-b-bootblock.h so that the suspend_me
> offset is known.  Generate the bootblock for each test, because suspend_me
> may differ for each.
>
> Signed-off-by: Steve Sistare 
> Acked-by: Peter Xu 

Reviewed-by: Fabiano Rosas

Re: [PATCH V6 14/14] tests/qtest: background migration with suspend

2023-12-04 Thread Fabiano Rosas

Steve Sistare  writes:

> Add a test case to verify that the suspended state is handled correctly by
> a background migration.  The test suspends the src, migrates, then wakes
> the dest.
>
> Signed-off-by: Steve Sistare 

Reviewed-by: Fabiano Rosas

Re: [PATCH V6 13/14] tests/qtest: bootfile per vm

2023-12-04 Thread Fabiano Rosas

Steve Sistare  writes:

> Create a separate bootfile for the outgoing and incoming vm, so the block
> layer can lock the file during the background migration test.  Otherwise,
> the test fails with:
>   "Failed to get "write" lock.  Is another process using the image
>[/tmp/migration-test-WAKPD2/bootsect]?"

Hm.. what is the background migration even trying to access on the boot
disk? @Peter?

This might be a good use for the -snapshot option. It should stop any
attempt to get the write lock. Not a lot of difference, but slightly
simpler.

Re: [PATCH V6 05/14] migration: propagate suspended runstate

2023-12-04 Thread Fabiano Rosas

Peter Xu  writes:

> On Mon, Dec 04, 2023 at 04:31:56PM -0300, Fabiano Rosas wrote:
>> Peter Xu  writes:
>> 
>> > On Fri, Dec 01, 2023 at 11:23:33AM -0500, Steven Sistare wrote:
>> >> >> @@ -109,6 +117,7 @@ static int global_state_post_load(void *opaque, 
>> >> >> int version_id)
>> >> >>  return -EINVAL;
>> >> >>  }
>> >> >>  s->state = r;
>> >> >> +vm_set_suspended(s->vm_was_suspended || r == RUN_STATE_SUSPENDED);
>> >> > 
>> >> > IIUC current vm_was_suspended (based on my read of your patch) was not 
>> >> > the
>> >> > same as a boolean representing "whether VM is suspended", but only a
>> >> > temporary field to remember that for a VM stop request.  To be 
>> >> > explicit, I
>> >> > didn't see this flag set in qemu_system_suspend() in your previous 
>> >> > patch.
>> >> > 
>> >> > If so, we can already do:
>> >> > 
>> >> >   vm_set_suspended(s->vm_was_suspended);
>> >> > 
>> >> > Irrelevant of RUN_STATE_SUSPENDED?
>> >> 
>> >> We need both terms of the expression.
>> >> 
>> >> If the vm *is* suspended (RUN_STATE_SUSPENDED), then vm_was_suspended = 
>> >> false.
>> >> We call global_state_store prior to vm_stop_force_state, so the incoming
>> >> side sees s->state = RUN_STATE_SUSPENDED and s->vm_was_suspended = false.
>> >
>> > Right.
>> >
>> >> However, the runstate is RUN_STATE_INMIGRATE.  When incoming finishes by
>> >> calling vm_start, we need to restore the suspended state.  Thus in 
>> >> global_state_post_load, we must set vm_was_suspended = true.
>> >
>> > With above, shouldn't global_state_get_runstate() (on dest) fetch SUSPENDED
>> > already?  Then I think it should call vm_start(SUSPENDED) if to start.
>> >
>> > Maybe you're talking about the special case where autostart==false?  We
>> > used to have this (existing process_incoming_migration_bh()):
>> >
>> > if (!global_state_received() ||
>> > global_state_get_runstate() == RUN_STATE_RUNNING) {
>> > if (autostart) {
>> > vm_start();
>> > } else {
>> > runstate_set(RUN_STATE_PAUSED);
>> > }
>> > }
>> >
>> > If so maybe I get you, because in the "else" path we do seem to lose the
>> > SUSPENDED state again, but in that case IMHO we should logically set
>> > vm_was_suspended only when we "lose" it - we didn't lose it during
>> > migration, but only until we decided to switch to PAUSED (due to
>> > autostart==false). IOW, change above to something like:
>> >
>> > state = global_state_get_runstate();
>> > if (!global_state_received() || runstate_is_alive(state)) {
>> > if (autostart) {
>> > vm_start(state);
>> > } else {
>> > if (runstate_is_suspended(state)) {
>> > /* Remember suspended state before setting system to 
>> > STOPed */
>> > vm_was_suspended = true;
>> > }
>> > runstate_set(RUN_STATE_PAUSED);
>> > }
>> > }
>> >
>> > It may or may not have a functional difference even if current patch,
>> > though.  However maybe clearer to follow vm_was_suspended's strict
>> > definition.
>> >
>> >> 
>> >> If the vm *was* suspended, but is currently stopped (eg RUN_STATE_PAUSED),
>> >> then vm_was_suspended = true.  Migration from that state sets
>> >> vm_was_suspended = s->vm_was_suspended = true in global_state_post_load 
>> >> and 
>> >> ends with runstate_set(RUN_STATE_PAUSED).
>> >> 
>> >> I will add a comment here in the code.
>> >>  
>> >> >>  return 0;
>> >> >>  }
>> >> >> @@ -134,6 +143,7 @@ static const VMStateDescription 
>> >> >> vmstate_globalstate = {
>> >> >>  .fields = (VMStateField[]) {
>> >> >>  VMSTATE_UINT32(size, GlobalState),
>> >> >>  VMSTATE_BUFFER(runstate, GlobalState),
>> >> >> +VMSTATE_BOOL(vm_was_suspended, GlobalState),
>> >> >>  VMSTATE_END_OF_LIST()
>> >> >>  },
>> >> >>  };
>> >> > 
>> >> > I think this will break migration between old/new, unfortunately.  And
>> >> > since the global state exist mostly for every VM, all VM setup should be
>> >> > affected, and over all archs.
>> >> 
>> >> Thanks, I keep forgetting that my binary tricks are no good here.  
>> >> However,
>> >> I have one other trick up my sleeve, which is to store vm_was_running in
>> >> global_state.runstate[strlen(runstate) + 2].  It is forwards and backwards
>> >> compatible, since that byte is always 0 in older qemu.  It can be 
>> >> implemented
>> >> with a few lines of code change confined to global_state.c, versus many 
>> >> lines 
>> >> spread across files to do it the conventional way using a compat property 
>> >> and
>> >> a subsection.  Sound OK?  
>> >
>> > Tricky!  But sounds okay to me.  I think you're inventing some of your own
>> > way of being compatible, not relying on machine type as a benefit.  If go
>> > this route please document clearly on the layout and also what it looked
>> > like in old binaries.
>> >
>> > I think maybe it'll be good to keep using strings, so in the new binaries

Re: [PATCH V6 11/14] tests/qtest: precopy migration with suspend

2023-12-04 Thread Peter Xu

On Thu, Nov 30, 2023 at 01:37:24PM -0800, Steve Sistare wrote:
> Add a test case to verify that the suspended state is handled correctly
> during live migration precopy.  The test suspends the src, migrates, then
> wakes the dest.
> 
> Signed-off-by: Steve Sistare 
> ---
>  tests/qtest/migration-helpers.c |  3 ++
>  tests/qtest/migration-helpers.h |  2 ++
>  tests/qtest/migration-test.c| 64 
> ++---
>  3 files changed, 65 insertions(+), 4 deletions(-)
> 
> diff --git a/tests/qtest/migration-helpers.c b/tests/qtest/migration-helpers.c
> index fd3b94e..37e8e81 100644
> --- a/tests/qtest/migration-helpers.c
> +++ b/tests/qtest/migration-helpers.c
> @@ -32,6 +32,9 @@ bool migrate_watch_for_events(QTestState *who, const char 
> *name,
>  if (g_str_equal(name, "STOP")) {
>  state->stop_seen = true;
>  return true;
> +} else if (g_str_equal(name, "SUSPEND")) {
> +state->suspend_seen = true;
> +return true;
>  } else if (g_str_equal(name, "RESUME")) {
>  state->resume_seen = true;
>  return true;
> diff --git a/tests/qtest/migration-helpers.h b/tests/qtest/migration-helpers.h
> index 3d32699..b478549 100644
> --- a/tests/qtest/migration-helpers.h
> +++ b/tests/qtest/migration-helpers.h
> @@ -18,6 +18,8 @@
>  typedef struct QTestMigrationState {
>  bool stop_seen;
>  bool resume_seen;
> +bool suspend_seen;
> +bool suspend_me;
>  } QTestMigrationState;
>  
>  bool migrate_watch_for_events(QTestState *who, const char *name,
> diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
> index e10d5a4..200f023 100644
> --- a/tests/qtest/migration-test.c
> +++ b/tests/qtest/migration-test.c
> @@ -178,7 +178,7 @@ static void bootfile_delete(void)
>  /*
>   * Wait for some output in the serial output file,
>   * we get an 'A' followed by an endless string of 'B's
> - * but on the destination we won't have the A.
> + * but on the destination we won't have the A (unless we enabled 
> suspend/resume)
>   */
>  static void wait_for_serial(const char *side)
>  {
> @@ -245,6 +245,13 @@ static void wait_for_resume(QTestState *who, 
> QTestMigrationState *state)
>  }
>  }
>  
> +static void wait_for_suspend(QTestState *who, QTestMigrationState *state)
> +{
> +if (!state->suspend_seen) {
> +qtest_qmp_eventwait(who, "SUSPEND");
> +}
> +}
> +
>  /*
>   * It's tricky to use qemu's migration event capability with qtest,
>   * events suddenly appearing confuse the qmp()/hmp() responses.
> @@ -299,7 +306,7 @@ static void wait_for_migration_pass(QTestState *who)
>  {
>  uint64_t pass, prev_pass = 0, changes = 0;
>  
> -while (changes < 2 && !src_state.stop_seen) {
> +while (changes < 2 && !src_state.stop_seen && !src_state.suspend_seen) {
>  usleep(1000);
>  pass = get_migration_pass(who);
>  changes += (pass != prev_pass);
> @@ -595,7 +602,8 @@ static void migrate_wait_for_dirty_mem(QTestState *from,
>  watch_byte = qtest_readb(from, watch_address);
>  do {
>  usleep(1000 * 10);
> -} while (qtest_readb(from, watch_address) == watch_byte);
> +} while (qtest_readb(from, watch_address) == watch_byte &&
> + !src_state.suspend_seen);

This is hackish to me.

AFAIU the guest code won't ever dirty anything after printing the initial
'B'.  IOW, migrate_wait_for_dirty_mem() should not be called for suspend
test at all, I guess, because we know it won't.

>  }
>  
>  
> @@ -771,6 +779,7 @@ static int test_migrate_start(QTestState **from, 
> QTestState **to,
>  dst_state = (QTestMigrationState) { };
>  src_state = (QTestMigrationState) { };
>  bootfile_create(tmpfs, args->suspend_me);
> +src_state.suspend_me = args->suspend_me;
>  
>  if (strcmp(arch, "i386") == 0 || strcmp(arch, "x86_64") == 0) {
>  memory_size = "150M";
> @@ -1730,6 +1739,9 @@ static void test_precopy_common(MigrateCommon *args)
>   * change anything.
>   */
>  if (args->result == MIG_TEST_SUCCEED) {
> +if (src_state.suspend_me) {
> +wait_for_suspend(from, &src_state);
> +}
>  qtest_qmp_assert_success(from, "{ 'execute' : 'stop'}");
>  wait_for_stop(from, &src_state);
>  migrate_ensure_converge(from);
> @@ -1777,6 +1789,9 @@ static void test_precopy_common(MigrateCommon *args)
>   */
>  wait_for_migration_complete(from);
>  
> +if (src_state.suspend_me) {
> +wait_for_suspend(from, &src_state);
> +}

This is also hard to follow: it waits for SUSPEND to arrive even after
migration has already completed!

I suspect it never waits, since suspend_seen is normally always already
set (with the above hack, migrate_wait_for_dirty_mem() plays the role of
waiting for SUSPENDED).

>  wait_for_stop(from, &src_state);
>  
>  } else {

IMHO it'll be clean

Re: [RFC PATCH-for-8.2?] accel/tcg: Implement tcg_unregister_thread()

2023-12-04 Thread Michal Suchánek

On Mon, Dec 04, 2023 at 02:50:17PM -0500, Stefan Hajnoczi wrote:
> On Mon, 4 Dec 2023 at 14:40, Philippe Mathieu-Daudé  wrote:
> >
> > Unplugging vCPU triggers the following assertion in
> 
> Unplugging leaks the tcg context refcount but does not trigger the
> assertion directly. Maybe clarify that by changing the wording:
> 
> "Plugging a vCPU after it has been unplugged triggers..."
> 
> > tcg_register_thread():
> >
> >  796 void tcg_register_thread(void)
> >  797 {
> >  ...
> >  812 /* Claim an entry in tcg_ctxs */
> >  813 n = qatomic_fetch_inc(&tcg_cur_ctxs);
> >  814 g_assert(n < tcg_max_ctxs);
> >
> > Implement and use tcg_unregister_thread() so when a
> > vCPU is unplugged, the tcg_cur_ctxs refcount is
> > decremented.
> >
> > Reported-by: Michal Suchánek 
> > Suggested-by: Stefan Hajnoczi 
> > Signed-off-by: Philippe Mathieu-Daudé 
> > ---
> > RFC: untested
> > Report: 
> > https://lore.kernel.org/qemu-devel/20231204183638.gz9...@kitsune.suse.cz/
> > ---
> >  include/tcg/startup.h   |  5 +
> >  accel/tcg/tcg-accel-ops-mttcg.c |  1 +
> >  accel/tcg/tcg-accel-ops-rr.c|  1 +
> >  tcg/tcg.c   | 17 +
> >  4 files changed, 24 insertions(+)
> >
> > diff --git a/include/tcg/startup.h b/include/tcg/startup.h
> > index f71305765c..520942a4a1 100644
> > --- a/include/tcg/startup.h
> > +++ b/include/tcg/startup.h
> > @@ -45,6 +45,11 @@ void tcg_init(size_t tb_size, int splitwx, unsigned 
> > max_cpus);
> >   */
> >  void tcg_register_thread(void);
> >
> > +/**
> > + * tcg_unregister_thread: Unregister this thread with the TCG runtime
> > + */
> > +void tcg_unregister_thread(void);
> > +
> >  /**
> >   * tcg_prologue_init(): Generate the code for the TCG prologue
> >   *
> > diff --git a/accel/tcg/tcg-accel-ops-mttcg.c 
> > b/accel/tcg/tcg-accel-ops-mttcg.c
> > index fac80095bb..88d7427aad 100644
> > --- a/accel/tcg/tcg-accel-ops-mttcg.c
> > +++ b/accel/tcg/tcg-accel-ops-mttcg.c
> > @@ -120,6 +120,7 @@ static void *mttcg_cpu_thread_fn(void *arg)
> >
> >  tcg_cpus_destroy(cpu);
> >  qemu_mutex_unlock_iothread();
> > +tcg_unregister_thread();
> >  rcu_remove_force_rcu_notifier(&force_rcu.notifier);
> >  rcu_unregister_thread();
> >  return NULL;
> > diff --git a/accel/tcg/tcg-accel-ops-rr.c b/accel/tcg/tcg-accel-ops-rr.c
> > index 611932f3c3..c2af3aad21 100644
> > --- a/accel/tcg/tcg-accel-ops-rr.c
> > +++ b/accel/tcg/tcg-accel-ops-rr.c
> > @@ -302,6 +302,7 @@ static void *rr_cpu_thread_fn(void *arg)
> >  rr_deal_with_unplugged_cpus();
> >  }
> >
> > +tcg_unregister_thread();
> >  rcu_remove_force_rcu_notifier(&force_rcu);
> >  rcu_unregister_thread();
> >  return NULL;
> > diff --git a/tcg/tcg.c b/tcg/tcg.c
> > index d2ea22b397..5125342d70 100644
> > --- a/tcg/tcg.c
> > +++ b/tcg/tcg.c
> > @@ -781,11 +781,18 @@ static void alloc_tcg_plugin_context(TCGContext *s)
> >   * modes.
> >   */
> >  #ifdef CONFIG_USER_ONLY
> > +
> >  void tcg_register_thread(void)
> >  {
> >  tcg_ctx = &tcg_init_ctx;
> >  }
> > +
> > +void tcg_unregister_thread(void)
> > +{
> > +}
> > +
> >  #else
> > +
> >  void tcg_register_thread(void)
> >  {
> >  TCGContext *s = g_malloc(sizeof(*s));
> > @@ -814,6 +821,16 @@ void tcg_register_thread(void)
> >
> >  tcg_ctx = s;
> >  }
> > +
> > +void tcg_unregister_thread(void)
> > +{
> > +unsigned int n;
> > +
> > +n = qatomic_fetch_dec(&tcg_cur_ctxs);
> > +g_free(tcg_ctxs[n]);
> > +qatomic_set(&tcg_ctxs[n], NULL);
> > +}
> 
> tcg_ctxs[n] may not be our context, so this looks like it could free
> another thread's context and lead to undefined behavior.

There is cpu->thread_id so perhaps cpu->thread_ctx could be added as
well. That would require a bitmap of used threads contexts rather than a
counter, though.
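
A minimal sketch of the bitmap idea, as a standalone model rather than QEMU code. The names (ctx_used_bitmap, ctx_slot_claim, ctx_slot_release) and the slot count are made up for illustration, and the model is not thread-safe; a real version would need atomic bit operations or a lock:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical slot count; stands in for tcg_max_ctxs. */
enum { CTX_SLOTS = 8 };

/* Bit i set means slot i is in use. Replaces the plain counter. */
static uint32_t ctx_used_bitmap;

/* Claim the lowest free slot; returns its index, or -1 if all are taken. */
static int ctx_slot_claim(void)
{
    for (int i = 0; i < CTX_SLOTS; i++) {
        uint32_t bit = UINT32_C(1) << i;

        if (!(ctx_used_bitmap & bit)) {
            ctx_used_bitmap |= bit;
            return i;
        }
    }
    return -1;
}

/*
 * Release exactly the slot the caller owns. The index is what a
 * per-CPU field such as the suggested cpu->thread_ctx would remember.
 */
static void ctx_slot_release(int slot)
{
    assert(slot >= 0 && slot < CTX_SLOTS);
    assert(ctx_used_bitmap & (UINT32_C(1) << slot));
    ctx_used_bitmap &= ~(UINT32_C(1) << slot);
}
```

With this shape a slot freed in the middle can be reused by the next plug, which a counter alone cannot express.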

Thanks

Michal

Re: [PATCH V6 05/14] migration: propagate suspended runstate

2023-12-04 Thread Peter Xu

On Mon, Dec 04, 2023 at 04:31:56PM -0300, Fabiano Rosas wrote:
> Peter Xu  writes:
> 
> > On Fri, Dec 01, 2023 at 11:23:33AM -0500, Steven Sistare wrote:
> >> >> @@ -109,6 +117,7 @@ static int global_state_post_load(void *opaque, int 
> >> >> version_id)
> >> >>  return -EINVAL;
> >> >>  }
> >> >>  s->state = r;
> >> >> +vm_set_suspended(s->vm_was_suspended || r == RUN_STATE_SUSPENDED);
> >> > 
> >> > IIUC current vm_was_suspended (based on my read of your patch) was not 
> >> > the
> >> > same as a boolean representing "whether VM is suspended", but only a
> >> > temporary field to remember that for a VM stop request.  To be explicit, 
> >> > I
> >> > didn't see this flag set in qemu_system_suspend() in your previous patch.
> >> > 
> >> > If so, we can already do:
> >> > 
> >> >   vm_set_suspended(s->vm_was_suspended);
> >> > 
> >> > Irrelevant of RUN_STATE_SUSPENDED?
> >> 
> >> We need both terms of the expression.
> >> 
> >> If the vm *is* suspended (RUN_STATE_SUSPENDED), then vm_was_suspended = 
> >> false.
> >> We call global_state_store prior to vm_stop_force_state, so the incoming
> >> side sees s->state = RUN_STATE_SUSPENDED and s->vm_was_suspended = false.
> >
> > Right.
> >
> >> However, the runstate is RUN_STATE_INMIGRATE.  When incoming finishes by
> >> calling vm_start, we need to restore the suspended state.  Thus in 
> >> global_state_post_load, we must set vm_was_suspended = true.
> >
> > With above, shouldn't global_state_get_runstate() (on dest) fetch SUSPENDED
> > already?  Then I think it should call vm_start(SUSPENDED) if to start.
> >
> > Maybe you're talking about the special case where autostart==false?  We
> > used to have this (existing process_incoming_migration_bh()):
> >
> > if (!global_state_received() ||
> > global_state_get_runstate() == RUN_STATE_RUNNING) {
> > if (autostart) {
> > vm_start();
> > } else {
> > runstate_set(RUN_STATE_PAUSED);
> > }
> > }
> >
> > If so maybe I get you, because in the "else" path we do seem to lose the
> > SUSPENDED state again, but in that case IMHO we should logically set
> > vm_was_suspended only when we "lose" it - we didn't lose it during
> > migration, but only until we decided to switch to PAUSED (due to
> > autostart==false). IOW, change above to something like:
> >
> > state = global_state_get_runstate();
> > if (!global_state_received() || runstate_is_alive(state)) {
> > if (autostart) {
> > vm_start(state);
> > } else {
> > if (runstate_is_suspended(state)) {
> > /* Remember suspended state before setting system to STOPed 
> > */
> > vm_was_suspended = true;
> > }
> > runstate_set(RUN_STATE_PAUSED);
> > }
> > }
> >
> > It may or may not have a functional difference even if current patch,
> > though.  However maybe clearer to follow vm_was_suspended's strict
> > definition.
> >
> >> 
> >> If the vm *was* suspended, but is currently stopped (eg RUN_STATE_PAUSED),
> >> then vm_was_suspended = true.  Migration from that state sets
> >> vm_was_suspended = s->vm_was_suspended = true in global_state_post_load 
> >> and 
> >> ends with runstate_set(RUN_STATE_PAUSED).
> >> 
> >> I will add a comment here in the code.
> >>  
> >> >>  return 0;
> >> >>  }
> >> >> @@ -134,6 +143,7 @@ static const VMStateDescription vmstate_globalstate 
> >> >> = {
> >> >>  .fields = (VMStateField[]) {
> >> >>  VMSTATE_UINT32(size, GlobalState),
> >> >>  VMSTATE_BUFFER(runstate, GlobalState),
> >> >> +VMSTATE_BOOL(vm_was_suspended, GlobalState),
> >> >>  VMSTATE_END_OF_LIST()
> >> >>  },
> >> >>  };
> >> > 
> >> > I think this will break migration between old/new, unfortunately.  And
> >> > since the global state exist mostly for every VM, all VM setup should be
> >> > affected, and over all archs.
> >> 
> >> Thanks, I keep forgetting that my binary tricks are no good here.  However,
> >> I have one other trick up my sleeve, which is to store vm_was_running in
> >> global_state.runstate[strlen(runstate) + 2].  It is forwards and backwards
> >> compatible, since that byte is always 0 in older qemu.  It can be 
> >> implemented
> >> with a few lines of code change confined to global_state.c, versus many 
> >> lines 
> >> spread across files to do it the conventional way using a compat property 
> >> and
> >> a subsection.  Sound OK?  
> >
> > Tricky!  But sounds okay to me.  I think you're inventing some of your own
> > way of being compatible, not relying on machine type as a benefit.  If go
> > this route please document clearly on the layout and also what it looked
> > like in old binaries.
> >
> > I think maybe it'll be good to keep using strings, so in the new binaries
> > we allow >1 strings, then we define properly on those strings (index 0:
> > runstate, existed since start; index 2: suspended, perhaps using "1"/"0"

Re: [RFC PATCH-for-8.2?] accel/tcg: Implement tcg_unregister_thread()

2023-12-04 Thread Stefan Hajnoczi

On Mon, 4 Dec 2023 at 14:40, Philippe Mathieu-Daudé  wrote:
>
> Unplugging vCPU triggers the following assertion in

Unplugging leaks the tcg context refcount but does not trigger the
assertion directly. Maybe clarify that by changing the wording:

"Plugging a vCPU after it has been unplugged triggers..."

> tcg_register_thread():
>
>  796 void tcg_register_thread(void)
>  797 {
>  ...
>  812 /* Claim an entry in tcg_ctxs */
>  813 n = qatomic_fetch_inc(&tcg_cur_ctxs);
>  814 g_assert(n < tcg_max_ctxs);
>
> Implement and use tcg_unregister_thread() so when a
> vCPU is unplugged, the tcg_cur_ctxs refcount is
> decremented.
>
> Reported-by: Michal Suchánek 
> Suggested-by: Stefan Hajnoczi 
> Signed-off-by: Philippe Mathieu-Daudé 
> ---
> RFC: untested
> Report: 
> https://lore.kernel.org/qemu-devel/20231204183638.gz9...@kitsune.suse.cz/
> ---
>  include/tcg/startup.h   |  5 +
>  accel/tcg/tcg-accel-ops-mttcg.c |  1 +
>  accel/tcg/tcg-accel-ops-rr.c|  1 +
>  tcg/tcg.c   | 17 +
>  4 files changed, 24 insertions(+)
>
> diff --git a/include/tcg/startup.h b/include/tcg/startup.h
> index f71305765c..520942a4a1 100644
> --- a/include/tcg/startup.h
> +++ b/include/tcg/startup.h
> @@ -45,6 +45,11 @@ void tcg_init(size_t tb_size, int splitwx, unsigned 
> max_cpus);
>   */
>  void tcg_register_thread(void);
>
> +/**
> + * tcg_unregister_thread: Unregister this thread with the TCG runtime
> + */
> +void tcg_unregister_thread(void);
> +
>  /**
>   * tcg_prologue_init(): Generate the code for the TCG prologue
>   *
> diff --git a/accel/tcg/tcg-accel-ops-mttcg.c b/accel/tcg/tcg-accel-ops-mttcg.c
> index fac80095bb..88d7427aad 100644
> --- a/accel/tcg/tcg-accel-ops-mttcg.c
> +++ b/accel/tcg/tcg-accel-ops-mttcg.c
> @@ -120,6 +120,7 @@ static void *mttcg_cpu_thread_fn(void *arg)
>
>  tcg_cpus_destroy(cpu);
>  qemu_mutex_unlock_iothread();
> +tcg_unregister_thread();
>  rcu_remove_force_rcu_notifier(&force_rcu.notifier);
>  rcu_unregister_thread();
>  return NULL;
> diff --git a/accel/tcg/tcg-accel-ops-rr.c b/accel/tcg/tcg-accel-ops-rr.c
> index 611932f3c3..c2af3aad21 100644
> --- a/accel/tcg/tcg-accel-ops-rr.c
> +++ b/accel/tcg/tcg-accel-ops-rr.c
> @@ -302,6 +302,7 @@ static void *rr_cpu_thread_fn(void *arg)
>  rr_deal_with_unplugged_cpus();
>  }
>
> +tcg_unregister_thread();
>  rcu_remove_force_rcu_notifier(&force_rcu);
>  rcu_unregister_thread();
>  return NULL;
> diff --git a/tcg/tcg.c b/tcg/tcg.c
> index d2ea22b397..5125342d70 100644
> --- a/tcg/tcg.c
> +++ b/tcg/tcg.c
> @@ -781,11 +781,18 @@ static void alloc_tcg_plugin_context(TCGContext *s)
>   * modes.
>   */
>  #ifdef CONFIG_USER_ONLY
> +
>  void tcg_register_thread(void)
>  {
>  tcg_ctx = &tcg_init_ctx;
>  }
> +
> +void tcg_unregister_thread(void)
> +{
> +}
> +
>  #else
> +
>  void tcg_register_thread(void)
>  {
>  TCGContext *s = g_malloc(sizeof(*s));
> @@ -814,6 +821,16 @@ void tcg_register_thread(void)
>
>  tcg_ctx = s;
>  }
> +
> +void tcg_unregister_thread(void)
> +{
> +unsigned int n;
> +
> +n = qatomic_fetch_dec(&tcg_cur_ctxs);
> +g_free(tcg_ctxs[n]);
> +qatomic_set(&tcg_ctxs[n], NULL);
> +}

tcg_ctxs[n] may not be our context, so this looks like it could free
another thread's context and lead to undefined behavior.
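
To illustrate the concern with a standalone model (not QEMU code; slot_table, slot_cur and the helper names are invented for this sketch, and plain increments stand in for the atomic operations): the index used on unregister comes from the counter, not from the identity of the calling thread.

```c
#include <assert.h>
#include <stddef.h>

enum { SLOT_MAX = 4 };                 /* hypothetical limit */

static int slot_storage[SLOT_MAX];     /* stand-ins for TCGContext objects */
static int *slot_table[SLOT_MAX];      /* stand-in for tcg_ctxs[] */
static unsigned int slot_cur;          /* stand-in for tcg_cur_ctxs */

/* Claim the next index, in the style of the quoted register function. */
static unsigned int slot_register(void)
{
    unsigned int n = slot_cur++;       /* models qatomic_fetch_inc() */

    assert(n < SLOT_MAX);
    slot_table[n] = &slot_storage[n];
    return n;
}

/*
 * Unregister in the style of the RFC: the index is whatever the
 * counter held, regardless of which thread is calling.
 * Returns the index that was touched.
 */
static unsigned int slot_unregister_by_counter(void)
{
    unsigned int n = slot_cur--;       /* models qatomic_fetch_dec() */

    if (n < SLOT_MAX) {
        slot_table[n] = NULL;
    }
    return n;
}
```

If the thread that registered first is the one that exits, the entry it owns is left in place and a different index is cleared.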

I haven't read the code so I can't suggest an alternative myself.

Stefan

> +
>  #endif /* !CONFIG_USER_ONLY */
>
>  /* pool based memory allocation */
> --
> 2.41.0
>

Re: qemu ppc64 crash when adding CPU

2023-12-04 Thread Philippe Mathieu-Daudé


Hi,

On 4/12/23 19:57, Stefan Hajnoczi wrote:

On Mon, 4 Dec 2023 at 13:37, Michal Suchánek  wrote:



Looking at tcg.c line 784 is here:

void tcg_register_thread(void)
{
 TCGContext *s = g_malloc(sizeof(*s));
 unsigned int i, n;

 *s = tcg_init_ctx;

 /* Relink mem_base.  */
 for (i = 0, n = tcg_init_ctx.nb_globals; i < n; ++i) {
 if (tcg_init_ctx.temps[i].mem_base) {
 ptrdiff_t b = tcg_init_ctx.temps[i].mem_base - tcg_init_ctx.temps;
 tcg_debug_assert(b >= 0 && b < n);
 s->temps[i].mem_base = &s->temps[b];
 }
 }

 /* Claim an entry in tcg_ctxs */
 n = qatomic_fetch_inc(&tcg_cur_ctxs);

g_assert(n < tcg_max_ctxs); <<<

 qatomic_set(&tcg_ctxs[n], s);

 if (n > 0) {
 alloc_tcg_plugin_context(s);
 tcg_region_initial_alloc(s);
 }

 tcg_ctx = s;
}

Any idea why qemu would be crashing here?


Hi Michal,

$ git grep tcg_cur_ctxs
tcg/region.c:409:unsigned int n_ctxs = qatomic_read(&tcg_cur_ctxs);
tcg/region.c:889:unsigned int n_ctxs = qatomic_read(&tcg_cur_ctxs);
tcg/tcg-internal.h:34:extern unsigned int tcg_cur_ctxs;
tcg/tcg.c:241:unsigned int tcg_cur_ctxs;
tcg/tcg.c:806:n = qatomic_fetch_inc(&tcg_cur_ctxs);
tcg/tcg.c:1369:tcg_cur_ctxs = 1;

I don't see a qatomic_dec(&tcg_cur_ctxs) anywhere, so it seems hot
unplugging a vcpu doesn't release the tcg_cur_ctxs refcount. Do we
need a tcg_unregister_thread() function?
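
A small standalone model of the suspected leak (not QEMU code; the leak_* names and the limit of 4 are assumptions for illustration, the counter starts at 0 for simplicity, and plain increments stand in for the atomic operations):

```c
#include <assert.h>
#include <stdbool.h>

enum { LEAK_MAX_CTXS = 4 };            /* stand-in for tcg_max_ctxs */
static unsigned int leak_cur_ctxs;     /* stand-in for tcg_cur_ctxs */

/*
 * Claim an entry the way the quoted code does: increment, then check.
 * Returns false where the real code would hit g_assert().
 */
static bool leak_register_thread(void)
{
    unsigned int n = leak_cur_ctxs++;

    return n < LEAK_MAX_CTXS;
}

/* Unplug with no matching decrement: the entry is never given back. */
static void leak_unplug(void)
{
}

/* Count how many plug/unplug cycles succeed before registration fails. */
static unsigned int leak_cycles_until_failure(void)
{
    unsigned int ok = 0;

    leak_cur_ctxs = 0;
    while (leak_register_thread()) {
        leak_unplug();
        ok++;
    }
    return ok;
}
```

Even though only one vCPU is ever plugged at a time in this model, registration fails once the counter reaches the limit.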


Suggested fix posted as RFC patch:
https://lore.kernel.org/qemu-devel/20231204194039.56169-1-phi...@linaro.org/

[RFC PATCH-for-8.2?] accel/tcg: Implement tcg_unregister_thread()

2023-12-04 Thread Philippe Mathieu-Daudé

Unplugging vCPU triggers the following assertion in
tcg_register_thread():

 796 void tcg_register_thread(void)
 797 {
 ...
 812 /* Claim an entry in tcg_ctxs */
 813 n = qatomic_fetch_inc(&tcg_cur_ctxs);
 814 g_assert(n < tcg_max_ctxs);

Implement and use tcg_unregister_thread() so when a
vCPU is unplugged, the tcg_cur_ctxs refcount is
decremented.

Reported-by: Michal Suchánek 
Suggested-by: Stefan Hajnoczi 
Signed-off-by: Philippe Mathieu-Daudé 
---
RFC: untested
Report: 
https://lore.kernel.org/qemu-devel/20231204183638.gz9...@kitsune.suse.cz/
---
 include/tcg/startup.h   |  5 +
 accel/tcg/tcg-accel-ops-mttcg.c |  1 +
 accel/tcg/tcg-accel-ops-rr.c|  1 +
 tcg/tcg.c   | 17 +
 4 files changed, 24 insertions(+)

diff --git a/include/tcg/startup.h b/include/tcg/startup.h
index f71305765c..520942a4a1 100644
--- a/include/tcg/startup.h
+++ b/include/tcg/startup.h
@@ -45,6 +45,11 @@ void tcg_init(size_t tb_size, int splitwx, unsigned 
max_cpus);
  */
 void tcg_register_thread(void);
 
+/**
+ * tcg_unregister_thread: Unregister this thread with the TCG runtime
+ */
+void tcg_unregister_thread(void);
+
 /**
  * tcg_prologue_init(): Generate the code for the TCG prologue
  *
diff --git a/accel/tcg/tcg-accel-ops-mttcg.c b/accel/tcg/tcg-accel-ops-mttcg.c
index fac80095bb..88d7427aad 100644
--- a/accel/tcg/tcg-accel-ops-mttcg.c
+++ b/accel/tcg/tcg-accel-ops-mttcg.c
@@ -120,6 +120,7 @@ static void *mttcg_cpu_thread_fn(void *arg)
 
 tcg_cpus_destroy(cpu);
 qemu_mutex_unlock_iothread();
+tcg_unregister_thread();
 rcu_remove_force_rcu_notifier(&force_rcu.notifier);
 rcu_unregister_thread();
 return NULL;
diff --git a/accel/tcg/tcg-accel-ops-rr.c b/accel/tcg/tcg-accel-ops-rr.c
index 611932f3c3..c2af3aad21 100644
--- a/accel/tcg/tcg-accel-ops-rr.c
+++ b/accel/tcg/tcg-accel-ops-rr.c
@@ -302,6 +302,7 @@ static void *rr_cpu_thread_fn(void *arg)
 rr_deal_with_unplugged_cpus();
 }
 
+tcg_unregister_thread();
 rcu_remove_force_rcu_notifier(&force_rcu);
 rcu_unregister_thread();
 return NULL;
diff --git a/tcg/tcg.c b/tcg/tcg.c
index d2ea22b397..5125342d70 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -781,11 +781,18 @@ static void alloc_tcg_plugin_context(TCGContext *s)
  * modes.
  */
 #ifdef CONFIG_USER_ONLY
+
 void tcg_register_thread(void)
 {
 tcg_ctx = &tcg_init_ctx;
 }
+
+void tcg_unregister_thread(void)
+{
+}
+
 #else
+
 void tcg_register_thread(void)
 {
 TCGContext *s = g_malloc(sizeof(*s));
@@ -814,6 +821,16 @@ void tcg_register_thread(void)
 
 tcg_ctx = s;
 }
+
+void tcg_unregister_thread(void)
+{
+unsigned int n;
+
+n = qatomic_fetch_dec(&tcg_cur_ctxs);
+g_free(tcg_ctxs[n]);
+qatomic_set(&tcg_ctxs[n], NULL);
+}
+
 #endif /* !CONFIG_USER_ONLY */
 
 /* pool based memory allocation */
-- 
2.41.0

Re: [PATCH V6 05/14] migration: propagate suspended runstate

2023-12-04 Thread Fabiano Rosas

Peter Xu  writes:

> On Fri, Dec 01, 2023 at 11:23:33AM -0500, Steven Sistare wrote:
>> >> @@ -109,6 +117,7 @@ static int global_state_post_load(void *opaque, int 
>> >> version_id)
>> >>  return -EINVAL;
>> >>  }
>> >>  s->state = r;
>> >> +vm_set_suspended(s->vm_was_suspended || r == RUN_STATE_SUSPENDED);
>> > 
>> > IIUC current vm_was_suspended (based on my read of your patch) was not the
>> > same as a boolean representing "whether VM is suspended", but only a
>> > temporary field to remember that for a VM stop request.  To be explicit, I
>> > didn't see this flag set in qemu_system_suspend() in your previous patch.
>> > 
>> > If so, we can already do:
>> > 
>> >   vm_set_suspended(s->vm_was_suspended);
>> > 
>> > Irrelevant of RUN_STATE_SUSPENDED?
>> 
>> We need both terms of the expression.
>> 
>> If the vm *is* suspended (RUN_STATE_SUSPENDED), then vm_was_suspended = 
>> false.
>> We call global_state_store prior to vm_stop_force_state, so the incoming
>> side sees s->state = RUN_STATE_SUSPENDED and s->vm_was_suspended = false.
>
> Right.
>
>> However, the runstate is RUN_STATE_INMIGRATE.  When incoming finishes by
>> calling vm_start, we need to restore the suspended state.  Thus in 
>> global_state_post_load, we must set vm_was_suspended = true.
>
> With above, shouldn't global_state_get_runstate() (on dest) fetch SUSPENDED
> already?  Then I think it should call vm_start(SUSPENDED) if to start.
>
> Maybe you're talking about the special case where autostart==false?  We
> used to have this (existing process_incoming_migration_bh()):
>
> if (!global_state_received() ||
> global_state_get_runstate() == RUN_STATE_RUNNING) {
> if (autostart) {
> vm_start();
> } else {
> runstate_set(RUN_STATE_PAUSED);
> }
> }
>
> If so maybe I get you, because in the "else" path we do seem to lose the
> SUSPENDED state again, but in that case IMHO we should logically set
> vm_was_suspended only when we "lose" it - we didn't lose it during
> migration, but only until we decided to switch to PAUSED (due to
> autostart==false). IOW, change above to something like:
>
> state = global_state_get_runstate();
> if (!global_state_received() || runstate_is_alive(state)) {
> if (autostart) {
> vm_start(state);
> } else {
> if (runstate_is_suspended(state)) {
> /* Remember suspended state before setting system to STOPed */
> vm_was_suspended = true;
> }
> runstate_set(RUN_STATE_PAUSED);
> }
> }
>
> It may or may not have a functional difference even if current patch,
> though.  However maybe clearer to follow vm_was_suspended's strict
> definition.
>
>> 
>> If the vm *was* suspended, but is currently stopped (eg RUN_STATE_PAUSED),
>> then vm_was_suspended = true.  Migration from that state sets
>> vm_was_suspended = s->vm_was_suspended = true in global_state_post_load and 
>> ends with runstate_set(RUN_STATE_PAUSED).
>> 
>> I will add a comment here in the code.
>>  
>> >>  return 0;
>> >>  }
>> >> @@ -134,6 +143,7 @@ static const VMStateDescription vmstate_globalstate = 
>> >> {
>> >>  .fields = (VMStateField[]) {
>> >>  VMSTATE_UINT32(size, GlobalState),
>> >>  VMSTATE_BUFFER(runstate, GlobalState),
>> >> +VMSTATE_BOOL(vm_was_suspended, GlobalState),
>> >>  VMSTATE_END_OF_LIST()
>> >>  },
>> >>  };
>> > 
>> > I think this will break migration between old/new, unfortunately.  And
>> > since the global state exist mostly for every VM, all VM setup should be
>> > affected, and over all archs.
>> 
>> Thanks, I keep forgetting that my binary tricks are no good here.  However,
>> I have one other trick up my sleeve, which is to store vm_was_running in
>> global_state.runstate[strlen(runstate) + 2].  It is forwards and backwards
>> compatible, since that byte is always 0 in older qemu.  It can be implemented
>> with a few lines of code change confined to global_state.c, versus many 
>> lines 
>> spread across files to do it the conventional way using a compat property and
>> a subsection.  Sound OK?  
>
> Tricky!  But sounds okay to me.  I think you're inventing some of your own
> way of being compatible, not relying on machine type as a benefit.  If go
> this route please document clearly on the layout and also what it looked
> like in old binaries.
>
> I think maybe it'll be good to keep using strings, so in the new binaries
> we allow >1 strings, then we define properly on those strings (index 0:
> runstate, existed since start; index 2: suspended, perhaps using "1"/"0" to
> express, while 0x00 means old binary, etc.).
>
> I hope this trick will need less code than the subsection solution,
> otherwise I'd still consider going with that, which is the "common
> solution".
>
> Let's also see whether Juan/Fabiano/others has any opinions.
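
For reference, the byte trick proposed above could look roughly like the following. This is only a sketch under assumptions: the buffer length of 100 and the helper names are mine, and it is a standalone model, not the code in global_state.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

enum { RUNSTATE_BUF_LEN = 100 };       /* assumed size of the runstate buffer */

/*
 * Store the runstate name, then the flag two bytes past the end of the
 * string, i.e. at buf[strlen(name) + 2]. Everything else stays zero.
 */
static void runstate_buf_store(uint8_t buf[RUNSTATE_BUF_LEN],
                               const char *name, bool flag)
{
    size_t len = strlen(name);

    assert(len + 2 < RUNSTATE_BUF_LEN);
    memset(buf, 0, RUNSTATE_BUF_LEN);
    memcpy(buf, name, len);
    buf[len + 2] = flag ? 1 : 0;
}

/*
 * Read the flag back. A sender that zero-fills the rest of the buffer
 * (the "old binary" case in this model) yields false.
 */
static bool runstate_buf_load_flag(const uint8_t buf[RUNSTATE_BUF_LEN])
{
    const uint8_t *nul = memchr(buf, 0, RUNSTATE_BUF_LEN);
    size_t len;

    if (!nul) {
        return false;                  /* no terminator, nothing to read */
    }
    len = (size_t)(nul - buf);
    if (len + 2 >= RUNSTATE_BUF_LEN) {
        return false;
    }
    return buf[len + 2] != 0;
}
```

An old reader that only looks at the NUL-terminated string sees the same runstate name either way.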

Can't we pack the structure and just go ahead and slash 'r

Re: [PATCH 1/2] target/riscv: Add vill check for whole vector register move instructions

2023-12-04 Thread Daniel Henrique Barboza





On 11/29/23 14:03, Max Chou wrote:

The ratified version of RISC-V V spec section 16.6 says that
`The instructions operate as if EEW=SEW`.

So the whole vector register move instructions depend on the vtype
register, which means they should raise an illegal-instruction
exception when vtype.vill=1.

Signed-off-by: Max Chou 
---


Reviewed-by: Daniel Henrique Barboza 


  target/riscv/insn_trans/trans_rvv.c.inc | 5 +++--
  1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/target/riscv/insn_trans/trans_rvv.c.inc 
b/target/riscv/insn_trans/trans_rvv.c.inc
index 78bd363310d..114ad87397f 100644
--- a/target/riscv/insn_trans/trans_rvv.c.inc
+++ b/target/riscv/insn_trans/trans_rvv.c.inc
@@ -3631,13 +3631,14 @@ static bool trans_vcompress_vm(DisasContext *s, arg_r 
*a)
  }
  
  /*

- * Whole Vector Register Move Instructions ignore vtype and vl setting.
- * Thus, we don't need to check vill bit. (Section 16.6)
+ * Whole Vector Register Move Instructions depend on vtype register(vsew).
+ * Thus, we need to check vill bit. (Section 16.6)
   */
  #define GEN_VMV_WHOLE_TRANS(NAME, LEN) \
  static bool trans_##NAME(DisasContext *s, arg_##NAME * a)   \
  {   \
  if (require_rvv(s) &&   \
+vext_check_isa_ill(s) &&\
  QEMU_IS_ALIGNED(a->rd, LEN) &&  \
  QEMU_IS_ALIGNED(a->rs2, LEN)) { \
  uint32_t maxsz = (s->cfg_ptr->vlen >> 3) * LEN; \
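
The condition added above can be modelled in isolation as follows. This is a sketch, not the translator code: it assumes vext_check_isa_ill() amounts to testing that vill is clear, it leaves out require_rvv(), and the ModelCtx type and helper names are invented for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, reduced translation state: only the field the check needs. */
typedef struct {
    bool vill;                         /* models vtype.vill */
} ModelCtx;

/* True when reg is a multiple of len (len is 1, 2, 4 or 8). */
static bool model_is_aligned(uint32_t reg, uint32_t len)
{
    return (reg & (len - 1)) == 0;
}

/*
 * Returns true when the whole-register move may be translated, and
 * false where the translator would report an illegal instruction.
 */
static bool model_whole_move_ok(const ModelCtx *s, uint32_t rd,
                                uint32_t rs2, uint32_t len)
{
    return !s->vill &&
           model_is_aligned(rd, len) &&
           model_is_aligned(rs2, len);
}
```

Before the patch, the first term was absent, so a move with vill set and aligned registers would have been accepted.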

Re: [PATCH-for-8.2?] hw/ufs: avoid generating the same ID string for different LU devices

2023-12-04 Thread Philippe Mathieu-Daudé


Hi Jeuk,

On 4/12/23 17:50, Philippe Mathieu-Daudé wrote:

On 4/12/23 16:05, Akinobu Mita wrote:

QEMU would not start when trying to create two UFS host controllers and
a UFS logical unit for each with the following options:

-device ufs,id=bus0 \
-device ufs-lu,drive=drive1,bus=bus0,lun=0 \
-device ufs,id=bus1 \
-device ufs-lu,drive=drive2,bus=bus1,lun=0 \

This is because the same ID string ("0:0:0/scsi-disk") is generated
for both UFS logical units.

To fix this issue, prepend the parent pci device's path to make
the ID string unique.
(":00:03.0/0:0:0/scsi-disk" and ":00:04.0/0:0:0/scsi-disk")
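
As a rough illustration of the collision and the fix (a standalone sketch, not the QEMU code; the helper name and the exact format string are assumptions made for this example):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Build "<parent path>/<channel>:<id>:<lun>/scsi-disk".
 * An empty parent path models a bus without a get_dev_path callback.
 */
static int model_lu_id(char *out, size_t out_len, const char *parent_path,
                       int channel, int id, int lun)
{
    if (parent_path[0] != '\0') {
        return snprintf(out, out_len, "%s/%d:%d:%d/scsi-disk",
                        parent_path, channel, id, lun);
    }
    return snprintf(out, out_len, "%d:%d:%d/scsi-disk", channel, id, lun);
}
```

Two logical units with lun=0 on different controllers produce the same string when the parent path is empty, and distinct strings once each controller contributes its own path.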

Fixes: 096434fea13a ("hw/ufs: Modify lu.c to share codes with SCSI subsystem")


If you think this must be fixed for the 8.2 release, please assign
a release blocker issue to the GitLab 8.2 milestone here:
https://gitlab.com/qemu-project/qemu/-/milestones/10


Signed-off-by: Akinobu Mita 


Reviewed-by: Philippe Mathieu-Daudé 


---
  hw/ufs/ufs.c | 8 
  1 file changed, 8 insertions(+)

diff --git a/hw/ufs/ufs.c b/hw/ufs/ufs.c
index 68c5f1f6c9..eccdb852a0 100644
--- a/hw/ufs/ufs.c
+++ b/hw/ufs/ufs.c
@@ -1323,9 +1323,17 @@ static bool ufs_bus_check_address(BusState 
*qbus, DeviceState *qdev,

  return true;
  }
+static char *ufs_bus_get_dev_path(DeviceState *dev)
+{
+    BusState *bus = qdev_get_parent_bus(dev);
+
+    return qdev_get_dev_path(bus->parent);
+}
+
  static void ufs_bus_class_init(ObjectClass *class, void *data)
  {
  BusClass *bc = BUS_CLASS(class);
+    bc->get_dev_path = ufs_bus_get_dev_path;
  bc->check_address = ufs_bus_check_address;
  }
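
To illustrate why prefixing the parent device path makes the ID string
unique, here is a minimal standalone sketch in C. It is not the QEMU
implementation: build_lu_id() and its parameters are hypothetical, and only
the example strings are taken from the commit message above.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper (not QEMU code): build an ID string by prefixing the
 * parent controller's path to the "channel:target:lun/scsi-disk" part.
 * Returns the number of characters written, or -1 if the buffer is too small.
 */
static int build_lu_id(char *buf, size_t len, const char *parent_path,
                       int channel, int target, int lun)
{
    int n;

    if (parent_path && parent_path[0] != '\0') {
        /* With a parent path, two controllers yield two distinct strings. */
        n = snprintf(buf, len, "%s/%d:%d:%d/scsi-disk",
                     parent_path, channel, target, lun);
    } else {
        /* Without it, every LU at 0:0:0 gets the same string. */
        n = snprintf(buf, len, "%d:%d:%d/scsi-disk", channel, target, lun);
    }
    return (n < 0 || (size_t)n >= len) ? -1 : n;
}
```

Calling it for LUN 0 with the parent paths ":00:03.0" and ":00:04.0" gives
two different strings, whereas omitting the parent path gives the colliding
"0:0:0/scsi-disk" for both.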

[PATCH RFC v3 0/1] Implement STM32L4x5 EXTI

2023-12-04 Thread ~inesvarhol

This patch adds emulation of the STM32L4x5 EXTI device.
It implements register access and software interrupts.

This is RFC because we're new at contributing to QEMU.
We had some difficulties writing qtests and the result might be bizarre.

We have some questions about the next steps for our stm32l4x5 project:

Should we send a non-RFC patch after this peripheral implementation is
reviewed, or should we wait for more peripherals to be implemented?
We have syscfg and uart implementations ongoing.

Also, should the version numbers restart from 1 when sending a non-RFC
tag?

Sincerely,
Inès Varhol


Changes from v2 to v3:
- adding more tests writing/reading in exti registers
- adding tests checking that interrupts work by reading NVIC registers
- correcting exti_write in SWIER (so it sets an irq only when a bit goes
from '0' to '1')
- correcting exti_set_irq (so it never writes in PR when the relevant
bit in IMR is '0')
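
The two v3 corrections to SWIER and IMR handling can be sketched roughly as
follows. This is a simplified standalone model, not the code from the patch:
ExtiModel and exti_model_write_swier() are hypothetical names, and details
such as the second register bank and edge-trigger selection are left out.

```c
#include <stdint.h>

/* Hypothetical, simplified register model; not the layout used by the patch. */
typedef struct {
    uint32_t imr;   /* interrupt mask register: 1 = line unmasked */
    uint32_t swier; /* software interrupt event register */
    uint32_t pr;    /* pending register */
} ExtiModel;

/*
 * Model a guest write to SWIER. Only bits that go from 0 to 1 trigger,
 * and a triggered line becomes pending only if it is unmasked in IMR.
 * Returns the mask of lines raised by this write.
 */
static uint32_t exti_model_write_swier(ExtiModel *s, uint32_t value)
{
    uint32_t rising = value & ~s->swier; /* bits newly set by this write */
    uint32_t fired = rising & s->imr;    /* drop lines masked in IMR */

    s->swier = value;
    s->pr |= fired;
    return fired;
}
```

Writing the same value twice raises nothing the second time, because no bit
makes a 0-to-1 transition.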

Changes from v1 to v2:
- use arrays to deduplicate code and logic
- move internal constant EXTI_NUM_GPIO_EVENT_IN_LINES from the header
to the .c file
- Improve copyright headers
- replace static const with #define
- use the DEFINE_TYPES macro
- fill the impl and valid field of the exti's MemoryRegionOps
- fix invalid test caused by a last minute change

Based-on: <170049810484.22920.61207457697187832...@git.sr.ht>
([RFC v3 2/2] hw/arm: Add minimal support for the B-L475E-IOT01A board)

Inès Varhol (1):
  Implement STM32L4x5 EXTI

 hw/arm/Kconfig|   1 +
 hw/arm/stm32l4x5_soc.c|  65 +++-
 hw/misc/Kconfig   |   3 +
 hw/misc/meson.build   |   1 +
 hw/misc/stm32l4x5_exti.c  | 306 +++
 hw/misc/trace-events  |   5 +
 include/hw/arm/stm32l4x5_soc.h|   3 +
 include/hw/misc/stm32l4x5_exti.h  |  58 
 tests/qtest/meson.build   |   5 +
 tests/qtest/stm32l4x5_exti-test.c | 485 ++
 10 files changed, 930 insertions(+), 2 deletions(-)
 create mode 100644 hw/misc/stm32l4x5_exti.c
 create mode 100644 include/hw/misc/stm32l4x5_exti.h
 create mode 100644 tests/qtest/stm32l4x5_exti-test.c

-- 
2.38.5

[PATCH RFC v3 1/1] Implement STM32L4x5 EXTI

2023-12-04 Thread ~inesvarhol

From: Inès Varhol 

Signed-off-by: Arnaud Minier 
Signed-off-by: Inès Varhol 
---
 hw/arm/Kconfig|   1 +
 hw/arm/stm32l4x5_soc.c|  65 +++-
 hw/misc/Kconfig   |   3 +
 hw/misc/meson.build   |   1 +
 hw/misc/stm32l4x5_exti.c  | 306 +++
 hw/misc/trace-events  |   5 +
 include/hw/arm/stm32l4x5_soc.h|   3 +
 include/hw/misc/stm32l4x5_exti.h  |  58 
 tests/qtest/meson.build   |   5 +
 tests/qtest/stm32l4x5_exti-test.c | 485 ++
 10 files changed, 930 insertions(+), 2 deletions(-)
 create mode 100644 hw/misc/stm32l4x5_exti.c
 create mode 100644 include/hw/misc/stm32l4x5_exti.h
 create mode 100644 tests/qtest/stm32l4x5_exti-test.c

diff --git a/hw/arm/Kconfig b/hw/arm/Kconfig
index b95576fb0c..28d378ed83 100644
--- a/hw/arm/Kconfig
+++ b/hw/arm/Kconfig
@@ -458,6 +458,7 @@ config STM32L4X5_SOC
 bool
 select ARM_V7M
 select OR_IRQ
+select STM32L4X5_EXTI
 
 config XLNX_ZYNQMP_ARM
 bool
diff --git a/hw/arm/stm32l4x5_soc.c b/hw/arm/stm32l4x5_soc.c
index f476878b2c..cf786eac1d 100644
--- a/hw/arm/stm32l4x5_soc.c
+++ b/hw/arm/stm32l4x5_soc.c
@@ -45,10 +45,51 @@
 #define SRAM2_BASE_ADDRESS 0x1000
 #define SRAM2_SIZE (32 * KiB)
 
+#define EXTI_ADDR 0x40010400
+
+#define NUM_EXTI_IRQ 40
+/* Match exti line connections with their CPU IRQ number */
+/* See Vector Table (Reference Manual p.396) */
+static const int exti_irq[NUM_EXTI_IRQ] = {
+6,  /* GPIO[0] */
+7,  /* GPIO[1] */
+8,  /* GPIO[2] */
+9,  /* GPIO[3] */
+10, /* GPIO[4] */
+23, 23, 23, 23, 23, /* GPIO[5..9]  */
+40, 40, 40, 40, 40, 40, /* GPIO[10..15]*/
+1,  /* PVD */
+67, /* OTG_FS_WKUP, Direct */
+41, /* RTC_ALARM   */
+2,  /* RTC_TAMP_STAMP2/CSS_LSE */
+3,  /* RTC wakeup timer*/
+63, /* COMP1   */
+63, /* COMP2   */
+31, /* I2C1 wakeup, Direct */
+33, /* I2C2 wakeup, Direct */
+72, /* I2C3 wakeup, Direct */
+37, /* USART1 wakeup, Direct   */
+38, /* USART2 wakeup, Direct   */
+39, /* USART3 wakeup, Direct   */
+52, /* UART4 wakeup, Direct*/
+53, /* UART5 wakeup, Direct*/
+70, /* LPUART1 wakeup, Direct  */
+65, /* LPTIM1, Direct  */
+66, /* LPTIM2, Direct  */
+76, /* SWPMI1 wakeup, Direct   */
+1,  /* PVM1 wakeup */
+1,  /* PVM2 wakeup */
+1,  /* PVM3 wakeup */
+1,  /* PVM4 wakeup */
+78  /* LCD wakeup, Direct  */
+};
+
 static void stm32l4x5_soc_initfn(Object *obj)
 {
 Stm32l4x5SocState *s = STM32L4X5_SOC(obj);
 
+object_initialize_child(obj, "exti", &s->exti, TYPE_STM32L4X5_EXTI);
+
 s->sysclk = qdev_init_clock_in(DEVICE(s), "sysclk", NULL, NULL, 0);
 s->refclk = qdev_init_clock_in(DEVICE(s), "refclk", NULL, NULL, 0);
 }
@@ -59,7 +100,9 @@ static void stm32l4x5_soc_realize(DeviceState *dev_soc, Error **errp)
 Stm32l4x5SocState *s = STM32L4X5_SOC(dev_soc);
 const Stm32l4x5SocClass *sc = STM32L4X5_SOC_GET_CLASS(dev_soc);
 MemoryRegion *system_memory = get_system_memory();
-DeviceState *armv7m;
+DeviceState *dev, *armv7m;
+SysBusDevice *busdev;
+int i;
 
 /*
  * We use s->refclk internally and only define it with qdev_init_clock_in()
@@ -124,6 +167,25 @@ static void stm32l4x5_soc_realize(DeviceState *dev_soc, Error **errp)
 return;
 }
 
+dev = DEVICE(&s->exti);
+if (!sysbus_realize(SYS_BUS_DEVICE(&s->exti), errp)) {
+return;
+}
+busdev = SYS_BUS_DEVICE(dev);
+sysbus_mmio_map(busdev, 0, EXTI_ADDR);
+for (i = 0; i < NUM_EXTI_IRQ; i++) {
+/* IRQ seems not to be connected ? */
+sysbus_connect_irq(busdev, i, qdev_get_gpio_in(armv7m, exti_irq[i]));
+}
+
+/*
+ * Uncomment when Syscfg is implemented
+ * for (i = 0; i < 16; i++) {
+ * qdev_connect_gpio_out(DEVICE(&s->syscfg), i,
+ *   qdev_get_gpio_in(dev, i));
+ * }
+ */
+
 /* APB1 BUS */
 create_unimplemented_device("TIM2",  0x4000, 0x400);
 create_unimplemented_device("TIM3",  0x4400, 0x400);
@@ -164,7 +2

Re: qemu ppc64 crash when adding CPU

2023-12-04 Thread Stefan Hajnoczi

On Mon, 4 Dec 2023 at 13:37, Michal Suchánek  wrote:
>
> Hello,
>
> When running a VM with libvirt I get:
>
> /usr/bin/qemu-system-ppc64 --version
> QEMU emulator version 8.1.3 (Virtualization / 15.5)
> Copyright (c) 2003-2023 Fabrice Bellard and the QEMU Project developers
>
> /usr/bin/qemu-system-ppc64 -name
> guest=sles12sp5-ppc64le,debug-threads=on -S -object
> {"qom-type":"secret","id":"masterKey0","format":"raw","file":"/var/lib/libvirt/qemu/domain-11-sles12sp5-ppc64le/master-key.aes"}
> -machine
> pseries-7.1,usb=off,dump-guest-core=off,memory-backend=ppc_spapr.ram
> -accel tcg -cpu POWER9 -m 4096 -object
> {"qom-type":"memory-backend-ram","id":"ppc_spapr.ram","size":4294967296}
> -overcommit mem-lock=off -smp 16,sockets=1,dies=1,cores=2,threads=8
> -uuid a6ad6a7d-125b-4525-b452-241ce2000eda -display none -no-user-config
> -nodefaults -chardev socket,id=charmonitor,fd=29,server=on,wait=off -mon
> chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown
> -boot strict=on -device
> {"driver":"qemu-xhci","p2":15,"p3":15,"id":"usb","bus":"pci.0","addr":"0x3"}
> -device
> {"driver":"virtio-scsi-pci","id":"scsi0","bus":"pci.0","addr":"0x4"}
> -device
> {"driver":"virtio-serial-pci","id":"virtio-serial0","bus":"pci.0","addr":"0x2"}
> -blockdev
> {"driver":"file","filename":"/home/hramrach/Downloads/SLE-12-SP5-Server-MINI-ISO-ppc64le-GM-DVD.iso","node-name":"libvirt-2-storage","auto-read-only":true,"discard":"unmap"}
> -blockdev
> {"node-name":"libvirt-2-format","read-only":true,"driver":"raw","file":"libvirt-2-storage"}
> -device
> {"driver":"scsi-cd","bus":"scsi0.0","channel":0,"scsi-id":0,"lun":0,"device_id":"drive-scsi0-0-0-0","drive":"libvirt-2-format","id":"scsi0-0-0-0","bootindex":2}
> -blockdev
> {"driver":"file","filename":"/var/lib/libvirt/images/sles12sp5-ppc64le.qcow2","node-name":"libvirt-1-storage","auto-read-only":true,"discard":"unmap"}
> -blockdev
> {"node-name":"libvirt-1-format","read-only":false,"discard":"unmap","driver":"qcow2","file":"libvirt-1-storage","backing":null}
> -device
> {"driver":"scsi-hd","bus":"scsi0.0","channel":0,"scsi-id":0,"lun":1,"device_id":"drive-scsi0-0-0-1","drive":"libvirt-1-format","id":"scsi0-0-0-1","bootindex":1}
> -netdev {"type":"tap","fd":"30","id":"hostnet0"} -device
> {"driver":"e1000","netdev":"hostnet0","id":"net0","mac":"52:54:00:3b:d5:a5","bus":"pci.0","addr":"0x1"}
> -chardev pty,id=charserial0 -device
> {"driver":"spapr-vty","chardev":"charserial0","id":"serial0","reg":805306368}
> -audiodev {"id":"audio1","driver":"none"} -device
> {"driver":"virtio-balloon-pci","id":"balloon0","bus":"pci.0","addr":"0x5"}
> -object
> {"qom-type":"rng-random","id":"objrng0","filename":"/dev/urandom"}
> -device
> {"driver":"virtio-rng-pci","rng":"objrng0","id":"rng0","bus":"pci.0","addr":"0x6"}
> -sandbox
> on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny
> -msg timestamp=on
>
> virsh qemu-monitor-command sles12sp5-ppc64le query-hotpluggable-cpus | jq . | 
> cat
> {
>   "return": [
> {
>   "props": {
> "core-id": 8,
> "node-id": 0
>   },
>   "vcpus-count": 8,
>   "qom-path": "/machine/unattached/device[2]",
>   "type": "power9_v2.2-spapr-cpu-core"
> },
> {
>   "props": {
> "core-id": 0,
> "node-id": 0
>   },
>   "vcpus-count": 8,
>   "qom-path": "/machine/unattached/device[1]",
>   "type": "power9_v2.2-spapr-cpu-core"
> }
>   ],
>   "id": "libvirt-155"
> }
>
> virsh qemu-monitor-command sles12sp5-ppc64le device_del 
> '"id":"/machine/unattached/device[2]"' | jq .
> {
>   "return": {},
>   "id": "libvirt-218"
> }
>
> virsh qemu-monitor-command sles12sp5-ppc64le query-hotpluggable-cpus | jq . | 
> cat
> {
>   "return": [
> {
>   "props": {
> "core-id": 8,
> "node-id": 0
>   },
>   "vcpus-count": 8,
>   "type": "power9_v2.2-spapr-cpu-core"
> },
> {
>   "props": {
> "core-id": 0,
> "node-id": 0
>   },
>   "vcpus-count": 8,
>   "qom-path": "/machine/unattached/device[1]",
>   "type": "power9_v2.2-spapr-cpu-core"
> }
>   ],
>   "id": "libvirt-235"
> }
>
> virsh qemu-monitor-command sles12sp5-ppc64le device_add '"id":"cpu-666"' 
> '"driver":"power9_v2.2-spapr-cpu-core"' '"core-id":8' '"node-id":0'  | jq .
>
> __GI_raise (sig=6) at ../sysdeps/unix/sysv/linux/raise.c:51
> 51  }
> (gdb) up
> #1  0x7f7839c553e5 in __GI_abort () at abort.c:79
> 79raise (SIGABRT);
> (gdb) up
> #2  0x7f783c54a125 in g_assertion_message (domain=domain@entry=0x0, 
> file=file@entry=0x556b3baf9242 "../tcg/tcg.c", line=line@entry=784, 
> func=func@entry=0x556b3bb55720 <__func__.55816> "tcg_register_thread",
> message=message@entry=0x7f76a46e8f40 "assertion failed: (n < 
> tcg_max_ctxs)") at ../glib/gtestutils.c:3223
> 3223g_abort ();
>
> This ends the usable part of the stack trace; going up the call stack, gdb
> locks up.
>
> Looking at tcg.c, line 784 is here:
>
>

Re: [PATCH] hostmem: Round up memory size for qemu_madvise() in host_memory_backend_memory_complete()

2023-12-04 Thread David Hildenbrand





Fair enough. After all of this, I'm inclined to turn this into a proper
error and deny sizes that are not page aligned. There's no real benefit in
having them and, furthermore, the original bug report is about a cryptic
error message.



I guess if we glue this to compat machines we should be definitely fine.

--
Cheers,

David / dhildenb

qemu ppc64 crash when adding CPU

2023-12-04 Thread Michal Suchánek

Hello,

When running a VM with libvirt I get:

/usr/bin/qemu-system-ppc64 --version
QEMU emulator version 8.1.3 (Virtualization / 15.5)
Copyright (c) 2003-2023 Fabrice Bellard and the QEMU Project developers

/usr/bin/qemu-system-ppc64 -name
guest=sles12sp5-ppc64le,debug-threads=on -S -object
{"qom-type":"secret","id":"masterKey0","format":"raw","file":"/var/lib/libvirt/qemu/domain-11-sles12sp5-ppc64le/master-key.aes"}
-machine
pseries-7.1,usb=off,dump-guest-core=off,memory-backend=ppc_spapr.ram
-accel tcg -cpu POWER9 -m 4096 -object
{"qom-type":"memory-backend-ram","id":"ppc_spapr.ram","size":4294967296}
-overcommit mem-lock=off -smp 16,sockets=1,dies=1,cores=2,threads=8
-uuid a6ad6a7d-125b-4525-b452-241ce2000eda -display none -no-user-config
-nodefaults -chardev socket,id=charmonitor,fd=29,server=on,wait=off -mon
chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown
-boot strict=on -device
{"driver":"qemu-xhci","p2":15,"p3":15,"id":"usb","bus":"pci.0","addr":"0x3"}
-device
{"driver":"virtio-scsi-pci","id":"scsi0","bus":"pci.0","addr":"0x4"}
-device
{"driver":"virtio-serial-pci","id":"virtio-serial0","bus":"pci.0","addr":"0x2"}
-blockdev
{"driver":"file","filename":"/home/hramrach/Downloads/SLE-12-SP5-Server-MINI-ISO-ppc64le-GM-DVD.iso","node-name":"libvirt-2-storage","auto-read-only":true,"discard":"unmap"}
-blockdev
{"node-name":"libvirt-2-format","read-only":true,"driver":"raw","file":"libvirt-2-storage"}
-device
{"driver":"scsi-cd","bus":"scsi0.0","channel":0,"scsi-id":0,"lun":0,"device_id":"drive-scsi0-0-0-0","drive":"libvirt-2-format","id":"scsi0-0-0-0","bootindex":2}
-blockdev
{"driver":"file","filename":"/var/lib/libvirt/images/sles12sp5-ppc64le.qcow2","node-name":"libvirt-1-storage","auto-read-only":true,"discard":"unmap"}
-blockdev
{"node-name":"libvirt-1-format","read-only":false,"discard":"unmap","driver":"qcow2","file":"libvirt-1-storage","backing":null}
-device
{"driver":"scsi-hd","bus":"scsi0.0","channel":0,"scsi-id":0,"lun":1,"device_id":"drive-scsi0-0-0-1","drive":"libvirt-1-format","id":"scsi0-0-0-1","bootindex":1}
-netdev {"type":"tap","fd":"30","id":"hostnet0"} -device
{"driver":"e1000","netdev":"hostnet0","id":"net0","mac":"52:54:00:3b:d5:a5","bus":"pci.0","addr":"0x1"}
-chardev pty,id=charserial0 -device
{"driver":"spapr-vty","chardev":"charserial0","id":"serial0","reg":805306368}
-audiodev {"id":"audio1","driver":"none"} -device
{"driver":"virtio-balloon-pci","id":"balloon0","bus":"pci.0","addr":"0x5"}
-object
{"qom-type":"rng-random","id":"objrng0","filename":"/dev/urandom"}
-device
{"driver":"virtio-rng-pci","rng":"objrng0","id":"rng0","bus":"pci.0","addr":"0x6"}
-sandbox
on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny
-msg timestamp=on

virsh qemu-monitor-command sles12sp5-ppc64le query-hotpluggable-cpus | jq . | 
cat
{
  "return": [
{
  "props": {
"core-id": 8,
"node-id": 0
  },
  "vcpus-count": 8,
  "qom-path": "/machine/unattached/device[2]",
  "type": "power9_v2.2-spapr-cpu-core"
},
{
  "props": {
"core-id": 0,
"node-id": 0
  },
  "vcpus-count": 8,
  "qom-path": "/machine/unattached/device[1]",
  "type": "power9_v2.2-spapr-cpu-core"
}
  ],
  "id": "libvirt-155"
}

virsh qemu-monitor-command sles12sp5-ppc64le device_del 
'"id":"/machine/unattached/device[2]"' | jq . 
{
  "return": {},
  "id": "libvirt-218"
}

virsh qemu-monitor-command sles12sp5-ppc64le query-hotpluggable-cpus | jq . | 
cat
{
  "return": [
{
  "props": {
"core-id": 8,
"node-id": 0
  },
  "vcpus-count": 8,
  "type": "power9_v2.2-spapr-cpu-core"
},
{
  "props": {
"core-id": 0,
"node-id": 0
  },
  "vcpus-count": 8,
  "qom-path": "/machine/unattached/device[1]",
  "type": "power9_v2.2-spapr-cpu-core"
}
  ],
  "id": "libvirt-235"
}

virsh qemu-monitor-command sles12sp5-ppc64le device_add '"id":"cpu-666"' 
'"driver":"power9_v2.2-spapr-cpu-core"' '"core-id":8' '"node-id":0'  | jq .

__GI_raise (sig=6) at ../sysdeps/unix/sysv/linux/raise.c:51
51  }
(gdb) up
#1  0x7f7839c553e5 in __GI_abort () at abort.c:79
79raise (SIGABRT);
(gdb) up
#2  0x7f783c54a125 in g_assertion_message (domain=domain@entry=0x0, 
file=file@entry=0x556b3baf9242 "../tcg/tcg.c", line=line@entry=784, 
func=func@entry=0x556b3bb55720 <__func__.55816> "tcg_register_thread", 
message=message@entry=0x7f76a46e8f40 "assertion failed: (n < 
tcg_max_ctxs)") at ../glib/gtestutils.c:3223
3223g_abort ();

This ends the usable part of the stack trace; going up the call stack, gdb
locks up.

Looking at tcg.c, line 784 is here:

ster_thread(void)
{
TCGContext *s = g_malloc(sizeof(*s));
unsigned int i, n;

*s = tcg_init_ctx;

/* Relink mem_base.  */
for (i = 0, n = tcg_init_ctx.nb_globals; i < n; ++i) {
if (tcg_init_ctx.temps[i].mem_base) {
ptrdiff_t b = tcg_init_ctx.temps[i].mem_base - t

Re: [RFC 0/8] Support generic Luks encryption

2023-12-04 Thread Daniel P . Berrangé

On Tue, Dec 05, 2023 at 01:32:51AM +0800, Yong Huang wrote:
> On Tue, Dec 5, 2023 at 12:51 AM Daniel P. Berrangé 
> wrote:
> 
> > On Tue, Dec 05, 2023 at 12:41:16AM +0800, Yong Huang wrote:
> > > On Tue, Dec 5, 2023 at 12:24 AM Daniel P. Berrangé 
> > > wrote:
> > >
> > > > On Tue, Dec 05, 2023 at 12:06:17AM +0800, Hyman Huang wrote:
> > > > > This functionality was motivated by the following to-do list seen
> > > > > in crypto documents:
> > > > > https://wiki.qemu.org/Features/Block/Crypto
> > > > >
> > > > > The last chapter says we should "separate header volume":
> > > > >
> > > > > The LUKS format has ability to store the header in a separate volume
> > > > > from the payload. We should extend the LUKS driver in QEMU to support
> > > > > this use case.
> > > > >
> > > > > As a proof-of-concept, I've created this patchset, which I've named
> > > > > the Gluks: generic luks. As their name suggests, they offer
> > encryption
> > > > > for any format that QEMU theoretically supports.
> > > >
> > > > I don't see the point in creating a new driver.
> > > >
> > > > I would expect detached header support to be implemented via an
> > > > optional new 'header' field in the existing driver. ie
> > > >
> > > > diff --git a/qapi/block-core.json b/qapi/block-core.json
> > > > index ca390c5700..48d1f2a974 100644
> > > > --- a/qapi/block-core.json
> > > > +++ b/qapi/block-core.json
> > > > @@ -3352,11 +3352,15 @@
> > > >  # decryption key (since 2.6). Mandatory except when doing a
> > > >  # metadata-only probe of the image.
> > > >  #
> > > > +# @header: optional reference to the location of a blockdev
> > > > +# storing a detached LUKS header
> > > > +#
> > > >  # Since: 2.9
> > > >  ##
> > > >  { 'struct': 'BlockdevOptionsLUKS',
> > > >'base': 'BlockdevOptionsGenericFormat',
> > > > -  'data': { '*key-secret': 'str' } }
> > > > +  'data': { '*key-secret': 'str',
> > > > +'*header-file': 'BlockdevRef'} }
> > > >
> > > >  ##
> > > >  # @BlockdevOptionsGenericCOWFormat:
> > > > @@ -4941,9 +4945,18 @@
> > > >  #
> > > >  # Driver specific image creation options for LUKS.
> > > >  #
> > > > -# @file: Node to create the image format on
> > > > +# @file: Node to create the image format on. Mandatory
> > > > +# unless a detached header file is specified using
> > > > +# @header.
> > > >  #
> > > > -# @size: Size of the virtual disk in bytes
> > > > +# @size: Size of the virtual disk in bytes.  Mandatory
> > > > +# unless a detached header file is specified using
> > > > +# @header.
> > > > +#
> > > > +# @header: optional reference to the location of a blockdev
> > > > +# storing a detached LUKS header. The @file option
> > > > +# is optional when this is given, unless it is desired
> > > > +# to perform pre-allocation
> > > >  #
> > > >  # @preallocation: Preallocation mode for the new image (since: 4.2)
> > > >  # (default: off; allowed values: off, metadata, falloc, full)
> > > > @@ -4952,8 +4965,9 @@
> > > >  ##
> > > >  { 'struct': 'BlockdevCreateOptionsLUKS',
> > > >'base': 'QCryptoBlockCreateOptionsLUKS',
> > > > -  'data': { 'file': 'BlockdevRef',
> > > > -'size': 'size',
> > > > +  'data': { '*file':'BlockdevRef',
> > > > +'*size':'size',
> > > > +'*header':  'BlockdevRef',
> > > >  '*preallocation':   'PreallocMode' } }
> > > >
> > > >  ##
> > > >
> > > > It ends up giving basically the same workflow as you outline,
> > > > without needing the new block driver
> > > >
> > >
> > > How about the design and usage, could it be simpler? Any advice? :)
> > >
> > >
> > > As you can see below, the Gluks format block layer driver's design is
> > > quite simple.
> > >
> > >  virtio-blk/vhost-user-blk...(front-end device)
> > >   ^
> > >   |
> > >  Gluks   (format-like disk node)
> > >   / \
> > >file   header (blockdev reference)
> > > / \
> > >  filefile (protocol node)
> > >|   |
> > >disk data   Luks data
> >
> > What I suggested above ends up with the exact same block driver
> > graph, unless I'm missing something.
> >
> 
> I may have overlooked something or failed to adequately convey the goal of
> the patchset. :(
> 
> Indeed, reusing the existing block driver would work if our only goal
> is to separate the header volume, giving us an additional way to use LUKS.
> 
> However, this patchset has another goal: supporting encryption for any disk
> format that QEMU is capable of supporting. This implies that we might
> link the LUKS header to other blockdev references, which might alter how
> LUKS is used and break compatibility with existing usage. That is not
> user-friendly in my opinion, and I'm not aware of a more elegant solution.

That existing LUKS driver can already be used in combination with
any QEMU block driver, and

Re: [PATCH] Revert "test/qga: use G_TEST_DIR to locate os-release test file"

2023-12-04 Thread Andrey Drobyshev

On 12/4/23 19:09, Marc-André Lureau wrote:
> Hi
> 
> On Mon, Dec 4, 2023 at 9:01 PM Andrey Drobyshev
>  wrote:
>>
>> On 12/4/23 18:51, Marc-André Lureau wrote:
>>> Hi
>>>
>>> On Mon, Dec 4, 2023 at 8:33 PM Andrey Drobyshev
>>>  wrote:

>>>> Since the commit a85d09269b QGA_OS_RELEASE variable points to the path
>>>> relative to the build dir.  Then on qemu-ga startup this path can't be
>>>> found as qemu-ga cwd is somewhere else, which leads to the test failure:
>>>>
>>>>   # ./tests/unit/test-qga -p /qga/guest-get-osinfo
>>>>   # random seed: R02S3a90c22d77ff1070fbd844f4959cf4a4
>>>>   # Start of qga tests
>>>>   **
>>>>   ERROR:../tests/unit/test-qga.c:906:test_qga_guest_get_osinfo: 'str' should not be NULL
>>>>   Bail out! ERROR:../tests/unit/test-qga.c:906:test_qga_guest_get_osinfo: 'str' should not be NULL
>>>>
>>>> Let's obtain the absolute path again.
>>>
>>> Can you detail how the build and the test is done?
>>>
>>
>> Simple as:
>>
>>> ./configure --cc=gcc --target-list=x86_64-softmmu --enable-guest-agent && 
>>> make -j16
>>> cd build; tests/unit/test-qga -p /qga/guest-get-osinfo
>>
>>
>>> If I recall correctly, this change was done in order to move qga to a
>>> subproject(), but isn't strictly required at this point. Although I
>>> believe it is more correct to lookup test data relative to
>>> G_TEST_DIST.
>>>
>>
>> Then we'd have to change cwd of qemu-ga at startup to ensure relative
>> paths work.  Right now (with the initial change) it appears broken.
> 
> By reverting the patch, it is _still_ broken if you run the test
> manually from a different directory (say from tests/unit for example)
>
> With G_TEST_DIST, and proper testing environment, it works from any directory.
> 

No, it seems to be failing as well, only earlier.  Before the revert:
> cd build/tests/unit; ./test-qga 
> # random seed: R02S450ef942c699b5af6dff48f9c5b73b33
> **
> ERROR:../tests/unit/test-qga.c:79:fixture_setup: assertion failed (error == 
> NULL): Failed to execute child process “$SRC/build/tests/unit/qga/qemu-ga” 
> (No such file or directory) (g-exec-error-quark, 8)
> Bail out! ERROR:../tests/unit/test-qga.c:79:fixture_setup: assertion failed 
> (error == NULL): Failed to execute child process 
> “$SRC/build/tests/unit/qga/qemu-ga” (No such file or directory) 
> (g-exec-error-quark, 8)

But maybe my testing environment isn't proper?

> Tests are not meant to be run manually, you should run them through
> the test runner: meson test -v test-qga
> 

That's a good point, but I just found it suspicious that this is
literally the *only* case of the *only* unit test which fails (when run
directly from ./build).  Could we fix the direct execution as well then?

Btw test runner also cannot be run from just any directory, otherwise it
complains:
> meson test -v test-qga 
> 
> ERROR: No such build data file as 
> '$SRC/build/tests/unit/meson-private/build.dat'.


>>

>>>> This reverts commit a85d09269bb1a7071d3ce0f2957e3ca9dba7c047.
>>>>
>>>> Signed-off-by: Andrey Drobyshev 
>>>> ---
>>>>  tests/unit/test-qga.c | 6 --
>>>>  1 file changed, 4 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/tests/unit/test-qga.c b/tests/unit/test-qga.c
>>>> index 671e83cb86..47cf5e30ec 100644
>>>> --- a/tests/unit/test-qga.c
>>>> +++ b/tests/unit/test-qga.c
>>>> @@ -1034,10 +1034,12 @@ static void test_qga_guest_get_osinfo(gconstpointer data)
>>>>  g_autoptr(QDict) ret = NULL;
>>>>  char *env[2];
>>>>  QDict *val;
>>>> +g_autofree gchar *cwd = NULL;
>>>>
>>>> +cwd = g_get_current_dir();
>>>>  env[0] = g_strdup_printf(
>>>> -"QGA_OS_RELEASE=%s%c..%cdata%ctest-qga-os-release",
>>>> -g_test_get_dir(G_TEST_DIST), G_DIR_SEPARATOR, G_DIR_SEPARATOR, G_DIR_SEPARATOR);
>>>> +"QGA_OS_RELEASE=%s%ctests%cdata%ctest-qga-os-release",
>>>> +cwd, G_DIR_SEPARATOR, G_DIR_SEPARATOR, G_DIR_SEPARATOR);
>>>>  env[1] = NULL;
>>>>  fixture_setup(&fixture, NULL, env);
>>>>
>>>> --
>>>> 2.39.3


>>>
>>>
>>
> 
>

Re: [PATCH v2] system/memory: use ldn_he_p/stn_he_p

2023-12-04 Thread Patrick Venture

On Mon, Dec 4, 2023 at 3:24 AM Philippe Mathieu-Daudé 
wrote:

> Hi Patrick,
>
> On 3/12/23 16:42, Patrick Venture wrote:
>
> > Friendly ping? Is this going to be applied or do I need to add another
> > CC or?  I do think it should go into stable.
>
> I'll send a PR with this patch included.
>

Thanks!

>
> Regards,
>
> Phil.
>

Re: [RFC 0/8] Support generic Luks encryption

2023-12-04 Thread Yong Huang

On Tue, Dec 5, 2023 at 12:51 AM Daniel P. Berrangé 
wrote:

> On Tue, Dec 05, 2023 at 12:41:16AM +0800, Yong Huang wrote:
> > On Tue, Dec 5, 2023 at 12:24 AM Daniel P. Berrangé 
> > wrote:
> >
> > > On Tue, Dec 05, 2023 at 12:06:17AM +0800, Hyman Huang wrote:
> > > > This functionality was motivated by the following to-do list seen
> > > > in crypto documents:
> > > > https://wiki.qemu.org/Features/Block/Crypto
> > > >
> > > > The last chapter says we should "separate header volume":
> > > >
> > > > The LUKS format has ability to store the header in a separate volume
> > > > from the payload. We should extend the LUKS driver in QEMU to support
> > > > this use case.
> > > >
> > > > As a proof-of-concept, I've created this patchset, which I've named
> > > > the Gluks: generic luks. As their name suggests, they offer
> encryption
> > > > for any format that QEMU theoretically supports.
> > >
> > > I don't see the point in creating a new driver.
> > >
> > > I would expect detached header support to be implemented via an
> > > optional new 'header' field in the existing driver. ie
> > >
> > > diff --git a/qapi/block-core.json b/qapi/block-core.json
> > > index ca390c5700..48d1f2a974 100644
> > > --- a/qapi/block-core.json
> > > +++ b/qapi/block-core.json
> > > @@ -3352,11 +3352,15 @@
> > >  # decryption key (since 2.6). Mandatory except when doing a
> > >  # metadata-only probe of the image.
> > >  #
> > > +# @header: optional reference to the location of a blockdev
> > > +# storing a detached LUKS header
> > > +#
> > >  # Since: 2.9
> > >  ##
> > >  { 'struct': 'BlockdevOptionsLUKS',
> > >'base': 'BlockdevOptionsGenericFormat',
> > > -  'data': { '*key-secret': 'str' } }
> > > +  'data': { '*key-secret': 'str',
> > > +'*header-file': 'BlockdevRef'} }
> > >
> > >  ##
> > >  # @BlockdevOptionsGenericCOWFormat:
> > > @@ -4941,9 +4945,18 @@
> > >  #
> > >  # Driver specific image creation options for LUKS.
> > >  #
> > > -# @file: Node to create the image format on
> > > +# @file: Node to create the image format on. Mandatory
> > > +# unless a detached header file is specified using
> > > +# @header.
> > >  #
> > > -# @size: Size of the virtual disk in bytes
> > > +# @size: Size of the virtual disk in bytes.  Mandatory
> > > +# unless a detached header file is specified using
> > > +# @header.
> > > +#
> > > +# @header: optional reference to the location of a blockdev
> > > +# storing a detached LUKS header. The @file option
> > > +# is optional when this is given, unless it is desired
> > > +# to perform pre-allocation
> > >  #
> > >  # @preallocation: Preallocation mode for the new image (since: 4.2)
> > >  # (default: off; allowed values: off, metadata, falloc, full)
> > > @@ -4952,8 +4965,9 @@
> > >  ##
> > >  { 'struct': 'BlockdevCreateOptionsLUKS',
> > >'base': 'QCryptoBlockCreateOptionsLUKS',
> > > -  'data': { 'file': 'BlockdevRef',
> > > -'size': 'size',
> > > +  'data': { '*file':'BlockdevRef',
> > > +'*size':'size',
> > > +'*header':  'BlockdevRef',
> > >  '*preallocation':   'PreallocMode' } }
> > >
> > >  ##
> > >
> > > It ends up giving basically the same workflow as you outline,
> > > without needing the new block driver
> > >
> >
> > How about the design and usage, could it be simpler? Any advice? :)
> >
> >
> > As you can see below, the Gluks format block layer driver's design is
> > quite simple.
> >
> >  virtio-blk/vhost-user-blk...(front-end device)
> >   ^
> >   |
> >  Gluks   (format-like disk node)
> >   / \
> >file   header (blockdev reference)
> > / \
> >  filefile (protocol node)
> >|   |
> >disk data   Luks data
>
> What I suggested above ends up with the exact same block driver
> graph, unless I'm missing something.
>

I may have overlooked something or failed to adequately convey the goal of
the patchset. :(

Indeed, reusing the existing block driver would work if our only goal
is to separate the header volume, giving us an additional way to use LUKS.

However, this patchset has another goal: supporting encryption for any disk
format that QEMU is capable of supporting. This implies that we might
link the LUKS header to other blockdev references, which might alter how
LUKS is used and break compatibility with existing usage. That is not
user-friendly in my opinion, and I'm not aware of a more elegant solution.



> With regards,
> Daniel
> --
> |: https://berrange.com  -o-
> https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org -o-
> https://fstop138.berrange.com :|
> |: https://entangle-photo.org-o-
> https://www.instagram.com/dberrange :|
>
>

-- 
Best regards

Re: [PATCH V6 05/14] migration: propagate suspended runstate

2023-12-04 Thread Peter Xu

On Fri, Dec 01, 2023 at 11:23:33AM -0500, Steven Sistare wrote:
> >> @@ -109,6 +117,7 @@ static int global_state_post_load(void *opaque, int 
> >> version_id)
> >>  return -EINVAL;
> >>  }
> >>  s->state = r;
> >> +vm_set_suspended(s->vm_was_suspended || r == RUN_STATE_SUSPENDED);
> > 
> > IIUC current vm_was_suspended (based on my read of your patch) was not the
> > same as a boolean representing "whether VM is suspended", but only a
> > temporary field to remember that for a VM stop request.  To be explicit, I
> > didn't see this flag set in qemu_system_suspend() in your previous patch.
> > 
> > If so, we can already do:
> > 
> >   vm_set_suspended(s->vm_was_suspended);
> > 
> > Irrelevant of RUN_STATE_SUSPENDED?
> 
> We need both terms of the expression.
> 
> If the vm *is* suspended (RUN_STATE_SUSPENDED), then vm_was_suspended = false.
> We call global_state_store prior to vm_stop_force_state, so the incoming
> side sees s->state = RUN_STATE_SUSPENDED and s->vm_was_suspended = false.

Right.

> However, the runstate is RUN_STATE_INMIGRATE.  When incoming finishes by
> calling vm_start, we need to restore the suspended state.  Thus in 
> global_state_post_load, we must set vm_was_suspended = true.

With the above, shouldn't global_state_get_runstate() (on dest) already fetch
SUSPENDED?  Then I think it should call vm_start(SUSPENDED) when starting.

Maybe you're talking about the special case where autostart==false?  We
used to have this (existing process_incoming_migration_bh()):

if (!global_state_received() ||
global_state_get_runstate() == RUN_STATE_RUNNING) {
if (autostart) {
vm_start();
} else {
runstate_set(RUN_STATE_PAUSED);
}
}

If so maybe I get you, because in the "else" path we do seem to lose the
SUSPENDED state again, but in that case IMHO we should logically set
vm_was_suspended only when we "lose" it - we didn't lose it during
migration, but only until we decided to switch to PAUSED (due to
autostart==false). IOW, change above to something like:

    state = global_state_get_runstate();
    if (!global_state_received() || runstate_is_alive(state)) {
        if (autostart) {
            vm_start(state);
        } else {
            if (runstate_is_suspended(state)) {
                /* Remember suspended state before setting the system to stopped */
                vm_was_suspended = true;
            }
            runstate_set(RUN_STATE_PAUSED);
        }
    }

It may or may not make a functional difference compared with the current
patch, though.  However, it may be clearer to follow vm_was_suspended's
strict definition.

> 
> If the vm *was* suspended, but is currently stopped (eg RUN_STATE_PAUSED),
> then vm_was_suspended = true.  Migration from that state sets
> vm_was_suspended = s->vm_was_suspended = true in global_state_post_load and 
> ends with runstate_set(RUN_STATE_PAUSED).
> 
> I will add a comment here in the code.
>  
> >>  return 0;
> >>  }
> >> @@ -134,6 +143,7 @@ static const VMStateDescription vmstate_globalstate = {
> >>  .fields = (VMStateField[]) {
> >>  VMSTATE_UINT32(size, GlobalState),
> >>  VMSTATE_BUFFER(runstate, GlobalState),
> >> +VMSTATE_BOOL(vm_was_suspended, GlobalState),
> >>  VMSTATE_END_OF_LIST()
> >>  },
> >>  };
> > 
> > I think this will break migration between old/new, unfortunately.  And
> > since the global state exist mostly for every VM, all VM setup should be
> > affected, and over all archs.
> 
> Thanks, I keep forgetting that my binary tricks are no good here.  However,
> I have one other trick up my sleeve, which is to store vm_was_running in
> global_state.runstate[strlen(runstate) + 2].  It is forwards and backwards
> compatible, since that byte is always 0 in older qemu.  It can be implemented
> with a few lines of code change confined to global_state.c, versus many lines 
> spread across files to do it the conventional way using a compat property and
> a subsection.  Sound OK?  

Tricky!  But it sounds okay to me.  I think you're inventing your own way of
being compatible, with the benefit of not relying on the machine type.  If we
go this route, please document the layout clearly, and also what it looked
like in old binaries.

I think it may be good to keep using strings: new binaries would allow more
than one string in the buffer, and we define each of them properly (index 0:
runstate, existed since the start; index 2: suspended, perhaps using "1"/"0"
to express it, while 0x00 means an old binary, etc.).
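
The layout idea can be sketched in plain C.  This is only an illustration
under stated assumptions, not the code from the patch: the helper names
(runstate_buf_pack, runstate_buf_suspended) and RUNSTATE_FLAG_OFFSET are
hypothetical, the flag is stored as '1'/'0', and its position is taken from
the strlen(runstate) + 2 offset mentioned earlier in this thread.

```c
#include <stddef.h>
#include <string.h>

/* Assumption: flag byte lives at strlen(runstate) + 2, as mentioned in the thread. */
#define RUNSTATE_FLAG_OFFSET 2

/*
 * Hypothetical helper: write the runstate name at index 0 and a '1'/'0'
 * flag after it.  All other bytes are zero, which is what an old binary
 * would have sent.  Returns 0 on success, -1 if the buffer is too small.
 */
int runstate_buf_pack(char *buf, size_t size, const char *runstate,
                      int suspended)
{
    size_t len = strlen(runstate);

    /* name + NUL + padding + flag + NUL must all fit */
    if (len + RUNSTATE_FLAG_OFFSET + 2 > size) {
        return -1;
    }
    memset(buf, 0, size);
    memcpy(buf, runstate, len);
    buf[len + RUNSTATE_FLAG_OFFSET] = suspended ? '1' : '0';
    return 0;
}

/*
 * Hypothetical helper: read the flag back.  A 0x00 byte (old sender) and
 * '0' both decode as "not suspended".
 */
int runstate_buf_suspended(const char *buf, size_t size)
{
    const char *nul = memchr(buf, 0, size);
    size_t len;

    if (!nul) {
        return 0;               /* no terminator: treat as old/invalid */
    }
    len = (size_t)(nul - buf);
    if (len + RUNSTATE_FLAG_OFFSET >= size) {
        return 0;               /* no room for a flag byte */
    }
    return buf[len + RUNSTATE_FLAG_OFFSET] == '1';
}
```

A buffer written by an old binary contains only the runstate name followed
by zero bytes, so the reader above reports "not suspended" for it without
needing a compat property.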

I hope this trick will need less code than the subsection solution,
otherwise I'd still consider going with that, which is the "common
solution".

Let's also see whether Juan/Fabiano/others have any opinions.

-- 
Peter Xu

Re: [PATCH] Revert "test/qga: use G_TEST_DIR to locate os-release test file"

2023-12-04 Thread Marc-André Lureau

Hi

On Mon, Dec 4, 2023 at 9:01 PM Andrey Drobyshev
 wrote:
>
> On 12/4/23 18:51, Marc-André Lureau wrote:
> > Hi
> >
> > On Mon, Dec 4, 2023 at 8:33 PM Andrey Drobyshev
> >  wrote:
> >>
> >> Since the commit a85d09269b QGA_OS_RELEASE variable points to the path
> >> relative to the build dir.  Then on qemu-ga startup this path can't be
> >> found as qemu-ga cwd is somewhere else, which leads to the test failure:
> >>
> >>   # ./tests/unit/test-qga -p /qga/guest-get-osinfo
> >>   # random seed: R02S3a90c22d77ff1070fbd844f4959cf4a4
> >>   # Start of qga tests
> >>   **
> >>   ERROR:../tests/unit/test-qga.c:906:test_qga_guest_get_osinfo: 'str' 
> >> should not be NULL
> >>   Bail out! ERROR:../tests/unit/test-qga.c:906:test_qga_guest_get_osinfo: 
> >> 'str' should not be NULL
> >>
> >> Let's obtain the absolute path again.
> >
> > Can you detail how the build and the test is done?
> >
>
> Simple as:
>
> > ./configure --cc=gcc --target-list=x86_64-softmmu --enable-guest-agent && 
> > make -j16
> > cd build; tests/unit/test-qga -p /qga/guest-get-osinfo
>
>
> > If I recall correctly, this change was done in order to move qga to a
> > subproject(), but isn't strictly required at this point. Although I
> > believe it is more correct to lookup test data relative to
> > G_TEST_DIST.
> >
>
> Then we'd have to change cwd of qemu-ga at startup to ensure relative
> paths work.  Right now (with the initial change) it appears broken.

By reverting the patch, it is _still_ broken if you run the test
manually from a different directory (say from tests/unit for example).

With G_TEST_DIST and a proper testing environment, it works from any directory.

Tests are not meant to be run manually; you should run them through
the test runner: meson test -v test-qga

>
> >>
> >> This reverts commit a85d09269bb1a7071d3ce0f2957e3ca9dba7c047.
> >>
> >> Signed-off-by: Andrey Drobyshev 
> >> ---
> >>  tests/unit/test-qga.c | 6 --
> >>  1 file changed, 4 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/tests/unit/test-qga.c b/tests/unit/test-qga.c
> >> index 671e83cb86..47cf5e30ec 100644
> >> --- a/tests/unit/test-qga.c
> >> +++ b/tests/unit/test-qga.c
> >> @@ -1034,10 +1034,12 @@ static void 
> >> test_qga_guest_get_osinfo(gconstpointer data)
> >>  g_autoptr(QDict) ret = NULL;
> >>  char *env[2];
> >>  QDict *val;
> >> +g_autofree gchar *cwd = NULL;
> >>
> >> +cwd = g_get_current_dir();
> >>  env[0] = g_strdup_printf(
> >> -"QGA_OS_RELEASE=%s%c..%cdata%ctest-qga-os-release",
> >> -g_test_get_dir(G_TEST_DIST), G_DIR_SEPARATOR, G_DIR_SEPARATOR, 
> >> G_DIR_SEPARATOR);
> >> +"QGA_OS_RELEASE=%s%ctests%cdata%ctest-qga-os-release",
> >> +cwd, G_DIR_SEPARATOR, G_DIR_SEPARATOR, G_DIR_SEPARATOR);
> >>  env[1] = NULL;
> >>  fixture_setup(&fixture, NULL, env);
> >>
> >> --
> >> 2.39.3
> >>
> >>
> >
> >
>


-- 
Marc-André Lureau

Re: [PATCH] Revert "test/qga: use G_TEST_DIR to locate os-release test file"

2023-12-04 Thread Andrey Drobyshev

On 12/4/23 18:51, Marc-André Lureau wrote:
> Hi
> 
> On Mon, Dec 4, 2023 at 8:33 PM Andrey Drobyshev
>  wrote:
>>
>> Since the commit a85d09269b QGA_OS_RELEASE variable points to the path
>> relative to the build dir.  Then on qemu-ga startup this path can't be
>> found as qemu-ga cwd is somewhere else, which leads to the test failure:
>>
>>   # ./tests/unit/test-qga -p /qga/guest-get-osinfo
>>   # random seed: R02S3a90c22d77ff1070fbd844f4959cf4a4
>>   # Start of qga tests
>>   **
>>   ERROR:../tests/unit/test-qga.c:906:test_qga_guest_get_osinfo: 'str' should 
>> not be NULL
>>   Bail out! ERROR:../tests/unit/test-qga.c:906:test_qga_guest_get_osinfo: 
>> 'str' should not be NULL
>>
>> Let's obtain the absolute path again.
> 
> Can you detail how the build and the test is done?
> 

Simple as:

> ./configure --cc=gcc --target-list=x86_64-softmmu --enable-guest-agent && 
> make -j16
> cd build; tests/unit/test-qga -p /qga/guest-get-osinfo


> If I recall correctly, this change was done in order to move qga to a
> subproject(), but isn't strictly required at this point. Although I
> believe it is more correct to lookup test data relative to
> G_TEST_DIST.
> 

Then we'd have to change cwd of qemu-ga at startup to ensure relative
paths work.  Right now (with the initial change) it appears broken.

>>
>> This reverts commit a85d09269bb1a7071d3ce0f2957e3ca9dba7c047.
>>
>> Signed-off-by: Andrey Drobyshev 
>> ---
>>  tests/unit/test-qga.c | 6 --
>>  1 file changed, 4 insertions(+), 2 deletions(-)
>>
>> diff --git a/tests/unit/test-qga.c b/tests/unit/test-qga.c
>> index 671e83cb86..47cf5e30ec 100644
>> --- a/tests/unit/test-qga.c
>> +++ b/tests/unit/test-qga.c
>> @@ -1034,10 +1034,12 @@ static void test_qga_guest_get_osinfo(gconstpointer 
>> data)
>>  g_autoptr(QDict) ret = NULL;
>>  char *env[2];
>>  QDict *val;
>> +g_autofree gchar *cwd = NULL;
>>
>> +cwd = g_get_current_dir();
>>  env[0] = g_strdup_printf(
>> -"QGA_OS_RELEASE=%s%c..%cdata%ctest-qga-os-release",
>> -g_test_get_dir(G_TEST_DIST), G_DIR_SEPARATOR, G_DIR_SEPARATOR, 
>> G_DIR_SEPARATOR);
>> +"QGA_OS_RELEASE=%s%ctests%cdata%ctest-qga-os-release",
>> +cwd, G_DIR_SEPARATOR, G_DIR_SEPARATOR, G_DIR_SEPARATOR);
>>  env[1] = NULL;
>>  fixture_setup(&fixture, NULL, env);
>>
>> --
>> 2.39.3
>>
>>
> 
>

Re: [PATCH 2/2] linux-user: Fix openat() emulation to not modify atime

2023-12-04 Thread Daniel P . Berrangé

On Thu, Nov 30, 2023 at 07:21:40PM -0800, Shu-Chun Weng wrote:
> Commit b8002058 strengthened openat()'s /proc detection by calling
> realpath(3) on the given path, which allows various paths and symlinks
> that points to the /proc file system to be intercepted correctly.
> 
> Using realpath(3), though, has a side effect that it reads the symlinks
> along the way, and thus changes their atime. The results in the
> following code snippet already get ~now instead of the real atime:
> 
>   int fd = open("/path/to/a/symlink", O_PATH | O_NOFOLLOW);
>   struct stat st;
>   fstat(fd, &st);
>   return st.st_atime;
> 
> This change opens a path that doesn't appear to be part of /proc
> directly and checks the destination of /proc/self/fd/n to determine if
> it actually refers to a file in /proc.
> 
> Neither this nor the existing code works with symlinks or indirect paths
> (e.g.  /tmp/../proc/self/exe) that points to /proc/self/exe because it
> is itself a symlink, and both realpath(3) and /proc/self/fd/n will
> resolve into the location of QEMU.

I wonder if we can detect that by opening with O_NOFOLLOW, then
calling fstatfs() on the FD, and checking f_type == PROC_SUPER_MAGIC
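
A minimal, Linux-only sketch of that check, assuming the kernel allows
fstatfs() on an O_PATH descriptor (documented for reasonably recent
kernels).  The helper names fd_is_on_procfs and path_is_on_procfs are
hypothetical and not part of the patch under review:

```c
#define _GNU_SOURCE             /* for O_PATH */
#include <fcntl.h>
#include <linux/magic.h>        /* PROC_SUPER_MAGIC */
#include <sys/vfs.h>            /* fstatfs(), struct statfs */
#include <unistd.h>

/* Returns 1 if fd refers to an object on procfs, 0 if not, -1 on error. */
int fd_is_on_procfs(int fd)
{
    struct statfs sfs;

    if (fstatfs(fd, &sfs) < 0) {
        return -1;
    }
    return sfs.f_type == PROC_SUPER_MAGIC;
}

/*
 * Open the path without following a trailing symlink, so that a symlink
 * such as /proc/self/exe is inspected itself rather than its target.
 * O_PATH does not read the link, only obtains a handle to it.
 */
int path_is_on_procfs(const char *path)
{
    int fd = open(path, O_PATH | O_NOFOLLOW | O_CLOEXEC);
    int ret;

    if (fd < 0) {
        return -1;
    }
    ret = fd_is_on_procfs(fd);
    close(fd);
    return ret;
}
```

Whether this covers every indirect path the patch cares about is not
established here; it only shows the filesystem-type test itself.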


> diff --git a/linux-user/syscall.c b/linux-user/syscall.c
> index e384e14248..25e2cda10a 100644
> --- a/linux-user/syscall.c
> +++ b/linux-user/syscall.c
> @@ -8308,8 +8308,6 @@ static int open_net_route(CPUArchState *cpu_env, int fd)
>  int do_guest_openat(CPUArchState *cpu_env, int dirfd, const char *fname,
>  int flags, mode_t mode, bool safe)
>  {
> -g_autofree char *proc_name = NULL;
> -const char *pathname;
>  struct fake_open {
>  const char *filename;
>  int (*fill)(CPUArchState *cpu_env, int fd);
> @@ -8333,13 +8331,39 @@ int do_guest_openat(CPUArchState *cpu_env, int dirfd, 
> const char *fname,
>  #endif
>  { NULL, NULL, NULL }
>  };
> +char pathname[PATH_MAX];
>  
> -/* if this is a file from /proc/ filesystem, expand full name */
> -proc_name = realpath(fname, NULL);
> -if (proc_name && strncmp(proc_name, "/proc/", 6) == 0) {
> -pathname = proc_name;
> +if (strncmp(fname, "/proc/", 6) == 0) {
> +pstrcpy(pathname, sizeof(pathname), fname);
>  } else {
> -pathname = fname;
> +char procpath[PATH_MAX];
> +int fd, n;
> +
> +if (safe) {
> +fd = safe_openat(dirfd, path(fname), flags, mode);
> +} else {
> +fd = openat(dirfd, path(fname), flags, mode);
> +}
> +if (fd < 0) {
> +return fd;
> +}
> +
> +/*
> + * Try to get the real path of the file we just opened. We avoid 
> calling
> + * `realpath(3)` because it calls `readlink(2)` on symlinks which
> + * changes their atime. Note that since `/proc/self/exe` is a 
> symlink,
> + * `pathname` will never resolves to it (neither will `realpath(3)`).
> + * That's why we check `fname` against the "/proc/" prefix first.
> + */
> +snprintf(procpath, sizeof(procpath), "/proc/self/fd/%d", fd);

g_strdup_printf() + g_autofree to avoid this PATH_MAX buffer

> +n = readlink(procpath, pathname, sizeof(pathname));
> +pathname[n < sizeof(pathname) ? n : sizeof(pathname)] = '\0';

If you call lstat() then st_size will tell you how big the buffer
needs to be for a subsequent readlink(), which can be allocated
on the heap and released with g_autofree, avoiding the other PATH_MAX
buffer
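
A rough sketch of that approach in plain POSIX C, using malloc/free where
QEMU code would use g_autofree.  read_link_alloc is a hypothetical name, and
the retry loop is an added assumption to cope with filesystems whose
symlinks report st_size == 0 (procfs links do):

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: return a heap-allocated, NUL-terminated copy of the
 * symlink target of @path, or NULL on error.  The caller frees the result.
 */
char *read_link_alloc(const char *path)
{
    struct stat sb;
    size_t size;
    char *buf = NULL;

    if (lstat(path, &sb) < 0) {
        return NULL;
    }
    /* st_size is the target length for ordinary symlinks; fall back if 0 */
    size = sb.st_size > 0 ? (size_t)sb.st_size + 1 : 64;

    for (;;) {
        char *tmp = realloc(buf, size);
        ssize_t n;

        if (!tmp) {
            free(buf);
            return NULL;
        }
        buf = tmp;

        n = readlink(path, buf, size);
        if (n < 0) {
            free(buf);
            return NULL;
        }
        if ((size_t)n < size) {
            buf[n] = '\0';      /* readlink() does not terminate the string */
            return buf;
        }
        size *= 2;              /* result may be truncated: grow and retry */
    }
}
```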

> +
> +/* if this is not a file from /proc/ filesystem, the fd is good 
> as-is */
> +if (strncmp(pathname, "/proc/", 6) != 0) {
> +return fd;
> +}
> +close(fd);
>  }
>  
>  if (is_proc_myself(pathname, "exe")) {
> @@ -8390,9 +8414,9 @@ int do_guest_openat(CPUArchState *cpu_env, int dirfd, 
> const char *fname,
>  }
>  
>  if (safe) {
> -return safe_openat(dirfd, path(pathname), flags, mode);
> +return safe_openat(dirfd, pathname, flags, mode);
>  } else {
> -return openat(dirfd, path(pathname), flags, mode);
> +return openat(dirfd, pathname, flags, mode);
>  }
>  }
>  
> 

With regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|

Re: [RFC 0/8] Support generic Luks encryption

2023-12-04 Thread Daniel P . Berrangé

On Tue, Dec 05, 2023 at 12:41:16AM +0800, Yong Huang wrote:
> On Tue, Dec 5, 2023 at 12:24 AM Daniel P. Berrangé 
> wrote:
> 
> > On Tue, Dec 05, 2023 at 12:06:17AM +0800, Hyman Huang wrote:
> > > This functionality was motivated by the following to-do list seen
> > > in crypto documents:
> > > https://wiki.qemu.org/Features/Block/Crypto
> > >
> > > The last chapter says we should "separate header volume":
> > >
> > > The LUKS format has ability to store the header in a separate volume
> > > from the payload. We should extend the LUKS driver in QEMU to support
> > > this use case.
> > >
> > > As a proof-of-concept, I've created this patchset, which I've named
> > > the Gluks: generic luks. As their name suggests, they offer encryption
> > > for any format that QEMU theoretically supports.
> >
> > I don't see the point in creating a new driver.
> >
> > I would expect detached header support to be implemented via an
> > optional new 'header' field in the existing driver. ie
> >
> > diff --git a/qapi/block-core.json b/qapi/block-core.json
> > index ca390c5700..48d1f2a974 100644
> > --- a/qapi/block-core.json
> > +++ b/qapi/block-core.json
> > @@ -3352,11 +3352,15 @@
> >  # decryption key (since 2.6). Mandatory except when doing a
> >  # metadata-only probe of the image.
> >  #
> > +# @header: optional reference to the location of a blockdev
> > +# storing a detached LUKS header
> > +#
> >  # Since: 2.9
> >  ##
> >  { 'struct': 'BlockdevOptionsLUKS',
> >'base': 'BlockdevOptionsGenericFormat',
> > -  'data': { '*key-secret': 'str' } }
> > +  'data': { '*key-secret': 'str',
> > +'*header': 'BlockdevRef'} }
> >
> >  ##
> >  # @BlockdevOptionsGenericCOWFormat:
> > @@ -4941,9 +4945,18 @@
> >  #
> >  # Driver specific image creation options for LUKS.
> >  #
> > -# @file: Node to create the image format on
> > +# @file: Node to create the image format on. Mandatory
> > +# unless a detached header file is specified using
> > +# @header.
> >  #
> > -# @size: Size of the virtual disk in bytes
> > +# @size: Size of the virtual disk in bytes.  Mandatory
> > +# unless a detached header file is specified using
> > +# @header.
> > +#
> > +# @header: optional reference to the location of a blockdev
> > +# storing a detached LUKS header. The @file option is
> > +# optional when this is given, unless it is desired
> > +# to perform pre-allocation
> >  #
> >  # @preallocation: Preallocation mode for the new image (since: 4.2)
> >  # (default: off; allowed values: off, metadata, falloc, full)
> > @@ -4952,8 +4965,9 @@
> >  ##
> >  { 'struct': 'BlockdevCreateOptionsLUKS',
> >'base': 'QCryptoBlockCreateOptionsLUKS',
> > -  'data': { 'file': 'BlockdevRef',
> > -'size': 'size',
> > +  'data': { '*file':'BlockdevRef',
> > +'*size':'size',
> > +'*header':  'BlockdevRef',
> >  '*preallocation':   'PreallocMode' } }
> >
> >  ##
> >
> > It ends up giving basically the same workflow as you outline,
> > without needing the new block driver
> >
> 
> How about the design and usage, could it be simpler? Any advice? :)
> 
> 
> As you can see below, the Gluks format block layer driver's design is
> quite simple.
> 
>          virtio-blk/vhost-user-blk...(front-end device)
>               ^
>               |
>              Gluks   (format-like disk node)
>           /         \
>        file       header (blockdev reference)
>         /             \
>      file            file (protocol node)
>        |               |
>    disk data       Luks data

What I suggested above ends up with the exact same block driver
graph, unless I'm missing something.

With regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|

Re: [PATCH] Revert "test/qga: use G_TEST_DIR to locate os-release test file"

2023-12-04 Thread Marc-André Lureau

Hi

On Mon, Dec 4, 2023 at 8:33 PM Andrey Drobyshev
 wrote:
>
> Since the commit a85d09269b QGA_OS_RELEASE variable points to the path
> relative to the build dir.  Then on qemu-ga startup this path can't be
> found as qemu-ga cwd is somewhere else, which leads to the test failure:
>
>   # ./tests/unit/test-qga -p /qga/guest-get-osinfo
>   # random seed: R02S3a90c22d77ff1070fbd844f4959cf4a4
>   # Start of qga tests
>   **
>   ERROR:../tests/unit/test-qga.c:906:test_qga_guest_get_osinfo: 'str' should 
> not be NULL
>   Bail out! ERROR:../tests/unit/test-qga.c:906:test_qga_guest_get_osinfo: 
> 'str' should not be NULL
>
> Let's obtain the absolute path again.

Can you detail how the build and the test is done?

If I recall correctly, this change was done in order to move qga to a
subproject(), but isn't strictly required at this point. Although I
believe it is more correct to lookup test data relative to
G_TEST_DIST.

>
> This reverts commit a85d09269bb1a7071d3ce0f2957e3ca9dba7c047.
>
> Signed-off-by: Andrey Drobyshev 
> ---
>  tests/unit/test-qga.c | 6 --
>  1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/tests/unit/test-qga.c b/tests/unit/test-qga.c
> index 671e83cb86..47cf5e30ec 100644
> --- a/tests/unit/test-qga.c
> +++ b/tests/unit/test-qga.c
> @@ -1034,10 +1034,12 @@ static void test_qga_guest_get_osinfo(gconstpointer 
> data)
>  g_autoptr(QDict) ret = NULL;
>  char *env[2];
>  QDict *val;
> +g_autofree gchar *cwd = NULL;
>
> +cwd = g_get_current_dir();
>  env[0] = g_strdup_printf(
> -"QGA_OS_RELEASE=%s%c..%cdata%ctest-qga-os-release",
> -g_test_get_dir(G_TEST_DIST), G_DIR_SEPARATOR, G_DIR_SEPARATOR, 
> G_DIR_SEPARATOR);
> +"QGA_OS_RELEASE=%s%ctests%cdata%ctest-qga-os-release",
> +cwd, G_DIR_SEPARATOR, G_DIR_SEPARATOR, G_DIR_SEPARATOR);
>  env[1] = NULL;
>  fixture_setup(&fixture, NULL, env);
>
> --
> 2.39.3
>
>


-- 
Marc-André Lureau

Re: [PATCH-for-8.2?] hw/ufs: avoid generating the same ID string for different LU devices

2023-12-04 Thread Philippe Mathieu-Daudé


On 4/12/23 16:05, Akinobu Mita wrote:

QEMU would not start when trying to create two UFS host controllers and
a UFS logical unit for each with the following options:

-device ufs,id=bus0 \
-device ufs-lu,drive=drive1,bus=bus0,lun=0 \
-device ufs,id=bus1 \
-device ufs-lu,drive=drive2,bus=bus1,lun=0 \

This is because the same ID string ("0:0:0/scsi-disk") is generated
for both UFS logical units.

To fix this issue, prepend the parent pci device's path to make
the ID string unique.
(":00:03.0/0:0:0/scsi-disk" and ":00:04.0/0:0:0/scsi-disk")
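
As a rough illustration of why the prefix removes the collision, the
following standalone sketch composes ID strings in the same shape as the
ones quoted above.  It is not QEMU code: format_lu_id and lu_ids_equal are
hypothetical helpers, and the PCI-style parent paths used in the usage note
are made-up example values.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: build "<parent-path>/<channel>:<id>:<lun>/scsi-disk",
 * or just "<channel>:<id>:<lun>/scsi-disk" when no parent path is given.
 * Returns the snprintf() result.
 */
int format_lu_id(char *out, size_t size, const char *parent_path,
                 int channel, int id, int lun)
{
    if (parent_path && parent_path[0] != '\0') {
        return snprintf(out, size, "%s/%d:%d:%d/scsi-disk",
                        parent_path, channel, id, lun);
    }
    return snprintf(out, size, "%d:%d:%d/scsi-disk", channel, id, lun);
}

/* Two logical units collide when their ID strings are identical. */
int lu_ids_equal(const char *a, const char *b)
{
    return strcmp(a, b) == 0;
}
```

Without a parent path, two units at lun 0 on different controllers both
format to "0:0:0/scsi-disk"; with distinct parent paths such as
"0000:00:03.0" and "0000:00:04.0" (example values) the strings differ.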

Fixes: 096434fea13a ("hw/ufs: Modify lu.c to share codes with SCSI subsystem")
Signed-off-by: Akinobu Mita 


Reviewed-by: Philippe Mathieu-Daudé 


---
  hw/ufs/ufs.c | 8 
  1 file changed, 8 insertions(+)

diff --git a/hw/ufs/ufs.c b/hw/ufs/ufs.c
index 68c5f1f6c9..eccdb852a0 100644
--- a/hw/ufs/ufs.c
+++ b/hw/ufs/ufs.c
@@ -1323,9 +1323,17 @@ static bool ufs_bus_check_address(BusState *qbus, 
DeviceState *qdev,
  return true;
  }
  
+static char *ufs_bus_get_dev_path(DeviceState *dev)

+{
+BusState *bus = qdev_get_parent_bus(dev);
+
+return qdev_get_dev_path(bus->parent);
+}
+
  static void ufs_bus_class_init(ObjectClass *class, void *data)
  {
  BusClass *bc = BUS_CLASS(class);
+bc->get_dev_path = ufs_bus_get_dev_path;
  bc->check_address = ufs_bus_check_address;
  }

[PATCH v2 0/4] scsi: eliminate AioContext lock

2023-12-04 Thread Stefan Hajnoczi

v2:
- Reschedule BH in new AioContext if change is detected [Kevin]
- Drop stray "remember" in Patch 2's commit description [Eric]

The SCSI subsystem uses the AioContext lock to protect internal state. This is
necessary because the main loop and the IOThread can access SCSI state in
parallel. This inter-thread access happens during scsi_device_purge_requests()
and scsi_dma_restart_cb().

This patch series modifies the code so SCSI state is only accessed from the
IOThread that is executing requests. Once this has been achieved the AioContext
lock is no longer necessary.

Note that a few aio_context_acquire()/aio_context_release() calls still remain
after this series. They surround API calls that invoke AIO_WAIT_WHILE() and
therefore still rely on the AioContext lock for now.

Stefan Hajnoczi (4):
  scsi: only access SCSIDevice->requests from one thread
  virtio-scsi: don't lock AioContext around
virtio_queue_aio_attach_host_notifier()
  scsi: don't lock AioContext in I/O code path
  dma-helpers: don't lock AioContext in dma_blk_cb()

 include/hw/scsi/scsi.h  |   7 +-
 hw/scsi/scsi-bus.c  | 180 ++--
 hw/scsi/scsi-disk.c |  23 
 hw/scsi/scsi-generic.c  |  20 +---
 hw/scsi/virtio-scsi-dataplane.c |   8 +-
 system/dma-helpers.c|   7 +-
 6 files changed, 136 insertions(+), 109 deletions(-)

-- 
2.43.0

Re: [RFC 0/8] Support generic Luks encryption

2023-12-04 Thread Yong Huang

On Tue, Dec 5, 2023 at 12:24 AM Daniel P. Berrangé 
wrote:

> On Tue, Dec 05, 2023 at 12:06:17AM +0800, Hyman Huang wrote:
> > This functionality was motivated by the following to-do list seen
> > in crypto documents:
> > https://wiki.qemu.org/Features/Block/Crypto
> >
> > The last chapter says we should "separate header volume":
> >
> > The LUKS format has ability to store the header in a separate volume
> > from the payload. We should extend the LUKS driver in QEMU to support
> > this use case.
> >
> > As a proof-of-concept, I've created this patchset, which I've named
> > the Gluks: generic luks. As their name suggests, they offer encryption
> > for any format that QEMU theoretically supports.
>
> I don't see the point in creating a new driver.
>
> I would expect detached header support to be implemented via an
> optional new 'header' field in the existing driver. ie
>
> diff --git a/qapi/block-core.json b/qapi/block-core.json
> index ca390c5700..48d1f2a974 100644
> --- a/qapi/block-core.json
> +++ b/qapi/block-core.json
> @@ -3352,11 +3352,15 @@
>  # decryption key (since 2.6). Mandatory except when doing a
>  # metadata-only probe of the image.
>  #
> +# @header: optional reference to the location of a blockdev
> +# storing a detached LUKS header
> +#
>  # Since: 2.9
>  ##
>  { 'struct': 'BlockdevOptionsLUKS',
>'base': 'BlockdevOptionsGenericFormat',
> -  'data': { '*key-secret': 'str' } }
> +  'data': { '*key-secret': 'str',
> +'*header': 'BlockdevRef'} }
>
>  ##
>  # @BlockdevOptionsGenericCOWFormat:
> @@ -4941,9 +4945,18 @@
>  #
>  # Driver specific image creation options for LUKS.
>  #
> -# @file: Node to create the image format on
> +# @file: Node to create the image format on. Mandatory
> +# unless a detached header file is specified using
> +# @header.
>  #
> -# @size: Size of the virtual disk in bytes
> +# @size: Size of the virtual disk in bytes.  Mandatory
> +# unless a detached header file is specified using
> +# @header.
> +#
> +# @header: optional reference to the location of a blockdev
> +# storing a detached LUKS header. The @file option is
> +# optional when this is given, unless it is desired
> +# to perform pre-allocation
>  #
>  # @preallocation: Preallocation mode for the new image (since: 4.2)
>  # (default: off; allowed values: off, metadata, falloc, full)
> @@ -4952,8 +4965,9 @@
>  ##
>  { 'struct': 'BlockdevCreateOptionsLUKS',
>'base': 'QCryptoBlockCreateOptionsLUKS',
> -  'data': { 'file': 'BlockdevRef',
> -'size': 'size',
> +  'data': { '*file':'BlockdevRef',
> +'*size':'size',
> +'*header':  'BlockdevRef',
>  '*preallocation':   'PreallocMode' } }
>
>  ##
>
> It ends up giving basically the same workflow as you outline,
> without needing the new block driver
>

How about the design and usage, could it be simpler? Any advice? :)


As you can see below, the Gluks format block layer driver's design is
quite simple.

         virtio-blk/vhost-user-blk...(front-end device)
              ^
              |
             Gluks   (format-like disk node)
          /         \
       file       header (blockdev reference)
        /             \
     file            file (protocol node)
       |               |
   disk data       Luks data


>
> With regards,
> Daniel
> --
> |: https://berrange.com  -o-
> https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org -o-
> https://fstop138.berrange.com :|
> |: https://entangle-photo.org-o-
> https://www.instagram.com/dberrange :|
>
>

-- 
Best regards

[PATCH v2 1/4] scsi: only access SCSIDevice->requests from one thread

2023-12-04 Thread Stefan Hajnoczi

Stop depending on the AioContext lock and instead access
SCSIDevice->requests from only one thread at a time:
- When the VM is running only the BlockBackend's AioContext may access
  the requests list.
- When the VM is stopped only the main loop may access the requests
  list.

These constraints protect the requests list without the need for locking
in the I/O code path.

Note that multiple IOThreads are not supported yet because the code
assumes all SCSIRequests are executed from a single AioContext. Leave
that as future work.

Signed-off-by: Stefan Hajnoczi 
Reviewed-by: Eric Blake 
---
 include/hw/scsi/scsi.h |   7 +-
 hw/scsi/scsi-bus.c | 180 -
 2 files changed, 130 insertions(+), 57 deletions(-)

diff --git a/include/hw/scsi/scsi.h b/include/hw/scsi/scsi.h
index 3692ca82f3..10c4e8288d 100644
--- a/include/hw/scsi/scsi.h
+++ b/include/hw/scsi/scsi.h
@@ -69,14 +69,19 @@ struct SCSIDevice
 {
 DeviceState qdev;
 VMChangeStateEntry *vmsentry;
-QEMUBH *bh;
 uint32_t id;
 BlockConf conf;
 SCSISense unit_attention;
 bool sense_is_ua;
 uint8_t sense[SCSI_SENSE_BUF_SIZE];
 uint32_t sense_len;
+
+/*
+ * The requests list is only accessed from the AioContext that executes
+ * requests or from the main loop when IOThread processing is stopped.
+ */
 QTAILQ_HEAD(, SCSIRequest) requests;
+
 uint32_t channel;
 uint32_t lun;
 int blocksize;
diff --git a/hw/scsi/scsi-bus.c b/hw/scsi/scsi-bus.c
index fc4b77fdb0..f3ec11f892 100644
--- a/hw/scsi/scsi-bus.c
+++ b/hw/scsi/scsi-bus.c
@@ -85,6 +85,88 @@ SCSIDevice *scsi_device_get(SCSIBus *bus, int channel, int 
id, int lun)
 return d;
 }
 
+/*
+ * Invoke @fn() for each enqueued request in device @s. Must be called from the
+ * main loop thread while the guest is stopped. This is only suitable for
+ * vmstate ->put(), use scsi_device_for_each_req_async() for other cases.
+ */
+static void scsi_device_for_each_req_sync(SCSIDevice *s,
+  void (*fn)(SCSIRequest *, void *),
+  void *opaque)
+{
+SCSIRequest *req;
+SCSIRequest *next_req;
+
+assert(!runstate_is_running());
+assert(qemu_in_main_thread());
+
+QTAILQ_FOREACH_SAFE(req, &s->requests, next, next_req) {
+fn(req, opaque);
+}
+}
+
+typedef struct {
+SCSIDevice *s;
+void (*fn)(SCSIRequest *, void *);
+void *fn_opaque;
+} SCSIDeviceForEachReqAsyncData;
+
+static void scsi_device_for_each_req_async_bh(void *opaque)
+{
+g_autofree SCSIDeviceForEachReqAsyncData *data = opaque;
+SCSIDevice *s = data->s;
+AioContext *ctx;
+SCSIRequest *req;
+SCSIRequest *next;
+
+/*
+ * If the AioContext changed before this BH was called then reschedule into
+ * the new AioContext before accessing ->requests. This can happen when
+ * scsi_device_for_each_req_async() is called and then the AioContext is
+ * changed before BHs are run.
+ */
+ctx = blk_get_aio_context(s->conf.blk);
+if (ctx != qemu_get_current_aio_context()) {
+aio_bh_schedule_oneshot(ctx, scsi_device_for_each_req_async_bh, data);
+return;
+}
+
+QTAILQ_FOREACH_SAFE(req, &s->requests, next, next) {
+data->fn(req, data->fn_opaque);
+}
+
+/* Drop the reference taken by scsi_device_for_each_req_async() */
+object_unref(OBJECT(s));
+}
+
+/*
+ * Schedule @fn() to be invoked for each enqueued request in device @s. @fn()
+ * runs in the AioContext that is executing the request.
+ */
+static void scsi_device_for_each_req_async(SCSIDevice *s,
+   void (*fn)(SCSIRequest *, void *),
+   void *opaque)
+{
+assert(qemu_in_main_thread());
+
+SCSIDeviceForEachReqAsyncData *data =
+g_new(SCSIDeviceForEachReqAsyncData, 1);
+
+data->s = s;
+data->fn = fn;
+data->fn_opaque = opaque;
+
+/*
+ * Hold a reference to the SCSIDevice until
+ * scsi_device_for_each_req_async_bh() finishes.
+ */
+object_ref(OBJECT(s));
+
+aio_bh_schedule_oneshot(blk_get_aio_context(s->conf.blk),
+scsi_device_for_each_req_async_bh,
+data);
+}
+
 static void scsi_device_realize(SCSIDevice *s, Error **errp)
 {
 SCSIDeviceClass *sc = SCSI_DEVICE_GET_CLASS(s);
@@ -144,20 +226,18 @@ void scsi_bus_init_named(SCSIBus *bus, size_t bus_size, 
DeviceState *host,
 qbus_set_bus_hotplug_handler(BUS(bus));
 }
 
-static void scsi_dma_restart_bh(void *opaque)
+void scsi_req_retry(SCSIRequest *req)
 {
-SCSIDevice *s = opaque;
-SCSIRequest *req, *next;
+req->retry = true;
+}
 
-qemu_bh_delete(s->bh);
-s->bh = NULL;
-
-aio_context_acquire(blk_get_aio_context(s->conf.blk));
-QTAILQ_FOREACH_SAFE(req, &s->requests, next, next) {
-scsi_req_ref(req);
-if (req->retry) {
-req->ret

[PATCH v2 2/4] virtio-scsi: don't lock AioContext around virtio_queue_aio_attach_host_notifier()

2023-12-04 Thread Stefan Hajnoczi

virtio_queue_aio_attach_host_notifier() does not require the AioContext
lock. Stop taking the lock and add an explicit smp_wmb() because we were
relying on the implicit barrier in the AioContext lock before.
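
The ordering requirement can be illustrated with standard C11 atomics.  This
is an analogue only, assuming a release fence plays the role of smp_wmb()
and an acquire fence stands in for the barrier on the read side; the names
publish, try_consume, payload and ready are hypothetical and do not appear
in QEMU:

```c
#include <stdatomic.h>
#include <stdbool.h>

static int payload;             /* plain data written before publication */
static atomic_bool ready;       /* flag the reader polls */

/* Writer: store the data, then a write-side fence, then make it visible. */
void publish(int value)
{
    payload = value;
    atomic_thread_fence(memory_order_release);   /* write barrier analogue */
    atomic_store_explicit(&ready, true, memory_order_relaxed);
}

/* Reader: observe the flag, then a read-side fence, then read the data. */
bool try_consume(int *out)
{
    if (!atomic_load_explicit(&ready, memory_order_relaxed)) {
        return false;
    }
    atomic_thread_fence(memory_order_acquire);   /* pairs with the release */
    *out = payload;
    return true;
}
```

The point is the pairing: a reader that sees the published flag is also
guaranteed to see the stores made before the writer's fence, which is the
property the lock used to provide implicitly.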

Signed-off-by: Stefan Hajnoczi 
Reviewed-by: Eric Blake 
Reviewed-by: Kevin Wolf 
---
 hw/scsi/virtio-scsi-dataplane.c | 8 +---
 1 file changed, 1 insertion(+), 7 deletions(-)

diff --git a/hw/scsi/virtio-scsi-dataplane.c b/hw/scsi/virtio-scsi-dataplane.c
index 1e684beebe..135e23fe54 100644
--- a/hw/scsi/virtio-scsi-dataplane.c
+++ b/hw/scsi/virtio-scsi-dataplane.c
@@ -149,23 +149,17 @@ int virtio_scsi_dataplane_start(VirtIODevice *vdev)
 
 memory_region_transaction_commit();
 
-/*
- * These fields are visible to the IOThread so we rely on implicit barriers
- * in aio_context_acquire() on the write side and aio_notify_accept() on
- * the read side.
- */
 s->dataplane_starting = false;
 s->dataplane_started = true;
+smp_wmb(); /* paired with aio_notify_accept() */
 
 if (s->bus.drain_count == 0) {
-aio_context_acquire(s->ctx);
 virtio_queue_aio_attach_host_notifier(vs->ctrl_vq, s->ctx);
 virtio_queue_aio_attach_host_notifier_no_poll(vs->event_vq, s->ctx);
 
 for (i = 0; i < vs->conf.num_queues; i++) {
 virtio_queue_aio_attach_host_notifier(vs->cmd_vqs[i], s->ctx);
 }
-aio_context_release(s->ctx);
 }
 return 0;
 
-- 
2.43.0

[PATCH v2 4/4] dma-helpers: don't lock AioContext in dma_blk_cb()

2023-12-04 Thread Stefan Hajnoczi

Commit abfcd2760b3e ("dma-helpers: prevent dma_blk_cb() vs
dma_aio_cancel() race") acquired the AioContext lock inside dma_blk_cb()
to avoid a race with scsi_device_purge_requests() running in the main
loop thread.

The SCSI code no longer calls dma_aio_cancel() from the main loop thread
while I/O is running in the IOThread AioContext. Therefore it is no
longer necessary to take this lock to protect DMAAIOCB fields. The
->cb() function also does not require the lock because blk_aio_*() and
friends do not need the AioContext lock.

Both hw/ide/core.c and hw/ide/macio.c also call dma_blk_io() but don't
rely on it taking the AioContext lock, so this change is safe.

Signed-off-by: Stefan Hajnoczi 
Reviewed-by: Eric Blake 
Reviewed-by: Kevin Wolf 
---
 system/dma-helpers.c | 7 ++-
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/system/dma-helpers.c b/system/dma-helpers.c
index 36211acc7e..528117f256 100644
--- a/system/dma-helpers.c
+++ b/system/dma-helpers.c
@@ -119,13 +119,12 @@ static void dma_blk_cb(void *opaque, int ret)
 
 trace_dma_blk_cb(dbs, ret);
 
-aio_context_acquire(ctx);
 dbs->acb = NULL;
 dbs->offset += dbs->iov.size;
 
 if (dbs->sg_cur_index == dbs->sg->nsg || ret < 0) {
 dma_complete(dbs, ret);
-goto out;
+return;
 }
 dma_blk_unmap(dbs);
 
@@ -168,7 +167,7 @@ static void dma_blk_cb(void *opaque, int ret)
 trace_dma_map_wait(dbs);
 dbs->bh = aio_bh_new(ctx, reschedule_dma, dbs);
 cpu_register_map_client(dbs->bh);
-goto out;
+return;
 }
 
 if (!QEMU_IS_ALIGNED(dbs->iov.size, dbs->align)) {
@@ -179,8 +178,6 @@ static void dma_blk_cb(void *opaque, int ret)
 dbs->acb = dbs->io_func(dbs->offset, &dbs->iov,
 dma_blk_cb, dbs, dbs->io_func_opaque);
 assert(dbs->acb);
-out:
-aio_context_release(ctx);
 }
 
 static void dma_aio_cancel(BlockAIOCB *acb)
-- 
2.43.0
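
For readers skimming the diff, the control-flow change above can be sketched in isolation. This is only an illustrative model, not the dma-helpers code: MockLock, callback_locked() and callback_unlocked() are hypothetical names, and the "lock" is a plain counter standing in for the AioContext lock.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a context lock; counts outstanding holds. */
typedef struct {
    int depth;
} MockLock;

void mock_acquire(MockLock *l) { l->depth++; }
void mock_release(MockLock *l) { l->depth--; }

/* Old shape: every exit path must funnel through "out" to drop the lock. */
int callback_locked(MockLock *l, bool done)
{
    int steps = 0;

    mock_acquire(l);
    steps++;
    if (done) {
        goto out;
    }
    steps++;            /* continue with the next chunk of work */
out:
    mock_release(l);
    return steps;
}

/* New shape: no lock is taken, so early exits become plain returns. */
int callback_unlocked(bool done)
{
    int steps = 1;

    if (done) {
        return steps;
    }
    return steps + 1;   /* continue with the next chunk of work */
}
```

Both variants do the same work; the second simply has no cleanup label left to jump to, which is why the patch can replace each "goto out" with "return".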

[PATCH v2 3/4] scsi: don't lock AioContext in I/O code path

2023-12-04 Thread Stefan Hajnoczi

blk_aio_*() doesn't require the AioContext lock, and the SCSI subsystem's
internal state no longer requires it either.

Signed-off-by: Stefan Hajnoczi 
Reviewed-by: Eric Blake 
Acked-by: Kevin Wolf 
---
 hw/scsi/scsi-disk.c| 23 ---
 hw/scsi/scsi-generic.c | 20 +++-
 2 files changed, 3 insertions(+), 40 deletions(-)

diff --git a/hw/scsi/scsi-disk.c b/hw/scsi/scsi-disk.c
index 6691f5edb8..2c1bbb3530 100644
--- a/hw/scsi/scsi-disk.c
+++ b/hw/scsi/scsi-disk.c
@@ -273,8 +273,6 @@ static void scsi_aio_complete(void *opaque, int ret)
 SCSIDiskReq *r = (SCSIDiskReq *)opaque;
 SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev, r->req.dev);
 
-aio_context_acquire(blk_get_aio_context(s->qdev.conf.blk));
-
 assert(r->req.aiocb != NULL);
 r->req.aiocb = NULL;
 
@@ -286,7 +284,6 @@ static void scsi_aio_complete(void *opaque, int ret)
 scsi_req_complete(&r->req, GOOD);
 
 done:
-aio_context_release(blk_get_aio_context(s->qdev.conf.blk));
 scsi_req_unref(&r->req);
 }
 
@@ -394,8 +391,6 @@ static void scsi_read_complete(void *opaque, int ret)
 SCSIDiskReq *r = (SCSIDiskReq *)opaque;
 SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev, r->req.dev);
 
-aio_context_acquire(blk_get_aio_context(s->qdev.conf.blk));
-
 assert(r->req.aiocb != NULL);
 r->req.aiocb = NULL;
 
@@ -406,7 +401,6 @@ static void scsi_read_complete(void *opaque, int ret)
 trace_scsi_disk_read_complete(r->req.tag, r->qiov.size);
 }
 scsi_read_complete_noio(r, ret);
-aio_context_release(blk_get_aio_context(s->qdev.conf.blk));
 }
 
 /* Actually issue a read to the block device.  */
@@ -448,8 +442,6 @@ static void scsi_do_read_cb(void *opaque, int ret)
 SCSIDiskReq *r = (SCSIDiskReq *)opaque;
 SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev, r->req.dev);
 
-aio_context_acquire(blk_get_aio_context(s->qdev.conf.blk));
-
 assert (r->req.aiocb != NULL);
 r->req.aiocb = NULL;
 
@@ -459,7 +451,6 @@ static void scsi_do_read_cb(void *opaque, int ret)
 block_acct_done(blk_get_stats(s->qdev.conf.blk), &r->acct);
 }
 scsi_do_read(opaque, ret);
-aio_context_release(blk_get_aio_context(s->qdev.conf.blk));
 }
 
 /* Read more data from scsi device into buffer.  */
@@ -533,8 +524,6 @@ static void scsi_write_complete(void * opaque, int ret)
 SCSIDiskReq *r = (SCSIDiskReq *)opaque;
 SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev, r->req.dev);
 
-aio_context_acquire(blk_get_aio_context(s->qdev.conf.blk));
-
 assert (r->req.aiocb != NULL);
 r->req.aiocb = NULL;
 
@@ -544,7 +533,6 @@ static void scsi_write_complete(void * opaque, int ret)
 block_acct_done(blk_get_stats(s->qdev.conf.blk), &r->acct);
 }
 scsi_write_complete_noio(r, ret);
-aio_context_release(blk_get_aio_context(s->qdev.conf.blk));
 }
 
 static void scsi_write_data(SCSIRequest *req)
@@ -1742,8 +1730,6 @@ static void scsi_unmap_complete(void *opaque, int ret)
 SCSIDiskReq *r = data->r;
 SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev, r->req.dev);
 
-aio_context_acquire(blk_get_aio_context(s->qdev.conf.blk));
-
 assert(r->req.aiocb != NULL);
 r->req.aiocb = NULL;
 
@@ -1754,7 +1740,6 @@ static void scsi_unmap_complete(void *opaque, int ret)
 block_acct_done(blk_get_stats(s->qdev.conf.blk), &r->acct);
 scsi_unmap_complete_noio(data, ret);
 }
-aio_context_release(blk_get_aio_context(s->qdev.conf.blk));
 }
 
 static void scsi_disk_emulate_unmap(SCSIDiskReq *r, uint8_t *inbuf)
@@ -1822,8 +1807,6 @@ static void scsi_write_same_complete(void *opaque, int ret)
 SCSIDiskReq *r = data->r;
 SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev, r->req.dev);
 
-aio_context_acquire(blk_get_aio_context(s->qdev.conf.blk));
-
 assert(r->req.aiocb != NULL);
 r->req.aiocb = NULL;
 
@@ -1847,7 +1830,6 @@ static void scsi_write_same_complete(void *opaque, int ret)
data->sector << BDRV_SECTOR_BITS,
&data->qiov, 0,
scsi_write_same_complete, data);
-aio_context_release(blk_get_aio_context(s->qdev.conf.blk));
 return;
 }
 
@@ -1857,7 +1839,6 @@ done:
 scsi_req_unref(&r->req);
 qemu_vfree(data->iov.iov_base);
 g_free(data);
-aio_context_release(blk_get_aio_context(s->qdev.conf.blk));
 }
 
 static void scsi_disk_emulate_write_same(SCSIDiskReq *r, uint8_t *inbuf)
@@ -2810,7 +2791,6 @@ static void scsi_block_sgio_complete(void *opaque, int ret)
 {
 SCSIBlockReq *req = (SCSIBlockReq *)opaque;
 SCSIDiskReq *r = &req->req;
-SCSIDevice *s = r->req.dev;
 sg_io_hdr_t *io_hdr = &req->io_header;
 
 if (ret == 0) {
@@ -2827,13 +2807,10 @@ static void scsi_block_sgio_complete(void *opaque, int ret)
 }
 
 if (ret > 0) {
-aio_context_acquire(blk_get_aio_context(s->conf.blk));
 if (scsi_handle_rw_erro

Re: [PATCH V6 03/14] cpus: stop vm in suspended runstate

2023-12-04 Thread Steven Sistare

On 12/4/2023 11:35 AM, Peter Xu wrote:
> On Fri, Dec 01, 2023 at 12:11:32PM -0500, Steven Sistare wrote:
>>>> diff --git a/include/sysemu/runstate.h b/include/sysemu/runstate.h
>>>> index f6a337b..1d6828f 100644
>>>> --- a/include/sysemu/runstate.h
>>>> +++ b/include/sysemu/runstate.h
>>>> @@ -40,6 +40,11 @@ static inline bool shutdown_caused_by_guest(ShutdownCause cause)
>>>>  return cause >= SHUTDOWN_CAUSE_GUEST_SHUTDOWN;
>>>>  }
>>>>  
>>>> +static inline bool runstate_is_started(RunState state)
>>>
>>> Would runstate_has_vm_running() sound better?  It is a bit awkward when
>>> saying something like "start a runstate".
>>
>> I have been searching for the perfect name for this accessor.
>> IMO using "running" in this accessor is confusing because it applies to both
>> the running and suspended state.  So, I invented a new aggregate state called
>> started.  vm_start transitions the machine to a started state.
>>
>> How about runstate_was_started?  It works well at both start and stop call 
>> sites:
>>
>> void vm_resume(RunState state)
>> if (runstate_was_started(state)) {
> 
> This one looks fine, but...
> 
>> vm_start();
>>
>> int vm_stop_force_state(RunState state)
>> if (runstate_was_started(runstate_get())) {
> 
> ... in this one the past tense does not look good.
> 
>> return vm_stop(state);
> 
> How about runstate_is_alive()?  So far the best I can come up with. :)
> 
> Even if you prefer "started", I'd vote for not using past tense, hence
> runstate_is_started().

runstate_is_live also occurred to me.  I'll use that.

- Steve
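
A minimal sketch of the accessor under the agreed name, assuming it covers exactly the two states named in the thread (running and suspended). The enum below is a reduced, hypothetical stand-in; QEMU's real RunState has more members and the final helper may differ.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, reduced run state set; not QEMU's real RunState. */
typedef enum {
    DEMO_RUN_STATE_PAUSED,
    DEMO_RUN_STATE_RUNNING,
    DEMO_RUN_STATE_SUSPENDED,
    DEMO_RUN_STATE_SHUTDOWN,
} DemoRunState;

/* True for both states that count as "started": running and suspended. */
bool demo_runstate_is_live(DemoRunState state)
{
    return state == DEMO_RUN_STATE_RUNNING ||
           state == DEMO_RUN_STATE_SUSPENDED;
}
```

A present-tense predicate like this reads naturally at both call sites quoted above: "if live, start" in the resume path and "if live, stop" in vm_stop_force_state().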

Re: [PATCH 1/4] scsi: only access SCSIDevice->requests from one thread

2023-12-04 Thread Stefan Hajnoczi

On Fri, Dec 01, 2023 at 05:03:13PM +0100, Kevin Wolf wrote:
> Am 23.11.2023 um 20:49 hat Stefan Hajnoczi geschrieben:
> > Stop depending on the AioContext lock and instead access
> > SCSIDevice->requests from only one thread at a time:
> > - When the VM is running only the BlockBackend's AioContext may access
> >   the requests list.
> > - When the VM is stopped only the main loop may access the requests
> >   list.
> > 
> > These constraints protect the requests list without the need for locking
> > in the I/O code path.
> > 
> > Note that multiple IOThreads are not supported yet because the code
> > assumes all SCSIRequests are executed from a single AioContext. Leave
> > that as future work.
> > 
> > Signed-off-by: Stefan Hajnoczi 
> > ---
> >  include/hw/scsi/scsi.h |   7 +-
> >  hw/scsi/scsi-bus.c | 174 -
> >  2 files changed, 124 insertions(+), 57 deletions(-)
> > 
> > diff --git a/include/hw/scsi/scsi.h b/include/hw/scsi/scsi.h
> > index 3692ca82f3..10c4e8288d 100644
> > --- a/include/hw/scsi/scsi.h
> > +++ b/include/hw/scsi/scsi.h
> > @@ -69,14 +69,19 @@ struct SCSIDevice
> >  {
> >  DeviceState qdev;
> >  VMChangeStateEntry *vmsentry;
> > -QEMUBH *bh;
> >  uint32_t id;
> >  BlockConf conf;
> >  SCSISense unit_attention;
> >  bool sense_is_ua;
> >  uint8_t sense[SCSI_SENSE_BUF_SIZE];
> >  uint32_t sense_len;
> > +
> > +/*
> > + * The requests list is only accessed from the AioContext that executes
> > + * requests or from the main loop when IOThread processing is stopped.
> > + */
> >  QTAILQ_HEAD(, SCSIRequest) requests;
> > +
> >  uint32_t channel;
> >  uint32_t lun;
> >  int blocksize;
> > diff --git a/hw/scsi/scsi-bus.c b/hw/scsi/scsi-bus.c
> > index fc4b77fdb0..b8bfde9565 100644
> > --- a/hw/scsi/scsi-bus.c
> > +++ b/hw/scsi/scsi-bus.c
> > @@ -85,6 +85,82 @@ SCSIDevice *scsi_device_get(SCSIBus *bus, int channel, int id, int lun)
> >  return d;
> >  }
> >  
> > +/*
> > + * Invoke @fn() for each enqueued request in device @s. Must be called from the
> > + * main loop thread while the guest is stopped. This is only suitable for
> > + * vmstate ->put(), use scsi_device_for_each_req_async() for other cases.
> > + */
> > +static void scsi_device_for_each_req_sync(SCSIDevice *s,
> > +  void (*fn)(SCSIRequest *, void *),
> > +  void *opaque)
> > +{
> > +SCSIRequest *req;
> > +SCSIRequest *next_req;
> > +
> > +assert(!runstate_is_running());
> > +assert(qemu_in_main_thread());
> > +
> > +QTAILQ_FOREACH_SAFE(req, &s->requests, next, next_req) {
> > +fn(req, opaque);
> > +}
> > +}
> > +
> > +typedef struct {
> > +SCSIDevice *s;
> > +void (*fn)(SCSIRequest *, void *);
> > +void *fn_opaque;
> > +} SCSIDeviceForEachReqAsyncData;
> > +
> > +static void scsi_device_for_each_req_async_bh(void *opaque)
> > +{
> > +g_autofree SCSIDeviceForEachReqAsyncData *data = opaque;
> > +SCSIDevice *s = data->s;
> > +SCSIRequest *req;
> > +SCSIRequest *next;
> > +
> > +/*
> > + * It is unlikely that the AioContext will change before this BH is called,
> > + * but if it happens then ->requests must not be accessed from this
> > + * AioContext.
> > + */
> 
> What is the scenario where this happens? I would have expected that
> switching the AioContext of a node involves draining the node first,
> which would execute this BH before the context changes.

I don't think aio_poll() is invoked by bdrv_drained_begin() when there
are no requests in flight. In that case the BH could remain pending
across bdrv_drained_begin()/bdrv_drained_end().

> The other option I see is an empty BlockBackend, which can change its
> AioContext without polling BHs, but in that case there is no connection
> to other users, so the only change could come from virtio-scsi itself.
> If there is such a case, it would probably be helpful to be specific in
> the comment.
>
> > +if (blk_get_aio_context(s->conf.blk) == qemu_get_current_aio_context()) {
> > +QTAILQ_FOREACH_SAFE(req, &s->requests, next, next) {
> > +data->fn(req, data->fn_opaque);
> > +}
> > +}
> 
> Of course, if the situation does happen, the question is why just doing
> nothing is correct. Wouldn't that mean that the guest still sees stuck
> requests?
> 
> Would rescheduling the BH in the new context be better?

In the case where there are no requests it is correct to do nothing, but
it's not a general solution.
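
To make the trade-off concrete, here is a toy model of the check in the BH. It is a sketch under stated assumptions only: contexts are plain integers, the request list is a counter, and DemoDevice and demo_for_each_req_bh() are hypothetical names, not QEMU APIs.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: contexts are integers, requests are a counter. */
typedef struct {
    int owner_ctx;   /* context currently allowed to touch the list */
    int nreqs;       /* number of enqueued requests */
    int visited;     /* how many requests the callback has seen */
} DemoDevice;

/*
 * Model of the BH body: walk the list only when running in the owning
 * context. Returns true if the list was walked, false if it was skipped.
 */
bool demo_for_each_req_bh(DemoDevice *d, int current_ctx)
{
    if (d->owner_ctx != current_ctx) {
        return false;   /* context changed: leave the list alone */
    }
    d->visited += d->nreqs;
    return true;
}
```

With nreqs == 0 the skipped walk loses nothing, which is the case described above. With requests pending, a skipped walk leaves them unvisited, which is the situation the rescheduling question is about.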

> > +
> > +/* Drop the reference taken by scsi_device_for_each_req_async() */
> > +object_unref(OBJECT(s));
> > +}
> > +
> > +/*
> > + * Schedule @fn() to be invoked for each enqueued request in device @s. @fn()
> > + * runs in the AioContext that is executing the request.
> > + */
> > +static void scsi_device_for_each_req_async(SCSIDevice *

Re: [PATCH V6 03/14] cpus: stop vm in suspended runstate

2023-12-04 Thread Peter Xu

On Fri, Dec 01, 2023 at 12:11:32PM -0500, Steven Sistare wrote:
> >> diff --git a/include/sysemu/runstate.h b/include/sysemu/runstate.h
> >> index f6a337b..1d6828f 100644
> >> --- a/include/sysemu/runstate.h
> >> +++ b/include/sysemu/runstate.h
> >> @@ -40,6 +40,11 @@ static inline bool shutdown_caused_by_guest(ShutdownCause cause)
> >>  return cause >= SHUTDOWN_CAUSE_GUEST_SHUTDOWN;
> >>  }
> >>  
> >> +static inline bool runstate_is_started(RunState state)
> > 
> > Would runstate_has_vm_running() sound better?  It is a bit awkward when
> > saying something like "start a runstate".
> 
> I have been searching for the perfect name for this accessor.
> IMO using "running" in this accessor is confusing because it applies to both
> the running and suspended state.  So, I invented a new aggregate state called
> started.  vm_start transitions the machine to a started state.
> 
> How about runstate_was_started?  It works well at both start and stop call 
> sites:
> 
> void vm_resume(RunState state)
> if (runstate_was_started(state)) {

This one looks fine, but...

> vm_start();
> 
> int vm_stop_force_state(RunState state)
> if (runstate_was_started(runstate_get())) {

... in this one the past tense does not look good.

> return vm_stop(state);

How about runstate_is_alive()?  So far the best I can come up with. :)

Even if you prefer "started", I'd vote for not using past tense, hence
runstate_is_started().

Thanks,

-- 
Peter Xu

Re: [RFC 0/8] Support generic Luks encryption

2023-12-04 Thread Yong Huang

On Tue, Dec 5, 2023 at 12:24 AM Daniel P. Berrangé 
wrote:

> On Tue, Dec 05, 2023 at 12:06:17AM +0800, Hyman Huang wrote:
> > This functionality was motivated by the following to-do list seen
> > in crypto documents:
> > https://wiki.qemu.org/Features/Block/Crypto
> >
> > The last chapter says we should "separate header volume":
> >
> > The LUKS format has ability to store the header in a separate volume
> > from the payload. We should extend the LUKS driver in QEMU to support
> > this use case.
> >
> > As a proof-of-concept, I've created this patchset, which I've named
> > the Gluks: generic luks. As their name suggests, they offer encryption
> > for any format that QEMU theoretically supports.
>
> I don't see the point in creating a new driver.
>

Indeed, this definitely makes things simpler. The next version will do that!

>
> I would expect detached header support to be implemented via an
> optional new 'header' field in the existing driver. ie
>
> diff --git a/qapi/block-core.json b/qapi/block-core.json
> index ca390c5700..48d1f2a974 100644
> --- a/qapi/block-core.json
> +++ b/qapi/block-core.json
> @@ -3352,11 +3352,15 @@
>  # decryption key (since 2.6). Mandatory except when doing a
>  # metadata-only probe of the image.
>  #
> +# @header: optional reference to the location of a blockdev
> +# storing a detached LUKS header
> +#
>  # Since: 2.9
>  ##
>  { 'struct': 'BlockdevOptionsLUKS',
>'base': 'BlockdevOptionsGenericFormat',
> -  'data': { '*key-secret': 'str' } }
> +  'data': { '*key-secret': 'str',
> +            '*header': 'BlockdevRef' } }
>
>  ##
>  # @BlockdevOptionsGenericCOWFormat:
> @@ -4941,9 +4945,18 @@
>  #
>  # Driver specific image creation options for LUKS.
>  #
> -# @file: Node to create the image format on
> +# @file: Node to create the image format on. Mandatory
> +# unless a detached header file is specified using
> +# @header.
>  #
> -# @size: Size of the virtual disk in bytes
> +# @size: Size of the virtual disk in bytes.  Mandatory
> +# unless a detached header file is specified using
> +# @header.
> +#
> +# @header: optional reference to the location of a blockdev
> +# storing a detached LUKS header. The @file option is
> +# optional when this is given, unless it is desired
> +# to perform pre-allocation
>  #
>  # @preallocation: Preallocation mode for the new image (since: 4.2)
>  # (default: off; allowed values: off, metadata, falloc, full)
> @@ -4952,8 +4965,9 @@
>  ##
>  { 'struct': 'BlockdevCreateOptionsLUKS',
>'base': 'QCryptoBlockCreateOptionsLUKS',
> -  'data': { 'file': 'BlockdevRef',
> -'size': 'size',
> +  'data': { '*file':'BlockdevRef',
> +'*size':'size',
> +'*header':  'BlockdevRef',
>  '*preallocation':   'PreallocMode' } }
>
>  ##
>
> It ends up giving basically the same workflow as you outline,
> without needing the new block driver
>
Yes, most of the logic reuses the pre-existing Luks driver.


> With regards,
> Daniel
> --
> |: https://berrange.com  -o-
> https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org -o-
> https://fstop138.berrange.com :|
> |: https://entangle-photo.org-o-
> https://www.instagram.com/dberrange :|
>
>

-- 
Best regards

[PATCH] Revert "test/qga: use G_TEST_DIR to locate os-release test file"

2023-12-04 Thread Andrey Drobyshev

Since commit a85d09269b, the QGA_OS_RELEASE variable points to a path
relative to the build dir.  On qemu-ga startup this path can't be
found because the qemu-ga cwd is somewhere else, which leads to this test failure:

  # ./tests/unit/test-qga -p /qga/guest-get-osinfo
  # random seed: R02S3a90c22d77ff1070fbd844f4959cf4a4
  # Start of qga tests
  **
  ERROR:../tests/unit/test-qga.c:906:test_qga_guest_get_osinfo: 'str' should not be NULL
  Bail out! ERROR:../tests/unit/test-qga.c:906:test_qga_guest_get_osinfo: 'str' should not be NULL

Let's obtain the absolute path again.

This reverts commit a85d09269bb1a7071d3ce0f2957e3ca9dba7c047.

Signed-off-by: Andrey Drobyshev 
---
 tests/unit/test-qga.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/tests/unit/test-qga.c b/tests/unit/test-qga.c
index 671e83cb86..47cf5e30ec 100644
--- a/tests/unit/test-qga.c
+++ b/tests/unit/test-qga.c
@@ -1034,10 +1034,12 @@ static void test_qga_guest_get_osinfo(gconstpointer data)
 g_autoptr(QDict) ret = NULL;
 char *env[2];
 QDict *val;
+g_autofree gchar *cwd = NULL;
 
+cwd = g_get_current_dir();
 env[0] = g_strdup_printf(
-"QGA_OS_RELEASE=%s%c..%cdata%ctest-qga-os-release",
-g_test_get_dir(G_TEST_DIST), G_DIR_SEPARATOR, G_DIR_SEPARATOR, G_DIR_SEPARATOR);
+"QGA_OS_RELEASE=%s%ctests%cdata%ctest-qga-os-release",
+cwd, G_DIR_SEPARATOR, G_DIR_SEPARATOR, G_DIR_SEPARATOR);
 env[1] = NULL;
 fixture_setup(&fixture, NULL, env);
 
-- 
2.39.3
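
The path construction in the hunk above can be sketched without GLib. This is an illustration only: demo_build_os_release_env() is a hypothetical helper, it takes the base directory as a parameter instead of calling g_get_current_dir(), and it hardcodes '/' where the patch uses G_DIR_SEPARATOR.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Build "QGA_OS_RELEASE=<base>/tests/data/test-qga-os-release" into buf.
 * Returns 0 on success, -1 if the result was truncated or formatting failed.
 */
int demo_build_os_release_env(char *buf, size_t len, const char *base)
{
    int n = snprintf(buf, len,
                     "QGA_OS_RELEASE=%s/tests/data/test-qga-os-release",
                     base);

    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

Passing an absolute base directory yields an absolute path, so the lookup no longer depends on the working directory of the spawned qemu-ga process.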

Re: [RFC 0/8] Support generic Luks encryption

2023-12-04 Thread Daniel P . Berrangé

On Tue, Dec 05, 2023 at 12:06:17AM +0800, Hyman Huang wrote:
> This functionality was motivated by the following to-do list seen
> in crypto documents:
> https://wiki.qemu.org/Features/Block/Crypto 
> 
> The last chapter says we should "separate header volume": 
> 
> The LUKS format has ability to store the header in a separate volume
> from the payload. We should extend the LUKS driver in QEMU to support
> this use case.
> 
> As a proof-of-concept, I've created this patchset, which I've named
> the Gluks: generic luks. As their name suggests, they offer encryption
> for any format that QEMU theoretically supports.

I don't see the point in creating a new driver.

I would expect detached header support to be implemented via an
optional new 'header' field in the existing driver. ie

diff --git a/qapi/block-core.json b/qapi/block-core.json
index ca390c5700..48d1f2a974 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -3352,11 +3352,15 @@
 # decryption key (since 2.6). Mandatory except when doing a
 # metadata-only probe of the image.
 #
+# @header: optional reference to the location of a blockdev
+# storing a detached LUKS header
+#
 # Since: 2.9
 ##
 { 'struct': 'BlockdevOptionsLUKS',
   'base': 'BlockdevOptionsGenericFormat',
-  'data': { '*key-secret': 'str' } }
+  'data': { '*key-secret': 'str',
+            '*header': 'BlockdevRef' } }
 
 ##
 # @BlockdevOptionsGenericCOWFormat:
@@ -4941,9 +4945,18 @@
 #
 # Driver specific image creation options for LUKS.
 #
-# @file: Node to create the image format on
+# @file: Node to create the image format on. Mandatory
+# unless a detached header file is specified using
+# @header.
 #
-# @size: Size of the virtual disk in bytes
+# @size: Size of the virtual disk in bytes.  Mandatory
+# unless a detached header file is specified using
+# @header.
+#
+# @header: optional reference to the location of a blockdev
+# storing a detached LUKS header. The @file option is
+# optional when this is given, unless it is desired
+# to perform pre-allocation
 #
 # @preallocation: Preallocation mode for the new image (since: 4.2)
 # (default: off; allowed values: off, metadata, falloc, full)
@@ -4952,8 +4965,9 @@
 ##
 { 'struct': 'BlockdevCreateOptionsLUKS',
   'base': 'QCryptoBlockCreateOptionsLUKS',
-  'data': { 'file': 'BlockdevRef',
-'size': 'size',
+  'data': { '*file':'BlockdevRef',
+'*size':'size',
+'*header':  'BlockdevRef',
 '*preallocation':   'PreallocMode' } }
 
 ##

It ends up giving basically the same workflow as you outline,
without needing the new block driver
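
One way to read the "mandatory unless" wording in the doc comments above is as a small validation rule. The sketch below is an interpretation, not code from any patch: DemoLuksCreateOpts and demo_luks_create_opts_valid() are hypothetical, and the preallocation rule is my reading of the @header description.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mirror of the optional members in the proposed schema. */
typedef struct {
    bool has_file;
    bool has_size;
    bool has_header;
    bool want_prealloc;   /* caller asked for a preallocation mode */
} DemoLuksCreateOpts;

/*
 * @file and @size are mandatory unless a detached header is given.
 * With a detached header, @file is still needed if preallocation is wanted.
 */
bool demo_luks_create_opts_valid(const DemoLuksCreateOpts *o)
{
    if (o->has_header) {
        return !o->want_prealloc || o->has_file;
    }
    return o->has_file && o->has_size;
}
```

The point is that a single optional member is enough to express the detached-header case inside the existing driver.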

With regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|

[RFC 4/8] Gluks: Introduce Gluks options

2023-12-04 Thread Hyman Huang

The Gluks format reuses the Luks options, with the exception
of the "size" option.

Signed-off-by: Hyman Huang 
---
 block/crypto.c   |  4 ++--
 block/generic-luks.c | 18 ++
 block/generic-luks.h |  3 +++
 3 files changed, 23 insertions(+), 2 deletions(-)

diff --git a/block/crypto.c b/block/crypto.c
index 6afae1de2e..6f8528dccc 100644
--- a/block/crypto.c
+++ b/block/crypto.c
@@ -150,7 +150,7 @@ error:
 }
 
 
-static QemuOptsList block_crypto_runtime_opts_luks = {
+QemuOptsList block_crypto_runtime_opts_luks = {
 .name = "crypto",
 .head = QTAILQ_HEAD_INITIALIZER(block_crypto_runtime_opts_luks.head),
 .desc = {
@@ -181,7 +181,7 @@ static QemuOptsList block_crypto_create_opts_luks = {
 };
 
 
-static QemuOptsList block_crypto_amend_opts_luks = {
+QemuOptsList block_crypto_amend_opts_luks = {
 .name = "crypto",
 .head = QTAILQ_HEAD_INITIALIZER(block_crypto_create_opts_luks.head),
 .desc = {
diff --git a/block/generic-luks.c b/block/generic-luks.c
index f23e202991..ebc0365d40 100644
--- a/block/generic-luks.c
+++ b/block/generic-luks.c
@@ -35,6 +35,21 @@ typedef struct BDRVGLUKSState {
 uint64_t header_size;   /* In bytes */
 } BDRVGLUKSState;
 
+static QemuOptsList gluks_create_opts_luks = {
+.name = "crypto",
+.head = QTAILQ_HEAD_INITIALIZER(gluks_create_opts_luks.head),
+.desc = {
+BLOCK_CRYPTO_OPT_DEF_LUKS_KEY_SECRET(""),
+BLOCK_CRYPTO_OPT_DEF_LUKS_CIPHER_ALG(""),
+BLOCK_CRYPTO_OPT_DEF_LUKS_CIPHER_MODE(""),
+BLOCK_CRYPTO_OPT_DEF_LUKS_IVGEN_ALG(""),
+BLOCK_CRYPTO_OPT_DEF_LUKS_IVGEN_HASH_ALG(""),
+BLOCK_CRYPTO_OPT_DEF_LUKS_HASH_ALG(""),
+BLOCK_CRYPTO_OPT_DEF_LUKS_ITER_TIME(""),
+{ /* end of list */ }
+},
+};
+
 static int gluks_open(BlockDriverState *bs, QDict *options, int flags,
   Error **errp)
 {
@@ -71,6 +86,9 @@ static BlockDriver bdrv_generic_luks = {
 .bdrv_co_create_opts= gluks_co_create_opts,
 .bdrv_child_perm= gluks_child_perms,
 .bdrv_co_getlength  = gluks_co_getlength,
+
+.create_opts= &gluks_create_opts_luks,
+.amend_opts = &block_crypto_amend_opts_luks,
 };
 
 static void block_generic_luks_init(void)
diff --git a/block/generic-luks.h b/block/generic-luks.h
index 2aae866fa4..f18adf41ea 100644
--- a/block/generic-luks.h
+++ b/block/generic-luks.h
@@ -23,4 +23,7 @@
 #ifndef GENERIC_LUKS_H
 #define GENERIC_LUKS_H
 
+extern QemuOptsList block_crypto_runtime_opts_luks;
+extern QemuOptsList block_crypto_amend_opts_luks;
+
 #endif /* GENERIC_LUKS_H */
-- 
2.39.1

[RFC 6/8] crypto: Provide the Luks crypto driver to Gluks

2023-12-04 Thread Hyman Huang

Hooks up the Luks crypto driver for Gluks.

Signed-off-by: Hyman Huang 
---
 crypto/block.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/crypto/block.c b/crypto/block.c
index 3dcf22a69f..7e695c0a04 100644
--- a/crypto/block.c
+++ b/crypto/block.c
@@ -27,6 +27,7 @@
 static const QCryptoBlockDriver *qcrypto_block_drivers[] = {
 [Q_CRYPTO_BLOCK_FORMAT_QCOW] = &qcrypto_block_driver_qcow,
 [Q_CRYPTO_BLOCK_FORMAT_LUKS] = &qcrypto_block_driver_luks,
+[Q_CRYPTO_BLOCK_FORMAT_GLUKS] = &qcrypto_block_driver_luks,
 };
 
 
-- 
2.39.1
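
The one-line change above relies on a table indexed by format, where two formats can share one driver. A reduced sketch of that pattern follows; all Demo* names are hypothetical and the driver struct is cut down to a name.

```c
#include <assert.h>
#include <stddef.h>

typedef enum {
    DEMO_FORMAT_QCOW,
    DEMO_FORMAT_LUKS,
    DEMO_FORMAT_GLUKS,
    DEMO_FORMAT__MAX,
} DemoFormat;

typedef struct {
    const char *name;
} DemoDriver;

static const DemoDriver demo_driver_qcow = { "qcow" };
static const DemoDriver demo_driver_luks = { "luks" };

/* Designated initializers: the new format reuses the LUKS driver entry. */
static const DemoDriver *const demo_drivers[DEMO_FORMAT__MAX] = {
    [DEMO_FORMAT_QCOW]  = &demo_driver_qcow,
    [DEMO_FORMAT_LUKS]  = &demo_driver_luks,
    [DEMO_FORMAT_GLUKS] = &demo_driver_luks,
};

/* Return the driver for a format, or NULL if the format is out of range. */
const DemoDriver *demo_driver_lookup(DemoFormat format)
{
    if ((unsigned int)format >= DEMO_FORMAT__MAX) {
        return NULL;
    }
    return demo_drivers[format];
}
```

Because both indices hold the same pointer, every lookup for the new format lands in the existing LUKS implementation without duplicating it.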

[PATCH] hw/ufs: avoid generating the same ID string for different LU devices

2023-12-04 Thread Akinobu Mita

QEMU would not start when trying to create two UFS host controllers and
a UFS logical unit for each with the following options:

-device ufs,id=bus0 \
-device ufs-lu,drive=drive1,bus=bus0,lun=0 \
-device ufs,id=bus1 \
-device ufs-lu,drive=drive2,bus=bus1,lun=0 \

This is because the same ID string ("0:0:0/scsi-disk") is generated
for both UFS logical units.

To fix this issue, prepend the parent PCI device's path to make
the ID string unique.
(":00:03.0/0:0:0/scsi-disk" and ":00:04.0/0:0:0/scsi-disk")

Fixes: 096434fea13a ("hw/ufs: Modify lu.c to share codes with SCSI subsystem")
Signed-off-by: Akinobu Mita 
---
 hw/ufs/ufs.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/hw/ufs/ufs.c b/hw/ufs/ufs.c
index 68c5f1f6c9..eccdb852a0 100644
--- a/hw/ufs/ufs.c
+++ b/hw/ufs/ufs.c
@@ -1323,9 +1323,17 @@ static bool ufs_bus_check_address(BusState *qbus, DeviceState *qdev,
 return true;
 }
 
+static char *ufs_bus_get_dev_path(DeviceState *dev)
+{
+BusState *bus = qdev_get_parent_bus(dev);
+
+return qdev_get_dev_path(bus->parent);
+}
+
 static void ufs_bus_class_init(ObjectClass *class, void *data)
 {
 BusClass *bc = BUS_CLASS(class);
+bc->get_dev_path = ufs_bus_get_dev_path;
 bc->check_address = ufs_bus_check_address;
 }
 
-- 
2.34.1
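
The effect of the fix can be sketched as plain string formatting. This is an assumption-laden illustration: demo_lu_dev_path() is a hypothetical helper, the "channel:id:lun" layout is inferred from the "0:0:0" part of the ID strings quoted above, and the trailing "/scsi-disk" component is left out. The parent paths are copied as they appear in the commit message.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Format a device path as "<parent>/<channel>:<id>:<lun>", or just
 * "<channel>:<id>:<lun>" when no parent path is available.
 * Returns 0 on success, -1 if the result was truncated or formatting failed.
 */
int demo_lu_dev_path(char *buf, size_t len, const char *parent_path,
                     int channel, int id, int lun)
{
    int n;

    if (parent_path && parent_path[0] != '\0') {
        n = snprintf(buf, len, "%s/%d:%d:%d", parent_path, channel, id, lun);
    } else {
        n = snprintf(buf, len, "%d:%d:%d", channel, id, lun);
    }
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

Without a parent path, two logical units with lun=0 on different controllers format to the same string; with the parent path prepended they differ.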

[RFC 5/8] qapi: Introduce Gluks types to qapi

2023-12-04 Thread Hyman Huang

Gluks mostly reuses the Luks types and adds one extra
option, "header", which references the definition of the
Luks header node.

Signed-off-by: Hyman Huang 
---
 qapi/block-core.json | 22 +-
 qapi/crypto.json | 10 +++---
 2 files changed, 28 insertions(+), 4 deletions(-)

diff --git a/qapi/block-core.json b/qapi/block-core.json
index ca390c5700..e2208f6891 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -3185,12 +3185,14 @@
 #
 # @snapshot-access: Since 7.0
 #
+# @gluks: Since 9.0
+#
 # Since: 2.9
 ##
 { 'enum': 'BlockdevDriver',
   'data': [ 'blkdebug', 'blklogwrites', 'blkreplay', 'blkverify', 'bochs',
 'cloop', 'compress', 'copy-before-write', 'copy-on-read', 'dmg',
-'file', 'snapshot-access', 'ftp', 'ftps', 'gluster',
+'file', 'snapshot-access', 'ftp', 'ftps', 'gluks', 'gluster',
 {'name': 'host_cdrom', 'if': 'HAVE_HOST_BLOCK_DEVICE' },
 {'name': 'host_device', 'if': 'HAVE_HOST_BLOCK_DEVICE' },
 'http', 'https',
@@ -3957,6 +3959,23 @@
 '*debug': 'int',
 '*logfile': 'str' } }
 
+##
+# @BlockdevOptionsGLUKS:
+#
+# Driver specific block device options for GLUKS.
+#
+# @header: reference to the definition of the luks header node.
+#
+# @key-secret: the ID of a QCryptoSecret object providing the
+# decryption key.
+#
+# Since: 9.0
+##
+{ 'struct': 'BlockdevOptionsGLUKS',
+  'base': 'BlockdevOptionsGenericFormat',
+  'data': { 'header': 'BlockdevRef',
+'key-secret': 'str' } }
+
 ##
 # @BlockdevOptionsIoUring:
 #
@@ -4680,6 +4699,7 @@
   'file':   'BlockdevOptionsFile',
   'ftp':'BlockdevOptionsCurlFtp',
   'ftps':   'BlockdevOptionsCurlFtps',
+  'gluks':  'BlockdevOptionsGLUKS',
   'gluster':'BlockdevOptionsGluster',
   'host_cdrom':  { 'type': 'BlockdevOptionsFile',
'if': 'HAVE_HOST_BLOCK_DEVICE' },
diff --git a/qapi/crypto.json b/qapi/crypto.json
index fd3d46ebd1..9afb242b5b 100644
--- a/qapi/crypto.json
+++ b/qapi/crypto.json
@@ -154,11 +154,13 @@
 #
 # @luks: LUKS encryption format.  Recommended for new images
 #
+# @gluks: generic LUKS encryption format. (since 9.0)
+#
 # Since: 2.6
 ##
 { 'enum': 'QCryptoBlockFormat',
 #  'prefix': 'QCRYPTO_BLOCK_FORMAT',
-  'data': ['qcow', 'luks']}
+  'data': ['qcow', 'luks', 'gluks']}
 
 ##
 # @QCryptoBlockOptionsBase:
@@ -246,7 +248,8 @@
   'base': 'QCryptoBlockOptionsBase',
   'discriminator': 'format',
   'data': { 'qcow': 'QCryptoBlockOptionsQCow',
-'luks': 'QCryptoBlockOptionsLUKS' } }
+'luks': 'QCryptoBlockOptionsLUKS',
+'gluks': 'QCryptoBlockOptionsLUKS' } }
 
 ##
 # @QCryptoBlockCreateOptions:
@@ -260,7 +263,8 @@
   'base': 'QCryptoBlockOptionsBase',
   'discriminator': 'format',
   'data': { 'qcow': 'QCryptoBlockOptionsQCow',
-'luks': 'QCryptoBlockCreateOptionsLUKS' } }
+'luks': 'QCryptoBlockCreateOptionsLUKS',
+'gluks': 'QCryptoBlockCreateOptionsLUKS' } }
 
 ##
 # @QCryptoBlockInfoBase:
-- 
2.39.1

[RFC 1/8] crypto: Export util functions and structures

2023-12-04 Thread Hyman Huang

Gluks primarily reuses the Luks driver logic, so export
several pre-existing functions and structures.

Signed-off-by: Hyman Huang 
---
 block/crypto.c | 16 
 block/crypto.h | 23 +++
 2 files changed, 27 insertions(+), 12 deletions(-)

diff --git a/block/crypto.c b/block/crypto.c
index 921933a5e5..6afae1de2e 100644
--- a/block/crypto.c
+++ b/block/crypto.c
@@ -34,14 +34,6 @@
 #include "qemu/memalign.h"
 #include "crypto.h"
 
-typedef struct BlockCrypto BlockCrypto;
-
-struct BlockCrypto {
-QCryptoBlock *block;
-bool updating_keys;
-};
-
-
 static int block_crypto_probe_generic(QCryptoBlockFormat format,
   const uint8_t *buf,
   int buf_size,
@@ -321,7 +313,7 @@ static int block_crypto_open_generic(QCryptoBlockFormat format,
 }
 
 
-static int coroutine_fn GRAPH_UNLOCKED
+int coroutine_fn GRAPH_UNLOCKED
 block_crypto_co_create_generic(BlockDriverState *bs, int64_t size,
QCryptoBlockCreateOptions *opts,
PreallocMode prealloc, Error **errp)
@@ -385,7 +377,7 @@ block_crypto_co_truncate(BlockDriverState *bs, int64_t offset, bool exact,
 return bdrv_co_truncate(bs->file, offset, exact, prealloc, 0, errp);
 }
 
-static void block_crypto_close(BlockDriverState *bs)
+void block_crypto_close(BlockDriverState *bs)
 {
 BlockCrypto *crypto = bs->opaque;
 qcrypto_block_free(crypto->block);
@@ -404,7 +396,7 @@ static int block_crypto_reopen_prepare(BDRVReopenState *state,
  */
 #define BLOCK_CRYPTO_MAX_IO_SIZE (1024 * 1024)
 
-static int coroutine_fn GRAPH_RDLOCK
+int coroutine_fn GRAPH_RDLOCK
 block_crypto_co_preadv(BlockDriverState *bs, int64_t offset, int64_t bytes,
QEMUIOVector *qiov, BdrvRequestFlags flags)
 {
@@ -466,7 +458,7 @@ block_crypto_co_preadv(BlockDriverState *bs, int64_t offset, int64_t bytes,
 }
 
 
-static int coroutine_fn GRAPH_RDLOCK
+int coroutine_fn GRAPH_RDLOCK
 block_crypto_co_pwritev(BlockDriverState *bs, int64_t offset, int64_t bytes,
 QEMUIOVector *qiov, BdrvRequestFlags flags)
 {
diff --git a/block/crypto.h b/block/crypto.h
index 72e792c9af..06465009f0 100644
--- a/block/crypto.h
+++ b/block/crypto.h
@@ -21,6 +21,8 @@
 #ifndef BLOCK_CRYPTO_H
 #define BLOCK_CRYPTO_H
 
+#include "crypto/block.h"
+
 #define BLOCK_CRYPTO_OPT_DEF_KEY_SECRET(prefix, helpstr)\
 {   \
 .name = prefix BLOCK_CRYPTO_OPT_QCOW_KEY_SECRET,\
@@ -131,4 +133,25 @@ block_crypto_amend_opts_init(QDict *opts, Error **errp);
 QCryptoBlockOpenOptions *
 block_crypto_open_opts_init(QDict *opts, Error **errp);
 
+typedef struct BlockCrypto BlockCrypto;
+
+struct BlockCrypto {
+QCryptoBlock *block;
+bool updating_keys;
+};
+
+int coroutine_fn GRAPH_UNLOCKED
+block_crypto_co_create_generic(BlockDriverState *bs, int64_t size,
+   QCryptoBlockCreateOptions *opts,
+   PreallocMode prealloc, Error **errp);
+
+int coroutine_fn GRAPH_RDLOCK
+block_crypto_co_preadv(BlockDriverState *bs, int64_t offset, int64_t bytes,
+   QEMUIOVector *qiov, BdrvRequestFlags flags);
+
+int coroutine_fn GRAPH_RDLOCK
+block_crypto_co_pwritev(BlockDriverState *bs, int64_t offset, int64_t bytes,
+QEMUIOVector *qiov, BdrvRequestFlags flags);
+
+void block_crypto_close(BlockDriverState *bs);
 #endif /* BLOCK_CRYPTO_H */
-- 
2.39.1

[RFC 2/8] crypto: Introduce payload offset set function

2023-12-04 Thread Hyman Huang

Implement the payload offset set function for Gluks.

Signed-off-by: Hyman Huang 
---
 crypto/block.c | 4 
 include/crypto/block.h | 1 +
 2 files changed, 5 insertions(+)

diff --git a/crypto/block.c b/crypto/block.c
index 7bb4b74a37..3dcf22a69f 100644
--- a/crypto/block.c
+++ b/crypto/block.c
@@ -319,6 +319,10 @@ QCryptoHashAlgorithm qcrypto_block_get_kdf_hash(QCryptoBlock *block)
 return block->kdfhash;
 }
 
+void qcrypto_block_set_payload_offset(QCryptoBlock *block, uint64_t offset)
+{
+block->payload_offset = offset;
+}
 
 uint64_t qcrypto_block_get_payload_offset(QCryptoBlock *block)
 {
diff --git a/include/crypto/block.h b/include/crypto/block.h
index 4f63a37872..b47a90c529 100644
--- a/include/crypto/block.h
+++ b/include/crypto/block.h
@@ -312,4 +312,5 @@ void qcrypto_block_free(QCryptoBlock *block);
 
 G_DEFINE_AUTOPTR_CLEANUP_FUNC(QCryptoBlock, qcrypto_block_free)
 
+void qcrypto_block_set_payload_offset(QCryptoBlock *block, uint64_t offset);
 #endif /* QCRYPTO_BLOCK_H */
-- 
2.39.1

[RFC 3/8] Gluks: Add the basic framework

2023-12-04 Thread Hyman Huang

Gluks would be a built-in format in the QEMU block layer.

Signed-off-by: Hyman Huang 
---
 block/generic-luks.c | 81 
 block/generic-luks.h | 26 ++
 block/meson.build|  1 +
 3 files changed, 108 insertions(+)
 create mode 100644 block/generic-luks.c
 create mode 100644 block/generic-luks.h

diff --git a/block/generic-luks.c b/block/generic-luks.c
new file mode 100644
index 0000000000..f23e202991
--- /dev/null
+++ b/block/generic-luks.c
@@ -0,0 +1,81 @@
+/*
+ * QEMU block driver for the generic luks encryption
+ *
+ * Copyright (c) 2024 SmartX Inc
+ *
+ * Author: Hyman Huang 
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+#include "qemu/osdep.h"
+
+#include "block/block_int.h"
+#include "block/crypto.h"
+#include "crypto/block.h"
+
+#include "generic-luks.h"
+
+/* BDRVGLUKSState holds the state of one generic LUKS instance */
+typedef struct BDRVGLUKSState {
+    BlockCrypto crypto;
+    BdrvChild *header;      /* LUKS header node */
+    uint64_t header_size;   /* In bytes */
+} BDRVGLUKSState;
+
+static int gluks_open(BlockDriverState *bs, QDict *options, int flags,
+                      Error **errp)
+{
+    return 0;
+}
+
+static int coroutine_fn GRAPH_UNLOCKED
+gluks_co_create_opts(BlockDriver *drv, const char *filename,
+                     QemuOpts *opts, Error **errp)
+{
+    return 0;
+}
+
+static void
+gluks_child_perms(BlockDriverState *bs, BdrvChild *c,
+                  const BdrvChildRole role,
+                  BlockReopenQueue *reopen_queue,
+                  uint64_t perm, uint64_t shared,
+                  uint64_t *nperm, uint64_t *nshared)
+{
+
+}
+
+static int64_t coroutine_fn GRAPH_RDLOCK
+gluks_co_getlength(BlockDriverState *bs)
+{
+    return 0;
+}
+
+static BlockDriver bdrv_generic_luks = {
+    .format_name            = "gluks",
+    .instance_size          = sizeof(BDRVGLUKSState),
+    .bdrv_open              = gluks_open,
+    .bdrv_co_create_opts    = gluks_co_create_opts,
+    .bdrv_child_perm        = gluks_child_perms,
+    .bdrv_co_getlength      = gluks_co_getlength,
+};
+
+static void block_generic_luks_init(void)
+{
+    bdrv_register(&bdrv_generic_luks);
+}
+
+block_init(block_generic_luks_init);
diff --git a/block/generic-luks.h b/block/generic-luks.h
new file mode 100644
index 0000000000..2aae866fa4
--- /dev/null
+++ b/block/generic-luks.h
@@ -0,0 +1,26 @@
+/*
+ * QEMU block driver for the generic luks encryption
+ *
+ * Copyright (c) 2024 SmartX Inc
+ *
+ * Author: Hyman Huang 
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+#ifndef GENERIC_LUKS_H
+#define GENERIC_LUKS_H
+
+#endif /* GENERIC_LUKS_H */
diff --git a/block/meson.build b/block/meson.build
index 59ff6d380c..74f2da7bed 100644
--- a/block/meson.build
+++ b/block/meson.build
@@ -39,6 +39,7 @@ block_ss.add(files(
   'throttle.c',
   'throttle-groups.c',
   'write-threshold.c',
+  'generic-luks.c',
 ), zstd, zlib, gnutls)
 
 system_ss.add(when: 'CONFIG_TCG', if_true: files('blkreplay.c'))
-- 
2.39.1

[RFC 8/8] block: Support Gluks format image creation using qemu-img

2023-12-04 Thread Hyman Huang

To create a Gluks header image, use the following command:
$ qemu-img create --object secret,id=sec0,data=abc123 -f gluks \
> -o cipher-alg=aes-256,cipher-mode=xts -o key-secret=sec0 \
> cipher.gluks

Signed-off-by: Hyman Huang 
---
 block.c  |  5 +
 block/generic-luks.c | 53 +++-
 2 files changed, 57 insertions(+), 1 deletion(-)

diff --git a/block.c b/block.c
index bfb0861ec6..cc9a517a25 100644
--- a/block.c
+++ b/block.c
@@ -7517,6 +7517,11 @@ void bdrv_img_create(const char *filename, const char *fmt,
         goto out;
     }
 
+    if (!strcmp(fmt, "gluks")) {
+        qemu_opt_set(opts, "size", "0M", &local_err);
+        size = 0;
+    }
+
     if (size == -1) {
         error_setg(errp, "Image creation needs a size parameter");
         goto out;
diff --git a/block/generic-luks.c b/block/generic-luks.c
index 32cbedc86f..579f01c4b0 100644
--- a/block/generic-luks.c
+++ b/block/generic-luks.c
@@ -145,7 +145,58 @@ static int coroutine_fn GRAPH_UNLOCKED
 gluks_co_create_opts(BlockDriver *drv, const char *filename,
                      QemuOpts *opts, Error **errp)
 {
-    return 0;
+    QCryptoBlockCreateOptions *create_opts = NULL;
+    BlockDriverState *bs = NULL;
+    QDict *cryptoopts;
+    int ret;
+
+    if (qemu_opt_get_size_del(opts, BLOCK_OPT_SIZE, 0) != 0) {
+        info_report("gluks format image need not size parameter, ignore it");
+    }
+
+    cryptoopts = qemu_opts_to_qdict_filtered(opts, NULL,
+                                             &gluks_create_opts_luks,
+                                             true);
+
+    qdict_put_str(cryptoopts, "format",
+        QCryptoBlockFormat_str(Q_CRYPTO_BLOCK_FORMAT_GLUKS));
+
+    create_opts = block_crypto_create_opts_init(cryptoopts, errp);
+    if (!create_opts) {
+        ret = -EINVAL;
+        goto fail;
+    }
+
+    /* Create protocol layer */
+    ret = bdrv_co_create_file(filename, opts, errp);
+    if (ret < 0) {
+        goto fail;
+    }
+
+    bs = bdrv_co_open(filename, NULL, NULL,
+                      BDRV_O_RDWR | BDRV_O_RESIZE | BDRV_O_PROTOCOL, errp);
+    if (!bs) {
+        ret = -EINVAL;
+        goto fail;
+    }
+    /* Create format layer */
+    ret = block_crypto_co_create_generic(bs, 0, create_opts, 0, errp);
+    if (ret < 0) {
+        goto fail;
+    }
+
+    ret = 0;
+fail:
+    /*
+     * If an error occurred, delete 'filename'. Even if the file existed
+     * beforehand, it has been truncated and corrupted in the process.
+     */
+    if (ret) {
+        bdrv_graph_co_rdlock();
+        bdrv_co_delete_file_noerr(bs);
+        bdrv_graph_co_rdunlock();
+    }
+    return ret;
 }
 
 static void
-- 
2.39.1

[RFC 7/8] Gluks: Implement the fundamental block layer driver hooks

2023-12-04 Thread Hyman Huang

Signed-off-by: Hyman Huang 
---
 block/generic-luks.c | 104 ++-
 1 file changed, 102 insertions(+), 2 deletions(-)

diff --git a/block/generic-luks.c b/block/generic-luks.c
index ebc0365d40..32cbedc86f 100644
--- a/block/generic-luks.c
+++ b/block/generic-luks.c
@@ -23,8 +23,14 @@
 #include "qemu/osdep.h"
 
 #include "block/block_int.h"
+#include "block/block-io.h"
 #include "block/crypto.h"
+#include "block/qdict.h"
 #include "crypto/block.h"
+#include "qapi/error.h"
+#include "qemu/error-report.h"
+#include "qemu/module.h"
+#include "qemu/option.h"
 
 #include "generic-luks.h"
 
@@ -50,10 +56,89 @@ static QemuOptsList gluks_create_opts_luks = {
 },
 };
 
+static int gluks_read_func(QCryptoBlock *block,
+                           size_t offset,
+                           uint8_t *buf,
+                           size_t buflen,
+                           void *opaque,
+                           Error **errp)
+{
+
+    BlockDriverState *bs = opaque;
+    BDRVGLUKSState *s = bs->opaque;
+    ssize_t ret;
+
+    GLOBAL_STATE_CODE();
+    GRAPH_RDLOCK_GUARD_MAINLOOP();
+
+    ret = bdrv_pread(s->header, offset, buflen, buf, 0);
+    if (ret < 0) {
+        error_setg_errno(errp, -ret, "Could not read generic luks header");
+        return ret;
+    }
+    return 0;
+}
+
 static int gluks_open(BlockDriverState *bs, QDict *options, int flags,
                       Error **errp)
 {
-    return 0;
+    BDRVGLUKSState *s = bs->opaque;
+    QemuOpts *opts = NULL;
+    QCryptoBlockOpenOptions *open_opts = NULL;
+    QDict *cryptoopts = NULL;
+    unsigned int cflags = 0;
+    int ret;
+
+    GLOBAL_STATE_CODE();
+
+    if (!bdrv_open_child(NULL, options, "file", bs, &child_of_bds,
+                         (BDRV_CHILD_DATA | BDRV_CHILD_PRIMARY), false, errp)) {
+        return -EINVAL;
+    }
+    s->header = bdrv_open_child(NULL, options, "header", bs,
+                                &child_of_bds, BDRV_CHILD_METADATA, false,
+                                errp);
+    if (!s->header) {
+        return -EINVAL;
+    }
+
+    GRAPH_RDLOCK_GUARD_MAINLOOP();
+
+    opts = qemu_opts_create(&block_crypto_runtime_opts_luks,
+                            NULL, 0, &error_abort);
+    if (!qemu_opts_absorb_qdict(opts, options, errp)) {
+        ret = -EINVAL;
+        goto cleanup;
+    }
+
+    cryptoopts = qemu_opts_to_qdict(opts, NULL);
+    qdict_put_str(cryptoopts, "format",
+        QCryptoBlockFormat_str(Q_CRYPTO_BLOCK_FORMAT_GLUKS));
+
+    open_opts = block_crypto_open_opts_init(cryptoopts, errp);
+    if (!open_opts) {
+        goto cleanup;
+    }
+
+    s->crypto.block = qcrypto_block_open(open_opts, NULL,
+                                         gluks_read_func,
+                                         bs,
+                                         cflags,
+                                         1,
+                                         errp);
+    if (!s->crypto.block) {
+        ret = -EIO;
+        goto cleanup;
+    }
+
+    s->header_size = qcrypto_block_get_payload_offset(s->crypto.block);
+    qcrypto_block_set_payload_offset(s->crypto.block, 0);
+
+    ret = 0;
+ cleanup:
+    qobject_unref(cryptoopts);
+    qapi_free_QCryptoBlockOpenOptions(open_opts);
+    return ret;
 }
 
 static int coroutine_fn GRAPH_UNLOCKED
@@ -70,13 +155,24 @@ gluks_child_perms(BlockDriverState *bs, BdrvChild *c,
                   uint64_t perm, uint64_t shared,
                   uint64_t *nperm, uint64_t *nshared)
 {
+    if (role & BDRV_CHILD_METADATA) {
+        /* assign read permission only */
+        perm |= BLK_PERM_CONSISTENT_READ;
+        /* share all permissions */
+        shared |= BLK_PERM_ALL;
 
+        *nperm = perm;
+        *nshared = shared;
+        return;
+    }
+
+    bdrv_default_perms(bs, c, role, reopen_queue, perm, shared, nperm, nshared);
 }
 
 static int64_t coroutine_fn GRAPH_RDLOCK
 gluks_co_getlength(BlockDriverState *bs)
 {
-    return 0;
+    return bdrv_co_getlength(bs->file->bs);
 }
 
 static BlockDriver bdrv_generic_luks = {
@@ -87,8 +183,12 @@ static BlockDriver bdrv_generic_luks = {
     .bdrv_child_perm        = gluks_child_perms,
     .bdrv_co_getlength      = gluks_co_getlength,
 
+    .bdrv_close             = block_crypto_close,
+    .bdrv_co_preadv         = block_crypto_co_preadv,
+    .bdrv_co_pwritev        = block_crypto_co_pwritev,
     .create_opts            = &gluks_create_opts_luks,
     .amend_opts             = &block_crypto_amend_opts_luks,
+    .is_format              = false,
 };
 
 static void block_generic_luks_init(void)
-- 
2.39.1

[RFC 0/8] Support generic Luks encryption

2023-12-04 Thread Hyman Huang

This functionality was motivated by the following to-do list seen
in crypto documents:
https://wiki.qemu.org/Features/Block/Crypto 

The last chapter says we should "separate header volume": 

The LUKS format has ability to store the header in a separate volume
from the payload. We should extend the LUKS driver in QEMU to support
this use case.

As a proof of concept, I've created this patchset, which I've named
Gluks (generic luks). As the name suggests, it offers encryption for
any format that QEMU supports, at least in theory.

As you can see below, the design of the Gluks format block layer driver
is quite simple.

         virtio-blk/vhost-user-blk...(front-end device)
              ^
              |
             Gluks   (format-like disk node)
          /         \
       file       header (blockdev reference)
        /             \
     file            file (protocol node)
       |               |
   disk data       Luks data
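
To illustrate why the header is kept in its own node, here is a minimal,
self-contained sketch of the offset handling that gluks_open() in patch
7/8 performs: the payload offset read from the header is remembered as
header_size and then reset to zero, so guest offsets map 1:1 onto the
data node. The Toy* types and toy_* functions are hypothetical stand-ins
and not QEMU APIs; the assumption that the read/write path adds the
payload offset to the guest offset is modelled in toy_data_offset().

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the one QCryptoBlock field used here. */
typedef struct ToyCrypto {
    uint64_t payload_offset;   /* where the payload starts in the data node */
} ToyCrypto;

/* Hypothetical stand-in for BDRVGLUKSState. */
typedef struct ToyGluks {
    ToyCrypto crypto;
    uint64_t header_size;      /* bytes occupied by the detached header */
} ToyGluks;

/*
 * Mirror the tail of gluks_open(): keep the offset found in the header
 * as header_size, then reset the payload offset because the data node
 * contains no header at all.
 */
void toy_gluks_open(ToyGluks *s, uint64_t offset_from_header)
{
    s->crypto.payload_offset = offset_from_header;
    s->header_size = s->crypto.payload_offset;
    s->crypto.payload_offset = 0;
}

/* Translate a guest offset into an offset in the data node. */
uint64_t toy_data_offset(const ToyGluks *s, uint64_t guest_offset)
{
    return s->crypto.payload_offset + guest_offset;
}
```

With an in-file header, toy_data_offset() would skip past the header;
with the detached header it returns the guest offset unchanged, which is
what lets an existing raw or qcow2 node sit below Gluks untouched.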

We don't need to create a new disk format in order to use Gluks to
encrypt a disk; all we need to do is construct a Luks header, which we
will refer to as the "Gluks" image because it contains only Luks header
data and no user data. The creation command, for instance, is nearly
identical to that for a Luks image:

$ qemu-img create --object secret,id=sec0,data=abc123 -f gluks \
  -o cipher-alg=aes-256,cipher-mode=xts -o key-secret=sec0 \
  cipher.gluks

Because a Gluks image contains only the Luks header data, the "size"
option is not needed when creating it; if one is given, it is ignored.

To hot-add a raw disk with Gluks encryption, see the following steps:

1. add a protocol blockdev node of data disk 
$ virsh qemu-monitor-command vm '{"execute":"blockdev-add",
  "arguments":{"node-name": "libvirt-1-storage", "driver": "file",
  "filename": "/path/to/test_disk.raw"}}'

2. add a protocol blockdev node of Luks header
$ virsh qemu-monitor-command vm '{"execute":"blockdev-add",
  "arguments":{"node-name": "libvirt-2-storage", "driver": "file",
  "filename": "/path/to/cipher.gluks" }}'

3. add the secret for decrypting the cipher stored in Gluks header
$ virsh qemu-monitor-command vm '{"execute":"object-add",
  "arguments":{"qom-type": "secret", "id":
  "libvirt-2-storage-secret0", "data": "abc123"}}'

4. add the Gluks-driven blockdev to connect the user disk with the Luks
   header; QEMU will use the cipher in the Luks header to
   encrypt/decrypt the disk data
$ virsh qemu-monitor-command vm '{"execute":"blockdev-add",
  "arguments":{"node-name": "libvirt-1-format", "driver": "gluks", "file":
  "libvirt-1-storage", "header": "libvirt-2-storage", "key-secret":
  "libvirt-2-storage-secret0"}}' 

5. add the device finally
$ virsh qemu-monitor-command vm '{"execute":"device_add",
  "arguments": {"num-queues": "1", "driver": "virtio-blk-pci", "scsi":
  "off", "drive": "libvirt-1-format", "id": "virtio-disk1"}}'

Do the reverse to hot-del the raw disk.

To hot-add a qcow2 disk with Gluks encryption:

1. add a protocol blockdev node of data disk
$ virsh qemu-monitor-command vm '{"execute":"blockdev-add",
  "arguments":{"node-name": "libvirt-1-storage", "driver": "file",
  "filename": "/path/to/test_disk.qcow2"}}'

2. add a protocol blockdev node of Luks header as above.
   block ref: libvirt-2-storage

3. add the secret for decrypting the cipher stored in the Gluks header,
   also as above.
   secret ref: libvirt-2-storage-secret0

4. add the qcow2-driven blockdev format node:
$ virsh qemu-monitor-command vm '{"execute":"blockdev-add",
  "arguments":{"node-name": "libvirt-1-format", "driver": "qcow2",
  "file": "libvirt-1-storage"}}'

5. add the Gluks-driven blockdev to connect the qcow2 disk with the Luks
   header
$ virsh qemu-monitor-command vm '{"execute":"blockdev-add",
  "arguments":{"node-name": "libvirt-2-format", "driver": "gluks",
  "file": "libvirt-1-format", "header": "libvirt-2-storage",
  "key-secret": "libvirt-2-storage-secret0"}}'

6. add the device finally
$ virsh qemu-monitor-command vm '{"execute":"device_add",
  "arguments": {"num-queues": "1", "driver": "virtio-blk-pci", "scsi":
  "off", "drive": "libvirt-2-format", "id": "virtio-disk2"}}'

In a virtual machine, several disk nodes are allowed to share a single
Gluks header.
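
The sharing works because of how gluks_child_perms() in patch 7/8 treats
the header child: it adds consistent-read to the requested permissions
and shares everything with other users. Below is a minimal sketch of
that calculation, assuming made-up permission bits (TOY_PERM_*) rather
than QEMU's real BLK_PERM_* values, plus a hypothetical compatibility
check that models two parents attached to the same header node.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical permission bits; QEMU's BLK_PERM_* values may differ. */
enum {
    TOY_PERM_CONSISTENT_READ = 1u << 0,
    TOY_PERM_WRITE           = 1u << 1,
    TOY_PERM_RESIZE          = 1u << 2,
    TOY_PERM_ALL             = (1u << 3) - 1,
};

/*
 * Same shape as the BDRV_CHILD_METADATA branch of gluks_child_perms():
 * request consistent-read on top of what the parent asked for and share
 * all permissions with other users of the header node.
 */
void toy_header_perms(uint64_t perm, uint64_t shared,
                      uint64_t *nperm, uint64_t *nshared)
{
    *nperm = perm | TOY_PERM_CONSISTENT_READ;
    *nshared = shared | TOY_PERM_ALL;
}

/*
 * Two users of one node can coexist when each user's required
 * permissions are a subset of what the other one shares.
 */
int toy_perms_compatible(uint64_t perm_a, uint64_t shared_a,
                         uint64_t perm_b, uint64_t shared_b)
{
    return (perm_a & ~shared_b) == 0 && (perm_b & ~shared_a) == 0;
}
```

Two Gluks nodes that both compute their header permissions this way are
always compatible with each other, since each shares TOY_PERM_ALL.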

As noted above, this patchset is a proof of concept; additional work
may be required before it is production-ready. As the RFC tag suggests,
it is posted solely for comments. The next version will be tested
thoroughly.

Any ideas and comments about this feature would be appreciated.

Thanks,

Yong

Best regards!

Hyman Huang (8):
  crypto: Export util functions and structures
  crypto: Introduce payload offset set function
  Gluks: Add the basic framework
  Gluks: Introduce Gluks options
  qapi: Introduce Gluks types to qapi
  crypto: Provide the Luks crypto driver to Gluks
  Gluks: Implement the fundamental block layer driver hooks.
  block: Support Gluks format image creation using qemu-img

Re: [PATCH 2/4] virtio-scsi: don't lock AioContext around virtio_queue_aio_attach_host_notifier()

2023-12-04 Thread Stefan Hajnoczi

On Mon, Nov 27, 2023 at 09:21:08AM -0600, Eric Blake wrote:
> On Thu, Nov 23, 2023 at 02:49:29PM -0500, Stefan Hajnoczi wrote:
> > virtio_queue_aio_attach_host_notifier() does not require the AioContext
> > lock. Stop taking the lock and remember add an explicit smp_wmb()
> 
> s/remember// ?

Will fix, thanks!

Stefan



Re: [PATCH 2/2] linux-user: Fix openat() emulation to not modify atime

2023-12-04 Thread Stefan Hajnoczi

On Mon, Dec 04, 2023 at 02:39:24PM +0100, Philippe Mathieu-Daudé wrote:
> Hi Laurent, Helge, Richard,
> 
> On 1/12/23 19:51, Shu-Chun Weng wrote:
> > On Fri, Dec 1, 2023 at 4:42 AM Philippe Mathieu-Daudé wrote:
> > 
> > Hi Shu-Chun,
> > 
> > On 1/12/23 04:21, Shu-Chun Weng wrote:
> >  > Commit b8002058 strengthened openat()'s /proc detection by calling
> >  > realpath(3) on the given path, which allows various paths and symlinks
> >  > that points to the /proc file system to be intercepted correctly.
> >  >
> >  > Using realpath(3), though, has a side effect that it reads the symlinks
> >  > along the way, and thus changes their atime. The results in the
> >  > following code snippet already get ~now instead of the real atime:
> >  >
> >  >    int fd = open("/path/to/a/symlink", O_PATH | O_NOFOLLOW);
> >  >    struct stat st;
> >  >    fstat(fd, &st);
> >  >    return st.st_atime;
> >  >
> >  > This change opens a path that doesn't appear to be part of /proc
> >  > directly and checks the destination of /proc/self/fd/n to determine if
> >  > it actually refers to a file in /proc.
> >  >
> >  > Neither this nor the existing code works with symlinks or indirect paths
> >  > (e.g.  /tmp/../proc/self/exe) that points to /proc/self/exe because it
> >  > is itself a symlink, and both realpath(3) and /proc/self/fd/n will
> >  > resolve into the location of QEMU.
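
The check described in the commit message can be sketched as follows.
This is only an illustration of the idea (open first, then inspect where
/proc/self/fd/n points), not the code from the patch; the helper name
toy_fd_is_in_proc() is made up, and the sketch assumes a Linux host with
/proc mounted.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <limits.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: returns 1 if the already-open fd refers to a
 * file under /proc, 0 if it does not, and -1 if the link cannot be
 * read. No path components are resolved by hand, so the atime of
 * symlinks along the original path is left alone.
 */
int toy_fd_is_in_proc(int fd)
{
    char link[64];
    char target[PATH_MAX];
    ssize_t len;

    snprintf(link, sizeof(link), "/proc/self/fd/%d", fd);
    len = readlink(link, target, sizeof(target) - 1);
    if (len < 0) {
        return -1;
    }
    target[len] = '\0';   /* readlink() does not NUL-terminate */

    return strncmp(target, "/proc/", 6) == 0 ? 1 : 0;
}
```

A caller would open the guest path normally, call the helper on the
resulting descriptor, and only fall back to the emulated /proc handling
when it returns 1.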
> > 
> > Does this fix any of the following issues?
> > https://gitlab.com/qemu-project/qemu/-/issues/829
> > 
> > 
> > 
> > Not this one -- this is purely in the logic of util/path.c, which we do
> > see and carry an internal patch. It's quite a behavior change so we
> > never upstreamed it.
> > 
> > https://gitlab.com/qemu-project/qemu/-/issues/927
> > 
> > 
> > 
> > No, either. This patch only touches the path handling, not how files are
> > opened.
> > 
> > https://gitlab.com/qemu-project/qemu/-/issues/2004
> > 
> > 
> > 
> > Yes! Though I don't have a toolchain for HPPA or any of the
> > architectures intercepting /proc/cpuinfo handy, I hacked the condition
> > and confirmed that on 7.1 and 8.2, test.c as attached in the bug prints
> > out the host cpuinfo while with this patch, it prints out the content
> > generated by `open_cpuinfo()`.
> > 
> > 
> > 
> >  > Signed-off-by: Shu-Chun Weng
> > 
> > 
> > Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2004
> > 
> 
> Do we need to merge this for 8.2?

Please assign release blocker issues to the 8.2 milestone so that they
are tracked:
https://gitlab.com/qemu-project/qemu/-/milestones/10

Thanks,
Stefan

> 
> > 
> >  > ---
> >  >   linux-user/syscall.c | 42 +-
> >  >   1 file changed, 33 insertions(+), 9 deletions(-)
> > 
> > 
> > On Fri, Dec 1, 2023 at 9:09 AM Helge Deller wrote:
> > 
> > On 12/1/23 04:21, Shu-Chun Weng wrote:
> >  > Commit b8002058 strengthened openat()'s /proc detection by calling
> >  > realpath(3) on the given path, which allows various paths and symlinks
> >  > that points to the /proc file system to be intercepted correctly.
> >  >
> >  > Using realpath(3), though, has a side effect that it reads the symlinks
> >  > along the way, and thus changes their atime.
> > 
> > Ah, ok. I didn't think of that side effect when I came up with the
> > patch.
> > Do the updated atimes trigger a real-world issue?
> > 
> > 
> > We have an internal library shimming the underlying filesystem that uses
> > the `open(O_PATH|O_NOFOLLOW)`+`fstat()` pattern for all file stats.
> > Checking symlink atime is in one of the unittests, though I don't know
> > if production ever uses it.
> > 
> > 
> > Helge
> > 
> 



Re: [PATCH 06/12] scsi: remove AioContext locking

2023-12-04 Thread Stefan Hajnoczi

On Mon, Dec 04, 2023 at 01:23:09PM +0100, Kevin Wolf wrote:
> Am 29.11.2023 um 20:55 hat Stefan Hajnoczi geschrieben:
> > The AioContext lock no longer has any effect. Remove it.
> > 
> > Signed-off-by: Stefan Hajnoczi 
> > ---
> >  include/hw/virtio/virtio-scsi.h | 14 --
> >  hw/scsi/scsi-bus.c  |  2 --
> >  hw/scsi/scsi-disk.c | 28 
> >  hw/scsi/virtio-scsi.c   | 18 --
> >  4 files changed, 4 insertions(+), 58 deletions(-)
> 
> > @@ -2531,13 +2527,11 @@ static void scsi_unrealize(SCSIDevice *dev)
> >  static void scsi_hd_realize(SCSIDevice *dev, Error **errp)
> >  {
> >  SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev, dev);
> > -AioContext *ctx = NULL;
> > +
> >  /* can happen for devices without drive. The error message for missing
> >   * backend will be issued in scsi_realize
> >   */
> >  if (s->qdev.conf.blk) {
> > -ctx = blk_get_aio_context(s->qdev.conf.blk);
> > -aio_context_acquire(ctx);
> >  if (!blkconf_blocksizes(&s->qdev.conf, errp)) {
> >  goto out;
> >  }
> > @@ -2549,15 +2543,11 @@ static void scsi_hd_realize(SCSIDevice *dev, Error **errp)
> >  }
> >  scsi_realize(&s->qdev, errp);
> >  out:
> > -if (ctx) {
> > -aio_context_release(ctx);
> > -}
> >  }
> 
> This doesn't build for me:
> 
> ../hw/scsi/scsi-disk.c:2545:1: error: label at end of compound statement is a C2x extension [-Werror,-Wc2x-extensions]
> }
> ^
> 1 error generated.
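
For context: before C23 (C2x), a label must be followed by a statement,
so once the release call after "out:" is removed the label sits directly
in front of the closing brace. The sketch below shows one conventional
way to keep such a label valid; toy_realize() is a made-up stand-in and
not the actual v2 fix.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical stand-in for a void realize function with an error
 * path. Without the "return;" the label would be the last thing in the
 * function body, which compilers reject or warn about before C23.
 */
void toy_realize(bool have_backend, bool sizes_ok, bool *realized)
{
    *realized = false;

    if (have_backend) {
        if (!sizes_ok) {
            goto out;
        }
    }
    *realized = true;
out:
    return;   /* gives the label a statement to attach to */
}
```

Other options are an empty statement after the label, or dropping the
label and returning directly at the point of failure.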

Will fix in v2. Thanks!

Stefan


