[PATCH v2] target/i386: Restore TSX features with taa-no

2022-07-13 Thread Zhenzhong Duan
On an ICX-2S2 host, when running an L2 guest with both L1 and L2 using
Icelake-Server-v3 or above, we get the warning below:

"warning: host doesn't support requested feature: MSR(10AH).taa-no [bit 8]"

This is because L1 KVM doesn't expose taa-no to L2 if RTM is disabled,
so starting the L2 qemu triggers the warning.

Fix it by restoring the TSX features in Icelake-Server-v3, which may also
help guest performance if the host isn't susceptible to TSX Async Abort
(TAA) vulnerabilities.
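
Until a build with this change is available, the same configuration can
be approximated from the command line by turning the TSX features back
on explicitly (untested sketch, using the same properties as the diff
below):

    qemu-system-x86_64 -cpu Icelake-Server-v3,hle=on,rtm=on ...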

Fixes: d965dc35592d ("target/i386: Add ARCH_CAPABILITIES related bits into Icelake-Server CPU model")
Tested-by: Xiangfei Ma 
Signed-off-by: Zhenzhong Duan 
---
v2: Rewrite commit message

 target/i386/cpu.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 14f681e998cc..25ef972a3eed 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -3423,6 +3423,9 @@ static const X86CPUDefinition builtin_x86_defs[] = {
 {
 .version = 3,
 .props = (PropValue[]) {
+/* Restore TSX features removed by -v2 above */
+{ "hle", "on" },
+{ "rtm", "on" },
 { "arch-capabilities", "on" },
 { "rdctl-no", "on" },
 { "ibrs-all", "on" },
-- 
2.25.1




Re: [PATCH 00/12] hw/nvme: misc fixes and updates

2022-07-13 Thread Klaus Jensen
On Jun 23 23:18, Klaus Jensen wrote:
> From: Klaus Jensen 
> 
> This series includes a couple of misc fixes as well as some cleanup
> pertaining to the aio handling in flush, dsm, copy and zone reset. As
> Jinhao gets around to iothread stuff, it might come in handy to have
> this stuff cleaned up a bit.
> 
> Dmitry's fix (nvme-next commit "hw/nvme: add missing return statement")
> for dsm prompted me to audit the flush, dsm, zone reset and copy code
> and that resulted in the discovery of some bugs and some general clean
> up.
> 
> Klaus Jensen (12):
>   hw/nvme: fix incorrect use of errp/local_err
>   hw/nvme: remove redundant passing of PCIDevice
>   hw/nvme: cleanup error reporting in nvme_init_pci()
>   hw/nvme: fix numzrwa handling
>   hw/nvme: fix accidental reintroduction of redundant code
>   hw/nvme: fix cancellation of format operations
>   hw/nvme: fix flush cancel
>   hw/nvme: rework flush bh scheduling
>   hw/nvme: improve cancellation handling in zone reset
>   hw/nvme: improve cancellation handling in dsm
>   hw/nvme: simplify copy command error handling
>   hw/nvme: align logic of format with flush
> 
>  hw/nvme/ctrl.c | 252 +++--
>  hw/nvme/ns.c   |   4 +-
>  2 files changed, 119 insertions(+), 137 deletions(-)
> 
> -- 
> 2.36.1
> 

Ping,

We are coming up on the 7.1 soft-freeze. Most of the above are fixes, so
they can go in after, but I'd like to get a couple of the other non-fix
patches in if possible ;)


signature.asc
Description: PGP signature


Re: [PATCH v4] hw/nvme: Use ioeventfd to handle doorbell updates

2022-07-13 Thread Klaus Jensen
On Jul 12 14:23, Klaus Jensen wrote:
> On Jul  9 11:06, Jinhao Fan wrote:
> > at 10:24 PM, Jinhao Fan  wrote:
> > 
> > > @@ -5793,6 +5891,7 @@ static uint16_t nvme_dbbuf_config(NvmeCtrl *n, 
> > > const NvmeRequest *req)
> > > uint64_t dbs_addr = le64_to_cpu(req->cmd.dptr.prp1);
> > > uint64_t eis_addr = le64_to_cpu(req->cmd.dptr.prp2);
> > > int i;
> > > +int ret;
> > > 
> > 
> > I just noticed this ret is unused. Could you help remove this line when
> > applying the patch?
> 
> Yes, I noticed it and hot-fixed it ;)

Jinhao,

Applied to nvme-next!


signature.asc
Description: PGP signature


[PATCH] hw/nvme: add trace events for ioeventfd

2022-07-13 Thread Klaus Jensen
From: Klaus Jensen 

While testing Jinhao's ioeventfd patch, I found it useful to have a couple
of additional trace events, since we no longer see the mmio events.
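
The new events can be enabled in the usual way, e.g. with a trace
pattern on the command line (a sketch, assuming a build with tracing
enabled):

    qemu-system-x86_64 -trace 'pci_nvme_*' ...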

Signed-off-by: Klaus Jensen 
---
 hw/nvme/ctrl.c   | 8 
 hw/nvme/trace-events | 4 
 2 files changed, 12 insertions(+)

diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index 533ad14e7a61..09725ec49c5d 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -1346,6 +1346,8 @@ static void nvme_post_cqes(void *opaque)
 bool pending = cq->head != cq->tail;
 int ret;
 
+trace_pci_nvme_post_cqes(cq->cqid);
+
QTAILQ_FOREACH_SAFE(req, &cq->req_list, entry, next) {
 NvmeSQueue *sq;
 hwaddr addr;
@@ -4238,6 +4240,8 @@ static void nvme_cq_notifier(EventNotifier *e)
 NvmeCQueue *cq = container_of(e, NvmeCQueue, notifier);
 NvmeCtrl *n = cq->ctrl;
 
+trace_pci_nvme_cq_notify(cq->cqid);
+
event_notifier_test_and_clear(&cq->notifier);
 
 nvme_update_cq_head(cq);
@@ -4275,6 +4279,8 @@ static void nvme_sq_notifier(EventNotifier *e)
 {
 NvmeSQueue *sq = container_of(e, NvmeSQueue, notifier);
 
+trace_pci_nvme_sq_notify(sq->sqid);
+
event_notifier_test_and_clear(&sq->notifier);
 
 nvme_process_sq(sq);
@@ -6240,6 +6246,8 @@ static void nvme_process_sq(void *opaque)
 NvmeCtrl *n = sq->ctrl;
 NvmeCQueue *cq = n->cq[sq->cqid];
 
+trace_pci_nvme_process_sq(sq->sqid);
+
 uint16_t status;
 hwaddr addr;
 NvmeCmd cmd;
diff --git a/hw/nvme/trace-events b/hw/nvme/trace-events
index fccb79f48973..45dd708bd2fa 100644
--- a/hw/nvme/trace-events
+++ b/hw/nvme/trace-events
@@ -104,6 +104,10 @@ pci_nvme_mmio_shutdown_set(void) "shutdown bit set"
 pci_nvme_mmio_shutdown_cleared(void) "shutdown bit cleared"
pci_nvme_shadow_doorbell_cq(uint16_t cqid, uint16_t new_shadow_doorbell) "cqid %"PRIu16" new_shadow_doorbell %"PRIu16""
pci_nvme_shadow_doorbell_sq(uint16_t sqid, uint16_t new_shadow_doorbell) "sqid %"PRIu16" new_shadow_doorbell %"PRIu16""
+pci_nvme_sq_notify(uint16_t sqid) "sqid %"PRIu16""
+pci_nvme_cq_notify(uint16_t cqid) "cqid %"PRIu16""
+pci_nvme_process_sq(uint16_t sqid) "sqid %"PRIu16""
+pci_nvme_post_cqes(uint16_t cqid) "cqid %"PRIu16""
pci_nvme_open_zone(uint64_t slba, uint32_t zone_idx, int all) "open zone, slba=%"PRIu64", idx=%"PRIu32", all=%"PRIi32""
pci_nvme_close_zone(uint64_t slba, uint32_t zone_idx, int all) "close zone, slba=%"PRIu64", idx=%"PRIu32", all=%"PRIi32""
pci_nvme_finish_zone(uint64_t slba, uint32_t zone_idx, int all) "finish zone, slba=%"PRIu64", idx=%"PRIu32", all=%"PRIi32""
-- 
2.36.1




Re: [PATCH v7 00/14] KVM: mm: fd-based approach for supporting KVM guest private memory

2022-07-13 Thread Gupta, Pankaj




This is the v7 of this series which tries to implement the fd-based KVM
guest private memory. The patches are based on latest kvm/queue branch
commit:

 b9b71f43683a (kvm/queue) KVM: x86/mmu: Buffer nested MMU
split_desc_cache only by default capacity

Introduction

In general this patch series introduce fd-based memslot which provides
guest memory through memory file descriptor fd[offset,size] instead of
hva/size. The fd can be created from a supported memory filesystem
like tmpfs/hugetlbfs etc. which we refer as memory backing store. KVM


Thinking a bit, As host side fd on tmpfs or shmem will store memory on host
page cache instead of mapping pages into userspace address space. Can we hit
double (un-coordinated) page cache problem with this when guest page cache
is also used?


This is my understanding: in host it will be indeed in page cache (in
current shmem implementation) but that's just the way it allocates and
provides the physical memory for the guest. In guest, guest OS will not
see this fd (absolutely), it only sees guest memory, on top of which it
can build its own page cache system for its own file-mapped content but
that is unrelated to host page cache.


yes. If guest fills its page cache with file backed memory, this at host
side(on shmem fd backend) will also fill the host page cache fast. This
can have an impact on performance of guest VM's if host goes to memory
pressure situation sooner. Or else we end up utilizing way less System
RAM.


Is this in any meaningful way different from a regular VM?


After thinking a bit, it seems 'No', except for the reclaim decisions the
system would take under memory pressure; we will also have to see how well
this gets stitched together with memory tiers in the future. But all these
are future topics.


Sorry for the noise!

Thanks,
Pankaj




Re: [PATCH v7 00/14] KVM: mm: fd-based approach for supporting KVM guest private memory

2022-07-13 Thread Gupta, Pankaj



This is the v7 of this series which tries to implement the 
fd-based KVM
guest private memory. The patches are based on latest kvm/queue 
branch

commit:

 b9b71f43683a (kvm/queue) KVM: x86/mmu: Buffer nested MMU
split_desc_cache only by default capacity

Introduction

In general this patch series introduce fd-based memslot which 
provides
guest memory through memory file descriptor fd[offset,size] 
instead of

hva/size. The fd can be created from a supported memory filesystem
like tmpfs/hugetlbfs etc. which we refer as memory backing store. KVM


Thinking a bit, As host side fd on tmpfs or shmem will store memory 
on host
page cache instead of mapping pages into userspace address space. 
Can we hit
double (un-coordinated) page cache problem with this when guest 
page cache

is also used?


This is my understanding: in host it will be indeed in page cache (in
current shmem implementation) but that's just the way it allocates and
provides the physical memory for the guest. In guest, guest OS will not
see this fd (absolutely), it only sees guest memory, on top of which it
can build its own page cache system for its own file-mapped content but
that is unrelated to host page cache.


yes. If guest fills its page cache with file backed memory, this at host
side(on shmem fd backend) will also fill the host page cache fast. 
This can
have an impact on performance of guest VM's if host goes to memory 
pressure

situation sooner. Or else we end up utilizing way less System RAM.


(Currently), the file backed guest private memory is long-term pinned
and not reclaimable, it's in page cache anyway once we allocated it for
guest. This does not depend on how guest use it (e.g. use it for guest
page cache or not).


Even if host shmem-backed memory is always un-reclaimable, do we end
up utilizing double RAM (both in guest & host page cache) for guest disk
accesses?


Answering my own question:

We won't use double RAM, just the view of the guest & host structures
would change as per the code path taken. If we don't care about reclaim
situations we should be good, else we have to think of something to
coordinate the page cache between guest & host (that could be an
optimization for later).


Thanks,
Pankaj




Re: [PATCH v7 00/14] KVM: mm: fd-based approach for supporting KVM guest private memory

2022-07-13 Thread Gupta, Pankaj




This is the v7 of this series which tries to implement the fd-based KVM
guest private memory. The patches are based on latest kvm/queue branch
commit:

 b9b71f43683a (kvm/queue) KVM: x86/mmu: Buffer nested MMU
split_desc_cache only by default capacity

Introduction

In general this patch series introduce fd-based memslot which provides
guest memory through memory file descriptor fd[offset,size] instead of
hva/size. The fd can be created from a supported memory filesystem
like tmpfs/hugetlbfs etc. which we refer as memory backing store. KVM


Thinking a bit, As host side fd on tmpfs or shmem will store memory on host
page cache instead of mapping pages into userspace address space. Can we hit
double (un-coordinated) page cache problem with this when guest page cache
is also used?


This is my understanding: in host it will be indeed in page cache (in
current shmem implementation) but that's just the way it allocates and
provides the physical memory for the guest. In guest, guest OS will not
see this fd (absolutely), it only sees guest memory, on top of which it
can build its own page cache system for its own file-mapped content but
that is unrelated to host page cache.


yes. If guest fills its page cache with file backed memory, this at host
side(on shmem fd backend) will also fill the host page cache fast. This can
have an impact on performance of guest VM's if host goes to memory pressure
situation sooner. Or else we end up utilizing way less System RAM.


(Currently), the file backed guest private memory is long-term pinned
and not reclaimable, it's in page cache anyway once we allocated it for
guest. This does not depend on how guest use it (e.g. use it for guest
page cache or not).


Even if host shmem-backed memory is always un-reclaimable, do we end
up utilizing double RAM (both in guest & host page cache) for guest disk
accesses?


I am considering this a serious design decision that we should settle
before we commit to this approach.


Happy to be enlightened on this and know the thoughts from others as well.

Thanks,
Pankaj




Re: [PATCH v7 00/14] KVM: mm: fd-based approach for supporting KVM guest private memory

2022-07-13 Thread Andy Lutomirski



On Wed, Jul 13, 2022, at 3:35 AM, Gupta, Pankaj wrote:
 This is the v7 of this series which tries to implement the fd-based KVM
 guest private memory. The patches are based on latest kvm/queue branch
 commit:

 b9b71f43683a (kvm/queue) KVM: x86/mmu: Buffer nested MMU
 split_desc_cache only by default capacity

 Introduction
 
 In general this patch series introduce fd-based memslot which provides
 guest memory through memory file descriptor fd[offset,size] instead of
 hva/size. The fd can be created from a supported memory filesystem
 like tmpfs/hugetlbfs etc. which we refer as memory backing store. KVM
>>>
>>> Thinking a bit, As host side fd on tmpfs or shmem will store memory on host
>>> page cache instead of mapping pages into userspace address space. Can we hit
>>> double (un-coordinated) page cache problem with this when guest page cache
>>> is also used?
>> 
>> This is my understanding: in host it will be indeed in page cache (in
>> current shmem implementation) but that's just the way it allocates and
>> provides the physical memory for the guest. In guest, guest OS will not
>> see this fd (absolutely), it only sees guest memory, on top of which it
>> can build its own page cache system for its own file-mapped content but
>> that is unrelated to host page cache.
>
> yes. If guest fills its page cache with file backed memory, this at host 
> side(on shmem fd backend) will also fill the host page cache fast. This 
> can have an impact on performance of guest VM's if host goes to memory 
> pressure situation sooner. Or else we end up utilizing way less System 
> RAM.

Is this in any meaningful way different from a regular VM?

--Andy



Re: [PATCH v7 04/14] mm/shmem: Support memfile_notifier

2022-07-13 Thread Gupta, Pankaj




+#ifdef CONFIG_MIGRATION
+static int shmem_migrate_page(struct address_space *mapping,
+ struct page *newpage, struct page *page,
+ enum migrate_mode mode)
+{
+   struct inode *inode = mapping->host;
+   struct shmem_inode_info *info = SHMEM_I(inode);
+
+   if (info->memfile_node.flags & MEMFILE_F_UNMOVABLE)
+   return -EOPNOTSUPP;
+   return migrate_page(mapping, newpage, page, mode);


Wondering how well page migrate would work for private pages
on shmem memfd based backend?


  From high level:
- KVM unset MEMFILE_F_UNMOVABLE bit to indicate it capable of
  migrating a page.
- Introduce new 'migrate' callback(s) to memfile_notifier_ops for KVM
  to register.
- The callback is hooked to migrate_page() here.
- Once page migration requested, shmem calls into the 'migrate'
  callback(s) to perform additional steps for encrypted memory (For
  TDX we will call TDH.MEM.PAGE.RELOCATE).


Yes, that would require additional (protocol specific) handling for private
pages. Was trying to find where "MEMFILE_F_UNMOVABLE" flag is set currently?


It's set with memfile_register_notifier() in patch 13.


o.k.

Thanks,

Pankaj




Re: [RFC] hw/nvme: Use irqfd to send interrupts

2022-07-13 Thread Klaus Jensen
On Jul 12 14:26, Klaus Jensen wrote:
> On Jul  9 12:35, Jinhao Fan wrote:
> > Use irqfd to directly notify KVM to inject interrupts. This is done by
> > registering a virtual IRQ(virq) in KVM and associate the virq with an
> > irqfd, so that KVM can directly inject the interrupt when it receives
> > notification from the irqfd. This approach is supposed to improve 
> > performance because it bypasses QEMU's MSI interrupt emulation logic.
> > 
> > However, I did not see an obvious improvement of the emulation KIOPS:
> > 
> > QD  1   4  16  64 
> > QEMU   38 123 210 329
> > irqfd  40 129 219 328
> > 
> > I found this problem quite hard to diagnose since irqfd's workflow
> > involves both QEMU and the in-kernel KVM. 
> > 
> > Could you help me figure out the following questions:
> > 
> > 1. How much performance improvement can I expect from using irqfd?
> 
> This is a level of QEMU/KVM that I am by no means an expert on and I
> would have to let the broader QEMU community comment on this.
> 

In any case, I'm wary about adding this level of kvm-dependence in the
device. This won't work on non-kvm platforms any more.

I think you should put irqfd on hold and focus on iothreads :)


signature.asc
Description: PGP signature


Re: [PATCH] qtest/machine-none: Add LoongArch support

2022-07-13 Thread Richard Henderson

On 7/13/22 07:32, Song Gao wrote:

Update the cpu_maps[] to support the LoongArch target.

Signed-off-by: Song Gao
---
  tests/qtest/machine-none-test.c | 1 +
  1 file changed, 1 insertion(+)


Reviewed-by: Richard Henderson 

r~



Re: [PATCH v2 3/3] tests/tcg/s390x: test signed vfmin/vfmax

2022-07-13 Thread Richard Henderson

On 7/13/22 23:56, Ilya Leoshkevich wrote:

Add a test to prevent regressions. Try all floating point value sizes
and all combinations of floating point value classes. Verify the results
against PoP tables, which are represented as close to the original as
possible - this produces a lot of checkpatch complaints, but it seems
to be justified in this case.

Signed-off-by: Ilya Leoshkevich
---
  tests/tcg/s390x/Makefile.target |   7 +
  tests/tcg/s390x/vfminmax.c  | 411 
  2 files changed, 418 insertions(+)
  create mode 100644 tests/tcg/s390x/vfminmax.c


Reviewed-by: Richard Henderson 

r~



Re: [PATCH v7 00/14] KVM: mm: fd-based approach for supporting KVM guest private memory

2022-07-13 Thread Chao Peng
On Wed, Jul 13, 2022 at 12:35:56PM +0200, Gupta, Pankaj wrote:
> 
> > > > This is the v7 of this series which tries to implement the fd-based KVM
> > > > guest private memory. The patches are based on latest kvm/queue branch
> > > > commit:
> > > > 
> > > > b9b71f43683a (kvm/queue) KVM: x86/mmu: Buffer nested MMU
> > > > split_desc_cache only by default capacity
> > > > 
> > > > Introduction
> > > > 
> > > > In general this patch series introduce fd-based memslot which provides
> > > > guest memory through memory file descriptor fd[offset,size] instead of
> > > > hva/size. The fd can be created from a supported memory filesystem
> > > > like tmpfs/hugetlbfs etc. which we refer as memory backing store. KVM
> > > 
> > > Thinking a bit, As host side fd on tmpfs or shmem will store memory on 
> > > host
> > > page cache instead of mapping pages into userspace address space. Can we 
> > > hit
> > > double (un-coordinated) page cache problem with this when guest page cache
> > > is also used?
> > 
> > This is my understanding: in host it will be indeed in page cache (in
> > current shmem implementation) but that's just the way it allocates and
> > provides the physical memory for the guest. In guest, guest OS will not
> > see this fd (absolutely), it only sees guest memory, on top of which it
> > can build its own page cache system for its own file-mapped content but
> > that is unrelated to host page cache.
> 
> yes. If guest fills its page cache with file backed memory, this at host
> side(on shmem fd backend) will also fill the host page cache fast. This can
> have an impact on performance of guest VM's if host goes to memory pressure
> situation sooner. Or else we end up utilizing way less System RAM.

(Currently), the file backed guest private memory is long-term pinned
and not reclaimable, it's in page cache anyway once we allocated it for
guest. This does not depend on how guest use it (e.g. use it for guest
page cache or not). 

Chao
> 
> Thanks,
> Pankaj
> 



Re: [PATCH v7 04/14] mm/shmem: Support memfile_notifier

2022-07-13 Thread Chao Peng
On Wed, Jul 13, 2022 at 12:01:13PM +0200, Gupta, Pankaj wrote:
> 
> > > > +#ifdef CONFIG_MIGRATION
> > > > +static int shmem_migrate_page(struct address_space *mapping,
> > > > + struct page *newpage, struct page *page,
> > > > + enum migrate_mode mode)
> > > > +{
> > > > +   struct inode *inode = mapping->host;
> > > > +   struct shmem_inode_info *info = SHMEM_I(inode);
> > > > +
> > > > +   if (info->memfile_node.flags & MEMFILE_F_UNMOVABLE)
> > > > +   return -EOPNOTSUPP;
> > > > +   return migrate_page(mapping, newpage, page, mode);
> > > 
> > > Wondering how well page migrate would work for private pages
> > > on shmem memfd based backend?
> > 
> >  From high level:
> >- KVM unset MEMFILE_F_UNMOVABLE bit to indicate it capable of
> >  migrating a page.
> >- Introduce new 'migrate' callback(s) to memfile_notifier_ops for KVM
> >  to register.
> >- The callback is hooked to migrate_page() here.
> >- Once page migration requested, shmem calls into the 'migrate'
> >  callback(s) to perform additional steps for encrypted memory (For
> >  TDX we will call TDH.MEM.PAGE.RELOCATE).
> 
> Yes, that would require additional (protocol specific) handling for private
> pages. Was trying to find where "MEMFILE_F_UNMOVABLE" flag is set currently?

It's set with memfile_register_notifier() in patch 13.

> 
> Thanks,
> Pankaj



Re: [PULL 00/35] Block patches

2022-07-13 Thread Peter Maydell
On Tue, 12 Jul 2022 at 19:10, Hanna Reitz  wrote:
>
> The following changes since commit 9548cbed4253e38570d29b8cff0bf77c998f:
>
>   iotests/copy-before-write: specify required_fmts (2022-07-12 13:21:02 +0530)
>
> are available in the Git repository at:
>
>   https://gitlab.com/hreitz/qemu.git tags/pull-block-2022-07-12
>
> for you to fetch changes up to 85c4bf8aa6c93c24876e8870ae7cf8ab2e5a96cf:
>
>   vl: Unlink absolute PID file path (2022-07-12 14:31:15 +0200)
>
> 
> Block patches:
> - Refactoring for non-coroutine variants of bdrv/blk_co_* functions:
>   Auto-generate more of them with the block coroutine wrapper generator
>   script
> - iotest fixes
> - Both for the storage daemon and the system emulator: Fix PID file
>   handling when daemonizing (store the absolute path and delete that on
>   exit, which is necessary because daemonizing will change the working
>   directory to /)
>
> 



Applied, thanks.

Please update the changelog at https://wiki.qemu.org/ChangeLog/7.1
for any user-visible changes.

-- PMM



[PATCH 4/7] pci: designware: ignore new bits in ATU CR1

2022-07-13 Thread Ben Dooks
In version 4 and above, ATU CR1 has more bits in it than just the
viewport type. Make a guess at masking these out to avoid issues
where Linux writes these bits and then fails to enable memory ATUs.
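
To make the failure mode concrete, here is a standalone sketch; the 0xFF
mask matches the change below, while the extra high bit stands in for
whatever a v4+ kernel writes and is purely illustrative:

    #include <stdint.h>
    #include <stdio.h>

    #define ATU_TYPE_MEM 0x0u

    int main(void)
    {
        /* CR1 as a newer kernel might write it: type in the low byte
         * plus additional upper bits (invented value) */
        uint32_t cr0 = 0x00100000u | ATU_TYPE_MEM;

        printf("old check: %s\n",
               cr0 == ATU_TYPE_MEM ? "MEM" : "ignored");
        printf("new check: %s\n",
               (cr0 & 0xFF) == ATU_TYPE_MEM ? "MEM" : "ignored");
        return 0;
    }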

Signed-off-by: Ben Dooks 
---
 hw/pci-host/designware.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/hw/pci-host/designware.c b/hw/pci-host/designware.c
index 6403416634..947547d153 100644
--- a/hw/pci-host/designware.c
+++ b/hw/pci-host/designware.c
@@ -276,10 +276,10 @@ static void 
designware_pcie_update_viewport(DesignwarePCIERoot *root,
 const uint64_t base   = viewport->base;
 const uint64_t size   = viewport->limit - base + 1;
 const bool enabled= viewport->cr[1] & DESIGNWARE_PCIE_ATU_ENABLE;
-
+uint32_t cr0  = viewport->cr[0];
 MemoryRegion *current, *other;
 
-if (viewport->cr[0] == DESIGNWARE_PCIE_ATU_TYPE_MEM) {
+if ((cr0 & 0xFF) == DESIGNWARE_PCIE_ATU_TYPE_MEM) {
current = &viewport->mem;
other   = &viewport->cfg;
 memory_region_set_alias_offset(current, target);
-- 
2.35.1




[PATCH 5/7] pci: designware: move msi to entry 5

2022-07-13 Thread Ben Dooks
The driver should leave irq[0..3] for INT[A..D] but seems to put the
MSI IRQ at entry 3, which should also be INT_D. Extend the irqs[] array
to 5 entries and put the MSI at entry irqs[4].

Signed-off-by: Ben Dooks 
---
 hw/pci-host/designware.c | 2 +-
 include/hw/pci-host/designware.h | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/hw/pci-host/designware.c b/hw/pci-host/designware.c
index 947547d153..b5d5b2b8a5 100644
--- a/hw/pci-host/designware.c
+++ b/hw/pci-host/designware.c
@@ -56,7 +56,7 @@
 #define DESIGNWARE_PCIE_ATU_UPPER_TARGET   0x91C
 #define DESIGNWARE_PCIE_ATU_UPPER_LIMIT0x924
 
-#define DESIGNWARE_PCIE_IRQ_MSI3
+#define DESIGNWARE_PCIE_IRQ_MSI4
 
 static DesignwarePCIEHost *
 designware_pcie_root_to_host(DesignwarePCIERoot *root)
diff --git a/include/hw/pci-host/designware.h b/include/hw/pci-host/designware.h
index bd4dd49aec..37f90c5000 100644
--- a/include/hw/pci-host/designware.h
+++ b/include/hw/pci-host/designware.h
@@ -90,7 +90,7 @@ struct DesignwarePCIEHost {
 MemoryRegion memory;
 MemoryRegion io;
 
-qemu_irq irqs[4];
+qemu_irq irqs[5];
 } pci;
 
 MemoryRegion mmio;
-- 
2.35.1




[PATCH 1/7] pci: designware: add 64-bit viewport limit

2022-07-13 Thread Ben Dooks
Versions 4 and above add support for a 64-bit viewport
limit. Add support for the DESIGNWARE_PCIE_ATU_UPPER_LIMIT
register where supported.

Signed-off-by: Ben Dooks 
---
 hw/pci-host/designware.c | 22 +-
 include/hw/pci-host/designware.h |  2 +-
 2 files changed, 18 insertions(+), 6 deletions(-)

diff --git a/hw/pci-host/designware.c b/hw/pci-host/designware.c
index bde3a343a2..296f1b9760 100644
--- a/hw/pci-host/designware.c
+++ b/hw/pci-host/designware.c
@@ -54,6 +54,7 @@
 #define DESIGNWARE_PCIE_ATU_BUS(x) (((x) >> 24) & 0xff)
 #define DESIGNWARE_PCIE_ATU_DEVFN(x)   (((x) >> 16) & 0xff)
 #define DESIGNWARE_PCIE_ATU_UPPER_TARGET   0x91C
+#define DESIGNWARE_PCIE_ATU_UPPER_LIMIT0x924
 
 #define DESIGNWARE_PCIE_IRQ_MSI3
 
@@ -196,6 +197,10 @@ designware_pcie_root_config_read(PCIDevice *d, uint32_t 
address, int len)
 val = viewport->target >> 32;
 break;
 
+case DESIGNWARE_PCIE_ATU_UPPER_LIMIT:
+val = viewport->limit >> 32;
+break;
+
 case DESIGNWARE_PCIE_ATU_LIMIT:
 val = viewport->limit;
 break;
@@ -269,7 +274,7 @@ static void 
designware_pcie_update_viewport(DesignwarePCIERoot *root,
 {
 const uint64_t target = viewport->target;
 const uint64_t base   = viewport->base;
-const uint64_t size   = (uint64_t)viewport->limit - base + 1;
+const uint64_t size   = viewport->limit - base + 1;
 const bool enabled= viewport->cr[1] & DESIGNWARE_PCIE_ATU_ENABLE;
 
 MemoryRegion *current, *other;
@@ -363,14 +368,21 @@ static void designware_pcie_root_config_write(PCIDevice 
*d, uint32_t address,
 viewport->target |= val;
 break;
 
+case DESIGNWARE_PCIE_ATU_UPPER_LIMIT:
+viewport->limit &= 0xffffffffUL;
+viewport->limit |= (uint64_t)val << 32;
+break;
+
 case DESIGNWARE_PCIE_ATU_LIMIT:
-viewport->limit = val;
+viewport->limit &= 0xffffffff00000000ULL;
+viewport->limit |= val;
 break;
 
 case DESIGNWARE_PCIE_ATU_CR1:
 viewport->cr[0] = val;
 break;
 case DESIGNWARE_PCIE_ATU_CR2:
+//printf("CR2: value %08x\n", val);
 viewport->cr[1] = val;
 designware_pcie_update_viewport(root, viewport);
 break;
@@ -429,7 +441,7 @@ static void designware_pcie_root_realize(PCIDevice *dev, 
Error **errp)
 viewport->inbound = true;
viewport->base    = 0x0000000000000000ULL;
viewport->target  = 0x0000000000000000ULL;
-viewport->limit   = UINT32_MAX;
+viewport->limit   = UINT64_MAX-1;
 viewport->cr[0]   = DESIGNWARE_PCIE_ATU_TYPE_MEM;
 
source  = &host->pci.address_space_root;
@@ -453,7 +465,7 @@ static void designware_pcie_root_realize(PCIDevice *dev, 
Error **errp)
 viewport->inbound = false;
viewport->base    = 0x0000000000000000ULL;
viewport->target  = 0x0000000000000000ULL;
-viewport->limit   = UINT32_MAX;
+viewport->limit   = UINT64_MAX-1;
 viewport->cr[0]   = DESIGNWARE_PCIE_ATU_TYPE_MEM;
 
destination = &host->pci.memory;
@@ -560,7 +572,7 @@ static const VMStateDescription 
vmstate_designware_pcie_viewport = {
 .fields = (VMStateField[]) {
 VMSTATE_UINT64(base, DesignwarePCIEViewport),
 VMSTATE_UINT64(target, DesignwarePCIEViewport),
-VMSTATE_UINT32(limit, DesignwarePCIEViewport),
+VMSTATE_UINT64(limit, DesignwarePCIEViewport),
 VMSTATE_UINT32_ARRAY(cr, DesignwarePCIEViewport, 2),
 VMSTATE_END_OF_LIST()
 }
diff --git a/include/hw/pci-host/designware.h b/include/hw/pci-host/designware.h
index 6d9b51ae67..bd4dd49aec 100644
--- a/include/hw/pci-host/designware.h
+++ b/include/hw/pci-host/designware.h
@@ -44,7 +44,7 @@ typedef struct DesignwarePCIEViewport {
 
 uint64_t base;
 uint64_t target;
-uint32_t limit;
+uint64_t limit;
 uint32_t cr[2];
 
 bool inbound;
-- 
2.35.1




[PATCH 2/7] pci: designware: fix DESIGNWARE_PCIE_ATU_UPPER_TARGET

2022-07-13 Thread Ben Dooks
By inspection DESIGNWARE_PCIE_ATU_UPPER_TARGET should be writing to
the upper 32-bits of viewport->target, so fix this by shifting the
32-bit value before the or.
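
The difference is easy to see in isolation (standalone sketch, not the
device code):

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        uint32_t val = 0x12345678;      /* write to UPPER_TARGET */
        uint64_t target = 0;

        target |= val;                  /* old: lands in the low half */
        printf("old: 0x%016llx\n", (unsigned long long)target);

        target = 0;
        target |= (uint64_t)val << 32;  /* new: shifted into the top half */
        printf("new: 0x%016llx\n", (unsigned long long)target);
        return 0;
    }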

Signed-off-by: Ben Dooks 
---
 hw/pci-host/designware.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/pci-host/designware.c b/hw/pci-host/designware.c
index 296f1b9760..d213d7324c 100644
--- a/hw/pci-host/designware.c
+++ b/hw/pci-host/designware.c
@@ -365,7 +365,7 @@ static void designware_pcie_root_config_write(PCIDevice *d, 
uint32_t address,
 
 case DESIGNWARE_PCIE_ATU_UPPER_TARGET:
viewport->target &= 0x00000000ffffffffULL;
-viewport->target |= val;
+viewport->target |= (uint64_t)val << 32;
 break;
 
 case DESIGNWARE_PCIE_ATU_UPPER_LIMIT:
-- 
2.35.1




[PATCH 3/7] pci: designware: clamp viewport index

2022-07-13 Thread Ben Dooks
The current Linux driver for this assumes it can write the 255 into
this register and then read back the value to work out how many
viewports are supported.

Clamp the value so that the probe works and does not cause memory
corruption as the value is not well clamped elsewhere in the driver.

Signed-off-by: Ben Dooks 
---
 hw/pci-host/designware.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/hw/pci-host/designware.c b/hw/pci-host/designware.c
index d213d7324c..6403416634 100644
--- a/hw/pci-host/designware.c
+++ b/hw/pci-host/designware.c
@@ -345,6 +345,10 @@ static void designware_pcie_root_config_write(PCIDevice 
*d, uint32_t address,
 break;
 
 case DESIGNWARE_PCIE_ATU_VIEWPORT:
+/* clamp this value, linux uses it to calculate the
+ * available number of viewports */
+if (val >= DESIGNWARE_PCIE_NUM_VIEWPORTS)
+val = DESIGNWARE_PCIE_NUM_VIEWPORTS-1;
 root->atu_viewport = val;
 break;
 
-- 
2.35.1




updates for designware pci-host

2022-07-13 Thread Ben Dooks
As part of a project we have been looking at using the DesignWare
PCIe host. We found a few issues of missing features or small bugs
when using this with a recent Linux kernel (v5.17.x).

Whilst doing this we also made a start on some tracing events.





[PATCH 6/7] pci: designware: correct host's class_id

2022-07-13 Thread Ben Dooks
This is a host-to-PCIe bridge, so use PCI_CLASS_BRIDGE_HOST
for the class.

Signed-off-by: Ben Dooks 
---
 hw/pci-host/designware.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/pci-host/designware.c b/hw/pci-host/designware.c
index b5d5b2b8a5..a47ae48071 100644
--- a/hw/pci-host/designware.c
+++ b/hw/pci-host/designware.c
@@ -615,7 +615,7 @@ static void designware_pcie_root_class_init(ObjectClass 
*klass, void *data)
 k->vendor_id = PCI_VENDOR_ID_SYNOPSYS;
 k->device_id = 0xABCD;
 k->revision = 0;
-k->class_id = PCI_CLASS_BRIDGE_PCI;
+k->class_id = PCI_CLASS_BRIDGE_HOST;
 k->is_bridge = true;
 k->exit = pci_bridge_exitfn;
 k->realize = designware_pcie_root_realize;
-- 
2.35.1




[PATCH 7/7] pci: designware: add initial tracing events

2022-07-13 Thread Ben Dooks
Add a couple of tracing events for internal driver updates.

Signed-off-by: Ben Dooks 
---
 hw/pci-host/designware.c | 4 
 hw/pci-host/trace-events | 4 
 2 files changed, 8 insertions(+)

diff --git a/hw/pci-host/designware.c b/hw/pci-host/designware.c
index a47ae48071..489959513f 100644
--- a/hw/pci-host/designware.c
+++ b/hw/pci-host/designware.c
@@ -30,6 +30,7 @@
 #include "migration/vmstate.h"
 #include "hw/irq.h"
 #include "hw/pci-host/designware.h"
+#include "trace.h"
 
 #define DESIGNWARE_PCIE_PORT_LINK_CONTROL  0x710
 #define DESIGNWARE_PCIE_PHY_DEBUG_R1   0x72C
@@ -112,6 +113,7 @@ static void 
designware_pcie_root_update_msi_mapping(DesignwarePCIERoot *root)
 const uint64_t base = root->msi.base;
 const bool enable   = root->msi.intr[0].enable;
 
+trace_dw_pcie_msi_update(base, enable);
 memory_region_set_address(mem, base);
 memory_region_set_enabled(mem, enable);
 }
@@ -279,6 +281,8 @@ static void 
designware_pcie_update_viewport(DesignwarePCIERoot *root,
 uint32_t cr0  = viewport->cr[0];
 MemoryRegion *current, *other;
 
+trace_dw_pcie_viewport_update(target, base, size, cr0, enabled);
+
 if ((cr0 & 0xFF) == DESIGNWARE_PCIE_ATU_TYPE_MEM) {
current = &viewport->mem;
other   = &viewport->cfg;
diff --git a/hw/pci-host/trace-events b/hw/pci-host/trace-events
index 437e66ff50..6b064d3c74 100644
--- a/hw/pci-host/trace-events
+++ b/hw/pci-host/trace-events
@@ -3,6 +3,10 @@
 # bonito.c
bonito_spciconf_small_access(uint64_t addr, unsigned size) "PCI config address is smaller then 32-bit, addr: 0x%"PRIx64", size: %u"
 
+# designware.c
+dw_pcie_msi_update(uint64_t base, int enable) "base 0x%" PRIx64 " enable %d"
+dw_pcie_viewport_update(uint64_t target, uint64_t base, uint64_t limit, uint32_t cr0, int enabled) "target 0x%" PRIx64 " base 0x%" PRIx64 " limit 0x%" PRIx64 " cr0 0x%" PRIx32 " enabled %d"
+
 # grackle.c
 grackle_set_irq(int irq_num, int level) "set_irq num %d level %d"
 
-- 
2.35.1




[PATCH] gpio: designware gpio driver

2022-07-13 Thread Ben Dooks
A model for the DesignWare GPIO (v1) block.

Signed-off-by: Ben Dooks 
---
 hw/gpio/Kconfig   |   3 +
 hw/gpio/designware_gpio.c | 327 ++
 hw/gpio/meson.build   |   1 +
 hw/gpio/trace-events  |   7 +
 include/hw/gpio/designware_gpio.h |  89 
 5 files changed, 427 insertions(+)
 create mode 100644 hw/gpio/designware_gpio.c
 create mode 100644 include/hw/gpio/designware_gpio.h

diff --git a/hw/gpio/Kconfig b/hw/gpio/Kconfig
index f0e7405f6e..e5883d9763 100644
--- a/hw/gpio/Kconfig
+++ b/hw/gpio/Kconfig
@@ -13,3 +13,6 @@ config GPIO_PWR
 
 config SIFIVE_GPIO
 bool
+
+config DESIGNWARE_GPIO
+bool
\ No newline at end of file
diff --git a/hw/gpio/designware_gpio.c b/hw/gpio/designware_gpio.c
new file mode 100644
index 00..417bccd630
--- /dev/null
+++ b/hw/gpio/designware_gpio.c
@@ -0,0 +1,327 @@
+/*
+ * Synopsys Designware general purpose input/output register definition
+ *
+ * Based on sifive_gpio.c and imx_gpio.c
+ *
+ * Copyright 2022 Sifive, Inc.
+ * Copyright 2018 Steffen Görtz 
+ *
+ * This code is licensed under the GPL version 2 or later.  See
+ * the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/log.h"
+#include "hw/irq.h"
+#include "hw/qdev-properties.h"
+#include "hw/gpio/designware_gpio.h"
+#include "migration/vmstate.h"
+#include "trace.h"
+
+/* only bank A can provide interrupts */
+static void update_output_irqs(DESIGNWAREGPIOState *s)
+{
+struct DESIGNWAREGPIOBank *bank = &s->bank[0];
+uint32_t level_irqs, edge_irqs = 0;
+
+/* re-calculate interrupts for raw_int_status */
+level_irqs = bank->dr_val ^ s->int_polarity;
+level_irqs &= ~s->int_level;
+
+edge_irqs = bank->dr_val ^ bank->last_dr_val;
+edge_irqs &= s->int_level;
+bank->last_dr_val = bank->dr_val;
+
+/* update irq from raw-status and the mask */
+s->int_status_raw = level_irqs | edge_irqs;
+s->int_status = s->int_status_raw & s->int_mask;
+
+qemu_set_irq(s->irq, s->int_status ? 1 : 0);
+trace_designware_gpio_update_output_irq(s->int_status);
+}
+
+static void update_state(DESIGNWAREGPIOState *s)
+{
+struct DESIGNWAREGPIOBank *bank;
+int banknr, basenr, nr;
+
+for (banknr = 0; banknr < DESIGNWARE_GPIO_BANKS; banknr++) {
+basenr = banknr * DESIGNWARE_GPIO_NR_PER_BANK;
+bank = &s->bank[banknr];
+
+/* check for data-direction differences */
+if (bank->ddr & bank->in) {
+qemu_log_mask(LOG_GUEST_ERROR,
+  "GPIO bank %d: pins shorted, DDR=%x, IN=%x, 
overlap=%x\n",
+  banknr, bank->ddr, bank->in, bank->ddr & bank->in);
+}
+
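+/* resolve each pin: DR drives pins configured as outputs, IN feeds the rest */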
+bank->dr_val = (bank->dr & bank->ddr) | (bank->in & ~bank->ddr);
+
+/* update any outputs marked as outputs */
+for (nr = 0; nr < DESIGNWARE_GPIO_NR_PER_BANK; nr++) {
+if (!extract32(bank->ddr, nr, 1))
+continue;
+qemu_set_irq(s->output[basenr+nr], extract32(bank->dr_val, nr, 1));
+}
+}
+
+update_output_irqs(s);
+}
+
+
+static uint64_t designware_gpio_read(void *opaque, hwaddr offset, unsigned int size)
+{
+struct DESIGNWAREGPIOState *s = DESIGNWARE_GPIO(opaque);
+struct DESIGNWAREGPIOBank *bank;
+hwaddr banknr, reg;
+uint64_t r = 0;
+bool handled = true;
+
+if (offset < (REG_SWPORTD_DDR + 4)) {
+banknr = offset / REG_SWPORT_DR_STRIDE;
+reg = offset % REG_SWPORT_DR_STRIDE;
+bank = &s->bank[banknr];
+
+switch (reg) {
+case REG_SWPORTA_DR:
+r = bank->dr;
+break;
+case REG_SWPORTA_DDR:
+r = bank->ddr;
+break;
+default:
+handled = false;
+}
+} else {
+switch (offset) {
+case REG_INTEN:
+r = s->int_en;
+break;
+case REG_INTMASK:
+r = s->int_mask;
+break;
+case REG_INTTYPE_LEVEL:
+r = s->int_level;
+break;
+case REG_INT_POLARITY:
+r = s->int_polarity;
+break;
+case REG_INTSTATUS:
+r = s->int_status;
+break;
+case REG_INTSTATUS_RAW:
+r = s->int_status_raw;
+break;
+case REG_PORTA_DEBOUNCE:
+r = s->porta_debounce;
+break;
+case REG_PORTA_EOI:
+r = 0x0;/* write only */
+break;
+case REG_EXT_PORTA:
+r = s->bank[0].dr_val;
+break;
+case REG_EXT_PORTB:
+r = s->bank[1].dr_val;
+break;
+case REG_EXT_PORTC:
+r = s->bank[2].dr_val;
+break;
+case REG_EXT_PORTD:
+r = s->bank[3].dr_val;
+break;
+case REG_ID:
+r = 0x0;
+break;
+default:
+handled = false;
+}
+}
+
+if 

Re: [PATCH 2/9] target/ppc: add errp to kvmppc_read_int_cpu_dt()

2022-07-13 Thread Mark Cave-Ayland

On 12/07/2022 15:54, BALATON Zoltan wrote:


On Tue, 12 Jul 2022, Mark Cave-Ayland wrote:

On 11/07/2022 08:42, Cédric Le Goater wrote:

Anything special I should know ?


As I don't have access to a G5 I've never tried that, however the 
qemu-system-ppc64 mac99 is wired differently to the qemu-system-ppc mac99 machine 
so I wouldn't be surprised if something is broken there.


My normal test for MacOS is something like:

    qemu-system-ppc -M mac99 -accel kvm -hda macos104.img

Can you try qemu-system-ppc and see if it is any better? If not then I can fire 
up the G4 and get the git hashes for my last known working configuration.


Same issue with 32bit.


I've just fired up my G4 to test this again, pulled the latest QEMU git master and 
confirmed that I have a working setup with the details below:


Host kernel: (5.1.0-rc2+)
commit a3ac7917b73070010c05b4485b8582a6c9cd69b6
Author: Linus Torvalds 
Date:   Mon Mar 25 14:49:00 2019 -0700

Guest kernel: (4.14.0-3-powerpc)
using Debian ports debian-9.0-powerpc-NETINST-1.iso

Command line:
./qemu-system-ppc [-M mac99] -accel kvm -cdrom 
/home/mca/images/debian-9.0-powerpc-NETINST-1.iso -boot d -nographic


However if I switch to using the latest Debian ports 
debian-10.0.0-powerpc-NETINST-1.iso then I get a failure:


[    0.198565] BUG: Unable to handle kernel data access on read at 0xbb0030d4


What is or should be at this address and why does the kernel access it? By default I
see nothing mapped there. Do you need more RAM? Maybe the default 128 MB is not
enough for newer kernels? I've seen such problems with other OSes before.


Yeah I've already tried increasing the RAM and it makes no difference. It wouldn't 
surprise me if it's a kernel issue since Christophe has done a lot of low-level work 
for 32-bit PPC including moving routines from asm to C and KASAN. I'm away for a few 
days but I will do a bisect when I get back, unless anyone beats me to it...



ATB,

Mark.



Re: [PATCH 6/6] target/ppc: fix exception error code in spr_write_excp_vector

2022-07-13 Thread Daniel Henrique Barboza




On 6/27/22 11:11, Matheus Ferst wrote:

The 'error' argument of gen_inval_exception will be or-ed with
POWERPC_EXCP_INVAL, so it should always be a constant prefixed with
POWERPC_EXCP_INVAL_. No functional change is intended,
spr_write_excp_vector is only used by register_BookE_sprs, and
powerpc_excp_booke ignores the lower 4 bits of the error code on
POWERPC_EXCP_INVAL exceptions.

Also, take the opportunity to replace printf with qemu_log_mask.

Signed-off-by: Matheus Ferst 
---


Reviewed-by: Daniel Henrique Barboza 


  target/ppc/translate.c | 6 +++---
  1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/target/ppc/translate.c b/target/ppc/translate.c
index 30dd524959..da11472877 100644
--- a/target/ppc/translate.c
+++ b/target/ppc/translate.c
@@ -907,9 +907,9 @@ void spr_write_excp_vector(DisasContext *ctx, int sprn, int 
gprn)
  } else if (sprn >= SPR_BOOKE_IVOR38 && sprn <= SPR_BOOKE_IVOR42) {
  sprn_offs = sprn - SPR_BOOKE_IVOR38 + 38;
  } else {
-printf("Trying to write an unknown exception vector %d %03x\n",
-   sprn, sprn);
-gen_inval_exception(ctx, POWERPC_EXCP_PRIV_REG);
+qemu_log_mask(LOG_GUEST_ERROR, "Trying to write an unknown exception"
+  " vector 0x%03x\n", sprn);
+gen_inval_exception(ctx, POWERPC_EXCP_INVAL_INVAL);
  return;
  }
  




Re: [PATCH 4/6] target/ppc: fix exception error code in helper_{load,store}_dcr

2022-07-13 Thread Daniel Henrique Barboza




On 6/27/22 11:11, Matheus Ferst wrote:

POWERPC_EXCP_INVAL should only be or-ed with other constants prefixed
with POWERPC_EXCP_INVAL_. Also, take the opportunity to move both
helpers under #if !defined(CONFIG_USER_ONLY) as the instructions that
use them are privileged.

No functional change is intended, the lower 4 bits of the error code are
ignored by all powerpc_excp_* methods on POWERPC_EXCP_INVAL exceptions.

Reported-by: Laurent Vivier 
Signed-off-by: Matheus Ferst 
---
  target/ppc/helper.h  | 2 +-
  target/ppc/timebase_helper.c | 6 +++---
  2 files changed, 4 insertions(+), 4 deletions(-)


Reviewed-by: Daniel Henrique Barboza 



diff --git a/target/ppc/helper.h b/target/ppc/helper.h
index 6233e28d85..c6895f2f99 100644
--- a/target/ppc/helper.h
+++ b/target/ppc/helper.h
@@ -684,10 +684,10 @@ DEF_HELPER_2(book3s_msgclr, void, env, tl)
  DEF_HELPER_4(dlmzb, tl, env, tl, tl, i32)
  #if !defined(CONFIG_USER_ONLY)
  DEF_HELPER_2(rac, tl, env, tl)
-#endif
  
  DEF_HELPER_2(load_dcr, tl, env, tl)

  DEF_HELPER_3(store_dcr, void, env, tl, tl)
+#endif
  
  DEF_HELPER_2(load_dump_spr, void, env, i32)

  DEF_HELPER_2(store_dump_spr, void, env, i32)
diff --git a/target/ppc/timebase_helper.c b/target/ppc/timebase_helper.c
index 86d01d6e4e..b80f56af7e 100644
--- a/target/ppc/timebase_helper.c
+++ b/target/ppc/timebase_helper.c
@@ -143,7 +143,6 @@ void helper_store_booke_tsr(CPUPPCState *env, target_ulong 
val)
  {
  store_booke_tsr(env, val);
  }
-#endif
  
 /*****************************************************************/

  /* Embedded PowerPC specific helpers */
@@ -169,7 +168,7 @@ target_ulong helper_load_dcr(CPUPPCState *env, target_ulong 
dcrn)
(uint32_t)dcrn, (uint32_t)dcrn);
  raise_exception_err_ra(env, POWERPC_EXCP_PROGRAM,
 POWERPC_EXCP_INVAL |
-   POWERPC_EXCP_PRIV_REG, GETPC());
+   POWERPC_EXCP_INVAL_INVAL, GETPC());
  }
  }
  return val;
@@ -192,7 +191,8 @@ void helper_store_dcr(CPUPPCState *env, target_ulong dcrn, 
target_ulong val)
(uint32_t)dcrn, (uint32_t)dcrn);
  raise_exception_err_ra(env, POWERPC_EXCP_PROGRAM,
 POWERPC_EXCP_INVAL |
-   POWERPC_EXCP_PRIV_REG, GETPC());
+   POWERPC_EXCP_INVAL_INVAL, GETPC());
  }
  }
  }
+#endif




Re: [PATCH 2/6] target/ppc: fix exception error value in slbfee

2022-07-13 Thread Daniel Henrique Barboza




On 6/27/22 11:11, Matheus Ferst wrote:

Testing on a POWER9 DD2.3, we observed that the Linux kernel delivers a
signal with si_code ILL_PRVOPC (5) when a userspace application tries to
use slbfee. To obtain this behavior on linux-user, we should use
POWERPC_EXCP_PRIV with POWERPC_EXCP_PRIV_OPC.

No functional change is intended for softmmu targets as
gen_hvpriv_exception uses the same 'exception' argument
(POWERPC_EXCP_HV_EMU) for raise_exception_*, and the powerpc_excp_*
methods do not use lower bits of the exception error code when handling
POWERPC_EXCP_{INVAL,PRIV}.
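
For reference, a userspace reproducer looks roughly like this (a sketch:
ppc64 only, and it assumes the assembler accepts the slbfee. mnemonic
with bare numeric operands):

    #include <signal.h>
    #include <stdio.h>
    #include <stdlib.h>

    static void handler(int sig, siginfo_t *si, void *ctx)
    {
        /* hardware Linux reports si_code ILL_PRVOPC (5) here */
        printf("SIGILL si_code=%d\n", si->si_code);
        exit(0);
    }

    int main(void)
    {
        struct sigaction sa = { 0 };

        sa.sa_sigaction = handler;
        sa.sa_flags = SA_SIGINFO;
        sigaction(SIGILL, &sa, NULL);

        asm volatile("slbfee. 3,4" ::: "r3", "cr0", "memory");
        return 1;
    }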

Reported-by: Laurent Vivier 
Signed-off-by: Matheus Ferst 
---


Reviewed-by: Daniel Henrique Barboza 


  target/ppc/translate.c | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/target/ppc/translate.c b/target/ppc/translate.c
index 55f34eb490..d7e5670c20 100644
--- a/target/ppc/translate.c
+++ b/target/ppc/translate.c
@@ -5386,12 +5386,12 @@ static void gen_slbmfev(DisasContext *ctx)
  static void gen_slbfee_(DisasContext *ctx)
  {
  #if defined(CONFIG_USER_ONLY)
-gen_inval_exception(ctx, POWERPC_EXCP_PRIV_REG);
+gen_hvpriv_exception(ctx, POWERPC_EXCP_PRIV_OPC);
  #else
  TCGLabel *l1, *l2;
  
  if (unlikely(ctx->pr)) {

-gen_inval_exception(ctx, POWERPC_EXCP_PRIV_REG);
+gen_hvpriv_exception(ctx, POWERPC_EXCP_PRIV_OPC);
  return;
  }
  gen_helper_find_slb_vsid(cpu_gpr[rS(ctx->opcode)], cpu_env,




Re: [PATCH v4 2/2] ui/gtk: a new array param monitor to specify the target displays

2022-07-13 Thread Dongwon Kim
On Tue, Jul 12, 2022 at 08:11:08AM +0200, Markus Armbruster wrote:
> Dongwon Kim  writes:
> 
> > New integer array parameter, 'monitor' is for specifying the target
> > monitors where individual GTK windows are placed upon launching.
> >
> > Monitor numbers in the array are associated with virtual consoles
> > in the order of [VC0, VC1, VC2 ... VCn].
> >
> > Every GTK window containing each VC will be placed in the region
> > of corresponding monitors.
> >
> > Usage: -display gtk,monitor.<VC index>=<monitor number>,..
> >ex)-display gtk,monitor.0=1,monitor.1=0
> >
> > Cc: Daniel P. Berrangé 
> > Cc: Markus Armbruster 
> > Cc: Philippe Mathieu-Daudé 
> > Cc: Paolo Bonzini 
> > Cc: Gerd Hoffmann 
> > Cc: Vivek Kasireddy 
> > Signed-off-by: Dongwon Kim 
> > ---
> >  qapi/ui.json|  9 -
> >  qemu-options.hx |  3 ++-
> >  ui/gtk.c| 30 --
> >  3 files changed, 38 insertions(+), 4 deletions(-)
> >
> > diff --git a/qapi/ui.json b/qapi/ui.json
> > index 413371d5e8..ee0f9244ef 100644
> > --- a/qapi/ui.json
> > +++ b/qapi/ui.json
> > @@ -1195,12 +1195,19 @@
> >  #   assuming the guest will resize the display to match
> >  #   the window size then.  Otherwise it defaults to "off".
> >  #   Since 3.1
> > +# @monitor: Array of numbers, each of which represents physical
> > +#   monitor where GTK window containing a given VC will be
> > +#   placed. Each monitor number in the array will be
> > +#   associated with a virtual console starting from VC0.
> > +#
> > +#   since 7.1
> 
> I dislike repeating the type (here: array of numbers) in the
> description.
> 
> Suggest something like
> 
># @monitor: List of physical monitor numbers where the GTK windows
>#   containing the virtual consoles VC0, VC1, ... are to be
>#   placed.  (Since 7.1)
> 
> Missing: what happens when there are more VCs than list elements.  Can
> you tell us?

# @monitor: List of physical monitor numbers where the GTK windows
#   containing the virtual consoles VC0, VC1, ... are to be
#   placed. If a mapping exists for a VC, then it'd be
#   placed on that specific physical monitor; otherwise,
#   it'd default to the monitor from where it was launched
#   since 7.1

How does this look?
> 
> >  #
> >  # Since: 2.12
> >  ##
> >  { 'struct'  : 'DisplayGTK',
> >'data': { '*grab-on-hover' : 'bool',
> > -'*zoom-to-fit'   : 'bool'  } }
> > +'*zoom-to-fit'   : 'bool',
> > +'*monitor'   : ['uint16']  } }
> >  
> >  ##
> >  # @DisplayEGLHeadless:
> 
> [...]
> 



Re: [PATCH] hw/nvme: Add options to override hardcoded values

2022-07-13 Thread Keith Busch
On Wed, Jul 13, 2022 at 09:11:41PM +0200, Mauricio Sandt wrote:
> On 13/07/2022 20:48, Keith Busch wrote:
> > I guess I'm missing the bigger picture here. You are supposed to be able to
> > retrieve these fields with ioctl's, so not sure what this has to do with
> > malware. Why does the firmware revision matter to this program?
> Oh I'm sorry, I forgot to explain properly. Malware usually checks if it is
> being run in a sandbox environment like a VM, and if it detects such a
> sandbox, it doesn't run or doesn't unleash its full potential. This makes my
> life as a researcher much harder.
> 
> Hiding the VM by overriding the model, firmware, and nqn strings to either
> random values or names of existing hardware in the hypervisor is a much
> cleaner solution than intercepting the IOCTLs in the VM and changing the
> result with a kernel driver.

IIUC, this program is trying to avoid being studied, and uses indicators like
nvme firmware to help determine if it is running in such an environment. If so,
I suspect defeating all possible indicators will be a fun and time consuming
process. :)



Re: [PATCH] hw/nvme: Add options to override hardcoded values

2022-07-13 Thread Mauricio Sandt

On 13/07/2022 20:48, Keith Busch wrote:

I guess I'm missing the bigger picture here. You are supposed to be able to
retrieve these fields with ioctl's, so not sure what this has to do with
malware. Why does the firmware revision matter to this program?

Oh I'm sorry, I forgot to explain properly. Malware usually checks if it is
being run in a sandbox environment like a VM, and if it detects such a
sandbox, it doesn't run or doesn't unleash its full potential. This makes my
life as a researcher much harder.

Hiding the VM by overriding the model, firmware, and nqn strings to either
random values or names of existing hardware in the hypervisor is a much
cleaner solution than intercepting the IOCTLs in the VM and changing the
result with a kernel driver.



Older qemu nvme wasn't reliably generating unique identifiers, so linux quirks
them to ignore. They are reliable now, so the quirk can be changed to firmware
specific when the version number updates with the next release.

If I understand you correctly, that means that changing the model and
firmware version to something that doesn't identify the device as a
qemu device would work just fine provided that you're running a recent
qemu version?



Re: [PATCH] hw/nvme: Add options to override hardcoded values

2022-07-13 Thread Keith Busch
On Wed, Jul 13, 2022 at 08:06:26PM +0200, Mauricio Sandt wrote:
> My specific use case that required this patch is a piece of malware that used
> several IOCTLs to read model, firmware, and nqn from the NVMe attached to the
> VM. Modifying that info at the hypervisor level was a much better approach
> than loading an (unsigned) driver into windows to intercept said IOCTLs.
> Having this patch in the official qemu version would help me a lot in my test
> lab, and I'm pretty sure it would also help other people.

I guess I'm missing the bigger picture here. You are supposed to be able to
retrieve these fields with ioctl's, so not sure what this has to do with
malware. Why does the firmware revision matter to this program?
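
For example, even without raw admin commands, something as small as this
(a sketch; the paths assume the usual Linux nvme sysfs layout) reads
those fields back:

    #include <stdio.h>

    static void show(const char *path)
    {
        char buf[256];
        FILE *f = fopen(path, "r");

        if (f) {
            if (fgets(buf, sizeof(buf), f)) {
                printf("%s: %s", path, buf);
            }
            fclose(f);
        }
    }

    int main(void)
    {
        show("/sys/class/nvme/nvme0/model");
        show("/sys/class/nvme/nvme0/firmware_rev");
        show("/sys/class/nvme/nvme0/subsysnqn");
        return 0;
    }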
 
> I guess there could be a warning about potential problems with drivers in the
> description for model, firmware, and nqn, but I haven't experienced any
> issues when changing the values (for all of them). If you encountered any
> problems, how did they manifest?

Older qemu nvme wasn't reliably generating unique identifiers, so linux quirks
them to ignore. They are reliable now, so the quirk can be changed to firmware
specific when the version number updates with the next release.



[PATCH v2] target/ppc: check tb_env != 0 before printing TBU/TBL/DECR

2022-07-13 Thread Matheus Ferst
When using "-machine none", env->tb_env is not allocated, causing the
segmentation fault reported in issue #85 (launchpad bug #811683). To
avoid this problem, check if the pointer != NULL before calling the
methods to print TBU/TBL/DECR.

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/85
Signed-off-by: Matheus Ferst 
---
v2:
 - Added checks in monitor_get_decr, monitor_get_tbu, and monitor_get_tbl.
 - Link to v1: https://lists.gnu.org/archive/html/qemu-ppc/2022-07/msg00173.html
---
 target/ppc/cpu_init.c | 16 
 target/ppc/monitor.c  |  9 +
 2 files changed, 17 insertions(+), 8 deletions(-)

diff --git a/target/ppc/cpu_init.c b/target/ppc/cpu_init.c
index 86ad28466a..7e96baac9f 100644
--- a/target/ppc/cpu_init.c
+++ b/target/ppc/cpu_init.c
@@ -7476,18 +7476,18 @@ void ppc_cpu_dump_state(CPUState *cs, FILE *f, int 
flags)
  "%08x iidx %d didx %d\n",
  env->msr, env->spr[SPR_HID0], env->hflags,
  cpu_mmu_index(env, true), cpu_mmu_index(env, false));
-#if !defined(NO_TIMER_DUMP)
-qemu_fprintf(f, "TB %08" PRIu32 " %08" PRIu64
+if (env->tb_env) {
+qemu_fprintf(f, "TB %08" PRIu32 " %08" PRIu64
 #if !defined(CONFIG_USER_ONLY)
- " DECR " TARGET_FMT_lu
+ " DECR " TARGET_FMT_lu
 #endif
- "\n",
- cpu_ppc_load_tbu(env), cpu_ppc_load_tbl(env)
+ "\n",
+ cpu_ppc_load_tbu(env), cpu_ppc_load_tbl(env)
 #if !defined(CONFIG_USER_ONLY)
- , cpu_ppc_load_decr(env)
-#endif
-);
+ , cpu_ppc_load_decr(env)
 #endif
+);
+}
 for (i = 0; i < 32; i++) {
 if ((i & (RGPL - 1)) == 0) {
 qemu_fprintf(f, "GPR%02d", i);
diff --git a/target/ppc/monitor.c b/target/ppc/monitor.c
index 0b805ef6e9..8250b1304e 100644
--- a/target/ppc/monitor.c
+++ b/target/ppc/monitor.c
@@ -55,6 +55,9 @@ static target_long monitor_get_decr(Monitor *mon, const 
struct MonitorDef *md,
 int val)
 {
 CPUArchState *env = mon_get_cpu_env(mon);
+if (!env->tb_env) {
+return 0;
+}
 return cpu_ppc_load_decr(env);
 }
 
@@ -62,6 +65,9 @@ static target_long monitor_get_tbu(Monitor *mon, const struct 
MonitorDef *md,
int val)
 {
 CPUArchState *env = mon_get_cpu_env(mon);
+if (!env->tb_env) {
+return 0;
+}
 return cpu_ppc_load_tbu(env);
 }
 
@@ -69,6 +75,9 @@ static target_long monitor_get_tbl(Monitor *mon, const struct 
MonitorDef *md,
int val)
 {
 CPUArchState *env = mon_get_cpu_env(mon);
+if (!env->tb_env) {
+return 0;
+}
 return cpu_ppc_load_tbl(env);
 }
 
-- 
2.25.1




[PATCH v2 3/3] tests/tcg/s390x: test signed vfmin/vfmax

2022-07-13 Thread Ilya Leoshkevich
Add a test to prevent regressions. Try all floating point value sizes
and all combinations of floating point value classes. Verify the results
against PoP tables, which are represented as close to the original as
possible - this produces a lot of checkpatch complaints, but it seems
to be justified in this case.

Signed-off-by: Ilya Leoshkevich 
---
 tests/tcg/s390x/Makefile.target |   7 +
 tests/tcg/s390x/vfminmax.c  | 411 
 2 files changed, 418 insertions(+)
 create mode 100644 tests/tcg/s390x/vfminmax.c

diff --git a/tests/tcg/s390x/Makefile.target b/tests/tcg/s390x/Makefile.target
index 3124172736..1a7a4a2f59 100644
--- a/tests/tcg/s390x/Makefile.target
+++ b/tests/tcg/s390x/Makefile.target
@@ -17,6 +17,13 @@ TESTS+=trap
 TESTS+=signals-s390x
 TESTS+=branch-relative-long
 
+Z14_TESTS=vfminmax
+vfminmax: LDFLAGS+=-lm
+$(Z14_TESTS): CFLAGS+=-march=z14 -O2
+
+TESTS+=$(if $(shell $(CC) -march=z14 -S -o /dev/null -xc /dev/null \
+>/dev/null 2>&1 && echo OK),$(Z14_TESTS))
+
 VECTOR_TESTS=vxeh2_vs
 VECTOR_TESTS+=vxeh2_vcvt
 VECTOR_TESTS+=vxeh2_vlstr
diff --git a/tests/tcg/s390x/vfminmax.c b/tests/tcg/s390x/vfminmax.c
new file mode 100644
index 00..22629df160
--- /dev/null
+++ b/tests/tcg/s390x/vfminmax.c
@@ -0,0 +1,411 @@
+#define _GNU_SOURCE
+#include 
+#include 
+#include 
+#include 
+
+/*
+ * vfmin/vfmax instruction execution.
+ */
+#define VFMIN 0xEE
+#define VFMAX 0xEF
+
+extern char insn[6];
+asm(".pushsection .rwx,\"awx\",@progbits\n"
+".globl insn\n"
+/* e7 89 a0 00 2e ef */
+"insn: vfmaxsb %v24,%v25,%v26,0\n"
+".popsection\n");
+
+static void vfminmax(unsigned int op,
+ unsigned int m4, unsigned int m5, unsigned int m6,
+ void *v1, const void *v2, const void *v3)
+{
+   insn[3] = (m6 << 4) | m5;
+   insn[4] = (m4 << 4) | 0x0e;
+   insn[5] = op;
+
+asm("vl %%v25,%[v2]\n"
+"vl %%v26,%[v3]\n"
+"ex 0,%[insn]\n"
+"vst %%v24,%[v1]\n"
+: [v1] "=m" (*(char (*)[16])v1)
+: [v2] "m" (*(char (*)[16])v2)
+, [v3] "m" (*(char (*)[16])v3)
+, [insn] "m"(insn)
+: "v24", "v25", "v26");
+}
+
+/*
+ * Floating-point value classes.
+ */
+#define N_FORMATS 3
+#define N_SIGNED_CLASSES 8
+static const size_t float_sizes[N_FORMATS] = {
+/* M4 == 2: short*/ 4,
+/* M4 == 3: long */ 8,
+/* M4 == 4: extended */ 16,
+};
+static const size_t e_bits[N_FORMATS] = {
+/* M4 == 2: short*/ 8,
+/* M4 == 3: long */ 11,
+/* M4 == 4: extended */ 15,
+};
+static const unsigned char signed_floats[N_FORMATS][N_SIGNED_CLASSES][2][16] = {
+/* M4 == 2: short */
+{
+/* -inf */ {{0xff, 0x80, 0x00, 0x00},
+{0xff, 0x80, 0x00, 0x00}},
+/* -Fn */  {{0xc2, 0x28, 0x00, 0x00},
+{0xc2, 0x29, 0x00, 0x00}},
+/* -0 */   {{0x80, 0x00, 0x00, 0x00},
+{0x80, 0x00, 0x00, 0x00}},
+/* +0 */   {{0x00, 0x00, 0x00, 0x00},
+{0x00, 0x00, 0x00, 0x00}},
+/* +Fn */  {{0x42, 0x28, 0x00, 0x00},
+{0x42, 0x2a, 0x00, 0x00}},
+/* +inf */ {{0x7f, 0x80, 0x00, 0x00},
+{0x7f, 0x80, 0x00, 0x00}},
+/* QNaN */ {{0x7f, 0xff, 0xff, 0xff},
+{0x7f, 0xff, 0xff, 0xfe}},
+/* SNaN */ {{0x7f, 0xbf, 0xff, 0xff},
+{0x7f, 0xbf, 0xff, 0xfd}},
+},
+
+/* M4 == 3: long */
+{
+/* -inf */ {{0xff, 0xf0, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00},
+{0xff, 0xf0, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00}},
+/* -Fn */  {{0xc0, 0x45, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00},
+{0xc0, 0x46, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00}},
+/* -0 */   {{0x80, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00},
+{0x80, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00}},
+/* +0 */   {{0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00},
+{0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00}},
+/* +Fn */  {{0x40, 0x45, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00},
+{0x40, 0x47, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00}},
+/* +inf */ {{0x7f, 0xf0, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00},
+{0x7f, 0xf0, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00}},
+/* QNaN */ {{0x7f, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff},
+{0x7f, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xfe}},
+/* SNaN */ {{0x7f, 0xf7, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff},
+{0x7f, 0xf7, 0xff, 0xff, 0xff, 0xff, 0xff, 0xfd}},
+},
+
+/* M4 == 4: extended */
+{
+/* -inf */ {{0xff, 0xff, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00},
+            {0xff, 0xff, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00}},
+/* -Fn */  {{0xc0, 0x04, 0x50, 0x00, 0x00, 0x00, 

Re: [RFC PATCH] target/ppc: don't print TB in ppc_cpu_dump_state if it's not initialized

2022-07-13 Thread Matheus K. Ferst

On 12/07/2022 23:21, David Gibson wrote:

On Tue, Jul 12, 2022 at 06:13:44PM -0300, Daniel Henrique Barboza wrote:



On 7/12/22 16:25, Matheus Ferst wrote:

When using "-machine none", env->tb_env is not allocated, causing the
segmentation fault reported in issue #85 (launchpad bug #811683). To
avoid this problem, check if the pointer != NULL before calling the
methods to print TBU/TBL/DECR.

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/85
Signed-off-by: Matheus Ferst 
---
This patch fixes the reported problem, but may be an incomplete solution
since many other places dereference env->tb_env without checking for
NULL. AFAICS, "-machine none" is the only way to trigger this problem,
and I'm not familiar with the use-cases for this option.


The "none"  machine type is mainly used by libvirt to do instrospection
of the available options/capabilities of the QEMU binary. It starts a QEMU
process like the following:

./x86_64-softmmu/qemu-system-x86_64 -S -no-user-config -nodefaults \
   -nographic -machine none,accel=kvm:tcg -daemonize

And then it uses QMP to probe the binary.

Aside from this libvirt usage I am not aware of anyone else using -machine
none extensively.


Right.  -machine none basically cannot work as a real machine for
POWER (maybe some other CPUs as well).  At least the more modern POWER
CPUs simply cannot boot without a bunch of supporting board/system
level elements, and there's not really a sane way to encode those into
individual emulated devices at present (maybe ever).

One of those things is that POWER expects the timebases to be
synchronized across all CPUs in the system, which obviously can't be
done locally to a single CPU chip.  It requires system level
operations, which is why it's handled by the machine type

[Example: a typical sequence which might be handled in hardware by
  low-level firmware would be to use machine-specific board-level
  registers to suspend the clock pulse to the CPUs which drives the
  timebase, then write the same value to the TB on each CPU, then
  (atomically) restart the clock pulse using board registers again]
 


So I guess it's safe to assume that it's impossible to run code with 
"-machine none", and then there would be no reason to check for NULL in 
the mtspr/mfspr path, right?



Should we stop assuming env->tb_env != NULL and add checks everywhere?
Or should we find a way to provide Time Base/Decrementer for
"-machine none"?
---


Are there other cases where env->tb_env can be NULL, aside from the case
reported in the bug?


If there are, I'd say that's a bug in the machine type.  Setting up
(and synchronizing) the timebase is part of the machine's job.



With "-machine none", it seems that the only places where it could 
happen are:


i) Monitor code: there are some other places where env->tb_env is used, like 
monitor_get_tb{u,l} and monitor_get_decr, so commands like "p $tbu" or 
"p $decr" cause a segfault too.
ii) mtspr/mfspr: it shouldn't be a problem if it's not possible to run 
code without a machine.
iii) gdbstub: we're not reading or setting TB{U,L} from gdb, which may 
be an issue on its own, but not related to #85.



I don't mind the bug fix, but I'm not fond of the idea of adding additional
checks because of this particular issue. I mean, the bug is using the 'prep'
machine that Thomas removed a year ago in b2ce76a0730. If there's no other
foreseeable problem that we care about with env->tb_env being NULL, IMO
let's fix the bug and move on.



I'll send a v2 fixing the other segfault in monitor, and then I guess we 
have a complete solution. Thanks Daniel and David for the feedback.
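
For illustration, the monitor-side guard would presumably mirror the one in
the patch above. A minimal sketch, assuming the existing mon_get_cpu_env()
and cpu_ppc_load_decr() helpers (the actual v2 may differ):

static target_long monitor_get_decr(Monitor *mon, const MonitorDef *md,
                                    int val)
{
    CPUArchState *env = mon_get_cpu_env(mon);

    /* No timebase/decrementer was set up, e.g. with "-machine none" */
    if (!env->tb_env) {
        return 0;
    }
    return cpu_ppc_load_decr(env);
}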


--
Matheus K. Ferst
Instituto de Pesquisas ELDORADO 
Analista de Software



[PATCH v2 0/3] target/s390x: vfmin/vfmax fixes

2022-07-13 Thread Ilya Leoshkevich
Hi,

Uli has found an issue with finding the maximum of different kinds of 0s;
I wrote a test and found another one with finding the maximum of different
kinds of NaNs.

Patches 1 and 2 fix those issues, patch 3 adds a vfmin/vfmax test.

Best regards,
Ilya

v1: https://lists.gnu.org/archive/html/qemu-devel/2022-07/msg01915.html
v1 -> v2: Drop mmap() in the test (Richard).

Ilya Leoshkevich (3):
  target/s390x: fix handling of zeroes in vfmin/vfmax
  target/s390x: fix NaN propagation rules
  tests/tcg/s390x: test signed vfmin/vfmax

 fpu/softfloat-specialize.c.inc|   3 +-
 target/s390x/tcg/vec_fpu_helper.c |   4 +-
 tests/tcg/s390x/Makefile.target   |   7 +
 tests/tcg/s390x/vfminmax.c| 411 ++
 4 files changed, 422 insertions(+), 3 deletions(-)
 create mode 100644 tests/tcg/s390x/vfminmax.c

-- 
2.35.3




[PATCH v2 2/3] target/s390x: fix NaN propagation rules

2022-07-13 Thread Ilya Leoshkevich
s390x has the same NaN propagation rules as ARM, and not as x86.

Signed-off-by: Ilya Leoshkevich 
Reviewed-by: Richard Henderson 
Reviewed-by: David Hildenbrand 
---
 fpu/softfloat-specialize.c.inc | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fpu/softfloat-specialize.c.inc b/fpu/softfloat-specialize.c.inc
index 943e3301d2..a43ff5e02e 100644
--- a/fpu/softfloat-specialize.c.inc
+++ b/fpu/softfloat-specialize.c.inc
@@ -390,7 +390,8 @@ bool float32_is_signaling_nan(float32 a_, float_status *status)
 static int pickNaN(FloatClass a_cls, FloatClass b_cls,
bool aIsLargerSignificand, float_status *status)
 {
-#if defined(TARGET_ARM) || defined(TARGET_MIPS) || defined(TARGET_HPPA)
+#if defined(TARGET_ARM) || defined(TARGET_MIPS) || defined(TARGET_HPPA) || \
+defined(TARGET_S390X)
 /* ARM mandated NaN propagation rules (see FPProcessNaNs()), take
  * the first of:
  *  1. A if it is signaling
-- 
2.35.3
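
For context, the selection order this opts s390x into can be sketched as
follows (illustrative only; is_snan()/is_qnan() stand in for the FloatClass
predicates used inside softfloat, and the real logic lives in pickNaN()):

/*
 * ARM-style propagation: take the first of
 *   1. a, if it is signaling
 *   2. b, if it is signaling
 *   3. a, if it is a quiet NaN
 *   4. b
 * The default (x86-style) rule instead applies a larger-significand
 * preference, which is where the s390x results diverged before.
 */
static int pick_nan_arm_style(FloatClass a_cls, FloatClass b_cls)
{
    if (is_snan(a_cls)) {
        return 0;   /* take a */
    } else if (is_snan(b_cls)) {
        return 1;   /* take b */
    } else if (is_qnan(a_cls)) {
        return 0;
    } else {
        return 1;
    }
}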




[PATCH v2 1/3] target/s390x: fix handling of zeroes in vfmin/vfmax

2022-07-13 Thread Ilya Leoshkevich
vfmin_res() / vfmax_res() are trying to check whether a and b are both
zeroes, but in reality they check that they are the same kind of zero.
This causes incorrect results when comparing positive and negative
zeroes.

Fixes: da4807527f3b ("s390x/tcg: Implement VECTOR FP (MAXIMUM|MINIMUM)")
Co-developed-by: Ulrich Weigand 
Signed-off-by: Ilya Leoshkevich 
Reviewed-by: Richard Henderson 
Reviewed-by: David Hildenbrand 
---
 target/s390x/tcg/vec_fpu_helper.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/target/s390x/tcg/vec_fpu_helper.c b/target/s390x/tcg/vec_fpu_helper.c
index 2a618a1093..75cf605b9f 100644
--- a/target/s390x/tcg/vec_fpu_helper.c
+++ b/target/s390x/tcg/vec_fpu_helper.c
@@ -824,7 +824,7 @@ static S390MinMaxRes vfmin_res(uint16_t dcmask_a, uint16_t dcmask_b,
 default:
 g_assert_not_reached();
 }
-} else if (unlikely(dcmask_a & dcmask_b & DCMASK_ZERO)) {
+} else if (unlikely((dcmask_a & DCMASK_ZERO) && (dcmask_b & DCMASK_ZERO))) {
 switch (type) {
 case S390_MINMAX_TYPE_JAVA:
 return neg_a ? S390_MINMAX_RES_A : S390_MINMAX_RES_B;
@@ -874,7 +874,7 @@ static S390MinMaxRes vfmax_res(uint16_t dcmask_a, uint16_t dcmask_b,
 default:
 g_assert_not_reached();
 }
-} else if (unlikely(dcmask_a & dcmask_b & DCMASK_ZERO)) {
+} else if (unlikely((dcmask_a & DCMASK_ZERO) && (dcmask_b & DCMASK_ZERO))) {
 const bool neg_a = dcmask_a & DCMASK_NEGATIVE;
 
 switch (type) {
-- 
2.35.3
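
To see why the original expression misfires, assume the data class mask
keeps one bit per zero class; the bit names below are hypothetical, for
illustration only:

/* a = +0, b = -0: their class masks share no bit within DCMASK_ZERO */
uint16_t dcmask_a = DCMASK_ZERO_POS;
uint16_t dcmask_b = DCMASK_ZERO_NEG;

/* old test: false, so the both-zeroes path was skipped */
assert(!(dcmask_a & dcmask_b & DCMASK_ZERO));

/* fixed test: true whenever both operands are zeroes of either sign */
assert((dcmask_a & DCMASK_ZERO) && (dcmask_b & DCMASK_ZERO));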




Re: [PATCH] hw/nvme: Add options to override hardcoded values

2022-07-13 Thread Mauricio Sandt

I want to argue the other way around: why shouldn't those values be
tunable by the user? You are right; if misconfigured, it could potentially
break stuff on the driver side, but unless you manually set values for model
and firmware, the default is used (just like it is now), so this patch
wouldn't break any existing setups.

In my opinion, having more freedom in the values used for model names
and similar is much better than hardcoding them. The previously hardcoded
values should remain the default if no custom value is provided.

My specific use case that required this patch is a piece of malware that used
several IOCTLs to read model, firmware, and nqn from the NVMe attached to
the VM. Modifying that info at the hypervisor level was a much better approach
than loading an (unsigned) driver into Windows to intercept said IOCTLs.
Having this patch in the official QEMU version would help me a lot in my test
lab, and I'm pretty sure it would also help other people.

I guess there could be a warning about potential problems with drivers in the
description for model, firmware, and nqn, but I haven't experienced any issues
when changing the values (for all of them). If you encountered any problems,
how did they manifest?

On 13/07/2022 19:03, Keith Busch wrote:

On Sun, Jun 12, 2022 at 12:35:09AM +0200, Mauricio Sandt wrote:

This small patch is the result of some recent malware research I did
in a QEMU VM. The malware used multiple ways of querying info from
the VM disk and I needed a clean way to change those values from the
hypervisor.

I believe this functionality could be useful to more people from multiple
fields, sometimes you just want to change some default values and having
> them hardcoded in the source code makes that much harder.

This patch adds three config parameters to the nvme device, all of them
are optional to not break anything. If any of them are not specified,
the previous (before this patch) default is used.

> -model - This takes a string and sets it as the device's model name.
If you don't specify this parameter, the default is "QEMU NVMe Ctrl".

-firmware - The firmware version string, max 8 ascii characters.
The default is whatever `QEMU_VERSION` evaluates to.

-nqn_override - Allows to set a custom nqn on the nvme device.
Only used if there is no subsystem. This string should be in the same
format as the default "nqn.2019-08.org.qemu:...", but there is no
> validation for that. It's up to the user to provide a valid string.

I guess the nqn can be user tunable just like it is when used with subsystems,
but what's the point of messing with model and firmware? That could mess with
host drivers' ability to detect what quirks it needs to apply to specific
instances of this virtual controller.





Re: [PATCH v7 10/13] migration: Export ram_release_page()

2022-07-13 Thread Dr. David Alan Gilbert
* Juan Quintela (quint...@redhat.com) wrote:
> Signed-off-by: Juan Quintela 

Reviewed-by: Dr. David Alan Gilbert 

> ---
>  migration/ram.h | 1 +
>  migration/ram.c | 2 +-
>  2 files changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/migration/ram.h b/migration/ram.h
> index 7b641adc55..aee08de2a5 100644
> --- a/migration/ram.h
> +++ b/migration/ram.h
> @@ -66,6 +66,7 @@ int ram_load_postcopy(QEMUFile *f);
>  void ram_handle_compressed(void *host, uint8_t ch, uint64_t size);
>  
>  void ram_transferred_add(uint64_t bytes);
> +void ram_release_page(const char *rbname, uint64_t offset);
>  
>  int ramblock_recv_bitmap_test(RAMBlock *rb, void *host_addr);
>  bool ramblock_recv_bitmap_test_byte_offset(RAMBlock *rb, uint64_t 
> byte_offset);
> diff --git a/migration/ram.c b/migration/ram.c
> index 71506b1b20..3b2af07341 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -1182,7 +1182,7 @@ static void migration_bitmap_sync_precopy(RAMState *rs)
>  }
>  }
>  
> -static void ram_release_page(const char *rbname, uint64_t offset)
> +void ram_release_page(const char *rbname, uint64_t offset)
>  {
>  if (!migrate_release_ram() || !migration_in_postcopy()) {
>  return;
> -- 
> 2.35.3
> 
-- 
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK




Re: [PATCH] hw/ppc: pass random seed to fdt

2022-07-13 Thread Jason A. Donenfeld
Hi Daniel,

On Wed, Jul 13, 2022 at 7:37 PM Daniel Henrique Barboza
 wrote:
>
>
>
> On 7/13/22 14:30, Jason A. Donenfeld wrote:
> > Hi Daniel,
> >
> > On Tue, Jul 12, 2022 at 05:31:27PM -0300, Daniel Henrique Barboza wrote:
> >> CCing qemu-ppc and Cedric for awareness since I forgot to do so in
> >> my reply (⌒_⌒;)
> >>> Reviewed-by: Daniel Henrique Barboza 
> >
> > Thanks for the review and for forwarding this to qemu-ppc. What's the
> > route this patch needs to take in order to make it into some tree
> > somewhere? Can somebody queue it up?
>
>
> I'll queue it up shortly in my ppc-next tree in Gitlab at
>
> gitlab.com/danielhb/qemu/tree/ppc-next
>
>
> After that I'll send a pull request get it merged with upstream. Probably
> end of this week/next Monday.

Excellent, thanks!

Jason



Re: [PATCH] hw/ppc: pass random seed to fdt

2022-07-13 Thread Daniel Henrique Barboza




On 7/13/22 14:30, Jason A. Donenfeld wrote:

Hi Daniel,

On Tue, Jul 12, 2022 at 05:31:27PM -0300, Daniel Henrique Barboza wrote:

CCing qemu-ppc and Cedric for awareness since I forgot to do so in
my reply (⌒_⌒;)

Reviewed-by: Daniel Henrique Barboza 


Thanks for the review and for forwarding this to qemu-ppc. What's the
route this patch needs to take in order to make it into some tree
somewhere? Can somebody queue it up?



I'll queue it up shortly in my ppc-next tree in Gitlab at

gitlab.com/danielhb/qemu/tree/ppc-next


After that I'll send a pull request get it merged with upstream. Probably
end of this week/next Monday.


Thanks,


Daniel




Regards,
Jason




Re: [PATCH v2] memory: prevent dma-reentracy issues

2022-07-13 Thread Stefan Hajnoczi
On Wed, 13 Jul 2022 at 16:51, Alexander Bulekov  wrote:
>
> On 220712 1034, Stefan Hajnoczi wrote:
> > On Tue, Jun 21, 2022 at 11:53:06AM -0400, Alexander Bulekov wrote:
> > > On 220621 1630, Peter Maydell wrote:
> > > > On Thu, 9 Jun 2022 at 14:59, Alexander Bulekov  wrote:
> > > > > diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
> > > > > index 44dacfa224..ab1ad0f7a8 100644
> > > > > --- a/include/hw/pci/pci.h
> > > > > +++ b/include/hw/pci/pci.h
> > > > > @@ -834,8 +834,17 @@ static inline MemTxResult pci_dma_rw(PCIDevice 
> > > > > *dev, dma_addr_t addr,
> > > > >   void *buf, dma_addr_t len,
> > > > >   DMADirection dir, MemTxAttrs 
> > > > > attrs)
> > > > >  {
> > > > > -return dma_memory_rw(pci_get_address_space(dev), addr, buf, len,
> > > > > - dir, attrs);
> > > > > +bool prior_engaged_state;
> > > > > +MemTxResult result;
> > > > > +
> > > > > +prior_engaged_state = dev->qdev.engaged_in_io;
> > > > > +
> > > > > +dev->qdev.engaged_in_io = true;
> > > > > +result = dma_memory_rw(pci_get_address_space(dev), addr, buf, 
> > > > > len,
> > > > > +   dir, attrs);
> > > > > +dev->qdev.engaged_in_io = prior_engaged_state;
> > > > > +
> > > > > +return result;
> > > >
> > > > Why do we need to do something in this pci-specific function ?
> > > > I was expecting this to only need changes at the generic-to-all-devices
> > > > level.
> > >
> > > Both of these handle the BH->DMA->MMIO case. Unlike MMIO, I don't think
> > > there is any neat way to set the engaged_in_io flag as we enter a BH. So
> > > instead, we try to set it when a device initiates DMA.
> > >
> > > The pci function lets us do that since we get a glimpse of the dev/qdev
> > > (unlike the dma_memory_...  functions).
> > ...
> > > > > @@ -302,6 +310,10 @@ static MemTxResult dma_buf_rw(void *buf, 
> > > > > dma_addr_t len, dma_addr_t *residual,
> > > > >  xresidual -= xfer;
> > > > >  }
> > > > >
> > > > > +if (dev) {
> > > > > +dev->engaged_in_io = prior_engaged_state;
> > > > > +}
> > > >
> > > > Not all DMA goes through dma_buf_rw() -- why does it need changes?
> > >
> > > This one has the same goal, but accesses the qdev through sg, instead of
> > > PCI.
> >
> > Should dma_*() APIs take a reentrancy guard argument so that all DMA
> > accesses are systematically covered?
>
> That seems like it would be the best option, though it carries the cost
> of needing to tweak a lot of code in hw/. Maybe some refactoring tool
> could help with this.

Coccinelle is good at adding/removing arguments, but it doesn't know
how to get the DeviceState from, say, a PCIDevice. It may be possible
to cover the majority of cases automatically by writing a cocci
pattern that applies when the containing function takes a PCIDevice*
argument, for example. Some manual work would still be necessary.

Stefan



Re: [PATCH] hw/ppc: pass random seed to fdt

2022-07-13 Thread Jason A. Donenfeld
Hi Daniel,

On Tue, Jul 12, 2022 at 05:31:27PM -0300, Daniel Henrique Barboza wrote:
> CCing qemu-ppc and Cedric for awareness since I forgot to do so in
> my reply (⌒_⌒;)
> > Reviewed-by: Daniel Henrique Barboza 

Thanks for the review and for forwarding this to qemu-ppc. What's the
route this patch needs to take in order to make it into some tree
somewhere? Can somebody queue it up?

Regards,
Jason



Re: [PATCH] hw/riscv: virt: pass random seed to fdt

2022-07-13 Thread Jason A. Donenfeld
Hi again,

On Mon, Jul 11, 2022 at 06:45:42PM +0200, Jason A. Donenfeld wrote:
> I've reproduced the problem and determined the root cause. This is a
> generic issue with the mmio get_cycles() implementation before 5.9 on
> no-MMU configs, which was fixed during the 5.9 cycle. I don't believe
> that this is the only thing affected on that .0 kernel, where fixes were
> ostensibly backported. Given the relative age of risc-v, the fact that
> 5.8.0 was broken anyway, and that likely nobody is using this kernel in
> that configuration without applying updates, I'm pretty sure my patch is
> safe to apply. I'd recommend updating the broken kernel in your CI.
> 
> Meanwhile, the rng-seed field is part of the DT spec. Holding back the
> (virtual) hardware just because some random dot-zero non-LTS release had
> a quickly fixed bug seems ridiculous, and the way in which progress gets
> held up, hacks accumulate, and generally nothing good gets done. It will
> only hamper security, functionality, and boot speed, while helping no
> real practical case that can't be fixed in a better way.
> 
> So I believe you should apply the rng-seed commit so that the RISC-V
> machine honors that DT field.
> 
> Regards,
> Jason
> 

Just following up on this... Hoping we can get this into a tree soon.

Thanks,
Jason



Re: [PATCH v6 12/13] qemu-sockets: update socket_uri() to be consistent with socket_parse()

2022-07-13 Thread Daniel P . Berrangé
On Wed, Jul 13, 2022 at 06:46:17PM +0200, Laurent Vivier wrote:
> On 12/07/2022 14:05, Dr. David Alan Gilbert wrote:
> > * Laurent Vivier (lviv...@redhat.com) wrote:
> > > Remove 'tcp:' prefix for inet type (because inet can be 'tcp' or 'udp'
> > > and socket_parse() doesn't recognize it), the format is 'host:port'.
> > 
> > I don't think I understand why tests/qtest/migration-test.c
> > test_precopy_common is happy with this; it does:
> > 
> >  if (!args->connect_uri) {
> >  g_autofree char *local_connect_uri =
> >  migrate_get_socket_address(to, "socket-address");
> >  migrate_qmp(from, local_connect_uri, "{}");
> > 
> > which hmm, is the code you're changing what was in SocketAddress_to_str
> > which is what migrate_get_socket_address uses; but then the migrate_qmp
> > I don't think will take a migrate uri without the tcp: on the front.
> 
> It's a good point.
> 
> I think socket_parse() should accept the 'tcp:' prefix, and thus
> socket_uri() should generate it, so it will be usable with the qmp migrate
> command.
> 
> I'm going to add a 'tcp:' case in socket_parse() and make socket_uri()
> generate it.

I'd say any code in util/qemu-sockets.c should only work in terms of
generic concepts. If we're formattting/parsing a migration URI, the
helper APIs should live in the migration/ subdir.


With regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|




Re: [PATCH] hw/nvme: Add options to override hardcoded values

2022-07-13 Thread Keith Busch
On Sun, Jun 12, 2022 at 12:35:09AM +0200, Mauricio Sandt wrote:
> This small patch is the result of some recent malware research I did
> in a QEMU VM. The malware used multiple ways of querying info from
> the VM disk and I needed a clean way to change those values from the
> hypervisor.
> 
> I believe this functionality could be useful to more people from multiple
> fields, sometimes you just want to change some default values and having
> them hardcoded in the source code makes that much harder.
> 
> This patch adds three config parameters to the nvme device, all of them
> are optional to not break anything. If any of them are not specified,
> the previous (before this patch) default is used.
> 
> -model - This takes a string and sets it as the device's model name.
> If you don't specify this parameter, the default is "QEMU NVMe Ctrl".
> 
> -firmware - The firmware version string, max 8 ascii characters.
> The default is whatever `QEMU_VERSION` evaluates to.
> 
> -nqn_override - Allows to set a custom nqn on the nvme device.
> Only used if there is no subsystem. This string should be in the same
> format as the default "nqn.2019-08.org.qemu:...", but there is no
> validation for that. It's up to the user to provide a valid string.

I guess the nqn can be user tunable just like it is when used with subsystems,
but what's the point of messing with model and firmware? That could mess with
host drivers' ability to detect what quirks it needs to apply to specific
instances of this virtual controller.



[RFC PATCH v3 3/3] target/ppc: Implement hashstp and hashchkp

2022-07-13 Thread Víctor Colombo
Implementation for instructions hashstp and hashchkp, the privileged
versions of hashst and hashchk, which were added in Power ISA 3.1B.

Signed-off-by: Víctor Colombo 
---
 target/ppc/cpu.h   | 1 +
 target/ppc/cpu_init.c  | 3 +++
 target/ppc/excp_helper.c   | 2 ++
 target/ppc/helper.h| 2 ++
 target/ppc/insn32.decode   | 2 ++
 target/ppc/translate/fixedpoint-impl.c.inc | 2 ++
 6 files changed, 12 insertions(+)

diff --git a/target/ppc/cpu.h b/target/ppc/cpu.h
index f3f98d7a01..e6fc9c41f0 100644
--- a/target/ppc/cpu.h
+++ b/target/ppc/cpu.h
@@ -1678,6 +1678,7 @@ void ppc_compat_add_property(Object *obj, const char *name,
 #define SPR_TIR   (0x1BE)
 #define SPR_PTCR  (0x1D0)
 #define SPR_POWER_HASHKEYR(0x1D4)
+#define SPR_POWER_HASHPKEYR   (0x1D5)
 #define SPR_BOOKE_SPEFSCR (0x200)
 #define SPR_Exxx_BBEAR(0x201)
 #define SPR_Exxx_BBTAR(0x202)
diff --git a/target/ppc/cpu_init.c b/target/ppc/cpu_init.c
index a2bbb84d47..3e704304b1 100644
--- a/target/ppc/cpu_init.c
+++ b/target/ppc/cpu_init.c
@@ -6493,6 +6493,9 @@ static void init_proc_POWER10(CPUPPCState *env)
 spr_register_kvm(env, SPR_POWER_HASHKEYR, "HASHKEYR",
SPR_NOACCESS, SPR_NOACCESS, &spr_read_generic, &spr_write_generic,
 KVM_REG_PPC_HASHKEYR, 0x0);
+spr_register_kvm(env, SPR_POWER_HASHPKEYR, "HASHPKEYR",
+SPR_NOACCESS, SPR_NOACCESS, &spr_read_generic, &spr_write_generic,
+KVM_REG_PPC_HASHPKEYR, 0x0);
 
 /* env variables */
 env->dcache_line_size = 128;
diff --git a/target/ppc/excp_helper.c b/target/ppc/excp_helper.c
index 34893bdf9f..0998e8374e 100644
--- a/target/ppc/excp_helper.c
+++ b/target/ppc/excp_helper.c
@@ -2253,6 +2253,8 @@ void helper_##op(CPUPPCState *env, target_ulong ea, target_ulong ra,  \
 
 HELPER_HASH(HASHST, env->spr[SPR_POWER_HASHKEYR], true)
 HELPER_HASH(HASHCHK, env->spr[SPR_POWER_HASHKEYR], false)
+HELPER_HASH(HASHSTP, env->spr[SPR_POWER_HASHPKEYR], true)
+HELPER_HASH(HASHCHKP, env->spr[SPR_POWER_HASHPKEYR], false)
 
 #if !defined(CONFIG_USER_ONLY)
 
diff --git a/target/ppc/helper.h b/target/ppc/helper.h
index d455b9d97a..cf68ba458d 100644
--- a/target/ppc/helper.h
+++ b/target/ppc/helper.h
@@ -6,6 +6,8 @@ DEF_HELPER_FLAGS_4(td, TCG_CALL_NO_WG, void, env, tl, tl, i32)
 #endif
 DEF_HELPER_4(HASHST, void, env, tl, tl, tl)
 DEF_HELPER_4(HASHCHK, void, env, tl, tl, tl)
+DEF_HELPER_4(HASHSTP, void, env, tl, tl, tl)
+DEF_HELPER_4(HASHCHKP, void, env, tl, tl, tl)
 #if !defined(CONFIG_USER_ONLY)
 DEF_HELPER_2(store_msr, void, env, tl)
 DEF_HELPER_1(rfi, void, env)
diff --git a/target/ppc/insn32.decode b/target/ppc/insn32.decode
index 37ec6b2681..64f92a0524 100644
--- a/target/ppc/insn32.decode
+++ b/target/ppc/insn32.decode
@@ -321,6 +321,8 @@ PEXTD           011111 ..... ..... ..... 0010111100 -   @X
 
 HASHST          011111 ..... ..... ..... 1011010010 .   @X_DW
 HASHCHK         011111 ..... ..... ..... 1011110010 .   @X_DW
+HASHSTP         011111 ..... ..... ..... 1010010010 .   @X_DW
+HASHCHKP        011111 ..... ..... ..... 1010110010 .   @X_DW
 
 ## BCD Assist
 
diff --git a/target/ppc/translate/fixedpoint-impl.c.inc b/target/ppc/translate/fixedpoint-impl.c.inc
index 41c06de8a2..1ba56cbed5 100644
--- a/target/ppc/translate/fixedpoint-impl.c.inc
+++ b/target/ppc/translate/fixedpoint-impl.c.inc
@@ -572,3 +572,5 @@ static bool do_hash(DisasContext *ctx, arg_X *a, bool priv,
 
 TRANS(HASHST, do_hash, false, gen_helper_HASHST)
 TRANS(HASHCHK, do_hash, false, gen_helper_HASHCHK)
+TRANS(HASHSTP, do_hash, true, gen_helper_HASHSTP)
+TRANS(HASHCHKP, do_hash, true, gen_helper_HASHCHKP)
-- 
2.25.1




[RFC PATCH v3 2/3] target/ppc: Implement hashst and hashchk

2022-07-13 Thread Víctor Colombo
Implementation for instructions hashst and hashchk, which were added
in Power ISA 3.1B.

It was decided to implement the hash algorithm from the ground up in this
patch, exactly as described in the Power ISA.

Signed-off-by: Víctor Colombo 
---
 target/ppc/cpu.h   |  1 +
 target/ppc/cpu_init.c  |  4 ++
 target/ppc/excp_helper.c   | 80 ++
 target/ppc/helper.h|  2 +
 target/ppc/insn32.decode   |  8 +++
 target/ppc/translate.c |  5 ++
 target/ppc/translate/fixedpoint-impl.c.inc | 32 +
 7 files changed, 132 insertions(+)

diff --git a/target/ppc/cpu.h b/target/ppc/cpu.h
index 7aaff9dcc5..f3f98d7a01 100644
--- a/target/ppc/cpu.h
+++ b/target/ppc/cpu.h
@@ -1677,6 +1677,7 @@ void ppc_compat_add_property(Object *obj, const char *name,
 #define SPR_BOOKE_GIVOR14 (0x1BD)
 #define SPR_TIR   (0x1BE)
 #define SPR_PTCR  (0x1D0)
+#define SPR_POWER_HASHKEYR(0x1D4)
 #define SPR_BOOKE_SPEFSCR (0x200)
 #define SPR_Exxx_BBEAR(0x201)
 #define SPR_Exxx_BBTAR(0x202)
diff --git a/target/ppc/cpu_init.c b/target/ppc/cpu_init.c
index 1da5f1f1d8..a2bbb84d47 100644
--- a/target/ppc/cpu_init.c
+++ b/target/ppc/cpu_init.c
@@ -6490,6 +6490,10 @@ static void init_proc_POWER10(CPUPPCState *env)
 spr_read_generic, spr_write_generic,
 KVM_REG_PPC_PSSCR, 0);
 
+spr_register_kvm(env, SPR_POWER_HASHKEYR, "HASHKEYR",
+SPR_NOACCESS, SPR_NOACCESS, &spr_read_generic, &spr_write_generic,
+KVM_REG_PPC_HASHKEYR, 0x0);
+
 /* env variables */
 env->dcache_line_size = 128;
 env->icache_line_size = 128;
diff --git a/target/ppc/excp_helper.c b/target/ppc/excp_helper.c
index cb752b184a..34893bdf9f 100644
--- a/target/ppc/excp_helper.c
+++ b/target/ppc/excp_helper.c
@@ -2174,6 +2174,86 @@ void helper_td(CPUPPCState *env, target_ulong arg1, target_ulong arg2,
 #endif
 #endif
 
+static uint32_t helper_SIMON_LIKE_32_64(uint32_t x, uint64_t key, uint32_t lane)
+{
+const uint16_t c = 0xfffc;
+const uint64_t z0 = 0xfa2561cdf44ac398ULL;
+uint16_t z = 0, temp;
+uint16_t k[32], eff_k[32], xleft[33], xright[33], fxleft[32];
+
+for (int i = 3; i >= 0; i--) {
+k[i] = key & 0xffff;
+key >>= 16;
+}
+xleft[0] = x & 0xffff;
+xright[0] = (x >> 16) & 0xffff;
+
+for (int i = 0; i < 28; i++) {
+z = (z0 >> (63 - i)) & 1;
+temp = ror16(k[i + 3], 3) ^ k[i + 1];
+k[i + 4] = c ^ z ^ k[i] ^ temp ^ ror16(temp, 1);
+}
+
+for (int i = 0; i < 8; i++) {
+eff_k[4 * i + 0] = k[4 * i + ((0 + lane) % 4)];
+eff_k[4 * i + 1] = k[4 * i + ((1 + lane) % 4)];
+eff_k[4 * i + 2] = k[4 * i + ((2 + lane) % 4)];
+eff_k[4 * i + 3] = k[4 * i + ((3 + lane) % 4)];
+}
+
+for (int i = 0; i < 32; i++) {
+fxleft[i] = (rol16(xleft[i], 1) &
+rol16(xleft[i], 8)) ^ rol16(xleft[i], 2);
+xleft[i + 1] = xright[i] ^ fxleft[i] ^ eff_k[i];
+xright[i + 1] = xleft[i];
+}
+
+return (((uint32_t)xright[32]) << 16) | xleft[32];
+}
+
+static uint64_t hash_digest(uint64_t ra, uint64_t rb, uint64_t key)
+{
+uint64_t stage0_h = 0ULL, stage0_l = 0ULL;
+uint64_t stage1_h, stage1_l;
+
+for (int i = 0; i < 4; i++) {
+stage0_h |= ror64(rb & 0xff, 8 * (2 * i + 1));
+stage0_h |= ((ra >> 32) & 0xff) << (8 * 2 * i);
+stage0_l |= ror64((rb >> 32) & 0xff, 8 * (2 * i + 1));
+stage0_l |= (ra & 0xff) << (8 * 2 * i);
+rb >>= 8;
+ra >>= 8;
+}
+
+stage1_h = (uint64_t)helper_SIMON_LIKE_32_64(stage0_h >> 32, key, 0) << 32;
+stage1_h |= helper_SIMON_LIKE_32_64(stage0_h, key, 1);
+stage1_l = (uint64_t)helper_SIMON_LIKE_32_64(stage0_l >> 32, key, 2) << 32;
+stage1_l |= helper_SIMON_LIKE_32_64(stage0_l, key, 3);
+
+return stage1_h ^ stage1_l;
+}
+
+#define HELPER_HASH(op, key, store)   \
+void helper_##op(CPUPPCState *env, target_ulong ea, target_ulong ra,  \
+ target_ulong rb) \
+{ \
+uint64_t chash = hash_digest(ra, rb, key), lhash; \
+  \
+if (store) {  \
+cpu_stq_data_ra(env, ea, chash, GETPC()); \
+} else {  \
+lhash = cpu_ldq_data_ra(env, ea, GETPC());\
+if (lhash != chash) { \
+/* hashes don't match, trap */\
+raise_exception_err_ra(env, 

[RFC PATCH v3 1/3] linux-headers/asm-powerpc/kvm.h: Add HASHKEYR and HASHPKEYR in headers

2022-07-13 Thread Víctor Colombo
Linux KVM currently does not export these registers. Create
placeholders for them to allow implementing hashchk(p) and
hashst(p) instructions from PowerISA 3.1B.

Signed-off-by: Víctor Colombo 
---
 linux-headers/asm-powerpc/kvm.h | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/linux-headers/asm-powerpc/kvm.h b/linux-headers/asm-powerpc/kvm.h
index 9f18fa090f..4ae4718143 100644
--- a/linux-headers/asm-powerpc/kvm.h
+++ b/linux-headers/asm-powerpc/kvm.h
@@ -646,6 +646,9 @@ struct kvm_ppc_cpu_char {
 #define KVM_REG_PPC_SIER3  (KVM_REG_PPC | KVM_REG_SIZE_U64 | 0xc3)
 #define KVM_REG_PPC_DAWR1  (KVM_REG_PPC | KVM_REG_SIZE_U64 | 0xc4)
 #define KVM_REG_PPC_DAWRX1 (KVM_REG_PPC | KVM_REG_SIZE_U64 | 0xc5)
+/* FIXME: KVM hasn't exposed these registers yet */
+#define KVM_REG_PPC_HASHKEYR(KVM_REG_PPC | KVM_REG_SIZE_U64 | 0x00)
+#define KVM_REG_PPC_HASHPKEYR   (KVM_REG_PPC | KVM_REG_SIZE_U64 | 0x00)
 
 /* Transactional Memory checkpointed state:
  * This is all GPRs, all VSX regs and a subset of SPRs
-- 
2.25.1




[RFC PATCH v3 0/3] Implement Power ISA 3.1B hash insns

2022-07-13 Thread Víctor Colombo
This patch series implements the 4 instructions added in Power ISA
3.1B:

- hashchk
- hashst
- hashchkp
- hashstp

To build it, you need to apply the following patches on top of master:
<20220701133507.740619-2-lucas.couti...@eldorado.org.br>
<20220701133507.740619-3-lucas.couti...@eldorado.org.br>
<20220712193741.59134-2-leandro.lup...@eldorado.org.br>
<20220712193741.59134-3-leandro.lup...@eldorado.org.br>

Working branch for ease of use can be found here:
https://github.com/PPC64/qemu/tree/vccolombo-hash-to-send-v3

What do you think about the choice to implement the hash algorithm
from the ground up, following the SIMON-like algorithm presented in
the Power ISA? IIUC, this algorithm is not the same as the original[1].
Other options would be to use another algorithm already implemented
in QEMU, or even to make this instruction a nop for all Power versions.

Also, I was thinking about using the call to spr_register_kvm() in
init_proc_POWER10 to initialize the registers with a random value.
I'm not sure what the behavior is here; I would expect setting the regs
to be the job of the OS, but it looks like KVM is not exporting them,
so they are always 0 (?). Does anyone have any insight on this?

v1->v2:
- Split the patch in 2
- Rebase to master

v2->v3:
- Split patches in 3
- the new patch (patch 1) is separating the kvm header 
  changes [Cornelia]

[1] https://eprint.iacr.org/2013/404.pdf

Víctor Colombo (3):
  linux-headers/asm-powerpc/kvm.h: Add HASHKEYR and HASHPKEYR in headers
  target/ppc: Implement hashst and hashchk
  target/ppc: Implement hashstp and hashchkp

 linux-headers/asm-powerpc/kvm.h|  3 +
 target/ppc/cpu.h   |  2 +
 target/ppc/cpu_init.c  |  7 ++
 target/ppc/excp_helper.c   | 82 ++
 target/ppc/helper.h|  4 ++
 target/ppc/insn32.decode   | 10 +++
 target/ppc/translate.c |  5 ++
 target/ppc/translate/fixedpoint-impl.c.inc | 34 +
 8 files changed, 147 insertions(+)

-- 
2.25.1




Ping: [PATCH] hw/nvme: Add options to override hardcoded values

2022-07-13 Thread Mauricio Sandt

https://patchew.org/QEMU/20220611223509.32280-1-mauri...@mailbox.org/
https://lore.kernel.org/qemu-devel/20220611223509.32280-1-mauri...@mailbox.org/

On 12/06/2022 00:35, Mauricio Sandt wrote:

This small patch is the result of some recent malware research I did
in a QEMU VM. The malware used multiple ways of querying info from
the VM disk and I needed a clean way to change those values from the
hypervisor.

I believe this functionality could be useful to more people from multiple
fields, sometimes you just want to change some default values and having
them hardcoded in the source code makes that much harder.

This patch adds three config parameters to the nvme device, all of them
are optional to not break anything. If any of them are not specified,
the previous (before this patch) default is used.

-model - This takes a string and sets it as the device's model name.
If you don't specify this parameter, the default is "QEMU NVMe Ctrl".

-firmware - The firmware version string, max 8 ascii characters.
The default is whatever `QEMU_VERSION` evaluates to.

-nqn_override - Allows to set a custom nqn on the nvme device.
Only used if there is no subsystem. This string should be in the same
format as the default "nqn.2019-08.org.qemu:...", but there is no
validation for that. It's up to the user to provide a valid string.

Signed-off-by: Mauricio Sandt
---
  hw/nvme/ctrl.c | 16 +---
  hw/nvme/nvme.h |  3 +++
  2 files changed, 16 insertions(+), 3 deletions(-)

diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index 1e6e0fcad9..0e67217a63 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -6697,8 +6697,13 @@ static void nvme_init_subnqn(NvmeCtrl *n)
     NvmeIdCtrl *id = &n->id_ctrl;
 
     if (!subsys) {
-        snprintf((char *)id->subnqn, sizeof(id->subnqn),
-                 "nqn.2019-08.org.qemu:%s", n->params.serial);
+        if (n->params.nqn_override) {
+            snprintf((char *)id->subnqn, sizeof(id->subnqn),
+                     "%s", n->params.nqn_override);
+        } else {
+            snprintf((char *)id->subnqn, sizeof(id->subnqn),
+                     "nqn.2019-08.org.qemu:%s", n->params.serial);
+        }
     } else {
         pstrcpy((char *)id->subnqn, sizeof(id->subnqn), (char *)subsys->subnqn);
     }
@@ -6712,8 +6717,10 @@ static void nvme_init_ctrl(NvmeCtrl *n, PCIDevice *pci_dev)
  
  id->vid = cpu_to_le16(pci_get_word(pci_conf + PCI_VENDOR_ID));

  id->ssvid = cpu_to_le16(pci_get_word(pci_conf + PCI_SUBSYSTEM_VENDOR_ID));
-strpadcpy((char *)id->mn, sizeof(id->mn), "QEMU NVMe Ctrl", ' ');
-strpadcpy((char *)id->fr, sizeof(id->fr), QEMU_VERSION, ' ');
+strpadcpy((char *)id->mn, sizeof(id->mn),
+n->params.model ? n->params.model : "QEMU NVMe Ctrl", ' ');
+strpadcpy((char *)id->fr, sizeof(id->fr),
+n->params.firmware ? n->params.firmware : QEMU_VERSION, ' ');
  strpadcpy((char *)id->sn, sizeof(id->sn), n->params.serial, ' ');
  
  id->cntlid = cpu_to_le16(n->cntlid);

@@ -6913,6 +6920,9 @@ static Property nvme_props[] = {
  DEFINE_PROP_LINK("subsys", NvmeCtrl, subsys, TYPE_NVME_SUBSYS,
   NvmeSubsystem *),
  DEFINE_PROP_STRING("serial", NvmeCtrl, params.serial),
+DEFINE_PROP_STRING("model", NvmeCtrl, params.model),
+DEFINE_PROP_STRING("nqn_override", NvmeCtrl, params.nqn_override),
+DEFINE_PROP_STRING("firmware", NvmeCtrl, params.firmware),
  DEFINE_PROP_UINT32("cmb_size_mb", NvmeCtrl, params.cmb_size_mb, 0),
  DEFINE_PROP_UINT32("num_queues", NvmeCtrl, params.num_queues, 0),
  DEFINE_PROP_UINT32("max_ioqpairs", NvmeCtrl, params.max_ioqpairs, 64),
diff --git a/hw/nvme/nvme.h b/hw/nvme/nvme.h
index e41771604f..45bcf3e02e 100644
--- a/hw/nvme/nvme.h
+++ b/hw/nvme/nvme.h
@@ -394,6 +394,9 @@ typedef struct NvmeCQueue {
  
  typedef struct NvmeParams {

  char *serial;
+char *model;
+char *firmware;
+char *nqn_override;
  uint32_t num_queues; /* deprecated since 5.1 */
  uint32_t max_ioqpairs;
  uint16_t msix_qsize;
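
For reference, a hypothetical invocation using the new knobs (drive name and
values are illustrative; serial remains required as before):

    -device nvme,serial=deadbeef,drive=nvm0,model=ACME-Ctrl,firmware=1.2.3,nqn_override=nqn.2019-08.org.qemu:custom

Omitting any of the three new properties keeps the pre-patch default.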


Re: [PATCH v6 12/13] qemu-sockets: update socket_uri() to be consistent with socket_parse()

2022-07-13 Thread Laurent Vivier

On 12/07/2022 14:05, Dr. David Alan Gilbert wrote:

* Laurent Vivier (lviv...@redhat.com) wrote:

Remove 'tcp:' prefix for inet type (because inet can be 'tcp' or 'udp'
and socket_parse() doesn't recognize it), the format is 'host:port'.


I don't think I understand why tests/qtest/migration-test.c
test_precopy_common is happy with this; it does:

 if (!args->connect_uri) {
 g_autofree char *local_connect_uri =
 migrate_get_socket_address(to, "socket-address");
 migrate_qmp(from, local_connect_uri, "{}");

which hmm, is the code you're changing what was in SocketAddress_to_str
which is what migrate_get_socket_address uses; but then the migrate_qmp
I don't think will take a migrate uri without the tcp: on the front.


It's a good point.

I think socket_parse() should accept the 'tcp:' prefix, and thus socket_uri() should 
generate it, so it will be usable with the qmp migrate command.


I'm going to add a 'tcp:' case in socket_parse() and make socket_uri()
generate it.

Thanks,
Laurent
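
A minimal sketch of the planned 'tcp:' handling, written as a standalone
helper and assuming the existing strstart() and inet_parse() utilities (the
actual patch may integrate this directly into socket_parse()):

static SocketAddress *socket_parse_tcp(const char *str, Error **errp)
{
    SocketAddress *addr = g_new0(SocketAddress, 1);
    const char *rest;

    /* accept "tcp:host:port" as an alias for the bare "host:port" form */
    if (!strstart(str, "tcp:", &rest)) {
        rest = str;
    }
    addr->type = SOCKET_ADDRESS_TYPE_INET;
    if (inet_parse(&addr->u.inet, rest, errp)) {
        qapi_free_SocketAddress(addr);
        return NULL;
    }
    return addr;
}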




Re: [PATCH 3/3] tests/tcg/s390x: test signed vfmin/vfmax

2022-07-13 Thread Ilya Leoshkevich
On Wed, 2022-07-13 at 21:14 +0530, Richard Henderson wrote:
> On 7/12/22 18:02, Ilya Leoshkevich wrote:
> > > This works, of course.  It could be simpler using EXECUTE, to
> > > store
> > > just the one
> > > instruction and not worry about an executable mapped page, but I
> > > guess it doesn't matter.
> > 
> > I thought about this too, but EX/EXRL operate only on the second
> > byte,
> > and I need to modify bytes 3-5 here.
> 
> I didn't mean modify the instruction via EX, but something like
> 
>    static char minmax[6] __attribute__((aligned(2)))
>  = { xx, yy, zz, 0, 0, 0 };
> 
>    minmax[3] = m6 ...
>    minmax[4] = ...
>    minmax[5] = op;
> 
>    asm("vl %%v25,0(%1)\n"
>    "vl %%v26,0(%2)\n"
>    "ex 0,0(%3)\n"
>    "vst %%v24,0(%0)"
>    : : "a"(v1), "a"(v2), "a"(v3), "a"(minmax)
>    : "memory", "v24", "v25", "v26);
> 
> 
> r~

Nice trick!

This works in qemu, but not natively: EX target must be executable.
I'd still like to try to find a way to establish an rwx section, and
send a v2 with this improvement.

I guess we'll need to fix the access check discrepancy some day.



Re: [PATCH 2/5] hw/intc/loongarch_pch_pic: Fix coverity errors in update irq

2022-07-13 Thread Richard Henderson

On 7/13/22 15:20, Xiaojuan Yang wrote:

Fix coverity errors:
1. The 'size' argument of find_first_bit() needs to be of
'unsigned long' type, so change 'size' to unsigned long
where the function is used.
2. In the expression '1ULL << irq', left shifting by more than
63 bits has undefined behavior, and an out-of-bounds access
occurred when 'irq' >= 64. Add a condition to avoid this.
3. Use 'MAKE_64BIT_MASK(irq, 1)' to replace '1ULL << irq'.

Fix coverity CID: 1489761 1489764 1489765

Signed-off-by: Xiaojuan Yang 
---
  hw/intc/loongarch_pch_pic.c | 19 ---
  1 file changed, 12 insertions(+), 7 deletions(-)

diff --git a/hw/intc/loongarch_pch_pic.c b/hw/intc/loongarch_pch_pic.c
index 3c9814a3b4..040b89861c 100644
--- a/hw/intc/loongarch_pch_pic.c
+++ b/hw/intc/loongarch_pch_pic.c
@@ -15,22 +15,27 @@
  
  static void pch_pic_update_irq(LoongArchPCHPIC *s, uint64_t mask, int level)

  {
-unsigned long val;
+unsigned long val, max_irq;


You did not follow any of my direction from v1.

(1) val must be uint64_t.

(and, generally, any use of 'unsigned long' is probably a bug)


+irq = find_first_bit(&val, max_irq);


Use ctz64().


+if (irq < max_irq) {


This, really, should be a test of val != 0 before the ctz.



+s->intisr |= MAKE_64BIT_MASK(irq, 1);
+qemu_set_irq(s->parent_irq[s->htmsi_vector[irq]], 1);
+}
  }
  } else {
  val = mask & s->intisr;
  if (val) {
-irq = find_first_bit(&val, 64);
-s->intisr &= ~(0x1ULL << irq);
-qemu_set_irq(s->parent_irq[s->htmsi_vector[irq]], 0);
+irq = find_first_bit(&val, max_irq);
+if (irq < max_irq) {
+s->intisr &= ~(MAKE_64BIT_MASK(irq, 1));
+qemu_set_irq(s->parent_irq[s->htmsi_vector[irq]], 0);


etc.


r~


+}
  }
  }
  }
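
Putting those points together, the clear-IRQ branch would presumably end up
shaped roughly like this (a sketch based only on the fields visible in the
diff, not the actual v3):

uint64_t val = mask & s->intisr;

if (val) {
    /* val != 0 is guaranteed here, so ctz64() is well defined */
    int irq = ctz64(val);

    s->intisr &= ~MAKE_64BIT_MASK(irq, 1);
    qemu_set_irq(s->parent_irq[s->htmsi_vector[irq]], 0);
}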





Re: [PATCH 5/5] target/loongarch/op_helper: Fix coverity cond_at_most error

2022-07-13 Thread Richard Henderson

On 7/13/22 15:20, Xiaojuan Yang wrote:

The valid index range of the cpucfg array is 0 to 20, so
using an index bigger than 20 to access cpucfg[] must be forbidden.


You must update the comment to match the code,
which no longer mentions "20" at all.  With that change,

Reviewed-by: Richard Henderson 

r~



Fix coverity CID: 1489760

Signed-off-by: Xiaojuan Yang 
---
  target/loongarch/op_helper.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/target/loongarch/op_helper.c b/target/loongarch/op_helper.c
index 4b429b6699..568c071601 100644
--- a/target/loongarch/op_helper.c
+++ b/target/loongarch/op_helper.c
@@ -81,7 +81,7 @@ target_ulong helper_crc32c(target_ulong val, target_ulong m, uint64_t sz)
  
  target_ulong helper_cpucfg(CPULoongArchState *env, target_ulong rj)

  {
-return rj > 21 ? 0 : env->cpucfg[rj];
+return rj >= ARRAY_SIZE(env->cpucfg) ? 0 : env->cpucfg[rj];
  }
  
  uint64_t helper_rdtime_d(CPULoongArchState *env)





Re: [PATCH v2 07/11] linux header sync

2022-07-13 Thread Marc-André Lureau
On Wed, Jul 13, 2022 at 5:07 PM Janosch Frank  wrote:
>
> Signed-off-by: Janosch Frank 

Please tell which version this update comes from. Otherwise, it should be fine.
Reviewed-by: Marc-André Lureau 

> ---
>  linux-headers/linux/kvm.h | 55 +++
>  1 file changed, 55 insertions(+)
>
> diff --git a/linux-headers/linux/kvm.h b/linux-headers/linux/kvm.h
> index 0d05d02ee4..ae5db2e44c 100644
> --- a/linux-headers/linux/kvm.h
> +++ b/linux-headers/linux/kvm.h
> @@ -1150,6 +1150,7 @@ struct kvm_ppc_resize_hpt {
>  #define KVM_CAP_DISABLE_QUIRKS2 213
>  /* #define KVM_CAP_VM_TSC_CONTROL 214 */
>  #define KVM_CAP_SYSTEM_EVENT_DATA 215
> +#define KVM_CAP_S390_PROTECTED_DUMP 217
>
>  #ifdef KVM_CAP_IRQ_ROUTING
>
> @@ -1651,6 +1652,55 @@ struct kvm_s390_pv_unp {
> __u64 tweak;
>  };
>
> +enum pv_cmd_info_id {
> +   KVM_PV_INFO_VM,
> +   KVM_PV_INFO_DUMP,
> +};
> +
> +struct kvm_s390_pv_info_dump {
> +   __u64 dump_cpu_buffer_len;
> +   __u64 dump_config_mem_buffer_per_1m;
> +   __u64 dump_config_finalize_len;
> +};
> +
> +struct kvm_s390_pv_info_vm {
> +   __u64 inst_calls_list[4];
> +   __u64 max_cpus;
> +   __u64 max_guests;
> +   __u64 max_guest_addr;
> +   __u64 feature_indication;
> +};
> +
> +struct kvm_s390_pv_info_header {
> +   __u32 id;
> +   __u32 len_max;
> +   __u32 len_written;
> +   __u32 reserved;
> +};
> +
> +struct kvm_s390_pv_info {
> +   struct kvm_s390_pv_info_header header;
> +   union {
> +   struct kvm_s390_pv_info_dump dump;
> +   struct kvm_s390_pv_info_vm vm;
> +   };
> +};
> +
> +enum pv_cmd_dmp_id {
> +KVM_PV_DUMP_INIT,
> +KVM_PV_DUMP_CONFIG_STATE,
> +KVM_PV_DUMP_COMPLETE,
> +KVM_PV_DUMP_CPU,
> +};
> +
> +struct kvm_s390_pv_dmp {
> +__u64 subcmd;
> +__u64 buff_addr;
> +__u64 buff_len;
> +__u64 gaddr;
> +__u64 reserved[4];
> +};
> +
>  enum pv_cmd_id {
> KVM_PV_ENABLE,
> KVM_PV_DISABLE,
> @@ -1659,6 +1709,8 @@ enum pv_cmd_id {
> KVM_PV_VERIFY,
> KVM_PV_PREP_RESET,
> KVM_PV_UNSHARE_ALL,
> +KVM_PV_INFO,
> +KVM_PV_DUMP,
>  };
>
>  struct kvm_pv_cmd {
> @@ -1733,6 +1785,7 @@ struct kvm_xen_vcpu_attr {
>  #define KVM_XEN_VCPU_ATTR_TYPE_RUNSTATE_DATA   0x4
>  #define KVM_XEN_VCPU_ATTR_TYPE_RUNSTATE_ADJUST 0x5
>
> +
>  /* Secure Encrypted Virtualization command */
>  enum sev_cmd_id {
> /* Guest initialization commands */
> @@ -2066,4 +2119,6 @@ struct kvm_stats_desc {
>  /* Available with KVM_CAP_XSAVE2 */
>  #define KVM_GET_XSAVE2   _IOR(KVMIO,  0xcf, struct kvm_xsave)
>
> +#define KVM_S390_PV_CPU_COMMAND _IOWR(KVMIO, 0xd0, struct kvm_pv_cmd)
> +
>  #endif /* __LINUX_KVM_H */
> --
> 2.34.1
>




Re: [PATCH v2 06/11] dump/dump: Add arch section support

2022-07-13 Thread Marc-André Lureau
Hi

On Wed, Jul 13, 2022 at 5:07 PM Janosch Frank  wrote:
>
> Add hooks which architectures can use to add arbitrary data to custom
> sections.
>
> Signed-off-by: Janosch Frank 
> ---
>  dump/dump.c| 21 ++---
>  include/sysemu/dump-arch.h | 27 +++
>  2 files changed, 45 insertions(+), 3 deletions(-)
>
> diff --git a/dump/dump.c b/dump/dump.c
> index 31e2a85372..02de00b6de 100644
> --- a/dump/dump.c
> +++ b/dump/dump.c
> @@ -400,6 +400,7 @@ static void prepare_elf_section_hdrs(DumpState *s)
>  /*
>   * Section ordering:
>   * - HDR zero (if needed)
> + * - Arch section hdrs
>   * - String table hdr
>   */
>  sizeof_shdr = dump_is_64bit(s) ? sizeof(Elf64_Shdr) : sizeof(Elf32_Shdr);
> @@ -417,6 +418,9 @@ static void prepare_elf_section_hdrs(DumpState *s)
>  return;
>  }
>
> +size = dump_arch_sections_write_hdr(&s->dump_info, s, buff_hdr);
> +buff_hdr += size;
> +
>  /*
>   * String table needs to be last section since strings are added
>   * via arch_sections_write_hdr().
> @@ -567,14 +571,23 @@ static void get_offset_range(hwaddr phys_addr,
>  }
>  }
>
> -static void write_elf_loads(DumpState *s, Error **errp)
> +static void write_elf_phdr_loads(DumpState *s, Error **errp)
>  {
>  ERRP_GUARD();
>  hwaddr offset, filesz;
>  MemoryMapping *memory_mapping;
>  uint32_t phdr_index = 1;
> +hwaddr min = 0, max = 0;
>
>  QTAILQ_FOREACH(memory_mapping, &s->list.head, next) {
> +if (memory_mapping->phys_addr < min) {
> +min = memory_mapping->phys_addr;
> +}
> +if (memory_mapping->phys_addr + memory_mapping->length > max) {
> +max = memory_mapping->phys_addr + memory_mapping->length;
> +}
> +
> +

Extra line & this belongs to a different patch.

>  get_offset_range(memory_mapping->phys_addr,
>   memory_mapping->length,
>   s, &offset, &filesz);
> @@ -682,8 +695,8 @@ static void dump_begin(DumpState *s, Error **errp)
>  return;
>  }
>
> -/* write all PT_LOAD to vmcore */
> -write_elf_loads(s, errp);
> +/* write all PT_LOADs to vmcore */
> +write_elf_phdr_loads(s, errp);
>  if (*errp) {
>  return;
>  }
> @@ -723,6 +736,7 @@ static void dump_end(DumpState *s, Error **errp)
>  return;
>  }
>  s->elf_section_data = g_malloc0(s->elf_section_data_size);
> +dump_arch_sections_write(&s->dump_info, s, s->elf_section_data);
>
>  /* write sections to vmcore */
>  write_elf_sections(s, errp);
> @@ -1894,6 +1908,7 @@ static void dump_init(DumpState *s, int fd, bool 
> has_format,
>   * If phdr_num overflowed we have at least one section header
>   * More sections/hdrs can be added by the architectures
>   */
> +dump_arch_sections_add(&s->dump_info, (void *)s);
>  if (s->shdr_num > 1) {
>  /* Reserve the string table */
>  s->shdr_num += 1;
> diff --git a/include/sysemu/dump-arch.h b/include/sysemu/dump-arch.h
> index e25b02e990..de77908424 100644
> --- a/include/sysemu/dump-arch.h
> +++ b/include/sysemu/dump-arch.h
> @@ -21,6 +21,9 @@ typedef struct ArchDumpInfo {
>  uint32_t page_size;  /* The target's page size. If it's variable and
>* unknown, then this should be the maximum. */
>  uint64_t phys_base;  /* The target's physmem base. */
> +void (*arch_sections_add_fn)(void *opaque);
> +uint64_t (*arch_sections_write_hdr_fn)(void *opaque, uint8_t *buff);
> +void (*arch_sections_write_fn)(void *opaque, uint8_t *buff);
>  } ArchDumpInfo;
>
>  struct GuestPhysBlockList; /* memory_mapping.h */
> @@ -28,4 +31,28 @@ int cpu_get_dump_info(ArchDumpInfo *info,
>const struct GuestPhysBlockList *guest_phys_blocks);
>  ssize_t cpu_get_note_size(int class, int machine, int nr_cpus);
>
> +static inline void dump_arch_sections_add(ArchDumpInfo *info, void *opaque)
> +{
> +if (info->arch_sections_add_fn) {
> +info->arch_sections_add_fn(opaque);
> +}
> +}
> +
> +static inline uint64_t dump_arch_sections_write_hdr(ArchDumpInfo *info,
> +void *opaque, uint8_t *buff)
> +{
> +if (info->arch_sections_write_hdr_fn) {
> +return info->arch_sections_write_hdr_fn(opaque, buff);
> +}
> +return 0;
> +}
> +
> +static inline void dump_arch_sections_write(ArchDumpInfo *info, void *opaque,
> +uint8_t *buff)
> +{
> +if (info->arch_sections_write_fn) {
> +info->arch_sections_write_fn(opaque, buff);
> +}
> +}
> +
>  #endif
> --
> 2.34.1
>

otherwise, seems ok to me




Re: [PATCH 3/5] target/loongarch/cpu: Fix coverity errors about excp_names

2022-07-13 Thread Richard Henderson

On 7/13/22 15:20, Xiaojuan Yang wrote:

Fix out-of-bounds errors when accessing the excp_names[] array. The
valid index range of excp_names is 0 to ARRAY_SIZE(excp_names) - 1.
However, the code did not check the upper boundary.

Fix coverity CID: 1489758

Signed-off-by: Xiaojuan Yang 
---
  target/loongarch/cpu.c | 6 +++---
  1 file changed, 3 insertions(+), 3 deletions(-)


I gave you a reviewed-by for this patch in v1.
You must copy those into v2 so that I don't have to do it again.


r~



diff --git a/target/loongarch/cpu.c b/target/loongarch/cpu.c
index ed26f9beed..89ea971cde 100644
--- a/target/loongarch/cpu.c
+++ b/target/loongarch/cpu.c
@@ -140,7 +140,7 @@ static void loongarch_cpu_do_interrupt(CPUState *cs)
  
  if (cs->exception_index != EXCCODE_INT) {

  if (cs->exception_index < 0 ||
-cs->exception_index > ARRAY_SIZE(excp_names)) {
+cs->exception_index >= ARRAY_SIZE(excp_names)) {
  name = "unknown";
  } else {
  name = excp_names[cs->exception_index];
@@ -190,8 +190,8 @@ static void loongarch_cpu_do_interrupt(CPUState *cs)
  cause = cs->exception_index;
  break;
  default:
-qemu_log("Error: exception(%d) '%s' has not been supported\n",
- cs->exception_index, excp_names[cs->exception_index]);
+qemu_log("Error: exception(%d) has not been supported\n",
+ cs->exception_index);
  abort();
  }
  





Re: [PATCH 1/5] target/loongarch/cpu: Fix cpu_class_by_name function

2022-07-13 Thread Richard Henderson

On 7/13/22 15:20, Xiaojuan Yang wrote:

In the loongarch_cpu_class_by_name(char *cpu_model) function,
the argument cpu_model already has the '-loongarch-cpu' suffix,
so we should remove the LOONGARCH_CPU_TYPE_NAME(cpu_model) macro.
Also add a check that 'cpu_model' resolves to a class of the
appropriate type.

Signed-off-by: Xiaojuan Yang 
---
  target/loongarch/cpu.c | 9 +
  1 file changed, 5 insertions(+), 4 deletions(-)


Reviewed-by: Richard Henderson 

r~


diff --git a/target/loongarch/cpu.c b/target/loongarch/cpu.c
index e21715592a..ed26f9beed 100644
--- a/target/loongarch/cpu.c
+++ b/target/loongarch/cpu.c
@@ -571,11 +571,12 @@ static void loongarch_cpu_init(Object *obj)
  static ObjectClass *loongarch_cpu_class_by_name(const char *cpu_model)
  {
  ObjectClass *oc;
-char *typename;
  
-typename = g_strdup_printf(LOONGARCH_CPU_TYPE_NAME("%s"), cpu_model);

-oc = object_class_by_name(typename);
-g_free(typename);
+oc = object_class_by_name(cpu_model);
+if (!oc || !object_class_dynamic_cast(oc, TYPE_LOONGARCH_CPU) ||
+object_class_is_abstract(oc)) {
+return NULL;
+}
  return oc;
  }
  





Re: [PATCH 4/5] target/loongarch/tlb_helper: Fix coverity integer overflow error

2022-07-13 Thread Richard Henderson

On 7/13/22 15:20, Xiaojuan Yang wrote:

Replace '1 << shift' with 'MAKE_64BIT_MASK(shift, 1)' to fix
unintentional integer overflow errors in the tlb_helper file.

Fix coverity CID: 1489759 1489762

Signed-off-by: Xiaojuan Yang
---
  target/loongarch/tlb_helper.c | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)


Reviewed-by: Richard Henderson 

r~



Re: [PATCH v2 05/11] dump/dump: Add section string table support

2022-07-13 Thread Marc-André Lureau
Hi

On Wed, Jul 13, 2022 at 5:07 PM Janosch Frank  wrote:
>
> Time to add a bit more descriptiveness to the dumps.

Please add some more description & motivation to the patch (supposedly
necessary for next patches), and explain that it currently doesn't
change the dump (afaict).

>
> Signed-off-by: Janosch Frank 
> Reviewed-by: Richard Henderson 
> ---
>  dump/dump.c   | 106 --
>  include/sysemu/dump.h |   1 +
>  2 files changed, 94 insertions(+), 13 deletions(-)
>
> diff --git a/dump/dump.c b/dump/dump.c
> index 467d934bc1..31e2a85372 100644
> --- a/dump/dump.c
> +++ b/dump/dump.c
> @@ -99,6 +99,7 @@ static int dump_cleanup(DumpState *s)
>  close(s->fd);
>  g_free(s->guest_note);
>  g_free(s->elf_header);
> +g_array_unref(s->string_table_buf);
>  s->guest_note = NULL;
>  if (s->resume) {
>  if (s->detached) {
> @@ -359,14 +360,47 @@ static size_t write_elf_section_hdr_zero(DumpState *s, 
> void *buff)
>  return dump_is_64bit(s) ? sizeof(Elf64_Shdr) : sizeof(Elf32_Shdr);
>  }
>
> +static void write_elf_section_hdr_string(DumpState *s, void *buff)
> +{
> +Elf32_Shdr shdr32;
> +Elf64_Shdr shdr64;
> +int shdr_size;
> +void *shdr = buff;
> +
> +if (dump_is_64bit(s)) {
> +shdr_size = sizeof(Elf64_Shdr);
> +memset(, 0, shdr_size);
> +shdr64.sh_type = SHT_STRTAB;
> +shdr64.sh_offset = s->section_offset + s->elf_section_data_size;
> +shdr64.sh_name = s->string_table_buf->len;
> +g_array_append_vals(s->string_table_buf, ".strtab", 
> sizeof(".strtab"));
> +shdr64.sh_size = s->string_table_buf->len;
> +shdr = 
> +} else {
> +shdr_size = sizeof(Elf32_Shdr);
> +memset(, 0, shdr_size);
> +shdr32.sh_type = SHT_STRTAB;
> +shdr32.sh_offset = s->section_offset + s->elf_section_data_size;
> +shdr32.sh_name = s->string_table_buf->len;
> +g_array_append_vals(s->string_table_buf, ".strtab", 
> sizeof(".strtab"));
> +shdr32.sh_size = s->string_table_buf->len;
> +shdr = 
> +}
> +
> +memcpy(buff, shdr, shdr_size);
> +}
> +
>  static void prepare_elf_section_hdrs(DumpState *s)
>  {
>  uint8_t *buff_hdr;
> -size_t len, sizeof_shdr;
> +size_t len, size = 0, sizeof_shdr;
> +Elf64_Ehdr *hdr64 = s->elf_header;
> +Elf32_Ehdr *hdr32 = s->elf_header;
>
>  /*
>   * Section ordering:
>   * - HDR zero (if needed)
> + * - String table hdr
>   */
>  sizeof_shdr = dump_is_64bit(s) ? sizeof(Elf64_Shdr) : sizeof(Elf32_Shdr);
>  len = sizeof_shdr * s->shdr_num;
> @@ -377,6 +411,22 @@ static void prepare_elf_section_hdrs(DumpState *s)
>  if (s->phdr_num == PN_XNUM) {
>  write_elf_section_hdr_zero(s, buff_hdr);
>  }
> +buff_hdr += size;
> +
> +if (s->shdr_num < 2) {
> +return;
> +}
> +
> +/*
> + * String table needs to be last section since strings are added
> + * via arch_sections_write_hdr().
> + */
> +write_elf_section_hdr_string(s, buff_hdr);
> +if (dump_is_64bit(s)) {
> +hdr64->e_shstrndx = cpu_to_dump16(s, s->shdr_num - 1);
> +} else {
> +hdr32->e_shstrndx = cpu_to_dump16(s, s->shdr_num - 1);
> +}
>  }
>
>  static void prepare_elf_sections(DumpState *s, Error **errp)
> @@ -405,11 +455,18 @@ static void write_elf_sections(DumpState *s, Error 
> **errp)
>  {
>  int ret;
>
> -/* Write section zero */
> +/* Write section zero and arch sections */
>  ret = fd_write_vmcore(s->elf_section_data, s->elf_section_data_size, s);
>  if (ret < 0) {
>  error_setg_errno(errp, -ret, "dump: failed to write section data");
>  }
> +
> +/* Write string table data */
> +ret = fd_write_vmcore(s->string_table_buf->data,
> +  s->string_table_buf->len, s);
> +if (ret < 0) {
> +error_setg_errno(errp, -ret, "dump: failed to write string table 
> data");
> +}
>  }
>
>  static void write_data(DumpState *s, void *buf, int length, Error **errp)
> @@ -592,6 +649,9 @@ static void dump_begin(DumpState *s, Error **errp)
>   *   --
>   *   |  memory |
>   *   --
> + *   |  sectn data |
> + *   --
> +
>   *
>   * we only know where the memory is saved after we write elf note into
>   * vmcore.
> @@ -677,6 +737,7 @@ static void create_vmcore(DumpState *s, Error **errp)
>  return;
>  }
>
> +/* Iterate over memory and dump it to file */
>  dump_iterate(s, errp);
>  if (*errp) {
>  return;
> @@ -1659,6 +1720,13 @@ static void dump_init(DumpState *s, int fd, bool 
> has_format,
>  s->has_filter = has_filter;
>  s->begin = begin;
>  s->length = length;
> +/* First index is 0, it's the special null name */
> +s->string_table_buf = g_array_new(FALSE, TRUE, 1);
> +/*
> + * Allocate the null name, due to the 

Re: [PATCH v2] memory: prevent dma-reentracy issues

2022-07-13 Thread Alexander Bulekov
On 220712 1034, Stefan Hajnoczi wrote:
> On Tue, Jun 21, 2022 at 11:53:06AM -0400, Alexander Bulekov wrote:
> > On 220621 1630, Peter Maydell wrote:
> > > On Thu, 9 Jun 2022 at 14:59, Alexander Bulekov  wrote:
> > > > diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
> > > > index 44dacfa224..ab1ad0f7a8 100644
> > > > --- a/include/hw/pci/pci.h
> > > > +++ b/include/hw/pci/pci.h
> > > > @@ -834,8 +834,17 @@ static inline MemTxResult pci_dma_rw(PCIDevice 
> > > > *dev, dma_addr_t addr,
> > > >   void *buf, dma_addr_t len,
> > > >   DMADirection dir, MemTxAttrs 
> > > > attrs)
> > > >  {
> > > > -return dma_memory_rw(pci_get_address_space(dev), addr, buf, len,
> > > > - dir, attrs);
> > > > +bool prior_engaged_state;
> > > > +MemTxResult result;
> > > > +
> > > > +prior_engaged_state = dev->qdev.engaged_in_io;
> > > > +
> > > > +dev->qdev.engaged_in_io = true;
> > > > +result = dma_memory_rw(pci_get_address_space(dev), addr, buf, len,
> > > > +   dir, attrs);
> > > > +dev->qdev.engaged_in_io = prior_engaged_state;
> > > > +
> > > > +return result;
> > > 
> > > Why do we need to do something in this pci-specific function ?
> > > I was expecting this to only need changes at the generic-to-all-devices
> > > level.
> > 
> > Both of these handle the BH->DMA->MMIO case. Unlike MMIO, I don't think
> > there is any neat way to set the engaged_in_io flag as we enter a BH. So
> > instead, we try to set it when a device initiates DMA.
> > 
> > The pci function lets us do that since we get a glimpse of the dev/qdev
> > (unlike the dma_memory_...  functions).
> ...
> > > > @@ -302,6 +310,10 @@ static MemTxResult dma_buf_rw(void *buf, 
> > > > dma_addr_t len, dma_addr_t *residual,
> > > >  xresidual -= xfer;
> > > >  }
> > > >
> > > > +if (dev) {
> > > > +dev->engaged_in_io = prior_engaged_state;
> > > > +}
> > > 
> > > Not all DMA goes through dma_buf_rw() -- why does it need changes?
> > 
> > This one has the same goal, but accesses the qdev through sg, instead of
> > PCI.
> 
> Should dma_*() APIs take a reentrancy guard argument so that all DMA
> accesses are systematically covered?

That seems like it would be the best option, though it carries the cost
of needing to tweak a lot of code in hw/. Maybe some refactoring tool
could help with this.
-Alex

> 
>   /* Define this in the memory API */
>   typedef struct {
>   bool engaged_in_io;
>   } MemReentrancyGuard;
> 
>   /* Embed MemReentrancyGuard in DeviceState */
>   ...
> 
>   /* Require it in dma_*() APIs */
>   static inline MemTxResult dma_memory_rw(AddressSpace *as, dma_addr_t addr,
>   void *buf, dma_addr_t len,
>   DMADirection dir, MemTxAttrs attrs,
> MemReentrancyGuard *guard);
> 
>   /* Call dma_*() APIs like this... */
>   static inline MemTxResult pci_dma_rw(PCIDevice *dev, dma_addr_t addr,
>void *buf, dma_addr_t len,
>DMADirection dir, MemTxAttrs attrs)
>   {
>   return dma_memory_rw(pci_get_address_space(dev), addr, buf, len,
>dir, attrs, &dev->qdev.reentrancy_guard);
>   }
> 
> Stefan
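
To make that concrete: the consumer side could look roughly like the
sketch below. Only MemReentrancyGuard and its engaged_in_io field come
from the proposal above; the helper names and their placement are my own
invention, not anything in the patch.

/* Hypothetical helpers the MMIO dispatch path could call. */
static bool mem_reentrancy_guard_enter(MemReentrancyGuard *guard)
{
    if (guard->engaged_in_io) {
        /* Nested access from the same device (e.g. DMA-to-MMIO): reject. */
        return false;
    }
    guard->engaged_in_io = true;
    return true;
}

static void mem_reentrancy_guard_leave(MemReentrancyGuard *guard)
{
    guard->engaged_in_io = false;
}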





Re: [PATCH v2 04/11] dump: Reorder struct DumpState

2022-07-13 Thread Marc-André Lureau
On Wed, Jul 13, 2022 at 5:07 PM Janosch Frank  wrote:
>
> Let's move ELF related members into one block and guest memory related
> ones into another to improve readability.
>
> Signed-off-by: Janosch Frank 
> Reviewed-by: Richard Henderson 

Reviewed-by: Marc-André Lureau 

> ---
>  include/sysemu/dump.h | 17 ++---
>  1 file changed, 10 insertions(+), 7 deletions(-)
>
> diff --git a/include/sysemu/dump.h b/include/sysemu/dump.h
> index bd49532232..8379e29ef6 100644
> --- a/include/sysemu/dump.h
> +++ b/include/sysemu/dump.h
> @@ -154,15 +154,8 @@ typedef struct DumpState {
>  GuestPhysBlockList guest_phys_blocks;
>  ArchDumpInfo dump_info;
>  MemoryMappingList list;
> -uint32_t phdr_num;
> -uint32_t shdr_num;
>  bool resume;
>  bool detached;
> -ssize_t note_size;
> -hwaddr shdr_offset;
> -hwaddr phdr_offset;
> -hwaddr section_offset;
> -hwaddr note_offset;
>  hwaddr memory_offset;
>  int fd;
>
> @@ -171,6 +164,16 @@ typedef struct DumpState {
>  int64_t begin; /* Start address of the chunk we want to dump 
> */
>  int64_t length;/* Length of the dump we want to dump */
>
> +/* Elf dump related data */
> +uint32_t phdr_num;
> +uint32_t shdr_num;
> +uint32_t sh_info;
> +ssize_t note_size;
> +hwaddr shdr_offset;
> +hwaddr phdr_offset;
> +hwaddr note_offset;
> +hwaddr section_offset;
> +
>  void *elf_header;
>  void *elf_section_hdrs;
>  uint64_t elf_section_data_size;
> --
> 2.34.1
>




Re: [PATCH 3/3] tests/tcg/s390x: test signed vfmin/vfmax

2022-07-13 Thread Richard Henderson

On 7/12/22 18:02, Ilya Leoshkevich wrote:

This works, of course.  It could be simpler using EXECUTE, to store
just the one
instruction and not worry about an executable mapped page, but I
guess it doesn't matter.


I thought about this too, but EX/EXRL operate only on the second byte,
and I need to modify bytes 3-5 here.


I didn't mean modify the instruction via EX, but something like

  static char minmax[6] __attribute__((aligned(2)))
= { xx, yy, zz, 0, 0, 0 };

  minmax[3] = m6 ...
  minmax[4] = ...
  minmax[5] = op;

  asm("vl %%v25,0(%1)\n"
  "vl %%v26,0(%2)\n"
  "ex 0,0(%3)\n"
  "vst %%v24,0(%0)"
  : : "a"(v1), "a"(v2), "a"(v3), "a"(minmax)
  : "memory", "v24", "v25", "v26);


r~



Re: [PATCH v3] multifd: Copy pages before compressing them with zlib

2022-07-13 Thread Dr. David Alan Gilbert
* Ilya Leoshkevich (i...@linux.ibm.com) wrote:
> zlib_send_prepare() compresses pages of a running VM. zlib does not
> make any thread-safety guarantees with respect to changing deflate()
> input concurrently with deflate() [1].
> 
> One can observe problems due to this with the IBM zEnterprise Data
> Compression accelerator capable zlib [2]. When the hardware
> acceleration is enabled, migration/multifd/tcp/plain/zlib test fails
> intermittently [3] due to sliding window corruption. The accelerator's
> architecture explicitly discourages concurrent accesses [4]:
> 
> Page 26-57, "Other Conditions":
> 
> As observed by this CPU, other CPUs, and channel
> programs, references to the parameter block, first,
> second, and third operands may be multiple-access
> references, accesses to these storage locations are
> not necessarily block-concurrent, and the sequence
> of these accesses or references is undefined.
> 
> Mark Adler pointed out that vanilla zlib performs double fetches under
> certain circumstances as well [5], therefore we need to copy data
> before passing it to deflate().
> 
> [1] https://zlib.net/manual.html
> [2] https://github.com/madler/zlib/pull/410
> [3] https://lists.nongnu.org/archive/html/qemu-devel/2022-03/msg03988.html
> [4] http://publibfp.dhe.ibm.com/epubs/pdf/a227832c.pdf
> [5] https://lists.gnu.org/archive/html/qemu-devel/2022-07/msg00889.html
> 
> Signed-off-by: Ilya Leoshkevich 

Thanks!


Reviewed-by: Dr. David Alan Gilbert 

> ---
> 
> v1: https://lists.gnu.org/archive/html/qemu-devel/2022-03/msg06841.html
> v1 -> v2: Rebase, mention Mark Adler's reply in the commit message.
> 
> v2: https://lists.gnu.org/archive/html/qemu-devel/2022-07/msg00627.html
> v2 -> v3: Get rid of pointer maths (David).
>   Use a more relevant link to Mark Adler's comment (Peter).
> 
>  migration/multifd-zlib.c | 38 ++
>  1 file changed, 30 insertions(+), 8 deletions(-)
> 
> diff --git a/migration/multifd-zlib.c b/migration/multifd-zlib.c
> index 3a7ae44485..18213a9513 100644
> --- a/migration/multifd-zlib.c
> +++ b/migration/multifd-zlib.c
> @@ -27,6 +27,8 @@ struct zlib_data {
>  uint8_t *zbuff;
>  /* size of compressed buffer */
>  uint32_t zbuff_len;
> +/* uncompressed buffer of size qemu_target_page_size() */
> +uint8_t *buf;
>  };
>  
>  /* Multifd zlib compression */
> @@ -45,26 +47,38 @@ static int zlib_send_setup(MultiFDSendParams *p, Error 
> **errp)
>  {
>  struct zlib_data *z = g_new0(struct zlib_data, 1);
>  z_stream *zs = &z->zs;
> +const char *err_msg;
>  
>  zs->zalloc = Z_NULL;
>  zs->zfree = Z_NULL;
>  zs->opaque = Z_NULL;
>  if (deflateInit(zs, migrate_multifd_zlib_level()) != Z_OK) {
> -g_free(z);
> -error_setg(errp, "multifd %u: deflate init failed", p->id);
> -return -1;
> +err_msg = "deflate init failed";
> +goto err_free_z;
>  }
>  /* This is the maxium size of the compressed buffer */
>  z->zbuff_len = compressBound(MULTIFD_PACKET_SIZE);
>  z->zbuff = g_try_malloc(z->zbuff_len);
>  if (!z->zbuff) {
> -deflateEnd(&z->zs);
> -g_free(z);
> -error_setg(errp, "multifd %u: out of memory for zbuff", p->id);
> -return -1;
> +err_msg = "out of memory for zbuff";
> +goto err_deflate_end;
> +}
> +z->buf = g_try_malloc(qemu_target_page_size());
> +if (!z->buf) {
> +err_msg = "out of memory for buf";
> +goto err_free_zbuff;
>  }
>  p->data = z;
>  return 0;
> +
> +err_free_zbuff:
> +g_free(z->zbuff);
> +err_deflate_end:
> +deflateEnd(&z->zs);
> +err_free_z:
> +g_free(z);
> +error_setg(errp, "multifd %u: %s", p->id, err_msg);
> +return -1;
>  }
>  
>  /**
> @@ -82,6 +96,8 @@ static void zlib_send_cleanup(MultiFDSendParams *p, Error 
> **errp)
>  deflateEnd(&z->zs);
>  g_free(z->zbuff);
>  z->zbuff = NULL;
> +g_free(z->buf);
> +z->buf = NULL;
>  g_free(p->data);
>  p->data = NULL;
>  }
> @@ -114,8 +130,14 @@ static int zlib_send_prepare(MultiFDSendParams *p, Error 
> **errp)
>  flush = Z_SYNC_FLUSH;
>  }
>  
> +/*
> + * Since the VM might be running, the page may be changing 
> concurrently
> + * with compression. zlib does not guarantee that this is safe,
> + * therefore copy the page before calling deflate().
> + */
> +memcpy(z->buf, p->pages->block->host + p->normal[i], page_size);
>  zs->avail_in = page_size;
> -zs->next_in = p->pages->block->host + p->normal[i];
> +zs->next_in = z->buf;
>  
>  zs->avail_out = available;
>  zs->next_out = z->zbuff + out_size;
> -- 
> 2.35.3
> 
-- 
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK




Re: [PATCH v2 01/11] dump: Cleanup memblock usage

2022-07-13 Thread Marc-André Lureau
Hi

On Wed, Jul 13, 2022 at 7:30 PM Janosch Frank  wrote:
>
> On 7/13/22 17:09, Marc-André Lureau wrote:
> > Hi
> >
> > On Wed, Jul 13, 2022 at 5:07 PM Janosch Frank  wrote:
> >>
> >> The iteration over the memblocks is hard to understand so it's about
> >> time to clean it up.
> >>
> >> struct DumpState's next_block and start members can and should be
> >> local variables within the iterator.
> >>
> >> Instead of manually grabbing the next memblock we can use
> >> QTAILQ_FOREACH to iterate over all memblocks.
> >>
> >> The begin and length fields in the DumpState have been left untouched
> >> since the qmp arguments share their names.
> >>
> >> Signed-off-by: Janosch Frank 
> >
> > After this patch:
> > ./qemu-system-x86_64 -monitor stdio -S
> > (qemu) dump-guest-memory foo
> > Error: dump: failed to save memory: Bad address
>
> If you have more ways to check for dump errors then please send them to
> me. I'm aware that this might not have been a 100% conversion and I'm a
> bit terrified about the fact that this will affect all architectures.

Same feeling here. Maybe it's about time to write real dump tests!
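
Even a bare-bones qtest would be a start. A sketch of what I have in
mind -- untested, and the test path, the fixed /tmp output file and the
minimal arguments are all placeholders:

#include "qemu/osdep.h"
#include "libqtest.h"
#include "qapi/qmp/qdict.h"

static void test_dump_guest_memory(void)
{
    QTestState *qts = qtest_init("-m 128");
    QDict *resp;

    resp = qtest_qmp(qts,
                     "{ 'execute': 'dump-guest-memory',"
                     "  'arguments': { 'paging': false,"
                     "                 'protocol': 'file:/tmp/dump.elf' } }");
    /* A regression like the one above would show up as an 'error' reply. */
    g_assert(qdict_haskey(resp, "return"));
    qobject_unref(resp);
    qtest_quit(qts);
}

int main(int argc, char **argv)
{
    g_test_init(&argc, &argv, NULL);
    qtest_add_func("/dump/guest-memory", test_dump_guest_memory);
    return g_test_run();
}

Checking the ELF magic of the resulting file would be the obvious next
step.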

>
>
> Anyway, I'll have a look.
>
> [...]
>
> >> +static inline int64_t dump_get_memblock_start(GuestPhysBlock *block, 
> >> int64_t filter_area_start,
> >> +  int64_t filter_area_length)
> >> +{
> >> +if (filter_area_length) {
> >> +/*
> >> + * Check if block is within guest memory dump area. If not
> >> + * go to next one.
> >> + */
> >
> > Or rather "return -1 if the block is not within filter area"
>
> Sure
>
> >
> >> +if (block->target_start >= filter_area_start + filter_area_length 
> >> ||
> >> +block->target_end <= filter_area_start) {
> >> +return -1;
> >> +}
> >> +if (filter_area_start > block->target_start) {
> >> +return filter_area_start - block->target_start;
> >> +}
> >> +}
> >> +return block->target_start;
> >
> > This used to be 0. Changing that, I think the patch looks good.
> > Although it could perhaps be splitted to introduce the two functions.
>
> Yes but the 0 was used to indicate that we would have needed to continue
> iterating and the iteration is done via other means in this patch.
>
> Or am I missing something?

Well, you changed the way the loop used to work. It used to return 1/0
to indicate stop/continue and rely on s->start / s->next_block. Now
you return memblock_start.

>
> >
> >> +}
> >>   #endif
> >> --
> >> 2.34.1
> >>
> >
> >
>




Re: [PATCH v2 03/11] dump: Split write of section headers and data and add a prepare step

2022-07-13 Thread Marc-André Lureau
Hi

On Wed, Jul 13, 2022 at 5:07 PM Janosch Frank  wrote:
>
> By splitting the writing of the section headers and (future) section
> data we prepare for the addition of a string table section and
> architecture sections.
>
> Signed-off-by: Janosch Frank 
> ---
>  dump/dump.c   | 116 --
>  include/sysemu/dump.h |   4 ++
>  2 files changed, 94 insertions(+), 26 deletions(-)
>
> diff --git a/dump/dump.c b/dump/dump.c
> index 16d7474258..467d934bc1 100644
> --- a/dump/dump.c
> +++ b/dump/dump.c
> @@ -342,30 +342,73 @@ static void write_elf_phdr_note(DumpState *s, Error 
> **errp)
>  }
>  }
>
> -static void write_elf_section(DumpState *s, int type, Error **errp)
> +static size_t write_elf_section_hdr_zero(DumpState *s, void *buff)

Since the function no longer write, I'd suggest to rename it with
prepare_ prefix

>  {
> -Elf32_Shdr shdr32;
> -Elf64_Shdr shdr64;
> -int shdr_size;
> -void *shdr;
> -int ret;
> +if (dump_is_64bit(s)) {
> +Elf64_Shdr *shdr64 = buff;
>
> -if (type == 0) {
> -shdr_size = sizeof(Elf32_Shdr);
> -memset(&shdr32, 0, shdr_size);
> -shdr32.sh_info = cpu_to_dump32(s, s->phdr_num);
> -shdr = &shdr32;
> +memset(buff, 0, sizeof(Elf64_Shdr));

You can drop this

> +shdr64->sh_info = cpu_to_dump32(s, s->phdr_num);
>  } else {
> -shdr_size = sizeof(Elf64_Shdr);
> -memset(&shdr64, 0, shdr_size);
> -shdr64.sh_info = cpu_to_dump32(s, s->phdr_num);
> -shdr = &shdr64;
> +Elf32_Shdr *shdr32 = buff;
> +
> +memset(buff, 0, sizeof(Elf32_Shdr));

and this

> +shdr32->sh_info = cpu_to_dump32(s, s->phdr_num);
>  }
>
> -ret = fd_write_vmcore(shdr, shdr_size, s);
> +return dump_is_64bit(s) ? sizeof(Elf64_Shdr) : sizeof(Elf32_Shdr);
> +}
> +
> +static void prepare_elf_section_hdrs(DumpState *s)
> +{
> +uint8_t *buff_hdr;
> +size_t len, sizeof_shdr;
> +
> +/*
> + * Section ordering:
> + * - HDR zero (if needed)
> + */
> +sizeof_shdr = dump_is_64bit(s) ? sizeof(Elf64_Shdr) : sizeof(Elf32_Shdr);
> +len = sizeof_shdr * s->shdr_num;
> +s->elf_section_hdrs = g_malloc0(len);

since you alloc0 here

> +buff_hdr = s->elf_section_hdrs;
> +
> +/* Write special section first */
> +if (s->phdr_num == PN_XNUM) {
> +write_elf_section_hdr_zero(s, buff_hdr);

Eventually, drop buff_hdr, and pass only "s" as argument

+ Indentation is off

> +}
> +}
> +
> +static void prepare_elf_sections(DumpState *s, Error **errp)
> +{
> +if (!s->shdr_num) {
> +return;
> +}
> +
> +prepare_elf_section_hdrs(s);
> +}
> +
> +static void write_elf_section_headers(DumpState *s, Error **errp)
> +{
> +size_t sizeof_shdr;
> +int ret;
> +
> +sizeof_shdr = dump_is_64bit(s) ? sizeof(Elf64_Shdr) : sizeof(Elf32_Shdr);
> +
> +ret = fd_write_vmcore(s->elf_section_hdrs, s->shdr_num * sizeof_shdr, s);
>  if (ret < 0) {
> -error_setg_errno(errp, -ret,
> - "dump: failed to write section header table");
> +error_setg_errno(errp, -ret, "dump: failed to write section data");

nit: data->header


> +}
> +}
> +
> +static void write_elf_sections(DumpState *s, Error **errp)
> +{
> +int ret;
> +
> +/* Write section zero */
> +ret = fd_write_vmcore(s->elf_section_data, s->elf_section_data_size, s);
> +if (ret < 0) {
> +error_setg_errno(errp, -ret, "dump: failed to write section data");
>  }
>  }
>
> @@ -557,12 +600,22 @@ static void dump_begin(DumpState *s, Error **errp)
>  /* Write elf header to buffer */
>  prepare_elf_header(s);
>
> +prepare_elf_sections(s, errp);
> +if (*errp) {
> +return;
> +}
> +
>  /* Start to write stuff into files*/
>  write_elf_header(s, errp);
>  if (*errp) {
>  return;
>  }
>
> +write_elf_section_headers(s, errp);

Why do you reorder the sections? Could you explain in the commit
message why? Is this format compliant? and update the comment
above? thanks

> +if (*errp) {
> +return;
> +}
> +
>  /* write PT_NOTE to vmcore */
>  write_elf_phdr_note(s, errp);
>  if (*errp) {
> @@ -575,14 +628,6 @@ static void dump_begin(DumpState *s, Error **errp)
>  return;
>  }
>
> -/* write section to vmcore */
> -if (s->shdr_num) {
> -write_elf_section(s, 1, errp);
> -if (*errp) {
> -return;
> -}
> -}
> -
>  /* write notes to vmcore */
>  write_elf_notes(s, errp);
>  }
> @@ -610,6 +655,19 @@ static void dump_iterate(DumpState *s, Error **errp)
>  }
>  }
>
> +static void dump_end(DumpState *s, Error **errp)
> +{
> +ERRP_GUARD();
> +
> +if (!s->elf_section_data_size) {
> +return;
> +}
> +s->elf_section_data = g_malloc0(s->elf_section_data_size);
> +
> +/* write sections to vmcore */
> +write_elf_sections(s, errp);
> +}
> +
>  

Re: [PATCH v2 01/11] dump: Cleanup memblock usage

2022-07-13 Thread Janosch Frank

On 7/13/22 17:09, Marc-André Lureau wrote:

Hi

On Wed, Jul 13, 2022 at 5:07 PM Janosch Frank  wrote:


The iteration over the memblocks is hard to understand so it's about
time to clean it up.

struct DumpState's next_block and start members can and should be
local variables within the iterator.

Instead of manually grabbing the next memblock we can use
QTAILQ_FOREACH to iterate over all memblocks.

The begin and length fields in the DumpState have been left untouched
since the qmp arguments share their names.

Signed-off-by: Janosch Frank 


After this patch:
./qemu-system-x86_64 -monitor stdio -S
(qemu) dump-guest-memory foo
Error: dump: failed to save memory: Bad address


If you have more ways to check for dump errors then please send them to 
me. I'm aware that this might not have been a 100% conversion and I'm a 
bit terrified about the fact that this will affect all architectures.



Anyway, I'll have a look.

[...]


+static inline int64_t dump_get_memblock_start(GuestPhysBlock *block, int64_t 
filter_area_start,
+  int64_t filter_area_length)
+{
+if (filter_area_length) {
+/*
+ * Check if block is within guest memory dump area. If not
+ * go to next one.
+ */


Or rather "return -1 if the block is not within filter area"


Sure




+if (block->target_start >= filter_area_start + filter_area_length ||
+block->target_end <= filter_area_start) {
+return -1;
+}
+if (filter_area_start > block->target_start) {
+return filter_area_start - block->target_start;
+}
+}
+return block->target_start;


This used to be 0. Changing that, I think the patch looks good.
Although it could perhaps be splitted to introduce the two functions.


Yes but the 0 was used to indicate that we would have needed to continue
iterating and the iteration is done via other means in this patch.


Or am I missing something?




+}
  #endif
--
2.34.1








Re: [PATCH v2 02/11] dump: Allocate header

2022-07-13 Thread Marc-André Lureau
On Wed, Jul 13, 2022 at 5:07 PM Janosch Frank  wrote:
>
> Allocating the header lets us write it at a later time and hence also
> allows us to change section and segment table offsets until we
> finally write it.
>
> Signed-off-by: Janosch Frank 

Reviewed-by: Marc-André Lureau 


> ---
>  dump/dump.c   | 127 +-
>  include/sysemu/dump.h |   1 +
>  2 files changed, 64 insertions(+), 64 deletions(-)
>
> diff --git a/dump/dump.c b/dump/dump.c
> index 6feba3cbfa..16d7474258 100644
> --- a/dump/dump.c
> +++ b/dump/dump.c
> @@ -98,6 +98,7 @@ static int dump_cleanup(DumpState *s)
>  memory_mapping_list_free(&s->list);
>  close(s->fd);
>  g_free(s->guest_note);
> +g_free(s->elf_header);
>  s->guest_note = NULL;
>  if (s->resume) {
>  if (s->detached) {
> @@ -126,73 +127,49 @@ static int fd_write_vmcore(const void *buf, size_t 
> size, void *opaque)
>  return 0;
>  }
>
> -static void write_elf64_header(DumpState *s, Error **errp)
> +static void prepare_elf64_header(DumpState *s)
>  {
> -/*
> - * phnum in the elf header is 16 bit, if we have more segments we
> - * set phnum to PN_XNUM and write the real number of segments to a
> - * special section.
> - */
> -uint16_t phnum = MIN(s->phdr_num, PN_XNUM);
> -Elf64_Ehdr elf_header;
> -int ret;
> +uint16_t phnum = s->phdr_num >= PN_XNUM ? PN_XNUM : s->phdr_num;
> +Elf64_Ehdr *elf_header = s->elf_header;
>
> -memset(&elf_header, 0, sizeof(Elf64_Ehdr));
> -memcpy(&elf_header, ELFMAG, SELFMAG);
> -elf_header.e_ident[EI_CLASS] = ELFCLASS64;
> -elf_header.e_ident[EI_DATA] = s->dump_info.d_endian;
> -elf_header.e_ident[EI_VERSION] = EV_CURRENT;
> -elf_header.e_type = cpu_to_dump16(s, ET_CORE);
> -elf_header.e_machine = cpu_to_dump16(s, s->dump_info.d_machine);
> -elf_header.e_version = cpu_to_dump32(s, EV_CURRENT);
> -elf_header.e_ehsize = cpu_to_dump16(s, sizeof(elf_header));
> -elf_header.e_phoff = cpu_to_dump64(s, s->phdr_offset);
> -elf_header.e_phentsize = cpu_to_dump16(s, sizeof(Elf64_Phdr));
> -elf_header.e_phnum = cpu_to_dump16(s, phnum);
> +memcpy(elf_header, ELFMAG, SELFMAG);
> +elf_header->e_ident[EI_CLASS] = ELFCLASS64;
> +elf_header->e_ident[EI_DATA] = s->dump_info.d_endian;
> +elf_header->e_ident[EI_VERSION] = EV_CURRENT;
> +elf_header->e_type = cpu_to_dump16(s, ET_CORE);
> +elf_header->e_machine = cpu_to_dump16(s, s->dump_info.d_machine);
> +elf_header->e_version = cpu_to_dump32(s, EV_CURRENT);
> +elf_header->e_ehsize = cpu_to_dump16(s, sizeof(*elf_header));
> +elf_header->e_phoff = cpu_to_dump64(s, s->phdr_offset);
> +elf_header->e_phentsize = cpu_to_dump16(s, sizeof(Elf64_Phdr));
> +elf_header->e_phnum = cpu_to_dump16(s, phnum);
>  if (s->shdr_num) {
> -elf_header.e_shoff = cpu_to_dump64(s, s->shdr_offset);
> -elf_header.e_shentsize = cpu_to_dump16(s, sizeof(Elf64_Shdr));
> -elf_header.e_shnum = cpu_to_dump16(s, s->shdr_num);
> -}
> -
> -ret = fd_write_vmcore(&elf_header, sizeof(elf_header), s);
> -if (ret < 0) {
> -error_setg_errno(errp, -ret, "dump: failed to write elf header");
> +elf_header->e_shoff = cpu_to_dump64(s, s->shdr_offset);
> +elf_header->e_shentsize = cpu_to_dump16(s, sizeof(Elf64_Shdr));
> +elf_header->e_shnum = cpu_to_dump16(s, s->shdr_num);
>  }
>  }
>
> -static void write_elf32_header(DumpState *s, Error **errp)
> +static void prepare_elf32_header(DumpState *s)
>  {
> -/*
> - * phnum in the elf header is 16 bit, if we have more segments we
> - * set phnum to PN_XNUM and write the real number of segments to a
> - * special section.
> - */
> -uint16_t phnum = MIN(s->phdr_num, PN_XNUM);
> -Elf32_Ehdr elf_header;
> -int ret;
> +uint16_t phnum = s->phdr_num >= PN_XNUM ? PN_XNUM : s->phdr_num;
> +Elf32_Ehdr *elf_header = s->elf_header;
>
> -memset(&elf_header, 0, sizeof(Elf32_Ehdr));
> -memcpy(&elf_header, ELFMAG, SELFMAG);
> -elf_header.e_ident[EI_CLASS] = ELFCLASS32;
> -elf_header.e_ident[EI_DATA] = s->dump_info.d_endian;
> -elf_header.e_ident[EI_VERSION] = EV_CURRENT;
> -elf_header.e_type = cpu_to_dump16(s, ET_CORE);
> -elf_header.e_machine = cpu_to_dump16(s, s->dump_info.d_machine);
> -elf_header.e_version = cpu_to_dump32(s, EV_CURRENT);
> -elf_header.e_ehsize = cpu_to_dump16(s, sizeof(elf_header));
> -elf_header.e_phoff = cpu_to_dump32(s, s->phdr_offset);
> -elf_header.e_phentsize = cpu_to_dump16(s, sizeof(Elf32_Phdr));
> -elf_header.e_phnum = cpu_to_dump16(s, phnum);
> +memcpy(elf_header, ELFMAG, SELFMAG);
> +elf_header->e_ident[EI_CLASS] = ELFCLASS32;
> +elf_header->e_ident[EI_DATA] = s->dump_info.d_endian;
> +elf_header->e_ident[EI_VERSION] = EV_CURRENT;
> +elf_header->e_type = cpu_to_dump16(s, ET_CORE);
> +elf_header->e_machine = cpu_to_dump16(s, 

Re: [PATCH v2 01/11] dump: Cleanup memblock usage

2022-07-13 Thread Marc-André Lureau
Hi

On Wed, Jul 13, 2022 at 5:07 PM Janosch Frank  wrote:
>
> The iteration over the memblocks is hard to understand so it's about
> time to clean it up.
>
> struct DumpState's next_block and start members can and should be
> local variables within the iterator.
>
> Instead of manually grabbing the next memblock we can use
> QTAILQ_FOREACH to iterate over all memblocks.
>
> The begin and length fields in the DumpState have been left untouched
> since the qmp arguments share their names.
>
> Signed-off-by: Janosch Frank 

After this patch:
./qemu-system-x86_64 -monitor stdio -S
(qemu) dump-guest-memory foo
Error: dump: failed to save memory: Bad address


> ---
>  dump/dump.c   | 91 +++
>  include/sysemu/dump.h | 47 +++---
>  2 files changed, 65 insertions(+), 73 deletions(-)
>
> diff --git a/dump/dump.c b/dump/dump.c
> index 4d9658ffa2..6feba3cbfa 100644
> --- a/dump/dump.c
> +++ b/dump/dump.c
> @@ -591,56 +591,27 @@ static void dump_begin(DumpState *s, Error **errp)
>  write_elf_notes(s, errp);
>  }
>
> -static int get_next_block(DumpState *s, GuestPhysBlock *block)
> -{
> -while (1) {
> -block = QTAILQ_NEXT(block, next);
> -if (!block) {
> -/* no more block */
> -return 1;
> -}
> -
> -s->start = 0;
> -s->next_block = block;
> -if (s->has_filter) {
> -if (block->target_start >= s->begin + s->length ||
> -block->target_end <= s->begin) {
> -/* This block is out of the range */
> -continue;
> -}
> -
> -if (s->begin > block->target_start) {
> -s->start = s->begin - block->target_start;
> -}
> -}
> -
> -return 0;
> -}
> -}
> -
>  /* write all memory to vmcore */
>  static void dump_iterate(DumpState *s, Error **errp)
>  {
>  ERRP_GUARD();
>  GuestPhysBlock *block;
> -int64_t size;
> +int64_t memblock_size, memblock_start;
>
> -do {
> -block = s->next_block;
> -
> -size = block->target_end - block->target_start;
> -if (s->has_filter) {
> -size -= s->start;
> -if (s->begin + s->length < block->target_end) {
> -size -= block->target_end - (s->begin + s->length);
> -}
> +QTAILQ_FOREACH(block, &s->guest_phys_blocks.head, next) {
> +memblock_start = dump_get_memblock_start(block, s->begin, s->length);
> +if (memblock_start == -1) {
> +continue;
>  }
> -write_memory(s, block, s->start, size, errp);
> +
> +memblock_size = dump_get_memblock_size(block, s->begin, s->length);
> +
> +/* Write the memory to file */
> +write_memory(s, block, memblock_start, memblock_size, errp);
>  if (*errp) {
>  return;
>  }
> -
> -} while (!get_next_block(s, block));
> +}
>  }
>
>  static void create_vmcore(DumpState *s, Error **errp)
> @@ -1490,30 +1461,22 @@ static void create_kdump_vmcore(DumpState *s, Error 
> **errp)
>  }
>  }
>
> -static ram_addr_t get_start_block(DumpState *s)
> +static int validate_start_block(DumpState *s)
>  {
>  GuestPhysBlock *block;
>
>  if (!s->has_filter) {
> -s->next_block = QTAILQ_FIRST(&s->guest_phys_blocks.head);
>  return 0;
>  }
>
>  QTAILQ_FOREACH(block, &s->guest_phys_blocks.head, next) {
> +/* This block is out of the range */
>  if (block->target_start >= s->begin + s->length ||
>  block->target_end <= s->begin) {
> -/* This block is out of the range */
>  continue;
>  }
> -
> -s->next_block = block;
> -if (s->begin > block->target_start) {
> -s->start = s->begin - block->target_start;
> -} else {
> -s->start = 0;
> -}
> -return s->start;
> -}
> +return 0;
> +   }
>
>  return -1;
>  }
> @@ -1540,25 +1503,17 @@ bool qemu_system_dump_in_progress(void)
>  return (qatomic_read(>status) == DUMP_STATUS_ACTIVE);
>  }
>
> -/* calculate total size of memory to be dumped (taking filter into
> - * acoount.) */
> +/*
> + * calculate total size of memory to be dumped (taking filter into
> + * account.)

thanks for fixing the typo

> + */
>  static int64_t dump_calculate_size(DumpState *s)
>  {
>  GuestPhysBlock *block;
> -int64_t size = 0, total = 0, left = 0, right = 0;
> +int64_t total = 0;
>
>  QTAILQ_FOREACH(block, &s->guest_phys_blocks.head, next) {
> -if (s->has_filter) {
> -/* calculate the overlapped region. */
> -left = MAX(s->begin, block->target_start);
> -right = MIN(s->begin + s->length, block->target_end);
> -size = right - left;
> -size = size > 0 ? size : 0;
> -} else {
> -/* count the whole region in */
> -size = (block->target_end - 

Re: [PATCH v8 02/12] s390x/cpu_topology: CPU topology objects and structures

2022-07-13 Thread Pierre Morel




On 7/12/22 17:40, Janis Schoetterl-Glausch wrote:

On 6/20/22 16:03, Pierre Morel wrote:

We use new objects to have a dynamic administration of the CPU topology.
The highest level object in this implementation is the s390 book and
in this first implementation of CPU topology for S390 we have a single
book.
The book is built as a SYSBUS bridge during the CPU initialization.
Other objects, sockets and core will be built after the parsing
of the QEMU -smp argument.

Every object under this single book will be built dynamically
immediately after a CPU has been realized, if it is needed.
The CPUs will fill the sockets one after the other, according to the
number of cores per socket defined during the smp parsing.

Each CPU inside a socket will be represented by a bit in a 64bit
unsigned long. Set on plug and clear on unplug of a CPU.

For the S390 CPU topology, threads and cores are merged into
topology cores, and the number of topology cores is the number of
cores multiplied by the number of threads.
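
In other words, something along these lines (a sketch of my reading;
the names here are invented, not taken from the patch):

/* One 64-bit mask per cores object; bit = core's offset from origin. */
cores->mask |=   1ULL << (core_id - cores->origin);   /* CPU plug   */
cores->mask &= ~(1ULL << (core_id - cores->origin));  /* CPU unplug */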

Signed-off-by: Pierre Morel 
---
  hw/s390x/cpu-topology.c | 391 
  hw/s390x/meson.build|   1 +
  hw/s390x/s390-virtio-ccw.c  |   6 +
  include/hw/s390x/cpu-topology.h |  74 ++
  target/s390x/cpu.h  |  47 
  5 files changed, 519 insertions(+)
  create mode 100644 hw/s390x/cpu-topology.c
  create mode 100644 include/hw/s390x/cpu-topology.h

diff --git a/hw/s390x/cpu-topology.c b/hw/s390x/cpu-topology.c
new file mode 100644
index 00..0fd6f08084
--- /dev/null
+++ b/hw/s390x/cpu-topology.c
@@ -0,0 +1,391 @@
+/*
+ * CPU Topology
+ *
+ * Copyright 2022 IBM Corp.


Should be Copyright IBM Corp. 2022, and maybe even have a year range.


OK




+ * Author(s): Pierre Morel 
+
+ * This work is licensed under the terms of the GNU GPL, version 2 or (at
+ * your option) any later version. See the COPYING file in the top-level
+ * directory.
+ */
+
+#include "qemu/osdep.h"
+#include "qapi/error.h"
+#include "qemu/error-report.h"
+#include "hw/sysbus.h"
+#include "hw/s390x/cpu-topology.h"
+#include "hw/qdev-properties.h"
+#include "hw/boards.h"
+#include "qemu/typedefs.h"
+#include "target/s390x/cpu.h"
+#include "hw/s390x/s390-virtio-ccw.h"
+
+/*
+ * s390_create_cores:
+ * @ms: Machine state
+ * @socket: the socket on which to create the core set
+ * @origin: the origin offset of the first core of the set
+ * @errp: Error pointer
+ *
+ * returns a pointer to the created S390TopologyCores structure
+ *
+ * On error: return NULL
+ */
+static S390TopologyCores *s390_create_cores(MachineState *ms,
+S390TopologySocket *socket,
+int origin, Error **errp)
+{
+DeviceState *dev;
+S390TopologyCores *cores;
+
+if (socket->bus->num_children >= ms->smp.cores * ms->smp.threads) {
+error_setg(errp, "Unable to create more cores.");
+return NULL;
+}


Why/How can this happen?
The "location" of the CPU is a function of core_id and the same CPU should not 
be added twice.
If it's to enforce a limit on the smp arguments that should happen earlier in 
my opinion.
If it's necessary, you could also make the message more verbose and add ", maximum 
number reached".


should not happen.


+
+dev = qdev_new(TYPE_S390_TOPOLOGY_CORES);
+qdev_realize_and_unref(dev, socket->bus, &error_fatal);


As a result of this, the order of cores in the socket bus is the creation 
order, correct?
So newest first and not ordered by the origin (since we can hot plug CPUs), 
correct?


yes


+
+cores = S390_TOPOLOGY_CORES(dev);
+cores->origin = origin;


I must admit that I haven't fully grokked the qemu object model, yet, but I'd 
be more comfortable
if you unref'ed cores after you set the origin.
Does the socket bus own the object after you unref it? Does it then make sense 
to return cores
after unref'ing it?


AFAIU yes the bus owns it.


But then we don't support CPU unplug, so the object shouldn't just vanish.
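
For what it's worth, my understanding of the reference flow here,
annotated on the calls already in the patch:

dev = qdev_new(TYPE_S390_TOPOLOGY_CORES);  /* refcount 1, held by caller  */
qdev_realize_and_unref(dev, socket->bus,   /* bus takes its own reference */
                       &error_fatal);      /* then the caller's is dropped */
cores = S390_TOPOLOGY_CORES(dev);          /* borrowed pointer: stays valid */
cores->origin = origin;                    /* as long as the bus holds dev  */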


+socket->cnt += 1;


cnt++ to be consistent with create_socket below.


OK


+
+return cores;
+}
+
+/*
+ * s390_create_socket:
+ * @ms: Machine state
+ * @book: the book on which to create the socket
+ * @id: the socket id
+ * @errp: Error pointer
+ *
+ * returns a pointer to the created S390TopologySocket structure
+ *
+ * On error: return NULL
+ */
+static S390TopologySocket *s390_create_socket(MachineState *ms,
+  S390TopologyBook *book,
+  int id, Error **errp)
+{


Same questions/comments as above.


Same answer, should not happen.
I will remove the check.




+DeviceState *dev;
+S390TopologySocket *socket;
+
+if (book->bus->num_children >= ms->smp.sockets) {
+error_setg(errp, "Unable to create more sockets.");
+return NULL;
+}
+
+dev = qdev_new(TYPE_S390_TOPOLOGY_SOCKET);
+qdev_realize_and_unref(dev, book->bus, &error_fatal);

Re: [PATCH 2/4] Adding multi-interface support for multi-FD on destination side

2022-07-13 Thread Het Gala



On 17/06/22 12:10 am, Dr. David Alan Gilbert wrote:

* Het Gala (het.g...@nutanix.com) wrote:

i) Modified the format of qemu monitor command: ‘migrate-incoming’ by adding
a list; each element in the list opens listeners with a given number
of multiFD channels.

ii) Qemu starts with -incoming flag defer and -multi-fd-incoming defer to
 allow the modified 'migrate-incoming' command to be used.

iii) Format for -multi-fd-incoming flag as a comma separated string has been
  added with each substring containing listener socket address and number
  of sockets to open.

Suggested-by: Manish Mishra 
Signed-off-by: Het Gala 
---
  include/qapi/util.h   |   1 +
  migration/migration.c | 149 --
  migration/migration.h |   2 +
  migration/socket.c|  11 ++--
  migration/socket.h|   3 +-
  monitor/hmp-cmds.c|  42 
  qapi/migration.json   |  43 ++--
  qapi/qapi-util.c  |  27 
  qemu-options.hx   |  18 +
  softmmu/vl.c  |  30 -
  10 files changed, 265 insertions(+), 61 deletions(-)

diff --git a/include/qapi/util.h b/include/qapi/util.h
index 3041feb3d9..88fb2270db 100644
--- a/include/qapi/util.h
+++ b/include/qapi/util.h
@@ -29,6 +29,7 @@ bool qapi_bool_parse(const char *name, const char *value, 
bool *obj,
   Error **errp);
  
  int parse_qapi_name(const char *name, bool complete);

+struct strList *strList_from_string(const char *in, char c);
  
  /*

   * For any GenericList @list, insert @element at the front.
diff --git a/migration/migration.c b/migration/migration.c
index c408175aeb..9b0ad732e7 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -477,28 +477,39 @@ void migrate_add_address(SocketAddress *address)
QAPI_CLONE(SocketAddress, address));
  }
  
-static void qemu_start_incoming_migration(const char *uri, Error **errp)

+static void qemu_start_incoming_migration(const char *uri, uint8_t number,
+  int idx, Error **errp)
  {
  const char *p = NULL;
  
-migrate_protocol_allow_multi_channels(false); /* reset it anyway */

-qapi_event_send_migration(MIGRATION_STATUS_SETUP);
-if (strstart(uri, "tcp:", ) ||
-strstart(uri, "unix:", NULL) ||
-strstart(uri, "vsock:", NULL)) {
-migrate_protocol_allow_multi_channels(true);
-socket_start_incoming_migration(p ? p : uri, errp);
-#ifdef CONFIG_RDMA
-} else if (strstart(uri, "rdma:", )) {
-rdma_start_incoming_migration(p, errp);
-#endif
-} else if (strstart(uri, "exec:", )) {
-exec_start_incoming_migration(p, errp);
-} else if (strstart(uri, "fd:", )) {
-fd_start_incoming_migration(p, errp);
+if (number ==  0) {
+migrate_protocol_allow_multi_channels(false); /* reset it anyway */
+qapi_event_send_migration(MIGRATION_STATUS_SETUP);
+if (strstart(uri, "tcp:", ) ||
+strstart(uri, "unix:", NULL) ||
+strstart(uri, "vsock:", NULL)) {
+migrate_protocol_allow_multi_channels(true);
+#ifdef CONFIG_RDMA
+} else if (strstart(uri, "rdma:", )) {
+rdma_start_incoming_migration(p, errp);
+#endif
+} else if (strstart(uri, "exec:", )) {
+exec_start_incoming_migration(p, errp);
+} else if (strstart(uri, "fd:", )) {
+fd_start_incoming_migration(p, errp);
+} else {
+error_setg(errp, "unknown migration protocol: %s", uri);
+}
  } else {
-error_setg(errp, "unknown migration protocol: %s", uri);
+/* multi-FD parameters only support tcp network protocols */
+if (!strstart(uri, "tcp:", )) {
+error_setg(errp, "multifd-destination uri supports "
+"tcp protocol only");
+return;
+}
+store_multifd_migration_params(p ? p : uri, NULL, number, idx, errp);
  }
+socket_start_incoming_migration(p ? p : uri, number, errp);
  }
  
  static void process_incoming_migration_bh(void *opaque)

@@ -2140,7 +2151,17 @@ void migrate_del_blocker(Error *reason)
  migration_blockers = g_slist_remove(migration_blockers, reason);
  }
  
-void qmp_migrate_incoming(const char *uri, Error **errp)

+static inline int incoming_multi_fd_uri_parse(const char *str, char delim)
+{
+int count = 0;
+for (int i = 0; i < strlen(str); i++) {
+count += (str[i] == delim);
+}
+return count;
+}

That's a bit more general little helper function; I guess it could go in
util/ somewhere (something like qemu_string_count_delim ???)
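
Something like this, presumably (sketch only; the name is Dave's
suggestion, the signature is my guess):

size_t qemu_string_count_delim(const char *str, char delim)
{
    size_t count = 0;

    for (; *str; str++) {
        count += (*str == delim);
    }
    return count;
}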


Yes David, I will include this helper function in util folder, in the
same patch with all other helper functions I am trying to include.




+/* migrate_incoming comes from -incoming flag in qemu process */
+void migrate_incoming(const char *uri, Error **errp)
  {
  Error *local_err = NULL;
  static 

Re: [RFC v3 3/8] block: pass size to bdrv_unregister_buf()

2022-07-13 Thread Hanna Reitz

On 08.07.22 06:17, Stefan Hajnoczi wrote:

The only implementor of bdrv_register_buf() is block/nvme.c, where the
size is not needed when unregistering a buffer. This is because
util/vfio-helpers.c can look up mappings by address.

Future block drivers that implement bdrv_register_buf() may not be able
to do their job given only the buffer address. Add a size argument to
bdrv_unregister_buf().

Also document the assumptions about
bdrv_register_buf()/bdrv_unregister_buf() calls. The same 
values that were given to bdrv_register_buf() must be given to
bdrv_unregister_buf().
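
In other words, the contract is simply (illustrative):

bdrv_register_buf(bs, buf, len);
/* ... I/O using buf ... */
bdrv_unregister_buf(bs, buf, len);  /* same buf and len as at registration */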

gcc 11.2.1 emits a spurious warning that img_bench()'s buf_size local
variable might be uninitialized, so it's necessary to silence the
compiler.

Signed-off-by: Stefan Hajnoczi 
---
  include/block/block-global-state.h  | 5 -
  include/block/block_int-common.h| 2 +-
  include/sysemu/block-backend-global-state.h | 2 +-
  block/block-backend.c   | 4 ++--
  block/io.c  | 6 +++---
  block/nvme.c| 2 +-
  qemu-img.c  | 4 ++--
  7 files changed, 14 insertions(+), 11 deletions(-)


Reviewed-by: Hanna Reitz 




[PATCH] MAINTAINERS: Add myself as Guest Agent co-maintainer

2022-07-13 Thread Konstantin Kostiuk
Signed-off-by: Konstantin Kostiuk 
---
 MAINTAINERS | 1 +
 1 file changed, 1 insertion(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 450abd0252..22a4ffe0a2 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2880,6 +2880,7 @@ T: git https://repo.or.cz/qemu/armbru.git qapi-next
 
 QEMU Guest Agent
 M: Michael Roth 
+M: Konstantin Kostiuk 
 S: Maintained
 F: qga/
 F: docs/interop/qemu-ga.rst
-- 
2.25.1




[PATCH v2 09/11] s390x: Introduce PV query interface

2022-07-13 Thread Janosch Frank
Introduce an interface over which we can get information about UV data.

Signed-off-by: Janosch Frank 
---
 hw/s390x/pv.c  | 61 ++
 hw/s390x/s390-virtio-ccw.c |  5 
 include/hw/s390x/pv.h  | 10 +++
 3 files changed, 76 insertions(+)

diff --git a/hw/s390x/pv.c b/hw/s390x/pv.c
index 401b63d6cb..a5af4ddf46 100644
--- a/hw/s390x/pv.c
+++ b/hw/s390x/pv.c
@@ -20,6 +20,11 @@
 #include "exec/confidential-guest-support.h"
 #include "hw/s390x/ipl.h"
 #include "hw/s390x/pv.h"
+#include "target/s390x/kvm/kvm_s390x.h"
+
+static bool info_valid;
+static struct kvm_s390_pv_info_vm info_vm;
+static struct kvm_s390_pv_info_dump info_dump;
 
 static int __s390_pv_cmd(uint32_t cmd, const char *cmdname, void *data)
 {
@@ -56,6 +61,42 @@ static int __s390_pv_cmd(uint32_t cmd, const char *cmdname, 
void *data)
 }  \
 }
 
+int s390_pv_query_info(void)
+{
+struct kvm_s390_pv_info info = {
+.header.id = KVM_PV_INFO_VM,
+.header.len_max = sizeof(info.header) + sizeof(info.vm),
+};
+int rc;
+
+/* Info API's first user is dump so they are bundled */
+if (!kvm_s390_get_protected_dump()) {
+return 0;
+}
+
+rc = s390_pv_cmd(KVM_PV_INFO, &info);
+if (rc) {
+error_report("KVM PV INFO cmd %x failed: %s",
+ info.header.id, strerror(rc));
+return rc;
+}
+memcpy(&info_vm, &info.vm, sizeof(info.vm));
+
+info.header.id = KVM_PV_INFO_DUMP;
+info.header.len_max = sizeof(info.header) + sizeof(info.dump);
+rc = s390_pv_cmd(KVM_PV_INFO, &info);
+if (rc) {
+error_report("KVM PV INFO cmd %x failed: %s",
+ info.header.id, strerror(rc));
+return rc;
+}
+
+memcpy(&info_dump, &info.dump, sizeof(info.dump));
+info_valid = true;
+
+return rc;
+}
+
 int s390_pv_vm_enable(void)
 {
 return s390_pv_cmd(KVM_PV_ENABLE, NULL);
@@ -114,6 +155,26 @@ void s390_pv_inject_reset_error(CPUState *cs)
 env->regs[r1 + 1] = DIAG_308_RC_INVAL_FOR_PV;
 }
 
+uint64_t kvm_s390_pv_dmp_get_size_cpu(void)
+{
+return info_dump.dump_cpu_buffer_len;
+}
+
+uint64_t kvm_s390_pv_dmp_get_size_complete(void)
+{
+return info_dump.dump_config_finalize_len;
+}
+
+uint64_t kvm_s390_pv_dmp_get_size_mem(void)
+{
+return info_dump.dump_config_mem_buffer_per_1m;
+}
+
+bool kvm_s390_pv_info_basic_valid(void)
+{
+return info_valid;
+}
+
 #define TYPE_S390_PV_GUEST "s390-pv-guest"
 OBJECT_DECLARE_SIMPLE_TYPE(S390PVGuest, S390_PV_GUEST)
 
diff --git a/hw/s390x/s390-virtio-ccw.c b/hw/s390x/s390-virtio-ccw.c
index cc3097bfee..f9401e392b 100644
--- a/hw/s390x/s390-virtio-ccw.c
+++ b/hw/s390x/s390-virtio-ccw.c
@@ -366,6 +366,11 @@ static int s390_machine_protect(S390CcwMachineState *ms)
 
 ms->pv = true;
 
+rc = s390_pv_query_info();
+if (rc) {
+goto out_err;
+}
+
 /* Set SE header and unpack */
 rc = s390_ipl_prepare_pv_header();
 if (rc) {
diff --git a/include/hw/s390x/pv.h b/include/hw/s390x/pv.h
index 1f1f545bfc..6fa55bf70e 100644
--- a/include/hw/s390x/pv.h
+++ b/include/hw/s390x/pv.h
@@ -38,6 +38,7 @@ static inline bool s390_is_pv(void)
 return ccw->pv;
 }
 
+int s390_pv_query_info(void);
 int s390_pv_vm_enable(void);
 void s390_pv_vm_disable(void);
 int s390_pv_set_sec_parms(uint64_t origin, uint64_t length);
@@ -46,8 +47,13 @@ void s390_pv_prep_reset(void);
 int s390_pv_verify(void);
 void s390_pv_unshare(void);
 void s390_pv_inject_reset_error(CPUState *cs);
+uint64_t kvm_s390_pv_dmp_get_size_cpu(void);
+uint64_t kvm_s390_pv_dmp_get_size_mem(void);
+uint64_t kvm_s390_pv_dmp_get_size_complete(void);
+bool kvm_s390_pv_info_basic_valid(void);
 #else /* CONFIG_KVM */
 static inline bool s390_is_pv(void) { return false; }
+static inline int s390_pv_query_info(void) { return 0; }
 static inline int s390_pv_vm_enable(void) { return 0; }
 static inline void s390_pv_vm_disable(void) {}
 static inline int s390_pv_set_sec_parms(uint64_t origin, uint64_t length) { 
return 0; }
@@ -56,6 +62,10 @@ static inline void s390_pv_prep_reset(void) {}
 static inline int s390_pv_verify(void) { return 0; }
 static inline void s390_pv_unshare(void) {}
 static inline void s390_pv_inject_reset_error(CPUState *cs) {};
+static inline uint64_t kvm_s390_pv_dmp_get_size_cpu(void) { return 0; }
+static inline uint64_t kvm_s390_pv_dmp_get_size_mem(void) { return 0; }
+static inline uint64_t kvm_s390_pv_dmp_get_size_complete(void) { return 0; }
+static inline bool kvm_s390_pv_info_basic_valid(void) { return false; }
 #endif /* CONFIG_KVM */
 
 int s390_pv_kvm_init(ConfidentialGuestSupport *cgs, Error **errp);
-- 
2.34.1




[PATCH v2 00/11] dump: Add arch section and s390x PV dump

2022-07-13 Thread Janosch Frank
Previously this series was two separate series:
 * Arch section support
   Adds the possibility for arch code to add custom section data.

 * s390 PV dump support
   Adds PV dump data to the custom arch sections.

I've chosen to merge them so it's easier to understand why the arch
section support has been implemented the way it is.

Additionally I've added a cleanup patch beforehand which cleans up the
GuestPhysBlock usage.

v2:
* Added "dump: Cleanup memblock usage"
* Fixed whitespace problems and review comments
* Added missing *errp check in dump_end


Janosch Frank (11):
  dump: Cleanup memblock usage
  dump: Allocate header
  dump: Split write of section headers and data and add a prepare step
  dump: Reorder struct DumpState
  dump/dump: Add section string table support
  dump/dump: Add arch section support
  linux header sync
  s390x: Add protected dump cap
  s390x: Introduce PV query interface
  s390x: Add KVM PV dump interface
  s390x: pv: Add dump support

 dump/dump.c  | 443 ++-
 hw/s390x/pv.c| 112 +
 hw/s390x/s390-virtio-ccw.c   |   5 +
 include/elf.h|   1 +
 include/hw/s390x/pv.h|  18 ++
 include/sysemu/dump-arch.h   |  27 +++
 include/sysemu/dump.h|  70 +-
 linux-headers/linux/kvm.h|  55 +
 target/s390x/arch_dump.c | 248 +---
 target/s390x/kvm/kvm.c   |   7 +
 target/s390x/kvm/kvm_s390x.h |   1 +
 11 files changed, 780 insertions(+), 207 deletions(-)

-- 
2.34.1
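
For reviewers coming to this cold, the way the pieces appear to fit
together, distilled from patches 9-11 (my reading, not code from the
series):

/*
 * Rough PV dump sequence exposed by hw/s390x/pv.c and consumed by the
 * s390x arch section hooks; buffer sizes come from the KVM_PV_INFO
 * data cached by s390_pv_query_info():
 *
 *   kvm_s390_dump_init();                switch the UV into dump mode
 *   kvm_s390_dump_mem(gaddr, len, buf);  per chunk: encrypted guest memory
 *   kvm_s390_dump_cpu(cpu, buf);         per vcpu: encrypted cpu state
 *   kvm_s390_dump_finish(buf);           completion data needed to decrypt
 */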




[PATCH v2 08/11] s390x: Add protected dump cap

2022-07-13 Thread Janosch Frank
Add a protected dump capability for later feature checking.

Signed-off-by: Janosch Frank 
---
 target/s390x/kvm/kvm.c   | 7 +++
 target/s390x/kvm/kvm_s390x.h | 1 +
 2 files changed, 8 insertions(+)

diff --git a/target/s390x/kvm/kvm.c b/target/s390x/kvm/kvm.c
index 7bd8db0e7b..cbd8c91424 100644
--- a/target/s390x/kvm/kvm.c
+++ b/target/s390x/kvm/kvm.c
@@ -157,6 +157,7 @@ static int cap_ri;
 static int cap_hpage_1m;
 static int cap_vcpu_resets;
 static int cap_protected;
+static int cap_protected_dump;
 
 static bool mem_op_storage_key_support;
 
@@ -362,6 +363,7 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
 cap_s390_irq = kvm_check_extension(s, KVM_CAP_S390_INJECT_IRQ);
 cap_vcpu_resets = kvm_check_extension(s, KVM_CAP_S390_VCPU_RESETS);
 cap_protected = kvm_check_extension(s, KVM_CAP_S390_PROTECTED);
+cap_protected_dump = kvm_check_extension(s, KVM_CAP_S390_PROTECTED_DUMP);
 
 kvm_vm_enable_cap(s, KVM_CAP_S390_USER_SIGP, 0);
 kvm_vm_enable_cap(s, KVM_CAP_S390_VECTOR_REGISTERS, 0);
@@ -2043,6 +2045,11 @@ int kvm_s390_assign_subch_ioeventfd(EventNotifier 
*notifier, uint32_t sch,
 return kvm_vm_ioctl(kvm_state, KVM_IOEVENTFD, &kick);
 }
 
+int kvm_s390_get_protected_dump(void)
+{
+return cap_protected_dump;
+}
+
 int kvm_s390_get_ri(void)
 {
 return cap_ri;
diff --git a/target/s390x/kvm/kvm_s390x.h b/target/s390x/kvm/kvm_s390x.h
index 05a5e1e6f4..31a69f9ce2 100644
--- a/target/s390x/kvm/kvm_s390x.h
+++ b/target/s390x/kvm/kvm_s390x.h
@@ -26,6 +26,7 @@ int kvm_s390_set_cpu_state(S390CPU *cpu, uint8_t cpu_state);
 void kvm_s390_vcpu_interrupt_pre_save(S390CPU *cpu);
 int kvm_s390_vcpu_interrupt_post_load(S390CPU *cpu);
 int kvm_s390_get_hpage_1m(void);
+int kvm_s390_get_protected_dump(void);
 int kvm_s390_get_ri(void);
 int kvm_s390_get_clock(uint8_t *tod_high, uint64_t *tod_clock);
 int kvm_s390_get_clock_ext(uint8_t *tod_high, uint64_t *tod_clock);
-- 
2.34.1




[PATCH v2 04/11] dump: Reorder struct DumpState

2022-07-13 Thread Janosch Frank
Let's move ELF related members into one block and guest memory related
ones into another to improve readability.

Signed-off-by: Janosch Frank 
Reviewed-by: Richard Henderson 
---
 include/sysemu/dump.h | 17 ++---
 1 file changed, 10 insertions(+), 7 deletions(-)

diff --git a/include/sysemu/dump.h b/include/sysemu/dump.h
index bd49532232..8379e29ef6 100644
--- a/include/sysemu/dump.h
+++ b/include/sysemu/dump.h
@@ -154,15 +154,8 @@ typedef struct DumpState {
 GuestPhysBlockList guest_phys_blocks;
 ArchDumpInfo dump_info;
 MemoryMappingList list;
-uint32_t phdr_num;
-uint32_t shdr_num;
 bool resume;
 bool detached;
-ssize_t note_size;
-hwaddr shdr_offset;
-hwaddr phdr_offset;
-hwaddr section_offset;
-hwaddr note_offset;
 hwaddr memory_offset;
 int fd;
 
@@ -171,6 +164,16 @@ typedef struct DumpState {
 int64_t begin; /* Start address of the chunk we want to dump */
 int64_t length;/* Length of the dump we want to dump */
 
+/* Elf dump related data */
+uint32_t phdr_num;
+uint32_t shdr_num;
+uint32_t sh_info;
+ssize_t note_size;
+hwaddr shdr_offset;
+hwaddr phdr_offset;
+hwaddr note_offset;
+hwaddr section_offset;
+
 void *elf_header;
 void *elf_section_hdrs;
 uint64_t elf_section_data_size;
-- 
2.34.1




[PATCH v2 10/11] s390x: Add KVM PV dump interface

2022-07-13 Thread Janosch Frank
Let's add a few bits of code which hide the new KVM PV dump API from
us via new functions.

Signed-off-by: Janosch Frank 
---
 hw/s390x/pv.c | 51 +++
 include/hw/s390x/pv.h |  8 +++
 2 files changed, 59 insertions(+)

diff --git a/hw/s390x/pv.c b/hw/s390x/pv.c
index a5af4ddf46..48591c387d 100644
--- a/hw/s390x/pv.c
+++ b/hw/s390x/pv.c
@@ -175,6 +175,57 @@ bool kvm_s390_pv_info_basic_valid(void)
 return info_valid;
 }
 
+static int s390_pv_dump_cmd(uint64_t subcmd, uint64_t uaddr, uint64_t gaddr,
+uint64_t len)
+{
+struct kvm_s390_pv_dmp dmp = {
+.subcmd = subcmd,
+.buff_addr = uaddr,
+.buff_len = len,
+.gaddr = gaddr,
+};
+int ret;
+
+ret = s390_pv_cmd(KVM_PV_DUMP, (void *)&dmp);
+if (ret) {
+error_report("KVM DUMP command %ld failed", subcmd);
+}
+return ret;
+}
+
+int kvm_s390_dump_cpu(S390CPU *cpu, void *buff)
+{
+struct kvm_s390_pv_dmp dmp = {
+.subcmd = KVM_PV_DUMP_CPU,
+.buff_addr = (uint64_t)buff,
+.gaddr = 0,
+.buff_len = info_dump.dump_cpu_buffer_len,
+};
+struct kvm_pv_cmd pv = {
+.cmd = KVM_PV_DUMP,
+.data = (uint64_t)&dmp,
+};
+
+return kvm_vcpu_ioctl(CPU(cpu), KVM_S390_PV_CPU_COMMAND, &pv);
+}
+
+int kvm_s390_dump_init(void)
+{
+return s390_pv_dump_cmd(KVM_PV_DUMP_INIT, 0, 0, 0);
+}
+
+int kvm_s390_dump_mem(uint64_t gaddr, size_t len, void *dest)
+{
+return s390_pv_dump_cmd(KVM_PV_DUMP_CONFIG_STATE, (uint64_t)dest,
+gaddr, len);
+}
+
+int kvm_s390_dump_finish(void *buff)
+{
+return s390_pv_dump_cmd(KVM_PV_DUMP_COMPLETE, (uint64_t)buff, 0,
+info_dump.dump_config_finalize_len);
+}
+
 #define TYPE_S390_PV_GUEST "s390-pv-guest"
 OBJECT_DECLARE_SIMPLE_TYPE(S390PVGuest, S390_PV_GUEST)
 
diff --git a/include/hw/s390x/pv.h b/include/hw/s390x/pv.h
index 6fa55bf70e..f37021e189 100644
--- a/include/hw/s390x/pv.h
+++ b/include/hw/s390x/pv.h
@@ -51,6 +51,10 @@ uint64_t kvm_s390_pv_dmp_get_size_cpu(void);
 uint64_t kvm_s390_pv_dmp_get_size_mem(void);
 uint64_t kvm_s390_pv_dmp_get_size_complete(void);
 bool kvm_s390_pv_info_basic_valid(void);
+int kvm_s390_dump_init(void);
+int kvm_s390_dump_cpu(S390CPU *cpu, void *buff);
+int kvm_s390_dump_mem(uint64_t addr, size_t len, void *dest);
+int kvm_s390_dump_finish(void *buff);
 #else /* CONFIG_KVM */
 static inline bool s390_is_pv(void) { return false; }
 static inline int s390_pv_query_info(void) { return 0; }
@@ -66,6 +70,10 @@ static inline uint64_t kvm_s390_pv_dmp_get_size_cpu(void) { 
return 0; }
 static inline uint64_t kvm_s390_pv_dmp_get_size_mem(void) { return 0; }
 static inline uint64_t kvm_s390_pv_dmp_get_size_complete(void) { return 0; }
 static inline bool kvm_s390_pv_info_basic_valid(void) { return false; }
+static inline int kvm_s390_dump_init(void) { return 0; }
+static inline int kvm_s390_dump_cpu(S390CPU *cpu, void *buff) { return 0; }
+static inline int kvm_s390_dump_mem(uint64_t addr, size_t len, void *dest) { 
return 0; }
+static inline int kvm_s390_dump_finish(void *buff) { return 0; }
 #endif /* CONFIG_KVM */
 
 int s390_pv_kvm_init(ConfidentialGuestSupport *cgs, Error **errp);
-- 
2.34.1




[PATCH v2 07/11] linux header sync

2022-07-13 Thread Janosch Frank
Signed-off-by: Janosch Frank 
---
 linux-headers/linux/kvm.h | 55 +++
 1 file changed, 55 insertions(+)

diff --git a/linux-headers/linux/kvm.h b/linux-headers/linux/kvm.h
index 0d05d02ee4..ae5db2e44c 100644
--- a/linux-headers/linux/kvm.h
+++ b/linux-headers/linux/kvm.h
@@ -1150,6 +1150,7 @@ struct kvm_ppc_resize_hpt {
 #define KVM_CAP_DISABLE_QUIRKS2 213
 /* #define KVM_CAP_VM_TSC_CONTROL 214 */
 #define KVM_CAP_SYSTEM_EVENT_DATA 215
+#define KVM_CAP_S390_PROTECTED_DUMP 217
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
@@ -1651,6 +1652,55 @@ struct kvm_s390_pv_unp {
__u64 tweak;
 };
 
+enum pv_cmd_info_id {
+   KVM_PV_INFO_VM,
+   KVM_PV_INFO_DUMP,
+};
+
+struct kvm_s390_pv_info_dump {
+   __u64 dump_cpu_buffer_len;
+   __u64 dump_config_mem_buffer_per_1m;
+   __u64 dump_config_finalize_len;
+};
+
+struct kvm_s390_pv_info_vm {
+   __u64 inst_calls_list[4];
+   __u64 max_cpus;
+   __u64 max_guests;
+   __u64 max_guest_addr;
+   __u64 feature_indication;
+};
+
+struct kvm_s390_pv_info_header {
+   __u32 id;
+   __u32 len_max;
+   __u32 len_written;
+   __u32 reserved;
+};
+
+struct kvm_s390_pv_info {
+   struct kvm_s390_pv_info_header header;
+   union {
+   struct kvm_s390_pv_info_dump dump;
+   struct kvm_s390_pv_info_vm vm;
+   };
+};
+
+enum pv_cmd_dmp_id {
+KVM_PV_DUMP_INIT,
+KVM_PV_DUMP_CONFIG_STATE,
+KVM_PV_DUMP_COMPLETE,
+KVM_PV_DUMP_CPU,
+};
+
+struct kvm_s390_pv_dmp {
+__u64 subcmd;
+__u64 buff_addr;
+__u64 buff_len;
+__u64 gaddr;
+__u64 reserved[4];
+};
+
 enum pv_cmd_id {
KVM_PV_ENABLE,
KVM_PV_DISABLE,
@@ -1659,6 +1709,8 @@ enum pv_cmd_id {
KVM_PV_VERIFY,
KVM_PV_PREP_RESET,
KVM_PV_UNSHARE_ALL,
+KVM_PV_INFO,
+KVM_PV_DUMP,
 };
 
 struct kvm_pv_cmd {
@@ -1733,6 +1785,7 @@ struct kvm_xen_vcpu_attr {
 #define KVM_XEN_VCPU_ATTR_TYPE_RUNSTATE_DATA   0x4
 #define KVM_XEN_VCPU_ATTR_TYPE_RUNSTATE_ADJUST 0x5
 
+
 /* Secure Encrypted Virtualization command */
 enum sev_cmd_id {
/* Guest initialization commands */
@@ -2066,4 +2119,6 @@ struct kvm_stats_desc {
 /* Available with KVM_CAP_XSAVE2 */
 #define KVM_GET_XSAVE2   _IOR(KVMIO,  0xcf, struct kvm_xsave)
 
+#define KVM_S390_PV_CPU_COMMAND _IOWR(KVMIO, 0xd0, struct kvm_pv_cmd)
+
 #endif /* __LINUX_KVM_H */
-- 
2.34.1




[PATCH v2 05/11] dump/dump: Add section string table support

2022-07-13 Thread Janosch Frank
Time to add a bit more descriptiveness to the dumps.
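
For context, the standard ELF string-table layout this builds on
(general ELF semantics, nothing specific to this patch):

/*
 * A string table is one blob of NUL-terminated names; byte 0 must be an
 * empty name.  Section headers refer into it by byte offset:
 *
 *   offset 0: "\0"           the reserved null name
 *   offset 1: ".strtab\0"    hence sh_name = 1 for the table's own header
 *
 * e_shstrndx in the ELF header records which section header describes
 * the string table itself.
 */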

Signed-off-by: Janosch Frank 
Reviewed-by: Richard Henderson 
---
 dump/dump.c   | 106 --
 include/sysemu/dump.h |   1 +
 2 files changed, 94 insertions(+), 13 deletions(-)

diff --git a/dump/dump.c b/dump/dump.c
index 467d934bc1..31e2a85372 100644
--- a/dump/dump.c
+++ b/dump/dump.c
@@ -99,6 +99,7 @@ static int dump_cleanup(DumpState *s)
 close(s->fd);
 g_free(s->guest_note);
 g_free(s->elf_header);
+g_array_unref(s->string_table_buf);
 s->guest_note = NULL;
 if (s->resume) {
 if (s->detached) {
@@ -359,14 +360,47 @@ static size_t write_elf_section_hdr_zero(DumpState *s, 
void *buff)
 return dump_is_64bit(s) ? sizeof(Elf64_Shdr) : sizeof(Elf32_Shdr);
 }
 
+static void write_elf_section_hdr_string(DumpState *s, void *buff)
+{
+Elf32_Shdr shdr32;
+Elf64_Shdr shdr64;
+int shdr_size;
+void *shdr = buff;
+
+if (dump_is_64bit(s)) {
+shdr_size = sizeof(Elf64_Shdr);
+memset(&shdr64, 0, shdr_size);
+shdr64.sh_type = SHT_STRTAB;
+shdr64.sh_offset = s->section_offset + s->elf_section_data_size;
+shdr64.sh_name = s->string_table_buf->len;
+g_array_append_vals(s->string_table_buf, ".strtab", sizeof(".strtab"));
+shdr64.sh_size = s->string_table_buf->len;
+shdr = &shdr64;
+} else {
+shdr_size = sizeof(Elf32_Shdr);
+memset(&shdr32, 0, shdr_size);
+shdr32.sh_type = SHT_STRTAB;
+shdr32.sh_offset = s->section_offset + s->elf_section_data_size;
+shdr32.sh_name = s->string_table_buf->len;
+g_array_append_vals(s->string_table_buf, ".strtab", sizeof(".strtab"));
+shdr32.sh_size = s->string_table_buf->len;
+shdr = &shdr32;
+}
+
+memcpy(buff, shdr, shdr_size);
+}
+
 static void prepare_elf_section_hdrs(DumpState *s)
 {
 uint8_t *buff_hdr;
-size_t len, sizeof_shdr;
+size_t len, size = 0, sizeof_shdr;
+Elf64_Ehdr *hdr64 = s->elf_header;
+Elf32_Ehdr *hdr32 = s->elf_header;
 
 /*
  * Section ordering:
  * - HDR zero (if needed)
+ * - String table hdr
  */
 sizeof_shdr = dump_is_64bit(s) ? sizeof(Elf64_Shdr) : sizeof(Elf32_Shdr);
 len = sizeof_shdr * s->shdr_num;
@@ -377,6 +411,22 @@ static void prepare_elf_section_hdrs(DumpState *s)
 if (s->phdr_num == PN_XNUM) {
 write_elf_section_hdr_zero(s, buff_hdr);
 }
+buff_hdr += size;
+
+if (s->shdr_num < 2) {
+return;
+}
+
+/*
+ * String table needs to be last section since strings are added
+ * via arch_sections_write_hdr().
+ */
+write_elf_section_hdr_string(s, buff_hdr);
+if (dump_is_64bit(s)) {
+hdr64->e_shstrndx = cpu_to_dump16(s, s->shdr_num - 1);
+} else {
+hdr32->e_shstrndx = cpu_to_dump16(s, s->shdr_num - 1);
+}
 }
 
 static void prepare_elf_sections(DumpState *s, Error **errp)
@@ -405,11 +455,18 @@ static void write_elf_sections(DumpState *s, Error **errp)
 {
 int ret;
 
-/* Write section zero */
+/* Write section zero and arch sections */
 ret = fd_write_vmcore(s->elf_section_data, s->elf_section_data_size, s);
 if (ret < 0) {
 error_setg_errno(errp, -ret, "dump: failed to write section data");
 }
+
+/* Write string table data */
+ret = fd_write_vmcore(s->string_table_buf->data,
+  s->string_table_buf->len, s);
+if (ret < 0) {
+error_setg_errno(errp, -ret, "dump: failed to write string table 
data");
+}
 }
 
 static void write_data(DumpState *s, void *buf, int length, Error **errp)
@@ -592,6 +649,9 @@ static void dump_begin(DumpState *s, Error **errp)
  *   --
  *   |  memory |
  *   --
+ *   |  sectn data |
+ *   --
+
  *
  * we only know where the memory is saved after we write elf note into
  * vmcore.
@@ -677,6 +737,7 @@ static void create_vmcore(DumpState *s, Error **errp)
 return;
 }
 
+/* Iterate over memory and dump it to file */
 dump_iterate(s, errp);
 if (*errp) {
 return;
@@ -1659,6 +1720,13 @@ static void dump_init(DumpState *s, int fd, bool 
has_format,
 s->has_filter = has_filter;
 s->begin = begin;
 s->length = length;
+/* First index is 0, it's the special null name */
+s->string_table_buf = g_array_new(FALSE, TRUE, 1);
+/*
+ * Allocate the null name, due to the clearing option set to true
+ * it will be 0.
+ */
+g_array_set_size(s->string_table_buf, 1);
 
memory_mapping_list_init(&s->list);
 
@@ -1819,19 +1887,31 @@ static void dump_init(DumpState *s, int fd, bool has_format,
 }
 }
 
-if (dump_is_64bit(s)) {
-s->phdr_offset = sizeof(Elf64_Ehdr);
-s->shdr_offset = s->phdr_offset + sizeof(Elf64_Phdr) * s->phdr_num;
-s->note_offset = s->shdr_offset + sizeof(Elf64_Shdr) * s->shdr_num;
-

[PATCH v2 06/11] dump/dump: Add arch section support

2022-07-13 Thread Janosch Frank
Add hooks which architectures can use to add arbitrary data to custom
sections.

Signed-off-by: Janosch Frank 
---
 dump/dump.c| 21 ++---
 include/sysemu/dump-arch.h | 27 +++
 2 files changed, 45 insertions(+), 3 deletions(-)

diff --git a/dump/dump.c b/dump/dump.c
index 31e2a85372..02de00b6de 100644
--- a/dump/dump.c
+++ b/dump/dump.c
@@ -400,6 +400,7 @@ static void prepare_elf_section_hdrs(DumpState *s)
 /*
  * Section ordering:
  * - HDR zero (if needed)
+ * - Arch section hdrs
  * - String table hdr
  */
 sizeof_shdr = dump_is_64bit(s) ? sizeof(Elf64_Shdr) : sizeof(Elf32_Shdr);
@@ -417,6 +418,9 @@ static void prepare_elf_section_hdrs(DumpState *s)
 return;
 }
 
+size = dump_arch_sections_write_hdr(&s->dump_info, s, buff_hdr);
+buff_hdr += size;
+
 /*
  * String table needs to be last section since strings are added
  * via arch_sections_write_hdr().
@@ -567,14 +571,23 @@ static void get_offset_range(hwaddr phys_addr,
 }
 }
 
-static void write_elf_loads(DumpState *s, Error **errp)
+static void write_elf_phdr_loads(DumpState *s, Error **errp)
 {
 ERRP_GUARD();
 hwaddr offset, filesz;
 MemoryMapping *memory_mapping;
 uint32_t phdr_index = 1;
+hwaddr min = 0, max = 0;
 
QTAILQ_FOREACH(memory_mapping, &s->list.head, next) {
+if (memory_mapping->phys_addr < min) {
+min = memory_mapping->phys_addr;
+}
+if (memory_mapping->phys_addr + memory_mapping->length > max) {
+max = memory_mapping->phys_addr + memory_mapping->length;
+}
+
+
 get_offset_range(memory_mapping->phys_addr,
  memory_mapping->length,
s, &offset, &filesz);
@@ -682,8 +695,8 @@ static void dump_begin(DumpState *s, Error **errp)
 return;
 }
 
-/* write all PT_LOAD to vmcore */
-write_elf_loads(s, errp);
+/* write all PT_LOADs to vmcore */
+write_elf_phdr_loads(s, errp);
 if (*errp) {
 return;
 }
@@ -723,6 +736,7 @@ static void dump_end(DumpState *s, Error **errp)
 return;
 }
 s->elf_section_data = g_malloc0(s->elf_section_data_size);
+dump_arch_sections_write(&s->dump_info, s, s->elf_section_data);
 
 /* write sections to vmcore */
 write_elf_sections(s, errp);
@@ -1894,6 +1908,7 @@ static void dump_init(DumpState *s, int fd, bool has_format,
  * If phdr_num overflowed we have at least one section header
  * More sections/hdrs can be added by the architectures
  */
+dump_arch_sections_add(&s->dump_info, (void *)s);
 if (s->shdr_num > 1) {
 /* Reserve the string table */
 s->shdr_num += 1;
diff --git a/include/sysemu/dump-arch.h b/include/sysemu/dump-arch.h
index e25b02e990..de77908424 100644
--- a/include/sysemu/dump-arch.h
+++ b/include/sysemu/dump-arch.h
@@ -21,6 +21,9 @@ typedef struct ArchDumpInfo {
 uint32_t page_size;  /* The target's page size. If it's variable and
   * unknown, then this should be the maximum. */
 uint64_t phys_base;  /* The target's physmem base. */
+void (*arch_sections_add_fn)(void *opaque);
+uint64_t (*arch_sections_write_hdr_fn)(void *opaque, uint8_t *buff);
+void (*arch_sections_write_fn)(void *opaque, uint8_t *buff);
 } ArchDumpInfo;
 
 struct GuestPhysBlockList; /* memory_mapping.h */
@@ -28,4 +31,28 @@ int cpu_get_dump_info(ArchDumpInfo *info,
   const struct GuestPhysBlockList *guest_phys_blocks);
 ssize_t cpu_get_note_size(int class, int machine, int nr_cpus);
 
+static inline void dump_arch_sections_add(ArchDumpInfo *info, void *opaque)
+{
+if (info->arch_sections_add_fn) {
+info->arch_sections_add_fn(opaque);
+}
+}
+
+static inline uint64_t dump_arch_sections_write_hdr(ArchDumpInfo *info,
+void *opaque, uint8_t *buff)
+{
+if (info->arch_sections_write_hdr_fn) {
+return info->arch_sections_write_hdr_fn(opaque, buff);
+}
+return 0;
+}
+
+static inline void dump_arch_sections_write(ArchDumpInfo *info, void *opaque,
+uint8_t *buff)
+{
+if (info->arch_sections_write_fn) {
+info->arch_sections_write_fn(opaque, buff);
+}
+}
+
 #endif
-- 
2.34.1
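
As a usage sketch (hypothetical, not part of the series): an architecture that
wants one extra custom section would fill in the three callbacks from its
cpu_get_dump_info() implementation, roughly as below. The payload size, the
contents and the demo_* names are made up; only the ArchDumpInfo fields and
the DumpState bookkeeping come from the patches.

#include "qemu/osdep.h"
#include "elf.h"
#include "sysemu/dump.h"

#define DEMO_SECTION_SIZE 4096   /* made-up payload size */

static void demo_sections_add(void *opaque)
{
    DumpState *s = opaque;

    /* Reserve one section header and its payload in the dump layout. */
    s->shdr_num += 1;
    s->elf_section_data_size += DEMO_SECTION_SIZE;
}

static uint64_t demo_sections_write_hdr(void *opaque, uint8_t *buff)
{
    Elf64_Shdr *shdr = (Elf64_Shdr *)buff;

    /* Real code would also set sh_offset and an sh_name drawn from the
     * string table; omitted here for brevity. */
    shdr->sh_type = SHT_PROGBITS;
    shdr->sh_size = DEMO_SECTION_SIZE;
    return sizeof(*shdr);            /* header bytes consumed */
}

static void demo_sections_write(void *opaque, uint8_t *buff)
{
    memset(buff, 0, DEMO_SECTION_SIZE);   /* made-up section contents */
}

/* ...and in the architecture's cpu_get_dump_info():
 *   info->arch_sections_add_fn = demo_sections_add;
 *   info->arch_sections_write_hdr_fn = demo_sections_write_hdr;
 *   info->arch_sections_write_fn = demo_sections_write;
 */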




[PATCH v2 11/11] s390x: pv: Add dump support

2022-07-13 Thread Janosch Frank
Sometimes dumping a guest from the outside is the only way to get the
data that is needed. This can be the case if a dumping mechanism like
KDUMP hasn't been configured or data needs to be fetched at a specific
point. Dumping a protected guest from the outside without help from
fw/hw doesn't yield sufficient data to be useful. Hence we now
introduce PV dump support.

The PV dump support works by integrating the firmware into the dump
process. New Ultravisor calls are used to initiate the dump process,
dump cpu data, dump memory state and lastly complete the dump process.
The UV calls are exposed by KVM via the new KVM_PV_DUMP command and
its subcommands. The guest's data is fully encrypted and can only be
decrypted by the entity that owns the customer communication key for
the dumped guest. Also dumping needs to be allowed via a flag in the
SE header.

On the QEMU side of things we store the PV dump data in the newly
introduced architecture ELF sections (storage state and completion
data) and the cpu notes (for cpu dump data).

Users can use the zgetdump tool to convert the encrypted QEMU dump to an
unencrypted one.
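
For orientation, the dump sequence described above can be sketched as four
firmware-mediated phases. All function names below are placeholders; only the
phase ordering and kvm_s390_dump_cpu() (visible in the diff) come from the
patch.

#include <stddef.h>

void pv_dump_init(void);                       /* UV: enter dump state */
void pv_dump_cpu(int cpu_index, void *buf);    /* encrypted cpu dump data */
void pv_dump_mem_state(void *buf, size_t len); /* encrypted storage state */
void pv_dump_complete(void *buf);              /* completion data, leave dump state */

void pv_dump_flow(void *cpu_buf, void *mem_buf, size_t mem_len, void *done_buf)
{
    pv_dump_init();
    pv_dump_cpu(0, cpu_buf);             /* repeated per vcpu, into ELF notes */
    pv_dump_mem_state(mem_buf, mem_len); /* stored in the new arch sections */
    pv_dump_complete(done_buf);          /* stored as the completion section */
}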

Signed-off-by: Janosch Frank 
---
 include/elf.h|   1 +
 target/s390x/arch_dump.c | 248 ++-
 2 files changed, 219 insertions(+), 30 deletions(-)

diff --git a/include/elf.h b/include/elf.h
index 3a4bcb646a..58f76fd5b4 100644
--- a/include/elf.h
+++ b/include/elf.h
@@ -1649,6 +1649,7 @@ typedef struct elf64_shdr {
 #define NT_TASKSTRUCT  4
 #define NT_AUXV 6
 #define NT_PRXFPREG 0x46e62b7f  /* copied from gdb5.1/include/elf/common.h */
+#define NT_S390_PV_DATA 0x30e   /* s390 protvirt cpu dump data */
 #define NT_S390_GS_CB   0x30b   /* s390 guarded storage registers */
 #define NT_S390_VXRS_HIGH 0x30a /* s390 vector registers 16-31 */
 #define NT_S390_VXRS_LOW  0x309 /* s390 vector registers 0-15 (lower half) */
diff --git a/target/s390x/arch_dump.c b/target/s390x/arch_dump.c
index 08daf93ae1..e081aa9483 100644
--- a/target/s390x/arch_dump.c
+++ b/target/s390x/arch_dump.c
@@ -16,7 +16,8 @@
 #include "s390x-internal.h"
 #include "elf.h"
 #include "sysemu/dump.h"
-
+#include "hw/s390x/pv.h"
+#include "kvm/kvm_s390x.h"
 
 struct S390xUserRegsStruct {
 uint64_t psw[2];
@@ -76,9 +77,16 @@ typedef struct noteStruct {
 uint64_t todcmp;
 uint32_t todpreg;
 uint64_t ctrs[16];
+uint8_t dynamic[1];  /*
+  * Would be a flexible array member, if
+  * that was legal inside a union. Real
+  * size comes from PV info interface.
+  */
 } contents;
 } QEMU_PACKED Note;
 
+static bool pv_dump_initialized;
+
 static void s390x_write_elf64_prstatus(Note *note, S390CPU *cpu, int id)
 {
 int i;
@@ -177,52 +185,82 @@ static void s390x_write_elf64_prefix(Note *note, S390CPU *cpu, int id)
 note->contents.prefix = cpu_to_be32((uint32_t)(cpu->env.psa));
 }
 
+static void s390x_write_elf64_pv(Note *note, S390CPU *cpu, int id)
+{
+note->hdr.n_type = cpu_to_be32(NT_S390_PV_DATA);
+if (!pv_dump_initialized) {
+return;
+}
+kvm_s390_dump_cpu(cpu, &note->contents.dynamic);
+}
 
 typedef struct NoteFuncDescStruct {
 int contents_size;
+uint64_t (*note_size_func)(void); /* NULL for non-dynamic sized contents */
 void (*note_contents_func)(Note *note, S390CPU *cpu, int id);
+bool pvonly;
 } NoteFuncDesc;
 
 static const NoteFuncDesc note_core[] = {
-{sizeof_field(Note, contents.prstatus), s390x_write_elf64_prstatus},
-{sizeof_field(Note, contents.fpregset), s390x_write_elf64_fpregset},
-{ 0, NULL}
+{sizeof_field(Note, contents.prstatus), NULL, s390x_write_elf64_prstatus, false},
+{sizeof_field(Note, contents.fpregset), NULL, s390x_write_elf64_fpregset, false},
+{ 0, NULL, NULL}
 };
 
 static const NoteFuncDesc note_linux[] = {
-{sizeof_field(Note, contents.prefix),   s390x_write_elf64_prefix},
-{sizeof_field(Note, contents.ctrs), s390x_write_elf64_ctrs},
-{sizeof_field(Note, contents.timer),s390x_write_elf64_timer},
-{sizeof_field(Note, contents.todcmp),   s390x_write_elf64_todcmp},
-{sizeof_field(Note, contents.todpreg),  s390x_write_elf64_todpreg},
-{sizeof_field(Note, contents.vregslo),  s390x_write_elf64_vregslo},
-{sizeof_field(Note, contents.vregshi),  s390x_write_elf64_vregshi},
-{sizeof_field(Note, contents.gscb), s390x_write_elf64_gscb},
-{ 0, NULL}
+{sizeof_field(Note, contents.prefix),   NULL, s390x_write_elf64_prefix,  false},
+{sizeof_field(Note, contents.ctrs), NULL, s390x_write_elf64_ctrs,false},
+{sizeof_field(Note, contents.timer),NULL, s390x_write_elf64_timer,   false},
+{sizeof_field(Note, contents.todcmp),   NULL, s390x_write_elf64_todcmp,  false},
+{sizeof_field(Note, contents.todpreg),  NULL, 

[PATCH v2 03/11] dump: Split write of section headers and data and add a prepare step

2022-07-13 Thread Janosch Frank
By splitting the writing of the section headers and (future) section
data we prepare for the addition of a string table section and
architecture sections.

Signed-off-by: Janosch Frank 
---
 dump/dump.c   | 116 --
 include/sysemu/dump.h |   4 ++
 2 files changed, 94 insertions(+), 26 deletions(-)

diff --git a/dump/dump.c b/dump/dump.c
index 16d7474258..467d934bc1 100644
--- a/dump/dump.c
+++ b/dump/dump.c
@@ -342,30 +342,73 @@ static void write_elf_phdr_note(DumpState *s, Error **errp)
 }
 }
 
-static void write_elf_section(DumpState *s, int type, Error **errp)
+static size_t write_elf_section_hdr_zero(DumpState *s, void *buff)
 {
-Elf32_Shdr shdr32;
-Elf64_Shdr shdr64;
-int shdr_size;
-void *shdr;
-int ret;
+if (dump_is_64bit(s)) {
+Elf64_Shdr *shdr64 = buff;
 
-if (type == 0) {
-shdr_size = sizeof(Elf32_Shdr);
-memset(&shdr32, 0, shdr_size);
-shdr32.sh_info = cpu_to_dump32(s, s->phdr_num);
-shdr = &shdr32;
+memset(buff, 0, sizeof(Elf64_Shdr));
+shdr64->sh_info = cpu_to_dump32(s, s->phdr_num);
 } else {
-shdr_size = sizeof(Elf64_Shdr);
-memset(&shdr64, 0, shdr_size);
-shdr64.sh_info = cpu_to_dump32(s, s->phdr_num);
-shdr = &shdr64;
+Elf32_Shdr *shdr32 = buff;
+
+memset(buff, 0, sizeof(Elf32_Shdr));
+shdr32->sh_info = cpu_to_dump32(s, s->phdr_num);
 }
 
-ret = fd_write_vmcore(shdr, shdr_size, s);
+return dump_is_64bit(s) ? sizeof(Elf64_Shdr) : sizeof(Elf32_Shdr);
+}
+
+static void prepare_elf_section_hdrs(DumpState *s)
+{
+uint8_t *buff_hdr;
+size_t len, sizeof_shdr;
+
+/*
+ * Section ordering:
+ * - HDR zero (if needed)
+ */
+sizeof_shdr = dump_is_64bit(s) ? sizeof(Elf64_Shdr) : sizeof(Elf32_Shdr);
+len = sizeof_shdr * s->shdr_num;
+s->elf_section_hdrs = g_malloc0(len);
+buff_hdr = s->elf_section_hdrs;
+
+/* Write special section first */
+if (s->phdr_num == PN_XNUM) {
+write_elf_section_hdr_zero(s, buff_hdr);
+}
+}
+
+static void prepare_elf_sections(DumpState *s, Error **errp)
+{
+if (!s->shdr_num) {
+return;
+}
+
+prepare_elf_section_hdrs(s);
+}
+
+static void write_elf_section_headers(DumpState *s, Error **errp)
+{
+size_t sizeof_shdr;
+int ret;
+
+sizeof_shdr = dump_is_64bit(s) ? sizeof(Elf64_Shdr) : sizeof(Elf32_Shdr);
+
+ret = fd_write_vmcore(s->elf_section_hdrs, s->shdr_num * sizeof_shdr, s);
 if (ret < 0) {
-error_setg_errno(errp, -ret,
- "dump: failed to write section header table");
+error_setg_errno(errp, -ret, "dump: failed to write section data");
+}
+}
+
+static void write_elf_sections(DumpState *s, Error **errp)
+{
+int ret;
+
+/* Write section zero */
+ret = fd_write_vmcore(s->elf_section_data, s->elf_section_data_size, s);
+if (ret < 0) {
+error_setg_errno(errp, -ret, "dump: failed to write section data");
 }
 }
 
@@ -557,12 +600,22 @@ static void dump_begin(DumpState *s, Error **errp)
 /* Write elf header to buffer */
 prepare_elf_header(s);
 
+prepare_elf_sections(s, errp);
+if (*errp) {
+return;
+}
+
 /* Start to write stuff into files*/
 write_elf_header(s, errp);
 if (*errp) {
 return;
 }
 
+write_elf_section_headers(s, errp);
+if (*errp) {
+return;
+}
+
 /* write PT_NOTE to vmcore */
 write_elf_phdr_note(s, errp);
 if (*errp) {
@@ -575,14 +628,6 @@ static void dump_begin(DumpState *s, Error **errp)
 return;
 }
 
-/* write section to vmcore */
-if (s->shdr_num) {
-write_elf_section(s, 1, errp);
-if (*errp) {
-return;
-}
-}
-
 /* write notes to vmcore */
 write_elf_notes(s, errp);
 }
@@ -610,6 +655,19 @@ static void dump_iterate(DumpState *s, Error **errp)
 }
 }
 
+static void dump_end(DumpState *s, Error **errp)
+{
+ERRP_GUARD();
+
+if (!s->elf_section_data_size) {
+return;
+}
+s->elf_section_data = g_malloc0(s->elf_section_data_size);
+
+/* write sections to vmcore */
+write_elf_sections(s, errp);
+}
+
 static void create_vmcore(DumpState *s, Error **errp)
 {
 ERRP_GUARD();
@@ -620,6 +678,12 @@ static void create_vmcore(DumpState *s, Error **errp)
 }
 
 dump_iterate(s, errp);
+if (*errp) {
+return;
+}
+
+/* Write section data after memory has been dumped */
+dump_end(s, errp);
 }
 
 static int write_start_flat_header(int fd)
diff --git a/include/sysemu/dump.h b/include/sysemu/dump.h
index 736f681d01..bd49532232 100644
--- a/include/sysemu/dump.h
+++ b/include/sysemu/dump.h
@@ -172,6 +172,10 @@ typedef struct DumpState {
 int64_t length;/* Length of the dump we want to dump */
 
 void *elf_header;
+void *elf_section_hdrs;
+uint64_t elf_section_data_size;
+void *elf_section_data;
+
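
For reference, the sh_info convention written by write_elf_section_hdr_zero()
above follows standard ELF semantics for PN_XNUM overflow; a reader recovers
the real segment count like this (a sketch of the consumer side, not patch
code):

#include "elf.h"

/* Recover the real segment count when e_phnum overflowed into PN_XNUM. */
static uint32_t elf64_real_phnum(const Elf64_Ehdr *ehdr,
                                 const Elf64_Shdr *shdr0)
{
    if (ehdr->e_phnum != PN_XNUM) {
        return ehdr->e_phnum;
    }
    /* More than 65534 segments: the count lives in shdr[0].sh_info. */
    return shdr0->sh_info;
}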
 

[PATCH v2 01/11] dump: Cleanup memblock usage

2022-07-13 Thread Janosch Frank
The iteration over the memblocks is hard to understand so it's about
time to clean it up.

struct DumpState's next_block and start members can and should be
local variables within the iterator.

Instead of manually grabbing the next memblock we can use
QTAILQ_FOREACH to iterate over all memblocks.

The begin and length fields in the DumpState have been left untouched
since the qmp arguments share their names.

Signed-off-by: Janosch Frank 
---
 dump/dump.c   | 91 +++
 include/sysemu/dump.h | 47 +++---
 2 files changed, 65 insertions(+), 73 deletions(-)

diff --git a/dump/dump.c b/dump/dump.c
index 4d9658ffa2..6feba3cbfa 100644
--- a/dump/dump.c
+++ b/dump/dump.c
@@ -591,56 +591,27 @@ static void dump_begin(DumpState *s, Error **errp)
 write_elf_notes(s, errp);
 }
 
-static int get_next_block(DumpState *s, GuestPhysBlock *block)
-{
-while (1) {
-block = QTAILQ_NEXT(block, next);
-if (!block) {
-/* no more block */
-return 1;
-}
-
-s->start = 0;
-s->next_block = block;
-if (s->has_filter) {
-if (block->target_start >= s->begin + s->length ||
-block->target_end <= s->begin) {
-/* This block is out of the range */
-continue;
-}
-
-if (s->begin > block->target_start) {
-s->start = s->begin - block->target_start;
-}
-}
-
-return 0;
-}
-}
-
 /* write all memory to vmcore */
 static void dump_iterate(DumpState *s, Error **errp)
 {
 ERRP_GUARD();
 GuestPhysBlock *block;
-int64_t size;
+int64_t memblock_size, memblock_start;
 
-do {
-block = s->next_block;
-
-size = block->target_end - block->target_start;
-if (s->has_filter) {
-size -= s->start;
-if (s->begin + s->length < block->target_end) {
-size -= block->target_end - (s->begin + s->length);
-}
+QTAILQ_FOREACH(block, &s->guest_phys_blocks.head, next) {
+memblock_start = dump_get_memblock_start(block, s->begin, s->length);
+if (memblock_start == -1) {
+continue;
 }
-write_memory(s, block, s->start, size, errp);
+
+memblock_size = dump_get_memblock_size(block, s->begin, s->length);
+
+/* Write the memory to file */
+write_memory(s, block, memblock_start, memblock_size, errp);
 if (*errp) {
 return;
 }
-
-} while (!get_next_block(s, block));
+}
 }
 
 static void create_vmcore(DumpState *s, Error **errp)
@@ -1490,30 +1461,22 @@ static void create_kdump_vmcore(DumpState *s, Error **errp)
 }
 }
 
-static ram_addr_t get_start_block(DumpState *s)
+static int validate_start_block(DumpState *s)
 {
 GuestPhysBlock *block;
 
 if (!s->has_filter) {
-s->next_block = QTAILQ_FIRST(>guest_phys_blocks.head);
 return 0;
 }
 
QTAILQ_FOREACH(block, &s->guest_phys_blocks.head, next) {
+/* This block is out of the range */
 if (block->target_start >= s->begin + s->length ||
 block->target_end <= s->begin) {
-/* This block is out of the range */
 continue;
 }
-
-s->next_block = block;
-if (s->begin > block->target_start) {
-s->start = s->begin - block->target_start;
-} else {
-s->start = 0;
-}
-return s->start;
-}
+return 0;
+   }
 
 return -1;
 }
@@ -1540,25 +1503,17 @@ bool qemu_system_dump_in_progress(void)
return (qatomic_read(&state->status) == DUMP_STATUS_ACTIVE);
 }
 
-/* calculate total size of memory to be dumped (taking filter into
- * acoount.) */
+/*
+ * calculate total size of memory to be dumped (taking filter into
+ * account.)
+ */
 static int64_t dump_calculate_size(DumpState *s)
 {
 GuestPhysBlock *block;
-int64_t size = 0, total = 0, left = 0, right = 0;
+int64_t total = 0;
 
 QTAILQ_FOREACH(block, >guest_phys_blocks.head, next) {
-if (s->has_filter) {
-/* calculate the overlapped region. */
-left = MAX(s->begin, block->target_start);
-right = MIN(s->begin + s->length, block->target_end);
-size = right - left;
-size = size > 0 ? size : 0;
-} else {
-/* count the whole region in */
-size = (block->target_end - block->target_start);
-}
-total += size;
+total += dump_get_memblock_size(block, s->begin, s->length);
 }
 
 return total;
@@ -1660,8 +1615,8 @@ static void dump_init(DumpState *s, int fd, bool has_format,
 goto cleanup;
 }
 
-s->start = get_start_block(s);
-if (s->start == -1) {
+/* Is the filter filtering everything? */
+if (validate_start_block(s) == -1) {
 error_setg(errp, QERR_INVALID_PARAMETER, "begin");
 goto cleanup;
 }
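
The dump_get_memblock_start()/dump_get_memblock_size() helpers used above
live in include/sysemu/dump.h and are not shown in this excerpt; their
semantics amount to a plain range intersection with the dump filter, roughly
as sketched here (assuming a zero filter length means "no filter"):

#include <stdint.h>

/* Sketch: intersect one memblock [target_start, target_end) with the
 * filter window [begin, begin + length). */
static int64_t sketch_memblock_start(int64_t target_start, int64_t target_end,
                                     int64_t begin, int64_t length)
{
    if (!length) {
        return 0;                    /* no filter: start at block offset 0 */
    }
    if (target_start >= begin + length || target_end <= begin) {
        return -1;                   /* block entirely outside the filter */
    }
    return begin > target_start ? begin - target_start : 0;
}

static int64_t sketch_memblock_size(int64_t target_start, int64_t target_end,
                                    int64_t begin, int64_t length)
{
    if (!length) {
        return target_end - target_start;
    }
    /* Overlap of the two ranges, clamped at zero. */
    int64_t left = begin > target_start ? begin : target_start;
    int64_t right = begin + length < target_end ? begin + length : target_end;
    return right > left ? right - left : 0;
}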

[PATCH v2 02/11] dump: Allocate header

2022-07-13 Thread Janosch Frank
Allocating the header lets us write it at a later time and hence also
allows us to change section and segment table offsets until we
finally write it.

Signed-off-by: Janosch Frank 
---
 dump/dump.c   | 127 +-
 include/sysemu/dump.h |   1 +
 2 files changed, 64 insertions(+), 64 deletions(-)

diff --git a/dump/dump.c b/dump/dump.c
index 6feba3cbfa..16d7474258 100644
--- a/dump/dump.c
+++ b/dump/dump.c
@@ -98,6 +98,7 @@ static int dump_cleanup(DumpState *s)
 memory_mapping_list_free(>list);
 close(s->fd);
 g_free(s->guest_note);
+g_free(s->elf_header);
 s->guest_note = NULL;
 if (s->resume) {
 if (s->detached) {
@@ -126,73 +127,49 @@ static int fd_write_vmcore(const void *buf, size_t size, void *opaque)
 return 0;
 }
 
-static void write_elf64_header(DumpState *s, Error **errp)
+static void prepare_elf64_header(DumpState *s)
 {
-/*
- * phnum in the elf header is 16 bit, if we have more segments we
- * set phnum to PN_XNUM and write the real number of segments to a
- * special section.
- */
-uint16_t phnum = MIN(s->phdr_num, PN_XNUM);
-Elf64_Ehdr elf_header;
-int ret;
+uint16_t phnum = s->phdr_num >= PN_XNUM ? PN_XNUM : s->phdr_num;
+Elf64_Ehdr *elf_header = s->elf_header;
 
-memset(&elf_header, 0, sizeof(Elf64_Ehdr));
-memcpy(&elf_header, ELFMAG, SELFMAG);
-elf_header.e_ident[EI_CLASS] = ELFCLASS64;
-elf_header.e_ident[EI_DATA] = s->dump_info.d_endian;
-elf_header.e_ident[EI_VERSION] = EV_CURRENT;
-elf_header.e_type = cpu_to_dump16(s, ET_CORE);
-elf_header.e_machine = cpu_to_dump16(s, s->dump_info.d_machine);
-elf_header.e_version = cpu_to_dump32(s, EV_CURRENT);
-elf_header.e_ehsize = cpu_to_dump16(s, sizeof(elf_header));
-elf_header.e_phoff = cpu_to_dump64(s, s->phdr_offset);
-elf_header.e_phentsize = cpu_to_dump16(s, sizeof(Elf64_Phdr));
-elf_header.e_phnum = cpu_to_dump16(s, phnum);
+memcpy(elf_header, ELFMAG, SELFMAG);
+elf_header->e_ident[EI_CLASS] = ELFCLASS64;
+elf_header->e_ident[EI_DATA] = s->dump_info.d_endian;
+elf_header->e_ident[EI_VERSION] = EV_CURRENT;
+elf_header->e_type = cpu_to_dump16(s, ET_CORE);
+elf_header->e_machine = cpu_to_dump16(s, s->dump_info.d_machine);
+elf_header->e_version = cpu_to_dump32(s, EV_CURRENT);
+elf_header->e_ehsize = cpu_to_dump16(s, sizeof(*elf_header));
+elf_header->e_phoff = cpu_to_dump64(s, s->phdr_offset);
+elf_header->e_phentsize = cpu_to_dump16(s, sizeof(Elf64_Phdr));
+elf_header->e_phnum = cpu_to_dump16(s, phnum);
 if (s->shdr_num) {
-elf_header.e_shoff = cpu_to_dump64(s, s->shdr_offset);
-elf_header.e_shentsize = cpu_to_dump16(s, sizeof(Elf64_Shdr));
-elf_header.e_shnum = cpu_to_dump16(s, s->shdr_num);
-}
-
-ret = fd_write_vmcore(&elf_header, sizeof(elf_header), s);
-if (ret < 0) {
-error_setg_errno(errp, -ret, "dump: failed to write elf header");
+elf_header->e_shoff = cpu_to_dump64(s, s->shdr_offset);
+elf_header->e_shentsize = cpu_to_dump16(s, sizeof(Elf64_Shdr));
+elf_header->e_shnum = cpu_to_dump16(s, s->shdr_num);
 }
 }
 
-static void write_elf32_header(DumpState *s, Error **errp)
+static void prepare_elf32_header(DumpState *s)
 {
-/*
- * phnum in the elf header is 16 bit, if we have more segments we
- * set phnum to PN_XNUM and write the real number of segments to a
- * special section.
- */
-uint16_t phnum = MIN(s->phdr_num, PN_XNUM);
-Elf32_Ehdr elf_header;
-int ret;
+uint16_t phnum = s->phdr_num >= PN_XNUM ? PN_XNUM : s->phdr_num;
+Elf32_Ehdr *elf_header = s->elf_header;
 
-memset(&elf_header, 0, sizeof(Elf32_Ehdr));
-memcpy(&elf_header, ELFMAG, SELFMAG);
-elf_header.e_ident[EI_CLASS] = ELFCLASS32;
-elf_header.e_ident[EI_DATA] = s->dump_info.d_endian;
-elf_header.e_ident[EI_VERSION] = EV_CURRENT;
-elf_header.e_type = cpu_to_dump16(s, ET_CORE);
-elf_header.e_machine = cpu_to_dump16(s, s->dump_info.d_machine);
-elf_header.e_version = cpu_to_dump32(s, EV_CURRENT);
-elf_header.e_ehsize = cpu_to_dump16(s, sizeof(elf_header));
-elf_header.e_phoff = cpu_to_dump32(s, s->phdr_offset);
-elf_header.e_phentsize = cpu_to_dump16(s, sizeof(Elf32_Phdr));
-elf_header.e_phnum = cpu_to_dump16(s, phnum);
+memcpy(elf_header, ELFMAG, SELFMAG);
+elf_header->e_ident[EI_CLASS] = ELFCLASS32;
+elf_header->e_ident[EI_DATA] = s->dump_info.d_endian;
+elf_header->e_ident[EI_VERSION] = EV_CURRENT;
+elf_header->e_type = cpu_to_dump16(s, ET_CORE);
+elf_header->e_machine = cpu_to_dump16(s, s->dump_info.d_machine);
+elf_header->e_version = cpu_to_dump32(s, EV_CURRENT);
+elf_header->e_ehsize = cpu_to_dump16(s, sizeof(*elf_header));
+elf_header->e_phoff = cpu_to_dump32(s, s->phdr_offset);
+elf_header->e_phentsize = cpu_to_dump16(s, sizeof(Elf32_Phdr));
+elf_header->e_phnum = 

Re: [PATCH 1/4] Modifying ‘migrate’ qmp command to add multi-FD socket on particular source and destination pair

2022-07-13 Thread Claudio Fontana
On 6/16/22 19:26, Dr. David Alan Gilbert wrote:
> * Het Gala (het.g...@nutanix.com) wrote:
>> i) Modified the format of the qemu monitor command : 'migrate' by adding a 
>> list,
>>each element in the list consists of multi-FD connection parameters: 
>> source
>>and destination uris and of the number of multi-fd channels between each 
>> pair.
>>
>> ii) Information about the list of all multi-FD connection parameters, the
>> length of the list, and the total number of multi-FD channels for all the
>> connections together is stored in the ‘OutgoingArgs’ struct.
>>
>> Suggested-by: Manish Mishra 
>> Signed-off-by: Het Gala 
>> ---
>>  include/qapi/util.h   |  9 
>>  migration/migration.c | 47 ++
>>  migration/socket.c| 53 ---
>>  migration/socket.h| 17 +-
>>  monitor/hmp-cmds.c| 22 --
>>  qapi/migration.json   | 43 +++
>>  6 files changed, 170 insertions(+), 21 deletions(-)
>>
>> diff --git a/include/qapi/util.h b/include/qapi/util.h
>> index 81a2b13a33..3041feb3d9 100644
>> --- a/include/qapi/util.h
>> +++ b/include/qapi/util.h
>> @@ -56,4 +56,13 @@ int parse_qapi_name(const char *name, bool complete);
>>  (tail) = &(*(tail))->next; \
>>  } while (0)
>>  
>> +#define QAPI_LIST_LENGTH(list) ({ \
>> +int _len = 0; \
>> +typeof(list) _elem; \
>> +for (_elem = list; _elem != NULL; _elem = _elem->next) { \
>> +_len++; \
>> +} \
>> +_len; \
>> +})
>> +
>>  #endif
> 
> This looks like it should be a separate patch to me (and perhaps size_t
> for len?)
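
For illustration, usage of the macro would look like this, assuming the
MigrateUriParameterList type this series introduces (sketch only):

#include "qapi/util.h"

/* Count entries of a QAPI-generated linked list; the statement expression
 * walks ->next and yields the node count. */
static int count_uri_pairs(MigrateUriParameterList *list)
{
    return QAPI_LIST_LENGTH(list);   /* 0 for an empty (NULL) list */
}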
> 
>> diff --git a/migration/migration.c b/migration/migration.c
>> index 31739b2af9..c408175aeb 100644
>> --- a/migration/migration.c
>> +++ b/migration/migration.c
>> @@ -2328,13 +2328,14 @@ static bool migrate_prepare(MigrationState *s, bool 
>> blk, bool blk_inc,
>>  return true;
>>  }
>>  
>> -void qmp_migrate(const char *uri, bool has_blk, bool blk,
>> +void qmp_migrate(const char *uri, bool has_multi_fd_uri_list,
>> + MigrateUriParameterList *cap, bool has_blk, bool blk,
>>   bool has_inc, bool inc, bool has_detach, bool detach,
>>   bool has_resume, bool resume, Error **errp)
>>  {
>>  Error *local_err = NULL;
>>  MigrationState *s = migrate_get_current();
>> -const char *p = NULL;
>> +const char *dst_ptr = NULL;
>>  
>>  if (!migrate_prepare(s, has_blk && blk, has_inc && inc,
>>   has_resume && resume, errp)) {
>> @@ -2348,20 +2349,46 @@ void qmp_migrate(const char *uri, bool has_blk, bool 
>> blk,
>>  }
>>  }
>>  
>> +/*
>> + * In case of Multi-FD migration parameters, if uri is provided,
> 
> I think you mean 'if uri list is provided'
> 
>> + * supports only tcp network protocol.
>> + */
>> +if (has_multi_fd_uri_list) {
>> +int length = QAPI_LIST_LENGTH(cap);
>> +init_multifd_array(length);
>> +for (int i = 0; i < length; i++) {
>> +const char *p1 = NULL, *p2 = NULL;
> 
> Keep these as ps/pd  to make it clear which is source and dest.
> 
>> +const char *multifd_dst_uri = cap->value->destination_uri;
>> +const char *multifd_src_uri = cap->value->source_uri;
>> +uint8_t multifd_channels = cap->value->multifd_channels;
>> +if (!strstart(multifd_dst_uri, "tcp:", &p1) ||
>> +!strstart(multifd_src_uri, "tcp:", &p2)) {
> 
> I've copied in Claudio Fontana; Claudio is fighting to make snapshots
> faster and has been playing with various multithread schemes for multifd
> with files and fd's;  perhaps the syntax you're proposing doesn't need
> to be limited to tcp.


Hi,

I will try to express our current problem, and see where there might be some 
overlap, maybe you can see more.

The current problem we are facing is, saving or restoring of VM state to disk, 
which in libvirt terms is "virsh save" or "virsh managedsave" and "virsh 
restore" or "virsh start",
is currently needlessly slow with large VMs, using upstream libvirt and qemu.

We need to get the transfer speeds of VM state (mainly RAM) to disk in the 
multiple GiB/s range with modern processors and NVMe disks; we have shown it is 
feasible.

Mainline libvirt uses QEMU migration to "fd://" to implement saving of VMs to 
disk, but adds copying through pipes via a libvirt "iohelper" process to work 
around a set of problems that are still not 100% clear to me;
(I cannot find the right email where this was discussed).

One clearly factual issue is that the QEMU migration stream is not currently 
O_DIRECT friendly, as it assumes it goes to network, and unfortunately for 
large transfers of this kind, O_DIRECT is needed due to the kernel file cache 
trashing problem.
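
To make the constraint concrete, here is a minimal sketch of what an O_DIRECT
writer must guarantee: buffer, offset and length all aligned (typically to
the logical block size), which a byte-granular migration stream does not
provide. The function name and the 4096-byte alignment are illustrative:

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* One aligned write that bypasses the page cache. */
static int write_direct(const char *path, const void *src, size_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_DIRECT, 0600);
    void *buf = NULL;
    ssize_t n = -1;

    if (fd < 0) {
        return -1;
    }
    /* O_DIRECT requires an aligned buffer and an aligned length. */
    if (posix_memalign(&buf, 4096, len) == 0) {
        memcpy(buf, src, len);   /* len assumed to be a multiple of 4096 */
        n = write(fd, buf, len);
        free(buf);
    }
    close(fd);
    return n == (ssize_t)len ? 0 : -1;
}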

So already, if your series were to address "fd://", it would potentially 
automatically provide an additional feature for libvirt's current save VM 

[PATCH] pc-bios/s390-ccw: add -Wno-array-bounds

2022-07-13 Thread Paolo Bonzini
The option generates a lot of warnings for integers cast to pointers,
for example:

/home/pbonzini/work/upstream/qemu/pc-bios/s390-ccw/dasd-ipl.c:174:19: warning: 
array subscript 0 is outside array bounds of ‘CcwSeekData[0]’ [-Warray-bounds]
  174 | seekData->cyl = 0x00;
  | ~~^~

Signed-off-by: Paolo Bonzini 
---
 pc-bios/s390-ccw/Makefile | 1 +
 1 file changed, 1 insertion(+)

diff --git a/pc-bios/s390-ccw/Makefile b/pc-bios/s390-ccw/Makefile
index 26ad40f94e..c8784c2a08 100644
--- a/pc-bios/s390-ccw/Makefile
+++ b/pc-bios/s390-ccw/Makefile
EXTRA_CFLAGS += $(call cc-option,-Werror $(EXTRA_CFLAGS),-Wno-stringop-overflow)
 EXTRA_CFLAGS += -ffreestanding -fno-delete-null-pointer-checks -fno-common 
-fPIE
 EXTRA_CFLAGS += -fwrapv -fno-strict-aliasing -fno-asynchronous-unwind-tables
 EXTRA_CFLAGS += $(call cc-option, $(EXTRA_CFLAGS), -fno-stack-protector)
+EXTRA_CFLAGS += $(call cc-option, $(EXTRA_CFLAGS), -Wno-array-bounds)
 EXTRA_CFLAGS += -msoft-float
 EXTRA_CFLAGS += $(call cc-option, $(EXTRA_CFLAGS),-march=z900,-march=z10)
 EXTRA_CFLAGS += -std=gnu99
-- 
2.36.1
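
The pattern behind the diagnostic is a constant integer address cast to a
structure pointer; GCC 12 models the cast result as a zero-sized object, so
any store through it looks out of bounds. A minimal sketch (the struct layout
and the address are illustrative, not the real CcwSeekData):

typedef struct CcwSeekData {
    unsigned short cyl;
    unsigned short head;
    unsigned char sector;
} CcwSeekData;

void demo_fixed_address_store(void)
{
    /* Firmware-style access to a fixed absolute address. */
    CcwSeekData *seekData = (CcwSeekData *)0x400;

    seekData->cyl = 0x00;   /* -> "array subscript 0 is outside array bounds" */
}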




[PATCH] scsi/lsi53c895a: really fix use-after-free in lsi_do_msgout (CVE-2022-0216)

2022-07-13 Thread Paolo Bonzini
From: Mauro Matteo Cascella 

Set current_req to NULL, not current_req->req, to prevent reusing a free'd
buffer in case of repeated SCSI cancel requests.  Also apply the fix to
CLEAR QUEUE and BUS DEVICE RESET messages as well, since they also cancel
the request.

Thanks to Alexander Bulekov for providing a reproducer.

Fixes: CVE-2022-0216
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/972
Signed-off-by: Mauro Matteo Cascella 
Tested-by: Alexander Bulekov 
Message-Id: <20220711123316.421279-1-mcasc...@redhat.com>
Signed-off-by: Paolo Bonzini 
---
Adjust the patch from v1 to v2 since the changes crossed
with the pull request.

 hw/scsi/lsi53c895a.c   |  3 +-
 tests/qtest/fuzz-lsi53c895a-test.c | 71 ++
 2 files changed, 73 insertions(+), 1 deletion(-)

diff --git a/hw/scsi/lsi53c895a.c b/hw/scsi/lsi53c895a.c
index 99ea42d49b..ad5f5e5f39 100644
--- a/hw/scsi/lsi53c895a.c
+++ b/hw/scsi/lsi53c895a.c
@@ -1030,7 +1030,7 @@ static void lsi_do_msgout(LSIState *s)
 trace_lsi_do_msgout_abort(current_tag);
 if (current_req && current_req->req) {
 scsi_req_cancel(current_req->req);
-current_req->req = NULL;
+current_req = NULL;
 }
 lsi_disconnect(s);
 break;
@@ -1056,6 +1056,7 @@ static void lsi_do_msgout(LSIState *s)
 /* clear the current I/O process */
 if (s->current) {
 scsi_req_cancel(s->current->req);
+current_req = NULL;
 }
 
 /* As the current implemented devices scsi_disk and scsi_generic
diff --git a/tests/qtest/fuzz-lsi53c895a-test.c b/tests/qtest/fuzz-lsi53c895a-test.c
index 2e8e67859e..6872c70d3a 100644
--- a/tests/qtest/fuzz-lsi53c895a-test.c
+++ b/tests/qtest/fuzz-lsi53c895a-test.c
@@ -8,6 +8,74 @@
 #include "qemu/osdep.h"
 #include "libqtest.h"
 
+/*
+ * This used to trigger a UAF in lsi_do_msgout()
+ * https://gitlab.com/qemu-project/qemu/-/issues/972
+ */
+static void test_lsi_do_msgout_cancel_req(void)
+{
+QTestState *s;
+
+s = qtest_init("-M q35 -m 4G -display none -nodefaults "
+   "-device lsi53c895a,id=scsi "
+   "-device scsi-hd,drive=disk0 "
+   "-drive file=null-co://,id=disk0,if=none,format=raw");
+
+qtest_outl(s, 0xcf8, 0x8810);
+qtest_outl(s, 0xcf8, 0xc000);
+qtest_outl(s, 0xcf8, 0x8810);
+qtest_outw(s, 0xcfc, 0x7);
+qtest_outl(s, 0xcf8, 0x8810);
+qtest_outl(s, 0xcfc, 0xc000);
+qtest_outl(s, 0xcf8, 0x8804);
+qtest_outw(s, 0xcfc, 0x05);
+qtest_writeb(s, 0x69736c10, 0x08);
+qtest_writeb(s, 0x69736c13, 0x58);
+qtest_writeb(s, 0x69736c1a, 0x01);
+qtest_writeb(s, 0x69736c1b, 0x06);
+qtest_writeb(s, 0x69736c22, 0x01);
+qtest_writeb(s, 0x69736c23, 0x07);
+qtest_writeb(s, 0x69736c2b, 0x02);
+qtest_writeb(s, 0x69736c48, 0x08);
+qtest_writeb(s, 0x69736c4b, 0x58);
+qtest_writeb(s, 0x69736c52, 0x04);
+qtest_writeb(s, 0x69736c53, 0x06);
+qtest_writeb(s, 0x69736c5b, 0x02);
+qtest_outl(s, 0xc02d, 0x697300);
+qtest_writeb(s, 0x5a554662, 0x01);
+qtest_writeb(s, 0x5a554663, 0x07);
+qtest_writeb(s, 0x5a55466a, 0x10);
+qtest_writeb(s, 0x5a55466b, 0x22);
+qtest_writeb(s, 0x5a55466c, 0x5a);
+qtest_writeb(s, 0x5a55466d, 0x5a);
+qtest_writeb(s, 0x5a55466e, 0x34);
+qtest_writeb(s, 0x5a55466f, 0x5a);
+qtest_writeb(s, 0x5a345a5a, 0x77);
+qtest_writeb(s, 0x5a345a5b, 0x55);
+qtest_writeb(s, 0x5a345a5c, 0x51);
+qtest_writeb(s, 0x5a345a5d, 0x27);
+qtest_writeb(s, 0x27515577, 0x41);
+qtest_outl(s, 0xc02d, 0x5a5500);
+qtest_writeb(s, 0x364001d0, 0x08);
+qtest_writeb(s, 0x364001d3, 0x58);
+qtest_writeb(s, 0x364001da, 0x01);
+qtest_writeb(s, 0x364001db, 0x26);
+qtest_writeb(s, 0x364001dc, 0x0d);
+qtest_writeb(s, 0x364001dd, 0xae);
+qtest_writeb(s, 0x364001de, 0x41);
+qtest_writeb(s, 0x364001df, 0x5a);
+qtest_writeb(s, 0x5a41ae0d, 0xf8);
+qtest_writeb(s, 0x5a41ae0e, 0x36);
+qtest_writeb(s, 0x5a41ae0f, 0xd7);
+qtest_writeb(s, 0x5a41ae10, 0x36);
+qtest_writeb(s, 0x36d736f8, 0x0c);
+qtest_writeb(s, 0x36d736f9, 0x80);
+qtest_writeb(s, 0x36d736fa, 0x0d);
+qtest_outl(s, 0xc02d, 0x364000);
+
+qtest_quit(s);
+}
+
 /*
  * This used to trigger the assert in lsi_do_dma()
  * https://bugs.launchpad.net/qemu/+bug/697510
@@ -44,5 +112,8 @@ int main(int argc, char **argv)
 qtest_add_func("fuzz/lsi53c895a/lsi_do_dma_empty_queue",
test_lsi_do_dma_empty_queue);
 
+qtest_add_func("fuzz/lsi53c895a/lsi_do_msgout_cancel_req",
+   test_lsi_do_msgout_cancel_req);
+
 return g_test_run();
 }
-- 
2.36.1
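
Reduced to its essentials, the bug class is: cancelling a request can
complete and free it synchronously, so the pointer that owns it must be
dropped rather than one of its fields. A simplified sketch with generic
names (not the device code):

typedef struct Req {
    void *lower;              /* lower-layer request being cancelled */
} Req;

static Req *current_req;

/* Cancelling may complete the request synchronously through the device's
 * completion callback, which frees *current_req. */
extern void lower_cancel(void *lower);

static void cancel_current(void)
{
    if (current_req && current_req->lower) {
        lower_cancel(current_req->lower);
        /* Wrong: current_req->lower = NULL stores through freed memory.
         * Right: clear the owning pointer itself. */
        current_req = NULL;
    }
}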




Re: [PULL 1/3] MAINTAINERS: Add myself as Guest Agent reviewer

2022-07-13 Thread Konstantin Kostiuk
On Wed, Jul 13, 2022 at 2:55 PM Daniel P. Berrangé 
wrote:

> On Wed, Jul 13, 2022 at 02:31:08PM +0300, Konstantin Kostiuk wrote:
> > On Wed, Jul 13, 2022 at 1:38 PM Daniel P. Berrangé 
> > wrote:
> >
> > > On Wed, Jul 13, 2022 at 01:19:06PM +0300, Konstantin Kostiuk wrote:
> > > > Signed-off-by: Konstantin Kostiuk 
> > > > Message-Id: <20220712092715.2136898-1-kkost...@redhat.com>
> > > > Reviewed-by: Philippe Mathieu-Daudé 
> > > > Signed-off-by: Konstantin Kostiuk 
> > > > ---
> > > >  MAINTAINERS | 1 +
> > > >  1 file changed, 1 insertion(+)
> > > >
> > > > diff --git a/MAINTAINERS b/MAINTAINERS
> > > > index 450abd0252..b1e73d99f3 100644
> > > > --- a/MAINTAINERS
> > > > +++ b/MAINTAINERS
> > > > @@ -2880,6 +2880,7 @@ T: git https://repo.or.cz/qemu/armbru.git
> > > qapi-next
> > > >
> > > >  QEMU Guest Agent
> > > >  M: Michael Roth 
> > > > +R: Konstantin Kostiuk 
> > >
> > > This pull request contains functional changes under qga/, which
> > > suggests you're acting as a (co-)maintainer for QGA, not merely
> > > a reviewer. I wouldn't normally expect reviewers to send pull
> > > requests for a subsystem. As such should this be "M:", to
> > > indicate co-maintainership and have an explicit ACK from
> > > Michael Roth.
> > >
> >
> > As the maintainer of the Windows part of the Guest Agent, I have added
> > myself
> > as a reviewer so I don't miss out on general patches for the Guest Agent.
> > Some time ago, I asked Michael Roth if I could submit PRs for all guest
> > agent components and he allowed me to do this.
> > If needed, I can add myself as a co-maintainer of the whole Guest Agent, not only
> > Guest Agent Windows.
>
> It sounds like you're de facto a co-maintainer already then and
> might as well acknowledge this in MAINTAINERS.
>

Ok. Will resend patch and pull.


>
> With regards,
> Daniel
> --
> |: https://berrange.com  -o-
> https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org -o-
> https://fstop138.berrange.com :|
> |: https://entangle-photo.org-o-
> https://www.instagram.com/dberrange :|
>
>


Re: [RFC v3 1/8] blkio: add io_uring block driver using libblkio

2022-07-13 Thread Hanna Reitz

On 08.07.22 06:17, Stefan Hajnoczi wrote:

libblkio (https://gitlab.com/libblkio/libblkio/) is a library for
high-performance disk I/O. It currently supports io_uring and
virtio-blk-vhost-vdpa with additional drivers under development.

One of the reasons for developing libblkio is that other applications
besides QEMU can use it. This will be particularly useful for
vhost-user-blk which applications may wish to use for connecting to
qemu-storage-daemon.

libblkio also gives us an opportunity to develop in Rust behind a C API
that is easy to consume from QEMU.

This commit adds io_uring and virtio-blk-vhost-vdpa BlockDrivers to QEMU
using libblkio. It will be easy to add other libblkio drivers since they
will share the majority of code.

For now I/O buffers are copied through bounce buffers if the libblkio
driver requires it. Later commits add an optimization for
pre-registering guest RAM to avoid bounce buffers.

The syntax is:

   --blockdev 
io_uring,node-name=drive0,filename=test.img,readonly=on|off,cache.direct=on|off

and:

   --blockdev 
virtio-blk-vhost-vdpa,node-name=drive0,path=/dev/vdpa...,readonly=on|off

Signed-off-by: Stefan Hajnoczi 
---
  MAINTAINERS   |   6 +
  meson_options.txt |   2 +
  qapi/block-core.json  |  37 +-
  meson.build   |   9 +
  block/blkio.c | 659 ++
  tests/qtest/modules-test.c|   3 +
  block/meson.build |   1 +
  scripts/meson-buildoptions.sh |   3 +
  8 files changed, 718 insertions(+), 2 deletions(-)
  create mode 100644 block/blkio.c


[...]


diff --git a/block/blkio.c b/block/blkio.c
new file mode 100644
index 00..7fbdbd7fae
--- /dev/null
+++ b/block/blkio.c
@@ -0,0 +1,659 @@


Not sure whether it’s necessary, but I would have expected a copyright 
header here.



+#include "qemu/osdep.h"
+#include 
+#include "block/block_int.h"
+#include "qapi/error.h"
+#include "qapi/qmp/qdict.h"
+#include "qemu/module.h"
+
+typedef struct BlkAIOCB {
+BlockAIOCB common;
+struct blkio_mem_region mem_region;
+QEMUIOVector qiov;
+struct iovec bounce_iov;
+} BlkioAIOCB;
+
+typedef struct {
+/* Protects ->blkio and request submission on ->blkioq */
+QemuMutex lock;
+
+struct blkio *blkio;
+struct blkioq *blkioq; /* this could be multi-queue in the future */
+int completion_fd;
+
+/* Polling fetches the next completion into this field */
+struct blkio_completion poll_completion;
+
+/* The value of the "mem-region-alignment" property */
+size_t mem_region_alignment;
+
+/* Can we skip adding/deleting blkio_mem_regions? */
+bool needs_mem_regions;
+} BDRVBlkioState;
+
+static void blkio_aiocb_complete(BlkioAIOCB *acb, int ret)
+{
+/* Copy bounce buffer back to qiov */
+if (acb->qiov.niov > 0) {
+qemu_iovec_from_buf(&acb->qiov, 0,
+acb->bounce_iov.iov_base,
+acb->bounce_iov.iov_len);
+qemu_iovec_destroy(&acb->qiov);
+}
+
+acb->common.cb(acb->common.opaque, ret);
+
+if (acb->mem_region.len > 0) {
+BDRVBlkioState *s = acb->common.bs->opaque;
+
+WITH_QEMU_LOCK_GUARD(&s->lock) {
+blkio_free_mem_region(s->blkio, &acb->mem_region);
+}
+}
+
+qemu_aio_unref(&acb->common);
+}
+
+/*
+ * Only the thread that calls aio_poll() invokes fd and poll handlers.
+ * Therefore locks are not necessary except when accessing s->blkio.
+ *
+ * No locking is performed around blkioq_get_completions() although other
+ * threads may submit I/O requests on s->blkioq. We're assuming there is no
+ * interference between blkioq_get_completions() and other s->blkioq APIs.
+ */
+
+static void blkio_completion_fd_read(void *opaque)
+{
+BlockDriverState *bs = opaque;
+BDRVBlkioState *s = bs->opaque;
+struct blkio_completion completion;
+uint64_t val;
+ssize_t ret __attribute__((unused));


I’d prefer a `(void)ret;` over this attribute, not least because that 
line would give a nice opportunity to explain in a short comment why we 
ignore this return value that the compiler tells us not to ignore, but 
if you don’t, then this’ll be fine.
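
For illustration, the suggested form would read roughly like this (sketch
only, with a made-up helper name):

#include <stdint.h>
#include <unistd.h>

static void completion_fd_reset(int fd)
{
    uint64_t val;
    ssize_t ret;

    ret = read(fd, &val, sizeof(val));
    /* A short read or EAGAIN only means no event was pending; there is
     * nothing useful to do with the error here. */
    (void)ret;
}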



+
+/* Polling may have already fetched a completion */
+if (s->poll_completion.user_data != NULL) {
+completion = s->poll_completion;
+
+/* Clear it in case blkio_aiocb_complete() has a nested event loop */
+s->poll_completion.user_data = NULL;
+
+blkio_aiocb_complete(completion.user_data, completion.ret);
+}
+
+/* Reset completion fd status */
+ret = read(s->completion_fd, &val, sizeof(val));
+
+/*
+ * Reading one completion at a time makes nested event loop re-entrancy
+ * simple. Change this loop to get multiple completions in one go if it
+ * becomes a performance bottleneck.
+ */
+while (blkioq_do_io(s->blkioq, &completion, 0, 1, NULL) == 1) {
+blkio_aiocb_complete(completion.user_data, completion.ret);
+}
+}
+
+static bool 

Re: [PULL 1/3] MAINTAINERS: Add myself as Guest Agent reviewer

2022-07-13 Thread Daniel P . Berrangé
On Wed, Jul 13, 2022 at 02:31:08PM +0300, Konstantin Kostiuk wrote:
> On Wed, Jul 13, 2022 at 1:38 PM Daniel P. Berrangé 
> wrote:
> 
> > On Wed, Jul 13, 2022 at 01:19:06PM +0300, Konstantin Kostiuk wrote:
> > > Signed-off-by: Konstantin Kostiuk 
> > > Message-Id: <20220712092715.2136898-1-kkost...@redhat.com>
> > > Reviewed-by: Philippe Mathieu-Daudé 
> > > Signed-off-by: Konstantin Kostiuk 
> > > ---
> > >  MAINTAINERS | 1 +
> > >  1 file changed, 1 insertion(+)
> > >
> > > diff --git a/MAINTAINERS b/MAINTAINERS
> > > index 450abd0252..b1e73d99f3 100644
> > > --- a/MAINTAINERS
> > > +++ b/MAINTAINERS
> > > @@ -2880,6 +2880,7 @@ T: git https://repo.or.cz/qemu/armbru.git
> > qapi-next
> > >
> > >  QEMU Guest Agent
> > >  M: Michael Roth 
> > > +R: Konstantin Kostiuk 
> >
> > This pull request contains functional changes under qga/, which
> > suggests you're acting as a (co-)maintainer for QGA, not merely
> > a reviewer. I wouldn't normally expect reviewers to send pull
> > requests for a subsystem. As such should this be "M:", to
> > indicate co-maintainership and have an explicit ACK from
> > Michael Roth.
> >
> 
> As the maintainer of the Windows part of the Guest Agent, I have added
> myself
> as a reviewer so I don't miss out on general patches for the Guest Agent.
> Some time ago, I asked Michael Roth if I could submit PRs for all guest
> agent components and he allow me to do this.
> If need I can add myself as a co-maintainer to Guest Agent not only
> Guest Agent Windows.

It sounds like you're de facto a co-maintainer already then and
might as well acknowledge this in MAINTAINERS.

With regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|




Re: [PULL 1/3] MAINTAINERS: Add myself as Guest Agent reviewer

2022-07-13 Thread Konstantin Kostiuk
On Wed, Jul 13, 2022 at 1:38 PM Daniel P. Berrangé 
wrote:

> On Wed, Jul 13, 2022 at 01:19:06PM +0300, Konstantin Kostiuk wrote:
> > Signed-off-by: Konstantin Kostiuk 
> > Message-Id: <20220712092715.2136898-1-kkost...@redhat.com>
> > Reviewed-by: Philippe Mathieu-Daudé 
> > Signed-off-by: Konstantin Kostiuk 
> > ---
> >  MAINTAINERS | 1 +
> >  1 file changed, 1 insertion(+)
> >
> > diff --git a/MAINTAINERS b/MAINTAINERS
> > index 450abd0252..b1e73d99f3 100644
> > --- a/MAINTAINERS
> > +++ b/MAINTAINERS
> > @@ -2880,6 +2880,7 @@ T: git https://repo.or.cz/qemu/armbru.git
> qapi-next
> >
> >  QEMU Guest Agent
> >  M: Michael Roth 
> > +R: Konstantin Kostiuk 
>
> This pull request contains functional changes under qga/, which
> suggests you're acting as a (co-)maintainer for QGA, not merely
> a reviewer. I wouldn't normally expect reviewers to send pull
> requests for a subsystem. As such should this be "M:", to
> indicate co-maintainership and have an explicit ACK from
> Michael Roth.
>

As the maintainer of the Windows part of the Guest Agent, I have added
myself
as a reviewer so I don't miss out on general patches for the Guest Agent.
Some time ago, I asked Michael Roth if I could submit PRs for all guest
agent components and he allowed me to do this.
If needed, I can add myself as a co-maintainer of the whole Guest Agent, not only
Guest Agent Windows.


>
> >  S: Maintained
> >  F: qga/
> >  F: docs/interop/qemu-ga.rst
>
>
> With regards,
> Daniel
> --
> |: https://berrange.com  -o-
> https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org -o-
> https://fstop138.berrange.com :|
> |: https://entangle-photo.org-o-
> https://www.instagram.com/dberrange :|
>
>


Re: [PATCH v9 12/14] tests: Add postcopy tls migration test

2022-07-13 Thread Dr. David Alan Gilbert
* Peter Xu (pet...@redhat.com) wrote:
> We just added TLS tests for precopy but not postcopy.  Add the
> corresponding test for vanilla postcopy.
> 
> Rename the vanilla postcopy to "postcopy/plain" because all postcopy tests
> will only use unix sockets as channel.
> 
> Signed-off-by: Peter Xu 

Reviewed-by: Dr. David Alan Gilbert 

> ---
>  tests/qtest/migration-test.c | 61 ++--
>  1 file changed, 51 insertions(+), 10 deletions(-)
> 
> diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
> index f15a7517b1..ee37ad6631 100644
> --- a/tests/qtest/migration-test.c
> +++ b/tests/qtest/migration-test.c
> @@ -566,6 +566,9 @@ typedef struct {
>  
>  /* Optional: set number of migration passes to wait for */
>  unsigned int iterations;
> +
> +/* Postcopy specific fields */
> +void *postcopy_data;
>  } MigrateCommon;
>  
>  static int test_migrate_start(QTestState **from, QTestState **to,
> @@ -1054,15 +1057,19 @@ test_migrate_tls_x509_finish(QTestState *from,
>  
>  static int migrate_postcopy_prepare(QTestState **from_ptr,
>  QTestState **to_ptr,
> -MigrateStart *args)
> +MigrateCommon *args)
>  {
>  g_autofree char *uri = g_strdup_printf("unix:%s/migsocket", tmpfs);
>  QTestState *from, *to;
>  
> -if (test_migrate_start(&from, &to, uri, args)) {
> +if (test_migrate_start(&from, &to, uri, &args->start)) {
>  return -1;
>  }
>  
> +if (args->start_hook) {
> +args->postcopy_data = args->start_hook(from, to);
> +}
> +
>  migrate_set_capability(from, "postcopy-ram", true);
>  migrate_set_capability(to, "postcopy-ram", true);
>  migrate_set_capability(to, "postcopy-blocktime", true);
> @@ -1082,7 +1089,8 @@ static int migrate_postcopy_prepare(QTestState 
> **from_ptr,
>  return 0;
>  }
>  
> -static void migrate_postcopy_complete(QTestState *from, QTestState *to)
> +static void migrate_postcopy_complete(QTestState *from, QTestState *to,
> +  MigrateCommon *args)
>  {
>  wait_for_migration_complete(from);
>  
> @@ -1093,25 +1101,48 @@ static void migrate_postcopy_complete(QTestState 
> *from, QTestState *to)
>  read_blocktime(to);
>  }
>  
> +if (args->finish_hook) {
> +args->finish_hook(from, to, args->postcopy_data);
> +args->postcopy_data = NULL;
> +}
> +
>  test_migrate_end(from, to, true);
>  }
>  
> -static void test_postcopy(void)
> +static void test_postcopy_common(MigrateCommon *args)
>  {
> -MigrateStart args = {};
>  QTestState *from, *to;
>  
> -if (migrate_postcopy_prepare(&from, &to, &args)) {
> +if (migrate_postcopy_prepare(&from, &to, args)) {
>  return;
>  }
>  migrate_postcopy_start(from, to);
> -migrate_postcopy_complete(from, to);
> +migrate_postcopy_complete(from, to, args);
> +}
> +
> +static void test_postcopy(void)
> +{
> +MigrateCommon args = { };
> +
> +test_postcopy_common(&args);
> +}
> +
> +static void test_postcopy_tls_psk(void)
> +{
> +MigrateCommon args = {
> +.start_hook = test_migrate_tls_psk_start_match,
> +.finish_hook = test_migrate_tls_psk_finish,
> +};
> +
> +test_postcopy_common(&args);
>  }
>  
>  static void test_postcopy_recovery(void)
>  {
> -MigrateStart args = {
> -.hide_stderr = true,
> +MigrateCommon args = {
> +.start = {
> +.hide_stderr = true,
> +},
>  };
>  QTestState *from, *to;
>  g_autofree char *uri = NULL;
> @@ -1167,7 +1198,7 @@ static void test_postcopy_recovery(void)
>  /* Restore the postcopy bandwidth to unlimited */
>  migrate_set_parameter_int(from, "max-postcopy-bandwidth", 0);
>  
> -migrate_postcopy_complete(from, to);
> +migrate_postcopy_complete(from, to, &args);
>  }
>  
>  static void test_baddest(void)
> @@ -2122,6 +2153,16 @@ int main(int argc, char **argv)
>  if (has_uffd) {
>  qtest_add_func("/migration/postcopy/unix", test_postcopy);
>  qtest_add_func("/migration/postcopy/recovery", 
> test_postcopy_recovery);
> +qtest_add_func("/migration/postcopy/plain", test_postcopy);
> +#ifdef CONFIG_GNUTLS
> +/*
> + * NOTE: psk test is enough for postcopy, as other types of TLS
> + * channels are tested under precopy.  Here what we want to test is 
> the
> + * general postcopy path that has TLS channel enabled.
> + */
> +qtest_add_func("/migration/postcopy/tls/psk", test_postcopy_tls_psk);
> +#endif /* CONFIG_GNUTLS */
> +qtest_add_func("/migration/postcopy/recovery", 
> test_postcopy_recovery);
>  }
>  qtest_add_func("/migration/bad_dest", test_baddest);
>  qtest_add_func("/migration/precopy/unix/plain", test_precopy_unix_plain);
> -- 
> 2.32.0
> 
-- 
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK




Re: [PULL 1/3] MAINTAINERS: Add myself as Guest Agent reviewer

2022-07-13 Thread Daniel P . Berrangé
On Wed, Jul 13, 2022 at 01:19:06PM +0300, Konstantin Kostiuk wrote:
> Signed-off-by: Konstantin Kostiuk 
> Message-Id: <20220712092715.2136898-1-kkost...@redhat.com>
> Reviewed-by: Philippe Mathieu-Daudé 
> Signed-off-by: Konstantin Kostiuk 
> ---
>  MAINTAINERS | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 450abd0252..b1e73d99f3 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -2880,6 +2880,7 @@ T: git https://repo.or.cz/qemu/armbru.git qapi-next
>  
>  QEMU Guest Agent
>  M: Michael Roth 
> +R: Konstantin Kostiuk 

This pull request contains functional changes under qga/, which
suggests you're acting as a (co-)maintainer for QGA, not merely
a reviewer. I wouldn't normally expect reviewers to send pull
requests for a subsystem. As such should this be "M:", to
indicate co-maintainership and have an explicit ACK from
Michael Roth.

>  S: Maintained
>  F: qga/
>  F: docs/interop/qemu-ga.rst


With regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|




Re: [PATCH v7 00/14] KVM: mm: fd-based approach for supporting KVM guest private memory

2022-07-13 Thread Gupta, Pankaj




This is the v7 of this series which tries to implement the fd-based KVM
guest private memory. The patches are based on latest kvm/queue branch
commit:

b9b71f43683a (kvm/queue) KVM: x86/mmu: Buffer nested MMU
split_desc_cache only by default capacity

Introduction

In general this patch series introduce fd-based memslot which provides
guest memory through memory file descriptor fd[offset,size] instead of
hva/size. The fd can be created from a supported memory filesystem
like tmpfs/hugetlbfs etc. which we refer as memory backing store. KVM


Thinking a bit: as the host-side fd on tmpfs or shmem will store memory in
the host page cache instead of mapping pages into a userspace address space,
can we hit a double (un-coordinated) page cache problem with this when the
guest page cache is also used?


This is my understanding: on the host it will indeed be in the page cache (in
the current shmem implementation), but that's just the way it allocates and
provides the physical memory for the guest. In the guest, the guest OS will
not see this fd (absolutely); it only sees guest memory, on top of which it
can build its own page cache system for its own file-mapped content, but
that is unrelated to the host page cache.


Yes. If the guest fills its page cache with file-backed memory, this will
also fill the host page cache (on the shmem fd backend) fast. This can have
an impact on the performance of guest VMs if the host reaches a memory
pressure situation sooner. Otherwise we end up utilizing far less system
RAM.


Thanks,
Pankaj
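
To make the host-side accounting concrete, a hypothetical illustration (not
from the series) of an fd-backed region: every page instantiated behind such
an fd lands in the host page cache, independent of any page cache the guest
builds on top of its own file-backed mappings.

#define _GNU_SOURCE
#include <sys/mman.h>
#include <unistd.h>

static int demo_guest_mem_fd(off_t size)
{
    int fd = memfd_create("guest-mem", MFD_CLOEXEC);

    if (fd < 0 || ftruncate(fd, size) < 0) {
        return -1;
    }
    /* A memslot would then reference fd[0, size) instead of an hva. */
    return fd;
}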





[PULL 3/3] qga: add command 'guest-get-cpustats'

2022-07-13 Thread Konstantin Kostiuk
From: zhenwei pi 

A vCPU thread always reaches 100% utilization when:
- the guest uses idle=poll
- HLT vm-exits are disabled
- MWAIT is enabled

Add a new guest agent command 'guest-get-cpustats' to get guest CPU
statistics, so we can know the guest workload and how busy the CPU is.

Reviewed-by: Marc-André Lureau 
Signed-off-by: zhenwei pi 
Message-Id: <20220707005602.696557-3-pizhen...@bytedance.com>
Reviewed-by: Konstantin Kostiuk 
Signed-off-by: Konstantin Kostiuk 
---
 qga/commands-posix.c | 89 
 qga/commands-win32.c |  6 +++
 qga/qapi-schema.json | 81 
 3 files changed, 176 insertions(+)

diff --git a/qga/commands-posix.c b/qga/commands-posix.c
index 0469dc409d..f18530d85f 100644
--- a/qga/commands-posix.c
+++ b/qga/commands-posix.c
@@ -2893,6 +2893,90 @@ GuestDiskStatsInfoList *qmp_guest_get_diskstats(Error 
**errp)
 return guest_get_diskstats(errp);
 }
 
+GuestCpuStatsList *qmp_guest_get_cpustats(Error **errp)
+{
+GuestCpuStatsList *head = NULL, **tail = &head;
+const char *cpustats = "/proc/stat";
+int clk_tck = sysconf(_SC_CLK_TCK);
+FILE *fp;
+size_t n;
+char *line = NULL;
+
+fp = fopen(cpustats, "r");
+if (fp == NULL) {
+error_setg_errno(errp, errno, "open(\"%s\")", cpustats);
+return NULL;
+}
+
+while (getline(&line, &n, fp) != -1) {
+GuestCpuStats *cpustat = NULL;
+GuestLinuxCpuStats *linuxcpustat;
+int i;
+unsigned long user, system, idle, iowait, irq, softirq, steal, guest;
+unsigned long nice, guest_nice;
+char name[64];
+
+i = sscanf(line, "%s %lu %lu %lu %lu %lu %lu %lu %lu %lu %lu",
+   name, &user, &nice, &system, &idle, &iowait, &irq, &softirq,
+   &steal, &guest, &guest_nice);
+
+/* drop "cpu 1 2 3 ...", get "cpuX 1 2 3 ..." only */
+if ((i == EOF) || strncmp(name, "cpu", 3) || (name[3] == '\0')) {
+continue;
+}
+
+if (i < 5) {
+slog("Parsing cpu stat from %s failed, see \"man proc\"", cpustats);
+break;
+}
+
+cpustat = g_new0(GuestCpuStats, 1);
+cpustat->type = GUEST_CPU_STATS_TYPE_LINUX;
+
+linuxcpustat = &cpustat->u.q_linux;
+linuxcpustat->cpu = atoi(&name[3]);
+linuxcpustat->user = user * 1000 / clk_tck;
+linuxcpustat->nice = nice * 1000 / clk_tck;
+linuxcpustat->system = system * 1000 / clk_tck;
+linuxcpustat->idle = idle * 1000 / clk_tck;
+
+if (i > 5) {
+linuxcpustat->has_iowait = true;
+linuxcpustat->iowait = iowait * 1000 / clk_tck;
+}
+
+if (i > 6) {
+linuxcpustat->has_irq = true;
+linuxcpustat->irq = irq * 1000 / clk_tck;
+linuxcpustat->has_softirq = true;
+linuxcpustat->softirq = softirq * 1000 / clk_tck;
+}
+
+if (i > 8) {
+linuxcpustat->has_steal = true;
+linuxcpustat->steal = steal * 1000 / clk_tck;
+}
+
+if (i > 9) {
+linuxcpustat->has_guest = true;
+linuxcpustat->guest = guest * 1000 / clk_tck;
+}
+
+if (i > 10) {
+linuxcpustat->has_guest = true;
+linuxcpustat->guest = guest * 1000 / clk_tck;
+linuxcpustat->has_guestnice = true;
+linuxcpustat->guestnice = guest_nice * 1000 / clk_tck;
+}
+
+QAPI_LIST_APPEND(tail, cpustat);
+}
+
+free(line);
+fclose(fp);
+return head;
+}
+
 #else /* defined(__linux__) */
 
 void qmp_guest_suspend_disk(Error **errp)
@@ -3247,6 +3331,11 @@ GuestDiskStatsInfoList *qmp_guest_get_diskstats(Error **errp)
 return NULL;
 }
 
+GuestCpuStatsList *qmp_guest_get_cpustats(Error **errp)
+{
+error_setg(errp, QERR_UNSUPPORTED);
+return NULL;
+}
 
 #endif /* CONFIG_FSFREEZE */
 
diff --git a/qga/commands-win32.c b/qga/commands-win32.c
index 36f94c0f9c..7ed7664715 100644
--- a/qga/commands-win32.c
+++ b/qga/commands-win32.c
@@ -2543,3 +2543,9 @@ GuestDiskStatsInfoList *qmp_guest_get_diskstats(Error **errp)
 error_setg(errp, QERR_UNSUPPORTED);
 return NULL;
 }
+
+GuestCpuStatsList *qmp_guest_get_cpustats(Error **errp)
+{
+error_setg(errp, QERR_UNSUPPORTED);
+return NULL;
+}
diff --git a/qga/qapi-schema.json b/qga/qapi-schema.json
index 9fa20e791b..869399ea1a 100644
--- a/qga/qapi-schema.json
+++ b/qga/qapi-schema.json
@@ -1576,3 +1576,84 @@
 { 'command': 'guest-get-diskstats',
   'returns': ['GuestDiskStatsInfo']
 }
+
+##
+# @GuestCpuStatsType:
+#
+# An enumeration of OS type
+#
+# Since: 7.1
+##
+{ 'enum': 'GuestCpuStatsType',
+  'data': [ 'linux' ] }
+
+
+##
+# @GuestLinuxCpuStats:
+#
+# CPU statistics of Linux
+#
+# @cpu: CPU index in guest OS
+#
+# @user: Time spent in user mode
+#
+# @nice: Time spent in user mode with low priority (nice)
+#
+# @system: Time spent in system mode
+#
+# @idle: Time spent in the idle task
+#
+# @iowait: Time waiting for I/O to complete (since Linux 2.5.41)
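
For reference, the parser above consumes per-CPU /proc/stat lines whose
fields are in USER_HZ ticks; the "* 1000 / clk_tck" scaling converts them to
milliseconds. A sample line and the conversion (the numbers are
illustrative):

#include <unistd.h>

/* Example /proc/stat input (values in USER_HZ ticks):
 *   cpu0 1393280 32966 572056 13343292 6130 0 17875 0 23933 0
 */
static unsigned long ticks_to_ms(unsigned long ticks)
{
    long clk_tck = sysconf(_SC_CLK_TCK);   /* typically 100 on Linux */

    return ticks * 1000 / clk_tck;
}
/* e.g. user = 1393280 ticks -> 13932800 ms with _SC_CLK_TCK == 100 */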

[PULL 0/3] Guest Agent patches 2022-07-13

2022-07-13 Thread Konstantin Kostiuk
The following changes since commit 08c8a31214e8ca29e05b9f6c3ee942b28ec58457:

  Merge tag 'pull-tcg-20220712' of https://gitlab.com/rth7680/qemu into staging 
(2022-07-12 11:52:11 +0530)

are available in the Git repository at:

  g...@github.com:kostyanf14/qemu.git tags/qga-win32-pull-2022-07-13

for you to fetch changes up to 1db8a0b0ea2fb72ecab36bd3143a9715c083d5d3:

  qga: add command 'guest-get-cpustats' (2022-07-13 12:19:18 +0300)


qga-win32-pull-2022-07-13


Konstantin Kostiuk (1):
  MAINTAINERS: Add myself as Guest Agent reviewer

Zhenwei Pi (2):
  qapi: Avoid generating C identifier 'linux'
  qga: add command 'guest-get-cpustats'

 MAINTAINERS|  1 +
 qga/commands-posix.c   | 89 ++
 qga/commands-win32.c   |  6 
 qga/qapi-schema.json   | 81 +
 scripts/qapi/common.py |  2 +-
 5 files changed, 178 insertions(+), 1 deletion(-)


--
2.25.1




[PULL 2/3] qapi: Avoid generating C identifier 'linux'

2022-07-13 Thread Konstantin Kostiuk
From: zhenwei pi 

'linux' is not usable as identifier, because C compilers targeting
Linux predefine it as a macro expanding to 1.  Add it to
@polluted_words. 'unix' is already there.

Suggested-by: Markus Armbruster 
Reviewed-by: Marc-André Lureau 
Signed-off-by: zhenwei pi 
Message-Id: <20220707005602.696557-2-pizhen...@bytedance.com>
Reviewed-by: Markus Armbruster 
Signed-off-by: Konstantin Kostiuk 
---
 scripts/qapi/common.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/scripts/qapi/common.py b/scripts/qapi/common.py
index 489273574a..737b059e62 100644
--- a/scripts/qapi/common.py
+++ b/scripts/qapi/common.py
@@ -114,7 +114,7 @@ def c_name(name: str, protect: bool = True) -> str:
  'and', 'and_eq', 'bitand', 'bitor', 'compl', 'not',
  'not_eq', 'or', 'or_eq', 'xor', 'xor_eq'])
 # namespace pollution:
-polluted_words = set(['unix', 'errno', 'mips', 'sparc', 'i386'])
+polluted_words = set(['unix', 'errno', 'mips', 'sparc', 'i386', 'linux'])
 name = re.sub(r'[^A-Za-z0-9_]', '_', name)
 if protect and (name in (c89_words | c99_words | c11_words | gcc_words
  | cpp_words | polluted_words)
-- 
2.25.1



