RE: [PATCH 5/7] migration/multifd: Add UADK based compression and decompression

2024-06-06 Thread Shameerali Kolothum Thodi via



> -----Original Message-----
> From: Fabiano Rosas 
> Sent: Wednesday, June 5, 2024 7:57 PM
> To: Shameerali Kolothum Thodi ;
> pet...@redhat.com; yuan1@intel.com
> Cc: qemu-devel@nongnu.org; Linuxarm ; linwenkai
> (C) ; zhangfei@linaro.org; huangchenghai
> 
> Subject: Re: [PATCH 5/7] migration/multifd: Add UADK based compression
> and decompression
> 
> Shameer Kolothum via  writes:
> 
> > Uses UADK wd_do_comp_sync() API to (de)compress a normal page using
> > hardware accelerator.
> >
> > Signed-off-by: Shameer Kolothum
> 
> 
> A couple of comments below.
> 
> Reviewed-by: Fabiano Rosas 
> > ---
> >  migration/multifd-uadk.c | 132
> ++-
> >  1 file changed, 130 insertions(+), 2 deletions(-)
> >
> > diff --git a/migration/multifd-uadk.c b/migration/multifd-uadk.c
> > index 3172e4d5ca..3329819bd4 100644
> > --- a/migration/multifd-uadk.c
> > +++ b/migration/multifd-uadk.c
> > @@ -13,6 +13,7 @@
> >  #include "qemu/osdep.h"
> >  #include "qemu/module.h"
> >  #include "qapi/error.h"
> > +#include "exec/ramblock.h"
> >  #include "migration.h"
> >  #include "multifd.h"
> >  #include "options.h"
> > @@ -140,6 +141,15 @@ static void
> multifd_uadk_send_cleanup(MultiFDSendParams *p, Error **errp)
> >  p->compress_data = NULL;
> >  }
> >
> > +static inline void prepare_next_iov(MultiFDSendParams *p, void *base,
> > +uint32_t len)
> > +{
> > +p->iov[p->iovs_num].iov_base = (uint8_t *)base;
> > +p->iov[p->iovs_num].iov_len = len;
> > +p->next_packet_size += len;
> > +p->iovs_num++;
> > +}
> > +
> >  /**
> >   * multifd_uadk_send_prepare: prepare data to be able to send
> >   *
> > @@ -153,7 +163,56 @@ static void
> multifd_uadk_send_cleanup(MultiFDSendParams *p, Error **errp)
> >   */
> >  static int multifd_uadk_send_prepare(MultiFDSendParams *p, Error
> **errp)
> >  {
> > -return -1;
> > +struct wd_data *uadk_data = p->compress_data;
> > +uint32_t hdr_size;
> > +uint8_t *buf = uadk_data->buf;
> > +int ret = 0;
> > +
> > +if (!multifd_send_prepare_common(p)) {
> > +goto out;
> > +}
> > +
> > +hdr_size = p->pages->normal_num * sizeof(uint32_t);
> > +/* prepare the header that stores the lengths of all compressed data */
> > +prepare_next_iov(p, uadk_data->buf_hdr, hdr_size);
> > +
> > +for (int i = 0; i < p->pages->normal_num; i++) {
> > +struct wd_comp_req creq = {
> > +.op_type = WD_DIR_COMPRESS,
> > +.src = p->pages->block->host + p->pages->offset[i],
> > +.src_len = p->page_size,
> > +.dst = buf,
> > +/* Set dst_len to double the src to take care of -ve 
> > compression */
> 
> What's -ve compression?

I just meant the case where the compressed output is larger than the input. I can reword this.
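
To illustrate that case, here is a minimal, self-contained sketch of the per-page choice the patch makes: send the compressed bytes only when they are strictly smaller than the page, otherwise send the raw page. The names payload_t and choose_payload() are hypothetical and exist only for this illustration; they are not part of the patch or of UADK.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper type: what ends up in the next iov entry. */
typedef struct {
    const uint8_t *data;
    uint32_t len;
} payload_t;

/*
 * Pick the payload for one page. The compressed output is used only
 * when it is strictly smaller than the page; if the compressor
 * produced output >= page_size, the raw page is sent instead.
 */
static payload_t choose_payload(const uint8_t *page, uint32_t page_size,
                                const uint8_t *comp, uint32_t comp_len)
{
    assert(page != NULL && comp != NULL && page_size > 0);

    if (comp_len < page_size) {
        return (payload_t){ .data = comp, .len = comp_len };
    }
    return (payload_t){ .data = page, .len = page_size };
}
```

Since a raw page is recorded with a length equal to page_size, the receiving side can presumably tell from the length header alone whether decompression is needed.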

> 
> > +.dst_len = p->page_size * 2,
> > +};
> > +
> > +ret = wd_do_comp_sync(uadk_data->handle, &creq);
> > +if (ret || creq.status) {
> > +error_setg(errp, "multifd %u: failed wd_do_comp_sync, ret %d
> status %d",
> > +   p->id, ret, creq.status);
> > +return -1;
> > +}
> > +if (creq.dst_len < p->page_size) {
> > +uadk_data->buf_hdr[i] = cpu_to_be32(creq.dst_len);
> > +prepare_next_iov(p, buf, creq.dst_len);
> > +buf += creq.dst_len;
> > +} else {
> > +/*
> > + * Send raw data if compressed out >= page_size. We might be
> better
> > + * off sending raw data if output is slightly less than 
> > page_size
> > + * as well because at the receive end we can skip the
> decompression.
> > + * But it is tricky to find the right number here.
> > + */
> > +uadk_data->buf_hdr[i] = cpu_to_be32(p->page_size);
> > +prepare_next_iov(p, p->pages->block->host + 
> > p->pages->offset[i],
> > + p->page_size);
> > +buf += p->page_size;
> > +}
> > +}
> > +out:
> > +p->flags |= MULTIFD_FLAG_UADK;
> > +multifd_send_fill_pa

RE: [PATCH 4/7] migration/multifd: Add UADK initialization

2024-06-05 Thread Shameerali Kolothum Thodi via



> -----Original Message-----
> From: Fabiano Rosas 
> Sent: Wednesday, June 5, 2024 3:58 PM
> To: Shameerali Kolothum Thodi ;
> pet...@redhat.com; yuan1@intel.com
> Cc: qemu-devel@nongnu.org; Linuxarm ; linwenkai
> (C) ; zhangfei@linaro.org; huangchenghai
> 
> Subject: Re: [PATCH 4/7] migration/multifd: Add UADK initialization
> 
> Shameer Kolothum via  writes:
> 
> > Initialize UADK session and allocate buffers required. The actual
> > compression/decompression will only be done in a subsequent patch.
> >
> > Signed-off-by: Shameer Kolothum
> 
> > ---
> >  migration/multifd-uadk.c | 207
> ++-
> >  1 file changed, 206 insertions(+), 1 deletion(-)
> >
> > diff --git a/migration/multifd-uadk.c b/migration/multifd-uadk.c
> > index c2bb07535b..3172e4d5ca 100644
> > --- a/migration/multifd-uadk.c
> > +++ b/migration/multifd-uadk.c
> > @@ -12,9 +12,214 @@
> >
> >  #include "qemu/osdep.h"
> >  #include "qemu/module.h"
> > +#include "qapi/error.h"
> > +#include "migration.h"
> > +#include "multifd.h"
> > +#include "options.h"
> > +#include "uadk/wd_comp.h"
> > +#include "uadk/wd_sched.h"
> > +
> > +struct wd_data {
> > +handle_t handle;
> > +uint8_t *buf;
> > +uint32_t *buf_hdr;
> > +};
> > +
> > +static bool uadk_hw_initialised(void)
> 
> The first time this is called it will actually do the initialization,
> no? If so, it should be uadk_hw_init().

Ok. Makes sense.

> 
> > +{
> > +char alg[] = "zlib";
> > +int ret;
> > +
> > +ret = wd_comp_init2(alg, SCHED_POLICY_RR, TASK_HW);
> > +if (ret && ret != -WD_EEXIST) {
> > +return false;
> > +} else {
> > +return true;
> > +}
> > +}
> > +
> > +static struct wd_data *multifd_uadk_init_sess(uint32_t count,
> > +  uint32_t page_size,
> > +  bool compress, Error **errp)
> > +{
> > +struct wd_comp_sess_setup ss = {0};
> > +struct sched_params param = {0};
> > +uint32_t size = count * page_size;
> > +struct wd_data *wd;
> > +
> > +if (!uadk_hw_initialised()) {
> > +error_setg(errp, "multifd: UADK hardware not available");
> 
> Does the lib provide a software fallback path that we could use like QPL
> does?

Unfortunately not. That is why I added patch #6, where we just send
raw data, to take care of the CI test.

> 
> > +return NULL;
> > +}
> > +
> > +wd = g_new0(struct wd_data, 1);
> > +ss.alg_type = WD_ZLIB;
> > +if (compress) {
> > +ss.op_type = WD_DIR_COMPRESS;
> > +/* Add an additional page for handling output > input */
> > +size += page_size;
> > +} else {
> > +ss.op_type = WD_DIR_DECOMPRESS;
> > +}
> > +param.type = ss.op_type;
> > +ss.sched_param = &param;
> 
> What about window size and compression level? Don't we need to set them
> here? What do they default to?

They default to compression level 1 and a 4K window size. I will add a comment here.
 
> > +
> > +wd->handle = wd_comp_alloc_sess(&ss);
> > +if (!wd->handle) {
> > +error_setg(errp, "multifd: failed wd_comp_alloc_sess");
> > +goto out;
> > +}
> > +
> > +wd->buf = g_try_malloc(size);
> > +if (!wd->buf) {
> > +error_setg(errp, "multifd: out of mem for uadk buf");
> > +goto out_free_sess;
> > +}
> > +wd->buf_hdr = g_new0(uint32_t, count);
> > +return wd;
> > +
> > +out_free_sess:
> > +wd_comp_free_sess(wd->handle);
> > +out:
> > +wd_comp_uninit2();
> > +g_free(wd);
> > +return NULL;
> > +}
> > +
> > +static void multifd_uadk_uninit_sess(struct wd_data *wd)
> > +{
> > +wd_comp_free_sess(wd->handle);
> > +wd_comp_uninit2();
> > +g_free(wd->buf);
> > +g_free(wd->buf_hdr);
> > +g_free(wd);
> > +}
> > +
> > +/**
> > + * multifd_uadk_send_setup: setup send side
> > + *
> > + * Returns 0 for success or -1 for error
> > + *
> > + * @p: Params for the channel that we are using
> > + * @errp: pointer to an error
> > + */
> > +static int multifd_uadk_send_setup(MultiFDSendParams *p, Error **errp)
> > +{
>

RE: [PATCH 1/7] docs/migration: add uadk compression feature

2024-05-30 Thread Shameerali Kolothum Thodi via


> -----Original Message-----
> From: Liu, Yuan1 
> Sent: Thursday, May 30, 2024 2:25 PM
> To: Shameerali Kolothum Thodi ;
> pet...@redhat.com; faro...@suse.de
> Cc: qemu-devel@nongnu.org; Linuxarm ; linwenkai (C)
> ; zhangfei@linaro.org; huangchenghai
> 
> Subject: RE: [PATCH 1/7] docs/migration: add uadk compression feature
> 
> > -----Original Message-----
> > From: Shameer Kolothum 
> > Sent: Wednesday, May 29, 2024 5:44 PM
> > To: pet...@redhat.com; faro...@suse.de; Liu, Yuan1 
> > Cc: qemu-devel@nongnu.org; linux...@huawei.com;
> linwenk...@hisilicon.com;
> > zhangfei@linaro.org; huangchengh...@huawei.com
> > Subject: [PATCH 1/7] docs/migration: add uadk compression feature

[...]

> > +Since UADK uses Shared Virtual Addressing(SVA) and device access virtual
> > memory
> > +directly it is possible that SMMUv3 may encounter page faults while
> > walking the
> > +IO page tables. This may impact the performance. In order to mitigate
> > this,
> > +please make sure to specify ``-mem-prealloc`` parameter to the
> > destination VM
> > +boot parameters.
> 
> Thank you so much for putting the IAA solution at the top and cc me.
> 
> I think migration performance will be better with '-mem-prealloc' option,
> but I am considering whether '-mem-prealloc' is a mandatory option, from my
> experience, SVA performance drops mainly caused by IOTLB flush and IO page
> fault,
> I had some discussions with Peter Xu about the IOTLB flush issue, and it has
> been improved.
> https://patchew.org/QEMU/PH7PR11MB5941F04FBFB964CB2C968866A33E2@
> PH7PR11MB5941.namprd11.prod.outlook.com/

Thanks for the link. Yes, I have seen that discussion, and this series is based
on top of that patch, which avoids the zero page read fault.

> 
> For IO page fault, the QPL(IAA userspace library) can process page fault
> request instead of IOMMU,

Sorry, I didn't fully understand this part. If a page fault happens, how can
the library handle it without the IOMMU? Or did you mean that the library will
prefetch the memory beforehand to avoid the page fault?

> it means we can disable the I/O page fault feature
> on the IAA device, and let the device still use SVA technology to avoid memory
> copy.
> 
> I will provide the test results in my next version, do you have any ideas or
> suggestions about this, thanks.

I think our UADK test tool had an option to prefetch the memory (write some
random data to it) to avoid the page fault penalty. I am not sure whether that
is exposed through the API or not. I will check with our UADK team.
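
As a rough illustration of what such prefetching could look like on the CPU side, here is a minimal sketch that touches every page of a buffer so it is mapped before a device using SVA accesses it. This is only an assumption about the general technique; it is not the UADK test tool's code, and prefault_buffer() is a hypothetical name. Unlike writing random data, it rewrites each byte with its own value so the contents are preserved.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Touch one byte per page_size stride (plus the last byte) with a
 * read followed by a write of the same value. The volatile qualifier
 * keeps the compiler from dropping the accesses. Returns the number
 * of stride touches performed.
 */
static size_t prefault_buffer(volatile uint8_t *buf, size_t len,
                              size_t page_size)
{
    size_t touched = 0;

    assert(page_size > 0);
    if (len == 0) {
        return 0;
    }
    assert(buf != NULL);

    for (size_t off = 0; off < len; off += page_size) {
        buf[off] = buf[off];
        touched++;
    }
    /* Covers the final page when buf is not page-aligned. */
    buf[len - 1] = buf[len - 1];

    return touched;
}
```

Whether this removes the need for ``-mem-prealloc`` in practice would have to be measured; the sketch only shows the touching loop.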

Please do CC me when you post your next revision.

Thanks,
Shameer


RE: [PATCH 3/7] migration/multifd: add uadk compression framework

2024-05-30 Thread Shameerali Kolothum Thodi via



> -----Original Message-----
> From: Markus Armbruster 
> Sent: Wednesday, May 29, 2024 12:11 PM
> To: Shameer Kolothum via 
> Cc: pet...@redhat.com; faro...@suse.de; yuan1....@intel.com; Shameerali
> Kolothum Thodi ; Linuxarm
> ; linwenkai (C) ;
> zhangfei@linaro.org; huangchenghai 
> Subject: Re: [PATCH 3/7] migration/multifd: add uadk compression
> framework
> 
> Please cc: maintainers on patches.  You can use
> scripts/get_maintainer.pl to find them.

Sure. My bad.
> 
> Shameer Kolothum via  writes:
> 
> > Adds the skeleton to support uadk compression method.
> > Complete functionality will be added in subsequent patches.
> >
> > Signed-off-by: Shameer Kolothum
> 
> 
> [...]
> 
> > diff --git a/qapi/migration.json b/qapi/migration.json
> > index 854e8609bd..0eaea9b0c3 100644
> > --- a/qapi/migration.json
> > +++ b/qapi/migration.json
> > @@ -632,12 +632,15 @@
> >  #   the deflate compression algorithm and use the Intel In-Memory
> Analytics
> >  #   Accelerator(IAA) accelerated compression and decompression. (Since
> 9.1)
> >  #
> > +# @uadk: use UADK library compression method. (Since 9.1)
> 
> Two spaces after '.' for consistency, please.

Ok.

> > +#
> >  # Since: 5.0
> >  ##
> >  { 'enum': 'MultiFDCompression',
> >'data': [ 'none', 'zlib',
> >  { 'name': 'zstd', 'if': 'CONFIG_ZSTD' },
> > -{ 'name': 'qpl', 'if': 'CONFIG_QPL' } ] }
> > +{ 'name': 'qpl', 'if': 'CONFIG_QPL' },
> > +{ 'name': 'uadk', 'if': 'CONFIG_UADK' } ] }
> >
> >  ##
> >  # @MigMode:
> 
> QAPI schema
> Acked-by: Markus Armbruster 

Thanks,
Shameer



RE: [RFC PATCH v2 00/12] Confidential guest-assisted live migration

2023-09-05 Thread Shameerali Kolothum Thodi via


> -----Original Message-----
> From: Qemu-devel
> [mailto:qemu-devel-bounces+shameerali.kolothum.thodi=huawei.com@nong
> nu.org] On Behalf Of Dov Murik
> Sent: 23 August 2021 15:16
> To: qemu-devel@nongnu.org
> Cc: Tom Lendacky ; Ashish Kalra
> ; Brijesh Singh ; Michael
> S. Tsirkin ; Steve Rutherford ;
> James Bottomley ; Juan Quintela
> ; Dr. David Alan Gilbert ; Dov
> Murik ; Hubertus Franke ;
> Tobin Feldman-Fitzthum ; Paolo Bonzini
> 
> Subject: [RFC PATCH v2 00/12] Confidential guest-assisted live migration
> 
> This is an RFC series for fast migration of confidential guests using an
> in-guest migration helper that lives in OVMF.  QEMU VM live migration
> needs to read source VM's RAM and write it in the target VM; this
> mechanism doesn't work when the guest memory is encrypted or QEMU is
> prevented from reading it in another way.  In order to support live
> migration in such scenarios, we introduce an in-guest migration helper
> which can securely extract RAM content from the guest in order to send
> it to the target.  The migration helper is implemented as part of the
> VM's firmware in OVMF.
> 
> We've implemented and tested this on AMD SEV, but expect most of the
> processes can be used with other technologies that prevent direct access
> of hypervisor to the guest's memory.  Specifically, we don't use SEV's
> PSP migration commands (SEV_SEND_START, SEV_RECEIVE_START, etc) at all;
> but note that the mirror VM relies on
> KVM_CAP_VM_COPY_ENC_CONTEXT_FROM
> to share the SEV ASID with the main VM.

Hi Dov,

Sorry if I missed it, but I just wanted to check whether there are any updates
to, or a revised version of, this series. This guest-assisted method seems to
be a good generic approach for live migration, and I am wondering whether it is
worth looking at for ARM CCA as well (I am not sure whether the ARM RMM spec
will have any specific proposal for live migration, but I couldn't find
anything public yet).

Please let me know if you plan to re-spin or if there are any concerns with
this approach. I would appreciate it if you could point me to any relevant
discussion threads.

Thanks,
Shameer

> 
> Corresponding RFC patches for OVMF have been posted by Tobin
> Feldman-Fitzthum on edk2-devel [1].  Those include the crux of the
> migration helper: a mailbox protocol over a shared memory page which
> allows communication between QEMU and the migration helper.  In the
> source VM this is used to read a page and encrypt it for transport; in
> the target it is used to decrypt the incoming page and storing the
> content in the correct address in the guest memory.  All encryption and
> decryption operations occur inside the trusted context in the VM, and
> therefore the VM's memory plaintext content is never accessible to the
> hosts participating in the migration.
> 
> In order to allow OVMF to run the migration helper in parallel to the
> guest OS, we use a mirror VM [3], which shares the same memory mapping
> and SEV ASID as the main VM but has its own run loop.  To start the
> mirror vcpu and the migration handler, we added a temporary
> start-migration-handler QMP command; this will be removed in a future
> version to run as part of the migrate QMP command.
> 
> In the target VM we need the migration handler running to receive
> incoming RAM pages; to achieve that, we boot the VM into OVMF with a
> special fw_cfg value that causes OVMF to not boot the guest OS; we then
> allow QEMU to receive an incoming migration by issuing a new
> start-migrate-incoming QMP command.
> 
> The confidential RAM migration requires checking whether a given guest
> RAM page is encrypted or not.  This is achieved using SEV shared regions
> list tracking, which is implemented as part the SEV live migration patch
> series [2].  This feature tracks hypercalls from OVMF and guest Linux to
> report changes of page encryption status so that QEMU has an up-to-date
> view of which memory regions are shared and which are encrypted.
> 
> We left a few unfinished edges in this RFC but decided to publish it to
> start the community discussion.  TODOs:
> 
> 1. QMP commands start-migration-handler and start-migrate-incoming are
>developer tools and should be performed automatically.
> 2. The entry point address of the in-guest migration handler and its GDT
>are currently hard-coded in QEMU (patch 8); instead they should be
>discovered using pc_system_ovmf_table_find.  Same applies for the
>mailbox address (patch 1).
> 3. For simplicity, this patch series forces the use of the
>guest-assisted migration instead of the SEV PSP-based migration.
>Ideally we might want the user to choose the desired mode using
>migrate-set-parameters or a similar mechanism.
> 4. There is currently no discovery protocol between QEMU and OVMF to
>verify that OVMF indeed supports in-guest migration handler.
> 
> 
> List of patches in this series:
> 
> 1-3: introduce new confidential RAM migration functions which communicate
>  with the migration helper.
> 4-6: 

RE: [PATCH v3] arm/kvm: Enable support for KVM_CAP_ARM_EAGER_SPLIT_CHUNK_SIZE

2023-09-05 Thread Shameerali Kolothum Thodi via


> -----Original Message-----
> From: Gavin Shan [mailto:gs...@redhat.com]
> Sent: 31 August 2023 02:43
> To: Shameerali Kolothum Thodi ;
> qemu-devel@nongnu.org; qemu-...@nongnu.org
> Cc: peter.mayd...@linaro.org; ricar...@google.com; Jonathan Cameron
> ; k...@vger.kernel.org; Linuxarm
> 
> Subject: Re: [PATCH v3] arm/kvm: Enable support for
> KVM_CAP_ARM_EAGER_SPLIT_CHUNK_SIZE
> 
> Hi Shameer,
> 
> On 8/30/23 21:48, Shameer Kolothum wrote:
> > Now that we have Eager Page Split support added for ARM in the kernel,
> > enable it in Qemu. This adds,
> >   -eager-split-size to -accel sub-options to set the eager page split chunk
> size.
> >   -enable KVM_CAP_ARM_EAGER_SPLIT_CHUNK_SIZE.
> >
> > The chunk size specifies how many pages to break at a time, using a
> > single allocation. Bigger the chunk size, more pages need to be
> > allocated ahead of time.
> >
> > Signed-off-by: Shameer Kolothum
> 
> > ---
> > v2:
> https://lore.kernel.org/qemu-devel/20230815092709.1290-1-shameerali.kol
> othum.th...@huawei.com/
> > -Addressed comments from Gavin(Thanks).
> > RFC v1:
> https://lore.kernel.org/qemu-devel/20230725150002.621-1-shameerali.kolo
> thum.th...@huawei.com/
> >-Updated qemu-options.hx with description
> >-Addressed review comments from Peter and Gavin(Thanks).
> > ---
> >   accel/kvm/kvm-all.c  |  1 +
> >   include/sysemu/kvm_int.h |  1 +
> >   qemu-options.hx  | 15 +
> >   target/arm/kvm.c | 68
> 
> >   4 files changed, 85 insertions(+)
> >
> 
> One more question below. Please check if it's worthy to be addressed in v4,
> needed
> to resolved other comments. Otherwise, it looks fine to me.
> 
> Reviewed-by: Gavin Shan 

Thanks. I will send out a v4 with the above tag and the suggestion below to
get rid of kvm_arm_eager_split_size_valid().

Shameer.

> 
> > diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
> > index 2ba7521695..ff1578bb32 100644
> > --- a/accel/kvm/kvm-all.c
> > +++ b/accel/kvm/kvm-all.c
> > @@ -3763,6 +3763,7 @@ static void kvm_accel_instance_init(Object *obj)
> >   /* KVM dirty ring is by default off */
> >   s->kvm_dirty_ring_size = 0;
> >   s->kvm_dirty_ring_with_bitmap = false;
> > +s->kvm_eager_split_size = 0;
> >   s->notify_vmexit = NOTIFY_VMEXIT_OPTION_RUN;
> >   s->notify_window = 0;
> >   s->xen_version = 0;
> > diff --git a/include/sysemu/kvm_int.h b/include/sysemu/kvm_int.h
> > index 511b42bde5..a5b9122cb8 100644
> > --- a/include/sysemu/kvm_int.h
> > +++ b/include/sysemu/kvm_int.h
> > @@ -116,6 +116,7 @@ struct KVMState
> >   uint64_t kvm_dirty_ring_bytes;  /* Size of the per-vcpu dirty ring
> */
> >   uint32_t kvm_dirty_ring_size;   /* Number of dirty GFNs per ring
> */
> >   bool kvm_dirty_ring_with_bitmap;
> > +uint64_t kvm_eager_split_size;  /* Eager Page Splitting chunk size */
> >   struct KVMDirtyRingReaper reaper;
> >   NotifyVmexitOption notify_vmexit;
> >   uint32_t notify_window;
> > diff --git a/qemu-options.hx b/qemu-options.hx
> > index 29b98c3d4c..2e70704ee8 100644
> > --- a/qemu-options.hx
> > +++ b/qemu-options.hx
> > @@ -186,6 +186,7 @@ DEF("accel", HAS_ARG, QEMU_OPTION_accel,
> >   "split-wx=on|off (enable TCG split w^x
> mapping)\n"
> >   "tb-size=n (TCG translation block cache size)\n"
> >   "dirty-ring-size=n (KVM dirty ring GFN count,
> default 0)\n"
> > +"eager-split-size=n (KVM Eager Page Split chunk
> size, default 0, disabled. ARM only)\n"
> >   "
> notify-vmexit=run|internal-error|disable,notify-window=n (enable notify VM
> exit and set notify window, x86 only)\n"
> >   "thread=single|multi (enable multi-threaded
> TCG)\n", QEMU_ARCH_ALL)
> >   SRST
> > @@ -244,6 +245,20 @@ SRST
> >   is disabled (dirty-ring-size=0).  When enabled, KVM will
> instead
> >   record dirty pages in a bitmap.
> >
> > +``eager-split-size=n``
> > +KVM implements dirty page logging at the PAGE_SIZE granularity
> and
> > +enabling dirty-logging on a huge-page requires breaking it into
> > +PAGE_SIZE pages in the first place. KVM on ARM does this
> splitting
> > +lazily by default. There are performance benefits in doing
> huge-page
> > +split eagerly,

RE: [PATCH v2] arm/kvm: Enable support for KVM_CAP_ARM_EAGER_SPLIT_CHUNK_SIZE

2023-08-30 Thread Shameerali Kolothum Thodi via
> -----Original Message-----
> From: Gavin Shan [mailto:gs...@redhat.com]
> Sent: 28 August 2023 01:02
> To: Shameerali Kolothum Thodi ;
> qemu-devel@nongnu.org; qemu-...@nongnu.org
> Cc: peter.mayd...@linaro.org; ricar...@google.com; k...@vger.kernel.org;
> Jonathan Cameron ; Linuxarm
> 
> Subject: Re: [PATCH v2] arm/kvm: Enable support for
> KVM_CAP_ARM_EAGER_SPLIT_CHUNK_SIZE
> 
> Hi Shameer,

Hi Gavin,

Agree with all the comments. Will send out a v3 soon.

Thanks,
Shameer
 
> On 8/15/23 19:27, Shameer Kolothum wrote:
> > Now that we have Eager Page Split support added for ARM in the kernel,
> > enable it in Qemu. This adds,
> >   -eager-split-size to -accel sub-options to set the eager page split chunk
> size.
> >   -enable KVM_CAP_ARM_EAGER_SPLIT_CHUNK_SIZE.
> >
> > The chunk size specifies how many pages to break at a time, using a
> > single allocation. Bigger the chunk size, more pages need to be
> > allocated ahead of time.
> >
> > Signed-off-by: Shameer Kolothum
> 
> > ---
> > RFC v1:
> https://lore.kernel.org/qemu-devel/20230725150002.621-1-shameerali.kolo
> thum.th...@huawei.com/
> >-Updated qemu-options.hx with description
> >-Addressed review comments from Peter and Gavin(Thanks).
> > ---
> >   include/sysemu/kvm_int.h |  1 +
> >   qemu-options.hx  | 14 +
> >   target/arm/kvm.c | 62
> 
> >   3 files changed, 77 insertions(+)
> >
> > diff --git a/include/sysemu/kvm_int.h b/include/sysemu/kvm_int.h index
> > 511b42bde5..03a1660d40 100644
> > --- a/include/sysemu/kvm_int.h
> > +++ b/include/sysemu/kvm_int.h
> > @@ -116,6 +116,7 @@ struct KVMState
> >   uint64_t kvm_dirty_ring_bytes;  /* Size of the per-vcpu dirty ring
> */
> >   uint32_t kvm_dirty_ring_size;   /* Number of dirty GFNs per ring
> */
> >   bool kvm_dirty_ring_with_bitmap;
> > +uint64_t kvm_eager_split_size; /* Eager Page Splitting chunk size
> > + */
> 
> One more space is needed before the comments, to have same alignment as
> we had. Besides, it needs to be initialized to zero in
> kvm-all.c::kvm_accel_instance_init()
> as we're doing for @kvm_dirty_ring_size.
> 
> >   struct KVMDirtyRingReaper reaper;
> >   NotifyVmexitOption notify_vmexit;
> >   uint32_t notify_window;
> > diff --git a/qemu-options.hx b/qemu-options.hx index
> > 29b98c3d4c..6ef7b89013 100644
> > --- a/qemu-options.hx
> > +++ b/qemu-options.hx
> > @@ -186,6 +186,7 @@ DEF("accel", HAS_ARG, QEMU_OPTION_accel,
> >   "split-wx=on|off (enable TCG split w^x
> mapping)\n"
> >   "tb-size=n (TCG translation block cache size)\n"
> >   "dirty-ring-size=n (KVM dirty ring GFN count,
> default 0)\n"
> > +"eager-split-size=n (KVM Eager Page Split chunk
> size, default 0, disabled. ARM only)\n"
> >   "
> notify-vmexit=run|internal-error|disable,notify-window=n (enable notify VM
> exit and set notify window, x86 only)\n"
> >   "thread=single|multi (enable multi-threaded
> TCG)\n", QEMU_ARCH_ALL)
> >   SRST
> > @@ -244,6 +245,19 @@ SRST
> >   is disabled (dirty-ring-size=0).  When enabled, KVM will
> instead
> >   record dirty pages in a bitmap.
> >
> > +``eager-split-size=n``
> > +KVM implements dirty page logging at the PAGE_SIZE granularity
> and
> > +enabling dirty-logging on a huge-page requires breaking it into
> > +PAGE_SIZE pages in the first place. KVM on ARM does this
> splitting
> > +lazily by default. There are performance benefits in doing
> huge-page
> > +split eagerly, especially in situations where TLBI costs associated
> > +with break-before-make sequences are considerable and also if
> guest
> > +workloads are read intensive. The size here specifies how many
> pages
> > +to break at a time and needs to be a valid block page size(eg:
> 4KB |
> > +2M | 1G when PAGE_SIZE is 4K). Be wary of specifying a higher
> size as
> > +it will have an impact on the memory. By default, this feature is
> > +disabled (eager-split-size=0).
> > +
> 
> Since 64KB base page size is another popular option, it's worthy to mention
> the supported block sizes for 64KB base page size. I'm not sure about 16KB
> though.
> For this, the comments can be improved as below if you agree. With the
> improvement, 

RE: [RFC PATCH] arm/kvm: Enable support for KVM_CAP_ARM_EAGER_SPLIT_CHUNK_SIZE

2023-08-07 Thread Shameerali Kolothum Thodi via


> -----Original Message-----
> From: Gavin Shan [mailto:gs...@redhat.com]
> Sent: 07 August 2023 06:53
> To: Shameerali Kolothum Thodi ;
> qemu-devel@nongnu.org; qemu-...@nongnu.org
> Cc: peter.mayd...@linaro.org; ricar...@google.com; k...@vger.kernel.org;
> Jonathan Cameron ; Linuxarm
> 
> Subject: Re: [RFC PATCH] arm/kvm: Enable support for
> KVM_CAP_ARM_EAGER_SPLIT_CHUNK_SIZE
> 
> 
> On 7/26/23 01:00, Shameer Kolothum wrote:
> > Now that we have Eager Page Split support added for ARM in the kernel[0],
> > enable it in Qemu. This adds,
> >   -eager-split-size to Qemu options to set the eager page split chunk size.
> >   -enable KVM_CAP_ARM_EAGER_SPLIT_CHUNK_SIZE.
> >
> > The chunk size specifies how many pages to break at a time, using a
> > single allocation. Bigger the chunk size, more pages need to be
> > allocated ahead of time.
> >
> > Notes:
> >   - I am not sure whether we need to call kvm_vm_check_extension() for
> > KVM_CAP_ARM_EAGER_SPLIT_CHUNK_SIZE or not as kernel seems to
> disable
> > eager page size by default and it will return zero always.
> >
> >-ToDo: Update qemu-options.hx
> >
> > [0]:
> https://lore.kernel.org/all/168426111477.3193133.1074810619984378093
> 0.b4...@linux.dev/
> >
> > Signed-off-by: Shameer Kolothum
> 
> > ---
> >   include/sysemu/kvm_int.h |  1 +
> >   target/arm/kvm.c | 73
> 
> >   2 files changed, 74 insertions(+)
> >
> > diff --git a/include/sysemu/kvm_int.h b/include/sysemu/kvm_int.h
> > index 511b42bde5..03a1660d40 100644
> > --- a/include/sysemu/kvm_int.h
> > +++ b/include/sysemu/kvm_int.h
> > @@ -116,6 +116,7 @@ struct KVMState
> >   uint64_t kvm_dirty_ring_bytes;  /* Size of the per-vcpu dirty ring
> */
> >   uint32_t kvm_dirty_ring_size;   /* Number of dirty GFNs per ring
> */
> >   bool kvm_dirty_ring_with_bitmap;
> > +uint64_t kvm_eager_split_size; /* Eager Page Splitting chunk size */
> >   struct KVMDirtyRingReaper reaper;
> >   NotifyVmexitOption notify_vmexit;
> >   uint32_t notify_window;
> > diff --git a/target/arm/kvm.c b/target/arm/kvm.c
> > index b4c7654f49..985d901062 100644
> > --- a/target/arm/kvm.c
> > +++ b/target/arm/kvm.c
> > @@ -30,6 +30,7 @@
> >   #include "exec/address-spaces.h"
> >   #include "hw/boards.h"
> >   #include "hw/irq.h"
> > +#include "qapi/visitor.h"
> >   #include "qemu/log.h"
> >
> >   const KVMCapabilityInfo kvm_arch_required_capabilities[] = {
> > @@ -247,6 +248,23 @@ int
> kvm_arm_get_max_vm_ipa_size(MachineState *ms, bool *fixed_ipa)
> >   return ret > 0 ? ret : 40;
> >   }
> >
> > +static bool kvm_arm_eager_split_size_valid(uint64_t req_size, uint32_t
> sizes)
> > +{
> > +int i;
> > +
> > +for (i = 0; i < sizeof(uint32_t) * BITS_PER_BYTE; i++) {
> > +if (!(sizes & (1 << i))) {
> > +continue;
> > +}
> > +
> > +if (req_size == (1 << i)) {
> > +return true;
> > +}
> > +}
> > +
> > +return false;
> > +}
> > +
> >   int kvm_arch_init(MachineState *ms, KVMState *s)
> >   {
> >   int ret = 0;
> > @@ -280,6 +298,21 @@ int kvm_arch_init(MachineState *ms, KVMState
> *s)
> >   }
> >   }
> >
> > +if (s->kvm_eager_split_size) {
> > +uint32_t sizes;
> > +
> > +sizes = kvm_vm_check_extension(s,
> KVM_CAP_ARM_SUPPORTED_BLOCK_SIZES);
> > +if (!sizes) {
> > +error_report("Eager Page Split not supported on host");
> > +} else if
> (!kvm_arm_eager_split_size_valid(s->kvm_eager_split_size,
> > +   sizes)) {
> > +error_report("Eager Page Split requested chunk size not
> valid");
> > +} else if (kvm_vm_enable_cap(s,
> KVM_CAP_ARM_EAGER_SPLIT_CHUNK_SIZE, 0,
> > + s->kvm_eager_split_size)) {
> > +error_report("Failed to set Eager Page Split chunk size");
> > +}
> > +}
> > +
> >   kvm_arm_init_debug(s);
> >
> >   return ret;
> 
> Do we really want to fail when KVM_CAP_ARM_SUPPORTED_BLOCK_SIZES
> isn't supported?
> I think the appropriate behavior is to warn and clear s->kvm_eager_split_size
> for this specific case,

RE: [RFC PATCH] arm/kvm: Enable support for KVM_CAP_ARM_EAGER_SPLIT_CHUNK_SIZE

2023-08-03 Thread Shameerali Kolothum Thodi via


> -----Original Message-----
> From: Peter Maydell [mailto:peter.mayd...@linaro.org]
> Sent: 27 July 2023 16:43
> To: Shameerali Kolothum Thodi 
> Cc: qemu-devel@nongnu.org; qemu-...@nongnu.org; ricar...@google.com;
> k...@vger.kernel.org; Jonathan Cameron ;
> Linuxarm 
> Subject: Re: [RFC PATCH] arm/kvm: Enable support for
> KVM_CAP_ARM_EAGER_SPLIT_CHUNK_SIZE
> 
> On Tue, 25 Jul 2023 at 16:01, Shameer Kolothum
>  wrote:
> >
> > Now that we have Eager Page Split support added for ARM in the kernel[0],
> > enable it in Qemu. This adds,
> >  -eager-split-size to Qemu options to set the eager page split chunk size.
> >  -enable KVM_CAP_ARM_EAGER_SPLIT_CHUNK_SIZE.
> 
> It looks from the code like you've added a new sub-option
> to -accel, not a new global option. This is the right thing,
> but your commit message should document the actual option syntax
> to avoid confusion.

Ok. Will update the commit message.

> > The chunk size specifies how many pages to break at a time, using a
> > single allocation. Bigger the chunk size, more pages need to be
> > allocated ahead of time.
> >
> > Notes:
> >  - I am not sure whether we need to call kvm_vm_check_extension() for
> >KVM_CAP_ARM_EAGER_SPLIT_CHUNK_SIZE or not as kernel seems to
> disable
> >eager page size by default and it will return zero always.
> >
> >   -ToDo: Update qemu-options.hx
> >
> > [0]:
> https://lore.kernel.org/all/168426111477.3193133.1074810619984378093
> 0.b4...@linux.dev/
> 
> Speaking of confusion, this message says "It's an optimization used
> in Google Cloud since 2016 on x86, and for the last couple of months
> on ARM." so I'm not sure why we've ended up with an Arm-specific
> KVM_CAP and code in target/arm/kvm.c rather than something more
> generic ?
> 
> If this is going to arrive for other architectures in the future
> we should probably think about whether some of this code should
> be generic, not arm-specific.
> 
> Also this seems to be an obscure tuning parameter -- it could
> use good documentation so users have some idea when it can help.
> 
> As a more specific case of that: the kernel patchset says it
> makes Arm do the same thing that x86 already does, and split
> the huge pages automatically based on use of the dirty log.
> If the kernel can do this automatically and we never felt
> the need to provide a manual tuning knob for x86, do we even
> need to expose the Arm manual control via QEMU?

From the history of the above series, it looks like the main argument
for making this a user-adjustable knob on ARM is the upfront extra
memory allocation required in the kernel when splitting a block page.

https://lore.kernel.org/kvmarm/86v8ktkqfx.wl-...@kernel.org/

https://lore.kernel.org/kvmarm/86h6w70zhc.wl-...@kernel.org/

The knob in the x86 case is a KVM module_param (eager_page_split).
It is not clear to me why x86 opted for a KVM-wide module_param rather
than a per-VM userspace control. The discussion can be found here:
https://lore.kernel.org/kvm/YaDrmNVsXSMXR72Z@xz-m1.local/#t


> Other than that, I have a few minor coding things below.
> 
> > +static bool kvm_arm_eager_split_size_valid(uint64_t req_size, uint32_t
> sizes)
> > +{
> > +int i;
> > +
> > +for (i = 0; i < sizeof(uint32_t) * BITS_PER_BYTE; i++) {
> > +if (!(sizes & (1 << i))) {
> > +continue;
> > +}
> > +
> > +if (req_size == (1 << i)) {
> > +return true;
> > +}
> > +}
> 
> We know req_size is a power of 2 here, so if you also explicitly
> rule out 0 then you can do
>  return sizes & (1 << ctz64(req_size));
> rather than having to loop through. (Need to rule out 0
> because otherwise ctz64() returns 64 and the shift is UB.)

Yes, I missed that we already handle the non-power-of-2 case.
Will update as per your next comment on this patch. That
is much simpler. Thanks.
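
For illustration only, a minimal standalone sketch of the ctz-based check
suggested above. The helper names are hypothetical, sketch_ctz64() is a
stand-in for QEMU's ctz64() built on a GCC/Clang builtin, and the extra
bound on the shift is my assumption (the "sizes" bitmap is only 32 bits wide):

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for QEMU's ctz64(); assumes a GCC/Clang builtin. */
static inline int sketch_ctz64(uint64_t val)
{
    return val ? __builtin_ctzll(val) : 64;
}

/*
 * Return true if req_size is a non-zero power of two whose bit is set in
 * the supported-sizes bitmap. Zero is rejected first, so the shift below
 * never sees a count of 64.
 */
static bool eager_split_size_valid_sketch(uint64_t req_size, uint32_t sizes)
{
    int shift;

    if (req_size == 0 || (req_size & (req_size - 1)) != 0) {
        return false;
    }

    shift = sketch_ctz64(req_size);
    if (shift >= 32) {
        /* Assumption: the bitmap cannot describe sizes of 4 GiB or more. */
        return false;
    }

    return (sizes & (UINT32_C(1) << shift)) != 0;
}
```

Using an unsigned 32-bit constant for the shifted value also avoids the
signed-shift problem that "1 << 31" would have on a plain int.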

> 
> > +
> > +return false;
> > +}
> > +
> >  int kvm_arch_init(MachineState *ms, KVMState *s)
> >  {
> >  int ret = 0;
> > @@ -280,6 +298,21 @@ int kvm_arch_init(MachineState *ms, KVMState
> *s)
> >  }
> >  }
> >
> > +if (s->kvm_eager_split_size) {
> > +uint32_t sizes;
> > +
> > +sizes = kvm_vm_check_extension(s,
> KVM_CAP_ARM_SUPPORTED_BLOCK_SIZES);
> > +if (!sizes) {
> > +error_report("Eager Page Split not supported on host");
> > +} else if
> (!kvm_arm_eager_split_size_valid(s->kvm_eager_split_size,
> > +  

RE: [PATCH 3/3] vfio/migration: Make VFIO migration non-experimental

2023-06-27 Thread Shameerali Kolothum Thodi via


> -Original Message-
> From: Jason Gunthorpe [mailto:j...@nvidia.com]
> Sent: 27 June 2023 13:30
> To: Cédric Le Goater 
> Cc: Avihai Horon ; Alex Williamson
> ; Joao Martins ;
> Juan Quintela ; Peter Xu ;
> Leonardo Bras ; Zhenzhong Duan
> ; Yishai Hadas ; Maor
> Gottlieb ; Kirti Wankhede ;
> Tarun Gupta ; qemu-devel@nongnu.org; Shameerali
> Kolothum Thodi 
> Subject: Re: [PATCH 3/3] vfio/migration: Make VFIO migration
> non-experimental
> 
> On Tue, Jun 27, 2023 at 02:21:55PM +0200, Cédric Le Goater wrote:
> 
> > We have a way to run and migrate a machine with a device not supporting
> > dirty tracking. Only Hisilicon is in that case today. May be there are
> > plans to add dirty tracking support ?
> 
> Hisilicon will eventually use Joao's work for IOMMU based tracking,
> this is what their HW was designed to do.

Yes. The plan is to make use of SMMUv3 HTTU feature for dirty tracking based
on Joao's work here,
https://lore.kernel.org/kvm/20230518204650.14541-1-joao.m.mart...@oracle.com/

Thanks,
Shameer



RE: [PATCH v11 05/11] vfio/migration: Block multiple devices migration

2023-05-16 Thread Shameerali Kolothum Thodi via


> -Original Message-
> From: Alex Williamson [mailto:alex.william...@redhat.com]
> Sent: 16 May 2023 15:28
> To: Shameerali Kolothum Thodi 
> Cc: Jason Gunthorpe ; Avihai Horon ;
> qemu-devel@nongnu.org; Juan Quintela ; Dr. David
> Alan Gilbert ; Michael S. Tsirkin ;
> Cornelia Huck ; Paolo Bonzini
> ; Vladimir Sementsov-Ogievskiy
> ; Cédric Le Goater ; Yishai
> Hadas ; Maor Gottlieb ; Kirti
> Wankhede ; Tarun Gupta ;
> Joao Martins 
> Subject: Re: [PATCH v11 05/11] vfio/migration: Block multiple devices
> migration
> 
> On Tue, 16 May 2023 13:57:22 +
> Shameerali Kolothum Thodi 
> wrote:
> 
> > > -Original Message-
> > > From: Jason Gunthorpe [mailto:j...@nvidia.com]
> > > Sent: 16 May 2023 13:00
> > > To: Shameerali Kolothum Thodi
> 
> > > Cc: Avihai Horon ; qemu-devel@nongnu.org; Alex
> > > Williamson ; Juan Quintela
> > > ; Dr. David Alan Gilbert ;
> > > Michael S. Tsirkin ; Cornelia Huck
> ;
> > > Paolo Bonzini ; Vladimir Sementsov-Ogievskiy
> > > ; Cédric Le Goater ;
> Yishai
> > > Hadas ; Maor Gottlieb ; Kirti
> > > Wankhede ; Tarun Gupta
> ;
> > > Joao Martins 
> > > Subject: Re: [PATCH v11 05/11] vfio/migration: Block multiple devices
> > > migration
> > >
> > > On Tue, May 16, 2023 at 10:03:54AM +, Shameerali Kolothum Thodi
> > > wrote:
> > >
> > > > > Currently VFIO migration doesn't implement some kind of
> intermediate
> > > > > quiescent state in which P2P DMAs are quiesced before stopping or
> > > > > running the device. This can cause problems in multi-device migration
> > > > > where the devices are doing P2P DMAs, since the devices are not
> stopped
> > > > > together at the same time.
> > > > >
> > > > > Until such support is added, block migration of multiple devices.
> > > >
> > > > Missed this one. Currently this blocks even if the attached devices are
> not
> > > > capable of P2P DMAs. eg; HiSilicon ACC devices. These are integrated
> end
> > > point
> > > > devices without any P2P capability between them. Is it Ok to check for
> > > > VFIO_MIGRATION_P2P flag and allow if the devices are not supporting
> > > that?
> > >
> > > Lacking VFIO_MIGRATION_P2P doesn't mean the device is incapable of
> > > P2P, it means the migration can't support P2P.
> > >
> > > We'd need some kind of new flag to check and such devices should be
> > > blocked from creating P2P mappings. Basically we don't currently
> > > fully support devices that are incapable of P2P operations.
> >
> > Ok. I will take a look.
> >
> > > What happens on your platform if a guest tries to do P2P? Does the
> > > platform crash?
> >
> > I am not sure. Since the devices are behind SMMU, I was under the assumption
> > that we do have the guarantee of isolation here(grouping). Or this is something
> > we are worried only during migration?
> 
> Grouping doesn't guarantee that mappings cannot be created through the
> SMMU between devices.  When we refer to devices being isolated between
> groups, that only excludes internal P2P between devices, for example
> across the internal link between functions with implementation specific
> routing.  For group isolation, the guarantee is that DMA is always
> routed upstream, not that the ultimate target cannot be another device.
> To guarantee lack of P2P the SMMU would need to reject non-memory
> translation targets.  Thanks,

Ok. Got it. So it depends on what the SMMU does for that mapping and is not
related to migration per se, and it has the potential to crash the system if
the SMMU goes ahead with that memory access. Isn't it then a more generic
problem whenever we have multiple devices attached to the VM? I need to check
if there is anything in the SMMU spec that forbids this access.

Thanks,
Shameer




RE: [PATCH v11 05/11] vfio/migration: Block multiple devices migration

2023-05-16 Thread Shameerali Kolothum Thodi via



> -Original Message-
> From: Jason Gunthorpe [mailto:j...@nvidia.com]
> Sent: 16 May 2023 13:00
> To: Shameerali Kolothum Thodi 
> Cc: Avihai Horon ; qemu-devel@nongnu.org; Alex
> Williamson ; Juan Quintela
> ; Dr. David Alan Gilbert ;
> Michael S. Tsirkin ; Cornelia Huck ;
> Paolo Bonzini ; Vladimir Sementsov-Ogievskiy
> ; Cédric Le Goater ; Yishai
> Hadas ; Maor Gottlieb ; Kirti
> Wankhede ; Tarun Gupta ;
> Joao Martins 
> Subject: Re: [PATCH v11 05/11] vfio/migration: Block multiple devices
> migration
> 
> On Tue, May 16, 2023 at 10:03:54AM +, Shameerali Kolothum Thodi
> wrote:
> 
> > > Currently VFIO migration doesn't implement some kind of intermediate
> > > quiescent state in which P2P DMAs are quiesced before stopping or
> > > running the device. This can cause problems in multi-device migration
> > > where the devices are doing P2P DMAs, since the devices are not stopped
> > > together at the same time.
> > >
> > > Until such support is added, block migration of multiple devices.
> >
> > Missed this one. Currently this blocks even if the attached devices are not
> > capable of P2P DMAs. eg; HiSilicon ACC devices. These are integrated end point
> > devices without any P2P capability between them. Is it Ok to check for
> > VFIO_MIGRATION_P2P flag and allow if the devices are not supporting that?
> 
> Lacking VFIO_MIGRATION_P2P doesn't mean the device is incapable of
> P2P, it means the migration can't support P2P.
> 
> We'd need some kind of new flag to check and such devices should be
> blocked from creating P2P mappings. Basically we don't currently
> fully support devices that are incapable of P2P operations.

Ok. I will take a look.

> What happens on your platform if a guest tries to do P2P? Does the
> platform crash?

I am not sure. Since the devices are behind the SMMU, I was under the assumption
that we do have the guarantee of isolation here (grouping). Or is this something
we are worried about only during migration?

Thanks,
Shameer
 



RE: [PATCH v11 05/11] vfio/migration: Block multiple devices migration

2023-05-16 Thread Shameerali Kolothum Thodi via


> -Original Message-
> From:
> qemu-devel-bounces+shameerali.kolothum.thodi=huawei@nongnu.org
> [mailto:qemu-devel-bounces+shameerali.kolothum.thodi=huawei.com@nong
> nu.org] On Behalf Of Avihai Horon
> Sent: 16 February 2023 14:36
> To: qemu-devel@nongnu.org
> Cc: Alex Williamson ; Juan Quintela
> ; Dr. David Alan Gilbert ;
> Michael S. Tsirkin ; Cornelia Huck ;
> Paolo Bonzini ; Vladimir Sementsov-Ogievskiy
> ; Cédric Le Goater ; Yishai
> Hadas ; Jason Gunthorpe ; Maor
> Gottlieb ; Avihai Horon ; Kirti
> Wankhede ; Tarun Gupta ;
> Joao Martins 
> Subject: [PATCH v11 05/11] vfio/migration: Block multiple devices migration
> 
> Currently VFIO migration doesn't implement some kind of intermediate
> quiescent state in which P2P DMAs are quiesced before stopping or
> running the device. This can cause problems in multi-device migration
> where the devices are doing P2P DMAs, since the devices are not stopped
> together at the same time.
> 
> Until such support is added, block migration of multiple devices.

Missed this one. Currently this blocks even if the attached devices are not
capable of P2P DMAs, e.g. HiSilicon ACC devices. These are integrated endpoint
devices without any P2P capability between them. Is it OK to check for the
VFIO_MIGRATION_P2P flag and allow migration if the devices do not support it?

I can send a patch if that's fine.

Thanks,
Shameer
> 
> Signed-off-by: Avihai Horon 
> Reviewed-by: Cédric Le Goater 
> Reviewed-by: Juan Quintela 
> ---
>  include/hw/vfio/vfio-common.h |  2 ++
>  hw/vfio/common.c  | 53
> +++
>  hw/vfio/migration.c   |  6 
>  3 files changed, 61 insertions(+)
> 
> diff --git a/include/hw/vfio/vfio-common.h
> b/include/hw/vfio/vfio-common.h
> index e573f5a9f1..56b1683824 100644
> --- a/include/hw/vfio/vfio-common.h
> +++ b/include/hw/vfio/vfio-common.h
> @@ -218,6 +218,8 @@ typedef QLIST_HEAD(VFIOGroupList, VFIOGroup)
> VFIOGroupList;
>  extern VFIOGroupList vfio_group_list;
> 
>  bool vfio_mig_active(void);
> +int vfio_block_multiple_devices_migration(Error **errp);
> +void vfio_unblock_multiple_devices_migration(void);
>  int64_t vfio_mig_bytes_transferred(void);
> 
>  #ifdef CONFIG_LINUX
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index 3a35f4afad..fe80ccf914 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
> @@ -41,6 +41,7 @@
>  #include "qapi/error.h"
>  #include "migration/migration.h"
>  #include "migration/misc.h"
> +#include "migration/blocker.h"
>  #include "sysemu/tpm.h"
> 
>  VFIOGroupList vfio_group_list =
> @@ -337,6 +338,58 @@ bool vfio_mig_active(void)
>  return true;
>  }
> 
> +static Error *multiple_devices_migration_blocker;
> +
> +static unsigned int vfio_migratable_device_num(void)
> +{
> +VFIOGroup *group;
> +VFIODevice *vbasedev;
> +unsigned int device_num = 0;
> +
> +QLIST_FOREACH(group, &vfio_group_list, next) {
> +QLIST_FOREACH(vbasedev, &group->device_list, next) {
> +if (vbasedev->migration) {
> +device_num++;
> +}
> +}
> +}
> +
> +return device_num;
> +}
> +
> +int vfio_block_multiple_devices_migration(Error **errp)
> +{
> +int ret;
> +
> +if (multiple_devices_migration_blocker ||
> +vfio_migratable_device_num() <= 1) {
> +return 0;
> +}
> +
> +error_setg(&multiple_devices_migration_blocker,
> +   "Migration is currently not supported with multiple "
> +   "VFIO devices");
> +ret = migrate_add_blocker(multiple_devices_migration_blocker, errp);
> +if (ret < 0) {
> +error_free(multiple_devices_migration_blocker);
> +multiple_devices_migration_blocker = NULL;
> +}
> +
> +return ret;
> +}
> +
> +void vfio_unblock_multiple_devices_migration(void)
> +{
> +if (!multiple_devices_migration_blocker ||
> +vfio_migratable_device_num() > 1) {
> +return;
> +}
> +
> +migrate_del_blocker(multiple_devices_migration_blocker);
> +error_free(multiple_devices_migration_blocker);
> +multiple_devices_migration_blocker = NULL;
> +}
> +
>  static bool vfio_devices_all_dirty_tracking(VFIOContainer *container)
>  {
>  VFIOGroup *group;
> diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
> index e56eef1ee8..8e96999669 100644
> --- a/hw/vfio/migration.c
> +++ b/hw/vfio/migration.c
> @@ -878,6 +878,11 @@ int vfio_migration_probe(VFIODevice *vbasedev,
> Error **errp)
>  goto add_blocker;
>  }
> 
> +ret = vfio_block_multiple_devices_migration(errp);
> +if (ret) {
> +return ret;
> +}
> +
>  trace_vfio_migration_probe(vbasedev->name, info->index);
>  g_free(info);
>  return 0;
> @@ -904,6 +909,7 @@ void vfio_migration_finalize(VFIODevice *vbasedev)
>  qemu_del_vm_change_state_handler(migration->vm_state);
>  unregister_savevm(VMSTATE_IF(vbasedev->dev), "vfio",
> vbasedev);
>  vfio_migration_exit(vbasedev);
> +

RE: KVM Call for 2022-10-18

2022-10-18 Thread Shameerali Kolothum Thodi via



> -Original Message-
> From: Qemu-devel
> [mailto:qemu-devel-bounces+shameerali.kolothum.thodi=huawei.com@nong
> nu.org] On Behalf Of Juan Quintela
> Sent: 14 October 2022 11:11
> To: kvm-devel ; qemu-devel@nongnu.org
> Subject: KVM Call for 2022-10-18
> 
> 
> 
> Hi
> 
> Please, send any topic that you are interested in covering.
> 
> For next week, we have a topic:
> 
> - VFIO and migration
> 
> We are going to discuss what to do with vfio devices that support
> migration.  See my RFC on the list, so far we are discussing:
> 
> - we need a way to know the size of the vfio device state
>   (In the cases we are discussing, they require that the guest is
>   stopped, so I am redoing how we calculate pending state).
> 
> - We need an estimate/exact sizes.
>   Estimate can be the one calculated last time.  This is supposed to be
>   fast, and needs to work with the guest running.
>   Exact size is just that, we have stopped the guest, and we want to
>   know how big is the state for this device, to know if we can complete
>   migration or we will continue in iterative stage.
> 
> - We need to send the state asynchronously.
>   VFIO devices are very fast at doing whatever they are designed to do.
>   But copying its state to memory is not one of the things that they do
>   fast.  So I am working in an asynchronous way to copy that state in
>   parallel.  The particular setup that caused this problem was using 4
>   network vfio cards in the guest.  Current code will:
> 
>   for i in network cards:
>  copy the state from card i into memory
>  send the state from memory from card i to destination
> 
>   what we want is something like:
> 
>   for i in network cards:
>      start asynchronous copy the state from card i into memory
> 
>   for i in network cards:
>  wait for copy the state from card i into memory to finish
>  send the state from memory from card i to destination
> 
> So the cards can transfer their state to memory in parallel.
> 
> 
> At the end of Monday I will send an email with the agenda or the
> cancellation of the call, so hurry up.
> 
> After discussions on the QEMU Summit, we are going to have always open a
> KVM call where you can add topics.
> 
>  Call details:
> 
> By popular demand, a google calendar public entry with it
> 
> 
> https://calendar.google.com/calendar/u/0?cid=ZWdlZDdja2kwNWxtdTF0bm
> d2a2wzdGhpZHNAZ3JvdXAuY2FsZW5kYXIuZ29vZ2xlLmNvbQ
> 
> (Let me know if you have any problems with the calendar entry.  I just
> gave up about getting right at the same time CEST, CET, EDT and DST).
Hi,

Just wondering, did this call happen? I tried joining, as it was showing
14:00-15:00 in my Google Calendar (BST), but no luck.

Thanks,
Shameer

> 
> If you need phone number details,  contact me privately
> 
> Thanks, Juan.
> 
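
The two-phase scheme from the quoted proposal (start all the copies first,
then wait and send in card order) can be sketched roughly as below. This is
only an illustration using POSIX threads, not QEMU's migration code: the
struct, the function names and the fixed STATE_SIZE are all hypothetical,
and a plain memcpy() stands in for the slow device-state read.

```c
#include <pthread.h>
#include <stddef.h>
#include <string.h>

#define STATE_SIZE 64   /* hypothetical per-card state size */

struct card_sketch {
    pthread_t thread;
    unsigned char device_state[STATE_SIZE];  /* stands in for the device */
    unsigned char buffer[STATE_SIZE];        /* host memory copy */
};

/* Phase 1 worker: copy the state from one card into memory. */
static void *copy_state_sketch(void *opaque)
{
    struct card_sketch *c = opaque;

    memcpy(c->buffer, c->device_state, STATE_SIZE);
    return NULL;
}

/* Hypothetical "send": append the buffered state to the outgoing stream. */
static size_t send_state_sketch(const struct card_sketch *c,
                                unsigned char *dest)
{
    memcpy(dest, c->buffer, STATE_SIZE);
    return STATE_SIZE;
}

static int save_all_cards_sketch(struct card_sketch *cards, size_t n,
                                 unsigned char *stream)
{
    size_t i, started = 0, off = 0;
    int ret = 0;

    /* Start every copy before waiting on any of them. */
    for (i = 0; i < n; i++) {
        if (pthread_create(&cards[i].thread, NULL,
                           copy_state_sketch, &cards[i]) != 0) {
            ret = -1;
            break;
        }
        started++;
    }

    /* Wait and send in card order, so the stream layout stays stable. */
    for (i = 0; i < started; i++) {
        pthread_join(cards[i].thread, NULL);
        if (ret == 0) {
            off += send_state_sketch(&cards[i], stream + off);
        }
    }

    return ret;
}
```

The copies overlap with each other, while the sends stay serialized, which
matches the ordering of the two loops in the proposal.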




RE: [PATCH] fw_cfg: Don't set callback_opaque NULL in fw_cfg_modify_bytes_read()

2022-08-30 Thread Shameerali Kolothum Thodi via


> -Original Message-
> From: Shameerali Kolothum Thodi
> Sent: 26 August 2022 13:15
> To: 'Laszlo Ersek' ; qemu-devel@nongnu.org;
> qemu-...@nongnu.org
> Cc: imamm...@redhat.com; peter.mayd...@linaro.org; Linuxarm
> ; chenxiang (M) ; Ard
> Biesheuvel (kernel.org address) ; Gerd Hoffmann
> 
> Subject: RE: [PATCH] fw_cfg: Don't set callback_opaque NULL in
> fw_cfg_modify_bytes_read()
> 
> 
> 
> > -Original Message-
> > From: Laszlo Ersek [mailto:ler...@redhat.com]
> > Sent: 26 August 2022 13:07
> > To: Shameerali Kolothum Thodi
> ;
> > qemu-devel@nongnu.org; qemu-...@nongnu.org
> > Cc: imamm...@redhat.com; peter.mayd...@linaro.org; Linuxarm
> > ; chenxiang (M) ;
> Ard
> > Biesheuvel (kernel.org address) ; Gerd Hoffmann
> > 
> > Subject: Re: [PATCH] fw_cfg: Don't set callback_opaque NULL in
> > fw_cfg_modify_bytes_read()
> >
> > +Ard +Gerd, one pointer at the bottom
> >
> > On 08/26/22 13:59, Laszlo Ersek wrote:
> > > On 08/25/22 18:18, Shameer Kolothum wrote:
> > >> Hi
> > >>
> > >> On arm/virt platform, Chen Xiang reported a Guest crash while
> > >> attempting the below steps,
> > >>
> > >> 1. Launch the Guest with nvdimm=on
> > >> 2. Hot-add a NVDIMM dev
> > >> 3. Reboot
> > >> 4. Guest boots fine.
> > >> 5. Reboot again.
> > >> 6. Guest boot fails.
> > >>
> > >> QEMU_EFI reports the below error:
> > >> ProcessCmdAddPointer: invalid pointer value in "etc/acpi/tables"
> > >> OnRootBridgesConnected: InstallAcpiTables: Protocol Error
> > >>
> > >> Debugging shows that on first reboot(after hot-adding NVDIMM),
> > >> Qemu updates the etc/table-loader len,
> > >>
> > >> qemu_ram_resize()
> > >>   fw_cfg_modify_file()
> > >>      fw_cfg_modify_bytes_read()
> > >>
> > >> And in fw_cfg_modify_bytes_read() we set the "callback_opaque" for
> > >> the "key" entry to NULL. Because of this, on the second reboot,
> > >> virt_acpi_build_update() is called with a NULL "build_state" and
> > >> returns without updating the ACPI tables. This seems to be
> > >> upsetting the firmware.
> > >>
> > >> To fix this, don't change the callback_opaque in
> > fw_cfg_modify_bytes_read().
> > >>
> > >> Reported-by: chenxiang 
> > >> Signed-off-by: Shameer Kolothum
> > 
> > >> ---
> > >> I am still not very convinced this is the root cause of the issue.
> > >> Though it looks like setting callback_opaque to NULL while updating
> > >> the file size is wrong, what puzzles me is that on the second reboot
> > >> we don't have any ACPI table size changes and ideally firmware should
> > >> see the updated tables from the first reboot itself.
> > >>
> > >> Please take a look and let me know.
> > >>
> > >> Thanks,
> > >> Shameer
> > >>
> > >> ---
> > >>  hw/nvram/fw_cfg.c | 1 -
> > >>  1 file changed, 1 deletion(-)
> > >>
> > >> diff --git a/hw/nvram/fw_cfg.c b/hw/nvram/fw_cfg.c
> > >> index d605f3f45a..dfe8404c01 100644
> > >> --- a/hw/nvram/fw_cfg.c
> > >> +++ b/hw/nvram/fw_cfg.c
> > >> @@ -728,7 +728,6 @@ static void
> > *fw_cfg_modify_bytes_read(FWCfgState *s, uint16_t key,
> > >>  ptr = s->entries[arch][key].data;
> > >>  s->entries[arch][key].data = data;
> > >>  s->entries[arch][key].len = len;
> > >> -s->entries[arch][key].callback_opaque = NULL;
> > >>  s->entries[arch][key].allow_write = false;
> > >>
> > >>  return ptr;
> > >>
> > >
> > > I vaguely recall seeing the same issue report years ago (also in
> > > relation to hot-adding NVDIMM). However, I have no capacity to
> > > participate in the discussion. Making this remark just for clarity.
> >
> > The earlier report I've had in mind was from Shameer as well:
> >
> >
> http://mid.mail-archive.com/5FC3163CFD30C246ABAA99954A238FA83F3F
> > b...@lhreml524-mbs.china.huawei.com
> 
> Right. That was a slightly different issue though. It was basically ACPI table
> size not getting updated on the first reboot of Guest after we hot-add NVDIMM
> dev. The error from firmware was different in that case,
> 
> ProcessCmdAddChecksum: invalid checksum range in "etc/acpi/tables"
> OnRootBridgesConnected: InstallAcpiTables: Protocol Error
> 
> And it was fixed with this series here,
> https://patchwork.kernel.org/project/qemu-devel/cover/20200403101827.30664-1-shameerali.kolothum.th...@huawei.com/
> 
> The current issue only happens on the second reboot of the Guest as
> described in the steps above.
> 

[+Christian]

I just found that a similar issue was reported here some time back on a
Q35/Windows setup:
https://patchew.org/QEMU/yldfmtbflucdf...@cae.in-ulm.de/

But there are no further discussions on that thread.

Thanks,
Shameer



RE: [PATCH] fw_cfg: Don't set callback_opaque NULL in fw_cfg_modify_bytes_read()

2022-08-26 Thread Shameerali Kolothum Thodi via


> -Original Message-
> From: Laszlo Ersek [mailto:ler...@redhat.com]
> Sent: 26 August 2022 13:07
> To: Shameerali Kolothum Thodi ;
> qemu-devel@nongnu.org; qemu-...@nongnu.org
> Cc: imamm...@redhat.com; peter.mayd...@linaro.org; Linuxarm
> ; chenxiang (M) ; Ard
> Biesheuvel (kernel.org address) ; Gerd Hoffmann
> 
> Subject: Re: [PATCH] fw_cfg: Don't set callback_opaque NULL in
> fw_cfg_modify_bytes_read()
> 
> +Ard +Gerd, one pointer at the bottom
> 
> On 08/26/22 13:59, Laszlo Ersek wrote:
> > On 08/25/22 18:18, Shameer Kolothum wrote:
> >> Hi
> >>
> >> On arm/virt platform, Chen Xiang reported a Guest crash while
> >> attempting the below steps,
> >>
> >> 1. Launch the Guest with nvdimm=on
> >> 2. Hot-add a NVDIMM dev
> >> 3. Reboot
> >> 4. Guest boots fine.
> >> 5. Reboot again.
> >> 6. Guest boot fails.
> >>
> >> QEMU_EFI reports the below error:
> >> ProcessCmdAddPointer: invalid pointer value in "etc/acpi/tables"
> >> OnRootBridgesConnected: InstallAcpiTables: Protocol Error
> >>
> >> Debugging shows that on first reboot(after hot-adding NVDIMM),
> >> Qemu updates the etc/table-loader len,
> >>
> >> qemu_ram_resize()
> >>   fw_cfg_modify_file()
> >>      fw_cfg_modify_bytes_read()
> >>
> >> And in fw_cfg_modify_bytes_read() we set the "callback_opaque" for
> >> the "key" entry to NULL. Because of this, on the second reboot,
> >> virt_acpi_build_update() is called with a NULL "build_state" and
> >> returns without updating the ACPI tables. This seems to be
> >> upsetting the firmware.
> >>
> >> To fix this, don't change the callback_opaque in
> fw_cfg_modify_bytes_read().
> >>
> >> Reported-by: chenxiang 
> >> Signed-off-by: Shameer Kolothum
> 
> >> ---
> >> I am still not very convinced this is the root cause of the issue.
> >> Though it looks like setting callback_opaque to NULL while updating
> >> the file size is wrong, what puzzles me is that on the second reboot
> >> we don't have any ACPI table size changes and ideally firmware should
> >> see the updated tables from the first reboot itself.
> >>
> >> Please take a look and let me know.
> >>
> >> Thanks,
> >> Shameer
> >>
> >> ---
> >>  hw/nvram/fw_cfg.c | 1 -
> >>  1 file changed, 1 deletion(-)
> >>
> >> diff --git a/hw/nvram/fw_cfg.c b/hw/nvram/fw_cfg.c
> >> index d605f3f45a..dfe8404c01 100644
> >> --- a/hw/nvram/fw_cfg.c
> >> +++ b/hw/nvram/fw_cfg.c
> >> @@ -728,7 +728,6 @@ static void
> *fw_cfg_modify_bytes_read(FWCfgState *s, uint16_t key,
> >>  ptr = s->entries[arch][key].data;
> >>  s->entries[arch][key].data = data;
> >>  s->entries[arch][key].len = len;
> >> -s->entries[arch][key].callback_opaque = NULL;
> >>  s->entries[arch][key].allow_write = false;
> >>
> >>  return ptr;
> >>
> >
> > I vaguely recall seeing the same issue report years ago (also in
> > relation to hot-adding NVDIMM). However, I have no capacity to
> > participate in the discussion. Making this remark just for clarity.
> 
> The earlier report I've had in mind was from Shameer as well:
> 
> http://mid.mail-archive.com/5FC3163CFD30C246ABAA99954A238FA83F3F
> b...@lhreml524-mbs.china.huawei.com

Right. That was a slightly different issue though. It was basically ACPI table
size not getting updated on the first reboot of Guest after we hot-add NVDIMM
dev. The error from firmware was different in that case,

ProcessCmdAddChecksum: invalid checksum range in "etc/acpi/tables"
OnRootBridgesConnected: InstallAcpiTables: Protocol Error

And it was fixed with this series here,
https://patchwork.kernel.org/project/qemu-devel/cover/20200403101827.30664-1-shameerali.kolothum.th...@huawei.com/

The current issue only happens on the second reboot of the Guest as described
in the steps above.
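
To make the suspected mechanism from the patch description easier to follow,
here is a small standalone model of it. It is not the fw_cfg code: the struct
and function names are hypothetical, and it only assumes what the commit
message states, namely that the modify path clears callback_opaque and that
the rebuild callback returns early when it is handed a NULL build state.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced version of a fw_cfg entry. */
struct entry_sketch {
    void *data;
    size_t len;
    void *callback_opaque;
    bool allow_write;
};

static int rebuild_count;

/* Models the ACPI rebuild callback: a NULL build state means "do nothing". */
static void build_update_sketch(void *build_state)
{
    if (!build_state) {
        return;
    }
    rebuild_count++;
}

/*
 * Models the modify path. With clear_opaque == true it behaves like the
 * code before the patch; with false it behaves like the patched code.
 */
static void *modify_bytes_sketch(struct entry_sketch *e, void *data,
                                 size_t len, bool clear_opaque)
{
    void *old = e->data;

    e->data = data;
    e->len = len;
    if (clear_opaque) {
        e->callback_opaque = NULL;
    }
    e->allow_write = false;

    return old;
}

/* Models the firmware selecting the entry on the next boot. */
static void select_entry_sketch(struct entry_sketch *e)
{
    build_update_sketch(e->callback_opaque);
}
```

In this model, a resize followed by a later select never reaches the rebuild
when the opaque pointer was cleared, which is the behaviour described for the
second reboot.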

Thanks,
Shameer



RE: [RFC 00/18] vfio: Adopt iommufd

2022-06-28 Thread Shameerali Kolothum Thodi via


> -Original Message-
> From: Yi Liu [mailto:yi.l@intel.com]
> Sent: 18 May 2022 15:01
> To: zhangfei@foxmail.com; Jason Gunthorpe ;
> Zhangfei Gao 
> Cc: eric.au...@redhat.com; Alex Williamson ;
> Shameerali Kolothum Thodi ;
> coh...@redhat.com; qemu-devel@nongnu.org;
> da...@gibson.dropbear.id.au; th...@redhat.com; far...@linux.ibm.com;
> mjros...@linux.ibm.com; akrow...@linux.ibm.com; pa...@linux.ibm.com;
> jjhe...@linux.ibm.com; jasow...@redhat.com; k...@vger.kernel.org;
> nicol...@nvidia.com; eric.auger@gmail.com; kevin.t...@intel.com;
> chao.p.p...@intel.com; yi.y@intel.com; pet...@redhat.com
> Subject: Re: [RFC 00/18] vfio: Adopt iommufd
> 
> On 2022/5/18 15:22, zhangfei@foxmail.com wrote:
> >
> >
> > On 2022/5/17 下午4:55, Yi Liu wrote:
> >> Hi Zhangfei,
> >>
> >> On 2022/5/12 17:01, zhangfei@foxmail.com wrote:
> >>>
> >>> Hi, Yi
> >>>
> >>> On 2022/5/11 下午10:17, zhangfei@foxmail.com wrote:
> >>>>
> >>>>
> >>>> On 2022/5/10 下午10:08, Yi Liu wrote:
> >>>>> On 2022/5/10 20:45, Jason Gunthorpe wrote:
> >>>>>> On Tue, May 10, 2022 at 08:35:00PM +0800, Zhangfei Gao wrote:
> >>>>>>> Thanks Yi and Eric,
> >>>>>>> Then will wait for the updated iommufd kernel for the PCI MMIO
> region.
> >>>>>>>
> >>>>>>> Another question,
> >>>>>>> How to get the iommu_domain in the ioctl.
> >>>>>>
> >>>>>> The ID of the iommu_domain (called the hwpt) it should be returned
> by
> >>>>>> the vfio attach ioctl.
> >>>>>
> >>>>> yes, hwpt_id is returned by the vfio attach ioctl and recorded in
> >>>>> qemu. You can query page table related capabilities with this id.
> >>>>>
> >>>>>
> https://lore.kernel.org/kvm/20220414104710.28534-16-yi.l@intel.com/
> >>>>>
> >>>> Thanks Yi,
> >>>>
> >>>> Do we use iommufd_hw_pagetable_from_id in kernel?
> >>>>
> >>>> The qemu send hwpt_id via ioctl.
> >>>> Currently VFIOIOMMUFDContainer has hwpt_list,
> >>>> Which member is good to save hwpt_id, IOMMUTLBEntry?
> >>>
> >>> Can VFIOIOMMUFDContainer  have multi hwpt?
> >>
> >> yes, it is possible
> > Then how to get hwpt_id in map/unmap_notify(IOMMUNotifier *n,
> IOMMUTLBEntry
> > *iotlb)
> 
> in map/unmap, should use ioas_id instead of hwpt_id
> 
> >
> >>
> >>> Since VFIOIOMMUFDContainer has hwpt_list now.
> >>> If so, how to get specific hwpt from map/unmap_notify in hw/vfio/as.c,
> >>> where no vbasedev can be used for compare.
> >>>
> >>> I am testing with a workaround, adding VFIOIOASHwpt *hwpt in
> >>> VFIOIOMMUFDContainer.
> >>> And save hwpt when vfio_device_attach_container.
> >>>
> >>>>
> >>>> In kernel ioctl: iommufd_vfio_ioctl
> >>>> @dev: Device to get an iommu_domain for
> >>>> iommufd_hw_pagetable_from_id(struct iommufd_ctx *ictx, u32 pt_id,
> >>>> struct device *dev)
> >>>> But iommufd_vfio_ioctl seems no para dev?
> >>>
> >>> We can set dev=Null since IOMMUFD_OBJ_HW_PAGETABLE does not
> need dev.
> >>> iommufd_hw_pagetable_from_id(ictx, hwpt_id, NULL)
> >>
> >> this is not good. dev is passed in to this function to allocate domain
> >> and also check sw_msi things. If you pass in a NULL, it may even unable
> >> to get a domain for the hwpt. It won't work I guess.
> >
> > The iommufd_hw_pagetable_from_id can be used for
> > 1, allocate domain, which need para dev
> > case IOMMUFD_OBJ_IOAS
> > hwpt = iommufd_hw_pagetable_auto_get(ictx, ioas, dev);
> 
> this is used when attaching ioas.
> 
> > 2. Just return allocated domain via hwpt_id, which does not need dev.
> > case IOMMUFD_OBJ_HW_PAGETABLE:
> > return container_of(obj, struct iommufd_hw_pagetable, obj);
> 
> yes, this would be the usage in nesting. you may check my below
> branch. It's for nesting integration.
> 
> https://github.com/luxis1999/iommufd/tree/iommufd-v5.18-rc4-nesting
> 
> > By the way, any plan of the nested mode?
> I'm working with Eric, Nic on it. Currently, I've got the above kernel
> branch, QEMU side is also WIP.

Hi Yi/Eric,

I had a look at the above nesting kernel and Qemu branches and as mentioned
in the cover letter it is not working on ARM yet.

IIUC, to get it working via iommufd, the main thing is that we need a way to
configure the physical SMMU in nested mode and set up the mappings for stage 2.
The cache/PASID related changes look more straightforward.

I had quite a few hacks to get it working on ARM, but it is still a WIP. So
just wondering, do you guys have something that can be shared yet?

Please let me know.

Thanks,
Shameer


RE: [RFC 00/18] vfio: Adopt iommufd

2022-04-26 Thread Shameerali Kolothum Thodi via


> -Original Message-
> From: Eric Auger [mailto:eric.au...@redhat.com]
> Sent: 26 April 2022 12:45
> To: Shameerali Kolothum Thodi ; Yi
> Liu ; alex.william...@redhat.com; coh...@redhat.com;
> qemu-devel@nongnu.org
> Cc: da...@gibson.dropbear.id.au; th...@redhat.com; far...@linux.ibm.com;
> mjros...@linux.ibm.com; akrow...@linux.ibm.com; pa...@linux.ibm.com;
> jjhe...@linux.ibm.com; jasow...@redhat.com; k...@vger.kernel.org;
> j...@nvidia.com; nicol...@nvidia.com; eric.auger@gmail.com;
> kevin.t...@intel.com; chao.p.p...@intel.com; yi.y@intel.com;
> pet...@redhat.com; Zhangfei Gao 
> Subject: Re: [RFC 00/18] vfio: Adopt iommufd

[...]
 
> >>
> https://lore.kernel.org/kvm/0-v1-e79cd8d168e8+6-iommufd_...@nvidia.com
> >> /
> >> [2] https://github.com/luxis1999/iommufd/tree/iommufd-v5.17-rc6
> >> [3] https://github.com/luxis1999/qemu/tree/qemu-for-5.17-rc6-vm-rfcv1
> > Hi,
> >
> > I had a go with the above branches on our ARM64 platform trying to
> pass-through
> > a VF dev, but Qemu reports an error as below,
> >
> > [0.444728] hisi_sec2 :00:01.0: enabling device ( -> 0002)
> > qemu-system-aarch64-iommufd: IOMMU_IOAS_MAP failed: Bad address
> > qemu-system-aarch64-iommufd: vfio_container_dma_map(0xfeb40ce0,
> 0x80, 0x1, 0xb40ef000) = -14 (Bad address)
> >
> > I think this happens for the dev BAR addr range. I haven't debugged the
> kernel
> > yet to see where it actually reports that.
> Does it prevent your assigned device from working? I have such errors
> too but this is a known issue. This is due to the fact P2P DMA is not
> supported yet.
> 

Yes, the basic tests are all good so far. I am still not very clear how it
works if the map() fails, though. It looks like it fails in:

iommufd_ioas_map()
  iopt_map_user_pages()
   iopt_map_pages()
   ..
 pfn_reader_pin_pages()

So does it mean it just works because the page is resident()?

Thanks,
Shameer





RE: [RFC 00/18] vfio: Adopt iommufd

2022-04-26 Thread Shameerali Kolothum Thodi via



> -Original Message-
> From: Yi Liu [mailto:yi.l@intel.com]
> Sent: 14 April 2022 11:47
> To: alex.william...@redhat.com; coh...@redhat.com;
> qemu-devel@nongnu.org
> Cc: da...@gibson.dropbear.id.au; th...@redhat.com; far...@linux.ibm.com;
> mjros...@linux.ibm.com; akrow...@linux.ibm.com; pa...@linux.ibm.com;
> jjhe...@linux.ibm.com; jasow...@redhat.com; k...@vger.kernel.org;
> j...@nvidia.com; nicol...@nvidia.com; eric.au...@redhat.com;
> eric.auger@gmail.com; kevin.t...@intel.com; yi.l@intel.com;
> chao.p.p...@intel.com; yi.y@intel.com; pet...@redhat.com
> Subject: [RFC 00/18] vfio: Adopt iommufd
> 
> With the introduction of iommufd[1], the linux kernel provides a generic
> interface for userspace drivers to propagate their DMA mappings to kernel
> for assigned devices. This series does the porting of the VFIO devices
> onto the /dev/iommu uapi and let it coexist with the legacy implementation.
> Other devices such as vdpa, vfio mdev, etc. are not considered yet.
> 
> For vfio devices, the new interface is tied with device fd and iommufd
> as the iommufd solution is device-centric. This is different from legacy
> vfio which is group-centric. To support both interfaces in QEMU, this
> series introduces the iommu backend concept in the form of different
> container classes. The existing vfio container is named legacy container
> (equivalent with legacy iommu backend in this series), while the new
> iommufd based container is named iommufd container (may also be mentioned
> as iommufd backend in this series). The two backend types have their own
> way to set up the secure context and the DMA management interface. The
> diagram below shows how it looks with both BEs.
> 
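>                    VFIO                           AddressSpace/Memory
>    +-------+  +----------+  +-----+  +-----+
>    |  pci  |  | platform |  |  ap |  | ccw |
>    +---+---+  +----+-----+  +--+--+  +--+--+     +----------------------+
>        |           |           |        |        |     AddressSpace     |
>        |           |           |        |        +------------+---------+
>    +---V-----------V-----------V--------V----+               /
>    |           VFIOAddressSpace              | <------------+
>    |                  |                      |  MemoryListener
>    |          VFIOContainer list             |
>    +-------+----------------------------+----+
>            |                            |
>            |                            |
>    +-------V------+            +--------V----------+
>    |   iommufd    |            |    vfio legacy    |
>    |  container   |            |     container     |
>    +-------+------+            +--------+----------+
>            |                            |
>            | /dev/iommu                 | /dev/vfio/vfio
>            | /dev/vfio/devices/vfioX    | /dev/vfio/$group_id
>  Userspace |                            |
> ===========+============================+================================
>  Kernel    |  device fd                 |
>            +---------------+            | group/container fd
>            | (BIND_IOMMUFD |            | (SET_CONTAINER/SET_IOMMU)
>            |  ATTACH_IOAS) |            | device fd
>            |               |            |
>            |       +-------V------------V-----------------+
>    iommufd |       |                vfio                  |
> (map/unmap |       +---------+--------------------+-------+
> ioas_copy) |                 |                    | map/unmap
>            |                 |                    |
>    +-------V------+   +------V-----+       +------V--------+
>    | iommufd core |   |   device   |       |  vfio iommu   |
>    +--------------+   +------------+       +---------------+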
> 
> [Secure Context setup]
> - iommufd BE: uses device fd and iommufd to setup secure context
>   (bind_iommufd, attach_ioas)
> - vfio legacy BE: uses group fd and container fd to setup secure context
>   (set_container, set_iommu)
> [Device access]
> - iommufd BE: device fd is opened through /dev/vfio/devices/vfioX
> - vfio legacy BE: device fd is retrieved from group fd ioctl
> [DMA Mapping flow]
> - VFIOAddressSpace receives MemoryRegion add/del via MemoryListener
> - VFIO populates DMA map/unmap via the container BEs
>   *) iommufd BE: uses iommufd
>   *) vfio legacy BE: uses container fd
> 
> This series qomifies the VFIOContainer object which acts as a base class
> for a container. This base class is derived into the legacy VFIO container
> and the new iommufd based container. The base class implements generic
> code such as code related to memory_listener and address space management,
> whereas the derived class implements callbacks that depend on the kernel
> user space interface being used.
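
The base class / derived class split described above can be sketched roughly
as below. This is only an illustration under assumptions: every Demo* name is
a stand-in invented here (the series uses QOM classes, not this plain struct
of function pointers), and the backend callbacks are stubs where the real
code would call into the kernel.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* All names here are illustrative stand-ins, not the types from the series. */
typedef struct DemoContainer DemoContainer;

typedef struct DemoContainerOps {
    const char *name;
    int (*dma_map)(DemoContainer *c, uint64_t iova, uint64_t size, void *vaddr);
    int (*dma_unmap)(DemoContainer *c, uint64_t iova, uint64_t size);
} DemoContainerOps;

struct DemoContainer {
    const DemoContainerOps *ops;  /* backend-specific callbacks */
    unsigned int nr_mappings;     /* generic bookkeeping owned by the base */
};

/* Stub: a legacy backend would map through the container fd here. */
static int legacy_map(DemoContainer *c, uint64_t iova, uint64_t size, void *vaddr)
{
    (void)c; (void)iova; (void)vaddr;
    return size ? 0 : -EINVAL;
}

static int legacy_unmap(DemoContainer *c, uint64_t iova, uint64_t size)
{
    (void)c; (void)iova;
    return size ? 0 : -EINVAL;
}

/* Stub: an iommufd backend would map through the iommufd here. */
static int iommufd_map(DemoContainer *c, uint64_t iova, uint64_t size, void *vaddr)
{
    (void)c; (void)iova; (void)vaddr;
    return size ? 0 : -EINVAL;
}

static int iommufd_unmap(DemoContainer *c, uint64_t iova, uint64_t size)
{
    (void)c; (void)iova;
    return size ? 0 : -EINVAL;
}

static const DemoContainerOps demo_legacy_ops = {
    "legacy", legacy_map, legacy_unmap
};
static const DemoContainerOps demo_iommufd_ops = {
    "iommufd", iommufd_map, iommufd_unmap
};

/* Generic code: what a memory listener would call for any backend. */
static int demo_container_dma_map(DemoContainer *c, uint64_t iova,
                                  uint64_t size, void *vaddr)
{
    assert(c && c->ops && c->ops->dma_map);
    int ret = c->ops->dma_map(c, iova, size, vaddr);
    if (ret == 0) {
        c->nr_mappings++;
    }
    return ret;
}

static int demo_container_dma_unmap(DemoContainer *c, uint64_t iova,
                                    uint64_t size)
{
    assert(c && c->ops && c->ops->dma_unmap);
    int ret = c->ops->dma_unmap(c, iova, size);
    if (ret == 0 && c->nr_mappings > 0) {
        c->nr_mappings--;
    }
    return ret;
}
```

The generic helpers never need to know which backend sits behind the ops
pointer, which is the point of keeping the listener and address space code in
the base class.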
> 
> The selection of the backend is made on a device basis using the new
> iommufd option (on/off/auto). By default the iommufd backend is selected
> if supported by the host 

RE: [RFC v9 16/29] vfio: Pass stage 1 MSI bindings to the host

2021-10-15 Thread Shameerali Kolothum Thodi
Hi Eric,

> -Original Message-
> From: Eric Auger [mailto:eric.au...@redhat.com]
> Sent: 11 April 2021 13:09
> To: eric.auger@gmail.com; eric.au...@redhat.com;
> qemu-devel@nongnu.org; qemu-...@nongnu.org;
> alex.william...@redhat.com
> Cc: peter.mayd...@linaro.org; jean-phili...@linaro.org; pet...@redhat.com;
> jacob.jun@linux.intel.com; yi.l....@intel.com; Shameerali Kolothum Thodi
> ; t...@semihalf.com;
> nicoleots...@gmail.com; yuzenghui ;
> zhangfei@gmail.com; vivek.gau...@arm.com; jiangkunkun
> ; vdu...@nvidia.com; chenxiang (M)
> ; zhukeqian 
> Subject: [RFC v9 16/29] vfio: Pass stage 1 MSI bindings to the host
> 
> We register the stage1 MSI bindings when enabling the vectors
> and we unregister them on msi disable.
> 
> Signed-off-by: Eric Auger 
> 
> ---
> 
> v7 -> v8:
> - add unregistration on msix_disable
> - remove vfio_container_unbind_msis()
> 
> v4 -> v5:
> - use VFIO_IOMMU_SET_MSI_BINDING
> 
> v2 -> v3:
> - only register the notifier if the IOMMU translates MSIs
> - record the msi bindings in a container list and unregister on
>   container release
> ---
>  include/hw/vfio/vfio-common.h | 12 ++
>  hw/vfio/common.c  | 59 +++
>  hw/vfio/pci.c | 76
> ++-
>  hw/vfio/trace-events  |  2 +
>  4 files changed, 147 insertions(+), 2 deletions(-)
> 
> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
> index 6141162d7a..f30133b2a3 100644
> --- a/include/hw/vfio/vfio-common.h
> +++ b/include/hw/vfio/vfio-common.h
> @@ -74,6 +74,14 @@ typedef struct VFIOAddressSpace {
>  QLIST_ENTRY(VFIOAddressSpace) list;
>  } VFIOAddressSpace;
> 
> +typedef struct VFIOMSIBinding {
> +int index;
> +hwaddr iova;
> +hwaddr gpa;
> +hwaddr size;
> +QLIST_ENTRY(VFIOMSIBinding) next;
> +} VFIOMSIBinding;
> +
>  struct VFIOGroup;
> 
>  typedef struct VFIOContainer {
> @@ -91,6 +99,7 @@ typedef struct VFIOContainer {
>  QLIST_HEAD(, VFIOGuestIOMMU) giommu_list;
>  QLIST_HEAD(, VFIOHostDMAWindow) hostwin_list;
>  QLIST_HEAD(, VFIOGroup) group_list;
> +QLIST_HEAD(, VFIOMSIBinding) msibinding_list;
>  QLIST_ENTRY(VFIOContainer) next;
>  } VFIOContainer;
> 
> @@ -200,6 +209,9 @@ VFIOGroup *vfio_get_group(int groupid,
> AddressSpace *as, Error **errp);
>  void vfio_put_group(VFIOGroup *group);
>  int vfio_get_device(VFIOGroup *group, const char *name,
>  VFIODevice *vbasedev, Error **errp);
> +int vfio_iommu_set_msi_binding(VFIOContainer *container, int n,
> +   IOMMUTLBEntry *entry);
> +int vfio_iommu_unset_msi_binding(VFIOContainer *container, int n);
> 
>  extern const MemoryRegionOps vfio_region_ops;
>  typedef QLIST_HEAD(VFIOGroupList, VFIOGroup) VFIOGroupList;
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index e369d451e7..970a5a7be7 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
> @@ -662,6 +662,65 @@ static void
> vfio_iommu_unmap_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb)
>  }
>  }
> 
> +int vfio_iommu_set_msi_binding(VFIOContainer *container, int n,
> +   IOMMUTLBEntry *iotlb)
> +{
> +struct vfio_iommu_type1_set_msi_binding ustruct;
> +VFIOMSIBinding *binding;
> +int ret;
> +
> +QLIST_FOREACH(binding, &container->msibinding_list, next) {
> +if (binding->index == n) {
> +return 0;
> +}
> +}
> +
> +ustruct.argsz = sizeof(struct vfio_iommu_type1_set_msi_binding);
> +ustruct.iova = iotlb->iova;
> +ustruct.flags = VFIO_IOMMU_BIND_MSI;
> +ustruct.gpa = iotlb->translated_addr;
> +ustruct.size = iotlb->addr_mask + 1;
> +ret = ioctl(container->fd, VFIO_IOMMU_SET_MSI_BINDING , &ustruct);
> +if (ret) {
> +error_report("%s: failed to register the stage1 MSI binding (%m)",
> + __func__);
> +return ret;
> +}
> +binding =  g_new0(VFIOMSIBinding, 1);
> +binding->iova = ustruct.iova;
> +binding->gpa = ustruct.gpa;
> +binding->size = ustruct.size;
> +binding->index = n;
> +
> +QLIST_INSERT_HEAD(&container->msibinding_list, binding, next);
> +return 0;
> +}
> +
> +int vfio_iommu_unset_msi_binding(VFIOContainer *container, int n)
> +{
> +struct vfio_iommu_type1_set_msi_binding ustruct;
> +VFIOMSIBinding *binding, *tmp;
> +int ret;
> +
> +ustruct.argsz = sizeof(struct vfio_iommu_type1_set_msi_binding);
> +QLIST_FOREACH_SAFE(binding, &container->msibinding_list, next, tmp) {
> +
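
The bookkeeping in the quoted hunk (skip a vector index that is already
bound, otherwise record it at the head of the list) can be sketched without
QEMU's QLIST macros. The names below are stand-ins invented for this sketch,
the ioctl is left out, and since the unset path is cut off in the quote
above, the removal logic here is an assumption rather than the patch's code:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Illustrative stand-in for the per-container MSI binding record. */
typedef struct DemoMSIBinding {
    int index;
    uint64_t iova;
    uint64_t gpa;
    uint64_t size;
    struct DemoMSIBinding *next;
} DemoMSIBinding;

typedef struct DemoBindingList {
    DemoMSIBinding *head;
} DemoBindingList;

/* Returns 0 on success (including "already bound"), -1 if out of memory. */
static int demo_binding_set(DemoBindingList *l, int n,
                            uint64_t iova, uint64_t gpa, uint64_t size)
{
    assert(l);
    for (DemoMSIBinding *b = l->head; b; b = b->next) {
        if (b->index == n) {
            return 0;               /* vector n is already registered */
        }
    }
    /* The real code would register the binding with the host here. */
    DemoMSIBinding *b = calloc(1, sizeof(*b));
    if (!b) {
        return -1;
    }
    b->index = n;
    b->iova = iova;
    b->gpa = gpa;
    b->size = size;
    b->next = l->head;              /* insert at head */
    l->head = b;
    return 0;
}

/* Assumed behaviour: drop the record for vector n if one exists. */
static int demo_binding_unset(DemoBindingList *l, int n)
{
    assert(l);
    for (DemoMSIBinding **pp = &l->head; *pp; pp = &(*pp)->next) {
        if ((*pp)->index == n) {
            DemoMSIBinding *dead = *pp;
            *pp = dead->next;
            free(dead);
            return 0;
        }
    }
    return 0;                       /* nothing bound for n: nothing to do */
}

static size_t demo_binding_count(const DemoBindingList *l)
{
    size_t count = 0;
    for (const DemoMSIBinding *b = l->head; b; b = b->next) {
        count++;
    }
    return count;
}
```

Making the set path idempotent means the caller can invoke it every time a
vector is enabled without tracking whether it was bound before.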

RE: [PATCH] hw/arm/Kconfig: no need to enable ACPI_MEMORY_HOTPLUG explicitly

2021-08-19 Thread Shameerali Kolothum Thodi


> -Original Message-
> From: Philippe Mathieu-Daudé [mailto:phi...@redhat.com]
> Sent: 19 August 2021 15:50
> To: Ani Sinha 
> Cc: Peter Maydell ; QEMU Developers
> ; qemu-arm ; Michael S.
> Tsirkin ; Igor Mammedov ;
> Shameerali Kolothum Thodi 
> Subject: Re: [PATCH] hw/arm/Kconfig: no need to enable
> ACPI_MEMORY_HOTPLUG explicitly
> 
> Cc'ing Shameer Kolothum.
> 
> On 8/19/21 3:36 PM, Ani Sinha wrote:
> > On Thu, 19 Aug 2021, Ani Sinha wrote:
> >> On Thu, 19 Aug 2021, Peter Maydell wrote:
> >>> On Tue, 17 Aug 2021 at 05:45, Ani Sinha  wrote:
> 
> >>> Is it intended that ACPI_HW_REDUCED must always imply
> >>> ACPI_MEMORY_HOTPLUG, or is it just a coincidence that the virt board
> >>> happens to want both, and so we select both ?
> 
> The ACPI dependency was missing (see commit 36b79e3219d,
> "hw/acpi/Kconfig: Add missing Kconfig dependencies (build error)"); now we
> don't need it explicitly.

Yes. And it looks like ACPI_NVDIMM also can be removed now.

Regards,
Shameer

> >> From a purely code inspection point of view, I noticed that
> >> generic_event_device.c depends on CONFIG_ACPI_HW_REDUCED. The GED
> >> does use memory hotplug apis - for example acpi_ged_device_plug_cb()
> >> uses
> >> acpi_memory_plug_cb() etc.
> >>
> >> Hence, as it stands today, CONFIG_ACPI_HW_REDUCED will need to select
> >> ACPI memory hotplug. Unless we remove the GED device's dependence on
> >> ACPI_HW_REDUCED that is. I cannot comment whether that would be wise
> >> or if we should reorg the code in some other way.
> >
> > The other question we should ask is whether arm platform requires
> > ACPI_MEMORY_HOTPLUG independent of ACPI_HW_REDUCED/GED device?
> If that
> > is the case, then maybe we should keep that config option as is.
> > Maybe @qemu-arm can answer that?
> 
> Or git-log:
> 
> commit cff51ac978c4fa0b3d0de0fd62d772d9003f123e
> Author: Shameer Kolothum 
> Date:   Wed Sep 18 14:06:27 2019 +0100
> 
> hw/arm/virt: Enable device memory cold/hot plug with ACPI boot
> 
> This initializes the GED device with base memory and irq, configures
> ged memory hotplug event and builds the corresponding aml code. With
> this, both hot and cold plug of device memory is enabled now for
> Guest with ACPI boot. Memory cold plug support with Guest DT boot is
> not yet supported.
> 
> >>>> On Thu, 12 Aug 2021, Ani Sinha wrote:
> >>>>
> 
> Please prepend here 'Since commit 36b79e3219d ("hw/acpi/Kconfig: Add
> missing Kconfig dependencies"),'
> 
> With it:
> Reviewed-by: Philippe Mathieu-Daudé 
> 
> >>>>> ACPI_MEMORY_HOTPLUG is implicitly turned on when
> ACPI_HW_REDUCED is selected.
> >>>>> ACPI_HW_REDUCED is already enabled. No need to turn on
> >>>>> ACPI_MEMORY_HOTPLUG explicitly. This is a minor cleanup.
> >>>>>
> >>>>> Signed-off-by: Ani Sinha 
> >>>>> ---
> >>>>>  hw/arm/Kconfig | 1 -
> >>>>>  1 file changed, 1 deletion(-)
> >>>>>
> >>>>> diff --git a/hw/arm/Kconfig b/hw/arm/Kconfig index
> >>>>> 4ba0aca067..38cf9f44e2 100644
> >>>>> --- a/hw/arm/Kconfig
> >>>>> +++ b/hw/arm/Kconfig
> >>>>> @@ -25,7 +25,6 @@ config ARM_VIRT
> >>>>>  select ACPI_PCI
> >>>>>  select MEM_DEVICE
> >>>>>  select DIMM
> >>>>> -select ACPI_MEMORY_HOTPLUG
> >>>>>  select ACPI_HW_REDUCED
> >>>>>  select ACPI_NVDIMM
> >>>>>  select ACPI_APEI
> >>>>> --
> >>>>> 2.25.1



RE: [RFC v7 26/26] vfio/pci: Implement return_page_response page response callback

2021-02-24 Thread Shameerali Kolothum Thodi
Hi Eric,

> -Original Message-
> From: Auger Eric [mailto:eric.au...@redhat.com]
> Sent: 24 February 2021 13:44
> To: Shameerali Kolothum Thodi ;
> eric.auger@gmail.com; qemu-devel@nongnu.org; qemu-...@nongnu.org;
> alex.william...@redhat.com
> Cc: peter.mayd...@linaro.org; jacob.jun@linux.intel.com;
> zhangfei@gmail.com; jean-phili...@linaro.org; t...@semihalf.com;
> pet...@redhat.com; nicoleots...@gmail.com; vivek.gau...@arm.com;
> yi.l@intel.com; Zengtao (B) ; yuzenghui
> ; qubingbing 
> Subject: Re: [RFC v7 26/26] vfio/pci: Implement return_page_response page
> response callback
> 
> Hi Shameer,
[...]
 
> I sent the respin on top of master branch + Jean-Philippe's
> [PATCH v12 00/10] iommu: I/O page faults for SMMUv3.
> because I thought it makes more sense to post on master + some nearly
> "ready to go" stuff.

Yes. I see that. Thanks for the respin. Will take a look at this soon.

> 
> Nevertheless I will do my best to prepare asap a branch based on Jean's
> sva/current branch (based on 5.11-rc5)

Ok.

Cheers,
Shameer




RE: [RFC v7 26/26] vfio/pci: Implement return_page_response page response callback

2021-02-18 Thread Shameerali Kolothum Thodi

Hi Eric,

> -Original Message-
> From: Auger Eric [mailto:eric.au...@redhat.com]
> Sent: 18 February 2021 10:42
> To: Shameerali Kolothum Thodi ;
> eric.auger@gmail.com; qemu-devel@nongnu.org; qemu-...@nongnu.org;
> alex.william...@redhat.com
> Cc: peter.mayd...@linaro.org; jacob.jun@linux.intel.com; Zengtao (B)
> ; jean-phili...@linaro.org; t...@semihalf.com;
> pet...@redhat.com; nicoleots...@gmail.com; vivek.gau...@arm.com;
> yi.l@intel.com; zhangfei@gmail.com; yuzenghui
> ; qubingbing 
> Subject: Re: [RFC v7 26/26] vfio/pci: Implement return_page_response page
> response callback
> 
[...]

> > Also, I just noted that this patch breaks the dev hot add/del functionality.
> > device_add works fine but device_del is not removing the dev cleanly.
> Thank you for reporting this!
> 
> The test matrix becomes bigger and bigger :-( I need to write some
> avocado-vt tests or the like.
> 
> I am currently working on the respin. At the moment I investigate the
> DPDK issue that you reported and I was able to reproduce.

Ok. Good to know that it is reproducible.

> I intend to rebase on top of Jean-Philippe's
> [PATCH v12 00/10] iommu: I/O page faults for SMMUv3
> 
> Is that good enough for your SVA integration or do you want me to prepare a
> rebase on some extended code?

Could you please try to base it on 
https://jpbrucker.net/git/linux/log/?h=sva/current

I think that has the latest from Jean-Philippe and will be easy to add
uacce/zip specific patches to test SVA/vSVA.

Thanks,
Shameer

 
> Thanks
> 
> Eric
> >
> > The below one fixes it. Please check.
> >
> > Thanks,
> > Shameer
> >
> > diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> > index 797acd9c73..92c1d48316 100644
> > --- a/hw/vfio/pci.c
> > +++ b/hw/vfio/pci.c
> > @@ -3470,6 +3470,7 @@ static void vfio_instance_finalize(Object *obj)
> >  vfio_display_finalize(vdev);
> >  vfio_bars_finalize(vdev);
> >  vfio_region_finalize(&vdev->dma_fault_region);
> > +vfio_region_finalize(&vdev->dma_fault_response_region);
> >  g_free(vdev->emulated_config_bits);
> >  g_free(vdev->rom);
> >  /*
> > @@ -3491,6 +3492,7 @@ static void vfio_exitfn(PCIDevice *pdev)
> >  vfio_unregister_err_notifier(vdev);
> >  vfio_unregister_ext_irq_notifiers(vdev);
> >  vfio_region_exit(&vdev->dma_fault_region);
> > +vfio_region_exit(&vdev->dma_fault_response_region);
> >  pci_device_set_intx_routing_notifier(&vdev->pdev, NULL);
> >  if (vdev->irqchip_change_notifier.notify) {
> >
> kvm_irqchip_remove_change_notifier(&vdev->irqchip_change_not
> >
> >
> >



RE: [RFC v7 26/26] vfio/pci: Implement return_page_response page response callback

2021-02-18 Thread Shameerali Kolothum Thodi
Hi Eric,

> -Original Message-
> From: Eric Auger [mailto:eric.au...@redhat.com]
> Sent: 16 November 2020 18:14
> To: eric.auger@gmail.com; eric.au...@redhat.com;
> qemu-devel@nongnu.org; qemu-...@nongnu.org;
> alex.william...@redhat.com
> Cc: peter.mayd...@linaro.org; jean-phili...@linaro.org; pet...@redhat.com;
> jacob.jun@linux.intel.com; yi.l....@intel.com; Shameerali Kolothum Thodi
> ; t...@semihalf.com;
> nicoleots...@gmail.com; yuzenghui ;
> zhangfei@gmail.com; vivek.gau...@arm.com
> Subject: [RFC v7 26/26] vfio/pci: Implement return_page_response page
> response callback
> 
> This patch implements the page response path. The
> response is written into the page response ring buffer and then
> the header's head index is updated. This path is not used
> by this series. It is introduced here as a POC for vSVA/ARM
> integration.
> 
> Signed-off-by: Eric Auger 
> ---
>  hw/vfio/pci.h |   2 +
>  hw/vfio/pci.c | 121
> ++
>  2 files changed, 123 insertions(+)
> 
> diff --git a/hw/vfio/pci.h b/hw/vfio/pci.h
> index 350e9e9005..ce0472611e 100644
> --- a/hw/vfio/pci.h
> +++ b/hw/vfio/pci.h
> @@ -147,6 +147,8 @@ struct VFIOPCIDevice {
>  VFIOPCIExtIRQ *ext_irqs;
>  VFIORegion dma_fault_region;
>  uint32_t fault_tail_index;
> +VFIORegion dma_fault_response_region;
> +uint32_t fault_response_head_index;
>  int (*resetfn)(struct VFIOPCIDevice *);
>  uint32_t vendor_id;
>  uint32_t device_id;
> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> index 4e3495bb60..797acd9c73 100644
> --- a/hw/vfio/pci.c
> +++ b/hw/vfio/pci.c
> @@ -2631,6 +2631,61 @@ out:
>  g_free(fault_region_info);
>  }
> 
> +static void vfio_init_fault_response_regions(VFIOPCIDevice *vdev, Error
> **errp)
> +{
> +struct vfio_region_info *fault_region_info = NULL;
> +struct vfio_region_info_cap_fault *cap_fault;
> +VFIODevice *vbasedev = &vdev->vbasedev;
> +struct vfio_info_cap_header *hdr;
> +char *fault_region_name;
> +int ret;
> +
> +ret = vfio_get_dev_region_info(&vdev->vbasedev,
> +   VFIO_REGION_TYPE_NESTED,
> +
> VFIO_REGION_SUBTYPE_NESTED_DMA_FAULT_RESPONSE,
> +   &fault_region_info);
> +if (ret) {
> +goto out;
> +}
> +
> +hdr = vfio_get_region_info_cap(fault_region_info,
> +
> VFIO_REGION_INFO_CAP_DMA_FAULT);

VFIO_REGION_INFO_CAP_DMA_FAULT_RESPONSE ? 

> +if (!hdr) {
> +error_setg(errp, "failed to retrieve DMA FAULT RESPONSE
> capability");
> +goto out;
> +}
> +cap_fault = container_of(hdr, struct vfio_region_info_cap_fault,
> + header);
> +if (cap_fault->version != 1) {
> +error_setg(errp, "Unsupported DMA FAULT RESPONSE API
> version %d",
> +   cap_fault->version);
> +goto out;
> +}
> +
> +fault_region_name = g_strdup_printf("%s DMA FAULT RESPONSE %d",
> +vbasedev->name,
> +fault_region_info->index);
> +
> +ret = vfio_region_setup(OBJECT(vdev), vbasedev,
> +&vdev->dma_fault_response_region,
> +fault_region_info->index,
> +fault_region_name);
> +g_free(fault_region_name);
> +if (ret) {
> +error_setg_errno(errp, -ret,
> + "failed to set up the DMA FAULT RESPONSE
> region %d",
> + fault_region_info->index);
> +goto out;
> +}
> +
> +ret = vfio_region_mmap(&vdev->dma_fault_response_region);
> +if (ret) {
> +error_setg_errno(errp, -ret, "Failed to mmap the DMA FAULT
> RESPONSE queue");
> +}
> +out:
> +g_free(fault_region_info);
> +}
> +
>  static void vfio_populate_device(VFIOPCIDevice *vdev, Error **errp)
>  {
>  VFIODevice *vbasedev = &vdev->vbasedev;
> @@ -2706,6 +2761,12 @@ static void vfio_populate_device(VFIOPCIDevice
> *vdev, Error **errp)
>  return;
>  }
> 
> +vfio_init_fault_response_regions(vdev, &err);
> +if (err) {
> +error_propagate(errp, err);
> +return;
> +}
> +
>  irq_info.index = VFIO_PCI_ERR_IRQ_INDEX;
> 
>  ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_GET_IRQ_INFO, &irq_info);
> @@ -2884,8 +2945,68 @@ static int vfio_iommu_set_pasid_table(PCIBus
> *bus, int32_t devfn,
>  return ioctl(container->fd, VFIO_IOMMU_SET_PASID_TABLE, );
>  }
> 
>

RE: [PATCH v4] arm/virt: Add memory hot remove support

2020-06-24 Thread Shameerali Kolothum Thodi



> -Original Message-
> From: Igor Mammedov [mailto:imamm...@redhat.com]
> Sent: 24 June 2020 15:09
> To: Shameerali Kolothum Thodi 
> Cc: qemu-devel@nongnu.org; qemu-...@nongnu.org;
> peter.mayd...@linaro.org; m...@redhat.com; Linuxarm
> ; xuwei (O) ;
> eric.au...@redhat.com; Zengtao (B) 
> Subject: Re: [PATCH v4] arm/virt: Add memory hot remove support
> 
> On Mon, 22 Jun 2020 13:41:57 +0100
> Shameer Kolothum  wrote:
> 
> > This adds support for memory(pc-dimm) hot remove on arm/virt that
> > uses acpi ged device.
> >
> > NVDIMM hot removal is not yet supported.
> >
> > Signed-off-by: Shameer Kolothum 
> > ---
> > v2 --> v3
> >   -Addressed Eric's comments on v3.
> > v2 --> v3
> >   -Addressed Eric's review comment and added check for NVDIMM.
> > RFC v1 --> v2
> >   -Rebased on top of latest Qemu master.
> >   -Dropped "RFC" and tested with kernel 5.7-rc6
> > ---
> >  hw/acpi/generic_event_device.c | 29 
> >  hw/arm/virt.c  | 62
> --
> >  2 files changed, 89 insertions(+), 2 deletions(-)
> >
> > diff --git a/hw/acpi/generic_event_device.c
> b/hw/acpi/generic_event_device.c
> > index 1cb34111e5..b8abdefa1c 100644
> > --- a/hw/acpi/generic_event_device.c
> > +++ b/hw/acpi/generic_event_device.c
> > @@ -193,6 +193,33 @@ static void

[...]

> > +static void virt_dimm_unplug(HotplugHandler *hotplug_dev,
> > + DeviceState *dev, Error **errp)
> > +{
> > +VirtMachineState *vms = VIRT_MACHINE(hotplug_dev);
> > +Error *local_err = NULL;
> > +
> > +hotplug_handler_unplug(HOTPLUG_HANDLER(vms->acpi_dev), dev,
> &local_err);
> > +if (local_err) {
> > +goto out;
> > +}
> > +
> > +pc_dimm_unplug(PC_DIMM(dev), MACHINE(vms));
> > +qdev_unrealize(dev);
> 
> doesn't pc_dimm_unplug() do unrealize already?
> (/me wonders why it doesn't explode here,
> are we leaking a reference somewhere so dimm is still alive?)

Does it? From a quick look at the code it is not obvious.

pc_dimm_unplug()
  memory_device_unplug()
memory_region_del_subregion()
  vmstate_unregister_ram()
qemu_ram_unset_idstr()
qemu_ram_unset_migratable()

If it does, then we may need to fix x86/ppc as well.

Thanks,
Shameer

> > +
> > +out:
> > +error_propagate(errp, local_err);
> > +}
> > +
> >  static void virt_machine_device_unplug_request_cb(HotplugHandler
> *hotplug_dev,
> >DeviceState *dev, Error
> **errp)
> >  {
> > -error_setg(errp, "device unplug request for unsupported device"
> > -   " type: %s", object_get_typename(OBJECT(dev)));
> > +if (object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM)) {
> > +virt_dimm_unplug_request(hotplug_dev, dev, errp);
> > +} else {
> > +error_setg(errp, "device unplug request for unsupported device"
> > +   " type: %s", object_get_typename(OBJECT(dev)));
> > +}
> > +}
> > +
> > +static void virt_machine_device_unplug_cb(HotplugHandler *hotplug_dev,
> > +  DeviceState *dev, Error
> **errp)
> > +{
> > +if (object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM)) {
> > +virt_dimm_unplug(hotplug_dev, dev, errp);
> > +} else {
> > +error_setg(errp, "virt: device unplug for unsupported device"
> > +   " type: %s", object_get_typename(OBJECT(dev)));
> > +}
> >  }
> >
> >  static HotplugHandler *virt_machine_get_hotplug_handler(MachineState
> *machine,
> > @@ -2262,6 +2319,7 @@ static void virt_machine_class_init(ObjectClass
> *oc, void *data)
> >  hc->pre_plug = virt_machine_device_pre_plug_cb;
> >  hc->plug = virt_machine_device_plug_cb;
> >  hc->unplug_request = virt_machine_device_unplug_request_cb;
> > +hc->unplug = virt_machine_device_unplug_cb;
> >  mc->numa_mem_supported = true;
> >  mc->nvdimm_supported = true;
> >  mc->auto_enable_numa_with_memhp = true;




RE: [PATCH v3] arm/virt: Add memory hot remove support

2020-06-18 Thread Shameerali Kolothum Thodi
Hi Eric,

> -Original Message-
> From: Auger Eric [mailto:eric.au...@redhat.com]
> Sent: 18 June 2020 15:42
> To: Shameerali Kolothum Thodi ;
> qemu-devel@nongnu.org; qemu-...@nongnu.org
> Cc: peter.mayd...@linaro.org; m...@redhat.com; Linuxarm
> ; xuwei (O) ; Zengtao (B)
> ; imamm...@redhat.com
> Subject: Re: [PATCH v3] arm/virt: Add memory hot remove support
> 
> Hi Shameer,
> 
> On 6/18/20 2:21 PM, Shameer Kolothum wrote:
> > This adds support for memory(pc-dimm) hot remove on arm/virt that uses
> > acpi ged device.
> >
> > NVDIMM hot removal is not yet supported.
> >
> > Signed-off-by: Shameer Kolothum 
> > ---
> > V2 --> v3
> >   -Addressed Eric's review comment and added check for NVDIMM.
> > RFC v1 --> v2
> >   -Rebased on top of latest Qemu master.
> >   -Dropped "RFC" and tested with kernel 5.7-rc6
> > ---
> >  hw/acpi/generic_event_device.c | 29 
> >  hw/arm/virt.c  | 62
> --
> >  2 files changed, 89 insertions(+), 2 deletions(-)
> >
> > diff --git a/hw/acpi/generic_event_device.c
> > b/hw/acpi/generic_event_device.c index 1cb34111e5..b8abdefa1c 100644
> > --- a/hw/acpi/generic_event_device.c
> > +++ b/hw/acpi/generic_event_device.c
> > @@ -193,6 +193,33 @@ static void
> acpi_ged_device_plug_cb(HotplugHandler *hotplug_dev,
> >  }
> >  }
> >
> > +static void acpi_ged_unplug_request_cb(HotplugHandler *hotplug_dev,
> > +   DeviceState *dev, Error
> > +**errp) {
> > +AcpiGedState *s = ACPI_GED(hotplug_dev);
> > +
> > +if ((object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM) &&
> > +   !(object_dynamic_cast(OBJECT(dev),
> TYPE_NVDIMM)))) {
> > +acpi_memory_unplug_request_cb(hotplug_dev,
> > &s->memhp_state, dev, errp);
> > +} else {
> > +error_setg(errp, "acpi: device unplug request for unsupported
> device"
> > +   " type: %s", object_get_typename(OBJECT(dev)));
> > +}
> > +}
> > +
> > +static void acpi_ged_unplug_cb(HotplugHandler *hotplug_dev,
> > +   DeviceState *dev, Error **errp) {
> > +AcpiGedState *s = ACPI_GED(hotplug_dev);
> > +
> > +if (object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM)) {
> > +acpi_memory_unplug_cb(&s->memhp_state, dev, errp);
> > +} else {
> > +error_setg(errp, "acpi: device unplug for unsupported device"
> > +   " type: %s", object_get_typename(OBJECT(dev)));
> > +}
> > +}
> > +
> >  static void acpi_ged_send_event(AcpiDeviceIf *adev,
> > AcpiEventStatusBits ev)  {
> >  AcpiGedState *s = ACPI_GED(adev); @@ -318,6 +345,8 @@ static
> void
> > acpi_ged_class_init(ObjectClass *class, void *data)
> >  dc->vmsd = &vmstate_acpi_ged;
> >
> >  hc->plug = acpi_ged_device_plug_cb;
> > +hc->unplug_request = acpi_ged_unplug_request_cb;
> > +hc->unplug = acpi_ged_unplug_cb;
> >
> >  adevc->send_event = acpi_ged_send_event;  } diff --git
> > a/hw/arm/virt.c b/hw/arm/virt.c index caceb1e4a0..a981dc9f1c 100644
> > --- a/hw/arm/virt.c
> > +++ b/hw/arm/virt.c
> > @@ -2177,11 +2177,68 @@ static void
> virt_machine_device_plug_cb(HotplugHandler *hotplug_dev,
> >  }
> >  }
> >
> > +static void virt_dimm_unplug_request(HotplugHandler *hotplug_dev,
> > + DeviceState *dev, Error
> **errp)
> > +{
> > +VirtMachineState *vms = VIRT_MACHINE(hotplug_dev);
> > +Error *local_err = NULL;
> > +
> > +if (!vms->acpi_dev) {
> > +error_setg(errp,
> local_err? otherwise no need to propagate?

That’s right. I will change that. But since we do check for vms->acpi_dev in
virt_memory_pre_plug(), do we really need to check this here? I can't think of
getting here without first hitting _pre_plug(). Anyway hw/i386/pc.c has got
checks in both the places, so I will keep it.
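
The point about local_err can be shown with a simplified stand-in for the
Error API. Everything prefixed demo_ below is invented for this sketch and is
not QEMU's implementation: the idea is that every failure inside the function
is recorded in local_err, so the single propagate call at "out" always has
something to hand over.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Minimal stand-in for an error object; not QEMU's Error type. */
typedef struct DemoError {
    char *msg;
} DemoError;

static void demo_error_set(DemoError **errp, const char *msg)
{
    if (!errp) {
        return;                     /* caller is not interested in errors */
    }
    assert(*errp == NULL);          /* an error must not be overwritten */
    DemoError *e = malloc(sizeof(*e));
    size_t n = strlen(msg) + 1;
    if (!e || !(e->msg = malloc(n))) {
        abort();
    }
    memcpy(e->msg, msg, n);
    *errp = e;
}

static void demo_error_free(DemoError *e)
{
    if (e) {
        free(e->msg);
        free(e);
    }
}

/* Hand local over to *dst, or drop it if the caller cannot take it. */
static void demo_error_propagate(DemoError **dst, DemoError *local)
{
    if (!local) {
        return;
    }
    if (dst && !*dst) {
        *dst = local;
    } else {
        demo_error_free(local);
    }
}

/* Shape of the unplug-request path: all failures go through local_err. */
static void demo_unplug_request(int have_acpi_dev, int is_nvdimm,
                                DemoError **errp)
{
    DemoError *local_err = NULL;

    if (!have_acpi_dev) {
        demo_error_set(&local_err,
                       "memory hotplug is not enabled: missing acpi-ged device");
        goto out;
    }
    if (is_nvdimm) {
        demo_error_set(&local_err,
                       "nvdimm device hot unplug is not supported yet.");
        goto out;
    }
    /* The request would be forwarded to the ACPI device with &local_err. */
out:
    demo_error_propagate(errp, local_err);
}
```

Setting errp directly before the goto, as in the quoted hunk, still reports
the error, but it makes the propagate call at "out" a no-op for that branch.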

> > +   "memory hotplug is not enabled: missing acpi-ged
> device");
> > +goto out;
> > +}
> > +
> > +if (object_dynamic_cast(OBJECT(dev), TYPE_NVDIMM)) {
> > +error_setg(&local_err,
> > +   "nvdimm device hot unplug is not supported yet.");
> > +goto out;
> > +}
> > +
> > +hotplug_handler_unplug_request(HOTPLUG_HANDLER(vms->acpi_dev),
> dev,
> > +

RE: [PATCH v2] arm/virt: Add memory hot remove support

2020-06-17 Thread Shameerali Kolothum Thodi
Hi Eric,

> -Original Message-
> From: Auger Eric [mailto:eric.au...@redhat.com]
> Sent: 17 June 2020 14:54
> To: Shameerali Kolothum Thodi ;
> qemu-devel@nongnu.org; qemu-...@nongnu.org
> Cc: imamm...@redhat.com; peter.mayd...@linaro.org; m...@redhat.com;
> xuwei (O) ; Zengtao (B) ;
> Linuxarm 
> Subject: Re: [PATCH v2] arm/virt: Add memory hot remove support
> 
> Hi Shameer,
> 
> On 5/20/20 1:03 PM, Shameer Kolothum wrote:
> > This adds support for memory hot remove on arm/virt that
> > uses acpi ged device.
> >
> > Signed-off-by: Shameer Kolothum 
> > ---
> > RFC v1 --> v2
> >   -Rebased on top of latest Qemu master.
> >   -Dropped "RFC" and tested with kernel 5.7-rc6
> > ---
> >  hw/acpi/generic_event_device.c | 28 +
> >  hw/arm/virt.c  | 56
> --
> >  2 files changed, 82 insertions(+), 2 deletions(-)
> >
> > diff --git a/hw/acpi/generic_event_device.c
> b/hw/acpi/generic_event_device.c
> > index b1cbdd86b6..2b3bedcd2f 100644
> > --- a/hw/acpi/generic_event_device.c
> > +++ b/hw/acpi/generic_event_device.c
> > @@ -193,6 +193,32 @@ static void
> acpi_ged_device_plug_cb(HotplugHandler *hotplug_dev,
> >  }
> >  }
> >
> > +static void acpi_ged_unplug_request_cb(HotplugHandler *hotplug_dev,
> > +   DeviceState *dev, Error
> **errp)
> > +{
> > +AcpiGedState *s = ACPI_GED(hotplug_dev);
> > +
> > +if (object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM)) {
> > +acpi_memory_unplug_request_cb(hotplug_dev,
> > &s->memhp_state, dev, errp);
> is it allowed to unplug NVDIMM? As NVDIMM inherits from PCDIMM, I wonder
> if we have to handle the case differently (as done in hotplug part).

True. This patch requires an NVDIMM check. I think when I sent out the initial RFC,
NVDIMM hot add was not merged and I forgot to update it. My bad.

But I am not sure we need to add the check here if we take care of that in
virt_machine_device_unplug_request_cb() as you have noted below. Do we?
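
To illustrate why the PC_DIMM test alone is not enough: a cast check that
walks the parent chain succeeds for a derived type too. The sketch below uses
a tiny hand-rolled type table (all Demo* and demo_* names are invented here;
this is not how QOM is implemented) with NVDIMM deriving from PC-DIMM as
noted in the review comment:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Toy type descriptor with single inheritance; a stand-in for QOM types. */
typedef struct DemoType {
    const char *name;
    const struct DemoType *parent;
} DemoType;

static const DemoType demo_device  = { "device",  NULL };
static const DemoType demo_pc_dimm = { "pc-dimm", &demo_device };
static const DemoType demo_nvdimm  = { "nvdimm",  &demo_pc_dimm };

/* True if t is the named type or inherits from it. */
static bool demo_is_a(const DemoType *t, const char *name)
{
    assert(name);
    for (; t; t = t->parent) {
        if (strcmp(t->name, name) == 0) {
            return true;
        }
    }
    return false;
}

/* Hot unplug is allowed for plain DIMMs only, so NVDIMM must be excluded. */
static bool demo_can_hot_unplug(const DemoType *t)
{
    return demo_is_a(t, "pc-dimm") && !demo_is_a(t, "nvdimm");
}
```

Whichever layer ends up doing the check, it has to test for the derived type
explicitly, because the base type test matches both.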
 
> > +} else {
> > +error_setg(errp, "acpi: device unplug request for unsupported
> device"
> > +   " type: %s", object_get_typename(OBJECT(dev)));
> > +}
> > +}
> > +
> > +static void acpi_ged_unplug_cb(HotplugHandler *hotplug_dev,
> > +   DeviceState *dev, Error **errp)
> > +{
> > +AcpiGedState *s = ACPI_GED(hotplug_dev);
> > +
> > +if (object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM)) {
> > +acpi_memory_unplug_cb(&s->memhp_state, dev, errp);
> > +} else {
> > +error_setg(errp, "acpi: device unplug for unsupported device"
> > +   " type: %s", object_get_typename(OBJECT(dev)));
> > +}
> > +}
> > +
> >  static void acpi_ged_send_event(AcpiDeviceIf *adev, AcpiEventStatusBits
> ev)
> >  {
> >  AcpiGedState *s = ACPI_GED(adev);
> > @@ -318,6 +344,8 @@ static void acpi_ged_class_init(ObjectClass *class,
> void *data)
> >  dc->vmsd = &vmstate_acpi_ged;
> >
> >  hc->plug = acpi_ged_device_plug_cb;
> > +hc->unplug_request = acpi_ged_unplug_request_cb;
> > +hc->unplug = acpi_ged_unplug_cb;
> >
> >  adevc->send_event = acpi_ged_send_event;
> >  }
> > diff --git a/hw/arm/virt.c b/hw/arm/virt.c
> > index 37462a6f78..110fa73990 100644
> > --- a/hw/arm/virt.c
> > +++ b/hw/arm/virt.c
> > @@ -2177,11 +2177,62 @@ static void
> virt_machine_device_plug_cb(HotplugHandler *hotplug_dev,
> >  }
> >  }
> >
> > +static void virt_dimm_unplug_request(HotplugHandler *hotplug_dev,
> > + DeviceState *dev, Error
> **errp)
> > +{
> > +VirtMachineState *vms = VIRT_MACHINE(hotplug_dev);
> > +Error *local_err = NULL;
> > +
> > +if (!vms->acpi_dev) {
> > +error_setg(errp,
> > +   "memory hotplug is not enabled: missing acpi-ged
> device");
> > +goto out;
> > +}
> > +
> > +hotplug_handler_unplug_request(HOTPLUG_HANDLER(vms->acpi_dev),
> dev,
> > +   &local_err);
> > +out:
> > +error_propagate(errp, local_err);
> > +}
> > +
> > +static void virt_dimm_unplug(HotplugHandler *hotplug_dev,
> > + DeviceState *dev, Error **errp)
> > +{
> > +VirtMachineState *vms = VIRT_MACHINE(hotplug_dev);
> > +Error *local_err 

RE: [PATCH v2] arm/virt: Add memory hot remove support

2020-06-17 Thread Shameerali Kolothum Thodi
Hi,

A gentle ping on this one. 

Thanks,
Shameer

> -Original Message-
> From: Linuxarm [mailto:linuxarm-boun...@huawei.com] On Behalf Of Shameer
> Kolothum
> Sent: 20 May 2020 12:04
> To: qemu-devel@nongnu.org; qemu-...@nongnu.org
> Cc: peter.mayd...@linaro.org; m...@redhat.com; Linuxarm
> ; eric.au...@redhat.com; Zengtao (B)
> ; imamm...@redhat.com
> Subject: [PATCH v2] arm/virt: Add memory hot remove support
> 
> This adds support for memory hot remove on arm/virt that uses acpi ged
> device.
> 
> Signed-off-by: Shameer Kolothum 
> ---
> RFC v1 --> v2
>   -Rebased on top of latest Qemu master.
>   -Dropped "RFC" and tested with kernel 5.7-rc6
> ---
>  hw/acpi/generic_event_device.c | 28 +
>  hw/arm/virt.c  | 56
> --
>  2 files changed, 82 insertions(+), 2 deletions(-)
> 
> diff --git a/hw/acpi/generic_event_device.c b/hw/acpi/generic_event_device.c
> index b1cbdd86b6..2b3bedcd2f 100644
> --- a/hw/acpi/generic_event_device.c
> +++ b/hw/acpi/generic_event_device.c
> @@ -193,6 +193,32 @@ static void acpi_ged_device_plug_cb(HotplugHandler
> *hotplug_dev,
>  }
>  }
> 
> +static void acpi_ged_unplug_request_cb(HotplugHandler *hotplug_dev,
> +   DeviceState *dev, Error
> **errp)
> +{
> +AcpiGedState *s = ACPI_GED(hotplug_dev);
> +
> +if (object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM)) {
> +acpi_memory_unplug_request_cb(hotplug_dev, &s->memhp_state,
> dev, errp);
> +} else {
> +error_setg(errp, "acpi: device unplug request for unsupported
> device"
> +   " type: %s", object_get_typename(OBJECT(dev)));
> +}
> +}
> +
> +static void acpi_ged_unplug_cb(HotplugHandler *hotplug_dev,
> +   DeviceState *dev, Error **errp) {
> +AcpiGedState *s = ACPI_GED(hotplug_dev);
> +
> +if (object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM)) {
> +acpi_memory_unplug_cb(&s->memhp_state, dev, errp);
> +} else {
> +error_setg(errp, "acpi: device unplug for unsupported device"
> +   " type: %s", object_get_typename(OBJECT(dev)));
> +}
> +}
> +
>  static void acpi_ged_send_event(AcpiDeviceIf *adev, AcpiEventStatusBits ev)
> {
>  AcpiGedState *s = ACPI_GED(adev);
> @@ -318,6 +344,8 @@ static void acpi_ged_class_init(ObjectClass *class,
> void *data)
>  dc->vmsd = &vmstate_acpi_ged;
> 
>  hc->plug = acpi_ged_device_plug_cb;
> +hc->unplug_request = acpi_ged_unplug_request_cb;
> +hc->unplug = acpi_ged_unplug_cb;
> 
>  adevc->send_event = acpi_ged_send_event;
>  }
> diff --git a/hw/arm/virt.c b/hw/arm/virt.c
> index 37462a6f78..110fa73990 100644
> --- a/hw/arm/virt.c
> +++ b/hw/arm/virt.c
> @@ -2177,11 +2177,62 @@ static void
> virt_machine_device_plug_cb(HotplugHandler *hotplug_dev,
>  }
>  }
> 
> +static void virt_dimm_unplug_request(HotplugHandler *hotplug_dev,
> + DeviceState *dev, Error **errp)
> {
> +VirtMachineState *vms = VIRT_MACHINE(hotplug_dev);
> +Error *local_err = NULL;
> +
> +if (!vms->acpi_dev) {
> +error_setg(errp,
> +   "memory hotplug is not enabled: missing acpi-ged
> device");
> +goto out;
> +}
> +
> +hotplug_handler_unplug_request(HOTPLUG_HANDLER(vms->acpi_dev),
> dev,
> +   &local_err);
> +out:
> +error_propagate(errp, local_err);
> +}
> +
> +static void virt_dimm_unplug(HotplugHandler *hotplug_dev,
> + DeviceState *dev, Error **errp) {
> +VirtMachineState *vms = VIRT_MACHINE(hotplug_dev);
> +Error *local_err = NULL;
> +
> +hotplug_handler_unplug(HOTPLUG_HANDLER(vms->acpi_dev), dev,
> &local_err);
> +if (local_err) {
> +goto out;
> +}
> +
> +pc_dimm_unplug(PC_DIMM(dev), MACHINE(vms));
> +object_property_set_bool(OBJECT(dev), false, "realized", NULL);
> +
> + out:
> +error_propagate(errp, local_err);
> +}
> +
>  static void virt_machine_device_unplug_request_cb(HotplugHandler
> *hotplug_dev,
>DeviceState *dev, Error
> **errp)  {
> -error_setg(errp, "device unplug request for unsupported device"
> -   " type: %s", object_get_typename(OBJECT(dev)));
> +if (object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM)) {
> +virt_dimm_unplug_request(hotplug_dev, dev, errp);
> +} else {
> +error_setg(errp, "device unplug request for unsupported device"
> +   " type: %s", object_get_typename(OBJECT(dev)));
> +}
> +}
> +
> +static void virt_machine_device_unplug_cb(HotplugHandler *hotplug_dev,
> +  DeviceState *dev, Error
> +**errp) {
> +if (object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM)) {
> +virt_dimm_unplug(hotplug_dev, dev, errp);
> +} else {
> +error_setg(errp, "virt: device unplug for unsupported device"
> +   " type: %s", 

RE: [RFC v6 00/24] vSMMUv3/pSMMUv3 2 stage VFIO integration

2020-04-03 Thread Shameerali Kolothum Thodi
Hi Eric,

> -Original Message-
> From: Auger Eric [mailto:eric.au...@redhat.com]
> Sent: 03 April 2020 11:45
> To: Shameerali Kolothum Thodi ;
> eric.auger@gmail.com; qemu-devel@nongnu.org; qemu-...@nongnu.org;
> peter.mayd...@linaro.org; m...@redhat.com; alex.william...@redhat.com;
> jacob.jun@linux.intel.com; yi.l@intel.com
> Cc: pet...@redhat.com; jean-phili...@linaro.org; w...@kernel.org;
> tnowi...@marvell.com; zhangfei@foxmail.com; zhangfei@linaro.org;
> m...@kernel.org; bbhush...@marvell.com
> Subject: Re: [RFC v6 00/24] vSMMUv3/pSMMUv3 2 stage VFIO integration
> 
> Hi Shameer,
> 
> On 3/25/20 12:35 PM, Shameerali Kolothum Thodi wrote:
> > Hi Eric,
> >
> >> -Original Message-
> >> From: Eric Auger [mailto:eric.au...@redhat.com]
> >> Sent: 20 March 2020 16:58
> >> To: eric.auger@gmail.com; eric.au...@redhat.com;
> >> qemu-devel@nongnu.org; qemu-...@nongnu.org;
> peter.mayd...@linaro.org;
> >> m...@redhat.com; alex.william...@redhat.com;
> >> jacob.jun@linux.intel.com; yi.l....@intel.com
> >> Cc: pet...@redhat.com; jean-phili...@linaro.org; w...@kernel.org;
> >> tnowi...@marvell.com; Shameerali Kolothum Thodi
> >> ; zhangfei@foxmail.com;
> >> zhangfei@linaro.org; m...@kernel.org; bbhush...@marvell.com
> >> Subject: [RFC v6 00/24] vSMMUv3/pSMMUv3 2 stage VFIO integration
> >>
> >> Up to now vSMMUv3 has not been integrated with VFIO. VFIO
> >> integration requires to program the physical IOMMU consistently
> >> with the guest mappings. However, as opposed to VTD, SMMUv3 has
> >> no "Caching Mode" which allows easy trapping of guest mappings.
> >> This means the vSMMUV3 cannot use the same VFIO integration as VTD.
> >>
> >> However SMMUv3 has 2 translation stages. This was devised with
> >> virtualization use case in mind where stage 1 is "owned" by the
> >> guest whereas the host uses stage 2 for VM isolation.
> >>
> >> This series sets up this nested translation stage. It only works
> >> if there is one physical SMMUv3 used along with QEMU vSMMUv3 (in
> >> other words, it does not work if there is a physical SMMUv2).
> >
> > I was testing this series on one of our hardware boards with SMMUv3. I did
> > observe an issue while trying to bring up Guest with and without the
> vsmmuV3.
> 
> I am currently investigating and up to now I fail to reproduce on my end.
> >
> > Steps are like below,
> >
> > 1. start a guest with "iommu=smmuv3" and a n/w vf device.
> >
> > 2.Exit the VM.
> > how do you exit the VM?

QMP system_powerdown

> >
> > 3. start the guest again without "iommu=smmuv3"
> >
> > This time qemu crashes with,
> >
> > [ 0.447830] hns3 :00:01.0: enabling device ( -> 0002)
> >
> /home/shameer/qemu-eric/qemu/hw/vfio/pci.c:2851:vfio_dma_fault_notifier_
> handler:
> > Object 0xeeb47c00 is not an instance of type
> So I think I understand the qemu crash. At the moment the vfio_pci
> registers a fault handler even if we are not in nested mode. The smmuv3
> host driver calls any registered fault handler when it encounters an
> error in !nested mode. So the eventfd is triggered to userspace but qemu
> does not expect that. However the root cause is we got some physical
> faults on the second run.

True. And qemu works fine if I run again with the iommu=smmuv3 option.
That's why I suspect the mapping for the device in the physical SMMU
is not cleared, and the vfio-pci device enable path then hits the error.

> > qemu:iommu-memory-region
> > ./qemu_run-vsmmu-hns: line 9: 13609 Aborted (core
> > dumped) ./qemu-system-aarch64-vsmmuv3v10 -machine
> > virt,kernel_irqchip=on,gic-version=3 -cpu host -smp cpus=1 -kernel
> > Image-ericv10-uacce -initrd rootfs-iperf.cpio -bios
> Just to double check with you,
> host: will-arm-smmu-updates-2stage-v10
> qemu: v4.2.0-2stage-rfcv6
> guest version?

Yes. And guest = host image.

> > QEMU_EFI_Dec2018.fd -device vfio-pci,host=:7d:02.1 -net none -m
> Do you assign exactly the same VF as during the 1st run?

Yes same. Only change is "iommu=smmuv3" omission. 

> > 4096 -nographic -D -d -enable-kvm -append "console=ttyAMA0
> > root=/dev/vda -m 4096 rw earlycon=pl011,0x900"
> >
> > And you can see that host kernel receives smmuv3 C_BAD_STE event,
> >
> > [10499.379288] vfio-pci :7d:02.1: enabling device ( -> 0002)
> > [10501.943881] arm-smmu-v3 arm-smmu-v3.2.auto: event 0x04 received:
> > [10501.943884] arm-

RE: [PATCH for-5.0 2/3] fw_cfg: Migrate ACPI table mr sizes separately

2020-04-01 Thread Shameerali Kolothum Thodi



> -Original Message-
> From: Michael S. Tsirkin [mailto:m...@redhat.com]
> Sent: 31 March 2020 16:03
> To: Shameerali Kolothum Thodi 
> Cc: qemu-devel@nongnu.org; qemu-...@nongnu.org;
> eric.au...@redhat.com; imamm...@redhat.com; peter.mayd...@linaro.org;
> shannon.zha...@gmail.com; xiaoguangrong.e...@gmail.com;
> da...@redhat.com; xuwei (O) ; ler...@redhat.com;
> Linuxarm 
> Subject: Re: [PATCH for-5.0 2/3] fw_cfg: Migrate ACPI table mr sizes 
> separately
> 
> On Mon, Mar 30, 2020 at 05:49:08PM +0100, Shameer Kolothum wrote:
> > Any sub-page size update to ACPI MRs will be lost during
> > migration, as we use aligned size in ram_load_precopy() ->
> > qemu_ram_resize() path. This will result in inconsistency in
> > FWCfgEntry sizes between source and destination. In order to avoid
> > this, save and restore them separately during migration.
> >
> > Up until now, this problem may not be that relevant for x86 as both
> > ACPI table and Linker MRs get padded and aligned. Also at present,
> > qemu_ram_resize() doesn't invoke callback to update FWCfgEntry for
> > unaligned size changes. But since we are going to fix the
> > qemu_ram_resize() in the subsequent patch, the issue may become
> > more serious especially for RSDP MR case.
> >
> > Moreover, the issue will soon become prominent in arm/virt as well
> > where the MRs are not padded or aligned at all and eventually have
> > acpi table changes as part of future additions like NVDIMM hot-add
> > feature.
> >
> > Suggested-by: David Hildenbrand 
> > Signed-off-by: Shameer Kolothum 
> > Acked-by: David Hildenbrand 
> > ---
> > Please find previous discussions here,
> > https://patchwork.kernel.org/patch/11339591/#23140343
> > ---
> >
> >  hw/core/machine.c |  1 +
> >  hw/nvram/fw_cfg.c | 86
> ++-
> >  include/hw/nvram/fw_cfg.h |  6 +++
> >  3 files changed, 92 insertions(+), 1 deletion(-)
> >
> > diff --git a/hw/core/machine.c b/hw/core/machine.c
> > index de0c425605..c1a444cb75 100644
> > --- a/hw/core/machine.c
> > +++ b/hw/core/machine.c
> > @@ -39,6 +39,7 @@ GlobalProperty hw_compat_4_2[] = {
> >  { "usb-redir", "suppress-remote-wake", "off" },
> >  { "qxl", "revision", "4" },
> >  { "qxl-vga", "revision", "4" },
> > +{ "fw_cfg", "acpi-mr-restore", "false" },
> >  };
> >  const size_t hw_compat_4_2_len = G_N_ELEMENTS(hw_compat_4_2);
> >
> > diff --git a/hw/nvram/fw_cfg.c b/hw/nvram/fw_cfg.c
> > index 179b302f01..36d1e32f83 100644
> > --- a/hw/nvram/fw_cfg.c
> > +++ b/hw/nvram/fw_cfg.c
> > @@ -39,6 +39,7 @@
> >  #include "qemu/config-file.h"
> >  #include "qemu/cutils.h"
> >  #include "qapi/error.h"
> > +#include "hw/acpi/aml-build.h"
> >
> >  #define FW_CFG_FILE_SLOTS_DFLT 0x20
> >
> > @@ -610,6 +611,50 @@ bool fw_cfg_dma_enabled(void *opaque)
> >  return s->dma_enabled;
> >  }
> >
> > +static bool fw_cfg_acpi_mr_restore(void *opaque)
> > +{
> > +FWCfgState *s = opaque;
> > +return s->acpi_mr_restore;
> 
> How about we limit this to the case where the address is
> unaligned?

Ok. I will add that check as well.

Thanks,
Shameer
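
A rough sketch of what such a check could look like, assuming "unaligned"
refers to an MR size that is not a multiple of the page size. This is only
an illustration, not the actual follow-up patch: SKETCH_PAGE_SIZE,
sketch_is_page_aligned() and sketch_needs_separate_sizes() are hypothetical
names, and a fixed 4 KiB page is assumed.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical fixed page size, for illustration only. */
#define SKETCH_PAGE_SIZE 4096ULL

/* True when size is a multiple of the (power-of-two) page size. */
static bool sketch_is_page_aligned(uint64_t size)
{
    return (size & (SKETCH_PAGE_SIZE - 1)) == 0;
}

/*
 * True when at least one of the three sizes would be changed by rounding
 * up to a page boundary, i.e. when migrating only the aligned size would
 * lose information.
 */
static bool sketch_needs_separate_sizes(uint64_t table_size,
                                        uint64_t linker_size,
                                        uint64_t rsdp_size)
{
    return !sketch_is_page_aligned(table_size) ||
           !sketch_is_page_aligned(linker_size) ||
           !sketch_is_page_aligned(rsdp_size);
}
```

With this shape, a "needed"-style predicate would return false when all
three sizes are already page aligned, so nothing extra would be migrated
in that case.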

> > +}
> > +
> > +static void fw_cfg_update_mr(FWCfgState *s, uint16_t key, size_t size)
> > +{
> > +MemoryRegion *mr;
> > +ram_addr_t offset;
> > +int arch = !!(key & FW_CFG_ARCH_LOCAL);
> > +void *ptr;
> > +
> > +key &= FW_CFG_ENTRY_MASK;
> > +assert(key < fw_cfg_max_entry(s));
> > +
> > +ptr = s->entries[arch][key].data;
> > +mr = memory_region_from_host(ptr, &offset);
> > +
> > +memory_region_ram_resize(mr, size, &error_abort);
> > +}
> > +
> > +static int fw_cfg_acpi_mr_restore_post_load(void *opaque, int version_id)
> > +{
> > +FWCfgState *s = opaque;
> > +int i, index;
> > +
> > +assert(s->files);
> > +
> > +index = be32_to_cpu(s->files->count);
> > +
> > +for (i = 0; i < index; i++) {
> > +if (!strcmp(s->files->f[i].name, ACPI_BUILD_TABLE_FILE)) {
> > +fw_cfg_update_mr(s, FW_CFG_FILE_FIRST + i,
> s->table_mr_size);
> > +} else if (!strcmp(s->files->f[i].name, ACPI_BUILD_LOADER_FILE)) {
> > +fw_cfg_update_mr(s, FW_CFG_FILE_FIRST 

RE: [PATCH for-5.0 2/3] fw_cfg: Migrate ACPI table mr sizes separately

2020-04-01 Thread Shameerali Kolothum Thodi



> -Original Message-
> From: Dr. David Alan Gilbert [mailto:dgilb...@redhat.com]
> Sent: 31 March 2020 11:46
> To: Shameerali Kolothum Thodi 
> Cc: qemu-devel@nongnu.org; qemu-...@nongnu.org;
> eric.au...@redhat.com; imamm...@redhat.com; peter.mayd...@linaro.org;
> xiaoguangrong.e...@gmail.com; da...@redhat.com; m...@redhat.com;
> Linuxarm ; xuwei (O) ;
> shannon.zha...@gmail.com; ler...@redhat.com
> Subject: Re: [PATCH for-5.0 2/3] fw_cfg: Migrate ACPI table mr sizes 
> separately

[...]

> > +static const VMStateDescription vmstate_fw_cfg_acpi_mr = {
> > +.name = "fw_cfg/acpi_mr",
> > +.version_id = 1,
> > +.minimum_version_id = 1,
> > +.needed = fw_cfg_acpi_mr_restore,
> > +.post_load = fw_cfg_acpi_mr_restore_post_load,
> > +.fields = (VMStateField[]) {
> > +VMSTATE_UINT64(table_mr_size, FWCfgState),
> > +VMSTATE_UINT64(linker_mr_size, FWCfgState),
> > +VMSTATE_UINT64(rsdp_mr_size, FWCfgState),
> 
> The checker found something I also spotted; which is you can't use a
> VMSTATE_UINT64 against a field that is size_t - it's not portable;
> I suggest the easiest fix is to make your fields in fw_cfg.h uint64's.

Thanks for that. Yes, the checker also spotted this and I was clueless. Sure,
I will change that.

Shameer
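
To make the portability point concrete, here is a minimal sketch of the
suggested direction. ExampleMrSizes and example_save_len() are hypothetical
stand-ins and not the real FWCfgState or fw_cfg code. The idea is that
size_t is 4 bytes on typical 32-bit hosts and 8 bytes on 64-bit hosts,
whereas a uint64_t field has the same width everywhere.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the three size fields; not the real FWCfgState.
 * Fixed-width fields keep the same width on 32-bit and 64-bit hosts.
 */
typedef struct ExampleMrSizes {
    uint64_t table_mr_size;
    uint64_t linker_mr_size;
    uint64_t rsdp_mr_size;
} ExampleMrSizes;

/* Compile-time check: the field is 8 bytes whatever the host's size_t is. */
_Static_assert(sizeof(((ExampleMrSizes *)0)->table_mr_size) == 8,
               "size field must be exactly 64 bits wide");

/* Store a host size_t length into a fixed-width field. */
static void example_save_len(uint64_t *field, size_t len)
{
    *field = (uint64_t)len;
}
```

The callers can keep passing size_t lengths; only the stored fields change
type, which is what a 64-bit vmstate field expects to point at.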


> Dave
> 
> > +VMSTATE_END_OF_LIST()
> > +},
> > +};
> > +
> >  static const VMStateDescription vmstate_fw_cfg = {
> >  .name = "fw_cfg",
> >  .version_id = 2,
> > @@ -631,6 +690,7 @@ static const VMStateDescription vmstate_fw_cfg = {
> >  },
> >  .subsections = (const VMStateDescription*[]) {
> >  &vmstate_fw_cfg_dma,
> > +&vmstate_fw_cfg_acpi_mr,
> >  NULL,
> >  }
> >  };
> > @@ -815,6 +875,23 @@ static struct {
> >  #define FW_CFG_ORDER_OVERRIDE_LAST 200
> >  };
> >
> > +/*
> > + * Any sub-page size update to these table MRs will be lost during migration,
> > + * as we use aligned size in ram_load_precopy() -> qemu_ram_resize() path.
> > + * In order to avoid the inconsistency in sizes save them separately and
> > + * migrate over in vmstate post_load().
> > + */
> > +static void fw_cfg_acpi_mr_save(FWCfgState *s, const char *filename,
> size_t len)
> > +{
> > +if (!strcmp(filename, ACPI_BUILD_TABLE_FILE)) {
> > +s->table_mr_size = len;
> > +} else if (!strcmp(filename, ACPI_BUILD_LOADER_FILE)) {
> > +s->linker_mr_size = len;
> > +} else if (!strcmp(filename, ACPI_BUILD_RSDP_FILE)) {
> > +s->rsdp_mr_size = len;
> > +}
> > +}
> > +
> >  static int get_fw_cfg_order(FWCfgState *s, const char *name)
> >  {
> >  int i;
> > @@ -914,6 +991,7 @@ void fw_cfg_add_file_callback(FWCfgState *s,
> const char *filename,
> >  trace_fw_cfg_add_file(s, index, s->files->f[index].name, len);
> >
> >  s->files->count = cpu_to_be32(count+1);
> > +fw_cfg_acpi_mr_save(s, filename, len);
> >  }
> >
> >  void fw_cfg_add_file(FWCfgState *s,  const char *filename,
> > @@ -937,6 +1015,7 @@ void *fw_cfg_modify_file(FWCfgState *s, const char
> *filename,
> >  ptr = fw_cfg_modify_bytes_read(s, FW_CFG_FILE_FIRST + i,
> > data, len);
> >  s->files->f[i].size   = cpu_to_be32(len);
> > +fw_cfg_acpi_mr_save(s, filename, len);
> >  return ptr;
> >  }
> >  }
> > @@ -973,7 +1052,10 @@ static void fw_cfg_machine_ready(struct Notifier
> *n, void *data)
> >  qemu_register_reset(fw_cfg_machine_reset, s);
> >  }
> >
> > -
> > +static Property fw_cfg_properties[] = {
> > +DEFINE_PROP_BOOL("acpi-mr-restore", FWCfgState, acpi_mr_restore,
> true),
> > +DEFINE_PROP_END_OF_LIST(),
> > +};
> >
> >  static void fw_cfg_common_realize(DeviceState *dev, Error **errp)
> >  {
> > @@ -1097,6 +1179,8 @@ static void fw_cfg_class_init(ObjectClass *klass,
> void *data)
> >
> >  dc->reset = fw_cfg_reset;
> >  dc->vmsd = &vmstate_fw_cfg;
> > +
> > +device_class_set_props(dc, fw_cfg_properties);
> >  }
> >
> >  static const TypeInfo fw_cfg_info = {
> > diff --git a/include/hw/nvram/fw_cfg.h b/include/hw/nvram/fw_cfg.h
> > index b5291eefad..457fee7425 100644
> > --- a/include/hw/nvram/fw_cfg.h
> > +++ b/include/hw/nvram/fw_cfg.h
> > @@ -53,6 +53,12 @@ struct FWCfgState {
> >  dma_addr_t dma_addr;
> >  AddressSpace *dma_as;
> >  MemoryRegion dma_iomem;
> > +
> > +/* restore during migration */
> > +bool acpi_mr_restore;
> > +size_t table_mr_size;
> > +size_t linker_mr_size;
> > +size_t rsdp_mr_size;
> >  };
> >
> >  struct FWCfgIoState {
> > --
> > 2.17.1
> >
> >
> >
> --
> Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK




RE: [PATCH v3 00/10] ARM virt: Add NVDIMM support

2020-03-30 Thread Shameerali Kolothum Thodi



> -Original Message-
> From: Michael S. Tsirkin [mailto:m...@redhat.com]
> Sent: 29 March 2020 11:46
> To: Shameerali Kolothum Thodi 
> Cc: qemu-devel@nongnu.org; qemu-...@nongnu.org;
> eric.au...@redhat.com; imamm...@redhat.com; peter.mayd...@linaro.org;
> shannon.zha...@gmail.com; xiaoguangrong.e...@gmail.com;
> da...@redhat.com; xuwei (O) ; ler...@redhat.com;
> Linuxarm 
> Subject: Re: [PATCH v3 00/10] ARM virt: Add NVDIMM support
> 
> On Wed, Mar 11, 2020 at 05:20:04PM +, Shameer Kolothum wrote:
> > This series adds NVDIMM support to arm/virt platform.
> 
> 
> So I'm still confused about whether there's a bugfix here
> that we need for 5.0. If yes pls post just that part
> with acks included and for-5.0 in the subject.

Ok. I can send the first 4 patches in this series as general fixes,
but as I mentioned earlier they only matter if we add nvdimm support to
arm/virt. The only case I think might break x86 is the RSDP table
update and the resulting size inconsistency during migration
discussed in patch #2.

Anyway, I think it is better to send those separately.

Thanks,
Shameer 

> > The series reuses some of the patches posted by Eric
> > in his earlier attempt here[1].
> >
> > This also includes a few fixes to qemu in general which were
> > discovered while adding nvdimm support to arm/virt.
> >
> > Patch #2 addresses the issue[2] that, during migration, the
> > source and destination might end up with an inconsistency
> > in acpi table memory region sizes.
> >
> > Patch #3 is to fix the qemu_ram_resize() callback issue[2].
> >
> > Patch #4 is another fix to the nvdimm aml issue discussed
> > here[3].
> >
> > I have done a basic sanity testing of NVDIMM devices
> > with Guest booting with ACPI. Further testing is always
> > welcome.
> >
> > Please let me know your feedback.
> >
> > Thanks,
> > Shameer
> >
> > [1] https://patchwork.kernel.org/cover/10830777/
> > [2] https://patchwork.kernel.org/patch/11339591/
> > [3] https://patchwork.kernel.org/cover/11174959/
> >
> > v2 --> v3
> >  - Added patch #1 and # 2 to fix the inconsistency in acpi
> >table memory region sizes during migration. Thanks to
> >David H.
> >  - The fix for qemu_ram_resize() callback was modified to
> >the one in patch #3. Again thanks to David H.
> >  - Addressed comments from MST and Eric on tests added.
> >  - Addressed comments from Igor/MST on Integer size in patch #4
> >  - Added Eric's R-by to patch #7.
> >
> > v1 --> v2
> >  -Reworked patch #1 and now fix is inside qemu_ram_resize().
> >  -Added patch #2 to fix the nvdimm aml issue.
> >  -Dropped support to DT cold plug.
> >  -Updated test_acpi_virt_tcg_memhp() with pc-dimm and nvdimms(patch
> #7)
> >
> > David Hildenbrand (1):
> >   exec: Fix for qemu_ram_resize() callback
> >
> > Kwangwoo Lee (2):
> >   nvdimm: Use configurable ACPI IO base and size
> >   hw/arm/virt: Add nvdimm hot-plug infrastructure
> >
> > Shameer Kolothum (7):
> >   acpi: Use macro for table-loader file name
> >   fw_cfg: Migrate ACPI table mr sizes separately
> >   hw/acpi/nvdimm: Fix for NVDIMM incorrect DSM output buffer length
> >   hw/arm/virt: Add nvdimm hotplug support
> >   tests: Update ACPI tables list for upcoming arm/virt test changes
> >   tests/bios-tables-test: Update arm/virt memhp test
> >   tests/acpi: add expected tables for bios-tables-test
> >
> >  docs/specs/acpi_hw_reduced_hotplug.rst |   1 +
> >  exec.c |  14 +++-
> >  hw/acpi/generic_event_device.c |  15 -
> >  hw/acpi/nvdimm.c   |  72
> +
> >  hw/arm/Kconfig |   1 +
> >  hw/arm/virt-acpi-build.c   |   8 ++-
> >  hw/arm/virt.c  |  35 --
> >  hw/core/machine.c  |   1 +
> >  hw/i386/acpi-build.c   |   8 ++-
> >  hw/i386/acpi-build.h   |   3 +
> >  hw/i386/pc_piix.c  |   2 +
> >  hw/i386/pc_q35.c   |   2 +
> >  hw/mem/Kconfig |   2 +-
> >  hw/nvram/fw_cfg.c  |  86
> -
> >  include/hw/acpi/aml-build.h|   1 +
> >  include/hw/acpi/generic_event_device.h |   1 +
> >  include/hw/arm/virt.h  |   1 +
> >  include/hw/mem/nvdimm.h|   3 +
> >  include/hw/nvram/fw_cfg.h  |   6 ++
> >  tests/data/acpi/pc/SSDT.dimmpxm| Bin 685 -> 734 bytes
> >  tests/data/acpi/q35/SSDT.dimmpxm   | Bin 685 -> 734 bytes
> >  tests/data/acpi/virt/DSDT.memhp| Bin 6644 -> 6668 bytes
> >  tests/data/acpi/virt/NFIT.memhp| Bin 0 -> 224 bytes
> >  tests/data/acpi/virt/SSDT.memhp| Bin 0 -> 736 bytes
> >  tests/qtest/bios-tables-test.c |   9 ++-
> >  25 files changed, 244 insertions(+), 27 deletions(-)
> >  create mode 100644 tests/data/acpi/virt/NFIT.memhp
> >  create mode 100644 tests/data/acpi/virt/SSDT.memhp
> >
> > --
> > 2.17.1
> >




RE: [RFC v1] arm/virt: Add memory hot remove support

2020-03-26 Thread Shameerali Kolothum Thodi
Hi Eric,

> -Original Message-
> From: Auger Eric [mailto:eric.au...@redhat.com]
> Sent: 26 March 2020 11:01
> To: Shameerali Kolothum Thodi ;
> qemu-devel@nongnu.org; qemu-...@nongnu.org
> Cc: imamm...@redhat.com; peter.mayd...@linaro.org; m...@redhat.com;
> xuwei (O) ; Zengtao (B) ;
> Linuxarm ; Anshuman Khandual
> 
> Subject: Re: [RFC v1] arm/virt: Add memory hot remove support
> 
> Hi Shameer,
> 
> On 3/18/20 1:37 PM, Shameer Kolothum wrote:
> > This adds support for memory hot remove on arm/virt that
> > uses acpi ged device.
> 
> I gave this a try and it works fine if the PCDIMM slot was initially
> hotplugged:
> (QEMU) object-add qom-type=memory-backend-ram id=mem1
> props.size=4294967296
> {"return": {}}
> (QEMU) device_add driver=pc-dimm  id=pcdimm1 memdev=mem1
> (QEMU) device_del id=pcdimm1
> {"return": {}}
> 
> on guest I can see:
> [   82.466321] Offlined Pages 262144
> [   82.541712] Offlined Pages 262144
> [   82.589236] Offlined Pages 262144
> [   82.969166] Offlined Pages 262144
> 
> However I noticed that if qemu is launched directly with
> 
> -m 16G,maxmem=32G,slots=2 \
> -object memory-backend-ram,id=mem1,size=4G \
> -device pc-dimm,memdev=mem1,id=dimm1,driver=pc-dimm -device
> 
> and then in the qmp shell:
> (QEMU) device_del id=dimm1
> 
> the hot-unplug fails in guest:
> 
> [   78.897407] Offlined Pages 262144
> [   79.260811] Offlined Pages 262144
> [   79.308105] Offlined Pages 262144
> [   79.333675] page:fe00137d1f40 refcount:1 mapcount:0
> mapping:0004ea9f20b1 index:0xaaab11c6e
> [   79.335927] anon flags: 0x17880024(uptodate|active|swapbacked)
> [   79.337571] raw: 17880024 dead0100
> dead0122
> 0004ea9f20b1
> [   79.339502] raw: 000aaab11c6e  0001
> 0004fd4e3000
> [   79.341701] page dumped because: unmovable page
> [   79.342887] page->mem_cgroup:0004fd4e3000
> [   79.354729] page:fe00137d1f40 refcount:1 mapcount:0
> mapping:0004ea9f20b1 index:0xaaab11c6e
> [   79.357012] anon flags: 0x17880024(uptodate|active|swapbacked)
> [   79.358658] raw: 17880024 dead0100
> dead0122
> 0004ea9f20b1
> [   79.360611] raw: 000aaab11c6e  0001
> 0004fd4e3000
> [   79.362560] page dumped because: unmovable page
> [   79.363742] page->mem_cgroup:0004fd4e3000
> [   79.368636] memory memory20: Offline failed.
> 
> I did not expect this. The PCDIMM slot in that case does not seem to be
> interpreted as a hot-unpluggable one (?). I added Anshuman in cc.

Could you please try adding "movable_node" to the guest kernel command line
parameters? This will prevent any kernel allocation from hotpluggable memory
nodes, which I think is causing the behavior you are seeing.

Thanks,
Shameer


> Thanks
> 
> Eric
> 
> 
> 
> >
> > Signed-off-by: Shameer Kolothum 
> > ---
> >  -RFC because linux kernel support for mem hot remove is just queued
> >   for 5.7[1].
> >  -Tested with guest kernel 5.6-rc5 + [1]
> >
> > 1. https://patchwork.kernel.org/cover/11419301/
> > ---
> >  hw/acpi/generic_event_device.c | 28 +
> >  hw/arm/virt.c  | 56
> --
> >  2 files changed, 82 insertions(+), 2 deletions(-)
> >
> > diff --git a/hw/acpi/generic_event_device.c
> b/hw/acpi/generic_event_device.c
> > index 021ed2bf23..3e28c110fa 100644
> > --- a/hw/acpi/generic_event_device.c
> > +++ b/hw/acpi/generic_event_device.c
> > @@ -182,6 +182,32 @@ static void
> acpi_ged_device_plug_cb(HotplugHandler *hotplug_dev,
> >  }
> >  }
> >
> > +static void acpi_ged_unplug_request_cb(HotplugHandler *hotplug_dev,
> > +   DeviceState *dev, Error
> **errp)
> > +{
> > +AcpiGedState *s = ACPI_GED(hotplug_dev);
> > +
> > +if (object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM)) {
> > +acpi_memory_unplug_request_cb(hotplug_dev,
> &s->memhp_state, dev, errp);
> > +} else {
> > +error_setg(errp, "acpi: device unplug request for unsupported
> device"
> > +   " type: %s", object_get_typename(OBJECT(dev)));
> > +}
> > +}
> > +
> > +static void acpi_ged_unplug_cb(HotplugHandler *hotplug_dev,
> > +   DeviceState *dev, Error **errp)
> > +{
> > +AcpiGedState *s = ACPI_GED(hotplug_dev);
> > +
> > +if (object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM)) {
> > +  

RE: [RFC v6 00/24] vSMMUv3/pSMMUv3 2 stage VFIO integration

2020-03-25 Thread Shameerali Kolothum Thodi
Hi Eric,

> -Original Message-
> From: Eric Auger [mailto:eric.au...@redhat.com]
> Sent: 20 March 2020 16:58
> To: eric.auger@gmail.com; eric.au...@redhat.com;
> qemu-devel@nongnu.org; qemu-...@nongnu.org; peter.mayd...@linaro.org;
> m...@redhat.com; alex.william...@redhat.com;
> jacob.jun@linux.intel.com; yi.l@intel.com
> Cc: pet...@redhat.com; jean-phili...@linaro.org; w...@kernel.org;
> tnowi...@marvell.com; Shameerali Kolothum Thodi
> ; zhangfei@foxmail.com;
> zhangfei@linaro.org; m...@kernel.org; bbhush...@marvell.com
> Subject: [RFC v6 00/24] vSMMUv3/pSMMUv3 2 stage VFIO integration
> 
> Up to now vSMMUv3 has not been integrated with VFIO. VFIO
> integration requires to program the physical IOMMU consistently
> with the guest mappings. However, as opposed to VTD, SMMUv3 has
> no "Caching Mode" which allows easy trapping of guest mappings.
> This means the vSMMUV3 cannot use the same VFIO integration as VTD.
> 
> However SMMUv3 has 2 translation stages. This was devised with
> virtualization use case in mind where stage 1 is "owned" by the
> guest whereas the host uses stage 2 for VM isolation.
> 
> This series sets up this nested translation stage. It only works
> if there is one physical SMMUv3 used along with QEMU vSMMUv3 (in
> other words, it does not work if there is a physical SMMUv2).

I was testing this series on one of our hardware boards with SMMUv3. I did
observe an issue while trying to bring up a guest with and without the vSMMUv3.

Steps are like below,

1. start a guest with "iommu=smmuv3" and a n/w vf device.

2.Exit the VM.

3. start the guest again without "iommu=smmuv3"

This time qemu crashes with,

[ 0.447830] hns3 :00:01.0: enabling device ( -> 0002)
/home/shameer/qemu-eric/qemu/hw/vfio/pci.c:2851:vfio_dma_fault_notifier_handler:
Object 0xeeb47c00 is not an instance of type
qemu:iommu-memory-region
./qemu_run-vsmmu-hns: line 9: 13609 Aborted (core
dumped) ./qemu-system-aarch64-vsmmuv3v10 -machine
virt,kernel_irqchip=on,gic-version=3 -cpu host -smp cpus=1 -kernel
Image-ericv10-uacce -initrd rootfs-iperf.cpio -bios
QEMU_EFI_Dec2018.fd -device vfio-pci,host=:7d:02.1 -net none -m
4096 -nographic -D -d -enable-kvm -append "console=ttyAMA0
root=/dev/vda -m 4096 rw earlycon=pl011,0x900"

And you can see that host kernel receives smmuv3 C_BAD_STE event,

[10499.379288] vfio-pci :7d:02.1: enabling device ( -> 0002)
[10501.943881] arm-smmu-v3 arm-smmu-v3.2.auto: event 0x04 received:
[10501.943884] arm-smmu-v3 arm-smmu-v3.2.auto: 0x7d110004
[10501.943886] arm-smmu-v3 arm-smmu-v3.2.auto: 0x10080080
[10501.943887] arm-smmu-v3 arm-smmu-v3.2.auto: 0xfe04
[10501.943889] arm-smmu-v3 arm-smmu-v3.2.auto: 0x7e04c440

So I suspect we didn't clear the nested stage configuration and that affects
the translation in the second run. I tried to issue (force) a
vfio_detach_pasid_table(), but that didn't solve the problem.

Maybe I am missing something. Could you please take a look and let me know?

Thanks,
Shameer

> - We force the host to use stage 2 instead of stage 1, when we
>   detect a vSMMUV3 is behind a VFIO device. For a VFIO device
>   without any virtual IOMMU, we still use stage 1 as many existing
>   SMMUs expect this behavior.
> - We use PCIPASIDOps to propagate guest stage1 config changes on
>   STE (Stream Table Entry) changes.
> - We implement a specific UNMAP notifier that conveys guest
>   IOTLB invalidations to the host
> - We register MSI IOVA/GPA bindings to the host so that this latter
>   can build a nested stage translation
> - As the legacy MAP notifier is not called anymore, we must make
>   sure stage 2 mappings are set. This is achieved through another
>   prereg memory listener.
> - Physical SMMU stage 1 related faults are reported to the guest
>   via an eventfd mechanism and exposed through a dedicated VFIO-PCI
>   region. Then they are reinjected into the guest.
> 
> Best Regards
> 
> Eric
> 
> This series can be found at:
> https://github.com/eauger/qemu/tree/v4.2.0-2stage-rfcv6
> 
> Kernel Dependencies:
> [1] [PATCH v10 00/11] SMMUv3 Nested Stage Setup (VFIO part)
> [2] [PATCH v10 00/13] SMMUv3 Nested Stage Setup (IOMMU part)
> branch at:
> https://github.com/eauger/linux/tree/will-arm-smmu-updates-2stage-v10
> 
> History:
> 
> v5 -> v6:
> - just rebase work
> 
> v4 -> v5:
> - Use PCIPASIDOps for config update notifications
> - removal of notification for MSI binding which is not needed
>   anymore
> - Use a single fault region
> - use the specific interrupt index
> 
> v3 -> v4:
> - adapt to changes in uapi (asid cache invalidation)
> - check VFIO_PCI_DMA_FAULT_IRQ_INDEX is supported at kernel lev

RE: [PATCH v3 02/10] fw_cfg: Migrate ACPI table mr sizes separately

2020-03-23 Thread Shameerali Kolothum Thodi



> -Original Message-
> From: Igor Mammedov [mailto:imamm...@redhat.com]
> Sent: 23 March 2020 12:35
> To: Shameerali Kolothum Thodi 
> Cc: qemu-devel@nongnu.org; qemu-...@nongnu.org;
> eric.au...@redhat.com; peter.mayd...@linaro.org;
> xiaoguangrong.e...@gmail.com; da...@redhat.com; m...@redhat.com;
> Linuxarm ; xuwei (O) ;
> shannon.zha...@gmail.com; ler...@redhat.com
> Subject: Re: [PATCH v3 02/10] fw_cfg: Migrate ACPI table mr sizes separately
> 
> On Wed, 11 Mar 2020 17:20:06 +
> Shameer Kolothum  wrote:
> 
> > Any sub-page size update to ACPI table MRs will be lost during
> > migration, as we use aligned size in ram_load_precopy() ->
> > qemu_ram_resize() path. This will result in inconsistency in sizes
> > between source and destination.
> I'm not sure what problem is and if it matters in case of migration,
> an example here with numbers from affected acpi blob would be useful here.

This happens when we try to fix the qemu_ram_resize() callback for sub-page
changes (patch #03/10 in this series). In the previous discussion
David Hildenbrand pointed out that the fix will create an inconsistency between
source and target. I can add more details and some numbers in the
commit log here.
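
As a rough illustration of the kind of mismatch being discussed, with
made-up numbers only: EXAMPLE_PAGE_SIZE, example_page_align() and the
0x1234 size used below are hypothetical and are not taken from QEMU or
from a real ACPI blob. The sketch just shows what rounding a sub-page size
up to a page boundary does to the value the destination ends up with.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical page size, for illustration only. */
#define EXAMPLE_PAGE_SIZE 4096ULL

/* Round size up to the next multiple of the (power-of-two) page size. */
static uint64_t example_page_align(uint64_t size)
{
    return (size + EXAMPLE_PAGE_SIZE - 1) & ~(EXAMPLE_PAGE_SIZE - 1);
}

/*
 * True when the size recorded on the source differs from the aligned size
 * the destination would see if only the aligned value were migrated.
 */
static bool example_sizes_diverge(uint64_t src_size)
{
    return example_page_align(src_size) != src_size;
}
```

With a made-up source size of 0x1234, example_page_align() gives 0x2000,
so the two sides would disagree by 0xdcc bytes. A size that is already a
multiple of the page size, such as 0x2000, is unchanged.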

> PS:
> could you point to mail thread where problem was discussed

It is here:
https://patchwork.kernel.org/patch/11339591/#23138505

Thanks,
Shameer
 
> > In order to avoid this, save and
> > restore them separately during migration.
> >
> > Suggested-by: David Hildenbrand 
> > Signed-off-by: Shameer Kolothum 
> > ---
> > Please find the discussion here,
> > https://patchwork.kernel.org/patch/11339591/
> > ---
> >  hw/core/machine.c |  1 +
> >  hw/nvram/fw_cfg.c | 86
> ++-
> >  include/hw/nvram/fw_cfg.h |  6 +++
> >  3 files changed, 92 insertions(+), 1 deletion(-)
> >
> > diff --git a/hw/core/machine.c b/hw/core/machine.c
> > index 9e8c06036f..6d960bd47f 100644
> > --- a/hw/core/machine.c
> > +++ b/hw/core/machine.c
> > @@ -39,6 +39,7 @@ GlobalProperty hw_compat_4_2[] = {
> >  { "usb-redir", "suppress-remote-wake", "off" },
> >  { "qxl", "revision", "4" },
> >  { "qxl-vga", "revision", "4" },
> > +{ "fw_cfg", "acpi-mr-restore", "false" },
> >  };
> >  const size_t hw_compat_4_2_len = G_N_ELEMENTS(hw_compat_4_2);
> >
> > diff --git a/hw/nvram/fw_cfg.c b/hw/nvram/fw_cfg.c
> > index 179b302f01..36d1e32f83 100644
> > --- a/hw/nvram/fw_cfg.c
> > +++ b/hw/nvram/fw_cfg.c
> > @@ -39,6 +39,7 @@
> >  #include "qemu/config-file.h"
> >  #include "qemu/cutils.h"
> >  #include "qapi/error.h"
> > +#include "hw/acpi/aml-build.h"
> >
> >  #define FW_CFG_FILE_SLOTS_DFLT 0x20
> >
> > @@ -610,6 +611,50 @@ bool fw_cfg_dma_enabled(void *opaque)
> >  return s->dma_enabled;
> >  }
> >
> > +static bool fw_cfg_acpi_mr_restore(void *opaque)
> > +{
> > +FWCfgState *s = opaque;
> > +return s->acpi_mr_restore;
> > +}
> > +
> > +static void fw_cfg_update_mr(FWCfgState *s, uint16_t key, size_t size)
> > +{
> > +MemoryRegion *mr;
> > +ram_addr_t offset;
> > +int arch = !!(key & FW_CFG_ARCH_LOCAL);
> > +void *ptr;
> > +
> > +key &= FW_CFG_ENTRY_MASK;
> > +assert(key < fw_cfg_max_entry(s));
> > +
> > +ptr = s->entries[arch][key].data;
> > +mr = memory_region_from_host(ptr, &offset);
> > +
> > +memory_region_ram_resize(mr, size, &error_abort);
> > +}
> > +
> > +static int fw_cfg_acpi_mr_restore_post_load(void *opaque, int version_id)
> > +{
> > +FWCfgState *s = opaque;
> > +int i, index;
> > +
> > +assert(s->files);
> > +
> > +index = be32_to_cpu(s->files->count);
> > +
> > +for (i = 0; i < index; i++) {
> > +if (!strcmp(s->files->f[i].name, ACPI_BUILD_TABLE_FILE)) {
> > +fw_cfg_update_mr(s, FW_CFG_FILE_FIRST + i,
> s->table_mr_size);
> > +} else if (!strcmp(s->files->f[i].name, ACPI_BUILD_LOADER_FILE)) {
> > +fw_cfg_update_mr(s, FW_CFG_FILE_FIRST + i,
> s->linker_mr_size);
> > +} else if (!strcmp(s->files->f[i].name, ACPI_BUILD_RSDP_FILE)) {
> > +fw_cfg_update_mr(s, FW_CFG_FILE_FIRST + i,
> s->rsdp_mr_size);
> > +}
> > +}
> > +
> > +retu

RE: [PATCH v3 02/10] fw_cfg: Migrate ACPI table mr sizes separately

2020-03-20 Thread Shameerali Kolothum Thodi



> -Original Message-
> From: Michael S. Tsirkin [mailto:m...@redhat.com]
> Sent: 19 March 2020 17:51
> To: Shameerali Kolothum Thodi 
> Cc: qemu-devel@nongnu.org; qemu-...@nongnu.org;
> eric.au...@redhat.com; imamm...@redhat.com; peter.mayd...@linaro.org;
> shannon.zha...@gmail.com; xiaoguangrong.e...@gmail.com;
> da...@redhat.com; xuwei (O) ; ler...@redhat.com;
> Linuxarm 
> Subject: Re: [PATCH v3 02/10] fw_cfg: Migrate ACPI table mr sizes separately
> 
> On Thu, Mar 12, 2020 at 09:27:32AM +, Shameerali Kolothum Thodi
> wrote:
> >
> >
> > > -Original Message-
> > > From: Michael S. Tsirkin [mailto:m...@redhat.com]
> > > Sent: 11 March 2020 21:10
> > > To: Shameerali Kolothum Thodi 
> > > Cc: qemu-devel@nongnu.org; qemu-...@nongnu.org;
> > > eric.au...@redhat.com; imamm...@redhat.com;
> peter.mayd...@linaro.org;
> > > shannon.zha...@gmail.com; xiaoguangrong.e...@gmail.com;
> > > da...@redhat.com; xuwei (O) ;
> ler...@redhat.com;
> > > Linuxarm 
> > > Subject: Re: [PATCH v3 02/10] fw_cfg: Migrate ACPI table mr sizes
> separately
> > >
> > > On Wed, Mar 11, 2020 at 05:20:06PM +, Shameer Kolothum wrote:
> > > > Any sub-page size update to ACPI table MRs will be lost during
> > > > migration, as we use aligned size in ram_load_precopy() ->
> > > > qemu_ram_resize() path. This will result in inconsistency in sizes
> > > > between source and destination. In order to avoid this, save and
> > > > restore them separately during migration.
> >
> >
> > > Is there a reason this is part of nvdimm patchset?
> >
> > Not really. But this problem is more visible if we have nvdimm hotplug
> > support added to arm/virt. On x86, both acpi table and linker MRs are 
> > already
> > aligned and I don't know a use case where you can change RSDP MR size(See
> below).
> >
> > >
> > > Hmm but for old machine types we still have a problem right?
> > > How about aligning size on source for them?
> > > Then there won't be an inconsistency across migration.
> > > Wastes some boot time/memory but maybe that's better
> > > than a chance of not booting ...
> >
> > Right. That was considered. On x86, except RSDP MR, both the LINKER and
> ACPI
> > TABLE MRs are already aligned/padded. And we cannot make RSDP mr
> aligned
> > as it will break the seabios based boot.
> 
> Hmm. So right now if we migrate just before RSDP is read, there's
> a failure?

I am not sure that will be the case. IIUC, on the migration path,
ram_load_precopy() --> qemu_ram_resize() won't be called, as both length and
block->used_length will be the aligned size. Even if it is called, the current
qemu_ram_resize() works on the aligned size and won't invoke the callback to
update the FWCfgEntry. And I believe on the destination, the BIOS read will
trigger fw_cfg_select(), which will call acpi_build_update() to rebuild the
tables and update the FWCfgEntry.
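
To illustrate the alignment argument above, here is a small standalone model.
It is only a sketch: MODEL_PAGE_SIZE, page_align() and load_would_resize() are
hypothetical names, a 4 KiB page is assumed, and the comparison merely mirrors
the "length != used_length" check described in this thread rather than the
actual QEMU code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed page size for illustration; the real value depends on the host. */
#define MODEL_PAGE_SIZE 4096u

/* Round a size up to the next multiple of the assumed page size. */
static uint64_t page_align(uint64_t size)
{
    return (size + MODEL_PAGE_SIZE - 1) & ~(uint64_t)(MODEL_PAGE_SIZE - 1);
}

/*
 * Model of the destination-side check: a resize is only attempted when the
 * aligned length from the source differs from the aligned local length.
 * A change that stays within the same page is therefore never noticed.
 */
static bool load_would_resize(uint64_t src_actual, uint64_t dst_actual)
{
    return page_align(src_actual) != page_align(dst_actual);
}
```

With these assumptions, two blobs of 5000 and 6000 bytes both align to 8192,
so no resize (and hence no callback) would be triggered for them.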

Having said that, my knowledge of this is limited, but I can test and confirm
this if there is an easy way to trigger this use case. Please let me know.

Thanks,
Shameer






> > So a generic solution based on alignment
> > is not possible unless we guarantee that RSDP is not going to be modified.
> >
> > What we could do for Arm/virt is just follow the x86 way and add padding for
> > table and linker MRs. But this was discussed before and IIRC, was not well
> > received.
> >
> > Thanks,
> > Shameer
> >
> > > > Suggested-by: David Hildenbrand 
> > > > Signed-off-by: Shameer Kolothum
> 
> > > > ---
> > > > Please find the discussion here,
> > > > https://patchwork.kernel.org/patch/11339591/
> > > > ---
> > > >  hw/core/machine.c |  1 +
> > > >  hw/nvram/fw_cfg.c | 86
> > > ++-
> > > >  include/hw/nvram/fw_cfg.h |  6 +++
> > > >  3 files changed, 92 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/hw/core/machine.c b/hw/core/machine.c
> > > > index 9e8c06036f..6d960bd47f 100644
> > > > --- a/hw/core/machine.c
> > > > +++ b/hw/core/machine.c
> > > > @@ -39,6 +39,7 @@ GlobalProperty hw_compat_4_2[] = {
> > > >  { "usb-redir", "suppress-remote-wake", "off" },
> > > >  { "qxl", "revision", "4" },
> > > >  { "qxl-vga", "revision", "4" },
> > > > +{ "

RE: [PATCH v3 02/10] fw_cfg: Migrate ACPI table mr sizes separately

2020-03-12 Thread Shameerali Kolothum Thodi



> -Original Message-
> From: Michael S. Tsirkin [mailto:m...@redhat.com]
> Sent: 11 March 2020 21:10
> To: Shameerali Kolothum Thodi 
> Cc: qemu-devel@nongnu.org; qemu-...@nongnu.org;
> eric.au...@redhat.com; imamm...@redhat.com; peter.mayd...@linaro.org;
> shannon.zha...@gmail.com; xiaoguangrong.e...@gmail.com;
> da...@redhat.com; xuwei (O) ; ler...@redhat.com;
> Linuxarm 
> Subject: Re: [PATCH v3 02/10] fw_cfg: Migrate ACPI table mr sizes separately
> 
> On Wed, Mar 11, 2020 at 05:20:06PM +, Shameer Kolothum wrote:
> > Any sub-page size update to ACPI table MRs will be lost during
> > migration, as we use aligned size in ram_load_precopy() ->
> > qemu_ram_resize() path. This will result in inconsistency in sizes
> > between source and destination. In order to avoid this, save and
> > restore them separately during migration.


> Is there a reason this is part of nvdimm patchset?

Not really. But this problem is more visible if we have nvdimm hotplug
support added to arm/virt. On x86, both the ACPI table and linker MRs are
already aligned, and I don't know of a use case where you can change the RSDP
MR size (see below).

>
> Hmm but for old machine types we still have a problem right?
> How about aligning size on source for them?
> Then there won't be an inconsistency across migration.
> Wastes some boot time/memory but maybe that's better
> than a chance of not booting ...

Right. That was considered. On x86, except for the RSDP MR, both the LINKER and
ACPI TABLE MRs are already aligned/padded. And we cannot make the RSDP MR
aligned, as it will break the seabios based boot. So a generic solution based
on alignment is not possible unless we guarantee that RSDP is not going to be
modified.

What we could do for Arm/virt is just follow the x86 way and add padding for
table and linker MRs. But this was discussed before and IIRC, was not well
received.

Thanks,
Shameer

> > Suggested-by: David Hildenbrand 
> > Signed-off-by: Shameer Kolothum 
> > ---
> > Please find the discussion here,
> > https://patchwork.kernel.org/patch/11339591/
> > ---
> >  hw/core/machine.c |  1 +
> >  hw/nvram/fw_cfg.c | 86
> ++-
> >  include/hw/nvram/fw_cfg.h |  6 +++
> >  3 files changed, 92 insertions(+), 1 deletion(-)
> >
> > diff --git a/hw/core/machine.c b/hw/core/machine.c
> > index 9e8c06036f..6d960bd47f 100644
> > --- a/hw/core/machine.c
> > +++ b/hw/core/machine.c
> > @@ -39,6 +39,7 @@ GlobalProperty hw_compat_4_2[] = {
> >  { "usb-redir", "suppress-remote-wake", "off" },
> >  { "qxl", "revision", "4" },
> >  { "qxl-vga", "revision", "4" },
> > +{ "fw_cfg", "acpi-mr-restore", "false" },
> >  };
> >  const size_t hw_compat_4_2_len = G_N_ELEMENTS(hw_compat_4_2);
> >
> > diff --git a/hw/nvram/fw_cfg.c b/hw/nvram/fw_cfg.c
> > index 179b302f01..36d1e32f83 100644
> > --- a/hw/nvram/fw_cfg.c
> > +++ b/hw/nvram/fw_cfg.c
> > @@ -39,6 +39,7 @@
> >  #include "qemu/config-file.h"
> >  #include "qemu/cutils.h"
> >  #include "qapi/error.h"
> > +#include "hw/acpi/aml-build.h"
> >
> >  #define FW_CFG_FILE_SLOTS_DFLT 0x20
> >
> > @@ -610,6 +611,50 @@ bool fw_cfg_dma_enabled(void *opaque)
> >  return s->dma_enabled;
> >  }
> >
> > +static bool fw_cfg_acpi_mr_restore(void *opaque)
> > +{
> > +FWCfgState *s = opaque;
> > +return s->acpi_mr_restore;
> > +}
> > +
> > +static void fw_cfg_update_mr(FWCfgState *s, uint16_t key, size_t size)
> > +{
> > +MemoryRegion *mr;
> > +ram_addr_t offset;
> > +int arch = !!(key & FW_CFG_ARCH_LOCAL);
> > +void *ptr;
> > +
> > +key &= FW_CFG_ENTRY_MASK;
> > +assert(key < fw_cfg_max_entry(s));
> > +
> > +ptr = s->entries[arch][key].data;
> > +mr = memory_region_from_host(ptr, &offset);
> > +
> > +memory_region_ram_resize(mr, size, &error_abort);
> > +}
> > +
> > +static int fw_cfg_acpi_mr_restore_post_load(void *opaque, int version_id)
> > +{
> > +FWCfgState *s = opaque;
> > +int i, index;
> > +
> > +assert(s->files);
> > +
> > +index = be32_to_cpu(s->files->count);
> > +
> > +for (i = 0; i < index; i++) {
> > +if (!strcmp(s->files->f[i].name, ACPI_BUILD_TABLE_FILE)) {
> > +fw_cfg_update_mr(s, FW_CFG_FILE_FIRST + i,
> s->table_mr_size);
> >

RE: [PATCH v2 1/7] exec: Fix for qemu_ram_resize() callback

2020-03-11 Thread Shameerali Kolothum Thodi


> -Original Message-
> From: David Hildenbrand [mailto:da...@redhat.com]
> Sent: 28 February 2020 18:00
> To: Shameerali Kolothum Thodi ;
> Igor Mammedov 
> Cc: peter.mayd...@linaro.org; xiaoguangrong.e...@gmail.com;
> m...@redhat.com; shannon.zha...@gmail.com; qemu-devel@nongnu.org;
> xuwei (O) ; Linuxarm ;
> eric.au...@redhat.com; qemu-...@nongnu.org; ler...@redhat.com;
> dgilb...@redhat.com; Juan Jose Quintela Carreira 
> Subject: Re: [PATCH v2 1/7] exec: Fix for qemu_ram_resize() callback
> 

[...]
 
> 
> We should instead think about
> 
> 1. Migrating the actual size of the 3 memory regions separately and setting
> them via
> memory_region_ram_resize() when loading the vmstate. This will trigger
> another FW cfg
> fixup and should be fine (with the same qemu_ram_resize() above).
> 
> 2. Introduce a new RAM_SAVE_FLAG_MEM_SIZE_2, that e.g., stores the
> number of ramblocks,
> not the total amount of memory of the ram blocks. But it's hacky, because we
> migrate
> something for RAM blocks, that is not a RAM block concept (sub-block sizes).
> 
> I think you should look into 1. Shouldn't be too hard I think.

I have sent out v3 of this series ([PATCH v3 00/10] ARM virt: Add NVDIMM
support) with an attempt to migrate the memory regions separately. It also
includes your patch for the qemu_ram_resize() callback fix. Please take a look
and let me know.
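
As a rough illustration of option 1 (restoring the three sizes when the
vmstate is loaded), a minimal standalone model follows. This is a sketch and
not the code from the series: struct acpi_mr_sizes, struct model_file and
restore_mr_sizes() are hypothetical, and the loader and RSDP file names are
assumed values for ACPI_BUILD_LOADER_FILE and ACPI_BUILD_RSDP_FILE.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical container for the three sizes carried in the stream. */
struct acpi_mr_sizes {
    uint64_t table_mr_size;
    uint64_t linker_mr_size;
    uint64_t rsdp_mr_size;
};

/* Hypothetical stand-in for a fw_cfg file entry and the region behind it. */
struct model_file {
    const char *name;
    uint64_t mr_size;
};

/* File names; the loader and RSDP names are assumptions for this sketch. */
#define MODEL_TABLE_FILE  "etc/acpi/tables"
#define MODEL_LOADER_FILE "etc/table-loader"
#define MODEL_RSDP_FILE   "etc/acpi/rsdp"

/*
 * Walk the file list and apply the saved (unaligned) size to each of the
 * three ACPI blobs. Returns the number of entries that were updated.
 */
static int restore_mr_sizes(struct model_file *files, size_t count,
                            const struct acpi_mr_sizes *saved)
{
    int updated = 0;

    for (size_t i = 0; i < count; i++) {
        if (!strcmp(files[i].name, MODEL_TABLE_FILE)) {
            files[i].mr_size = saved->table_mr_size;
            updated++;
        } else if (!strcmp(files[i].name, MODEL_LOADER_FILE)) {
            files[i].mr_size = saved->linker_mr_size;
            updated++;
        } else if (!strcmp(files[i].name, MODEL_RSDP_FILE)) {
            files[i].mr_size = saved->rsdp_mr_size;
            updated++;
        }
    }
    return updated;
}
```

In the real series the size update would go through
memory_region_ram_resize() instead of a plain assignment; the model only shows
the name-based lookup.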

Thanks,
Shameer




RE: [PATCH v2 2/7] hw/acpi/nvdimm: Fix for NVDIMM incorrect DSM output buffer length

2020-03-10 Thread Shameerali Kolothum Thodi



> -Original Message-
> From: Qemu-devel
> [mailto:qemu-devel-bounces+shameerali.kolothum.thodi=huawei.com@nongn
> u.org] On Behalf Of Michael S. Tsirkin
> Sent: 10 March 2020 11:36
> To: Shameerali Kolothum Thodi 
> Cc: peter.mayd...@linaro.org; xiaoguangrong.e...@gmail.com;
> shannon.zha...@gmail.com; qemu-devel@nongnu.org; Linuxarm
> ; eric.au...@redhat.com; qemu-...@nongnu.org;
> xuwei (O) ; Igor Mammedov
> ; ler...@redhat.com
> Subject: Re: [PATCH v2 2/7] hw/acpi/nvdimm: Fix for NVDIMM incorrect DSM
> output buffer length
> 
> On Tue, Mar 10, 2020 at 11:22:05AM +, Shameerali Kolothum Thodi
> wrote:
> >
> >
> > > -Original Message-
> > > From: Igor Mammedov [mailto:imamm...@redhat.com]
> > > Sent: 06 February 2020 16:06
> > > To: Shameerali Kolothum Thodi 
> > > Cc: qemu-devel@nongnu.org; qemu-...@nongnu.org;
> > > eric.au...@redhat.com; peter.mayd...@linaro.org;
> > > xiaoguangrong.e...@gmail.com; m...@redhat.com; Linuxarm
> > > ; xuwei (O) ;
> > > shannon.zha...@gmail.com; ler...@redhat.com
> > > Subject: Re: [PATCH v2 2/7] hw/acpi/nvdimm: Fix for NVDIMM incorrect
> DSM
> > > output buffer length
> > >
> > > On Fri, 17 Jan 2020 17:45:17 +
> > > Shameer Kolothum  wrote:
> > >
> > > > As per ACPI spec 6.3, Table 19-419 Object Conversion Rules, if the
> > > > Buffer Field <= to the size of an Integer (in bits), it will be
> > > > treated as an integer. Moreover, the integer size depends on DSDT
> > > > tables revision number. If revision number is < 2, integer size is 32
> > > > bits, otherwise it is 64 bits. Current NVDIMM common DSM aml code
> > > > (NCAL) uses CreateField() for creating DSM output buffer. This creates
> > > > an issue in arm/virt platform where DSDT revision number is 2 and
> > > > results in DSM buffer with a wrong
> > > > size(8 bytes) gets returned when actual length is < 8 bytes.
> > > > This causes guest kernel to report,
> > > >
> > > > "nfit ACPI0012:00: found a zero length table '0' parsing nfit"
> > > >
> > > > In order to fix this, aml code is now modified such that it builds the
> > > > DSM output buffer in a byte by byte fashion when length is smaller
> > > > than Integer size.
> > > >
> > > > Suggested-by: Igor Mammedov 
> > > > Signed-off-by: Shameer Kolothum
> 
> > > > ---
> > > > Please find the previous discussion on this here,
> > > > https://patchwork.kernel.org/cover/11174959/
> > > >
> > > > ---
> > > >  hw/acpi/nvdimm.c| 36
> > > +++--
> > > >  tests/qtest/bios-tables-test-allowed-diff.h |  2 ++
> > > >  2 files changed, 35 insertions(+), 3 deletions(-)
> > > >
> > > > diff --git a/hw/acpi/nvdimm.c b/hw/acpi/nvdimm.c index
> > > > 9fdad6dc3f..5e7b8318d0 100644
> > > > --- a/hw/acpi/nvdimm.c
> > > > +++ b/hw/acpi/nvdimm.c
> > > > @@ -964,6 +964,7 @@ static void nvdimm_build_common_dsm(Aml
> *dev)
> > > >  Aml *method, *ifctx, *function, *handle, *uuid, *dsm_mem,
> > > *elsectx2;
> > > >  Aml *elsectx, *unsupport, *unpatched, *expected_uuid,
> *uuid_invalid;
> > > >  Aml *pckg, *pckg_index, *pckg_buf, *field, *dsm_out_buf,
> > > > *dsm_out_buf_size;
> > > > +Aml *whilectx, *offset;
> > > >  uint8_t byte_list[1];
> > > >
> > > >  method = aml_method(NVDIMM_COMMON_DSM, 5,
> > > AML_SERIALIZED); @@
> > > > -1117,13 +1118,42 @@ static void nvdimm_build_common_dsm(Aml
> *dev)
> > > >  /* RLEN is not included in the payload returned to guest. */
> > > >  aml_append(method,
> > > aml_subtract(aml_name(NVDIMM_DSM_OUT_BUF_SIZE),
> > > > aml_int(4), dsm_out_buf_size));
> > > > +
> > > > +/*
> > > > + * As per ACPI spec 6.3, Table 19-419 Object Conversion Rules, if
> > > > + * the Buffer Field <= to the size of an Integer (in bits), it will
> > > > + * be treated as an integer. Moreover, the integer size depends on
> > > > + * DSDT tables revision number. If revision number is < 2, integer
> > > > + * size is 32 bits, otherwise it is 64 bits.
> > > > + * Because of this CreateField() canot be used if RLEN < Integer
> Size.
> 

RE: [PATCH v2 2/7] hw/acpi/nvdimm: Fix for NVDIMM incorrect DSM output buffer length

2020-03-10 Thread Shameerali Kolothum Thodi



> -Original Message-
> From: Igor Mammedov [mailto:imamm...@redhat.com]
> Sent: 06 February 2020 16:06
> To: Shameerali Kolothum Thodi 
> Cc: qemu-devel@nongnu.org; qemu-...@nongnu.org;
> eric.au...@redhat.com; peter.mayd...@linaro.org;
> xiaoguangrong.e...@gmail.com; m...@redhat.com; Linuxarm
> ; xuwei (O) ;
> shannon.zha...@gmail.com; ler...@redhat.com
> Subject: Re: [PATCH v2 2/7] hw/acpi/nvdimm: Fix for NVDIMM incorrect DSM
> output buffer length
> 
> On Fri, 17 Jan 2020 17:45:17 +
> Shameer Kolothum  wrote:
> 
> > As per ACPI spec 6.3, Table 19-419 Object Conversion Rules, if the
> > Buffer Field <= to the size of an Integer (in bits), it will be
> > treated as an integer. Moreover, the integer size depends on DSDT
> > tables revision number. If revision number is < 2, integer size is 32
> > bits, otherwise it is 64 bits. Current NVDIMM common DSM aml code
> > (NCAL) uses CreateField() for creating DSM output buffer. This creates
> > an issue in arm/virt platform where DSDT revision number is 2 and
> > results in DSM buffer with a wrong
> > size(8 bytes) gets returned when actual length is < 8 bytes.
> > This causes guest kernel to report,
> >
> > "nfit ACPI0012:00: found a zero length table '0' parsing nfit"
> >
> > In order to fix this, aml code is now modified such that it builds the
> > DSM output buffer in a byte by byte fashion when length is smaller
> > than Integer size.
> >
> > Suggested-by: Igor Mammedov 
> > Signed-off-by: Shameer Kolothum 
> > ---
> > Please find the previous discussion on this here,
> > https://patchwork.kernel.org/cover/11174959/
> >
> > ---
> >  hw/acpi/nvdimm.c| 36
> +++--
> >  tests/qtest/bios-tables-test-allowed-diff.h |  2 ++
> >  2 files changed, 35 insertions(+), 3 deletions(-)
> >
> > diff --git a/hw/acpi/nvdimm.c b/hw/acpi/nvdimm.c index
> > 9fdad6dc3f..5e7b8318d0 100644
> > --- a/hw/acpi/nvdimm.c
> > +++ b/hw/acpi/nvdimm.c
> > @@ -964,6 +964,7 @@ static void nvdimm_build_common_dsm(Aml *dev)
> >  Aml *method, *ifctx, *function, *handle, *uuid, *dsm_mem,
> *elsectx2;
> >  Aml *elsectx, *unsupport, *unpatched, *expected_uuid, *uuid_invalid;
> >  Aml *pckg, *pckg_index, *pckg_buf, *field, *dsm_out_buf,
> > *dsm_out_buf_size;
> > +Aml *whilectx, *offset;
> >  uint8_t byte_list[1];
> >
> >  method = aml_method(NVDIMM_COMMON_DSM, 5,
> AML_SERIALIZED); @@
> > -1117,13 +1118,42 @@ static void nvdimm_build_common_dsm(Aml *dev)
> >  /* RLEN is not included in the payload returned to guest. */
> >  aml_append(method,
> aml_subtract(aml_name(NVDIMM_DSM_OUT_BUF_SIZE),
> > aml_int(4), dsm_out_buf_size));
> > +
> > +/*
> > + * As per ACPI spec 6.3, Table 19-419 Object Conversion Rules, if
> > + * the Buffer Field <= to the size of an Integer (in bits), it will
> > + * be treated as an integer. Moreover, the integer size depends on
> > + * DSDT tables revision number. If revision number is < 2, integer
> > + * size is 32 bits, otherwise it is 64 bits.
> > + * Because of this CreateField() canot be used if RLEN < Integer Size.
> > + * Hence build dsm_out_buf byte by byte.
> > + */
> > +ifctx = aml_if(aml_lless(dsm_out_buf_size,
> > + aml_sizeof(aml_int(0))));
> 
> this decompiles into
> 
>  If (Local1 < SizeOf ())
> 
> which doesn't look right

Ok. I tried printing the value returned by SizeOf and that looks alright.

Anyway, I changed it to aml_int(1), which decompiles to

   If (Local1 < SizeOf (One))

Hope this is acceptable.
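
For reference, the Integer-size rule from the commit message can be written
down as a tiny standalone model. This is only a sketch in C, not AML:
acpi_integer_bytes() and needs_bytewise_build() are hypothetical helpers, and
the strict "<" comparison follows the aml_lless() check in the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Integer width in bytes as described in the commit message: 32 bits when
 * the DSDT revision is below 2, 64 bits otherwise.
 */
static unsigned acpi_integer_bytes(unsigned dsdt_revision)
{
    return dsdt_revision < 2 ? 4 : 8;
}

/*
 * True when the DSM output is shorter than an Integer, i.e. the case where
 * the patch builds the output buffer byte by byte.
 */
static bool needs_bytewise_build(uint32_t out_len, unsigned dsdt_revision)
{
    return out_len < acpi_integer_bytes(dsdt_revision);
}
```

Under these assumptions, a 4-byte result takes the byte-by-byte path with
DSDT revision 2 (as on arm/virt) but not with a revision below 2.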

Thanks,
Shameer



RE: [PATCH v2 1/7] exec: Fix for qemu_ram_resize() callback

2020-02-28 Thread Shameerali Kolothum Thodi


> -Original Message-
> From: David Hildenbrand [mailto:da...@redhat.com]
> Sent: 13 February 2020 17:09
> To: Shameerali Kolothum Thodi ;
> Igor Mammedov 
> Cc: peter.mayd...@linaro.org; xiaoguangrong.e...@gmail.com;
> m...@redhat.com; shannon.zha...@gmail.com; qemu-devel@nongnu.org;
> xuwei (O) ; Linuxarm ;
> eric.au...@redhat.com; qemu-...@nongnu.org; ler...@redhat.com;
> dgilb...@redhat.com; Juan Jose Quintela Carreira 
> Subject: Re: [PATCH v2 1/7] exec: Fix for qemu_ram_resize() callback

[...]

> >> Thanks for that. I had a go with the below patch and it indeed fixes the 
> >> issue
> >> of callback not being called on resize. But the migration fails with the 
> >> below
> >> error,
> >>
> >> For x86
> >> -
> >> qemu-system-x86_64: Unknown combination of migration flags: 0x14
> >> qemu-system-x86_64: error while loading state for instance 0x0 of device
> 'ram'
> >> qemu-system-x86_64: load of migration failed: Invalid argument
> >>
> >> For arm64
> >> --
> >> qemu-system-aarch64: Received an unexpected compressed page
> >> qemu-system-aarch64: error while loading state for instance 0x0 of device
> 'ram'
> >> qemu-system-aarch64: load of migration failed: Invalid argument
> >>
> >> I haven’t debugged this further but looks like there is a corruption
> happening.
> >> Please let me know if you have any clue.
> >
> > The issue is
> >
> > qemu_put_be64(f, ram_bytes_total_common(true) |
> RAM_SAVE_FLAG_MEM_SIZE)
> >
> > The total ram size we store must be page aligned, otherwise it will be
> > detected as flags. Hm ... maybe we can round it up ...
> >
> 
> I'm afraid we can't otherwise we will run into issues in
> ram_load_precopy(). Hm ...

Sorry, it took a while to get back on this. Yes, rounding up indeed breaks in
ram_load_precopy(). I had the below on top of your patch and that
seems to do the job (sanity tested on arm/virt).

Please take a look and let me know if you see any issues with this approach.
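
For context, the flag collision quoted above ("the total ram size we store
must be page aligned, otherwise it will be detected as flags") can be modelled
in isolation. This is a sketch under stated assumptions: a 4 KiB page, a
MEM_SIZE flag value of 0x04, and hypothetical helper names. The 0x10 remainder
used in the note after the code is picked only for illustration; the actual
sizes in the failing run are not known.

```c
#include <stdint.h>

/* Assumed values for illustration only. */
#define MODEL_PAGE_SIZE     4096ull
#define MODEL_PAGE_MASK     (~(MODEL_PAGE_SIZE - 1))
#define MODEL_FLAG_MEM_SIZE 0x04ull

/* Sender side: the total size and the flag share one 64-bit word. */
static uint64_t encode_total(uint64_t total_bytes)
{
    return total_bytes | MODEL_FLAG_MEM_SIZE;
}

/* Receiver side: everything below the page boundary is read as flags. */
static uint64_t decode_flags(uint64_t word)
{
    return word & ~MODEL_PAGE_MASK;
}

/* Receiver side: everything at or above the page boundary is the size. */
static uint64_t decode_total(uint64_t word)
{
    return word & MODEL_PAGE_MASK;
}
```

With a page-aligned total the receiver sees only the 0x04 flag. With a total
of three pages plus 0x10 bytes, the low bits leak into the flag field and it
decodes as 0x14, while the decoded size silently drops the remainder.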

Thanks,
Shameer

diff --git a/migration/ram.c b/migration/ram.c
index 2acc4b85ca..7447f0cefa 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1782,7 +1782,7 @@ static uint64_t ram_bytes_total_migration(void)
 RCU_READ_LOCK_GUARD();
 
 RAMBLOCK_FOREACH_MIGRATABLE(block) {
-total += ramblock_ram_bytes_migration(block);
+total += block->used_length;
 }
 return total;
 }
@@ -3479,7 +3479,7 @@ static int ram_load_precopy(QEMUFile *f)
 ret = -EINVAL;
 }
 
-total_ram_bytes -= length;
+total_ram_bytes -= block->used_length;
 }
 break;





RE: [PATCH v2 1/7] exec: Fix for qemu_ram_resize() callback

2020-02-13 Thread Shameerali Kolothum Thodi


> -Original Message-
> From: David Hildenbrand [mailto:da...@redhat.com]
> Sent: 12 February 2020 18:21
> To: Shameerali Kolothum Thodi ;
> Igor Mammedov 
> Cc: peter.mayd...@linaro.org; xiaoguangrong.e...@gmail.com;
> m...@redhat.com; shannon.zha...@gmail.com; qemu-devel@nongnu.org;
> xuwei (O) ; Linuxarm ;
> eric.au...@redhat.com; qemu-...@nongnu.org; ler...@redhat.com;
> dgilb...@redhat.com; Juan Jose Quintela Carreira 
> Subject: Re: [PATCH v2 1/7] exec: Fix for qemu_ram_resize() callback

[...]

> > Hmm..it breaks x86 + seabios boot. The issue is seabios expects RSDP in FSEG
> > memory. With the above proposed change, RSDP will be aligned to PAGE_SIZE
> and
> > seabios mem allocation for RSDP fails at,
> >
> >
> https://github.com/coreboot/seabios/blob/master/src/fw/romfile_loader.c#L8
> 5
> >
> > To get pass the above, I changed "alloc_fseg" flag to false in 
> > build_rsdp(), but
> > seabios seems to make the assumption that RSDP has to be placed in FSEG
> memory
> > here,
> > https://github.com/coreboot/seabios/blob/master/src/fw/biostables.c#L126
> >
> > So doesn’t look like there is an easy fix for this without changing the 
> > seabios
> code.
> >
> > Between, OVMF works fine with the aligned size on x86.
> >
> > One thing we can do is treat the RSDP case separately or only use the 
> > aligned
> > page size for "etc/acpi/tables" as below,

[...]

> >
> > Thoughts?
> 
> I don't think introducing memory_region_get_used_length() is a
> good idea. I also dislike, that the ram block size can differ
> to the memory region size. I wasn't aware of that condition, sorry!

Right. They can differ in size, and that is the case here.

> Making the memory region always store an aligned size might break other use
> cases.
> 
> Summarizing the issue:
> 1. Memory regions contain ram blocks with a different size, if the size is
>not properly aligned. While memory regions can have an unaligned size,
>ram blocks can't. This is true when creating resizable memory region with
>an unaligned size.
> 2. When resizing a ram block/memory region, the size of the memory region
>is set to the aligned size. The callback is called with the aligned size.
>The unaligned piece is lost.
> 3. When migrating, we migrate the aligned size.
> 
>
> What about something like this: (untested)

Thanks for that. I had a go with the below patch and it indeed fixes the issue
of callback not being called on resize. But the migration fails with the below
error,

For x86
-
qemu-system-x86_64: Unknown combination of migration flags: 0x14
qemu-system-x86_64: error while loading state for instance 0x0 of device 'ram'
qemu-system-x86_64: load of migration failed: Invalid argument 

For arm64
--
qemu-system-aarch64: Received an unexpected compressed page
qemu-system-aarch64: error while loading state for instance 0x0 of device 'ram'
qemu-system-aarch64: load of migration failed: Invalid argument
 
I haven't debugged this further, but it looks like there is corruption happening.
Please let me know if you have any clue.

Thanks,
Shameer

> 
> From d84c21bc67e15acdac2f6265cd1576d8dd920211 Mon Sep 17 00:00:00
> 2001
> From: David Hildenbrand 
> Date: Wed, 12 Feb 2020 19:16:34 +0100
> Subject: [PATCH v1] tmp
> 
> Signed-off-by: David Hildenbrand 
> ---
>  exec.c  | 14 --
>  migration/ram.c | 44 
>  2 files changed, 44 insertions(+), 14 deletions(-)
> 
> diff --git a/exec.c b/exec.c
> index 05cfe868ab..d41a1e11b5 100644
> --- a/exec.c
> +++ b/exec.c
> @@ -2130,11 +2130,21 @@ static int memory_try_enable_merging(void
> *addr, size_t len)
>   */
>  int qemu_ram_resize(RAMBlock *block, ram_addr_t newsize, Error **errp)
>  {
> +const ram_addr_t unaligned_size = newsize;
> +
>  assert(block);
> 
>  newsize = HOST_PAGE_ALIGN(newsize);
> 
>  if (block->used_length == newsize) {
> +/*
> + * We don't have to resize the ram block (which only knows aligned
> + * sizes), however, we have to notify if the unaligned size changed.
> + */
> +if (block->resized && unaligned_size !=
> memory_region_size(block->mr)) {
> +block->resized(block->idstr, unaligned_size, block->host);
> +memory_region_set_size(block->mr, unaligned_size);
> +}
>  return 0;
>  }
> 
> @@ -2158,9 +2168,9 @@ int qemu_ram_resize(RAMBlock *block,
> ram_addr_t newsize, Error **errp)
>  block->used_length = newsize;
>  cpu_physical_memory_set_dirty_range(block->offset,
&g

RE: [PATCH v2 1/7] exec: Fix for qemu_ram_resize() callback

2020-02-12 Thread Shameerali Kolothum Thodi


> -Original Message-
> From: David Hildenbrand [mailto:da...@redhat.com]
> Sent: 10 February 2020 09:54
> To: Shameerali Kolothum Thodi ;
> Igor Mammedov 
> Cc: peter.mayd...@linaro.org; xiaoguangrong.e...@gmail.com;
> m...@redhat.com; shannon.zha...@gmail.com; qemu-devel@nongnu.org;
> xuwei (O) ; Linuxarm ;
> eric.au...@redhat.com; qemu-...@nongnu.org; ler...@redhat.com
> Subject: Re: [PATCH v2 1/7] exec: Fix for qemu_ram_resize() callback
> 
> On 10.02.20 10:50, Shameerali Kolothum Thodi wrote:
> >
> >
> >> -Original Message-
> >> From: David Hildenbrand [mailto:da...@redhat.com]
> >> Sent: 10 February 2020 09:29
> >> To: Shameerali Kolothum Thodi ;
> >> Igor Mammedov 
> >> Cc: peter.mayd...@linaro.org; xiaoguangrong.e...@gmail.com;
> >> m...@redhat.com; shannon.zha...@gmail.com; qemu-devel@nongnu.org;
> >> xuwei (O) ; Linuxarm ;
> >> eric.au...@redhat.com; qemu-...@nongnu.org; ler...@redhat.com
> >> Subject: Re: [PATCH v2 1/7] exec: Fix for qemu_ram_resize() callback
> >>
> >>>> Can you look the original value up somehow and us the resize callback
> >>>> only as a notification that something changed? (that value would have to
> >>>> be stored somewhere and migrated I assume - maybe that's already
> being
> >>>> done)
> >>>
> >>> Ok. I will take a look at that. But can we instead pass the
> block->used_length
> >> to
> >>> fw_cfg_add_file_callback(). That way we don’t have to change the
> >> qemu_ram_resize()
> >>> as well. I think Igor has suggested this before[1] and I had a go at it 
> >>> before
> >> coming up
> >>> with the "req_length" proposal here.
> >>
> >> You mean, passing the old size as well? I don't see how that will solve
> >> the issue, but yeah, nothing speaks against simply sending the old and
> >> the new size.
> >
> > Nope. I actually meant using the block->used_length to store in the
> > s->files->f[index].size.
> >
> > virt_acpi_setup()
> >   acpi_add_rom_blob()
> > rom_add_blob()
> >   rom_set_mr()  --> used_length  = page aligned blob size
> > fw_cfg_add_file_callback()  --> uses actual blob size.
> >
> >
> > Right now what we do is use the actual blob size to store in FWCfgEntry.
> > Instead pass the RAMBlock used_length to fw_cfg_add_file_callback().
> > Of course by this, the firmware will see an aligned size, but that is fine 
> > I think.
> > But at the same time this means the qemu_ram_resize() can stay as it is
> > because it will invoke the callback when the size changes beyond the aligned
> > page size. And also during migration, there won't be any inconsistency as
> everyone
> > works on aligned page size.
> >
> > Does that make sense? Or I am again missing something here?
> 
> Oh, you mean simply rounding up to full pages in the fw entries? If we
> can drop the "sub-page" restriction, that would be awesome!
> 
> Need to double check if that could be an issue for fw/migration/whatever.

Hmm, it breaks x86 + seabios boot. The issue is that seabios expects RSDP in
FSEG memory. With the above proposed change, RSDP will be aligned to PAGE_SIZE
and the seabios memory allocation for RSDP fails at,

https://github.com/coreboot/seabios/blob/master/src/fw/romfile_loader.c#L85

To get past the above, I changed the "alloc_fseg" flag to false in build_rsdp(),
but seabios seems to make the assumption that RSDP has to be placed in FSEG
memory here,
https://github.com/coreboot/seabios/blob/master/src/fw/biostables.c#L126

So it doesn't look like there is an easy fix for this without changing the
seabios code.

By the way, OVMF works fine with the aligned size on x86.

One thing we can do is treat the RSDP case separately or only use the aligned
page size for "etc/acpi/tables" as below,

diff --git a/hw/core/loader.c b/hw/core/loader.c
index d1b78f60cd..f07f6a7a35 100644
--- a/hw/core/loader.c
+++ b/hw/core/loader.c
@@ -60,6 +60,7 @@
 #include "hw/boards.h"
 #include "qemu/cutils.h"
 #include "sysemu/runstate.h"
+#include "hw/acpi/aml-build.h"
 
 #include 
 
@@ -1056,6 +1057,7 @@ MemoryRegion *rom_add_blob(const char *name, const void *blob, size_t len,
 if (fw_file_name && fw_cfg) {
 char devpath[100];
 void *data;
+size_t size = rom->datasize;
 
 if (read_only) {
 snprintf(devpath, sizeof(devpath), "/rom@%s", fw_file_name);
@@ -1066,13 +1068,21 @@ MemoryRegion *rom_add_blob(const char *name, const void 
*blob, size_t 

RE: [PATCH v2 1/7] exec: Fix for qemu_ram_resize() callback

2020-02-10 Thread Shameerali Kolothum Thodi


> -Original Message-
> From: David Hildenbrand [mailto:da...@redhat.com]
> Sent: 10 February 2020 09:29
> To: Shameerali Kolothum Thodi ;
> Igor Mammedov 
> Cc: peter.mayd...@linaro.org; xiaoguangrong.e...@gmail.com;
> m...@redhat.com; shannon.zha...@gmail.com; qemu-devel@nongnu.org;
> xuwei (O) ; Linuxarm ;
> eric.au...@redhat.com; qemu-...@nongnu.org; ler...@redhat.com
> Subject: Re: [PATCH v2 1/7] exec: Fix for qemu_ram_resize() callback
> 
> >> Can you look the original value up somehow and us the resize callback
> >> only as a notification that something changed? (that value would have to
> >> be stored somewhere and migrated I assume - maybe that's already being
> >> done)
> >
> > Ok. I will take a look at that. But can we instead pass the 
> > block->used_length
> to
> > fw_cfg_add_file_callback(). That way we don’t have to change the
> qemu_ram_resize()
> > as well. I think Igor has suggested this before[1] and I had a go at it 
> > before
> coming up
> > with the "req_length" proposal here.
> 
> You mean, passing the old size as well? I don't see how that will solve
> the issue, but yeah, nothing speaks against simply sending the old and
> the new size.

Nope. I actually meant using the block->used_length to store in the
s->files->f[index].size.

virt_acpi_setup()
  acpi_add_rom_blob()
rom_add_blob()
  rom_set_mr()  --> used_length  = page aligned blob size
fw_cfg_add_file_callback()  --> uses actual blob size.


Right now, what we do is use the actual blob size to store in FWCfgEntry.
Instead, pass the RAMBlock used_length to fw_cfg_add_file_callback().
Of course, with this the firmware will see an aligned size, but that is fine, I
think. At the same time, this means qemu_ram_resize() can stay as it is,
because it will invoke the callback when the size changes beyond the aligned
page size. Also, during migration there won't be any inconsistency, as
everyone works on the aligned page size.

Does that make sense? Or am I again missing something here?

Thanks,
Shameer

> --
> Thanks,
> 
> David / dhildenb



RE: [PATCH v2 1/7] exec: Fix for qemu_ram_resize() callback

2020-02-07 Thread Shameerali Kolothum Thodi


> -Original Message-
> From: David Hildenbrand [mailto:da...@redhat.com]
> Sent: 06 February 2020 14:55
> To: Shameerali Kolothum Thodi ;
> Igor Mammedov 
> Cc: peter.mayd...@linaro.org; xiaoguangrong.e...@gmail.com;
> m...@redhat.com; shannon.zha...@gmail.com; qemu-devel@nongnu.org;
> xuwei (O) ; Linuxarm ;
> eric.au...@redhat.com; qemu-...@nongnu.org; ler...@redhat.com
> Subject: Re: [PATCH v2 1/7] exec: Fix for qemu_ram_resize() callback
> 
> On 06.02.20 12:28, Shameerali Kolothum Thodi wrote:
> >
> >
> >> -Original Message-
> >> From: David Hildenbrand [mailto:da...@redhat.com]
> >> Sent: 06 February 2020 10:56
> >> To: Shameerali Kolothum Thodi ;
> >> Igor Mammedov 
> >> Cc: peter.mayd...@linaro.org; xiaoguangrong.e...@gmail.com;
> >> m...@redhat.com; shannon.zha...@gmail.com; qemu-devel@nongnu.org;
> >> xuwei (O) ; Linuxarm ;
> >> eric.au...@redhat.com; qemu-...@nongnu.org; ler...@redhat.com
> >> Subject: Re: [PATCH v2 1/7] exec: Fix for qemu_ram_resize() callback
> >
> > [...]
> >
> >>> root@ubuntu:/# cat /dev/pmem
> >>> pmem0  pmem1
> >>>
> >>> From the logs, it looks like the ram_load_precopy() --> qemu_ram_resize()
> >>> is not called as length == used_length and both seems to be page aligned
> >>> values. And from
> >>> https://github.com/qemu/qemu/blob/master/migration/ram.c#L3421
> >>> qemu_ram_resize() is called with length if length != used_length.
> >>
> >> Assume on your source, the old size is 12345 bytes. So 16384 aligned up
> >> (4 pages).
> >>
> >> Assume on your target, the new size is 123456 bytes, so 126976 aligned
> >> up (31 pages).
> >>
> >> If you migrate from source to destination, the migration code would
> >> resize to 16384, although the "actual size" is 12345. The callback will
> >> be called with the aligned size, not the actual size. Same the other way
> >> around. That's what's inconsistent IMHO.
> >
> > Thanks. You are right. I didn’t consider the case where the target can be
> > configured with a larger number of devices than the source. I can replicate
> > the scenario now,
> >
> > Source:
> >
> > fw_cfg_add_file_callback: filename etc/boot-fail-wait size 0x4
> > fw_cfg_add_file_callback: filename etc/acpi/nvdimm-mem size 0x1000
> > fw_cfg_add_file_callback: filename etc/acpi/tables size 0x6210
> >
> > Target:
> > ram_load_precopy: Ram blk mem1 length 0x4000 used_length 0x4000
> > ram_load_precopy: Ram blk virt.flash0 length 0x400 used_length 0x400
> > ram_load_precopy: Ram blk virt.flash1 length 0x400 used_length 0x400
> > ram_load_precopy: Ram blk /rom@etc/acpi/tables length 0x7000 used_length 0x8000
> > fw_cfg_modify_file: filename etc/acpi/tables len 0x7000
> >
> > Target updates FWCfgEntry with a page aligned size :(. I will look into
> > this and see how we can solve this. Any pointers welcome.
> 
> Can you look the original value up somehow and use the resize callback
> only as a notification that something changed? (that value would have to
> be stored somewhere and migrated I assume - maybe that's already being
> done)

Ok. I will take a look at that. But can we instead pass the block->used_length
to fw_cfg_add_file_callback()? That way we don’t have to change
qemu_ram_resize() as well. I think Igor has suggested this before[1] and I had
a go at it before coming up with the "req_length" proposal here.

Thanks,
Shameer

[1] 
https://lore.kernel.org/qemu-devel/323aa74a92934b6a989e6e4dbe0df...@huawei.com/




RE: [PATCH v2 1/7] exec: Fix for qemu_ram_resize() callback

2020-02-06 Thread Shameerali Kolothum Thodi


> -Original Message-
> From: David Hildenbrand [mailto:da...@redhat.com]
> Sent: 06 February 2020 10:56
> To: Shameerali Kolothum Thodi ;
> Igor Mammedov 
> Cc: peter.mayd...@linaro.org; xiaoguangrong.e...@gmail.com;
> m...@redhat.com; shannon.zha...@gmail.com; qemu-devel@nongnu.org;
> xuwei (O) ; Linuxarm ;
> eric.au...@redhat.com; qemu-...@nongnu.org; ler...@redhat.com
> Subject: Re: [PATCH v2 1/7] exec: Fix for qemu_ram_resize() callback

[...]
 
> > root@ubuntu:/# cat /dev/pmem
> > pmem0  pmem1
> >
> > From the logs, it looks like the ram_load_precopy() --> qemu_ram_resize() is
> > not called as length == used_length and both seems to be page aligned values.
> > And from https://github.com/qemu/qemu/blob/master/migration/ram.c#L3421
> > qemu_ram_resize() is called with length if length != used_length.
> 
> Assume on your source, the old size is 12345 bytes. So 16384 aligned up
> (4 pages).
> 
> Assume on your target, the new size is 123456 bytes, so 126976 aligned
> up (31 pages).
> 
> If you migrate from source to destination, the migration code would
> resize to 16384, although the "actual size" is 12345. The callback will
> be called with the aligned size, not the actual size. Same the other way
> around. That's what's inconsistent IMHO.

Thanks. You are right. I didn’t consider the case where the target can be
configured with a larger number of devices than the source. I can replicate
the scenario now,

Source:

fw_cfg_add_file_callback: filename etc/boot-fail-wait size 0x4
fw_cfg_add_file_callback: filename etc/acpi/nvdimm-mem size 0x1000
fw_cfg_add_file_callback: filename etc/acpi/tables size 0x6210

Target:
ram_load_precopy: Ram blk mem1 length 0x4000 used_length 0x4000
ram_load_precopy: Ram blk virt.flash0 length 0x400 used_length 0x400
ram_load_precopy: Ram blk virt.flash1 length 0x400 used_length 0x400
ram_load_precopy: Ram blk /rom@etc/acpi/tables length 0x7000 used_length 0x8000
fw_cfg_modify_file: filename etc/acpi/tables len 0x7000

Target updates FWCfgEntry with a page aligned size :(. I will look into this
and see how we can solve this. Any pointers welcome.
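
To make the failure mode concrete, here is a minimal, self-contained C sketch
of what the logs above suggest. It is not QEMU code: the structs, the function
names and the simplified load path are all hypothetical, and it only models
the size bookkeeping.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the size field of a fw_cfg file entry. */
struct sketch_fw_entry {
    uint64_t size;
};

/* Hypothetical stand-in for a RAM block; used_length is page aligned. */
struct sketch_ram_block {
    uint64_t used_length;
    struct sketch_fw_entry *entry;
};

/* Models the "resized" notification: the owner stores what it is given. */
static void sketch_resized(struct sketch_fw_entry *e, uint64_t newsize)
{
    e->size = newsize;
}

/*
 * Models the incoming migration path: the stream only carries the source's
 * page aligned length, so that is all the callback can ever see here.
 */
static void sketch_load_length(struct sketch_ram_block *b, uint64_t length)
{
    assert(b != NULL && b->entry != NULL);
    if (length != b->used_length) {
        b->used_length = length;
        sketch_resized(b->entry, length);
    }
}
```

With the values from the logs (incoming length 0x7000, target used_length
0x8000), the entry ends up with size 0x7000, whereas the source registered
0x6210.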

Cheers,
Shameer


RE: [PATCH v2 1/7] exec: Fix for qemu_ram_resize() callback

2020-02-06 Thread Shameerali Kolothum Thodi


> -Original Message-
> From: David Hildenbrand [mailto:da...@redhat.com]
> Sent: 05 February 2020 16:41
> To: Shameerali Kolothum Thodi ;
> Igor Mammedov 
> Cc: peter.mayd...@linaro.org; xiaoguangrong.e...@gmail.com;
> m...@redhat.com; shannon.zha...@gmail.com; qemu-devel@nongnu.org;
> xuwei (O) ; Linuxarm ;
> eric.au...@redhat.com; qemu-...@nongnu.org; ler...@redhat.com
> Subject: Re: [PATCH v2 1/7] exec: Fix for qemu_ram_resize() callback
> 
> >> Oh, and one more reason why the proposal in this patch is inconsistent:
> >>
> >> When migrating resizable memory regions (RAM_SAVE_FLAG_MEM_SIZE) we
> >> store the block->used_length (ram_save_setup()) and use that value to
> >> resize the region on the target (ram_load_precopy() -> qemu_ram_resize()).
> >>
> >> This will be the value the callback will be called with. Page aligned.
> >>
> >
> > Sorry, I didn’t quite get that point and not sure how "req_length" approach
> > will affect the migration.
> 
> The issue is that on migration, you will lose the sub-page size either
> way. So your callback will be called
> - on the migration source with a sub-page size (via
>   memory_region_ram_resize() from e.g., hw/i386/acpi-build.c)
> - on the migration target with a page-aligned size (via
>   qemu_ram_resize() from migration/ram.c)
> 
> So this is inconsistent, especially when migrating.

Thanks for explaining. I added some debug prints to better understand
what actually happens in the migration case.

Guest-source with one initial nvdimm
---

-object memory-backend-ram,id=mem1,size=1G \
-device nvdimm,id=dimm1,memdev=mem1 \

fw_cfg_add_file_callback: filename etc/boot-fail-wait size 0x4
fw_cfg_add_file_callback: filename etc/acpi/nvdimm-mem size 0x1000
fw_cfg_add_file_callback: filename etc/acpi/tables size 0x55f4
fw_cfg_add_file_callback: filename etc/table-loader size 0xd00
fw_cfg_add_file_callback: filename etc/tpm/log size 0x0
fw_cfg_add_file_callback: filename etc/acpi/rsdp size 0x24
fw_cfg_add_file_callback: filename etc/smbios/smbios-tables size 0x104
fw_cfg_add_file_callback: filename etc/smbios/smbios-anchor size 0x18
fw_cfg_modify_file: filename bootorder len 0x0
fw_cfg_add_file_callback: filename bootorder size 0x0
fw_cfg_modify_file: filename bios-geometry len 0x0
fw_cfg_add_file_callback: filename bios-geometry size 0x0
fw_cfg_modify_file: filename etc/acpi/tables len 0x55f4
fw_cfg_modify_file: filename etc/acpi/rsdp len 0x24
fw_cfg_modify_file: filename etc/table-loader len 0xd00


hot add another nvdimm device,

(qemu) object_add memory-backend-ram,id=mem2,size=1G
(qemu) device_add nvdimm,id=dimm2,memdev=mem2


root@ubuntu:/# cat /dev/pmem
pmem0  pmem1

Guest-target with two nvdimms
---

-object memory-backend-ram,id=mem1,size=1G \
-device nvdimm,id=dimm1,memdev=mem1 \
-object memory-backend-ram,id=mem2,size=1G \
-device nvdimm,id=dimm2,memdev=mem2 \

fw_cfg_add_file_callback: filename etc/boot-fail-wait size 0x4
fw_cfg_add_file_callback: filename etc/acpi/nvdimm-mem size 0x1000
fw_cfg_add_file_callback: filename etc/acpi/tables size 0x56ac
fw_cfg_add_file_callback: filename etc/table-loader size 0xd00
fw_cfg_add_file_callback: filename etc/tpm/log size 0x0
fw_cfg_add_file_callback: filename etc/acpi/rsdp size 0x24
fw_cfg_add_file_callback: filename etc/smbios/smbios-tables size 0x104
fw_cfg_add_file_callback: filename etc/smbios/smbios-anchor size 0x18
fw_cfg_modify_file: filename bootorder len 0x0
fw_cfg_add_file_callback: filename bootorder size 0x0
fw_cfg_modify_file: filename bios-geometry len 0x0
fw_cfg_add_file_callback: filename bios-geometry size 0x0


Initiate migration Source --> Target,

ram_load_precopy: Ram blk mach-virt.ram length 0x1 used_length 0x1
ram_load_precopy: Ram blk mem1 length 0x4000 used_length 0x4000
ram_load_precopy: Ram blk mem2 length 0x4000 used_length 0x4000
ram_load_precopy: Ram blk virt.flash0 length 0x400 used_length 0x400
ram_load_precopy: Ram blk virt.flash1 length 0x400 used_length 0x400
ram_load_precopy: Ram blk /rom@etc/acpi/tables length 0x6000 used_length 0x6000
ram_load_precopy: Ram blk :00:01.0/virtio-net-pci.rom length 0x4 used_length 0x4
ram_load_precopy: Ram blk /rom@etc/table-loader length 0x1000 used_length 0x1000
ram_load_precopy: Ram blk /rom@etc/acpi/rsdp length 0x1000 used_length 0x1000


root@ubuntu:/# cat /dev/pmem
pmem0  pmem1  

From the logs, it looks like ram_load_precopy() --> qemu_ram_resize() is not
called, as length == used_length and both seem to be page aligned values.
And from https://github.com/qemu/qemu/blob/master/migration/ram.c#L3421
qemu_ram_resize() is called with length if length != used_length.
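
A rough sketch of why no resize happens here, assuming 4K pages (which matches
the aligned values in the logs above). The helper names are made up for
illustration; QEMU itself uses HOST_PAGE_ALIGN() in qemu_ram_resize().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 4 KiB pages, as implied by the 0x1000-aligned log values. */
#define SKETCH_PAGE_SIZE 0x1000u

/* Round a requested size up to the next page boundary. */
static uint64_t sketch_page_align(uint64_t size)
{
    /* The mask trick below only works for power-of-two page sizes. */
    assert((SKETCH_PAGE_SIZE & (SKETCH_PAGE_SIZE - 1)) == 0);
    return (size + SKETCH_PAGE_SIZE - 1) & ~(uint64_t)(SKETCH_PAGE_SIZE - 1);
}

/*
 * Hypothetical model of the check: a resize (and hence the callback) is
 * only triggered when the aligned request differs from used_length.
 */
static bool sketch_resize_needed(uint64_t used_length, uint64_t requested)
{
    return sketch_page_align(requested) != used_length;
}
```

The source tables blob is 0x55f4 bytes and the target one is 0x56ac bytes.
Both round up to 0x6000, so length == used_length and the resize path is
skipped.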

Of course my knowledge o

RE: [PATCH v2 1/7] exec: Fix for qemu_ram_resize() callback

2020-02-05 Thread Shameerali Kolothum Thodi
Hi David,

> -Original Message-
> From: Qemu-devel
> [mailto:qemu-devel-bounces+shameerali.kolothum.thodi=huawei.com@nongn
> u.org] On Behalf Of David Hildenbrand
> Sent: 04 February 2020 19:05
> To: Igor Mammedov ; Shameerali Kolothum Thodi
> 
> Cc: peter.mayd...@linaro.org; xiaoguangrong.e...@gmail.com;
> m...@redhat.com; shannon.zha...@gmail.com; qemu-devel@nongnu.org;
> xuwei (O) ; Linuxarm ;
> eric.au...@redhat.com; qemu-...@nongnu.org; ler...@redhat.com
> Subject: Re: [PATCH v2 1/7] exec: Fix for qemu_ram_resize() callback
> 
> On 04.02.20 17:44, David Hildenbrand wrote:
> > On 04.02.20 16:23, Igor Mammedov wrote:
> >> On Fri, 17 Jan 2020 17:45:16 +
> >> Shameer Kolothum  wrote:
> >>
> >>> If ACPI blob length modifications happen after the initial
> >>> virt_acpi_build() call, and the changed blob length is within
> >>> the PAGE size boundary, then the revised size is not seen by
> >>> the firmware on Guest reboot. This is because in the
> >>> virt_acpi_build_update() -> acpi_ram_update() -> qemu_ram_resize()
> >>> path, qemu_ram_resize() uses used_length (ram_block size which is
> >>> aligned to PAGE size) and the "resize callback" to update the size
> >>> seen by firmware is not getting invoked.
> >>>
> >>> Hence make sure callback is called if the new size is different
> >>> from original requested size.
> >>>
> >>> Signed-off-by: Shameer Kolothum
> 
> >>> ---
> >>> Please find the previous discussions on this issue here,
> >>> https://patchwork.kernel.org/patch/11174947/
> >>>
> >>> But this one attempts a different solution to fix it by introducing
> >>> req_length var to RAMBlock struct.
> >>>
> >>
> >> looks fine to me, so
> >> Acked-by: Igor Mammedov 
> >
> > Thanks for CCing.
> >
> > This in fact collides with my changes ... but not severely :)
> >
> >>
> >> CCing David who touches this area in his latest series for and
> >> might have an opinion on how it should be handled.
> >>
> >
> > So we are talking about sub-page size changes? I somewhat dislike
> > storing "req_length" in ram blocks. Looks like sub-pages stuff does not
> > belong there.

Thanks for taking a look at this. Agree, I didn’t like that "req_length" either.

> > Ram blocks only operate on page granularity. Ram block notifiers only
> > operate on page granularity. Memory regions only operate on page
> > granularity. Dirty bitmaps operate on page granularity. Especially,
> > memory_region_size(mr) will always return aligned values.
> >
> > I think users/owner should deal with anything smaller manually if
> > they really need it.
> >
> > What about always calling the resized() callback and letting the
> > actual owner figure out if the size changed on sub-page granularity
> > or not? (by looking up the size manually using some mechanism not glued to
> > memory regions/ram blocks/whatever)
> >
> > diff --git a/exec.c b/exec.c
> > index 67e520d18e..59d46cc388 100644
> > --- a/exec.c
> > +++ b/exec.c
> > @@ -2130,6 +2130,13 @@ int qemu_ram_resize(RAMBlock *block,
> ram_addr_t newsize, Error **errp)
> >  newsize = HOST_PAGE_ALIGN(newsize);
> >
> >  if (block->used_length == newsize) {
> > +/*
> > + * The owner might want to handle sub-page resizes. We only
> provide
> > + * the aligned size - because ram blocks are always page aligned.
> > + */
> > +if (block->resized) {
> > +block->resized(block->idstr, newsize, block->host);

Does it make sense to pass the requested size in the callback rather than the
aligned size, as the owner might be more interested in the org_req_size vs
new_req_size case?

> > +}
> >  return 0;
> >  }
> >

 
> Oh, and one more reason why the proposal in this patch is inconsistent:
> 
> When migrating resizable memory regions (RAM_SAVE_FLAG_MEM_SIZE) we
> store the block->used_length (ram_save_setup()) and use that value to
> resize the region on the target (ram_load_precopy() -> qemu_ram_resize()).
> 
> This will be the value the callback will be called with. Page aligned.
> 

Sorry, I didn’t quite get that point and am not sure how the "req_length"
approach will affect the migration.

Anyway, I have reworked the patch(below) with the above suggestion, that is
always calling the resized() callback, but m

RE: [PATCH v2 0/7] ARM virt: Add NVDIMM support

2020-01-29 Thread Shameerali Kolothum Thodi
Hi Eric,

> -Original Message-
> From: Auger Eric [mailto:eric.au...@redhat.com]
> Sent: 28 January 2020 15:29
> To: Shameerali Kolothum Thodi ;
> qemu-devel@nongnu.org; qemu-...@nongnu.org; imamm...@redhat.com
> Cc: peter.mayd...@linaro.org; shannon.zha...@gmail.com; m...@redhat.com;
> xiaoguangrong.e...@gmail.com; xuwei (O) ;
> ler...@redhat.com; Linuxarm 
> Subject: Re: [PATCH v2 0/7] ARM virt: Add NVDIMM support
> 
> Hi Shameer,
> 
> On 1/17/20 6:45 PM, Shameer Kolothum wrote:
> > This series adds NVDIMM support to arm/virt platform.
> > The series reuses some of the patches posted by Eric
> > in his earlier attempt here[1].
> >
> > Patch #1 is a fix to the Guest reboot issue on NVDIMM
> > hot add case described here[2] and patch #2 is another
> > fix to the nvdimm aml issue discussed here[3].
> >
> > I have done a basic sanity testing of NVDIMM devices
> > with Guest booting with both ACPI and DT. Further testing
> > is always welcome.
> >
> > Please let me know your feedback.
> 
> 
> With this version, I do not get the former spurious warning reported on v1.
> 
> I can see the nvdimm device topology using ndctl. So it looks fine to me.

Thanks for giving it a spin and confirming. 

> Unfortunately we cannot test with DAX as kernel dependencies are not yet
> resolved but this is an independent problem.

True. I did previously test DAX with the "arm64/mm: Enable memory hot remove"
patch series and that seems to work fine.

Cheers,
Shameer


 
> Thanks
> 
> Eric
> >
> > Thanks,
> > Shameer
> >
> > [1] https://patchwork.kernel.org/cover/10830777/
> > [2] https://patchwork.kernel.org/patch/11154757/
> > [3] https://patchwork.kernel.org/cover/11174959/
> >
> > v1 --> v2
> >  -Reworked patch #1 and now fix is inside qemu_ram_resize().
> >  -Added patch #2 to fix the nvdimm aml issue.
> >  -Dropped support to DT cold plug.
> >  -Updated test_acpi_virt_tcg_memhp() with pc-dimm and nvdimms(patch
> #7)
> >
> > Kwangwoo Lee (2):
> >   nvdimm: Use configurable ACPI IO base and size
> >   hw/arm/virt: Add nvdimm hot-plug infrastructure
> >
> > Shameer Kolothum (5):
> >   exec: Fix for qemu_ram_resize() callback
> >   hw/acpi/nvdimm: Fix for NVDIMM incorrect DSM output  buffer  length
> >   hw/arm/virt: Add nvdimm hotplug support
> >   tests: Update ACPI tables list for upcoming arm/virt test changes
> >   tests/bios-tables-test: Update arm/virt memhp test
> >
> >  docs/specs/acpi_hw_reduced_hotplug.rst  |  1 +
> >  exec.c  | 36 +++
> >  hw/acpi/generic_event_device.c  | 13 
> >  hw/acpi/nvdimm.c| 68
> +
> >  hw/arm/Kconfig  |  1 +
> >  hw/arm/virt-acpi-build.c|  6 ++
> >  hw/arm/virt.c   | 35 +--
> >  hw/i386/acpi-build.c|  6 ++
> >  hw/i386/acpi-build.h|  3 +
> >  hw/i386/pc_piix.c   |  2 +
> >  hw/i386/pc_q35.c|  2 +
> >  hw/mem/Kconfig  |  2 +-
> >  include/exec/ram_addr.h |  5 +-
> >  include/hw/acpi/generic_event_device.h  |  1 +
> >  include/hw/arm/virt.h   |  1 +
> >  include/hw/mem/nvdimm.h |  3 +
> >  tests/data/acpi/virt/NFIT.memhp |  0
> >  tests/data/acpi/virt/SSDT.memhp |  0
> >  tests/qtest/bios-tables-test-allowed-diff.h |  5 ++
> >  tests/qtest/bios-tables-test.c  |  9 ++-
> >  20 files changed, 163 insertions(+), 36 deletions(-)
> >  create mode 100644 tests/data/acpi/virt/NFIT.memhp
> >  create mode 100644 tests/data/acpi/virt/SSDT.memhp
> >




RE: [PATCH v2 7/7] tests/bios-tables-test: Update arm/virt memhp test

2020-01-29 Thread Shameerali Kolothum Thodi
Hi Eric,

> -Original Message-
> From: Qemu-devel
> [mailto:qemu-devel-bounces+shameerali.kolothum.thodi=huawei.com@nongn
> u.org] On Behalf Of Auger Eric
> Sent: 28 January 2020 16:29
> To: Shameerali Kolothum Thodi ;
> qemu-devel@nongnu.org; qemu-...@nongnu.org; imamm...@redhat.com
> Cc: peter.mayd...@linaro.org; xiaoguangrong.e...@gmail.com;
> m...@redhat.com; Linuxarm ; xuwei (O)
> ; shannon.zha...@gmail.com; ler...@redhat.com
> Subject: Re: [PATCH v2 7/7] tests/bios-tables-test: Update arm/virt memhp
> test
> 
> Hi Shameer,
> 
> On 1/17/20 6:45 PM, Shameer Kolothum wrote:
> > Since we now have both pc-dimm and nvdimm support, update
> > test_acpi_virt_tcg_memhp() to include those.
> >
> > Signed-off-by: Shameer Kolothum 
> > ---
> >  tests/data/acpi/virt/NFIT.memhp | 0
> >  tests/data/acpi/virt/SSDT.memhp | 0
> Is it normal to have those 2 above void files? I lost track about the
> process.

I guess so :). From tests/qtest/bios-tables-test.c,

/*
 * How to add or update the tests:
 * Contributor:
 * 1. add empty files for new tables, if any, under tests/data/acpi
 * 2. list any changed files in tests/bios-tables-test-allowed-diff.h
 * 3. commit the above *before* making changes that affect the tables
 ...

After reading that again, I am not sure whether those empty files can be in
this patch. I can move them to 6/7.

> >  tests/qtest/bios-tables-test.c  | 9 +++--
> >  3 files changed, 7 insertions(+), 2 deletions(-)
> >  create mode 100644 tests/data/acpi/virt/NFIT.memhp
> >  create mode 100644 tests/data/acpi/virt/SSDT.memhp
> >
> > diff --git a/tests/data/acpi/virt/NFIT.memhp
> b/tests/data/acpi/virt/NFIT.memhp
> > new file mode 100644
> > index 00..e69de29bb2
> > diff --git a/tests/data/acpi/virt/SSDT.memhp
> b/tests/data/acpi/virt/SSDT.memhp
> > new file mode 100644
> > index 00..e69de29bb2
> > diff --git a/tests/qtest/bios-tables-test.c b/tests/qtest/bios-tables-test.c
> > index f1ac2d7e96..695d2e7fac 100644
> > --- a/tests/qtest/bios-tables-test.c
> > +++ b/tests/qtest/bios-tables-test.c
> > @@ -913,12 +913,17 @@ static void test_acpi_virt_tcg_memhp(void)
> >  };
> >
> >  data.variant = ".memhp";
> > -test_acpi_one(" -cpu cortex-a57"
> > +test_acpi_one(" -machine nvdimm=on"
> nit: maybe keep the same order as before ...
> > +  " -cpu cortex-a57"
> >" -m 256M,slots=3,maxmem=1G"
> and simply add ,nvdimm=on to above line.
> >" -object memory-backend-ram,id=ram0,size=128M"
> >" -object memory-backend-ram,id=ram1,size=128M"
> >" -numa node,memdev=ram0 -numa
> node,memdev=ram1"
> > -  " -numa dist,src=0,dst=1,val=21",
> > +  " -numa dist,src=0,dst=1,val=21"
> > +  " -object memory-backend-ram,id=ram2,size=128M"
> > +  " -object memory-backend-ram,id=nvm0,size=128M"
> > +  " -device pc-dimm,id=dimm0,memdev=ram2,node=0"
> > +  " -device nvdimm,id=dimm1,memdev=nvm0,node=1",
> >);
> >
> >  free_test_data();
> >

Ok. Noted.

Thanks,
Shameer



RE: [PATCH] tests: acpi: update path in rebuild-expected-aml

2020-01-16 Thread Shameerali Kolothum Thodi


> -Original Message-
> From: Michael S. Tsirkin [mailto:m...@redhat.com]
> Sent: 15 January 2020 13:46
> To: Shameerali Kolothum Thodi 
> Cc: Thomas Huth ; qemu-devel@nongnu.org;
> imamm...@redhat.com; xuwei (O) ; Linuxarm
> 
> Subject: Re: [PATCH] tests: acpi: update path in rebuild-expected-aml
> 
> On Wed, Jan 15, 2020 at 11:01:44AM +0000, Shameerali Kolothum Thodi
> wrote:
> >
> >
> > > -Original Message-
> > > From: Thomas Huth [mailto:th...@redhat.com]
> > > Sent: 14 January 2020 17:08
> > > To: Shameerali Kolothum Thodi
> > > ;
> > > qemu-devel@nongnu.org; imamm...@redhat.com; m...@redhat.com
> > > Cc: xuwei (O) ; Linuxarm 
> > > Subject: Re: [PATCH] tests: acpi: update path in
> > > rebuild-expected-aml
> > >
> > > On 14/01/2020 17.51, Shameer Kolothum wrote:
> > > > Since commit 1e8a1fae7464("test: Move qtests to a separate
> > > > directory") qtests are now placed in a separate folder and this
> > > > breaks the script used to rebuild the expected ACPI tables for
> > > > bios-tables-test. Update the script with correct path.
> > > >
> > > > Fixes: 1e8a1fae7464("test: Move qtests to a separate directory")
> > > > Signed-off-by: Shameer Kolothum
> > > > 
> > > > ---
> > > >  tests/data/acpi/rebuild-expected-aml.sh | 6 +++---
> > > >  1 file changed, 3 insertions(+), 3 deletions(-)
> > > >
> > > > diff --git a/tests/data/acpi/rebuild-expected-aml.sh
> > > b/tests/data/acpi/rebuild-expected-aml.sh
> > > > index f89d4624bc..d44e511533 100755
> > > > --- a/tests/data/acpi/rebuild-expected-aml.sh
> > > > +++ b/tests/data/acpi/rebuild-expected-aml.sh
> > > > @@ -14,7 +14,7 @@
> > > >
> > > >  qemu_bins="x86_64-softmmu/qemu-system-x86_64
> > > aarch64-softmmu/qemu-system-aarch64"
> > > >
> > > > -if [ ! -e "tests/bios-tables-test" ]; then
> > > > +if [ ! -e "tests/qtest/bios-tables-test" ]; then
> > > >  echo "Test: bios-tables-test is required! Run make check
> > > > before this
> > > script."
> > > >  echo "Run this script from the build directory."
> > > >  exit 1;
> > > > @@ -26,11 +26,11 @@ for qemu in $qemu_bins; do
> > > >  echo "Also, run this script from the build directory."
> > > >  exit 1;
> > > >  fi
> > > > -TEST_ACPI_REBUILD_AML=y QTEST_QEMU_BINARY=$qemu
> > > tests/bios-tables-test
> > > > +TEST_ACPI_REBUILD_AML=y QTEST_QEMU_BINARY=$qemu
> > > tests/qtest/bios-tables-test
> > > >  done
> > > >
> > > >  eval `grep SRC_PATH= config-host.mak`
> > > >
> > > > -echo '/* List of comma-separated changed AML files to ignore */'
> > > > >
> > > ${SRC_PATH}/tests/bios-tables-test-allowed-diff.h
> > > > +echo '/* List of comma-separated changed AML files to ignore */'
> > > > +>
> > > ${SRC_PATH}/tests/qtest/bios-tables-test-allowed-diff.h
> > > >
> > > >  echo "The files were rebuilt and can be added to git."
> > >
> > > Oh, sorry for missing that in my patch series ... is there maybe a
> > > way that we could test this script in one of our CI pipelines so
> > > that it is not so easy to miss?
> >
> > Right. That will be a useful one.
> >
> > I am also seeing another error when I run "make check-qtest" on x86_64.
> > This doesn’t seem to be related to the recent changes. I have gone
> > back to 4.1.0 and it is still there.
> >
> >   TEST    check-qtest-x86_64: tests/boot-order-test
> >   TEST    check-qtest-x86_64: tests/bios-tables-test
> > Could not access KVM kernel module: No such file or directory
> > qemu-system-x86_64: failed to initialize KVM: No such file or directory
> > qemu-system-x86_64: Back to tcg accelerator
> > Could not access KVM kernel module: No such file or directory
> > qemu-system-x86_64: failed to initialize KVM: No such file or directory
> > qemu-system-x86_64: Back to tcg accelerator
> > Could not access KVM kernel module: No such file or directory
> > qemu-system-x86_64: failed to initialize KVM: No such file or directory
> > qemu-system-x86_64: Back to tcg accelerator
> > acpi-test: Warning! FACP binary file mismatch. Actual [aml:/tmp/aml-2Q9EE0], Expected [aml:tests/data/acpi/pc/FACP.bridge].
> > acpi-test: Warning! FACP mismatch. Actual [asl:/tmp/asl-CQ9EE0.dsl, aml:/tmp/aml-2Q9EE0], Expected [asl:/tmp/asl-N18EE0.dsl, aml:tests/data/acpi/pc/FACP.bridge].
> > **
> > ERROR:tests/bios-tables-test.c:447:test_acpi_asl: assertion failed: (all_tables_match)
> > ERROR - Bail out! ERROR:tests/bios-tables-test.c:447:test_acpi_asl: assertion failed: (all_tables_match)
> > Aborted (core dumped)
> > /home/shameer/qemu/tests/Makefile.include:899: recipe for target 'check-qtest-x86_64' failed
> > make: *** [check-qtest-x86_64] Error 1
> 
> Well make check seems to pass for me ... What's different for you?

I tried a fresh git clone of qemu and that seems to work fine. So I guess
it might be something to do with my earlier setup. Ignore and sorry
for the noise.

Thanks,
Shameer


RE: [PATCH] tests: acpi: update path in rebuild-expected-aml

2020-01-15 Thread Shameerali Kolothum Thodi


> -Original Message-
> From: Thomas Huth [mailto:th...@redhat.com]
> Sent: 14 January 2020 17:08
> To: Shameerali Kolothum Thodi ;
> qemu-devel@nongnu.org; imamm...@redhat.com; m...@redhat.com
> Cc: xuwei (O) ; Linuxarm 
> Subject: Re: [PATCH] tests: acpi: update path in rebuild-expected-aml
> 
> On 14/01/2020 17.51, Shameer Kolothum wrote:
> > Since commit 1e8a1fae7464("test: Move qtests to a separate
> > directory") qtests are now placed in a separate folder and
> > this breaks the script used to rebuild the expected ACPI
> > tables for bios-tables-test. Update the script with correct
> > path.
> >
> > Fixes: 1e8a1fae7464("test: Move qtests to a separate directory")
> > Signed-off-by: Shameer Kolothum 
> > ---
> >  tests/data/acpi/rebuild-expected-aml.sh | 6 +++---
> >  1 file changed, 3 insertions(+), 3 deletions(-)
> >
> > diff --git a/tests/data/acpi/rebuild-expected-aml.sh
> b/tests/data/acpi/rebuild-expected-aml.sh
> > index f89d4624bc..d44e511533 100755
> > --- a/tests/data/acpi/rebuild-expected-aml.sh
> > +++ b/tests/data/acpi/rebuild-expected-aml.sh
> > @@ -14,7 +14,7 @@
> >
> >  qemu_bins="x86_64-softmmu/qemu-system-x86_64
> aarch64-softmmu/qemu-system-aarch64"
> >
> > -if [ ! -e "tests/bios-tables-test" ]; then
> > +if [ ! -e "tests/qtest/bios-tables-test" ]; then
> >  echo "Test: bios-tables-test is required! Run make check before this
> script."
> >  echo "Run this script from the build directory."
> >  exit 1;
> > @@ -26,11 +26,11 @@ for qemu in $qemu_bins; do
> >  echo "Also, run this script from the build directory."
> >  exit 1;
> >  fi
> > -TEST_ACPI_REBUILD_AML=y QTEST_QEMU_BINARY=$qemu
> tests/bios-tables-test
> > +TEST_ACPI_REBUILD_AML=y QTEST_QEMU_BINARY=$qemu
> tests/qtest/bios-tables-test
> >  done
> >
> >  eval `grep SRC_PATH= config-host.mak`
> >
> > -echo '/* List of comma-separated changed AML files to ignore */' >
> ${SRC_PATH}/tests/bios-tables-test-allowed-diff.h
> > +echo '/* List of comma-separated changed AML files to ignore */' >
> ${SRC_PATH}/tests/qtest/bios-tables-test-allowed-diff.h
> >
> >  echo "The files were rebuilt and can be added to git."
> 
> Oh, sorry for missing that in my patch series ... is there maybe a way
> that we could test this script in one of our CI pipelines so that it is
> not so easy to miss?

Right. That will be a useful one.

I am also seeing another error when I run "make check-qtest" on x86_64.
This doesn’t seem to be related to the recent changes. I have gone back
to 4.1.0 and it is still there.

  TEST    check-qtest-x86_64: tests/boot-order-test
  TEST    check-qtest-x86_64: tests/bios-tables-test
Could not access KVM kernel module: No such file or directory
qemu-system-x86_64: failed to initialize KVM: No such file or directory
qemu-system-x86_64: Back to tcg accelerator
Could not access KVM kernel module: No such file or directory
qemu-system-x86_64: failed to initialize KVM: No such file or directory
qemu-system-x86_64: Back to tcg accelerator
Could not access KVM kernel module: No such file or directory
qemu-system-x86_64: failed to initialize KVM: No such file or directory
qemu-system-x86_64: Back to tcg accelerator
acpi-test: Warning! FACP binary file mismatch. Actual [aml:/tmp/aml-2Q9EE0], Expected [aml:tests/data/acpi/pc/FACP.bridge].
acpi-test: Warning! FACP mismatch. Actual [asl:/tmp/asl-CQ9EE0.dsl, aml:/tmp/aml-2Q9EE0], Expected [asl:/tmp/asl-N18EE0.dsl, aml:tests/data/acpi/pc/FACP.bridge].
**
ERROR:tests/bios-tables-test.c:447:test_acpi_asl: assertion failed: (all_tables_match)
ERROR - Bail out! ERROR:tests/bios-tables-test.c:447:test_acpi_asl: assertion failed: (all_tables_match)
Aborted (core dumped)
/home/shameer/qemu/tests/Makefile.include:899: recipe for target 'check-qtest-x86_64' failed
make: *** [check-qtest-x86_64] Error 1

FACP seems to have changed and it looks like I need to run the script to
generate a new one.

~/qemu$ diff -u /tmp/asl-CQ9EE0.dsl /tmp/asl-N18EE0.dsl
--- /tmp/asl-CQ9EE0.dsl 2020-01-15 10:52:03.018448627 +
+++ /tmp/asl-N18EE0.dsl 2020-01-15 10:52:03.022448627 +
@@ -3,7 +3,7 @@
  * AML/ASL+ Disassembler version 20180105 (64-bit version)
  * Copyright (c) 2000 - 2018 Intel Corporation
  *
- * Disassembly of /tmp/aml-2Q9EE0, Wed Jan 15 10:52:03 2020
+ * Disassembly of tests/data/acpi/pc/FACP.bridge, Wed Jan 15 10:52:03 2020
  *
  * ACPI Data Table [FACP]
  *
@@ -13,7 +13,7 @@
 [000h    4]Signature : "FACP"[Fixed ACPI
Description Table (FADT)]
 [004h 0004   4] Table Length : 0074

RE: [PATCH 0/5] ARM virt: Add NVDIMM support

2020-01-13 Thread Shameerali Kolothum Thodi


> -Original Message-
> From: Qemu-devel
> [mailto:qemu-devel-bounces+shameerali.kolothum.thodi=huawei.com@nongn
> u.org] On Behalf Of Igor Mammedov
> Sent: 09 January 2020 17:13
> To: Shameerali Kolothum Thodi 
> Cc: peter.mayd...@linaro.org; drjo...@redhat.com;
> xiaoguangrong.e...@gmail.com; Auger Eric ;
> qemu-devel@nongnu.org; Linuxarm ;
> shannon.zha...@gmail.com; qemu-...@nongnu.org; xuwei (O)
> ; Jonathan Cameron
> ; ler...@redhat.com
> Subject: Re: [PATCH 0/5] ARM virt: Add NVDIMM support
> 
> On Mon, 6 Jan 2020 17:06:32 +
> Shameerali Kolothum Thodi  wrote:
> 
> > Hi Igor,

[...]

> > (+Jonathan)
> >
> > Thanks to Jonathan for taking a fresh look at this issue and spotting this,
> >
> > https://elixir.bootlin.com/linux/v5.5-rc5/source/drivers/acpi/acpica/utmisc.c#L109
> >
> > And, from ACPI 6.3, table 19-419
> >
> > "If the Buffer Field is smaller than or equal to the size of an Integer (in
> > bits), it will be treated as an Integer. Otherwise, it will be treated as a
> > Buffer. The size of an Integer is indicated by the Definition Block table
> > header's Revision field. A Revision field value less than 2 indicates that
> > the size of an Integer is 32 bits. A value greater than or equal to 2
> > signifies that the size of an Integer is 64 bits."
> >
> > It looks like the main reason for the difference in behavior of the buffer 
> > object
> > size between x86 and ARM/virt, is because of the Revision number used in
> the
> > DSDT table. On x86 it is 1 and ARM/virt it is 2.
> >
> > So most likely,
> >
> > > CreateField (ODAT, Zero, Local1, OBUF)
> 
> You are right, that's where it goes wrong, since OBUF
> is implicitly converted to integer if size is less than 64bits.
> 
> > > Concatenate (Buffer (Zero){}, OBUF, Local7)
> 
> see more below
> 
> [...]
> 
> >
> > diff --git a/hw/acpi/nvdimm.c b/hw/acpi/nvdimm.c
> > index 64eacfad08..621f9ffd41 100644
> > --- a/hw/acpi/nvdimm.c
> > +++ b/hw/acpi/nvdimm.c
> > @@ -1192,15 +1192,18 @@ static void nvdimm_build_fit(Aml *dev)
> >      aml_append(method, ifctx);
> >
> >      aml_append(method, aml_store(aml_sizeof(buf), buf_size));
> > -    aml_append(method, aml_subtract(buf_size,
> > -                                    aml_int(4) /* the size of
> "STAU" */,
> > -                                    buf_size));
> >
> >      /* if we read the end of fit. */
> > -    ifctx = aml_if(aml_equal(buf_size, aml_int(0)));
> > +    ifctx = aml_if(aml_equal(aml_subtract(buf_size,
> > +                             aml_sizeof(aml_int(0)), NULL),
> > +                             aml_int(0)));
> >      aml_append(ifctx, aml_return(aml_buffer(0, NULL)));
> >      aml_append(method, ifctx);
> >
> > +    aml_append(method, aml_subtract(buf_size,
> > +                                    aml_int(4) /* the size of
> "STAU" */,
> > +                                    buf_size));
> > +
> >      aml_append(method, aml_create_field(buf,
> >                              aml_int(4 * BITS_PER_BYTE), /* offset
> at byte 4.*/
> >                              aml_shiftleft(buf_size, aml_int(3)),
> "BUFF"));
> 
> Instead of covering up error in NCAL, I'd prefer original issue fixed.
> How about something like this pseudocode:
> 
> NTFI = Local6
> Local1 = (RLEN - 0x04)
> -Local1 = (Local1 << 0x03)
> -CreateField (ODAT, Zero, Local1, OBUF)
> -Concatenate (Buffer (Zero) {}, OBUF, Local7)
> 
> If (Local1 < IntegerSize)
> {
> Local7 = Buffer(0) // output buffer
> Local8 = 0 // index for being copied byte
> // build byte by byte output buffer
> while (Local8 < Local1) {
>Local9 = Buffer(1)
>// copy 1 byte at Local8 offset from ODAT to
> temporary buffer Local9
>Store(DeRef(Index(ODAT, Local8)), Index(Local9,
> 0))
>Concatenate (Local7, Local9, Local7)
>Increment(Local8)
> }
> return Local7
> } else {
> CreateField (ODAT, Zero, Local1, OBUF)
> return OBUF
> }
> 

Ok. This looks much better. I will test this and send out a v2 soon, addressing
the other comments on this series as well.

Thanks,
Shameer


RE: [PATCH 0/5] ARM virt: Add NVDIMM support

2020-01-06 Thread Shameerali Kolothum Thodi
Hi Igor,

> -Original Message-
> From: Shameerali Kolothum Thodi
> Sent: 13 December 2019 12:52
> To: 'Igor Mammedov' 
> Cc: xiaoguangrong.e...@gmail.com; peter.mayd...@linaro.org;
> drjo...@redhat.com; shannon.zha...@gmail.com; qemu-devel@nongnu.org;
> Linuxarm ; Auger Eric ;
> qemu-...@nongnu.org; xuwei (O) ;
> ler...@redhat.com
> Subject: RE: [PATCH 0/5] ARM virt: Add NVDIMM support
> 

[...]

> 
> Thanks for your help. I did spend some more time debugging this further.
> I tried to introduce a totally new Buffer field object with different
> sizes and printing the size after creation.
> 
> --- SSDT.dsl  2019-12-12 15:28:21.976986949 +
> +++ SSDT-arm64-dbg.dsl2019-12-13 12:17:11.026806186 +
> @@ -18,7 +18,7 @@
>   * Compiler ID  "BXPC"
>   * Compiler Version 0x0001 (1)
>   */
> -DefinitionBlock ("", "SSDT", 1, "BOCHS ", "NVDIMM", 0x0001)
> +DefinitionBlock ("", "SSDT", 1, "BOCHS ", "NVDIMM", 0x0002)
>  {
>  Scope (\_SB)
>  {
> @@ -48,6 +48,11 @@
>  RLEN,   32,
>  ODAT,   32736
>  }
> +
> +Field (NRAM, DWordAcc, NoLock, Preserve)
> +{
> +NBUF,   32768
> +}
> 
>  If ((Arg4 == Zero))
>  {
> @@ -87,6 +92,12 @@
>  Local3 = DerefOf (Local2)
>  FARG = Local3
>  }
> +
> +Local2 = 0x2
> +printf("AML:NVDIMM Creating TBUF with bytes %o",
> Local2)
> +CreateField (NBUF, Zero, (Local2 << 3), TBUF)
> +Concatenate (Buffer (Zero){}, TBUF, Local3)
> +printf("AML:NVDIMM Size of TBUF(Local3) %o",
> SizeOf(Local3))
> 
>  NTFI = Local6
>  Local1 = (RLEN - 0x04)
> 
> And run it by changing Local2 with different values, It looks on ARM64,
> 
> For cases where, Local2 <8, the created buffer size is always 8 bytes
> 
> "AML:NVDIMM Creating TBUF with bytes 0002"
> "AML:NVDIMM Size of TBUF(Local3) 0008"
> 
> ...
> "AML:NVDIMM Creating TBUF with bytes 0005"
> "AML:NVDIMM Size of TBUF(Local3) 0008"
> 
> And once Local2 >=8, it gets the correct size,
> 
> "AML:NVDIMM Creating TBUF with bytes 0009"
> "AML:NVDIMM Size of TBUF(Local3) 0009"
> 
> 
> But on x86, the behavior is like,
> 
> For cases where, Local2 <4, the created buffer size is always 4 bytes
> 
> "AML:NVDIMM Creating TBUF with bytes 0002"
> "AML:NVDIMM Size of TBUF(Local3) 0004"
> 
> "AML:NVDIMM Creating TBUF with bytes 0003"
> "AML:NVDIMM Size of TBUF(Local3) 0004"
> 
> And once Local2 >= 4, it is ok
> 
> "AML:NVDIMM Creating TBUF with bytes 0005"
> "AML:NVDIMM Size of TBUF(Local3) 0005"
> ...
> "AML:NVDIMM Creating TBUF with bytes 0009"
> "AML:NVDIMM Size of TBUF(Local3) 0009"
> 
> This is the reason why it works on x86 and not on ARM64. Because, if you
> remember on second iteration of the FIT buffer, the requested buffer size is 
> 4 .
> 
> I tried changing the AccessType of the below NBUF field from DWordAcc to
> ByteAcc/BufferAcc, but no luck.
> 
> +Field (NRAM, DWordAcc, NoLock, Preserve)
> +{
> +NBUF,   32768
> +}
> 
> Not sure what we need to change for ARM64 to create buffer object of size 4
> here. Please let me know if you have any pointers to debug this further.
> 
> (I am attaching both x86 and ARM64 SSDT dsl used for reference)

(+Jonathan)

Thanks to Jonathan for taking a fresh look at this issue and spotting this,
https://elixir.bootlin.com/linux/v5.5-rc5/source/drivers/acpi/acpica/utmisc.c#L109

And, from ACPI 6.3, table 19-419

"If the Buffer Field is smaller than or equal to the size of an Integer (in
bits), it will be treated as an Integer. Otherwise, it will be treated as a
Buffer. The size of an Integer is indicated by the Definition Block table
header's Revision field. A Revision field value less than 2 indicates that the
size of an Integer is 32 bits. A value greater than or equal to 2 signifies
that the size of an Integer is 64 bits."
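
The rule quoted above can be sketched in C as follows. This is only an
illustration of the spec text, not ACPICA or QEMU code; the helper names
acpi_integer_width_bits() and buffer_field_is_integer() are made up for
this note.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Illustration only: hypothetical helpers, not part of QEMU or ACPICA.
 * The Integer width follows the Definition Block Revision field:
 * a value below 2 means 32 bits, 2 or above means 64 bits.
 */
static uint32_t acpi_integer_width_bits(uint8_t revision)
{
    return revision < 2 ? 32 : 64;
}

/* A Buffer Field no wider than an Integer is treated as an Integer. */
static bool buffer_field_is_integer(uint32_t field_bits, uint8_t revision)
{
    return field_bits <= acpi_integer_width_bits(revision);
}
```

With this reading, a 32-bit field (the 4-byte case from the second FIT read)
is treated as an Integer under both revisions, but that Integer is 4 bytes
wide with Revision 1 and 8 bytes wide with Revision 2.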

It looks like the main reason for the difference in behavior of the buffer
object size between x86 and ARM/virt is the Revision number used in the
DSDT table. On 

RE: [PATCH 0/5] ARM virt: Add NVDIMM support

2019-12-13 Thread Shameerali Kolothum Thodi
Hi Igor,

> -Original Message-
> From: Igor Mammedov [mailto:imamm...@redhat.com]
> Sent: 11 December 2019 07:57
> To: Shameerali Kolothum Thodi 
> Cc: xiaoguangrong.e...@gmail.com; peter.mayd...@linaro.org;
> drjo...@redhat.com; shannon.zha...@gmail.com; qemu-devel@nongnu.org;
> Linuxarm ; Auger Eric ;
> qemu-...@nongnu.org; xuwei (O) ;
> ler...@redhat.com
> Subject: Re: [PATCH 0/5] ARM virt: Add NVDIMM support

[...]

> > I couldn't figure out yet, why this extra 4 bytes are added by aml code on
> ARM64
> > when the nvdimm_dsm_func_read_fit() returns NvdimmFuncReadFITOut
> without
> > any FIT data. ie, when the FIT buffer len (read_len) is zero.
> >
> > But the below will fix this issue,
> >
> > diff --git a/hw/acpi/nvdimm.c b/hw/acpi/nvdimm.c
> > index f91eea3802..cddf95f4c1 100644
> > --- a/hw/acpi/nvdimm.c
> > +++ b/hw/acpi/nvdimm.c
> > @@ -588,7 +588,7 @@ static void
> nvdimm_dsm_func_read_fit(NVDIMMState *state, NvdimmDsmIn *in,
> >  nvdimm_debug("Read FIT: offset %#x FIT size %#x Dirty %s.\n",
> >   read_fit->offset, fit->len, fit_buf->dirty ? "Yes" : 
> > "No");
> >
> > -if (read_fit->offset > fit->len) {
> > +if (read_fit->offset >= fit->len) {
> >  func_ret_status = NVDIMM_DSM_RET_STATUS_INVALID;
> >  goto exit;
> >  }
> >
> >
> > This will return error code to aml in the second iteration when there is no
> further
> > FIT data to report. But, I am not sure why this check was omitted in the 
> > first
> place.
> >
> > Please let me know if this is acceptable and then probably I can look into 
> > a v2
> of this
> > series.
> Sorry, I don't have capacity to debug this right now,

No problem.

> but I'd prefer if 'why' question was answered first.

Right.

> Anyways, if something is unclear in how concrete AML code is build/works,
> feel free to ask and I'll try to explain and guide you.

Thanks for your help. I did spend some more time debugging this further.
I tried to introduce a totally new Buffer field object with different
sizes and printing the size after creation.

--- SSDT.dsl2019-12-12 15:28:21.976986949 +
+++ SSDT-arm64-dbg.dsl  2019-12-13 12:17:11.026806186 +
@@ -18,7 +18,7 @@
  * Compiler ID  "BXPC"
  * Compiler Version 0x0001 (1)
  */
-DefinitionBlock ("", "SSDT", 1, "BOCHS ", "NVDIMM", 0x0001)
+DefinitionBlock ("", "SSDT", 1, "BOCHS ", "NVDIMM", 0x0002)
 {
 Scope (\_SB)
 {
@@ -48,6 +48,11 @@
 RLEN,   32, 
 ODAT,   32736
 }
+
+Field (NRAM, DWordAcc, NoLock, Preserve)
+{
+NBUF,   32768 
+}
 
 If ((Arg4 == Zero))
 {
@@ -87,6 +92,12 @@
 Local3 = DerefOf (Local2)
 FARG = Local3
 }
+   
+Local2 = 0x2 
+printf("AML:NVDIMM Creating TBUF with bytes %o", Local2)
+CreateField (NBUF, Zero, (Local2 << 3), TBUF)
+Concatenate (Buffer (Zero){}, TBUF, Local3)
+printf("AML:NVDIMM Size of TBUF(Local3) %o", SizeOf(Local3))
 
 NTFI = Local6
 Local1 = (RLEN - 0x04)

I then ran it with different values of Local2. On ARM64 it looks like this:

For cases where Local2 < 8, the created buffer size is always 8 bytes

"AML:NVDIMM Creating TBUF with bytes 0002"
"AML:NVDIMM Size of TBUF(Local3) 0008"

...
"AML:NVDIMM Creating TBUF with bytes 0005"
"AML:NVDIMM Size of TBUF(Local3) 0008"

And once Local2 >=8, it gets the correct size,

"AML:NVDIMM Creating TBUF with bytes 0009"
"AML:NVDIMM Size of TBUF(Local3) 0009"


But on x86, the behavior is like this:

For cases where Local2 < 4, the created buffer size is always 4 bytes

"AML:NVDIMM Creating TBUF with bytes 0002"
"AML:NVDIMM Size of TBUF(Local3) 0004"

"AML:NVDIMM Creating TBUF with bytes 0003"
"AML:NVDIMM Size of TBUF(Local3) 0004"

And once Local2 >= 4, it is ok

"AML:NVDIMM Creating TBUF with bytes 0005"
"AML:NVDIMM Size of TBUF(Local3) 0005"
...
"AML:NVDIMM Creating TBUF with bytes 0009"
"AML:NVDIMM Size of TBUF(Local3) 0009"

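The sizes printed above fit a simple model, sketched here in C. It assumes
that a field no larger than the interpreter's Integer width is converted to
an Integer, and so comes back at full Integer width once concatenated into a
buffer. concat_result_bytes() is a hypothetical helper for this note, not an
existing function, and the widths 4 and 8 are taken from the two runs above.

```c
#include <stdint.h>

/*
 * Hypothetical model of the observed SizeOf(Local3), not interpreter code.
 * requested_bytes: size passed to CreateField (Local2 in the debug SSDT).
 * integer_bytes:   assumed Integer width in bytes (4 on the x86 run,
 *                  8 on the ARM64 run).
 */
static uint32_t concat_result_bytes(uint32_t requested_bytes,
                                    uint32_t integer_bytes)
{
    /* Fields that fit in an Integer come back padded to Integer width. */
    if (requested_bytes <= integer_bytes) {
        return integer_bytes;
    }
    return requested_bytes;
}
```

For example, concat_result_bytes(2, 8) gives 8 and concat_result_bytes(9, 8)
gives 9, which matches the ARM64 prints; with integer_bytes set to 4 the
x86 prints are reproduced as well.
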
This is the reason why it works on x86 and not on ARM64. Because, if you
remember on second iteration of the FIT buffer, the requested bu

RE: [PATCH 0/5] ARM virt: Add NVDIMM support

2019-12-09 Thread Shameerali Kolothum Thodi
Hi Igor/ xiaoguangrong,

> -Original Message-
> From: Shameerali Kolothum Thodi
> Sent: 28 November 2019 12:36
> To: 'Igor Mammedov' ;
> xiaoguangrong.e...@gmail.com
> Cc: peter.mayd...@linaro.org; drjo...@redhat.com;
> shannon.zha...@gmail.com; qemu-devel@nongnu.org; Linuxarm
> ; Auger Eric ;
> qemu-...@nongnu.org; xuwei (O) ;
> ler...@redhat.com
> Subject: RE: [PATCH 0/5] ARM virt: Add NVDIMM support
> 
> 
> 
> > -Original Message-
> > From: Qemu-devel
> >
> [mailto:qemu-devel-bounces+shameerali.kolothum.thodi=huawei.com@nongn
> > u.org] On Behalf Of Igor Mammedov
> > Sent: 26 November 2019 08:57
> > To: Shameerali Kolothum Thodi 
> > Cc: peter.mayd...@linaro.org; drjo...@redhat.com;
> > xiaoguangrong.e...@gmail.com; shannon.zha...@gmail.com;
> > qemu-devel@nongnu.org; Linuxarm ; Auger Eric
> > ; qemu-...@nongnu.org; xuwei (O)
> > ; ler...@redhat.com
> > Subject: Re: [PATCH 0/5] ARM virt: Add NVDIMM support
> 
> [..]
> 
> > > > 0xb8 Dirty No.  -->Another read is attempted
> > > > > [Qemu]NVDIMM:nvdimm_dsm_func_read_fit: read_fit_out buf size 0x8
> > > > func_ret_status 3  --> Error status returned
> > > >
> > > > status 3 means that QEMU didn't like content of NRAM, and there is only
> > > > 1 place like this in nvdimm_dsm_func_read_fit()
> > > > if (read_fit->offset > fit->len) {
> > > > func_ret_status = NVDIMM_DSM_RET_STATUS_INVALID;
> > > > goto exit;
> > > > }
> > > >
> > > > so I'd start looking from here and check that QEMU gets expected data
> > > > in nvdimm_dsm_write(). In other words I'd try to trace/compare
> > > > content of DSM buffer (from qemu side).
> > >
> > > I had printed the DSM buffer previously and it looked same, I will double
> check
> > > that.
> 
> Tried printing the buffer in both Qemu/AML code.
> 
> On ARM64,

[...]
 
> Attached the SSDT.dsl used for debugging. I am still not clear why on ARM64,
> 2nd iteration case, the created buffer size in NCAL and RFIT methods have
> additional 4 bytes!.
> 
> CreateField (ODAT, Zero, Local1, OBUF)
> Concatenate (Buffer (Zero){}, OBUF, Local7)
> 
> Please let me know if you have any clue.
> 

I couldn't figure out yet why these extra 4 bytes are added by the AML code on
ARM64 when nvdimm_dsm_func_read_fit() returns NvdimmFuncReadFITOut without
any FIT data, i.e., when the FIT buffer len (read_len) is zero.

But the below will fix this issue,

diff --git a/hw/acpi/nvdimm.c b/hw/acpi/nvdimm.c
index f91eea3802..cddf95f4c1 100644
--- a/hw/acpi/nvdimm.c
+++ b/hw/acpi/nvdimm.c
@@ -588,7 +588,7 @@ static void nvdimm_dsm_func_read_fit(NVDIMMState *state, NvdimmDsmIn *in,
     nvdimm_debug("Read FIT: offset %#x FIT size %#x Dirty %s.\n",
                  read_fit->offset, fit->len, fit_buf->dirty ? "Yes" : "No");

-    if (read_fit->offset > fit->len) {
+    if (read_fit->offset >= fit->len) {
         func_ret_status = NVDIMM_DSM_RET_STATUS_INVALID;
         goto exit;
     }


This will return an error code to the AML in the second iteration, when there
is no further FIT data to report. But I am not sure why this check was omitted
in the first place.
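
To make the difference between the two checks concrete, here is a small
stand-alone C sketch. It is a simplified stand-in for the offset check only,
not the real nvdimm_dsm_func_read_fit(); the function read_fit_status() and
the DSM_RET_STATUS_* names are assumptions for this note, with the values 0
and 3 taken from the func_ret_status prints in the logs.

```c
#include <stdint.h>

/* Status values as they appear in the debug logs (stand-in names). */
#define DSM_RET_STATUS_SUCCESS 0
#define DSM_RET_STATUS_INVALID 3

/*
 * Simplified stand-in for the offset check in nvdimm_dsm_func_read_fit().
 * strict == 0 models the existing '>' check,
 * strict != 0 models the proposed '>=' check.
 */
static uint32_t read_fit_status(uint32_t offset, uint32_t fit_len, int strict)
{
    if (strict ? offset >= fit_len : offset > fit_len) {
        return DSM_RET_STATUS_INVALID;
    }
    return DSM_RET_STATUS_SUCCESS;
}
```

With offset 0xb8 and FIT size 0xb8, the '>' variant reports success with no
data while the '>=' variant reports the invalid status. Note that in this
model the '>=' variant would also report invalid for a read at offset 0 of an
empty FIT, which may be worth checking before relying on it.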

Please let me know if this is acceptable and then probably I can look into a v2
of this series.

Thanks,
Shameer






RE: [PATCH 1/5] hw/arm: Align ACPI blob len to PAGE size

2019-12-09 Thread Shameerali Kolothum Thodi
Hi Igor/ Michael,

> -Original Message-
> From: Linuxarm [mailto:linuxarm-boun...@huawei.com] On Behalf Of
> Shameerali Kolothum Thodi
> Sent: 11 November 2019 12:47
> To: Igor Mammedov 
> Cc: peter.mayd...@linaro.org; shannon.zha...@gmail.com; Michael S. Tsirkin
> ; qemu-devel@nongnu.org; Linuxarm
> ; eric.au...@redhat.com; qemu-...@nongnu.org;
> xuwei (O) ; ler...@redhat.com
> Subject: RE: [PATCH 1/5] hw/arm: Align ACPI blob len to PAGE size
> 
> Hi Igor,
> 
> > -Original Message-
> > From: Igor Mammedov [mailto:imamm...@redhat.com]
> > Sent: 08 November 2019 16:18
> > To: Shameerali Kolothum Thodi 
> > Cc: qemu-devel@nongnu.org; qemu-...@nongnu.org;
> > eric.au...@redhat.com; peter.mayd...@linaro.org;
> > shannon.zha...@gmail.com; xuwei (O) ;
> > ler...@redhat.com; Linuxarm ; Michael S. Tsirkin
> > 
> > Subject: Re: [PATCH 1/5] hw/arm: Align ACPI blob len to PAGE size
> >
> > On Fri, 4 Oct 2019 16:52:58 +0100
> > Shameer Kolothum  wrote:
> >
> > > If ACPI blob length modifications happens after the initial
> > > virt_acpi_build() call, and the changed blob length is within
> > > the PAGE size boundary, then the revised size is not seen by
> > > the firmware on Guest reboot. The is because in the
> > > virt_acpi_build_update() -> acpi_ram_update() -> qemu_ram_resize()
> > > path, qemu_ram_resize() uses ram_block size which is aligned
> > > to PAGE size and the "resize callback" to update the size seen
> > > by firmware is not getting invoked. Hence align ACPI blob sizes
> > > to PAGE boundary.
> > >
> > > Signed-off-by: Shameer Kolothum
> 
> > > ---
> > > More details on this issue can be found here,
> > > https://patchwork.kernel.org/patch/11154757/
> > re-read it again and it seems to me that this patch is workaround
> > rather than a solution to the problem.
> 
> Thanks for taking a look at this. Yes, I was also not very sure about this
> approach
> as the root cause of the issue is in qemu_ram_resize().
> 
> > CCing Michael as an author this code.
> > on x86 we have crazy history of manually aligning acpi blobs, see code under
> > comment
> >
> >   /* We'll expose it all to Guest so we want to reduce
> >
> > so used_length endups with over-sized value which includes table and
> padding
> > and it happens that ACPI_BUILD_TABLE_SIZE is much bigger than host page
> > size
> > so if on reboot we happen to exceed ACPI_BUILD_TABLE_SIZE, the next
> padded
> > table
> > size (used_length) would be  2 x ACPI_BUILD_TABLE_SIZE which doesn't
> > trigger
> >   block->used_length == HOST_PAGE_ALIGN(newsize)
> > condition so fwcfg gets updated value.
> 
> Yes, this is the reason why the issue is not visible on x86.
> 
> >
> > > ---
> > >  hw/arm/virt-acpi-build.c | 14 ++
> > >  1 file changed, 14 insertions(+)
> > >
> > > diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
> > > index 4cd50175e0..074e0c858e 100644
> > > --- a/hw/arm/virt-acpi-build.c
> > > +++ b/hw/arm/virt-acpi-build.c
> > > @@ -790,6 +790,7 @@ void virt_acpi_build(VirtMachineState *vms,
> > AcpiBuildTables *tables)
> > >  GArray *table_offsets;
> > >  unsigned dsdt, xsdt;
> > >  GArray *tables_blob = tables->table_data;
> > > +GArray *cmd_blob = tables->linker->cmd_blob;
> > >  MachineState *ms = MACHINE(vms);
> > >
> > >  table_offsets = g_array_new(false, true /* clear */,
> > > @@ -854,6 +855,19 @@ void virt_acpi_build(VirtMachineState *vms,
> > AcpiBuildTables *tables)
> > >  build_rsdp(tables->rsdp, tables->linker, _data);
> > >  }
> > >
> > > +/*
> > > + * Align the ACPI blob lengths to PAGE size so that on ACPI table
> > > + * regeneration, the length that firmware sees really gets updated
> > > + * through 'resize' callback in qemu_ram_resize() in the
> > > + * virt_acpi_build_update() -> acpi_ram_update() ->
> > qemu_ram_resize()
> > > + * path.
> > > + */
> > > +g_array_set_size(tables_blob,
> > > +
> > TARGET_PAGE_ALIGN(acpi_data_len(tables_blob)));
> > here it would depend on TARGET_PAGE_ALIGN vs HOST_PAGE_ALIGN
> relation
> > so depending on host it could flip it's behavior to opposite.
> 
> Ok.
> 
> >
> > one thing we could do is dropping (block->used_length == newsize) condition

RE: [PATCH 0/5] ARM virt: Add NVDIMM support

2019-11-28 Thread Shameerali Kolothum Thodi


> -Original Message-
> From: Qemu-devel
> [mailto:qemu-devel-bounces+shameerali.kolothum.thodi=huawei.com@nongn
> u.org] On Behalf Of Igor Mammedov
> Sent: 26 November 2019 08:57
> To: Shameerali Kolothum Thodi 
> Cc: peter.mayd...@linaro.org; drjo...@redhat.com;
> xiaoguangrong.e...@gmail.com; shannon.zha...@gmail.com;
> qemu-devel@nongnu.org; Linuxarm ; Auger Eric
> ; qemu-...@nongnu.org; xuwei (O)
> ; ler...@redhat.com
> Subject: Re: [PATCH 0/5] ARM virt: Add NVDIMM support

[..]

> > > 0xb8 Dirty No.  -->Another read is attempted
> > > > [Qemu]NVDIMM:nvdimm_dsm_func_read_fit: read_fit_out buf size 0x8
> > > func_ret_status 3  --> Error status returned
> > >
> > > status 3 means that QEMU didn't like content of NRAM, and there is only
> > > 1 place like this in nvdimm_dsm_func_read_fit()
> > > if (read_fit->offset > fit->len) {
> > > func_ret_status = NVDIMM_DSM_RET_STATUS_INVALID;
> > > goto exit;
> > > }
> > >
> > > so I'd start looking from here and check that QEMU gets expected data
> > > in nvdimm_dsm_write(). In other words I'd try to trace/compare
> > > content of DSM buffer (from qemu side).
> >
> > I had printed the DSM buffer previously and it looked same, I will double 
> > check
> > that.

Tried printing the buffer in both Qemu/AML code.

On ARM64,
-
(1st iteration with offset 0)
[QEMU]NVDIMM:nvdimm_dsm_func_read_fit: Read FIT: offset 0 FIT size 0xb8 Dirty Yes.
[QEMU]NVDIMM:nvdimm_dsm_func_read_fit: read_fit_out buff: 
[QEMU]NVDIMM BUF[0x0] = 0xC0
[QEMU]NVDIMM BUF[0x1] = 0x0
[QEMU]NVDIMM BUF[0x2] = 0x0
[QEMU]NVDIMM BUF[0x3] = 0x0
[QEMU]NVDIMM BUF[0x4] = 0x0
[QEMU]NVDIMM BUF[0x5] = 0x0
[QEMU]NVDIMM BUF[0x6] = 0x0
[QEMU]NVDIMM BUF[0x7] = 0x0
[QEMU]NVDIMM BUF[0x8] = 0x0
[QEMU]NVDIMM BUF[0x9] = 0x0
[QEMU]NVDIMM BUF[0xA] = 0x38
[QEMU]NVDIMM BUF[0xB] = 0x0
[QEMU]NVDIMM BUF[0xC] = 0x2
[QEMU]NVDIMM BUF[0xD] = 0x0
[QEMU]NVDIMM BUF[0xE] = 0x3
[QEMU]NVDIMM BUF[0xF] = 0x0
.
[QEMU]NVDIMM BUF[0xBC] = 0x0
[QEMU]NVDIMM BUF[0xBD] = 0x0
[QEMU]NVDIMM BUF[0xBE] = 0x0
[QEMU]NVDIMM BUF[0xBF] = 0x0
[QEMU]NVDIMM:nvdimm_dsm_func_read_fit: read_fit_out buf size 0xc0 func_ret_status 0

"AML:NVDIMM-NCAL: Rcvd RLEN 00C0"
"AML:NVDIMM-NCAL TBUF[] = 0x00C0"
"AML:NVDIMM-NCAL TBUF[0001] = 0x"
"AML:NVDIMM-NCAL TBUF[0002] = 0x"
"AML:NVDIMM-NCAL TBUF[0003] = 0x"
"AML:NVDIMM-NCAL TBUF[0004] = 0x"
"AML:NVDIMM-NCAL TBUF[0005] = 0x"
"AML:NVDIMM-NCAL TBUF[0006] = 0x"
"AML:NVDIMM-NCAL TBUF[0007] = 0x"
"AML:NVDIMM-NCAL TBUF[0008] = 0x"
"AML:NVDIMM-NCAL TBUF[0009] = 0x"
"AML:NVDIMM-NCAL TBUF[000A] = 0x0038"
"AML:NVDIMM-NCAL TBUF[000B] = 0x"
"AML:NVDIMM-NCAL TBUF[000C] = 0x0002"
"AML:NVDIMM-NCAL TBUF[000D] = 0x"
"AML:NVDIMM-NCAL TBUF[000E] = 0x0003"
"AML:NVDIMM-NCAL TBUF[000F] = 0x"
...
"AML:NVDIMM-NCAL TBUF[00BC] = 0x"
"AML:NVDIMM-NCAL TBUF[00BD] = 0x"
"AML:NVDIMM-NCAL TBUF[00BE] = 0x"
"AML:NVDIMM-NCAL TBUF[00BF] = 0x"
"AML:NVDIMM-NCAL: Creating OBUF with bytes 00BC"
"AML:NVDIMM-NCAL: Created  BUF(Local7) size 00BC"
"AML:NVDIMM-RFIT Rcvd buf size 00BC"
"AML:NVDIMM-RFIT Created NVDR.RFIT.BUFF size 00B8"
"AML:NVDIMM-FIT: Rcvd buf size 00B8" -->All looks fine in first 
iteration.

(2nd iteration with offset 0xb8)
[QEMU]NVDIMM:nvdimm_dsm_func_read_fit: Read FIT: offset 0xb8 FIT size 0xb8 Dirty No.
[QEMU]NVDIMM:nvdimm_dsm_func_read_fit: read_fit_out buff: 
[QEMU]NVDIMM BUF[0x0] = 0x8
[QEMU]NVDIMM BUF[0x1] = 0x0
[QEMU]NVDIMM BUF[0x2] = 0x0
[QEMU]NVDIMM BUF[0x3] = 0x0
[QEMU]NVDIMM BUF[0x4] = 0x0
[QEMU]NVDIMM BUF[0x5] = 0x0
[QEMU]NVDIMM BUF[0x6] = 0x0
[QEMU]NVDIMM BUF[0x7] = 0x0
[QEMU]NVDIMM:nvdimm_dsm_func_read_fit: read_fit_out buf size 0x8 func_ret_status 0

"AML:NVDIMM-NCAL: Rcvd RLEN 0008"
"AML:NVDIMM-NCAL TBUF[] = 0x0008"
"AML:NVDIMM-NCAL TBUF[0001] = 0x"
"AML:NVDIMM-NCAL TBUF[0002] =

RE: [PATCH 0/5] ARM virt: Add NVDIMM support

2019-11-25 Thread Shameerali Kolothum Thodi
Hi Igor,

> -Original Message-
> From: Igor Mammedov [mailto:imamm...@redhat.com]
> Sent: 25 November 2019 15:46
> To: Shameerali Kolothum Thodi 
> Cc: Auger Eric ; qemu-devel@nongnu.org;
> qemu-...@nongnu.org; peter.mayd...@linaro.org;
> shannon.zha...@gmail.com; xuwei (O) ;
> ler...@redhat.com; Linuxarm 
> Subject: Re: [PATCH 0/5] ARM virt: Add NVDIMM support
> 
> On Mon, 25 Nov 2019 13:20:02 +
> Shameerali Kolothum Thodi  wrote:
> 
> > Hi Eric/Igor,
> >
> > > -Original Message-
> > > From: Shameerali Kolothum Thodi
> > > Sent: 22 October 2019 15:05
> > > To: 'Auger Eric' ; qemu-devel@nongnu.org;
> > > qemu-...@nongnu.org; imamm...@redhat.com
> > > Cc: peter.mayd...@linaro.org; shannon.zha...@gmail.com; xuwei (O)
> > > ; ler...@redhat.com; Linuxarm
> > > 
> > > Subject: RE: [PATCH 0/5] ARM virt: Add NVDIMM support
> 
> not related to problem discussed in this patch but you probably
> need to update docs/specs/acpi_nvdimm.txt to account for your changes

Ok.

> > >
> >
> > [..]
> >
> > > > one question: I noticed that when a NVDIMM slot is hotplugged one get
> > > > the following trace on guest:
> > > >
> > > > nfit ACPI0012:00: found a zero length table '0' parsing nfit
> > > > pmem0: detected capacity change from 0 to 1073741824
> > > >
> > > > Have you experienced the 0 length trace?
> > >
> > > I double checked and yes that trace is there. And I did a quick check with
> > > x86 and it is not there.
> > >
> > > The reason looks like, ARM64 kernel receives an additional 8 bytes size
> when
> > > the kernel evaluates the "_FIT" object.
> > >
> > > For the same test scenario, Qemu reports a FIT buffer size of 0xb8 and
> > >
> > > X86 Guest kernel,
> > > [1.601077] acpi_nfit_init: data 0x8a273dc12b18 sz 0xb8
> > >
> > > ARM64 Guest,
> > > [0.933133] acpi_nfit_init: data 0x3cbe6018 sz 0xc0
> > >
> > > I am not sure how that size gets changed for ARM which results in
> > > the above mentioned 0 length trace. I need to debug this further.
> > >
> > > Please let me know if you have any pointers...
> >
> > I spend some time debugging this further and it looks like the AML code
> > behaves differently on x86 and ARM64.
> FIT table is built dynamically and you are the first to debug
> such issue.
> (apart from the author the NVDIMM code.
:)
>  btw: why NVDIMM author is not on CC list???)

Right. Missed that. CCd.
 
> 
> > Booted guest with nvdimm mem, and used SSDT override with dbg prints
> > added,
> >
> > -object memory-backend-ram,id=mem1,size=1G \
> > -device nvdimm,id=dimm1,memdev=mem1 \
> >
> > On X86,
> > ---
> >
> > [Qemu]NVDIMM:nvdimm_dsm_func_read_fit: Read FIT: offset 0 FIT size 0xb8
> Dirty Yes.
> > [Qemu]NVDIMM:nvdimm_dsm_func_read_fit: read_fit_out buf size 0xc0
> func_ret_status 0
> >
> > [AML]"NVDIMM-NCAL: Rcvd RLEN 00C0"
> > [AML]"NVDIMM-NCAL: Creating OBUF with 05E0 bits"
> > [AML]"NVDIMM-NCAL: Created  BUF(Local7) size 00BC"
> > [AML]"NVDIMM-RFIT Rcvd buf size 00BC"
> > [AML]"NVDIMM-RFIT Created NVDR.RFIT.BUFF size 00B8"
> > [AML]"NVDIMM-FIT: Rcvd buf size 00B8"
> >
> > [Qemu]NVDIMM:nvdimm_dsm_func_read_fit: Read FIT: offset 0xb8 FIT size
> 0xb8 Dirty No.
> > [Qemu]NVDIMM:nvdimm_dsm_func_read_fit: read_fit_out buf size 0x8
> func_ret_status 0
> >
> > [AML]"NVDIMM-NCAL: Rcvd RLEN 0008"
> > [AML]"NVDIMM-NCAL: Creating OBUF with 0020 bits"
> > [AML]"NVDIMM-NCAL: Created  BUF(Local7) size 0004"
> > [AML]"NVDIMM-RFIT Rcvd buf size 0004"
> > [AML]"NVDIMM-FIT: Rcvd buf size "
> > [AML]"NVDIMM-FIT: _FIT returned size 00B8"
> >
> > [ KERNEL] acpi_nfit_init: NVDIMM: data 0x9855bb9a7518 sz 0xb8  -->
> Guest receives correct size(0xb8) here
> >
> > On ARM64,
> > ---
> >
> > [Qemu]NVDIMM:nvdimm_dsm_func_read_fit: Read FIT: offset 0 FIT size 0xb8
> Dirty Yes.
> > [Qemu]VDIMM:nvdimm_dsm_func_read_fit: read_fit_out buf size 0xc0
> func_ret_status 0
> >
> > [AML]"NVDIMM-NCAL: Rcvd RLEN 00C0"
> > [AML]"NVDIMM-NCAL: Creating OBUF with 05E0 bits"
> > [AML]"NVDIMM-NCAL: Created  BUF(Local7) size 0

RE: [PATCH 0/5] ARM virt: Add NVDIMM support

2019-11-25 Thread Shameerali Kolothum Thodi
Hi Eric/Igor,

> -Original Message-
> From: Shameerali Kolothum Thodi
> Sent: 22 October 2019 15:05
> To: 'Auger Eric' ; qemu-devel@nongnu.org;
> qemu-...@nongnu.org; imamm...@redhat.com
> Cc: peter.mayd...@linaro.org; shannon.zha...@gmail.com; xuwei (O)
> ; ler...@redhat.com; Linuxarm
> 
> Subject: RE: [PATCH 0/5] ARM virt: Add NVDIMM support
> 

[..]

> > one question: I noticed that when a NVDIMM slot is hotplugged one get
> > the following trace on guest:
> >
> > nfit ACPI0012:00: found a zero length table '0' parsing nfit
> > pmem0: detected capacity change from 0 to 1073741824
> >
> > Have you experienced the 0 length trace?
> 
> I double checked and yes that trace is there. And I did a quick check with
> x86 and it is not there.
> 
> The reason looks like, ARM64 kernel receives an additional 8 bytes size when
> the kernel evaluates the "_FIT" object.
> 
> For the same test scenario, Qemu reports a FIT buffer size of 0xb8 and
> 
> X86 Guest kernel,
> [1.601077] acpi_nfit_init: data 0x8a273dc12b18 sz 0xb8
> 
> ARM64 Guest,
> [0.933133] acpi_nfit_init: data 0x3cbe6018 sz 0xc0
> 
> I am not sure how that size gets changed for ARM which results in
> the above mentioned 0 length trace. I need to debug this further.
> 
> Please let me know if you have any pointers...

I spent some time debugging this further, and it looks like the AML code
behaves differently on x86 and ARM64.

Booted guest with nvdimm mem, and used SSDT override with dbg prints
added,

-object memory-backend-ram,id=mem1,size=1G \
-device nvdimm,id=dimm1,memdev=mem1 \

On X86,
---

[Qemu]NVDIMM:nvdimm_dsm_func_read_fit: Read FIT: offset 0 FIT size 0xb8 Dirty Yes.
[Qemu]NVDIMM:nvdimm_dsm_func_read_fit: read_fit_out buf size 0xc0 func_ret_status 0

[AML]"NVDIMM-NCAL: Rcvd RLEN 00C0"
[AML]"NVDIMM-NCAL: Creating OBUF with 05E0 bits"
[AML]"NVDIMM-NCAL: Created  BUF(Local7) size 00BC"
[AML]"NVDIMM-RFIT Rcvd buf size 00BC"
[AML]"NVDIMM-RFIT Created NVDR.RFIT.BUFF size 00B8"
[AML]"NVDIMM-FIT: Rcvd buf size 00B8"

[Qemu]NVDIMM:nvdimm_dsm_func_read_fit: Read FIT: offset 0xb8 FIT size 0xb8 Dirty No.
[Qemu]NVDIMM:nvdimm_dsm_func_read_fit: read_fit_out buf size 0x8 func_ret_status 0

[AML]"NVDIMM-NCAL: Rcvd RLEN 0008"
[AML]"NVDIMM-NCAL: Creating OBUF with 0020 bits"
[AML]"NVDIMM-NCAL: Created  BUF(Local7) size 0004"
[AML]"NVDIMM-RFIT Rcvd buf size 0004"
[AML]"NVDIMM-FIT: Rcvd buf size "
[AML]"NVDIMM-FIT: _FIT returned size 00B8"

[ KERNEL] acpi_nfit_init: NVDIMM: data 0x9855bb9a7518 sz 0xb8  --> Guest receives correct size(0xb8) here

On ARM64,
---

[Qemu]NVDIMM:nvdimm_dsm_func_read_fit: Read FIT: offset 0 FIT size 0xb8 Dirty Yes.
[Qemu]NVDIMM:nvdimm_dsm_func_read_fit: read_fit_out buf size 0xc0 func_ret_status 0

[AML]"NVDIMM-NCAL: Rcvd RLEN 00C0"
[AML]"NVDIMM-NCAL: Creating OBUF with 05E0 bits"
[AML]"NVDIMM-NCAL: Created  BUF(Local7) size 00BC"
[AML]"NVDIMM-RFIT Rcvd buf size 00BC"
[AML]"NVDIMM-RFIT Created NVDR.RFIT.BUFF size 00B8"
[AML]"NVDIMM-FIT: Rcvd buf size 00B8"

[Qemu]NVDIMM:nvdimm_dsm_func_read_fit: Read FIT: offset 0xb8 FIT size 0xb8 Dirty No.
[Qemu]NVDIMM:nvdimm_dsm_func_read_fit: read_fit_out buf size 0x8 func_ret_status 0

[AML]"NVDIMM-NCAL: Rcvd RLEN 0008"
[AML]"NVDIMM-NCAL: Creating OBUF with 0020 bits"  --> All looks same as x86 up to here.
[AML]"NVDIMM-NCAL: Created  BUF(Local7) size 0008"  ---> The size goes wrong. 8 bytes instead of 4!.
[AML]"NVDIMM-RFIT Rcvd buf size 0008"
[AML]"NVDIMM-RFIT Created NVDR.RFIT.BUFF size 0004"
[AML]"NVDIMM-FIT: Rcvd buf size 0008"  --> Again size goes wrong. 8 bytes instead of 4!.

[Qemu]NVDIMM:nvdimm_dsm_func_read_fit: Read FIT: offset 0xc0 FIT size 0xb8 Dirty No.  -->Another read is attempted
[Qemu]NVDIMM:nvdimm_dsm_func_read_fit: read_fit_out buf size 0x8 func_ret_status 3  --> Error status returned

[AML]"NVDIMM-NCAL: Rcvd RLEN 0008"
[AML]"NVDIMM-NCAL: Creating OBUF with 0020 bits"
[AML]"NVDIMM-NCAL: Created  BUF(Local7) size 0008"
[AML]"NVDIMM-FIT: Rcvd buf size "
[AML]"NVDIMM-FIT: _FIT returned size 00C0"   --> Wrong size returned.
[ KERNEL] acpi_nfit_init: NVDIMM: data 0xfc57ce18 sz 0xc0   -->Kernel gets 0xc0 instead of 0xb8


It looks like the AML "CreateField (ODAT, Zero, Local1, OBUF)" goes wrong on
ARM64 when the buffer is all zeroes. My knowledge of AML is very limited, and
I am not sure whether this is a 32/64-bit issue or not. I am attaching the SSDT
files with the above dbg prints added. Could you please take a look and let me
know what is actually going on here...
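
For reference, the sizes one would expect at each step can be worked out
from the arithmetic in the AML itself: NCAL creates OBUF with
(RLEN - 4) << 3 bits, and RFIT then drops its own 4-byte status word. The C
sketch below only restates that arithmetic; ncal_obuf_bits() and
rfit_data_bytes() are hypothetical names for this note.

```c
#include <stdint.h>

/* Bits passed to CreateField in NCAL: (RLEN - 4) << 3. */
static uint32_t ncal_obuf_bits(uint32_t rlen)
{
    return (rlen - 4) << 3;
}

/* Bytes left for FIT data once RFIT strips its 4-byte status word. */
static uint32_t rfit_data_bytes(uint32_t ncal_buf_bytes)
{
    return ncal_buf_bytes - 4;
}
```

For the second read, RLEN 0x8 gives a 0x20-bit field, so a 4-byte buffer is
expected from NCAL and RFIT would be left with 0 bytes of FIT data, as on
x86. The 8-byte buffer seen on ARM64 leaves 4 bytes instead, which matches
the RFIT print above and explains why _FIT attempts another read.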

Much appreciated,
Shameer.




SSDT-dbg-arm64.dsl
Description: SSDT-dbg-arm64.dsl


SSDT-dbg-x86.dsl
Description: SSDT-dbg-x86.dsl


RE: [PATCH 1/5] hw/arm: Align ACPI blob len to PAGE size

2019-11-11 Thread Shameerali Kolothum Thodi
Hi Igor,

> -Original Message-
> From: Igor Mammedov [mailto:imamm...@redhat.com]
> Sent: 08 November 2019 16:18
> To: Shameerali Kolothum Thodi 
> Cc: qemu-devel@nongnu.org; qemu-...@nongnu.org;
> eric.au...@redhat.com; peter.mayd...@linaro.org;
> shannon.zha...@gmail.com; xuwei (O) ;
> ler...@redhat.com; Linuxarm ; Michael S. Tsirkin
> 
> Subject: Re: [PATCH 1/5] hw/arm: Align ACPI blob len to PAGE size
> 
> On Fri, 4 Oct 2019 16:52:58 +0100
> Shameer Kolothum  wrote:
> 
> > If ACPI blob length modifications happens after the initial
> > virt_acpi_build() call, and the changed blob length is within
> > the PAGE size boundary, then the revised size is not seen by
> > the firmware on Guest reboot. The is because in the
> > virt_acpi_build_update() -> acpi_ram_update() -> qemu_ram_resize()
> > path, qemu_ram_resize() uses ram_block size which is aligned
> > to PAGE size and the "resize callback" to update the size seen
> > by firmware is not getting invoked. Hence align ACPI blob sizes
> > to PAGE boundary.
> >
> > Signed-off-by: Shameer Kolothum 
> > ---
> > More details on this issue can be found here,
> > https://patchwork.kernel.org/patch/11154757/
> re-read it again and it seems to me that this patch is workaround
> rather than a solution to the problem.

Thanks for taking a look at this. Yes, I was also not very sure about this
approach, as the root cause of the issue is in qemu_ram_resize().

> CCing Michael as an author this code.
> on x86 we have crazy history of manually aligning acpi blobs, see code under
> comment
> 
>   /* We'll expose it all to Guest so we want to reduce
> 
> so used_length endups with over-sized value which includes table and padding
> and it happens that ACPI_BUILD_TABLE_SIZE is much bigger than host page
> size
> so if on reboot we happen to exceed ACPI_BUILD_TABLE_SIZE, the next padded
> table
> size (used_length) would be  2 x ACPI_BUILD_TABLE_SIZE which doesn't
> trigger
>   block->used_length == HOST_PAGE_ALIGN(newsize)
> condition so fwcfg gets updated value.

Yes, this is the reason why the issue is not visible on x86.
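
To make the difference concrete, here is a rough standalone model of the two
behaviours. It is only a sketch and not QEMU code: the pad_* names are made up
for illustration, and the 4 KiB host page size and the 0x20000 value for
ACPI_BUILD_TABLE_SIZE are assumptions on my side.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumptions: 4 KiB host pages and a 128 KiB x86 table padding unit. */
#define PAD_HOST_PAGE_SIZE        0x1000u
#define PAD_ACPI_BUILD_TABLE_SIZE 0x20000u

/* Round size up to a multiple of align (align must be a power of two). */
static uint64_t pad_round_up(uint64_t size, uint64_t align)
{
    return (size + align - 1) & ~(align - 1);
}

/* x86-style: the length exposed to the guest is the padded table length. */
static uint64_t pad_x86_exposed_len(uint64_t raw_len)
{
    return pad_round_up(raw_len, PAD_ACPI_BUILD_TABLE_SIZE);
}

/*
 * x86-style: returns true when a change of the raw table length changes the
 * page-aligned used_length, i.e. when the resize callback would run.
 */
static bool pad_x86_resize_triggers(uint64_t old_len, uint64_t new_len)
{
    uint64_t old_used = pad_round_up(pad_x86_exposed_len(old_len),
                                     PAD_HOST_PAGE_SIZE);
    uint64_t new_used = pad_round_up(pad_x86_exposed_len(new_len),
                                     PAD_HOST_PAGE_SIZE);

    return old_used != new_used;
}

/* arm/virt-style without padding: only the page alignment is applied. */
static bool pad_arm_resize_triggers(uint64_t old_len, uint64_t new_len)
{
    return pad_round_up(old_len, PAD_HOST_PAGE_SIZE) !=
           pad_round_up(new_len, PAD_HOST_PAGE_SIZE);
}
```

In this model, whenever the x86 check does not trigger, the exposed (padded)
length has not changed either, so nothing goes stale. On the arm/virt side a
table can grow within the same page, the check does not trigger, and the
exact length seen by the firmware is left behind.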
 
> 
> > ---
> >  hw/arm/virt-acpi-build.c | 14 ++
> >  1 file changed, 14 insertions(+)
> >
> > diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
> > index 4cd50175e0..074e0c858e 100644
> > --- a/hw/arm/virt-acpi-build.c
> > +++ b/hw/arm/virt-acpi-build.c
> > @@ -790,6 +790,7 @@ void virt_acpi_build(VirtMachineState *vms,
> AcpiBuildTables *tables)
> >  GArray *table_offsets;
> >  unsigned dsdt, xsdt;
> >  GArray *tables_blob = tables->table_data;
> > +GArray *cmd_blob = tables->linker->cmd_blob;
> >  MachineState *ms = MACHINE(vms);
> >
> >  table_offsets = g_array_new(false, true /* clear */,
> > @@ -854,6 +855,19 @@ void virt_acpi_build(VirtMachineState *vms,
> AcpiBuildTables *tables)
> >  build_rsdp(tables->rsdp, tables->linker, _data);
> >  }
> >
> > +/*
> > + * Align the ACPI blob lengths to PAGE size so that on ACPI table
> > + * regeneration, the length that firmware sees really gets updated
> > + * through 'resize' callback in qemu_ram_resize() in the
> > + * virt_acpi_build_update() -> acpi_ram_update() ->
> qemu_ram_resize()
> > + * path.
> > + */
> > +g_array_set_size(tables_blob,
> > +
> TARGET_PAGE_ALIGN(acpi_data_len(tables_blob)));
> here it would depend on TARGET_PAGE_ALIGN vs HOST_PAGE_ALIGN relation
> so depending on host it could flip it's behavior to opposite.

Ok.

> 
> one thing we could do is dropping (block->used_length == newsize) condition

I tried this before and, strangely, on the reboot path
virt_acpi_build_update() is called with build_state being NULL, so no
acpi_ram_update() happens. I am not sure what causes this behavior when we
drop the above condition.

> another is to use value of block->used_length for s->files->f[index].size.

I just tried this by passing block->used_length to fw_cfg_add_file_callback().
This could work for this case, but I am not sure whether there will be any
corner cases, and there isn't an easy way to access mr->ram_block->used_length
from hw/core/loader.c.

> 
> Michael,
> what's your take in this?

Thanks,
Shameer

> 
> > +g_array_set_size(tables->rsdp,
> > +
> TARGET_PAGE_ALIGN(acpi_data_len(tables->rsdp)));
> > +g_array_set_size(cmd_blob,
> > + TARGET_PAGE_ALIGN(acpi_data_len(cmd_blob)));
> >  /* Cleanup memory that's no longer used. */
> >  g_array_free(table_offsets, true);
> >  }




RE: [PATCH 0/5] ARM virt: Add NVDIMM support

2019-10-22 Thread Shameerali Kolothum Thodi
Hi Eric,

> -Original Message-
> From: Auger Eric [mailto:eric.au...@redhat.com]
> Sent: 18 October 2019 17:40
> To: Shameerali Kolothum Thodi ;
> qemu-devel@nongnu.org; qemu-...@nongnu.org; imamm...@redhat.com
> Cc: peter.mayd...@linaro.org; shannon.zha...@gmail.com; xuwei (O)
> ; ler...@redhat.com; Linuxarm
> 
> Subject: Re: [PATCH 0/5] ARM virt: Add NVDIMM support
> 
> Hi Shameer,
> 
> On 10/4/19 5:52 PM, Shameer Kolothum wrote:
> > This series adds NVDIMM support to arm/virt platform.
> > This has a dependency on [0] and make use of the GED
> > device for NVDIMM hotplug events. The series reuses
> > some of the patches posted by Eric in his earlier
> > attempt here[1].
> >
> > Patch 1/5 is a fix to the Guest reboot issue on NVDIMM
> > hot add case described here[2].
> >
> > I have done basic sanity testing of NVDIMM deviecs with
> devcies
> > both ACPI and DT Guest boot. Further testing is always
> > welcome.
> >
> > Please let me know your feedback.
> 
> I tested it on my side. Looks to work pretty well.

Thanks for giving this a spin.
 
> one question: I noticed that when a NVDIMM slot is hotplugged one get
> the following trace on guest:
> 
> nfit ACPI0012:00: found a zero length table '0' parsing nfit
> pmem0: detected capacity change from 0 to 1073741824
> 
> Have you experienced the 0 length trace?

I double-checked and yes, that trace is there. A quick check on x86 shows
that it is not there.

The reason appears to be that the ARM64 kernel receives an additional 8 bytes
when it evaluates the "_FIT" object.

For the same test scenario, Qemu reports a FIT buffer size of 0xb8, and the
guest kernels report the following:

X86 Guest kernel,
[1.601077] acpi_nfit_init: data 0x8a273dc12b18 sz 0xb8

ARM64 Guest,
[0.933133] acpi_nfit_init: data 0x3cbe6018 sz 0xc0

I am not sure how that size gets changed for ARM, which results in
the zero-length trace mentioned above. I need to debug this further.

Please let me know if you have any pointers...
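
To illustrate why 8 extra bytes would produce that trace, below is a small
standalone model of a walker over the FIT sub-structures. This is a sketch
under assumptions, not the kernel code: the model_* names are hypothetical,
and it only assumes that every sub-structure starts with a 16-bit type
followed by a 16-bit length.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Common header assumed at the start of each FIT sub-structure. */
struct model_nfit_header {
    uint16_t type;
    uint16_t length;
};

/*
 * Walk the sub-structures in buf[0..size). Returns the offset at which a
 * zero-length header is found, or -1 if the walk reaches the end cleanly.
 */
static long model_walk_fit(const uint8_t *buf, size_t size)
{
    size_t off = 0;

    while (off + sizeof(struct model_nfit_header) <= size) {
        struct model_nfit_header hdr;

        memcpy(&hdr, buf + off, sizeof(hdr));
        if (hdr.length == 0) {
            /* This is where a "zero length table" warning would fire. */
            return (long)off;
        }
        off += hdr.length;
    }
    return -1;
}
```

With a buffer whose valid structures add up to 0xb8 bytes, walking 0xb8 bytes
ends cleanly, while walking 0xc0 bytes lands on a zeroed header at offset
0xb8, which would match the type '0' in the warning.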
 
> Besides when we reset the system we find the namespaces again using
> "ndctl list -u" so the original bug seems to be fixed.
> 
> Did you try to mount a DAX FS. I can mount but with DAX forced off.
> sudo mkdir /mnt/mem0

Yes. I did try with DAX FS. But do we need to change the namespace mode to
file system DAX mode?

I used the below command before attempting to mount with -o dax,

ndctl create-namespace -f -e namespace0.0 --mode=fsdax

To do the above you might need ZONE_DEVICE support in the kernel, which in
turn depends on memory hot remove. Hence I tried with the
"arm64/mm: Enable memory hot remove" patches,

https://patchwork.kernel.org/cover/11185169/

> mkfs.xfs -f -m reflink=0 /dev/pmem0
> sudo mount -o dax /dev/pmem0 /mnt/mem0
> [ 2610.051830] XFS (pmem0): DAX enabled. Warning: EXPERIMENTAL, use at
> your own risk
> [ 2610.178580] XFS (pmem0): DAX unsupported by block device. Turning off
> DAX.
> [ 2610.180871] XFS (pmem0): Mounting V5 Filesystem
> [ 2610.189797] XFS (pmem0): Ending clean mount
> 
> I fail to remember if it was the case months ago. I am not sure if it is
> an issue in my guest .config or if there is something not yet supported
> on aarch64? Did you try on your side?
> 
> Also if you forget to put the ",nvdimm" to the machvirt options you get,
> on hotplug:
> {"error": {"class": "GenericError", "desc": "nvdimm is not yet supported"}}
> which is not correct anymore ;-)

Ok. I will check this.

Thanks,
Shameer
 
> Thanks
> 
> Eric
> 
> 
> >
> > Thanks,
> > Shameer
> >
> > [0] https://patchwork.kernel.org/cover/11150345/
> > [1] https://patchwork.kernel.org/cover/10830777/
> > [2] https://patchwork.kernel.org/patch/11154757/
> >
> > Eric Auger (1):
> >   hw/arm/boot: Expose the pmem nodes in the DT
> >
> > Kwangwoo Lee (2):
> >   nvdimm: Use configurable ACPI IO base and size
> >   hw/arm/virt: Add nvdimm hot-plug infrastructure
> >
> > Shameer Kolothum (2):
> >   hw/arm: Align ACPI blob len to PAGE size
> >   hw/arm/virt: Add nvdimm hotplug support
> >
> >  docs/specs/acpi_hw_reduced_hotplug.rst |  1 +
> >  hw/acpi/generic_event_device.c | 13 
> >  hw/acpi/nvdimm.c   | 32 --
> >  hw/arm/Kconfig |  1 +
> >  hw/arm/boot.c  | 45
> ++
> >  hw/arm/virt-acpi-build.c   | 20 
> >  hw/arm/virt.c  | 42
> 
> >  hw/i386/acpi-build.c   |  6 
> >  hw/i386/acpi-build.h   |  3 ++
> >  hw/i386/pc_piix.c  |  2 ++
> >  hw/i386/pc_q35.c   |  2 ++
> >  hw/mem/Kconfig |  2 +-
> >  include/hw/acpi/generic_event_device.h |  1 +
> >  include/hw/arm/virt.h  |  1 +
> >  include/hw/mem/nvdimm.h|  3 ++
> >  15 files changed, 157 insertions(+), 17 deletions(-)
> >


RE: Invalid blob size on NVDIMM hot-add (was: RE: [RFC PATCH 0/4] ARM virt: ACPI memory hotplug support)

2019-09-26 Thread Shameerali Kolothum Thodi


> -Original Message-
> From: Shameerali Kolothum Thodi
> Sent: 24 September 2019 17:39
> To: 'Laszlo Ersek' ; Igor Mammedov
> 
> Cc: Auger Eric ; shannon.zha...@gmail.com;
> peter.mayd...@linaro.org; qemu-devel@nongnu.org; qemu-...@nongnu.org;
> xuwei (O) ; Linuxarm ; Ard
> Biesheuvel ; Leif Lindholm (Linaro address)
> 
> Subject: RE: Invalid blob size on NVDIMM hot-add (was: RE: [RFC PATCH 0/4]
> ARM virt: ACPI memory hotplug support)
 
[...]
 
> 
> > >>>> How about this.
> > >>>>
> > >>>> (1) The firmware looks up the fw_cfg file called "etc/table-loader"
> > >>>> in the fw_cfg file directory (identified by constant selector key
> > >>>> 0x0019, FW_CFG_FILE_DIR).
> > >>>>
> > >>>> (2) The directory entry, once found, tells the firmware two things
> > >>>> simultaneously. The selector key, and the size of the blob.
> > >>>>
> > >>>> (3) The firmware selects the selector key from step (2).
> > >>>>
> > >>>> (4) QEMU regenerates the ACPI payload (as a select callback).
> > >>>>
> > >>>> (5) The firmware reads the number of bytes from the fw_cfg blob
> > >>>> that it learned in step (2).
> > >>>>
> > >>>> Here's the problem. As long as QEMU used to perform step (4) only
> > >>>> for the purpose of refreshing PCI resources in the ACPI payload,
> > >>>> step (4) wouldn't *resize* the blob.
> > >>>>
> > >>>> However, if step (4) enlarges the blob, then the byte count that
> > >>>> step (5) uses -- from step (2) -- for reading, is obsolete.
> > >>
> > >>> I've thought that was a problem with IO based fw_cfg, as reading
> > >>> size/content were separates steps and that it was solved by DMA
> > >>> based fw_cfg file read.
> > >>
> > >> The DMA backend is not relevant for this question, for two reasons:
> > >>
> > >> (a) The question whether the fw_cfg transfer takes places with port
> > >> IO vs. DMA is hidden from the fw_cfg client code; that code goes
> > >> through an abstract library API.
> > >>
> > >> (b) While the DMA method indeed lets the firmware specify the details
> > >> of the transfer with one action, the issue is with the number of
> > >> bytes that the firmware requests (that is, not with *how* the
> > >> firmware requests the transfer). The firmware has to know the size of
> > >> the transfer before it can initiate the transfer (regardless of port
> > >> IO vs. DMA).
> > >>
> > >>
> > >> My question is: assume the firmware item in question is selected, and
> > >> the QEMU-side select callback runs (regenerating the ACPI payload).
> > >> Does this action update the blob size in the fw_cfg file directory as
> > >> well?
> > >
> > > I think it doesn't update the blob size on select callback which is
> > > the root cause of this issue. And the reason looks like,
> > > qemu_ram_resize() function returns without invoking the callback to
> > > update the blob size.
> > >
> > > On boot up, Qemu builds the table and exposes it to guest,
> > >   virt_acpi_setup()
> > > acpi_add_rom_blob()
> > >   rom_add_blob()
> > > rom_set_mr()  --> mr is allocated here and ram_block
> > used_length = HOST_PAGE_ALIGN(blob size);
> > > fw_cfg_add_file_callback()
> > >   fw_cfg_add_bytes_callback() --> This uses the blob size
> > passed into it.
> > >
> > > On select callback path,
> > >
> > > virt_acpi_build_update()
> > >acpi_ram_update()
> > > memory_region_ram_resize()
> > >   qemu_ram_resize() -->. Here the newsize gets aligned to
> HOST_PAGE
> > and callback is only called used_length != newsize.
> > >
> > > https://github.com/qemu/qemu/blob/master/exec.c#L2180
> > >
> > > Debug logs:
> > > Initial boot:
> > > ##QEMU_DEBUG## rom_add_blob: file etc/acpi/tables size 0x64f7
> > > ##QEMU_DEBUG## fw_cfg_add_bytes_callback: key 0x21 len 0x64f7
> > > 
> > > 
> > > ###UEFI InstallQemuFwCfgTables: "etc/table-loader" has FwCfgItem
> > 0x27 size 0xD00
> > > ##QEMU_DEBUG## virt_acpi_build_update:

RE: Invalid blob size on NVDIMM hot-add (was: RE: [RFC PATCH 0/4] ARM virt: ACPI memory hotplug support)

2019-09-24 Thread Shameerali Kolothum Thodi


> -Original Message-
> From: Laszlo Ersek [mailto:ler...@redhat.com]
> Sent: 24 September 2019 16:53
> To: Shameerali Kolothum Thodi ;
> Igor Mammedov 
> Cc: Auger Eric ; shannon.zha...@gmail.com;
> peter.mayd...@linaro.org; qemu-devel@nongnu.org; qemu-...@nongnu.org;
> xuwei (O) ; Linuxarm ; Ard
> Biesheuvel ; Leif Lindholm (Linaro address)
> 
> Subject: Re: Invalid blob size on NVDIMM hot-add (was: RE: [RFC PATCH 0/4]
> ARM virt: ACPI memory hotplug support)
> 
> On 09/20/19 19:04, Shameerali Kolothum Thodi wrote:
> > Hi Laszlo/Igor,
> >
> > I spend some time to debug this further as I was rebasing the nvdimm
> > hot-add support patches on top of the ongoing pc-dimm hot add ones.
> >
> > Just to refresh the memory:
> >
> > https://patchwork.kernel.org/cover/10783589/
> >
> > " It is observed that hot adding nvdimm will results in guest reboot
> > failure. EDK2 fails to build the ACPI tables on reboot. Please find
> > below EDK2 log on Guest reboot after nvdimm hot-add,
> >
> > ProcessCmdAddChecksum: invalid checksum range in "etc/acpi/tables"
> > OnRootBridgesConnected: InstallAcpiTables: Protocol Error
> > "
> >
> > Please find below,
> >
> >> -Original Message-
> >> From: Laszlo Ersek [mailto:ler...@redhat.com]
> >> Sent: 05 March 2019 12:15
> >> To: Igor Mammedov 
> >> Cc: Shameerali Kolothum Thodi ;
> >> Auger Eric ; shannon.zha...@gmail.com;
> >> peter.mayd...@linaro.org; qemu-devel@nongnu.org;
> qemu-...@nongnu.org;
> >> xuwei (O) ; Linuxarm ; Ard
> >> Biesheuvel ; Leif Lindholm (Linaro
> >> address) 
> >> Subject: Re: [RFC PATCH 0/4] ARM virt: ACPI memory hotplug support
> >>
> >> On 03/01/19 18:39, Igor Mammedov wrote:
> >>> On Fri, 1 Mar 2019 14:49:45 +0100
> >>> Laszlo Ersek  wrote:
> >>>
> >>>> On 02/28/19 15:02, Shameerali Kolothum Thodi wrote:
> >>>>
> >>>>> Ah..I missed the fact that, firmware indeed sees an update in the
> >>>>> blob len here (rounded or not) after reboot. So don't think x86
> >>>>> has the same issue and padding is not the right solution as Igor
> >>>>> explained in his reply.
> >>>>>
> >>>>> I will try to debug this further. Any pointers welcome.
> >>>>
> >>>> How about this.
> >>>>
> >>>> (1) The firmware looks up the fw_cfg file called "etc/table-loader"
> >>>> in the fw_cfg file directory (identified by constant selector key
> >>>> 0x0019, FW_CFG_FILE_DIR).
> >>>>
> >>>> (2) The directory entry, once found, tells the firmware two things
> >>>> simultaneously. The selector key, and the size of the blob.
> >>>>
> >>>> (3) The firmware selects the selector key from step (2).
> >>>>
> >>>> (4) QEMU regenerates the ACPI payload (as a select callback).
> >>>>
> >>>> (5) The firmware reads the number of bytes from the fw_cfg blob
> >>>> that it learned in step (2).
> >>>>
> >>>> Here's the problem. As long as QEMU used to perform step (4) only
> >>>> for the purpose of refreshing PCI resources in the ACPI payload,
> >>>> step (4) wouldn't *resize* the blob.
> >>>>
> >>>> However, if step (4) enlarges the blob, then the byte count that
> >>>> step (5) uses -- from step (2) -- for reading, is obsolete.
> >>
> >>> I've thought that was a problem with IO based fw_cfg, as reading
> >>> size/content were separates steps and that it was solved by DMA
> >>> based fw_cfg file read.
> >>
> >> The DMA backend is not relevant for this question, for two reasons:
> >>
> >> (a) The question whether the fw_cfg transfer takes places with port
> >> IO vs. DMA is hidden from the fw_cfg client code; that code goes
> >> through an abstract library API.
> >>
> >> (b) While the DMA method indeed lets the firmware specify the details
> >> of the transfer with one action, the issue is with the number of
> >> bytes that the firmware requests (that is, not with *how* the
> >> firmware requests the transfer). The firmware has to know the size of
> >> the transfer before it can initiate the transfer (regardless of port
> >> IO vs. DMA).
> >>
> >>
> >> My question is: assume the firmwar

Invalid blob size on NVDIMM hot-add (was: RE: [RFC PATCH 0/4] ARM virt: ACPI memory hotplug support)

2019-09-20 Thread Shameerali Kolothum Thodi
Hi Laszlo/Igor,

I spent some time debugging this further while rebasing the nvdimm
hot-add support patches on top of the ongoing pc-dimm hot-add ones.

Just to refresh the memory:

https://patchwork.kernel.org/cover/10783589/

" It is observed that hot adding nvdimm will results in guest reboot
failure. EDK2 fails to build the ACPI tables on reboot. Please find
below EDK2 log on Guest reboot after nvdimm hot-add,

ProcessCmdAddChecksum: invalid checksum range in "etc/acpi/tables"
OnRootBridgesConnected: InstallAcpiTables: Protocol Error
"

Please find below,

> -Original Message-
> From: Laszlo Ersek [mailto:ler...@redhat.com]
> Sent: 05 March 2019 12:15
> To: Igor Mammedov 
> Cc: Shameerali Kolothum Thodi ;
> Auger Eric ; shannon.zha...@gmail.com;
> peter.mayd...@linaro.org; qemu-devel@nongnu.org; qemu-...@nongnu.org;
> xuwei (O) ; Linuxarm ; Ard
> Biesheuvel ; Leif Lindholm (Linaro address)
> 
> Subject: Re: [RFC PATCH 0/4] ARM virt: ACPI memory hotplug support
> 
> On 03/01/19 18:39, Igor Mammedov wrote:
> > On Fri, 1 Mar 2019 14:49:45 +0100
> > Laszlo Ersek  wrote:
> >
> >> On 02/28/19 15:02, Shameerali Kolothum Thodi wrote:
> >>
> >>> Ah..I missed the fact that, firmware indeed sees an update in the blob len
> here
> >>> (rounded or not) after reboot. So don’t think x86 has the same issue and
> padding
> >>> is not the right solution as Igor explained in his reply.
> >>>
> >>> I will try to debug this further. Any pointers welcome.
> >>
> >> How about this.
> >>
> >> (1) The firmware looks up the fw_cfg file called "etc/table-loader" in
> >> the fw_cfg file directory (identified by constant selector key 0x0019,
> >> FW_CFG_FILE_DIR).
> >>
> >> (2) The directory entry, once found, tells the firmware two things
> >> simultaneously. The selector key, and the size of the blob.
> >>
> >> (3) The firmware selects the selector key from step (2).
> >>
> >> (4) QEMU regenerates the ACPI payload (as a select callback).
> >>
> >> (5) The firmware reads the number of bytes from the fw_cfg blob that it
> >> learned in step (2).
> >>
> >> Here's the problem. As long as QEMU used to perform step (4) only for
> >> the purpose of refreshing PCI resources in the ACPI payload, step (4)
> >> wouldn't *resize* the blob.
> >>
> >> However, if step (4) enlarges the blob, then the byte count that step
> >> (5) uses -- from step (2) -- for reading, is obsolete.
> 
> > I've thought that was a problem with IO based fw_cfg, as reading
> size/content
> > were separates steps and that it was solved by DMA based fw_cfg file read.
> 
> The DMA backend is not relevant for this question, for two reasons:
> 
> (a) The question whether the fw_cfg transfer takes places with port IO
> vs. DMA is hidden from the fw_cfg client code; that code goes through an
> abstract library API.
> 
> (b) While the DMA method indeed lets the firmware specify the details of
> the transfer with one action, the issue is with the number of bytes that
> the firmware requests (that is, not with *how* the firmware requests the
> transfer). The firmware has to know the size of the transfer before it
> can initiate the transfer (regardless of port IO vs. DMA).
> 
> 
> My question is: assume the firmware item in question is selected, and
> the QEMU-side select callback runs (regenerating the ACPI payload). Does
> this action update the blob size in the fw_cfg file directory as well?

I think it doesn't update the blob size on the select callback, which is the
root cause of this issue. The reason appears to be that qemu_ram_resize()
returns without invoking the callback that updates the blob size.
 
On boot up, Qemu builds the table and exposes it to guest,
  virt_acpi_setup()
    acpi_add_rom_blob()
      rom_add_blob()
        rom_set_mr()  --> mr is allocated here and ram_block used_length =
                          HOST_PAGE_ALIGN(blob size);
        fw_cfg_add_file_callback()
          fw_cfg_add_bytes_callback() --> This uses the blob size passed into it.

On select callback path,

virt_acpi_build_update()
  acpi_ram_update()
    memory_region_ram_resize()
      qemu_ram_resize() --> Here the newsize gets aligned to HOST_PAGE and the
                            callback is only called if used_length != newsize.

https://github.com/qemu/qemu/blob/master/exec.c#L2180
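
The two paths above can be modelled in a few lines. This is a simplified
sketch, not the actual QEMU code: the model_* names are hypothetical, a 4 KiB
host page size is assumed, and fw_cfg_size stands in for the size that the
firmware reads from the fw_cfg file directory.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 4 KiB host pages; the real value depends on the host. */
#define MODEL_HOST_PAGE_SIZE 0x1000u

/* Round a size up to the next host page boundary. */
static uint64_t model_host_page_align(uint64_t size)
{
    return (size + MODEL_HOST_PAGE_SIZE - 1) &
           ~(uint64_t)(MODEL_HOST_PAGE_SIZE - 1);
}

/* Hypothetical stand-in for the RAM block plus the fw_cfg directory entry. */
struct model_blob {
    uint64_t used_length;   /* page-aligned length kept by the RAM block */
    uint64_t fw_cfg_size;   /* exact size the firmware gets to see */
};

/* Boot path: the RAM block keeps the aligned size, fw_cfg the exact size. */
static void model_add_blob(struct model_blob *b, uint64_t blob_size)
{
    b->used_length = model_host_page_align(blob_size);
    b->fw_cfg_size = blob_size;
}

/*
 * Select-callback path: returns true if the resize callback ran and the
 * fw_cfg size was refreshed, false if the early return was taken.
 */
static bool model_resize_blob(struct model_blob *b, uint64_t new_size)
{
    uint64_t aligned = model_host_page_align(new_size);

    if (b->used_length == aligned) {
        return false;           /* early return: fw_cfg_size stays stale */
    }
    b->used_length = aligned;
    b->fw_cfg_size = new_size;  /* what the resize callback would update */
    return true;
}
```

Starting from the 0x64f7 blob in the logs below, used_length is 0x7000. If the
regenerated tables grow to, say, 0x6a00 (a made-up value), the aligned size is
still 0x7000, the early return is taken and the firmware keeps seeing 0x64f7.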

Debug logs:
Initial boot:
##QEMU_DEBUG## rom_add_blob: file etc/acpi/tables size 0x64f7
##QEMU_DEBUG## fw_cfg_add_bytes_callback: key 0x21 len 0x64f7


###UEFI InstallQemuFwCfgTables: "etc/table-loader" has FwCfgItem 0x27 size 
0xD00
##Q

Re: [Qemu-devel] [PATCH-for-4.2 v10 05/11] hw/arm/virt: Enable device memory cold/hot plug with ACPI boot

2019-09-16 Thread Shameerali Kolothum Thodi
Hi Igor,

> -Original Message-
> From: Igor Mammedov [mailto:imamm...@redhat.com]
> Sent: 11 September 2019 14:07
> To: Shameerali Kolothum Thodi 
> Cc: qemu-devel@nongnu.org; qemu-...@nongnu.org;
> eric.au...@redhat.com; peter.mayd...@linaro.org;
> shannon.zha...@gmail.com; sa...@linux.intel.com;
> sebastien.bo...@intel.com; xuwei (O) ;
> ler...@redhat.com; ard.biesheu...@linaro.org; Linuxarm
> 
> Subject: Re: [PATCH-for-4.2 v10 05/11] hw/arm/virt: Enable device memory
> cold/hot plug with ACPI boot
> 
> On Wed, 4 Sep 2019 09:56:23 +0100
> Shameer Kolothum  wrote:
> 
> [...]
> > @@ -730,6 +733,19 @@ build_dsdt(GArray *table_data, BIOSLinker *linker,
> VirtMachineState *vms)
> >vms->highmem, vms->highmem_ecam);
> >  acpi_dsdt_add_gpio(scope, [VIRT_GPIO],
> > (irqmap[VIRT_GPIO] + ARM_SPI_BASE));
> > +if (vms->acpi_dev) {
> > +build_ged_aml(scope, "\\_SB."GED_DEVICE,
> > +  HOTPLUG_HANDLER(vms->acpi_dev),
> > +  irqmap[VIRT_ACPI_GED] + ARM_SPI_BASE,
> AML_SYSTEM_MEMORY,
> > +  memmap[VIRT_ACPI_GED].base);
> > +}
> > +
> > +if (vms->acpi_dev && ms->ram_slots) {
> > +build_memory_hotplug_aml(scope, ms->ram_slots, "\\_SB",
> NULL,
> > + AML_SYSTEM_MEMORY,
> > +
> memmap[VIRT_PCDIMM_ACPI].base);
> > +}
> one more thing (though non critical), if ms->ram_slots == 0 
> makes IASL spew a warning
> 
> External (_SB_.MHPC.MSCN, MethodObj)// Warning: Unknown
> method, guessing 0 arguments
> 
> In general non-existing references within methods are fine if they aren't ever
> used.
> however we can be a little bit less sloppy.
> Below you advertise "event = ACPI_GED_MEM_HOTPLUG_EVT", and then here
> suddenly
> don't generate essential AML part for it here.

Ok.

> For consistency if above is conditioned on ms->ram_slots != 0, probably
> it would be better to move that condition where you set 'event' value and
> check property value above instead of ms->ram_slots

I understand the concern here, but I am not sure I correctly understand the
suggestion to check the "property" instead of ms->ram_slots.

Is this what you have in mind?

diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
index 538b3bbefa..5c9269dca1 100644
--- a/hw/arm/virt-acpi-build.c
+++ b/hw/arm/virt-acpi-build.c
@@ -742,10 +742,15 @@ build_dsdt(GArray *table_data, BIOSLinker *linker, 
VirtMachineState *vms)
   memmap[VIRT_ACPI_GED].base);
 }
 
-if (vms->acpi_dev && ms->ram_slots) {
-build_memory_hotplug_aml(scope, ms->ram_slots, "\\_SB", NULL,
- AML_SYSTEM_MEMORY,
- memmap[VIRT_PCDIMM_ACPI].base);
+if (vms->acpi_dev) {
+uint32_t event = object_property_get_uint(OBJECT(vms->acpi_dev),
+  "ged-event", NULL);
+
+if (event & ACPI_GED_MEM_HOTPLUG_EVT) {
+build_memory_hotplug_aml(scope, ms->ram_slots, "\\_SB", NULL,
+ AML_SYSTEM_MEMORY,
+ memmap[VIRT_PCDIMM_ACPI].base);
+}
 }
 
 acpi_dsdt_add_power_button(scope);
diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index bc152ea2b0..6b024b16df 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -534,8 +534,13 @@ static void fdt_add_pmu_nodes(const VirtMachineState *vms)
 static inline DeviceState *create_acpi_ged(VirtMachineState *vms, qemu_irq 
*pic)
 {
 DeviceState *dev;
+MachineState *ms = MACHINE(vms);
 int irq = vms->irqmap[VIRT_ACPI_GED];
-uint32_t event = ACPI_GED_MEM_HOTPLUG_EVT;
+uint32_t event = 0;
+
+if (ms->ram_slots) {
+event = ACPI_GED_MEM_HOTPLUG_EVT;
+}
 
 dev = qdev_create(NULL, TYPE_ACPI_GED);
 qdev_prop_set_uint32(dev, "ged-event", event);

---8---

Please let me know.
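
Stripped of the QEMU types, the intent of the diff above is roughly the
following. This is only a sketch: the model_* names are hypothetical and the
0x1 bit value is an assumption standing in for ACPI_GED_MEM_HOTPLUG_EVT.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: a single event bit standing in for the real GED constant. */
#define MODEL_GED_MEM_HOTPLUG_EVT 0x1u

/* Mirrors the create_acpi_ged() change: advertise the event only with slots. */
static uint32_t model_ged_event_mask(uint64_t ram_slots)
{
    return ram_slots ? MODEL_GED_MEM_HOTPLUG_EVT : 0;
}

/* Mirrors the build_dsdt() change: emit hotplug AML only if advertised. */
static bool model_emit_memhp_aml(bool have_acpi_dev, uint32_t event_mask)
{
    return have_acpi_dev && (event_mask & MODEL_GED_MEM_HOTPLUG_EVT) != 0;
}
```

This way the advertised event and the generated AML are driven by the same
value, so one cannot be present without the other.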

Thanks,
Shameer

 
> [...]
> > +static inline DeviceState *create_acpi_ged(VirtMachineState *vms,
> qemu_irq *pic)
> > +{
> > +DeviceState *dev;
> > +int irq = vms->irqmap[VIRT_ACPI_GED];
> > +uint32_t event = ACPI_GED_MEM_HOTPLUG_EVT;
> > +
> > +dev = qdev_create(NULL, TYPE_ACPI_GED);
> > +qdev_prop_set_uint32(dev, "ged-event", event);
> > +qdev_init_nofail(dev);
> > +
> > +sysbus_mmio_map(SYS_BUS_DEVICE(dev), 0,
> vms->memmap[VIRT_ACPI_GED].base);
> > +sysbus_mmio_map(SYS_BUS_DEVICE(dev), 1,
> vms->memmap[VIRT_PCDIMM_ACPI].base);
> > +
> > +sysbus_connect_irq(SYS_BUS_DEVICE(dev), 0, pic[irq]);
> > +
> > +return dev;
> > +}
> > +
> [...]



Re: [Qemu-devel] [PATCH-for-4.2 v10 10/11] tests: add dummy ACPI tables for arm/virt board

2019-09-11 Thread Shameerali Kolothum Thodi



> -Original Message-
> From: Qemu-devel
> [mailto:qemu-devel-bounces+shameerali.kolothum.thodi=huawei.com@nongn
> u.org] On Behalf Of Michael S. Tsirkin
> Sent: 11 September 2019 14:56
> To: Igor Mammedov 
> Cc: Peter Maydell ; Samuel Ortiz
> ; Ard Biesheuvel ;
> QEMU Developers ; Shameerali Kolothum Thodi
> ; Linuxarm
> ; Shannon Zhao ;
> qemu-arm ; xuwei (O) ; Eric
> Auger ; sebastien.bo...@intel.com; Laszlo Ersek
> 
> Subject: Re: [Qemu-devel] [PATCH-for-4.2 v10 10/11] tests: add dummy ACPI
> tables for arm/virt board
> 
> On Wed, Sep 11, 2019 at 03:50:15PM +0200, Igor Mammedov wrote:
> > On Wed, 11 Sep 2019 13:57:06 +0100
> > Peter Maydell  wrote:
> >
> > > On Wed, 4 Sep 2019 at 09:58, Shameer Kolothum
> > >  wrote:
> > > >
> > > > This patch is in preparation for adding numamem and memhp tests
> > > > to arm/virt board so that 'make check' is happy. This may not
> > > > be required once the scripts are run and new tables are
> > > > generated with ".numamem" and ".memhp" extensions.
> > > >
> > > > Signed-off-by: Shameer Kolothum
> 
> > > > ---
> > > > I am not sure this is the right way to do this. But without this, when
> > > > the numamem and memhp tests are added, you will get,
> > > >
> > > > Looking for expected file 'tests/data/acpi/virt/SRAT.numamem'
> > > > Looking for expected file 'tests/data/acpi/virt/SRAT'
> > > > **
> > > > ERROR:tests/bios-tables-test.c:327:load_expected_aml: assertion failed:
> (exp_sdt.aml_file)
> > > >
> > > > ---
> > > >  tests/data/acpi/virt/SLIT | Bin 0 -> 48 bytes
> > > >  tests/data/acpi/virt/SRAT | Bin 0 -> 224 bytes
> > > >  2 files changed, 0 insertions(+), 0 deletions(-)
> > > >  create mode 100644 tests/data/acpi/virt/SLIT
> > > >  create mode 100644 tests/data/acpi/virt/SRAT
> > >
> > > Do the tests pass with this patch and without the
> > > patch that adds the tests? (That is, can we keep the
> > > two patches separate without breaking bisection, or
> > > do we need to squash them together?)
> > >
> > > I'll leave it to somebody who understands the ACPI
> > > tests stuff to answer whether there's a better way to
> > I'd squash this patch into 11/11 test case,
> 
> 
> Pls don't - the way to add this is to add the files in question to
> tests/bios-tables-test-allowed-diff.h.

IIRC, I tried that but it didn't work. I think the reason is that these
are new test cases for arm/virt, and neither the SRAT nor the SLIT table is
present in the tests/data/acpi/virt folder.

As you can see the error is different,

> > > > Looking for expected file 'tests/data/acpi/virt/SRAT.numamem'
> > > > Looking for expected file 'tests/data/acpi/virt/SRAT'
> > > > **
> > > > ERROR:tests/bios-tables-test.c:327:load_expected_aml: assertion failed:

Not sure whether I missed anything, though.

Thanks,
Shameer

> Maintainer will create a separate commit updating
> the binaries and removing them from the whitelist.
> 
> This way things like rebase work seemlessly.
> 
> 
> > CCing Michael (since he's the one who applies ACPI patches)
> >
> > > do this.
> > >
> > > thanks
> > > -- PMM
> > >




Re: [Qemu-devel] [PATCH-for-4.2 v9 06/12] hw/arm/virt: Enable device memory cold/hot plug with ACPI boot

2019-09-02 Thread Shameerali Kolothum Thodi
Hi Eric,

> -Original Message-
> From: Auger Eric [mailto:eric.au...@redhat.com]
> Sent: 01 September 2019 12:23
> To: Shameerali Kolothum Thodi ;
> qemu-devel@nongnu.org; qemu-...@nongnu.org; imamm...@redhat.com
> Cc: peter.mayd...@linaro.org; sa...@linux.intel.com;
> ard.biesheu...@linaro.org; Linuxarm ; xuwei (O)
> ; shannon.zha...@gmail.com;
> sebastien.bo...@intel.com; ler...@redhat.com
> Subject: Re: [Qemu-devel] [PATCH-for-4.2 v9 06/12] hw/arm/virt: Enable device
> memory cold/hot plug with ACPI boot
> 
> Hi Shameer,
> On 9/1/19 1:18 PM, Auger Eric wrote:
> > Hi Shameer,
> >
> > On 8/13/19 11:05 PM, Shameer Kolothum wrote:
> >> This initializes the GED device with base memory and irq, configures
> >> ged memory hotplug event and builds the corresponding aml code. With
> >> this, both hot and cold plug of device memory is enabled now for
> >> Guest with ACPI boot.
> >>
> >> Memory cold plug support with Guest DT boot is not yet supported.
> >
> > I think you should comment about bios-tables-test-allowed-diff.h update.
> > Can't you update the table instead of ignoring the test?
> >
> > Thanks
> >
> > Eric
> >>
> >> Signed-off-by: Shameer Kolothum
> >> 
> >> ---
> >> v8 --> v9
> >>  -Changes related to GED being a TYPE_SYS_BUS_DEVICE now.
> >>  -Error propagation to _plug() handler.
> >>  -Removed R-by by Eric for now.
> >>
> >> v7 --> v8
> >>  -Changed no_acpi_dev to no_ged.
> >>  -Fixed 'dev' reference leak by object_new().
> >>  -Updated bios-tables-test-allowed-diff.h to avoid "make check"
> >>   failure.
> >>
> >> ---
> >>  hw/arm/Kconfig|  2 +
> >>  hw/arm/virt-acpi-build.c  | 16 +++
> >>  hw/arm/virt.c | 62
> ---
> >>  include/hw/arm/virt.h |  4 ++
> >>  tests/bios-tables-test-allowed-diff.h |  1 +
> >>  5 files changed, 78 insertions(+), 7 deletions(-)
> >>
> >> diff --git a/hw/arm/Kconfig b/hw/arm/Kconfig index
> >> 84961c17ab..ad7f7c089b 100644
> >> --- a/hw/arm/Kconfig
> >> +++ b/hw/arm/Kconfig
> >> @@ -22,6 +22,8 @@ config ARM_VIRT
> >>  select ACPI_PCI
> >>  select MEM_DEVICE
> >>  select DIMM
> >> +select ACPI_MEMORY_HOTPLUG
> >> +select ACPI_HW_REDUCED
> >>
> >>  config CHEETAH
> >>  bool
> >> diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
> >> index 0afb372769..63fa845076 100644
> >> --- a/hw/arm/virt-acpi-build.c
> >> +++ b/hw/arm/virt-acpi-build.c
> >> @@ -40,6 +40,8 @@
> >>  #include "hw/acpi/aml-build.h"
> >>  #include "hw/acpi/utils.h"
> >>  #include "hw/acpi/pci.h"
> >> +#include "hw/acpi/memory_hotplug.h"
> >> +#include "hw/acpi/generic_event_device.h"
> >>  #include "hw/pci/pcie_host.h"
> >>  #include "hw/pci/pci.h"
> >>  #include "hw/arm/virt.h"
> >> @@ -705,6 +707,7 @@ static void
> >>  build_dsdt(GArray *table_data, BIOSLinker *linker, VirtMachineState
> >> *vms)  {
> >>  Aml *scope, *dsdt;
> >> +MachineState *ms = MACHINE(vms);
> >>  const MemMapEntry *memmap = vms->memmap;
> >>  const int *irqmap = vms->irqmap;
> >>
> >> @@ -729,6 +732,19 @@ build_dsdt(GArray *table_data, BIOSLinker *linker,
> VirtMachineState *vms)
> >>vms->highmem, vms->highmem_ecam);
> >>  acpi_dsdt_add_gpio(scope, [VIRT_GPIO],
> >> (irqmap[VIRT_GPIO] + ARM_SPI_BASE));
> >> +if (vms->acpi_dev) {
> >> +build_ged_aml(scope, "\\_SB."GED_DEVICE,
> >> +  HOTPLUG_HANDLER(vms->acpi_dev),
> >> +  irqmap[VIRT_ACPI_GED] + ARM_SPI_BASE,
> AML_SYSTEM_MEMORY,
> >> +  memmap[VIRT_ACPI_GED].base);
> >> +}
> >> +
> >> +if (vms->acpi_dev && ms->ram_slots) {
> >> +build_memory_hotplug_aml(scope, ms->ram_slots, "\\_SB",
> NULL,
> >> + AML_SYSTEM_MEMORY,
> >> +
> memmap[VIRT_PCDIMM_ACPI].base);
> >> +}
> >> +
> >>  acpi_dsdt_add_power_button(scope);
> >>
> >>  

Re: [Qemu-devel] [PATCH-for-4.2 v9 06/12] hw/arm/virt: Enable device memory cold/hot plug with ACPI boot

2019-09-02 Thread Shameerali Kolothum Thodi
Hi Eric,

> -Original Message-
> From: Auger Eric [mailto:eric.au...@redhat.com]
> Sent: 01 September 2019 12:19
> To: Shameerali Kolothum Thodi ;
> qemu-devel@nongnu.org; qemu-...@nongnu.org; imamm...@redhat.com
> Cc: peter.mayd...@linaro.org; shannon.zha...@gmail.com;
> sa...@linux.intel.com; sebastien.bo...@intel.com; xuwei (O)
> ; ler...@redhat.com; ard.biesheu...@linaro.org;
> Linuxarm 
> Subject: Re: [PATCH-for-4.2 v9 06/12] hw/arm/virt: Enable device memory
> cold/hot plug with ACPI boot
> 
> Hi Shameer,
> 
> On 8/13/19 11:05 PM, Shameer Kolothum wrote:
> > This initializes the GED device with base memory and irq, configures
> > ged memory hotplug event and builds the corresponding aml code. With
> > this, both hot and cold plug of device memory is enabled now for Guest
> > with ACPI boot.
> >
> > Memory cold plug support with Guest DT boot is not yet supported.
> 
> I think you should comment about bios-tables-test-allowed-diff.h update.

Ok. I will add that.

> Can't you update the table instead of ignoring the test?

I think that is not how it is handled now. The process is,

"Expected table change is then handled like this:
1. add table to diff allowed list
2. change generating code (can be combined with 1)
3. maintainer runs a script to update expected +
   blows away allowed diff list "
https://patchwork.kernel.org/patch/10967339/
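
For reference, a minimal sketch of how a test can consult such an allowed-diff
list. The names allowed_diff_file_names and diff_is_allowed() are assumptions
for illustration, and the list is inlined here instead of being included from
bios-tables-test-allowed-diff.h:

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Stand-in for the contents of the allowed-diff header: a comma-separated
 * list of expected-table files whose mismatch should not fail the test.
 */
static const char *allowed_diff_file_names[] = {
    "tests/data/acpi/virt/DSDT",
    NULL /* sentinel */
};

/* Return true if a mismatch in 'aml_file' is expected and may be ignored. */
static bool diff_is_allowed(const char *aml_file)
{
    size_t i;

    for (i = 0; allowed_diff_file_names[i]; i++) {
        if (strcmp(aml_file, allowed_diff_file_names[i]) == 0) {
            return true;
        }
    }
    return false;
}
```

With this shape, a table that is on the list only produces a warning, and any
table that is not listed still fails the comparison as before.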

Thanks,
Shameer

> 
> Thanks
> 
> Eric
> >
> > Signed-off-by: Shameer Kolothum 
> > ---
> > v8 --> v9
> >  -Changes related to GED being a TYPE_SYS_BUS_DEVICE now.
> >  -Error propagation to _plug() handler.
> >  -Removed R-by by Eric for now.
> >
> > v7 --> v8
> >  -Changed no_acpi_dev to no_ged.
> >  -Fixed 'dev' reference leak by object_new().
> >  -Updated bios-tables-test-allowed-diff.h to avoid "make check"
> >   failure.
> >
> > ---
> >  hw/arm/Kconfig|  2 +
> >  hw/arm/virt-acpi-build.c  | 16 +++
> >  hw/arm/virt.c | 62
> ---
> >  include/hw/arm/virt.h |  4 ++
> >  tests/bios-tables-test-allowed-diff.h |  1 +
> >  5 files changed, 78 insertions(+), 7 deletions(-)
> >
> > diff --git a/hw/arm/Kconfig b/hw/arm/Kconfig index
> > 84961c17ab..ad7f7c089b 100644
> > --- a/hw/arm/Kconfig
> > +++ b/hw/arm/Kconfig
> > @@ -22,6 +22,8 @@ config ARM_VIRT
> >  select ACPI_PCI
> >  select MEM_DEVICE
> >  select DIMM
> > +select ACPI_MEMORY_HOTPLUG
> > +select ACPI_HW_REDUCED
> >
> >  config CHEETAH
> >  bool
> > diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c index
> > 0afb372769..63fa845076 100644
> > --- a/hw/arm/virt-acpi-build.c
> > +++ b/hw/arm/virt-acpi-build.c
> > @@ -40,6 +40,8 @@
> >  #include "hw/acpi/aml-build.h"
> >  #include "hw/acpi/utils.h"
> >  #include "hw/acpi/pci.h"
> > +#include "hw/acpi/memory_hotplug.h"
> > +#include "hw/acpi/generic_event_device.h"
> >  #include "hw/pci/pcie_host.h"
> >  #include "hw/pci/pci.h"
> >  #include "hw/arm/virt.h"
> > @@ -705,6 +707,7 @@ static void
> >  build_dsdt(GArray *table_data, BIOSLinker *linker, VirtMachineState
> > *vms)  {
> >  Aml *scope, *dsdt;
> > +MachineState *ms = MACHINE(vms);
> >  const MemMapEntry *memmap = vms->memmap;
> >  const int *irqmap = vms->irqmap;
> >
> > @@ -729,6 +732,19 @@ build_dsdt(GArray *table_data, BIOSLinker *linker,
> VirtMachineState *vms)
> >vms->highmem, vms->highmem_ecam);
> >  acpi_dsdt_add_gpio(scope, &memmap[VIRT_GPIO],
> > (irqmap[VIRT_GPIO] + ARM_SPI_BASE));
> > +if (vms->acpi_dev) {
> > +build_ged_aml(scope, "\\_SB."GED_DEVICE,
> > +  HOTPLUG_HANDLER(vms->acpi_dev),
> > +  irqmap[VIRT_ACPI_GED] + ARM_SPI_BASE,
> AML_SYSTEM_MEMORY,
> > +  memmap[VIRT_ACPI_GED].base);
> > +}
> > +
> > +if (vms->acpi_dev && ms->ram_slots) {
> > +build_memory_hotplug_aml(scope, ms->ram_slots, "\\_SB",
> NULL,
> > + AML_SYSTEM_MEMORY,
> > +
> memmap[VIRT_PCDIMM_ACPI].base);
> > +}
> > +
> >  acpi_dsdt_add_power_button(scope);
> >
> >  aml_append(dsdt, scope);
> > diff --git a/hw/arm/virt.c b/hw/arm/virt.c i

Re: [Qemu-devel] [PATCH-for-4.2 v9 01/12] hw/acpi: Make ACPI IO address space configurable

2019-08-29 Thread Shameerali Kolothum Thodi



> -Original Message-
> From: Igor Mammedov [mailto:imamm...@redhat.com]
> Sent: 29 August 2019 13:38
> To: Shameerali Kolothum Thodi 
> Cc: qemu-devel@nongnu.org; qemu-...@nongnu.org;
> eric.au...@redhat.com; peter.mayd...@linaro.org; sa...@linux.intel.com;
> ard.biesheu...@linaro.org; Linuxarm ;
> shannon.zha...@gmail.com; sebastien.bo...@intel.com; ler...@redhat.com
> Subject: Re: [PATCH-for-4.2 v9 01/12] hw/acpi: Make ACPI IO address space
> configurable
> 
> On Thu, 29 Aug 2019 11:04:27 +
> Shameerali Kolothum Thodi  wrote:

[...]

> >
> > I think what happens is since we are now passing the memhp_io_base
> directly into the
> > build_memory_hotplug_aml() and removed the "static uint16_t
> memhp_io_base", on
> > x86, memory hotplug aml code is always built by default irrespective of
> whether
> > acpi_memory_hotplug_init() is invoked or not.
> >
> > I could either reintroduce a check in build_memory_hotplug_aml() to make
> sure
> > acpi_memory_hotplug_init() is called, or could do something like below,

> fix looks fine to me, see minor comment below

Ok
 
> > diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
> > index 3995f9a40f..17756c2191 100644
> > --- a/hw/i386/acpi-build.c
> > +++ b/hw/i386/acpi-build.c
> > @@ -1873,9 +1873,12 @@ build_dsdt(GArray *table_data, BIOSLinker
> *linker,
> >  build_cpus_aml(dsdt, machine, opts, pm->cpu_hp_io_base,
> > "\\_SB.PCI0", "\\_GPE._E02");
> >  }
> > -build_memory_hotplug_aml(dsdt, nr_mem, "\\_SB.PCI0",
> > - "\\_GPE._E03", AML_SYSTEM_IO,
> > - pcms->memhp_io_base);
> > +
> > +if (acpi_enabled && pcms->acpi_dev && nr_mem) {
> double-check call path and see if
>   acpi_enabled && pcms->acpi_dev
> is really necessary

Right, looks like those are always true. I will remove those.
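
So the guard would reduce to the slot count alone. A rough, self-contained
sketch of the idea follows; struct dsdt_model and both *_model() helpers are
simplified stand-ins invented for illustration, not the QEMU definitions:

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-in for the DSDT being built; it only records what ran. */
struct dsdt_model {
    bool has_memhp_device;   /* set once the memory hotplug device is emitted */
    uint32_t nr_slots;       /* number of memory slots described in the AML */
};

/* Stand-in for the AML builder: unconditionally emits the hotplug device. */
static void build_memory_hotplug_aml_model(struct dsdt_model *dsdt,
                                           uint32_t nr_mem)
{
    dsdt->has_memhp_device = true;
    dsdt->nr_slots = nr_mem;
}

/* Caller-side guard: emit the memory hotplug AML only when slots exist. */
static void build_dsdt_model(struct dsdt_model *dsdt, uint32_t nr_mem)
{
    if (nr_mem) {
        build_memory_hotplug_aml_model(dsdt, nr_mem);
    }
}
```

With no memory slots configured the DSDT stays as it was, which is what the
expected tables for the default x86 tests assume.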

I would also appreciate it if you could take a look at the rest of the series,
so that I can re-spin it along with this.

Thanks,
Shameer
 



Re: [Qemu-devel] [PATCH-for-4.2 v9 01/12] hw/acpi: Make ACPI IO address space configurable

2019-08-29 Thread Shameerali Kolothum Thodi
Hi Igor,

> -Original Message-
> From: Igor Mammedov [mailto:imamm...@redhat.com]
> Sent: 29 August 2019 09:45
> To: Shameerali Kolothum Thodi 
> Cc: qemu-devel@nongnu.org; qemu-...@nongnu.org;
> eric.au...@redhat.com; peter.mayd...@linaro.org; sa...@linux.intel.com;
> ard.biesheu...@linaro.org; Linuxarm ;
> shannon.zha...@gmail.com; sebastien.bo...@intel.com; ler...@redhat.com
> Subject: Re: [PATCH-for-4.2 v9 01/12] hw/acpi: Make ACPI IO address space
> configurable
> 
> On Thu, 15 Aug 2019 08:42:48 +
> Shameerali Kolothum Thodi  wrote:
> 
> > > -Original Message-
> > > From: Linuxarm [mailto:linuxarm-boun...@huawei.com] On Behalf Of
> Shameer
> > > Kolothum
> > > Sent: 13 August 2019 22:05
> > > To: qemu-devel@nongnu.org; qemu-...@nongnu.org;
> > > eric.au...@redhat.com; imamm...@redhat.com
> > > Cc: peter.mayd...@linaro.org; sa...@linux.intel.com;
> > > ard.biesheu...@linaro.org; Linuxarm ;
> > > shannon.zha...@gmail.com; sebastien.bo...@intel.com;
> ler...@redhat.com
> > > Subject: [PATCH-for-4.2 v9 01/12] hw/acpi: Make ACPI IO address space
> > > configurable
> > >
> > > This is in preparation for adding support for ARM64 platforms
> > > where it doesn't use port mapped IO for ACPI IO space. We are
> > > making changes so that MMIO region can be accommodated
> > > and board can pass the base address into the aml build function.
> >
> > Looks like, this now breaks the "make check" on x86_64 and needs
> > updating bios-tables-test-allowed-diff.h with DSDT entries. But I am
> > not sure what changed now compared to v8(and older ones) that makes
> > it to complain now!.
> 
> you could see diff of what's changed by running test manually with
> V=1 env var if you have 'iasl' installed
> 
> V=1 QTEST_QEMU_BINARY=x86_64-softmmu/qemu-system-x86_64
> tests/bios-tables-test

Thanks for that tip and please find below output.

/x86_64/acpi/piix4: Could not access KVM kernel module: No such file or 
directory
qemu-system-x86_64: failed to initialize KVM: No such file or directory
qemu-system-x86_64: Back to tcg accelerator
acpi-test: Warning! DSDT binary file mismatch. Actual [aml:/tmp/aml-RGE76Z], 
Expected [aml:tests/data/acpi/pc/DSDT].
acpi-test: Warning! DSDT mismatch. Actual [asl:/tmp/asl-TAE76Z.dsl, 
aml:/tmp/aml-RGE76Z], Expected [asl:/tmp/asl-O6B76Z.dsl, 
aml:tests/data/acpi/pc/DSDT].

diff --git a/tmp/asl-O6B76Z.dsl b/tmp/asl-TAE76Z.dsl
index 823ff002ec..4de5bd3221 100644
--- a/tmp/asl-O6B76Z.dsl
+++ b/tmp/asl-TAE76Z.dsl
@@ -5,13 +5,13 @@
  *
  * Disassembling to symbolic ASL+ operators
  *
- * Disassembly of tests/data/acpi/pc/DSDT, Thu Aug 29 10:40:40 2019
+ * Disassembly of /tmp/aml-RGE76Z, Thu Aug 29 10:40:40 2019
  *
  * Original Table Header:
  * Signature"DSDT"
- * Length   0x140B (5131)
+ * Length   0x17E4 (6116)
  * Revision 0x01  32-bit table (V1), no 64-bit math support
- * Checksum 0xB1
+ * Checksum 0x8B
  * OEM ID   "BOCHS "
  * OEM Table ID "BXPCDSDT"
  * OEM Revision 0x0001 (1)
@@ -787,6 +787,206 @@ DefinitionBlock ("", "DSDT", 1, "BOCHS ", "BXPCDSDT", 
0x0001)
 \_SB.CPUS.CSCN ()
 }

+Device (\_SB.PCI0.MHPD)
+{
+Name (_HID, "PNP0A06" /* Generic Container Device */)  // _HID: 
Hardware ID
+Name (_UID, "Memory hotplug resources")  // _UID: Unique ID
+Name (_CRS, Reso 

I think what happens is that, since we now pass memhp_io_base directly into
build_memory_hotplug_aml() and have removed the "static uint16_t memhp_io_base",
the memory hotplug AML code on x86 is always built by default, irrespective of
whether acpi_memory_hotplug_init() is invoked or not.

I could either reintroduce a check in build_memory_hotplug_aml() to make sure
acpi_memory_hotplug_init() is called, or could do something like below, 

diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index 3995f9a40f..17756c2191 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -1873,9 +1873,12 @@ build_dsdt(GArray *table_data, BIOSLinker *linker,
 build_cpus_aml(dsdt, machine, opts, pm->cpu_hp_io_base,
"\\_SB.PCI0", "\\_GPE._E02");
 }
-build_memory_hotplug_aml(dsdt, nr_mem, "\\_SB.PCI0",
- "\\_GPE._E03", AML_SYSTEM_IO,
- pcms->memhp_io_base);
+
+if (acpi_enabled && pcms->acpi_dev && nr_mem) {
+build_memory_hotplug_aml(dsdt, nr_mem, "\\_SB.PCI0",
+ "\\_GPE._E03"

Re: [Qemu-devel] [PATCH-for-4.2 v9 01/12] hw/acpi: Make ACPI IO address space configurable

2019-08-15 Thread Shameerali Kolothum Thodi



> -Original Message-
> From: Linuxarm [mailto:linuxarm-boun...@huawei.com] On Behalf Of Shameer
> Kolothum
> Sent: 13 August 2019 22:05
> To: qemu-devel@nongnu.org; qemu-...@nongnu.org;
> eric.au...@redhat.com; imamm...@redhat.com
> Cc: peter.mayd...@linaro.org; sa...@linux.intel.com;
> ard.biesheu...@linaro.org; Linuxarm ;
> shannon.zha...@gmail.com; sebastien.bo...@intel.com; ler...@redhat.com
> Subject: [PATCH-for-4.2 v9 01/12] hw/acpi: Make ACPI IO address space
> configurable
> 
> This is in preparation for adding support for ARM64 platforms
> where it doesn't use port mapped IO for ACPI IO space. We are
> making changes so that MMIO region can be accommodated
> and board can pass the base address into the aml build function.

Looks like this now breaks "make check" on x86_64 and needs
bios-tables-test-allowed-diff.h to be updated with DSDT entries. But I am
not sure what changed compared to v8 (and older ones) that makes
it complain now.

Patchew URL: 
https://patchew.org/QEMU/20190813210539.31164-1-shameerali.kolothum.th...@huawei.com/

ERROR:/tmp/qemu-test/src/tests/bios-tables-test.c:447:test_acpi_asl: assertion 
failed: (all_tables_match)

Thanks,
Shameer

> Also move few MEMORY_* definitions to header so that other memory
> hotplug event signalling mechanisms (eg. Generic Event Device on
> HW-reduced acpi platforms) can use the same from their respective
> event handler code.
> 
> Signed-off-by: Shameer Kolothum 
> ---
> v8 --> v9
>   -base address is an input into build_memory_hotplug_aml()
>   -Removed R-by tags from Igor and Eric for now.
> ---
>  hw/acpi/memory_hotplug.c | 29 ++---
>  hw/i386/acpi-build.c |  4 +++-
>  hw/i386/pc.c |  3 +++
>  include/hw/acpi/memory_hotplug.h |  9 +++--
>  include/hw/i386/pc.h |  3 +++
>  5 files changed, 30 insertions(+), 18 deletions(-)
> 
> diff --git a/hw/acpi/memory_hotplug.c b/hw/acpi/memory_hotplug.c
> index 297812d5f7..1734d4b44f 100644
> --- a/hw/acpi/memory_hotplug.c
> +++ b/hw/acpi/memory_hotplug.c
> @@ -29,12 +29,7 @@
>  #define MEMORY_SLOT_PROXIMITY_METHOD "MPXM"
>  #define MEMORY_SLOT_EJECT_METHOD "MEJ0"
>  #define MEMORY_SLOT_NOTIFY_METHOD"MTFY"
> -#define MEMORY_SLOT_SCAN_METHOD  "MSCN"
>  #define MEMORY_HOTPLUG_DEVICE"MHPD"
> -#define MEMORY_HOTPLUG_IO_LEN 24
> -#define MEMORY_DEVICES_CONTAINER "\\_SB.MHPC"
> -
> -static uint16_t memhp_io_base;
> 
>  static ACPIOSTInfo *acpi_memory_device_status(int slot, MemStatus *mdev)
>  {
> @@ -209,7 +204,7 @@ static const MemoryRegionOps
> acpi_memory_hotplug_ops = {
>  };
> 
>  void acpi_memory_hotplug_init(MemoryRegion *as, Object *owner,
> -  MemHotplugState *state, uint16_t
> io_base)
> +  MemHotplugState *state, hwaddr
> io_base)
>  {
>  MachineState *machine = MACHINE(qdev_get_machine());
> 
> @@ -218,12 +213,10 @@ void acpi_memory_hotplug_init(MemoryRegion *as,
> Object *owner,
>  return;
>  }
> 
> -assert(!memhp_io_base);
> -memhp_io_base = io_base;
>  state->devs = g_malloc0(sizeof(*state->devs) * state->dev_count);
>  memory_region_init_io(&state->io, owner, &acpi_memory_hotplug_ops,
> state,
>"acpi-mem-hotplug",
> MEMORY_HOTPLUG_IO_LEN);
> -memory_region_add_subregion(as, memhp_io_base, &state->io);
> +memory_region_add_subregion(as, io_base, &state->io);
>  }
> 
>  /**
> @@ -342,7 +335,8 @@ const VMStateDescription vmstate_memory_hotplug
> = {
> 
>  void build_memory_hotplug_aml(Aml *table, uint32_t nr_mem,
>const char *res_root,
> -  const char *event_handler_method)
> +  const char *event_handler_method,
> +  AmlRegionSpace rs, hwaddr
> memhp_io_base)
>  {
>  int i;
>  Aml *ifctx;
> @@ -365,14 +359,19 @@ void build_memory_hotplug_aml(Aml *table,
> uint32_t nr_mem,
>  aml_name_decl("_UID", aml_string("Memory hotplug
> resources")));
> 
>  crs = aml_resource_template();
> -aml_append(crs,
> -aml_io(AML_DECODE16, memhp_io_base, memhp_io_base, 0,
> -   MEMORY_HOTPLUG_IO_LEN)
> -);
> +if (rs == AML_SYSTEM_IO) {
> +aml_append(crs,
> +aml_io(AML_DECODE16, memhp_io_base,
> memhp_io_base, 0,
> +   MEMORY_HOTPLUG_IO_LEN)
> +);
> +} else {
> +aml_append(crs, aml_memory32_fixed(memhp_io_base,
> +MEMORY_HOTPLUG_IO_LEN,
> AML_READ_WRITE));
> +}
>  aml_append(mem_ctrl_dev, aml_name_decl("_CRS", crs));
> 
>  aml_append(mem_ctrl_dev, aml_operation_region(
> -MEMORY_HOTPLUG_IO_REGION, AML_SYSTEM_IO,
> +MEMORY_HOTPLUG_IO_REGION, rs,
>  aml_int(memhp_io_base), MEMORY_HOTPLUG_IO_LEN)
>  );
> 
> diff --git 

Re: [Qemu-devel] [PATCH-for-4.2 v8 7/9] hw/arm/virt-acpi-build: Add PC-DIMM in SRAT

2019-08-09 Thread Shameerali Kolothum Thodi
Hi Igor,

> -Original Message-
> From: Qemu-devel
> [mailto:qemu-devel-bounces+shameerali.kolothum.thodi=huawei.com@nongn
> u.org] On Behalf Of Igor Mammedov
> Sent: 06 August 2019 14:22
> To: Shameerali Kolothum Thodi 
> Cc: peter.mayd...@linaro.org; sa...@linux.intel.com;
> ard.biesheu...@linaro.org; shannon.zha...@gmail.com;
> qemu-devel@nongnu.org; xuwei (O) ; Linuxarm
> ; eric.au...@redhat.com; qemu-...@nongnu.org;
> sebastien.bo...@intel.com; ler...@redhat.com
> Subject: Re: [Qemu-devel] [PATCH-for-4.2 v8 7/9] hw/arm/virt-acpi-build: Add
> PC-DIMM in SRAT
> 
> On Fri, 26 Jul 2019 11:45:17 +0100
> Shameer Kolothum  wrote:
> 
> > Generate Memory Affinity Structures for PC-DIMM ranges.
> >
> > Signed-off-by: Shameer Kolothum 
> > Signed-off-by: Eric Auger 
> > Reviewed-by: Igor Mammedov 
> > ---
> >  hw/arm/virt-acpi-build.c | 9 +
> >  1 file changed, 9 insertions(+)
> >
> > diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
> > index 018b1e326d..75657caa36 100644
> > --- a/hw/arm/virt-acpi-build.c
> > +++ b/hw/arm/virt-acpi-build.c
> > @@ -518,6 +518,7 @@ build_srat(GArray *table_data, BIOSLinker *linker,
> VirtMachineState *vms)
> >  int i, srat_start;
> >  uint64_t mem_base;
> >  MachineClass *mc = MACHINE_GET_CLASS(vms);
> > +MachineState *ms = MACHINE(vms);
> >  const CPUArchIdList *cpu_list =
> mc->possible_cpu_arch_ids(MACHINE(vms));
> >
> >  srat_start = table_data->len;
> > @@ -543,6 +544,14 @@ build_srat(GArray *table_data, BIOSLinker *linker,
> VirtMachineState *vms)
> >  }
> >  }
> >
> > +if (ms->device_memory) {
> > +numamem = acpi_data_push(table_data, sizeof *numamem);
> > +build_srat_memory(numamem, ms->device_memory->base,
> > +
> memory_region_size(&ms->device_memory->mr),
> > +  nb_numa_nodes - 1,
> > +  MEM_AFFINITY_HOTPLUGGABLE |
> MEM_AFFINITY_ENABLED);
> > +}
> > +
> >  build_header(linker, table_data, (void *)(table_data->data +
> srat_start),
> >   "SRAT", table_data->len - srat_start, 3, NULL, NULL);
> >  }
> 
> missing entry in
>   tests/bios-tables-test-allowed-diff.h

I can't find any SRAT file in tests/data/acpi/virt. Arm/virt doesn't have many
tests in bios-tables-test.c. So does it make any difference?

> PS:
> I don't really know what ARM guest kernel expects but on x86 we had to enable
> numa
> for guest to figure out max_possible_pfn
> (see: in linux.git: 8dd330300197 / ec941c5ffede).

From whatever I can find, doesn't look like there is any special handling of
max_possible_pfn in ARM64 world. The variable seems to be only updated
in acpi_numa_memory_affinity_init()

https://elixir.bootlin.com/linux/v5.3-rc3/source/drivers/acpi/numa.c#L298

Is there any way to test this in Guest to see whether this is actually a 
problem?

Thanks,
Shameer

> It's worth to check if we might need a patch for turning on NUMA
> (how to do it in QEMU see: auto_enable_numa_with_memhp)



Re: [Qemu-devel] [PATCH-for-4.2 v8 6/9] hw/arm/virt: Enable device memory cold/hot plug with ACPI boot

2019-08-07 Thread Shameerali Kolothum Thodi



> -Original Message-
> From: Igor Mammedov [mailto:imamm...@redhat.com]
> Sent: 07 August 2019 10:15
> To: Shameerali Kolothum Thodi 
> Cc: qemu-devel@nongnu.org; qemu-...@nongnu.org;
> eric.au...@redhat.com; peter.mayd...@linaro.org; sa...@linux.intel.com;
> ard.biesheu...@linaro.org; Linuxarm ; xuwei (O)
> ; shannon.zha...@gmail.com;
> sebastien.bo...@intel.com; ler...@redhat.com
> Subject: Re: [Qemu-devel] [PATCH-for-4.2 v8 6/9] hw/arm/virt: Enable device
> memory cold/hot plug with ACPI boot
> 
> On Wed, 7 Aug 2019 08:19:16 +
> Shameerali Kolothum Thodi  wrote:
> 
> > Hi Igor,
> >
> > > -Original Message-
> > > From: Igor Mammedov [mailto:imamm...@redhat.com]
> > > Sent: 06 August 2019 14:09
> > > To: Shameerali Kolothum Thodi 
> > > Cc: qemu-devel@nongnu.org; qemu-...@nongnu.org;
> > > eric.au...@redhat.com; peter.mayd...@linaro.org;
> sa...@linux.intel.com;
> > > ard.biesheu...@linaro.org; Linuxarm ; xuwei (O)
> > > ; shannon.zha...@gmail.com;
> > > sebastien.bo...@intel.com; ler...@redhat.com
> > > Subject: Re: [Qemu-devel] [PATCH-for-4.2 v8 6/9] hw/arm/virt: Enable
> device
> > > memory cold/hot plug with ACPI boot
> >
> > [...]
> >
> > > > +static inline DeviceState *create_acpi_ged(VirtMachineState *vms,
> > > qemu_irq *pic)
> > > > +{
> > > > +DeviceState *dev;
> > > > +int irq = vms->irqmap[VIRT_ACPI_GED];
> > > > +uint32_t event = ACPI_GED_MEM_HOTPLUG_EVT;
> > > > +
> > > > +dev = DEVICE(object_new(TYPE_ACPI_GED));
> > > > +qdev_prop_set_uint64(dev, "memhp-base",
> > > > +
> vms->memmap[VIRT_PCDIMM_ACPI].base);
> > > > +qdev_prop_set_uint64(dev, "ged-base",
> > > vms->memmap[VIRT_ACPI_GED].base);
> > > > +qdev_prop_set_uint32(dev, "ged-event", event);
> > > > +object_property_add_child(qdev_get_machine(), "acpi-ged",
> > > > +  OBJECT(dev), NULL);
> > > > +qdev_init_nofail(dev);
> > > > +qdev_connect_gpio_out_named(dev, "ged-irq", 0, pic[irq]);
> > > > +
> > > > +object_unref(OBJECT(dev));
> > > > +
> > > > +return dev;
> > > > +}
> > >
> > > this function will need changes to accommodate for sysbus device
> > > init sequence [3/9].
> >
> > Yes. I think we are proposing to use sysbus_mmio_map() here for "ged-base".
> > But what about "memhp-base"? Is it ok to invoke
> > acpi_memory_hotplug_init(get_system_memory(), ...) from ged device?
> no it's not ok.
> 
> One could expose container memory region as sysbus mmio and then put
> ged-io and AcpiGedState::memhp_state::io within it.
> something like:
> 
> board:
> sysbus_mmio_map(ged, 0 /* io_container number */, ged-base)
> 
> ged_initfn()
> register io_container as sysbus mmio region
> 
> ged_realize()
> memory_region_add_subregion(>io_container, 0,
> _st->io);
> acpi_memory_hotplug_init(>io_container,,
> >acpi_memory_hotplug, AFTER_GED_IO_OFFSET)
> 
> that would make GED's MMIO available to guest at ged-base and memhp IO
> will be available at address after it.
> You can go even further (more flexible) and register ged_st->io as separate
> sysbus mmio and use a container exclusively for memhp, in this case you'd be
> able to map memhp io from board independently from ged-base.

Ok. Understood. Thanks.
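
As a rough illustration of the first layout, here is a toy model of a container
with the GED IO at offset 0 and the memhp IO right after it. GED_IO_LEN and any
base address passed in are assumed values for illustration only
(MEMORY_HOTPLUG_IO_LEN is 24, as in the memory hotplug code), and resolve() is
a hypothetical helper, not a QEMU API:

```c
#include <stddef.h>
#include <stdint.h>

#define GED_IO_LEN            0x4  /* assumed size of the GED IO window */
#define MEMORY_HOTPLUG_IO_LEN 24   /* size of the memhp IO window */

enum region_id {
    REGION_NONE = -1,
    REGION_GED_IO = 0,
    REGION_MEMHP_IO = 1,
};

/* A subregion placed at a fixed offset inside the container. */
struct subregion {
    uint64_t offset;
    uint64_t size;
};

/* GED IO first, memhp IO immediately after it. */
static const struct subregion container_layout[] = {
    [REGION_GED_IO]   = { 0,          GED_IO_LEN },
    [REGION_MEMHP_IO] = { GED_IO_LEN, MEMORY_HOTPLUG_IO_LEN },
};

/* Map a guest address to the subregion that would handle it. */
static int resolve(uint64_t container_base, uint64_t addr)
{
    size_t i;
    uint64_t off;

    if (addr < container_base) {
        return REGION_NONE;
    }
    off = addr - container_base;

    for (i = 0; i < sizeof(container_layout) / sizeof(container_layout[0]); i++) {
        const struct subregion *r = &container_layout[i];

        if (off >= r->offset && off - r->offset < r->size) {
            return (int)i;
        }
    }
    return REGION_NONE;
}
```

The board then only has to map the container at ged-base; the memhp window
follows from the layout instead of needing its own base address.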

But it looks like both approaches would require changes to the
build_memory_hotplug_aml() code, as acpi_memory_hotplug_init() stores the
io_base and the _aml() code reuses it.

I will have a go and see.

Thanks,
Shameer

> 
> > Or go with _set_link() function to pass the address space ?
> >
> > Thanks,
> > Shameer
> >
> >
> > > > +
> > > >  static void create_its(VirtMachineState *vms, DeviceState *gicdev)
> > > >  {
> > > >  const char *itsclass = its_class_name();
> > > > @@ -1483,6 +1508,7 @@ static void machvirt_init(MachineState
> *machine)
> > > >  MemoryRegion *ram = g_new(MemoryRegion, 1);
> > > >  bool firmware_loaded;
> > > >  bool aarch64 = true;
> > > > +bool has_ged = !vmc->no_ged;
> > > >  unsigned int smp_cpus = machine->smp.cpus;
> > > >  unsigned int max_cpus = machine->smp.max_cpus;
> > > >
>

Re: [Qemu-devel] [PATCH-for-4.2 v8 6/9] hw/arm/virt: Enable device memory cold/hot plug with ACPI boot

2019-08-07 Thread Shameerali Kolothum Thodi
Hi Igor,

> -Original Message-
> From: Igor Mammedov [mailto:imamm...@redhat.com]
> Sent: 06 August 2019 14:09
> To: Shameerali Kolothum Thodi 
> Cc: qemu-devel@nongnu.org; qemu-...@nongnu.org;
> eric.au...@redhat.com; peter.mayd...@linaro.org; sa...@linux.intel.com;
> ard.biesheu...@linaro.org; Linuxarm ; xuwei (O)
> ; shannon.zha...@gmail.com;
> sebastien.bo...@intel.com; ler...@redhat.com
> Subject: Re: [Qemu-devel] [PATCH-for-4.2 v8 6/9] hw/arm/virt: Enable device
> memory cold/hot plug with ACPI boot
 
[...]

> > +static inline DeviceState *create_acpi_ged(VirtMachineState *vms,
> qemu_irq *pic)
> > +{
> > +DeviceState *dev;
> > +int irq = vms->irqmap[VIRT_ACPI_GED];
> > +uint32_t event = ACPI_GED_MEM_HOTPLUG_EVT;
> > +
> > +dev = DEVICE(object_new(TYPE_ACPI_GED));
> > +qdev_prop_set_uint64(dev, "memhp-base",
> > + vms->memmap[VIRT_PCDIMM_ACPI].base);
> > +qdev_prop_set_uint64(dev, "ged-base",
> vms->memmap[VIRT_ACPI_GED].base);
> > +qdev_prop_set_uint32(dev, "ged-event", event);
> > +object_property_add_child(qdev_get_machine(), "acpi-ged",
> > +  OBJECT(dev), NULL);
> > +qdev_init_nofail(dev);
> > +qdev_connect_gpio_out_named(dev, "ged-irq", 0, pic[irq]);
> > +
> > +object_unref(OBJECT(dev));
> > +
> > +return dev;
> > +}
> 
> this function will need changes to accommodate for sysbus device
> init sequence [3/9].

Yes. I think we are proposing to use sysbus_mmio_map() here for "ged-base".
But what about "memhp-base"? Is it ok to invoke
acpi_memory_hotplug_init(get_system_memory(), ...) from ged device?

Or go with a _set_link() function to pass the address space?

Thanks,
Shameer

 
> > +
> >  static void create_its(VirtMachineState *vms, DeviceState *gicdev)
> >  {
> >  const char *itsclass = its_class_name();
> > @@ -1483,6 +1508,7 @@ static void machvirt_init(MachineState *machine)
> >  MemoryRegion *ram = g_new(MemoryRegion, 1);
> >  bool firmware_loaded;
> >  bool aarch64 = true;
> > +bool has_ged = !vmc->no_ged;
> >  unsigned int smp_cpus = machine->smp.cpus;
> >  unsigned int max_cpus = machine->smp.max_cpus;
> >
> > @@ -1697,6 +1723,10 @@ static void machvirt_init(MachineState
> *machine)
> >
> >  create_gpio(vms, pic);
> >
> > +if (has_ged && aarch64 && firmware_loaded && acpi_enabled) {
> > +vms->acpi_dev = create_acpi_ged(vms, pic);
> > +}
> > +
> >  /* Create mmio transports, so the user can create virtio backends
> >   * (which will be automatically plugged in to the transports). If
> >   * no backend is created the transport will just sit harmlessly idle.
> > @@ -1876,27 +1906,34 @@ static const CPUArchIdList
> *virt_possible_cpu_arch_ids(MachineState *ms)
> >  static void virt_memory_pre_plug(HotplugHandler *hotplug_dev,
> DeviceState *dev,
> >   Error **errp)
> >  {
> > +VirtMachineState *vms = VIRT_MACHINE(hotplug_dev);
> > +const bool is_nvdimm = object_dynamic_cast(OBJECT(dev),
> TYPE_NVDIMM);
> >
> > -/*
> > - * The device memory is not yet exposed to the Guest either through
> > - * DT or ACPI and hence both cold/hot plug of memory is explicitly
> > - * disabled for now.
> > - */
> > -if (object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM)) {
> > -error_setg(errp, "memory cold/hot plug is not yet supported");
> > +if (is_nvdimm) {
> > +error_setg(errp, "nvdimm is not yet supported");
> >  return;
> >  }
> >
> > +if (!vms->acpi_dev) {
> > +error_setg(errp, "memory hotplug is not enabled: missing acpi
> device");
> > +return;
> > +}
> > +
> > +hotplug_handler_pre_plug(HOTPLUG_HANDLER(vms->acpi_dev), dev,
> errp);
> use local_error and check for error condition here. see pc_memory_pre_plug()
> 
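
A minimal sketch of that pattern, assuming toy stand-ins for the error API
(Error, toy_error_set() and toy_error_propagate() below are illustrative and
are not the QEMU implementations): the first step reports into a local error,
and the function returns early instead of running the next step on failure.

```c
#include <stddef.h>

/* Toy stand-in for an error object; not the QEMU Error type. */
typedef struct Error {
    const char *msg;
} Error;

static Error toy_error_storage;   /* a single slot is enough for this sketch */
static int steps_run;             /* counts how many steps actually executed */

static void toy_error_set(Error **errp, const char *msg)
{
    if (errp && !*errp) {
        toy_error_storage.msg = msg;
        *errp = &toy_error_storage;
    }
}

/* Hand a locally collected error over to the caller, if it wants one. */
static void toy_error_propagate(Error **dst_errp, Error *local_err)
{
    if (local_err && dst_errp && !*dst_errp) {
        *dst_errp = local_err;
    }
}

/* First step: may fail and report through errp. */
static void handler_pre_plug(int fail, Error **errp)
{
    steps_run++;
    if (fail) {
        toy_error_set(errp, "pre_plug handler failed");
    }
}

/* Second step: must not run if the first one failed. */
static void dimm_pre_plug(Error **errp)
{
    (void)errp;
    steps_run++;
}

static void memory_pre_plug(int handler_fails, Error **errp)
{
    Error *local_err = NULL;

    handler_pre_plug(handler_fails, &local_err);
    if (local_err) {
        /* Stop here and pass the error up to the caller. */
        toy_error_propagate(errp, local_err);
        return;
    }

    dimm_pre_plug(errp);
}
```

Passing errp straight to the first call would not allow this check, because the
caller may hand in a NULL errp and the failure would then go unnoticed.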
> > +
> >  pc_dimm_pre_plug(PC_DIMM(dev), MACHINE(hotplug_dev), NULL,
> errp);
> >  }
> >
> >  static void virt_memory_plug(HotplugHandler *hotplug_dev,
> >   DeviceState *dev, Error **errp)
> >  {
> > +HotplugHandlerClass *hhc;
> >  VirtMachineState *vms = VIRT_MACHINE(hotplug_dev);
> >
> >  pc_dimm_plug(PC_DIMM(dev), MACHINE(vms), NULL);
> 

Re: [Qemu-devel] [PATCH-for-4.2 v8 3/9] hw/acpi: Add ACPI Generic Event Device Support

2019-08-01 Thread Shameerali Kolothum Thodi
Hi Igor,

> -Original Message-
> From: Qemu-devel
> [mailto:qemu-devel-bounces+shameerali.kolothum.thodi=huawei.com@nongn
> u.org] On Behalf Of Igor Mammedov
> Sent: 30 July 2019 16:25
> To: Shameerali Kolothum Thodi 
> Cc: peter.mayd...@linaro.org; sa...@linux.intel.com;
> ard.biesheu...@linaro.org; shannon.zha...@gmail.com;
> qemu-devel@nongnu.org; xuwei (O) ; Linuxarm
> ; eric.au...@redhat.com; qemu-...@nongnu.org;
> Paolo Bonzini ; sebastien.bo...@intel.com;
> ler...@redhat.com
> Subject: Re: [Qemu-devel] [PATCH-for-4.2 v8 3/9] hw/acpi: Add ACPI Generic
> Event Device Support
> 
> On Fri, 26 Jul 2019 11:45:13 +0100
> Shameer Kolothum  wrote:
> 
> > From: Samuel Ortiz 
> >
> > The ACPI Generic Event Device (GED) is a hardware-reduced specific
> > device[ACPI v6.1 Section 5.6.9] that handles all platform events,
> > including the hotplug ones. This patch generates the AML code that
> > defines GEDs.
> >
> > Platforms need to specify their own GED Event bitmap to describe
> > what kind of events they want to support through GED.  Also this
> > uses a single interrupt for the GED device, relying on IO
> > memory region to communicate the type of device affected by the
> > interrupt. This way, we can support up to 32 events with a unique
> > interrupt.
> >
> > This supports only memory hotplug for now.
> >
> 
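
To illustrate the single-interrupt scheme described above, a small sketch of a
bitmap event selector. The bit values, struct ged_model and both helpers are
assumptions for illustration; in particular, clearing the selector on read is
assumed here and is not taken from the patch text:

```c
#include <stdint.h>

#define GED_MEM_HOTPLUG_EVT 0x1u  /* assumed bit value, for illustration */
#define GED_PWR_DOWN_EVT    0x2u  /* assumed bit value, for illustration */

struct ged_model {
    uint32_t sel;        /* pending event bitmap, one bit per event type */
    unsigned irq_count;  /* how many times the single interrupt was raised */
};

/* Device side: latch the event bit, then raise the one shared interrupt. */
static void ged_send_event(struct ged_model *g, uint32_t ev)
{
    g->sel |= ev;
    g->irq_count++;
}

/* Guest side: read the selector to learn which events are pending. */
static uint32_t ged_read_sel(struct ged_model *g)
{
    uint32_t pending = g->sel;

    g->sel = 0;  /* assumed read-to-clear behaviour */
    return pending;
}
```

A 32-bit selector gives room for 32 distinct event types behind one interrupt
line, which is where the limit mentioned in the commit message comes from.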
> > diff --git a/hw/acpi/generic_event_device.c
> b/hw/acpi/generic_event_device.c
> > new file mode 100644
> > index 00..7902e9d706
> > --- /dev/null
> > +++ b/hw/acpi/generic_event_device.c
> [...]
> > +void build_ged_aml(Aml *table, const char *name, HotplugHandler
> *hotplug_dev,
> > +   uint32_t ged_irq, AmlRegionSpace rs)
> > +{
> [...]
> > +
> > +if (ged_events) {
> > +error_report("GED: Unsupported events specified");
> > +exit(1);
> I'd use error_abort instead, since it's a programming error, if you have to
> respin the series.

Ok.

> > +}
> > +}
> > +
> > +/* Append _EVT method */
> > +aml_append(dev, evt);
> > +
> > +aml_append(table, dev);
> > +}
> > +
> [...]
> > +static void acpi_ged_device_realize(DeviceState *dev, Error **errp)
> > +{
> > +AcpiGedState *s = ACPI_GED(dev);
> > +
> > +assert(s->ged_base);
> > +acpi_ged_init(get_system_memory(), dev, &s->ged_state);
> 
> calling get_system_memory() from device code used to be a reason for
> rejecting patch,
> I'm not sure what to suggest though.
> 
> Maybe Paolo could suggest something.

How about using object_property_set_link()? Something like below.

---8<---
diff --git a/hw/acpi/generic_event_device.c b/hw/acpi/generic_event_device.c
index f00b0ab14b..eb1ed38f4a 100644
--- a/hw/acpi/generic_event_device.c
+++ b/hw/acpi/generic_event_device.c
@@ -229,11 +229,12 @@ static void acpi_ged_device_realize(DeviceState *dev, 
Error **errp)
 AcpiGedState *s = ACPI_GED(dev);
 
 assert(s->ged_base);
-acpi_ged_init(get_system_memory(), dev, &s->ged_state);
+assert(s->sys_mem);
+acpi_ged_init(s->sys_mem, dev, &s->ged_state);
 
 if (s->memhp_state.is_enabled) {
 assert(s->memhp_base);
-acpi_memory_hotplug_init(get_system_memory(), OBJECT(dev),
+acpi_memory_hotplug_init(s->sys_mem, OBJECT(dev),
  &s->memhp_state,
  s->memhp_base);
 }
@@ -245,6 +246,8 @@ static Property acpi_ged_properties[] = {
  * because GED handles memory hotplug event and acpi-mem-hotplug
  * memory region gets initialized when GED device is realized.
  */
+DEFINE_PROP_LINK("memory", AcpiGedState, sys_mem, TYPE_MEMORY_REGION,
+ MemoryRegion *),
 DEFINE_PROP_UINT64("memhp-base", AcpiGedState, memhp_base, 0),
 DEFINE_PROP_BOOL("memory-hotplug-support", AcpiGedState,
  memhp_state.is_enabled, true),
diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 73a758d9a9..0cbaf6c6e1 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -529,8 +529,12 @@ static inline DeviceState 
*create_acpi_ged(VirtMachineState *vms, qemu_irq *pic)
 DeviceState *dev;
 int irq = vms->irqmap[VIRT_ACPI_GED];
 uint32_t event = ACPI_GED_MEM_HOTPLUG_EVT | ACPI_GED_PWR_DOWN_EVT;
+MemoryRegion *sys_mem = get_system_memory();
 
 dev = DEVICE(object_new(TYPE_ACPI_GED));
+
+object_property_set_link(OBJECT(dev), OBJECT(sys_mem),
+ "memory", &error_abort);
 qdev_prop_set_uint64(dev, "memhp-base",
  vms->memmap[VIRT_PCDIMM_ACPI].base);

Re: [Qemu-devel] [PATCH-for-4.2 v7 10/10] tests: Update DSDT ACPI table for arm/virt board with PCDIMM related changes

2019-07-22 Thread Shameerali Kolothum Thodi
Hi Igor,

> -Original Message-
> From: Qemu-devel
> [mailto:qemu-devel-bounces+shameerali.kolothum.thodi=huawei.com@nongn
> u.org] On Behalf Of Igor Mammedov
> Sent: 18 July 2019 14:13
> To: Shameerali Kolothum Thodi 
> Cc: peter.mayd...@linaro.org; sa...@linux.intel.com;
> shannon.zha...@gmail.com; ard.biesheu...@linaro.org;
> qemu-devel@nongnu.org; xuwei (O) ; Linuxarm
> ; eric.au...@redhat.com; qemu-...@nongnu.org;
> sebastien.bo...@intel.com; ler...@redhat.com
> Subject: Re: [Qemu-devel] [PATCH-for-4.2 v7 10/10] tests: Update DSDT ACPI
> table for arm/virt board with PCDIMM related changes
> 
> On Tue, 16 Jul 2019 16:38:16 +0100
> Shameer Kolothum  wrote:
> 
> > From: Eric Auger 
> >
> > PCDIMM hotplug addition updated the DSDT. Update the reference table.
> 
> it's not correct process. series should be merged through Michael's pci branch
> and see
> commit ab50f22309a17c772c51931940596e707c200739 (mst/pci)
> Author: Michael S. Tsirkin 
> Date:   Tue May 21 17:38:47 2019 -0400
> 
> bios-tables-test: add diff allowed list
> 
> how to request table update.

Ok. Just to confirm, this means I can probably add the below diff to patch #6
and remove this patch (10/10) from the series.

diff --git a/tests/bios-tables-test-allowed-diff.h 
b/tests/bios-tables-test-allowed-diff.h
index dfb8523c8b..7b4adbc822 100644
--- a/tests/bios-tables-test-allowed-diff.h
+++ b/tests/bios-tables-test-allowed-diff.h
@@ -1 +1,2 @@
 /* List of comma-separated changed AML files to ignore */
+"tests/data/acpi/virt/DSDT",

> 
> Another thing:
> bios-tables-test has test_acpi_tcg_dimm_pxm() test case,
> pls make use of it to test arm/virt variant

I had a go at this, but found an issue with it.

This is what I added in order to run the dimm_pxm test.

- - 8- -

diff --git a/tests/bios-tables-test.c b/tests/bios-tables-test.c
index a356ac3489..79af4f4874 100644
--- a/tests/bios-tables-test.c
+++ b/tests/bios-tables-test.c
@@ -871,6 +871,36 @@ static void test_acpi_piix4_tcg_dimm_pxm(void)
 test_acpi_tcg_dimm_pxm(MACHINE_PC);
 }
 
+static void test_acpi_virt_tcg_dimm_pxm(void)
+{
+test_data data = {
+.machine = "virt",
+.accel = "tcg",
+.uefi_fl1 = "pc-bios/edk2-aarch64-code.fd",
+.uefi_fl2 = "pc-bios/edk2-arm-vars.fd",
+.cd = "tests/data/uefi-boot-images/bios-tables-test.aarch64.iso.qcow2",
+.ram_start = 0x40000000ULL,
+.scan_len = 128ULL * 1024 * 1024,
+};
+
+data.variant = ".dimmpxm";
+test_acpi_one(" -cpu cortex-a57"
+  " -smp 4"
+  " -m 512M,slots=3,maxmem=2G"
+  " -object memory-backend-ram,id=ram0,size=128M"
+  " -object memory-backend-ram,id=ram1,size=128M"
+  " -object memory-backend-ram,id=ram2,size=128M"
+  " -object memory-backend-ram,id=ram3,size=128M"
+  " -numa node,memdev=ram0,nodeid=0"
+  " -numa node,memdev=ram1,nodeid=1"
+  " -numa node,memdev=ram2,nodeid=2"
+  " -numa node,memdev=ram3,nodeid=3"
+  " -object memory-backend-ram,id=ram4,size=1G"
+  " -device pc-dimm,id=dimm0,memdev=ram4,node=0",
+  &data);
+free_test_data(&data);
+}
+
 static void test_acpi_virt_tcg(void)
 {
 test_data data = {
@@ -917,6 +947,7 @@ int main(int argc, char *argv[])
 qtest_add_func("acpi/q35/dimmpxm", test_acpi_q35_tcg_dimm_pxm);
 } else if (strcmp(arch, "aarch64") == 0) {
 qtest_add_func("acpi/virt", test_acpi_virt_tcg);
+qtest_add_func("acpi/virt/dimmpxm", test_acpi_virt_tcg_dimm_pxm);
 }
 ret = g_test_run();
 boot_sector_cleanup(disk);

- - 8- -

I then used the script to generate the ACPI tables, and "make check" runs fine.

But when I changed the memory configuration to,

test_acpi_one(" -cpu cortex-a57"
" -smp 4"
" -m 256M,slots=3,maxmem=2G"
" -object memory-backend-ram,id=ram0,size=64M"
" -object memory-backend-ram,id=ram1,size=64M"
" -object memory-backend-ram,id=ram2,size=64M"
" -object memory-backend-ram,id=ram3,size=64M"
" -numa node,memdev=ram0,nodeid=0"
" -numa node,memdev=ram1,nodeid=1"
" -numa node,memdev=ram2,nodeid=2"
" -numa node,memdev=ram3,nodeid=3"
" -object memory-backend-ram,id=ram4,size=1G"
" -device pc-dimm,id=dimm0,memdev=ram4,node=0",
&data);


Re: [Qemu-devel] [PATCH-for-4.2 v7 09/10] hw/arm: Use GED for system_powerdown event

2019-07-22 Thread Shameerali Kolothum Thodi



> -Original Message-
> From: Qemu-devel
> [mailto:qemu-devel-bounces+shameerali.kolothum.thodi=huawei.com@nongn
> u.org] On Behalf Of Igor Mammedov
> Sent: 18 July 2019 14:03
> To: Shameerali Kolothum Thodi 
> Cc: peter.mayd...@linaro.org; sa...@linux.intel.com;
> shannon.zha...@gmail.com; ard.biesheu...@linaro.org;
> qemu-devel@nongnu.org; xuwei (O) ; Linuxarm
> ; eric.au...@redhat.com; qemu-...@nongnu.org;
> sebastien.bo...@intel.com; ler...@redhat.com
> Subject: Re: [Qemu-devel] [PATCH-for-4.2 v7 09/10] hw/arm: Use GED for
> system_powerdown event
> 
> On Tue, 16 Jul 2019 16:38:15 +0100
> Shameer Kolothum  wrote:
> 
> > Use GED for system_powerdown event instead of GPIO for ACPI.
> > Guest boot with DT still uses GPIO.
> 
> 
> I'd hate to keep ACPI GPIO around but taking in account migration
> wouldn't this patch break ACPI GPIO based button on 4.0 and older where
> GED is not available and guest was booted as ACPI one and then rebooted on
> new QEMU?

Hmm..That looks like a valid case unfortunately :(. I will keep the GPIO then.
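
A minimal sketch of what keeping both paths might look like when the
powerdown request is dispatched. This assumes the machine knows whether a
GED was created; the SketchMachine type, its fields and
sketch_powerdown_req() are hypothetical stand-ins, not the real virt
machine code.

```c
#include <stdbool.h>

/* Hypothetical, simplified machine state (not the real VirtMachineState). */
typedef struct {
    bool has_acpi_ged;      /* true when a GED device was created */
    int ged_powerdown_evts; /* events routed through the GED */
    int gpio_pulses;        /* events routed through the GPIO power button */
} SketchMachine;

/* Route a powerdown request to whichever device the guest was told about. */
static void sketch_powerdown_req(SketchMachine *m)
{
    if (m->has_acpi_ged) {
        m->ged_powerdown_evts++; /* stands in for sending a GED event */
    } else {
        m->gpio_pulses++;        /* stands in for pulsing the GPIO line */
    }
}
```

With this shape, a guest booted on an older machine type without a GED
keeps seeing the GPIO button it was originally given.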

Thanks,
Shameer
 
> 
> > Signed-off-by: Shameer Kolothum 
> > Reviewed-by: Eric Auger 
> > ---
> >  hw/arm/virt-acpi-build.c | 37 +
> >  hw/arm/virt.c|  6 +++---
> >  2 files changed, 4 insertions(+), 39 deletions(-)
> >
> > diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
> > index 75657caa36..9178ca8e40 100644
> > --- a/hw/arm/virt-acpi-build.c
> > +++ b/hw/arm/virt-acpi-build.c
> > @@ -49,7 +49,6 @@
> >  #include "kvm_arm.h"
> >
> >  #define ARM_SPI_BASE 32
> > -#define ACPI_POWER_BUTTON_DEVICE "PWRB"
> >
> >  static void acpi_dsdt_add_cpus(Aml *scope, int smp_cpus)
> >  {
> > @@ -328,37 +327,6 @@ static void acpi_dsdt_add_pci(Aml *scope, const
> MemMapEntry *memmap,
> >  aml_append(scope, dev);
> >  }
> >
> > -static void acpi_dsdt_add_gpio(Aml *scope, const MemMapEntry
> *gpio_memmap,
> > -   uint32_t gpio_irq)
> > -{
> > -Aml *dev = aml_device("GPO0");
> > -aml_append(dev, aml_name_decl("_HID", aml_string("ARMH0061")));
> > -aml_append(dev, aml_name_decl("_ADR", aml_int(0)));
> > -aml_append(dev, aml_name_decl("_UID", aml_int(0)));
> > -
> > -Aml *crs = aml_resource_template();
> > -aml_append(crs, aml_memory32_fixed(gpio_memmap->base,
> gpio_memmap->size,
> > -   AML_READ_WRITE));
> > -aml_append(crs, aml_interrupt(AML_CONSUMER, AML_LEVEL,
> AML_ACTIVE_HIGH,
> > -  AML_EXCLUSIVE, &gpio_irq, 1));
> > -aml_append(dev, aml_name_decl("_CRS", crs));
> > -
> > -Aml *aei = aml_resource_template();
> > -/* Pin 3 for power button */
> > -const uint32_t pin_list[1] = {3};
> > -aml_append(aei, aml_gpio_int(AML_CONSUMER, AML_EDGE,
> AML_ACTIVE_HIGH,
> > - AML_EXCLUSIVE, AML_PULL_UP, 0,
> pin_list, 1,
> > - "GPO0", NULL, 0));
> > -aml_append(dev, aml_name_decl("_AEI", aei));
> > -
> > -/* _E03 is handle for power button */
> > -Aml *method = aml_method("_E03", 0, AML_NOTSERIALIZED);
> > -aml_append(method,
> aml_notify(aml_name(ACPI_POWER_BUTTON_DEVICE),
> > -  aml_int(0x80)));
> > -aml_append(dev, method);
> > -aml_append(scope, dev);
> > -}
> > -
> >  static void acpi_dsdt_add_power_button(Aml *scope)
> >  {
> >  Aml *dev = aml_device(ACPI_POWER_BUTTON_DEVICE);
> > @@ -739,9 +707,8 @@ build_dsdt(GArray *table_data, BIOSLinker *linker,
> VirtMachineState *vms)
> >  (irqmap[VIRT_MMIO] + ARM_SPI_BASE),
> NUM_VIRTIO_TRANSPORTS);
> >  acpi_dsdt_add_pci(scope, memmap, (irqmap[VIRT_PCIE] +
> ARM_SPI_BASE),
> >vms->highmem, vms->highmem_ecam);
> > -acpi_dsdt_add_gpio(scope, &memmap[VIRT_GPIO],
> > -   (irqmap[VIRT_GPIO] + ARM_SPI_BASE));
> >  if (vms->acpi_dev) {
> > +acpi_dsdt_add_power_button(scope);
> >  build_ged_aml(scope, "\\_SB."GED_DEVICE,
> >HOTPLUG_HANDLER(vms->acpi_dev),
> >irqmap[VIRT_ACPI_GED] + ARM_SPI_BASE,
> AML_SYSTEM_MEMORY);
> > @@ -752,8 +719,6 @@ build_dsdt(GArray *table_data, BIOSLinker *linker,
> VirtMachineS

Re: [Qemu-devel] [PATCH-for-4.2 v7 03/10] hw/acpi: Add ACPI Generic Event Device Support

2019-07-22 Thread Shameerali Kolothum Thodi



> -Original Message-
> From: Igor Mammedov [mailto:imamm...@redhat.com]
> Sent: 18 July 2019 13:31
> To: Shameerali Kolothum Thodi 
> Cc: peter.mayd...@linaro.org; sa...@linux.intel.com;
> shannon.zha...@gmail.com; ard.biesheu...@linaro.org;
> qemu-devel@nongnu.org; xuwei (O) ; Linuxarm
> ; eric.au...@redhat.com; qemu-...@nongnu.org;
> sebastien.bo...@intel.com; ler...@redhat.com
> Subject: Re: [Qemu-devel] [PATCH-for-4.2 v7 03/10] hw/acpi: Add ACPI Generic
> Event Device Support
> 
> On Thu, 18 Jul 2019 10:52:10 +
> Shameerali Kolothum Thodi  wrote:
> 
> > Hi Igor,
> >
> > > -Original Message-
> > > From: Qemu-devel
> > >
> [mailto:qemu-devel-bounces+shameerali.kolothum.thodi=huawei.com@nongn
> > > u.org] On Behalf Of Igor Mammedov
> > > Sent: 17 July 2019 15:33
> > > To: Shameerali Kolothum Thodi 
> > > Cc: peter.mayd...@linaro.org; sa...@linux.intel.com;
> > > shannon.zha...@gmail.com; ard.biesheu...@linaro.org;
> > > qemu-devel@nongnu.org; xuwei (O) ; Linuxarm
> > > ; eric.au...@redhat.com;
> qemu-...@nongnu.org;
> > > sebastien.bo...@intel.com; ler...@redhat.com
> > > Subject: Re: [Qemu-devel] [PATCH-for-4.2 v7 03/10] hw/acpi: Add ACPI
> Generic
> > > Event Device Support
> > >
> > > On Tue, 16 Jul 2019 16:38:09 +0100
> > > Shameer Kolothum  wrote:
> >
> > [...]
> >
> > > > +static void acpi_ged_event(AcpiGedState *s, uint32_t sel)
> > > > +{
> > > > +GEDState *ged_st = &s->ged_state;
> > > > +/*
> > > > + * Set the GED IRQ selector to the expected device type value. This
> > > > + * way, the ACPI method will be able to trigger the right code 
> > > > based
> > > > + * on a unique IRQ.
> > > comment isn't correct anymore, pls fix it
> >
> > True.
> >
> > >
> > > > + */
> > > > +qemu_mutex_lock(&ged_st->lock);
> > > Is this lock really necessary?
> > > (I thought that MMIO and monitor access is guarded by BQL)
> >
> > Hmm..I am not sure. This is to synchronize with the ged_st->sel update 
> > inside
> > ged_read(). And also acpi_ged_event() gets called through
> _power_down_notifier()
> > as well. BQL guard is in place for all the paths here?
> power down command originates from HMP or QMP monitor, so you don't
> really
> need a lock here.

Ok. I will get rid of it then.
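
A minimal sketch of the selector handling once the mutex is dropped,
assuming (as suggested above) that both the event path and the guest's MMIO
read already run under the BQL, and that a read returns and clears the
pending bits. The SketchGed type, the function names and the bit values are
hypothetical and only mirror the shape of the patch.

```c
#include <stdint.h>

/* Event bits; the values are illustrative, not taken from the patch. */
#define SKETCH_GED_MEM_HOTPLUG_EVT 0x1u
#define SKETCH_GED_PWR_DOWN_EVT    0x2u

typedef struct {
    uint32_t sel;       /* pending-event selector read by the guest */
    unsigned irq_count; /* stands in for qemu_irq_pulse() */
} SketchGed;

/* Producer side: record the event, then raise the single GED interrupt. */
static void sketch_ged_event(SketchGed *g, uint32_t sel)
{
    g->sel |= sel;
    g->irq_count++;
}

/* Consumer side: the guest's read returns and clears the pending bits. */
static uint32_t sketch_ged_read(SketchGed *g)
{
    uint32_t val = g->sel;

    g->sel = 0;
    return val;
}
```

Because events are OR-ed into the selector, two events raised before the
guest reads the register are both reported by that one read.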

> >
> > >
> > > > +ged_st->sel |= sel;
> > > > +qemu_mutex_unlock(&ged_st->lock);
> > > > +
> > > > +/* Trigger the event by sending an interrupt to the guest. */
> > > > +qemu_irq_pulse(s->irq);
> > > > +}
> > > > +
> > > > +static void acpi_ged_init(MemoryRegion *as, DeviceState *dev,
> GEDState
> > > *ged_st)
> > > > +{
> > > > +AcpiGedState *s = ACPI_GED(dev);
> > > > +
> > > > +assert(s->ged_base);
> > > > +
> > > > +qemu_mutex_init(&ged_st->lock);
> > > > +memory_region_init_io(&ged_st->io, OBJECT(dev), &ged_ops,
> ged_st,
> > > > +  TYPE_ACPI_GED, ACPI_GED_REG_LEN);
> > > > +memory_region_add_subregion(as, s->ged_base, &ged_st->io);
> > > > +qdev_init_gpio_out_named(DEVICE(s), &s->irq, "ged-irq", 1);
> > > > +}
> > > > +
> > > > +static void acpi_ged_device_plug_cb(HotplugHandler *hotplug_dev,
> > > > +DeviceState *dev, Error
> **errp)
> > > > +{
> > > > +AcpiGedState *s = ACPI_GED(hotplug_dev);
> > > > +
> > > > +if (object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM)) {
> > > > +if (s->memhp_state.is_enabled) {
> > > > +acpi_memory_plug_cb(hotplug_dev, &s->memhp_state,
> dev,
> > > errp);
> > > > +} else {
> > > > +error_setg(errp,
> > > > + "memory hotplug is not
> > > enabled: %s.memory-hotplug-support "
> > > > + "is not set", object_get_typename(OBJECT(s)));
> > > > +}
> > > > +} else {
> > > > +error_setg(errp, "virt: device plug request for unsupported
> > > device"
> > > > +   " type: %s",
> object_get_typename(OBJECT(dev)));
> > > > +}
> > > > +}

Re: [Qemu-devel] [PATCH-for-4.2 v7 03/10] hw/acpi: Add ACPI Generic Event Device Support

2019-07-18 Thread Shameerali Kolothum Thodi
Hi Igor,

> -Original Message-
> From: Qemu-devel
> [mailto:qemu-devel-bounces+shameerali.kolothum.thodi=huawei.com@nongn
> u.org] On Behalf Of Igor Mammedov
> Sent: 17 July 2019 15:33
> To: Shameerali Kolothum Thodi 
> Cc: peter.mayd...@linaro.org; sa...@linux.intel.com;
> shannon.zha...@gmail.com; ard.biesheu...@linaro.org;
> qemu-devel@nongnu.org; xuwei (O) ; Linuxarm
> ; eric.au...@redhat.com; qemu-...@nongnu.org;
> sebastien.bo...@intel.com; ler...@redhat.com
> Subject: Re: [Qemu-devel] [PATCH-for-4.2 v7 03/10] hw/acpi: Add ACPI Generic
> Event Device Support
> 
> On Tue, 16 Jul 2019 16:38:09 +0100
> Shameer Kolothum  wrote:

[...]

> > +static void acpi_ged_event(AcpiGedState *s, uint32_t sel)
> > +{
> > +GEDState *ged_st = &s->ged_state;
> > +/*
> > + * Set the GED IRQ selector to the expected device type value. This
> > + * way, the ACPI method will be able to trigger the right code based
> > + * on a unique IRQ.
> comment isn't correct anymore, pls fix it

True.

> 
> > + */
> > +qemu_mutex_lock(&ged_st->lock);
> Is this lock really necessary?
> (I thought that MMIO and monitor access is guarded by BQL)

Hmm, I am not sure. This is to synchronize with the ged_st->sel update inside
ged_read(). Also, acpi_ged_event() gets called through _power_down_notifier()
as well. Is the BQL guard in place for all the paths here?

> 
> > +ged_st->sel |= sel;
> > +qemu_mutex_unlock(&ged_st->lock);
> > +
> > +/* Trigger the event by sending an interrupt to the guest. */
> > +qemu_irq_pulse(s->irq);
> > +}
> > +
> > +static void acpi_ged_init(MemoryRegion *as, DeviceState *dev, GEDState
> *ged_st)
> > +{
> > +AcpiGedState *s = ACPI_GED(dev);
> > +
> > +assert(s->ged_base);
> > +
> > +qemu_mutex_init(&ged_st->lock);
> > +memory_region_init_io(&ged_st->io, OBJECT(dev), &ged_ops, ged_st,
> > +  TYPE_ACPI_GED, ACPI_GED_REG_LEN);
> > +memory_region_add_subregion(as, s->ged_base, &ged_st->io);
> > +qdev_init_gpio_out_named(DEVICE(s), &s->irq, "ged-irq", 1);
> > +}
> > +
> > +static void acpi_ged_device_plug_cb(HotplugHandler *hotplug_dev,
> > +DeviceState *dev, Error **errp)
> > +{
> > +AcpiGedState *s = ACPI_GED(hotplug_dev);
> > +
> > +if (object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM)) {
> > +if (s->memhp_state.is_enabled) {
> > +acpi_memory_plug_cb(hotplug_dev, &s->memhp_state, dev,
> errp);
> > +} else {
> > +error_setg(errp,
> > + "memory hotplug is not
> enabled: %s.memory-hotplug-support "
> > + "is not set", object_get_typename(OBJECT(s)));
> > +}
> > +} else {
> > +error_setg(errp, "virt: device plug request for unsupported
> device"
> > +   " type: %s", object_get_typename(OBJECT(dev)));
> > +}
> > +}
> > +
> > +static void acpi_ged_send_event(AcpiDeviceIf *adev, AcpiEventStatusBits
> ev)
> > +{
> > +AcpiGedState *s = ACPI_GED(adev);
> > +uint32_t sel;
> > +
> > +if (ev & ACPI_MEMORY_HOTPLUG_STATUS) {
> > +sel = ACPI_GED_MEM_HOTPLUG_EVT;
> > +} else {
> > +/* Unknown event. Return without generating interrupt. */
> > +warn_report("GED: Unsupported event %d. No irq injected", ev);
> > +return;
> > +}
> > +
> > +/*
> > + * We inject the hotplug interrupt. The IRQ selector will make
> > + * the difference from the ACPI table.
> I don't get comment at all, pls rephrase/

Ok. I think it is better to get rid of this comment here and update the one in
acpi_ged_event() appropriately.

> 
> > + */
> > +acpi_ged_event(s, sel);
> it seems to used only once and only here, suggest to drop acpi_ged_event()
> and move it's code here.

But patch #10 makes use of it from acpi_ged_pm_powerdown_req().

> > +}
> > +
> > +static void acpi_ged_device_realize(DeviceState *dev, Error **errp)
> > +{
> > +AcpiGedState *s = ACPI_GED(dev);
> > +
> > +if (s->memhp_state.is_enabled) {
> > +acpi_memory_hotplug_init(get_system_memory(), OBJECT(dev),
> > + &s->memhp_state,
> > + s->memhp_base);
> > +}
> > +
> > +acpi_ged_init(get_system_memory(), dev, &s->ged_state);
> > +}
> > +
> > +sta

Re: [Qemu-devel] [PATCH v6 0/8] ARM virt: ACPI memory hotplug support

2019-07-02 Thread Shameerali Kolothum Thodi


> -Original Message-
> From: Auger Eric [mailto:eric.au...@redhat.com]
> Sent: 02 July 2019 13:00
> To: Peter Maydell ; Igor Mammedov
> 
> Cc: Shameerali Kolothum Thodi ;
> QEMU Developers ; qemu-arm
> ; Samuel Ortiz ; Ard
> Biesheuvel ; Linuxarm ;
> xuwei (O) ; Shannon Zhao
> ; sebastien.bo...@intel.com; Laszlo Ersek
> ; Dr. David Alan Gilbert 
> Subject: Re: [Qemu-devel] [PATCH v6 0/8] ARM virt: ACPI memory hotplug
> support
> 
> Hi Peter,
> 
> On 7/2/19 1:46 PM, Peter Maydell wrote:
> > On Tue, 2 Jul 2019 at 12:07, Igor Mammedov 
> wrote:
> >>
> >> On Tue, 25 Jun 2019 13:14:13 +0100
> >> Shameer Kolothum  wrote:
> >>
> >>> This series is an attempt to provide device memory hotplug support
> >>> on ARM virt platform. This is based on Eric's recent works here[1]
> >>> and carries some of the pc-dimm related patches dropped from his
> >>> series.
> >>>
> >>> The kernel support for arm64 memory hot add was added recently by
> >>> Robin and hence the guest kernel should be => 5.0-rc1.
> >>>
> >>> NVDIM support is not included currently as we still have an unresolved
> >>> issue while hot adding NVDIMM[2]. However NVDIMM cold plug patches
> >>> can be included, but not done for now, for keeping it simple.
> >>>
> >>> This makes use of GED device to sent hotplug ACPI events to the
> >>> Guest. GED code is based on Nemu. Thanks to the efforts of Samuel and
> >>> Sebastien to add the hardware-reduced support to Nemu using GED
> >>> device[3]. (Please shout if I got the author/signed-off wrong for
> >>> those patches or missed any names).
> >>>
> >>> This is sanity tested on a HiSilicon ARM64 platform and appreciate
> >>> any further testing.
> >>
> >> There are several things I'd fix/amend but it's nothing that couldn't
> >> be done on top as bugfixes (I'll comment later on specific issues).
> >>
> >> However as a whole from ACPI and memory hotplug POV series looks more
> >> or less ready for merging.
> >>
> >> I've asked Eric to test migration (I'm quite not sure about that part),
> >> (CCed David)so on condition it works:
> >>
> >>   Reviewed-by: Igor Mammedov 
> >
> > If we want to get this into 4.1 I'll need somebody to do a respin
> > with all the relevant fixes pretty soon (ie within a day or two,
> > and that is pushing it because really it's missed the freeze
> > deadline already). It might be easier just to let it go into 4.2
> > instead...
> 
> OK so after those late attempts to get it in, I agree with you. If it
> missed the deadline already then let's stick to the process and try to
> get this just after 4.1.
> 
> I have just checked migration and it fails between a qemu 4.1 and qemu
> 4.0 with

Thanks Eric for verifying that. I didn't attempt a migration test with
different versions.

> "qemu-system-aarch64: Unknown savevm section or instance 'acpi-ged' 0.
> Make sure that your current VM setup matches your saved VM setup,
> including any hotplugged devices
> qemu-system-aarch64: load of migration failed: Invalid argument"
> 
> so we would need to have a no_acpi_dev class field to avoid using the
> GED device < 4.1 I think.

Ok. 
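
A minimal sketch of the class-field approach suggested above, assuming the
usual pattern where an older versioned machine type first inherits the newer
defaults and then opts out of the new device. Apart from the no_acpi_dev
field name taken from the suggestion, every type and function name here is
hypothetical.

```c
#include <stdbool.h>

/* Hypothetical, reduced versions of the machine class and state. */
typedef struct {
    bool no_acpi_dev; /* true for machine types that predate the GED */
} SketchVirtClass;

typedef struct {
    bool has_ged;
} SketchVirtState;

/* Newest machine type: the GED is available. */
static void sketch_virt_new_options(SketchVirtClass *vmc)
{
    vmc->no_acpi_dev = false;
}

/* Older machine type: inherit the newer defaults, then opt out. */
static void sketch_virt_old_options(SketchVirtClass *vmc)
{
    sketch_virt_new_options(vmc);
    vmc->no_acpi_dev = true;
}

/* Machine init only creates the GED when the class allows it. */
static void sketch_machine_init(SketchVirtState *vms,
                                const SketchVirtClass *vmc)
{
    vms->has_ged = !vmc->no_acpi_dev;
}
```

An older machine type then never instantiates the device, so no 'acpi-ged'
section would be produced for a destination that does not know about it.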

> 
> + troubles with the DSDT ref files / bios-tables-test.c to be fixed.

I am travelling at the moment and not in a position to respin this quickly.
So, as suggested above, I will target 4.2 if that's fine.

Thanks,
Shameer

 
> Thanks
> 
> Eric
> >
> > thanks
> > -- PMM
> >


Re: [Qemu-devel] [PATCH v5 0/8] ARM virt: ACPI memory hotplug support

2019-06-19 Thread Shameerali Kolothum Thodi
Hi Eric,

> -Original Message-
> From: Auger Eric [mailto:eric.au...@redhat.com]
> Sent: 18 June 2019 14:45
> To: Peter Maydell 
> Cc: Shameerali Kolothum Thodi ;
> QEMU Developers ; qemu-arm
> ; Igor Mammedov ;
> Shannon Zhao ; Samuel Ortiz
> ; sebastien.bo...@intel.com; xuwei (O)
> ; Laszlo Ersek ; Ard Biesheuvel
> ; Linuxarm 
> Subject: Re: [PATCH v5 0/8] ARM virt: ACPI memory hotplug support
> 
> Hi Peter,
> On 6/18/19 2:57 PM, Peter Maydell wrote:
> > I'm not sure we should carry across Tested-by tags like that: any
> > respin might accidentally introduce bugs that make it stop working...
> 
> OK. No problem. I will test the next version then.

Thanks for testing and verifying. I will respin this soon.

Cheers,
Shameer
 


Re: [Qemu-devel] [PATCH v5 4/8] hw/arm/virt: Add memory hotplug framework

2019-06-19 Thread Shameerali Kolothum Thodi
Hi Eric,

> -Original Message-
> From: Auger Eric [mailto:eric.au...@redhat.com]
> Sent: 18 June 2019 13:42
> To: Shameerali Kolothum Thodi ;
> qemu-devel@nongnu.org; qemu-...@nongnu.org; imamm...@redhat.com
> Cc: peter.mayd...@linaro.org; shannon.zha...@gmail.com;
> sa...@linux.intel.com; sebastien.bo...@intel.com; xuwei (O)
> ; ler...@redhat.com; ard.biesheu...@linaro.org;
> Linuxarm 
> Subject: Re: [PATCH v5 4/8] hw/arm/virt: Add memory hotplug framework
> 
> Hi Shameer,
> 
> On 5/22/19 6:22 PM, Shameer Kolothum wrote:
> > From: Eric Auger 
> >
> > This patch adds the memory hot-plug/hot-unplug infrastructure in
> > machvirt. The device memory is not yet exposed to the Guest either
> > through DT or ACPI and hence both cold/hot plug of memory is
> > explicitly disabled for now.
> >
> > Signed-off-by: Eric Auger 
> > Signed-off-by: Kwangwoo Lee 
> > Signed-off-by: Shameer Kolothum 
> > ---
> >  hw/arm/Kconfig |  2 ++
> >  hw/arm/virt.c  | 51
> > +-
> >  2 files changed, 52 insertions(+), 1 deletion(-)
> >
> > diff --git a/hw/arm/Kconfig b/hw/arm/Kconfig index
> > af8cffde9c..6ef22439b5 100644
> > --- a/hw/arm/Kconfig
> > +++ b/hw/arm/Kconfig
> > @@ -19,6 +19,8 @@ config ARM_VIRT
> >  select PLATFORM_BUS
> >  select SMBIOS
> >  select VIRTIO_MMIO
> small conflict to be resolved here after addition of "select ACPI_PCI".

Ok. I will address that in the next revision.

Thanks,
Shameer

> > +select MEM_DEVICE
> > +select DIMM
> >
> >  config CHEETAH
> >  bool
> > diff --git a/hw/arm/virt.c b/hw/arm/virt.c index
> > 5331ab71e2..3df8c389ff 100644
> > --- a/hw/arm/virt.c
> > +++ b/hw/arm/virt.c
> > @@ -62,6 +62,8 @@
> >  #include "hw/arm/smmuv3.h"
> >  #include "hw/acpi/acpi.h"
> >  #include "target/arm/internals.h"
> > +#include "hw/mem/pc-dimm.h"
> > +#include "hw/mem/nvdimm.h"
> >
> >  #define DEFINE_VIRT_MACHINE_LATEST(major, minor, latest) \
> >  static void virt_##major##_##minor##_class_init(ObjectClass *oc,
> > \ @@ -1862,6 +1864,40 @@ static const CPUArchIdList
> *virt_possible_cpu_arch_ids(MachineState *ms)
> >  return ms->possible_cpus;
> >  }
> >
> > +static void virt_memory_pre_plug(HotplugHandler *hotplug_dev,
> DeviceState *dev,
> > + Error **errp) {
> > +
> > +/*
> > + * The device memory is not yet exposed to the Guest either through
> > + * DT or ACPI and hence both cold/hot plug of memory is explicitly
> > + * disabled for now.
> > + */
> > +if (object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM)) {
> > +error_setg(errp, "memory cold/hot plug is not yet supported");
> > +return;
> > +}
> > +
> > +pc_dimm_pre_plug(PC_DIMM(dev), MACHINE(hotplug_dev), NULL,
> errp);
> > +}
> > +
> > +static void virt_memory_plug(HotplugHandler *hotplug_dev,
> > + DeviceState *dev, Error **errp) {
> > +VirtMachineState *vms = VIRT_MACHINE(hotplug_dev);
> > +
> > +pc_dimm_plug(PC_DIMM(dev), MACHINE(vms), NULL);
> > +
> > +}
> > +
> > +static void virt_machine_device_pre_plug_cb(HotplugHandler *hotplug_dev,
> > +DeviceState *dev,
> Error
> > +**errp) {
> > +if (object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM)) {
> > +virt_memory_pre_plug(hotplug_dev, dev, errp);
> > +}
> > +}
> > +
> >  static void virt_machine_device_plug_cb(HotplugHandler *hotplug_dev,
> >  DeviceState *dev, Error
> > **errp)  { @@ -1873,12 +1909,23 @@ static void
> > virt_machine_device_plug_cb(HotplugHandler *hotplug_dev,
> >   SYS_BUS_DEVICE(dev));
> >  }
> >  }
> > +if (object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM)) {
> > +virt_memory_plug(hotplug_dev, dev, errp);
> > +}
> > +}
> > +
> > +static void virt_machine_device_unplug_request_cb(HotplugHandler
> *hotplug_dev,
> > +  DeviceState *dev, Error
> > +**errp) {
> > +error_setg(errp, "device unplug request for unsupported device"
> > +   " type: %s", object_get_typename(OBJECT(dev)));
> >  }
> >
> >  static HotplugHandler *virt_machine_get_hotplug_handler(

Re: [Qemu-devel] [PATCH v5 3/8] hw/acpi: Add ACPI Generic Event Device Support

2019-06-19 Thread Shameerali Kolothum Thodi
Hi Eric,

> -Original Message-
> From: Auger Eric [mailto:eric.au...@redhat.com]
> Sent: 18 June 2019 13:41
> To: Shameerali Kolothum Thodi ;
> qemu-devel@nongnu.org; qemu-...@nongnu.org; imamm...@redhat.com
> Cc: peter.mayd...@linaro.org; shannon.zha...@gmail.com;
> sa...@linux.intel.com; sebastien.bo...@intel.com; xuwei (O)
> ; ler...@redhat.com; ard.biesheu...@linaro.org;
> Linuxarm 
> Subject: Re: [PATCH v5 3/8] hw/acpi: Add ACPI Generic Event Device Support
> 
> Hi Shameer,
> 
> On 5/22/19 6:22 PM, Shameer Kolothum wrote:
> > From: Samuel Ortiz 
> >
> > The ACPI Generic Event Device (GED) is a hardware-reduced specific
> > device[ACPI v6.1 Section 5.6.9] that handles all platform events,
> > including the hotplug ones.This patch generates the AML code that
> . This patch
> > defines GEDs.
> >
> > Platforms need to specify their own GED Event bitmap to describe what
> > kind of events they want to support through GED.  Also this uses a a
> > single interrupt for the  GED device, relying on IO memory region to
> > communicate the type of device affected by the interrupt. This way, we
> > can support up to 32 events with a unique interrupt.
> >
> > This supports only memory hotplug for now.
> >
> > Signed-off-by: Samuel Ortiz 
> > Signed-off-by: Sebastien Boeuf 
> > Signed-off-by: Shameer Kolothum 
> > ---
> > v4-->v5
> >  -Removed gsi/irq routing code.
> >  -Changed GED Event array to bitmap.
> >  -Added Migration support.
> >
> > ---
> >  hw/acpi/Kconfig|   4 +
> >  hw/acpi/Makefile.objs  |   1 +
> >  hw/acpi/generic_event_device.c | 332
> +
> >  include/hw/acpi/generic_event_device.h | 102 
> >  4 files changed, 439 insertions(+)
> >  create mode 100644 hw/acpi/generic_event_device.c  create mode
> 100644
> > include/hw/acpi/generic_event_device.h
> >
> > diff --git a/hw/acpi/Kconfig b/hw/acpi/Kconfig index
> > eca3beed75..01a8b41ef5 100644
> > --- a/hw/acpi/Kconfig
> > +++ b/hw/acpi/Kconfig
> > @@ -27,3 +27,7 @@ config ACPI_VMGENID
> >  bool
> >  default y
> >  depends on PC
> > +
> > +config ACPI_HW_REDUCED
> > +bool
> > +depends on ACPI
> > diff --git a/hw/acpi/Makefile.objs b/hw/acpi/Makefile.objs index
> > 2d46e3789a..b753232323 100644
> > --- a/hw/acpi/Makefile.objs
> > +++ b/hw/acpi/Makefile.objs
> > @@ -6,6 +6,7 @@ common-obj-$(CONFIG_ACPI_MEMORY_HOTPLUG) +=
> > memory_hotplug.o
> >  common-obj-$(CONFIG_ACPI_CPU_HOTPLUG) += cpu.o
> >  common-obj-$(CONFIG_ACPI_NVDIMM) += nvdimm.o
> >  common-obj-$(CONFIG_ACPI_VMGENID) += vmgenid.o
> > +common-obj-$(CONFIG_ACPI_HW_REDUCED) += generic_event_device.o
> >  common-obj-$(call lnot,$(CONFIG_ACPI_X86)) += acpi-stub.o
> >
> >  common-obj-y += acpi_interface.o
> > diff --git a/hw/acpi/generic_event_device.c
> > b/hw/acpi/generic_event_device.c new file mode 100644 index
> > 00..914fe64716
> > --- /dev/null
> > +++ b/hw/acpi/generic_event_device.c
> > @@ -0,0 +1,332 @@
> > +/*
> > + *
> > + * Copyright (c) 2018 Intel Corporation
> > + * Copyright (c) 2019 Huawei Technologies R & D (UK) Ltd
> > + * Written by Samuel Ortiz, Shameer Kolothum
> > + *
> > + * This program is free software; you can redistribute it and/or
> > +modify it
> > + * under the terms and conditions of the GNU General Public License,
> > + * version 2 or later, as published by the Free Software Foundation.
> > + */
> > +
> > +#include "qemu/osdep.h"
> > +#include "qapi/error.h"
> > +#include "exec/address-spaces.h"
> > +#include "hw/sysbus.h"
> not needed

True. Missed that.

> > +#include "hw/acpi/acpi.h"
> > +#include "hw/acpi/generic_event_device.h"
> > +#include "hw/mem/pc-dimm.h"
> > +#include "qemu/error-report.h"
> > +
> > +static const uint32_t ged_supported_events[] = {
> > +ACPI_GED_MEM_HOTPLUG_EVT,
> > +};
> > +
> > +/*
> > + * The ACPI Generic Event Device (GED) is a hardware-reduced specific
> > + * device[ACPI v6.1 Section 5.6.9] that handles all platform events,
> > + * including the hotplug ones. Platforms need to specify their own
> > + * GED Event bitmap to describe what kind of events they want to
> > +support
> > + * through GED. This routine uses a single interrupt for the GED
> > +device,
> > + * relying 

Re: [Qemu-devel] [PATCH v4 3/8] hw/acpi: Add ACPI Generic Event Device Support

2019-05-17 Thread Shameerali Kolothum Thodi



> -Original Message-
> From: Igor Mammedov [mailto:imamm...@redhat.com]
> Sent: 17 May 2019 09:41
> To: Shameerali Kolothum Thodi 
> Cc: peter.mayd...@linaro.org; sa...@linux.intel.com;
> shannon.zha...@gmail.com; ard.biesheu...@linaro.org;
> qemu-devel@nongnu.org; Linuxarm ;
> eric.au...@redhat.com; qemu-...@nongnu.org; xuwei (O)
> ; sebastien.bo...@intel.com; ler...@redhat.com
> Subject: Re: [PATCH v4 3/8] hw/acpi: Add ACPI Generic Event Device Support
> 
> On Mon, 13 May 2019 17:00:13 +
> Shameerali Kolothum Thodi  wrote:
> 
> > > -Original Message-
> > > From: Igor Mammedov [mailto:imamm...@redhat.com]
> > > Sent: 13 May 2019 17:25
> > > To: Shameerali Kolothum Thodi 
> > > Subject: Re: [PATCH v4 3/8] hw/acpi: Add ACPI Generic Event Device
> > > Support
> > >
> > > On Mon, 13 May 2019 11:53:38 +
> > > Shameerali Kolothum Thodi 
> wrote:
> > >
> > > > Hi Igor,
> > > >
> > > > > -Original Message-
> > > > > From: Shameerali Kolothum Thodi
> > > > > Sent: 03 May 2019 13:46
> > > > > To: 'Igor Mammedov' 
> > > > > Cc: qemu-devel@nongnu.org; qemu-...@nongnu.org;
> > > > > eric.au...@redhat.com; peter.mayd...@linaro.org;
> > > > > shannon.zha...@gmail.com; sa...@linux.intel.com;
> > > > > sebastien.bo...@intel.com; xuwei (O) ;
> > > > > ler...@redhat.com; ard.biesheu...@linaro.org; Linuxarm
> > > > > 
> > > > > Subject: RE: [PATCH v4 3/8] hw/acpi: Add ACPI Generic Event
> > > > > Device
> > > Support
> > > > >
> > > > > Hi Igor,
> > > > >
> > > > > > -Original Message-
> > > > > > From: Igor Mammedov [mailto:imamm...@redhat.com]
> > > > > > Sent: 02 May 2019 17:13
> > > > > > To: Shameerali Kolothum Thodi
> > > 
> > > > > > Cc: qemu-devel@nongnu.org; qemu-...@nongnu.org;
> > > > > > eric.au...@redhat.com; peter.mayd...@linaro.org;
> > > > > > shannon.zha...@gmail.com; sa...@linux.intel.com;
> > > > > > sebastien.bo...@intel.com; xuwei (O) ;
> > > > > > ler...@redhat.com; ard.biesheu...@linaro.org; Linuxarm
> > > > > > 
> > > > > > Subject: Re: [PATCH v4 3/8] hw/acpi: Add ACPI Generic Event
> > > > > > Device
> > > Support
> > > > > >
> > > >
> > > > [...]
> > > >
> > > > > > > +}
> > > > > > > +
> > > > > > > +static Property acpi_ged_properties[] = {
> > > > > > > +/*
> > > > > > > + * Memory hotplug base address is a property of GED here,
> > > > > > > + * because GED handles memory hotplug event and
> > > > > > MEMORY_HOTPLUG_DEVICE
> > > > > > > + * gets initialized when GED device is realized.
> > > > > > > + */
> > > > > > > +DEFINE_PROP_UINT64("memhp-base", AcpiGedState,
> > > memhp_base,
> > > > > > 0),
> > > > > > > +DEFINE_PROP_BOOL("memory-hotplug-support",
> AcpiGedState,
> > > > > > > + memhp_state.is_enabled, true),
> > > > > > > +DEFINE_PROP_PTR("gsi", AcpiGedState, gsi),
> > > > > >
> > > > > > PTR shouldn't be used in new code, look at
> > > > > > object_property_add_link() &
> > > co
> > > > >
> > > > > Ok. I will take a look at that.
> > > >
> > > > I attempted to remove _PROP_PTR for "ged-events" and use
> > > > _PROP_LINK
> > > and
> > > > _set_link(),
> > > >
> > > >
> > > > diff --git a/hw/acpi/generic_event_device.c
> > > b/hw/acpi/generic_event_device.c
> > > > index 856ca04c01..978c8e088e 100644
> > > > --- a/hw/acpi/generic_event_device.c
> > > > +++ b/hw/acpi/generic_event_device.c
> > > > @@ -268,7 +268,8 @@ static Property acpi_ged_properties[] = {
> > > >  DEFINE_PROP_PTR("gsi", AcpiGedState, gsi),
> > > >  DEFINE_PROP_UINT64("ged-base", AcpiGedState, ged_base, 0),
> > > >  DEFINE_PROP_UINT32("ged-irq", AcpiGedState, ged_irq, 0),
> > > > -DEFINE_

Re: [Qemu-devel] [PATCH v4 3/8] hw/acpi: Add ACPI Generic Event Device Support

2019-05-13 Thread Shameerali Kolothum Thodi
> -Original Message-
> From: Igor Mammedov [mailto:imamm...@redhat.com]
> Sent: 13 May 2019 17:25
> To: Shameerali Kolothum Thodi 
> Subject: Re: [PATCH v4 3/8] hw/acpi: Add ACPI Generic Event Device Support
> 
> On Mon, 13 May 2019 11:53:38 +
> Shameerali Kolothum Thodi  wrote:
> 
> > Hi Igor,
> >
> > > -----Original Message-
> > > From: Shameerali Kolothum Thodi
> > > Sent: 03 May 2019 13:46
> > > To: 'Igor Mammedov' 
> > > Cc: qemu-devel@nongnu.org; qemu-...@nongnu.org;
> > > eric.au...@redhat.com; peter.mayd...@linaro.org;
> > > shannon.zha...@gmail.com; sa...@linux.intel.com;
> > > sebastien.bo...@intel.com; xuwei (O) ;
> > > ler...@redhat.com; ard.biesheu...@linaro.org; Linuxarm
> > > 
> > > Subject: RE: [PATCH v4 3/8] hw/acpi: Add ACPI Generic Event Device
> Support
> > >
> > > Hi Igor,
> > >
> > > > -Original Message-
> > > > From: Igor Mammedov [mailto:imamm...@redhat.com]
> > > > Sent: 02 May 2019 17:13
> > > > To: Shameerali Kolothum Thodi
> 
> > > > Cc: qemu-devel@nongnu.org; qemu-...@nongnu.org;
> > > > eric.au...@redhat.com; peter.mayd...@linaro.org;
> > > > shannon.zha...@gmail.com; sa...@linux.intel.com;
> > > > sebastien.bo...@intel.com; xuwei (O) ;
> > > > ler...@redhat.com; ard.biesheu...@linaro.org; Linuxarm
> > > > 
> > > > Subject: Re: [PATCH v4 3/8] hw/acpi: Add ACPI Generic Event Device
> Support
> > > >
> >
> > [...]
> >
> > > > > +}
> > > > > +
> > > > > +static Property acpi_ged_properties[] = {
> > > > > +/*
> > > > > + * Memory hotplug base address is a property of GED here,
> > > > > + * because GED handles memory hotplug event and
> > > > MEMORY_HOTPLUG_DEVICE
> > > > > + * gets initialized when GED device is realized.
> > > > > + */
> > > > > +DEFINE_PROP_UINT64("memhp-base", AcpiGedState,
> memhp_base,
> > > > 0),
> > > > > +DEFINE_PROP_BOOL("memory-hotplug-support", AcpiGedState,
> > > > > + memhp_state.is_enabled, true),
> > > > > +DEFINE_PROP_PTR("gsi", AcpiGedState, gsi),
> > > >
> > > > PTR shouldn't be used in new code, look at object_property_add_link() &
> co
> > >
> > > Ok. I will take a look at that.
> >
> > I attempted to remove _PROP_PTR for "ged-events" and use _PROP_LINK
> and
> > _set_link(),
> >
> >
> > diff --git a/hw/acpi/generic_event_device.c
> b/hw/acpi/generic_event_device.c
> > index 856ca04c01..978c8e088e 100644
> > --- a/hw/acpi/generic_event_device.c
> > +++ b/hw/acpi/generic_event_device.c
> > @@ -268,7 +268,8 @@ static Property acpi_ged_properties[] = {
> >  DEFINE_PROP_PTR("gsi", AcpiGedState, gsi),
> >  DEFINE_PROP_UINT64("ged-base", AcpiGedState, ged_base, 0),
> >  DEFINE_PROP_UINT32("ged-irq", AcpiGedState, ged_irq, 0),
> > -DEFINE_PROP_PTR("ged-events", AcpiGedState, ged_events),
> > +DEFINE_PROP_LINK("ged-events", AcpiGedState, ged_events,
> TYPE_ACPI_GED,
> > + GedEvent *),
> >  DEFINE_PROP_UINT32("ged-events-size", AcpiGedState,
> ged_events_size, 0),
> >  DEFINE_PROP_END_OF_LIST(),
> >  };
> > diff --git a/hw/arm/virt.c b/hw/arm/virt.c
> > index 8179b3e511..c89b7b7120 100644
> > --- a/hw/arm/virt.c
> > +++ b/hw/arm/virt.c
> > @@ -537,7 +537,8 @@ static inline DeviceState
> *create_acpi_ged(VirtMachineState *vms)
> >  qdev_prop_set_ptr(dev, "gsi", vms->gsi);
> >  qdev_prop_set_uint64(dev, "ged-base",
> vms->memmap[VIRT_ACPI_GED].base);
> >  qdev_prop_set_uint32(dev, "ged-irq", vms->irqmap[VIRT_ACPI_GED]);
> > -qdev_prop_set_ptr(dev, "ged-events", ged_events);
> > +object_property_set_link(OBJECT(dev), OBJECT(ged_events),
> "ged-events",
> > + &error_abort);
> >  qdev_prop_set_uint32(dev, "ged-events-size",
> ARRAY_SIZE(ged_events));
> >
> >  object_property_add_child(qdev_get_machine(), "acpi-ged",
> > diff --git a/include/hw/acpi/generic_event_device.h
> b/include/hw/acpi/generic_event_device.h
> > index 9c840d8064..588f4ecfba 100644
> > --- a/include/hw/acpi/generic_e

Re: [Qemu-devel] [PATCH v4 3/8] hw/acpi: Add ACPI Generic Event Device Support

2019-05-13 Thread Shameerali Kolothum Thodi
Hi Igor,

> -Original Message-
> From: Shameerali Kolothum Thodi
> Sent: 03 May 2019 13:46
> To: 'Igor Mammedov' 
> Cc: qemu-devel@nongnu.org; qemu-...@nongnu.org;
> eric.au...@redhat.com; peter.mayd...@linaro.org;
> shannon.zha...@gmail.com; sa...@linux.intel.com;
> sebastien.bo...@intel.com; xuwei (O) ;
> ler...@redhat.com; ard.biesheu...@linaro.org; Linuxarm
> 
> Subject: RE: [PATCH v4 3/8] hw/acpi: Add ACPI Generic Event Device Support
> 
> Hi Igor,
> 
> > -Original Message-
> > From: Igor Mammedov [mailto:imamm...@redhat.com]
> > Sent: 02 May 2019 17:13
> > To: Shameerali Kolothum Thodi 
> > Cc: qemu-devel@nongnu.org; qemu-...@nongnu.org;
> > eric.au...@redhat.com; peter.mayd...@linaro.org;
> > shannon.zha...@gmail.com; sa...@linux.intel.com;
> > sebastien.bo...@intel.com; xuwei (O) ;
> > ler...@redhat.com; ard.biesheu...@linaro.org; Linuxarm
> > 
> > Subject: Re: [PATCH v4 3/8] hw/acpi: Add ACPI Generic Event Device Support
> >

[...]

> > > +}
> > > +
> > > +static Property acpi_ged_properties[] = {
> > > +/*
> > > + * Memory hotplug base address is a property of GED here,
> > > + * because GED handles memory hotplug event and
> > MEMORY_HOTPLUG_DEVICE
> > > + * gets initialized when GED device is realized.
> > > + */
> > > +DEFINE_PROP_UINT64("memhp-base", AcpiGedState, memhp_base,
> > 0),
> > > +DEFINE_PROP_BOOL("memory-hotplug-support", AcpiGedState,
> > > + memhp_state.is_enabled, true),
> > > +DEFINE_PROP_PTR("gsi", AcpiGedState, gsi),
> >
> > PTR shouldn't be used in new code, look at object_property_add_link() & co
> 
> Ok. I will take a look at that.

I attempted to remove _PROP_PTR for "ged-events" and use _PROP_LINK and
_set_link() instead:


diff --git a/hw/acpi/generic_event_device.c b/hw/acpi/generic_event_device.c
index 856ca04c01..978c8e088e 100644
--- a/hw/acpi/generic_event_device.c
+++ b/hw/acpi/generic_event_device.c
@@ -268,7 +268,8 @@ static Property acpi_ged_properties[] = {
 DEFINE_PROP_PTR("gsi", AcpiGedState, gsi),
 DEFINE_PROP_UINT64("ged-base", AcpiGedState, ged_base, 0),
 DEFINE_PROP_UINT32("ged-irq", AcpiGedState, ged_irq, 0),
-DEFINE_PROP_PTR("ged-events", AcpiGedState, ged_events),
+DEFINE_PROP_LINK("ged-events", AcpiGedState, ged_events, TYPE_ACPI_GED,
+ GedEvent *),
 DEFINE_PROP_UINT32("ged-events-size", AcpiGedState, ged_events_size, 0),
 DEFINE_PROP_END_OF_LIST(),
 };
diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 8179b3e511..c89b7b7120 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -537,7 +537,8 @@ static inline DeviceState *create_acpi_ged(VirtMachineState *vms)
 qdev_prop_set_ptr(dev, "gsi", vms->gsi);
 qdev_prop_set_uint64(dev, "ged-base", vms->memmap[VIRT_ACPI_GED].base);
 qdev_prop_set_uint32(dev, "ged-irq", vms->irqmap[VIRT_ACPI_GED]);
-qdev_prop_set_ptr(dev, "ged-events", ged_events);
+object_property_set_link(OBJECT(dev), OBJECT(ged_events), "ged-events",
+ &error_abort);
 qdev_prop_set_uint32(dev, "ged-events-size", ARRAY_SIZE(ged_events));
 
 object_property_add_child(qdev_get_machine(), "acpi-ged",
diff --git a/include/hw/acpi/generic_event_device.h b/include/hw/acpi/generic_event_device.h
index 9c840d8064..588f4ecfba 100644
--- a/include/hw/acpi/generic_event_device.h
+++ b/include/hw/acpi/generic_event_device.h
@@ -111,7 +111,7 @@ typedef struct AcpiGedState {
 hwaddr ged_base;
 GEDState ged_state;
 uint32_t ged_irq;
-void *ged_events;
+GedEvent *ged_events;
 uint32_t ged_events_size;
 } AcpiGedState;


And with this I get,

Segmentation fault  (core dumped) ./qemu-system-aarch64-ged-v5
-machine virt, -cpu cortex-a57 -machine type=virt -nographic -smp 1 -m
4G,maxmem=8G,slots=10 -drive if=none,file=ubuntu-est-5.0,id=fs -device
virtio-blk-device,drive=fs -kernel Image_memhp_remove -bios
QEMU_EFI_Release.fd -object memory-backend-ram,id=mem1,size=1G -device
pc-dimm,id=dimm1,memdev=mem1 -numa node,nodeid=0 -append
"console=ttyAMA0 root=/dev/vda rw acpi=force movable_node"

It looks like a plain struct pointer cannot be used directly with
DEFINE_PROP_LINK; the target has to be a QOM object. I am not sure there is
an easy way to set a pointer property using the link() functions. Please let
me know if there is any reference implementation I can take a look at.

Appreciate your help,

Thanks,
Shameer
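
A likely explanation for the crash, not confirmed in this thread, is that a
link setter type-checks its target by reading an object header, and a plain
GedEvent array has no such header. The sketch below is a standalone mock in
C, not QEMU code: MockObject, MockGedEventList and mock_set_link() are
hypothetical names used only to illustrate wrapping plain data in a container
that starts with an object header.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for an object header; NOT QEMU's real Object type. */
typedef struct MockObject {
    const char *type_name;
} MockObject;

/* Plain data, like the GedEvent array: no object header at the front. */
typedef struct MockGedEvent {
    unsigned selector;
    unsigned event;
} MockGedEvent;

/* Hypothetical wrapper: header first, then a pointer to the plain array. */
typedef struct MockGedEventList {
    MockObject parent;
    const MockGedEvent *events;
    size_t count;
} MockGedEventList;

#define MOCK_TYPE_GED_EVENT_LIST "mock-ged-event-list"

/* Fill in the header and attach the plain event array to the wrapper. */
static void mock_ged_event_list_init(MockGedEventList *list,
                                     const MockGedEvent *events, size_t count)
{
    assert(events != NULL || count == 0);
    list->parent.type_name = MOCK_TYPE_GED_EVENT_LIST;
    list->events = events;
    list->count = count;
}

/*
 * A link setter must read the target's header to check its type. If the
 * target were a bare MockGedEvent array cast to MockObject *, this read
 * would interpret event data as a pointer, which is undefined behaviour.
 * Returns 0 on success, -1 on a missing target or a type mismatch.
 */
static int mock_set_link(MockObject **slot, MockObject *target,
                         const char *expected_type)
{
    if (target == NULL || target->type_name == NULL ||
        strcmp(target->type_name, expected_type) != 0) {
        return -1;
    }
    *slot = target;
    return 0;
}
```

With such a wrapper, the link points at the container and the consumer reads
the array and its size through it, so a separate size property would no longer
be needed in this mock.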




Re: [Qemu-devel] [Question] Memory hotplug clarification for Qemu ARM/virt

2019-05-10 Thread Shameerali Kolothum Thodi
Hi Eric,

> -Original Message-
> From: Auger Eric [mailto:eric.au...@redhat.com]
> Sent: 10 May 2019 10:16
> To: Shameerali Kolothum Thodi ;
> Laszlo Ersek ; Igor Mammedov
> 
> Cc: peter.mayd...@linaro.org; xuwei (O) ; Anshuman
> Khandual ; Catalin Marinas
> ; ard.biesheu...@linaro.org;
> will.dea...@arm.com; qemu-devel@nongnu.org; Linuxarm
> ; linux-mm ;
> qemu-...@nongnu.org; Jonathan Cameron
> ; Robin Murphy ;
> linux-arm-ker...@lists.infradead.org
> Subject: Re: [Qemu-devel] [Question] Memory hotplug clarification for Qemu
> ARM/virt
> 
> Hi Shameer,
> 
> On 5/10/19 10:34 AM, Shameerali Kolothum Thodi wrote:
> >
> >
> >> -Original Message-
> >> From: Laszlo Ersek [mailto:ler...@redhat.com]
> >> Sent: 09 May 2019 22:48
> >> To: Igor Mammedov 
> >> Cc: Robin Murphy ; Shameerali Kolothum Thodi
> >> ; will.dea...@arm.com; Catalin
> >> Marinas ; Anshuman Khandual
> >> ; linux-arm-ker...@lists.infradead.org;
> >> linux-mm ; qemu-devel@nongnu.org;
> >> qemu-...@nongnu.org; eric.au...@redhat.com;
> peter.mayd...@linaro.org;
> >> Linuxarm ; ard.biesheu...@linaro.org; Jonathan
> >> Cameron ; xuwei (O)
> 
> >> Subject: Re: [Question] Memory hotplug clarification for Qemu ARM/virt
> >>
> >> On 05/09/19 18:35, Igor Mammedov wrote:
> >>> On Wed, 8 May 2019 22:26:12 +0200
> >>> Laszlo Ersek  wrote:
> >>>
> >>>> On 05/08/19 14:50, Robin Murphy wrote:
> >>>>> Hi Shameer,
> >>>>>
> >>>>> On 08/05/2019 11:15, Shameerali Kolothum Thodi wrote:
> >>>>>> Hi,
> >>>>>>
> >>>>>> This series here[0] attempts to add support for PCDIMM in QEMU for
> >>>>>> ARM/Virt platform and has stumbled upon an issue as it is not clear(at
> >>>>>> least
> >>>>>> from Qemu/EDK2 point of view) how in physical world the hotpluggable
> >>>>>> memory is handled by kernel.
> >>>>>>
> >>>>>> The proposed implementation in Qemu, builds the SRAT and DSDT parts
> >>>>>> and uses GED device to trigger the hotplug. This works fine.
> >>>>>>
> >>>>>> But when we added the DT node corresponding to the PCDIMM(cold
> plug
> >>>>>> scenario), we noticed that Guest kernel see this memory during early
> >> boot
> >>>>>> even if we are booting with ACPI. Because of this, hotpluggable
> memory
> >>>>>> may end up in zone normal and make it non-hot-un-pluggable even if
> >> Guest
> >>>>>> boots with ACPI.
> >>>>>>
> >>>>>> Further discussions[1] revealed that, EDK2 UEFI has no means to
> >>>>>> interpret the
> >>>>>> ACPI content from Qemu(this is designed to do so) and uses DT info to
> >>>>>> build the GetMemoryMap(). To solve this, introduced "hotpluggable"
> >>>>>> property
> >>>>>> to DT memory node(patches #7 & #8 from [0]) so that UEFI can
> >>>>>> differentiate
> >>>>>> the nodes and exclude the hotpluggable ones from GetMemoryMap().
> >>>>>>
> >>>>>> But then Laszlo rightly pointed out that in order to accommodate the
> >>>>>> changes
> >>>>>> into UEFI we need to know how exactly Linux expects/handles all the
> >>>>>> hotpluggable memory scenarios. Please find the discussion here[2].
> >>>>>>
> >>>>>> For ease, I am just copying the relevant comment from Laszlo below,
> >>>>>>
> >>>>>> /**
> >>>>>> "Given patches #7 and #8, as I understand them, the firmware cannot
> >>>>>> distinguish
> >>>>>>   hotpluggable & present, from hotpluggable & absent. The firmware
> >> can
> >>>>>> only
> >>>>>>   skip both hotpluggable cases. That's fine in that the firmware will
> >>>>>> hog neither
> >>>>>>   type -- but is that OK for the OS as well, for both ACPI boot and DT
> >>>>>> boot?
> >>>>>>
> >>>>>> Consider in particular the "hotpluggable & present, ACPI boot" case.
> >>>>>> Assuming
> >>>>>> we mo

Re: [Qemu-devel] [Question] Memory hotplug clarification for Qemu ARM/virt

2019-05-10 Thread Shameerali Kolothum Thodi


> -Original Message-
> From: Laszlo Ersek [mailto:ler...@redhat.com]
> Sent: 09 May 2019 22:48
> To: Igor Mammedov 
> Cc: Robin Murphy ; Shameerali Kolothum Thodi
> ; will.dea...@arm.com; Catalin
> Marinas ; Anshuman Khandual
> ; linux-arm-ker...@lists.infradead.org;
> linux-mm ; qemu-devel@nongnu.org;
> qemu-...@nongnu.org; eric.au...@redhat.com; peter.mayd...@linaro.org;
> Linuxarm ; ard.biesheu...@linaro.org; Jonathan
> Cameron ; xuwei (O) 
> Subject: Re: [Question] Memory hotplug clarification for Qemu ARM/virt
> 
> On 05/09/19 18:35, Igor Mammedov wrote:
> > On Wed, 8 May 2019 22:26:12 +0200
> > Laszlo Ersek  wrote:
> >
> >> On 05/08/19 14:50, Robin Murphy wrote:
> >>> Hi Shameer,
> >>>
> >>> On 08/05/2019 11:15, Shameerali Kolothum Thodi wrote:
> >>>> Hi,
> >>>>
> >>>> This series here[0] attempts to add support for PCDIMM in QEMU for
> >>>> ARM/Virt platform and has stumbled upon an issue as it is not clear(at
> >>>> least
> >>>> from Qemu/EDK2 point of view) how in physical world the hotpluggable
> >>>> memory is handled by kernel.
> >>>>
> >>>> The proposed implementation in Qemu, builds the SRAT and DSDT parts
> >>>> and uses GED device to trigger the hotplug. This works fine.
> >>>>
> >>>> But when we added the DT node corresponding to the PCDIMM(cold plug
> >>>> scenario), we noticed that Guest kernel see this memory during early
> boot
> >>>> even if we are booting with ACPI. Because of this, hotpluggable memory
> >>>> may end up in zone normal and make it non-hot-un-pluggable even if
> Guest
> >>>> boots with ACPI.
> >>>>
> >>>> Further discussions[1] revealed that, EDK2 UEFI has no means to
> >>>> interpret the
> >>>> ACPI content from Qemu(this is designed to do so) and uses DT info to
> >>>> build the GetMemoryMap(). To solve this, introduced "hotpluggable"
> >>>> property
> >>>> to DT memory node(patches #7 & #8 from [0]) so that UEFI can
> >>>> differentiate
> >>>> the nodes and exclude the hotpluggable ones from GetMemoryMap().
> >>>>
> >>>> But then Laszlo rightly pointed out that in order to accommodate the
> >>>> changes
> >>>> into UEFI we need to know how exactly Linux expects/handles all the
> >>>> hotpluggable memory scenarios. Please find the discussion here[2].
> >>>>
> >>>> For ease, I am just copying the relevant comment from Laszlo below,
> >>>>
> >>>> /**
> >>>> "Given patches #7 and #8, as I understand them, the firmware cannot
> >>>> distinguish
> >>>>   hotpluggable & present, from hotpluggable & absent. The firmware
> can
> >>>> only
> >>>>   skip both hotpluggable cases. That's fine in that the firmware will
> >>>> hog neither
> >>>>   type -- but is that OK for the OS as well, for both ACPI boot and DT
> >>>> boot?
> >>>>
> >>>> Consider in particular the "hotpluggable & present, ACPI boot" case.
> >>>> Assuming
> >>>> we modify the firmware to skip "hotpluggable" altogether, the UEFI
> memmap
> >>>> will not include the range despite it being present at boot.
> >>>> Presumably, ACPI
> >>>> will refer to the range somehow, however. Will that not confuse the OS?
> >>>>
> >>>> When Igor raised this earlier, I suggested that
> >>>> hotpluggable-and-present should
> >>>> be added by the firmware, but also allocated immediately, as
> >>>> EfiBootServicesData
> >>>> type memory. This will prevent other drivers in the firmware from
> >>>> allocating AcpiNVS
> >>>> or Reserved chunks from the same memory range, the UEFI memmap will
> >>>> contain
> >>>> the range as EfiBootServicesData, and then the OS can release that
> >>>> allocation in
> >>>> one go early during boot.
> >>>>
> >>>> But this really has to be clarified from the Linux kernel's
> >>>> expectations. Please
> >>>> formalize all of the following cases:
> >>>>
> >>>> OS boot (DT/ACPI)  hotpluggable & ...  GetMemoryMap() should report
>

Re: [Qemu-devel] [PATCH v4 8/8] hw/arm/boot: Expose the PC-DIMM nodes in the DT

2019-05-08 Thread Shameerali Kolothum Thodi
Hi Laszlo,

> -Original Message-
> From: Laszlo Ersek [mailto:ler...@redhat.com]
> Sent: 03 May 2019 15:14
> To: Shameerali Kolothum Thodi ;
> qemu-devel@nongnu.org; qemu-...@nongnu.org; eric.au...@redhat.com;
> imamm...@redhat.com
> Cc: peter.mayd...@linaro.org; sa...@linux.intel.com;
> ard.biesheu...@linaro.org; Linuxarm ;
> shannon.zha...@gmail.com; sebastien.bo...@intel.com; xuwei (O)
> 
> Subject: Re: [Qemu-devel] [PATCH v4 8/8] hw/arm/boot: Expose the PC-DIMM
> nodes in the DT
> 
> Hi Shameer,
> 
> On 05/03/19 15:35, Shameerali Kolothum Thodi wrote:
> >
> >
> >> -Original Message-
> >> From: Linuxarm [mailto:linuxarm-boun...@huawei.com] On Behalf Of
> >> Shameerali Kolothum Thodi
> >> Sent: 10 April 2019 09:49
> >> To: Laszlo Ersek ; qemu-devel@nongnu.org;
> >> qemu-...@nongnu.org; eric.au...@redhat.com; imamm...@redhat.com
> >> Cc: peter.mayd...@linaro.org; sa...@linux.intel.com;
> >> ard.biesheu...@linaro.org; Linuxarm ;
> >> shannon.zha...@gmail.com; sebastien.bo...@intel.com; xuwei (O)
> >> 
> >> Subject: RE: [PATCH v4 8/8] hw/arm/boot: Expose the PC-DIMM nodes in
> >> the DT
> >>
> >>
> >>> -Original Message-
> >>> From: Laszlo Ersek [mailto:ler...@redhat.com]
> >>> Sent: 09 April 2019 16:09
> >>> To: Shameerali Kolothum Thodi
> >>> ;
> >>> qemu-devel@nongnu.org; qemu-...@nongnu.org;
> eric.au...@redhat.com;
> >>> imamm...@redhat.com
> >>> Cc: peter.mayd...@linaro.org; shannon.zha...@gmail.com;
> >>> sa...@linux.intel.com; sebastien.bo...@intel.com; xuwei (O)
> >>> ; ard.biesheu...@linaro.org; Linuxarm
> >>> 
> >>> Subject: Re: [PATCH v4 8/8] hw/arm/boot: Expose the PC-DIMM nodes in
> >>> the DT
> >>>
> >>> On 04/09/19 12:29, Shameer Kolothum wrote:
> >>>> This patch adds memory nodes corresponding to PC-DIMM regions.
> >>>> This will enable support for cold plugged device memory for Guests
> >>>> with DT boot.
> >>>>
> >>>> Signed-off-by: Shameer Kolothum
> >> 
> >>>> Signed-off-by: Eric Auger 
> >>>> ---
> >>>>  hw/arm/boot.c | 42
> ++
> >>>>  1 file changed, 42 insertions(+)
> >>>>
> >>>> diff --git a/hw/arm/boot.c b/hw/arm/boot.c index 8c840ba..150e1ed
> >>>> 100644
> >>>> --- a/hw/arm/boot.c
> >>>> +++ b/hw/arm/boot.c
> >>>> @@ -19,6 +19,7 @@
> >>>>  #include "sysemu/numa.h"
> >>>>  #include "hw/boards.h"
> >>>>  #include "hw/loader.h"
> >>>> +#include "hw/mem/memory-device.h"
> >>>>  #include "elf.h"
> >>>>  #include "sysemu/device_tree.h"
> >>>>  #include "qemu/config-file.h"
> >>>> @@ -538,6 +539,41 @@ static void fdt_add_psci_node(void *fdt)
> >>>>  qemu_fdt_setprop_cell(fdt, "/psci", "migrate", migrate_fn);  }
> >>>>
> >>>> +static int fdt_add_hotpluggable_memory_nodes(void *fdt,
> >>>> + uint32_t acells,
> >>> uint32_t scells) {
> >>>> +MemoryDeviceInfoList *info, *info_list =
> qmp_memory_device_list();
> >>>> +MemoryDeviceInfo *mi;
> >>>> +int ret = 0;
> >>>> +
> >>>> +for (info = info_list; info != NULL; info = info->next) {
> >>>> +mi = info->value;
> >>>> +switch (mi->type) {
> >>>> +case MEMORY_DEVICE_INFO_KIND_DIMM:
> >>>> +{
> >>>> +PCDIMMDeviceInfo *di = mi->u.dimm.data;
> >>>> +
> >>>> +ret = fdt_add_memory_node(fdt, acells, di->addr, scells,
> >>>> +  di->size, di->node, true);
> >>>> +if (ret) {
> >>>> +fprintf(stderr,
> >>>> +"couldn't add PCDIMM
> >> /memory@%"PRIx64"
> >>> node\n",
> >>>> +di->addr);
> >>>> +goto out;
> >>>> +}
> >>>> +break;
> >>>> +}
> >>>> +de

[Qemu-devel] [Question] Memory hotplug clarification for Qemu ARM/virt

2019-05-08 Thread Shameerali Kolothum Thodi
Hi,

The series at [0] attempts to add support for PCDIMM in QEMU for the
ARM/virt platform. It has stumbled upon an issue: it is not clear (at least
from the QEMU/EDK2 point of view) how the kernel handles hotpluggable
memory on physical systems.

The proposed implementation in QEMU builds the SRAT and DSDT parts
and uses the GED device to trigger the hotplug. This works fine.

But when we added the DT node corresponding to the PCDIMM (cold plug
scenario), we noticed that the guest kernel sees this memory during early
boot even if we are booting with ACPI. Because of this, hotpluggable memory
may end up in ZONE_NORMAL and become non-hot-unpluggable even if the guest
boots with ACPI.

Further discussion[1] revealed that EDK2 UEFI has no means to interpret the
ACPI content from QEMU (this is by design) and uses DT info to build the
GetMemoryMap() output. To solve this, we introduced a "hotpluggable" property
on the DT memory node (patches #7 & #8 from [0]) so that UEFI can
differentiate the nodes and exclude the hotpluggable ones from GetMemoryMap().

But then Laszlo rightly pointed out that in order to accommodate the changes
in UEFI we need to know exactly how Linux expects/handles all the
hotpluggable memory scenarios. Please find the discussion here[2].

For ease, I am just copying the relevant comment from Laszlo below,

/**
"Given patches #7 and #8, as I understand them, the firmware cannot distinguish
 hotpluggable & present, from hotpluggable & absent. The firmware can only
 skip both hotpluggable cases. That's fine in that the firmware will hog neither
 type -- but is that OK for the OS as well, for both ACPI boot and DT boot?

Consider in particular the "hotpluggable & present, ACPI boot" case. Assuming
we modify the firmware to skip "hotpluggable" altogether, the UEFI memmap
will not include the range despite it being present at boot. Presumably, ACPI
will refer to the range somehow, however. Will that not confuse the OS?

When Igor raised this earlier, I suggested that hotpluggable-and-present should
be added by the firmware, but also allocated immediately, as EfiBootServicesData
type memory. This will prevent other drivers in the firmware from allocating
AcpiNVS or Reserved chunks from the same memory range, the UEFI memmap will
contain the range as EfiBootServicesData, and then the OS can release that
allocation in one go early during boot.

But this really has to be clarified from the Linux kernel's expectations. Please
formalize all of the following cases:

OS boot (DT/ACPI)  hotpluggable & ...  GetMemoryMap() should report as  DT/ACPI should report as
-----------------  ------------------  -------------------------------  ------------------------
DT                 present             ?                                ?
DT                 absent              ?                                ?
ACPI               present             ?                                ?
ACPI               absent              ?                                ?

Again, this table is dictated by Linux."

**/

Could you please take a look at this and let us know what is expected here from
a Linux kernel viewpoint?

(Hi Laszlo/Igor/Eric, please feel free to add/change if I have missed any valid
points above).

Thanks,
Shameer
[0] https://patchwork.kernel.org/cover/10890919/
[1] https://patchwork.kernel.org/patch/10863299/
[2] https://patchwork.kernel.org/patch/10890937/
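
The firmware-side filtering described above can be sketched as follows. This
is a standalone illustration in C, not EDK2 or QEMU code: SketchMemNode and
sketch_build_memmap() are hypothetical names, and the hotpluggable flag merely
models the DT property proposed in patches #7 & #8.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a DT /memory node as seen by the firmware. */
typedef struct SketchMemNode {
    uint64_t base;
    uint64_t size;
    bool hotpluggable;   /* models the proposed "hotpluggable" DT property */
} SketchMemNode;

/*
 * Copy every node that is NOT marked hotpluggable into 'out', up to
 * 'out_cap' entries, and return the number of entries written. Nodes marked
 * hotpluggable are skipped whether or not the memory is present at boot,
 * because the flag alone cannot tell the two cases apart.
 */
static size_t sketch_build_memmap(const SketchMemNode *nodes, size_t n,
                                  SketchMemNode *out, size_t out_cap)
{
    size_t used = 0;

    for (size_t i = 0; i < n && used < out_cap; i++) {
        if (nodes[i].hotpluggable) {
            continue;
        }
        out[used++] = nodes[i];
    }
    return used;
}
```

The sketch makes the open question concrete: a cold-plugged DIMM is dropped
from the map just like an absent one, so the OS has to learn about it from
ACPI or DT instead.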





Re: [Qemu-devel] [PATCH v4 3/8] hw/acpi: Add ACPI Generic Event Device Support

2019-05-07 Thread Shameerali Kolothum Thodi
Hi Igor,

> -Original Message-
> From: Igor Mammedov [mailto:imamm...@redhat.com]
> Sent: 03 May 2019 16:10
> To: Shameerali Kolothum Thodi 
> Cc: qemu-devel@nongnu.org; qemu-...@nongnu.org;
> eric.au...@redhat.com; peter.mayd...@linaro.org;
> shannon.zha...@gmail.com; sa...@linux.intel.com;
> sebastien.bo...@intel.com; xuwei (O) ;
> ler...@redhat.com; ard.biesheu...@linaro.org; Linuxarm
> 
> Subject: Re: [PATCH v4 3/8] hw/acpi: Add ACPI Generic Event Device Support

[...]

> > > type.
> > > > + * The resulting ASL code looks like:
> > > > + *
> > > > + * Local0 = ISEL
> > > > + * If ((Local0 & irq0) == irq0)
> > > > + * {
> > > > + * MethodEvent0()
> > > > + * }
> > > > + *
> > > > + * If ((Local0 & irq1) == irq1)
> > > > + * {
> > > > + * MethodEvent1()
> > > > + * }
> > > > + * ...
> > > > + */
> > > Well, I'm confused.
> > > do we actually use multiple IRQs or we use only one + MMIO for event
> type?
> >
> > It is one irq + MMIO. I will change the comment block something like
> > this,
> 
> change corresponding variable names as well

Ok.
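
The one-interrupt-plus-selector scheme discussed here can be sketched in
plain C as below. This is only an illustration under stated assumptions, not
the QEMU or AML implementation: SketchGedEvent, sketch_count_event() and
sketch_ged_dispatch() are hypothetical names, and the isel argument stands
for the value read from the selector register over MMIO.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef void (*sketch_ged_handler)(void *opaque);

/* Hypothetical event descriptor: one selector bit per event type. */
typedef struct SketchGedEvent {
    uint32_t selector;            /* exactly one bit set */
    sketch_ged_handler handler;   /* called when that bit is pending */
} SketchGedEvent;

/* Example handler: counts how many events were delivered. */
static void sketch_count_event(void *opaque)
{
    (*(unsigned *)opaque)++;
}

/*
 * Walk the event table and call the handler of every event whose selector
 * bit is set in 'isel'. A 32-bit selector gives at most 32 distinct events
 * behind a single interrupt. Returns the number of handlers called.
 */
static unsigned sketch_ged_dispatch(uint32_t isel,
                                    const SketchGedEvent *events, size_t n,
                                    void *opaque)
{
    unsigned handled = 0;

    for (size_t i = 0; i < n; i++) {
        uint32_t sel = events[i].selector;

        /* Each selector must be a single, non-zero bit. */
        assert(sel != 0 && (sel & (sel - 1)) == 0);
        if ((isel & sel) == sel) {
            events[i].handler(opaque);
            handled++;
        }
    }
    return handled;
}
```

Each test against the selector is independent, so several pending events can
be served from one interrupt.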

> >
> > Local0 = ISEL
> > If ((Local0 & One) == One)
> > {
> > MethodEvent1()
> > }
> >
> > If ((Local0 & 0x02) == 0x02)
> > {
> > MethodEvent2()
> > }
> > ...
> >
> > >
> > > > +for (i = 0; i < s->ged_events_size; i++) {
> > >
> > > > > +ged_aml = ged_event_aml(&ged_events[i]);
> > > > +if (!ged_aml) {
> > > > +continue;
> > > > +}
> > > I'd get rid of ged_event_aml replace it with more 'switch':
> > >for (i,...)
> > >if_ctx = aml_if(...)
> > >switch (event)
> > >   case GED_MEMORY_HOTPLUG:
> > >aml_append(if_ctx,
> > > aml_call0(MEMORY_DEVICES_CONTAINER "."
> > > MEMORY_SLOT_SCAN_METHOD))
> > >break
> > >   default:
> > > >abort(); // make sure that newly added events have
> > > > a handler
> >
> > Ok. I will change this.
> >
> > >
> > > > +
> > > > +/* If ((Local1 == irq))*/
> > > > +if_ctx = aml_if(aml_equal(aml_and(irq_sel,
> > > > +
> > > aml_int(ged_events[i].selector), NULL),
> > > > +
> > > aml_int(ged_events[i].selector)));
> > > > +{
> > > > +/* AML for this specific type of event */
> > > > > +aml_append(if_ctx, ged_aml);
> > > > +}
> > > > +
> > > > +/*
> > > > + * We append the first "if" to the "while" context.
> > > > + * Other "if"s will be "elseif"s.
> > > > + */
> > > > +aml_append(evt, if_ctx);
> > > > +}
> > > > +}
> > > > +
> > >
> > > > +aml_append(dev, aml_name_decl("_HID",
> aml_string("ACPI0013")));
> > > > +aml_append(dev, aml_name_decl("_UID",
> aml_string(GED_DEVICE)));
> > > > +aml_append(dev, aml_name_decl("_CRS", crs));
> > > > +
> > > > +/* Append IO region */
> > > > +aml_append(dev, aml_operation_region(AML_GED_IRQ_REG, rs,
> > > > +   aml_int(s->ged_base + ACPI_GED_IRQ_SEL_OFFSET),
> > > > +   ACPI_GED_IRQ_SEL_LEN));
> > > > +field = aml_field(AML_GED_IRQ_REG, AML_DWORD_ACC,
> > > AML_NOLOCK,
> > > > +  AML_WRITE_AS_ZEROS);
> > > > +aml_append(field, aml_named_field(AML_GED_IRQ_SEL,
> > > > +  ACPI_GED_IRQ_SEL_LEN
> *
> > > 8));
> > > > +aml_append(dev, field);
> > >
> > > > I'd move it up above EVT() method, so it would be clear from the
> > > > beginning for what device we are building AML
> >
> > Ok.
> >
> > >
> > > > +/* Append _EVT method */
> > > > +aml_append(dev, evt);
> > > > +
> > &

Re: [Qemu-devel] [PATCH v4 8/8] hw/arm/boot: Expose the PC-DIMM nodes in the DT

2019-05-03 Thread Shameerali Kolothum Thodi



> -Original Message-
> From: Linuxarm [mailto:linuxarm-boun...@huawei.com] On Behalf Of
> Shameerali Kolothum Thodi
> Sent: 10 April 2019 09:49
> To: Laszlo Ersek ; qemu-devel@nongnu.org;
> qemu-...@nongnu.org; eric.au...@redhat.com; imamm...@redhat.com
> Cc: peter.mayd...@linaro.org; sa...@linux.intel.com;
> ard.biesheu...@linaro.org; Linuxarm ;
> shannon.zha...@gmail.com; sebastien.bo...@intel.com; xuwei (O)
> 
> Subject: RE: [PATCH v4 8/8] hw/arm/boot: Expose the PC-DIMM nodes in the
> DT
> 
> 
> > -Original Message-
> > From: Laszlo Ersek [mailto:ler...@redhat.com]
> > Sent: 09 April 2019 16:09
> > To: Shameerali Kolothum Thodi ;
> > qemu-devel@nongnu.org; qemu-...@nongnu.org; eric.au...@redhat.com;
> > imamm...@redhat.com
> > Cc: peter.mayd...@linaro.org; shannon.zha...@gmail.com;
> > sa...@linux.intel.com; sebastien.bo...@intel.com; xuwei (O)
> > ; ard.biesheu...@linaro.org; Linuxarm
> > 
> > Subject: Re: [PATCH v4 8/8] hw/arm/boot: Expose the PC-DIMM nodes in the
> > DT
> >
> > On 04/09/19 12:29, Shameer Kolothum wrote:
> > > This patch adds memory nodes corresponding to PC-DIMM regions.
> > > This will enable support for cold plugged device memory for Guests
> > > with DT boot.
> > >
> > > Signed-off-by: Shameer Kolothum
> 
> > > Signed-off-by: Eric Auger 
> > > ---
> > >  hw/arm/boot.c | 42 ++
> > >  1 file changed, 42 insertions(+)
> > >
> > > diff --git a/hw/arm/boot.c b/hw/arm/boot.c
> > > index 8c840ba..150e1ed 100644
> > > --- a/hw/arm/boot.c
> > > +++ b/hw/arm/boot.c
> > > @@ -19,6 +19,7 @@
> > >  #include "sysemu/numa.h"
> > >  #include "hw/boards.h"
> > >  #include "hw/loader.h"
> > > +#include "hw/mem/memory-device.h"
> > >  #include "elf.h"
> > >  #include "sysemu/device_tree.h"
> > >  #include "qemu/config-file.h"
> > > @@ -538,6 +539,41 @@ static void fdt_add_psci_node(void *fdt)
> > >  qemu_fdt_setprop_cell(fdt, "/psci", "migrate", migrate_fn);
> > >  }
> > >
> > > +static int fdt_add_hotpluggable_memory_nodes(void *fdt,
> > > + uint32_t acells,
> > uint32_t scells) {
> > > +MemoryDeviceInfoList *info, *info_list = qmp_memory_device_list();
> > > +MemoryDeviceInfo *mi;
> > > +int ret = 0;
> > > +
> > > +for (info = info_list; info != NULL; info = info->next) {
> > > +mi = info->value;
> > > +switch (mi->type) {
> > > +case MEMORY_DEVICE_INFO_KIND_DIMM:
> > > +{
> > > +PCDIMMDeviceInfo *di = mi->u.dimm.data;
> > > +
> > > +ret = fdt_add_memory_node(fdt, acells, di->addr, scells,
> > > +  di->size, di->node, true);
> > > +if (ret) {
> > > +fprintf(stderr,
> > > +"couldn't add PCDIMM
> /memory@%"PRIx64"
> > node\n",
> > > +di->addr);
> > > +goto out;
> > > +}
> > > +break;
> > > +}
> > > +default:
> > > +fprintf(stderr, "%s memory nodes are not yet supported\n",
> > > +MemoryDeviceInfoKind_str(mi->type));
> > > +ret = -ENOENT;
> > > +goto out;
> > > +}
> > > +}
> > > +out:
> > > +qapi_free_MemoryDeviceInfoList(info_list);
> > > +return ret;
> > > +}
> > > +
> > >  int arm_load_dtb(hwaddr addr, const struct arm_boot_info *binfo,
> > >   hwaddr addr_limit, AddressSpace *as)
> > >  {
> > > @@ -637,6 +673,12 @@ int arm_load_dtb(hwaddr addr, const struct
> > arm_boot_info *binfo,
> > >  }
> > >  }
> > >
> > > +rc = fdt_add_hotpluggable_memory_nodes(fdt, acells, scells);
> > > +if (rc < 0) {
> > > +fprintf(stderr, "couldn't add hotpluggable memory nodes\n");
> > > +goto fail;
> > > +}
> > > +
> > >  rc = fdt_path_offset(fdt, "/chosen");
> > >  if (rc < 0) {
> > >  qemu_fdt_add_subnode(fdt, "/chosen"

Re: [Qemu-devel] [PATCH v4 3/8] hw/acpi: Add ACPI Generic Event Device Support

2019-05-03 Thread Shameerali Kolothum Thodi
Hi Igor,

> -Original Message-
> From: Igor Mammedov [mailto:imamm...@redhat.com]
> Sent: 02 May 2019 17:13
> To: Shameerali Kolothum Thodi 
> Cc: qemu-devel@nongnu.org; qemu-...@nongnu.org;
> eric.au...@redhat.com; peter.mayd...@linaro.org;
> shannon.zha...@gmail.com; sa...@linux.intel.com;
> sebastien.bo...@intel.com; xuwei (O) ;
> ler...@redhat.com; ard.biesheu...@linaro.org; Linuxarm
> 
> Subject: Re: [PATCH v4 3/8] hw/acpi: Add ACPI Generic Event Device Support
> 
> On Tue, 9 Apr 2019 11:29:30 +0100
> Shameer Kolothum  wrote:
> 
> > From: Samuel Ortiz 
> >
> > The ACPI Generic Event Device (GED) is a hardware-reduced specific
> > device[ACPI v6.1 Section 5.6.9] that handles all platform events,
> > including the hotplug ones. This patch generates the AML code that
> > defines GEDs.
> >
> > Platforms need to specify their own GedEvent array to describe what
> > kind of events they want to support through GED. Also this uses a
> > single interrupt for the GED device, relying on IO memory region
> > to communicate the type of device affected by the interrupt. This
> > way, we can support up to 32 events with a unique interrupt.
> >
> > This supports only memory hotplug for now.
> >
> > Signed-off-by: Samuel Ortiz 
> > Signed-off-by: Sebastien Boeuf 
> > Signed-off-by: Shameer Kolothum 
> > ---
> >  hw/acpi/Kconfig|   4 +
> >  hw/acpi/Makefile.objs  |   1 +
> >  hw/acpi/generic_event_device.c | 311
> +
> >  include/hw/acpi/generic_event_device.h | 121 +
> >  4 files changed, 437 insertions(+)
> >  create mode 100644 hw/acpi/generic_event_device.c
> >  create mode 100644 include/hw/acpi/generic_event_device.h
> >
> > diff --git a/hw/acpi/Kconfig b/hw/acpi/Kconfig
> > index eca3bee..01a8b41 100644
> > --- a/hw/acpi/Kconfig
> > +++ b/hw/acpi/Kconfig
> > @@ -27,3 +27,7 @@ config ACPI_VMGENID
> >  bool
> >  default y
> >  depends on PC
> > +
> > +config ACPI_HW_REDUCED
> > +bool
> > +depends on ACPI
> > diff --git a/hw/acpi/Makefile.objs b/hw/acpi/Makefile.objs
> > index 2d46e37..b753232 100644
> > --- a/hw/acpi/Makefile.objs
> > +++ b/hw/acpi/Makefile.objs
> > @@ -6,6 +6,7 @@ common-obj-$(CONFIG_ACPI_MEMORY_HOTPLUG) +=
> memory_hotplug.o
> >  common-obj-$(CONFIG_ACPI_CPU_HOTPLUG) += cpu.o
> >  common-obj-$(CONFIG_ACPI_NVDIMM) += nvdimm.o
> >  common-obj-$(CONFIG_ACPI_VMGENID) += vmgenid.o
> > +common-obj-$(CONFIG_ACPI_HW_REDUCED) += generic_event_device.o
> >  common-obj-$(call lnot,$(CONFIG_ACPI_X86)) += acpi-stub.o
> >
> >  common-obj-y += acpi_interface.o
> > diff --git a/hw/acpi/generic_event_device.c
> b/hw/acpi/generic_event_device.c
> > new file mode 100644
> > index 000..856ca04
> > --- /dev/null
> > +++ b/hw/acpi/generic_event_device.c
> > @@ -0,0 +1,311 @@
> > +/*
> > + *
> > + * Copyright (c) 2018 Intel Corporation
> > + *
> > + * This program is free software; you can redistribute it and/or modify it
> > + * under the terms and conditions of the GNU General Public License,
> > + * version 2 or later, as published by the Free Software Foundation.
> > + *
> > + * This program is distributed in the hope it will be useful, but WITHOUT
> > + * ANY WARRANTY; without even the implied warranty of
> MERCHANTABILITY or
> > + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
> License for
> > + * more details.
> > + *
> > + * You should have received a copy of the GNU General Public License along
> with
> > + * this program.  If not, see <http://www.gnu.org/licenses/>.
> > + */
> > +
> > +#include "qemu/osdep.h"
> > +#include "qapi/error.h"
> > +#include "exec/address-spaces.h"
> > +#include "hw/sysbus.h"
> > +#include "hw/acpi/acpi.h"
> > +#include "hw/acpi/generic_event_device.h"
> > +#include "hw/mem/pc-dimm.h"
> > +
> > +static Aml *ged_event_aml(const GedEvent *event)
> > +{
> > +
> > +if (!event) {
> In general, I prefer to check the condition for calling something before
> making the call. This way one can see in the caller why and what is called,
> which is clearer.

Ok. I will move it then.
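
For illustration only, the caller-side check suggested above could look roughly like the sketch below. It is self-contained and uses stand-in names (`Event`, `event_to_code`, `build_all`) that are hypothetical; it is not the QEMU `GedEvent`/`Aml` code from the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an event descriptor; not QEMU's GedEvent. */
typedef struct {
    int kind;
} Event;

/* Callee assumes a valid event; the NULL check lives in the caller. */
static int event_to_code(const Event *event)
{
    assert(event != NULL);
    return event->kind * 10;
}

/* Caller decides whether the call is made, so the condition is visible here. */
static int build_all(const Event *events[], size_t n)
{
    int built = 0;

    for (size_t i = 0; i < n; i++) {
        if (!events[i]) {
            continue; /* skip empty slots before calling */
        }
        (void)event_to_code(events[i]);
        built++;
    }
    return built;
}
```

With the check in the caller, a reader of `build_all` sees directly which entries are skipped and why.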

> 
> > +return NULL;
> > +}
> > +
> > +switch (event->event) {
> > +case GED_MEMORY_HOTPLUG:
> > +/* We run a complete memory SCAN when getting a memory
>

Re: [Qemu-devel] [PATCH v4 4/8] hw/arm/virt: Add memory hotplug framework

2019-05-03 Thread Shameerali Kolothum Thodi



> -Original Message-
> From: Igor Mammedov [mailto:imamm...@redhat.com]
> Sent: 02 May 2019 17:19
> To: Shameerali Kolothum Thodi 
> Cc: qemu-devel@nongnu.org; qemu-...@nongnu.org;
> eric.au...@redhat.com; peter.mayd...@linaro.org;
> shannon.zha...@gmail.com; sa...@linux.intel.com;
> sebastien.bo...@intel.com; xuwei (O) ;
> ler...@redhat.com; ard.biesheu...@linaro.org; Linuxarm
> 
> Subject: Re: [PATCH v4 4/8] hw/arm/virt: Add memory hotplug framework
> 
> On Tue, 9 Apr 2019 11:29:31 +0100
> Shameer Kolothum  wrote:
> 
> > From: Eric Auger 
> >
> > This patch adds the memory hot-plug/hot-unplug infrastructure
> > in machvirt. The device memory is not yet exposed to the Guest
> > either though DT or ACPI and hence both cold/hot plug of memory
> s/though/through/

Sure.

> > is explicitly disabled for now.
> >
> > Signed-off-by: Eric Auger 
> > Signed-off-by: Kwangwoo Lee 
> > Signed-off-by: Shameer Kolothum 
> > ---
> >  default-configs/arm-softmmu.mak |  3 +++
> >  hw/arm/virt.c   | 45
> -
> >  2 files changed, 47 insertions(+), 1 deletion(-)
> >
> > diff --git a/default-configs/arm-softmmu.mak
> b/default-configs/arm-softmmu.mak
> > index 613d19a..9f4b803 100644
> > --- a/default-configs/arm-softmmu.mak
> > +++ b/default-configs/arm-softmmu.mak
> > @@ -160,3 +160,6 @@ CONFIG_MUSICPAL=y
> >
> >  # for realview and versatilepb
> >  CONFIG_LSI_SCSI_PCI=y
> > +
> > +CONFIG_MEM_DEVICE=y
> > +CONFIG_DIMM=y
> > diff --git a/hw/arm/virt.c b/hw/arm/virt.c
> > index ce2664a..da516b3 100644
> > --- a/hw/arm/virt.c
> > +++ b/hw/arm/virt.c
> > @@ -61,6 +61,8 @@
> >  #include "hw/arm/smmuv3.h"
> >  #include "hw/acpi/acpi.h"
> >  #include "target/arm/internals.h"
> > +#include "hw/mem/pc-dimm.h"
> > +#include "hw/mem/nvdimm.h"
> >
> >  #define DEFINE_VIRT_MACHINE_LATEST(major, minor, latest) \
> >  static void virt_##major##_##minor##_class_init(ObjectClass *oc, \
> > @@ -1806,6 +1808,34 @@ static const CPUArchIdList
> *virt_possible_cpu_arch_ids(MachineState *ms)
> >  return ms->possible_cpus;
> >  }
> >
> > +static void virt_memory_pre_plug(HotplugHandler *hotplug_dev,
> DeviceState *dev,
> > + Error **errp)
> > +{
> > +if (object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM)) {
> > +error_setg(errp, "memory cold/hot plug is not yet supported");
> > +return;
> > +}
> add comment here why it's needed.

Ok.

> 
> > +
> > +pc_dimm_pre_plug(PC_DIMM(dev), MACHINE(hotplug_dev), NULL,
> errp);
> maybe before calling this there probably should be check if acpi is enabled.
> 
> not sure if arm/virt board honors -no-acpi CLI option.

Ok. I will check this.

Thanks,
Shameer
 
> > +}
> > +
> > +static void virt_memory_plug(HotplugHandler *hotplug_dev,
> > + DeviceState *dev, Error **errp)
> > +{
> > +VirtMachineState *vms = VIRT_MACHINE(hotplug_dev);
> > +
> > +pc_dimm_plug(PC_DIMM(dev), MACHINE(vms), NULL);
> > +
> > +}
> > +
> > +static void virt_machine_device_pre_plug_cb(HotplugHandler *hotplug_dev,
> > +DeviceState *dev,
> Error **errp)
> > +{
> > +if (object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM)) {
> > +virt_memory_pre_plug(hotplug_dev, dev, errp);
> > +}
> > +}
> > +
> >  static void virt_machine_device_plug_cb(HotplugHandler *hotplug_dev,
> >  DeviceState *dev, Error
> **errp)
> >  {
> > @@ -1817,12 +1847,23 @@ static void
> virt_machine_device_plug_cb(HotplugHandler *hotplug_dev,
> >   SYS_BUS_DEVICE(dev));
> >  }
> >  }
> > +if (object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM)) {
> > +virt_memory_plug(hotplug_dev, dev, errp);
> > +}
> > +}
> > +
> > +static void virt_machine_device_unplug_request_cb(HotplugHandler
> *hotplug_dev,
> > +  DeviceState *dev, Error
> **errp)
> > +{
> > +error_setg(errp, "device unplug request for unsupported device"
> > +   " type: %s", object_get_typename(OBJECT(dev)));
> >  }
> >
> >  static HotplugHandler *virt_machine_get_hotplug_handler(MachineState
> *machine,
> >
> DeviceState *dev)
>

Re: [Qemu-devel] [PATCH v4 3/8] hw/acpi: Add ACPI Generic Event Device Support

2019-05-01 Thread Shameerali Kolothum Thodi
Hi Ard,

> -Original Message-
> From: Ard Biesheuvel [mailto:ard.biesheu...@linaro.org]
> Sent: 01 May 2019 12:10
> To: Shameerali Kolothum Thodi 
> Cc: QEMU Developers ; qemu-arm
> ; Auger Eric ; Igor
> Mammedov ; Peter Maydell
> ; shannon.zha...@gmail.com;
> sa...@linux.intel.com; sebastien.bo...@intel.com; xuwei (O)
> ; Laszlo Ersek ; Linuxarm
> 
> Subject: Re: [PATCH v4 3/8] hw/acpi: Add ACPI Generic Event Device Support
> 
> On Tue, 9 Apr 2019 at 12:31, Shameer Kolothum
>  wrote:
> >
> > From: Samuel Ortiz 
> >
> > The ACPI Generic Event Device (GED) is a hardware-reduced specific
> > device[ACPI v6.1 Section 5.6.9] that handles all platform events,
> > including the hotplug ones.This patch generates the AML code that
> > defines GEDs.
> >
> > Platforms need to specify their own GedEvent array to describe what
> > kind of events they want to support through GED.  Also this uses a
> > a single interrupt for the  GED device, relying on IO memory region
> > to communicate the type of device affected by the interrupt. This
> > way, we can support up to 32 events with a unique interrupt.
> >
> > This supports only memory hotplug for now.
> >
> > Signed-off-by: Samuel Ortiz 
> > Signed-off-by: Sebastien Boeuf 
> > Signed-off-by: Shameer Kolothum 
> 
> Apologies if this question has been raised before, but do we really
> need a separate device for this? We already handle the power button
> via _AEI/_Exx on the GPIO device, and I think we should be able to add
> additional events using that interface, rather than have two event
> signalling methods/devices on the same platform.

Right. The initial RFC was based on the GPIO device [1], and Igor later
commented here [2] that:

" ARM boards were first to use ACPI hw-reduced profile so they picked up
available back then GPIO based way to deliver hotplug event, later spec
introduced Generic Event Device for that means to use with hw-reduced
profile, which NEMU implemented[1], so I'd use that rather than ad-hoc
GPIO mapping. I'd guess it will more compatible with various contemporary
guests and we could reuse the same code for both x86/arm virt boards) "

Thanks,
Shameer

[1] https://patchwork.kernel.org/cover/10783589/
[2] http://patchwork.ozlabs.org/cover/1045604/
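
To make the "single interrupt plus IO memory region" idea from the commit message concrete, here is a minimal, self-contained sketch. All names (`SketchGed`, `sketch_ged_send_event`, `sketch_ged_read_sel`) are hypothetical, and the read-to-clear behaviour is an assumption made for the sketch, not something taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical event bits; names and values are illustrative only. */
enum {
    SKETCH_EVT_MEMORY_HOTPLUG = 1u << 0,
    SKETCH_EVT_OTHER          = 1u << 1,
};

/* Hypothetical device state: one selector register, one interrupt line. */
typedef struct {
    uint32_t sel;        /* pending-event bitmap, up to 32 events */
    unsigned irq_pulses; /* counts how often the single IRQ was raised */
} SketchGed;

/* Device side: latch the event bit, then raise the shared interrupt. */
static void sketch_ged_send_event(SketchGed *s, uint32_t evt)
{
    assert(evt != 0);
    s->sel |= evt;
    s->irq_pulses++;
}

/* Guest side: read pending events; the read clears them (assumed). */
static uint32_t sketch_ged_read_sel(SketchGed *s)
{
    uint32_t pending = s->sel;

    s->sel = 0;
    return pending;
}
```

Because the register is a 32-bit bitmap, one interrupt can carry up to 32 distinct event types; the handler reads the bitmap and dispatches on each set bit.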

> 
> 
> > ---
> >  hw/acpi/Kconfig|   4 +
> >  hw/acpi/Makefile.objs  |   1 +
> >  hw/acpi/generic_event_device.c | 311
> +
> >  include/hw/acpi/generic_event_device.h | 121 +
> >  4 files changed, 437 insertions(+)
> >  create mode 100644 hw/acpi/generic_event_device.c
> >  create mode 100644 include/hw/acpi/generic_event_device.h
> >
> > diff --git a/hw/acpi/Kconfig b/hw/acpi/Kconfig
> > index eca3bee..01a8b41 100644
> > --- a/hw/acpi/Kconfig
> > +++ b/hw/acpi/Kconfig
> > @@ -27,3 +27,7 @@ config ACPI_VMGENID
> >  bool
> >  default y
> >  depends on PC
> > +
> > +config ACPI_HW_REDUCED
> > +bool
> > +depends on ACPI
> > diff --git a/hw/acpi/Makefile.objs b/hw/acpi/Makefile.objs
> > index 2d46e37..b753232 100644
> > --- a/hw/acpi/Makefile.objs
> > +++ b/hw/acpi/Makefile.objs
> > @@ -6,6 +6,7 @@ common-obj-$(CONFIG_ACPI_MEMORY_HOTPLUG) +=
> memory_hotplug.o
> >  common-obj-$(CONFIG_ACPI_CPU_HOTPLUG) += cpu.o
> >  common-obj-$(CONFIG_ACPI_NVDIMM) += nvdimm.o
> >  common-obj-$(CONFIG_ACPI_VMGENID) += vmgenid.o
> > +common-obj-$(CONFIG_ACPI_HW_REDUCED) += generic_event_device.o
> >  common-obj-$(call lnot,$(CONFIG_ACPI_X86)) += acpi-stub.o
> >
> >  common-obj-y += acpi_interface.o
> > diff --git a/hw/acpi/generic_event_device.c
> b/hw/acpi/generic_event_device.c
> > new file mode 100644
> > index 000..856ca04
> > --- /dev/null
> > +++ b/hw/acpi/generic_event_device.c
> > @@ -0,0 +1,311 @@
> > +/*
> > + *
> > + * Copyright (c) 2018 Intel Corporation
> > + *
> > + * This program is free software; you can redistribute it and/or modify it
> > + * under the terms and conditions of the GNU General Public License,
> > + * version 2 or later, as published by the Free Software Foundation.
> > + *
> > + * This program is distributed in the hope it will be useful, but WITHOUT
> > + * ANY WARRANTY; without even the implied warranty of
> MERCHANTABILITY or
> > + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
> License for
> > + * more details.
> > + *
> > + * You should have received a copy of the GNU General Public License along
> with
> > + * this program.

Re: [Qemu-devel] [PATCH v4 5/8] hw/arm/virt: Enable device memory cold/hot plug with ACPI boot

2019-05-01 Thread Shameerali Kolothum Thodi
Hi Eric,

> -Original Message-
> From: Auger Eric [mailto:eric.au...@redhat.com]
> Sent: 30 April 2019 17:34
> To: Shameerali Kolothum Thodi ;
> qemu-devel@nongnu.org; qemu-...@nongnu.org; imamm...@redhat.com
> Cc: peter.mayd...@linaro.org; shannon.zha...@gmail.com;
> sa...@linux.intel.com; sebastien.bo...@intel.com; xuwei (O)
> ; ler...@redhat.com; ard.biesheu...@linaro.org;
> Linuxarm 
> Subject: Re: [PATCH v4 5/8] hw/arm/virt: Enable device memory cold/hot plug
> with ACPI boot
> 
> Hi Shameer,
> 
> On 4/9/19 12:29 PM, Shameer Kolothum wrote:
> > This initializes the GED device with base memory and irq, configures
> > ged memory hotplug event and builds the corresponding aml code. GED
> > irq routing to Guest is also enabled. With this, both hot and cold
> > plug of device memory is enabled now for Guest with ACPI boot.
> >
> > Memory cold plug support with Guest DT boot is not yet supported.
> >
> > Signed-off-by: Shameer Kolothum 
> Individual history logs may be helpful to follow the changes (change in
> MMIO region size, ...)

Ok. Noted.

> > ---
> >  default-configs/arm-softmmu.mak |  2 ++
> >  hw/arm/virt-acpi-build.c|  9 ++
> >  hw/arm/virt.c   | 61
> +++--
> >  include/hw/arm/virt.h   |  4 +++
> >  4 files changed, 73 insertions(+), 3 deletions(-)
> >
> > diff --git a/default-configs/arm-softmmu.mak
> b/default-configs/arm-softmmu.mak
> > index 9f4b803..c9a9b34 100644
> > --- a/default-configs/arm-softmmu.mak
> > +++ b/default-configs/arm-softmmu.mak
> > @@ -163,3 +163,5 @@ CONFIG_LSI_SCSI_PCI=y
> >
> >  CONFIG_MEM_DEVICE=y
> >  CONFIG_DIMM=y
> > +CONFIG_ACPI_MEMORY_HOTPLUG=y
> > +CONFIG_ACPI_HW_REDUCED=y
> > diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
> > index bf9c0bc..1ad394b 100644
> > --- a/hw/arm/virt-acpi-build.c
> > +++ b/hw/arm/virt-acpi-build.c
> > @@ -40,6 +40,8 @@
> >  #include "hw/loader.h"
> >  #include "hw/hw.h"
> >  #include "hw/acpi/aml-build.h"
> > +#include "hw/acpi/memory_hotplug.h"
> > +#include "hw/acpi/generic_event_device.h"
> >  #include "hw/pci/pcie_host.h"
> >  #include "hw/pci/pci.h"
> >  #include "hw/arm/virt.h"
> > @@ -727,6 +729,7 @@ static void
> >  build_dsdt(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
> >  {
> >  Aml *scope, *dsdt;
> > +MachineState *ms = MACHINE(vms);
> >  const MemMapEntry *memmap = vms->memmap;
> >  const int *irqmap = vms->irqmap;
> >
> > @@ -753,6 +756,12 @@ build_dsdt(GArray *table_data, BIOSLinker *linker,
> VirtMachineState *vms)
> > (irqmap[VIRT_GPIO] + ARM_SPI_BASE));
> >  acpi_dsdt_add_power_button(scope);
> >
> > +build_ged_aml(scope, "\\_SB."GED_DEVICE,
> HOTPLUG_HANDLER(vms->acpi_dev),
> > +  irqmap[VIRT_ACPI_GED] + ARM_SPI_BASE,
> AML_SYSTEM_MEMORY);
> > +
> > +build_memory_hotplug_aml(scope, ms->ram_slots, "\\_SB", NULL,
> > + AML_SYSTEM_MEMORY);
> > +
> >  aml_append(dsdt, scope);
> >
> >  /* copy AML table into ACPI tables blob and patch header there */
> > diff --git a/hw/arm/virt.c b/hw/arm/virt.c
> > index da516b3..8179b3e 100644
> > --- a/hw/arm/virt.c
> > +++ b/hw/arm/virt.c
> > @@ -63,6 +63,7 @@
> >  #include "target/arm/internals.h"
> >  #include "hw/mem/pc-dimm.h"
> >  #include "hw/mem/nvdimm.h"
> > +#include "hw/acpi/generic_event_device.h"
> >
> >  #define DEFINE_VIRT_MACHINE_LATEST(major, minor, latest) \
> >  static void virt_##major##_##minor##_class_init(ObjectClass *oc, \
> > @@ -133,6 +134,8 @@ static const MemMapEntry base_memmap[] = {
> >  [VIRT_GPIO] =   { 0x0903, 0x1000 },
> >  [VIRT_SECURE_UART] ={ 0x0904, 0x1000 },
> >  [VIRT_SMMU] =   { 0x0905, 0x0002 },
> > +[VIRT_PCDIMM_ACPI] ={ 0x0907,
> MEMORY_HOTPLUG_IO_LEN },
> > +[VIRT_ACPI_GED] =   { 0x0908, ACPI_GED_REG_LEN },
> >  [VIRT_MMIO] =   { 0x0a00, 0x0200 },
> >  /* ...repeating for a total of NUM_VIRTIO_TRANSPORTS, each of that
> size */
> >  [VIRT_PLATFORM_BUS] =   { 0x0c00, 0x0200 },
> > @@ -168,6 +171,7 @@ static const int a15irqmap[] = {
> >  [VIRT_PCIE] = 3, /* ... to 6 */
>

Re: [Qemu-devel] [PATCH v4 3/8] hw/acpi: Add ACPI Generic Event Device Support

2019-05-01 Thread Shameerali Kolothum Thodi
Hi Eric,

> -Original Message-
> From: Auger Eric [mailto:eric.au...@redhat.com]
> Sent: 30 April 2019 16:50
> To: Shameerali Kolothum Thodi ;
> qemu-devel@nongnu.org; qemu-...@nongnu.org; imamm...@redhat.com
> Cc: peter.mayd...@linaro.org; shannon.zha...@gmail.com;
> sa...@linux.intel.com; sebastien.bo...@intel.com; xuwei (O)
> ; ler...@redhat.com; ard.biesheu...@linaro.org;
> Linuxarm 
> Subject: Re: [PATCH v4 3/8] hw/acpi: Add ACPI Generic Event Device Support
> 
> Hi Shameer,
> 
> On 4/9/19 12:29 PM, Shameer Kolothum wrote:
> > From: Samuel Ortiz 
> >
> > The ACPI Generic Event Device (GED) is a hardware-reduced specific
> > device[ACPI v6.1 Section 5.6.9] that handles all platform events,
> > including the hotplug ones.This patch generates the AML code that
> > defines GEDs.
> >
> > Platforms need to specify their own GedEvent array to describe what
> > kind of events they want to support through GED.  Also this uses a
> > a single interrupt for the  GED device, relying on IO memory region
> > to communicate the type of device affected by the interrupt. This
> > way, we can support up to 32 events with a unique interrupt.
> >
> > This supports only memory hotplug for now.
> >
> > Signed-off-by: Samuel Ortiz 
> > Signed-off-by: Sebastien Boeuf 
> > Signed-off-by: Shameer Kolothum 
> > ---
> >  hw/acpi/Kconfig|   4 +
> >  hw/acpi/Makefile.objs  |   1 +
> >  hw/acpi/generic_event_device.c | 311
> +
> >  include/hw/acpi/generic_event_device.h | 121 +
> >  4 files changed, 437 insertions(+)
> >  create mode 100644 hw/acpi/generic_event_device.c
> >  create mode 100644 include/hw/acpi/generic_event_device.h
> >
> > diff --git a/hw/acpi/Kconfig b/hw/acpi/Kconfig
> > index eca3bee..01a8b41 100644
> > --- a/hw/acpi/Kconfig
> > +++ b/hw/acpi/Kconfig
> > @@ -27,3 +27,7 @@ config ACPI_VMGENID
> >  bool
> >  default y
> >  depends on PC
> > +
> > +config ACPI_HW_REDUCED
> > +bool
> > +depends on ACPI
> > diff --git a/hw/acpi/Makefile.objs b/hw/acpi/Makefile.objs
> > index 2d46e37..b753232 100644
> > --- a/hw/acpi/Makefile.objs
> > +++ b/hw/acpi/Makefile.objs
> > @@ -6,6 +6,7 @@ common-obj-$(CONFIG_ACPI_MEMORY_HOTPLUG) +=
> memory_hotplug.o
> >  common-obj-$(CONFIG_ACPI_CPU_HOTPLUG) += cpu.o
> >  common-obj-$(CONFIG_ACPI_NVDIMM) += nvdimm.o
> >  common-obj-$(CONFIG_ACPI_VMGENID) += vmgenid.o
> > +common-obj-$(CONFIG_ACPI_HW_REDUCED) += generic_event_device.o
> >  common-obj-$(call lnot,$(CONFIG_ACPI_X86)) += acpi-stub.o
> >
> >  common-obj-y += acpi_interface.o
> > diff --git a/hw/acpi/generic_event_device.c
> b/hw/acpi/generic_event_device.c
> > new file mode 100644
> > index 000..856ca04
> > --- /dev/null
> > +++ b/hw/acpi/generic_event_device.c
> > @@ -0,0 +1,311 @@
> > +/*
> > + *
> > + * Copyright (c) 2018 Intel Corporation
> > + *
> > + * This program is free software; you can redistribute it and/or modify it
> > + * under the terms and conditions of the GNU General Public License,
> > + * version 2 or later, as published by the Free Software Foundation.
> I am not sure we need below statements: see hw/misc/armsse-mhu.c for a
> recent added file.

Ok. I will get rid of this.
 
> > + *
> > + * This program is distributed in the hope it will be useful, but WITHOUT
> > + * ANY WARRANTY; without even the implied warranty of
> MERCHANTABILITY or
> > + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
> License for
> > + * more details.
> > + *
> > + * You should have received a copy of the GNU General Public License along
> with
> > + * this program.  If not, see <http://www.gnu.org/licenses/>.
> > + */
> > +
> > +#include "qemu/osdep.h"
> > +#include "qapi/error.h"
> > +#include "exec/address-spaces.h"
> > +#include "hw/sysbus.h"
> > +#include "hw/acpi/acpi.h"
> > +#include "hw/acpi/generic_event_device.h"
> > +#include "hw/mem/pc-dimm.h"
> > +
> > +static Aml *ged_event_aml(const GedEvent *event)
> > +{
> > +
> > +if (!event) {
> > +return NULL;
> > +}
> > +
> > +switch (event->event) {
> > +case GED_MEMORY_HOTPLUG:
> > +/* We run a complete memory SCAN when getting a memory
> hotplug event */
> > +return aml_call0(MEMORY_DEVICES_CONTAI

Re: [Qemu-devel] [PATCH v4 8/8] hw/arm/boot: Expose the PC-DIMM nodes in the DT

2019-04-10 Thread Shameerali Kolothum Thodi

> -Original Message-
> From: Laszlo Ersek [mailto:ler...@redhat.com]
> Sent: 09 April 2019 16:09
> To: Shameerali Kolothum Thodi ;
> qemu-devel@nongnu.org; qemu-...@nongnu.org; eric.au...@redhat.com;
> imamm...@redhat.com
> Cc: peter.mayd...@linaro.org; shannon.zha...@gmail.com;
> sa...@linux.intel.com; sebastien.bo...@intel.com; xuwei (O)
> ; ard.biesheu...@linaro.org; Linuxarm
> 
> Subject: Re: [PATCH v4 8/8] hw/arm/boot: Expose the PC-DIMM nodes in the
> DT
> 
> On 04/09/19 12:29, Shameer Kolothum wrote:
> > This patch adds memory nodes corresponding to PC-DIMM regions.
> > This will enable support for cold plugged device memory for Guests
> > with DT boot.
> >
> > Signed-off-by: Shameer Kolothum 
> > Signed-off-by: Eric Auger 
> > ---
> >  hw/arm/boot.c | 42 ++
> >  1 file changed, 42 insertions(+)
> >
> > diff --git a/hw/arm/boot.c b/hw/arm/boot.c
> > index 8c840ba..150e1ed 100644
> > --- a/hw/arm/boot.c
> > +++ b/hw/arm/boot.c
> > @@ -19,6 +19,7 @@
> >  #include "sysemu/numa.h"
> >  #include "hw/boards.h"
> >  #include "hw/loader.h"
> > +#include "hw/mem/memory-device.h"
> >  #include "elf.h"
> >  #include "sysemu/device_tree.h"
> >  #include "qemu/config-file.h"
> > @@ -538,6 +539,41 @@ static void fdt_add_psci_node(void *fdt)
> >  qemu_fdt_setprop_cell(fdt, "/psci", "migrate", migrate_fn);
> >  }
> >
> > +static int fdt_add_hotpluggable_memory_nodes(void *fdt,
> > + uint32_t acells,
> uint32_t scells) {
> > +MemoryDeviceInfoList *info, *info_list = qmp_memory_device_list();
> > +MemoryDeviceInfo *mi;
> > +int ret = 0;
> > +
> > +for (info = info_list; info != NULL; info = info->next) {
> > +mi = info->value;
> > +switch (mi->type) {
> > +case MEMORY_DEVICE_INFO_KIND_DIMM:
> > +{
> > +PCDIMMDeviceInfo *di = mi->u.dimm.data;
> > +
> > +ret = fdt_add_memory_node(fdt, acells, di->addr, scells,
> > +  di->size, di->node, true);
> > +if (ret) {
> > +fprintf(stderr,
> > +"couldn't add PCDIMM /memory@%"PRIx64"
> node\n",
> > +di->addr);
> > +goto out;
> > +}
> > +break;
> > +}
> > +default:
> > +fprintf(stderr, "%s memory nodes are not yet supported\n",
> > +MemoryDeviceInfoKind_str(mi->type));
> > +ret = -ENOENT;
> > +goto out;
> > +}
> > +}
> > +out:
> > +qapi_free_MemoryDeviceInfoList(info_list);
> > +return ret;
> > +}
> > +
> >  int arm_load_dtb(hwaddr addr, const struct arm_boot_info *binfo,
> >   hwaddr addr_limit, AddressSpace *as)
> >  {
> > @@ -637,6 +673,12 @@ int arm_load_dtb(hwaddr addr, const struct
> arm_boot_info *binfo,
> >  }
> >  }
> >
> > +rc = fdt_add_hotpluggable_memory_nodes(fdt, acells, scells);
> > +if (rc < 0) {
> > +fprintf(stderr, "couldn't add hotpluggable memory nodes\n");
> > +goto fail;
> > +}
> > +
> >  rc = fdt_path_offset(fdt, "/chosen");
> >  if (rc < 0) {
> >  qemu_fdt_add_subnode(fdt, "/chosen");
> >
> 
> 
> Given patches #7 and #8, as I understand them, the firmware cannot
> distinguish hotpluggable & present, from hotpluggable & absent. The firmware
> can only skip both hotpluggable cases. That's fine in that the firmware will hog
> neither type -- but is that OK for the OS as well, for both ACPI boot and DT
> boot?

Right. This only handles the hotpluggable-and-present condition.

> Consider in particular the "hotpluggable & present, ACPI boot" case. Assuming
> we modify the firmware to skip "hotpluggable" altogether, the UEFI memmap
> will not include the range despite it being present at boot. Presumably, ACPI
> will refer to the range somehow, however. Will that not confuse the OS?

From my testing so far, without patches #7 and #8 (i.e., no UEFI memmap entry),
ACPI boot works fine. I think ACPI relies only on the AML and SRAT.
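
As a rough illustration of the "firmware skips hotpluggable ranges" behaviour discussed here, the sketch below models memory nodes with a hotpluggable marker and sums only the ranges the firmware would claim. The struct and function names are hypothetical; this is neither the firmware's nor QEMU's actual code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical view of a DT memory node as seen by firmware. */
typedef struct {
    uint64_t base;
    uint64_t size;
    bool hotpluggable; /* set when the node is marked hotpluggable */
} SketchMemNode;

/* Sum the bytes the firmware may use, skipping every hotpluggable node. */
static uint64_t sketch_firmware_usable_bytes(const SketchMemNode *nodes,
                                             size_t n)
{
    uint64_t total = 0;

    assert(nodes != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (nodes[i].hotpluggable) {
            continue; /* present or absent, firmware leaves it alone */
        }
        total += nodes[i].size;
    }
    return total;
}
```

Note that the marker alone does not say whether the range is populated at boot, which is the ambiguity raised above.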
 
> When Igor raised this earlier, I suggested that hotpluggable-and-present
> should be added b

Re: [Qemu-devel] [PATCH v3 07/10] hw/arm/virt: Introduce opt-in feature "fdt"

2019-04-09 Thread Shameerali Kolothum Thodi



> -Original Message-
> From: Igor Mammedov [mailto:imamm...@redhat.com]
> Sent: 08 April 2019 09:12
> To: Shameerali Kolothum Thodi 
> Cc: Laszlo Ersek ; Auger Eric ;
> Ard Biesheuvel ; peter.mayd...@linaro.org;
> sa...@linux.intel.com; qemu-devel@nongnu.org; Linuxarm
> ; shannon.zha...@gmail.com;
> qemu-...@nongnu.org; xuwei (O) ;
> sebastien.bo...@intel.com; Leif Lindholm 
> Subject: Re: [Qemu-devel] [PATCH v3 07/10] hw/arm/virt: Introduce opt-in
> feature "fdt"

[...]
 
> > > > If the above is correct(with 32-bit variant of UEFI, OS cannot have ACPI
> boot),
> > > > then do we really have the issue of memory becoming non
> > > hot-un-unpluggable?
> > > > May be I am missing something.
> > >
> > > I think Igor and Peter dislike adding complex logic to QEMU that
> > > reflects the behavior of a specific firmware. AIUI their objection isn't
> > > that it wouldn't work, but that it's not the right thing to do, from a
> > > design perspective.
> >
> > Understood. Hope we can converge on something soon.
> Let's try adding a parameter to memory descriptors in DT that would mark
> them as hotpluggable.

Just sent out v4 incorporating this. Please take a look and let me know.

Thanks,
Shameer
 


