Re: [PATCH v2] i386: Add the support for AMD EPYC 3rd generation processors

2021-02-24 Thread Pankaj Gupta
> It is normally added as v2 for compatibility. Like this.

o.k. Thanks!
I will test this tomorrow.

>
> diff --git a/target/i386/cpu.c b/target/i386/cpu.c
> index 24db7ed892..f721d0db78 100644
> --- a/target/i386/cpu.c
> +++ b/target/i386/cpu.c
> @@ -4179,6 +4179,20 @@ static X86CPUDefinition builtin_x86_defs[] = {
>  .xlevel = 0x8000001E,
>  .model_id = "AMD EPYC-Rome Processor",
>  .cache_info = &epyc_rome_cache_info,
> +.versions = (X86CPUVersionDefinition[]) {
> +{ .version = 1 },
> +{
> +.version = 2,
> +.props = (PropValue[]) {
> +{ "ibrs", "on" },
> +{ "amd-ssbd", "on" },
> +{ "model-id",
> +  "AMD EPYC-Rome Processor" },
> +{ /* end of list */ }
> +}
> +},
> +{ /* end of list */ }
> +}
>  },
>  {
>  .name = "EPYC-Milan",
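(Side note for testing: once this lands, the v2 model should be selectable
explicitly with "-cpu EPYC-Rome-v2", while a plain "-cpu EPYC-Rome" keeps
resolving to whatever version the machine type aliases to.)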



Re: [PATCH v2] i386: Add the support for AMD EPYC 3rd generation processors

2021-02-24 Thread Pankaj Gupta
Hi Babu,

> >> +.features[FEAT_8000_0008_EBX] =
> >> +CPUID_8000_0008_EBX_CLZERO | CPUID_8000_0008_EBX_XSAVEERPTR |
> >> +CPUID_8000_0008_EBX_WBNOINVD | CPUID_8000_0008_EBX_IBPB |
> >> +CPUID_8000_0008_EBX_IBRS | CPUID_8000_0008_EBX_STIBP |
> >> +CPUID_8000_0008_EBX_AMD_SSBD,
> >
> > Shouldn't the SSBD flag be exposed in the default EPYC-Rome CPU configuration?
> > Is there any reason it is not?
> > Or do we need to add it explicitly?
>
> I think we missed it when we added the EPYC-Rome model. I was going to add it
> sometime soon. As you know users can still add it with "+ssbd" if required.

Thanks for clarifying. I also see CPUID_8000_0008_EBX_IBRS missing for Rome.
Would it be okay if we add them for Rome now?

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 6a53446e6a..b495116545 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -4161,7 +4161,8 @@ static X86CPUDefinition builtin_x86_defs[] = {
 .features[FEAT_8000_0008_EBX] =
 CPUID_8000_0008_EBX_CLZERO | CPUID_8000_0008_EBX_XSAVEERPTR |
 CPUID_8000_0008_EBX_WBNOINVD | CPUID_8000_0008_EBX_IBPB |
-CPUID_8000_0008_EBX_STIBP,
+CPUID_8000_0008_EBX_STIBP | CPUID_8000_0008_EBX_IBRS |
+CPUID_8000_0008_EBX_AMD_SSBD,
 .features[FEAT_7_0_EBX] =
 CPUID_7_0_EBX_FSGSBASE | CPUID_7_0_EBX_BMI1 | CPUID_7_0_EBX_AVX2 |
 CPUID_7_0_EBX_SMEP | CPUID_7_0_EBX_BMI2 | CPUID_7_0_EBX_RDSEED |



Re: [PATCH v2] i386: Add the support for AMD EPYC 3rd generation processors

2021-02-23 Thread Pankaj Gupta
Hello Babu,

I have a question below about the CPU flags exposed by the EPYC-Rome & EPYC-Milan families.
Please see below.

> Adds support for AMD EPYC 3rd generation processors. The model
> display for the new processor will be EPYC-Milan.
>
> Adds the following new feature bits on top of the feature bits from
> the first and second generation EPYC models.
>
> pcid  : Process context identifiers support
> ibrs  : Indirect Branch Restricted Speculation
> ssbd  : Speculative Store Bypass Disable
> erms  : Enhanced REP MOVSB/STOSB support
> fsrm  : Fast Short REP MOVSB support
> invpcid   : Invalidate processor context ID
> pku   : Protection keys support
> svme-addr-chk : SVM instructions address check for #GP handling
>
> Depends on the following kernel commits:
> 14c2bf81fcd2 ("KVM: SVM: Fix #GP handling for doubly-nested virtualization")
> 3b9c723ed7cf ("KVM: SVM: Add support for SVM instruction address check 
> change")
> 4aa2691dcbd3 ("KVM: x86: Factor out x86 instruction emulation with
> decoding")
> 4407a797e941 ("KVM: SVM: Enable INVPCID feature on AMD")
> 9715092f8d7e ("KVM: X86: Move handling of INVPCID types to x86")
> 3f3393b3ce38 ("KVM: X86: Rename and move the function 
> vmx_handle_memory_failure to x86.c")
> 830bd71f2c06 ("KVM: SVM: Remove set_cr_intercept, clr_cr_intercept and 
> is_cr_intercept")
> 4c44e8d6c193 ("KVM: SVM: Add new intercept word in vmcb_control_area")
> c62e2e94b9d4 ("KVM: SVM: Modify 64 bit intercept field to two 32 bit vectors")
> 9780d51dc2af ("KVM: SVM: Modify intercept_exceptions to generic intercepts")
> 30abaa88382c ("KVM: SVM: Change intercept_dr to generic intercepts")
> 03bfeeb988a9 ("KVM: SVM: Change intercept_cr to generic intercepts")
> c45ad7229d13 ("KVM: SVM: Introduce 
> vmcb_(set_intercept/clr_intercept/_is_intercept)")
> a90c1ed9f11d ("(pcid) KVM: nSVM: Remove unused field")
> fa44b82eb831 ("KVM: x86: Move MPK feature detection to common code")
> 38f3e775e9c2 ("x86/Kconfig: Update config and kernel doc for MPK feature on 
> AMD")
> 37486135d3a7 ("KVM: x86: Fix pkru save/restore when guest CR4.PKE=0, move it 
> to x86.c")
>
> Signed-off-by: Babu Moger 
> ---
> v2: Added svme-addr-chk. Also added all the dependent kernel commits in the 
> log.
>
> v1: 
> https://lore.kernel.org/qemu-devel/16118780.27536.17735339269843944966.stgit@bmoger-ubuntu/
>
>  target/i386/cpu.c |  107 ++++++++++++++++++++++++++++++++++++++++++++++++-
>  target/i386/cpu.h |4 ++
>  2 files changed, 110 insertions(+), 1 deletion(-)
>
> diff --git a/target/i386/cpu.c b/target/i386/cpu.c
> index 9c3d2d60b7..24db7ed892 100644
> --- a/target/i386/cpu.c
> +++ b/target/i386/cpu.c
> @@ -1033,7 +1033,7 @@ static FeatureWordInfo feature_word_info[FEATURE_WORDS] 
> = {
>  "clzero", NULL, "xsaveerptr", NULL,
>  NULL, NULL, NULL, NULL,
>  NULL, "wbnoinvd", NULL, NULL,
> -"ibpb", NULL, NULL, "amd-stibp",
> +"ibpb", NULL, "ibrs", "amd-stibp",
>  NULL, NULL, NULL, NULL,
>  NULL, NULL, NULL, NULL,
>  "amd-ssbd", "virt-ssbd", "amd-no-ssb", NULL,
> @@ -1798,6 +1798,56 @@ static CPUCaches epyc_rome_cache_info = {
>  },
>  };
>
> +static CPUCaches epyc_milan_cache_info = {
> +.l1d_cache = &(CPUCacheInfo) {
> +.type = DATA_CACHE,
> +.level = 1,
> +.size = 32 * KiB,
> +.line_size = 64,
> +.associativity = 8,
> +.partitions = 1,
> +.sets = 64,
> +.lines_per_tag = 1,
> +.self_init = 1,
> +.no_invd_sharing = true,
> +},
> +.l1i_cache = &(CPUCacheInfo) {
> +.type = INSTRUCTION_CACHE,
> +.level = 1,
> +.size = 32 * KiB,
> +.line_size = 64,
> +.associativity = 8,
> +.partitions = 1,
> +.sets = 64,
> +.lines_per_tag = 1,
> +.self_init = 1,
> +.no_invd_sharing = true,
> +},
> +.l2_cache = &(CPUCacheInfo) {
> +.type = UNIFIED_CACHE,
> +.level = 2,
> +.size = 512 * KiB,
> +.line_size = 64,
> +.associativity = 8,
> +.partitions = 1,
> +.sets = 1024,
> +.lines_per_tag = 1,
> +},
> +.l3_cache = &(CPUCacheInfo) {
> +.type = UNIFIED_CACHE,
> +.level = 3,
> +.size = 32 * MiB,
> +.line_size = 64,
> +.associativity = 16,
> +.partitions = 1,
> +.sets = 32768,
> +.lines_per_tag = 1,
> +.self_init = true,
> +.inclusive = true,
> +.complex_indexing = true,
> +},
> +};
> +
>  /* The following VMX features are not supported by KVM and are left out in 
> the
>   * CPU definitions:
>   *
> @@ -4130,6 +4180,61 @@ static X86CPUDefinition builtin_x86_defs[] = {
>  .model_id = "AMD EPYC-Rome Processor",
>  .cache_info = &epyc_rome_cache_info,
>  },
> +{
> +.name = "EPYC-Milan",
> +

Re: [PATCH] virtio-pmem: add trace events

2021-02-02 Thread Pankaj Gupta
Ping

@M

On Wed, 9 Dec 2020 at 20:15, David Hildenbrand  wrote:
>
> On 17.11.20 12:57, Pankaj Gupta wrote:
> > This patch adds trace events for virtio-pmem functionality.
> > Adding trace events for virtio pmem request, response and host
> > side fsync functionality.
> >
> > Signed-off-by: Pankaj Gupta 
> > ---
> >  hw/virtio/trace-events  | 5 +
> >  hw/virtio/virtio-pmem.c | 4 
> >  2 files changed, 9 insertions(+)
> >
> > diff --git a/hw/virtio/trace-events b/hw/virtio/trace-events
> > index 2060a144a2..c62727f879 100644
> > --- a/hw/virtio/trace-events
> > +++ b/hw/virtio/trace-events
> > @@ -122,3 +122,8 @@ virtio_mem_unplug_all_request(void) ""
> >  virtio_mem_resized_usable_region(uint64_t old_size, uint64_t new_size) 
> > "old_size=0x%" PRIx64 "new_size=0x%" PRIx64
> >  virtio_mem_state_request(uint64_t addr, uint16_t nb_blocks) "addr=0x%" 
> > PRIx64 " nb_blocks=%" PRIu16
> >  virtio_mem_state_response(uint16_t state) "state=%" PRIu16
> > +
> > +# virtio-pmem.c
> > +virtio_pmem_flush_request(void) "flush request"
> > +virtio_pmem_response(void) "flush response"
> > +virtio_pmem_flush_done(int type) "fsync return=%d"
> > diff --git a/hw/virtio/virtio-pmem.c b/hw/virtio/virtio-pmem.c
> > index ddb0125901..d83e973bf2 100644
> > --- a/hw/virtio/virtio-pmem.c
> > +++ b/hw/virtio/virtio-pmem.c
> > @@ -24,6 +24,7 @@
> >  #include "sysemu/hostmem.h"
> >  #include "block/aio.h"
> >  #include "block/thread-pool.h"
> > +#include "trace.h"
> >
> >  typedef struct VirtIODeviceRequest {
> >  VirtQueueElement elem;
> > @@ -41,6 +42,7 @@ static int worker_cb(void *opaque)
> >
> >  /* flush raw backing image */
> >  err = fsync(req_data->fd);
> > +trace_virtio_pmem_flush_done(err);
> >  if (err != 0) {
> >  err = 1;
> >  }
> > @@ -59,6 +61,7 @@ static void done_cb(void *opaque, int ret)
> >  /* Callbacks are serialized, so no need to use atomic ops. */
> >  virtqueue_push(req_data->pmem->rq_vq, &req_data->elem, len);
> >  virtio_notify((VirtIODevice *)req_data->pmem, req_data->pmem->rq_vq);
> > +trace_virtio_pmem_response();
> >  g_free(req_data);
> >  }
> >
> > @@ -69,6 +72,7 @@ static void virtio_pmem_flush(VirtIODevice *vdev, 
> > VirtQueue *vq)
> >  HostMemoryBackend *backend = MEMORY_BACKEND(pmem->memdev);
> >  ThreadPool *pool = aio_get_thread_pool(qemu_get_aio_context());
> >
> > +trace_virtio_pmem_flush_request();
> >  req_data = virtqueue_pop(vq, sizeof(VirtIODeviceRequest));
> >  if (!req_data) {
> >  virtio_error(vdev, "virtio-pmem missing request data");
> >
>
> LGTM
>
> Reviewed-by: David Hildenbrand 
>
> --
> Thanks,
>
> David / dhildenb
>



Re: [PATCH v3 04/10] vfio: Query and store the maximum number of DMA mappings

2020-12-16 Thread Pankaj Gupta
> Let's determine the maximum number of DMA mappings by querying the available
> mappings when creating the container.
>
> In addition, count the number of DMA mappings and warn when we would
> exceed it. This is a preparation for RamDiscardMgr which might
> create quite some DMA mappings over time, and we at least want to warn
> early that the QEMU setup might be problematic. Use "reserved"
> terminology, so we can use this to reserve mappings before they are
> actually created.
>
> Note: don't reserve vIOMMU DMA mappings - using the vIOMMU region size
> divided by the mapping page size might be a bad indication of what will
> happen in practice - we might end up warning all the time.
>
> Cc: Paolo Bonzini 
> Cc: "Michael S. Tsirkin" 
> Cc: Alex Williamson 
> Cc: Dr. David Alan Gilbert 
> Cc: Igor Mammedov 
> Cc: Pankaj Gupta 
> Cc: Peter Xu 
> Cc: Auger Eric 
> Cc: Wei Yang 
> Cc: teawater 
> Cc: Marek Kedzierski 
> Signed-off-by: David Hildenbrand 
> ---
>  hw/vfio/common.c  | 34 ++
>  include/hw/vfio/vfio-common.h |  2 ++
>  2 files changed, 36 insertions(+)
>
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index 6ff1daa763..5ad88d476f 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
> @@ -288,6 +288,26 @@ const MemoryRegionOps vfio_region_ops = {
>  },
>  };
>
> +static void vfio_container_dma_reserve(VFIOContainer *container,
> +   unsigned long dma_mappings)
> +{
> +bool warned = container->dma_reserved > container->dma_max;
> +
> +container->dma_reserved += dma_mappings;
> +if (!warned && container->dma_max &&
> +container->dma_reserved > container->dma_max) {
> +warn_report("%s: possibly running out of DMA mappings. "
> +" Maximum number of DMA mappings: %d", __func__,
> +container->dma_max);
> +}
> +}
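(Nice detail here: "warned" latches whether the limit was already exceeded
on entry, so the warning fires only once, when a reservation first crosses
dma_max, rather than on every reservation after that.)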
> +
> +static void vfio_container_dma_unreserve(VFIOContainer *container,
> + unsigned long dma_mappings)
> +{
> +container->dma_reserved -= dma_mappings;
> +}
> +
>  /*
>   * Device state interfaces
>   */
> @@ -835,6 +855,9 @@ static void vfio_listener_region_add(MemoryListener 
> *listener,
>  }
>  }
>
> +/* We'll need one DMA mapping. */
> +vfio_container_dma_reserve(container, 1);
> +
>  ret = vfio_dma_map(container, iova, int128_get64(llsize),
> vaddr, section->readonly);
>  if (ret) {
> @@ -879,6 +902,7 @@ static void vfio_listener_region_del(MemoryListener 
> *listener,
>   MemoryRegionSection *section)
>  {
>  VFIOContainer *container = container_of(listener, VFIOContainer, 
> listener);
> +bool unreserve_on_unmap = true;
>  hwaddr iova, end;
>  Int128 llend, llsize;
>  int ret;
> @@ -919,6 +943,7 @@ static void vfio_listener_region_del(MemoryListener 
> *listener,
>   * based IOMMU where a big unmap flattens a large range of IO-PTEs.
>   * That may not be true for all IOMMU types.
>   */
> +unreserve_on_unmap = false;
>  }
>
>  iova = TARGET_PAGE_ALIGN(section->offset_within_address_space);
> @@ -970,6 +995,11 @@ static void vfio_listener_region_del(MemoryListener 
> *listener,
>   "0x%"HWADDR_PRIx") = %d (%m)",
>   container, iova, int128_get64(llsize), ret);
>  }
> +
> +/* We previously reserved one DMA mapping. */
> +if (unreserve_on_unmap) {
> +vfio_container_dma_unreserve(container, 1);
> +}
>  }
>
>  memory_region_unref(section->mr);
> @@ -1735,6 +1765,7 @@ static int vfio_connect_container(VFIOGroup *group, 
> AddressSpace *as,
>  container->fd = fd;
>  container->error = NULL;
>  container->dirty_pages_supported = false;
> +container->dma_max = 0;
>  QLIST_INIT(&container->giommu_list);
>  QLIST_INIT(&container->hostwin_list);
>
> @@ -1765,7 +1796,10 @@ static int vfio_connect_container(VFIOGroup *group, 
> AddressSpace *as,
>  vfio_host_win_add(container, 0, (hwaddr)-1, info->iova_pgsizes);
>  container->pgsizes = info->iova_pgsizes;
>
> +/* The default in the kernel ("dma_entry_limit") is 65535. */
> +container->dma_max = 65535;
>  if (!ret) {
> +vfio_get_info_dma_avail(info, &container->dma_max);

Re: [PATCH v2 07/10] softmmu/physmem: Don't use atomic operations in ram_block_discard_(disable|require)

2020-12-10 Thread Pankaj Gupta
> >>  bool ram_block_discard_is_disabled(void)
> >>  {
> >> -return qatomic_read(&ram_block_discard_disabled) > 0;
> >> +return qatomic_read(&ram_block_discard_disablers);
> >>  }
> > return value won't be bool?
>
> The compiler does type conversion.
>
> != 0 -> true
> == 0 -> false
ah... I missed it :(
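For my own future reference, a minimal standalone C sketch of the implicit
conversion (not QEMU code; the counter value is made up):

    #include <assert.h>
    #include <stdbool.h>

    int main(void)
    {
        unsigned int disablers = 3;        /* hypothetical counter value */
        bool disabled = disablers;         /* C99 6.3.1.2: nonzero -> true */
        assert(disabled == true);
        disablers = 0;
        assert((bool)disablers == false);  /* zero -> false */
        return 0;
    }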

Thanks,
Pankaj



Re: [PATCH v2 07/10] softmmu/physmem: Don't use atomic operations in ram_block_discard_(disable|require)

2020-12-10 Thread Pankaj Gupta
> We have users in migration context that don't hold the BQL (when
> finishing migration). To prepare for further changes, use a dedicated mutex
> instead of atomic operations. Keep using qatomic_read ("READ_ONCE") for the
> functions that only extract the current state (e.g., used by
> virtio-balloon), where locking isn't necessary.
>
> While at it, split up the counter into two variables to make it easier
> to understand.
>
> Suggested-by: Peter Xu 
> Reviewed-by: Peter Xu 
> Cc: Paolo Bonzini 
> Cc: "Michael S. Tsirkin" 
> Cc: Alex Williamson 
> Cc: Dr. David Alan Gilbert 
> Cc: Igor Mammedov 
> Cc: Pankaj Gupta 
> Cc: Peter Xu 
> Cc: Auger Eric 
> Cc: Wei Yang 
> Cc: teawater 
> Cc: Marek Kedzierski 
> Signed-off-by: David Hildenbrand 
> ---
>  softmmu/physmem.c | 70 ++-
>  1 file changed, 39 insertions(+), 31 deletions(-)
>
> diff --git a/softmmu/physmem.c b/softmmu/physmem.c
> index 3027747c03..448e4e8c86 100644
> --- a/softmmu/physmem.c
> +++ b/softmmu/physmem.c
> @@ -3650,56 +3650,64 @@ void mtree_print_dispatch(AddressSpaceDispatch *d, 
> MemoryRegion *root)
>  }
>  }
>
> -/*
> - * If positive, discarding RAM is disabled. If negative, discarding RAM is
> - * required to work and cannot be disabled.
> - */
> -static int ram_block_discard_disabled;
> +static unsigned int ram_block_discard_requirers;
> +static unsigned int ram_block_discard_disablers;
> +static QemuMutex ram_block_discard_disable_mutex;
> +
> +static void ram_block_discard_disable_mutex_lock(void)
> +{
> +static gsize initialized;
> +
> +if (g_once_init_enter(&initialized)) {
> +qemu_mutex_init(&ram_block_discard_disable_mutex);
> +g_once_init_leave(&initialized, 1);
> +}
> +qemu_mutex_lock(&ram_block_discard_disable_mutex);
> +}
> +
> +static void ram_block_discard_disable_mutex_unlock(void)
> +{
> +qemu_mutex_unlock(&ram_block_discard_disable_mutex);
> +}
>
>  int ram_block_discard_disable(bool state)
>  {
> -int old;
> +int ret = 0;
>
> +ram_block_discard_disable_mutex_lock();
>  if (!state) {
> -qatomic_dec(&ram_block_discard_disabled);
> -return 0;
> +ram_block_discard_disablers--;
> +} else if (!ram_block_discard_requirers) {
> +ram_block_discard_disablers++;
> +} else {
> +ret = -EBUSY;
>  }
> -
> -do {
> -old = qatomic_read(&ram_block_discard_disabled);
> -if (old < 0) {
> -return -EBUSY;
> -}
> -} while (qatomic_cmpxchg(&ram_block_discard_disabled,
> - old, old + 1) != old);
> -return 0;
> +ram_block_discard_disable_mutex_unlock();
> +return ret;
>  }
>
>  int ram_block_discard_require(bool state)
>  {
> -int old;
> +int ret = 0;
>
> +ram_block_discard_disable_mutex_lock();
>  if (!state) {
> -qatomic_inc(&ram_block_discard_disabled);
> -return 0;
> +ram_block_discard_requirers--;
> +} else if (!ram_block_discard_disablers) {
> +ram_block_discard_requirers++;
> +} else {
> +ret = -EBUSY;
>  }
> -
> -do {
> -old = qatomic_read(&ram_block_discard_disabled);
> -if (old > 0) {
> -return -EBUSY;
> -}
> -} while (qatomic_cmpxchg(&ram_block_discard_disabled,
> - old, old - 1) != old);
> -return 0;
> +ram_block_discard_disable_mutex_unlock();
> +return ret;
>  }
>
>  bool ram_block_discard_is_disabled(void)
>  {
> -return qatomic_read(&ram_block_discard_disabled) > 0;
> +return qatomic_read(&ram_block_discard_disablers);
>  }
return value won't be bool?
>
>  bool ram_block_discard_is_required(void)
>  {
> -return qatomic_read(&ram_block_discard_disabled) < 0;
> +return qatomic_read(&ram_block_discard_requirers);
same here.
>  }
> --
Apart from the query above, this looks good.
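To convince myself of the exclusion semantics, a tiny standalone model
(locking and error paths simplified; an illustration only, not QEMU code):

    #include <assert.h>
    #include <errno.h>
    #include <stdbool.h>

    static unsigned int disablers, requirers;

    static int discard_disable(bool state)
    {
        if (!state) {
            disablers--;
            return 0;
        }
        if (requirers) {
            return -EBUSY;    /* someone requires discards to work */
        }
        disablers++;
        return 0;
    }

    static int discard_require(bool state)
    {
        if (!state) {
            requirers--;
            return 0;
        }
        if (disablers) {
            return -EBUSY;    /* someone disabled discards */
        }
        requirers++;
        return 0;
    }

    int main(void)
    {
        assert(discard_disable(true) == 0);       /* first disabler wins */
        assert(discard_require(true) == -EBUSY);  /* requirer is blocked */
        discard_disable(false);                   /* drop the disabler   */
        assert(discard_require(true) == 0);       /* now it succeeds     */
        return 0;
    }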
Reviewed-by: Pankaj Gupta 



Re: [PATCH v2 10/10] vfio: Disable only uncoordinated discards

2020-12-10 Thread Pankaj Gupta
> We support coordinated discarding of RAM using the RamDiscardMgr. Let's
> unlock support for coordinated discards, keeping uncoordinated discards
> (e.g., via virtio-balloon) disabled.
>
> This unlocks virtio-mem + vfio. Note that vfio used via "nvme://" by the
> block layer has to be implemented/unlocked separately. For now,
> virtio-mem only supports x86-64 - spapr IOMMUs are not tested/affected.
>
> Note: The block size of a virtio-mem device has to be set to sane sizes,
> depending on the maximum hotplug size - to not run out of vfio mappings.
> The default virtio-mem block size is usually in the range of a couple of
> MBs. The maximum number of mappings is 64k, shared with other users.
> Assume you want to hotplug 256GB using virtio-mem - the block size would
> have to be set to at least 8 MiB (resulting in 32768 separate mappings).
>
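(The arithmetic behind that example, spelled out: with the default
dma_entry_limit of 64k mappings shared between all users,
256 GiB / 65536 = 4 MiB is the bare minimum block size, and 8 MiB leaves
headroom: 256 GiB / 8 MiB = 32768 mappings.)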
> Cc: Paolo Bonzini 
> Cc: "Michael S. Tsirkin" 
> Cc: Alex Williamson 
> Cc: Dr. David Alan Gilbert 
> Cc: Igor Mammedov 
> Cc: Pankaj Gupta 
> Cc: Peter Xu 
> Cc: Auger Eric 
> Cc: Wei Yang 
> Cc: teawater 
> Cc: Marek Kedzierski 
> Signed-off-by: David Hildenbrand 
> ---
>  hw/vfio/common.c | 10 ++
>  1 file changed, 6 insertions(+), 4 deletions(-)
>
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index 57c83a2f14..3ce5e26bab 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
> @@ -1974,8 +1974,10 @@ static int vfio_connect_container(VFIOGroup *group, 
> AddressSpace *as,
>   * new memory, it will not yet set ram_block_discard_set_required() and
>   * therefore, neither stops us here or deals with the sudden memory
>   * consumption of inflated memory.
> + *
> + * We do support discarding of memory coordinated via the RamDiscardMgr.
>   */
> -ret = ram_block_discard_disable(true);
> +ret = ram_block_uncoordinated_discard_disable(true);
>  if (ret) {
>  error_setg_errno(errp, -ret, "Cannot set discarding of RAM broken");
>  return ret;
> @@ -2155,7 +2157,7 @@ close_fd_exit:
>  close(fd);
>
>  put_space_exit:
> -ram_block_discard_disable(false);
> +ram_block_uncoordinated_discard_disable(false);
>  vfio_put_address_space(space);
>
>  return ret;
> @@ -2277,7 +2279,7 @@ void vfio_put_group(VFIOGroup *group)
>  }
>
>  if (!group->ram_block_discard_allowed) {
> -ram_block_discard_disable(false);
> +ram_block_uncoordinated_discard_disable(false);
>  }
>  vfio_kvm_device_del_group(group);
>  vfio_disconnect_container(group);
> @@ -2331,7 +2333,7 @@ int vfio_get_device(VFIOGroup *group, const char *name,
>
>  if (!group->ram_block_discard_allowed) {
>  group->ram_block_discard_allowed = true;
> -ram_block_discard_disable(false);
> +ram_block_uncoordinated_discard_disable(false);
>  }
>  }
>
Looks good to me.
Reviewed-by: Pankaj Gupta 



Re: [PATCH v2 08/10] softmmu/physmem: Extend ram_block_discard_(require|disable) by two discard types

2020-12-09 Thread Pankaj Gupta
> We want to separate the two cases whereby we discard ram
> - uncoordinated: e.g., virtio-balloon
> - coordinated: e.g., virtio-mem coordinated via the RamDiscardMgr
>
> Cc: Paolo Bonzini 
> Cc: "Michael S. Tsirkin" 
> Cc: Alex Williamson 
> Cc: Dr. David Alan Gilbert 
> Cc: Igor Mammedov 
> Cc: Pankaj Gupta 
> Cc: Peter Xu 
> Cc: Auger Eric 
> Cc: Wei Yang 
> Cc: teawater 
> Cc: Marek Kedzierski 
> Signed-off-by: David Hildenbrand 
> ---
>  include/exec/memory.h | 18 +--
>  softmmu/physmem.c | 54 ++-
>  2 files changed, 64 insertions(+), 8 deletions(-)
>
> diff --git a/include/exec/memory.h b/include/exec/memory.h
> index 30d4fcd2c0..3f0942aa5b 100644
> --- a/include/exec/memory.h
> +++ b/include/exec/memory.h
> @@ -2804,6 +2804,12 @@ static inline MemOp devend_memop(enum device_endian 
> end)
>   */
>  int ram_block_discard_disable(bool state);
>
> +/*
> + * See ram_block_discard_disable(): only disable unccordinated discards,
s/unccordinated/uncoordinated
> + * keeping coordinated discards (via the RamDiscardMgr) enabled.
> + */
> +int ram_block_uncoordinated_discard_disable(bool state);
> +
>  /*
>   * Inhibit technologies that disable discarding of pages in RAM blocks.
>   *
> @@ -2813,12 +2819,20 @@ int ram_block_discard_disable(bool state);
>  int ram_block_discard_require(bool state);
>
>  /*
> - * Test if discarding of memory in ram blocks is disabled.
> + * See ram_block_discard_require(): only inhibit technologies that disable
> + * uncoordinated discarding of pages in RAM blocks, allowing co-existence 
> with
> + * technologies that only inhibit uncoordinated discards (via the
> + * RamDiscardMgr).
> + */
> +int ram_block_coordinated_discard_require(bool state);
> +
> +/*
> + * Test if any discarding of memory in ram blocks is disabled.
>   */
>  bool ram_block_discard_is_disabled(void);
>
>  /*
> - * Test if discarding of memory in ram blocks is required to work reliably.
> + * Test if any discarding of memory in ram blocks is required to work 
> reliably.
>   */
>  bool ram_block_discard_is_required(void);
>
> diff --git a/softmmu/physmem.c b/softmmu/physmem.c
> index 448e4e8c86..7a4f3db1b4 100644
> --- a/softmmu/physmem.c
> +++ b/softmmu/physmem.c
> @@ -3650,8 +3650,14 @@ void mtree_print_dispatch(AddressSpaceDispatch *d, 
> MemoryRegion *root)
>  }
>  }
>
> +/* Require any discards to work. */
>  static unsigned int ram_block_discard_requirers;
> +/* Require only coordinated discards to work. */
> +static unsigned int ram_block_coordinated_discard_requirers;
> +/* Disable any discards. */
>  static unsigned int ram_block_discard_disablers;
> +/* Disable only uncoordinated disards. */
s/disards/discards
> +static unsigned int ram_block_uncoordinated_discard_disablers;
>  static QemuMutex ram_block_discard_disable_mutex;
>
>  static void ram_block_discard_disable_mutex_lock(void)
> @@ -3677,10 +3683,27 @@ int ram_block_discard_disable(bool state)
>  ram_block_discard_disable_mutex_lock();
>  if (!state) {
>  ram_block_discard_disablers--;
> -} else if (!ram_block_discard_requirers) {
> -ram_block_discard_disablers++;
> +} else if (ram_block_discard_requirers ||
> +   ram_block_coordinated_discard_requirers) {
> +ret = -EBUSY;
>  } else {
> +ram_block_discard_disablers++;
> +}
> +ram_block_discard_disable_mutex_unlock();
> +return ret;
> +}
> +
> +int ram_block_uncoordinated_discard_disable(bool state)
> +{
> +int ret = 0;
> +
> +ram_block_discard_disable_mutex_lock();
> +if (!state) {
> +ram_block_uncoordinated_discard_disablers--;
> +} else if (ram_block_discard_requirers) {
>  ret = -EBUSY;
> +} else {
> +ram_block_uncoordinated_discard_disablers++;
>  }
>  ram_block_discard_disable_mutex_unlock();
>  return ret;
> @@ -3693,10 +3716,27 @@ int ram_block_discard_require(bool state)
>  ram_block_discard_disable_mutex_lock();
>  if (!state) {
>  ram_block_discard_requirers--;
> -} else if (!ram_block_discard_disablers) {
> -ram_block_discard_requirers++;
> +} else if (ram_block_discard_disablers ||
> +   ram_block_uncoordinated_discard_disablers) {
> +ret = -EBUSY;
>  } else {
> +ram_block_discard_requirers++;
> +}
> +ram_block_discard_disable_mutex_unlock();
> +return ret;
> +}
> +
> +int ram_block_coordinated_discard_require(bool state)
> +{
> +int ret = 0;
> +
> +ram_block_discard_disable_mutex_l

Re: [PATCH v2 08/10] softmmu/physmem: Extend ram_block_discard_(require|disable) by two discard types

2020-12-09 Thread Pankaj Gupta
> [...]
>
> >> +/* Disable only uncoordinated disards. */
> > s/disards/discards
>
> Thanks!
>
> [...]
>
> >>
> >>  bool ram_block_discard_is_required(void)
> >>  {
> >> -return qatomic_read(&ram_block_discard_requirers);
> >> +return qatomic_read(&ram_block_discard_requirers) ||
> >> +   qatomic_read(&ram_block_coordinated_discard_requirers);
> >>  }
> >
> > How do we differentiate if we have both un-coordinated & coordinated
> > cases together?
>
> Checking for both is sufficient for current users - which only care if
> any type of discard is required to work. Thanks!
O.k. Looks good to me.

Reviewed-by: Pankaj Gupta 



Re: [PATCH v2 09/10] virtio-mem: Require only coordinated discards

2020-12-09 Thread Pankaj Gupta
> We implement the RamDiscardMgr interface and only require coordinated
> discarding of RAM to work.
>
> Reviewed-by: Dr. David Alan Gilbert 
> Cc: Paolo Bonzini 
> Cc: "Michael S. Tsirkin" 
> Cc: Alex Williamson 
> Cc: Dr. David Alan Gilbert 
> Cc: Igor Mammedov 
> Cc: Pankaj Gupta 
> Cc: Peter Xu 
> Cc: Auger Eric 
> Cc: Wei Yang 
> Cc: teawater 
> Cc: Marek Kedzierski 
> Signed-off-by: David Hildenbrand 
> ---
>  hw/virtio/virtio-mem.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/hw/virtio/virtio-mem.c b/hw/virtio/virtio-mem.c
> index f419a758f3..99d0712195 100644
> --- a/hw/virtio/virtio-mem.c
> +++ b/hw/virtio/virtio-mem.c
> @@ -687,7 +687,7 @@ static void virtio_mem_device_realize(DeviceState *dev, 
> Error **errp)
>  return;
>  }
>
> -if (ram_block_discard_require(true)) {
> +if (ram_block_coordinated_discard_require(true)) {
>  error_setg(errp, "Discarding RAM is disabled");
>  return;
>  }
> @@ -695,7 +695,7 @@ static void virtio_mem_device_realize(DeviceState *dev, 
> Error **errp)
>  ret = ram_block_discard_range(rb, 0, qemu_ram_get_used_length(rb));
>  if (ret) {
>  error_setg_errno(errp, -ret, "Unexpected error discarding RAM");
> -ram_block_discard_require(false);
> +ram_block_coordinated_discard_require(false);
>  return;
>  }
>
> @@ -738,7 +738,7 @@ static void virtio_mem_device_unrealize(DeviceState *dev)
>  virtio_del_queue(vdev, 0);
>  virtio_cleanup(vdev);
>  g_free(vmem->bitmap);
> -ram_block_discard_require(false);
> +ram_block_coordinated_discard_require(false);
>  }
>
>  static int virtio_mem_discard_range_cb(const VirtIOMEM *vmem, void *arg,

Reviewed-by: Pankaj Gupta 



Re: [PATCH v2 02/10] virtio-mem: Factor out traversing unplugged ranges

2020-12-09 Thread Pankaj Gupta
> Let's factor out the core logic, to be reused soon.
>
> Cc: Paolo Bonzini 
> Cc: "Michael S. Tsirkin" 
> Cc: Alex Williamson 
> Cc: Dr. David Alan Gilbert 
> Cc: Igor Mammedov 
> Cc: Pankaj Gupta 
> Cc: Peter Xu 
> Cc: Auger Eric 
> Cc: Wei Yang 
> Cc: teawater 
> Cc: Marek Kedzierski 
> Signed-off-by: David Hildenbrand 
> ---
>  hw/virtio/virtio-mem.c | 86 --
>  1 file changed, 49 insertions(+), 37 deletions(-)
>
> diff --git a/hw/virtio/virtio-mem.c b/hw/virtio/virtio-mem.c
> index 655824ff81..471e464171 100644
> --- a/hw/virtio/virtio-mem.c
> +++ b/hw/virtio/virtio-mem.c
> @@ -145,6 +145,33 @@ static bool virtio_mem_is_busy(void)
>  return migration_in_incoming_postcopy() || !migration_is_idle();
>  }
>
> +typedef int (*virtio_mem_range_cb)(const VirtIOMEM *vmem, void *arg,
> +   uint64_t offset, uint64_t size);
> +
> +static int virtio_mem_for_each_unplugged_range(const VirtIOMEM *vmem, void 
> *arg,
> +   virtio_mem_range_cb cb)
> +{
> +unsigned long first_zero_bit, last_zero_bit;
> +uint64_t offset, size;
> +int ret = 0;
> +
> +first_zero_bit = find_first_zero_bit(vmem->bitmap, vmem->bitmap_size);
> +while (first_zero_bit < vmem->bitmap_size) {
> +offset = first_zero_bit * vmem->block_size;
> +last_zero_bit = find_next_bit(vmem->bitmap, vmem->bitmap_size,
> +  first_zero_bit + 1) - 1;
> +size = (last_zero_bit - first_zero_bit + 1) * vmem->block_size;
> +
> +ret = cb(vmem, arg, offset, size);
> +if (ret) {
> +break;
> +}
> +first_zero_bit = find_next_zero_bit(vmem->bitmap, vmem->bitmap_size,
> +last_zero_bit + 2);
> +}
> +return ret;
> +}
> +
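(A worked example of the traversal, with made-up numbers: assume
block_size = 2 MiB, bitmap_size = 8 and blocks 0, 3, 4 and 5 plugged (bits
set). The zero runs are blocks 1-2 and 6-7, so cb() is invoked twice, with
(offset = 2 MiB, size = 4 MiB) and (offset = 12 MiB, size = 4 MiB).)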
>  static bool virtio_mem_test_bitmap(VirtIOMEM *vmem, uint64_t start_gpa,
> uint64_t size, bool plugged)
>  {
> @@ -594,33 +621,27 @@ static void virtio_mem_device_unrealize(DeviceState 
> *dev)
>  ram_block_discard_require(false);
>  }
>
> -static int virtio_mem_restore_unplugged(VirtIOMEM *vmem)
> +static int virtio_mem_discard_range_cb(const VirtIOMEM *vmem, void *arg,
> +   uint64_t offset, uint64_t size)
>  {
>  RAMBlock *rb = vmem->memdev->mr.ram_block;
> -unsigned long first_zero_bit, last_zero_bit;
> -uint64_t offset, length;
>  int ret;
>
> -/* Find consecutive unplugged blocks and discard the consecutive range. 
> */
> -first_zero_bit = find_first_zero_bit(vmem->bitmap, vmem->bitmap_size);
> -while (first_zero_bit < vmem->bitmap_size) {
> -offset = first_zero_bit * vmem->block_size;
> -last_zero_bit = find_next_bit(vmem->bitmap, vmem->bitmap_size,
> -  first_zero_bit + 1) - 1;
> -length = (last_zero_bit - first_zero_bit + 1) * vmem->block_size;
> -
> -ret = ram_block_discard_range(rb, offset, length);
> -if (ret) {
> -error_report("Unexpected error discarding RAM: %s",
> - strerror(-ret));
> -return -EINVAL;
> -}
> -first_zero_bit = find_next_zero_bit(vmem->bitmap, vmem->bitmap_size,
> -last_zero_bit + 2);
> +ret = ram_block_discard_range(rb, offset, size);
> +if (ret) {
> +error_report("Unexpected error discarding RAM: %s", strerror(-ret));
> +return -EINVAL;
>  }
>  return 0;
>  }
>
> +static int virtio_mem_restore_unplugged(VirtIOMEM *vmem)
> +{
> +/* Make sure all memory is really discarded after migration. */
> +return virtio_mem_for_each_unplugged_range(vmem, NULL,
> +   virtio_mem_discard_range_cb);
> +}
> +
>  static int virtio_mem_post_load(void *opaque, int version_id)
>  {
>  if (migration_in_incoming_postcopy()) {
> @@ -872,28 +893,19 @@ static void virtio_mem_set_block_size(Object *obj, 
> Visitor *v, const char *name,
>  vmem->block_size = value;
>  }
>
> -static void virtio_mem_precopy_exclude_unplugged(VirtIOMEM *vmem)
> +static int virtio_mem_precopy_exclude_range_cb(const VirtIOMEM *vmem, void 
> *arg,
> +   uint64_t offset, uint64_t 
> size)
>  {
>  void * const host = qemu_ram_get_host_addr(vmem->memdev->mr.ram_block);
> -unsigned long first_zero_bit, l

Re: [PATCH v2 01/10] memory: Introduce RamDiscardMgr for RAM memory regions

2020-12-09 Thread Pankaj Gupta
> We have some special RAM memory regions (managed by virtio-mem), whereby
> the guest agreed to only use selected memory ranges. "unused" parts are
> discarded so they won't consume memory - to logically unplug these memory
> ranges. Before the VM is allowed to use such logically unplugged memory
> again, coordination with the hypervisor is required.
>
> This results in "sparse" mmaps/RAMBlocks/memory regions, whereby only
> coordinated parts are valid to be used/accessed by the VM.
>
> In most cases, we don't care about that - e.g., in KVM, we simply have a
> single KVM memory slot. However, in case of vfio, registering the
> whole region with the kernel results in all pages getting pinned, and
> therefore an unexpected high memory consumption - discarding of RAM in
> that context is broken.
>
> Let's introduce a way to coordinate discarding/populating memory within a
> RAM memory region with such special consumers of RAM memory regions: they
> can register as listeners and get updates on memory getting discarded and
> populated. Using this machinery, vfio will be able to map only the
> currently populated parts, resulting in discarded parts not getting pinned
> and not consuming memory.
>
> A RamDiscardMgr has to be set for a memory region before it is getting
> mapped, and cannot change while the memory region is mapped.
>
> Note: At some point, we might want to let RAMBlock users (esp. vfio used
> for nvme://) consume this interface as well. We'll need RAMBlock notifier
> calls when a RAMBlock is getting mapped/unmapped (via the corresponding
> memory region), so we can properly register a listener there as well.
>
> Cc: Paolo Bonzini 
> Cc: "Michael S. Tsirkin" 
> Cc: Alex Williamson 
> Cc: Dr. David Alan Gilbert 
> Cc: Igor Mammedov 
> Cc: Pankaj Gupta 
> Cc: Peter Xu 
> Cc: Auger Eric 
> Cc: Wei Yang 
> Cc: teawater 
> Cc: Marek Kedzierski 
> Signed-off-by: David Hildenbrand 
> ---
>  include/exec/memory.h | 231 ++
>  softmmu/memory.c  |  22 
>  2 files changed, 253 insertions(+)
>
> diff --git a/include/exec/memory.h b/include/exec/memory.h
> index 0f3e6bcd5e..30d4fcd2c0 100644
> --- a/include/exec/memory.h
> +++ b/include/exec/memory.h
> @@ -42,6 +42,12 @@ typedef struct IOMMUMemoryRegionClass 
> IOMMUMemoryRegionClass;
>  DECLARE_OBJ_CHECKERS(IOMMUMemoryRegion, IOMMUMemoryRegionClass,
>   IOMMU_MEMORY_REGION, TYPE_IOMMU_MEMORY_REGION)
>
> +#define TYPE_RAM_DISCARD_MGR "qemu:ram-discard-mgr"
> +typedef struct RamDiscardMgrClass RamDiscardMgrClass;
> +typedef struct RamDiscardMgr RamDiscardMgr;
> +DECLARE_OBJ_CHECKERS(RamDiscardMgr, RamDiscardMgrClass, RAM_DISCARD_MGR,
> + TYPE_RAM_DISCARD_MGR);
> +
>  #ifdef CONFIG_FUZZ
>  void fuzz_dma_read_cb(size_t addr,
>size_t len,
> @@ -116,6 +122,66 @@ struct IOMMUNotifier {
>  };
>  typedef struct IOMMUNotifier IOMMUNotifier;
>
> +struct RamDiscardListener;
> +typedef int (*NotifyRamPopulate)(struct RamDiscardListener *rdl,
> + const MemoryRegion *mr, ram_addr_t offset,
> + ram_addr_t size);
> +typedef void (*NotifyRamDiscard)(struct RamDiscardListener *rdl,
> + const MemoryRegion *mr, ram_addr_t offset,
> + ram_addr_t size);
> +typedef void (*NotifyRamDiscardAll)(struct RamDiscardListener *rdl,
> +const MemoryRegion *mr);
> +
> +typedef struct RamDiscardListener {
> +/*
> + * @notify_populate:
> + *
> + * Notification that previously discarded memory is about to get 
> populated.
> + * Listeners are able to object. If any listener objects, already
> + * successfully notified listeners are notified about a discard again.
> + *
> + * @rdl: the #RamDiscardListener getting notified
> + * @mr: the relevant #MemoryRegion
> + * @offset: offset into the #MemoryRegion, aligned to minimum 
> granularity of
> + *  the #RamDiscardMgr
> + * @size: the size, aligned to minimum granularity of the #RamDiscardMgr
> + *
> + * Returns 0 on success. If the notification is rejected by the listener,
> + * an error is returned.
> + */
> +NotifyRamPopulate notify_populate;
> +
> +/*
> + * @notify_discard:
> + *
> + * Notification that previously populated memory was discarded 
> successfully
> + * and listeners should drop all references to such memory and prevent
> + * new population (e.g., unmap).
> + *
> + * @

Re: [PATCH] virtio-pmem: add trace events

2020-12-09 Thread Pankaj Gupta
ping.

> This patch adds trace events for virtio-pmem functionality.
> Adding trace events for virtio pmem request, response and host
> side fsync functionality.
>
> Signed-off-by: Pankaj Gupta 
> ---
>  hw/virtio/trace-events  | 5 +
>  hw/virtio/virtio-pmem.c | 4 
>  2 files changed, 9 insertions(+)
>
> diff --git a/hw/virtio/trace-events b/hw/virtio/trace-events
> index 2060a144a2..c62727f879 100644
> --- a/hw/virtio/trace-events
> +++ b/hw/virtio/trace-events
> @@ -122,3 +122,8 @@ virtio_mem_unplug_all_request(void) ""
>  virtio_mem_resized_usable_region(uint64_t old_size, uint64_t new_size) 
> "old_size=0x%" PRIx64 "new_size=0x%" PRIx64
>  virtio_mem_state_request(uint64_t addr, uint16_t nb_blocks) "addr=0x%" 
> PRIx64 " nb_blocks=%" PRIu16
>  virtio_mem_state_response(uint16_t state) "state=%" PRIu16
> +
> +# virtio-pmem.c
> +virtio_pmem_flush_request(void) "flush request"
> +virtio_pmem_response(void) "flush response"
> +virtio_pmem_flush_done(int type) "fsync return=%d"
> diff --git a/hw/virtio/virtio-pmem.c b/hw/virtio/virtio-pmem.c
> index ddb0125901..d83e973bf2 100644
> --- a/hw/virtio/virtio-pmem.c
> +++ b/hw/virtio/virtio-pmem.c
> @@ -24,6 +24,7 @@
>  #include "sysemu/hostmem.h"
>  #include "block/aio.h"
>  #include "block/thread-pool.h"
> +#include "trace.h"
>
>  typedef struct VirtIODeviceRequest {
>  VirtQueueElement elem;
> @@ -41,6 +42,7 @@ static int worker_cb(void *opaque)
>
>  /* flush raw backing image */
>  err = fsync(req_data->fd);
> +trace_virtio_pmem_flush_done(err);
>  if (err != 0) {
>  err = 1;
>  }
> @@ -59,6 +61,7 @@ static void done_cb(void *opaque, int ret)
>  /* Callbacks are serialized, so no need to use atomic ops. */
>  virtqueue_push(req_data->pmem->rq_vq, &req_data->elem, len);
>  virtio_notify((VirtIODevice *)req_data->pmem, req_data->pmem->rq_vq);
> +trace_virtio_pmem_response();
>  g_free(req_data);
>  }
>
> @@ -69,6 +72,7 @@ static void virtio_pmem_flush(VirtIODevice *vdev, VirtQueue 
> *vq)
>  HostMemoryBackend *backend = MEMORY_BACKEND(pmem->memdev);
>  ThreadPool *pool = aio_get_thread_pool(qemu_get_aio_context());
>
> +trace_virtio_pmem_flush_request();
>  req_data = virtqueue_pop(vq, sizeof(VirtIODeviceRequest));
>  if (!req_data) {
>  virtio_error(vdev, "virtio-pmem missing request data");
> --
> 2.20.1
>
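For anyone who wants to try them out: the events can be enabled at startup
with e.g.

    qemu-system-x86_64 ... -trace 'virtio_pmem_*'

or toggled at runtime via the HMP "trace-event" command.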



Re: [PATCH v3] migration: Don't allow migration if vm is in POSTMIGRATE

2020-12-09 Thread Pankaj Gupta
> The following steps will cause a QEMU assertion failure:
> - pause vm by executing 'virsh suspend'
> - create external snapshot of memory and disk using 'virsh snapshot-create-as'
> - doing the above operation again will cause a QEMU crash
>
> The backtrace looks like:
> #0  0x7fbf958c5c37 in raise () from /lib/x86_64-linux-gnu/libc.so.6
> #1  0x7fbf958c9028 in abort () from /lib/x86_64-linux-gnu/libc.so.6
> #2  0x7fbf958bebf6 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
> #3  0x7fbf958beca2 in __assert_fail () from 
> /lib/x86_64-linux-gnu/libc.so.6
> #4  0x55ca8decd39d in bdrv_inactivate_recurse (bs=0x55ca90c80400) at 
> /build/qemu-5.0/block.c:5724
> #5  0x55ca8dece967 in bdrv_inactivate_all () at 
> /build//qemu-5.0/block.c:5792
> #6  0x55ca8de5539d in qemu_savevm_state_complete_precopy_non_iterable 
> (inactivate_disks=true, in_postcopy=false, f=0x55ca907044b0)
> at /build/qemu-5.0/migration/savevm.c:1401
> #7  qemu_savevm_state_complete_precopy (f=0x55ca907044b0, 
> iterable_only=iterable_only@entry=false, 
> inactivate_disks=inactivate_disks@entry=true)
> at /build/qemu-5.0/migration/savevm.c:1453
> #8  0x55ca8de4f581 in migration_completion (s=0x55ca8f64d9f0) at 
> /build/qemu-5.0/migration/migration.c:2941
> #9  migration_iteration_run (s=0x55ca8f64d9f0) at 
> /build/qemu-5.0/migration/migration.c:3295
> #10 migration_thread (opaque=opaque@entry=0x55ca8f64d9f0) at 
> /build/qemu-5.0/migration/migration.c:3459
> #11 0x55ca8dfc6716 in qemu_thread_start (args=) at 
> /build/qemu-5.0/util/qemu-thread-posix.c:519
> #12 0x7fbf95c5f184 in start_thread () from 
> /lib/x86_64-linux-gnu/libpthread.so.0
> #13 0x7fbf9598cbed in clone () from /lib/x86_64-linux-gnu/libc.so.6
>
> When the first migration completes, bdrv_inactivate_all() sets the
> BDRV_O_INACTIVE flag in bs->open_flags, and during the second migration
> bdrv_inactivate_recurse() finds BDRV_O_INACTIVE already set, which trips
> the assertion and causes the crash.
>
> As Vladimir suggested, this patch makes migrate_prepare check the state of
> the VM and return an error if it is in the RUN_STATE_POSTMIGRATE state.
>
> Signed-off-by: Tuguoyi 
A similar issue was reported by Li Zhang (+CC), with an almost identical
patch [3] to fix it.

Reported-by: Li Zhang 
Reviewed-by: Pankaj Gupta 

[3] https://marc.info/?l=qemu-devel&m=160749859831357&w=2
> ---
>  migration/migration.c | 6 ++
>  1 file changed, 6 insertions(+)
>
> diff --git a/migration/migration.c b/migration/migration.c
> index 87a9b59..5e33962 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -2115,6 +2115,12 @@ static bool migrate_prepare(MigrationState *s, bool 
> blk, bool blk_inc,
>  return false;
>  }
>
> +if (runstate_check(RUN_STATE_POSTMIGRATE)) {
> +error_setg(errp, "Can't migrate the vm that was paused due to "
> +   "previous migration");
> +return false;
> +}
> +
>  if (migration_is_blocked(errp)) {
>  return false;
>  }
> --
> 2.7.4
>
> [Patch v2]: 
> https://lists.gnu.org/archive/html/qemu-devel/2020-12/msg01318.html
> [Patch v1]: 
> https://lists.gnu.org/archive/html/qemu-devel/2020-11/msg05950.html



Re: [PATCH 02/13] virtio-pmem: put it into the 'storage' category

2020-12-03 Thread Pankaj Gupta
> The category of the virtio-pmem device is not set; put it into the 'storage'
> category.
>
> Signed-off-by: Gan Qixin 
> ---
> Cc: Michael S. Tsirkin 
> ---
>  hw/virtio/virtio-pmem.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/hw/virtio/virtio-pmem.c b/hw/virtio/virtio-pmem.c
> index ddb0125901..98b3139cd0 100644
> --- a/hw/virtio/virtio-pmem.c
> +++ b/hw/virtio/virtio-pmem.c
> @@ -175,6 +175,7 @@ static void virtio_pmem_class_init(ObjectClass *klass, 
> void *data)
>
>  vpc->fill_device_info = virtio_pmem_fill_device_info;
>  vpc->get_memory_region = virtio_pmem_get_memory_region;
> +set_bit(DEVICE_CATEGORY_STORAGE, dc->categories);

I just noticed that "virtio_pmem_pci_class_init" is marked as DEVICE_CATEGORY_MISC.
Shouldn't both virtio-pmem & virtio_pmem_pci have the same device category?

Thanks,
Pankaj

>  }
>
>  static TypeInfo virtio_pmem_info = {
> --
> 2.23.0
>
>



Re: [PATCH v2 01/12] pc-dimm: put it into the 'storage' category

2020-12-03 Thread Pankaj Gupta
> > >  mdc->get_memory_region = pc_dimm_md_get_memory_region;
> > >  mdc->fill_device_info = pc_dimm_md_fill_device_info;
> > > +set_bit(DEVICE_CATEGORY_STORAGE, dc->categories);
> >
> > Any reason why pc-dimm would be in the storage category?
...
> Thanks for your reply. As far as I know, pc-dimm is a DIMM device for memory 
> hotplug, described as a "DIMM memory module" in "-device help".
> This device looks related to storage, so I put it into the "storage" category 
> to make it easy to find. I'm not sure if this is appropriate; do you have any 
> better ideas?

O.k. I see storage & memory as different kinds of devices. Since we don't
have any memory device category defined, maybe it's okay to put it in the
MISC category.
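(As far as I understand, the category only changes how the device is grouped
in the "-device help" listing, so either way it is cosmetic.)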

Thanks,
Pankaj



Re: [PATCH v2 01/12] pc-dimm: put it into the 'storage' category

2020-11-30 Thread Pankaj Gupta
> The category of the pc-dimm device is not set; put it into the 'storage'
> category.
>
> Signed-off-by: Gan Qixin 
> ---
> Cc: Michael S. Tsirkin 
> ---
>  hw/mem/pc-dimm.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/hw/mem/pc-dimm.c b/hw/mem/pc-dimm.c
> index 2ffc986734..017146e3d1 100644
> --- a/hw/mem/pc-dimm.c
> +++ b/hw/mem/pc-dimm.c
> @@ -282,6 +282,7 @@ static void pc_dimm_class_init(ObjectClass *oc, void 
> *data)
>  mdc->get_plugged_size = memory_device_get_region_size;
>  mdc->get_memory_region = pc_dimm_md_get_memory_region;
>  mdc->fill_device_info = pc_dimm_md_fill_device_info;
> +set_bit(DEVICE_CATEGORY_STORAGE, dc->categories);

Any reason why pc-dimm would be in the storage category?

>  }
>
>  static TypeInfo pc_dimm_info = {
> --
> 2.23.0
>
>



Re: [PATCH v2 04/12] nvdimm: put it into the 'storage' category

2020-11-30 Thread Pankaj Gupta
> The category of the nvdimm device is not set; put it into the 'storage'
> category.
>
> Signed-off-by: Gan Qixin 
> ---
> Cc: Xiao Guangrong 
> ---
>  hw/mem/nvdimm.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/hw/mem/nvdimm.c b/hw/mem/nvdimm.c
> index e1574bc07c..e30695b2ce 100644
> --- a/hw/mem/nvdimm.c
> +++ b/hw/mem/nvdimm.c
> @@ -236,6 +236,7 @@ static void nvdimm_class_init(ObjectClass *oc, void *data)
>
>  nvc->read_label_data = nvdimm_read_label_data;
>  nvc->write_label_data = nvdimm_write_label_data;
> +set_bit(DEVICE_CATEGORY_STORAGE, dc->categories);
>  }
>
>  static TypeInfo nvdimm_info = {

Reviewed-by: Pankaj Gupta 



Re: [for-5.2 4/9] docs/system/virtio-pmem.rst: Fix minor style issues

2020-11-19 Thread Pankaj Gupta
c/msync is required. This is to persist guest writes in
> +host backing file which otherwise remains in host page cache and there is
> +risk of losing the data in case of power failure.
>
> - With virtio pmem device, MAP_SYNC mmap flag is not supported. This provides
> - a hint to application to perform fsync for write persistence.
> +With virtio pmem device, MAP_SYNC mmap flag is not supported. This provides
> +a hint to application to perform fsync for write persistence.
>
>  Limitations
>  

Reviewed-by: Pankaj Gupta 



Re: [PATCH 02/13] virtio-pmem: put it into the 'storage' category

2020-11-19 Thread Pankaj Gupta
> The category of the virtio-pmem device is not set; put it into the 'storage'
> category.
>
> Signed-off-by: Gan Qixin 
> ---
> Cc: Michael S. Tsirkin 
> ---
>  hw/virtio/virtio-pmem.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/hw/virtio/virtio-pmem.c b/hw/virtio/virtio-pmem.c
> index ddb0125901..98b3139cd0 100644
> --- a/hw/virtio/virtio-pmem.c
> +++ b/hw/virtio/virtio-pmem.c
> @@ -175,6 +175,7 @@ static void virtio_pmem_class_init(ObjectClass *klass, 
> void *data)
>
>  vpc->fill_device_info = virtio_pmem_fill_device_info;
>  vpc->get_memory_region = virtio_pmem_get_memory_region;
> +set_bit(DEVICE_CATEGORY_STORAGE, dc->categories);
>  }
>
>  static TypeInfo virtio_pmem_info = {

Reviewed-by: Pankaj Gupta 



[PATCH] virtio-pmem: add trace events

2020-11-17 Thread Pankaj Gupta
This patch adds trace events for virtio-pmem functionality.
Adding trace events for virtio pmem request, response and host
side fsync functionality.

Signed-off-by: Pankaj Gupta 
---
 hw/virtio/trace-events  | 5 +
 hw/virtio/virtio-pmem.c | 4 
 2 files changed, 9 insertions(+)

diff --git a/hw/virtio/trace-events b/hw/virtio/trace-events
index 2060a144a2..c62727f879 100644
--- a/hw/virtio/trace-events
+++ b/hw/virtio/trace-events
@@ -122,3 +122,8 @@ virtio_mem_unplug_all_request(void) ""
 virtio_mem_resized_usable_region(uint64_t old_size, uint64_t new_size) 
"old_size=0x%" PRIx64 "new_size=0x%" PRIx64
 virtio_mem_state_request(uint64_t addr, uint16_t nb_blocks) "addr=0x%" PRIx64 
" nb_blocks=%" PRIu16
 virtio_mem_state_response(uint16_t state) "state=%" PRIu16
+
+# virtio-pmem.c
+virtio_pmem_flush_request(void) "flush request"
+virtio_pmem_response(void) "flush response"
+virtio_pmem_flush_done(int type) "fsync return=%d"
diff --git a/hw/virtio/virtio-pmem.c b/hw/virtio/virtio-pmem.c
index ddb0125901..d83e973bf2 100644
--- a/hw/virtio/virtio-pmem.c
+++ b/hw/virtio/virtio-pmem.c
@@ -24,6 +24,7 @@
 #include "sysemu/hostmem.h"
 #include "block/aio.h"
 #include "block/thread-pool.h"
+#include "trace.h"
 
 typedef struct VirtIODeviceRequest {
 VirtQueueElement elem;
@@ -41,6 +42,7 @@ static int worker_cb(void *opaque)
 
 /* flush raw backing image */
 err = fsync(req_data->fd);
+trace_virtio_pmem_flush_done(err);
 if (err != 0) {
 err = 1;
 }
@@ -59,6 +61,7 @@ static void done_cb(void *opaque, int ret)
 /* Callbacks are serialized, so no need to use atomic ops. */
 virtqueue_push(req_data->pmem->rq_vq, &req_data->elem, len);
 virtio_notify((VirtIODevice *)req_data->pmem, req_data->pmem->rq_vq);
+trace_virtio_pmem_response();
 g_free(req_data);
 }
 
@@ -69,6 +72,7 @@ static void virtio_pmem_flush(VirtIODevice *vdev, VirtQueue 
*vq)
 HostMemoryBackend *backend = MEMORY_BACKEND(pmem->memdev);
 ThreadPool *pool = aio_get_thread_pool(qemu_get_aio_context());
 
+trace_virtio_pmem_flush_request();
 req_data = virtqueue_pop(vq, sizeof(VirtIODeviceRequest));
 if (!req_data) {
 virtio_error(vdev, "virtio-pmem missing request data");
-- 
2.20.1




Re: [PATCH] physmem: improve ram size error messages

2020-11-06 Thread Pankaj Gupta
Ping

>  The RAM size mismatch condition logs the message below.
>
>"Length mismatch: pc.ram: 0x8000 in != 0x18000: Invalid argument"
>
>  This patch improves the readability of error messages.
>  Removed the superfluous "in" and changed "Length" to "Size".
>
> Signed-off-by: Pankaj Gupta 
> Reported-by: Li Zhang 
> ---
>  softmmu/physmem.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/softmmu/physmem.c b/softmmu/physmem.c
> index e319fb2a1e..8da184f4a6 100644
> --- a/softmmu/physmem.c
> +++ b/softmmu/physmem.c
> @@ -1756,15 +1756,15 @@ int qemu_ram_resize(RAMBlock *block, ram_addr_t 
> newsize, Error **errp)
>
>  if (!(block->flags & RAM_RESIZEABLE)) {
>  error_setg_errno(errp, EINVAL,
> - "Length mismatch: %s: 0x" RAM_ADDR_FMT
> - " in != 0x" RAM_ADDR_FMT, block->idstr,
> + "Size mismatch: %s: 0x" RAM_ADDR_FMT
> + " != 0x" RAM_ADDR_FMT, block->idstr,
>   newsize, block->used_length);
>  return -EINVAL;
>  }
>
>  if (block->max_length < newsize) {
>  error_setg_errno(errp, EINVAL,
> - "Length too large: %s: 0x" RAM_ADDR_FMT
> + "Size too large: %s: 0x" RAM_ADDR_FMT
>   " > 0x" RAM_ADDR_FMT, block->idstr,
>   newsize, block->max_length);
>  return -EINVAL;
> --
> 2.20.1
>



Re: Documents not in sphinx toctree

2020-11-06 Thread Pankaj Gupta
> > >> By running sphinx over the docs/ directory (like readthedocs.org 
> > >> presumably does), it finds a couple of rst documents that are not 
> > >> referenced:
> > >> - cpu-hotplug.rst
> > >> - microvm.rst
> > >> - pr-manager.rst
> > >> - virtio-net-failover.rst
> >
> > Given the current structure of the content in
> > https://qemu.readthedocs.io/en/latest/,
> > would adding this as a new bullet in "QEMU System Emulation User’s
> > Guide" be the right thing to do?
>
> Adding which?
>
> For cpu-hotplug.rst:
>  I guess the system manual. The document has a bit of a
>  "tutorial" feel which doesn't entirely fit the rest of the
>  manuals.
>
> For microvm.rst:
>  docs/system/target-i386.rst should be split into
>  documentation for each of the machine models separately
>  (as a list of links to docs in docs/system/i386/, similar
>  to the structure of target-arm.rst and docs/system/arm/).
>  Then microvm.rst can go into docs/system/i386.
>
> For pr-manager.rst:
>  The parts that are documenting the qemu-pr-helper invocation
>  should turn into a docs/tools/ manpage for it.
>  The other parts should go in the system manual I guess.
>
> For virtio-net-failover.rst:
>  Should go under the "Network emulation" part of the system
>  manual, I think.

Maybe the existing memory emulation '.txt' files should be converted to
'.rst' and, together with virtio-pmem.rst, moved to a new section?

Thanks,
Pankaj



[PATCH] physmem: improve ram size error messages

2020-10-22 Thread Pankaj Gupta
 The RAM size mismatch condition logs the message below.

   "Length mismatch: pc.ram: 0x8000 in != 0x18000: Invalid argument"

 This patch improves the readability of error messages. 
 Removed the superfluous "in" and changed "Length" to "Size".
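 With this patch the same condition reads, for example:

   "Size mismatch: pc.ram: 0x8000 != 0x18000: Invalid argument"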

Signed-off-by: Pankaj Gupta 
Reported-by: Li Zhang 
---
 softmmu/physmem.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/softmmu/physmem.c b/softmmu/physmem.c
index e319fb2a1e..8da184f4a6 100644
--- a/softmmu/physmem.c
+++ b/softmmu/physmem.c
@@ -1756,15 +1756,15 @@ int qemu_ram_resize(RAMBlock *block, ram_addr_t 
newsize, Error **errp)
 
 if (!(block->flags & RAM_RESIZEABLE)) {
 error_setg_errno(errp, EINVAL,
- "Length mismatch: %s: 0x" RAM_ADDR_FMT
- " in != 0x" RAM_ADDR_FMT, block->idstr,
+ "Size mismatch: %s: 0x" RAM_ADDR_FMT
+ " != 0x" RAM_ADDR_FMT, block->idstr,
  newsize, block->used_length);
 return -EINVAL;
 }
 
 if (block->max_length < newsize) {
 error_setg_errno(errp, EINVAL,
- "Length too large: %s: 0x" RAM_ADDR_FMT
+ "Size too large: %s: 0x" RAM_ADDR_FMT
  " > 0x" RAM_ADDR_FMT, block->idstr,
  newsize, block->max_length);
 return -EINVAL;
-- 
2.20.1




Re: io_uring possibly the culprit for qemu hang (linux-5.4.y)

2020-10-19 Thread Pankaj Gupta
@Jack Wang,
Maybe the four io_uring patches in 5.4.71 fix the issue for you as well?

Thanks,
Pankaj

> Hi Jens.
>
> On Sat, Oct 17, 2020 at 3:07 AM Jens Axboe  wrote:
> >
> > Would be great if you could try 5.4.71 and see if that helps for your
> > issue.
> >
>
> Oh wow, yeah it did fix the issue.
>
> I'm able to reliably turn off and start the VM multiple times in a row.
> Double checked by confirming QEMU is dynamically linked to liburing.so.1.
>
> Looks like those 4 io_uring fixes helped.
>
> Thanks!
>



Re: [PATCH v2 1/5] virtio-mem: Probe THP size to determine default block size

2020-09-29 Thread Pankaj Gupta
> >> Let's allow a minimum block size of 1 MiB in all configurations. Select
> >> the default block size based on
> >> - The page size of the memory backend.
> >> - The THP size if the memory backend size corresponds to the real hsot
> >
> > s/hsot/host
>
> thanks!
>
> >>   page size.
> >> - The global minimum of 1 MiB.
> >> and warn if something smaller is configured by the user.
> >>
> >> VIRTIO_MEM only supports Linux (depends on LINUX), so we can probe the
> >> THP size unconditionally.
> >>
> >> For now we only support virtio-mem on x86-64 - there isn't a user-visiable
> >
> > s/visiable/visible
>
> thanks!
>
> >> change (x86-64 only supports 2 MiB THP on the PMD level) - the default
> >> was, and will be 2 MiB.
> >>
> >> If we ever have THP on the PUD level (e.g., 1 GiB THP on x86-64), we
> >> expect to have a trigger to explicitly opt-in for the new THP granularity.
> >>
> >> Cc: "Michael S. Tsirkin" 
> >> Cc: Wei Yang 
> >> Cc: Dr. David Alan Gilbert 
> >> Cc: Igor Mammedov 
> >> Cc: Pankaj Gupta 
> >> Signed-off-by: David Hildenbrand 
> >> ---
> >>  hw/virtio/virtio-mem.c | 105 +++--
> >>  1 file changed, 101 insertions(+), 4 deletions(-)
> >>
> >> diff --git a/hw/virtio/virtio-mem.c b/hw/virtio/virtio-mem.c
> >> index 8fbec77ccc..9b1461cf9d 100644
> >> --- a/hw/virtio/virtio-mem.c
> >> +++ b/hw/virtio/virtio-mem.c
> >> @@ -33,10 +33,83 @@
> >>  #include "trace.h"
> >>
> >>  /*
> >> - * Use QEMU_VMALLOC_ALIGN, so no THP will have to be split when unplugging
> >> - * memory (e.g., 2MB on x86_64).
> >> + * Let's not allow blocks smaller than 1 MiB, for example, to keep the 
> >> tracking
> >> + * bitmap small.
> >>   */
> >> -#define VIRTIO_MEM_MIN_BLOCK_SIZE ((uint32_t)QEMU_VMALLOC_ALIGN)
> >> +#define VIRTIO_MEM_MIN_BLOCK_SIZE ((uint32_t)(1 * MiB))
> >> +
> >> +#if defined(__x86_64__) || defined(__arm__) || defined(__aarch64__) || \
> >> +defined(__powerpc64__)
> >> +#define VIRTIO_MEM_DEFAULT_THP_SIZE ((uint32_t)(2 * MiB))
> >> +#else
> >> +/* fallback to 1 MiB (e.g., the THP size on s390x) */
> >> +#define VIRTIO_MEM_DEFAULT_THP_SIZE VIRTIO_MEM_MIN_BLOCK_SIZE
> >> +#endif
> >> +
> >> +/*
> >> + * We want to have a reasonable default block size such that
> >> + * 1. We avoid splitting THPs when unplugging memory, which degrades
> >> + *performance.
> >> + * 2. We avoid placing THPs for plugged blocks that also cover unplugged
> >> + *blocks.
> >> + *
> >> + * The actual THP size might differ between Linux kernels, so we try to 
> >> probe
> >> + * it. In the future (if we ever run into issues regarding 2.), we might 
> >> want
> >> + * to disable THP in case we fail to properly probe the THP size, or if 
> >> the
> >> + * block size is configured smaller than the THP size.
> >> + */
> >> +static uint32_t thp_size;
> >> +
> >> +#define HPAGE_PMD_SIZE_PATH 
> >> "/sys/kernel/mm/transparent_hugepage/hpage_pmd_size"
> >> +static uint32_t virtio_mem_thp_size(void)
> >> +{
> >> +gchar *content = NULL;
> >> +const char *endptr;
> >> +uint64_t tmp;
> >> +
> >> +if (thp_size) {
> >> +return thp_size;
> >> +}
> >> +
> >> +/*
> >> + * Try to probe the actual THP size, fallback to (sane but eventually
> >> + * incorrect) default sizes.
> >> + */
> >> +if (g_file_get_contents(HPAGE_PMD_SIZE_PATH, &content, NULL, NULL) &&
> >> +!qemu_strtou64(content, &endptr, 0, &tmp) &&
> >> +(!endptr || *endptr == '\n')) {
> >> +/*
> >> + * Sanity-check the value, if it's too big (e.g., aarch64 with 
> >> 64k base
> >> + * pages) or weird, fallback to something smaller.
> >> + */
> >> +if (!tmp || !is_power_of_2(tmp) || tmp > 16 * MiB) {
> >> +warn_report("Read unsupported THP size: %" PRIx64, tmp);
> >> +} else {
> >> +thp_size = tmp;
> >> +}
> >> +}
> >> +
> >> +if (!th

Re: [PATCH v2 1/5] virtio-mem: Probe THP size to determine default block size

2020-09-29 Thread Pankaj Gupta
> Let's allow a minimum block size of 1 MiB in all configurations. Select
> the default block size based on
> - The page size of the memory backend.
> - The THP size if the memory backend size corresponds to the real hsot

s/hsot/host
>   page size.
> - The global minimum of 1 MiB.
> and warn if something smaller is configured by the user.
>
> VIRTIO_MEM only supports Linux (depends on LINUX), so we can probe the
> THP size unconditionally.
>
> For now we only support virtio-mem on x86-64 - there isn't a user-visiable

s/visiable/visible
> change (x86-64 only supports 2 MiB THP on the PMD level) - the default
> was, and will be 2 MiB.
>
> If we ever have THP on the PUD level (e.g., 1 GiB THP on x86-64), we
> expect to have a trigger to explicitly opt-in for the new THP granularity.
>
> Cc: "Michael S. Tsirkin" 
> Cc: Wei Yang 
> Cc: Dr. David Alan Gilbert 
> Cc: Igor Mammedov 
> Cc: Pankaj Gupta 
> Signed-off-by: David Hildenbrand 
> ---
>  hw/virtio/virtio-mem.c | 105 +++--
>  1 file changed, 101 insertions(+), 4 deletions(-)
>
> diff --git a/hw/virtio/virtio-mem.c b/hw/virtio/virtio-mem.c
> index 8fbec77ccc..9b1461cf9d 100644
> --- a/hw/virtio/virtio-mem.c
> +++ b/hw/virtio/virtio-mem.c
> @@ -33,10 +33,83 @@
>  #include "trace.h"
>
>  /*
> - * Use QEMU_VMALLOC_ALIGN, so no THP will have to be split when unplugging
> - * memory (e.g., 2MB on x86_64).
> + * Let's not allow blocks smaller than 1 MiB, for example, to keep the 
> tracking
> + * bitmap small.
>   */
> -#define VIRTIO_MEM_MIN_BLOCK_SIZE ((uint32_t)QEMU_VMALLOC_ALIGN)
> +#define VIRTIO_MEM_MIN_BLOCK_SIZE ((uint32_t)(1 * MiB))
> +
> +#if defined(__x86_64__) || defined(__arm__) || defined(__aarch64__) || \
> +defined(__powerpc64__)
> +#define VIRTIO_MEM_DEFAULT_THP_SIZE ((uint32_t)(2 * MiB))
> +#else
> +/* fallback to 1 MiB (e.g., the THP size on s390x) */
> +#define VIRTIO_MEM_DEFAULT_THP_SIZE VIRTIO_MEM_MIN_BLOCK_SIZE
> +#endif
> +
> +/*
> + * We want to have a reasonable default block size such that
> + * 1. We avoid splitting THPs when unplugging memory, which degrades
> + *performance.
> + * 2. We avoid placing THPs for plugged blocks that also cover unplugged
> + *blocks.
> + *
> + * The actual THP size might differ between Linux kernels, so we try to probe
> + * it. In the future (if we ever run into issues regarding 2.), we might want
> + * to disable THP in case we fail to properly probe the THP size, or if the
> + * block size is configured smaller than the THP size.
> + */
> +static uint32_t thp_size;
> +
> +#define HPAGE_PMD_SIZE_PATH "/sys/kernel/mm/transparent_hugepage/hpage_pmd_size"
> +static uint32_t virtio_mem_thp_size(void)
> +{
> +gchar *content = NULL;
> +const char *endptr;
> +uint64_t tmp;
> +
> +if (thp_size) {
> +return thp_size;
> +}
> +
> +/*
> + * Try to probe the actual THP size, fallback to (sane but eventually
> + * incorrect) default sizes.
> + */
> +if (g_file_get_contents(HPAGE_PMD_SIZE_PATH, &content, NULL, NULL) &&
> +!qemu_strtou64(content, &endptr, 0, &tmp) &&
> +(!endptr || *endptr == '\n')) {
> +/*
> + * Sanity-check the value, if it's too big (e.g., aarch64 with 64k 
> base
> + * pages) or weird, fallback to something smaller.
> + */
> +if (!tmp || !is_power_of_2(tmp) || tmp > 16 * MiB) {
> +warn_report("Read unsupported THP size: %" PRIx64, tmp);
> +} else {
> +thp_size = tmp;
> +}
> +}
> +
> +if (!thp_size) {
> +thp_size = VIRTIO_MEM_DEFAULT_THP_SIZE;
> +warn_report("Could not detect THP size, falling back to %" PRIx64
> +"  MiB.", thp_size / MiB);
> +}
> +
> +g_free(content);
> +return thp_size;
> +}
> +
> +static uint64_t virtio_mem_default_block_size(RAMBlock *rb)
> +{
> +const uint64_t page_size = qemu_ram_pagesize(rb);
> +
> +/* We can have hugetlbfs with a page size smaller than the THP size. */
> +if (page_size == qemu_real_host_page_size) {
> +return MAX(page_size, virtio_mem_thp_size());
> +}

This condition is special: one can think of hugetlbfs configured with a page
size smaller than the THP size.
> +return MAX(page_size, VIRTIO_MEM_MIN_BLOCK_SIZE);

Do we still need this? Or can we have only one return for both cases?
Probably I am missing something here.
> +}
> +
>  /*
>   * Size the usable region bigger than the requested size if possible. 
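As an aside, the probe-plus-fallback logic above condenses into a small
standalone sketch. Everything here is illustrative (the names, the constants
and the simplistic parsing are mine, not the final QEMU code); it only assumes
the sysfs path and the 1 MiB / 16 MiB limits quoted in the patch:

    #include <stdio.h>
    #include <stdint.h>
    #include <inttypes.h>

    #define MIN_BLOCK_SIZE      (1u * 1024 * 1024)   /* 1 MiB floor */
    #define DEFAULT_THP_SIZE    (2u * 1024 * 1024)   /* arch fallback */
    #define HPAGE_PMD_SIZE_PATH "/sys/kernel/mm/transparent_hugepage/hpage_pmd_size"

    static uint32_t probe_thp_size(void)
    {
        uint64_t tmp = 0;
        FILE *f = fopen(HPAGE_PMD_SIZE_PATH, "r");

        if (f) {
            if (fscanf(f, "%" SCNu64, &tmp) != 1) {
                tmp = 0;                  /* unparsable -> treat as unknown */
            }
            fclose(f);
        }
        /* accept only sane values: nonzero power of two, at most 16 MiB */
        if (tmp && !(tmp & (tmp - 1)) && tmp <= 16 * 1024 * 1024) {
            return (uint32_t)tmp;
        }
        return DEFAULT_THP_SIZE;
    }

    int main(void)
    {
        uint32_t block = probe_thp_size();

        if (block < MIN_BLOCK_SIZE) {
            block = MIN_BLOCK_SIZE;       /* never go below the global minimum */
        }
        printf("default block size: %" PRIu32 " MiB\n", block / (1024u * 1024u));
        return 0;
    }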

Re: [PATCH v1 1/5] virtio-mem: Probe THP size to determine default block size

2020-09-28 Thread Pankaj Gupta
> >> Let's allow a minimum block size of 1 MiB in all configurations. Use
> >> a default block size based on the THP size, and warn if something
> >> smaller is configured by the user.
> >>
> >> VIRTIO_MEM only supports Linux (depends on LINUX), so we can probe the
> >> THP size unconditionally.
> >>
> >> For now we only support virtio-mem on x86-64 - there isn't a user-visiable
> >> change (x86-64 only supports 2 MiB THP on the PMD level) - the default
> >> was, and will be 2 MiB.
> >>
> >> If we ever have THP on the PUD level (e.g., 1 GiB THP on x86-64), we
> >> expect to have a trigger to explicitly opt-in for the new THP granularity.
> >>
> >> Cc: "Michael S. Tsirkin" 
> >> Cc: Wei Yang 
> >> Cc: Dr. David Alan Gilbert 
> >> Cc: Igor Mammedov 
> >> Cc: Pankaj Gupta 
> >> Signed-off-by: David Hildenbrand 
> >> ---
> >>  hw/virtio/virtio-mem.c | 82 +++---
> >>  1 file changed, 78 insertions(+), 4 deletions(-)
> >>
> >> diff --git a/hw/virtio/virtio-mem.c b/hw/virtio/virtio-mem.c
> >> index 8fbec77ccc..58098686ee 100644
> >> --- a/hw/virtio/virtio-mem.c
> >> +++ b/hw/virtio/virtio-mem.c
> >> @@ -33,10 +33,70 @@
> >>  #include "trace.h"
> >>
> >>  /*
> >> - * Use QEMU_VMALLOC_ALIGN, so no THP will have to be split when unplugging
> >> - * memory (e.g., 2MB on x86_64).
> >> + * Let's not allow blocks smaller than 1 MiB, for example, to keep the 
> >> tracking
> >> + * bitmap small.
> >>   */
> >> -#define VIRTIO_MEM_MIN_BLOCK_SIZE ((uint32_t)QEMU_VMALLOC_ALIGN)
> >> +#define VIRTIO_MEM_MIN_BLOCK_SIZE ((uint32_t)(1 * MiB))
> >> +
> >> +/*
> >> + * We want to have a reasonable default block size such that
> >> + * 1. We avoid splitting THPs when unplugging memory, which degrades
> >> + *performance.
> >> + * 2. We avoid placing THPs for plugged blocks that also cover unplugged
> >> + *blocks.
> >> + *
> >> + * The actual THP size might differ between Linux kernels, so we try to 
> >> probe
> >> + * it. In the future (if we ever run into issues regarding 2.), we might 
> >> want
> >> + * to disable THP in case we fail to properly probe the THP size, or if 
> >> the
> >> + * block size is configured smaller than the THP size.
> >> + */
> >> +static uint32_t default_block_size;
> >> +
> >> +#define HPAGE_PMD_SIZE_PATH "/sys/kernel/mm/transparent_hugepage/hpage_pmd_size"
> >> +static uint32_t virtio_mem_default_block_size(void)
> >> +{
> >> +gchar *content = NULL;
> >> +const char *endptr;
> >> +uint64_t tmp;
> >> +
> >> +if (default_block_size) {
> >> +return default_block_size;
> >> +}
> >> +
> >> +/*
> >> + * Try to probe the actual THP size, fallback to (sane but eventually
> >> + * incorrect) default sizes.
> >> + */
> >> +if (g_file_get_contents(HPAGE_PMD_SIZE_PATH, &content, NULL, NULL) &&
> >> +!qemu_strtou64(content, &endptr, 0, &tmp) &&
> >> +(!endptr || *endptr == '\n')) {
> >> +/*
> >> + * Sanity-check the value, if it's too big (e.g., aarch64 with 
> >> 64k base
> >> + * pages) or weird, fallback to something smaller.
> >> + */
> >> +if (!tmp || !is_power_of_2(tmp) || tmp > 16 * MiB) {
> >> +warn_report("Detected a THP size of %" PRIx64
> >> +" MiB, falling back to 1 MiB.", tmp / MiB);
> >> +default_block_size = 1 * MiB;
> >
> > Probably use macro "VIRTIO_MEM_MIN_BLOCK_SIZE"
>
> Indeed.
>
> >> +} else {
> >> +default_block_size = tmp;
> >> +}
> >> +} else {
> >> +#if defined(__x86_64__) || defined(__arm__) || defined(__aarch64__) || \
> >> +defined(__powerpc64__)
> >> +default_block_size = 2 * MiB;
> >> +#else
> >> +/* fallback to 1 MiB (e.g., the THP size on s390x) */
> >> +default_block_size = 1 * MiB;
> >> +#endif
> >
> > Maybe we can declare this macro near to "VIRTIO_MEM_MIN_BLOCK_SIZE
> > ((uint32_t)(1 * MiB))"
> > or club into one, just a thought.
>
> I decided to not use VIRTIO_MEM_MIN_BLOCK_SIZE here to not accidentally
> mess up the s390x THP size when ever wanting to decrease
> VIRTIO_MEM_MIN_BLOCK_SIZE. But as we have a comment here, people should
> know whats happening when ever changing VIRTIO_MEM_MIN_BLOCK_SIZE.

Thanks for answering. Makes sense.

Overall the patch looks good to me.
Reviewed-by: Pankaj Gupta 



Re: [PATCH v1 5/5] virito-mem: Implement get_min_alignment()

2020-09-26 Thread Pankaj Gupta
> This allows auto-assignment of a properly aligned address in guest
> physical address space. For example, when specifying a 2GB block size
> for a virtio-mem device with 10GB with a memory setup "-m 4G, 20G",
> we'll no longer fail when realizing.
>
> Cc: "Michael S. Tsirkin" 
> Cc: Wei Yang 
> Cc: Dr. David Alan Gilbert 
> Cc: Igor Mammedov 
> Cc: Pankaj Gupta 
> Signed-off-by: David Hildenbrand 
> ---
>  hw/virtio/virtio-mem-pci.c | 14 ++
>  hw/virtio/virtio-mem.c |  8 
>  2 files changed, 22 insertions(+)
>
> diff --git a/hw/virtio/virtio-mem-pci.c b/hw/virtio/virtio-mem-pci.c
> index 590cec041b..2bfa2474fb 100644
> --- a/hw/virtio/virtio-mem-pci.c
> +++ b/hw/virtio/virtio-mem-pci.c
> @@ -75,6 +75,19 @@ static void virtio_mem_pci_fill_device_info(const 
> MemoryDeviceState *md,
>  info->type = MEMORY_DEVICE_INFO_KIND_VIRTIO_MEM;
>  }
>
> +static uint64_t virtio_mem_pci_get_min_alignment(const MemoryDeviceState *md)
> +{
> +/*
> + * If no block size was configured, returns the default block size.
> + * Before the device was realized, this might be smaller than the
> + * final block size (because it ignores the page size of the memory 
> region).
> + * However, the alignment of the memory region properly considers the
> + * page size of the memory region.
> + */
> +return object_property_get_uint(OBJECT(md), VIRTIO_MEM_BLOCK_SIZE_PROP,
> +&error_abort);
> +}
> +
>  static void virtio_mem_pci_size_change_notify(Notifier *notifier, void *data)
>  {
>  VirtIOMEMPCI *pci_mem = container_of(notifier, VirtIOMEMPCI,
> @@ -109,6 +122,7 @@ static void virtio_mem_pci_class_init(ObjectClass *klass, 
> void *data)
>  mdc->get_plugged_size = virtio_mem_pci_get_plugged_size;
>  mdc->get_memory_region = virtio_mem_pci_get_memory_region;
>  mdc->fill_device_info = virtio_mem_pci_fill_device_info;
> +mdc->get_min_alignment = virtio_mem_pci_get_min_alignment;
>  }
>
>  static void virtio_mem_pci_instance_init(Object *obj)
> diff --git a/hw/virtio/virtio-mem.c b/hw/virtio/virtio-mem.c
> index 716eddd792..d8222153cf 100644
> --- a/hw/virtio/virtio-mem.c
> +++ b/hw/virtio/virtio-mem.c
> @@ -805,6 +805,14 @@ static void virtio_mem_get_block_size(Object *obj, 
> Visitor *v, const char *name,
>  const VirtIOMEM *vmem = VIRTIO_MEM(obj);
>  uint64_t value = vmem->block_size;
>
> +/*
> + * If not configured by the user (and we're not realized yet), use the
> + * default block size.
> + */
> +if (!value) {
> +value = virtio_mem_default_block_size();
> +}
> +
>  visit_type_size(v, name, &value, errp);
>  }
>

Reviewed-by: Pankaj Gupta 



Re: [PATCH v1 4/5] memory-device: Add get_min_alignment() callback

2020-09-26 Thread Pankaj Gupta
> Will be used by virtio-mem to express special alignment requirements due
> to manually configured, big block sizes. This avoids failing later when
> realizing, because auto-detection wasn't able to assign a properly
> aligned address.
>
> Cc: "Michael S. Tsirkin" 
> Cc: Wei Yang 
> Cc: Dr. David Alan Gilbert 
> Cc: Igor Mammedov 
> Cc: Pankaj Gupta 
> Signed-off-by: David Hildenbrand 
> ---
>  hw/mem/memory-device.c | 11 +--
>  include/hw/mem/memory-device.h | 11 +++
>  2 files changed, 20 insertions(+), 2 deletions(-)
>
> diff --git a/hw/mem/memory-device.c b/hw/mem/memory-device.c
> index 8a736f1a26..cf0627fd01 100644
> --- a/hw/mem/memory-device.c
> +++ b/hw/mem/memory-device.c
> @@ -259,7 +259,7 @@ void memory_device_pre_plug(MemoryDeviceState *md, 
> MachineState *ms,
>  {
>  const MemoryDeviceClass *mdc = MEMORY_DEVICE_GET_CLASS(md);
>  Error *local_err = NULL;
> -uint64_t addr, align;
> +uint64_t addr, align = 0;
>  MemoryRegion *mr;
>
>  mr = mdc->get_memory_region(md, &local_err);
> @@ -267,7 +267,14 @@ void memory_device_pre_plug(MemoryDeviceState *md, 
> MachineState *ms,
>  goto out;
>  }
>
> -align = legacy_align ? *legacy_align : memory_region_get_alignment(mr);
> +if (legacy_align) {
> +align = *legacy_align;
> +} else {
> +if (mdc->get_min_alignment) {
> +align = mdc->get_min_alignment(md);
> +}
> +align = MAX(align, memory_region_get_alignment(mr));
> +}
>  addr = mdc->get_addr(md);
>  addr = memory_device_get_free_addr(ms, !addr ? NULL : &addr, align,
> memory_region_size(mr), &local_err);
> diff --git a/include/hw/mem/memory-device.h b/include/hw/mem/memory-device.h
> index cde52e83c9..563893854a 100644
> --- a/include/hw/mem/memory-device.h
> +++ b/include/hw/mem/memory-device.h
> @@ -88,6 +88,17 @@ struct MemoryDeviceClass {
>   */
>  MemoryRegion *(*get_memory_region)(MemoryDeviceState *md, Error **errp);
>
> +/*
> + * Optional: Return the desired minimum alignment of the device in guest
> + * physical address space, ignoring the alignment requirements of the
> + * memory region (e.g., based on the page size). The final alignment is
> + * computed by selecting the maximum of both alignments.
> + *
> + * Called when plugging the memory device to detect the required 
> alignment
> + * during address assignment.
> + */
> +uint64_t (*get_min_alignment)(const MemoryDeviceState *md);
> +
>  /*
>   * Translate the memory device into #MemoryDeviceInfo.
>   */
> --

Reviewed-by: Pankaj Gupta 
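For intuition, the final alignment rule in pre_plug boils down to taking the
maximum of the two requirements; a tiny self-contained illustration (values
hypothetical, not QEMU code):

    #include <stdio.h>
    #include <stdint.h>
    #include <inttypes.h>

    #define MiB (1024ull * 1024)
    #define GiB (1024ull * MiB)

    /* effective alignment = max(device-requested minimum, region alignment) */
    static uint64_t effective_align(uint64_t dev_min, uint64_t mr_align)
    {
        return dev_min > mr_align ? dev_min : mr_align;
    }

    int main(void)
    {
        /* a device asking for 2 GiB alignment on a region that only needs
         * 2 MiB (its page size): the device requirement wins */
        printf("align = 0x%" PRIx64 "\n", effective_align(2 * GiB, 2 * MiB));
        return 0;
    }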



Re: [PATCH v1 3/5] memory-device: Support big alignment requirements

2020-09-25 Thread Pankaj Gupta
> Let's warn instead of bailing out - the worst thing that can happen is
> that we'll fail hot/coldplug later. The user got warned, and this should
> be rare.
>
> This will be necessary for memory devices with rather big (user-defined)
> alignment requirements - say a virtio-mem device with a 2G block size -
> which will become important, for example, when supporting vfio in the
> future.
>
> Cc: "Michael S. Tsirkin" 
> Cc: Wei Yang 
> Cc: Dr. David Alan Gilbert 
> Cc: Igor Mammedov 
> Cc: Pankaj Gupta 
> Signed-off-by: David Hildenbrand 
> ---
>  hw/mem/memory-device.c | 9 +
>  1 file changed, 5 insertions(+), 4 deletions(-)
>
> diff --git a/hw/mem/memory-device.c b/hw/mem/memory-device.c
> index 4bc9cf0917..8a736f1a26 100644
> --- a/hw/mem/memory-device.c
> +++ b/hw/mem/memory-device.c
> @@ -119,9 +119,10 @@ static uint64_t memory_device_get_free_addr(MachineState 
> *ms,
>
>  /* start of address space indicates the maximum alignment we expect */
>  if (!QEMU_IS_ALIGNED(range_lob(&as), align)) {
> -error_setg(errp, "the alignment (0x%" PRIx64 ") is not supported",
> -   align);
> -return 0;
> +warn_report("the alignment (0x%" PRIx64 ") exceeds the expected"
> +" maximum alignment, memory will get fragmented and not"
> +" all 'maxmem' might be usable for memory devices.",
> +align);
>  }
>
>  memory_device_check_addable(ms, size, &err);
> @@ -151,7 +152,7 @@ static uint64_t memory_device_get_free_addr(MachineState 
> *ms,
>  return 0;
>  }
>  } else {
> -if (range_init(&new, range_lob(&as), size)) {
> +if (range_init(&new, QEMU_ALIGN_UP(range_lob(&as), align), size)) {
>  error_setg(errp, "can't add memory device, device too big");
>  return 0;
>  }

Reviewed-by: Pankaj Gupta 
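The rounding in the second hunk is ordinary align-up arithmetic; a minimal
standalone sketch of the same computation (power-of-two alignment assumed,
values hypothetical):

    #include <stdio.h>
    #include <stdint.h>
    #include <inttypes.h>

    /* same rounding as QEMU_ALIGN_UP(addr, align) for power-of-two align */
    static uint64_t align_up(uint64_t addr, uint64_t align)
    {
        return (addr + align - 1) & ~(align - 1);
    }

    int main(void)
    {
        /* device region starting at 5 GiB with 2 GiB alignment -> 6 GiB */
        printf("0x%" PRIx64 "\n", align_up(5ull << 30, 2ull << 30));
        return 0;
    }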



Re: [PATCH v1 2/5] virtio-mem: Check that "memaddr" is multiples of the block size

2020-09-25 Thread Pankaj Gupta
> The spec requires us to set the "addr" in guest physical address space to
> multiples of the block size. In some cases, this is not the case right
> now: For example, when starting a VM with 4 GiB boot memory and a
> virtio-mem device with a block size of 2 GiB, "memaddr" will be
> auto-assigned to 0x140000000 / 5 GiB.
>
> We'll try to improve auto-assignment for memory devices next, to avoid
> bailing out in case memory device code selects a bad address.
>
> Note: The Linux driver doesn't support such big block sizes yet.
>
> Cc: "Michael S. Tsirkin" 
> Cc: Wei Yang 
> Cc: Dr. David Alan Gilbert 
> Cc: Igor Mammedov 
> Cc: Pankaj Gupta 
> Signed-off-by: David Hildenbrand 
> ---
>  hw/virtio/virtio-mem.c | 5 +
>  1 file changed, 5 insertions(+)
>
> diff --git a/hw/virtio/virtio-mem.c b/hw/virtio/virtio-mem.c
> index 58098686ee..716eddd792 100644
> --- a/hw/virtio/virtio-mem.c
> +++ b/hw/virtio/virtio-mem.c
> @@ -515,6 +515,11 @@ static void virtio_mem_device_realize(DeviceState *dev, 
> Error **errp)
> ")", VIRTIO_MEM_REQUESTED_SIZE_PROP,
> VIRTIO_MEM_BLOCK_SIZE_PROP, vmem->block_size);
>  return;
> +} else if (!QEMU_IS_ALIGNED(vmem->addr, vmem->block_size)) {
> +error_setg(errp, "'%s' property has to be multiples of '%s' (0x%" 
> PRIx64
> +   ")", VIRTIO_MEM_ADDR_PROP, VIRTIO_MEM_BLOCK_SIZE_PROP,
> +   vmem->block_size);
> +return;
>  } else if (!QEMU_IS_ALIGNED(memory_region_size(&vmem->memdev->mr),
>  vmem->block_size)) {
>  error_setg(errp, "'%s' property memdev size has to be multiples of"
> --
> 2.26.2

Reviewed-by: Pankaj Gupta 
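The new check is the usual power-of-two alignment test; applied to the 5 GiB
example from the commit message it looks like this (standalone sketch, not
QEMU code):

    #include <stdio.h>
    #include <stdint.h>

    /* what QEMU_IS_ALIGNED(value, align) boils down to (power-of-two align) */
    static int is_aligned(uint64_t value, uint64_t align)
    {
        return (value & (align - 1)) == 0;
    }

    int main(void)
    {
        uint64_t addr = 5ull << 30;         /* 0x140000000, as auto-assigned */
        uint64_t block_size = 2ull << 30;   /* 2 GiB */

        /* prints 0: the address is not a block-size multiple, realize fails */
        printf("aligned: %d\n", is_aligned(addr, block_size));
        return 0;
    }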



Re: [PATCH v1 1/5] virtio-mem: Probe THP size to determine default block size

2020-09-25 Thread Pankaj Gupta
> Let's allow a minimum block size of 1 MiB in all configurations. Use
> a default block size based on the THP size, and warn if something
> smaller is configured by the user.
>
> VIRTIO_MEM only supports Linux (depends on LINUX), so we can probe the
> THP size unconditionally.
>
> For now we only support virtio-mem on x86-64 - there isn't a user-visiable
> change (x86-64 only supports 2 MiB THP on the PMD level) - the default
> was, and will be 2 MiB.
>
> If we ever have THP on the PUD level (e.g., 1 GiB THP on x86-64), we
> expect to have a trigger to explicitly opt-in for the new THP granularity.
>
> Cc: "Michael S. Tsirkin" 
> Cc: Wei Yang 
> Cc: Dr. David Alan Gilbert 
> Cc: Igor Mammedov 
> Cc: Pankaj Gupta 
> Signed-off-by: David Hildenbrand 
> ---
>  hw/virtio/virtio-mem.c | 82 +++---
>  1 file changed, 78 insertions(+), 4 deletions(-)
>
> diff --git a/hw/virtio/virtio-mem.c b/hw/virtio/virtio-mem.c
> index 8fbec77ccc..58098686ee 100644
> --- a/hw/virtio/virtio-mem.c
> +++ b/hw/virtio/virtio-mem.c
> @@ -33,10 +33,70 @@
>  #include "trace.h"
>
>  /*
> - * Use QEMU_VMALLOC_ALIGN, so no THP will have to be split when unplugging
> - * memory (e.g., 2MB on x86_64).
> + * Let's not allow blocks smaller than 1 MiB, for example, to keep the 
> tracking
> + * bitmap small.
>   */
> -#define VIRTIO_MEM_MIN_BLOCK_SIZE ((uint32_t)QEMU_VMALLOC_ALIGN)
> +#define VIRTIO_MEM_MIN_BLOCK_SIZE ((uint32_t)(1 * MiB))
> +
> +/*
> + * We want to have a reasonable default block size such that
> + * 1. We avoid splitting THPs when unplugging memory, which degrades
> + *performance.
> + * 2. We avoid placing THPs for plugged blocks that also cover unplugged
> + *blocks.
> + *
> + * The actual THP size might differ between Linux kernels, so we try to probe
> + * it. In the future (if we ever run into issues regarding 2.), we might want
> + * to disable THP in case we fail to properly probe the THP size, or if the
> + * block size is configured smaller than the THP size.
> + */
> +static uint32_t default_block_size;
> +
> +#define HPAGE_PMD_SIZE_PATH "/sys/kernel/mm/transparent_hugepage/hpage_pmd_size"
> +static uint32_t virtio_mem_default_block_size(void)
> +{
> +gchar *content = NULL;
> +const char *endptr;
> +uint64_t tmp;
> +
> +if (default_block_size) {
> +return default_block_size;
> +}
> +
> +/*
> + * Try to probe the actual THP size, fallback to (sane but eventually
> + * incorrect) default sizes.
> + */
> +if (g_file_get_contents(HPAGE_PMD_SIZE_PATH, &content, NULL, NULL) &&
> +!qemu_strtou64(content, &endptr, 0, &tmp) &&
> +(!endptr || *endptr == '\n')) {
> +/*
> + * Sanity-check the value, if it's too big (e.g., aarch64 with 64k 
> base
> + * pages) or weird, fallback to something smaller.
> + */
> +if (!tmp || !is_power_of_2(tmp) || tmp > 16 * MiB) {
> +warn_report("Detected a THP size of %" PRIx64
> +" MiB, falling back to 1 MiB.", tmp / MiB);
> +default_block_size = 1 * MiB;

Probably use macro "VIRTIO_MEM_MIN_BLOCK_SIZE"
> +} else {
> +default_block_size = tmp;
> +}
> +} else {
> +#if defined(__x86_64__) || defined(__arm__) || defined(__aarch64__) || \
> +defined(__powerpc64__)
> +default_block_size = 2 * MiB;
> +#else
> +/* fallback to 1 MiB (e.g., the THP size on s390x) */
> +default_block_size = 1 * MiB;
> +#endif

Maybe we can declare this macro near to "VIRTIO_MEM_MIN_BLOCK_SIZE
((uint32_t)(1 * MiB))"
or club into one, just a thought.

> +warn_report("Could not detect THP size, falling back to %" PRIx64
> +"  MiB.", default_block_size / MiB);
> +}
> +
> +g_free(content);
> +return default_block_size;
> +}
> +
>  /*
>   * Size the usable region bigger than the requested size if possible. Esp.
>   * Linux guests will only add (aligned) memory blocks in case they fully
> @@ -437,6 +497,15 @@ static void virtio_mem_device_realize(DeviceState *dev, 
> Error **errp)
>  rb = vmem->memdev->mr.ram_block;
>  page_size = qemu_ram_pagesize(rb);
>
> +/*
> + * If the block size wasn't configured by the user, use a sane default. 
> This
> + * allows using hugetlbfs backends with a pagesize bigger than the 
> detected
> + * THP size without manual intervention/configuration.
> + */
> +if (

Re: [PATCH] virtio-pmem-pci: force virtio version 1

2020-09-25 Thread Pankaj Gupta
>
> >  Qemu fails with below error when trying to run with virtio pmem:
> >
> >  (qemu) qemu-system-x86_64: -device virtio-pmem-pci,memdev=mem1,id=nv1:
> >   device is modern-only, use disable-legacy=on
>
> Oh, another one :(
:)
>
> >
> >  This patch fixes this by forcing virtio 1 with virtio-pmem.
> >
> > fixes: adf0748a49 ("virtio-pci: Proxy for virtio-pmem")
> > Signed-off-by: Pankaj Gupta 
> > ---
> >  hw/virtio/virtio-pmem-pci.c | 1 +
> >  1 file changed, 1 insertion(+)
> >
> > diff --git a/hw/virtio/virtio-pmem-pci.c b/hw/virtio/virtio-pmem-pci.c
> > index 21a457d151..2b2a0b1eae 100644
> > --- a/hw/virtio/virtio-pmem-pci.c
> > +++ b/hw/virtio/virtio-pmem-pci.c
> > @@ -22,6 +22,7 @@ static void virtio_pmem_pci_realize(VirtIOPCIProxy 
> > *vpci_dev, Error **errp)
> >  VirtIOPMEMPCI *pmem_pci = VIRTIO_PMEM_PCI(vpci_dev);
> >  DeviceState *vdev = DEVICE(&pmem_pci->vdev);
> >
> > +virtio_pci_force_virtio_1(vpci_dev);
> >  qdev_realize(vdev, BUS(&vpci_dev->bus), errp);
> >  }
> >
>
> The alternative would be to force virtio 1 only for 5.1 and later (see
> https://lore.kernel.org/qemu-devel/20200921122506.82515-1-sgarz...@redhat.com/).

Thanks for sharing this. It's still good to mark virtio pmem as a
modern virtio device.

Best regards,
Pankaj
>
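For anyone hitting the same failure before the fix lands, the error message
itself suggests a command-line workaround; an illustrative invocation, with
the memdev/id names assumed:

    -device virtio-pmem-pci,memdev=mem1,id=nv1,disable-legacy=on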



[PATCH] virtio-pmem-pci: force virtio version 1

2020-09-25 Thread Pankaj Gupta
 QEMU fails with the following error when trying to run with virtio-pmem:

 (qemu) qemu-system-x86_64: -device virtio-pmem-pci,memdev=mem1,id=nv1:
  device is modern-only, use disable-legacy=on
 
 This patch fixes this by forcing virtio 1 for virtio-pmem.

fixes: adf0748a49 ("virtio-pci: Proxy for virtio-pmem")
Signed-off-by: Pankaj Gupta 
---
 hw/virtio/virtio-pmem-pci.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/hw/virtio/virtio-pmem-pci.c b/hw/virtio/virtio-pmem-pci.c
index 21a457d151..2b2a0b1eae 100644
--- a/hw/virtio/virtio-pmem-pci.c
+++ b/hw/virtio/virtio-pmem-pci.c
@@ -22,6 +22,7 @@ static void virtio_pmem_pci_realize(VirtIOPCIProxy *vpci_dev, 
Error **errp)
 VirtIOPMEMPCI *pmem_pci = VIRTIO_PMEM_PCI(vpci_dev);
 DeviceState *vdev = DEVICE(&pmem_pci->vdev);
 
+virtio_pci_force_virtio_1(vpci_dev);
 qdev_realize(vdev, BUS(&vpci_dev->bus), errp);
 }
 
-- 
2.20.1




Re: [PATCH] docs: add 'io_uring' option to 'aio' param in qemu-options.hx

2020-09-24 Thread Pankaj Gupta
> When we added io_uring AIO engine, we forgot to update qemu-options.hx,
> so qemu(1) man page and qemu help were outdated.
>
> Signed-off-by: Stefano Garzarella 
> ---
>  qemu-options.hx | 10 ++
>  1 file changed, 6 insertions(+), 4 deletions(-)
>
> diff --git a/qemu-options.hx b/qemu-options.hx
> index 47f64be0c0..5b098577fe 100644
> --- a/qemu-options.hx
> +++ b/qemu-options.hx
> @@ -1053,7 +1053,8 @@ SRST
>  The path to the image file in the local filesystem
>
>  ``aio``
> -Specifies the AIO backend (threads/native, default: threads)
> +Specifies the AIO backend (threads/native/io_uring,
> +default: threads)
>
>  ``locking``
>  Specifies whether the image file is protected with Linux OFD
> @@ -1175,7 +1176,8 @@ DEF("drive", HAS_ARG, QEMU_OPTION_drive,
>  "-drive [file=file][,if=type][,bus=n][,unit=m][,media=d][,index=i]\n"
>  "
>  [,cache=writethrough|writeback|none|directsync|unsafe][,format=f]\n"
>  "   [,snapshot=on|off][,rerror=ignore|stop|report]\n"
> -"
>  [,werror=ignore|stop|report|enospc][,id=name][,aio=threads|native]\n"
> +"   [,werror=ignore|stop|report|enospc][,id=name]\n"
> +"   [,aio=threads|native|io_uring]\n"
>  "   [,readonly=on|off][,copy-on-read=on|off]\n"
>  "   [,discard=ignore|unmap][,detect-zeroes=on|off|unmap]\n"
>  "   [[,bps=b]|[[,bps_rd=r][,bps_wr=w]]]\n"
> @@ -1247,8 +1249,8 @@ SRST
>  The default mode is ``cache=writeback``.
>
>  ``aio=aio``
> -aio is "threads", or "native" and selects between pthread based
> -disk I/O and native Linux AIO.
> +aio is "threads", "native", or "io_uring" and selects between
> pthread
> +based disk I/O, native Linux AIO, or Linux io_uring API.
>
>  ``format=format``
>  Specify which disk format will be used rather than detecting the
>

Reviewed-by: Pankaj Gupta 

-- 
> 2.26.2
>
>
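As a usage example (an illustrative invocation, not part of the patch;
disk.img is a placeholder), the new value is selected per drive just like the
existing ones:

    qemu-system-x86_64 \
        -drive file=disk.img,format=raw,if=virtio,cache=none,aio=io_uring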
>


Re: [PATCH] hw: virtio-pmem: detach the element fromt the virtqueue when error occurs

2020-09-10 Thread Pankaj Gupta
Hi Michael,

Please add the following tag while applying the patch.

fixes: 5f503cd9f3 ("virtio-pmem: add virtio device")

On Mon, 7 Sep 2020 at 03:38, Li Qiang  wrote:
>
> ping!
>
> Li Qiang  于2020年8月28日周五 上午9:21写道:
> >
> > Kindly ping.
> >
> > Li Qiang  于2020年8月14日周五 上午12:52写道:
> > >
> > > If error occurs while processing the virtio request we should call
> > > 'virtqueue_detach_element' to detach the element from the virtqueue
> > > before free the elem.
> > >
> > > Signed-off-by: Li Qiang 
> > > ---
> > >  hw/virtio/virtio-pmem.c | 1 +
> > >  1 file changed, 1 insertion(+)
> > >
> > > diff --git a/hw/virtio/virtio-pmem.c b/hw/virtio/virtio-pmem.c
> > > index 1e0c137497..ddb0125901 100644
> > > --- a/hw/virtio/virtio-pmem.c
> > > +++ b/hw/virtio/virtio-pmem.c
> > > @@ -77,6 +77,7 @@ static void virtio_pmem_flush(VirtIODevice *vdev, 
> > > VirtQueue *vq)
> > >
> > >  if (req_data->elem.out_num < 1 || req_data->elem.in_num < 1) {
> > >  virtio_error(vdev, "virtio-pmem request not proper");
> > > +virtqueue_detach_element(vq, (VirtQueueElement *)req_data, 0);
> > >  g_free(req_data);
> > >  return;
> > >  }
> > > --
> > > 2.17.1
> > >
> > >
>



Re: [PATCH v7 2/2] i386: Simplify CPUID_8000_001E for AMD

2020-09-02 Thread Pankaj Gupta
= core_id % cores_in_ccx;
> -topo->num_nodes = nodes;
> -}
> -
>  /* Encode cache info for CPUID[801E] */
> -static void encode_topo_cpuid801e(CPUState *cs, X86CPU *cpu,
> -   uint32_t *eax, uint32_t *ebx,
> -   uint32_t *ecx, uint32_t *edx)
> +static void encode_topo_cpuid801e(X86CPU *cpu, X86CPUTopoInfo *topo_info,
> +  uint32_t *eax, uint32_t *ebx,
> +  uint32_t *ecx, uint32_t *edx)
>  {
> -struct core_topology topo = {0};
> -unsigned long nodes;
> -int shift;
> +X86CPUTopoIDs topo_ids;
> +
> +x86_topo_ids_from_apicid(cpu->apic_id, topo_info, &topo_ids);
>
> -build_core_topology(cs->nr_cores, cpu->core_id, &topo);
>  *eax = cpu->apic_id;
> +
>  /*
> - * CPUID_Fn801E_EBX
> - * 31:16 Reserved
> - * 15:8  Threads per core (The number of threads per core is
> - *   Threads per core + 1)
> - *  7:0  Core id (see bit decoding below)
> - *   SMT:
> - *   4:3 node id
> - * 2 Core complex id
> - *   1:0 Core id
> - *   Non SMT:
> - *   5:4 node id
> - * 3 Core complex id
> - *   1:0 Core id
> + * CPUID_Fn801E_EBX [Core Identifiers] (CoreId)
> + * Read-only. Reset: _h.
> + * See Core::X86::Cpuid::ExtApicId.
> + * Core::X86::Cpuid::CoreId_lthree[1:0]_core[3:0]_thread[1:0];
> + * Bits Description
> + * 31:16 Reserved.
> + * 15:8 ThreadsPerCore: threads per core. Read-only. Reset: XXh.
> + *  The number of threads per core is ThreadsPerCore+1.
> + *  7:0 CoreId: core ID. Read-only. Reset: XXh.
> + *
> + *  NOTE: CoreId is already part of apic_id. Just use it. We can
> + *  use all the 8 bits to represent the core_id here.
>   */
> -if (cs->nr_threads - 1) {
> -*ebx = ((cs->nr_threads - 1) << 8) | (topo.node_id << 3) |
> -(topo.ccx_id << 2) | topo.core_id;
> -} else {
> -*ebx = (topo.node_id << 4) | (topo.ccx_id << 3) | topo.core_id;
> -}
> +*ebx = ((topo_info->threads_per_core - 1) << 8) | (topo_ids.core_id & 
> 0xFF);
> +
>  /*
> - * CPUID_Fn801E_ECX
> - * 31:11 Reserved
> - * 10:8  Nodes per processor (Nodes per processor is number of nodes + 1)
> - *  7:0  Node id (see bit decoding below)
> - * 2  Socket id
> - *   1:0  Node id
> + * CPUID_Fn801E_ECX [Node Identifiers] (NodeId)
> + * Read-only. Reset: _0XXXh.
> + * Core::X86::Cpuid::NodeId_lthree[1:0]_core[3:0]_thread[1:0];
> + * Bits Description
> + * 31:11 Reserved.
> + * 10:8 NodesPerProcessor: Node per processor. Read-only. Reset: XXXb.
> + *  ValidValues:
> + *  Value Description
> + *  000b  1 node per processor.
> + *  001b  2 nodes per processor.
> + *  010b Reserved.
> + *  011b 4 nodes per processor.
> + *  111b-100b Reserved.
> + *  7:0 NodeId: Node ID. Read-only. Reset: XXh.
> + *
> + * NOTE: Hardware reserves 3 bits for number of nodes per processor.
> + * But users can create more nodes than the actual hardware can
> + * support. To genaralize we can use all the upper 8 bits for nodes.
> + * NodeId is combination of node and socket_id which is already decoded
> + * in apic_id. Just use it by shifting.
>   */
> -if (topo.num_nodes <= 4) {
> -*ecx = ((topo.num_nodes - 1) << 8) | (cpu->socket_id << 2) |
> -topo.node_id;
> -} else {
> -/*
> - * Node id fix up. Actual hardware supports up to 4 nodes. But with
> - * more than 32 cores, we may end up with more than 4 nodes.
> - * Node id is a combination of socket id and node id. Only 
> requirement
> - * here is that this number should be unique accross the system.
> - * Shift the socket id to accommodate more nodes. We dont expect both
> - * socket id and node id to be big number at the same time. This is 
> not
> - * an ideal config but we need to to support it. Max nodes we can 
> have
> - * is 32 (255/8) with 8 cores per node and 255 max cores. We only 
> need
> - * 5 bits for nodes. Find the left most set bit to represent the 
> total
> - * number of nodes. find_last_bit returns last set bit(0 based). Left
> - * shift(+1) the socket id to represent all the nodes.
> - */
> -nodes = topo.num_nodes - 1;
> -shift = find_last_bit(&nodes, 8);
> -*ecx = ((topo.num_nodes - 1) << 8) | (cpu->socket_id << (shift + 1)) 
> |
> -topo.node_id;
> -}
> +*ecx = ((topo_info->dies_per_pkg - 1) << 8) |
> +   ((cpu->apic_id >> apicid_die_offset(topo_info)) & 0xFF);
> +
>  *edx = 0;
>  }
>
> @@ -6017,7 +5912,7 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, 
> uint32_t count,
>  break;
>  case 0x801E:
>  assert(cpu->core_id <= 255);
> -encode_topo_cpuid801e(cs, cpu,
> +encode_topo_cpuid801e(cpu, &topo_info,
>eax, ebx, ecx, edx);
>  break;
>  case 0xC000:
>

Reviewed-by: Pankaj Gupta 
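To make the new encoding concrete, here is the bit packing for one invented
topology (all values here are hypothetical; the die offset depends on the
real APIC ID layout computed by x86_topo_ids_from_apicid()):

    #include <stdio.h>
    #include <stdint.h>
    #include <inttypes.h>

    int main(void)
    {
        /* invented topology: 2 threads/core, 2 dies/package; this layout
         * packs thread in bit 0, core in bits 3..1, die from bit 4 up */
        uint32_t threads_per_core = 2, dies_per_pkg = 2, die_offset = 4;
        uint32_t core_id = 5;
        uint32_t apic_id = (1u << die_offset) | (core_id << 1) | 1;  /* 0x1b */

        uint32_t ebx = ((threads_per_core - 1) << 8) | (core_id & 0xFF);
        uint32_t ecx = ((dies_per_pkg - 1) << 8) |
                       ((apic_id >> die_offset) & 0xFF);

        /* EBX=0x105 (1 extra thread, core 5), ECX=0x101 (2 nodes, node 1) */
        printf("EBX=0x%" PRIx32 " ECX=0x%" PRIx32 "\n", ebx, ecx);
        return 0;
    }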



Re: [PATCH v7 1/2] i386: Simplify CPUID_8000_001d for AMD

2020-09-02 Thread Pankaj Gupta
> Remove all the hardcoded values and replace with generalized
> fields.
>
> Signed-off-by: Babu Moger 
> ---
>  target/i386/cpu.c |   31 ---
>  1 file changed, 16 insertions(+), 15 deletions(-)
>
> diff --git a/target/i386/cpu.c b/target/i386/cpu.c
> index ba4667b33c..b12addf323 100644
> --- a/target/i386/cpu.c
> +++ b/target/i386/cpu.c
> @@ -395,11 +395,12 @@ static int cores_in_core_complex(int nr_cores)
>  }
>
>  /* Encode cache info for CPUID[801D] */
> -static void encode_cache_cpuid801d(CPUCacheInfo *cache, CPUState *cs,
> -uint32_t *eax, uint32_t *ebx,
> -uint32_t *ecx, uint32_t *edx)
> +static void encode_cache_cpuid801d(CPUCacheInfo *cache,
> +   X86CPUTopoInfo *topo_info,
> +   uint32_t *eax, uint32_t *ebx,
> +   uint32_t *ecx, uint32_t *edx)
>  {
> -uint32_t l3_cores;
> +uint32_t l3_threads;
>  assert(cache->size == cache->line_size * cache->associativity *
>cache->partitions * cache->sets);
>
> @@ -408,10 +409,10 @@ static void encode_cache_cpuid801d(CPUCacheInfo 
> *cache, CPUState *cs,
>
>  /* L3 is shared among multiple cores */
>  if (cache->level == 3) {
> -l3_cores = cores_in_core_complex(cs->nr_cores);
> -*eax |= ((l3_cores * cs->nr_threads) - 1) << 14;
> +l3_threads = topo_info->cores_per_die * topo_info->threads_per_core;
> +*eax |= (l3_threads - 1) << 14;
>  } else {
> -*eax |= ((cs->nr_threads - 1) << 14);
> +*eax |= ((topo_info->threads_per_core - 1) << 14);
>  }
>
>  assert(cache->line_size > 0);
> @@ -5994,20 +5995,20 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, 
> uint32_t count,
>  }
>  switch (count) {
>  case 0: /* L1 dcache info */
> -encode_cache_cpuid801d(env->cache_info_amd.l1d_cache, cs,
> -   eax, ebx, ecx, edx);
> +encode_cache_cpuid801d(env->cache_info_amd.l1d_cache,
> +   &topo_info, eax, ebx, ecx, edx);
>  break;
>  case 1: /* L1 icache info */
> -encode_cache_cpuid801d(env->cache_info_amd.l1i_cache, cs,
> -   eax, ebx, ecx, edx);
> +encode_cache_cpuid801d(env->cache_info_amd.l1i_cache,
> +   &topo_info, eax, ebx, ecx, edx);
>  break;
>  case 2: /* L2 cache info */
> -encode_cache_cpuid801d(env->cache_info_amd.l2_cache, cs,
> -   eax, ebx, ecx, edx);
> +encode_cache_cpuid801d(env->cache_info_amd.l2_cache,
> +   &topo_info, eax, ebx, ecx, edx);
>  break;
>  case 3: /* L3 cache info */
> -encode_cache_cpuid801d(env->cache_info_amd.l3_cache, cs,
> -   eax, ebx, ecx, edx);
> +encode_cache_cpuid801d(env->cache_info_amd.l3_cache,
> +   &topo_info, eax, ebx, ecx, edx);
>  break;
>  default: /* end of info */
>  *eax = *ebx = *ecx = *edx = 0;
>
Nice cleanup.
Reviewed-by: Pankaj Gupta 
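A quick worked example of the simplified L3 sharing count (hypothetical
4 cores per die with 2 threads per core; the CPUID field encodes N-1):

    #include <stdio.h>
    #include <stdint.h>
    #include <inttypes.h>

    int main(void)
    {
        /* hypothetical: 4 cores per die, 2 threads per core share the L3 */
        uint32_t cores_per_die = 4, threads_per_core = 2;
        uint32_t l3_threads = cores_per_die * threads_per_core;
        uint32_t eax = 0;

        eax |= (l3_threads - 1) << 14;   /* "num sharing cache" field is N-1 */
        printf("EAX=0x%" PRIx32 "\n", eax);  /* 7 << 14 = 0x1c000 */
        return 0;
    }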



Re: [PATCH] hw: virtio-pmem: detach the element fromt the virtqueue when error occurs

2020-08-17 Thread Pankaj Gupta
> If error occurs while processing the virtio request we should call
> 'virtqueue_detach_element' to detach the element from the virtqueue
> before free the elem.
>
> Signed-off-by: Li Qiang 
> ---
>  hw/virtio/virtio-pmem.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/hw/virtio/virtio-pmem.c b/hw/virtio/virtio-pmem.c
> index 1e0c137497..ddb0125901 100644
> --- a/hw/virtio/virtio-pmem.c
> +++ b/hw/virtio/virtio-pmem.c
> @@ -77,6 +77,7 @@ static void virtio_pmem_flush(VirtIODevice *vdev, VirtQueue 
> *vq)
>
>  if (req_data->elem.out_num < 1 || req_data->elem.in_num < 1) {
>  virtio_error(vdev, "virtio-pmem request not proper");
> +virtqueue_detach_element(vq, (VirtQueueElement *)req_data, 0);
>  g_free(req_data);
>  return;
>  }
> --

Reviewed-by: Pankaj Gupta 

> 2.17.1
>
>



Re: [PATCH] virtio-mem: Correct format specifier mismatch for RISC-V

2020-07-30 Thread Pankaj Gupta
> This likely affects other, less popular host architectures as well.
> Less common host architectures under linux get QEMU_VMALLOC_ALIGN (from
> which VIRTIO_MEM_MIN_BLOCK_SIZE is derived) define to a variable of
> type uintptr, which isn't compatible with the format specifier used to
> print a user message. Since this particular usage of the underlying data
> seems unique to this file, the simple fix is to just cast
> QEMU_VMALLOC_ALIGN to uint32_t, which corresponds to the format specifier
> used.
>
> Signed-off-by: Bruce Rogers 
> ---
>  hw/virtio/virtio-mem.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/hw/virtio/virtio-mem.c b/hw/virtio/virtio-mem.c
> index c12e9f79b0..7740fc613f 100644
> --- a/hw/virtio/virtio-mem.c
> +++ b/hw/virtio/virtio-mem.c
> @@ -36,7 +36,7 @@
>   * Use QEMU_VMALLOC_ALIGN, so no THP will have to be split when unplugging
>   * memory (e.g., 2MB on x86_64).
>   */
> -#define VIRTIO_MEM_MIN_BLOCK_SIZE QEMU_VMALLOC_ALIGN
> +#define VIRTIO_MEM_MIN_BLOCK_SIZE ((uint32_t)QEMU_VMALLOC_ALIGN)
>  /*
>   * Size the usable region bigger than the requested size if possible. Esp.
>   * Linux guests will only add (aligned) memory blocks in case they fully

Reviewed-by: Pankaj Gupta 



Re: [PATCH] docs/nvdimm: add 'pmem=on' for the device dax backend file

2020-07-29 Thread Pankaj Gupta
> At the end of live migration, QEMU uses msync() to flush the data to
> the backend storage. When the backend file is a character device dax,
> the pages explicitly avoid the page cache. It will return failure from 
> msync().
> The following warning is output.
>
> "warning: qemu_ram_msync: failed to sync memory range“
>
> So we add 'pmem=on' to avoid calling msync(), use the QEMU command line:
>
> -object memory-backend-file,id=mem1,pmem=on,mem-path=/dev/dax0.0,size=4G
>
> Reviewed-by: Stefan Hajnoczi 
> Signed-off-by: Jingqi Liu 
> ---
>  docs/nvdimm.txt | 7 +++
>  1 file changed, 7 insertions(+)
>
> diff --git a/docs/nvdimm.txt b/docs/nvdimm.txt
> index c2c6e441b3..31048aff5e 100644
> --- a/docs/nvdimm.txt
> +++ b/docs/nvdimm.txt
> @@ -243,6 +243,13 @@ use the QEMU command line:
>
>  -object memory-backend-file,id=nv_mem,mem-path=/XXX/yyy,size=4G,pmem=on
>
> +At the end of live migration, QEMU uses msync() to flush the data to the
> +backend storage. When the backend file is a character device dax, the pages
> +explicitly avoid the page cache. It will return failure from msync().
> +So we add 'pmem=on' to avoid calling msync(), use the QEMU command line:
> +
> +-object memory-backend-file,id=mem1,pmem=on,mem-path=/dev/dax0.0,size=4G
> +
>  References
>  --
>
> --
Good to document this.

Reviewed-by: Pankaj Gupta 

> 2.17.1
>
>



Re: [PATCH v3 14/20] numa: Handle virtio-mem in NUMA stats

2020-06-03 Thread Pankaj Gupta
> Account the memory to the configured nid.
>
> Cc: Eduardo Habkost 
> Cc: Marcel Apfelbaum 
> Cc: "Michael S. Tsirkin" 
> Signed-off-by: David Hildenbrand 
> ---
>  hw/core/numa.c | 6 ++
>  1 file changed, 6 insertions(+)
>
> diff --git a/hw/core/numa.c b/hw/core/numa.c
> index 316bc50d75..06960918e7 100644
> --- a/hw/core/numa.c
> +++ b/hw/core/numa.c
> @@ -812,6 +812,7 @@ static void numa_stat_memory_devices(NumaNodeMem 
> node_mem[])
>  MemoryDeviceInfoList *info;
>  PCDIMMDeviceInfo *pcdimm_info;
>  VirtioPMEMDeviceInfo *vpi;
> +VirtioMEMDeviceInfo *vmi;
>
>  for (info = info_list; info; info = info->next) {
>  MemoryDeviceInfo *value = info->value;
> @@ -832,6 +833,11 @@ static void numa_stat_memory_devices(NumaNodeMem 
> node_mem[])
>  node_mem[0].node_mem += vpi->size;
>  node_mem[0].node_plugged_mem += vpi->size;
>  break;
> +case MEMORY_DEVICE_INFO_KIND_VIRTIO_MEM:
> +vmi = value->u.virtio_mem.data;
> +node_mem[vmi->node].node_mem += vmi->size;
> +node_mem[vmi->node].node_plugged_mem += vmi->size;
> +break;
>  default:
>  g_assert_not_reached();
>  }

Reviewed-by: Pankaj Gupta 



Re: [PATCH v4 4/5] virtio-blk: default num_queues to -smp N

2020-05-28 Thread Pankaj Gupta
> Automatically size the number of virtio-blk-pci request virtqueues to
> match the number of vCPUs.  Other transports continue to default to 1
> request virtqueue.
>
> A 1:1 virtqueue:vCPU mapping ensures that completion interrupts are
> handled on the same vCPU that submitted the request.  No IPI is
> necessary to complete an I/O request and performance is improved.
>
> Performance improves from 78k to 104k IOPS on a 32 vCPU guest with 101
> virtio-blk-pci devices (ioengine=libaio, iodepth=1, bs=4k, rw=randread
> with NVMe storage).
>
> Signed-off-by: Stefan Hajnoczi 
> Reviewed-by: Cornelia Huck 
> ---
>  include/hw/virtio/virtio-blk.h | 2 ++
>  hw/block/virtio-blk.c  | 6 +-
>  hw/core/machine.c  | 1 +
>  hw/virtio/virtio-blk-pci.c | 7 ++-
>  4 files changed, 14 insertions(+), 2 deletions(-)
>
> diff --git a/include/hw/virtio/virtio-blk.h b/include/hw/virtio/virtio-blk.h
> index 1e62f869b2..4e5e903f4a 100644
> --- a/include/hw/virtio/virtio-blk.h
> +++ b/include/hw/virtio/virtio-blk.h
> @@ -30,6 +30,8 @@ struct virtio_blk_inhdr
>  unsigned char status;
>  };
>
> +#define VIRTIO_BLK_AUTO_NUM_QUEUES UINT16_MAX
> +
>  struct VirtIOBlkConf
>  {
>  BlockConf conf;
> diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
> index f5f6fc925e..3c36b38255 100644
> --- a/hw/block/virtio-blk.c
> +++ b/hw/block/virtio-blk.c
> @@ -1135,6 +1135,9 @@ static void virtio_blk_device_realize(DeviceState *dev, 
> Error **errp)
>  error_setg(errp, "Device needs media, but drive is empty");
>  return;
>  }
> +if (conf->num_queues == VIRTIO_BLK_AUTO_NUM_QUEUES) {
> +conf->num_queues = 1;
> +}
>  if (!conf->num_queues) {
>  error_setg(errp, "num-queues property must be larger than 0");
>  return;
> @@ -1274,7 +1277,8 @@ static Property virtio_blk_properties[] = {
>  #endif
>  DEFINE_PROP_BIT("request-merging", VirtIOBlock, conf.request_merging, 0,
>  true),
> -DEFINE_PROP_UINT16("num-queues", VirtIOBlock, conf.num_queues, 1),
> +DEFINE_PROP_UINT16("num-queues", VirtIOBlock, conf.num_queues,
> +   VIRTIO_BLK_AUTO_NUM_QUEUES),
>  DEFINE_PROP_UINT16("queue-size", VirtIOBlock, conf.queue_size, 256),
>  DEFINE_PROP_BOOL("seg-max-adjust", VirtIOBlock, conf.seg_max_adjust, 
> true),
>  DEFINE_PROP_LINK("iothread", VirtIOBlock, conf.iothread, TYPE_IOTHREAD,
> diff --git a/hw/core/machine.c b/hw/core/machine.c
> index df7664bc8d..4aba3bdd3c 100644
> --- a/hw/core/machine.c
> +++ b/hw/core/machine.c
> @@ -29,6 +29,7 @@
>  #include "migration/vmstate.h"
>
>  GlobalProperty hw_compat_5_0[] = {
> +{ "virtio-blk-device", "num-queues", "1"},
>  { "virtio-scsi-device", "num_queues", "1"},
>  { "vhost-scsi", "num_queues", "1"},
>  { "vhost-user-scsi", "num_queues", "1"},
> diff --git a/hw/virtio/virtio-blk-pci.c b/hw/virtio/virtio-blk-pci.c
> index 28838fa958..2f0ede3863 100644
> --- a/hw/virtio/virtio-blk-pci.c
> +++ b/hw/virtio/virtio-blk-pci.c
> @@ -50,9 +50,14 @@ static void virtio_blk_pci_realize(VirtIOPCIProxy 
> *vpci_dev, Error **errp)
>  {
>  VirtIOBlkPCI *dev = VIRTIO_BLK_PCI(vpci_dev);
>  DeviceState *vdev = DEVICE(&dev->vdev);
> +VirtIOBlkConf *conf = &dev->vdev.conf;
> +
> +if (conf->num_queues == VIRTIO_BLK_AUTO_NUM_QUEUES) {
> +conf->num_queues = virtio_pci_optimal_num_queues(0);
> +}
>
>  if (vpci_dev->nvectors == DEV_NVECTORS_UNSPECIFIED) {
> -vpci_dev->nvectors = dev->vdev.conf.num_queues + 1;
> +vpci_dev->nvectors = conf->num_queues + 1;
>  }
>
>  qdev_set_parent_bus(vdev, BUS(&vpci_dev->bus));

Looks good to me.
Reviewed-by: Pankaj Gupta 
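Usage note (illustrative command lines, drive name assumed): with this series
the PCI transport sizes the request queues automatically, roughly one per
vCPU, and the old behaviour can still be forced explicitly:

    -device virtio-blk-pci,drive=drive0                # automatic sizing
    -device virtio-blk-pci,drive=drive0,num-queues=1   # pre-5.1 behaviour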



Re: [PATCH v4 2/5] virtio-scsi: introduce a constant for fixed virtqueues

2020-05-28 Thread Pankaj Gupta
> The event and control virtqueues are always present, regardless of the
> multi-queue configuration.  Define a constant so that virtqueue number
> calculations are easier to read.
>
> Signed-off-by: Stefan Hajnoczi 
> Reviewed-by: Cornelia Huck 
> ---
>  include/hw/virtio/virtio-scsi.h | 3 +++
>  hw/scsi/vhost-user-scsi.c   | 2 +-
>  hw/scsi/virtio-scsi.c   | 7 ---
>  hw/virtio/vhost-scsi-pci.c  | 3 ++-
>  hw/virtio/vhost-user-scsi-pci.c | 3 ++-
>  hw/virtio/virtio-scsi-pci.c | 3 ++-
>  6 files changed, 14 insertions(+), 7 deletions(-)
>
> diff --git a/include/hw/virtio/virtio-scsi.h b/include/hw/virtio/virtio-scsi.h
> index 24e768909d..9f293bcb80 100644
> --- a/include/hw/virtio/virtio-scsi.h
> +++ b/include/hw/virtio/virtio-scsi.h
> @@ -36,6 +36,9 @@
>  #define VIRTIO_SCSI_MAX_TARGET  255
>  #define VIRTIO_SCSI_MAX_LUN 16383
>
> +/* Number of virtqueues that are always present */
> +#define VIRTIO_SCSI_VQ_NUM_FIXED2
> +
>  typedef struct virtio_scsi_cmd_req VirtIOSCSICmdReq;
>  typedef struct virtio_scsi_cmd_resp VirtIOSCSICmdResp;
>  typedef struct virtio_scsi_ctrl_tmf_req VirtIOSCSICtrlTMFReq;
> diff --git a/hw/scsi/vhost-user-scsi.c b/hw/scsi/vhost-user-scsi.c
> index cbb5d97599..f8bd158c31 100644
> --- a/hw/scsi/vhost-user-scsi.c
> +++ b/hw/scsi/vhost-user-scsi.c
> @@ -115,7 +115,7 @@ static void vhost_user_scsi_realize(DeviceState *dev, 
> Error **errp)
>  goto free_virtio;
>  }
>
> -vsc->dev.nvqs = 2 + vs->conf.num_queues;
> +vsc->dev.nvqs = VIRTIO_SCSI_VQ_NUM_FIXED + vs->conf.num_queues;
>  vsc->dev.vqs = g_new0(struct vhost_virtqueue, vsc->dev.nvqs);
>  vsc->dev.vq_index = 0;
>  vsc->dev.backend_features = 0;
> diff --git a/hw/scsi/virtio-scsi.c b/hw/scsi/virtio-scsi.c
> index 9b72094a61..f3d60683bd 100644
> --- a/hw/scsi/virtio-scsi.c
> +++ b/hw/scsi/virtio-scsi.c
> @@ -191,7 +191,7 @@ static void virtio_scsi_save_request(QEMUFile *f, 
> SCSIRequest *sreq)
>  VirtIOSCSIReq *req = sreq->hba_private;
>  VirtIOSCSICommon *vs = VIRTIO_SCSI_COMMON(req->dev);
>  VirtIODevice *vdev = VIRTIO_DEVICE(req->dev);
> -uint32_t n = virtio_get_queue_index(req->vq) - 2;
> +uint32_t n = virtio_get_queue_index(req->vq) - VIRTIO_SCSI_VQ_NUM_FIXED;
>
>  assert(n < vs->conf.num_queues);
>  qemu_put_be32s(f, &n);
> @@ -892,10 +892,11 @@ void virtio_scsi_common_realize(DeviceState *dev,
>  sizeof(VirtIOSCSIConfig));
>
>  if (s->conf.num_queues == 0 ||
> -s->conf.num_queues > VIRTIO_QUEUE_MAX - 2) {
> +s->conf.num_queues > VIRTIO_QUEUE_MAX - 
> VIRTIO_SCSI_VQ_NUM_FIXED) {
>  error_setg(errp, "Invalid number of queues (= %" PRIu32 "), "
>   "must be a positive integer less than %d.",
> -   s->conf.num_queues, VIRTIO_QUEUE_MAX - 2);
> +   s->conf.num_queues,
> +   VIRTIO_QUEUE_MAX - VIRTIO_SCSI_VQ_NUM_FIXED);
>  virtio_cleanup(vdev);
>  return;
>  }
> diff --git a/hw/virtio/vhost-scsi-pci.c b/hw/virtio/vhost-scsi-pci.c
> index 5da6bb6449..2b25db9a3c 100644
> --- a/hw/virtio/vhost-scsi-pci.c
> +++ b/hw/virtio/vhost-scsi-pci.c
> @@ -50,7 +50,8 @@ static void vhost_scsi_pci_realize(VirtIOPCIProxy 
> *vpci_dev, Error **errp)
>  VirtIOSCSICommon *vs = VIRTIO_SCSI_COMMON(vdev);
>
>  if (vpci_dev->nvectors == DEV_NVECTORS_UNSPECIFIED) {
> -vpci_dev->nvectors = vs->conf.num_queues + 3;
> +vpci_dev->nvectors = vs->conf.num_queues +
> + VIRTIO_SCSI_VQ_NUM_FIXED + 1;
>  }
>
>  qdev_set_parent_bus(vdev, BUS(&vpci_dev->bus));
> diff --git a/hw/virtio/vhost-user-scsi-pci.c b/hw/virtio/vhost-user-scsi-pci.c
> index 6f3375fe55..80710ab170 100644
> --- a/hw/virtio/vhost-user-scsi-pci.c
> +++ b/hw/virtio/vhost-user-scsi-pci.c
> @@ -56,7 +56,8 @@ static void vhost_user_scsi_pci_realize(VirtIOPCIProxy 
> *vpci_dev, Error **errp)
>  VirtIOSCSICommon *vs = VIRTIO_SCSI_COMMON(vdev);
>
>  if (vpci_dev->nvectors == DEV_NVECTORS_UNSPECIFIED) {
> -vpci_dev->nvectors = vs->conf.num_queues + 3;
> +vpci_dev->nvectors = vs->conf.num_queues +
> + VIRTIO_SCSI_VQ_NUM_FIXED + 1;
>  }
>
>  qdev_set_parent_bus(vdev, BUS(&vpci_dev->bus));
> diff --git a/hw/virtio/virtio-scsi-pci.c b/hw/virtio/virtio-scsi-pci.c
> index e82e7e5680..c52d68053a 100644
> --- a/hw/virtio/virtio-scsi-pci.c
> +++ b/hw/virtio/virtio-scsi-pci.c
> @@ -51,7 +51,8 @@ static void virtio_scsi_pci_realize(VirtIOPCIProxy 
> *vpci_dev, Error **errp)
>  char *bus_name;
>
>  if (vpci_dev->nvectors == DEV_NVECTORS_UNSPECIFIED) {
> -vpci_dev->nvectors = vs->conf.num_queues + 3;
> +vpci_dev->nvectors = vs->conf.num_queues +
> + VIRTIO_SCSI_VQ_NUM_FIXED + 1;
>  }
>
>  /*
> --
Better readability with no change in logic. Code looks good to me.

Reviewed-by: Pankaj Gupta 
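Spelling out the resulting arithmetic for a hypothetical num_queues=4 (the
extra +1 is the config-change vector):

    #include <stdio.h>

    #define VIRTIO_SCSI_VQ_NUM_FIXED 2   /* event + control queues */

    int main(void)
    {
        int num_queues = 4;   /* hypothetical request-queue count */
        int total_vqs  = VIRTIO_SCSI_VQ_NUM_FIXED + num_queues;
        int nvectors   = num_queues + VIRTIO_SCSI_VQ_NUM_FIXED + 1;

        printf("vqs=%d vectors=%d\n", total_vqs, nvectors); /* vqs=6 vectors=7 */
        return 0;
    }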



Re: [PATCH v1] pc: Support coldplugging of virtio-pmem-pci devices on all buses

2020-05-27 Thread Pankaj Gupta
Hi David, Vivek,
> >> Hi Vivek,
> >>
> >> you have to declare the maxMemory option. Memory devices like
> >> virtio-pmem-pci reside in RAM like a pc-dimm or a nvdimm. If your
> >> virtio-pmem device will be 4GB, you have to add that to maxMemory.
> >>
> >>   64
> >>   68
> >>   64
> >>
> >> (you might have to add "slots='0'" or "slots='1'" to maxMemory to make
> >> libvirt happy)
> >
> > Ok, tried that.
> >
> > 134217728
> >
> > And now it complains about.
> >
> > error: unsupported configuration: At least one numa node has to be 
> > configured when enabling memory hotplug
> >
> > So ultimately it seems to be wanting me to somehow enable memory hotplug
> > to be able to use virtio-pmem?
>
> That's a libvirt error message. Maybe I am confused how libvirt maps
> these parameters to QEMU ...
>
> NVDIMMs under libvirt seem to be easy:
>
> https://www.redhat.com/archives/libvir-list/2016-August/msg00055.html
>
> Maybe the issue is that virtio-pmem has not been properly integrated
> into libvirt yet:
>
> https://www.redhat.com/archives/libvir-list/2019-August/msg7.html
>
> And you attempts to force virtio-pmem in via qemu args does not work
> properly.
>
> Maybe maxMemory in libvirt does not directly map to the QEMU variant to
> define the maximum physical address space reserved also for any memory
> devices (DIMMs, NVDIMMs, virtio-pmem, ...). Any libvirt experts that can
> help?
>
> @Pankaj, did you ever get it to run with libvirt?

I did not run virtio-pmem with libvirt; that requires work on the libvirt side.
I created the document at [1] for running it from the QEMU command line.

[1] https://github.com/qemu/qemu/blob/master/docs/virtio-pmem.rst
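For completeness, the document at [1] boils down to an invocation along these
lines (the path, size and ids here are illustrative):

    -object memory-backend-file,id=mem1,share=on,mem-path=./virtio_pmem.img,size=4G \
    -device virtio-pmem-pci,memdev=mem1,id=nv1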



Re: [PATCH v1] pc: Support coldplugging of virtio-pmem-pci devices on all buses

2020-05-25 Thread Pankaj Gupta
> E.g., with "pc-q35-4.2", trying to coldplug a virtio-pmem-pci devices
> results in
> "virtio-pmem-pci not supported on this bus"
>
> Reasons is, that the bus does not support hotplug and, therefore, does
> not have a hotplug handler. Let's allow coldplugging virtio-pmem devices
> on such buses. The hotplug order is only relevant for virtio-pmem-pci
> when the guest is already alive and the device is visible before
> memory_device_plug() wired up the memory device bits.
>
> Hotplug attempts will still fail with:
> "Error: Bus 'pcie.0' does not support hotplugging"
>
> Hotunplug attempts will still fail with:
>     "Error: Bus 'pcie.0' does not support hotplugging"
>
> Reported-by: Vivek Goyal 
> Cc: Pankaj Gupta 
> Cc: Igor Mammedov 
> Cc: Paolo Bonzini 
> Cc: Richard Henderson 
> Cc: Eduardo Habkost 
> Cc: "Michael S. Tsirkin" 
> Cc: Marcel Apfelbaum 
> Signed-off-by: David Hildenbrand 
> ---
>  hw/i386/pc.c | 18 ++
>  1 file changed, 10 insertions(+), 8 deletions(-)
>
> diff --git a/hw/i386/pc.c b/hw/i386/pc.c
> index 2128f3d6fe..c740495eb6 100644
> --- a/hw/i386/pc.c
> +++ b/hw/i386/pc.c
> @@ -1663,13 +1663,13 @@ static void 
> pc_virtio_pmem_pci_pre_plug(HotplugHandler *hotplug_dev,
>  HotplugHandler *hotplug_dev2 = qdev_get_bus_hotplug_handler(dev);
>  Error *local_err = NULL;
>
> -if (!hotplug_dev2) {
> +if (!hotplug_dev2 && dev->hotplugged) {
>  /*
>   * Without a bus hotplug handler, we cannot control the plug/unplug
> - * order. This should never be the case on x86, however better add
> - * a safety net.
> + * order. We should never reach this point when hotplugging on x86,
> + * however, better add a safety net.
>   */
> -error_setg(errp, "virtio-pmem-pci not supported on this bus.");
> +error_setg(errp, "virtio-pmem-pci hotplug not supported on this 
> bus.");
>  return;
>  }
>  /*
> @@ -1678,7 +1678,7 @@ static void pc_virtio_pmem_pci_pre_plug(HotplugHandler 
> *hotplug_dev,
>   */
>  memory_device_pre_plug(MEMORY_DEVICE(dev), MACHINE(hotplug_dev), NULL,
> &local_err);
> -if (!local_err) {
> +if (!local_err && hotplug_dev2) {
>  hotplug_handler_pre_plug(hotplug_dev2, dev, &local_err);
>  }
>  error_propagate(errp, local_err);
> @@ -1696,9 +1696,11 @@ static void pc_virtio_pmem_pci_plug(HotplugHandler 
> *hotplug_dev,
>   * device bits.
>   */
>  memory_device_plug(MEMORY_DEVICE(dev), MACHINE(hotplug_dev));
> -hotplug_handler_plug(hotplug_dev2, dev, &local_err);
> -if (local_err) {
> -memory_device_unplug(MEMORY_DEVICE(dev), MACHINE(hotplug_dev));
> +if (hotplug_dev2) {
> +hotplug_handler_plug(hotplug_dev2, dev, &local_err);
> +if (local_err) {
> +memory_device_unplug(MEMORY_DEVICE(dev), MACHINE(hotplug_dev));
> +}
>  }
>  error_propagate(errp, local_err);
>  }
This looks good to me and will allow cold-plugging the virtio-pmem device
on buses that don't support hotplug.

Reviewed-by: Pankaj Gupta 



Re: [PATCH v1 13/17] hmp: Handle virtio-mem when printing memory device info

2020-05-06 Thread Pankaj Gupta
> Print the memory device info just like for other memory devices.
>
> Cc: "Dr. David Alan Gilbert" 
> Cc: "Michael S. Tsirkin" 
> Signed-off-by: David Hildenbrand 
> ---
>  monitor/hmp-cmds.c | 16 
>  1 file changed, 16 insertions(+)
>
> diff --git a/monitor/hmp-cmds.c b/monitor/hmp-cmds.c
> index 7f6e982dc8..4b3638a2a6 100644
> --- a/monitor/hmp-cmds.c
> +++ b/monitor/hmp-cmds.c
> @@ -1805,6 +1805,7 @@ void hmp_info_memory_devices(Monitor *mon, const QDict 
> *qdict)
>  MemoryDeviceInfoList *info_list = qmp_query_memory_devices(&err);
>  MemoryDeviceInfoList *info;
>  VirtioPMEMDeviceInfo *vpi;
> +VirtioMEMDeviceInfo *vmi;
>  MemoryDeviceInfo *value;
>  PCDIMMDeviceInfo *di;
>
> @@ -1839,6 +1840,21 @@ void hmp_info_memory_devices(Monitor *mon, const QDict 
> *qdict)
>  monitor_printf(mon, "  size: %" PRIu64 "\n", vpi->size);
>  monitor_printf(mon, "  memdev: %s\n", vpi->memdev);
>  break;
> +case MEMORY_DEVICE_INFO_KIND_VIRTIO_MEM:
> +vmi = value->u.virtio_mem.data;
> +monitor_printf(mon, "Memory device [%s]: \"%s\"\n",
> +   MemoryDeviceInfoKind_str(value->type),
> +   vmi->id ? vmi->id : "");
> +monitor_printf(mon, "  memaddr: 0x%" PRIx64 "\n", 
> vmi->memaddr);
> +monitor_printf(mon, "  node: %" PRId64 "\n", vmi->node);
> +monitor_printf(mon, "  requested-size: %" PRIu64 "\n",
> +   vmi->requested_size);
> +monitor_printf(mon, "  size: %" PRIu64 "\n", vmi->size);
> +monitor_printf(mon, "  max-size: %" PRIu64 "\n", 
> vmi->max_size);
> +monitor_printf(mon, "  block-size: %" PRIu64 "\n",
> +   vmi->block_size);
> +monitor_printf(mon, "  memdev: %s\n", vmi->memdev);
> +break;
>  default:
>  g_assert_not_reached();
>  }
> --
> 2.25.3

Reviewed-by: Pankaj Gupta 
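Derived from the monitor_printf() calls above, the new 'info memory-devices'
entry would render roughly like this (all values hypothetical):

    Memory device [virtio-mem]: "vm0"
      memaddr: 0x140000000
      node: 0
      requested-size: 1073741824
      size: 1073741824
      max-size: 4294967296
      block-size: 2097152
      memdev: /objects/mem0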



Re: [PATCH v1 11/17] virtio-pci: Proxy for virtio-mem

2020-05-06 Thread Pankaj Gupta
ID_VIRTIO_MEM;
> +pcidev_k->revision = VIRTIO_PCI_ABI_VERSION;
> +pcidev_k->class_id = PCI_CLASS_OTHERS;
> +
> +mdc->get_addr = virtio_mem_pci_get_addr;
> +mdc->set_addr = virtio_mem_pci_set_addr;
> +mdc->get_plugged_size = virtio_mem_pci_get_plugged_size;
> +mdc->get_memory_region = virtio_mem_pci_get_memory_region;
> +mdc->fill_device_info = virtio_mem_pci_fill_device_info;
> +}
> +
> +static void virtio_mem_pci_instance_init(Object *obj)
> +{
> +VirtIOMEMPCI *dev = VIRTIO_MEM_PCI(obj);
> +
> +virtio_instance_init_common(obj, &dev->vdev, sizeof(dev->vdev),
> +TYPE_VIRTIO_MEM);
> +object_property_add_alias(obj, VIRTIO_MEM_BLOCK_SIZE_PROP,
> +  OBJECT(&dev->vdev),
> +  VIRTIO_MEM_BLOCK_SIZE_PROP, &error_abort);
> +object_property_add_alias(obj, VIRTIO_MEM_SIZE_PROP, OBJECT(&dev->vdev),
> +  VIRTIO_MEM_SIZE_PROP, &error_abort);
> +object_property_add_alias(obj, VIRTIO_MEM_REQUESTED_SIZE_PROP,
> +  OBJECT(&dev->vdev),
> +  VIRTIO_MEM_REQUESTED_SIZE_PROP, &error_abort);
> +}
> +
> +static const VirtioPCIDeviceTypeInfo virtio_mem_pci_info = {
> +.base_name = TYPE_VIRTIO_MEM_PCI,
> +.generic_name = "virtio-mem-pci",
> +.instance_size = sizeof(VirtIOMEMPCI),
> +.instance_init = virtio_mem_pci_instance_init,
> +.class_init = virtio_mem_pci_class_init,
> +.interfaces = (InterfaceInfo[]) {
> +{ TYPE_MEMORY_DEVICE },
> +{ }
> +},
> +};
> +
> +static void virtio_mem_pci_register_types(void)
> +{
> +virtio_pci_types_register(&virtio_mem_pci_info);
> +}
> +type_init(virtio_mem_pci_register_types)
> diff --git a/hw/virtio/virtio-mem-pci.h b/hw/virtio/virtio-mem-pci.h
> new file mode 100644
> index 00..8820cd6628
> --- /dev/null
> +++ b/hw/virtio/virtio-mem-pci.h
> @@ -0,0 +1,33 @@
> +/*
> + * Virtio MEM PCI device
> + *
> + * Copyright (C) 2020 Red Hat, Inc.
> + *
> + * Authors:
> + *  David Hildenbrand 
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2.
> + * See the COPYING file in the top-level directory.
> + */
> +
> +#ifndef QEMU_VIRTIO_MEM_PCI_H
> +#define QEMU_VIRTIO_MEM_PCI_H
> +
> +#include "hw/virtio/virtio-pci.h"
> +#include "hw/virtio/virtio-mem.h"
> +
> +typedef struct VirtIOMEMPCI VirtIOMEMPCI;
> +
> +/*
> + * virtio-mem-pci: This extends VirtioPCIProxy.
> + */
> +#define TYPE_VIRTIO_MEM_PCI "virtio-mem-pci-base"
> +#define VIRTIO_MEM_PCI(obj) \
> +OBJECT_CHECK(VirtIOMEMPCI, (obj), TYPE_VIRTIO_MEM_PCI)
> +
> +struct VirtIOMEMPCI {
> +VirtIOPCIProxy parent_obj;
> +VirtIOMEM vdev;
> +};
> +
> +#endif /* QEMU_VIRTIO_MEM_PCI_H */
> diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
> index cfedf5a995..fec72d5a31 100644
> --- a/include/hw/pci/pci.h
> +++ b/include/hw/pci/pci.h
> @@ -87,6 +87,7 @@ extern bool pci_available;
>  #define PCI_DEVICE_ID_VIRTIO_VSOCK   0x1012
>  #define PCI_DEVICE_ID_VIRTIO_PMEM0x1013
>  #define PCI_DEVICE_ID_VIRTIO_IOMMU   0x1014
> +#define PCI_DEVICE_ID_VIRTIO_MEM 0x1015
>
>  #define PCI_VENDOR_ID_REDHAT 0x1b36
>  #define PCI_DEVICE_ID_REDHAT_BRIDGE  0x0001
> --
> 2.25.3
Reviewed-by: Pankaj Gupta 
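
For context, the property aliases registered in the instance_init above
are what make the device configurable from the command line. A minimal
sketch (illustrative values; exact property names depend on the final
QEMU version):

  -object memory-backend-ram,id=mem0,size=4G
  -device virtio-mem-pci,id=vm0,memdev=mem0,requested-size=1G

Here "requested-size" resolves, via the alias, to the
VIRTIO_MEM_REQUESTED_SIZE_PROP property on the wrapped virtio-mem device.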



Re: [PATCH v1 15/17] pc: Support for virtio-mem-pci

2020-05-06 Thread Pankaj Gupta
> +pc_virtio_md_pci_pre_plug(hotplug_dev, dev, errp);
>  }
>  }
>
> @@ -1732,8 +1735,9 @@ static void pc_machine_device_plug_cb(HotplugHandler 
> *hotplug_dev,
>  pc_memory_plug(hotplug_dev, dev, errp);
>  } else if (object_dynamic_cast(OBJECT(dev), TYPE_CPU)) {
>  pc_cpu_plug(hotplug_dev, dev, errp);
> -} else if (object_dynamic_cast(OBJECT(dev), TYPE_VIRTIO_PMEM_PCI)) {
> -pc_virtio_pmem_pci_plug(hotplug_dev, dev, errp);
> +} else if (object_dynamic_cast(OBJECT(dev), TYPE_VIRTIO_PMEM_PCI) ||
> +   object_dynamic_cast(OBJECT(dev), TYPE_VIRTIO_MEM_PCI)) {
> +pc_virtio_md_pci_plug(hotplug_dev, dev, errp);
>  }
>  }
>
> @@ -1744,8 +1748,9 @@ static void 
> pc_machine_device_unplug_request_cb(HotplugHandler *hotplug_dev,
>  pc_memory_unplug_request(hotplug_dev, dev, errp);
>  } else if (object_dynamic_cast(OBJECT(dev), TYPE_CPU)) {
>  pc_cpu_unplug_request_cb(hotplug_dev, dev, errp);
> -} else if (object_dynamic_cast(OBJECT(dev), TYPE_VIRTIO_PMEM_PCI)) {
> -pc_virtio_pmem_pci_unplug_request(hotplug_dev, dev, errp);
> +} else if (object_dynamic_cast(OBJECT(dev), TYPE_VIRTIO_PMEM_PCI) ||
> +   object_dynamic_cast(OBJECT(dev), TYPE_VIRTIO_MEM_PCI)) {
> +pc_virtio_md_pci_unplug_request(hotplug_dev, dev, errp);
>  } else {
>  error_setg(errp, "acpi: device unplug request for not supported 
> device"
> " type: %s", object_get_typename(OBJECT(dev)));
> @@ -1759,8 +1764,9 @@ static void pc_machine_device_unplug_cb(HotplugHandler 
> *hotplug_dev,
>  pc_memory_unplug(hotplug_dev, dev, errp);
>  } else if (object_dynamic_cast(OBJECT(dev), TYPE_CPU)) {
>  pc_cpu_unplug_cb(hotplug_dev, dev, errp);
> -} else if (object_dynamic_cast(OBJECT(dev), TYPE_VIRTIO_PMEM_PCI)) {
> -pc_virtio_pmem_pci_unplug(hotplug_dev, dev, errp);
> +} else if (object_dynamic_cast(OBJECT(dev), TYPE_VIRTIO_PMEM_PCI) ||
> +   object_dynamic_cast(OBJECT(dev), TYPE_VIRTIO_MEM_PCI)) {
> +pc_virtio_md_pci_unplug(hotplug_dev, dev, errp);
>  } else {
>  error_setg(errp, "acpi: device unplug for not supported device"
> " type: %s", object_get_typename(OBJECT(dev)));
> @@ -1772,7 +1778,8 @@ static HotplugHandler 
> *pc_get_hotplug_handler(MachineState *machine,
>  {
>  if (object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM) ||
>  object_dynamic_cast(OBJECT(dev), TYPE_CPU) ||
> -object_dynamic_cast(OBJECT(dev), TYPE_VIRTIO_PMEM_PCI)) {
> +object_dynamic_cast(OBJECT(dev), TYPE_VIRTIO_PMEM_PCI) ||
> +object_dynamic_cast(OBJECT(dev), TYPE_VIRTIO_MEM_PCI)) {
>  return HOTPLUG_HANDLER(machine);
>  }
>
> --

Reviewed-by: Pankaj Gupta 

> 2.25.3
>
>
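
With the hotplug handlers wired up as above, a virtio-mem-pci device can
also be plugged at runtime. A hypothetical monitor sequence (assumes the
machine was started with enough maxmem/slots headroom):

  (qemu) object_add memory-backend-ram,id=mem1,size=4G
  (qemu) device_add virtio-mem-pci,id=vm1,memdev=mem1,requested-size=1G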



Re: [PATCH RESEND v3 0/4] virtio-pci: enable blk and scsi multi-queue by default

2020-03-30 Thread Pankaj Gupta
For the best case it's really a good idea to configure the default number of
queues to the number of CPUs.

For the series:
Reviewed-by: Pankaj Gupta 
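
For readers wanting this spelled out: before the series, matching the
queue count to the vCPU count had to be done by hand, e.g. (illustrative):

  -smp 4 -device virtio-blk-pci,drive=drive0,num-queues=4

With the series applied, num-queues defaults to the number of vCPUs, so
the explicit option becomes unnecessary in the common case.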



Re: [PATCH v4 1/2] virtio-blk: delete vqs on the error path in realize()

2020-03-30 Thread Pankaj Gupta
Reviewed-by: Pankaj Gupta 



Re: [PATCH v2 1/3] virtio: add ability to delete vq through a pointer

2019-12-04 Thread Pankaj Gupta


> 
> On 2019/12/4 16:33, Pankaj Gupta wrote:
> > 
> >> From: Pan Nengyuan 
> >>
> >> Devices tend to maintain vq pointers, allow deleting them through a vq
> >> pointer.
> >>
> >> Signed-off-by: Michael S. Tsirkin 
> >> Signed-off-by: Pan Nengyuan 
> >> ---
> >> Changes v2 to v1:
> >> - add a new function virtio_delete_queue to cleanup vq through a vq
> >> pointer
> >> ---
> >>  hw/virtio/virtio.c | 16 +++-
> >>  include/hw/virtio/virtio.h |  2 ++
> >>  2 files changed, 13 insertions(+), 5 deletions(-)
> >>
> >> diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
> >> index 04716b5..6de3cfd 100644
> >> --- a/hw/virtio/virtio.c
> >> +++ b/hw/virtio/virtio.c
> >> @@ -2330,17 +2330,23 @@ VirtQueue *virtio_add_queue(VirtIODevice *vdev,
> >> int
> >> queue_size,
> >>  return &vdev->vq[i];
> >>  }
> >>  
> >> +void virtio_delete_queue(VirtQueue *vq)
> >> +{
> >> +vq->vring.num = 0;
> >> +vq->vring.num_default = 0;
> >> +vq->handle_output = NULL;
> >> +vq->handle_aio_output = NULL;
> >> +g_free(vq->used_elems);
> >> +vq->used_elems = NULL;
> >> +}
> >> +
> >>  void virtio_del_queue(VirtIODevice *vdev, int n)
> >>  {
> >>  if (n < 0 || n >= VIRTIO_QUEUE_MAX) {
> >>  abort();
> >>  }
> >>  
> >> -vdev->vq[n].vring.num = 0;
> >> -vdev->vq[n].vring.num_default = 0;
> >> -vdev->vq[n].handle_output = NULL;
> >> -vdev->vq[n].handle_aio_output = NULL;
> >> -g_free(vdev->vq[n].used_elems);
> >> +virtio_delete_queue(&vdev->vq[n]);
> >>  }
> >>  
> >>  static void virtio_set_isr(VirtIODevice *vdev, int value)
> >> diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
> >> index c32a815..e18756d 100644
> >> --- a/include/hw/virtio/virtio.h
> >> +++ b/include/hw/virtio/virtio.h
> >> @@ -183,6 +183,8 @@ VirtQueue *virtio_add_queue(VirtIODevice *vdev, int
> >> queue_size,
> >>  
> >>  void virtio_del_queue(VirtIODevice *vdev, int n);
> >>  
> >> +void virtio_delete_queue(VirtQueue *vq);
> >> +
> >>  void virtqueue_push(VirtQueue *vq, const VirtQueueElement *elem,
> >>  unsigned int len);
> >>  void virtqueue_flush(VirtQueue *vq, unsigned int count);
> >> --
> >> 2.7.2.windows.1
> >>
> >>
> > Overall it looks good to me.
> > 
> > Just one point: e.g in virtio_rng: "virtio_rng_device_unrealize" function
> > We are doing : virtio_del_queue(vdev, 0);
> > 
> > One can directly call "virtio_delete_queue". It can become confusing
> > to call multiple functions for the same purpose. Instead, can we make
> > "virtio_delete_queue" static inline?
> > 
> yes, it will be a little confusing, but I think it will have the same
> problem if we make "virtio_delete_queue" static inline. We can directly
> call it also (e.g. virtio-serial-bus.c, virtio-balloon.c).
> 
> How about replacing the function name to make it clearer (e.g.
> virtio_delete_queue -> virtio_queue_cleanup)? "virtio_del_queue" and
> "virtio_delete_queue" are too similar.

I am just wondering whether we need these two separate functions.

Yes, changing name of virtio_delete_queue -> virtio_queue_cleanup
should be good enough.

Thanks,
Pankaj

> 
> > Other than that:
> > Reviewed-by: Pankaj Gupta 
> > 
> >>
> >>
> > 
> > 
> > .
> > 
> 
> 
> 




Re: [PATCH v2 1/3] virtio: add ability to delete vq through a pointer

2019-12-04 Thread Pankaj Gupta


> From: Pan Nengyuan 
> 
> Devices tend to maintain vq pointers, allow deleting them through a vq
> pointer.
> 
> Signed-off-by: Michael S. Tsirkin 
> Signed-off-by: Pan Nengyuan 
> ---
> Changes v2 to v1:
> - add a new function virtio_delete_queue to cleanup vq through a vq pointer
> ---
>  hw/virtio/virtio.c | 16 +++-
>  include/hw/virtio/virtio.h |  2 ++
>  2 files changed, 13 insertions(+), 5 deletions(-)
> 
> diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
> index 04716b5..6de3cfd 100644
> --- a/hw/virtio/virtio.c
> +++ b/hw/virtio/virtio.c
> @@ -2330,17 +2330,23 @@ VirtQueue *virtio_add_queue(VirtIODevice *vdev, int
> queue_size,
>  return &vdev->vq[i];
>  }
>  
> +void virtio_delete_queue(VirtQueue *vq)
> +{
> +vq->vring.num = 0;
> +vq->vring.num_default = 0;
> +vq->handle_output = NULL;
> +vq->handle_aio_output = NULL;
> +g_free(vq->used_elems);
> +vq->used_elems = NULL;
> +}
> +
>  void virtio_del_queue(VirtIODevice *vdev, int n)
>  {
>  if (n < 0 || n >= VIRTIO_QUEUE_MAX) {
>  abort();
>  }
>  
> -vdev->vq[n].vring.num = 0;
> -vdev->vq[n].vring.num_default = 0;
> -vdev->vq[n].handle_output = NULL;
> -vdev->vq[n].handle_aio_output = NULL;
> -g_free(vdev->vq[n].used_elems);
> +virtio_delete_queue(&vdev->vq[n]);
>  }
>  
>  static void virtio_set_isr(VirtIODevice *vdev, int value)
> diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
> index c32a815..e18756d 100644
> --- a/include/hw/virtio/virtio.h
> +++ b/include/hw/virtio/virtio.h
> @@ -183,6 +183,8 @@ VirtQueue *virtio_add_queue(VirtIODevice *vdev, int
> queue_size,
>  
>  void virtio_del_queue(VirtIODevice *vdev, int n);
>  
> +void virtio_delete_queue(VirtQueue *vq);
> +
>  void virtqueue_push(VirtQueue *vq, const VirtQueueElement *elem,
>  unsigned int len);
>  void virtqueue_flush(VirtQueue *vq, unsigned int count);
> --
> 2.7.2.windows.1
> 
> 
Overall it looks good to me.

Just one point: e.g in virtio_rng: "virtio_rng_device_unrealize" function
We are doing : virtio_del_queue(vdev, 0);

One can directly call "virtio_delete_queue". It can become confusing
to call multiple functions for the same purpose. Instead, can we make
"virtio_delete_queue" static inline?

Other than that:
Reviewed-by: Pankaj Gupta 

> 
> 




Re: [PATCH v4 0/8] Introduce the microvm machine type

2019-09-25 Thread Pankaj Gupta


> >>> Microvm is a machine type inspired by both NEMU and Firecracker, and
> >>> constructed after the machine model implemented by the latter.
> >>>
> >>> Its main purpose is providing users a minimalist machine type free
> >>> from the burden of legacy compatibility, serving as a stepping stone
> >>> for future projects aiming at improving boot times, reducing the
> >>> attack surface and slimming down QEMU's footprint.
> >>>
> >>> The microvm machine type supports the following devices:
> >>>
> >>>  - ISA bus
> >>>  - i8259 PIC
> >>>  - LAPIC (implicit if using KVM)
> >>>  - IOAPIC (defaults to kernel_irqchip_split = true)
> >>>  - i8254 PIT
> >>>  - MC146818 RTC (optional)
> >>>  - kvmclock (if using KVM)
> >>>  - fw_cfg
> >>>  - One ISA serial port (optional)
> >>>  - Up to eight virtio-mmio devices (configured by the user)
> >>
> >> So I assume also no ACPI (CPU/memory hotplug), correct?
> > 
> > Correct.
> > 
> >> @Pankaj, I think it would make sense to make virtio-pmem play with
> >> virtio-mmio/microvm.
> > 
> > That would be great. I'm also looking forward for virtio-mem (and an
> > hypothetical virtio-cpu) to eventually gain hotplug capabilities in
> > microvm.
> 
> @Pankaj, do you have time to look into the virtio-pmem thingy? I guess
> the virtio-mmio wrapper shouldn't be too hard (very similar to the
> virtio-pci wrapper - luckily I insisted to make it work independently
> from PCI BARs and ACPI slots ;) ). The microvm bits would be properly
> setting up device memory and wiring up the hotplug handlers, similar as
> done in the other PC machine types (maybe that comes for free?).

Yes, I can look at it.

> 
> virtio-pmem will allow (in read-only mode) to place the rootfs on a fake
> NVDIMM, as done, e.g., in kata containers. We might have to include the
> virtio-pmem kernel module in the initramfs, which shouldn't be too hard. Not
> sure what else we'll need to make virtio-pmem get used as a rootfs.

Sure, will work on it.

Thanks,
Pankaj

> 
> > 
> > Thanks,
> > Sergio.
> > 
> 
> 
> --
> 
> Thanks,
> 
> David / dhildenb
> 
> 



Re: [PATCH v4 0/8] Introduce the microvm machine type

2019-09-25 Thread Pankaj Gupta


> On 24.09.19 14:44, Sergio Lopez wrote:
> > Microvm is a machine type inspired by both NEMU and Firecracker, and
> > constructed after the machine model implemented by the latter.
> > 
> > Its main purpose is providing users a minimalist machine type free
> > from the burden of legacy compatibility, serving as a stepping stone
> > for future projects aiming at improving boot times, reducing the
> > attack surface and slimming down QEMU's footprint.
> > 
> > The microvm machine type supports the following devices:
> > 
> >  - ISA bus
> >  - i8259 PIC
> >  - LAPIC (implicit if using KVM)
> >  - IOAPIC (defaults to kernel_irqchip_split = true)
> >  - i8254 PIT
> >  - MC146818 RTC (optional)
> >  - kvmclock (if using KVM)
> >  - fw_cfg
> >  - One ISA serial port (optional)
> >  - Up to eight virtio-mmio devices (configured by the user)
> 
> So I assume also no ACPI (CPU/memory hotplug), correct?
> 
> @Pankaj, I think it would make sense to make virtio-pmem play with
> virtio-mmio/microvm.

I agree. It's using a virtio-blk device over a raw image.
Similarly, or alternatively (as an experiment), we can use virtio-pmem,
which will even reduce the guest memory footprint.

Best regards,
Pankaj

> 
> --
> 
> Thanks,
> 
> David / dhildenb
> 
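
For the record, the kata-style rootfs-on-pmem setup mentioned here looks
roughly like the following on a PCI machine type (a sketch; microvm would
need the virtio-mmio wrapper discussed above, which does not exist yet):

  -object memory-backend-file,id=rootfs,share=on,mem-path=rootfs.img,size=2G
  -device virtio-pmem-pci,id=nv0,memdev=rootfs
  -kernel vmlinuz -append "root=/dev/pmem0 ro"

The guest kernel needs the virtio-pmem and DAX drivers (built in or in
the initramfs) for /dev/pmem0 to show up.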



Re: [Qemu-devel] Call for volunteers: LWN.net articles about KVM Forum talks

2019-09-17 Thread Pankaj Gupta


Hi Stefan,

> 
> Hi,
> LWN.net is a popular open source news site that covers Linux and other
> open source communities (Python, GNOME, Debian, etc).  It has published
> a few KVM articles in the past too.
> 
> Let's raise awareness of QEMU, KVM, and libvirt by submitting articles
> covering
> KVM Forum.
> 
> I am looking for ~5 volunteers who are attending KVM Forum to write an
> article
> about a talk they find interesting.
> 
> Please pick a talk you'd like to cover and reply to this email thread.
> I will then send an email to LWN with a heads-up so they can let us know
> if they are interested in publishing a KVM Forum special.  I will not
> ask LWN.net for money.
> 
> KVM Forum schedule:
> https://events.linuxfoundation.org/events/kvm-forum-2019/program/schedule/
> 
> LWN.net guidelines:
> https://lwn.net/op/AuthorGuide.lwn
> "Our general guideline is for articles to be around 1500 words in
> length, though somewhat longer or shorter can work too. The best
> articles cover a fairly narrow topic completely, without any big
> omissions or any extra padding."
> 
> I volunteer to cover Michael Tsirkin's "VirtIO without the Virt -
> Towards Implementations in Hardware" talk.

I volunteer for the "KVM Address Space Isolation" talk by Alexandre Chartre &
Liran Alon.

Thanks,
Pankaj

> 
> Thanks,
> Stefan
> 
> 



Re: [Qemu-devel] [PATCH v3] virtio pmem: user document

2019-09-15 Thread Pankaj Gupta


Gentle ping.

Can we please merge this patch?

Thanks,
Pankaj

> 
> > This patch documents the steps to use virtio pmem.
> > It also documents other useful information about
> > virtio pmem, e.g. use-case, comparison with the QEMU NVDIMM
> > backend, and current limitations.
> > 
> > Signed-off-by: Pankaj Gupta 
> > ---
> > v2->v3
> >  - Text format fixes - Cornelia
> > v1->v2
> >  - Fixes on text format and 'Guest Data persistence'
> >section - Cornelia
> > 
> >  docs/virtio-pmem.rst | 75 
> >  1 file changed, 75 insertions(+)
> >  create mode 100644 docs/virtio-pmem.rst
> 
> Looks good to me now.
> 
> Reviewed-by: Cornelia Huck 
> 
> 



[Qemu-devel] [PATCH v3] virtio pmem: user document

2019-08-21 Thread Pankaj Gupta
This patch documents the steps to use virtio pmem.
It also documents other useful information about
virtio pmem, e.g. use-case, comparison with the QEMU NVDIMM
backend, and current limitations.

Signed-off-by: Pankaj Gupta 
---
v2->v3
 - Text format fixes - Cornelia
v1->v2
 - Fixes on text format and 'Guest Data persistence'
   section - Cornelia

 docs/virtio-pmem.rst | 75 
 1 file changed, 75 insertions(+)
 create mode 100644 docs/virtio-pmem.rst

diff --git a/docs/virtio-pmem.rst b/docs/virtio-pmem.rst
new file mode 100644
index 00..e77881b26f
--- /dev/null
+++ b/docs/virtio-pmem.rst
@@ -0,0 +1,75 @@
+
+
+QEMU virtio pmem
+
+
+ This document explains the setup and usage of the virtio pmem device
+ which is available since QEMU v4.1.0.
+
+ The virtio pmem device is a paravirtualized persistent memory device
+ on regular (i.e. non-NVDIMM) storage.
+
+Usecase
+
+
+  Virtio pmem allows to bypass the guest page cache and directly use
+  host page cache. This reduces guest memory footprint as the host can
+  make efficient memory reclaim decisions under memory pressure.
+
+o How does virtio-pmem compare to the nvdimm emulation supported by QEMU?
+
+  NVDIMM emulation on regular (i.e. non-NVDIMM) host storage does not
+  persist the guest writes as there are no defined semantics in the device
+  specification. The virtio pmem device provides guest write persistence
+  on non-NVDIMM host storage.
+
+virtio pmem usage
+-
+
+  A virtio pmem device backed by a memory-backend-file can be created on
+  the QEMU command line as in the following example:
+
+  -object memory-backend-file,id=mem1,share,mem-path=./virtio_pmem.img,size=4G
+  -device virtio-pmem-pci,memdev=mem1,id=nv1
+
+   where:
+   - "object memory-backend-file,id=mem1,share,mem-path=, size="
+ creates a backend file with the specified size.
+
+   - "device virtio-pmem-pci,id=nvdimm1,memdev=mem1" creates a virtio pmem
+ pci device whose storage is provided by above memory backend device.
+
+  Multiple virtio pmem devices can be created if multiple pairs of "-object"
+  and "-device" are provided.
+
+Hotplug
+---
+
+Virtio pmem devices can be hotplugged via the QEMU monitor. First, the
+memory backing has to be added via 'object_add'; afterwards, the virtio
+pmem device can be added via 'device_add'.
+
+For example, the following commands add another 4GB virtio pmem device to
+the guest:
+
+ (qemu) object_add 
memory-backend-file,id=mem2,share=on,mem-path=virtio_pmem2.img,size=4G
+ (qemu) device_add virtio-pmem-pci,id=virtio_pmem2,memdev=mem2
+
+Guest Data Persistence
+--
+
+ Guest data persistence on non-NVDIMM requires guest userspace applications
+ to perform fsync/msync. This is different from a real nvdimm backend where
+ no additional fsync/msync is required. This is to persist guest writes in
+ host backing file which otherwise remains in host page cache and there is
+ risk of losing the data in case of power failure.
+
+ With virtio pmem device, MAP_SYNC mmap flag is not supported. This provides
+ a hint to application to perform fsync for write persistence.
+
+Limitations
+
+- Real nvdimm device backend is not supported.
+- virtio pmem hotunplug is not supported.
+- ACPI NVDIMM features like regions/namespaces are not supported.
+- ndctl command is not supported.
-- 
2.21.0
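
To make the "Guest Data Persistence" section above concrete, here is a
minimal sketch of what a guest application must do on a dax-mounted
filesystem backed by virtio pmem (file path hypothetical, error handling
omitted):

  #include <fcntl.h>
  #include <string.h>
  #include <sys/mman.h>
  #include <unistd.h>

  int main(void)
  {
      /* hypothetical file on a dax-mounted fs backed by virtio pmem */
      int fd = open("/mnt/pmem/data", O_RDWR);
      char *p = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);

      memcpy(p, "hello", 5);    /* reaches the host page cache only */
      msync(p, 4096, MS_SYNC);  /* required to persist to the backing file */

      munmap(p, 4096);
      close(fd);
      return 0;
  }

Since MAP_SYNC is not supported by the device, the explicit msync (or an
fsync on the file descriptor) is what actually persists the write.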




Re: [Qemu-devel] [PATCH v2] virtio pmem: user document

2019-08-21 Thread Pankaj Gupta


Hi Cornelia,

> > This patch documents the steps to use virtio pmem.
> > It also documents other useful information about
> > virtio pmem, e.g. use-case, comparison with the QEMU NVDIMM
> > backend, and current limitations.
> > 
> > Signed-off-by: Pankaj Gupta 
> > ---
> > v1->v2
> >  - Fixes on text format and 'Guest Data persistence'
> >section - Cornelia
> > 
> >  docs/virtio-pmem.rst | 75 
> >  1 file changed, 75 insertions(+)
> >  create mode 100644 docs/virtio-pmem.rst
> > 
> > diff --git a/docs/virtio-pmem.rst b/docs/virtio-pmem.rst
> > new file mode 100644
> > index 00..0346e61674
> > --- /dev/null
> > +++ b/docs/virtio-pmem.rst
> > @@ -0,0 +1,75 @@
> > +
> > +
> > +QEMU virtio pmem
> > +
> > +
> > + This document explains the setup and usage of virtio pmem device
> 
> s/virtio pmem device/the virtio pmem device/

done

> 
> > + which is available since QEMU v4.1.0.
> > +
> > + The virtio pmem is a paravirtualized persistent memory device on
> 
> s/The virtio pmem/The virtio pmem device/

o.k

> 
> > + regular(i.e non-NVDIMM) storage.
> 
> missing blank before '('

sure
> 
> > +
> > +Usecase
> > +
> > +
> > +  Allows to bypass the guest page cache and directly use host page cache.
> 
> "Virtio pmem allows to..." ?

done.

> 
> > +  This reduces guest memory footprint as the host can make efficient
> > +  memory reclaim decisions under memory pressure.
> > +
> > +o How does virtio-pmem compare to the nvdimm emulation supported by QEMU?
> > +
> > +  NVDIMM emulation on regular(i.e. non-NVDIMM) host storage does not
> 
> missing blank before '('

done.
> 
> > +  persist the guest writes as there are no defined semantics in the device
> > +  specification. The virtio pmem device provides guest write persistence
> > +  on non-NVDIMM host storage.
> > +
> > +virtio pmem usage
> > +-
> > +
> > +  A virtio pmem device backed by a memory-backend-file can be created on
> > +  the QEMU command line as in the following example:
> > +
> > +  -object
> > memory-backend-file,id=mem1,share,mem-path=./virtio_pmem.img,size=4G
> > +  -device virtio-pmem-pci,memdev=mem1,id=nv1
> > +
> > +   where:
> > +   - "object memory-backend-file,id=mem1,share,mem-path=,
> > size="
> > + creates a backend file of size on a mem-path.
> 
> "a backend file with the specified size" ?

done.

> 
> > +
> > +   - "device virtio-pmem-pci,id=nvdimm1,memdev=mem1" creates a virtio pmem
> > + pci device whose storage is provided by above memory backend device.
> > +
> > +  Multiple virtio pmem devices can be created if multiple pairs of
> > "-object"
> > +  and "-device" are provided.
> > +
> > +Hotplug
> > +---
> > +
> > +"Virtio pmem devices can be hotplugged via the QEMU monitor. First, the
> > +memory backing has to be added via 'object_add'; afterwards, the virtio
> > +pmem device can be added via 'device_add'."
> 
> Please lose the '"' (copy/paste leftover, I presume? :)

Done :)

> 
> > +
> > +For example, the following commands add another 4GB virtio pmem device to
> > +the guest:
> > +
> > + (qemu) object_add
> > memory-backend-file,id=mem2,share=on,mem-path=virtio_pmem2.img,size=4G
> > + (qemu) device_add virtio-pmem-pci,id=virtio_pmem2,memdev=mem2
> > +
> > +Guest Data Persistence
> > +--
> > +
> > + Guest data persistence on non-NVDIMM requires guest userspace application
> > to
> 
> s/application/applications/ ?

done.

> 
> > + perform fsync/msync. This is different from a real nvdimm backend where
> > no
> > + additional fsync/msync is required. This is to persist guest writes in
> > host
> > + backing file which otherwise remains in host page cache and there is risk
> > of
> > + losing the data in case of power failure.
> > +
> > + With virtio pmem device, MAP_SYNC mmap flag is not supported. This
> > provides
> > + a hint to application to perform fsync for write persistence.
> > +
> > +Limitations
> > +
> > +- Real nvdimm device backend is not supported.
> > +- virtio pmem hotunplug is not supported.
> > +- ACPI NVDIMM features like regions/namespaces are not supported.
> > +- ndctl command is not supported.
> 
> Only some nits from my side, otherwise looks good to me.

Thank you for the review. Will post a v3 with the changes.

Best regards,
Pankaj

> 



[Qemu-devel] [PATCH v2] virtio pmem: user document

2019-08-21 Thread Pankaj Gupta
This patch documents the steps to use virtio pmem.
It also documents other useful information about
virtio pmem, e.g. use-case, comparison with the QEMU NVDIMM
backend, and current limitations.

Signed-off-by: Pankaj Gupta 
---
v1->v2
 - Fixes on text format and 'Guest Data persistence'
   section - Cornelia

 docs/virtio-pmem.rst | 75 
 1 file changed, 75 insertions(+)
 create mode 100644 docs/virtio-pmem.rst

diff --git a/docs/virtio-pmem.rst b/docs/virtio-pmem.rst
new file mode 100644
index 00..0346e61674
--- /dev/null
+++ b/docs/virtio-pmem.rst
@@ -0,0 +1,75 @@
+
+
+QEMU virtio pmem
+
+
+ This document explains the setup and usage of virtio pmem device
+ which is available since QEMU v4.1.0.
+
+ The virtio pmem is a paravirtualized persistent memory device on
+ regular(i.e non-NVDIMM) storage.
+
+Usecase
+
+
+  Allows to bypass the guest page cache and directly use host page cache.
+  This reduces guest memory footprint as the host can make efficient
+  memory reclaim decisions under memory pressure.
+
+o How does virtio-pmem compare to the nvdimm emulation supported by QEMU?
+
+  NVDIMM emulation on regular(i.e. non-NVDIMM) host storage does not
+  persist the guest writes as there are no defined semantics in the device
+  specification. The virtio pmem device provides guest write persistence
+  on non-NVDIMM host storage.
+
+virtio pmem usage
+-
+
+  A virtio pmem device backed by a memory-backend-file can be created on
+  the QEMU command line as in the following example:
+
+  -object memory-backend-file,id=mem1,share,mem-path=./virtio_pmem.img,size=4G
+  -device virtio-pmem-pci,memdev=mem1,id=nv1
+
+   where:
+   - "object memory-backend-file,id=mem1,share,mem-path=, size="
+ creates a backend file of size on a mem-path.
+
+   - "device virtio-pmem-pci,id=nvdimm1,memdev=mem1" creates a virtio pmem
+ pci device whose storage is provided by above memory backend device.
+
+  Multiple virtio pmem devices can be created if multiple pairs of "-object"
+  and "-device" are provided.
+
+Hotplug
+---
+
+"Virtio pmem devices can be hotplugged via the QEMU monitor. First, the
+memory backing has to be added via 'object_add'; afterwards, the virtio
+pmem device can be added via 'device_add'."
+
+For example, the following commands add another 4GB virtio pmem device to
+the guest:
+
+ (qemu) object_add 
memory-backend-file,id=mem2,share=on,mem-path=virtio_pmem2.img,size=4G
+ (qemu) device_add virtio-pmem-pci,id=virtio_pmem2,memdev=mem2
+
+Guest Data Persistence
+--
+
+ Guest data persistence on non-NVDIMM requires guest userspace application to
+ perform fsync/msync. This is different from a real nvdimm backend where no
+ additional fsync/msync is required. This is to persist guest writes in host
+ backing file which otherwise remains in host page cache and there is risk of
+ losing the data in case of power failure.
+
+ With virtio pmem device, MAP_SYNC mmap flag is not supported. This provides
+ a hint to application to perform fsync for write persistence.
+
+Limitations
+
+- Real nvdimm device backend is not supported.
+- virtio pmem hotunplug is not supported.
+- ACPI NVDIMM features like regions/namespaces are not supported.
+- ndctl command is not supported.
-- 
2.21.0




Re: [Qemu-devel] [PATCH] virtio pmem: user document

2019-07-30 Thread Pankaj Gupta



> On Tue, 30 Jul 2019 12:16:57 +0530
> Pankaj Gupta  wrote:
> 
> > This patch documents the steps to use virtio pmem.
> > It also documents other useful information about
> > virtio pmem, e.g. use-case, comparison with the QEMU NVDIMM
> > backend, and current limitations.
> > 
> > Signed-off-by: Pankaj Gupta 
> > ---
> >  docs/virtio-pmem.txt | 65 
> >  1 file changed, 65 insertions(+)
> >  create mode 100644 docs/virtio-pmem.txt
> > 
> > diff --git a/docs/virtio-pmem.txt b/docs/virtio-pmem.txt
> 
> Maybe make this ReST from the start? Should be trivial enough.

o.k

> 
> > new file mode 100644
> > index 00..fc61eebb20
> > --- /dev/null
> > +++ b/docs/virtio-pmem.txt
> > @@ -0,0 +1,65 @@
> > +
> > +QEMU virtio pmem
> > +===
> > +
> > + This document explains the usage of virtio pmem device
> 
> "setup and usage" ?

o.k

> 
> > + which is available since QEMU v4.1.0.
> > +
> > + The virtio pmem is paravirtualized persistent memory device
> 
> "The virtio pmem device is a paravirtualized..."

sure.

> 
> > + on regular(non-NVDIMM) storage.
> > +
> > +Usecase
> > +
> > +  Allows to bypass the guest page cache and directly use host page cache.
> > +  This reduces guest memory footprint as host can make efficient memory
> 
> s/as host/,as the host/

sure

> 
> > +  reclaim decisions under memory pressure.
> > +
> > +o How does virtio-pmem compare to the nvdimm emulation supported by QEMU?
> > +
> > +  NVDIMM emulation on regular(non-NVDIMM) host storage does not persists
> 
> s/regular(non-NVDIMM)/regular (i.e. non-NVDIMM)/ ?
> s/persists/persist/

yes, to both.

> 
> > +  the guest writes as there are no defined semantecs in the device
> > specification.
> 
> s/semantecs/semantics/

ah...spell checker :(

> 
> > +  With virtio pmem device, guest write persistence on non-NVDIMM storage
> > is
> > +  supported.
> 
> "The virtio pmem device provides a way to support guest write
> persistence on non-NVDIMM storage." ?
> 
> > +
> > +virtio pmem usage
> > +-
> > +  virtio pmem device is created with a memory-backend-file with the below
> > +  options:
> 
> "A virtio pmem device backed by a memory-backend-file can be created on
> the QEMU command line as in the following example:" ?

o.k

> 
> > +
> > +  -machine pc -m 8G,slots=$N,maxmem=$MAX_SIZE
> 
> I'm not sure you should explicitly specify the machine type in this
> example. I think it is fine to say that something is only supported on
> a subset of machine types, but it should not make its way into an
> example on how to configure a device and its backing.

o.k

> 
> Also, maybe fill in more concrete values here? Or split it into a part
> specifying the syntax (where I'd use  instead of $MAX_SIZE
> etc.), and a more concrete example?

o.k

> 
> > +  -object memory-backend-file,id=mem1,share,mem-path=$PATH,size=$SIZE
> > +  -device virtio-pmem-pci,memdev=mem1,id=nv1
> > +
> > +   where:
> > +   - "object
> > memory-backend-file,id=mem1,share,mem-path=$PATH,size=$VIRTIO_PMEM_SIZE"
> > + creates a backend storage of size $SIZE on a file $PATH. All
> > + accesses to the virtio pmem device go to the file $PATH.
> > +
> > +   - "device virtio-pmem-pci,id=nvdimm1,memdev=mem1" creates a virtio pmem
> > + device whose storage is provided by above memory backend device.
> 
> "a virtio pmem PCI device" ?

o.k

> 
> > +
> > +  Multiple virtio pmem devices can be created if multiple pairs of
> > "-object"
> > +  and "-device" are provided.
> > +
> > +Hotplug
> > +---
> > +Accomplished by two monitor commands "object_add" and "device_add".
> 
> Hm... what about the following instead:
> 
> "Virtio pmem devices can be hotplugged via the QEMU monitor. First, the
> memory backing has to be added via 'object_add'; afterwards, the virtio
> pmem device can be added via 'device_add'."

o.k

> 
> > +
> > +For example, the following commands add another 4GB virtio pmem device to
> > +the guest:
> > +
> > + (qemu) object_add
> > memory-backend-file,id=mem2,share=on,mem-path=virtio_pmem2.img,size=4G
> > + (qemu) device_add virtio-pmem-pci,id=virtio_pmem2,memdev=mem2
> > +
> > +Guest Data Persistence
> > +--

[Qemu-devel] [PATCH] virtio pmem: user document

2019-07-29 Thread Pankaj Gupta
This patch documents the steps to use virtio pmem.
It also documents other useful information about
virtio pmem, e.g. use-case, comparison with the QEMU NVDIMM
backend, and current limitations.

Signed-off-by: Pankaj Gupta 
---
 docs/virtio-pmem.txt | 65 
 1 file changed, 65 insertions(+)
 create mode 100644 docs/virtio-pmem.txt

diff --git a/docs/virtio-pmem.txt b/docs/virtio-pmem.txt
new file mode 100644
index 00..fc61eebb20
--- /dev/null
+++ b/docs/virtio-pmem.txt
@@ -0,0 +1,65 @@
+
+QEMU virtio pmem
+===
+
+ This document explains the usage of virtio pmem device 
+ which is available since QEMU v4.1.0.
+
+ The virtio pmem is paravirtualized persistent memory device
+ on regular(non-NVDIMM) storage. 
+
+Usecase
+
+  Allows to bypass the guest page cache and directly use host page cache.
+  This reduces guest memory footprint as host can make efficient memory
+  reclaim decisions under memory pressure.
+
+o How does virtio-pmem compare to the nvdimm emulation supported by QEMU?
+
+  NVDIMM emulation on regular(non-NVDIMM) host storage does not persists
+  the guest writes as there are no defined semantecs in the device 
specification.
+  With virtio pmem device, guest write persistence on non-NVDIMM storage is
+  supported.
+
+virtio pmem usage
+-
+  virtio pmem device is created with a memory-backend-file with the below
+  options:
+
+  -machine pc -m 8G,slots=$N,maxmem=$MAX_SIZE
+  -object memory-backend-file,id=mem1,share,mem-path=$PATH,size=$SIZE
+  -device virtio-pmem-pci,memdev=mem1,id=nv1
+
+   where:
+   - "object 
memory-backend-file,id=mem1,share,mem-path=$PATH,size=$VIRTIO_PMEM_SIZE"
+ creates a backend storage of size $SIZE on a file $PATH. All
+ accesses to the virtio pmem device go to the file $PATH.
+
+   - "device virtio-pmem-pci,id=nvdimm1,memdev=mem1" creates a virtio pmem
+ device whose storage is provided by above memory backend device.
+
+  Multiple virtio pmem devices can be created if multiple pairs of "-object"
+  and "-device" are provided.
+
+Hotplug
+---
+Accomplished by two monitor commands "object_add" and "device_add".
+
+For example, the following commands add another 4GB virtio pmem device to
+the guest:
+
+ (qemu) object_add 
memory-backend-file,id=mem2,share=on,mem-path=virtio_pmem2.img,size=4G
+ (qemu) device_add virtio-pmem-pci,id=virtio_pmem2,memdev=mem2
+
+Guest Data Persistence
+--
+Guest data persistence on non-NVDIMM requires guest userspace application to 
+perform fsync/msync. This is different than real nvdimm backend where no 
additional
+fsync/msync is required for data persistence.
+
+Limitations
+
+- Real nvdimm device backend is not supported.
+- virtio pmem hotunplug is not supported.
+- ACPI NVDIMM features like regions/namespaces are not supported.
+- ndctl command is not supported.
-- 
2.21.0





Re: [Qemu-devel] [PATCH-for-4.1 v4 1/7] virtio-balloon: Fix wrong sign extension of PFNs

2019-07-25 Thread Pankaj Gupta


> 
> If we directly cast from int to uint64_t, we will first sign-extend to
> an int64_t, which is wrong. We actually want to treat the PFNs like
> unsigned values.
> 
> As far as I can see, this dates back to the initial virtio-balloon
> commit, but wasn't triggered as fairly big guests would be required.
> 
> Cc: qemu-sta...@nongnu.org
> Reported-by: Michael S. Tsirkin 
> Reviewed-by: David Gibson 
> Signed-off-by: David Hildenbrand 
> ---
>  hw/virtio/virtio-balloon.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
> index e85d1c0d5c..515abf6553 100644
> --- a/hw/virtio/virtio-balloon.c
> +++ b/hw/virtio/virtio-balloon.c
> @@ -343,8 +343,8 @@ static void virtio_balloon_handle_output(VirtIODevice
> *vdev, VirtQueue *vq)
>  }
>  
>  while (iov_to_buf(elem->out_sg, elem->out_num, offset, &pfn, 4) ==
>  4) { 
> +unsigned int p = virtio_ldl_p(vdev, &pfn);
>  hwaddr pa;
> -int p = virtio_ldl_p(vdev, &pfn);
>  
>      pa = (hwaddr) p << VIRTIO_BALLOON_PFN_SHIFT;
>  offset += 4;
> --
> 2.21.0

Reviewed-by: Pankaj Gupta 

> 
> 
> 
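
A self-contained illustration of the cast pitfall fixed above, on a
typical two's-complement platform (the shift of 12 mirrors
VIRTIO_BALLOON_PFN_SHIFT; the value is just an example):

  #include <inttypes.h>
  #include <stdio.h>

  int main(void)
  {
      uint32_t pfn = 0x80000000u;  /* a PFN with the top bit set */

      /* Buggy: going through int sign-extends before widening. */
      uint64_t bad  = (uint64_t)(int)pfn << 12;
      /* Fixed: unsigned int zero-extends, as intended. */
      uint64_t good = (uint64_t)(unsigned int)pfn << 12;

      printf("bad:  0x%" PRIx64 "\n", bad);   /* 0xfffff80000000000 */
      printf("good: 0x%" PRIx64 "\n", good);  /* 0x0000080000000000 */
      return 0;
  }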



Re: [Qemu-devel] [PATCH] docs/nvdimm: add example on persistent backend setup

2019-07-24 Thread Pankaj Gupta


> 
> Persistent backend setup requires some knowledge about nvdimm and the
> ndctl tool. Some users report they struggle to gather this knowledge
> and have difficulty setting it up properly.
> 
> Here we provide two examples for the persistent backend and give the
> link to ndctl. By doing so, users can try it directly and do more
> investigation on persistent backend setup with ndctl.
> 
> Signed-off-by: Wei Yang 
> ---
>  docs/nvdimm.txt | 28 
>  1 file changed, 28 insertions(+)
> 
> diff --git a/docs/nvdimm.txt b/docs/nvdimm.txt
> index b531cacd35..baba7a940d 100644
> --- a/docs/nvdimm.txt
> +++ b/docs/nvdimm.txt
> @@ -171,6 +171,32 @@ guest software that this vNVDIMM device contains a
> region that cannot
>  accept persistent writes. In result, for example, the guest Linux
>  NVDIMM driver, marks such vNVDIMM device as read-only.
>  
> +Backend File Setup Example
> +..
> +
> +Here are two examples of how to set up a persistent backend on
> +Linux, using the ndctl tool [3].
> +
> +It is easy to set up a DAX device backend file.
> +
> +A. DAX device
> +
> +ndctl create-namespace -f -e namespace0.0 -m devdax
> +
> +The /dev/dax0.0 device can then be used directly in the "mem-path" option.
> +
> +For a DAX file, there is more to do than creating the proper namespace. The
> +block device should be partitioned and mounted (with the dax option).
> +
> +B. DAX file
> +
> +ndctl create-namespace -f -e namespace0.0 -m fsdax
> +(partition /dev/pmem0 with name pmem0p1)
> +mount -o dax /dev/pmem0p1 /mnt
> +(dd a file with proper size in /mnt)
> +
> +Then the new file in /mnt could be used in "mem-path" option.
> +
>  NVDIMM Persistence
>  --
>  
> @@ -212,3 +238,5 @@ References
>  
> https://www.snia.org/sites/default/files/technical_work/final/NVMProgrammingModel_v1.2.pdf
>  [2] Persistent Memory Development Kit (PMDK), formerly known as NVML
>  project, home page:
>  http://pmem.io/pmdk/
> +[3] ndctl-create-namespace - provision or reconfigure a namespace
> +http://pmem.io/ndctl/ndctl-create-namespace.html
> --

Reviewed-by: Pankaj Gupta 

> 2.17.1
> 
> 
> 
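
For readers following along, the parenthetical steps in the fsdax example
expand to roughly the following (illustrative device names, sizes and
filesystem choice):

  ndctl create-namespace -f -e namespace0.0 -m fsdax
  parted -s /dev/pmem0 mklabel gpt mkpart pmem 1MiB 100%
  mkfs.ext4 /dev/pmem0p1
  mount -o dax /dev/pmem0p1 /mnt
  dd if=/dev/zero of=/mnt/nvdimm.img bs=1M count=4096

The resulting /mnt/nvdimm.img can then be passed as "mem-path".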



Re: [Qemu-devel] [PATCH] docs/nvdimm: add example on persistent backend setup

2019-07-24 Thread Pankaj Gupta


> >
> >> 
> >> Persistent backend setup requires some knowledge about nvdimm and the
> >> ndctl tool. Some users report they struggle to gather this knowledge
> >> and have difficulty setting it up properly.
> >> 
> >> Here we provide two examples for the persistent backend and give the
> >> link to ndctl. By doing so, users can try it directly and do more
> >> investigation on persistent backend setup with ndctl.
> >> 
> >> Signed-off-by: Wei Yang 
> >> ---
> >>  docs/nvdimm.txt | 28 
> >>  1 file changed, 28 insertions(+)
> >> 
> >> diff --git a/docs/nvdimm.txt b/docs/nvdimm.txt
> >> index b531cacd35..baba7a940d 100644
> >> --- a/docs/nvdimm.txt
> >> +++ b/docs/nvdimm.txt
> >> @@ -171,6 +171,32 @@ guest software that this vNVDIMM device contains a
> >> region that cannot
> >>  accept persistent writes. In result, for example, the guest Linux
> >>  NVDIMM driver, marks such vNVDIMM device as read-only.
> >>  
> >> +Backend File Setup Example
> >> +..
> >> +
> >> +Here are two examples of how to set up a persistent backend on
> >> +Linux, using the ndctl tool [3].
> >> +
> >> +It is easy to set up a DAX device backend file.
> >> +
> >> +A. DAX device
> >> +
> >> +ndctl create-namespace -f -e namespace0.0 -m devdax
> >> +
> >> +The /dev/dax0.0 device can then be used directly in the "mem-path" option.
> >> +
> >> +For a DAX file, there is more to do than creating the proper namespace. The
> >> +block device should be partitioned and mounted (with the dax option).
> >> +
> >> +B. DAX file
> >> +
> >> +ndctl create-namespace -f -e namespace0.0 -m fsdax
> >> +(partition /dev/pmem0 with name pmem0p1)
> >> +mount -o dax /dev/pmem0p1 /mnt
> >> +(dd a file with proper size in /mnt)
> >
> >This is not clear to me. Why is a 'dd' file required in /mnt?
> >You mean for creating a backend file?
> >
> 
> Yes, create a backend file. You need to give a file instead of a directory
> to the QEMU command line.
o.k

Thanks,
Pankaj 
> 
> >> +
> >> +Then the new file in /mnt could be used in "mem-path" option.
> >> +
> >>  NVDIMM Persistence
> >>  ------
> >>  
> >> @@ -212,3 +238,5 @@ References
> >>  
> >> https://www.snia.org/sites/default/files/technical_work/final/NVMProgrammingModel_v1.2.pdf
> >>  [2] Persistent Memory Development Kit (PMDK), formerly known as NVML
> >>  project, home page:
> >>  http://pmem.io/pmdk/
> >> +[3] ndctl-create-namespace - provision or reconfigure a namespace
> >> +http://pmem.io/ndctl/ndctl-create-namespace.html
> >> --
> >
> >Looks good to me. Just a small comment above.
> >Other than that: Reviewed-by: Pankaj Gupta 
> >
> 
> Thanks
> 
> >> 2.17.1
> >> 
> >> 
> >> 
> 
> --
> Wei Yang
> Help you, Help me
> 
> 



Re: [Qemu-devel] [PATCH] docs/nvdimm: add example on persistent backend setup

2019-07-24 Thread Pankaj Gupta


> 
> Persistent backend setup requires some knowledge about nvdimm and the
> ndctl tool. Some users report they struggle to gather this knowledge
> and have difficulty setting it up properly.
> 
> Here we provide two examples for the persistent backend and give the
> link to ndctl. By doing so, users can try it directly and do more
> investigation on persistent backend setup with ndctl.
> 
> Signed-off-by: Wei Yang 
> ---
>  docs/nvdimm.txt | 28 
>  1 file changed, 28 insertions(+)
> 
> diff --git a/docs/nvdimm.txt b/docs/nvdimm.txt
> index b531cacd35..baba7a940d 100644
> --- a/docs/nvdimm.txt
> +++ b/docs/nvdimm.txt
> @@ -171,6 +171,32 @@ guest software that this vNVDIMM device contains a
> region that cannot
>  accept persistent writes. In result, for example, the guest Linux
>  NVDIMM driver, marks such vNVDIMM device as read-only.
>  
> +Backend File Setup Example
> +..
> +
> +Here are two examples of how to set up a persistent backend on
> +Linux, using the ndctl tool [3].
> +
> +It is easy to set up a DAX device backend file.
> +
> +A. DAX device
> +
> +ndctl create-namespace -f -e namespace0.0 -m devdax
> +
> +The /dev/dax0.0 device can then be used directly in the "mem-path" option.
> +
> +For a DAX file, there is more to do than creating the proper namespace. The
> +block device should be partitioned and mounted (with the dax option).
> +
> +B. DAX file
> +
> +ndctl create-namespace -f -e namespace0.0 -m fsdax
> +(partition /dev/pmem0 with name pmem0p1)
> +mount -o dax /dev/pmem0p1 /mnt
> +(dd a file with proper size in /mnt)

This is not clear to me. Why is a 'dd' file required in /mnt?
You mean for creating a backend file?

> +
> +Then the new file in /mnt could be used in "mem-path" option.
> +
>  NVDIMM Persistence
>  --
>  
> @@ -212,3 +238,5 @@ References
>  
> https://www.snia.org/sites/default/files/technical_work/final/NVMProgrammingModel_v1.2.pdf
>  [2] Persistent Memory Development Kit (PMDK), formerly known as NVML
>  project, home page:
>  http://pmem.io/pmdk/
> +[3] ndctl-create-namespace - provision or reconfigure a namespace
> +http://pmem.io/ndctl/ndctl-create-namespace.html
> --

Looks good to me. Just a small comment above. 
Other than that: Reviewed-by: Pankaj Gupta 

> 2.17.1
> 
> 
> 



Re: [Qemu-devel] [PATCH] docs/nvdimm: add example on persistent backend setup

2019-07-24 Thread Pankaj Gupta


> >
> >> 
> >> Persistent backend setup requires some knowledge about nvdimm and the
> >> ndctl tool. Some users report they struggle to gather this knowledge
> >> and have difficulty setting it up properly.
> >> 
> >> Here we provide two examples for the persistent backend and give the
> >> link to ndctl. By doing so, users can try it directly and do more
> >> investigation on persistent backend setup with ndctl.
> >> 
> >> Signed-off-by: Wei Yang 
> >> ---
> >>  docs/nvdimm.txt | 28 
> >>  1 file changed, 28 insertions(+)
> >> 
> >> diff --git a/docs/nvdimm.txt b/docs/nvdimm.txt
> >> index b531cacd35..baba7a940d 100644
> >> --- a/docs/nvdimm.txt
> >> +++ b/docs/nvdimm.txt
> >> @@ -171,6 +171,32 @@ guest software that this vNVDIMM device contains a
> >> region that cannot
> >>  accept persistent writes. In result, for example, the guest Linux
> >>  NVDIMM driver, marks such vNVDIMM device as read-only.
> >>  
> >> +Backend File Setup Example
> >> +..
> >> +
> >> +Here are two examples of how to set up a persistent backend on
> >> +Linux, using the ndctl tool [3].
> >> +
> >> +It is easy to set up a DAX device backend file.
> >> +
> >> +A. DAX device
> >> +
> >> +ndctl create-namespace -f -e namespace0.0 -m devdax
> >> +
> >> +The /dev/dax0.0 device can then be used directly in the "mem-path" option.
> >> +
> >> +For a DAX file, there is more to do than creating the proper namespace. The
> >> +block device should be partitioned and mounted (with the dax option).
> >> +
> >> +B. DAX file
> >> +
> >> +ndctl create-namespace -f -e namespace0.0 -m fsdax
> >> +(partition /dev/pmem0 with name pmem0p1)
> >> +mount -o dax /dev/pmem0p1 /mnt
> >> +(dd a file with proper size in /mnt)
> >
> 
> Hi, Pankaj
> 
> Thanks for your comment.
> 
> >This namespace is for filesystem DAX? What if the user wants to create a
> >namespace for device DAX to be used as a persistent backend?
> >
> 
> The user can choose the type (devdax/fsdax) on the ndctl command line with
> the -m option.

yes.

> 
> >Does it make sense to mention the default namespace created on the
> >persistent backend?
> >
> 
> I don't get your point. Here is an example to let users know how to create
> a persistent backend. In case they want more control over the backend,
> they can refer to the ndctl link.

I agree this is helpful for users. 

> 
> You mean it is not proper to mention the backend setup example in the
> document? We found many users come to us asking how to set it up, so we
> decided to add this section.

Sorry! For some reason I overlooked the device DAX namespace information in
the patch. The patch looks good to me and I agree it's helpful for the users.
   

Thanks,
Pankaj

> 
> Do you have other option?
> 
> >Thanks,
> >Pankaj
> >
> >> +
> >> +Then the new file in /mnt could be used in "mem-path" option.
> >> +
> >>  NVDIMM Persistence
> >>  --
> >>  
> >> @@ -212,3 +238,5 @@ References
> >>  
> >> https://www.snia.org/sites/default/files/technical_work/final/NVMProgrammingModel_v1.2.pdf
> >>  [2] Persistent Memory Development Kit (PMDK), formerly known as NVML
> >>  project, home page:
> >>  http://pmem.io/pmdk/
> >> +[3] ndctl-create-namespace - provision or reconfigure a namespace
> >> +http://pmem.io/ndctl/ndctl-create-namespace.html
> >> --
> >> 2.17.1
> >> 
> >> 
> >> 
> 
> --
> Wei Yang
> Help you, Help me
> 
> 



Re: [Qemu-devel] [PATCH] docs/nvdimm: add example on persistent backend setup

2019-07-24 Thread Pankaj Gupta


> 
> Persistent backend setup requires some knowledge about nvdimm and the
> ndctl tool. Some users report they struggle to gather this knowledge
> and have difficulty setting it up properly.
> 
> Here we provide two examples for the persistent backend and give the
> link to ndctl. By doing so, users can try it directly and do more
> investigation on persistent backend setup with ndctl.
> 
> Signed-off-by: Wei Yang 
> ---
>  docs/nvdimm.txt | 28 
>  1 file changed, 28 insertions(+)
> 
> diff --git a/docs/nvdimm.txt b/docs/nvdimm.txt
> index b531cacd35..baba7a940d 100644
> --- a/docs/nvdimm.txt
> +++ b/docs/nvdimm.txt
> @@ -171,6 +171,32 @@ guest software that this vNVDIMM device contains a
> region that cannot
>  accept persistent writes. In result, for example, the guest Linux
>  NVDIMM driver, marks such vNVDIMM device as read-only.
>  
> +Backend File Setup Example
> +..
> +
> +Here are two examples of how to set up a persistent backend on
> +Linux, using the ndctl tool [3].
> +
> +It is easy to set up a DAX device backend file.
> +
> +A. DAX device
> +
> +ndctl create-namespace -f -e namespace0.0 -m devdax
> +
> +The /dev/dax0.0 device can then be used directly in the "mem-path" option.
> +
> +For a DAX file, there is more to do than creating the proper namespace. The
> +block device should be partitioned and mounted (with the dax option).
> +
> +B. DAX file
> +
> +ndctl create-namespace -f -e namespace0.0 -m fsdax
> +(partition /dev/pmem0 with name pmem0p1)
> +mount -o dax /dev/pmem0p1 /mnt
> +(dd a file with proper size in /mnt)

This namespace is for filesystem DAX? What if the user wants to create a
namespace for device DAX to be used as a persistent backend?

Does it make sense to mention the default namespace created on the
persistent backend?

Thanks,
Pankaj

> +
> +Then the new file in /mnt could be used in "mem-path" option.
> +
>  NVDIMM Persistence
>  --
>  
> @@ -212,3 +238,5 @@ References
>  
> https://www.snia.org/sites/default/files/technical_work/final/NVMProgrammingModel_v1.2.pdf
>  [2] Persistent Memory Development Kit (PMDK), formerly known as NVML
>  project, home page:
>  http://pmem.io/pmdk/
> +[3] ndctl-create-namespace - provision or reconfigure a namespace
> +http://pmem.io/ndctl/ndctl-create-namespace.html
> --
> 2.17.1
> 
> 
> 
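
For completeness, the device DAX variant asked about above would look
roughly like this, with the resulting character device handed straight to
QEMU (illustrative sizes; the align option may be needed to match the
device's supported alignment):

  ndctl create-namespace -f -e namespace0.0 -m devdax

  -object memory-backend-file,id=mem1,share=on,mem-path=/dev/dax0.0,size=4G,align=2M
  -device nvdimm,id=nvdimm1,memdev=mem1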



Re: [Qemu-devel] [PATCH] pc-dimm: fix crash when invalid slot number is used

2019-07-23 Thread Pankaj Gupta


> QEMU will crash with:
>   Segmentation fault (core dumped)
> when a negative slot number is used, e.g.:
>   qemu-system-x86_64 -m 1G,maxmem=20G,slots=256 \
>   -object memory-backend-ram,id=mem1,size=1G \
>   -device pc-dimm,id=dimm1,memdev=mem1,slot=-2
> 
> Fix it by checking that the slot number is within the valid range.
> 
> Signed-off-by: Igor Mammedov 
> ---
>  hw/mem/pc-dimm.c | 7 +++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/hw/mem/pc-dimm.c b/hw/mem/pc-dimm.c
> index b1239fd0d3..29c785799c 100644
> --- a/hw/mem/pc-dimm.c
> +++ b/hw/mem/pc-dimm.c
> @@ -38,6 +38,13 @@ void pc_dimm_pre_plug(PCDIMMDevice *dimm, MachineState
> *machine,
>  
>  slot = object_property_get_int(OBJECT(dimm), PC_DIMM_SLOT_PROP,
> &error_abort);
> +if ((slot < 0 || slot >= machine->ram_slots) &&
> + slot != PC_DIMM_UNASSIGNED_SLOT) {
> +error_setg(&local_err, "invalid slot number, valid range is [0-%"
> +   PRIu64 "]", machine->ram_slots - 1);
> +goto out;
> +}
> +
>  slot = pc_dimm_get_free_slot(slot == PC_DIMM_UNASSIGNED_SLOT ? NULL :
>  &slot,
>   machine->ram_slots, &local_err);
>  if (local_err) {
> --

Reviewed-by: Pankaj Gupta 

> 2.18.1
> 
> 
> 



Re: [Qemu-devel] [PATCH 3/3] virtio pmem: remove transitional device info

2019-07-15 Thread Pankaj Gupta


> 
> On 12.07.19 09:35, Pankaj Gupta wrote:
> > Remove transitional & non-transitional device info
> > for virtio pmem.
> 
> Can you explain and add *why* ?

As per the upstream suggestion by Cornelia & MST, transitional devices are for
legacy purposes, so they do not make sense for virtio-pmem.

Thanks,
Pankaj 

> 
> > 
> > Signed-off-by: Pankaj Gupta 
> > ---
> >  hw/virtio/virtio-pmem-pci.c | 2 --
> >  1 file changed, 2 deletions(-)
> > 
> > diff --git a/hw/virtio/virtio-pmem-pci.c b/hw/virtio/virtio-pmem-pci.c
> > index 0da6627469..fe2af00fa1 100644
> > --- a/hw/virtio/virtio-pmem-pci.c
> > +++ b/hw/virtio/virtio-pmem-pci.c
> > @@ -113,8 +113,6 @@ static void virtio_pmem_pci_instance_init(Object *obj)
> >  static const VirtioPCIDeviceTypeInfo virtio_pmem_pci_info = {
> >  .base_name = TYPE_VIRTIO_PMEM_PCI,
> >  .generic_name  = "virtio-pmem-pci",
> > -.transitional_name = "virtio-pmem-pci-transitional",
> > -.non_transitional_name = "virtio-pmem-pci-non-transitional",
> >  .instance_size = sizeof(VirtIOPMEMPCI),
> >  .instance_init = virtio_pmem_pci_instance_init,
> >  .class_init= virtio_pmem_pci_class_init,
> > 
> 
> 
> --
> 
> Thanks,
> 
> David / dhildenb
> 



Re: [Qemu-devel] [PATCH 3/3] virtio pmem: remove transitional device info

2019-07-12 Thread Pankaj Gupta


> 
> On Fri, 12 Jul 2019 13:05:54 +0530
> Pankaj Gupta  wrote:
> 
> > Remove transitional & non-transitional device info
> > for virtio pmem.
> 
> s/device info/names/ ?

yes.

> 
> > 
> > Signed-off-by: Pankaj Gupta 
> > ---
> >  hw/virtio/virtio-pmem-pci.c | 2 --
> >  1 file changed, 2 deletions(-)
> 
> Reviewed-by: Cornelia Huck 
> 



Re: [Qemu-devel] [PATCH 0/3] virtio pmem: coverity fixes

2019-07-12 Thread Pankaj Gupta


> 
> On Fri, 12 Jul 2019 13:05:51 +0530
> Pankaj Gupta  wrote:
> 
> > This patch series contains two fixes for Coverity and a
> > transitional info removal patch.
> > 
> > Pankaj Gupta (3):
> >   virtio pmem: fix wrong mem region condition
> >   virtio pmem: remove memdev null check
>   virtio pmem: remove transitional device info
> > 
> >  hw/virtio/virtio-pmem-pci.c | 4 +---
> >  hw/virtio/virtio-pmem.c | 4 ++--
> >  2 files changed, 3 insertions(+), 5 deletions(-)
> > 
> 
> I think all of this is 4.1 material.

Yes, I forgot to add it in the subject.

Thank you for the review.

Best regards,
Pankaj

> 
> 



Re: [Qemu-devel] [PATCH v2 0/1] Don't obey the kernel block device max transfer len / max segments for raw block devices

2019-07-12 Thread Pankaj Gupta


> Linux block devices, even in O_DIRECT mode, don't have any user-visible
> limit on transfer size / number of segments, which the underlying kernel
> block device can have.
> The kernel block layer takes care of enforcing these limits by splitting the
> bios.
> 
> By limiting the transfer sizes, we force QEMU to do the splitting itself,
> which introduces various overheads.
> It is especially visible in nbd server, where the low max transfer size of
> the
> underlying device forces us to advertise this over NBD, thus increasing the
> traffic overhead in case of image conversion which benefits from large
> blocks.
> 
> More information can be found here:
> https://bugzilla.redhat.com/show_bug.cgi?id=1647104
> 
> Tested this with qemu-img convert over nbd and natively and to my 
> surprise, even native IO performance improved a bit.
> 
> (The device on which it was tested is Intel Optane DC P4800X,
> which has 128k max transfer size reported by the kernel)
> 
> The benchmark:
> 
> Images were created using:
> 
> Sparse image:  qemu-img create -f qcow2 /dev/nvme0n1p3 1G / 10G / 100G
> Allocated image: qemu-img create -f qcow2 /dev/nvme0n1p3 -o
> preallocation=metadata  1G / 10G / 100G
> 
> The test was:
> 
>  echo "convert native:"
>  rm -rf /dev/shm/disk.img
>  time qemu-img convert -p -f qcow2 -O raw -T none $FILE /dev/shm/disk.img >
>  /dev/zero
> 
>  echo "convert via nbd:"
>  qemu-nbd -k /tmp/nbd.sock -v  -f qcow2 $FILE -x export --cache=none
>  --aio=native --fork
>  rm -rf /dev/shm/disk.img
>  time qemu-img convert -p -f raw -O raw
>  nbd:unix:/tmp/nbd.sock:exportname=export /dev/shm/disk.img > /dev/zero
> 
> The results:
> 
> =
> 1G sparse image:
>  native:
>   before: 0.027s
>   after: 0.027s
>  nbd:
>   before: 0.287s
>   after: 0.035s
> 
> =
> 100G sparse image:
>  native:
>   before: 0.028s
>   after: 0.028s
>  nbd:
>   before: 23.796s
>   after: 0.109s
> 
> =
> 1G preallocated image:
>  native:
>before: 0.454s
>after: 0.427s
>  nbd:
>before: 0.649s
>after: 0.546s
> 
> The block limits of max transfer size/max segment size are retained
> for the SCSI passthrough because in this case the kernel passes the userspace
> request
> directly to the kernel scsi driver, bypassing the block layer, and thus there
> is no code to split
> such requests.
> 
> Fam, since you were the original author of the code that added
> these limits, could you share your opinion on that?
> What was the reason besides SCSI passthrough?
> 
> V2:
> 
> *  Manually tested to not break the scsi passthrough with a nested VM
> *  As Eric suggested, refactored the area around the fstat.
> *  Spelling/grammar fixes
> 
> Best regards,
>   Maxim Levitsky
> 
> Maxim Levitsky (1):
>   raw-posix.c - use max transfer length / max segment count only for
> SCSI passthrough
> 
>  block/file-posix.c | 54 --
>  1 file changed, 28 insertions(+), 26 deletions(-)
> 
> --

I am not familiar with the SCSI passthrough special case, but overall it
looks good to me.

Feel free to add:
Reviewed-by: Pankaj Gupta 

> 2.17.2
> 
> 
> 



[Qemu-devel] [PATCH 3/3] virtio pmem: remove transitional device info

2019-07-12 Thread Pankaj Gupta
Remove transitional & non-transitional device info
for virtio pmem. 

Signed-off-by: Pankaj Gupta 
---
 hw/virtio/virtio-pmem-pci.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/hw/virtio/virtio-pmem-pci.c b/hw/virtio/virtio-pmem-pci.c
index 0da6627469..fe2af00fa1 100644
--- a/hw/virtio/virtio-pmem-pci.c
+++ b/hw/virtio/virtio-pmem-pci.c
@@ -113,8 +113,6 @@ static void virtio_pmem_pci_instance_init(Object *obj)
 static const VirtioPCIDeviceTypeInfo virtio_pmem_pci_info = {
 .base_name = TYPE_VIRTIO_PMEM_PCI,
 .generic_name  = "virtio-pmem-pci",
-.transitional_name = "virtio-pmem-pci-transitional",
-.non_transitional_name = "virtio-pmem-pci-non-transitional",
 .instance_size = sizeof(VirtIOPMEMPCI),
 .instance_init = virtio_pmem_pci_instance_init,
 .class_init= virtio_pmem_pci_class_init,
-- 
2.14.5




[Qemu-devel] [PATCH 2/3] virtio pmem: remove memdev null check

2019-07-12 Thread Pankaj Gupta
Coverity reports that when we're assigning vi->size we handle the 
"pmem->memdev is NULL" case; but we then pass it into 
object_get_canonical_path(), which unconditionally dereferences it
and will crash if it is NULL. If this pointer can be NULL then we
need to do something else here.

We are removing the 'pmem->memdev' NULL check here, as memdev can never
be NULL in this function.

Signed-off-by: Pankaj Gupta 
---
 hw/virtio/virtio-pmem.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/hw/virtio/virtio-pmem.c b/hw/virtio/virtio-pmem.c
index adbfb603ab..17c196d107 100644
--- a/hw/virtio/virtio-pmem.c
+++ b/hw/virtio/virtio-pmem.c
@@ -134,8 +134,8 @@ static void virtio_pmem_fill_device_info(const VirtIOPMEM 
*pmem,
  VirtioPMEMDeviceInfo *vi)
 {
 vi->memaddr = pmem->start;
-vi->size = pmem->memdev ? memory_region_size(&pmem->memdev->mr) : 0;
-vi->memdev = object_get_canonical_path(OBJECT(pmem->memdev));
+vi->size= memory_region_size(&pmem->memdev->mr);
+vi->memdev  = object_get_canonical_path(OBJECT(pmem->memdev));
 }
 
 static MemoryRegion *virtio_pmem_get_memory_region(VirtIOPMEM *pmem,
-- 
2.14.5




[Qemu-devel] [PATCH 0/3] virtio pmem: coverity fixes

2019-07-12 Thread Pankaj Gupta
This patch series contains two fixes for Coverity-reported issues and a
transactional device info removal patch.

Pankaj Gupta (3):
  virtio pmem: fix wrong mem region condition
  virtio pmem: remove memdev null check
  virtio pmem: remove transactional device info

 hw/virtio/virtio-pmem-pci.c | 4 +---
 hw/virtio/virtio-pmem.c | 4 ++--
 2 files changed, 3 insertions(+), 5 deletions(-)

-- 
2.14.5




[Qemu-devel] [PATCH 1/3] virtio pmem: fix wrong mem region condition

2019-07-12 Thread Pankaj Gupta
Coverity reported that the plugged-size helper returns zero
for a non-NULL memory region. This is because the arguments
to '?:' are in the wrong order; fix this.

Signed-off-by: Pankaj Gupta 
---
 hw/virtio/virtio-pmem-pci.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/virtio/virtio-pmem-pci.c b/hw/virtio/virtio-pmem-pci.c
index 8b2d0dbccc..0da6627469 100644
--- a/hw/virtio/virtio-pmem-pci.c
+++ b/hw/virtio/virtio-pmem-pci.c
@@ -57,7 +57,7 @@ static uint64_t virtio_pmem_pci_get_plugged_size(const 
MemoryDeviceState *md,
 MemoryRegion *mr = vpc->get_memory_region(pmem, errp);
 
 /* the plugged size corresponds to the region size */
-return mr ? 0 : memory_region_size(mr);
+return mr ? memory_region_size(mr) : 0;
 }
 
 static void virtio_pmem_pci_fill_device_info(const MemoryDeviceState *md,
-- 
2.14.5




Re: [Qemu-devel] [PULL 05/22] virtio-pmem: add virtio device

2019-07-11 Thread Pankaj Gupta


> >
> > From: Pankaj Gupta 
> >
> > This is the implementation of virtio-pmem device. Support will require
> > machine changes for the architectures that will support it, so it will
> > not yet be compiled. It can be unlocked with VIRTIO_PMEM_SUPPORTED per
> > machine and disabled globally via VIRTIO_PMEM.
> >
> > We cannot use the "addr" property as that is already used e.g. for
> > virtio-pci/pci devices. And we will have e.g. virtio-pmem-pci as a proxy.
> > So we have to choose a different one (unfortunately). "memaddr" it is.
> > That name should ideally be used by all other virtio-* based memory
> > devices in the future.
> > -device virtio-pmem-pci,id=p0,bus=bux0,addr=0x01,memaddr=0x100...
> >
> > Acked-by: Markus Armbruster 
> > [ QAPI bits ]
> > Signed-off-by: Pankaj Gupta '-numa node,memdev
> > [ MemoryDevice/MemoryRegion changes, cleanups, addr property "memaddr",
> >   split up patches, unplug handler ]
> > Signed-off-by: David Hildenbrand 
> > Message-Id: <20190619094907.10131-2-pagu...@redhat.com>
> > Reviewed-by: Michael S. Tsirkin 
> > Signed-off-by: Michael S. Tsirkin 
> 
> > +static void virtio_pmem_fill_device_info(const VirtIOPMEM *pmem,
> > + VirtioPMEMDeviceInfo *vi)
> > +{
> > +vi->memaddr = pmem->start;
> > +vi->size = pmem->memdev ? memory_region_size(&pmem->memdev->mr) : 0;
> > +vi->memdev = object_get_canonical_path(OBJECT(pmem->memdev));
> 
> Hi; Coverity points out (CID 1403009) that when we're assigning
> vi->size we handle the "pmem->memdev is NULL" case; but we then
> pass it into object_get_canonical_path(), which unconditionally
> dereferences it and will crash if it is NULL. If this pointer
> can be NULL then we need to do something else here.

This will never be NULL. I will send the below patch to satisfy Coverity.

- vi->size = pmem->memdev ? memory_region_size(&pmem->memdev->mr) : 0;
+ vi->size = memory_region_size(&pmem->memdev->mr);

Thanks,
Pankaj

> 
> > +}
> 
> thanks
> -- PMM
> 
> 
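
The finding here (CID 1403009) is Coverity's classic check-then-dereference
inconsistency. A hypothetical miniature of the pattern, not QEMU code:

#include <stdio.h>
#include <string.h>

struct dev { const char *name; };

/* Guarding 'd' on one line implies it may be NULL; dereferencing it
 * unconditionally on the next line contradicts that, so one of the two
 * lines must be wrong. In the virtio-pmem case the guard was the
 * unnecessary one, hence the fix above. */
static size_t info_len(const struct dev *d)
{
    size_t n = d ? strlen(d->name) : 0;
    return n + strlen(d->name);     /* crashes if d really is NULL */
}

int main(void)
{
    struct dev d = { "pmem0" };
    printf("%zu\n", info_len(&d));
    return 0;
}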



Re: [Qemu-devel] [PULL 08/22] virtio-pci: Proxy for virtio-pmem

2019-07-11 Thread Pankaj Gupta


Hi Peter,

> > From: Pankaj Gupta 
> >
> > We need a proxy device for virtio-pmem, and this device has to be the
> > actual memory device so we can cleanly hotplug it.
> >
> > Forward memory device class functions either to the actual device or use
> > properties of the virtio-pmem device to implement these in the proxy.
> >
> > virtio-pmem will only be compiled for selected, supported architectures
> > (that can deal with virtio/pci devices being memory devices). An
> > architecture that is prepared for that can simply enable
> > CONFIG_VIRTIO_PMEM to make it work.
> >
> > As not all architectures support memory devices (and CONFIG_VIRTIO_PMEM
> > will be enabled per supported architecture), we have to move the PCI proxy
> > to a separate file.
> >
> > Signed-off-by: Pankaj Gupta 
> > [ split up patches, memory-device changes, move pci proxy]
> > Signed-off-by: David Hildenbrand 
> > Message-Id: <20190619094907.10131-5-pagu...@redhat.com>
> > Reviewed-by: Michael S. Tsirkin 
> > Signed-off-by: Michael S. Tsirkin 
> > ---
> 
> Hi; Coverity spotted a bug here (CID 1403010):
> 
> > +static uint64_t virtio_pmem_pci_get_plugged_size(const MemoryDeviceState
> > *md,
> > + Error **errp)
> > +{
> > +VirtIOPMEMPCI *pci_pmem = VIRTIO_PMEM_PCI(md);
> > +VirtIOPMEM *pmem = VIRTIO_PMEM(&pci_pmem->vdev);
> > +VirtIOPMEMClass *vpc = VIRTIO_PMEM_GET_CLASS(pmem);
> > +MemoryRegion *mr = vpc->get_memory_region(pmem, errp);
> > +
> > +/* the plugged size corresponds to the region size */
> > +return mr ? 0 : memory_region_size(mr);
> 
> This looks like maybe the arguments to ?: have been put
> the wrong way round? If mr is non-NULL we'll return 0
> and if it is NULL then we'll crash because memory_region_size()
> dereferences mr...

Yes. I will send a patch to fix this.

Thanks,
Pankaj

> 
> > +}
> 
> thanks
> -- PMM
> 
> 



[Qemu-devel] [PATCH v15 7/7] xfs: disable map_sync for async flush

2019-07-05 Thread Pankaj Gupta
Don't support 'MAP_SYNC' with non-DAX files and DAX files
with an asynchronous dax_device. Virtio pmem provides an
asynchronous host page cache flush mechanism. We don't
support 'MAP_SYNC' with virtio pmem and xfs.

Signed-off-by: Pankaj Gupta 
Reviewed-by: Darrick J. Wong 
---
 fs/xfs/xfs_file.c | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index a7ceae90110e..f17652cca5ff 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -1203,11 +1203,14 @@ xfs_file_mmap(
struct file *filp,
struct vm_area_struct *vma)
 {
+   struct dax_device   *dax_dev;
+
+   dax_dev = xfs_find_daxdev_for_inode(file_inode(filp));
/*
-* We don't support synchronous mappings for non-DAX files. At least
-* until someone comes with a sensible use case.
+* We don't support synchronous mappings for non-DAX files and
+* for DAX files if underneath dax_device is not synchronous.
 */
-   if (!IS_DAX(file_inode(filp)) && (vma->vm_flags & VM_SYNC))
+   if (!daxdev_mapping_supported(vma, dax_dev))
return -EOPNOTSUPP;
 
file_accessed(filp);
-- 
2.20.1




[Qemu-devel] [PATCH v15 5/7] dax: check synchronous mapping is supported

2019-07-05 Thread Pankaj Gupta
This patch introduces the 'daxdev_mapping_supported' helper,
which checks if 'MAP_SYNC' is supported with the filesystem
mapping. It also checks if the corresponding dax_device is
synchronous. The virtio pmem device is asynchronous and
does not support VM_SYNC.

Suggested-by: Jan Kara 
Signed-off-by: Pankaj Gupta 
Reviewed-by: Jan Kara 
---
 include/linux/dax.h | 17 +
 1 file changed, 17 insertions(+)

diff --git a/include/linux/dax.h b/include/linux/dax.h
index 86fc55c99b58..d1bea3979b5a 100644
--- a/include/linux/dax.h
+++ b/include/linux/dax.h
@@ -53,6 +53,18 @@ static inline void set_dax_synchronous(struct dax_device 
*dax_dev)
 {
__set_dax_synchronous(dax_dev);
 }
+/*
+ * Check if given mapping is supported by the file / underlying device.
+ */
+static inline bool daxdev_mapping_supported(struct vm_area_struct *vma,
+struct dax_device *dax_dev)
+{
+   if (!(vma->vm_flags & VM_SYNC))
+   return true;
+   if (!IS_DAX(file_inode(vma->vm_file)))
+   return false;
+   return dax_synchronous(dax_dev);
+}
 #else
 static inline struct dax_device *dax_get_by_host(const char *host)
 {
@@ -87,6 +99,11 @@ static inline bool dax_synchronous(struct dax_device 
*dax_dev)
 static inline void set_dax_synchronous(struct dax_device *dax_dev)
 {
 }
+static inline bool daxdev_mapping_supported(struct vm_area_struct *vma,
+   struct dax_device *dax_dev)
+{
+   return !(vma->vm_flags & VM_SYNC);
+}
 #endif
 
 struct writeback_control;
-- 
2.20.1
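
For context, what this check gates on the userspace side: a minimal sketch
(not part of the series) of an application requesting a synchronous DAX
mapping. The file path is an assumption, MAP_SYNC needs a reasonably recent
glibc/kernel, and it is only valid together with MAP_SHARED_VALIDATE. On a
DAX filesystem backed by an asynchronous dax_device such as virtio pmem,
the mmap() below fails with EOPNOTSUPP once this series is applied:

#define _GNU_SOURCE
#include <stdio.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
    int fd = open("/mnt/dax/file", O_RDWR);    /* hypothetical DAX file */
    if (fd < 0) {
        perror("open");
        return 1;
    }

    void *p = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                   MAP_SHARED_VALIDATE | MAP_SYNC, fd, 0);
    if (p == MAP_FAILED)
        perror("mmap(MAP_SYNC)");    /* EOPNOTSUPP if not supported */
    else
        munmap(p, 4096);

    close(fd);
    return 0;
}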




[Qemu-devel] [PATCH v15 6/7] ext4: disable map_sync for async flush

2019-07-05 Thread Pankaj Gupta
Don't support 'MAP_SYNC' with non-DAX files and DAX files
with an asynchronous dax_device. Virtio pmem provides an
asynchronous host page cache flush mechanism. We don't
support 'MAP_SYNC' with virtio pmem and ext4.

Signed-off-by: Pankaj Gupta 
Reviewed-by: Jan Kara 
---
 fs/ext4/file.c | 10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/fs/ext4/file.c b/fs/ext4/file.c
index 98ec11f69cd4..dee549339e13 100644
--- a/fs/ext4/file.c
+++ b/fs/ext4/file.c
@@ -360,15 +360,17 @@ static const struct vm_operations_struct ext4_file_vm_ops 
= {
 static int ext4_file_mmap(struct file *file, struct vm_area_struct *vma)
 {
struct inode *inode = file->f_mapping->host;
+   struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
+   struct dax_device *dax_dev = sbi->s_daxdev;
 
-   if (unlikely(ext4_forced_shutdown(EXT4_SB(inode->i_sb
+   if (unlikely(ext4_forced_shutdown(sbi)))
return -EIO;
 
/*
-* We don't support synchronous mappings for non-DAX files. At least
-* until someone comes with a sensible use case.
+* We don't support synchronous mappings for non-DAX files and
+* for DAX files if underneath dax_device is not synchronous.
 */
-   if (!IS_DAX(file_inode(file)) && (vma->vm_flags & VM_SYNC))
+   if (!daxdev_mapping_supported(vma, dax_dev))
return -EOPNOTSUPP;
 
file_accessed(file);
-- 
2.20.1




[Qemu-devel] [PATCH v15 4/7] dm: enable synchronous dax

2019-07-05 Thread Pankaj Gupta
This patch sets the dax device 'DAXDEV_SYNC' flag if all the target
devices of the device mapper support synchronous DAX. If the device
mapper consists of both synchronous and asynchronous dax devices,
we don't set the 'DAXDEV_SYNC' flag.

'dm_table_supports_dax' is refactored to pass 'iterate_devices_fn'
as argument so that the callers can pass the appropriate functions.

Suggested-by: Mike Snitzer 
Signed-off-by: Pankaj Gupta 
Reviewed-by: Mike Snitzer 
---
 drivers/md/dm-table.c | 24 ++--
 drivers/md/dm.c   |  2 +-
 drivers/md/dm.h   |  5 -
 3 files changed, 23 insertions(+), 8 deletions(-)

diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
index 350cf0451456..81c55304c4fa 100644
--- a/drivers/md/dm-table.c
+++ b/drivers/md/dm-table.c
@@ -881,7 +881,7 @@ void dm_table_set_type(struct dm_table *t, enum 
dm_queue_mode type)
 EXPORT_SYMBOL_GPL(dm_table_set_type);
 
 /* validate the dax capability of the target device span */
-static int device_supports_dax(struct dm_target *ti, struct dm_dev *dev,
+int device_supports_dax(struct dm_target *ti, struct dm_dev *dev,
   sector_t start, sector_t len, void *data)
 {
int blocksize = *(int *) data;
@@ -890,7 +890,15 @@ static int device_supports_dax(struct dm_target *ti, 
struct dm_dev *dev,
start, len);
 }
 
-bool dm_table_supports_dax(struct dm_table *t, int blocksize)
+/* Check devices support synchronous DAX */
+static int device_synchronous(struct dm_target *ti, struct dm_dev *dev,
+  sector_t start, sector_t len, void *data)
+{
+   return dax_synchronous(dev->dax_dev);
+}
+
+bool dm_table_supports_dax(struct dm_table *t,
+ iterate_devices_callout_fn iterate_fn, int *blocksize)
 {
struct dm_target *ti;
unsigned i;
@@ -903,8 +911,7 @@ bool dm_table_supports_dax(struct dm_table *t, int 
blocksize)
return false;
 
if (!ti->type->iterate_devices ||
-   !ti->type->iterate_devices(ti, device_supports_dax,
-   &blocksize))
+   !ti->type->iterate_devices(ti, iterate_fn, blocksize))
return false;
}
 
@@ -940,6 +947,7 @@ static int dm_table_determine_type(struct dm_table *t)
struct dm_target *tgt;
struct list_head *devices = dm_table_get_devices(t);
enum dm_queue_mode live_md_type = dm_get_md_type(t->md);
+   int page_size = PAGE_SIZE;
 
if (t->type != DM_TYPE_NONE) {
/* target already set the table's type */
@@ -984,7 +992,7 @@ static int dm_table_determine_type(struct dm_table *t)
 verify_bio_based:
/* We must use this table as bio-based */
t->type = DM_TYPE_BIO_BASED;
-   if (dm_table_supports_dax(t, PAGE_SIZE) ||
+   if (dm_table_supports_dax(t, device_supports_dax, &page_size) ||
(list_empty(devices) && live_md_type == 
DM_TYPE_DAX_BIO_BASED)) {
t->type = DM_TYPE_DAX_BIO_BASED;
} else {
@@ -1883,6 +1891,7 @@ void dm_table_set_restrictions(struct dm_table *t, struct 
request_queue *q,
   struct queue_limits *limits)
 {
bool wc = false, fua = false;
+   int page_size = PAGE_SIZE;
 
/*
 * Copy table's limits to the DM device's request_queue
@@ -1910,8 +1919,11 @@ void dm_table_set_restrictions(struct dm_table *t, 
struct request_queue *q,
}
blk_queue_write_cache(q, wc, fua);
 
-   if (dm_table_supports_dax(t, PAGE_SIZE))
+   if (dm_table_supports_dax(t, device_supports_dax, &page_size)) {
blk_queue_flag_set(QUEUE_FLAG_DAX, q);
+   if (dm_table_supports_dax(t, device_synchronous, NULL))
+   set_dax_synchronous(t->md->dax_dev);
+   }
else
blk_queue_flag_clear(QUEUE_FLAG_DAX, q);
 
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index b1caa7188209..b92c42a72ad4 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -1119,7 +1119,7 @@ static bool dm_dax_supported(struct dax_device *dax_dev, 
struct block_device *bd
if (!map)
return false;
 
-   ret = dm_table_supports_dax(map, blocksize);
+   ret = dm_table_supports_dax(map, device_supports_dax, &blocksize);
 
dm_put_live_table(md, srcu_idx);
 
diff --git a/drivers/md/dm.h b/drivers/md/dm.h
index 17e3db54404c..0475673337f3 100644
--- a/drivers/md/dm.h
+++ b/drivers/md/dm.h
@@ -72,7 +72,10 @@ bool dm_table_bio_based(struct dm_table *t);
 bool dm_table_request_based(struct dm_table *t);
 void dm_table_free_md_mempools(struct dm_table *t);
 struct dm_md_mempools *dm_table_get_md_mempools(struct dm_table *t);
-bool dm_table_sup
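
The shape of this refactor, spelled out: once the per-device predicate is a
function pointer, the same table walk answers both "does every device
support DAX?" and "is every device synchronous?". An illustrative sketch in
plain C with assumed toy types, not kernel code:

#include <stdbool.h>
#include <stdio.h>

struct dev { bool dax; bool sync; };

typedef bool (*dev_pred)(const struct dev *d, void *data);

static bool dev_is_dax(const struct dev *d, void *data)  { (void)data; return d->dax; }
static bool dev_is_sync(const struct dev *d, void *data) { (void)data; return d->sync; }

/* Mirrors the refactored dm_table_supports_dax(): true only if the
 * predicate holds for every device in the table. */
static bool table_all(const struct dev *devs, int n, dev_pred fn, void *data)
{
    for (int i = 0; i < n; i++)
        if (!fn(&devs[i], data))
            return false;
    return true;
}

int main(void)
{
    /* One synchronous and one asynchronous (e.g. virtio pmem) device:
     * DAX is supported, but DAXDEV_SYNC must not be set. */
    struct dev devs[] = { { true, true }, { true, false } };

    printf("all dax:  %d\n", table_all(devs, 2, dev_is_dax, NULL));
    printf("all sync: %d\n", table_all(devs, 2, dev_is_sync, NULL));
    return 0;
}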

[Qemu-devel] [PATCH v15 3/7] libnvdimm: add dax_dev sync flag

2019-07-05 Thread Pankaj Gupta
This patch adds the 'DAXDEV_SYNC' flag, which is set for an
nd_region doing synchronous flush. This is later used to
disable MAP_SYNC functionality in the ext4 & xfs filesystems
for devices that don't support synchronous flush.

Signed-off-by: Pankaj Gupta 
---
 drivers/dax/bus.c|  2 +-
 drivers/dax/super.c  | 19 ++-
 drivers/md/dm.c  |  3 ++-
 drivers/nvdimm/pmem.c|  5 -
 drivers/nvdimm/region_devs.c |  7 +++
 drivers/s390/block/dcssblk.c |  2 +-
 include/linux/dax.h  | 24 ++--
 include/linux/libnvdimm.h|  1 +
 8 files changed, 56 insertions(+), 7 deletions(-)

diff --git a/drivers/dax/bus.c b/drivers/dax/bus.c
index 2109cfe80219..5f184e751c82 100644
--- a/drivers/dax/bus.c
+++ b/drivers/dax/bus.c
@@ -388,7 +388,7 @@ struct dev_dax *__devm_create_dev_dax(struct dax_region 
*dax_region, int id,
 * No 'host' or dax_operations since there is no access to this
 * device outside of mmap of the resulting character device.
 */
-   dax_dev = alloc_dax(dev_dax, NULL, NULL);
+   dax_dev = alloc_dax(dev_dax, NULL, NULL, DAXDEV_F_SYNC);
if (!dax_dev)
goto err;
 
diff --git a/drivers/dax/super.c b/drivers/dax/super.c
index 4e5ae7e8b557..8ab12068eea3 100644
--- a/drivers/dax/super.c
+++ b/drivers/dax/super.c
@@ -195,6 +195,8 @@ enum dax_device_flags {
DAXDEV_ALIVE,
/* gate whether dax_flush() calls the low level flush routine */
DAXDEV_WRITE_CACHE,
+   /* flag to check if device supports synchronous flush */
+   DAXDEV_SYNC,
 };
 
 /**
@@ -372,6 +374,18 @@ bool dax_write_cache_enabled(struct dax_device *dax_dev)
 }
 EXPORT_SYMBOL_GPL(dax_write_cache_enabled);
 
+bool __dax_synchronous(struct dax_device *dax_dev)
+{
+   return test_bit(DAXDEV_SYNC, &dax_dev->flags);
+}
+EXPORT_SYMBOL_GPL(__dax_synchronous);
+
+void __set_dax_synchronous(struct dax_device *dax_dev)
+{
+   set_bit(DAXDEV_SYNC, &dax_dev->flags);
+}
+EXPORT_SYMBOL_GPL(__set_dax_synchronous);
+
 bool dax_alive(struct dax_device *dax_dev)
 {
lockdep_assert_held(&dax_srcu);
@@ -526,7 +540,7 @@ static void dax_add_host(struct dax_device *dax_dev, const 
char *host)
 }
 
 struct dax_device *alloc_dax(void *private, const char *__host,
-   const struct dax_operations *ops)
+   const struct dax_operations *ops, unsigned long flags)
 {
struct dax_device *dax_dev;
const char *host;
@@ -549,6 +563,9 @@ struct dax_device *alloc_dax(void *private, const char 
*__host,
dax_add_host(dax_dev, host);
dax_dev->ops = ops;
dax_dev->private = private;
+   if (flags & DAXDEV_F_SYNC)
+   set_dax_synchronous(dax_dev);
+
return dax_dev;
 
  err_dev:
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 5475081dcbd6..b1caa7188209 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -1991,7 +1991,8 @@ static struct mapped_device *alloc_dev(int minor)
sprintf(md->disk->disk_name, "dm-%d", minor);
 
if (IS_ENABLED(CONFIG_DAX_DRIVER)) {
-   md->dax_dev = alloc_dax(md, md->disk->disk_name, &dm_dax_ops);
+   md->dax_dev = alloc_dax(md, md->disk->disk_name,
+   &dm_dax_ops, 0);
if (!md->dax_dev)
goto bad;
}
diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index 223da63d1bd7..8be868e2a18b 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -376,6 +376,7 @@ static int pmem_attach_disk(struct device *dev,
struct gendisk *disk;
void *addr;
int rc;
+   unsigned long flags = 0UL;
 
pmem = devm_kzalloc(dev, sizeof(*pmem), GFP_KERNEL);
if (!pmem)
@@ -474,7 +475,9 @@ static int pmem_attach_disk(struct device *dev,
nvdimm_badblocks_populate(nd_region, &pmem->bb, &bb_res);
disk->bb = &pmem->bb;
 
-   dax_dev = alloc_dax(pmem, disk->disk_name, &pmem_dax_ops);
+   if (is_nvdimm_sync(nd_region))
+   flags = DAXDEV_F_SYNC;
+   dax_dev = alloc_dax(pmem, disk->disk_name, &pmem_dax_ops, flags);
if (!dax_dev) {
put_disk(disk);
return -ENOMEM;
diff --git a/drivers/nvdimm/region_devs.c b/drivers/nvdimm/region_devs.c
index eca2e62af134..56f2227f192a 100644
--- a/drivers/nvdimm/region_devs.c
+++ b/drivers/nvdimm/region_devs.c
@@ -1211,6 +1211,13 @@ int nvdimm_has_cache(struct nd_region *nd_region)
 }
 EXPORT_SYMBOL_GPL(nvdimm_has_cache);
 
+bool is_nvdimm_sync(struct nd_region *nd_region)
+{
+   return is_nd_pmem(&nd_region->dev) &&
+   !test_bit(ND_REGION_ASYNC, &nd_region->flags);
+}
+EXPORT_SYMBOL_GPL(is_nvdimm_sync);
+
 struct conflict_context {
struct nd_region *nd_region;
resourc

[Qemu-devel] [PATCH v15 2/7] virtio-pmem: Add virtio pmem driver

2019-07-05 Thread Pankaj Gupta
This patch adds virtio-pmem driver for KVM guest.

Guest reads the persistent memory range information from
Qemu over VIRTIO and registers it on nvdimm_bus. It also
creates a nd_region object with the persistent memory
range information so that existing 'nvdimm/pmem' driver
can reserve this into system memory map. This way
'virtio-pmem' driver uses existing functionality of pmem
driver to register persistent memory compatible for DAX
capable filesystems.

This also provides function to perform guest flush over
VIRTIO from 'pmem' driver when userspace performs flush
on DAX memory range.

Signed-off-by: Pankaj Gupta 
Reviewed-by: Yuval Shaia 
Acked-by: Michael S. Tsirkin 
Acked-by: Jakub Staron 
Tested-by: Jakub Staron 
Reviewed-by: Cornelia Huck 
---
 drivers/nvdimm/Makefile  |   1 +
 drivers/nvdimm/nd_virtio.c   | 125 +++
 drivers/nvdimm/virtio_pmem.c | 122 ++
 drivers/nvdimm/virtio_pmem.h |  55 ++
 drivers/virtio/Kconfig   |  11 +++
 include/uapi/linux/virtio_ids.h  |   1 +
 include/uapi/linux/virtio_pmem.h |  34 +
 7 files changed, 349 insertions(+)
 create mode 100644 drivers/nvdimm/nd_virtio.c
 create mode 100644 drivers/nvdimm/virtio_pmem.c
 create mode 100644 drivers/nvdimm/virtio_pmem.h
 create mode 100644 include/uapi/linux/virtio_pmem.h

diff --git a/drivers/nvdimm/Makefile b/drivers/nvdimm/Makefile
index 6f2a088afad6..cefe233e0b52 100644
--- a/drivers/nvdimm/Makefile
+++ b/drivers/nvdimm/Makefile
@@ -5,6 +5,7 @@ obj-$(CONFIG_ND_BTT) += nd_btt.o
 obj-$(CONFIG_ND_BLK) += nd_blk.o
 obj-$(CONFIG_X86_PMEM_LEGACY) += nd_e820.o
 obj-$(CONFIG_OF_PMEM) += of_pmem.o
+obj-$(CONFIG_VIRTIO_PMEM) += virtio_pmem.o nd_virtio.o
 
 nd_pmem-y := pmem.o
 
diff --git a/drivers/nvdimm/nd_virtio.c b/drivers/nvdimm/nd_virtio.c
new file mode 100644
index ..8645275c08c2
--- /dev/null
+++ b/drivers/nvdimm/nd_virtio.c
@@ -0,0 +1,125 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * virtio_pmem.c: Virtio pmem Driver
+ *
+ * Discovers persistent memory range information
+ * from host and provides a virtio based flushing
+ * interface.
+ */
+#include "virtio_pmem.h"
+#include "nd.h"
+
+ /* The interrupt handler */
+void virtio_pmem_host_ack(struct virtqueue *vq)
+{
+   struct virtio_pmem *vpmem = vq->vdev->priv;
+   struct virtio_pmem_request *req_data, *req_buf;
+   unsigned long flags;
+   unsigned int len;
+
+   spin_lock_irqsave(&vpmem->pmem_lock, flags);
+   while ((req_data = virtqueue_get_buf(vq, &len)) != NULL) {
+   req_data->done = true;
+   wake_up(&req_data->host_acked);
+
+   if (!list_empty(&vpmem->req_list)) {
+   req_buf = list_first_entry(&vpmem->req_list,
+   struct virtio_pmem_request, list);
+   req_buf->wq_buf_avail = true;
+   wake_up(&req_buf->wq_buf);
+   list_del(&req_buf->list);
+   }
+   }
+   spin_unlock_irqrestore(&vpmem->pmem_lock, flags);
+}
+EXPORT_SYMBOL_GPL(virtio_pmem_host_ack);
+
+ /* The request submission function */
+static int virtio_pmem_flush(struct nd_region *nd_region)
+{
+   struct virtio_device *vdev = nd_region->provider_data;
+   struct virtio_pmem *vpmem  = vdev->priv;
+   struct virtio_pmem_request *req_data;
+   struct scatterlist *sgs[2], sg, ret;
+   unsigned long flags;
+   int err, err1;
+
+   might_sleep();
+   req_data = kmalloc(sizeof(*req_data), GFP_KERNEL);
+   if (!req_data)
+   return -ENOMEM;
+
+   req_data->done = false;
+   init_waitqueue_head(&req_data->host_acked);
+   init_waitqueue_head(&req_data->wq_buf);
+   INIT_LIST_HEAD(&req_data->list);
+   req_data->req.type = cpu_to_virtio32(vdev, VIRTIO_PMEM_REQ_TYPE_FLUSH);
+   sg_init_one(&sg, &req_data->req, sizeof(req_data->req));
+   sgs[0] = &sg;
+   sg_init_one(&ret, &req_data->resp.ret, sizeof(req_data->resp));
+   sgs[1] = &ret;
+
+   spin_lock_irqsave(&vpmem->pmem_lock, flags);
+/*
+ * If virtqueue_add_sgs returns -ENOSPC then req_vq virtual
+ * queue does not have free descriptor. We add the request
+ * to req_list and wait for host_ack to wake us up when free
+ * slots are available.
+ */
+   while ((err = virtqueue_add_sgs(vpmem->req_vq, sgs, 1, 1, req_data,
+   GFP_ATOMIC)) == -ENOSPC) {
+
+   dev_info(&vdev->dev, "failed to send command to virtio pmem 
device, no free slots in the virtqueue\n");
+   req_data->wq_buf_avail = false;
+   list_add_tail(&req_data->
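
The submit/ack handshake in virtio_pmem_flush() and virtio_pmem_host_ack()
above is a sleep-until-completion pattern; a userspace pthreads analogy
(hypothetical code, not from the driver) may make the control flow easier
to follow:

#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

struct request {
    pthread_mutex_t lock;
    pthread_cond_t  host_acked;
    bool            done;
};

/* Plays the role of virtio_pmem_host_ack(): the "host" finishes the
 * flush and wakes the waiting submitter. */
static void *host_ack(void *arg)
{
    struct request *req = arg;

    sleep(1);                       /* pretend the host fsync takes time */
    pthread_mutex_lock(&req->lock);
    req->done = true;
    pthread_cond_signal(&req->host_acked);
    pthread_mutex_unlock(&req->lock);
    return NULL;
}

int main(void)
{
    struct request req = {
        PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, false
    };
    pthread_t host;

    pthread_create(&host, NULL, host_ack, &req);

    /* Like virtio_pmem_flush(): submit, then sleep until the ack,
     * the way the driver waits on req_data->host_acked / done. */
    pthread_mutex_lock(&req.lock);
    while (!req.done)
        pthread_cond_wait(&req.host_acked, &req.lock);
    pthread_mutex_unlock(&req.lock);

    printf("flush acknowledged by host\n");
    pthread_join(host, NULL);
    return 0;
}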

[Qemu-devel] [PATCH v15 0/7] virtio pmem driver

2019-07-05 Thread Pankaj Gupta
 - Rebased to 5.2-rc1

Changes from PATCH v8:
 - Set device mapper synchronous if all target devices support it - Dan
 - Move virtio_pmem.h to nvdimm directory  - Dan
 - Style, indentation & better error messages in patch 2 - DavidH
 - Added MST's ack in patch 2.

Changes from PATCH v7:
 - Corrected pending request queue logic (patch 2) - Jakub Staroń
 - Used unsigned long flags for passing DAXDEV_F_SYNC (patch 3) - Dan
 - Fixed typo =>  vma 'flag' to 'vm_flag' (patch 4)
 - Added rob in patch 6 & patch 2

Changes from PATCH v6: 
 - Corrected comment format in patch 5 & patch 6. [Dave]
 - Changed variable declaration indentation in patch 6 [Darrick]
 - Add Reviewed-by tag by 'Jan Kara' in patch 4 & patch 5

Changes from PATCH v5: 
  Changes suggested in by - [Cornelia, Yuval]
- Remove assignment chaining in virtio driver
- Better error message and remove not required free
- Check nd_region before use

  Changes suggested by - [Jan Kara]
- dax_synchronous() for !CONFIG_DAX
- Correct 'daxdev_mapping_supported' comment and non-dax implementation

  Changes suggested by - [Dan Williams]
- Pass meaningful flag 'DAXDEV_F_SYNC' to alloc_dax
- Gate nvdimm_flush instead of additional async parameter
- Move block chaining logic to flush callback than common nvdimm_flush
- Use NULL flush callback for generic flush for better readability [Dan, Jan]

- Use virtio device id 27 instead of 25 (already used) - [MST]

Changes from PATCH v4:
- Factor out MAP_SYNC supported functionality to a common helper
[Dave, Darrick, Jan]
- Comment, indentation and virtqueue_kick failure handle - Yuval Shaia

Changes from PATCH v3: 
- Use generic dax_synchronous() helper to check for DAXDEV_SYNC 
  flag - [Dan, Darrick, Jan]
- Add 'is_nvdimm_async' function
- Document page cache side channel attacks implications & 
  countermeasures - [Dave Chinner, Michael]

Changes from PATCH v2: 
- Disable MAP_SYNC for ext4 & XFS filesystems - [Dan] 
- Use name 'virtio pmem' in place of 'fake dax' 

Changes from PATCH v1: 
- 0-day build test for build dependency on libnvdimm 

 Changes suggested by - [Dan Williams]
- Split the driver into two parts virtio & pmem  
- Move queuing of async block request to block layer
- Add "sync" parameter in nvdimm_flush function
- Use indirect call for nvdimm_flush
- Don’t move declarations to common global header e.g nd.h
- nvdimm_flush() return 0 or -EIO if it fails
- Teach nsio_rw_bytes() that the flush can fail
- Rename nvdimm_flush() to generic_nvdimm_flush()
- Use 'nd_region->provider_data' for long dereferencing
- Remove virtio_pmem_freeze/restore functions
- Remove BSD license text with SPDX license text

- Add might_sleep() in virtio_pmem_flush - [Luiz]
- Make spin_lock_irqsave() narrow

Pankaj Gupta (7):
   libnvdimm: nd_region flush callback support
   virtio-pmem: Add virtio-pmem guest driver
   libnvdimm: add nd_region buffered dax_dev flag
   dax: check synchronous mapping is supported
   dm: enable synchronous dax
   ext4: disable map_sync for virtio pmem
   xfs: disable map_sync for virtio pmem

[1] https://lkml.org/lkml/2019/6/21/452
[2] https://lkml.org/lkml/2019/6/12/624
[3] https://www.spinics.net/lists/kvm/msg149761.html
[4] https://www.spinics.net/lists/kvm/msg153095.html  
[5] https://marc.info/?l=qemu-devel&m=155860751202202&w=2
[6] https://lists.oasis-open.org/archives/virtio-dev/201903/msg00083.html
[7] https://lkml.org/lkml/2019/1/9/1191

 drivers/acpi/nfit/core.c |4 -
 drivers/dax/bus.c|2 
 drivers/dax/super.c  |   19 +
 drivers/md/dm-table.c|   24 +--
 drivers/md/dm.c  |5 -
 drivers/md/dm.h  |5 +
 drivers/nvdimm/Makefile  |1 
 drivers/nvdimm/claim.c   |6 +
 drivers/nvdimm/nd.h  |1 
 drivers/nvdimm/nd_virtio.c   |  125 +++
 drivers/nvdimm/pmem.c|   18 +++--
 drivers/nvdimm/region_devs.c |   33 +-
 drivers/nvdimm/virtio_pmem.c |  122 ++
 drivers/nvdimm/virtio_pmem.h |   55 +
 drivers/s390/block/dcssblk.c |2 
 drivers/virtio/Kconfig   |   11 +++
 fs/ext4/file.c   |   10 +--
 fs/xfs/xfs_file.c|9 +-
 include/linux/dax.h  |   41 
 include/linux/libnvdimm.h|   10 ++-
 include/uapi/linux/virtio_ids.h  |1 
 include/uapi/linux/virtio_pmem.h |   34 ++
 22 files changed, 504 insertions(+), 34 deletions(-)






[Qemu-devel] [PATCH v15 1/7] libnvdimm: nd_region flush callback support

2019-07-05 Thread Pankaj Gupta
This patch adds functionality to perform a flush from guest
to host over VIRTIO. We register a callback based on the
'nd_region' type. The virtio_pmem driver requires this special
flush function. For the rest of the region types we register
the existing flush function. The error returned by a host
fsync failure is reported to userspace.

Signed-off-by: Pankaj Gupta 
---
 drivers/acpi/nfit/core.c |  4 ++--
 drivers/nvdimm/claim.c   |  6 --
 drivers/nvdimm/nd.h  |  1 +
 drivers/nvdimm/pmem.c| 13 -
 drivers/nvdimm/region_devs.c | 26 --
 include/linux/libnvdimm.h|  9 -
 6 files changed, 47 insertions(+), 12 deletions(-)

diff --git a/drivers/acpi/nfit/core.c b/drivers/acpi/nfit/core.c
index f1ed0befe303..9ddd8667153e 100644
--- a/drivers/acpi/nfit/core.c
+++ b/drivers/acpi/nfit/core.c
@@ -2434,7 +2434,7 @@ static void write_blk_ctl(struct nfit_blk *nfit_blk, 
unsigned int bw,
offset = to_interleave_offset(offset, mmio);
 
writeq(cmd, mmio->addr.base + offset);
-   nvdimm_flush(nfit_blk->nd_region);
+   nvdimm_flush(nfit_blk->nd_region, NULL);
 
if (nfit_blk->dimm_flags & NFIT_BLK_DCR_LATCH)
readq(mmio->addr.base + offset);
@@ -2483,7 +2483,7 @@ static int acpi_nfit_blk_single_io(struct nfit_blk 
*nfit_blk,
}
 
if (rw)
-   nvdimm_flush(nfit_blk->nd_region);
+   nvdimm_flush(nfit_blk->nd_region, NULL);
 
rc = read_blk_stat(nfit_blk, lane) ? -EIO : 0;
return rc;
diff --git a/drivers/nvdimm/claim.c b/drivers/nvdimm/claim.c
index fb667bf469c7..13510bae1e6f 100644
--- a/drivers/nvdimm/claim.c
+++ b/drivers/nvdimm/claim.c
@@ -263,7 +263,7 @@ static int nsio_rw_bytes(struct nd_namespace_common *ndns,
struct nd_namespace_io *nsio = to_nd_namespace_io(&ndns->dev);
unsigned int sz_align = ALIGN(size + (offset & (512 - 1)), 512);
sector_t sector = offset >> 9;
-   int rc = 0;
+   int rc = 0, ret = 0;
 
if (unlikely(!size))
return 0;
@@ -301,7 +301,9 @@ static int nsio_rw_bytes(struct nd_namespace_common *ndns,
}
 
memcpy_flushcache(nsio->addr + offset, buf, size);
-   nvdimm_flush(to_nd_region(ndns->dev.parent));
+   ret = nvdimm_flush(to_nd_region(ndns->dev.parent), NULL);
+   if (ret)
+   rc = ret;
 
return rc;
 }
diff --git a/drivers/nvdimm/nd.h b/drivers/nvdimm/nd.h
index a5ac3b240293..0c74d2428bd7 100644
--- a/drivers/nvdimm/nd.h
+++ b/drivers/nvdimm/nd.h
@@ -159,6 +159,7 @@ struct nd_region {
struct badblocks bb;
struct nd_interleave_set *nd_set;
struct nd_percpu_lane __percpu *lane;
+   int (*flush)(struct nd_region *nd_region, struct bio *bio);
struct nd_mapping mapping[0];
 };
 
diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index 0279eb1da3ef..c757a47183b8 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -192,6 +192,7 @@ static blk_status_t pmem_do_bvec(struct pmem_device *pmem, 
struct page *page,
 
 static blk_qc_t pmem_make_request(struct request_queue *q, struct bio *bio)
 {
+   int ret = 0;
blk_status_t rc = 0;
bool do_acct;
unsigned long start;
@@ -201,7 +202,7 @@ static blk_qc_t pmem_make_request(struct request_queue *q, 
struct bio *bio)
struct nd_region *nd_region = to_region(pmem);
 
if (bio->bi_opf & REQ_PREFLUSH)
-   nvdimm_flush(nd_region);
+   ret = nvdimm_flush(nd_region, bio);
 
do_acct = nd_iostat_start(bio, &start);
bio_for_each_segment(bvec, bio, iter) {
@@ -216,7 +217,10 @@ static blk_qc_t pmem_make_request(struct request_queue *q, 
struct bio *bio)
nd_iostat_end(bio, start);
 
if (bio->bi_opf & REQ_FUA)
-   nvdimm_flush(nd_region);
+   ret = nvdimm_flush(nd_region, bio);
+
+   if (ret)
+   bio->bi_status = errno_to_blk_status(ret);
 
bio_endio(bio);
return BLK_QC_T_NONE;
@@ -469,7 +473,6 @@ static int pmem_attach_disk(struct device *dev,
}
dax_write_cache(dax_dev, nvdimm_has_cache(nd_region));
pmem->dax_dev = dax_dev;
-
gendev = disk_to_dev(disk);
gendev->groups = pmem_attribute_groups;
 
@@ -527,14 +530,14 @@ static int nd_pmem_remove(struct device *dev)
sysfs_put(pmem->bb_state);
pmem->bb_state = NULL;
}
-   nvdimm_flush(to_nd_region(dev->parent));
+   nvdimm_flush(to_nd_region(dev->parent), NULL);
 
return 0;
 }
 
 static void nd_pmem_shutdown(struct device *dev)
 {
-   nvdimm_flush(to_nd_region(dev->parent));
+   nvdimm_flush(to_nd_region(dev->parent), NULL);
 }
 
 static void nd_pmem_notify(struct device *dev, enum nvdimm_event event)
diff --git a/drivers/nvdimm/region_devs.c b/dri

Re: [Qemu-devel] [PULL 00/22] virtio, pc, pci: features, fixes, cleanups

2019-07-05 Thread Pankaj Gupta



> > > > The following changes since commit
> > > > 7fec76a02267598a4e437ddfdaeaeb6de09b92f3:
> > > >
> > > >   Merge remote-tracking branch
> > > >   'remotes/maxreitz/tags/pull-block-2019-06-24' into staging
> > > >   (2019-07-01
> > > >   11:28:28 +0100)
> > > >
> > > > are available in the Git repository at:
> > > >
> > > >   git://git.kernel.org/pub/scm/virt/kvm/mst/qemu.git tags/for_upstream
> > > >
> > > > for you to fetch changes up to
> > > > a360cd11de5ae59db55e128fd209290c777eb177:
> > > >
> > > >   docs: avoid vhost-user-net specifics in multiqueue section
> > > >   (2019-07-01
> > > >   10:39:35 -0400)
> > > >
> > > > 
> > > > virtio, pc, pci: features, fixes, cleanups
> > > >
> > > > virtio-pmem support.
> > > > libvhost user mq support.
> > > > A bunch of fixes all over the place.
> > > >
> > > > Signed-off-by: Michael S. Tsirkin 
> > > >
> > > 
> > > This fails to build on all the non-Linux platforms:
> > > 
> > > In file included from
> > > /Users/pm215/src/qemu-for-merges/hw/virtio/virtio-pmem.c:21:
> > > /Users/pm215/src/qemu-for-merges/include/standard-headers/linux/virtio_pmem.h:13:10:
> > > fatal error: 'linux/types.h' file not found
> > > #include 
> > >  ^~~
> > 
> > Sorry for this.
> > Can we please apply the below patch on top? I only tested this on Linux,
> > but I think this will solve the issue. Let me know if you want me to
> > resend the entire series.
> > 
> > Thank you,
> > Pankaj
> > 
> > ===
> > 
> > From: Pankaj Gupta 
> > Date: Thu, 4 Jul 2019 16:27:08 +0530
> > Subject: [PATCH] Sync header and fix non linux build issue
> > 
> > Signed-off-by: Pankaj Gupta 
> > ---
> >  include/standard-headers/linux/virtio_pmem.h | 11 +--
> >  1 file changed, 5 insertions(+), 6 deletions(-)
> > 
> > diff --git a/include/standard-headers/linux/virtio_pmem.h
> > b/include/standard-headers/linux/virtio_pmem.h
> > index 7a3e2fe524..a60236f63d 100644
> > --- a/include/standard-headers/linux/virtio_pmem.h
> > +++ b/include/standard-headers/linux/virtio_pmem.h
> > @@ -10,14 +10,13 @@
> >  #ifndef _UAPI_LINUX_VIRTIO_PMEM_H
> >  #define _UAPI_LINUX_VIRTIO_PMEM_H
> > 
> > -#include 
> > -#include 
> > -#include 
> > -#include 
> > +#include "standard-headers/linux/virtio_types.h"
> > +#include "standard-headers/linux/virtio_ids.h"
> > +#include "standard-headers/linux/virtio_config.h"
> > 
> >  struct virtio_pmem_config {
> > -   __le64 start;
> > -   __le64 size;
> > +   uint64_t start;
> > +   uint64_t size;
> >  };
> > 
> >  #define VIRTIO_PMEM_REQ_TYPE_FLUSH  0
> 
> You need to get rid of __virtio things too.
> I fixed up, hopefully well.

o.k. Thank you Michael

Best regards,
Pankaj

> If that's not enough then I will drop pmem for now.
> 
> --
> MST
> 
> 



Re: [Qemu-devel] [PULL 00/22] virtio, pc, pci: features, fixes, cleanups

2019-07-04 Thread Pankaj Gupta


> >
> > The following changes since commit
> > 7fec76a02267598a4e437ddfdaeaeb6de09b92f3:
> >
> >   Merge remote-tracking branch
> >   'remotes/maxreitz/tags/pull-block-2019-06-24' into staging (2019-07-01
> >   11:28:28 +0100)
> >
> > are available in the Git repository at:
> >
> >   git://git.kernel.org/pub/scm/virt/kvm/mst/qemu.git tags/for_upstream
> >
> > for you to fetch changes up to a360cd11de5ae59db55e128fd209290c777eb177:
> >
> >   docs: avoid vhost-user-net specifics in multiqueue section (2019-07-01
> >   10:39:35 -0400)
> >
> > 
> > virtio, pc, pci: features, fixes, cleanups
> >
> > virtio-pmem support.
> > libvhost user mq support.
> > A bunch of fixes all over the place.
> >
> > Signed-off-by: Michael S. Tsirkin 
> >
> 
> This fails to build on all the non-Linux platforms:
> 
> In file included from
> /Users/pm215/src/qemu-for-merges/hw/virtio/virtio-pmem.c:21:
> /Users/pm215/src/qemu-for-merges/include/standard-headers/linux/virtio_pmem.h:13:10:
> fatal error: 'linux/types.h' file not found
> #include 
>  ^~~

Sorry for this.
Can we please apply the below patch on top? I only tested this on Linux,
but I think this will solve the issue. Let me know if you want me to
resend the entire series.

Thank you,
Pankaj

===

From: Pankaj Gupta 
Date: Thu, 4 Jul 2019 16:27:08 +0530
Subject: [PATCH] Sync header and fix non linux build issue

Signed-off-by: Pankaj Gupta 
---
 include/standard-headers/linux/virtio_pmem.h | 11 +--
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/include/standard-headers/linux/virtio_pmem.h 
b/include/standard-headers/linux/virtio_pmem.h
index 7a3e2fe524..a60236f63d 100644
--- a/include/standard-headers/linux/virtio_pmem.h
+++ b/include/standard-headers/linux/virtio_pmem.h
@@ -10,14 +10,13 @@
 #ifndef _UAPI_LINUX_VIRTIO_PMEM_H
 #define _UAPI_LINUX_VIRTIO_PMEM_H

-#include 
-#include 
-#include 
-#include 
+#include "standard-headers/linux/virtio_types.h"
+#include "standard-headers/linux/virtio_ids.h"
+#include "standard-headers/linux/virtio_config.h"

 struct virtio_pmem_config {
-   __le64 start;
-   __le64 size;
+   uint64_t start;
+   uint64_t size;
 };

 #define VIRTIO_PMEM_REQ_TYPE_FLUSH  0




Re: [Qemu-devel] [PATCH v2 0/7] Qemu virtio pmem device

2019-07-02 Thread Pankaj Gupta



> >> 
> >> Ok, if it works, we could list those regions? and change pmem0 mode to dax
> >> mode, right?
> >
> >You mean fs dax?
> >I don't think currently we support this because raw image wont have any
> >metadata.
> >Will have to think if this is at all possible or how we can achieve such
> >behavior.
> >
> 
> Ok, I got it.
> 
> >Also, there is requirement to support host backing file on real NVDIMM and
> >virtio. Once we have have first version of virtio pmem series merged
> >upstream
> >we will continue to work on advance features depending upon feasibility.
> >
> 
> One curiosity: what makes the NVDIMM backend not work now?
> 
> The /dev/dax0.0 is a char file. The nvdimm device uses mmap to map HVA to
> HPA. It looks like a normal file to me. I would appreciate it if you could
> shed some light on it.
> 

It will require support for MAP_SYNC on the host side without an explicit
flush/fsync. We need to add this use case in both the guest and host
drivers/device. Also, thorough testing is required to fix any issues.

I will make the required changes as the next step.

Thanks,
Pankaj



Re: [Qemu-devel] [PATCH v2 0/7] Qemu virtio pmem device

2019-07-02 Thread Pankaj Gupta



> >> >
> >> >> 
> >> >> On Wed, Jun 19, 2019 at 03:19:00PM +0530, Pankaj Gupta wrote:
> >> >> > This patch series has implementation for "virtio pmem"
> >> >> > device. "virtio pmem" is persistent memory(nvdimm) device in
> >> >> > guest which allows to bypass the guest page cache. This
> >> >> > also implements a VIRTIO based asynchronous flush mechanism.
> >> >> > Details of project idea for 'virtio pmem' flushing interface
> >> >> > is shared [2] & [3].
> >> >> >
> >> >> > Sharing Qemu device emulation in this patchset. Tested with
> >> >> > guest kernel driver [1]. This series is based on David's
> >> >> > memory device refactoring [5] work with modified version of
> >> >> > my initial virtio pmem [4] series.
> >> >> >
> >> >> > Usage:
> >> >> > ./qemu -name test -machine pc -m 8G,slots=240,maxmem=20G
> >> >> > -object memory-backend-file,id=mem1,share,mem-path=test.img,
> >> >> >  size=4G,share
> >> >> > -device virtio-pmem-pci,memdev=mem1,id=nv1
> >> >> >
> >> >> 
> >> >> Hi, Pankaj
> >> >
> >> >Hi Wei,
> >> >
> >> >> 
> >> >> I tried this series with v14 kernel driver, while getting some error on
> >> >> using
> >> >> this. Not sure this is my error configuration.
> >> >> 
> >> >> The qemu command line is:
> >> >> 
> >> >> -object
> >> >> 
> >> >> memory-backend-file,id=mem1,share=on,mem-path=/dev/dax0.0,size=1G,align=2M
> >> >> -device virtio-pmem-pci,memdev=mem1,id=nv1
> >> >
> >> >Are you using host backing on real NVDIMM? Currently, we only support
> >> >backing image
> >> >file on regular SSD. We have plans to support backing file on real NVDIMM
> >> >device
> >> >as well but this is part of future work.
> >> >
> >> >Can you please try by allocating a raw image file on regular SSD. It is
> >> >working fine
> >> >for me.
> >> >
> >> 
> >> I created a file with 2G on my disk.
> >> 
> >> #ll -h 2G-file
> >> -rw-r--r-- 1 richard richard 2.0G 6月  26 09:26 2G-file
> >> 
> >> The command line is changed to:
> >> 
> >> -object
> >> 
> >> memory-backend-file,id=mem1,share,mem-path=/home/richard/guest/2G-file,size=2G
> >> -device virtio-pmem-pci,memdev=mem1,id=nv1
> >> 
> >> The behavior in guest is the same.
> >
> >Are you still facing an error with this, or is it working fine for you?
> >
> 
> I can partition and mkfs with /dev/pmem0 after I use the SSD backend.

Good to know. Thanks for the confirmation.

> 
> >> 
> >> I took a look into the directory /sys/bus/nd/device. These files are
> >> listed.
> >> Compared with normal system, one device file is missed.
> >
> >virtio pmem does not support namespace/region mappings which ACPI NFIT
> >supports.
> >
> 
> Ok, thanks.
> 
> >> 
> >> btt0.0  dax0.0  namespace0.0  ndbus0  pfn0.0  region0
> >> 
> >> But the sysfs shows pmem0 block device is created.
> >> 
> >> /sys/devices/pci:00/:00:04.0/virtio0/ndbus0/region0/namespace0.0/block/pmem0
> >> 
> >> Then I took a look into the pci device:
> >> 
> >> # lspci -vs 00:04.0
> >> 00:04.0 Unclassified device [00ff]: Red Hat, Inc. Device 1013
> >> Subsystem: Red Hat, Inc. Device 001b
> >> Physical Slot: 4
> >> Flags: bus master, fast devsel, latency 0, IRQ 11
> >> I/O ports at c040 [size=64]
> >> Memory at fe00 (64-bit, prefetchable) [size=16K]
> >> Capabilities: [84] Vendor Specific Information: VirtIO:
> >> Capabilities: [70] Vendor Specific Information: VirtIO: Notify
> >> Capabilities: [60] Vendor Specific Information: VirtIO: DeviceCfg
> >> Capabilities: [50] Vendor Specific Information: VirtIO: ISR
> >> Capabilities: [40] Vendor Specific Information: VirtIO: CommonCfg
> >> Kernel driver in use: virtio-pci
> >> 
> >> This looks good.
> >
> >Good to know.
> >
> >> 
> &

Re: [Qemu-devel] [PATCH v2 0/7] Qemu virtio pmem device

2019-07-02 Thread Pankaj Gupta


> >
> >> 
> >> On Wed, Jun 19, 2019 at 03:19:00PM +0530, Pankaj Gupta wrote:
> >> > This patch series has implementation for "virtio pmem"
> >> > device. "virtio pmem" is persistent memory(nvdimm) device in
> >> > guest which allows to bypass the guest page cache. This
> >> > also implements a VIRTIO based asynchronous flush mechanism.
> >> > Details of project idea for 'virtio pmem' flushing interface
> >> > is shared [2] & [3].
> >> >
> >> > Sharing Qemu device emulation in this patchset. Tested with
> >> > guest kernel driver [1]. This series is based on David's
> >> > memory device refactoring [5] work with modified version of
> >> > my initial virtio pmem [4] series.
> >> >
> >> > Usage:
> >> > ./qemu -name test -machine pc -m 8G,slots=240,maxmem=20G
> >> > -object memory-backend-file,id=mem1,share,mem-path=test.img,
> >> >  size=4G,share
> >> > -device virtio-pmem-pci,memdev=mem1,id=nv1
> >> >
> >> 
> >> Hi, Pankaj
> >
> >Hi Wei,
> >
> >> 
> >> I tried this series with v14 kernel driver, while getting some error on
> >> using
> >> this. Not sure this is my error configuration.
> >> 
> >> The qemu command line is:
> >> 
> >> -object
> >> 
> >> memory-backend-file,id=mem1,share=on,mem-path=/dev/dax0.0,size=1G,align=2M
> >> -device virtio-pmem-pci,memdev=mem1,id=nv1
> >
> >Are you using host backing on real NVDIMM? Currently, we only support
> >backing image
> >file on regular SSD. We have plans to support backing file on real NVDIMM
> >device
> >as well but this is part of future work.
> >
> >Can you please try by allocating a raw image file on regular SSD. It is
> >working fine
> >for me.
> >
> 
> I created a file with 2G on my disk.
> 
> #ll -h 2G-file
> -rw-r--r-- 1 richard richard 2.0G 6月  26 09:26 2G-file
> 
> The command line is changed to:
> 
> -object
> 
> memory-backend-file,id=mem1,share,mem-path=/home/richard/guest/2G-file,size=2G
> -device virtio-pmem-pci,memdev=mem1,id=nv1
> 
> The behavior in guest is the same.

Are you still facing an error with this, or is it working fine for you?

> 
> I took a look into the directory /sys/bus/nd/device. These files are listed.
> Compared with normal system, one device file is missed.

virtio pmem does not support namespace/region mappings which ACPI NFIT supports.

> 
> btt0.0  dax0.0  namespace0.0  ndbus0  pfn0.0  region0
> 
> But the sysfs shows pmem0 block device is created.
> 
> /sys/devices/pci:00/:00:04.0/virtio0/ndbus0/region0/namespace0.0/block/pmem0
> 
> Then I took a look into the pci device:
> 
> # lspci -vs 00:04.0
> 00:04.0 Unclassified device [00ff]: Red Hat, Inc. Device 1013
> Subsystem: Red Hat, Inc. Device 001b
> Physical Slot: 4
> Flags: bus master, fast devsel, latency 0, IRQ 11
> I/O ports at c040 [size=64]
> Memory at fe00 (64-bit, prefetchable) [size=16K]
> Capabilities: [84] Vendor Specific Information: VirtIO: 
> Capabilities: [70] Vendor Specific Information: VirtIO: Notify
> Capabilities: [60] Vendor Specific Information: VirtIO: DeviceCfg
> Capabilities: [50] Vendor Specific Information: VirtIO: ISR
> Capabilities: [40] Vendor Specific Information: VirtIO: CommonCfg
> Kernel driver in use: virtio-pci
> 
> This looks good.

Good to know.

> 
> >> 
> >> The guest boots up and I can see /dev/pmem0 device. But when I want to
> >> partition this device, I got the error:
> >> 
> >> # parted /dev/pmem0 mklabel gpt
> >> Warning: Error fsyncing/closing /dev/pmem0: Input/output error
> >> 
> >> Also I see an error when running "ndctl list":
> >> 
> >> libndctl: __sysfs_device_parse: ndctl0: add_dev() failed
> >
> >Will look at this if it is related.
> 
> This log still there.

It looks to me like libndctl needs to be taught about virtio pmem
data parsing. But this is unrelated to the kernel and qemu patch series.

> 
> >
> >Thanks,
> >Pankaj
> >> 
> >> Would you mind letting me know which part I am wrong?
> >> 
> >> > (qemu) info memory-devices
> >> >  Memory device [virtio-pmem]: "nv1"
> >> >  memaddr: 0x24000
> >> >  size: 4294967296
> >> >  memdev: /objects/mem1
> &g

Re: [Qemu-devel] [PATCH v2 4/7] virtio-pci: Proxy for virtio-pmem

2019-07-02 Thread Pankaj Gupta


> 
> On Tue, Jul 02, 2019 at 01:55:19PM +0200, Cornelia Huck wrote:
> > On Wed, 19 Jun 2019 15:19:04 +0530
> > Pankaj Gupta  wrote:
> > 
> > > We need a proxy device for virtio-pmem, and this device has to be the
> > > actual memory device so we can cleanly hotplug it.
> > > 
> > > Forward memory device class functions either to the actual device or use
> > > properties of the virtio-pmem device to implement these in the proxy.
> > > 
> > > virtio-pmem will only be compiled for selected, supported architectures
> > > (that can deal with virtio/pci devices being memory devices). An
> > > architecture that is prepared for that can simply enable
> > > CONFIG_VIRTIO_PMEM to make it work.
> > > 
> > > As not all architectures support memory devices (and CONFIG_VIRTIO_PMEM
> > > will be enabled per supported architecture), we have to move the PCI
> > > proxy
> > > to a separate file.
> > > 
> > > Signed-off-by: Pankaj Gupta 
> > > [ split up patches, memory-device changes, move pci proxy]
> > > Signed-off-by: David Hildenbrand 
> > > ---
> > >  hw/virtio/Makefile.objs |   1 +
> > >  hw/virtio/virtio-pmem-pci.c | 131
> > >  
> > >  hw/virtio/virtio-pmem-pci.h |  34 
> > >  include/hw/pci/pci.h|   1 +
> > >  4 files changed, 167 insertions(+)
> > >  create mode 100644 hw/virtio/virtio-pmem-pci.c
> > >  create mode 100644 hw/virtio/virtio-pmem-pci.h
> > 
> > (...)
> > 
> > > +static const VirtioPCIDeviceTypeInfo virtio_pmem_pci_info = {
> > > +.base_name = TYPE_VIRTIO_PMEM_PCI,
> > > +.generic_name  = "virtio-pmem-pci",
> > > +.transitional_name = "virtio-pmem-pci-transitional",
> > 
> > Do we even have a transitional device for this? I.e., do we have a
> > legacy version? I don't think that makes sense for new devices.
> 
> 
> I agree - I applied so pls send a patch on top.
> Or if you end up having to respin pls include this.

Sure. Thank you!

> 
> > > +.non_transitional_name = "virtio-pmem-pci-non-transitional",
> 
> Neither do we need a non transitional name.

o.k

Best regards,
Pankaj
> 
> > > +.instance_size = sizeof(VirtIOPMEMPCI),
> > > +.instance_init = virtio_pmem_pci_instance_init,
> > > +.class_init= virtio_pmem_pci_class_init,
> > > +.interfaces = (InterfaceInfo[]) {
> > > +{ TYPE_MEMORY_DEVICE },
> > > +{ }
> > > +},
> > > +};
> > 
> > (...)
> 



Re: [Qemu-devel] [PATCH v2 3/7] virtio-pmem: sync linux headers

2019-07-02 Thread Pankaj Gupta



- Original Message -
> 
> On Tue, Jul 02, 2019 at 07:59:17AM -0400, Pankaj Gupta wrote:
> > 
> > > 
> > > On Wed, 19 Jun 2019 15:19:03 +0530
> > > Pankaj Gupta  wrote:
> > > 
> > > > Sync linux headers for virtio pmem.
> > > > 
> > > > Signed-off-by: Pankaj Gupta 
> > > > ---
> > > >  include/standard-headers/linux/virtio_ids.h  |  1 +
> > > >  include/standard-headers/linux/virtio_pmem.h | 35
> > > >  
> > > >  2 files changed, 36 insertions(+)
> > > >  create mode 100644 include/standard-headers/linux/virtio_pmem.h
> > > 
> > > That's not yet upstream, right?
> > 
> > right.
> > 
> > > 
> > > If so, I fear this feature won't make 4.1, as the merge window for
> > > Linux only opens in one or two weeks :(
> > 
> > Looks so. It's lined up for the 5.3 merge window.
> 
> In which tree is it?

Patches are in Dan's tree:
https://git.kernel.org/pub/scm/linux/kernel/git/djbw/nvdimm.git/log/?h=libnvdimm-pending

Thanks,
Pankaj
> 
> > Once the kernel patch series is merged, I will send an updated version
> > of the qemu series with review suggestions addressed, if any :)
> > 
> > Thanks,
> > Pankaj
> > 
> > > 
> 



Re: [Qemu-devel] [PATCH v2 3/7] virtio-pmem: sync linux headers

2019-07-02 Thread Pankaj Gupta


> > > > 
> > > > > 
> > > > > On Wed, 19 Jun 2019 15:19:03 +0530
> > > > > Pankaj Gupta  wrote:
> > > > > 
> > > > > > Sync linux headers for virtio pmem.
> > > > > > 
> > > > > > Signed-off-by: Pankaj Gupta 
> > > > > > ---
> > > > > >  include/standard-headers/linux/virtio_ids.h  |  1 +
> > > > > >  include/standard-headers/linux/virtio_pmem.h | 35
> > > > > >  
> > > > > >  2 files changed, 36 insertions(+)
> > > > > >  create mode 100644 include/standard-headers/linux/virtio_pmem.h
> > > > > 
> > > > > That's not yet upstream, right?
> > > > 
> > > > right.
> > > > 
> > > > > 
> > > > > If so, I fear this feature won't make 4.1, as the merge window for
> > > > > Linux only opens in one or two weeks :(
> > > > 
> > > > Looks so. It's lined up for the 5.3 merge window.
> > > 
> > > In which tree is it?
> > 
> > Patches are in Dan's tree:
> > https://git.kernel.org/pub/scm/linux/kernel/git/djbw/nvdimm.git/log/?h=libnvdimm-pending
> > 
> > Thanks,
> > Pankaj
> 
> Hmm ok, but that isn't part of linux-next. Do you know why?

Dan suggested first applying the patches in the libnvdimm-pending tree to
check for any zero-day test bot warnings, and then applying them in
linux-next. So, I think the next step is to merge into linux-next.

Thanks,
Pankaj

> 
> > > 
> > > > Once the kernel patch series is merged, I will send an updated version
> > > > of the qemu series with review suggestions addressed, if any :)
> > > > 
> > > > Thanks,
> > > > Pankaj
> > > > 
> > > > > 
> > > 
> 



Re: [Qemu-devel] [PATCH v2 3/7] virtio-pmem: sync linux headers

2019-07-02 Thread Pankaj Gupta


> 
> On Wed, 19 Jun 2019 15:19:03 +0530
> Pankaj Gupta  wrote:
> 
> > Sync linux headers for virtio pmem.
> > 
> > Signed-off-by: Pankaj Gupta 
> > ---
> >  include/standard-headers/linux/virtio_ids.h  |  1 +
> >  include/standard-headers/linux/virtio_pmem.h | 35
> >  
> >  2 files changed, 36 insertions(+)
> >  create mode 100644 include/standard-headers/linux/virtio_pmem.h
> 
> That's not yet upstream, right?

right.

> 
> If so, I fear this feature won't make 4.1, as the merge window for
> Linux only opens in one or two weeks :(

Looks so. It's lined up for the 5.3 merge window.

Once the kernel patch series is merged, I will send an updated version
of the qemu series with review suggestions addressed, if any :)

Thanks,
Pankaj 

> 



Re: [Qemu-devel] [PATCH v2 4/7] virtio-pci: Proxy for virtio-pmem

2019-07-02 Thread Pankaj Gupta


> 
> On Wed, 19 Jun 2019 15:19:04 +0530
> Pankaj Gupta  wrote:
> 
> > We need a proxy device for virtio-pmem, and this device has to be the
> > actual memory device so we can cleanly hotplug it.
> > 
> > Forward memory device class functions either to the actual device or use
> > properties of the virtio-pmem device to implement these in the proxy.
> > 
> > virtio-pmem will only be compiled for selected, supported architectures
> > (that can deal with virtio/pci devices being memory devices). An
> > architecture that is prepared for that can simply enable
> > CONFIG_VIRTIO_PMEM to make it work.
> > 
> > As not all architectures support memory devices (and CONFIG_VIRTIO_PMEM
> > will be enabled per supported architecture), we have to move the PCI proxy
> > to a separate file.
> > 
> > Signed-off-by: Pankaj Gupta 
> > [ split up patches, memory-device changes, move pci proxy]
> > Signed-off-by: David Hildenbrand 
> > ---
> >  hw/virtio/Makefile.objs |   1 +
> >  hw/virtio/virtio-pmem-pci.c | 131
> >  
> >  hw/virtio/virtio-pmem-pci.h |  34 
> >  include/hw/pci/pci.h|   1 +
> >  4 files changed, 167 insertions(+)
> >  create mode 100644 hw/virtio/virtio-pmem-pci.c
> >  create mode 100644 hw/virtio/virtio-pmem-pci.h
> 
> (...)
> 
> > +static const VirtioPCIDeviceTypeInfo virtio_pmem_pci_info = {
> > +.base_name = TYPE_VIRTIO_PMEM_PCI,
> > +.generic_name  = "virtio-pmem-pci",
> > +.transitional_name = "virtio-pmem-pci-transitional",
> 
> Do we even have a transitional device for this? I.e., do we have a
> legacy version? I don't think that makes sense for new devices.

No. Sure, will remove this.

Thank you,
Pankaj

> 
> > +.non_transitional_name = "virtio-pmem-pci-non-transitional",
> > +.instance_size = sizeof(VirtIOPMEMPCI),
> > +.instance_init = virtio_pmem_pci_instance_init,
> > +.class_init= virtio_pmem_pci_class_init,
> > +.interfaces = (InterfaceInfo[]) {
> > +{ TYPE_MEMORY_DEVICE },
> > +{ }
> > +},
> > +};
> 
> (...)
> 


