[Qemu-devel] [PATCH 7/7 V11] migration/ram: ensure write persistence on loading all data to PMEM.
From: Junyan He Because we need to make sure the pmem-backed memory data is synced after migration, we choose to call pmem_persist() when the migration finishes. This makes sure the data on pmem is safe and will not be lost if power is off. Signed-off-by: Junyan He Reviewed-by: Stefan Hajnoczi Reviewed-by: Igor Mammedov --- include/qemu/pmem.h | 6 ++ migration/ram.c | 8 2 files changed, 14 insertions(+) diff --git a/include/qemu/pmem.h b/include/qemu/pmem.h index ebdb070..dfb6d0d 100644 --- a/include/qemu/pmem.h +++ b/include/qemu/pmem.h @@ -25,6 +25,12 @@ pmem_memcpy_persist(void *pmemdest, const void *src, size_t len) return NULL; } +static inline void +pmem_persist(const void *addr, size_t len) +{ +g_assert_not_reached(); +} + #endif /* CONFIG_LIBPMEM */ #endif /* !QEMU_PMEM_H */ diff --git a/migration/ram.c b/migration/ram.c index 309b567..67b620b 100644 --- a/migration/ram.c +++ b/migration/ram.c @@ -33,6 +33,7 @@ #include "qemu/bitops.h" #include "qemu/bitmap.h" #include "qemu/main-loop.h" +#include "qemu/pmem.h" #include "xbzrle.h" #include "ram.h" #include "migration.h" @@ -3540,6 +3541,13 @@ static int ram_load_setup(QEMUFile *f, void *opaque) static int ram_load_cleanup(void *opaque) { RAMBlock *rb; + +RAMBLOCK_FOREACH_MIGRATABLE(rb) { +if (ramblock_is_pmem(rb)) { +pmem_persist(rb->host, rb->used_length); +} +} + xbzrle_load_cleanup(); compress_threads_load_cleanup(); -- 2.7.4
[Qemu-devel] [PATCH 6/7 V11] migration/ram: Add check and info message to nvdimm post copy.
From: Junyan He NVDIMM memory does not support postcopy yet. We disable postcopy if any NVDIMM memory is present and print a log message to inform the user. Signed-off-by: Junyan He Reviewed-by: Stefan Hajnoczi Reviewed-by: Igor Mammedov --- migration/ram.c | 9 + 1 file changed, 9 insertions(+) diff --git a/migration/ram.c b/migration/ram.c index 52dd678..309b567 100644 --- a/migration/ram.c +++ b/migration/ram.c @@ -3899,6 +3899,15 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id) static bool ram_has_postcopy(void *opaque) { +RAMBlock *rb; +RAMBLOCK_FOREACH_MIGRATABLE(rb) { +if (ramblock_is_pmem(rb)) { +info_report("Block: %s, host: %p is a nvdimm memory, postcopy" + "is not supported now!", rb->idstr, rb->host); +return false; +} +} + return migrate_postcopy_ram(); } -- 2.7.4
[Qemu-devel] [PATCH 5/7 V11] mem/nvdimm: ensure write persistence to PMEM in label emulation
From: Junyan He Guest writes to vNVDIMM labels are intercepted and performed on the backend by QEMU. When the backend is a real persistent memory, QEMU needs to take proper operations to ensure its write persistence on the persistent memory. Otherwise, a host power failure may result in the loss of guest label configurations. Signed-off-by: Haozhong Zhang Reviewed-by: Stefan Hajnoczi Reviewed-by: Igor Mammedov --- hw/mem/nvdimm.c | 9 - include/qemu/pmem.h | 30 ++ 2 files changed, 38 insertions(+), 1 deletion(-) create mode 100644 include/qemu/pmem.h diff --git a/hw/mem/nvdimm.c b/hw/mem/nvdimm.c index 021d1c3..1c6674c 100644 --- a/hw/mem/nvdimm.c +++ b/hw/mem/nvdimm.c @@ -23,6 +23,7 @@ */ #include "qemu/osdep.h" +#include "qemu/pmem.h" #include "qapi/error.h" #include "qapi/visitor.h" #include "hw/mem/nvdimm.h" @@ -164,11 +165,17 @@ static void nvdimm_write_label_data(NVDIMMDevice *nvdimm, const void *buf, { MemoryRegion *mr; PCDIMMDevice *dimm = PC_DIMM(nvdimm); +bool is_pmem = object_property_get_bool(OBJECT(dimm->hostmem), +"pmem", NULL); uint64_t backend_offset; nvdimm_validate_rw_label_data(nvdimm, size, offset); -memcpy(nvdimm->label_data + offset, buf, size); +if (!is_pmem) { +memcpy(nvdimm->label_data + offset, buf, size); +} else { +pmem_memcpy_persist(nvdimm->label_data + offset, buf, size); +} mr = host_memory_backend_get_memory(dimm->hostmem); backend_offset = memory_region_size(mr) - nvdimm->label_size + offset; diff --git a/include/qemu/pmem.h b/include/qemu/pmem.h new file mode 100644 index 000..ebdb070 --- /dev/null +++ b/include/qemu/pmem.h @@ -0,0 +1,30 @@ +/* + * QEMU header file for libpmem. + * + * Copyright (c) 2018 Intel Corporation. + * + * Author: Haozhong Zhang + * + * This work is licensed under the terms of the GNU GPL, version 2 or later. + * See the COPYING file in the top-level directory. 
+ */ + +#ifndef QEMU_PMEM_H +#define QEMU_PMEM_H + +#ifdef CONFIG_LIBPMEM +#include +#else /* !CONFIG_LIBPMEM */ + +static inline void * +pmem_memcpy_persist(void *pmemdest, const void *src, size_t len) +{ +/* If 'pmem' option is 'on', we should always have libpmem support, + or qemu will report a error and exit, never come here. */ +g_assert_not_reached(); +return NULL; +} + +#endif /* CONFIG_LIBPMEM */ + +#endif /* !QEMU_PMEM_H */ -- 2.7.4
[Qemu-devel] [PATCH 4/7 V11] hostmem-file: add the 'pmem' option
From: Junyan He When QEMU emulates vNVDIMM labels and migrates vNVDIMM devices, it needs to know whether the backend storage is a real persistent memory, in order to decide whether special operations should be performed to ensure the data persistence. This boolean option 'pmem' allows users to specify whether the backend storage of memory-backend-file is a real persistent memory. If 'pmem=on', QEMU will set the flag RAM_PMEM in the RAM block of the corresponding memory region. If 'pmem' is set while QEMU lacks libpmem support, an error is generated. Signed-off-by: Junyan He Signed-off-by: Haozhong Zhang Reviewed-by: Stefan Hajnoczi Reviewed-by: Igor Mammedov Reviewed-by: Richard Henderson --- backends/hostmem-file.c | 43 +-- docs/nvdimm.txt | 22 ++ exec.c | 8 include/exec/memory.h | 4 include/exec/ram_addr.h | 3 +++ qemu-options.hx | 7 +++ 6 files changed, 85 insertions(+), 2 deletions(-) diff --git a/backends/hostmem-file.c b/backends/hostmem-file.c index 34c68bb..2476dcb 100644 --- a/backends/hostmem-file.c +++ b/backends/hostmem-file.c @@ -12,6 +12,7 @@ #include "qemu/osdep.h" #include "qapi/error.h" #include "qemu-common.h" +#include "qemu/error-report.h" #include "sysemu/hostmem.h" #include "sysemu/sysemu.h" #include "qom/object_interfaces.h" @@ -31,9 +32,10 @@ typedef struct HostMemoryBackendFile HostMemoryBackendFile; struct HostMemoryBackendFile { HostMemoryBackend parent_obj; -bool discard_data; char *mem_path; uint64_t align; +bool discard_data; +bool is_pmem; }; static void @@ -59,7 +61,8 @@ file_backend_memory_alloc(HostMemoryBackend *backend, Error **errp) memory_region_init_ram_from_file(&backend->mr, OBJECT(backend), path, backend->size, fb->align, - backend->share ? RAM_SHARED : 0, + (backend->share ? RAM_SHARED : 0) | + (fb->is_pmem ? 
RAM_PMEM : 0), fb->mem_path, errp); g_free(path); } @@ -131,6 +134,39 @@ static void file_memory_backend_set_align(Object *o, Visitor *v, error_propagate(errp, local_err); } +static bool file_memory_backend_get_pmem(Object *o, Error **errp) +{ +return MEMORY_BACKEND_FILE(o)->is_pmem; +} + +static void file_memory_backend_set_pmem(Object *o, bool value, Error **errp) +{ +HostMemoryBackend *backend = MEMORY_BACKEND(o); +HostMemoryBackendFile *fb = MEMORY_BACKEND_FILE(o); + +if (host_memory_backend_mr_inited(backend)) { +error_setg(errp, "cannot change property 'pmem' of %s '%s'", + object_get_typename(o), + object_get_canonical_path_component(o)); +return; +} + +#ifndef CONFIG_LIBPMEM +if (value) { +Error *local_err = NULL; +error_setg(&local_err, + "Lack of libpmem support while setting the 'pmem=on'" + " of %s '%s'. We can't ensure data persistence.", + object_get_typename(o), + object_get_canonical_path_component(o)); +error_propagate(errp, local_err); +return; +} +#endif + +fb->is_pmem = value; +} + static void file_backend_unparent(Object *obj) { HostMemoryBackend *backend = MEMORY_BACKEND(obj); @@ -162,6 +198,9 @@ file_backend_class_init(ObjectClass *oc, void *data) file_memory_backend_get_align, file_memory_backend_set_align, NULL, NULL, &error_abort); +object_class_property_add_bool(oc, "pmem", +file_memory_backend_get_pmem, file_memory_backend_set_pmem, +&error_abort); } static void file_backend_instance_finalize(Object *o) diff --git a/docs/nvdimm.txt b/docs/nvdimm.txt index 24b443b..5f158a6 100644 --- a/docs/nvdimm.txt +++ b/docs/nvdimm.txt @@ -173,3 +173,25 @@ There are currently two valid values for this option: the NVDIMMs in the event of power loss. This implies that the platform also supports flushing dirty data through the memory controller on power loss. 
+ +If the vNVDIMM backend is in host persistent memory that can be accessed in +SNIA NVM Programming Model [1] (e.g., Intel NVDIMM), it's suggested to set +the 'pmem' option of memory-backend-file to 'on'. When 'pmem' is 'on' and QEMU +is built with libpmem [2] support (configured with --enable-libpmem), QEMU +will take necessary operations to guarantee the persistence of its own writes +to the vNVDIMM backend(e.g., in vNVDIMM label emulation and l
[Qemu-devel] [PATCH 1/7 V11] memory, exec: Expose all memory block related flags.
From: Junyan He We need to use these flags in other files rather than just in exec.c. For example, RAM_SHARED should be used when creating a RAM block from a file. We expose them in exec/memory.h. Signed-off-by: Junyan He Reviewed-by: Stefan Hajnoczi Reviewed-by: Igor Mammedov Reviewed-by: Richard Henderson --- exec.c| 20 include/exec/memory.h | 20 2 files changed, 20 insertions(+), 20 deletions(-) diff --git a/exec.c b/exec.c index 4f5df07..cc042dc 100644 --- a/exec.c +++ b/exec.c @@ -87,26 +87,6 @@ AddressSpace address_space_memory; MemoryRegion io_mem_rom, io_mem_notdirty; static MemoryRegion io_mem_unassigned; - -/* RAM is pre-allocated and passed into qemu_ram_alloc_from_ptr */ -#define RAM_PREALLOC (1 << 0) - -/* RAM is mmap-ed with MAP_SHARED */ -#define RAM_SHARED (1 << 1) - -/* Only a portion of RAM (used_length) is actually used, and migrated. - * This used_length size can change across reboots. - */ -#define RAM_RESIZEABLE (1 << 2) - -/* UFFDIO_ZEROPAGE is available on this RAMBlock to atomically - * zero the page and wake waiting processes. - * (Set during postcopy) - */ -#define RAM_UF_ZEROPAGE (1 << 3) - -/* RAM can be migrated */ -#define RAM_MIGRATABLE (1 << 4) #endif #ifdef TARGET_PAGE_BITS_VARY diff --git a/include/exec/memory.h b/include/exec/memory.h index 448d41a..6d0af29 100644 --- a/include/exec/memory.h +++ b/include/exec/memory.h @@ -103,6 +103,26 @@ struct IOMMUNotifier { }; typedef struct IOMMUNotifier IOMMUNotifier; +/* RAM is pre-allocated and passed into qemu_ram_alloc_from_ptr */ +#define RAM_PREALLOC (1 << 0) + +/* RAM is mmap-ed with MAP_SHARED */ +#define RAM_SHARED (1 << 1) + +/* Only a portion of RAM (used_length) is actually used, and migrated. + * This used_length size can change across reboots. + */ +#define RAM_RESIZEABLE (1 << 2) + +/* UFFDIO_ZEROPAGE is available on this RAMBlock to atomically + * zero the page and wake waiting processes. 
+ * (Set during postcopy) + */ +#define RAM_UF_ZEROPAGE (1 << 3) + +/* RAM can be migrated */ +#define RAM_MIGRATABLE (1 << 4) + static inline void iommu_notifier_init(IOMMUNotifier *n, IOMMUNotify fn, IOMMUNotifierFlag flags, hwaddr start, hwaddr end, -- 2.7.4
[Qemu-devel] [PATCH 2/7 V11] memory, exec: switch file ram allocation functions to 'flags' parameters
From: Junyan He As more flag parameters besides the existing 'share' are going to be added to the following functions memory_region_init_ram_from_file qemu_ram_alloc_from_fd qemu_ram_alloc_from_file let's switch them to use the 'flags' parameters so as to ease future flag additions. The existing 'share' flag is converted to the RAM_SHARED bit in ram_flags, and other flag bits are ignored by the above functions right now. Signed-off-by: Junyan He Signed-off-by: Haozhong Zhang Reviewed-by: Stefan Hajnoczi Reviewed-by: Igor Mammedov --- backends/hostmem-file.c | 3 ++- exec.c | 10 +- include/exec/memory.h | 7 +-- include/exec/ram_addr.h | 25 +++-- memory.c| 8 +--- numa.c | 2 +- 6 files changed, 41 insertions(+), 14 deletions(-) diff --git a/backends/hostmem-file.c b/backends/hostmem-file.c index 134b08d..34c68bb 100644 --- a/backends/hostmem-file.c +++ b/backends/hostmem-file.c @@ -58,7 +58,8 @@ file_backend_memory_alloc(HostMemoryBackend *backend, Error **errp) path = object_get_canonical_path(OBJECT(backend)); memory_region_init_ram_from_file(&backend->mr, OBJECT(backend), path, - backend->size, fb->align, backend->share, + backend->size, fb->align, + backend->share ? RAM_SHARED : 0, fb->mem_path, errp); g_free(path); } diff --git a/exec.c b/exec.c index cc042dc..3b8f914 100644 --- a/exec.c +++ b/exec.c @@ -2238,7 +2238,7 @@ static void ram_block_add(RAMBlock *new_block, Error **errp, bool shared) #ifdef __linux__ RAMBlock *qemu_ram_alloc_from_fd(ram_addr_t size, MemoryRegion *mr, - bool share, int fd, + uint32_t ram_flags, int fd, Error **errp) { RAMBlock *new_block; @@ -2280,14 +2280,14 @@ RAMBlock *qemu_ram_alloc_from_fd(ram_addr_t size, MemoryRegion *mr, new_block->mr = mr; new_block->used_length = size; new_block->max_length = size; -new_block->flags = share ? 
RAM_SHARED : 0; +new_block->flags = ram_flags; new_block->host = file_ram_alloc(new_block, size, fd, !file_size, errp); if (!new_block->host) { g_free(new_block); return NULL; } -ram_block_add(new_block, &local_err, share); +ram_block_add(new_block, &local_err, ram_flags & RAM_SHARED); if (local_err) { g_free(new_block); error_propagate(errp, local_err); @@ -2299,7 +2299,7 @@ RAMBlock *qemu_ram_alloc_from_fd(ram_addr_t size, MemoryRegion *mr, RAMBlock *qemu_ram_alloc_from_file(ram_addr_t size, MemoryRegion *mr, - bool share, const char *mem_path, + uint32_t ram_flags, const char *mem_path, Error **errp) { int fd; @@ -2311,7 +2311,7 @@ RAMBlock *qemu_ram_alloc_from_file(ram_addr_t size, MemoryRegion *mr, return NULL; } -block = qemu_ram_alloc_from_fd(size, mr, share, fd, errp); +block = qemu_ram_alloc_from_fd(size, mr, ram_flags, fd, errp); if (!block) { if (created) { unlink(mem_path); diff --git a/include/exec/memory.h b/include/exec/memory.h index 6d0af29..30e7166 100644 --- a/include/exec/memory.h +++ b/include/exec/memory.h @@ -640,6 +640,7 @@ void memory_region_init_resizeable_ram(MemoryRegion *mr, void *host), Error **errp); #ifdef __linux__ + /** * memory_region_init_ram_from_file: Initialize RAM memory region with a *mmap-ed backend. @@ -651,7 +652,9 @@ void memory_region_init_resizeable_ram(MemoryRegion *mr, * @size: size of the region. * @align: alignment of the region base address; if 0, the default alignment * (getpagesize()) will be used. - * @share: %true if memory must be mmaped with the MAP_SHARED flag + * @ram_flags: Memory region features: + * - RAM_SHARED: memory must be mmaped with the MAP_SHARED flag + * Other bits are ignored now. * @path: the path in which to allocate the RAM. * @errp: pointer to Error*, to store an error if it happens. * @@ -663,7 +666,7 @@ void memory_region_init_ram_from_file(MemoryRegion *mr, const char *name, uint64_t size, uint64_t align, - bool share, + uint32_t ram_flags, const char *path, Error **errp); di
[Qemu-devel] [PATCH 3/7 V11] configure: add libpmem support
From: Junyan He Add a pair of configure options --{enable,disable}-libpmem to control whether QEMU is compiled with PMDK libpmem [1]. QEMU may write to the host persistent memory (e.g. in vNVDIMM label emulation and live migration), so it must take the proper operations to ensure the persistence of its own writes. Depending on the CPU models and available instructions, the optimal operation can vary [2]. PMDK libpmem has already implemented those operations on multiple CPU models (x86 and ARM) and the logic to select the optimal ones, so QEMU can just use libpmem rather than re-implement them. Libpmem is part of the PMDK project (formerly known as NVML). The project's home page is: http://pmem.io/pmdk/ And the project's repository is: https://github.com/pmem/pmdk/ For more information about libpmem APIs, you can refer to the comments in the source code of: pmdk/src/libpmem/pmem.c, beginning at line 33. Signed-off-by: Junyan He Signed-off-by: Haozhong Zhang Reviewed-by: Stefan Hajnoczi Reviewed-by: Igor Mammedov Reviewed-by: Richard Henderson --- configure | 29 + 1 file changed, 29 insertions(+) diff --git a/configure b/configure index 2a7796e..1c9288b 100755 --- a/configure +++ b/configure @@ -475,6 +475,7 @@ vxhs="" libxml2="" docker="no" debug_mutex="no" +libpmem="" # cross compilers defaults, can be overridden with --cross-cc-ARCH cross_cc_aarch64="aarch64-linux-gnu-gcc" @@ -1435,6 +1436,10 @@ for opt do ;; --disable-debug-mutex) debug_mutex=no ;; + --enable-libpmem) libpmem=yes + ;; + --disable-libpmem) libpmem=no + ;; *) echo "ERROR: unknown option $opt" echo "Try '$0 --help' for more information" @@ -1710,6 +1715,7 @@ disabled with --disable-FEATURE, default is enabled if available: vhost-user vhost-user support capstonecapstone disassembler support debug-mutex mutex debugging support + libpmem libpmem support NOTE: The object files are built at the place where configure is launched EOF @@ -5546,6 +5552,24 @@ if has "docker"; then fi ## +# check for libpmem + +if test 
"$libpmem" != "no"; then + if $pkg_config --exists "libpmem"; then + libpmem="yes" + libpmem_libs=$($pkg_config --libs libpmem) + libpmem_cflags=$($pkg_config --cflags libpmem) + libs_softmmu="$libs_softmmu $libpmem_libs" + QEMU_CFLAGS="$QEMU_CFLAGS $libpmem_cflags" + else + if test "$libpmem" = "yes" ; then + feature_not_found "libpmem" "Install nvml or pmdk" + fi + libpmem="no" + fi +fi + +## # End of CC checks # After here, no more $cc or $ld runs @@ -6010,6 +6034,7 @@ echo "replication support $replication" echo "VxHS block device $vxhs" echo "capstone $capstone" echo "docker$docker" +echo "libpmem support $libpmem" if test "$sdl_too_old" = "yes"; then echo "-> Your SDL version is too old - please upgrade to have SDL support" @@ -6763,6 +6788,10 @@ if test "$vxhs" = "yes" ; then echo "VXHS_LIBS=$vxhs_libs" >> $config_host_mak fi +if test "$libpmem" = "yes" ; then + echo "CONFIG_LIBPMEM=y" >> $config_host_mak +fi + if test "$tcg_interpreter" = "yes"; then QEMU_INCLUDES="-iquote \$(SRC_PATH)/tcg/tci $QEMU_INCLUDES" elif test "$ARCH" = "sparc64" ; then -- 2.7.4
[Qemu-devel] [PATCH 0/7 V11] nvdimm: guarantee persistence of QEMU writes to persistent memory
From: Junyan He QEMU writes to vNVDIMM backends in the vNVDIMM label emulation and live migration. If the backend is on the persistent memory, QEMU needs to take proper operations to ensure its writes are persistent on the persistent memory. Otherwise, a host power failure may result in the loss of the guest data on the persistent memory. This patch series is based on Marcel's patch "mem: add share parameter to memory-backend-ram" [1] because of the changes in patch 1. [1] https://lists.gnu.org/archive/html/qemu-devel/2018-02/msg03858.html Previous versions of this patch series can be found at: v10: https://lists.gnu.org/archive/html/qemu-devel/2018-07/msg03433.html v9: https://lists.gnu.org/archive/html/qemu-devel/2018-07/msg02361.html v8: https://lists.gnu.org/archive/html/qemu-devel/2018-07/msg02279.html v7: https://lists.gnu.org/archive/html/qemu-devel/2018-06/msg02997.html v6: https://lists.gnu.org/archive/html/qemu-devel/2018-06/msg00061.html v5: https://lists.gnu.org/archive/html/qemu-devel/2018-05/msg02258.html V4: https://lists.gnu.org/archive/html/qemu-devel/2018-02/msg06993.html v3: https://lists.gnu.org/archive/html/qemu-devel/2018-02/msg04365.html v2: https://lists.gnu.org/archive/html/qemu-devel/2018-02/msg01579.html v1: https://lists.gnu.org/archive/html/qemu-devel/2017-12/msg05040.html Changes in v11: * (Patch 2) Modify the ram_flags parameter to 32 bits, the same size as in RAMBlock * (Patch 5 and Patch 7) Delete the pmem_xxx stub functions in stubs/pmem.c. Use inline functions with an assert to replace them, because we can never reach them when pmem is enabled without libpmem support. Changes in v10: * (Patch 4) Fix a nit in the nvdimm docs about pmem option usage in the command line The v10 patch set is all reviewed by Igor Mammedov Changes in v9: * (Patch 3 and Patch 4) Reorder these two patches to make the logic right. First add libpmem support, and then we can use libpmem's configure check result. Also fix some typo and grammar issues in these two patches. 
Changes in v8: * (Patch 3) Report an error when the user sets 'pmem' on a file backend while QEMU lacks libpmem support. In this case, we cannot ensure the persistence of the file backend, so we choose to fail rather than continue and make things more confusing. Changes in v7: The v6 patch set has already been reviewed by Stefan Hajnoczi No logic change in this v7 version, just: * Spelling check and some document words refined. * Rebase to "ram is migratable" patch set. Changes in v6: * (Patch 1) Expose all ram block flags rather than redefining the flags. * (Patch 4) Use pkg-config rather than a hard-coded check at configure time. * (Patch 7) Sync and flush all the pmem data when migration completes, rather than syncing pages one by one as in the previous version. Changes in v5: * (Patch 9) Add a post copy check and output some messages for nvdimm. Changes in v4: * (Patch 2) Fix compilation errors found by patchew. Changes in v3: * (Patch 5) Add an is_pmem flag to ram_handle_compressed() and handle PMEM writes in it, so we don't need the _common function. * (Patch 6) Expose qemu_get_buffer_common so we can remove the unnecessary qemu_get_buffer_to_pmem wrapper. * (Patch 8) Add an is_pmem flag to xbzrle_decode_buffer() and handle PMEM writes in it, so we can remove the unnecessary xbzrle_decode_buffer_{common, to_pmem}. * Move libpmem stubs to stubs/pmem.c and fix the compilation failures of test-{xbzrle,vmstate}.c. Changes in v2: * (Patch 1) Use a flags parameter in file ram allocation functions. * (Patch 2) Add a new option 'pmem' to hostmem-file. * (Patch 3) Use libpmem to operate on the persistent memory, rather than re-implementing those operations in QEMU. * (Patch 5-8) Consider the write persistence in the migration path. Junyan: [1/7] memory, exec: Expose all memory block related flags. [6/7] migration/ram: Add check and info message to nvdimm post copy. [7/7] migration/ram: ensure write persistence on loading all data to PMEM. 
Haozhong: [5/7] mem/nvdimm: ensure write persistence to PMEM in label emulation Haozhong & Junyan: [2/7] memory, exec: switch file ram allocation functions to 'flags' parameters [3/7] configure: add libpmem support [4/7] hostmem-file: add the 'pmem' option -- backends/hostmem-file.c | 44 ++-- configure | 29 + docs/nvdimm.txt | 22 ++ exec.c | 38 +- hw/mem/nvdimm.c | 9 - include/exec/memory.h | 31 +-- include/exec/ram_addr.h | 28 ++-- include/qemu/pmem.h | 36 memory.c| 8 +--- migration/ram.c | 17 + numa.c | 2 +- qemu-options.hx | 7 +++
[Qemu-devel] [PATCH 7/7 V10] migration/ram: ensure write persistence on loading all data to PMEM.
From: Junyan He Because we need to make sure the pmem-backed memory data is synced after migration, we choose to call pmem_persist() when the migration finishes. This makes sure the data on pmem is safe and will not be lost if power is off. Signed-off-by: Junyan He Reviewed-by: Stefan Hajnoczi Reviewed-by: Igor Mammedov --- include/qemu/pmem.h | 1 + migration/ram.c | 10 +- stubs/pmem.c| 4 3 files changed, 14 insertions(+), 1 deletion(-) diff --git a/include/qemu/pmem.h b/include/qemu/pmem.h index 00d6680..8f52b08 100644 --- a/include/qemu/pmem.h +++ b/include/qemu/pmem.h @@ -17,6 +17,7 @@ #else /* !CONFIG_LIBPMEM */ void *pmem_memcpy_persist(void *pmemdest, const void *src, size_t len); +void pmem_persist(const void *addr, size_t len); #endif /* CONFIG_LIBPMEM */ diff --git a/migration/ram.c b/migration/ram.c index fd7cba1..67b620b 100644 --- a/migration/ram.c +++ b/migration/ram.c @@ -33,6 +33,7 @@ #include "qemu/bitops.h" #include "qemu/bitmap.h" #include "qemu/main-loop.h" +#include "qemu/pmem.h" #include "xbzrle.h" #include "ram.h" #include "migration.h" @@ -3540,6 +3541,13 @@ static int ram_load_setup(QEMUFile *f, void *opaque) static int ram_load_cleanup(void *opaque) { RAMBlock *rb; + +RAMBLOCK_FOREACH_MIGRATABLE(rb) { +if (ramblock_is_pmem(rb)) { +pmem_persist(rb->host, rb->used_length); +} +} + xbzrle_load_cleanup(); compress_threads_load_cleanup(); @@ -3900,7 +3908,7 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id) static bool ram_has_postcopy(void *opaque) { RAMBlock *rb; -RAMBLOCK_FOREACH(rb) { +RAMBLOCK_FOREACH_MIGRATABLE(rb) { if (ramblock_is_pmem(rb)) { info_report("Block: %s, host: %p is a nvdimm memory, postcopy" "is not supported now!", rb->idstr, rb->host); diff --git a/stubs/pmem.c b/stubs/pmem.c index b4ec72d..f794262 100644 --- a/stubs/pmem.c +++ b/stubs/pmem.c @@ -17,3 +17,7 @@ void *pmem_memcpy_persist(void *pmemdest, const void *src, size_t len) { return memcpy(pmemdest, src, len); } + +void pmem_persist(const void *addr, 
size_t len) +{ +} -- 2.7.4
[Qemu-devel] [PATCH 6/7 V10] migration/ram: Add check and info message to nvdimm post copy.
From: Junyan He NVDIMM memory does not support postcopy yet. We disable postcopy if any NVDIMM memory is present and print a log message to inform the user. Signed-off-by: Junyan He Reviewed-by: Stefan Hajnoczi Reviewed-by: Igor Mammedov --- migration/ram.c | 9 + 1 file changed, 9 insertions(+) diff --git a/migration/ram.c b/migration/ram.c index 52dd678..fd7cba1 100644 --- a/migration/ram.c +++ b/migration/ram.c @@ -3899,6 +3899,15 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id) static bool ram_has_postcopy(void *opaque) { +RAMBlock *rb; +RAMBLOCK_FOREACH(rb) { +if (ramblock_is_pmem(rb)) { +info_report("Block: %s, host: %p is a nvdimm memory, postcopy" + "is not supported now!", rb->idstr, rb->host); +return false; +} +} + return migrate_postcopy_ram(); } -- 2.7.4
[Qemu-devel] [PATCH 3/7 V10] configure: add libpmem support
From: Junyan He Add a pair of configure options --{enable,disable}-libpmem to control whether QEMU is compiled with PMDK libpmem [1]. QEMU may write to the host persistent memory (e.g. in vNVDIMM label emulation and live migration), so it must take the proper operations to ensure the persistence of its own writes. Depending on the CPU models and available instructions, the optimal operation can vary [2]. PMDK libpmem has already implemented those operations on multiple CPU models (x86 and ARM) and the logic to select the optimal ones, so QEMU can just use libpmem rather than re-implement them. Libpmem is part of the PMDK project (formerly known as NVML). The project's home page is: http://pmem.io/pmdk/ And the project's repository is: https://github.com/pmem/pmdk/ For more information about libpmem APIs, you can refer to the comments in the source code of: pmdk/src/libpmem/pmem.c, beginning at line 33. Signed-off-by: Junyan He Signed-off-by: Haozhong Zhang Reviewed-by: Stefan Hajnoczi Reviewed-by: Igor Mammedov --- configure | 29 + 1 file changed, 29 insertions(+) diff --git a/configure b/configure index 2a7796e..1c9288b 100755 --- a/configure +++ b/configure @@ -475,6 +475,7 @@ vxhs="" libxml2="" docker="no" debug_mutex="no" +libpmem="" # cross compilers defaults, can be overridden with --cross-cc-ARCH cross_cc_aarch64="aarch64-linux-gnu-gcc" @@ -1435,6 +1436,10 @@ for opt do ;; --disable-debug-mutex) debug_mutex=no ;; + --enable-libpmem) libpmem=yes + ;; + --disable-libpmem) libpmem=no + ;; *) echo "ERROR: unknown option $opt" echo "Try '$0 --help' for more information" @@ -1710,6 +1715,7 @@ disabled with --disable-FEATURE, default is enabled if available: vhost-user vhost-user support capstonecapstone disassembler support debug-mutex mutex debugging support + libpmem libpmem support NOTE: The object files are built at the place where configure is launched EOF @@ -5546,6 +5552,24 @@ if has "docker"; then fi ## +# check for libpmem + +if test "$libpmem" != "no"; then + if 
$pkg_config --exists "libpmem"; then + libpmem="yes" + libpmem_libs=$($pkg_config --libs libpmem) + libpmem_cflags=$($pkg_config --cflags libpmem) + libs_softmmu="$libs_softmmu $libpmem_libs" + QEMU_CFLAGS="$QEMU_CFLAGS $libpmem_cflags" + else + if test "$libpmem" = "yes" ; then + feature_not_found "libpmem" "Install nvml or pmdk" + fi + libpmem="no" + fi +fi + +## # End of CC checks # After here, no more $cc or $ld runs @@ -6010,6 +6034,7 @@ echo "replication support $replication" echo "VxHS block device $vxhs" echo "capstone $capstone" echo "docker$docker" +echo "libpmem support $libpmem" if test "$sdl_too_old" = "yes"; then echo "-> Your SDL version is too old - please upgrade to have SDL support" @@ -6763,6 +6788,10 @@ if test "$vxhs" = "yes" ; then echo "VXHS_LIBS=$vxhs_libs" >> $config_host_mak fi +if test "$libpmem" = "yes" ; then + echo "CONFIG_LIBPMEM=y" >> $config_host_mak +fi + if test "$tcg_interpreter" = "yes"; then QEMU_INCLUDES="-iquote \$(SRC_PATH)/tcg/tci $QEMU_INCLUDES" elif test "$ARCH" = "sparc64" ; then -- 2.7.4
[Qemu-devel] [PATCH 5/7 V10] mem/nvdimm: ensure write persistence to PMEM in label emulation
From: Junyan He Guest writes to vNVDIMM labels are intercepted and performed on the backend by QEMU. When the backend is a real persistent memory, QEMU needs to take proper operations to ensure its write persistence on the persistent memory. Otherwise, a host power failure may result in the loss of guest label configurations. Signed-off-by: Haozhong Zhang Reviewed-by: Stefan Hajnoczi Reviewed-by: Igor Mammedov --- hw/mem/nvdimm.c | 9 - include/qemu/pmem.h | 23 +++ stubs/Makefile.objs | 1 + stubs/pmem.c| 19 +++ 4 files changed, 51 insertions(+), 1 deletion(-) create mode 100644 include/qemu/pmem.h create mode 100644 stubs/pmem.c diff --git a/hw/mem/nvdimm.c b/hw/mem/nvdimm.c index 021d1c3..1c6674c 100644 --- a/hw/mem/nvdimm.c +++ b/hw/mem/nvdimm.c @@ -23,6 +23,7 @@ */ #include "qemu/osdep.h" +#include "qemu/pmem.h" #include "qapi/error.h" #include "qapi/visitor.h" #include "hw/mem/nvdimm.h" @@ -164,11 +165,17 @@ static void nvdimm_write_label_data(NVDIMMDevice *nvdimm, const void *buf, { MemoryRegion *mr; PCDIMMDevice *dimm = PC_DIMM(nvdimm); +bool is_pmem = object_property_get_bool(OBJECT(dimm->hostmem), +"pmem", NULL); uint64_t backend_offset; nvdimm_validate_rw_label_data(nvdimm, size, offset); -memcpy(nvdimm->label_data + offset, buf, size); +if (!is_pmem) { +memcpy(nvdimm->label_data + offset, buf, size); +} else { +pmem_memcpy_persist(nvdimm->label_data + offset, buf, size); +} mr = host_memory_backend_get_memory(dimm->hostmem); backend_offset = memory_region_size(mr) - nvdimm->label_size + offset; diff --git a/include/qemu/pmem.h b/include/qemu/pmem.h new file mode 100644 index 000..00d6680 --- /dev/null +++ b/include/qemu/pmem.h @@ -0,0 +1,23 @@ +/* + * QEMU header file for libpmem. + * + * Copyright (c) 2018 Intel Corporation. + * + * Author: Haozhong Zhang + * + * This work is licensed under the terms of the GNU GPL, version 2 or later. + * See the COPYING file in the top-level directory. 
+ */ + +#ifndef QEMU_PMEM_H +#define QEMU_PMEM_H + +#ifdef CONFIG_LIBPMEM +#include +#else /* !CONFIG_LIBPMEM */ + +void *pmem_memcpy_persist(void *pmemdest, const void *src, size_t len); + +#endif /* CONFIG_LIBPMEM */ + +#endif /* !QEMU_PMEM_H */ diff --git a/stubs/Makefile.objs b/stubs/Makefile.objs index 53d3f32..be9a042 100644 --- a/stubs/Makefile.objs +++ b/stubs/Makefile.objs @@ -43,3 +43,4 @@ stub-obj-y += xen-common.o stub-obj-y += xen-hvm.o stub-obj-y += pci-host-piix.o stub-obj-y += ram-block.o +stub-obj-$(call lnot,$(CONFIG_LIBPMEM)) += pmem.o \ No newline at end of file diff --git a/stubs/pmem.c b/stubs/pmem.c new file mode 100644 index 000..b4ec72d --- /dev/null +++ b/stubs/pmem.c @@ -0,0 +1,19 @@ +/* + * Stubs for libpmem. + * + * Copyright (c) 2018 Intel Corporation. + * + * Author: Haozhong Zhang + * + * This work is licensed under the terms of the GNU GPL, version 2 or later. + * See the COPYING file in the top-level directory. + */ + +#include + +#include "qemu/pmem.h" + +void *pmem_memcpy_persist(void *pmemdest, const void *src, size_t len) +{ +return memcpy(pmemdest, src, len); +} -- 2.7.4
[Qemu-devel] [PATCH 0/7 V10] nvdimm: guarantee persistence of QEMU writes to persistent memory
From: Junyan He QEMU writes to vNVDIMM backends in the vNVDIMM label emulation and live migration. If the backend is on the persistent memory, QEMU needs to take proper operations to ensure its writes are persistent on the persistent memory. Otherwise, a host power failure may result in the loss of the guest data on the persistent memory. This patch series is based on Marcel's patch "mem: add share parameter to memory-backend-ram" [1] because of the changes in patch 1. [1] https://lists.gnu.org/archive/html/qemu-devel/2018-02/msg03858.html Previous versions of this patch series can be found at: v9: https://lists.gnu.org/archive/html/qemu-devel/2018-07/msg02361.html v8: https://lists.gnu.org/archive/html/qemu-devel/2018-07/msg02279.html v7: https://lists.gnu.org/archive/html/qemu-devel/2018-06/msg02997.html v6: https://lists.gnu.org/archive/html/qemu-devel/2018-06/msg00061.html v5: https://lists.gnu.org/archive/html/qemu-devel/2018-05/msg02258.html V4: https://lists.gnu.org/archive/html/qemu-devel/2018-02/msg06993.html v3: https://lists.gnu.org/archive/html/qemu-devel/2018-02/msg04365.html v2: https://lists.gnu.org/archive/html/qemu-devel/2018-02/msg01579.html v1: https://lists.gnu.org/archive/html/qemu-devel/2017-12/msg05040.html Changes in v10: * (Patch 4) Fix a nit in nvdimm docs about pmem option usage in command line The v10 patch set is all reviewed by Igor Mammedov Changes in v9: * (Patch 3 and Patch 4) Reorder these two patches to make the logic right. Firstly add libpmem support, and then we can use libpmem's configure check result. Also fix some typos and grammar issues in these two patches. Changes in v8: * (Patch 3) Report an error when the user sets 'pmem' on a file backend while QEMU lacks libpmem support. In this case, we cannot ensure the persistence of the file backend, so we choose to fail rather than continue and make things more confusing. 
Changes in v7: The v6 patch set has already been reviewed by Stefan Hajnoczi No logic change in this v7 version, just: * Spelling check and some document words refined. * Rebase to "ram is migratable" patch set. Changes in v6: * (Patch 1) Expose all ram block flags rather than redefine the flags. * (Patch 4) Use pkg-config rather than a hard-coded check in configure. * (Patch 7) Sync and flush all the pmem data when migration completes, rather than syncing pages one by one as in the previous version. Changes in v5: * (Patch 9) Add post copy check and output some messages for nvdimm. Changes in v4: * (Patch 2) Fix compilation errors found by patchew. Changes in v3: * (Patch 5) Add an is_pmem flag to ram_handle_compressed() and handle PMEM writes in it, so we don't need the _common function. * (Patch 6) Expose qemu_get_buffer_common so we can remove the unnecessary qemu_get_buffer_to_pmem wrapper. * (Patch 8) Add an is_pmem flag to xbzrle_decode_buffer() and handle PMEM writes in it, so we can remove the unnecessary xbzrle_decode_buffer_{common, to_pmem}. * Move libpmem stubs to stubs/pmem.c and fix the compilation failures of test-{xbzrle,vmstate}.c. Changes in v2: * (Patch 1) Use a flags parameter in file ram allocation functions. * (Patch 2) Add a new option 'pmem' to hostmem-file. * (Patch 3) Use libpmem to operate on the persistent memory, rather than re-implementing those operations in QEMU. * (Patch 5-8) Consider the write persistence in the migration path. Junyan: [1/7] memory, exec: Expose all memory block related flags. [6/7] migration/ram: Add check and info message to nvdimm post copy. [7/7] migration/ram: ensure write persistence on loading all data to PMEM. 
Haozhong: [5/7] mem/nvdimm: ensure write persistence to PMEM in label emulation Haozhong & Junyan: [2/7] memory, exec: switch file ram allocation functions to 'flags' parameters [3/7] configure: add libpmem support [4/7] hostmem-file: add the 'pmem' option -- backends/hostmem-file.c | 42 +- configure | 29 + docs/nvdimm.txt | 22 ++ exec.c | 38 +- hw/mem/nvdimm.c | 9 - include/exec/memory.h | 31 +-- include/exec/ram_addr.h | 28 ++-- include/qemu/pmem.h | 24 memory.c| 8 +--- migration/ram.c | 17 + numa.c | 2 +- qemu-options.hx | 7 +++ stubs/Makefile.objs | 1 + stubs/pmem.c| 23 +++ 14 files changed, 246 insertions(+), 35 deletions(-) create mode 100644 include/qemu/pmem.h create mode 100644 stubs/pmem.c -- 2.7.4
[Qemu-devel] [PATCH 4/7 V10] hostmem-file: add the 'pmem' option
From: Junyan He When QEMU emulates vNVDIMM labels and migrates vNVDIMM devices, it needs to know whether the backend storage is a real persistent memory, in order to decide whether special operations should be performed to ensure the data persistence. This boolean option 'pmem' allows users to specify whether the backend storage of memory-backend-file is a real persistent memory. If 'pmem=on', QEMU will set the flag RAM_PMEM in the RAM block of the corresponding memory region. If 'pmem' is set while libpmem support is lacking, an error is generated. Signed-off-by: Junyan He Signed-off-by: Haozhong Zhang Reviewed-by: Stefan Hajnoczi Reviewed-by: Igor Mammedov --- backends/hostmem-file.c | 41 - docs/nvdimm.txt | 22 ++ exec.c | 8 include/exec/memory.h | 4 include/exec/ram_addr.h | 3 +++ qemu-options.hx | 7 +++ 6 files changed, 84 insertions(+), 1 deletion(-) diff --git a/backends/hostmem-file.c b/backends/hostmem-file.c index 34c68bb..b1a2453 100644 --- a/backends/hostmem-file.c +++ b/backends/hostmem-file.c @@ -12,6 +12,7 @@ #include "qemu/osdep.h" #include "qapi/error.h" #include "qemu-common.h" +#include "qemu/error-report.h" #include "sysemu/hostmem.h" #include "sysemu/sysemu.h" #include "qom/object_interfaces.h" @@ -34,6 +35,7 @@ struct HostMemoryBackendFile { bool discard_data; char *mem_path; uint64_t align; +bool is_pmem; }; static void @@ -59,7 +61,8 @@ file_backend_memory_alloc(HostMemoryBackend *backend, Error **errp) memory_region_init_ram_from_file(&backend->mr, OBJECT(backend), path, backend->size, fb->align, - backend->share ? RAM_SHARED : 0, + (backend->share ? RAM_SHARED : 0) | + (fb->is_pmem ? 
RAM_PMEM : 0), fb->mem_path, errp); g_free(path); } @@ -131,6 +134,39 @@ static void file_memory_backend_set_align(Object *o, Visitor *v, error_propagate(errp, local_err); } +static bool file_memory_backend_get_pmem(Object *o, Error **errp) +{ +return MEMORY_BACKEND_FILE(o)->is_pmem; +} + +static void file_memory_backend_set_pmem(Object *o, bool value, Error **errp) +{ +HostMemoryBackend *backend = MEMORY_BACKEND(o); +HostMemoryBackendFile *fb = MEMORY_BACKEND_FILE(o); + +if (host_memory_backend_mr_inited(backend)) { +error_setg(errp, "cannot change property 'pmem' of %s '%s'", + object_get_typename(o), + object_get_canonical_path_component(o)); +return; +} + +#ifndef CONFIG_LIBPMEM +if (value) { +Error *local_err = NULL; +error_setg(&local_err, + "Lack of libpmem support while setting the 'pmem=on'" + " of %s '%s'. We can't ensure data persistence.", + object_get_typename(o), + object_get_canonical_path_component(o)); +error_propagate(errp, local_err); +return; +} +#endif + +fb->is_pmem = value; +} + static void file_backend_unparent(Object *obj) { HostMemoryBackend *backend = MEMORY_BACKEND(obj); @@ -162,6 +198,9 @@ file_backend_class_init(ObjectClass *oc, void *data) file_memory_backend_get_align, file_memory_backend_set_align, NULL, NULL, &error_abort); +object_class_property_add_bool(oc, "pmem", +file_memory_backend_get_pmem, file_memory_backend_set_pmem, +&error_abort); } static void file_backend_instance_finalize(Object *o) diff --git a/docs/nvdimm.txt b/docs/nvdimm.txt index 24b443b..48754d2 100644 --- a/docs/nvdimm.txt +++ b/docs/nvdimm.txt @@ -173,3 +173,25 @@ There are currently two valid values for this option: the NVDIMMs in the event of power loss. This implies that the platform also supports flushing dirty data through the memory controller on power loss. 
+ +If the vNVDIMM backend is on the host persistent memory that can be +accessed in SNIA NVM Programming Model [1] (e.g., Intel NVDIMM), it's +suggested to set the 'pmem' option of memory-backend-file to 'on'. When +'pmem' is 'on' and QEMU is built with libpmem [2] support (configured with +--enable-libpmem), QEMU will take necessary operations to guarantee the +persistence of its own writes to the vNVDIMM backend (e.g., in vNVDIMM label +emulation and live migration). If 'pmem' is 'on' while there is no libpmem +support, QEMU will exit and report a "lack of libpmem
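Putting the new option together with the usual vNVDIMM setup, a backend on a DAX-mounted pmem filesystem might be configured like this (paths, sizes, and the align value are illustrative assumptions, not taken from the patch):

```
qemu-system-x86_64 \
  -machine pc,nvdimm \
  -m 4G,slots=2,maxmem=8G \
  -object memory-backend-file,id=mem1,share=on,mem-path=/mnt/pmem0/nvdimm.img,size=1G,align=2M,pmem=on \
  -device nvdimm,id=nvdimm1,memdev=mem1
```

With pmem=on, QEMU's own writes to mem1 (label updates, migration loads) go through libpmem's persistence primitives instead of plain stores.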
[Qemu-devel] [PATCH 1/7 V10] memory, exec: Expose all memory block related flags.
From: Junyan He We need to use these flags in other files rather than just in exec.c. For example, RAM_SHARED should be used when creating a RAM block from a file. We expose them in exec/memory.h Signed-off-by: Junyan He Reviewed-by: Stefan Hajnoczi Reviewed-by: Igor Mammedov --- exec.c| 20 include/exec/memory.h | 20 2 files changed, 20 insertions(+), 20 deletions(-) diff --git a/exec.c b/exec.c index 4f5df07..cc042dc 100644 --- a/exec.c +++ b/exec.c @@ -87,26 +87,6 @@ AddressSpace address_space_memory; MemoryRegion io_mem_rom, io_mem_notdirty; static MemoryRegion io_mem_unassigned; - -/* RAM is pre-allocated and passed into qemu_ram_alloc_from_ptr */ -#define RAM_PREALLOC (1 << 0) - -/* RAM is mmap-ed with MAP_SHARED */ -#define RAM_SHARED (1 << 1) - -/* Only a portion of RAM (used_length) is actually used, and migrated. - * This used_length size can change across reboots. - */ -#define RAM_RESIZEABLE (1 << 2) - -/* UFFDIO_ZEROPAGE is available on this RAMBlock to atomically - * zero the page and wake waiting processes. - * (Set during postcopy) - */ -#define RAM_UF_ZEROPAGE (1 << 3) - -/* RAM can be migrated */ -#define RAM_MIGRATABLE (1 << 4) #endif #ifdef TARGET_PAGE_BITS_VARY diff --git a/include/exec/memory.h b/include/exec/memory.h index 448d41a..6d0af29 100644 --- a/include/exec/memory.h +++ b/include/exec/memory.h @@ -103,6 +103,26 @@ struct IOMMUNotifier { }; typedef struct IOMMUNotifier IOMMUNotifier; +/* RAM is pre-allocated and passed into qemu_ram_alloc_from_ptr */ +#define RAM_PREALLOC (1 << 0) + +/* RAM is mmap-ed with MAP_SHARED */ +#define RAM_SHARED (1 << 1) + +/* Only a portion of RAM (used_length) is actually used, and migrated. + * This used_length size can change across reboots. + */ +#define RAM_RESIZEABLE (1 << 2) + +/* UFFDIO_ZEROPAGE is available on this RAMBlock to atomically + * zero the page and wake waiting processes. 
+ * (Set during postcopy) + */ +#define RAM_UF_ZEROPAGE (1 << 3) + +/* RAM can be migrated */ +#define RAM_MIGRATABLE (1 << 4) + static inline void iommu_notifier_init(IOMMUNotifier *n, IOMMUNotify fn, IOMMUNotifierFlag flags, hwaddr start, hwaddr end, -- 2.7.4
[Qemu-devel] [PATCH 2/7 V10] memory, exec: switch file ram allocation functions to 'flags' parameters
From: Junyan He As more flag parameters besides the existing 'share' are going to be added to following functions memory_region_init_ram_from_file qemu_ram_alloc_from_fd qemu_ram_alloc_from_file let's switch them to use the 'flags' parameters so as to ease future flag additions. The existing 'share' flag is converted to the RAM_SHARED bit in ram_flags, and other flag bits are ignored by above functions right now. Signed-off-by: Junyan He Signed-off-by: Haozhong Zhang Reviewed-by: Stefan Hajnoczi Reviewed-by: Igor Mammedov --- backends/hostmem-file.c | 3 ++- exec.c | 10 +- include/exec/memory.h | 7 +-- include/exec/ram_addr.h | 25 +++-- memory.c| 8 +--- numa.c | 2 +- 6 files changed, 41 insertions(+), 14 deletions(-) diff --git a/backends/hostmem-file.c b/backends/hostmem-file.c index 134b08d..34c68bb 100644 --- a/backends/hostmem-file.c +++ b/backends/hostmem-file.c @@ -58,7 +58,8 @@ file_backend_memory_alloc(HostMemoryBackend *backend, Error **errp) path = object_get_canonical_path(OBJECT(backend)); memory_region_init_ram_from_file(&backend->mr, OBJECT(backend), path, - backend->size, fb->align, backend->share, + backend->size, fb->align, + backend->share ? RAM_SHARED : 0, fb->mem_path, errp); g_free(path); } diff --git a/exec.c b/exec.c index cc042dc..1ec539d 100644 --- a/exec.c +++ b/exec.c @@ -2238,7 +2238,7 @@ static void ram_block_add(RAMBlock *new_block, Error **errp, bool shared) #ifdef __linux__ RAMBlock *qemu_ram_alloc_from_fd(ram_addr_t size, MemoryRegion *mr, - bool share, int fd, + uint64_t ram_flags, int fd, Error **errp) { RAMBlock *new_block; @@ -2280,14 +2280,14 @@ RAMBlock *qemu_ram_alloc_from_fd(ram_addr_t size, MemoryRegion *mr, new_block->mr = mr; new_block->used_length = size; new_block->max_length = size; -new_block->flags = share ? 
RAM_SHARED : 0; +new_block->flags = ram_flags; new_block->host = file_ram_alloc(new_block, size, fd, !file_size, errp); if (!new_block->host) { g_free(new_block); return NULL; } -ram_block_add(new_block, &local_err, share); +ram_block_add(new_block, &local_err, ram_flags & RAM_SHARED); if (local_err) { g_free(new_block); error_propagate(errp, local_err); @@ -2299,7 +2299,7 @@ RAMBlock *qemu_ram_alloc_from_fd(ram_addr_t size, MemoryRegion *mr, RAMBlock *qemu_ram_alloc_from_file(ram_addr_t size, MemoryRegion *mr, - bool share, const char *mem_path, + uint64_t ram_flags, const char *mem_path, Error **errp) { int fd; @@ -2311,7 +2311,7 @@ RAMBlock *qemu_ram_alloc_from_file(ram_addr_t size, MemoryRegion *mr, return NULL; } -block = qemu_ram_alloc_from_fd(size, mr, share, fd, errp); +block = qemu_ram_alloc_from_fd(size, mr, ram_flags, fd, errp); if (!block) { if (created) { unlink(mem_path); diff --git a/include/exec/memory.h b/include/exec/memory.h index 6d0af29..513ec8d 100644 --- a/include/exec/memory.h +++ b/include/exec/memory.h @@ -640,6 +640,7 @@ void memory_region_init_resizeable_ram(MemoryRegion *mr, void *host), Error **errp); #ifdef __linux__ + /** * memory_region_init_ram_from_file: Initialize RAM memory region with a *mmap-ed backend. @@ -651,7 +652,9 @@ void memory_region_init_resizeable_ram(MemoryRegion *mr, * @size: size of the region. * @align: alignment of the region base address; if 0, the default alignment * (getpagesize()) will be used. - * @share: %true if memory must be mmaped with the MAP_SHARED flag + * @ram_flags: Memory region features: + * - RAM_SHARED: memory must be mmaped with the MAP_SHARED flag + * Other bits are ignored now. * @path: the path in which to allocate the RAM. * @errp: pointer to Error*, to store an error if it happens. * @@ -663,7 +666,7 @@ void memory_region_init_ram_from_file(MemoryRegion *mr, const char *name, uint64_t size, uint64_t align, - bool share, + uint64_t ram_flags, const char *path, Error **errp); di
[Qemu-devel] [PATCH 6/7 V9] migration/ram: Add check and info message to nvdimm post copy.
From: Junyan He NVDIMM memory does not support postcopy yet. We disable postcopy if NVDIMM memory is present and print a log hint to the user. Signed-off-by: Junyan He Reviewed-by: Stefan Hajnoczi Reviewed-by: Igor Mammedov --- migration/ram.c | 9 + 1 file changed, 9 insertions(+) diff --git a/migration/ram.c b/migration/ram.c index 1cd98d6..9c03e2b 100644 --- a/migration/ram.c +++ b/migration/ram.c @@ -3895,6 +3895,15 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id) static bool ram_has_postcopy(void *opaque) { +RAMBlock *rb; +RAMBLOCK_FOREACH(rb) { +if (ramblock_is_pmem(rb)) { +info_report("Block: %s, host: %p is a nvdimm memory, postcopy" + " is not supported now!", rb->idstr, rb->host); +return false; +} +} + return migrate_postcopy_ram(); } -- 2.7.4
[Qemu-devel] [PATCH 7/7 V9] migration/ram: ensure write persistence on loading all data to PMEM.
From: Junyan He Because we need to make sure the pmem kind memory data is synced after migration, we choose to call pmem_persist() when the migration finishes. This ensures the pmem data is safe and will not be lost if power is off. Signed-off-by: Junyan He Reviewed-by: Stefan Hajnoczi Reviewed-by: Igor Mammedov --- include/qemu/pmem.h | 1 + migration/ram.c | 10 +- stubs/pmem.c| 4 3 files changed, 14 insertions(+), 1 deletion(-) diff --git a/include/qemu/pmem.h b/include/qemu/pmem.h index 00d6680..8f52b08 100644 --- a/include/qemu/pmem.h +++ b/include/qemu/pmem.h @@ -17,6 +17,7 @@ #else /* !CONFIG_LIBPMEM */ void *pmem_memcpy_persist(void *pmemdest, const void *src, size_t len); +void pmem_persist(const void *addr, size_t len); #endif /* CONFIG_LIBPMEM */ diff --git a/migration/ram.c b/migration/ram.c index 9c03e2b..62dfe75 100644 --- a/migration/ram.c +++ b/migration/ram.c @@ -33,6 +33,7 @@ #include "qemu/bitops.h" #include "qemu/bitmap.h" #include "qemu/main-loop.h" +#include "qemu/pmem.h" #include "xbzrle.h" #include "ram.h" #include "migration.h" @@ -3541,6 +3542,13 @@ static int ram_load_setup(QEMUFile *f, void *opaque) static int ram_load_cleanup(void *opaque) { RAMBlock *rb; + +RAMBLOCK_FOREACH_MIGRATABLE(rb) { +if (ramblock_is_pmem(rb)) { +pmem_persist(rb->host, rb->used_length); +} +} + xbzrle_load_cleanup(); compress_threads_load_cleanup(); @@ -3896,7 +3904,7 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id) static bool ram_has_postcopy(void *opaque) { RAMBlock *rb; -RAMBLOCK_FOREACH(rb) { +RAMBLOCK_FOREACH_MIGRATABLE(rb) { if (ramblock_is_pmem(rb)) { info_report("Block: %s, host: %p is a nvdimm memory, postcopy" " is not supported now!", rb->idstr, rb->host); diff --git a/stubs/pmem.c b/stubs/pmem.c index b4ec72d..f794262 100644 --- a/stubs/pmem.c +++ b/stubs/pmem.c @@ -17,3 +17,7 @@ void *pmem_memcpy_persist(void *pmemdest, const void *src, size_t len) { return memcpy(pmemdest, src, len); } + +void pmem_persist(const void *addr, 
size_t len) +{ +} -- 2.7.4
[Qemu-devel] [PATCH 4/7 V9] hostmem-file: add the 'pmem' option
From: Junyan He When QEMU emulates vNVDIMM labels and migrates vNVDIMM devices, it needs to know whether the backend storage is a real persistent memory, in order to decide whether special operations should be performed to ensure the data persistence. This boolean option 'pmem' allows users to specify whether the backend storage of memory-backend-file is a real persistent memory. If 'pmem=on', QEMU will set the flag RAM_PMEM in the RAM block of the corresponding memory region. If 'pmem' is set while libpmem support is lacking, an error is generated. Signed-off-by: Junyan He Signed-off-by: Haozhong Zhang Reviewed-by: Stefan Hajnoczi --- backends/hostmem-file.c | 41 - docs/nvdimm.txt | 22 ++ exec.c | 8 include/exec/memory.h | 4 include/exec/ram_addr.h | 3 +++ qemu-options.hx | 7 +++ 6 files changed, 84 insertions(+), 1 deletion(-) diff --git a/backends/hostmem-file.c b/backends/hostmem-file.c index 34c68bb..b1a2453 100644 --- a/backends/hostmem-file.c +++ b/backends/hostmem-file.c @@ -12,6 +12,7 @@ #include "qemu/osdep.h" #include "qapi/error.h" #include "qemu-common.h" +#include "qemu/error-report.h" #include "sysemu/hostmem.h" #include "sysemu/sysemu.h" #include "qom/object_interfaces.h" @@ -34,6 +35,7 @@ struct HostMemoryBackendFile { bool discard_data; char *mem_path; uint64_t align; +bool is_pmem; }; static void @@ -59,7 +61,8 @@ file_backend_memory_alloc(HostMemoryBackend *backend, Error **errp) memory_region_init_ram_from_file(&backend->mr, OBJECT(backend), path, backend->size, fb->align, - backend->share ? RAM_SHARED : 0, + (backend->share ? RAM_SHARED : 0) | + (fb->is_pmem ? 
RAM_PMEM : 0), fb->mem_path, errp); g_free(path); } @@ -131,6 +134,39 @@ static void file_memory_backend_set_align(Object *o, Visitor *v, error_propagate(errp, local_err); } +static bool file_memory_backend_get_pmem(Object *o, Error **errp) +{ +return MEMORY_BACKEND_FILE(o)->is_pmem; +} + +static void file_memory_backend_set_pmem(Object *o, bool value, Error **errp) +{ +HostMemoryBackend *backend = MEMORY_BACKEND(o); +HostMemoryBackendFile *fb = MEMORY_BACKEND_FILE(o); + +if (host_memory_backend_mr_inited(backend)) { +error_setg(errp, "cannot change property 'pmem' of %s '%s'", + object_get_typename(o), + object_get_canonical_path_component(o)); +return; +} + +#ifndef CONFIG_LIBPMEM +if (value) { +Error *local_err = NULL; +error_setg(&local_err, + "Lack of libpmem support while setting the 'pmem=on'" + " of %s '%s'. We can't ensure data persistence.", + object_get_typename(o), + object_get_canonical_path_component(o)); +error_propagate(errp, local_err); +return; +} +#endif + +fb->is_pmem = value; +} + static void file_backend_unparent(Object *obj) { HostMemoryBackend *backend = MEMORY_BACKEND(obj); @@ -162,6 +198,9 @@ file_backend_class_init(ObjectClass *oc, void *data) file_memory_backend_get_align, file_memory_backend_set_align, NULL, NULL, &error_abort); +object_class_property_add_bool(oc, "pmem", +file_memory_backend_get_pmem, file_memory_backend_set_pmem, +&error_abort); } static void file_backend_instance_finalize(Object *o) diff --git a/docs/nvdimm.txt b/docs/nvdimm.txt index 24b443b..8c83baf 100644 --- a/docs/nvdimm.txt +++ b/docs/nvdimm.txt @@ -173,3 +173,25 @@ There are currently two valid values for this option: the NVDIMMs in the event of power loss. This implies that the platform also supports flushing dirty data through the memory controller on power loss. 
+ +If the vNVDIMM backend is on the host persistent memory that can be +accessed in SNIA NVM Programming Model [1] (e.g., Intel NVDIMM), it's +suggested to set the 'pmem' option of memory-backend-file to 'on'. When +'pmem' is 'on' and QEMU is built with libpmem [2] support (configured with +--enable-libpmem), QEMU will take necessary operations to guarantee the +persistence of its own writes to the vNVDIMM backend (e.g., in vNVDIMM label +emulation and live migration). If 'pmem' is 'on' while there is no libpmem +support, QEMU will exit and report a "lack of libpmem support" message to +ens
[Qemu-devel] [PATCH 5/7 V9] mem/nvdimm: ensure write persistence to PMEM in label emulation
From: Junyan He Guest writes to vNVDIMM labels are intercepted and performed on the backend by QEMU. When the backend is a real persistent memory, QEMU needs to take proper operations to ensure its write persistence on the persistent memory. Otherwise, a host power failure may result in the loss of guest label configurations. Signed-off-by: Haozhong Zhang Reviewed-by: Stefan Hajnoczi Reviewed-by: Igor Mammedov --- hw/mem/nvdimm.c | 9 - include/qemu/pmem.h | 23 +++ stubs/Makefile.objs | 1 + stubs/pmem.c| 19 +++ 4 files changed, 51 insertions(+), 1 deletion(-) create mode 100644 include/qemu/pmem.h create mode 100644 stubs/pmem.c diff --git a/hw/mem/nvdimm.c b/hw/mem/nvdimm.c index 021d1c3..1c6674c 100644 --- a/hw/mem/nvdimm.c +++ b/hw/mem/nvdimm.c @@ -23,6 +23,7 @@ */ #include "qemu/osdep.h" +#include "qemu/pmem.h" #include "qapi/error.h" #include "qapi/visitor.h" #include "hw/mem/nvdimm.h" @@ -164,11 +165,17 @@ static void nvdimm_write_label_data(NVDIMMDevice *nvdimm, const void *buf, { MemoryRegion *mr; PCDIMMDevice *dimm = PC_DIMM(nvdimm); +bool is_pmem = object_property_get_bool(OBJECT(dimm->hostmem), +"pmem", NULL); uint64_t backend_offset; nvdimm_validate_rw_label_data(nvdimm, size, offset); -memcpy(nvdimm->label_data + offset, buf, size); +if (!is_pmem) { +memcpy(nvdimm->label_data + offset, buf, size); +} else { +pmem_memcpy_persist(nvdimm->label_data + offset, buf, size); +} mr = host_memory_backend_get_memory(dimm->hostmem); backend_offset = memory_region_size(mr) - nvdimm->label_size + offset; diff --git a/include/qemu/pmem.h b/include/qemu/pmem.h new file mode 100644 index 000..00d6680 --- /dev/null +++ b/include/qemu/pmem.h @@ -0,0 +1,23 @@ +/* + * QEMU header file for libpmem. + * + * Copyright (c) 2018 Intel Corporation. + * + * Author: Haozhong Zhang + * + * This work is licensed under the terms of the GNU GPL, version 2 or later. + * See the COPYING file in the top-level directory. 
+ */ + +#ifndef QEMU_PMEM_H +#define QEMU_PMEM_H + +#ifdef CONFIG_LIBPMEM +#include <libpmem.h> +#else /* !CONFIG_LIBPMEM */ + +void *pmem_memcpy_persist(void *pmemdest, const void *src, size_t len); + +#endif /* CONFIG_LIBPMEM */ + +#endif /* !QEMU_PMEM_H */ diff --git a/stubs/Makefile.objs b/stubs/Makefile.objs index 53d3f32..be9a042 100644 --- a/stubs/Makefile.objs +++ b/stubs/Makefile.objs @@ -43,3 +43,4 @@ stub-obj-y += xen-common.o stub-obj-y += xen-hvm.o stub-obj-y += pci-host-piix.o stub-obj-y += ram-block.o +stub-obj-$(call lnot,$(CONFIG_LIBPMEM)) += pmem.o \ No newline at end of file diff --git a/stubs/pmem.c b/stubs/pmem.c new file mode 100644 index 000..b4ec72d --- /dev/null +++ b/stubs/pmem.c @@ -0,0 +1,19 @@ +/* + * Stubs for libpmem. + * + * Copyright (c) 2018 Intel Corporation. + * + * Author: Haozhong Zhang + * + * This work is licensed under the terms of the GNU GPL, version 2 or later. + * See the COPYING file in the top-level directory. + */ + +#include <string.h> + +#include "qemu/pmem.h" + +void *pmem_memcpy_persist(void *pmemdest, const void *src, size_t len) +{ +return memcpy(pmemdest, src, len); +} -- 2.7.4
[Qemu-devel] [PATCH 2/7 V9] memory, exec: switch file ram allocation functions to 'flags' parameters
From: Junyan He As more flag parameters besides the existing 'share' are going to be added to following functions memory_region_init_ram_from_file qemu_ram_alloc_from_fd qemu_ram_alloc_from_file let's switch them to use the 'flags' parameters so as to ease future flag additions. The existing 'share' flag is converted to the RAM_SHARED bit in ram_flags, and other flag bits are ignored by above functions right now. Signed-off-by: Junyan He Signed-off-by: Haozhong Zhang Reviewed-by: Stefan Hajnoczi Reviewed-by: Igor Mammedov --- backends/hostmem-file.c | 3 ++- exec.c | 10 +- include/exec/memory.h | 7 +-- include/exec/ram_addr.h | 25 +++-- memory.c| 8 +--- numa.c | 2 +- 6 files changed, 41 insertions(+), 14 deletions(-) diff --git a/backends/hostmem-file.c b/backends/hostmem-file.c index 134b08d..34c68bb 100644 --- a/backends/hostmem-file.c +++ b/backends/hostmem-file.c @@ -58,7 +58,8 @@ file_backend_memory_alloc(HostMemoryBackend *backend, Error **errp) path = object_get_canonical_path(OBJECT(backend)); memory_region_init_ram_from_file(&backend->mr, OBJECT(backend), path, - backend->size, fb->align, backend->share, + backend->size, fb->align, + backend->share ? RAM_SHARED : 0, fb->mem_path, errp); g_free(path); } diff --git a/exec.c b/exec.c index cc042dc..1ec539d 100644 --- a/exec.c +++ b/exec.c @@ -2238,7 +2238,7 @@ static void ram_block_add(RAMBlock *new_block, Error **errp, bool shared) #ifdef __linux__ RAMBlock *qemu_ram_alloc_from_fd(ram_addr_t size, MemoryRegion *mr, - bool share, int fd, + uint64_t ram_flags, int fd, Error **errp) { RAMBlock *new_block; @@ -2280,14 +2280,14 @@ RAMBlock *qemu_ram_alloc_from_fd(ram_addr_t size, MemoryRegion *mr, new_block->mr = mr; new_block->used_length = size; new_block->max_length = size; -new_block->flags = share ? 
RAM_SHARED : 0; +new_block->flags = ram_flags; new_block->host = file_ram_alloc(new_block, size, fd, !file_size, errp); if (!new_block->host) { g_free(new_block); return NULL; } -ram_block_add(new_block, &local_err, share); +ram_block_add(new_block, &local_err, ram_flags & RAM_SHARED); if (local_err) { g_free(new_block); error_propagate(errp, local_err); @@ -2299,7 +2299,7 @@ RAMBlock *qemu_ram_alloc_from_fd(ram_addr_t size, MemoryRegion *mr, RAMBlock *qemu_ram_alloc_from_file(ram_addr_t size, MemoryRegion *mr, - bool share, const char *mem_path, + uint64_t ram_flags, const char *mem_path, Error **errp) { int fd; @@ -2311,7 +2311,7 @@ RAMBlock *qemu_ram_alloc_from_file(ram_addr_t size, MemoryRegion *mr, return NULL; } -block = qemu_ram_alloc_from_fd(size, mr, share, fd, errp); +block = qemu_ram_alloc_from_fd(size, mr, ram_flags, fd, errp); if (!block) { if (created) { unlink(mem_path); diff --git a/include/exec/memory.h b/include/exec/memory.h index 6d0af29..513ec8d 100644 --- a/include/exec/memory.h +++ b/include/exec/memory.h @@ -640,6 +640,7 @@ void memory_region_init_resizeable_ram(MemoryRegion *mr, void *host), Error **errp); #ifdef __linux__ + /** * memory_region_init_ram_from_file: Initialize RAM memory region with a *mmap-ed backend. @@ -651,7 +652,9 @@ void memory_region_init_resizeable_ram(MemoryRegion *mr, * @size: size of the region. * @align: alignment of the region base address; if 0, the default alignment * (getpagesize()) will be used. - * @share: %true if memory must be mmaped with the MAP_SHARED flag + * @ram_flags: Memory region features: + * - RAM_SHARED: memory must be mmaped with the MAP_SHARED flag + * Other bits are ignored now. * @path: the path in which to allocate the RAM. * @errp: pointer to Error*, to store an error if it happens. * @@ -663,7 +666,7 @@ void memory_region_init_ram_from_file(MemoryRegion *mr, const char *name, uint64_t size, uint64_t align, - bool share, + uint64_t ram_flags, const char *path, Error **errp); di
[Qemu-devel] [PATCH 3/7 V9] configure: add libpmem support
From: Junyan He Add a pair of configure options --{enable,disable}-libpmem to control whether QEMU is compiled with PMDK libpmem [1]. QEMU may write to the host persistent memory (e.g. in vNVDIMM label emulation and live migration), so it must take the proper operations to ensure the persistence of its own writes. Depending on the CPU models and available instructions, the optimal operation can vary [2]. PMDK libpmem has already implemented those operations on multiple CPU models (x86 and ARM) and the logic to select the optimal ones, so QEMU can just use libpmem rather than re-implement them. Libpmem is part of the PMDK project (formerly known as NVML). The project's home page is: http://pmem.io/pmdk/ And the project's repository is: https://github.com/pmem/pmdk/ For more information about libpmem APIs, you can refer to the comments in the source code of pmdk/src/libpmem/pmem.c, beginning at line 33. Signed-off-by: Junyan He Signed-off-by: Haozhong Zhang Reviewed-by: Stefan Hajnoczi Reviewed-by: Igor Mammedov --- configure | 29 + 1 file changed, 29 insertions(+) diff --git a/configure b/configure index 2a7796e..1c9288b 100755 --- a/configure +++ b/configure @@ -475,6 +475,7 @@ vxhs="" libxml2="" docker="no" debug_mutex="no" +libpmem="" # cross compilers defaults, can be overridden with --cross-cc-ARCH cross_cc_aarch64="aarch64-linux-gnu-gcc" @@ -1435,6 +1436,10 @@ for opt do ;; --disable-debug-mutex) debug_mutex=no ;; + --enable-libpmem) libpmem=yes + ;; + --disable-libpmem) libpmem=no + ;; *) echo "ERROR: unknown option $opt" echo "Try '$0 --help' for more information" @@ -1710,6 +1715,7 @@ disabled with --disable-FEATURE, default is enabled if available: vhost-user vhost-user support capstonecapstone disassembler support debug-mutex mutex debugging support + libpmem libpmem support NOTE: The object files are built at the place where configure is launched EOF @@ -5546,6 +5552,24 @@ if has "docker"; then fi ## +# check for libpmem + +if test "$libpmem" != "no"; then + if 
$pkg_config --exists "libpmem"; then + libpmem="yes" + libpmem_libs=$($pkg_config --libs libpmem) + libpmem_cflags=$($pkg_config --cflags libpmem) + libs_softmmu="$libs_softmmu $libpmem_libs" + QEMU_CFLAGS="$QEMU_CFLAGS $libpmem_cflags" + else + if test "$libpmem" = "yes" ; then + feature_not_found "libpmem" "Install nvml or pmdk" + fi + libpmem="no" + fi +fi + +## # End of CC checks # After here, no more $cc or $ld runs @@ -6010,6 +6034,7 @@ echo "replication support $replication" echo "VxHS block device $vxhs" echo "capstone $capstone" echo "docker$docker" +echo "libpmem support $libpmem" if test "$sdl_too_old" = "yes"; then echo "-> Your SDL version is too old - please upgrade to have SDL support" @@ -6763,6 +6788,10 @@ if test "$vxhs" = "yes" ; then echo "VXHS_LIBS=$vxhs_libs" >> $config_host_mak fi +if test "$libpmem" = "yes" ; then + echo "CONFIG_LIBPMEM=y" >> $config_host_mak +fi + if test "$tcg_interpreter" = "yes"; then QEMU_INCLUDES="-iquote \$(SRC_PATH)/tcg/tci $QEMU_INCLUDES" elif test "$ARCH" = "sparc64" ; then -- 2.7.4
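The configure hunk above follows QEMU's usual tri-state feature pattern: an empty variable means auto-detect, "yes" makes a missing library a hard error, and "no" skips the probe entirely. A minimal shell model of that decision logic (the probe function is a stand-in for the `pkg-config --exists libpmem` call, hard-wired here to "not found" so the sketch is self-contained):

```shell
#!/bin/sh
# stand-in for: $pkg_config --exists "libpmem"; pretend it is not installed
probe() { return 1; }

libpmem=""   # "" = auto-detect, yes = required, no = disabled

if test "$libpmem" != "no"; then
    if probe; then
        libpmem=yes
    else
        if test "$libpmem" = "yes"; then
            # --enable-libpmem was given but the library is missing: hard error
            echo "ERROR: libpmem requested but not found" >&2
            exit 1
        fi
        libpmem=no
    fi
fi
echo "libpmem support $libpmem"   # prints: libpmem support no
```

In the auto-detect case a missing library silently degrades to `libpmem=no`, which is exactly why patch 4 must then reject `pmem=on` at runtime: the build may legitimately lack persistence support.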
[Qemu-devel] [PATCH 0/7 V9] nvdimm: guarantee persistence of QEMU writes to persistent memory
From: Junyan He QEMU writes to vNVDIMM backends in the vNVDIMM label emulation and live migration. If the backend is on the persistent memory, QEMU needs to take proper operations to ensure its writes are persistent on the persistent memory. Otherwise, a host power failure may result in the loss of the guest data on the persistent memory. This patch series is based on Marcel's patch "mem: add share parameter to memory-backend-ram" [1] because of the changes in patch 1. [1] https://lists.gnu.org/archive/html/qemu-devel/2018-02/msg03858.html Previous versions of this patch series can be found at: v8: https://lists.gnu.org/archive/html/qemu-devel/2018-07/msg02279.html v7: https://lists.gnu.org/archive/html/qemu-devel/2018-06/msg02997.html v6: https://lists.gnu.org/archive/html/qemu-devel/2018-06/msg00061.html v5: https://lists.gnu.org/archive/html/qemu-devel/2018-05/msg02258.html V4: https://lists.gnu.org/archive/html/qemu-devel/2018-02/msg06993.html v3: https://lists.gnu.org/archive/html/qemu-devel/2018-02/msg04365.html v2: https://lists.gnu.org/archive/html/qemu-devel/2018-02/msg01579.html v1: https://lists.gnu.org/archive/html/qemu-devel/2017-12/msg05040.html Changes in v9: * (Patch 3 and Patch 4) Reorder these two patches to make the logic right. Firstly add libpmem support, and then we can use libpmem's configure check result. Also fix some typos and grammar issues in these two patches. Changes in v8: * (Patch 3) Report an error when the user sets 'pmem' on a file backend while QEMU lacks libpmem support. In this case, we cannot ensure the persistence of the file backend, so we choose to fail rather than continue and make things more confusing. Changes in v7: The v6 patch set has already been reviewed by Stefan Hajnoczi No logic change in this v7 version, just: * Spelling check and some document words refined. * Rebase to "ram is migratable" patch set. Changes in v6: * (Patch 1) Expose all ram block flags rather than redefine the flags. 
* (Patch 4) Use pkg-config rather than a hard-coded check in configure. * (Patch 7) Sync and flush all the pmem data when migration completes, rather than syncing pages one by one as in the previous version. Changes in v5: * (Patch 9) Add post copy check and output some messages for nvdimm. Changes in v4: * (Patch 2) Fix compilation errors found by patchew. Changes in v3: * (Patch 5) Add an is_pmem flag to ram_handle_compressed() and handle PMEM writes in it, so we don't need the _common function. * (Patch 6) Expose qemu_get_buffer_common so we can remove the unnecessary qemu_get_buffer_to_pmem wrapper. * (Patch 8) Add an is_pmem flag to xbzrle_decode_buffer() and handle PMEM writes in it, so we can remove the unnecessary xbzrle_decode_buffer_{common, to_pmem}. * Move libpmem stubs to stubs/pmem.c and fix the compilation failures of test-{xbzrle,vmstate}.c. Changes in v2: * (Patch 1) Use a flags parameter in file ram allocation functions. * (Patch 2) Add a new option 'pmem' to hostmem-file. * (Patch 3) Use libpmem to operate on the persistent memory, rather than re-implementing those operations in QEMU. * (Patch 5-8) Consider the write persistence in the migration path. Junyan: [1/7] memory, exec: Expose all memory block related flags. [6/7] migration/ram: Add check and info message to nvdimm post copy. [7/7] migration/ram: ensure write persistence on loading all data to PMEM. 
Haozhong: [5/7] mem/nvdimm: ensure write persistence to PMEM in label emulation Haozhong & Junyan: [2/7] memory, exec: switch file ram allocation functions to 'flags' parameters [3/7] configure: add libpmem support [4/7] hostmem-file: add the 'pmem' option Signed-off-by: Junyan He Signed-off-by: Haozhong Zhang Reviewed-by: Stefan Hajnoczi backends/hostmem-file.c | 42 +- configure | 29 + docs/nvdimm.txt | 22 ++ exec.c | 38 +- hw/mem/nvdimm.c | 9 - include/exec/memory.h | 31 +-- include/exec/ram_addr.h | 28 ++-- include/qemu/pmem.h | 24 memory.c| 8 +--- migration/ram.c | 17 + numa.c | 2 +- qemu-options.hx | 7 +++ stubs/Makefile.objs | 1 + stubs/pmem.c| 23 +++ 14 files changed, 246 insertions(+), 35 deletions(-) create mode 100644 include/qemu/pmem.h create mode 100644 stubs/pmem.c -- 2.7.4
[Qemu-devel] [PATCH 1/7 V9] memory, exec: Expose all memory block related flags.
From: Junyan He We need to use these flags in other files, not just in exec.c. For example, RAM_SHARED should be used when creating a RAM block from a file. We expose them in exec/memory.h. Signed-off-by: Junyan He Reviewed-by: Stefan Hajnoczi Reviewed-by: Igor Mammedov --- exec.c| 20 include/exec/memory.h | 20 2 files changed, 20 insertions(+), 20 deletions(-) diff --git a/exec.c b/exec.c index 4f5df07..cc042dc 100644 --- a/exec.c +++ b/exec.c @@ -87,26 +87,6 @@ AddressSpace address_space_memory; MemoryRegion io_mem_rom, io_mem_notdirty; static MemoryRegion io_mem_unassigned; - -/* RAM is pre-allocated and passed into qemu_ram_alloc_from_ptr */ -#define RAM_PREALLOC (1 << 0) - -/* RAM is mmap-ed with MAP_SHARED */ -#define RAM_SHARED (1 << 1) - -/* Only a portion of RAM (used_length) is actually used, and migrated. - * This used_length size can change across reboots. - */ -#define RAM_RESIZEABLE (1 << 2) - -/* UFFDIO_ZEROPAGE is available on this RAMBlock to atomically - * zero the page and wake waiting processes. - * (Set during postcopy) - */ -#define RAM_UF_ZEROPAGE (1 << 3) - -/* RAM can be migrated */ -#define RAM_MIGRATABLE (1 << 4) #endif #ifdef TARGET_PAGE_BITS_VARY diff --git a/include/exec/memory.h b/include/exec/memory.h index 448d41a..6d0af29 100644 --- a/include/exec/memory.h +++ b/include/exec/memory.h @@ -103,6 +103,26 @@ struct IOMMUNotifier { }; typedef struct IOMMUNotifier IOMMUNotifier; +/* RAM is pre-allocated and passed into qemu_ram_alloc_from_ptr */ +#define RAM_PREALLOC (1 << 0) + +/* RAM is mmap-ed with MAP_SHARED */ +#define RAM_SHARED (1 << 1) + +/* Only a portion of RAM (used_length) is actually used, and migrated. + * This used_length size can change across reboots. + */ +#define RAM_RESIZEABLE (1 << 2) + +/* UFFDIO_ZEROPAGE is available on this RAMBlock to atomically + * zero the page and wake waiting processes. 
+ * (Set during postcopy) + */ +#define RAM_UF_ZEROPAGE (1 << 3) + +/* RAM can be migrated */ +#define RAM_MIGRATABLE (1 << 4) + static inline void iommu_notifier_init(IOMMUNotifier *n, IOMMUNotify fn, IOMMUNotifierFlag flags, hwaddr start, hwaddr end, -- 2.7.4
[Qemu-devel] [PATCH 4/7 V8] configure: add libpmem support
From: Junyan He Add a pair of configure options --{enable,disable}-libpmem to control whether QEMU is compiled with PMDK libpmem [1]. QEMU may write to the host persistent memory (e.g. in vNVDIMM label emulation and live migration), so it must take the proper operations to ensure the persistence of its own writes. Depending on the CPU models and available instructions, the optimal operation can vary [2]. PMDK libpmem has already implemented those operations on multiple CPU models (x86 and ARM) and the logic to select the optimal ones, so QEMU can just use libpmem rather than re-implement them. [1] PMDK (formerly known as NVML), https://github.com/pmem/pmdk/ [2] https://github.com/pmem/pmdk/blob/38bfa652721a37fd94c0130ce0e3f5d8baa3ed40/src/libpmem/pmem.c#L33 Signed-off-by: Junyan He Signed-off-by: Haozhong Zhang Reviewed-by: Stefan Hajnoczi Reviewed-by: Igor Mammedov --- configure | 29 + 1 file changed, 29 insertions(+) diff --git a/configure b/configure index 2a7796e..1c9288b 100755 --- a/configure +++ b/configure @@ -475,6 +475,7 @@ vxhs="" libxml2="" docker="no" debug_mutex="no" +libpmem="" # cross compilers defaults, can be overridden with --cross-cc-ARCH cross_cc_aarch64="aarch64-linux-gnu-gcc" @@ -1435,6 +1436,10 @@ for opt do ;; --disable-debug-mutex) debug_mutex=no ;; + --enable-libpmem) libpmem=yes + ;; + --disable-libpmem) libpmem=no + ;; *) echo "ERROR: unknown option $opt" echo "Try '$0 --help' for more information" @@ -1710,6 +1715,7 @@ disabled with --disable-FEATURE, default is enabled if available: vhost-user vhost-user support capstonecapstone disassembler support debug-mutex mutex debugging support + libpmem libpmem support NOTE: The object files are built at the place where configure is launched EOF @@ -5546,6 +5552,24 @@ if has "docker"; then fi ## +# check for libpmem + +if test "$libpmem" != "no"; then + if $pkg_config --exists "libpmem"; then + libpmem="yes" + libpmem_libs=$($pkg_config --libs libpmem) + libpmem_cflags=$($pkg_config --cflags 
libpmem) + libs_softmmu="$libs_softmmu $libpmem_libs" + QEMU_CFLAGS="$QEMU_CFLAGS $libpmem_cflags" + else + if test "$libpmem" = "yes" ; then + feature_not_found "libpmem" "Install nvml or pmdk" + fi + libpmem="no" + fi +fi + +## # End of CC checks # After here, no more $cc or $ld runs @@ -6010,6 +6034,7 @@ echo "replication support $replication" echo "VxHS block device $vxhs" echo "capstone $capstone" echo "docker$docker" +echo "libpmem support $libpmem" if test "$sdl_too_old" = "yes"; then echo "-> Your SDL version is too old - please upgrade to have SDL support" @@ -6763,6 +6788,10 @@ if test "$vxhs" = "yes" ; then echo "VXHS_LIBS=$vxhs_libs" >> $config_host_mak fi +if test "$libpmem" = "yes" ; then + echo "CONFIG_LIBPMEM=y" >> $config_host_mak +fi + if test "$tcg_interpreter" = "yes"; then QEMU_INCLUDES="-iquote \$(SRC_PATH)/tcg/tci $QEMU_INCLUDES" elif test "$ARCH" = "sparc64" ; then -- 2.7.4
[Qemu-devel] [PATCH 3/7 V8] hostmem-file: add the 'pmem' option
From: Junyan He When QEMU emulates vNVDIMM labels and migrates vNVDIMM devices, it needs to know whether the backend storage is real persistent memory, in order to decide whether special operations should be performed to ensure data persistence. This boolean option 'pmem' allows users to specify whether the backend storage of memory-backend-file is real persistent memory. If 'pmem=on', QEMU will set the flag RAM_PMEM in the RAM block of the corresponding memory region. If 'pmem' is set while QEMU lacks libpmem support, an error is generated. Signed-off-by: Junyan He Signed-off-by: Haozhong Zhang Reviewed-by: Stefan Hajnoczi --- backends/hostmem-file.c | 42 +- docs/nvdimm.txt | 23 +++ exec.c | 9 + include/exec/memory.h | 4 include/exec/ram_addr.h | 3 +++ qemu-options.hx | 7 +++ 6 files changed, 87 insertions(+), 1 deletion(-) diff --git a/backends/hostmem-file.c b/backends/hostmem-file.c index 34c68bb..dbdaf17 100644 --- a/backends/hostmem-file.c +++ b/backends/hostmem-file.c @@ -12,6 +12,7 @@ #include "qemu/osdep.h" #include "qapi/error.h" #include "qemu-common.h" +#include "qemu/error-report.h" #include "sysemu/hostmem.h" #include "sysemu/sysemu.h" #include "qom/object_interfaces.h" @@ -34,6 +35,7 @@ struct HostMemoryBackendFile { bool discard_data; char *mem_path; uint64_t align; +bool is_pmem; }; static void @@ -59,7 +61,8 @@ file_backend_memory_alloc(HostMemoryBackend *backend, Error **errp) memory_region_init_ram_from_file(&backend->mr, OBJECT(backend), path, backend->size, fb->align, - backend->share ? RAM_SHARED : 0, + (backend->share ? RAM_SHARED : 0) | + (fb->is_pmem ? 
RAM_PMEM : 0), fb->mem_path, errp); g_free(path); } @@ -131,6 +134,40 @@ static void file_memory_backend_set_align(Object *o, Visitor *v, error_propagate(errp, local_err); } +static bool file_memory_backend_get_pmem(Object *o, Error **errp) +{ +return MEMORY_BACKEND_FILE(o)->is_pmem; +} + +static void file_memory_backend_set_pmem(Object *o, bool value, Error **errp) +{ +HostMemoryBackend *backend = MEMORY_BACKEND(o); +HostMemoryBackendFile *fb = MEMORY_BACKEND_FILE(o); + +if (host_memory_backend_mr_inited(backend)) { +error_setg(errp, "cannot change property 'pmem' of %s '%s'", + object_get_typename(o), + object_get_canonical_path_component(o)); +return; +} + +#ifndef CONFIG_LIBPMEM +if (value) { +Error *local_err = NULL; +error_setg(&local_err, + "Lack of libpmem support while setting the 'pmem=on'" + " of %s '%s'. We can not ensure the persistence of it" + " without libpmem support, this may cause serious" + " problems." , object_get_typename(o), + object_get_canonical_path_component(o)); +error_propagate(errp, local_err); +return; +} +#endif + +fb->is_pmem = value; +} + static void file_backend_unparent(Object *obj) { HostMemoryBackend *backend = MEMORY_BACKEND(obj); @@ -162,6 +199,9 @@ file_backend_class_init(ObjectClass *oc, void *data) file_memory_backend_get_align, file_memory_backend_set_align, NULL, NULL, &error_abort); +object_class_property_add_bool(oc, "pmem", +file_memory_backend_get_pmem, file_memory_backend_set_pmem, +&error_abort); } static void file_backend_instance_finalize(Object *o) diff --git a/docs/nvdimm.txt b/docs/nvdimm.txt index 24b443b..b8bb43a 100644 --- a/docs/nvdimm.txt +++ b/docs/nvdimm.txt @@ -173,3 +173,26 @@ There are currently two valid values for this option: the NVDIMMs in the event of power loss. This implies that the platform also supports flushing dirty data through the memory controller on power loss. + +guest software that this vNVDIMM device contains a region that cannot +accept persistent writes. 
As a result, for example, the guest Linux +NVDIMM driver marks such a vNVDIMM device as read-only. + +If the vNVDIMM backend is on the host persistent memory that can be +accessed in SNIA NVM Programming Model [1] (e.g., Intel NVDIMM), it's +suggested to set the 'pmem' option of memory-backend-file to 'on'. When +'pmem' is 'on' and QEMU is built with libpmem [2] support (configured with +--enable-libpmem), QEMU will take neces
[Qemu-devel] [PATCH 7/7 V8] migration/ram: ensure write persistence on loading all data to PMEM.
From: Junyan He Because we need to make sure the pmem kind of memory data is synced after migration, we call pmem_persist() when the migration finishes. This makes sure the pmem data is safe and will not be lost if power fails. Signed-off-by: Junyan He Reviewed-by: Stefan Hajnoczi Reviewed-by: Igor Mammedov --- include/qemu/pmem.h | 1 + migration/ram.c | 10 +- stubs/pmem.c| 4 3 files changed, 14 insertions(+), 1 deletion(-) diff --git a/include/qemu/pmem.h b/include/qemu/pmem.h index 00d6680..8f52b08 100644 --- a/include/qemu/pmem.h +++ b/include/qemu/pmem.h @@ -17,6 +17,7 @@ #else /* !CONFIG_LIBPMEM */ void *pmem_memcpy_persist(void *pmemdest, const void *src, size_t len); +void pmem_persist(const void *addr, size_t len); #endif /* CONFIG_LIBPMEM */ diff --git a/migration/ram.c b/migration/ram.c index 9c03e2b..62dfe75 100644 --- a/migration/ram.c +++ b/migration/ram.c @@ -33,6 +33,7 @@ #include "qemu/bitops.h" #include "qemu/bitmap.h" #include "qemu/main-loop.h" +#include "qemu/pmem.h" #include "xbzrle.h" #include "ram.h" #include "migration.h" @@ -3541,6 +3542,13 @@ static int ram_load_setup(QEMUFile *f, void *opaque) static int ram_load_cleanup(void *opaque) { RAMBlock *rb; + +RAMBLOCK_FOREACH_MIGRATABLE(rb) { +if (ramblock_is_pmem(rb)) { +pmem_persist(rb->host, rb->used_length); +} +} + xbzrle_load_cleanup(); compress_threads_load_cleanup(); @@ -3896,7 +3904,7 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id) static bool ram_has_postcopy(void *opaque) { RAMBlock *rb; -RAMBLOCK_FOREACH(rb) { +RAMBLOCK_FOREACH_MIGRATABLE(rb) { if (ramblock_is_pmem(rb)) { info_report("Block: %s, host: %p is a nvdimm memory, postcopy" "is not supported now!", rb->idstr, rb->host); diff --git a/stubs/pmem.c b/stubs/pmem.c index b4ec72d..f794262 100644 --- a/stubs/pmem.c +++ b/stubs/pmem.c @@ -17,3 +17,7 @@ void *pmem_memcpy_persist(void *pmemdest, const void *src, size_t len) { return memcpy(pmemdest, src, len); } + +void pmem_persist(const void *addr, 
size_t len) +{ +} -- 2.7.4
[Qemu-devel] [PATCH 1/7 V8] memory, exec: Expose all memory block related flags.
From: Junyan He We need to use these flags in other files, not just in exec.c. For example, RAM_SHARED should be used when creating a RAM block from a file. We expose them in exec/memory.h. Signed-off-by: Junyan He Reviewed-by: Stefan Hajnoczi Reviewed-by: Igor Mammedov --- exec.c| 20 include/exec/memory.h | 20 2 files changed, 20 insertions(+), 20 deletions(-) diff --git a/exec.c b/exec.c index 4f5df07..cc042dc 100644 --- a/exec.c +++ b/exec.c @@ -87,26 +87,6 @@ AddressSpace address_space_memory; MemoryRegion io_mem_rom, io_mem_notdirty; static MemoryRegion io_mem_unassigned; - -/* RAM is pre-allocated and passed into qemu_ram_alloc_from_ptr */ -#define RAM_PREALLOC (1 << 0) - -/* RAM is mmap-ed with MAP_SHARED */ -#define RAM_SHARED (1 << 1) - -/* Only a portion of RAM (used_length) is actually used, and migrated. - * This used_length size can change across reboots. - */ -#define RAM_RESIZEABLE (1 << 2) - -/* UFFDIO_ZEROPAGE is available on this RAMBlock to atomically - * zero the page and wake waiting processes. - * (Set during postcopy) - */ -#define RAM_UF_ZEROPAGE (1 << 3) - -/* RAM can be migrated */ -#define RAM_MIGRATABLE (1 << 4) #endif #ifdef TARGET_PAGE_BITS_VARY diff --git a/include/exec/memory.h b/include/exec/memory.h index 448d41a..6d0af29 100644 --- a/include/exec/memory.h +++ b/include/exec/memory.h @@ -103,6 +103,26 @@ struct IOMMUNotifier { }; typedef struct IOMMUNotifier IOMMUNotifier; +/* RAM is pre-allocated and passed into qemu_ram_alloc_from_ptr */ +#define RAM_PREALLOC (1 << 0) + +/* RAM is mmap-ed with MAP_SHARED */ +#define RAM_SHARED (1 << 1) + +/* Only a portion of RAM (used_length) is actually used, and migrated. + * This used_length size can change across reboots. + */ +#define RAM_RESIZEABLE (1 << 2) + +/* UFFDIO_ZEROPAGE is available on this RAMBlock to atomically + * zero the page and wake waiting processes. 
+ * (Set during postcopy) + */ +#define RAM_UF_ZEROPAGE (1 << 3) + +/* RAM can be migrated */ +#define RAM_MIGRATABLE (1 << 4) + static inline void iommu_notifier_init(IOMMUNotifier *n, IOMMUNotify fn, IOMMUNotifierFlag flags, hwaddr start, hwaddr end, -- 2.7.4
[Qemu-devel] [PATCH 5/7 V8] mem/nvdimm: ensure write persistence to PMEM in label emulation
From: Junyan He Guest writes to vNVDIMM labels are intercepted and performed on the backend by QEMU. When the backend is real persistent memory, QEMU needs to take proper operations to ensure its write persistence on the persistent memory. Otherwise, a host power failure may result in the loss of guest label configurations. Signed-off-by: Haozhong Zhang Reviewed-by: Stefan Hajnoczi Reviewed-by: Igor Mammedov --- hw/mem/nvdimm.c | 9 - include/qemu/pmem.h | 23 +++ stubs/Makefile.objs | 1 + stubs/pmem.c| 19 +++ 4 files changed, 51 insertions(+), 1 deletion(-) create mode 100644 include/qemu/pmem.h create mode 100644 stubs/pmem.c diff --git a/hw/mem/nvdimm.c b/hw/mem/nvdimm.c index 021d1c3..1c6674c 100644 --- a/hw/mem/nvdimm.c +++ b/hw/mem/nvdimm.c @@ -23,6 +23,7 @@ */ #include "qemu/osdep.h" +#include "qemu/pmem.h" #include "qapi/error.h" #include "qapi/visitor.h" #include "hw/mem/nvdimm.h" @@ -164,11 +165,17 @@ static void nvdimm_write_label_data(NVDIMMDevice *nvdimm, const void *buf, { MemoryRegion *mr; PCDIMMDevice *dimm = PC_DIMM(nvdimm); +bool is_pmem = object_property_get_bool(OBJECT(dimm->hostmem), +"pmem", NULL); uint64_t backend_offset; nvdimm_validate_rw_label_data(nvdimm, size, offset); -memcpy(nvdimm->label_data + offset, buf, size); +if (!is_pmem) { +memcpy(nvdimm->label_data + offset, buf, size); +} else { +pmem_memcpy_persist(nvdimm->label_data + offset, buf, size); +} mr = host_memory_backend_get_memory(dimm->hostmem); backend_offset = memory_region_size(mr) - nvdimm->label_size + offset; diff --git a/include/qemu/pmem.h b/include/qemu/pmem.h new file mode 100644 index 000..00d6680 --- /dev/null +++ b/include/qemu/pmem.h @@ -0,0 +1,23 @@ +/* + * QEMU header file for libpmem. + * + * Copyright (c) 2018 Intel Corporation. + * + * Author: Haozhong Zhang + * + * This work is licensed under the terms of the GNU GPL, version 2 or later. + * See the COPYING file in the top-level directory. 
+ */ + +#ifndef QEMU_PMEM_H +#define QEMU_PMEM_H + +#ifdef CONFIG_LIBPMEM +#include +#else /* !CONFIG_LIBPMEM */ + +void *pmem_memcpy_persist(void *pmemdest, const void *src, size_t len); + +#endif /* CONFIG_LIBPMEM */ + +#endif /* !QEMU_PMEM_H */ diff --git a/stubs/Makefile.objs b/stubs/Makefile.objs index 53d3f32..be9a042 100644 --- a/stubs/Makefile.objs +++ b/stubs/Makefile.objs @@ -43,3 +43,4 @@ stub-obj-y += xen-common.o stub-obj-y += xen-hvm.o stub-obj-y += pci-host-piix.o stub-obj-y += ram-block.o +stub-obj-$(call lnot,$(CONFIG_LIBPMEM)) += pmem.o \ No newline at end of file diff --git a/stubs/pmem.c b/stubs/pmem.c new file mode 100644 index 000..b4ec72d --- /dev/null +++ b/stubs/pmem.c @@ -0,0 +1,19 @@ +/* + * Stubs for libpmem. + * + * Copyright (c) 2018 Intel Corporation. + * + * Author: Haozhong Zhang + * + * This work is licensed under the terms of the GNU GPL, version 2 or later. + * See the COPYING file in the top-level directory. + */ + +#include + +#include "qemu/pmem.h" + +void *pmem_memcpy_persist(void *pmemdest, const void *src, size_t len) +{ +return memcpy(pmemdest, src, len); +} -- 2.7.4
[Qemu-devel] [PATCH 0/7 V8] nvdimm: guarantee persistence of QEMU writes to persistent memory
From: Junyan He QEMU writes to vNVDIMM backends in the vNVDIMM label emulation and live migration. If the backend is on the persistent memory, QEMU needs to take proper operations to ensure its writes are persistent on the persistent memory. Otherwise, a host power failure may result in the loss of the guest data on the persistent memory. This patch series is based on Marcel's patch "mem: add share parameter to memory-backend-ram" [1] because of the changes in patch 1. [1] https://lists.gnu.org/archive/html/qemu-devel/2018-02/msg03858.html Previous versions can be found at: v7: https://lists.gnu.org/archive/html/qemu-devel/2018-06/msg02997.html v6: https://lists.gnu.org/archive/html/qemu-devel/2018-06/msg00061.html v5: https://lists.gnu.org/archive/html/qemu-devel/2018-05/msg02258.html v4: https://lists.gnu.org/archive/html/qemu-devel/2018-02/msg06993.html v3: https://lists.gnu.org/archive/html/qemu-devel/2018-02/msg04365.html v2: https://lists.gnu.org/archive/html/qemu-devel/2018-02/msg01579.html v1: https://lists.gnu.org/archive/html/qemu-devel/2017-12/msg05040.html Changes in v8: * (Patch 3) Report an error when the user sets 'pmem' on a file backend while QEMU lacks libpmem support. In this case, we cannot ensure the persistence of the file backend, so we choose to fail rather than continue and make things more confusing. Changes in v7: The v6 patch set has already been reviewed by Stefan Hajnoczi. No logic change in this v7 version, just: * Spelling checks and some documentation wording refined. * Rebase to the "ram is migratable" patch set. Changes in v6: * (Patch 1) Expose all ram block flags rather than redefine the flags. * (Patch 4) Use pkg-config rather than a hard-coded check in configure. * (Patch 7) Sync and flush all the pmem data when migration completes, rather than syncing pages one by one as in the previous version. Changes in v5: * (Patch 9) Add post copy check and output some messages for nvdimm. Changes in v4: * (Patch 2) Fix compilation errors found by patchew. 
Changes in v3: * (Patch 5) Add an is_pmem flag to ram_handle_compressed() and handle PMEM writes in it, so we don't need the _common function. * (Patch 6) Expose qemu_get_buffer_common so we can remove the unnecessary qemu_get_buffer_to_pmem wrapper. * (Patch 8) Add an is_pmem flag to xbzrle_decode_buffer() and handle PMEM writes in it, so we can remove the unnecessary xbzrle_decode_buffer_{common, to_pmem}. * Move libpmem stubs to stubs/pmem.c and fix the compilation failures of test-{xbzrle,vmstate}.c. Changes in v2: * (Patch 1) Use a flags parameter in file ram allocation functions. * (Patch 2) Add a new option 'pmem' to hostmem-file. * (Patch 3) Use libpmem to operate on the persistent memory, rather than re-implementing those operations in QEMU. * (Patch 5-8) Consider the write persistence in the migration path. Junyan: [1/7] memory, exec: Expose all memory block related flags. [6/7] migration/ram: Add check and info message to nvdimm post copy. [7/7] migration/ram: ensure write persistence on loading all data to PMEM. Haozhong: [5/7] mem/nvdimm: ensure write persistence to PMEM in label emulation Haozhong & Junyan: [2/7] memory, exec: switch file ram allocation functions to 'flags' parameters [3/7] hostmem-file: add the 'pmem' option [4/7] configure: add libpmem support Signed-off-by: Haozhong Zhang Signed-off-by: Junyan He Reviewed-by: Stefan Hajnoczi backends/hostmem-file.c | 43 ++- configure | 29 + docs/nvdimm.txt | 23 +++ exec.c | 39 ++- hw/mem/nvdimm.c | 9 - include/exec/memory.h | 31 +-- include/exec/ram_addr.h | 28 ++-- include/qemu/pmem.h | 24 memory.c| 8 +--- migration/ram.c | 17 + numa.c | 2 +- qemu-options.hx | 7 +++ stubs/Makefile.objs | 1 + stubs/pmem.c| 23 +++ 14 files changed, 249 insertions(+), 35 deletions(-) create mode 100644 include/qemu/pmem.h create mode 100644 stubs/pmem.c -- 2.7.4
[Qemu-devel] [PATCH 2/7 V8] memory, exec: switch file ram allocation functions to 'flags' parameters
From: Junyan He As more flag parameters besides the existing 'share' are going to be added to following functions memory_region_init_ram_from_file qemu_ram_alloc_from_fd qemu_ram_alloc_from_file let's switch them to use the 'flags' parameters so as to ease future flag additions. The existing 'share' flag is converted to the RAM_SHARED bit in ram_flags, and other flag bits are ignored by above functions right now. Signed-off-by: Junyan He Signed-off-by: Haozhong Zhang Reviewed-by: Stefan Hajnoczi Reviewed-by: Igor Mammedov --- backends/hostmem-file.c | 3 ++- exec.c | 10 +- include/exec/memory.h | 7 +-- include/exec/ram_addr.h | 25 +++-- memory.c| 8 +--- numa.c | 2 +- 6 files changed, 41 insertions(+), 14 deletions(-) diff --git a/backends/hostmem-file.c b/backends/hostmem-file.c index 134b08d..34c68bb 100644 --- a/backends/hostmem-file.c +++ b/backends/hostmem-file.c @@ -58,7 +58,8 @@ file_backend_memory_alloc(HostMemoryBackend *backend, Error **errp) path = object_get_canonical_path(OBJECT(backend)); memory_region_init_ram_from_file(&backend->mr, OBJECT(backend), path, - backend->size, fb->align, backend->share, + backend->size, fb->align, + backend->share ? RAM_SHARED : 0, fb->mem_path, errp); g_free(path); } diff --git a/exec.c b/exec.c index cc042dc..1ec539d 100644 --- a/exec.c +++ b/exec.c @@ -2238,7 +2238,7 @@ static void ram_block_add(RAMBlock *new_block, Error **errp, bool shared) #ifdef __linux__ RAMBlock *qemu_ram_alloc_from_fd(ram_addr_t size, MemoryRegion *mr, - bool share, int fd, + uint64_t ram_flags, int fd, Error **errp) { RAMBlock *new_block; @@ -2280,14 +2280,14 @@ RAMBlock *qemu_ram_alloc_from_fd(ram_addr_t size, MemoryRegion *mr, new_block->mr = mr; new_block->used_length = size; new_block->max_length = size; -new_block->flags = share ? 
RAM_SHARED : 0; +new_block->flags = ram_flags; new_block->host = file_ram_alloc(new_block, size, fd, !file_size, errp); if (!new_block->host) { g_free(new_block); return NULL; } -ram_block_add(new_block, &local_err, share); +ram_block_add(new_block, &local_err, ram_flags & RAM_SHARED); if (local_err) { g_free(new_block); error_propagate(errp, local_err); @@ -2299,7 +2299,7 @@ RAMBlock *qemu_ram_alloc_from_fd(ram_addr_t size, MemoryRegion *mr, RAMBlock *qemu_ram_alloc_from_file(ram_addr_t size, MemoryRegion *mr, - bool share, const char *mem_path, + uint64_t ram_flags, const char *mem_path, Error **errp) { int fd; @@ -2311,7 +2311,7 @@ RAMBlock *qemu_ram_alloc_from_file(ram_addr_t size, MemoryRegion *mr, return NULL; } -block = qemu_ram_alloc_from_fd(size, mr, share, fd, errp); +block = qemu_ram_alloc_from_fd(size, mr, ram_flags, fd, errp); if (!block) { if (created) { unlink(mem_path); diff --git a/include/exec/memory.h b/include/exec/memory.h index 6d0af29..513ec8d 100644 --- a/include/exec/memory.h +++ b/include/exec/memory.h @@ -640,6 +640,7 @@ void memory_region_init_resizeable_ram(MemoryRegion *mr, void *host), Error **errp); #ifdef __linux__ + /** * memory_region_init_ram_from_file: Initialize RAM memory region with a *mmap-ed backend. @@ -651,7 +652,9 @@ void memory_region_init_resizeable_ram(MemoryRegion *mr, * @size: size of the region. * @align: alignment of the region base address; if 0, the default alignment * (getpagesize()) will be used. - * @share: %true if memory must be mmaped with the MAP_SHARED flag + * @ram_flags: Memory region features: + * - RAM_SHARED: memory must be mmaped with the MAP_SHARED flag + * Other bits are ignored now. * @path: the path in which to allocate the RAM. * @errp: pointer to Error*, to store an error if it happens. * @@ -663,7 +666,7 @@ void memory_region_init_ram_from_file(MemoryRegion *mr, const char *name, uint64_t size, uint64_t align, - bool share, + uint64_t ram_flags, const char *path, Error **errp); di
[Qemu-devel] [PATCH 6/7 V8] migration/ram: Add check and info message to nvdimm post copy.
From: Junyan He The nvdimm kind of memory does not support post copy yet. We disable post copy if there is any nvdimm memory and print a log hint to the user. Signed-off-by: Junyan He Reviewed-by: Stefan Hajnoczi Reviewed-by: Igor Mammedov --- migration/ram.c | 9 + 1 file changed, 9 insertions(+) diff --git a/migration/ram.c b/migration/ram.c index 1cd98d6..9c03e2b 100644 --- a/migration/ram.c +++ b/migration/ram.c @@ -3895,6 +3895,15 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id) static bool ram_has_postcopy(void *opaque) { +RAMBlock *rb; +RAMBLOCK_FOREACH(rb) { +if (ramblock_is_pmem(rb)) { +info_report("Block: %s, host: %p is a nvdimm memory, postcopy" + " is not supported now!", rb->idstr, rb->host); +return false; +} +} + return migrate_postcopy_ram(); } -- 2.7.4
[Qemu-devel] [PATCH 3/7 V7] hostmem-file: add the 'pmem' option
From: Junyan He When QEMU emulates vNVDIMM labels and migrates vNVDIMM devices, it needs to know whether the backend storage is a real persistent memory, in order to decide whether special operations should be performed to ensure the data persistence. This boolean option 'pmem' allows users to specify whether the backend storage of memory-backend-file is a real persistent memory. If 'pmem=on', QEMU will set the flag RAM_PMEM in the RAM block of the corresponding memory region. Signed-off-by: Junyan He Signed-off-by: Haozhong Zhang Reviewed-by: Stefan Hajnoczi *RESEND: If pmem is on while we lack of libpmem support, we just make qemu exit and print some error message to user. This can prevent misusing of pmem parameter while we can not really ensure the persistence. --- backends/hostmem-file.c | 39 ++- docs/nvdimm.txt | 18 ++ exec.c | 9 + include/exec/memory.h | 4 include/exec/ram_addr.h | 3 +++ qemu-options.hx | 7 +++ 6 files changed, 79 insertions(+), 1 deletion(-) diff --git a/backends/hostmem-file.c b/backends/hostmem-file.c index 34c68bb..4607651 100644 --- a/backends/hostmem-file.c +++ b/backends/hostmem-file.c @@ -12,6 +12,7 @@ #include "qemu/osdep.h" #include "qapi/error.h" #include "qemu-common.h" +#include "qemu/error-report.h" #include "sysemu/hostmem.h" #include "sysemu/sysemu.h" #include "qom/object_interfaces.h" @@ -34,6 +35,7 @@ struct HostMemoryBackendFile { bool discard_data; char *mem_path; uint64_t align; +bool is_pmem; }; static void @@ -59,7 +61,8 @@ file_backend_memory_alloc(HostMemoryBackend *backend, Error **errp) memory_region_init_ram_from_file(&backend->mr, OBJECT(backend), path, backend->size, fb->align, - backend->share ? RAM_SHARED : 0, + (backend->share ? RAM_SHARED : 0) | + (fb->is_pmem ? 
RAM_PMEM : 0), fb->mem_path, errp); g_free(path); } @@ -131,6 +134,37 @@ static void file_memory_backend_set_align(Object *o, Visitor *v, error_propagate(errp, local_err); } +static bool file_memory_backend_get_pmem(Object *o, Error **errp) +{ +return MEMORY_BACKEND_FILE(o)->is_pmem; +} + +static void file_memory_backend_set_pmem(Object *o, bool value, Error **errp) +{ +HostMemoryBackend *backend = MEMORY_BACKEND(o); +HostMemoryBackendFile *fb = MEMORY_BACKEND_FILE(o); + +if (host_memory_backend_mr_inited(backend)) { +error_setg(errp, "cannot change property 'pmem' of %s '%s'", + object_get_typename(o), + object_get_canonical_path_component(o)); +return; +} + +#ifndef CONFIG_LIBPMEM +if (value) { +error_report("Lack of libpmem support while setting the 'pmem=on'" + " of %s '%s'. We can not ensure the persistence of it" + " without libpmem support, this may cause serious" + " problems." , object_get_typename(o), + object_get_canonical_path_component(o)); +exit(1); +} +#endif + +fb->is_pmem = value; +} + static void file_backend_unparent(Object *obj) { HostMemoryBackend *backend = MEMORY_BACKEND(obj); @@ -162,6 +196,9 @@ file_backend_class_init(ObjectClass *oc, void *data) file_memory_backend_get_align, file_memory_backend_set_align, NULL, NULL, &error_abort); +object_class_property_add_bool(oc, "pmem", +file_memory_backend_get_pmem, file_memory_backend_set_pmem, +&error_abort); } static void file_backend_instance_finalize(Object *o) diff --git a/docs/nvdimm.txt b/docs/nvdimm.txt index 8b48fb4..2f7d348 100644 --- a/docs/nvdimm.txt +++ b/docs/nvdimm.txt @@ -180,3 +180,21 @@ supports CPU Cache Flush and Memory Controller Flush on Power Loss, etc. For a complete list of the flags available and for more detailed descriptions, please consult the ACPI spec. + +guest software that this vNVDIMM device contains a region that cannot +accept persistent writes. In result, for example, the guest Linux +NVDIMM driver, marks such vNVDIMM device as read-only. 
+ +If the vNVDIMM backend is on the host persistent memory that can be +accessed in SNIA NVM Programming Model [1] (e.g., Intel NVDIMM), it's +suggested to set the 'pmem' option of memory-backend-file to 'on'. When +'pmem=on' and QEMU is built with libpmem [2] support (configured with +--enable-libpmem), QEMU will take necessary operations to guarantee +the persistence o
[Qemu-devel] [PATCH 3/7 V7 RESEND] hostmem-file: add the 'pmem' option
From: Junyan He When QEMU emulates vNVDIMM labels and migrates vNVDIMM devices, it needs to know whether the backend storage is a real persistent memory, in order to decide whether special operations should be performed to ensure the data persistence. This boolean option 'pmem' allows users to specify whether the backend storage of memory-backend-file is a real persistent memory. If 'pmem=on', QEMU will set the flag RAM_PMEM in the RAM block of the corresponding memory region. Signed-off-by: Junyan He Signed-off-by: Haozhong Zhang Reviewed-by: Stefan Hajnoczi --- backends/hostmem-file.c | 39 ++- docs/nvdimm.txt | 18 ++ exec.c | 9 + include/exec/memory.h | 4 include/exec/ram_addr.h | 3 +++ qemu-options.hx | 7 +++ 6 files changed, 79 insertions(+), 1 deletion(-) diff --git a/backends/hostmem-file.c b/backends/hostmem-file.c index 34c68bb..4607651 100644 --- a/backends/hostmem-file.c +++ b/backends/hostmem-file.c @@ -12,6 +12,7 @@ #include "qemu/osdep.h" #include "qapi/error.h" #include "qemu-common.h" +#include "qemu/error-report.h" #include "sysemu/hostmem.h" #include "sysemu/sysemu.h" #include "qom/object_interfaces.h" @@ -34,6 +35,7 @@ struct HostMemoryBackendFile { bool discard_data; char *mem_path; uint64_t align; +bool is_pmem; }; static void @@ -59,7 +61,8 @@ file_backend_memory_alloc(HostMemoryBackend *backend, Error **errp) memory_region_init_ram_from_file(&backend->mr, OBJECT(backend), path, backend->size, fb->align, - backend->share ? RAM_SHARED : 0, + (backend->share ? RAM_SHARED : 0) | + (fb->is_pmem ? 
RAM_PMEM : 0), fb->mem_path, errp); g_free(path); } @@ -131,6 +134,37 @@ static void file_memory_backend_set_align(Object *o, Visitor *v, error_propagate(errp, local_err); } +static bool file_memory_backend_get_pmem(Object *o, Error **errp) +{ +return MEMORY_BACKEND_FILE(o)->is_pmem; +} + +static void file_memory_backend_set_pmem(Object *o, bool value, Error **errp) +{ +HostMemoryBackend *backend = MEMORY_BACKEND(o); +HostMemoryBackendFile *fb = MEMORY_BACKEND_FILE(o); + +if (host_memory_backend_mr_inited(backend)) { +error_setg(errp, "cannot change property 'pmem' of %s '%s'", + object_get_typename(o), + object_get_canonical_path_component(o)); +return; +} + +#ifndef CONFIG_LIBPMEM +if (value) { +error_report("Lack of libpmem support while setting the 'pmem=on'" + " of %s '%s'. We can not ensure the persistence of it" + " without libpmem support, this may cause serious" + " problems." , object_get_typename(o), + object_get_canonical_path_component(o)); +exit(1); +} +#endif + +fb->is_pmem = value; +} + static void file_backend_unparent(Object *obj) { HostMemoryBackend *backend = MEMORY_BACKEND(obj); @@ -162,6 +196,9 @@ file_backend_class_init(ObjectClass *oc, void *data) file_memory_backend_get_align, file_memory_backend_set_align, NULL, NULL, &error_abort); +object_class_property_add_bool(oc, "pmem", +file_memory_backend_get_pmem, file_memory_backend_set_pmem, +&error_abort); } static void file_backend_instance_finalize(Object *o) diff --git a/docs/nvdimm.txt b/docs/nvdimm.txt index 8b48fb4..2f7d348 100644 --- a/docs/nvdimm.txt +++ b/docs/nvdimm.txt @@ -180,3 +180,21 @@ supports CPU Cache Flush and Memory Controller Flush on Power Loss, etc. For a complete list of the flags available and for more detailed descriptions, please consult the ACPI spec. + +guest software that this vNVDIMM device contains a region that cannot +accept persistent writes. In result, for example, the guest Linux +NVDIMM driver, marks such vNVDIMM device as read-only. 
+ +If the vNVDIMM backend is on the host persistent memory that can be +accessed in SNIA NVM Programming Model [1] (e.g., Intel NVDIMM), it's +suggested to set the 'pmem' option of memory-backend-file to 'on'. When +'pmem=on' and QEMU is built with libpmem [2] support (configured with +--enable-libpmem), QEMU will take necessary operations to guarantee +the persistence of its own writes to the vNVDIMM backend (e.g., in +vNVDIMM label emulation and live migration). + +References +-- + +[1] SNIA NVM Programming Model: https://www.snia.org/sites/default/files/technical_work/
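[Editor's note] The docs hunk above describes when to set 'pmem=on' on a memory-backend-file. As a concrete sketch of the configuration being discussed (paths, sizes, and ids below are illustrative, not from the patch; the option names follow docs/nvdimm.txt):

```shell
# A DAX-mapped file on host persistent memory backs the vNVDIMM;
# pmem=on asks QEMU (built with libpmem) to persist its own writes to it.
qemu-system-x86_64 -machine pc,nvdimm \
    -m 4G,slots=2,maxmem=8G \
    -object memory-backend-file,id=mem1,share=on,mem-path=/mnt/pmem0/nvdimm0,size=2G,pmem=on \
    -device nvdimm,id=nvdimm1,memdev=mem1
```

Without 'pmem=on', the backend is treated as ordinary file-backed RAM and no persistence operations are performed on QEMU's writes.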
Re: [Qemu-devel] [PATCH V6 0/7] nvdimm: guarantee persistence of QEMU writes to persistent memory
static void file_memory_backend_set_pmem(Object *o, bool value, Error **errp) { HostMemoryBackend *backend = MEMORY_BACKEND(o); HostMemoryBackendFile *fb = MEMORY_BACKEND_FILE(o); if (host_memory_backend_mr_inited(backend)) { error_setg(errp, "cannot change property 'pmem' of %s '%s'", object_get_typename(o), object_get_canonical_path_component(o)); return; } #ifndef CONFIG_LIBPMEM if (value) { warn_report("Lack of libpmem support while setting the 'pmem' of" " %s '%s'. We can not ensure the persistence of it" " without libpmem support.", object_get_typename(o), object_get_canonical_path_component(o)); } #endif fb->is_pmem = value; } Is this kind of hint or warning acceptable? From: Qemu-devel on behalf of Junyan He Sent: Tuesday, June 12, 2018 3:27:38 PM To: Igor Mammedov Cc: Haozhong Zhang; xiaoguangrong.e...@gmail.com; crosthwaite.pe...@gmail.com; m...@redhat.com; qemu-devel@nongnu.org; dgilb...@redhat.com; quint...@redhat.com; Junyan He; stefa...@redhat.com; pbonz...@redhat.com; r...@twiddle.net; ehabk...@redhat.com Subject: Re: [Qemu-devel] [PATCH V6 0/7] nvdimm: guarantee persistence of QEMU writes to persistent memory According to my understanding, a file on real persistent memory does not need to be pmem=on; pmem=on is a feature of the file backend class. For example, if we do not have enough hard disk space and we have enough pmem, we can use a file on pmem the same as a normal file-backend on hard disk. That is, a file-backend on pmem does not necessarily have to be an nvdimm backend; it can be a normal file mapping as well. So > detect that backing file is located on pmem storage may not be a good approach. > (1) Maybe we should error out if pmem=on but compiled without libpmem > and add 3rd state pmem=force for testing purposes. I think we can print a warning message to give the user a hint; it can really help when they mis-configure. 
From: Qemu-devel on behalf of Igor Mammedov Sent: Tuesday, June 12, 2018 2:55:46 PM To: Junyan He Cc: Haozhong Zhang; xiaoguangrong.e...@gmail.com; m...@redhat.com; crosthwaite.pe...@gmail.com; qemu-devel@nongnu.org; ehabk...@redhat.com; dgilb...@redhat.com; quint...@redhat.com; Junyan He; stefa...@redhat.com; pbonz...@redhat.com; r...@twiddle.net Subject: Re: [Qemu-devel] [PATCH V6 0/7] nvdimm: guarantee persistence of QEMU writes to persistent memory On Tue, 12 Jun 2018 13:38:08 + Junyan He wrote: > He have pmem_persist and pmem_memcpy_persist stub functions. > > If no libpmem and user really specify pmem=on, we just do nothing or just > memcpy. > > Real persistent memory always require libpmem support its load/save. > > If pmem=on and without libpmem, we can think that user want to imitate > > pmem=on while the HW environment is without real persistent memory existing. > > It may help debug on some machine without real pmem. For unaware user it would be easy to misconfigure and think that feature works while it isn't, which cloud lead to data loss. (1) Maybe we should error out if pmem=on but compiled without libpmem and add 3rd state pmem=force for testing purposes. Also can we detect that backing file is located on pmem storage and do [1] if it's not? > > > From: Qemu-devel on behalf > of Igor Mammedov > Sent: Tuesday, June 12, 2018 12:06:43 PM > To: junyan...@gmx.com > Cc: Haozhong Zhang; xiaoguangrong.e...@gmail.com; > crosthwaite.pe...@gmail.com; m...@redhat.com; qemu-devel@nongnu.org; > dgilb...@redhat.com; quint...@redhat.com; Junyan He; stefa...@redhat.com; > pbonz...@redhat.com; r...@twiddle.net; ehabk...@redhat.com > Subject: Re: [Qemu-devel] [PATCH V6 0/7] nvdimm: guarantee persistence of > QEMU writes to persistent memory > > On Fri, 1 Jun 2018 16:10:22 +0800 > junyan...@gmx.com wrote: > > > From: Junyan He > > > > QEMU writes to vNVDIMM backends in the vNVDIMM label emulation and > > live migration. 
If the backend is on the persistent memory, QEMU needs > > to take proper operations to ensure its writes persistent on the > > persistent memory. Otherwise, a host power failure may result in the > > loss the guest data on the persistent memory. > > extra question, what are expected behavior when QEMU is built without > libpmem and user specifies pmem=on for backend? > > > > > This v3 patch series is based on Marcel's patch "mem: add share > > parameter to memory-backend-ram" [1] because of the changes in patch 1. > > > > [1] https://lists.gnu.org/archive/html/qemu-devel/2018-02/msg03
Re: [Qemu-devel] [PATCH V6 0/7] nvdimm: guarantee persistence of QEMU writes to persistent memory
According to my understanding, a file on real persistent memory does not need to be pmem=on; pmem=on is a feature of the file backend class. For example, if we do not have enough hard disk space and we have enough pmem, we can use a file on pmem the same as a normal file-backend on hard disk. That is, a file-backend on pmem does not necessarily have to be an nvdimm backend; it can be a normal file mapping as well. So > detect that backing file is located on pmem storage may not be a good approach. > (1) Maybe we should error out if pmem=on but compiled without libpmem > and add 3rd state pmem=force for testing purposes. I think we can print a warning message to give the user a hint; it can really help when they mis-configure. From: Qemu-devel on behalf of Igor Mammedov Sent: Tuesday, June 12, 2018 2:55:46 PM To: Junyan He Cc: Haozhong Zhang; xiaoguangrong.e...@gmail.com; m...@redhat.com; crosthwaite.pe...@gmail.com; qemu-devel@nongnu.org; ehabk...@redhat.com; dgilb...@redhat.com; quint...@redhat.com; Junyan He; stefa...@redhat.com; pbonz...@redhat.com; r...@twiddle.net Subject: Re: [Qemu-devel] [PATCH V6 0/7] nvdimm: guarantee persistence of QEMU writes to persistent memory On Tue, 12 Jun 2018 13:38:08 +0000 Junyan He wrote: > He have pmem_persist and pmem_memcpy_persist stub functions. > > If no libpmem and user really specify pmem=on, we just do nothing or just > memcpy. > > Real persistent memory always require libpmem support its load/save. > > If pmem=on and without libpmem, we can think that user want to imitate > > pmem=on while the HW environment is without real persistent memory existing. > > It may help debug on some machine without real pmem. For unaware user it would be easy to misconfigure and think that feature works while it isn't, which could lead to data loss. (1) Maybe we should error out if pmem=on but compiled without libpmem and add 3rd state pmem=force for testing purposes. Also can we detect that backing file is located on pmem storage and do [1] if it's not? 
> > > From: Qemu-devel on behalf > of Igor Mammedov > Sent: Tuesday, June 12, 2018 12:06:43 PM > To: junyan...@gmx.com > Cc: Haozhong Zhang; xiaoguangrong.e...@gmail.com; > crosthwaite.pe...@gmail.com; m...@redhat.com; qemu-devel@nongnu.org; > dgilb...@redhat.com; quint...@redhat.com; Junyan He; stefa...@redhat.com; > pbonz...@redhat.com; r...@twiddle.net; ehabk...@redhat.com > Subject: Re: [Qemu-devel] [PATCH V6 0/7] nvdimm: guarantee persistence of > QEMU writes to persistent memory > > On Fri, 1 Jun 2018 16:10:22 +0800 > junyan...@gmx.com wrote: > > > From: Junyan He > > > > QEMU writes to vNVDIMM backends in the vNVDIMM label emulation and > > live migration. If the backend is on the persistent memory, QEMU needs > > to take proper operations to ensure its writes persistent on the > > persistent memory. Otherwise, a host power failure may result in the > > loss the guest data on the persistent memory. > > extra question, what are expected behavior when QEMU is built without > libpmem and user specifies pmem=on for backend? > > > > > This v3 patch series is based on Marcel's patch "mem: add share > > parameter to memory-backend-ram" [1] because of the changes in patch 1. > > > > [1] https://lists.gnu.org/archive/html/qemu-devel/2018-02/msg03858.html > > > > Previous versions can be found at > > v5: https://lists.gnu.org/archive/html/qemu-devel/2018-05/msg02258.html > > V4: https://lists.gnu.org/archive/html/qemu-devel/2018-02/msg06993.html > > v3: https://lists.gnu.org/archive/html/qemu-devel/2018-02/msg04365.html > > v2: https://lists.gnu.org/archive/html/qemu-devel/2018-02/msg01579.html > > v1: https://lists.gnu.org/archive/html/qemu-devel/2017-12/msg05040.html > > > > Changes in v6: > > * (Patch 1) Expose all ram block flags rather than redefine the flags. > > * (Patch 4) Use pkg-config rather the hard check when configure. > > * (Patch 7) Sync and flush all the pmem data when migration completes, > > rather than sync pages one by one in previous version. 
> > > > Changes in v5: > > * (Patch 9) Add post copy check and output some messages for nvdimm. > > > > Changes in v4: > > * (Patch 2) Fix compilation errors found by patchew. > > > > Changes in v3: > > * (Patch 5) Add a is_pmem flag to ram_handle_compressed() and handle > > PMEM writes in it, so we don't need the _common function. > > * (Patch 6) Expose qemu_get_buffer_common so we can remove the > > unnecessary qemu_get_buffer_to_pmem wrapper. > > * (Patch 8) Add a is_pmem flag to xbzrle_decode_buffer() and handle > >
Re: [Qemu-devel] [PATCH V6 0/7] nvdimm: guarantee persistence of QEMU writes to persistent memory
He have pmem_persist and pmem_memcpy_persist stub functions. If no libpmem and user really specify pmem=on, we just do nothing or just memcpy. Real persistent memory always require libpmem support its load/save. If pmem=on and without libpmem, we can think that user want to imitate pmem=on while the HW environment is without real persistent memory existing. It may help debug on some machine without real pmem. From: Qemu-devel on behalf of Igor Mammedov Sent: Tuesday, June 12, 2018 12:06:43 PM To: junyan...@gmx.com Cc: Haozhong Zhang; xiaoguangrong.e...@gmail.com; crosthwaite.pe...@gmail.com; m...@redhat.com; qemu-devel@nongnu.org; dgilb...@redhat.com; quint...@redhat.com; Junyan He; stefa...@redhat.com; pbonz...@redhat.com; r...@twiddle.net; ehabk...@redhat.com Subject: Re: [Qemu-devel] [PATCH V6 0/7] nvdimm: guarantee persistence of QEMU writes to persistent memory On Fri, 1 Jun 2018 16:10:22 +0800 junyan...@gmx.com wrote: > From: Junyan He > > QEMU writes to vNVDIMM backends in the vNVDIMM label emulation and > live migration. If the backend is on the persistent memory, QEMU needs > to take proper operations to ensure its writes persistent on the > persistent memory. Otherwise, a host power failure may result in the > loss the guest data on the persistent memory. extra question, what are expected behavior when QEMU is built without libpmem and user specifies pmem=on for backend? > > This v3 patch series is based on Marcel's patch "mem: add share > parameter to memory-backend-ram" [1] because of the changes in patch 1. 
> > [1] https://lists.gnu.org/archive/html/qemu-devel/2018-02/msg03858.html > > Previous versions can be found at > v5: https://lists.gnu.org/archive/html/qemu-devel/2018-05/msg02258.html > V4: https://lists.gnu.org/archive/html/qemu-devel/2018-02/msg06993.html > v3: https://lists.gnu.org/archive/html/qemu-devel/2018-02/msg04365.html > v2: https://lists.gnu.org/archive/html/qemu-devel/2018-02/msg01579.html > v1: https://lists.gnu.org/archive/html/qemu-devel/2017-12/msg05040.html > > Changes in v6: > * (Patch 1) Expose all ram block flags rather than redefine the flags. > * (Patch 4) Use pkg-config rather the hard check when configure. > * (Patch 7) Sync and flush all the pmem data when migration completes, > rather than sync pages one by one in previous version. > > Changes in v5: > * (Patch 9) Add post copy check and output some messages for nvdimm. > > Changes in v4: > * (Patch 2) Fix compilation errors found by patchew. > > Changes in v3: > * (Patch 5) Add a is_pmem flag to ram_handle_compressed() and handle > PMEM writes in it, so we don't need the _common function. > * (Patch 6) Expose qemu_get_buffer_common so we can remove the > unnecessary qemu_get_buffer_to_pmem wrapper. > * (Patch 8) Add a is_pmem flag to xbzrle_decode_buffer() and handle > PMEM writes in it, so we can remove the unnecessary > xbzrle_decode_buffer_{common, to_pmem}. > * Move libpmem stubs to stubs/pmem.c and fix the compilation failures > of test-{xbzrle,vmstate}.c. > > Changes in v2: > * (Patch 1) Use a flags parameter in file ram allocation functions. > * (Patch 2) Add a new option 'pmem' to hostmem-file. > * (Patch 3) Use libpmem to operate on the persistent memory, rather > than re-implementing those operations in QEMU. > * (Patch 5-8) Consider the write persistence in the migration path. > > > Junyan: > [1/7] memory, exec: Expose all memory block related flags. > [6/7] migration/ram: Add check and info message to nvdimm post copy. 
> [7/7] migration/ram: ensure write persistence on loading all date to PMEM. > > Haozhong: > [5/7] mem/nvdimm: ensure write persistence to PMEM in label emulation > > Haozhong & Junyan: > [2/7] memory, exec: switch file ram allocation functions to 'flags' parameters > [3/7] hostmem-file: add the 'pmem' option > [4/7] configure: add libpmem support > > > Signed-off-by: Haozhong Zhang > Signed-off-by: Junyan He > > --- > backends/hostmem-file.c | 28 +++- > configure | 29 + > docs/nvdimm.txt | 14 ++ > exec.c | 36 ++-- > hw/mem/nvdimm.c | 9 - > include/exec/memory.h | 31 +-- > include/exec/ram_addr.h | 28 ++-- > include/qemu/pmem.h | 24 > memory.c| 8 +--- > migration/ram.c | 18 ++ > numa.c | 2 +- > qemu-options.hx | 7 +++ > stubs/Makefile.objs | 1 + > stubs/pmem.c| 23 +++ > 14 files changed, 226 insertions(+), 32 deletions(-)
[Qemu-devel] [PATCH 7/7 V7] migration/ram: ensure write persistence on loading all data to PMEM.
From: Junyan He Because we need to make sure the pmem kind memory data is synced after migration, we choose to call pmem_persist() when the migration finish. This will make sure the data of pmem is safe and will not lose if power is off. Signed-off-by: Junyan He Reviewed-by: Stefan Hajnoczi --- include/qemu/pmem.h | 1 + migration/ram.c | 8 stubs/pmem.c| 4 3 files changed, 13 insertions(+) diff --git a/include/qemu/pmem.h b/include/qemu/pmem.h index 00d6680..8f52b08 100644 --- a/include/qemu/pmem.h +++ b/include/qemu/pmem.h @@ -17,6 +17,7 @@ #else /* !CONFIG_LIBPMEM */ void *pmem_memcpy_persist(void *pmemdest, const void *src, size_t len); +void pmem_persist(const void *addr, size_t len); #endif /* CONFIG_LIBPMEM */ diff --git a/migration/ram.c b/migration/ram.c index d9093a7..5603505 100644 --- a/migration/ram.c +++ b/migration/ram.c @@ -33,6 +33,7 @@ #include "qemu/bitops.h" #include "qemu/bitmap.h" #include "qemu/main-loop.h" +#include "qemu/pmem.h" #include "xbzrle.h" #include "ram.h" #include "migration.h" @@ -3065,6 +3066,13 @@ static int ram_load_setup(QEMUFile *f, void *opaque) static int ram_load_cleanup(void *opaque) { RAMBlock *rb; + +RAMBLOCK_FOREACH_MIGRATABLE(rb) { +if (ramblock_is_pmem(rb)) { +pmem_persist(rb->host, rb->used_length); +} +} + xbzrle_load_cleanup(); compress_threads_load_cleanup(); diff --git a/stubs/pmem.c b/stubs/pmem.c index b4ec72d..f794262 100644 --- a/stubs/pmem.c +++ b/stubs/pmem.c @@ -17,3 +17,7 @@ void *pmem_memcpy_persist(void *pmemdest, const void *src, size_t len) { return memcpy(pmemdest, src, len); } + +void pmem_persist(const void *addr, size_t len) +{ +} -- 2.7.4
[Qemu-devel] [PATCH 6/7 V7] migration/ram: Add check and info message to nvdimm post copy.
From: Junyan He The nvdimm kind memory does not support post copy now. We disable post copy if we have nvdimm memory and print some log hint to user. Signed-off-by: Junyan He Reviewed-by: Stefan Hajnoczi --- migration/ram.c | 9 + 1 file changed, 9 insertions(+) diff --git a/migration/ram.c b/migration/ram.c index a500015..d9093a7 100644 --- a/migration/ram.c +++ b/migration/ram.c @@ -3419,6 +3419,15 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id) static bool ram_has_postcopy(void *opaque) { +RAMBlock *rb; +RAMBLOCK_FOREACH(rb) { +if (ramblock_is_pmem(rb)) { +info_report("Block: %s, host: %p is a nvdimm memory, postcopy" + "is not supported now!", rb->idstr, rb->host); +return false; +} +} + return migrate_postcopy_ram(); } -- 2.7.4
[Qemu-devel] [PATCH 5/7 V7] mem/nvdimm: ensure write persistence to PMEM in label emulation
From: Junyan He Guest writes to vNVDIMM labels are intercepted and performed on the backend by QEMU. When the backend is a real persistent memory, QEMU needs to take proper operations to ensure its write persistence on the persistent memory. Otherwise, a host power failure may result in the loss of guest label configurations. Signed-off-by: Haozhong Zhang Reviewed-by: Stefan Hajnoczi --- hw/mem/nvdimm.c | 9 - include/qemu/pmem.h | 23 +++ stubs/Makefile.objs | 1 + stubs/pmem.c| 19 +++ 4 files changed, 51 insertions(+), 1 deletion(-) diff --git a/hw/mem/nvdimm.c b/hw/mem/nvdimm.c index 4087aca..03b478e 100644 --- a/hw/mem/nvdimm.c +++ b/hw/mem/nvdimm.c @@ -23,6 +23,7 @@ */ #include "qemu/osdep.h" +#include "qemu/pmem.h" #include "qapi/error.h" #include "qapi/visitor.h" #include "hw/mem/nvdimm.h" @@ -155,11 +156,17 @@ static void nvdimm_write_label_data(NVDIMMDevice *nvdimm, const void *buf, { MemoryRegion *mr; PCDIMMDevice *dimm = PC_DIMM(nvdimm); +bool is_pmem = object_property_get_bool(OBJECT(dimm->hostmem), +"pmem", NULL); uint64_t backend_offset; nvdimm_validate_rw_label_data(nvdimm, size, offset); -memcpy(nvdimm->label_data + offset, buf, size); +if (!is_pmem) { +memcpy(nvdimm->label_data + offset, buf, size); +} else { +pmem_memcpy_persist(nvdimm->label_data + offset, buf, size); +} mr = host_memory_backend_get_memory(dimm->hostmem, &error_abort); backend_offset = memory_region_size(mr) - nvdimm->label_size + offset; diff --git a/include/qemu/pmem.h b/include/qemu/pmem.h new file mode 100644 index 000..00d6680 --- /dev/null +++ b/include/qemu/pmem.h @@ -0,0 +1,23 @@ +/* + * QEMU header file for libpmem. + * + * Copyright (c) 2018 Intel Corporation. + * + * Author: Haozhong Zhang + * + * This work is licensed under the terms of the GNU GPL, version 2 or later. + * See the COPYING file in the top-level directory. 
+ */ + +#ifndef QEMU_PMEM_H +#define QEMU_PMEM_H + +#ifdef CONFIG_LIBPMEM +#include +#else /* !CONFIG_LIBPMEM */ + +void *pmem_memcpy_persist(void *pmemdest, const void *src, size_t len); + +#endif /* CONFIG_LIBPMEM */ + +#endif /* !QEMU_PMEM_H */ diff --git a/stubs/Makefile.objs b/stubs/Makefile.objs index 53d3f32..be9a042 100644 --- a/stubs/Makefile.objs +++ b/stubs/Makefile.objs @@ -43,3 +43,4 @@ stub-obj-y += xen-common.o stub-obj-y += xen-hvm.o stub-obj-y += pci-host-piix.o stub-obj-y += ram-block.o +stub-obj-$(call lnot,$(CONFIG_LIBPMEM)) += pmem.o \ No newline at end of file diff --git a/stubs/pmem.c b/stubs/pmem.c new file mode 100644 index 000..b4ec72d --- /dev/null +++ b/stubs/pmem.c @@ -0,0 +1,19 @@ +/* + * Stubs for libpmem. + * + * Copyright (c) 2018 Intel Corporation. + * + * Author: Haozhong Zhang + * + * This work is licensed under the terms of the GNU GPL, version 2 or later. + * See the COPYING file in the top-level directory. + */ + +#include + +#include "qemu/pmem.h" + +void *pmem_memcpy_persist(void *pmemdest, const void *src, size_t len) +{ +return memcpy(pmemdest, src, len); +} -- 2.7.4
[Qemu-devel] [PATCH 0/7 V7] nvdimm: guarantee persistence of QEMU writes to persistent memory
From: Junyan He QEMU writes to vNVDIMM backends in the vNVDIMM label emulation and live migration. If the backend is on the persistent memory, QEMU needs to take proper operations to ensure its writes persistent on the persistent memory. Otherwise, a host power failure may result in the loss the guest data on the persistent memory. This v3 patch series is based on Marcel's patch "mem: add share parameter to memory-backend-ram" [1] because of the changes in patch 1. [1] https://lists.gnu.org/archive/html/qemu-devel/2018-02/msg03858.html Previous versions can be found at: v6: https://lists.gnu.org/archive/html/qemu-devel/2018-06/msg00061.html v5: https://lists.gnu.org/archive/html/qemu-devel/2018-05/msg02258.html V4: https://lists.gnu.org/archive/html/qemu-devel/2018-02/msg06993.html v3: https://lists.gnu.org/archive/html/qemu-devel/2018-02/msg04365.html v2: https://lists.gnu.org/archive/html/qemu-devel/2018-02/msg01579.html v1: https://lists.gnu.org/archive/html/qemu-devel/2017-12/msg05040.html Changes in v7: The v6 patch set has already reviewed by Stefan Hajnoczi No logic change in this v7 version, just: * Spelling check and some document words refined. * Rebase to "ram is migratable" patch set. Changes in v6: * (Patch 1) Expose all ram block flags rather than redefine the flags. * (Patch 4) Use pkg-config rather the hard check when configure. * (Patch 7) Sync and flush all the pmem data when migration completes, rather than sync pages one by one in previous version. Changes in v5: * (Patch 9) Add post copy check and output some messages for nvdimm. Changes in v4: * (Patch 2) Fix compilation errors found by patchew. Changes in v3: * (Patch 5) Add a is_pmem flag to ram_handle_compressed() and handle PMEM writes in it, so we don't need the _common function. * (Patch 6) Expose qemu_get_buffer_common so we can remove the unnecessary qemu_get_buffer_to_pmem wrapper. 
* (Patch 8) Add a is_pmem flag to xbzrle_decode_buffer() and handle PMEM writes in it, so we can remove the unnecessary xbzrle_decode_buffer_{common, to_pmem}. * Move libpmem stubs to stubs/pmem.c and fix the compilation failures of test-{xbzrle,vmstate}.c. Changes in v2: * (Patch 1) Use a flags parameter in file ram allocation functions. * (Patch 2) Add a new option 'pmem' to hostmem-file. * (Patch 3) Use libpmem to operate on the persistent memory, rather than re-implementing those operations in QEMU. * (Patch 5-8) Consider the write persistence in the migration path. Junyan: [1/7] memory, exec: Expose all memory block related flags. [6/7] migration/ram: Add check and info message to nvdimm post copy. [7/7] migration/ram: ensure write persistence on loading all data to PMEM. Haozhong: [5/7] mem/nvdimm: ensure write persistence to PMEM in label emulation Haozhong & Junyan: [2/7] memory, exec: switch file ram allocation functions to 'flags' parameters [3/7] hostmem-file: add the 'pmem' option [4/7] configure: add libpmem support Signed-off-by: Haozhong Zhang Signed-off-by: Junyan He Reviewed-by: Stefan Hajnoczi --- backends/hostmem-file.c | 28 +++- configure | 29 + docs/nvdimm.txt | 18 ++ exec.c | 39 ++- hw/mem/nvdimm.c | 9 - include/exec/memory.h | 31 +-- include/exec/ram_addr.h | 28 ++-- include/qemu/pmem.h | 24 memory.c| 8 +--- migration/ram.c | 17 + numa.c | 2 +- qemu-options.hx | 7 +++ stubs/Makefile.objs | 1 + stubs/pmem.c| 23 +++ 14 files changed, 229 insertions(+), 35 deletions(-) -- 2.7.4
[Qemu-devel] [PATCH 4/7 V7] configure: add libpmem support
From: Junyan He Add a pair of configure options --{enable,disable}-libpmem to control whether QEMU is compiled with PMDK libpmem [1]. QEMU may write to the host persistent memory (e.g. in vNVDIMM label emulation and live migration), so it must take the proper operations to ensure the persistence of its own writes. Depending on the CPU models and available instructions, the optimal operation can vary [2]. PMDK libpmem have already implemented those operations on multiple CPU models (x86 and ARM) and the logic to select the optimal ones, so QEMU can just use libpmem rather than re-implement them. [1] PMDK (formerly known as NMVL), https://github.com/pmem/pmdk/ [2] https://github.com/pmem/pmdk/blob/38bfa652721a37fd94c0130ce0e3f5d8baa3ed40/src/libpmem/pmem.c#L33 Signed-off-by: Junyan He Signed-off-by: Haozhong Zhang Reviewed-by: Stefan Hajnoczi --- configure | 29 + 1 file changed, 29 insertions(+) diff --git a/configure b/configure index 14b1113..c49b7b6 100755 --- a/configure +++ b/configure @@ -457,6 +457,7 @@ replication="yes" vxhs="" libxml2="" docker="no" +libpmem="" supported_cpu="no" supported_os="no" @@ -1382,6 +1383,10 @@ for opt do ;; --disable-git-update) git_update=no ;; + --enable-libpmem) libpmem=yes + ;; + --disable-libpmem) libpmem=no + ;; *) echo "ERROR: unknown option $opt" echo "Try '$0 --help' for more information" @@ -1639,6 +1644,7 @@ disabled with --disable-FEATURE, default is enabled if available: crypto-afalgLinux AF_ALG crypto backend driver vhost-user vhost-user support capstonecapstone disassembler support + libpmem libpmem support NOTE: The object files are built at the place where configure is launched EOF @@ -5463,6 +5469,24 @@ if has "docker"; then fi ## +# check for libpmem + +if test "$libpmem" != "no"; then + if $pkg_config --exists "libpmem"; then + libpmem="yes" + libpmem_libs=$($pkg_config --libs libpmem) + libpmem_cflags=$($pkg_config --cflags libpmem) + libs_softmmu="$libs_softmmu $libpmem_libs" + QEMU_CFLAGS="$QEMU_CFLAGS 
$libpmem_cflags" + else + if test "$libpmem" = "yes" ; then + feature_not_found "libpmem" "Install nvml or pmdk" + fi + libpmem="no" + fi +fi + +## # End of CC checks # After here, no more $cc or $ld runs @@ -5926,6 +5950,7 @@ echo "replication support $replication" echo "VxHS block device $vxhs" echo "capstone $capstone" echo "docker$docker" +echo "libpmem support $libpmem" if test "$sdl_too_old" = "yes"; then echo "-> Your SDL version is too old - please upgrade to have SDL support" @@ -6673,6 +6698,10 @@ if test "$vxhs" = "yes" ; then echo "VXHS_LIBS=$vxhs_libs" >> $config_host_mak fi +if test "$libpmem" = "yes" ; then + echo "CONFIG_LIBPMEM=y" >> $config_host_mak +fi + if test "$tcg_interpreter" = "yes"; then QEMU_INCLUDES="-iquote \$(SRC_PATH)/tcg/tci $QEMU_INCLUDES" elif test "$ARCH" = "sparc64" ; then -- 2.7.4
[Qemu-devel] [PATCH 3/7 V7] hostmem-file: add the 'pmem' option
From: Junyan He When QEMU emulates vNVDIMM labels and migrates vNVDIMM devices, it needs to know whether the backend storage is a real persistent memory, in order to decide whether special operations should be performed to ensure the data persistence. This boolean option 'pmem' allows users to specify whether the backend storage of memory-backend-file is a real persistent memory. If 'pmem=on', QEMU will set the flag RAM_PMEM in the RAM block of the corresponding memory region. Signed-off-by: Junyan He Signed-off-by: Haozhong Zhang Reviewed-by: Stefan Hajnoczi --- backends/hostmem-file.c | 27 ++- docs/nvdimm.txt | 18 ++ exec.c | 9 + include/exec/memory.h | 4 include/exec/ram_addr.h | 3 +++ qemu-options.hx | 7 +++ 6 files changed, 67 insertions(+), 1 deletion(-) diff --git a/backends/hostmem-file.c b/backends/hostmem-file.c index 34c68bb..6a861f0 100644 --- a/backends/hostmem-file.c +++ b/backends/hostmem-file.c @@ -34,6 +34,7 @@ struct HostMemoryBackendFile { bool discard_data; char *mem_path; uint64_t align; +bool is_pmem; }; static void @@ -59,7 +60,8 @@ file_backend_memory_alloc(HostMemoryBackend *backend, Error **errp) memory_region_init_ram_from_file(&backend->mr, OBJECT(backend), path, backend->size, fb->align, - backend->share ? RAM_SHARED : 0, + (backend->share ? RAM_SHARED : 0) | + (fb->is_pmem ? 
RAM_PMEM : 0), fb->mem_path, errp); g_free(path); } @@ -131,6 +133,26 @@ static void file_memory_backend_set_align(Object *o, Visitor *v, error_propagate(errp, local_err); } +static bool file_memory_backend_get_pmem(Object *o, Error **errp) +{ +return MEMORY_BACKEND_FILE(o)->is_pmem; +} + +static void file_memory_backend_set_pmem(Object *o, bool value, Error **errp) +{ +HostMemoryBackend *backend = MEMORY_BACKEND(o); +HostMemoryBackendFile *fb = MEMORY_BACKEND_FILE(o); + +if (host_memory_backend_mr_inited(backend)) { +error_setg(errp, "cannot change property 'pmem' of %s '%s'", + object_get_typename(o), + object_get_canonical_path_component(o)); +return; +} + +fb->is_pmem = value; +} + static void file_backend_unparent(Object *obj) { HostMemoryBackend *backend = MEMORY_BACKEND(obj); @@ -162,6 +184,9 @@ file_backend_class_init(ObjectClass *oc, void *data) file_memory_backend_get_align, file_memory_backend_set_align, NULL, NULL, &error_abort); +object_class_property_add_bool(oc, "pmem", +file_memory_backend_get_pmem, file_memory_backend_set_pmem, +&error_abort); } static void file_backend_instance_finalize(Object *o) diff --git a/docs/nvdimm.txt b/docs/nvdimm.txt index 8b48fb4..2f7d348 100644 --- a/docs/nvdimm.txt +++ b/docs/nvdimm.txt @@ -180,3 +180,21 @@ supports CPU Cache Flush and Memory Controller Flush on Power Loss, etc. For a complete list of the flags available and for more detailed descriptions, please consult the ACPI spec. + +guest software that this vNVDIMM device contains a region that cannot +accept persistent writes. In result, for example, the guest Linux +NVDIMM driver, marks such vNVDIMM device as read-only. + +If the vNVDIMM backend is on the host persistent memory that can be +accessed in SNIA NVM Programming Model [1] (e.g., Intel NVDIMM), it's +suggested to set the 'pmem' option of memory-backend-file to 'on'. 
When +'pmem=on' and QEMU is built with libpmem [2] support (configured with +--enable-libpmem), QEMU will take necessary operations to guarantee +the persistence of its own writes to the vNVDIMM backend (e.g., in +vNVDIMM label emulation and live migration). + +References +-- + +[1] SNIA NVM Programming Model: https://www.snia.org/sites/default/files/technical_work/final/NVMProgrammingModel_v1.2.pdf +[2] PMDK: http://pmem.io/pmdk/ diff --git a/exec.c b/exec.c index 8e079df..c42483e 100644 --- a/exec.c +++ b/exec.c @@ -2077,6 +2077,9 @@ RAMBlock *qemu_ram_alloc_from_fd(ram_addr_t size, MemoryRegion *mr, Error *local_err = NULL; int64_t file_size; +/* Just support these ram flags by now. */ +assert(ram_flags == 0 || (ram_flags & (RAM_SHARED | RAM_PMEM))); + if (xen_enabled()) { error_setg(errp, "-mem-path not supported with Xen"); return NULL; @@ -4007,6 +4010,11 @@ err: return ret; } +bool ramblock_is_pmem(RAMBlock *rb) +{ +return rb->flags & RAM_PMEM; +} + #endif void page_size_init(void) @@ -4105,3 +4113,4 @@ void mtree_print_dispatch(fprintf_function mon, void
[Qemu-devel] [PATCH 2/7 V7] memory, exec: switch file ram allocation functions to 'flags' parameters
From: Junyan He As more flag parameters besides the existing 'share' are going to be added to following functions memory_region_init_ram_from_file qemu_ram_alloc_from_fd qemu_ram_alloc_from_file let's switch them to use the 'flags' parameters so as to ease future flag additions. The existing 'share' flag is converted to the RAM_SHARED bit in ram_flags, and other flag bits are ignored by above functions right now. Signed-off-by: Junyan He Signed-off-by: Haozhong Zhang Reviewed-by: Stefan Hajnoczi --- backends/hostmem-file.c | 3 ++- exec.c | 10 +- include/exec/memory.h | 7 +-- include/exec/ram_addr.h | 25 +++-- memory.c| 8 +--- numa.c | 2 +- 6 files changed, 41 insertions(+), 14 deletions(-) diff --git a/backends/hostmem-file.c b/backends/hostmem-file.c index 134b08d..34c68bb 100644 --- a/backends/hostmem-file.c +++ b/backends/hostmem-file.c @@ -58,7 +58,8 @@ file_backend_memory_alloc(HostMemoryBackend *backend, Error **errp) path = object_get_canonical_path(OBJECT(backend)); memory_region_init_ram_from_file(&backend->mr, OBJECT(backend), path, - backend->size, fb->align, backend->share, + backend->size, fb->align, + backend->share ? RAM_SHARED : 0, fb->mem_path, errp); g_free(path); } diff --git a/exec.c b/exec.c index 9246722..8e079df 100644 --- a/exec.c +++ b/exec.c @@ -2070,7 +2070,7 @@ static void ram_block_add(RAMBlock *new_block, Error **errp, bool shared) #ifdef __linux__ RAMBlock *qemu_ram_alloc_from_fd(ram_addr_t size, MemoryRegion *mr, - bool share, int fd, + uint64_t ram_flags, int fd, Error **errp) { RAMBlock *new_block; @@ -2112,14 +2112,14 @@ RAMBlock *qemu_ram_alloc_from_fd(ram_addr_t size, MemoryRegion *mr, new_block->mr = mr; new_block->used_length = size; new_block->max_length = size; -new_block->flags = share ? 
RAM_SHARED : 0; +new_block->flags = ram_flags; new_block->host = file_ram_alloc(new_block, size, fd, !file_size, errp); if (!new_block->host) { g_free(new_block); return NULL; } -ram_block_add(new_block, &local_err, share); +ram_block_add(new_block, &local_err, ram_flags & RAM_SHARED); if (local_err) { g_free(new_block); error_propagate(errp, local_err); @@ -2131,7 +2131,7 @@ RAMBlock *qemu_ram_alloc_from_fd(ram_addr_t size, MemoryRegion *mr, RAMBlock *qemu_ram_alloc_from_file(ram_addr_t size, MemoryRegion *mr, - bool share, const char *mem_path, + uint64_t ram_flags, const char *mem_path, Error **errp) { int fd; @@ -2143,7 +2143,7 @@ RAMBlock *qemu_ram_alloc_from_file(ram_addr_t size, MemoryRegion *mr, return NULL; } -block = qemu_ram_alloc_from_fd(size, mr, share, fd, errp); +block = qemu_ram_alloc_from_fd(size, mr, ram_flags, fd, errp); if (!block) { if (created) { unlink(mem_path); diff --git a/include/exec/memory.h b/include/exec/memory.h index 1bb9172..3769c06 100644 --- a/include/exec/memory.h +++ b/include/exec/memory.h @@ -599,6 +599,7 @@ void memory_region_init_resizeable_ram(MemoryRegion *mr, void *host), Error **errp); #ifdef __linux__ + /** * memory_region_init_ram_from_file: Initialize RAM memory region with a *mmap-ed backend. @@ -610,7 +611,9 @@ void memory_region_init_resizeable_ram(MemoryRegion *mr, * @size: size of the region. * @align: alignment of the region base address; if 0, the default alignment * (getpagesize()) will be used. - * @share: %true if memory must be mmaped with the MAP_SHARED flag + * @ram_flags: Memory region features: + * - RAM_SHARED: memory must be mmaped with the MAP_SHARED flag + * Other bits are ignored now. * @path: the path in which to allocate the RAM. * @errp: pointer to Error*, to store an error if it happens. 
* @@ -622,7 +625,7 @@ void memory_region_init_ram_from_file(MemoryRegion *mr, const char *name, uint64_t size, uint64_t align, - bool share, + uint64_t ram_flags, const char *path, Error **errp); diff --git a/include/exec/
[Qemu-devel] [PATCH 1/7 V7] memory, exec: Expose all memory block related flags.
From: Junyan He We need to use these flags in other files rather than just in exec.c. For example, RAM_SHARED should be used when creating a RAM block from a file. We expose them in exec/memory.h. Signed-off-by: Junyan He Reviewed-by: Stefan Hajnoczi --- exec.c | 20 include/exec/memory.h | 20 2 files changed, 20 insertions(+), 20 deletions(-) diff --git a/exec.c b/exec.c index f6645ed..9246722 100644 --- a/exec.c +++ b/exec.c @@ -87,26 +87,6 @@ AddressSpace address_space_memory; MemoryRegion io_mem_rom, io_mem_notdirty; static MemoryRegion io_mem_unassigned; - -/* RAM is pre-allocated and passed into qemu_ram_alloc_from_ptr */ -#define RAM_PREALLOC (1 << 0) - -/* RAM is mmap-ed with MAP_SHARED */ -#define RAM_SHARED (1 << 1) - -/* Only a portion of RAM (used_length) is actually used, and migrated. - * This used_length size can change across reboots. - */ -#define RAM_RESIZEABLE (1 << 2) - -/* UFFDIO_ZEROPAGE is available on this RAMBlock to atomically - * zero the page and wake waiting processes. - * (Set during postcopy) - */ -#define RAM_UF_ZEROPAGE (1 << 3) - -/* RAM can be migrated */ -#define RAM_MIGRATABLE (1 << 4) #endif #ifdef TARGET_PAGE_BITS_VARY diff --git a/include/exec/memory.h b/include/exec/memory.h index eb2ba06..1bb9172 100644 --- a/include/exec/memory.h +++ b/include/exec/memory.h @@ -102,6 +102,26 @@ struct IOMMUNotifier { }; typedef struct IOMMUNotifier IOMMUNotifier; +/* RAM is pre-allocated and passed into qemu_ram_alloc_from_ptr */ +#define RAM_PREALLOC (1 << 0) + +/* RAM is mmap-ed with MAP_SHARED */ +#define RAM_SHARED (1 << 1) + +/* Only a portion of RAM (used_length) is actually used, and migrated. + * This used_length size can change across reboots. + */ +#define RAM_RESIZEABLE (1 << 2) + +/* UFFDIO_ZEROPAGE is available on this RAMBlock to atomically + * zero the page and wake waiting processes. 
+ * (Set during postcopy) + */ +#define RAM_UF_ZEROPAGE (1 << 3) + +/* RAM can be migrated */ +#define RAM_MIGRATABLE (1 << 4) + static inline void iommu_notifier_init(IOMMUNotifier *n, IOMMUNotify fn, IOMMUNotifierFlag flags, hwaddr start, hwaddr end) -- 2.7.4
Re: [Qemu-devel] [PATCH V6 RESEND 7/7] migration/ram: ensure write persistence on loading all data to PMEM.
Sorry missed dgilbert's comment about RAMBLOCK_FOREACH_MIGRATABLE in previous revision. From: Qemu-devel on behalf of junyan...@gmx.com Sent: Saturday, June 9, 2018 1:12:31 AM To: qemu-devel@nongnu.org Cc: xiaoguangrong.e...@gmail.com; crosthwaite.pe...@gmail.com; m...@redhat.com; dgilb...@redhat.com; ehabk...@redhat.com; quint...@redhat.com; Junyan He; stefa...@redhat.com; pbonz...@redhat.com; imamm...@redhat.com; r...@twiddle.net Subject: [Qemu-devel] [PATCH V6 RESEND 7/7] migration/ram: ensure write persistence on loading all data to PMEM. From: Junyan He Because we need to make sure the pmem kind memory data is synced after migration, we choose to call pmem_persist() when the migration finish. This will make sure the data of pmem is safe and will not lose if power is off. Signed-off-by: Junyan He --- include/qemu/pmem.h | 1 + migration/ram.c | 8 stubs/pmem.c| 4 3 files changed, 13 insertions(+) diff --git a/include/qemu/pmem.h b/include/qemu/pmem.h index 00d6680..8f52b08 100644 --- a/include/qemu/pmem.h +++ b/include/qemu/pmem.h @@ -17,6 +17,7 @@ #else /* !CONFIG_LIBPMEM */ void *pmem_memcpy_persist(void *pmemdest, const void *src, size_t len); +void pmem_persist(const void *addr, size_t len); #endif /* CONFIG_LIBPMEM */ diff --git a/migration/ram.c b/migration/ram.c index aa0c6f0..15418c2 100644 --- a/migration/ram.c +++ b/migration/ram.c @@ -33,6 +33,7 @@ #include "qemu/bitops.h" #include "qemu/bitmap.h" #include "qemu/main-loop.h" +#include "qemu/pmem.h" #include "xbzrle.h" #include "ram.h" #include "migration.h" @@ -3046,6 +3047,13 @@ static int ram_load_setup(QEMUFile *f, void *opaque) static int ram_load_cleanup(void *opaque) { RAMBlock *rb; + +RAMBLOCK_FOREACH_MIGRATABLE(rb) { +if (ramblock_is_pmem(rb)) { +pmem_persist(rb->host, rb->used_length); +} +} + xbzrle_load_cleanup(); compress_threads_load_cleanup(); diff --git a/stubs/pmem.c b/stubs/pmem.c index b4ec72d..f794262 100644 --- a/stubs/pmem.c +++ b/stubs/pmem.c @@ -17,3 +17,7 @@ void 
*pmem_memcpy_persist(void *pmemdest, const void *src, size_t len) { return memcpy(pmemdest, src, len); } + +void pmem_persist(const void *addr, size_t len) +{ +} -- 2.7.4
[Qemu-devel] [PATCH V6 RESEND 7/7] migration/ram: ensure write persistence on loading all data to PMEM.
From: Junyan He Because we need to make sure the pmem-backed memory data is synced after migration, we call pmem_persist() when the migration finishes. This makes sure the pmem data is safe and will not be lost if power is off. Signed-off-by: Junyan He --- include/qemu/pmem.h | 1 + migration/ram.c | 8 stubs/pmem.c | 4 3 files changed, 13 insertions(+) diff --git a/include/qemu/pmem.h b/include/qemu/pmem.h index 00d6680..8f52b08 100644 --- a/include/qemu/pmem.h +++ b/include/qemu/pmem.h @@ -17,6 +17,7 @@ #else /* !CONFIG_LIBPMEM */ void *pmem_memcpy_persist(void *pmemdest, const void *src, size_t len); +void pmem_persist(const void *addr, size_t len); #endif /* CONFIG_LIBPMEM */ diff --git a/migration/ram.c b/migration/ram.c index aa0c6f0..15418c2 100644 --- a/migration/ram.c +++ b/migration/ram.c @@ -33,6 +33,7 @@ #include "qemu/bitops.h" #include "qemu/bitmap.h" #include "qemu/main-loop.h" +#include "qemu/pmem.h" #include "xbzrle.h" #include "ram.h" #include "migration.h" @@ -3046,6 +3047,13 @@ static int ram_load_setup(QEMUFile *f, void *opaque) static int ram_load_cleanup(void *opaque) { RAMBlock *rb; + +RAMBLOCK_FOREACH_MIGRATABLE(rb) { +if (ramblock_is_pmem(rb)) { +pmem_persist(rb->host, rb->used_length); +} +} + xbzrle_load_cleanup(); compress_threads_load_cleanup(); diff --git a/stubs/pmem.c b/stubs/pmem.c index b4ec72d..f794262 100644 --- a/stubs/pmem.c +++ b/stubs/pmem.c @@ -17,3 +17,7 @@ void *pmem_memcpy_persist(void *pmemdest, const void *src, size_t len) { return memcpy(pmemdest, src, len); } + +void pmem_persist(const void *addr, size_t len) +{ +} -- 2.7.4
Re: [Qemu-devel] [Qemu-block] Some question about savem/qcow2 incremental snapshot
I think nvdimm kind memory can really save the content (no matter real or emulated). But I think it is still memory; as I understand it, its data should be stored in the qcow2 image or some external snapshot data image, so that we can copy this qcow2 image to another place and restore the same environment. The qcow2 image contains all VM state, disk data and memory data, so I think nvdimm's data should also be stored in this qcow2 image. I am really a new guy in the VMM field and do not know the usage of qcow2 with DAX. So far as I know, DAX is a kernel FS option to let the page mapping bypass all block device logic and improve performance. But qcow2 is a user-space file format used by qemu to emulate disks (am I right?), so I have no idea about that. Thanks Junyan From: Qemu-devel on behalf of Pankaj Gupta Sent: Friday, June 8, 2018 7:59:24 AM To: Junyan He Cc: Kevin Wolf; qemu block; qemu-devel@nongnu.org; Stefan Hajnoczi; Max Reitz Subject: Re: [Qemu-devel] [Qemu-block] Some question about savem/qcow2 incremental snapshot Hi Junyan, AFAICU you are trying to utilize qcow2 capabilities to do incremental snapshot. As I understand it, an NVDIMM device (be it real or emulated) always has its contents backed up in the backing device. Now, the question comes to take a snapshot at some point in time. You are trying to achieve this with the qcow2 format (not checked code yet), and I have the queries below: - Are you implementing this feature for both actual DAX device pass-through as well as emulated DAX? - Are you using an additional qcow2 disk for storing/taking snapshots? How are we planning to use this feature? The reason I asked this question is that if we concentrate on integrating qcow2 with DAX, we will have a full-fledged solution for most of the use-cases.
Thanks, Pankaj > > Dear all: > > I just switched from graphic/media field to virtualization at the end of the > last year, > so I am sorry that though I have already try my best but I still feel a > little dizzy > about your previous discussion about NVDimm via block layer:) > In today's qemu, we use the SaveVMHandlers functions to handle both snapshot > and migration. > So for nvdimm kind memory, its migration and snapshot use the same way as the > ram(savevm_ram_handlers). But the difference is the size of nvdimm may be > huge, and the load > and store speed is slower. According to my usage, when I use 256G nvdimm as > memory backend, > it may take more than 5 minutes to complete one snapshot saving, and after > saving the qcow2 > image is bigger than 50G. For migration, this may not be a problem because we > do not need > extra disk space and the guest is not paused when in migration process. But > for snapshot, > we need to pause the VM and the user experience is bad, and we got concerns > about that. > I posted this question in Jan this year but failed to get enough reply. Then > I sent a RFC patch > set in Mar, basic idea is using the dependency snapshot and dirty log trace > in kernel to > optimize this. > > https://lists.gnu.org/archive/html/qemu-devel/2018-03/msg04530.html > > I use the simple way to handle this, > 1. Separate the nvdimm region from ram when do snapshot. > 2. If the first time, we dump all the nvdimm data the same as ram, and enable > dirty log trace > for nvdimm kind region. > 3. If not the first time, we find the previous snapshot point and add > reference to its clusters > which is used to store nvdimm data. And this time, we just save dirty page > bitmap and dirty pages. > Because the previous nvdimm data clusters is ref added, we do not need to > worry about its deleting. > > I encounter a lot of problems: > 1. Migration and snapshot logic is mixed and need to separate them for > nvdimm. > 2. Cluster has its alignment. 
When do snapshot, we just save data to disk > continuous. Because we > need to add ref to cluster, we really need to consider the alignment. I just > use a little trick way > to padding some data to alignment now, and I think it is not a good way. > 3. Dirty log trace may have some performance problem. > > In theory, this manner can be used to handle all kind of huge memory > snapshot, we need to find the > balance between guest performance(Because of dirty log trace) and snapshot > saving time. > > Thanks > Junyan > > > -Original Message- > From: Stefan Hajnoczi [mailto:stefa...@redhat.com] > Sent: Thursday, May 31, 2018 6:49 PM > To: Kevin Wolf > Cc: Max Reitz ; He, Junyan ; Pankaj > Gupta ; qemu-devel@nongnu.org; qemu block > > Subject: Re: [Qemu-block] [Qemu-devel] Some question about savem/qcow2 > incremental snapshot > > On Wed, May 30, 2018 at 06:07:19PM +0200, Kevin Wolf wrote: > > Am 30.05.2018 um 16:44 hat Stefan Hajnoczi g
[Qemu-devel] [PATCH V6 RESEND 7/7] migration/ram: ensure write persistence on loading all data to PMEM.
From: Junyan He Because we need to make sure the pmem-backed memory data is synced after migration, we call pmem_persist() when the migration finishes. This makes sure the pmem data is safe and will not be lost if power is off. Signed-off-by: Junyan He --- include/qemu/pmem.h | 1 + migration/ram.c | 8 stubs/pmem.c | 4 3 files changed, 13 insertions(+) diff --git a/include/qemu/pmem.h b/include/qemu/pmem.h index 00d6680..8f52b08 100644 --- a/include/qemu/pmem.h +++ b/include/qemu/pmem.h @@ -17,6 +17,7 @@ #else /* !CONFIG_LIBPMEM */ void *pmem_memcpy_persist(void *pmemdest, const void *src, size_t len); +void pmem_persist(const void *addr, size_t len); #endif /* CONFIG_LIBPMEM */ diff --git a/migration/ram.c b/migration/ram.c index aa0c6f0..15418c2 100644 --- a/migration/ram.c +++ b/migration/ram.c @@ -33,6 +33,7 @@ #include "qemu/bitops.h" #include "qemu/bitmap.h" #include "qemu/main-loop.h" +#include "qemu/pmem.h" #include "xbzrle.h" #include "ram.h" #include "migration.h" @@ -3046,6 +3047,13 @@ static int ram_load_setup(QEMUFile *f, void *opaque) static int ram_load_cleanup(void *opaque) { RAMBlock *rb; + +RAMBLOCK_FOREACH(rb) { +if (ramblock_is_pmem(rb)) { +pmem_persist(rb->host, rb->used_length); +} +} + xbzrle_load_cleanup(); compress_threads_load_cleanup(); diff --git a/stubs/pmem.c b/stubs/pmem.c index b4ec72d..f794262 100644 --- a/stubs/pmem.c +++ b/stubs/pmem.c @@ -17,3 +17,7 @@ void *pmem_memcpy_persist(void *pmemdest, const void *src, size_t len) { return memcpy(pmemdest, src, len); } + +void pmem_persist(const void *addr, size_t len) +{ +} -- 2.7.4
[Qemu-devel] [PATCH V6 7/7] migration/ram: ensure write persistence on loading all data to PMEM.
From: Junyan He Because we need to make sure the pmem-backed memory data is synced after migration, we call pmem_persist() when the migration finishes. This makes sure the pmem data is safe and will not be lost if power is off. Signed-off-by: Junyan He --- include/qemu/pmem.h | 1 + migration/ram.c | 8 stubs/pmem.c | 4 3 files changed, 13 insertions(+) diff --git a/include/qemu/pmem.h b/include/qemu/pmem.h index 00d6680..b1e1b5c 100644 --- a/include/qemu/pmem.h +++ b/include/qemu/pmem.h @@ -17,6 +17,7 @@ #else /* !CONFIG_LIBPMEM */ void *pmem_memcpy_persist(void *pmemdest, const void *src, size_t len); +void *pmem_persist(const void *addr, size_t len); #endif /* CONFIG_LIBPMEM */ diff --git a/migration/ram.c b/migration/ram.c index aa0c6f0..09525b2 100644 --- a/migration/ram.c +++ b/migration/ram.c @@ -33,6 +33,7 @@ #include "qemu/bitops.h" #include "qemu/bitmap.h" #include "qemu/main-loop.h" +#include "qemu/pmem.h" #include "xbzrle.h" #include "ram.h" #include "migration.h" @@ -3046,6 +3047,13 @@ static int ram_load_setup(QEMUFile *f, void *opaque) static int ram_load_cleanup(void *opaque) { RAMBlock *rb; + +RAMBLOCK_FOREACH(rb) { +if (ramblock_is_pmem(rb)) { +pmem_persist(rb->host, rb->used_length); + } +} + xbzrle_load_cleanup(); compress_threads_load_cleanup(); diff --git a/stubs/pmem.c b/stubs/pmem.c index b4ec72d..c5bc6d6 100644 --- a/stubs/pmem.c +++ b/stubs/pmem.c @@ -17,3 +17,7 @@ void *pmem_memcpy_persist(void *pmemdest, const void *src, size_t len) { return memcpy(pmemdest, src, len); } + +void *pmem_persist(const void *addr, size_t len) +{ +} -- 2.7.4
[Qemu-devel] [PATCH V6 5/7] mem/nvdimm: ensure write persistence to PMEM in label emulation
From: Junyan He Guest writes to vNVDIMM labels are intercepted and performed on the backend by QEMU. When the backend is real persistent memory, QEMU needs to perform proper operations to ensure its write persistence on the persistent memory. Otherwise, a host power failure may result in the loss of guest label configurations. Signed-off-by: Haozhong Zhang --- hw/mem/nvdimm.c | 9 - include/qemu/pmem.h | 23 +++ stubs/Makefile.objs | 1 + stubs/pmem.c | 19 +++ 4 files changed, 51 insertions(+), 1 deletion(-) create mode 100644 include/qemu/pmem.h create mode 100644 stubs/pmem.c diff --git a/hw/mem/nvdimm.c b/hw/mem/nvdimm.c index 4087aca..03b478e 100644 --- a/hw/mem/nvdimm.c +++ b/hw/mem/nvdimm.c @@ -23,6 +23,7 @@ */ #include "qemu/osdep.h" +#include "qemu/pmem.h" #include "qapi/error.h" #include "qapi/visitor.h" #include "hw/mem/nvdimm.h" @@ -155,11 +156,17 @@ static void nvdimm_write_label_data(NVDIMMDevice *nvdimm, const void *buf, { MemoryRegion *mr; PCDIMMDevice *dimm = PC_DIMM(nvdimm); +bool is_pmem = object_property_get_bool(OBJECT(dimm->hostmem), +"pmem", NULL); uint64_t backend_offset; nvdimm_validate_rw_label_data(nvdimm, size, offset); -memcpy(nvdimm->label_data + offset, buf, size); +if (!is_pmem) { +memcpy(nvdimm->label_data + offset, buf, size); +} else { +pmem_memcpy_persist(nvdimm->label_data + offset, buf, size); +} mr = host_memory_backend_get_memory(dimm->hostmem, &error_abort); backend_offset = memory_region_size(mr) - nvdimm->label_size + offset; diff --git a/include/qemu/pmem.h b/include/qemu/pmem.h new file mode 100644 index 000..00d6680 --- /dev/null +++ b/include/qemu/pmem.h @@ -0,0 +1,23 @@ +/* + * QEMU header file for libpmem. + * + * Copyright (c) 2018 Intel Corporation. + * + * Author: Haozhong Zhang + * + * This work is licensed under the terms of the GNU GPL, version 2 or later. + * See the COPYING file in the top-level directory. 
+ */ + +#ifndef QEMU_PMEM_H +#define QEMU_PMEM_H + +#ifdef CONFIG_LIBPMEM +#include +#else /* !CONFIG_LIBPMEM */ + +void *pmem_memcpy_persist(void *pmemdest, const void *src, size_t len); + +#endif /* CONFIG_LIBPMEM */ + +#endif /* !QEMU_PMEM_H */ diff --git a/stubs/Makefile.objs b/stubs/Makefile.objs index 53d3f32..be9a042 100644 --- a/stubs/Makefile.objs +++ b/stubs/Makefile.objs @@ -43,3 +43,4 @@ stub-obj-y += xen-common.o stub-obj-y += xen-hvm.o stub-obj-y += pci-host-piix.o stub-obj-y += ram-block.o +stub-obj-$(call lnot,$(CONFIG_LIBPMEM)) += pmem.o \ No newline at end of file diff --git a/stubs/pmem.c b/stubs/pmem.c new file mode 100644 index 000..b4ec72d --- /dev/null +++ b/stubs/pmem.c @@ -0,0 +1,19 @@ +/* + * Stubs for libpmem. + * + * Copyright (c) 2018 Intel Corporation. + * + * Author: Haozhong Zhang + * + * This work is licensed under the terms of the GNU GPL, version 2 or later. + * See the COPYING file in the top-level directory. + */ + +#include + +#include "qemu/pmem.h" + +void *pmem_memcpy_persist(void *pmemdest, const void *src, size_t len) +{ +return memcpy(pmemdest, src, len); +} -- 2.7.4
[Qemu-devel] [PATCH V6 6/7] migration/ram: Add check and info message to nvdimm post copy.
From: Junyan He NVDIMM memory does not support postcopy for now. We disable postcopy if we have NVDIMM memory and print a log message to hint the user. Signed-off-by: Junyan He --- migration/ram.c | 9 + 1 file changed, 9 insertions(+) diff --git a/migration/ram.c b/migration/ram.c index c53e836..aa0c6f0 100644 --- a/migration/ram.c +++ b/migration/ram.c @@ -3397,6 +3397,15 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id) static bool ram_has_postcopy(void *opaque) { +RAMBlock *rb; +RAMBLOCK_FOREACH(rb) { +if (ramblock_is_pmem(rb)) { +info_report("Block: %s, host: %p is a nvdimm memory, postcopy" + "is not supported now!", rb->idstr, rb->host); +return false; +} +} + return migrate_postcopy_ram(); } -- 2.7.4
[Qemu-devel] [PATCH V6 3/7] hostmem-file: add the 'pmem' option
From: Junyan He When QEMU emulates vNVDIMM labels and migrates vNVDIMM devices, it needs to know whether the backend storage is a real persistent memory, in order to decide whether special operations should be performed to ensure the data persistence. This boolean option 'pmem' allows users to specify whether the backend storage of memory-backend-file is a real persistent memory. If 'pmem=on', QEMU will set the flag RAM_PMEM in the RAM block of the corresponding memory region. Signed-off-by: Junyan He Signed-off-by: Haozhong Zhang --- backends/hostmem-file.c | 27 ++- docs/nvdimm.txt | 14 ++ exec.c | 9 + include/exec/memory.h | 6 ++ include/exec/ram_addr.h | 3 +++ qemu-options.hx | 7 +++ 6 files changed, 65 insertions(+), 1 deletion(-) diff --git a/backends/hostmem-file.c b/backends/hostmem-file.c index 34c68bb..ccca7a1 100644 --- a/backends/hostmem-file.c +++ b/backends/hostmem-file.c @@ -34,6 +34,7 @@ struct HostMemoryBackendFile { bool discard_data; char *mem_path; uint64_t align; +bool is_pmem; }; static void @@ -59,7 +60,8 @@ file_backend_memory_alloc(HostMemoryBackend *backend, Error **errp) memory_region_init_ram_from_file(&backend->mr, OBJECT(backend), path, backend->size, fb->align, - backend->share ? RAM_SHARED : 0, + (backend->share ? RAM_SHARED : 0) | + (fb->is_pmem ? 
RAM_PMEM : 0), fb->mem_path, errp); g_free(path); } @@ -131,6 +133,26 @@ static void file_memory_backend_set_align(Object *o, Visitor *v, error_propagate(errp, local_err); } +static bool file_memory_backend_get_pmem(Object *o, Error **errp) +{ +return MEMORY_BACKEND_FILE(o)->is_pmem; +} + +static void file_memory_backend_set_pmem(Object *o, bool value, Error **errp) +{ +HostMemoryBackend *backend = MEMORY_BACKEND(o); +HostMemoryBackendFile *fb = MEMORY_BACKEND_FILE(o); + +if (host_memory_backend_mr_inited(backend)) { +error_setg(errp, "cannot change property 'pmem' of %s '%s'", + object_get_typename(o), + object_get_canonical_path_component(OBJECT(backend))); +return; +} + +fb->is_pmem = value; +} + static void file_backend_unparent(Object *obj) { HostMemoryBackend *backend = MEMORY_BACKEND(obj); @@ -162,6 +184,9 @@ file_backend_class_init(ObjectClass *oc, void *data) file_memory_backend_get_align, file_memory_backend_set_align, NULL, NULL, &error_abort); +object_class_property_add_bool(oc, "pmem", +file_memory_backend_get_pmem, file_memory_backend_set_pmem, +&error_abort); } static void file_backend_instance_finalize(Object *o) diff --git a/docs/nvdimm.txt b/docs/nvdimm.txt index e903d8b..bcb2032 100644 --- a/docs/nvdimm.txt +++ b/docs/nvdimm.txt @@ -153,3 +153,17 @@ guest NVDIMM region mapping structure. This unarmed flag indicates guest software that this vNVDIMM device contains a region that cannot accept persistent writes. In result, for example, the guest Linux NVDIMM driver, marks such vNVDIMM device as read-only. + +If the vNVDIMM backend is on the host persistent memory that can be +accessed in SNIA NVM Programming Model [1] (e.g., Intel NVDIMM), it's +suggested to set the 'pmem' option of memory-backend-file to 'on'. 
When +'pmem=on' and QEMU is built with libpmem [2] support (configured with +--enable-libpmem), QEMU will take necessary operations to guarantee +the persistence of its own writes to the vNVDIMM backend (e.g., in +vNVDIMM label emulation and live migration). + +References +-- + +[1] SNIA NVM Programming Model: https://www.snia.org/sites/default/files/technical_work/final/NVMProgrammingModel_v1.2.pdf +[2] PMDK: http://pmem.io/pmdk/ diff --git a/exec.c b/exec.c index f2082fa..f066705 100644 --- a/exec.c +++ b/exec.c @@ -2061,6 +2061,9 @@ RAMBlock *qemu_ram_alloc_from_fd(ram_addr_t size, MemoryRegion *mr, Error *local_err = NULL; int64_t file_size; +/* Just support these ram flags by now. */ +assert(ram_flags == 0 || (ram_flags & (RAM_SHARED | RAM_PMEM))); + if (xen_enabled()) { error_setg(errp, "-mem-path not supported with Xen"); return NULL; @@ -3971,6 +3974,11 @@ err: return ret; } +bool ramblock_is_pmem(RAMBlock *rb) +{ +return rb->flags & RAM_PMEM; +} + #endif void page_size_init(void) @@ -4069,3 +4077,4 @@ void mtree_print_dispatch(fprintf_function mon, void *f, } #endif + diff --git a/include/exec/memory.h b/include/exec/memory.h index 3b68a43..6523512 100644 --- a/include/exec/memory.h
[Qemu-devel] [PATCH V6 4/7] configure: add libpmem support
From: Junyan He Add a pair of configure options --{enable,disable}-libpmem to control whether QEMU is compiled with PMDK libpmem [1]. QEMU may write to the host persistent memory (e.g. in vNVDIMM label emulation and live migration), so it must take the proper operations to ensure the persistence of its own writes. Depending on the CPU models and available instructions, the optimal operation can vary [2]. PMDK libpmem have already implemented those operations on multiple CPU models (x86 and ARM) and the logic to select the optimal ones, so QEMU can just use libpmem rather than re-implement them. [1] PMDK (formerly known as NMVL), https://github.com/pmem/pmdk/ [2] https://github.com/pmem/pmdk/blob/38bfa652721a37fd94c0130ce0e3f5d8baa3ed40/src/libpmem/pmem.c#L33 Signed-off-by: Junyan He Signed-off-by: Haozhong Zhang --- configure | 29 + 1 file changed, 29 insertions(+) diff --git a/configure b/configure index a6a4616..f44d669 100755 --- a/configure +++ b/configure @@ -456,6 +456,7 @@ jemalloc="no" replication="yes" vxhs="" libxml2="" +libpmem="" supported_cpu="no" supported_os="no" @@ -1381,6 +1382,10 @@ for opt do ;; --disable-git-update) git_update=no ;; + --enable-libpmem) libpmem=yes + ;; + --disable-libpmem) libpmem=no + ;; *) echo "ERROR: unknown option $opt" echo "Try '$0 --help' for more information" @@ -1638,6 +1643,7 @@ disabled with --disable-FEATURE, default is enabled if available: crypto-afalgLinux AF_ALG crypto backend driver vhost-user vhost-user support capstonecapstone disassembler support + libpmem libpmem support NOTE: The object files are built at the place where configure is launched EOF @@ -5445,6 +5451,24 @@ EOF fi ## +# check for libpmem + +if test "$libpmem" != "no"; then + if $pkg_config --exists "libpmem"; then + libpmem="yes" + libpmem_libs=$($pkg_config --libs libpmem) + libpmem_cflags=$($pkg_config --cflags libpmem) + libs_softmmu="$libs_softmmu $libpmem_libs" + QEMU_CFLAGS="$QEMU_CFLAGS $libpmem_cflags" + else + if test "$libpmem" = 
"yes" ; then + feature_not_found "libpmem" "Install nvml or pmdk" + fi + libpmem="no" + fi +fi + +## # End of CC checks # After here, no more $cc or $ld runs @@ -5907,6 +5931,7 @@ echo "avx2 optimization $avx2_opt" echo "replication support $replication" echo "VxHS block device $vxhs" echo "capstone $capstone" +echo "libpmem support $libpmem" if test "$sdl_too_old" = "yes"; then echo "-> Your SDL version is too old - please upgrade to have SDL support" @@ -6651,6 +6676,10 @@ if test "$vxhs" = "yes" ; then echo "VXHS_LIBS=$vxhs_libs" >> $config_host_mak fi +if test "$libpmem" = "yes" ; then + echo "CONFIG_LIBPMEM=y" >> $config_host_mak +fi + if test "$tcg_interpreter" = "yes"; then QEMU_INCLUDES="-iquote \$(SRC_PATH)/tcg/tci $QEMU_INCLUDES" elif test "$ARCH" = "sparc64" ; then -- 2.7.4
[Qemu-devel] [PATCH V6 2/7] memory, exec: switch file ram allocation functions to 'flags' parameters
From: Junyan He As more flag parameters besides the existing 'share' are going to be added to following functions memory_region_init_ram_from_file qemu_ram_alloc_from_fd qemu_ram_alloc_from_file let's switch them to use the 'flags' parameters so as to ease future flag additions. The existing 'share' flag is converted to the RAM_SHARED bit in ram_flags, and other flag bits are ignored by above functions right now. Signed-off-by: Junyan He Signed-off-by: Haozhong Zhang --- backends/hostmem-file.c | 3 ++- exec.c | 10 +- include/exec/memory.h | 8 ++-- include/exec/ram_addr.h | 25 +++-- memory.c| 8 +--- numa.c | 2 +- 6 files changed, 42 insertions(+), 14 deletions(-) diff --git a/backends/hostmem-file.c b/backends/hostmem-file.c index 134b08d..34c68bb 100644 --- a/backends/hostmem-file.c +++ b/backends/hostmem-file.c @@ -58,7 +58,8 @@ file_backend_memory_alloc(HostMemoryBackend *backend, Error **errp) path = object_get_canonical_path(OBJECT(backend)); memory_region_init_ram_from_file(&backend->mr, OBJECT(backend), path, - backend->size, fb->align, backend->share, + backend->size, fb->align, + backend->share ? RAM_SHARED : 0, fb->mem_path, errp); g_free(path); } diff --git a/exec.c b/exec.c index 302c04b..f2082fa 100644 --- a/exec.c +++ b/exec.c @@ -2054,7 +2054,7 @@ static void ram_block_add(RAMBlock *new_block, Error **errp, bool shared) #ifdef __linux__ RAMBlock *qemu_ram_alloc_from_fd(ram_addr_t size, MemoryRegion *mr, - bool share, int fd, + uint64_t ram_flags, int fd, Error **errp) { RAMBlock *new_block; @@ -2096,14 +2096,14 @@ RAMBlock *qemu_ram_alloc_from_fd(ram_addr_t size, MemoryRegion *mr, new_block->mr = mr; new_block->used_length = size; new_block->max_length = size; -new_block->flags = share ? 
RAM_SHARED : 0; +new_block->flags = ram_flags; new_block->host = file_ram_alloc(new_block, size, fd, !file_size, errp); if (!new_block->host) { g_free(new_block); return NULL; } -ram_block_add(new_block, &local_err, share); +ram_block_add(new_block, &local_err, ram_flags & RAM_SHARED); if (local_err) { g_free(new_block); error_propagate(errp, local_err); @@ -2115,7 +2115,7 @@ RAMBlock *qemu_ram_alloc_from_fd(ram_addr_t size, MemoryRegion *mr, RAMBlock *qemu_ram_alloc_from_file(ram_addr_t size, MemoryRegion *mr, - bool share, const char *mem_path, + uint64_t ram_flags, const char *mem_path, Error **errp) { int fd; @@ -2127,7 +2127,7 @@ RAMBlock *qemu_ram_alloc_from_file(ram_addr_t size, MemoryRegion *mr, return NULL; } -block = qemu_ram_alloc_from_fd(size, mr, share, fd, errp); +block = qemu_ram_alloc_from_fd(size, mr, ram_flags, fd, errp); if (!block) { if (created) { unlink(mem_path); diff --git a/include/exec/memory.h b/include/exec/memory.h index 3da315e..3b68a43 100644 --- a/include/exec/memory.h +++ b/include/exec/memory.h @@ -596,6 +596,7 @@ void memory_region_init_resizeable_ram(MemoryRegion *mr, void *host), Error **errp); #ifdef __linux__ + /** * memory_region_init_ram_from_file: Initialize RAM memory region with a *mmap-ed backend. @@ -607,7 +608,10 @@ void memory_region_init_resizeable_ram(MemoryRegion *mr, * @size: size of the region. * @align: alignment of the region base address; if 0, the default alignment * (getpagesize()) will be used. - * @share: %true if memory must be mmaped with the MAP_SHARED flag + * @ram_flags: specify properties of this memory region, which can be one or + * bit-or of following values: + * - RAM_SHARED: memory must be mmaped with the MAP_SHARED flag + * Other bits are ignored. * @path: the path in which to allocate the RAM. * @errp: pointer to Error*, to store an error if it happens. 
* @@ -619,7 +623,7 @@ void memory_region_init_ram_from_file(MemoryRegion *mr, const char *name, uint64_t size, uint64_t align, - bool share, + uint64_t ram_flags, const char *path,
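The core pattern of this patch — widening a `bool share` parameter into a `uint64_t ram_flags` so later patches can add more bits — can be sketched in isolation. This is not QEMU code; the flag values mirror the series, but `share_to_ram_flags`/`ram_flags_is_shared` are illustrative helper names, not functions from the patch:

```c
#include <assert.h>
#include <stdint.h>
#include <stdbool.h>

/* Simplified stand-ins for the RAM block flag bits this series works
 * with; the values mirror the patch. */
#define RAM_PREALLOC   (1u << 0)
#define RAM_SHARED     (1u << 1)
#define RAM_RESIZEABLE (1u << 2)

/* Call sites that used to pass a bool 'share' now translate it into a
 * bit of 'ram_flags', as hostmem-file.c does in this patch:
 *   backend->share ? RAM_SHARED : 0 */
static inline uint64_t share_to_ram_flags(bool share)
{
    return share ? RAM_SHARED : 0;
}

/* Inside the allocation path, the old bool is recovered by masking,
 * as in: ram_block_add(new_block, &local_err, ram_flags & RAM_SHARED) */
static inline bool ram_flags_is_shared(uint64_t ram_flags)
{
    return (ram_flags & RAM_SHARED) != 0;
}
```

Adding a new property (such as the later RAM_PMEM) then only requires defining a new bit, with no change to the function signatures.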
[Qemu-devel] [PATCH V6 1/7] memory, exec: Expose all memory block related flags.
From: Junyan He

We need to use these flags in other files rather than just in exec.c. For example, RAM_SHARED should be used when creating a RAM block from a file. Expose them in exec/memory.h.

Signed-off-by: Junyan He
---
 exec.c                | 17 -
 include/exec/memory.h | 17 +
 2 files changed, 17 insertions(+), 17 deletions(-)

diff --git a/exec.c b/exec.c
index c30f905..302c04b 100644
--- a/exec.c
+++ b/exec.c
@@ -87,23 +87,6 @@ AddressSpace address_space_memory;
 MemoryRegion io_mem_rom, io_mem_notdirty;
 static MemoryRegion io_mem_unassigned;
-
-/* RAM is pre-allocated and passed into qemu_ram_alloc_from_ptr */
-#define RAM_PREALLOC (1 << 0)
-
-/* RAM is mmap-ed with MAP_SHARED */
-#define RAM_SHARED (1 << 1)
-
-/* Only a portion of RAM (used_length) is actually used, and migrated.
- * This used_length size can change across reboots.
- */
-#define RAM_RESIZEABLE (1 << 2)
-
-/* UFFDIO_ZEROPAGE is available on this RAMBlock to atomically
- * zero the page and wake waiting processes.
- * (Set during postcopy)
- */
-#define RAM_UF_ZEROPAGE (1 << 3)
 #endif

 #ifdef TARGET_PAGE_BITS_VARY
diff --git a/include/exec/memory.h b/include/exec/memory.h
index 67ea7fe..3da315e 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -102,6 +102,23 @@ struct IOMMUNotifier {
 };
 typedef struct IOMMUNotifier IOMMUNotifier;

+/* RAM is pre-allocated and passed into qemu_ram_alloc_from_ptr */
+#define RAM_PREALLOC (1 << 0)
+
+/* RAM is mmap-ed with MAP_SHARED */
+#define RAM_SHARED (1 << 1)
+
+/* Only a portion of RAM (used_length) is actually used, and migrated.
+ * This used_length size can change across reboots.
+ */
+#define RAM_RESIZEABLE (1 << 2)
+
+/* UFFDIO_ZEROPAGE is available on this RAMBlock to atomically
+ * zero the page and wake waiting processes.
+ * (Set during postcopy)
+ */
+#define RAM_UF_ZEROPAGE (1 << 3)
+
 static inline void iommu_notifier_init(IOMMUNotifier *n, IOMMUNotify fn,
                                        IOMMUNotifierFlag flags,
                                        hwaddr start, hwaddr end)
--
2.7.4
[Qemu-devel] [PATCH V6 0/7] nvdimm: guarantee persistence of QEMU writes to persistent memory
From: Junyan He

QEMU writes to vNVDIMM backends in the vNVDIMM label emulation and live migration. If the backend is on persistent memory, QEMU needs to take proper operations to ensure its writes are persistent on the persistent memory. Otherwise, a host power failure may result in the loss of guest data on the persistent memory.

This patch series is based on Marcel's patch "mem: add share parameter to memory-backend-ram" [1] because of the changes in patch 1.

[1] https://lists.gnu.org/archive/html/qemu-devel/2018-02/msg03858.html

Previous versions can be found at:
v5: https://lists.gnu.org/archive/html/qemu-devel/2018-05/msg02258.html
v4: https://lists.gnu.org/archive/html/qemu-devel/2018-02/msg06993.html
v3: https://lists.gnu.org/archive/html/qemu-devel/2018-02/msg04365.html
v2: https://lists.gnu.org/archive/html/qemu-devel/2018-02/msg01579.html
v1: https://lists.gnu.org/archive/html/qemu-devel/2017-12/msg05040.html

Changes in v6:
* (Patch 1) Expose all RAM block flags rather than redefining them.
* (Patch 4) Use pkg-config rather than a hard check in configure.
* (Patch 7) Sync and flush all the pmem data when migration completes, rather than syncing pages one by one as in the previous version.

Changes in v5:
* (Patch 9) Add post copy check and output some messages for nvdimm.

Changes in v4:
* (Patch 2) Fix compilation errors found by patchew.

Changes in v3:
* (Patch 5) Add an is_pmem flag to ram_handle_compressed() and handle PMEM writes in it, so we don't need the _common function.
* (Patch 6) Expose qemu_get_buffer_common so we can remove the unnecessary qemu_get_buffer_to_pmem wrapper.
* (Patch 8) Add an is_pmem flag to xbzrle_decode_buffer() and handle PMEM writes in it, so we can remove the unnecessary xbzrle_decode_buffer_{common, to_pmem}.
* Move libpmem stubs to stubs/pmem.c and fix the compilation failures of test-{xbzrle,vmstate}.c.

Changes in v2:
* (Patch 1) Use a flags parameter in file ram allocation functions.
* (Patch 2) Add a new option 'pmem' to hostmem-file.
* (Patch 3) Use libpmem to operate on the persistent memory, rather than re-implementing those operations in QEMU.
* (Patch 5-8) Consider the write persistence in the migration path.

Junyan:
[1/7] memory, exec: Expose all memory block related flags.
[6/7] migration/ram: Add check and info message to nvdimm post copy.
[7/7] migration/ram: ensure write persistence on loading all data to PMEM.

Haozhong:
[5/7] mem/nvdimm: ensure write persistence to PMEM in label emulation

Haozhong & Junyan:
[2/7] memory, exec: switch file ram allocation functions to 'flags' parameters
[3/7] hostmem-file: add the 'pmem' option
[4/7] configure: add libpmem support

Signed-off-by: Haozhong Zhang
Signed-off-by: Junyan He
---
 backends/hostmem-file.c | 28 +++-
 configure               | 29 +
 docs/nvdimm.txt         | 14 ++
 exec.c                  | 36 ++--
 hw/mem/nvdimm.c         |  9 -
 include/exec/memory.h   | 31 +--
 include/exec/ram_addr.h | 28 ++--
 include/qemu/pmem.h     | 24
 memory.c                |  8 +---
 migration/ram.c         | 18 ++
 numa.c                  |  2 +-
 qemu-options.hx         |  7 +++
 stubs/Makefile.objs     |  1 +
 stubs/pmem.c            | 23 +++
 14 files changed, 226 insertions(+), 32 deletions(-)
--
2.7.4
Re: [Qemu-devel] [PATCH 2/9 V5] hostmem-file: add the 'pmem' option
Because NVDIMM-like memory can also be imitated by a disk backend, I think it is more flexible to let the user specify whether the backend is real pmem or not, rather than having QEMU check it with the pmem_is_pmem() API.

From: Stefan Hajnoczi
Sent: Thursday, May 31, 2018 1:09:42 PM
To: junyan...@gmx.com
Cc: qemu-devel@nongnu.org; Haozhong Zhang; xiaoguangrong.e...@gmail.com; crosthwaite.pe...@gmail.com; m...@redhat.com; dgilb...@redhat.com; ehabk...@redhat.com; quint...@redhat.com; Junyan He; stefa...@redhat.com; pbonz...@redhat.com; imamm...@redhat.com; r...@twiddle.net
Subject: Re: [Qemu-devel] [PATCH 2/9 V5] hostmem-file: add the 'pmem' option

On Thu, May 10, 2018 at 10:08:51AM +0800, junyan...@gmx.com wrote:
> From: Junyan He
>
> When QEMU emulates vNVDIMM labels and migrates vNVDIMM devices, it
> needs to know whether the backend storage is a real persistent memory,
> in order to decide whether special operations should be performed to
> ensure the data persistence.
>
> This boolean option 'pmem' allows users to specify whether the backend
> storage of memory-backend-file is a real persistent memory. If
> 'pmem=on', QEMU will set the flag RAM_PMEM in the RAM block of the
> corresponding memory region.

I'm still not sure if this option is necessary since pmem_is_pmem() is
available with the introduction of the libpmem dependency. Why can't it
be used?

Stefan
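The trade-off under discussion — an explicit user-set 'pmem' option versus a runtime pmem_is_pmem() probe — can be illustrated with a small sketch. Everything here is hypothetical naming: `fake_pmem_is_pmem` stands in for libpmem's detector, and `backend_is_pmem` is not a QEMU function:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for libpmem's pmem_is_pmem(): reports whether the range
 * [addr, addr+len) is genuinely on persistent memory. Hard-coded to
 * false here, as it would be for a disk-backed file that merely
 * imitates an NVDIMM. */
static bool fake_pmem_is_pmem(const void *addr, size_t len)
{
    (void)addr;
    (void)len;
    return false;
}

/* With 'pmem=on' the user asserts persistence explicitly; a runtime
 * check would instead trust libpmem's detection. A disk-file backend
 * imitating an NVDIMM can legitimately be configured either way, which
 * is the flexibility argument made in the reply above. */
static bool backend_is_pmem(bool pmem_option, bool use_runtime_check,
                            const void *addr, size_t len)
{
    return use_runtime_check ? fake_pmem_is_pmem(addr, len) : pmem_option;
}
```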
Re: [Qemu-devel] [PATCH V5 0/9] nvdimm: guarantee persistence of QEMU writes to persistent memory
> > So Haozhong's manner seems to be a little faster and I choose to keep that. > > > > If you want to choose this manner, the code will be clean and no need for > > > > typedef struct { > > > void (*memset)(void *s, int c, size_t n); > > > void (*memcpy)(void *dest, const void *src, size_t n); > > > } MemoryOperations; > > > > > > performance is close, and I am a little new in Qemu:), so both options are > > OK for me, > > > > Which one do you prefer to? > The one with the least impact; the migration code is getting more and > more complex, so having to do the 'if (is_pmem)' check everywhere isn't > nice, passing an 'ops' pointer in is better. However if you can do the > 'flush before complete' instead then the amount of code change is a LOT > smaller. > The only other question is whether from your pmem view, the > flush-before-complete causes any problems; in the worst case, how long > could the flush take? According to my understanding, flush-before-complete should be OK and save. Haozhong gave the hint that the flush step by step may be faster and I think the benchmark shows that they are close. The worst case is that all the pmem-like memory will be flushed after completing migration, I am not sure whether there are some unused pmem for new guest will also be flushed. > Dave From: Dr. David Alan Gilbert Sent: Thursday, May 31, 2018 2:42:19 PM To: Junyan He Cc: Stefan Hajnoczi; qemu-devel@nongnu.org; Haozhong Zhang; xiaoguangrong.e...@gmail.com; crosthwaite.pe...@gmail.com; m...@redhat.com; ehabk...@redhat.com; quint...@redhat.com; Junyan He; stefa...@redhat.com; pbonz...@redhat.com; imamm...@redhat.com; r...@twiddle.net Subject: Re: [Qemu-devel] [PATCH V5 0/9] nvdimm: guarantee persistence of QEMU writes to persistent memory * Junyan He (junyan...@gmx.com) wrote: > > Also, there was a discussion about leaving the code unchanged but adding > > an nvdimm_flush() call at the very end of migration. I think someone > > benchmarked it but can't find the email. 
Please post a link or > > summarize the results, because that approach would be much less > > invasive. Thanks! > > > And previous comments: > > > > > > 2. The migration/ram code is invasive. Is it really necessary to > > > >persist data each time pages are loaded from a migration stream? It > > > >seems simpler to migrate as normal and call pmem_persist() just once > > > >after RAM has been migrated but before the migration completes. > > > > > > The concern is about the overhead of cache flush. > > > > > > In this patch series, if possible, QEMU will use pmem_mem{set,cpy}_nodrain > > > APIs to copy NVDIMM blocks. Those APIs use movnt (if it's available) and > > > can avoid the subsequent cache flush. > > > > > > Anyway, I'll make some microbenchmark to check which one will be better. > > > The problem is not just the overhead; the problem is the code > > complexity; this series makes all the paths through the migration code > > more complex in places we wouldn't expect to change. > > I already use the migration info tool and list the result in the Mail just > after this patch set sent: > > Disable all haozhong's pmem_drain and pmem_memset_nodrain kind function call > and make the cleanup function do the flush job like this: > > static int ram_load_cleanup(void *opaque) > { > RAMBlock *rb; > RAMBLOCK_FOREACH(rb) { > if (ramblock_is_pmem(rb)) { > pmem_persist(rb->host, rb->used_length); > } > } > > xbzrle_load_cleanup(); > compress_threads_load_cleanup(); > > RAMBLOCK_FOREACH(rb) { > g_free(rb->receivedmap); > rb->receivedmap = NULL; > } > return 0; > } > > > The migrate info result is: > > Haozhong's Manner > > (qemu) migrate -d tcp:localhost: > (qemu) info migrate > globals: > store-global-state: on > only-migratable: off > send-configuration: on > send-section-footer: on > capabilities: xbzrle: off rdma-pin-all: off auto-converge: off zero-blocks: > off compress: off events: off postcopy-ram: off x-colo: off release-ram: off > block: off return-path: off 
pause-before-switchover: off x-multifd: off > dirty-bitmaps: off postcopy-blocktime: off > Migration status: completed > total time: 333668 milliseconds > downtime: 17 milliseconds > setup: 50 milliseconds > transferred ram: 10938039 kbytes > throughput: 268.55 mbps > remaining ram: 0 kbytes >
Re: [Qemu-devel] [PATCH V5 0/9] nvdimm: guarantee persistence of QEMU writes to persistent memory
> Also, there was a discussion about leaving the code unchanged but adding > an nvdimm_flush() call at the very end of migration. I think someone > benchmarked it but can't find the email. Please post a link or > summarize the results, because that approach would be much less > invasive. Thanks! And previous comments: > > > 2. The migration/ram code is invasive. Is it really necessary to > > >persist data each time pages are loaded from a migration stream? It > > >seems simpler to migrate as normal and call pmem_persist() just once > > >after RAM has been migrated but before the migration completes. > > > > The concern is about the overhead of cache flush. > > > > In this patch series, if possible, QEMU will use pmem_mem{set,cpy}_nodrain > > APIs to copy NVDIMM blocks. Those APIs use movnt (if it's available) and > > can avoid the subsequent cache flush. > > > > Anyway, I'll make some microbenchmark to check which one will be better. > The problem is not just the overhead; the problem is the code > complexity; this series makes all the paths through the migration code > more complex in places we wouldn't expect to change. 
I already use the migration info tool and list the result in the Mail just after this patch set sent: Disable all haozhong's pmem_drain and pmem_memset_nodrain kind function call and make the cleanup function do the flush job like this: static int ram_load_cleanup(void *opaque) { RAMBlock *rb; RAMBLOCK_FOREACH(rb) { if (ramblock_is_pmem(rb)) { pmem_persist(rb->host, rb->used_length); } } xbzrle_load_cleanup(); compress_threads_load_cleanup(); RAMBLOCK_FOREACH(rb) { g_free(rb->receivedmap); rb->receivedmap = NULL; } return 0; } The migrate info result is: Haozhong's Manner (qemu) migrate -d tcp:localhost: (qemu) info migrate globals: store-global-state: on only-migratable: off send-configuration: on send-section-footer: on capabilities: xbzrle: off rdma-pin-all: off auto-converge: off zero-blocks: off compress: off events: off postcopy-ram: off x-colo: off release-ram: off block: off return-path: off pause-before-switchover: off x-multifd: off dirty-bitmaps: off postcopy-blocktime: off Migration status: completed total time: 333668 milliseconds downtime: 17 milliseconds setup: 50 milliseconds transferred ram: 10938039 kbytes throughput: 268.55 mbps remaining ram: 0 kbytes total ram: 11027272 kbytes duplicate: 35533 pages skipped: 0 pages normal: 2729095 pages normal bytes: 10916380 kbytes dirty sync count: 4 page size: 4 kbytes (qemu) flush before complete QEMU 2.12.50 monitor - type 'help' for more information (qemu) migrate -d tcp:localhost: (qemu) info migrate globals: store-global-state: on only-migratable: off send-configuration: on send-section-footer: on capabilities: xbzrle: off rdma-pin-all: off auto-converge: off zero-blocks: off compress: off events: off postcopy-ram: off x-colo: off release-ram: off block: off return-path: off pause-before-switchover: off x-multifd: off dirty-bitmaps: off postcopy-blocktime: off Migration status: completed total time: 334836 milliseconds downtime: 17 milliseconds setup: 49 milliseconds transferred ram: 10978886 kbytes 
throughput: 268.62 mbps remaining ram: 0 kbytes total ram: 11027272 kbytes duplicate: 23149 pages skipped: 0 pages normal: 2739314 pages normal bytes: 10957256 kbytes dirty sync count: 4 page size: 4 kbytes (qemu) So Haozhong's manner seems to be a little faster and I choose to keep that. If you want to choose this manner, the code will be clean and no need for > typedef struct { > void (*memset)(void *s, int c, size_t n); > void (*memcpy)(void *dest, const void *src, size_t n); > } MemoryOperations; performance is close, and I am a little new in Qemu:), so both options are OK for me, Which one do you prefer to? From: Stefan Hajnoczi Sent: Thursday, May 31, 2018 1:18:58 PM To: junyan...@gmx.com Cc: qemu-devel@nongnu.org; Haozhong Zhang; xiaoguangrong.e...@gmail.com; crosthwaite.pe...@gmail.com; m...@redhat.com; dgilb...@redhat.com; ehabk...@redhat.com; quint...@redhat.com; Junyan He; stefa...@redhat.com; pbonz...@redhat.com; imamm...@redhat.com; r...@twiddle.net Subject: Re: [Qemu-devel] [PATCH V5 0/9] nvdimm: guarantee persistence of QEMU writes to persistent memory David Gilbert previously suggested a memory access interface. I guess it would look something like this: typedef struct { void (*memset)(void *s, int c, size_t n); void (*memcpy)(void *dest, const void *src, size_t n); } MemoryOperations; That way code doesn't need if (pmem) A else B. It can just do mem_ops->foo(). Have you looked into this idea? Also, there was a dis
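The MemoryOperations idea floated in this thread — dispatching through a small ops table instead of scattering `if (is_pmem)` checks through the migration code — could look roughly like the sketch below. This is not QEMU code: the member names are adjusted (`memset_op`/`memcpy_op`) and the pmem variants are stubbed with plain libc calls, where a real build would use libpmem's non-temporal copies:

```c
#include <assert.h>
#include <string.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical ops table, as suggested in the thread. */
typedef struct {
    void *(*memset_op)(void *s, int c, size_t n);
    void *(*memcpy_op)(void *dest, const void *src, size_t n);
} MemoryOperations;

/* Stand-ins for pmem_memset_nodrain()/pmem_memcpy_nodrain(); real
 * implementations would come from libpmem and avoid cache flushes by
 * using movnt stores where available. */
static void *fake_pmem_memset(void *s, int c, size_t n)
{
    return memset(s, c, n);
}
static void *fake_pmem_memcpy(void *d, const void *s, size_t n)
{
    return memcpy(d, s, n);
}

static const MemoryOperations ram_ops  = { memset, memcpy };
static const MemoryOperations pmem_ops = { fake_pmem_memset, fake_pmem_memcpy };

/* Callers pick the table once per RAM block instead of branching on
 * is_pmem at every copy site. */
static const MemoryOperations *mem_ops_for(bool is_pmem)
{
    return is_pmem ? &pmem_ops : &ram_ops;
}
```

This keeps every load path identical and isolates the pmem decision to one lookup, which is the "least impact" property Dave asks for; the flush-before-complete alternative avoids even this indirection.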
Re: [Qemu-devel] [PATCH V5 0/9] nvdimm: guarantee persistence of QEMU writes to persistent memory
GEN qemu-doc.html GEN qemu-doc.txt GEN qemu.1 CC s390-ccw/bootmap.o GEN docs/interop/qemu-qmp-ref.html ./qemu-options.texi:2855: unknown command `address' ./qemu-options.texi:2855: unknown command `hidden' make: *** [Makefile:915: qemu-doc.html] Error 1 It seems that this is not caused by my patch set? And I can not duplicate in local. Pings, thanks From: Qemu-devel on behalf of Junyan He Sent: Monday, May 21, 2018 3:19:48 AM To: junyan...@gmx.com Cc: Haozhong Zhang; xiaoguangrong.e...@gmail.com; crosthwaite.pe...@gmail.com; m...@redhat.com; qemu-devel@nongnu.org; dgilb...@redhat.com; quint...@redhat.com; Junyan He; stefa...@redhat.com; imamm...@redhat.com; pbonz...@redhat.com; r...@twiddle.net; ehabk...@redhat.com Subject: Re: [Qemu-devel] [PATCH V5 0/9] nvdimm: guarantee persistence of QEMU writes to persistent memory Ping for review, thanks Sent: Thursday, May 10, 2018 at 10:08 AM From: junyan...@gmx.com To: qemu-devel@nongnu.org Cc: "Haozhong Zhang" , xiaoguangrong.e...@gmail.com, crosthwaite.pe...@gmail.com, m...@redhat.com, dgilb...@redhat.com, ehabk...@redhat.com, quint...@redhat.com, "Junyan He" , stefa...@redhat.com, pbonz...@redhat.com, imamm...@redhat.com, r...@twiddle.net Subject: [Qemu-devel] [PATCH V5 0/9] nvdimm: guarantee persistence of QEMU writes to persistent memory From: Junyan He QEMU writes to vNVDIMM backends in the vNVDIMM label emulation and live migration. If the backend is on the persistent memory, QEMU needs to take proper operations to ensure its writes persistent on the persistent memory. Otherwise, a host power failure may result in the loss the guest data on the persistent memory. This v3 patch series is based on Marcel's patch "mem: add share parameter to memory-backend-ram" [1] because of the changes in patch 1. 
[1] [1]https://lists.gnu.org/archive/html/qemu-devel/2018-02/msg03858.html Previous versions can be found at V4: [2]https://lists.gnu.org/archive/html/qemu-devel/2018-02/msg06993.html v3: [3]https://lists.gnu.org/archive/html/qemu-devel/2018-02/msg04365.html v2: [4]https://lists.gnu.org/archive/html/qemu-devel/2018-02/msg01579.html v1: [5]https://lists.gnu.org/archive/html/qemu-devel/2017-12/msg05040.html Changes in v5: * (Patch 9) Add post copy check and output some messages for nvdimm. Changes in v4: * (Patch 2) Fix compilation errors found by patchew. Changes in v3: * (Patch 5) Add a is_pmem flag to ram_handle_compressed() and handle PMEM writes in it, so we don't need the _common function. * (Patch 6) Expose qemu_get_buffer_common so we can remove the unnecessary qemu_get_buffer_to_pmem wrapper. * (Patch 8) Add a is_pmem flag to xbzrle_decode_buffer() and handle PMEM writes in it, so we can remove the unnecessary xbzrle_decode_buffer_{common, to_pmem}. * Move libpmem stubs to stubs/pmem.c and fix the compilation failures of test-{xbzrle,vmstate}.c. Changes in v2: * (Patch 1) Use a flags parameter in file ram allocation functions. * (Patch 2) Add a new option 'pmem' to hostmem-file. * (Patch 3) Use libpmem to operate on the persistent memory, rather than re-implementing those operations in QEMU. * (Patch 5-8) Consider the write persistence in the migration path. 
Haozhong Zhang (8): [1/9] memory, exec: switch file ram allocation functions to 'flags' parameters [2/9] hostmem-file: add the 'pmem' option [3/9] configure: add libpmem support [4/9] mem/nvdimm: ensure write persistence to PMEM in label emulation [5/9] migration/ram: ensure write persistence on loading zero pages to PMEM [6/9] migration/ram: ensure write persistence on loading normal pages to PMEM [7/9] migration/ram: ensure write persistence on loading compressed pages to PMEM [8/9] migration/ram: ensure write persistence on loading xbzrle pages to PMEM Junyan He (1): [9/9] migration/ram: Add check and info message to nvdimm post copy. Signed-off-by: Haozhong Zhang Signed-off-by: Junyan He --- backends/hostmem-file.c | 27 ++- configure | 35 +++ docs/nvdimm.txt | 14 ++ exec.c | 20 hw/mem/nvdimm.c | 9 - include/exec/memory.h | 12 ++-- include/exec/ram_addr.h | 28 ++-- include/migration/qemu-file-types.h | 2 ++ include/qemu/pmem.h | 27 +++ memory.c | 8 +--- migration/qemu-file.c | 29 +++-- migration/ram.c | 52 ++-- migration/ram.h | 2 +- migration/rdma.c | 2 +- migration/xbzrle.c | 8 ++-- migration/xbzrle.h | 3 ++- numa.c | 2 +-
Re: [Qemu-devel] [PATCH V5 0/9] nvdimm: guarantee persistence of QEMU writes to persistent memory
Ping for review, thanks Sent: Thursday, May 10, 2018 at 10:08 AM From: junyan...@gmx.com To: qemu-devel@nongnu.org Cc: "Haozhong Zhang" , xiaoguangrong.e...@gmail.com, crosthwaite.pe...@gmail.com, m...@redhat.com, dgilb...@redhat.com, ehabk...@redhat.com, quint...@redhat.com, "Junyan He" , stefa...@redhat.com, pbonz...@redhat.com, imamm...@redhat.com, r...@twiddle.net Subject: [Qemu-devel] [PATCH V5 0/9] nvdimm: guarantee persistence of QEMU writes to persistent memory From: Junyan He QEMU writes to vNVDIMM backends in the vNVDIMM label emulation and live migration. If the backend is on the persistent memory, QEMU needs to take proper operations to ensure its writes persistent on the persistent memory. Otherwise, a host power failure may result in the loss the guest data on the persistent memory. This v3 patch series is based on Marcel's patch "mem: add share parameter to memory-backend-ram" [1] because of the changes in patch 1. [1] [1]https://lists.gnu.org/archive/html/qemu-devel/2018-02/msg03858.html Previous versions can be found at V4: [2]https://lists.gnu.org/archive/html/qemu-devel/2018-02/msg06993.html v3: [3]https://lists.gnu.org/archive/html/qemu-devel/2018-02/msg04365.html v2: [4]https://lists.gnu.org/archive/html/qemu-devel/2018-02/msg01579.html v1: [5]https://lists.gnu.org/archive/html/qemu-devel/2017-12/msg05040.html Changes in v5: * (Patch 9) Add post copy check and output some messages for nvdimm. Changes in v4: * (Patch 2) Fix compilation errors found by patchew. Changes in v3: * (Patch 5) Add a is_pmem flag to ram_handle_compressed() and handle PMEM writes in it, so we don't need the _common function. * (Patch 6) Expose qemu_get_buffer_common so we can remove the unnecessary qemu_get_buffer_to_pmem wrapper. * (Patch 8) Add a is_pmem flag to xbzrle_decode_buffer() and handle PMEM writes in it, so we can remove the unnecessary xbzrle_decode_buffer_{common, to_pmem}. 
* Move libpmem stubs to stubs/pmem.c and fix the compilation failures of test-{xbzrle,vmstate}.c. Changes in v2: * (Patch 1) Use a flags parameter in file ram allocation functions. * (Patch 2) Add a new option 'pmem' to hostmem-file. * (Patch 3) Use libpmem to operate on the persistent memory, rather than re-implementing those operations in QEMU. * (Patch 5-8) Consider the write persistence in the migration path. Haozhong Zhang (8): [1/9] memory, exec: switch file ram allocation functions to 'flags' parameters [2/9] hostmem-file: add the 'pmem' option [3/9] configure: add libpmem support [4/9] mem/nvdimm: ensure write persistence to PMEM in label emulation [5/9] migration/ram: ensure write persistence on loading zero pages to PMEM [6/9] migration/ram: ensure write persistence on loading normal pages to PMEM [7/9] migration/ram: ensure write persistence on loading compressed pages to PMEM [8/9] migration/ram: ensure write persistence on loading xbzrle pages to PMEM Junyan He (1): [9/9] migration/ram: Add check and info message to nvdimm post copy. Signed-off-by: Haozhong Zhang Signed-off-by: Junyan He --- backends/hostmem-file.c | 27 ++- configure | 35 +++ docs/nvdimm.txt | 14 ++ exec.c | 20 hw/mem/nvdimm.c | 9 - include/exec/memory.h | 12 ++-- include/exec/ram_addr.h | 28 ++-- include/migration/qemu-file-types.h | 2 ++ include/qemu/pmem.h | 27 +++ memory.c | 8 +--- migration/qemu-file.c | 29 +++-- migration/ram.c | 52 ++-- migration/ram.h | 2 +- migration/rdma.c | 2 +- migration/xbzrle.c | 8 ++-- migration/xbzrle.h | 3 ++- numa.c | 2 +- qemu-options.hx | 7 +++ stubs/Makefile.objs | 1 + stubs/pmem.c | 37 + tests/Makefile.include | 4 ++-- tests/test-xbzrle.c | 4 ++-- 22 files changed, 290 insertions(+), 43 deletions(-) -- 2.7.4 References 1. https://lists.gnu.org/archive/html/qemu-devel/2018-02/msg03858.html 2. https://lists.gnu.org/archive/html/qemu-devel/2018-02/msg06993.html 3. https://lists.gnu.org/archive/html/qemu-devel/2018-02/msg04365.html 4. 
https://lists.gnu.org/archive/html/qemu-devel/2018-02/msg01579.html 5. https://lists.gnu.org/archive/html/qemu-devel/2017-12/msg05040.html
Re: [Qemu-devel] [PATCH 1/9 V5] memory, exec: switch file ram allocation functions to 'flags' parameters
> >Â >Â > >Sent:Â Friday, May 11, 2018 at 5:08 AM >From:Â "Murilo Opsfelder Araujo" >To:Â junyan...@gmx.com >Cc:Â "Haozhong Zhang" , xiaoguangrong.e...@gmail.com, >crosthwaite.pe...@gmail.com, m...@redhat.com, qemu-devel@nongnu.org, >dgilb...@redhat.com, quint...@redhat.com, "Junyan He" , >stefa...@redhat.com, imamm...@redhat.com, pbonz...@redhat.com, >r...@twiddle.net, ehabk...@redhat.com >Subject:Â Re: [Qemu-devel] [PATCH 1/9 V5] memory, exec: switch file ram >allocation functions to 'flags' parameters >On Thu, May 10, 2018 at 10:08:50AM +0800, junyan...@gmx.com wrote: >> From: Junyan He >> >> As more flag parameters besides the existing 'share' are going to be >> added to following functions >> memory_region_init_ram_from_file >> qemu_ram_alloc_from_fd >> qemu_ram_alloc_from_file >> let's switch them to use the 'flags' parameters so as to ease future >> flag additions. >> >> The existing 'share' flag is converted to the QEMU_RAM_SHARE bit in >> flags, and other flag bits are ignored by above functions right now. >> >> Signed-off-by: Haozhong Zhang >> --- >> backends/hostmem-file.c | 3 ++- >> exec.c | 7 --- >> include/exec/memory.h | 10 -- >> include/exec/ram_addr.h | 25 +++-- >> memory.c | 8 +--- >> numa.c | 2 +- >> 6 files changed, 43 insertions(+), 12 deletions(-) >> >> diff --git a/backends/hostmem-file.c b/backends/hostmem-file.c >> index 134b08d..30df843 100644 >> --- a/backends/hostmem-file.c >> +++ b/backends/hostmem-file.c >> @@ -58,7 +58,8 @@ file_backend_memory_alloc(HostMemoryBackend *backend, >> Error **errp) >> path = object_get_canonical_path(OBJECT(backend)); >> memory_region_init_ram_from_file(&backend->mr, OBJECT(backend), >> path, >> - backend->size, fb->align, backend->share, >> + backend->size, fb->align, >> + backend->share ? 
QEMU_RAM_SHARE : 0, >> fb->mem_path, errp); >> g_free(path); >> } >> diff --git a/exec.c b/exec.c >> index c7fcefa..fa33c29 100644 >> --- a/exec.c >> +++ b/exec.c >> @@ -2030,12 +2030,13 @@ static void ram_block_add(RAMBlock *new_block, Error >> **errp, bool shared) >> >> #ifdef __linux__ >> RAMBlock *qemu_ram_alloc_from_fd(ram_addr_t size, MemoryRegion *mr, >> - bool share, int fd, >> + uint64_t flags, int fd, >> Error **errp) >> { >> RAMBlock *new_block; >> Error *local_err = NULL; >> int64_t file_size; >> + bool share = flags & QEMU_RAM_SHARE; >> >> if (xen_enabled()) { >> error_setg(errp, "-mem-path not supported with Xen"); >> @@ -2091,7 +2092,7 @@ RAMBlock *qemu_ram_alloc_from_fd(ram_addr_t size, >> MemoryRegion *mr, >> >> >> RAMBlock *qemu_ram_alloc_from_file(ram_addr_t size, MemoryRegion *mr, >> - bool share, const char *mem_path, >> + uint64_t flags, const char *mem_path, >> Error **errp) >> { >> int fd; >> @@ -2103,7 +2104,7 @@ RAMBlock *qemu_ram_alloc_from_file(ram_addr_t size, >> MemoryRegion *mr, >> return NULL; >> } >> >> - block = qemu_ram_alloc_from_fd(size, mr, share, fd, errp); >> + block = qemu_ram_alloc_from_fd(size, mr, flags, fd, errp); >> if (!block) { >> if (created) { >> unlink(mem_path); >> diff --git a/include/exec/memory.h b/include/exec/memory.h >> index 31eae0a..0460313 100644 >> --- a/include/exec/memory.h >> +++ b/include/exec/memory.h >> @@ -507,6 +507,9 @@ void memory_region_init_resizeable_ram(MemoryRegion *mr, >> void *host), >> Error **errp); >> #ifdef __linux__ >> + >> +#define QEMU_RAM_SHARE (1UL << 0) >> + > >Hi, Junyan. > >How does this differ from RAM_SHARED in exec.c? > Yes, they are really the same meaning. But this one is for memory object backend while the RAM_SHARED in exec.c is used for memory block. I think we need it here. >> /** >> * memory_region_init_ram_from_file: Initialize RAM memory region with a >> * mmap-ed backend. 
>> @@ -518,7 +521,10 @@ void memory_region_init_resizeable_ram(MemoryRegion *mr, >> * @size: size of the region. >> * @align: alignment of the region base address; if 0, the default alignment >> * (getpagesize()) will be used. >> - * @share: %true if memory must be mmaped with the MAP_SHARED flag >> + * @flags: specify properties of this memory region, which can be one or >> bit-or >> + * of following values: >> + * - QEMU_RAM_SHARE: memory must be
[Qemu-devel] [PATCH 6/9 V5] migration/ram: ensure write persistence on loading normal pages to PMEM
From: Junyan He When loading a normal page to persistent memory, load its data by libpmem function pmem_memcpy_nodrain() instead of memcpy(). Combined with a call to pmem_drain() at the end of memory loading, we can guarantee all those normal pages are persistenly loaded to PMEM. Signed-off-by: Haozhong Zhang --- include/migration/qemu-file-types.h | 2 ++ include/qemu/pmem.h | 1 + migration/qemu-file.c | 29 +++-- migration/ram.c | 2 +- stubs/pmem.c| 5 + tests/Makefile.include | 2 +- 6 files changed, 29 insertions(+), 12 deletions(-) diff --git a/include/migration/qemu-file-types.h b/include/migration/qemu-file-types.h index bd6d7dd..c7c3f66 100644 --- a/include/migration/qemu-file-types.h +++ b/include/migration/qemu-file-types.h @@ -33,6 +33,8 @@ void qemu_put_byte(QEMUFile *f, int v); void qemu_put_be16(QEMUFile *f, unsigned int v); void qemu_put_be32(QEMUFile *f, unsigned int v); void qemu_put_be64(QEMUFile *f, uint64_t v); +size_t qemu_get_buffer_common(QEMUFile *f, uint8_t *buf, size_t size, + bool is_pmem); size_t qemu_get_buffer(QEMUFile *f, uint8_t *buf, size_t size); int qemu_get_byte(QEMUFile *f); diff --git a/include/qemu/pmem.h b/include/qemu/pmem.h index 9f39ce8..cb9fa5f 100644 --- a/include/qemu/pmem.h +++ b/include/qemu/pmem.h @@ -16,6 +16,7 @@ #include #else /* !CONFIG_LIBPMEM */ +void *pmem_memcpy_nodrain(void *pmemdest, const void *src, size_t len); void *pmem_memcpy_persist(void *pmemdest, const void *src, size_t len); void *pmem_memset_nodrain(void *pmemdest, int c, size_t len); void pmem_drain(void); diff --git a/migration/qemu-file.c b/migration/qemu-file.c index 0463f4c..1ec31d5 100644 --- a/migration/qemu-file.c +++ b/migration/qemu-file.c @@ -26,6 +26,7 @@ #include "qemu-common.h" #include "qemu/error-report.h" #include "qemu/iov.h" +#include "qemu/pmem.h" #include "migration.h" #include "qemu-file.h" #include "trace.h" @@ -471,18 +472,13 @@ size_t qemu_peek_buffer(QEMUFile *f, uint8_t **buf, size_t size, size_t offset) return size; } -/* - 
* Read 'size' bytes of data from the file into buf. - * 'size' can be larger than the internal buffer. - * - * It will return size bytes unless there was an error, in which case it will - * return as many as it managed to read (assuming blocking fd's which - * all current QEMUFile are) - */ -size_t qemu_get_buffer(QEMUFile *f, uint8_t *buf, size_t size) +size_t qemu_get_buffer_common(QEMUFile *f, uint8_t *buf, size_t size, + bool is_pmem) { size_t pending = size; size_t done = 0; +void *(*memcpy_func)(void *d, const void *s, size_t n) = +is_pmem ? pmem_memcpy_nodrain : memcpy; while (pending > 0) { size_t res; @@ -492,7 +488,7 @@ size_t qemu_get_buffer(QEMUFile *f, uint8_t *buf, size_t size) if (res == 0) { return done; } -memcpy(buf, src, res); +memcpy_func(buf, src, res); qemu_file_skip(f, res); buf += res; pending -= res; @@ -502,6 +498,19 @@ size_t qemu_get_buffer(QEMUFile *f, uint8_t *buf, size_t size) } /* + * Read 'size' bytes of data from the file into buf. + * 'size' can be larger than the internal buffer. + * + * It will return size bytes unless there was an error, in which case it will + * return as many as it managed to read (assuming blocking fd's which + * all current QEMUFile are) + */ +size_t qemu_get_buffer(QEMUFile *f, uint8_t *buf, size_t size) +{ +return qemu_get_buffer_common(f, buf, size, false); +} + +/* * Read 'size' bytes of data from the file. * 'size' can be larger than the internal buffer. 
* diff --git a/migration/ram.c b/migration/ram.c index e6ae9e3..2a180bc 100644 --- a/migration/ram.c +++ b/migration/ram.c @@ -3063,7 +3063,7 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id) break; case RAM_SAVE_FLAG_PAGE: -qemu_get_buffer(f, host, TARGET_PAGE_SIZE); +qemu_get_buffer_common(f, host, TARGET_PAGE_SIZE, is_pmem); break; case RAM_SAVE_FLAG_COMPRESS_PAGE: diff --git a/stubs/pmem.c b/stubs/pmem.c index 2f07ae0..b50c35e 100644 --- a/stubs/pmem.c +++ b/stubs/pmem.c @@ -26,3 +26,8 @@ void *pmem_memset_nodrain(void *pmemdest, int c, size_t len) void pmem_drain(void) { } + +void *pmem_memcpy_nodrain(void *pmemdest, const void *src, size_t len) +{ +return memcpy(pmemdest, src, len); +} diff --git a/tests/Makefile.include b/tests/Makefile.include index 3b9a5e3..5c25b9b 100644 --- a/tests/Makefile.include +++ b/tests/Makefile.include @@ -652,7 +652,7 @@ tests/test-qdev-global-props$(EXESUF): tests/test-qdev-global-props.o \ $(te
[Qemu-devel] [PATCH 4/9 V5] mem/nvdimm: ensure write persistence to PMEM in label emulation
From: Junyan He Guest writes to vNVDIMM labels are intercepted and performed on the backend by QEMU. When the backend is a real persistent memory, QEMU needs to take proper operations to ensure its write persistence on the persistent memory. Otherwise, a host power failure may result in the loss of guest label configurations. Signed-off-by: Haozhong Zhang --- hw/mem/nvdimm.c | 9 - include/qemu/pmem.h | 23 +++ stubs/Makefile.objs | 1 + stubs/pmem.c| 19 +++ 4 files changed, 51 insertions(+), 1 deletion(-) create mode 100644 include/qemu/pmem.h create mode 100644 stubs/pmem.c diff --git a/hw/mem/nvdimm.c b/hw/mem/nvdimm.c index acb656b..0c962fd 100644 --- a/hw/mem/nvdimm.c +++ b/hw/mem/nvdimm.c @@ -23,6 +23,7 @@ */ #include "qemu/osdep.h" +#include "qemu/pmem.h" #include "qapi/error.h" #include "qapi/visitor.h" #include "hw/mem/nvdimm.h" @@ -155,11 +156,17 @@ static void nvdimm_write_label_data(NVDIMMDevice *nvdimm, const void *buf, { MemoryRegion *mr; PCDIMMDevice *dimm = PC_DIMM(nvdimm); +bool is_pmem = object_property_get_bool(OBJECT(dimm->hostmem), +"pmem", NULL); uint64_t backend_offset; nvdimm_validate_rw_label_data(nvdimm, size, offset); -memcpy(nvdimm->label_data + offset, buf, size); +if (!is_pmem) { +memcpy(nvdimm->label_data + offset, buf, size); +} else { +pmem_memcpy_persist(nvdimm->label_data + offset, buf, size); +} mr = host_memory_backend_get_memory(dimm->hostmem, &error_abort); backend_offset = memory_region_size(mr) - nvdimm->label_size + offset; diff --git a/include/qemu/pmem.h b/include/qemu/pmem.h new file mode 100644 index 000..00d6680 --- /dev/null +++ b/include/qemu/pmem.h @@ -0,0 +1,23 @@ +/* + * QEMU header file for libpmem. + * + * Copyright (c) 2018 Intel Corporation. + * + * Author: Haozhong Zhang + * + * This work is licensed under the terms of the GNU GPL, version 2 or later. + * See the COPYING file in the top-level directory.
+ */ + +#ifndef QEMU_PMEM_H +#define QEMU_PMEM_H + +#ifdef CONFIG_LIBPMEM +#include +#else /* !CONFIG_LIBPMEM */ + +void *pmem_memcpy_persist(void *pmemdest, const void *src, size_t len); + +#endif /* CONFIG_LIBPMEM */ + +#endif /* !QEMU_PMEM_H */ diff --git a/stubs/Makefile.objs b/stubs/Makefile.objs index 2d59d84..ba944b9 100644 --- a/stubs/Makefile.objs +++ b/stubs/Makefile.objs @@ -43,3 +43,4 @@ stub-obj-y += xen-common.o stub-obj-y += xen-hvm.o stub-obj-y += pci-host-piix.o stub-obj-y += ram-block.o +stub-obj-$(call lnot,$(CONFIG_LIBPMEM)) += pmem.o \ No newline at end of file diff --git a/stubs/pmem.c b/stubs/pmem.c new file mode 100644 index 000..b4ec72d --- /dev/null +++ b/stubs/pmem.c @@ -0,0 +1,19 @@ +/* + * Stubs for libpmem. + * + * Copyright (c) 2018 Intel Corporation. + * + * Author: Haozhong Zhang + * + * This work is licensed under the terms of the GNU GPL, version 2 or later. + * See the COPYING file in the top-level directory. + */ + +#include + +#include "qemu/pmem.h" + +void *pmem_memcpy_persist(void *pmemdest, const void *src, size_t len) +{ +return memcpy(pmemdest, src, len); +} -- 2.7.4
[Qemu-devel] [PATCH 3/9 V5] configure: add libpmem support
From: Junyan He Add a pair of configure options --{enable,disable}-libpmem to control whether QEMU is compiled with PMDK libpmem [1]. QEMU may write to the host persistent memory (e.g. in vNVDIMM label emulation and live migration), so it must take the proper operations to ensure the persistence of its own writes. Depending on the CPU models and available instructions, the optimal operation can vary [2]. PMDK libpmem has already implemented those operations on multiple CPU models (x86 and ARM) and the logic to select the optimal ones, so QEMU can just use libpmem rather than re-implement them. [1] PMDK (formerly known as NVML), https://github.com/pmem/pmdk/ [2] https://github.com/pmem/pmdk/blob/38bfa652721a37fd94c0130ce0e3f5d8baa3ed40/src/libpmem/pmem.c#L33 Signed-off-by: Haozhong Zhang --- configure | 35 +++ 1 file changed, 35 insertions(+) diff --git a/configure b/configure index 1443422..cbb3793 100755 --- a/configure +++ b/configure @@ -456,6 +456,7 @@ jemalloc="no" replication="yes" vxhs="" libxml2="" +libpmem="" supported_cpu="no" supported_os="no" @@ -1379,6 +1380,10 @@ for opt do ;; --disable-git-update) git_update=no ;; + --enable-libpmem) libpmem=yes + ;; + --disable-libpmem) libpmem=no + ;; *) echo "ERROR: unknown option $opt" echo "Try '$0 --help' for more information" @@ -1636,6 +1641,7 @@ disabled with --disable-FEATURE, default is enabled if available: crypto-afalgLinux AF_ALG crypto backend driver vhost-user vhost-user support capstonecapstone disassembler support + libpmem libpmem support NOTE: The object files are built at the place where configure is launched EOF @@ -5443,6 +5449,30 @@ EOF fi ## +# check for libpmem + +if test "$libpmem" != "no"; then + cat > $TMPC < +int main(void) +{ + pmem_is_pmem(0, 0); + return 0; +} +EOF + libpmem_libs="-lpmem" + if compile_prog "" "$libpmem_libs" ; then +libs_softmmu="$libpmem_libs $libs_softmmu" +libpmem="yes" + else +if test "$libpmem" = "yes" ; then + feature_not_found "libpmem" "Install nvml or pmdk"
+fi +libpmem="no" + fi +fi + +## # End of CC checks # After here, no more $cc or $ld runs @@ -5903,6 +5933,7 @@ echo "avx2 optimization $avx2_opt" echo "replication support $replication" echo "VxHS block device $vxhs" echo "capstone $capstone" +echo "libpmem support $libpmem" if test "$sdl_too_old" = "yes"; then echo "-> Your SDL version is too old - please upgrade to have SDL support" @@ -6647,6 +6678,10 @@ if test "$vxhs" = "yes" ; then echo "VXHS_LIBS=$vxhs_libs" >> $config_host_mak fi +if test "$libpmem" = "yes" ; then + echo "CONFIG_LIBPMEM=y" >> $config_host_mak +fi + if test "$tcg_interpreter" = "yes"; then QEMU_INCLUDES="-iquote \$(SRC_PATH)/tcg/tci $QEMU_INCLUDES" elif test "$ARCH" = "sparc64" ; then -- 2.7.4
[Qemu-devel] [PATCH 9/9 V5] migration/ram: Add check and info message to nvdimm post copy.
From: Junyan He NVDIMM memory does not support postcopy migration yet. Disable postcopy when NVDIMM memory is present, and print a hint to the user. Signed-off-by: Junyan He --- migration/ram.c | 9 + 1 file changed, 9 insertions(+) diff --git a/migration/ram.c b/migration/ram.c index afe227e..aa6bb74 100644 --- a/migration/ram.c +++ b/migration/ram.c @@ -3120,6 +3120,15 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id) static bool ram_has_postcopy(void *opaque) { +RAMBlock *rb; +RAMBLOCK_FOREACH(rb) { +if (ramblock_is_pmem(rb)) { +info_report("Block: %s, host: %p is a nvdimm memory, postcopy " + "is not supported now!", rb->idstr, rb->host); +return false; +} +} + return migrate_postcopy_ram(); } -- 2.7.4
[Qemu-devel] [PATCH 1/9 V5] memory, exec: switch file ram allocation functions to 'flags' parameters
From: Junyan He As more flag parameters besides the existing 'share' are going to be added to following functions memory_region_init_ram_from_file qemu_ram_alloc_from_fd qemu_ram_alloc_from_file let's switch them to use the 'flags' parameters so as to ease future flag additions. The existing 'share' flag is converted to the QEMU_RAM_SHARE bit in flags, and other flag bits are ignored by above functions right now. Signed-off-by: Haozhong Zhang --- backends/hostmem-file.c | 3 ++- exec.c | 7 --- include/exec/memory.h | 10 -- include/exec/ram_addr.h | 25 +++-- memory.c| 8 +--- numa.c | 2 +- 6 files changed, 43 insertions(+), 12 deletions(-) diff --git a/backends/hostmem-file.c b/backends/hostmem-file.c index 134b08d..30df843 100644 --- a/backends/hostmem-file.c +++ b/backends/hostmem-file.c @@ -58,7 +58,8 @@ file_backend_memory_alloc(HostMemoryBackend *backend, Error **errp) path = object_get_canonical_path(OBJECT(backend)); memory_region_init_ram_from_file(&backend->mr, OBJECT(backend), path, - backend->size, fb->align, backend->share, + backend->size, fb->align, + backend->share ? 
QEMU_RAM_SHARE : 0, fb->mem_path, errp); g_free(path); } diff --git a/exec.c b/exec.c index c7fcefa..fa33c29 100644 --- a/exec.c +++ b/exec.c @@ -2030,12 +2030,13 @@ static void ram_block_add(RAMBlock *new_block, Error **errp, bool shared) #ifdef __linux__ RAMBlock *qemu_ram_alloc_from_fd(ram_addr_t size, MemoryRegion *mr, - bool share, int fd, + uint64_t flags, int fd, Error **errp) { RAMBlock *new_block; Error *local_err = NULL; int64_t file_size; +bool share = flags & QEMU_RAM_SHARE; if (xen_enabled()) { error_setg(errp, "-mem-path not supported with Xen"); @@ -2091,7 +2092,7 @@ RAMBlock *qemu_ram_alloc_from_fd(ram_addr_t size, MemoryRegion *mr, RAMBlock *qemu_ram_alloc_from_file(ram_addr_t size, MemoryRegion *mr, - bool share, const char *mem_path, + uint64_t flags, const char *mem_path, Error **errp) { int fd; @@ -2103,7 +2104,7 @@ RAMBlock *qemu_ram_alloc_from_file(ram_addr_t size, MemoryRegion *mr, return NULL; } -block = qemu_ram_alloc_from_fd(size, mr, share, fd, errp); +block = qemu_ram_alloc_from_fd(size, mr, flags, fd, errp); if (!block) { if (created) { unlink(mem_path); diff --git a/include/exec/memory.h b/include/exec/memory.h index 31eae0a..0460313 100644 --- a/include/exec/memory.h +++ b/include/exec/memory.h @@ -507,6 +507,9 @@ void memory_region_init_resizeable_ram(MemoryRegion *mr, void *host), Error **errp); #ifdef __linux__ + +#define QEMU_RAM_SHARE (1UL << 0) + /** * memory_region_init_ram_from_file: Initialize RAM memory region with a *mmap-ed backend. @@ -518,7 +521,10 @@ void memory_region_init_resizeable_ram(MemoryRegion *mr, * @size: size of the region. * @align: alignment of the region base address; if 0, the default alignment * (getpagesize()) will be used. - * @share: %true if memory must be mmaped with the MAP_SHARED flag + * @flags: specify properties of this memory region, which can be one or bit-or + * of following values: + * - QEMU_RAM_SHARE: memory must be mmaped with the MAP_SHARED flag + * Other bits are ignored. 
* @path: the path in which to allocate the RAM. * @errp: pointer to Error*, to store an error if it happens. * @@ -530,7 +536,7 @@ void memory_region_init_ram_from_file(MemoryRegion *mr, const char *name, uint64_t size, uint64_t align, - bool share, + uint64_t flags, const char *path, Error **errp); diff --git a/include/exec/ram_addr.h b/include/exec/ram_addr.h index cf2446a..b8b01d1 100644 --- a/include/exec/ram_addr.h +++ b/include/exec/ram_addr.h @@ -72,12 +72,33 @@ static inline unsigned long int ramblock_recv_bitmap_offset(void *host_addr, long qemu_getrampagesize(void); unsigned long last_ram_page(void); + +/** + * qemu_ram_alloc_from_file, + * qemu_ram_alloc_from_fd: Allocate a ram block from the specified back + *
[Qemu-devel] [PATCH 7/9 V5] migration/ram: ensure write persistence on loading compressed pages to PMEM
From: Junyan He When loading a compressed page to persistent memory, flush CPU cache after the data is decompressed. Combined with a call to pmem_drain() at the end of memory loading, we can guarantee those compressed pages are persistently loaded to PMEM. Signed-off-by: Haozhong Zhang --- include/qemu/pmem.h | 1 + migration/ram.c | 10 -- stubs/pmem.c| 4 3 files changed, 13 insertions(+), 2 deletions(-) diff --git a/include/qemu/pmem.h b/include/qemu/pmem.h index cb9fa5f..c9140fb 100644 --- a/include/qemu/pmem.h +++ b/include/qemu/pmem.h @@ -20,6 +20,7 @@ void *pmem_memcpy_nodrain(void *pmemdest, const void *src, size_t len); void *pmem_memcpy_persist(void *pmemdest, const void *src, size_t len); void *pmem_memset_nodrain(void *pmemdest, int c, size_t len); void pmem_drain(void); +void pmem_flush(const void *addr, size_t len); #endif /* CONFIG_LIBPMEM */ diff --git a/migration/ram.c b/migration/ram.c index 2a180bc..e0f3dbc 100644 --- a/migration/ram.c +++ b/migration/ram.c @@ -286,6 +286,7 @@ struct DecompressParam { uint8_t *compbuf; int len; z_stream stream; +bool is_pmem; }; typedef struct DecompressParam DecompressParam; @@ -2591,6 +2592,9 @@ static void *do_data_decompress(void *opaque) error_report("decompress data failed"); qemu_file_set_error(decomp_file, ret); } +if (param->is_pmem) { +pmem_flush(des, len); +} qemu_mutex_lock(&decomp_done_lock); param->done = true; @@ -2702,7 +2706,8 @@ exit: } static void decompress_data_with_multi_threads(QEMUFile *f, - void *host, int len) + void *host, int len, + bool is_pmem) { int idx, thread_count; @@ -2716,6 +2721,7 @@ static void decompress_data_with_multi_threads(QEMUFile *f, qemu_get_buffer(f, decomp_param[idx].compbuf, len); decomp_param[idx].des = host; decomp_param[idx].len = len; +decomp_param[idx].is_pmem = is_pmem; qemu_cond_signal(&decomp_param[idx].cond); qemu_mutex_unlock(&decomp_param[idx].mutex); break; @@ -3073,7 +3079,7 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id) ret = -EINVAL; 
break; } -decompress_data_with_multi_threads(f, host, len); +decompress_data_with_multi_threads(f, host, len, is_pmem); break; case RAM_SAVE_FLAG_XBZRLE: diff --git a/stubs/pmem.c b/stubs/pmem.c index b50c35e..9e7d86a 100644 --- a/stubs/pmem.c +++ b/stubs/pmem.c @@ -31,3 +31,7 @@ void *pmem_memcpy_nodrain(void *pmemdest, const void *src, size_t len) { return memcpy(pmemdest, src, len); } + +void pmem_flush(const void *addr, size_t len) +{ +} -- 2.7.4
[Qemu-devel] [PATCH 8/9 V5] migration/ram: ensure write persistence on loading xbzrle pages to PMEM
From: Junyan He When loading an xbzrle-encoded page to persistent memory, load the data via the libpmem function pmem_memcpy_nodrain() instead of memcpy(). Combined with a call to pmem_drain() at the end of memory loading, we can guarantee those xbzrle-encoded pages are persistently loaded to PMEM. Signed-off-by: Haozhong Zhang --- migration/ram.c| 6 +++--- migration/xbzrle.c | 8 ++-- migration/xbzrle.h | 3 ++- tests/Makefile.include | 2 +- tests/test-xbzrle.c| 4 ++-- 5 files changed, 14 insertions(+), 9 deletions(-) diff --git a/migration/ram.c b/migration/ram.c index e0f3dbc..afe227e 100644 --- a/migration/ram.c +++ b/migration/ram.c @@ -2441,7 +2441,7 @@ static void ram_save_pending(QEMUFile *f, void *opaque, uint64_t max_size, } } -static int load_xbzrle(QEMUFile *f, ram_addr_t addr, void *host) +static int load_xbzrle(QEMUFile *f, ram_addr_t addr, void *host, bool is_pmem) { unsigned int xh_len; int xh_flags; @@ -2467,7 +2467,7 @@ static int load_xbzrle(QEMUFile *f, ram_addr_t addr, void *host) /* decode RLE */ if (xbzrle_decode_buffer(loaded_data, xh_len, host, - TARGET_PAGE_SIZE) == -1) { + TARGET_PAGE_SIZE, is_pmem) == -1) { error_report("Failed to load XBZRLE page - decode error!"); return -1; } @@ -3083,7 +3083,7 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id) break; case RAM_SAVE_FLAG_XBZRLE: -if (load_xbzrle(f, addr, host) < 0) { +if (load_xbzrle(f, addr, host, is_pmem) < 0) { error_report("Failed to decompress XBZRLE page at " RAM_ADDR_FMT, addr); ret = -EINVAL; diff --git a/migration/xbzrle.c b/migration/xbzrle.c index 1ba482d..ca713c3 100644 --- a/migration/xbzrle.c +++ b/migration/xbzrle.c @@ -12,6 +12,7 @@ */ #include "qemu/osdep.h" #include "qemu/cutils.h" +#include "qemu/pmem.h" #include "xbzrle.h" /* @@ -126,11 +127,14 @@ int xbzrle_encode_buffer(uint8_t *old_buf, uint8_t *new_buf, int slen, return d; } -int xbzrle_decode_buffer(uint8_t *src, int slen, uint8_t *dst, int dlen) +int xbzrle_decode_buffer(uint8_t *src, int slen, uint8_t
*dst, int dlen, + bool is_pmem) { int i = 0, d = 0; int ret; uint32_t count = 0; +void *(*memcpy_func)(void *d, const void *s, size_t n) = +is_pmem ? pmem_memcpy_nodrain : memcpy; while (i < slen) { @@ -167,7 +171,7 @@ int xbzrle_decode_buffer(uint8_t *src, int slen, uint8_t *dst, int dlen) return -1; } -memcpy(dst + d, src + i, count); +memcpy_func(dst + d, src + i, count); d += count; i += count; } diff --git a/migration/xbzrle.h b/migration/xbzrle.h index a0db507..f18f679 100644 --- a/migration/xbzrle.h +++ b/migration/xbzrle.h @@ -17,5 +17,6 @@ int xbzrle_encode_buffer(uint8_t *old_buf, uint8_t *new_buf, int slen, uint8_t *dst, int dlen); -int xbzrle_decode_buffer(uint8_t *src, int slen, uint8_t *dst, int dlen); +int xbzrle_decode_buffer(uint8_t *src, int slen, uint8_t *dst, int dlen, + bool is_pmem); #endif diff --git a/tests/Makefile.include b/tests/Makefile.include index 5c25b9b..23d7162 100644 --- a/tests/Makefile.include +++ b/tests/Makefile.include @@ -631,7 +631,7 @@ tests/test-thread-pool$(EXESUF): tests/test-thread-pool.o $(test-block-obj-y) tests/test-iov$(EXESUF): tests/test-iov.o $(test-util-obj-y) tests/test-hbitmap$(EXESUF): tests/test-hbitmap.o $(test-util-obj-y) $(test-crypto-obj-y) tests/test-x86-cpuid$(EXESUF): tests/test-x86-cpuid.o -tests/test-xbzrle$(EXESUF): tests/test-xbzrle.o migration/xbzrle.o migration/page_cache.o $(test-util-obj-y) +tests/test-xbzrle$(EXESUF): tests/test-xbzrle.o migration/xbzrle.o migration/page_cache.o stubs/pmem.o $(test-util-obj-y) tests/test-cutils$(EXESUF): tests/test-cutils.o util/cutils.o $(test-util-obj-y) tests/test-int128$(EXESUF): tests/test-int128.o tests/rcutorture$(EXESUF): tests/rcutorture.o $(test-util-obj-y) diff --git a/tests/test-xbzrle.c b/tests/test-xbzrle.c index f5e08de..9afa0c4 100644 --- a/tests/test-xbzrle.c +++ b/tests/test-xbzrle.c @@ -101,7 +101,7 @@ static void test_encode_decode_1_byte(void) PAGE_SIZE); g_assert(dlen == (uleb128_encode_small(&buf[0], 4095) + 2)); -rc = 
xbzrle_decode_buffer(compressed, dlen, buffer, PAGE_SIZE); +rc = xbzrle_decode_buffer(compressed, dlen, buffer, PAGE_SIZE, false); g_assert(rc == PAGE_SIZE); g_assert(memcmp(test, buffer, PAGE_SIZE) == 0); @@ -156,7 +156,7 @@ static void encode_decode_range(void) dlen = xbzrle_encode_buffer(test, buffer, PAGE_SIZE, compressed, PAGE_SIZE); -rc = xbzrle_decode_buffer(compressed
[Qemu-devel] [PATCH 5/9 V5] migration/ram: ensure write persistence on loading zero pages to PMEM
From: Junyan He When loading a zero page, check whether it will be loaded to persistent memory. If so, load it via the libpmem function pmem_memset_nodrain(). Combined with a call to pmem_drain() at the end of RAM loading, we can guarantee all those zero pages are persistently loaded. Depending on the host HW/SW configurations, pmem_drain() can be "sfence". Therefore, we do not call pmem_drain() after each pmem_memset_nodrain(), or use pmem_memset_persist() (equivalent to pmem_memset_nodrain() + pmem_drain()), in order to avoid unnecessary overhead. Signed-off-by: Haozhong Zhang --- include/qemu/pmem.h | 2 ++ migration/ram.c | 25 + migration/ram.h | 2 +- migration/rdma.c| 2 +- stubs/pmem.c| 9 + 5 files changed, 34 insertions(+), 6 deletions(-) diff --git a/include/qemu/pmem.h b/include/qemu/pmem.h index 00d6680..9f39ce8 100644 --- a/include/qemu/pmem.h +++ b/include/qemu/pmem.h @@ -17,6 +17,8 @@ #else /* !CONFIG_LIBPMEM */ void *pmem_memcpy_persist(void *pmemdest, const void *src, size_t len); +void *pmem_memset_nodrain(void *pmemdest, int c, size_t len); +void pmem_drain(void); #endif /* CONFIG_LIBPMEM */ diff --git a/migration/ram.c b/migration/ram.c index 912810c..e6ae9e3 100644 --- a/migration/ram.c +++ b/migration/ram.c @@ -51,6 +51,7 @@ #include "qemu/rcu_queue.h" #include "migration/colo.h" #include "migration/block.h" +#include "qemu/pmem.h" /***/ /* ram save/restore */ @@ -2529,11 +2530,16 @@ static inline void *host_from_ram_block_offset(RAMBlock *block, * @host: host address for the zero page * @ch: what the page is filled from.
We only support zero * @size: size of the zero page + * @is_pmem: whether @host is in the persistent memory */ -void ram_handle_compressed(void *host, uint8_t ch, uint64_t size) +void ram_handle_compressed(void *host, uint8_t ch, uint64_t size, bool is_pmem) { if (ch != 0 || !is_zero_range(host, size)) { -memset(host, ch, size); +if (!is_pmem) { +memset(host, ch, size); +} else { +pmem_memset_nodrain(host, ch, size); +} } } @@ -2943,6 +2949,7 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id) bool postcopy_running = postcopy_is_running(); /* ADVISE is earlier, it shows the source has the postcopy capability on */ bool postcopy_advised = postcopy_is_advised(); +bool need_pmem_drain = false; seq_iter++; @@ -2968,6 +2975,8 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id) ram_addr_t addr, total_ram_bytes; void *host = NULL; uint8_t ch; +RAMBlock *block = NULL; +bool is_pmem = false; addr = qemu_get_be64(f); flags = addr & ~TARGET_PAGE_MASK; @@ -2984,7 +2993,7 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id) if (flags & (RAM_SAVE_FLAG_ZERO | RAM_SAVE_FLAG_PAGE | RAM_SAVE_FLAG_COMPRESS_PAGE | RAM_SAVE_FLAG_XBZRLE)) { -RAMBlock *block = ram_block_from_stream(f, flags); +block = ram_block_from_stream(f, flags); host = host_from_ram_block_offset(block, addr); if (!host) { @@ -2994,6 +3003,9 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id) } ramblock_recv_bitmap_set(block, host); trace_ram_load_loop(block->idstr, (uint64_t)addr, flags, host); + +is_pmem = ramblock_is_pmem(block); +need_pmem_drain = need_pmem_drain || is_pmem; } switch (flags & ~RAM_SAVE_FLAG_CONTINUE) { @@ -3047,7 +3059,7 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id) case RAM_SAVE_FLAG_ZERO: ch = qemu_get_byte(f); -ram_handle_compressed(host, ch, TARGET_PAGE_SIZE); +ram_handle_compressed(host, ch, TARGET_PAGE_SIZE, is_pmem); break; case RAM_SAVE_FLAG_PAGE: @@ -3090,6 +3102,11 @@ static int ram_load(QEMUFile *f, void 
*opaque, int version_id) } ret |= wait_for_decompress_done(); + +if (need_pmem_drain) { +pmem_drain(); +} + rcu_read_unlock(); trace_ram_load_complete(ret, seq_iter); return ret; diff --git a/migration/ram.h b/migration/ram.h index 5030be1..5c6a288 100644 --- a/migration/ram.h +++ b/migration/ram.h @@ -57,7 +57,7 @@ int ram_postcopy_send_discard_bitmap(MigrationState *ms); int ram_discard_range(const char *block_name, uint64_t start, size_t length); int ram_postcopy_incoming_init(MigrationIncomingState *mis); -void ram_handle_compressed(void *host, uint8_t ch, uint64_t size); +void ram_handle_compressed(void *host, uint8_t ch, uint64_t size, bool is_pmem); int ramblock_recv_bitmap_test(RAMBlock *rb, void *host_addr); bool ramblock_recv_bitmap_test_byte_offset(RAM
[Qemu-devel] [PATCH 2/9 V5] hostmem-file: add the 'pmem' option
From: Junyan He When QEMU emulates vNVDIMM labels and migrates vNVDIMM devices, it needs to know whether the backend storage is a real persistent memory, in order to decide whether special operations should be performed to ensure the data persistence. This boolean option 'pmem' allows users to specify whether the backend storage of memory-backend-file is a real persistent memory. If 'pmem=on', QEMU will set the flag RAM_PMEM in the RAM block of the corresponding memory region. Signed-off-by: Haozhong Zhang --- backends/hostmem-file.c | 26 +- docs/nvdimm.txt | 14 ++ exec.c | 13 - include/exec/memory.h | 2 ++ include/exec/ram_addr.h | 3 +++ qemu-options.hx | 7 +++ 6 files changed, 63 insertions(+), 2 deletions(-) diff --git a/backends/hostmem-file.c b/backends/hostmem-file.c index 30df843..5d706d4 100644 --- a/backends/hostmem-file.c +++ b/backends/hostmem-file.c @@ -34,6 +34,7 @@ struct HostMemoryBackendFile { bool discard_data; char *mem_path; uint64_t align; +bool is_pmem; }; static void @@ -59,7 +60,8 @@ file_backend_memory_alloc(HostMemoryBackend *backend, Error **errp) memory_region_init_ram_from_file(&backend->mr, OBJECT(backend), path, backend->size, fb->align, - backend->share ? QEMU_RAM_SHARE : 0, + (backend->share ? QEMU_RAM_SHARE : 0) | + (fb->is_pmem ? 
QEMU_RAM_PMEM : 0), fb->mem_path, errp); g_free(path); } @@ -131,6 +133,25 @@ static void file_memory_backend_set_align(Object *o, Visitor *v, error_propagate(errp, local_err); } +static bool file_memory_backend_get_pmem(Object *o, Error **errp) +{ +return MEMORY_BACKEND_FILE(o)->is_pmem; +} + +static void file_memory_backend_set_pmem(Object *o, bool value, Error **errp) +{ +HostMemoryBackend *backend = MEMORY_BACKEND(o); +HostMemoryBackendFile *fb = MEMORY_BACKEND_FILE(o); + +if (host_memory_backend_mr_inited(backend)) { +error_setg(errp, "cannot change property 'pmem' of %s '%s'", + object_get_typename(o), backend->id); +return; +} + +fb->is_pmem = value; +} + static void file_backend_unparent(Object *obj) { HostMemoryBackend *backend = MEMORY_BACKEND(obj); @@ -162,6 +183,9 @@ file_backend_class_init(ObjectClass *oc, void *data) file_memory_backend_get_align, file_memory_backend_set_align, NULL, NULL, &error_abort); +object_class_property_add_bool(oc, "pmem", +file_memory_backend_get_pmem, file_memory_backend_set_pmem, +&error_abort); } static void file_backend_instance_finalize(Object *o) diff --git a/docs/nvdimm.txt b/docs/nvdimm.txt index e903d8b..bcb2032 100644 --- a/docs/nvdimm.txt +++ b/docs/nvdimm.txt @@ -153,3 +153,17 @@ guest NVDIMM region mapping structure. This unarmed flag indicates guest software that this vNVDIMM device contains a region that cannot accept persistent writes. In result, for example, the guest Linux NVDIMM driver, marks such vNVDIMM device as read-only. + +If the vNVDIMM backend is on the host persistent memory that can be +accessed in SNIA NVM Programming Model [1] (e.g., Intel NVDIMM), it's +suggested to set the 'pmem' option of memory-backend-file to 'on'. When +'pmem=on' and QEMU is built with libpmem [2] support (configured with +--enable-libpmem), QEMU will take necessary operations to guarantee +the persistence of its own writes to the vNVDIMM backend (e.g., in +vNVDIMM label emulation and live migration). 
+ +References +-- + +[1] SNIA NVM Programming Model: https://www.snia.org/sites/default/files/technical_work/final/NVMProgrammingModel_v1.2.pdf +[2] PMDK: http://pmem.io/pmdk/ diff --git a/exec.c b/exec.c index fa33c29..dedeb4d 100644 --- a/exec.c +++ b/exec.c @@ -52,6 +52,9 @@ #include #endif +/* RAM is backed by the persistent memory. */ +#define RAM_PMEM (1 << 3) + #endif #include "qemu/rcu_queue.h" #include "qemu/main-loop.h" @@ -2037,6 +2040,7 @@ RAMBlock *qemu_ram_alloc_from_fd(ram_addr_t size, MemoryRegion *mr, Error *local_err = NULL; int64_t file_size; bool share = flags & QEMU_RAM_SHARE; +bool is_pmem = flags & QEMU_RAM_PMEM; if (xen_enabled()) { error_setg(errp, "-mem-path not supported with Xen"); @@ -2073,7 +2077,8 @@ RAMBlock *qemu_ram_alloc_from_fd(ram_addr_t size, MemoryRegion *mr, new_block->mr = mr; new_block->used_length = size; new_block->max_length = size; -new_block->flags = share ? RAM_SHARED : 0; +new_block->flags = (share ? RAM_
[Qemu-devel] [PATCH V5 0/9] nvdimm: guarantee persistence of QEMU writes to persistent memory
From: Junyan He QEMU writes to vNVDIMM backends in the vNVDIMM label emulation and live migration. If the backend is on the persistent memory, QEMU needs to take proper operations to ensure its writes are persistent on the persistent memory. Otherwise, a host power failure may result in the loss of the guest data on the persistent memory. This patch series is based on Marcel's patch "mem: add share parameter to memory-backend-ram" [1] because of the changes in patch 1. [1] https://lists.gnu.org/archive/html/qemu-devel/2018-02/msg03858.html Previous versions can be found at V4: https://lists.gnu.org/archive/html/qemu-devel/2018-02/msg06993.html v3: https://lists.gnu.org/archive/html/qemu-devel/2018-02/msg04365.html v2: https://lists.gnu.org/archive/html/qemu-devel/2018-02/msg01579.html v1: https://lists.gnu.org/archive/html/qemu-devel/2017-12/msg05040.html Changes in v5: * (Patch 9) Add post copy check and output some messages for nvdimm. Changes in v4: * (Patch 2) Fix compilation errors found by patchew. Changes in v3: * (Patch 5) Add an is_pmem flag to ram_handle_compressed() and handle PMEM writes in it, so we don't need the _common function. * (Patch 6) Expose qemu_get_buffer_common so we can remove the unnecessary qemu_get_buffer_to_pmem wrapper. * (Patch 8) Add an is_pmem flag to xbzrle_decode_buffer() and handle PMEM writes in it, so we can remove the unnecessary xbzrle_decode_buffer_{common, to_pmem}. * Move libpmem stubs to stubs/pmem.c and fix the compilation failures of test-{xbzrle,vmstate}.c. Changes in v2: * (Patch 1) Use a flags parameter in file ram allocation functions. * (Patch 2) Add a new option 'pmem' to hostmem-file. * (Patch 3) Use libpmem to operate on the persistent memory, rather than re-implementing those operations in QEMU. * (Patch 5-8) Consider the write persistence in the migration path.
Haozhong Zhang (8): [1/9] memory, exec: switch file ram allocation functions to 'flags' parameters [2/9] hostmem-file: add the 'pmem' option [3/9] configure: add libpmem support [4/9] mem/nvdimm: ensure write persistence to PMEM in label emulation [5/9] migration/ram: ensure write persistence on loading zero pages to PMEM [6/9] migration/ram: ensure write persistence on loading normal pages to PMEM [7/9] migration/ram: ensure write persistence on loading compressed pages to PMEM [8/9] migration/ram: ensure write persistence on loading xbzrle pages to PMEM Junyan He (1): [9/9] migration/ram: Add check and info message to nvdimm post copy. Signed-off-by: Haozhong Zhang Signed-off-by: Junyan He --- backends/hostmem-file.c | 27 ++- configure | 35 +++ docs/nvdimm.txt | 14 ++ exec.c | 20 hw/mem/nvdimm.c | 9 - include/exec/memory.h | 12 ++-- include/exec/ram_addr.h | 28 ++-- include/migration/qemu-file-types.h | 2 ++ include/qemu/pmem.h | 27 +++ memory.c| 8 +--- migration/qemu-file.c | 29 +++-- migration/ram.c | 52 ++-- migration/ram.h | 2 +- migration/rdma.c| 2 +- migration/xbzrle.c | 8 ++-- migration/xbzrle.h | 3 ++- numa.c | 2 +- qemu-options.hx | 7 +++ stubs/Makefile.objs | 1 + stubs/pmem.c| 37 + tests/Makefile.include | 4 ++-- tests/test-xbzrle.c | 4 ++-- 22 files changed, 290 insertions(+), 43 deletions(-) -- 2.7.4
[Qemu-devel] [PATCH 02/10] RFC: Implement qcow2's snapshot dependent saving function.
From: Junyan He For the qcow2 format, we can increase the reference count of the clusters holding a dependent snapshot's content and link their offsets into the L2 table of the new snapshot point. This avoids an explicit dependency relationship between snapshots: when we delete a snapshot point, we just decrease the cluster reference counts, and no further dependency checking is needed. Signed-off-by: Junyan He --- block/qcow2-snapshot.c | 154 + block/qcow2.c | 2 + block/qcow2.h | 7 +++ 3 files changed, 163 insertions(+) diff --git a/block/qcow2-snapshot.c b/block/qcow2-snapshot.c index cee25f5..8e83084 100644 --- a/block/qcow2-snapshot.c +++ b/block/qcow2-snapshot.c @@ -736,3 +736,157 @@ int qcow2_snapshot_load_tmp(BlockDriverState *bs, return 0; } + +int qcow2_snapshot_save_dependency(BlockDriverState *bs, + const char *depend_snapshot_id, + int64_t depend_offset, + int64_t depend_size, + int64_t offset, + Error **errp) +{ +int snapshot_index; +BDRVQcow2State *s = bs->opaque; +QCowSnapshot *sn; +int ret; +int64_t i; +int64_t total_bytes = depend_size; +int64_t depend_offset1, offset1; +uint64_t *depend_l1_table = NULL; +uint64_t depend_l1_bytes; +uint64_t *depend_l2_table = NULL; +uint64_t depend_l2_offset; +uint64_t depend_entry; +QCowL2Meta l2meta; + +assert(bs->read_only == false); + +if (depend_snapshot_id == NULL) { +return 0; +} + +if (!QEMU_IS_ALIGNED(depend_offset, s->cluster_size)) { +error_setg(errp, "Specified snapshot offset is not multiple of %u", +s->cluster_size); +return -EINVAL; +} + +if (!QEMU_IS_ALIGNED(offset, s->cluster_size)) { +error_setg(errp, "Offset is not multiple of %u", s->cluster_size); +return -EINVAL; +} + +if (!QEMU_IS_ALIGNED(depend_size, s->cluster_size)) { +error_setg(errp, "depend_size is not multiple of %u", s->cluster_size); +return -EINVAL; +} + +snapshot_index = find_snapshot_by_id_and_name(bs, NULL, depend_snapshot_id); +/* Search the snapshot */ +if (snapshot_index < 0) { +error_setg(errp, "Can't find snapshot"); +return -ENOENT; +} + +sn = &s->snapshots[snapshot_index];
+if (sn->disk_size != bs->total_sectors * BDRV_SECTOR_SIZE) { +error_report("qcow2: depending on a snapshot with a different disk " +"size is not implemented"); +return -ENOTSUP; +} + +/* We can only save a dependency on a snapshot's vmstate data */ +depend_offset1 = depend_offset + qcow2_vm_state_offset(s); +offset1 = offset + qcow2_vm_state_offset(s); + +depend_l1_bytes = s->l1_size * sizeof(uint64_t); +depend_l1_table = g_try_malloc0(depend_l1_bytes); +if (depend_l1_table == NULL) { +return -ENOMEM; +} + +ret = bdrv_pread(bs->file, sn->l1_table_offset, depend_l1_table, + depend_l1_bytes); +if (ret < 0) { +goto out; +} +for (i = 0; i < depend_l1_bytes / sizeof(uint64_t); i++) { +be64_to_cpus(&depend_l1_table[i]); +} + +while (total_bytes) { +assert(total_bytes > 0); +/* Find the dependent snapshot's cluster */ +depend_l2_offset = +depend_l1_table[depend_offset1 >> (s->l2_bits + s->cluster_bits)]; +depend_l2_offset &= L1E_OFFSET_MASK; +if (depend_l2_offset == 0) { +ret = -EINVAL; +goto out; +} + +if (offset_into_cluster(s, depend_l2_offset)) { +qcow2_signal_corruption(bs, true, -1, -1, "L2 table offset %#" +PRIx64 " unaligned (L1 index: %#" +PRIx64 ")", +depend_l2_offset, +depend_offset1 >> +(s->l2_bits + s->cluster_bits)); +ret = -EIO; +goto out; +} + +ret = qcow2_cache_get(bs, s->l2_table_cache, depend_l2_offset, + (void **)(&depend_l2_table)); +if (ret < 0) { +goto out; +} + +depend_entry = +be64_to_cpu( +depend_l2_table[offset_to_l2_index(s, depend_offset1)]); +if (depend_entry == 0) { +ret = -EINVAL; +qcow2_cache_put(s->l2_table_cache, (void **)(&depend_l2_table)); +goto out; +} + +memset(&l2meta, 0, sizeof(l2meta)); +l2meta.offset = offset1; +l2meta.alloc_off
[Qemu-devel] [PATCH 09/10] RFC: Add nvdimm snapshot saving to migration.
From: Junyan He The nvdimm size is huge, sometimes 256G or even more. This is a huge burden for snapshot saving: one snapshot point with nvdimm may occupy more than 50G of disk space even with compression enabled. We need to introduce a dependent snapshot manner to solve this problem. The first snapshot point should always be saved completely, with dirty log tracing enabled for the nvdimm memory region after saving. Later snapshot points should add references to the previous snapshot's nvdimm data and save only the dirty pages. This can save a lot of disk space and time if snapshot operations are triggered frequently. Signed-off-by: Junyan He --- Makefile.target |1 + include/migration/misc.h |4 + migration/nvdimm.c | 1033 ++ 3 files changed, 1038 insertions(+) create mode 100644 migration/nvdimm.c diff --git a/Makefile.target b/Makefile.target index 6549481..0259e70 100644 --- a/Makefile.target +++ b/Makefile.target @@ -139,6 +139,7 @@ obj-y += memory.o obj-y += memory_mapping.o obj-y += dump.o obj-y += migration/ram.o +obj-y += migration/nvdimm.o LIBS := $(libs_softmmu) $(LIBS) # Hardware support diff --git a/include/migration/misc.h b/include/migration/misc.h index 77fd4f5..0c23da8 100644 --- a/include/migration/misc.h +++ b/include/migration/misc.h @@ -20,6 +20,10 @@ void ram_mig_init(void); +/* migration/nvdimm.c */ +void nvdimm_snapshot_init(void); +bool ram_block_is_nvdimm_active(RAMBlock *block); + /* migration/block.c */ #ifdef CONFIG_LIVE_BLOCK_MIGRATION diff --git a/migration/nvdimm.c b/migration/nvdimm.c new file mode 100644 index 000..8516bb0 --- /dev/null +++ b/migration/nvdimm.c @@ -0,0 +1,1033 @@ +/* + * QEMU System Emulator + * + * Authors: + * He Junyan + * + * Permission is hereby granted, free of charge, to any person obtaining a copy + * of this software and associated documentation files (the "Software"), to deal + * in the Software without restriction, including without limitation the rights + * to use, copy, modify, merge, publish, distribute, 
sublicense, and/or sell + * copies of the Software, and to permit persons to whom the Software is + * furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice shall be included in + * all copies or substantial portions of the Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN + * THE SOFTWARE. + */ + +#include "qemu/osdep.h" +#include "hw/mem/nvdimm.h" +#include "cpu.h" +#include "qemu/cutils.h" +#include "exec/ram_addr.h" +#include "exec/target_page.h" +#include "qemu/rcu_queue.h" +#include "qemu/error-report.h" +#include "migration.h" +#include "qapi/error.h" +#include "migration/register.h" +#include "migration/ram.h" +#include "migration/qemu-file.h" +#include "migration/misc.h" +#include "migration/savevm.h" +#include "block/snapshot.h" +#include "migration/snapshot.h" + +#define NVDIMM_MIG_VERSION 0x01 + +/* PADDING data, useless */ +#define NVDIMM_PADDING_BYTE 0xce +/* PAGE ids: the page is all zero / not all zero */ +#define NVDIMM_ZERO_PAGE_ID 0xaabc250f +#define NVDIMM_NONZERO_PAGE_ID 0xacbc250e +/* No useful data, for alignment only */ +#define NVDIMM_SECTION_PADDING_ID 0xaaceccea +/* Section for dirty log kind */ +#define NVDIMM_SECTION_DIRTY_LOG_ID 0xbbcd0c1e +/* Section for raw data, no bitmap, dump the whole mem */ +#define NVDIMM_SECTION_DATA_ID 0x76bbcae3 +/* Section for setup */ +#define NVDIMM_SECTION_SETUP 0x7ace0cfa +/* Section for complete */ +#define NVDIMM_SECTION_COMPLETE 0x8ace0cfa +/* Section end symbol */ +#define NVDIMM_SECTION_END_ID 0xccbe8752 +/*
+ * Sections:
+ *
+ * Padding section:
+ * | PADDING_ID | size | PADDING_BYTE ... | END_ID |
+ *
+ * Dirty log section:
+ * | DIRTY_LOG_ID | total size | ram name size | ram name | ram size |
+ * | bitmap size | bitmap data ... | dirty page size | dirty page data ... | END_ID |
[Qemu-devel] [PATCH 08/10] RFC: Add a section_id parameter to save_live_iterate call.
From: Junyan He We need to know the section_id when we do snapshot saving, so add a section_id parameter to the save_live_iterate callback. Signed-off-by: Junyan He --- hw/ppc/spapr.c | 2 +- hw/s390x/s390-stattrib.c | 2 +- include/migration/register.h | 2 +- migration/block.c| 2 +- migration/ram.c | 2 +- migration/savevm.c | 2 +- 6 files changed, 6 insertions(+), 6 deletions(-) diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c index 7e1c858..4cde4f4 100644 --- a/hw/ppc/spapr.c +++ b/hw/ppc/spapr.c @@ -1974,7 +1974,7 @@ static int htab_save_later_pass(QEMUFile *f, sPAPRMachineState *spapr, #define MAX_ITERATION_NS 500 /* 5 ms */ #define MAX_KVM_BUF_SIZE 2048 -static int htab_save_iterate(QEMUFile *f, void *opaque) +static int htab_save_iterate(QEMUFile *f, void *opaque, int section_id) { sPAPRMachineState *spapr = opaque; int fd; diff --git a/hw/s390x/s390-stattrib.c b/hw/s390x/s390-stattrib.c index adf07ef..18ece84 100644 --- a/hw/s390x/s390-stattrib.c +++ b/hw/s390x/s390-stattrib.c @@ -246,7 +246,7 @@ static int cmma_save(QEMUFile *f, void *opaque, int final) return ret; } -static int cmma_save_iterate(QEMUFile *f, void *opaque) +static int cmma_save_iterate(QEMUFile *f, void *opaque, int section_id) { return cmma_save(f, opaque, 0); } diff --git a/include/migration/register.h b/include/migration/register.h index f4f7bdc..7f7df2c 100644 --- a/include/migration/register.h +++ b/include/migration/register.h @@ -31,7 +31,7 @@ typedef struct SaveVMHandlers { * use data that is local to the migration thread or protected * by other locks. */ -int (*save_live_iterate)(QEMUFile *f, void *opaque); +int (*save_live_iterate)(QEMUFile *f, void *opaque, int section_id); /* This runs outside the iothread lock! 
*/ int (*save_setup)(QEMUFile *f, void *opaque); diff --git a/migration/block.c b/migration/block.c index 1f03946..6d4c8a3 100644 --- a/migration/block.c +++ b/migration/block.c @@ -755,7 +755,7 @@ static int block_save_setup(QEMUFile *f, void *opaque) return ret; } -static int block_save_iterate(QEMUFile *f, void *opaque) +static int block_save_iterate(QEMUFile *f, void *opaque, int section_id) { int ret; int64_t last_ftell = qemu_ftell(f); diff --git a/migration/ram.c b/migration/ram.c index 3b6c077..d1db422 100644 --- a/migration/ram.c +++ b/migration/ram.c @@ -2249,7 +2249,7 @@ static int ram_save_setup(QEMUFile *f, void *opaque) * @f: QEMUFile where to send the data * @opaque: RAMState pointer */ -static int ram_save_iterate(QEMUFile *f, void *opaque) +static int ram_save_iterate(QEMUFile *f, void *opaque, int section_id) { RAMState **temp = opaque; RAMState *rs = *temp; diff --git a/migration/savevm.c b/migration/savevm.c index 3a9b904..ce4133a 100644 --- a/migration/savevm.c +++ b/migration/savevm.c @@ -1072,7 +1072,7 @@ int qemu_savevm_state_iterate(QEMUFile *f, bool postcopy) save_section_header(f, se, QEMU_VM_SECTION_PART); -ret = se->ops->save_live_iterate(f, se->opaque); +ret = se->ops->save_live_iterate(f, se->opaque, se->section_id); trace_savevm_section_end(se->idstr, se->section_id, ret); save_section_footer(f, se); -- 2.7.4
[Qemu-devel] [PATCH 05/10] RFC: Add memory region snapshot bitmap get function.
From: Junyan He We need to get the snapshot's bitmap content when dirty log tracing is enabled for nvdimm. Signed-off-by: Junyan He --- exec.c | 7 +++ include/exec/memory.h | 9 + include/exec/ram_addr.h | 2 ++ memory.c| 7 +++ 4 files changed, 25 insertions(+) diff --git a/exec.c b/exec.c index a9181e6..3d2bf0d 100644 --- a/exec.c +++ b/exec.c @@ -1235,6 +1235,13 @@ bool cpu_physical_memory_snapshot_get_dirty(DirtyBitmapSnapshot *snap, return false; } +unsigned long *cpu_physical_memory_snapshot_get_dirty_bitmap + (DirtyBitmapSnapshot *snap) +{ +assert(snap); +return snap->dirty; +} + /* Called from RCU critical section */ hwaddr memory_region_section_get_iotlb(CPUState *cpu, MemoryRegionSection *section, diff --git a/include/exec/memory.h b/include/exec/memory.h index 31eae0a..f742995 100644 --- a/include/exec/memory.h +++ b/include/exec/memory.h @@ -1179,6 +1179,15 @@ bool memory_region_snapshot_get_dirty(MemoryRegion *mr, hwaddr addr, hwaddr size); /** + * memory_region_snapshot_get_dirty_bitmap: Get the dirty bitmap data of the + * snapshot. + * + * @snap: the dirty bitmap snapshot + */ +unsigned long *memory_region_snapshot_get_dirty_bitmap + (DirtyBitmapSnapshot *snap); + +/** * memory_region_reset_dirty: Mark a range of pages as clean, for a specified *client. 
* diff --git a/include/exec/ram_addr.h b/include/exec/ram_addr.h index cf2446a..ce366c1 100644 --- a/include/exec/ram_addr.h +++ b/include/exec/ram_addr.h @@ -371,6 +371,8 @@ DirtyBitmapSnapshot *cpu_physical_memory_snapshot_and_clear_dirty bool cpu_physical_memory_snapshot_get_dirty(DirtyBitmapSnapshot *snap, ram_addr_t start, ram_addr_t length); +unsigned long *cpu_physical_memory_snapshot_get_dirty_bitmap +(DirtyBitmapSnapshot *snap); static inline void cpu_physical_memory_clear_dirty_range(ram_addr_t start, ram_addr_t length) diff --git a/memory.c b/memory.c index 4a8a2fe..68f17f0 100644 --- a/memory.c +++ b/memory.c @@ -1991,6 +1991,13 @@ DirtyBitmapSnapshot *memory_region_snapshot_and_clear_dirty(MemoryRegion *mr, memory_region_get_ram_addr(mr) + addr, size, client); } +unsigned long *memory_region_snapshot_get_dirty_bitmap + (DirtyBitmapSnapshot *snap) +{ +assert(snap); +return cpu_physical_memory_snapshot_get_dirty_bitmap(snap); +} + bool memory_region_snapshot_get_dirty(MemoryRegion *mr, DirtyBitmapSnapshot *snap, hwaddr addr, hwaddr size) { -- 2.7.4
[Qemu-devel] [PATCH 06/10] RFC: Add save dependency functions to qemu_file
From: Junyan He When we save a snapshot, we need qemu_file to support save-dependency operations. It should call the block driver's save-dependency functions to implement them. Signed-off-by: Junyan He --- migration/qemu-file.c | 61 +++ migration/qemu-file.h | 14 migration/savevm.c| 33 +--- 3 files changed, 105 insertions(+), 3 deletions(-) diff --git a/migration/qemu-file.c b/migration/qemu-file.c index 2ab2bf3..9d2a39a 100644 --- a/migration/qemu-file.c +++ b/migration/qemu-file.c @@ -46,10 +46,13 @@ struct QEMUFile { int buf_index; int buf_size; /* 0 when writing */ uint8_t buf[IO_BUF_SIZE]; +char ref_name_str[128]; /* maybe snapshot id */ DECLARE_BITMAP(may_free, MAX_IOV_SIZE); struct iovec iov[MAX_IOV_SIZE]; unsigned int iovcnt; +bool support_dependency; +int32_t dependency_aligment; int last_error; }; @@ -745,3 +748,61 @@ void qemu_file_set_blocking(QEMUFile *f, bool block) f->ops->set_blocking(f->opaque, block); } } + +void qemu_file_set_support_dependency(QEMUFile *f, int32_t alignment) +{ +f->dependency_aligment = alignment; +f->support_dependency = true; +} + +bool qemu_file_is_support_dependency(QEMUFile *f, int32_t *alignment) +{ +if (f->support_dependency && alignment) { +*alignment = f->dependency_aligment; +} + +return f->support_dependency; +} + +/* This function sets the reference name for snapshot usage. Sometimes we need + * to depend on another snapshot's data to avoid redundancy. 
+ */ +bool qemu_file_set_ref_name(QEMUFile *f, const char *name) +{ +if (strlen(name) + 1 > sizeof(f->ref_name_str)) { +return false; +} + +memcpy(f->ref_name_str, name, strlen(name) + 1); +return true; +} + +ssize_t qemu_file_save_dependency(QEMUFile *f, int64_t depend_offset, + int64_t size) +{ +ssize_t ret; + +if (f->support_dependency == false) { +return -1; +} + +assert(f->ops->save_dependency); + +if (!QEMU_IS_ALIGNED(depend_offset, f->dependency_aligment)) { +return -1; +} + +qemu_fflush(f); + +if (!QEMU_IS_ALIGNED(f->pos, f->dependency_aligment)) { +return -1; +} + +ret = f->ops->save_dependency(f->opaque, f->ref_name_str, + depend_offset, size, f->pos); +if (ret > 0) { +f->pos += size; +} + +return ret; +} diff --git a/migration/qemu-file.h b/migration/qemu-file.h index aae4e5e..137b917 100644 --- a/migration/qemu-file.h +++ b/migration/qemu-file.h @@ -57,6 +57,14 @@ typedef ssize_t (QEMUFileWritevBufferFunc)(void *opaque, struct iovec *iov, int iovcnt, int64_t pos); /* + * This function adds a reference to the dependency data in the snapshot + * specified by ref_name_str at this file's current offset + */ +typedef ssize_t (QEMUFileSaveDependencyFunc)(void *opaque, const char *name, + int64_t depend_offset, + int64_t offset, int64_t size); + +/* * This function provides hooks around different * stages of RAM migration. 
* 'opaque' is the backend specific data in QEMUFile @@ -104,6 +112,7 @@ typedef struct QEMUFileOps { QEMUFileWritevBufferFunc *writev_buffer; QEMURetPathFunc *get_return_path; QEMUFileShutdownFunc *shut_down; +QEMUFileSaveDependencyFunc *save_dependency; } QEMUFileOps; typedef struct QEMUFileHooks { @@ -153,6 +162,11 @@ int qemu_file_shutdown(QEMUFile *f); QEMUFile *qemu_file_get_return_path(QEMUFile *f); void qemu_fflush(QEMUFile *f); void qemu_file_set_blocking(QEMUFile *f, bool block); +bool qemu_file_set_ref_name(QEMUFile *f, const char *name); +void qemu_file_set_support_dependency(QEMUFile *f, int32_t alignment); +bool qemu_file_is_support_dependency(QEMUFile *f, int32_t *alignment); +ssize_t qemu_file_save_dependency(QEMUFile *f, int64_t depend_offset, + int64_t size); size_t qemu_get_counted_string(QEMUFile *f, char buf[256]); diff --git a/migration/savevm.c b/migration/savevm.c index 358c5b5..1bbd6aa 100644 --- a/migration/savevm.c +++ b/migration/savevm.c @@ -196,6 +196,20 @@ static ssize_t block_writev_buffer(void *opaque, struct iovec *iov, int iovcnt, return qiov.size; } +static ssize_t block_save_dependency(void *opaque, const char *id_name, + int64_t depend_offset, + int64_t offset, int64_t size) +{ +int ret = bdrv_snapshot_save_dependency(opaque, id_name, +depend_offset, offset, +size, NULL); +if (ret < 0) { +return r
[Qemu-devel] [PATCH 03/10] RFC: Implement save and support snapshot dependency in block driver layer.
From: Junyan He Signed-off-by: Junyan He --- block/snapshot.c | 45 + include/block/snapshot.h | 7 +++ 2 files changed, 52 insertions(+) diff --git a/block/snapshot.c b/block/snapshot.c index eacc1f1..8cc40ac 100644 --- a/block/snapshot.c +++ b/block/snapshot.c @@ -401,6 +401,51 @@ int bdrv_snapshot_load_tmp_by_id_or_name(BlockDriverState *bs, return ret; } +int bdrv_snapshot_save_dependency(BlockDriverState *bs, + const char *depend_snapshot_id, + int64_t depend_offset, + int64_t depend_size, + int64_t offset, + Error **errp) +{ +BlockDriver *drv = bs->drv; + +if (!drv) { +return -ENOMEDIUM; +} + +if (drv->bdrv_snapshot_save_dependency) { +return drv->bdrv_snapshot_save_dependency(bs, depend_snapshot_id, + depend_offset, depend_size, + offset, errp); +} + +if (bs->file) { +return bdrv_snapshot_save_dependency(bs->file->bs, depend_snapshot_id, + depend_offset, depend_size, + offset, errp); +} + +return -ENOTSUP; +} + +int bdrv_snapshot_support_dependency(BlockDriverState *bs, int32_t *alignment) +{ +BlockDriver *drv = bs->drv; +if (!drv || !bdrv_is_inserted(bs) || bdrv_is_read_only(bs)) { +return 0; +} + +if (drv->bdrv_snapshot_support_dependency) { +return drv->bdrv_snapshot_support_dependency(bs, alignment); +} + +if (bs->file != NULL) { +return bdrv_snapshot_support_dependency(bs->file->bs, alignment); +} + +return -ENOTSUP; +} /* Group operations. All block drivers are involved. 
* These functions will properly handle dataplane (take aio_context_acquire diff --git a/include/block/snapshot.h b/include/block/snapshot.h index f73d109..e5bf06f 100644 --- a/include/block/snapshot.h +++ b/include/block/snapshot.h @@ -73,6 +73,13 @@ int bdrv_snapshot_load_tmp(BlockDriverState *bs, int bdrv_snapshot_load_tmp_by_id_or_name(BlockDriverState *bs, const char *id_or_name, Error **errp); +int bdrv_snapshot_save_dependency(BlockDriverState *bs, + const char *depend_snapshot_id, + int64_t depend_offset, + int64_t depend_size, + int64_t offset, + Error **errp); +int bdrv_snapshot_support_dependency(BlockDriverState *bs, int32_t *alignment); /* Group operations. All block drivers are involved. -- 2.7.4
[Qemu-devel] [PATCH 10/10] RFC: Enable nvdimm snapshot functions.
From: Junyan He In snapshot saving, all nvdimm kind memory will be saved in a different way, so we exclude all nvdimm kind memory regions in ram.c. Signed-off-by: Junyan He --- migration/ram.c | 17 + vl.c| 1 + 2 files changed, 18 insertions(+) diff --git a/migration/ram.c b/migration/ram.c index d1db422..ad32469 100644 --- a/migration/ram.c +++ b/migration/ram.c @@ -1219,9 +1219,15 @@ static bool find_dirty_block(RAMState *rs, PageSearchStatus *pss, bool *again) /* Didn't find anything in this RAM Block */ pss->page = 0; pss->block = QLIST_NEXT_RCU(pss->block, next); +while (pss->block && ram_block_is_nvdimm_active(pss->block)) { +pss->block = QLIST_NEXT_RCU(pss->block, next); +} if (!pss->block) { /* Hit the end of the list */ pss->block = QLIST_FIRST_RCU(&ram_list.blocks); +while (pss->block && ram_block_is_nvdimm_active(pss->block)) { +pss->block = QLIST_NEXT_RCU(pss->block, next); +} /* Flag that we've looped */ pss->complete_round = true; rs->ram_bulk_stage = false; @@ -1541,6 +1547,9 @@ static int ram_find_and_save_block(RAMState *rs, bool last_stage) if (!pss.block) { pss.block = QLIST_FIRST_RCU(&ram_list.blocks); +while (pss.block && ram_block_is_nvdimm_active(pss.block)) { +pss.block = QLIST_NEXT_RCU(pss.block, next); +} } do { @@ -1583,6 +1592,10 @@ uint64_t ram_bytes_total(void) rcu_read_lock(); RAMBLOCK_FOREACH(block) { +if (ram_block_is_nvdimm_active(block)) { +/* In snapshot mode nvdimm blocks are saved by migration/nvdimm.c */ +continue; +} total += block->used_length; } rcu_read_unlock(); @@ -,6 +2235,10 @@ static int ram_save_setup(QEMUFile *f, void *opaque) qemu_put_be64(f, ram_bytes_total() | RAM_SAVE_FLAG_MEM_SIZE); RAMBLOCK_FOREACH(block) { +if (ram_block_is_nvdimm_active(block)) { +/* In snapshot mode nvdimm blocks are saved by migration/nvdimm.c */ +continue; +} qemu_put_byte(f, strlen(block->idstr)); qemu_put_buffer(f, (uint8_t *)block->idstr, strlen(block->idstr)); qemu_put_be64(f, block->used_length); diff --git a/vl.c b/vl.c index 3ef04ce..1bd5711 100644 --- a/vl.c +++ b/vl.c @@ -4502,6 +4502,7 
@@ int main(int argc, char **argv, char **envp) blk_mig_init(); ram_mig_init(); +nvdimm_snapshot_init(); /* If the currently selected machine wishes to override the units-per-bus * property of its default HBA interface type, do so now. */ -- 2.7.4
[Qemu-devel] [PATCH 00/10] RFC: Optimize nvdimm kind memory for snapshot.
From: Junyan He The nvdimm size is huge, sometimes 256G or even more. This is a huge burden for snapshot saving: one snapshot point with nvdimm may occupy more than 50G of disk space even with compression enabled. We need to introduce a dependent snapshot manner to solve this problem. The first snapshot point should always be saved completely, with dirty log tracing enabled for the nvdimm memory region after saving. Later snapshot points should add references to the previous snapshot's nvdimm data and save only the dirty pages. This can save a lot of disk space and time if snapshot operations are triggered frequently. We first add save_snapshot_dependency functions to the qcow2 format; a later snapshot then adds references to the dependent snapshot's data clusters. There is an alignment problem here: the dependent data must always be cluster aligned, so we add padding data when saving the snapshot to keep it cluster aligned. The logic between nvdimm and ram for snapshot saving is a little confused now: we need to exclude nvdimm kind memory regions from the ram list, and the dirty log tracing setup is also not very clean. Maybe we can separate snapshot saving from the migration logic later to make the code cleaner. In theory, this manner can apply to any kind of memory, but because it needs dirty log tracing turned on, performance may decline. So we enable it only for nvdimm kind memory for now. 
Signed-off-by: Junyan He --- Makefile.target |1 + block/qcow2-snapshot.c | 154 ++ block/qcow2.c|2 + block/qcow2.h|7 + block/snapshot.c | 45 +++ exec.c |7 + hw/ppc/spapr.c |2 +- hw/s390x/s390-stattrib.c |2 +- include/block/block_int.h|9 ++ include/block/snapshot.h |7 + include/exec/memory.h|9 ++ include/exec/ram_addr.h |2 + include/migration/misc.h |4 + include/migration/register.h |2 +- include/migration/snapshot.h |3 + memory.c | 18 ++- migration/block.c|2 +- migration/nvdimm.c | 1033 + migration/qemu-file.c| 61 + migration/qemu-file.h| 14 ++ migration/ram.c | 19 ++- migration/savevm.c | 62 - vl.c |1 + 23 files changed, 1452 insertions(+), 14 deletions(-)
[Qemu-devel] [PATCH 07/10] RFC: Add get_current_snapshot_info to get the snapshot state.
From: Junyan He We need to know the snapshot saving information when we do dependent snapshot saving, e.g. the name of the previous snapshot. Add a global function to query whether a snapshot is currently being saved and to retrieve its state. Signed-off-by: Junyan He --- include/migration/snapshot.h | 3 +++ migration/savevm.c | 27 +++ 2 files changed, 30 insertions(+) diff --git a/include/migration/snapshot.h b/include/migration/snapshot.h index c85b6ec..0b950ce 100644 --- a/include/migration/snapshot.h +++ b/include/migration/snapshot.h @@ -15,7 +15,10 @@ #ifndef QEMU_MIGRATION_SNAPSHOT_H #define QEMU_MIGRATION_SNAPSHOT_H +#include "block/snapshot.h" + int save_snapshot(const char *name, Error **errp); int load_snapshot(const char *name, Error **errp); +int get_current_snapshot_info(QEMUSnapshotInfo *sn); #endif diff --git a/migration/savevm.c b/migration/savevm.c index 1bbd6aa..3a9b904 100644 --- a/migration/savevm.c +++ b/migration/savevm.c @@ -2212,6 +2212,29 @@ int qemu_loadvm_state(QEMUFile *f) return ret; } +static int in_snap_saving; +static QEMUSnapshotInfo in_snap_saving_sn; + +int get_current_snapshot_info(QEMUSnapshotInfo *sn) +{ +if (in_snap_saving && sn) { +memcpy(sn, &in_snap_saving_sn, sizeof(QEMUSnapshotInfo)); +} + +return in_snap_saving; +} + +static void set_current_snapshot_info(QEMUSnapshotInfo *sn) +{ +if (sn) { +memcpy(&in_snap_saving_sn, sn, sizeof(QEMUSnapshotInfo)); +in_snap_saving = 1; +} else { +memset(&in_snap_saving_sn, 0, sizeof(QEMUSnapshotInfo)); +in_snap_saving = 0; +} +} + int save_snapshot(const char *name, Error **errp) { BlockDriverState *bs, *bs1; @@ -2282,6 +2305,8 @@ int save_snapshot(const char *name, Error **errp) strftime(sn->name, sizeof(sn->name), "vm-%Y%m%d%H%M%S", &tm); } +set_current_snapshot_info(sn); + /* save the VM state */ f = qemu_fopen_bdrv(bs, 1); if (!f) { @@ -2313,6 +2338,8 @@ int save_snapshot(const char *name, Error **errp) ret = 0; the_end: +set_current_snapshot_info(NULL); + if (aio_context) { aio_context_release(aio_context); } -- 2.7.4
[Qemu-devel] [PATCH 04/10] RFC: Set memory_region_set_log available for more client.
From: Junyan He We need to collect the dirty log for nvdimm kind memory, so enable memory_region_set_log for more clients than just VGA. Signed-off-by: Junyan He --- memory.c | 11 ++- 1 file changed, 6 insertions(+), 5 deletions(-) diff --git a/memory.c b/memory.c index e70b64b..4a8a2fe 100644 --- a/memory.c +++ b/memory.c @@ -1921,11 +1921,12 @@ void memory_region_set_log(MemoryRegion *mr, bool log, unsigned client) uint8_t mask = 1 << client; uint8_t old_logging; -assert(client == DIRTY_MEMORY_VGA); -old_logging = mr->vga_logging_count; -mr->vga_logging_count += log ? 1 : -1; -if (!!old_logging == !!mr->vga_logging_count) { -return; +if (client == DIRTY_MEMORY_VGA) { +old_logging = mr->vga_logging_count; +mr->vga_logging_count += log ? 1 : -1; +if (!!old_logging == !!mr->vga_logging_count) { +return; +} } memory_region_transaction_begin(); -- 2.7.4
[Qemu-devel] [PATCH 01/10] RFC: Add save and support snapshot dependency function to block driver.
From: Junyan He We want to support incremental snapshot saving; this needs the image format to support dependency saving. Later snapshots may reference the dependent snapshot's content, which should normally be cluster aligned. Add a query function to check whether the format supports this, and use the save_dependency function to do the real work. Signed-off-by: Junyan He --- include/block/block_int.h | 9 + 1 file changed, 9 insertions(+) diff --git a/include/block/block_int.h b/include/block/block_int.h index 64a5700..be1eca3 100644 --- a/include/block/block_int.h +++ b/include/block/block_int.h @@ -274,6 +274,15 @@ struct BlockDriver { const char *snapshot_id, const char *name, Error **errp); +int (*bdrv_snapshot_save_dependency)(BlockDriverState *bs, + const char *depend_snapshot_id, + int64_t depend_offset, + int64_t depend_size, + int64_t offset, + Error **errp); +int (*bdrv_snapshot_support_dependency)(BlockDriverState *bs, +int32_t *alignment); + int (*bdrv_get_info)(BlockDriverState *bs, BlockDriverInfo *bdi); ImageInfoSpecific *(*bdrv_get_specific_info)(BlockDriverState *bs); -- 2.7.4
[Qemu-devel] [PATCH 07/10] RFC: Add get_current_snapshot_info to get the snapshot state.
From: Junyan He We need to know the snapshot saving information when we do dependent snapshot saving, e.g the name of previous snapshot. Add this global function to query the snapshot status is usable. Signed-off-by: Junyan He --- include/migration/snapshot.h | 3 +++ migration/savevm.c | 27 +++ 2 files changed, 30 insertions(+) diff --git a/include/migration/snapshot.h b/include/migration/snapshot.h index c85b6ec..0b950ce 100644 --- a/include/migration/snapshot.h +++ b/include/migration/snapshot.h @@ -15,7 +15,10 @@ #ifndef QEMU_MIGRATION_SNAPSHOT_H #define QEMU_MIGRATION_SNAPSHOT_H +#include "block/snapshot.h" + int save_snapshot(const char *name, Error **errp); int load_snapshot(const char *name, Error **errp); +int get_current_snapshot_info(QEMUSnapshotInfo *sn); #endif diff --git a/migration/savevm.c b/migration/savevm.c index 1bbd6aa..3a9b904 100644 --- a/migration/savevm.c +++ b/migration/savevm.c @@ -2212,6 +2212,29 @@ int qemu_loadvm_state(QEMUFile *f) return ret; } +static int in_snap_saving; +static QEMUSnapshotInfo in_snap_saving_sn; + +int get_current_snapshot_info(QEMUSnapshotInfo *sn) +{ +if (in_snap_saving && sn) { +memcpy(sn, &in_snap_saving_sn, sizeof(QEMUSnapshotInfo)); +} + +return in_snap_saving; +} + +static void set_current_snapshot_info(QEMUSnapshotInfo *sn) +{ +if (sn) { +memcpy(&in_snap_saving_sn, sn, sizeof(QEMUSnapshotInfo)); +in_snap_saving = 1; +} else { +memset(&in_snap_saving_sn, 0, sizeof(QEMUSnapshotInfo)); +in_snap_saving = 0; +} +} + int save_snapshot(const char *name, Error **errp) { BlockDriverState *bs, *bs1; @@ -2282,6 +2305,8 @@ int save_snapshot(const char *name, Error **errp) strftime(sn->name, sizeof(sn->name), "vm-%Y%m%d%H%M%S", &tm); } +set_current_snapshot_info(sn); + /* save the VM state */ f = qemu_fopen_bdrv(bs, 1); if (!f) { @@ -2313,6 +2338,8 @@ int save_snapshot(const char *name, Error **errp) ret = 0; the_end: +set_current_snapshot_info(NULL); + if (aio_context) { aio_context_release(aio_context); } -- 2.7.4
[Qemu-devel] [PATCH 09/10] RFC: Add nvdimm snapshot saving to migration.
From: Junyan He The nvdimm size is huge, sometimes is more than 256G or even more. This is a huge burden for snapshot saving. One snapshot point with nvdimm may occupy more than 50G disk space even with compression enabled. We need to introduce dependent snapshot manner to solve this problem. The first snapshot point should always be saved completely, and enable dirty log trace after saving for nvdimm memory region. The later snapshot point should add the reference to previous snapshot's nvdimm data and just saving dirty pages. This can save a lot of disk and time if the snapshot operations are triggered frequently. Signed-off-by: Junyan He --- Makefile.target |1 + include/migration/misc.h |4 + migration/nvdimm.c | 1033 ++ 3 files changed, 1038 insertions(+) create mode 100644 migration/nvdimm.c diff --git a/Makefile.target b/Makefile.target index 6549481..0259e70 100644 --- a/Makefile.target +++ b/Makefile.target @@ -139,6 +139,7 @@ obj-y += memory.o obj-y += memory_mapping.o obj-y += dump.o obj-y += migration/ram.o +obj-y += migration/nvdimm.o LIBS := $(libs_softmmu) $(LIBS) # Hardware support diff --git a/include/migration/misc.h b/include/migration/misc.h index 77fd4f5..0c23da8 100644 --- a/include/migration/misc.h +++ b/include/migration/misc.h @@ -20,6 +20,10 @@ void ram_mig_init(void); +/* migration/nvdimm.c */ +void nvdimm_snapshot_init(void); +bool ram_block_is_nvdimm_active(RAMBlock *block); + /* migration/block.c */ #ifdef CONFIG_LIVE_BLOCK_MIGRATION diff --git a/migration/nvdimm.c b/migration/nvdimm.c new file mode 100644 index 000..8516bb0 --- /dev/null +++ b/migration/nvdimm.c @@ -0,0 +1,1033 @@ +/* + * QEMU System Emulator + * + * Authors: + * He Junyan + * + * Permission is hereby granted, free of charge, to any person obtaining a copy + * of this software and associated documentation files (the "Software"), to deal + * in the Software without restriction, including without limitation the rights + * to use, copy, modify, merge, publish, distribute, 
sublicense, and/or sell + * copies of the Software, and to permit persons to whom the Software is + * furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice shall be included in + * all copies or substantial portions of the Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN + * THE SOFTWARE. + */ + +#include "qemu/osdep.h" +#include "hw/mem/nvdimm.h" +#include "cpu.h" +#include "qemu/cutils.h" +#include "exec/ram_addr.h" +#include "exec/target_page.h" +#include "qemu/rcu_queue.h" +#include "qemu/error-report.h" +#include "migration.h" +#include "qapi/error.h" +#include "migration/register.h" +#include "migration/ram.h" +#include "migration/qemu-file.h" +#include "migration.h" +#include "migration/misc.h" +#include "migration/savevm.h" +#include "block/snapshot.h" +#include "migration/snapshot.h" + +#define NVDIMM_MIG_VERSION 0x01 + +/* PADDING data, useless */ +#define NVDIMM_PADDING_BYTE 0xce +/* PAGE id, is all zero */ +#define NVDIMM_ZERO_PAGE_ID 0xaabc250f +#define NVDIMM_NONZERO_PAGE_ID 0xacbc250e +/* No usage date, for alignment only */ +#define NVDIMM_SECTION_PADDING_ID 0xaaceccea +/* Section for dirty log kind */ +#define NVDIMM_SECTION_DIRTY_LOG_ID 0xbbcd0c1e +/* Section for raw data, no bitmap, dump the whole mem */ +#define NVDIMM_SECTION_DATA_ID 0x76bbcae3 +/* Section for setup */ +#define NVDIMM_SECTION_SETUP 0x7ace0cfa +/* Section for setup */ +#define NVDIMM_SECTION_COMPLETE 0x8ace0cfa +/* Section end symbol */ +#define NVDIMM_SECTION_END_ID 0xccbe8752 +/ 
**** Sections ****
+ *
+ * Padding section:
+ * | PADDING_ID | size | PADDING_BYTE ... | END_ID |
+ *
+ * Dirty log section:
+ * | DIRTY_BITMAP_ID | total size | ram name size | ram name | ram size |
+ * | bitmap size | bitmap data ... | dirty page size | dirty page data ... |
+ * | END_ID |
---
[Qemu-devel] [PATCH 05/10] RFC: Add memory region snapshot bitmap get function.
From: Junyan He We need to get the bitmap content of the snapshot when dirty log tracing is enabled for nvdimm. Signed-off-by: Junyan He --- exec.c | 7 +++ include/exec/memory.h | 9 + include/exec/ram_addr.h | 2 ++ memory.c| 7 +++ 4 files changed, 25 insertions(+) diff --git a/exec.c b/exec.c index a9181e6..3d2bf0d 100644 --- a/exec.c +++ b/exec.c @@ -1235,6 +1235,13 @@ bool cpu_physical_memory_snapshot_get_dirty(DirtyBitmapSnapshot *snap, return false; } +unsigned long *cpu_physical_memory_snapshot_get_dirty_bitmap + (DirtyBitmapSnapshot *snap) +{ +assert(snap); +return snap->dirty; +} + /* Called from RCU critical section */ hwaddr memory_region_section_get_iotlb(CPUState *cpu, MemoryRegionSection *section, diff --git a/include/exec/memory.h b/include/exec/memory.h index 31eae0a..f742995 100644 --- a/include/exec/memory.h +++ b/include/exec/memory.h @@ -1179,6 +1179,15 @@ bool memory_region_snapshot_get_dirty(MemoryRegion *mr, hwaddr addr, hwaddr size); /** + * memory_region_snapshot_get_dirty_bitmap: Get the dirty bitmap data of + * the snapshot. + * + * @snap: the dirty bitmap snapshot + */ +unsigned long *memory_region_snapshot_get_dirty_bitmap + (DirtyBitmapSnapshot *snap); + +/** * memory_region_reset_dirty: Mark a range of pages as clean, for a specified *client. 
* diff --git a/include/exec/ram_addr.h b/include/exec/ram_addr.h index cf2446a..ce366c1 100644 --- a/include/exec/ram_addr.h +++ b/include/exec/ram_addr.h @@ -371,6 +371,8 @@ DirtyBitmapSnapshot *cpu_physical_memory_snapshot_and_clear_dirty bool cpu_physical_memory_snapshot_get_dirty(DirtyBitmapSnapshot *snap, ram_addr_t start, ram_addr_t length); +unsigned long *cpu_physical_memory_snapshot_get_dirty_bitmap +(DirtyBitmapSnapshot *snap); static inline void cpu_physical_memory_clear_dirty_range(ram_addr_t start, ram_addr_t length) diff --git a/memory.c b/memory.c index 4a8a2fe..68f17f0 100644 --- a/memory.c +++ b/memory.c @@ -1991,6 +1991,13 @@ DirtyBitmapSnapshot *memory_region_snapshot_and_clear_dirty(MemoryRegion *mr, memory_region_get_ram_addr(mr) + addr, size, client); } +unsigned long *memory_region_snapshot_get_dirty_bitmap + (DirtyBitmapSnapshot *snap) +{ +assert(snap); +return cpu_physical_memory_snapshot_get_dirty_bitmap(snap); +} + bool memory_region_snapshot_get_dirty(MemoryRegion *mr, DirtyBitmapSnapshot *snap, hwaddr addr, hwaddr size) { -- 2.7.4
[Qemu-devel] [PATCH 08/10] RFC: Add a section_id parameter to save_live_iterate call.
From: Junyan He We need to know the section_id when we do snapshot saving. Add a parameter to save_live_iterate function call. Signed-off-by: Junyan He --- hw/ppc/spapr.c | 2 +- hw/s390x/s390-stattrib.c | 2 +- include/migration/register.h | 2 +- migration/block.c| 2 +- migration/ram.c | 2 +- migration/savevm.c | 2 +- 6 files changed, 6 insertions(+), 6 deletions(-) diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c index 7e1c858..4cde4f4 100644 --- a/hw/ppc/spapr.c +++ b/hw/ppc/spapr.c @@ -1974,7 +1974,7 @@ static int htab_save_later_pass(QEMUFile *f, sPAPRMachineState *spapr, #define MAX_ITERATION_NS500 /* 5 ms */ #define MAX_KVM_BUF_SIZE2048 -static int htab_save_iterate(QEMUFile *f, void *opaque) +static int htab_save_iterate(QEMUFile *f, void *opaque, int section_id) { sPAPRMachineState *spapr = opaque; int fd; diff --git a/hw/s390x/s390-stattrib.c b/hw/s390x/s390-stattrib.c index adf07ef..18ece84 100644 --- a/hw/s390x/s390-stattrib.c +++ b/hw/s390x/s390-stattrib.c @@ -246,7 +246,7 @@ static int cmma_save(QEMUFile *f, void *opaque, int final) return ret; } -static int cmma_save_iterate(QEMUFile *f, void *opaque) +static int cmma_save_iterate(QEMUFile *f, void *opaque, int section_id) { return cmma_save(f, opaque, 0); } diff --git a/include/migration/register.h b/include/migration/register.h index f4f7bdc..7f7df2c 100644 --- a/include/migration/register.h +++ b/include/migration/register.h @@ -31,7 +31,7 @@ typedef struct SaveVMHandlers { * use data that is local to the migration thread or protected * by other locks. */ -int (*save_live_iterate)(QEMUFile *f, void *opaque); +int (*save_live_iterate)(QEMUFile *f, void *opaque, int section_id); /* This runs outside the iothread lock! 
*/ int (*save_setup)(QEMUFile *f, void *opaque); diff --git a/migration/block.c b/migration/block.c index 1f03946..6d4c8a3 100644 --- a/migration/block.c +++ b/migration/block.c @@ -755,7 +755,7 @@ static int block_save_setup(QEMUFile *f, void *opaque) return ret; } -static int block_save_iterate(QEMUFile *f, void *opaque) +static int block_save_iterate(QEMUFile *f, void *opaque, int section_id) { int ret; int64_t last_ftell = qemu_ftell(f); diff --git a/migration/ram.c b/migration/ram.c index 3b6c077..d1db422 100644 --- a/migration/ram.c +++ b/migration/ram.c @@ -2249,7 +2249,7 @@ static int ram_save_setup(QEMUFile *f, void *opaque) * @f: QEMUFile where to send the data * @opaque: RAMState pointer */ -static int ram_save_iterate(QEMUFile *f, void *opaque) +static int ram_save_iterate(QEMUFile *f, void *opaque, int section_id) { RAMState **temp = opaque; RAMState *rs = *temp; diff --git a/migration/savevm.c b/migration/savevm.c index 3a9b904..ce4133a 100644 --- a/migration/savevm.c +++ b/migration/savevm.c @@ -1072,7 +1072,7 @@ int qemu_savevm_state_iterate(QEMUFile *f, bool postcopy) save_section_header(f, se, QEMU_VM_SECTION_PART); -ret = se->ops->save_live_iterate(f, se->opaque); +ret = se->ops->save_live_iterate(f, se->opaque, se->section_id); trace_savevm_section_end(se->idstr, se->section_id, ret); save_section_footer(f, se); -- 2.7.4
[Qemu-devel] [PATCH 04/10] RFC: Set memory_region_set_log available for more client.
From: Junyan He We need to collect the dirty log for nvdimm memory, so we enable memory_region_set_log for clients other than just VGA. Signed-off-by: Junyan He --- memory.c | 11 ++- 1 file changed, 6 insertions(+), 5 deletions(-) diff --git a/memory.c b/memory.c index e70b64b..4a8a2fe 100644 --- a/memory.c +++ b/memory.c @@ -1921,11 +1921,12 @@ void memory_region_set_log(MemoryRegion *mr, bool log, unsigned client) uint8_t mask = 1 << client; uint8_t old_logging; -assert(client == DIRTY_MEMORY_VGA); -old_logging = mr->vga_logging_count; -mr->vga_logging_count += log ? 1 : -1; -if (!!old_logging == !!mr->vga_logging_count) { -return; +if (client == DIRTY_MEMORY_VGA) { +old_logging = mr->vga_logging_count; +mr->vga_logging_count += log ? 1 : -1; +if (!!old_logging == !!mr->vga_logging_count) { +return; +} } memory_region_transaction_begin(); -- 2.7.4
[Qemu-devel] [PATCH 03/10] RFC: Implement save and support snapshot dependency in block driver layer.
From: Junyan He Signed-off-by: Junyan He --- block/snapshot.c | 45 + include/block/snapshot.h | 7 +++ 2 files changed, 52 insertions(+) diff --git a/block/snapshot.c b/block/snapshot.c index eacc1f1..8cc40ac 100644 --- a/block/snapshot.c +++ b/block/snapshot.c @@ -401,6 +401,51 @@ int bdrv_snapshot_load_tmp_by_id_or_name(BlockDriverState *bs, return ret; } +int bdrv_snapshot_save_dependency(BlockDriverState *bs, + const char *depend_snapshot_id, + int64_t depend_offset, + int64_t depend_size, + int64_t offset, + Error **errp) +{ +BlockDriver *drv = bs->drv; + +if (!drv) { +return -ENOMEDIUM; +} + +if (drv->bdrv_snapshot_save_dependency) { +return drv->bdrv_snapshot_save_dependency(bs, depend_snapshot_id, + depend_offset, depend_size, + offset, errp); +} + +if (bs->file) { +return bdrv_snapshot_save_dependency(bs->file->bs, depend_snapshot_id, + depend_offset, depend_size, + offset, errp); +} + +return -ENOTSUP; +} + +int bdrv_snapshot_support_dependency(BlockDriverState *bs, int32_t *alignment) +{ +BlockDriver *drv = bs->drv; +if (!drv || !bdrv_is_inserted(bs) || bdrv_is_read_only(bs)) { +return 0; +} + +if (drv->bdrv_snapshot_support_dependency) { +return drv->bdrv_snapshot_support_dependency(bs, alignment); +} + +if (bs->file != NULL) { +return bdrv_snapshot_support_dependency(bs->file->bs, alignment); +} + +return -ENOTSUP; +} /* Group operations. All block drivers are involved. 
* These functions will properly handle dataplane (take aio_context_acquire diff --git a/include/block/snapshot.h b/include/block/snapshot.h index f73d109..e5bf06f 100644 --- a/include/block/snapshot.h +++ b/include/block/snapshot.h @@ -73,6 +73,13 @@ int bdrv_snapshot_load_tmp(BlockDriverState *bs, int bdrv_snapshot_load_tmp_by_id_or_name(BlockDriverState *bs, const char *id_or_name, Error **errp); +int bdrv_snapshot_save_dependency(BlockDriverState *bs, + const char *depend_snapshot_id, + int64_t depend_offset, + int64_t depend_size, + int64_t offset, + Error **errp); +int bdrv_snapshot_support_dependency(BlockDriverState *bs, int32_t *alignment); /* Group operations. All block drivers are involved. -- 2.7.4
[Qemu-devel] [PATCH 10/10] RFC: Enable nvdimm snapshot functions.
From: Junyan He During snapshot saving, all nvdimm memory is saved in a different way, so we exclude all nvdimm memory regions in ram.c. Signed-off-by: Junyan He --- migration/ram.c | 17 + vl.c| 1 + 2 files changed, 18 insertions(+) diff --git a/migration/ram.c b/migration/ram.c index d1db422..ad32469 100644 --- a/migration/ram.c +++ b/migration/ram.c @@ -1219,9 +1219,15 @@ static bool find_dirty_block(RAMState *rs, PageSearchStatus *pss, bool *again) /* Didn't find anything in this RAM Block */ pss->page = 0; pss->block = QLIST_NEXT_RCU(pss->block, next); +while (pss->block && ram_block_is_nvdimm_active(pss->block)) { +pss->block = QLIST_NEXT_RCU(pss->block, next); +} if (!pss->block) { /* Hit the end of the list */ pss->block = QLIST_FIRST_RCU(&ram_list.blocks); +while (pss->block && ram_block_is_nvdimm_active(pss->block)) { +pss->block = QLIST_NEXT_RCU(pss->block, next); +} /* Flag that we've looped */ pss->complete_round = true; rs->ram_bulk_stage = false; @@ -1541,6 +1547,9 @@ static int ram_find_and_save_block(RAMState *rs, bool last_stage) if (!pss.block) { pss.block = QLIST_FIRST_RCU(&ram_list.blocks); +while (pss.block && ram_block_is_nvdimm_active(pss.block)) { +pss.block = QLIST_NEXT_RCU(pss.block, next); +} } do { @@ -1583,6 +1592,10 @@ uint64_t ram_bytes_total(void) rcu_read_lock(); RAMBLOCK_FOREACH(block) { +if (ram_block_is_nvdimm_active(block)) { +/* If snapshotting and the block is nvdimm, let nvdimm do the job */ +continue; +} total += block->used_length; } rcu_read_unlock(); @@ -,6 +2235,10 @@ static int ram_save_setup(QEMUFile *f, void *opaque) qemu_put_be64(f, ram_bytes_total() | RAM_SAVE_FLAG_MEM_SIZE); RAMBLOCK_FOREACH(block) { +if (ram_block_is_nvdimm_active(block)) { +/* If snapshotting and the block is nvdimm, let nvdimm do the job */ +continue; +} qemu_put_byte(f, strlen(block->idstr)); qemu_put_buffer(f, (uint8_t *)block->idstr, strlen(block->idstr)); qemu_put_be64(f, block->used_length); diff --git a/vl.c b/vl.c index 3ef04ce..1bd5711 100644 --- a/vl.c +++ b/vl.c @@ -4502,6 +4502,7 
@@ int main(int argc, char **argv, char **envp) blk_mig_init(); ram_mig_init(); +nvdimm_snapshot_init(); /* If the currently selected machine wishes to override the units-per-bus * property of its default HBA interface type, do so now. */ -- 2.7.4
[Qemu-devel] [PATCH 06/10] RFC: Add save dependency functions to qemu_file
From: Junyan He When we save a snapshot, we need qemu_file to support save dependency operations. It calls the block driver's save dependency functions to implement these operations. Signed-off-by: Junyan He --- migration/qemu-file.c | 61 +++ migration/qemu-file.h | 14 migration/savevm.c| 33 +--- 3 files changed, 105 insertions(+), 3 deletions(-) diff --git a/migration/qemu-file.c b/migration/qemu-file.c index 2ab2bf3..9d2a39a 100644 --- a/migration/qemu-file.c +++ b/migration/qemu-file.c @@ -46,10 +46,13 @@ struct QEMUFile { int buf_index; int buf_size; /* 0 when writing */ uint8_t buf[IO_BUF_SIZE]; +char ref_name_str[128]; /* maybe snapshot id */ DECLARE_BITMAP(may_free, MAX_IOV_SIZE); struct iovec iov[MAX_IOV_SIZE]; unsigned int iovcnt; +bool support_dependency; +int32_t dependency_aligment; int last_error; }; @@ -745,3 +748,61 @@ void qemu_file_set_blocking(QEMUFile *f, bool block) f->ops->set_blocking(f->opaque, block); } } + +void qemu_file_set_support_dependency(QEMUFile *f, int32_t alignment) +{ +f->dependency_aligment = alignment; +f->support_dependency = true; +} + +bool qemu_file_is_support_dependency(QEMUFile *f, int32_t *alignment) +{ +if (f->support_dependency && alignment) { +*alignment = f->dependency_aligment; +} + +return f->support_dependency; +} + +/* This function sets the reference name for snapshot usage. Sometimes it needs + * to depend on another snapshot's data to avoid redundancy. 
+ */ +bool qemu_file_set_ref_name(QEMUFile *f, const char *name) +{ +if (strlen(name) + 1 > sizeof(f->ref_name_str)) { +return false; +} + +memcpy(f->ref_name_str, name, strlen(name) + 1); +return true; +} + +ssize_t qemu_file_save_dependency(QEMUFile *f, int64_t depend_offset, + int64_t size) +{ +ssize_t ret; + +if (f->support_dependency == false) { +return -1; +} + +assert(f->ops->save_dependency); + +if (!QEMU_IS_ALIGNED(depend_offset, f->dependency_aligment)) { +return -1; +} + +qemu_fflush(f); + +if (!QEMU_IS_ALIGNED(f->pos, f->dependency_aligment)) { +return -1; +} + +ret = f->ops->save_dependency(f->opaque, f->ref_name_str, + depend_offset, size, f->pos); +if (ret > 0) { +f->pos += size; +} + +return ret; +} diff --git a/migration/qemu-file.h b/migration/qemu-file.h index aae4e5e..137b917 100644 --- a/migration/qemu-file.h +++ b/migration/qemu-file.h @@ -57,6 +57,14 @@ typedef ssize_t (QEMUFileWritevBufferFunc)(void *opaque, struct iovec *iov, int iovcnt, int64_t pos); /* + * This function add reference to the dependency data in snapshot specified by + * ref_name_str to this file's offset + */ +typedef ssize_t (QEMUFileSaveDependencyFunc)(void *opaque, const char *name, + int64_t depend_offset, + int64_t offset, int64_t size); + +/* * This function provides hooks around different * stages of RAM migration. 
* 'opaque' is the backend specific data in QEMUFile @@ -104,6 +112,7 @@ typedef struct QEMUFileOps { QEMUFileWritevBufferFunc *writev_buffer; QEMURetPathFunc *get_return_path; QEMUFileShutdownFunc *shut_down; +QEMUFileSaveDependencyFunc *save_dependency; } QEMUFileOps; typedef struct QEMUFileHooks { @@ -153,6 +162,11 @@ int qemu_file_shutdown(QEMUFile *f); QEMUFile *qemu_file_get_return_path(QEMUFile *f); void qemu_fflush(QEMUFile *f); void qemu_file_set_blocking(QEMUFile *f, bool block); +bool qemu_file_set_ref_name(QEMUFile *f, const char *name); +void qemu_file_set_support_dependency(QEMUFile *f, int32_t alignment); +bool qemu_file_is_support_dependency(QEMUFile *f, int32_t *alignment); +ssize_t qemu_file_save_dependency(QEMUFile *f, int64_t depend_offset, + int64_t size); size_t qemu_get_counted_string(QEMUFile *f, char buf[256]); diff --git a/migration/savevm.c b/migration/savevm.c index 358c5b5..1bbd6aa 100644 --- a/migration/savevm.c +++ b/migration/savevm.c @@ -196,6 +196,20 @@ static ssize_t block_writev_buffer(void *opaque, struct iovec *iov, int iovcnt, return qiov.size; } +static ssize_t block_save_dependency(void *opaque, const char *id_name, + int64_t depend_offset, + int64_t offset, int64_t size) +{ +int ret = bdrv_snapshot_save_dependency(opaque, id_name, +depend_offset, offset, +size, NULL); +if (ret < 0) { +return r
[Qemu-devel] [PATCH 01/10] RFC: Add save and support snapshot dependency function to block driver.
From: Junyan He We want to support incremental snapshot saving, which requires the image format to support dependency saving. Later snapshots may reference the dependent snapshot's content, and in most cases references must be cluster aligned. Add a query function to check whether the image format supports this, and use the save_dependency function to do the real work. Signed-off-by: Junyan He --- include/block/block_int.h | 9 + 1 file changed, 9 insertions(+) diff --git a/include/block/block_int.h b/include/block/block_int.h index 64a5700..be1eca3 100644 --- a/include/block/block_int.h +++ b/include/block/block_int.h @@ -274,6 +274,15 @@ struct BlockDriver { const char *snapshot_id, const char *name, Error **errp); +int (*bdrv_snapshot_save_dependency)(BlockDriverState *bs, + const char *depend_snapshot_id, + int64_t depend_offset, + int64_t depend_size, + int64_t offset, + Error **errp); +int (*bdrv_snapshot_support_dependency)(BlockDriverState *bs, +int32_t *alignment); + int (*bdrv_get_info)(BlockDriverState *bs, BlockDriverInfo *bdi); ImageInfoSpecific *(*bdrv_get_specific_info)(BlockDriverState *bs); -- 2.7.4
[Qemu-devel] [PATCH 02/10] RFC: Implement qcow2's snapshot dependent saving function.
From: Junyan He For the qcow2 format, we can increase the reference count of the dependent snapshot's data clusters and link their offsets into the L2 table of the new snapshot point. This avoids an explicit dependency relationship between snapshots: when we delete a snapshot point, we just decrease the cluster reference counts, and no further checking is needed. Signed-off-by: Junyan He --- block/qcow2-snapshot.c | 154 + block/qcow2.c | 2 + block/qcow2.h | 7 +++ 3 files changed, 163 insertions(+) diff --git a/block/qcow2-snapshot.c b/block/qcow2-snapshot.c index cee25f5..8e83084 100644 --- a/block/qcow2-snapshot.c +++ b/block/qcow2-snapshot.c @@ -736,3 +736,157 @@ int qcow2_snapshot_load_tmp(BlockDriverState *bs, return 0; } + +int qcow2_snapshot_save_dependency(BlockDriverState *bs, + const char *depend_snapshot_id, + int64_t depend_offset, + int64_t depend_size, + int64_t offset, + Error **errp) +{ +int snapshot_index; +BDRVQcow2State *s = bs->opaque; +QCowSnapshot *sn; +int ret; +int64_t i; +int64_t total_bytes = depend_size; +int64_t depend_offset1, offset1; +uint64_t *depend_l1_table = NULL; +uint64_t depend_l1_bytes; +uint64_t *depend_l2_table = NULL; +uint64_t depend_l2_offset; +uint64_t depend_entry; +QCowL2Meta l2meta; + +assert(bs->read_only == false); + +if (depend_snapshot_id == NULL) { +return 0; +} + +if (!QEMU_IS_ALIGNED(depend_offset, s->cluster_size)) { +error_setg(errp, "Specified snapshot offset is not multiple of %u", +s->cluster_size); +return -EINVAL; +} + +if (!QEMU_IS_ALIGNED(offset, s->cluster_size)) { +error_setg(errp, "Offset is not multiple of %u", s->cluster_size); +return -EINVAL; +} + +if (!QEMU_IS_ALIGNED(depend_size, s->cluster_size)) { +error_setg(errp, "depend_size is not multiple of %u", s->cluster_size); +return -EINVAL; +} + +snapshot_index = find_snapshot_by_id_and_name(bs, NULL, depend_snapshot_id); +/* Search the snapshot */ +if (snapshot_index < 0) { +error_setg(errp, "Can't find snapshot"); +return -ENOENT; +} + +sn = &s->snapshots[snapshot_index]; 
+if (sn->disk_size != bs->total_sectors * BDRV_SECTOR_SIZE) { +error_report("qcow2: depend on the snapshots with different disk " +"size is not implemented"); +return -ENOTSUP; +} + +/* Only can save dependency of snapshot's vmstate data */ +depend_offset1 = depend_offset + qcow2_vm_state_offset(s); +offset1 = offset + qcow2_vm_state_offset(s); + +depend_l1_bytes = s->l1_size * sizeof(uint64_t); +depend_l1_table = g_try_malloc0(depend_l1_bytes); +if (depend_l1_table == NULL) { +return -ENOMEM; +} + +ret = bdrv_pread(bs->file, sn->l1_table_offset, depend_l1_table, + depend_l1_bytes); +if (ret < 0) { +g_free(depend_l1_table); +goto out; +} +for (i = 0; i < depend_l1_bytes / sizeof(uint64_t); i++) { +be64_to_cpus(&depend_l1_table[i]); +} + +while (total_bytes) { +assert(total_bytes > 0); +/* Find the cluster of depend */ +depend_l2_offset = +depend_l1_table[depend_offset1 >> (s->l2_bits + s->cluster_bits)]; +depend_l2_offset &= L1E_OFFSET_MASK; +if (depend_l2_offset == 0) { +ret = -EINVAL; +goto out; +} + +if (offset_into_cluster(s, depend_l2_offset)) { +qcow2_signal_corruption(bs, true, -1, -1, "L2 table offset %#" +PRIx64 " unaligned (L1 index: %#" +PRIx64 ")", +depend_l2_offset, +depend_offset1 >> +(s->l2_bits + s->cluster_bits)); +return -EIO; +} + +ret = qcow2_cache_get(bs, s->l2_table_cache, depend_l2_offset, + (void **)(&depend_l2_table)); +if (ret < 0) { +goto out; +} + +depend_entry = +be64_to_cpu( +depend_l2_table[offset_to_l2_index(s, depend_offset1)]); +if (depend_entry == 0) { +ret = -EINVAL; +qcow2_cache_put(s->l2_table_cache, (void **)(&depend_l2_table)); +goto out; +} + +memset(&l2meta, 0, sizeof(l2meta)); +l2meta.offset = offset1; +l2meta.alloc_off
[Qemu-devel] [PATCH 00/10] RFC: Optimize nvdimm kind memory for snapshot.
From: Junyan He The nvdimm size is huge, sometimes more than 256G. This is a heavy burden for snapshot saving: one snapshot point with nvdimm may occupy more than 50G of disk space even with compression enabled. We need to introduce a dependent snapshot manner to solve this problem. The first snapshot point is always saved completely, and dirty log tracing is enabled for the nvdimm memory region after saving. Later snapshot points add references to the previous snapshot's nvdimm data and save only the dirty pages. This can save a lot of disk space and time if snapshot operations are triggered frequently. We first add save_snapshot_dependency functions to the QCOW2 format; a later snapshot adds references to the dependent snapshot's data clusters. There is an alignment problem here: the dependent data must always be cluster aligned, so we add padding data when saving the snapshot to keep it cluster aligned. The logic between nvdimm and ram for snapshot saving is a little confusing now: we need to exclude nvdimm memory regions from the ram list, and the dirty log tracing setup is also not very clear. Maybe we can separate snapshot saving from the migration logic later to make the code cleaner. In theory, this manner can apply to any kind of memory, but because it needs dirty log tracing turned on, performance may decline. So we enable it only for nvdimm memory for now. 
Signed-off-by: Junyan He --- Makefile.target |1 + block/qcow2-snapshot.c | 154 ++ block/qcow2.c|2 + block/qcow2.h|7 + block/snapshot.c | 45 +++ exec.c |7 + hw/ppc/spapr.c |2 +- hw/s390x/s390-stattrib.c |2 +- include/block/block_int.h|9 ++ include/block/snapshot.h |7 + include/exec/memory.h|9 ++ include/exec/ram_addr.h |2 + include/migration/misc.h |4 + include/migration/register.h |2 +- include/migration/snapshot.h |3 + memory.c | 18 ++- migration/block.c|2 +- migration/nvdimm.c | 1033 + migration/qemu-file.c| 61 + migration/qemu-file.h| 14 ++ migration/ram.c | 19 ++- migration/savevm.c | 62 - vl.c |1 + 23 files changed, 1452 insertions(+), 14 deletions(-)