[Qemu-devel] [PATCH 7/7 V11] migration/ram: ensure write persistence on loading all data to PMEM.
From: Junyan He Because we need to make sure the pmem-backed memory data is synced after migration, we choose to call pmem_persist() when the migration finishes. This makes sure the data on pmem is safe and will not be lost if power is off. Signed-off-by: Junyan He Reviewed-by: Stefan Hajnoczi Reviewed-by: Igor Mammedov --- include/qemu/pmem.h | 6 ++ migration/ram.c | 8 2 files changed, 14 insertions(+) diff --git a/include/qemu/pmem.h b/include/qemu/pmem.h index ebdb070..dfb6d0d 100644 --- a/include/qemu/pmem.h +++ b/include/qemu/pmem.h @@ -25,6 +25,12 @@ pmem_memcpy_persist(void *pmemdest, const void *src, size_t len) return NULL; } +static inline void +pmem_persist(const void *addr, size_t len) +{ +g_assert_not_reached(); +} + #endif /* CONFIG_LIBPMEM */ #endif /* !QEMU_PMEM_H */ diff --git a/migration/ram.c b/migration/ram.c index 309b567..67b620b 100644 --- a/migration/ram.c +++ b/migration/ram.c @@ -33,6 +33,7 @@ #include "qemu/bitops.h" #include "qemu/bitmap.h" #include "qemu/main-loop.h" +#include "qemu/pmem.h" #include "xbzrle.h" #include "ram.h" #include "migration.h" @@ -3540,6 +3541,13 @@ static int ram_load_setup(QEMUFile *f, void *opaque) static int ram_load_cleanup(void *opaque) { RAMBlock *rb; + +RAMBLOCK_FOREACH_MIGRATABLE(rb) { +if (ramblock_is_pmem(rb)) { +pmem_persist(rb->host, rb->used_length); +} +} + xbzrle_load_cleanup(); compress_threads_load_cleanup(); -- 2.7.4
[Qemu-devel] [PATCH 6/7 V11] migration/ram: Add check and info message to nvdimm post copy.
From: Junyan He NVDIMM memory does not support postcopy yet. We disable postcopy if any NVDIMM memory is present and print a log message to inform the user. Signed-off-by: Junyan He Reviewed-by: Stefan Hajnoczi Reviewed-by: Igor Mammedov --- migration/ram.c | 9 + 1 file changed, 9 insertions(+) diff --git a/migration/ram.c b/migration/ram.c index 52dd678..309b567 100644 --- a/migration/ram.c +++ b/migration/ram.c @@ -3899,6 +3899,15 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id) static bool ram_has_postcopy(void *opaque) { +RAMBlock *rb; +RAMBLOCK_FOREACH_MIGRATABLE(rb) { +if (ramblock_is_pmem(rb)) { +info_report("Block: %s, host: %p is a nvdimm memory, postcopy" + "is not supported now!", rb->idstr, rb->host); +return false; +} +} + return migrate_postcopy_ram(); } -- 2.7.4
[Qemu-devel] [PATCH 5/7 V11] mem/nvdimm: ensure write persistence to PMEM in label emulation
From: Junyan He Guest writes to vNVDIMM labels are intercepted and performed on the backend by QEMU. When the backend is a real persistent memory, QEMU needs to take proper operations to ensure its write persistence on the persistent memory. Otherwise, a host power failure may result in the loss of guest label configurations. Signed-off-by: Haozhong Zhang Reviewed-by: Stefan Hajnoczi Reviewed-by: Igor Mammedov --- hw/mem/nvdimm.c | 9 - include/qemu/pmem.h | 30 ++ 2 files changed, 38 insertions(+), 1 deletion(-) create mode 100644 include/qemu/pmem.h diff --git a/hw/mem/nvdimm.c b/hw/mem/nvdimm.c index 021d1c3..1c6674c 100644 --- a/hw/mem/nvdimm.c +++ b/hw/mem/nvdimm.c @@ -23,6 +23,7 @@ */ #include "qemu/osdep.h" +#include "qemu/pmem.h" #include "qapi/error.h" #include "qapi/visitor.h" #include "hw/mem/nvdimm.h" @@ -164,11 +165,17 @@ static void nvdimm_write_label_data(NVDIMMDevice *nvdimm, const void *buf, { MemoryRegion *mr; PCDIMMDevice *dimm = PC_DIMM(nvdimm); +bool is_pmem = object_property_get_bool(OBJECT(dimm->hostmem), +"pmem", NULL); uint64_t backend_offset; nvdimm_validate_rw_label_data(nvdimm, size, offset); -memcpy(nvdimm->label_data + offset, buf, size); +if (!is_pmem) { +memcpy(nvdimm->label_data + offset, buf, size); +} else { +pmem_memcpy_persist(nvdimm->label_data + offset, buf, size); +} mr = host_memory_backend_get_memory(dimm->hostmem); backend_offset = memory_region_size(mr) - nvdimm->label_size + offset; diff --git a/include/qemu/pmem.h b/include/qemu/pmem.h new file mode 100644 index 000..ebdb070 --- /dev/null +++ b/include/qemu/pmem.h @@ -0,0 +1,30 @@ +/* + * QEMU header file for libpmem. + * + * Copyright (c) 2018 Intel Corporation. + * + * Author: Haozhong Zhang + * + * This work is licensed under the terms of the GNU GPL, version 2 or later. + * See the COPYING file in the top-level directory. 
+ */ + +#ifndef QEMU_PMEM_H +#define QEMU_PMEM_H + +#ifdef CONFIG_LIBPMEM +#include +#else /* !CONFIG_LIBPMEM */ + +static inline void * +pmem_memcpy_persist(void *pmemdest, const void *src, size_t len) +{ +/* If 'pmem' option is 'on', we should always have libpmem support, + or qemu will report a error and exit, never come here. */ +g_assert_not_reached(); +return NULL; +} + +#endif /* CONFIG_LIBPMEM */ + +#endif /* !QEMU_PMEM_H */ -- 2.7.4
[Qemu-devel] [PATCH 4/7 V11] hostmem-file: add the 'pmem' option
From: Junyan He When QEMU emulates vNVDIMM labels and migrates vNVDIMM devices, it needs to know whether the backend storage is a real persistent memory, in order to decide whether special operations should be performed to ensure the data persistence. This boolean option 'pmem' allows users to specify whether the backend storage of memory-backend-file is a real persistent memory. If 'pmem=on', QEMU will set the flag RAM_PMEM in the RAM block of the corresponding memory region. If 'pmem' is set while QEMU lacks libpmem support, an error is generated. Signed-off-by: Junyan He Signed-off-by: Haozhong Zhang Reviewed-by: Stefan Hajnoczi Reviewed-by: Igor Mammedov Reviewed-by: Richard Henderson --- backends/hostmem-file.c | 43 +-- docs/nvdimm.txt | 22 ++ exec.c | 8 include/exec/memory.h | 4 include/exec/ram_addr.h | 3 +++ qemu-options.hx | 7 +++ 6 files changed, 85 insertions(+), 2 deletions(-) diff --git a/backends/hostmem-file.c b/backends/hostmem-file.c index 34c68bb..2476dcb 100644 --- a/backends/hostmem-file.c +++ b/backends/hostmem-file.c @@ -12,6 +12,7 @@ #include "qemu/osdep.h" #include "qapi/error.h" #include "qemu-common.h" +#include "qemu/error-report.h" #include "sysemu/hostmem.h" #include "sysemu/sysemu.h" #include "qom/object_interfaces.h" @@ -31,9 +32,10 @@ typedef struct HostMemoryBackendFile HostMemoryBackendFile; struct HostMemoryBackendFile { HostMemoryBackend parent_obj; -bool discard_data; char *mem_path; uint64_t align; +bool discard_data; +bool is_pmem; }; static void @@ -59,7 +61,8 @@ file_backend_memory_alloc(HostMemoryBackend *backend, Error **errp) memory_region_init_ram_from_file(&backend->mr, OBJECT(backend), path, backend->size, fb->align, - backend->share ? RAM_SHARED : 0, + (backend->share ? RAM_SHARED : 0) | + (fb->is_pmem ? 
RAM_PMEM : 0), fb->mem_path, errp); g_free(path); } @@ -131,6 +134,39 @@ static void file_memory_backend_set_align(Object *o, Visitor *v, error_propagate(errp, local_err); } +static bool file_memory_backend_get_pmem(Object *o, Error **errp) +{ +return MEMORY_BACKEND_FILE(o)->is_pmem; +} + +static void file_memory_backend_set_pmem(Object *o, bool value, Error **errp) +{ +HostMemoryBackend *backend = MEMORY_BACKEND(o); +HostMemoryBackendFile *fb = MEMORY_BACKEND_FILE(o); + +if (host_memory_backend_mr_inited(backend)) { +error_setg(errp, "cannot change property 'pmem' of %s '%s'", + object_get_typename(o), + object_get_canonical_path_component(o)); +return; +} + +#ifndef CONFIG_LIBPMEM +if (value) { +Error *local_err = NULL; +error_setg(&local_err, + "Lack of libpmem support while setting the 'pmem=on'" + " of %s '%s'. We can't ensure data persistence.", + object_get_typename(o), + object_get_canonical_path_component(o)); +error_propagate(errp, local_err); +return; +} +#endif + +fb->is_pmem = value; +} + static void file_backend_unparent(Object *obj) { HostMemoryBackend *backend = MEMORY_BACKEND(obj); @@ -162,6 +198,9 @@ file_backend_class_init(ObjectClass *oc, void *data) file_memory_backend_get_align, file_memory_backend_set_align, NULL, NULL, &error_abort); +object_class_property_add_bool(oc, "pmem", +file_memory_backend_get_pmem, file_memory_backend_set_pmem, +&error_abort); } static void file_backend_instance_finalize(Object *o) diff --git a/docs/nvdimm.txt b/docs/nvdimm.txt index 24b443b..5f158a6 100644 --- a/docs/nvdimm.txt +++ b/docs/nvdimm.txt @@ -173,3 +173,25 @@ There are currently two valid values for this option: the NVDIMMs in the event of power loss. This implies that the platform also supports flushing dirty data through the memory controller on power loss. 
+ +If the vNVDIMM backend is in host persistent memory that can be accessed in +SNIA NVM Programming Model [1] (e.g., Intel NVDIMM), it's suggested to set +the 'pmem' option of memory-backend-file to 'on'. When 'pmem' is 'on' and QEMU +is built with libpmem [2] support (configured with --enable-libpmem), QEMU +will take necessary operations to guarantee the persistence of its own writes +to the vNVDIMM backend(e.g., in vNVDIMM label emulation and l
[Qemu-devel] [PATCH 1/7 V11] memory, exec: Expose all memory block related flags.
From: Junyan He We need to use these flags in other files rather than just in exec.c. For example, RAM_SHARED should be used when creating a RAM block from a file. We expose them in exec/memory.h. Signed-off-by: Junyan He Reviewed-by: Stefan Hajnoczi Reviewed-by: Igor Mammedov Reviewed-by: Richard Henderson --- exec.c| 20 include/exec/memory.h | 20 2 files changed, 20 insertions(+), 20 deletions(-) diff --git a/exec.c b/exec.c index 4f5df07..cc042dc 100644 --- a/exec.c +++ b/exec.c @@ -87,26 +87,6 @@ AddressSpace address_space_memory; MemoryRegion io_mem_rom, io_mem_notdirty; static MemoryRegion io_mem_unassigned; - -/* RAM is pre-allocated and passed into qemu_ram_alloc_from_ptr */ -#define RAM_PREALLOC (1 << 0) - -/* RAM is mmap-ed with MAP_SHARED */ -#define RAM_SHARED (1 << 1) - -/* Only a portion of RAM (used_length) is actually used, and migrated. - * This used_length size can change across reboots. - */ -#define RAM_RESIZEABLE (1 << 2) - -/* UFFDIO_ZEROPAGE is available on this RAMBlock to atomically - * zero the page and wake waiting processes. - * (Set during postcopy) - */ -#define RAM_UF_ZEROPAGE (1 << 3) - -/* RAM can be migrated */ -#define RAM_MIGRATABLE (1 << 4) #endif #ifdef TARGET_PAGE_BITS_VARY diff --git a/include/exec/memory.h b/include/exec/memory.h index 448d41a..6d0af29 100644 --- a/include/exec/memory.h +++ b/include/exec/memory.h @@ -103,6 +103,26 @@ struct IOMMUNotifier { }; typedef struct IOMMUNotifier IOMMUNotifier; +/* RAM is pre-allocated and passed into qemu_ram_alloc_from_ptr */ +#define RAM_PREALLOC (1 << 0) + +/* RAM is mmap-ed with MAP_SHARED */ +#define RAM_SHARED (1 << 1) + +/* Only a portion of RAM (used_length) is actually used, and migrated. + * This used_length size can change across reboots. + */ +#define RAM_RESIZEABLE (1 << 2) + +/* UFFDIO_ZEROPAGE is available on this RAMBlock to atomically + * zero the page and wake waiting processes. 
+ * (Set during postcopy) + */ +#define RAM_UF_ZEROPAGE (1 << 3) + +/* RAM can be migrated */ +#define RAM_MIGRATABLE (1 << 4) + static inline void iommu_notifier_init(IOMMUNotifier *n, IOMMUNotify fn, IOMMUNotifierFlag flags, hwaddr start, hwaddr end, -- 2.7.4
[Qemu-devel] [PATCH 2/7 V11] memory, exec: switch file ram allocation functions to 'flags' parameters
From: Junyan He As more flag parameters besides the existing 'share' are going to be added to the following functions memory_region_init_ram_from_file qemu_ram_alloc_from_fd qemu_ram_alloc_from_file let's switch them to use the 'flags' parameters so as to ease future flag additions. The existing 'share' flag is converted to the RAM_SHARED bit in ram_flags, and other flag bits are ignored by the above functions right now. Signed-off-by: Junyan He Signed-off-by: Haozhong Zhang Reviewed-by: Stefan Hajnoczi Reviewed-by: Igor Mammedov --- backends/hostmem-file.c | 3 ++- exec.c | 10 +- include/exec/memory.h | 7 +-- include/exec/ram_addr.h | 25 +++-- memory.c| 8 +--- numa.c | 2 +- 6 files changed, 41 insertions(+), 14 deletions(-) diff --git a/backends/hostmem-file.c b/backends/hostmem-file.c index 134b08d..34c68bb 100644 --- a/backends/hostmem-file.c +++ b/backends/hostmem-file.c @@ -58,7 +58,8 @@ file_backend_memory_alloc(HostMemoryBackend *backend, Error **errp) path = object_get_canonical_path(OBJECT(backend)); memory_region_init_ram_from_file(&backend->mr, OBJECT(backend), path, - backend->size, fb->align, backend->share, + backend->size, fb->align, + backend->share ? RAM_SHARED : 0, fb->mem_path, errp); g_free(path); } diff --git a/exec.c b/exec.c index cc042dc..3b8f914 100644 --- a/exec.c +++ b/exec.c @@ -2238,7 +2238,7 @@ static void ram_block_add(RAMBlock *new_block, Error **errp, bool shared) #ifdef __linux__ RAMBlock *qemu_ram_alloc_from_fd(ram_addr_t size, MemoryRegion *mr, - bool share, int fd, + uint32_t ram_flags, int fd, Error **errp) { RAMBlock *new_block; @@ -2280,14 +2280,14 @@ RAMBlock *qemu_ram_alloc_from_fd(ram_addr_t size, MemoryRegion *mr, new_block->mr = mr; new_block->used_length = size; new_block->max_length = size; -new_block->flags = share ? 
RAM_SHARED : 0; +new_block->flags = ram_flags; new_block->host = file_ram_alloc(new_block, size, fd, !file_size, errp); if (!new_block->host) { g_free(new_block); return NULL; } -ram_block_add(new_block, &local_err, share); +ram_block_add(new_block, &local_err, ram_flags & RAM_SHARED); if (local_err) { g_free(new_block); error_propagate(errp, local_err); @@ -2299,7 +2299,7 @@ RAMBlock *qemu_ram_alloc_from_fd(ram_addr_t size, MemoryRegion *mr, RAMBlock *qemu_ram_alloc_from_file(ram_addr_t size, MemoryRegion *mr, - bool share, const char *mem_path, + uint32_t ram_flags, const char *mem_path, Error **errp) { int fd; @@ -2311,7 +2311,7 @@ RAMBlock *qemu_ram_alloc_from_file(ram_addr_t size, MemoryRegion *mr, return NULL; } -block = qemu_ram_alloc_from_fd(size, mr, share, fd, errp); +block = qemu_ram_alloc_from_fd(size, mr, ram_flags, fd, errp); if (!block) { if (created) { unlink(mem_path); diff --git a/include/exec/memory.h b/include/exec/memory.h index 6d0af29..30e7166 100644 --- a/include/exec/memory.h +++ b/include/exec/memory.h @@ -640,6 +640,7 @@ void memory_region_init_resizeable_ram(MemoryRegion *mr, void *host), Error **errp); #ifdef __linux__ + /** * memory_region_init_ram_from_file: Initialize RAM memory region with a *mmap-ed backend. @@ -651,7 +652,9 @@ void memory_region_init_resizeable_ram(MemoryRegion *mr, * @size: size of the region. * @align: alignment of the region base address; if 0, the default alignment * (getpagesize()) will be used. - * @share: %true if memory must be mmaped with the MAP_SHARED flag + * @ram_flags: Memory region features: + * - RAM_SHARED: memory must be mmaped with the MAP_SHARED flag + * Other bits are ignored now. * @path: the path in which to allocate the RAM. * @errp: pointer to Error*, to store an error if it happens. * @@ -663,7 +666,7 @@ void memory_region_init_ram_from_file(MemoryRegion *mr, const char *name, uint64_t size, uint64_t align, - bool share, + uint32_t ram_flags, const char *path, Error **errp); di
[Qemu-devel] [PATCH 3/7 V11] configure: add libpmem support
From: Junyan He Add a pair of configure options --{enable,disable}-libpmem to control whether QEMU is compiled with PMDK libpmem [1]. QEMU may write to the host persistent memory (e.g. in vNVDIMM label emulation and live migration), so it must take the proper operations to ensure the persistence of its own writes. Depending on the CPU models and available instructions, the optimal operation can vary [2]. PMDK libpmem has already implemented those operations on multiple CPU models (x86 and ARM) and the logic to select the optimal ones, so QEMU can just use libpmem rather than re-implement them. Libpmem is part of the PMDK project (formerly known as NVML). The project's home page is: http://pmem.io/pmdk/ And the project's repository is: https://github.com/pmem/pmdk/ For more information about libpmem APIs, you can refer to the comments in the source code of: pmdk/src/libpmem/pmem.c, beginning at line 33. Signed-off-by: Junyan He Signed-off-by: Haozhong Zhang Reviewed-by: Stefan Hajnoczi Reviewed-by: Igor Mammedov Reviewed-by: Richard Henderson --- configure | 29 + 1 file changed, 29 insertions(+) diff --git a/configure b/configure index 2a7796e..1c9288b 100755 --- a/configure +++ b/configure @@ -475,6 +475,7 @@ vxhs="" libxml2="" docker="no" debug_mutex="no" +libpmem="" # cross compilers defaults, can be overridden with --cross-cc-ARCH cross_cc_aarch64="aarch64-linux-gnu-gcc" @@ -1435,6 +1436,10 @@ for opt do ;; --disable-debug-mutex) debug_mutex=no ;; + --enable-libpmem) libpmem=yes + ;; + --disable-libpmem) libpmem=no + ;; *) echo "ERROR: unknown option $opt" echo "Try '$0 --help' for more information" @@ -1710,6 +1715,7 @@ disabled with --disable-FEATURE, default is enabled if available: vhost-user vhost-user support capstonecapstone disassembler support debug-mutex mutex debugging support + libpmem libpmem support NOTE: The object files are built at the place where configure is launched EOF @@ -5546,6 +5552,24 @@ if has "docker"; then fi ## +# check for libpmem + +if test 
"$libpmem" != "no"; then + if $pkg_config --exists "libpmem"; then + libpmem="yes" + libpmem_libs=$($pkg_config --libs libpmem) + libpmem_cflags=$($pkg_config --cflags libpmem) + libs_softmmu="$libs_softmmu $libpmem_libs" + QEMU_CFLAGS="$QEMU_CFLAGS $libpmem_cflags" + else + if test "$libpmem" = "yes" ; then + feature_not_found "libpmem" "Install nvml or pmdk" + fi + libpmem="no" + fi +fi + +## # End of CC checks # After here, no more $cc or $ld runs @@ -6010,6 +6034,7 @@ echo "replication support $replication" echo "VxHS block device $vxhs" echo "capstone $capstone" echo "docker$docker" +echo "libpmem support $libpmem" if test "$sdl_too_old" = "yes"; then echo "-> Your SDL version is too old - please upgrade to have SDL support" @@ -6763,6 +6788,10 @@ if test "$vxhs" = "yes" ; then echo "VXHS_LIBS=$vxhs_libs" >> $config_host_mak fi +if test "$libpmem" = "yes" ; then + echo "CONFIG_LIBPMEM=y" >> $config_host_mak +fi + if test "$tcg_interpreter" = "yes"; then QEMU_INCLUDES="-iquote \$(SRC_PATH)/tcg/tci $QEMU_INCLUDES" elif test "$ARCH" = "sparc64" ; then -- 2.7.4
[Qemu-devel] [PATCH 0/7 V11] nvdimm: guarantee persistence of QEMU writes to persistent memory
From: Junyan He QEMU writes to vNVDIMM backends in the vNVDIMM label emulation and live migration. If the backend is on the persistent memory, QEMU needs to take proper operations to ensure its writes are persistent on the persistent memory. Otherwise, a host power failure may result in the loss of the guest data on the persistent memory. This patch series is based on Marcel's patch "mem: add share parameter to memory-backend-ram" [1] because of the changes in patch 1. [1] https://lists.gnu.org/archive/html/qemu-devel/2018-02/msg03858.html Previous versions of this patch series can be found at: v10: https://lists.gnu.org/archive/html/qemu-devel/2018-07/msg03433.html v9: https://lists.gnu.org/archive/html/qemu-devel/2018-07/msg02361.html v8: https://lists.gnu.org/archive/html/qemu-devel/2018-07/msg02279.html v7: https://lists.gnu.org/archive/html/qemu-devel/2018-06/msg02997.html v6: https://lists.gnu.org/archive/html/qemu-devel/2018-06/msg00061.html v5: https://lists.gnu.org/archive/html/qemu-devel/2018-05/msg02258.html V4: https://lists.gnu.org/archive/html/qemu-devel/2018-02/msg06993.html v3: https://lists.gnu.org/archive/html/qemu-devel/2018-02/msg04365.html v2: https://lists.gnu.org/archive/html/qemu-devel/2018-02/msg01579.html v1: https://lists.gnu.org/archive/html/qemu-devel/2017-12/msg05040.html Changes in v11: * (Patch 2) Modify the ram_flags parameter to 32 bits, the same size as in RAMBlock * (Patch 5 and Patch 7) Delete the pmem_xxx stub functions in stubs/pmem.c. Use inline functions with an assert to replace them, because we can never reach them when pmem is enabled without libpmem support. Changes in v10: * (Patch 4) Fix a nit in the nvdimm docs about pmem option usage in the command line The v10 patch set is all reviewed by Igor Mammedov Changes in v9: * (Patch 3 and Patch 4) Reorder these two patches to make the logic right. First add libpmem support, and then we can use libpmem's configure check result. Also fix some typo and grammar issues in these two patches. 
Changes in v8: * (Patch 3) Report an error when the user sets 'pmem' on a file backend while QEMU lacks libpmem support. In this case, we cannot ensure the persistence of the file backend, so we choose to fail rather than continue and make things more confusing. Changes in v7: The v6 patch set has already been reviewed by Stefan Hajnoczi No logic change in this v7 version, just: * Spelling check and some document words refined. * Rebase to "ram is migratable" patch set. Changes in v6: * (Patch 1) Expose all ram block flags rather than redefining the flags. * (Patch 4) Use pkg-config rather than a hard-coded check at configure time. * (Patch 7) Sync and flush all the pmem data when migration completes, rather than syncing pages one by one as in the previous version. Changes in v5: * (Patch 9) Add a post copy check and output some messages for nvdimm. Changes in v4: * (Patch 2) Fix compilation errors found by patchew. Changes in v3: * (Patch 5) Add an is_pmem flag to ram_handle_compressed() and handle PMEM writes in it, so we don't need the _common function. * (Patch 6) Expose qemu_get_buffer_common so we can remove the unnecessary qemu_get_buffer_to_pmem wrapper. * (Patch 8) Add an is_pmem flag to xbzrle_decode_buffer() and handle PMEM writes in it, so we can remove the unnecessary xbzrle_decode_buffer_{common, to_pmem}. * Move libpmem stubs to stubs/pmem.c and fix the compilation failures of test-{xbzrle,vmstate}.c. Changes in v2: * (Patch 1) Use a flags parameter in file ram allocation functions. * (Patch 2) Add a new option 'pmem' to hostmem-file. * (Patch 3) Use libpmem to operate on the persistent memory, rather than re-implementing those operations in QEMU. * (Patch 5-8) Consider the write persistence in the migration path. Junyan: [1/7] memory, exec: Expose all memory block related flags. [6/7] migration/ram: Add check and info message to nvdimm post copy. [7/7] migration/ram: ensure write persistence on loading all data to PMEM. 
Haozhong: [5/7] mem/nvdimm: ensure write persistence to PMEM in label emulation Haozhong & Junyan: [2/7] memory, exec: switch file ram allocation functions to 'flags' parameters [3/7] configure: add libpmem support [4/7] hostmem-file: add the 'pmem' option -- backends/hostmem-file.c | 44 ++-- configure | 29 + docs/nvdimm.txt | 22 ++ exec.c | 38 +- hw/mem/nvdimm.c | 9 - include/exec/memory.h | 31 +-- include/exec/ram_addr.h | 28 ++-- include/qemu/pmem.h | 36 memory.c| 8 +--- migration/ram.c | 17 + numa.c | 2 +- qemu-options.hx | 7 +++
[Qemu-devel] [PATCH 7/7 V10] migration/ram: ensure write persistence on loading all data to PMEM.
From: Junyan He Because we need to make sure the pmem-backed memory data is synced after migration, we choose to call pmem_persist() when the migration finishes. This makes sure the data on pmem is safe and will not be lost if power is off. Signed-off-by: Junyan He Reviewed-by: Stefan Hajnoczi Reviewed-by: Igor Mammedov --- include/qemu/pmem.h | 1 + migration/ram.c | 10 +- stubs/pmem.c| 4 3 files changed, 14 insertions(+), 1 deletion(-) diff --git a/include/qemu/pmem.h b/include/qemu/pmem.h index 00d6680..8f52b08 100644 --- a/include/qemu/pmem.h +++ b/include/qemu/pmem.h @@ -17,6 +17,7 @@ #else /* !CONFIG_LIBPMEM */ void *pmem_memcpy_persist(void *pmemdest, const void *src, size_t len); +void pmem_persist(const void *addr, size_t len); #endif /* CONFIG_LIBPMEM */ diff --git a/migration/ram.c b/migration/ram.c index fd7cba1..67b620b 100644 --- a/migration/ram.c +++ b/migration/ram.c @@ -33,6 +33,7 @@ #include "qemu/bitops.h" #include "qemu/bitmap.h" #include "qemu/main-loop.h" +#include "qemu/pmem.h" #include "xbzrle.h" #include "ram.h" #include "migration.h" @@ -3540,6 +3541,13 @@ static int ram_load_setup(QEMUFile *f, void *opaque) static int ram_load_cleanup(void *opaque) { RAMBlock *rb; + +RAMBLOCK_FOREACH_MIGRATABLE(rb) { +if (ramblock_is_pmem(rb)) { +pmem_persist(rb->host, rb->used_length); +} +} + xbzrle_load_cleanup(); compress_threads_load_cleanup(); @@ -3900,7 +3908,7 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id) static bool ram_has_postcopy(void *opaque) { RAMBlock *rb; -RAMBLOCK_FOREACH(rb) { +RAMBLOCK_FOREACH_MIGRATABLE(rb) { if (ramblock_is_pmem(rb)) { info_report("Block: %s, host: %p is a nvdimm memory, postcopy" "is not supported now!", rb->idstr, rb->host); diff --git a/stubs/pmem.c b/stubs/pmem.c index b4ec72d..f794262 100644 --- a/stubs/pmem.c +++ b/stubs/pmem.c @@ -17,3 +17,7 @@ void *pmem_memcpy_persist(void *pmemdest, const void *src, size_t len) { return memcpy(pmemdest, src, len); } + +void pmem_persist(const void *addr, 
size_t len) +{ +} -- 2.7.4
[Qemu-devel] [PATCH 6/7 V10] migration/ram: Add check and info message to nvdimm post copy.
From: Junyan He NVDIMM memory does not support postcopy yet. We disable postcopy if any NVDIMM memory is present and print a log message to inform the user. Signed-off-by: Junyan He Reviewed-by: Stefan Hajnoczi Reviewed-by: Igor Mammedov --- migration/ram.c | 9 + 1 file changed, 9 insertions(+) diff --git a/migration/ram.c b/migration/ram.c index 52dd678..fd7cba1 100644 --- a/migration/ram.c +++ b/migration/ram.c @@ -3899,6 +3899,15 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id) static bool ram_has_postcopy(void *opaque) { +RAMBlock *rb; +RAMBLOCK_FOREACH(rb) { +if (ramblock_is_pmem(rb)) { +info_report("Block: %s, host: %p is a nvdimm memory, postcopy" + "is not supported now!", rb->idstr, rb->host); +return false; +} +} + return migrate_postcopy_ram(); } -- 2.7.4
[Qemu-devel] [PATCH 3/7 V10] configure: add libpmem support
From: Junyan He Add a pair of configure options --{enable,disable}-libpmem to control whether QEMU is compiled with PMDK libpmem [1]. QEMU may write to the host persistent memory (e.g. in vNVDIMM label emulation and live migration), so it must take the proper operations to ensure the persistence of its own writes. Depending on the CPU models and available instructions, the optimal operation can vary [2]. PMDK libpmem has already implemented those operations on multiple CPU models (x86 and ARM) and the logic to select the optimal ones, so QEMU can just use libpmem rather than re-implement them. Libpmem is part of the PMDK project (formerly known as NVML). The project's home page is: http://pmem.io/pmdk/ And the project's repository is: https://github.com/pmem/pmdk/ For more information about libpmem APIs, you can refer to the comments in the source code of: pmdk/src/libpmem/pmem.c, beginning at line 33. Signed-off-by: Junyan He Signed-off-by: Haozhong Zhang Reviewed-by: Stefan Hajnoczi Reviewed-by: Igor Mammedov --- configure | 29 + 1 file changed, 29 insertions(+) diff --git a/configure b/configure index 2a7796e..1c9288b 100755 --- a/configure +++ b/configure @@ -475,6 +475,7 @@ vxhs="" libxml2="" docker="no" debug_mutex="no" +libpmem="" # cross compilers defaults, can be overridden with --cross-cc-ARCH cross_cc_aarch64="aarch64-linux-gnu-gcc" @@ -1435,6 +1436,10 @@ for opt do ;; --disable-debug-mutex) debug_mutex=no ;; + --enable-libpmem) libpmem=yes + ;; + --disable-libpmem) libpmem=no + ;; *) echo "ERROR: unknown option $opt" echo "Try '$0 --help' for more information" @@ -1710,6 +1715,7 @@ disabled with --disable-FEATURE, default is enabled if available: vhost-user vhost-user support capstonecapstone disassembler support debug-mutex mutex debugging support + libpmem libpmem support NOTE: The object files are built at the place where configure is launched EOF @@ -5546,6 +5552,24 @@ if has "docker"; then fi ## +# check for libpmem + +if test "$libpmem" != "no"; then + if 
$pkg_config --exists "libpmem"; then + libpmem="yes" + libpmem_libs=$($pkg_config --libs libpmem) + libpmem_cflags=$($pkg_config --cflags libpmem) + libs_softmmu="$libs_softmmu $libpmem_libs" + QEMU_CFLAGS="$QEMU_CFLAGS $libpmem_cflags" + else + if test "$libpmem" = "yes" ; then + feature_not_found "libpmem" "Install nvml or pmdk" + fi + libpmem="no" + fi +fi + +## # End of CC checks # After here, no more $cc or $ld runs @@ -6010,6 +6034,7 @@ echo "replication support $replication" echo "VxHS block device $vxhs" echo "capstone $capstone" echo "docker$docker" +echo "libpmem support $libpmem" if test "$sdl_too_old" = "yes"; then echo "-> Your SDL version is too old - please upgrade to have SDL support" @@ -6763,6 +6788,10 @@ if test "$vxhs" = "yes" ; then echo "VXHS_LIBS=$vxhs_libs" >> $config_host_mak fi +if test "$libpmem" = "yes" ; then + echo "CONFIG_LIBPMEM=y" >> $config_host_mak +fi + if test "$tcg_interpreter" = "yes"; then QEMU_INCLUDES="-iquote \$(SRC_PATH)/tcg/tci $QEMU_INCLUDES" elif test "$ARCH" = "sparc64" ; then -- 2.7.4
[Qemu-devel] [PATCH 5/7 V10] mem/nvdimm: ensure write persistence to PMEM in label emulation
From: Junyan He Guest writes to vNVDIMM labels are intercepted and performed on the backend by QEMU. When the backend is a real persistent memory, QEMU needs to take proper operations to ensure its write persistence on the persistent memory. Otherwise, a host power failure may result in the loss of guest label configurations. Signed-off-by: Haozhong Zhang Reviewed-by: Stefan Hajnoczi Reviewed-by: Igor Mammedov --- hw/mem/nvdimm.c | 9 - include/qemu/pmem.h | 23 +++ stubs/Makefile.objs | 1 + stubs/pmem.c| 19 +++ 4 files changed, 51 insertions(+), 1 deletion(-) create mode 100644 include/qemu/pmem.h create mode 100644 stubs/pmem.c diff --git a/hw/mem/nvdimm.c b/hw/mem/nvdimm.c index 021d1c3..1c6674c 100644 --- a/hw/mem/nvdimm.c +++ b/hw/mem/nvdimm.c @@ -23,6 +23,7 @@ */ #include "qemu/osdep.h" +#include "qemu/pmem.h" #include "qapi/error.h" #include "qapi/visitor.h" #include "hw/mem/nvdimm.h" @@ -164,11 +165,17 @@ static void nvdimm_write_label_data(NVDIMMDevice *nvdimm, const void *buf, { MemoryRegion *mr; PCDIMMDevice *dimm = PC_DIMM(nvdimm); +bool is_pmem = object_property_get_bool(OBJECT(dimm->hostmem), +"pmem", NULL); uint64_t backend_offset; nvdimm_validate_rw_label_data(nvdimm, size, offset); -memcpy(nvdimm->label_data + offset, buf, size); +if (!is_pmem) { +memcpy(nvdimm->label_data + offset, buf, size); +} else { +pmem_memcpy_persist(nvdimm->label_data + offset, buf, size); +} mr = host_memory_backend_get_memory(dimm->hostmem); backend_offset = memory_region_size(mr) - nvdimm->label_size + offset; diff --git a/include/qemu/pmem.h b/include/qemu/pmem.h new file mode 100644 index 000..00d6680 --- /dev/null +++ b/include/qemu/pmem.h @@ -0,0 +1,23 @@ +/* + * QEMU header file for libpmem. + * + * Copyright (c) 2018 Intel Corporation. + * + * Author: Haozhong Zhang + * + * This work is licensed under the terms of the GNU GPL, version 2 or later. + * See the COPYING file in the top-level directory. 
+ */ + +#ifndef QEMU_PMEM_H +#define QEMU_PMEM_H + +#ifdef CONFIG_LIBPMEM +#include +#else /* !CONFIG_LIBPMEM */ + +void *pmem_memcpy_persist(void *pmemdest, const void *src, size_t len); + +#endif /* CONFIG_LIBPMEM */ + +#endif /* !QEMU_PMEM_H */ diff --git a/stubs/Makefile.objs b/stubs/Makefile.objs index 53d3f32..be9a042 100644 --- a/stubs/Makefile.objs +++ b/stubs/Makefile.objs @@ -43,3 +43,4 @@ stub-obj-y += xen-common.o stub-obj-y += xen-hvm.o stub-obj-y += pci-host-piix.o stub-obj-y += ram-block.o +stub-obj-$(call lnot,$(CONFIG_LIBPMEM)) += pmem.o \ No newline at end of file diff --git a/stubs/pmem.c b/stubs/pmem.c new file mode 100644 index 000..b4ec72d --- /dev/null +++ b/stubs/pmem.c @@ -0,0 +1,19 @@ +/* + * Stubs for libpmem. + * + * Copyright (c) 2018 Intel Corporation. + * + * Author: Haozhong Zhang + * + * This work is licensed under the terms of the GNU GPL, version 2 or later. + * See the COPYING file in the top-level directory. + */ + +#include + +#include "qemu/pmem.h" + +void *pmem_memcpy_persist(void *pmemdest, const void *src, size_t len) +{ +return memcpy(pmemdest, src, len); +} -- 2.7.4
[Qemu-devel] [PATCH 0/7 V10] nvdimm: guarantee persistence of QEMU writes to persistent memory
From: Junyan He QEMU writes to vNVDIMM backends in the vNVDIMM label emulation and live migration. If the backend is on the persistent memory, QEMU needs to take proper operations to ensure its writes are persistent on the persistent memory. Otherwise, a host power failure may result in the loss of the guest data on the persistent memory. This patch series is based on Marcel's patch "mem: add share parameter to memory-backend-ram" [1] because of the changes in patch 1. [1] https://lists.gnu.org/archive/html/qemu-devel/2018-02/msg03858.html Previous versions of this patch series can be found at: v9: https://lists.gnu.org/archive/html/qemu-devel/2018-07/msg02361.html v8: https://lists.gnu.org/archive/html/qemu-devel/2018-07/msg02279.html v7: https://lists.gnu.org/archive/html/qemu-devel/2018-06/msg02997.html v6: https://lists.gnu.org/archive/html/qemu-devel/2018-06/msg00061.html v5: https://lists.gnu.org/archive/html/qemu-devel/2018-05/msg02258.html V4: https://lists.gnu.org/archive/html/qemu-devel/2018-02/msg06993.html v3: https://lists.gnu.org/archive/html/qemu-devel/2018-02/msg04365.html v2: https://lists.gnu.org/archive/html/qemu-devel/2018-02/msg01579.html v1: https://lists.gnu.org/archive/html/qemu-devel/2017-12/msg05040.html Changes in v10: * (Patch 4) Fix a nit in nvdimm docs about pmem option usage in command line The v10 patch set is all reviewed by Igor Mammedov Changes in v9: * (Patch 3 and Patch 4) Reorder these two patches to make the logic right. Firstly add libpmem support, and then we can use libpmem's configure check result. Also fix some typos and grammar issues in these two patches. Changes in v8: * (Patch 3) Report an error when the user sets 'pmem' on a file backend while QEMU lacks libpmem support. In this case, we cannot ensure the persistence of the file backend, so we choose to fail rather than continue and make things more confusing. 
Changes in v7: The v6 patch set has already been reviewed by Stefan Hajnoczi No logic change in this v7 version, just: * Spelling check and some document words refined. * Rebase to "ram is migratable" patch set. Changes in v6: * (Patch 1) Expose all ram block flags rather than redefine the flags. * (Patch 4) Use pkg-config rather than a hard-coded check in configure. * (Patch 7) Sync and flush all the pmem data when migration completes, rather than syncing pages one by one as in the previous version. Changes in v5: * (Patch 9) Add post copy check and output some messages for nvdimm. Changes in v4: * (Patch 2) Fix compilation errors found by patchew. Changes in v3: * (Patch 5) Add an is_pmem flag to ram_handle_compressed() and handle PMEM writes in it, so we don't need the _common function. * (Patch 6) Expose qemu_get_buffer_common so we can remove the unnecessary qemu_get_buffer_to_pmem wrapper. * (Patch 8) Add an is_pmem flag to xbzrle_decode_buffer() and handle PMEM writes in it, so we can remove the unnecessary xbzrle_decode_buffer_{common, to_pmem}. * Move libpmem stubs to stubs/pmem.c and fix the compilation failures of test-{xbzrle,vmstate}.c. Changes in v2: * (Patch 1) Use a flags parameter in file ram allocation functions. * (Patch 2) Add a new option 'pmem' to hostmem-file. * (Patch 3) Use libpmem to operate on the persistent memory, rather than re-implementing those operations in QEMU. * (Patch 5-8) Consider the write persistence in the migration path. Junyan: [1/7] memory, exec: Expose all memory block related flags. [6/7] migration/ram: Add check and info message to nvdimm post copy. [7/7] migration/ram: ensure write persistence on loading all data to PMEM. 
Haozhong: [5/7] mem/nvdimm: ensure write persistence to PMEM in label emulation Haozhong & Junyan: [2/7] memory, exec: switch file ram allocation functions to 'flags' parameters [3/7] configure: add libpmem support [4/7] hostmem-file: add the 'pmem' option -- backends/hostmem-file.c | 42 +- configure | 29 + docs/nvdimm.txt | 22 ++ exec.c | 38 +- hw/mem/nvdimm.c | 9 - include/exec/memory.h | 31 +-- include/exec/ram_addr.h | 28 ++-- include/qemu/pmem.h | 24 memory.c| 8 +--- migration/ram.c | 17 + numa.c | 2 +- qemu-options.hx | 7 +++ stubs/Makefile.objs | 1 + stubs/pmem.c| 23 +++ 14 files changed, 246 insertions(+), 35 deletions(-) create mode 100644 include/qemu/pmem.h create mode 100644 stubs/pmem.c -- 2.7.4
[Qemu-devel] [PATCH 4/7 V10] hostmem-file: add the 'pmem' option
From: Junyan He When QEMU emulates vNVDIMM labels and migrates vNVDIMM devices, it needs to know whether the backend storage is a real persistent memory, in order to decide whether special operations should be performed to ensure the data persistence. This boolean option 'pmem' allows users to specify whether the backend storage of memory-backend-file is a real persistent memory. If 'pmem=on', QEMU will set the flag RAM_PMEM in the RAM block of the corresponding memory region. If 'pmem' is set while libpmem support is lacking, an error is generated. Signed-off-by: Junyan He Signed-off-by: Haozhong Zhang Reviewed-by: Stefan Hajnoczi Reviewed-by: Igor Mammedov --- backends/hostmem-file.c | 41 - docs/nvdimm.txt | 22 ++ exec.c | 8 include/exec/memory.h | 4 include/exec/ram_addr.h | 3 +++ qemu-options.hx | 7 +++ 6 files changed, 84 insertions(+), 1 deletion(-) diff --git a/backends/hostmem-file.c b/backends/hostmem-file.c index 34c68bb..b1a2453 100644 --- a/backends/hostmem-file.c +++ b/backends/hostmem-file.c @@ -12,6 +12,7 @@ #include "qemu/osdep.h" #include "qapi/error.h" #include "qemu-common.h" +#include "qemu/error-report.h" #include "sysemu/hostmem.h" #include "sysemu/sysemu.h" #include "qom/object_interfaces.h" @@ -34,6 +35,7 @@ struct HostMemoryBackendFile { bool discard_data; char *mem_path; uint64_t align; +bool is_pmem; }; static void @@ -59,7 +61,8 @@ file_backend_memory_alloc(HostMemoryBackend *backend, Error **errp) memory_region_init_ram_from_file(&backend->mr, OBJECT(backend), path, backend->size, fb->align, - backend->share ? RAM_SHARED : 0, + (backend->share ? RAM_SHARED : 0) | + (fb->is_pmem ? 
RAM_PMEM : 0), fb->mem_path, errp); g_free(path); } @@ -131,6 +134,39 @@ static void file_memory_backend_set_align(Object *o, Visitor *v, error_propagate(errp, local_err); } +static bool file_memory_backend_get_pmem(Object *o, Error **errp) +{ +return MEMORY_BACKEND_FILE(o)->is_pmem; +} + +static void file_memory_backend_set_pmem(Object *o, bool value, Error **errp) +{ +HostMemoryBackend *backend = MEMORY_BACKEND(o); +HostMemoryBackendFile *fb = MEMORY_BACKEND_FILE(o); + +if (host_memory_backend_mr_inited(backend)) { +error_setg(errp, "cannot change property 'pmem' of %s '%s'", + object_get_typename(o), + object_get_canonical_path_component(o)); +return; +} + +#ifndef CONFIG_LIBPMEM +if (value) { +Error *local_err = NULL; +error_setg(&local_err, + "Lack of libpmem support while setting the 'pmem=on'" + " of %s '%s'. We can't ensure data persistence.", + object_get_typename(o), + object_get_canonical_path_component(o)); +error_propagate(errp, local_err); +return; +} +#endif + +fb->is_pmem = value; +} + static void file_backend_unparent(Object *obj) { HostMemoryBackend *backend = MEMORY_BACKEND(obj); @@ -162,6 +198,9 @@ file_backend_class_init(ObjectClass *oc, void *data) file_memory_backend_get_align, file_memory_backend_set_align, NULL, NULL, &error_abort); +object_class_property_add_bool(oc, "pmem", +file_memory_backend_get_pmem, file_memory_backend_set_pmem, +&error_abort); } static void file_backend_instance_finalize(Object *o) diff --git a/docs/nvdimm.txt b/docs/nvdimm.txt index 24b443b..48754d2 100644 --- a/docs/nvdimm.txt +++ b/docs/nvdimm.txt @@ -173,3 +173,25 @@ There are currently two valid values for this option: the NVDIMMs in the event of power loss. This implies that the platform also supports flushing dirty data through the memory controller on power loss. 
+ +If the vNVDIMM backend is on the host persistent memory that can be +accessed in SNIA NVM Programming Model [1] (e.g., Intel NVDIMM), it's +suggested to set the 'pmem' option of memory-backend-file to 'on'. When +'pmem' is 'on' and QEMU is built with libpmem [2] support (configured with +--enable-libpmem), QEMU will take necessary operations to guarantee the +persistence of its own writes to the vNVDIMM backend (e.g., in vNVDIMM label +emulation and live migration). If 'pmem' is 'on' while there is no libpmem +support, QEMU will exit and report a "lack of libpmem
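Putting the new option together with the usual vNVDIMM setup, a backend on a DAX-mounted pmem filesystem might be configured like this (paths, sizes, and the align value are illustrative assumptions, not taken from the patch):

```
qemu-system-x86_64 \
  -machine pc,nvdimm \
  -m 4G,slots=2,maxmem=8G \
  -object memory-backend-file,id=mem1,share=on,mem-path=/mnt/pmem0/nvdimm.img,size=1G,align=2M,pmem=on \
  -device nvdimm,id=nvdimm1,memdev=mem1
```

With pmem=on, QEMU's own writes to mem1 (label updates, migration loads) go through libpmem's persistence primitives instead of plain stores.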
[Qemu-devel] [PATCH 1/7 V10] memory, exec: Expose all memory block related flags.
From: Junyan He We need to use these flags in other files rather than just in exec.c. For example, RAM_SHARED should be used when creating a RAM block from a file. We expose them in exec/memory.h Signed-off-by: Junyan He Reviewed-by: Stefan Hajnoczi Reviewed-by: Igor Mammedov --- exec.c| 20 include/exec/memory.h | 20 2 files changed, 20 insertions(+), 20 deletions(-) diff --git a/exec.c b/exec.c index 4f5df07..cc042dc 100644 --- a/exec.c +++ b/exec.c @@ -87,26 +87,6 @@ AddressSpace address_space_memory; MemoryRegion io_mem_rom, io_mem_notdirty; static MemoryRegion io_mem_unassigned; - -/* RAM is pre-allocated and passed into qemu_ram_alloc_from_ptr */ -#define RAM_PREALLOC (1 << 0) - -/* RAM is mmap-ed with MAP_SHARED */ -#define RAM_SHARED (1 << 1) - -/* Only a portion of RAM (used_length) is actually used, and migrated. - * This used_length size can change across reboots. - */ -#define RAM_RESIZEABLE (1 << 2) - -/* UFFDIO_ZEROPAGE is available on this RAMBlock to atomically - * zero the page and wake waiting processes. - * (Set during postcopy) - */ -#define RAM_UF_ZEROPAGE (1 << 3) - -/* RAM can be migrated */ -#define RAM_MIGRATABLE (1 << 4) #endif #ifdef TARGET_PAGE_BITS_VARY diff --git a/include/exec/memory.h b/include/exec/memory.h index 448d41a..6d0af29 100644 --- a/include/exec/memory.h +++ b/include/exec/memory.h @@ -103,6 +103,26 @@ struct IOMMUNotifier { }; typedef struct IOMMUNotifier IOMMUNotifier; +/* RAM is pre-allocated and passed into qemu_ram_alloc_from_ptr */ +#define RAM_PREALLOC (1 << 0) + +/* RAM is mmap-ed with MAP_SHARED */ +#define RAM_SHARED (1 << 1) + +/* Only a portion of RAM (used_length) is actually used, and migrated. + * This used_length size can change across reboots. + */ +#define RAM_RESIZEABLE (1 << 2) + +/* UFFDIO_ZEROPAGE is available on this RAMBlock to atomically + * zero the page and wake waiting processes. 
+ * (Set during postcopy) + */ +#define RAM_UF_ZEROPAGE (1 << 3) + +/* RAM can be migrated */ +#define RAM_MIGRATABLE (1 << 4) + static inline void iommu_notifier_init(IOMMUNotifier *n, IOMMUNotify fn, IOMMUNotifierFlag flags, hwaddr start, hwaddr end, -- 2.7.4
[Qemu-devel] [PATCH 2/7 V10] memory, exec: switch file ram allocation functions to 'flags' parameters
From: Junyan He As more flag parameters besides the existing 'share' are going to be added to following functions memory_region_init_ram_from_file qemu_ram_alloc_from_fd qemu_ram_alloc_from_file let's switch them to use the 'flags' parameters so as to ease future flag additions. The existing 'share' flag is converted to the RAM_SHARED bit in ram_flags, and other flag bits are ignored by above functions right now. Signed-off-by: Junyan He Signed-off-by: Haozhong Zhang Reviewed-by: Stefan Hajnoczi Reviewed-by: Igor Mammedov --- backends/hostmem-file.c | 3 ++- exec.c | 10 +- include/exec/memory.h | 7 +-- include/exec/ram_addr.h | 25 +++-- memory.c| 8 +--- numa.c | 2 +- 6 files changed, 41 insertions(+), 14 deletions(-) diff --git a/backends/hostmem-file.c b/backends/hostmem-file.c index 134b08d..34c68bb 100644 --- a/backends/hostmem-file.c +++ b/backends/hostmem-file.c @@ -58,7 +58,8 @@ file_backend_memory_alloc(HostMemoryBackend *backend, Error **errp) path = object_get_canonical_path(OBJECT(backend)); memory_region_init_ram_from_file(&backend->mr, OBJECT(backend), path, - backend->size, fb->align, backend->share, + backend->size, fb->align, + backend->share ? RAM_SHARED : 0, fb->mem_path, errp); g_free(path); } diff --git a/exec.c b/exec.c index cc042dc..1ec539d 100644 --- a/exec.c +++ b/exec.c @@ -2238,7 +2238,7 @@ static void ram_block_add(RAMBlock *new_block, Error **errp, bool shared) #ifdef __linux__ RAMBlock *qemu_ram_alloc_from_fd(ram_addr_t size, MemoryRegion *mr, - bool share, int fd, + uint64_t ram_flags, int fd, Error **errp) { RAMBlock *new_block; @@ -2280,14 +2280,14 @@ RAMBlock *qemu_ram_alloc_from_fd(ram_addr_t size, MemoryRegion *mr, new_block->mr = mr; new_block->used_length = size; new_block->max_length = size; -new_block->flags = share ? 
RAM_SHARED : 0; +new_block->flags = ram_flags; new_block->host = file_ram_alloc(new_block, size, fd, !file_size, errp); if (!new_block->host) { g_free(new_block); return NULL; } -ram_block_add(new_block, &local_err, share); +ram_block_add(new_block, &local_err, ram_flags & RAM_SHARED); if (local_err) { g_free(new_block); error_propagate(errp, local_err); @@ -2299,7 +2299,7 @@ RAMBlock *qemu_ram_alloc_from_fd(ram_addr_t size, MemoryRegion *mr, RAMBlock *qemu_ram_alloc_from_file(ram_addr_t size, MemoryRegion *mr, - bool share, const char *mem_path, + uint64_t ram_flags, const char *mem_path, Error **errp) { int fd; @@ -2311,7 +2311,7 @@ RAMBlock *qemu_ram_alloc_from_file(ram_addr_t size, MemoryRegion *mr, return NULL; } -block = qemu_ram_alloc_from_fd(size, mr, share, fd, errp); +block = qemu_ram_alloc_from_fd(size, mr, ram_flags, fd, errp); if (!block) { if (created) { unlink(mem_path); diff --git a/include/exec/memory.h b/include/exec/memory.h index 6d0af29..513ec8d 100644 --- a/include/exec/memory.h +++ b/include/exec/memory.h @@ -640,6 +640,7 @@ void memory_region_init_resizeable_ram(MemoryRegion *mr, void *host), Error **errp); #ifdef __linux__ + /** * memory_region_init_ram_from_file: Initialize RAM memory region with a *mmap-ed backend. @@ -651,7 +652,9 @@ void memory_region_init_resizeable_ram(MemoryRegion *mr, * @size: size of the region. * @align: alignment of the region base address; if 0, the default alignment * (getpagesize()) will be used. - * @share: %true if memory must be mmaped with the MAP_SHARED flag + * @ram_flags: Memory region features: + * - RAM_SHARED: memory must be mmaped with the MAP_SHARED flag + * Other bits are ignored now. * @path: the path in which to allocate the RAM. * @errp: pointer to Error*, to store an error if it happens. * @@ -663,7 +666,7 @@ void memory_region_init_ram_from_file(MemoryRegion *mr, const char *name, uint64_t size, uint64_t align, - bool share, + uint64_t ram_flags, const char *path, Error **errp); di
[Qemu-devel] [PATCH 6/7 V9] migration/ram: Add check and info message to nvdimm post copy.
From: Junyan He NVDIMM memory does not support postcopy yet. We disable postcopy if NVDIMM memory is present and print a log hint to the user. Signed-off-by: Junyan He Reviewed-by: Stefan Hajnoczi Reviewed-by: Igor Mammedov --- migration/ram.c | 9 + 1 file changed, 9 insertions(+) diff --git a/migration/ram.c b/migration/ram.c index 1cd98d6..9c03e2b 100644 --- a/migration/ram.c +++ b/migration/ram.c @@ -3895,6 +3895,15 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id) static bool ram_has_postcopy(void *opaque) { +RAMBlock *rb; +RAMBLOCK_FOREACH(rb) { +if (ramblock_is_pmem(rb)) { +info_report("Block: %s, host: %p is a nvdimm memory, postcopy" + " is not supported now!", rb->idstr, rb->host); +return false; +} +} + return migrate_postcopy_ram(); } -- 2.7.4
[Qemu-devel] [PATCH 7/7 V9] migration/ram: ensure write persistence on loading all data to PMEM.
From: Junyan He Because we need to make sure the pmem kind memory data is synced after migration, we choose to call pmem_persist() when the migration finishes. This ensures the pmem data is safe and will not be lost if power is off. Signed-off-by: Junyan He Reviewed-by: Stefan Hajnoczi Reviewed-by: Igor Mammedov --- include/qemu/pmem.h | 1 + migration/ram.c | 10 +- stubs/pmem.c| 4 3 files changed, 14 insertions(+), 1 deletion(-) diff --git a/include/qemu/pmem.h b/include/qemu/pmem.h index 00d6680..8f52b08 100644 --- a/include/qemu/pmem.h +++ b/include/qemu/pmem.h @@ -17,6 +17,7 @@ #else /* !CONFIG_LIBPMEM */ void *pmem_memcpy_persist(void *pmemdest, const void *src, size_t len); +void pmem_persist(const void *addr, size_t len); #endif /* CONFIG_LIBPMEM */ diff --git a/migration/ram.c b/migration/ram.c index 9c03e2b..62dfe75 100644 --- a/migration/ram.c +++ b/migration/ram.c @@ -33,6 +33,7 @@ #include "qemu/bitops.h" #include "qemu/bitmap.h" #include "qemu/main-loop.h" +#include "qemu/pmem.h" #include "xbzrle.h" #include "ram.h" #include "migration.h" @@ -3541,6 +3542,13 @@ static int ram_load_setup(QEMUFile *f, void *opaque) static int ram_load_cleanup(void *opaque) { RAMBlock *rb; + +RAMBLOCK_FOREACH_MIGRATABLE(rb) { +if (ramblock_is_pmem(rb)) { +pmem_persist(rb->host, rb->used_length); +} +} + xbzrle_load_cleanup(); compress_threads_load_cleanup(); @@ -3896,7 +3904,7 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id) static bool ram_has_postcopy(void *opaque) { RAMBlock *rb; -RAMBLOCK_FOREACH(rb) { +RAMBLOCK_FOREACH_MIGRATABLE(rb) { if (ramblock_is_pmem(rb)) { info_report("Block: %s, host: %p is a nvdimm memory, postcopy" " is not supported now!", rb->idstr, rb->host); diff --git a/stubs/pmem.c b/stubs/pmem.c index b4ec72d..f794262 100644 --- a/stubs/pmem.c +++ b/stubs/pmem.c @@ -17,3 +17,7 @@ void *pmem_memcpy_persist(void *pmemdest, const void *src, size_t len) { return memcpy(pmemdest, src, len); } + +void pmem_persist(const void *addr, 
size_t len) +{ +} -- 2.7.4
[Qemu-devel] [PATCH 4/7 V9] hostmem-file: add the 'pmem' option
From: Junyan He When QEMU emulates vNVDIMM labels and migrates vNVDIMM devices, it needs to know whether the backend storage is a real persistent memory, in order to decide whether special operations should be performed to ensure the data persistence. This boolean option 'pmem' allows users to specify whether the backend storage of memory-backend-file is a real persistent memory. If 'pmem=on', QEMU will set the flag RAM_PMEM in the RAM block of the corresponding memory region. If 'pmem' is set while libpmem support is lacking, an error is generated. Signed-off-by: Junyan He Signed-off-by: Haozhong Zhang Reviewed-by: Stefan Hajnoczi --- backends/hostmem-file.c | 41 - docs/nvdimm.txt | 22 ++ exec.c | 8 include/exec/memory.h | 4 include/exec/ram_addr.h | 3 +++ qemu-options.hx | 7 +++ 6 files changed, 84 insertions(+), 1 deletion(-) diff --git a/backends/hostmem-file.c b/backends/hostmem-file.c index 34c68bb..b1a2453 100644 --- a/backends/hostmem-file.c +++ b/backends/hostmem-file.c @@ -12,6 +12,7 @@ #include "qemu/osdep.h" #include "qapi/error.h" #include "qemu-common.h" +#include "qemu/error-report.h" #include "sysemu/hostmem.h" #include "sysemu/sysemu.h" #include "qom/object_interfaces.h" @@ -34,6 +35,7 @@ struct HostMemoryBackendFile { bool discard_data; char *mem_path; uint64_t align; +bool is_pmem; }; static void @@ -59,7 +61,8 @@ file_backend_memory_alloc(HostMemoryBackend *backend, Error **errp) memory_region_init_ram_from_file(&backend->mr, OBJECT(backend), path, backend->size, fb->align, - backend->share ? RAM_SHARED : 0, + (backend->share ? RAM_SHARED : 0) | + (fb->is_pmem ? 
RAM_PMEM : 0), fb->mem_path, errp); g_free(path); } @@ -131,6 +134,39 @@ static void file_memory_backend_set_align(Object *o, Visitor *v, error_propagate(errp, local_err); } +static bool file_memory_backend_get_pmem(Object *o, Error **errp) +{ +return MEMORY_BACKEND_FILE(o)->is_pmem; +} + +static void file_memory_backend_set_pmem(Object *o, bool value, Error **errp) +{ +HostMemoryBackend *backend = MEMORY_BACKEND(o); +HostMemoryBackendFile *fb = MEMORY_BACKEND_FILE(o); + +if (host_memory_backend_mr_inited(backend)) { +error_setg(errp, "cannot change property 'pmem' of %s '%s'", + object_get_typename(o), + object_get_canonical_path_component(o)); +return; +} + +#ifndef CONFIG_LIBPMEM +if (value) { +Error *local_err = NULL; +error_setg(&local_err, + "Lack of libpmem support while setting the 'pmem=on'" + " of %s '%s'. We can't ensure data persistence.", + object_get_typename(o), + object_get_canonical_path_component(o)); +error_propagate(errp, local_err); +return; +} +#endif + +fb->is_pmem = value; +} + static void file_backend_unparent(Object *obj) { HostMemoryBackend *backend = MEMORY_BACKEND(obj); @@ -162,6 +198,9 @@ file_backend_class_init(ObjectClass *oc, void *data) file_memory_backend_get_align, file_memory_backend_set_align, NULL, NULL, &error_abort); +object_class_property_add_bool(oc, "pmem", +file_memory_backend_get_pmem, file_memory_backend_set_pmem, +&error_abort); } static void file_backend_instance_finalize(Object *o) diff --git a/docs/nvdimm.txt b/docs/nvdimm.txt index 24b443b..8c83baf 100644 --- a/docs/nvdimm.txt +++ b/docs/nvdimm.txt @@ -173,3 +173,25 @@ There are currently two valid values for this option: the NVDIMMs in the event of power loss. This implies that the platform also supports flushing dirty data through the memory controller on power loss. 
+ +If the vNVDIMM backend is on the host persistent memory that can be +accessed in SNIA NVM Programming Model [1] (e.g., Intel NVDIMM), it's +suggested to set the 'pmem' option of memory-backend-file to 'on'. When +'pmem' is 'on' and QEMU is built with libpmem [2] support (configured with +--enable-libpmem), QEMU will take necessary operations to guarantee the +persistence of its own writes to the vNVDIMM backend (e.g., in vNVDIMM label +emulation and live migration). If 'pmem' is 'on' while there is no libpmem +support, QEMU will exit and report a "lack of libpmem support" message to +ens
[Qemu-devel] [PATCH 5/7 V9] mem/nvdimm: ensure write persistence to PMEM in label emulation
From: Junyan He Guest writes to vNVDIMM labels are intercepted and performed on the backend by QEMU. When the backend is a real persistent memory, QEMU needs to take proper operations to ensure its write persistence on the persistent memory. Otherwise, a host power failure may result in the loss of guest label configurations. Signed-off-by: Haozhong Zhang Reviewed-by: Stefan Hajnoczi Reviewed-by: Igor Mammedov --- hw/mem/nvdimm.c | 9 - include/qemu/pmem.h | 23 +++ stubs/Makefile.objs | 1 + stubs/pmem.c| 19 +++ 4 files changed, 51 insertions(+), 1 deletion(-) create mode 100644 include/qemu/pmem.h create mode 100644 stubs/pmem.c diff --git a/hw/mem/nvdimm.c b/hw/mem/nvdimm.c index 021d1c3..1c6674c 100644 --- a/hw/mem/nvdimm.c +++ b/hw/mem/nvdimm.c @@ -23,6 +23,7 @@ */ #include "qemu/osdep.h" +#include "qemu/pmem.h" #include "qapi/error.h" #include "qapi/visitor.h" #include "hw/mem/nvdimm.h" @@ -164,11 +165,17 @@ static void nvdimm_write_label_data(NVDIMMDevice *nvdimm, const void *buf, { MemoryRegion *mr; PCDIMMDevice *dimm = PC_DIMM(nvdimm); +bool is_pmem = object_property_get_bool(OBJECT(dimm->hostmem), +"pmem", NULL); uint64_t backend_offset; nvdimm_validate_rw_label_data(nvdimm, size, offset); -memcpy(nvdimm->label_data + offset, buf, size); +if (!is_pmem) { +memcpy(nvdimm->label_data + offset, buf, size); +} else { +pmem_memcpy_persist(nvdimm->label_data + offset, buf, size); +} mr = host_memory_backend_get_memory(dimm->hostmem); backend_offset = memory_region_size(mr) - nvdimm->label_size + offset; diff --git a/include/qemu/pmem.h b/include/qemu/pmem.h new file mode 100644 index 000..00d6680 --- /dev/null +++ b/include/qemu/pmem.h @@ -0,0 +1,23 @@ +/* + * QEMU header file for libpmem. + * + * Copyright (c) 2018 Intel Corporation. + * + * Author: Haozhong Zhang + * + * This work is licensed under the terms of the GNU GPL, version 2 or later. + * See the COPYING file in the top-level directory. 
+ */ + +#ifndef QEMU_PMEM_H +#define QEMU_PMEM_H + +#ifdef CONFIG_LIBPMEM +#include <libpmem.h> +#else /* !CONFIG_LIBPMEM */ + +void *pmem_memcpy_persist(void *pmemdest, const void *src, size_t len); + +#endif /* CONFIG_LIBPMEM */ + +#endif /* !QEMU_PMEM_H */ diff --git a/stubs/Makefile.objs b/stubs/Makefile.objs index 53d3f32..be9a042 100644 --- a/stubs/Makefile.objs +++ b/stubs/Makefile.objs @@ -43,3 +43,4 @@ stub-obj-y += xen-common.o stub-obj-y += xen-hvm.o stub-obj-y += pci-host-piix.o stub-obj-y += ram-block.o +stub-obj-$(call lnot,$(CONFIG_LIBPMEM)) += pmem.o \ No newline at end of file diff --git a/stubs/pmem.c b/stubs/pmem.c new file mode 100644 index 000..b4ec72d --- /dev/null +++ b/stubs/pmem.c @@ -0,0 +1,19 @@ +/* + * Stubs for libpmem. + * + * Copyright (c) 2018 Intel Corporation. + * + * Author: Haozhong Zhang + * + * This work is licensed under the terms of the GNU GPL, version 2 or later. + * See the COPYING file in the top-level directory. + */ + +#include <string.h> + +#include "qemu/pmem.h" + +void *pmem_memcpy_persist(void *pmemdest, const void *src, size_t len) +{ +return memcpy(pmemdest, src, len); +} -- 2.7.4
[Qemu-devel] [PATCH 2/7 V9] memory, exec: switch file ram allocation functions to 'flags' parameters
From: Junyan He As more flag parameters besides the existing 'share' are going to be added to following functions memory_region_init_ram_from_file qemu_ram_alloc_from_fd qemu_ram_alloc_from_file let's switch them to use the 'flags' parameters so as to ease future flag additions. The existing 'share' flag is converted to the RAM_SHARED bit in ram_flags, and other flag bits are ignored by above functions right now. Signed-off-by: Junyan He Signed-off-by: Haozhong Zhang Reviewed-by: Stefan Hajnoczi Reviewed-by: Igor Mammedov --- backends/hostmem-file.c | 3 ++- exec.c | 10 +- include/exec/memory.h | 7 +-- include/exec/ram_addr.h | 25 +++-- memory.c| 8 +--- numa.c | 2 +- 6 files changed, 41 insertions(+), 14 deletions(-) diff --git a/backends/hostmem-file.c b/backends/hostmem-file.c index 134b08d..34c68bb 100644 --- a/backends/hostmem-file.c +++ b/backends/hostmem-file.c @@ -58,7 +58,8 @@ file_backend_memory_alloc(HostMemoryBackend *backend, Error **errp) path = object_get_canonical_path(OBJECT(backend)); memory_region_init_ram_from_file(&backend->mr, OBJECT(backend), path, - backend->size, fb->align, backend->share, + backend->size, fb->align, + backend->share ? RAM_SHARED : 0, fb->mem_path, errp); g_free(path); } diff --git a/exec.c b/exec.c index cc042dc..1ec539d 100644 --- a/exec.c +++ b/exec.c @@ -2238,7 +2238,7 @@ static void ram_block_add(RAMBlock *new_block, Error **errp, bool shared) #ifdef __linux__ RAMBlock *qemu_ram_alloc_from_fd(ram_addr_t size, MemoryRegion *mr, - bool share, int fd, + uint64_t ram_flags, int fd, Error **errp) { RAMBlock *new_block; @@ -2280,14 +2280,14 @@ RAMBlock *qemu_ram_alloc_from_fd(ram_addr_t size, MemoryRegion *mr, new_block->mr = mr; new_block->used_length = size; new_block->max_length = size; -new_block->flags = share ? 
RAM_SHARED : 0; +new_block->flags = ram_flags; new_block->host = file_ram_alloc(new_block, size, fd, !file_size, errp); if (!new_block->host) { g_free(new_block); return NULL; } -ram_block_add(new_block, &local_err, share); +ram_block_add(new_block, &local_err, ram_flags & RAM_SHARED); if (local_err) { g_free(new_block); error_propagate(errp, local_err); @@ -2299,7 +2299,7 @@ RAMBlock *qemu_ram_alloc_from_fd(ram_addr_t size, MemoryRegion *mr, RAMBlock *qemu_ram_alloc_from_file(ram_addr_t size, MemoryRegion *mr, - bool share, const char *mem_path, + uint64_t ram_flags, const char *mem_path, Error **errp) { int fd; @@ -2311,7 +2311,7 @@ RAMBlock *qemu_ram_alloc_from_file(ram_addr_t size, MemoryRegion *mr, return NULL; } -block = qemu_ram_alloc_from_fd(size, mr, share, fd, errp); +block = qemu_ram_alloc_from_fd(size, mr, ram_flags, fd, errp); if (!block) { if (created) { unlink(mem_path); diff --git a/include/exec/memory.h b/include/exec/memory.h index 6d0af29..513ec8d 100644 --- a/include/exec/memory.h +++ b/include/exec/memory.h @@ -640,6 +640,7 @@ void memory_region_init_resizeable_ram(MemoryRegion *mr, void *host), Error **errp); #ifdef __linux__ + /** * memory_region_init_ram_from_file: Initialize RAM memory region with a *mmap-ed backend. @@ -651,7 +652,9 @@ void memory_region_init_resizeable_ram(MemoryRegion *mr, * @size: size of the region. * @align: alignment of the region base address; if 0, the default alignment * (getpagesize()) will be used. - * @share: %true if memory must be mmaped with the MAP_SHARED flag + * @ram_flags: Memory region features: + * - RAM_SHARED: memory must be mmaped with the MAP_SHARED flag + * Other bits are ignored now. * @path: the path in which to allocate the RAM. * @errp: pointer to Error*, to store an error if it happens. * @@ -663,7 +666,7 @@ void memory_region_init_ram_from_file(MemoryRegion *mr, const char *name, uint64_t size, uint64_t align, - bool share, + uint64_t ram_flags, const char *path, Error **errp); di
[Qemu-devel] [PATCH 3/7 V9] configure: add libpmem support
From: Junyan He Add a pair of configure options --{enable,disable}-libpmem to control whether QEMU is compiled with PMDK libpmem [1]. QEMU may write to the host persistent memory (e.g. in vNVDIMM label emulation and live migration), so it must take the proper operations to ensure the persistence of its own writes. Depending on the CPU models and available instructions, the optimal operation can vary [2]. PMDK libpmem has already implemented those operations on multiple CPU models (x86 and ARM) and the logic to select the optimal ones, so QEMU can just use libpmem rather than re-implement them. Libpmem is part of the PMDK project (formerly known as NVML). The project's home page is: http://pmem.io/pmdk/ And the project's repository is: https://github.com/pmem/pmdk/ For more information about libpmem APIs, you can refer to the comments in the source code of pmdk/src/libpmem/pmem.c, beginning at line 33. Signed-off-by: Junyan He Signed-off-by: Haozhong Zhang Reviewed-by: Stefan Hajnoczi Reviewed-by: Igor Mammedov --- configure | 29 + 1 file changed, 29 insertions(+) diff --git a/configure b/configure index 2a7796e..1c9288b 100755 --- a/configure +++ b/configure @@ -475,6 +475,7 @@ vxhs="" libxml2="" docker="no" debug_mutex="no" +libpmem="" # cross compilers defaults, can be overridden with --cross-cc-ARCH cross_cc_aarch64="aarch64-linux-gnu-gcc" @@ -1435,6 +1436,10 @@ for opt do ;; --disable-debug-mutex) debug_mutex=no ;; + --enable-libpmem) libpmem=yes + ;; + --disable-libpmem) libpmem=no + ;; *) echo "ERROR: unknown option $opt" echo "Try '$0 --help' for more information" @@ -1710,6 +1715,7 @@ disabled with --disable-FEATURE, default is enabled if available: vhost-user vhost-user support capstonecapstone disassembler support debug-mutex mutex debugging support + libpmem libpmem support NOTE: The object files are built at the place where configure is launched EOF @@ -5546,6 +5552,24 @@ if has "docker"; then fi ## +# check for libpmem + +if test "$libpmem" != "no"; then + if 
$pkg_config --exists "libpmem"; then + libpmem="yes" + libpmem_libs=$($pkg_config --libs libpmem) + libpmem_cflags=$($pkg_config --cflags libpmem) + libs_softmmu="$libs_softmmu $libpmem_libs" + QEMU_CFLAGS="$QEMU_CFLAGS $libpmem_cflags" + else + if test "$libpmem" = "yes" ; then + feature_not_found "libpmem" "Install nvml or pmdk" + fi + libpmem="no" + fi +fi + +## # End of CC checks # After here, no more $cc or $ld runs @@ -6010,6 +6034,7 @@ echo "replication support $replication" echo "VxHS block device $vxhs" echo "capstone $capstone" echo "docker$docker" +echo "libpmem support $libpmem" if test "$sdl_too_old" = "yes"; then echo "-> Your SDL version is too old - please upgrade to have SDL support" @@ -6763,6 +6788,10 @@ if test "$vxhs" = "yes" ; then echo "VXHS_LIBS=$vxhs_libs" >> $config_host_mak fi +if test "$libpmem" = "yes" ; then + echo "CONFIG_LIBPMEM=y" >> $config_host_mak +fi + if test "$tcg_interpreter" = "yes"; then QEMU_INCLUDES="-iquote \$(SRC_PATH)/tcg/tci $QEMU_INCLUDES" elif test "$ARCH" = "sparc64" ; then -- 2.7.4
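The configure hunk above follows QEMU's usual tri-state feature pattern: an empty variable means auto-detect, "yes" makes a missing library a hard error, and "no" skips the probe entirely. A minimal shell model of that decision logic (the probe function is a stand-in for the `pkg-config --exists libpmem` call, hard-wired here to "not found" so the sketch is self-contained):

```shell
#!/bin/sh
# stand-in for: $pkg_config --exists "libpmem"; pretend it is not installed
probe() { return 1; }

libpmem=""   # "" = auto-detect, yes = required, no = disabled

if test "$libpmem" != "no"; then
    if probe; then
        libpmem=yes
    else
        if test "$libpmem" = "yes"; then
            # --enable-libpmem was given but the library is missing: hard error
            echo "ERROR: libpmem requested but not found" >&2
            exit 1
        fi
        libpmem=no
    fi
fi
echo "libpmem support $libpmem"   # prints: libpmem support no
```

In the auto-detect case a missing library silently degrades to `libpmem=no`, which is exactly why patch 4 must then reject `pmem=on` at runtime: the build may legitimately lack persistence support.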
[Qemu-devel] [PATCH 0/7 V9] nvdimm: guarantee persistence of QEMU writes to persistent memory
From: Junyan He QEMU writes to vNVDIMM backends in the vNVDIMM label emulation and live migration. If the backend is on the persistent memory, QEMU needs to take proper operations to ensure its writes are persistent on the persistent memory. Otherwise, a host power failure may result in the loss of the guest data on the persistent memory. This patch series is based on Marcel's patch "mem: add share parameter to memory-backend-ram" [1] because of the changes in patch 1. [1] https://lists.gnu.org/archive/html/qemu-devel/2018-02/msg03858.html Previous versions of this patch series can be found at: v8: https://lists.gnu.org/archive/html/qemu-devel/2018-07/msg02279.html v7: https://lists.gnu.org/archive/html/qemu-devel/2018-06/msg02997.html v6: https://lists.gnu.org/archive/html/qemu-devel/2018-06/msg00061.html v5: https://lists.gnu.org/archive/html/qemu-devel/2018-05/msg02258.html V4: https://lists.gnu.org/archive/html/qemu-devel/2018-02/msg06993.html v3: https://lists.gnu.org/archive/html/qemu-devel/2018-02/msg04365.html v2: https://lists.gnu.org/archive/html/qemu-devel/2018-02/msg01579.html v1: https://lists.gnu.org/archive/html/qemu-devel/2017-12/msg05040.html Changes in v9: * (Patch 3 and Patch 4) Reorder these two patches to make the logic right. Firstly add libpmem support, and then we can use libpmem's configure check result. Also fix some typos and grammar issues in these two patches. Changes in v8: * (Patch 3) Report an error when the user sets 'pmem' on a file backend while QEMU lacks libpmem support. In this case, we cannot ensure the persistence of the file backend, so we choose to fail rather than continue and make things more confusing. Changes in v7: The v6 patch set has already been reviewed by Stefan Hajnoczi No logic change in this v7 version, just: * Spelling check and some document words refined. * Rebase to "ram is migratable" patch set. Changes in v6: * (Patch 1) Expose all ram block flags rather than redefine the flags. 
* (Patch 4) Use pkg-config rather than a hard-coded check in configure. * (Patch 7) Sync and flush all the pmem data when migration completes, rather than syncing pages one by one as in the previous version. Changes in v5: * (Patch 9) Add post copy check and output some messages for nvdimm. Changes in v4: * (Patch 2) Fix compilation errors found by patchew. Changes in v3: * (Patch 5) Add an is_pmem flag to ram_handle_compressed() and handle PMEM writes in it, so we don't need the _common function. * (Patch 6) Expose qemu_get_buffer_common so we can remove the unnecessary qemu_get_buffer_to_pmem wrapper. * (Patch 8) Add an is_pmem flag to xbzrle_decode_buffer() and handle PMEM writes in it, so we can remove the unnecessary xbzrle_decode_buffer_{common, to_pmem}. * Move libpmem stubs to stubs/pmem.c and fix the compilation failures of test-{xbzrle,vmstate}.c. Changes in v2: * (Patch 1) Use a flags parameter in file ram allocation functions. * (Patch 2) Add a new option 'pmem' to hostmem-file. * (Patch 3) Use libpmem to operate on the persistent memory, rather than re-implementing those operations in QEMU. * (Patch 5-8) Consider the write persistence in the migration path. Junyan: [1/7] memory, exec: Expose all memory block related flags. [6/7] migration/ram: Add check and info message to nvdimm post copy. [7/7] migration/ram: ensure write persistence on loading all data to PMEM. 
Haozhong: [5/7] mem/nvdimm: ensure write persistence to PMEM in label emulation Haozhong & Junyan: [2/7] memory, exec: switch file ram allocation functions to 'flags' parameters [3/7] configure: add libpmem support [4/7] hostmem-file: add the 'pmem' option Signed-off-by: Junyan He Signed-off-by: Haozhong Zhang Reviewed-by: Stefan Hajnoczi backends/hostmem-file.c | 42 +- configure | 29 + docs/nvdimm.txt | 22 ++ exec.c | 38 +- hw/mem/nvdimm.c | 9 - include/exec/memory.h | 31 +-- include/exec/ram_addr.h | 28 ++-- include/qemu/pmem.h | 24 memory.c| 8 +--- migration/ram.c | 17 + numa.c | 2 +- qemu-options.hx | 7 +++ stubs/Makefile.objs | 1 + stubs/pmem.c| 23 +++ 14 files changed, 246 insertions(+), 35 deletions(-) create mode 100644 include/qemu/pmem.h create mode 100644 stubs/pmem.c -- 2.7.4
[Qemu-devel] [PATCH 1/7 V9] memory, exec: Expose all memory block related flags.
From: Junyan He We need to use these flags in other files, not just in exec.c. For example, RAM_SHARED should be used when creating a RAM block from a file. We expose them in exec/memory.h. Signed-off-by: Junyan He Reviewed-by: Stefan Hajnoczi Reviewed-by: Igor Mammedov --- exec.c| 20 include/exec/memory.h | 20 2 files changed, 20 insertions(+), 20 deletions(-) diff --git a/exec.c b/exec.c index 4f5df07..cc042dc 100644 --- a/exec.c +++ b/exec.c @@ -87,26 +87,6 @@ AddressSpace address_space_memory; MemoryRegion io_mem_rom, io_mem_notdirty; static MemoryRegion io_mem_unassigned; - -/* RAM is pre-allocated and passed into qemu_ram_alloc_from_ptr */ -#define RAM_PREALLOC (1 << 0) - -/* RAM is mmap-ed with MAP_SHARED */ -#define RAM_SHARED (1 << 1) - -/* Only a portion of RAM (used_length) is actually used, and migrated. - * This used_length size can change across reboots. - */ -#define RAM_RESIZEABLE (1 << 2) - -/* UFFDIO_ZEROPAGE is available on this RAMBlock to atomically - * zero the page and wake waiting processes. - * (Set during postcopy) - */ -#define RAM_UF_ZEROPAGE (1 << 3) - -/* RAM can be migrated */ -#define RAM_MIGRATABLE (1 << 4) #endif #ifdef TARGET_PAGE_BITS_VARY diff --git a/include/exec/memory.h b/include/exec/memory.h index 448d41a..6d0af29 100644 --- a/include/exec/memory.h +++ b/include/exec/memory.h @@ -103,6 +103,26 @@ struct IOMMUNotifier { }; typedef struct IOMMUNotifier IOMMUNotifier; +/* RAM is pre-allocated and passed into qemu_ram_alloc_from_ptr */ +#define RAM_PREALLOC (1 << 0) + +/* RAM is mmap-ed with MAP_SHARED */ +#define RAM_SHARED (1 << 1) + +/* Only a portion of RAM (used_length) is actually used, and migrated. + * This used_length size can change across reboots. + */ +#define RAM_RESIZEABLE (1 << 2) + +/* UFFDIO_ZEROPAGE is available on this RAMBlock to atomically + * zero the page and wake waiting processes. 
+ * (Set during postcopy) + */ +#define RAM_UF_ZEROPAGE (1 << 3) + +/* RAM can be migrated */ +#define RAM_MIGRATABLE (1 << 4) + static inline void iommu_notifier_init(IOMMUNotifier *n, IOMMUNotify fn, IOMMUNotifierFlag flags, hwaddr start, hwaddr end, -- 2.7.4
[Qemu-devel] [PATCH 4/7 V8] configure: add libpmem support
From: Junyan He Add a pair of configure options --{enable,disable}-libpmem to control whether QEMU is compiled with PMDK libpmem [1]. QEMU may write to the host persistent memory (e.g. in vNVDIMM label emulation and live migration), so it must take the proper operations to ensure the persistence of its own writes. Depending on the CPU models and available instructions, the optimal operation can vary [2]. PMDK libpmem has already implemented those operations on multiple CPU models (x86 and ARM) and the logic to select the optimal ones, so QEMU can just use libpmem rather than re-implement them. [1] PMDK (formerly known as NVML), https://github.com/pmem/pmdk/ [2] https://github.com/pmem/pmdk/blob/38bfa652721a37fd94c0130ce0e3f5d8baa3ed40/src/libpmem/pmem.c#L33 Signed-off-by: Junyan He Signed-off-by: Haozhong Zhang Reviewed-by: Stefan Hajnoczi Reviewed-by: Igor Mammedov --- configure | 29 + 1 file changed, 29 insertions(+) diff --git a/configure b/configure index 2a7796e..1c9288b 100755 --- a/configure +++ b/configure @@ -475,6 +475,7 @@ vxhs="" libxml2="" docker="no" debug_mutex="no" +libpmem="" # cross compilers defaults, can be overridden with --cross-cc-ARCH cross_cc_aarch64="aarch64-linux-gnu-gcc" @@ -1435,6 +1436,10 @@ for opt do ;; --disable-debug-mutex) debug_mutex=no ;; + --enable-libpmem) libpmem=yes + ;; + --disable-libpmem) libpmem=no + ;; *) echo "ERROR: unknown option $opt" echo "Try '$0 --help' for more information" @@ -1710,6 +1715,7 @@ disabled with --disable-FEATURE, default is enabled if available: vhost-user vhost-user support capstonecapstone disassembler support debug-mutex mutex debugging support + libpmem libpmem support NOTE: The object files are built at the place where configure is launched EOF @@ -5546,6 +5552,24 @@ if has "docker"; then fi ## +# check for libpmem + +if test "$libpmem" != "no"; then + if $pkg_config --exists "libpmem"; then + libpmem="yes" + libpmem_libs=$($pkg_config --libs libpmem) + libpmem_cflags=$($pkg_config --cflags 
libpmem) + libs_softmmu="$libs_softmmu $libpmem_libs" + QEMU_CFLAGS="$QEMU_CFLAGS $libpmem_cflags" + else + if test "$libpmem" = "yes" ; then + feature_not_found "libpmem" "Install nvml or pmdk" + fi + libpmem="no" + fi +fi + +## # End of CC checks # After here, no more $cc or $ld runs @@ -6010,6 +6034,7 @@ echo "replication support $replication" echo "VxHS block device $vxhs" echo "capstone $capstone" echo "docker$docker" +echo "libpmem support $libpmem" if test "$sdl_too_old" = "yes"; then echo "-> Your SDL version is too old - please upgrade to have SDL support" @@ -6763,6 +6788,10 @@ if test "$vxhs" = "yes" ; then echo "VXHS_LIBS=$vxhs_libs" >> $config_host_mak fi +if test "$libpmem" = "yes" ; then + echo "CONFIG_LIBPMEM=y" >> $config_host_mak +fi + if test "$tcg_interpreter" = "yes"; then QEMU_INCLUDES="-iquote \$(SRC_PATH)/tcg/tci $QEMU_INCLUDES" elif test "$ARCH" = "sparc64" ; then -- 2.7.4
[Qemu-devel] [PATCH 3/7 V8] hostmem-file: add the 'pmem' option
From: Junyan He When QEMU emulates vNVDIMM labels and migrates vNVDIMM devices, it needs to know whether the backend storage is real persistent memory, in order to decide whether special operations should be performed to ensure data persistence. This boolean option 'pmem' allows users to specify whether the backend storage of memory-backend-file is real persistent memory. If 'pmem=on', QEMU will set the flag RAM_PMEM in the RAM block of the corresponding memory region. If 'pmem' is set while QEMU lacks libpmem support, an error is generated. Signed-off-by: Junyan He Signed-off-by: Haozhong Zhang Reviewed-by: Stefan Hajnoczi --- backends/hostmem-file.c | 42 +- docs/nvdimm.txt | 23 +++ exec.c | 9 + include/exec/memory.h | 4 include/exec/ram_addr.h | 3 +++ qemu-options.hx | 7 +++ 6 files changed, 87 insertions(+), 1 deletion(-) diff --git a/backends/hostmem-file.c b/backends/hostmem-file.c index 34c68bb..dbdaf17 100644 --- a/backends/hostmem-file.c +++ b/backends/hostmem-file.c @@ -12,6 +12,7 @@ #include "qemu/osdep.h" #include "qapi/error.h" #include "qemu-common.h" +#include "qemu/error-report.h" #include "sysemu/hostmem.h" #include "sysemu/sysemu.h" #include "qom/object_interfaces.h" @@ -34,6 +35,7 @@ struct HostMemoryBackendFile { bool discard_data; char *mem_path; uint64_t align; +bool is_pmem; }; static void @@ -59,7 +61,8 @@ file_backend_memory_alloc(HostMemoryBackend *backend, Error **errp) memory_region_init_ram_from_file(&backend->mr, OBJECT(backend), path, backend->size, fb->align, - backend->share ? RAM_SHARED : 0, + (backend->share ? RAM_SHARED : 0) | + (fb->is_pmem ? 
RAM_PMEM : 0), fb->mem_path, errp); g_free(path); } @@ -131,6 +134,40 @@ static void file_memory_backend_set_align(Object *o, Visitor *v, error_propagate(errp, local_err); } +static bool file_memory_backend_get_pmem(Object *o, Error **errp) +{ +return MEMORY_BACKEND_FILE(o)->is_pmem; +} + +static void file_memory_backend_set_pmem(Object *o, bool value, Error **errp) +{ +HostMemoryBackend *backend = MEMORY_BACKEND(o); +HostMemoryBackendFile *fb = MEMORY_BACKEND_FILE(o); + +if (host_memory_backend_mr_inited(backend)) { +error_setg(errp, "cannot change property 'pmem' of %s '%s'", + object_get_typename(o), + object_get_canonical_path_component(o)); +return; +} + +#ifndef CONFIG_LIBPMEM +if (value) { +Error *local_err = NULL; +error_setg(&local_err, + "Lack of libpmem support while setting the 'pmem=on'" + " of %s '%s'. We can not ensure the persistence of it" + " without libpmem support, this may cause serious" + " problems." , object_get_typename(o), + object_get_canonical_path_component(o)); +error_propagate(errp, local_err); +return; +} +#endif + +fb->is_pmem = value; +} + static void file_backend_unparent(Object *obj) { HostMemoryBackend *backend = MEMORY_BACKEND(obj); @@ -162,6 +199,9 @@ file_backend_class_init(ObjectClass *oc, void *data) file_memory_backend_get_align, file_memory_backend_set_align, NULL, NULL, &error_abort); +object_class_property_add_bool(oc, "pmem", +file_memory_backend_get_pmem, file_memory_backend_set_pmem, +&error_abort); } static void file_backend_instance_finalize(Object *o) diff --git a/docs/nvdimm.txt b/docs/nvdimm.txt index 24b443b..b8bb43a 100644 --- a/docs/nvdimm.txt +++ b/docs/nvdimm.txt @@ -173,3 +173,26 @@ There are currently two valid values for this option: the NVDIMMs in the event of power loss. This implies that the platform also supports flushing dirty data through the memory controller on power loss. + +guest software that this vNVDIMM device contains a region that cannot +accept persistent writes. 
As a result, for example, the guest Linux +NVDIMM driver marks such a vNVDIMM device as read-only. + +If the vNVDIMM backend is on the host persistent memory that can be +accessed in SNIA NVM Programming Model [1] (e.g., Intel NVDIMM), it's +suggested to set the 'pmem' option of memory-backend-file to 'on'. When +'pmem' is 'on' and QEMU is built with libpmem [2] support (configured with +--enable-libpmem), QEMU will take neces
[Qemu-devel] [PATCH 7/7 V8] migration/ram: ensure write persistence on loading all data to PMEM.
From: Junyan He Because we need to make sure the pmem kind of memory data is synced after migration, we call pmem_persist() when the migration finishes. This makes sure the pmem data is safe and will not be lost if power fails. Signed-off-by: Junyan He Reviewed-by: Stefan Hajnoczi Reviewed-by: Igor Mammedov --- include/qemu/pmem.h | 1 + migration/ram.c | 10 +- stubs/pmem.c| 4 3 files changed, 14 insertions(+), 1 deletion(-) diff --git a/include/qemu/pmem.h b/include/qemu/pmem.h index 00d6680..8f52b08 100644 --- a/include/qemu/pmem.h +++ b/include/qemu/pmem.h @@ -17,6 +17,7 @@ #else /* !CONFIG_LIBPMEM */ void *pmem_memcpy_persist(void *pmemdest, const void *src, size_t len); +void pmem_persist(const void *addr, size_t len); #endif /* CONFIG_LIBPMEM */ diff --git a/migration/ram.c b/migration/ram.c index 9c03e2b..62dfe75 100644 --- a/migration/ram.c +++ b/migration/ram.c @@ -33,6 +33,7 @@ #include "qemu/bitops.h" #include "qemu/bitmap.h" #include "qemu/main-loop.h" +#include "qemu/pmem.h" #include "xbzrle.h" #include "ram.h" #include "migration.h" @@ -3541,6 +3542,13 @@ static int ram_load_setup(QEMUFile *f, void *opaque) static int ram_load_cleanup(void *opaque) { RAMBlock *rb; + +RAMBLOCK_FOREACH_MIGRATABLE(rb) { +if (ramblock_is_pmem(rb)) { +pmem_persist(rb->host, rb->used_length); +} +} + xbzrle_load_cleanup(); compress_threads_load_cleanup(); @@ -3896,7 +3904,7 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id) static bool ram_has_postcopy(void *opaque) { RAMBlock *rb; -RAMBLOCK_FOREACH(rb) { +RAMBLOCK_FOREACH_MIGRATABLE(rb) { if (ramblock_is_pmem(rb)) { info_report("Block: %s, host: %p is a nvdimm memory, postcopy" "is not supported now!", rb->idstr, rb->host); diff --git a/stubs/pmem.c b/stubs/pmem.c index b4ec72d..f794262 100644 --- a/stubs/pmem.c +++ b/stubs/pmem.c @@ -17,3 +17,7 @@ void *pmem_memcpy_persist(void *pmemdest, const void *src, size_t len) { return memcpy(pmemdest, src, len); } + +void pmem_persist(const void *addr, 
size_t len) +{ +} -- 2.7.4
[Qemu-devel] [PATCH 1/7 V8] memory, exec: Expose all memory block related flags.
From: Junyan He We need to use these flags in other files, not just in exec.c. For example, RAM_SHARED should be used when creating a RAM block from a file. We expose them in exec/memory.h. Signed-off-by: Junyan He Reviewed-by: Stefan Hajnoczi Reviewed-by: Igor Mammedov --- exec.c| 20 include/exec/memory.h | 20 2 files changed, 20 insertions(+), 20 deletions(-) diff --git a/exec.c b/exec.c index 4f5df07..cc042dc 100644 --- a/exec.c +++ b/exec.c @@ -87,26 +87,6 @@ AddressSpace address_space_memory; MemoryRegion io_mem_rom, io_mem_notdirty; static MemoryRegion io_mem_unassigned; - -/* RAM is pre-allocated and passed into qemu_ram_alloc_from_ptr */ -#define RAM_PREALLOC (1 << 0) - -/* RAM is mmap-ed with MAP_SHARED */ -#define RAM_SHARED (1 << 1) - -/* Only a portion of RAM (used_length) is actually used, and migrated. - * This used_length size can change across reboots. - */ -#define RAM_RESIZEABLE (1 << 2) - -/* UFFDIO_ZEROPAGE is available on this RAMBlock to atomically - * zero the page and wake waiting processes. - * (Set during postcopy) - */ -#define RAM_UF_ZEROPAGE (1 << 3) - -/* RAM can be migrated */ -#define RAM_MIGRATABLE (1 << 4) #endif #ifdef TARGET_PAGE_BITS_VARY diff --git a/include/exec/memory.h b/include/exec/memory.h index 448d41a..6d0af29 100644 --- a/include/exec/memory.h +++ b/include/exec/memory.h @@ -103,6 +103,26 @@ struct IOMMUNotifier { }; typedef struct IOMMUNotifier IOMMUNotifier; +/* RAM is pre-allocated and passed into qemu_ram_alloc_from_ptr */ +#define RAM_PREALLOC (1 << 0) + +/* RAM is mmap-ed with MAP_SHARED */ +#define RAM_SHARED (1 << 1) + +/* Only a portion of RAM (used_length) is actually used, and migrated. + * This used_length size can change across reboots. + */ +#define RAM_RESIZEABLE (1 << 2) + +/* UFFDIO_ZEROPAGE is available on this RAMBlock to atomically + * zero the page and wake waiting processes. 
+ * (Set during postcopy) + */ +#define RAM_UF_ZEROPAGE (1 << 3) + +/* RAM can be migrated */ +#define RAM_MIGRATABLE (1 << 4) + static inline void iommu_notifier_init(IOMMUNotifier *n, IOMMUNotify fn, IOMMUNotifierFlag flags, hwaddr start, hwaddr end, -- 2.7.4
[Qemu-devel] [PATCH 5/7 V8] mem/nvdimm: ensure write persistence to PMEM in label emulation
From: Junyan He Guest writes to vNVDIMM labels are intercepted and performed on the backend by QEMU. When the backend is real persistent memory, QEMU needs to take proper operations to ensure its write persistence on the persistent memory. Otherwise, a host power failure may result in the loss of guest label configurations. Signed-off-by: Haozhong Zhang Reviewed-by: Stefan Hajnoczi Reviewed-by: Igor Mammedov --- hw/mem/nvdimm.c | 9 - include/qemu/pmem.h | 23 +++ stubs/Makefile.objs | 1 + stubs/pmem.c| 19 +++ 4 files changed, 51 insertions(+), 1 deletion(-) create mode 100644 include/qemu/pmem.h create mode 100644 stubs/pmem.c diff --git a/hw/mem/nvdimm.c b/hw/mem/nvdimm.c index 021d1c3..1c6674c 100644 --- a/hw/mem/nvdimm.c +++ b/hw/mem/nvdimm.c @@ -23,6 +23,7 @@ */ #include "qemu/osdep.h" +#include "qemu/pmem.h" #include "qapi/error.h" #include "qapi/visitor.h" #include "hw/mem/nvdimm.h" @@ -164,11 +165,17 @@ static void nvdimm_write_label_data(NVDIMMDevice *nvdimm, const void *buf, { MemoryRegion *mr; PCDIMMDevice *dimm = PC_DIMM(nvdimm); +bool is_pmem = object_property_get_bool(OBJECT(dimm->hostmem), +"pmem", NULL); uint64_t backend_offset; nvdimm_validate_rw_label_data(nvdimm, size, offset); -memcpy(nvdimm->label_data + offset, buf, size); +if (!is_pmem) { +memcpy(nvdimm->label_data + offset, buf, size); +} else { +pmem_memcpy_persist(nvdimm->label_data + offset, buf, size); +} mr = host_memory_backend_get_memory(dimm->hostmem); backend_offset = memory_region_size(mr) - nvdimm->label_size + offset; diff --git a/include/qemu/pmem.h b/include/qemu/pmem.h new file mode 100644 index 000..00d6680 --- /dev/null +++ b/include/qemu/pmem.h @@ -0,0 +1,23 @@ +/* + * QEMU header file for libpmem. + * + * Copyright (c) 2018 Intel Corporation. + * + * Author: Haozhong Zhang + * + * This work is licensed under the terms of the GNU GPL, version 2 or later. + * See the COPYING file in the top-level directory. 
+ */ + +#ifndef QEMU_PMEM_H +#define QEMU_PMEM_H + +#ifdef CONFIG_LIBPMEM +#include +#else /* !CONFIG_LIBPMEM */ + +void *pmem_memcpy_persist(void *pmemdest, const void *src, size_t len); + +#endif /* CONFIG_LIBPMEM */ + +#endif /* !QEMU_PMEM_H */ diff --git a/stubs/Makefile.objs b/stubs/Makefile.objs index 53d3f32..be9a042 100644 --- a/stubs/Makefile.objs +++ b/stubs/Makefile.objs @@ -43,3 +43,4 @@ stub-obj-y += xen-common.o stub-obj-y += xen-hvm.o stub-obj-y += pci-host-piix.o stub-obj-y += ram-block.o +stub-obj-$(call lnot,$(CONFIG_LIBPMEM)) += pmem.o \ No newline at end of file diff --git a/stubs/pmem.c b/stubs/pmem.c new file mode 100644 index 000..b4ec72d --- /dev/null +++ b/stubs/pmem.c @@ -0,0 +1,19 @@ +/* + * Stubs for libpmem. + * + * Copyright (c) 2018 Intel Corporation. + * + * Author: Haozhong Zhang + * + * This work is licensed under the terms of the GNU GPL, version 2 or later. + * See the COPYING file in the top-level directory. + */ + +#include + +#include "qemu/pmem.h" + +void *pmem_memcpy_persist(void *pmemdest, const void *src, size_t len) +{ +return memcpy(pmemdest, src, len); +} -- 2.7.4
[Qemu-devel] [PATCH 0/7 V8] nvdimm: guarantee persistence of QEMU writes to persistent memory
From: Junyan He QEMU writes to vNVDIMM backends in the vNVDIMM label emulation and live migration. If the backend is on the persistent memory, QEMU needs to take proper operations to ensure its writes are persistent on the persistent memory. Otherwise, a host power failure may result in the loss of the guest data on the persistent memory. This patch series is based on Marcel's patch "mem: add share parameter to memory-backend-ram" [1] because of the changes in patch 1. [1] https://lists.gnu.org/archive/html/qemu-devel/2018-02/msg03858.html Previous versions can be found at: v7: https://lists.gnu.org/archive/html/qemu-devel/2018-06/msg02997.html v6: https://lists.gnu.org/archive/html/qemu-devel/2018-06/msg00061.html v5: https://lists.gnu.org/archive/html/qemu-devel/2018-05/msg02258.html v4: https://lists.gnu.org/archive/html/qemu-devel/2018-02/msg06993.html v3: https://lists.gnu.org/archive/html/qemu-devel/2018-02/msg04365.html v2: https://lists.gnu.org/archive/html/qemu-devel/2018-02/msg01579.html v1: https://lists.gnu.org/archive/html/qemu-devel/2017-12/msg05040.html Changes in v8: * (Patch 3) Report an error when the user sets 'pmem' on a file backend while QEMU lacks libpmem support. In this case, we cannot ensure the persistence of the file backend, so we choose to fail rather than continue and make things more confusing. Changes in v7: The v6 patch set has already been reviewed by Stefan Hajnoczi. No logic change in this v7 version, just: * Spelling checks and some documentation wording refined. * Rebase to the "ram is migratable" patch set. Changes in v6: * (Patch 1) Expose all ram block flags rather than redefine the flags. * (Patch 4) Use pkg-config rather than a hard-coded check in configure. * (Patch 7) Sync and flush all the pmem data when migration completes, rather than syncing pages one by one as in the previous version. Changes in v5: * (Patch 9) Add post copy check and output some messages for nvdimm. Changes in v4: * (Patch 2) Fix compilation errors found by patchew. 
Changes in v3: * (Patch 5) Add an is_pmem flag to ram_handle_compressed() and handle PMEM writes in it, so we don't need the _common function. * (Patch 6) Expose qemu_get_buffer_common so we can remove the unnecessary qemu_get_buffer_to_pmem wrapper. * (Patch 8) Add an is_pmem flag to xbzrle_decode_buffer() and handle PMEM writes in it, so we can remove the unnecessary xbzrle_decode_buffer_{common, to_pmem}. * Move libpmem stubs to stubs/pmem.c and fix the compilation failures of test-{xbzrle,vmstate}.c. Changes in v2: * (Patch 1) Use a flags parameter in file ram allocation functions. * (Patch 2) Add a new option 'pmem' to hostmem-file. * (Patch 3) Use libpmem to operate on the persistent memory, rather than re-implementing those operations in QEMU. * (Patch 5-8) Consider the write persistence in the migration path. Junyan: [1/7] memory, exec: Expose all memory block related flags. [6/7] migration/ram: Add check and info message to nvdimm post copy. [7/7] migration/ram: ensure write persistence on loading all data to PMEM. Haozhong: [5/7] mem/nvdimm: ensure write persistence to PMEM in label emulation Haozhong & Junyan: [2/7] memory, exec: switch file ram allocation functions to 'flags' parameters [3/7] hostmem-file: add the 'pmem' option [4/7] configure: add libpmem support Signed-off-by: Haozhong Zhang Signed-off-by: Junyan He Reviewed-by: Stefan Hajnoczi backends/hostmem-file.c | 43 ++- configure | 29 + docs/nvdimm.txt | 23 +++ exec.c | 39 ++- hw/mem/nvdimm.c | 9 - include/exec/memory.h | 31 +-- include/exec/ram_addr.h | 28 ++-- include/qemu/pmem.h | 24 memory.c| 8 +--- migration/ram.c | 17 + numa.c | 2 +- qemu-options.hx | 7 +++ stubs/Makefile.objs | 1 + stubs/pmem.c| 23 +++ 14 files changed, 249 insertions(+), 35 deletions(-) create mode 100644 include/qemu/pmem.h create mode 100644 stubs/pmem.c -- 2.7.4
[Qemu-devel] [PATCH 2/7 V8] memory, exec: switch file ram allocation functions to 'flags' parameters
From: Junyan He As more flag parameters besides the existing 'share' are going to be added to following functions memory_region_init_ram_from_file qemu_ram_alloc_from_fd qemu_ram_alloc_from_file let's switch them to use the 'flags' parameters so as to ease future flag additions. The existing 'share' flag is converted to the RAM_SHARED bit in ram_flags, and other flag bits are ignored by above functions right now. Signed-off-by: Junyan He Signed-off-by: Haozhong Zhang Reviewed-by: Stefan Hajnoczi Reviewed-by: Igor Mammedov --- backends/hostmem-file.c | 3 ++- exec.c | 10 +- include/exec/memory.h | 7 +-- include/exec/ram_addr.h | 25 +++-- memory.c| 8 +--- numa.c | 2 +- 6 files changed, 41 insertions(+), 14 deletions(-) diff --git a/backends/hostmem-file.c b/backends/hostmem-file.c index 134b08d..34c68bb 100644 --- a/backends/hostmem-file.c +++ b/backends/hostmem-file.c @@ -58,7 +58,8 @@ file_backend_memory_alloc(HostMemoryBackend *backend, Error **errp) path = object_get_canonical_path(OBJECT(backend)); memory_region_init_ram_from_file(&backend->mr, OBJECT(backend), path, - backend->size, fb->align, backend->share, + backend->size, fb->align, + backend->share ? RAM_SHARED : 0, fb->mem_path, errp); g_free(path); } diff --git a/exec.c b/exec.c index cc042dc..1ec539d 100644 --- a/exec.c +++ b/exec.c @@ -2238,7 +2238,7 @@ static void ram_block_add(RAMBlock *new_block, Error **errp, bool shared) #ifdef __linux__ RAMBlock *qemu_ram_alloc_from_fd(ram_addr_t size, MemoryRegion *mr, - bool share, int fd, + uint64_t ram_flags, int fd, Error **errp) { RAMBlock *new_block; @@ -2280,14 +2280,14 @@ RAMBlock *qemu_ram_alloc_from_fd(ram_addr_t size, MemoryRegion *mr, new_block->mr = mr; new_block->used_length = size; new_block->max_length = size; -new_block->flags = share ? 
RAM_SHARED : 0; +new_block->flags = ram_flags; new_block->host = file_ram_alloc(new_block, size, fd, !file_size, errp); if (!new_block->host) { g_free(new_block); return NULL; } -ram_block_add(new_block, &local_err, share); +ram_block_add(new_block, &local_err, ram_flags & RAM_SHARED); if (local_err) { g_free(new_block); error_propagate(errp, local_err); @@ -2299,7 +2299,7 @@ RAMBlock *qemu_ram_alloc_from_fd(ram_addr_t size, MemoryRegion *mr, RAMBlock *qemu_ram_alloc_from_file(ram_addr_t size, MemoryRegion *mr, - bool share, const char *mem_path, + uint64_t ram_flags, const char *mem_path, Error **errp) { int fd; @@ -2311,7 +2311,7 @@ RAMBlock *qemu_ram_alloc_from_file(ram_addr_t size, MemoryRegion *mr, return NULL; } -block = qemu_ram_alloc_from_fd(size, mr, share, fd, errp); +block = qemu_ram_alloc_from_fd(size, mr, ram_flags, fd, errp); if (!block) { if (created) { unlink(mem_path); diff --git a/include/exec/memory.h b/include/exec/memory.h index 6d0af29..513ec8d 100644 --- a/include/exec/memory.h +++ b/include/exec/memory.h @@ -640,6 +640,7 @@ void memory_region_init_resizeable_ram(MemoryRegion *mr, void *host), Error **errp); #ifdef __linux__ + /** * memory_region_init_ram_from_file: Initialize RAM memory region with a *mmap-ed backend. @@ -651,7 +652,9 @@ void memory_region_init_resizeable_ram(MemoryRegion *mr, * @size: size of the region. * @align: alignment of the region base address; if 0, the default alignment * (getpagesize()) will be used. - * @share: %true if memory must be mmaped with the MAP_SHARED flag + * @ram_flags: Memory region features: + * - RAM_SHARED: memory must be mmaped with the MAP_SHARED flag + * Other bits are ignored now. * @path: the path in which to allocate the RAM. * @errp: pointer to Error*, to store an error if it happens. * @@ -663,7 +666,7 @@ void memory_region_init_ram_from_file(MemoryRegion *mr, const char *name, uint64_t size, uint64_t align, - bool share, + uint64_t ram_flags, const char *path, Error **errp); di
[Qemu-devel] [PATCH 6/7 V8] migration/ram: Add check and info message to nvdimm post copy.
From: Junyan He The nvdimm kind of memory does not support post copy yet. We disable post copy if there is any nvdimm memory and print a log hint to the user. Signed-off-by: Junyan He Reviewed-by: Stefan Hajnoczi Reviewed-by: Igor Mammedov --- migration/ram.c | 9 + 1 file changed, 9 insertions(+) diff --git a/migration/ram.c b/migration/ram.c index 1cd98d6..9c03e2b 100644 --- a/migration/ram.c +++ b/migration/ram.c @@ -3895,6 +3895,15 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id) static bool ram_has_postcopy(void *opaque) { +RAMBlock *rb; +RAMBLOCK_FOREACH(rb) { +if (ramblock_is_pmem(rb)) { +info_report("Block: %s, host: %p is a nvdimm memory, postcopy" + " is not supported now!", rb->idstr, rb->host); +return false; +} +} + return migrate_postcopy_ram(); } -- 2.7.4
[Qemu-devel] [PATCH 3/7 V7] hostmem-file: add the 'pmem' option
From: Junyan He When QEMU emulates vNVDIMM labels and migrates vNVDIMM devices, it needs to know whether the backend storage is a real persistent memory, in order to decide whether special operations should be performed to ensure the data persistence. This boolean option 'pmem' allows users to specify whether the backend storage of memory-backend-file is a real persistent memory. If 'pmem=on', QEMU will set the flag RAM_PMEM in the RAM block of the corresponding memory region. Signed-off-by: Junyan He Signed-off-by: Haozhong Zhang Reviewed-by: Stefan Hajnoczi *RESEND: If pmem is on while we lack of libpmem support, we just make qemu exit and print some error message to user. This can prevent misusing of pmem parameter while we can not really ensure the persistence. --- backends/hostmem-file.c | 39 ++- docs/nvdimm.txt | 18 ++ exec.c | 9 + include/exec/memory.h | 4 include/exec/ram_addr.h | 3 +++ qemu-options.hx | 7 +++ 6 files changed, 79 insertions(+), 1 deletion(-) diff --git a/backends/hostmem-file.c b/backends/hostmem-file.c index 34c68bb..4607651 100644 --- a/backends/hostmem-file.c +++ b/backends/hostmem-file.c @@ -12,6 +12,7 @@ #include "qemu/osdep.h" #include "qapi/error.h" #include "qemu-common.h" +#include "qemu/error-report.h" #include "sysemu/hostmem.h" #include "sysemu/sysemu.h" #include "qom/object_interfaces.h" @@ -34,6 +35,7 @@ struct HostMemoryBackendFile { bool discard_data; char *mem_path; uint64_t align; +bool is_pmem; }; static void @@ -59,7 +61,8 @@ file_backend_memory_alloc(HostMemoryBackend *backend, Error **errp) memory_region_init_ram_from_file(&backend->mr, OBJECT(backend), path, backend->size, fb->align, - backend->share ? RAM_SHARED : 0, + (backend->share ? RAM_SHARED : 0) | + (fb->is_pmem ? 
RAM_PMEM : 0), fb->mem_path, errp); g_free(path); } @@ -131,6 +134,37 @@ static void file_memory_backend_set_align(Object *o, Visitor *v, error_propagate(errp, local_err); } +static bool file_memory_backend_get_pmem(Object *o, Error **errp) +{ +return MEMORY_BACKEND_FILE(o)->is_pmem; +} + +static void file_memory_backend_set_pmem(Object *o, bool value, Error **errp) +{ +HostMemoryBackend *backend = MEMORY_BACKEND(o); +HostMemoryBackendFile *fb = MEMORY_BACKEND_FILE(o); + +if (host_memory_backend_mr_inited(backend)) { +error_setg(errp, "cannot change property 'pmem' of %s '%s'", + object_get_typename(o), + object_get_canonical_path_component(o)); +return; +} + +#ifndef CONFIG_LIBPMEM +if (value) { +error_report("Lack of libpmem support while setting the 'pmem=on'" + " of %s '%s'. We can not ensure the persistence of it" + " without libpmem support, this may cause serious" + " problems." , object_get_typename(o), + object_get_canonical_path_component(o)); +exit(1); +} +#endif + +fb->is_pmem = value; +} + static void file_backend_unparent(Object *obj) { HostMemoryBackend *backend = MEMORY_BACKEND(obj); @@ -162,6 +196,9 @@ file_backend_class_init(ObjectClass *oc, void *data) file_memory_backend_get_align, file_memory_backend_set_align, NULL, NULL, &error_abort); +object_class_property_add_bool(oc, "pmem", +file_memory_backend_get_pmem, file_memory_backend_set_pmem, +&error_abort); } static void file_backend_instance_finalize(Object *o) diff --git a/docs/nvdimm.txt b/docs/nvdimm.txt index 8b48fb4..2f7d348 100644 --- a/docs/nvdimm.txt +++ b/docs/nvdimm.txt @@ -180,3 +180,21 @@ supports CPU Cache Flush and Memory Controller Flush on Power Loss, etc. For a complete list of the flags available and for more detailed descriptions, please consult the ACPI spec. + +guest software that this vNVDIMM device contains a region that cannot +accept persistent writes. In result, for example, the guest Linux +NVDIMM driver, marks such vNVDIMM device as read-only. 
+ +If the vNVDIMM backend is on the host persistent memory that can be +accessed in SNIA NVM Programming Model [1] (e.g., Intel NVDIMM), it's +suggested to set the 'pmem' option of memory-backend-file to 'on'. When +'pmem=on' and QEMU is built with libpmem [2] support (configured with +--enable-libpmem), QEMU will take necessary operations to guarantee +the persistence o
[Qemu-devel] [PATCH 3/7 V7 RESEND] hostmem-file: add the 'pmem' option
From: Junyan He When QEMU emulates vNVDIMM labels and migrates vNVDIMM devices, it needs to know whether the backend storage is a real persistent memory, in order to decide whether special operations should be performed to ensure the data persistence. This boolean option 'pmem' allows users to specify whether the backend storage of memory-backend-file is a real persistent memory. If 'pmem=on', QEMU will set the flag RAM_PMEM in the RAM block of the corresponding memory region. Signed-off-by: Junyan He Signed-off-by: Haozhong Zhang Reviewed-by: Stefan Hajnoczi --- backends/hostmem-file.c | 39 ++- docs/nvdimm.txt | 18 ++ exec.c | 9 + include/exec/memory.h | 4 include/exec/ram_addr.h | 3 +++ qemu-options.hx | 7 +++ 6 files changed, 79 insertions(+), 1 deletion(-) diff --git a/backends/hostmem-file.c b/backends/hostmem-file.c index 34c68bb..4607651 100644 --- a/backends/hostmem-file.c +++ b/backends/hostmem-file.c @@ -12,6 +12,7 @@ #include "qemu/osdep.h" #include "qapi/error.h" #include "qemu-common.h" +#include "qemu/error-report.h" #include "sysemu/hostmem.h" #include "sysemu/sysemu.h" #include "qom/object_interfaces.h" @@ -34,6 +35,7 @@ struct HostMemoryBackendFile { bool discard_data; char *mem_path; uint64_t align; +bool is_pmem; }; static void @@ -59,7 +61,8 @@ file_backend_memory_alloc(HostMemoryBackend *backend, Error **errp) memory_region_init_ram_from_file(&backend->mr, OBJECT(backend), path, backend->size, fb->align, - backend->share ? RAM_SHARED : 0, + (backend->share ? RAM_SHARED : 0) | + (fb->is_pmem ? 
RAM_PMEM : 0), fb->mem_path, errp); g_free(path); } @@ -131,6 +134,37 @@ static void file_memory_backend_set_align(Object *o, Visitor *v, error_propagate(errp, local_err); } +static bool file_memory_backend_get_pmem(Object *o, Error **errp) +{ +return MEMORY_BACKEND_FILE(o)->is_pmem; +} + +static void file_memory_backend_set_pmem(Object *o, bool value, Error **errp) +{ +HostMemoryBackend *backend = MEMORY_BACKEND(o); +HostMemoryBackendFile *fb = MEMORY_BACKEND_FILE(o); + +if (host_memory_backend_mr_inited(backend)) { +error_setg(errp, "cannot change property 'pmem' of %s '%s'", + object_get_typename(o), + object_get_canonical_path_component(o)); +return; +} + +#ifndef CONFIG_LIBPMEM +if (value) { +error_report("Lack of libpmem support while setting the 'pmem=on'" + " of %s '%s'. We can not ensure the persistence of it" + " without libpmem support, this may cause serious" + " problems." , object_get_typename(o), + object_get_canonical_path_component(o)); +exit(1); +} +#endif + +fb->is_pmem = value; +} + static void file_backend_unparent(Object *obj) { HostMemoryBackend *backend = MEMORY_BACKEND(obj); @@ -162,6 +196,9 @@ file_backend_class_init(ObjectClass *oc, void *data) file_memory_backend_get_align, file_memory_backend_set_align, NULL, NULL, &error_abort); +object_class_property_add_bool(oc, "pmem", +file_memory_backend_get_pmem, file_memory_backend_set_pmem, +&error_abort); } static void file_backend_instance_finalize(Object *o) diff --git a/docs/nvdimm.txt b/docs/nvdimm.txt index 8b48fb4..2f7d348 100644 --- a/docs/nvdimm.txt +++ b/docs/nvdimm.txt @@ -180,3 +180,21 @@ supports CPU Cache Flush and Memory Controller Flush on Power Loss, etc. For a complete list of the flags available and for more detailed descriptions, please consult the ACPI spec. + +guest software that this vNVDIMM device contains a region that cannot +accept persistent writes. In result, for example, the guest Linux +NVDIMM driver, marks such vNVDIMM device as read-only. 
+ +If the vNVDIMM backend is on the host persistent memory that can be +accessed in SNIA NVM Programming Model [1] (e.g., Intel NVDIMM), it's +suggested to set the 'pmem' option of memory-backend-file to 'on'. When +'pmem=on' and QEMU is built with libpmem [2] support (configured with +--enable-libpmem), QEMU will take necessary operations to guarantee +the persistence of its own writes to the vNVDIMM backend (e.g., in +vNVDIMM label emulation and live migration). + +References +-- + +[1] SNIA NVM Programming Model: https://www.snia.org/sites/default/files/technical_work/
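[Editor's note] The docs hunk above describes when to set 'pmem=on' on a memory-backend-file. As a concrete sketch of the configuration being discussed (paths, sizes, and ids below are illustrative, not from the patch; the option names follow docs/nvdimm.txt):

```shell
# A DAX-mapped file on host persistent memory backs the vNVDIMM;
# pmem=on asks QEMU (built with libpmem) to persist its own writes to it.
qemu-system-x86_64 -machine pc,nvdimm \
    -m 4G,slots=2,maxmem=8G \
    -object memory-backend-file,id=mem1,share=on,mem-path=/mnt/pmem0/nvdimm0,size=2G,pmem=on \
    -device nvdimm,id=nvdimm1,memdev=mem1
```

Without 'pmem=on', the backend is treated as ordinary file-backed RAM and no persistence operations are performed on QEMU's writes.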
Re: [Qemu-devel] [PATCH V6 0/7] nvdimm: guarantee persistence of QEMU writes to persistent memory
static void file_memory_backend_set_pmem(Object *o, bool value, Error **errp) { HostMemoryBackend *backend = MEMORY_BACKEND(o); HostMemoryBackendFile *fb = MEMORY_BACKEND_FILE(o); if (host_memory_backend_mr_inited(backend)) { error_setg(errp, "cannot change property 'pmem' of %s '%s'", object_get_typename(o), object_get_canonical_path_component(o)); return; } #ifndef CONFIG_LIBPMEM if (value) { warn_report("Lack of libpmem support while setting the 'pmem' of" " %s '%s'. We can not ensure the persistence of it" " without libpmem support.", object_get_typename(o), object_get_canonical_path_component(o)); } #endif fb->is_pmem = value; } Is this kind of hint or warning acceptable? From: Qemu-devel on behalf of Junyan He Sent: Tuesday, June 12, 2018 3:27:38 PM To: Igor Mammedov Cc: Haozhong Zhang; xiaoguangrong.e...@gmail.com; crosthwaite.pe...@gmail.com; m...@redhat.com; qemu-devel@nongnu.org; dgilb...@redhat.com; quint...@redhat.com; Junyan He; stefa...@redhat.com; pbonz...@redhat.com; r...@twiddle.net; ehabk...@redhat.com Subject: Re: [Qemu-devel] [PATCH V6 0/7] nvdimm: guarantee persistence of QEMU writes to persistent memory According to my understanding, a file on real persistent memory does not need to be pmem=on; pmem=on is a feature of the file backend class. For example, if we do not have enough hard disk space and we have enough pmem, we can use a file on pmem the same as a normal file-backend on hard disk. That is, a file-backend on pmem does not necessarily have to be an nvdimm backend; it can be a normal file mapping as well. So > detect that backing file is located on pmem storage may not be a good approach. > (1) Maybe we should error out if pmem=on but compiled without libpmem > and add 3rd state pmem=force for testing purposes. I think we can print a warning message to give the user a hint; it can really help when they mis-configure. 
From: Qemu-devel on behalf of Igor Mammedov Sent: Tuesday, June 12, 2018 2:55:46 PM To: Junyan He Cc: Haozhong Zhang; xiaoguangrong.e...@gmail.com; m...@redhat.com; crosthwaite.pe...@gmail.com; qemu-devel@nongnu.org; ehabk...@redhat.com; dgilb...@redhat.com; quint...@redhat.com; Junyan He; stefa...@redhat.com; pbonz...@redhat.com; r...@twiddle.net Subject: Re: [Qemu-devel] [PATCH V6 0/7] nvdimm: guarantee persistence of QEMU writes to persistent memory On Tue, 12 Jun 2018 13:38:08 + Junyan He wrote: > He have pmem_persist and pmem_memcpy_persist stub functions. > > If no libpmem and user really specify pmem=on, we just do nothing or just > memcpy. > > Real persistent memory always require libpmem support its load/save. > > If pmem=on and without libpmem, we can think that user want to imitate > > pmem=on while the HW environment is without real persistent memory existing. > > It may help debug on some machine without real pmem. For unaware user it would be easy to misconfigure and think that feature works while it isn't, which cloud lead to data loss. (1) Maybe we should error out if pmem=on but compiled without libpmem and add 3rd state pmem=force for testing purposes. Also can we detect that backing file is located on pmem storage and do [1] if it's not? > > > From: Qemu-devel on behalf > of Igor Mammedov > Sent: Tuesday, June 12, 2018 12:06:43 PM > To: junyan...@gmx.com > Cc: Haozhong Zhang; xiaoguangrong.e...@gmail.com; > crosthwaite.pe...@gmail.com; m...@redhat.com; qemu-devel@nongnu.org; > dgilb...@redhat.com; quint...@redhat.com; Junyan He; stefa...@redhat.com; > pbonz...@redhat.com; r...@twiddle.net; ehabk...@redhat.com > Subject: Re: [Qemu-devel] [PATCH V6 0/7] nvdimm: guarantee persistence of > QEMU writes to persistent memory > > On Fri, 1 Jun 2018 16:10:22 +0800 > junyan...@gmx.com wrote: > > > From: Junyan He > > > > QEMU writes to vNVDIMM backends in the vNVDIMM label emulation and > > live migration. 
If the backend is on the persistent memory, QEMU needs > > to take proper operations to ensure its writes persistent on the > > persistent memory. Otherwise, a host power failure may result in the > > loss the guest data on the persistent memory. > > extra question, what are expected behavior when QEMU is built without > libpmem and user specifies pmem=on for backend? > > > > > This v3 patch series is based on Marcel's patch "mem: add share > > parameter to memory-backend-ram" [1] because of the changes in patch 1. > > > > [1] https://lists.gnu.org/archive/html/qemu-devel/2018-02/msg03
Re: [Qemu-devel] [PATCH V6 0/7] nvdimm: guarantee persistence of QEMU writes to persistent memory
According to my understanding, a file on real persistent memory does not need to be pmem=on; pmem=on is a feature of the file backend class. For example, if we do not have enough hard disk space and we have enough pmem, we can use a file on pmem the same as a normal file-backend on hard disk. That is, a file-backend on pmem does not necessarily have to be an nvdimm backend; it can be a normal file mapping as well. So > detect that backing file is located on pmem storage may not be a good approach. > (1) Maybe we should error out if pmem=on but compiled without libpmem > and add 3rd state pmem=force for testing purposes. I think we can print a warning message to give the user a hint; it can really help when they mis-configure. From: Qemu-devel on behalf of Igor Mammedov Sent: Tuesday, June 12, 2018 2:55:46 PM To: Junyan He Cc: Haozhong Zhang; xiaoguangrong.e...@gmail.com; m...@redhat.com; crosthwaite.pe...@gmail.com; qemu-devel@nongnu.org; ehabk...@redhat.com; dgilb...@redhat.com; quint...@redhat.com; Junyan He; stefa...@redhat.com; pbonz...@redhat.com; r...@twiddle.net Subject: Re: [Qemu-devel] [PATCH V6 0/7] nvdimm: guarantee persistence of QEMU writes to persistent memory On Tue, 12 Jun 2018 13:38:08 +0000 Junyan He wrote: > He have pmem_persist and pmem_memcpy_persist stub functions. > > If no libpmem and user really specify pmem=on, we just do nothing or just > memcpy. > > Real persistent memory always require libpmem support its load/save. > > If pmem=on and without libpmem, we can think that user want to imitate > > pmem=on while the HW environment is without real persistent memory existing. > > It may help debug on some machine without real pmem. For unaware user it would be easy to misconfigure and think that feature works while it isn't, which could lead to data loss. (1) Maybe we should error out if pmem=on but compiled without libpmem and add 3rd state pmem=force for testing purposes. Also can we detect that backing file is located on pmem storage and do [1] if it's not? 
> > > From: Qemu-devel on behalf > of Igor Mammedov > Sent: Tuesday, June 12, 2018 12:06:43 PM > To: junyan...@gmx.com > Cc: Haozhong Zhang; xiaoguangrong.e...@gmail.com; > crosthwaite.pe...@gmail.com; m...@redhat.com; qemu-devel@nongnu.org; > dgilb...@redhat.com; quint...@redhat.com; Junyan He; stefa...@redhat.com; > pbonz...@redhat.com; r...@twiddle.net; ehabk...@redhat.com > Subject: Re: [Qemu-devel] [PATCH V6 0/7] nvdimm: guarantee persistence of > QEMU writes to persistent memory > > On Fri, 1 Jun 2018 16:10:22 +0800 > junyan...@gmx.com wrote: > > > From: Junyan He > > > > QEMU writes to vNVDIMM backends in the vNVDIMM label emulation and > > live migration. If the backend is on the persistent memory, QEMU needs > > to take proper operations to ensure its writes persistent on the > > persistent memory. Otherwise, a host power failure may result in the > > loss the guest data on the persistent memory. > > extra question, what are expected behavior when QEMU is built without > libpmem and user specifies pmem=on for backend? > > > > > This v3 patch series is based on Marcel's patch "mem: add share > > parameter to memory-backend-ram" [1] because of the changes in patch 1. > > > > [1] https://lists.gnu.org/archive/html/qemu-devel/2018-02/msg03858.html > > > > Previous versions can be found at > > v5: https://lists.gnu.org/archive/html/qemu-devel/2018-05/msg02258.html > > V4: https://lists.gnu.org/archive/html/qemu-devel/2018-02/msg06993.html > > v3: https://lists.gnu.org/archive/html/qemu-devel/2018-02/msg04365.html > > v2: https://lists.gnu.org/archive/html/qemu-devel/2018-02/msg01579.html > > v1: https://lists.gnu.org/archive/html/qemu-devel/2017-12/msg05040.html > > > > Changes in v6: > > * (Patch 1) Expose all ram block flags rather than redefine the flags. > > * (Patch 4) Use pkg-config rather the hard check when configure. > > * (Patch 7) Sync and flush all the pmem data when migration completes, > > rather than sync pages one by one in previous version. 
> > > > Changes in v5: > > * (Patch 9) Add post copy check and output some messages for nvdimm. > > > > Changes in v4: > > * (Patch 2) Fix compilation errors found by patchew. > > > > Changes in v3: > > * (Patch 5) Add a is_pmem flag to ram_handle_compressed() and handle > > PMEM writes in it, so we don't need the _common function. > > * (Patch 6) Expose qemu_get_buffer_common so we can remove the > > unnecessary qemu_get_buffer_to_pmem wrapper. > > * (Patch 8) Add a is_pmem flag to xbzrle_decode_buffer() and handle > >
Re: [Qemu-devel] [PATCH V6 0/7] nvdimm: guarantee persistence of QEMU writes to persistent memory
He have pmem_persist and pmem_memcpy_persist stub functions. If no libpmem and user really specify pmem=on, we just do nothing or just memcpy. Real persistent memory always require libpmem support its load/save. If pmem=on and without libpmem, we can think that user want to imitate pmem=on while the HW environment is without real persistent memory existing. It may help debug on some machine without real pmem. From: Qemu-devel on behalf of Igor Mammedov Sent: Tuesday, June 12, 2018 12:06:43 PM To: junyan...@gmx.com Cc: Haozhong Zhang; xiaoguangrong.e...@gmail.com; crosthwaite.pe...@gmail.com; m...@redhat.com; qemu-devel@nongnu.org; dgilb...@redhat.com; quint...@redhat.com; Junyan He; stefa...@redhat.com; pbonz...@redhat.com; r...@twiddle.net; ehabk...@redhat.com Subject: Re: [Qemu-devel] [PATCH V6 0/7] nvdimm: guarantee persistence of QEMU writes to persistent memory On Fri, 1 Jun 2018 16:10:22 +0800 junyan...@gmx.com wrote: > From: Junyan He > > QEMU writes to vNVDIMM backends in the vNVDIMM label emulation and > live migration. If the backend is on the persistent memory, QEMU needs > to take proper operations to ensure its writes persistent on the > persistent memory. Otherwise, a host power failure may result in the > loss the guest data on the persistent memory. extra question, what are expected behavior when QEMU is built without libpmem and user specifies pmem=on for backend? > > This v3 patch series is based on Marcel's patch "mem: add share > parameter to memory-backend-ram" [1] because of the changes in patch 1. 
> > [1] https://lists.gnu.org/archive/html/qemu-devel/2018-02/msg03858.html > > Previous versions can be found at > v5: https://lists.gnu.org/archive/html/qemu-devel/2018-05/msg02258.html > V4: https://lists.gnu.org/archive/html/qemu-devel/2018-02/msg06993.html > v3: https://lists.gnu.org/archive/html/qemu-devel/2018-02/msg04365.html > v2: https://lists.gnu.org/archive/html/qemu-devel/2018-02/msg01579.html > v1: https://lists.gnu.org/archive/html/qemu-devel/2017-12/msg05040.html > > Changes in v6: > * (Patch 1) Expose all ram block flags rather than redefine the flags. > * (Patch 4) Use pkg-config rather the hard check when configure. > * (Patch 7) Sync and flush all the pmem data when migration completes, > rather than sync pages one by one in previous version. > > Changes in v5: > * (Patch 9) Add post copy check and output some messages for nvdimm. > > Changes in v4: > * (Patch 2) Fix compilation errors found by patchew. > > Changes in v3: > * (Patch 5) Add a is_pmem flag to ram_handle_compressed() and handle > PMEM writes in it, so we don't need the _common function. > * (Patch 6) Expose qemu_get_buffer_common so we can remove the > unnecessary qemu_get_buffer_to_pmem wrapper. > * (Patch 8) Add a is_pmem flag to xbzrle_decode_buffer() and handle > PMEM writes in it, so we can remove the unnecessary > xbzrle_decode_buffer_{common, to_pmem}. > * Move libpmem stubs to stubs/pmem.c and fix the compilation failures > of test-{xbzrle,vmstate}.c. > > Changes in v2: > * (Patch 1) Use a flags parameter in file ram allocation functions. > * (Patch 2) Add a new option 'pmem' to hostmem-file. > * (Patch 3) Use libpmem to operate on the persistent memory, rather > than re-implementing those operations in QEMU. > * (Patch 5-8) Consider the write persistence in the migration path. > > > Junyan: > [1/7] memory, exec: Expose all memory block related flags. > [6/7] migration/ram: Add check and info message to nvdimm post copy. 
> [7/7] migration/ram: ensure write persistence on loading all date to PMEM. > > Haozhong: > [5/7] mem/nvdimm: ensure write persistence to PMEM in label emulation > > Haozhong & Junyan: > [2/7] memory, exec: switch file ram allocation functions to 'flags' parameters > [3/7] hostmem-file: add the 'pmem' option > [4/7] configure: add libpmem support > > > Signed-off-by: Haozhong Zhang > Signed-off-by: Junyan He > > --- > backends/hostmem-file.c | 28 +++- > configure | 29 + > docs/nvdimm.txt | 14 ++ > exec.c | 36 ++-- > hw/mem/nvdimm.c | 9 - > include/exec/memory.h | 31 +-- > include/exec/ram_addr.h | 28 ++-- > include/qemu/pmem.h | 24 > memory.c| 8 +--- > migration/ram.c | 18 ++ > numa.c | 2 +- > qemu-options.hx | 7 +++ > stubs/Makefile.objs | 1 + > stubs/pmem.c| 23 +++ > 14 files changed, 226 insertions(+), 32 deletions(-)
[Qemu-devel] [PATCH 7/7 V7] migration/ram: ensure write persistence on loading all data to PMEM.
From: Junyan He Because we need to make sure the pmem kind memory data is synced after migration, we choose to call pmem_persist() when the migration finish. This will make sure the data of pmem is safe and will not lose if power is off. Signed-off-by: Junyan He Reviewed-by: Stefan Hajnoczi --- include/qemu/pmem.h | 1 + migration/ram.c | 8 stubs/pmem.c| 4 3 files changed, 13 insertions(+) diff --git a/include/qemu/pmem.h b/include/qemu/pmem.h index 00d6680..8f52b08 100644 --- a/include/qemu/pmem.h +++ b/include/qemu/pmem.h @@ -17,6 +17,7 @@ #else /* !CONFIG_LIBPMEM */ void *pmem_memcpy_persist(void *pmemdest, const void *src, size_t len); +void pmem_persist(const void *addr, size_t len); #endif /* CONFIG_LIBPMEM */ diff --git a/migration/ram.c b/migration/ram.c index d9093a7..5603505 100644 --- a/migration/ram.c +++ b/migration/ram.c @@ -33,6 +33,7 @@ #include "qemu/bitops.h" #include "qemu/bitmap.h" #include "qemu/main-loop.h" +#include "qemu/pmem.h" #include "xbzrle.h" #include "ram.h" #include "migration.h" @@ -3065,6 +3066,13 @@ static int ram_load_setup(QEMUFile *f, void *opaque) static int ram_load_cleanup(void *opaque) { RAMBlock *rb; + +RAMBLOCK_FOREACH_MIGRATABLE(rb) { +if (ramblock_is_pmem(rb)) { +pmem_persist(rb->host, rb->used_length); +} +} + xbzrle_load_cleanup(); compress_threads_load_cleanup(); diff --git a/stubs/pmem.c b/stubs/pmem.c index b4ec72d..f794262 100644 --- a/stubs/pmem.c +++ b/stubs/pmem.c @@ -17,3 +17,7 @@ void *pmem_memcpy_persist(void *pmemdest, const void *src, size_t len) { return memcpy(pmemdest, src, len); } + +void pmem_persist(const void *addr, size_t len) +{ +} -- 2.7.4
[Qemu-devel] [PATCH 6/7 V7] migration/ram: Add check and info message to nvdimm post copy.
From: Junyan He The nvdimm kind memory does not support post copy now. We disable post copy if we have nvdimm memory and print some log hint to user. Signed-off-by: Junyan He Reviewed-by: Stefan Hajnoczi --- migration/ram.c | 9 + 1 file changed, 9 insertions(+) diff --git a/migration/ram.c b/migration/ram.c index a500015..d9093a7 100644 --- a/migration/ram.c +++ b/migration/ram.c @@ -3419,6 +3419,15 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id) static bool ram_has_postcopy(void *opaque) { +RAMBlock *rb; +RAMBLOCK_FOREACH(rb) { +if (ramblock_is_pmem(rb)) { +info_report("Block: %s, host: %p is a nvdimm memory, postcopy" + "is not supported now!", rb->idstr, rb->host); +return false; +} +} + return migrate_postcopy_ram(); } -- 2.7.4
[Qemu-devel] [PATCH 5/7 V7] mem/nvdimm: ensure write persistence to PMEM in label emulation
From: Junyan He Guest writes to vNVDIMM labels are intercepted and performed on the backend by QEMU. When the backend is a real persistent memory, QEMU needs to take proper operations to ensure its write persistence on the persistent memory. Otherwise, a host power failure may result in the loss of guest label configurations. Signed-off-by: Haozhong Zhang Reviewed-by: Stefan Hajnoczi --- hw/mem/nvdimm.c | 9 - include/qemu/pmem.h | 23 +++ stubs/Makefile.objs | 1 + stubs/pmem.c| 19 +++ 4 files changed, 51 insertions(+), 1 deletion(-) diff --git a/hw/mem/nvdimm.c b/hw/mem/nvdimm.c index 4087aca..03b478e 100644 --- a/hw/mem/nvdimm.c +++ b/hw/mem/nvdimm.c @@ -23,6 +23,7 @@ */ #include "qemu/osdep.h" +#include "qemu/pmem.h" #include "qapi/error.h" #include "qapi/visitor.h" #include "hw/mem/nvdimm.h" @@ -155,11 +156,17 @@ static void nvdimm_write_label_data(NVDIMMDevice *nvdimm, const void *buf, { MemoryRegion *mr; PCDIMMDevice *dimm = PC_DIMM(nvdimm); +bool is_pmem = object_property_get_bool(OBJECT(dimm->hostmem), +"pmem", NULL); uint64_t backend_offset; nvdimm_validate_rw_label_data(nvdimm, size, offset); -memcpy(nvdimm->label_data + offset, buf, size); +if (!is_pmem) { +memcpy(nvdimm->label_data + offset, buf, size); +} else { +pmem_memcpy_persist(nvdimm->label_data + offset, buf, size); +} mr = host_memory_backend_get_memory(dimm->hostmem, &error_abort); backend_offset = memory_region_size(mr) - nvdimm->label_size + offset; diff --git a/include/qemu/pmem.h b/include/qemu/pmem.h new file mode 100644 index 000..00d6680 --- /dev/null +++ b/include/qemu/pmem.h @@ -0,0 +1,23 @@ +/* + * QEMU header file for libpmem. + * + * Copyright (c) 2018 Intel Corporation. + * + * Author: Haozhong Zhang + * + * This work is licensed under the terms of the GNU GPL, version 2 or later. + * See the COPYING file in the top-level directory. 
+ */ + +#ifndef QEMU_PMEM_H +#define QEMU_PMEM_H + +#ifdef CONFIG_LIBPMEM +#include +#else /* !CONFIG_LIBPMEM */ + +void *pmem_memcpy_persist(void *pmemdest, const void *src, size_t len); + +#endif /* CONFIG_LIBPMEM */ + +#endif /* !QEMU_PMEM_H */ diff --git a/stubs/Makefile.objs b/stubs/Makefile.objs index 53d3f32..be9a042 100644 --- a/stubs/Makefile.objs +++ b/stubs/Makefile.objs @@ -43,3 +43,4 @@ stub-obj-y += xen-common.o stub-obj-y += xen-hvm.o stub-obj-y += pci-host-piix.o stub-obj-y += ram-block.o +stub-obj-$(call lnot,$(CONFIG_LIBPMEM)) += pmem.o \ No newline at end of file diff --git a/stubs/pmem.c b/stubs/pmem.c new file mode 100644 index 000..b4ec72d --- /dev/null +++ b/stubs/pmem.c @@ -0,0 +1,19 @@ +/* + * Stubs for libpmem. + * + * Copyright (c) 2018 Intel Corporation. + * + * Author: Haozhong Zhang + * + * This work is licensed under the terms of the GNU GPL, version 2 or later. + * See the COPYING file in the top-level directory. + */ + +#include + +#include "qemu/pmem.h" + +void *pmem_memcpy_persist(void *pmemdest, const void *src, size_t len) +{ +return memcpy(pmemdest, src, len); +} -- 2.7.4
[Qemu-devel] [PATCH 0/7 V7] nvdimm: guarantee persistence of QEMU writes to persistent memory
From: Junyan He QEMU writes to vNVDIMM backends in the vNVDIMM label emulation and live migration. If the backend is on the persistent memory, QEMU needs to take proper operations to ensure its writes persistent on the persistent memory. Otherwise, a host power failure may result in the loss the guest data on the persistent memory. This v3 patch series is based on Marcel's patch "mem: add share parameter to memory-backend-ram" [1] because of the changes in patch 1. [1] https://lists.gnu.org/archive/html/qemu-devel/2018-02/msg03858.html Previous versions can be found at: v6: https://lists.gnu.org/archive/html/qemu-devel/2018-06/msg00061.html v5: https://lists.gnu.org/archive/html/qemu-devel/2018-05/msg02258.html V4: https://lists.gnu.org/archive/html/qemu-devel/2018-02/msg06993.html v3: https://lists.gnu.org/archive/html/qemu-devel/2018-02/msg04365.html v2: https://lists.gnu.org/archive/html/qemu-devel/2018-02/msg01579.html v1: https://lists.gnu.org/archive/html/qemu-devel/2017-12/msg05040.html Changes in v7: The v6 patch set has already reviewed by Stefan Hajnoczi No logic change in this v7 version, just: * Spelling check and some document words refined. * Rebase to "ram is migratable" patch set. Changes in v6: * (Patch 1) Expose all ram block flags rather than redefine the flags. * (Patch 4) Use pkg-config rather the hard check when configure. * (Patch 7) Sync and flush all the pmem data when migration completes, rather than sync pages one by one in previous version. Changes in v5: * (Patch 9) Add post copy check and output some messages for nvdimm. Changes in v4: * (Patch 2) Fix compilation errors found by patchew. Changes in v3: * (Patch 5) Add a is_pmem flag to ram_handle_compressed() and handle PMEM writes in it, so we don't need the _common function. * (Patch 6) Expose qemu_get_buffer_common so we can remove the unnecessary qemu_get_buffer_to_pmem wrapper. 
* (Patch 8) Add a is_pmem flag to xbzrle_decode_buffer() and handle PMEM writes in it, so we can remove the unnecessary xbzrle_decode_buffer_{common, to_pmem}. * Move libpmem stubs to stubs/pmem.c and fix the compilation failures of test-{xbzrle,vmstate}.c. Changes in v2: * (Patch 1) Use a flags parameter in file ram allocation functions. * (Patch 2) Add a new option 'pmem' to hostmem-file. * (Patch 3) Use libpmem to operate on the persistent memory, rather than re-implementing those operations in QEMU. * (Patch 5-8) Consider the write persistence in the migration path. Junyan: [1/7] memory, exec: Expose all memory block related flags. [6/7] migration/ram: Add check and info message to nvdimm post copy. [7/7] migration/ram: ensure write persistence on loading all data to PMEM. Haozhong: [5/7] mem/nvdimm: ensure write persistence to PMEM in label emulation Haozhong & Junyan: [2/7] memory, exec: switch file ram allocation functions to 'flags' parameters [3/7] hostmem-file: add the 'pmem' option [4/7] configure: add libpmem support Signed-off-by: Haozhong Zhang Signed-off-by: Junyan He Reviewed-by: Stefan Hajnoczi --- backends/hostmem-file.c | 28 +++- configure | 29 + docs/nvdimm.txt | 18 ++ exec.c | 39 ++- hw/mem/nvdimm.c | 9 - include/exec/memory.h | 31 +-- include/exec/ram_addr.h | 28 ++-- include/qemu/pmem.h | 24 memory.c| 8 +--- migration/ram.c | 17 + numa.c | 2 +- qemu-options.hx | 7 +++ stubs/Makefile.objs | 1 + stubs/pmem.c| 23 +++ 14 files changed, 229 insertions(+), 35 deletions(-) -- 2.7.4
[Qemu-devel] [PATCH 4/7 V7] configure: add libpmem support
From: Junyan He Add a pair of configure options --{enable,disable}-libpmem to control whether QEMU is compiled with PMDK libpmem [1]. QEMU may write to the host persistent memory (e.g. in vNVDIMM label emulation and live migration), so it must take the proper operations to ensure the persistence of its own writes. Depending on the CPU models and available instructions, the optimal operation can vary [2]. PMDK libpmem have already implemented those operations on multiple CPU models (x86 and ARM) and the logic to select the optimal ones, so QEMU can just use libpmem rather than re-implement them. [1] PMDK (formerly known as NMVL), https://github.com/pmem/pmdk/ [2] https://github.com/pmem/pmdk/blob/38bfa652721a37fd94c0130ce0e3f5d8baa3ed40/src/libpmem/pmem.c#L33 Signed-off-by: Junyan He Signed-off-by: Haozhong Zhang Reviewed-by: Stefan Hajnoczi --- configure | 29 + 1 file changed, 29 insertions(+) diff --git a/configure b/configure index 14b1113..c49b7b6 100755 --- a/configure +++ b/configure @@ -457,6 +457,7 @@ replication="yes" vxhs="" libxml2="" docker="no" +libpmem="" supported_cpu="no" supported_os="no" @@ -1382,6 +1383,10 @@ for opt do ;; --disable-git-update) git_update=no ;; + --enable-libpmem) libpmem=yes + ;; + --disable-libpmem) libpmem=no + ;; *) echo "ERROR: unknown option $opt" echo "Try '$0 --help' for more information" @@ -1639,6 +1644,7 @@ disabled with --disable-FEATURE, default is enabled if available: crypto-afalgLinux AF_ALG crypto backend driver vhost-user vhost-user support capstonecapstone disassembler support + libpmem libpmem support NOTE: The object files are built at the place where configure is launched EOF @@ -5463,6 +5469,24 @@ if has "docker"; then fi ## +# check for libpmem + +if test "$libpmem" != "no"; then + if $pkg_config --exists "libpmem"; then + libpmem="yes" + libpmem_libs=$($pkg_config --libs libpmem) + libpmem_cflags=$($pkg_config --cflags libpmem) + libs_softmmu="$libs_softmmu $libpmem_libs" + QEMU_CFLAGS="$QEMU_CFLAGS 
$libpmem_cflags" + else + if test "$libpmem" = "yes" ; then + feature_not_found "libpmem" "Install nvml or pmdk" + fi + libpmem="no" + fi +fi + +## # End of CC checks # After here, no more $cc or $ld runs @@ -5926,6 +5950,7 @@ echo "replication support $replication" echo "VxHS block device $vxhs" echo "capstone $capstone" echo "docker$docker" +echo "libpmem support $libpmem" if test "$sdl_too_old" = "yes"; then echo "-> Your SDL version is too old - please upgrade to have SDL support" @@ -6673,6 +6698,10 @@ if test "$vxhs" = "yes" ; then echo "VXHS_LIBS=$vxhs_libs" >> $config_host_mak fi +if test "$libpmem" = "yes" ; then + echo "CONFIG_LIBPMEM=y" >> $config_host_mak +fi + if test "$tcg_interpreter" = "yes"; then QEMU_INCLUDES="-iquote \$(SRC_PATH)/tcg/tci $QEMU_INCLUDES" elif test "$ARCH" = "sparc64" ; then -- 2.7.4
[Qemu-devel] [PATCH 3/7 V7] hostmem-file: add the 'pmem' option
From: Junyan He When QEMU emulates vNVDIMM labels and migrates vNVDIMM devices, it needs to know whether the backend storage is a real persistent memory, in order to decide whether special operations should be performed to ensure the data persistence. This boolean option 'pmem' allows users to specify whether the backend storage of memory-backend-file is a real persistent memory. If 'pmem=on', QEMU will set the flag RAM_PMEM in the RAM block of the corresponding memory region. Signed-off-by: Junyan He Signed-off-by: Haozhong Zhang Reviewed-by: Stefan Hajnoczi --- backends/hostmem-file.c | 27 ++- docs/nvdimm.txt | 18 ++ exec.c | 9 + include/exec/memory.h | 4 include/exec/ram_addr.h | 3 +++ qemu-options.hx | 7 +++ 6 files changed, 67 insertions(+), 1 deletion(-) diff --git a/backends/hostmem-file.c b/backends/hostmem-file.c index 34c68bb..6a861f0 100644 --- a/backends/hostmem-file.c +++ b/backends/hostmem-file.c @@ -34,6 +34,7 @@ struct HostMemoryBackendFile { bool discard_data; char *mem_path; uint64_t align; +bool is_pmem; }; static void @@ -59,7 +60,8 @@ file_backend_memory_alloc(HostMemoryBackend *backend, Error **errp) memory_region_init_ram_from_file(&backend->mr, OBJECT(backend), path, backend->size, fb->align, - backend->share ? RAM_SHARED : 0, + (backend->share ? RAM_SHARED : 0) | + (fb->is_pmem ? 
RAM_PMEM : 0), fb->mem_path, errp); g_free(path); } @@ -131,6 +133,26 @@ static void file_memory_backend_set_align(Object *o, Visitor *v, error_propagate(errp, local_err); } +static bool file_memory_backend_get_pmem(Object *o, Error **errp) +{ +return MEMORY_BACKEND_FILE(o)->is_pmem; +} + +static void file_memory_backend_set_pmem(Object *o, bool value, Error **errp) +{ +HostMemoryBackend *backend = MEMORY_BACKEND(o); +HostMemoryBackendFile *fb = MEMORY_BACKEND_FILE(o); + +if (host_memory_backend_mr_inited(backend)) { +error_setg(errp, "cannot change property 'pmem' of %s '%s'", + object_get_typename(o), + object_get_canonical_path_component(o)); +return; +} + +fb->is_pmem = value; +} + static void file_backend_unparent(Object *obj) { HostMemoryBackend *backend = MEMORY_BACKEND(obj); @@ -162,6 +184,9 @@ file_backend_class_init(ObjectClass *oc, void *data) file_memory_backend_get_align, file_memory_backend_set_align, NULL, NULL, &error_abort); +object_class_property_add_bool(oc, "pmem", +file_memory_backend_get_pmem, file_memory_backend_set_pmem, +&error_abort); } static void file_backend_instance_finalize(Object *o) diff --git a/docs/nvdimm.txt b/docs/nvdimm.txt index 8b48fb4..2f7d348 100644 --- a/docs/nvdimm.txt +++ b/docs/nvdimm.txt @@ -180,3 +180,21 @@ supports CPU Cache Flush and Memory Controller Flush on Power Loss, etc. For a complete list of the flags available and for more detailed descriptions, please consult the ACPI spec. + +guest software that this vNVDIMM device contains a region that cannot +accept persistent writes. In result, for example, the guest Linux +NVDIMM driver, marks such vNVDIMM device as read-only. + +If the vNVDIMM backend is on the host persistent memory that can be +accessed in SNIA NVM Programming Model [1] (e.g., Intel NVDIMM), it's +suggested to set the 'pmem' option of memory-backend-file to 'on'. 
When +'pmem=on' and QEMU is built with libpmem [2] support (configured with +--enable-libpmem), QEMU will take necessary operations to guarantee +the persistence of its own writes to the vNVDIMM backend (e.g., in +vNVDIMM label emulation and live migration). + +References +-- + +[1] SNIA NVM Programming Model: https://www.snia.org/sites/default/files/technical_work/final/NVMProgrammingModel_v1.2.pdf +[2] PMDK: http://pmem.io/pmdk/ diff --git a/exec.c b/exec.c index 8e079df..c42483e 100644 --- a/exec.c +++ b/exec.c @@ -2077,6 +2077,9 @@ RAMBlock *qemu_ram_alloc_from_fd(ram_addr_t size, MemoryRegion *mr, Error *local_err = NULL; int64_t file_size; +/* Just support these ram flags by now. */ +assert(ram_flags == 0 || (ram_flags & (RAM_SHARED | RAM_PMEM))); + if (xen_enabled()) { error_setg(errp, "-mem-path not supported with Xen"); return NULL; @@ -4007,6 +4010,11 @@ err: return ret; } +bool ramblock_is_pmem(RAMBlock *rb) +{ +return rb->flags & RAM_PMEM; +} + #endif void page_size_init(void) @@ -4105,3 +4113,4 @@ void mtree_print_dispatch(fprintf_function mon, void
[Qemu-devel] [PATCH 2/7 V7] memory, exec: switch file ram allocation functions to 'flags' parameters
From: Junyan He As more flag parameters besides the existing 'share' are going to be added to following functions memory_region_init_ram_from_file qemu_ram_alloc_from_fd qemu_ram_alloc_from_file let's switch them to use the 'flags' parameters so as to ease future flag additions. The existing 'share' flag is converted to the RAM_SHARED bit in ram_flags, and other flag bits are ignored by above functions right now. Signed-off-by: Junyan He Signed-off-by: Haozhong Zhang Reviewed-by: Stefan Hajnoczi --- backends/hostmem-file.c | 3 ++- exec.c | 10 +- include/exec/memory.h | 7 +-- include/exec/ram_addr.h | 25 +++-- memory.c| 8 +--- numa.c | 2 +- 6 files changed, 41 insertions(+), 14 deletions(-) diff --git a/backends/hostmem-file.c b/backends/hostmem-file.c index 134b08d..34c68bb 100644 --- a/backends/hostmem-file.c +++ b/backends/hostmem-file.c @@ -58,7 +58,8 @@ file_backend_memory_alloc(HostMemoryBackend *backend, Error **errp) path = object_get_canonical_path(OBJECT(backend)); memory_region_init_ram_from_file(&backend->mr, OBJECT(backend), path, - backend->size, fb->align, backend->share, + backend->size, fb->align, + backend->share ? RAM_SHARED : 0, fb->mem_path, errp); g_free(path); } diff --git a/exec.c b/exec.c index 9246722..8e079df 100644 --- a/exec.c +++ b/exec.c @@ -2070,7 +2070,7 @@ static void ram_block_add(RAMBlock *new_block, Error **errp, bool shared) #ifdef __linux__ RAMBlock *qemu_ram_alloc_from_fd(ram_addr_t size, MemoryRegion *mr, - bool share, int fd, + uint64_t ram_flags, int fd, Error **errp) { RAMBlock *new_block; @@ -2112,14 +2112,14 @@ RAMBlock *qemu_ram_alloc_from_fd(ram_addr_t size, MemoryRegion *mr, new_block->mr = mr; new_block->used_length = size; new_block->max_length = size; -new_block->flags = share ? 
RAM_SHARED : 0; +new_block->flags = ram_flags; new_block->host = file_ram_alloc(new_block, size, fd, !file_size, errp); if (!new_block->host) { g_free(new_block); return NULL; } -ram_block_add(new_block, &local_err, share); +ram_block_add(new_block, &local_err, ram_flags & RAM_SHARED); if (local_err) { g_free(new_block); error_propagate(errp, local_err); @@ -2131,7 +2131,7 @@ RAMBlock *qemu_ram_alloc_from_fd(ram_addr_t size, MemoryRegion *mr, RAMBlock *qemu_ram_alloc_from_file(ram_addr_t size, MemoryRegion *mr, - bool share, const char *mem_path, + uint64_t ram_flags, const char *mem_path, Error **errp) { int fd; @@ -2143,7 +2143,7 @@ RAMBlock *qemu_ram_alloc_from_file(ram_addr_t size, MemoryRegion *mr, return NULL; } -block = qemu_ram_alloc_from_fd(size, mr, share, fd, errp); +block = qemu_ram_alloc_from_fd(size, mr, ram_flags, fd, errp); if (!block) { if (created) { unlink(mem_path); diff --git a/include/exec/memory.h b/include/exec/memory.h index 1bb9172..3769c06 100644 --- a/include/exec/memory.h +++ b/include/exec/memory.h @@ -599,6 +599,7 @@ void memory_region_init_resizeable_ram(MemoryRegion *mr, void *host), Error **errp); #ifdef __linux__ + /** * memory_region_init_ram_from_file: Initialize RAM memory region with a *mmap-ed backend. @@ -610,7 +611,9 @@ void memory_region_init_resizeable_ram(MemoryRegion *mr, * @size: size of the region. * @align: alignment of the region base address; if 0, the default alignment * (getpagesize()) will be used. - * @share: %true if memory must be mmaped with the MAP_SHARED flag + * @ram_flags: Memory region features: + * - RAM_SHARED: memory must be mmaped with the MAP_SHARED flag + * Other bits are ignored now. * @path: the path in which to allocate the RAM. * @errp: pointer to Error*, to store an error if it happens. 
* @@ -622,7 +625,7 @@ void memory_region_init_ram_from_file(MemoryRegion *mr, const char *name, uint64_t size, uint64_t align, - bool share, + uint64_t ram_flags, const char *path, Error **errp); diff --git a/include/exec/
[Qemu-devel] [PATCH 1/7 V7] memory, exec: Expose all memory block related flags.
From: Junyan He We need to use these flags in other files rather than just in exec.c. For example, RAM_SHARED should be used when creating a RAM block from a file. We expose them in exec/memory.h. Signed-off-by: Junyan He Reviewed-by: Stefan Hajnoczi --- exec.c | 20 include/exec/memory.h | 20 2 files changed, 20 insertions(+), 20 deletions(-) diff --git a/exec.c b/exec.c index f6645ed..9246722 100644 --- a/exec.c +++ b/exec.c @@ -87,26 +87,6 @@ AddressSpace address_space_memory; MemoryRegion io_mem_rom, io_mem_notdirty; static MemoryRegion io_mem_unassigned; - -/* RAM is pre-allocated and passed into qemu_ram_alloc_from_ptr */ -#define RAM_PREALLOC (1 << 0) - -/* RAM is mmap-ed with MAP_SHARED */ -#define RAM_SHARED (1 << 1) - -/* Only a portion of RAM (used_length) is actually used, and migrated. - * This used_length size can change across reboots. - */ -#define RAM_RESIZEABLE (1 << 2) - -/* UFFDIO_ZEROPAGE is available on this RAMBlock to atomically - * zero the page and wake waiting processes. - * (Set during postcopy) - */ -#define RAM_UF_ZEROPAGE (1 << 3) - -/* RAM can be migrated */ -#define RAM_MIGRATABLE (1 << 4) #endif #ifdef TARGET_PAGE_BITS_VARY diff --git a/include/exec/memory.h b/include/exec/memory.h index eb2ba06..1bb9172 100644 --- a/include/exec/memory.h +++ b/include/exec/memory.h @@ -102,6 +102,26 @@ struct IOMMUNotifier { }; typedef struct IOMMUNotifier IOMMUNotifier; +/* RAM is pre-allocated and passed into qemu_ram_alloc_from_ptr */ +#define RAM_PREALLOC (1 << 0) + +/* RAM is mmap-ed with MAP_SHARED */ +#define RAM_SHARED (1 << 1) + +/* Only a portion of RAM (used_length) is actually used, and migrated. + * This used_length size can change across reboots. + */ +#define RAM_RESIZEABLE (1 << 2) + +/* UFFDIO_ZEROPAGE is available on this RAMBlock to atomically + * zero the page and wake waiting processes. 
+ * (Set during postcopy) + */ +#define RAM_UF_ZEROPAGE (1 << 3) + +/* RAM can be migrated */ +#define RAM_MIGRATABLE (1 << 4) + static inline void iommu_notifier_init(IOMMUNotifier *n, IOMMUNotify fn, IOMMUNotifierFlag flags, hwaddr start, hwaddr end) -- 2.7.4
Re: [Qemu-devel] [PATCH V6 RESEND 7/7] migration/ram: ensure write persistence on loading all data to PMEM.
Sorry missed dgilbert's comment about RAMBLOCK_FOREACH_MIGRATABLE in previous revision. From: Qemu-devel on behalf of junyan...@gmx.com Sent: Saturday, June 9, 2018 1:12:31 AM To: qemu-devel@nongnu.org Cc: xiaoguangrong.e...@gmail.com; crosthwaite.pe...@gmail.com; m...@redhat.com; dgilb...@redhat.com; ehabk...@redhat.com; quint...@redhat.com; Junyan He; stefa...@redhat.com; pbonz...@redhat.com; imamm...@redhat.com; r...@twiddle.net Subject: [Qemu-devel] [PATCH V6 RESEND 7/7] migration/ram: ensure write persistence on loading all data to PMEM. From: Junyan He Because we need to make sure the pmem kind memory data is synced after migration, we choose to call pmem_persist() when the migration finish. This will make sure the data of pmem is safe and will not lose if power is off. Signed-off-by: Junyan He --- include/qemu/pmem.h | 1 + migration/ram.c | 8 stubs/pmem.c| 4 3 files changed, 13 insertions(+) diff --git a/include/qemu/pmem.h b/include/qemu/pmem.h index 00d6680..8f52b08 100644 --- a/include/qemu/pmem.h +++ b/include/qemu/pmem.h @@ -17,6 +17,7 @@ #else /* !CONFIG_LIBPMEM */ void *pmem_memcpy_persist(void *pmemdest, const void *src, size_t len); +void pmem_persist(const void *addr, size_t len); #endif /* CONFIG_LIBPMEM */ diff --git a/migration/ram.c b/migration/ram.c index aa0c6f0..15418c2 100644 --- a/migration/ram.c +++ b/migration/ram.c @@ -33,6 +33,7 @@ #include "qemu/bitops.h" #include "qemu/bitmap.h" #include "qemu/main-loop.h" +#include "qemu/pmem.h" #include "xbzrle.h" #include "ram.h" #include "migration.h" @@ -3046,6 +3047,13 @@ static int ram_load_setup(QEMUFile *f, void *opaque) static int ram_load_cleanup(void *opaque) { RAMBlock *rb; + +RAMBLOCK_FOREACH_MIGRATABLE(rb) { +if (ramblock_is_pmem(rb)) { +pmem_persist(rb->host, rb->used_length); +} +} + xbzrle_load_cleanup(); compress_threads_load_cleanup(); diff --git a/stubs/pmem.c b/stubs/pmem.c index b4ec72d..f794262 100644 --- a/stubs/pmem.c +++ b/stubs/pmem.c @@ -17,3 +17,7 @@ void 
*pmem_memcpy_persist(void *pmemdest, const void *src, size_t len) { return memcpy(pmemdest, src, len); } + +void pmem_persist(const void *addr, size_t len) +{ +} -- 2.7.4
[Qemu-devel] [PATCH V6 RESEND 7/7] migration/ram: ensure write persistence on loading all data to PMEM.
From: Junyan He Because we need to make sure the pmem-backed memory data is synced after migration, we call pmem_persist() when the migration finishes. This makes sure the pmem data is safe and will not be lost if power is off. Signed-off-by: Junyan He --- include/qemu/pmem.h | 1 + migration/ram.c | 8 stubs/pmem.c | 4 3 files changed, 13 insertions(+) diff --git a/include/qemu/pmem.h b/include/qemu/pmem.h index 00d6680..8f52b08 100644 --- a/include/qemu/pmem.h +++ b/include/qemu/pmem.h @@ -17,6 +17,7 @@ #else /* !CONFIG_LIBPMEM */ void *pmem_memcpy_persist(void *pmemdest, const void *src, size_t len); +void pmem_persist(const void *addr, size_t len); #endif /* CONFIG_LIBPMEM */ diff --git a/migration/ram.c b/migration/ram.c index aa0c6f0..15418c2 100644 --- a/migration/ram.c +++ b/migration/ram.c @@ -33,6 +33,7 @@ #include "qemu/bitops.h" #include "qemu/bitmap.h" #include "qemu/main-loop.h" +#include "qemu/pmem.h" #include "xbzrle.h" #include "ram.h" #include "migration.h" @@ -3046,6 +3047,13 @@ static int ram_load_setup(QEMUFile *f, void *opaque) static int ram_load_cleanup(void *opaque) { RAMBlock *rb; + +RAMBLOCK_FOREACH_MIGRATABLE(rb) { +if (ramblock_is_pmem(rb)) { +pmem_persist(rb->host, rb->used_length); +} +} + xbzrle_load_cleanup(); compress_threads_load_cleanup(); diff --git a/stubs/pmem.c b/stubs/pmem.c index b4ec72d..f794262 100644 --- a/stubs/pmem.c +++ b/stubs/pmem.c @@ -17,3 +17,7 @@ void *pmem_memcpy_persist(void *pmemdest, const void *src, size_t len) { return memcpy(pmemdest, src, len); } + +void pmem_persist(const void *addr, size_t len) +{ +} -- 2.7.4
Re: [Qemu-devel] [Qemu-block] Some question about savem/qcow2 incremental snapshot
I think nvdimm kind memory can really save the content (no matter real or emulated). But I think it is still memory; as I understand it, its data should be stored in the qcow2 image or some external snapshot data image, so that we can copy this qcow2 image to another place and restore the same environment. The qcow2 image contains all VM state, disk data and memory data, so I think nvdimm's data should also be stored in this qcow2 image. I am really a new guy in the VMM field and do not know the usage of qcow2 with DAX. So far as I know, DAX is a kernel FS option to let the page mapping bypass all block device logic and improve performance. But qcow2 is a user-space file format used by qemu to emulate disks (am I right?), so I have no idea about that. Thanks Junyan From: Qemu-devel on behalf of Pankaj Gupta Sent: Friday, June 8, 2018 7:59:24 AM To: Junyan He Cc: Kevin Wolf; qemu block; qemu-devel@nongnu.org; Stefan Hajnoczi; Max Reitz Subject: Re: [Qemu-devel] [Qemu-block] Some question about savem/qcow2 incremental snapshot Hi Junyan, AFAICU you are trying to utilize qcow2 capabilities to do incremental snapshot. As I understand it, an NVDIMM device (be it real or emulated) always has its contents backed up in the backing device. Now, the question comes to take a snapshot at some point in time. You are trying to achieve this with the qcow2 format (not checked code yet), and I have the queries below: - Are you implementing this feature for both actual DAX device pass-through as well as emulated DAX? - Are you using an additional qcow2 disk for storing/taking snapshots? How are we planning to use this feature? The reason I asked this question is that if we concentrate on integrating qcow2 with DAX, we will have a full-fledged solution for most of the use-cases.
Thanks, Pankaj > > Dear all: > > I just switched from graphic/media field to virtualization at the end of the > last year, > so I am sorry that though I have already try my best but I still feel a > little dizzy > about your previous discussion about NVDimm via block layer:) > In today's qemu, we use the SaveVMHandlers functions to handle both snapshot > and migration. > So for nvdimm kind memory, its migration and snapshot use the same way as the > ram(savevm_ram_handlers). But the difference is the size of nvdimm may be > huge, and the load > and store speed is slower. According to my usage, when I use 256G nvdimm as > memory backend, > it may take more than 5 minutes to complete one snapshot saving, and after > saving the qcow2 > image is bigger than 50G. For migration, this may not be a problem because we > do not need > extra disk space and the guest is not paused when in migration process. But > for snapshot, > we need to pause the VM and the user experience is bad, and we got concerns > about that. > I posted this question in Jan this year but failed to get enough reply. Then > I sent a RFC patch > set in Mar, basic idea is using the dependency snapshot and dirty log trace > in kernel to > optimize this. > > https://lists.gnu.org/archive/html/qemu-devel/2018-03/msg04530.html > > I use the simple way to handle this, > 1. Separate the nvdimm region from ram when do snapshot. > 2. If the first time, we dump all the nvdimm data the same as ram, and enable > dirty log trace > for nvdimm kind region. > 3. If not the first time, we find the previous snapshot point and add > reference to its clusters > which is used to store nvdimm data. And this time, we just save dirty page > bitmap and dirty pages. > Because the previous nvdimm data clusters is ref added, we do not need to > worry about its deleting. > > I encounter a lot of problems: > 1. Migration and snapshot logic is mixed and need to separate them for > nvdimm. > 2. Cluster has its alignment. 
When do snapshot, we just save data to disk > continuous. Because we > need to add ref to cluster, we really need to consider the alignment. I just > use a little trick way > to padding some data to alignment now, and I think it is not a good way. > 3. Dirty log trace may have some performance problem. > > In theory, this manner can be used to handle all kind of huge memory > snapshot, we need to find the > balance between guest performance(Because of dirty log trace) and snapshot > saving time. > > Thanks > Junyan > > > -Original Message- > From: Stefan Hajnoczi [mailto:stefa...@redhat.com] > Sent: Thursday, May 31, 2018 6:49 PM > To: Kevin Wolf > Cc: Max Reitz ; He, Junyan ; Pankaj > Gupta ; qemu-devel@nongnu.org; qemu block > > Subject: Re: [Qemu-block] [Qemu-devel] Some question about savem/qcow2 > incremental snapshot > > On Wed, May 30, 2018 at 06:07:19PM +0200, Kevin Wolf wrote: > > Am 30.05.2018 um 16:44 hat Stefan Hajnoczi g
[Qemu-devel] [PATCH V6 RESEND 7/7] migration/ram: ensure write persistence on loading all data to PMEM.
From: Junyan He Because we need to make sure the pmem-backed memory data is synced after migration, we call pmem_persist() when the migration finishes. This makes sure the pmem data is safe and will not be lost if power is off. Signed-off-by: Junyan He --- include/qemu/pmem.h | 1 + migration/ram.c | 8 stubs/pmem.c | 4 3 files changed, 13 insertions(+) diff --git a/include/qemu/pmem.h b/include/qemu/pmem.h index 00d6680..8f52b08 100644 --- a/include/qemu/pmem.h +++ b/include/qemu/pmem.h @@ -17,6 +17,7 @@ #else /* !CONFIG_LIBPMEM */ void *pmem_memcpy_persist(void *pmemdest, const void *src, size_t len); +void pmem_persist(const void *addr, size_t len); #endif /* CONFIG_LIBPMEM */ diff --git a/migration/ram.c b/migration/ram.c index aa0c6f0..15418c2 100644 --- a/migration/ram.c +++ b/migration/ram.c @@ -33,6 +33,7 @@ #include "qemu/bitops.h" #include "qemu/bitmap.h" #include "qemu/main-loop.h" +#include "qemu/pmem.h" #include "xbzrle.h" #include "ram.h" #include "migration.h" @@ -3046,6 +3047,13 @@ static int ram_load_setup(QEMUFile *f, void *opaque) static int ram_load_cleanup(void *opaque) { RAMBlock *rb; + +RAMBLOCK_FOREACH(rb) { +if (ramblock_is_pmem(rb)) { +pmem_persist(rb->host, rb->used_length); +} +} + xbzrle_load_cleanup(); compress_threads_load_cleanup(); diff --git a/stubs/pmem.c b/stubs/pmem.c index b4ec72d..f794262 100644 --- a/stubs/pmem.c +++ b/stubs/pmem.c @@ -17,3 +17,7 @@ void *pmem_memcpy_persist(void *pmemdest, const void *src, size_t len) { return memcpy(pmemdest, src, len); } + +void pmem_persist(const void *addr, size_t len) +{ +} -- 2.7.4
[Qemu-devel] [PATCH V6 7/7] migration/ram: ensure write persistence on loading all data to PMEM.
From: Junyan He Because we need to make sure the pmem-backed memory data is synced after migration, we call pmem_persist() when the migration finishes. This makes sure the pmem data is safe and will not be lost if power is off. Signed-off-by: Junyan He --- include/qemu/pmem.h | 1 + migration/ram.c | 8 stubs/pmem.c | 4 3 files changed, 13 insertions(+) diff --git a/include/qemu/pmem.h b/include/qemu/pmem.h index 00d6680..b1e1b5c 100644 --- a/include/qemu/pmem.h +++ b/include/qemu/pmem.h @@ -17,6 +17,7 @@ #else /* !CONFIG_LIBPMEM */ void *pmem_memcpy_persist(void *pmemdest, const void *src, size_t len); +void *pmem_persist(const void *addr, size_t len); #endif /* CONFIG_LIBPMEM */ diff --git a/migration/ram.c b/migration/ram.c index aa0c6f0..09525b2 100644 --- a/migration/ram.c +++ b/migration/ram.c @@ -33,6 +33,7 @@ #include "qemu/bitops.h" #include "qemu/bitmap.h" #include "qemu/main-loop.h" +#include "qemu/pmem.h" #include "xbzrle.h" #include "ram.h" #include "migration.h" @@ -3046,6 +3047,13 @@ static int ram_load_setup(QEMUFile *f, void *opaque) static int ram_load_cleanup(void *opaque) { RAMBlock *rb; + +RAMBLOCK_FOREACH(rb) { +if (ramblock_is_pmem(rb)) { +pmem_persist(rb->host, rb->used_length); + } +} + xbzrle_load_cleanup(); compress_threads_load_cleanup(); diff --git a/stubs/pmem.c b/stubs/pmem.c index b4ec72d..c5bc6d6 100644 --- a/stubs/pmem.c +++ b/stubs/pmem.c @@ -17,3 +17,7 @@ void *pmem_memcpy_persist(void *pmemdest, const void *src, size_t len) { return memcpy(pmemdest, src, len); } + +void *pmem_persist(const void *addr, size_t len) +{ +} -- 2.7.4
[Qemu-devel] [PATCH V6 5/7] mem/nvdimm: ensure write persistence to PMEM in label emulation
From: Junyan He Guest writes to vNVDIMM labels are intercepted and performed on the backend by QEMU. When the backend is real persistent memory, QEMU needs to perform proper operations to ensure its write persistence on the persistent memory. Otherwise, a host power failure may result in the loss of guest label configurations. Signed-off-by: Haozhong Zhang --- hw/mem/nvdimm.c | 9 - include/qemu/pmem.h | 23 +++ stubs/Makefile.objs | 1 + stubs/pmem.c | 19 +++ 4 files changed, 51 insertions(+), 1 deletion(-) create mode 100644 include/qemu/pmem.h create mode 100644 stubs/pmem.c diff --git a/hw/mem/nvdimm.c b/hw/mem/nvdimm.c index 4087aca..03b478e 100644 --- a/hw/mem/nvdimm.c +++ b/hw/mem/nvdimm.c @@ -23,6 +23,7 @@ */ #include "qemu/osdep.h" +#include "qemu/pmem.h" #include "qapi/error.h" #include "qapi/visitor.h" #include "hw/mem/nvdimm.h" @@ -155,11 +156,17 @@ static void nvdimm_write_label_data(NVDIMMDevice *nvdimm, const void *buf, { MemoryRegion *mr; PCDIMMDevice *dimm = PC_DIMM(nvdimm); +bool is_pmem = object_property_get_bool(OBJECT(dimm->hostmem), +"pmem", NULL); uint64_t backend_offset; nvdimm_validate_rw_label_data(nvdimm, size, offset); -memcpy(nvdimm->label_data + offset, buf, size); +if (!is_pmem) { +memcpy(nvdimm->label_data + offset, buf, size); +} else { +pmem_memcpy_persist(nvdimm->label_data + offset, buf, size); +} mr = host_memory_backend_get_memory(dimm->hostmem, &error_abort); backend_offset = memory_region_size(mr) - nvdimm->label_size + offset; diff --git a/include/qemu/pmem.h b/include/qemu/pmem.h new file mode 100644 index 000..00d6680 --- /dev/null +++ b/include/qemu/pmem.h @@ -0,0 +1,23 @@ +/* + * QEMU header file for libpmem. + * + * Copyright (c) 2018 Intel Corporation. + * + * Author: Haozhong Zhang + * + * This work is licensed under the terms of the GNU GPL, version 2 or later. + * See the COPYING file in the top-level directory. 
+ */ + +#ifndef QEMU_PMEM_H +#define QEMU_PMEM_H + +#ifdef CONFIG_LIBPMEM +#include +#else /* !CONFIG_LIBPMEM */ + +void *pmem_memcpy_persist(void *pmemdest, const void *src, size_t len); + +#endif /* CONFIG_LIBPMEM */ + +#endif /* !QEMU_PMEM_H */ diff --git a/stubs/Makefile.objs b/stubs/Makefile.objs index 53d3f32..be9a042 100644 --- a/stubs/Makefile.objs +++ b/stubs/Makefile.objs @@ -43,3 +43,4 @@ stub-obj-y += xen-common.o stub-obj-y += xen-hvm.o stub-obj-y += pci-host-piix.o stub-obj-y += ram-block.o +stub-obj-$(call lnot,$(CONFIG_LIBPMEM)) += pmem.o \ No newline at end of file diff --git a/stubs/pmem.c b/stubs/pmem.c new file mode 100644 index 000..b4ec72d --- /dev/null +++ b/stubs/pmem.c @@ -0,0 +1,19 @@ +/* + * Stubs for libpmem. + * + * Copyright (c) 2018 Intel Corporation. + * + * Author: Haozhong Zhang + * + * This work is licensed under the terms of the GNU GPL, version 2 or later. + * See the COPYING file in the top-level directory. + */ + +#include + +#include "qemu/pmem.h" + +void *pmem_memcpy_persist(void *pmemdest, const void *src, size_t len) +{ +return memcpy(pmemdest, src, len); +} -- 2.7.4
[Qemu-devel] [PATCH V6 6/7] migration/ram: Add check and info message to nvdimm post copy.
From: Junyan He NVDIMM memory does not support postcopy for now. We disable postcopy if we have NVDIMM memory and print a log message to hint the user. Signed-off-by: Junyan He --- migration/ram.c | 9 + 1 file changed, 9 insertions(+) diff --git a/migration/ram.c b/migration/ram.c index c53e836..aa0c6f0 100644 --- a/migration/ram.c +++ b/migration/ram.c @@ -3397,6 +3397,15 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id) static bool ram_has_postcopy(void *opaque) { +RAMBlock *rb; +RAMBLOCK_FOREACH(rb) { +if (ramblock_is_pmem(rb)) { +info_report("Block: %s, host: %p is a nvdimm memory, postcopy" + "is not supported now!", rb->idstr, rb->host); +return false; +} +} + return migrate_postcopy_ram(); } -- 2.7.4
[Qemu-devel] [PATCH V6 3/7] hostmem-file: add the 'pmem' option
From: Junyan He When QEMU emulates vNVDIMM labels and migrates vNVDIMM devices, it needs to know whether the backend storage is a real persistent memory, in order to decide whether special operations should be performed to ensure the data persistence. This boolean option 'pmem' allows users to specify whether the backend storage of memory-backend-file is a real persistent memory. If 'pmem=on', QEMU will set the flag RAM_PMEM in the RAM block of the corresponding memory region. Signed-off-by: Junyan He Signed-off-by: Haozhong Zhang --- backends/hostmem-file.c | 27 ++- docs/nvdimm.txt | 14 ++ exec.c | 9 + include/exec/memory.h | 6 ++ include/exec/ram_addr.h | 3 +++ qemu-options.hx | 7 +++ 6 files changed, 65 insertions(+), 1 deletion(-) diff --git a/backends/hostmem-file.c b/backends/hostmem-file.c index 34c68bb..ccca7a1 100644 --- a/backends/hostmem-file.c +++ b/backends/hostmem-file.c @@ -34,6 +34,7 @@ struct HostMemoryBackendFile { bool discard_data; char *mem_path; uint64_t align; +bool is_pmem; }; static void @@ -59,7 +60,8 @@ file_backend_memory_alloc(HostMemoryBackend *backend, Error **errp) memory_region_init_ram_from_file(&backend->mr, OBJECT(backend), path, backend->size, fb->align, - backend->share ? RAM_SHARED : 0, + (backend->share ? RAM_SHARED : 0) | + (fb->is_pmem ? 
RAM_PMEM : 0), fb->mem_path, errp); g_free(path); } @@ -131,6 +133,26 @@ static void file_memory_backend_set_align(Object *o, Visitor *v, error_propagate(errp, local_err); } +static bool file_memory_backend_get_pmem(Object *o, Error **errp) +{ +return MEMORY_BACKEND_FILE(o)->is_pmem; +} + +static void file_memory_backend_set_pmem(Object *o, bool value, Error **errp) +{ +HostMemoryBackend *backend = MEMORY_BACKEND(o); +HostMemoryBackendFile *fb = MEMORY_BACKEND_FILE(o); + +if (host_memory_backend_mr_inited(backend)) { +error_setg(errp, "cannot change property 'pmem' of %s '%s'", + object_get_typename(o), + object_get_canonical_path_component(OBJECT(backend))); +return; +} + +fb->is_pmem = value; +} + static void file_backend_unparent(Object *obj) { HostMemoryBackend *backend = MEMORY_BACKEND(obj); @@ -162,6 +184,9 @@ file_backend_class_init(ObjectClass *oc, void *data) file_memory_backend_get_align, file_memory_backend_set_align, NULL, NULL, &error_abort); +object_class_property_add_bool(oc, "pmem", +file_memory_backend_get_pmem, file_memory_backend_set_pmem, +&error_abort); } static void file_backend_instance_finalize(Object *o) diff --git a/docs/nvdimm.txt b/docs/nvdimm.txt index e903d8b..bcb2032 100644 --- a/docs/nvdimm.txt +++ b/docs/nvdimm.txt @@ -153,3 +153,17 @@ guest NVDIMM region mapping structure. This unarmed flag indicates guest software that this vNVDIMM device contains a region that cannot accept persistent writes. In result, for example, the guest Linux NVDIMM driver, marks such vNVDIMM device as read-only. + +If the vNVDIMM backend is on the host persistent memory that can be +accessed in SNIA NVM Programming Model [1] (e.g., Intel NVDIMM), it's +suggested to set the 'pmem' option of memory-backend-file to 'on'. 
When +'pmem=on' and QEMU is built with libpmem [2] support (configured with +--enable-libpmem), QEMU will take necessary operations to guarantee +the persistence of its own writes to the vNVDIMM backend (e.g., in +vNVDIMM label emulation and live migration). + +References +-- + +[1] SNIA NVM Programming Model: https://www.snia.org/sites/default/files/technical_work/final/NVMProgrammingModel_v1.2.pdf +[2] PMDK: http://pmem.io/pmdk/ diff --git a/exec.c b/exec.c index f2082fa..f066705 100644 --- a/exec.c +++ b/exec.c @@ -2061,6 +2061,9 @@ RAMBlock *qemu_ram_alloc_from_fd(ram_addr_t size, MemoryRegion *mr, Error *local_err = NULL; int64_t file_size; +/* Just support these ram flags by now. */ +assert(ram_flags == 0 || (ram_flags & (RAM_SHARED | RAM_PMEM))); + if (xen_enabled()) { error_setg(errp, "-mem-path not supported with Xen"); return NULL; @@ -3971,6 +3974,11 @@ err: return ret; } +bool ramblock_is_pmem(RAMBlock *rb) +{ +return rb->flags & RAM_PMEM; +} + #endif void page_size_init(void) @@ -4069,3 +4077,4 @@ void mtree_print_dispatch(fprintf_function mon, void *f, } #endif + diff --git a/include/exec/memory.h b/include/exec/memory.h index 3b68a43..6523512 100644 --- a/include/exec/memory.h
[Qemu-devel] [PATCH V6 4/7] configure: add libpmem support
From: Junyan He Add a pair of configure options --{enable,disable}-libpmem to control whether QEMU is compiled with PMDK libpmem [1]. QEMU may write to the host persistent memory (e.g. in vNVDIMM label emulation and live migration), so it must take the proper operations to ensure the persistence of its own writes. Depending on the CPU models and available instructions, the optimal operation can vary [2]. PMDK libpmem have already implemented those operations on multiple CPU models (x86 and ARM) and the logic to select the optimal ones, so QEMU can just use libpmem rather than re-implement them. [1] PMDK (formerly known as NMVL), https://github.com/pmem/pmdk/ [2] https://github.com/pmem/pmdk/blob/38bfa652721a37fd94c0130ce0e3f5d8baa3ed40/src/libpmem/pmem.c#L33 Signed-off-by: Junyan He Signed-off-by: Haozhong Zhang --- configure | 29 + 1 file changed, 29 insertions(+) diff --git a/configure b/configure index a6a4616..f44d669 100755 --- a/configure +++ b/configure @@ -456,6 +456,7 @@ jemalloc="no" replication="yes" vxhs="" libxml2="" +libpmem="" supported_cpu="no" supported_os="no" @@ -1381,6 +1382,10 @@ for opt do ;; --disable-git-update) git_update=no ;; + --enable-libpmem) libpmem=yes + ;; + --disable-libpmem) libpmem=no + ;; *) echo "ERROR: unknown option $opt" echo "Try '$0 --help' for more information" @@ -1638,6 +1643,7 @@ disabled with --disable-FEATURE, default is enabled if available: crypto-afalgLinux AF_ALG crypto backend driver vhost-user vhost-user support capstonecapstone disassembler support + libpmem libpmem support NOTE: The object files are built at the place where configure is launched EOF @@ -5445,6 +5451,24 @@ EOF fi ## +# check for libpmem + +if test "$libpmem" != "no"; then + if $pkg_config --exists "libpmem"; then + libpmem="yes" + libpmem_libs=$($pkg_config --libs libpmem) + libpmem_cflags=$($pkg_config --cflags libpmem) + libs_softmmu="$libs_softmmu $libpmem_libs" + QEMU_CFLAGS="$QEMU_CFLAGS $libpmem_cflags" + else + if test "$libpmem" = 
"yes" ; then + feature_not_found "libpmem" "Install nvml or pmdk" + fi + libpmem="no" + fi +fi + +## # End of CC checks # After here, no more $cc or $ld runs @@ -5907,6 +5931,7 @@ echo "avx2 optimization $avx2_opt" echo "replication support $replication" echo "VxHS block device $vxhs" echo "capstone $capstone" +echo "libpmem support $libpmem" if test "$sdl_too_old" = "yes"; then echo "-> Your SDL version is too old - please upgrade to have SDL support" @@ -6651,6 +6676,10 @@ if test "$vxhs" = "yes" ; then echo "VXHS_LIBS=$vxhs_libs" >> $config_host_mak fi +if test "$libpmem" = "yes" ; then + echo "CONFIG_LIBPMEM=y" >> $config_host_mak +fi + if test "$tcg_interpreter" = "yes"; then QEMU_INCLUDES="-iquote \$(SRC_PATH)/tcg/tci $QEMU_INCLUDES" elif test "$ARCH" = "sparc64" ; then -- 2.7.4
[Qemu-devel] [PATCH V6 2/7] memory, exec: switch file ram allocation functions to 'flags' parameters
From: Junyan He As more flag parameters besides the existing 'share' are going to be added to following functions memory_region_init_ram_from_file qemu_ram_alloc_from_fd qemu_ram_alloc_from_file let's switch them to use the 'flags' parameters so as to ease future flag additions. The existing 'share' flag is converted to the RAM_SHARED bit in ram_flags, and other flag bits are ignored by above functions right now. Signed-off-by: Junyan He Signed-off-by: Haozhong Zhang --- backends/hostmem-file.c | 3 ++- exec.c | 10 +- include/exec/memory.h | 8 ++-- include/exec/ram_addr.h | 25 +++-- memory.c| 8 +--- numa.c | 2 +- 6 files changed, 42 insertions(+), 14 deletions(-) diff --git a/backends/hostmem-file.c b/backends/hostmem-file.c index 134b08d..34c68bb 100644 --- a/backends/hostmem-file.c +++ b/backends/hostmem-file.c @@ -58,7 +58,8 @@ file_backend_memory_alloc(HostMemoryBackend *backend, Error **errp) path = object_get_canonical_path(OBJECT(backend)); memory_region_init_ram_from_file(&backend->mr, OBJECT(backend), path, - backend->size, fb->align, backend->share, + backend->size, fb->align, + backend->share ? RAM_SHARED : 0, fb->mem_path, errp); g_free(path); } diff --git a/exec.c b/exec.c index 302c04b..f2082fa 100644 --- a/exec.c +++ b/exec.c @@ -2054,7 +2054,7 @@ static void ram_block_add(RAMBlock *new_block, Error **errp, bool shared) #ifdef __linux__ RAMBlock *qemu_ram_alloc_from_fd(ram_addr_t size, MemoryRegion *mr, - bool share, int fd, + uint64_t ram_flags, int fd, Error **errp) { RAMBlock *new_block; @@ -2096,14 +2096,14 @@ RAMBlock *qemu_ram_alloc_from_fd(ram_addr_t size, MemoryRegion *mr, new_block->mr = mr; new_block->used_length = size; new_block->max_length = size; -new_block->flags = share ? 
RAM_SHARED : 0; +new_block->flags = ram_flags; new_block->host = file_ram_alloc(new_block, size, fd, !file_size, errp); if (!new_block->host) { g_free(new_block); return NULL; } -ram_block_add(new_block, &local_err, share); +ram_block_add(new_block, &local_err, ram_flags & RAM_SHARED); if (local_err) { g_free(new_block); error_propagate(errp, local_err); @@ -2115,7 +2115,7 @@ RAMBlock *qemu_ram_alloc_from_fd(ram_addr_t size, MemoryRegion *mr, RAMBlock *qemu_ram_alloc_from_file(ram_addr_t size, MemoryRegion *mr, - bool share, const char *mem_path, + uint64_t ram_flags, const char *mem_path, Error **errp) { int fd; @@ -2127,7 +2127,7 @@ RAMBlock *qemu_ram_alloc_from_file(ram_addr_t size, MemoryRegion *mr, return NULL; } -block = qemu_ram_alloc_from_fd(size, mr, share, fd, errp); +block = qemu_ram_alloc_from_fd(size, mr, ram_flags, fd, errp); if (!block) { if (created) { unlink(mem_path); diff --git a/include/exec/memory.h b/include/exec/memory.h index 3da315e..3b68a43 100644 --- a/include/exec/memory.h +++ b/include/exec/memory.h @@ -596,6 +596,7 @@ void memory_region_init_resizeable_ram(MemoryRegion *mr, void *host), Error **errp); #ifdef __linux__ + /** * memory_region_init_ram_from_file: Initialize RAM memory region with a *mmap-ed backend. @@ -607,7 +608,10 @@ void memory_region_init_resizeable_ram(MemoryRegion *mr, * @size: size of the region. * @align: alignment of the region base address; if 0, the default alignment * (getpagesize()) will be used. - * @share: %true if memory must be mmaped with the MAP_SHARED flag + * @ram_flags: specify properties of this memory region, which can be one or + * bit-or of following values: + * - RAM_SHARED: memory must be mmaped with the MAP_SHARED flag + * Other bits are ignored. * @path: the path in which to allocate the RAM. * @errp: pointer to Error*, to store an error if it happens. 
* @@ -619,7 +623,7 @@ void memory_region_init_ram_from_file(MemoryRegion *mr, const char *name, uint64_t size, uint64_t align, - bool share, + uint64_t ram_flags, const char *path,
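The core pattern of this patch — widening a `bool share` parameter into a `uint64_t ram_flags` so later patches can add more bits — can be sketched in isolation. This is not QEMU code; the flag values mirror the series, but `share_to_ram_flags`/`ram_flags_is_shared` are illustrative helper names, not functions from the patch:

```c
#include <assert.h>
#include <stdint.h>
#include <stdbool.h>

/* Simplified stand-ins for the RAM block flag bits this series works
 * with; the values mirror the patch. */
#define RAM_PREALLOC   (1u << 0)
#define RAM_SHARED     (1u << 1)
#define RAM_RESIZEABLE (1u << 2)

/* Call sites that used to pass a bool 'share' now translate it into a
 * bit of 'ram_flags', as hostmem-file.c does in this patch:
 *   backend->share ? RAM_SHARED : 0 */
static inline uint64_t share_to_ram_flags(bool share)
{
    return share ? RAM_SHARED : 0;
}

/* Inside the allocation path, the old bool is recovered by masking,
 * as in: ram_block_add(new_block, &local_err, ram_flags & RAM_SHARED) */
static inline bool ram_flags_is_shared(uint64_t ram_flags)
{
    return (ram_flags & RAM_SHARED) != 0;
}
```

Adding a new property (such as the later RAM_PMEM) then only requires defining a new bit, with no change to the function signatures.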
[Qemu-devel] [PATCH V6 1/7] memory, exec: Expose all memory block related flags.
From: Junyan He

We need to use these flags in other files rather than just in exec.c. For example, RAM_SHARED should be used when creating a RAM block from a file. Expose them in exec/memory.h.

Signed-off-by: Junyan He
---
 exec.c                | 17 -
 include/exec/memory.h | 17 +
 2 files changed, 17 insertions(+), 17 deletions(-)

diff --git a/exec.c b/exec.c
index c30f905..302c04b 100644
--- a/exec.c
+++ b/exec.c
@@ -87,23 +87,6 @@ AddressSpace address_space_memory;
 MemoryRegion io_mem_rom, io_mem_notdirty;
 static MemoryRegion io_mem_unassigned;
-
-/* RAM is pre-allocated and passed into qemu_ram_alloc_from_ptr */
-#define RAM_PREALLOC (1 << 0)
-
-/* RAM is mmap-ed with MAP_SHARED */
-#define RAM_SHARED (1 << 1)
-
-/* Only a portion of RAM (used_length) is actually used, and migrated.
- * This used_length size can change across reboots.
- */
-#define RAM_RESIZEABLE (1 << 2)
-
-/* UFFDIO_ZEROPAGE is available on this RAMBlock to atomically
- * zero the page and wake waiting processes.
- * (Set during postcopy)
- */
-#define RAM_UF_ZEROPAGE (1 << 3)
 #endif

 #ifdef TARGET_PAGE_BITS_VARY
diff --git a/include/exec/memory.h b/include/exec/memory.h
index 67ea7fe..3da315e 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -102,6 +102,23 @@ struct IOMMUNotifier {
 };
 typedef struct IOMMUNotifier IOMMUNotifier;

+/* RAM is pre-allocated and passed into qemu_ram_alloc_from_ptr */
+#define RAM_PREALLOC (1 << 0)
+
+/* RAM is mmap-ed with MAP_SHARED */
+#define RAM_SHARED (1 << 1)
+
+/* Only a portion of RAM (used_length) is actually used, and migrated.
+ * This used_length size can change across reboots.
+ */
+#define RAM_RESIZEABLE (1 << 2)
+
+/* UFFDIO_ZEROPAGE is available on this RAMBlock to atomically
+ * zero the page and wake waiting processes.
+ * (Set during postcopy)
+ */
+#define RAM_UF_ZEROPAGE (1 << 3)
+
 static inline void iommu_notifier_init(IOMMUNotifier *n, IOMMUNotify fn,
                                        IOMMUNotifierFlag flags,
                                        hwaddr start, hwaddr end)
--
2.7.4
[Qemu-devel] [PATCH V6 0/7] nvdimm: guarantee persistence of QEMU writes to persistent memory
From: Junyan He

QEMU writes to vNVDIMM backends in the vNVDIMM label emulation and live migration. If the backend is on persistent memory, QEMU needs to take proper operations to ensure its writes are persistent on the persistent memory. Otherwise, a host power failure may result in the loss of guest data on the persistent memory.

This patch series is based on Marcel's patch "mem: add share parameter to memory-backend-ram" [1] because of the changes in patch 1.

[1] https://lists.gnu.org/archive/html/qemu-devel/2018-02/msg03858.html

Previous versions can be found at:
v5: https://lists.gnu.org/archive/html/qemu-devel/2018-05/msg02258.html
v4: https://lists.gnu.org/archive/html/qemu-devel/2018-02/msg06993.html
v3: https://lists.gnu.org/archive/html/qemu-devel/2018-02/msg04365.html
v2: https://lists.gnu.org/archive/html/qemu-devel/2018-02/msg01579.html
v1: https://lists.gnu.org/archive/html/qemu-devel/2017-12/msg05040.html

Changes in v6:
* (Patch 1) Expose all RAM block flags rather than redefining them.
* (Patch 4) Use pkg-config rather than a hard check in configure.
* (Patch 7) Sync and flush all the pmem data when migration completes, rather than syncing pages one by one as in the previous version.

Changes in v5:
* (Patch 9) Add post copy check and output some messages for nvdimm.

Changes in v4:
* (Patch 2) Fix compilation errors found by patchew.

Changes in v3:
* (Patch 5) Add an is_pmem flag to ram_handle_compressed() and handle PMEM writes in it, so we don't need the _common function.
* (Patch 6) Expose qemu_get_buffer_common so we can remove the unnecessary qemu_get_buffer_to_pmem wrapper.
* (Patch 8) Add an is_pmem flag to xbzrle_decode_buffer() and handle PMEM writes in it, so we can remove the unnecessary xbzrle_decode_buffer_{common, to_pmem}.
* Move libpmem stubs to stubs/pmem.c and fix the compilation failures of test-{xbzrle,vmstate}.c.

Changes in v2:
* (Patch 1) Use a flags parameter in file ram allocation functions.
* (Patch 2) Add a new option 'pmem' to hostmem-file.
* (Patch 3) Use libpmem to operate on the persistent memory, rather than re-implementing those operations in QEMU.
* (Patch 5-8) Consider the write persistence in the migration path.

Junyan:
[1/7] memory, exec: Expose all memory block related flags.
[6/7] migration/ram: Add check and info message to nvdimm post copy.
[7/7] migration/ram: ensure write persistence on loading all data to PMEM.

Haozhong:
[5/7] mem/nvdimm: ensure write persistence to PMEM in label emulation

Haozhong & Junyan:
[2/7] memory, exec: switch file ram allocation functions to 'flags' parameters
[3/7] hostmem-file: add the 'pmem' option
[4/7] configure: add libpmem support

Signed-off-by: Haozhong Zhang
Signed-off-by: Junyan He
---
 backends/hostmem-file.c | 28 +++-
 configure               | 29 +
 docs/nvdimm.txt         | 14 ++
 exec.c                  | 36 ++--
 hw/mem/nvdimm.c         |  9 -
 include/exec/memory.h   | 31 +--
 include/exec/ram_addr.h | 28 ++--
 include/qemu/pmem.h     | 24
 memory.c                |  8 +---
 migration/ram.c         | 18 ++
 numa.c                  |  2 +-
 qemu-options.hx         |  7 +++
 stubs/Makefile.objs     |  1 +
 stubs/pmem.c            | 23 +++
 14 files changed, 226 insertions(+), 32 deletions(-)
--
2.7.4
Re: [Qemu-devel] [PATCH 2/9 V5] hostmem-file: add the 'pmem' option
Because NVDIMM-like memory can also be imitated by a disk backend, I think it is more flexible to let the user specify whether the backend is real pmem or not, rather than having QEMU check it with the pmem_is_pmem() API.

From: Stefan Hajnoczi
Sent: Thursday, May 31, 2018 1:09:42 PM
To: junyan...@gmx.com
Cc: qemu-devel@nongnu.org; Haozhong Zhang; xiaoguangrong.e...@gmail.com; crosthwaite.pe...@gmail.com; m...@redhat.com; dgilb...@redhat.com; ehabk...@redhat.com; quint...@redhat.com; Junyan He; stefa...@redhat.com; pbonz...@redhat.com; imamm...@redhat.com; r...@twiddle.net
Subject: Re: [Qemu-devel] [PATCH 2/9 V5] hostmem-file: add the 'pmem' option

On Thu, May 10, 2018 at 10:08:51AM +0800, junyan...@gmx.com wrote:
> From: Junyan He
>
> When QEMU emulates vNVDIMM labels and migrates vNVDIMM devices, it
> needs to know whether the backend storage is a real persistent memory,
> in order to decide whether special operations should be performed to
> ensure the data persistence.
>
> This boolean option 'pmem' allows users to specify whether the backend
> storage of memory-backend-file is a real persistent memory. If
> 'pmem=on', QEMU will set the flag RAM_PMEM in the RAM block of the
> corresponding memory region.

I'm still not sure if this option is necessary since pmem_is_pmem() is
available with the introduction of the libpmem dependency. Why can't it
be used?

Stefan
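The trade-off under discussion — an explicit user-set 'pmem' option versus a runtime pmem_is_pmem() probe — can be illustrated with a small sketch. Everything here is hypothetical naming: `fake_pmem_is_pmem` stands in for libpmem's detector, and `backend_is_pmem` is not a QEMU function:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for libpmem's pmem_is_pmem(): reports whether the range
 * [addr, addr+len) is genuinely on persistent memory. Hard-coded to
 * false here, as it would be for a disk-backed file that merely
 * imitates an NVDIMM. */
static bool fake_pmem_is_pmem(const void *addr, size_t len)
{
    (void)addr;
    (void)len;
    return false;
}

/* With 'pmem=on' the user asserts persistence explicitly; a runtime
 * check would instead trust libpmem's detection. A disk-file backend
 * imitating an NVDIMM can legitimately be configured either way, which
 * is the flexibility argument made in the reply above. */
static bool backend_is_pmem(bool pmem_option, bool use_runtime_check,
                            const void *addr, size_t len)
{
    return use_runtime_check ? fake_pmem_is_pmem(addr, len) : pmem_option;
}
```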
Re: [Qemu-devel] [PATCH V5 0/9] nvdimm: guarantee persistence of QEMU writes to persistent memory
> > So Haozhong's manner seems to be a little faster and I choose to keep that. > > > > If you want to choose this manner, the code will be clean and no need for > > > > typedef struct { > > > void (*memset)(void *s, int c, size_t n); > > > void (*memcpy)(void *dest, const void *src, size_t n); > > > } MemoryOperations; > > > > > > performance is close, and I am a little new in Qemu:), so both options are > > OK for me, > > > > Which one do you prefer to? > The one with the least impact; the migration code is getting more and > more complex, so having to do the 'if (is_pmem)' check everywhere isn't > nice, passing an 'ops' pointer in is better. However if you can do the > 'flush before complete' instead then the amount of code change is a LOT > smaller. > The only other question is whether from your pmem view, the > flush-before-complete causes any problems; in the worst case, how long > could the flush take? According to my understanding, flush-before-complete should be OK and save. Haozhong gave the hint that the flush step by step may be faster and I think the benchmark shows that they are close. The worst case is that all the pmem-like memory will be flushed after completing migration, I am not sure whether there are some unused pmem for new guest will also be flushed. > Dave From: Dr. David Alan Gilbert Sent: Thursday, May 31, 2018 2:42:19 PM To: Junyan He Cc: Stefan Hajnoczi; qemu-devel@nongnu.org; Haozhong Zhang; xiaoguangrong.e...@gmail.com; crosthwaite.pe...@gmail.com; m...@redhat.com; ehabk...@redhat.com; quint...@redhat.com; Junyan He; stefa...@redhat.com; pbonz...@redhat.com; imamm...@redhat.com; r...@twiddle.net Subject: Re: [Qemu-devel] [PATCH V5 0/9] nvdimm: guarantee persistence of QEMU writes to persistent memory * Junyan He (junyan...@gmx.com) wrote: > > Also, there was a discussion about leaving the code unchanged but adding > > an nvdimm_flush() call at the very end of migration. I think someone > > benchmarked it but can't find the email. 
Please post a link or > > summarize the results, because that approach would be much less > > invasive. Thanks! > > > And previous comments: > > > > > > 2. The migration/ram code is invasive. Is it really necessary to > > > >persist data each time pages are loaded from a migration stream? It > > > >seems simpler to migrate as normal and call pmem_persist() just once > > > >after RAM has been migrated but before the migration completes. > > > > > > The concern is about the overhead of cache flush. > > > > > > In this patch series, if possible, QEMU will use pmem_mem{set,cpy}_nodrain > > > APIs to copy NVDIMM blocks. Those APIs use movnt (if it's available) and > > > can avoid the subsequent cache flush. > > > > > > Anyway, I'll make some microbenchmark to check which one will be better. > > > The problem is not just the overhead; the problem is the code > > complexity; this series makes all the paths through the migration code > > more complex in places we wouldn't expect to change. > > I already use the migration info tool and list the result in the Mail just > after this patch set sent: > > Disable all haozhong's pmem_drain and pmem_memset_nodrain kind function call > and make the cleanup function do the flush job like this: > > static int ram_load_cleanup(void *opaque) > { > RAMBlock *rb; > RAMBLOCK_FOREACH(rb) { > if (ramblock_is_pmem(rb)) { > pmem_persist(rb->host, rb->used_length); > } > } > > xbzrle_load_cleanup(); > compress_threads_load_cleanup(); > > RAMBLOCK_FOREACH(rb) { > g_free(rb->receivedmap); > rb->receivedmap = NULL; > } > return 0; > } > > > The migrate info result is: > > Haozhong's Manner > > (qemu) migrate -d tcp:localhost: > (qemu) info migrate > globals: > store-global-state: on > only-migratable: off > send-configuration: on > send-section-footer: on > capabilities: xbzrle: off rdma-pin-all: off auto-converge: off zero-blocks: > off compress: off events: off postcopy-ram: off x-colo: off release-ram: off > block: off return-path: off 
pause-before-switchover: off x-multifd: off > dirty-bitmaps: off postcopy-blocktime: off > Migration status: completed > total time: 333668 milliseconds > downtime: 17 milliseconds > setup: 50 milliseconds > transferred ram: 10938039 kbytes > throughput: 268.55 mbps > remaining ram: 0 kbytes >
Re: [Qemu-devel] [PATCH V5 0/9] nvdimm: guarantee persistence of QEMU writes to persistent memory
> Also, there was a discussion about leaving the code unchanged but adding > an nvdimm_flush() call at the very end of migration. I think someone > benchmarked it but can't find the email. Please post a link or > summarize the results, because that approach would be much less > invasive. Thanks! And previous comments: > > > 2. The migration/ram code is invasive. Is it really necessary to > > >persist data each time pages are loaded from a migration stream? It > > >seems simpler to migrate as normal and call pmem_persist() just once > > >after RAM has been migrated but before the migration completes. > > > > The concern is about the overhead of cache flush. > > > > In this patch series, if possible, QEMU will use pmem_mem{set,cpy}_nodrain > > APIs to copy NVDIMM blocks. Those APIs use movnt (if it's available) and > > can avoid the subsequent cache flush. > > > > Anyway, I'll make some microbenchmark to check which one will be better. > The problem is not just the overhead; the problem is the code > complexity; this series makes all the paths through the migration code > more complex in places we wouldn't expect to change. 
I already use the migration info tool and list the result in the Mail just after this patch set sent: Disable all haozhong's pmem_drain and pmem_memset_nodrain kind function call and make the cleanup function do the flush job like this: static int ram_load_cleanup(void *opaque) { RAMBlock *rb; RAMBLOCK_FOREACH(rb) { if (ramblock_is_pmem(rb)) { pmem_persist(rb->host, rb->used_length); } } xbzrle_load_cleanup(); compress_threads_load_cleanup(); RAMBLOCK_FOREACH(rb) { g_free(rb->receivedmap); rb->receivedmap = NULL; } return 0; } The migrate info result is: Haozhong's Manner (qemu) migrate -d tcp:localhost: (qemu) info migrate globals: store-global-state: on only-migratable: off send-configuration: on send-section-footer: on capabilities: xbzrle: off rdma-pin-all: off auto-converge: off zero-blocks: off compress: off events: off postcopy-ram: off x-colo: off release-ram: off block: off return-path: off pause-before-switchover: off x-multifd: off dirty-bitmaps: off postcopy-blocktime: off Migration status: completed total time: 333668 milliseconds downtime: 17 milliseconds setup: 50 milliseconds transferred ram: 10938039 kbytes throughput: 268.55 mbps remaining ram: 0 kbytes total ram: 11027272 kbytes duplicate: 35533 pages skipped: 0 pages normal: 2729095 pages normal bytes: 10916380 kbytes dirty sync count: 4 page size: 4 kbytes (qemu) flush before complete QEMU 2.12.50 monitor - type 'help' for more information (qemu) migrate -d tcp:localhost: (qemu) info migrate globals: store-global-state: on only-migratable: off send-configuration: on send-section-footer: on capabilities: xbzrle: off rdma-pin-all: off auto-converge: off zero-blocks: off compress: off events: off postcopy-ram: off x-colo: off release-ram: off block: off return-path: off pause-before-switchover: off x-multifd: off dirty-bitmaps: off postcopy-blocktime: off Migration status: completed total time: 334836 milliseconds downtime: 17 milliseconds setup: 49 milliseconds transferred ram: 10978886 kbytes 
throughput: 268.62 mbps remaining ram: 0 kbytes total ram: 11027272 kbytes duplicate: 23149 pages skipped: 0 pages normal: 2739314 pages normal bytes: 10957256 kbytes dirty sync count: 4 page size: 4 kbytes (qemu) So Haozhong's manner seems to be a little faster and I choose to keep that. If you want to choose this manner, the code will be clean and no need for > typedef struct { > void (*memset)(void *s, int c, size_t n); > void (*memcpy)(void *dest, const void *src, size_t n); > } MemoryOperations; performance is close, and I am a little new in Qemu:), so both options are OK for me, Which one do you prefer to? From: Stefan Hajnoczi Sent: Thursday, May 31, 2018 1:18:58 PM To: junyan...@gmx.com Cc: qemu-devel@nongnu.org; Haozhong Zhang; xiaoguangrong.e...@gmail.com; crosthwaite.pe...@gmail.com; m...@redhat.com; dgilb...@redhat.com; ehabk...@redhat.com; quint...@redhat.com; Junyan He; stefa...@redhat.com; pbonz...@redhat.com; imamm...@redhat.com; r...@twiddle.net Subject: Re: [Qemu-devel] [PATCH V5 0/9] nvdimm: guarantee persistence of QEMU writes to persistent memory David Gilbert previously suggested a memory access interface. I guess it would look something like this: typedef struct { void (*memset)(void *s, int c, size_t n); void (*memcpy)(void *dest, const void *src, size_t n); } MemoryOperations; That way code doesn't need if (pmem) A else B. It can just do mem_ops->foo(). Have you looked into this idea? Also, there was a dis
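The MemoryOperations idea floated in this thread — dispatching through a small ops table instead of scattering `if (is_pmem)` checks through the migration code — could look roughly like the sketch below. This is not QEMU code: the member names are adjusted (`memset_op`/`memcpy_op`) and the pmem variants are stubbed with plain libc calls, where a real build would use libpmem's non-temporal copies:

```c
#include <assert.h>
#include <string.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical ops table, as suggested in the thread. */
typedef struct {
    void *(*memset_op)(void *s, int c, size_t n);
    void *(*memcpy_op)(void *dest, const void *src, size_t n);
} MemoryOperations;

/* Stand-ins for pmem_memset_nodrain()/pmem_memcpy_nodrain(); real
 * implementations would come from libpmem and avoid cache flushes by
 * using movnt stores where available. */
static void *fake_pmem_memset(void *s, int c, size_t n)
{
    return memset(s, c, n);
}
static void *fake_pmem_memcpy(void *d, const void *s, size_t n)
{
    return memcpy(d, s, n);
}

static const MemoryOperations ram_ops  = { memset, memcpy };
static const MemoryOperations pmem_ops = { fake_pmem_memset, fake_pmem_memcpy };

/* Callers pick the table once per RAM block instead of branching on
 * is_pmem at every copy site. */
static const MemoryOperations *mem_ops_for(bool is_pmem)
{
    return is_pmem ? &pmem_ops : &ram_ops;
}
```

This keeps every load path identical and isolates the pmem decision to one lookup, which is the "least impact" property Dave asks for; the flush-before-complete alternative avoids even this indirection.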
Re: [Qemu-devel] [PATCH V5 0/9] nvdimm: guarantee persistence of QEMU writes to persistent memory
GEN qemu-doc.html GEN qemu-doc.txt GEN qemu.1 CC s390-ccw/bootmap.o GEN docs/interop/qemu-qmp-ref.html ./qemu-options.texi:2855: unknown command `address' ./qemu-options.texi:2855: unknown command `hidden' make: *** [Makefile:915: qemu-doc.html] Error 1 It seems that this is not caused by my patch set? And I can not duplicate in local. Pings, thanks From: Qemu-devel on behalf of Junyan He Sent: Monday, May 21, 2018 3:19:48 AM To: junyan...@gmx.com Cc: Haozhong Zhang; xiaoguangrong.e...@gmail.com; crosthwaite.pe...@gmail.com; m...@redhat.com; qemu-devel@nongnu.org; dgilb...@redhat.com; quint...@redhat.com; Junyan He; stefa...@redhat.com; imamm...@redhat.com; pbonz...@redhat.com; r...@twiddle.net; ehabk...@redhat.com Subject: Re: [Qemu-devel] [PATCH V5 0/9] nvdimm: guarantee persistence of QEMU writes to persistent memory Ping for review, thanks Sent: Thursday, May 10, 2018 at 10:08 AM From: junyan...@gmx.com To: qemu-devel@nongnu.org Cc: "Haozhong Zhang" , xiaoguangrong.e...@gmail.com, crosthwaite.pe...@gmail.com, m...@redhat.com, dgilb...@redhat.com, ehabk...@redhat.com, quint...@redhat.com, "Junyan He" , stefa...@redhat.com, pbonz...@redhat.com, imamm...@redhat.com, r...@twiddle.net Subject: [Qemu-devel] [PATCH V5 0/9] nvdimm: guarantee persistence of QEMU writes to persistent memory From: Junyan He QEMU writes to vNVDIMM backends in the vNVDIMM label emulation and live migration. If the backend is on the persistent memory, QEMU needs to take proper operations to ensure its writes persistent on the persistent memory. Otherwise, a host power failure may result in the loss the guest data on the persistent memory. This v3 patch series is based on Marcel's patch "mem: add share parameter to memory-backend-ram" [1] because of the changes in patch 1. 
[1] [1]https://lists.gnu.org/archive/html/qemu-devel/2018-02/msg03858.html Previous versions can be found at V4: [2]https://lists.gnu.org/archive/html/qemu-devel/2018-02/msg06993.html v3: [3]https://lists.gnu.org/archive/html/qemu-devel/2018-02/msg04365.html v2: [4]https://lists.gnu.org/archive/html/qemu-devel/2018-02/msg01579.html v1: [5]https://lists.gnu.org/archive/html/qemu-devel/2017-12/msg05040.html Changes in v5: * (Patch 9) Add post copy check and output some messages for nvdimm. Changes in v4: * (Patch 2) Fix compilation errors found by patchew. Changes in v3: * (Patch 5) Add a is_pmem flag to ram_handle_compressed() and handle PMEM writes in it, so we don't need the _common function. * (Patch 6) Expose qemu_get_buffer_common so we can remove the unnecessary qemu_get_buffer_to_pmem wrapper. * (Patch 8) Add a is_pmem flag to xbzrle_decode_buffer() and handle PMEM writes in it, so we can remove the unnecessary xbzrle_decode_buffer_{common, to_pmem}. * Move libpmem stubs to stubs/pmem.c and fix the compilation failures of test-{xbzrle,vmstate}.c. Changes in v2: * (Patch 1) Use a flags parameter in file ram allocation functions. * (Patch 2) Add a new option 'pmem' to hostmem-file. * (Patch 3) Use libpmem to operate on the persistent memory, rather than re-implementing those operations in QEMU. * (Patch 5-8) Consider the write persistence in the migration path. 
Haozhong Zhang (8): [1/9] memory, exec: switch file ram allocation functions to 'flags' parameters [2/9] hostmem-file: add the 'pmem' option [3/9] configure: add libpmem support [4/9] mem/nvdimm: ensure write persistence to PMEM in label emulation [5/9] migration/ram: ensure write persistence on loading zero pages to PMEM [6/9] migration/ram: ensure write persistence on loading normal pages to PMEM [7/9] migration/ram: ensure write persistence on loading compressed pages to PMEM [8/9] migration/ram: ensure write persistence on loading xbzrle pages to PMEM Junyan He (1): [9/9] migration/ram: Add check and info message to nvdimm post copy. Signed-off-by: Haozhong Zhang Signed-off-by: Junyan He --- backends/hostmem-file.c | 27 ++- configure | 35 +++ docs/nvdimm.txt | 14 ++ exec.c | 20 hw/mem/nvdimm.c | 9 - include/exec/memory.h | 12 ++-- include/exec/ram_addr.h | 28 ++-- include/migration/qemu-file-types.h | 2 ++ include/qemu/pmem.h | 27 +++ memory.c | 8 +--- migration/qemu-file.c | 29 +++-- migration/ram.c | 52 ++-- migration/ram.h | 2 +- migration/rdma.c | 2 +- migration/xbzrle.c | 8 ++-- migration/xbzrle.h | 3 ++- numa.c | 2 +-
Re: [Qemu-devel] [PATCH V5 0/9] nvdimm: guarantee persistence of QEMU writes to persistent memory
Ping for review, thanks Sent: Thursday, May 10, 2018 at 10:08 AM From: junyan...@gmx.com To: qemu-devel@nongnu.org Cc: "Haozhong Zhang" , xiaoguangrong.e...@gmail.com, crosthwaite.pe...@gmail.com, m...@redhat.com, dgilb...@redhat.com, ehabk...@redhat.com, quint...@redhat.com, "Junyan He" , stefa...@redhat.com, pbonz...@redhat.com, imamm...@redhat.com, r...@twiddle.net Subject: [Qemu-devel] [PATCH V5 0/9] nvdimm: guarantee persistence of QEMU writes to persistent memory From: Junyan He QEMU writes to vNVDIMM backends in the vNVDIMM label emulation and live migration. If the backend is on the persistent memory, QEMU needs to take proper operations to ensure its writes persistent on the persistent memory. Otherwise, a host power failure may result in the loss the guest data on the persistent memory. This v3 patch series is based on Marcel's patch "mem: add share parameter to memory-backend-ram" [1] because of the changes in patch 1. [1] [1]https://lists.gnu.org/archive/html/qemu-devel/2018-02/msg03858.html Previous versions can be found at V4: [2]https://lists.gnu.org/archive/html/qemu-devel/2018-02/msg06993.html v3: [3]https://lists.gnu.org/archive/html/qemu-devel/2018-02/msg04365.html v2: [4]https://lists.gnu.org/archive/html/qemu-devel/2018-02/msg01579.html v1: [5]https://lists.gnu.org/archive/html/qemu-devel/2017-12/msg05040.html Changes in v5: * (Patch 9) Add post copy check and output some messages for nvdimm. Changes in v4: * (Patch 2) Fix compilation errors found by patchew. Changes in v3: * (Patch 5) Add a is_pmem flag to ram_handle_compressed() and handle PMEM writes in it, so we don't need the _common function. * (Patch 6) Expose qemu_get_buffer_common so we can remove the unnecessary qemu_get_buffer_to_pmem wrapper. * (Patch 8) Add a is_pmem flag to xbzrle_decode_buffer() and handle PMEM writes in it, so we can remove the unnecessary xbzrle_decode_buffer_{common, to_pmem}. 
* Move libpmem stubs to stubs/pmem.c and fix the compilation failures of test-{xbzrle,vmstate}.c. Changes in v2: * (Patch 1) Use a flags parameter in file ram allocation functions. * (Patch 2) Add a new option 'pmem' to hostmem-file. * (Patch 3) Use libpmem to operate on the persistent memory, rather than re-implementing those operations in QEMU. * (Patch 5-8) Consider the write persistence in the migration path. Haozhong Zhang (8): [1/9] memory, exec: switch file ram allocation functions to 'flags' parameters [2/9] hostmem-file: add the 'pmem' option [3/9] configure: add libpmem support [4/9] mem/nvdimm: ensure write persistence to PMEM in label emulation [5/9] migration/ram: ensure write persistence on loading zero pages to PMEM [6/9] migration/ram: ensure write persistence on loading normal pages to PMEM [7/9] migration/ram: ensure write persistence on loading compressed pages to PMEM [8/9] migration/ram: ensure write persistence on loading xbzrle pages to PMEM Junyan He (1): [9/9] migration/ram: Add check and info message to nvdimm post copy. Signed-off-by: Haozhong Zhang Signed-off-by: Junyan He --- backends/hostmem-file.c | 27 ++- configure | 35 +++ docs/nvdimm.txt | 14 ++ exec.c | 20 hw/mem/nvdimm.c | 9 - include/exec/memory.h | 12 ++-- include/exec/ram_addr.h | 28 ++-- include/migration/qemu-file-types.h | 2 ++ include/qemu/pmem.h | 27 +++ memory.c | 8 +--- migration/qemu-file.c | 29 +++-- migration/ram.c | 52 ++-- migration/ram.h | 2 +- migration/rdma.c | 2 +- migration/xbzrle.c | 8 ++-- migration/xbzrle.h | 3 ++- numa.c | 2 +- qemu-options.hx | 7 +++ stubs/Makefile.objs | 1 + stubs/pmem.c | 37 + tests/Makefile.include | 4 ++-- tests/test-xbzrle.c | 4 ++-- 22 files changed, 290 insertions(+), 43 deletions(-) -- 2.7.4 References 1. https://lists.gnu.org/archive/html/qemu-devel/2018-02/msg03858.html 2. https://lists.gnu.org/archive/html/qemu-devel/2018-02/msg06993.html 3. https://lists.gnu.org/archive/html/qemu-devel/2018-02/msg04365.html 4. 
https://lists.gnu.org/archive/html/qemu-devel/2018-02/msg01579.html 5. https://lists.gnu.org/archive/html/qemu-devel/2017-12/msg05040.html
Re: [Qemu-devel] [PATCH 1/9 V5] memory, exec: switch file ram allocation functions to 'flags' parameters
> >Â >Â > >Sent:Â Friday, May 11, 2018 at 5:08 AM >From:Â "Murilo Opsfelder Araujo" >To:Â junyan...@gmx.com >Cc:Â "Haozhong Zhang" , xiaoguangrong.e...@gmail.com, >crosthwaite.pe...@gmail.com, m...@redhat.com, qemu-devel@nongnu.org, >dgilb...@redhat.com, quint...@redhat.com, "Junyan He" , >stefa...@redhat.com, imamm...@redhat.com, pbonz...@redhat.com, >r...@twiddle.net, ehabk...@redhat.com >Subject:Â Re: [Qemu-devel] [PATCH 1/9 V5] memory, exec: switch file ram >allocation functions to 'flags' parameters >On Thu, May 10, 2018 at 10:08:50AM +0800, junyan...@gmx.com wrote: >> From: Junyan He >> >> As more flag parameters besides the existing 'share' are going to be >> added to following functions >> memory_region_init_ram_from_file >> qemu_ram_alloc_from_fd >> qemu_ram_alloc_from_file >> let's switch them to use the 'flags' parameters so as to ease future >> flag additions. >> >> The existing 'share' flag is converted to the QEMU_RAM_SHARE bit in >> flags, and other flag bits are ignored by above functions right now. >> >> Signed-off-by: Haozhong Zhang >> --- >> backends/hostmem-file.c | 3 ++- >> exec.c | 7 --- >> include/exec/memory.h | 10 -- >> include/exec/ram_addr.h | 25 +++-- >> memory.c | 8 +--- >> numa.c | 2 +- >> 6 files changed, 43 insertions(+), 12 deletions(-) >> >> diff --git a/backends/hostmem-file.c b/backends/hostmem-file.c >> index 134b08d..30df843 100644 >> --- a/backends/hostmem-file.c >> +++ b/backends/hostmem-file.c >> @@ -58,7 +58,8 @@ file_backend_memory_alloc(HostMemoryBackend *backend, >> Error **errp) >> path = object_get_canonical_path(OBJECT(backend)); >> memory_region_init_ram_from_file(&backend->mr, OBJECT(backend), >> path, >> - backend->size, fb->align, backend->share, >> + backend->size, fb->align, >> + backend->share ? 
QEMU_RAM_SHARE : 0, >> fb->mem_path, errp); >> g_free(path); >> } >> diff --git a/exec.c b/exec.c >> index c7fcefa..fa33c29 100644 >> --- a/exec.c >> +++ b/exec.c >> @@ -2030,12 +2030,13 @@ static void ram_block_add(RAMBlock *new_block, Error >> **errp, bool shared) >> >> #ifdef __linux__ >> RAMBlock *qemu_ram_alloc_from_fd(ram_addr_t size, MemoryRegion *mr, >> - bool share, int fd, >> + uint64_t flags, int fd, >> Error **errp) >> { >> RAMBlock *new_block; >> Error *local_err = NULL; >> int64_t file_size; >> + bool share = flags & QEMU_RAM_SHARE; >> >> if (xen_enabled()) { >> error_setg(errp, "-mem-path not supported with Xen"); >> @@ -2091,7 +2092,7 @@ RAMBlock *qemu_ram_alloc_from_fd(ram_addr_t size, >> MemoryRegion *mr, >> >> >> RAMBlock *qemu_ram_alloc_from_file(ram_addr_t size, MemoryRegion *mr, >> - bool share, const char *mem_path, >> + uint64_t flags, const char *mem_path, >> Error **errp) >> { >> int fd; >> @@ -2103,7 +2104,7 @@ RAMBlock *qemu_ram_alloc_from_file(ram_addr_t size, >> MemoryRegion *mr, >> return NULL; >> } >> >> - block = qemu_ram_alloc_from_fd(size, mr, share, fd, errp); >> + block = qemu_ram_alloc_from_fd(size, mr, flags, fd, errp); >> if (!block) { >> if (created) { >> unlink(mem_path); >> diff --git a/include/exec/memory.h b/include/exec/memory.h >> index 31eae0a..0460313 100644 >> --- a/include/exec/memory.h >> +++ b/include/exec/memory.h >> @@ -507,6 +507,9 @@ void memory_region_init_resizeable_ram(MemoryRegion *mr, >> void *host), >> Error **errp); >> #ifdef __linux__ >> + >> +#define QEMU_RAM_SHARE (1UL << 0) >> + > >Hi, Junyan. > >How does this differ from RAM_SHARED in exec.c? > Yes, they are really the same meaning. But this one is for memory object backend while the RAM_SHARED in exec.c is used for memory block. I think we need it here. >> /** >> * memory_region_init_ram_from_file: Initialize RAM memory region with a >> * mmap-ed backend. 
>> @@ -518,7 +521,10 @@ void memory_region_init_resizeable_ram(MemoryRegion *mr, >> * @size: size of the region. >> * @align: alignment of the region base address; if 0, the default alignment >> * (getpagesize()) will be used. >> - * @share: %true if memory must be mmaped with the MAP_SHARED flag >> + * @flags: specify properties of this memory region, which can be one or >> bit-or >> + * of following values: >> + * - QEMU_RAM_SHARE: memory must be
[Qemu-devel] [PATCH 6/9 V5] migration/ram: ensure write persistence on loading normal pages to PMEM
From: Junyan He When loading a normal page to persistent memory, load its data by libpmem function pmem_memcpy_nodrain() instead of memcpy(). Combined with a call to pmem_drain() at the end of memory loading, we can guarantee all those normal pages are persistenly loaded to PMEM. Signed-off-by: Haozhong Zhang --- include/migration/qemu-file-types.h | 2 ++ include/qemu/pmem.h | 1 + migration/qemu-file.c | 29 +++-- migration/ram.c | 2 +- stubs/pmem.c| 5 + tests/Makefile.include | 2 +- 6 files changed, 29 insertions(+), 12 deletions(-) diff --git a/include/migration/qemu-file-types.h b/include/migration/qemu-file-types.h index bd6d7dd..c7c3f66 100644 --- a/include/migration/qemu-file-types.h +++ b/include/migration/qemu-file-types.h @@ -33,6 +33,8 @@ void qemu_put_byte(QEMUFile *f, int v); void qemu_put_be16(QEMUFile *f, unsigned int v); void qemu_put_be32(QEMUFile *f, unsigned int v); void qemu_put_be64(QEMUFile *f, uint64_t v); +size_t qemu_get_buffer_common(QEMUFile *f, uint8_t *buf, size_t size, + bool is_pmem); size_t qemu_get_buffer(QEMUFile *f, uint8_t *buf, size_t size); int qemu_get_byte(QEMUFile *f); diff --git a/include/qemu/pmem.h b/include/qemu/pmem.h index 9f39ce8..cb9fa5f 100644 --- a/include/qemu/pmem.h +++ b/include/qemu/pmem.h @@ -16,6 +16,7 @@ #include #else /* !CONFIG_LIBPMEM */ +void *pmem_memcpy_nodrain(void *pmemdest, const void *src, size_t len); void *pmem_memcpy_persist(void *pmemdest, const void *src, size_t len); void *pmem_memset_nodrain(void *pmemdest, int c, size_t len); void pmem_drain(void); diff --git a/migration/qemu-file.c b/migration/qemu-file.c index 0463f4c..1ec31d5 100644 --- a/migration/qemu-file.c +++ b/migration/qemu-file.c @@ -26,6 +26,7 @@ #include "qemu-common.h" #include "qemu/error-report.h" #include "qemu/iov.h" +#include "qemu/pmem.h" #include "migration.h" #include "qemu-file.h" #include "trace.h" @@ -471,18 +472,13 @@ size_t qemu_peek_buffer(QEMUFile *f, uint8_t **buf, size_t size, size_t offset) return size; } -/* - 
* Read 'size' bytes of data from the file into buf. - * 'size' can be larger than the internal buffer. - * - * It will return size bytes unless there was an error, in which case it will - * return as many as it managed to read (assuming blocking fd's which - * all current QEMUFile are) - */ -size_t qemu_get_buffer(QEMUFile *f, uint8_t *buf, size_t size) +size_t qemu_get_buffer_common(QEMUFile *f, uint8_t *buf, size_t size, + bool is_pmem) { size_t pending = size; size_t done = 0; +void *(*memcpy_func)(void *d, const void *s, size_t n) = +is_pmem ? pmem_memcpy_nodrain : memcpy; while (pending > 0) { size_t res; @@ -492,7 +488,7 @@ size_t qemu_get_buffer(QEMUFile *f, uint8_t *buf, size_t size) if (res == 0) { return done; } -memcpy(buf, src, res); +memcpy_func(buf, src, res); qemu_file_skip(f, res); buf += res; pending -= res; @@ -502,6 +498,19 @@ size_t qemu_get_buffer(QEMUFile *f, uint8_t *buf, size_t size) } /* + * Read 'size' bytes of data from the file into buf. + * 'size' can be larger than the internal buffer. + * + * It will return size bytes unless there was an error, in which case it will + * return as many as it managed to read (assuming blocking fd's which + * all current QEMUFile are) + */ +size_t qemu_get_buffer(QEMUFile *f, uint8_t *buf, size_t size) +{ +return qemu_get_buffer_common(f, buf, size, false); +} + +/* * Read 'size' bytes of data from the file. * 'size' can be larger than the internal buffer. 
* diff --git a/migration/ram.c b/migration/ram.c index e6ae9e3..2a180bc 100644 --- a/migration/ram.c +++ b/migration/ram.c @@ -3063,7 +3063,7 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id) break; case RAM_SAVE_FLAG_PAGE: -qemu_get_buffer(f, host, TARGET_PAGE_SIZE); +qemu_get_buffer_common(f, host, TARGET_PAGE_SIZE, is_pmem); break; case RAM_SAVE_FLAG_COMPRESS_PAGE: diff --git a/stubs/pmem.c b/stubs/pmem.c index 2f07ae0..b50c35e 100644 --- a/stubs/pmem.c +++ b/stubs/pmem.c @@ -26,3 +26,8 @@ void *pmem_memset_nodrain(void *pmemdest, int c, size_t len) void pmem_drain(void) { } + +void *pmem_memcpy_nodrain(void *pmemdest, const void *src, size_t len) +{ +return memcpy(pmemdest, src, len); +} diff --git a/tests/Makefile.include b/tests/Makefile.include index 3b9a5e3..5c25b9b 100644 --- a/tests/Makefile.include +++ b/tests/Makefile.include @@ -652,7 +652,7 @@ tests/test-qdev-global-props$(EXESUF): tests/test-qdev-global-props.o \ $(te
[Qemu-devel] [PATCH 4/9 V5] mem/nvdimm: ensure write persistence to PMEM in label emulation
From: Junyan He Guest writes to vNVDIMM labels are intercepted and performed on the backend by QEMU. When the backend is a real persistent memory, QEMU needs to take proper operations to ensure its write persistence on the persistent memory. Otherwise, a host power failure may result in the loss of guest label configurations. Signed-off-by: Haozhong Zhang --- hw/mem/nvdimm.c | 9 - include/qemu/pmem.h | 23 +++ stubs/Makefile.objs | 1 + stubs/pmem.c| 19 +++ 4 files changed, 51 insertions(+), 1 deletion(-) create mode 100644 include/qemu/pmem.h create mode 100644 stubs/pmem.c diff --git a/hw/mem/nvdimm.c b/hw/mem/nvdimm.c index acb656b..0c962fd 100644 --- a/hw/mem/nvdimm.c +++ b/hw/mem/nvdimm.c @@ -23,6 +23,7 @@ */ #include "qemu/osdep.h" +#include "qemu/pmem.h" #include "qapi/error.h" #include "qapi/visitor.h" #include "hw/mem/nvdimm.h" @@ -155,11 +156,17 @@ static void nvdimm_write_label_data(NVDIMMDevice *nvdimm, const void *buf, { MemoryRegion *mr; PCDIMMDevice *dimm = PC_DIMM(nvdimm); +bool is_pmem = object_property_get_bool(OBJECT(dimm->hostmem), +"pmem", NULL); uint64_t backend_offset; nvdimm_validate_rw_label_data(nvdimm, size, offset); -memcpy(nvdimm->label_data + offset, buf, size); +if (!is_pmem) { +memcpy(nvdimm->label_data + offset, buf, size); +} else { +pmem_memcpy_persist(nvdimm->label_data + offset, buf, size); +} mr = host_memory_backend_get_memory(dimm->hostmem, &error_abort); backend_offset = memory_region_size(mr) - nvdimm->label_size + offset; diff --git a/include/qemu/pmem.h b/include/qemu/pmem.h new file mode 100644 index 000..00d6680 --- /dev/null +++ b/include/qemu/pmem.h @@ -0,0 +1,23 @@ +/* + * QEMU header file for libpmem. + * + * Copyright (c) 2018 Intel Corporation. + * + * Author: Haozhong Zhang + * + * This work is licensed under the terms of the GNU GPL, version 2 or later. + * See the COPYING file in the top-level directory.
+ */ + +#ifndef QEMU_PMEM_H +#define QEMU_PMEM_H + +#ifdef CONFIG_LIBPMEM +#include +#else /* !CONFIG_LIBPMEM */ + +void *pmem_memcpy_persist(void *pmemdest, const void *src, size_t len); + +#endif /* CONFIG_LIBPMEM */ + +#endif /* !QEMU_PMEM_H */ diff --git a/stubs/Makefile.objs b/stubs/Makefile.objs index 2d59d84..ba944b9 100644 --- a/stubs/Makefile.objs +++ b/stubs/Makefile.objs @@ -43,3 +43,4 @@ stub-obj-y += xen-common.o stub-obj-y += xen-hvm.o stub-obj-y += pci-host-piix.o stub-obj-y += ram-block.o +stub-obj-$(call lnot,$(CONFIG_LIBPMEM)) += pmem.o \ No newline at end of file diff --git a/stubs/pmem.c b/stubs/pmem.c new file mode 100644 index 000..b4ec72d --- /dev/null +++ b/stubs/pmem.c @@ -0,0 +1,19 @@ +/* + * Stubs for libpmem. + * + * Copyright (c) 2018 Intel Corporation. + * + * Author: Haozhong Zhang + * + * This work is licensed under the terms of the GNU GPL, version 2 or later. + * See the COPYING file in the top-level directory. + */ + +#include + +#include "qemu/pmem.h" + +void *pmem_memcpy_persist(void *pmemdest, const void *src, size_t len) +{ +return memcpy(pmemdest, src, len); +} -- 2.7.4
[Qemu-devel] [PATCH 3/9 V5] configure: add libpmem support
From: Junyan He Add a pair of configure options --{enable,disable}-libpmem to control whether QEMU is compiled with PMDK libpmem [1]. QEMU may write to the host persistent memory (e.g. in vNVDIMM label emulation and live migration), so it must take the proper operations to ensure the persistence of its own writes. Depending on the CPU models and available instructions, the optimal operation can vary [2]. PMDK libpmem has already implemented those operations on multiple CPU models (x86 and ARM) and the logic to select the optimal ones, so QEMU can just use libpmem rather than re-implement them. [1] PMDK (formerly known as NVML), https://github.com/pmem/pmdk/ [2] https://github.com/pmem/pmdk/blob/38bfa652721a37fd94c0130ce0e3f5d8baa3ed40/src/libpmem/pmem.c#L33 Signed-off-by: Haozhong Zhang --- configure | 35 +++ 1 file changed, 35 insertions(+) diff --git a/configure b/configure index 1443422..cbb3793 100755 --- a/configure +++ b/configure @@ -456,6 +456,7 @@ jemalloc="no" replication="yes" vxhs="" libxml2="" +libpmem="" supported_cpu="no" supported_os="no" @@ -1379,6 +1380,10 @@ for opt do ;; --disable-git-update) git_update=no ;; + --enable-libpmem) libpmem=yes + ;; + --disable-libpmem) libpmem=no + ;; *) echo "ERROR: unknown option $opt" echo "Try '$0 --help' for more information" @@ -1636,6 +1641,7 @@ disabled with --disable-FEATURE, default is enabled if available: crypto-afalgLinux AF_ALG crypto backend driver vhost-user vhost-user support capstonecapstone disassembler support + libpmem libpmem support NOTE: The object files are built at the place where configure is launched EOF @@ -5443,6 +5449,30 @@ EOF fi ## +# check for libpmem + +if test "$libpmem" != "no"; then + cat > $TMPC < +int main(void) +{ + pmem_is_pmem(0, 0); + return 0; +} +EOF + libpmem_libs="-lpmem" + if compile_prog "" "$libpmem_libs" ; then +libs_softmmu="$libpmem_libs $libs_softmmu" +libpmem="yes" + else +if test "$libpmem" = "yes" ; then + feature_not_found "libpmem" "Install nvml or pmdk"
+fi +libpmem="no" + fi +fi + +## # End of CC checks # After here, no more $cc or $ld runs @@ -5903,6 +5933,7 @@ echo "avx2 optimization $avx2_opt" echo "replication support $replication" echo "VxHS block device $vxhs" echo "capstone $capstone" +echo "libpmem support $libpmem" if test "$sdl_too_old" = "yes"; then echo "-> Your SDL version is too old - please upgrade to have SDL support" @@ -6647,6 +6678,10 @@ if test "$vxhs" = "yes" ; then echo "VXHS_LIBS=$vxhs_libs" >> $config_host_mak fi +if test "$libpmem" = "yes" ; then + echo "CONFIG_LIBPMEM=y" >> $config_host_mak +fi + if test "$tcg_interpreter" = "yes"; then QEMU_INCLUDES="-iquote \$(SRC_PATH)/tcg/tci $QEMU_INCLUDES" elif test "$ARCH" = "sparc64" ; then -- 2.7.4
[Qemu-devel] [PATCH 9/9 V5] migration/ram: Add check and info message to nvdimm post copy.
From: Junyan He NVDIMM memory does not support postcopy migration yet. Disable postcopy when NVDIMM memory is present, and print a hint to the user. Signed-off-by: Junyan He --- migration/ram.c | 9 + 1 file changed, 9 insertions(+) diff --git a/migration/ram.c b/migration/ram.c index afe227e..aa6bb74 100644 --- a/migration/ram.c +++ b/migration/ram.c @@ -3120,6 +3120,15 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id) static bool ram_has_postcopy(void *opaque) { +RAMBlock *rb; +RAMBLOCK_FOREACH(rb) { +if (ramblock_is_pmem(rb)) { +info_report("Block: %s, host: %p is a nvdimm memory, postcopy " + "is not supported now!", rb->idstr, rb->host); +return false; +} +} + return migrate_postcopy_ram(); } -- 2.7.4
[Qemu-devel] [PATCH 1/9 V5] memory, exec: switch file ram allocation functions to 'flags' parameters
From: Junyan He As more flag parameters besides the existing 'share' are going to be added to following functions memory_region_init_ram_from_file qemu_ram_alloc_from_fd qemu_ram_alloc_from_file let's switch them to use the 'flags' parameters so as to ease future flag additions. The existing 'share' flag is converted to the QEMU_RAM_SHARE bit in flags, and other flag bits are ignored by above functions right now. Signed-off-by: Haozhong Zhang --- backends/hostmem-file.c | 3 ++- exec.c | 7 --- include/exec/memory.h | 10 -- include/exec/ram_addr.h | 25 +++-- memory.c| 8 +--- numa.c | 2 +- 6 files changed, 43 insertions(+), 12 deletions(-) diff --git a/backends/hostmem-file.c b/backends/hostmem-file.c index 134b08d..30df843 100644 --- a/backends/hostmem-file.c +++ b/backends/hostmem-file.c @@ -58,7 +58,8 @@ file_backend_memory_alloc(HostMemoryBackend *backend, Error **errp) path = object_get_canonical_path(OBJECT(backend)); memory_region_init_ram_from_file(&backend->mr, OBJECT(backend), path, - backend->size, fb->align, backend->share, + backend->size, fb->align, + backend->share ? 
QEMU_RAM_SHARE : 0, fb->mem_path, errp); g_free(path); } diff --git a/exec.c b/exec.c index c7fcefa..fa33c29 100644 --- a/exec.c +++ b/exec.c @@ -2030,12 +2030,13 @@ static void ram_block_add(RAMBlock *new_block, Error **errp, bool shared) #ifdef __linux__ RAMBlock *qemu_ram_alloc_from_fd(ram_addr_t size, MemoryRegion *mr, - bool share, int fd, + uint64_t flags, int fd, Error **errp) { RAMBlock *new_block; Error *local_err = NULL; int64_t file_size; +bool share = flags & QEMU_RAM_SHARE; if (xen_enabled()) { error_setg(errp, "-mem-path not supported with Xen"); @@ -2091,7 +2092,7 @@ RAMBlock *qemu_ram_alloc_from_fd(ram_addr_t size, MemoryRegion *mr, RAMBlock *qemu_ram_alloc_from_file(ram_addr_t size, MemoryRegion *mr, - bool share, const char *mem_path, + uint64_t flags, const char *mem_path, Error **errp) { int fd; @@ -2103,7 +2104,7 @@ RAMBlock *qemu_ram_alloc_from_file(ram_addr_t size, MemoryRegion *mr, return NULL; } -block = qemu_ram_alloc_from_fd(size, mr, share, fd, errp); +block = qemu_ram_alloc_from_fd(size, mr, flags, fd, errp); if (!block) { if (created) { unlink(mem_path); diff --git a/include/exec/memory.h b/include/exec/memory.h index 31eae0a..0460313 100644 --- a/include/exec/memory.h +++ b/include/exec/memory.h @@ -507,6 +507,9 @@ void memory_region_init_resizeable_ram(MemoryRegion *mr, void *host), Error **errp); #ifdef __linux__ + +#define QEMU_RAM_SHARE (1UL << 0) + /** * memory_region_init_ram_from_file: Initialize RAM memory region with a *mmap-ed backend. @@ -518,7 +521,10 @@ void memory_region_init_resizeable_ram(MemoryRegion *mr, * @size: size of the region. * @align: alignment of the region base address; if 0, the default alignment * (getpagesize()) will be used. - * @share: %true if memory must be mmaped with the MAP_SHARED flag + * @flags: specify properties of this memory region, which can be one or bit-or + * of following values: + * - QEMU_RAM_SHARE: memory must be mmaped with the MAP_SHARED flag + * Other bits are ignored. 
* @path: the path in which to allocate the RAM. * @errp: pointer to Error*, to store an error if it happens. * @@ -530,7 +536,7 @@ void memory_region_init_ram_from_file(MemoryRegion *mr, const char *name, uint64_t size, uint64_t align, - bool share, + uint64_t flags, const char *path, Error **errp); diff --git a/include/exec/ram_addr.h b/include/exec/ram_addr.h index cf2446a..b8b01d1 100644 --- a/include/exec/ram_addr.h +++ b/include/exec/ram_addr.h @@ -72,12 +72,33 @@ static inline unsigned long int ramblock_recv_bitmap_offset(void *host_addr, long qemu_getrampagesize(void); unsigned long last_ram_page(void); + +/** + * qemu_ram_alloc_from_file, + * qemu_ram_alloc_from_fd: Allocate a ram block from the specified back + *
[Qemu-devel] [PATCH 7/9 V5] migration/ram: ensure write persistence on loading compressed pages to PMEM
From: Junyan He When loading a compressed page to persistent memory, flush CPU cache after the data is decompressed. Combined with a call to pmem_drain() at the end of memory loading, we can guarantee those compressed pages are persistently loaded to PMEM. Signed-off-by: Haozhong Zhang --- include/qemu/pmem.h | 1 + migration/ram.c | 10 -- stubs/pmem.c| 4 3 files changed, 13 insertions(+), 2 deletions(-) diff --git a/include/qemu/pmem.h b/include/qemu/pmem.h index cb9fa5f..c9140fb 100644 --- a/include/qemu/pmem.h +++ b/include/qemu/pmem.h @@ -20,6 +20,7 @@ void *pmem_memcpy_nodrain(void *pmemdest, const void *src, size_t len); void *pmem_memcpy_persist(void *pmemdest, const void *src, size_t len); void *pmem_memset_nodrain(void *pmemdest, int c, size_t len); void pmem_drain(void); +void pmem_flush(const void *addr, size_t len); #endif /* CONFIG_LIBPMEM */ diff --git a/migration/ram.c b/migration/ram.c index 2a180bc..e0f3dbc 100644 --- a/migration/ram.c +++ b/migration/ram.c @@ -286,6 +286,7 @@ struct DecompressParam { uint8_t *compbuf; int len; z_stream stream; +bool is_pmem; }; typedef struct DecompressParam DecompressParam; @@ -2591,6 +2592,9 @@ static void *do_data_decompress(void *opaque) error_report("decompress data failed"); qemu_file_set_error(decomp_file, ret); } +if (param->is_pmem) { +pmem_flush(des, len); +} qemu_mutex_lock(&decomp_done_lock); param->done = true; @@ -2702,7 +2706,8 @@ exit: } static void decompress_data_with_multi_threads(QEMUFile *f, - void *host, int len) + void *host, int len, + bool is_pmem) { int idx, thread_count; @@ -2716,6 +2721,7 @@ static void decompress_data_with_multi_threads(QEMUFile *f, qemu_get_buffer(f, decomp_param[idx].compbuf, len); decomp_param[idx].des = host; decomp_param[idx].len = len; +decomp_param[idx].is_pmem = is_pmem; qemu_cond_signal(&decomp_param[idx].cond); qemu_mutex_unlock(&decomp_param[idx].mutex); break; @@ -3073,7 +3079,7 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id) ret = -EINVAL; 
break; } -decompress_data_with_multi_threads(f, host, len); +decompress_data_with_multi_threads(f, host, len, is_pmem); break; case RAM_SAVE_FLAG_XBZRLE: diff --git a/stubs/pmem.c b/stubs/pmem.c index b50c35e..9e7d86a 100644 --- a/stubs/pmem.c +++ b/stubs/pmem.c @@ -31,3 +31,7 @@ void *pmem_memcpy_nodrain(void *pmemdest, const void *src, size_t len) { return memcpy(pmemdest, src, len); } + +void pmem_flush(const void *addr, size_t len) +{ +} -- 2.7.4
[Qemu-devel] [PATCH 8/9 V5] migration/ram: ensure write persistence on loading xbzrle pages to PMEM
From: Junyan He When loading an xbzrle-encoded page to persistent memory, load the data via the libpmem function pmem_memcpy_nodrain() instead of memcpy(). Combined with a call to pmem_drain() at the end of memory loading, we can guarantee those xbzrle-encoded pages are persistently loaded to PMEM. Signed-off-by: Haozhong Zhang --- migration/ram.c| 6 +++--- migration/xbzrle.c | 8 ++-- migration/xbzrle.h | 3 ++- tests/Makefile.include | 2 +- tests/test-xbzrle.c| 4 ++-- 5 files changed, 14 insertions(+), 9 deletions(-) diff --git a/migration/ram.c b/migration/ram.c index e0f3dbc..afe227e 100644 --- a/migration/ram.c +++ b/migration/ram.c @@ -2441,7 +2441,7 @@ static void ram_save_pending(QEMUFile *f, void *opaque, uint64_t max_size, } } -static int load_xbzrle(QEMUFile *f, ram_addr_t addr, void *host) +static int load_xbzrle(QEMUFile *f, ram_addr_t addr, void *host, bool is_pmem) { unsigned int xh_len; int xh_flags; @@ -2467,7 +2467,7 @@ static int load_xbzrle(QEMUFile *f, ram_addr_t addr, void *host) /* decode RLE */ if (xbzrle_decode_buffer(loaded_data, xh_len, host, - TARGET_PAGE_SIZE) == -1) { + TARGET_PAGE_SIZE, is_pmem) == -1) { error_report("Failed to load XBZRLE page - decode error!"); return -1; } @@ -3083,7 +3083,7 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id) break; case RAM_SAVE_FLAG_XBZRLE: -if (load_xbzrle(f, addr, host) < 0) { +if (load_xbzrle(f, addr, host, is_pmem) < 0) { error_report("Failed to decompress XBZRLE page at " RAM_ADDR_FMT, addr); ret = -EINVAL; diff --git a/migration/xbzrle.c b/migration/xbzrle.c index 1ba482d..ca713c3 100644 --- a/migration/xbzrle.c +++ b/migration/xbzrle.c @@ -12,6 +12,7 @@ */ #include "qemu/osdep.h" #include "qemu/cutils.h" +#include "qemu/pmem.h" #include "xbzrle.h" /* @@ -126,11 +127,14 @@ int xbzrle_encode_buffer(uint8_t *old_buf, uint8_t *new_buf, int slen, return d; } -int xbzrle_decode_buffer(uint8_t *src, int slen, uint8_t *dst, int dlen) +int xbzrle_decode_buffer(uint8_t *src, int slen, uint8_t
*dst, int dlen, + bool is_pmem) { int i = 0, d = 0; int ret; uint32_t count = 0; +void *(*memcpy_func)(void *d, const void *s, size_t n) = +is_pmem ? pmem_memcpy_nodrain : memcpy; while (i < slen) { @@ -167,7 +171,7 @@ int xbzrle_decode_buffer(uint8_t *src, int slen, uint8_t *dst, int dlen) return -1; } -memcpy(dst + d, src + i, count); +memcpy_func(dst + d, src + i, count); d += count; i += count; } diff --git a/migration/xbzrle.h b/migration/xbzrle.h index a0db507..f18f679 100644 --- a/migration/xbzrle.h +++ b/migration/xbzrle.h @@ -17,5 +17,6 @@ int xbzrle_encode_buffer(uint8_t *old_buf, uint8_t *new_buf, int slen, uint8_t *dst, int dlen); -int xbzrle_decode_buffer(uint8_t *src, int slen, uint8_t *dst, int dlen); +int xbzrle_decode_buffer(uint8_t *src, int slen, uint8_t *dst, int dlen, + bool is_pmem); #endif diff --git a/tests/Makefile.include b/tests/Makefile.include index 5c25b9b..23d7162 100644 --- a/tests/Makefile.include +++ b/tests/Makefile.include @@ -631,7 +631,7 @@ tests/test-thread-pool$(EXESUF): tests/test-thread-pool.o $(test-block-obj-y) tests/test-iov$(EXESUF): tests/test-iov.o $(test-util-obj-y) tests/test-hbitmap$(EXESUF): tests/test-hbitmap.o $(test-util-obj-y) $(test-crypto-obj-y) tests/test-x86-cpuid$(EXESUF): tests/test-x86-cpuid.o -tests/test-xbzrle$(EXESUF): tests/test-xbzrle.o migration/xbzrle.o migration/page_cache.o $(test-util-obj-y) +tests/test-xbzrle$(EXESUF): tests/test-xbzrle.o migration/xbzrle.o migration/page_cache.o stubs/pmem.o $(test-util-obj-y) tests/test-cutils$(EXESUF): tests/test-cutils.o util/cutils.o $(test-util-obj-y) tests/test-int128$(EXESUF): tests/test-int128.o tests/rcutorture$(EXESUF): tests/rcutorture.o $(test-util-obj-y) diff --git a/tests/test-xbzrle.c b/tests/test-xbzrle.c index f5e08de..9afa0c4 100644 --- a/tests/test-xbzrle.c +++ b/tests/test-xbzrle.c @@ -101,7 +101,7 @@ static void test_encode_decode_1_byte(void) PAGE_SIZE); g_assert(dlen == (uleb128_encode_small(&buf[0], 4095) + 2)); -rc = 
xbzrle_decode_buffer(compressed, dlen, buffer, PAGE_SIZE); +rc = xbzrle_decode_buffer(compressed, dlen, buffer, PAGE_SIZE, false); g_assert(rc == PAGE_SIZE); g_assert(memcmp(test, buffer, PAGE_SIZE) == 0); @@ -156,7 +156,7 @@ static void encode_decode_range(void) dlen = xbzrle_encode_buffer(test, buffer, PAGE_SIZE, compressed, PAGE_SIZE); -rc = xbzrle_decode_buffer(compressed
[Qemu-devel] [PATCH 5/9 V5] migration/ram: ensure write persistence on loading zero pages to PMEM
From: Junyan He When loading a zero page, check whether it will be loaded to persistent memory. If so, load it via the libpmem function pmem_memset_nodrain(). Combined with a call to pmem_drain() at the end of RAM loading, we can guarantee all those zero pages are persistently loaded. Depending on the host HW/SW configurations, pmem_drain() can be "sfence". Therefore, we do not call pmem_drain() after each pmem_memset_nodrain(), or use pmem_memset_persist() (equivalent to pmem_memset_nodrain() + pmem_drain()), in order to avoid unnecessary overhead. Signed-off-by: Haozhong Zhang --- include/qemu/pmem.h | 2 ++ migration/ram.c | 25 + migration/ram.h | 2 +- migration/rdma.c| 2 +- stubs/pmem.c| 9 + 5 files changed, 34 insertions(+), 6 deletions(-) diff --git a/include/qemu/pmem.h b/include/qemu/pmem.h index 00d6680..9f39ce8 100644 --- a/include/qemu/pmem.h +++ b/include/qemu/pmem.h @@ -17,6 +17,8 @@ #else /* !CONFIG_LIBPMEM */ void *pmem_memcpy_persist(void *pmemdest, const void *src, size_t len); +void *pmem_memset_nodrain(void *pmemdest, int c, size_t len); +void pmem_drain(void); #endif /* CONFIG_LIBPMEM */ diff --git a/migration/ram.c b/migration/ram.c index 912810c..e6ae9e3 100644 --- a/migration/ram.c +++ b/migration/ram.c @@ -51,6 +51,7 @@ #include "qemu/rcu_queue.h" #include "migration/colo.h" #include "migration/block.h" +#include "qemu/pmem.h" /***/ /* ram save/restore */ @@ -2529,11 +2530,16 @@ static inline void *host_from_ram_block_offset(RAMBlock *block, * @host: host address for the zero page * @ch: what the page is filled from.
We only support zero * @size: size of the zero page + * @is_pmem: whether @host is in the persistent memory */ -void ram_handle_compressed(void *host, uint8_t ch, uint64_t size) +void ram_handle_compressed(void *host, uint8_t ch, uint64_t size, bool is_pmem) { if (ch != 0 || !is_zero_range(host, size)) { -memset(host, ch, size); +if (!is_pmem) { +memset(host, ch, size); +} else { +pmem_memset_nodrain(host, ch, size); +} } } @@ -2943,6 +2949,7 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id) bool postcopy_running = postcopy_is_running(); /* ADVISE is earlier, it shows the source has the postcopy capability on */ bool postcopy_advised = postcopy_is_advised(); +bool need_pmem_drain = false; seq_iter++; @@ -2968,6 +2975,8 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id) ram_addr_t addr, total_ram_bytes; void *host = NULL; uint8_t ch; +RAMBlock *block = NULL; +bool is_pmem = false; addr = qemu_get_be64(f); flags = addr & ~TARGET_PAGE_MASK; @@ -2984,7 +2993,7 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id) if (flags & (RAM_SAVE_FLAG_ZERO | RAM_SAVE_FLAG_PAGE | RAM_SAVE_FLAG_COMPRESS_PAGE | RAM_SAVE_FLAG_XBZRLE)) { -RAMBlock *block = ram_block_from_stream(f, flags); +block = ram_block_from_stream(f, flags); host = host_from_ram_block_offset(block, addr); if (!host) { @@ -2994,6 +3003,9 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id) } ramblock_recv_bitmap_set(block, host); trace_ram_load_loop(block->idstr, (uint64_t)addr, flags, host); + +is_pmem = ramblock_is_pmem(block); +need_pmem_drain = need_pmem_drain || is_pmem; } switch (flags & ~RAM_SAVE_FLAG_CONTINUE) { @@ -3047,7 +3059,7 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id) case RAM_SAVE_FLAG_ZERO: ch = qemu_get_byte(f); -ram_handle_compressed(host, ch, TARGET_PAGE_SIZE); +ram_handle_compressed(host, ch, TARGET_PAGE_SIZE, is_pmem); break; case RAM_SAVE_FLAG_PAGE: @@ -3090,6 +3102,11 @@ static int ram_load(QEMUFile *f, void 
*opaque, int version_id) } ret |= wait_for_decompress_done(); + +if (need_pmem_drain) { +pmem_drain(); +} + rcu_read_unlock(); trace_ram_load_complete(ret, seq_iter); return ret; diff --git a/migration/ram.h b/migration/ram.h index 5030be1..5c6a288 100644 --- a/migration/ram.h +++ b/migration/ram.h @@ -57,7 +57,7 @@ int ram_postcopy_send_discard_bitmap(MigrationState *ms); int ram_discard_range(const char *block_name, uint64_t start, size_t length); int ram_postcopy_incoming_init(MigrationIncomingState *mis); -void ram_handle_compressed(void *host, uint8_t ch, uint64_t size); +void ram_handle_compressed(void *host, uint8_t ch, uint64_t size, bool is_pmem); int ramblock_recv_bitmap_test(RAMBlock *rb, void *host_addr); bool ramblock_recv_bitmap_test_byte_offset(RAM
[Qemu-devel] [PATCH 2/9 V5] hostmem-file: add the 'pmem' option
From: Junyan He When QEMU emulates vNVDIMM labels and migrates vNVDIMM devices, it needs to know whether the backend storage is a real persistent memory, in order to decide whether special operations should be performed to ensure the data persistence. This boolean option 'pmem' allows users to specify whether the backend storage of memory-backend-file is a real persistent memory. If 'pmem=on', QEMU will set the flag RAM_PMEM in the RAM block of the corresponding memory region. Signed-off-by: Haozhong Zhang --- backends/hostmem-file.c | 26 +- docs/nvdimm.txt | 14 ++ exec.c | 13 - include/exec/memory.h | 2 ++ include/exec/ram_addr.h | 3 +++ qemu-options.hx | 7 +++ 6 files changed, 63 insertions(+), 2 deletions(-) diff --git a/backends/hostmem-file.c b/backends/hostmem-file.c index 30df843..5d706d4 100644 --- a/backends/hostmem-file.c +++ b/backends/hostmem-file.c @@ -34,6 +34,7 @@ struct HostMemoryBackendFile { bool discard_data; char *mem_path; uint64_t align; +bool is_pmem; }; static void @@ -59,7 +60,8 @@ file_backend_memory_alloc(HostMemoryBackend *backend, Error **errp) memory_region_init_ram_from_file(&backend->mr, OBJECT(backend), path, backend->size, fb->align, - backend->share ? QEMU_RAM_SHARE : 0, + (backend->share ? QEMU_RAM_SHARE : 0) | + (fb->is_pmem ? 
QEMU_RAM_PMEM : 0), fb->mem_path, errp); g_free(path); } @@ -131,6 +133,25 @@ static void file_memory_backend_set_align(Object *o, Visitor *v, error_propagate(errp, local_err); } +static bool file_memory_backend_get_pmem(Object *o, Error **errp) +{ +return MEMORY_BACKEND_FILE(o)->is_pmem; +} + +static void file_memory_backend_set_pmem(Object *o, bool value, Error **errp) +{ +HostMemoryBackend *backend = MEMORY_BACKEND(o); +HostMemoryBackendFile *fb = MEMORY_BACKEND_FILE(o); + +if (host_memory_backend_mr_inited(backend)) { +error_setg(errp, "cannot change property 'pmem' of %s '%s'", + object_get_typename(o), backend->id); +return; +} + +fb->is_pmem = value; +} + static void file_backend_unparent(Object *obj) { HostMemoryBackend *backend = MEMORY_BACKEND(obj); @@ -162,6 +183,9 @@ file_backend_class_init(ObjectClass *oc, void *data) file_memory_backend_get_align, file_memory_backend_set_align, NULL, NULL, &error_abort); +object_class_property_add_bool(oc, "pmem", +file_memory_backend_get_pmem, file_memory_backend_set_pmem, +&error_abort); } static void file_backend_instance_finalize(Object *o) diff --git a/docs/nvdimm.txt b/docs/nvdimm.txt index e903d8b..bcb2032 100644 --- a/docs/nvdimm.txt +++ b/docs/nvdimm.txt @@ -153,3 +153,17 @@ guest NVDIMM region mapping structure. This unarmed flag indicates guest software that this vNVDIMM device contains a region that cannot accept persistent writes. In result, for example, the guest Linux NVDIMM driver, marks such vNVDIMM device as read-only. + +If the vNVDIMM backend is on the host persistent memory that can be +accessed in SNIA NVM Programming Model [1] (e.g., Intel NVDIMM), it's +suggested to set the 'pmem' option of memory-backend-file to 'on'. When +'pmem=on' and QEMU is built with libpmem [2] support (configured with +--enable-libpmem), QEMU will take necessary operations to guarantee +the persistence of its own writes to the vNVDIMM backend (e.g., in +vNVDIMM label emulation and live migration). 
+ +References +-- + +[1] SNIA NVM Programming Model: https://www.snia.org/sites/default/files/technical_work/final/NVMProgrammingModel_v1.2.pdf +[2] PMDK: http://pmem.io/pmdk/ diff --git a/exec.c b/exec.c index fa33c29..dedeb4d 100644 --- a/exec.c +++ b/exec.c @@ -52,6 +52,9 @@ #include #endif +/* RAM is backed by the persistent memory. */ +#define RAM_PMEM (1 << 3) + #endif #include "qemu/rcu_queue.h" #include "qemu/main-loop.h" @@ -2037,6 +2040,7 @@ RAMBlock *qemu_ram_alloc_from_fd(ram_addr_t size, MemoryRegion *mr, Error *local_err = NULL; int64_t file_size; bool share = flags & QEMU_RAM_SHARE; +bool is_pmem = flags & QEMU_RAM_PMEM; if (xen_enabled()) { error_setg(errp, "-mem-path not supported with Xen"); @@ -2073,7 +2077,8 @@ RAMBlock *qemu_ram_alloc_from_fd(ram_addr_t size, MemoryRegion *mr, new_block->mr = mr; new_block->used_length = size; new_block->max_length = size; -new_block->flags = share ? RAM_SHARED : 0; +new_block->flags = (share ? RAM_
[Qemu-devel] [PATCH V5 0/9] nvdimm: guarantee persistence of QEMU writes to persistent memory
From: Junyan He QEMU writes to vNVDIMM backends in the vNVDIMM label emulation and live migration. If the backend is on the persistent memory, QEMU needs to take proper operations to ensure its writes are persistent on the persistent memory. Otherwise, a host power failure may result in the loss of the guest data on the persistent memory. This patch series is based on Marcel's patch "mem: add share parameter to memory-backend-ram" [1] because of the changes in patch 1. [1] https://lists.gnu.org/archive/html/qemu-devel/2018-02/msg03858.html Previous versions can be found at V4: https://lists.gnu.org/archive/html/qemu-devel/2018-02/msg06993.html v3: https://lists.gnu.org/archive/html/qemu-devel/2018-02/msg04365.html v2: https://lists.gnu.org/archive/html/qemu-devel/2018-02/msg01579.html v1: https://lists.gnu.org/archive/html/qemu-devel/2017-12/msg05040.html Changes in v5: * (Patch 9) Add post copy check and output some messages for nvdimm. Changes in v4: * (Patch 2) Fix compilation errors found by patchew. Changes in v3: * (Patch 5) Add an is_pmem flag to ram_handle_compressed() and handle PMEM writes in it, so we don't need the _common function. * (Patch 6) Expose qemu_get_buffer_common so we can remove the unnecessary qemu_get_buffer_to_pmem wrapper. * (Patch 8) Add an is_pmem flag to xbzrle_decode_buffer() and handle PMEM writes in it, so we can remove the unnecessary xbzrle_decode_buffer_{common, to_pmem}. * Move libpmem stubs to stubs/pmem.c and fix the compilation failures of test-{xbzrle,vmstate}.c. Changes in v2: * (Patch 1) Use a flags parameter in file ram allocation functions. * (Patch 2) Add a new option 'pmem' to hostmem-file. * (Patch 3) Use libpmem to operate on the persistent memory, rather than re-implementing those operations in QEMU. * (Patch 5-8) Consider the write persistence in the migration path.
Haozhong Zhang (8): [1/9] memory, exec: switch file ram allocation functions to 'flags' parameters [2/9] hostmem-file: add the 'pmem' option [3/9] configure: add libpmem support [4/9] mem/nvdimm: ensure write persistence to PMEM in label emulation [5/9] migration/ram: ensure write persistence on loading zero pages to PMEM [6/9] migration/ram: ensure write persistence on loading normal pages to PMEM [7/9] migration/ram: ensure write persistence on loading compressed pages to PMEM [8/9] migration/ram: ensure write persistence on loading xbzrle pages to PMEM Junyan He (1): [9/9] migration/ram: Add check and info message to nvdimm post copy. Signed-off-by: Haozhong Zhang Signed-off-by: Junyan He --- backends/hostmem-file.c | 27 ++- configure | 35 +++ docs/nvdimm.txt | 14 ++ exec.c | 20 hw/mem/nvdimm.c | 9 - include/exec/memory.h | 12 ++-- include/exec/ram_addr.h | 28 ++-- include/migration/qemu-file-types.h | 2 ++ include/qemu/pmem.h | 27 +++ memory.c| 8 +--- migration/qemu-file.c | 29 +++-- migration/ram.c | 52 ++-- migration/ram.h | 2 +- migration/rdma.c| 2 +- migration/xbzrle.c | 8 ++-- migration/xbzrle.h | 3 ++- numa.c | 2 +- qemu-options.hx | 7 +++ stubs/Makefile.objs | 1 + stubs/pmem.c| 37 + tests/Makefile.include | 4 ++-- tests/test-xbzrle.c | 4 ++-- 22 files changed, 290 insertions(+), 43 deletions(-) -- 2.7.4
[Qemu-devel] [PATCH 02/10] RFC: Implement qcow2's snapshot dependent saving function.
From: Junyan He For the qcow2 format, we can increase the reference count of the clusters holding a dependent snapshot's content and link their offsets into the L2 table of the new snapshot point. This avoids an explicit dependency relationship between snapshots: when we delete a snapshot point, we just decrease the cluster reference counts, and no further dependency checking is needed. Signed-off-by: Junyan He --- block/qcow2-snapshot.c | 154 + block/qcow2.c | 2 + block/qcow2.h | 7 +++ 3 files changed, 163 insertions(+) diff --git a/block/qcow2-snapshot.c b/block/qcow2-snapshot.c index cee25f5..8e83084 100644 --- a/block/qcow2-snapshot.c +++ b/block/qcow2-snapshot.c @@ -736,3 +736,157 @@ int qcow2_snapshot_load_tmp(BlockDriverState *bs, return 0; } + +int qcow2_snapshot_save_dependency(BlockDriverState *bs, + const char *depend_snapshot_id, + int64_t depend_offset, + int64_t depend_size, + int64_t offset, + Error **errp) +{ +int snapshot_index; +BDRVQcow2State *s = bs->opaque; +QCowSnapshot *sn; +int ret; +int64_t i; +int64_t total_bytes = depend_size; +int64_t depend_offset1, offset1; +uint64_t *depend_l1_table = NULL; +uint64_t depend_l1_bytes; +uint64_t *depend_l2_table = NULL; +uint64_t depend_l2_offset; +uint64_t depend_entry; +QCowL2Meta l2meta; + +assert(bs->read_only == false); + +if (depend_snapshot_id == NULL) { +return 0; +} + +if (!QEMU_IS_ALIGNED(depend_offset, s->cluster_size)) { +error_setg(errp, "Specified snapshot offset is not multiple of %u", +s->cluster_size); +return -EINVAL; +} + +if (!QEMU_IS_ALIGNED(offset, s->cluster_size)) { +error_setg(errp, "Offset is not multiple of %u", s->cluster_size); +return -EINVAL; +} + +if (!QEMU_IS_ALIGNED(depend_size, s->cluster_size)) { +error_setg(errp, "depend_size is not multiple of %u", s->cluster_size); +return -EINVAL; +} + +snapshot_index = find_snapshot_by_id_and_name(bs, NULL, depend_snapshot_id); +/* Search the snapshot */ +if (snapshot_index < 0) { +error_setg(errp, "Can't find snapshot"); +return -ENOENT; +} + +sn = &s->snapshots[snapshot_index];
+if (sn->disk_size != bs->total_sectors * BDRV_SECTOR_SIZE) { +error_report("qcow2: depending on a snapshot with a different disk " +"size is not implemented"); +return -ENOTSUP; +} + +/* We can only save a dependency on a snapshot's vmstate data */ +depend_offset1 = depend_offset + qcow2_vm_state_offset(s); +offset1 = offset + qcow2_vm_state_offset(s); + +depend_l1_bytes = s->l1_size * sizeof(uint64_t); +depend_l1_table = g_try_malloc0(depend_l1_bytes); +if (depend_l1_table == NULL) { +return -ENOMEM; +} + +ret = bdrv_pread(bs->file, sn->l1_table_offset, depend_l1_table, + depend_l1_bytes); +if (ret < 0) { +goto out; +} +for (i = 0; i < depend_l1_bytes / sizeof(uint64_t); i++) { +be64_to_cpus(&depend_l1_table[i]); +} + +while (total_bytes) { +assert(total_bytes > 0); +/* Find the dependent snapshot's cluster */ +depend_l2_offset = +depend_l1_table[depend_offset1 >> (s->l2_bits + s->cluster_bits)]; +depend_l2_offset &= L1E_OFFSET_MASK; +if (depend_l2_offset == 0) { +ret = -EINVAL; +goto out; +} + +if (offset_into_cluster(s, depend_l2_offset)) { +qcow2_signal_corruption(bs, true, -1, -1, "L2 table offset %#" +PRIx64 " unaligned (L1 index: %#" +PRIx64 ")", +depend_l2_offset, +depend_offset1 >> +(s->l2_bits + s->cluster_bits)); +ret = -EIO; +goto out; +} + +ret = qcow2_cache_get(bs, s->l2_table_cache, depend_l2_offset, + (void **)(&depend_l2_table)); +if (ret < 0) { +goto out; +} + +depend_entry = +be64_to_cpu( +depend_l2_table[offset_to_l2_index(s, depend_offset1)]); +if (depend_entry == 0) { +ret = -EINVAL; +qcow2_cache_put(s->l2_table_cache, (void **)(&depend_l2_table)); +goto out; +} + +memset(&l2meta, 0, sizeof(l2meta)); +l2meta.offset = offset1; +l2meta.alloc_off
[Qemu-devel] [PATCH 09/10] RFC: Add nvdimm snapshot saving to migration.
From: Junyan He The nvdimm size is huge, sometimes 256G or even more. This is a huge burden for snapshot saving: one snapshot point with nvdimm may occupy more than 50G of disk space even with compression enabled. We need to introduce a dependent snapshot manner to solve this problem. The first snapshot point should always be saved completely, with dirty log tracing enabled for the nvdimm memory region after saving. Later snapshot points should add references to the previous snapshot's nvdimm data and save only the dirty pages. This can save a lot of disk space and time if snapshot operations are triggered frequently. Signed-off-by: Junyan He --- Makefile.target |1 + include/migration/misc.h |4 + migration/nvdimm.c | 1033 ++ 3 files changed, 1038 insertions(+) create mode 100644 migration/nvdimm.c diff --git a/Makefile.target b/Makefile.target index 6549481..0259e70 100644 --- a/Makefile.target +++ b/Makefile.target @@ -139,6 +139,7 @@ obj-y += memory.o obj-y += memory_mapping.o obj-y += dump.o obj-y += migration/ram.o +obj-y += migration/nvdimm.o LIBS := $(libs_softmmu) $(LIBS) # Hardware support diff --git a/include/migration/misc.h b/include/migration/misc.h index 77fd4f5..0c23da8 100644 --- a/include/migration/misc.h +++ b/include/migration/misc.h @@ -20,6 +20,10 @@ void ram_mig_init(void); +/* migration/nvdimm.c */ +void nvdimm_snapshot_init(void); +bool ram_block_is_nvdimm_active(RAMBlock *block); + /* migration/block.c */ #ifdef CONFIG_LIVE_BLOCK_MIGRATION diff --git a/migration/nvdimm.c b/migration/nvdimm.c new file mode 100644 index 000..8516bb0 --- /dev/null +++ b/migration/nvdimm.c @@ -0,0 +1,1033 @@ +/* + * QEMU System Emulator + * + * Authors: + * He Junyan + * + * Permission is hereby granted, free of charge, to any person obtaining a copy + * of this software and associated documentation files (the "Software"), to deal + * in the Software without restriction, including without limitation the rights + * to use, copy, modify, merge, publish, distribute, 
sublicense, and/or sell + * copies of the Software, and to permit persons to whom the Software is + * furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice shall be included in + * all copies or substantial portions of the Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN + * THE SOFTWARE. + */ + +#include "qemu/osdep.h" +#include "hw/mem/nvdimm.h" +#include "cpu.h" +#include "qemu/cutils.h" +#include "exec/ram_addr.h" +#include "exec/target_page.h" +#include "qemu/rcu_queue.h" +#include "qemu/error-report.h" +#include "migration.h" +#include "qapi/error.h" +#include "migration/register.h" +#include "migration/ram.h" +#include "migration/qemu-file.h" +#include "migration/misc.h" +#include "migration/savevm.h" +#include "block/snapshot.h" +#include "migration/snapshot.h" + +#define NVDIMM_MIG_VERSION 0x01 + +/* PADDING data, useless */ +#define NVDIMM_PADDING_BYTE 0xce +/* PAGE ids: the page is all zero / not all zero */ +#define NVDIMM_ZERO_PAGE_ID 0xaabc250f +#define NVDIMM_NONZERO_PAGE_ID 0xacbc250e +/* No useful data, for alignment only */ +#define NVDIMM_SECTION_PADDING_ID 0xaaceccea +/* Section for dirty log kind */ +#define NVDIMM_SECTION_DIRTY_LOG_ID 0xbbcd0c1e +/* Section for raw data, no bitmap, dump the whole mem */ +#define NVDIMM_SECTION_DATA_ID 0x76bbcae3 +/* Section for setup */ +#define NVDIMM_SECTION_SETUP 0x7ace0cfa +/* Section for complete */ +#define NVDIMM_SECTION_COMPLETE 0x8ace0cfa +/* Section end symbol */ +#define NVDIMM_SECTION_END_ID 0xccbe8752 +/*
+ * Sections:
+ *
+ * Padding section:
+ * | PADDING_ID | size | PADDING_BYTE ... | END_ID |
+ *
+ * Dirty log section:
+ * | DIRTY_LOG_ID | total size | ram name size | ram name | ram size |
+ * | bitmap size | bitmap data ... | dirty page size | dirty page data ... | END_ID |
[Qemu-devel] [PATCH 08/10] RFC: Add a section_id parameter to save_live_iterate call.
From: Junyan He We need to know the section_id when we do snapshot saving, so add a section_id parameter to the save_live_iterate callback. Signed-off-by: Junyan He --- hw/ppc/spapr.c | 2 +- hw/s390x/s390-stattrib.c | 2 +- include/migration/register.h | 2 +- migration/block.c| 2 +- migration/ram.c | 2 +- migration/savevm.c | 2 +- 6 files changed, 6 insertions(+), 6 deletions(-) diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c index 7e1c858..4cde4f4 100644 --- a/hw/ppc/spapr.c +++ b/hw/ppc/spapr.c @@ -1974,7 +1974,7 @@ static int htab_save_later_pass(QEMUFile *f, sPAPRMachineState *spapr, #define MAX_ITERATION_NS 500 /* 5 ms */ #define MAX_KVM_BUF_SIZE 2048 -static int htab_save_iterate(QEMUFile *f, void *opaque) +static int htab_save_iterate(QEMUFile *f, void *opaque, int section_id) { sPAPRMachineState *spapr = opaque; int fd; diff --git a/hw/s390x/s390-stattrib.c b/hw/s390x/s390-stattrib.c index adf07ef..18ece84 100644 --- a/hw/s390x/s390-stattrib.c +++ b/hw/s390x/s390-stattrib.c @@ -246,7 +246,7 @@ static int cmma_save(QEMUFile *f, void *opaque, int final) return ret; } -static int cmma_save_iterate(QEMUFile *f, void *opaque) +static int cmma_save_iterate(QEMUFile *f, void *opaque, int section_id) { return cmma_save(f, opaque, 0); } diff --git a/include/migration/register.h b/include/migration/register.h index f4f7bdc..7f7df2c 100644 --- a/include/migration/register.h +++ b/include/migration/register.h @@ -31,7 +31,7 @@ typedef struct SaveVMHandlers { * use data that is local to the migration thread or protected * by other locks. */ -int (*save_live_iterate)(QEMUFile *f, void *opaque); +int (*save_live_iterate)(QEMUFile *f, void *opaque, int section_id); /* This runs outside the iothread lock! 
*/ int (*save_setup)(QEMUFile *f, void *opaque); diff --git a/migration/block.c b/migration/block.c index 1f03946..6d4c8a3 100644 --- a/migration/block.c +++ b/migration/block.c @@ -755,7 +755,7 @@ static int block_save_setup(QEMUFile *f, void *opaque) return ret; } -static int block_save_iterate(QEMUFile *f, void *opaque) +static int block_save_iterate(QEMUFile *f, void *opaque, int section_id) { int ret; int64_t last_ftell = qemu_ftell(f); diff --git a/migration/ram.c b/migration/ram.c index 3b6c077..d1db422 100644 --- a/migration/ram.c +++ b/migration/ram.c @@ -2249,7 +2249,7 @@ static int ram_save_setup(QEMUFile *f, void *opaque) * @f: QEMUFile where to send the data * @opaque: RAMState pointer */ -static int ram_save_iterate(QEMUFile *f, void *opaque) +static int ram_save_iterate(QEMUFile *f, void *opaque, int section_id) { RAMState **temp = opaque; RAMState *rs = *temp; diff --git a/migration/savevm.c b/migration/savevm.c index 3a9b904..ce4133a 100644 --- a/migration/savevm.c +++ b/migration/savevm.c @@ -1072,7 +1072,7 @@ int qemu_savevm_state_iterate(QEMUFile *f, bool postcopy) save_section_header(f, se, QEMU_VM_SECTION_PART); -ret = se->ops->save_live_iterate(f, se->opaque); +ret = se->ops->save_live_iterate(f, se->opaque, se->section_id); trace_savevm_section_end(se->idstr, se->section_id, ret); save_section_footer(f, se); -- 2.7.4
[Qemu-devel] [PATCH 05/10] RFC: Add memory region snapshot bitmap get function.
From: Junyan He We need to get the snapshot's bitmap content when dirty log tracing is enabled for nvdimm. Signed-off-by: Junyan He --- exec.c | 7 +++ include/exec/memory.h | 9 + include/exec/ram_addr.h | 2 ++ memory.c| 7 +++ 4 files changed, 25 insertions(+) diff --git a/exec.c b/exec.c index a9181e6..3d2bf0d 100644 --- a/exec.c +++ b/exec.c @@ -1235,6 +1235,13 @@ bool cpu_physical_memory_snapshot_get_dirty(DirtyBitmapSnapshot *snap, return false; } +unsigned long *cpu_physical_memory_snapshot_get_dirty_bitmap + (DirtyBitmapSnapshot *snap) +{ +assert(snap); +return snap->dirty; +} + /* Called from RCU critical section */ hwaddr memory_region_section_get_iotlb(CPUState *cpu, MemoryRegionSection *section, diff --git a/include/exec/memory.h b/include/exec/memory.h index 31eae0a..f742995 100644 --- a/include/exec/memory.h +++ b/include/exec/memory.h @@ -1179,6 +1179,15 @@ bool memory_region_snapshot_get_dirty(MemoryRegion *mr, hwaddr addr, hwaddr size); /** + * memory_region_snapshot_get_dirty_bitmap: Get the dirty bitmap data of the + * snapshot. + * + * @snap: the dirty bitmap snapshot + */ +unsigned long *memory_region_snapshot_get_dirty_bitmap + (DirtyBitmapSnapshot *snap); + +/** * memory_region_reset_dirty: Mark a range of pages as clean, for a specified *client. 
* diff --git a/include/exec/ram_addr.h b/include/exec/ram_addr.h index cf2446a..ce366c1 100644 --- a/include/exec/ram_addr.h +++ b/include/exec/ram_addr.h @@ -371,6 +371,8 @@ DirtyBitmapSnapshot *cpu_physical_memory_snapshot_and_clear_dirty bool cpu_physical_memory_snapshot_get_dirty(DirtyBitmapSnapshot *snap, ram_addr_t start, ram_addr_t length); +unsigned long *cpu_physical_memory_snapshot_get_dirty_bitmap +(DirtyBitmapSnapshot *snap); static inline void cpu_physical_memory_clear_dirty_range(ram_addr_t start, ram_addr_t length) diff --git a/memory.c b/memory.c index 4a8a2fe..68f17f0 100644 --- a/memory.c +++ b/memory.c @@ -1991,6 +1991,13 @@ DirtyBitmapSnapshot *memory_region_snapshot_and_clear_dirty(MemoryRegion *mr, memory_region_get_ram_addr(mr) + addr, size, client); } +unsigned long *memory_region_snapshot_get_dirty_bitmap + (DirtyBitmapSnapshot *snap) +{ +assert(snap); +return cpu_physical_memory_snapshot_get_dirty_bitmap(snap); +} + bool memory_region_snapshot_get_dirty(MemoryRegion *mr, DirtyBitmapSnapshot *snap, hwaddr addr, hwaddr size) { -- 2.7.4
[Qemu-devel] [PATCH 06/10] RFC: Add save dependency functions to qemu_file
From: Junyan He When we save a snapshot, we need qemu_file to support save-dependency operations. It should call the block driver's save-dependency functions to implement them. Signed-off-by: Junyan He --- migration/qemu-file.c | 61 +++ migration/qemu-file.h | 14 migration/savevm.c| 33 +--- 3 files changed, 105 insertions(+), 3 deletions(-) diff --git a/migration/qemu-file.c b/migration/qemu-file.c index 2ab2bf3..9d2a39a 100644 --- a/migration/qemu-file.c +++ b/migration/qemu-file.c @@ -46,10 +46,13 @@ struct QEMUFile { int buf_index; int buf_size; /* 0 when writing */ uint8_t buf[IO_BUF_SIZE]; +char ref_name_str[128]; /* maybe snapshot id */ DECLARE_BITMAP(may_free, MAX_IOV_SIZE); struct iovec iov[MAX_IOV_SIZE]; unsigned int iovcnt; +bool support_dependency; +int32_t dependency_aligment; int last_error; }; @@ -745,3 +748,61 @@ void qemu_file_set_blocking(QEMUFile *f, bool block) f->ops->set_blocking(f->opaque, block); } } + +void qemu_file_set_support_dependency(QEMUFile *f, int32_t alignment) +{ +f->dependency_aligment = alignment; +f->support_dependency = true; +} + +bool qemu_file_is_support_dependency(QEMUFile *f, int32_t *alignment) +{ +if (f->support_dependency && alignment) { +*alignment = f->dependency_aligment; +} + +return f->support_dependency; +} + +/* This function sets the reference name for snapshot usage. Sometimes we need + * to depend on another snapshot's data to avoid redundancy. 
+ */ +bool qemu_file_set_ref_name(QEMUFile *f, const char *name) +{ +if (strlen(name) + 1 > sizeof(f->ref_name_str)) { +return false; +} + +memcpy(f->ref_name_str, name, strlen(name) + 1); +return true; +} + +ssize_t qemu_file_save_dependency(QEMUFile *f, int64_t depend_offset, + int64_t size) +{ +ssize_t ret; + +if (f->support_dependency == false) { +return -1; +} + +assert(f->ops->save_dependency); + +if (!QEMU_IS_ALIGNED(depend_offset, f->dependency_aligment)) { +return -1; +} + +qemu_fflush(f); + +if (!QEMU_IS_ALIGNED(f->pos, f->dependency_aligment)) { +return -1; +} + +ret = f->ops->save_dependency(f->opaque, f->ref_name_str, + depend_offset, size, f->pos); +if (ret > 0) { +f->pos += size; +} + +return ret; +} diff --git a/migration/qemu-file.h b/migration/qemu-file.h index aae4e5e..137b917 100644 --- a/migration/qemu-file.h +++ b/migration/qemu-file.h @@ -57,6 +57,14 @@ typedef ssize_t (QEMUFileWritevBufferFunc)(void *opaque, struct iovec *iov, int iovcnt, int64_t pos); /* + * This function adds a reference to the dependency data in the snapshot + * specified by ref_name_str at this file's current offset + */ +typedef ssize_t (QEMUFileSaveDependencyFunc)(void *opaque, const char *name, + int64_t depend_offset, + int64_t offset, int64_t size); + +/* * This function provides hooks around different * stages of RAM migration. 
* 'opaque' is the backend specific data in QEMUFile @@ -104,6 +112,7 @@ typedef struct QEMUFileOps { QEMUFileWritevBufferFunc *writev_buffer; QEMURetPathFunc *get_return_path; QEMUFileShutdownFunc *shut_down; +QEMUFileSaveDependencyFunc *save_dependency; } QEMUFileOps; typedef struct QEMUFileHooks { @@ -153,6 +162,11 @@ int qemu_file_shutdown(QEMUFile *f); QEMUFile *qemu_file_get_return_path(QEMUFile *f); void qemu_fflush(QEMUFile *f); void qemu_file_set_blocking(QEMUFile *f, bool block); +bool qemu_file_set_ref_name(QEMUFile *f, const char *name); +void qemu_file_set_support_dependency(QEMUFile *f, int32_t alignment); +bool qemu_file_is_support_dependency(QEMUFile *f, int32_t *alignment); +ssize_t qemu_file_save_dependency(QEMUFile *f, int64_t depend_offset, + int64_t size); size_t qemu_get_counted_string(QEMUFile *f, char buf[256]); diff --git a/migration/savevm.c b/migration/savevm.c index 358c5b5..1bbd6aa 100644 --- a/migration/savevm.c +++ b/migration/savevm.c @@ -196,6 +196,20 @@ static ssize_t block_writev_buffer(void *opaque, struct iovec *iov, int iovcnt, return qiov.size; } +static ssize_t block_save_dependency(void *opaque, const char *id_name, + int64_t depend_offset, + int64_t offset, int64_t size) +{ +int ret = bdrv_snapshot_save_dependency(opaque, id_name, +depend_offset, offset, +size, NULL); +if (ret < 0) { +return r
[Qemu-devel] [PATCH 03/10] RFC: Implement save and support snapshot dependency in block driver layer.
From: Junyan He Signed-off-by: Junyan He --- block/snapshot.c | 45 + include/block/snapshot.h | 7 +++ 2 files changed, 52 insertions(+) diff --git a/block/snapshot.c b/block/snapshot.c index eacc1f1..8cc40ac 100644 --- a/block/snapshot.c +++ b/block/snapshot.c @@ -401,6 +401,51 @@ int bdrv_snapshot_load_tmp_by_id_or_name(BlockDriverState *bs, return ret; } +int bdrv_snapshot_save_dependency(BlockDriverState *bs, + const char *depend_snapshot_id, + int64_t depend_offset, + int64_t depend_size, + int64_t offset, + Error **errp) +{ +BlockDriver *drv = bs->drv; + +if (!drv) { +return -ENOMEDIUM; +} + +if (drv->bdrv_snapshot_save_dependency) { +return drv->bdrv_snapshot_save_dependency(bs, depend_snapshot_id, + depend_offset, depend_size, + offset, errp); +} + +if (bs->file) { +return bdrv_snapshot_save_dependency(bs->file->bs, depend_snapshot_id, + depend_offset, depend_size, + offset, errp); +} + +return -ENOTSUP; +} + +int bdrv_snapshot_support_dependency(BlockDriverState *bs, int32_t *alignment) +{ +BlockDriver *drv = bs->drv; +if (!drv || !bdrv_is_inserted(bs) || bdrv_is_read_only(bs)) { +return 0; +} + +if (drv->bdrv_snapshot_support_dependency) { +return drv->bdrv_snapshot_support_dependency(bs, alignment); +} + +if (bs->file != NULL) { +return bdrv_snapshot_support_dependency(bs->file->bs, alignment); +} + +return -ENOTSUP; +} /* Group operations. All block drivers are involved. 
* These functions will properly handle dataplane (take aio_context_acquire diff --git a/include/block/snapshot.h b/include/block/snapshot.h index f73d109..e5bf06f 100644 --- a/include/block/snapshot.h +++ b/include/block/snapshot.h @@ -73,6 +73,13 @@ int bdrv_snapshot_load_tmp(BlockDriverState *bs, int bdrv_snapshot_load_tmp_by_id_or_name(BlockDriverState *bs, const char *id_or_name, Error **errp); +int bdrv_snapshot_save_dependency(BlockDriverState *bs, + const char *depend_snapshot_id, + int64_t depend_offset, + int64_t depend_size, + int64_t offset, + Error **errp); +int bdrv_snapshot_support_dependency(BlockDriverState *bs, int32_t *alignment); /* Group operations. All block drivers are involved. -- 2.7.4
[Qemu-devel] [PATCH 10/10] RFC: Enable nvdimm snapshot functions.
From: Junyan He In snapshot saving, all nvdimm kind memory will be saved in a different way, so we exclude all nvdimm kind memory regions in ram.c. Signed-off-by: Junyan He --- migration/ram.c | 17 + vl.c| 1 + 2 files changed, 18 insertions(+) diff --git a/migration/ram.c b/migration/ram.c index d1db422..ad32469 100644 --- a/migration/ram.c +++ b/migration/ram.c @@ -1219,9 +1219,15 @@ static bool find_dirty_block(RAMState *rs, PageSearchStatus *pss, bool *again) /* Didn't find anything in this RAM Block */ pss->page = 0; pss->block = QLIST_NEXT_RCU(pss->block, next); +while (pss->block && ram_block_is_nvdimm_active(pss->block)) { +pss->block = QLIST_NEXT_RCU(pss->block, next); +} if (!pss->block) { /* Hit the end of the list */ pss->block = QLIST_FIRST_RCU(&ram_list.blocks); +while (pss->block && ram_block_is_nvdimm_active(pss->block)) { +pss->block = QLIST_NEXT_RCU(pss->block, next); +} /* Flag that we've looped */ pss->complete_round = true; rs->ram_bulk_stage = false; @@ -1541,6 +1547,9 @@ static int ram_find_and_save_block(RAMState *rs, bool last_stage) if (!pss.block) { pss.block = QLIST_FIRST_RCU(&ram_list.blocks); +while (pss.block && ram_block_is_nvdimm_active(pss.block)) { +pss.block = QLIST_NEXT_RCU(pss.block, next); +} } do { @@ -1583,6 +1592,10 @@ uint64_t ram_bytes_total(void) rcu_read_lock(); RAMBLOCK_FOREACH(block) { +if (ram_block_is_nvdimm_active(block)) { +/* In snapshot mode nvdimm blocks are saved by migration/nvdimm.c */ +continue; +} total += block->used_length; } rcu_read_unlock(); @@ -,6 +2235,10 @@ static int ram_save_setup(QEMUFile *f, void *opaque) qemu_put_be64(f, ram_bytes_total() | RAM_SAVE_FLAG_MEM_SIZE); RAMBLOCK_FOREACH(block) { +if (ram_block_is_nvdimm_active(block)) { +/* In snapshot mode nvdimm blocks are saved by migration/nvdimm.c */ +continue; +} qemu_put_byte(f, strlen(block->idstr)); qemu_put_buffer(f, (uint8_t *)block->idstr, strlen(block->idstr)); qemu_put_be64(f, block->used_length); diff --git a/vl.c b/vl.c index 3ef04ce..1bd5711 100644 --- a/vl.c +++ b/vl.c @@ -4502,6 +4502,7 
@@ int main(int argc, char **argv, char **envp) blk_mig_init(); ram_mig_init(); +nvdimm_snapshot_init(); /* If the currently selected machine wishes to override the units-per-bus * property of its default HBA interface type, do so now. */ -- 2.7.4
[Qemu-devel] [PATCH 00/10] RFC: Optimize nvdimm kind memory for snapshot.
From: Junyan He The nvdimm size is huge, sometimes 256G or even more. This is a huge burden for snapshot saving: one snapshot point with nvdimm may occupy more than 50G of disk space even with compression enabled. We need to introduce a dependent snapshot manner to solve this problem. The first snapshot point should always be saved completely, with dirty log tracing enabled for the nvdimm memory region after saving. Later snapshot points should add references to the previous snapshot's nvdimm data and save only the dirty pages. This can save a lot of disk space and time if snapshot operations are triggered frequently. We first add save_snapshot_dependency functions to the qcow2 format; a later snapshot then adds references to the dependent snapshot's data clusters. There is an alignment problem here: the dependent data must always be cluster aligned, so we add padding data when saving the snapshot to keep it cluster aligned. The logic between nvdimm and ram for snapshot saving is a little confused now: we need to exclude nvdimm kind memory regions from the ram list, and the dirty log tracing setup is also not very clean. Maybe we can separate snapshot saving from the migration logic later to make the code cleaner. In theory, this manner can apply to any kind of memory, but because it needs dirty log tracing turned on, performance may decline. So we enable it only for nvdimm kind memory for now. 
Signed-off-by: Junyan He --- Makefile.target |1 + block/qcow2-snapshot.c | 154 ++ block/qcow2.c|2 + block/qcow2.h|7 + block/snapshot.c | 45 +++ exec.c |7 + hw/ppc/spapr.c |2 +- hw/s390x/s390-stattrib.c |2 +- include/block/block_int.h|9 ++ include/block/snapshot.h |7 + include/exec/memory.h|9 ++ include/exec/ram_addr.h |2 + include/migration/misc.h |4 + include/migration/register.h |2 +- include/migration/snapshot.h |3 + memory.c | 18 ++- migration/block.c|2 +- migration/nvdimm.c | 1033 + migration/qemu-file.c| 61 + migration/qemu-file.h| 14 ++ migration/ram.c | 19 ++- migration/savevm.c | 62 - vl.c |1 + 23 files changed, 1452 insertions(+), 14 deletions(-)
[Qemu-devel] [PATCH 07/10] RFC: Add get_current_snapshot_info to get the snapshot state.
From: Junyan He We need to know the snapshot saving information when we do dependent snapshot saving, e.g. the name of the previous snapshot. Add a global function to query whether a snapshot is currently being saved and to retrieve its state. Signed-off-by: Junyan He --- include/migration/snapshot.h | 3 +++ migration/savevm.c | 27 +++ 2 files changed, 30 insertions(+) diff --git a/include/migration/snapshot.h b/include/migration/snapshot.h index c85b6ec..0b950ce 100644 --- a/include/migration/snapshot.h +++ b/include/migration/snapshot.h @@ -15,7 +15,10 @@ #ifndef QEMU_MIGRATION_SNAPSHOT_H #define QEMU_MIGRATION_SNAPSHOT_H +#include "block/snapshot.h" + int save_snapshot(const char *name, Error **errp); int load_snapshot(const char *name, Error **errp); +int get_current_snapshot_info(QEMUSnapshotInfo *sn); #endif diff --git a/migration/savevm.c b/migration/savevm.c index 1bbd6aa..3a9b904 100644 --- a/migration/savevm.c +++ b/migration/savevm.c @@ -2212,6 +2212,29 @@ int qemu_loadvm_state(QEMUFile *f) return ret; } +static int in_snap_saving; +static QEMUSnapshotInfo in_snap_saving_sn; + +int get_current_snapshot_info(QEMUSnapshotInfo *sn) +{ +if (in_snap_saving && sn) { +memcpy(sn, &in_snap_saving_sn, sizeof(QEMUSnapshotInfo)); +} + +return in_snap_saving; +} + +static void set_current_snapshot_info(QEMUSnapshotInfo *sn) +{ +if (sn) { +memcpy(&in_snap_saving_sn, sn, sizeof(QEMUSnapshotInfo)); +in_snap_saving = 1; +} else { +memset(&in_snap_saving_sn, 0, sizeof(QEMUSnapshotInfo)); +in_snap_saving = 0; +} +} + int save_snapshot(const char *name, Error **errp) { BlockDriverState *bs, *bs1; @@ -2282,6 +2305,8 @@ int save_snapshot(const char *name, Error **errp) strftime(sn->name, sizeof(sn->name), "vm-%Y%m%d%H%M%S", &tm); } +set_current_snapshot_info(sn); + /* save the VM state */ f = qemu_fopen_bdrv(bs, 1); if (!f) { @@ -2313,6 +2338,8 @@ int save_snapshot(const char *name, Error **errp) ret = 0; the_end: +set_current_snapshot_info(NULL); + if (aio_context) { aio_context_release(aio_context); } -- 2.7.4
[Qemu-devel] [PATCH 04/10] RFC: Set memory_region_set_log available for more client.
From: Junyan He We need to collect the dirty log for nvdimm kind memory, so enable memory_region_set_log for more clients than just VGA. Signed-off-by: Junyan He --- memory.c | 11 ++- 1 file changed, 6 insertions(+), 5 deletions(-) diff --git a/memory.c b/memory.c index e70b64b..4a8a2fe 100644 --- a/memory.c +++ b/memory.c @@ -1921,11 +1921,12 @@ void memory_region_set_log(MemoryRegion *mr, bool log, unsigned client) uint8_t mask = 1 << client; uint8_t old_logging; -assert(client == DIRTY_MEMORY_VGA); -old_logging = mr->vga_logging_count; -mr->vga_logging_count += log ? 1 : -1; -if (!!old_logging == !!mr->vga_logging_count) { -return; +if (client == DIRTY_MEMORY_VGA) { +old_logging = mr->vga_logging_count; +mr->vga_logging_count += log ? 1 : -1; +if (!!old_logging == !!mr->vga_logging_count) { +return; +} } memory_region_transaction_begin(); -- 2.7.4
[Qemu-devel] [PATCH 01/10] RFC: Add save and support snapshot dependency function to block driver.
From: Junyan He We want to support incremental snapshot saving; this needs the image format to support dependency saving. Later snapshots may reference the dependent snapshot's content, which should normally be cluster aligned. Add a query function to check whether the format supports this, and use the save_dependency function to do the real work. Signed-off-by: Junyan He --- include/block/block_int.h | 9 + 1 file changed, 9 insertions(+) diff --git a/include/block/block_int.h b/include/block/block_int.h index 64a5700..be1eca3 100644 --- a/include/block/block_int.h +++ b/include/block/block_int.h @@ -274,6 +274,15 @@ struct BlockDriver { const char *snapshot_id, const char *name, Error **errp); +int (*bdrv_snapshot_save_dependency)(BlockDriverState *bs, + const char *depend_snapshot_id, + int64_t depend_offset, + int64_t depend_size, + int64_t offset, + Error **errp); +int (*bdrv_snapshot_support_dependency)(BlockDriverState *bs, +int32_t *alignment); + int (*bdrv_get_info)(BlockDriverState *bs, BlockDriverInfo *bdi); ImageInfoSpecific *(*bdrv_get_specific_info)(BlockDriverState *bs); -- 2.7.4
[Qemu-devel] [PATCH 07/10] RFC: Add get_current_snapshot_info to get the snapshot state.
From: Junyan He We need to know the snapshot saving information when we do dependent snapshot saving, e.g the name of previous snapshot. Add this global function to query the snapshot status is usable. Signed-off-by: Junyan He --- include/migration/snapshot.h | 3 +++ migration/savevm.c | 27 +++ 2 files changed, 30 insertions(+) diff --git a/include/migration/snapshot.h b/include/migration/snapshot.h index c85b6ec..0b950ce 100644 --- a/include/migration/snapshot.h +++ b/include/migration/snapshot.h @@ -15,7 +15,10 @@ #ifndef QEMU_MIGRATION_SNAPSHOT_H #define QEMU_MIGRATION_SNAPSHOT_H +#include "block/snapshot.h" + int save_snapshot(const char *name, Error **errp); int load_snapshot(const char *name, Error **errp); +int get_current_snapshot_info(QEMUSnapshotInfo *sn); #endif diff --git a/migration/savevm.c b/migration/savevm.c index 1bbd6aa..3a9b904 100644 --- a/migration/savevm.c +++ b/migration/savevm.c @@ -2212,6 +2212,29 @@ int qemu_loadvm_state(QEMUFile *f) return ret; } +static int in_snap_saving; +static QEMUSnapshotInfo in_snap_saving_sn; + +int get_current_snapshot_info(QEMUSnapshotInfo *sn) +{ +if (in_snap_saving && sn) { +memcpy(sn, &in_snap_saving_sn, sizeof(QEMUSnapshotInfo)); +} + +return in_snap_saving; +} + +static void set_current_snapshot_info(QEMUSnapshotInfo *sn) +{ +if (sn) { +memcpy(&in_snap_saving_sn, sn, sizeof(QEMUSnapshotInfo)); +in_snap_saving = 1; +} else { +memset(&in_snap_saving_sn, 0, sizeof(QEMUSnapshotInfo)); +in_snap_saving = 0; +} +} + int save_snapshot(const char *name, Error **errp) { BlockDriverState *bs, *bs1; @@ -2282,6 +2305,8 @@ int save_snapshot(const char *name, Error **errp) strftime(sn->name, sizeof(sn->name), "vm-%Y%m%d%H%M%S", &tm); } +set_current_snapshot_info(sn); + /* save the VM state */ f = qemu_fopen_bdrv(bs, 1); if (!f) { @@ -2313,6 +2338,8 @@ int save_snapshot(const char *name, Error **errp) ret = 0; the_end: +set_current_snapshot_info(NULL); + if (aio_context) { aio_context_release(aio_context); } -- 2.7.4
[Qemu-devel] [PATCH 09/10] RFC: Add nvdimm snapshot saving to migration.
From: Junyan He The nvdimm size is huge, sometimes is more than 256G or even more. This is a huge burden for snapshot saving. One snapshot point with nvdimm may occupy more than 50G disk space even with compression enabled. We need to introduce dependent snapshot manner to solve this problem. The first snapshot point should always be saved completely, and enable dirty log trace after saving for nvdimm memory region. The later snapshot point should add the reference to previous snapshot's nvdimm data and just saving dirty pages. This can save a lot of disk and time if the snapshot operations are triggered frequently. Signed-off-by: Junyan He --- Makefile.target |1 + include/migration/misc.h |4 + migration/nvdimm.c | 1033 ++ 3 files changed, 1038 insertions(+) create mode 100644 migration/nvdimm.c diff --git a/Makefile.target b/Makefile.target index 6549481..0259e70 100644 --- a/Makefile.target +++ b/Makefile.target @@ -139,6 +139,7 @@ obj-y += memory.o obj-y += memory_mapping.o obj-y += dump.o obj-y += migration/ram.o +obj-y += migration/nvdimm.o LIBS := $(libs_softmmu) $(LIBS) # Hardware support diff --git a/include/migration/misc.h b/include/migration/misc.h index 77fd4f5..0c23da8 100644 --- a/include/migration/misc.h +++ b/include/migration/misc.h @@ -20,6 +20,10 @@ void ram_mig_init(void); +/* migration/nvdimm.c */ +void nvdimm_snapshot_init(void); +bool ram_block_is_nvdimm_active(RAMBlock *block); + /* migration/block.c */ #ifdef CONFIG_LIVE_BLOCK_MIGRATION diff --git a/migration/nvdimm.c b/migration/nvdimm.c new file mode 100644 index 000..8516bb0 --- /dev/null +++ b/migration/nvdimm.c @@ -0,0 +1,1033 @@ +/* + * QEMU System Emulator + * + * Authors: + * He Junyan + * + * Permission is hereby granted, free of charge, to any person obtaining a copy + * of this software and associated documentation files (the "Software"), to deal + * in the Software without restriction, including without limitation the rights + * to use, copy, modify, merge, publish, distribute, 
sublicense, and/or sell + * copies of the Software, and to permit persons to whom the Software is + * furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice shall be included in + * all copies or substantial portions of the Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN + * THE SOFTWARE. + */ + +#include "qemu/osdep.h" +#include "hw/mem/nvdimm.h" +#include "cpu.h" +#include "qemu/cutils.h" +#include "exec/ram_addr.h" +#include "exec/target_page.h" +#include "qemu/rcu_queue.h" +#include "qemu/error-report.h" +#include "migration.h" +#include "qapi/error.h" +#include "migration/register.h" +#include "migration/ram.h" +#include "migration/qemu-file.h" +#include "migration.h" +#include "migration/misc.h" +#include "migration/savevm.h" +#include "block/snapshot.h" +#include "migration/snapshot.h" + +#define NVDIMM_MIG_VERSION 0x01 + +/* PADDING data, useless */ +#define NVDIMM_PADDING_BYTE 0xce +/* PAGE id, is all zero */ +#define NVDIMM_ZERO_PAGE_ID 0xaabc250f +#define NVDIMM_NONZERO_PAGE_ID 0xacbc250e +/* No usage date, for alignment only */ +#define NVDIMM_SECTION_PADDING_ID 0xaaceccea +/* Section for dirty log kind */ +#define NVDIMM_SECTION_DIRTY_LOG_ID 0xbbcd0c1e +/* Section for raw data, no bitmap, dump the whole mem */ +#define NVDIMM_SECTION_DATA_ID 0x76bbcae3 +/* Section for setup */ +#define NVDIMM_SECTION_SETUP 0x7ace0cfa +/* Section for setup */ +#define NVDIMM_SECTION_COMPLETE 0x8ace0cfa +/* Section end symbol */ +#define NVDIMM_SECTION_END_ID 0xccbe8752 +/ 
**** Sections ****
+ *
+ * Padding section:
+ * | PADDING_ID | size | PADDING_BYTE ... | END_ID |
+ *
+ * Dirty log section:
+ * | DIRTY_BITMAP_ID | total size | ram name size | ram name | ram size |
+ * | bitmap size | bitmap data ... | dirty page size | dirty page data ... |
+ * | END_ID |
---
[Qemu-devel] [PATCH 05/10] RFC: Add memory region snapshot bitmap get function.
From: Junyan He We need to get the bitmap content of the snapshot when dirty log tracing is enabled for nvdimm. Signed-off-by: Junyan He --- exec.c | 7 +++ include/exec/memory.h | 9 + include/exec/ram_addr.h | 2 ++ memory.c| 7 +++ 4 files changed, 25 insertions(+) diff --git a/exec.c b/exec.c index a9181e6..3d2bf0d 100644 --- a/exec.c +++ b/exec.c @@ -1235,6 +1235,13 @@ bool cpu_physical_memory_snapshot_get_dirty(DirtyBitmapSnapshot *snap, return false; } +unsigned long *cpu_physical_memory_snapshot_get_dirty_bitmap + (DirtyBitmapSnapshot *snap) +{ +assert(snap); +return snap->dirty; +} + /* Called from RCU critical section */ hwaddr memory_region_section_get_iotlb(CPUState *cpu, MemoryRegionSection *section, diff --git a/include/exec/memory.h b/include/exec/memory.h index 31eae0a..f742995 100644 --- a/include/exec/memory.h +++ b/include/exec/memory.h @@ -1179,6 +1179,15 @@ bool memory_region_snapshot_get_dirty(MemoryRegion *mr, hwaddr addr, hwaddr size); /** + * memory_region_snapshot_get_dirty_bitmap: Get the dirty bitmap data of + * the snapshot. + * + * @snap: the dirty bitmap snapshot + */ +unsigned long *memory_region_snapshot_get_dirty_bitmap + (DirtyBitmapSnapshot *snap); + +/** * memory_region_reset_dirty: Mark a range of pages as clean, for a specified *client. 
* diff --git a/include/exec/ram_addr.h b/include/exec/ram_addr.h index cf2446a..ce366c1 100644 --- a/include/exec/ram_addr.h +++ b/include/exec/ram_addr.h @@ -371,6 +371,8 @@ DirtyBitmapSnapshot *cpu_physical_memory_snapshot_and_clear_dirty bool cpu_physical_memory_snapshot_get_dirty(DirtyBitmapSnapshot *snap, ram_addr_t start, ram_addr_t length); +unsigned long *cpu_physical_memory_snapshot_get_dirty_bitmap +(DirtyBitmapSnapshot *snap); static inline void cpu_physical_memory_clear_dirty_range(ram_addr_t start, ram_addr_t length) diff --git a/memory.c b/memory.c index 4a8a2fe..68f17f0 100644 --- a/memory.c +++ b/memory.c @@ -1991,6 +1991,13 @@ DirtyBitmapSnapshot *memory_region_snapshot_and_clear_dirty(MemoryRegion *mr, memory_region_get_ram_addr(mr) + addr, size, client); } +unsigned long *memory_region_snapshot_get_dirty_bitmap + (DirtyBitmapSnapshot *snap) +{ +assert(snap); +return cpu_physical_memory_snapshot_get_dirty_bitmap(snap); +} + bool memory_region_snapshot_get_dirty(MemoryRegion *mr, DirtyBitmapSnapshot *snap, hwaddr addr, hwaddr size) { -- 2.7.4
[Qemu-devel] [PATCH 08/10] RFC: Add a section_id parameter to save_live_iterate call.
From: Junyan He We need to know the section_id when we do snapshot saving. Add a parameter to save_live_iterate function call. Signed-off-by: Junyan He --- hw/ppc/spapr.c | 2 +- hw/s390x/s390-stattrib.c | 2 +- include/migration/register.h | 2 +- migration/block.c| 2 +- migration/ram.c | 2 +- migration/savevm.c | 2 +- 6 files changed, 6 insertions(+), 6 deletions(-) diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c index 7e1c858..4cde4f4 100644 --- a/hw/ppc/spapr.c +++ b/hw/ppc/spapr.c @@ -1974,7 +1974,7 @@ static int htab_save_later_pass(QEMUFile *f, sPAPRMachineState *spapr, #define MAX_ITERATION_NS500 /* 5 ms */ #define MAX_KVM_BUF_SIZE2048 -static int htab_save_iterate(QEMUFile *f, void *opaque) +static int htab_save_iterate(QEMUFile *f, void *opaque, int section_id) { sPAPRMachineState *spapr = opaque; int fd; diff --git a/hw/s390x/s390-stattrib.c b/hw/s390x/s390-stattrib.c index adf07ef..18ece84 100644 --- a/hw/s390x/s390-stattrib.c +++ b/hw/s390x/s390-stattrib.c @@ -246,7 +246,7 @@ static int cmma_save(QEMUFile *f, void *opaque, int final) return ret; } -static int cmma_save_iterate(QEMUFile *f, void *opaque) +static int cmma_save_iterate(QEMUFile *f, void *opaque, int section_id) { return cmma_save(f, opaque, 0); } diff --git a/include/migration/register.h b/include/migration/register.h index f4f7bdc..7f7df2c 100644 --- a/include/migration/register.h +++ b/include/migration/register.h @@ -31,7 +31,7 @@ typedef struct SaveVMHandlers { * use data that is local to the migration thread or protected * by other locks. */ -int (*save_live_iterate)(QEMUFile *f, void *opaque); +int (*save_live_iterate)(QEMUFile *f, void *opaque, int section_id); /* This runs outside the iothread lock! 
*/ int (*save_setup)(QEMUFile *f, void *opaque); diff --git a/migration/block.c b/migration/block.c index 1f03946..6d4c8a3 100644 --- a/migration/block.c +++ b/migration/block.c @@ -755,7 +755,7 @@ static int block_save_setup(QEMUFile *f, void *opaque) return ret; } -static int block_save_iterate(QEMUFile *f, void *opaque) +static int block_save_iterate(QEMUFile *f, void *opaque, int section_id) { int ret; int64_t last_ftell = qemu_ftell(f); diff --git a/migration/ram.c b/migration/ram.c index 3b6c077..d1db422 100644 --- a/migration/ram.c +++ b/migration/ram.c @@ -2249,7 +2249,7 @@ static int ram_save_setup(QEMUFile *f, void *opaque) * @f: QEMUFile where to send the data * @opaque: RAMState pointer */ -static int ram_save_iterate(QEMUFile *f, void *opaque) +static int ram_save_iterate(QEMUFile *f, void *opaque, int section_id) { RAMState **temp = opaque; RAMState *rs = *temp; diff --git a/migration/savevm.c b/migration/savevm.c index 3a9b904..ce4133a 100644 --- a/migration/savevm.c +++ b/migration/savevm.c @@ -1072,7 +1072,7 @@ int qemu_savevm_state_iterate(QEMUFile *f, bool postcopy) save_section_header(f, se, QEMU_VM_SECTION_PART); -ret = se->ops->save_live_iterate(f, se->opaque); +ret = se->ops->save_live_iterate(f, se->opaque, se->section_id); trace_savevm_section_end(se->idstr, se->section_id, ret); save_section_footer(f, se); -- 2.7.4
[Qemu-devel] [PATCH 04/10] RFC: Set memory_region_set_log available for more client.
From: Junyan He We need to collect the dirty log for nvdimm memory, so we enable memory_region_set_log for clients other than just VGA. Signed-off-by: Junyan He --- memory.c | 11 ++- 1 file changed, 6 insertions(+), 5 deletions(-) diff --git a/memory.c b/memory.c index e70b64b..4a8a2fe 100644 --- a/memory.c +++ b/memory.c @@ -1921,11 +1921,12 @@ void memory_region_set_log(MemoryRegion *mr, bool log, unsigned client) uint8_t mask = 1 << client; uint8_t old_logging; -assert(client == DIRTY_MEMORY_VGA); -old_logging = mr->vga_logging_count; -mr->vga_logging_count += log ? 1 : -1; -if (!!old_logging == !!mr->vga_logging_count) { -return; +if (client == DIRTY_MEMORY_VGA) { +old_logging = mr->vga_logging_count; +mr->vga_logging_count += log ? 1 : -1; +if (!!old_logging == !!mr->vga_logging_count) { +return; +} } memory_region_transaction_begin(); -- 2.7.4
[Qemu-devel] [PATCH 03/10] RFC: Implement save and support snapshot dependency in block driver layer.
From: Junyan He Signed-off-by: Junyan He --- block/snapshot.c | 45 + include/block/snapshot.h | 7 +++ 2 files changed, 52 insertions(+) diff --git a/block/snapshot.c b/block/snapshot.c index eacc1f1..8cc40ac 100644 --- a/block/snapshot.c +++ b/block/snapshot.c @@ -401,6 +401,51 @@ int bdrv_snapshot_load_tmp_by_id_or_name(BlockDriverState *bs, return ret; } +int bdrv_snapshot_save_dependency(BlockDriverState *bs, + const char *depend_snapshot_id, + int64_t depend_offset, + int64_t depend_size, + int64_t offset, + Error **errp) +{ +BlockDriver *drv = bs->drv; + +if (!drv) { +return -ENOMEDIUM; +} + +if (drv->bdrv_snapshot_save_dependency) { +return drv->bdrv_snapshot_save_dependency(bs, depend_snapshot_id, + depend_offset, depend_size, + offset, errp); +} + +if (bs->file) { +return bdrv_snapshot_save_dependency(bs->file->bs, depend_snapshot_id, + depend_offset, depend_size, + offset, errp); +} + +return -ENOTSUP; +} + +int bdrv_snapshot_support_dependency(BlockDriverState *bs, int32_t *alignment) +{ +BlockDriver *drv = bs->drv; +if (!drv || !bdrv_is_inserted(bs) || bdrv_is_read_only(bs)) { +return 0; +} + +if (drv->bdrv_snapshot_support_dependency) { +return drv->bdrv_snapshot_support_dependency(bs, alignment); +} + +if (bs->file != NULL) { +return bdrv_snapshot_support_dependency(bs->file->bs, alignment); +} + +return -ENOTSUP; +} /* Group operations. All block drivers are involved. 
* These functions will properly handle dataplane (take aio_context_acquire diff --git a/include/block/snapshot.h b/include/block/snapshot.h index f73d109..e5bf06f 100644 --- a/include/block/snapshot.h +++ b/include/block/snapshot.h @@ -73,6 +73,13 @@ int bdrv_snapshot_load_tmp(BlockDriverState *bs, int bdrv_snapshot_load_tmp_by_id_or_name(BlockDriverState *bs, const char *id_or_name, Error **errp); +int bdrv_snapshot_save_dependency(BlockDriverState *bs, + const char *depend_snapshot_id, + int64_t depend_offset, + int64_t depend_size, + int64_t offset, + Error **errp); +int bdrv_snapshot_support_dependency(BlockDriverState *bs, int32_t *alignment); /* Group operations. All block drivers are involved. -- 2.7.4
[Qemu-devel] [PATCH 10/10] RFC: Enable nvdimm snapshot functions.
From: Junyan He During snapshot saving, all nvdimm memory is saved in a different way, so we exclude all nvdimm memory regions in ram.c. Signed-off-by: Junyan He --- migration/ram.c | 17 + vl.c| 1 + 2 files changed, 18 insertions(+) diff --git a/migration/ram.c b/migration/ram.c index d1db422..ad32469 100644 --- a/migration/ram.c +++ b/migration/ram.c @@ -1219,9 +1219,15 @@ static bool find_dirty_block(RAMState *rs, PageSearchStatus *pss, bool *again) /* Didn't find anything in this RAM Block */ pss->page = 0; pss->block = QLIST_NEXT_RCU(pss->block, next); +while (pss->block && ram_block_is_nvdimm_active(pss->block)) { +pss->block = QLIST_NEXT_RCU(pss->block, next); +} if (!pss->block) { /* Hit the end of the list */ pss->block = QLIST_FIRST_RCU(&ram_list.blocks); +while (pss->block && ram_block_is_nvdimm_active(pss->block)) { +pss->block = QLIST_NEXT_RCU(pss->block, next); +} /* Flag that we've looped */ pss->complete_round = true; rs->ram_bulk_stage = false; @@ -1541,6 +1547,9 @@ static int ram_find_and_save_block(RAMState *rs, bool last_stage) if (!pss.block) { pss.block = QLIST_FIRST_RCU(&ram_list.blocks); +while (pss.block && ram_block_is_nvdimm_active(pss.block)) { +pss.block = QLIST_NEXT_RCU(pss.block, next); +} } do { @@ -1583,6 +1592,10 @@ uint64_t ram_bytes_total(void) rcu_read_lock(); RAMBLOCK_FOREACH(block) { +if (ram_block_is_nvdimm_active(block)) { +/* If snapshotting and the block is nvdimm, let nvdimm do the job */ +continue; +} total += block->used_length; } rcu_read_unlock(); @@ -,6 +2235,10 @@ static int ram_save_setup(QEMUFile *f, void *opaque) qemu_put_be64(f, ram_bytes_total() | RAM_SAVE_FLAG_MEM_SIZE); RAMBLOCK_FOREACH(block) { +if (ram_block_is_nvdimm_active(block)) { +/* If snapshotting and the block is nvdimm, let nvdimm do the job */ +continue; +} qemu_put_byte(f, strlen(block->idstr)); qemu_put_buffer(f, (uint8_t *)block->idstr, strlen(block->idstr)); qemu_put_be64(f, block->used_length); diff --git a/vl.c b/vl.c index 3ef04ce..1bd5711 100644 --- a/vl.c +++ b/vl.c @@ -4502,6 +4502,7 
@@ int main(int argc, char **argv, char **envp) blk_mig_init(); ram_mig_init(); +nvdimm_snapshot_init(); /* If the currently selected machine wishes to override the units-per-bus * property of its default HBA interface type, do so now. */ -- 2.7.4
[Qemu-devel] [PATCH 06/10] RFC: Add save dependency functions to qemu_file
From: Junyan He When we save a snapshot, we need qemu_file to support save dependency operations. It calls the block driver's save dependency functions to implement these operations. Signed-off-by: Junyan He --- migration/qemu-file.c | 61 +++ migration/qemu-file.h | 14 migration/savevm.c| 33 +--- 3 files changed, 105 insertions(+), 3 deletions(-) diff --git a/migration/qemu-file.c b/migration/qemu-file.c index 2ab2bf3..9d2a39a 100644 --- a/migration/qemu-file.c +++ b/migration/qemu-file.c @@ -46,10 +46,13 @@ struct QEMUFile { int buf_index; int buf_size; /* 0 when writing */ uint8_t buf[IO_BUF_SIZE]; +char ref_name_str[128]; /* maybe snapshot id */ DECLARE_BITMAP(may_free, MAX_IOV_SIZE); struct iovec iov[MAX_IOV_SIZE]; unsigned int iovcnt; +bool support_dependency; +int32_t dependency_aligment; int last_error; }; @@ -745,3 +748,61 @@ void qemu_file_set_blocking(QEMUFile *f, bool block) f->ops->set_blocking(f->opaque, block); } } + +void qemu_file_set_support_dependency(QEMUFile *f, int32_t alignment) +{ +f->dependency_aligment = alignment; +f->support_dependency = true; +} + +bool qemu_file_is_support_dependency(QEMUFile *f, int32_t *alignment) +{ +if (f->support_dependency && alignment) { +*alignment = f->dependency_aligment; +} + +return f->support_dependency; +} + +/* This function sets the reference name for snapshot usage. Sometimes it needs + * to depend on another snapshot's data to avoid redundancy. 
+ */ +bool qemu_file_set_ref_name(QEMUFile *f, const char *name) +{ +if (strlen(name) + 1 > sizeof(f->ref_name_str)) { +return false; +} + +memcpy(f->ref_name_str, name, strlen(name) + 1); +return true; +} + +ssize_t qemu_file_save_dependency(QEMUFile *f, int64_t depend_offset, + int64_t size) +{ +ssize_t ret; + +if (f->support_dependency == false) { +return -1; +} + +assert(f->ops->save_dependency); + +if (!QEMU_IS_ALIGNED(depend_offset, f->dependency_aligment)) { +return -1; +} + +qemu_fflush(f); + +if (!QEMU_IS_ALIGNED(f->pos, f->dependency_aligment)) { +return -1; +} + +ret = f->ops->save_dependency(f->opaque, f->ref_name_str, + depend_offset, size, f->pos); +if (ret > 0) { +f->pos += size; +} + +return ret; +} diff --git a/migration/qemu-file.h b/migration/qemu-file.h index aae4e5e..137b917 100644 --- a/migration/qemu-file.h +++ b/migration/qemu-file.h @@ -57,6 +57,14 @@ typedef ssize_t (QEMUFileWritevBufferFunc)(void *opaque, struct iovec *iov, int iovcnt, int64_t pos); /* + * This function add reference to the dependency data in snapshot specified by + * ref_name_str to this file's offset + */ +typedef ssize_t (QEMUFileSaveDependencyFunc)(void *opaque, const char *name, + int64_t depend_offset, + int64_t offset, int64_t size); + +/* * This function provides hooks around different * stages of RAM migration. 
* 'opaque' is the backend specific data in QEMUFile @@ -104,6 +112,7 @@ typedef struct QEMUFileOps { QEMUFileWritevBufferFunc *writev_buffer; QEMURetPathFunc *get_return_path; QEMUFileShutdownFunc *shut_down; +QEMUFileSaveDependencyFunc *save_dependency; } QEMUFileOps; typedef struct QEMUFileHooks { @@ -153,6 +162,11 @@ int qemu_file_shutdown(QEMUFile *f); QEMUFile *qemu_file_get_return_path(QEMUFile *f); void qemu_fflush(QEMUFile *f); void qemu_file_set_blocking(QEMUFile *f, bool block); +bool qemu_file_set_ref_name(QEMUFile *f, const char *name); +void qemu_file_set_support_dependency(QEMUFile *f, int32_t alignment); +bool qemu_file_is_support_dependency(QEMUFile *f, int32_t *alignment); +ssize_t qemu_file_save_dependency(QEMUFile *f, int64_t depend_offset, + int64_t size); size_t qemu_get_counted_string(QEMUFile *f, char buf[256]); diff --git a/migration/savevm.c b/migration/savevm.c index 358c5b5..1bbd6aa 100644 --- a/migration/savevm.c +++ b/migration/savevm.c @@ -196,6 +196,20 @@ static ssize_t block_writev_buffer(void *opaque, struct iovec *iov, int iovcnt, return qiov.size; } +static ssize_t block_save_dependency(void *opaque, const char *id_name, + int64_t depend_offset, + int64_t offset, int64_t size) +{ +int ret = bdrv_snapshot_save_dependency(opaque, id_name, +depend_offset, offset, +size, NULL); +if (ret < 0) { +return r
[Qemu-devel] [PATCH 01/10] RFC: Add save and support snapshot dependency function to block driver.
From: Junyan He We want to support incremental snapshot saving, which requires the image format to support dependency saving. Later snapshots may reference the dependent snapshot's content, and in most cases references must be cluster aligned. Add a query function to check whether the image format supports this, and use the save_dependency function to do the real work. Signed-off-by: Junyan He --- include/block/block_int.h | 9 + 1 file changed, 9 insertions(+) diff --git a/include/block/block_int.h b/include/block/block_int.h index 64a5700..be1eca3 100644 --- a/include/block/block_int.h +++ b/include/block/block_int.h @@ -274,6 +274,15 @@ struct BlockDriver { const char *snapshot_id, const char *name, Error **errp); +int (*bdrv_snapshot_save_dependency)(BlockDriverState *bs, + const char *depend_snapshot_id, + int64_t depend_offset, + int64_t depend_size, + int64_t offset, + Error **errp); +int (*bdrv_snapshot_support_dependency)(BlockDriverState *bs, +int32_t *alignment); + int (*bdrv_get_info)(BlockDriverState *bs, BlockDriverInfo *bdi); ImageInfoSpecific *(*bdrv_get_specific_info)(BlockDriverState *bs); -- 2.7.4
[Qemu-devel] [PATCH 02/10] RFC: Implement qcow2's snapshot dependent saving function.
From: Junyan He For the qcow2 format, we can increase the reference count of the dependent snapshot's data clusters and link their offsets into the L2 table of the new snapshot point. This avoids an explicit dependency relationship between snapshots: when we delete a snapshot point, we just decrease the cluster reference counts, and no further checking is needed. Signed-off-by: Junyan He --- block/qcow2-snapshot.c | 154 + block/qcow2.c | 2 + block/qcow2.h | 7 +++ 3 files changed, 163 insertions(+) diff --git a/block/qcow2-snapshot.c b/block/qcow2-snapshot.c index cee25f5..8e83084 100644 --- a/block/qcow2-snapshot.c +++ b/block/qcow2-snapshot.c @@ -736,3 +736,157 @@ int qcow2_snapshot_load_tmp(BlockDriverState *bs, return 0; } + +int qcow2_snapshot_save_dependency(BlockDriverState *bs, + const char *depend_snapshot_id, + int64_t depend_offset, + int64_t depend_size, + int64_t offset, + Error **errp) +{ +int snapshot_index; +BDRVQcow2State *s = bs->opaque; +QCowSnapshot *sn; +int ret; +int64_t i; +int64_t total_bytes = depend_size; +int64_t depend_offset1, offset1; +uint64_t *depend_l1_table = NULL; +uint64_t depend_l1_bytes; +uint64_t *depend_l2_table = NULL; +uint64_t depend_l2_offset; +uint64_t depend_entry; +QCowL2Meta l2meta; + +assert(bs->read_only == false); + +if (depend_snapshot_id == NULL) { +return 0; +} + +if (!QEMU_IS_ALIGNED(depend_offset, s->cluster_size)) { +error_setg(errp, "Specified snapshot offset is not multiple of %u", +s->cluster_size); +return -EINVAL; +} + +if (!QEMU_IS_ALIGNED(offset, s->cluster_size)) { +error_setg(errp, "Offset is not multiple of %u", s->cluster_size); +return -EINVAL; +} + +if (!QEMU_IS_ALIGNED(depend_size, s->cluster_size)) { +error_setg(errp, "depend_size is not multiple of %u", s->cluster_size); +return -EINVAL; +} + +snapshot_index = find_snapshot_by_id_and_name(bs, NULL, depend_snapshot_id); +/* Search the snapshot */ +if (snapshot_index < 0) { +error_setg(errp, "Can't find snapshot"); +return -ENOENT; +} + +sn = &s->snapshots[snapshot_index]; 
+if (sn->disk_size != bs->total_sectors * BDRV_SECTOR_SIZE) { +error_report("qcow2: depend on the snapshots with different disk " +"size is not implemented"); +return -ENOTSUP; +} + +/* Only can save dependency of snapshot's vmstate data */ +depend_offset1 = depend_offset + qcow2_vm_state_offset(s); +offset1 = offset + qcow2_vm_state_offset(s); + +depend_l1_bytes = s->l1_size * sizeof(uint64_t); +depend_l1_table = g_try_malloc0(depend_l1_bytes); +if (depend_l1_table == NULL) { +return -ENOMEM; +} + +ret = bdrv_pread(bs->file, sn->l1_table_offset, depend_l1_table, + depend_l1_bytes); +if (ret < 0) { +g_free(depend_l1_table); +goto out; +} +for (i = 0; i < depend_l1_bytes / sizeof(uint64_t); i++) { +be64_to_cpus(&depend_l1_table[i]); +} + +while (total_bytes) { +assert(total_bytes > 0); +/* Find the cluster of depend */ +depend_l2_offset = +depend_l1_table[depend_offset1 >> (s->l2_bits + s->cluster_bits)]; +depend_l2_offset &= L1E_OFFSET_MASK; +if (depend_l2_offset == 0) { +ret = -EINVAL; +goto out; +} + +if (offset_into_cluster(s, depend_l2_offset)) { +qcow2_signal_corruption(bs, true, -1, -1, "L2 table offset %#" +PRIx64 " unaligned (L1 index: %#" +PRIx64 ")", +depend_l2_offset, +depend_offset1 >> +(s->l2_bits + s->cluster_bits)); +return -EIO; +} + +ret = qcow2_cache_get(bs, s->l2_table_cache, depend_l2_offset, + (void **)(&depend_l2_table)); +if (ret < 0) { +goto out; +} + +depend_entry = +be64_to_cpu( +depend_l2_table[offset_to_l2_index(s, depend_offset1)]); +if (depend_entry == 0) { +ret = -EINVAL; +qcow2_cache_put(s->l2_table_cache, (void **)(&depend_l2_table)); +goto out; +} + +memset(&l2meta, 0, sizeof(l2meta)); +l2meta.offset = offset1; +l2meta.alloc_off
[Qemu-devel] [PATCH 00/10] RFC: Optimize nvdimm kind memory for snapshot.
From: Junyan He The nvdimm size is huge, sometimes more than 256G. This is a heavy burden for snapshot saving: one snapshot point with nvdimm may occupy more than 50G of disk space even with compression enabled. We need to introduce a dependent snapshot manner to solve this problem. The first snapshot point is always saved completely, and dirty log tracing is enabled for the nvdimm memory region after saving. Later snapshot points add references to the previous snapshot's nvdimm data and save only the dirty pages. This can save a lot of disk space and time if snapshot operations are triggered frequently. We first add save_snapshot_dependency functions to the QCOW2 format; a later snapshot adds references to the dependent snapshot's data clusters. There is an alignment problem here: the dependent data must always be cluster aligned, so we add padding data when saving the snapshot to keep it cluster aligned. The logic between nvdimm and ram for snapshot saving is a little confusing now: we need to exclude nvdimm memory regions from the ram list, and the dirty log tracing setup is also not very clear. Maybe we can separate snapshot saving from the migration logic later to make the code cleaner. In theory, this manner can apply to any kind of memory, but because it needs dirty log tracing turned on, performance may decline. So we enable it only for nvdimm memory for now. 
Signed-off-by: Junyan He --- Makefile.target |1 + block/qcow2-snapshot.c | 154 ++ block/qcow2.c|2 + block/qcow2.h|7 + block/snapshot.c | 45 +++ exec.c |7 + hw/ppc/spapr.c |2 +- hw/s390x/s390-stattrib.c |2 +- include/block/block_int.h|9 ++ include/block/snapshot.h |7 + include/exec/memory.h|9 ++ include/exec/ram_addr.h |2 + include/migration/misc.h |4 + include/migration/register.h |2 +- include/migration/snapshot.h |3 + memory.c | 18 ++- migration/block.c|2 +- migration/nvdimm.c | 1033 + migration/qemu-file.c| 61 + migration/qemu-file.h| 14 ++ migration/ram.c | 19 ++- migration/savevm.c | 62 - vl.c |1 + 23 files changed, 1452 insertions(+), 14 deletions(-)