Re: [Qemu-devel] [PATCH v3 27/35] postcopy/outgoing: implement forward/backword prefault
On Thu, Nov 01, 2012 at 02:10:45PM -0600, Eric Blake wrote:
> On 10/30/2012 02:33 AM, Isaku Yamahata wrote:
> > When a page is requested, surrounding pages are also sent.
> >
> > Signed-off-by: Isaku Yamahata <yamah...@valinux.co.jp>
> > ---
> >  hmp-commands.hx      |   15 -
> >  hmp.c                |    3 +++
> >  migration-postcopy.c |   57 +-
> >  migration.c          |   20 ++
> >  migration.h          |    2 ++
> >  qapi-schema.json     |    3 ++-
> >  6 files changed, 89 insertions(+), 11 deletions(-)
> >
> > diff --git a/hmp-commands.hx b/hmp-commands.hx
> > index b054760..5e2c77c 100644
> > --- a/hmp-commands.hx
> > +++ b/hmp-commands.hx
> > @@ -826,26 +826,31 @@ ETEXI
> >      {
> >          .name       = "migrate",
> > -        .args_type  = "detach:-d,blk:-b,inc:-i,postcopy:-p,nobg:-n,uri:s",
> > -        .params     = "[-d] [-b] [-i] [-p [-n]] uri",
> > +        .args_type  = "detach:-d,blk:-b,inc:-i,postcopy:-p,nobg:-n,uri:s,"
> > +                      "forward:i?,backward:i?",
> > +        .params     = "[-d] [-b] [-i] [-p [-n] uri [forward] [backword]",
>
> I don't care what we do to the 'migrate' HMP command, but for QMP...
>
> > +++ b/qapi-schema.json
> > @@ -2095,7 +2095,8 @@
> >  ##
> >  { 'command': 'migrate',
> >    'data': {'uri': 'str', '*blk': 'bool', '*inc': 'bool', '*detach': 'bool' ,
> > -           '*postcopy': 'bool', '*nobg': 'bool'} }
> > +           '*postcopy': 'bool', '*nobg': 'bool',
> > +           '*forward': 'int', '*backward': 'int'} }
>
> Do we really want to be adding new options to migrate (and if so,
> where's the documentation), or do we need a new monitor command similar
> to migrate-set-capabilities or migrate-set-cache-size?

Okay, migrate-set-capabilities seems usable for booleans and scalable for
future extension.
On the other hand, migrate-set-cache-size takes only a single integer as
its argument, so it doesn't seem usable without modification.
How about this?

{ 'type': 'MigrationParameters',
  'data': {'name': 'str', 'value': 'int' } }

{ 'command': 'migrate-set-parameters',
  'data': { 'parameters': ['MigrationParameters']}}

{ 'command': 'query-migrate-parameters',
  'returns': ['MigrationParameters']}

--
yamahata
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
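To make the name/value idea from the reply above concrete, here is a minimal C sketch of how such a parameter list could be stored and looked up. Everything in it is hypothetical: the struct, the table and the two helper functions are illustrative names only, not QEMU code, and rejecting negative values is an assumption.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical mirror of the proposed 'MigrationParameters' type. */
typedef struct {
    const char *name;
    int64_t value;
} MigrationParameter;

/* Hypothetical table; the names come from the options discussed above. */
static MigrationParameter migration_parameters[] = {
    { "forward", 0 },
    { "backward", 0 },
};

#define N_PARAMS \
    (sizeof(migration_parameters) / sizeof(migration_parameters[0]))

/* Set one parameter by name. Returns false for an unknown name or a
 * negative value (the range check is an assumption of this sketch). */
static bool migrate_set_parameter(const char *name, int64_t value)
{
    if (value < 0) {
        return false;
    }
    for (size_t i = 0; i < N_PARAMS; i++) {
        if (strcmp(migration_parameters[i].name, name) == 0) {
            migration_parameters[i].value = value;
            return true;
        }
    }
    return false;
}

/* Look up one parameter by name. Returns false if the name is unknown. */
static bool migrate_query_parameter(const char *name, int64_t *value)
{
    for (size_t i = 0; i < N_PARAMS; i++) {
        if (strcmp(migration_parameters[i].name, name) == 0) {
            *value = migration_parameters[i].value;
            return true;
        }
    }
    return false;
}
```

A list of name/value pairs like this scales to new integer options without changing the signature of the migrate command itself, which is the point of the proposal.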
[PATCH v3 02/35] arch_init: DPRINTF format error and typo
missing %
s/ram_save_live/ram_save_iterate/

Signed-off-by: Isaku Yamahata <yamah...@valinux.co.jp>
---
 arch_init.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch_init.c b/arch_init.c
index e6effe8..79d4041 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -659,7 +659,7 @@ static int ram_save_iterate(QEMUFile *f, void *opaque)
     qemu_put_be64(f, RAM_SAVE_FLAG_EOS);

     expected_downtime = ram_save_remaining() * TARGET_PAGE_SIZE / bwidth;
-    DPRINTF("ram_save_live: expected(%" PRIu64 ") <= max(" PRIu64 ")?\n",
+    DPRINTF("ram_save_iterate: expected(%" PRIu64 ") <= max(%" PRIu64 ")?\n",
             expected_downtime, migrate_max_downtime());

     if (expected_downtime <= migrate_max_downtime()) {
--
1.7.10.4
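The fix above matters because PRIu64 expands only to the length modifier and conversion letter, so the literal before it must end in '%'. A small standalone sketch follows; format_downtime is a hypothetical helper that stands in for DPRINTF so that the result can be inspected.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: formats the same message as the corrected DPRINTF.
 * Each PRIu64 must be preceded by a literal ending in '%'. */
static int format_downtime(char *buf, size_t len,
                           uint64_t expected, uint64_t max)
{
    return snprintf(buf, len,
                    "ram_save_iterate: expected(%" PRIu64
                    ") <= max(%" PRIu64 ")?\n",
                    expected, max);
}
```

With the second '%' missing, the second value would never be consumed and the macro's expansion would appear as plain text in the message.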
[PATCH v3 01/35] migration.c: remove redundant line in migrate_init()
Signed-off-by: Isaku Yamahata <yamah...@valinux.co.jp>
---
 migration.c |    1 -
 1 file changed, 1 deletion(-)

diff --git a/migration.c b/migration.c
index 62e0304..8fcb466 100644
--- a/migration.c
+++ b/migration.c
@@ -460,7 +460,6 @@ static MigrationState *migrate_init(const MigrationParams *params)
            sizeof(enabled_capabilities));
     s->xbzrle_cache_size = xbzrle_cache_size;

-    s->bandwidth_limit = bandwidth_limit;
     s->state = MIG_STATE_SETUP;
     s->total_time = qemu_get_clock_ms(rt_clock);

--
1.7.10.4
[PATCH v3 06/35] osdep: add qemu_read_full() to read interrupt-safely
This is the read counterpart of qemu_write_full().

Signed-off-by: Isaku Yamahata <yamah...@valinux.co.jp>
---
 osdep.c       |   24 ++++++++++++++++++++++++
 qemu-common.h |    2 ++
 2 files changed, 26 insertions(+)

diff --git a/osdep.c b/osdep.c
index 3b25297..416ffe1 100644
--- a/osdep.c
+++ b/osdep.c
@@ -261,6 +261,30 @@ ssize_t qemu_write_full(int fd, const void *buf, size_t count)
     return total;
 }

+ssize_t qemu_read_full(int fd, void *buf, size_t count)
+{
+    ssize_t ret = 0;
+    ssize_t total = 0;
+
+    while (count) {
+        ret = read(fd, buf, count);
+        if (ret < 0) {
+            if (errno == EINTR)
+                continue;
+            break;
+        }
+        if (ret == 0) {
+            break;
+        }
+
+        count -= ret;
+        buf += ret;
+        total += ret;
+    }
+
+    return total;
+}
+
 /*
  * Opens a socket with FD_CLOEXEC set
  */
diff --git a/qemu-common.h b/qemu-common.h
index b54612b..16128c5 100644
--- a/qemu-common.h
+++ b/qemu-common.h
@@ -214,6 +214,8 @@ ssize_t qemu_write_full(int fd, const void *buf, size_t count)
     QEMU_WARN_UNUSED_RESULT;
 ssize_t qemu_send_full(int fd, const void *buf, size_t count, int flags)
     QEMU_WARN_UNUSED_RESULT;
+ssize_t qemu_read_full(int fd, void *buf, size_t count)
+    QEMU_WARN_UNUSED_RESULT;
 ssize_t qemu_recv_full(int fd, void *buf, size_t count, int flags)
     QEMU_WARN_UNUSED_RESULT;

--
1.7.10.4
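For readers who want to try the retry loop outside QEMU, here is a standalone sketch of the same technique. read_full is a hypothetical name; the only deliberate difference from the patch is that it advances a byte pointer, because arithmetic on void * is a GNU extension.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Read until count bytes have arrived, EOF is reached, or a real error
 * occurs. EINTR is retried. As in the patch, the number of bytes read so
 * far is returned even on error, so callers compare it against count. */
static ssize_t read_full(int fd, void *buf, size_t count)
{
    unsigned char *p = buf;   /* byte pointer for portable arithmetic */
    ssize_t total = 0;

    while (count > 0) {
        ssize_t ret = read(fd, p, count);
        if (ret < 0) {
            if (errno == EINTR) {
                continue;     /* interrupted by a signal: try again */
            }
            break;            /* real error: report what we have */
        }
        if (ret == 0) {
            break;            /* EOF */
        }
        count -= (size_t)ret;
        p += ret;
        total += ret;
    }
    return total;
}
```

A short return value therefore means either EOF or an error; a caller that needs to tell them apart has to inspect errno itself.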
[PATCH v3 14/35] arch_init: refactor ram_save_block() and export ram_save_block()
arch_init: factor out counting transferred bytes.
This will be used by postcopy.

Signed-off-by: Isaku Yamahata <yamah...@valinux.co.jp>
---
Changes v2 -> v3:
- manual rebase
- report ram_save_block

Changes v1 -> v2:
- don't refer last_block which can be NULL.
  And avoid possible infinite loop.
---
 arch_init.c |  122 +++
 arch_init.h |    5 +++
 migration.h |    1 +
 3 files changed, 70 insertions(+), 58 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index 23717d3..ad1b01b 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -399,59 +399,77 @@ static void migration_bitmap_sync(void)
     }
 }

+static uint64_t bytes_transferred;
+
+/*
+ * ram_save_page: Writes a page of memory to the stream f
+ *
+ * Returns:  true: page written
+ *          false: no page written
+ */
+static const RAMBlock *last_sent_block = NULL;
+bool ram_save_page(QEMUFile *f, RAMBlock *block, ram_addr_t offset,
+                   bool last_stage)
+{
+    MemoryRegion *mr = block->mr;
+    uint8_t *p;
+    int cont;
+    int bytes_sent = -1;
+    ram_addr_t current_addr;
+
+    if (!migration_bitmap_test_and_reset_dirty(mr, offset)) {
+        return false;
+    }
+
+    cont = (block == last_sent_block) ? RAM_SAVE_FLAG_CONTINUE : 0;
+    last_sent_block = block;
+    p = memory_region_get_ram_ptr(mr) + offset;
+    if (is_dup_page(p)) {
+        acct_info.dup_pages++;
+        save_block_hdr(f, block, offset, cont, RAM_SAVE_FLAG_COMPRESS);
+        qemu_put_byte(f, *p);
+        bytes_sent = 1;
+    } else if (migrate_use_xbzrle()) {
+        current_addr = block->offset + offset;
+        bytes_sent = save_xbzrle_page(f, p, current_addr, block,
+                                      offset, cont, last_stage);
+        if (!last_stage) {
+            p = get_cached_data(XBZRLE.cache, current_addr);
+        }
+    }
+
+    /* either we didn't send yet (we may have had XBZRLE overflow) */
+    if (bytes_sent == -1) {
+        save_block_hdr(f, block, offset, cont, RAM_SAVE_FLAG_PAGE);
+        qemu_put_buffer(f, p, TARGET_PAGE_SIZE);
+        bytes_sent = TARGET_PAGE_SIZE;
+        acct_info.norm_pages++;
+    }
+
+    bytes_transferred += bytes_sent;
+    return true;
+}
+
 /*
  * ram_save_block: Writes a page of memory to the stream f
  *
- * Returns:  0: if the page hasn't changed
- *          -1: if there are no more dirty pages
- *           n: the amount of bytes written in other case
+ * Returns:  true: there may be more dirty pages
+ *          false: if there are no more dirty pages
  */

-static int ram_save_block(QEMUFile *f, bool last_stage)
+bool ram_save_block(QEMUFile *f, bool last_stage)
 {
     RAMBlock *block = last_block;
     ram_addr_t offset = last_offset;
-    int bytes_sent = -1;
-    MemoryRegion *mr;
-    ram_addr_t current_addr;
+    bool wrote = false;

     if (!block)
         block = QLIST_FIRST(&ram_list.blocks);

     do {
-        mr = block->mr;
-        if (migration_bitmap_test_and_reset_dirty(mr, offset)) {
-            uint8_t *p;
-            int cont = (block == last_block) ? RAM_SAVE_FLAG_CONTINUE : 0;
-
-            p = memory_region_get_ram_ptr(mr) + offset;
-
-            if (is_dup_page(p)) {
-                acct_info.dup_pages++;
-                save_block_hdr(f, block, offset, cont, RAM_SAVE_FLAG_COMPRESS);
-                qemu_put_byte(f, *p);
-                bytes_sent = 1;
-            } else if (migrate_use_xbzrle()) {
-                current_addr = block->offset + offset;
-                bytes_sent = save_xbzrle_page(f, p, current_addr, block,
-                                              offset, cont, last_stage);
-                if (!last_stage) {
-                    p = get_cached_data(XBZRLE.cache, current_addr);
-                }
-            }
-
-            /* either we didn't send yet (we may have had XBZRLE overflow) */
-            if (bytes_sent == -1) {
-                save_block_hdr(f, block, offset, cont, RAM_SAVE_FLAG_PAGE);
-                qemu_put_buffer(f, p, TARGET_PAGE_SIZE);
-                bytes_sent = TARGET_PAGE_SIZE;
-                acct_info.norm_pages++;
-            }
-
-            /* if page is unmodified, continue to the next */
-            if (bytes_sent != 0) {
-                break;
-            }
+        wrote = ram_save_page(f, block, offset, last_stage);
+        if (wrote) {
+            break;
         }
         offset += TARGET_PAGE_SIZE;
@@ -466,11 +484,9 @@ static int ram_save_block(QEMUFile *f, bool last_stage)
     last_block = block;
     last_offset = offset;

-    return bytes_sent;
+    return wrote;
 }

-static uint64_t bytes_transferred;
-
 static ram_addr_t ram_save_remaining(void)
 {
     return migration_dirty_pages;
@@ -547,6 +563,7 @@ static void ram_migration_cancel(void *opaque)

 static void reset_ram_globals(void)
 {
+    last_sent_block = NULL;
     last_block = NULL;
     last_offset = 0;
     last_version = ram_list.version;
@@ -618,14 +635,10
[PATCH v3 16/35] arch_init/ram_load: refactor ram_load
ram_load_page() will be used by postcopy.

Signed-off-by: Isaku Yamahata <yamah...@valinux.co.jp>
---
Changes v2 -> v3:
- new
---
 arch_init.c |  137 +++
 arch_init.h |    3 ++
 2 files changed, 74 insertions(+), 66 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index 7e6d84e..c77e24d 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -721,7 +721,7 @@ static int ram_save_complete(QEMUFile *f, void *opaque)
     return 0;
 }

-static int load_xbzrle(QEMUFile *f, ram_addr_t addr, void *host)
+static int load_xbzrle(QEMUFile *f, void *host)
 {
     int ret, rc = 0;
     unsigned int xh_len;
@@ -792,12 +792,73 @@ static inline void *host_from_stream_offset(QEMUFile *f,
     return NULL;
 }

+int ram_load_mem_size(QEMUFile *f, ram_addr_t total_ram_bytes)
+{
+    /* Synchronize RAM block list */
+    char id[256];
+    ram_addr_t length;
+
+    while (total_ram_bytes) {
+        RAMBlock *block;
+        uint8_t len;
+
+        len = qemu_get_byte(f);
+        qemu_get_buffer(f, (uint8_t *)id, len);
+        id[len] = 0;
+        length = qemu_get_be64(f);
+
+        QLIST_FOREACH(block, &ram_list.blocks, next) {
+            if (!strncmp(id, block->idstr, sizeof(id))) {
+                if (block->length != length)
+                    return -EINVAL;
+                break;
+            }
+        }
+
+        if (!block) {
+            fprintf(stderr, "Unknown ramblock \"%s\", cannot "
+                    "accept migration\n", id);
+            return -EINVAL;
+        }
+
+        total_ram_bytes -= length;
+    }
+
+    return 0;
+}
+
+int ram_load_page(QEMUFile *f, void *host, int flags)
+{
+    if (flags & RAM_SAVE_FLAG_COMPRESS) {
+        uint8_t ch;
+        ch = qemu_get_byte(f);
+        memset(host, ch, TARGET_PAGE_SIZE);
+#ifndef _WIN32
+        if (ch == 0 &&
+            (!kvm_enabled() || kvm_has_sync_mmu())) {
+            qemu_madvise(host, TARGET_PAGE_SIZE, QEMU_MADV_DONTNEED);
+        }
+#endif
+    } else if (flags & RAM_SAVE_FLAG_PAGE) {
+        qemu_get_buffer(f, host, TARGET_PAGE_SIZE);
+    } else if (flags & RAM_SAVE_FLAG_XBZRLE) {
+        if (!migrate_use_xbzrle()) {
+            return -EINVAL;
+        }
+        if (load_xbzrle(f, host) < 0) {
+            return -EINVAL;
+        }
+    }
+    return 0;
+}
+
 static int ram_load(QEMUFile *f, void *opaque, int version_id)
 {
     ram_addr_t addr;
     int flags, ret = 0;
     int error;
     static uint64_t seq_iter;
+    void *host;

     seq_iter++;

@@ -813,82 +874,26 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)

         if (flags & RAM_SAVE_FLAG_MEM_SIZE) {
             if (version_id == 4) {
-                /* Synchronize RAM block list */
-                char id[256];
-                ram_addr_t length;
-                ram_addr_t total_ram_bytes = addr;
-
-                while (total_ram_bytes) {
-                    RAMBlock *block;
-                    uint8_t len;
-
-                    len = qemu_get_byte(f);
-                    qemu_get_buffer(f, (uint8_t *)id, len);
-                    id[len] = 0;
-                    length = qemu_get_be64(f);
-
-                    QLIST_FOREACH(block, &ram_list.blocks, next) {
-                        if (!strncmp(id, block->idstr, sizeof(id))) {
-                            if (block->length != length) {
-                                ret = -EINVAL;
-                                goto done;
-                            }
-                            break;
-                        }
-                    }
-
-                    if (!block) {
-                        fprintf(stderr, "Unknown ramblock \"%s\", cannot "
-                                "accept migration\n", id);
-                        ret = -EINVAL;
-                        goto done;
-                    }
-
-                    total_ram_bytes -= length;
+                error = ram_load_mem_size(f, addr);
+                if (error) {
+                    DPRINTF("error %d\n", error);
+                    return error;
                 }
             }
         }

-        if (flags & RAM_SAVE_FLAG_COMPRESS) {
-            void *host;
-            uint8_t ch;
-
-            host = host_from_stream_offset(f, addr, flags);
-            if (!host) {
-                return -EINVAL;
-            }
-
-            ch = qemu_get_byte(f);
-            memset(host, ch, TARGET_PAGE_SIZE);
-#ifndef _WIN32
-            if (ch == 0 &&
-                (!kvm_enabled() || kvm_has_sync_mmu())) {
-                qemu_madvise(host, TARGET_PAGE_SIZE, QEMU_MADV_DONTNEED);
-            }
-#endif
-        } else if (flags & RAM_SAVE_FLAG_PAGE) {
-            void *host;
-
+        if (flags & (RAM_SAVE_FLAG_COMPRESS | RAM_SAVE_FLAG_PAGE |
+                     RAM_SAVE_FLAG_XBZRLE)) {
             host = host_from_stream_offset(f, addr, flags);
             if (!host) {
                 return -EINVAL;
             }
-
-            qemu_get_buffer(f, host
[PATCH v3 08/35] savevm/QEMUFile: consolidate QEMUFile functions a bit
- add qemu_file_fd() for later use
- drop qemu_stdio_fd
  Now qemu_file_fd() replaces qemu_stdio_fd().
- savevm/QEMUFileSocket: drop duplicated member fd
  fd is already stored in QEMUFile so drop duplicated member
  QEMUFileSocket::fd.
- remove QEMUFileSocket

Signed-off-by: Isaku Yamahata <yamah...@valinux.co.jp>
---
 migration-exec.c |    4 ++--
 migration-fd.c   |    2 +-
 qemu-file.h      |    2 +-
 savevm.c         |   40 +++-
 4 files changed, 23 insertions(+), 25 deletions(-)

diff --git a/migration-exec.c b/migration-exec.c
index 6c97db9..95e9779 100644
--- a/migration-exec.c
+++ b/migration-exec.c
@@ -98,7 +98,7 @@ static void exec_accept_incoming_migration(void *opaque)
     QEMUFile *f = opaque;

     process_incoming_migration(f);
-    qemu_set_fd_handler2(qemu_stdio_fd(f), NULL, NULL, NULL, NULL);
+    qemu_set_fd_handler2(qemu_file_fd(f), NULL, NULL, NULL, NULL);
     qemu_fclose(f);
 }

@@ -113,7 +113,7 @@ int exec_start_incoming_migration(const char *command)
         return -errno;
     }

-    qemu_set_fd_handler2(qemu_stdio_fd(f), NULL,
+    qemu_set_fd_handler2(qemu_file_fd(f), NULL,
                          exec_accept_incoming_migration, NULL, f);

     return 0;
diff --git a/migration-fd.c b/migration-fd.c
index 7335167..b3c54e5 100644
--- a/migration-fd.c
+++ b/migration-fd.c
@@ -104,7 +104,7 @@ static void fd_accept_incoming_migration(void *opaque)
     QEMUFile *f = opaque;

     process_incoming_migration(f);
-    qemu_set_fd_handler2(qemu_stdio_fd(f), NULL, NULL, NULL, NULL);
+    qemu_set_fd_handler2(qemu_file_fd(f), NULL, NULL, NULL, NULL);
     qemu_fclose(f);
 }

diff --git a/qemu-file.h b/qemu-file.h
index 9b6dd08..bc222dc 100644
--- a/qemu-file.h
+++ b/qemu-file.h
@@ -70,7 +70,7 @@ QEMUFile *qemu_fdopen(int fd, const char *mode);
 QEMUFile *qemu_fopen_socket(int fd);
 QEMUFile *qemu_popen(FILE *popen_file, const char *mode);
 QEMUFile *qemu_popen_cmd(const char *command, const char *mode);
-int qemu_stdio_fd(QEMUFile *f);
+int qemu_file_fd(QEMUFile *f);
 int qemu_fclose(QEMUFile *f);
 int qemu_fflush(QEMUFile *f);
 void qemu_put_buffer(QEMUFile *f, const uint8_t *buf, int size);
diff --git a/savevm.c b/savevm.c
index 0c7af43..e24041b 100644
--- a/savevm.c
+++ b/savevm.c
@@ -178,6 +178,7 @@ struct QEMUFile {
     uint8_t buf[IO_BUF_SIZE];

     int last_error;
+    int fd; /* -1 means fd isn't associated */
 };

 typedef struct QEMUFileStdio
@@ -186,19 +187,18 @@ typedef struct QEMUFileStdio
     QEMUFile *file;
 } QEMUFileStdio;

-typedef struct QEMUFileSocket
+typedef struct QEMUFileFD
 {
-    int fd;
     QEMUFile *file;
-} QEMUFileSocket;
+} QEMUFileFD;

 static int socket_get_buffer(void *opaque, uint8_t *buf, int64_t pos, int size)
 {
-    QEMUFileSocket *s = opaque;
+    QEMUFileFD *s = opaque;
     ssize_t len;

     do {
-        len = qemu_recv(s->fd, buf, size, 0);
+        len = qemu_recv(s->file->fd, buf, size, 0);
     } while (len == -1 && socket_error() == EINTR);

     if (len == -1)
@@ -207,9 +207,9 @@ static int socket_get_buffer(void *opaque, uint8_t *buf, int64_t pos, int size)
     return len;
 }

-static int socket_close(void *opaque)
+static int fd_close(void *opaque)
 {
-    QEMUFileSocket *s = opaque;
+    QEMUFileFD *s = opaque;
     g_free(s);
     return 0;
 }
@@ -276,6 +276,7 @@ QEMUFile *qemu_popen(FILE *stdio_file, const char *mode)
         s->file = qemu_fopen_ops(s, stdio_put_buffer, NULL, stdio_pclose,
                                  NULL, NULL, NULL);
     }
+    s->file->fd = fileno(stdio_file);
     return s->file;
 }

@@ -291,17 +292,6 @@ QEMUFile *qemu_popen_cmd(const char *command, const char *mode)
     return qemu_popen(popen_file, mode);
 }

-int qemu_stdio_fd(QEMUFile *f)
-{
-    QEMUFileStdio *p;
-    int fd;
-
-    p = (QEMUFileStdio *)f->opaque;
-    fd = fileno(p->stdio_file);
-
-    return fd;
-}
-
 QEMUFile *qemu_fdopen(int fd, const char *mode)
 {
     QEMUFileStdio *s;
@@ -325,6 +315,7 @@ QEMUFile *qemu_fdopen(int fd, const char *mode)
         s->file = qemu_fopen_ops(s, stdio_put_buffer, NULL, stdio_fclose,
                                  NULL, NULL, NULL);
     }
+    s->file->fd = fd;
     return s->file;

 fail:
@@ -334,11 +325,11 @@ fail:

 QEMUFile *qemu_fopen_socket(int fd)
 {
-    QEMUFileSocket *s = g_malloc0(sizeof(QEMUFileSocket));
+    QEMUFileFD *s = g_malloc0(sizeof(QEMUFileFD));

-    s->fd = fd;
-    s->file = qemu_fopen_ops(s, NULL, socket_get_buffer, socket_close,
+    s->file = qemu_fopen_ops(s, NULL, socket_get_buffer, fd_close,
                              NULL, NULL, NULL);
+    s->file->fd = fd;
     return s->file;
 }

@@ -381,6 +372,7 @@ QEMUFile *qemu_fopen(const char *filename, const char *mode)
         s->file = qemu_fopen_ops(s, NULL, file_get_buffer, stdio_fclose,
                                  NULL, NULL, NULL);
     }
+    s->file->fd = fileno(s->stdio_file);
     return s->file;
 fail:
     g_free(s);
@@ -431,10 +423,16 @@ QEMUFile
[PATCH v3 10/35] savevm/QEMUFile: add read/write QEMUFile on memory buffer
This will be used by postcopy/incoming part.

Signed-off-by: Isaku Yamahata <yamah...@valinux.co.jp>
---
 qemu-file.h |    4 ++++
 savevm.c    |   60 +++
 2 files changed, 64 insertions(+)

diff --git a/qemu-file.h b/qemu-file.h
index 94557ea..452efcd 100644
--- a/qemu-file.h
+++ b/qemu-file.h
@@ -71,6 +71,10 @@ QEMUFile *qemu_fopen_socket(int fd);
 QEMUFile *qemu_fopen_fd(int fd, const char *mode);
 QEMUFile *qemu_popen(FILE *popen_file, const char *mode);
 QEMUFile *qemu_popen_cmd(const char *command, const char *mode);
+struct QEMUFileBuf;
+typedef struct QEMUFileBuf QEMUFileBuf;
+QEMUFileBuf *qemu_fopen_buf_write(void);
+QEMUFile *qemu_fopen_buf_read(uint8_t *buf, size_t size);
 int qemu_file_fd(QEMUFile *f);
 int qemu_fclose(QEMUFile *f);
 int qemu_fflush(QEMUFile *f);
diff --git a/savevm.c b/savevm.c
index 712b7ae..7e55dce 100644
--- a/savevm.c
+++ b/savevm.c
@@ -368,6 +368,66 @@ QEMUFile *qemu_fopen_fd(int fd, const char *mode)
     return s->file;
 }

+struct QEMUFileBuf {
+    QEMUFile *file;
+    uint8_t *buffer;
+    size_t buffer_size;
+    size_t buffer_capacity;
+};
+
+static int buf_close(void *opaque)
+{
+    QEMUFileBuf *s = opaque;
+    g_free(s->buffer);
+    g_free(s);
+    return 0;
+}
+
+static int buf_put_buffer(void *opaque,
+                          const uint8_t *buf, int64_t pos, int size)
+{
+    QEMUFileBuf *s = opaque;
+
+    int inc = size - (s->buffer_capacity - s->buffer_size);
+    if (inc > 0) {
+        s->buffer_capacity += DIV_ROUND_UP(inc, IO_BUF_SIZE) * IO_BUF_SIZE;
+        s->buffer = g_realloc(s->buffer, s->buffer_capacity);
+    }
+    memcpy(s->buffer + s->buffer_size, buf, size);
+    s->buffer_size += size;
+
+    return size;
+}
+
+QEMUFileBuf *qemu_fopen_buf_write(void)
+{
+    QEMUFileBuf *s = g_malloc0(sizeof(*s));
+    s->file = qemu_fopen_ops(s, buf_put_buffer, NULL, buf_close,
+                             NULL, NULL, NULL);
+    return s;
+}
+
+static int buf_get_buffer(void *opaque, uint8_t *buf, int64_t pos, int size)
+{
+    QEMUFileBuf *s = opaque;
+    ssize_t len = MIN(size, s->buffer_capacity - s->buffer_size);
+    memcpy(buf, s->buffer + s->buffer_size, len);
+    s->buffer_size += len;
+    return len;
+}
+
+/* This gets the ownership of buf. */
+QEMUFile *qemu_fopen_buf_read(uint8_t *buf, size_t size)
+{
+    QEMUFileBuf *s = g_malloc0(sizeof(*s));
+    s->buffer = buf;
+    s->buffer_size = 0; /* this is used as index to read */
+    s->buffer_capacity = size;
+    s->file = qemu_fopen_ops(s, NULL, buf_get_buffer, buf_close,
+                             NULL, NULL, NULL);
+    return s->file;
+}
+
 static int file_put_buffer(void *opaque, const uint8_t *buf,
                            int64_t pos, int size)
 {
--
1.7.10.4
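The growth policy in buf_put_buffer() above (extend the capacity in whole IO_BUF_SIZE chunks, then append) can be tried in isolation with the sketch below. MemBuf, membuf_put, membuf_get and CHUNK_SIZE are hypothetical names, CHUNK_SIZE only stands in for IO_BUF_SIZE, and plain realloc replaces g_realloc, so unlike the patch the sketch has to report allocation failure.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define CHUNK_SIZE 32768  /* assumption: stands in for QEMU's IO_BUF_SIZE */

/* Hypothetical growable byte buffer, modelled on QEMUFileBuf. */
typedef struct {
    uint8_t *data;
    size_t size;      /* bytes written so far */
    size_t capacity;  /* bytes allocated */
} MemBuf;

/* Append len bytes, growing the capacity in multiples of CHUNK_SIZE.
 * Returns 0 on success, -1 if the allocation fails. */
static int membuf_put(MemBuf *b, const uint8_t *src, size_t len)
{
    size_t free_space = b->capacity - b->size;

    if (len > free_space) {
        size_t inc = len - free_space;
        size_t chunks = (inc + CHUNK_SIZE - 1) / CHUNK_SIZE; /* round up */
        size_t new_cap = b->capacity + chunks * CHUNK_SIZE;
        uint8_t *p = realloc(b->data, new_cap);
        if (p == NULL) {
            return -1;
        }
        b->data = p;
        b->capacity = new_cap;
    }
    if (len > 0) {
        memcpy(b->data + b->size, src, len);
        b->size += len;
    }
    return 0;
}

/* Copy up to len bytes starting at *pos and advance *pos.
 * Returns the number of bytes copied, 0 once the data is exhausted. */
static size_t membuf_get(const MemBuf *b, size_t *pos, uint8_t *dst,
                         size_t len)
{
    size_t avail = b->size - *pos;
    size_t n = len < avail ? len : avail;

    if (n > 0) {
        memcpy(dst, b->data + *pos, n);
        *pos += n;
    }
    return n;
}
```

Unlike the patch, the read side here keeps a separate position instead of reusing the size field as a read index; that is a choice of this sketch, made so one buffer can be both written and read back.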
[PATCH v3 33/35] arch_init: export migration_bitmap_sync and helper method to get bitmap
These migration bitmap operations will be used by postcopy.

Signed-off-by: Isaku Yamahata <yamah...@valinux.co.jp>
---
 arch_init.c |    7 ++-
 migration.h |    2 ++
 2 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/arch_init.c b/arch_init.c
index 48f45cd..49fbaff 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -345,6 +345,11 @@ void migration_bitmap_free(void)
     migration_bitmap = NULL;
 }

+const unsigned long *migration_bitmap_get(void)
+{
+    return migration_bitmap;
+}
+
 static inline bool migration_bitmap_test_and_reset_dirty(MemoryRegion *mr,
                                                          ram_addr_t offset)
 {
@@ -373,7 +378,7 @@ static inline bool migration_bitmap_set_dirty(MemoryRegion *mr,
     return ret;
 }

-static void migration_bitmap_sync(void)
+void migration_bitmap_sync(void)
 {
     RAMBlock *block;
     ram_addr_t addr;
diff --git a/migration.h b/migration.h
index 6cc3682..2801e7e 100644
--- a/migration.h
+++ b/migration.h
@@ -111,6 +111,8 @@ uint64_t ram_bytes_transferred(void);
 uint64_t ram_bytes_total(void);
 void migration_bitmap_init(void);
 void migration_bitmap_free(void);
+const unsigned long *migration_bitmap_get(void);
+void migration_bitmap_sync(void);

 extern SaveVMHandlers savevm_ram_handlers;

--
1.7.10.4
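A caller that receives the read-only bitmap from migration_bitmap_get() would typically test individual bits in the unsigned long array. The sketch below shows that access pattern in plain C; bitmap_test and bitmap_count are hypothetical helpers, and the one-bit-per-target-page layout is an assumption based on how the surrounding patches use the bitmap.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Test bit nr of a bitmap stored as an array of unsigned long.
 * Assumption: bit nr corresponds to target page number nr. */
static bool bitmap_test(const unsigned long *bitmap, size_t nr)
{
    return (bitmap[nr / BITS_PER_LONG] >> (nr % BITS_PER_LONG)) & 1UL;
}

/* Count the set bits among the first nbits bits. */
static size_t bitmap_count(const unsigned long *bitmap, size_t nbits)
{
    size_t count = 0;

    for (size_t i = 0; i < nbits; i++) {
        if (bitmap_test(bitmap, i)) {
            count++;
        }
    }
    return count;
}
```

Returning a const pointer, as the patch does, lets postcopy code read the dirty state this way without being able to modify it behind the back of arch_init.c.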
[PATCH v3 34/35] postcopy/outgoing: introduce precopy_count parameter
Run precopy for this number of loops before switching to postcopy mode.
This will be implemented by the next patch.

Signed-off-by: Isaku Yamahata <yamah...@valinux.co.jp>
---
 hmp-commands.hx      |   10 ++
 hmp.c                |    2 ++
 migration-postcopy.c |    2 +-
 migration.c          |    2 ++
 migration.h          |    3 ++-
 qapi-schema.json     |    4 +++-
 qmp-commands.hx      |    2 +-
 savevm.c             |    3 ++-
 8 files changed, 19 insertions(+), 9 deletions(-)

diff --git a/hmp-commands.hx b/hmp-commands.hx
index 942f620..957bf76 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -826,9 +826,10 @@ ETEXI

     {
         .name       = "migrate",
-        .args_type  = "detach:-d,blk:-b,inc:-i,postcopy:-p,movebg:-m,nobg:-n,uri:s,"
-                      "forward:i?,backward:i?",
-        .params     = "[-d] [-b] [-i] [-p [-n] [-m]] uri [forward] [backword]",
+        .args_type  = "detach:-d,blk:-b,inc:-i,postcopy:-p,movebg:-m,nobg:-n,"
+                      "uri:s,precopy_count:i?,forward:i?,backward:i?",
+        .params     = "[-d] [-b] [-i] [-p [-n] [-m]] uri "
+                      "[precopy_count] [forward] [backword]",
         .help       = "migrate to URI (using -d to not wait for completion)"
                       "\n\t\t\t -b for migration without shared storage with"
                       " full copy of disk\n\t\t\t -i for migration without "
@@ -837,6 +838,7 @@ ETEXI
                       "\n\t\t\t-p for migration with postcopy mode enabled"
                       "\n\t\t\t-m for move background transfer of postcopy mode"
                       "\n\t\t\t-n for no background transfer of postcopy mode"
+                      "\n\t\t\tprecopy_count: loop of precopy when postcopy"
                       "\n\t\t\tforward: the number of pages to "
                       "forward-prefault when postcopy (default 0)"
                       "\n\t\t\tbackward: the number of pages to "
@@ -846,7 +848,7 @@ ETEXI

 STEXI
-@item migrate [-d] [-b] [-i] [-p [-n] [-m]] @var{uri} @var{forward} @var{backward}
+@item migrate [-d] [-b] [-i] [-p [-n] [-m]] @var{uri} @var{precopy_count} @var{forward} @var{backward}
 @findex migrate
 Migrate to @var{uri} (using -d to not wait for completion).
 	-b for migration with full copy of disk
diff --git a/hmp.c b/hmp.c
index a0bd869..be88db9 100644
--- a/hmp.c
+++ b/hmp.c
@@ -1038,6 +1038,7 @@ void hmp_migrate(Monitor *mon, const QDict *qdict)
     int postcopy = qdict_get_try_bool(qdict, "postcopy", 0);
     int movebg = qdict_get_try_bool(qdict, "movebg", 0);
     int nobg = qdict_get_try_bool(qdict, "nobg", 0);
+    int precopy_count = qdict_get_try_int(qdict, "precopy_count", 0);
     int forward = qdict_get_try_int(qdict, "forward", 0);
     int backward = qdict_get_try_int(qdict, "backward", 0);
     const char *uri = qdict_get_str(qdict, "uri");
@@ -1045,6 +1046,7 @@ void hmp_migrate(Monitor *mon, const QDict *qdict)

     qmp_migrate(uri, !!blk, blk, !!inc, inc, false, false,
                 !!postcopy, postcopy, !!movebg, movebg, !!nobg, nobg,
+                !!precopy_count, precopy_count,
                 !!forward, forward, !!backward, backward,
                 &err);
     if (err) {
diff --git a/migration-postcopy.c b/migration-postcopy.c
index 9298cd4..8a43c42 100644
--- a/migration-postcopy.c
+++ b/migration-postcopy.c
@@ -319,7 +319,7 @@ int postcopy_outgoing_create_read_socket(MigrationState *s)
     return 0;
 }

-void postcopy_outgoing_state_begin(QEMUFile *f)
+void postcopy_outgoing_state_begin(QEMUFile *f, const MigrationParams *params)
 {
     uint64_t options = 0;
     qemu_put_ubyte(f, QEMU_VM_POSTCOPY_INIT);
diff --git a/migration.c b/migration.c
index 057ea31..84ca4b3 100644
--- a/migration.c
+++ b/migration.c
@@ -513,6 +513,7 @@ void qmp_migrate(const char *uri, bool has_blk, bool blk,
                  bool has_postcopy, bool postcopy,
                  bool has_movebg, bool movebg,
                  bool has_nobg, bool nobg,
+                 bool has_precopy_count, int64_t precopy_count,
                  bool has_forward, int64_t forward,
                  bool has_backward, int64_t backward,
                  Error **errp)
@@ -527,6 +528,7 @@ void qmp_migrate(const char *uri, bool has_blk, bool blk,
     params.postcopy = postcopy;
     params.nobg = nobg;
     params.movebg = movebg;
+    params.precopy_count = precopy_count,
     params.prefault_forward = 0;
     if (has_forward) {
         if (forward < 0) {
diff --git a/migration.h b/migration.h
index 2801e7e..c4d7b0a 100644
--- a/migration.h
+++ b/migration.h
@@ -27,6 +27,7 @@ struct MigrationParams {
     bool postcopy;
     bool nobg;
     bool movebg;
+    int precopy_count;
     int64_t prefault_forward;
     int64_t prefault_backward;
 };
@@ -150,7 +151,7 @@ int64_t xbzrle_cache_resize(int64_t new_size);

 /* For outgoing postcopy */
 int postcopy_outgoing_create_read_socket(MigrationState *s);
-void postcopy_outgoing_state_begin(QEMUFile *f);
+void postcopy_outgoing_state_begin(QEMUFile *f, const MigrationParams
[PATCH v3 31/35] arch_init: export ram_save_iterate()
Signed-off-by: Isaku Yamahata <yamah...@valinux.co.jp>
---
 arch_init.c |   11 ---
 arch_init.h |    1 +
 2 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index f86a0b4..48f45cd 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -633,7 +633,7 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
     return 0;
 }

-static int ram_save_iterate(QEMUFile *f, void *opaque)
+int ram_save_iterate(QEMUFile *f)
 {
     uint64_t bytes_transferred_last;
     double bwidth = 0;
@@ -705,6 +705,11 @@ static int ram_save_iterate(QEMUFile *f, void *opaque)
     return 0;
 }

+static int ram_save_iterate_bwidth(QEMUFile *f, void *opaque)
+{
+    return ram_save_iterate(f);
+}
+
 static int ram_save_complete(QEMUFile *f, void *opaque)
 {
     migration_bitmap_sync();
@@ -937,7 +942,7 @@ static void ram_save_set_params(const MigrationParams *params, void *opaque)
         savevm_ram_handlers.save_live_complete =
             postcopy_outgoing_ram_save_complete;
     } else {
-        savevm_ram_handlers.save_live_iterate = ram_save_iterate;
+        savevm_ram_handlers.save_live_iterate = ram_save_iterate_bwidth;
         savevm_ram_handlers.save_live_complete = ram_save_complete;
     }
 }
@@ -945,7 +950,7 @@ static void ram_save_set_params(const MigrationParams *params, void *opaque)
 SaveVMHandlers savevm_ram_handlers = {
     .set_params = ram_save_set_params,
     .save_live_setup = ram_save_setup,
-    .save_live_iterate = ram_save_iterate,
+    .save_live_iterate = ram_save_iterate_bwidth,
     .save_live_complete = ram_save_complete,
     .load_state = ram_load_precopy,
     .cancel = ram_migration_cancel,
diff --git a/arch_init.h b/arch_init.h
index 3977ca7..966b25a 100644
--- a/arch_init.h
+++ b/arch_init.h
@@ -47,6 +47,7 @@ CpuDefinitionInfoList GCC_WEAK_DECL *arch_query_cpu_definitions(Error **errp);
 #define RAM_SAVE_VERSION_ID     4 /* currently version 4 */

 int ram_load_page(QEMUFile *f, void *host, int flags);
+int ram_save_iterate(QEMUFile *f);

 #if defined(NEED_CPU_H) && !defined(CONFIG_USER_ONLY)
 void ram_save_set_last_block(RAMBlock *block, ram_addr_t offset);
--
1.7.10.4
[PATCH v3 32/35] postcopy: pre+post optimization incoming side
Signed-off-by: Isaku Yamahata yamah...@valinux.co.jp --- migration-postcopy.c | 207 +- 1 file changed, 204 insertions(+), 3 deletions(-) diff --git a/migration-postcopy.c b/migration-postcopy.c index 421fb39..9298cd4 100644 --- a/migration-postcopy.c +++ b/migration-postcopy.c @@ -274,6 +274,9 @@ static void postcopy_outgoing_free_req(struct qemu_umem_req *req) #define QEMU_VM_POSTCOPY_INIT 0 #define QEMU_VM_POSTCOPY_SECTION_FULL 1 +/* options in QEMU_VM_POSTCOPY_INIT section */ +#define POSTCOPY_OPTION_PRECOPY 1ULL + /*** * outgoing part */ @@ -739,6 +742,7 @@ struct PostcopyIncomingUMemDaemon { int nr_target_pages_per_host_page; int target_to_host_page_shift; int version_id; /* save/load format version id */ +bool precopy_enabled; QemuThread thread; QLIST_HEAD(, UMemBlock) blocks; @@ -784,6 +788,7 @@ static PostcopyIncomingState state = { static PostcopyIncomingUMemDaemon umemd = { .state = 0, +.precopy_enabled = false, .to_qemu_fd = -1, .to_qemu = NULL, .from_qemu_fd = -1, @@ -797,6 +802,8 @@ static PostcopyIncomingUMemDaemon umemd = { static void *postcopy_incoming_umemd(void*); static void postcopy_incoming_qemu_handle_req(void *opaque); +static UMemBlock *postcopy_incoming_umem_block_from_stream( +QEMUFile *f, int flags); /* protected by qemu_mutex_lock_ramlist() */ void postcopy_incoming_ram_free(RAMBlock *ram_block) @@ -875,6 +882,25 @@ int postcopy_incoming_ram_load(QEMUFile *f, void *opaque, int version_id) return -EINVAL; } +static void* +postcopy_incoming_shmem_from_stream_offset(QEMUFile *f, ram_addr_t offset, + int flags) +{ +UMemBlock *block = postcopy_incoming_umem_block_from_stream(f, flags); +if (block == NULL) { +DPRINTF(error block = NULL\n); +return NULL; +} +return block-umem-shmem + offset; +} + +static int postcopy_incoming_ram_load_precopy(QEMUFile *f, void *opaque, + int version_id) +{ +return ram_load(f, opaque, version_id, +postcopy_incoming_shmem_from_stream_offset); +} + static void postcopy_incoming_umem_block_free(void) { UMemBlock 
*block; @@ -982,6 +1008,12 @@ static int postcopy_incoming_loadvm_init(QEMUFile *f, uint32_t size) return -EINVAL; } options = qemu_get_be64(f); +if (options POSTCOPY_OPTION_PRECOPY) { +options = ~POSTCOPY_OPTION_PRECOPY; +umemd.precopy_enabled = true; +} else { +umemd.precopy_enabled = false; +} if (options) { fprintf(stderr, unknown options 0x%PRIx64, options); return -ENOSYS; @@ -999,12 +1031,17 @@ static int postcopy_incoming_loadvm_init(QEMUFile *f, uint32_t size) return -ENOSYS; } -DPRINTF(detected POSTCOPY\n); +DPRINTF(detected POSTCOPY precpoy %d\n, umemd.precopy_enabled); error = postcopy_incoming_prepare(); if (error) { return error; } -savevm_ram_handlers.load_state = postcopy_incoming_ram_load; +if (umemd.precopy_enabled) { +savevm_ram_handlers.load_state = postcopy_incoming_ram_load_precopy; +} else { +savevm_ram_handlers.load_state = postcopy_incoming_ram_load; +} + incoming_postcopy = true; return 0; } @@ -1515,6 +1552,169 @@ static int postcopy_incoming_umem_ram_load(void) return 0; } +static int postcopy_incoming_umemd_read_dirty_bitmap( +QEMUFile *f, const char *idstr, uint8_t idlen, +uint64_t block_offset, uint64_t block_length, uint64_t bitmap_length) +{ +UMemBlock *block; +uint64_t bit_start = block_offset TARGET_PAGE_BITS; +uint64_t bit_end = (block_offset + block_length) TARGET_PAGE_BITS; +uint64_t bit_offset; +uint8_t *buffer; +uint64_t index; + +if ((bitmap_length % sizeof(uint64_t)) != 0) { +return -EINVAL; +} +QLIST_FOREACH(block, umemd.blocks, next) { +if (!strncmp(block-idstr, idstr, idlen)) { +break; +} +} +if (block == NULL) { +return -EINVAL; +} + +DPRINTF(bitmap %s 0x%PRIx64 0x%PRIx64 0x%PRIx64\n, +block-idstr, block_offset, block_length, bitmap_length); +buffer = g_malloc(bitmap_length); +qemu_get_buffer(f, buffer, bitmap_length); + +bit_offset = bit_start ~63; +index = 0; +while (index bitmap_length) { +uint64_t bitmap; +int i; +int j; +int bit; + +bitmap = be64_to_cpup((uint64_t*)(buffer + index)); +for (i = 0; i 64; i++) { +bit 
= bit_offset + i; +if (bit bit_start) { +continue; +} +if (bit = bit_end) { +break; +} +if (!(bitmap (1ULL i))) { +set_bit
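For readers following the incoming side: the loop above walks each 64-bit bitmap word and clips it to the page range of the RAM block. The hunk is cut off here, so the following is only a minimal sketch of that clipping logic, assuming (as the truncated code suggests) that pages whose dirty bit is clear are the ones treated as already transferred. The helper name `clean_pages_in_word` is hypothetical and not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper, not from the patch: count the pages in
 * [bit_start, bit_end) whose dirty bit is clear in `word`.
 * `word` covers page numbers bit_offset .. bit_offset + 63.
 */
static unsigned clean_pages_in_word(uint64_t word, uint64_t bit_offset,
                                    uint64_t bit_start, uint64_t bit_end)
{
    unsigned clean = 0;
    unsigned i;

    assert(bit_start <= bit_end);
    for (i = 0; i < 64; i++) {
        uint64_t bit = bit_offset + i;

        if (bit < bit_start) {
            continue;           /* before the block: skip alignment padding */
        }
        if (bit >= bit_end) {
            break;              /* past the block: nothing more in this word */
        }
        if (!(word & (1ULL << i))) {
            clean++;            /* precopy already delivered this page */
        }
    }
    return clean;
}
```

Using a 64-bit type for the page number avoids truncation when the block sits above 2^31 pages.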
[PATCH v3 35/35] postcopy: pre+post optimization outgoing side
Signed-off-by: Isaku Yamahata yamah...@valinux.co.jp --- arch_init.c |6 ++-- migration-postcopy.c | 94 +++--- migration.h |1 + 3 files changed, 94 insertions(+), 7 deletions(-) diff --git a/arch_init.c b/arch_init.c index 49fbaff..f9bd483 100644 --- a/arch_init.c +++ b/arch_init.c @@ -502,8 +502,10 @@ bool ram_save_block(QEMUFile *f, bool last_stage) if (offset = block-length) { offset = 0; block = QLIST_NEXT(block, next); -if (!block) +if (!block) { block = QLIST_FIRST(ram_list.blocks); +migrate_get_current()-precopy_count++; +} } } while (block != last_block || offset != last_offset); @@ -619,7 +621,7 @@ static int ram_save_setup(QEMUFile *f, void *opaque) acct_clear(); } -if (!params-postcopy) { +if (!(params-postcopy params-precopy_count == 0)) { memory_global_dirty_log_start(); migration_bitmap_sync(); } diff --git a/migration-postcopy.c b/migration-postcopy.c index 8a43c42..3f63385 100644 --- a/migration-postcopy.c +++ b/migration-postcopy.c @@ -322,6 +322,10 @@ int postcopy_outgoing_create_read_socket(MigrationState *s) void postcopy_outgoing_state_begin(QEMUFile *f, const MigrationParams *params) { uint64_t options = 0; +if (params-precopy_count 0) { +options |= POSTCOPY_OPTION_PRECOPY; +} + qemu_put_ubyte(f, QEMU_VM_POSTCOPY_INIT); qemu_put_be32(f, sizeof(options)); qemu_put_be64(f, options); @@ -337,12 +341,36 @@ void postcopy_outgoing_state_complete( int postcopy_outgoing_ram_save_iterate(QEMUFile *f, void *opaque) { -qemu_put_be64(f, RAM_SAVE_FLAG_EOS); -return 1; +int ret; +MigrationState *s = migrate_get_current(); +if (s-params.precopy_count == 0) { +qemu_put_be64(f, RAM_SAVE_FLAG_EOS); +return 1; +} + +ret = ram_save_iterate(f); +if (ret 0) { +return ret; +} +if (ret == 1) { +DPRINTF(precopy worked\n); +return ret; +} +if (ram_bytes_remaining() == 0) { +DPRINTF(no more precopy\n); +return 1; +} +return s-precopy_count = s-params.precopy_count? 
1: 0; } int postcopy_outgoing_ram_save_complete(QEMUFile *f, void *opaque) { +MigrationState *s = migrate_get_current(); +if (s-params.precopy_count 0) { +/* Make sure all dirty bits are set */ +migration_bitmap_sync(); +memory_global_dirty_log_stop(); +} qemu_put_be64(f, RAM_SAVE_FLAG_EOS); return 0; } @@ -544,6 +572,7 @@ static void postcopy_outgoing_recv_handler(void *opaque) PostcopyOutgoingState *postcopy_outgoing_begin(MigrationState *ms) { PostcopyOutgoingState *s = g_new(PostcopyOutgoingState, 1); +const RAMBlock *block; DPRINTF(outgoing begin\n); qemu_buffered_file_drain(ms-file); @@ -553,9 +582,64 @@ PostcopyOutgoingState *postcopy_outgoing_begin(MigrationState *ms) s-mig_read = ms-file_read; s-mig_buffered_write = ms-file; -/* Make sure all dirty bits are set */ -memory_global_dirty_log_stop(); -migration_bitmap_init(); +if (ms-params.precopy_count 0) { +QEMUFile *f = ms-file; +uint64_t last_long = +BITS_TO_LONGS(last_ram_offset() TARGET_PAGE_BITS); + +/* send dirty bitmap */ +qemu_mutex_lock_ramlist(); +QLIST_FOREACH(block, ram_list.blocks, next) { +const unsigned long *bitmap = migration_bitmap_get(); +uint64_t length; +uint64_t start; +uint64_t end; +uint64_t i; + +qemu_put_byte(f, strlen(block-idstr)); +qemu_put_buffer(f, (uint8_t *)block-idstr, strlen(block-idstr)); +qemu_put_be64(f, block-offset); +qemu_put_be64(f, block-length); + +start = (block-offset TARGET_PAGE_BITS); +end = (block-offset + block-length) TARGET_PAGE_BITS; + +length = BITS_TO_LONGS(end - (start ~63)) * sizeof(unsigned long); +length = DIV_ROUND_UP(length, sizeof(uint64_t)) * sizeof(uint64_t); +qemu_put_be64(f, length); +DPRINTF(dirty bitmap %s 0x%PRIx64 0x%PRIx64 0x%PRIx64\n, +block-idstr, block-offset, block-length, length); + +start /= BITS_PER_LONG; +end = DIV_ROUND_UP(end, BITS_PER_LONG); +assert(end = last_long); + +for (i = start; i end; + i += sizeof(uint64_t) / sizeof(unsigned long)) { +uint64_t val; +#if HOST_LONG_BITS == 64 +val = bitmap[i]; +#elif HOST_LONG_BITS == 
32 +if (i + 1 last_long) { +val = bitmap[i] | ((uint64_t)bitmap[i + 1] 32); +} else { +val = bitmap[i]; +} +#else +# error unsupported +#endif
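The `HOST_LONG_BITS == 32` branch above is easier to follow in isolation. The sketch below models the 32-bit host case with `uint32_t` words so that it behaves the same on any build host; `pack_words32` and `put_be64` are hypothetical names for illustration, not functions from the patch or from QEMU.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: combine bitmap words i and i + 1 into one 64-bit
 * value, with word i in the low half. The last word of an odd-length
 * bitmap is sent alone, as in the patch's else branch.
 */
static uint64_t pack_words32(const uint32_t *bitmap, size_t i, size_t nwords)
{
    uint64_t val;

    assert(i < nwords);
    val = bitmap[i];
    if (i + 1 < nwords) {
        val |= (uint64_t)bitmap[i + 1] << 32;
    }
    return val;
}

/* Hypothetical helper: store a 64-bit value in big-endian byte order. */
static void put_be64(uint8_t *out, uint64_t val)
{
    int k;

    for (k = 0; k < 8; k++) {
        out[k] = (uint8_t)(val >> (56 - 8 * k));
    }
}
```

Sending fixed 64-bit big-endian words means the receiver does not need to know the sender's `long` size.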
[PATCH v3 29/35] postcopy/outgoing: add movebg mode(-m) to migration command
When movebg mode is enabled, the point to send background page is set to the next page to on-demand page. Signed-off-by: Isaku Yamahata yamah...@valinux.co.jp --- hmp-commands.hx |8 +--- hmp.c|3 ++- migration-postcopy.c |8 migration.c |5 - migration.h |1 + qapi-schema.json |2 +- qmp-commands.hx |2 +- savevm.c |1 + 8 files changed, 23 insertions(+), 7 deletions(-) diff --git a/hmp-commands.hx b/hmp-commands.hx index 5e2c77c..942f620 100644 --- a/hmp-commands.hx +++ b/hmp-commands.hx @@ -826,15 +826,16 @@ ETEXI { .name = migrate, -.args_type = detach:-d,blk:-b,inc:-i,postcopy:-p,nobg:-n,uri:s, +.args_type = detach:-d,blk:-b,inc:-i,postcopy:-p,movebg:-m,nobg:-n,uri:s, forward:i?,backward:i?, -.params = [-d] [-b] [-i] [-p [-n] uri [forward] [backword], +.params = [-d] [-b] [-i] [-p [-n] [-m]] uri [forward] [backword], .help = migrate to URI (using -d to not wait for completion) \n\t\t\t -b for migration without shared storage with full copy of disk\n\t\t\t -i for migration without shared storage with incremental copy of disk (base image shared between src and destination) \n\t\t\t-p for migration with postcopy mode enabled + \n\t\t\t-m for move background transfer of postcopy mode \n\t\t\t-n for no background transfer of postcopy mode \n\t\t\tforward: the number of pages to forward-prefault when postcopy (default 0) @@ -845,12 +846,13 @@ ETEXI STEXI -@item migrate [-d] [-b] [-i] [-p [-n]] @var{uri} @var{forward} @var{backward} +@item migrate [-d] [-b] [-i] [-p [-n] [-m]] @var{uri} @var{forward} @var{backward} @findex migrate Migrate to @var{uri} (using -d to not wait for completion). 
-b for migration with full copy of disk -i for migration with incremental copy of disk (base image is shared) -p for migration with postcopy mode enabled (forward/backward is prefault size when postcopy) + -m for migratoin with postcopy mode enabled with moving position -n for migration with postcopy mode enabled without background transfer ETEXI diff --git a/hmp.c b/hmp.c index fb1275d..a0bd869 100644 --- a/hmp.c +++ b/hmp.c @@ -1036,6 +1036,7 @@ void hmp_migrate(Monitor *mon, const QDict *qdict) int blk = qdict_get_try_bool(qdict, blk, 0); int inc = qdict_get_try_bool(qdict, inc, 0); int postcopy = qdict_get_try_bool(qdict, postcopy, 0); +int movebg = qdict_get_try_bool(qdict, movebg, 0); int nobg = qdict_get_try_bool(qdict, nobg, 0); int forward = qdict_get_try_int(qdict, forward, 0); int backward = qdict_get_try_int(qdict, backward, 0); @@ -1043,7 +1044,7 @@ void hmp_migrate(Monitor *mon, const QDict *qdict) Error *err = NULL; qmp_migrate(uri, !!blk, blk, !!inc, inc, false, false, -!!postcopy, postcopy, !!nobg, nobg, +!!postcopy, postcopy, !!movebg, movebg, !!nobg, nobg, !!forward, forward, !!backward, backward, err); if (err) { diff --git a/migration-postcopy.c b/migration-postcopy.c index 3d51898..421fb39 100644 --- a/migration-postcopy.c +++ b/migration-postcopy.c @@ -432,6 +432,14 @@ static int postcopy_outgoing_handle_req(PostcopyOutgoingState *s, true, j); } } +if (s-ms-params.movebg) { +ram_addr_t last_offset = +(req-pgoffs[req-nr - 1] + s-ms-params.prefault_forward) +TARGET_PAGE_BITS; +last_offset = MIN(last_offset, + s-last_block_read-length - TARGET_PAGE_SIZE); +ram_save_set_last_block(s-last_block_read, last_offset); +} /* backward prefault */ for (j = 1; j = s-ms-params.prefault_backward; j++) { for (i = 0; i req-nr; i++) { diff --git a/migration.c b/migration.c index f29e3bb..057ea31 100644 --- a/migration.c +++ b/migration.c @@ -510,7 +510,9 @@ void migrate_del_blocker(Error *reason) void qmp_migrate(const char *uri, bool has_blk, bool blk, bool 
has_inc, bool inc, bool has_detach, bool detach, - bool has_postcopy, bool postcopy, bool has_nobg, bool nobg, + bool has_postcopy, bool postcopy, + bool has_movebg, bool movebg, + bool has_nobg, bool nobg, bool has_forward, int64_t forward, bool has_backward, int64_t backward, Error **errp) @@ -524,6 +526,7 @@ void qmp_migrate(const char *uri, bool has_blk, bool blk, params.shared = inc; params.postcopy = postcopy; params.nobg
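As a rough illustration of the movebg hunk in migration-postcopy.c: the background cursor is moved to just past the last on-demand page plus the forward prefault window, clamped to the last page of the block. The sketch assumes 4 KiB target pages; `SKETCH_PAGE_BITS` and `movebg_cursor` are made-up names, not identifiers from the series.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption for this sketch only: 4 KiB target pages. */
#define SKETCH_PAGE_BITS 12
#define SKETCH_PAGE_SIZE (1ULL << SKETCH_PAGE_BITS)

/*
 * Hypothetical helper: byte offset where background transfer would
 * resume after an on-demand request whose last page is last_req_pgoff.
 */
static uint64_t movebg_cursor(uint64_t last_req_pgoff, uint64_t forward,
                              uint64_t block_length)
{
    uint64_t offset;
    uint64_t max_offset;

    assert(block_length >= SKETCH_PAGE_SIZE);
    offset = (last_req_pgoff + forward) << SKETCH_PAGE_BITS;
    max_offset = block_length - SKETCH_PAGE_SIZE;   /* last page in block */
    return offset < max_offset ? offset : max_offset;
}
```

The clamp matters because the forward window can run past the end of the block the request landed in.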
[PATCH v3 30/35] arch_init: factor out ram_load
Signed-off-by: Isaku Yamahata yamah...@valinux.co.jp
---
 arch_init.c | 13 ++---
 arch_init.h |  3 +++
 2 files changed, 13 insertions(+), 3 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index 9137013..f86a0b4 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -867,7 +867,9 @@ int ram_load_page(QEMUFile *f, void *host, int flags)
     return 0;
 }
 
-static int ram_load(QEMUFile *f, void *opaque, int version_id)
+int ram_load(QEMUFile *f, void *opaque, int version_id,
+             void *(host_from_stream_offset_p)(QEMUFile *f,
+                                               ram_addr_t offsset, int flags))
 {
     ram_addr_t addr;
     int flags, ret = 0;
@@ -899,7 +901,7 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
 
         if (flags & (RAM_SAVE_FLAG_COMPRESS | RAM_SAVE_FLAG_PAGE |
                      RAM_SAVE_FLAG_XBZRLE)) {
-            host = host_from_stream_offset(f, addr, flags);
+            host = host_from_stream_offset_p(f, addr, flags);
             if (!host) {
                 return -EINVAL;
             }
@@ -922,6 +924,11 @@ done:
     return ret;
 }
 
+static int ram_load_precopy(QEMUFile *f, void *opaque, int version_id)
+{
+    return ram_load(f, opaque, version_id, host_from_stream_offset);
+}
+
 static void ram_save_set_params(const MigrationParams *params, void *opaque)
 {
     if (params->postcopy) {
@@ -940,7 +947,7 @@ SaveVMHandlers savevm_ram_handlers = {
     .save_live_setup = ram_save_setup,
     .save_live_iterate = ram_save_iterate,
     .save_live_complete = ram_save_complete,
-    .load_state = ram_load,
+    .load_state = ram_load_precopy,
     .cancel = ram_migration_cancel,
 };
 
diff --git a/arch_init.h b/arch_init.h
index 9165456..3977ca7 100644
--- a/arch_init.h
+++ b/arch_init.h
@@ -54,6 +54,9 @@ bool ram_save_page(QEMUFile *f, RAMBlock *block, ram_addr_t offset,
                    bool last_stage);
 RAMBlock *ram_find_block(const char *id, uint8_t len);
 int ram_load_mem_size(QEMUFile *f, ram_addr_t total_ram_bytes);
+int ram_load(QEMUFile *f, void *opaque, int version_id,
+             void *(host_from_stream_offset_p)(QEMUFile *f,
+                                               ram_addr_t offsset, int flags));
 #endif
 
 #endif
-- 
1.7.10.4
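The point of 30/35 is that the page loader no longer hard-codes how a stream offset is mapped to host memory; the caller passes that in. A stripped-down sketch of the same idea, outside QEMU, might look like the following. All names here (`translate_fn`, `load_pages`, `flat_translate`, `reject_translate`) are hypothetical, and the "stream" is just a byte array.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical callback type: map a stream offset to a host address. */
typedef uint8_t *(*translate_fn)(void *opaque, uint64_t offset);

/* Copy npages pages from stream to wherever translate() says they live. */
static int load_pages(const uint8_t *stream, size_t npages, size_t page_size,
                      translate_fn translate, void *opaque)
{
    size_t i;

    assert(translate != NULL);
    for (i = 0; i < npages; i++) {
        uint8_t *host = translate(opaque, (uint64_t)i * page_size);

        if (!host) {
            return -EINVAL;     /* same error the patch returns */
        }
        memcpy(host, stream + i * page_size, page_size);
    }
    return 0;
}

/* Precopy-like translation: opaque is the flat destination buffer. */
static uint8_t *flat_translate(void *opaque, uint64_t offset)
{
    return (uint8_t *)opaque + offset;
}

/* A translation that knows no mapping, to exercise the error path. */
static uint8_t *reject_translate(void *opaque, uint64_t offset)
{
    (void)opaque;
    (void)offset;
    return NULL;
}
```

A postcopy loader could then supply its own translation without duplicating the loop, which seems to be what the later patches in the series rely on.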
[PATCH v3 28/35] arch_init: factor out setting last_block, last_offset
Signed-off-by: Isaku Yamahata yamah...@valinux.co.jp
---
 arch_init.c | 10 +++---
 arch_init.h |  1 +
 2 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index d95ce7b..9137013 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -416,6 +416,12 @@ static void migration_bitmap_sync(void)
 
 static uint64_t bytes_transferred;
 
+void ram_save_set_last_block(RAMBlock *block, ram_addr_t offset)
+{
+    last_block = block;
+    last_offset = offset;
+}
+
 /*
  * ram_save_page: Writes a page of memory to the stream f
  *
@@ -496,9 +502,7 @@ bool ram_save_block(QEMUFile *f, bool last_stage)
         }
     } while (block != last_block || offset != last_offset);
 
-    last_block = block;
-    last_offset = offset;
-
+    ram_save_set_last_block(block, offset);
     return wrote;
 }
 
diff --git a/arch_init.h b/arch_init.h
index 499d0f1..9165456 100644
--- a/arch_init.h
+++ b/arch_init.h
@@ -49,6 +49,7 @@ CpuDefinitionInfoList GCC_WEAK_DECL *arch_query_cpu_definitions(Error **errp);
 int ram_load_page(QEMUFile *f, void *host, int flags);
 
 #if defined(NEED_CPU_H) && !defined(CONFIG_USER_ONLY)
+void ram_save_set_last_block(RAMBlock *block, ram_addr_t offset);
 bool ram_save_page(QEMUFile *f, RAMBlock *block, ram_addr_t offset,
                    bool last_stage);
 RAMBlock *ram_find_block(const char *id, uint8_t len);
-- 
1.7.10.4
[PATCH v3 26/35] postcopy/outgoing: add -n options to disable background transfer
This is for benchmark purpose Signed-off-by: Isaku Yamahata yamah...@valinux.co.jp --- hmp-commands.hx | 10 ++ hmp.c|4 +++- migration-postcopy.c |7 +++ migration.c |4 +++- migration.h |1 + qapi-schema.json |2 +- qmp-commands.hx |3 ++- savevm.c |1 + 8 files changed, 24 insertions(+), 8 deletions(-) diff --git a/hmp-commands.hx b/hmp-commands.hx index f2f1264..b054760 100644 --- a/hmp-commands.hx +++ b/hmp-commands.hx @@ -826,25 +826,27 @@ ETEXI { .name = migrate, -.args_type = detach:-d,blk:-b,inc:-i,postcopy:-p,uri:s, -.params = [-d] [-b] [-i] [-p] uri, +.args_type = detach:-d,blk:-b,inc:-i,postcopy:-p,nobg:-n,uri:s, +.params = [-d] [-b] [-i] [-p [-n]] uri, .help = migrate to URI (using -d to not wait for completion) \n\t\t\t -b for migration without shared storage with full copy of disk\n\t\t\t -i for migration without shared storage with incremental copy of disk (base image shared between src and destination) - \n\t\t\t-p for migration with postcopy mode enabled, + \n\t\t\t-p for migration with postcopy mode enabled + \n\t\t\t-n for no background transfer of postcopy mode, .mhandler.cmd = hmp_migrate, }, STEXI -@item migrate [-d] [-b] [-i] [-p] @var{uri} +@item migrate [-d] [-b] [-i] [-p [-n]] @var{uri} @findex migrate Migrate to @var{uri} (using -d to not wait for completion). 
-b for migration with full copy of disk -i for migration with incremental copy of disk (base image is shared) -p for migration with postcopy mode enabled + -n for migration with postcopy mode enabled without background transfer ETEXI { diff --git a/hmp.c b/hmp.c index 2ea3bc4..203b552 100644 --- a/hmp.c +++ b/hmp.c @@ -1036,11 +1036,13 @@ void hmp_migrate(Monitor *mon, const QDict *qdict) int blk = qdict_get_try_bool(qdict, blk, 0); int inc = qdict_get_try_bool(qdict, inc, 0); int postcopy = qdict_get_try_bool(qdict, postcopy, 0); +int nobg = qdict_get_try_bool(qdict, nobg, 0); const char *uri = qdict_get_str(qdict, uri); Error *err = NULL; qmp_migrate(uri, !!blk, blk, !!inc, inc, false, false, -!!postcopy, postcopy, err); +!!postcopy, postcopy, !!nobg, nobg, +err); if (err) { monitor_printf(mon, migrate: %s\n, error_get_pretty(err)); error_free(err); diff --git a/migration-postcopy.c b/migration-postcopy.c index 399e233..5f98ae6 100644 --- a/migration-postcopy.c +++ b/migration-postcopy.c @@ -557,6 +557,13 @@ int postcopy_outgoing_ram_save_background(QEMUFile *f, void *postcopy) abort(); } +if (s-ms-params.nobg) { +if (ram_bytes_remaining() == 0) { +postcopy_outgoing_ram_all_sent(f, s); +} +return 0; +} + DPRINTF(outgoing background state: %d\n, s-state); i = 0; t0 = qemu_get_clock_ns(rt_clock); diff --git a/migration.c b/migration.c index 85f8f71..279dda5 100644 --- a/migration.c +++ b/migration.c @@ -510,7 +510,8 @@ void migrate_del_blocker(Error *reason) void qmp_migrate(const char *uri, bool has_blk, bool blk, bool has_inc, bool inc, bool has_detach, bool detach, - bool has_postcopy, bool postcopy, Error **errp) + bool has_postcopy, bool postcopy, bool has_nobg, bool nobg, + Error **errp) { MigrationState *s = migrate_get_current(); MigrationParams params; @@ -520,6 +521,7 @@ void qmp_migrate(const char *uri, bool has_blk, bool blk, params.blk = blk; params.shared = inc; params.postcopy = postcopy; +params.nobg = nobg; if (s-state == MIG_STATE_ACTIVE) { 
error_set(errp, QERR_MIGRATION_ACTIVE); diff --git a/migration.h b/migration.h index 9b3c03b..6724c19 100644 --- a/migration.h +++ b/migration.h @@ -25,6 +25,7 @@ struct MigrationParams { bool blk; bool shared; bool postcopy; +bool nobg; }; typedef struct MigrationState MigrationState; diff --git a/qapi-schema.json b/qapi-schema.json index c969e5a..70d0577 100644 --- a/qapi-schema.json +++ b/qapi-schema.json @@ -2095,7 +2095,7 @@ ## { 'command': 'migrate', 'data': {'uri': 'str', '*blk': 'bool', '*inc': 'bool', '*detach': 'bool' , - '*postcopy': 'bool'} } + '*postcopy': 'bool', '*nobg': 'bool'} } # @xen-save-devices-state: # diff --git a/qmp-commands.hx b/qmp-commands.hx index ece7a7e..defbeba 100644 --- a/qmp-commands.hx +++ b/qmp-commands.hx @@ -518,7 +518,7 @@ EQMP { .name = migrate, -.args_type = detach:-d,blk:-b,inc:-i,postcopy:-p,uri:s, +.args_type = detach:-d,blk:-b,inc:-i,postcopy:-p,nobg:-n
[PATCH v3 25/35] postcopy: implement outgoing part of postcopy live migration
This patch implements postcopy live migration for outgoing part Signed-off-by: Isaku Yamahata yamah...@valinux.co.jp --- Changes v2 - v3: - modify savevm_ram_handlers instead of if (postcopy) - code simplification Changes v1 - v2: - fix parameter to qemu_fdopen() - handle QEMU_UMEM_REQ_EOC properly when PO_STATE_ALL_PAGES_SENT, QEMU_UMEM_REQ_EOC request was ignored. handle properly it. - flush on-demand page unconditionally - improve postcopy_outgoing_ram_save_live and postcopy_outgoing_begin() - use qemu_fopen_fd - use memory api instead of obsolete api - segv in postcopy_outgoing_check_all_ram_sent() - catch up qapi change --- arch_init.c | 22 ++- migration-exec.c |4 + migration-fd.c | 17 ++ migration-postcopy.c | 423 ++ migration-tcp.c |6 +- migration-unix.c | 26 +++- migration.c | 32 +++- migration.h | 18 +++ savevm.c | 35 - sysemu.h |2 +- 10 files changed, 572 insertions(+), 13 deletions(-) diff --git a/arch_init.c b/arch_init.c index d82316d..d95ce7b 100644 --- a/arch_init.c +++ b/arch_init.c @@ -189,7 +189,6 @@ static struct { .cache = NULL, }; - int64_t xbzrle_cache_resize(int64_t new_size) { if (XBZRLE.cache != NULL) { @@ -591,6 +590,7 @@ static void reset_ram_globals(void) static int ram_save_setup(QEMUFile *f, void *opaque) { RAMBlock *block; +const MigrationParams *params = migrate_get_current()-params; migration_bitmap_init(); qemu_mutex_lock_ramlist(); @@ -610,8 +610,10 @@ static int ram_save_setup(QEMUFile *f, void *opaque) acct_clear(); } -memory_global_dirty_log_start(); -migration_bitmap_sync(); +if (!params-postcopy) { +memory_global_dirty_log_start(); +migration_bitmap_sync(); +} qemu_put_be64(f, ram_bytes_total() | RAM_SAVE_FLAG_MEM_SIZE); @@ -916,7 +918,21 @@ done: return ret; } +static void ram_save_set_params(const MigrationParams *params, void *opaque) +{ +if (params-postcopy) { +savevm_ram_handlers.save_live_iterate = +postcopy_outgoing_ram_save_iterate; +savevm_ram_handlers.save_live_complete = +postcopy_outgoing_ram_save_complete; +} 
else { +savevm_ram_handlers.save_live_iterate = ram_save_iterate; +savevm_ram_handlers.save_live_complete = ram_save_complete; +} +} + SaveVMHandlers savevm_ram_handlers = { +.set_params = ram_save_set_params, .save_live_setup = ram_save_setup, .save_live_iterate = ram_save_iterate, .save_live_complete = ram_save_complete, diff --git a/migration-exec.c b/migration-exec.c index 95e9779..10bbecf 100644 --- a/migration-exec.c +++ b/migration-exec.c @@ -64,6 +64,10 @@ int exec_start_outgoing_migration(MigrationState *s, const char *command) { FILE *f; +if (s-params.postcopy) { +return -ENOSYS; +} + f = popen(command, w); if (f == NULL) { DPRINTF(Unable to popen exec target\n); diff --git a/migration-fd.c b/migration-fd.c index 8384975..f68fa28 100644 --- a/migration-fd.c +++ b/migration-fd.c @@ -90,6 +90,23 @@ int fd_start_outgoing_migration(MigrationState *s, const char *fdname) s-write = fd_write; s-close = fd_close; +if (s-params.postcopy) { +int flags = fcntl(s-fd, F_GETFL); +if ((flags O_ACCMODE) != O_RDWR) { +goto err_after_open; +} + +s-fd_read = dup(s-fd); +if (s-fd_read == -1) { +goto err_after_open; +} +s-file_read = qemu_fopen_fd(s-fd_read, rb); +if (s-file_read == NULL) { +close(s-fd_read); +goto err_after_open; +} +} + migrate_fd_connect(s); return 0; diff --git a/migration-postcopy.c b/migration-postcopy.c index 0809ffa..399e233 100644 --- a/migration-postcopy.c +++ b/migration-postcopy.c @@ -167,6 +167,107 @@ static void postcopy_incoming_send_req(QEMUFile *f, } } +static int postcopy_outgoing_recv_req_idstr(QEMUFile *f, +struct qemu_umem_req *req, +size_t *offset) +{ +int ret; + +req-len = qemu_peek_byte(f, *offset); +*offset += 1; +if (req-len == 0) { +return -EAGAIN; +} +req-idstr = g_malloc((int)req-len + 1); +ret = qemu_peek_buffer(f, (uint8_t*)req-idstr, req-len, *offset); +*offset += ret; +if (ret != req-len) { +g_free(req-idstr); +req-idstr = NULL; +return -EAGAIN; +} +req-idstr[req-len] = 0; +return 0; +} + +static int 
postcopy_outgoing_recv_req_pgoffs(QEMUFile *f, + struct qemu_umem_req *req, + size_t *offset) +{ +int ret; +uint32_t be32; +uint32_t i; + +ret = qemu_peek_buffer(f, (uint8_t*)be32, sizeof(be32
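The request parsers above peek at the stream and return -EAGAIN when the full request has not arrived yet, so nothing is consumed until a whole message can be handled. A minimal sketch of that pattern over a plain byte buffer follows; `peek_idstr` is a hypothetical stand-in, not the QEMUFile API. Unlike the patch, this sketch only advances the offset on success.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: read a length-prefixed id string starting at
 * buf[*offset] without consuming input. Returns 0 and advances *offset
 * on success, -EAGAIN if more bytes are needed, -EINVAL if out is too
 * small for the string plus its terminating NUL.
 */
static int peek_idstr(const uint8_t *buf, size_t avail, size_t *offset,
                      char *out, size_t outsz)
{
    uint8_t len;

    assert(buf != NULL && offset != NULL && out != NULL);
    if (*offset >= avail) {
        return -EAGAIN;         /* length byte not received yet */
    }
    len = buf[*offset];
    if (len == 0) {
        return -EAGAIN;         /* the patch treats 0 as "no data yet" */
    }
    if (avail - *offset - 1 < len) {
        return -EAGAIN;         /* string body is still incomplete */
    }
    if ((size_t)len + 1 > outsz) {
        return -EINVAL;
    }
    memcpy(out, buf + *offset + 1, len);
    out[len] = '\0';
    *offset += 1 + (size_t)len;
    return 0;
}
```

The caller can retry with the same starting offset once more data is buffered.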
[PATCH v3 27/35] postcopy/outgoing: implement forward/backward prefault
When page is requested, send surrounding pages are also sent. Signed-off-by: Isaku Yamahata yamah...@valinux.co.jp --- hmp-commands.hx | 15 - hmp.c|3 +++ migration-postcopy.c | 57 +- migration.c | 20 ++ migration.h |2 ++ qapi-schema.json |3 ++- 6 files changed, 89 insertions(+), 11 deletions(-) diff --git a/hmp-commands.hx b/hmp-commands.hx index b054760..5e2c77c 100644 --- a/hmp-commands.hx +++ b/hmp-commands.hx @@ -826,26 +826,31 @@ ETEXI { .name = migrate, -.args_type = detach:-d,blk:-b,inc:-i,postcopy:-p,nobg:-n,uri:s, -.params = [-d] [-b] [-i] [-p [-n]] uri, +.args_type = detach:-d,blk:-b,inc:-i,postcopy:-p,nobg:-n,uri:s, + forward:i?,backward:i?, +.params = [-d] [-b] [-i] [-p [-n] uri [forward] [backword], .help = migrate to URI (using -d to not wait for completion) \n\t\t\t -b for migration without shared storage with full copy of disk\n\t\t\t -i for migration without shared storage with incremental copy of disk (base image shared between src and destination) \n\t\t\t-p for migration with postcopy mode enabled - \n\t\t\t-n for no background transfer of postcopy mode, + \n\t\t\t-n for no background transfer of postcopy mode + \n\t\t\tforward: the number of pages to + forward-prefault when postcopy (default 0) + \n\t\t\tbackward: the number of pages to + backward-prefault when postcopy (default 0), .mhandler.cmd = hmp_migrate, }, STEXI -@item migrate [-d] [-b] [-i] [-p [-n]] @var{uri} +@item migrate [-d] [-b] [-i] [-p [-n]] @var{uri} @var{forward} @var{backward} @findex migrate Migrate to @var{uri} (using -d to not wait for completion). 
-b for migration with full copy of disk -i for migration with incremental copy of disk (base image is shared) - -p for migration with postcopy mode enabled + -p for migration with postcopy mode enabled (forward/backward is prefault size when postcopy) -n for migration with postcopy mode enabled without background transfer ETEXI diff --git a/hmp.c b/hmp.c index 203b552..fb1275d 100644 --- a/hmp.c +++ b/hmp.c @@ -1037,11 +1037,14 @@ void hmp_migrate(Monitor *mon, const QDict *qdict) int inc = qdict_get_try_bool(qdict, inc, 0); int postcopy = qdict_get_try_bool(qdict, postcopy, 0); int nobg = qdict_get_try_bool(qdict, nobg, 0); +int forward = qdict_get_try_int(qdict, forward, 0); +int backward = qdict_get_try_int(qdict, backward, 0); const char *uri = qdict_get_str(qdict, uri); Error *err = NULL; qmp_migrate(uri, !!blk, blk, !!inc, inc, false, false, !!postcopy, postcopy, !!nobg, nobg, +!!forward, forward, !!backward, backward, err); if (err) { monitor_printf(mon, migrate: %s\n, error_get_pretty(err)); diff --git a/migration-postcopy.c b/migration-postcopy.c index 5f98ae6..3d51898 100644 --- a/migration-postcopy.c +++ b/migration-postcopy.c @@ -344,6 +344,37 @@ int postcopy_outgoing_ram_save_complete(QEMUFile *f, void *opaque) return 0; } +static void postcopy_outgoing_ram_save_page(PostcopyOutgoingState *s, +uint64_t pgoffset, bool *written, +bool forward, +int prefault_pgoffset) +{ +ram_addr_t offset; +int ret; + +if (forward) { +pgoffset += prefault_pgoffset; +} else { +if (pgoffset prefault_pgoffset) { +return; +} +pgoffset -= prefault_pgoffset; +} + +offset = pgoffset TARGET_PAGE_BITS; +if (offset = s-last_block_read-length) { +assert(forward); +assert(prefault_pgoffset 0); +return; +} + +ret = ram_save_page(s-mig_buffered_write, s-last_block_read, offset, +false); +if (ret 0) { +*written = true; +} +} + /* * return value * 0: continue postcopy mode @@ -355,6 +386,7 @@ static int postcopy_outgoing_handle_req(PostcopyOutgoingState *s, bool *written) { int i; 
+uint64_t j; RAMBlock *block; DPRINTF(cmd %d state %d\n, req-cmd, s-state); @@ -387,11 +419,26 @@ static int postcopy_outgoing_handle_req(PostcopyOutgoingState *s, break; } for (i = 0; i req-nr; i++) { -DPRINTF(offs[%d] 0x%PRIx64\n, i, req-pgoffs[i]); -int ret = ram_save_page(s-mig_buffered_write, s-last_block_read, -req-pgoffs[i
[PATCH v3 23/35] postcopy: implement incoming part of postcopy live migration
This patch implements postcopy live migration for incoming part Signed-off-by: Isaku Yamahata yamah...@valinux.co.jp --- Changes v2 - v3: - threading, not fork - use blocking io instead of select + non-blocking io - don't modify RAMBlock - When device allocates its own RAM region, e.g. vshmem, it's handled by device save/load. So skip it such area which has RAM_PREALLOLC_MASK flags set. - less memory overhead - drop -postcopy option. It is automatically detected. - various improvement and simplification - error handling Changes v1 - v2: - fork umemd early to address qemu devices touching guest ram via post/pre_load - code clean up on initialization - Makefile.target migration-postcopy.c is target dependent due to TARGET_PAGE_xxx So it can't be shared between target architecture. - use qemu_fopen_fd - introduce incoming_flags_use_umem_make_present flag - use MADV_DONTNEED - make incoming socket nonblocking - several clean ups - Dropped QEMUFilePipe - Moved QEMUFileNonblock to buffered_file - Split out into umem/incoming/outgoing - make mig_read nonblocking when socket - updates for umem device changes --- Makefile.target |2 + cpu-all.h|3 + exec.c |6 + migration-fd.c |4 +- migration-postcopy.c | 1249 ++ migration-tcp.c | 10 +- migration-unix.c | 10 +- migration.h | 10 + savevm.c | 28 ++ vl.c |2 + 10 files changed, 1315 insertions(+), 9 deletions(-) create mode 100644 migration-postcopy.c diff --git a/Makefile.target b/Makefile.target index 3822bc5..930c070 100644 --- a/Makefile.target +++ b/Makefile.target @@ -121,6 +121,8 @@ obj-$(CONFIG_NO_GET_MEMORY_MAPPING) += memory_mapping-stub.o obj-$(CONFIG_NO_CORE_DUMP) += dump-stub.o LIBS+=-lz +obj-y += migration-postcopy.o umem.o + QEMU_CFLAGS += $(VNC_TLS_CFLAGS) QEMU_CFLAGS += $(VNC_SASL_CFLAGS) QEMU_CFLAGS += $(VNC_JPEG_CFLAGS) diff --git a/cpu-all.h b/cpu-all.h index b5fefc8..79846fe 100644 --- a/cpu-all.h +++ b/cpu-all.h @@ -485,6 +485,9 @@ extern ram_addr_t ram_size; /* RAM is pre-allocated and passed into 
qemu_ram_alloc_from_ptr */ #define RAM_PREALLOC_MASK (1 0) +/* RAM is allocated via umem for postcopy incoming mode */ +#define RAM_POSTCOPY_UMEM_MASK (1 1) + typedef struct RAMBlock { struct MemoryRegion *mr; uint8_t *host; diff --git a/exec.c b/exec.c index 2aa4d90..6da991a 100644 --- a/exec.c +++ b/exec.c @@ -36,6 +36,7 @@ #include arch_init.h #include memory.h #include exec-memory.h +#include migration.h #if defined(CONFIG_USER_ONLY) #include qemu.h #if defined(__FreeBSD__) || defined(__FreeBSD_kernel__) @@ -2555,6 +2556,8 @@ ram_addr_t qemu_ram_alloc_from_ptr(ram_addr_t size, void *host, new_block-host = host; new_block-flags |= RAM_PREALLOC_MASK; } else { +ram_addr_t page_size = getpagesize(); +size = (size + page_size - 1) ~(page_size - 1); if (mem_path) { #if defined (__linux__) !defined(TARGET_S390X) new_block-host = file_ram_alloc(new_block, size, mem_path); @@ -2635,6 +2638,9 @@ void qemu_ram_free(ram_addr_t addr) ram_list.version++; if (block-flags RAM_PREALLOC_MASK) { ; +} +else if (block-flags RAM_POSTCOPY_UMEM_MASK) { +postcopy_incoming_ram_free(block); } else if (mem_path) { #if defined (__linux__) !defined(TARGET_S390X) if (block-fd) { diff --git a/migration-fd.c b/migration-fd.c index b3c54e5..8384975 100644 --- a/migration-fd.c +++ b/migration-fd.c @@ -105,7 +105,9 @@ static void fd_accept_incoming_migration(void *opaque) process_incoming_migration(f); qemu_set_fd_handler2(qemu_file_fd(f), NULL, NULL, NULL, NULL); -qemu_fclose(f); +if (!incoming_postcopy) { +qemu_fclose(f); +} } int fd_start_incoming_migration(const char *infd) diff --git a/migration-postcopy.c b/migration-postcopy.c new file mode 100644 index 000..0809ffa --- /dev/null +++ b/migration-postcopy.c @@ -0,0 +1,1249 @@ +/* + * migration-postcopy.c: postcopy livemigration + * + * Copyright (c) 2011 + * National Institute of Advanced Industrial Science and Technology + * + * https://sites.google.com/site/grivonhome/quick-kvm-migration + * Author: Isaku Yamahata yamahata at valinux co 
jp + * + * This program is free software; you can redistribute it and/or modify it + * under the terms and conditions of the GNU General Public License, + * version 2, as published by the Free Software Foundation. + * + * This program is distributed in the hope it will be useful, but WITHOUT + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for + * more details. + * + * You should have received a copy of the GNU General Public License along + * with this program
[PATCH v3 24/35] postcopy outgoing: add -p option to migrate command
Added -p option to migrate command for postcopy mode and introduce postcopy parameter for migration to indicate that postcopy mode is enabled. Signed-off-by: Isaku Yamahata yamah...@valinux.co.jp --- Chnages v1 - v2: - catch up for qapi change --- hmp-commands.hx | 10 ++ hmp.c|4 +++- migration.c |3 ++- migration.h |1 + qapi-schema.json |3 ++- qmp-commands.hx |3 ++- savevm.c |3 ++- 7 files changed, 18 insertions(+), 9 deletions(-) diff --git a/hmp-commands.hx b/hmp-commands.hx index e0b537d..f2f1264 100644 --- a/hmp-commands.hx +++ b/hmp-commands.hx @@ -826,23 +826,25 @@ ETEXI { .name = migrate, -.args_type = detach:-d,blk:-b,inc:-i,uri:s, -.params = [-d] [-b] [-i] uri, +.args_type = detach:-d,blk:-b,inc:-i,postcopy:-p,uri:s, +.params = [-d] [-b] [-i] [-p] uri, .help = migrate to URI (using -d to not wait for completion) \n\t\t\t -b for migration without shared storage with full copy of disk\n\t\t\t -i for migration without shared storage with incremental copy of disk - (base image shared between src and destination), + (base image shared between src and destination) + \n\t\t\t-p for migration with postcopy mode enabled, .mhandler.cmd = hmp_migrate, }, STEXI -@item migrate [-d] [-b] [-i] @var{uri} +@item migrate [-d] [-b] [-i] [-p] @var{uri} @findex migrate Migrate to @var{uri} (using -d to not wait for completion). 
-b for migration with full copy of disk -i for migration with incremental copy of disk (base image is shared) + -p for migration with postcopy mode enabled ETEXI { diff --git a/hmp.c b/hmp.c index 2b97982..2ea3bc4 100644 --- a/hmp.c +++ b/hmp.c @@ -1035,10 +1035,12 @@ void hmp_migrate(Monitor *mon, const QDict *qdict) int detach = qdict_get_try_bool(qdict, detach, 0); int blk = qdict_get_try_bool(qdict, blk, 0); int inc = qdict_get_try_bool(qdict, inc, 0); +int postcopy = qdict_get_try_bool(qdict, postcopy, 0); const char *uri = qdict_get_str(qdict, uri); Error *err = NULL; -qmp_migrate(uri, !!blk, blk, !!inc, inc, false, false, err); +qmp_migrate(uri, !!blk, blk, !!inc, inc, false, false, +!!postcopy, postcopy, err); if (err) { monitor_printf(mon, migrate: %s\n, error_get_pretty(err)); error_free(err); diff --git a/migration.c b/migration.c index 00b0bc2..8bb6073 100644 --- a/migration.c +++ b/migration.c @@ -480,7 +480,7 @@ void migrate_del_blocker(Error *reason) void qmp_migrate(const char *uri, bool has_blk, bool blk, bool has_inc, bool inc, bool has_detach, bool detach, - Error **errp) + bool has_postcopy, bool postcopy, Error **errp) { MigrationState *s = migrate_get_current(); MigrationParams params; @@ -489,6 +489,7 @@ void qmp_migrate(const char *uri, bool has_blk, bool blk, params.blk = blk; params.shared = inc; +params.postcopy = postcopy; if (s-state == MIG_STATE_ACTIVE) { error_set(errp, QERR_MIGRATION_ACTIVE); diff --git a/migration.h b/migration.h index 0766691..b21df18 100644 --- a/migration.h +++ b/migration.h @@ -24,6 +24,7 @@ struct MigrationParams { bool blk; bool shared; +bool postcopy; }; typedef struct MigrationState MigrationState; diff --git a/qapi-schema.json b/qapi-schema.json index c615ee2..c969e5a 100644 --- a/qapi-schema.json +++ b/qapi-schema.json @@ -2094,7 +2094,8 @@ # Since: 0.14.0 ## { 'command': 'migrate', - 'data': {'uri': 'str', '*blk': 'bool', '*inc': 'bool', '*detach': 'bool' } } + 'data': {'uri': 'str', '*blk': 'bool', 
'*inc': 'bool', '*detach': 'bool' , + '*postcopy': 'bool'} } # @xen-save-devices-state: # diff --git a/qmp-commands.hx b/qmp-commands.hx index 5ba8c48..ece7a7e 100644 --- a/qmp-commands.hx +++ b/qmp-commands.hx @@ -518,7 +518,7 @@ EQMP { .name = migrate, -.args_type = detach:-d,blk:-b,inc:-i,uri:s, +.args_type = detach:-d,blk:-b,inc:-i,postcopy:-p,uri:s, .mhandler.cmd_new = qmp_marshal_input_migrate, }, @@ -532,6 +532,7 @@ Arguments: - blk: block migration, full disk copy (json-bool, optional) - inc: incremental disk copy (json-bool, optional) +- postcopy: postcopy migration (json-bool, optional) - uri: Destination URI (json-string) Example: diff --git a/savevm.c b/savevm.c index d1488d2..04b03cf 100644 --- a/savevm.c +++ b/savevm.c @@ -1806,7 +1806,8 @@ static int qemu_savevm_state(QEMUFile *f) int ret; MigrationParams params = { .blk = 0, -.shared = 0 +.shared = 0, +.postcopy = 0, }; if (qemu_savevm_state_blocked(NULL)) { -- 1.7.10.4 -- To unsubscribe from this list: send the line
[PATCH v3 20/35] osdep: add QEMU_MADV_REMOVE and trivial fix
MADV_REMOVE will be used by postcopy.

Signed-off-by: Isaku Yamahata yamah...@valinux.co.jp
---
 osdep.h | 13 +++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/osdep.h b/osdep.h
index c5fd3d9..9e97f39 100644
--- a/osdep.h
+++ b/osdep.h
@@ -113,6 +113,11 @@ void qemu_vfree(void *ptr);
 #else
 #define QEMU_MADV_HUGEPAGE QEMU_MADV_INVALID
 #endif
+#ifdef MADV_REMOVE
+#define QEMU_MADV_REMOVE MADV_REMOVE
+#else
+#define QEMU_MADV_REMOVE QEMU_MADV_INVALID
+#endif

 #elif defined(CONFIG_POSIX_MADVISE)

@@ -120,7 +125,9 @@ void qemu_vfree(void *ptr);
 #define QEMU_MADV_DONTNEED POSIX_MADV_DONTNEED
 #define QEMU_MADV_DONTFORK QEMU_MADV_INVALID
 #define QEMU_MADV_MERGEABLE QEMU_MADV_INVALID
-#define QEMU_MADV_DONTDUMP QEMU_MADV_INVALID
+#define QEMU_MADV_DONTDUMP QEMU_MADV_INVALID
+#define QEMU_MADV_HUGEPAGE QEMU_MADV_INVALID
+#define QEMU_MADV_REMOVE QEMU_MADV_INVALID

 #else /* no-op */

@@ -128,7 +135,9 @@ void qemu_vfree(void *ptr);
 #define QEMU_MADV_DONTNEED QEMU_MADV_INVALID
 #define QEMU_MADV_DONTFORK QEMU_MADV_INVALID
 #define QEMU_MADV_MERGEABLE QEMU_MADV_INVALID
-#define QEMU_MADV_DONTDUMP QEMU_MADV_INVALID
+#define QEMU_MADV_DONTDUMP QEMU_MADV_INVALID
+#define QEMU_MADV_HUGEPAGE QEMU_MADV_INVALID
+#define QEMU_MADV_REMOVE QEMU_MADV_INVALID

 #endif

--
1.7.10.4
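For readers unfamiliar with the QEMU_MADV_* pattern, here is a minimal sketch of how such a fallback macro can be consumed. The EX_* names and ex_madvise() are hypothetical and are not QEMU's; only MADV_REMOVE and madvise() are real host interfaces. The idea is that an advice value the host does not provide is rejected up front with EINVAL rather than handed to the kernel.

```c
#define _DEFAULT_SOURCE
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical names: they mirror the QEMU_MADV_* pattern, not QEMU itself. */
#define EX_MADV_INVALID (-1)

#ifdef MADV_REMOVE
#define EX_MADV_REMOVE MADV_REMOVE
#else
#define EX_MADV_REMOVE EX_MADV_INVALID
#endif

/* Reject advice values the host lacks instead of passing them to the kernel. */
static int ex_madvise(void *addr, size_t len, int advice)
{
    if (advice == EX_MADV_INVALID) {
        errno = EINVAL;
        return -1;
    }
    return madvise(addr, len, advice);
}
```

A caller would pass EX_MADV_REMOVE and treat EINVAL as "not supported on this host", which keeps the #ifdef in one place.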
[PATCH v3 15/35] arch_init/ram_save_setup: factor out bitmap alloc/free
This will be used by postcopy.

Signed-off-by: Isaku Yamahata yamah...@valinux.co.jp
---
Changes v2 -> v3:
- new
---
 arch_init.c | 25 ++---
 migration.h |  2 ++
 2 files changed, 20 insertions(+), 7 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index ad1b01b..7e6d84e 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -330,6 +330,22 @@ static unsigned long *migration_bitmap;
 static uint64_t migration_dirty_pages;
 static uint32_t last_version;

+void migration_bitmap_init(void)
+{
+    int64_t ram_pages = last_ram_offset() >> TARGET_PAGE_BITS;
+    if (!migration_bitmap) {
+        migration_bitmap = bitmap_new(ram_pages);
+    }
+    bitmap_set(migration_bitmap, 1, ram_pages);
+    migration_dirty_pages = ram_pages;
+}
+
+void migration_bitmap_free(void)
+{
+    g_free(migration_bitmap);
+    migration_bitmap = NULL;
+}
+
 static inline bool migration_bitmap_test_and_reset_dirty(MemoryRegion *mr,
                                                          ram_addr_t offset)
 {
@@ -575,11 +591,7 @@ static void reset_ram_globals(void)
 static int ram_save_setup(QEMUFile *f, void *opaque)
 {
     RAMBlock *block;
-    int64_t ram_pages = last_ram_offset() >> TARGET_PAGE_BITS;
-
-    migration_bitmap = bitmap_new(ram_pages);
-    bitmap_set(migration_bitmap, 1, ram_pages);
-    migration_dirty_pages = ram_pages;
+    migration_bitmap_init();

     qemu_mutex_lock_ramlist();
     bytes_transferred = 0;
@@ -704,8 +716,7 @@ static int ram_save_complete(QEMUFile *f, void *opaque)
     qemu_mutex_unlock_ramlist();
     qemu_put_be64(f, RAM_SAVE_FLAG_EOS);

-    g_free(migration_bitmap);
-    migration_bitmap = NULL;
+    migration_bitmap_free();

     return 0;
 }
diff --git a/migration.h b/migration.h
index 7d1b62d..73416ba 100644
--- a/migration.h
+++ b/migration.h
@@ -95,6 +95,8 @@ bool ram_save_block(QEMUFile *f, bool last_stage);
 uint64_t ram_bytes_remaining(void);
 uint64_t ram_bytes_transferred(void);
 uint64_t ram_bytes_total(void);
+void migration_bitmap_init(void);
+void migration_bitmap_free(void);

 extern SaveVMHandlers savevm_ram_handlers;

--
1.7.10.4
[PATCH v4 1/2] export necessary symbols
Cc: Andrea Arcangeli aarca...@redhat.com
Cc: Avi Kivity a...@redhat.com
Cc: Paolo Bonzini pbonz...@redhat.com
Signed-off-by: Isaku Yamahata yamah...@valinux.co.jp
---
 mm/memcontrol.c | 1 +
 mm/mempolicy.c  | 1 +
 mm/shmem.c      | 1 +
 3 files changed, 3 insertions(+)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 7acf43b..bc9fd53 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2787,6 +2787,7 @@ int mem_cgroup_newpage_charge(struct page *page,
 	return mem_cgroup_charge_common(page, mm, gfp_mask,
 					MEM_CGROUP_CHARGE_TYPE_ANON);
 }
+EXPORT_SYMBOL_GPL(mem_cgroup_cache_charge);

 /*
  * While swap-in, try_charge -> commit or cancel, the page is locked.
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index d04a8a5..3df6cf5 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -1947,6 +1947,7 @@ retry_cpuset:
 		goto retry_cpuset;
 	return page;
 }
+EXPORT_SYMBOL_GPL(alloc_pages_vma);

 /**
  * alloc_pages_current - Allocate pages.
diff --git a/mm/shmem.c b/mm/shmem.c
index 67afba5..41eaefd 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -2840,6 +2840,7 @@ int shmem_zero_setup(struct vm_area_struct *vma)
 	vma->vm_ops = &shmem_vm_ops;
 	return 0;
 }
+EXPORT_SYMBOL_GPL(shmem_zero_setup);

 /**
  * shmem_read_mapping_page_gfp - read into page cache, using specified page allocation flags.
--
1.7.10.4
[PATCH v4 0/2] postcopy migration: uvmem: Linux char device for postcopy
This is a Linux kernel driver for qemu/kvm postcopy live migration.
It is used by the qemu/kvm postcopy live migration patch series.

The user process backed memory driver provides the /dev/uvmem device.
This device is designed for some sort of distributed shared memory.
A page fault in the area backed by this driver is propagated to a
(different) server process, which serves the page contents. Usually the
server process fetches the page contents from the remote machine. Then the
faulting process continues.

ioctl UVMEM_INIT: initializes the uvmem device for qemu. Returns the file
                  descriptor of a tmpfs file; the serving thread writes
                  page contents to this file descriptor.
mmap:  the guest VM mmaps this device and uses it as guest RAM.
       A page fault in this area is propagated to the server process.
read:  returns the page offsets at which the guest VM page-faulted.
write: the server process notifies the device which pages have been
       served, so that the guest VM can resume execution.
---
Changes v3 -> v4:
- rename module name: umem -> uvmem
  avoid module name conflict

Changes v2 -> v3:
- make fault handler killable
- make use of read()/write()
- documentation

Changes version 1 -> 2:
- make ioctl structures padded to align
- un-KVM
  KVM_VMEM -> UMEM
- dropped some ioctl commands as Avi requested

Isaku Yamahata (2):
  export necessary symbols
  umem: chardevice for kvm postcopy

 Documentation/misc-devices/uvmem.txt | 292
 drivers/char/Kconfig                 |  10 +
 drivers/char/Makefile                |   1 +
 drivers/char/uvmem.c                 | 841 ++
 include/linux/uvmem.h                |  41 ++
 mm/memcontrol.c                      |   1 +
 mm/mempolicy.c                       |   1 +
 mm/shmem.c                           |   1 +
 8 files changed, 1188 insertions(+)
 create mode 100644 Documentation/misc-devices/uvmem.txt
 create mode 100644 drivers/char/uvmem.c
 create mode 100644 include/linux/uvmem.h

--
1.7.10.4
[PATCH v3 21/35] postcopy: introduce helper functions for postcopy
This patch introduces helper function for postcopy to access umem char device and to communicate between incoming-qemu and umemd. Signed-off-by: Isaku Yamahata yamah...@valinux.co.jp --- changes v2 - v3: - error check, don't abort - typedef - #ifdef CONFIG_LINUX - code simplification changes v1 - v2: - code simplification - make fault trigger more robust - introduce struct umem_pages --- umem.c | 291 umem.h | 88 2 files changed, 379 insertions(+) create mode 100644 umem.c create mode 100644 umem.h diff --git a/umem.c b/umem.c new file mode 100644 index 000..b05377b --- /dev/null +++ b/umem.c @@ -0,0 +1,291 @@ +/* + * umem.c: user process backed memory module for postcopy livemigration + * + * Copyright (c) 2011 + * National Institute of Advanced Industrial Science and Technology + * + * https://sites.google.com/site/grivonhome/quick-kvm-migration + * Author: Isaku Yamahata yamahata at valinux co jp + * + * This program is free software; you can redistribute it and/or modify it + * under the terms and conditions of the GNU General Public License, + * version 2, as published by the Free Software Foundation. + * + * This program is distributed in the hope it will be useful, but WITHOUT + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for + * more details. + * + * You should have received a copy of the GNU General Public License along + * with this program; if not, see http://www.gnu.org/licenses/. 
+ */ + +#include sys/ioctl.h +#include sys/mman.h + +#include config-host.h +#ifdef CONFIG_LINUX +#include linux/uvmem.h +#endif + +#include bitops.h +#include sysemu.h +#include hw/hw.h +#include umem.h + +//#define DEBUG_UMEM +#ifdef DEBUG_UMEM +#define DPRINTF(format, ...)\ +do {\ +printf(%s:%d format, __func__, __LINE__, ## __VA_ARGS__); \ +} while (0) +#else +#define DPRINTF(format, ...)do { } while (0) +#endif + +#define DEV_UMEM/dev/uvmem + +int umem_new(void *hostp, size_t size, UMem** umemp) +{ +#ifdef CONFIG_LINUX +struct uvmem_init uinit = { +.size = size, +.shmem_fd = -1, +}; +UMem *umem; +int error; + +assert((size % getpagesize()) == 0); +umem = g_new(UMem, 1); +umem-fd = open(DEV_UMEM, O_RDWR); +if (umem-fd 0) { +error = -errno; +perror(can't open DEV_UMEM); +goto error; +} + +if (ioctl(umem-fd, UVMEM_INIT, uinit) 0) { +error = -errno; +perror(UMEM_INIT failed); +goto error; +} +if (ftruncate(uinit.shmem_fd, uinit.size) 0) { +error = -errno; +perror(truncate(\shmem_fd\) failed); +goto error; +} + +umem-nbits = 0; +umem-nsets = 0; +umem-faulted = NULL; +umem-page_shift = ffs(getpagesize()) - 1; +umem-shmem_fd = uinit.shmem_fd; +umem-size = uinit.size; +umem-umem = mmap(hostp, size, PROT_EXEC | PROT_READ | PROT_WRITE, + MAP_PRIVATE | MAP_FIXED, umem-fd, 0); +if (umem-umem == MAP_FAILED) { +error = -errno; +perror(mmap(UMem) failed); +goto error; +} +*umemp = umem; +return 0; + +error: +if (umem-fd = 0) { +close(umem-fd); +} +if (uinit.shmem_fd = 0) { +close(uinit.shmem_fd); +} +g_free(umem); +return error; +#else +perror(postcopy migration is not supported); +return -ENOSYS; +#endif +} + +void umem_destroy(UMem *umem) +{ +if (umem-fd != -1) { +close(umem-fd); +} +if (umem-shmem_fd != -1) { +close(umem-shmem_fd); +} +g_free(umem-faulted); +g_free(umem); +} + +size_t umem_pages_size(uint64_t nr) +{ +return sizeof(UMemPages) + nr * sizeof(uint64_t); +} + +int umem_get_page_request(UMem *umem, UMemPages *page_request) +{ +ssize_t ret = read(umem-fd, 
page_request-pgoffs, + page_request-nr * sizeof(page_request-pgoffs[0])); +if (ret 0) { +if (errno != EINTR) { +perror(daemon: umem read failed); +return -errno; +} +ret = 0; +} +page_request-nr = ret / sizeof(page_request-pgoffs[0]); +return 0; +} + +int umem_mark_page_cached(UMem *umem, UMemPages *page_cached) +{ +const void *buf = page_cached-pgoffs; +size_t size = page_cached-nr * sizeof(page_cached-pgoffs[0]); +ssize_t ret; + +ret = qemu_write_full(umem-fd, buf, size); +if (ret != size) { +perror(daemon: umem write); +return -errno; +} +return 0; +} + +void umem_unmap(UMem *umem) +{ +munmap(umem-umem, umem-size); +umem-umem = NULL; +} + +void umem_close(UMem *umem) +{ +close(umem-fd); +umem-fd = -1; +} + +int umem_map_shmem(UMem *umem) +{ +umem-nbits
[PATCH v3 22/35] savevm: add new section that is used by postcopy
This is used by postcopy to tell the total length of QEMU_VM_SECTION_FULL
and QEMU_VM_SUBSECTION from outgoing to incoming.

Signed-off-by: Isaku Yamahata yamah...@valinux.co.jp
---
 savevm.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/savevm.c b/savevm.c
index 93c51ab..c93b6eb 100644
--- a/savevm.c
+++ b/savevm.c
@@ -1614,6 +1614,10 @@ static void vmstate_save(QEMUFile *f, SaveStateEntry *se)
 #define QEMU_VM_SECTION_FULL 0x04
 #define QEMU_VM_SUBSECTION 0x05

+/* This section is used by postcopy to tell postcopy enabled session.
+   If the destination side doesn't know, it sees unknown section and abort. */
+#define QEMU_VM_POSTCOPY 0x10
+
 bool qemu_savevm_state_blocked(Error **errp)
 {
     SaveStateEntry *se;
--
1.7.10.4
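The compatibility behaviour mentioned in the code comment (a destination that does not know the new section type sees an unknown section and aborts) can be illustrated with a small sketch. The EX_* names and ex_check_section() are hypothetical; only the three numeric values are taken from the patch. The postcopy_aware flag is an assumption standing in for "this build knows about postcopy".

```c
#include <errno.h>
#include <stdint.h>

/* Values copied from the patch; the EX_ names are illustrative only. */
enum {
    EX_VM_SECTION_FULL = 0x04,
    EX_VM_SUBSECTION   = 0x05,
    EX_VM_POSTCOPY     = 0x10,
};

/*
 * Return 0 if the loader understands the section type, -EINVAL otherwise.
 * A destination without postcopy support takes the -EINVAL path for 0x10,
 * which is what makes the migration fail early instead of misparsing data.
 */
static int ex_check_section(uint8_t type, int postcopy_aware)
{
    switch (type) {
    case EX_VM_SECTION_FULL:
    case EX_VM_SUBSECTION:
        return 0;
    case EX_VM_POSTCOPY:
        return postcopy_aware ? 0 : -EINVAL;
    default:
        return -EINVAL;
    }
}
```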
[PATCH v3 18/35] migration: export migrate_fd_completed() and migrate_fd_cleanup()
This will be used by postcopy migration. Signed-off-by: Isaku Yamahata yamah...@valinux.co.jp --- migration.c |4 ++-- migration.h |2 ++ 2 files changed, 4 insertions(+), 2 deletions(-) diff --git a/migration.c b/migration.c index 8fcb466..00b0bc2 100644 --- a/migration.c +++ b/migration.c @@ -242,7 +242,7 @@ void qmp_migrate_set_capabilities(MigrationCapabilityStatusList *params, /* shared migration helpers */ -static int migrate_fd_cleanup(MigrationState *s) +int migrate_fd_cleanup(MigrationState *s) { int ret = 0; @@ -272,7 +272,7 @@ void migrate_fd_error(MigrationState *s) migrate_fd_cleanup(s); } -static void migrate_fd_completed(MigrationState *s) +void migrate_fd_completed(MigrationState *s) { DPRINTF(setting completed state\n); if (migrate_fd_cleanup(s) 0) { diff --git a/migration.h b/migration.h index 73416ba..2d27738 100644 --- a/migration.h +++ b/migration.h @@ -74,7 +74,9 @@ int fd_start_incoming_migration(const char *path); int fd_start_outgoing_migration(MigrationState *s, const char *fdname); +int migrate_fd_cleanup(MigrationState *s); void migrate_fd_error(MigrationState *s); +void migrate_fd_completed(MigrationState *s); void migrate_fd_connect(MigrationState *s); -- 1.7.10.4 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v3 19/35] uvmem.h: import Linux uvmem.h and teach update-linux-headers.sh
Signed-off-by: Isaku Yamahata yamah...@valinux.co.jp --- linux-headers/linux/uvmem.h | 41 +++ scripts/update-linux-headers.sh |2 +- 2 files changed, 42 insertions(+), 1 deletion(-) create mode 100644 linux-headers/linux/uvmem.h diff --git a/linux-headers/linux/uvmem.h b/linux-headers/linux/uvmem.h new file mode 100644 index 000..ea88980 --- /dev/null +++ b/linux-headers/linux/uvmem.h @@ -0,0 +1,41 @@ +/* + * User process backed memory. + * This is mainly for KVM post copy. + * + * Copyright (c) 2011, + * National Institute of Advanced Industrial Science and Technology + * + * https://sites.google.com/site/grivonhome/quick-kvm-migration + * Author: Isaku Yamahata yamahata at valinux co jp + * + * This program is free software; you can redistribute it and/or modify it + * under the terms and conditions of the GNU General Public License, + * version 2, as published by the Free Software Foundation. + * + * This program is distributed in the hope it will be useful, but WITHOUT + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for + * more details. + * + * You should have received a copy of the GNU General Public License along + * with this program; if not, see http://www.gnu.org/licenses/. 
+ */ + +#ifndef __LINUX_UVMEM_H +#define __LINUX_UVMEM_H + +#include linux/types.h +#include linux/ioctl.h + +struct uvmem_init { + __u64 size; /* in bytes */ + __s32 shmem_fd; + __s32 padding; +}; + +#define UVMEMIO0x1E + +/* ioctl for uvmem fd */ +#define UVMEM_INIT _IOWR(UVMEMIO, 0x0, struct uvmem_init) + +#endif /* __LINUX_UVMEM_H */ diff --git a/scripts/update-linux-headers.sh b/scripts/update-linux-headers.sh index 67be2ef..0fa25ce 100755 --- a/scripts/update-linux-headers.sh +++ b/scripts/update-linux-headers.sh @@ -57,7 +57,7 @@ done rm -rf $output/linux-headers/linux mkdir -p $output/linux-headers/linux -for header in kvm.h kvm_para.h vfio.h vhost.h virtio_config.h virtio_ring.h; do +for header in kvm.h kvm_para.h vfio.h vhost.h virtio_config.h virtio_ring.h umem.h; do cp $tmpdir/include/linux/$header $output/linux-headers/linux done rm -rf $output/linux-headers/asm-generic -- 1.7.10.4 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
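The explicit padding member in struct uvmem_init exists so that the ioctl argument has the same layout for 32-bit and 64-bit callers. A minimal sketch of that property, assuming __u64 and __s32 correspond to the fixed-width types below; the ex_ names are hypothetical mirrors and not part of the header:

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative mirror of struct uvmem_init using <stdint.h> types. */
struct ex_uvmem_init {
    uint64_t size;      /* in bytes */
    int32_t  shmem_fd;
    int32_t  padding;
};

/* The layout should not depend on the ABI's alignment of 64-bit integers. */
_Static_assert(sizeof(struct ex_uvmem_init) == 16, "unexpected size");
_Static_assert(offsetof(struct ex_uvmem_init, shmem_fd) == 8, "unexpected offset");

/* Build a request the way umem_new() does: size set, shmem_fd not yet known. */
static struct ex_uvmem_init ex_uvmem_init_request(uint64_t size)
{
    struct ex_uvmem_init req = { .size = size, .shmem_fd = -1, .padding = 0 };
    return req;
}
```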
[PATCH v4 2/2] umem: chardevice for kvm postcopy
This is a character device to hook page access. The page fault in the area is propagated to another user process by this chardriver. Then, the process fills the page contents and resolves the page fault. Cc: Andrea Arcangeli aarca...@redhat.com Cc: Avi Kivity a...@redhat.com Cc: Paolo Bonzini pbonz...@redhat.com Signed-off-by: Isaku Yamahata yamah...@valinux.co.jp --- Changes v4 - v5: - rename umem to uvmem to avoid name conflict Changes v3 - v4: - simplified umem_init: kill {a,}sync_req_max - make fault handler killable even when core-dumping - documentation Changes v2 - v3: - made fault handler killable - allow O_LARGEFILE - improve to handle FAULT_FLAG_ALLOW_RETRY - smart on async fault --- Documentation/misc-devices/uvmem.txt | 292 drivers/char/Kconfig | 10 + drivers/char/Makefile|1 + drivers/char/uvmem.c | 841 ++ include/linux/uvmem.h| 41 ++ 5 files changed, 1185 insertions(+) create mode 100644 Documentation/misc-devices/uvmem.txt create mode 100644 drivers/char/uvmem.c create mode 100644 include/linux/uvmem.h diff --git a/Documentation/misc-devices/uvmem.txt b/Documentation/misc-devices/uvmem.txt new file mode 100644 index 000..a9c15a2 --- /dev/null +++ b/Documentation/misc-devices/uvmem.txt @@ -0,0 +1,292 @@ +User process backed memory driver += + +Intro += +User process backed memory driver provides /dev/uvmem device. +This /dev/uvmem device is designed for some sort of distributed shared memory. +Especially post-copy live migration with KVM. + +page fault in the area backed by this driver is propagated to (other) server +process which serves the page contents. Usually the server process fetches +page contents from the remote machine. Then the faulting process continues. + + +Kernel-User protocol + +ioctl +UVMEM_INIT: Initialize the uvmem device with some parameters. 
+ IN size: the area size in bytes (which is rounded up to page size) + OUT shmem_fd: the file descript to tmpfs that is associated to this uvmem +device This is served as backing store of this uvmem device. + +mmap: Mapping the initialized uvmem device provides the area which + is served by user process. + The fault in this area is propagated to uvmem device via read + system call. +read: kernel notifies a process that pages are faulted by returning + page offset in page size in u64 format. + uvmem device is pollable for read. +write: Process notifies kernel that the page is ready to access + by writing page offset in page size in u64 format. + + +operation flow +== + +| +V + open(/dev/uvmem) +| +V + ioctl(UVMEM_INIT) +| +V + Here we have two file descriptors to + uvmem device and shmem file +| +| daemon process which serves +| page fault +V + fork()---, +| | +V V + close(shmem) mmap(shmem file) +| | +V V + mmap(uvmem device) close(shmem file) +| | +V | + close(uvmem device) | +| | + now the setup is done| + work on the uvmem area| +| | +V V + access uvmem area (poll and) read(uvmem) +| | +V V + page fault -- read system call returns + block page offsets + | + V +create page contents +(usually pull the page + from remote) +write the page contents +to the shmem which was +mmapped above
[PATCH v3 17/35] arch_init: factor out logic to find ram block with id string
Signed-off-by: Isaku Yamahata yamah...@valinux.co.jp --- arch_init.c | 31 --- arch_init.h |1 + exec.c | 12 ++-- 3 files changed, 27 insertions(+), 17 deletions(-) diff --git a/arch_init.c b/arch_init.c index c77e24d..d82316d 100644 --- a/arch_init.c +++ b/arch_init.c @@ -762,6 +762,19 @@ static int load_xbzrle(QEMUFile *f, void *host) return rc; } +RAMBlock *ram_find_block(const char *id, uint8_t len) +{ +RAMBlock *block; + +QLIST_FOREACH(block, ram_list.blocks, next) { +if (!strncmp(id, block-idstr, len)) { +return block; +} +} + +return NULL; +} + static inline void *host_from_stream_offset(QEMUFile *f, ram_addr_t offset, int flags) @@ -783,9 +796,9 @@ static inline void *host_from_stream_offset(QEMUFile *f, qemu_get_buffer(f, (uint8_t *)id, len); id[len] = 0; -QLIST_FOREACH(block, ram_list.blocks, next) { -if (!strncmp(id, block-idstr, sizeof(id))) -return memory_region_get_ram_ptr(block-mr) + offset; +block = ram_find_block(id, len); +if (block) { +return memory_region_get_ram_ptr(block-mr) + offset; } fprintf(stderr, Can't find block %s!\n, id); @@ -807,19 +820,15 @@ int ram_load_mem_size(QEMUFile *f, ram_addr_t total_ram_bytes) id[len] = 0; length = qemu_get_be64(f); -QLIST_FOREACH(block, ram_list.blocks, next) { -if (!strncmp(id, block-idstr, sizeof(id))) { -if (block-length != length) -return -EINVAL; -break; -} -} - +block = ram_find_block(id, len); if (!block) { fprintf(stderr, Unknown ramblock \%s\, cannot accept migration\n, id); return -EINVAL; } +if (block-length != length) { +return -EINVAL; +} total_ram_bytes -= length; } diff --git a/arch_init.h b/arch_init.h index bca1a29..499d0f1 100644 --- a/arch_init.h +++ b/arch_init.h @@ -51,6 +51,7 @@ int ram_load_page(QEMUFile *f, void *host, int flags); #if defined(NEED_CPU_H) !defined(CONFIG_USER_ONLY) bool ram_save_page(QEMUFile *f, RAMBlock *block, ram_addr_t offset, bool last_stage); +RAMBlock *ram_find_block(const char *id, uint8_t len); int ram_load_mem_size(QEMUFile *f, ram_addr_t total_ram_bytes); 
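One thing worth checking in review: strncmp(id, block->idstr, len) with len set to the length of id compares only a prefix, so an id could in principle match a block whose name merely starts with it. Whether that can happen with real RAMBlock names is not established here. The following is a sketch of a stricter exact-length lookup, using a simplified singly linked list instead of QEMU's QLIST; struct ex_block and ex_find_block() are hypothetical stand-ins, not the patch's code.

```c
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for RAMBlock; the real structure lives in QEMU. */
struct ex_block {
    char idstr[256];
    struct ex_block *next;
};

/* Return the block whose id is exactly the first len bytes of id, or NULL. */
static struct ex_block *ex_find_block(struct ex_block *head,
                                      const char *id, size_t len)
{
    for (struct ex_block *b = head; b != NULL; b = b->next) {
        if (strlen(b->idstr) == len && memcmp(id, b->idstr, len) == 0) {
            return b;
        }
    }
    return NULL;
}
```

With a plain prefix compare, looking up "pc.ram" in a list that starts with "pc.ram.extra" would return the first block; the length check above avoids that.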
#endif diff --git a/exec.c b/exec.c index 1414654..2aa4d90 100644 --- a/exec.c +++ b/exec.c @@ -33,6 +33,7 @@ #include kvm.h #include hw/xen.h #include qemu-timer.h +#include arch_init.h #include memory.h #include exec-memory.h #if defined(CONFIG_USER_ONLY) @@ -2517,12 +2518,11 @@ void qemu_ram_set_idstr(ram_addr_t addr, const char *name, DeviceState *dev) pstrcat(new_block-idstr, sizeof(new_block-idstr), name); qemu_mutex_lock_ramlist(); -QLIST_FOREACH(block, ram_list.blocks, next) { -if (block != new_block !strcmp(block-idstr, new_block-idstr)) { -fprintf(stderr, RAMBlock \%s\ already registered, abort!\n, -new_block-idstr); -abort(); -} +block = ram_find_block(new_block-idstr, strlen(new_block-idstr)); +if (block != new_block) { +fprintf(stderr, RAMBlock \%s\ already registered, abort!\n, +new_block-idstr); +abort(); } qemu_mutex_unlock_ramlist(); } -- 1.7.10.4 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v3 07/35] savevm: export qemu_peek_buffer, qemu_peek_byte, qemu_file_skip, qemu_fflush
Those will be used by postcopy. Signed-off-by: Isaku Yamahata yamah...@valinux.co.jp --- qemu-file.h |4 savevm.c|8 2 files changed, 8 insertions(+), 4 deletions(-) diff --git a/qemu-file.h b/qemu-file.h index 9c8985b..9b6dd08 100644 --- a/qemu-file.h +++ b/qemu-file.h @@ -72,6 +72,7 @@ QEMUFile *qemu_popen(FILE *popen_file, const char *mode); QEMUFile *qemu_popen_cmd(const char *command, const char *mode); int qemu_stdio_fd(QEMUFile *f); int qemu_fclose(QEMUFile *f); +int qemu_fflush(QEMUFile *f); void qemu_put_buffer(QEMUFile *f, const uint8_t *buf, int size); void qemu_put_byte(QEMUFile *f, int v); @@ -87,6 +88,9 @@ void qemu_put_be32(QEMUFile *f, unsigned int v); void qemu_put_be64(QEMUFile *f, uint64_t v); int qemu_get_buffer(QEMUFile *f, uint8_t *buf, int size); int qemu_get_byte(QEMUFile *f); +int qemu_peek_byte(QEMUFile *f, int offset); +int qemu_peek_buffer(QEMUFile *f, uint8_t *buf, int size, size_t offset); +void qemu_file_skip(QEMUFile *f, int size); static inline unsigned int qemu_get_ubyte(QEMUFile *f) { diff --git a/savevm.c b/savevm.c index b080d37..0c7af43 100644 --- a/savevm.c +++ b/savevm.c @@ -448,7 +448,7 @@ static void qemu_file_set_error(QEMUFile *f, int ret) /** Flushes QEMUFile buffer * */ -static int qemu_fflush(QEMUFile *f) +int qemu_fflush(QEMUFile *f) { int ret = 0; @@ -583,14 +583,14 @@ void qemu_put_byte(QEMUFile *f, int v) } } -static void qemu_file_skip(QEMUFile *f, int size) +void qemu_file_skip(QEMUFile *f, int size) { if (f-buf_index + size = f-buf_size) { f-buf_index += size; } } -static int qemu_peek_buffer(QEMUFile *f, uint8_t *buf, int size, size_t offset) +int qemu_peek_buffer(QEMUFile *f, uint8_t *buf, int size, size_t offset) { int pending; int index; @@ -638,7 +638,7 @@ int qemu_get_buffer(QEMUFile *f, uint8_t *buf, int size) return done; } -static int qemu_peek_byte(QEMUFile *f, int offset) +int qemu_peek_byte(QEMUFile *f, int offset) { int index = f-buf_index + offset; -- 1.7.10.4 -- To unsubscribe from this list: send 
the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v3 12/35] arch_init: export RAM_SAVE_xxx flags for postcopy
Those constants will also be used by postcopy.

Signed-off-by: Isaku Yamahata yamah...@valinux.co.jp
---
 arch_init.c | 8 --------
 arch_init.h | 8 ++++++++
 2 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index a312434..4b65221 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -106,14 +106,6 @@ const uint32_t arch_type = QEMU_ARCH;
 /***/
 /* ram save/restore */

-#define RAM_SAVE_FLAG_FULL 0x01 /* Obsolete, not used anymore */
-#define RAM_SAVE_FLAG_COMPRESS 0x02
-#define RAM_SAVE_FLAG_MEM_SIZE 0x04
-#define RAM_SAVE_FLAG_PAGE 0x08
-#define RAM_SAVE_FLAG_EOS 0x10
-#define RAM_SAVE_FLAG_CONTINUE 0x20
-#define RAM_SAVE_FLAG_XBZRLE 0x40
-
 #ifdef __ALTIVEC__
 #include <altivec.h>
 #define VECTYPE vector unsigned char
diff --git a/arch_init.h b/arch_init.h
index d9c572a..e4c131e 100644
--- a/arch_init.h
+++ b/arch_init.h
@@ -36,4 +36,12 @@ int xen_available(void);

 CpuDefinitionInfoList GCC_WEAK_DECL *arch_query_cpu_definitions(Error **errp);

+#define RAM_SAVE_FLAG_FULL 0x01 /* Obsolete, not used anymore */
+#define RAM_SAVE_FLAG_COMPRESS 0x02
+#define RAM_SAVE_FLAG_MEM_SIZE 0x04
+#define RAM_SAVE_FLAG_PAGE 0x08
+#define RAM_SAVE_FLAG_EOS 0x10
+#define RAM_SAVE_FLAG_CONTINUE 0x20
+#define RAM_SAVE_FLAG_XBZRLE 0x40
+
 #endif
--
1.7.10.4
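For context on why other files need these constants: in the RAM stream the flags travel in the low bits of a 64-bit word whose upper bits carry the page-aligned offset, so any code that parses the stream has to mask them apart. A minimal sketch of that decoding, assuming 4 KiB target pages; EX_PAGE_BITS, struct ex_ram_header and ex_decode_header() are hypothetical names, and only the flag values are taken from the patch.

```c
#include <stdint.h>

#define EX_PAGE_BITS 12   /* assumption: 4 KiB target pages */
#define EX_PAGE_MASK (~((UINT64_C(1) << EX_PAGE_BITS) - 1))

/* Flag values as moved into arch_init.h by this patch. */
#define RAM_SAVE_FLAG_COMPRESS 0x02
#define RAM_SAVE_FLAG_MEM_SIZE 0x04
#define RAM_SAVE_FLAG_PAGE     0x08
#define RAM_SAVE_FLAG_EOS      0x10
#define RAM_SAVE_FLAG_CONTINUE 0x20
#define RAM_SAVE_FLAG_XBZRLE   0x40

struct ex_ram_header {
    uint64_t offset;   /* page-aligned offset within the RAM block */
    unsigned flags;    /* RAM_SAVE_FLAG_* bits */
};

/* Split one big-endian-decoded header word into offset and flags. */
static struct ex_ram_header ex_decode_header(uint64_t word)
{
    struct ex_ram_header h;
    h.flags = (unsigned)(word & ~EX_PAGE_MASK);
    h.offset = word & EX_PAGE_MASK;
    return h;
}
```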
[PATCH v3 11/35] savevm, buffered_file: introduce method to drain buffer of buffered file
Introduce a new method to drain the buffer of QEMUBufferedFile. When postcopy migration, buffer size can increase unboundedly. To keep the buffer size reasonably small, introduce the method to wait for buffer to drain. Detect unfreeze output by select too, not only by timer, thus pending data can be sent quickly. Signed-off-by: Isaku Yamahata yamah...@valinux.co.jp --- buffered_file.c | 59 +-- buffered_file.h |1 + qemu-file.h |1 + savevm.c|7 +++ 4 files changed, 58 insertions(+), 10 deletions(-) diff --git a/buffered_file.c b/buffered_file.c index ed92df1..275d504 100644 --- a/buffered_file.c +++ b/buffered_file.c @@ -26,12 +26,14 @@ typedef struct QEMUFileBuffered MigrationState *migration_state; QEMUFile *file; int freeze_output; +bool no_limit; size_t bytes_xfer; size_t xfer_limit; uint8_t *buffer; size_t buffer_size; size_t buffer_capacity; QEMUTimer *timer; +int unfreeze_fd; } QEMUFileBuffered; #ifdef DEBUG_BUFFERED_FILE @@ -42,6 +44,16 @@ typedef struct QEMUFileBuffered do { } while (0) #endif +static ssize_t buffered_flush(QEMUFileBuffered *s); + +static void buffered_unfreeze(void *opaque) +{ +QEMUFileBuffered *s = opaque; +qemu_set_fd_handler(s-unfreeze_fd, NULL, NULL, NULL); +s-freeze_output = 0; +buffered_flush(s); +} + static void buffered_append(QEMUFileBuffered *s, const uint8_t *buf, size_t size) { @@ -65,7 +77,8 @@ static ssize_t buffered_flush(QEMUFileBuffered *s) DPRINTF(flushing %zu byte(s) of data\n, s-buffer_size); -while (s-bytes_xfer s-xfer_limit offset s-buffer_size) { +while ((s-bytes_xfer s-xfer_limit offset s-buffer_size) || + s-no_limit) { ret = migrate_fd_put_buffer(s-migration_state, s-buffer + offset, s-buffer_size - offset); @@ -73,6 +86,15 @@ static ssize_t buffered_flush(QEMUFileBuffered *s) DPRINTF(backend not ready, freezing\n); ret = 0; s-freeze_output = 1; +if (!s-no_limit) { +if (s-unfreeze_fd == -1) { +s-unfreeze_fd = dup(s-migration_state-fd); +} +if (s-unfreeze_fd = 0) { +qemu_set_fd_handler(s-unfreeze_fd, +NULL, 
buffered_unfreeze, s); +} +} break; } @@ -113,7 +135,7 @@ static int buffered_put_buffer(void *opaque, const uint8_t *buf, int64_t pos, in s-freeze_output = 0; if (size 0) { -DPRINTF(buffering %d bytes\n, size - offset); +DPRINTF(buffering %d bytes\n, size); buffered_append(s, buf, size); } @@ -134,17 +156,11 @@ static int buffered_put_buffer(void *opaque, const uint8_t *buf, int64_t pos, in return size; } -static int buffered_close(void *opaque) +static void buffered_drain(QEMUFileBuffered *s) { -QEMUFileBuffered *s = opaque; -ssize_t ret = 0; -int ret2; - -DPRINTF(closing\n); - s-xfer_limit = INT_MAX; while (!qemu_file_get_error(s-file) s-buffer_size) { -ret = buffered_flush(s); +ssize_t ret = buffered_flush(s); if (ret 0) { break; } @@ -153,13 +169,27 @@ static int buffered_close(void *opaque) if (ret 0) { break; } +s-freeze_output = 0; } } +} + +static int buffered_close(void *opaque) +{ +QEMUFileBuffered *s = opaque; +ssize_t ret = 0; +int ret2; +DPRINTF(closing\n); + +buffered_drain(s); ret2 = migrate_fd_close(s-migration_state); if (ret = 0) { ret = ret2; } +if (s-unfreeze_fd = 0) { +close(s-unfreeze_fd); +} qemu_del_timer(s-timer); qemu_free_timer(s-timer); g_free(s-buffer); @@ -242,6 +272,7 @@ QEMUFile *qemu_fopen_ops_buffered(MigrationState *migration_state) s-migration_state = migration_state; s-xfer_limit = migration_state-bandwidth_limit / 10; +s-unfreeze_fd = -1; s-file = qemu_fopen_ops(s, buffered_put_buffer, NULL, buffered_close, buffered_rate_limit, @@ -254,3 +285,11 @@ QEMUFile *qemu_fopen_ops_buffered(MigrationState *migration_state) return s-file; } + +void qemu_buffered_file_drain_buffer(void *buffered_file) +{ +QEMUFileBuffered *s = buffered_file; +s-no_limit = true; +buffered_drain(s); +s-no_limit = false; +} diff --git a/buffered_file.h b/buffered_file.h index ef010fe..be714a7 100644 --- a/buffered_file.h +++ b/buffered_file.h @@ -18,5 +18,6 @@ #include migration.h QEMUFile *qemu_fopen_ops_buffered(MigrationState *migration_state); +void 
qemu_buffered_file_drain_buffer(void *buffered_file); #endif diff --git a/qemu-file.h b/qemu-file.h index 452efcd..8074df1 100644
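The drain idea in this patch (keep flushing until the buffer is empty, and wait for the fd to become writable rather than for a timer) can be shown in isolation. This is a standalone sketch, not the patch's code: ex_drain() is a hypothetical helper that works on a plain file descriptor instead of QEMUFileBuffered, and it blocks in select() where QEMU would register an fd handler.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/select.h>
#include <unistd.h>

/*
 * Write the whole buffer to a possibly non-blocking fd. When the fd is not
 * writable (EAGAIN), wait in select() until it is. Returns 0 or -errno.
 */
static int ex_drain(int fd, const uint8_t *buf, size_t size)
{
    size_t off = 0;

    while (off < size) {
        ssize_t n = write(fd, buf + off, size - off);
        if (n >= 0) {
            off += (size_t)n;
            continue;
        }
        if (errno == EINTR) {
            continue;
        }
        if (errno != EAGAIN && errno != EWOULDBLOCK) {
            return -errno;
        }
        fd_set wfds;
        FD_ZERO(&wfds);
        FD_SET(fd, &wfds);
        if (select(fd + 1, NULL, &wfds, NULL, NULL) < 0 && errno != EINTR) {
            return -errno;
        }
    }
    return 0;
}
```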
[PATCH v3 09/35] savevm/QEMUFile: introduce qemu_fopen_fd
Introduce fd read/write backend of QEMUFile whose fd can be non-blocking.
This will be used by postcopy live migration.

Signed-off-by: Isaku Yamahata yamah...@valinux.co.jp
---
 qemu-file.h |    1 +
 savevm.c    |   35 +++
 2 files changed, 36 insertions(+)

diff --git a/qemu-file.h b/qemu-file.h
index bc222dc..94557ea 100644
--- a/qemu-file.h
+++ b/qemu-file.h
@@ -68,6 +68,7 @@ QEMUFile *qemu_fopen_ops(void *opaque, QEMUFilePutBufferFunc *put_buffer,
 QEMUFile *qemu_fopen(const char *filename, const char *mode);
 QEMUFile *qemu_fdopen(int fd, const char *mode);
 QEMUFile *qemu_fopen_socket(int fd);
+QEMUFile *qemu_fopen_fd(int fd, const char *mode);
 QEMUFile *qemu_popen(FILE *popen_file, const char *mode);
 QEMUFile *qemu_popen_cmd(const char *command, const char *mode);
 int qemu_file_fd(QEMUFile *f);
diff --git a/savevm.c b/savevm.c
index e24041b..712b7ae 100644
--- a/savevm.c
+++ b/savevm.c
@@ -207,6 +207,19 @@ static int socket_get_buffer(void *opaque, uint8_t *buf, int64_t pos, int size)
     return len;
 }
 
+static int fd_get_buffer(void *opaque, uint8_t *buf, int64_t pos, int size)
+{
+    QEMUFileFD *s = opaque;
+    return qemu_read_full(s->file->fd, buf, size);
+}
+
+static int fd_put_buffer(void *opaque,
+                         const uint8_t *buf, int64_t pos, int size)
+{
+    QEMUFileFD *s = opaque;
+    return qemu_write_full(s->file->fd, buf, size);
+}
+
 static int fd_close(void *opaque)
 {
     QEMUFileFD *s = opaque;
@@ -333,6 +346,28 @@ QEMUFile *qemu_fopen_socket(int fd)
     return s->file;
 }
 
+QEMUFile *qemu_fopen_fd(int fd, const char *mode)
+{
+    QEMUFileFD *s;
+
+    if (mode == NULL || (mode[0] != 'r' && mode[0] != 'w') || mode[1] != 0) {
+        fprintf(stderr, "qemu_fopen_fd: Argument validity check failed\n");
+        return NULL;
+    }
+
+    s = g_malloc0(sizeof(*s));
+    if (mode[0] == 'r') {
+        s->file = qemu_fopen_ops(s, NULL, fd_get_buffer, fd_close,
+                                 NULL, NULL, NULL);
+    } else {
+        s->file = qemu_fopen_ops(s, fd_put_buffer, NULL, fd_close,
+                                 NULL, NULL, NULL);
+    }
+
+    s->file->fd = fd;
+    return s->file;
+}
+
 static int file_put_buffer(void *opaque, const uint8_t *buf,
                             int64_t pos, int size)
 {
-- 
1.7.10.4
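The backend in this patch delegates to qemu_read_full() and qemu_write_full(), which are not shown in this mail. As a rough illustration of what such "full" helpers usually do, here is a minimal sketch; the names read_full_sketch() and write_full_sketch() are hypothetical stand-ins, and the exact return conventions of the real QEMU helpers may differ.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical stand-in for qemu_read_full(): keep reading until `count`
 * bytes have arrived, EOF is hit, or a real error occurs. EINTR is retried.
 * Returns the number of bytes read, or -1 if an error happened before any
 * byte was read (errno is left as set by read()).
 */
static ssize_t read_full_sketch(int fd, void *buf, size_t count)
{
    char *p = buf;
    size_t total = 0;

    assert(buf != NULL);
    while (total < count) {
        ssize_t ret = read(fd, p + total, count - total);
        if (ret < 0) {
            if (errno == EINTR) {
                continue;               /* interrupted, try again */
            }
            return total > 0 ? (ssize_t)total : -1;
        }
        if (ret == 0) {
            break;                      /* EOF: writer closed its end */
        }
        total += (size_t)ret;
    }
    return (ssize_t)total;
}

/* Same idea for the write direction. */
static ssize_t write_full_sketch(int fd, const void *buf, size_t count)
{
    const char *p = buf;
    size_t total = 0;

    assert(buf != NULL);
    while (total < count) {
        ssize_t ret = write(fd, p + total, count - total);
        if (ret < 0) {
            if (errno == EINTR) {
                continue;
            }
            return total > 0 ? (ssize_t)total : -1;
        }
        total += (size_t)ret;
    }
    return (ssize_t)total;
}
```

On a non-blocking fd these loops stop at EAGAIN and report the bytes moved so far, so a caller has to be prepared for short counts.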
[PATCH v3 05/35] protect the ramlist with a separate mutex
From: Umesh Deshpande udesh...@redhat.com

Add the new mutex that protects shared state between ram_save_live
and the iothread.  If the iothread mutex has to be taken together
with the ramlist mutex, the iothread shall always be _outside_.

Signed-off-by: Paolo Bonzini pbonz...@redhat.com
Signed-off-by: Umesh Deshpande udesh...@redhat.com
Signed-off-by: Juan Quintela quint...@redhat.com
---
 arch_init.c |    9 -
 cpu-all.h   |    8 
 exec.c      |   23 +--
 3 files changed, 37 insertions(+), 3 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index eb36a6a..a312434 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -553,7 +553,6 @@ static void ram_migration_cancel(void *opaque)
     migration_end();
 }
 
-
 static void reset_ram_globals(void)
 {
     last_block = NULL;
@@ -573,6 +572,7 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
     bitmap_set(migration_bitmap, 1, ram_pages);
     migration_dirty_pages = ram_pages;
 
+    qemu_mutex_lock_ramlist();
     bytes_transferred = 0;
     reset_ram_globals();
 
@@ -600,6 +600,7 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
         qemu_put_be64(f, block->length);
     }
 
+    qemu_mutex_unlock_ramlist();
     qemu_put_be64(f, RAM_SAVE_FLAG_EOS);
 
     return 0;
@@ -614,6 +615,8 @@ static int ram_save_iterate(QEMUFile *f, void *opaque)
     uint64_t expected_downtime;
     MigrationState *s = migrate_get_current();
 
+    qemu_mutex_lock_ramlist();
+
     if (ram_list.version != last_version) {
         reset_ram_globals();
     }
@@ -662,6 +665,7 @@ static int ram_save_iterate(QEMUFile *f, void *opaque)
         bwidth = 0.01;
     }
 
+    qemu_mutex_unlock_ramlist();
     qemu_put_be64(f, RAM_SAVE_FLAG_EOS);
 
     expected_downtime = ram_save_remaining() * TARGET_PAGE_SIZE / bwidth;
@@ -682,6 +686,8 @@ static int ram_save_complete(QEMUFile *f, void *opaque)
 {
     migration_bitmap_sync();
 
+    qemu_mutex_lock_ramlist();
+
     /* try transferring iterative blocks of memory */
 
     /* flush all remaining blocks regardless of rate limiting */
@@ -697,6 +703,7 @@ static int ram_save_complete(QEMUFile *f, void *opaque)
     }
     memory_global_dirty_log_stop();
 
+    qemu_mutex_unlock_ramlist();
     qemu_put_be64(f, RAM_SAVE_FLAG_EOS);
 
     g_free(migration_bitmap);
diff --git a/cpu-all.h b/cpu-all.h
index 84aea8b..b5fefc8 100644
--- a/cpu-all.h
+++ b/cpu-all.h
@@ -22,6 +22,7 @@
 #include "qemu-common.h"
 #include "qemu-tls.h"
 #include "cpu-common.h"
+#include "qemu-thread.h"
 
 /* some important defines:
  *
@@ -490,7 +491,9 @@ typedef struct RAMBlock {
     ram_addr_t offset;
     ram_addr_t length;
     uint32_t flags;
+    /* Protected by the iothread lock. */
     QLIST_ENTRY(RAMBlock) next_mru;
+    /* Protected by the ramlist lock. */
     QLIST_ENTRY(RAMBlock) next;
     char idstr[256];
 #if defined(__linux__) && !defined(TARGET_S390X)
@@ -499,9 +502,12 @@ typedef struct RAMBlock {
 } RAMBlock;
 
 typedef struct RAMList {
+    QemuMutex mutex;
+    /* Protected by the iothread lock. */
     uint8_t *phys_dirty;
     uint32_t version;
     QLIST_HEAD(, RAMBlock) blocks_mru;
+    /* Protected by the ramlist lock. */
     QLIST_HEAD(, RAMBlock) blocks;
 } RAMList;
 extern RAMList ram_list;
@@ -521,6 +527,8 @@ extern int mem_prealloc;
 
 void dump_exec_info(FILE *f, fprintf_function cpu_fprintf);
 ram_addr_t last_ram_offset(void);
+void qemu_mutex_lock_ramlist(void);
+void qemu_mutex_unlock_ramlist(void);
 #endif /* !CONFIG_USER_ONLY */
 
 int cpu_memory_rw_debug(CPUArchState *env, target_ulong addr,
diff --git a/exec.c b/exec.c
index f5a8aca..1414654 100644
--- a/exec.c
+++ b/exec.c
@@ -645,6 +645,7 @@ bool tcg_enabled(void)
 void cpu_exec_init_all(void)
 {
 #if !defined(CONFIG_USER_ONLY)
+    qemu_mutex_init(&ram_list.mutex);
     memory_map_init();
     io_mem_init();
 #endif
@@ -2324,6 +2325,16 @@ void qemu_flush_coalesced_mmio_buffer(void)
     kvm_flush_coalesced_mmio_buffer();
 }
 
+void qemu_mutex_lock_ramlist(void)
+{
+    qemu_mutex_lock(&ram_list.mutex);
+}
+
+void qemu_mutex_unlock_ramlist(void)
+{
+    qemu_mutex_unlock(&ram_list.mutex);
+}
+
 #if defined(__linux__) && !defined(TARGET_S390X)
 
 #include <sys/vfs.h>
@@ -2505,6 +2516,7 @@ void qemu_ram_set_idstr(ram_addr_t addr, const char *name, DeviceState *dev)
     }
     pstrcat(new_block->idstr, sizeof(new_block->idstr), name);
 
+    qemu_mutex_lock_ramlist();
     QLIST_FOREACH(block, &ram_list.blocks, next) {
         if (block != new_block && !strcmp(block->idstr, new_block->idstr)) {
             fprintf(stderr, "RAMBlock \"%s\" already registered, abort!\n",
@@ -2512,6 +2524,7 @@ void qemu_ram_set_idstr(ram_addr_t addr, const char *name, DeviceState *dev)
             abort();
         }
     }
+    qemu_mutex_unlock_ramlist();
 }
 
 static int memory_try_enable_merging(void *addr, size_t len)
@@ -2535,6 +2548,7 @@ ram_addr_t qemu_ram_alloc_from_ptr(ram_addr_t size, void *host, size
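The commit message fixes a lock order: when both locks are needed, the iothread lock is taken first (outside) and the ramlist lock second (inside). A minimal sketch of that rule with plain pthread mutexes follows; iothread_lock, ramlist_lock and the counters are hypothetical stand-ins, not QEMU symbols.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-ins for the iothread lock and ram_list.mutex. */
static pthread_mutex_t iothread_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t ramlist_lock = PTHREAD_MUTEX_INITIALIZER;

static int block_count;      /* protected by ramlist_lock */
static int mru_generation;   /* protected by iothread_lock */

/* Migration-style reader: only the inner (ramlist) lock is needed. */
static int count_blocks(void)
{
    int n;

    pthread_mutex_lock(&ramlist_lock);
    n = block_count;
    pthread_mutex_unlock(&ramlist_lock);
    return n;
}

/* Path touching both kinds of state: iothread lock outside, ramlist inside. */
static void add_block(void)
{
    pthread_mutex_lock(&iothread_lock);
    mru_generation++;

    pthread_mutex_lock(&ramlist_lock);
    block_count++;
    pthread_mutex_unlock(&ramlist_lock);

    pthread_mutex_unlock(&iothread_lock);
}

/* Removal follows the same order; never take iothread_lock while
 * already holding ramlist_lock, or two threads can deadlock. */
static void remove_block(void)
{
    pthread_mutex_lock(&iothread_lock);
    mru_generation++;

    pthread_mutex_lock(&ramlist_lock);
    assert(block_count > 0);
    block_count--;
    pthread_mutex_unlock(&ramlist_lock);

    pthread_mutex_unlock(&iothread_lock);
}
```

Keeping one global order is what rules out the classic two-lock deadlock, since no thread ever waits for the outer lock while holding the inner one.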
[PATCH v3 04/35] add a version number to ram_list
From: Umesh Deshpande udesh...@redhat.com

This will be used to detect if last_block might have become invalid
across different calls to ram_save_live.

Signed-off-by: Paolo Bonzini pbonz...@redhat.com
Signed-off-by: Umesh Deshpande udesh...@redhat.com
---
 arch_init.c |    7 ++-
 cpu-all.h   |    1 +
 exec.c      |    5 -
 3 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index d6162af..eb36a6a 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -336,6 +336,7 @@ static RAMBlock *last_block;
 static ram_addr_t last_offset;
 static unsigned long *migration_bitmap;
 static uint64_t migration_dirty_pages;
+static uint32_t last_version;
 
 static inline bool migration_bitmap_test_and_reset_dirty(MemoryRegion *mr,
                                                          ram_addr_t offset)
@@ -406,7 +407,6 @@ static void migration_bitmap_sync(void)
     }
 }
 
-
 /*
  * ram_save_block: Writes a page of memory to the stream f
  *
@@ -558,6 +558,7 @@ static void reset_ram_globals(void)
 {
     last_block = NULL;
     last_offset = 0;
+    last_version = ram_list.version;
     sort_ram_list();
 }
 
@@ -613,6 +614,10 @@ static int ram_save_iterate(QEMUFile *f, void *opaque)
     uint64_t expected_downtime;
     MigrationState *s = migrate_get_current();
 
+    if (ram_list.version != last_version) {
+        reset_ram_globals();
+    }
+
     bytes_transferred_last = bytes_transferred;
     bwidth = qemu_get_clock_ns(rt_clock);
 
diff --git a/cpu-all.h b/cpu-all.h
index ecbba12..84aea8b 100644
--- a/cpu-all.h
+++ b/cpu-all.h
@@ -500,6 +500,7 @@ typedef struct RAMBlock {
 
 typedef struct RAMList {
     uint8_t *phys_dirty;
+    uint32_t version;
     QLIST_HEAD(, RAMBlock) blocks_mru;
     QLIST_HEAD(, RAMBlock) blocks;
 } RAMList;
diff --git a/exec.c b/exec.c
index 489d924..f5a8aca 100644
--- a/exec.c
+++ b/exec.c
@@ -645,7 +645,6 @@ bool tcg_enabled(void)
 void cpu_exec_init_all(void)
 {
 #if !defined(CONFIG_USER_ONLY)
-    qemu_mutex_init(&ram_list.mutex);
     memory_map_init();
     io_mem_init();
 #endif
@@ -2570,6 +2569,8 @@ ram_addr_t qemu_ram_alloc_from_ptr(ram_addr_t size, void *host,
     QLIST_INSERT_HEAD(&ram_list.blocks, new_block, next);
     QLIST_INSERT_HEAD(&ram_list.blocks_mru, new_block, next_mru);
 
+    ram_list.version++;
+
     ram_list.phys_dirty = g_realloc(ram_list.phys_dirty,
                                        last_ram_offset() >> TARGET_PAGE_BITS);
     memset(ram_list.phys_dirty + (new_block->offset >> TARGET_PAGE_BITS),
@@ -2598,6 +2599,7 @@ void qemu_ram_free_from_ptr(ram_addr_t addr)
         if (addr == block->offset) {
             QLIST_REMOVE(block, next);
             QLIST_REMOVE(block, next_mru);
+            ram_list.version++;
             g_free(block);
             return;
         }
@@ -2612,6 +2614,7 @@ void qemu_ram_free(ram_addr_t addr)
         if (addr == block->offset) {
             QLIST_REMOVE(block, next);
             QLIST_REMOVE(block, next_mru);
+            ram_list.version++;
             if (block->flags & RAM_PREALLOC_MASK) {
                 ;
             } else if (mem_path) {
-- 
1.7.10.4
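The idea of this patch, a version counter that invalidates a cached position whenever the list changes, can be sketched in isolation. The types and names below (block_list, cursor, cursor_next) are hypothetical and use a small array instead of QEMU's QLIST; only the version-check pattern is taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_BLOCKS 8

struct block_list {
    int ids[MAX_BLOCKS];
    size_t count;
    uint32_t version;        /* bumped on every insert or remove */
};

/* Cached position of a walker, analogous to last_block/last_version. */
struct cursor {
    size_t index;
    uint32_t seen_version;
};

static void list_add(struct block_list *l, int id)
{
    assert(l->count < MAX_BLOCKS);
    l->ids[l->count++] = id;
    l->version++;
}

static void list_remove_first(struct block_list *l)
{
    assert(l->count > 0);
    for (size_t i = 1; i < l->count; i++) {
        l->ids[i - 1] = l->ids[i];
    }
    l->count--;
    l->version++;
}

/*
 * Return the next id in round-robin order. If the list changed since the
 * previous call, the cached index may be stale, so restart from the head.
 * Returns -1 for an empty list.
 */
static int cursor_next(struct cursor *c, const struct block_list *l)
{
    if (c->seen_version != l->version) {
        c->index = 0;
        c->seen_version = l->version;
    }
    if (l->count == 0) {
        return -1;
    }
    if (c->index >= l->count) {
        c->index = 0;
    }
    return l->ids[c->index++];
}
```

The check is cheap (one integer compare per call) and errs on the safe side: any modification forces a restart, even if the cached position would still have been valid.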
[PATCH v3 00/35] postcopy live migration
    |                          |
    |                          V
    |              release the cached page
    |              madvise(MADV_REMOVE)
    |                          |
    |              pages can be sent
    |              backgroundly
    |                          |
    |                          V
    |              mark page is cached
    |              Thus future page fault is
    |              avoided.
    |                          |
    |                          V
    |              touch guest RAM pages
    |                          |
    |                          V
    |              release the cached page
    |              madvise(MADV_REMOVE)
    |                          |
    V                          V
    all the pages are pulled from the source
    |                          |
    V                          V
 migration completes         exit()

Isaku Yamahata (32):
  migration.c: remove redundant line in migrate_init()
  arch_init: DPRINTF format error and typo
  osdep: add qemu_read_full() to read interrupt-safely
  savevm: export qemu_peek_buffer, qemu_peek_byte, qemu_file_skip, qemu_fflush
  savevm/QEMUFile: consolidate QEMUFile functions a bit
  savevm/QEMUFile: introduce qemu_fopen_fd
  savevm/QEMUFile: add read/write QEMUFile on memory buffer
  savevm, buffered_file: introduce method to drain buffer of buffered file
  arch_init: export RAM_SAVE_xxx flags for postcopy
  arch_init/ram_save: introduce constant for ram save version = 4
  arch_init: refactor ram_save_block() and export ram_save_block()
  arch_init/ram_save_setup: factor out bitmap alloc/free
  arch_init/ram_load: refactor ram_load
  arch_init: factor out logic to find ram block with id string
  migration: export migrate_fd_completed() and migrate_fd_cleanup()
  uvmem.h: import Linux uvmem.h and teach update-linux-headers.sh
  osdep: add QEMU_MADV_REMOVE and tirivial fix
  postcopy: introduce helper functions for postcopy
  savevm: add new section that is used by postcopy
  postcopy: implement incoming part of postcopy live migration
  postcopy outgoing: add -p option to migrate command
  postcopy: implement outgoing part of postcopy live migration
  postcopy/outgoing: add -n options to disable background transfer
  postcopy/outgoing: implement forward/backword prefault
  arch_init: factor out setting last_block, last_offset
  postcopy/outgoing: add movebg mode(-m) to migration command
  arch_init: factor out ram_load
  arch_init: export ram_save_iterate()
  postcopy: pre+post optimization incoming side
  arch_init: export migration_bitmap_sync and helper method to get bitmap
  postcopy/outgoing: introduce precopy_count parameter
  postcopy: pre+post optimization outgoing side

Paolo Bonzini (1):
  split MRU ram list

Umesh Deshpande (2):
  add a version number to ram_list
  protect the ramlist with a separate mutex

 Makefile.target                 |    2 +
 arch_init.c                     |  391 +---
 arch_init.h                     |   24 +
 buffered_file.c                 |   59 +-
 buffered_file.h                 |    1 +
 cpu-all.h                       |   16 +-
 exec.c                          |   62 +-
 hmp-commands.hx                 |   21 +-
 hmp.c                           |   12 +-
 linux-headers/linux/uvmem.h     |   41 +
 migration-exec.c                |    8 +-
 migration-fd.c                  |   23 +-
 migration-postcopy.c            | 2019 +++
 migration-tcp.c                 |   16 +-
 migration-unix.c                |   36 +-
 migration.c                     |   65 +-
 migration.h                     |   42 +
 osdep.c                         |   24 +
 osdep.h                         |   13 +-
 qapi-schema.json                |    6 +-
 qemu-common.h                   |    2 +
 qemu-file.h                     |   12 +-
 qmp-commands.hx                 |    4 +-
 savevm.c                        |  223 -
 scripts/update-linux-headers.sh |    2 +-
 sysemu.h                        |    2 +-
 umem.c                          |  291 ++
 umem.h                          |   88 ++
 vl.c                            |    5 +-
 29 files changed, 3265 insertions(+), 245 deletions(-)
 create mode 100644 linux-headers/linux/uvmem.h
 create mode 100644 migration-postcopy.c
 create mode 100644 umem.c
 create mode 100644 umem.h

-- 
1.7.10.4
[PATCH v3 03/35] split MRU ram list
From: Paolo Bonzini pbonz...@redhat.com

Outside the execution threads the normal, non-MRU-ized order of
the RAM blocks should always be enough.  So manage two separate
lists, which will have separate locking rules.

Signed-off-by: Paolo Bonzini pbonz...@redhat.com
---
 arch_init.c |    1 +
 cpu-all.h   |    4 +++-
 exec.c      |   18 +-
 3 files changed, 17 insertions(+), 6 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index 79d4041..d6162af 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -48,6 +48,7 @@
 #include "qemu/page_cache.h"
 #include "qmp-commands.h"
 #include "trace.h"
+#include "cpu-all.h"
 
 #ifdef DEBUG_ARCH_INIT
 #define DPRINTF(fmt, ...) \
diff --git a/cpu-all.h b/cpu-all.h
index 6606432..ecbba12 100644
--- a/cpu-all.h
+++ b/cpu-all.h
@@ -490,8 +490,9 @@ typedef struct RAMBlock {
     ram_addr_t offset;
     ram_addr_t length;
     uint32_t flags;
-    char idstr[256];
+    QLIST_ENTRY(RAMBlock) next_mru;
     QLIST_ENTRY(RAMBlock) next;
+    char idstr[256];
 #if defined(__linux__) && !defined(TARGET_S390X)
     int fd;
 #endif
@@ -499,6 +500,7 @@ typedef struct RAMBlock {
 
 typedef struct RAMList {
     uint8_t *phys_dirty;
+    QLIST_HEAD(, RAMBlock) blocks_mru;
     QLIST_HEAD(, RAMBlock) blocks;
 } RAMList;
 extern RAMList ram_list;
diff --git a/exec.c b/exec.c
index b0ed593..489d924 100644
--- a/exec.c
+++ b/exec.c
@@ -56,6 +56,7 @@
 #include "xen-mapcache.h"
 #include "trace.h"
 #endif
+#include "cpu-all.h"
 
 #include "cputlb.h"
 
@@ -96,7 +97,10 @@ static uint8_t *code_gen_ptr;
 int phys_ram_fd;
 static int in_migration;
 
-RAMList ram_list = { .blocks = QLIST_HEAD_INITIALIZER(ram_list.blocks) };
+RAMList ram_list = {
+    .blocks = QLIST_HEAD_INITIALIZER(ram_list.blocks),
+    .blocks_mru = QLIST_HEAD_INITIALIZER(ram_list.blocks_mru)
+};
 
 static MemoryRegion *system_memory;
 static MemoryRegion *system_io;
@@ -641,6 +645,7 @@ bool tcg_enabled(void)
 void cpu_exec_init_all(void)
 {
 #if !defined(CONFIG_USER_ONLY)
+    qemu_mutex_init(&ram_list.mutex);
     memory_map_init();
     io_mem_init();
 #endif
@@ -2563,6 +2568,7 @@ ram_addr_t qemu_ram_alloc_from_ptr(ram_addr_t size, void *host,
     new_block->length = size;
 
     QLIST_INSERT_HEAD(&ram_list.blocks, new_block, next);
+    QLIST_INSERT_HEAD(&ram_list.blocks_mru, new_block, next_mru);
 
     ram_list.phys_dirty = g_realloc(ram_list.phys_dirty,
                                        last_ram_offset() >> TARGET_PAGE_BITS);
@@ -2591,6 +2597,7 @@ void qemu_ram_free_from_ptr(ram_addr_t addr)
     QLIST_FOREACH(block, &ram_list.blocks, next) {
         if (addr == block->offset) {
             QLIST_REMOVE(block, next);
+            QLIST_REMOVE(block, next_mru);
             g_free(block);
             return;
         }
@@ -2604,6 +2611,7 @@ void qemu_ram_free(ram_addr_t addr)
     QLIST_FOREACH(block, &ram_list.blocks, next) {
         if (addr == block->offset) {
             QLIST_REMOVE(block, next);
+            QLIST_REMOVE(block, next_mru);
             if (block->flags & RAM_PREALLOC_MASK) {
                 ;
             } else if (mem_path) {
@@ -2709,12 +2717,12 @@ void *qemu_get_ram_ptr(ram_addr_t addr)
 {
     RAMBlock *block;
 
-    QLIST_FOREACH(block, &ram_list.blocks, next) {
+    QLIST_FOREACH(block, &ram_list.blocks_mru, next_mru) {
         if (addr - block->offset < block->length) {
             /* Move this entry to to start of the list.  */
             if (block != QLIST_FIRST(&ram_list.blocks)) {
-                QLIST_REMOVE(block, next);
-                QLIST_INSERT_HEAD(&ram_list.blocks, block, next);
+                QLIST_REMOVE(block, next_mru);
+                QLIST_INSERT_HEAD(&ram_list.blocks_mru, block, next_mru);
             }
             if (xen_enabled()) {
                 /* We need to check if the requested address is in the RAM
@@ -2809,7 +2817,7 @@ int qemu_ram_addr_from_host(void *ptr, ram_addr_t *ram_addr)
         return 0;
     }
 
-    QLIST_FOREACH(block, &ram_list.blocks, next) {
+    QLIST_FOREACH(block, &ram_list.blocks_mru, next_mru) {
         /* This case append when the block is not mapped. */
         if (block->host == NULL) {
             continue;
-- 
1.7.10.4
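To illustrate why two orderings are kept, here is a minimal sketch that tracks the same blocks in a stable order (for walkers such as migration) and in a most-recently-used order (for address lookups). It uses plain arrays rather than QEMU's QLIST, appends instead of inserting at the head, and all names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_BLOCKS 8

struct block {
    unsigned long offset;
    unsigned long length;
};

struct block_lists {
    struct block *stable[MAX_BLOCKS];  /* never reordered by lookups */
    struct block *mru[MAX_BLOCKS];     /* most recently used first */
    size_t count;
};

static void lists_add(struct block_lists *l, struct block *b)
{
    assert(l->count < MAX_BLOCKS);
    l->stable[l->count] = b;
    l->mru[l->count] = b;
    l->count++;
}

/*
 * Find the block containing addr. A hit is moved to the front of the MRU
 * order only; the stable order is left untouched, so a concurrent walker
 * of `stable` never sees its position shuffled by lookups.
 */
static struct block *lookup(struct block_lists *l, unsigned long addr)
{
    for (size_t i = 0; i < l->count; i++) {
        struct block *b = l->mru[i];

        /* Unsigned subtraction: also false when addr < b->offset. */
        if (addr - b->offset < b->length) {
            memmove(&l->mru[1], &l->mru[0], i * sizeof(l->mru[0]));
            l->mru[0] = b;
            return b;
        }
    }
    return NULL;
}
```

Because lookups only touch the MRU order, the two orders can be guarded by different locks, which is the "separate locking rules" point in the commit message.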
Re: [Qemu-devel] [PATCH v3 00/35] postcopy live migration
On Tue, Oct 30, 2012 at 06:53:31PM +, Benoit Hudzia wrote:
> Hi Isaku,
>
> Are you going to be at the KVM Forum (I think you have a presentation
> there)? It would be nice if we could meet in order to see if we can sync
> our efforts.

Yes, definitely.

> As you know we have been developing an RDMA based solution for post copy
> migration and we demonstrated the initial proof of concept in December
> 2012 (we published some findings in VHPC 2012 and are working with Petter
> Svard from Umea on a journal paper with a more detailed performance
> review).

Do you have any pointers to available papers/slides?
I can't find any at http://vhpc.org/

> While RDMA post copy live migration is just a by-product of our long term
> effort (I will present the project in my talk at KVM Forum), we grabbed
> the opportunity to address problems we were facing with the live
> migration of enterprise workloads, namely how to migrate an in-memory
> database such as HANA under load. We quickly discovered that pre copy
> (even with optimization) didn't work with such workloads.
>
> We also tried your code; however, the performance was far from satisfying
> with large VMs under load due to the heavy cost of transferring memory
> between user space <-> kernel multiple times (actually it often failed).

If possible, I'd like to see the details.

> We then tested a pure RDMA solution we developed (we support HW and
> software RDMA) and it worked fine with all the workloads we tested (we
> migrated VMs with 20+ GB running SAP HANA under a workload similar to
> TPC-H) and we hope to test with bigger configurations soon (1/2 + TB of
> memory).
>
> However the state of integration of our code with the QEMU code base is
> not as advanced and polished as the one you currently have, and I would
> like to know if you would be interested in trying to join our effort or
> collaborate in merging our solution. Or maybe allowing us to piggyback on
> your effort.

Yeah, we can unite our efforts for the upstream.
Especially a clean interface for both non-RDMA/RDMA
(qemu internal/qemu-kernel) is important.
At the moment I have no clue about the requirements of RDMA postcopy and
your implementation. Transparently integrating with the MMU at the OS level
sounds interesting.

thanks,

> Would you be free to meet at any time next week (from Tuesday to Friday)?
>
> PS: we will be open sourcing our project by the end of November, and the
> post copy is only a small part of the technology developed.
>
> Regards
> Benoit
>
> On 30 October 2012 08:32, Isaku Yamahata yamah...@valinux.co.jp wrote:
> > This is the v3 patch series of postcopy migration.
> >
> > The trees are available at
> > git://github.com/yamahata/qemu.git qemu-postcopy-oct-30-2012
> > git://github.com/yamahata/linux-umem.git linux-umem-oct-29-2012
> >
> > Major changes v2 -> v3:
> > - implemented pre+post optimization
> > - auto detection of postcopy by incoming side
> > - using threads on destination instead of fork
> > - using blocking io instead of select + non-blocking io loop
> > - less memory overhead
> > - various improvement and code simplification
> > - kernel module name change umem -> uvmem to avoid name conflict.
> >
> > Patches organization:
> > 1-2: trivial fixes
> > 3-5: preparation for threading. cherry-picked from migration tree
> > 6-18: refactoring existing code and preparation
> > 19-25: implement postcopy live migration itself (essential part)
> > 26-35: optimization/heuristic for postcopy
> >
> > Usage
> > =====
> > You need to load the uvmem character device on the host before starting
> > migration.
> > Postcopy can be used for tcg and kvm accelerator. The implementation
> > depends only on the linux uvmem character device. But the driver
> > dependent code is split into a file.
> > I tested only host page size == guest page size case, but the
> > implementation allows host page size != guest page size case.
> >
> > The following options are added with this patch series.
> > - incoming part
> >   use -incoming as usual. Postcopy is automatically detected.
> >   example:
> >   qemu -incoming tcp:0: -monitor stdio -machine accel=kvm
> >
> > - outgoing part
> >   options for migrate command
> >   migrate [-p [-n] [-m]] URI
> >           [<precopy count> [<prefault forward> [<prefault backward>]]]
> >
> >   Newly added options/arguments
> >   -p: indicate postcopy migration
> >   -n: disable background transferring pages: This is for
> >       benchmark/debugging
> >   -m: move background transfer of postcopy mode
> >   <precopy count>: The number of precopy RAM scan before postcopy.
> >                    default 0 (0 means no precopy)
> >   <prefault forward>: The number of forward pages which is sent with
> >                       on-demand
> >   <prefault backward>: The number of backward pages which is sent with
> >                        on-demand
> >
> >   example:
> >   migrate -p -n tcp:<dest ip address>:
> >   migrate -p -n -m tcp:<dest ip address>: 42
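The prefault forward/backward arguments describe how many neighbouring pages accompany an on-demand page. A minimal sketch of how such a window could be computed is given below; whether the actual patch clamps at RAM block boundaries in this way is an assumption, and prefault_window() is a hypothetical helper, not a function from the series.

```c
#include <assert.h>
#include <stdint.h>

/* Inclusive range of page indexes to send for one on-demand request. */
struct page_range {
    uint64_t first;
    uint64_t last;
};

/*
 * Compute the pages to send when `fault_page` is requested: up to
 * `backward` pages before it and up to `forward` pages after it,
 * clamped to [0, nr_pages - 1] (assumed to be one RAM block).
 */
static struct page_range prefault_window(uint64_t fault_page,
                                         uint64_t nr_pages,
                                         uint64_t forward,
                                         uint64_t backward)
{
    struct page_range r;

    assert(nr_pages > 0 && fault_page < nr_pages);
    r.first = fault_page >= backward ? fault_page - backward : 0;
    r.last = (nr_pages - 1 - fault_page >= forward)
                 ? fault_page + forward
                 : nr_pages - 1;
    return r;
}
```

With forward and backward both 0, the window collapses to the faulting page itself, which matches plain on-demand transfer.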
Re: [PATCH v2 35/41] postcopy: introduce helper functions for postcopy
On Thu, Jun 14, 2012 at 11:34:09PM +0200, Juan Quintela wrote:
> Isaku Yamahata yamah...@valinux.co.jp wrote:
> > +//#define DEBUG_UMEM
> > +#ifdef DEBUG_UMEM
> > +#include <sys/syscall.h>
> > +#define DPRINTF(format, ...)                                            \
> > +    do {                                                                \
> > +        printf("%d:%ld %s:%d " format, getpid(), syscall(SYS_gettid),   \
> > +               __func__, __LINE__, ## __VA_ARGS__);                     \
> > +    } while (0)
>
> This should be in a header file that is linux specific? And (at least
> on my systems) gettid is already defined on glibc.

I'll remove getpid/gettid. It was just for debugging in early phase.
They are not necessary any more.

> > +#else
> > +#define DPRINTF(format, ...)    do { } while (0)
> > +#endif
> > +
> > +#define DEV_UMEM        "/dev/umem"
> > +
> > +UMem *umem_new(void *hostp, size_t size)
> > +{
> > +    struct umem_init uinit = {
> > +        .size = size,
> > +    };
> > +    UMem *umem;
> > +
> > +    assert((size % getpagesize()) == 0);
> > +    umem = g_new(UMem, 1);
> > +    umem->fd = open(DEV_UMEM, O_RDWR);
> > +    if (umem->fd < 0) {
> > +        perror("can't open DEV_UMEM");
> > +        abort();
>
> Can we return an error instead of abort? The same for the rest of the
> aborts in the file.

Ok.

> > +size_t umem_pages_size(uint64_t nr)
> > +{
> > +    return sizeof(struct umem_pages) + nr * sizeof(uint64_t);
>
> Can we make sure that the pgoffs field is aligned? I know that as it is
> now it is aligned, but better to be sure?

It is already done by gcc extension, zero length array.

> > +}
> > +
> > +static void umem_write_cmd(int fd, uint8_t cmd)
> > +{
> > +    DPRINTF("write cmd %c\n", cmd);
> > +
> > +    for (;;) {
> > +        ssize_t ret = write(fd, &cmd, 1);
> > +        if (ret == -1) {
> > +            if (errno == EINTR) {
> > +                continue;
> > +            } else if (errno == EPIPE) {
> > +                perror("pipe");
> > +                DPRINTF("write cmd %c %zd %d: pipe is closed\n",
> > +                        cmd, ret, errno);
> > +                break;
> > +            }
>
> Grr, we don't have a function that does a safe write. The most
> similar thing in qemu looks to be send_all().

So we should introduce something like qemu_safe_write/read?

> > +
> > +            perror("pipe");
>
> Can we make a different perror() message than previous error?
>
> > +            DPRINTF("write cmd %c %zd %d\n", cmd, ret, errno);
> > +            abort();
> > +        }
> > +
> > +        break;
> > +    }
> > +}
> > +
> > +static void umem_read_cmd(int fd, uint8_t expect)
> > +{
> > +    uint8_t cmd;
> > +    for (;;) {
> > +        ssize_t ret = read(fd, &cmd, 1);
> > +        if (ret == -1) {
> > +            if (errno == EINTR) {
> > +                continue;
> > +            }
> > +            perror("pipe");
> > +            DPRINTF("read error cmd %c %zd %d\n", cmd, ret, errno);
> > +            abort();
> > +        }
> > +
> > +        if (ret == 0) {
> > +            DPRINTF("read cmd %c %zd: pipe is closed\n", cmd, ret);
> > +            abort();
> > +        }
> > +
> > +        break;
> > +    }
> > +
> > +    DPRINTF("read cmd %c\n", cmd);
> > +    if (cmd != expect) {
> > +        DPRINTF("cmd %c expect %d\n", cmd, expect);
> > +        abort();
>
> Ouch. If we receive garbage, we just exit?
>
> I really think that we should implement error handling.
>
> > +    }
> > +}
> > +
> > +struct umem_pages *umem_recv_pages(QEMUFile *f, int *offset)
> > +{
> > +    int ret;
> > +    uint64_t nr;
> > +    size_t size;
> > +    struct umem_pages *pages;
> > +
> > +    ret = qemu_peek_buffer(f, (uint8_t*)&nr, sizeof(nr), *offset);
> > +    *offset += sizeof(nr);
> > +    DPRINTF("ret %d nr %ld\n", ret, nr);
> > +    if (ret != sizeof(nr) || nr == 0) {
> > +        return NULL;
> > +    }
> > +
> > +    size = umem_pages_size(nr);
> > +    pages = g_malloc(size);
>
> Just thinking about this. Couldn't we just decide on a big enough buffer,
> and never send anything bigger than that? That would remove the need to
> have to malloc()/free() a buffer for each reception?

Will try to address it.

> > +/* qemu side handler */
> > +struct umem_pages *umem_qemu_trigger_page_fault(QEMUFile *from_umemd,
> > +                                                int *offset)
> > +{
> > +    uint64_t i;
> > +    int page_shift = ffs(getpagesize()) - 1;
> > +    struct umem_pages *pages = umem_recv_pages(from_umemd, offset);
> > +    if (pages == NULL) {
> > +        return NULL;
> > +    }
> > +
> > +    for (i = 0; i < pages->nr; i++) {
> > +        ram_addr_t addr = pages->pgoffs[i] << page_shift;
> > +
> > +        /* make pages present by forcibly triggering page fault. */
> > +        volatile uint8_t *ram = qemu_get_ram_ptr(addr);
> > +        uint8_t dummy_read = ram[0];
> > +        (void)dummy_read;   /* suppress unused variable warning */
> > +    }
> > +
> > +    /*
> > +     * Very Linux implementation specific.
> > +     * Make it sure that other thread doesn't fault on the above virtual
> > +     * address. (More exactly other thread doesn't call fault handler with
> > +     * the offset.)
> > +     * the fault handler
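The exchange about umem_pages_size() and the alignment of pgoffs can be made concrete with a compile-time check. The struct below is a hypothetical layout (the real struct umem_pages is not shown in this thread) and uses a C99 flexible array member where the patch reportedly uses the GCC zero-length-array extension; the names pages_msg, pages_msg_size() and pages_msg_new() are illustrative only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical layout: a count followed by that many page offsets. */
struct pages_msg {
    uint64_t nr;
    uint64_t pgoffs[];      /* C99 flexible array member */
};

/* Fails the build if the array ever stops being naturally aligned. */
_Static_assert(offsetof(struct pages_msg, pgoffs) % _Alignof(uint64_t) == 0,
               "pgoffs must be aligned for uint64_t");

/* Total size in bytes of a message carrying nr offsets. */
static size_t pages_msg_size(uint64_t nr)
{
    /* Guard against overflow of the multiplication below. */
    assert(nr <= (SIZE_MAX - sizeof(struct pages_msg)) / sizeof(uint64_t));
    return sizeof(struct pages_msg) + (size_t)nr * sizeof(uint64_t);
}

/* Allocate a message; the caller frees it with free(). */
static struct pages_msg *pages_msg_new(uint64_t nr)
{
    struct pages_msg *p = malloc(pages_msg_size(nr));

    assert(p != NULL);
    p->nr = nr;
    return p;
}
```

A static assertion answers the "better to be sure" concern without any runtime cost, and it keeps working if fields are later added in front of the array.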
Re: [Qemu-devel] [PATCH v2 33/41] postcopy: introduce -postcopy and -postcopy-flags option
On Fri, Jun 08, 2012 at 12:52:54PM +0200, Juan Quintela wrote:
> Isaku Yamahata yamah...@valinux.co.jp wrote:
> > This patch prepares for postcopy live migration.
> > It introduces -postcopy option and its internal flag, migration_postcopy.
> > It introduces -postcopy-flags for changing the behavior of incoming
> > postcopy mainly for benchmark/debug.
>
> Why do we need postcopy flag? -incoming should be enough to detect that
> we are doing postcopy.
>
>     QLIST_HEAD(, LoadStateEntry) loadvm_handlers =
>         QLIST_HEAD_INITIALIZER(loadvm_handlers);
>     LoadStateEntry *le, *new_le;
>     uint8_t section_type;
>     unsigned int v;
>     int ret;
>
>     if (qemu_savevm_state_blocked(NULL)) {
>         return -EINVAL;
>     }
>
>     v = qemu_get_be32(f);
>     if (v != QEMU_VM_FILE_MAGIC)
>         return -EINVAL;
>
>     v = qemu_get_be32(f);
>     if (v == QEMU_VM_FILE_VERSION_COMPAT) {
>         fprintf(stderr, "SaveVM v2 format is obsolete and don't work anymore\n");
>         return -ENOTSUP;
>     }
>     if (v != QEMU_VM_FILE_VERSION)
>         return -ENOTSUP;
>
> Shouldn't we be able to change some version field here and make the
> recognition of postcopy automatic? Having to hack around a new
> command line option for each page is not going to be nice.
>
> And about postcopy flags, if they are for incoming side, please consider
> just sending those flags on the stream as a first field?

Yes, you are right. If bumping the version is allowed, -postcopy can be
dropped in favor of auto detection.
-postcopy-flags can be dropped because it is used only for benchmark
purposes, to change incoming side behavior independently of the outgoing
side.

-- 
yamahata
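The header check quoted in this thread reads two big-endian 32-bit words and rejects unknown values, which is where a version-based auto-detection would hook in. A standalone sketch of that check over a byte buffer follows; the constant values are given as I recall them from savevm.c and should be treated as assumptions, and check_stream_header() is a hypothetical helper.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed values; verify against savevm.c before relying on them. */
#define VM_FILE_MAGIC          0x5145564dU  /* ASCII "QEVM" */
#define VM_FILE_VERSION_COMPAT 0x00000002U
#define VM_FILE_VERSION        0x00000003U

/* Decode a big-endian 32-bit value, like qemu_get_be32() does on a stream. */
static uint32_t get_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/*
 * Validate the 8-byte stream header (magic, then version).
 * Returns 0 if acceptable, a negative errno value otherwise.
 */
static int check_stream_header(const uint8_t *buf, size_t len)
{
    uint32_t v;

    assert(buf != NULL);
    if (len < 8) {
        return -EINVAL;
    }
    if (get_be32(buf) != VM_FILE_MAGIC) {
        return -EINVAL;
    }
    v = get_be32(buf + 4);
    if (v == VM_FILE_VERSION_COMPAT) {
        return -ENOTSUP;    /* obsolete v2 format */
    }
    if (v != VM_FILE_VERSION) {
        return -ENOTSUP;    /* unknown version */
    }
    return 0;
}
```

Auto-detection along the lines discussed would add a branch here (or a new section type later in the stream) instead of a command line switch; the thread does not fix which of the two is used.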
[PATCH v2 02/41] arch_init: export RAM_SAVE_xxx flags for postcopy
Those constants will be also used by postcopy.

Signed-off-by: Isaku Yamahata yamah...@valinux.co.jp
---
 arch_init.c |    7 ---
 arch_init.h |    7 +++
 2 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index 38e0173..bd4e61e 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -88,13 +88,6 @@ const uint32_t arch_type = QEMU_ARCH;
 /***/
 /* ram save/restore */
 
-#define RAM_SAVE_FLAG_FULL     0x01 /* Obsolete, not used anymore */
-#define RAM_SAVE_FLAG_COMPRESS 0x02
-#define RAM_SAVE_FLAG_MEM_SIZE 0x04
-#define RAM_SAVE_FLAG_PAGE     0x08
-#define RAM_SAVE_FLAG_EOS      0x10
-#define RAM_SAVE_FLAG_CONTINUE 0x20
-
 #ifdef __ALTIVEC__
 #include <altivec.h>
 #define VECTYPE        vector unsigned char
diff --git a/arch_init.h b/arch_init.h
index c7cb94a..7cc3fa7 100644
--- a/arch_init.h
+++ b/arch_init.h
@@ -30,4 +30,11 @@ int tcg_available(void);
 int kvm_available(void);
 int xen_available(void);
 
+#define RAM_SAVE_FLAG_FULL     0x01 /* Obsolete, not used anymore */
+#define RAM_SAVE_FLAG_COMPRESS 0x02
+#define RAM_SAVE_FLAG_MEM_SIZE 0x04
+#define RAM_SAVE_FLAG_PAGE     0x08
+#define RAM_SAVE_FLAG_EOS      0x10
+#define RAM_SAVE_FLAG_CONTINUE 0x20
+
 #endif
-- 
1.7.1.1
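As far as I understand the RAM stream format, these flags travel in the low bits of a page-aligned offset, which is why postcopy code needs the same definitions. The sketch below shows that packing; the 4 KiB page size and the helper names are assumptions made for illustration, and only the three flag values are taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Flag values as defined in the patch. */
#define RAM_SAVE_FLAG_COMPRESS 0x02
#define RAM_SAVE_FLAG_PAGE     0x08
#define RAM_SAVE_FLAG_CONTINUE 0x20

/* Assumption: 4 KiB target pages, so the low 12 bits of an offset are free. */
#define SKETCH_PAGE_BITS 12
#define SKETCH_PAGE_MASK (~((UINT64_C(1) << SKETCH_PAGE_BITS) - 1))

/* Combine a page-aligned offset and flag bits into one 64-bit header. */
static uint64_t encode_header(uint64_t offset, uint64_t flags)
{
    assert((offset & ~SKETCH_PAGE_MASK) == 0);  /* offset is page aligned */
    assert((flags & SKETCH_PAGE_MASK) == 0);    /* flags fit below a page */
    return offset | flags;
}

/* Split a header back into its two parts. */
static uint64_t header_offset(uint64_t header)
{
    return header & SKETCH_PAGE_MASK;
}

static uint64_t header_flags(uint64_t header)
{
    return header & ~SKETCH_PAGE_MASK;
}
```

Since sender and receiver must agree on every bit, sharing one header file is safer than duplicating the defines in the postcopy code.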
[PATCH v2 01/41] arch_init: export sort_ram_list() and ram_save_block()
This will be used by postcopy.

Signed-off-by: Isaku Yamahata yamah...@valinux.co.jp
---
 arch_init.c |    4 ++--
 migration.h |    2 ++
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index a9e8b74..38e0173 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -164,7 +164,7 @@ static int is_dup_page(uint8_t *page)
 static RAMBlock *last_block;
 static ram_addr_t last_offset;
 
-static int ram_save_block(QEMUFile *f)
+int ram_save_block(QEMUFile *f)
 {
     RAMBlock *block = last_block;
     ram_addr_t offset = last_offset;
@@ -273,7 +273,7 @@ static int block_compar(const void *a, const void *b)
     return strcmp((*ablock)->idstr, (*bblock)->idstr);
 }
 
-static void sort_ram_list(void)
+void sort_ram_list(void)
 {
     RAMBlock *block, *nblock, **blocks;
     int n;
diff --git a/migration.h b/migration.h
index 2e9ca2e..8b9509c 100644
--- a/migration.h
+++ b/migration.h
@@ -76,6 +76,8 @@ uint64_t ram_bytes_remaining(void);
 uint64_t ram_bytes_transferred(void);
 uint64_t ram_bytes_total(void);
 
+void sort_ram_list(void);
+int ram_save_block(QEMUFile *f);
 int ram_save_live(QEMUFile *f, int stage, void *opaque);
 int ram_load(QEMUFile *f, void *opaque, int version_id);
 
-- 
1.7.1.1
[PATCH v2 12/41] arch_init: factor out setting last_block, last_offset
Signed-off-by: Isaku Yamahata yamah...@valinux.co.jp
---
 arch_init.c |   13 -
 arch_init.h |    1 +
 2 files changed, 9 insertions(+), 5 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index 2617478..22d9691 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -203,6 +203,12 @@ int ram_save_page(QEMUFile *f, RAMBlock *block, ram_addr_t offset)
 static RAMBlock *last_block;
 static ram_addr_t last_offset;
 
+void ram_save_set_last_block(RAMBlock *block, ram_addr_t offset)
+{
+    last_block = block;
+    last_offset = offset;
+}
+
 int ram_save_block(QEMUFile *f)
 {
     RAMBlock *block = last_block;
@@ -230,9 +236,7 @@ int ram_save_block(QEMUFile *f)
         }
     } while (block != last_block || offset != last_offset);
 
-    last_block = block;
-    last_offset = offset;
-
+    ram_save_set_last_block(block, offset);
     return bytes_sent;
 }
 
@@ -349,8 +353,7 @@ int ram_save_live(QEMUFile *f, int stage, void *opaque)
     if (stage == 1) {
         bytes_transferred = 0;
         last_block_sent = NULL;
-        last_block = NULL;
-        last_offset = 0;
+        ram_save_set_last_block(NULL, 0);
         sort_ram_list();
 
         /* Make sure all dirty bits are set */
diff --git a/arch_init.h b/arch_init.h
index 7f5c77a..15548cd 100644
--- a/arch_init.h
+++ b/arch_init.h
@@ -40,6 +40,7 @@ int xen_available(void);
 #define RAM_SAVE_VERSION_ID     4 /* currently version 4 */
 
 #if defined(NEED_CPU_H) && !defined(CONFIG_USER_ONLY)
+void ram_save_set_last_block(RAMBlock *block, ram_addr_t offset);
 int ram_save_page(QEMUFile *f, RAMBlock *block, ram_addr_t offset);
 RAMBlock *ram_find_block(const char *id, uint8_t len);
 void *ram_load_host_from_stream_offset(QEMUFile *f,
-- 
1.7.1.1
[PATCH v2 18/41] QEMUFile: add qemu_file_fd() for later use
Signed-off-by: Isaku Yamahata yamah...@valinux.co.jp
---
 qemu-file.h |    1 +
 savevm.c    |   12 
 2 files changed, 13 insertions(+), 0 deletions(-)

diff --git a/qemu-file.h b/qemu-file.h
index 331ac8b..98a8023 100644
--- a/qemu-file.h
+++ b/qemu-file.h
@@ -71,6 +71,7 @@ QEMUFile *qemu_fopen_socket(int fd);
 QEMUFile *qemu_popen(FILE *popen_file, const char *mode);
 QEMUFile *qemu_popen_cmd(const char *command, const char *mode);
 int qemu_stdio_fd(QEMUFile *f);
+int qemu_file_fd(QEMUFile *f);
 void qemu_fflush(QEMUFile *f);
 void qemu_buffered_file_drain(QEMUFile *f);
 int qemu_fclose(QEMUFile *f);
diff --git a/savevm.c b/savevm.c
index fb47529..cba1a69 100644
--- a/savevm.c
+++ b/savevm.c
@@ -178,6 +178,7 @@ struct QEMUFile {
     uint8_t buf[IO_BUF_SIZE];
 
     int last_error;
+    int fd; /* -1 means fd isn't associated */
 };
 
 typedef struct QEMUFileStdio
@@ -276,6 +277,7 @@ QEMUFile *qemu_popen(FILE *stdio_file, const char *mode)
         s->file = qemu_fopen_ops(s, stdio_put_buffer, NULL, stdio_pclose,
                                  NULL, NULL, NULL);
     }
+    s->file->fd = fileno(stdio_file);
     return s->file;
 }
 
@@ -291,6 +293,7 @@ QEMUFile *qemu_popen_cmd(const char *command, const char *mode)
     return qemu_popen(popen_file, mode);
 }
 
+/* TODO: replace this with qemu_file_fd() */
 int qemu_stdio_fd(QEMUFile *f)
 {
     QEMUFileStdio *p;
@@ -325,6 +328,7 @@ QEMUFile *qemu_fdopen(int fd, const char *mode)
         s->file = qemu_fopen_ops(s, stdio_put_buffer, NULL, stdio_fclose,
                                  NULL, NULL, NULL);
     }
+    s->file->fd = fd;
     return s->file;
 
 fail:
@@ -339,6 +343,7 @@ QEMUFile *qemu_fopen_socket(int fd)
     s->fd = fd;
     s->file = qemu_fopen_ops(s, NULL, socket_get_buffer, socket_close,
                              NULL, NULL, NULL);
+    s->file->fd = fd;
     return s->file;
 }
 
@@ -381,6 +386,7 @@ QEMUFile *qemu_fopen(const char *filename, const char *mode)
         s->file = qemu_fopen_ops(s, NULL, file_get_buffer, stdio_fclose,
                                  NULL, NULL, NULL);
     }
+    s->file->fd = fileno(s->stdio_file);
     return s->file;
 fail:
     g_free(s);
@@ -431,10 +437,16 @@ QEMUFile *qemu_fopen_ops(void *opaque, QEMUFilePutBufferFunc *put_buffer,
     f->set_rate_limit = set_rate_limit;
     f->get_rate_limit = get_rate_limit;
     f->is_write = 0;
+    f->fd = -1;
 
     return f;
 }
 
+int qemu_file_fd(QEMUFile *f)
+{
+    return f->fd;
+}
+
 int qemu_file_get_error(QEMUFile *f)
 {
     return f->last_error;
-- 
1.7.1.1
[PATCH v2 21/41] savevm: rename QEMUFileSocket to QEMUFileFD, socket_close to fd_close
Later the structure will be shared.

Signed-off-by: Isaku Yamahata <yamah...@valinux.co.jp>
---
 savevm.c |   14 +++---
 1 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/savevm.c b/savevm.c
index 4b560b3..2fb0c3e 100644
--- a/savevm.c
+++ b/savevm.c
@@ -187,14 +187,14 @@ typedef struct QEMUFileStdio
     QEMUFile *file;
 } QEMUFileStdio;

-typedef struct QEMUFileSocket
+typedef struct QEMUFileFD
 {
     QEMUFile *file;
-} QEMUFileSocket;
+} QEMUFileFD;

 static int socket_get_buffer(void *opaque, uint8_t *buf, int64_t pos, int size)
 {
-    QEMUFileSocket *s = opaque;
+    QEMUFileFD *s = opaque;
     ssize_t len;

     do {
@@ -207,9 +207,9 @@ static int socket_get_buffer(void *opaque, uint8_t *buf, int64_t pos, int size)
     return len;
 }

-static int socket_close(void *opaque)
+static int fd_close(void *opaque)
 {
-    QEMUFileSocket *s = opaque;
+    QEMUFileFD *s = opaque;
     g_free(s);
     return 0;
 }
@@ -325,9 +325,9 @@ fail:

 QEMUFile *qemu_fopen_socket(int fd)
 {
-    QEMUFileSocket *s = g_malloc0(sizeof(QEMUFileSocket));
+    QEMUFileFD *s = g_malloc0(sizeof(QEMUFileFD));

-    s->file = qemu_fopen_ops(s, NULL, socket_get_buffer, socket_close,
+    s->file = qemu_fopen_ops(s, NULL, socket_get_buffer, fd_close,
                              NULL, NULL, NULL);
     s->file->fd = fd;
     return s->file;
--
1.7.1.1
[PATCH v2 22/41] savevm/QEMUFile: introduce qemu_fopen_fd
Introduce nonblocking fd read backend of QEMUFile.
This will be used by postcopy live migration.

Signed-off-by: Isaku Yamahata <yamah...@valinux.co.jp>
---
 qemu-file.h |    1 +
 savevm.c    |   40
 2 files changed, 41 insertions(+), 0 deletions(-)

diff --git a/qemu-file.h b/qemu-file.h
index 1a12e7d..af5b123 100644
--- a/qemu-file.h
+++ b/qemu-file.h
@@ -68,6 +68,7 @@ QEMUFile *qemu_fopen_ops(void *opaque, QEMUFilePutBufferFunc *put_buffer,
 QEMUFile *qemu_fopen(const char *filename, const char *mode);
 QEMUFile *qemu_fdopen(int fd, const char *mode);
 QEMUFile *qemu_fopen_socket(int fd);
+QEMUFile *qemu_fopen_fd(int fd);
 QEMUFile *qemu_popen(FILE *popen_file, const char *mode);
 QEMUFile *qemu_popen_cmd(const char *command, const char *mode);
 int qemu_file_fd(QEMUFile *f);
diff --git a/savevm.c b/savevm.c
index 2fb0c3e..5640614 100644
--- a/savevm.c
+++ b/savevm.c
@@ -207,6 +207,35 @@ static int socket_get_buffer(void *opaque, uint8_t *buf, int64_t pos, int size)
     return len;
 }

+static int fd_get_buffer(void *opaque, uint8_t *buf, int64_t pos, int size)
+{
+    QEMUFileFD *s = opaque;
+    ssize_t len = 0;
+
+    while (size > 0) {
+        ssize_t ret = read(s->file->fd, buf, size);
+        if (ret == -1) {
+            if (errno == EINTR) {
+                continue;
+            }
+            if (len == 0) {
+                len = -errno;
+            }
+            break;
+        }
+
+        if (ret == 0) {
+            /* the write end of the pipe is closed */
+            break;
+        }
+        len += ret;
+        buf += ret;
+        size -= ret;
+    }
+
+    return len;
+}
+
 static int fd_close(void *opaque)
 {
     QEMUFileFD *s = opaque;
@@ -333,6 +362,17 @@ QEMUFile *qemu_fopen_socket(int fd)
     return s->file;
 }

+QEMUFile *qemu_fopen_fd(int fd)
+{
+    QEMUFileFD *s = g_malloc0(sizeof(*s));
+
+    fcntl_setfl(fd, O_NONBLOCK);
+    s->file = qemu_fopen_ops(s, NULL, fd_get_buffer, fd_close,
+                             NULL, NULL, NULL);
+    s->file->fd = fd;
+    return s->file;
+}
+
 static int file_put_buffer(void *opaque, const uint8_t *buf,
                             int64_t pos, int size)
 {
--
1.7.1.1
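The read loop that fd_get_buffer() adds in patch 22/41 can be tried outside QEMU. What follows is only a minimal sketch, assuming a plain POSIX file descriptor; read_full() is a hypothetical name that does not appear in the patch, and the QEMUFile plumbing is left out. It mirrors the same three decisions: retry on EINTR, report -errno only if nothing has been read yet, and stop at end of file.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of the patch): read up to 'size' bytes
 * from 'fd'. Returns the number of bytes read, or -errno if an error
 * occurred before any byte was read.
 */
static ssize_t read_full(int fd, unsigned char *buf, size_t size)
{
    ssize_t len = 0;

    assert(buf != NULL);
    while (size > 0) {
        ssize_t ret = read(fd, buf, size);
        if (ret == -1) {
            if (errno == EINTR) {
                continue;       /* interrupted by a signal: retry */
            }
            if (len == 0) {
                len = -errno;   /* nothing read yet: report the error */
            }
            break;              /* also covers EAGAIN on a nonblocking fd */
        }
        if (ret == 0) {
            break;              /* EOF: the write end was closed */
        }
        len += ret;
        buf += ret;
        size -= (size_t)ret;
    }
    return len;
}
```

On a nonblocking descriptor this returns a short count as soon as no more data is available, so a caller has to be prepared to come back later for the rest.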
[PATCH v2 17/41] savevm, buffered_file: introduce method to drain buffer of buffered file
Introduce a new method to drain the buffer of QEMUBufferedFile.
During postcopy migration, the buffer size can grow without bound.
To keep the buffer reasonably small, introduce a method that waits for
the buffer to drain.

Signed-off-by: Isaku Yamahata <yamah...@valinux.co.jp>
---
 buffered_file.c |   20 +++-
 buffered_file.h |    1 +
 qemu-file.h     |    1 +
 savevm.c        |    7 +++
 4 files changed, 24 insertions(+), 5 deletions(-)

diff --git a/buffered_file.c b/buffered_file.c
index f170aa0..a38caec 100644
--- a/buffered_file.c
+++ b/buffered_file.c
@@ -170,6 +170,15 @@ static int buffered_put_buffer(void *opaque, const uint8_t *buf, int64_t pos, in
     return offset;
 }

+static void buffered_drain(QEMUFileBuffered *s)
+{
+    while (!qemu_file_get_error(s->file) && s->buffer_size) {
+        buffered_flush(s);
+        if (s->freeze_output)
+            s->wait_for_unfreeze(s->opaque);
+    }
+}
+
 static int buffered_close(void *opaque)
 {
     QEMUFileBuffered *s = opaque;
@@ -177,11 +186,7 @@ static int buffered_close(void *opaque)

     DPRINTF("closing\n");

-    while (!qemu_file_get_error(s->file) && s->buffer_size) {
-        buffered_flush(s);
-        if (s->freeze_output)
-            s->wait_for_unfreeze(s->opaque);
-    }
+    buffered_drain(s);

     ret = s->close(s->opaque);

@@ -291,3 +296,8 @@ QEMUFile *qemu_fopen_ops_buffered(void *opaque,

     return s->file;
 }
+
+void qemu_buffered_file_drain_buffer(void *buffered_file)
+{
+    buffered_drain(buffered_file);
+}
diff --git a/buffered_file.h b/buffered_file.h
index 98d358b..cd8e1e8 100644
--- a/buffered_file.h
+++ b/buffered_file.h
@@ -26,5 +26,6 @@ QEMUFile *qemu_fopen_ops_buffered(void *opaque, size_t xfer_limit,
                                   BufferedPutReadyFunc *put_ready,
                                   BufferedWaitForUnfreezeFunc *wait_for_unfreeze,
                                   BufferedCloseFunc *close);
+void qemu_buffered_file_drain_buffer(void *buffered_file);

 #endif
diff --git a/qemu-file.h b/qemu-file.h
index 880ef4b..331ac8b 100644
--- a/qemu-file.h
+++ b/qemu-file.h
@@ -72,6 +72,7 @@ QEMUFile *qemu_popen(FILE *popen_file, const char *mode);
 QEMUFile *qemu_popen_cmd(const char *command, const char *mode);
 int qemu_stdio_fd(QEMUFile *f);
 void qemu_fflush(QEMUFile *f);
+void qemu_buffered_file_drain(QEMUFile *f);
 int qemu_fclose(QEMUFile *f);
 void qemu_put_buffer(QEMUFile *f, const uint8_t *buf, int size);
 void qemu_put_byte(QEMUFile *f, int v);
diff --git a/savevm.c b/savevm.c
index 2992f97..fb47529 100644
--- a/savevm.c
+++ b/savevm.c
@@ -85,6 +85,7 @@
 #include "cpus.h"
 #include "memory.h"
 #include "qmp-commands.h"
+#include "buffered_file.h"

 #define SELF_ANNOUNCE_ROUNDS 5

@@ -477,6 +478,12 @@ void qemu_fflush(QEMUFile *f)
     }
 }

+void qemu_buffered_file_drain(QEMUFile *f)
+{
+    qemu_fflush(f);
+    qemu_buffered_file_drain_buffer(f->opaque);
+}
+
 static void qemu_fill_buffer(QEMUFile *f)
 {
     int len;
--
1.7.1.1
[PATCH v2 20/41] savevm/QEMUFileSocket: drop duplicated member fd
fd is already stored in QEMUFile so drop duplicated member
QEMUFileSocket::fd.

Signed-off-by: Isaku Yamahata <yamah...@valinux.co.jp>
---
 savevm.c |    4 +---
 1 files changed, 1 insertions(+), 3 deletions(-)

diff --git a/savevm.c b/savevm.c
index ec9f5d0..4b560b3 100644
--- a/savevm.c
+++ b/savevm.c
@@ -189,7 +189,6 @@ typedef struct QEMUFileStdio

 typedef struct QEMUFileSocket
 {
-    int fd;
     QEMUFile *file;
 } QEMUFileSocket;

@@ -199,7 +198,7 @@ static int socket_get_buffer(void *opaque, uint8_t *buf, int64_t pos, int size)
     ssize_t len;

     do {
-        len = qemu_recv(s->fd, buf, size, 0);
+        len = qemu_recv(s->file->fd, buf, size, 0);
     } while (len == -1 && socket_error() == EINTR);

     if (len == -1)
@@ -328,7 +327,6 @@ QEMUFile *qemu_fopen_socket(int fd)
 {
     QEMUFileSocket *s = g_malloc0(sizeof(QEMUFileSocket));

-    s->fd = fd;
     s->file = qemu_fopen_ops(s, NULL, socket_get_buffer, socket_close,
                              NULL, NULL, NULL);
     s->file->fd = fd;
--
1.7.1.1
[PATCH v2 24/41] migration: export migrate_fd_completed() and migrate_fd_cleanup()
This will be used by postcopy migration.

Signed-off-by: Isaku Yamahata <yamah...@valinux.co.jp>
---
 migration.c |    4 ++--
 migration.h |    2 ++
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/migration.c b/migration.c
index 753addb..48a8f68 100644
--- a/migration.c
+++ b/migration.c
@@ -159,7 +159,7 @@ MigrationInfo *qmp_query_migrate(Error **errp)

 /* shared migration helpers */

-static int migrate_fd_cleanup(MigrationState *s)
+int migrate_fd_cleanup(MigrationState *s)
 {
     int ret = 0;

@@ -187,7 +187,7 @@ void migrate_fd_error(MigrationState *s)
     migrate_fd_cleanup(s);
 }

-static void migrate_fd_completed(MigrationState *s)
+void migrate_fd_completed(MigrationState *s)
 {
     DPRINTF("setting completed state\n");
     if (migrate_fd_cleanup(s) < 0) {
diff --git a/migration.h b/migration.h
index 6cf4512..d0dd536 100644
--- a/migration.h
+++ b/migration.h
@@ -62,7 +62,9 @@ int fd_start_incoming_migration(const char *path);

 int fd_start_outgoing_migration(MigrationState *s, const char *fdname);

+int migrate_fd_cleanup(MigrationState *s);
 void migrate_fd_error(MigrationState *s);

+void migrate_fd_completed(MigrationState *s);
 void migrate_fd_connect(MigrationState *s);

--
1.7.1.1
[PATCH v2 15/41] savevm: export qemu_peek_buffer, qemu_peek_byte, qemu_file_skip
Those will be used by postcopy.

Signed-off-by: Isaku Yamahata <yamah...@valinux.co.jp>
---
 qemu-file.h |    3 +++
 savevm.c    |    6 +++---
 2 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/qemu-file.h b/qemu-file.h
index 31b83f6..a285bef 100644
--- a/qemu-file.h
+++ b/qemu-file.h
@@ -88,6 +88,9 @@ void qemu_put_be32(QEMUFile *f, unsigned int v);
 void qemu_put_be64(QEMUFile *f, uint64_t v);
 int qemu_get_buffer(QEMUFile *f, uint8_t *buf, int size);
 int qemu_get_byte(QEMUFile *f);
+int qemu_peek_byte(QEMUFile *f, int offset);
+int qemu_peek_buffer(QEMUFile *f, uint8_t *buf, int size, size_t offset);
+void qemu_file_skip(QEMUFile *f, int size);

 static inline unsigned int qemu_get_ubyte(QEMUFile *f)
 {
diff --git a/savevm.c b/savevm.c
index 2d18bab..8ad843f 100644
--- a/savevm.c
+++ b/savevm.c
@@ -588,14 +588,14 @@ void qemu_put_byte(QEMUFile *f, int v)
         qemu_fflush(f);
 }

-static void qemu_file_skip(QEMUFile *f, int size)
+void qemu_file_skip(QEMUFile *f, int size)
 {
     if (f->buf_index + size <= f->buf_size) {
         f->buf_index += size;
     }
 }

-static int qemu_peek_buffer(QEMUFile *f, uint8_t *buf, int size, size_t offset)
+int qemu_peek_buffer(QEMUFile *f, uint8_t *buf, int size, size_t offset)
 {
     int pending;
     int index;
@@ -643,7 +643,7 @@ int qemu_get_buffer(QEMUFile *f, uint8_t *buf, int size)
     return done;
 }

-static int qemu_peek_byte(QEMUFile *f, int offset)
+int qemu_peek_byte(QEMUFile *f, int offset)
 {
     int index = f->buf_index + offset;

--
1.7.1.1
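The peek/skip pair exported in patch 15/41 lets a reader inspect buffered bytes without consuming them and commit only once a whole record has arrived. The sketch below illustrates that idea only; it is not QEMU's implementation. struct peek_buf, peek_byte() and skip_bytes() are hypothetical names, there is no refill from a file, and returning -1 for an unavailable byte is an assumption made here for clarity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for QEMUFile's read buffer. */
struct peek_buf {
    const uint8_t *data;
    size_t index;   /* position of the next unread byte */
    size_t size;    /* number of valid bytes in data */
};

/*
 * Return the byte 'offset' positions ahead of the read position without
 * consuming it, or -1 if that byte is not in the buffer (an assumption
 * of this sketch, not QEMU's convention).
 */
static int peek_byte(const struct peek_buf *b, size_t offset)
{
    assert(b != NULL && b->index <= b->size);
    if (offset >= b->size - b->index) {
        return -1;
    }
    return b->data[b->index + offset];
}

/* Consume 'size' bytes, but only if that many are buffered. */
static void skip_bytes(struct peek_buf *b, size_t size)
{
    assert(b != NULL && b->index <= b->size);
    if (size <= b->size - b->index) {
        b->index += size;
    }
}
```

A caller would typically peek at a length byte, peek at the payload behind it, and call the skip function only after both succeeded, so that a partially received request is left in the buffer untouched.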
[PATCH v2 33/41] postcopy: introduce -postcopy and -postcopy-flags option
This patch prepares for postcopy live migration.
It introduces the -postcopy option and its internal flag, incoming_postcopy.
It also introduces -postcopy-flags for changing the behavior of incoming
postcopy, mainly for benchmarking and debugging.

Signed-off-by: Isaku Yamahata <yamah...@valinux.co.jp>
---
 migration.h     |    3 +++
 qemu-options.hx |   22 ++
 vl.c            |    8
 3 files changed, 33 insertions(+), 0 deletions(-)

diff --git a/migration.h b/migration.h
index 59e6e68..4bbcf06 100644
--- a/migration.h
+++ b/migration.h
@@ -103,4 +103,7 @@ void migrate_add_blocker(Error *reason);
  */
 void migrate_del_blocker(Error *reason);

+extern bool incoming_postcopy;
+extern unsigned long incoming_postcopy_flags;
+
 #endif
diff --git a/qemu-options.hx b/qemu-options.hx
index 8b66264..a9af31e 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -2616,6 +2616,28 @@ STEXI
 Prepare for incoming migration, listen on @var{port}.
 ETEXI

+DEF("postcopy", 0, QEMU_OPTION_postcopy,
+    "-postcopy postcopy incoming migration when -incoming is specified\n",
+    QEMU_ARCH_ALL)
+STEXI
+@item -postcopy
+@findex -postcopy
+start incoming migration in postcopy mode.
+ETEXI
+
+DEF("postcopy-flags", HAS_ARG, QEMU_OPTION_postcopy_flags,
+    "-postcopy-flags unsigned-int(flags)\n"
+    " flags for postcopy incoming migration\n"
+    " when -incoming and -postcopy are specified.\n"
+    " This is for benchmark/debug purpose (default: 0)\n",
+    QEMU_ARCH_ALL)
+STEXI
+@item -postcopy-flags int
+@findex -postcopy-flags
+Specify flags for incoming postcopy migration when -incoming and -postcopy are
+specified. This is for benchamrk/debug purpose. (default: 0)
+ETEXI
+
 DEF("nodefaults", 0, QEMU_OPTION_nodefaults, \
     "-nodefaults don't create default devices\n", QEMU_ARCH_ALL)
 STEXI
diff --git a/vl.c b/vl.c
index 62dc343..1674abb 100644
--- a/vl.c
+++ b/vl.c
@@ -189,6 +189,8 @@ int mem_prealloc = 0; /* force preallocation of physical target memory */
 int nb_nics;
 NICInfo nd_table[MAX_NICS];
 int autostart;
+bool incoming_postcopy = false; /* When -incoming is specified, postcopy mode */
+unsigned long incoming_postcopy_flags = 0; /* flags for postcopy incoming mode */
 static int rtc_utc = 1;
 static int rtc_date_offset = -1; /* -1 means no change */
 QEMUClock *rtc_clock;
@@ -3115,6 +3117,12 @@ int main(int argc, char **argv, char **envp)
                 incoming = optarg;
                 runstate_set(RUN_STATE_INMIGRATE);
                 break;
+            case QEMU_OPTION_postcopy:
+                incoming_postcopy = true;
+                break;
+            case QEMU_OPTION_postcopy_flags:
+                incoming_postcopy_flags = strtoul(optarg, NULL, 0);
+                break;
             case QEMU_OPTION_nodefaults:
                 default_serial = 0;
                 default_parallel = 0;
--
1.7.1.1
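Patch 33/41 parses -postcopy-flags with strtoul(optarg, NULL, 0), which silently accepts trailing garbage and an empty argument. Since the option is meant for benchmark/debug use that may well be acceptable; the following is only a sketch of what a stricter parse could look like. parse_flags() is a hypothetical helper, not something the patch defines.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/*
 * Hypothetical helper (not part of the patch): parse an unsigned long
 * flag word. Base 0 accepts decimal, 0x-prefixed hex and 0-prefixed
 * octal, like the strtoul() call in the patch. '*flags' is written
 * only on success.
 */
static bool parse_flags(const char *arg, unsigned long *flags)
{
    char *end;
    unsigned long val;

    assert(arg != NULL && flags != NULL);
    errno = 0;
    val = strtoul(arg, &end, 0);
    if (errno != 0 || end == arg || *end != '\0') {
        return false;   /* out of range, no digits, or trailing junk */
    }
    *flags = val;
    return true;
}
```

In the option switch this would replace the bare strtoul() call, with an error message and exit on failure; how strict to be for a debug-only option is a judgment call for the patch author.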
[PATCH v2 35/41] postcopy: introduce helper functions for postcopy
This patch introduces helper function for postcopy to access umem char device and to communicate between incoming-qemu and umemd. Signed-off-by: Isaku Yamahata yamah...@valinux.co.jp --- changes v1 - v2: - code simplification - make fault trigger more robust - introduce struct umem_pages --- umem.c | 364 umem.h | 101 ++ 2 files changed, 465 insertions(+), 0 deletions(-) create mode 100644 umem.c create mode 100644 umem.h diff --git a/umem.c b/umem.c new file mode 100644 index 000..64eaab5 --- /dev/null +++ b/umem.c @@ -0,0 +1,364 @@ +/* + * umem.c: user process backed memory module for postcopy livemigration + * + * Copyright (c) 2011 + * National Institute of Advanced Industrial Science and Technology + * + * https://sites.google.com/site/grivonhome/quick-kvm-migration + * Author: Isaku Yamahata yamahata at valinux co jp + * + * This program is free software; you can redistribute it and/or modify it + * under the terms and conditions of the GNU General Public License, + * version 2, as published by the Free Software Foundation. + * + * This program is distributed in the hope it will be useful, but WITHOUT + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for + * more details. + * + * You should have received a copy of the GNU General Public License along + * with this program; if not, see http://www.gnu.org/licenses/. 
+ */ + +#include sys/ioctl.h +#include sys/mman.h + +#include linux/umem.h + +#include bitops.h +#include sysemu.h +#include hw/hw.h +#include umem.h + +//#define DEBUG_UMEM +#ifdef DEBUG_UMEM +#include sys/syscall.h +#define DPRINTF(format, ...)\ +do {\ +printf(%d:%ld %s:%d format, getpid(), syscall(SYS_gettid),\ + __func__, __LINE__, ## __VA_ARGS__); \ +} while (0) +#else +#define DPRINTF(format, ...)do { } while (0) +#endif + +#define DEV_UMEM/dev/umem + +UMem *umem_new(void *hostp, size_t size) +{ +struct umem_init uinit = { +.size = size, +}; +UMem *umem; + +assert((size % getpagesize()) == 0); +umem = g_new(UMem, 1); +umem-fd = open(DEV_UMEM, O_RDWR); +if (umem-fd 0) { +perror(can't open DEV_UMEM); +abort(); +} + +if (ioctl(umem-fd, UMEM_INIT, uinit) 0) { +perror(UMEM_INIT); +abort(); +} +if (ftruncate(uinit.shmem_fd, uinit.size) 0) { +perror(truncate(\shmem_fd\)); +abort(); +} + +umem-nbits = 0; +umem-nsets = 0; +umem-faulted = NULL; +umem-page_shift = ffs(getpagesize()) - 1; +umem-shmem_fd = uinit.shmem_fd; +umem-size = uinit.size; +umem-umem = mmap(hostp, size, PROT_EXEC | PROT_READ | PROT_WRITE, + MAP_PRIVATE | MAP_FIXED, umem-fd, 0); +if (umem-umem == MAP_FAILED) { +perror(mmap(UMem) failed); +abort(); +} +return umem; +} + +void umem_destroy(UMem *umem) +{ +if (umem-fd != -1) { +close(umem-fd); +} +if (umem-shmem_fd != -1) { +close(umem-shmem_fd); +} +g_free(umem-faulted); +g_free(umem); +} + +void umem_get_page_request(UMem *umem, struct umem_pages *page_request) +{ +ssize_t ret = read(umem-fd, page_request-pgoffs, + page_request-nr * sizeof(page_request-pgoffs[0])); +if (ret 0) { +perror(daemon: umem read); +abort(); +} +page_request-nr = ret / sizeof(page_request-pgoffs[0]); +} + +void umem_mark_page_cached(UMem *umem, struct umem_pages *page_cached) +{ +const void *buf = page_cached-pgoffs; +ssize_t left = page_cached-nr * sizeof(page_cached-pgoffs[0]); + +while (left 0) { +ssize_t ret = write(umem-fd, buf, left); +if (ret == -1) { +if (errno == 
EINTR) +continue; + +perror(daemon: umem write); +abort(); +} + +left -= ret; +buf += ret; +} +} + +void umem_unmap(UMem *umem) +{ +munmap(umem-umem, umem-size); +umem-umem = NULL; +} + +void umem_close(UMem *umem) +{ +close(umem-fd); +umem-fd = -1; +} + +void *umem_map_shmem(UMem *umem) +{ +umem-nbits = umem-size umem-page_shift; +umem-nsets = 0; +umem-faulted = g_new0(unsigned long, BITS_TO_LONGS(umem-nbits)); + +umem-shmem = mmap(NULL, umem-size, PROT_READ | PROT_WRITE, MAP_SHARED, + umem-shmem_fd, 0); +if (umem-shmem == MAP_FAILED) { +perror(daemon: mmap(\shmem\)); +abort(); +} +return umem-shmem; +} + +void umem_unmap_shmem(UMem *umem) +{ +munmap(umem-shmem, umem-size); +umem-shmem = NULL; +} + +void umem_remove_shmem(UMem *umem, size_t offset, size_t size) +{ +int s = offset umem-page_shift
[PATCH v2 40/41] migrate: add -m (movebg) option to migrate command
Signed-off-by: Isaku Yamahata yamah...@valinux.co.jp --- hmp-commands.hx |5 +++-- hmp.c|3 ++- migration.c |8 +++- migration.h |1 + qapi-schema.json |2 +- qmp-commands.hx |2 +- savevm.c |1 + 7 files changed, 16 insertions(+), 6 deletions(-) diff --git a/hmp-commands.hx b/hmp-commands.hx index 38e5c95..1912cb8 100644 --- a/hmp-commands.hx +++ b/hmp-commands.hx @@ -798,15 +798,16 @@ ETEXI { .name = migrate, -.args_type = detach:-d,blk:-b,inc:-i,postcopy:-p,nobg:-n,uri:s, +.args_type = detach:-d,blk:-b,inc:-i,postcopy:-p,movebg:-m,nobg:-n,uri:s, forward:i?,backward:i?, -.params = [-d] [-b] [-i] [-p [-n] uri [forward] [backword], +.params = [-d] [-b] [-i] [-p [-n] [-m] uri [forward] [backword], .help = migrate to URI (using -d to not wait for completion) \n\t\t\t -b for migration without shared storage with full copy of disk\n\t\t\t -i for migration without shared storage with incremental copy of disk (base image shared between src and destination) \n\t\t\t-p for migration with postcopy mode enabled + \n\t\t\t-m for move background transfer of postcopy mode \n\t\t\t-n for no background transfer of postcopy mode \n\t\t\tforward: the number of pages to forward-prefault when postcopy (default 0) diff --git a/hmp.c b/hmp.c index 79a9c86..dd3f307 100644 --- a/hmp.c +++ b/hmp.c @@ -912,6 +912,7 @@ void hmp_migrate(Monitor *mon, const QDict *qdict) int blk = qdict_get_try_bool(qdict, blk, 0); int inc = qdict_get_try_bool(qdict, inc, 0); int postcopy = qdict_get_try_bool(qdict, postcopy, 0); +int movebg = qdict_get_try_bool(qdict, movebg, 0); int nobg = qdict_get_try_bool(qdict, nobg, 0); int forward = qdict_get_try_int(qdict, forward, 0); int backward = qdict_get_try_int(qdict, backward, 0); @@ -919,7 +920,7 @@ void hmp_migrate(Monitor *mon, const QDict *qdict) Error *err = NULL; qmp_migrate(uri, !!blk, blk, !!inc, inc, false, false, -!!postcopy, postcopy, !!nobg, nobg, +!!postcopy, postcopy, !!movebg, movebg, !!nobg, nobg, !!forward, forward, !!backward, backward, err); if 
(err) { diff --git a/migration.c b/migration.c index e026085..c5e6820 100644 --- a/migration.c +++ b/migration.c @@ -422,7 +422,9 @@ void migrate_del_blocker(Error *reason) void qmp_migrate(const char *uri, bool has_blk, bool blk, bool has_inc, bool inc, bool has_detach, bool detach, - bool has_postcopy, bool postcopy, bool has_nobg, bool nobg, + bool has_postcopy, bool postcopy, + bool has_movebg, bool movebg, + bool has_nobg, bool nobg, bool has_forward, int64_t forward, bool has_backward, int64_t backward, Error **errp) @@ -432,6 +434,7 @@ void qmp_migrate(const char *uri, bool has_blk, bool blk, .blk = false, .shared = false, .postcopy = false, +.movebg = false, .nobg = false, .prefault_forward = 0, .prefault_backward = 0, @@ -448,6 +451,9 @@ void qmp_migrate(const char *uri, bool has_blk, bool blk, if (has_postcopy) { params.postcopy = postcopy; } +if (has_movebg) { +params.movebg = movebg; +} if (has_nobg) { params.nobg = nobg; } diff --git a/migration.h b/migration.h index 9a9b9c6..1e98b20 100644 --- a/migration.h +++ b/migration.h @@ -23,6 +23,7 @@ struct MigrationParams { int blk; int shared; int postcopy; +int movebg; int nobg; int64_t prefault_forward; int64_t prefault_backward; diff --git a/qapi-schema.json b/qapi-schema.json index 83c2170..ef2f48e 100644 --- a/qapi-schema.json +++ b/qapi-schema.json @@ -1718,7 +1718,7 @@ ## { 'command': 'migrate', 'data': {'uri': 'str', '*blk': 'bool', '*inc': 'bool', '*detach': 'bool' , - '*postcopy': 'bool', '*nobg': 'bool', + '*postcopy': 'bool', '*movebg': 'bool', '*nobg': 'bool', '*forward': 'int', '*backward': 'int'} } # @xen-save-devices-state: diff --git a/qmp-commands.hx b/qmp-commands.hx index 7b5e5b7..5c9ecc8 100644 --- a/qmp-commands.hx +++ b/qmp-commands.hx @@ -469,7 +469,7 @@ EQMP { .name = migrate, -.args_type = detach:-d,blk:-b,inc:-i,postcopy:-p,nobg:-n,uri:s, +.args_type = detach:-d,blk:-b,inc:-i,postcopy:-p,movebg:-m,nobg:-n,uri:s, .mhandler.cmd_new = qmp_marshal_input_migrate, }, diff --git 
a/savevm.c b/savevm.c index 48b636d..19bb8f1 100644 --- a/savevm.c +++ b/savevm.c @@ -1781,6 +1781,7 @@ static int qemu_savevm_state
[PATCH v2 37/41] postcopy: implement outgoing part of postcopy live migration
This patch implements postcopy live migration for outgoing part Signed-off-by: Isaku Yamahata yamah...@valinux.co.jp --- Changes v1 - v2: - fix parameter to qemu_fdopen() - handle QEMU_UMEM_REQ_EOC properly when PO_STATE_ALL_PAGES_SENT, QEMU_UMEM_REQ_EOC request was ignored. handle properly it. - flush on-demand page unconditionally - improve postcopy_outgoing_ram_save_live and postcopy_outgoing_begin() - use qemu_fopen_fd - use memory api instead of obsolete api - segv in postcopy_outgoing_check_all_ram_sent() - catch up qapi change --- arch_init.c | 19 ++- migration-exec.c |4 + migration-fd.c| 17 ++ migration-postcopy-stub.c | 22 +++ migration-postcopy.c | 450 + migration-tcp.c | 25 ++- migration-unix.c | 26 ++- migration.c | 32 +++- migration.h | 12 ++ savevm.c | 22 ++- sysemu.h |2 +- 11 files changed, 614 insertions(+), 17 deletions(-) diff --git a/arch_init.c b/arch_init.c index 22d9691..3599e5c 100644 --- a/arch_init.c +++ b/arch_init.c @@ -154,6 +154,13 @@ static int is_dup_page(uint8_t *page) return 1; } +static bool outgoing_postcopy = false; + +void ram_save_set_params(const MigrationParams *params, void *opaque) +{ +outgoing_postcopy = params-postcopy; +} + static RAMBlock *last_block_sent = NULL; static uint64_t bytes_transferred; @@ -343,6 +350,15 @@ int ram_save_live(QEMUFile *f, int stage, void *opaque) uint64_t expected_time = 0; int ret; +if (stage == 1) { +bytes_transferred = 0; +last_block_sent = NULL; +ram_save_set_last_block(NULL, 0); +} +if (outgoing_postcopy) { +return postcopy_outgoing_ram_save_live(f, stage, opaque); +} + if (stage 0) { memory_global_dirty_log_stop(); return 0; @@ -351,9 +367,6 @@ int ram_save_live(QEMUFile *f, int stage, void *opaque) memory_global_sync_dirty_bitmap(get_system_memory()); if (stage == 1) { -bytes_transferred = 0; -last_block_sent = NULL; -ram_save_set_last_block(NULL, 0); sort_ram_list(); /* Make sure all dirty bits are set */ diff --git a/migration-exec.c b/migration-exec.c index 7f08b3b..a90da5c 100644 
--- a/migration-exec.c +++ b/migration-exec.c @@ -64,6 +64,10 @@ int exec_start_outgoing_migration(MigrationState *s, const char *command) { FILE *f; +if (s-params.postcopy) { +return -ENOSYS; +} + f = popen(command, w); if (f == NULL) { DPRINTF(Unable to popen exec target\n); diff --git a/migration-fd.c b/migration-fd.c index 42b8162..83b5f18 100644 --- a/migration-fd.c +++ b/migration-fd.c @@ -90,6 +90,23 @@ int fd_start_outgoing_migration(MigrationState *s, const char *fdname) s-write = fd_write; s-close = fd_close; +if (s-params.postcopy) { +int flags = fcntl(s-fd, F_GETFL); +if ((flags O_ACCMODE) != O_RDWR) { +goto err_after_open; +} + +s-fd_read = dup(s-fd); +if (s-fd_read == -1) { +goto err_after_open; +} +s-file_read = qemu_fopen_fd(s-fd_read); +if (s-file_read == NULL) { +close(s-fd_read); +goto err_after_open; +} +} + migrate_fd_connect(s); return 0; diff --git a/migration-postcopy-stub.c b/migration-postcopy-stub.c index f9ebcbe..9c64827 100644 --- a/migration-postcopy-stub.c +++ b/migration-postcopy-stub.c @@ -24,6 +24,28 @@ #include sysemu.h #include migration.h +int postcopy_outgoing_create_read_socket(MigrationState *s) +{ +return -ENOSYS; +} + +int postcopy_outgoing_ram_save_live(Monitor *mon, +QEMUFile *f, int stage, void *opaque) +{ +return -ENOSYS; +} + +void *postcopy_outgoing_begin(MigrationState *ms) +{ +return NULL; +} + +int postcopy_outgoing_ram_save_background(Monitor *mon, QEMUFile *f, + void *postcopy) +{ +return -ENOSYS; +} + int postcopy_incoming_init(const char *incoming, bool incoming_postcopy) { return -ENOSYS; diff --git a/migration-postcopy.c b/migration-postcopy.c index 5913e05..eb37094 100644 --- a/migration-postcopy.c +++ b/migration-postcopy.c @@ -177,6 +177,456 @@ static void postcopy_incoming_send_req(QEMUFile *f, } } +static int postcopy_outgoing_recv_req_idstr(QEMUFile *f, +struct qemu_umem_req *req, +size_t *offset) +{ +int ret; + +req-len = qemu_peek_byte(f, *offset); +*offset += 1; +if (req-len == 0) { +return -EAGAIN; 
+} +req-idstr = g_malloc((int)req-len + 1); +ret = qemu_peek_buffer(f, (uint8_t*)req-idstr, req-len, *offset); +*offset += ret; +if (ret != req-len) { +g_free(req-idstr); +req
[PATCH v2 38/41] postcopy/outgoing: add forward, backward option to specify the size of prefault
Signed-off-by: Isaku Yamahata yamah...@valinux.co.jp --- hmp-commands.hx | 15 ++- hmp.c|3 +++ migration.c | 20 migration.h |2 ++ qapi-schema.json |3 ++- 5 files changed, 37 insertions(+), 6 deletions(-) diff --git a/hmp-commands.hx b/hmp-commands.hx index 3c647f7..38e5c95 100644 --- a/hmp-commands.hx +++ b/hmp-commands.hx @@ -798,26 +798,31 @@ ETEXI { .name = migrate, -.args_type = detach:-d,blk:-b,inc:-i,postcopy:-p,nobg:-n,uri:s, -.params = [-d] [-b] [-i] [-p [-n]] uri, +.args_type = detach:-d,blk:-b,inc:-i,postcopy:-p,nobg:-n,uri:s, + forward:i?,backward:i?, +.params = [-d] [-b] [-i] [-p [-n] uri [forward] [backword], .help = migrate to URI (using -d to not wait for completion) \n\t\t\t -b for migration without shared storage with full copy of disk\n\t\t\t -i for migration without shared storage with incremental copy of disk (base image shared between src and destination) \n\t\t\t-p for migration with postcopy mode enabled - \n\t\t\t-n for no background transfer of postcopy mode, + \n\t\t\t-n for no background transfer of postcopy mode + \n\t\t\tforward: the number of pages to + forward-prefault when postcopy (default 0) + \n\t\t\tbackward: the number of pages to + backward-prefault when postcopy (default 0), .mhandler.cmd = hmp_migrate, }, STEXI -@item migrate [-d] [-b] [-i] [-p [-n]] @var{uri} +@item migrate [-d] [-b] [-i] [-p [-n]] @var{uri} @var{forward} @var{backward} @findex migrate Migrate to @var{uri} (using -d to not wait for completion). 
-b for migration with full copy of disk -i for migration with incremental copy of disk (base image is shared) - -p for migration with postcopy mode enabled + -p for migration with postcopy mode enabled (forward/backward is prefault size when postcopy) -n for migration with postcopy mode enabled without background transfer ETEXI diff --git a/hmp.c b/hmp.c index d546a52..79a9c86 100644 --- a/hmp.c +++ b/hmp.c @@ -913,11 +913,14 @@ void hmp_migrate(Monitor *mon, const QDict *qdict) int inc = qdict_get_try_bool(qdict, inc, 0); int postcopy = qdict_get_try_bool(qdict, postcopy, 0); int nobg = qdict_get_try_bool(qdict, nobg, 0); +int forward = qdict_get_try_int(qdict, forward, 0); +int backward = qdict_get_try_int(qdict, backward, 0); const char *uri = qdict_get_str(qdict, uri); Error *err = NULL; qmp_migrate(uri, !!blk, blk, !!inc, inc, false, false, !!postcopy, postcopy, !!nobg, nobg, +!!forward, forward, !!backward, backward, err); if (err) { monitor_printf(mon, migrate: %s\n, error_get_pretty(err)); diff --git a/migration.c b/migration.c index e8be0d1..e026085 100644 --- a/migration.c +++ b/migration.c @@ -423,6 +423,8 @@ void migrate_del_blocker(Error *reason) void qmp_migrate(const char *uri, bool has_blk, bool blk, bool has_inc, bool inc, bool has_detach, bool detach, bool has_postcopy, bool postcopy, bool has_nobg, bool nobg, + bool has_forward, int64_t forward, + bool has_backward, int64_t backward, Error **errp) { MigrationState *s = migrate_get_current(); @@ -431,6 +433,8 @@ void qmp_migrate(const char *uri, bool has_blk, bool blk, .shared = false, .postcopy = false, .nobg = false, +.prefault_forward = 0, +.prefault_backward = 0, }; const char *p; int ret; @@ -447,6 +451,22 @@ void qmp_migrate(const char *uri, bool has_blk, bool blk, if (has_nobg) { params.nobg = nobg; } +if (has_forward) { +if (forward 0) { +error_set(errp, QERR_INVALID_PARAMETER_VALUE, + forward, forward = 0); +return; +} +params.prefault_forward = forward; +} +if (has_backward) { +if 
(backward 0) { +error_set(errp, QERR_INVALID_PARAMETER_VALUE, + backward, backward = 0); +return; +} +params.prefault_backward = backward; +} if (s-state == MIG_STATE_ACTIVE) { error_set(errp, QERR_MIGRATION_ACTIVE); diff --git a/migration.h b/migration.h index 90f3bdf..9a9b9c6 100644 --- a/migration.h +++ b/migration.h @@ -24,6 +24,8 @@ struct MigrationParams { int shared; int postcopy; int nobg; +int64_t prefault_forward; +int64_t prefault_backward; }; typedef struct MigrationState MigrationState; diff --git a/qapi-schema.json b/qapi-schema.json index 5861fb9..83c2170 100644
[PATCH v2 32/41] savevm: add new section that is used by postcopy
This is used by postcopy to tell the total length of QEMU_VM_SECTION_FULL
and QEMU_VM_SUBSECTION from outgoing to incoming.

Signed-off-by: Isaku Yamahata <yamah...@valinux.co.jp>
---
 savevm.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/savevm.c b/savevm.c
index 318ec61..3adabad 100644
--- a/savevm.c
+++ b/savevm.c
@@ -1597,6 +1597,7 @@ static void vmstate_save(QEMUFile *f, SaveStateEntry *se)
 #define QEMU_VM_SECTION_END 0x03
 #define QEMU_VM_SECTION_FULL 0x04
 #define QEMU_VM_SUBSECTION 0x05
+#define QEMU_VM_POSTCOPY 0x10

 bool qemu_savevm_state_blocked(Error **errp)
 {
--
1.7.1.1
[PATCH v2 39/41] postcopy/outgoing: implement prefault
When a page is requested, the surrounding pages are also sent.

Signed-off-by: Isaku Yamahata <yamah...@valinux.co.jp>
---
 migration-postcopy.c |   56 +
 1 files changed, 51 insertions(+), 5 deletions(-)

diff --git a/migration-postcopy.c b/migration-postcopy.c
index eb37094..6165657 100644
--- a/migration-postcopy.c
+++ b/migration-postcopy.c
@@ -353,6 +353,36 @@ int postcopy_outgoing_ram_save_live(QEMUFile *f, int stage, void *opaque)
     return ret;
 }

+static void postcopy_outgoing_ram_save_page(PostcopyOutgoingState *s,
+                                            uint64_t pgoffset, bool *written,
+                                            bool forward,
+                                            int prefault_pgoffset)
+{
+    ram_addr_t offset;
+    int ret;
+
+    if (forward) {
+        pgoffset += prefault_pgoffset;
+    } else {
+        if (pgoffset < prefault_pgoffset) {
+            return;
+        }
+        pgoffset -= prefault_pgoffset;
+    }
+
+    offset = pgoffset << TARGET_PAGE_BITS;
+    if (offset >= s->last_block_read->length) {
+        assert(forward);
+        assert(prefault_pgoffset > 0);
+        return;
+    }
+
+    ret = ram_save_page(s->mig_buffered_write, s->last_block_read, offset);
+    if (ret > 0) {
+        *written = true;
+    }
+}
+
 /*
  * return value
  *   0: continue postcopy mode
@@ -364,6 +394,7 @@ static int postcopy_outgoing_handle_req(PostcopyOutgoingState *s,
                                         bool *written)
 {
     int i;
+    uint64_t j;
     RAMBlock *block;

     DPRINTF("cmd %d state %d\n", req->cmd, s->state);
@@ -398,11 +429,26 @@ static int postcopy_outgoing_handle_req(PostcopyOutgoingState *s,
             break;
         }
         for (i = 0; i < req->nr; i++) {
-            DPRINTF("offs[%d] 0x%"PRIx64"\n", i, req->pgoffs[i]);
-            int ret = ram_save_page(s->mig_buffered_write, s->last_block_read,
-                                    req->pgoffs[i] << TARGET_PAGE_BITS);
-            if (ret > 0) {
-                *written = true;
+            DPRINTF("pgoffs[%d] 0x%"PRIx64"\n", i, req->pgoffs[i]);
+            postcopy_outgoing_ram_save_page(s, req->pgoffs[i], written,
+                                            true, 0);
+        }
+        /* forward prefault */
+        for (j = 1; j <= s->ms->params.prefault_forward; j++) {
+            for (i = 0; i < req->nr; i++) {
+                DPRINTF("pgoffs[%d] + 0x%"PRIx64" 0x%"PRIx64"\n",
+                        i, j, req->pgoffs[i] + j);
+                postcopy_outgoing_ram_save_page(s, req->pgoffs[i], written,
+                                                true, j);
+            }
+        }
+        /* backward prefault */
+        for (j = 1; j <= s->ms->params.prefault_backward; j++) {
+            for (i = 0; i < req->nr; i++) {
+                DPRINTF("pgoffs[%d] - 0x%"PRIx64" 0x%"PRIx64"\n",
+                        i, j, req->pgoffs[i] - j);
+                postcopy_outgoing_ram_save_page(s, req->pgoffs[i], written,
+                                                false, j);
             }
         }
         break;
-- 
1.7.1.1
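The bounds handling in the prefault patch can be illustrated in isolation. The following is only a sketch under assumptions: prefault_target() and prefault_plan() are hypothetical names, the block is described by a page count rather than a RAMBlock, and only a single requested page is handled. It lists the page offsets that would be sent, in the same order as the patch (requested page, then forward, then backward), skipping offsets that fall outside the block.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of the patch): compute the page offset for
 * one prefault step.  Returns false when the step has to be skipped because
 * it would underflow, overflow, or land past the end of the block.
 */
static bool prefault_target(uint64_t pgoffset, uint64_t distance, bool forward,
                            uint64_t block_pages, uint64_t *target)
{
    assert(target != NULL);
    if (forward) {
        if (distance > UINT64_MAX - pgoffset) {
            return false;               /* addition would wrap around */
        }
        pgoffset += distance;
    } else {
        if (pgoffset < distance) {
            return false;               /* would go below page 0 */
        }
        pgoffset -= distance;
    }
    if (pgoffset >= block_pages) {
        return false;                   /* past the end of the block */
    }
    *target = pgoffset;
    return true;
}

/*
 * Fill out[] with the offsets to send for one requested page:
 * the page itself, then nforward pages after it, then nbackward before it.
 * Returns the number of entries written (at most out_len).
 */
static size_t prefault_plan(uint64_t pgoffset, unsigned int nforward,
                            unsigned int nbackward, uint64_t block_pages,
                            uint64_t *out, size_t out_len)
{
    size_t n = 0;
    uint64_t j, target;

    assert(out != NULL || out_len == 0);
    if (n < out_len &&
        prefault_target(pgoffset, 0, true, block_pages, &target)) {
        out[n++] = target;
    }
    for (j = 1; j <= nforward && n < out_len; j++) {
        if (prefault_target(pgoffset, j, true, block_pages, &target)) {
            out[n++] = target;
        }
    }
    for (j = 1; j <= nbackward && n < out_len; j++) {
        if (prefault_target(pgoffset, j, false, block_pages, &target)) {
            out[n++] = target;
        }
    }
    return n;
}
```

For a 4-page block, a request for page 1 with two forward and two backward prefaults would yield pages 1, 2, 3 and 0; the second backward step is dropped because it would go below page 0.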
[PATCH v2 34/41] postcopy outgoing: add -p and -n option to migrate command
Added -p option to migrate command for postcopy mode and introduce postcopy parameter for migration to indicate that postcopy mode is enabled. Add -n option for postcopy migration which indicates disabling background transfer. Signed-off-by: Isaku Yamahata yamah...@valinux.co.jp --- Chnages v1 - v2: - catch up for qapi change --- hmp-commands.hx | 12 hmp.c|6 +- migration.c |9 + migration.h |2 ++ qapi-schema.json |3 ++- qmp-commands.hx |4 +++- savevm.c |2 ++ 7 files changed, 31 insertions(+), 7 deletions(-) diff --git a/hmp-commands.hx b/hmp-commands.hx index 18cb415..3c647f7 100644 --- a/hmp-commands.hx +++ b/hmp-commands.hx @@ -798,23 +798,27 @@ ETEXI { .name = migrate, -.args_type = detach:-d,blk:-b,inc:-i,uri:s, -.params = [-d] [-b] [-i] uri, +.args_type = detach:-d,blk:-b,inc:-i,postcopy:-p,nobg:-n,uri:s, +.params = [-d] [-b] [-i] [-p [-n]] uri, .help = migrate to URI (using -d to not wait for completion) \n\t\t\t -b for migration without shared storage with full copy of disk\n\t\t\t -i for migration without shared storage with incremental copy of disk - (base image shared between src and destination), + (base image shared between src and destination) + \n\t\t\t-p for migration with postcopy mode enabled + \n\t\t\t-n for no background transfer of postcopy mode, .mhandler.cmd = hmp_migrate, }, STEXI -@item migrate [-d] [-b] [-i] @var{uri} +@item migrate [-d] [-b] [-i] [-p [-n]] @var{uri} @findex migrate Migrate to @var{uri} (using -d to not wait for completion). 
-b for migration with full copy of disk -i for migration with incremental copy of disk (base image is shared) + -p for migration with postcopy mode enabled + -n for migration with postcopy mode enabled without background transfer ETEXI { diff --git a/hmp.c b/hmp.c index bb0952e..d546a52 100644 --- a/hmp.c +++ b/hmp.c @@ -911,10 +911,14 @@ void hmp_migrate(Monitor *mon, const QDict *qdict) int detach = qdict_get_try_bool(qdict, detach, 0); int blk = qdict_get_try_bool(qdict, blk, 0); int inc = qdict_get_try_bool(qdict, inc, 0); +int postcopy = qdict_get_try_bool(qdict, postcopy, 0); +int nobg = qdict_get_try_bool(qdict, nobg, 0); const char *uri = qdict_get_str(qdict, uri); Error *err = NULL; -qmp_migrate(uri, !!blk, blk, !!inc, inc, false, false, err); +qmp_migrate(uri, !!blk, blk, !!inc, inc, false, false, +!!postcopy, postcopy, !!nobg, nobg, +err); if (err) { monitor_printf(mon, migrate: %s\n, error_get_pretty(err)); error_free(err); diff --git a/migration.c b/migration.c index 3b97aec..7ad62ef 100644 --- a/migration.c +++ b/migration.c @@ -388,12 +388,15 @@ void migrate_del_blocker(Error *reason) void qmp_migrate(const char *uri, bool has_blk, bool blk, bool has_inc, bool inc, bool has_detach, bool detach, + bool has_postcopy, bool postcopy, bool has_nobg, bool nobg, Error **errp) { MigrationState *s = migrate_get_current(); MigrationParams params = { .blk = false, .shared = false, +.postcopy = false, +.nobg = false, }; const char *p; int ret; @@ -404,6 +407,12 @@ void qmp_migrate(const char *uri, bool has_blk, bool blk, if (has_inc) { params.shared = inc; } +if (has_postcopy) { +params.postcopy = postcopy; +} +if (has_nobg) { +params.nobg = nobg; +} if (s-state == MIG_STATE_ACTIVE) { error_set(errp, QERR_MIGRATION_ACTIVE); diff --git a/migration.h b/migration.h index 4bbcf06..091b446 100644 --- a/migration.h +++ b/migration.h @@ -22,6 +22,8 @@ struct MigrationParams { int blk; int shared; +int postcopy; +int nobg; }; typedef struct MigrationState 
MigrationState; diff --git a/qapi-schema.json b/qapi-schema.json index 2ca7195..5861fb9 100644 --- a/qapi-schema.json +++ b/qapi-schema.json @@ -1717,7 +1717,8 @@ # Since: 0.14.0 ## { 'command': 'migrate', - 'data': {'uri': 'str', '*blk': 'bool', '*inc': 'bool', '*detach': 'bool' } } + 'data': {'uri': 'str', '*blk': 'bool', '*inc': 'bool', '*detach': 'bool' , + '*postcopy': 'bool', '*nobg': 'bool'} } # @xen-save-devices-state: # diff --git a/qmp-commands.hx b/qmp-commands.hx index db980fa..7b5e5b7 100644 --- a/qmp-commands.hx +++ b/qmp-commands.hx @@ -469,7 +469,7 @@ EQMP { .name = migrate, -.args_type = detach:-d,blk:-b,inc:-i,uri:s, +.args_type = detach:-d,blk:-b,inc:-i,postcopy:-p,nobg:-n,uri:s, .mhandler.cmd_new = qmp_marshal_input_migrate, }, @@ -483,6
[PATCH v2 41/41] migration/postcopy: add movebg mode
When movebg mode is enabled, the point from which background pages are sent is moved to the page following the on-demand page.

Signed-off-by: Isaku Yamahata <yamah...@valinux.co.jp>
---
 migration-postcopy.c |    8 ++++++++
 1 files changed, 8 insertions(+), 0 deletions(-)

diff --git a/migration-postcopy.c b/migration-postcopy.c
index 6165657..3df88d7 100644
--- a/migration-postcopy.c
+++ b/migration-postcopy.c
@@ -442,6 +442,14 @@ static int postcopy_outgoing_handle_req(PostcopyOutgoingState *s,
                                                 true, j);
             }
         }
+        if (s->ms->params.movebg) {
+            ram_addr_t last_offset =
+                (req->pgoffs[req->nr - 1] + s->ms->params.prefault_forward) <<
+                TARGET_PAGE_BITS;
+            last_offset = MIN(last_offset,
+                              s->last_block_read->length - TARGET_PAGE_SIZE);
+            ram_save_set_last_block(s->last_block_read, last_offset);
+        }
         /* backward prefault */
         for (j = 1; j <= s->ms->params.prefault_backward; j++) {
             for (i = 0; i < req->nr; i++) {
-- 
1.7.1.1
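As a rough illustration of the offset computation in the movebg patch, here is a sketch under assumptions: movebg_last_offset() is a hypothetical name, a 4 KiB page size is assumed in place of TARGET_PAGE_BITS, and the block length is assumed to be a multiple of the page size. It computes the byte offset of the last forward-prefaulted page and clamps it to the last page of the block.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed page geometry for this sketch; QEMU uses TARGET_PAGE_BITS. */
#define SKETCH_PAGE_BITS 12
#define SKETCH_PAGE_SIZE ((uint64_t)1 << SKETCH_PAGE_BITS)

/*
 * Hypothetical helper: byte offset from which background transfer would
 * resume, clamped so that it never points past the last page of the block.
 */
static uint64_t movebg_last_offset(uint64_t last_requested_pgoff,
                                   uint64_t prefault_forward,
                                   uint64_t block_length)
{
    uint64_t last_page;

    assert(block_length >= SKETCH_PAGE_SIZE);
    last_page = (block_length >> SKETCH_PAGE_BITS) - 1;

    /* Clamp before adding, so the sum can never wrap around. */
    if (prefault_forward > last_page ||
        last_requested_pgoff > last_page - prefault_forward) {
        return last_page << SKETCH_PAGE_BITS;
    }
    return (last_requested_pgoff + prefault_forward) << SKETCH_PAGE_BITS;
}
```

The clamp is done in page units before the shift, which is one way to keep the arithmetic free of overflow; the patch itself applies MIN() to the byte offsets.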
[PATCH v2 36/41] postcopy: implement incoming part of postcopy live migration
This patch implements postcopy live migration for incoming part Signed-off-by: Isaku Yamahata yamah...@valinux.co.jp --- Changes v3 - v4: - fork umemd early to address qemu devices touching guest ram via post/pre_load - code clean up on initialization - Makefile.target migration-postcopy.c is target dependent due to TARGET_PAGE_xxx So it can't be shared between target architecture. - use qemu_fopen_fd - introduce incoming_flags_use_umem_make_present flag - use MADV_DONTNEED Changes v2 - v3: - make incoming socket nonblocking - several clean ups - Dropped QEMUFilePipe - Moved QEMUFileNonblock to buffered_file - Split out into umem/incoming/outgoing Changes v1 - v2: - make mig_read nonblocking when socket - updates for umem device changes --- Makefile.target|5 + cpu-all.h |7 + exec.c | 20 +- migration-exec.c |4 + migration-fd.c |6 + .../linux/umem.h = migration-postcopy-stub.c | 47 +- migration-postcopy.c | 1267 migration.c|4 + migration.h| 13 + qemu-common.h |1 + qemu-options.hx|5 +- savevm.c | 43 + vl.c |8 +- 13 files changed, 1409 insertions(+), 21 deletions(-) copy linux-headers/linux/umem.h = migration-postcopy-stub.c (55%) create mode 100644 migration-postcopy.c diff --git a/Makefile.target b/Makefile.target index 1582904..618bd3e 100644 --- a/Makefile.target +++ b/Makefile.target @@ -4,6 +4,7 @@ GENERATED_HEADERS = config-target.h CONFIG_NO_PCI = $(if $(subst n,,$(CONFIG_PCI)),n,y) CONFIG_NO_KVM = $(if $(subst n,,$(CONFIG_KVM)),n,y) CONFIG_NO_XEN = $(if $(subst n,,$(CONFIG_XEN)),n,y) +CONFIG_NO_POSTCOPY = $(if $(subst n,,$(CONFIG_POSTCOPY)),n,y) include ../config-host.mak include config-devices.mak @@ -196,6 +197,10 @@ LIBS+=-lz obj-i386-$(CONFIG_KVM) += hyperv.o +obj-$(CONFIG_POSTCOPY) += migration-postcopy.o +obj-$(CONFIG_NO_POSTCOPY) += migration-postcopy-stub.o +common-obj-$(CONFIG_POSTCOPY) += umem.o + QEMU_CFLAGS += $(VNC_TLS_CFLAGS) QEMU_CFLAGS += $(VNC_SASL_CFLAGS) QEMU_CFLAGS += $(VNC_JPEG_CFLAGS) diff --git a/cpu-all.h b/cpu-all.h index 
ff7f827..e0956bc 100644 --- a/cpu-all.h +++ b/cpu-all.h @@ -486,6 +486,9 @@ extern ram_addr_t ram_size; /* RAM is pre-allocated and passed into qemu_ram_alloc_from_ptr */ #define RAM_PREALLOC_MASK (1 0) +/* RAM is allocated via umem for postcopy incoming mode */ +#define RAM_POSTCOPY_UMEM_MASK (1 1) + typedef struct RAMBlock { struct MemoryRegion *mr; uint8_t *host; @@ -497,6 +500,10 @@ typedef struct RAMBlock { #if defined(__linux__) !defined(TARGET_S390X) int fd; #endif + +#ifdef CONFIG_POSTCOPY +UMem *umem;/* for incoming postcopy mode */ +#endif } RAMBlock; typedef struct RAMList { diff --git a/exec.c b/exec.c index 785..e5ff2ed 100644 --- a/exec.c +++ b/exec.c @@ -36,6 +36,7 @@ #include arch_init.h #include memory.h #include exec-memory.h +#include migration.h #if defined(CONFIG_USER_ONLY) #include qemu.h #if defined(__FreeBSD__) || defined(__FreeBSD_kernel__) @@ -2632,6 +2633,13 @@ ram_addr_t qemu_ram_alloc_from_ptr(ram_addr_t size, void *host, new_block-host = host; new_block-flags |= RAM_PREALLOC_MASK; } else { +#ifdef CONFIG_POSTCOPY +if (incoming_postcopy) { +ram_addr_t page_size = getpagesize(); +size = (size + page_size - 1) ~(page_size - 1); +mem_path = NULL; +} +#endif if (mem_path) { #if defined (__linux__) !defined(TARGET_S390X) new_block-host = file_ram_alloc(new_block, size, mem_path); @@ -2709,7 +2717,13 @@ void qemu_ram_free(ram_addr_t addr) QLIST_REMOVE(block, next); if (block-flags RAM_PREALLOC_MASK) { ; -} else if (mem_path) { +} +#ifdef CONFIG_POSTCOPY +else if (block-flags RAM_POSTCOPY_UMEM_MASK) { +postcopy_incoming_ram_free(block-umem); +} +#endif +else if (mem_path) { #if defined (__linux__) !defined(TARGET_S390X) if (block-fd) { munmap(block-host, block-length); @@ -2755,6 +2769,10 @@ void qemu_ram_remap(ram_addr_t addr, ram_addr_t length) } else { flags = MAP_FIXED; munmap(vaddr, length); +if (block-flags RAM_POSTCOPY_UMEM_MASK) { +postcopy_incoming_qemu_pages_unmapped(addr, length); +block-flags = ~RAM_POSTCOPY_UMEM_MASK
[PATCH v2 10/41] arch_init: simplify a bit by ram_find_block()
Signed-off-by: Isaku Yamahata <yamah...@valinux.co.jp>
---
 arch_init.c |   21 -
 exec.c      |   12 ++--
 2 files changed, 14 insertions(+), 19 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index 9981abe..73bf250 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -432,11 +432,10 @@ void *ram_load_host_from_stream_offset(QEMUFile *f,
     qemu_get_buffer(f, (uint8_t *)id, len);
     id[len] = 0;

-    QLIST_FOREACH(block, &ram_list.blocks, next) {
-        if (!strncmp(id, block->idstr, sizeof(id))) {
-            *last_blockp = block;
-            return memory_region_get_ram_ptr(block->mr) + offset;
-        }
+    block = ram_find_block(id, len);
+    if (block) {
+        *last_blockp = block;
+        return memory_region_get_ram_ptr(block->mr) + offset;
     }

     fprintf(stderr, "Can't find block %s!\n", id);
@@ -466,19 +465,15 @@ int ram_load_mem_size(QEMUFile *f, ram_addr_t total_ram_bytes)
         id[len] = 0;
         length = qemu_get_be64(f);

-        QLIST_FOREACH(block, &ram_list.blocks, next) {
-            if (!strncmp(id, block->idstr, sizeof(id))) {
-                if (block->length != length)
-                    return -EINVAL;
-                break;
-            }
-        }
-
+        block = ram_find_block(id, len);
         if (!block) {
             fprintf(stderr, "Unknown ramblock \"%s\", cannot "
                     "accept migration\n", id);
             return -EINVAL;
         }
+        if (block->length != length) {
+            return -EINVAL;
+        }

         total_ram_bytes -= length;
     }
diff --git a/exec.c b/exec.c
index a0494c7..078a408 100644
--- a/exec.c
+++ b/exec.c
@@ -33,6 +33,7 @@
 #include "kvm.h"
 #include "hw/xen.h"
 #include "qemu-timer.h"
+#include "arch_init.h"
 #include "memory.h"
 #include "exec-memory.h"
 #if defined(CONFIG_USER_ONLY)
@@ -2609,12 +2610,11 @@ void qemu_ram_set_idstr(ram_addr_t addr, const char *name, DeviceState *dev)
     }
     pstrcat(new_block->idstr, sizeof(new_block->idstr), name);

-    QLIST_FOREACH(block, &ram_list.blocks, next) {
-        if (block != new_block && !strcmp(block->idstr, new_block->idstr)) {
-            fprintf(stderr, "RAMBlock \"%s\" already registered, abort!\n",
-                    new_block->idstr);
-            abort();
-        }
+    block = ram_find_block(new_block->idstr, strlen(new_block->idstr));
+    if (block != new_block) {
+        fprintf(stderr, "RAMBlock \"%s\" already registered, abort!\n",
+                new_block->idstr);
+        abort();
     }
 }

-- 
1.7.1.1
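The definition of ram_find_block() is not part of this message, so the following is only a guess at the shape of such a helper, not the series' implementation. It assumes a simplified singly linked list (struct ram_block and find_block() are hypothetical names, not QEMU's RAMBlock or QLIST) and exact matching of the id against the stored idstr.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for QEMU's RAMBlock list. */
struct ram_block {
    char idstr[256];
    struct ram_block *next;
};

/*
 * Return the block whose idstr equals the first id_len bytes of id,
 * or NULL when no block matches.  Exact matching is an assumption here.
 */
static struct ram_block *find_block(struct ram_block *head,
                                    const char *id, size_t id_len)
{
    struct ram_block *block;

    assert(id != NULL);
    for (block = head; block != NULL; block = block->next) {
        if (strlen(block->idstr) == id_len &&
            memcmp(block->idstr, id, id_len) == 0) {
            return block;
        }
    }
    return NULL;
}
```

With a helper of this kind, each caller reduces to one lookup plus a NULL check, which is the simplification the patch makes at its three call sites.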
[PATCH v3 1/2] export necessary symbols
Cc: Andrea Arcangeli <aarca...@redhat.com>
Cc: Avi Kivity <a...@redhat.com>
Cc: Paolo Bonzini <pbonz...@redhat.com>
Signed-off-by: Isaku Yamahata <yamah...@valinux.co.jp>
---
 mm/memcontrol.c |    1 +
 mm/mempolicy.c  |    1 +
 mm/shmem.c      |    1 +
 3 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index ac35bcc..265ba2f 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2761,6 +2761,7 @@ int mem_cgroup_cache_charge(struct page *page, struct mm_struct *mm,
 	}
 	return ret;
 }
+EXPORT_SYMBOL_GPL(mem_cgroup_cache_charge);

 /*
  * While swap-in, try_charge -> commit or cancel, the page is locked.
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index f15c1b2..ede02e2 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -1907,6 +1907,7 @@ retry_cpuset:
 		goto retry_cpuset;
 	return page;
 }
+EXPORT_SYMBOL_GPL(alloc_pages_vma);

 /**
  * 	alloc_pages_current - Allocate pages.
diff --git a/mm/shmem.c b/mm/shmem.c
index 585bd22..f2b8aa7 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -3041,6 +3041,7 @@ int shmem_zero_setup(struct vm_area_struct *vma)
 	vma->vm_flags |= VM_CAN_NONLINEAR;
 	return 0;
 }
+EXPORT_SYMBOL_GPL(shmem_zero_setup);

 /**
  * shmem_read_mapping_page_gfp - read into page cache, using specified page allocation flags.
-- 
1.7.1.1
[PATCH v3 0/2] postcopy migration: umem: Linux char device for postcopy
This is the Linux kernel driver for qemu/kvm postcopy live migration.
It is used by the qemu/kvm postcopy live migration patch series.

TODO:
- Consider FUSE/CUSE option
  So far several mmap patches for FUSE/CUSE are floating around.
  (Their purpose isn't different from ours, though.)
  They haven't been merged upstream yet.
  The driver-specific part of the qemu patches is modularized, so I expect
  it wouldn't be difficult to switch the kernel driver to a CUSE-based
  driver.

ioctl commands:

UMEM_INIT: initialize the umem device for qemu
UMEM_MAKE_VMA_ANONYMOUS: make the specified vma in the qemu process
                         anonymous.
                         This is _NOT_ implemented yet.
                         I'm not sure whether this can be implemented or not.

---
Changes v2 -> v3:
- make fault handler killable
- make use of read()/write()
- documentation

Changes version 1 -> 2:
- make ioctl structures padded to align
- un-KVM
  KVM_VMEM -> UMEM
- dropped some ioctl commands as Avi requested

Isaku Yamahata (2):
  export necessary symbols
  umem: chardevice for kvm postcopy

 Documentation/misc-devices/umem.txt |  303
 drivers/char/Kconfig                |   10 +
 drivers/char/Makefile               |    1 +
 drivers/char/umem.c                 |  900 +++
 include/linux/umem.h                |   42 ++
 mm/memcontrol.c                     |    1 +
 mm/mempolicy.c                      |    1 +
 mm/shmem.c                          |    1 +
 8 files changed, 1259 insertions(+), 0 deletions(-)
 create mode 100644 Documentation/misc-devices/umem.txt
 create mode 100644 drivers/char/umem.c
 create mode 100644 include/linux/umem.h
[PATCH v3 2/2] umem: chardevice for kvm postcopy
This is a character device to hook page access. The page fault in the area is propagated to another user process by this chardriver. Then, the process fills the page contents and resolves the page fault. Cc: Andrea Arcangeli aarca...@redhat.com Cc: Avi Kivity a...@redhat.com Cc: Paolo Bonzini pbonz...@redhat.com Signed-off-by: Isaku Yamahata yamah...@valinux.co.jp --- Changes v3 - v4: - simplified umem_init: kill {a,}sync_req_max - make fault handler killable even when core-dumping - documentation Changes v2 - v3: - made fault handler killable - allow O_LARGEFILE - improve to handle FAULT_FLAG_ALLOW_RETRY - smart on async fault --- Documentation/misc-devices/umem.txt | 303 drivers/char/Kconfig| 10 + drivers/char/Makefile |1 + drivers/char/umem.c | 900 +++ include/linux/umem.h| 42 ++ 5 files changed, 1256 insertions(+), 0 deletions(-) create mode 100644 Documentation/misc-devices/umem.txt create mode 100644 drivers/char/umem.c create mode 100644 include/linux/umem.h diff --git a/Documentation/misc-devices/umem.txt b/Documentation/misc-devices/umem.txt new file mode 100644 index 000..61bba5f --- /dev/null +++ b/Documentation/misc-devices/umem.txt @@ -0,0 +1,303 @@ +User process backed memory driver += + +Intro += +User process backed memory driver provides /dev/umem device. +This /dev/umem device is designed for some sort of distributed shared memory. +Especially post-copy live migration with KVM. + +page fault in the area backed by this driver is propagated to (other) server +process which serves the page contents. Usually the server process fetches +page contents from the remote machine. Then the faulting process continues. + + +Kernel-User protocol + +ioctl +UMEM_INIT: Initialize the umem device with some parameters. + IN size: the area size in bytes (which is rounded up to page size) + OUT shmem_fd: the file descript to tmpfs that is associated to this umem +device This is served as backing store of this umem device. 
+ +mmap: Mapping the initialized umem device provides the area which + is served by user process. + The fault in this area is propagated to umem device via read + system call. +read: kernel notifies a process that pages are faulted by returning + page offset in page size in u64 format. + umem device is pollable for read. +write: Process notifies kernel that the page is ready to access + by writing page offset in page size in u64 format. + + +operation flow +== + +| +V + open(/dev/umem) +| +V + ioctl(UMEM_INIT) +| +V + Here we have two file descriptors to + umem device and shmem file +| +| daemon process which serves +| page fault +V + fork()---, +| | +V V + close(shmem) mmap(shmem file) +| | +V V + mmap(umem device) close(shmem file) +| | +V | + close(umem device) | +| | + now the setup is done| + work on the umem area| +| | +V V + access umem area (poll and) read(umem) +| | +V V + page fault -- read system call returns + block page offsets + | + V +create page contents +(usually pull the page + from remote) +write the page contents +to the shmem which was +mmapped above
[PATCH v2 28/41] buffered_file: add qemu_file to read/write to buffer in memory
This is used by postcopy live migration.

Signed-off-by: Isaku Yamahata <yamah...@valinux.co.jp>
---
 buffered_file.c |   50 ++
 buffered_file.h |   10 ++
 2 files changed, 60 insertions(+), 0 deletions(-)

diff --git a/buffered_file.c b/buffered_file.c
index 5198923..4f0c98e 100644
--- a/buffered_file.c
+++ b/buffered_file.c
@@ -106,6 +106,56 @@ static void buffer_flush(QEMUBuffer *buf, QEMUFile *file,


 /***
+ * read/write to buffer on memory
+ */
+
+static int buf_close(void *opaque)
+{
+    QEMUFileBuf *s = opaque;
+    buffer_destroy(&s->buf);
+    g_free(s);
+    return 0;
+}
+
+static int buf_put_buffer(void *opaque,
+                          const uint8_t *buf, int64_t pos, int size)
+{
+    QEMUFileBuf *s = opaque;
+    buffer_append(&s->buf, buf, size);
+    return size;
+}
+
+QEMUFileBuf *qemu_fopen_buf_write(void)
+{
+    QEMUFileBuf *s = g_malloc0(sizeof(*s));
+
+    s->file = qemu_fopen_ops(s, buf_put_buffer, NULL, buf_close,
+                             NULL, NULL, NULL);
+    return s;
+}
+
+static int buf_get_buffer(void *opaque, uint8_t *buf, int64_t pos, int size)
+{
+    QEMUFileBuf *s = opaque;
+    ssize_t len = MIN(size, s->buf.buffer_capacity - s->buf.buffer_size);
+    memcpy(buf, s->buf.buffer + s->buf.buffer_size, len);
+    s->buf.buffer_size += len;
+    return len;
+}
+
+/* This get the ownership of buf. */
+QEMUFile *qemu_fopen_buf_read(uint8_t *buf, size_t size)
+{
+    QEMUFileBuf *s = g_malloc0(sizeof(*s));
+    s->buf.buffer = buf;
+    s->buf.buffer_size = 0; /* this is used as index to read */
+    s->buf.buffer_capacity = size;
+    s->file = qemu_fopen_ops(s, NULL, buf_get_buffer, buf_close,
+                             NULL, NULL, NULL);
+    return s->file;
+}
+
+/***
  * Nonblocking write only file
  */
 static ssize_t nonblock_flush_buffer_putbuf(void *opaque,
diff --git a/buffered_file.h b/buffered_file.h
index 2712e01..9e28bef 100644
--- a/buffered_file.h
+++ b/buffered_file.h
@@ -24,6 +24,16 @@ struct QEMUBuffer {
 };
 typedef struct QEMUBuffer QEMUBuffer;

+struct QEMUFileBuf {
+    QEMUFile *file;
+    QEMUBuffer buf;
+};
+typedef struct QEMUFileBuf QEMUFileBuf;
+
+QEMUFileBuf *qemu_fopen_buf_write(void);
+/* This get the ownership of buf. */
+QEMUFile *qemu_fopen_buf_read(uint8_t *buf, size_t size);
+
 struct QEMUFileNonblock {
     int fd;
     QEMUFile *file;
-- 
1.7.1.1
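The read side of this patch reuses buffer_size as a read index into a fixed buffer. A standalone sketch of that cursor logic follows, under assumptions: struct mem_reader and mem_reader_get() are hypothetical names with no QEMU dependencies, and a separate pos field is used instead of overloading a size field.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the read side of QEMUFileBuf. */
struct mem_reader {
    const uint8_t *data;   /* backing buffer */
    size_t capacity;       /* total number of readable bytes */
    size_t pos;            /* read index (buffer_size plays this role above) */
};

/*
 * Copy up to size bytes into dst and advance the read index.
 * Returns the number of bytes copied, 0 once the buffer is exhausted.
 */
static size_t mem_reader_get(struct mem_reader *r, uint8_t *dst, size_t size)
{
    size_t remaining, len;

    assert(r != NULL && r->pos <= r->capacity);
    remaining = r->capacity - r->pos;
    len = size < remaining ? size : remaining;
    if (len > 0) {
        memcpy(dst, r->data + r->pos, len);
    }
    r->pos += len;
    return len;
}
```

Reading a 6-byte buffer in 4-byte requests returns 4, then 2, then 0 bytes, which is the short-read behaviour a caller of buf_get_buffer() would also have to expect.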
[PATCH v2 29/41] umem.h: import Linux umem.h
Signed-off-by: Isaku Yamahata <yamah...@valinux.co.jp>
---
 linux-headers/linux/umem.h |   42 ++
 1 files changed, 42 insertions(+), 0 deletions(-)
 create mode 100644 linux-headers/linux/umem.h

diff --git a/linux-headers/linux/umem.h b/linux-headers/linux/umem.h
new file mode 100644
index 000..0cf7399
--- /dev/null
+++ b/linux-headers/linux/umem.h
@@ -0,0 +1,42 @@
+/*
+ * User process backed memory.
+ * This is mainly for KVM post copy.
+ *
+ * Copyright (c) 2011,
+ * National Institute of Advanced Industrial Science and Technology
+ *
+ * https://sites.google.com/site/grivonhome/quick-kvm-migration
+ * Author: Isaku Yamahata <yamahata at valinux co jp>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef __LINUX_UMEM_H
+#define __LINUX_UMEM_H
+
+#include <linux/types.h>
+#include <linux/ioctl.h>
+
+struct umem_init {
+	__u64 size;		/* in bytes */
+	__s32 shmem_fd;
+	__s32 padding;
+};
+
+#define UMEMIO	0x1E
+
+/* ioctl for umem fd */
+#define UMEM_INIT		_IOWR(UMEMIO, 0x0, struct umem_init)
+#define UMEM_MAKE_VMA_ANONYMOUS	_IO  (UMEMIO, 0x1)
+
+#endif /* __LINUX_UMEM_H */
-- 
1.7.1.1
[PATCH v2 31/41] configure: add CONFIG_POSTCOPY option
Add enable/disable postcopy mode. No dynamic test yet.

Signed-off-by: Isaku Yamahata <yamah...@valinux.co.jp>
---
 configure |   12 ++++++++++++
 1 files changed, 12 insertions(+), 0 deletions(-)

diff --git a/configure b/configure
index 1f338f8..21de4cb 100755
--- a/configure
+++ b/configure
@@ -194,6 +194,7 @@ zlib="yes"
 guest_agent="yes"
 libiscsi=""
 coroutine=""
+postcopy="yes"

 # parse CC options first
 for opt do
@@ -824,6 +825,10 @@ for opt do
   ;;
   --disable-guest-agent) guest_agent="no"
   ;;
+  --enable-postcopy) postcopy="yes"
+  ;;
+  --disable-postcopy) postcopy="no"
+  ;;
   *) echo "ERROR: unknown option $opt"; show_help="yes"
   ;;
   esac
@@ -1110,6 +1115,8 @@ echo "  --disable-guest-agent    disable building of the QEMU Guest Agent"
 echo "  --enable-guest-agent     enable building of the QEMU Guest Agent"
 echo "  --with-coroutine=BACKEND coroutine backend. Supported options:"
 echo "                           gthread, ucontext, sigaltstack, windows"
+echo "  --disable-postcopy       disable postcopy mode for live migration"
+echo "  --enable-postcopy        enable postcopy mode for live migration"
 echo ""
 echo "NOTE: The object files are built at the place where configure is launched"
 exit 1
@@ -3029,6 +3036,7 @@ echo "OpenGL support    $opengl"
 echo "libiscsi support  $libiscsi"
 echo "build guest agent $guest_agent"
 echo "coroutine backend $coroutine_backend"
+echo "postcopy support  $postcopy"

 if test $sdl_too_old = "yes"; then
 echo "-> Your SDL version is too old - please upgrade to have SDL support"
@@ -3329,6 +3337,10 @@ if test "$libiscsi" = "yes" ; then
   echo "CONFIG_LIBISCSI=y" >> $config_host_mak
 fi

+if test "$postcopy" = "yes" ; then
+  echo "CONFIG_POSTCOPY=y" >> $config_host_mak
+fi
+
 # XXX: suppress that
 if [ "$bsd" = "yes" ] ; then
   echo "CONFIG_BSD=y" >> $config_host_mak
-- 
1.7.1.1
[PATCH v2 25/41] migration: factor out parameters into MigrationParams
Introduce MigrationParams for parameters of migration. Cc: Orit Wasserman owass...@redhat.com Cc: Juan Quintela quint...@redhat.com Signed-off-by: Isaku Yamahata yamah...@valinux.co.jp --- Changes v1 - v2: - catch up qapi change --- block-migration.c |8 migration.c | 21 +++-- migration.h |8 ++-- qemu-common.h |1 + savevm.c | 10 +++--- sysemu.h |2 +- vmstate.h |2 +- 7 files changed, 35 insertions(+), 17 deletions(-) diff --git a/block-migration.c b/block-migration.c index fd2..b95b4e1 100644 --- a/block-migration.c +++ b/block-migration.c @@ -700,13 +700,13 @@ static int block_load(QEMUFile *f, void *opaque, int version_id) return 0; } -static void block_set_params(int blk_enable, int shared_base, void *opaque) +static void block_set_params(const MigrationParams *params, void *opaque) { -block_mig_state.blk_enable = blk_enable; -block_mig_state.shared_base = shared_base; +block_mig_state.blk_enable = params-blk; +block_mig_state.shared_base = params-shared; /* shared base means that blk_enable = 1 */ -block_mig_state.blk_enable |= shared_base; +block_mig_state.blk_enable |= params-shared; } void blk_mig_init(void) diff --git a/migration.c b/migration.c index 48a8f68..3b97aec 100644 --- a/migration.c +++ b/migration.c @@ -352,7 +352,7 @@ void migrate_fd_connect(MigrationState *s) migrate_fd_close); DPRINTF(beginning savevm\n); -ret = qemu_savevm_state_begin(s-file, s-blk, s-shared); +ret = qemu_savevm_state_begin(s-file, s-params); if (ret 0) { DPRINTF(failed, %d\n, ret); migrate_fd_error(s); @@ -361,15 +361,13 @@ void migrate_fd_connect(MigrationState *s) migrate_fd_put_ready(s); } -static MigrationState *migrate_init(int blk, int inc) +static MigrationState *migrate_init(const MigrationParams *params) { MigrationState *s = migrate_get_current(); int64_t bandwidth_limit = s-bandwidth_limit; memset(s, 0, sizeof(*s)); -s-blk = blk; -s-shared = inc; - +s-params = *params; s-bandwidth_limit = bandwidth_limit; s-state = MIG_STATE_SETUP; @@ -393,9 +391,20 @@ void 
qmp_migrate(const char *uri, bool has_blk, bool blk, Error **errp) { MigrationState *s = migrate_get_current(); +MigrationParams params = { +.blk = false, +.shared = false, +}; const char *p; int ret; +if (has_blk) { +params.blk = blk; +} +if (has_inc) { +params.shared = inc; +} + if (s-state == MIG_STATE_ACTIVE) { error_set(errp, QERR_MIGRATION_ACTIVE); return; @@ -410,7 +419,7 @@ void qmp_migrate(const char *uri, bool has_blk, bool blk, return; } -s = migrate_init(blk, inc); +s = migrate_init(params); if (strstart(uri, tcp:, p)) { ret = tcp_start_outgoing_migration(s, p, errp); diff --git a/migration.h b/migration.h index d0dd536..59e6e68 100644 --- a/migration.h +++ b/migration.h @@ -19,6 +19,11 @@ #include notify.h #include error.h +struct MigrationParams { +int blk; +int shared; +}; + typedef struct MigrationState MigrationState; struct MigrationState @@ -31,8 +36,7 @@ struct MigrationState int (*close)(MigrationState *s); int (*write)(MigrationState *s, const void *buff, size_t size); void *opaque; -int blk; -int shared; +MigrationParams params; }; void process_incoming_migration(QEMUFile *f); diff --git a/qemu-common.h b/qemu-common.h index 91e0562..057c810 100644 --- a/qemu-common.h +++ b/qemu-common.h @@ -263,6 +263,7 @@ typedef struct EventNotifier EventNotifier; typedef struct VirtIODevice VirtIODevice; typedef struct QEMUSGList QEMUSGList; typedef struct SHPCDevice SHPCDevice; +typedef struct MigrationParams MigrationParams; typedef uint64_t pcibus_t; diff --git a/savevm.c b/savevm.c index 5640614..318ec61 100644 --- a/savevm.c +++ b/savevm.c @@ -1611,7 +1611,7 @@ bool qemu_savevm_state_blocked(Error **errp) return false; } -int qemu_savevm_state_begin(QEMUFile *f, int blk_enable, int shared) +int qemu_savevm_state_begin(QEMUFile *f, const MigrationParams *params) { SaveStateEntry *se; int ret; @@ -1620,7 +1620,7 @@ int qemu_savevm_state_begin(QEMUFile *f, int blk_enable, int shared) if(se-set_params == NULL) { continue; } - se-set_params(blk_enable, 
shared, se-opaque); + se-set_params(params, se-opaque); } qemu_put_be32(f, QEMU_VM_FILE_MAGIC); @@ -1758,13 +1758,17 @@ void qemu_savevm_state_cancel(QEMUFile *f) static int qemu_savevm_state(QEMUFile *f) { int ret; +MigrationParams params = { +.blk = 0, +.shared = 0, +}; if (qemu_savevm_state_blocked(NULL)) { ret = -EINVAL; goto out; } -ret = qemu_savevm_state_begin
[PATCH v2 23/41] migration.c: remove redundant line in migrate_init()
Signed-off-by: Isaku Yamahata <yamah...@valinux.co.jp>
---
 migration.c |    1 -
 1 files changed, 0 insertions(+), 1 deletions(-)

diff --git a/migration.c b/migration.c
index 3f485d3..753addb 100644
--- a/migration.c
+++ b/migration.c
@@ -367,7 +367,6 @@ static MigrationState *migrate_init(int blk, int inc)
     int64_t bandwidth_limit = s->bandwidth_limit;

     memset(s, 0, sizeof(*s));
-    s->bandwidth_limit = bandwidth_limit;
     s->blk = blk;
     s->shared = inc;

-- 
1.7.1.1
[PATCH v2 27/41] buffered_file: Introduce QEMUFileNonblock for nonblock write
Signed-off-by: Isaku Yamahata <yamah...@valinux.co.jp>
---
 buffered_file.c |  115 +++
 buffered_file.h |   13 ++
 2 files changed, 128 insertions(+), 0 deletions(-)

diff --git a/buffered_file.c b/buffered_file.c
index 22dd4c9..5198923 100644
--- a/buffered_file.c
+++ b/buffered_file.c
@@ -106,6 +106,121 @@ static void buffer_flush(QEMUBuffer *buf, QEMUFile *file,


 /***
+ * Nonblocking write only file
+ */
+static ssize_t nonblock_flush_buffer_putbuf(void *opaque,
+                                            const void *data, size_t size)
+{
+    QEMUFileNonblock *s = opaque;
+    ssize_t ret = write(s->fd, data, size);
+    if (ret == -1) {
+        return -errno;
+    }
+    return ret;
+}
+
+static void nonblock_flush_buffer(QEMUFileNonblock *s)
+{
+    buffer_flush(&s->buf, s->file, s, nonblock_flush_buffer_putbuf);
+
+    if (s->buf.buffer_size > 0) {
+        s->buf.freeze_output = true;
+    }
+}
+
+static int nonblock_put_buffer(void *opaque,
+                               const uint8_t *buf, int64_t pos, int size)
+{
+    QEMUFileNonblock *s = opaque;
+    int error;
+    ssize_t len = 0;
+
+    error = qemu_file_get_error(s->file);
+    if (error) {
+        return error;
+    }
+
+    nonblock_flush_buffer(s);
+    error = qemu_file_get_error(s->file);
+    if (error) {
+        return error;
+    }
+
+    while (!s->buf.freeze_output && size > 0) {
+        ssize_t ret;
+        assert(s->buf.buffer_size == 0);
+
+        ret = write(s->fd, buf, size);
+        if (ret == -1) {
+            if (errno == EINTR) {
+                continue;
+            } else if (errno == EAGAIN) {
+                s->buf.freeze_output = true;
+            } else {
+                qemu_file_set_error(s->file, errno);
+            }
+            break;
+        }
+
+        len += ret;
+        buf += ret;
+        size -= ret;
+    }
+
+    if (size > 0) {
+        buffer_append(&s->buf, buf, size);
+        len += size;
+    }
+    return len;
+}
+
+int nonblock_pending_size(QEMUFileNonblock *s)
+{
+    return qemu_pending_size(s->file) + s->buf.buffer_size;
+}
+
+void nonblock_fflush(QEMUFileNonblock *s)
+{
+    s->buf.freeze_output = false;
+    nonblock_flush_buffer(s);
+    if (!s->buf.freeze_output) {
+        qemu_fflush(s->file);
+    }
+}
+
+void nonblock_wait_for_flush(QEMUFileNonblock *s)
+{
+    while (nonblock_pending_size(s) > 0) {
+        fd_set fds;
+        FD_ZERO(&fds);
+        FD_SET(s->fd, &fds);
+        select(s->fd + 1, NULL, &fds, NULL, NULL);
+
+        nonblock_fflush(s);
+    }
+}
+
+static int nonblock_close(void *opaque)
+{
+    QEMUFileNonblock *s = opaque;
+    nonblock_wait_for_flush(s);
+    buffer_destroy(&s->buf);
+    g_free(s);
+    return 0;
+}
+
+QEMUFileNonblock *qemu_fopen_nonblock(int fd)
+{
+    QEMUFileNonblock *s = g_malloc0(sizeof(*s));
+
+    s->fd = fd;
+    fcntl_setfl(fd, O_NONBLOCK);
+    s->file = qemu_fopen_ops(s, nonblock_put_buffer, NULL, nonblock_close,
+                             NULL, NULL, NULL);
+    return s;
+}
+
+/***
  * Buffered File
  */

diff --git a/buffered_file.h b/buffered_file.h
index d3ef546..2712e01 100644
--- a/buffered_file.h
+++ b/buffered_file.h
@@ -24,6 +24,19 @@ struct QEMUBuffer {
 };
 typedef struct QEMUBuffer QEMUBuffer;

+struct QEMUFileNonblock {
+    int fd;
+    QEMUFile *file;
+
+    QEMUBuffer buf;
+};
+typedef struct QEMUFileNonblock QEMUFileNonblock;
+
+QEMUFileNonblock *qemu_fopen_nonblock(int fd);
+int nonblock_pending_size(QEMUFileNonblock *s);
+void nonblock_fflush(QEMUFileNonblock *s);
+void nonblock_wait_for_flush(QEMUFileNonblock *s);
+
 typedef ssize_t (BufferedPutFunc)(void *opaque, const void *data, size_t size);
 typedef void (BufferedPutReadyFunc)(void *opaque);
 typedef void (BufferedWaitForUnfreezeFunc)(void *opaque);
-- 
1.7.1.1
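The pattern used by nonblock_put_buffer() (flush what is queued, write directly while the descriptor accepts data, queue the remainder on EAGAIN) can be sketched outside QEMU. This is only an illustration under assumptions: struct nb_writer, nb_set_nonblock(), nb_flush() and nb_put() are hypothetical names, and a plain realloc()-grown array stands in for QEMUBuffer.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical writer state: a nonblocking fd plus bytes not yet written. */
struct nb_writer {
    int fd;
    char *pending;
    size_t pending_len;
};

/* Put fd into nonblocking mode. Returns 0 or a negative errno value. */
static int nb_set_nonblock(int fd)
{
    int flags = fcntl(fd, F_GETFL);
    if (flags == -1) {
        return -errno;
    }
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1 ? -errno : 0;
}

/* Write as much queued data as the fd accepts. Returns 0 or -errno. */
static int nb_flush(struct nb_writer *w)
{
    size_t done = 0;
    int err = 0;

    while (done < w->pending_len) {
        ssize_t ret = write(w->fd, w->pending + done, w->pending_len - done);
        if (ret == -1) {
            if (errno == EINTR) {
                continue;
            }
            if (errno != EAGAIN && errno != EWOULDBLOCK) {
                err = -errno;           /* hard error, report to caller */
            }
            break;
        }
        done += (size_t)ret;
    }
    if (done > 0) {
        /* Drop the bytes that were written, keep the rest queued. */
        memmove(w->pending, w->pending + done, w->pending_len - done);
        w->pending_len -= done;
    }
    return err;
}

/* Accept size bytes: write what is possible now, queue the remainder. */
static ssize_t nb_put(struct nb_writer *w, const void *buf, size_t size)
{
    const char *p = buf;
    size_t left = size;
    int err = nb_flush(w);

    if (err) {
        return err;
    }
    /* Write directly only when nothing is queued, to preserve ordering. */
    while (w->pending_len == 0 && left > 0) {
        ssize_t ret = write(w->fd, p, left);
        if (ret == -1) {
            if (errno == EINTR) {
                continue;
            }
            if (errno == EAGAIN || errno == EWOULDBLOCK) {
                break;
            }
            return -errno;
        }
        p += ret;
        left -= (size_t)ret;
    }
    if (left > 0) {
        char *grown = realloc(w->pending, w->pending_len + left);
        if (grown == NULL) {
            return -ENOMEM;
        }
        memcpy(grown + w->pending_len, p, left);
        w->pending = grown;
        w->pending_len += left;
    }
    return (ssize_t)size;
}
```

As in the patch, the caller is told that all bytes were accepted even when part of them is only queued, so something must call the flush routine again once the descriptor becomes writable (the patch uses select() in nonblock_wait_for_flush() for that).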
[PATCH v2 19/41] savevm/QEMUFile: drop qemu_stdio_fd
Now qemu_file_fd() replaces qemu_stdio_fd().

Signed-off-by: Isaku Yamahata <yamah...@valinux.co.jp>
---
 migration-exec.c |    4 ++--
 migration-fd.c   |    2 +-
 qemu-file.h      |    1 -
 savevm.c         |   12 ------------
 4 files changed, 3 insertions(+), 16 deletions(-)

diff --git a/migration-exec.c b/migration-exec.c
index 6c97db9..95e9779 100644
--- a/migration-exec.c
+++ b/migration-exec.c
@@ -98,7 +98,7 @@ static void exec_accept_incoming_migration(void *opaque)
     QEMUFile *f = opaque;

     process_incoming_migration(f);
-    qemu_set_fd_handler2(qemu_stdio_fd(f), NULL, NULL, NULL, NULL);
+    qemu_set_fd_handler2(qemu_file_fd(f), NULL, NULL, NULL, NULL);
     qemu_fclose(f);
 }

@@ -113,7 +113,7 @@ int exec_start_incoming_migration(const char *command)
         return -errno;
     }

-    qemu_set_fd_handler2(qemu_stdio_fd(f), NULL,
+    qemu_set_fd_handler2(qemu_file_fd(f), NULL,
                          exec_accept_incoming_migration, NULL, f);

     return 0;
diff --git a/migration-fd.c b/migration-fd.c
index 50138ed..d9c13fe 100644
--- a/migration-fd.c
+++ b/migration-fd.c
@@ -104,7 +104,7 @@ static void fd_accept_incoming_migration(void *opaque)
     QEMUFile *f = opaque;

     process_incoming_migration(f);
-    qemu_set_fd_handler2(qemu_stdio_fd(f), NULL, NULL, NULL, NULL);
+    qemu_set_fd_handler2(qemu_file_fd(f), NULL, NULL, NULL, NULL);
     qemu_fclose(f);
 }

diff --git a/qemu-file.h b/qemu-file.h
index 98a8023..1a12e7d 100644
--- a/qemu-file.h
+++ b/qemu-file.h
@@ -70,7 +70,6 @@ QEMUFile *qemu_fdopen(int fd, const char *mode);
 QEMUFile *qemu_fopen_socket(int fd);
 QEMUFile *qemu_popen(FILE *popen_file, const char *mode);
 QEMUFile *qemu_popen_cmd(const char *command, const char *mode);
-int qemu_stdio_fd(QEMUFile *f);
 int qemu_file_fd(QEMUFile *f);
 void qemu_fflush(QEMUFile *f);
 void qemu_buffered_file_drain(QEMUFile *f);
diff --git a/savevm.c b/savevm.c
index cba1a69..ec9f5d0 100644
--- a/savevm.c
+++ b/savevm.c
@@ -293,18 +293,6 @@ QEMUFile *qemu_popen_cmd(const char *command, const char *mode)
     return qemu_popen(popen_file, mode);
 }

-/* TODO: replace this with qemu_file_fd() */
-int qemu_stdio_fd(QEMUFile *f)
-{
-    QEMUFileStdio *p;
-    int fd;
-
-    p = (QEMUFileStdio *)f->opaque;
-    fd = fileno(p->stdio_file);
-
-    return fd;
-}
-
 QEMUFile *qemu_fdopen(int fd, const char *mode)
 {
     QEMUFileStdio *s;
-- 
1.7.1.1
[PATCH v2 30/41] update-linux-headers.sh: teach umem.h to update-linux-headers.sh
Signed-off-by: Isaku Yamahata <yamah...@valinux.co.jp>
---
 scripts/update-linux-headers.sh |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/scripts/update-linux-headers.sh b/scripts/update-linux-headers.sh
index 9d2a4bc..2afdd54 100755
--- a/scripts/update-linux-headers.sh
+++ b/scripts/update-linux-headers.sh
@@ -43,7 +43,7 @@ done

 rm -rf "$output/linux-headers/linux"
 mkdir -p "$output/linux-headers/linux"
-for header in kvm.h kvm_para.h vhost.h virtio_config.h virtio_ring.h; do
+for header in kvm.h kvm_para.h vhost.h virtio_config.h virtio_ring.h umem.h; do
     cp "$tmpdir/include/linux/$header" "$output/linux-headers/linux"
 done
 if [ -L "$linux/source" ]; then
-- 
1.7.1.1
[PATCH v2 26/41] buffered_file: factor out buffer management logic
This patch factors out buffer management logic. Signed-off-by: Isaku Yamahata yamah...@valinux.co.jp --- buffered_file.c | 141 +- buffered_file.h |8 +++ 2 files changed, 94 insertions(+), 55 deletions(-) diff --git a/buffered_file.c b/buffered_file.c index a38caec..22dd4c9 100644 --- a/buffered_file.c +++ b/buffered_file.c @@ -20,24 +20,6 @@ #include buffered_file.h //#define DEBUG_BUFFERED_FILE - -typedef struct QEMUFileBuffered -{ -BufferedPutFunc *put_buffer; -BufferedPutReadyFunc *put_ready; -BufferedWaitForUnfreezeFunc *wait_for_unfreeze; -BufferedCloseFunc *close; -void *opaque; -QEMUFile *file; -int freeze_output; -size_t bytes_xfer; -size_t xfer_limit; -uint8_t *buffer; -size_t buffer_size; -size_t buffer_capacity; -QEMUTimer *timer; -} QEMUFileBuffered; - #ifdef DEBUG_BUFFERED_FILE #define DPRINTF(fmt, ...) \ do { printf(buffered-file: fmt, ## __VA_ARGS__); } while (0) @@ -46,57 +28,71 @@ typedef struct QEMUFileBuffered do { } while (0) #endif -static void buffered_append(QEMUFileBuffered *s, -const uint8_t *buf, size_t size) -{ -if (size (s-buffer_capacity - s-buffer_size)) { -void *tmp; - -DPRINTF(increasing buffer capacity from %zu by %zu\n, -s-buffer_capacity, size + 1024); -s-buffer_capacity += size + 1024; +/*** + * buffer management + */ -tmp = g_realloc(s-buffer, s-buffer_capacity); -if (tmp == NULL) { -fprintf(stderr, qemu file buffer expansion failed\n); -exit(1); -} +static void buffer_destroy(QEMUBuffer *s) +{ +g_free(s-buffer); +} -s-buffer = tmp; +static void buffer_consume(QEMUBuffer *s, size_t offset) +{ +if (offset 0) { +assert(s-buffer_size = offset); +memmove(s-buffer, s-buffer + offset, s-buffer_size - offset); +s-buffer_size -= offset; } +} +static void buffer_append(QEMUBuffer *s, const uint8_t *buf, size_t size) +{ +#define BUF_SIZE_INC(32 * 1024) /* = IO_BUF_SIZE */ +int inc = size - (s-buffer_capacity - s-buffer_size); +if (inc 0) { +s-buffer_capacity += DIV_ROUND_UP(inc, BUF_SIZE_INC) * BUF_SIZE_INC; +s-buffer = 
g_realloc(s-buffer, s-buffer_capacity); +} memcpy(s-buffer + s-buffer_size, buf, size); s-buffer_size += size; } -static void buffered_flush(QEMUFileBuffered *s) +typedef ssize_t (BufferPutBuf)(void *opaque, const void *data, size_t size); + +static void buffer_flush(QEMUBuffer *buf, QEMUFile *file, + void *opaque, BufferPutBuf *put_buf) { size_t offset = 0; int error; -error = qemu_file_get_error(s-file); +error = qemu_file_get_error(file); if (error != 0) { DPRINTF(flush when error, bailing: %s\n, strerror(-error)); return; } -DPRINTF(flushing %zu byte(s) of data\n, s-buffer_size); +DPRINTF(flushing %zu byte(s) of data\n, buf-buffer_size); -while (offset s-buffer_size) { +while (offset buf-buffer_size) { ssize_t ret; -ret = s-put_buffer(s-opaque, s-buffer + offset, -s-buffer_size - offset); -if (ret == -EAGAIN) { +ret = put_buf(opaque, buf-buffer + offset, buf-buffer_size - offset); +if (ret == -EINTR) { +continue; +} else if (ret == -EAGAIN) { DPRINTF(backend not ready, freezing\n); -s-freeze_output = 1; +buf-freeze_output = true; break; } -if (ret = 0) { +if (ret 0) { DPRINTF(error flushing data, %zd\n, ret); -qemu_file_set_error(s-file, ret); +qemu_file_set_error(file, ret); +break; +} else if (ret == 0) { +DPRINTF(ret == 0\n); break; } else { DPRINTF(flushed %zd byte(s)\n, ret); @@ -104,9 +100,44 @@ static void buffered_flush(QEMUFileBuffered *s) } } -DPRINTF(flushed %zu of %zu byte(s)\n, offset, s-buffer_size); -memmove(s-buffer, s-buffer + offset, s-buffer_size - offset); -s-buffer_size -= offset; +DPRINTF(flushed %zu of %zu byte(s)\n, offset, buf-buffer_size); +buffer_consume(buf, offset); +} + + +/*** + * Buffered File + */ + +typedef struct QEMUFileBuffered +{ +BufferedPutFunc *put_buffer; +BufferedPutReadyFunc *put_ready; +BufferedWaitForUnfreezeFunc *wait_for_unfreeze; +BufferedCloseFunc *close; +void *opaque; +QEMUFile *file; +size_t bytes_xfer; +size_t xfer_limit; +QEMUTimer *timer; +QEMUBuffer buf; +} QEMUFileBuffered; + +static ssize_t 
buffered_flush_putbuf(void *opaque, + const
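The buffer helpers factored out above follow a common pattern: grow the backing store in fixed-size steps when an append does not fit, and drop already-flushed bytes from the front with memmove(). A minimal standalone sketch of that pattern, assuming hypothetical names (struct byte_buf, byte_buf_append, byte_buf_consume) and plain realloc() instead of g_realloc(), might look like this. Unlike the patch, it compares sizes as size_t rather than computing a signed int difference.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define BUF_SIZE_INC (32 * 1024)                   /* growth step, as in the patch */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Hypothetical stand-in for QEMUBuffer. */
struct byte_buf {
    uint8_t *data;
    size_t size;      /* bytes currently stored */
    size_t capacity;  /* bytes allocated */
};

/* Append len bytes, growing capacity in BUF_SIZE_INC steps. Returns 0 or -1. */
static int byte_buf_append(struct byte_buf *b, const uint8_t *src, size_t len)
{
    size_t free_space = b->capacity - b->size;

    if (len > free_space) {
        size_t missing = len - free_space;
        size_t new_cap = b->capacity +
                         DIV_ROUND_UP(missing, BUF_SIZE_INC) * BUF_SIZE_INC;
        uint8_t *p = realloc(b->data, new_cap);
        if (!p) {
            return -1;
        }
        b->data = p;
        b->capacity = new_cap;
    }
    if (len > 0) {
        memcpy(b->data + b->size, src, len);
    }
    b->size += len;
    return 0;
}

/* Drop the first n bytes (for example, the part a flush managed to write). */
static void byte_buf_consume(struct byte_buf *b, size_t n)
{
    if (n > 0 && n <= b->size) {
        memmove(b->data, b->data + n, b->size - n);
        b->size -= n;
    }
}
```

Rounding the growth up to a multiple of the step keeps the number of reallocations low when many small appends arrive while the output is frozen.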
[PATCH v2 14/41] exec.c: export last_ram_offset()
Signed-off-by: Isaku Yamahata <yamah...@valinux.co.jp>
---
 exec-obsolete.h |    1 +
 exec.c          |    4 ++--
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/exec-obsolete.h b/exec-obsolete.h
index 792c831..fb21dd7 100644
--- a/exec-obsolete.h
+++ b/exec-obsolete.h
@@ -25,6 +25,7 @@

 #ifndef CONFIG_USER_ONLY

+ram_addr_t qemu_last_ram_offset(void);
 ram_addr_t qemu_ram_alloc_from_ptr(ram_addr_t size, void *host,
                                    MemoryRegion *mr);
 ram_addr_t qemu_ram_alloc(ram_addr_t size, MemoryRegion *mr);
diff --git a/exec.c b/exec.c
index 7f44893..785 100644
--- a/exec.c
+++ b/exec.c
@@ -2576,7 +2576,7 @@ static ram_addr_t find_ram_offset(ram_addr_t size)
     return offset;
 }

-static ram_addr_t last_ram_offset(void)
+ram_addr_t qemu_last_ram_offset(void)
 {
     RAMBlock *block;
     ram_addr_t last = 0;
@@ -2672,7 +2672,7 @@ ram_addr_t qemu_ram_alloc_from_ptr(ram_addr_t size, void *host,
     QLIST_INSERT_HEAD(&ram_list.blocks, new_block, next);

     ram_list.phys_dirty = g_realloc(ram_list.phys_dirty,
-                                       last_ram_offset() >> TARGET_PAGE_BITS);
+                                qemu_last_ram_offset() >> TARGET_PAGE_BITS);
     memset(ram_list.phys_dirty + (new_block->offset >> TARGET_PAGE_BITS),
            0xff, size >> TARGET_PAGE_BITS);

-- 
1.7.1.1
[PATCH v2 08/41] arch_init/ram_load: refactor ram_load
Signed-off-by: Isaku Yamahata <yamah...@valinux.co.jp>
---
 arch_init.c |   67 +-
 arch_init.h |    1 +
 2 files changed, 39 insertions(+), 29 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index c861e30..bb0cd52 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -438,6 +438,41 @@ static inline void *host_from_stream_offset(QEMUFile *f,
     return ram_load_host_from_stream_offset(f, offset, flags, &block);
 }

+int ram_load_mem_size(QEMUFile *f, ram_addr_t total_ram_bytes)
+{
+    /* Synchronize RAM block list */
+    char id[256];
+    ram_addr_t length;
+
+    while (total_ram_bytes) {
+        RAMBlock *block;
+        uint8_t len;
+
+        len = qemu_get_byte(f);
+        qemu_get_buffer(f, (uint8_t *)id, len);
+        id[len] = 0;
+        length = qemu_get_be64(f);
+
+        QLIST_FOREACH(block, &ram_list.blocks, next) {
+            if (!strncmp(id, block->idstr, sizeof(id))) {
+                if (block->length != length)
+                    return -EINVAL;
+                break;
+            }
+        }
+
+        if (!block) {
+            fprintf(stderr, "Unknown ramblock \"%s\", cannot "
+                    "accept migration\n", id);
+            return -EINVAL;
+        }
+
+        total_ram_bytes -= length;
+    }
+
+    return 0;
+}
+
 int ram_load(QEMUFile *f, void *opaque, int version_id)
 {
     ram_addr_t addr;
@@ -456,35 +491,9 @@ int ram_load(QEMUFile *f, void *opaque, int version_id)

         if (flags & RAM_SAVE_FLAG_MEM_SIZE) {
             if (version_id == 4) {
-                /* Synchronize RAM block list */
-                char id[256];
-                ram_addr_t length;
-                ram_addr_t total_ram_bytes = addr;
-
-                while (total_ram_bytes) {
-                    RAMBlock *block;
-                    uint8_t len;
-
-                    len = qemu_get_byte(f);
-                    qemu_get_buffer(f, (uint8_t *)id, len);
-                    id[len] = 0;
-                    length = qemu_get_be64(f);
-
-                    QLIST_FOREACH(block, &ram_list.blocks, next) {
-                        if (!strncmp(id, block->idstr, sizeof(id))) {
-                            if (block->length != length)
-                                return -EINVAL;
-                            break;
-                        }
-                    }
-
-                    if (!block) {
-                        fprintf(stderr, "Unknown ramblock \"%s\", cannot "
-                                "accept migration\n", id);
-                        return -EINVAL;
-                    }
-
-                    total_ram_bytes -= length;
+                error = ram_load_mem_size(f, addr);
+                if (error) {
+                    return error;
                 }
             }
         }
diff --git a/arch_init.h b/arch_init.h
index 0a39082..507f110 100644
--- a/arch_init.h
+++ b/arch_init.h
@@ -45,6 +45,7 @@ void *ram_load_host_from_stream_offset(QEMUFile *f,
                                        ram_addr_t offset,
                                        int flags,
                                        RAMBlock **last_blockp);
+int ram_load_mem_size(QEMUFile *f, ram_addr_t total_ram_bytes);
 #endif

 #endif
-- 
1.7.1.1
[PATCH v2 04/41] arch_init: refactor host_from_stream_offset()
Signed-off-by: Isaku Yamahata <yamah...@valinux.co.jp>
---
 arch_init.c |   25 ++---
 arch_init.h |    7 +++
 2 files changed, 25 insertions(+), 7 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index 2a53f58..36ece1d 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -374,21 +374,22 @@ int ram_save_live(QEMUFile *f, int stage, void *opaque)
     return (stage == 2) && (expected_time <= migrate_max_downtime());
 }

-static inline void *host_from_stream_offset(QEMUFile *f,
-                                            ram_addr_t offset,
-                                            int flags)
+void *ram_load_host_from_stream_offset(QEMUFile *f,
+                                       ram_addr_t offset,
+                                       int flags,
+                                       RAMBlock **last_blockp)
 {
-    static RAMBlock *block = NULL;
+    RAMBlock *block;
     char id[256];
     uint8_t len;

     if (flags & RAM_SAVE_FLAG_CONTINUE) {
-        if (!block) {
+        if (!(*last_blockp)) {
             fprintf(stderr, "Ack, bad migration stream!\n");
             return NULL;
         }

-        return memory_region_get_ram_ptr(block->mr) + offset;
+        return memory_region_get_ram_ptr((*last_blockp)->mr) + offset;
     }

     len = qemu_get_byte(f);
@@ -396,14 +397,24 @@ static inline void *host_from_stream_offset(QEMUFile *f,
     id[len] = 0;

     QLIST_FOREACH(block, &ram_list.blocks, next) {
-        if (!strncmp(id, block->idstr, sizeof(id)))
+        if (!strncmp(id, block->idstr, sizeof(id))) {
+            *last_blockp = block;
             return memory_region_get_ram_ptr(block->mr) + offset;
+        }
     }

     fprintf(stderr, "Can't find block %s!\n", id);
     return NULL;
 }

+static inline void *host_from_stream_offset(QEMUFile *f,
+                                            ram_addr_t offset,
+                                            int flags)
+{
+    static RAMBlock *block = NULL;
+    return ram_load_host_from_stream_offset(f, offset, flags, &block);
+}
+
 int ram_load(QEMUFile *f, void *opaque, int version_id)
 {
     ram_addr_t addr;
diff --git a/arch_init.h b/arch_init.h
index 456637d..d84eac7 100644
--- a/arch_init.h
+++ b/arch_init.h
@@ -39,4 +39,11 @@ int xen_available(void);

 #define RAM_SAVE_VERSION_ID 4 /* currently version 4 */

+#if defined(NEED_CPU_H) && !defined(CONFIG_USER_ONLY)
+void *ram_load_host_from_stream_offset(QEMUFile *f,
+                                       ram_addr_t offset,
+                                       int flags,
+                                       RAMBlock **last_blockp);
+#endif
+
 #endif
-- 
1.7.1.1
[PATCH v2 13/41] exec.c: factor out qemu_get_ram_ptr()
Signed-off-by: Isaku Yamahata yamah...@valinux.co.jp --- cpu-all.h |2 ++ exec.c| 51 +-- 2 files changed, 31 insertions(+), 22 deletions(-) diff --git a/cpu-all.h b/cpu-all.h index 028528f..ff7f827 100644 --- a/cpu-all.h +++ b/cpu-all.h @@ -508,6 +508,8 @@ extern RAMList ram_list; extern const char *mem_path; extern int mem_prealloc; +RAMBlock *qemu_get_ram_block(ram_addr_t adar); + /* Flags stored in the low bits of the TLB virtual address. These are defined so that fast path ram access is all zeros. */ /* Zero if TLB entry is valid. */ diff --git a/exec.c b/exec.c index 078a408..7f44893 100644 --- a/exec.c +++ b/exec.c @@ -2799,15 +2799,7 @@ void qemu_ram_remap(ram_addr_t addr, ram_addr_t length) } #endif /* !_WIN32 */ -/* Return a host pointer to ram allocated with qemu_ram_alloc. - With the exception of the softmmu code in this file, this should - only be used for local memory (e.g. video ram) that the device owns, - and knows it isn't going to access beyond the end of the block. - - It should not be used for general purpose DMA. - Use cpu_physical_memory_map/cpu_physical_memory_rw instead. - */ -void *qemu_get_ram_ptr(ram_addr_t addr) +RAMBlock *qemu_get_ram_block(ram_addr_t addr) { RAMBlock *block; @@ -2818,19 +2810,7 @@ void *qemu_get_ram_ptr(ram_addr_t addr) QLIST_REMOVE(block, next); QLIST_INSERT_HEAD(ram_list.blocks, block, next); } -if (xen_enabled()) { -/* We need to check if the requested address is in the RAM - * because we don't want to map the entire memory in QEMU. - * In that case just map until the end of the page. - */ -if (block-offset == 0) { -return xen_map_cache(addr, 0, 0); -} else if (block-host == NULL) { -block-host = -xen_map_cache(block-offset, block-length, 1); -} -} -return block-host + (addr - block-offset); +return block; } } @@ -2841,6 +2821,33 @@ void *qemu_get_ram_ptr(ram_addr_t addr) } /* Return a host pointer to ram allocated with qemu_ram_alloc. 
+ With the exception of the softmmu code in this file, this should + only be used for local memory (e.g. video ram) that the device owns, + and knows it isn't going to access beyond the end of the block. + + It should not be used for general purpose DMA. + Use cpu_physical_memory_map/cpu_physical_memory_rw instead. + */ +void *qemu_get_ram_ptr(ram_addr_t addr) +{ +RAMBlock *block = qemu_get_ram_block(addr); + +if (xen_enabled()) { +/* We need to check if the requested address is in the RAM + * because we don't want to map the entire memory in QEMU. + * In that case just map until the end of the page. + */ +if (block-offset == 0) { +return xen_map_cache(addr, 0, 0); +} else if (block-host == NULL) { +block-host = +xen_map_cache(block-offset, block-length, 1); +} +} +return block-host + (addr - block-offset); +} + +/* Return a host pointer to ram allocated with qemu_ram_alloc. * Same as qemu_get_ram_ptr but avoid reordering ramblocks. */ void *qemu_safe_ram_ptr(ram_addr_t addr) -- 1.7.1.1 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v2 11/41] arch_init: factor out counting transferred bytes
Signed-off-by: Isaku Yamahata <yamah...@valinux.co.jp>
---
 arch_init.c |   24 ++++++++++++------------
 1 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index 73bf250..2617478 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -155,8 +155,9 @@ static int is_dup_page(uint8_t *page)
 }

 static RAMBlock *last_block_sent = NULL;
+static uint64_t bytes_transferred;

-int ram_save_page(QEMUFile *f, RAMBlock *block, ram_addr_t offset)
+static int ram_save_page_int(QEMUFile *f, RAMBlock *block, ram_addr_t offset)
 {
     MemoryRegion *mr = block->mr;
     uint8_t *p;
@@ -192,6 +193,13 @@ int ram_save_page(QEMUFile *f, RAMBlock *block, ram_addr_t offset)
     return TARGET_PAGE_SIZE;
 }

+int ram_save_page(QEMUFile *f, RAMBlock *block, ram_addr_t offset)
+{
+    int bytes_sent = ram_save_page_int(f, block, offset);
+    bytes_transferred += bytes_sent;
+    return bytes_sent;
+}
+
 static RAMBlock *last_block;
 static ram_addr_t last_offset;

@@ -228,8 +236,6 @@ int ram_save_block(QEMUFile *f)
     return bytes_sent;
 }

-static uint64_t bytes_transferred;
-
 static ram_addr_t ram_save_remaining(void)
 {
     RAMBlock *block;
@@ -357,11 +363,7 @@ int ram_save_live(QEMUFile *f, int stage, void *opaque)
     bwidth = qemu_get_clock_ns(rt_clock);

     while ((ret = qemu_file_rate_limit(f)) == 0) {
-        int bytes_sent;
-
-        bytes_sent = ram_save_block(f);
-        bytes_transferred += bytes_sent;
-        if (bytes_sent == 0) { /* no more blocks */
+        if (ram_save_block(f) == 0) { /* no more blocks */
             break;
         }
     }
@@ -381,11 +383,9 @@ int ram_save_live(QEMUFile *f, int stage, void *opaque)

     /* try transferring iterative blocks of memory */
     if (stage == 3) {
-        int bytes_sent;
-
         /* flush all remaining blocks regardless of rate limiting */
-        while ((bytes_sent = ram_save_block(f)) != 0) {
-            bytes_transferred += bytes_sent;
+        while (ram_save_block(f) != 0) {
+            /* nothing */
         }
         memory_global_dirty_log_stop();
     }
-- 
1.7.1.1
[PATCH v2 16/41] savevm: qemu_pending_size() to return pending buffered size
This will be used later by postcopy migration.

Signed-off-by: Isaku Yamahata <yamah...@valinux.co.jp>
---
 qemu-file.h |    1 +
 savevm.c    |    5 +++++
 2 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/qemu-file.h b/qemu-file.h
index a285bef..880ef4b 100644
--- a/qemu-file.h
+++ b/qemu-file.h
@@ -91,6 +91,7 @@ int qemu_get_byte(QEMUFile *f);
 int qemu_peek_byte(QEMUFile *f, int offset);
 int qemu_peek_buffer(QEMUFile *f, uint8_t *buf, int size, size_t offset);
 void qemu_file_skip(QEMUFile *f, int size);
+int qemu_pending_size(const QEMUFile *f);

 static inline unsigned int qemu_get_ubyte(QEMUFile *f)
 {
diff --git a/savevm.c b/savevm.c
index 8ad843f..2992f97 100644
--- a/savevm.c
+++ b/savevm.c
@@ -595,6 +595,11 @@ void qemu_file_skip(QEMUFile *f, int size)
     }
 }

+int qemu_pending_size(const QEMUFile *f)
+{
+    return f->buf_size - f->buf_index;
+}
+
 int qemu_peek_buffer(QEMUFile *f, uint8_t *buf, int size, size_t offset)
 {
     int pending;
-- 
1.7.1.1
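To illustrate what "pending buffered size" means here: the value is simply the distance between the fill level of the file's buffer and the current position within it. The sketch below uses a hypothetical struct io_window rather than QEMUFile (only the field names buf_size and buf_index are taken from the patch), and the skip helper is an assumption added for the example.

```c
#include <stddef.h>

/* Hypothetical buffered-file window; field names mirror the patch. */
struct io_window {
    int buf_index;  /* next byte to hand out */
    int buf_size;   /* number of valid bytes in the buffer */
};

/* Bytes that are buffered but not yet consumed. */
static int io_window_pending(const struct io_window *f)
{
    return f->buf_size - f->buf_index;
}

/* Advance by at most the pending amount; returns how far it moved. */
static int io_window_skip(struct io_window *f, int n)
{
    int pending = io_window_pending(f);
    if (n > pending) {
        n = pending;
    }
    if (n > 0) {
        f->buf_index += n;
    }
    return n > 0 ? n : 0;
}
```

A caller can use the pending count to decide whether more data is already available locally before it waits on the descriptor.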
[PATCH v2 05/41] arch_init/ram_save_live: factor out RAM_SAVE_FLAG_MEM_SIZE case
Signed-off-by: Isaku Yamahata <yamah...@valinux.co.jp>
---
 arch_init.c |   21 ++---
 migration.h |    1 +
 2 files changed, 15 insertions(+), 7 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index 36ece1d..28e5abb 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -287,6 +287,19 @@ void sort_ram_list(void)
     g_free(blocks);
 }

+void ram_save_live_mem_size(QEMUFile *f)
+{
+    RAMBlock *block;
+
+    qemu_put_be64(f, ram_bytes_total() | RAM_SAVE_FLAG_MEM_SIZE);
+
+    QLIST_FOREACH(block, &ram_list.blocks, next) {
+        qemu_put_byte(f, strlen(block->idstr));
+        qemu_put_buffer(f, (uint8_t *)block->idstr, strlen(block->idstr));
+        qemu_put_be64(f, block->length);
+    }
+}
+
 int ram_save_live(QEMUFile *f, int stage, void *opaque)
 {
     ram_addr_t addr;
@@ -321,13 +334,7 @@ int ram_save_live(QEMUFile *f, int stage, void *opaque)

         memory_global_dirty_log_start();

-        qemu_put_be64(f, ram_bytes_total() | RAM_SAVE_FLAG_MEM_SIZE);
-
-        QLIST_FOREACH(block, &ram_list.blocks, next) {
-            qemu_put_byte(f, strlen(block->idstr));
-            qemu_put_buffer(f, (uint8_t *)block->idstr, strlen(block->idstr));
-            qemu_put_be64(f, block->length);
-        }
+        ram_save_live_mem_size(f);
     }

     bytes_transferred_last = bytes_transferred;
diff --git a/migration.h b/migration.h
index 8b9509c..e2e9b43 100644
--- a/migration.h
+++ b/migration.h
@@ -78,6 +78,7 @@ uint64_t ram_bytes_total(void);
 void sort_ram_list(void);

 int ram_save_block(QEMUFile *f);
+void ram_save_live_mem_size(QEMUFile *f);
 int ram_save_live(QEMUFile *f, int stage, void *opaque);
 int ram_load(QEMUFile *f, void *opaque, int version_id);

-- 
1.7.1.1
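The byte layout that ram_save_live_mem_size() emits can be made concrete with a small encoder that writes into a memory buffer: one big-endian 64-bit word holding the total RAM size OR'ed with the MEM_SIZE flag, then for each block a one-byte id length, the id bytes, and a big-endian 64-bit block length. This is a sketch only: struct sketch_block, put_be64 and encode_mem_size are hypothetical names, and the flag value 0x04 is an assumption, since the patch does not show the definition of RAM_SAVE_FLAG_MEM_SIZE.

```c
#include <stdint.h>
#include <string.h>

#define SKETCH_FLAG_MEM_SIZE 0x04  /* assumed value, not shown in the patch */

/* Hypothetical stand-in for the RAMBlock fields used by the section. */
struct sketch_block {
    const char *idstr;
    uint64_t length;
};

/* Store v in big-endian order; returns the number of bytes written. */
static size_t put_be64(uint8_t *out, uint64_t v)
{
    for (int i = 0; i < 8; i++) {
        out[i] = (uint8_t)(v >> (56 - 8 * i));
    }
    return 8;
}

/*
 * Encode the MEM_SIZE section into out. Returns the encoded size, or 0 if
 * the buffer is too small or an id does not fit in the one-byte length.
 */
static size_t encode_mem_size(uint8_t *out, size_t cap,
                              const struct sketch_block *blocks, size_t n)
{
    uint64_t total = 0;
    size_t need = 8;

    for (size_t i = 0; i < n; i++) {
        size_t idlen = strlen(blocks[i].idstr);
        if (idlen > 255) {
            return 0;
        }
        total += blocks[i].length;
        need += 1 + idlen + 8;
    }
    if (need > cap) {
        return 0;
    }

    /* The flag shares the low bits of the page-aligned total. */
    size_t pos = put_be64(out, total | SKETCH_FLAG_MEM_SIZE);
    for (size_t i = 0; i < n; i++) {
        size_t idlen = strlen(blocks[i].idstr);
        out[pos++] = (uint8_t)idlen;
        memcpy(out + pos, blocks[i].idstr, idlen);
        pos += idlen;
        pos += put_be64(out + pos, blocks[i].length);
    }
    return pos;
}
```

The loader side walks the same layout in reverse: it reads an id and a length per block and subtracts each length from the total until the total reaches zero.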
[PATCH v2 07/41] arch_init/ram_save_live: factor out ram_save_limit
Signed-off-by: Isaku Yamahata <yamah...@valinux.co.jp>
---
 arch_init.c |   28 ++++++++++++++++------------
 migration.h |    1 +
 2 files changed, 17 insertions(+), 12 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index 900cc8e..c861e30 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -311,9 +311,23 @@ void ram_save_live_mem_size(QEMUFile *f)
     }
 }

+void ram_save_memory_set_dirty(void)
+{
+    RAMBlock *block;
+
+    QLIST_FOREACH(block, &ram_list.blocks, next) {
+        ram_addr_t addr;
+        for (addr = 0; addr < block->length; addr += TARGET_PAGE_SIZE) {
+            if (!memory_region_get_dirty(block->mr, addr, TARGET_PAGE_SIZE,
+                                         DIRTY_MEMORY_MIGRATION)) {
+                memory_region_set_dirty(block->mr, addr, TARGET_PAGE_SIZE);
+            }
+        }
+    }
+}
+
 int ram_save_live(QEMUFile *f, int stage, void *opaque)
 {
-    ram_addr_t addr;
     uint64_t bytes_transferred_last;
     double bwidth = 0;
     uint64_t expected_time = 0;
@@ -327,7 +341,6 @@ int ram_save_live(QEMUFile *f, int stage, void *opaque)
     memory_global_sync_dirty_bitmap(get_system_memory());

     if (stage == 1) {
-        RAMBlock *block;
         bytes_transferred = 0;
         last_block_sent = NULL;
         last_block = NULL;
@@ -335,17 +348,8 @@ int ram_save_live(QEMUFile *f, int stage, void *opaque)
         sort_ram_list();

         /* Make sure all dirty bits are set */
-        QLIST_FOREACH(block, &ram_list.blocks, next) {
-            for (addr = 0; addr < block->length; addr += TARGET_PAGE_SIZE) {
-                if (!memory_region_get_dirty(block->mr, addr, TARGET_PAGE_SIZE,
-                                             DIRTY_MEMORY_MIGRATION)) {
-                    memory_region_set_dirty(block->mr, addr, TARGET_PAGE_SIZE);
-                }
-            }
-        }
-
+        ram_save_memory_set_dirty();
         memory_global_dirty_log_start();
-
         ram_save_live_mem_size(f);
     }

diff --git a/migration.h b/migration.h
index e2e9b43..6cf4512 100644
--- a/migration.h
+++ b/migration.h
@@ -78,6 +78,7 @@ uint64_t ram_bytes_total(void);
 void sort_ram_list(void);

 int ram_save_block(QEMUFile *f);
+void ram_save_memory_set_dirty(void);
 void ram_save_live_mem_size(QEMUFile *f);
 int ram_save_live(QEMUFile *f, int stage, void *opaque);
 int ram_load(QEMUFile *f, void *opaque, int version_id);
-- 
1.7.1.1
[PATCH v2 09/41] arch_init: introduce helper function to find ram block with id string
Signed-off-by: Isaku Yamahata <yamah...@valinux.co.jp>
---
 arch_init.c |   13 +++++++++++++
 arch_init.h |    1 +
 2 files changed, 14 insertions(+), 0 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index bb0cd52..9981abe 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -397,6 +397,19 @@ int ram_save_live(QEMUFile *f, int stage, void *opaque)
     return (stage == 2) && (expected_time <= migrate_max_downtime());
 }

+RAMBlock *ram_find_block(const char *id, uint8_t len)
+{
+    RAMBlock *block;
+
+    QLIST_FOREACH(block, &ram_list.blocks, next) {
+        if (!strncmp(id, block->idstr, len)) {
+            return block;
+        }
+    }
+
+    return NULL;
+}
+
 void *ram_load_host_from_stream_offset(QEMUFile *f,
                                        ram_addr_t offset,
                                        int flags,
diff --git a/arch_init.h b/arch_init.h
index 507f110..7f5c77a 100644
--- a/arch_init.h
+++ b/arch_init.h
@@ -41,6 +41,7 @@ int xen_available(void);

 #if defined(NEED_CPU_H) && !defined(CONFIG_USER_ONLY)
 int ram_save_page(QEMUFile *f, RAMBlock *block, ram_addr_t offset);
+RAMBlock *ram_find_block(const char *id, uint8_t len);
 void *ram_load_host_from_stream_offset(QEMUFile *f,
                                        ram_addr_t offset,
                                        int flags,
-- 
1.7.1.1
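The lookup added here compares only the first len bytes of each block id. A small standalone sketch of the same idea, using a hypothetical singly linked struct id_block instead of RAMBlock and QLIST, shows what that implies; find_block_exact is an illustrative variant and not part of the patch.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for RAMBlock on a simple linked list. */
struct id_block {
    char idstr[256];
    struct id_block *next;
};

/* Same comparison as the patch: the first len bytes must match. */
static struct id_block *find_block(struct id_block *head,
                                   const char *id, uint8_t len)
{
    for (struct id_block *b = head; b; b = b->next) {
        if (!strncmp(id, b->idstr, len)) {
            return b;
        }
    }
    return NULL;
}

/* Illustrative stricter variant: additionally require idstr to end at len. */
static struct id_block *find_block_exact(struct id_block *head,
                                         const char *id, uint8_t len)
{
    for (struct id_block *b = head; b; b = b->next) {
        if (!strncmp(id, b->idstr, len) && b->idstr[len] == '\0') {
            return b;
        }
    }
    return NULL;
}
```

With a length-limited strncmp(), an id such as "pc.ram" with len 6 also matches a block named "pc.ram2" if that block comes first in the list, so the result depends on list order unless ids are never prefixes of one another.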
[PATCH v2 03/41] arch_init/ram_save: introduce constant for ram save version = 4
Introduce RAM_SAVE_VERSION_ID to represent version_id for ram save format.

Signed-off-by: Isaku Yamahata <yamah...@valinux.co.jp>
---
 arch_init.c |    2 +-
 arch_init.h |    2 ++
 vl.c        |    4 ++--
 3 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index bd4e61e..2a53f58 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -410,7 +410,7 @@ int ram_load(QEMUFile *f, void *opaque, int version_id)
     int flags;
     int error;

-    if (version_id < 4 || version_id > 4) {
+    if (version_id < 4 || version_id > RAM_SAVE_VERSION_ID) {
         return -EINVAL;
     }

diff --git a/arch_init.h b/arch_init.h
index 7cc3fa7..456637d 100644
--- a/arch_init.h
+++ b/arch_init.h
@@ -37,4 +37,6 @@ int xen_available(void);
 #define RAM_SAVE_FLAG_EOS 0x10
 #define RAM_SAVE_FLAG_CONTINUE 0x20

+#define RAM_SAVE_VERSION_ID 4 /* currently version 4 */
+
 #endif
diff --git a/vl.c b/vl.c
index 23ab3a3..62dc343 100644
--- a/vl.c
+++ b/vl.c
@@ -3436,8 +3436,8 @@ int main(int argc, char **argv, char **envp)
     default_drive(default_sdcard, snapshot, machine->use_scsi,
                   IF_SD, 0, SD_OPTS);

-    register_savevm_live(NULL, "ram", 0, 4, NULL, ram_save_live, NULL,
-                         ram_load, NULL);
+    register_savevm_live(NULL, "ram", 0, RAM_SAVE_VERSION_ID, NULL,
+                         ram_save_live, NULL, ram_load, NULL);

     if (nb_numa_nodes > 0) {
         int i;
-- 
1.7.1.1
[PATCH v2 00/41] postcopy live migration
to the shmem. | V unblock -write() to tell served pages the fault handler returns the page page fault is resolved | | pages can be sent | backgroundly | | | V | write() | | V V The specified pages-piperequest to touch pages are made present by | touching guest RAM. | | | V V reply-pipe- release the cached page | madvise(MADV_REMOVE) | | V V all the pages are pulled from the source | | V V the vma becomes anonymousUMEM_MAKE_VMA_ANONYMOUS (note: I'm not sure if this can be implemented or not) | | V V migration completesexit() Isaku Yamahata (41): arch_init: export sort_ram_list() and ram_save_block() arch_init: export RAM_SAVE_xxx flags for postcopy arch_init/ram_save: introduce constant for ram save version = 4 arch_init: refactor host_from_stream_offset() arch_init/ram_save_live: factor out RAM_SAVE_FLAG_MEM_SIZE case arch_init: refactor ram_save_block() arch_init/ram_save_live: factor out ram_save_limit arch_init/ram_load: refactor ram_load arch_init: introduce helper function to find ram block with id string arch_init: simplify a bit by ram_find_block() arch_init: factor out counting transferred bytes arch_init: factor out setting last_block, last_offset exec.c: factor out qemu_get_ram_ptr() exec.c: export last_ram_offset() savevm: export qemu_peek_buffer, qemu_peek_byte, qemu_file_skip savevm: qemu_pending_size() to return pending buffered size savevm, buffered_file: introduce method to drain buffer of buffered file QEMUFile: add qemu_file_fd() for later use savevm/QEMUFile: drop qemu_stdio_fd savevm/QEMUFileSocket: drop duplicated member fd savevm: rename QEMUFileSocket to QEMUFileFD, socket_close to fd_close savevm/QEMUFile: introduce qemu_fopen_fd migration.c: remove redundant line in migrate_init() migration: export migrate_fd_completed() and migrate_fd_cleanup() migration: factor out parameters into MigrationParams buffered_file: factor out buffer management logic buffered_file: Introduce QEMUFileNonblock for nonblock write buffered_file: add qemu_file to 
read/write to buffer in memory umem.h: import Linux umem.h update-linux-headers.sh: teach umem.h to update-linux-headers.sh configure: add CONFIG_POSTCOPY option savevm: add new section that is used by postcopy postcopy: introduce -postcopy and -postcopy-flags option postcopy outgoing: add -p and -n option to migrate command postcopy: introduce helper functions for postcopy postcopy: implement incoming part of postcopy live migration postcopy: implement outgoing part of postcopy live migration postcopy/outgoing: add forward, backward option to specify the size of prefault postcopy/outgoing: implement prefault migrate: add -m (movebg) option to migrate command migration/postcopy: add movebg mode Makefile.target |5 + arch_init.c | 298 --- arch_init.h | 20 + block-migration.c |8 +- buffered_file.c | 322 ++-- buffered_file.h | 32 + configure | 12 + cpu-all.h |9 + exec-obsolete.h |1 + exec.c | 87 ++- hmp-commands.hx | 18 +- hmp.c | 10 +- linux-headers/linux/umem.h | 42 + migration-exec.c| 12 +- migration-fd.c | 25 +- migration-postcopy-stub.c | 77 ++ migration-postcopy.c| 1771 +++ migration-tcp.c | 25 +- migration-unix.c| 26 +- migration.c | 97 ++- migration.h | 47 +- qapi-schema.json
[PATCH v2 06/41] arch_init: refactor ram_save_block()
Signed-off-by: Isaku Yamahata yamah...@valinux.co.jp
---
Changes v1 -> v2:
- don't refer to last_block, which can be NULL, and avoid a possible
  infinite loop.
---
 arch_init.c |   82 +-
 arch_init.h |    1 +
 2 files changed, 48 insertions(+), 35 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index 28e5abb..900cc8e 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -154,6 +154,44 @@ static int is_dup_page(uint8_t *page)
     return 1;
 }
 
+static RAMBlock *last_block_sent = NULL;
+
+int ram_save_page(QEMUFile *f, RAMBlock *block, ram_addr_t offset)
+{
+    MemoryRegion *mr = block->mr;
+    uint8_t *p;
+    int cont;
+
+    if (!memory_region_get_dirty(mr, offset, TARGET_PAGE_SIZE,
+                                 DIRTY_MEMORY_MIGRATION)) {
+        return 0;
+    }
+    memory_region_reset_dirty(mr, offset, TARGET_PAGE_SIZE,
+                              DIRTY_MEMORY_MIGRATION);
+
+    cont = (block == last_block_sent) ? RAM_SAVE_FLAG_CONTINUE : 0;
+    p = memory_region_get_ram_ptr(mr) + offset;
+    last_block_sent = block;
+
+    if (is_dup_page(p)) {
+        qemu_put_be64(f, offset | cont | RAM_SAVE_FLAG_COMPRESS);
+        if (!cont) {
+            qemu_put_byte(f, strlen(block->idstr));
+            qemu_put_buffer(f, (uint8_t *)block->idstr, strlen(block->idstr));
+        }
+        qemu_put_byte(f, *p);
+        return 1;
+    }
+
+    qemu_put_be64(f, offset | cont | RAM_SAVE_FLAG_PAGE);
+    if (!cont) {
+        qemu_put_byte(f, strlen(block->idstr));
+        qemu_put_buffer(f, (uint8_t *)block->idstr, strlen(block->idstr));
+    }
+    qemu_put_buffer(f, p, TARGET_PAGE_SIZE);
+    return TARGET_PAGE_SIZE;
+}
+
 static RAMBlock *last_block;
 static ram_addr_t last_offset;
 
@@ -162,45 +200,14 @@ int ram_save_block(QEMUFile *f)
     RAMBlock *block = last_block;
     ram_addr_t offset = last_offset;
     int bytes_sent = 0;
-    MemoryRegion *mr;
 
-    if (!block)
+    if (!block) {
         block = QLIST_FIRST(&ram_list.blocks);
+        last_block = block;
+    }
 
     do {
-        mr = block->mr;
-        if (memory_region_get_dirty(mr, offset, TARGET_PAGE_SIZE,
-                                    DIRTY_MEMORY_MIGRATION)) {
-            uint8_t *p;
-            int cont = (block == last_block) ? RAM_SAVE_FLAG_CONTINUE : 0;
-
-            memory_region_reset_dirty(mr, offset, TARGET_PAGE_SIZE,
-                                      DIRTY_MEMORY_MIGRATION);
-
-            p = memory_region_get_ram_ptr(mr) + offset;
-
-            if (is_dup_page(p)) {
-                qemu_put_be64(f, offset | cont | RAM_SAVE_FLAG_COMPRESS);
-                if (!cont) {
-                    qemu_put_byte(f, strlen(block->idstr));
-                    qemu_put_buffer(f, (uint8_t *)block->idstr,
-                                    strlen(block->idstr));
-                }
-                qemu_put_byte(f, *p);
-                bytes_sent = 1;
-            } else {
-                qemu_put_be64(f, offset | cont | RAM_SAVE_FLAG_PAGE);
-                if (!cont) {
-                    qemu_put_byte(f, strlen(block->idstr));
-                    qemu_put_buffer(f, (uint8_t *)block->idstr,
-                                    strlen(block->idstr));
-                }
-                qemu_put_buffer(f, p, TARGET_PAGE_SIZE);
-                bytes_sent = TARGET_PAGE_SIZE;
-            }
-
-            break;
-        }
+        bytes_sent = ram_save_page(f, block, offset);
 
         offset += TARGET_PAGE_SIZE;
         if (offset >= block->length) {
@@ -209,6 +216,10 @@ int ram_save_block(QEMUFile *f)
             if (!block)
                 block = QLIST_FIRST(&ram_list.blocks);
         }
+
+        if (bytes_sent > 0) {
+            break;
+        }
     } while (block != last_block || offset != last_offset);
 
     last_block = block;
@@ -318,6 +329,7 @@ int ram_save_live(QEMUFile *f, int stage, void *opaque)
     if (stage == 1) {
         RAMBlock *block;
         bytes_transferred = 0;
+        last_block_sent = NULL;
         last_block = NULL;
         last_offset = 0;
         sort_ram_list();
diff --git a/arch_init.h b/arch_init.h
index d84eac7..0a39082 100644
--- a/arch_init.h
+++ b/arch_init.h
@@ -40,6 +40,7 @@ int xen_available(void);
 #define RAM_SAVE_VERSION_ID 4 /* currently version 4 */
 
 #if defined(NEED_CPU_H) && !defined(CONFIG_USER_ONLY)
+int ram_save_page(QEMUFile *f, RAMBlock *block, ram_addr_t offset);
 void *ram_load_host_from_stream_offset(QEMUFile *f,
                                        ram_addr_t offset,
                                        int flags,
--
1.7.1.1
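The per-page stream header that ram_save_page() emits in the patch above can be illustrated in isolation. The following is only a minimal sketch, not QEMU code: it writes into a plain byte buffer instead of a QEMUFile, the SKETCH_* flag values and the function names are assumptions made for this example, and only the layout (big-endian 64-bit "offset | flags", followed by a length-prefixed block id unless the page continues the previously sent block) follows the diff.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Flag values are assumptions for this sketch, not copied from the patch. */
#define SKETCH_FLAG_COMPRESS 0x02 /* page consists of one repeated byte */
#define SKETCH_FLAG_PAGE     0x08 /* full page contents follow */
#define SKETCH_FLAG_CONTINUE 0x20 /* same RAM block as the previous page */

/* Return 1 if every byte of the page equals the first byte. */
static int sketch_is_dup_page(const uint8_t *page, size_t len)
{
    assert(page != NULL && len > 0);
    for (size_t i = 1; i < len; i++) {
        if (page[i] != page[0]) {
            return 0;
        }
    }
    return 1;
}

/* Store a 64-bit value in big-endian byte order; returns bytes written. */
static size_t sketch_put_be64(uint8_t *out, uint64_t v)
{
    for (int i = 0; i < 8; i++) {
        out[i] = (uint8_t)(v >> (56 - 8 * i));
    }
    return 8;
}

/*
 * Encode the header that precedes one page in the stream.
 * The block id string is only sent when the page does not belong to the
 * block of the previously sent page, mirroring the "cont" logic above.
 * `out` must have room for 8 + 1 + strlen(idstr) bytes.
 */
static size_t sketch_encode_page_header(uint8_t *out, uint64_t offset,
                                        const char *idstr,
                                        int same_block_as_last, int is_dup)
{
    uint64_t flags = is_dup ? SKETCH_FLAG_COMPRESS : SKETCH_FLAG_PAGE;
    size_t n;

    if (same_block_as_last) {
        flags |= SKETCH_FLAG_CONTINUE;
    }
    n = sketch_put_be64(out, offset | flags);
    if (!same_block_as_last) {
        size_t len = strlen(idstr);

        assert(len < 256); /* the length is sent as a single byte */
        out[n++] = (uint8_t)len;
        memcpy(out + n, idstr, len);
        n += len;
    }
    return n;
}
```

Because the offset is page aligned, its low bits are free to carry the flags, which is why a single 64-bit word is enough for both. Keeping a separate last_block_sent pointer, as the patch does, lets callers other than ram_save_block() send pages without corrupting the "continue" state.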
Re: Fwd: [Qemu-devel] [PATCH v2 00/41] postcopy live migration
On Mon, Jun 04, 2012 at 05:01:30AM -0700, Chegu Vinod wrote:
> Hello Isaku Yamahata,

Hi.

> I just saw your patches. Would it be possible to email me a tar bundle of
> these patches (makes it easier to apply the patches to a copy of the
> upstream qemu.git)?

I uploaded them to github for those who are interested.

git://github.com/yamahata/qemu.git qemu-postcopy-june-04-2012
git://github.com/yamahata/linux-umem.git linux-umem-june-04-2012

> BTW, I am also curious if you have considered using any kind of RDMA
> features for optimizing the page-faults during postcopy?

Yes, RDMA is an interesting topic. Can you share your use case, concerns,
and issues? Then we can collaborate.
You may want to see Benoit's results. As far as I know, he has not
published his code yet.

thanks,

> Thanks
> Vinod
>
> --
> Message: 1
> Date: Mon, 4 Jun 2012 18:57:02 +0900
> From: Isaku Yamahata yamah...@valinux.co.jp
> To: qemu-de...@nongnu.org, kvm@vger.kernel.org
> Cc: benoit.hud...@gmail.com, aarca...@redhat.com, aligu...@us.ibm.com,
>     quint...@redhat.com, stefa...@gmail.com, t.hirofu...@aist.go.jp,
>     dl...@redhat.com, satoshi.i...@aist.go.jp,
>     mdr...@linux.vnet.ibm.com, yoshikawa.tak...@oss.ntt.co.jp,
>     owass...@redhat.com, a...@redhat.com, pbonz...@redhat.com
> Subject: [Qemu-devel] [PATCH v2 00/41] postcopy live migration
> Message-ID: cover.1338802190.git.yamah...@valinux.co.jp
>
> After a long time, we have v2. This is the qemu part.
> The linux kernel part is sent separately.
>
> Changes v1 -> v2:
> - split up patches for review
> - buffered file refactored
> - many bug fixes
>   Especially, PV drivers can work with postcopy
> - optimization/heuristic
>
> Patches
> 1 - 30: refactoring existing code and preparation
> 31 - 37: implement postcopy itself (essential part)
> 38 - 41: some optimization/heuristic for postcopy
>
> Intro
> =====
> This patch series implements postcopy live migration.[1]
> As discussed at KVM forum 2011, a dedicated character device is used for
> distributed shared memory between migration source and destination.
> Now we can discuss/benchmark/compare with precopy. I believe there is
> much room for improvement.
>
> [1] http://wiki.qemu.org/Features/PostCopyLiveMigration
>
> Usage
> =====
> You need to load the umem character device on the host before starting
> migration.
> Postcopy can be used with the tcg and kvm accelerators. The implementation
> depends only on the linux umem character device, but the driver-dependent
> code is split into a file.
> I tested only the host page size == guest page size case, but the
> implementation allows the host page size != guest page size case.
>
> The following options are added with this patch series.
> - incoming part
>   command line options
>   -postcopy [-postcopy-flags <flags>]
>   where flags is for changing behavior for benchmark/debugging
>   Currently the following flags are available
>   0: default
>   1: enable touching page request
>
>   example:
>   qemu -postcopy -incoming tcp:0: -monitor stdio -machine accel=kvm
>
> - outgoing part
>   options for migrate command
>   migrate [-p [-n] [-m]] URI [prefault forward [prefault backward]]
>   -p: indicate postcopy migration
>   -n: disable background transferring pages: This is for
>       benchmark/debugging
>   -m: move background transfer of postcopy mode
>   prefault forward: The number of forward pages which are sent with
>                     on-demand
>   prefault backward: The number of backward pages which are sent with
>                      on-demand
>
>   example:
>   migrate -p -n tcp:<dest ip address>:
>   migrate -p -n -m tcp:<dest ip address>: 32 0
>
> TODO
> - benchmark/evaluation. Especially how async page fault affects the
>   result.
> - improve/optimization
>   At the moment at least what I'm aware of is
>   - making incoming socket non-blocking with thread
>     As page compression is coming, it is impractical to do a
>     non-blocking read and check if the necessary data is read.
>   - touching pages in incoming qemu process by fd handler seems
>     suboptimal. creating dedicated thread?
>   - outgoing handler seems suboptimal causing latency.
> - consider on FUSE/CUSE possibility
> - don't fork umemd, but create thread?
>
> basic postcopy work flow
>
>         qemu on the destination
>               |
>               V
>         open(/dev/umem)
>               |
>               V
>         UMEM_INIT
>               |
>               V
>         Here we have two file descriptors to
>         umem device and shmem file
>               |
>               |                                  umemd
>               |                                  daemon on the destination
>               |
>               V    create pipe to communicate
>         fork()---------------------------------------,
>               |                                      |
>               V                                      |
>         close(socket)                                V
>         close(shmem)                              mmap
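The "prefault forward" and "prefault backward" arguments of the migrate command described above amount to widening each on-demand request into a window of neighbouring pages. A minimal sketch of that window computation follows; the type and function names are hypothetical and not taken from migration-postcopy.c, and the clamping to the RAM block boundaries is an assumption about what a sender would reasonably do.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result type: a contiguous run of page indexes. */
typedef struct SketchPageRange {
    uint64_t first; /* index of the first page to send */
    uint64_t count; /* number of pages to send, always >= 1 */
} SketchPageRange;

/*
 * Given the index of the page the destination faulted on, return the
 * window of pages to send: up to `backward` pages before it and up to
 * `forward` pages after it, clamped to the block [0, block_pages).
 */
static SketchPageRange sketch_prefault_range(uint64_t fault_page,
                                             uint64_t block_pages,
                                             uint64_t forward,
                                             uint64_t backward)
{
    SketchPageRange r;
    uint64_t room_ahead;
    uint64_t last;

    assert(fault_page < block_pages);

    /* Do not run off the beginning of the block. */
    r.first = fault_page > backward ? fault_page - backward : 0;

    /* Do not run off the end of the block. */
    room_ahead = block_pages - 1 - fault_page;
    last = fault_page + (forward < room_ahead ? forward : room_ahead);

    r.count = last - r.first + 1;
    return r;
}
```

With the cover letter's example arguments "32 0", a fault on page 100 of a sufficiently large block would yield the 33 pages starting at page 100. A real sender would presumably also skip pages in the window that were already transferred; this sketch does not model that.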
Re: [PATCH v2 00/41] postcopy live migration
On Mon, Jun 04, 2012 at 08:37:04PM +0800, Anthony Liguori wrote:
> On 06/04/2012 05:57 PM, Isaku Yamahata wrote:
> > TODO
> > - benchmark/evaluation. Especially how async page fault affects the
> >   result.
>
> I don't mean to beat on a dead horse, but I really don't understand the
> point of postcopy migration other than the fact that it's possible. It's
> a lot of code and a new ABI in an area where we already have too much
> difficulty maintaining our ABI.
>
> Without a compelling real world case with supporting benchmarks for why
> we need postcopy and cannot improve precopy, I'm against merging this.

Some new results are available at
https://events.linuxfoundation.org/images/stories/pdf/lcjp2012_yamahata_postcopy.pdf

Precopy assumes that the network bandwidth is wide enough and that the
number of dirty pages converges. But that doesn't always hold true.

- planned migration
  Predictability of the total migration time is important.

- dynamic consolidation
  In cloud use cases, the resources of a physical machine are usually
  overcommitted. When a physical machine becomes overloaded, some VMs are
  moved to another physical host to balance the load.
  Precopy can't move VMs promptly, and compression makes things worse.

- inter data center migration
  With L2 over L3 technology, it has become common to create a virtual
  data center which actually spans multiple physical data centers.
  It is useful to migrate VMs across physical data centers for disaster
  recovery. The network bandwidth between DCs is narrower than in the LAN
  case, so the precopy assumption wouldn't hold.

- When the network bandwidth is limited by QoS, the precopy assumption
  doesn't hold.

thanks,
--
yamahata
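The convergence assumption mentioned above can be made concrete with a back-of-the-envelope model. This is only a sketch under stated assumptions, not something taken from the patch series or the linked slides: it assumes a constant link bandwidth and a constant guest dirty rate, both in pages per second, and the function name is hypothetical. Each precopy round resends what is currently dirty, and while that is on the wire the guest dirties more pages; the remainder shrinks only when the dirty rate is below the bandwidth.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Return the number of precopy rounds needed until at most `threshold`
 * pages remain dirty (so the guest can be stopped for the final copy),
 * or -1 if that does not happen within `max_rounds`.
 *
 * Model: sending `remaining` pages takes remaining / bandwidth seconds,
 * during which dirty_pps * (remaining / bandwidth) pages become dirty
 * again, capped at the guest's total size. Inputs are assumed small
 * enough that remaining * dirty_pps does not overflow 64 bits.
 */
static int sketch_precopy_rounds(uint64_t total_pages, uint64_t bandwidth_pps,
                                 uint64_t dirty_pps, uint64_t threshold,
                                 int max_rounds)
{
    uint64_t remaining = total_pages;

    assert(bandwidth_pps > 0);

    for (int round = 1; round <= max_rounds; round++) {
        uint64_t redirtied = remaining * dirty_pps / bandwidth_pps;

        if (redirtied > total_pages) {
            redirtied = total_pages;
        }
        remaining = redirtied;
        if (remaining <= threshold) {
            return round;
        }
    }
    return -1;
}
```

In this model, a guest of 1,000,000 pages on a link that moves 100,000 pages/s converges in three rounds when it dirties 10,000 pages/s, but never converges once the dirty rate reaches the bandwidth. Narrow inter-DC links and QoS limits lower the bandwidth term, which is the situation the bullets above describe.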
Re: Fwd: [Qemu-devel] [PATCH v2 00/41] postcopy live migration
On Mon, Jun 04, 2012 at 07:27:25AM -0700, Chegu Vinod wrote:
> On 6/4/2012 6:13 AM, Isaku Yamahata wrote:
> > On Mon, Jun 04, 2012 at 05:01:30AM -0700, Chegu Vinod wrote:
> > > Hello Isaku Yamahata,
> > Hi.
> >
> > > I just saw your patches. Would it be possible to email me a tar
> > > bundle of these patches (makes it easier to apply the patches to a
> > > copy of the upstream qemu.git)?
> > I uploaded them to github for those who are interested.
> >
> > git://github.com/yamahata/qemu.git qemu-postcopy-june-04-2012
> > git://github.com/yamahata/linux-umem.git linux-umem-june-04-2012
>
> Thanks for the pointer...
>
> > > BTW, I am also curious if you have considered using any kind of RDMA
> > > features for optimizing the page-faults during postcopy?
> > Yes, RDMA is an interesting topic. Can you share your use case,
> > concerns, and issues?
>
> Looking at large sized guests (256GB and higher) running cpu/memory
> intensive enterprise workloads. The concerns are the same, i.e. having a
> predictable total migration time, minimal downtime/freeze-time and of
> course minimal service degradation to the workload(s) in the VM or the
> co-located VMs.
>
> How large a guest have you tested your changes with, and what kind of
> workloads have you used so far?

Only up to several GB VMs. Of course, we'd like to benchmark with really
huge VMs (several hundred GB), but it's somewhat difficult.

> > Then we can collaborate.
> > You may want to see Benoit's results.
>
> Yes, I have already seen some of Benoit's results.

Great.

> Hence the question about use of RDMA techniques for post copy.

So far my implementation doesn't use RDMA.

> > As far as I know, he has not published his code yet.
>
> Thanks
> Vinod
Re: [PATCHv2-RFC 1/2] shpc: standard hot plug controller
Oh nice work.

On Mon, Feb 13, 2012 at 11:15:55AM +0200, Michael S. Tsirkin wrote:
> This adds support for SHPC interface, as defined by PCI Standard
> Hot-Plug Controller and Subsystem Specification, Rev 1.0
> http://www.pcisig.com/specifications/conventional/pci_hot_plug/SHPC_10
>
> Only SHPC integrated with a PCI-to-PCI bridge is supported,
> SHPC integrated with a host bridge would need more work.
>
> All main SHPC features are supported:
> - MRL sensor

Does this just report latch status? (It seems so.)
Do you plan to provide interfaces to manipulate the latch?

> - Attention button
> - Attention indicator
> - Power indicator
>
> Wake on hotplug and serr generation are stubbed out but unused
> as we don't have interfaces to generate these events ATM.
>
> One issue that isn't completely resolved is that qemu currently
> expects an eject interface, which SHPC does not provide: it merely
> removes the power to device and it's up to the user to remove the device
> from slot. This patch works around that by ejecting the device
> when power is removed and power LED goes off.
>
> TODO:
> - migration support
> - fix dependency on pci_internals.h

If I didn't miss the code,
- QMP command for pushing attention button.
- QMP command to get LED status
- QMP events for LED on/off

thanks,

> Signed-off-by: Michael S. Tsirkin m...@redhat.com
> ---
>  Makefile.objs |    1 +
>  hw/pci.h      |    6 +
>  hw/shpc.c     |  646 +
>  hw/shpc.h     |   40
>  qemu-common.h |    1 +
>  5 files changed, 694 insertions(+), 0 deletions(-)
>  create mode 100644 hw/shpc.c
>  create mode 100644 hw/shpc.h
>
> diff --git a/Makefile.objs b/Makefile.objs
> index 391e524..4546477 100644
> --- a/Makefile.objs
> +++ b/Makefile.objs
> @@ -195,6 +195,7 @@ hw-obj-$(CONFIG_VIRTIO_PCI) += virtio-pci.o
>  hw-obj-y += fw_cfg.o
>  hw-obj-$(CONFIG_PCI) += pci.o pci_bridge.o
>  hw-obj-$(CONFIG_PCI) += msix.o msi.o
> +hw-obj-$(CONFIG_PCI) += shpc.o
>  hw-obj-$(CONFIG_PCI) += pci_host.o pcie_host.o
>  hw-obj-$(CONFIG_PCI) += ioh3420.o xio3130_upstream.o xio3130_downstream.o
>  hw-obj-y += watchdog.o
> diff --git a/hw/pci.h b/hw/pci.h
> index 33b0b18..756577e 100644
> --- a/hw/pci.h
> +++ b/hw/pci.h
> @@ -125,6 +125,9 @@ enum {
>      /* command register SERR bit enabled */
>  #define QEMU_PCI_CAP_SERR_BITNR 4
>      QEMU_PCI_CAP_SERR = (1 << QEMU_PCI_CAP_SERR_BITNR),
> +    /* Standard hot plug controller. */
> +#define QEMU_PCI_SHPC_BITNR 5
> +    QEMU_PCI_CAP_SHPC = (1 << QEMU_PCI_SHPC_BITNR),
>  };
>
>  #define TYPE_PCI_DEVICE "pci-device"
> @@ -229,6 +232,9 @@ struct PCIDevice {
>      /* PCI Express */
>      PCIExpressDevice exp;
>
> +    /* SHPC */
> +    SHPCDevice *shpc;
> +
>      /* Location of option rom */
>      char *romfile;
>      bool has_rom;
> diff --git a/hw/shpc.c b/hw/shpc.c
> new file mode 100644
> index 000..4baec29
> --- /dev/null
> +++ b/hw/shpc.c
> @@ -0,0 +1,646 @@
> +#include <strings.h>
> +#include <stdint.h>
> +#include "range.h"
> +#include "shpc.h"
> +#include "pci.h"
> +#include "pci_internals.h"
> +
> +/* TODO: model power only and disabled slot states. */
> +/* TODO: handle SERR and wakeups */
> +/* TODO: consider enabling 66MHz support */
> +
> +/* TODO: remove fully only on state DISABLED and LED off.
> + * track state to properly record this. */
> +
> +/* SHPC Working Register Set */
> +#define SHPC_BASE_OFFSET  0x00 /* 4 bytes */
> +#define SHPC_SLOTS_33     0x04 /* 4 bytes. Also encodes PCI-X slots. */
> +#define SHPC_SLOTS_66     0x08 /* 4 bytes. */
> +#define SHPC_NSLOTS       0x0C /* 1 byte */
> +#define SHPC_FIRST_DEV    0x0D /* 1 byte */
> +#define SHPC_PHYS_SLOT    0x0E /* 2 byte */
> +#define SHPC_PHYS_NUM_MAX 0x7ff
> +#define SHPC_PHYS_NUM_UP  0x1000
> +#define SHPC_PHYS_MRL     0x4000
> +#define SHPC_PHYS_BUTTON  0x8000
> +#define SHPC_SEC_BUS      0x10 /* 2 bytes */
> +#define SHPC_SEC_BUS_33   0x0
> +#define SHPC_SEC_BUS_66   0x1 /* Unused */
> +#define SHPC_SEC_BUS_MASK 0x7
> +#define SHPC_MSI_CTL      0x12 /* 1 byte */
> +#define SHPC_PROG_IFC     0x13 /* 1 byte */
> +#define SHPC_PROG_IFC_1_0 0x1
> +#define SHPC_CMD_CODE     0x14 /* 1 byte */
> +#define SHPC_CMD_TRGT     0x15 /* 1 byte */
> +#define SHPC_CMD_TRGT_MIN 0x1
> +#define SHPC_CMD_TRGT_MAX 0x1f
> +#define SHPC_CMD_STATUS   0x16 /* 2 bytes */
> +#define SHPC_CMD_STATUS_BUSY         0x1
> +#define SHPC_CMD_STATUS_MRL_OPEN     0x2
> +#define SHPC_CMD_STATUS_INVALID_CMD  0x4
> +#define SHPC_CMD_STATUS_INVALID_MODE 0x8
> +#define SHPC_INT_LOCATOR  0x18 /* 4 bytes */
> +#define SHPC_INT_COMMAND  0x1
> +#define SHPC_SERR_LOCATOR 0x1C /* 4 bytes */
> +#define SHPC_SERR_INT     0x20 /* 4 bytes */
> +#define SHPC_INT_DIS      0x1
> +#define SHPC_SERR_DIS     0x2
> +#define SHPC_CMD_INT_DIS  0x4
> +#define SHPC_ARB_SERR_DIS 0x8
> +#define SHPC_CMD_DETECTED 0x1
> +#define SHPC_ARB_DETECTED 0x2
> + /* 4 bytes * slot # (start from 0) */
> +#define SHPC_SLOT_REG(s)  (0x24 + (s) * 4)
> + /* 2 bytes */
> +#define SHPC_SLOT_STATUS(s) (0x0 + SHPC_SLOT_REG(s))
> +
> +/* Same slot state masks are used
Re: [PATCHv2-RFC 1/2] shpc: standard hot plug controller
On Mon, Feb 13, 2012 at 01:49:32PM +0200, Michael S. Tsirkin wrote:
> On Mon, Feb 13, 2012 at 07:03:52PM +0900, Isaku Yamahata wrote:
> > Oh nice work.
> >
> > On Mon, Feb 13, 2012 at 11:15:55AM +0200, Michael S. Tsirkin wrote:
> > > This adds support for SHPC interface, as defined by PCI Standard
> > > Hot-Plug Controller and Subsystem Specification, Rev 1.0
> > > http://www.pcisig.com/specifications/conventional/pci_hot_plug/SHPC_10
> > >
> > > Only SHPC integrated with a PCI-to-PCI bridge is supported,
> > > SHPC integrated with a host bridge would need more work.
> > >
> > > All main SHPC features are supported:
> > > - MRL sensor
> >
> > Does this just report latch status? (It seems so.)
>
> What happens is that adding a device closes the latch,
> removing a device opens the latch.
> This simplifies the number of supported configurations
> significantly.
>
> > Do you plan to provide interfaces to manipulate the latch?
>
> I didn't plan to do this, and this is non-trivial.
> Do you just want this for empty slots? And why?

No, I just wondered about your plan.

> > > - Attention button
> > > - Attention indicator
> > > - Power indicator
> > >
> > > Wake on hotplug and serr generation are stubbed out but unused
> > > as we don't have interfaces to generate these events ATM.
> > >
> > > One issue that isn't completely resolved is that qemu currently
> > > expects an eject interface, which SHPC does not provide: it merely
> > > removes the power to device and it's up to the user to remove the
> > > device from slot. This patch works around that by ejecting the device
> > > when power is removed and power LED goes off.
> > >
> > > TODO:
> > > - migration support
> > > - fix dependency on pci_internals.h
> >
> > If I didn't miss the code,
> > - QMP command for pushing attention button.
> > - QMP command to get LED status
>
> It's easy to add these, so I'd accept such a patch,
> but I wonder why.

My concern is how libvirt/virt-manager (or other UI) presents slot status
to operators/users.

> > - QMP events for LED on/off
>
> There's also blink :)
>
> > thanks,
>
> I'm concerned that a guest can flood the management with such events.
> It's better to send a single LED change event, then we can suppress
> further events until next get LED status command.

Makes sense.
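The flood-avoidance idea agreed on above (emit one LED change event, then stay quiet until management reads the LED status) could look roughly like the following. This is a sketch only: the structure, the enum, and both functions are hypothetical names invented for this example, and nothing here is part of the SHPC patch or of QMP.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical LED states; "blink" is included since it was mentioned. */
enum SketchLedState { SKETCH_LED_OFF, SKETCH_LED_ON, SKETCH_LED_BLINK };

/* Hypothetical per-slot notifier state. */
typedef struct SketchLedNotifier {
    enum SketchLedState state; /* last state written by the guest */
    bool event_pending;        /* an event was sent and not yet read back */
    unsigned events_sent;      /* number of events actually emitted */
} SketchLedNotifier;

/*
 * Called when the guest changes the LED. Returns true if a management
 * event should be emitted now, false if it is suppressed because an
 * earlier event has not been followed by a status query yet.
 */
static bool sketch_led_guest_write(SketchLedNotifier *n,
                                   enum SketchLedState new_state)
{
    assert(n != NULL);

    if (new_state == n->state) {
        return false; /* no change, nothing to report */
    }
    n->state = new_state;
    if (n->event_pending) {
        return false; /* management will see the final state on query */
    }
    n->event_pending = true;
    n->events_sent++;
    return true;
}

/* Called for the "get LED status" query; re-arms event delivery. */
static enum SketchLedState sketch_led_query(SketchLedNotifier *n)
{
    assert(n != NULL);

    n->event_pending = false;
    return n->state;
}
```

With this shape, a guest toggling the LED in a tight loop costs management at most one event per status query, and the query always returns the latest state, so no transition that still matters is lost.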
Re: [PATCH 0/2][RFC] postcopy migration: Linux char device for postcopy
Very interesting. We can cooperate for better (postcopy) live migration.
The code doesn't seem to be available yet; I'm eager to see it.

On Fri, Jan 13, 2012 at 01:09:30AM +, Benoit Hudzia wrote:
> Hi,
>
> Sorry to hijack the thread like this, however I would like to inform you
> that we recently achieved a milestone in the research project I'm
> leading. We enhanced KVM in order to deliver post copy live migration
> using RDMA at kernel level.
>
> A few points on the architecture of the system:
>
> * RDMA communication engine in kernel (you can use soft iwarp or soft
>   ROCE if you don't have hardware acceleration, however we also support
>   standard RDMA enabled NICs).

Do you mean the infiniband subsystem?

> * Naturally, pages are transferred with a zero-copy protocol
> * Leverage the async page fault system.
> * Pre paging / faulting
> * No context switch as everything is handled within kernel and using
>   the page fault system.
> * Hybrid migration (pre + post copy) available

Ah, I've also been planning this.
After the pre-copy phase, is the dirty bitmap sent?

So far I've thought naively that the pre-copy phase would be finished by
the number of iterations. On the other hand, your choice is a timeout of
the pre-copy phase. Do you have a rationale, or was it just natural for
you?

> * Rely on an independent Kernel Module
> * No modification to the KVM kernel Module
> * Minimal Modification to the Qemu-Kvm code
> * We plan to add the page prioritization algo in order to optimise the
>   pre paging algo and background transfer

Where do you plan to implement it? In qemu or in your kernel module?
This algo could be shared.

thanks in advance.

> You can learn a little bit more and see a demo here:
> http://tinyurl.com/8xa2bgl
> I hope to be able to provide more detail on the design soon, as well as
> a more concrete demo of the system (live migration of VMs running large
> enterprise apps such as ERP or in-memory DB).
>
> Note: this is just a stepping stone, as the post copy live migration
> mainly enables us to validate the architecture design and code.
>
> Regards
> Benoit
>
> On 12 January 2012 13:59, Avi Kivity a...@redhat.com wrote:
> > On 01/04/2012 05:03 AM, Isaku Yamahata wrote:
> > > Yes, it's quite doable in user space(qemu) with a kernel-enhancement.
> > > And it would be easy to convert a separated daemon process into a
> > > thread in qemu.
> > >
> > > I think it should be done outside of the qemu process for some
> > > reasons.
> > > (I just repeat the same discussion at the KVM-forum because no one
> > > remembers it)
> > >
> > > - ptrace (and its variant)
> > >   Some people want to investigate guest ram on host (qemu stopped or
> > >   lively).
> > >   For example, enhance crash utility and it will attach qemu process
> > >   and debug guest kernel.
> >
> > To debug the guest kernel you don't need to stop qemu itself. I agree
> > it's a problem for qemu debugging though.
> >
> > > - core dump
> > >   qemu process may core-dump.
> > >   As postmortem analysis, people want to investigate guest RAM.
> > >   Again enhance crash utility and it will read the core file and
> > >   analyze guest kernel.
> > >   When creating core, the qemu process is already dead.
> >
> > Yes, strong point.
> >
> > > It precludes the above possibilities to handle fault in qemu process.
> >
> > I agree.
> >
> > --
> > error compiling committee.c: too many arguments to function
>
> --
> "The production of too many useful things results in too many useless
> people"

--
yamahata
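The two stopping rules for the pre-copy phase of a hybrid migration contrasted above (a fixed number of iterations versus a timeout) can be put side by side in a small policy check. The structure and function below are hypothetical and belong to neither implementation; they only sketch how both criteria could coexist, with whichever limit is reached first triggering the switch to postcopy.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical policy knobs; a value of 0 disables that criterion. */
typedef struct SketchHybridPolicy {
    unsigned max_iterations; /* stop pre-copy after this many passes */
    uint64_t timeout_ms;     /* stop pre-copy after this much wall time */
} SketchHybridPolicy;

/*
 * Decide whether the pre-copy phase should end and the migration should
 * switch to postcopy, given the progress made so far.
 */
static bool sketch_should_switch_to_postcopy(const SketchHybridPolicy *p,
                                             unsigned iterations_done,
                                             uint64_t elapsed_ms)
{
    assert(p != NULL);

    if (p->max_iterations != 0 && iterations_done >= p->max_iterations) {
        return true; /* iteration-count criterion */
    }
    if (p->timeout_ms != 0 && elapsed_ms >= p->timeout_ms) {
        return true; /* timeout criterion */
    }
    return false;
}
```

A timeout bounds the total pre-copy time directly, which matches the predictability argument made elsewhere in these threads, whereas an iteration count bounds the work but not the time when the link is slow.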
Re: [PATCH 0/2][RFC] postcopy migration: Linux char device for postcopy
One more question.
Does your architecture/implementation (in theory) allow KVM memory
features like swap, KSM, THP?

--
yamahata
Re: [PATCH 0/2][RFC] postcopy migration: Linux char device for postcopy
On Mon, Jan 02, 2012 at 06:05:51PM +0100, Andrea Arcangeli wrote:
> On Thu, Dec 29, 2011 at 06:01:45PM +0200, Avi Kivity wrote:
> > On 12/29/2011 06:00 PM, Avi Kivity wrote:
> > > The NFS client has exactly the same issue, if you mount it with the
> > > intr option. In fact you could use the NFS client as a trivial
> > > umem/cuse prototype.
> >
> > Actually, NFS can return SIGBUS, it doesn't care about restarting
> > daemons.
>
> During KVMForum I suggested to a few people that it could be done
> entirely in userland with PROT_NONE. So the problem is if we do it in
> userland with the current functionality you'll run out of VMAs and
> slowdown performance too much.
>
> But all you need is the ability to map single pages in the address
> space. The only special requirement is that a new vma must not be
> created during the map operation. It'd be very similar to
> remap_file_pages for MAP_SHARED, it also was created to avoid having to
> create new vmas on a large MAP_SHARED mapping and no other reason at
> all. In our case we deal with a large MAP_ANONYMOUS mapping and we must
> alter the pte without creating new vmas but the problem is very similar
> to remap_file_pages.
>
> Qemu in the dst node can do:
>
>     mmap(MAP_ANONYMOUS)
>     fault_area_prepare(start, end, signalnr)
>
> prepare_fault_area will map the range with the magic pte.
>
> Then when the signalnr fires, you do:
>
>     send(givemepageX)
>     recv(tmpaddr_aligned, PAGE_SIZE,...);
>     fault_area_map(final_dest_aligned, tmpaddr_aligned, size)
>
> map_fault_area will check the pgprot of the two vmas mapping
> final_dest_aligned and tmpaddr_aligned have the same vma->vm_pgprot and
> various other vma bits, and if all ok, it'll just copy the pte from
> tmpaddr_aligned, to final_dest_aligned and it'll update the
> page->index. It can fail if the page is shared to avoid dealing with
> the non-linearity of the page mapped in multiple vmas.
>
> You basically need a bypass to avoid altering the pgprot of the vma,
> and enter into the pte a magic thing that fires signal handlers if
> accessed, without having to create new vmas.
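For comparison, the plain-userland variant that Andrea mentions first (PROT_NONE plus a signal handler) can be sketched with existing interfaces. This is explicitly not the proposed fault_area_prepare()/fault_area_map() interface, which does not exist; it uses mprotect(), so every filled page splits the mapping into more VMAs, which is exactly the scaling problem described above. All names are hypothetical, the "network fetch" is replaced by a local fill function, and calling mprotect() and memset() from a signal handler is not guaranteed async-signal-safe by POSIX, although it is a common pattern on Linux.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static uint8_t *sketch_ram;        /* lazily populated "guest RAM" */
static size_t sketch_page_size;
static size_t sketch_npages;
static volatile sig_atomic_t sketch_faults_served;

/* Stand-in for send(givemepageX) + recv(...): fill page i with byte i+1. */
static void sketch_fetch_page(size_t index, uint8_t *dst)
{
    memset(dst, (int)(index + 1), sketch_page_size);
}

/* Fault handler: make the touched page accessible and populate it. */
static void sketch_on_fault(int sig, siginfo_t *si, void *ctx)
{
    uintptr_t addr = (uintptr_t)si->si_addr;
    uintptr_t base = (uintptr_t)sketch_ram;
    size_t index;
    uint8_t *page;

    (void)ctx;
    if (addr < base || addr >= base + sketch_npages * sketch_page_size) {
        /* Not our region: restore the default action and let it re-fault. */
        signal(sig, SIG_DFL);
        return;
    }
    index = (addr - base) / sketch_page_size;
    page = sketch_ram + index * sketch_page_size;
    if (mprotect(page, sketch_page_size, PROT_READ | PROT_WRITE) != 0) {
        _exit(1);
    }
    sketch_fetch_page(index, page);
    sketch_faults_served++;
    /* Returning restarts the faulting instruction, which now succeeds. */
}

/* Reserve `pages` inaccessible pages and install the fault handler. */
static int sketch_lazy_ram_init(size_t pages)
{
    struct sigaction sa;

    sketch_page_size = (size_t)sysconf(_SC_PAGESIZE);
    sketch_npages = pages;
    sketch_ram = mmap(NULL, pages * sketch_page_size, PROT_NONE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (sketch_ram == MAP_FAILED) {
        return -1;
    }
    memset(&sa, 0, sizeof(sa));
    sa.sa_sigaction = sketch_on_fault;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGSEGV, &sa, NULL) != 0) {
        return -1;
    }
    return sigaction(SIGBUS, &sa, NULL); /* some systems raise SIGBUS */
}
```

After sketch_lazy_ram_init(4), the first read of any byte in a page triggers one fault and one fill; later accesses to the same page run at full speed. The proposal quoted above keeps this flow but replaces mprotect() with a pte-level operation so that the number of VMAs stays constant.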
gup/gup_fast and stuff should just always fall back into handle_mm_fault when encountering such a thing, so returning failure as if gup_fast was run on an address beyond the end of the i_size in the MAP_SHARED case.

Yes, it's quite doable in user space (qemu) with a kernel enhancement. And it would be easy to convert a separate daemon process into a thread in qemu.

I think it should be done outside of the qemu process for several reasons. (I am just repeating the discussion from the KVM forum because no one remembers it.)

- ptrace (and its variants)
  Some people want to investigate guest RAM on the host (with qemu stopped or live). For example, enhance the crash utility so that it attaches to the qemu process and debugs the guest kernel.

- core dump
  The qemu process may core-dump. For postmortem analysis, people want to investigate guest RAM. Again, enhance the crash utility so that it reads the core file and analyzes the guest kernel. When the core is created, the qemu process is already dead.

Handling the fault in the qemu process precludes the above possibilities.

THP already works on /dev/zero mmaps as long as it's a MAP_PRIVATE, KSM should work too but I doubt anybody tested it on MAP_PRIVATE of /dev/zero.

Oh great. It seems to work generally with anonymous pages of a non-anonymous VMA. Is that right? If so, THP/KSM work with mmap(MAP_PRIVATE, /dev/umem...), don't they?

The device driver provides an advantage in being self contained but I doubt it's simpler. I suppose after migration is complete you'll still switch the vma back to regular anonymous vma so leading to the same result?

Yes, it was my original intention. The page is anonymous, but the vma isn't anonymous. I was concerned that KSM/THP wouldn't work with such pages. If they work, it isn't necessary to switch the VMA to anonymous.

The patch 2/2 is small and self contained so it's quite attractive, I didn't see patch 1/2, was it posted?

Posted. It's quite short and trivial; it just does EXPORT_SYMBOL_GPL of mem_cgroup_cache_charge and shmem_zero_setup.
I include it here for convenience.

From e8bfda16a845eef4381872a331c6f0f200c3f7d7 Mon Sep 17 00:00:00 2001
Message-Id: e8bfda16a845eef4381872a331c6f0f200c3f7d7.1325055066.git.yamah...@valinux.co.jp
In-Reply-To: cover.1325055065.git.yamah...@valinux.co.jp
References: cover.1325055065.git.yamah...@valinux.co.jp
From: Isaku Yamahata yamah...@valinux.co.jp
Date: Thu, 11 Aug 2011 20:05:28 +0900
Subject: [PATCH 1/2] export necessary symbols

Signed-off-by: Isaku Yamahata yamah...@valinux.co.jp
---
 mm/memcontrol.c |    1 +
 mm/shmem.c      |    1 +
 2 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index b63f5f7..85530fc 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2807,6 +2807,7 @@ int mem_cgroup_cache_charge(struct page *page, struct mm_struct *mm,
 	return ret;
 }
+EXPORT_SYMBOL_GPL(mem_cgroup_cache_charge
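For reference, the pure-userland PROT_NONE approach mentioned earlier in this message can be sketched roughly as follows. This is only a minimal illustration under assumptions, not the proposed fault_area_prepare/fault_area_map interface (which does not exist): the names postcopy_area_init, postcopy_area_read and fetch_page are made up, fetch_page fills the page locally instead of pulling it from the source host, and calling mprotect() from a signal handler is relied on to work on Linux rather than being guaranteed async-signal-safe by POSIX.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static uint8_t *area;                       /* start of the PROT_NONE range */
static size_t area_len;                     /* length of the range in bytes */
static size_t page_size;
static volatile sig_atomic_t faults_served; /* pages filled so far */

/* Hypothetical stand-in for "send(givemepageX); recv(...)": a real
 * destination would pull the page contents from the source host. */
static void fetch_page(uint8_t *page, size_t index)
{
    memset(page, (int)(index + 1), page_size);
}

/* SIGSEGV handler: make the faulting page accessible and fill it. */
static void on_segv(int sig, siginfo_t *info, void *ctx)
{
    uintptr_t addr = (uintptr_t)info->si_addr;
    uintptr_t base = (uintptr_t)area;
    uint8_t *page;

    (void)sig;
    (void)ctx;
    if (addr < base || addr >= base + area_len)
        _exit(1);               /* a genuine crash outside the managed range */

    page = (uint8_t *)(addr & ~(uintptr_t)(page_size - 1));
    /* Changing protection on a single page splits the vma; this is the
     * "run out of VMAs" cost of doing it with current functionality. */
    if (mprotect(page, page_size, PROT_READ | PROT_WRITE) != 0)
        _exit(2);
    fetch_page(page, (size_t)(page - area) / page_size);
    faults_served++;
}

/* Map npages of inaccessible anonymous memory and install the handler. */
static int postcopy_area_init(size_t npages)
{
    struct sigaction sa;

    assert(npages > 0);
    page_size = (size_t)sysconf(_SC_PAGESIZE);
    area_len = npages * page_size;
    area = mmap(NULL, area_len, PROT_NONE,
                MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (area == MAP_FAILED)
        return -1;

    memset(&sa, 0, sizeof(sa));
    sa.sa_sigaction = on_segv;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGSEGV, &sa, NULL);
}

/* Read the first byte of a page; the first access faults and is served. */
static int postcopy_area_read(size_t page_index)
{
    return ((volatile uint8_t *)area)[page_index * page_size];
}

static int postcopy_faults_served(void)
{
    return (int)faults_served;
}
```

Every page that is touched ends up with its own protection, so a large guest would fragment the mapping into a very large number of vmas, which is why a kernel enhancement is being discussed.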
Re: [PATCH 21/21] postcopy: implement postcopy livemigration
On Thu, Dec 29, 2011 at 06:06:10PM +0200, Avi Kivity wrote:
On 12/29/2011 03:26 AM, Isaku Yamahata wrote:

This patch implements postcopy livemigration.

+/* RAM is allocated via umem for postcopy incoming mode */
+#define RAM_POSTCOPY_UMEM_MASK  (1 << 1)
+
 typedef struct RAMBlock {
     uint8_t *host;
     ram_addr_t offset;
@@ -485,6 +488,10 @@ typedef struct RAMBlock {
 #if defined(__linux__) && !defined(TARGET_S390X)
     int fd;
 #endif
+
+#ifdef CONFIG_POSTCOPY
+    UMem *umem;    /* for incoming postcopy mode */
+#endif
 } RAMBlock;

Is it possible to implement this via the MemoryListener API (which replaces CPUPhysMemoryClient)? This is how kvm, vhost, and xen manage their memory tables.

I'm afraid not. The three you listed above are for the outgoing part, but this case is for the incoming part. The requirement is quite different from those three. What is needed is

- get the corresponding RAMBlock and UMem from (id, idlen)
- hook ram_alloc/ram_free (or the corresponding RAM API)

thanks,
--
yamahata
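The first requirement above, finding the block for an (id, idlen) pair, might look roughly like the sketch below. DemoRAMBlock, DemoUMem and demo_find_block are simplified, hypothetical stand-ins and not QEMU's actual types or API; the sketch also assumes the id arrives as a length-counted string that is not NUL-terminated, which is why the length is compared explicitly.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified, hypothetical stand-ins for QEMU's UMem and RAMBlock. */
typedef struct DemoUMem {
    int fd;
} DemoUMem;

typedef struct DemoRAMBlock {
    char idstr[256];
    DemoUMem *umem;                 /* NULL unless backed by umem */
    struct DemoRAMBlock *next;
} DemoRAMBlock;

/* Return the block whose name matches the (id, idlen) pair, or NULL. */
static DemoRAMBlock *demo_find_block(DemoRAMBlock *head,
                                     const char *id, size_t idlen)
{
    assert(id != NULL);
    for (DemoRAMBlock *b = head; b != NULL; b = b->next) {
        /* id is not NUL-terminated, so compare length first, then bytes. */
        if (strlen(b->idstr) == idlen && memcmp(b->idstr, id, idlen) == 0)
            return b;
    }
    return NULL;
}
```

Once the block is found, the incoming side can reach its umem handle through the block, which is the reason the patch adds the pointer to RAMBlock itself.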
Re: [Qemu-devel] [PATCH 00/21][RFC] postcopy live migration
On Thu, Dec 29, 2011 at 04:39:52PM -0600, Anthony Liguori wrote:

TODO
- benchmark/evaluation. Especially how async page fault affects the result.

I'll review this series next week (Mike/Juan, please also review when you can). But we really need to think hard about whether this is the right thing to take into the tree. I worry a lot about the fact that we don't test pre-copy migration nearly enough and adding a second form just introduces more things to test. It's also not clear to me why post-copy is better. If you were going to sit down and explain to someone building a management tool when they should use pre-copy and when they should use post-copy, what would you tell them?

A concrete patch and its benchmark/evaluation results will help a lot toward a better discussion and decision (whatever decision we make). My answer is: follow the same policy as for the block device case. It supports block migration/copy-on-read/image streaming/live block copy... (some of them are under development, though). Seriously, we'll learn the best practice through evaluation and experience.

thanks,
--
yamahata
Re: [PATCH 0/2][RFC] postcopy migration: Linux char device for postcopy
On Thu, Dec 29, 2011 at 04:55:11PM +0200, Avi Kivity wrote:
On 12/29/2011 04:49 PM, Isaku Yamahata wrote:

Great, then we basically agree on list/reattach. (Maybe the identity scheme needs reconsideration.)

I guess we miscommunicated. Why is reattach needed? If you have the fd, nothing else is needed.

What if a malicious process closes the fd and triggers a page fault intentionally? The unkillable process issue remains. I think we are talking not only about the qemu case but also about the general case.

It's not unkillable. If you sleep with TASK_INTERRUPTIBLE then you can process signals. This includes SIGKILL.

Hmm, you said that the fault handler doesn't resolve the page fault.

Don't resolve the page fault. It's up to the user/system to make sure it happens. qemu can easily do it by watching for the daemon's death and respawning it.

To kill the process, the fault handler must return, resolving the fault. It must return something. What do you expect? VM_FAULT_SIGBUS? Zero page?

--
yamahata
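The "watch for the daemon's death and respawn it" idea discussed above could be sketched in userspace roughly as follows. This is only an illustration under assumptions: run_with_respawn and flaky_daemon are hypothetical names, the "daemon" is just a function run in a forked child, and waitpid() is used in place of a dedicated supervising thread.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

typedef int (*daemon_fn)(int attempt);

/* Hypothetical daemon body: dies on its first two launches. */
static int flaky_daemon(int attempt)
{
    return attempt < 2 ? 1 : 0;
}

/* Launch fn in a child and relaunch it whenever it dies abnormally.
 * Returns the number of launches needed, or -1 if it never succeeded. */
static int run_with_respawn(daemon_fn fn, int max_launches)
{
    assert(fn != NULL);
    for (int attempt = 0; attempt < max_launches; attempt++) {
        int status;
        pid_t pid = fork();

        if (pid < 0)
            return -1;
        if (pid == 0)
            _exit(fn(attempt));     /* child: act as the daemon */

        if (waitpid(pid, &status, 0) < 0)
            return -1;
        if (WIFEXITED(status) && WEXITSTATUS(status) == 0)
            return attempt + 1;     /* clean exit: stop supervising */
        /* abnormal exit or killed by a signal: loop and relaunch */
    }
    return -1;
}
```

The supervisor itself never touches the umem-backed memory, so it cannot be blocked by an unresolved page fault; it still needs to hold on to the fd so the relaunched daemon can pick up where the old one stopped.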
Re: [PATCH 2/2] umem: chardevice for kvm postcopy
Thank you for review.

On Thu, Dec 29, 2011 at 01:17:51PM +0200, Avi Kivity wrote:

+	default n
+	help
+	  User process backed memory driver provides /dev/umem device.
+	  The /dev/umem device is designed for some sort of distributed
+	  shared memory. Especially post-copy live migration with KVM.
+	  When in doubt, say N.
+

Need documentation of the protocol between the kernel and userspace; not just the ioctls, but also how faults are propagated.

Will do.

+
+struct umem_page_req_list {
+	struct list_head list;
+	pgoff_t pgoff;
+};
+
+
+
+static int umem_mark_page_cached(struct umem *umem,
+				 struct umem_page_cached *page_cached)
+{
+	int ret = 0;
+#define PG_MAX	((__u32)32)
+	__u64 pgoffs[PG_MAX];
+	__u32 nr;
+	unsigned long bit;
+	bool wake_up_list = false;
+
+	nr = 0;
+	while (nr < page_cached->nr) {
+		__u32 todo = min(PG_MAX, (page_cached->nr - nr));
+		int i;
+
+		if (copy_from_user(pgoffs, page_cached->pgoffs + nr,
+				   sizeof(*pgoffs) * todo)) {
+			ret = -EFAULT;
+			goto out;
+		}
+		for (i = 0; i < todo; ++i) {
+			if (pgoffs[i] >= umem->pgoff_end) {
+				ret = -EINVAL;
+				goto out;
+			}
+			set_bit(pgoffs[i], umem->cached);
+		}
+		nr += todo;
+	}
+

Probably need an smp_wmb() here.

+	spin_lock(&umem->lock);
+	bit = 0;
+	for (;;) {
+		bit = find_next_bit(umem->sync_wait_bitmap, umem->sync_req_max,
+				    bit);
+		if (bit >= umem->sync_req_max)
+			break;
+		if (test_bit(umem->sync_req[bit], umem->cached))
+			wake_up(&umem->page_wait[bit]);

Why not do this test in the loop above?

+		bit++;
+	}
+
+	if (umem->req_list_nr > 0)
+		wake_up_list = true;
+	spin_unlock(&umem->lock);
+
+	if (wake_up_list)
+		wake_up_all(&umem->req_list_wait);
+
+out:
+	return ret;
+}
+
+
+
+static void umem_put(struct umem *umem)
+{
+	int ret;
+
+	mutex_lock(&umem_list_mutex);
+	ret = kref_put(&umem->kref, umem_free);
+	if (ret == 0) {
+		mutex_unlock(&umem_list_mutex);
+	}

This looks wrong.

+}
+
+
+static int umem_create_umem(struct umem_create *create)
+{
+	int error = 0;
+	struct umem *umem = NULL;
+	struct vm_area_struct *vma;
+	int shmem_fd;
+	unsigned long bitmap_bytes;
+	unsigned long sync_bitmap_bytes;
+	int i;
+
+	umem = kzalloc(sizeof(*umem), GFP_KERNEL);
+	umem->name = create->name;
+	kref_init(&umem->kref);
+	INIT_LIST_HEAD(&umem->list);
+
+	mutex_lock(&umem_list_mutex);
+	error = umem_add_list(umem);
+	if (error) {
+		goto out;
+	}
+
+	umem->task = NULL;
+	umem->mmapped = false;
+	spin_lock_init(&umem->lock);
+	umem->size = roundup(create->size, PAGE_SIZE);
+	umem->pgoff_end = umem->size >> PAGE_SHIFT;
+	init_waitqueue_head(&umem->req_wait);
+
+	vma = &umem->vma;
+	vma->vm_start = 0;
+	vma->vm_end = umem->size;
+	/* this shmem file is used for temporal buffer for pages
+	   so it's unlikely that so many pages exists in this shmem file */
+	vma->vm_flags = VM_READ | VM_SHARED | VM_NOHUGEPAGE | VM_DONTCOPY |
+		VM_DONTEXPAND;
+	vma->vm_page_prot = vm_get_page_prot(vma->vm_flags);
+	vma->vm_pgoff = 0;
+	INIT_LIST_HEAD(&vma->anon_vma_chain);
+
+	shmem_fd = get_unused_fd();
+	if (shmem_fd < 0) {
+		error = shmem_fd;
+		goto out;
+	}
+	error = shmem_zero_setup(vma);
+	if (error < 0) {
+		put_unused_fd(shmem_fd);
+		goto out;
+	}
+	umem->shmem_filp = vma->vm_file;
+	get_file(umem->shmem_filp);
+	fd_install(shmem_fd, vma->vm_file);
+	create->shmem_fd = shmem_fd;
+
+	create->umem_fd = anon_inode_getfd("umem",
+					   &umem_fops, umem, O_RDWR);
+	if (create->umem_fd < 0) {
+		error = create->umem_fd;
+		goto out;
+	}
+
+	bitmap_bytes = umem_bitmap_bytes(umem);
+	if (bitmap_bytes > PAGE_SIZE) {
+		umem->cached = vzalloc(bitmap_bytes);
+		umem->faulted = vzalloc(bitmap_bytes);
+	} else {
+		umem->cached = kzalloc(bitmap_bytes, GFP_KERNEL);
+		umem->faulted = kzalloc(bitmap_bytes, GFP_KERNEL);
+	}
+
+	/* those constants are not exported.
+	   They are just used for default value */
+#define KVM_MAX_VCPUS	256
+#define ASYNC_PF_PER_VCPU	64

Best to avoid defaults and require userspace choose.

Okay.

+
+#define ASYNC_REQ_MAX	(ASYNC_PF_PER_VCPU * KVM_MAX_VCPUS)
+	if (create->async_req_max == 0)
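The bookkeeping done by umem_mark_page_cached above (bounds-check each page offset, then set its bit in the "cached" bitmap) can be tried out in userspace with a sketch like the one below. It is only an illustration under assumptions: struct demo_umem and the demo_* functions are hypothetical, there is no copy_from_user() batching because no user/kernel boundary is involved, and no waiters are woken.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdint.h>
#include <stdlib.h>

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical userspace stand-in for the bitmap part of struct umem. */
struct demo_umem {
    unsigned long *cached;   /* one bit per page */
    uint64_t pgoff_end;      /* number of pages covered */
};

static int demo_umem_init(struct demo_umem *u, uint64_t npages)
{
    size_t words = (size_t)((npages + BITS_PER_WORD - 1) / BITS_PER_WORD);

    u->cached = calloc(words ? words : 1, sizeof(unsigned long));
    u->pgoff_end = npages;
    return u->cached ? 0 : -ENOMEM;
}

/* Return 1 if the page has been marked cached, 0 otherwise. */
static int demo_test_cached(const struct demo_umem *u, uint64_t pgoff)
{
    assert(pgoff < u->pgoff_end);
    return (int)((u->cached[pgoff / BITS_PER_WORD]
                  >> (pgoff % BITS_PER_WORD)) & 1UL);
}

/* Mark nr page offsets as cached; stop with -EINVAL at the first offset
 * that is out of range. As in the quoted code, bits set before the bad
 * offset stay set. */
static int demo_mark_cached(struct demo_umem *u,
                            const uint64_t *pgoffs, uint32_t nr)
{
    assert(u->cached != NULL);
    for (uint32_t i = 0; i < nr; i++) {
        if (pgoffs[i] >= u->pgoff_end)
            return -EINVAL;
        u->cached[pgoffs[i] / BITS_PER_WORD] |=
            1UL << (pgoffs[i] % BITS_PER_WORD);
    }
    return 0;
}
```

In the kernel version the bits are set with set_bit() and then waiters are woken under the lock, which is why the ordering between the two steps (the smp_wmb() comment above) matters there and not in this single-threaded sketch.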
Re: [PATCH 0/2][RFC] postcopy migration: Linux char device for postcopy
On Thu, Dec 29, 2011 at 02:55:42PM +0200, Avi Kivity wrote:
On 12/29/2011 02:39 PM, Isaku Yamahata wrote:

ioctl commands:

UMEM_DEV_CREATE_UMEM: create umem device for qemu
UMEM_DEV_LIST: list created umem devices
UMEM_DEV_REATTACH: re-attach the created umem device

UMEM_DEV_LIST and UMEM_DEV_REATTACH are used when the process that services page faults disappears or gets stuck. Then, the administrator can list the umem devices and unblock the process which is waiting for a page.

Ah, I asked about this in my patch comments. I think this is done better by using SCM_RIGHTS to pass fds along, or asking qemu to launch a new process.

Can you please elaborate? I think the ways you are suggesting don't solve the issue. Let me clarify the problem.

process A (typically incoming qemu)
   |
   | mmap(/dev/umem) and access those pages triggering page faults
   | (the file descriptor might be closed after mmap() before page faults)
   |
   V
/dev/umem
   ^
   |
   |
daemon X resolving page faults triggered by process A
(typically this daemon forked from incoming qemu:process A)

If daemon X disappears accidentally, there is no one that resolves page faults of process A. At this moment process A is blocked due to a page fault. There is no file descriptor available corresponding to the VMA. Here there is no way to kill process A other than a system reboot.

qemu can have an extra thread that wait4()s the daemon, and relaunch it. This extra thread would not be blocked by the page fault. It can keep the fd so it isn't lost.

The unkillability of process A is a security issue; it could be done on purpose. Is it possible to change umem to sleep with TASK_INTERRUPTIBLE, so it can be killed?

The issue is how to resolve the page fault, not whether to use TASK_INTERRUPTIBLE or TASK_UNINTERRUPTIBLE. I can think of several options.

- When daemon X is dead, all page faults are served by zero pages.
- When daemon X is dead, all page faults are resolved as VM_FAULT_SIGBUS
- list/reattach: complications. You don't like it
- other?
Introducing a global namespace has a lot of complications attached.

UMEM_GET_PAGE_REQUEST: retrieve page fault of qemu process
UMEM_MARK_PAGE_CACHED: mark the specified pages pulled from the source for daemon
UMEM_MAKE_VMA_ANONYMOUS: make the specified vma in the qemu process anonymous
    This is _NOT_ implemented yet.
    I'm not sure whether this can be implemented or not.

How do we find out? This is fairly important, stuff like transparent hugepages and ksm only works on anonymous memory.

I agree that this is important. At KVM-forum 2011, Andrea said THP and KSM work with non-anonymous VMAs. (Or at least he'll look into that. My memory is vague, though. Please correct me if I'm wrong.)

+= Andrea (who can also provide feedback on umem in general)

--
error compiling committee.c: too many arguments to function

--
yamahata
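The SCM_RIGHTS alternative suggested above, handing an open file descriptor to another process over a Unix domain socket, can be sketched as follows. This is a minimal illustration and not part of the umem patches: send_fd and recv_fd are hypothetical helper names, and the socket is assumed to be an already connected AF_UNIX socket.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Send one file descriptor over a connected AF_UNIX socket. */
static int send_fd(int sock, int fd)
{
    char dummy = 'F';               /* at least one data byte is sent */
    struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
    union {
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;       /* forces correct alignment */
    } u;
    struct msghdr msg;
    struct cmsghdr *cmsg;

    assert(fd >= 0);
    memset(&u, 0, sizeof(u));
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = u.buf;
    msg.msg_controllen = sizeof(u.buf);

    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one file descriptor; returns the new fd or -1 on error. */
static int recv_fd(int sock)
{
    char dummy;
    struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
    union {
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;
    } u;
    struct msghdr msg;
    struct cmsghdr *cmsg;
    int fd;

    memset(&u, 0, sizeof(u));
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = u.buf;
    msg.msg_controllen = sizeof(u.buf);

    if (recvmsg(sock, &msg, 0) != 1)
        return -1;
    cmsg = CMSG_FIRSTHDR(&msg);
    if (cmsg == NULL || cmsg->cmsg_level != SOL_SOCKET ||
        cmsg->cmsg_type != SCM_RIGHTS ||
        cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
        return -1;
    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}
```

With this, whichever process still holds the umem fd could hand it to a freshly launched daemon without any global list/reattach namespace; whether that covers the case where the fd has already been closed is exactly the point under discussion.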