Re: [Qemu-devel] [PATCH v3 27/35] postcopy/outgoing: implement forward/backward prefault

2012-11-01 Isaku Yamahata
On Thu, Nov 01, 2012 at 02:10:45PM -0600, Eric Blake wrote:
> On 10/30/2012 02:33 AM, Isaku Yamahata wrote:
> > When a page is requested, the surrounding pages are also sent.
> > 
> > Signed-off-by: Isaku Yamahata <yamah...@valinux.co.jp>
> > ---
> >  hmp-commands.hx      |   15 -
> >  hmp.c                |    3 +++
> >  migration-postcopy.c |   57 +-
> >  migration.c          |   20 ++
> >  migration.h          |    2 ++
> >  qapi-schema.json     |    3 ++-
> >  6 files changed, 89 insertions(+), 11 deletions(-)
> > 
> > diff --git a/hmp-commands.hx b/hmp-commands.hx
> > index b054760..5e2c77c 100644
> > --- a/hmp-commands.hx
> > +++ b/hmp-commands.hx
> > @@ -826,26 +826,31 @@ ETEXI
> >  
> >      {
> >          .name       = "migrate",
> > -        .args_type  = "detach:-d,blk:-b,inc:-i,postcopy:-p,nobg:-n,uri:s",
> > -        .params     = "[-d] [-b] [-i] [-p [-n]] uri",
> > +        .args_type  = "detach:-d,blk:-b,inc:-i,postcopy:-p,nobg:-n,uri:s,"
> > +                      "forward:i?,backward:i?",
> > +        .params     = "[-d] [-b] [-i] [-p [-n] uri [forward] [backword]",
> 
> I don't care what we do to the 'migrate' HMP command, but for QMP...
> 
> > +++ b/qapi-schema.json
> > @@ -2095,7 +2095,8 @@
> >  ##
> >  { 'command': 'migrate',
> >    'data': {'uri': 'str', '*blk': 'bool', '*inc': 'bool', '*detach': 'bool' ,
> > -           '*postcopy': 'bool', '*nobg': 'bool'} }
> > +           '*postcopy': 'bool', '*nobg': 'bool',
> > +           '*forward': 'int', '*backward': 'int'} }
> 
> Do we really want to be adding new options to migrate (and if so,
> where's the documentation), or do we need a new monitor command similar
> to migrate-set-capabilities or migrate-set-cache-size?

Okay, migrate-set-capabilities seems usable for booleans and is scalable
for future extension.
On the other hand, migrate-set-cache-size takes only a single integer
argument, so it doesn't seem usable without modification.
How about this?

{ 'type': 'MigrationParameters',
  'data': { 'name': 'str', 'value': 'int' } }

{ 'command': 'migrate-set-parameters',
  'data': { 'parameters': ['MigrationParameters'] } }

{ 'command': 'query-migrate-parameters',
  'returns': ['MigrationParameters'] }
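
A minimal C sketch of how such a name/value list might be validated and applied, assuming the two names `forward` and `backward` from the patch under review. The `MigrationParameter` type and `migrate_set_parameters()` below are hypothetical and are not QEMU's generated QAPI code. Validating the whole list before committing anything keeps a request with one bad entry from leaving the settings half-changed.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical C mirror of one proposed name/value pair. */
typedef struct {
    const char *name;
    int64_t value;
} MigrationParameter;

/* Hypothetical storage for the two postcopy prefault settings. */
static int64_t prefault_forward;
static int64_t prefault_backward;

/* Apply n name/value pairs.  Returns 0 on success, or -1 (changing
 * nothing) if any name is unknown or any value is negative. */
static int migrate_set_parameters(const MigrationParameter *params, size_t n)
{
    int64_t fwd = prefault_forward;
    int64_t bwd = prefault_backward;

    for (size_t i = 0; i < n; i++) {
        if (params[i].value < 0) {
            return -1;
        }
        if (strcmp(params[i].name, "forward") == 0) {
            fwd = params[i].value;
        } else if (strcmp(params[i].name, "backward") == 0) {
            bwd = params[i].value;
        } else {
            return -1;  /* unknown parameter name */
        }
    }
    /* Commit only after the whole list has been validated. */
    prefault_forward = fwd;
    prefault_backward = bwd;
    return 0;
}
```

A matching `query-migrate-parameters` would simply report the stored pairs back.
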
-- 
yamahata
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3 02/35] arch_init: DPRINTF format error and typo

2012-10-30 Isaku Yamahata
Add the missing '%' in a DPRINTF format string, and
s/ram_save_live/ram_save_iterate/ in the same message.

Signed-off-by: Isaku Yamahata <yamah...@valinux.co.jp>
---
 arch_init.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch_init.c b/arch_init.c
index e6effe8..79d4041 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -659,7 +659,7 @@ static int ram_save_iterate(QEMUFile *f, void *opaque)
     qemu_put_be64(f, RAM_SAVE_FLAG_EOS);
 
     expected_downtime = ram_save_remaining() * TARGET_PAGE_SIZE / bwidth;
-    DPRINTF("ram_save_live: expected(%" PRIu64 ") <= max(" PRIu64 ")?\n",
+    DPRINTF("ram_save_iterate: expected(%" PRIu64 ") <= max(%" PRIu64 ")?\n",
             expected_downtime, migrate_max_downtime());
 
     if (expected_downtime <= migrate_max_downtime()) {
-- 
1.7.10.4

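
`PRIu64` from `<inttypes.h>` expands only to the length modifier and conversion letter (for example `"lu"` or `"llu"`, depending on the platform). Each use therefore needs its own `%` in front, and that is the character this patch adds. The sketch below builds the corrected message with `snprintf()` in place of QEMU's `DPRINTF` macro. The function name `format_downtime_check` is hypothetical.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdio.h>

/* Build the corrected debug line in buf; returns what snprintf() returns. */
static int format_downtime_check(char *buf, size_t len,
                                 uint64_t expected, uint64_t max)
{
    /* Each PRIu64 needs its own leading '%' to form a full conversion. */
    return snprintf(buf, len,
                    "ram_save_iterate: expected(%" PRIu64 ") <= max(%" PRIu64 ")?\n",
                    expected, max);
}
```

Without the second `%`, the old format printed the letters of `PRIu64` literally and never consumed the `migrate_max_downtime()` argument.
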


[PATCH v3 01/35] migration.c: remove redundant line in migrate_init()

2012-10-30 Isaku Yamahata
Signed-off-by: Isaku Yamahata <yamah...@valinux.co.jp>
---
 migration.c |    1 -
 1 file changed, 1 deletion(-)

diff --git a/migration.c b/migration.c
index 62e0304..8fcb466 100644
--- a/migration.c
+++ b/migration.c
@@ -460,7 +460,6 @@ static MigrationState *migrate_init(const MigrationParams *params)
            sizeof(enabled_capabilities));
     s->xbzrle_cache_size = xbzrle_cache_size;
 
-    s->bandwidth_limit = bandwidth_limit;
     s->state = MIG_STATE_SETUP;
     s->total_time = qemu_get_clock_ms(rt_clock);
 
-- 
1.7.10.4



[PATCH v3 06/35] osdep: add qemu_read_full() to read interrupt-safely

2012-10-30 Isaku Yamahata
This is the read counterpart of qemu_write_full().

Signed-off-by: Isaku Yamahata <yamah...@valinux.co.jp>
---
 osdep.c       |   24 
 qemu-common.h |    2 ++
 2 files changed, 26 insertions(+)

diff --git a/osdep.c b/osdep.c
index 3b25297..416ffe1 100644
--- a/osdep.c
+++ b/osdep.c
@@ -261,6 +261,30 @@ ssize_t qemu_write_full(int fd, const void *buf, size_t count)
     return total;
 }
 
+ssize_t qemu_read_full(int fd, void *buf, size_t count)
+{
+    ssize_t ret = 0;
+    ssize_t total = 0;
+
+    while (count) {
+        ret = read(fd, buf, count);
+        if (ret < 0) {
+            if (errno == EINTR)
+                continue;
+            break;
+        }
+        if (ret == 0) {
+            break;
+        }
+
+        count -= ret;
+        buf += ret;
+        total += ret;
+    }
+
+    return total;
+}
+
 /*
  * Opens a socket with FD_CLOEXEC set
  */
diff --git a/qemu-common.h b/qemu-common.h
index b54612b..16128c5 100644
--- a/qemu-common.h
+++ b/qemu-common.h
@@ -214,6 +214,8 @@ ssize_t qemu_write_full(int fd, const void *buf, size_t count)
     QEMU_WARN_UNUSED_RESULT;
 ssize_t qemu_send_full(int fd, const void *buf, size_t count, int flags)
     QEMU_WARN_UNUSED_RESULT;
+ssize_t qemu_read_full(int fd, void *buf, size_t count)
+    QEMU_WARN_UNUSED_RESULT;
 ssize_t qemu_recv_full(int fd, void *buf, size_t count, int flags)
     QEMU_WARN_UNUSED_RESULT;
 
-- 
1.7.10.4

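
`qemu_read_full()` keeps calling `read()` until `count` bytes have arrived. It retries when a signal interrupts the call (`EINTR`) and stops early at EOF (`read()` returns 0) or on any other error. The standalone sketch below implements the same loop. `read_full` is a hypothetical name, and the sketch steps through the buffer with a `char *` because the patch's `buf += ret` on a `void *` relies on a GCC extension rather than standard C.

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Read exactly count bytes unless EOF or a real error comes first.
 * Returns the number of bytes read; a short count means EOF or an
 * error (in the error case errno holds the cause). */
static ssize_t read_full(int fd, void *buf, size_t count)
{
    char *p = buf;          /* char * keeps the pointer arithmetic standard */
    ssize_t total = 0;

    while (count > 0) {
        ssize_t ret = read(fd, p, count);
        if (ret < 0) {
            if (errno == EINTR) {
                continue;   /* interrupted by a signal: retry */
            }
            break;          /* real error */
        }
        if (ret == 0) {
            break;          /* EOF */
        }
        count -= (size_t)ret;
        p += ret;
        total += ret;
    }
    return total;
}
```

Callers compare the return value with the requested count, which is why the header marks the real function `QEMU_WARN_UNUSED_RESULT`.
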


[PATCH v3 14/35] arch_init: refactor ram_save_block() and export ram_save_block()

2012-10-30 Isaku Yamahata
arch_init: factor out counting transferred bytes.
This will be used by postcopy.

Signed-off-by: Isaku Yamahata <yamah...@valinux.co.jp>
---
Changes v2 -> v3:
- manual rebase
- export ram_save_block

Changes v1 -> v2:
- don't refer to last_block, which can be NULL,
  and avoid a possible infinite loop.
---
 arch_init.c |  122 +++
 arch_init.h |    5 +++
 migration.h |    1 +
 3 files changed, 70 insertions(+), 58 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index 23717d3..ad1b01b 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -399,59 +399,77 @@ static void migration_bitmap_sync(void)
     }
 }
 
+static uint64_t bytes_transferred;
+
+/*
+ * ram_save_page: Writes a page of memory to the stream f
+ *
+ * Returns:  true:  page written
+ *           false: no page written
+ */
+static const RAMBlock *last_sent_block = NULL;
+bool ram_save_page(QEMUFile *f, RAMBlock *block, ram_addr_t offset,
+                   bool last_stage)
+{
+    MemoryRegion *mr = block->mr;
+    uint8_t *p;
+    int cont;
+    int bytes_sent = -1;
+    ram_addr_t current_addr;
+
+    if (!migration_bitmap_test_and_reset_dirty(mr, offset)) {
+        return false;
+    }
+
+    cont = (block == last_sent_block) ? RAM_SAVE_FLAG_CONTINUE : 0;
+    last_sent_block = block;
+    p = memory_region_get_ram_ptr(mr) + offset;
+    if (is_dup_page(p)) {
+        acct_info.dup_pages++;
+        save_block_hdr(f, block, offset, cont, RAM_SAVE_FLAG_COMPRESS);
+        qemu_put_byte(f, *p);
+        bytes_sent = 1;
+    } else if (migrate_use_xbzrle()) {
+        current_addr = block->offset + offset;
+        bytes_sent = save_xbzrle_page(f, p, current_addr, block,
+                                      offset, cont, last_stage);
+        if (!last_stage) {
+            p = get_cached_data(XBZRLE.cache, current_addr);
+        }
+    }
+
+    /* either we didn't send yet (we may have had XBZRLE overflow) */
+    if (bytes_sent == -1) {
+        save_block_hdr(f, block, offset, cont, RAM_SAVE_FLAG_PAGE);
+        qemu_put_buffer(f, p, TARGET_PAGE_SIZE);
+        bytes_sent = TARGET_PAGE_SIZE;
+        acct_info.norm_pages++;
+    }
+
+    bytes_transferred += bytes_sent;
+    return true;
+}
+
 /*
  * ram_save_block: Writes a page of memory to the stream f
  *
- * Returns:  0: if the page hasn't changed
- *          -1: if there are no more dirty pages
- *           n: the amount of bytes written in other case
+ * Returns: true:  there may be more dirty pages
+ *          false: if there are no more dirty pages
  */
 
-static int ram_save_block(QEMUFile *f, bool last_stage)
+bool ram_save_block(QEMUFile *f, bool last_stage)
 {
     RAMBlock *block = last_block;
     ram_addr_t offset = last_offset;
-    int bytes_sent = -1;
-    MemoryRegion *mr;
-    ram_addr_t current_addr;
+    bool wrote = false;
 
     if (!block)
         block = QLIST_FIRST(&ram_list.blocks);
 
     do {
-        mr = block->mr;
-        if (migration_bitmap_test_and_reset_dirty(mr, offset)) {
-            uint8_t *p;
-            int cont = (block == last_block) ? RAM_SAVE_FLAG_CONTINUE : 0;
-
-            p = memory_region_get_ram_ptr(mr) + offset;
-
-            if (is_dup_page(p)) {
-                acct_info.dup_pages++;
-                save_block_hdr(f, block, offset, cont, RAM_SAVE_FLAG_COMPRESS);
-                qemu_put_byte(f, *p);
-                bytes_sent = 1;
-            } else if (migrate_use_xbzrle()) {
-                current_addr = block->offset + offset;
-                bytes_sent = save_xbzrle_page(f, p, current_addr, block,
-                                              offset, cont, last_stage);
-                if (!last_stage) {
-                    p = get_cached_data(XBZRLE.cache, current_addr);
-                }
-            }
-
-            /* either we didn't send yet (we may have had XBZRLE overflow) */
-            if (bytes_sent == -1) {
-                save_block_hdr(f, block, offset, cont, RAM_SAVE_FLAG_PAGE);
-                qemu_put_buffer(f, p, TARGET_PAGE_SIZE);
-                bytes_sent = TARGET_PAGE_SIZE;
-                acct_info.norm_pages++;
-            }
-
-            /* if page is unmodified, continue to the next */
-            if (bytes_sent != 0) {
-                break;
-            }
+        wrote = ram_save_page(f, block, offset, last_stage);
+        if (wrote) {
+            break;
         }
 
         offset += TARGET_PAGE_SIZE;
@@ -466,11 +484,9 @@ static int ram_save_block(QEMUFile *f, bool last_stage)
     last_block = block;
     last_offset = offset;
 
-    return bytes_sent;
+    return wrote;
 }
 
-static uint64_t bytes_transferred;
-
 static ram_addr_t ram_save_remaining(void)
 {
     return migration_dirty_pages;
@@ -547,6 +563,7 @@ static void ram_migration_cancel(void *opaque)
 
 static void reset_ram_globals(void)
 {
+    last_sent_block = NULL;
     last_block = NULL;
     last_offset = 0;
     last_version = ram_list.version;
@@ -618,14 +635,10
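
After this refactoring, `ram_save_block()` resumes at `last_block`/`last_offset` and asks `ram_save_page()` to test and clear one page at a time. It stops at the first page actually written and returns `false` only after a full wrap-around found nothing dirty. The sketch below is a toy model of that control flow over a single flat array of pages. All names are hypothetical stand-ins; the real code walks `RAMBlock`s and the migration dirty bitmap.

```c
#include <stdbool.h>
#include <stddef.h>

#define NPAGES 8                 /* hypothetical: one block of 8 pages */

static bool dirty[NPAGES];       /* stand-in for the migration dirty bitmap */
static size_t last_offset;       /* where the previous call stopped */
static size_t pages_sent;        /* stand-in for bytes_transferred */

/* Like ram_save_page(): test-and-clear one page and "send" it if dirty. */
static bool save_page(size_t page)
{
    if (!dirty[page]) {
        return false;
    }
    dirty[page] = false;
    pages_sent++;
    return true;
}

/* Like ram_save_block(): resume at last_offset, wrap around at most once,
 * and stop at the first page actually written.  Returns false only when
 * a full pass found no dirty page. */
static bool save_block(void)
{
    size_t offset = last_offset;
    bool wrote = false;

    do {
        wrote = save_page(offset);
        if (wrote) {
            break;
        }
        offset = (offset + 1) % NPAGES;
    } while (offset != last_offset);

    last_offset = offset;
    return wrote;
}
```

Because byte accounting now lives in the per-page helper, a caller such as postcopy can send one specific page without going through the scan.
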

[PATCH v3 16/35] arch_init/ram_load: refactor ram_load

2012-10-30 Isaku Yamahata
ram_load_page() will be used by postcopy.

Signed-off-by: Isaku Yamahata <yamah...@valinux.co.jp>
---
Changes v2 -> v3:
- new
---
 arch_init.c |  137 +++
 arch_init.h |    3 ++
 2 files changed, 74 insertions(+), 66 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index 7e6d84e..c77e24d 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -721,7 +721,7 @@ static int ram_save_complete(QEMUFile *f, void *opaque)
     return 0;
 }
 
-static int load_xbzrle(QEMUFile *f, ram_addr_t addr, void *host)
+static int load_xbzrle(QEMUFile *f, void *host)
 {
     int ret, rc = 0;
     unsigned int xh_len;
@@ -792,12 +792,73 @@ static inline void *host_from_stream_offset(QEMUFile *f,
     return NULL;
 }
 
+int ram_load_mem_size(QEMUFile *f, ram_addr_t total_ram_bytes)
+{
+    /* Synchronize RAM block list */
+    char id[256];
+    ram_addr_t length;
+
+    while (total_ram_bytes) {
+        RAMBlock *block;
+        uint8_t len;
+
+        len = qemu_get_byte(f);
+        qemu_get_buffer(f, (uint8_t *)id, len);
+        id[len] = 0;
+        length = qemu_get_be64(f);
+
+        QLIST_FOREACH(block, &ram_list.blocks, next) {
+            if (!strncmp(id, block->idstr, sizeof(id))) {
+                if (block->length != length)
+                    return -EINVAL;
+                break;
+            }
+        }
+
+        if (!block) {
+            fprintf(stderr, "Unknown ramblock \"%s\", cannot "
+                    "accept migration\n", id);
+            return -EINVAL;
+        }
+
+        total_ram_bytes -= length;
+    }
+
+    return 0;
+}
+
+int ram_load_page(QEMUFile *f, void *host, int flags)
+{
+    if (flags & RAM_SAVE_FLAG_COMPRESS) {
+        uint8_t ch;
+        ch = qemu_get_byte(f);
+        memset(host, ch, TARGET_PAGE_SIZE);
+#ifndef _WIN32
+        if (ch == 0 &&
+            (!kvm_enabled() || kvm_has_sync_mmu())) {
+            qemu_madvise(host, TARGET_PAGE_SIZE, QEMU_MADV_DONTNEED);
+        }
+#endif
+    } else if (flags & RAM_SAVE_FLAG_PAGE) {
+        qemu_get_buffer(f, host, TARGET_PAGE_SIZE);
+    } else if (flags & RAM_SAVE_FLAG_XBZRLE) {
+        if (!migrate_use_xbzrle()) {
+            return -EINVAL;
+        }
+        if (load_xbzrle(f, host) < 0) {
+            return -EINVAL;
+        }
+    }
+    return 0;
+}
+
 static int ram_load(QEMUFile *f, void *opaque, int version_id)
 {
     ram_addr_t addr;
     int flags, ret = 0;
     int error;
     static uint64_t seq_iter;
+    void *host;
 
     seq_iter++;
 
@@ -813,82 +874,26 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
 
         if (flags & RAM_SAVE_FLAG_MEM_SIZE) {
             if (version_id == 4) {
-                /* Synchronize RAM block list */
-                char id[256];
-                ram_addr_t length;
-                ram_addr_t total_ram_bytes = addr;
-
-                while (total_ram_bytes) {
-                    RAMBlock *block;
-                    uint8_t len;
-
-                    len = qemu_get_byte(f);
-                    qemu_get_buffer(f, (uint8_t *)id, len);
-                    id[len] = 0;
-                    length = qemu_get_be64(f);
-
-                    QLIST_FOREACH(block, &ram_list.blocks, next) {
-                        if (!strncmp(id, block->idstr, sizeof(id))) {
-                            if (block->length != length) {
-                                ret =  -EINVAL;
-                                goto done;
-                            }
-                            break;
-                        }
-                    }
-
-                    if (!block) {
-                        fprintf(stderr, "Unknown ramblock \"%s\", cannot "
-                                "accept migration\n", id);
-                        ret = -EINVAL;
-                        goto done;
-                    }
-
-                    total_ram_bytes -= length;
+                error = ram_load_mem_size(f, addr);
+                if (error) {
+                    DPRINTF("error %d\n", error);
+                    return error;
                 }
             }
         }
 
-        if (flags & RAM_SAVE_FLAG_COMPRESS) {
-            void *host;
-            uint8_t ch;
-
-            host = host_from_stream_offset(f, addr, flags);
-            if (!host) {
-                return -EINVAL;
-            }
-
-            ch = qemu_get_byte(f);
-            memset(host, ch, TARGET_PAGE_SIZE);
-#ifndef _WIN32
-            if (ch == 0 &&
-                (!kvm_enabled() || kvm_has_sync_mmu())) {
-                qemu_madvise(host, TARGET_PAGE_SIZE, QEMU_MADV_DONTNEED);
-            }
-#endif
-        } else if (flags & RAM_SAVE_FLAG_PAGE) {
-            void *host;
-
+        if (flags & (RAM_SAVE_FLAG_COMPRESS | RAM_SAVE_FLAG_PAGE |
+                     RAM_SAVE_FLAG_XBZRLE)) {
             host = host_from_stream_offset(f, addr, flags);
             if (!host) {
                 return -EINVAL;
             }
-
-            qemu_get_buffer(f, host
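
`ram_load_page()` decodes one page record according to its flags. `RAM_SAVE_FLAG_COMPRESS` carries a single byte that fills the whole page, `RAM_SAVE_FLAG_PAGE` carries the raw page, and `RAM_SAVE_FLAG_XBZRLE` carries a delta. The self-contained sketch below covers the first two cases over an in-memory stream. The flag values, the tiny page size, and the `Stream` type are assumptions for illustration. Unlike the patch, which returns 0 in that case, this sketch rejects a record that carries neither flag.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE_SKETCH 16      /* hypothetical tiny page for the demo */
#define FLAG_COMPRESS    0x02    /* hypothetical stand-ins for the */
#define FLAG_PAGE        0x08    /* RAM_SAVE_FLAG_* constants       */

/* Minimal read cursor over an in-memory stream (stand-in for QEMUFile). */
typedef struct {
    const uint8_t *data;
    size_t pos;
} Stream;

/* Decode one page record into host, following ram_load_page()'s shape. */
static int load_page(Stream *s, uint8_t *host, int flags)
{
    if (flags & FLAG_COMPRESS) {
        /* One fill byte stands for a page of identical bytes. */
        memset(host, s->data[s->pos++], PAGE_SIZE_SKETCH);
    } else if (flags & FLAG_PAGE) {
        /* The full page follows verbatim. */
        memcpy(host, s->data + s->pos, PAGE_SIZE_SKETCH);
        s->pos += PAGE_SIZE_SKETCH;
    } else {
        return -EINVAL;          /* e.g. an XBZRLE record, not handled here */
    }
    return 0;
}
```

Separating the decode step from the address lookup (`host_from_stream_offset()`) is what lets postcopy supply its own destination buffer.
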

[PATCH v3 08/35] savevm/QEMUFile: consolidate QEMUFile functions a bit

2012-10-30 Isaku Yamahata
- add qemu_file_fd() for later use
- drop qemu_stdio_fd()
  qemu_file_fd() now replaces qemu_stdio_fd().
- savevm/QEMUFileSocket: drop the duplicated member fd
  The fd is already stored in QEMUFile, so drop the duplicated
  member QEMUFileSocket::fd.
- remove QEMUFileSocket (it becomes QEMUFileFD)

Signed-off-by: Isaku Yamahata <yamah...@valinux.co.jp>
---
 migration-exec.c |    4 ++--
 migration-fd.c   |    2 +-
 qemu-file.h      |    2 +-
 savevm.c         |   40 +++-
 4 files changed, 23 insertions(+), 25 deletions(-)

diff --git a/migration-exec.c b/migration-exec.c
index 6c97db9..95e9779 100644
--- a/migration-exec.c
+++ b/migration-exec.c
@@ -98,7 +98,7 @@ static void exec_accept_incoming_migration(void *opaque)
     QEMUFile *f = opaque;
 
     process_incoming_migration(f);
-    qemu_set_fd_handler2(qemu_stdio_fd(f), NULL, NULL, NULL, NULL);
+    qemu_set_fd_handler2(qemu_file_fd(f), NULL, NULL, NULL, NULL);
     qemu_fclose(f);
 }
 
@@ -113,7 +113,7 @@ int exec_start_incoming_migration(const char *command)
         return -errno;
     }
 
-    qemu_set_fd_handler2(qemu_stdio_fd(f), NULL,
+    qemu_set_fd_handler2(qemu_file_fd(f), NULL,
                          exec_accept_incoming_migration, NULL, f);
 
     return 0;
diff --git a/migration-fd.c b/migration-fd.c
index 7335167..b3c54e5 100644
--- a/migration-fd.c
+++ b/migration-fd.c
@@ -104,7 +104,7 @@ static void fd_accept_incoming_migration(void *opaque)
     QEMUFile *f = opaque;
 
     process_incoming_migration(f);
-    qemu_set_fd_handler2(qemu_stdio_fd(f), NULL, NULL, NULL, NULL);
+    qemu_set_fd_handler2(qemu_file_fd(f), NULL, NULL, NULL, NULL);
     qemu_fclose(f);
 }
 
diff --git a/qemu-file.h b/qemu-file.h
index 9b6dd08..bc222dc 100644
--- a/qemu-file.h
+++ b/qemu-file.h
@@ -70,7 +70,7 @@ QEMUFile *qemu_fdopen(int fd, const char *mode);
 QEMUFile *qemu_fopen_socket(int fd);
 QEMUFile *qemu_popen(FILE *popen_file, const char *mode);
 QEMUFile *qemu_popen_cmd(const char *command, const char *mode);
-int qemu_stdio_fd(QEMUFile *f);
+int qemu_file_fd(QEMUFile *f);
 int qemu_fclose(QEMUFile *f);
 int qemu_fflush(QEMUFile *f);
 void qemu_put_buffer(QEMUFile *f, const uint8_t *buf, int size);
diff --git a/savevm.c b/savevm.c
index 0c7af43..e24041b 100644
--- a/savevm.c
+++ b/savevm.c
@@ -178,6 +178,7 @@ struct QEMUFile {
     uint8_t buf[IO_BUF_SIZE];
 
     int last_error;
+    int fd; /* -1 means fd isn't associated */
 };
 
 typedef struct QEMUFileStdio
@@ -186,19 +187,18 @@ typedef struct QEMUFileStdio
     QEMUFile *file;
 } QEMUFileStdio;
 
-typedef struct QEMUFileSocket
+typedef struct QEMUFileFD
 {
-    int fd;
     QEMUFile *file;
-} QEMUFileSocket;
+} QEMUFileFD;
 
 static int socket_get_buffer(void *opaque, uint8_t *buf, int64_t pos, int size)
 {
-    QEMUFileSocket *s = opaque;
+    QEMUFileFD *s = opaque;
     ssize_t len;
 
     do {
-        len = qemu_recv(s->fd, buf, size, 0);
+        len = qemu_recv(s->file->fd, buf, size, 0);
     } while (len == -1 && socket_error() == EINTR);
 
     if (len == -1)
@@ -207,9 +207,9 @@ static int socket_get_buffer(void *opaque, uint8_t *buf, int64_t pos, int size)
     return len;
 }
 
-static int socket_close(void *opaque)
+static int fd_close(void *opaque)
 {
-    QEMUFileSocket *s = opaque;
+    QEMUFileFD *s = opaque;
     g_free(s);
     return 0;
 }
@@ -276,6 +276,7 @@ QEMUFile *qemu_popen(FILE *stdio_file, const char *mode)
         s->file = qemu_fopen_ops(s, stdio_put_buffer, NULL, stdio_pclose,
                                  NULL, NULL, NULL);
     }
+    s->file->fd = fileno(stdio_file);
     return s->file;
 }
 
@@ -291,17 +292,6 @@ QEMUFile *qemu_popen_cmd(const char *command, const char *mode)
     return qemu_popen(popen_file, mode);
 }
 
-int qemu_stdio_fd(QEMUFile *f)
-{
-    QEMUFileStdio *p;
-    int fd;
-
-    p = (QEMUFileStdio *)f->opaque;
-    fd = fileno(p->stdio_file);
-
-    return fd;
-}
-
 QEMUFile *qemu_fdopen(int fd, const char *mode)
 {
     QEMUFileStdio *s;
@@ -325,6 +315,7 @@ QEMUFile *qemu_fdopen(int fd, const char *mode)
         s->file = qemu_fopen_ops(s, stdio_put_buffer, NULL, stdio_fclose,
                                  NULL, NULL, NULL);
     }
+    s->file->fd = fd;
     return s->file;
 
 fail:
@@ -334,11 +325,11 @@ fail:
 
 QEMUFile *qemu_fopen_socket(int fd)
 {
-    QEMUFileSocket *s = g_malloc0(sizeof(QEMUFileSocket));
+    QEMUFileFD *s = g_malloc0(sizeof(QEMUFileFD));
 
-    s->fd = fd;
-    s->file = qemu_fopen_ops(s, NULL, socket_get_buffer, socket_close,
+    s->file = qemu_fopen_ops(s, NULL, socket_get_buffer, fd_close,
                              NULL, NULL, NULL);
+    s->file->fd = fd;
     return s->file;
 }
 
@@ -381,6 +372,7 @@ QEMUFile *qemu_fopen(const char *filename, const char *mode)
         s->file = qemu_fopen_ops(s, NULL, file_get_buffer, stdio_fclose,
                                  NULL, NULL, NULL);
     }
+    s->file->fd = fileno(s->stdio_file);
     return s->file;
 fail:
     g_free(s);
@@ -431,10 +423,16 @@ QEMUFile
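
The removed `qemu_stdio_fd()` cast `f->opaque` to `QEMUFileStdio`, which is only valid for stdio-backed files. Keeping the descriptor in the generic `QEMUFile` instead, with -1 meaning "no fd", lets one accessor serve every backend. The body of the new `qemu_file_fd()` is cut off in this excerpt, so the accessor below is an assumption. `SketchFile` and its helpers are hypothetical stand-ins.

```c
#include <stdlib.h>

/* Hypothetical, stripped-down stand-in for QEMUFile. */
typedef struct {
    void *opaque;   /* backend state (socket, stdio, buffer, ...) */
    int fd;         /* -1 means no fd is associated */
} SketchFile;

/* Each backend's open function records its fd (or -1) in the file. */
static SketchFile *sketch_file_new(void *opaque, int fd)
{
    SketchFile *f = calloc(1, sizeof(*f));
    if (f != NULL) {
        f->opaque = opaque;
        f->fd = fd;
    }
    return f;
}

/* Generic accessor: no backend-specific cast of opaque is needed. */
static int sketch_file_fd(const SketchFile *f)
{
    return f->fd;
}
```
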

[PATCH v3 10/35] savevm/QEMUFile: add read/write QEMUFile on memory buffer

2012-10-30 Isaku Yamahata
This will be used by the incoming side of postcopy.

Signed-off-by: Isaku Yamahata <yamah...@valinux.co.jp>
---
 qemu-file.h |    4 
 savevm.c    |   60 +++
 2 files changed, 64 insertions(+)

diff --git a/qemu-file.h b/qemu-file.h
index 94557ea..452efcd 100644
--- a/qemu-file.h
+++ b/qemu-file.h
@@ -71,6 +71,10 @@ QEMUFile *qemu_fopen_socket(int fd);
 QEMUFile *qemu_fopen_fd(int fd, const char *mode);
 QEMUFile *qemu_popen(FILE *popen_file, const char *mode);
 QEMUFile *qemu_popen_cmd(const char *command, const char *mode);
+struct QEMUFileBuf;
+typedef struct QEMUFileBuf QEMUFileBuf;
+QEMUFileBuf *qemu_fopen_buf_write(void);
+QEMUFile *qemu_fopen_buf_read(uint8_t *buf, size_t size);
 int qemu_file_fd(QEMUFile *f);
 int qemu_fclose(QEMUFile *f);
 int qemu_fflush(QEMUFile *f);
diff --git a/savevm.c b/savevm.c
index 712b7ae..7e55dce 100644
--- a/savevm.c
+++ b/savevm.c
@@ -368,6 +368,66 @@ QEMUFile *qemu_fopen_fd(int fd, const char *mode)
     return s->file;
 }
 
+struct QEMUFileBuf {
+    QEMUFile *file;
+    uint8_t *buffer;
+    size_t buffer_size;
+    size_t buffer_capacity;
+};
+
+static int buf_close(void *opaque)
+{
+    QEMUFileBuf *s = opaque;
+    g_free(s->buffer);
+    g_free(s);
+    return 0;
+}
+
+static int buf_put_buffer(void *opaque,
+                          const uint8_t *buf, int64_t pos, int size)
+{
+    QEMUFileBuf *s = opaque;
+
+    int inc = size - (s->buffer_capacity - s->buffer_size);
+    if (inc > 0) {
+        s->buffer_capacity += DIV_ROUND_UP(inc, IO_BUF_SIZE) * IO_BUF_SIZE;
+        s->buffer = g_realloc(s->buffer, s->buffer_capacity);
+    }
+    memcpy(s->buffer + s->buffer_size, buf, size);
+    s->buffer_size += size;
+
+    return size;
+}
+
+QEMUFileBuf *qemu_fopen_buf_write(void)
+{
+    QEMUFileBuf *s = g_malloc0(sizeof(*s));
+    s->file = qemu_fopen_ops(s, buf_put_buffer, NULL, buf_close,
+                             NULL, NULL, NULL);
+    return s;
+}
+
+static int buf_get_buffer(void *opaque, uint8_t *buf, int64_t pos, int size)
+{
+    QEMUFileBuf *s = opaque;
+    ssize_t len = MIN(size, s->buffer_capacity - s->buffer_size);
+    memcpy(buf, s->buffer + s->buffer_size, len);
+    s->buffer_size += len;
+    return len;
+}
+
+/* This gets the ownership of buf. */
+QEMUFile *qemu_fopen_buf_read(uint8_t *buf, size_t size)
+{
+    QEMUFileBuf *s = g_malloc0(sizeof(*s));
+    s->buffer = buf;
+    s->buffer_size = 0; /* this is used as index to read */
+    s->buffer_capacity = size;
+    s->file = qemu_fopen_ops(s, NULL, buf_get_buffer, buf_close,
+                             NULL, NULL, NULL);
+    return s->file;
+}
+
 static int file_put_buffer(void *opaque, const uint8_t *buf,
                            int64_t pos, int size)
-- 
1.7.10.4

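
`buf_put_buffer()` grows the buffer only when the incoming data does not fit, rounding the growth up to whole `IO_BUF_SIZE` chunks. The read side reuses `buffer_size` as a read cursor. The self-contained sketch below implements both directions with plain `realloc()`. The `MemBuf` type and the 32 KiB chunk size are assumptions. Unlike the patch, which relies on `g_realloc()` aborting on failure, this sketch reports the allocation failure to the caller.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define CHUNK 32768   /* assumption: plays the role of IO_BUF_SIZE */

/* Hypothetical stand-in for QEMUFileBuf. */
typedef struct {
    uint8_t *buffer;
    size_t size;       /* write mode: bytes stored; read mode: read index */
    size_t capacity;   /* write mode: bytes allocated; read mode: total bytes */
} MemBuf;

/* Append len bytes, growing the allocation in whole CHUNK multiples only
 * when the free space is too small.  Returns len, or -1 if realloc fails. */
static int membuf_put(MemBuf *s, const uint8_t *buf, size_t len)
{
    size_t room = s->capacity - s->size;

    if (len == 0) {
        return 0;
    }
    if (len > room) {
        size_t need = len - room;
        size_t newcap = s->capacity + (need + CHUNK - 1) / CHUNK * CHUNK;
        uint8_t *p = realloc(s->buffer, newcap);
        if (p == NULL) {
            return -1;
        }
        s->buffer = p;
        s->capacity = newcap;
    }
    memcpy(s->buffer + s->size, buf, len);
    s->size += len;
    return (int)len;
}

/* Copy up to len bytes from the read index; returns 0 at end of data. */
static size_t membuf_get(MemBuf *s, uint8_t *buf, size_t len)
{
    size_t avail = s->capacity - s->size;
    size_t n = len < avail ? len : avail;

    if (n > 0) {
        memcpy(buf, s->buffer + s->size, n);
        s->size += n;
    }
    return n;
}
```

A written buffer can be handed over for reading by setting the read index to 0 and the capacity to the number of bytes stored, as `qemu_fopen_buf_read()` does when it takes ownership of `buf`.
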


[PATCH v3 33/35] arch_init: export migration_bitmap_sync and helper method to get bitmap

2012-10-30 Isaku Yamahata
These migration bitmap operations will be used by postcopy.

Signed-off-by: Isaku Yamahata <yamah...@valinux.co.jp>
---
 arch_init.c |    7 ++-
 migration.h |    2 ++
 2 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/arch_init.c b/arch_init.c
index 48f45cd..49fbaff 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -345,6 +345,11 @@ void migration_bitmap_free(void)
     migration_bitmap = NULL;
 }
 
+const unsigned long *migration_bitmap_get(void)
+{
+    return migration_bitmap;
+}
+
 static inline bool migration_bitmap_test_and_reset_dirty(MemoryRegion *mr,
                                                          ram_addr_t offset)
 {
@@ -373,7 +378,7 @@ static inline bool migration_bitmap_set_dirty(MemoryRegion *mr,
     return ret;
 }
 
-static void migration_bitmap_sync(void)
+void migration_bitmap_sync(void)
 {
     RAMBlock *block;
     ram_addr_t addr;
diff --git a/migration.h b/migration.h
index 6cc3682..2801e7e 100644
--- a/migration.h
+++ b/migration.h
@@ -111,6 +111,8 @@ uint64_t ram_bytes_transferred(void);
 uint64_t ram_bytes_total(void);
 void migration_bitmap_init(void);
 void migration_bitmap_free(void);
+const unsigned long *migration_bitmap_get(void);
+void migration_bitmap_sync(void);
 
 extern SaveVMHandlers savevm_ram_handlers;
 
-- 
1.7.10.4

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3 34/35] postcopy/outgoing: introduce precopy_count parameter

2012-10-30 Isaku Yamahata
Run this number of precopy iterations before switching to postcopy mode.
The iterations themselves are implemented by the next patch.

Signed-off-by: Isaku Yamahata <yamah...@valinux.co.jp>
---
 hmp-commands.hx      |   10 ++
 hmp.c                |    2 ++
 migration-postcopy.c |    2 +-
 migration.c          |    2 ++
 migration.h          |    3 ++-
 qapi-schema.json     |    4 +++-
 qmp-commands.hx      |    2 +-
 savevm.c             |    3 ++-
 8 files changed, 19 insertions(+), 9 deletions(-)

diff --git a/hmp-commands.hx b/hmp-commands.hx
index 942f620..957bf76 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -826,9 +826,10 @@ ETEXI
 
     {
         .name       = "migrate",
-        .args_type  = "detach:-d,blk:-b,inc:-i,postcopy:-p,movebg:-m,nobg:-n,uri:s,"
-                      "forward:i?,backward:i?",
-        .params     = "[-d] [-b] [-i] [-p [-n] [-m]] uri [forward] [backword]",
+        .args_type  = "detach:-d,blk:-b,inc:-i,postcopy:-p,movebg:-m,nobg:-n,"
+                      "uri:s,precopy_count:i?,forward:i?,backward:i?",
+        .params     = "[-d] [-b] [-i] [-p [-n] [-m]] uri "
+                      "[precopy_count] [forward] [backword]",
         .help       = "migrate to URI (using -d to not wait for completion)"
                       "\n\t\t\t -b for migration without shared storage with"
                       " full copy of disk\n\t\t\t -i for migration without "
@@ -837,6 +838,7 @@ ETEXI
                       "\n\t\t\t-p for migration with postcopy mode enabled"
                       "\n\t\t\t-m for move background transfer of postcopy mode"
                       "\n\t\t\t-n for no background transfer of postcopy mode"
+                      "\n\t\t\tprecopy_count: loop of precopy when postcopy"
                       "\n\t\t\tforward: the number of pages to "
                       "forward-prefault when postcopy (default 0)"
                       "\n\t\t\tbackward: the number of pages to "
@@ -846,7 +848,7 @@ ETEXI
 
 
 STEXI
-@item migrate [-d] [-b] [-i] [-p [-n] [-m]] @var{uri} @var{forward} @var{backward}
+@item migrate [-d] [-b] [-i] [-p [-n] [-m]] @var{uri} @var{precopy_count} @var{forward} @var{backward}
 @findex migrate
 Migrate to @var{uri} (using -d to not wait for completion).
-b for migration with full copy of disk
diff --git a/hmp.c b/hmp.c
index a0bd869..be88db9 100644
--- a/hmp.c
+++ b/hmp.c
@@ -1038,6 +1038,7 @@ void hmp_migrate(Monitor *mon, const QDict *qdict)
     int postcopy = qdict_get_try_bool(qdict, "postcopy", 0);
     int movebg = qdict_get_try_bool(qdict, "movebg", 0);
     int nobg = qdict_get_try_bool(qdict, "nobg", 0);
+    int precopy_count = qdict_get_try_int(qdict, "precopy_count", 0);
     int forward = qdict_get_try_int(qdict, "forward", 0);
     int backward = qdict_get_try_int(qdict, "backward", 0);
     const char *uri = qdict_get_str(qdict, "uri");
@@ -1045,6 +1046,7 @@ void hmp_migrate(Monitor *mon, const QDict *qdict)
 
     qmp_migrate(uri, !!blk, blk, !!inc, inc, false, false,
                 !!postcopy, postcopy, !!movebg, movebg, !!nobg, nobg,
+                !!precopy_count, precopy_count,
                 !!forward, forward, !!backward, backward,
                 &err);
 if (err) {
diff --git a/migration-postcopy.c b/migration-postcopy.c
index 9298cd4..8a43c42 100644
--- a/migration-postcopy.c
+++ b/migration-postcopy.c
@@ -319,7 +319,7 @@ int postcopy_outgoing_create_read_socket(MigrationState *s)
 return 0;
 }
 
-void postcopy_outgoing_state_begin(QEMUFile *f)
+void postcopy_outgoing_state_begin(QEMUFile *f, const MigrationParams *params)
 {
     uint64_t options = 0;
     qemu_put_ubyte(f, QEMU_VM_POSTCOPY_INIT);
diff --git a/migration.c b/migration.c
index 057ea31..84ca4b3 100644
--- a/migration.c
+++ b/migration.c
@@ -513,6 +513,7 @@ void qmp_migrate(const char *uri, bool has_blk, bool blk,
                  bool has_postcopy, bool postcopy,
                  bool has_movebg, bool movebg,
                  bool has_nobg, bool nobg,
+                 bool has_precopy_count, int64_t precopy_count,
                  bool has_forward, int64_t forward,
                  bool has_backward, int64_t backward,
                  Error **errp)
@@ -527,6 +528,7 @@ void qmp_migrate(const char *uri, bool has_blk, bool blk,
     params.postcopy = postcopy;
     params.nobg = nobg;
     params.movebg = movebg;
+    params.precopy_count = precopy_count,
     params.prefault_forward = 0;
     if (has_forward) {
         if (forward < 0) {
diff --git a/migration.h b/migration.h
index 2801e7e..c4d7b0a 100644
--- a/migration.h
+++ b/migration.h
@@ -27,6 +27,7 @@ struct MigrationParams {
     bool postcopy;
     bool nobg;
     bool movebg;
+    int precopy_count;
     int64_t prefault_forward;
     int64_t prefault_backward;
 };
@@ -150,7 +151,7 @@ int64_t xbzrle_cache_resize(int64_t new_size);
 
 /* For outgoing postcopy */
 int postcopy_outgoing_create_read_socket(MigrationState *s);
-void postcopy_outgoing_state_begin(QEMUFile *f);
+void postcopy_outgoing_state_begin(QEMUFile *f, const MigrationParams
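
According to the help text, `forward` and `backward` give the number of pages to prefault after and before the faulting page while postcopy is active. The outgoing-side implementation from patch 27/35 is not part of this excerpt, so the following is only a hypothetical sketch that computes such a window in page indices and clamps it to the block. `prefault_window()` is an invented name.

```c
#include <stdint.h>

/* Hypothetical helper: for a requested page index (page < npages) in a
 * block of npages pages, compute the inclusive range [*first, *last]
 * covered by prefaulting backward pages before and forward pages after
 * it, clamped to the block boundaries. */
static void prefault_window(uint64_t page, uint64_t npages,
                            uint64_t backward, uint64_t forward,
                            uint64_t *first, uint64_t *last)
{
    /* Comparisons are ordered to avoid unsigned underflow and overflow. */
    *first = page > backward ? page - backward : 0;
    *last = npages - 1 - page > forward ? page + forward : npages - 1;
}
```

With both values at their default of 0, the window shrinks to the requested page alone.
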

[PATCH v3 31/35] arch_init: export ram_save_iterate()

2012-10-30 Isaku Yamahata
Signed-off-by: Isaku Yamahata <yamah...@valinux.co.jp>
---
 arch_init.c |   11 ---
 arch_init.h |    1 +
 2 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index f86a0b4..48f45cd 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -633,7 +633,7 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
     return 0;
 }
 
-static int ram_save_iterate(QEMUFile *f, void *opaque)
+int ram_save_iterate(QEMUFile *f)
 {
     uint64_t bytes_transferred_last;
     double bwidth = 0;
@@ -705,6 +705,11 @@ static int ram_save_iterate(QEMUFile *f, void *opaque)
     return 0;
 }
 
+static int ram_save_iterate_bwidth(QEMUFile *f, void *opaque)
+{
+    return ram_save_iterate(f);
+}
+
 static int ram_save_complete(QEMUFile *f, void *opaque)
 {
     migration_bitmap_sync();
@@ -937,7 +942,7 @@ static void ram_save_set_params(const MigrationParams *params, void *opaque)
         savevm_ram_handlers.save_live_complete =
             postcopy_outgoing_ram_save_complete;
     } else {
-        savevm_ram_handlers.save_live_iterate = ram_save_iterate;
+        savevm_ram_handlers.save_live_iterate = ram_save_iterate_bwidth;
         savevm_ram_handlers.save_live_complete = ram_save_complete;
     }
 }
@@ -945,7 +950,7 @@ static void ram_save_set_params(const MigrationParams *params, void *opaque)
 SaveVMHandlers savevm_ram_handlers = {
     .set_params = ram_save_set_params,
     .save_live_setup = ram_save_setup,
-    .save_live_iterate = ram_save_iterate,
+    .save_live_iterate = ram_save_iterate_bwidth,
     .save_live_complete = ram_save_complete,
     .load_state = ram_load_precopy,
     .cancel = ram_migration_cancel,
diff --git a/arch_init.h b/arch_init.h
index 3977ca7..966b25a 100644
--- a/arch_init.h
+++ b/arch_init.h
@@ -47,6 +47,7 @@ CpuDefinitionInfoList GCC_WEAK_DECL *arch_query_cpu_definitions(Error **errp);
 #define RAM_SAVE_VERSION_ID 4 /* currently version 4 */
 
 int ram_load_page(QEMUFile *f, void *host, int flags);
+int ram_save_iterate(QEMUFile *f);
 
 #if defined(NEED_CPU_H) && !defined(CONFIG_USER_ONLY)
 void ram_save_set_last_block(RAMBlock *block, ram_addr_t offset);
-- 
1.7.10.4

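
The patch exports `ram_save_iterate(f)` with a narrower signature so postcopy can call it directly. It keeps a static two-argument wrapper, `ram_save_iterate_bwidth()`, in the `SaveVMHandlers` table so the callback type, which also passes an `opaque` pointer, is still satisfied. The sketch below shows that adapter pattern with hypothetical types and names.

```c
#include <stddef.h>

/* Hypothetical stream type; it only counts iterations for the sketch. */
typedef struct {
    int iterations;
} File;

/* Signature a handler table expects, mirroring save_live_iterate. */
typedef int (*IterateFn)(File *f, void *opaque);

/* Exported worker with the narrower signature (like ram_save_iterate(f)). */
int iterate_worker(File *f);

int iterate_worker(File *f)
{
    f->iterations++;
    return 0;
}

/* Static adapter that satisfies IterateFn and ignores opaque. */
static int iterate_adapter(File *f, void *opaque)
{
    (void)opaque;   /* the worker does not need it */
    return iterate_worker(f);
}

/* Handler table slot, filled with the adapter rather than the worker. */
static const IterateFn save_live_iterate = iterate_adapter;
```

Both paths end up in the same worker, so the precopy handler and any direct postcopy caller share one implementation.
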


[PATCH v3 32/35] postcopy: pre+post optimization incoming side

2012-10-30 Isaku Yamahata
Signed-off-by: Isaku Yamahata <yamah...@valinux.co.jp>
---
 migration-postcopy.c |  207 +-
 1 file changed, 204 insertions(+), 3 deletions(-)

diff --git a/migration-postcopy.c b/migration-postcopy.c
index 421fb39..9298cd4 100644
--- a/migration-postcopy.c
+++ b/migration-postcopy.c
@@ -274,6 +274,9 @@ static void postcopy_outgoing_free_req(struct qemu_umem_req 
*req)
 #define QEMU_VM_POSTCOPY_INIT   0
 #define QEMU_VM_POSTCOPY_SECTION_FULL   1
 
+/* options in QEMU_VM_POSTCOPY_INIT section */
+#define POSTCOPY_OPTION_PRECOPY 1ULL
+
 /***
  * outgoing part
  */
@@ -739,6 +742,7 @@ struct PostcopyIncomingUMemDaemon {
 int nr_target_pages_per_host_page;
 int target_to_host_page_shift;
 int version_id; /* save/load format version id */
+bool precopy_enabled;
 
 QemuThread thread;
 QLIST_HEAD(, UMemBlock) blocks;
@@ -784,6 +788,7 @@ static PostcopyIncomingState state = {
 
 static PostcopyIncomingUMemDaemon umemd = {
 .state = 0,
+.precopy_enabled = false,
 .to_qemu_fd = -1,
 .to_qemu = NULL,
 .from_qemu_fd = -1,
@@ -797,6 +802,8 @@ static PostcopyIncomingUMemDaemon umemd = {
 
 static void *postcopy_incoming_umemd(void*);
 static void postcopy_incoming_qemu_handle_req(void *opaque);
+static UMemBlock *postcopy_incoming_umem_block_from_stream(
+QEMUFile *f, int flags);
 
 /* protected by qemu_mutex_lock_ramlist() */
 void postcopy_incoming_ram_free(RAMBlock *ram_block)
@@ -875,6 +882,25 @@ int postcopy_incoming_ram_load(QEMUFile *f, void *opaque, int version_id)
     return -EINVAL;
 }

+static void*
+postcopy_incoming_shmem_from_stream_offset(QEMUFile *f, ram_addr_t offset,
+                                           int flags)
+{
+    UMemBlock *block = postcopy_incoming_umem_block_from_stream(f, flags);
+    if (block == NULL) {
+        DPRINTF("error block = NULL\n");
+        return NULL;
+    }
+    return block->umem->shmem + offset;
+}
+
+static int postcopy_incoming_ram_load_precopy(QEMUFile *f, void *opaque,
+                                              int version_id)
+{
+    return ram_load(f, opaque, version_id,
+                    postcopy_incoming_shmem_from_stream_offset);
+}
+
 static void postcopy_incoming_umem_block_free(void)
 {
     UMemBlock *block;
@@ -982,6 +1008,12 @@ static int postcopy_incoming_loadvm_init(QEMUFile *f, uint32_t size)
         return -EINVAL;
     }
     options = qemu_get_be64(f);
+    if (options & POSTCOPY_OPTION_PRECOPY) {
+        options &= ~POSTCOPY_OPTION_PRECOPY;
+        umemd.precopy_enabled = true;
+    } else {
+        umemd.precopy_enabled = false;
+    }
     if (options) {
         fprintf(stderr, "unknown options 0x%"PRIx64, options);
         return -ENOSYS;
@@ -999,12 +1031,17 @@ static int postcopy_incoming_loadvm_init(QEMUFile *f, uint32_t size)
         return -ENOSYS;
     }

-    DPRINTF("detected POSTCOPY\n");
+    DPRINTF("detected POSTCOPY precopy %d\n", umemd.precopy_enabled);
     error = postcopy_incoming_prepare();
     if (error) {
         return error;
     }
-    savevm_ram_handlers.load_state = postcopy_incoming_ram_load;
+    if (umemd.precopy_enabled) {
+        savevm_ram_handlers.load_state = postcopy_incoming_ram_load_precopy;
+    } else {
+        savevm_ram_handlers.load_state = postcopy_incoming_ram_load;
+    }
+
     incoming_postcopy = true;
     return 0;
 }
@@ -1515,6 +1552,169 @@ static int postcopy_incoming_umem_ram_load(void)
     return 0;
 }

+static int postcopy_incoming_umemd_read_dirty_bitmap(
+    QEMUFile *f, const char *idstr, uint8_t idlen,
+    uint64_t block_offset, uint64_t block_length, uint64_t bitmap_length)
+{
+    UMemBlock *block;
+    uint64_t bit_start = block_offset >> TARGET_PAGE_BITS;
+    uint64_t bit_end = (block_offset + block_length) >> TARGET_PAGE_BITS;
+    uint64_t bit_offset;
+    uint8_t *buffer;
+    uint64_t index;
+
+    if ((bitmap_length % sizeof(uint64_t)) != 0) {
+        return -EINVAL;
+    }
+    QLIST_FOREACH(block, &umemd.blocks, next) {
+        if (!strncmp(block->idstr, idstr, idlen)) {
+            break;
+        }
+    }
+    if (block == NULL) {
+        return -EINVAL;
+    }
+
+    DPRINTF("bitmap %s 0x%"PRIx64" 0x%"PRIx64" 0x%"PRIx64"\n",
+            block->idstr, block_offset, block_length, bitmap_length);
+    buffer = g_malloc(bitmap_length);
+    qemu_get_buffer(f, buffer, bitmap_length);
+
+    bit_offset = bit_start & ~63;
+    index = 0;
+    while (index < bitmap_length) {
+        uint64_t bitmap;
+        int i;
+        int j;
+        int bit;
+
+        bitmap = be64_to_cpup((uint64_t*)(buffer + index));
+        for (i = 0; i < 64; i++) {
+            bit = bit_offset + i;
+            if (bit < bit_start) {
+                continue;
+            }
+            if (bit >= bit_end) {
+                break;
+            }
+            if (!(bitmap & (1ULL << i))) {
+                set_bit

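On the receiving side, `postcopy_incoming_umemd_read_dirty_bitmap()` reads the bitmap as big-endian 64-bit words whose first bit corresponds to page `bit_start & ~63`, and only considers pages in `[bit_start, bit_end)`. The hunk is cut off at `set_bit`, so what the daemon does with each page is not shown. The sketch below only demonstrates the decoding and the range walk by counting dirty pages; `load_be64()` and `count_dirty_in_range()` are hypothetical helpers, not QEMU functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Portable big-endian 64-bit load; QEMU uses be64_to_cpup() for this. */
static uint64_t load_be64(const uint8_t *p)
{
    uint64_t v = 0;
    int k;

    for (k = 0; k < 8; k++) {
        v = (v << 8) | p[k];
    }
    return v;
}

/* Count the dirty pages in [bit_start, bit_end) of a bitmap made of
 * big-endian 64-bit words whose first bit is page (bit_start & ~63).
 * Bit i of a word is page word_base + i. Returns -1 if len is not a
 * multiple of 8 bytes, mirroring the -EINVAL check in the patch. */
static long count_dirty_in_range(const uint8_t *buf, size_t len,
                                 uint64_t bit_start, uint64_t bit_end)
{
    uint64_t word_base = bit_start & ~(uint64_t)63;
    size_t index;
    long dirty = 0;

    if (len % sizeof(uint64_t) != 0) {
        return -1;
    }
    for (index = 0; index < len; index += sizeof(uint64_t)) {
        uint64_t word = load_be64(buf + index);
        int i;

        for (i = 0; i < 64; i++) {
            uint64_t page = word_base + i;

            if (page < bit_start) {
                continue;       /* bit belongs to a page before the block */
            }
            if (page >= bit_end) {
                return dirty;   /* past the block's last page */
            }
            if (word & ((uint64_t)1 << i)) {
                dirty++;
            }
        }
        word_base += 64;
    }
    return dirty;
}
```

With `bit_start = 2`, bits set for pages 0, 2 and 5 in the first word are reduced to two dirty pages, because page 0 lies before the block.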
[PATCH v3 35/35] postcopy: pre+post optimization outgoing side

2012-10-30 Thread Isaku Yamahata
Signed-off-by: Isaku Yamahata yamah...@valinux.co.jp
---
 arch_init.c          |    6 ++--
 migration-postcopy.c |   94 +++---
 migration.h          |    1 +
 3 files changed, 94 insertions(+), 7 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index 49fbaff..f9bd483 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -502,8 +502,10 @@ bool ram_save_block(QEMUFile *f, bool last_stage)
         if (offset >= block->length) {
             offset = 0;
             block = QLIST_NEXT(block, next);
-            if (!block)
+            if (!block) {
                 block = QLIST_FIRST(&ram_list.blocks);
+                migrate_get_current()->precopy_count++;
+            }
         }
     } while (block != last_block || offset != last_offset);

@@ -619,7 +621,7 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
         acct_clear();
     }

-    if (!params->postcopy) {
+    if (!(params->postcopy && params->precopy_count == 0)) {
         memory_global_dirty_log_start();
         migration_bitmap_sync();
     }
diff --git a/migration-postcopy.c b/migration-postcopy.c
index 8a43c42..3f63385 100644
--- a/migration-postcopy.c
+++ b/migration-postcopy.c
@@ -322,6 +322,10 @@ int postcopy_outgoing_create_read_socket(MigrationState *s)
 void postcopy_outgoing_state_begin(QEMUFile *f, const MigrationParams *params)
 {
     uint64_t options = 0;
+    if (params->precopy_count > 0) {
+        options |= POSTCOPY_OPTION_PRECOPY;
+    }
+
     qemu_put_ubyte(f, QEMU_VM_POSTCOPY_INIT);
     qemu_put_be32(f, sizeof(options));
     qemu_put_be64(f, options);
@@ -337,12 +341,36 @@ void postcopy_outgoing_state_complete(

 int postcopy_outgoing_ram_save_iterate(QEMUFile *f, void *opaque)
 {
-    qemu_put_be64(f, RAM_SAVE_FLAG_EOS);
-    return 1;
+    int ret;
+    MigrationState *s = migrate_get_current();
+    if (s->params.precopy_count == 0) {
+        qemu_put_be64(f, RAM_SAVE_FLAG_EOS);
+        return 1;
+    }
+
+    ret = ram_save_iterate(f);
+    if (ret < 0) {
+        return ret;
+    }
+    if (ret == 1) {
+        DPRINTF("precopy worked\n");
+        return ret;
+    }
+    if (ram_bytes_remaining() == 0) {
+        DPRINTF("no more precopy\n");
+        return 1;
+    }
+    return s->precopy_count >= s->params.precopy_count? 1: 0;
 }

 int postcopy_outgoing_ram_save_complete(QEMUFile *f, void *opaque)
 {
+    MigrationState *s = migrate_get_current();
+    if (s->params.precopy_count > 0) {
+        /* Make sure all dirty bits are set */
+        migration_bitmap_sync();
+        memory_global_dirty_log_stop();
+    }
     qemu_put_be64(f, RAM_SAVE_FLAG_EOS);
     return 0;
 }
@@ -544,6 +572,7 @@ static void postcopy_outgoing_recv_handler(void *opaque)
 PostcopyOutgoingState *postcopy_outgoing_begin(MigrationState *ms)
 {
     PostcopyOutgoingState *s = g_new(PostcopyOutgoingState, 1);
+    const RAMBlock *block;
     DPRINTF("outgoing begin\n");
     qemu_buffered_file_drain(ms->file);

@@ -553,9 +582,64 @@ PostcopyOutgoingState *postcopy_outgoing_begin(MigrationState *ms)
     s->mig_read = ms->file_read;
     s->mig_buffered_write = ms->file;

-    /* Make sure all dirty bits are set */
-    memory_global_dirty_log_stop();
-    migration_bitmap_init();
+    if (ms->params.precopy_count > 0) {
+        QEMUFile *f = ms->file;
+        uint64_t last_long =
+            BITS_TO_LONGS(last_ram_offset() >> TARGET_PAGE_BITS);
+
+        /* send dirty bitmap */
+        qemu_mutex_lock_ramlist();
+        QLIST_FOREACH(block, &ram_list.blocks, next) {
+            const unsigned long *bitmap = migration_bitmap_get();
+            uint64_t length;
+            uint64_t start;
+            uint64_t end;
+            uint64_t i;
+
+            qemu_put_byte(f, strlen(block->idstr));
+            qemu_put_buffer(f, (uint8_t *)block->idstr, strlen(block->idstr));
+            qemu_put_be64(f, block->offset);
+            qemu_put_be64(f, block->length);
+
+            start = (block->offset >> TARGET_PAGE_BITS);
+            end = (block->offset + block->length) >> TARGET_PAGE_BITS;
+
+            length = BITS_TO_LONGS(end - (start & ~63)) * sizeof(unsigned long);
+            length = DIV_ROUND_UP(length, sizeof(uint64_t)) * sizeof(uint64_t);
+            qemu_put_be64(f, length);
+            DPRINTF("dirty bitmap %s 0x%"PRIx64" 0x%"PRIx64" 0x%"PRIx64"\n",
+                    block->idstr, block->offset, block->length, length);
+
+            start /= BITS_PER_LONG;
+            end = DIV_ROUND_UP(end, BITS_PER_LONG);
+            assert(end <= last_long);
+
+            for (i = start; i < end;
+                 i += sizeof(uint64_t) / sizeof(unsigned long)) {
+                uint64_t val;
+#if HOST_LONG_BITS == 64
+                val = bitmap[i];
+#elif HOST_LONG_BITS == 32
+                if (i + 1 < last_long) {
+                    val = bitmap[i] | ((uint64_t)bitmap[i + 1] << 32);
+                } else {
+                    val = bitmap[i];
+                }
+#else
+# error unsupported
+#endif

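On the sending side, `postcopy_outgoing_begin()` emits each block's slice of the migration bitmap as 64-bit values; on a 32-bit host it combines two `unsigned long` words, with `bitmap[i]` in the low half and `bitmap[i + 1]` in the high half. The hunk stops before the value is written, but the receiver decodes each 8-byte word with `be64_to_cpup()`, so this sketch stores big-endian words. `store_be64()` and `pack_bitmap32_be64()` are hypothetical helpers, and `uint32_t` stands in for a 32-bit `unsigned long`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Store v into p[0..7] in big-endian byte order, the wire order that
 * qemu_put_be64() uses. */
static void store_be64(uint8_t *p, uint64_t v)
{
    int k;

    for (k = 7; k >= 0; k--) {
        p[k] = (uint8_t)(v & 0xff);
        v >>= 8;
    }
}

/* Pack a bitmap stored as 32-bit words (the 32-bit-host case) into
 * big-endian 64-bit words: w[i] is the low half and w[i + 1], if
 * present, the high half; a trailing odd word is zero-extended.
 * out must hold 8 * ((nwords + 1) / 2) bytes. Returns bytes written. */
static size_t pack_bitmap32_be64(const uint32_t *w, size_t nwords,
                                 uint8_t *out)
{
    size_t i;
    size_t n = 0;

    for (i = 0; i < nwords; i += 2) {
        uint64_t val = w[i];

        if (i + 1 < nwords) {
            val |= (uint64_t)w[i + 1] << 32;
        }
        store_be64(out + n, val);
        n += 8;
    }
    return n;
}
```

Packing `{0x25, 0x401, 0x1}` produces two words, `0x0000040100000025` and `0x0000000000000001`, so page `j` of the block stays at bit `j % 64` of word `j / 64` regardless of the host word size.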
[PATCH v3 29/35] postcopy/outgoing: add movebg mode(-m) to migration command

2012-10-30 Thread Isaku Yamahata
When movebg mode is enabled, the position from which background pages are
sent is moved to the page following the on-demand page.
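
A small sketch of where the background scan resumes in movebg mode, assuming 4 KiB target pages (`SKETCH_PAGE_BITS` is a stand-in for `TARGET_PAGE_BITS`). `movebg_resume_offset()` is a hypothetical helper; it reproduces the arithmetic of the `MIN()` expression in the `migration-postcopy.c` hunk of this patch, whose result is then passed to `ram_save_set_last_block()`.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_BITS 12  /* assumption: 4 KiB target pages */
#define SKETCH_PAGE_SIZE ((uint64_t)1 << SKETCH_PAGE_BITS)

/* Byte offset at which the background scan should continue after an
 * on-demand request whose last page index is last_pgoff, when `forward`
 * pages were prefaulted after it. Clamped to the block's last page,
 * like MIN(last_offset, length - TARGET_PAGE_SIZE) in the patch. */
static uint64_t movebg_resume_offset(uint64_t last_pgoff, uint64_t forward,
                                     uint64_t block_length)
{
    uint64_t offset = (last_pgoff + forward) << SKETCH_PAGE_BITS;
    uint64_t last_page = block_length - SKETCH_PAGE_SIZE;

    return offset < last_page ? offset : last_page;
}
```

For a 16-page block, a request ending at page 3 with two forward-prefaulted pages moves the scan to page 5, while a request near the end is clamped to page 15.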

Signed-off-by: Isaku Yamahata yamah...@valinux.co.jp
---
 hmp-commands.hx      |    8 +---
 hmp.c                |    3 ++-
 migration-postcopy.c |    8
 migration.c          |    5 -
 migration.h          |    1 +
 qapi-schema.json     |    2 +-
 qmp-commands.hx      |    2 +-
 savevm.c             |    1 +
 8 files changed, 23 insertions(+), 7 deletions(-)

diff --git a/hmp-commands.hx b/hmp-commands.hx
index 5e2c77c..942f620 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -826,15 +826,16 @@ ETEXI
 
     {
         .name       = "migrate",
-        .args_type  = "detach:-d,blk:-b,inc:-i,postcopy:-p,nobg:-n,uri:s,"
+        .args_type  = "detach:-d,blk:-b,inc:-i,postcopy:-p,movebg:-m,nobg:-n,uri:s,"
                       "forward:i?,backward:i?",
-        .params     = "[-d] [-b] [-i] [-p [-n] uri [forward] [backword]",
+        .params     = "[-d] [-b] [-i] [-p [-n] [-m]] uri [forward] [backword]",
         .help       = "migrate to URI (using -d to not wait for completion)"
                       "\n\t\t\t -b for migration without shared storage with"
                       " full copy of disk\n\t\t\t -i for migration without "
                       "shared storage with incremental copy of disk "
                       "(base image shared between src and destination)"
                       "\n\t\t\t-p for migration with postcopy mode enabled"
+                      "\n\t\t\t-m for move background transfer of postcopy mode"
                       "\n\t\t\t-n for no background transfer of postcopy mode"
                       "\n\t\t\tforward: the number of pages to "
                       "forward-prefault when postcopy (default 0)"
@@ -845,12 +846,13 @@ ETEXI
 
 
 STEXI
-@item migrate [-d] [-b] [-i] [-p [-n]] @var{uri} @var{forward} @var{backward}
+@item migrate [-d] [-b] [-i] [-p [-n] [-m]] @var{uri} @var{forward} @var{backward}
 @findex migrate
 Migrate to @var{uri} (using -d to not wait for completion).
-b for migration with full copy of disk
-i for migration with incremental copy of disk (base image is shared)
    -p for migration with postcopy mode enabled (forward/backward is prefault size when postcopy)
+   -m for migration with postcopy mode enabled with moving position
-n for migration with postcopy mode enabled without background transfer
 ETEXI
 
diff --git a/hmp.c b/hmp.c
index fb1275d..a0bd869 100644
--- a/hmp.c
+++ b/hmp.c
@@ -1036,6 +1036,7 @@ void hmp_migrate(Monitor *mon, const QDict *qdict)
     int blk = qdict_get_try_bool(qdict, "blk", 0);
     int inc = qdict_get_try_bool(qdict, "inc", 0);
     int postcopy = qdict_get_try_bool(qdict, "postcopy", 0);
+    int movebg = qdict_get_try_bool(qdict, "movebg", 0);
     int nobg = qdict_get_try_bool(qdict, "nobg", 0);
     int forward = qdict_get_try_int(qdict, "forward", 0);
     int backward = qdict_get_try_int(qdict, "backward", 0);
@@ -1043,7 +1044,7 @@ void hmp_migrate(Monitor *mon, const QDict *qdict)
     Error *err = NULL;

     qmp_migrate(uri, !!blk, blk, !!inc, inc, false, false,
-                !!postcopy, postcopy, !!nobg, nobg,
+                !!postcopy, postcopy, !!movebg, movebg, !!nobg, nobg,
                 !!forward, forward, !!backward, backward,
                 &err);
     if (err) {
diff --git a/migration-postcopy.c b/migration-postcopy.c
index 3d51898..421fb39 100644
--- a/migration-postcopy.c
+++ b/migration-postcopy.c
@@ -432,6 +432,14 @@ static int postcopy_outgoing_handle_req(PostcopyOutgoingState *s,
                                             true, j);
             }
         }
+        if (s->ms->params.movebg) {
+            ram_addr_t last_offset =
+                (req->pgoffs[req->nr - 1] + s->ms->params.prefault_forward) <<
+                TARGET_PAGE_BITS;
+            last_offset = MIN(last_offset,
+                              s->last_block_read->length - TARGET_PAGE_SIZE);
+            ram_save_set_last_block(s->last_block_read, last_offset);
+        }
         /* backward prefault */
         for (j = 1; j <= s->ms->params.prefault_backward; j++) {
             for (i = 0; i < req->nr; i++) {
diff --git a/migration.c b/migration.c
index f29e3bb..057ea31 100644
--- a/migration.c
+++ b/migration.c
@@ -510,7 +510,9 @@ void migrate_del_blocker(Error *reason)
 
 void qmp_migrate(const char *uri, bool has_blk, bool blk,
                  bool has_inc, bool inc, bool has_detach, bool detach,
-                 bool has_postcopy, bool postcopy, bool has_nobg, bool nobg,
+                 bool has_postcopy, bool postcopy,
+                 bool has_movebg, bool movebg,
+                 bool has_nobg, bool nobg,
                  bool has_forward, int64_t forward,
                  bool has_backward, int64_t backward,
                  Error **errp)
@@ -524,6 +526,7 @@ void qmp_migrate(const char *uri, bool has_blk, bool blk,
     params.shared = inc;
     params.postcopy = postcopy;
     params.nobg

[PATCH v3 30/35] arch_init: factor out ram_load

2012-10-30 Thread Isaku Yamahata
Signed-off-by: Isaku Yamahata yamah...@valinux.co.jp
---
 arch_init.c |   13 ++---
 arch_init.h |    3 +++
 2 files changed, 13 insertions(+), 3 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index 9137013..f86a0b4 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -867,7 +867,9 @@ int ram_load_page(QEMUFile *f, void *host, int flags)
     return 0;
 }

-static int ram_load(QEMUFile *f, void *opaque, int version_id)
+int ram_load(QEMUFile *f, void *opaque, int version_id,
+             void *(host_from_stream_offset_p)(QEMUFile *f,
+                                               ram_addr_t offset, int flags))
 {
     ram_addr_t addr;
     int flags, ret = 0;
@@ -899,7 +901,7 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)

         if (flags & (RAM_SAVE_FLAG_COMPRESS | RAM_SAVE_FLAG_PAGE |
                      RAM_SAVE_FLAG_XBZRLE)) {
-            host = host_from_stream_offset(f, addr, flags);
+            host = host_from_stream_offset_p(f, addr, flags);
             if (!host) {
                 return -EINVAL;
             }
@@ -922,6 +924,11 @@ done:
     return ret;
 }

+static int ram_load_precopy(QEMUFile *f, void *opaque, int version_id)
+{
+    return ram_load(f, opaque, version_id, host_from_stream_offset);
+}
+
 static void ram_save_set_params(const MigrationParams *params, void *opaque)
 {
     if (params->postcopy) {
@@ -940,7 +947,7 @@ SaveVMHandlers savevm_ram_handlers = {
     .save_live_setup = ram_save_setup,
     .save_live_iterate = ram_save_iterate,
     .save_live_complete = ram_save_complete,
-    .load_state = ram_load,
+    .load_state = ram_load_precopy,
     .cancel = ram_migration_cancel,
 };
 
diff --git a/arch_init.h b/arch_init.h
index 9165456..3977ca7 100644
--- a/arch_init.h
+++ b/arch_init.h
@@ -54,6 +54,9 @@ bool ram_save_page(QEMUFile *f, RAMBlock *block, ram_addr_t offset,
                    bool last_stage);
 RAMBlock *ram_find_block(const char *id, uint8_t len);
 int ram_load_mem_size(QEMUFile *f, ram_addr_t total_ram_bytes);
+int ram_load(QEMUFile *f, void *opaque, int version_id,
+             void *(host_from_stream_offset_p)(QEMUFile *f,
+                                               ram_addr_t offset, int flags));
 #endif

 #endif
-- 
1.7.10.4

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

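This refactoring turns the host-address lookup inside `ram_load()` into a callback: the precopy path passes `host_from_stream_offset`, while the postcopy incoming path in this series passes `postcopy_incoming_shmem_from_stream_offset`, which points into the umem shared buffer instead of guest RAM. A minimal sketch of the pattern, with hypothetical names (`load_byte()`, `host_from_ram()`, `host_from_staging()`) and plain arrays standing in for RAM:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t sketch_addr_t;   /* stand-in for ram_addr_t */

static uint8_t guest_ram[4096];   /* stand-in for guest RAM */
static uint8_t staging[4096];     /* stand-in for a shared staging buffer */

/* Translation callback 1: stream offset -> guest RAM. */
static void *host_from_ram(sketch_addr_t offset)
{
    return offset < sizeof(guest_ram) ? guest_ram + offset : NULL;
}

/* Translation callback 2: stream offset -> staging buffer. */
static void *host_from_staging(sketch_addr_t offset)
{
    return offset < sizeof(staging) ? staging + offset : NULL;
}

/* Shared loader: the destination lookup is a parameter, just as the
 * refactored ram_load() takes host_from_stream_offset_p. */
static int load_byte(sketch_addr_t offset, uint8_t value,
                     void *(*host_from_offset)(sketch_addr_t offset))
{
    uint8_t *host = host_from_offset(offset);

    if (!host) {
        return -1;   /* the patch returns -EINVAL here */
    }
    *host = value;
    return 0;
}
```

The loader itself stays identical for both modes; only the lookup function differs, which is why `ram_load_precopy()` can be a one-line wrapper.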

[PATCH v3 28/35] arch_init: factor out setting last_block, last_offset

2012-10-30 Thread Isaku Yamahata
Signed-off-by: Isaku Yamahata yamah...@valinux.co.jp
---
 arch_init.c |   10 +++---
 arch_init.h |    1 +
 2 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index d95ce7b..9137013 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -416,6 +416,12 @@ static void migration_bitmap_sync(void)

 static uint64_t bytes_transferred;

+void ram_save_set_last_block(RAMBlock *block, ram_addr_t offset)
+{
+    last_block = block;
+    last_offset = offset;
+}
+
 /*
  * ram_save_page: Writes a page of memory to the stream f
  *
@@ -496,9 +502,7 @@ bool ram_save_block(QEMUFile *f, bool last_stage)
         }
     } while (block != last_block || offset != last_offset);

-    last_block = block;
-    last_offset = offset;
-
+    ram_save_set_last_block(block, offset);
     return wrote;
 }
 
diff --git a/arch_init.h b/arch_init.h
index 499d0f1..9165456 100644
--- a/arch_init.h
+++ b/arch_init.h
@@ -49,6 +49,7 @@ CpuDefinitionInfoList GCC_WEAK_DECL *arch_query_cpu_definitions(Error **errp);
 int ram_load_page(QEMUFile *f, void *host, int flags);

 #if defined(NEED_CPU_H) && !defined(CONFIG_USER_ONLY)
+void ram_save_set_last_block(RAMBlock *block, ram_addr_t offset);
 bool ram_save_page(QEMUFile *f, RAMBlock *block, ram_addr_t offset,
                    bool last_stage);
 RAMBlock *ram_find_block(const char *id, uint8_t len);
-- 
1.7.10.4



[PATCH v3 26/35] postcopy/outgoing: add -n options to disable background transfer

2012-10-30 Thread Isaku Yamahata
This is for benchmarking purposes.

Signed-off-by: Isaku Yamahata yamah...@valinux.co.jp
---
 hmp-commands.hx      |   10 ++
 hmp.c                |    4 +++-
 migration-postcopy.c |    7 +++
 migration.c          |    4 +++-
 migration.h          |    1 +
 qapi-schema.json     |    2 +-
 qmp-commands.hx      |    3 ++-
 savevm.c             |    1 +
 8 files changed, 24 insertions(+), 8 deletions(-)

diff --git a/hmp-commands.hx b/hmp-commands.hx
index f2f1264..b054760 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -826,25 +826,27 @@ ETEXI
 
     {
         .name       = "migrate",
-        .args_type  = "detach:-d,blk:-b,inc:-i,postcopy:-p,uri:s",
-        .params     = "[-d] [-b] [-i] [-p] uri",
+        .args_type  = "detach:-d,blk:-b,inc:-i,postcopy:-p,nobg:-n,uri:s",
+        .params     = "[-d] [-b] [-i] [-p [-n]] uri",
         .help       = "migrate to URI (using -d to not wait for completion)"
                       "\n\t\t\t -b for migration without shared storage with"
                       " full copy of disk\n\t\t\t -i for migration without "
                       "shared storage with incremental copy of disk "
                       "(base image shared between src and destination)"
-                      "\n\t\t\t-p for migration with postcopy mode enabled",
+                      "\n\t\t\t-p for migration with postcopy mode enabled"
+                      "\n\t\t\t-n for no background transfer of postcopy mode",
         .mhandler.cmd = hmp_migrate,
     },
 
 
 STEXI
-@item migrate [-d] [-b] [-i] [-p] @var{uri}
+@item migrate [-d] [-b] [-i] [-p [-n]] @var{uri}
 @findex migrate
 Migrate to @var{uri} (using -d to not wait for completion).
-b for migration with full copy of disk
-i for migration with incremental copy of disk (base image is shared)
-p for migration with postcopy mode enabled
+   -n for migration with postcopy mode enabled without background transfer
 ETEXI
 
 {
diff --git a/hmp.c b/hmp.c
index 2ea3bc4..203b552 100644
--- a/hmp.c
+++ b/hmp.c
@@ -1036,11 +1036,13 @@ void hmp_migrate(Monitor *mon, const QDict *qdict)
     int blk = qdict_get_try_bool(qdict, "blk", 0);
     int inc = qdict_get_try_bool(qdict, "inc", 0);
     int postcopy = qdict_get_try_bool(qdict, "postcopy", 0);
+    int nobg = qdict_get_try_bool(qdict, "nobg", 0);
     const char *uri = qdict_get_str(qdict, "uri");
     Error *err = NULL;

     qmp_migrate(uri, !!blk, blk, !!inc, inc, false, false,
-                !!postcopy, postcopy, &err);
+                !!postcopy, postcopy, !!nobg, nobg,
+                &err);
     if (err) {
         monitor_printf(mon, "migrate: %s\n", error_get_pretty(err));
         error_free(err);
diff --git a/migration-postcopy.c b/migration-postcopy.c
index 399e233..5f98ae6 100644
--- a/migration-postcopy.c
+++ b/migration-postcopy.c
@@ -557,6 +557,13 @@ int postcopy_outgoing_ram_save_background(QEMUFile *f, void *postcopy)
         abort();
     }

+    if (s->ms->params.nobg) {
+        if (ram_bytes_remaining() == 0) {
+            postcopy_outgoing_ram_all_sent(f, s);
+        }
+        return 0;
+    }
+
     DPRINTF("outgoing background state: %d\n", s->state);
     i = 0;
     t0 = qemu_get_clock_ns(rt_clock);
diff --git a/migration.c b/migration.c
index 85f8f71..279dda5 100644
--- a/migration.c
+++ b/migration.c
@@ -510,7 +510,8 @@ void migrate_del_blocker(Error *reason)
 
 void qmp_migrate(const char *uri, bool has_blk, bool blk,
                  bool has_inc, bool inc, bool has_detach, bool detach,
-                 bool has_postcopy, bool postcopy, Error **errp)
+                 bool has_postcopy, bool postcopy, bool has_nobg, bool nobg,
+                 Error **errp)
 {
     MigrationState *s = migrate_get_current();
     MigrationParams params;
@@ -520,6 +521,7 @@ void qmp_migrate(const char *uri, bool has_blk, bool blk,
     params.blk = blk;
     params.shared = inc;
     params.postcopy = postcopy;
+    params.nobg = nobg;

     if (s->state == MIG_STATE_ACTIVE) {
         error_set(errp, QERR_MIGRATION_ACTIVE);
diff --git a/migration.h b/migration.h
index 9b3c03b..6724c19 100644
--- a/migration.h
+++ b/migration.h
@@ -25,6 +25,7 @@ struct MigrationParams {
     bool blk;
     bool shared;
     bool postcopy;
+    bool nobg;
 };
 
 typedef struct MigrationState MigrationState;
diff --git a/qapi-schema.json b/qapi-schema.json
index c969e5a..70d0577 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -2095,7 +2095,7 @@
 ##
 { 'command': 'migrate',
   'data': {'uri': 'str', '*blk': 'bool', '*inc': 'bool', '*detach': 'bool' ,
-           '*postcopy': 'bool'} }
+           '*postcopy': 'bool', '*nobg': 'bool'} }
 
 # @xen-save-devices-state:
 #
diff --git a/qmp-commands.hx b/qmp-commands.hx
index ece7a7e..defbeba 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -518,7 +518,7 @@ EQMP
 
     {
         .name       = "migrate",
-        .args_type  = "detach:-d,blk:-b,inc:-i,postcopy:-p,uri:s",
+        .args_type  = "detach:-d,blk:-b,inc:-i,postcopy:-p,nobg:-n

[PATCH v3 25/35] postcopy: implement outgoing part of postcopy live migration

2012-10-30 Thread Isaku Yamahata
This patch implements the outgoing part of postcopy live migration.

Signed-off-by: Isaku Yamahata yamah...@valinux.co.jp
---
Changes v2 -> v3:
- modify savevm_ram_handlers instead of if (postcopy)
- code simplification
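
The v3 change "modify savevm_ram_handlers instead of if (postcopy)" swaps function pointers once, in `ram_save_set_params()`, rather than branching on the postcopy flag inside each handler. A cut-down sketch of that pattern follows; `SketchHandlers` and the four handler functions are hypothetical stand-ins for `SaveVMHandlers` and the real precopy/postcopy callbacks, and their return values are placeholders.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, cut-down analogue of a SaveVMHandlers-style table. */
typedef struct {
    int (*save_iterate)(void *opaque);
    int (*save_complete)(void *opaque);
} SketchHandlers;

/* Placeholder implementations; only their identity matters here. */
static int precopy_iterate(void *opaque)   { (void)opaque; return 0; }
static int precopy_complete(void *opaque)  { (void)opaque; return 0; }
static int postcopy_iterate(void *opaque)  { (void)opaque; return 1; }
static int postcopy_complete(void *opaque) { (void)opaque; return 0; }

static SketchHandlers ram_handlers = {
    .save_iterate = precopy_iterate,
    .save_complete = precopy_complete,
};

/* Select the implementation once, when the migration parameters are
 * known, so the handlers themselves need no postcopy checks. */
static void sketch_set_params(bool postcopy)
{
    if (postcopy) {
        ram_handlers.save_iterate = postcopy_iterate;
        ram_handlers.save_complete = postcopy_complete;
    } else {
        ram_handlers.save_iterate = precopy_iterate;
        ram_handlers.save_complete = precopy_complete;
    }
}
```

Resetting both pointers in the `else` branch matters: the table is a global, so a later precopy migration must not inherit the postcopy handlers left over from an earlier one.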

Changes v1 -> v2:
- fix parameter to qemu_fdopen()
- handle QEMU_UMEM_REQ_EOC properly
  in PO_STATE_ALL_PAGES_SENT the QEMU_UMEM_REQ_EOC request was ignored;
  handle it properly.
- flush on-demand page unconditionally
- improve postcopy_outgoing_ram_save_live and postcopy_outgoing_begin()
- use qemu_fopen_fd
- use memory api instead of obsolete api
- fix segv in postcopy_outgoing_check_all_ram_sent()
- catch up with qapi change
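
For the `fd:` transport the patch needs a separate read stream on the same descriptor: it checks with `fcntl(F_GETFL)` that the descriptor is open `O_RDWR`, `dup()`s it, and wraps the copy with `qemu_fopen_fd(..., "rb")`, closing the duplicate if that fails. A POSIX-only sketch of the same steps follows, using `fdopen()` in place of QEMU's `qemu_fopen_fd()`; `open_read_side()` is a hypothetical name.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Open a second, read-only stdio stream on an already-open descriptor.
 * The descriptor must be open O_RDWR; the duplicate is closed again if
 * fdopen() fails. Returns NULL on any error. */
static FILE *open_read_side(int fd)
{
    int flags = fcntl(fd, F_GETFL);
    int rfd;
    FILE *f;

    if (flags == -1 || (flags & O_ACCMODE) != O_RDWR) {
        return NULL;
    }
    rfd = dup(fd);
    if (rfd == -1) {
        return NULL;
    }
    f = fdopen(rfd, "rb");
    if (f == NULL) {
        close(rfd);
        return NULL;
    }
    return f;
}
```

Duplicating the descriptor means closing the read stream (which closes the copy) leaves the original write-side descriptor open, so the two streams can be torn down independently.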
---
 arch_init.c          |   22 ++-
 migration-exec.c     |    4 +
 migration-fd.c       |   17 ++
 migration-postcopy.c |  423 ++
 migration-tcp.c      |    6 +-
 migration-unix.c     |   26 +++-
 migration.c          |   32 +++-
 migration.h          |   18 +++
 savevm.c             |   35 -
 sysemu.h             |    2 +-
 10 files changed, 572 insertions(+), 13 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index d82316d..d95ce7b 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -189,7 +189,6 @@ static struct {
     .cache = NULL,
 };

-
 int64_t xbzrle_cache_resize(int64_t new_size)
 {
     if (XBZRLE.cache != NULL) {
@@ -591,6 +590,7 @@ static void reset_ram_globals(void)
 static int ram_save_setup(QEMUFile *f, void *opaque)
 {
     RAMBlock *block;
+    const MigrationParams *params = &migrate_get_current()->params;
     migration_bitmap_init();

     qemu_mutex_lock_ramlist();
@@ -610,8 +610,10 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
         acct_clear();
     }

-    memory_global_dirty_log_start();
-    migration_bitmap_sync();
+    if (!params->postcopy) {
+        memory_global_dirty_log_start();
+        migration_bitmap_sync();
+    }

     qemu_put_be64(f, ram_bytes_total() | RAM_SAVE_FLAG_MEM_SIZE);

@@ -916,7 +918,21 @@ done:
     return ret;
 }

+static void ram_save_set_params(const MigrationParams *params, void *opaque)
+{
+    if (params->postcopy) {
+        savevm_ram_handlers.save_live_iterate =
+            postcopy_outgoing_ram_save_iterate;
+        savevm_ram_handlers.save_live_complete =
+            postcopy_outgoing_ram_save_complete;
+    } else {
+        savevm_ram_handlers.save_live_iterate = ram_save_iterate;
+        savevm_ram_handlers.save_live_complete = ram_save_complete;
+    }
+}
+
 SaveVMHandlers savevm_ram_handlers = {
+    .set_params = ram_save_set_params,
     .save_live_setup = ram_save_setup,
     .save_live_iterate = ram_save_iterate,
     .save_live_complete = ram_save_complete,
index 95e9779..10bbecf 100644
--- a/migration-exec.c
+++ b/migration-exec.c
@@ -64,6 +64,10 @@ int exec_start_outgoing_migration(MigrationState *s, const char *command)
 {
     FILE *f;

+    if (s->params.postcopy) {
+        return -ENOSYS;
+    }
+
     f = popen(command, "w");
     if (f == NULL) {
         DPRINTF("Unable to popen exec target\n");
diff --git a/migration-fd.c b/migration-fd.c
index 8384975..f68fa28 100644
--- a/migration-fd.c
+++ b/migration-fd.c
@@ -90,6 +90,23 @@ int fd_start_outgoing_migration(MigrationState *s, const char *fdname)
     s->write = fd_write;
     s->close = fd_close;

+    if (s->params.postcopy) {
+        int flags = fcntl(s->fd, F_GETFL);
+        if ((flags & O_ACCMODE) != O_RDWR) {
+            goto err_after_open;
+        }
+
+        s->fd_read = dup(s->fd);
+        if (s->fd_read == -1) {
+            goto err_after_open;
+        }
+        s->file_read = qemu_fopen_fd(s->fd_read, "rb");
+        if (s->file_read == NULL) {
+            close(s->fd_read);
+            goto err_after_open;
+        }
+    }
+
     migrate_fd_connect(s);
     return 0;
 
diff --git a/migration-postcopy.c b/migration-postcopy.c
index 0809ffa..399e233 100644
--- a/migration-postcopy.c
+++ b/migration-postcopy.c
@@ -167,6 +167,107 @@ static void postcopy_incoming_send_req(QEMUFile *f,
     }
 }

+static int postcopy_outgoing_recv_req_idstr(QEMUFile *f,
+                                            struct qemu_umem_req *req,
+                                            size_t *offset)
+{
+    int ret;
+
+    req->len = qemu_peek_byte(f, *offset);
+    *offset += 1;
+    if (req->len == 0) {
+        return -EAGAIN;
+    }
+    req->idstr = g_malloc((int)req->len + 1);
+    ret = qemu_peek_buffer(f, (uint8_t*)req->idstr, req->len, *offset);
+    *offset += ret;
+    if (ret != req->len) {
+        g_free(req->idstr);
+        req->idstr = NULL;
+        return -EAGAIN;
+    }
+    req->idstr[req->len] = 0;
+    return 0;
+}
+
+static int postcopy_outgoing_recv_req_pgoffs(QEMUFile *f,
+                                             struct qemu_umem_req *req,
+                                             size_t *offset)
+{
+    int ret;
+    uint32_t be32;
+    uint32_t i;
+
+    ret = qemu_peek_buffer(f, (uint8_t*)&be32, sizeof(be32

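`postcopy_outgoing_recv_req_idstr()` parses a one-byte length followed by the RAM block name, peeking into the stream and returning `-EAGAIN` when the bytes have not all arrived yet. A self-contained sketch of the same length-prefixed parse over a plain byte buffer follows; `parse_idstr()` is hypothetical, and unlike the patch it leaves `*offset` untouched on failure so the caller can simply retry once more data is buffered.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Parse [len:1][name:len] from buf at *offset. On success, returns 0,
 * stores a NUL-terminated copy in *out (caller frees) and advances
 * *offset past the record. Returns -EAGAIN if the record is incomplete
 * or the length byte is 0 (as the patch does), -ENOMEM on allocation
 * failure; *offset is unchanged on any error. */
static int parse_idstr(const uint8_t *buf, size_t buflen, size_t *offset,
                       char **out)
{
    size_t pos = *offset;
    uint8_t len;
    char *s;

    if (pos >= buflen) {
        return -EAGAIN;
    }
    len = buf[pos++];
    if (len == 0 || buflen - pos < len) {
        return -EAGAIN;
    }
    s = malloc((size_t)len + 1);
    if (s == NULL) {
        return -ENOMEM;
    }
    memcpy(s, buf + pos, len);
    s[len] = '\0';
    *offset = pos + len;
    *out = s;
    return 0;
}
```

Given the bytes `{3, 'p', 'c', '.', 5, 'a'}`, the first call yields `"pc."` and advances the offset to 4; the second record announces 5 name bytes but only one is present, so the call reports `-EAGAIN` and leaves the offset at 4.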
[PATCH v3 27/35] postcopy/outgoing: implement forward/backword prefault

2012-10-30 Thread Isaku Yamahata
When a page is requested, the surrounding pages are also sent.
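
A minimal sketch of the prefault window, assuming page indices within a single RAM block. `prefault_target()` and `prefault_pages()` are hypothetical helpers, not QEMU functions; they mirror the bounds checks in `postcopy_outgoing_ram_save_page()` in this patch (skip backward targets below page 0 and forward targets past the end of the block), but only collect page indices instead of calling `ram_save_page()`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Page that a prefault step of `distance` pages targets, or false if it
 * falls before page 0 or at/after block_pages. */
static bool prefault_target(uint64_t pgoffset, uint64_t distance,
                            bool forward, uint64_t block_pages,
                            uint64_t *target)
{
    if (forward) {
        pgoffset += distance;
    } else {
        if (pgoffset < distance) {
            return false;   /* would go below page 0 */
        }
        pgoffset -= distance;
    }
    if (pgoffset >= block_pages) {
        return false;       /* past the end of the block */
    }
    *target = pgoffset;
    return true;
}

/* Collect the requested page, then up to `forward` following and up to
 * `backward` preceding pages. out needs 1 + forward + backward slots.
 * Returns the number of pages written. */
static size_t prefault_pages(uint64_t requested, uint64_t forward,
                             uint64_t backward, uint64_t block_pages,
                             uint64_t *out)
{
    size_t n = 0;
    uint64_t j;
    uint64_t t;

    if (requested < block_pages) {
        out[n++] = requested;
    }
    for (j = 1; j <= forward; j++) {
        if (prefault_target(requested, j, true, block_pages, &t)) {
            out[n++] = t;
        }
    }
    for (j = 1; j <= backward; j++) {
        if (prefault_target(requested, j, false, block_pages, &t)) {
            out[n++] = t;
        }
    }
    return n;
}
```

For a 10-page block, a request for page 1 with forward = 2 and backward = 3 yields pages 1, 2, 3 and 0; the backward steps that would go below page 0 are dropped.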

Signed-off-by: Isaku Yamahata yamah...@valinux.co.jp
---
 hmp-commands.hx      |   15 -
 hmp.c                |    3 +++
 migration-postcopy.c |   57 +-
 migration.c          |   20 ++
 migration.h          |    2 ++
 qapi-schema.json     |    3 ++-
 6 files changed, 89 insertions(+), 11 deletions(-)

diff --git a/hmp-commands.hx b/hmp-commands.hx
index b054760..5e2c77c 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -826,26 +826,31 @@ ETEXI
 
     {
         .name       = "migrate",
-        .args_type  = "detach:-d,blk:-b,inc:-i,postcopy:-p,nobg:-n,uri:s",
-        .params     = "[-d] [-b] [-i] [-p [-n]] uri",
+        .args_type  = "detach:-d,blk:-b,inc:-i,postcopy:-p,nobg:-n,uri:s,"
+                      "forward:i?,backward:i?",
+        .params     = "[-d] [-b] [-i] [-p [-n] uri [forward] [backword]",
         .help       = "migrate to URI (using -d to not wait for completion)"
                       "\n\t\t\t -b for migration without shared storage with"
                       " full copy of disk\n\t\t\t -i for migration without "
                       "shared storage with incremental copy of disk "
                       "(base image shared between src and destination)"
                       "\n\t\t\t-p for migration with postcopy mode enabled"
-                      "\n\t\t\t-n for no background transfer of postcopy mode",
+                      "\n\t\t\t-n for no background transfer of postcopy mode"
+                      "\n\t\t\tforward: the number of pages to "
+                      "forward-prefault when postcopy (default 0)"
+                      "\n\t\t\tbackward: the number of pages to "
+                      "backward-prefault when postcopy (default 0)",
         .mhandler.cmd = hmp_migrate,
     },
 
 
 STEXI
-@item migrate [-d] [-b] [-i] [-p [-n]] @var{uri}
+@item migrate [-d] [-b] [-i] [-p [-n]] @var{uri} @var{forward} @var{backward}
 @findex migrate
 Migrate to @var{uri} (using -d to not wait for completion).
-b for migration with full copy of disk
-i for migration with incremental copy of disk (base image is shared)
-   -p for migration with postcopy mode enabled
+   -p for migration with postcopy mode enabled (forward/backward is prefault size when postcopy)
-n for migration with postcopy mode enabled without background transfer
 ETEXI
 
diff --git a/hmp.c b/hmp.c
index 203b552..fb1275d 100644
--- a/hmp.c
+++ b/hmp.c
@@ -1037,11 +1037,14 @@ void hmp_migrate(Monitor *mon, const QDict *qdict)
     int inc = qdict_get_try_bool(qdict, "inc", 0);
     int postcopy = qdict_get_try_bool(qdict, "postcopy", 0);
     int nobg = qdict_get_try_bool(qdict, "nobg", 0);
+    int forward = qdict_get_try_int(qdict, "forward", 0);
+    int backward = qdict_get_try_int(qdict, "backward", 0);
     const char *uri = qdict_get_str(qdict, "uri");
     Error *err = NULL;

     qmp_migrate(uri, !!blk, blk, !!inc, inc, false, false,
                 !!postcopy, postcopy, !!nobg, nobg,
+                !!forward, forward, !!backward, backward,
                 &err);
     if (err) {
         monitor_printf(mon, "migrate: %s\n", error_get_pretty(err));
diff --git a/migration-postcopy.c b/migration-postcopy.c
index 5f98ae6..3d51898 100644
--- a/migration-postcopy.c
+++ b/migration-postcopy.c
@@ -344,6 +344,37 @@ int postcopy_outgoing_ram_save_complete(QEMUFile *f, void *opaque)
     return 0;
 }

+static void postcopy_outgoing_ram_save_page(PostcopyOutgoingState *s,
+                                            uint64_t pgoffset, bool *written,
+                                            bool forward,
+                                            int prefault_pgoffset)
+{
+    ram_addr_t offset;
+    int ret;
+
+    if (forward) {
+        pgoffset += prefault_pgoffset;
+    } else {
+        if (pgoffset < prefault_pgoffset) {
+            return;
+        }
+        pgoffset -= prefault_pgoffset;
+    }
+
+    offset = pgoffset << TARGET_PAGE_BITS;
+    if (offset >= s->last_block_read->length) {
+        assert(forward);
+        assert(prefault_pgoffset > 0);
+        return;
+    }
+
+    ret = ram_save_page(s->mig_buffered_write, s->last_block_read, offset,
+                        false);
+    if (ret > 0) {
+        *written = true;
+    }
+}
+
 /*
  * return value
  *   0: continue postcopy mode
@@ -355,6 +386,7 @@ static int postcopy_outgoing_handle_req(PostcopyOutgoingState *s,
                                         bool *written)
 {
     int i;
+    uint64_t j;
     RAMBlock *block;

     DPRINTF("cmd %d state %d\n", req->cmd, s->state);
@@ -387,11 +419,26 @@ static int postcopy_outgoing_handle_req(PostcopyOutgoingState *s,
             break;
         }
         for (i = 0; i < req->nr; i++) {
-            DPRINTF("offs[%d] 0x%"PRIx64"\n", i, req->pgoffs[i]);
-            int ret = ram_save_page(s->mig_buffered_write, s->last_block_read,
-                                    req->pgoffs[i

[PATCH v3 23/35] postcopy: implement incoming part of postcopy live migration

2012-10-30 Thread Isaku Yamahata
This patch implements the incoming part of postcopy live migration.

Signed-off-by: Isaku Yamahata yamah...@valinux.co.jp
---
Changes v2 -> v3:
- threading, not fork
- use blocking io instead of select + non-blocking io
- don't modify RAMBlock
- When a device allocates its own RAM region, e.g. vshmem, the region is
  handled by the device's save/load, so skip such areas, which have the
  RAM_PREALLOC_MASK flag set.
- less memory overhead
- drop -postcopy option. It is automatically detected.
- various improvement and simplification
- error handling
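
Two details from this patch in isolation: `qemu_ram_alloc_from_ptr()` now rounds the block size up to the host page size with `(size + page_size - 1) & ~(page_size - 1)`, and RAM blocks carrying `RAM_PREALLOC_MASK` are left to the owning device's save/load. The sketch below combines both; the flag bit positions follow `cpu-all.h` in this patch, but `SketchBlock`, `round_up_pow2()` and `postcopy_managed_bytes()` are hypothetical, and the mask trick assumes a power-of-two page size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bit positions as in RAM_PREALLOC_MASK / RAM_POSTCOPY_UMEM_MASK. */
#define SKETCH_RAM_PREALLOC      (1u << 0)
#define SKETCH_RAM_POSTCOPY_UMEM (1u << 1)

typedef struct {
    uint64_t length;
    unsigned flags;
} SketchBlock;

/* Round size up to a multiple of page_size (a power of two). */
static uint64_t round_up_pow2(uint64_t size, uint64_t page_size)
{
    return (size + page_size - 1) & ~(page_size - 1);
}

/* Sum the page-rounded lengths of the blocks postcopy would handle
 * itself, skipping preallocated (device-owned) RAM. */
static uint64_t postcopy_managed_bytes(const SketchBlock *blocks, size_t n,
                                       uint64_t page_size)
{
    uint64_t total = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        if (blocks[i].flags & SKETCH_RAM_PREALLOC) {
            continue;   /* migrated by the owning device's save/load */
        }
        total += round_up_pow2(blocks[i].length, page_size);
    }
    return total;
}
```

With 4096-byte pages, blocks of 5000 bytes (no flags), 4096 bytes (preallocated) and 100 bytes (umem) contribute 8192 + 4096 = 12288 bytes.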

Changes v1 -> v2:
- fork umemd early to address qemu devices touching guest ram via
  post/pre_load
- code clean up on initialization
- Makefile.target
  migration-postcopy.c is target dependent due to TARGET_PAGE_xxx,
  so it can't be shared between target architectures.
- use qemu_fopen_fd
- introduce incoming_flags_use_umem_make_present flag
- use MADV_DONTNEED
- make incoming socket nonblocking
- several clean ups
- Dropped QEMUFilePipe
- Moved QEMUFileNonblock to buffered_file
- Split out into umem/incoming/outgoing
- make mig_read nonblocking when socket
- updates for umem device changes
---
 Makefile.target      |    2 +
 cpu-all.h            |    3 +
 exec.c               |    6 +
 migration-fd.c       |    4 +-
 migration-postcopy.c | 1249 ++
 migration-tcp.c      |   10 +-
 migration-unix.c     |   10 +-
 migration.h          |   10 +
 savevm.c             |   28 ++
 vl.c                 |    2 +
 10 files changed, 1315 insertions(+), 9 deletions(-)
 create mode 100644 migration-postcopy.c

diff --git a/Makefile.target b/Makefile.target
index 3822bc5..930c070 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -121,6 +121,8 @@ obj-$(CONFIG_NO_GET_MEMORY_MAPPING) += memory_mapping-stub.o
 obj-$(CONFIG_NO_CORE_DUMP) += dump-stub.o
 LIBS+=-lz
 
+obj-y += migration-postcopy.o umem.o
+
 QEMU_CFLAGS += $(VNC_TLS_CFLAGS)
 QEMU_CFLAGS += $(VNC_SASL_CFLAGS)
 QEMU_CFLAGS += $(VNC_JPEG_CFLAGS)
diff --git a/cpu-all.h b/cpu-all.h
index b5fefc8..79846fe 100644
--- a/cpu-all.h
+++ b/cpu-all.h
@@ -485,6 +485,9 @@ extern ram_addr_t ram_size;
 /* RAM is pre-allocated and passed into qemu_ram_alloc_from_ptr */
 #define RAM_PREALLOC_MASK   (1 << 0)

+/* RAM is allocated via umem for postcopy incoming mode */
+#define RAM_POSTCOPY_UMEM_MASK  (1 << 1)
+
 typedef struct RAMBlock {
     struct MemoryRegion *mr;
     uint8_t *host;
diff --git a/exec.c b/exec.c
index 2aa4d90..6da991a 100644
--- a/exec.c
+++ b/exec.c
@@ -36,6 +36,7 @@
 #include "arch_init.h"
 #include "memory.h"
 #include "exec-memory.h"
+#include "migration.h"
 #if defined(CONFIG_USER_ONLY)
 #include qemu.h
 #if defined(__FreeBSD__) || defined(__FreeBSD_kernel__)
@@ -2555,6 +2556,8 @@ ram_addr_t qemu_ram_alloc_from_ptr(ram_addr_t size, void *host,
         new_block->host = host;
         new_block->flags |= RAM_PREALLOC_MASK;
     } else {
+        ram_addr_t page_size = getpagesize();
+        size = (size + page_size - 1) & ~(page_size - 1);
         if (mem_path) {
 #if defined (__linux__) && !defined(TARGET_S390X)
             new_block->host = file_ram_alloc(new_block, size, mem_path);
@@ -2635,6 +2638,9 @@ void qemu_ram_free(ram_addr_t addr)
             ram_list.version++;
             if (block->flags & RAM_PREALLOC_MASK) {
                 ;
+            }
+            else if (block->flags & RAM_POSTCOPY_UMEM_MASK) {
+                postcopy_incoming_ram_free(block);
             } else if (mem_path) {
 #if defined (__linux__) && !defined(TARGET_S390X)
                 if (block->fd) {
diff --git a/migration-fd.c b/migration-fd.c
index b3c54e5..8384975 100644
--- a/migration-fd.c
+++ b/migration-fd.c
@@ -105,7 +105,9 @@ static void fd_accept_incoming_migration(void *opaque)
 
     process_incoming_migration(f);
     qemu_set_fd_handler2(qemu_file_fd(f), NULL, NULL, NULL, NULL);
-    qemu_fclose(f);
+    if (!incoming_postcopy) {
+        qemu_fclose(f);
+    }
 }
 
 int fd_start_incoming_migration(const char *infd)
diff --git a/migration-postcopy.c b/migration-postcopy.c
new file mode 100644
index 000..0809ffa
--- /dev/null
+++ b/migration-postcopy.c
@@ -0,0 +1,1249 @@
+/*
+ * migration-postcopy.c: postcopy live migration
+ *
+ * Copyright (c) 2011
+ * National Institute of Advanced Industrial Science and Technology
+ *
+ * https://sites.google.com/site/grivonhome/quick-kvm-migration
+ * Author: Isaku Yamahata yamahata at valinux co jp
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program

[PATCH v3 24/35] postcopy outgoing: add -p option to migrate command

2012-10-30 Thread Isaku Yamahata
Add the -p option to the migrate command for postcopy mode, and
introduce a postcopy migration parameter to indicate that postcopy mode
is enabled.
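
`hmp_migrate()` forwards each optional QMP argument as a `has_<name>`/value pair, deriving the flag from the value with `!!`. A hypothetical sketch of how a QAPI-style callee might consume such pairs follows; `SketchParams` and `sketch_qmp_migrate()` are illustrative names, and the patch's own `qmp_migrate()` simply copies `postcopy` into `params` without consulting `has_postcopy`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical parameter struct, loosely modelled on MigrationParams. */
typedef struct {
    bool postcopy;
    int64_t prefault_forward;
} SketchParams;

/* QAPI-style entry point: every optional argument arrives as a
 * has_<name> flag followed by its value; absent ones get defaults. */
static SketchParams sketch_qmp_migrate(bool has_postcopy, bool postcopy,
                                       bool has_forward, int64_t forward)
{
    SketchParams p;

    p.postcopy = has_postcopy ? postcopy : false;
    p.prefault_forward = has_forward ? forward : 0;
    return p;
}
```

Because HMP computes `has_postcopy` as `!!postcopy`, an explicit `false` and an omitted flag look the same to the callee; that is harmless here because both mean "no postcopy".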

Signed-off-by: Isaku Yamahata yamah...@valinux.co.jp
---
Changes v1 -> v2:
- catch up with qapi change
---
 hmp-commands.hx  |   10 ++
 hmp.c            |    4 +++-
 migration.c      |    3 ++-
 migration.h      |    1 +
 qapi-schema.json |    3 ++-
 qmp-commands.hx  |    3 ++-
 savevm.c         |    3 ++-
 7 files changed, 18 insertions(+), 9 deletions(-)

diff --git a/hmp-commands.hx b/hmp-commands.hx
index e0b537d..f2f1264 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -826,23 +826,25 @@ ETEXI
 
 {
 .name   = "migrate",
-.args_type  = "detach:-d,blk:-b,inc:-i,uri:s",
-.params = "[-d] [-b] [-i] uri",
+.args_type  = "detach:-d,blk:-b,inc:-i,postcopy:-p,uri:s",
+.params = "[-d] [-b] [-i] [-p] uri",
 .help   = "migrate to URI (using -d to not wait for completion)"
 "\n\t\t\t -b for migration without shared storage with"
 " full copy of disk\n\t\t\t -i for migration without "
 "shared storage with incremental copy of disk "
- "(base image shared between src and destination)",
+ "(base image shared between src and destination)"
+ "\n\t\t\t-p for migration with postcopy mode enabled",
 .mhandler.cmd = hmp_migrate,
 },
 
 
 STEXI
-@item migrate [-d] [-b] [-i] @var{uri}
+@item migrate [-d] [-b] [-i] [-p] @var{uri}
 @findex migrate
 Migrate to @var{uri} (using -d to not wait for completion).
-b for migration with full copy of disk
-i for migration with incremental copy of disk (base image is shared)
+   -p for migration with postcopy mode enabled
 ETEXI
 
 {
diff --git a/hmp.c b/hmp.c
index 2b97982..2ea3bc4 100644
--- a/hmp.c
+++ b/hmp.c
@@ -1035,10 +1035,12 @@ void hmp_migrate(Monitor *mon, const QDict *qdict)
 int detach = qdict_get_try_bool(qdict, "detach", 0);
 int blk = qdict_get_try_bool(qdict, "blk", 0);
 int inc = qdict_get_try_bool(qdict, "inc", 0);
+int postcopy = qdict_get_try_bool(qdict, "postcopy", 0);
 const char *uri = qdict_get_str(qdict, "uri");
 Error *err = NULL;
 
-qmp_migrate(uri, !!blk, blk, !!inc, inc, false, false, &err);
+qmp_migrate(uri, !!blk, blk, !!inc, inc, false, false,
+!!postcopy, postcopy, &err);
 if (err) {
 monitor_printf(mon, "migrate: %s\n", error_get_pretty(err));
 error_free(err);
diff --git a/migration.c b/migration.c
index 00b0bc2..8bb6073 100644
--- a/migration.c
+++ b/migration.c
@@ -480,7 +480,7 @@ void migrate_del_blocker(Error *reason)
 
 void qmp_migrate(const char *uri, bool has_blk, bool blk,
  bool has_inc, bool inc, bool has_detach, bool detach,
- Error **errp)
+ bool has_postcopy, bool postcopy, Error **errp)
 {
 MigrationState *s = migrate_get_current();
 MigrationParams params;
@@ -489,6 +489,7 @@ void qmp_migrate(const char *uri, bool has_blk, bool blk,
 
 params.blk = blk;
 params.shared = inc;
+params.postcopy = postcopy;
 
 if (s->state == MIG_STATE_ACTIVE) {
 error_set(errp, QERR_MIGRATION_ACTIVE);
diff --git a/migration.h b/migration.h
index 0766691..b21df18 100644
--- a/migration.h
+++ b/migration.h
@@ -24,6 +24,7 @@
 struct MigrationParams {
 bool blk;
 bool shared;
+bool postcopy;
 };
 
 typedef struct MigrationState MigrationState;
diff --git a/qapi-schema.json b/qapi-schema.json
index c615ee2..c969e5a 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -2094,7 +2094,8 @@
 # Since: 0.14.0
 ##
 { 'command': 'migrate',
-  'data': {'uri': 'str', '*blk': 'bool', '*inc': 'bool', '*detach': 'bool' } }
+  'data': {'uri': 'str', '*blk': 'bool', '*inc': 'bool', '*detach': 'bool' ,
+   '*postcopy': 'bool'} }
 
 # @xen-save-devices-state:
 #
diff --git a/qmp-commands.hx b/qmp-commands.hx
index 5ba8c48..ece7a7e 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -518,7 +518,7 @@ EQMP
 
 {
 .name   = "migrate",
-.args_type  = "detach:-d,blk:-b,inc:-i,uri:s",
+.args_type  = "detach:-d,blk:-b,inc:-i,postcopy:-p,uri:s",
 .mhandler.cmd_new = qmp_marshal_input_migrate,
 },
 
@@ -532,6 +532,7 @@ Arguments:
 
 - blk: block migration, full disk copy (json-bool, optional)
 - inc: incremental disk copy (json-bool, optional)
+- postcopy: postcopy migration (json-bool, optional)
 - uri: Destination URI (json-string)
 
 Example:
diff --git a/savevm.c b/savevm.c
index d1488d2..04b03cf 100644
--- a/savevm.c
+++ b/savevm.c
@@ -1806,7 +1806,8 @@ static int qemu_savevm_state(QEMUFile *f)
 int ret;
 MigrationParams params = {
 .blk = 0,
-.shared = 0
+.shared = 0,
+.postcopy = 0,
 };
 
 if (qemu_savevm_state_blocked(NULL)) {
-- 
1.7.10.4
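The "!!postcopy, postcopy" pair passed from hmp_migrate() follows the QAPI convention that every optional argument arrives as a has_X flag plus a value. The sketch below shows one way a handler could fold such pairs into MigrationParams. It assumes only the MigrationParams fields from this patch; fill_migration_params() is a hypothetical helper, not QEMU code (the patch itself simply assigns params.postcopy = postcopy).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Same fields as struct MigrationParams in migration.h after this patch. */
typedef struct MigrationParams {
    bool blk;
    bool shared;
    bool postcopy;
} MigrationParams;

/*
 * Hypothetical helper: fold QAPI-style optional arguments (has_X plus X)
 * into MigrationParams. An absent argument counts as false, which is also
 * what the HMP path gets from qdict_get_try_bool(qdict, "postcopy", 0).
 */
static void fill_migration_params(MigrationParams *params,
                                  bool has_blk, bool blk,
                                  bool has_inc, bool inc,
                                  bool has_postcopy, bool postcopy)
{
    assert(params != NULL);
    params->blk = has_blk && blk;
    params->shared = has_inc && inc;    /* -i maps to "shared" */
    params->postcopy = has_postcopy && postcopy;
}
```

With this shape, "migrate -p uri" from HMP ends up as has_postcopy = true, postcopy = true, and omitting -p leaves postcopy false regardless of the value slot.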

--
To unsubscribe from this list: send the line

[PATCH v3 20/35] osdep: add QEMU_MADV_REMOVE and trivial fix

2012-10-30 Thread Isaku Yamahata
MADV_REMOVE will be used by postcopy.

Signed-off-by: Isaku Yamahata yamah...@valinux.co.jp
---
 osdep.h |   13 +++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/osdep.h b/osdep.h
index c5fd3d9..9e97f39 100644
--- a/osdep.h
+++ b/osdep.h
@@ -113,6 +113,11 @@ void qemu_vfree(void *ptr);
 #else
 #define QEMU_MADV_HUGEPAGE QEMU_MADV_INVALID
 #endif
+#ifdef MADV_REMOVE
+#define QEMU_MADV_REMOVE MADV_REMOVE
+#else
+#define QEMU_MADV_REMOVE QEMU_MADV_INVALID
+#endif
 
 #elif defined(CONFIG_POSIX_MADVISE)
 
@@ -120,7 +125,9 @@ void qemu_vfree(void *ptr);
 #define QEMU_MADV_DONTNEED  POSIX_MADV_DONTNEED
 #define QEMU_MADV_DONTFORK  QEMU_MADV_INVALID
 #define QEMU_MADV_MERGEABLE QEMU_MADV_INVALID
-#define QEMU_MADV_DONTDUMP QEMU_MADV_INVALID
+#define QEMU_MADV_DONTDUMP  QEMU_MADV_INVALID
+#define QEMU_MADV_HUGEPAGE  QEMU_MADV_INVALID
+#define QEMU_MADV_REMOVE    QEMU_MADV_INVALID
 
 #else /* no-op */
 
@@ -128,7 +135,9 @@ void qemu_vfree(void *ptr);
 #define QEMU_MADV_DONTNEED  QEMU_MADV_INVALID
 #define QEMU_MADV_DONTFORK  QEMU_MADV_INVALID
 #define QEMU_MADV_MERGEABLE QEMU_MADV_INVALID
-#define QEMU_MADV_DONTDUMP QEMU_MADV_INVALID
+#define QEMU_MADV_DONTDUMP  QEMU_MADV_INVALID
+#define QEMU_MADV_HUGEPAGE  QEMU_MADV_INVALID
+#define QEMU_MADV_REMOVE    QEMU_MADV_INVALID
 
 #endif
 
-- 
1.7.10.4
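Mapping QEMU_MADV_REMOVE to QEMU_MADV_INVALID on hosts without MADV_REMOVE lets callers use it unconditionally and handle failure at runtime. A minimal sketch of that pattern, assuming a wrapper in the spirit of qemu_madvise(); sketch_madvise() is hypothetical, and the (-1) sentinel is an assumption rather than a copy of osdep.h.

```c
#define _GNU_SOURCE /* glibc: expose MADV_REMOVE */
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>

/* Assumed sentinel for "no equivalent advice on this host". */
#define QEMU_MADV_INVALID (-1)

#ifdef MADV_REMOVE
#define QEMU_MADV_REMOVE MADV_REMOVE
#else
#define QEMU_MADV_REMOVE QEMU_MADV_INVALID
#endif

/*
 * Hypothetical wrapper: unsupported advice fails with EINVAL instead of
 * passing an arbitrary value to the kernel, so callers can pass
 * QEMU_MADV_REMOVE unconditionally and just check the return value.
 */
static int sketch_madvise(void *addr, size_t len, int advice)
{
    assert(addr != NULL);
    if (advice == QEMU_MADV_INVALID) {
        errno = EINVAL;
        return -1;
    }
    return madvise(addr, len, advice);
}
```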

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3 15/35] arch_init/ram_save_setup: factor out bitmap alloc/free

2012-10-30 Thread Isaku Yamahata
This will be used by postcopy.

Signed-off-by: Isaku Yamahata yamah...@valinux.co.jp
---
Changes v2 -> v3:
- new
---
 arch_init.c |   25 ++---
 migration.h |2 ++
 2 files changed, 20 insertions(+), 7 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index ad1b01b..7e6d84e 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -330,6 +330,22 @@ static unsigned long *migration_bitmap;
 static uint64_t migration_dirty_pages;
 static uint32_t last_version;
 
+void migration_bitmap_init(void)
+{
+int64_t ram_pages = last_ram_offset() >> TARGET_PAGE_BITS;
+if (!migration_bitmap) {
+migration_bitmap = bitmap_new(ram_pages);
+}
+bitmap_set(migration_bitmap, 1, ram_pages);
+migration_dirty_pages = ram_pages;
+}
+
+void migration_bitmap_free(void)
+{
+g_free(migration_bitmap);
+migration_bitmap = NULL;
+}
+
 static inline bool migration_bitmap_test_and_reset_dirty(MemoryRegion *mr,
  ram_addr_t offset)
 {
@@ -575,11 +591,7 @@ static void reset_ram_globals(void)
 static int ram_save_setup(QEMUFile *f, void *opaque)
 {
 RAMBlock *block;
-int64_t ram_pages = last_ram_offset() >> TARGET_PAGE_BITS;
-
-migration_bitmap = bitmap_new(ram_pages);
-bitmap_set(migration_bitmap, 1, ram_pages);
-migration_dirty_pages = ram_pages;
+migration_bitmap_init();
 
 qemu_mutex_lock_ramlist();
 bytes_transferred = 0;
@@ -704,8 +716,7 @@ static int ram_save_complete(QEMUFile *f, void *opaque)
 qemu_mutex_unlock_ramlist();
 qemu_put_be64(f, RAM_SAVE_FLAG_EOS);
 
-g_free(migration_bitmap);
-migration_bitmap = NULL;
+migration_bitmap_free();
 
 return 0;
 }
diff --git a/migration.h b/migration.h
index 7d1b62d..73416ba 100644
--- a/migration.h
+++ b/migration.h
@@ -95,6 +95,8 @@ bool ram_save_block(QEMUFile *f, bool last_stage);
 uint64_t ram_bytes_remaining(void);
 uint64_t ram_bytes_transferred(void);
 uint64_t ram_bytes_total(void);
+void migration_bitmap_init(void);
+void migration_bitmap_free(void);
 
 extern SaveVMHandlers savevm_ram_handlers;
 
-- 
1.7.10.4
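migration_bitmap_init() keeps one dirty bit per target page, so the bitmap length is the RAM size shifted right by TARGET_PAGE_BITS. The sketch below illustrates that arithmetic with plain calloc() in place of QEMU's bitmap_new()/bitmap_set(). The 4 KiB page size (12 page bits) and all sketch_* names are assumptions; unlike the patch, it always allocates a fresh bitmap rather than reusing an existing one, and it marks pages 0 through ram_pages - 1.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>
#include <stdlib.h>

#define SKETCH_TARGET_PAGE_BITS 12   /* assumption: 4 KiB target pages */
#define SKETCH_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
#define SKETCH_BITS_TO_LONGS(n) \
    (((n) + SKETCH_BITS_PER_LONG - 1) / SKETCH_BITS_PER_LONG)

/*
 * Allocate a dirty bitmap with one bit per target page and mark every
 * page dirty, so the first pass sends all of RAM. Returns NULL on
 * allocation failure or when there is no RAM at all.
 */
static unsigned long *sketch_bitmap_init(uint64_t ram_bytes,
                                         uint64_t *ram_pages_out)
{
    uint64_t ram_pages = ram_bytes >> SKETCH_TARGET_PAGE_BITS;
    unsigned long *bitmap;

    assert(ram_pages_out != NULL);
    if (ram_pages == 0) {
        return NULL;
    }
    bitmap = calloc(SKETCH_BITS_TO_LONGS(ram_pages), sizeof(unsigned long));
    if (!bitmap) {
        return NULL;
    }
    for (uint64_t i = 0; i < ram_pages; i++) {
        bitmap[i / SKETCH_BITS_PER_LONG] |= 1UL << (i % SKETCH_BITS_PER_LONG);
    }
    *ram_pages_out = ram_pages;
    return bitmap;
}

/* Returns 1 if page nr is marked dirty, 0 otherwise. */
static int sketch_test_bit(const unsigned long *bitmap, uint64_t nr)
{
    return (bitmap[nr / SKETCH_BITS_PER_LONG] >>
            (nr % SKETCH_BITS_PER_LONG)) & 1UL;
}
```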



[PATCH v4 1/2] export necessary symbols

2012-10-30 Thread Isaku Yamahata
Cc: Andrea Arcangeli aarca...@redhat.com
Cc: Avi Kivity a...@redhat.com
Cc: Paolo Bonzini pbonz...@redhat.com
Signed-off-by: Isaku Yamahata yamah...@valinux.co.jp
---
 mm/memcontrol.c |1 +
 mm/mempolicy.c  |1 +
 mm/shmem.c  |1 +
 3 files changed, 3 insertions(+)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 7acf43b..bc9fd53 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2787,6 +2787,7 @@ int mem_cgroup_newpage_charge(struct page *page,
return mem_cgroup_charge_common(page, mm, gfp_mask,
MEM_CGROUP_CHARGE_TYPE_ANON);
 }
+EXPORT_SYMBOL_GPL(mem_cgroup_cache_charge);
 
 /*
  * While swap-in, try_charge - commit or cancel, the page is locked.
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index d04a8a5..3df6cf5 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -1947,6 +1947,7 @@ retry_cpuset:
goto retry_cpuset;
return page;
 }
+EXPORT_SYMBOL_GPL(alloc_pages_vma);
 
 /**
  * alloc_pages_current - Allocate pages.
diff --git a/mm/shmem.c b/mm/shmem.c
index 67afba5..41eaefd 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -2840,6 +2840,7 @@ int shmem_zero_setup(struct vm_area_struct *vma)
vma-vm_ops = shmem_vm_ops;
return 0;
 }
+EXPORT_SYMBOL_GPL(shmem_zero_setup);
 
 /**
  * shmem_read_mapping_page_gfp - read into page cache, using specified page allocation flags.
-- 
1.7.10.4



[PATCH v4 0/2] postcopy migration: uvmem: Linux char device for postcopy

2012-10-30 Thread Isaku Yamahata
This is a Linux kernel driver for qemu/kvm postcopy live migration.
It is used by the qemu/kvm postcopy live migration patch series.

The user process backed memory driver provides the /dev/uvmem device.
This /dev/uvmem device is designed for some sort of distributed shared memory.
A page fault in the area backed by this driver is propagated to another
process, the server, which serves the page contents. Usually the server process
fetches the page contents from the remote machine. Then the faulting process continues.


ioctl UVMEM_INIT: initialize the uvmem device for qemu.
  Returns a file descriptor of tmpfs; the serving thread writes
  page contents to this file descriptor.
mmap: the guest VM mmaps this device and uses it as guest RAM. A page fault
  on this area is propagated to the serving process.
read: returns the page offsets at which the guest VM page-faulted.
write: the server process tells the device which pages have been served,
  so the guest VM can resume execution.
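The read()/write() protocol above can be sketched as one step of the server loop. This is an illustration only, assuming offsets arrive as native-endian u64 page numbers as described. serve_page_requests() and SKETCH_MAX_BATCH are hypothetical, memset() stands in for fetching the page from the remote host, and the request and acknowledgement fds are separate parameters (both would be the uvmem fd in real use) so the step can be exercised with pipes.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>

#define SKETCH_MAX_BATCH 32  /* assumed batch size, not from the driver */

/*
 * One hypothetical server step:
 *   read()  -> faulted page offsets (u64, in units of pages)
 *   fill    -> write those pages into the shmem mapping
 *   write() -> hand the same offsets back so faulting threads resume
 * Returns the number of pages served, or a negative errno value.
 */
static ssize_t serve_page_requests(int req_fd, int ack_fd,
                                   uint8_t *shmem, size_t page_size,
                                   uint64_t nr_pages, uint8_t fill)
{
    uint64_t pgoffs[SKETCH_MAX_BATCH];
    ssize_t n;
    size_t nr, len;

    assert(shmem != NULL && page_size > 0);
    n = read(req_fd, pgoffs, sizeof(pgoffs));
    if (n < 0) {
        return errno == EINTR ? 0 : -errno;
    }
    nr = (size_t)n / sizeof(pgoffs[0]);
    for (size_t i = 0; i < nr; i++) {
        if (pgoffs[i] >= nr_pages) {
            return -EINVAL;
        }
        /* Stand-in for pulling the page contents from the source host. */
        memset(shmem + pgoffs[i] * page_size, fill, page_size);
    }
    len = nr * sizeof(pgoffs[0]);
    if (write(ack_fd, pgoffs, len) != (ssize_t)len) {
        return -EIO;
    }
    return (ssize_t)nr;
}
```

A tiny page size such as 64 bytes is enough to exercise the logic; real callers would pass getpagesize().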
---
Changes v3 -> v4:
- rename module name: umem -> uvmem
  avoid module name conflict

Changes v2 -> v3:
- make fault handler killable
- make use of read()/write()
- documentation

Changes version 1 -> 2:
- make ioctl structures padded to align
- un-KVM
  KVM_VMEM -> UMEM
- dropped some ioctl commands as Avi requested

Isaku Yamahata (2):
  export necessary symbols
  umem: chardevice for kvm postcopy

 Documentation/misc-devices/uvmem.txt |  292 
 drivers/char/Kconfig |   10 +
 drivers/char/Makefile|1 +
 drivers/char/uvmem.c |  841 ++
 include/linux/uvmem.h|   41 ++
 mm/memcontrol.c  |1 +
 mm/mempolicy.c   |1 +
 mm/shmem.c   |1 +
 8 files changed, 1188 insertions(+)
 create mode 100644 Documentation/misc-devices/uvmem.txt
 create mode 100644 drivers/char/uvmem.c
 create mode 100644 include/linux/uvmem.h

--
1.7.10.4



[PATCH v3 21/35] postcopy: introduce helper functions for postcopy

2012-10-30 Thread Isaku Yamahata
This patch introduces helper functions for postcopy to access the
umem char device and to communicate between incoming-qemu and umemd.

Signed-off-by: Isaku Yamahata yamah...@valinux.co.jp
---
changes v2 -> v3:
- error check, don't abort
- typedef
- #ifdef CONFIG_LINUX
- code simplification

changes v1 -> v2:
- code simplification
- make fault trigger more robust
- introduce struct umem_pages
---
 umem.c |  291 
 umem.h |   88 
 2 files changed, 379 insertions(+)
 create mode 100644 umem.c
 create mode 100644 umem.h

diff --git a/umem.c b/umem.c
new file mode 100644
index 000..b05377b
--- /dev/null
+++ b/umem.c
@@ -0,0 +1,291 @@
+/*
+ * umem.c: user process backed memory module for postcopy livemigration
+ *
+ * Copyright (c) 2011
+ * National Institute of Advanced Industrial Science and Technology
+ *
+ * https://sites.google.com/site/grivonhome/quick-kvm-migration
+ * Author: Isaku Yamahata yamahata at valinux co jp
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, see http://www.gnu.org/licenses/.
+ */
+
+#include <sys/ioctl.h>
+#include <sys/mman.h>
+
+#include "config-host.h"
+#ifdef CONFIG_LINUX
+#include <linux/uvmem.h>
+#endif
+
+#include "bitops.h"
+#include "sysemu.h"
+#include "hw/hw.h"
+#include "umem.h"
+
+//#define DEBUG_UMEM
+#ifdef DEBUG_UMEM
+#define DPRINTF(format, ...)\
+do {\
+printf("%s:%d " format, __func__, __LINE__, ## __VA_ARGS__); \
+} while (0)
+#else
+#define DPRINTF(format, ...)do { } while (0)
+#endif
+
+#define DEV_UMEM "/dev/uvmem"
+
+int umem_new(void *hostp, size_t size, UMem** umemp)
+{
+#ifdef CONFIG_LINUX
+struct uvmem_init uinit = {
+.size = size,
+.shmem_fd = -1,
+};
+UMem *umem;
+int error;
+
+assert((size % getpagesize()) == 0);
+umem = g_new(UMem, 1);
+umem->fd = open(DEV_UMEM, O_RDWR);
+if (umem->fd < 0) {
+error = -errno;
+perror("can't open " DEV_UMEM);
+goto error;
+}
+
+if (ioctl(umem->fd, UVMEM_INIT, &uinit) < 0) {
+error = -errno;
+perror("UMEM_INIT failed");
+goto error;
+}
+if (ftruncate(uinit.shmem_fd, uinit.size) < 0) {
+error = -errno;
+perror("truncate(\"shmem_fd\") failed");
+goto error;
+}
+
+umem->nbits = 0;
+umem->nsets = 0;
+umem->faulted = NULL;
+umem->page_shift = ffs(getpagesize()) - 1;
+umem->shmem_fd = uinit.shmem_fd;
+umem->size = uinit.size;
+umem->umem = mmap(hostp, size, PROT_EXEC | PROT_READ | PROT_WRITE,
+  MAP_PRIVATE | MAP_FIXED, umem->fd, 0);
+if (umem->umem == MAP_FAILED) {
+error = -errno;
+perror("mmap(UMem) failed");
+goto error;
+}
+*umemp = umem;
+return 0;
+
+error:
+if (umem->fd >= 0) {
+close(umem->fd);
+}
+if (uinit.shmem_fd >= 0) {
+close(uinit.shmem_fd);
+}
+g_free(umem);
+return error;
+#else
+perror("postcopy migration is not supported");
+return -ENOSYS;
+#endif
+}
+
+void umem_destroy(UMem *umem)
+{
+if (umem->fd != -1) {
+close(umem->fd);
+}
+if (umem->shmem_fd != -1) {
+close(umem->shmem_fd);
+}
+g_free(umem->faulted);
+g_free(umem);
+}
+
+size_t umem_pages_size(uint64_t nr)
+{
+return sizeof(UMemPages) + nr * sizeof(uint64_t);
+}
+
+int umem_get_page_request(UMem *umem, UMemPages *page_request)
+{
+ssize_t ret = read(umem->fd, page_request->pgoffs,
+   page_request->nr * sizeof(page_request->pgoffs[0]));
+if (ret < 0) {
+if (errno != EINTR) {
+perror("daemon: umem read failed");
+return -errno;
+}
+ret = 0;
+}
+page_request->nr = ret / sizeof(page_request->pgoffs[0]);
+return 0;
+}
+
+int umem_mark_page_cached(UMem *umem, UMemPages *page_cached)
+{
+const void *buf = page_cached->pgoffs;
+size_t size = page_cached->nr * sizeof(page_cached->pgoffs[0]);
+ssize_t ret;
+
+ret = qemu_write_full(umem->fd, buf, size);
+if (ret != size) {
+perror("daemon: umem write");
+return -errno;
+}
+return 0;
+}
+
+void umem_unmap(UMem *umem)
+{
+munmap(umem->umem, umem->size);
+umem->umem = NULL;
+}
+
+void umem_close(UMem *umem)
+{
+close(umem->fd);
+umem->fd = -1;
+}
+
+int umem_map_shmem(UMem *umem)
+{
+umem->nbits
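umem_pages_size() sizes a UMemPages as a header plus nr trailing u64 offsets, and umem_get_page_request() uses nr first as the buffer capacity and then as the number of offsets actually read. Because umem.h is not quoted here, the struct layout below (a u64 count followed by a flexible array) is an assumption inferred from how umem.c uses it; umem_pages_alloc() and umem_pages_set_from_read() are hypothetical helpers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumed layout: a count followed by a C99 flexible array of offsets. */
typedef struct UMemPages {
    uint64_t nr;
    uint64_t pgoffs[];
} UMemPages;

/* Same arithmetic as umem_pages_size() in umem.c. */
static size_t umem_pages_size(uint64_t nr)
{
    return sizeof(UMemPages) + nr * sizeof(uint64_t);
}

/* Hypothetical: allocate room for max_nr offsets; nr starts as capacity. */
static UMemPages *umem_pages_alloc(uint64_t max_nr)
{
    UMemPages *pages = malloc(umem_pages_size(max_nr));

    if (pages) {
        pages->nr = max_nr;
    }
    return pages;
}

/* Hypothetical: after read() returned bytes, keep only whole offsets. */
static void umem_pages_set_from_read(UMemPages *pages, size_t bytes)
{
    assert(pages != NULL);
    assert(bytes <= pages->nr * sizeof(pages->pgoffs[0]));
    pages->nr = bytes / sizeof(pages->pgoffs[0]);
}
```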

[PATCH v3 22/35] savevm: add new section that is used by postcopy

2012-10-30 Thread Isaku Yamahata
This is used by postcopy to tell the total length of QEMU_VM_SECTION_FULL
and QEMU_VM_SUBSECTION from outgoing to incoming.

Signed-off-by: Isaku Yamahata yamah...@valinux.co.jp
---
 savevm.c |4 
 1 file changed, 4 insertions(+)

diff --git a/savevm.c b/savevm.c
index 93c51ab..c93b6eb 100644
--- a/savevm.c
+++ b/savevm.c
@@ -1614,6 +1614,10 @@ static void vmstate_save(QEMUFile *f, SaveStateEntry *se)
 #define QEMU_VM_SECTION_FULL 0x04
 #define QEMU_VM_SUBSECTION   0x05
 
+/* This section is used by postcopy to tell postcopy enabled session.
+   If the destination side doesn't know, it sees unknown section and abort. */
+#define QEMU_VM_POSTCOPY 0x10
+
 bool qemu_savevm_state_blocked(Error **errp)
 {
 SaveStateEntry *se;
-- 
1.7.10.4
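The point of the new section type is that a destination without postcopy support fails cleanly: its loader reaches the error branch of the section-type switch. A heavily simplified sketch, assuming only the section-type byte values from savevm.c; sketch_check_sections() is hypothetical and ignores section payloads entirely.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Section-type bytes as defined in savevm.c, plus the new one. */
#define QEMU_VM_EOF            0x00
#define QEMU_VM_SECTION_START  0x01
#define QEMU_VM_SECTION_PART   0x02
#define QEMU_VM_SECTION_END    0x03
#define QEMU_VM_SECTION_FULL   0x04
#define QEMU_VM_POSTCOPY       0x10

/*
 * Hypothetical, payload-free loader: walks section-type bytes only.
 * Returns 0 on QEMU_VM_EOF and -EINVAL for an unknown section type or
 * a stream that ends without QEMU_VM_EOF.
 */
static int sketch_check_sections(const uint8_t *types, size_t n,
                                 bool knows_postcopy)
{
    assert(types != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        switch (types[i]) {
        case QEMU_VM_EOF:
            return 0;
        case QEMU_VM_SECTION_START:
        case QEMU_VM_SECTION_PART:
        case QEMU_VM_SECTION_END:
        case QEMU_VM_SECTION_FULL:
            break;
        case QEMU_VM_POSTCOPY:
            if (!knows_postcopy) {
                return -EINVAL;  /* old destination: unknown section */
            }
            break;
        default:
            return -EINVAL;
        }
    }
    return -EINVAL;
}
```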



[PATCH v3 18/35] migration: export migrate_fd_completed() and migrate_fd_cleanup()

2012-10-30 Thread Isaku Yamahata
This will be used by postcopy migration.

Signed-off-by: Isaku Yamahata yamah...@valinux.co.jp
---
 migration.c |4 ++--
 migration.h |2 ++
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/migration.c b/migration.c
index 8fcb466..00b0bc2 100644
--- a/migration.c
+++ b/migration.c
@@ -242,7 +242,7 @@ void qmp_migrate_set_capabilities(MigrationCapabilityStatusList *params,
 
 /* shared migration helpers */
 
-static int migrate_fd_cleanup(MigrationState *s)
+int migrate_fd_cleanup(MigrationState *s)
 {
 int ret = 0;
 
@@ -272,7 +272,7 @@ void migrate_fd_error(MigrationState *s)
 migrate_fd_cleanup(s);
 }
 
-static void migrate_fd_completed(MigrationState *s)
+void migrate_fd_completed(MigrationState *s)
 {
 DPRINTF("setting completed state\n");
 if (migrate_fd_cleanup(s) < 0) {
diff --git a/migration.h b/migration.h
index 73416ba..2d27738 100644
--- a/migration.h
+++ b/migration.h
@@ -74,7 +74,9 @@ int fd_start_incoming_migration(const char *path);
 
 int fd_start_outgoing_migration(MigrationState *s, const char *fdname);
 
+int migrate_fd_cleanup(MigrationState *s);
 void migrate_fd_error(MigrationState *s);
+void migrate_fd_completed(MigrationState *s);
 
 void migrate_fd_connect(MigrationState *s);
 
-- 
1.7.10.4



[PATCH v3 19/35] uvmem.h: import Linux uvmem.h and teach update-linux-headers.sh

2012-10-30 Thread Isaku Yamahata
Signed-off-by: Isaku Yamahata yamah...@valinux.co.jp
---
 linux-headers/linux/uvmem.h |   41 +++
 scripts/update-linux-headers.sh |2 +-
 2 files changed, 42 insertions(+), 1 deletion(-)
 create mode 100644 linux-headers/linux/uvmem.h

diff --git a/linux-headers/linux/uvmem.h b/linux-headers/linux/uvmem.h
new file mode 100644
index 000..ea88980
--- /dev/null
+++ b/linux-headers/linux/uvmem.h
@@ -0,0 +1,41 @@
+/*
+ * User process backed memory.
+ * This is mainly for KVM post copy.
+ *
+ * Copyright (c) 2011,
+ * National Institute of Advanced Industrial Science and Technology
+ *
+ * https://sites.google.com/site/grivonhome/quick-kvm-migration
+ * Author: Isaku Yamahata yamahata at valinux co jp
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, see http://www.gnu.org/licenses/.
+ */
+
+#ifndef __LINUX_UVMEM_H
+#define __LINUX_UVMEM_H
+
+#include <linux/types.h>
+#include <linux/ioctl.h>
+
+struct uvmem_init {
+   __u64 size; /* in bytes */
+   __s32 shmem_fd;
+   __s32 padding;
+};
+
+#define UVMEMIO 0x1E
+
+/* ioctl for uvmem fd */
+#define UVMEM_INIT _IOWR(UVMEMIO, 0x0, struct uvmem_init)
+
+#endif /* __LINUX_UVMEM_H */
diff --git a/scripts/update-linux-headers.sh b/scripts/update-linux-headers.sh
index 67be2ef..0fa25ce 100755
--- a/scripts/update-linux-headers.sh
+++ b/scripts/update-linux-headers.sh
@@ -57,7 +57,7 @@ done
 
 rm -rf "$output/linux-headers/linux"
 mkdir -p "$output/linux-headers/linux"
-for header in kvm.h kvm_para.h vfio.h vhost.h virtio_config.h virtio_ring.h; do
+for header in kvm.h kvm_para.h vfio.h vhost.h virtio_config.h virtio_ring.h umem.h; do
 cp "$tmpdir/include/linux/$header" "$output/linux-headers/linux"
 done
 rm -rf $output/linux-headers/asm-generic
-- 
1.7.10.4
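The padding member keeps struct uvmem_init at 16 bytes for both 32-bit and 64-bit userspace (i386 aligns a 64-bit member to 4 bytes, so without it the struct would be 12 bytes there). That matters because _IOWR() encodes the struct size into the UVMEM_INIT ioctl number. The sketch copies the definitions from the header above and checks them with C11 static_assert and the _IOC_* decoding macros from <linux/ioctl.h>; the checks themselves are illustrative.

```c
#include <assert.h>
#include <stddef.h>
#include <linux/ioctl.h>
#include <linux/types.h>

/* Copied from the uvmem.h above; not part of mainline Linux headers. */
struct uvmem_init {
    __u64 size;      /* in bytes */
    __s32 shmem_fd;
    __s32 padding;   /* keeps sizeof == 16 on i386 and x86_64 alike */
};

#define UVMEMIO 0x1E
#define UVMEM_INIT _IOWR(UVMEMIO, 0x0, struct uvmem_init)

/* C11 static_assert from <assert.h>: checked at compile time. */
static_assert(sizeof(struct uvmem_init) == 16, "uvmem_init must be 16 bytes");
static_assert(offsetof(struct uvmem_init, shmem_fd) == 8,
              "shmem_fd must follow the 64-bit size");
```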



[PATCH v4 2/2] umem: chardevice for kvm postcopy

2012-10-30 Thread Isaku Yamahata
This is a character device to hook page access.
A page fault in the area is propagated to another user process by
this char driver. That process then fills in the page contents and
resolves the page fault.

Cc: Andrea Arcangeli aarca...@redhat.com
Cc: Avi Kivity a...@redhat.com
Cc: Paolo Bonzini pbonz...@redhat.com
Signed-off-by: Isaku Yamahata yamah...@valinux.co.jp

---
Changes v4 -> v5:
- rename umem to uvmem to avoid name conflict

Changes v3 -> v4:
- simplified umem_init: kill {a,}sync_req_max
- make fault handler killable even when core-dumping
- documentation

Changes v2 -> v3:
- made fault handler killable
- allow O_LARGEFILE
- improve to handle FAULT_FLAG_ALLOW_RETRY
- smart on async fault
---
 Documentation/misc-devices/uvmem.txt |  292 
 drivers/char/Kconfig |   10 +
 drivers/char/Makefile|1 +
 drivers/char/uvmem.c |  841 ++
 include/linux/uvmem.h|   41 ++
 5 files changed, 1185 insertions(+)
 create mode 100644 Documentation/misc-devices/uvmem.txt
 create mode 100644 drivers/char/uvmem.c
 create mode 100644 include/linux/uvmem.h

diff --git a/Documentation/misc-devices/uvmem.txt b/Documentation/misc-devices/uvmem.txt
new file mode 100644
index 000..a9c15a2
--- /dev/null
+++ b/Documentation/misc-devices/uvmem.txt
@@ -0,0 +1,292 @@
+User process backed memory driver
+=================================
+
+Intro
+=====
+The user process backed memory driver provides the /dev/uvmem device.
+This /dev/uvmem device is designed for some sort of distributed shared memory,
+especially post-copy live migration with KVM.
+
+A page fault in the area backed by this driver is propagated to another
+process, the server, which serves the page contents. Usually the server process
+fetches the page contents from the remote machine. Then the faulting process continues.
+
+
+Kernel-User protocol
+====================
+ioctl
+UVMEM_INIT: Initialize the uvmem device with some parameters.
+  IN size: the area size in bytes (rounded up to the page size)
+  OUT shmem_fd: the file descriptor of a tmpfs file associated with this
+    uvmem device. It serves as the backing store of this uvmem device.
+
+mmap: Mapping the initialized uvmem device provides the area which
+  is served by the user process.
+  A fault in this area is propagated to the uvmem device and reported
+  through the read system call.
+read: the kernel notifies a process which pages have faulted by returning
+  page offsets, in units of the page size, as u64 values.
+  The uvmem device is pollable for read.
+write: the process notifies the kernel that pages are ready to access
+   by writing their page offsets, in units of the page size, as u64 values.
+
+
+operation flow
+==============
+
+                |
+                V
+          open(/dev/uvmem)
+                |
+                V
+          ioctl(UVMEM_INIT)
+                |
+                V
+          Here we have two file descriptors to
+          uvmem device and shmem file
+                |
+                |                               daemon process which serves
+                |                               page fault
+                V
+          fork()---------------------------------------,
+                |                                      |
+                V                                      V
+          close(shmem)                          mmap(shmem file)
+                |                                      |
+                V                                      V
+          mmap(uvmem device)                    close(shmem file)
+                |                                      |
+                V                                      |
+          close(uvmem device)                          |
+                |                                      |
+          now the setup is done                        |
+          work on the uvmem area                       |
+                |                                      |
+                V                                      V
+          access uvmem area                     (poll and) read(uvmem)
+                |                                      |
+                V                                      V
+          page fault -------------------------> read system call returns
+          block                                 page offsets
+                                                       |
+                                                       V
+                                                create page contents
+                                                (usually pull the page
+                                                 from remote)
+                                                write the page contents
+                                                to the shmem which was
+                                                mmapped above
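The setup and fault flow in the diagram can be simulated without the driver to show who blocks on whom. In this hypothetical sketch, simulate_fault() uses a MAP_SHARED anonymous mapping as a stand-in for the tmpfs shmem file and two pipes as stand-ins for read() and write() on the uvmem fd. There is no real page fault: the faulting side simply blocks until the daemon acknowledges the page offset.

```c
#define _GNU_SOURCE /* MAP_ANONYMOUS */
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical simulation of the flow above, without /dev/uvmem:
 *   shmem      - MAP_SHARED anonymous memory instead of the tmpfs file
 *   fault_pipe - what read(uvmem) would return to the daemon
 *   ack_pipe   - what the daemon would write(uvmem) back
 * Returns the first byte the faulting side sees in page pgoff, or -1.
 */
static int simulate_fault(uint64_t pgoff, size_t page_size, size_t nr_pages,
                          unsigned char fill)
{
    int fault_pipe[2], ack_pipe[2];
    size_t len = page_size * nr_pages;
    unsigned char *shmem;
    uint64_t done = UINT64_MAX;
    int result = -1;
    pid_t pid;

    assert(pgoff < nr_pages);
    shmem = mmap(NULL, len, PROT_READ | PROT_WRITE,
                 MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (shmem == MAP_FAILED) {
        return -1;
    }
    if (pipe(fault_pipe) < 0 || pipe(ack_pipe) < 0) {
        munmap(shmem, len);
        return -1;  /* sketch: fds of a half-created pipe pair leak */
    }
    pid = fork();
    if (pid < 0) {
        munmap(shmem, len);
        return -1;
    }
    if (pid == 0) {
        /* daemon: (poll and) read(uvmem), fill the page, write(uvmem) */
        uint64_t req;
        close(fault_pipe[1]);
        close(ack_pipe[0]);
        if (read(fault_pipe[0], &req, sizeof(req)) != (ssize_t)sizeof(req) ||
            req >= nr_pages) {
            _exit(1);
        }
        memset(shmem + req * page_size, fill, page_size);
        _exit(write(ack_pipe[1], &req, sizeof(req)) ==
              (ssize_t)sizeof(req) ? 0 : 1);
    }
    /* faulting side: report the fault, then block until it is served */
    close(fault_pipe[0]);
    close(ack_pipe[1]);
    if (write(fault_pipe[1], &pgoff, sizeof(pgoff)) == (ssize_t)sizeof(pgoff) &&
        read(ack_pipe[0], &done, sizeof(done)) == (ssize_t)sizeof(done) &&
        done == pgoff) {
        result = shmem[pgoff * page_size];
    }
    close(fault_pipe[1]);
    close(ack_pipe[0]);
    waitpid(pid, NULL, 0);
    munmap(shmem, len);
    return result;
}
```

Closing the unused pipe ends in each process matters: if the daemon exits early, the faulting side sees EOF instead of blocking forever.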

[PATCH v3 17/35] arch_init: factor out logic to find ram block with id string

2012-10-30 Thread Isaku Yamahata
Signed-off-by: Isaku Yamahata yamah...@valinux.co.jp
---
 arch_init.c |   31 ---
 arch_init.h |1 +
 exec.c  |   12 ++--
 3 files changed, 27 insertions(+), 17 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index c77e24d..d82316d 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -762,6 +762,19 @@ static int load_xbzrle(QEMUFile *f, void *host)
 return rc;
 }
 
+RAMBlock *ram_find_block(const char *id, uint8_t len)
+{
+RAMBlock *block;
+
+QLIST_FOREACH(block, &ram_list.blocks, next) {
+if (!strncmp(id, block->idstr, len)) {
+return block;
+}
+}
+
+return NULL;
+}
+
 static inline void *host_from_stream_offset(QEMUFile *f,
 ram_addr_t offset,
 int flags)
@@ -783,9 +796,9 @@ static inline void *host_from_stream_offset(QEMUFile *f,
 qemu_get_buffer(f, (uint8_t *)id, len);
 id[len] = 0;
 
-QLIST_FOREACH(block, &ram_list.blocks, next) {
-if (!strncmp(id, block->idstr, sizeof(id)))
-return memory_region_get_ram_ptr(block->mr) + offset;
+block = ram_find_block(id, len);
+if (block) {
+return memory_region_get_ram_ptr(block->mr) + offset;
 }
 
 fprintf(stderr, "Can't find block %s!\n", id);
@@ -807,19 +820,15 @@ int ram_load_mem_size(QEMUFile *f, ram_addr_t total_ram_bytes)
 id[len] = 0;
 length = qemu_get_be64(f);
 
-QLIST_FOREACH(block, &ram_list.blocks, next) {
-if (!strncmp(id, block->idstr, sizeof(id))) {
-if (block->length != length)
-return -EINVAL;
-break;
-}
-}
-
+block = ram_find_block(id, len);
 if (!block) {
 fprintf(stderr, "Unknown ramblock \"%s\", cannot "
 "accept migration\n", id);
 return -EINVAL;
 }
+if (block->length != length) {
+return -EINVAL;
+}
 
 total_ram_bytes -= length;
 }
diff --git a/arch_init.h b/arch_init.h
index bca1a29..499d0f1 100644
--- a/arch_init.h
+++ b/arch_init.h
@@ -51,6 +51,7 @@ int ram_load_page(QEMUFile *f, void *host, int flags);
 #if defined(NEED_CPU_H) && !defined(CONFIG_USER_ONLY)
 bool ram_save_page(QEMUFile *f, RAMBlock *block, ram_addr_t offset,
bool last_stage);
+RAMBlock *ram_find_block(const char *id, uint8_t len);
 int ram_load_mem_size(QEMUFile *f, ram_addr_t total_ram_bytes);
 #endif
 
diff --git a/exec.c b/exec.c
index 1414654..2aa4d90 100644
--- a/exec.c
+++ b/exec.c
@@ -33,6 +33,7 @@
 #include "kvm.h"
 #include "hw/xen.h"
 #include "qemu-timer.h"
+#include "arch_init.h"
 #include "memory.h"
 #include "exec-memory.h"
 #if defined(CONFIG_USER_ONLY)
@@ -2517,12 +2518,11 @@ void qemu_ram_set_idstr(ram_addr_t addr, const char *name, DeviceState *dev)
 pstrcat(new_block->idstr, sizeof(new_block->idstr), name);
 
 qemu_mutex_lock_ramlist();
-QLIST_FOREACH(block, &ram_list.blocks, next) {
-if (block != new_block && !strcmp(block->idstr, new_block->idstr)) {
-fprintf(stderr, "RAMBlock \"%s\" already registered, abort!\n",
-new_block->idstr);
-abort();
-}
+block = ram_find_block(new_block->idstr, strlen(new_block->idstr));
+if (block != new_block) {
+fprintf(stderr, "RAMBlock \"%s\" already registered, abort!\n",
+new_block->idstr);
+abort();
 }
 qemu_mutex_unlock_ramlist();
 }
-- 
1.7.10.4
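One detail that may be worth double-checking in this refactor: ram_find_block() compares only the first len bytes, so with len equal to strlen(id) it also matches a block whose idstr merely starts with id, whereas the old sizeof(id) bound (and strcmp() in exec.c) compared through the terminating NUL. The sketch below contrasts the prefix lookup with an exact-match variant; SketchBlock, the two lookup functions and the block names in the test are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for RAMBlock; only idstr matters here. */
typedef struct SketchBlock {
    char idstr[256];
} SketchBlock;

/* Lookup as written in the patch: compares only the first len bytes. */
static const SketchBlock *find_block_prefix(const SketchBlock *blocks,
                                            size_t n, const char *id,
                                            size_t len)
{
    for (size_t i = 0; i < n; i++) {
        if (!strncmp(id, blocks[i].idstr, len)) {
            return &blocks[i];
        }
    }
    return NULL;
}

/* Exact-match variant: the stored idstr must also end after len bytes. */
static const SketchBlock *find_block_exact(const SketchBlock *blocks,
                                           size_t n, const char *id,
                                           size_t len)
{
    assert(len < sizeof(blocks[0].idstr));
    for (size_t i = 0; i < n; i++) {
        if (!strncmp(id, blocks[i].idstr, len) &&
            blocks[i].idstr[len] == '\0') {
            return &blocks[i];
        }
    }
    return NULL;
}
```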



[PATCH v3 07/35] savevm: export qemu_peek_buffer, qemu_peek_byte, qemu_file_skip, qemu_fflush

2012-10-30 Thread Isaku Yamahata
These will be used by postcopy.

Signed-off-by: Isaku Yamahata yamah...@valinux.co.jp
---
 qemu-file.h |4 
 savevm.c|8 
 2 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/qemu-file.h b/qemu-file.h
index 9c8985b..9b6dd08 100644
--- a/qemu-file.h
+++ b/qemu-file.h
@@ -72,6 +72,7 @@ QEMUFile *qemu_popen(FILE *popen_file, const char *mode);
 QEMUFile *qemu_popen_cmd(const char *command, const char *mode);
 int qemu_stdio_fd(QEMUFile *f);
 int qemu_fclose(QEMUFile *f);
+int qemu_fflush(QEMUFile *f);
 void qemu_put_buffer(QEMUFile *f, const uint8_t *buf, int size);
 void qemu_put_byte(QEMUFile *f, int v);
 
@@ -87,6 +88,9 @@ void qemu_put_be32(QEMUFile *f, unsigned int v);
 void qemu_put_be64(QEMUFile *f, uint64_t v);
 int qemu_get_buffer(QEMUFile *f, uint8_t *buf, int size);
 int qemu_get_byte(QEMUFile *f);
+int qemu_peek_byte(QEMUFile *f, int offset);
+int qemu_peek_buffer(QEMUFile *f, uint8_t *buf, int size, size_t offset);
+void qemu_file_skip(QEMUFile *f, int size);
 
 static inline unsigned int qemu_get_ubyte(QEMUFile *f)
 {
diff --git a/savevm.c b/savevm.c
index b080d37..0c7af43 100644
--- a/savevm.c
+++ b/savevm.c
@@ -448,7 +448,7 @@ static void qemu_file_set_error(QEMUFile *f, int ret)
 /** Flushes QEMUFile buffer
  *
  */
-static int qemu_fflush(QEMUFile *f)
+int qemu_fflush(QEMUFile *f)
 {
 int ret = 0;
 
@@ -583,14 +583,14 @@ void qemu_put_byte(QEMUFile *f, int v)
 }
 }
 
-static void qemu_file_skip(QEMUFile *f, int size)
+void qemu_file_skip(QEMUFile *f, int size)
 {
 if (f->buf_index + size <= f->buf_size) {
 f->buf_index += size;
 }
 }
 
-static int qemu_peek_buffer(QEMUFile *f, uint8_t *buf, int size, size_t offset)
+int qemu_peek_buffer(QEMUFile *f, uint8_t *buf, int size, size_t offset)
 {
 int pending;
 int index;
@@ -638,7 +638,7 @@ int qemu_get_buffer(QEMUFile *f, uint8_t *buf, int size)
 return done;
 }
 
-static int qemu_peek_byte(QEMUFile *f, int offset)
+int qemu_peek_byte(QEMUFile *f, int offset)
 {
 int index = f->buf_index + offset;
 
-- 
1.7.10.4
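Exporting qemu_peek_byte(), qemu_peek_buffer() and qemu_file_skip() lets postcopy code look ahead in the stream, for example at a section-type byte, before deciding to consume it. A minimal in-memory sketch of those semantics, assuming a plain byte array in place of QEMUFile; unlike the real functions it never refills the buffer from the underlying file, and the SketchFile type and sketch_* functions are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory stand-in for QEMUFile's read buffer. */
typedef struct SketchFile {
    const unsigned char *buf;
    int buf_index;   /* next unread byte */
    int buf_size;    /* number of valid bytes in buf */
} SketchFile;

/* Like qemu_peek_byte(): look ahead without consuming; -1 if not buffered. */
static int sketch_peek_byte(const SketchFile *f, int offset)
{
    int index;

    assert(f != NULL && offset >= 0);
    index = f->buf_index + offset;
    if (index >= f->buf_size) {
        return -1;
    }
    return f->buf[index];
}

/* Like qemu_peek_buffer(): copy up to size bytes, offset bytes ahead. */
static int sketch_peek_buffer(const SketchFile *f, unsigned char *out,
                              int size, int offset)
{
    int avail;

    assert(f != NULL && out != NULL && size >= 0 && offset >= 0);
    avail = f->buf_size - f->buf_index - offset;
    if (avail <= 0) {
        return 0;
    }
    if (size > avail) {
        size = avail;
    }
    memcpy(out, f->buf + f->buf_index + offset, (size_t)size);
    return size;
}

/* Like qemu_file_skip(): consume size bytes only if all are buffered. */
static void sketch_file_skip(SketchFile *f, int size)
{
    assert(f != NULL);
    if (f->buf_index + size <= f->buf_size) {
        f->buf_index += size;
    }
}
```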



[PATCH v3 12/35] arch_init: export RAM_SAVE_xxx flags for postcopy

2012-10-30 Thread Isaku Yamahata
These constants will also be used by postcopy.

Signed-off-by: Isaku Yamahata yamah...@valinux.co.jp
---
 arch_init.c |8 
 arch_init.h |8 
 2 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index a312434..4b65221 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -106,14 +106,6 @@ const uint32_t arch_type = QEMU_ARCH;
 /***/
 /* ram save/restore */
 
-#define RAM_SAVE_FLAG_FULL 0x01 /* Obsolete, not used anymore */
-#define RAM_SAVE_FLAG_COMPRESS 0x02
-#define RAM_SAVE_FLAG_MEM_SIZE 0x04
-#define RAM_SAVE_FLAG_PAGE 0x08
-#define RAM_SAVE_FLAG_EOS  0x10
-#define RAM_SAVE_FLAG_CONTINUE 0x20
-#define RAM_SAVE_FLAG_XBZRLE   0x40
-
 #ifdef __ALTIVEC__
 #include <altivec.h>
 #define VECTYPE        vector unsigned char
diff --git a/arch_init.h b/arch_init.h
index d9c572a..e4c131e 100644
--- a/arch_init.h
+++ b/arch_init.h
@@ -36,4 +36,12 @@ int xen_available(void);
 
 CpuDefinitionInfoList GCC_WEAK_DECL *arch_query_cpu_definitions(Error **errp);
 
+#define RAM_SAVE_FLAG_FULL 0x01 /* Obsolete, not used anymore */
+#define RAM_SAVE_FLAG_COMPRESS 0x02
+#define RAM_SAVE_FLAG_MEM_SIZE 0x04
+#define RAM_SAVE_FLAG_PAGE 0x08
+#define RAM_SAVE_FLAG_EOS  0x10
+#define RAM_SAVE_FLAG_CONTINUE 0x20
+#define RAM_SAVE_FLAG_XBZRLE   0x40
+
 #endif
-- 
1.7.10.4
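In arch_init.c these flags share one 64-bit big-endian header word with the page offset: because offsets are page aligned, the low TARGET_PAGE_BITS bits are free to carry flags, and the loader masks them back out. A sketch of that encoding, assuming 4 KiB target pages; ram_header_encode() and ram_header_decode() are hypothetical names, and byte-order handling (qemu_put_be64()) is left out.

```c
#include <assert.h>
#include <stdint.h>

/* Flag values as moved into arch_init.h by this patch. */
#define RAM_SAVE_FLAG_COMPRESS 0x02
#define RAM_SAVE_FLAG_MEM_SIZE 0x04
#define RAM_SAVE_FLAG_PAGE     0x08
#define RAM_SAVE_FLAG_EOS      0x10
#define RAM_SAVE_FLAG_CONTINUE 0x20
#define RAM_SAVE_FLAG_XBZRLE   0x40

/* Assumption for this sketch: 4 KiB target pages. */
#define SKETCH_PAGE_BITS 12
#define SKETCH_PAGE_MASK (~((UINT64_C(1) << SKETCH_PAGE_BITS) - 1))

/* Page-aligned offset in the high bits, flags in the low bits. */
static uint64_t ram_header_encode(uint64_t offset, uint64_t flags)
{
    assert((offset & ~SKETCH_PAGE_MASK) == 0);  /* offset is page aligned */
    assert((flags & SKETCH_PAGE_MASK) == 0);    /* flags fit below a page */
    return offset | flags;
}

/* Split a header word back into offset and flags. */
static void ram_header_decode(uint64_t header, uint64_t *offset,
                              uint64_t *flags)
{
    *flags = header & ~SKETCH_PAGE_MASK;
    *offset = header & SKETCH_PAGE_MASK;
}
```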



[PATCH v3 11/35] savevm, buffered_file: introduce method to drain buffer of buffered file

2012-10-30 Thread Isaku Yamahata
Introduce a new method to drain the buffer of QEMUBufferedFile.
During postcopy migration, the buffer can grow without bound.
To keep the buffer reasonably small, introduce a method that
waits for the buffer to drain.
Also detect that output is unfrozen via select(), not only via the timer,
so that pending data can be sent quickly.
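The drain idea can be sketched with plain POSIX calls: keep writing until nothing is pending, and when the descriptor is not ready, wait for it to become writable instead of retrying on a timer (the patch gets a similar wakeup through qemu_set_fd_handler() on a dup()ed fd). This is an illustration under those assumptions; SketchBuffer and sketch_drain() are hypothetical and do not model xfer_limit or the migration state.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <stddef.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical stand-in for the pending part of QEMUFileBuffered. */
typedef struct SketchBuffer {
    const unsigned char *data;  /* first pending byte */
    size_t size;                /* bytes still pending */
} SketchBuffer;

/*
 * Write until nothing is pending. If the fd is non-blocking and full
 * (EAGAIN), wait in poll() for POLLOUT rather than on a timer.
 * Returns 0 when drained, or a negative errno value on error.
 */
static int sketch_drain(SketchBuffer *b, int fd)
{
    assert(b != NULL);
    while (b->size > 0) {
        ssize_t ret = write(fd, b->data, b->size);

        if (ret < 0) {
            if (errno == EINTR) {
                continue;
            }
            if (errno != EAGAIN && errno != EWOULDBLOCK) {
                return -errno;
            }
            struct pollfd pfd = { .fd = fd, .events = POLLOUT };
            if (poll(&pfd, 1, -1) < 0 && errno != EINTR) {
                return -errno;
            }
            continue;
        }
        b->data += ret;
        b->size -= (size_t)ret;
    }
    return 0;
}
```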

Signed-off-by: Isaku Yamahata yamah...@valinux.co.jp
---
 buffered_file.c |   59 +--
 buffered_file.h |1 +
 qemu-file.h |1 +
 savevm.c|7 +++
 4 files changed, 58 insertions(+), 10 deletions(-)

diff --git a/buffered_file.c b/buffered_file.c
index ed92df1..275d504 100644
--- a/buffered_file.c
+++ b/buffered_file.c
@@ -26,12 +26,14 @@ typedef struct QEMUFileBuffered
 MigrationState *migration_state;
 QEMUFile *file;
 int freeze_output;
+bool no_limit;
 size_t bytes_xfer;
 size_t xfer_limit;
 uint8_t *buffer;
 size_t buffer_size;
 size_t buffer_capacity;
 QEMUTimer *timer;
+int unfreeze_fd;
 } QEMUFileBuffered;
 
 #ifdef DEBUG_BUFFERED_FILE
@@ -42,6 +44,16 @@ typedef struct QEMUFileBuffered
 do { } while (0)
 #endif
 
+static ssize_t buffered_flush(QEMUFileBuffered *s);
+
+static void buffered_unfreeze(void *opaque)
+{
+QEMUFileBuffered *s = opaque;
+qemu_set_fd_handler(s->unfreeze_fd, NULL, NULL, NULL);
+s->freeze_output = 0;
+buffered_flush(s);
+}
+
 static void buffered_append(QEMUFileBuffered *s,
 const uint8_t *buf, size_t size)
 {
@@ -65,7 +77,8 @@ static ssize_t buffered_flush(QEMUFileBuffered *s)
 
 DPRINTF("flushing %zu byte(s) of data\n", s->buffer_size);
 
-while (s->bytes_xfer < s->xfer_limit && offset < s->buffer_size) {
+while ((s->bytes_xfer < s->xfer_limit && offset < s->buffer_size) ||
+   s->no_limit) {
 
 ret = migrate_fd_put_buffer(s->migration_state, s->buffer + offset,
 s->buffer_size - offset);
@@ -73,6 +86,15 @@ static ssize_t buffered_flush(QEMUFileBuffered *s)
 DPRINTF("backend not ready, freezing\n");
 ret = 0;
 s->freeze_output = 1;
+if (!s->no_limit) {
+if (s->unfreeze_fd == -1) {
+s->unfreeze_fd = dup(s->migration_state->fd);
+}
+if (s->unfreeze_fd >= 0) {
+qemu_set_fd_handler(s->unfreeze_fd,
+NULL, buffered_unfreeze, s);
+}
+}
 break;
 }
 
@@ -113,7 +135,7 @@ static int buffered_put_buffer(void *opaque, const uint8_t *buf, int64_t pos, in
 s->freeze_output = 0;
 
 if (size > 0) {
-DPRINTF("buffering %d bytes\n", size - offset);
+DPRINTF("buffering %d bytes\n", size);
 buffered_append(s, buf, size);
 }
 
@@ -134,17 +156,11 @@ static int buffered_put_buffer(void *opaque, const uint8_t *buf, int64_t pos, in
 return size;
 }
 
-static int buffered_close(void *opaque)
+static void buffered_drain(QEMUFileBuffered *s)
 {
-QEMUFileBuffered *s = opaque;
-ssize_t ret = 0;
-int ret2;
-
-DPRINTF(closing\n);
-
 s->xfer_limit = INT_MAX;
 while (!qemu_file_get_error(s->file) && s->buffer_size) {
-ret = buffered_flush(s);
+ssize_t ret = buffered_flush(s);
 if (ret < 0) {
 break;
 }
@@ -153,13 +169,27 @@ static int buffered_close(void *opaque)
 if (ret < 0) {
 break;
 }
+s->freeze_output = 0;
 }
 }
+}
+
+static int buffered_close(void *opaque)
+{
+QEMUFileBuffered *s = opaque;
+ssize_t ret = 0;
+int ret2;
 
+DPRINTF("closing\n");
+
+buffered_drain(s);
 ret2 = migrate_fd_close(s->migration_state);
 if (ret >= 0) {
 ret = ret2;
 }
+if (s->unfreeze_fd >= 0) {
+close(s->unfreeze_fd);
+}
 qemu_del_timer(s->timer);
 qemu_free_timer(s->timer);
 g_free(s->buffer);
@@ -242,6 +272,7 @@ QEMUFile *qemu_fopen_ops_buffered(MigrationState *migration_state)
 
 s->migration_state = migration_state;
 s->xfer_limit = migration_state->bandwidth_limit / 10;
+s->unfreeze_fd = -1;
 
 s->file = qemu_fopen_ops(s, buffered_put_buffer, NULL,
  buffered_close, buffered_rate_limit,
@@ -254,3 +285,11 @@ QEMUFile *qemu_fopen_ops_buffered(MigrationState *migration_state)
 
 return s->file;
 }
+
+void qemu_buffered_file_drain_buffer(void *buffered_file)
+{
+QEMUFileBuffered *s = buffered_file;
+s->no_limit = true;
+buffered_drain(s);
+s->no_limit = false;
+}
diff --git a/buffered_file.h b/buffered_file.h
index ef010fe..be714a7 100644
--- a/buffered_file.h
+++ b/buffered_file.h
@@ -18,5 +18,6 @@
 #include "migration.h"
 
 QEMUFile *qemu_fopen_ops_buffered(MigrationState *migration_state);
+void qemu_buffered_file_drain_buffer(void *buffered_file);
 
 #endif
diff --git a/qemu-file.h b/qemu-file.h
index 452efcd..8074df1 100644

[PATCH v3 09/35] savevm/QEMUFile: introduce qemu_fopen_fd

2012-10-30 Thread Isaku Yamahata
Introduce an fd read/write backend for QEMUFile whose fd can be non-blocking.
This will be used by postcopy live migration.

Signed-off-by: Isaku Yamahata yamah...@valinux.co.jp
---
 qemu-file.h |1 +
 savevm.c|   35 +++
 2 files changed, 36 insertions(+)

diff --git a/qemu-file.h b/qemu-file.h
index bc222dc..94557ea 100644
--- a/qemu-file.h
+++ b/qemu-file.h
@@ -68,6 +68,7 @@ QEMUFile *qemu_fopen_ops(void *opaque, QEMUFilePutBufferFunc *put_buffer,
 QEMUFile *qemu_fopen(const char *filename, const char *mode);
 QEMUFile *qemu_fdopen(int fd, const char *mode);
 QEMUFile *qemu_fopen_socket(int fd);
+QEMUFile *qemu_fopen_fd(int fd, const char *mode);
 QEMUFile *qemu_popen(FILE *popen_file, const char *mode);
 QEMUFile *qemu_popen_cmd(const char *command, const char *mode);
 int qemu_file_fd(QEMUFile *f);
diff --git a/savevm.c b/savevm.c
index e24041b..712b7ae 100644
--- a/savevm.c
+++ b/savevm.c
@@ -207,6 +207,19 @@ static int socket_get_buffer(void *opaque, uint8_t *buf, int64_t pos, int size)
 return len;
 }
 
+static int fd_get_buffer(void *opaque, uint8_t *buf, int64_t pos, int size)
+{
+QEMUFileFD *s = opaque;
+return qemu_read_full(s->file->fd, buf, size);
+}
+
+static int fd_put_buffer(void *opaque,
+ const uint8_t *buf, int64_t pos, int size)
+{
+QEMUFileFD *s = opaque;
+return qemu_write_full(s->file->fd, buf, size);
+}
+
 static int fd_close(void *opaque)
 {
 QEMUFileFD *s = opaque;
@@ -333,6 +346,28 @@ QEMUFile *qemu_fopen_socket(int fd)
 return s->file;
 }
 
+QEMUFile *qemu_fopen_fd(int fd, const char *mode)
+{
+QEMUFileFD *s;
+
+if (mode == NULL || (mode[0] != 'r' && mode[0] != 'w') || mode[1] != 0) {
+fprintf(stderr, "qemu_fopen_fd: Argument validity check failed\n");
+return NULL;
+}
+
+s = g_malloc0(sizeof(*s));
+if (mode[0] == 'r') {
+s->file = qemu_fopen_ops(s, NULL, fd_get_buffer, fd_close,
+ NULL, NULL, NULL);
+} else {
+s->file = qemu_fopen_ops(s, fd_put_buffer, NULL, fd_close,
+ NULL, NULL, NULL);
+}
+
+s->file->fd = fd;
+return s->file;
+}
+
 static int file_put_buffer(void *opaque, const uint8_t *buf,
 int64_t pos, int size)
 {
-- 
1.7.10.4
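fd_get_buffer() and fd_put_buffer() delegate to qemu_read_full() and qemu_write_full(), which are meant to transfer the whole request despite short reads/writes and signal interruptions. A minimal sketch of such a read loop, assuming plain POSIX read(); `read_full_sketch()` is a hypothetical name and not the osdep.c implementation.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: read exactly `count` bytes unless EOF or an error
 * intervenes. Retries on EINTR and keeps going after short reads.
 * Returns the number of bytes read (short only at EOF), or -1 on error.
 */
static ssize_t read_full_sketch(int fd, void *buf, size_t count)
{
    unsigned char *p = buf;
    size_t total = 0;

    assert(buf != NULL || count == 0);
    while (total < count) {
        ssize_t n = read(fd, p + total, count - total);
        if (n < 0) {
            if (errno == EINTR) {
                continue;          /* interrupted by a signal: retry */
            }
            return -1;             /* includes EAGAIN on a non-blocking fd */
        }
        if (n == 0) {
            break;                 /* EOF: return what we have so far */
        }
        total += (size_t)n;
    }
    return (ssize_t)total;
}
```

With a non-blocking fd this sketch reports EAGAIN as an error (-1) instead of waiting, so a caller would have to poll for readability first.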



[PATCH v3 05/35] protect the ramlist with a separate mutex

2012-10-30 Thread Isaku Yamahata
From: Umesh Deshpande udesh...@redhat.com

Add the new mutex that protects shared state between ram_save_live
and the iothread.  If the iothread mutex has to be taken together
with the ramlist mutex, the iothread shall always be _outside_.

Signed-off-by: Paolo Bonzini pbonz...@redhat.com
Signed-off-by: Umesh Deshpande udesh...@redhat.com
Signed-off-by: Juan Quintela quint...@redhat.com
---
 arch_init.c |9 -
 cpu-all.h   |8 
 exec.c  |   23 +--
 3 files changed, 37 insertions(+), 3 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index eb36a6a..a312434 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -553,7 +553,6 @@ static void ram_migration_cancel(void *opaque)
 migration_end();
 }
 
-
 static void reset_ram_globals(void)
 {
 last_block = NULL;
@@ -573,6 +572,7 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
 bitmap_set(migration_bitmap, 1, ram_pages);
 migration_dirty_pages = ram_pages;
 
+qemu_mutex_lock_ramlist();
 bytes_transferred = 0;
 reset_ram_globals();
 
@@ -600,6 +600,7 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
 qemu_put_be64(f, block->length);
 }
 
+qemu_mutex_unlock_ramlist();
 qemu_put_be64(f, RAM_SAVE_FLAG_EOS);
 
 return 0;
@@ -614,6 +615,8 @@ static int ram_save_iterate(QEMUFile *f, void *opaque)
 uint64_t expected_downtime;
 MigrationState *s = migrate_get_current();
 
+qemu_mutex_lock_ramlist();
+
 if (ram_list.version != last_version) {
 reset_ram_globals();
 }
@@ -662,6 +665,7 @@ static int ram_save_iterate(QEMUFile *f, void *opaque)
 bwidth = 0.01;
 }
 
+qemu_mutex_unlock_ramlist();
 qemu_put_be64(f, RAM_SAVE_FLAG_EOS);
 
 expected_downtime = ram_save_remaining() * TARGET_PAGE_SIZE / bwidth;
@@ -682,6 +686,8 @@ static int ram_save_complete(QEMUFile *f, void *opaque)
 {
 migration_bitmap_sync();
 
+qemu_mutex_lock_ramlist();
+
 /* try transferring iterative blocks of memory */
 
 /* flush all remaining blocks regardless of rate limiting */
@@ -697,6 +703,7 @@ static int ram_save_complete(QEMUFile *f, void *opaque)
 }
 memory_global_dirty_log_stop();
 
+qemu_mutex_unlock_ramlist();
 qemu_put_be64(f, RAM_SAVE_FLAG_EOS);
 
 g_free(migration_bitmap);
diff --git a/cpu-all.h b/cpu-all.h
index 84aea8b..b5fefc8 100644
--- a/cpu-all.h
+++ b/cpu-all.h
@@ -22,6 +22,7 @@
 #include "qemu-common.h"
 #include "qemu-tls.h"
 #include "cpu-common.h"
+#include "qemu-thread.h"
 
 /* some important defines:
  *
@@ -490,7 +491,9 @@ typedef struct RAMBlock {
 ram_addr_t offset;
 ram_addr_t length;
 uint32_t flags;
+/* Protected by the iothread lock.  */
 QLIST_ENTRY(RAMBlock) next_mru;
+/* Protected by the ramlist lock.  */
 QLIST_ENTRY(RAMBlock) next;
 char idstr[256];
 #if defined(__linux__) && !defined(TARGET_S390X)
@@ -499,9 +502,12 @@ typedef struct RAMBlock {
 } RAMBlock;
 
 typedef struct RAMList {
+QemuMutex mutex;
+/* Protected by the iothread lock.  */
 uint8_t *phys_dirty;
 uint32_t version;
 QLIST_HEAD(, RAMBlock) blocks_mru;
+/* Protected by the ramlist lock.  */
 QLIST_HEAD(, RAMBlock) blocks;
 } RAMList;
 extern RAMList ram_list;
@@ -521,6 +527,8 @@ extern int mem_prealloc;
 
 void dump_exec_info(FILE *f, fprintf_function cpu_fprintf);
 ram_addr_t last_ram_offset(void);
+void qemu_mutex_lock_ramlist(void);
+void qemu_mutex_unlock_ramlist(void);
 #endif /* !CONFIG_USER_ONLY */
 
 int cpu_memory_rw_debug(CPUArchState *env, target_ulong addr,
diff --git a/exec.c b/exec.c
index f5a8aca..1414654 100644
--- a/exec.c
+++ b/exec.c
@@ -645,6 +645,7 @@ bool tcg_enabled(void)
 void cpu_exec_init_all(void)
 {
 #if !defined(CONFIG_USER_ONLY)
+qemu_mutex_init(&ram_list.mutex);
 memory_map_init();
 io_mem_init();
 #endif
@@ -2324,6 +2325,16 @@ void qemu_flush_coalesced_mmio_buffer(void)
 kvm_flush_coalesced_mmio_buffer();
 }
 
+void qemu_mutex_lock_ramlist(void)
+{
+qemu_mutex_lock(&ram_list.mutex);
+}
+
+void qemu_mutex_unlock_ramlist(void)
+{
+qemu_mutex_unlock(&ram_list.mutex);
+}
+
 #if defined(__linux__) && !defined(TARGET_S390X)
 
 #include <sys/vfs.h>
@@ -2505,6 +2516,7 @@ void qemu_ram_set_idstr(ram_addr_t addr, const char *name, DeviceState *dev)
 }
 pstrcat(new_block->idstr, sizeof(new_block->idstr), name);
 
+qemu_mutex_lock_ramlist();
 QLIST_FOREACH(block, &ram_list.blocks, next) {
 if (block != new_block && !strcmp(block->idstr, new_block->idstr)) {
 fprintf(stderr, "RAMBlock \"%s\" already registered, abort!\n",
@@ -2512,6 +2524,7 @@ void qemu_ram_set_idstr(ram_addr_t addr, const char *name, DeviceState *dev)
 abort();
 }
 }
+qemu_mutex_unlock_ramlist();
 }
 
 static int memory_try_enable_merging(void *addr, size_t len)
@@ -2535,6 +2548,7 @@ ram_addr_t qemu_ram_alloc_from_ptr(ram_addr_t size, void *host,
 size 
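The ordering rule in this patch (when both locks are needed, the iothread lock is taken outside and the ramlist lock inside) is what rules out an ABBA deadlock between the two. A minimal sketch with POSIX mutexes; `iothread_lock`, `ramlist_lock` and the helper functions are hypothetical stand-ins for qemu_mutex_lock_iothread() and qemu_mutex_lock_ramlist(), not QEMU code.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-ins for the iothread lock and the ramlist lock. */
static pthread_mutex_t iothread_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t ramlist_lock = PTHREAD_MUTEX_INITIALIZER;
static int ram_blocks;                    /* protected by ramlist_lock */

/* A caller needing both locks: outer (iothread) first, inner (ramlist)
 * second, released in reverse order. If every such path uses this order,
 * no two threads can each hold one lock while waiting for the other. */
static void add_block_with_both_locks(void)
{
    pthread_mutex_lock(&iothread_lock);
    pthread_mutex_lock(&ramlist_lock);
    ram_blocks++;
    pthread_mutex_unlock(&ramlist_lock);
    pthread_mutex_unlock(&iothread_lock);
}

/* A migration-thread style caller that only walks the list takes just the
 * inner lock, and must not try to take the iothread lock while holding it. */
static int count_blocks(void)
{
    int n;

    pthread_mutex_lock(&ramlist_lock);
    n = ram_blocks;
    pthread_mutex_unlock(&ramlist_lock);
    assert(n >= 0);
    return n;
}
```

This mirrors the hunks above, where ram_save_setup(), ram_save_iterate() and ram_save_complete() take only qemu_mutex_lock_ramlist().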

[PATCH v3 04/35] add a version number to ram_list

2012-10-30 Thread Isaku Yamahata
From: Umesh Deshpande udesh...@redhat.com

This will be used to detect if last_block might have become invalid
across different calls to ram_save_live.

Signed-off-by: Paolo Bonzini pbonz...@redhat.com
Signed-off-by: Umesh Deshpande udesh...@redhat.com
---
 arch_init.c |7 ++-
 cpu-all.h   |1 +
 exec.c  |5 -
 3 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index d6162af..eb36a6a 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -336,6 +336,7 @@ static RAMBlock *last_block;
 static ram_addr_t last_offset;
 static unsigned long *migration_bitmap;
 static uint64_t migration_dirty_pages;
+static uint32_t last_version;
 
 static inline bool migration_bitmap_test_and_reset_dirty(MemoryRegion *mr,
  ram_addr_t offset)
@@ -406,7 +407,6 @@ static void migration_bitmap_sync(void)
 }
 }
 
-
 /*
  * ram_save_block: Writes a page of memory to the stream f
  *
@@ -558,6 +558,7 @@ static void reset_ram_globals(void)
 {
 last_block = NULL;
 last_offset = 0;
+last_version = ram_list.version;
 sort_ram_list();
 }
 
@@ -613,6 +614,10 @@ static int ram_save_iterate(QEMUFile *f, void *opaque)
 uint64_t expected_downtime;
 MigrationState *s = migrate_get_current();
 
+if (ram_list.version != last_version) {
+reset_ram_globals();
+}
+
 bytes_transferred_last = bytes_transferred;
 bwidth = qemu_get_clock_ns(rt_clock);
 
diff --git a/cpu-all.h b/cpu-all.h
index ecbba12..84aea8b 100644
--- a/cpu-all.h
+++ b/cpu-all.h
@@ -500,6 +500,7 @@ typedef struct RAMBlock {
 
 typedef struct RAMList {
 uint8_t *phys_dirty;
+uint32_t version;
 QLIST_HEAD(, RAMBlock) blocks_mru;
 QLIST_HEAD(, RAMBlock) blocks;
 } RAMList;
diff --git a/exec.c b/exec.c
index 489d924..f5a8aca 100644
--- a/exec.c
+++ b/exec.c
@@ -645,7 +645,6 @@ bool tcg_enabled(void)
 void cpu_exec_init_all(void)
 {
 #if !defined(CONFIG_USER_ONLY)
-qemu_mutex_init(&ram_list.mutex);
 memory_map_init();
 io_mem_init();
 #endif
@@ -2570,6 +2569,8 @@ ram_addr_t qemu_ram_alloc_from_ptr(ram_addr_t size, void *host,
 QLIST_INSERT_HEAD(&ram_list.blocks, new_block, next);
 QLIST_INSERT_HEAD(&ram_list.blocks_mru, new_block, next_mru);
 
+ram_list.version++;
+
 ram_list.phys_dirty = g_realloc(ram_list.phys_dirty,
 last_ram_offset() >> TARGET_PAGE_BITS);
 memset(ram_list.phys_dirty + (new_block->offset >> TARGET_PAGE_BITS),
@@ -2598,6 +2599,7 @@ void qemu_ram_free_from_ptr(ram_addr_t addr)
 if (addr == block->offset) {
 QLIST_REMOVE(block, next);
 QLIST_REMOVE(block, next_mru);
+ram_list.version++;
 g_free(block);
 return;
 }
@@ -2612,6 +2614,7 @@ void qemu_ram_free(ram_addr_t addr)
 if (addr == block->offset) {
 QLIST_REMOVE(block, next);
 QLIST_REMOVE(block, next_mru);
+ram_list.version++;
 if (block->flags & RAM_PREALLOC_MASK) {
 ;
 } else if (mem_path) {
-- 
1.7.10.4
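The version counter is a cheap way to tell whether a cached position in the block list (last_block/last_offset) may be stale. Every insertion or removal bumps ram_list.version, and ram_save_iterate() resets its cursor when the value differs from the one recorded by reset_ram_globals(). A minimal sketch of the same pattern, with hypothetical `struct block_list` and `struct save_cursor` types standing in for RAMList and the arch_init.c globals:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical miniature of RAMList: a block count plus a generation number. */
struct block_list {
    uint32_t version;      /* bumped on every insert/remove */
    size_t nblocks;
};

/* Hypothetical stand-in for last_block/last_offset/last_version. */
struct save_cursor {
    uint32_t last_version; /* version seen when the cursor was set up */
    size_t last_index;     /* position the next iteration resumes from */
};

static void list_changed(struct block_list *l, size_t new_count)
{
    l->nblocks = new_count;
    l->version++;
}

static void cursor_reset(struct save_cursor *c, const struct block_list *l)
{
    c->last_index = 0;
    c->last_version = l->version;
}

/* Call at the start of each iteration; returns 1 if the cursor was reset. */
static int cursor_revalidate(struct save_cursor *c, const struct block_list *l)
{
    assert(c != NULL && l != NULL);
    if (c->last_version != l->version) {
        cursor_reset(c, l);
        return 1;
    }
    return 0;
}
```

A 32-bit counter could in principle wrap around and match a stale value after 2^32 changes. For a list that changes only on memory hotplug that is unlikely to matter in practice.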



[PATCH v3 00/35] postcopy live migration

2012-10-30 Thread Isaku Yamahata
      |                        |
      |                        V
      |             release the cached page
      |             madvise(MADV_REMOVE)
      |
      |
      |             pages can be sent
      |             in the background
      |                        |
      |                        V
      |             mark the page as cached,
      |             so future page faults are
      |             avoided
      |                        |
      |                        V
      |             touch guest RAM pages
      |                        |
      |                        V
      |             release the cached page
      |             madvise(MADV_REMOVE)
      |                        |
      V                        V

    all the pages are pulled from the source

      |                        |
      V                        V
migration completes          exit()


Isaku Yamahata (32):
  migration.c: remove redundant line in migrate_init()
  arch_init: DPRINTF format error and typo
  osdep: add qemu_read_full() to read interrupt-safely
  savevm: export qemu_peek_buffer, qemu_peek_byte, qemu_file_skip,
qemu_fflush
  savevm/QEMUFile: consolidate QEMUFile functions a bit
  savevm/QEMUFile: introduce qemu_fopen_fd
  savevm/QEMUFile: add read/write QEMUFile on memory buffer
  savevm, buffered_file: introduce method to drain buffer of buffered
file
  arch_init: export RAM_SAVE_xxx flags for postcopy
  arch_init/ram_save: introduce constant for ram save version = 4
  arch_init: refactor ram_save_block() and export ram_save_block()
  arch_init/ram_save_setup: factor out bitmap alloc/free
  arch_init/ram_load: refactor ram_load
  arch_init: factor out logic to find ram block with id string
  migration: export migrate_fd_completed() and migrate_fd_cleanup()
  uvmem.h: import Linux uvmem.h and teach update-linux-headers.sh
  osdep: add QEMU_MADV_REMOVE and trivial fix
  postcopy: introduce helper functions for postcopy
  savevm: add new section that is used by postcopy
  postcopy: implement incoming part of postcopy live migration
  postcopy outgoing: add -p option to migrate command
  postcopy: implement outgoing part of postcopy live migration
  postcopy/outgoing: add -n options to disable background transfer
  postcopy/outgoing: implement forward/backward prefault
  arch_init: factor out setting last_block, last_offset
  postcopy/outgoing: add movebg mode(-m) to migration command
  arch_init: factor out ram_load
  arch_init: export ram_save_iterate()
  postcopy: pre+post optimization incoming side
  arch_init: export migration_bitmap_sync and helper method to get
bitmap
  postcopy/outgoing: introduce precopy_count parameter
  postcopy: pre+post optimization outgoing side

Paolo Bonzini (1):
  split MRU ram list

Umesh Deshpande (2):
  add a version number to ram_list
  protect the ramlist with a separate mutex

 Makefile.target |2 +
 arch_init.c |  391 +---
 arch_init.h |   24 +
 buffered_file.c |   59 +-
 buffered_file.h |1 +
 cpu-all.h   |   16 +-
 exec.c  |   62 +-
 hmp-commands.hx |   21 +-
 hmp.c   |   12 +-
 linux-headers/linux/uvmem.h |   41 +
 migration-exec.c|8 +-
 migration-fd.c  |   23 +-
 migration-postcopy.c| 2019 +++
 migration-tcp.c |   16 +-
 migration-unix.c|   36 +-
 migration.c |   65 +-
 migration.h |   42 +
 osdep.c |   24 +
 osdep.h |   13 +-
 qapi-schema.json|6 +-
 qemu-common.h   |2 +
 qemu-file.h |   12 +-
 qmp-commands.hx |4 +-
 savevm.c|  223 -
 scripts/update-linux-headers.sh |2 +-
 sysemu.h|2 +-
 umem.c  |  291 ++
 umem.h  |   88 ++
 vl.c|5 +-
 29 files changed, 3265 insertions(+), 245 deletions(-)
 create mode 100644 linux-headers/linux/uvmem.h
 create mode 100644 migration-postcopy.c
 create mode 100644 umem.c
 create mode 100644 umem.h

--
1.7.10.4


[PATCH v3 03/35] split MRU ram list

2012-10-30 Thread Isaku Yamahata
From: Paolo Bonzini pbonz...@redhat.com

Outside the execution threads the normal, non-MRU-ized order of
the RAM blocks should always be enough.  So manage two separate
lists, which will have separate locking rules.

Signed-off-by: Paolo Bonzini pbonz...@redhat.com
---
 arch_init.c |1 +
 cpu-all.h   |4 +++-
 exec.c  |   18 +-
 3 files changed, 17 insertions(+), 6 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index 79d4041..d6162af 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -48,6 +48,7 @@
 #include "qemu/page_cache.h"
 #include "qmp-commands.h"
 #include "trace.h"
+#include "cpu-all.h"
 
 #ifdef DEBUG_ARCH_INIT
 #define DPRINTF(fmt, ...) \
diff --git a/cpu-all.h b/cpu-all.h
index 6606432..ecbba12 100644
--- a/cpu-all.h
+++ b/cpu-all.h
@@ -490,8 +490,9 @@ typedef struct RAMBlock {
 ram_addr_t offset;
 ram_addr_t length;
 uint32_t flags;
-char idstr[256];
+QLIST_ENTRY(RAMBlock) next_mru;
 QLIST_ENTRY(RAMBlock) next;
+char idstr[256];
 #if defined(__linux__) && !defined(TARGET_S390X)
 int fd;
 #endif
@@ -499,6 +500,7 @@ typedef struct RAMBlock {
 
 typedef struct RAMList {
 uint8_t *phys_dirty;
+QLIST_HEAD(, RAMBlock) blocks_mru;
 QLIST_HEAD(, RAMBlock) blocks;
 } RAMList;
 extern RAMList ram_list;
diff --git a/exec.c b/exec.c
index b0ed593..489d924 100644
--- a/exec.c
+++ b/exec.c
@@ -56,6 +56,7 @@
 #include "xen-mapcache.h"
 #include "trace.h"
 #endif
+#include "cpu-all.h"
 
 #include "cputlb.h"
 
@@ -96,7 +97,10 @@ static uint8_t *code_gen_ptr;
 int phys_ram_fd;
 static int in_migration;
 
-RAMList ram_list = { .blocks = QLIST_HEAD_INITIALIZER(ram_list.blocks) };
+RAMList ram_list = {
+.blocks = QLIST_HEAD_INITIALIZER(ram_list.blocks),
+.blocks_mru = QLIST_HEAD_INITIALIZER(ram_list.blocks_mru)
+};
 
 static MemoryRegion *system_memory;
 static MemoryRegion *system_io;
@@ -641,6 +645,7 @@ bool tcg_enabled(void)
 void cpu_exec_init_all(void)
 {
 #if !defined(CONFIG_USER_ONLY)
+qemu_mutex_init(&ram_list.mutex);
 memory_map_init();
 io_mem_init();
 #endif
@@ -2563,6 +2568,7 @@ ram_addr_t qemu_ram_alloc_from_ptr(ram_addr_t size, void *host,
 new_block->length = size;
 
 QLIST_INSERT_HEAD(&ram_list.blocks, new_block, next);
+QLIST_INSERT_HEAD(&ram_list.blocks_mru, new_block, next_mru);
 
 ram_list.phys_dirty = g_realloc(ram_list.phys_dirty,
 last_ram_offset() >> TARGET_PAGE_BITS);
@@ -2591,6 +2597,7 @@ void qemu_ram_free_from_ptr(ram_addr_t addr)
 QLIST_FOREACH(block, &ram_list.blocks, next) {
 if (addr == block->offset) {
 QLIST_REMOVE(block, next);
+QLIST_REMOVE(block, next_mru);
 g_free(block);
 return;
 }
@@ -2604,6 +2611,7 @@ void qemu_ram_free(ram_addr_t addr)
 QLIST_FOREACH(block, &ram_list.blocks, next) {
 if (addr == block->offset) {
 QLIST_REMOVE(block, next);
+QLIST_REMOVE(block, next_mru);
 if (block->flags & RAM_PREALLOC_MASK) {
 ;
 } else if (mem_path) {
@@ -2709,12 +2717,12 @@ void *qemu_get_ram_ptr(ram_addr_t addr)
 {
 RAMBlock *block;
 
-QLIST_FOREACH(block, &ram_list.blocks, next) {
+QLIST_FOREACH(block, &ram_list.blocks_mru, next_mru) {
 if (addr - block->offset < block->length) {
 /* Move this entry to to start of the list.  */
 if (block != QLIST_FIRST(&ram_list.blocks)) {
-QLIST_REMOVE(block, next);
-QLIST_INSERT_HEAD(&ram_list.blocks, block, next);
+QLIST_REMOVE(block, next_mru);
+QLIST_INSERT_HEAD(&ram_list.blocks_mru, block, next_mru);
 }
 if (xen_enabled()) {
 /* We need to check if the requested address is in the RAM
@@ -2809,7 +2817,7 @@ int qemu_ram_addr_from_host(void *ptr, ram_addr_t *ram_addr)
 return 0;
 }
 
-QLIST_FOREACH(block, &ram_list.blocks, next) {
+QLIST_FOREACH(block, &ram_list.blocks_mru, next_mru) {
 /* This case append when the block is not mapped. */
 if (block->host == NULL) {
 continue;
-- 
1.7.10.4
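The point of the split is that hot-path lookups can reorder one list for locality while the other keeps a stable order for code, such as migration, that walks it. A minimal sketch of that idea using fixed-size arrays instead of QLIST; `struct ram_lists`, `lists_add()` and `lookup_mru()` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_BLOCKS 8

/* Hypothetical miniature RAM block covering [offset, offset + length). */
struct block {
    uint64_t offset;
    uint64_t length;
};

/* Two independent orderings over the same blocks. */
struct ram_lists {
    struct block *blocks[MAX_BLOCKS];     /* stable, registration order */
    struct block *blocks_mru[MAX_BLOCKS]; /* most recently used first */
    size_t n;
};

static void lists_add(struct ram_lists *l, struct block *b)
{
    assert(l->n < MAX_BLOCKS);
    l->blocks[l->n] = b;
    l->blocks_mru[l->n] = b;
    l->n++;
}

/* Find the block containing addr via the MRU list and move the hit to the
 * front of the MRU list only; the stable list is never reordered. */
static struct block *lookup_mru(struct ram_lists *l, uint64_t addr)
{
    for (size_t i = 0; i < l->n; i++) {
        struct block *b = l->blocks_mru[i];
        /* Unsigned subtraction wraps when addr < offset, rejecting it too. */
        if (addr - b->offset < b->length) {
            for (size_t j = i; j > 0; j--) {
                l->blocks_mru[j] = l->blocks_mru[j - 1];
            }
            l->blocks_mru[0] = b;
            return b;
        }
    }
    return NULL;
}
```

Because only `blocks_mru` is reordered, anything iterating `blocks` sees the same order across lookups, which is what lets the two lists follow separate locking rules.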



Re: [Qemu-devel] [PATCH v3 00/35] postcopy live migration

2012-10-30 Thread Isaku Yamahata
On Tue, Oct 30, 2012 at 06:53:31PM +, Benoit Hudzia wrote:
 Hi Isaku,
 
 
 Are you going to be at the KVM Forum (I think you have a presentation there)?
 It would be nice if we could meet to see if we can sync our efforts.

Yes, definitely.

 As you know, we have been developing an RDMA-based solution for postcopy
 migration, and we demonstrated the initial proof of concept in December 2012
 (we published some findings at VHPC 2012 and are working with Petter Svard
 from Umea on a journal paper with a more detailed performance review).

Do you have any pointers to available papers/slides?
I can't find any at http://vhpc.org/


 While RDMA postcopy live migration is just a by-product of our long-term
 effort (I will present the project in my talk at KVM Forum), we grabbed the
 opportunity to address problems we were facing with the live migration of
 enterprise workloads, namely how to migrate an in-memory database such as
 HANA under load.
 
 We quickly discovered that precopy (even with optimizations) didn't work with
 such workloads. We also tried your code; however, the performance was far from
 satisfying with large VMs under load, due to the heavy cost of transferring
 memory between user space and kernel multiple times (actually it often failed).

If possible, I'd like to see the details.


 We then tested a pure RDMA solution we developed (we support hardware and
 software RDMA), and it worked fine with all the workloads we tested (we
 migrated VMs with 20+ GB running SAP HANA under a workload similar to TPC-H).
 We hope to test with bigger configurations soon (1/2+ TB of memory).
 
 However, the integration of our code with the QEMU code base is not as
 advanced and polished as yours, and I would like to know whether you would
 be interested in joining our effort or collaborating on merging our
 solution, or maybe allowing us to piggyback on your effort.

Yeah, we can unite our efforts for upstream.
In particular, a clean interface for both non-RDMA and RDMA
(QEMU-internal/QEMU-kernel) is important.
At the moment I have no clue about the requirements of RDMA postcopy or
your implementation.
Transparently integrating with the MMU at the OS level sounds interesting.

thanks,

 Would you be free to meet at any time next week (from Tuesday to Friday)?
 
 P.S.: We would be open-sourcing our project by the end of November,
 and postcopy is only a small part of the technology developed.
 
 
 Regards
 Benoit
 
 
 On 30 October 2012 08:32, Isaku Yamahata yamah...@valinux.co.jp wrote:
 
 This is the v3 patch series of postcopy migration.
 
 The trees are available at
 git://github.com/yamahata/qemu.git qemu-postcopy-oct-30-2012
 git://github.com/yamahata/linux-umem.git linux-umem-oct-29-2012
 
 Major changes v2 -> v3:
 - implemented pre+post optimization
 - auto detection of postcopy by incoming side
 - using threads on destination instead of fork
 - using blocking io instead of select + non-blocking io loop
 - less memory overhead
 - various improvement and code simplification
 - kernel module name change umem -> uvmem to avoid name conflict.
 
 Patches organization:
 1-2: trivial fixes
 3-5: preparation for threading. cherry-picked from migration tree
 6-18: refactoring existing code and preparation
 19-25: implement postcopy live migration itself (essential part)
 26-35: optimization/heuristic for postcopy
 
 Usage
 =
 You need to load the uvmem character device on the host before starting
 migration.
 Postcopy can be used with the TCG and KVM accelerators. The implementation
 depends only on the Linux uvmem character device, and the driver-dependent
 code is split into a separate file.
 I tested only the case host page size == guest page size, but the
 implementation allows host page size != guest page size.
 
 The following options are added with this patch series.
 - incoming part
   use -incoming as usual. Postcopy is automatically detected.
   example:
   qemu -incoming tcp:0: -monitor stdio -machine accel=kvm
 
 - outgoing part
   options for migrate command
   migrate [-p [-n] [-m]] URI
   [precopy count [prefault forward [prefault backward]]]
 
   Newly added options/arguments
   -p: indicate postcopy migration
   -n: disable transferring pages in the background: This is for
       benchmarking/debugging
   -m: move background transfer of postcopy mode
   precopy count: The number of precopy RAM scans before postcopy.
       Default 0 (0 means no precopy)
   prefault forward: The number of forward pages which are sent along
       with each on-demand page
   prefault backward: The number of backward pages which are sent along
       with each on-demand page
 
   example:
   migrate -p -n tcp:dest ip address:
   migrate -p -n -m tcp:dest ip address: 42
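The prefault forward/backward arguments describe a window of neighbouring pages sent along with each on-demand page. A minimal sketch of the window arithmetic, under the assumption that pages are numbered within one RAM block and that the window is simply clamped to that block. `prefault_window()` is hypothetical, and the series may treat block boundaries or already-sent pages differently.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: compute the inclusive page range [*first, *last] to
 * send when page `fault` is requested, with `backward` pages before it and
 * `forward` pages after it, clamped to a block of `npages` pages.
 */
static void prefault_window(uint64_t fault, uint64_t backward, uint64_t forward,
                            uint64_t npages, uint64_t *first, uint64_t *last)
{
    assert(fault < npages);
    /* Clamp at the start of the block without unsigned underflow. */
    *first = (fault > backward) ? fault - backward : 0;
    /* Clamp at the end of the block without unsigned overflow. */
    *last = (forward >= npages - 1 - fault) ? npages - 1 : fault + forward;
}
```

Comparing against the remaining distance, rather than computing `fault + forward` first, keeps the clamp correct even for very large forward counts.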

Re: [PATCH v2 35/41] postcopy: introduce helper functions for postcopy

2012-06-16 Thread Isaku Yamahata
On Thu, Jun 14, 2012 at 11:34:09PM +0200, Juan Quintela wrote:
 Isaku Yamahata yamah...@valinux.co.jp wrote:
  +//#define DEBUG_UMEM
  +#ifdef DEBUG_UMEM
  +#include <sys/syscall.h>
  +#define DPRINTF(format, ...)\
  +do {\
  +printf("%d:%ld %s:%d " format, getpid(), syscall(SYS_gettid),\
  +   __func__, __LINE__, ## __VA_ARGS__); \
  +} while (0)
 
 This should be in a header file that is linux specific?  And (at least
 on my systems) gettid is already defined on glibc.

I'll remove getpid/gettid. They were just for debugging in the early phase
and are not necessary any more.


  +#else
  +#define DPRINTF(format, ...)do { } while (0)
  +#endif
 
 
  +
  +#define DEV_UMEM        "/dev/umem"
  +
  +UMem *umem_new(void *hostp, size_t size)
  +{
  +struct umem_init uinit = {
  +.size = size,
  +};
  +UMem *umem;
  +
  +assert((size % getpagesize()) == 0);
  +umem = g_new(UMem, 1);
  +umem->fd = open(DEV_UMEM, O_RDWR);
  +if (umem->fd < 0) {
  +perror("can't open " DEV_UMEM);
  +abort();
 
 Can we return an error instead of aborting?  The same goes for the rest of
 the aborts in this file.

Ok.


  +size_t umem_pages_size(uint64_t nr)
  +{
  +return sizeof(struct umem_pages) + nr * sizeof(uint64_t);
 
 Can we make sure that the pgoffs field is aligned?  I know that as it is
 now it is aligned, but better to be sure?

That is already taken care of by the GCC zero-length array extension.
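A trailing array member is placed by the compiler at an offset suitable for its element type. The sketch below uses the standard C99 flexible array member rather than the GCC zero-length-array extension and checks the alignment at compile time. `struct pages_msg` and its helpers are hypothetical stand-ins, not the series' `struct umem_pages`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical layout: a count followed by a variable number of page offsets. */
struct pages_msg {
    uint64_t nr;
    uint64_t pgoffs[];     /* C99 flexible array member */
};

/* The compiler aligns the flexible array for its element type. */
static_assert(offsetof(struct pages_msg, pgoffs) % _Alignof(uint64_t) == 0,
              "pgoffs must be aligned for uint64_t");

/* Bytes needed for a message carrying nr offsets. offsetof() accounts for any
 * padding before pgoffs. A receiver should bound nr before calling this. */
static size_t pages_msg_size(uint64_t nr)
{
    return offsetof(struct pages_msg, pgoffs) + nr * sizeof(uint64_t);
}

static struct pages_msg *pages_msg_new(uint64_t nr)
{
    struct pages_msg *m = calloc(1, pages_msg_size(nr));
    if (m != NULL) {
        m->nr = nr;
    }
    return m;
}
```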


  +}
  +
  +static void umem_write_cmd(int fd, uint8_t cmd)
  +{
  +DPRINTF("write cmd %c\n", cmd);
  +
  +for (;;) {
  +ssize_t ret = write(fd, &cmd, 1);
  +if (ret == -1) {
  +if (errno == EINTR) {
  +continue;
  +} else if (errno == EPIPE) {
  +perror("pipe");
  +DPRINTF("write cmd %c %zd %d: pipe is closed\n",
  +cmd, ret, errno);
  +break;
  +}
 
 
 Grr, we don't have a function that does a safe write.  The most
 similar thing in QEMU looks to be send_all().

So we should introduce something like qemu_safe_write/read?
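A safe-write helper of the kind discussed here could look like the following minimal sketch. It loops over short writes, retries on EINTR, and returns an error instead of aborting so the caller can decide what to do. `write_full_sketch()` is hypothetical. The qemu_fopen_fd patch in this series already calls qemu_write_full() for a similar purpose.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: write all `count` bytes, retrying on EINTR and after
 * short writes. Returns 0 on success, -1 on error with errno set. Note that
 * writing to a pipe whose reader has gone away only yields EPIPE if SIGPIPE
 * is ignored; otherwise the signal terminates the process.
 */
static int write_full_sketch(int fd, const void *buf, size_t count)
{
    const unsigned char *p = buf;

    assert(buf != NULL || count == 0);
    while (count > 0) {
        ssize_t n = write(fd, p, count);
        if (n < 0) {
            if (errno == EINTR) {
                continue;      /* interrupted by a signal: retry */
            }
            return -1;         /* let the caller report/handle the error */
        }
        p += n;
        count -= (size_t)n;
    }
    return 0;
}
```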


  +
  +perror("pipe");
 
 Can we make a different perror() message than previous error?
 
  +DPRINTF("write cmd %c %zd %d\n", cmd, ret, errno);
  +abort();
  +}
  +
  +break;
  +}
  +}
  +
  +static void umem_read_cmd(int fd, uint8_t expect)
  +{
  +uint8_t cmd;
  +for (;;) {
  +ssize_t ret = read(fd, &cmd, 1);
  +if (ret == -1) {
  +if (errno == EINTR) {
  +continue;
  +}
  +perror("pipe");
  +DPRINTF("read error cmd %c %zd %d\n", cmd, ret, errno);
  +abort();
  +}
  +
  +if (ret == 0) {
  +DPRINTF("read cmd %c %zd: pipe is closed\n", cmd, ret);
  +abort();
  +}
  +
  +break;
  +}
  +
  +DPRINTF("read cmd %c\n", cmd);
  +if (cmd != expect) {
  +DPRINTF("cmd %c expect %d\n", cmd, expect);
  +abort();
 
 Ouch.  If we receive garbage, we just exit?
 
 I really think that we should implement error handling.
 
  +}
  +}
  +
  +struct umem_pages *umem_recv_pages(QEMUFile *f, int *offset)
  +{
  +int ret;
  +uint64_t nr;
  +size_t size;
  +struct umem_pages *pages;
  +
  +ret = qemu_peek_buffer(f, (uint8_t*)&nr, sizeof(nr), *offset);
  +*offset += sizeof(nr);
  +DPRINTF("ret %d nr %ld\n", ret, nr);
  +if (ret != sizeof(nr) || nr == 0) {
  +return NULL;
  +}
  +
  +size = umem_pages_size(nr);
  +pages = g_malloc(size);
 
 Just thinking about this.  Couldn't we just decide on a big enough
 buffer, and never send anything bigger than that?  That would remove the
 need to have to malloc()/free() a buffer for each reception?

Will try to address it.


  +/* qemu side handler */
  +struct umem_pages *umem_qemu_trigger_page_fault(QEMUFile *from_umemd,
  +int *offset)
  +{
  +uint64_t i;
  +int page_shift = ffs(getpagesize()) - 1;
  +struct umem_pages *pages = umem_recv_pages(from_umemd, offset);
  +if (pages == NULL) {
  +return NULL;
  +}
  +
  +for (i = 0; i < pages->nr; i++) {
  +ram_addr_t addr = pages->pgoffs[i] << page_shift;
  +
  +/* make pages present by forcibly triggering page fault. */
  +volatile uint8_t *ram = qemu_get_ram_ptr(addr);
  +uint8_t dummy_read = ram[0];
  +(void)dummy_read;   /* suppress unused variable warning */
  +}
  +
  +/*
  + * Very Linux implementation specific.
  + * Make it sure that other thread doesn't fault on the above virtual
  + * address. (More exactly other thread doesn't call fault handler with
  + * the offset.)
  + * the fault handler

Re: [Qemu-devel] [PATCH v2 33/41] postcopy: introduce -postcopy and -postcopy-flags option

2012-06-08 Thread Isaku Yamahata
On Fri, Jun 08, 2012 at 12:52:54PM +0200, Juan Quintela wrote:
 Isaku Yamahata yamah...@valinux.co.jp wrote:
  This patch prepares for postcopy live migration.
  It introduces the -postcopy option and its internal flag, migration_postcopy.
  It introduces -postcopy-flags for changing the behavior of incoming postcopy,
  mainly for benchmark/debug.
 
 Why do we need postcopy flag?  -incoming should be enough to detect that
 we are doing postcopy.
 
 QLIST_HEAD(, LoadStateEntry) loadvm_handlers =
 QLIST_HEAD_INITIALIZER(loadvm_handlers);
 LoadStateEntry *le, *new_le;
 uint8_t section_type;
 unsigned int v;
 int ret;
 
 if (qemu_savevm_state_blocked(NULL)) {
 return -EINVAL;
 }
 
 v = qemu_get_be32(f);
 if (v != QEMU_VM_FILE_MAGIC)
 return -EINVAL;
 
 v = qemu_get_be32(f);
 if (v == QEMU_VM_FILE_VERSION_COMPAT) {
 fprintf(stderr, "SaveVM v2 format is obsolete and don't work anymore\n");
 return -ENOTSUP;
 }
 if (v != QEMU_VM_FILE_VERSION)
 return -ENOTSUP;
 
 Shouldn't we be able to change some version field here and make the
 recognition of postcopy automatic?  Having to hack around a new
 command line option for each page is not going to be nice.  And about
 postcopy flags, if they are for the incoming side, please consider just
 sending those flags on the stream as a first field?

Yes, you are right.
If bumping the version is allowed, -postcopy can be dropped in favor of
auto-detection.
-postcopy-flags can be dropped because it is used only for benchmarking,
to change the incoming side's behavior independently of the outgoing side.
-- 
yamahata


[PATCH v2 02/41] arch_init: export RAM_SAVE_xxx flags for postcopy

2012-06-04 Thread Isaku Yamahata
These constants will also be used by postcopy.

Signed-off-by: Isaku Yamahata yamah...@valinux.co.jp
---
 arch_init.c |7 ---
 arch_init.h |7 +++
 2 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index 38e0173..bd4e61e 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -88,13 +88,6 @@ const uint32_t arch_type = QEMU_ARCH;
 /***/
 /* ram save/restore */
 
-#define RAM_SAVE_FLAG_FULL 0x01 /* Obsolete, not used anymore */
-#define RAM_SAVE_FLAG_COMPRESS 0x02
-#define RAM_SAVE_FLAG_MEM_SIZE 0x04
-#define RAM_SAVE_FLAG_PAGE 0x08
-#define RAM_SAVE_FLAG_EOS  0x10
-#define RAM_SAVE_FLAG_CONTINUE 0x20
-
 #ifdef __ALTIVEC__
 #include <altivec.h>
 #define VECTYPE        vector unsigned char
diff --git a/arch_init.h b/arch_init.h
index c7cb94a..7cc3fa7 100644
--- a/arch_init.h
+++ b/arch_init.h
@@ -30,4 +30,11 @@ int tcg_available(void);
 int kvm_available(void);
 int xen_available(void);
 
+#define RAM_SAVE_FLAG_FULL 0x01 /* Obsolete, not used anymore */
+#define RAM_SAVE_FLAG_COMPRESS 0x02
+#define RAM_SAVE_FLAG_MEM_SIZE 0x04
+#define RAM_SAVE_FLAG_PAGE 0x08
+#define RAM_SAVE_FLAG_EOS  0x10
+#define RAM_SAVE_FLAG_CONTINUE 0x20
+
 #endif
-- 
1.7.1.1
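
As a rough illustration of why these flags can share a word with the page offset: the RAM stream sends each page's offset with the flags ORed into its low bits, which are always zero for a page-aligned offset. The sketch below assumes 4 KiB pages (QEMU itself uses TARGET_PAGE_MASK), and ram_encode_header()/ram_decode_header() are hypothetical helpers, not functions from arch_init.c.

```c
#include <assert.h>
#include <stdint.h>

/* Flag values copied from arch_init.h in the patch above. */
#define RAM_SAVE_FLAG_FULL     0x01 /* Obsolete, not used anymore */
#define RAM_SAVE_FLAG_COMPRESS 0x02
#define RAM_SAVE_FLAG_MEM_SIZE 0x04
#define RAM_SAVE_FLAG_PAGE     0x08
#define RAM_SAVE_FLAG_EOS      0x10
#define RAM_SAVE_FLAG_CONTINUE 0x20

/* Assumption: 4 KiB target pages; QEMU uses TARGET_PAGE_MASK instead. */
#define SKETCH_PAGE_SIZE 4096ULL
#define SKETCH_PAGE_MASK (~(SKETCH_PAGE_SIZE - 1))

/* Hypothetical helper: pack a page-aligned offset and its flags into
 * the single 64-bit word that precedes each page in the stream. */
static uint64_t ram_encode_header(uint64_t offset, unsigned flags)
{
    assert((offset & ~SKETCH_PAGE_MASK) == 0);  /* offset must be aligned */
    assert((flags & SKETCH_PAGE_MASK) == 0);    /* flags fit in low bits */
    return offset | flags;
}

/* Hypothetical helper: split the word back into offset and flags. */
static void ram_decode_header(uint64_t word, uint64_t *offset,
                              unsigned *flags)
{
    *flags = (unsigned)(word & ~SKETCH_PAGE_MASK);
    *offset = word & SKETCH_PAGE_MASK;
}
```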



[PATCH v2 01/41] arch_init: export sort_ram_list() and ram_save_block()

2012-06-04 Thread Isaku Yamahata
This will be used by postcopy.

Signed-off-by: Isaku Yamahata yamah...@valinux.co.jp
---
 arch_init.c |4 ++--
 migration.h |2 ++
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index a9e8b74..38e0173 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -164,7 +164,7 @@ static int is_dup_page(uint8_t *page)
 static RAMBlock *last_block;
 static ram_addr_t last_offset;
 
-static int ram_save_block(QEMUFile *f)
+int ram_save_block(QEMUFile *f)
 {
 RAMBlock *block = last_block;
 ram_addr_t offset = last_offset;
@@ -273,7 +273,7 @@ static int block_compar(const void *a, const void *b)
 return strcmp((*ablock)->idstr, (*bblock)->idstr);
 }
 
-static void sort_ram_list(void)
+void sort_ram_list(void)
 {
 RAMBlock *block, *nblock, **blocks;
 int n;
diff --git a/migration.h b/migration.h
index 2e9ca2e..8b9509c 100644
--- a/migration.h
+++ b/migration.h
@@ -76,6 +76,8 @@ uint64_t ram_bytes_remaining(void);
 uint64_t ram_bytes_transferred(void);
 uint64_t ram_bytes_total(void);
 
+void sort_ram_list(void);
+int ram_save_block(QEMUFile *f);
 int ram_save_live(QEMUFile *f, int stage, void *opaque);
 int ram_load(QEMUFile *f, void *opaque, int version_id);
 
-- 
1.7.1.1
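
The newly exported sort_ram_list() orders RAM blocks by idstr using the block_compar() comparator visible in the hunk context, and the list is then rebuilt in that order. A minimal sketch of the same qsort pattern, assuming a hypothetical SketchBlock type that keeps only the idstr field:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for RAMBlock: only the field the comparator uses. */
typedef struct SketchBlock {
    char idstr[256];
} SketchBlock;

/* Same shape as block_compar(): qsort passes pointers to array elements,
 * and each element is itself a SketchBlock pointer. */
static int sketch_block_compar(const void *a, const void *b)
{
    SketchBlock * const *ablock = a;
    SketchBlock * const *bblock = b;
    return strcmp((*ablock)->idstr, (*bblock)->idstr);
}

/* Sort an array of block pointers by idstr. */
static void sketch_sort_blocks(SketchBlock **blocks, size_t n)
{
    assert(blocks != NULL || n == 0);
    qsort(blocks, n, sizeof(blocks[0]), sketch_block_compar);
}
```

Sorting by name rather than by address gives both ends of a migration the same block order regardless of allocation order.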



[PATCH v2 12/41] arch_init: factor out setting last_block, last_offset

2012-06-04 Thread Isaku Yamahata
Signed-off-by: Isaku Yamahata yamah...@valinux.co.jp
---
 arch_init.c |   13 -
 arch_init.h |1 +
 2 files changed, 9 insertions(+), 5 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index 2617478..22d9691 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -203,6 +203,12 @@ int ram_save_page(QEMUFile *f, RAMBlock *block, ram_addr_t offset)
 static RAMBlock *last_block;
 static ram_addr_t last_offset;
 
+void ram_save_set_last_block(RAMBlock *block, ram_addr_t offset)
+{
+last_block = block;
+last_offset = offset;
+}
+
 int ram_save_block(QEMUFile *f)
 {
 RAMBlock *block = last_block;
@@ -230,9 +236,7 @@ int ram_save_block(QEMUFile *f)
 }
 } while (block != last_block || offset != last_offset);
 
-last_block = block;
-last_offset = offset;
-
+ram_save_set_last_block(block, offset);
 return bytes_sent;
 }
 
@@ -349,8 +353,7 @@ int ram_save_live(QEMUFile *f, int stage, void *opaque)
 if (stage == 1) {
 bytes_transferred = 0;
 last_block_sent = NULL;
-last_block = NULL;
-last_offset = 0;
+ram_save_set_last_block(NULL, 0);
 sort_ram_list();
 
 /* Make sure all dirty bits are set */
diff --git a/arch_init.h b/arch_init.h
index 7f5c77a..15548cd 100644
--- a/arch_init.h
+++ b/arch_init.h
@@ -40,6 +40,7 @@ int xen_available(void);
 #define RAM_SAVE_VERSION_ID 4 /* currently version 4 */
 
 #if defined(NEED_CPU_H) && !defined(CONFIG_USER_ONLY)
+void ram_save_set_last_block(RAMBlock *block, ram_addr_t offset);
 int ram_save_page(QEMUFile *f, RAMBlock *block, ram_addr_t offset);
 RAMBlock *ram_find_block(const char *id, uint8_t len);
 void *ram_load_host_from_stream_offset(QEMUFile *f,
-- 
1.7.1.1



[PATCH v2 18/41] QEMUFile: add qemu_file_fd() for later use

2012-06-04 Thread Isaku Yamahata
Signed-off-by: Isaku Yamahata yamah...@valinux.co.jp
---
 qemu-file.h |1 +
 savevm.c|   12 
 2 files changed, 13 insertions(+), 0 deletions(-)

diff --git a/qemu-file.h b/qemu-file.h
index 331ac8b..98a8023 100644
--- a/qemu-file.h
+++ b/qemu-file.h
@@ -71,6 +71,7 @@ QEMUFile *qemu_fopen_socket(int fd);
 QEMUFile *qemu_popen(FILE *popen_file, const char *mode);
 QEMUFile *qemu_popen_cmd(const char *command, const char *mode);
 int qemu_stdio_fd(QEMUFile *f);
+int qemu_file_fd(QEMUFile *f);
 void qemu_fflush(QEMUFile *f);
 void qemu_buffered_file_drain(QEMUFile *f);
 int qemu_fclose(QEMUFile *f);
diff --git a/savevm.c b/savevm.c
index fb47529..cba1a69 100644
--- a/savevm.c
+++ b/savevm.c
@@ -178,6 +178,7 @@ struct QEMUFile {
 uint8_t buf[IO_BUF_SIZE];
 
 int last_error;
+int fd; /* -1 means fd isn't associated */
 };
 
 typedef struct QEMUFileStdio
@@ -276,6 +277,7 @@ QEMUFile *qemu_popen(FILE *stdio_file, const char *mode)
 s->file = qemu_fopen_ops(s, stdio_put_buffer, NULL, stdio_pclose,
 NULL, NULL, NULL);
 }
+s->file->fd = fileno(stdio_file);
 return s->file;
 }
 
@@ -291,6 +293,7 @@ QEMUFile *qemu_popen_cmd(const char *command, const char *mode)
 return qemu_popen(popen_file, mode);
 }
 
+/* TODO: replace this with qemu_file_fd() */
 int qemu_stdio_fd(QEMUFile *f)
 {
 QEMUFileStdio *p;
@@ -325,6 +328,7 @@ QEMUFile *qemu_fdopen(int fd, const char *mode)
 s->file = qemu_fopen_ops(s, stdio_put_buffer, NULL, stdio_fclose,
 NULL, NULL, NULL);
 }
+s->file->fd = fd;
 return s->file;
 
 fail:
@@ -339,6 +343,7 @@ QEMUFile *qemu_fopen_socket(int fd)
 s->fd = fd;
 s->file = qemu_fopen_ops(s, NULL, socket_get_buffer, socket_close,
 NULL, NULL, NULL);
+s->file->fd = fd;
 return s->file;
 }
 
@@ -381,6 +386,7 @@ QEMUFile *qemu_fopen(const char *filename, const char *mode)
 s->file = qemu_fopen_ops(s, NULL, file_get_buffer, stdio_fclose,
   NULL, NULL, NULL);
 }
+s->file->fd = fileno(s->stdio_file);
 return s->file;
 fail:
 g_free(s);
@@ -431,10 +437,16 @@ QEMUFile *qemu_fopen_ops(void *opaque, QEMUFilePutBufferFunc *put_buffer,
 f->set_rate_limit = set_rate_limit;
 f->get_rate_limit = get_rate_limit;
 f->is_write = 0;
+f->fd = -1;
 
 return f;
 }
 
+int qemu_file_fd(QEMUFile *f)
+{
+return f->fd;
+}
+
 int qemu_file_get_error(QEMUFile *f)
 {
 return f->last_error;
-- 
1.7.1.1



[PATCH v2 21/41] savevm: rename QEMUFileSocket to QEMUFileFD, socket_close to fd_close

2012-06-04 Thread Isaku Yamahata
Later the structure will be shared.

Signed-off-by: Isaku Yamahata yamah...@valinux.co.jp
---
 savevm.c |   14 +++---
 1 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/savevm.c b/savevm.c
index 4b560b3..2fb0c3e 100644
--- a/savevm.c
+++ b/savevm.c
@@ -187,14 +187,14 @@ typedef struct QEMUFileStdio
 QEMUFile *file;
 } QEMUFileStdio;
 
-typedef struct QEMUFileSocket
+typedef struct QEMUFileFD
 {
 QEMUFile *file;
-} QEMUFileSocket;
+} QEMUFileFD;
 
 static int socket_get_buffer(void *opaque, uint8_t *buf, int64_t pos, int size)
 {
-QEMUFileSocket *s = opaque;
+QEMUFileFD *s = opaque;
 ssize_t len;
 
 do {
@@ -207,9 +207,9 @@ static int socket_get_buffer(void *opaque, uint8_t *buf, int64_t pos, int size)
 return len;
 }
 
-static int socket_close(void *opaque)
+static int fd_close(void *opaque)
 {
-QEMUFileSocket *s = opaque;
+QEMUFileFD *s = opaque;
 g_free(s);
 return 0;
 }
@@ -325,9 +325,9 @@ fail:
 
 QEMUFile *qemu_fopen_socket(int fd)
 {
-QEMUFileSocket *s = g_malloc0(sizeof(QEMUFileSocket));
+QEMUFileFD *s = g_malloc0(sizeof(QEMUFileFD));
 
-s->file = qemu_fopen_ops(s, NULL, socket_get_buffer, socket_close,
+s->file = qemu_fopen_ops(s, NULL, socket_get_buffer, fd_close,
 NULL, NULL, NULL);
 s->file->fd = fd;
 return s->file;
-- 
1.7.1.1



[PATCH v2 22/41] savevm/QEMUFile: introduce qemu_fopen_fd

2012-06-04 Thread Isaku Yamahata
Introduce a nonblocking fd read backend for QEMUFile.
This will be used by postcopy live migration.

Signed-off-by: Isaku Yamahata yamah...@valinux.co.jp
---
 qemu-file.h |1 +
 savevm.c|   40 
 2 files changed, 41 insertions(+), 0 deletions(-)

diff --git a/qemu-file.h b/qemu-file.h
index 1a12e7d..af5b123 100644
--- a/qemu-file.h
+++ b/qemu-file.h
@@ -68,6 +68,7 @@ QEMUFile *qemu_fopen_ops(void *opaque, QEMUFilePutBufferFunc *put_buffer,
 QEMUFile *qemu_fopen(const char *filename, const char *mode);
 QEMUFile *qemu_fdopen(int fd, const char *mode);
 QEMUFile *qemu_fopen_socket(int fd);
+QEMUFile *qemu_fopen_fd(int fd);
 QEMUFile *qemu_popen(FILE *popen_file, const char *mode);
 QEMUFile *qemu_popen_cmd(const char *command, const char *mode);
 int qemu_file_fd(QEMUFile *f);
diff --git a/savevm.c b/savevm.c
index 2fb0c3e..5640614 100644
--- a/savevm.c
+++ b/savevm.c
@@ -207,6 +207,35 @@ static int socket_get_buffer(void *opaque, uint8_t *buf, int64_t pos, int size)
 return len;
 }
 
+static int fd_get_buffer(void *opaque, uint8_t *buf, int64_t pos, int size)
+{
+QEMUFileFD *s = opaque;
+ssize_t len = 0;
+
+while (size > 0) {
+ssize_t ret = read(s->file->fd, buf, size);
+if (ret == -1) {
+if (errno == EINTR) {
+continue;
+}
+if (len == 0) {
+len = -errno;
+}
+break;
+}
+
+if (ret == 0) {
+/* the write end of the pipe is closed */
+break;
+}
+len += ret;
+buf += ret;
+size -= ret;
+}
+
+return len;
+}
+
 static int fd_close(void *opaque)
 {
 QEMUFileFD *s = opaque;
@@ -333,6 +362,17 @@ QEMUFile *qemu_fopen_socket(int fd)
 return s->file;
 }
 
+QEMUFile *qemu_fopen_fd(int fd)
+{
+QEMUFileFD *s = g_malloc0(sizeof(*s));
+
+fcntl_setfl(fd, O_NONBLOCK);
+s->file = qemu_fopen_ops(s, NULL, fd_get_buffer, fd_close,
+ NULL, NULL, NULL);
+s->file->fd = fd;
+return s->file;
+}
+
 static int file_put_buffer(void *opaque, const uint8_t *buf,
 int64_t pos, int size)
 {
-- 
1.7.1.1
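
Because qemu_fopen_fd() puts the descriptor into O_NONBLOCK mode, fd_get_buffer() can see EAGAIN; it reports -errno only when nothing has been read yet and otherwise returns the partial count. A standalone sketch of that loop on a pipe, assuming plain POSIX fcntl() in place of QEMU's fcntl_setfl() helper; the sketch_ names are hypothetical:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>

/* Set O_NONBLOCK on fd; a plain-POSIX stand-in for fcntl_setfl(). */
static int sketch_set_nonblock(int fd)
{
    int flags = fcntl(fd, F_GETFL);
    if (flags == -1) {
        return -errno;
    }
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1 ? -errno : 0;
}

/* Mirrors the fd_get_buffer() loop: returns bytes read, 0 at EOF, or
 * -errno (e.g. -EAGAIN) if the very first read fails. */
static long sketch_fd_get_buffer(int fd, uint8_t *buf, int size)
{
    long len = 0;

    assert(buf != NULL);
    while (size > 0) {
        ssize_t ret = read(fd, buf, size);
        if (ret == -1) {
            if (errno == EINTR) {
                continue;           /* interrupted: retry */
            }
            if (len == 0) {
                len = -errno;       /* nothing read yet: report the error */
            }
            break;                  /* partial data: return what we have */
        }
        if (ret == 0) {
            break;                  /* the write end of the pipe is closed */
        }
        len += ret;
        buf += ret;
        size -= ret;
    }
    return len;
}
```

The caller can then treat -EAGAIN as "try again when the fd becomes readable" rather than as a fatal error.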



[PATCH v2 17/41] savevm, buffered_file: introduce method to drain buffer of buffered file

2012-06-04 Thread Isaku Yamahata
Introduce a new method to drain the buffer of QEMUBufferedFile.
During postcopy migration, the buffer size can grow without bound.
To keep the buffer reasonably small, introduce a method that waits
for the buffer to drain.

Signed-off-by: Isaku Yamahata yamah...@valinux.co.jp
---
 buffered_file.c |   20 +++-
 buffered_file.h |1 +
 qemu-file.h |1 +
 savevm.c|7 +++
 4 files changed, 24 insertions(+), 5 deletions(-)

diff --git a/buffered_file.c b/buffered_file.c
index f170aa0..a38caec 100644
--- a/buffered_file.c
+++ b/buffered_file.c
@@ -170,6 +170,15 @@ static int buffered_put_buffer(void *opaque, const uint8_t *buf, int64_t pos, in
 return offset;
 }
 
+static void buffered_drain(QEMUFileBuffered *s)
+{
+while (!qemu_file_get_error(s->file) && s->buffer_size) {
+buffered_flush(s);
+if (s->freeze_output)
+s->wait_for_unfreeze(s->opaque);
+}
+}
+
 static int buffered_close(void *opaque)
 {
 QEMUFileBuffered *s = opaque;
@@ -177,11 +186,7 @@ static int buffered_close(void *opaque)
 
 DPRINTF("closing\n");
 
-while (!qemu_file_get_error(s->file) && s->buffer_size) {
-buffered_flush(s);
-if (s->freeze_output)
-s->wait_for_unfreeze(s->opaque);
-}
+buffered_drain(s);
 
 ret = s->close(s->opaque);
 
@@ -291,3 +296,8 @@ QEMUFile *qemu_fopen_ops_buffered(void *opaque,
 
 return s->file;
 }
+
+void qemu_buffered_file_drain_buffer(void *buffered_file)
+{
+buffered_drain(buffered_file);
+}
diff --git a/buffered_file.h b/buffered_file.h
index 98d358b..cd8e1e8 100644
--- a/buffered_file.h
+++ b/buffered_file.h
@@ -26,5 +26,6 @@ QEMUFile *qemu_fopen_ops_buffered(void *opaque, size_t xfer_limit,
   BufferedPutReadyFunc *put_ready,
   BufferedWaitForUnfreezeFunc *wait_for_unfreeze,
   BufferedCloseFunc *close);
+void qemu_buffered_file_drain_buffer(void *buffered_file);
 
 #endif
diff --git a/qemu-file.h b/qemu-file.h
index 880ef4b..331ac8b 100644
--- a/qemu-file.h
+++ b/qemu-file.h
@@ -72,6 +72,7 @@ QEMUFile *qemu_popen(FILE *popen_file, const char *mode);
 QEMUFile *qemu_popen_cmd(const char *command, const char *mode);
 int qemu_stdio_fd(QEMUFile *f);
 void qemu_fflush(QEMUFile *f);
+void qemu_buffered_file_drain(QEMUFile *f);
 int qemu_fclose(QEMUFile *f);
 void qemu_put_buffer(QEMUFile *f, const uint8_t *buf, int size);
 void qemu_put_byte(QEMUFile *f, int v);
diff --git a/savevm.c b/savevm.c
index 2992f97..fb47529 100644
--- a/savevm.c
+++ b/savevm.c
@@ -85,6 +85,7 @@
 #include "cpus.h"
 #include "memory.h"
 #include "qmp-commands.h"
+#include "buffered_file.h"
 
 #define SELF_ANNOUNCE_ROUNDS 5
 
@@ -477,6 +478,12 @@ void qemu_fflush(QEMUFile *f)
 }
 }
 
+void qemu_buffered_file_drain(QEMUFile *f)
+{
+qemu_fflush(f);
+qemu_buffered_file_drain_buffer(f->opaque);
+}
+
 static void qemu_fill_buffer(QEMUFile *f)
 {
 int len;
-- 
1.7.1.1
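
The drain loop alternates between flushing and, while the output is frozen, blocking in wait_for_unfreeze() until the buffer is empty or the file has an error. A toy model of that control flow, assuming a hypothetical SketchBuffered struct whose flush pushes at most capacity bytes and freezes while data remains:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy model of QEMUFileBuffered: just enough state to show
 * the drain loop. */
typedef struct SketchBuffered {
    size_t buffer_size;   /* bytes still queued */
    size_t capacity;      /* bytes one flush can push out */
    int freeze_output;    /* set when the sink cannot take more data */
    int error;            /* stands in for qemu_file_get_error() */
    int unfreeze_calls;   /* counts waits, for illustration */
} SketchBuffered;

static void sketch_flush(SketchBuffered *s)
{
    assert(s->capacity > 0);   /* otherwise draining could never finish */
    size_t n = s->buffer_size < s->capacity ? s->buffer_size : s->capacity;
    s->buffer_size -= n;
    /* Pretend the sink fills up whenever data is left over. */
    s->freeze_output = s->buffer_size > 0;
}

static void sketch_wait_for_unfreeze(SketchBuffered *s)
{
    s->unfreeze_calls++;
    s->freeze_output = 0;      /* the sink became writable again */
}

/* Same structure as buffered_drain(): flush until empty or error,
 * blocking whenever output is frozen. */
static void sketch_drain(SketchBuffered *s)
{
    while (!s->error && s->buffer_size) {
        sketch_flush(s);
        if (s->freeze_output) {
            sketch_wait_for_unfreeze(s);
        }
    }
}
```

Calling the drain after each batch of postcopy pages bounds the queued data at roughly one batch, at the cost of blocking the sender.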



[PATCH v2 20/41] savevm/QEMUFileSocket: drop duplicated member fd

2012-06-04 Thread Isaku Yamahata
fd is already stored in QEMUFile, so drop the duplicated member
QEMUFileSocket::fd.

Signed-off-by: Isaku Yamahata yamah...@valinux.co.jp
---
 savevm.c |4 +---
 1 files changed, 1 insertions(+), 3 deletions(-)

diff --git a/savevm.c b/savevm.c
index ec9f5d0..4b560b3 100644
--- a/savevm.c
+++ b/savevm.c
@@ -189,7 +189,6 @@ typedef struct QEMUFileStdio
 
 typedef struct QEMUFileSocket
 {
-int fd;
 QEMUFile *file;
 } QEMUFileSocket;
 
@@ -199,7 +198,7 @@ static int socket_get_buffer(void *opaque, uint8_t *buf, int64_t pos, int size)
 ssize_t len;
 
 do {
-len = qemu_recv(s->fd, buf, size, 0);
+len = qemu_recv(s->file->fd, buf, size, 0);
 } while (len == -1 && socket_error() == EINTR);
 
 if (len == -1)
@@ -328,7 +327,6 @@ QEMUFile *qemu_fopen_socket(int fd)
 {
 QEMUFileSocket *s = g_malloc0(sizeof(QEMUFileSocket));
 
-s->fd = fd;
 s->file = qemu_fopen_ops(s, NULL, socket_get_buffer, socket_close,
 NULL, NULL, NULL);
 s->file->fd = fd;
-- 
1.7.1.1



[PATCH v2 24/41] migration: export migrate_fd_completed() and migrate_fd_cleanup()

2012-06-04 Thread Isaku Yamahata
This will be used by postcopy migration.

Signed-off-by: Isaku Yamahata yamah...@valinux.co.jp
---
 migration.c |4 ++--
 migration.h |2 ++
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/migration.c b/migration.c
index 753addb..48a8f68 100644
--- a/migration.c
+++ b/migration.c
@@ -159,7 +159,7 @@ MigrationInfo *qmp_query_migrate(Error **errp)
 
 /* shared migration helpers */
 
-static int migrate_fd_cleanup(MigrationState *s)
+int migrate_fd_cleanup(MigrationState *s)
 {
 int ret = 0;
 
@@ -187,7 +187,7 @@ void migrate_fd_error(MigrationState *s)
 migrate_fd_cleanup(s);
 }
 
-static void migrate_fd_completed(MigrationState *s)
+void migrate_fd_completed(MigrationState *s)
 {
 DPRINTF("setting completed state\n");
 if (migrate_fd_cleanup(s) < 0) {
diff --git a/migration.h b/migration.h
index 6cf4512..d0dd536 100644
--- a/migration.h
+++ b/migration.h
@@ -62,7 +62,9 @@ int fd_start_incoming_migration(const char *path);
 
 int fd_start_outgoing_migration(MigrationState *s, const char *fdname);
 
+int migrate_fd_cleanup(MigrationState *s);
 void migrate_fd_error(MigrationState *s);
+void migrate_fd_completed(MigrationState *s);
 
 void migrate_fd_connect(MigrationState *s);
 
-- 
1.7.1.1



[PATCH v2 15/41] savevm: export qemu_peek_buffer, qemu_peek_byte, qemu_file_skip

2012-06-04 Thread Isaku Yamahata
These will be used by postcopy.

Signed-off-by: Isaku Yamahata yamah...@valinux.co.jp
---
 qemu-file.h |3 +++
 savevm.c|6 +++---
 2 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/qemu-file.h b/qemu-file.h
index 31b83f6..a285bef 100644
--- a/qemu-file.h
+++ b/qemu-file.h
@@ -88,6 +88,9 @@ void qemu_put_be32(QEMUFile *f, unsigned int v);
 void qemu_put_be64(QEMUFile *f, uint64_t v);
 int qemu_get_buffer(QEMUFile *f, uint8_t *buf, int size);
 int qemu_get_byte(QEMUFile *f);
+int qemu_peek_byte(QEMUFile *f, int offset);
+int qemu_peek_buffer(QEMUFile *f, uint8_t *buf, int size, size_t offset);
+void qemu_file_skip(QEMUFile *f, int size);
 
 static inline unsigned int qemu_get_ubyte(QEMUFile *f)
 {
diff --git a/savevm.c b/savevm.c
index 2d18bab..8ad843f 100644
--- a/savevm.c
+++ b/savevm.c
@@ -588,14 +588,14 @@ void qemu_put_byte(QEMUFile *f, int v)
 qemu_fflush(f);
 }
 
-static void qemu_file_skip(QEMUFile *f, int size)
+void qemu_file_skip(QEMUFile *f, int size)
 {
 if (f->buf_index + size <= f->buf_size) {
 f->buf_index += size;
 }
 }
 
-static int qemu_peek_buffer(QEMUFile *f, uint8_t *buf, int size, size_t offset)
+int qemu_peek_buffer(QEMUFile *f, uint8_t *buf, int size, size_t offset)
 {
 int pending;
 int index;
@@ -643,7 +643,7 @@ int qemu_get_buffer(QEMUFile *f, uint8_t *buf, int size)
 return done;
 }
 
-static int qemu_peek_byte(QEMUFile *f, int offset)
+int qemu_peek_byte(QEMUFile *f, int offset)
 {
 int index = f->buf_index + offset;
 
-- 
1.7.1.1
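
Peeking reads ahead without moving buf_index, while qemu_file_skip() advances buf_index only when the skipped bytes are already buffered (the <= check in the hunk). A toy sketch of those semantics, assuming a hypothetical fixed-size SketchFile; unlike this sketch, the real qemu_peek_byte() tries to refill the buffer before giving up:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical toy buffer with the two QEMUFile fields the hunk uses. */
typedef struct SketchFile {
    unsigned char buf[32];
    int buf_index;   /* next byte to consume */
    int buf_size;    /* bytes currently buffered */
} SketchFile;

/* Look at the byte 'offset' positions ahead without consuming it;
 * returns -1 if that byte is not buffered. */
static int sketch_peek_byte(const SketchFile *f, int offset)
{
    assert(offset >= 0);
    int index = f->buf_index + offset;
    if (index >= f->buf_size) {
        return -1;
    }
    return f->buf[index];
}

/* Like qemu_file_skip(): advance only if the bytes are already buffered,
 * otherwise leave the position unchanged. */
static void sketch_file_skip(SketchFile *f, int size)
{
    if (f->buf_index + size <= f->buf_size) {
        f->buf_index += size;
    }
}
```

The peek/skip pair lets a parser inspect a header and decide how to handle it before committing to consume it.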



[PATCH v2 33/41] postcopy: introduce -postcopy and -postcopy-flags option

2012-06-04 Thread Isaku Yamahata
This patch prepares for postcopy live migration.
It introduces the -postcopy option and its internal flag, migration_postcopy.
It introduces -postcopy-flags for changing the behavior of incoming postcopy,
mainly for benchmarking/debugging.

Signed-off-by: Isaku Yamahata yamah...@valinux.co.jp
---
 migration.h |3 +++
 qemu-options.hx |   22 ++
 vl.c|8 
 3 files changed, 33 insertions(+), 0 deletions(-)

diff --git a/migration.h b/migration.h
index 59e6e68..4bbcf06 100644
--- a/migration.h
+++ b/migration.h
@@ -103,4 +103,7 @@ void migrate_add_blocker(Error *reason);
  */
 void migrate_del_blocker(Error *reason);
 
+extern bool incoming_postcopy;
+extern unsigned long incoming_postcopy_flags;
+
 #endif
diff --git a/qemu-options.hx b/qemu-options.hx
index 8b66264..a9af31e 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -2616,6 +2616,28 @@ STEXI
 Prepare for incoming migration, listen on @var{port}.
 ETEXI
 
+DEF("postcopy", 0, QEMU_OPTION_postcopy,
+"-postcopy postcopy incoming migration when -incoming is specified\n",
+QEMU_ARCH_ALL)
+STEXI
+@item -postcopy
+@findex -postcopy
+Start incoming migration in postcopy mode.
+ETEXI
+
+DEF("postcopy-flags", HAS_ARG, QEMU_OPTION_postcopy_flags,
+"-postcopy-flags unsigned-int(flags)\n"
+"  flags for postcopy incoming migration\n"
+"   when -incoming and -postcopy are specified.\n"
+"   This is for benchmark/debug purposes (default: 0)\n",
+QEMU_ARCH_ALL)
+STEXI
+@item -postcopy-flags int
+@findex -postcopy-flags
+Specify flags for incoming postcopy migration when -incoming and -postcopy are
+specified. This is for benchmark/debug purposes. (default: 0)
+ETEXI
+
 DEF("nodefaults", 0, QEMU_OPTION_nodefaults, \
 "-nodefaults don't create default devices\n", QEMU_ARCH_ALL)
 STEXI
diff --git a/vl.c b/vl.c
index 62dc343..1674abb 100644
--- a/vl.c
+++ b/vl.c
@@ -189,6 +189,8 @@ int mem_prealloc = 0; /* force preallocation of physical target memory */
 int nb_nics;
 NICInfo nd_table[MAX_NICS];
 int autostart;
+bool incoming_postcopy = false; /* When -incoming is specified, postcopy mode */
+unsigned long incoming_postcopy_flags = 0; /* flags for postcopy incoming mode */
 static int rtc_utc = 1;
 static int rtc_date_offset = -1; /* -1 means no change */
 QEMUClock *rtc_clock;
@@ -3115,6 +3117,12 @@ int main(int argc, char **argv, char **envp)
 incoming = optarg;
 runstate_set(RUN_STATE_INMIGRATE);
 break;
+case QEMU_OPTION_postcopy:
+incoming_postcopy = true;
+break;
+case QEMU_OPTION_postcopy_flags:
+incoming_postcopy_flags = strtoul(optarg, NULL, 0);
+break;
 case QEMU_OPTION_nodefaults:
 default_serial = 0;
 default_parallel = 0;
-- 
1.7.1.1
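
vl.c parses -postcopy-flags with strtoul(optarg, NULL, 0), so base 0 accepts decimal, 0x-prefixed hex and 0-prefixed octal, but malformed input such as "16k" is silently truncated to 16 and an empty string becomes 0. A stricter alternative, sketched here as a hypothetical helper rather than anything in the patch, checks the end pointer and errno:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stricter variant of the -postcopy-flags parsing:
 * base 0 keeps "16", "0x10" and "020" equivalent, but trailing junk,
 * empty strings and overflow are rejected instead of silently accepted. */
static bool sketch_parse_postcopy_flags(const char *arg, unsigned long *out)
{
    char *end;

    assert(arg != NULL && out != NULL);
    errno = 0;
    unsigned long v = strtoul(arg, &end, 0);
    if (end == arg || *end != '\0' || errno == ERANGE) {
        return false;
    }
    *out = v;
    return true;
}
```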



[PATCH v2 35/41] postcopy: introduce helper functions for postcopy

2012-06-04 Thread Isaku Yamahata
This patch introduces helper functions for postcopy to access the
umem char device and to communicate between incoming-qemu and umemd.

Signed-off-by: Isaku Yamahata yamah...@valinux.co.jp
---
changes v1 -> v2:
- code simplification
- make fault trigger more robust
- introduce struct umem_pages
---
 umem.c |  364 
 umem.h |  101 ++
 2 files changed, 465 insertions(+), 0 deletions(-)
 create mode 100644 umem.c
 create mode 100644 umem.h

diff --git a/umem.c b/umem.c
new file mode 100644
index 000..64eaab5
--- /dev/null
+++ b/umem.c
@@ -0,0 +1,364 @@
+/*
+ * umem.c: user process backed memory module for postcopy live migration
+ *
+ * Copyright (c) 2011
+ * National Institute of Advanced Industrial Science and Technology
+ *
+ * https://sites.google.com/site/grivonhome/quick-kvm-migration
+ * Author: Isaku Yamahata yamahata at valinux co jp
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, see http://www.gnu.org/licenses/.
+ */
+
+#include <sys/ioctl.h>
+#include <sys/mman.h>
+
+#include <linux/umem.h>
+
+#include "bitops.h"
+#include "sysemu.h"
+#include "hw/hw.h"
+#include "umem.h"
+
+//#define DEBUG_UMEM
+#ifdef DEBUG_UMEM
+#include sys/syscall.h
+#define DPRINTF(format, ...)\
+do {\
+printf("%d:%ld %s:%d " format, getpid(), syscall(SYS_gettid),\
+   __func__, __LINE__, ## __VA_ARGS__); \
+} while (0)
+#else
+#define DPRINTF(format, ...)do { } while (0)
+#endif
+
+#define DEV_UMEM        "/dev/umem"
+
+UMem *umem_new(void *hostp, size_t size)
+{
+struct umem_init uinit = {
+.size = size,
+};
+UMem *umem;
+
+assert((size % getpagesize()) == 0);
+umem = g_new(UMem, 1);
+umem->fd = open(DEV_UMEM, O_RDWR);
+if (umem->fd < 0) {
+perror("can't open " DEV_UMEM);
+abort();
+}
+
+if (ioctl(umem->fd, UMEM_INIT, &uinit) < 0) {
+perror("UMEM_INIT");
+abort();
+}
+if (ftruncate(uinit.shmem_fd, uinit.size) < 0) {
+perror("truncate(\"shmem_fd\")");
+abort();
+}
+
+umem->nbits = 0;
+umem->nsets = 0;
+umem->faulted = NULL;
+umem->page_shift = ffs(getpagesize()) - 1;
+umem->shmem_fd = uinit.shmem_fd;
+umem->size = uinit.size;
+umem->umem = mmap(hostp, size, PROT_EXEC | PROT_READ | PROT_WRITE,
+  MAP_PRIVATE | MAP_FIXED, umem->fd, 0);
+if (umem->umem == MAP_FAILED) {
+perror("mmap(UMem) failed");
+abort();
+}
+return umem;
+}
+
+void umem_destroy(UMem *umem)
+{
+if (umem->fd != -1) {
+close(umem->fd);
+}
+if (umem->shmem_fd != -1) {
+close(umem->shmem_fd);
+}
+g_free(umem->faulted);
+g_free(umem);
+}
+
+void umem_get_page_request(UMem *umem, struct umem_pages *page_request)
+{
+ssize_t ret = read(umem->fd, page_request->pgoffs,
+   page_request->nr * sizeof(page_request->pgoffs[0]));
+if (ret < 0) {
+perror("daemon: umem read");
+abort();
+}
+page_request->nr = ret / sizeof(page_request->pgoffs[0]);
+}
+
+void umem_mark_page_cached(UMem *umem, struct umem_pages *page_cached)
+{
+const void *buf = page_cached->pgoffs;
+ssize_t left = page_cached->nr * sizeof(page_cached->pgoffs[0]);
+
+while (left > 0) {
+ssize_t ret = write(umem->fd, buf, left);
+if (ret == -1) {
+if (errno == EINTR)
+continue;
+
+perror("daemon: umem write");
+abort();
+}
+
+left -= ret;
+buf += ret;
+}
+}
+
+void umem_unmap(UMem *umem)
+{
+munmap(umem->umem, umem->size);
+umem->umem = NULL;
+}
+
+void umem_close(UMem *umem)
+{
+close(umem->fd);
+umem->fd = -1;
+}
+
+void *umem_map_shmem(UMem *umem)
+{
+umem->nbits = umem->size >> umem->page_shift;
+umem->nsets = 0;
+umem->faulted = g_new0(unsigned long, BITS_TO_LONGS(umem->nbits));
+
+umem->shmem = mmap(NULL, umem->size, PROT_READ | PROT_WRITE, MAP_SHARED,
+   umem->shmem_fd, 0);
+if (umem->shmem == MAP_FAILED) {
+perror("daemon: mmap(\"shmem\")");
+abort();
+}
+return umem->shmem;
+}
+
+void umem_unmap_shmem(UMem *umem)
+{
+munmap(umem->shmem, umem->size);
+umem->shmem = NULL;
+}
+
+void umem_remove_shmem(UMem *umem, size_t offset, size_t size)
+{
+int s = offset >> umem->page_shift

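umem_new() derives page_shift as ffs(getpagesize()) - 1, and umem_map_shmem() sizes the faulted bitmap as BITS_TO_LONGS(size >> page_shift), one bit per page. A small sketch of that arithmetic, with local stand-ins for BITS_TO_LONGS (the SKETCH_ macros and sketch_ functions are hypothetical):

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <strings.h>

/* Local stand-in for BITS_TO_LONGS() from QEMU's bitops.h. */
#define SKETCH_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
#define SKETCH_BITS_TO_LONGS(nr) \
    (((nr) + SKETCH_BITS_PER_LONG - 1) / SKETCH_BITS_PER_LONG)

/* For a power-of-two page size, ffs() returns the 1-based index of the
 * lowest set bit, so ffs(size) - 1 is log2(size), as in umem_new(). */
static int sketch_page_shift(int pagesize)
{
    assert(pagesize > 0 && (pagesize & (pagesize - 1)) == 0);
    return ffs(pagesize) - 1;
}

/* Number of unsigned longs needed for a one-bit-per-page bitmap over
 * 'size' bytes, mirroring the allocation in umem_map_shmem(). */
static size_t sketch_faulted_bitmap_longs(size_t size, int page_shift)
{
    size_t nbits = size >> page_shift;
    return SKETCH_BITS_TO_LONGS(nbits);
}
```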
[PATCH v2 40/41] migrate: add -m (movebg) option to migrate command

2012-06-04 Thread Isaku Yamahata
Signed-off-by: Isaku Yamahata yamah...@valinux.co.jp
---
 hmp-commands.hx  |5 +++--
 hmp.c|3 ++-
 migration.c  |8 +++-
 migration.h  |1 +
 qapi-schema.json |2 +-
 qmp-commands.hx  |2 +-
 savevm.c |1 +
 7 files changed, 16 insertions(+), 6 deletions(-)

diff --git a/hmp-commands.hx b/hmp-commands.hx
index 38e5c95..1912cb8 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -798,15 +798,16 @@ ETEXI
 
 {
 .name   = "migrate",
-.args_type  = "detach:-d,blk:-b,inc:-i,postcopy:-p,nobg:-n,uri:s,"
+.args_type  = "detach:-d,blk:-b,inc:-i,postcopy:-p,movebg:-m,nobg:-n,uri:s,"
  "forward:i?,backward:i?",
-.params = "[-d] [-b] [-i] [-p [-n]] uri [forward] [backward]",
+.params = "[-d] [-b] [-i] [-p [-n] [-m]] uri [forward] [backward]",
 .help   = "migrate to URI (using -d to not wait for completion)"
  "\n\t\t\t -b for migration without shared storage with"
  " full copy of disk\n\t\t\t -i for migration without "
  "shared storage with incremental copy of disk "
  "(base image shared between src and destination)"
  "\n\t\t\t-p for migration with postcopy mode enabled"
+ "\n\t\t\t-m for move background transfer of postcopy mode"
  "\n\t\t\t-n for no background transfer of postcopy mode"
  "\n\t\t\tforward: the number of pages to "
  "forward-prefault when postcopy (default 0)"
diff --git a/hmp.c b/hmp.c
index 79a9c86..dd3f307 100644
--- a/hmp.c
+++ b/hmp.c
@@ -912,6 +912,7 @@ void hmp_migrate(Monitor *mon, const QDict *qdict)
 int blk = qdict_get_try_bool(qdict, "blk", 0);
 int inc = qdict_get_try_bool(qdict, "inc", 0);
 int postcopy = qdict_get_try_bool(qdict, "postcopy", 0);
+int movebg = qdict_get_try_bool(qdict, "movebg", 0);
 int nobg = qdict_get_try_bool(qdict, "nobg", 0);
 int forward = qdict_get_try_int(qdict, "forward", 0);
 int backward = qdict_get_try_int(qdict, "backward", 0);
@@ -919,7 +920,7 @@ void hmp_migrate(Monitor *mon, const QDict *qdict)
 Error *err = NULL;
 
 qmp_migrate(uri, !!blk, blk, !!inc, inc, false, false,
-!!postcopy, postcopy, !!nobg, nobg,
+!!postcopy, postcopy, !!movebg, movebg, !!nobg, nobg,
 !!forward, forward, !!backward, backward,
 &err);
 if (err) {
diff --git a/migration.c b/migration.c
index e026085..c5e6820 100644
--- a/migration.c
+++ b/migration.c
@@ -422,7 +422,9 @@ void migrate_del_blocker(Error *reason)
 
 void qmp_migrate(const char *uri, bool has_blk, bool blk,
  bool has_inc, bool inc, bool has_detach, bool detach,
- bool has_postcopy, bool postcopy, bool has_nobg, bool nobg,
+ bool has_postcopy, bool postcopy,
+ bool has_movebg, bool movebg,
+ bool has_nobg, bool nobg,
  bool has_forward, int64_t forward,
  bool has_backward, int64_t backward,
  Error **errp)
@@ -432,6 +434,7 @@ void qmp_migrate(const char *uri, bool has_blk, bool blk,
 .blk = false,
 .shared = false,
 .postcopy = false,
+.movebg = false,
 .nobg = false,
 .prefault_forward = 0,
 .prefault_backward = 0,
@@ -448,6 +451,9 @@ void qmp_migrate(const char *uri, bool has_blk, bool blk,
 if (has_postcopy) {
 params.postcopy = postcopy;
 }
+if (has_movebg) {
+params.movebg = movebg;
+}
 if (has_nobg) {
 params.nobg = nobg;
 }
diff --git a/migration.h b/migration.h
index 9a9b9c6..1e98b20 100644
--- a/migration.h
+++ b/migration.h
@@ -23,6 +23,7 @@ struct MigrationParams {
 int blk;
 int shared;
 int postcopy;
+int movebg;
 int nobg;
 int64_t prefault_forward;
 int64_t prefault_backward;
diff --git a/qapi-schema.json b/qapi-schema.json
index 83c2170..ef2f48e 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -1718,7 +1718,7 @@
 ##
 { 'command': 'migrate',
   'data': {'uri': 'str', '*blk': 'bool', '*inc': 'bool', '*detach': 'bool' ,
-   '*postcopy': 'bool', '*nobg': 'bool',
+   '*postcopy': 'bool', '*movebg': 'bool', '*nobg': 'bool',
'*forward': 'int', '*backward': 'int'} }
 
 # @xen-save-devices-state:
diff --git a/qmp-commands.hx b/qmp-commands.hx
index 7b5e5b7..5c9ecc8 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -469,7 +469,7 @@ EQMP
 
 {
 .name   = "migrate",
-.args_type  = "detach:-d,blk:-b,inc:-i,postcopy:-p,nobg:-n,uri:s",
+.args_type  = "detach:-d,blk:-b,inc:-i,postcopy:-p,movebg:-m,nobg:-n,uri:s",
 .mhandler.cmd_new = qmp_marshal_input_migrate,
 },
 
diff --git a/savevm.c b/savevm.c
index 48b636d..19bb8f1 100644
--- a/savevm.c
+++ b/savevm.c
@@ -1781,6 +1781,7 @@ static int qemu_savevm_state

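The forward/backward arguments in the migrate help text above set how many neighbouring pages are prefaulted around a requested page. One plausible way to compute that window, clamped to the block, is sketched below; sketch_prefault_range() is hypothetical and is not the series' implementation.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: given a requested page index inside a block of
 * nr_pages pages, compute the inclusive range [*first, *last] to send
 * when 'backward' pages before and 'forward' pages after should also be
 * prefaulted. The range is clamped to the block boundaries. */
static void sketch_prefault_range(uint64_t req, uint64_t nr_pages,
                                  int64_t backward, int64_t forward,
                                  uint64_t *first, uint64_t *last)
{
    assert(req < nr_pages && backward >= 0 && forward >= 0);
    /* Clamp at page 0 instead of wrapping around. */
    *first = (uint64_t)backward > req ? 0 : req - (uint64_t)backward;
    /* Clamp at the last page of the block. */
    *last = (uint64_t)forward >= nr_pages - 1 - req
            ? nr_pages - 1 : req + (uint64_t)forward;
}
```

Comparing against the remaining distance before adding avoids unsigned overflow for very large forward values.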
[PATCH v2 37/41] postcopy: implement outgoing part of postcopy live migration

2012-06-04 Thread Isaku Yamahata
This patch implements the outgoing part of postcopy live migration.

Signed-off-by: Isaku Yamahata yamah...@valinux.co.jp
---
Changes v1 -> v2:
- fix parameter to qemu_fdopen()
- handle QEMU_UMEM_REQ_EOC properly
  In PO_STATE_ALL_PAGES_SENT, the QEMU_UMEM_REQ_EOC request was ignored;
  handle it properly.
- flush on-demand page unconditionally
- improve postcopy_outgoing_ram_save_live and postcopy_outgoing_begin()
- use qemu_fopen_fd
- use memory api instead of obsolete api
- fix segv in postcopy_outgoing_check_all_ram_sent()
- catch up with qapi change
---
 arch_init.c   |   19 ++-
 migration-exec.c  |4 +
 migration-fd.c|   17 ++
 migration-postcopy-stub.c |   22 +++
 migration-postcopy.c  |  450 +
 migration-tcp.c   |   25 ++-
 migration-unix.c  |   26 ++-
 migration.c   |   32 +++-
 migration.h   |   12 ++
 savevm.c  |   22 ++-
 sysemu.h  |2 +-
 11 files changed, 614 insertions(+), 17 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index 22d9691..3599e5c 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -154,6 +154,13 @@ static int is_dup_page(uint8_t *page)
 return 1;
 }
 
+static bool outgoing_postcopy = false;
+
+void ram_save_set_params(const MigrationParams *params, void *opaque)
+{
+outgoing_postcopy = params->postcopy;
+}
+
 static RAMBlock *last_block_sent = NULL;
 static uint64_t bytes_transferred;
 
@@ -343,6 +350,15 @@ int ram_save_live(QEMUFile *f, int stage, void *opaque)
 uint64_t expected_time = 0;
 int ret;
 
+if (stage == 1) {
+bytes_transferred = 0;
+last_block_sent = NULL;
+ram_save_set_last_block(NULL, 0);
+}
+if (outgoing_postcopy) {
+return postcopy_outgoing_ram_save_live(f, stage, opaque);
+}
+
 if (stage < 0) {
 memory_global_dirty_log_stop();
 return 0;
@@ -351,9 +367,6 @@ int ram_save_live(QEMUFile *f, int stage, void *opaque)
 memory_global_sync_dirty_bitmap(get_system_memory());
 
 if (stage == 1) {
-bytes_transferred = 0;
-last_block_sent = NULL;
-ram_save_set_last_block(NULL, 0);
 sort_ram_list();
 
 /* Make sure all dirty bits are set */
diff --git a/migration-exec.c b/migration-exec.c
index 7f08b3b..a90da5c 100644
--- a/migration-exec.c
+++ b/migration-exec.c
@@ -64,6 +64,10 @@ int exec_start_outgoing_migration(MigrationState *s, const char *command)
 {
 FILE *f;
 
+if (s->params.postcopy) {
+return -ENOSYS;
+}
+
 f = popen(command, "w");
 if (f == NULL) {
 DPRINTF("Unable to popen exec target\n");
diff --git a/migration-fd.c b/migration-fd.c
index 42b8162..83b5f18 100644
--- a/migration-fd.c
+++ b/migration-fd.c
@@ -90,6 +90,23 @@ int fd_start_outgoing_migration(MigrationState *s, const char *fdname)
 s->write = fd_write;
 s->close = fd_close;
 
+if (s->params.postcopy) {
+int flags = fcntl(s->fd, F_GETFL);
+if ((flags & O_ACCMODE) != O_RDWR) {
+goto err_after_open;
+}
+
+s->fd_read = dup(s->fd);
+if (s->fd_read == -1) {
+goto err_after_open;
+}
+s->file_read = qemu_fopen_fd(s->fd_read);
+if (s->file_read == NULL) {
+close(s->fd_read);
+goto err_after_open;
+}
+}
+
 migrate_fd_connect(s);
 return 0;
 
diff --git a/migration-postcopy-stub.c b/migration-postcopy-stub.c
index f9ebcbe..9c64827 100644
--- a/migration-postcopy-stub.c
+++ b/migration-postcopy-stub.c
@@ -24,6 +24,28 @@
 #include "sysemu.h"
 #include "migration.h"
 
+int postcopy_outgoing_create_read_socket(MigrationState *s)
+{
+return -ENOSYS;
+}
+
+int postcopy_outgoing_ram_save_live(Monitor *mon,
+QEMUFile *f, int stage, void *opaque)
+{
+return -ENOSYS;
+}
+
+void *postcopy_outgoing_begin(MigrationState *ms)
+{
+return NULL;
+}
+
+int postcopy_outgoing_ram_save_background(Monitor *mon, QEMUFile *f,
+  void *postcopy)
+{
+return -ENOSYS;
+}
+
 int postcopy_incoming_init(const char *incoming, bool incoming_postcopy)
 {
 return -ENOSYS;
diff --git a/migration-postcopy.c b/migration-postcopy.c
index 5913e05..eb37094 100644
--- a/migration-postcopy.c
+++ b/migration-postcopy.c
@@ -177,6 +177,456 @@ static void postcopy_incoming_send_req(QEMUFile *f,
 }
 }
 
+static int postcopy_outgoing_recv_req_idstr(QEMUFile *f,
+struct qemu_umem_req *req,
+size_t *offset)
+{
+int ret;
+
+req->len = qemu_peek_byte(f, *offset);
+*offset += 1;
+if (req->len == 0) {
+return -EAGAIN;
+}
+req->idstr = g_malloc((int)req->len + 1);
+ret = qemu_peek_buffer(f, (uint8_t*)req->idstr, req->len, *offset);
+*offset += ret;
+if (ret != req->len) {
+g_free(req->idstr);
+req

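The postcopy branch added to fd_start_outgoing_migration() above requires a descriptor opened O_RDWR and dup()s it so that page requests coming back from the destination can be read through a separate QEMUFile. A minimal standalone sketch of that check, using only POSIX fcntl() and dup(); the helper name open_postcopy_read_fd() is hypothetical and not part of the patch:

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper (not in the patch): return a dup() of fd to serve
 * as the read side of a postcopy channel, or -1 if fd is unusable. */
static int open_postcopy_read_fd(int fd)
{
    int flags = fcntl(fd, F_GETFL);

    if (flags == -1) {
        return -1;                      /* not a valid descriptor */
    }
    /* Page requests flow back over the same channel, so the descriptor
     * must be open for both reading and writing. */
    if ((flags & O_ACCMODE) != O_RDWR) {
        return -1;
    }
    return dup(fd);                     /* -1 if dup() fails */
}
```

A socket or a file opened O_RDWR passes the check; either end of a pipe is rejected because it is open in one direction only.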
[PATCH v2 38/41] postcopy/outgoing: add forward, backward option to specify the size of prefault

2012-06-04 Thread Isaku Yamahata
Signed-off-by: Isaku Yamahata yamah...@valinux.co.jp
---
 hmp-commands.hx  |   15 ++-
 hmp.c|3 +++
 migration.c  |   20 
 migration.h  |2 ++
 qapi-schema.json |3 ++-
 5 files changed, 37 insertions(+), 6 deletions(-)

diff --git a/hmp-commands.hx b/hmp-commands.hx
index 3c647f7..38e5c95 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -798,26 +798,31 @@ ETEXI
 
 {
 .name   = "migrate",
-.args_type  = "detach:-d,blk:-b,inc:-i,postcopy:-p,nobg:-n,uri:s",
-.params = "[-d] [-b] [-i] [-p [-n]] uri",
+.args_type  = "detach:-d,blk:-b,inc:-i,postcopy:-p,nobg:-n,uri:s,"
+ "forward:i?,backward:i?",
+.params = "[-d] [-b] [-i] [-p [-n]] uri [forward] [backward]",
 .help   = "migrate to URI (using -d to not wait for completion)"
  "\n\t\t\t -b for migration without shared storage with"
  " full copy of disk\n\t\t\t -i for migration without "
  "shared storage with incremental copy of disk "
  "(base image shared between src and destination)"
  "\n\t\t\t-p for migration with postcopy mode enabled"
- "\n\t\t\t-n for no background transfer of postcopy mode",
+ "\n\t\t\t-n for no background transfer of postcopy mode"
+ "\n\t\t\tforward: the number of pages to "
+ "forward-prefault when postcopy (default 0)"
+ "\n\t\t\tbackward: the number of pages to "
+ "backward-prefault when postcopy (default 0)",
 .mhandler.cmd = hmp_migrate,
 },
 
 
 STEXI
-@item migrate [-d] [-b] [-i] [-p [-n]] @var{uri}
+@item migrate [-d] [-b] [-i] [-p [-n]] @var{uri} @var{forward} @var{backward}
 @findex migrate
 Migrate to @var{uri} (using -d to not wait for completion).
-b for migration with full copy of disk
-i for migration with incremental copy of disk (base image is shared)
-   -p for migration with postcopy mode enabled
+   -p for migration with postcopy mode enabled (forward/backward set the prefault size, in pages, for postcopy)
-n for migration with postcopy mode enabled without background transfer
 ETEXI
 
diff --git a/hmp.c b/hmp.c
index d546a52..79a9c86 100644
--- a/hmp.c
+++ b/hmp.c
@@ -913,11 +913,14 @@ void hmp_migrate(Monitor *mon, const QDict *qdict)
 int inc = qdict_get_try_bool(qdict, "inc", 0);
 int postcopy = qdict_get_try_bool(qdict, "postcopy", 0);
 int nobg = qdict_get_try_bool(qdict, "nobg", 0);
+int forward = qdict_get_try_int(qdict, "forward", 0);
+int backward = qdict_get_try_int(qdict, "backward", 0);
 const char *uri = qdict_get_str(qdict, "uri");
 Error *err = NULL;
 
 qmp_migrate(uri, !!blk, blk, !!inc, inc, false, false,
 !!postcopy, postcopy, !!nobg, nobg,
+!!forward, forward, !!backward, backward,
 &err);
 if (err) {
 monitor_printf(mon, "migrate: %s\n", error_get_pretty(err));
diff --git a/migration.c b/migration.c
index e8be0d1..e026085 100644
--- a/migration.c
+++ b/migration.c
@@ -423,6 +423,8 @@ void migrate_del_blocker(Error *reason)
 void qmp_migrate(const char *uri, bool has_blk, bool blk,
  bool has_inc, bool inc, bool has_detach, bool detach,
  bool has_postcopy, bool postcopy, bool has_nobg, bool nobg,
+ bool has_forward, int64_t forward,
+ bool has_backward, int64_t backward,
  Error **errp)
 {
 MigrationState *s = migrate_get_current();
@@ -431,6 +433,8 @@ void qmp_migrate(const char *uri, bool has_blk, bool blk,
 .shared = false,
 .postcopy = false,
 .nobg = false,
+.prefault_forward = 0,
+.prefault_backward = 0,
 };
 const char *p;
 int ret;
@@ -447,6 +451,22 @@ void qmp_migrate(const char *uri, bool has_blk, bool blk,
 if (has_nobg) {
 params.nobg = nobg;
 }
+if (has_forward) {
+if (forward < 0) {
+error_set(errp, QERR_INVALID_PARAMETER_VALUE,
+  "forward", "forward >= 0");
+return;
+}
+params.prefault_forward = forward;
+}
+if (has_backward) {
+if (backward < 0) {
+error_set(errp, QERR_INVALID_PARAMETER_VALUE,
+  "backward", "backward >= 0");
+return;
+}
+params.prefault_backward = backward;
+}
 
 if (s->state == MIG_STATE_ACTIVE) {
 error_set(errp, QERR_MIGRATION_ACTIVE);
diff --git a/migration.h b/migration.h
index 90f3bdf..9a9b9c6 100644
--- a/migration.h
+++ b/migration.h
@@ -24,6 +24,8 @@ struct MigrationParams {
 int shared;
 int postcopy;
 int nobg;
+int64_t prefault_forward;
+int64_t prefault_backward;
 };
 
 typedef struct MigrationState MigrationState;
diff --git a/qapi-schema.json b/qapi-schema.json
index 5861fb9..83c2170 100644

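In qmp_migrate() each optional QAPI argument arrives as a has_<name> flag plus a value; the new forward/backward arguments are applied only when present and must be non-negative. A sketch of that pattern as a standalone function; struct prefault_params and apply_prefault_args() are hypothetical names, and the error path is reduced to a return code instead of error_set():

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical subset of MigrationParams holding the prefault window. */
struct prefault_params {
    int64_t prefault_forward;
    int64_t prefault_backward;
};

/* Apply optional has_x/x argument pairs on top of the defaults, rejecting
 * negative values. Returns 0 on success, -1 if a supplied value is
 * invalid; *params is left unchanged on failure. */
static int apply_prefault_args(struct prefault_params *params,
                               bool has_forward, int64_t forward,
                               bool has_backward, int64_t backward)
{
    struct prefault_params p = *params;

    if (has_forward) {
        if (forward < 0) {
            return -1;
        }
        p.prefault_forward = forward;
    }
    if (has_backward) {
        if (backward < 0) {
            return -1;
        }
        p.prefault_backward = backward;
    }
    *params = p;
    return 0;
}
```

Note that hmp_migrate() passes !!forward as has_forward, so an explicit 0 typed at the monitor is treated as "not given"; with a default of 0 the result is the same.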
[PATCH v2 32/41] savevm: add new section that is used by postcopy

2012-06-04 Thread Isaku Yamahata
This is used by postcopy to tell the incoming side the total length of the
QEMU_VM_SECTION_FULL and QEMU_VM_SUBSECTION sections.

Signed-off-by: Isaku Yamahata yamah...@valinux.co.jp
---
 savevm.c |1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/savevm.c b/savevm.c
index 318ec61..3adabad 100644
--- a/savevm.c
+++ b/savevm.c
@@ -1597,6 +1597,7 @@ static void vmstate_save(QEMUFile *f, SaveStateEntry *se)
 #define QEMU_VM_SECTION_END  0x03
 #define QEMU_VM_SECTION_FULL 0x04
 #define QEMU_VM_SUBSECTION   0x05
+#define QEMU_VM_POSTCOPY 0x10
 
 bool qemu_savevm_state_blocked(Error **errp)
 {
-- 
1.7.1.1

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 39/41] postcopy/outgoing: implement prefault

2012-06-04 Thread Isaku Yamahata
When a page is requested, the surrounding pages are also sent.

Signed-off-by: Isaku Yamahata yamah...@valinux.co.jp
---
 migration-postcopy.c |   56 +
 1 files changed, 51 insertions(+), 5 deletions(-)

diff --git a/migration-postcopy.c b/migration-postcopy.c
index eb37094..6165657 100644
--- a/migration-postcopy.c
+++ b/migration-postcopy.c
@@ -353,6 +353,36 @@ int postcopy_outgoing_ram_save_live(QEMUFile *f, int stage, void *opaque)
 return ret;
 }
 
+static void postcopy_outgoing_ram_save_page(PostcopyOutgoingState *s,
+uint64_t pgoffset, bool *written,
+bool forward,
+int prefault_pgoffset)
+{
+ram_addr_t offset;
+int ret;
+
+if (forward) {
+pgoffset += prefault_pgoffset;
+} else {
+if (pgoffset < prefault_pgoffset) {
+return;
+}
+pgoffset -= prefault_pgoffset;
+}
+
+offset = pgoffset << TARGET_PAGE_BITS;
+if (offset >= s->last_block_read->length) {
+assert(forward);
+assert(prefault_pgoffset > 0);
+return;
+}
+
+ret = ram_save_page(s->mig_buffered_write, s->last_block_read, offset);
+if (ret > 0) {
+*written = true;
+}
+}
+
 /*
  * return value
  *   0: continue postcopy mode
@@ -364,6 +394,7 @@ static int postcopy_outgoing_handle_req(PostcopyOutgoingState *s,
 bool *written)
 {
 int i;
+uint64_t j;
 RAMBlock *block;
 
 DPRINTF("cmd %d state %d\n", req->cmd, s->state);
@@ -398,11 +429,26 @@ static int postcopy_outgoing_handle_req(PostcopyOutgoingState *s,
 break;
 }
 for (i = 0; i < req->nr; i++) {
-DPRINTF("offs[%d] 0x%" PRIx64 "\n", i, req->pgoffs[i]);
-int ret = ram_save_page(s->mig_buffered_write, s->last_block_read,
-req->pgoffs[i] << TARGET_PAGE_BITS);
-if (ret > 0) {
-*written = true;
+DPRINTF("pgoffs[%d] 0x%" PRIx64 "\n", i, req->pgoffs[i]);
+postcopy_outgoing_ram_save_page(s, req->pgoffs[i], written,
+true, 0);
+}
+/* forward prefault */
+for (j = 1; j <= s->ms->params.prefault_forward; j++) {
+for (i = 0; i < req->nr; i++) {
+DPRINTF("pgoffs[%d] + 0x%" PRIx64 " 0x%" PRIx64 "\n",
+i, j, req->pgoffs[i] + j);
+postcopy_outgoing_ram_save_page(s, req->pgoffs[i], written,
+true, j);
+}
+}
+/* backward prefault */
+for (j = 1; j <= s->ms->params.prefault_backward; j++) {
+for (i = 0; i < req->nr; i++) {
+DPRINTF("pgoffs[%d] - 0x%" PRIx64 " 0x%" PRIx64 "\n",
+i, j, req->pgoffs[i] - j);
+postcopy_outgoing_ram_save_page(s, req->pgoffs[i], written,
+false, j);
 }
 }
 break;
-- 
1.7.1.1


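For each request, postcopy_outgoing_handle_req() sends the requested pages, then pages pgoff+1 .. pgoff+prefault_forward, then pgoff-1 .. pgoff-prefault_backward, skipping offsets that would fall before page 0 or past the end of the RAM block. The sketch below reproduces only that enumeration, assuming pages are identified by 64-bit page offsets within one block of nr_block_pages pages; collect_prefault_pages() is a hypothetical helper that records offsets instead of calling ram_save_page():

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper mirroring the loops in postcopy_outgoing_handle_req():
 * store in out[] the page offsets that would be sent for one request, in
 * the same order (requested pages, forward prefault, backward prefault).
 * nr_block_pages is the RAM block length in pages. Returns the count. */
static size_t collect_prefault_pages(const uint64_t *req, size_t nr,
                                     uint64_t nr_block_pages,
                                     uint64_t forward, uint64_t backward,
                                     uint64_t *out, size_t out_cap)
{
    size_t n = 0;
    size_t i;
    uint64_t j;

    /* Requested pages are assumed to be valid, as in the patch. */
    for (i = 0; i < nr && n < out_cap; i++) {
        out[n++] = req[i];
    }
    /* Forward prefault: skip pages past the end of the block. */
    for (j = 1; j <= forward; j++) {
        for (i = 0; i < nr && n < out_cap; i++) {
            if (req[i] + j < nr_block_pages) {
                out[n++] = req[i] + j;
            }
        }
    }
    /* Backward prefault: skip offsets that would go below page 0. */
    for (j = 1; j <= backward; j++) {
        for (i = 0; i < nr && n < out_cap; i++) {
            if (req[i] >= j) {
                out[n++] = req[i] - j;
            }
        }
    }
    return n;
}
```

For example, a request for page 10 in a 12-page block with forward=3 and backward=2 yields 10, 11, 9, 8: pages 12 and 13 lie past the end of the block. The sketch does not remove duplicates when the windows of neighbouring requested pages overlap.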

[PATCH v2 34/41] postcopy outgoing: add -p and -n option to migrate command

2012-06-04 Thread Isaku Yamahata
Add a -p option to the migrate command for postcopy mode, and
introduce a postcopy migration parameter to indicate that postcopy mode
is enabled.
Add a -n option for postcopy migration, which disables background
transfer.

Signed-off-by: Isaku Yamahata yamah...@valinux.co.jp
---
Changes v1 -> v2:
- catch up for qapi change
---
 hmp-commands.hx  |   12 
 hmp.c|6 +-
 migration.c  |9 +
 migration.h  |2 ++
 qapi-schema.json |3 ++-
 qmp-commands.hx  |4 +++-
 savevm.c |2 ++
 7 files changed, 31 insertions(+), 7 deletions(-)

diff --git a/hmp-commands.hx b/hmp-commands.hx
index 18cb415..3c647f7 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -798,23 +798,27 @@ ETEXI
 
 {
 .name   = "migrate",
-.args_type  = "detach:-d,blk:-b,inc:-i,uri:s",
-.params = "[-d] [-b] [-i] uri",
+.args_type  = "detach:-d,blk:-b,inc:-i,postcopy:-p,nobg:-n,uri:s",
+.params = "[-d] [-b] [-i] [-p [-n]] uri",
 .help   = "migrate to URI (using -d to not wait for completion)"
  "\n\t\t\t -b for migration without shared storage with"
  " full copy of disk\n\t\t\t -i for migration without "
  "shared storage with incremental copy of disk "
- "(base image shared between src and destination)",
+ "(base image shared between src and destination)"
+ "\n\t\t\t-p for migration with postcopy mode enabled"
+ "\n\t\t\t-n for no background transfer of postcopy mode",
 .mhandler.cmd = hmp_migrate,
 },
 
 
 STEXI
-@item migrate [-d] [-b] [-i] @var{uri}
+@item migrate [-d] [-b] [-i] [-p [-n]] @var{uri}
 @findex migrate
 Migrate to @var{uri} (using -d to not wait for completion).
-b for migration with full copy of disk
-i for migration with incremental copy of disk (base image is shared)
+   -p for migration with postcopy mode enabled
+   -n for migration with postcopy mode enabled without background transfer
 ETEXI
 
 {
diff --git a/hmp.c b/hmp.c
index bb0952e..d546a52 100644
--- a/hmp.c
+++ b/hmp.c
@@ -911,10 +911,14 @@ void hmp_migrate(Monitor *mon, const QDict *qdict)
 int detach = qdict_get_try_bool(qdict, "detach", 0);
 int blk = qdict_get_try_bool(qdict, "blk", 0);
 int inc = qdict_get_try_bool(qdict, "inc", 0);
+int postcopy = qdict_get_try_bool(qdict, "postcopy", 0);
+int nobg = qdict_get_try_bool(qdict, "nobg", 0);
 const char *uri = qdict_get_str(qdict, "uri");
 Error *err = NULL;
 
-qmp_migrate(uri, !!blk, blk, !!inc, inc, false, false, &err);
+qmp_migrate(uri, !!blk, blk, !!inc, inc, false, false,
+!!postcopy, postcopy, !!nobg, nobg,
+&err);
 if (err) {
 monitor_printf(mon, "migrate: %s\n", error_get_pretty(err));
 error_free(err);
diff --git a/migration.c b/migration.c
index 3b97aec..7ad62ef 100644
--- a/migration.c
+++ b/migration.c
@@ -388,12 +388,15 @@ void migrate_del_blocker(Error *reason)
 
 void qmp_migrate(const char *uri, bool has_blk, bool blk,
  bool has_inc, bool inc, bool has_detach, bool detach,
+ bool has_postcopy, bool postcopy, bool has_nobg, bool nobg,
  Error **errp)
 {
 MigrationState *s = migrate_get_current();
 MigrationParams params = {
 .blk = false,
 .shared = false,
+.postcopy = false,
+.nobg = false,
 };
 const char *p;
 int ret;
@@ -404,6 +407,12 @@ void qmp_migrate(const char *uri, bool has_blk, bool blk,
 if (has_inc) {
 params.shared = inc;
 }
+if (has_postcopy) {
+params.postcopy = postcopy;
+}
+if (has_nobg) {
+params.nobg = nobg;
+}
 
 if (s->state == MIG_STATE_ACTIVE) {
 error_set(errp, QERR_MIGRATION_ACTIVE);
diff --git a/migration.h b/migration.h
index 4bbcf06..091b446 100644
--- a/migration.h
+++ b/migration.h
@@ -22,6 +22,8 @@
 struct MigrationParams {
 int blk;
 int shared;
+int postcopy;
+int nobg;
 };
 
 typedef struct MigrationState MigrationState;
diff --git a/qapi-schema.json b/qapi-schema.json
index 2ca7195..5861fb9 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -1717,7 +1717,8 @@
 # Since: 0.14.0
 ##
 { 'command': 'migrate',
-  'data': {'uri': 'str', '*blk': 'bool', '*inc': 'bool', '*detach': 'bool' } }
+  'data': {'uri': 'str', '*blk': 'bool', '*inc': 'bool', '*detach': 'bool' ,
+   '*postcopy': 'bool', '*nobg': 'bool'} }
 
 # @xen-save-devices-state:
 #
diff --git a/qmp-commands.hx b/qmp-commands.hx
index db980fa..7b5e5b7 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -469,7 +469,7 @@ EQMP
 
 {
 .name   = "migrate",
-.args_type  = "detach:-d,blk:-b,inc:-i,uri:s",
+.args_type  = "detach:-d,blk:-b,inc:-i,postcopy:-p,nobg:-n,uri:s",
 .mhandler.cmd_new = qmp_marshal_input_migrate,
 },
 
@@ -483,6

[PATCH v2 41/41] migration/postcopy: add movebg mode

2012-06-04 Thread Isaku Yamahata
When movebg mode is enabled, the position from which background pages are
sent is moved to just after the last on-demand page (plus its forward
prefault window).

Signed-off-by: Isaku Yamahata yamah...@valinux.co.jp
---
 migration-postcopy.c |8 
 1 files changed, 8 insertions(+), 0 deletions(-)

diff --git a/migration-postcopy.c b/migration-postcopy.c
index 6165657..3df88d7 100644
--- a/migration-postcopy.c
+++ b/migration-postcopy.c
@@ -442,6 +442,14 @@ static int postcopy_outgoing_handle_req(PostcopyOutgoingState *s,
 true, j);
 }
 }
+if (s->ms->params.movebg) {
+ram_addr_t last_offset =
+(req->pgoffs[req->nr - 1] + s->ms->params.prefault_forward) <<
+TARGET_PAGE_BITS;
+last_offset = MIN(last_offset,
+  s->last_block_read->length - TARGET_PAGE_SIZE);
+ram_save_set_last_block(s->last_block_read, last_offset);
+}
 /* backward prefault */
 for (j = 1; j <= s->ms->params.prefault_backward; j++) {
 for (i = 0; i < req->nr; i++) {
-- 
1.7.1.1


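In movebg mode the offset handed to ram_save_set_last_block() is the byte offset of the last requested page plus the forward prefault window, clamped to the start of the block's last page. The arithmetic as a standalone sketch, assuming 4 KiB pages; SKETCH_PAGE_BITS stands in for the target's TARGET_PAGE_BITS and movebg_last_offset() is a hypothetical name:

```c
#include <stdint.h>

#define SKETCH_PAGE_BITS 12                     /* assumed 4 KiB pages */
#define SKETCH_PAGE_SIZE (UINT64_C(1) << SKETCH_PAGE_BITS)

/* Byte offset recorded as the last-sent position after a request whose
 * final page offset is last_pgoff: (last_pgoff + forward) pages, clamped
 * to the start of the block's last page. Assumes block_len is a non-zero
 * multiple of the page size. */
static uint64_t movebg_last_offset(uint64_t last_pgoff, uint64_t forward,
                                   uint64_t block_len)
{
    uint64_t last_offset = (last_pgoff + forward) << SKETCH_PAGE_BITS;
    uint64_t max_offset = block_len - SKETCH_PAGE_SIZE;

    return last_offset < max_offset ? last_offset : max_offset;
}
```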

[PATCH v2 36/41] postcopy: implement incoming part of postcopy live migration

2012-06-04 Thread Isaku Yamahata
This patch implements the incoming part of postcopy live migration.

Signed-off-by: Isaku Yamahata yamah...@valinux.co.jp
---
Changes v3 -> v4:
- fork umemd early to address qemu devices touching guest ram via
  post/pre_load
- code clean up on initialization
- Makefile.target
  migration-postcopy.c is target dependent due to TARGET_PAGE_xxx,
  so it can't be shared between target architectures.
- use qemu_fopen_fd
- introduce incoming_flags_use_umem_make_present flag
- use MADV_DONTNEED

Changes v2 -> v3:
- make incoming socket nonblocking
- several clean ups
- Dropped QEMUFilePipe
- Moved QEMUFileNonblock to buffered_file
- Split out into umem/incoming/outgoing

Changes v1 -> v2:
- make mig_read nonblocking when socket
- updates for umem device changes
---
 Makefile.target|5 +
 cpu-all.h  |7 +
 exec.c |   20 +-
 migration-exec.c   |4 +
 migration-fd.c |6 +
 .../linux/umem.h => migration-postcopy-stub.c  |   47 +-
 migration-postcopy.c   | 1267 
 migration.c|4 +
 migration.h|   13 +
 qemu-common.h  |1 +
 qemu-options.hx|5 +-
 savevm.c   |   43 +
 vl.c   |8 +-
 13 files changed, 1409 insertions(+), 21 deletions(-)
 copy linux-headers/linux/umem.h => migration-postcopy-stub.c (55%)
 create mode 100644 migration-postcopy.c

diff --git a/Makefile.target b/Makefile.target
index 1582904..618bd3e 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -4,6 +4,7 @@ GENERATED_HEADERS = config-target.h
 CONFIG_NO_PCI = $(if $(subst n,,$(CONFIG_PCI)),n,y)
 CONFIG_NO_KVM = $(if $(subst n,,$(CONFIG_KVM)),n,y)
 CONFIG_NO_XEN = $(if $(subst n,,$(CONFIG_XEN)),n,y)
+CONFIG_NO_POSTCOPY = $(if $(subst n,,$(CONFIG_POSTCOPY)),n,y)
 
 include ../config-host.mak
 include config-devices.mak
@@ -196,6 +197,10 @@ LIBS+=-lz
 
 obj-i386-$(CONFIG_KVM) += hyperv.o
 
+obj-$(CONFIG_POSTCOPY) += migration-postcopy.o
+obj-$(CONFIG_NO_POSTCOPY) += migration-postcopy-stub.o
+common-obj-$(CONFIG_POSTCOPY) += umem.o
+
 QEMU_CFLAGS += $(VNC_TLS_CFLAGS)
 QEMU_CFLAGS += $(VNC_SASL_CFLAGS)
 QEMU_CFLAGS += $(VNC_JPEG_CFLAGS)
diff --git a/cpu-all.h b/cpu-all.h
index ff7f827..e0956bc 100644
--- a/cpu-all.h
+++ b/cpu-all.h
@@ -486,6 +486,9 @@ extern ram_addr_t ram_size;
 /* RAM is pre-allocated and passed into qemu_ram_alloc_from_ptr */
 #define RAM_PREALLOC_MASK   (1 << 0)
 
+/* RAM is allocated via umem for postcopy incoming mode */
+#define RAM_POSTCOPY_UMEM_MASK  (1 << 1)
+
 typedef struct RAMBlock {
 struct MemoryRegion *mr;
 uint8_t *host;
@@ -497,6 +500,10 @@ typedef struct RAMBlock {
 #if defined(__linux__) && !defined(TARGET_S390X)
 int fd;
 #endif
+
+#ifdef CONFIG_POSTCOPY
+UMem *umem;/* for incoming postcopy mode */
+#endif
 } RAMBlock;
 
 typedef struct RAMList {
diff --git a/exec.c b/exec.c
index 785..e5ff2ed 100644
--- a/exec.c
+++ b/exec.c
@@ -36,6 +36,7 @@
 #include "arch_init.h"
 #include "memory.h"
 #include "exec-memory.h"
+#include "migration.h"
 #if defined(CONFIG_USER_ONLY)
 #include "qemu.h"
 #if defined(__FreeBSD__) || defined(__FreeBSD_kernel__)
@@ -2632,6 +2633,13 @@ ram_addr_t qemu_ram_alloc_from_ptr(ram_addr_t size, void *host,
 new_block->host = host;
 new_block->flags |= RAM_PREALLOC_MASK;
 } else {
+#ifdef CONFIG_POSTCOPY
+if (incoming_postcopy) {
+ram_addr_t page_size = getpagesize();
+size = (size + page_size - 1) & ~(page_size - 1);
+mem_path = NULL;
+}
+#endif
 if (mem_path) {
 #if defined (__linux__) && !defined(TARGET_S390X)
 new_block->host = file_ram_alloc(new_block, size, mem_path);
@@ -2709,7 +2717,13 @@ void qemu_ram_free(ram_addr_t addr)
 QLIST_REMOVE(block, next);
 if (block->flags & RAM_PREALLOC_MASK) {
 ;
-} else if (mem_path) {
+}
+#ifdef CONFIG_POSTCOPY
+else if (block->flags & RAM_POSTCOPY_UMEM_MASK) {
+postcopy_incoming_ram_free(block->umem);
+}
+#endif
+else if (mem_path) {
 #if defined (__linux__) && !defined(TARGET_S390X)
 if (block->fd) {
 munmap(block->host, block->length);
@@ -2755,6 +2769,10 @@ void qemu_ram_remap(ram_addr_t addr, ram_addr_t length)
 } else {
 flags = MAP_FIXED;
 munmap(vaddr, length);
+if (block->flags & RAM_POSTCOPY_UMEM_MASK) {
+postcopy_incoming_qemu_pages_unmapped(addr, length);
+block->flags &= ~RAM_POSTCOPY_UMEM_MASK

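When incoming postcopy is enabled, qemu_ram_alloc_from_ptr() ignores mem_path and rounds the block size up to the host page size, in line with the umem driver, which also rounds the area size up to the page size. The rounding expression relies on the page size being a power of two; as a standalone sketch (round_up_to_page() is a hypothetical name):

```c
#include <stdint.h>

/* Round size up to the next multiple of page_size. page_size must be a
 * power of two, which host page sizes from getpagesize() are. */
static uint64_t round_up_to_page(uint64_t size, uint64_t page_size)
{
    return (size + page_size - 1) & ~(page_size - 1);
}
```

With 4096-byte pages, 4097 bytes round up to 8192, and an exact multiple such as 4096 is unchanged.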
[PATCH v2 10/41] arch_init: simplify a bit by ram_find_block()

2012-06-04 Thread Isaku Yamahata
Signed-off-by: Isaku Yamahata yamah...@valinux.co.jp
---
 arch_init.c |   21 -
 exec.c  |   12 ++--
 2 files changed, 14 insertions(+), 19 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index 9981abe..73bf250 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -432,11 +432,10 @@ void *ram_load_host_from_stream_offset(QEMUFile *f,
 qemu_get_buffer(f, (uint8_t *)id, len);
 id[len] = 0;
 
-QLIST_FOREACH(block, &ram_list.blocks, next) {
-if (!strncmp(id, block->idstr, sizeof(id))) {
-*last_blockp = block;
-return memory_region_get_ram_ptr(block->mr) + offset;
-}
+block = ram_find_block(id, len);
+if (block) {
+*last_blockp = block;
+return memory_region_get_ram_ptr(block->mr) + offset;
 }
 
 fprintf(stderr, "Can't find block %s!\n", id);
@@ -466,19 +465,15 @@ int ram_load_mem_size(QEMUFile *f, ram_addr_t total_ram_bytes)
 id[len] = 0;
 length = qemu_get_be64(f);
 
-QLIST_FOREACH(block, &ram_list.blocks, next) {
-if (!strncmp(id, block->idstr, sizeof(id))) {
-if (block->length != length)
-return -EINVAL;
-break;
-}
-}
-
+block = ram_find_block(id, len);
 if (!block) {
 fprintf(stderr, "Unknown ramblock \"%s\", cannot "
 "accept migration\n", id);
 return -EINVAL;
 }
+if (block->length != length) {
+return -EINVAL;
+}
 
 total_ram_bytes -= length;
 }
diff --git a/exec.c b/exec.c
index a0494c7..078a408 100644
--- a/exec.c
+++ b/exec.c
@@ -33,6 +33,7 @@
 #include "kvm.h"
 #include "hw/xen.h"
 #include "qemu-timer.h"
+#include "arch_init.h"
 #include "memory.h"
 #include "exec-memory.h"
 #if defined(CONFIG_USER_ONLY)
@@ -2609,12 +2610,11 @@ void qemu_ram_set_idstr(ram_addr_t addr, const char *name, DeviceState *dev)
 }
 pstrcat(new_block->idstr, sizeof(new_block->idstr), name);
 
-QLIST_FOREACH(block, &ram_list.blocks, next) {
-if (block != new_block && !strcmp(block->idstr, new_block->idstr)) {
-fprintf(stderr, "RAMBlock \"%s\" already registered, abort!\n",
-new_block->idstr);
-abort();
-}
+block = ram_find_block(new_block->idstr, strlen(new_block->idstr));
+if (block != new_block) {
+fprintf(stderr, "RAMBlock \"%s\" already registered, abort!\n",
+new_block->idstr);
+abort();
 }
 }
 
-- 
1.7.1.1

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

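ram_find_block() itself is not part of this patch. Judging from its call sites, it looks up a RAMBlock by an idstr of a given length and returns NULL when none matches. A sketch of such a lookup over a hypothetical singly linked list (struct sketch_block and sketch_find_block() are illustrative names, not QEMU's RAMList):

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a RAM block list entry. */
struct sketch_block {
    char idstr[256];
    struct sketch_block *next;
};

/* Return the first block whose idstr is exactly the len bytes at id,
 * or NULL if no block matches. */
static struct sketch_block *sketch_find_block(struct sketch_block *head,
                                              const char *id, size_t len)
{
    struct sketch_block *block;

    for (block = head; block != NULL; block = block->next) {
        if (strlen(block->idstr) == len &&
            memcmp(block->idstr, id, len) == 0) {
            return block;
        }
    }
    return NULL;
}
```

Because a lookup like this returns the first match, the duplicate check in qemu_ram_set_idstr() only fires if the previously registered block is found before new_block in the list.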

[PATCH v3 1/2] export necessary symbols

2012-06-04 Thread Isaku Yamahata
Cc: Andrea Arcangeli aarca...@redhat.com
Cc: Avi Kivity a...@redhat.com
Cc: Paolo Bonzini pbonz...@redhat.com
Signed-off-by: Isaku Yamahata yamah...@valinux.co.jp
---
 mm/memcontrol.c |1 +
 mm/mempolicy.c  |1 +
 mm/shmem.c  |1 +
 3 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index ac35bcc..265ba2f 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2761,6 +2761,7 @@ int mem_cgroup_cache_charge(struct page *page, struct mm_struct *mm,
}
return ret;
 }
+EXPORT_SYMBOL_GPL(mem_cgroup_cache_charge);
 
 /*
  * While swap-in, try_charge -> commit or cancel, the page is locked.
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index f15c1b2..ede02e2 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -1907,6 +1907,7 @@ retry_cpuset:
goto retry_cpuset;
return page;
 }
+EXPORT_SYMBOL_GPL(alloc_pages_vma);
 
 /**
  * alloc_pages_current - Allocate pages.
diff --git a/mm/shmem.c b/mm/shmem.c
index 585bd22..f2b8aa7 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -3041,6 +3041,7 @@ int shmem_zero_setup(struct vm_area_struct *vma)
vma->vm_flags |= VM_CAN_NONLINEAR;
return 0;
 }
+EXPORT_SYMBOL_GPL(shmem_zero_setup);
 
 /**
  * shmem_read_mapping_page_gfp - read into page cache, using specified page allocation flags.
-- 
1.7.1.1



[PATCH v3 0/2] postcopy migration: umem: Linux char device for postcopy

2012-06-04 Thread Isaku Yamahata
This is a Linux kernel driver for qemu/kvm postcopy live migration.
It is used by the qemu/kvm postcopy live migration patch series.

TODO:
- Consider FUSE/CUSE option
  So far several mmap patches for FUSE/CUSE are floating around (their
  purpose isn't different from ours, though), but they haven't been
  merged upstream yet.
  The driver-specific part of the qemu patches is modularized, so I expect
  it wouldn't be difficult to switch the kernel driver to a CUSE-based one.

ioctl commands:
UMEM_INIT: initialize umem device for qemu
UMEM_MAKE_VMA_ANONYMOUS: make the specified vma in the qemu process
                         anonymous. This is _NOT_ implemented yet;
                         I'm not sure whether this can be implemented
                         or not.
---
Changes v2 -> v3:
- make fault handler killable
- make use of read()/write()
- documentation

Changes version 1 -> 2:
- make ioctl structures padded to align
- un-KVM
  KVM_VMEM -> UMEM
- dropped some ioctl commands as Avi requested

Isaku Yamahata (2):
  export necessary symbols
  umem: chardevice for kvm postcopy

 Documentation/misc-devices/umem.txt |  303 
 drivers/char/Kconfig|   10 +
 drivers/char/Makefile   |1 +
 drivers/char/umem.c |  900 +++
 include/linux/umem.h|   42 ++
 mm/memcontrol.c |1 +
 mm/mempolicy.c  |1 +
 mm/shmem.c  |1 +
 8 files changed, 1259 insertions(+), 0 deletions(-)
 create mode 100644 Documentation/misc-devices/umem.txt
 create mode 100644 drivers/char/umem.c
 create mode 100644 include/linux/umem.h



[PATCH v3 2/2] umem: chardevice for kvm postcopy

2012-06-04 Thread Isaku Yamahata
This is a character device to hook page access.
A page fault in the area is propagated by this character driver to another
user process, which fills the page contents and then resolves the page
fault.

Cc: Andrea Arcangeli aarca...@redhat.com
Cc: Avi Kivity a...@redhat.com
Cc: Paolo Bonzini pbonz...@redhat.com
Signed-off-by: Isaku Yamahata yamah...@valinux.co.jp

---
Changes v3 -> v4:
- simplified umem_init: kill {a,}sync_req_max
- make fault handler killable even when core-dumping
- documentation

Changes v2 -> v3:
- made fault handler killable
- allow O_LARGEFILE
- improve to handle FAULT_FLAG_ALLOW_RETRY
- smart on async fault
---
 Documentation/misc-devices/umem.txt |  303 
 drivers/char/Kconfig|   10 +
 drivers/char/Makefile   |1 +
 drivers/char/umem.c |  900 +++
 include/linux/umem.h|   42 ++
 5 files changed, 1256 insertions(+), 0 deletions(-)
 create mode 100644 Documentation/misc-devices/umem.txt
 create mode 100644 drivers/char/umem.c
 create mode 100644 include/linux/umem.h

diff --git a/Documentation/misc-devices/umem.txt b/Documentation/misc-devices/umem.txt
new file mode 100644
index 000..61bba5f
--- /dev/null
+++ b/Documentation/misc-devices/umem.txt
@@ -0,0 +1,303 @@
+User process backed memory driver
+=================================
+
+Intro
+=====
+The user process backed memory driver provides the /dev/umem device,
+which is designed for some sort of distributed shared memory,
+especially post-copy live migration with KVM.
+
+A page fault in the area backed by this driver is propagated to another
+server process, which serves the page contents. Usually the server process
+fetches them from the remote machine. Then the faulting process continues.
+
+
+Kernel-User protocol
+====================
+ioctl
+UMEM_INIT: Initialize the umem device with some parameters.
+  IN size: the area size in bytes (which is rounded up to page size)
+  OUT shmem_fd: the file descriptor of the tmpfs file associated with this
+umem device. It serves as the backing store of this umem device.
+
+mmap: Mapping the initialized umem device provides the area which
+  is served by the user process.
+  A fault in this area is propagated to the umem device via the read
+  system call.
+read: the kernel notifies the process that pages have faulted by returning
+  their offsets, in units of pages, as u64 values.
+  The umem device is pollable for read.
+write: the process notifies the kernel that a page is ready to access
+   by writing its offset, in units of pages, as a u64 value.
+
+
+operation flow
+==============
+
+|
+V
+  open(/dev/umem)
+|
+V
+  ioctl(UMEM_INIT)
+|
+V
+  Here we have two file descriptors to
+  umem device and shmem file
+|
+|  daemon process which serves
+|  page fault
+V
+  fork()---,
+|  |
+V  V
+  close(shmem) mmap(shmem file)
+|  |
+V  V
+  mmap(umem device)   close(shmem file)
+|  |
+V  |
+  close(umem device)   |
+|  |
+  now the setup is done|
+  work on the umem area|
+|  |
+V  V
+  access umem area (poll and) read(umem)
+|  |
+V  V
+  page fault -- read system call returns
+  block  page offsets
+   |
+   V
+create page contents
+(usually pull the page
+ from remote)
+write the page contents
+to the shmem which was
+mmapped above

[PATCH v2 28/41] buffered_file: add qemu_file to read/write to buffer in memory

2012-06-04 Thread Isaku Yamahata
This is used by postcopy live migration.

Signed-off-by: Isaku Yamahata yamah...@valinux.co.jp
---
 buffered_file.c |   50 ++
 buffered_file.h |   10 ++
 2 files changed, 60 insertions(+), 0 deletions(-)

diff --git a/buffered_file.c b/buffered_file.c
index 5198923..4f0c98e 100644
--- a/buffered_file.c
+++ b/buffered_file.c
@@ -106,6 +106,56 @@ static void buffer_flush(QEMUBuffer *buf, QEMUFile *file,
 
 
 /***
+ * read/write to buffer on memory
+ */
+
+static int buf_close(void *opaque)
+{
+QEMUFileBuf *s = opaque;
+buffer_destroy(&s->buf);
+g_free(s);
+return 0;
+}
+
+static int buf_put_buffer(void *opaque,
+  const uint8_t *buf, int64_t pos, int size)
+{
+QEMUFileBuf *s = opaque;
+buffer_append(&s->buf, buf, size);
+return size;
+}
+
+QEMUFileBuf *qemu_fopen_buf_write(void)
+{
+QEMUFileBuf *s = g_malloc0(sizeof(*s));
+
+s->file = qemu_fopen_ops(s, buf_put_buffer, NULL, buf_close,
+ NULL, NULL, NULL);
+return s;
+}
+
+static int buf_get_buffer(void *opaque, uint8_t *buf, int64_t pos, int size)
+{
+QEMUFileBuf *s = opaque;
+ssize_t len = MIN(size, s->buf.buffer_capacity - s->buf.buffer_size);
+memcpy(buf, s->buf.buffer + s->buf.buffer_size, len);
+s->buf.buffer_size += len;
+return len;
+}
+
+/* This takes ownership of buf. */
+QEMUFile *qemu_fopen_buf_read(uint8_t *buf, size_t size)
+{
+QEMUFileBuf *s = g_malloc0(sizeof(*s));
+s->buf.buffer = buf;
+s->buf.buffer_size = 0; /* this is used as index to read */
+s->buf.buffer_capacity = size;
+s->file = qemu_fopen_ops(s, NULL, buf_get_buffer, buf_close,
+ NULL, NULL, NULL);
+return s->file;
+}
+
+/***
  * Nonblocking write only file
  */
 static ssize_t nonblock_flush_buffer_putbuf(void *opaque,
diff --git a/buffered_file.h b/buffered_file.h
index 2712e01..9e28bef 100644
--- a/buffered_file.h
+++ b/buffered_file.h
@@ -24,6 +24,16 @@ struct QEMUBuffer {
 };
 typedef struct QEMUBuffer QEMUBuffer;
 
+struct QEMUFileBuf {
+QEMUFile *file;
+QEMUBuffer buf;
+};
+typedef struct QEMUFileBuf QEMUFileBuf;
+
+QEMUFileBuf *qemu_fopen_buf_write(void);
+/* This takes ownership of buf. */
+QEMUFile *qemu_fopen_buf_read(uint8_t *buf, size_t size);
+
 struct QEMUFileNonblock {
 int fd;
 QEMUFile *file;
-- 
1.7.1.1


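qemu_fopen_buf_read() reuses QEMUBuffer's buffer_size field as a read cursor and buffer_capacity as the total length, so each read copies MIN(requested, remaining) bytes and advances the cursor. The same logic as a self-contained sketch; struct mem_reader and mem_reader_get() are hypothetical names, not QEMU types:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical read-only memory stream over data[0..len) with a cursor
 * (the role buffer_size plays in qemu_fopen_buf_read()). */
struct mem_reader {
    const uint8_t *data;
    size_t len;     /* total bytes available (buffer_capacity) */
    size_t pos;     /* bytes already consumed (buffer_size) */
};

/* Copy up to size bytes into buf and advance the cursor. Returns the
 * number of bytes copied, 0 once the stream is exhausted. */
static size_t mem_reader_get(struct mem_reader *r, uint8_t *buf, size_t size)
{
    size_t remaining = r->len - r->pos;
    size_t n = size < remaining ? size : remaining;

    memcpy(buf, r->data + r->pos, n);
    r->pos += n;
    return n;
}
```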

[PATCH v2 29/41] umem.h: import Linux umem.h

2012-06-04 Thread Isaku Yamahata
Signed-off-by: Isaku Yamahata yamah...@valinux.co.jp
---
 linux-headers/linux/umem.h |   42 ++
 1 files changed, 42 insertions(+), 0 deletions(-)
 create mode 100644 linux-headers/linux/umem.h

diff --git a/linux-headers/linux/umem.h b/linux-headers/linux/umem.h
new file mode 100644
index 000..0cf7399
--- /dev/null
+++ b/linux-headers/linux/umem.h
@@ -0,0 +1,42 @@
+/*
+ * User process backed memory.
+ * This is mainly for KVM post copy.
+ *
+ * Copyright (c) 2011,
+ * National Institute of Advanced Industrial Science and Technology
+ *
+ * https://sites.google.com/site/grivonhome/quick-kvm-migration
+ * Author: Isaku Yamahata yamahata at valinux co jp
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef __LINUX_UMEM_H
+#define __LINUX_UMEM_H
+
+#include <linux/types.h>
+#include <linux/ioctl.h>
+
+struct umem_init {
+   __u64 size; /* in bytes */
+   __s32 shmem_fd;
+   __s32 padding;
+};
+
+#define UMEMIO 0x1E
+
+/* ioctl for umem fd */
+#define UMEM_INIT  _IOWR(UMEMIO, 0x0, struct umem_init)
+#define UMEM_MAKE_VMA_ANONYMOUS _IO(UMEMIO, 0x1)
+
+#endif /* __LINUX_UMEM_H */
-- 
1.7.1.1
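UMEM_INIT is built with the standard Linux _IOWR() macro and UMEM_MAKE_VMA_ANONYMOUS with _IO(), so the type (UMEMIO = 0x1E), command number, direction and argument size packed into each request can be read back with the _IOC_* helpers from <linux/ioctl.h>. A small sketch, assuming a Linux host with kernel headers installed; it only encodes and decodes the request numbers and does not open the out-of-tree umem device, whose path is not given in this patch. dump_ioctl() is a hypothetical helper.

```c
#include <linux/ioctl.h>
#include <linux/types.h>
#include <stdio.h>

/* Local copy of the definitions from linux-headers/linux/umem.h above. */
struct umem_init {
    __u64 size;      /* in bytes */
    __s32 shmem_fd;
    __s32 padding;
};

#define UMEMIO 0x1E
#define UMEM_INIT               _IOWR(UMEMIO, 0x0, struct umem_init)
#define UMEM_MAKE_VMA_ANONYMOUS _IO(UMEMIO, 0x1)

/* Print the fields packed into an ioctl request number. */
static void dump_ioctl(const char *name, unsigned long req)
{
    printf("%s: dir=%lu type=0x%lx nr=%lu size=%lu\n", name,
           (unsigned long)_IOC_DIR(req), (unsigned long)_IOC_TYPE(req),
           (unsigned long)_IOC_NR(req), (unsigned long)_IOC_SIZE(req));
}
```

Because the struct size is part of the request number, changing struct umem_init changes UMEM_INIT, which keeps an old binary from passing a mismatched layout to a newer driver.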



[PATCH v2 31/41] configure: add CONFIG_POSTCOPY option

2012-06-04 Thread Isaku Yamahata
Add configure options to enable/disable postcopy mode. No dynamic test yet.

Signed-off-by: Isaku Yamahata <yamah...@valinux.co.jp>
---
 configure |   12 ++++++++++++
 1 files changed, 12 insertions(+), 0 deletions(-)

diff --git a/configure b/configure
index 1f338f8..21de4cb 100755
--- a/configure
+++ b/configure
@@ -194,6 +194,7 @@ zlib="yes"
 guest_agent="yes"
 libiscsi=""
 coroutine=""
+postcopy="yes"
 
 # parse CC options first
 for opt do
@@ -824,6 +825,10 @@ for opt do
   ;;
   --disable-guest-agent) guest_agent="no"
   ;;
+  --enable-postcopy) postcopy="yes"
+  ;;
+  --disable-postcopy) postcopy="no"
+  ;;
   *) echo "ERROR: unknown option $opt"; show_help="yes"
   ;;
   esac
@@ -1110,6 +1115,8 @@ echo "  --disable-guest-agent    disable building of the QEMU Guest Agent"
 echo "  --enable-guest-agent     enable building of the QEMU Guest Agent"
 echo "  --with-coroutine=BACKEND coroutine backend. Supported options:"
 echo "                           gthread, ucontext, sigaltstack, windows"
+echo "  --disable-postcopy       disable postcopy mode for live migration"
+echo "  --enable-postcopy        enable postcopy mode for live migration"
 echo ""
 echo "NOTE: The object files are built at the place where configure is launched"
 exit 1
@@ -3029,6 +3036,7 @@ echo "OpenGL support    $opengl"
 echo "libiscsi support  $libiscsi"
 echo "build guest agent $guest_agent"
 echo "coroutine backend $coroutine_backend"
+echo "postcopy support  $postcopy"
 
 if test "$sdl_too_old" = "yes"; then
 echo "-> Your SDL version is too old - please upgrade to have SDL support"
@@ -3329,6 +3337,10 @@ if test "$libiscsi" = "yes" ; then
   echo "CONFIG_LIBISCSI=y" >> $config_host_mak
 fi
 
+if test "$postcopy" = "yes" ; then
+  echo "CONFIG_POSTCOPY=y" >> $config_host_mak
+fi
+
 # XXX: suppress that
 if [ "$bsd" = "yes" ] ; then
   echo "CONFIG_BSD=y" >> $config_host_mak
-- 
1.7.1.1



[PATCH v2 25/41] migration: factor out parameters into MigrationParams

2012-06-04 Thread Isaku Yamahata
Introduce MigrationParams to bundle the migration parameters (currently
blk and shared) passed to migrate_init(), qemu_savevm_state_begin() and
the set_params callbacks.

Cc: Orit Wasserman <owass...@redhat.com>
Cc: Juan Quintela <quint...@redhat.com>
Signed-off-by: Isaku Yamahata <yamah...@valinux.co.jp>
---
Changes v1 -> v2:
- catch up with qapi changes
---
 block-migration.c |    8 ++++----
 migration.c       |   21 +++--
 migration.h       |    8 ++--
 qemu-common.h     |    1 +
 savevm.c          |   10 +++---
 sysemu.h          |    2 +-
 vmstate.h         |    2 +-
 7 files changed, 35 insertions(+), 17 deletions(-)

diff --git a/block-migration.c b/block-migration.c
index fd2..b95b4e1 100644
--- a/block-migration.c
+++ b/block-migration.c
@@ -700,13 +700,13 @@ static int block_load(QEMUFile *f, void *opaque, int version_id)
     return 0;
 }
 
-static void block_set_params(int blk_enable, int shared_base, void *opaque)
+static void block_set_params(const MigrationParams *params, void *opaque)
 {
-    block_mig_state.blk_enable = blk_enable;
-    block_mig_state.shared_base = shared_base;
+    block_mig_state.blk_enable = params->blk;
+    block_mig_state.shared_base = params->shared;
 
     /* shared base means that blk_enable = 1 */
-    block_mig_state.blk_enable |= shared_base;
+    block_mig_state.blk_enable |= params->shared;
 }
 
 void blk_mig_init(void)
diff --git a/migration.c b/migration.c
index 48a8f68..3b97aec 100644
--- a/migration.c
+++ b/migration.c
@@ -352,7 +352,7 @@ void migrate_fd_connect(MigrationState *s)
                                       migrate_fd_close);
 
     DPRINTF("beginning savevm\n");
-    ret = qemu_savevm_state_begin(s->file, s->blk, s->shared);
+    ret = qemu_savevm_state_begin(s->file, &s->params);
     if (ret < 0) {
         DPRINTF("failed, %d\n", ret);
         migrate_fd_error(s);
@@ -361,15 +361,13 @@ void migrate_fd_connect(MigrationState *s)
     migrate_fd_put_ready(s);
 }
 
-static MigrationState *migrate_init(int blk, int inc)
+static MigrationState *migrate_init(const MigrationParams *params)
 {
     MigrationState *s = migrate_get_current();
     int64_t bandwidth_limit = s->bandwidth_limit;
 
     memset(s, 0, sizeof(*s));
-    s->blk = blk;
-    s->shared = inc;
-
+    s->params = *params;
     s->bandwidth_limit = bandwidth_limit;
     s->state = MIG_STATE_SETUP;
 
@@ -393,9 +391,20 @@ void qmp_migrate(const char *uri, bool has_blk, bool blk,
                  Error **errp)
 {
     MigrationState *s = migrate_get_current();
+    MigrationParams params = {
+        .blk = false,
+        .shared = false,
+    };
     const char *p;
     int ret;
 
+    if (has_blk) {
+        params.blk = blk;
+    }
+    if (has_inc) {
+        params.shared = inc;
+    }
+
     if (s->state == MIG_STATE_ACTIVE) {
         error_set(errp, QERR_MIGRATION_ACTIVE);
         return;
@@ -410,7 +419,7 @@ void qmp_migrate(const char *uri, bool has_blk, bool blk,
         return;
     }
 
-    s = migrate_init(blk, inc);
+    s = migrate_init(&params);
 
     if (strstart(uri, "tcp:", &p)) {
         ret = tcp_start_outgoing_migration(s, p, errp);
diff --git a/migration.h b/migration.h
index d0dd536..59e6e68 100644
--- a/migration.h
+++ b/migration.h
@@ -19,6 +19,11 @@
 #include "notify.h"
 #include "error.h"
 
+struct MigrationParams {
+    int blk;
+    int shared;
+};
+
 typedef struct MigrationState MigrationState;
 
 struct MigrationState
@@ -31,8 +36,7 @@ struct MigrationState
     int (*close)(MigrationState *s);
     int (*write)(MigrationState *s, const void *buff, size_t size);
     void *opaque;
-    int blk;
-    int shared;
+    MigrationParams params;
 };
 
 void process_incoming_migration(QEMUFile *f);
diff --git a/qemu-common.h b/qemu-common.h
index 91e0562..057c810 100644
--- a/qemu-common.h
+++ b/qemu-common.h
@@ -263,6 +263,7 @@ typedef struct EventNotifier EventNotifier;
 typedef struct VirtIODevice VirtIODevice;
 typedef struct QEMUSGList QEMUSGList;
 typedef struct SHPCDevice SHPCDevice;
+typedef struct MigrationParams MigrationParams;
 
 typedef uint64_t pcibus_t;
 
diff --git a/savevm.c b/savevm.c
index 5640614..318ec61 100644
--- a/savevm.c
+++ b/savevm.c
@@ -1611,7 +1611,7 @@ bool qemu_savevm_state_blocked(Error **errp)
 return false;
 }
 
-int qemu_savevm_state_begin(QEMUFile *f, int blk_enable, int shared)
+int qemu_savevm_state_begin(QEMUFile *f, const MigrationParams *params)
 {
 SaveStateEntry *se;
 int ret;
@@ -1620,7 +1620,7 @@ int qemu_savevm_state_begin(QEMUFile *f, int blk_enable, 
int shared)
 if(se-set_params == NULL) {
 continue;
}
-   se-set_params(blk_enable, shared, se-opaque);
+   se-set_params(params, se-opaque);
 }
 
 qemu_put_be32(f, QEMU_VM_FILE_MAGIC);
@@ -1758,13 +1758,17 @@ void qemu_savevm_state_cancel(QEMUFile *f)
 static int qemu_savevm_state(QEMUFile *f)
 {
 int ret;
+MigrationParams params = {
+.blk = 0,
+.shared = 0,
+};
 
 if (qemu_savevm_state_blocked(NULL)) {
 ret = -EINVAL;
 goto out;
 }
 
-ret = qemu_savevm_state_begin

[PATCH v2 23/41] migration.c: remove redundant line in migrate_init()

2012-06-04 Thread Isaku Yamahata
Signed-off-by: Isaku Yamahata <yamah...@valinux.co.jp>
---
 migration.c |    1 -
 1 files changed, 0 insertions(+), 1 deletions(-)

diff --git a/migration.c b/migration.c
index 3f485d3..753addb 100644
--- a/migration.c
+++ b/migration.c
@@ -367,7 +367,6 @@ static MigrationState *migrate_init(int blk, int inc)
     int64_t bandwidth_limit = s->bandwidth_limit;
 
     memset(s, 0, sizeof(*s));
-    s->bandwidth_limit = bandwidth_limit;
     s->blk = blk;
     s->shared = inc;
 
-- 
1.7.1.1
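Taken together, patches 23 and 25 leave migrate_init() with one pattern: wipe the MigrationState with memset(), copy in a MigrationParams built from the optional QMP arguments, and restore the one field that must survive (bandwidth_limit) exactly once. A standalone sketch of that pattern, using simplified hypothetical types (mig_params, mig_state) and a hypothetical MIG_SETUP value rather than QEMU's real structures; the has_blk/blk pairs mirror how qmp_migrate() receives optional arguments.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define MIG_SETUP 1  /* hypothetical state value */

/* Simplified, hypothetical stand-ins for MigrationParams / MigrationState. */
typedef struct {
    bool blk;     /* migrate block devices too */
    bool shared;  /* incremental copy over a shared base image */
} mig_params;

typedef struct {
    int64_t bandwidth_limit;  /* must survive a reset */
    int state;
    mig_params params;
} mig_state;

/* Build params from optional arguments; absent ones keep their defaults. */
static mig_params params_from_args(bool has_blk, bool blk,
                                   bool has_inc, bool inc)
{
    mig_params p = { .blk = false, .shared = false };

    if (has_blk) {
        p.blk = blk;
    }
    if (has_inc) {
        p.shared = inc;
    }
    return p;
}

/* Reset everything except bandwidth_limit; one restore after memset suffices. */
static void mig_state_init(mig_state *s, const mig_params *params)
{
    int64_t bandwidth_limit = s->bandwidth_limit;

    memset(s, 0, sizeof(*s));
    s->params = *params;
    s->bandwidth_limit = bandwidth_limit;
    s->state = MIG_SETUP;
}

/* As in block_set_params(): a shared base implies block migration. */
static bool block_migration_enabled(const mig_params *p)
{
    return p->blk || p->shared;
}
```

Restoring bandwidth_limit before the memset() would be overwritten by it, which is why only the assignment after memset() matters.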



[PATCH v2 27/41] buffered_file: Introduce QEMUFileNonblock for nonblock write

2012-06-04 Thread Isaku Yamahata
Signed-off-by: Isaku Yamahata <yamah...@valinux.co.jp>
---
 buffered_file.c |  115 +++
 buffered_file.h |   13 ++
 2 files changed, 128 insertions(+), 0 deletions(-)

diff --git a/buffered_file.c b/buffered_file.c
index 22dd4c9..5198923 100644
--- a/buffered_file.c
+++ b/buffered_file.c
@@ -106,6 +106,121 @@ static void buffer_flush(QEMUBuffer *buf, QEMUFile *file,
 
 
 /***
+ * Nonblocking write only file
+ */
+static ssize_t nonblock_flush_buffer_putbuf(void *opaque,
+                                            const void *data, size_t size)
+{
+    QEMUFileNonblock *s = opaque;
+    ssize_t ret = write(s->fd, data, size);
+    if (ret == -1) {
+        return -errno;
+    }
+    return ret;
+}
+
+static void nonblock_flush_buffer(QEMUFileNonblock *s)
+{
+    buffer_flush(&s->buf, s->file, s, nonblock_flush_buffer_putbuf);
+
+    if (s->buf.buffer_size > 0) {
+        s->buf.freeze_output = true;
+    }
+}
+
+static int nonblock_put_buffer(void *opaque,
+                               const uint8_t *buf, int64_t pos, int size)
+{
+    QEMUFileNonblock *s = opaque;
+    int error;
+    ssize_t len = 0;
+
+    error = qemu_file_get_error(s->file);
+    if (error) {
+        return error;
+    }
+
+    nonblock_flush_buffer(s);
+    error = qemu_file_get_error(s->file);
+    if (error) {
+        return error;
+    }
+
+    while (!s->buf.freeze_output && size > 0) {
+        ssize_t ret;
+        assert(s->buf.buffer_size == 0);
+
+        ret = write(s->fd, buf, size);
+        if (ret == -1) {
+            if (errno == EINTR) {
+                continue;
+            } else if (errno == EAGAIN) {
+                s->buf.freeze_output = true;
+            } else {
+                /* negative errno, as buffer_flush() reports errors */
+                qemu_file_set_error(s->file, -errno);
+            }
+            break;
+        }
+
+        len += ret;
+        buf += ret;
+        size -= ret;
+    }
+
+    if (size > 0) {
+        buffer_append(&s->buf, buf, size);
+        len += size;
+    }
+    return len;
+}
+
+int nonblock_pending_size(QEMUFileNonblock *s)
+{
+    return qemu_pending_size(s->file) + s->buf.buffer_size;
+}
+
+void nonblock_fflush(QEMUFileNonblock *s)
+{
+    s->buf.freeze_output = false;
+    nonblock_flush_buffer(s);
+    if (!s->buf.freeze_output) {
+        qemu_fflush(s->file);
+    }
+}
+
+void nonblock_wait_for_flush(QEMUFileNonblock *s)
+{
+    while (nonblock_pending_size(s) > 0) {
+        fd_set fds;
+        FD_ZERO(&fds);
+        FD_SET(s->fd, &fds);
+        select(s->fd + 1, NULL, &fds, NULL, NULL);
+
+        nonblock_fflush(s);
+    }
+}
+
+static int nonblock_close(void *opaque)
+{
+    QEMUFileNonblock *s = opaque;
+    nonblock_wait_for_flush(s);
+    buffer_destroy(&s->buf);
+    g_free(s);
+    return 0;
+}
+
+QEMUFileNonblock *qemu_fopen_nonblock(int fd)
+{
+    QEMUFileNonblock *s = g_malloc0(sizeof(*s));
+
+    s->fd = fd;
+    fcntl_setfl(fd, O_NONBLOCK);
+    s->file = qemu_fopen_ops(s, nonblock_put_buffer, NULL, nonblock_close,
+                             NULL, NULL, NULL);
+    return s;
+}
+
+/***
  * Buffered File
  */
 
diff --git a/buffered_file.h b/buffered_file.h
index d3ef546..2712e01 100644
--- a/buffered_file.h
+++ b/buffered_file.h
@@ -24,6 +24,19 @@ struct QEMUBuffer {
 };
 typedef struct QEMUBuffer QEMUBuffer;
 
+struct QEMUFileNonblock {
+    int fd;
+    QEMUFile *file;
+
+    QEMUBuffer buf;
+};
+typedef struct QEMUFileNonblock QEMUFileNonblock;
+
+QEMUFileNonblock *qemu_fopen_nonblock(int fd);
+int nonblock_pending_size(QEMUFileNonblock *s);
+void nonblock_fflush(QEMUFileNonblock *s);
+void nonblock_wait_for_flush(QEMUFileNonblock *s);
+
 typedef ssize_t (BufferedPutFunc)(void *opaque, const void *data, size_t size);
 typedef void (BufferedPutReadyFunc)(void *opaque);
 typedef void (BufferedWaitForUnfreezeFunc)(void *opaque);
-- 
1.7.1.1
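nonblock_put_buffer() writes straight to the O_NONBLOCK descriptor until write() fails with EAGAIN, then parks the rest in its QEMUBuffer and freezes output; nonblock_fflush() later pushes the parked bytes out before anything new is written. A standalone POSIX sketch of that scheme, assuming a pipe as the descriptor and a fixed-size pending area instead of a growable QEMUBuffer; the nb_writer type and nb_init()/nb_flush()/nb_write() are hypothetical names.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <string.h>
#include <unistd.h>

#define NB_PENDING_MAX (512 * 1024)  /* hypothetical cap on parked bytes */

typedef struct {
    int fd;
    bool frozen;                        /* like freeze_output */
    size_t pending;                     /* bytes parked in buf */
    unsigned char buf[NB_PENDING_MAX];
} nb_writer;

static int nb_init(nb_writer *w, int fd)
{
    int flags = fcntl(fd, F_GETFL);

    if (flags == -1 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1) {
        return -1;
    }
    w->fd = fd;
    w->frozen = false;
    w->pending = 0;
    return 0;
}

/* Push parked bytes out; stay frozen while any remain. */
static int nb_flush(nb_writer *w)
{
    size_t off = 0;

    while (off < w->pending) {
        ssize_t ret = write(w->fd, w->buf + off, w->pending - off);
        if (ret == -1) {
            if (errno == EINTR) {
                continue;
            }
            if (errno == EAGAIN) {
                break;
            }
            return -1;
        }
        off += (size_t)ret;
    }
    memmove(w->buf, w->buf + off, w->pending - off);
    w->pending -= off;
    w->frozen = w->pending > 0;
    return 0;
}

/* Write what the fd accepts now and park the remainder. */
static int nb_write(nb_writer *w, const void *data, size_t size)
{
    const unsigned char *p = data;

    /* Parked bytes must go first so that stream order is preserved. */
    if (w->pending > 0 && nb_flush(w) < 0) {
        return -1;
    }
    while (!w->frozen && size > 0) {
        ssize_t ret = write(w->fd, p, size);
        if (ret == -1) {
            if (errno == EINTR) {
                continue;
            }
            if (errno == EAGAIN) {
                w->frozen = true;       /* fd is full: park the rest */
                break;
            }
            return -1;
        }
        p += ret;
        size -= (size_t)ret;
    }
    if (size > NB_PENDING_MAX - w->pending) {
        return -1;                      /* sketch only: no dynamic growth */
    }
    memcpy(w->buf + w->pending, p, size);
    w->pending += size;
    return 0;
}
```

Flushing parked bytes before any direct write is what keeps the byte stream in order; the patch enforces the same invariant with assert(s->buf.buffer_size == 0) inside its write loop.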



[PATCH v2 19/41] savevm/QEMUFile: drop qemu_stdio_fd

2012-06-04 Thread Isaku Yamahata
Now qemu_file_fd() replaces qemu_stdio_fd().

Signed-off-by: Isaku Yamahata <yamah...@valinux.co.jp>
---
 migration-exec.c |    4 ++--
 migration-fd.c   |    2 +-
 qemu-file.h      |    1 -
 savevm.c         |   12 ------------
 4 files changed, 3 insertions(+), 16 deletions(-)

diff --git a/migration-exec.c b/migration-exec.c
index 6c97db9..95e9779 100644
--- a/migration-exec.c
+++ b/migration-exec.c
@@ -98,7 +98,7 @@ static void exec_accept_incoming_migration(void *opaque)
     QEMUFile *f = opaque;
 
     process_incoming_migration(f);
-    qemu_set_fd_handler2(qemu_stdio_fd(f), NULL, NULL, NULL, NULL);
+    qemu_set_fd_handler2(qemu_file_fd(f), NULL, NULL, NULL, NULL);
     qemu_fclose(f);
 }
 
@@ -113,7 +113,7 @@ int exec_start_incoming_migration(const char *command)
         return -errno;
     }
 
-    qemu_set_fd_handler2(qemu_stdio_fd(f), NULL,
+    qemu_set_fd_handler2(qemu_file_fd(f), NULL,
                          exec_accept_incoming_migration, NULL, f);
 
     return 0;
diff --git a/migration-fd.c b/migration-fd.c
index 50138ed..d9c13fe 100644
--- a/migration-fd.c
+++ b/migration-fd.c
@@ -104,7 +104,7 @@ static void fd_accept_incoming_migration(void *opaque)
     QEMUFile *f = opaque;
 
     process_incoming_migration(f);
-    qemu_set_fd_handler2(qemu_stdio_fd(f), NULL, NULL, NULL, NULL);
+    qemu_set_fd_handler2(qemu_file_fd(f), NULL, NULL, NULL, NULL);
     qemu_fclose(f);
 }
 
diff --git a/qemu-file.h b/qemu-file.h
index 98a8023..1a12e7d 100644
--- a/qemu-file.h
+++ b/qemu-file.h
@@ -70,7 +70,6 @@ QEMUFile *qemu_fdopen(int fd, const char *mode);
 QEMUFile *qemu_fopen_socket(int fd);
 QEMUFile *qemu_popen(FILE *popen_file, const char *mode);
 QEMUFile *qemu_popen_cmd(const char *command, const char *mode);
-int qemu_stdio_fd(QEMUFile *f);
 int qemu_file_fd(QEMUFile *f);
 void qemu_fflush(QEMUFile *f);
 void qemu_buffered_file_drain(QEMUFile *f);
diff --git a/savevm.c b/savevm.c
index cba1a69..ec9f5d0 100644
--- a/savevm.c
+++ b/savevm.c
@@ -293,18 +293,6 @@ QEMUFile *qemu_popen_cmd(const char *command, const char *mode)
     return qemu_popen(popen_file, mode);
 }
 
-/* TODO: replace this with qemu_file_fd() */
-int qemu_stdio_fd(QEMUFile *f)
-{
-    QEMUFileStdio *p;
-    int fd;
-
-    p = (QEMUFileStdio *)f->opaque;
-    fd = fileno(p->stdio_file);
-
-    return fd;
-}
-
 QEMUFile *qemu_fdopen(int fd, const char *mode)
 {
     QEMUFileStdio *s;
-- 
1.7.1.1



[PATCH v2 30/41] update-linux-headers.sh: teach umem.h to update-linux-headers.sh

2012-06-04 Thread Isaku Yamahata
Signed-off-by: Isaku Yamahata <yamah...@valinux.co.jp>
---
 scripts/update-linux-headers.sh |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/scripts/update-linux-headers.sh b/scripts/update-linux-headers.sh
index 9d2a4bc..2afdd54 100755
--- a/scripts/update-linux-headers.sh
+++ b/scripts/update-linux-headers.sh
@@ -43,7 +43,7 @@ done
 
 rm -rf "$output/linux-headers/linux"
 mkdir -p "$output/linux-headers/linux"
-for header in kvm.h kvm_para.h vhost.h virtio_config.h virtio_ring.h; do
+for header in kvm.h kvm_para.h vhost.h virtio_config.h virtio_ring.h umem.h; do
     cp "$tmpdir/include/linux/$header" "$output/linux-headers/linux"
 done
 if [ -L "$linux/source" ]; then
-- 
1.7.1.1



[PATCH v2 26/41] buffered_file: factor out buffer management logic

2012-06-04 Thread Isaku Yamahata
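This patch factors the buffer management logic out of QEMUFileBuffered
into QEMUBuffer helpers (buffer_append(), buffer_consume(),
buffer_flush() and buffer_destroy()).

Signed-off-by: Isaku Yamahata <yamah...@valinux.co.jp>
---
 buffered_file.c |  141 +-
 buffered_file.h |    8 +++
 2 files changed, 94 insertions(+), 55 deletions(-)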

diff --git a/buffered_file.c b/buffered_file.c
index a38caec..22dd4c9 100644
--- a/buffered_file.c
+++ b/buffered_file.c
@@ -20,24 +20,6 @@
 #include "buffered_file.h"
 
 //#define DEBUG_BUFFERED_FILE
-
-typedef struct QEMUFileBuffered
-{
-    BufferedPutFunc *put_buffer;
-    BufferedPutReadyFunc *put_ready;
-    BufferedWaitForUnfreezeFunc *wait_for_unfreeze;
-    BufferedCloseFunc *close;
-    void *opaque;
-    QEMUFile *file;
-    int freeze_output;
-    size_t bytes_xfer;
-    size_t xfer_limit;
-    uint8_t *buffer;
-    size_t buffer_size;
-    size_t buffer_capacity;
-    QEMUTimer *timer;
-} QEMUFileBuffered;
-
 #ifdef DEBUG_BUFFERED_FILE
 #define DPRINTF(fmt, ...) \
     do { printf("buffered-file: " fmt, ## __VA_ARGS__); } while (0)
@@ -46,57 +28,71 @@ typedef struct QEMUFileBuffered
     do { } while (0)
 #endif
 
-static void buffered_append(QEMUFileBuffered *s,
-                            const uint8_t *buf, size_t size)
-{
-    if (size > (s->buffer_capacity - s->buffer_size)) {
-        void *tmp;
-
-        DPRINTF("increasing buffer capacity from %zu by %zu\n",
-                s->buffer_capacity, size + 1024);
 
-        s->buffer_capacity += size + 1024;
+/***
+ * buffer management
+ */
 
-        tmp = g_realloc(s->buffer, s->buffer_capacity);
-        if (tmp == NULL) {
-            fprintf(stderr, "qemu file buffer expansion failed\n");
-            exit(1);
-        }
+static void buffer_destroy(QEMUBuffer *s)
+{
+    g_free(s->buffer);
+}
 
-        s->buffer = tmp;
+static void buffer_consume(QEMUBuffer *s, size_t offset)
+{
+    if (offset > 0) {
+        assert(s->buffer_size >= offset);
+        memmove(s->buffer, s->buffer + offset, s->buffer_size - offset);
+        s->buffer_size -= offset;
     }
+}
 
+static void buffer_append(QEMUBuffer *s, const uint8_t *buf, size_t size)
+{
+#define BUF_SIZE_INC    (32 * 1024)     /* = IO_BUF_SIZE */
+    int inc = size - (s->buffer_capacity - s->buffer_size);
+    if (inc > 0) {
+        s->buffer_capacity += DIV_ROUND_UP(inc, BUF_SIZE_INC) * BUF_SIZE_INC;
+        s->buffer = g_realloc(s->buffer, s->buffer_capacity);
+    }
     memcpy(s->buffer + s->buffer_size, buf, size);
     s->buffer_size += size;
 }
 
-static void buffered_flush(QEMUFileBuffered *s)
+typedef ssize_t (BufferPutBuf)(void *opaque, const void *data, size_t size);
+
+static void buffer_flush(QEMUBuffer *buf, QEMUFile *file,
+                         void *opaque, BufferPutBuf *put_buf)
 {
     size_t offset = 0;
     int error;
 
-    error = qemu_file_get_error(s->file);
+    error = qemu_file_get_error(file);
     if (error != 0) {
         DPRINTF("flush when error, bailing: %s\n", strerror(-error));
         return;
     }
 
-    DPRINTF("flushing %zu byte(s) of data\n", s->buffer_size);
+    DPRINTF("flushing %zu byte(s) of data\n", buf->buffer_size);
 
-    while (offset < s->buffer_size) {
+    while (offset < buf->buffer_size) {
         ssize_t ret;
 
-        ret = s->put_buffer(s->opaque, s->buffer + offset,
-                            s->buffer_size - offset);
-        if (ret == -EAGAIN) {
+        ret = put_buf(opaque, buf->buffer + offset, buf->buffer_size - offset);
+        if (ret == -EINTR) {
+            continue;
+        } else if (ret == -EAGAIN) {
             DPRINTF("backend not ready, freezing\n");
-            s->freeze_output = 1;
+            buf->freeze_output = true;
             break;
         }
 
-        if (ret <= 0) {
+        if (ret < 0) {
             DPRINTF("error flushing data, %zd\n", ret);
-            qemu_file_set_error(s->file, ret);
+            qemu_file_set_error(file, ret);
+            break;
+        } else if (ret == 0) {
+            DPRINTF("ret == 0\n");
             break;
         } else {
             DPRINTF("flushed %zd byte(s)\n", ret);
@@ -104,9 +100,44 @@ static void buffered_flush(QEMUFileBuffered *s)
         }
     }
 
-    DPRINTF("flushed %zu of %zu byte(s)\n", offset, s->buffer_size);
-    memmove(s->buffer, s->buffer + offset, s->buffer_size - offset);
-    s->buffer_size -= offset;
+    DPRINTF("flushed %zu of %zu byte(s)\n", offset, buf->buffer_size);
+    buffer_consume(buf, offset);
+}
+
+
+/***
+ * Buffered File
+ */
+
+typedef struct QEMUFileBuffered
+{
+    BufferedPutFunc *put_buffer;
+    BufferedPutReadyFunc *put_ready;
+    BufferedWaitForUnfreezeFunc *wait_for_unfreeze;
+    BufferedCloseFunc *close;
+    void *opaque;
+    QEMUFile *file;
+    size_t bytes_xfer;
+    size_t xfer_limit;
+    QEMUTimer *timer;
+    QEMUBuffer buf;
+} QEMUFileBuffered;
+
+static ssize_t buffered_flush_putbuf(void *opaque,
+                                     const

[PATCH v2 14/41] exec.c: export last_ram_offset()

2012-06-04 Thread Isaku Yamahata
Signed-off-by: Isaku Yamahata <yamah...@valinux.co.jp>
---
 exec-obsolete.h |    1 +
 exec.c          |    4 ++--
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/exec-obsolete.h b/exec-obsolete.h
index 792c831..fb21dd7 100644
--- a/exec-obsolete.h
+++ b/exec-obsolete.h
@@ -25,6 +25,7 @@
 
 #ifndef CONFIG_USER_ONLY
 
+ram_addr_t qemu_last_ram_offset(void);
 ram_addr_t qemu_ram_alloc_from_ptr(ram_addr_t size, void *host,
                                    MemoryRegion *mr);
 ram_addr_t qemu_ram_alloc(ram_addr_t size, MemoryRegion *mr);
diff --git a/exec.c b/exec.c
index 7f44893..785 100644
--- a/exec.c
+++ b/exec.c
@@ -2576,7 +2576,7 @@ static ram_addr_t find_ram_offset(ram_addr_t size)
     return offset;
 }
 
-static ram_addr_t last_ram_offset(void)
+ram_addr_t qemu_last_ram_offset(void)
 {
     RAMBlock *block;
     ram_addr_t last = 0;
@@ -2672,7 +2672,7 @@ ram_addr_t qemu_ram_alloc_from_ptr(ram_addr_t size, void *host,
     QLIST_INSERT_HEAD(&ram_list.blocks, new_block, next);
 
     ram_list.phys_dirty = g_realloc(ram_list.phys_dirty,
-                                       last_ram_offset() >> TARGET_PAGE_BITS);
+                                    qemu_last_ram_offset() >> TARGET_PAGE_BITS);
     memset(ram_list.phys_dirty + (new_block->offset >> TARGET_PAGE_BITS),
            0xff, size >> TARGET_PAGE_BITS);
 
-- 
1.7.1.1
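qemu_last_ram_offset() returns the end of the highest RAMBlock (the maximum of offset + length over all blocks), and qemu_ram_alloc_from_ptr() shifts it by TARGET_PAGE_BITS to size ram_list.phys_dirty, one byte per target page. A sketch over a plain array, assuming a hypothetical ram_block record and 4 KiB pages (PAGE_BITS = 12); only the call site, not the body, of the real function is visible in the hunk above.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t ram_addr_t;

/* Hypothetical flattened RAMBlock: its place in ram_addr_t space. */
typedef struct {
    ram_addr_t offset;
    ram_addr_t length;
} ram_block;

#define PAGE_BITS 12  /* assumption: 4 KiB target pages */

/* End of the highest block; blocks may be listed in any order. */
static ram_addr_t last_ram_offset(const ram_block *blocks, size_t n)
{
    ram_addr_t last = 0;

    for (size_t i = 0; i < n; i++) {
        ram_addr_t end = blocks[i].offset + blocks[i].length;
        if (end > last) {
            last = end;
        }
    }
    return last;
}

/* One dirty-flag byte per page, the way phys_dirty is sized. */
static size_t dirty_bitmap_bytes(const ram_block *blocks, size_t n)
{
    return (size_t)(last_ram_offset(blocks, n) >> PAGE_BITS);
}
```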



[PATCH v2 08/41] arch_init/ram_load: refactor ram_load

2012-06-04 Thread Isaku Yamahata
Signed-off-by: Isaku Yamahata <yamah...@valinux.co.jp>
---
 arch_init.c |   67 +-
 arch_init.h |    1 +
 2 files changed, 39 insertions(+), 29 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index c861e30..bb0cd52 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -438,6 +438,41 @@ static inline void *host_from_stream_offset(QEMUFile *f,
     return ram_load_host_from_stream_offset(f, offset, flags, &block);
 }
 
+int ram_load_mem_size(QEMUFile *f, ram_addr_t total_ram_bytes)
+{
+    /* Synchronize RAM block list */
+    char id[256];
+    ram_addr_t length;
+
+    while (total_ram_bytes) {
+        RAMBlock *block;
+        uint8_t len;
+
+        len = qemu_get_byte(f);
+        qemu_get_buffer(f, (uint8_t *)id, len);
+        id[len] = 0;
+        length = qemu_get_be64(f);
+
+        QLIST_FOREACH(block, &ram_list.blocks, next) {
+            if (!strncmp(id, block->idstr, sizeof(id))) {
+                if (block->length != length)
+                    return -EINVAL;
+                break;
+            }
+        }
+
+        if (!block) {
+            fprintf(stderr, "Unknown ramblock \"%s\", cannot "
+                    "accept migration\n", id);
+            return -EINVAL;
+        }
+
+        total_ram_bytes -= length;
+    }
+
+    return 0;
+}
+
 int ram_load(QEMUFile *f, void *opaque, int version_id)
 {
     ram_addr_t addr;
@@ -456,35 +491,9 @@ int ram_load(QEMUFile *f, void *opaque, int version_id)
 
         if (flags & RAM_SAVE_FLAG_MEM_SIZE) {
             if (version_id == 4) {
-                /* Synchronize RAM block list */
-                char id[256];
-                ram_addr_t length;
-                ram_addr_t total_ram_bytes = addr;
-
-                while (total_ram_bytes) {
-                    RAMBlock *block;
-                    uint8_t len;
-
-                    len = qemu_get_byte(f);
-                    qemu_get_buffer(f, (uint8_t *)id, len);
-                    id[len] = 0;
-                    length = qemu_get_be64(f);
-
-                    QLIST_FOREACH(block, &ram_list.blocks, next) {
-                        if (!strncmp(id, block->idstr, sizeof(id))) {
-                            if (block->length != length)
-                                return -EINVAL;
-                            break;
-                        }
-                    }
-
-                    if (!block) {
-                        fprintf(stderr, "Unknown ramblock \"%s\", cannot "
-                                "accept migration\n", id);
-                        return -EINVAL;
-                    }
-
-                    total_ram_bytes -= length;
+                error = ram_load_mem_size(f, addr);
+                if (error) {
+                    return error;
                 }
             }
         }
diff --git a/arch_init.h b/arch_init.h
index 0a39082..507f110 100644
--- a/arch_init.h
+++ b/arch_init.h
@@ -45,6 +45,7 @@ void *ram_load_host_from_stream_offset(QEMUFile *f,
                                        ram_addr_t offset,
                                        int flags,
                                        RAMBlock **last_blockp);
+int ram_load_mem_size(QEMUFile *f, ram_addr_t total_ram_bytes);
 #endif
 
 #endif
-- 
1.7.1.1
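ram_load_mem_size() consumes the block list that follows the RAM_SAVE_FLAG_MEM_SIZE marker: for each RAMBlock, a one-byte id length, the id bytes, and the block length as a big-endian 64-bit value, repeated until the lengths add up to the announced total. A self-contained round-trip sketch over a byte array, assuming a hypothetical blk_desc table in place of ram_list and hand-written put_be64()/get_be64() in place of qemu_put_be64()/qemu_get_be64(); -1 stands in for -EINVAL.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    const char *idstr;   /* hypothetical stand-in for RAMBlock.idstr */
    uint64_t length;
} blk_desc;

static size_t put_be64(uint8_t *p, uint64_t v)
{
    for (int i = 0; i < 8; i++) {
        p[i] = (uint8_t)(v >> (56 - 8 * i));
    }
    return 8;
}

static uint64_t get_be64(const uint8_t *p)
{
    uint64_t v = 0;

    for (int i = 0; i < 8; i++) {
        v = (v << 8) | p[i];
    }
    return v;
}

/* Sender: id length byte, id bytes, be64 length, for every block. */
static size_t put_block_list(uint8_t *out, const blk_desc *blocks, size_t n)
{
    size_t pos = 0;

    for (size_t i = 0; i < n; i++) {
        size_t len = strlen(blocks[i].idstr);   /* assumed < 256 */
        out[pos++] = (uint8_t)len;
        memcpy(out + pos, blocks[i].idstr, len);
        pos += len;
        pos += put_be64(out + pos, blocks[i].length);
    }
    return pos;
}

/* Receiver: every announced block must exist locally with the same size. */
static int check_block_list(const uint8_t *in, size_t in_len, uint64_t total,
                            const blk_desc *local, size_t n)
{
    size_t pos = 0;

    while (total) {
        char id[256];
        if (pos >= in_len) {
            return -1;
        }
        uint8_t len = in[pos++];
        if (pos + len + 8 > in_len) {
            return -1;
        }
        memcpy(id, in + pos, len);
        id[len] = '\0';
        pos += len;
        uint64_t length = get_be64(in + pos);
        pos += 8;

        size_t i;
        for (i = 0; i < n; i++) {
            if (strcmp(id, local[i].idstr) == 0) {
                break;
            }
        }
        if (i == n || local[i].length != length || length > total) {
            return -1;  /* unknown block or size mismatch */
        }
        total -= length;
    }
    return 0;
}
```

The sketch adds bounds checks on the input and on total that the loop in the patch does not perform.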



[PATCH v2 04/41] arch_init: refactor host_from_stream_offset()

2012-06-04 Thread Isaku Yamahata
Signed-off-by: Isaku Yamahata <yamah...@valinux.co.jp>
---
 arch_init.c |   25 ++---
 arch_init.h |    7 +++
 2 files changed, 25 insertions(+), 7 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index 2a53f58..36ece1d 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -374,21 +374,22 @@ int ram_save_live(QEMUFile *f, int stage, void *opaque)
     return (stage == 2) && (expected_time <= migrate_max_downtime());
 }
 
-static inline void *host_from_stream_offset(QEMUFile *f,
-                                            ram_addr_t offset,
-                                            int flags)
+void *ram_load_host_from_stream_offset(QEMUFile *f,
+                                       ram_addr_t offset,
+                                       int flags,
+                                       RAMBlock **last_blockp)
 {
-    static RAMBlock *block = NULL;
+    RAMBlock *block;
     char id[256];
     uint8_t len;
 
     if (flags & RAM_SAVE_FLAG_CONTINUE) {
-        if (!block) {
+        if (!(*last_blockp)) {
             fprintf(stderr, "Ack, bad migration stream!\n");
             return NULL;
         }
 
-        return memory_region_get_ram_ptr(block->mr) + offset;
+        return memory_region_get_ram_ptr((*last_blockp)->mr) + offset;
     }
 
     len = qemu_get_byte(f);
@@ -396,14 +397,24 @@ static inline void *host_from_stream_offset(QEMUFile *f,
     id[len] = 0;
 
     QLIST_FOREACH(block, &ram_list.blocks, next) {
-        if (!strncmp(id, block->idstr, sizeof(id)))
+        if (!strncmp(id, block->idstr, sizeof(id))) {
+            *last_blockp = block;
             return memory_region_get_ram_ptr(block->mr) + offset;
+        }
     }
 
     fprintf(stderr, "Can't find block %s!\n", id);
     return NULL;
 }
 
+static inline void *host_from_stream_offset(QEMUFile *f,
+                                            ram_addr_t offset,
+                                            int flags)
+{
+    static RAMBlock *block = NULL;
+    return ram_load_host_from_stream_offset(f, offset, flags, &block);
+}
+
 int ram_load(QEMUFile *f, void *opaque, int version_id)
 {
     ram_addr_t addr;
diff --git a/arch_init.h b/arch_init.h
index 456637d..d84eac7 100644
--- a/arch_init.h
+++ b/arch_init.h
@@ -39,4 +39,11 @@ int xen_available(void);
 
 #define RAM_SAVE_VERSION_ID 4 /* currently version 4 */
 
+#if defined(NEED_CPU_H) && !defined(CONFIG_USER_ONLY)
+void *ram_load_host_from_stream_offset(QEMUFile *f,
+                                       ram_addr_t offset,
+                                       int flags,
+                                       RAMBlock **last_blockp);
+#endif
+
 #endif
-- 
1.7.1.1
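Patch 04 replaces the function-local static RAMBlock *block cache with a caller-supplied RAMBlock **last_blockp, so each caller of ram_load_host_from_stream_offset() keeps its own "last block" for RAM_SAVE_FLAG_CONTINUE records, while host_from_stream_offset() keeps the old behaviour by passing the address of its own static. A reduced sketch of the same lookup, assuming hypothetical blk records and a hypothetical FLAG_CONTINUE value in place of RAM_SAVE_FLAG_CONTINUE.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FLAG_CONTINUE 0x1  /* hypothetical value, stands in for RAM_SAVE_FLAG_CONTINUE */

typedef struct {
    const char *idstr;
    uint8_t *host;     /* host memory backing the block */
} blk;

/* Resolve (id, offset) to a host pointer; *last caches the previous block. */
static uint8_t *host_from_offset(blk *blocks, size_t n, const char *id,
                                 size_t offset, int flags, blk **last)
{
    if (flags & FLAG_CONTINUE) {
        if (*last == NULL) {
            return NULL;            /* "Ack, bad migration stream!" */
        }
        return (*last)->host + offset;
    }
    for (size_t i = 0; i < n; i++) {
        if (strcmp(id, blocks[i].idstr) == 0) {
            *last = &blocks[i];
            return blocks[i].host + offset;
        }
    }
    return NULL;                    /* "Can't find block" */
}
```

With two cursors, a lookup through one does not change what a CONTINUE record resolves to through the other, which a single function-local static could not guarantee.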



[PATCH v2 13/41] exec.c: factor out qemu_get_ram_ptr()

2012-06-04 Thread Isaku Yamahata
Signed-off-by: Isaku Yamahata <yamah...@valinux.co.jp>
---
 cpu-all.h |    2 ++
 exec.c    |   51 +--
 2 files changed, 31 insertions(+), 22 deletions(-)

diff --git a/cpu-all.h b/cpu-all.h
index 028528f..ff7f827 100644
--- a/cpu-all.h
+++ b/cpu-all.h
@@ -508,6 +508,8 @@ extern RAMList ram_list;
 extern const char *mem_path;
 extern int mem_prealloc;
 
+RAMBlock *qemu_get_ram_block(ram_addr_t addr);
+
 /* Flags stored in the low bits of the TLB virtual address.  These are
    defined so that fast path ram access is all zeros.  */
 /* Zero if TLB entry is valid.  */
diff --git a/exec.c b/exec.c
index 078a408..7f44893 100644
--- a/exec.c
+++ b/exec.c
@@ -2799,15 +2799,7 @@ void qemu_ram_remap(ram_addr_t addr, ram_addr_t length)
 }
 #endif /* !_WIN32 */
 
-/* Return a host pointer to ram allocated with qemu_ram_alloc.
-   With the exception of the softmmu code in this file, this should
-   only be used for local memory (e.g. video ram) that the device owns,
-   and knows it isn't going to access beyond the end of the block.
-
-   It should not be used for general purpose DMA.
-   Use cpu_physical_memory_map/cpu_physical_memory_rw instead.
- */
-void *qemu_get_ram_ptr(ram_addr_t addr)
+RAMBlock *qemu_get_ram_block(ram_addr_t addr)
 {
     RAMBlock *block;
 
@@ -2818,19 +2810,7 @@ void *qemu_get_ram_ptr(ram_addr_t addr)
                 QLIST_REMOVE(block, next);
                 QLIST_INSERT_HEAD(&ram_list.blocks, block, next);
             }
-            if (xen_enabled()) {
-                /* We need to check if the requested address is in the RAM
-                 * because we don't want to map the entire memory in QEMU.
-                 * In that case just map until the end of the page.
-                 */
-                if (block->offset == 0) {
-                    return xen_map_cache(addr, 0, 0);
-                } else if (block->host == NULL) {
-                    block->host =
-                        xen_map_cache(block->offset, block->length, 1);
-                }
-            }
-            return block->host + (addr - block->offset);
+            return block;
         }
     }
 
@@ -2841,6 +2821,33 @@ void *qemu_get_ram_ptr(ram_addr_t addr)
 }
 
 /* Return a host pointer to ram allocated with qemu_ram_alloc.
+   With the exception of the softmmu code in this file, this should
+   only be used for local memory (e.g. video ram) that the device owns,
+   and knows it isn't going to access beyond the end of the block.
+
+   It should not be used for general purpose DMA.
+   Use cpu_physical_memory_map/cpu_physical_memory_rw instead.
+ */
+void *qemu_get_ram_ptr(ram_addr_t addr)
+{
+    RAMBlock *block = qemu_get_ram_block(addr);
+
+    if (xen_enabled()) {
+        /* We need to check if the requested address is in the RAM
+         * because we don't want to map the entire memory in QEMU.
+         * In that case just map until the end of the page.
+         */
+        if (block->offset == 0) {
+            return xen_map_cache(addr, 0, 0);
+        } else if (block->host == NULL) {
+            block->host =
+                xen_map_cache(block->offset, block->length, 1);
+        }
+    }
+    return block->host + (addr - block->offset);
+}
+
+/* Return a host pointer to ram allocated with qemu_ram_alloc.
  * Same as qemu_get_ram_ptr but avoid reordering ramblocks.
  */
 void *qemu_safe_ram_ptr(ram_addr_t addr)
-- 
1.7.1.1
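qemu_get_ram_block() keeps the move-to-front behaviour of the old qemu_get_ram_ptr(): the block containing addr is unlinked and reinserted at the head of ram_list.blocks, so repeated lookups in the same block stay short. A standalone singly linked list sketch of that lookup, with a hypothetical ram_blk node in place of RAMBlock and QLIST; where the real function aborts on an unknown address, this sketch returns NULL.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct ram_blk {
    uint64_t offset;
    uint64_t length;
    struct ram_blk *next;
} ram_blk;

/* Find the block containing addr and move it to the front of *head. */
static ram_blk *get_ram_block(ram_blk **head, uint64_t addr)
{
    ram_blk **link = head;

    for (ram_blk *b = *head; b != NULL; link = &b->next, b = b->next) {
        /* unsigned wrap-around makes addr < offset fail, as in exec.c */
        if (addr - b->offset < b->length) {
            if (b != *head) {
                *link = b->next;   /* unlink */
                b->next = *head;   /* reinsert at the head */
                *head = b;
            }
            return b;
        }
    }
    return NULL;
}
```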



[PATCH v2 11/41] arch_init: factor out counting transferred bytes

2012-06-04 Thread Isaku Yamahata
Signed-off-by: Isaku Yamahata <yamah...@valinux.co.jp>
---
 arch_init.c |   24 ++++++++++++------------
 1 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index 73bf250..2617478 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -155,8 +155,9 @@ static int is_dup_page(uint8_t *page)
 }
 
 static RAMBlock *last_block_sent = NULL;
+static uint64_t bytes_transferred;
 
-int ram_save_page(QEMUFile *f, RAMBlock *block, ram_addr_t offset)
+static int ram_save_page_int(QEMUFile *f, RAMBlock *block, ram_addr_t offset)
 {
     MemoryRegion *mr = block->mr;
     uint8_t *p;
@@ -192,6 +193,13 @@ int ram_save_page(QEMUFile *f, RAMBlock *block, ram_addr_t offset)
     return TARGET_PAGE_SIZE;
 }
 
+int ram_save_page(QEMUFile *f, RAMBlock *block, ram_addr_t offset)
+{
+    int bytes_sent = ram_save_page_int(f, block, offset);
+    bytes_transferred += bytes_sent;
+    return bytes_sent;
+}
+
 static RAMBlock *last_block;
 static ram_addr_t last_offset;
 
@@ -228,8 +236,6 @@ int ram_save_block(QEMUFile *f)
     return bytes_sent;
 }
 
-static uint64_t bytes_transferred;
-
 static ram_addr_t ram_save_remaining(void)
 {
     RAMBlock *block;
@@ -357,11 +363,7 @@ int ram_save_live(QEMUFile *f, int stage, void *opaque)
     bwidth = qemu_get_clock_ns(rt_clock);
 
     while ((ret = qemu_file_rate_limit(f)) == 0) {
-        int bytes_sent;
-
-        bytes_sent = ram_save_block(f);
-        bytes_transferred += bytes_sent;
-        if (bytes_sent == 0) { /* no more blocks */
+        if (ram_save_block(f) == 0) { /* no more blocks */
             break;
         }
     }
@@ -381,11 +383,9 @@ int ram_save_live(QEMUFile *f, int stage, void *opaque)
 
     /* try transferring iterative blocks of memory */
     if (stage == 3) {
-        int bytes_sent;
-
         /* flush all remaining blocks regardless of rate limiting */
-        while ((bytes_sent = ram_save_block(f)) != 0) {
-            bytes_transferred += bytes_sent;
+        while (ram_save_block(f) != 0) {
+            /* nothing */
         }
         memory_global_dirty_log_stop();
     }
 }
-- 
1.7.1.1



[PATCH v2 16/41] savevm: qemu_pending_size() to return pending buffered size

2012-06-04 Thread Isaku Yamahata
This will be used later by postcopy migration.

Signed-off-by: Isaku Yamahata yamah...@valinux.co.jp
---
 qemu-file.h |1 +
 savevm.c|5 +
 2 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/qemu-file.h b/qemu-file.h
index a285bef..880ef4b 100644
--- a/qemu-file.h
+++ b/qemu-file.h
@@ -91,6 +91,7 @@ int qemu_get_byte(QEMUFile *f);
 int qemu_peek_byte(QEMUFile *f, int offset);
 int qemu_peek_buffer(QEMUFile *f, uint8_t *buf, int size, size_t offset);
 void qemu_file_skip(QEMUFile *f, int size);
+int qemu_pending_size(const QEMUFile *f);
 
 static inline unsigned int qemu_get_ubyte(QEMUFile *f)
 {
diff --git a/savevm.c b/savevm.c
index 8ad843f..2992f97 100644
--- a/savevm.c
+++ b/savevm.c
@@ -595,6 +595,11 @@ void qemu_file_skip(QEMUFile *f, int size)
 }
 }
 
+int qemu_pending_size(const QEMUFile *f)
+{
+return f->buf_size - f->buf_index;
+}
+
 int qemu_peek_buffer(QEMUFile *f, uint8_t *buf, int size, size_t offset)
 {
 int pending;
-- 
1.7.1.1


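qemu_pending_size() reports how many bytes sit in the QEMUFile buffer between buf_index and buf_size, that is, data already read in but not yet consumed by the parser. The following minimal sketch shows that bookkeeping. It uses a hypothetical trimmed-down `SketchFile` struct rather than the real QEMUFile, and `sketch_skip` is an illustrative helper.

```c
#include <stdint.h>

#define SKETCH_BUF_SIZE 32768  /* assumed buffer capacity */

/* Hypothetical read-side buffer modelled on the buf_size / buf_index
 * fields used by qemu_pending_size(). */
typedef struct {
    uint8_t buf[SKETCH_BUF_SIZE];
    int buf_index;   /* next byte the reader will consume */
    int buf_size;    /* number of valid bytes currently in buf */
} SketchFile;

/* Bytes already read from the fd but not yet consumed. */
int sketch_pending_size(const SketchFile *f)
{
    return f->buf_size - f->buf_index;
}

/* Consume up to n buffered bytes; returns how many were consumed. */
int sketch_skip(SketchFile *f, int n)
{
    int pending = sketch_pending_size(f);
    if (n > pending) {
        n = pending;
    }
    f->buf_index += n;
    return n;
}
```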

[PATCH v2 05/41] arch_init/ram_save_live: factor out RAM_SAVE_FLAG_MEM_SIZE case

2012-06-04 Thread Isaku Yamahata
Signed-off-by: Isaku Yamahata yamah...@valinux.co.jp
---
 arch_init.c |   21 ++---
 migration.h |1 +
 2 files changed, 15 insertions(+), 7 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index 36ece1d..28e5abb 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -287,6 +287,19 @@ void sort_ram_list(void)
 g_free(blocks);
 }
 
+void ram_save_live_mem_size(QEMUFile *f)
+{
+RAMBlock *block;
+
+qemu_put_be64(f, ram_bytes_total() | RAM_SAVE_FLAG_MEM_SIZE);
+
+QLIST_FOREACH(block, ram_list.blocks, next) {
+qemu_put_byte(f, strlen(block->idstr));
+qemu_put_buffer(f, (uint8_t *)block->idstr, strlen(block->idstr));
+qemu_put_be64(f, block->length);
+}
+}
+
 int ram_save_live(QEMUFile *f, int stage, void *opaque)
 {
 ram_addr_t addr;
@@ -321,13 +334,7 @@ int ram_save_live(QEMUFile *f, int stage, void *opaque)
 
 memory_global_dirty_log_start();
 
-qemu_put_be64(f, ram_bytes_total() | RAM_SAVE_FLAG_MEM_SIZE);
-
-QLIST_FOREACH(block, ram_list.blocks, next) {
-qemu_put_byte(f, strlen(block->idstr));
-qemu_put_buffer(f, (uint8_t *)block->idstr, strlen(block->idstr));
-qemu_put_be64(f, block->length);
-}
+ram_save_live_mem_size(f);
 }
 
 bytes_transferred_last = bytes_transferred;
diff --git a/migration.h b/migration.h
index 8b9509c..e2e9b43 100644
--- a/migration.h
+++ b/migration.h
@@ -78,6 +78,7 @@ uint64_t ram_bytes_total(void);
 
 void sort_ram_list(void);
 int ram_save_block(QEMUFile *f);
+void ram_save_live_mem_size(QEMUFile *f);
 int ram_save_live(QEMUFile *f, int stage, void *opaque);
 int ram_load(QEMUFile *f, void *opaque, int version_id);
 
-- 
1.7.1.1


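The factored-out ram_save_live_mem_size() first writes one be64 holding the total RAM size with RAM_SAVE_FLAG_MEM_SIZE OR-ed in. It then writes three fields per RAM block: a one-byte id length, the id string, and a be64 block length. The sketch below serializes the same layout into a plain byte buffer. `SketchBlock` and the helpers are hypothetical, and the flag value 0x04 is assumed to match arch_init.h.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_FLAG_MEM_SIZE 0x04  /* assumed value of RAM_SAVE_FLAG_MEM_SIZE */

/* Hypothetical RAM block descriptor. */
typedef struct {
    const char *idstr;
    uint64_t length;
} SketchBlock;

/* Big-endian 64-bit store, like qemu_put_be64(). */
static size_t put_be64(uint8_t *out, uint64_t v)
{
    for (int i = 0; i < 8; i++) {
        out[i] = (uint8_t)(v >> (56 - 8 * i));
    }
    return 8;
}

/* Serialize the MEM_SIZE record into out and return the bytes written.
 * The caller must size out appropriately in this sketch. */
size_t sketch_mem_size_record(uint8_t *out, const SketchBlock *blocks, size_t n)
{
    uint64_t total = 0;
    size_t pos = 0;

    for (size_t i = 0; i < n; i++) {
        total += blocks[i].length;
    }
    /* Block lengths are page aligned, so the low bits of the total are
     * free to carry the flag. */
    pos += put_be64(out + pos, total | SKETCH_FLAG_MEM_SIZE);

    for (size_t i = 0; i < n; i++) {
        size_t len = strlen(blocks[i].idstr);  /* must fit in one byte */
        out[pos++] = (uint8_t)len;
        memcpy(out + pos, blocks[i].idstr, len);
        pos += len;
        pos += put_be64(out + pos, blocks[i].length);
    }
    return pos;
}
```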

[PATCH v2 07/41] arch_init/ram_save_live: factor out ram_save_limit

2012-06-04 Thread Isaku Yamahata
Signed-off-by: Isaku Yamahata yamah...@valinux.co.jp
---
 arch_init.c |   28 
 migration.h |1 +
 2 files changed, 17 insertions(+), 12 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index 900cc8e..c861e30 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -311,9 +311,23 @@ void ram_save_live_mem_size(QEMUFile *f)
 }
 }
 
+void ram_save_memory_set_dirty(void)
+{
+RAMBlock *block;
+
+QLIST_FOREACH(block, ram_list.blocks, next) {
+ram_addr_t addr;
+for (addr = 0; addr < block->length; addr += TARGET_PAGE_SIZE) {
+if (!memory_region_get_dirty(block->mr, addr, TARGET_PAGE_SIZE,
+ DIRTY_MEMORY_MIGRATION)) {
+memory_region_set_dirty(block->mr, addr, TARGET_PAGE_SIZE);
+}
+}
+}
+}
+
 int ram_save_live(QEMUFile *f, int stage, void *opaque)
 {
-ram_addr_t addr;
 uint64_t bytes_transferred_last;
 double bwidth = 0;
 uint64_t expected_time = 0;
@@ -327,7 +341,6 @@ int ram_save_live(QEMUFile *f, int stage, void *opaque)
 memory_global_sync_dirty_bitmap(get_system_memory());
 
 if (stage == 1) {
-RAMBlock *block;
 bytes_transferred = 0;
 last_block_sent = NULL;
 last_block = NULL;
@@ -335,17 +348,8 @@ int ram_save_live(QEMUFile *f, int stage, void *opaque)
 sort_ram_list();
 
 /* Make sure all dirty bits are set */
-QLIST_FOREACH(block, ram_list.blocks, next) {
-for (addr = 0; addr < block->length; addr += TARGET_PAGE_SIZE) {
-if (!memory_region_get_dirty(block->mr, addr, TARGET_PAGE_SIZE,
- DIRTY_MEMORY_MIGRATION)) {
-memory_region_set_dirty(block->mr, addr, TARGET_PAGE_SIZE);
-}
-}
-}
-
+ram_save_memory_set_dirty();
 memory_global_dirty_log_start();
-
 ram_save_live_mem_size(f);
 }
 
diff --git a/migration.h b/migration.h
index e2e9b43..6cf4512 100644
--- a/migration.h
+++ b/migration.h
@@ -78,6 +78,7 @@ uint64_t ram_bytes_total(void);
 
 void sort_ram_list(void);
 int ram_save_block(QEMUFile *f);
+void ram_save_memory_set_dirty(void);
 void ram_save_live_mem_size(QEMUFile *f);
 int ram_save_live(QEMUFile *f, int stage, void *opaque);
 int ram_load(QEMUFile *f, void *opaque, int version_id);
-- 
1.7.1.1


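ram_save_memory_set_dirty() walks every page of every RAM block and sets its migration dirty bit, so the first pass sends all of guest RAM. The sketch below runs the same loop over a hypothetical one-bit-per-page bitmap. `SketchBlock`, `page_is_dirty` and `page_set_dirty` are illustrative and are not QEMU's memory API.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096   /* assumed stand-in for TARGET_PAGE_SIZE */

/* Hypothetical per-block dirty bitmap: one bit per page. */
typedef struct {
    uint64_t length;        /* bytes, a multiple of SKETCH_PAGE_SIZE */
    unsigned char *bitmap;  /* length / SKETCH_PAGE_SIZE bits, rounded up */
} SketchBlock;

static int page_is_dirty(const SketchBlock *b, uint64_t addr)
{
    uint64_t page = addr / SKETCH_PAGE_SIZE;
    return (b->bitmap[page / 8] >> (page % 8)) & 1;
}

static void page_set_dirty(SketchBlock *b, uint64_t addr)
{
    uint64_t page = addr / SKETCH_PAGE_SIZE;
    b->bitmap[page / 8] |= (unsigned char)(1u << (page % 8));
}

/* Mark every page of every block dirty, mirroring the get_dirty /
 * set_dirty loop that the patch factors out. */
void sketch_set_all_dirty(SketchBlock *blocks, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        for (uint64_t addr = 0; addr < blocks[i].length;
             addr += SKETCH_PAGE_SIZE) {
            if (!page_is_dirty(&blocks[i], addr)) {
                page_set_dirty(&blocks[i], addr);
            }
        }
    }
}

/* Count dirty pages in one block (for checking the result). */
size_t sketch_count_dirty(const SketchBlock *b)
{
    size_t count = 0;
    for (uint64_t addr = 0; addr < b->length; addr += SKETCH_PAGE_SIZE) {
        count += (size_t)page_is_dirty(b, addr);
    }
    return count;
}
```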

[PATCH v2 09/41] arch_init: introduce helper function to find ram block with id string

2012-06-04 Thread Isaku Yamahata
Signed-off-by: Isaku Yamahata yamah...@valinux.co.jp
---
 arch_init.c |   13 +
 arch_init.h |1 +
 2 files changed, 14 insertions(+), 0 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index bb0cd52..9981abe 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -397,6 +397,19 @@ int ram_save_live(QEMUFile *f, int stage, void *opaque)
 return (stage == 2) && (expected_time <= migrate_max_downtime());
 }
 
+RAMBlock *ram_find_block(const char *id, uint8_t len)
+{
+RAMBlock *block;
+
+QLIST_FOREACH(block, ram_list.blocks, next) {
+if (!strncmp(id, block->idstr, len)) {
+return block;
+}
+}
+
+return NULL;
+}
+
 void *ram_load_host_from_stream_offset(QEMUFile *f,
ram_addr_t offset,
int flags,
diff --git a/arch_init.h b/arch_init.h
index 507f110..7f5c77a 100644
--- a/arch_init.h
+++ b/arch_init.h
@@ -41,6 +41,7 @@ int xen_available(void);
 
 #if defined(NEED_CPU_H) && !defined(CONFIG_USER_ONLY)
 int ram_save_page(QEMUFile *f, RAMBlock *block, ram_addr_t offset);
+RAMBlock *ram_find_block(const char *id, uint8_t len);
 void *ram_load_host_from_stream_offset(QEMUFile *f,
ram_addr_t offset,
int flags,
-- 
1.7.1.1


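ram_find_block() compares only the first len bytes. If one block's idstr were a prefix of another's (for example "pc.ram" and "pc.ram2"), the lookup could return the longer name if that block comes first in the list. One possible variant additionally requires the stored idstr to be exactly len bytes long. The sketch below shows this with a hypothetical `SketchBlock` list. It is a suggestion, not what the patch does.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical block list standing in for ram_list.blocks. */
typedef struct SketchBlock {
    char idstr[256];
    struct SketchBlock *next;
} SketchBlock;

/* id comes off the wire and is not NUL-terminated, so len gives its
 * size. Matching both the length and the bytes rules out prefix hits. */
SketchBlock *sketch_find_block(SketchBlock *head, const char *id, uint8_t len)
{
    for (SketchBlock *b = head; b != NULL; b = b->next) {
        if (strlen(b->idstr) == len && memcmp(id, b->idstr, len) == 0) {
            return b;
        }
    }
    return NULL;
}
```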

[PATCH v2 03/41] arch_init/ram_save: introduce constant for ram save version = 4

2012-06-04 Thread Isaku Yamahata
Introduce RAM_SAVE_VERSION_ID to represent the version_id of the RAM save format.

Signed-off-by: Isaku Yamahata yamah...@valinux.co.jp
---
 arch_init.c |2 +-
 arch_init.h |2 ++
 vl.c|4 ++--
 3 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index bd4e61e..2a53f58 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -410,7 +410,7 @@ int ram_load(QEMUFile *f, void *opaque, int version_id)
 int flags;
 int error;
 
-if (version_id < 4 || version_id > 4) {
+if (version_id < 4 || version_id > RAM_SAVE_VERSION_ID) {
 return -EINVAL;
 }
 
diff --git a/arch_init.h b/arch_init.h
index 7cc3fa7..456637d 100644
--- a/arch_init.h
+++ b/arch_init.h
@@ -37,4 +37,6 @@ int xen_available(void);
 #define RAM_SAVE_FLAG_EOS  0x10
 #define RAM_SAVE_FLAG_CONTINUE 0x20
 
+#define RAM_SAVE_VERSION_ID 4 /* currently version 4 */
+
 #endif
diff --git a/vl.c b/vl.c
index 23ab3a3..62dc343 100644
--- a/vl.c
+++ b/vl.c
@@ -3436,8 +3436,8 @@ int main(int argc, char **argv, char **envp)
 default_drive(default_sdcard, snapshot, machine-use_scsi,
   IF_SD, 0, SD_OPTS);
 
-register_savevm_live(NULL, "ram", 0, 4, NULL, ram_save_live, NULL,
- ram_load, NULL);
+register_savevm_live(NULL, "ram", 0, RAM_SAVE_VERSION_ID, NULL,
+ ram_save_live, NULL, ram_load, NULL);
 
 if (nb_numa_nodes > 0) {
 int i;
-- 
1.7.1.1


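With RAM_SAVE_VERSION_ID in place, the saver (register_savevm_live) and the loader (ram_load) share one constant, so a future format bump only touches arch_init.h. The following sketch shows the load-side check. The `SKETCH_` macro names are hypothetical stand-ins, and the lower bound of 4 mirrors the unchanged first half of the condition.

```c
#include <errno.h>

#define SKETCH_RAM_SAVE_VERSION_ID 4   /* mirrors RAM_SAVE_VERSION_ID */
#define SKETCH_RAM_SAVE_MIN_VERSION 4  /* hypothetical name for the lower bound */

/* Accept only stream versions this loader understands. */
int sketch_check_version(int version_id)
{
    if (version_id < SKETCH_RAM_SAVE_MIN_VERSION ||
        version_id > SKETCH_RAM_SAVE_VERSION_ID) {
        return -EINVAL;
    }
    return 0;
}
```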

[PATCH v2 00/41] postcopy live migration

2012-06-04 Thread Isaku Yamahata
  to the shmem.
 |
 V
unblock -write() to tell served pages
the fault handler returns the page
page fault is resolved
  |
  |   pages can be sent
  |   in the background
  |  |
  |  V
  |   write()
  |  |
  V  V
The specified pages-piperequest to touch pages
are made present by  |
touching guest RAM.  |
  |  |
  V  V
 reply-pipe- release the cached page
  |   madvise(MADV_REMOVE)
  |  |
  V  V

 all the pages are pulled from the source

  |  |
  V  V
the vma becomes anonymousUMEM_MAKE_VMA_ANONYMOUS
   (note: I'm not sure if this can be implemented or not)
  |  |
  V  V
migration completesexit()




Isaku Yamahata (41):
  arch_init: export sort_ram_list() and ram_save_block()
  arch_init: export RAM_SAVE_xxx flags for postcopy
  arch_init/ram_save: introduce constant for ram save version = 4
  arch_init: refactor host_from_stream_offset()
  arch_init/ram_save_live: factor out RAM_SAVE_FLAG_MEM_SIZE case
  arch_init: refactor ram_save_block()
  arch_init/ram_save_live: factor out ram_save_limit
  arch_init/ram_load: refactor ram_load
  arch_init: introduce helper function to find ram block with id string
  arch_init: simplify a bit by ram_find_block()
  arch_init: factor out counting transferred bytes
  arch_init: factor out setting last_block, last_offset
  exec.c: factor out qemu_get_ram_ptr()
  exec.c: export last_ram_offset()
  savevm: export qemu_peek_buffer, qemu_peek_byte, qemu_file_skip
  savevm: qemu_pending_size() to return pending buffered size
  savevm, buffered_file: introduce method to drain buffer of buffered
file
  QEMUFile: add qemu_file_fd() for later use
  savevm/QEMUFile: drop qemu_stdio_fd
  savevm/QEMUFileSocket: drop duplicated member fd
  savevm: rename QEMUFileSocket to QEMUFileFD, socket_close to fd_close
  savevm/QEMUFile: introduce qemu_fopen_fd
  migration.c: remove redundant line in migrate_init()
  migration: export migrate_fd_completed() and migrate_fd_cleanup()
  migration: factor out parameters into MigrationParams
  buffered_file: factor out buffer management logic
  buffered_file: Introduce QEMUFileNonblock for nonblock write
  buffered_file: add qemu_file to read/write to buffer in memory
  umem.h: import Linux umem.h
  update-linux-headers.sh: teach umem.h to update-linux-headers.sh
  configure: add CONFIG_POSTCOPY option
  savevm: add new section that is used by postcopy
  postcopy: introduce -postcopy and -postcopy-flags option
  postcopy outgoing: add -p and -n option to migrate command
  postcopy: introduce helper functions for postcopy
  postcopy: implement incoming part of postcopy live migration
  postcopy: implement outgoing part of postcopy live migration
  postcopy/outgoing: add forward, backward option to specify the size
of prefault
  postcopy/outgoing: implement prefault
  migrate: add -m (movebg) option to migrate command
  migration/postcopy: add movebg mode

 Makefile.target |5 +
 arch_init.c |  298 ---
 arch_init.h |   20 +
 block-migration.c   |8 +-
 buffered_file.c |  322 ++--
 buffered_file.h |   32 +
 configure   |   12 +
 cpu-all.h   |9 +
 exec-obsolete.h |1 +
 exec.c  |   87 ++-
 hmp-commands.hx |   18 +-
 hmp.c   |   10 +-
 linux-headers/linux/umem.h  |   42 +
 migration-exec.c|   12 +-
 migration-fd.c  |   25 +-
 migration-postcopy-stub.c   |   77 ++
 migration-postcopy.c| 1771 +++
 migration-tcp.c |   25 +-
 migration-unix.c|   26 +-
 migration.c |   97 ++-
 migration.h |   47 +-
 qapi-schema.json

[PATCH v2 06/41] arch_init: refactor ram_save_block()

2012-06-04 Thread Isaku Yamahata
Signed-off-by: Isaku Yamahata yamah...@valinux.co.jp

---
Changes v1 -> v2:
- don't refer to last_block, which can be NULL,
  and avoid a possible infinite loop.
---
 arch_init.c |   82 +-
 arch_init.h |1 +
 2 files changed, 48 insertions(+), 35 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index 28e5abb..900cc8e 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -154,6 +154,44 @@ static int is_dup_page(uint8_t *page)
 return 1;
 }
 
+static RAMBlock *last_block_sent = NULL;
+
+int ram_save_page(QEMUFile *f, RAMBlock *block, ram_addr_t offset)
+{
+MemoryRegion *mr = block->mr;
+uint8_t *p;
+int cont;
+
+if (!memory_region_get_dirty(mr, offset, TARGET_PAGE_SIZE,
+ DIRTY_MEMORY_MIGRATION)) {
+return 0;
+}
+memory_region_reset_dirty(mr, offset, TARGET_PAGE_SIZE,
+  DIRTY_MEMORY_MIGRATION);
+
+cont = (block == last_block_sent) ? RAM_SAVE_FLAG_CONTINUE : 0;
+p = memory_region_get_ram_ptr(mr) + offset;
+last_block_sent = block;
+
+if (is_dup_page(p)) {
+qemu_put_be64(f, offset | cont | RAM_SAVE_FLAG_COMPRESS);
+if (!cont) {
+qemu_put_byte(f, strlen(block->idstr));
+qemu_put_buffer(f, (uint8_t *)block->idstr, strlen(block->idstr));
+}
+qemu_put_byte(f, *p);
+return 1;
+}
+
+qemu_put_be64(f, offset | cont | RAM_SAVE_FLAG_PAGE);
+if (!cont) {
+qemu_put_byte(f, strlen(block->idstr));
+qemu_put_buffer(f, (uint8_t *)block->idstr, strlen(block->idstr));
+}
+qemu_put_buffer(f, p, TARGET_PAGE_SIZE);
+return TARGET_PAGE_SIZE;
+}
+
 static RAMBlock *last_block;
 static ram_addr_t last_offset;
 
@@ -162,45 +200,14 @@ int ram_save_block(QEMUFile *f)
 RAMBlock *block = last_block;
 ram_addr_t offset = last_offset;
 int bytes_sent = 0;
-MemoryRegion *mr;
 
-if (!block)
+if (!block) {
 block = QLIST_FIRST(ram_list.blocks);
+last_block = block;
+}
 
 do {
-mr = block->mr;
-if (memory_region_get_dirty(mr, offset, TARGET_PAGE_SIZE,
-DIRTY_MEMORY_MIGRATION)) {
-uint8_t *p;
-int cont = (block == last_block) ? RAM_SAVE_FLAG_CONTINUE : 0;
-
-memory_region_reset_dirty(mr, offset, TARGET_PAGE_SIZE,
-  DIRTY_MEMORY_MIGRATION);
-
-p = memory_region_get_ram_ptr(mr) + offset;
-
-if (is_dup_page(p)) {
-qemu_put_be64(f, offset | cont | RAM_SAVE_FLAG_COMPRESS);
-if (!cont) {
-qemu_put_byte(f, strlen(block->idstr));
-qemu_put_buffer(f, (uint8_t *)block->idstr,
-strlen(block->idstr));
-}
-qemu_put_byte(f, *p);
-bytes_sent = 1;
-} else {
-qemu_put_be64(f, offset | cont | RAM_SAVE_FLAG_PAGE);
-if (!cont) {
-qemu_put_byte(f, strlen(block->idstr));
-qemu_put_buffer(f, (uint8_t *)block->idstr,
-strlen(block->idstr));
-}
-qemu_put_buffer(f, p, TARGET_PAGE_SIZE);
-bytes_sent = TARGET_PAGE_SIZE;
-}
-
-break;
-}
+bytes_sent = ram_save_page(f, block, offset);
 
 offset += TARGET_PAGE_SIZE;
 if (offset >= block->length) {
@@ -209,6 +216,10 @@ int ram_save_block(QEMUFile *f)
 if (!block)
 block = QLIST_FIRST(ram_list.blocks);
 }
+
+if (bytes_sent > 0) {
+break;
+}
 } while (block != last_block || offset != last_offset);
 
 last_block = block;
@@ -318,6 +329,7 @@ int ram_save_live(QEMUFile *f, int stage, void *opaque)
 if (stage == 1) {
 RAMBlock *block;
 bytes_transferred = 0;
+last_block_sent = NULL;
 last_block = NULL;
 last_offset = 0;
 sort_ram_list();
diff --git a/arch_init.h b/arch_init.h
index d84eac7..0a39082 100644
--- a/arch_init.h
+++ b/arch_init.h
@@ -40,6 +40,7 @@ int xen_available(void);
 #define RAM_SAVE_VERSION_ID 4 /* currently version 4 */
 
 #if defined(NEED_CPU_H) && !defined(CONFIG_USER_ONLY)
+int ram_save_page(QEMUFile *f, RAMBlock *block, ram_addr_t offset);
 void *ram_load_host_from_stream_offset(QEMUFile *f,
ram_addr_t offset,
int flags,
-- 
1.7.1.1


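The refactored ram_save_block() scans pages round-robin starting from (last_block, last_offset). It stops as soon as ram_save_page() reports that something was sent, and otherwise gives up after one full lap. Together, these guarantee termination even when nothing is dirty. The sketch below is a simplified version under several assumptions. It uses a hypothetical per-page dirty-flag array instead of QEMU's bitmap, returns the page index rather than a byte count, and assumes every block has at least one page.

```c
#include <stddef.h>

/* Hypothetical RAM layout: each block has npages pages and one dirty
 * flag per page (1 = dirty). Stands in for ram_list.blocks. */
typedef struct {
    int npages;
    unsigned char *dirty;
} SketchBlock;

/* Scan position kept across calls, like last_block / last_offset. */
static size_t last_block;
static int last_page;

/* "Send" (clear) at most one dirty page, resuming where the previous
 * call stopped. Returns the page index sent and stores its block in
 * *block_out, or returns -1 after a full lap finds nothing dirty. */
int sketch_save_block(SketchBlock *blocks, size_t nblocks, size_t *block_out)
{
    size_t b = last_block;
    int page = last_page;
    int sent = -1;

    do {
        if (blocks[b].dirty[page]) {
            blocks[b].dirty[page] = 0;   /* reset the dirty flag */
            sent = page;
            *block_out = b;
        }
        /* Advance, wrapping to the next block and back to the first. */
        if (++page >= blocks[b].npages) {
            page = 0;
            b = (b + 1) % nblocks;
        }
        if (sent >= 0) {
            break;
        }
    } while (b != last_block || page != last_page);  /* one lap at most */

    last_block = b;
    last_page = page;
    return sent;
}
```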

Re: Fwd: [Qemu-devel] [PATCH v2 00/41] postcopy live migration

2012-06-04 Thread Isaku Yamahata
On Mon, Jun 04, 2012 at 05:01:30AM -0700, Chegu Vinod wrote:
 Hello Isaku Yamahata,

Hi.

 I just saw your patches..Would it be possible to email me a tar bundle of 
 these
 patches (makes it easier to apply the patches to a copy of the upstream 
 qemu.git)

I uploaded them to github for those who are interested in it.

git://github.com/yamahata/qemu.git qemu-postcopy-june-04-2012
git://github.com/yamahata/linux-umem.git  linux-umem-june-04-2012 


 BTW, I am also curious if you have considered using any kind of RDMA features 
 for
 optimizing the page-faults during postcopy ?

Yes, RDMA is an interesting topic. Could you share your use cases, concerns, and issues?
Then we can collaborate.
You may want to look at Benoit's results. As far as I know, he has not published
his code yet.

thanks,

 Thanks
 Vinod



 --

 Message: 1
 Date: Mon,  4 Jun 2012 18:57:02 +0900
 From: Isaku Yamahatayamah...@valinux.co.jp
 To: qemu-de...@nongnu.org, kvm@vger.kernel.org
 Cc: benoit.hud...@gmail.com, aarca...@redhat.com, aligu...@us.ibm.com,
   quint...@redhat.com, stefa...@gmail.com, t.hirofu...@aist.go.jp,
   dl...@redhat.com, satoshi.i...@aist.go.jp,  
 mdr...@linux.vnet.ibm.com,
   yoshikawa.tak...@oss.ntt.co.jp, owass...@redhat.com, a...@redhat.com,
   pbonz...@redhat.com
 Subject: [Qemu-devel] [PATCH v2 00/41] postcopy live migration
 Message-ID:cover.1338802190.git.yamah...@valinux.co.jp

 After a long time, here is v2. This is the qemu part.
 The Linux kernel part is sent separately.

 Changes v1 -> v2:
 - split up patches for review
 - buffered file refactored
 - many bug fixes
   Especially, PV drivers can work with postcopy
 - optimization/heuristic

 Patches
 1 - 30: refactoring existing code and preparation
 31 - 37: implement postcopy itself (essential part)
 38 - 41: some optimization/heuristic for postcopy

 Intro
 =
 This patch series implements postcopy live migration.[1]
 As discussed at KVM Forum 2011, a dedicated character device is used for
 distributed shared memory between the migration source and destination.
 Now we can discuss/benchmark/compare it with precopy. I believe there is
 much room for improvement.

 [1] http://wiki.qemu.org/Features/PostCopyLiveMigration


 Usage
 =
 You need to load the umem character device on the host before starting migration.
 Postcopy can be used with the tcg and kvm accelerators. The implementation depends
 only on the Linux umem character device, and the driver-dependent code is isolated
 in a single file.
 I tested only the host page size == guest page size case, but the implementation
 also allows host page size != guest page size.

 The following options are added with this patch series.
 - incoming part
   command line options
   -postcopy [-postcopy-flags <flags>]
   where <flags> changes the behavior for benchmarking/debugging
   Currently the following flags are available
   0: default
   1: enable touching page request

   example:
   qemu -postcopy -incoming tcp:0: -monitor stdio -machine accel=kvm

 - outgoing part
   options for the migrate command
   migrate [-p [-n] [-m]] URI [<prefault forward> [<prefault backward>]]
   -p: indicate postcopy migration
   -n: disable background page transfer (for benchmarking/debugging)
   -m: move background transfer of postcopy mode
   <prefault forward>: the number of forward pages sent along with each
   on-demand page
   <prefault backward>: the number of backward pages sent along with each
   on-demand page

   example:
   migrate -p -n tcp:dest ip address:
   migrate -p -n -m tcp:dest ip address: 32 0

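The forward/backward arguments describe a window of pages that is sent along with each on-demand page. The sketch below computes that window, clamped to the RAM block. `sketch_prefault_range` is hypothetical, and the patches may bound or order the pages differently. With forward = 32 and backward = 0, as in the second example above, a fault on page 100 would send pages 100 through 132.

```c
#include <stdint.h>

/* Hypothetical helper: given the faulting page index and the prefault
 * forward/backward counts, compute the half-open page range
 * [*start, *end) to send, clamped to the block's npages. */
void sketch_prefault_range(uint64_t fault_page, uint64_t npages,
                           uint64_t forward, uint64_t backward,
                           uint64_t *start, uint64_t *end)
{
    *start = (fault_page > backward) ? fault_page - backward : 0;
    *end = fault_page + 1 + forward;   /* includes the faulting page */
    if (*end > npages || *end < fault_page) {  /* clamp, also on overflow */
        *end = npages;
    }
}
```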

 TODO
 
 - benchmark/evaluation. Especially how async page fault affects the result.
 - improve/optimization
   At the moment at least what I'm aware of is
   - making incoming socket non-blocking with thread
 As page compression is coming, it is impractical to do a non-blocking read
 and check whether the necessary data has been read.
   - touching pages in the incoming qemu process from an fd handler seems
 suboptimal. Create a dedicated thread?
   - the outgoing handler seems suboptimal, causing latency.
 - consider the FUSE/CUSE possibility
 - don't fork umemd, but create a thread?

 basic postcopy work flow
 
 qemu on the destination
   |
   V
 open(/dev/umem)
   |
   V
 UMEM_INIT
   |
   V
 Here we have two file descriptors to
 umem device and shmem file
   |
   |  umemd
   |  daemon on the destination
   |
   Vcreate pipe to communicate
 fork()---,
   |  |
   V  |
 close(socket)V
 close(shmem)  mmap

Re: [PATCH v2 00/41] postcopy live migration

2012-06-04 Thread Isaku Yamahata
On Mon, Jun 04, 2012 at 08:37:04PM +0800, Anthony Liguori wrote:
 On 06/04/2012 05:57 PM, Isaku Yamahata wrote:
 After a long time, here is v2. This is the qemu part.
 The Linux kernel part is sent separately.

 Changes v1 -> v2:
 - split up patches for review
 - buffered file refactored
 - many bug fixes
Especially, PV drivers can work with postcopy
 - optimization/heuristic

 Patches
 1 - 30: refactoring existing code and preparation
 31 - 37: implement postcopy itself (essential part)
 38 - 41: some optimization/heuristic for postcopy

 Intro
 =
 This patch series implements postcopy live migration.[1]
 As discussed at KVM Forum 2011, a dedicated character device is used for
 distributed shared memory between the migration source and destination.
 Now we can discuss/benchmark/compare it with precopy. I believe there is
 much room for improvement.

 [1] http://wiki.qemu.org/Features/PostCopyLiveMigration


 Usage
 =
 You need to load the umem character device on the host before starting migration.
 Postcopy can be used with the tcg and kvm accelerators. The implementation depends
 only on the Linux umem character device, and the driver-dependent code is isolated
 in a single file.
 I tested only the host page size == guest page size case, but the implementation
 also allows host page size != guest page size.

 The following options are added with this patch series.
 - incoming part
command line options
-postcopy [-postcopy-flags <flags>]
where <flags> changes the behavior for benchmarking/debugging
Currently the following flags are available
0: default
1: enable touching page request

example:
qemu -postcopy -incoming tcp:0: -monitor stdio -machine accel=kvm

 - outgoing part
options for the migrate command
migrate [-p [-n] [-m]] URI [<prefault forward> [<prefault backward>]]
-p: indicate postcopy migration
-n: disable background page transfer (for benchmarking/debugging)
-m: move background transfer of postcopy mode
<prefault forward>: the number of forward pages sent along with each
 on-demand page
<prefault backward>: the number of backward pages sent along with each
 on-demand page

example:
migrate -p -n tcp:dest ip address:
migrate -p -n -m tcp:dest ip address: 32 0


 TODO
 
 - benchmark/evaluation. Especially how async page fault affects the result.

 I don't mean to beat on a dead horse, but I really don't understand the 
 point of postcopy migration other than the fact that it's possible.  It's 
 a lot of code and a new ABI in an area where we already have too much 
 difficulty maintaining our ABI.

 Without a compelling real world case with supporting benchmarks for why 
 we need postcopy and cannot improve precopy, I'm against merging this.

Some new results are available at 
https://events.linuxfoundation.org/images/stories/pdf/lcjp2012_yamahata_postcopy.pdf

Precopy assumes that the network bandwidth is wide enough and that
the number of dirty pages converges. That does not always hold.

- planned migration
  Predictability of the total migration time is important.

- dynamic consolidation
  In cloud use cases, the resources of a physical machine are usually
  overcommitted.
  When a physical machine becomes overloaded, some VMs are moved to another
  physical host to balance the load.
  Precopy can't move VMs promptly, and compression makes things worse.

- inter-data-center migration
  With L2-over-L3 technology, it has become common to create a virtual
  data center that actually spans multiple physical data centers.
  Migrating VMs between physical data centers is useful for disaster recovery.
  The network bandwidth between DCs is narrower than in the LAN case, so the
  precopy assumption may not hold.

- When network bandwidth is limited by QoS, the precopy assumption
  doesn't hold either.

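The convergence argument can be made concrete with a deliberately simplified model. The model is an assumption for illustration and is not taken from the linked slides. It assumes constant bandwidth and a constant dirty rate, with no overlap between re-dirtied pages and no compression. Under it, each round's remainder is the previous one scaled by dirty_rate / bandwidth, so precopy only converges when the guest dirties memory more slowly than the link can carry it.

```c
/* Simplified precopy model: round k sends what was dirtied during
 * round k-1. Returns the number of iterative rounds before the
 * remainder fits within max_downtime_s, or -1 if it has not converged
 * after max_rounds. */
int sketch_precopy_rounds(double ram_bytes, double bw_bytes_per_s,
                          double dirty_bytes_per_s, double max_downtime_s,
                          int max_rounds)
{
    double remaining = ram_bytes;   /* the first round sends all of RAM */

    for (int round = 1; round <= max_rounds; round++) {
        double t = remaining / bw_bytes_per_s;  /* time to send this round */
        remaining = dirty_bytes_per_s * t;      /* dirtied in the meantime */
        if (remaining > ram_bytes) {
            remaining = ram_bytes;              /* cannot exceed guest RAM */
        }
        if (remaining / bw_bytes_per_s <= max_downtime_s) {
            return round;
        }
    }
    return -1;
}
```

For example, 1000 units of RAM, a bandwidth of 100 units/s and a dirty rate of 10 units/s converge after two rounds with a 0.5 s downtime budget. Raising the dirty rate to 100 units/s never converges.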

thanks,
-- 
yamahata


Re: Fwd: [Qemu-devel] [PATCH v2 00/41] postcopy live migration

2012-06-04 Thread Isaku Yamahata
On Mon, Jun 04, 2012 at 07:27:25AM -0700, Chegu Vinod wrote:
 On 6/4/2012 6:13 AM, Isaku Yamahata wrote:
 On Mon, Jun 04, 2012 at 05:01:30AM -0700, Chegu Vinod wrote:
 Hello Isaku Yamahata,
 Hi.

 I just saw your patches..Would it be possible to email me a tar bundle of 
 these
 patches (makes it easier to apply the patches to a copy of the upstream 
 qemu.git)
 I uploaded them to github for those who are interested in it.

 git://github.com/yamahata/qemu.git qemu-postcopy-june-04-2012
 git://github.com/yamahata/linux-umem.git  linux-umem-june-04-2012


 Thanks for the pointer...
 BTW, I am also curious if you have considered using any kind of RDMA 
 features for
 optimizing the page-faults during postcopy ?
 Yes, RDMA is an interesting topic. Could you share your use cases, concerns, and issues?


 Looking at large sized guests (256GB and higher)  running cpu/memory  
 intensive enterprise workloads.
 The  concerns are the same...i.e. having a predictable total migration  
 time, minimal downtime/freeze-time and of course minimal service  
 degradation to the workload(s) in the VM or the co-located VM's...

 How large of a guest have you tested your changes with and what kind of  
 workloads have you used so far ?

Only with VMs of up to several GB. Of course we'd like to benchmark with a really
huge VM (several hundred GB), but that is somewhat difficult.


 Then we can collaborate.
 You may want to look at Benoit's results.

 Yes. 'have already seen some of Benoit's results.

Great.

 Hence the question about use of RDMA techniques for post copy.

So far my implementation doesn't use RDMA.

 As far as I know, he has not published
 his code yet.

 Thanks
 Vinod


 thanks,

 Thanks
 Vinod




Re: [PATCHv2-RFC 1/2] shpc: standard hot plug controller

2012-02-13 Thread Isaku Yamahata
Oh, nice work.

On Mon, Feb 13, 2012 at 11:15:55AM +0200, Michael S. Tsirkin wrote:
 This adds support for SHPC interface, as defined by PCI Standard
 Hot-Plug Controller and Subsystem Specification, Rev 1.0
 http://www.pcisig.com/specifications/conventional/pci_hot_plug/SHPC_10
 
 Only SHPC integrated with a PCI-to-PCI bridge is supported,
 SHPC integrated with a host bridge would need more work.
 
 All main SHPC features are supported:
 - MRL sensor

Does this just report latch status? (It seems so.)
Do you plan to provide interfaces to manipulate the latch?


 - Attention button
 - Attention indicator
 - Power indicator

 Wake on hotplug and serr generation are stubbed out but unused
 as we don't have interfaces to generate these events ATM.
 
 One issue that isn't completely resolved is that qemu currently
 expects an eject interface, which SHPC does not provide: it merely
 removes power from the device, and it's up to the user to remove the device
 from the slot. This patch works around that by ejecting the device
 when power is removed and power LED goes off.
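The workaround in the paragraph above can be read as a small state check. Once the guest has removed slot power and turned the power indicator off, QEMU treats the device as ejected. The sketch below is hypothetical: `SketchSlot`, `SketchLed` and `sketch_maybe_eject` are illustrative and are not the patch's code.

```c
#include <stdbool.h>

/* Hypothetical indicator states; SHPC indicators can be on, blinking
 * or off. */
typedef enum { LED_ON, LED_BLINK, LED_OFF } SketchLed;

typedef struct {
    bool device_present;
    bool powered;
    SketchLed power_led;
} SketchSlot;

/* Called after the guest changes slot state. SHPC has no eject
 * command, so "power removed and power LED off" is taken as the point
 * where the device is unplugged. Returns true if it ejected. */
bool sketch_maybe_eject(SketchSlot *slot)
{
    if (slot->device_present && !slot->powered && slot->power_led == LED_OFF) {
        slot->device_present = false;  /* stands in for removing the device */
        return true;
    }
    return false;
}
```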
 
 TODO:
 - migration support
 - fix dependency on pci_internals.h

Unless I missed something in the code, these are also TODO:
- QMP command for pushing attention button.
- QMP command to get LED status
- QMP events for LED on/off

thanks,

 Signed-off-by: Michael S. Tsirkin m...@redhat.com
 ---
  Makefile.objs |1 +
  hw/pci.h  |6 +
  hw/shpc.c |  646 
 +
  hw/shpc.h |   40 
  qemu-common.h |1 +
  5 files changed, 694 insertions(+), 0 deletions(-)
  create mode 100644 hw/shpc.c
  create mode 100644 hw/shpc.h
 
 diff --git a/Makefile.objs b/Makefile.objs
 index 391e524..4546477 100644
 --- a/Makefile.objs
 +++ b/Makefile.objs
 @@ -195,6 +195,7 @@ hw-obj-$(CONFIG_VIRTIO_PCI) += virtio-pci.o
  hw-obj-y += fw_cfg.o
  hw-obj-$(CONFIG_PCI) += pci.o pci_bridge.o
  hw-obj-$(CONFIG_PCI) += msix.o msi.o
 +hw-obj-$(CONFIG_PCI) += shpc.o
  hw-obj-$(CONFIG_PCI) += pci_host.o pcie_host.o
  hw-obj-$(CONFIG_PCI) += ioh3420.o xio3130_upstream.o xio3130_downstream.o
  hw-obj-y += watchdog.o
 diff --git a/hw/pci.h b/hw/pci.h
 index 33b0b18..756577e 100644
 --- a/hw/pci.h
 +++ b/hw/pci.h
 @@ -125,6 +125,9 @@ enum {
  /* command register SERR bit enabled */
  #define QEMU_PCI_CAP_SERR_BITNR 4
  QEMU_PCI_CAP_SERR = (1 << QEMU_PCI_CAP_SERR_BITNR),
 +/* Standard hot plug controller. */
 +#define QEMU_PCI_SHPC_BITNR 5
 +QEMU_PCI_CAP_SHPC = (1 << QEMU_PCI_SHPC_BITNR),
  };
  
  #define TYPE_PCI_DEVICE "pci-device"
 @@ -229,6 +232,9 @@ struct PCIDevice {
  /* PCI Express */
  PCIExpressDevice exp;
  
 +/* SHPC */
 +SHPCDevice *shpc;
 +
  /* Location of option rom */
  char *romfile;
  bool has_rom;
 diff --git a/hw/shpc.c b/hw/shpc.c
 new file mode 100644
 index 000..4baec29
 --- /dev/null
 +++ b/hw/shpc.c
 @@ -0,0 +1,646 @@
 +#include <strings.h>
 +#include <stdint.h>
 +#include "range.h"
 +#include "shpc.h"
 +#include "pci.h"
 +#include "pci_internals.h"
 +
 +/* TODO: model power only and disabled slot states. */
 +/* TODO: handle SERR and wakeups */
 +/* TODO: consider enabling 66MHz support */
 +
 +/* TODO: remove fully only on state DISABLED and LED off.
 + * track state to properly record this. */
 +
 +/* SHPC Working Register Set */
 +#define SHPC_BASE_OFFSET  0x00 /* 4 bytes */
 +#define SHPC_SLOTS_33 0x04 /* 4 bytes. Also encodes PCI-X slots. */
 +#define SHPC_SLOTS_66 0x08 /* 4 bytes. */
 +#define SHPC_NSLOTS   0x0C /* 1 byte */
 +#define SHPC_FIRST_DEV    0x0D /* 1 byte */
 +#define SHPC_PHYS_SLOT    0x0E /* 2 bytes */
 +#define SHPC_PHYS_NUM_MAX 0x7ff
 +#define SHPC_PHYS_NUM_UP  0x1000
 +#define SHPC_PHYS_MRL 0x4000
 +#define SHPC_PHYS_BUTTON  0x8000
 +#define SHPC_SEC_BUS  0x10 /* 2 bytes */
 +#define SHPC_SEC_BUS_33   0x0
 +#define SHPC_SEC_BUS_66   0x1 /* Unused */
 +#define SHPC_SEC_BUS_MASK 0x7
 +#define SHPC_MSI_CTL  0x12 /* 1 byte */
 +#define SHPC_PROG_IFC 0x13 /* 1 byte */
 +#define SHPC_PROG_IFC_1_0 0x1
 +#define SHPC_CMD_CODE 0x14 /* 1 byte */
 +#define SHPC_CMD_TRGT 0x15 /* 1 byte */
 +#define SHPC_CMD_TRGT_MIN 0x1
 +#define SHPC_CMD_TRGT_MAX 0x1f
 +#define SHPC_CMD_STATUS   0x16 /* 2 bytes */
 +#define SHPC_CMD_STATUS_BUSY  0x1
 +#define SHPC_CMD_STATUS_MRL_OPEN  0x2
 +#define SHPC_CMD_STATUS_INVALID_CMD   0x4
 +#define SHPC_CMD_STATUS_INVALID_MODE  0x8
 +#define SHPC_INT_LOCATOR  0x18 /* 4 bytes */
 +#define SHPC_INT_COMMAND  0x1
 +#define SHPC_SERR_LOCATOR 0x1C /* 4 bytes */
 +#define SHPC_SERR_INT 0x20 /* 4 bytes */
 +#define SHPC_INT_DIS  0x1
 +#define SHPC_SERR_DIS 0x2
 +#define SHPC_CMD_INT_DIS  0x4
 +#define SHPC_ARB_SERR_DIS 0x8
 +#define SHPC_CMD_DETECTED 0x1
 +#define SHPC_ARB_DETECTED 0x2
 + /* 4 bytes * slot # (start from 0) */
 +#define SHPC_SLOT_REG(s) (0x24 + (s) * 4)
 + /* 2 bytes */
 +#define SHPC_SLOT_STATUS(s)   (0x0 + SHPC_SLOT_REG(s))
 +
 +/* Same slot state masks are used 
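The working-register macros quoted above pack several fields into fixed offsets and bit masks. As a quick illustration of how they decode, here is a small C sketch. Only the macro values come from the patch; the struct and function names are hypothetical, and the meaning of each flag bit is inferred from its macro name rather than checked against the SHPC specification.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Values copied from the quoted hw/shpc.c patch. */
#define SHPC_PHYS_NUM_MAX 0x7ff
#define SHPC_PHYS_NUM_UP  0x1000
#define SHPC_PHYS_MRL     0x4000
#define SHPC_PHYS_BUTTON  0x8000
#define SHPC_SLOT_REG(s)  (0x24 + (s) * 4)

/* Hypothetical decoded view of the 2-byte SHPC_PHYS_SLOT field. */
struct shpc_phys_slot {
    uint16_t num;  /* physical slot number, low 11 bits */
    bool num_up;   /* SHPC_PHYS_NUM_UP bit set */
    bool mrl;      /* SHPC_PHYS_MRL bit set */
    bool button;   /* SHPC_PHYS_BUTTON bit set */
};

static struct shpc_phys_slot shpc_decode_phys_slot(uint16_t reg)
{
    struct shpc_phys_slot d;

    d.num = reg & SHPC_PHYS_NUM_MAX;
    d.num_up = (reg & SHPC_PHYS_NUM_UP) != 0;
    d.mrl = (reg & SHPC_PHYS_MRL) != 0;
    d.button = (reg & SHPC_PHYS_BUTTON) != 0;
    return d;
}

/* Byte offset of slot s's 4-byte register (slots numbered from 0). */
static unsigned shpc_slot_reg_offset(unsigned s)
{
    return SHPC_SLOT_REG(s);
}
```

For example, slot 3's register sits at 0x24 + 3 * 4 = 0x30.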

Re: [PATCHv2-RFC 1/2] shpc: standard hot plug controller

2012-02-13 Thread Isaku Yamahata
On Mon, Feb 13, 2012 at 01:49:32PM +0200, Michael S. Tsirkin wrote:
 On Mon, Feb 13, 2012 at 07:03:52PM +0900, Isaku Yamahata wrote:
  Oh nice work.
  
  On Mon, Feb 13, 2012 at 11:15:55AM +0200, Michael S. Tsirkin wrote:
   This adds support for SHPC interface, as defined by PCI Standard
   Hot-Plug Controller and Subsystem Specification, Rev 1.0
   http://www.pcisig.com/specifications/conventional/pci_hot_plug/SHPC_10
   
   Only SHPC integrated with a PCI-to-PCI bridge is supported,
   SHPC integrated with a host bridge would need more work.
   
   All main SHPC features are supported:
   - MRL sensor
  
  Does this just report latch status? (It seems so.)
 
 What happens is that adding a device closes the latch, removing a device
 opens the latch.  This simplifies the number of supported configurations
 significantly.
 
 
  Do you plan to provide interfaces to manipulate the latch?
 
 I didn't plan to do this, and this is non-trivial.
 Do you just want this for empty slots?  And why?

No, I was just wondering about your plan.


   - Attention button
   - Attention indicator
   - Power indicator
  
   Wake on hotplug and serr generation are stubbed out but unused
   as we don't have interfaces to generate these events ATM.
   
   One issue that isn't completely resolved is that qemu currently
   expects an eject interface, which SHPC does not provide: it merely
   removes the power to device and it's up to the user to remove the device
   from slot. This patch works around that by ejecting the device
   when power is removed and power LED goes off.
   
   TODO:
   - migration support
   - fix dependency on pci_internals.h
  
  If I didn't miss the code,
  - QMP command for pushing attention button.
  - QMP command to get LED status
 
 It's easy to add these, so I'd accept such a patch,
 but I wonder why.

My concern is how libvirt/virt-manager (or another UI) presents
slot status to operators/users.


  - QMP events for LED on/off
 
 There's also blink :)
 
  
  thanks,
 
 I'm concerned that a guest can flood the management with such events.
 It's better to send a single LED change event, then we
 can suppress further events until next get LED status command.

Makes sense.
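Michael's suggestion amounts to a one-shot notification that is re-armed by a status query. A minimal C sketch of that suppression logic follows; the names (`led_set`, `led_query`, `struct led_event_state`) and the integer LED encoding are hypothetical and are not part of the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-slot state; names are illustrative only. */
struct led_event_state {
    int led;            /* current LED state, e.g. 0 = off, 1 = on, 2 = blink */
    bool event_pending; /* an event was sent and not yet followed by a query */
};

/* Called when the guest changes the LED.
 * Returns true if a management event should be emitted now. */
static bool led_set(struct led_event_state *s, int new_led)
{
    if (s->led == new_led) {
        return false;           /* no change, nothing to report */
    }
    s->led = new_led;
    if (s->event_pending) {
        return false;           /* suppressed until management queries */
    }
    s->event_pending = true;
    return true;
}

/* Called for a "get LED status" query; re-arms event delivery. */
static int led_query(struct led_event_state *s)
{
    s->event_pending = false;
    return s->led;
}
```

With this scheme a guest that toggles the LED in a tight loop produces at most one event per management query.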

 
   Signed-off-by: Michael S. Tsirkin m...@redhat.com
   ---
Makefile.objs |    1 +
hw/pci.h      |    6 +
hw/shpc.c     |  646 +
hw/shpc.h     |   40 +
qemu-common.h |    1 +
5 files changed, 694 insertions(+), 0 deletions(-)
create mode 100644 hw/shpc.c
create mode 100644 hw/shpc.h
   
   diff --git a/Makefile.objs b/Makefile.objs
   index 391e524..4546477 100644
   --- a/Makefile.objs
   +++ b/Makefile.objs
   @@ -195,6 +195,7 @@ hw-obj-$(CONFIG_VIRTIO_PCI) += virtio-pci.o
hw-obj-y += fw_cfg.o
hw-obj-$(CONFIG_PCI) += pci.o pci_bridge.o
hw-obj-$(CONFIG_PCI) += msix.o msi.o
   +hw-obj-$(CONFIG_PCI) += shpc.o
hw-obj-$(CONFIG_PCI) += pci_host.o pcie_host.o
hw-obj-$(CONFIG_PCI) += ioh3420.o xio3130_upstream.o xio3130_downstream.o
hw-obj-y += watchdog.o
   diff --git a/hw/pci.h b/hw/pci.h
   index 33b0b18..756577e 100644
   --- a/hw/pci.h
   +++ b/hw/pci.h
   @@ -125,6 +125,9 @@ enum {
/* command register SERR bit enabled */
#define QEMU_PCI_CAP_SERR_BITNR 4
QEMU_PCI_CAP_SERR = (1 << QEMU_PCI_CAP_SERR_BITNR),
   +/* Standard hot plug controller. */
   +#define QEMU_PCI_SHPC_BITNR 5
   +QEMU_PCI_CAP_SHPC = (1 << QEMU_PCI_SHPC_BITNR),
};

#define TYPE_PCI_DEVICE "pci-device"
   @@ -229,6 +232,9 @@ struct PCIDevice {
/* PCI Express */
PCIExpressDevice exp;

   +/* SHPC */
   +SHPCDevice *shpc;
   +
/* Location of option rom */
char *romfile;
bool has_rom;
   diff --git a/hw/shpc.c b/hw/shpc.c
   new file mode 100644
   index 000..4baec29
   --- /dev/null
   +++ b/hw/shpc.c
   @@ -0,0 +1,646 @@
   +#include <strings.h>
   +#include <stdint.h>
   +#include "range.h"
   +#include "shpc.h"
   +#include "pci.h"
   +#include "pci_internals.h"
   +
   +/* TODO: model power only and disabled slot states. */
   +/* TODO: handle SERR and wakeups */
   +/* TODO: consider enabling 66MHz support */
   +
   +/* TODO: remove fully only on state DISABLED and LED off.
   + * track state to properly record this. */
   +
   +/* SHPC Working Register Set */
   +#define SHPC_BASE_OFFSET  0x00 /* 4 bytes */
   +#define SHPC_SLOTS_33 0x04 /* 4 bytes. Also encodes PCI-X slots. */
   +#define SHPC_SLOTS_66 0x08 /* 4 bytes. */
   +#define SHPC_NSLOTS   0x0C /* 1 byte */
   +#define SHPC_FIRST_DEV    0x0D /* 1 byte */
   +#define SHPC_PHYS_SLOT    0x0E /* 2 bytes */
   +#define SHPC_PHYS_NUM_MAX 0x7ff
   +#define SHPC_PHYS_NUM_UP  0x1000
   +#define SHPC_PHYS_MRL 0x4000
   +#define SHPC_PHYS_BUTTON  0x8000
   +#define SHPC_SEC_BUS  0x10 /* 2 bytes */
   +#define SHPC_SEC_BUS_33   0x0
   +#define SHPC_SEC_BUS_66   0x1 /* Unused */
   +#define SHPC_SEC_BUS_MASK 0x7

Re: [PATCH 0/2][RFC] postcopy migration: Linux char device for postcopy

2012-01-12 Thread Isaku Yamahata
Very interesting. We can cooperate on better (postcopy) live migration.
The code doesn't seem to be available yet; I'm eager to see it.


On Fri, Jan 13, 2012 at 01:09:30AM +, Benoit Hudzia wrote:
 Hi,
 
 Sorry to jump to hijack the thread  like that , however i would like
 to just to inform you  that we recently achieve a milestone out of the
 research project I'm leading. We enhanced KVM in order to deliver
 post copy live migration using RDMA at kernel level.
 
 Few point on the architecture of the system :
 
 * RDMA communication engine in kernel ( you can use soft iwarp or soft
 ROCE if you don't have hardware acceleration, however we also support
 standard RDMA enabled NIC) .

Do you mean the InfiniBand subsystem?


 * Naturally, pages are transferred with a zero-copy protocol
 * Leverage the async page fault system.
 * Pre paging / faulting
 * No context switch as everything is handled within kernel and using
 the page fault system.
 * Hybrid migration ( pre + post copy) available

Ah, I've been planning this too.
After the pre-copy phase, is the dirty bitmap sent?

So far I had naively assumed that the pre-copy phase would end after a fixed
number of iterations, whereas you end it with a timeout. Do you have a
rationale for that, or was it just the natural choice for you?
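The two termination criteria discussed here, a fixed number of pre-copy iterations versus a timeout, can be expressed as a single predicate. The sketch below is illustrative only: `struct precopy_policy` and its fields are hypothetical and do not correspond to existing QEMU options. It switches to post-copy as soon as either limit is reached, and a limit of 0 disables that criterion.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical knobs; neither exists as a QEMU option. */
struct precopy_policy {
    unsigned max_iterations; /* stop after this many dirty-page passes (0 = off) */
    uint64_t timeout_ms;     /* or after this much wall-clock time (0 = off) */
};

/* Returns true when the pre-copy phase should end and post-copy start. */
static bool precopy_should_switch(const struct precopy_policy *p,
                                  unsigned iterations_done,
                                  uint64_t elapsed_ms)
{
    if (p->max_iterations && iterations_done >= p->max_iterations) {
        return true;
    }
    if (p->timeout_ms && elapsed_ms >= p->timeout_ms) {
        return true;
    }
    return false;
}
```

An iteration limit bounds the number of passes but not their duration, while a timeout bounds the duration regardless of how fast the guest dirties memory; checking both lets whichever fires first end the phase.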


 * Rely on an independent Kernel Module
 * No modification to the KVM kernel Module
 * Minimal Modification to the Qemu-Kvm code
 * We plan to add the page prioritization algo in order to optimise the
 pre paging algo and background transfer

Where do you plan to implement it: in qemu or in your kernel module?
This algorithm could be shared.
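One simple form of the pre-paging mentioned above is to request a window of neighboring pages around each faulting page. A minimal C sketch follows, assuming hypothetical `backward`/`forward` page counts and a RAM block of `nr_pages` pages. It is not Benoit's prioritization algorithm, which the thread does not describe, and it assumes `fault_pgoff < nr_pages` and that `fault_pgoff + forward` does not overflow.

```c
#include <assert.h>
#include <stdint.h>

/* Half-open page range [start, end). */
struct page_range {
    uint64_t start;
    uint64_t end;
};

/* Window of pages to request around a fault, clipped to [0, nr_pages). */
static struct page_range prefault_window(uint64_t fault_pgoff,
                                         uint64_t backward,
                                         uint64_t forward,
                                         uint64_t nr_pages)
{
    struct page_range r;

    r.start = fault_pgoff > backward ? fault_pgoff - backward : 0;
    r.end = fault_pgoff + forward + 1;   /* include the faulting page */
    if (r.end > nr_pages) {
        r.end = nr_pages;
    }
    return r;
}
```

The faulting page itself would be sent first, with the rest of the window following in the background.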

thanks in advance.

 You can learn a little bit more and see a demo here:
 http://tinyurl.com/8xa2bgl
 I hope to be able to provide more detail on the design soon. As well
 as more concrete demo of the system ( live migration of VM running
 large  enterprise apps such as ERP or In memory DB)
 
 Note: this is just a stepping stone, as the post-copy live migration mainly
 enables us to validate the architecture design and code.
 
 Regards
 Benoit
 
 
 On 12 January 2012 13:59, Avi Kivity a...@redhat.com wrote:
  On 01/04/2012 05:03 AM, Isaku Yamahata wrote:
  Yes, it's quite doable in user space(qemu) with a kernel-enhancement.
  And it would be easy to convert a separated daemon process into a thread
  in qemu.
 
  I think it should be done out side of qemu process for some reasons.
  (I just repeat same discussion at the KVM-forum because no one remembers
  it)
 
  - ptrace (and its variant)
    Some people want to investigate guest ram on host (qemu stopped or lively).
    For example, enhance crash utility and it will attach qemu process and
    debug guest kernel.
 
  To debug the guest kernel you don't need to stop qemu itself.  I agree
  it's a problem for qemu debugging though.
 
 
  - core dump
    qemu process may core-dump.
    As postmortem analysis, people want to investigate guest RAM.
    Again enhance crash utility and it will read the core file and analyze
    guest kernel.
    When creating core, the qemu process is already dead.
 
  Yes, strong point.
 
  It precludes the above possibilities to handle fault in qemu process.
 
  I agree.
 
 
  --
  error compiling committee.c: too many arguments to function
 
  --
  To unsubscribe from this list: send the line unsubscribe kvm in
  the body of a message to majord...@vger.kernel.org
  More majordomo info at  http://vger.kernel.org/majordomo-info.html
 
 
 
 -- 
  The production of too many useful things results in too many useless people
 

-- 
yamahata


Re: [PATCH 0/2][RFC] postcopy migration: Linux char device for postcopy

2012-01-12 Thread Isaku Yamahata
One more question.
Does your architecture/implementation (in theory) allow KVM memory
features like swap, KSM, THP?


On Fri, Jan 13, 2012 at 11:03:23AM +0900, Isaku Yamahata wrote:
 Very interesting. We can cooperate for better (postcopy) live migration.
 The code doesn't seem available yet, I'm eager for it.
 
 
 On Fri, Jan 13, 2012 at 01:09:30AM +, Benoit Hudzia wrote:
  Hi,
  
  Sorry to jump to hijack the thread  like that , however i would like
  to just to inform you  that we recently achieve a milestone out of the
  research project I'm leading. We enhanced KVM in order to deliver
  post copy live migration using RDMA at kernel level.
  
  Few point on the architecture of the system :
  
  * RDMA communication engine in kernel ( you can use soft iwarp or soft
  ROCE if you don't have hardware acceleration, however we also support
  standard RDMA enabled NIC) .
 
 Do you mean infiniband subsystem?
 
 
  * Naturally, pages are transferred with a zero-copy protocol
  * Leverage the async page fault system.
  * Pre paging / faulting
  * No context switch as everything is handled within kernel and using
  the page fault system.
  * Hybrid migration ( pre + post copy) available
 
  Ah, I've been also planning this.
 After pre-copy phase, is the dirty bitmap sent?
 
 So far I've thought naively that pre-copy phase would be finished by the
 number of iterations. On the other hand your choice is timeout of
 pre-copy phase. Do you have rationale? or it was just natural for you?
 
 
  * Rely on an independent Kernel Module
  * No modification to the KVM kernel Module
  * Minimal Modification to the Qemu-Kvm code
  * We plan to add the page prioritization algo in order to optimise the
  pre paging algo and background transfer
 
 Where do you plan to implement? in qemu or in your kernel module?
 This algo could be shared.
 
 thanks in advance.
 
  You can learn a little bit more and see a demo here:
  http://tinyurl.com/8xa2bgl
  I hope to be able to provide more detail on the design soon. As well
  as more concrete demo of the system ( live migration of VM running
  large  enterprise apps such as ERP or In memory DB)
  
  Note: this is just a stepping stone, as the post-copy live migration mainly
  enables us to validate the architecture design and code.
  
  Regards
  Benoit
  
  
  On 12 January 2012 13:59, Avi Kivity a...@redhat.com wrote:
   On 01/04/2012 05:03 AM, Isaku Yamahata wrote:
   Yes, it's quite doable in user space(qemu) with a kernel-enhancement.
   And it would be easy to convert a separated daemon process into a thread
   in qemu.
  
   I think it should be done out side of qemu process for some reasons.
   (I just repeat same discussion at the KVM-forum because no one remembers
   it)
  
   - ptrace (and its variant)
     Some people want to investigate guest ram on host (qemu stopped or lively).
     For example, enhance crash utility and it will attach qemu process and
     debug guest kernel.
  
   To debug the guest kernel you don't need to stop qemu itself.  I agree
   it's a problem for qemu debugging though.
  
  
   - core dump
     qemu process may core-dump.
     As postmortem analysis, people want to investigate guest RAM.
     Again enhance crash utility and it will read the core file and analyze
     guest kernel.
     When creating core, the qemu process is already dead.
  
   Yes, strong point.
  
   It precludes the above possibilities to handle fault in qemu process.
  
   I agree.
  
  
   --
   error compiling committee.c: too many arguments to function
  
  
  
  
  -- 
   The production of too many useful things results in too many useless people
  
 
 -- 
 yamahata
 

-- 
yamahata


Re: [PATCH 0/2][RFC] postcopy migration: Linux char device for postcopy

2012-01-03 Thread Isaku Yamahata
On Mon, Jan 02, 2012 at 06:05:51PM +0100, Andrea Arcangeli wrote:
 On Thu, Dec 29, 2011 at 06:01:45PM +0200, Avi Kivity wrote:
  On 12/29/2011 06:00 PM, Avi Kivity wrote:
   The NFS client has exactly the same issue, if you mount it with the intr
   option.  In fact you could use the NFS client as a trivial umem/cuse
   prototype.
  
  Actually, NFS can return SIGBUS, it doesn't care about restarting daemons.
 
 During KVMForum I suggested to a few people that it could be done
 entirely in userland with PROT_NONE. So the problem is if we do it in
 userland with the current functionality you'll run out of VMAs and
 slowdown performance too much.
 
 But all you need is the ability to map single pages in the address
 space. The only special requirement is that a new vma must not be
 created during the map operation. It'd be very similar to
 remap_file_pages for MAP_SHARED, it also was created to avoid having
 to create new vmas on a large MAP_SHARED mapping and no other reason
 at all. In our case we deal with a large MAP_ANONYMOUS mapping and we
 must alter the pte without creating new vmas but the problem is very
 similar to remap_file_pages.
 
 Qemu in the dst node can do:
 
   mmap(MAP_ANONYMOUS)
   fault_area_prepare(start, end, signalnr)
 
 prepare_fault_area will map the range with the magic pte.
 
 Then when the signalnr fires, you do:
 
  send(givemepageX)
  recv(tmpaddr_aligned, PAGE_SIZE,...);
  fault_area_map(final_dest_aligned, tmpaddr_aligned, size)
 
 map_fault_area will check the pgprot of the two vmas mapping
 final_dest_aligned and tmpaddr_aligned have the same vma->vm_pgprot
 and various other vma bits, and if all ok, it'll just copy the pte
 from tmpaddr_aligned, to final_dest_aligned and it'll update the
 page->index. It can fail if the page is shared to avoid dealing with
 the non-linearity of the page mapped in multiple vmas.
 
 You basically need a bypass to avoid altering the pgprot of the vma,
 and enter into the pte a magic thing that fires signal handlers
 if accessed, without having to create new vmas. gup/gup_fast and stuff
 should just always fallback into handle_mm_fault when encountering such a
 thing, so returning failure as if gup_fast was run on a address beyond
 the end of the i_size in the MAP_SHARED case.

Yes, it's quite doable in user space (qemu) with a kernel enhancement.
And it would be easy to convert a separate daemon process into a thread
in qemu.
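For comparison, the purely user-space variant Andrea mentions can be built today from mmap(), mprotect() and a SIGSEGV handler, without the proposed fault_area_prepare()/fault_area_map() calls, which are proposals rather than existing syscalls. The sketch below is a single-process toy under stated assumptions: `fetch_page()` is a hypothetical stand-in for requesting the page from the migration source, and the per-page mprotect() is exactly what splits the mapping into many VMAs, the scalability problem Andrea points out. Note also that mprotect() is not on POSIX's async-signal-safe list; calling it from a SIGSEGV handler is a Linux-specific practice.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static char *region;
static size_t region_len;
static long page_size;
static volatile sig_atomic_t faults_served;

/* Hypothetical stand-in for "send(givemepageX); recv(...)": fills the page
 * with a byte derived from its index instead of fetching real contents. */
static void fetch_page(char *page, size_t index)
{
    memset(page, (int)(index & 0xff), (size_t)page_size);
}

static void segv_handler(int sig, siginfo_t *info, void *ctx)
{
    uintptr_t addr = (uintptr_t)info->si_addr;
    uintptr_t base = (uintptr_t)region;
    (void)ctx;

    if (addr < base || addr >= base + region_len) {
        /* Not a lazily populated page: fall back to the default action. */
        signal(sig, SIG_DFL);
        raise(sig);
        return;
    }
    char *page = (char *)(addr & ~((uintptr_t)page_size - 1));
    /* Making one page accessible may split the VMA in two or three. */
    mprotect(page, (size_t)page_size, PROT_READ | PROT_WRITE);
    fetch_page(page, (size_t)(page - region) / (size_t)page_size);
    faults_served++;
}

/* Maps npages of PROT_NONE memory and installs the fault handler.
 * Returns 0 on success, -1 on failure. */
static int lazy_region_init(size_t npages)
{
    struct sigaction sa;

    page_size = sysconf(_SC_PAGESIZE);
    region_len = npages * (size_t)page_size;
    region = mmap(NULL, region_len, PROT_NONE,
                  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (region == MAP_FAILED) {
        return -1;
    }
    memset(&sa, 0, sizeof(sa));
    sa.sa_sigaction = segv_handler;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGSEGV, &sa, NULL);
}
```

After `lazy_region_init(4)`, the first access to any page traps into the handler, which populates that page and lets the faulting instruction restart.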

I think it should be done outside of the qemu process, for the following
reasons. (I'm just repeating the discussion from KVM Forum, because no one
remembers it.)

- ptrace (and its variants)
  Some people want to inspect guest RAM from the host (with qemu stopped or
  running). For example, the crash utility could be enhanced to attach to the
  qemu process and debug the guest kernel.

- core dump
  The qemu process may dump core.
  For postmortem analysis, people want to inspect guest RAM.
  Again, the crash utility could be enhanced to read the core file and
  analyze the guest kernel.
  By the time the core is written, the qemu process is already dead.

Handling the faults inside the qemu process precludes both of the above.


 THP already works on /dev/zero mmaps as long as it's a MAP_PRIVATE,
 KSM should work too but I doubt anybody tested it on MAP_PRIVATE of
 /dev/zero.

Oh great. So it seems to work generally with anonymous pages in a
non-anonymous VMA. Is that right?
If so, THP/KSM would work with mmap(MAP_PRIVATE, /dev/umem...), wouldn't they?


 The device driver provides an advantage in being self contained but I
 doubt it's simpler. I suppose after migration is complete you'll still
 switch the vma back to regular anonymous vma so leading to the same
 result?

Yes, that was my original intention.
The pages are anonymous, but the VMA isn't. I was concerned that
KSM/THP don't work with such pages.
If they do work, there is no need to switch the VMA to anonymous.


 The patch 2/2 is small and self contained so it's quite attractive, I
 didn't see patch 1/2, was it posted?

Yes, it was posted. It's short and trivial: it just does EXPORT_SYMBOL_GPL of
mem_cgroup_cache_charge and shmem_zero_setup.
I include it here for convenience.

From e8bfda16a845eef4381872a331c6f0f200c3f7d7 Mon Sep 17 00:00:00 2001
Message-Id: e8bfda16a845eef4381872a331c6f0f200c3f7d7.1325055066.git.yamah...@valinux.co.jp
In-Reply-To: cover.1325055065.git.yamah...@valinux.co.jp
References: cover.1325055065.git.yamah...@valinux.co.jp
From: Isaku Yamahata yamah...@valinux.co.jp
Date: Thu, 11 Aug 2011 20:05:28 +0900
Subject: [PATCH 1/2] export necessary symbols

Signed-off-by: Isaku Yamahata yamah...@valinux.co.jp
---
 mm/memcontrol.c |1 +
 mm/shmem.c  |1 +
 2 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index b63f5f7..85530fc 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2807,6 +2807,7 @@ int mem_cgroup_cache_charge(struct page *page, struct mm_struct *mm,
 
return ret;
 }
+EXPORT_SYMBOL_GPL(mem_cgroup_cache_charge

Re: [PATCH 21/21] postcopy: implement postcopy livemigration

2012-01-03 Thread Isaku Yamahata
On Thu, Dec 29, 2011 at 06:06:10PM +0200, Avi Kivity wrote:
 On 12/29/2011 03:26 AM, Isaku Yamahata wrote:
  This patch implements postcopy livemigration.
 
   
  +/* RAM is allocated via umem for postcopy incoming mode */
  +#define RAM_POSTCOPY_UMEM_MASK  (1 << 1)
  +
   typedef struct RAMBlock {
   uint8_t *host;
   ram_addr_t offset;
  @@ -485,6 +488,10 @@ typedef struct RAMBlock {
   #if defined(__linux__) && !defined(TARGET_S390X)
   int fd;
   #endif
  +
  +#ifdef CONFIG_POSTCOPY
  +UMem *umem;/* for incoming postcopy mode */
  +#endif
   } RAMBlock;
 
 Is it possible to implement this via the MemoryListener API (which
 replaces CPUPhysMemoryClient)?  This is how kvm, vhost, and xen manage
 their memory tables.

I'm afraid not. The three you listed above are for the outgoing side,
but this case is for the incoming side, and the requirements are quite
different from those three. What is needed is:
- get the corresponding RAMBlock and UMem from (id, idlen)
- hook ram_alloc/ram_free (or the corresponding RAM API)
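The first requirement, looking up a block by the (id, idlen) pair carried in the migration stream, could look roughly like the sketch below. The struct is a hypothetical, trimmed-down stand-in for RAMBlock/UMem, and the id read from the stream is assumed not to be NUL-terminated, hence the explicit length comparison.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct umem; /* opaque here */

/* Hypothetical, simplified stand-in for RAMBlock. */
struct ram_block {
    const char *idstr;      /* NUL-terminated block name */
    struct umem *umem;      /* incoming postcopy backing, may be NULL */
    struct ram_block *next;
};

/* Find the block whose name equals the first idlen bytes of id. */
static struct ram_block *ram_block_find(struct ram_block *head,
                                        const char *id, size_t idlen)
{
    for (struct ram_block *b = head; b; b = b->next) {
        if (strlen(b->idstr) == idlen && memcmp(b->idstr, id, idlen) == 0) {
            return b;
        }
    }
    return NULL;
}
```

The block names used in the test ("pc.ram", "vga.vram") are just example strings.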

thanks,
-- 
yamahata


Re: [Qemu-devel] [PATCH 00/21][RFC] postcopy live migration

2012-01-03 Thread Isaku Yamahata
On Thu, Dec 29, 2011 at 04:39:52PM -0600, Anthony Liguori wrote:
 TODO
 
 - benchmark/evaluation. Especially how async page fault affects the result.

 I'll review this series next week (Mike/Juan, please also review when you 
 can).

 But we really need to think hard about whether this is the right thing to 
 take into the tree.  I worry a lot about the fact that we don't test 
 pre-copy migration nearly enough and adding a second form just introduces 
 more things to test.

 It's also not clear to me why post-copy is better.  If you were going to 
 sit down and explain to someone building a management tool when they 
 should use pre-copy and when they should use post-copy, what would you 
 tell them?

A concrete patch and its benchmark/evaluation results will help a lot in
reaching a better discussion/decision (whatever decision we make).

My answer is: follow the same policy as for block devices.
There we support block migration, copy-on-read, image streaming, live block
copy... (some of them are still under development, though)

Seriously, we'll learn the best practice through evaluation and experience.

thanks,
-- 
yamahata


Re: [PATCH 0/2][RFC] postcopy migration: Linux char device for postcopy

2011-12-29 Thread Isaku Yamahata
On Thu, Dec 29, 2011 at 04:55:11PM +0200, Avi Kivity wrote:
 On 12/29/2011 04:49 PM, Isaku Yamahata wrote:
Great, then we agreed with list/reattach basically.
(Maybe identity scheme needs reconsideration.)
   
   I guess we miscommunicated.  Why is reattach needed?  If you have the
   fd, nothing else is needed.
 
  What if malicious process close the fd and does page fault intentionally?
  Unkillable process issue remains.
  I think we are talking not only qemu case but also general case.
 
 It's not unkillable.  If you sleep with TASK_INTERRUPTIBLE then you can
 process signals.  This includes SIGKILL.

Hmm, you said that the fault handler doesn't resolve the page fault.

  Don't resolve the page fault.  It's up to the user/system to make sure
  it happens.  qemu can easily do it by watching for the daemon's death
  and respawning it.

To let the process be killed, the fault handler must return, resolving the
fault somehow. It must return something. What do you expect: VM_FAULT_SIGBUS?
A zero page?
-- 
yamahata


Re: [PATCH 2/2] umem: chardevice for kvm postcopy

2011-12-29 Thread Isaku Yamahata
Thank you for the review.

On Thu, Dec 29, 2011 at 01:17:51PM +0200, Avi Kivity wrote:
  +   default n
  +   help
  + User process backed memory driver provides /dev/umem device.
  + The /dev/umem device is designed for some sort of distributed
  + shared memory. Especially post-copy live migration with KVM.
  + When in doubt, say N.
  +
 
 Need documentation of the protocol between the kernel and userspace; not
 just the ioctls, but also how faults are propagated.

Will do.

 
  +
  +struct umem_page_req_list {
  +   struct list_head list;
  +   pgoff_t pgoff;
  +};
  +
 
  +
  +
  +static int umem_mark_page_cached(struct umem *umem,
  +struct umem_page_cached *page_cached)
  +{
  +   int ret = 0;
  +#define PG_MAX ((__u32)32)
  +   __u64 pgoffs[PG_MAX];
  +   __u32 nr;
  +   unsigned long bit;
  +   bool wake_up_list = false;
  +
  +   nr = 0;
  +   while (nr < page_cached->nr) {
  +   __u32 todo = min(PG_MAX, (page_cached->nr - nr));
  +   int i;
  +
  +   if (copy_from_user(pgoffs, page_cached->pgoffs + nr,
  +  sizeof(*pgoffs) * todo)) {
  +   ret = -EFAULT;
  +   goto out;
  +   }
  +   for (i = 0; i < todo; ++i) {
  +   if (pgoffs[i] >= umem->pgoff_end) {
  +   ret = -EINVAL;
  +   goto out;
  +   }
  +   set_bit(pgoffs[i], umem->cached);
  +   }
  +   nr += todo;
  +   }
  +
 
 Probably need an smp_wmb() where.
 
  +   spin_lock(&umem->lock);
  +   bit = 0;
  +   for (;;) {
  +   bit = find_next_bit(umem->sync_wait_bitmap, umem->sync_req_max,
  +   bit);
  +   if (bit >= umem->sync_req_max)
  +   break;
  +   if (test_bit(umem->sync_req[bit], umem->cached))
  +   wake_up(&umem->page_wait[bit]);
 
 Why not do this test in the loop above?
 
  +   bit++;
  +   }
  +
  +   if (umem->req_list_nr > 0)
  +   wake_up_list = true;
  +   spin_unlock(&umem->lock);
  +
  +   if (wake_up_list)
  +   wake_up_all(&umem->req_list_wait);
  +
  +out:
  +   return ret;
  +}
  +
  +
  +
  +static void umem_put(struct umem *umem)
  +{
  +   int ret;
  +
  +   mutex_lock(&umem_list_mutex);
  +   ret = kref_put(&umem->kref, umem_free);
  +   if (ret == 0) {
  +   mutex_unlock(&umem_list_mutex);
  +   }
 
 This looks wrong.
 
  +}
  +
  +
  +static int umem_create_umem(struct umem_create *create)
  +{
  +   int error = 0;
  +   struct umem *umem = NULL;
  +   struct vm_area_struct *vma;
  +   int shmem_fd;
  +   unsigned long bitmap_bytes;
  +   unsigned long sync_bitmap_bytes;
  +   int i;
  +
  +   umem = kzalloc(sizeof(*umem), GFP_KERNEL);
  +   umem->name = create->name;
  +   kref_init(&umem->kref);
  +   INIT_LIST_HEAD(&umem->list);
  +
  +   mutex_lock(&umem_list_mutex);
  +   error = umem_add_list(umem);
  +   if (error) {
  +   goto out;
  +   }
  +
  +   umem->task = NULL;
  +   umem->mmapped = false;
  +   spin_lock_init(&umem->lock);
  +   umem->size = roundup(create->size, PAGE_SIZE);
  +   umem->pgoff_end = umem->size >> PAGE_SHIFT;
  +   init_waitqueue_head(&umem->req_wait);
  +
  +   vma = &umem->vma;
  +   vma->vm_start = 0;
  +   vma->vm_end = umem->size;
  +   /* this shmem file is used for temporal buffer for pages
  +  so it's unlikely that so many pages exists in this shmem file */
  +   vma->vm_flags = VM_READ | VM_SHARED | VM_NOHUGEPAGE | VM_DONTCOPY |
  +   VM_DONTEXPAND;
  +   vma->vm_page_prot = vm_get_page_prot(vma->vm_flags);
  +   vma->vm_pgoff = 0;
  +   INIT_LIST_HEAD(&vma->anon_vma_chain);
  +
  +   shmem_fd = get_unused_fd();
  +   if (shmem_fd < 0) {
  +   error = shmem_fd;
  +   goto out;
  +   }
  +   error = shmem_zero_setup(vma);
  +   if (error < 0) {
  +   put_unused_fd(shmem_fd);
  +   goto out;
  +   }
  +   umem->shmem_filp = vma->vm_file;
  +   get_file(umem->shmem_filp);
  +   fd_install(shmem_fd, vma->vm_file);
  +   create->shmem_fd = shmem_fd;
  +
  +   create->umem_fd = anon_inode_getfd("umem",
  +  &umem_fops, umem, O_RDWR);
  +   if (create->umem_fd < 0) {
  +   error = create->umem_fd;
  +   goto out;
  +   }
  +
  +   bitmap_bytes = umem_bitmap_bytes(umem);
  +   if (bitmap_bytes > PAGE_SIZE) {
  +   umem->cached = vzalloc(bitmap_bytes);
  +   umem->faulted = vzalloc(bitmap_bytes);
  +   } else {
  +   umem->cached = kzalloc(bitmap_bytes, GFP_KERNEL);
  +   umem->faulted = kzalloc(bitmap_bytes, GFP_KERNEL);
  +   }
  +
  +   /* those constants are not exported.
  +  They are just used for default value */
  +#define KVM_MAX_VCPUS  256
  +#define ASYNC_PF_PER_VCPU 64
 
 Best to avoid defaults and require userspace choose.

Okay.


  +
  +#define ASYNC_REQ_MAX  (ASYNC_PF_PER_VCPU * KVM_MAX_VCPUS)
  +   if (create->async_req_max == 0)
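On Avi's remark that umem_put() "looks wrong": as quoted, the mutex is unlocked only when kref_put() returns 0 (the object is still alive), so on the release path dropping the lock is left to umem_free(), which is not shown. One common way to avoid that asymmetry is to unlink and free under the lock and then unlock exactly once on every path. The user-space sketch below illustrates that pattern with pthreads; all names are hypothetical and it is not the umem code.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Objects live on a global list protected by list_mutex. */
static pthread_mutex_t list_mutex = PTHREAD_MUTEX_INITIALIZER;

struct obj {
    int refcount;      /* protected by list_mutex */
    struct obj *next;  /* global list linkage */
};

static struct obj *obj_list;
static int objs_freed;

/* Drop a reference. The mutex is taken once and released exactly once on
 * every path, so callers never need to know whether the object was freed. */
static void obj_put(struct obj *o)
{
    pthread_mutex_lock(&list_mutex);
    if (--o->refcount == 0) {
        /* Last reference: unlink while still holding the lock. */
        for (struct obj **pp = &obj_list; *pp; pp = &(*pp)->next) {
            if (*pp == o) {
                *pp = o->next;
                break;
            }
        }
        free(o);
        objs_freed++;
    }
    pthread_mutex_unlock(&list_mutex);
}
```

Because lookups on the list also take `list_mutex`, no other thread can find and re-reference the object between the final decrement and the unlink.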
  

Re: [PATCH 0/2][RFC] postcopy migration: Linux char device for postcopy

2011-12-29 Thread Isaku Yamahata
On Thu, Dec 29, 2011 at 02:55:42PM +0200, Avi Kivity wrote:
 On 12/29/2011 02:39 PM, Isaku Yamahata wrote:
ioctl commands:
   
UMEM_DEV_CRATE_UMEM: create umem device for qemu
UMEM_DEV_LIST: list created umem devices
UMEM_DEV_REATTACH: re-attach the created umem device
  UMEM_DEV_LIST and UMEM_DEV_REATTACH are used when
  the process that services page fault disappears or gets stuck.
  Then, administrator can list the umem devices and unblock
  the process which is waiting for page.
   
   Ah, I asked about this in my patch comments.  I think this is done
   better by using SCM_RIGHTS to pass fds along, or asking qemu to launch a
   new process.
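The fd-passing half of Avi's suggestion relies on SCM_RIGHTS ancillary data over an AF_UNIX socket. Below is a minimal C sketch of sending and receiving one descriptor; in the umem case the descriptor would presumably be the umem fd, which is an assumption rather than something the patch does.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Send one file descriptor over a connected AF_UNIX socket. */
static int send_fd(int sock, int fd)
{
    char byte = 0; /* at least one byte of ordinary data must be sent */
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {
        struct cmsghdr hdr; /* ensures correct alignment */
        char buf[CMSG_SPACE(sizeof(int))];
    } ctrl;
    struct msghdr msg;

    memset(&msg, 0, sizeof(msg));
    memset(&ctrl, 0, sizeof(ctrl));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof(ctrl.buf);

    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one file descriptor; returns the new fd, or -1. */
static int recv_fd(int sock)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {
        struct cmsghdr hdr;
        char buf[CMSG_SPACE(sizeof(int))];
    } ctrl;
    struct msghdr msg;
    int fd = -1;

    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof(ctrl.buf);

    if (recvmsg(sock, &msg, 0) != 1) {
        return -1;
    }
    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    if (cmsg && cmsg->cmsg_level == SOL_SOCKET &&
        cmsg->cmsg_type == SCM_RIGHTS &&
        cmsg->cmsg_len == CMSG_LEN(sizeof(int))) {
        memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    }
    return fd;
}
```

The received descriptor refers to the same open file description as the sender's, so whoever holds it can keep using the device after the original holder exits.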
 
  Can you please elaborate? I don't think the ways you are suggesting solve
  the issue. Let me clarify the problem.
 
process A (typically incoming qemu)
   |
   | mmap(/dev/umem) and access those pages triggering page faults
   | (the file descriptor might be closed after mmap() before page faults)
   |
   V
 /dev/umem
   ^
   |
   |
 daemon X resolving page faults triggered by process A
 (typically this daemon forked from incoming qemu:process A)
 
  If daemon X disappears accidentally, there is no one that resolves
  page faults of process A. At this moment process A is blocked due to page
  fault. There is no file descriptor available corresponding to the VMA.
  Here there is no way to kill process A, but system reboot.
 
 qemu can have an extra thread that wait4()s the daemon, and relaunch
 it.  This extra thread would not be blocked by the page fault.  It can
 keep the fd so it isn't lost.
 
 The unkillability of process A is a security issue; it could be done on
 purpose.  Is it possible to change umem to sleep with
 TASK_INTERRUPTIBLE, so it can be killed?
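Avi's first suggestion can be sketched as a supervisor loop meant to run on a dedicated qemu thread that never touches the postcopy RAM, so it cannot itself block on an unresolved fault. The sketch uses waitpid() rather than wait4(), which is equivalent here because resource usage is not needed. `run_daemon()` is a hypothetical placeholder that exits immediately to simulate a crash, and `max_restarts` exists only so the example terminates.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical daemon body: exits at once to simulate a crash. */
static void run_daemon(void)
{
    _exit(1);
}

/* Relaunches the daemon each time it dies, up to max_restarts times.
 * Returns how many times the daemon was launched, or -1 on error. */
static int supervise_daemon(int max_restarts)
{
    int launches = 0;

    for (;;) {
        pid_t pid = fork();
        if (pid < 0) {
            return -1;
        }
        if (pid == 0) {
            run_daemon(); /* child: never returns */
        }
        launches++;

        int status;
        if (waitpid(pid, &status, 0) < 0) {
            return -1;
        }
        /* The daemon died. In the real setup this thread would keep the
         * umem fd open and hand it to the next instance. */
        if (launches > max_restarts) {
            return launches;
        }
    }
}
```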

The issue is how to resolve the page fault, not whether to sleep with
TASK_INTERRUPTIBLE or TASK_UNINTERRUPTIBLE.
I can think of several options:
- When daemon X is dead, all page faults are served with zero pages.
- When daemon X is dead, all page faults are resolved as VM_FAULT_SIGBUS.
- list/reattach: complicated, and you don't like it.
- other?


   Introducing a global namespace has a lot of complications attached.
   
   
UMEM_GET_PAGE_REQUEST: retrieve page fault of qemu process
UMEM_MARK_PAGE_CACHED: mark the specified pages pulled from the source
   for daemon
   
UMEM_MAKE_VMA_ANONYMOUS: make the specified vma in the qemu process
 anonymous. This is _NOT_ implemented yet.
 I'm not sure whether this can be implemented or not.
   
   How do we find out?  This is fairly important, stuff like transparent
   hugepages and ksm only works on anonymous memory.
 
  I agree that this is important.
  At KVM-forum 2011, Andrea said THP and KSM works with non-anonymous VMA.
  (Or at least he'll look into those stuff. My memory is vague, though.
   Please correct me if I'm wrong)
 
 += Andrea (who can also provide feedback on umem in general)
 
 -- 
 error compiling committee.c: too many arguments to function
 

-- 
yamahata

