Re: [Qemu-devel] [PATCH] Fix two XBZRLE corruption issues
+            /* Can't send this cached data async, since the cache page
+             * might get updated before it gets to the wire
+             */
+            send_async = false;
         }
     }

     /* XBZRLE overflow or normal page */
     if (bytes_sent == -1) {
         bytes_sent = save_block_hdr(f, block, offset, cont, RAM_SAVE_FLAG_PAGE);
-        qemu_put_buffer_async(f, p, TARGET_PAGE_SIZE);
+        if (send_async) {
+            qemu_put_buffer_async(f, p, TARGET_PAGE_SIZE);
+        } else {
+            qemu_put_buffer(f, p, TARGET_PAGE_SIZE);
+        }
         bytes_sent += TARGET_PAGE_SIZE;
         acct_info.norm_pages++;
     }
diff --git a/include/migration/page_cache.h b/include/migration/page_cache.h
index d156f0d..2d5ce2d 100644
--- a/include/migration/page_cache.h
+++ b/include/migration/page_cache.h
@@ -66,7 +66,7 @@ uint8_t *get_cached_data(const PageCache *cache, uint64_t addr);
  * @addr: page address
  * @pdata: pointer to the page
  */
-int cache_insert(PageCache *cache, uint64_t addr, uint8_t *pdata);
+int cache_insert(PageCache *cache, uint64_t addr, const uint8_t *pdata);

 /**
  * cache_resize: resize the page cache. In case of size reduction the extra
diff --git a/page_cache.c b/page_cache.c
index 3ef6ee7..b033681 100644
--- a/page_cache.c
+++ b/page_cache.c
@@ -150,7 +150,7 @@ uint8_t *get_cached_data(const PageCache *cache, uint64_t addr)
     return cache_get_by_addr(cache, addr)->it_data;
 }

-int cache_insert(PageCache *cache, uint64_t addr, uint8_t *pdata)
+int cache_insert(PageCache *cache, uint64_t addr, const uint8_t *pdata)
 {
     CacheItem *it = NULL;

Reviewed-by: Orit Wasserman
Re: [Qemu-devel] [pve-devel] QEMU Live Migration - swap_free: Bad swap file entry
On 02/11/2014 03:33 PM, Stefan Priebe - Profihost AG wrote:

On 11.02.2014 14:32, Orit Wasserman wrote:

On 02/08/2014 09:23 PM, Stefan Priebe wrote:

I could fix it by explicitly disabling XBZRLE - it seems it is automatically on if I do not set the migration caps to false. So it seems to be an XBZRLE bug.

XBZRLE is disabled by default (actually all capabilities are off by default). What version of QEMU are you using that you need to disable it explicitly? Maybe you ran migration with XBZRLE and canceled it, so it stays on?

No real idea why this happens - but yes, this seems to be a problem for me.

I checked upstream QEMU and it is still off by default (always has been). But the bug in XBZRLE is still there ;-) We need to understand the exact scenario in order to understand the problem. What exact version of QEMU are you using? Can you try with the latest upstream version? There were some fixes to the XBZRLE code.

Stefan

Orit

Stefan

On 07.02.2014 21:10, Stefan Priebe wrote:

On 07.02.2014 21:02, Dr. David Alan Gilbert wrote:

* Stefan Priebe (s.pri...@profihost.ag) wrote:

Anything I could try or debug to help find the problem?

I think the most useful thing would be to see whether the problem is new in the 1.7 you're using or has existed for a while; depending on the machine type you used, it might be possible to load that image on an earlier (or newer) qemu and try the same test. However, if the problem doesn't repeat reliably it can be hard.

I've seen this first with QEMU 1.5 but was not able to reproduce it for months. 1.4 was working fine.

If you have any way of simplifying the configuration of the VM it would be good; e.g. if you could get a failure on something without graphics (-nographic) and USB.

Sadly not ;-(

Dave

Stefan

On 07.02.2014 14:45, Stefan Priebe - Profihost AG wrote:

It's always the same "pattern": there are too many 0s instead of Xs. Only seen:

read:0x ... expected:0x
or
read:0x ... expected:0x
or
read:0xbf00bf00 ... expected:0xbfffbfff
or
read:0x ... expected:0xb5b5b5b5b5b5b5b5

No idea if this helps.

Stefan

On 07.02.2014 14:39, Stefan Priebe - Profihost AG wrote:

Hi,

On 07.02.2014 14:19, Paolo Bonzini wrote:

On 07/02/2014 14:04, Stefan Priebe - Profihost AG wrote:

First of all, I now have a memory image of a VM where I can reproduce it.

You mean you start that VM with -incoming 'exec:cat /path/to/vm.img'? But the google stress test doesn't report any error until you start migration _and_ it finishes?

Sorry, no - I meant I have a VM where I saved the memory to disk, so I don't need to wait hours until I can reproduce it, as it does not happen with a freshly started VM. So it's a state file, I think.

Another test: start the VM with -S, then migrate - do errors appear on the destination?

I started with -S and the errors appear AFTER resuming/unpausing the VM. So it is fine until I resume it on the "new" host.

Stefan

--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
Re: [Qemu-devel] [pve-devel] QEMU Live Migration - swap_free: Bad swap file entry
On 02/08/2014 09:23 PM, Stefan Priebe wrote:

I could fix it by explicitly disabling XBZRLE - it seems it is automatically on if I do not set the migration caps to false. So it seems to be an XBZRLE bug.

XBZRLE is disabled by default (actually all capabilities are off by default). What version of QEMU are you using that you need to disable it explicitly? Maybe you ran migration with XBZRLE and canceled it, so it stays on?

Orit

Stefan

On 07.02.2014 21:10, Stefan Priebe wrote:

On 07.02.2014 21:02, Dr. David Alan Gilbert wrote:

* Stefan Priebe (s.pri...@profihost.ag) wrote:

Anything I could try or debug to help find the problem?

I think the most useful thing would be to see whether the problem is new in the 1.7 you're using or has existed for a while; depending on the machine type you used, it might be possible to load that image on an earlier (or newer) qemu and try the same test. However, if the problem doesn't repeat reliably it can be hard.

I've seen this first with QEMU 1.5 but was not able to reproduce it for months. 1.4 was working fine.

If you have any way of simplifying the configuration of the VM it would be good; e.g. if you could get a failure on something without graphics (-nographic) and USB.

Sadly not ;-(

Dave

Stefan

On 07.02.2014 14:45, Stefan Priebe - Profihost AG wrote:

It's always the same "pattern": there are too many 0s instead of Xs. Only seen:

read:0x ... expected:0x
or
read:0x ... expected:0x
or
read:0xbf00bf00 ... expected:0xbfffbfff
or
read:0x ... expected:0xb5b5b5b5b5b5b5b5

No idea if this helps.

Stefan

On 07.02.2014 14:39, Stefan Priebe - Profihost AG wrote:

Hi,

On 07.02.2014 14:19, Paolo Bonzini wrote:

On 07/02/2014 14:04, Stefan Priebe - Profihost AG wrote:

First of all, I now have a memory image of a VM where I can reproduce it.

You mean you start that VM with -incoming 'exec:cat /path/to/vm.img'? But the google stress test doesn't report any error until you start migration _and_ it finishes?

Sorry, no - I meant I have a VM where I saved the memory to disk, so I don't need to wait hours until I can reproduce it, as it does not happen with a freshly started VM. So it's a state file, I think.

Another test: start the VM with -S, then migrate - do errors appear on the destination?

I started with -S and the errors appear AFTER resuming/unpausing the VM. So it is fine until I resume it on the "new" host.

Stefan

--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
Re: [Qemu-devel] migration question: disk images on nfs server
On 02/07/2014 03:44 PM, Marcin Gibuła wrote:

On 07.02.2014 14:36, Orit Wasserman wrote:

Do you know if this applies to Linux O_DIRECT writes as well?

From the man page of open(2):

The behaviour of O_DIRECT with NFS will differ from local filesystems. Older kernels, or kernels configured in certain ways, may not support this combination. The NFS protocol does not support passing the flag to the server, so O_DIRECT I/O will bypass the page cache only on the client; the server may still cache the I/O. The client asks the server to make the I/O synchronous to preserve the synchronous semantics of O_DIRECT. Some servers will perform poorly under these circumstances, especially if the I/O size is small. Some servers may also be configured to lie to clients about the I/O having reached stable storage; this will avoid the performance penalty at some risk to data integrity in the event of server power failure. The Linux NFS client places no alignment restrictions on O_DIRECT I/O.

To summarize: it depends on your kernel (NFS client).

So, assuming a new kernel (where NFS O_DIRECT translates to no cache on the client side) and a cache-coherent server, is it enough, or is a 'sync' mount (or the O_SYNC flag) still required for some reason?

I think it should be enough.
Re: [Qemu-devel] migration question: disk images on nfs server
On 02/07/2014 02:54 PM, Marcin Gibuła wrote:

It is more an NFS issue: if you have a file on NFS that two users on two different hosts are accessing (at least one of them writing to it), you will need to enforce the "sync" option. Even if you flush all the data and close the file, the NFS client can still have cached data that it didn't sync to the server.

Do you know if this applies to Linux O_DIRECT writes as well?

From the man page of open(2):

The behaviour of O_DIRECT with NFS will differ from local filesystems. Older kernels, or kernels configured in certain ways, may not support this combination. The NFS protocol does not support passing the flag to the server, so O_DIRECT I/O will bypass the page cache only on the client; the server may still cache the I/O. The client asks the server to make the I/O synchronous to preserve the synchronous semantics of O_DIRECT. Some servers will perform poorly under these circumstances, especially if the I/O size is small. Some servers may also be configured to lie to clients about the I/O having reached stable storage; this will avoid the performance penalty at some risk to data integrity in the event of server power failure. The Linux NFS client places no alignment restrictions on O_DIRECT I/O.

To summarize: it depends on your kernel (NFS client).

From the comment in fs/nfs/direct.c:

 * When an application requests uncached I/O, all read and write requests
 * are made directly to the server; data stored or fetched via these
 * requests is not cached in the Linux page cache. The client does not
 * correct unaligned requests from applications. All requested bytes are
 * held on permanent storage before a direct write system call returns to
 * an application.
Re: [Qemu-devel] migration question: disk images on nfs server
On 02/07/2014 02:10 PM, Alexey Kardashevskiy wrote:

On 07.02.2014 18:46, Orit Wasserman wrote:

On 02/07/2014 06:35 AM, Alexey Kardashevskiy wrote:

Hi! I have yet another problem with migration. Or NFS.

There is one NFS server and 2 test POWER8 machines. There is a shared NFS folder on the server, mounted on both test hosts, and a qcow2 image (abc.qcow2) in that shared folder. We start a guest with this abc.qcow2 on test machine #1, and start another guest on test machine #2 with "-incoming ..." and the same abc.qcow2. Now we start migration. In most cases it goes fine, but if we put some load on machine #1, the destination guest sometimes crashes. I blame out-of-sync NFS on the test machines.

I looked a bit further into QEMU and could not find a spot where it would fflush(abc.qcow2) or close it or do any other sync, so it is up to the host NFS mountpoint to decide when to sync, and it definitely does not get a clue when to do this. I do not really understand why the abc.qcow2 image is still open - should it not be closed after migration succeeded? What do I miss here? Should we switch from NFS to GlusterFS (is it always synchronized)? Or if we want NFS, should we just boot our guests with "root=/dev/nfs ip=dhcp nfsroot=..." and avoid using disk images on network disks? Thanks!

For NFS you need to use the sync mount option to force the NFS client to sync to the server on writes.

So there is no sync of any kind in QEMU after migration finishes, correct? Looks like too much to enforce the "sync" option for NFS when we really need it only once.

It is more an NFS issue: if you have a file on NFS that two users on two different hosts are accessing (at least one of them writing to it), you will need to enforce the "sync" option. Even if you flush all the data and close the file, the NFS client can still have cached data that it didn't sync to the server.
Re: [Qemu-devel] migration question: disk images on nfs server
On 02/07/2014 06:35 AM, Alexey Kardashevskiy wrote:

Hi! I have yet another problem with migration. Or NFS.

There is one NFS server and 2 test POWER8 machines. There is a shared NFS folder on the server, mounted on both test hosts, and a qcow2 image (abc.qcow2) in that shared folder. We start a guest with this abc.qcow2 on test machine #1, and start another guest on test machine #2 with "-incoming ..." and the same abc.qcow2. Now we start migration. In most cases it goes fine, but if we put some load on machine #1, the destination guest sometimes crashes. I blame out-of-sync NFS on the test machines.

I looked a bit further into QEMU and could not find a spot where it would fflush(abc.qcow2) or close it or do any other sync, so it is up to the host NFS mountpoint to decide when to sync, and it definitely does not get a clue when to do this. I do not really understand why the abc.qcow2 image is still open - should it not be closed after migration succeeded? What do I miss here? Should we switch from NFS to GlusterFS (is it always synchronized)? Or if we want NFS, should we just boot our guests with "root=/dev/nfs ip=dhcp nfsroot=..." and avoid using disk images on network disks? Thanks!

For NFS you need to use the sync mount option to force the NFS client to sync to the server on writes.

Orit
Re: [Qemu-devel] QEMU Live Migration - swap_free: Bad swap file entry
On 02/06/2014 09:20 AM, Stefan Priebe - Profihost AG wrote:

On 05.02.2014 21:15, Dr. David Alan Gilbert wrote:

* Stefan Priebe (s.pri...@profihost.ag) wrote:

Hello,

after live migrating machines with a lot of memory (32GB, 48GB, ...) I see pretty often crashing services after migration, and the guest kernel prints:

[1707620.031806] swap_free: Bad swap file entry 00377410
[1707620.031806] swap_free: Bad swap file entry 00593c48
[1707620.031807] swap_free: Bad swap file entry 03201430
[1707620.031807] swap_free: Bad swap file entry 01bc5900
[1707620.031807] swap_free: Bad swap file entry 0173ce40
[1707620.031808] swap_free: Bad swap file entry 011c0270
[1707620.031808] swap_free: Bad swap file entry 03c58ae8
[1707660.749059] BUG: Bad rss-counter state mm:88064d09f380 idx:1 val:1536
[1707660.749937] BUG: Bad rss-counter state mm:88064d09f380 idx:2 val:-1536

Is this live migration with shared storage? What kind of shared storage? Does this happen with smaller guests?

Qemu is 1.7. Does anybody know a fix?

I don't, but some more information about:

1) What guest you're running

Linux guest - the output above is also from the guest. Kernel 3.10.26.

2) The configuration of your hosts

What do you mean by that?

3) The command line (or XML if you're running libvirt) for your qemu, so we can see what devices you're running.

qemu -chardev socket,id=qmp,path=/var/run/qemu-server/179.qmp,server,nowait -mon chardev=qmp,mode=control -vnc unix:/var/run/qemu-server/179.vnc,x509,password -pidfile /var/run/qemu-server/179.pid -daemonize -name K31953 -smp sockets=1,cores=16 -nodefaults -boot menu=on,strict=on,reboot-timeout=1000 -vga cirrus -cpu kvm64,+lahf_lm,+x2apic,+sep -k de -m 32768 -device piix3-usb-uhci,id=uhci,bus=pci.0,addr=0x1.0x2 -device usb-tablet,id=tablet,bus=uhci.0,port=1 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3 -drive if=none,id=drive-ide2,media=cdrom,aio=native -device ide-cd,bus=ide.1,unit=0,drive=drive-ide2,id=ide2,bootindex=200 -device virtio-scsi-pci,id=scsihw0,bus=pci.0,addr=0x5 -drive file=rbd:...,if=none,id=drive-scsi0,iops_rd=1000,iops_wr=500,bps_rd=314572800,bps_wr=209715200,aio=native,discard=on -device scsi-hd,bus=scsihw0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0,id=scsi0,bootindex=100 -netdev type=tap,id=net0,ifname=tap179i0,script=/var/lib/qemu-server/pve-bridge,vhost=on -device virtio-net-pci,mac=CA:CA:23:AC:2D:C5,netdev=net0,bus=pci.0,addr=0x12,id=net0,bootindex=300 -rtc base=localtime -machine type=pc-i440fx-1.7

Do you get any messages on either the source or destination qemu during the migrate?

No.

Stefan
[Qemu-devel] [PATCH] Remove trailing space from error message
Signed-off-by: Orit Wasserman
---
 migration.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/migration.c b/migration.c
index 25add6f..cc18f6c 100644
--- a/migration.c
+++ b/migration.c
@@ -482,7 +482,7 @@ void qmp_migrate_set_cache_size(int64_t value, Error **errp)
     /* Cache should not be larger than guest ram size */
     if (value > ram_bytes_total()) {
         error_set(errp, QERR_INVALID_PARAMETER_VALUE, "cache size",
-                  "exceeds guest ram size ");
+                  "exceeds guest ram size");
         return;
     }
-- 
1.8.3.1
Re: [Qemu-devel] [PATCH 5/8] XBZRLE cache size should not be larger than guest memory size
On 02/04/2014 06:26 PM, Eric Blake wrote:

On 02/04/2014 08:19 AM, Juan Quintela wrote:

From: Orit Wasserman

Signed-off-by: Orit Wasserman
Signed-off-by: Juan Quintela
---
 migration.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/migration.c b/migration.c
index 46a7305..25add6f 100644
--- a/migration.c
+++ b/migration.c
@@ -479,6 +479,13 @@ void qmp_migrate_set_cache_size(int64_t value, Error **errp)
         return;
     }

+    /* Cache should not be larger than guest ram size */
+    if (value > ram_bytes_total()) {
+        error_set(errp, QERR_INVALID_PARAMETER_VALUE, "cache size",
+                  "exceeds guest ram size ");

Trailing space in the error message.

I will send a separate patch to fix this.

Orit
Re: [Qemu-devel] [PATCH v2 4/6] XBZRLE cache size should not be larger than guest memory size
On 01/30/2014 08:48 PM, Dr. David Alan Gilbert wrote:

* Orit Wasserman (owass...@redhat.com) wrote:

On 01/30/2014 08:23 PM, Dr. David Alan Gilbert wrote:

* Orit Wasserman (owass...@redhat.com) wrote:

Signed-off-by: Orit Wasserman
---
 migration.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/migration.c b/migration.c
index 46a7305..25add6f 100644
--- a/migration.c
+++ b/migration.c
@@ -479,6 +479,13 @@ void qmp_migrate_set_cache_size(int64_t value, Error **errp)
         return;
     }

+    /* Cache should not be larger than guest ram size */

Why? (It's admittedly odd, but does it actually break something if it's larger?)

Because of how XBZRLE works: the idea is that for a workload that changes the same pages frequently, we can reduce the amount of transferred data by sending only the diff. We also compress the diff itself. The cache is used to store the previous page so we can calculate the diff, so at most it will contain all the guest pages.

It's a hash based cache though, isn't it - so there will be some contention for a cache size == ram size case?

Not really, because the hash function is so simple it becomes like an index into an array.

Also this does mean that you have to be a little careful to pick a sane XBZRLE cache size, since one that's too large will now fail; I can only see that being a problem on a machine with a mix of huge and tiny VMs.

Yes, if you pick the cache too small then you will have lots of misses and the compression won't be useful. If you pick it too large then you have wasted memory for nothing. What is also problematic is picking the right workload: it will only be effective if between each migration iteration the same pages are changed.

(I sent the Reviewed-by tag separately.)

Thanks,
Orit

Dave

--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
Re: [Qemu-devel] [PATCH v2 4/6] XBZRLE cache size should not be larger than guest memory size
On 01/30/2014 08:23 PM, Dr. David Alan Gilbert wrote:

* Orit Wasserman (owass...@redhat.com) wrote:

Signed-off-by: Orit Wasserman
---
 migration.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/migration.c b/migration.c
index 46a7305..25add6f 100644
--- a/migration.c
+++ b/migration.c
@@ -479,6 +479,13 @@ void qmp_migrate_set_cache_size(int64_t value, Error **errp)
         return;
     }

+    /* Cache should not be larger than guest ram size */

Why? (It's admittedly odd, but does it actually break something if it's larger?)

Because of how XBZRLE works: the idea is that for a workload that changes the same pages frequently, we can reduce the amount of transferred data by sending only the diff. We also compress the diff itself. The cache is used to store the previous page so we can calculate the diff, so at most it will contain all the guest pages.

Orit

Dave

+    if (value > ram_bytes_total()) {
+        error_set(errp, QERR_INVALID_PARAMETER_VALUE, "cache size",
+                  "exceeds guest ram size ");
+        return;
+    }
+
     new_size = xbzrle_cache_resize(value);
     if (new_size < 0) {
         error_set(errp, QERR_INVALID_PARAMETER_VALUE, "cache size",
-- 
1.8.3.1

--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
[Qemu-devel] [PATCH v2 6/6] Don't abort on memory allocation error
It is better to fail migration in case of failure to allocate a new cache item.

Signed-off-by: Orit Wasserman
---
 arch_init.c                    |  4 +++-
 include/migration/page_cache.h |  4 +++-
 page_cache.c                   | 16 +++++++++++-----
 3 files changed, 17 insertions(+), 7 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index 1fa5f1f..80574a0 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -284,7 +284,9 @@ static int save_xbzrle_page(QEMUFile *f, uint8_t *current_data,
     if (!cache_is_cached(XBZRLE.cache, current_addr)) {
         if (!last_stage) {
-            cache_insert(XBZRLE.cache, current_addr, current_data);
+            if (cache_insert(XBZRLE.cache, current_addr, current_data) == -1) {
+                return -1;
+            }
         }
         acct_info.xbzrle_cache_miss++;
         return -1;
diff --git a/include/migration/page_cache.h b/include/migration/page_cache.h
index 87894fe..d156f0d 100644
--- a/include/migration/page_cache.h
+++ b/include/migration/page_cache.h
@@ -60,11 +60,13 @@ uint8_t *get_cached_data(const PageCache *cache, uint64_t addr);
  * cache_insert: insert the page into the cache. the page cache
  * will dup the data on insert. the previous value will be overwritten
  *
+ * Returns -1 on error
+ *
  * @cache pointer to the PageCache struct
  * @addr: page address
  * @pdata: pointer to the page
  */
-void cache_insert(PageCache *cache, uint64_t addr, uint8_t *pdata);
+int cache_insert(PageCache *cache, uint64_t addr, uint8_t *pdata);

 /**
  * cache_resize: resize the page cache. In case of size reduction the extra
diff --git a/page_cache.c b/page_cache.c
index 62a53f8..3ef6ee7 100644
--- a/page_cache.c
+++ b/page_cache.c
@@ -150,7 +150,7 @@ uint8_t *get_cached_data(const PageCache *cache, uint64_t addr)
     return cache_get_by_addr(cache, addr)->it_data;
 }

-void cache_insert(PageCache *cache, uint64_t addr, uint8_t *pdata)
+int cache_insert(PageCache *cache, uint64_t addr, uint8_t *pdata)
 {
     CacheItem *it = NULL;

@@ -161,16 +161,22 @@ void cache_insert(PageCache *cache, uint64_t addr, uint8_t *pdata)
     /* actual update of entry */
     it = cache_get_by_addr(cache, addr);

-    /* free old cached data if any */
-    g_free(it->it_data);
-
+    /* allocate page */
     if (!it->it_data) {
+        it->it_data = g_try_malloc(cache->page_size);
+        if (!it->it_data) {
+            DPRINTF("Error allocating page\n");
+            return -1;
+        }
         cache->num_items++;
     }

-    it->it_data = g_memdup(pdata, cache->page_size);
+    memcpy(it->it_data, pdata, cache->page_size);
+
     it->it_age = ++cache->max_item_age;
     it->it_addr = addr;
+
+    return 0;
 }

 int64_t cache_resize(PageCache *cache, int64_t new_num_pages)
-- 
1.8.3.1
[Qemu-devel] [PATCH v2 5/6] Don't abort on out of memory when creating page cache
Signed-off-by: Orit Wasserman
---
 arch_init.c  | 18 ++++++++++++++++--
 page_cache.c | 18 ++++++++++++++----
 2 files changed, 30 insertions(+), 6 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index 5eff80b..1fa5f1f 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -664,8 +664,22 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
             DPRINTF("Error creating cache\n");
             return -1;
         }
-        XBZRLE.encoded_buf = g_malloc0(TARGET_PAGE_SIZE);
-        XBZRLE.current_buf = g_malloc(TARGET_PAGE_SIZE);
+
+        /* We prefer not to abort if there is no memory */
+        XBZRLE.encoded_buf = g_try_malloc0(TARGET_PAGE_SIZE);
+        if (!XBZRLE.encoded_buf) {
+            DPRINTF("Error allocating encoded_buf\n");
+            return -1;
+        }
+
+        XBZRLE.current_buf = g_try_malloc(TARGET_PAGE_SIZE);
+        if (!XBZRLE.current_buf) {
+            DPRINTF("Error allocating current_buf\n");
+            g_free(XBZRLE.encoded_buf);
+            XBZRLE.encoded_buf = NULL;
+            return -1;
+        }
+
         acct_clear();
     }
diff --git a/page_cache.c b/page_cache.c
index a05db64..62a53f8 100644
--- a/page_cache.c
+++ b/page_cache.c
@@ -60,8 +60,12 @@ PageCache *cache_init(int64_t num_pages, unsigned int page_size)
         return NULL;
     }

-    cache = g_malloc(sizeof(*cache));
-
+    /* We prefer not to abort if there is no memory */
+    cache = g_try_malloc(sizeof(*cache));
+    if (!cache) {
+        DPRINTF("Failed to allocate cache\n");
+        return NULL;
+    }
     /* round down to the nearest power of 2 */
     if (!is_power_of_2(num_pages)) {
         num_pages = pow2floor(num_pages);
@@ -74,8 +78,14 @@ PageCache *cache_init(int64_t num_pages, unsigned int page_size)
     DPRINTF("Setting cache buckets to %" PRId64 "\n", cache->max_num_items);

-    cache->page_cache = g_malloc((cache->max_num_items) *
-                                 sizeof(*cache->page_cache));
+    /* We prefer not to abort if there is no memory */
+    cache->page_cache = g_try_malloc((cache->max_num_items) *
+                                     sizeof(*cache->page_cache));
+    if (!cache->page_cache) {
+        DPRINTF("Failed to allocate cache->page_cache\n");
+        g_free(cache);
+        return NULL;
+    }

     for (i = 0; i < cache->max_num_items; i++) {
         cache->page_cache[i].it_data = NULL;
-- 
1.8.3.1
[Qemu-devel] [PATCH v2 2/6] Add check for cache size smaller than page size
Signed-off-by: Orit Wasserman
Reviewed-by: Juan Quintela
---
 arch_init.c |  4 ++++
 migration.c | 10 +++++++++-
 2 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/arch_init.c b/arch_init.c
index 66f5e82..8edeabe 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -178,6 +178,10 @@ static struct {

 int64_t xbzrle_cache_resize(int64_t new_size)
 {
+    if (new_size < TARGET_PAGE_SIZE) {
+        return -1;
+    }
+
     if (XBZRLE.cache != NULL) {
         return cache_resize(XBZRLE.cache, new_size / TARGET_PAGE_SIZE) *
             TARGET_PAGE_SIZE;
diff --git a/migration.c b/migration.c
index 7235c23..84587e9 100644
--- a/migration.c
+++ b/migration.c
@@ -469,6 +469,7 @@ void qmp_migrate_cancel(Error **errp)
 void qmp_migrate_set_cache_size(int64_t value, Error **errp)
 {
     MigrationState *s = migrate_get_current();
+    int64_t new_size;

     /* Check for truncation */
     if (value != (size_t)value) {
@@ -477,7 +478,14 @@ void qmp_migrate_set_cache_size(int64_t value, Error **errp)
         return;
     }

-    s->xbzrle_cache_size = xbzrle_cache_resize(value);
+    new_size = xbzrle_cache_resize(value);
+    if (new_size < 0) {
+        error_set(errp, QERR_INVALID_PARAMETER_VALUE, "cache size",
+                  "is smaller than page size");
+        return;
+    }
+
+    s->xbzrle_cache_size = new_size;
 }

 int64_t qmp_query_migrate_cache_size(Error **errp)
-- 
1.8.3.1
[Qemu-devel] [PATCH v2 3/6] migration:fix free XBZRLE decoded_buf wrong
From: "Gonglei (Arei)"

When qemu does live migration with XBZRLE, qemu mallocs decoded_buf at the destination end but frees it at the source end. It can crash qemu with a double free error in some scenarios. Split the XBZRLE structure for a clear logical distinction between the src/dst sides.

Signed-off-by: ChenLiang
Reviewed-by: Peter Maydell
Reviewed-by: Orit Wasserman
Signed-off-by: GongLei
---
 arch_init.c                   | 22 ++++++++++++----------
 include/migration/migration.h |  1 +
 migration.c                   |  1 +
 3 files changed, 14 insertions(+), 10 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index 8edeabe..5eff80b 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -164,17 +164,15 @@ static struct {
     uint8_t *encoded_buf;
     /* buffer for storing page content */
     uint8_t *current_buf;
-    /* buffer used for XBZRLE decoding */
-    uint8_t *decoded_buf;
     /* Cache for XBZRLE */
     PageCache *cache;
 } XBZRLE = {
     .encoded_buf = NULL,
     .current_buf = NULL,
-    .decoded_buf = NULL,
     .cache = NULL,
 };
-
+/* buffer used for XBZRLE decoding */
+static uint8_t *xbzrle_decoded_buf;

 int64_t xbzrle_cache_resize(int64_t new_size)
 {
@@ -606,6 +604,12 @@ uint64_t ram_bytes_total(void)
     return total;
 }

+void free_xbzrle_decoded_buf(void)
+{
+    g_free(xbzrle_decoded_buf);
+    xbzrle_decoded_buf = NULL;
+}
+
 static void migration_end(void)
 {
     if (migration_bitmap) {
@@ -619,11 +623,9 @@ static void migration_end(void)
         g_free(XBZRLE.cache);
         g_free(XBZRLE.encoded_buf);
         g_free(XBZRLE.current_buf);
-        g_free(XBZRLE.decoded_buf);
         XBZRLE.cache = NULL;
         XBZRLE.encoded_buf = NULL;
         XBZRLE.current_buf = NULL;
-        XBZRLE.decoded_buf = NULL;
     }
 }

@@ -814,8 +816,8 @@ static int load_xbzrle(QEMUFile *f, ram_addr_t addr, void *host)
     unsigned int xh_len;
     int xh_flags;

-    if (!XBZRLE.decoded_buf) {
-        XBZRLE.decoded_buf = g_malloc(TARGET_PAGE_SIZE);
+    if (!xbzrle_decoded_buf) {
+        xbzrle_decoded_buf = g_malloc(TARGET_PAGE_SIZE);
     }

     /* extract RLE header */
@@ -832,10 +834,10 @@ static int load_xbzrle(QEMUFile *f, ram_addr_t addr, void *host)
         return -1;
     }

     /* load data and decode */
-    qemu_get_buffer(f, XBZRLE.decoded_buf, xh_len);
+    qemu_get_buffer(f, xbzrle_decoded_buf, xh_len);

     /* decode RLE */
-    ret = xbzrle_decode_buffer(XBZRLE.decoded_buf, xh_len, host,
+    ret = xbzrle_decode_buffer(xbzrle_decoded_buf, xh_len, host,
                                TARGET_PAGE_SIZE);
     if (ret == -1) {
         fprintf(stderr, "Failed to load XBZRLE page - decode error!\n");
diff --git a/include/migration/migration.h b/include/migration/migration.h
index bfa3951..3e1e6c7 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -109,6 +109,7 @@ MigrationState *migrate_get_current(void);
 uint64_t ram_bytes_remaining(void);
 uint64_t ram_bytes_transferred(void);
 uint64_t ram_bytes_total(void);
+void free_xbzrle_decoded_buf(void);

 void acct_update_position(QEMUFile *f, size_t size, bool zero);
diff --git a/migration.c b/migration.c
index 84587e9..46a7305 100644
--- a/migration.c
+++ b/migration.c
@@ -105,6 +105,7 @@ static void process_incoming_migration_co(void *opaque)
     ret = qemu_loadvm_state(f);
     qemu_fclose(f);
+    free_xbzrle_decoded_buf();
     if (ret < 0) {
         fprintf(stderr, "load of migration failed\n");
         exit(EXIT_FAILURE);
-- 
1.8.3.1
[Qemu-devel] [PATCH v2 4/6] XBZRLE cache size should not be larger than guest memory size
Signed-off-by: Orit Wasserman
---
 migration.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/migration.c b/migration.c
index 46a7305..25add6f 100644
--- a/migration.c
+++ b/migration.c
@@ -479,6 +479,13 @@ void qmp_migrate_set_cache_size(int64_t value, Error **errp)
         return;
     }

+    /* Cache should not be larger than guest ram size */
+    if (value > ram_bytes_total()) {
+        error_set(errp, QERR_INVALID_PARAMETER_VALUE, "cache size",
+                  "exceeds guest ram size ");
+        return;
+    }
+
     new_size = xbzrle_cache_resize(value);
     if (new_size < 0) {
         error_set(errp, QERR_INVALID_PARAMETER_VALUE, "cache size",
-- 
1.8.3.1
[Qemu-devel] [PATCH v2 1/6] Set xbzrle buffers to NULL after freeing them to avoid double free errors
Signed-off-by: Orit Wasserman
Reviewed-by: Juan Quintela
---
 arch_init.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch_init.c b/arch_init.c
index 77912e7..66f5e82 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -617,6 +617,9 @@ static void migration_end(void)
         g_free(XBZRLE.current_buf);
         g_free(XBZRLE.decoded_buf);
         XBZRLE.cache = NULL;
+        XBZRLE.encoded_buf = NULL;
+        XBZRLE.current_buf = NULL;
+        XBZRLE.decoded_buf = NULL;
     }
 }
-- 
1.8.3.1
[Qemu-devel] [PATCH v2 0/6] XBZRLE Fixes
Fix memory leak and missing NULLs. Better cache size validation checks. Fail migration instead of aborting QEMU in case there is not enough memory for the page cache.

Changes from v1:
- Free encoded_buf in case of failure to allocate current_buf.
- Update num_items after successful allocation.

Gonglei (Arei) (1):
  migration:fix free XBZRLE decoded_buf wrong

Orit Wasserman (5):
  Set xbzrle buffers to NULL after freeing them to avoid double free errors
  Add check for cache size smaller than page size
  XBZRLE cache size should not be larger than guest memory size
  Don't abort on out of memory when creating page cache
  Don't abort on memory allocation error

 arch_init.c                    | 49 +++++++++++++++++++++++++++---------
 include/migration/migration.h  |  1 +
 include/migration/page_cache.h |  4 +++-
 migration.c                    | 18 +++++++++++---
 page_cache.c                   | 34 +++++++++++++++++---------
 5 files changed, 83 insertions(+), 23 deletions(-)

-- 
1.8.3.1
Re: [Qemu-devel] [PATCH 6/6] Don't abort on memory allocation error
On 01/30/2014 06:08 PM, Dr. David Alan Gilbert wrote:

* Orit Wasserman (owass...@redhat.com) wrote:

It is better to fail migration in case of failure to allocate a new cache item.

Does this actually fail migration or just drop back to sending an uncompressed page? (I think the latter, and that's an even better result.)

It fails the migration; if the user wants to disable compression he can do so and start migration again.

Signed-off-by: Orit Wasserman
---
 arch_init.c                    |  4 +++-
 include/migration/page_cache.h |  4 +++-
 page_cache.c                   | 16 +++++++++++-----
 3 files changed, 17 insertions(+), 7 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index 806d096..0bfbc5a 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -284,7 +284,9 @@ static int save_xbzrle_page(QEMUFile *f, uint8_t *current_data,
     if (!cache_is_cached(XBZRLE.cache, current_addr)) {
         if (!last_stage) {
-            cache_insert(XBZRLE.cache, current_addr, current_data);
+            if (cache_insert(XBZRLE.cache, current_addr, current_data) == -1) {
+                return -1;
+            }
         }
         acct_info.xbzrle_cache_miss++;
         return -1;
diff --git a/include/migration/page_cache.h b/include/migration/page_cache.h
index 87894fe..d156f0d 100644
--- a/include/migration/page_cache.h
+++ b/include/migration/page_cache.h
@@ -60,11 +60,13 @@ uint8_t *get_cached_data(const PageCache *cache, uint64_t addr);
  * cache_insert: insert the page into the cache. the page cache
  * will dup the data on insert. the previous value will be overwritten
  *
+ * Returns -1 on error
+ *
  * @cache pointer to the PageCache struct
  * @addr: page address
  * @pdata: pointer to the page
  */
-void cache_insert(PageCache *cache, uint64_t addr, uint8_t *pdata);
+int cache_insert(PageCache *cache, uint64_t addr, uint8_t *pdata);

 /**
  * cache_resize: resize the page cache. In case of size reduction the extra
diff --git a/page_cache.c b/page_cache.c
index 62a53f8..69e8329 100644
--- a/page_cache.c
+++ b/page_cache.c
@@ -150,7 +150,7 @@ uint8_t *get_cached_data(const PageCache *cache, uint64_t addr)
     return cache_get_by_addr(cache, addr)->it_data;
 }

-void cache_insert(PageCache *cache, uint64_t addr, uint8_t *pdata)
+int cache_insert(PageCache *cache, uint64_t addr, uint8_t *pdata)
 {
     CacheItem *it = NULL;

@@ -161,16 +161,22 @@ void cache_insert(PageCache *cache, uint64_t addr, uint8_t *pdata)
     /* actual update of entry */
     it = cache_get_by_addr(cache, addr);

-    /* free old cached data if any */
-    g_free(it->it_data);
-
+    /* allocate page */
     if (!it->it_data) {
         cache->num_items++;
+        it->it_data = g_try_malloc(cache->page_size);
+        if (!it->it_data) {
+            DPRINTF("Error allocating page\n");
+            return -1;
+        }

It feels like it would have been correct to do the num_items++ after the allocation test (although since num_items doesn't seem to be used anywhere that's not a show-stopper, but worth fixing sometime).

You are correct, I will move it after the allocation.

Orit

     }

-    it->it_data = g_memdup(pdata, cache->page_size);
+    memcpy(it->it_data, pdata, cache->page_size);
+
     it->it_age = ++cache->max_item_age;
     it->it_addr = addr;
+
+    return 0;
 }

 int64_t cache_resize(PageCache *cache, int64_t new_num_pages)
-- 
1.8.3.1

Reviewed-by: Dr. David Alan Gilbert

Dave

--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
Re: [Qemu-devel] [PATCH 5/6] Don't abort on out of memory when creating page cache
On 01/30/2014 05:48 PM, Dr. David Alan Gilbert wrote: * Orit Wasserman (owass...@redhat.com) wrote: Signed-off-by: Orit Wasserman --- arch_init.c | 16 ++-- page_cache.c | 18 ++ 2 files changed, 28 insertions(+), 6 deletions(-) diff --git a/arch_init.c b/arch_init.c index 5eff80b..806d096 100644 --- a/arch_init.c +++ b/arch_init.c @@ -664,8 +664,20 @@ static int ram_save_setup(QEMUFile *f, void *opaque) DPRINTF("Error creating cache\n"); return -1; } -XBZRLE.encoded_buf = g_malloc0(TARGET_PAGE_SIZE); -XBZRLE.current_buf = g_malloc(TARGET_PAGE_SIZE); + +/* We prefer not to abort if there is no memory */ +XBZRLE.encoded_buf = g_try_malloc0(TARGET_PAGE_SIZE); +if (!XBZRLE.encoded_buf) { +DPRINTF("Error allocating encoded_buf\n"); +return -1; +} + +XBZRLE.current_buf = g_try_malloc(TARGET_PAGE_SIZE); +if (!XBZRLE.current_buf) { +DPRINTF("Error allocating current_buf\n"); +return -1; +} Would it be best to free encoded_buf in this second exit case? It is freed in migration_end (that is called when migration fails or canceled). Orit Dave -- Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
[Qemu-devel] [PATCH 6/6] Don't abort on memory allocation error
It is better to fail migration in case of failure to allocate new cache item Signed-off-by: Orit Wasserman --- arch_init.c| 4 +++- include/migration/page_cache.h | 4 +++- page_cache.c | 16 +++- 3 files changed, 17 insertions(+), 7 deletions(-) diff --git a/arch_init.c b/arch_init.c index 806d096..0bfbc5a 100644 --- a/arch_init.c +++ b/arch_init.c @@ -284,7 +284,9 @@ static int save_xbzrle_page(QEMUFile *f, uint8_t *current_data, if (!cache_is_cached(XBZRLE.cache, current_addr)) { if (!last_stage) { -cache_insert(XBZRLE.cache, current_addr, current_data); +if (cache_insert(XBZRLE.cache, current_addr, current_data) == -1) { +return -1; +} } acct_info.xbzrle_cache_miss++; return -1; diff --git a/include/migration/page_cache.h b/include/migration/page_cache.h index 87894fe..d156f0d 100644 --- a/include/migration/page_cache.h +++ b/include/migration/page_cache.h @@ -60,11 +60,13 @@ uint8_t *get_cached_data(const PageCache *cache, uint64_t addr); * cache_insert: insert the page into the cache. the page cache * will dup the data on insert. the previous value will be overwritten * + * Returns -1 on error + * * @cache pointer to the PageCache struct * @addr: page address * @pdata: pointer to the page */ -void cache_insert(PageCache *cache, uint64_t addr, uint8_t *pdata); +int cache_insert(PageCache *cache, uint64_t addr, uint8_t *pdata); /** * cache_resize: resize the page cache. 
In case of size reduction the extra diff --git a/page_cache.c b/page_cache.c index 62a53f8..69e8329 100644 --- a/page_cache.c +++ b/page_cache.c @@ -150,7 +150,7 @@ uint8_t *get_cached_data(const PageCache *cache, uint64_t addr) return cache_get_by_addr(cache, addr)->it_data; } -void cache_insert(PageCache *cache, uint64_t addr, uint8_t *pdata) +int cache_insert(PageCache *cache, uint64_t addr, uint8_t *pdata) { CacheItem *it = NULL; @@ -161,16 +161,22 @@ void cache_insert(PageCache *cache, uint64_t addr, uint8_t *pdata) /* actual update of entry */ it = cache_get_by_addr(cache, addr); -/* free old cached data if any */ -g_free(it->it_data); - +/* allocate page */ if (!it->it_data) { cache->num_items++; +it->it_data = g_try_malloc(cache->page_size); +if (!it->it_data) { +DPRINTF("Error allocating page\n"); +return -1; +} } -it->it_data = g_memdup(pdata, cache->page_size); +memcpy(it->it_data, pdata, cache->page_size); + it->it_age = ++cache->max_item_age; it->it_addr = addr; + +return 0; } int64_t cache_resize(PageCache *cache, int64_t new_num_pages) -- 1.8.3.1
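The pattern this patch introduces — allocate the cache slot's backing page lazily with a non-aborting allocator, and report failure to the caller instead of crashing QEMU — can be sketched standalone. The names below are hypothetical, and plain `malloc()` stands in for glib's `g_try_malloc()`:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Miniature model of cache_insert() after the patch: the page is
 * allocated only on first insert, allocation failure is reported
 * as -1, and a later insert overwrites the page in place instead
 * of the old g_free() + g_memdup() pair. */
typedef struct {
    uint8_t *it_data;
    size_t page_size;
} CacheSlot;

static int slot_insert(CacheSlot *slot, const uint8_t *pdata)
{
    if (!slot->it_data) {
        slot->it_data = malloc(slot->page_size);  /* g_try_malloc() in QEMU */
        if (!slot->it_data) {
            return -1;  /* caller fails the migration, not the whole VM */
        }
    }
    memcpy(slot->it_data, pdata, slot->page_size);
    return 0;
}
```

The caller (save_xbzrle_page() in the patch) simply propagates the -1 up, which is why disabling compression and retrying the migration remains possible.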
[Qemu-devel] [PATCH 4/6] XBZRLE cache size should not be larger than guest memory size
Signed-off-by: Orit Wasserman
---
 migration.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/migration.c b/migration.c
index 46a7305..25add6f 100644
--- a/migration.c
+++ b/migration.c
@@ -479,6 +479,13 @@ void qmp_migrate_set_cache_size(int64_t value, Error **errp)
         return;
     }
 
+    /* Cache should not be larger than guest ram size */
+    if (value > ram_bytes_total()) {
+        error_set(errp, QERR_INVALID_PARAMETER_VALUE, "cache size",
+                  "exceeds guest ram size ");
+        return;
+    }
+
     new_size = xbzrle_cache_resize(value);
     if (new_size < 0) {
         error_set(errp, QERR_INVALID_PARAMETER_VALUE, "cache size",
-- 
1.8.3.1
[Qemu-devel] [PATCH 2/6] Add check for cache size smaller than page size
Signed-off-by: Orit Wasserman Reviewed-by: Juan Quintela --- arch_init.c | 4 migration.c | 10 +- 2 files changed, 13 insertions(+), 1 deletion(-) diff --git a/arch_init.c b/arch_init.c index 66f5e82..8edeabe 100644 --- a/arch_init.c +++ b/arch_init.c @@ -178,6 +178,10 @@ static struct { int64_t xbzrle_cache_resize(int64_t new_size) { +if (new_size < TARGET_PAGE_SIZE) { +return -1; +} + if (XBZRLE.cache != NULL) { return cache_resize(XBZRLE.cache, new_size / TARGET_PAGE_SIZE) * TARGET_PAGE_SIZE; diff --git a/migration.c b/migration.c index 7235c23..84587e9 100644 --- a/migration.c +++ b/migration.c @@ -469,6 +469,7 @@ void qmp_migrate_cancel(Error **errp) void qmp_migrate_set_cache_size(int64_t value, Error **errp) { MigrationState *s = migrate_get_current(); +int64_t new_size; /* Check for truncation */ if (value != (size_t)value) { @@ -477,7 +478,14 @@ void qmp_migrate_set_cache_size(int64_t value, Error **errp) return; } -s->xbzrle_cache_size = xbzrle_cache_resize(value); +new_size = xbzrle_cache_resize(value); +if (new_size < 0) { +error_set(errp, QERR_INVALID_PARAMETER_VALUE, "cache size", + "is smaller than page size"); +return; +} + +s->xbzrle_cache_size = new_size; } int64_t qmp_query_migrate_cache_size(Error **errp) -- 1.8.3.1
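The two validation steps in qmp_migrate_set_cache_size() — the existing size_t truncation check and the new smaller-than-page-size check — can be modeled in a few lines. This is a sketch with a fixed 4 KiB page; QEMU uses TARGET_PAGE_SIZE and reports failures through Error **errp rather than a return code:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE 4096

/* Returns the effective cache size, or -1 if the value is invalid. */
static int64_t set_cache_size_checked(int64_t value)
{
    /* truncation check: on a 32-bit size_t the value would be cut */
    if (value != (int64_t)(size_t)value) {
        return -1;
    }
    /* the new check from this patch: must hold at least one page */
    if (value < PAGE_SIZE) {
        return -1;
    }
    /* cache_resize() works in whole pages, so round down */
    return (value / PAGE_SIZE) * PAGE_SIZE;
}
```

Note how the rounding means the stored xbzrle_cache_size can be smaller than what the user asked for, which is why the patch stores the value returned by xbzrle_cache_resize() instead of the raw input.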
[Qemu-devel] [PATCH 5/6] Don't abort on out of memory when creating page cache
Signed-off-by: Orit Wasserman --- arch_init.c | 16 ++-- page_cache.c | 18 ++ 2 files changed, 28 insertions(+), 6 deletions(-) diff --git a/arch_init.c b/arch_init.c index 5eff80b..806d096 100644 --- a/arch_init.c +++ b/arch_init.c @@ -664,8 +664,20 @@ static int ram_save_setup(QEMUFile *f, void *opaque) DPRINTF("Error creating cache\n"); return -1; } -XBZRLE.encoded_buf = g_malloc0(TARGET_PAGE_SIZE); -XBZRLE.current_buf = g_malloc(TARGET_PAGE_SIZE); + +/* We prefer not to abort if there is no memory */ +XBZRLE.encoded_buf = g_try_malloc0(TARGET_PAGE_SIZE); +if (!XBZRLE.encoded_buf) { +DPRINTF("Error allocating encoded_buf\n"); +return -1; +} + +XBZRLE.current_buf = g_try_malloc(TARGET_PAGE_SIZE); +if (!XBZRLE.current_buf) { +DPRINTF("Error allocating current_buf\n"); +return -1; +} + acct_clear(); } diff --git a/page_cache.c b/page_cache.c index a05db64..62a53f8 100644 --- a/page_cache.c +++ b/page_cache.c @@ -60,8 +60,12 @@ PageCache *cache_init(int64_t num_pages, unsigned int page_size) return NULL; } -cache = g_malloc(sizeof(*cache)); - +/* We prefer not to abort if there is no memory */ +cache = g_try_malloc(sizeof(*cache)); +if (!cache) { +DPRINTF("Failed to allocate cache\n"); +return NULL; +} /* round down to the nearest power of 2 */ if (!is_power_of_2(num_pages)) { num_pages = pow2floor(num_pages); @@ -74,8 +78,14 @@ PageCache *cache_init(int64_t num_pages, unsigned int page_size) DPRINTF("Setting cache buckets to %" PRId64 "\n", cache->max_num_items); -cache->page_cache = g_malloc((cache->max_num_items) * - sizeof(*cache->page_cache)); +/* We prefer not to abort if there is no memory */ +cache->page_cache = g_try_malloc((cache->max_num_items) * + sizeof(*cache->page_cache)); +if (!cache->page_cache) { +DPRINTF("Failed to allocate cache->page_cache\n"); +g_free(cache); +return NULL; +} for (i = 0; i < cache->max_num_items; i++) { cache->page_cache[i].it_data = NULL; -- 1.8.3.1
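The cache_init() part of this patch shows the unwind-on-partial-failure pattern: when the second allocation fails, the first must be freed before returning NULL. A minimal standalone sketch (hypothetical names; plain `malloc()` stands in for `g_try_malloc()`):

```c
#include <assert.h>
#include <stdlib.h>

typedef struct {
    void *page_cache;
    long max_num_items;
} MiniCache;

/* Model of cache_init() after the patch: each allocation may fail
 * without aborting the process, and a failure of the second
 * allocation unwinds the first before reporting failure. */
static MiniCache *mini_cache_init(long num_items, size_t item_size)
{
    MiniCache *cache = malloc(sizeof(*cache));  /* g_try_malloc() in QEMU */
    if (!cache) {
        return NULL;
    }
    cache->max_num_items = num_items;
    cache->page_cache = malloc(num_items * item_size);
    if (!cache->page_cache) {
        free(cache);  /* don't leak the partially built cache */
        return NULL;
    }
    return cache;
}
```

The ram_save_setup() part of the patch relies on migration_end() for cleanup instead, which is the point Orit makes in the reply above.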
[Qemu-devel] [PATCH 1/6] Set xbzrle buffers to NULL after freeing them to avoid double free errors
Signed-off-by: Orit Wasserman
Reviewed-by: Juan Quintela
---
 arch_init.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch_init.c b/arch_init.c
index 77912e7..66f5e82 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -617,6 +617,9 @@ static void migration_end(void)
         g_free(XBZRLE.current_buf);
         g_free(XBZRLE.decoded_buf);
         XBZRLE.cache = NULL;
+        XBZRLE.encoded_buf = NULL;
+        XBZRLE.current_buf = NULL;
+        XBZRLE.decoded_buf = NULL;
     }
 }
-- 
1.8.3.1
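The reason NULLing after free works: free() — and glib's g_free() — accept a NULL pointer as a no-op, so resetting the pointer turns a second teardown pass into a harmless call instead of a double free. A minimal illustration of the idiom:

```c
#include <assert.h>
#include <stdlib.h>

/* Free a buffer and reset the owning pointer, so running the same
 * teardown path again on this pointer is safe. */
static void free_and_null(void **p)
{
    free(*p);    /* free(NULL) is defined to do nothing */
    *p = NULL;
}
```

This is the same guarantee that migration_end() gets by assigning NULL to the XBZRLE buffers right after freeing them.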
[Qemu-devel] [PATCH 3/6] migration:fix free XBZRLE decoded_buf wrong
From: "Gonglei (Arei)" When qemu do live migration with xbzrle, qemu malloc decoded_buf at destination end but free it at source end. It will crash qemu by double free error in some scenarios. Splitting the XBZRLE structure for clear logic distinguishing src/dst side. Signed-off-by: ChenLiang Reviewed-by: Peter Maydell Reviewed-by: Orit Wasserman Signed-off-by: GongLei --- arch_init.c | 22 -- include/migration/migration.h | 1 + migration.c | 1 + 3 files changed, 14 insertions(+), 10 deletions(-) diff --git a/arch_init.c b/arch_init.c index 8edeabe..5eff80b 100644 --- a/arch_init.c +++ b/arch_init.c @@ -164,17 +164,15 @@ static struct { uint8_t *encoded_buf; /* buffer for storing page content */ uint8_t *current_buf; -/* buffer used for XBZRLE decoding */ -uint8_t *decoded_buf; /* Cache for XBZRLE */ PageCache *cache; } XBZRLE = { .encoded_buf = NULL, .current_buf = NULL, -.decoded_buf = NULL, .cache = NULL, }; - +/* buffer used for XBZRLE decoding */ +static uint8_t *xbzrle_decoded_buf; int64_t xbzrle_cache_resize(int64_t new_size) { @@ -606,6 +604,12 @@ uint64_t ram_bytes_total(void) return total; } +void free_xbzrle_decoded_buf(void) +{ +g_free(xbzrle_decoded_buf); +xbzrle_decoded_buf = NULL; +} + static void migration_end(void) { if (migration_bitmap) { @@ -619,11 +623,9 @@ static void migration_end(void) g_free(XBZRLE.cache); g_free(XBZRLE.encoded_buf); g_free(XBZRLE.current_buf); -g_free(XBZRLE.decoded_buf); XBZRLE.cache = NULL; XBZRLE.encoded_buf = NULL; XBZRLE.current_buf = NULL; -XBZRLE.decoded_buf = NULL; } } @@ -814,8 +816,8 @@ static int load_xbzrle(QEMUFile *f, ram_addr_t addr, void *host) unsigned int xh_len; int xh_flags; -if (!XBZRLE.decoded_buf) { -XBZRLE.decoded_buf = g_malloc(TARGET_PAGE_SIZE); +if (!xbzrle_decoded_buf) { +xbzrle_decoded_buf = g_malloc(TARGET_PAGE_SIZE); } /* extract RLE header */ @@ -832,10 +834,10 @@ static int load_xbzrle(QEMUFile *f, ram_addr_t addr, void *host) return -1; } /* load data and decode */ -qemu_get_buffer(f, 
XBZRLE.decoded_buf, xh_len); +qemu_get_buffer(f, xbzrle_decoded_buf, xh_len); /* decode RLE */ -ret = xbzrle_decode_buffer(XBZRLE.decoded_buf, xh_len, host, +ret = xbzrle_decode_buffer(xbzrle_decoded_buf, xh_len, host, TARGET_PAGE_SIZE); if (ret == -1) { fprintf(stderr, "Failed to load XBZRLE page - decode error!\n"); diff --git a/include/migration/migration.h b/include/migration/migration.h index bfa3951..3e1e6c7 100644 --- a/include/migration/migration.h +++ b/include/migration/migration.h @@ -109,6 +109,7 @@ MigrationState *migrate_get_current(void); uint64_t ram_bytes_remaining(void); uint64_t ram_bytes_transferred(void); uint64_t ram_bytes_total(void); +void free_xbzrle_decoded_buf(void); void acct_update_position(QEMUFile *f, size_t size, bool zero); diff --git a/migration.c b/migration.c index 84587e9..46a7305 100644 --- a/migration.c +++ b/migration.c @@ -105,6 +105,7 @@ static void process_incoming_migration_co(void *opaque) ret = qemu_loadvm_state(f); qemu_fclose(f); +free_xbzrle_decoded_buf(); if (ret < 0) { fprintf(stderr, "load of migration failed\n"); exit(EXIT_FAILURE); -- 1.8.3.1
[Qemu-devel] [PATCH 0/6] XBZRLE Fixes
Fix memory leak and missing NULLs. Better cache size validation checks.
Fail migration instead of aborting QEMU in case there is not enough
memory for the page cache.

Gonglei (Arei) (1):
  migration:fix free XBZRLE decoded_buf wrong

Orit Wasserman (5):
  Set xbzrle buffers to NULL after freeing them to avoid double free errors
  Add check for cache size smaller than page size
  XBZRLE cache size should not be larger than guest memory size
  Don't abort on out of memory when creating page cache
  Don't abort on memory allocation error

 arch_init.c                    | 47 +++---
 include/migration/migration.h  |  1 +
 include/migration/page_cache.h |  4 +++-
 migration.c                    | 18 +++-
 page_cache.c                   | 34 ++
 5 files changed, 81 insertions(+), 23 deletions(-)

-- 
1.8.3.1
Re: [Qemu-devel] [PATCH v2] migration:fix free XBZRLE decoded_buf wrong
On 01/21/2014 02:58 PM, Gonglei (Arei) wrote: -Original Message- From: Orit Wasserman [mailto:owass...@redhat.com] Sent: Tuesday, January 21, 2014 8:24 PM To: Gonglei (Arei); qemu-devel@nongnu.org; qemu-sta...@nongnu.org; Peter Maydell; anth...@codemonkey.ws; pbonz...@redhat.com Cc: chenliang (T); Luonengjun; Huangweidong (Hardware) Subject: Re: [Qemu-devel] [PATCH v2] migration:fix free XBZRLE decoded_buf wrong On 01/21/2014 02:11 PM, Gonglei (Arei) wrote: Hi, This is an update of my patch. Modifications in v2: * Removing excess check for g_free * The structure of XBZRLE is divided into two halves.One is for * src side, another is for dest side. What is the benefit of splitting the structure? decode_buf is only allocated (and freed) in the destination any way. Yeah, you are right. Splitting the structure is not necessary. The key to do that is just for clear logic. As Peter said: the current arrangement looks extremely prone to bugs like this one where somebody forgets that some of the fields are not relevant to whichever of src/dst the code path they're writing is used on. Best regards, -Gonglei Sounds reasonable. Thanks for finding the leak and fixing it. Orit Orit
Re: [Qemu-devel] [PATCH v2] migration:fix free XBZRLE decoded_buf wrong
On 01/21/2014 02:11 PM, Gonglei (Arei) wrote: Hi, This is an update of my patch. Modifications in v2: * Removing excess check for g_free * The structure of XBZRLE is divided into two halves.One is for * src side, another is for dest side. What is the benefit of splitting the structure? decode_buf is only allocated (and freed) in the destination any way. Orit Signed-off-by: chenliang --- arch_init.c | 23 ++- include/migration/migration.h |1 + migration.c |1 + 3 files changed, 16 insertions(+), 9 deletions(-) diff --git a/arch_init.c b/arch_init.c index 77912e7..9a28a43 100644 --- a/arch_init.c +++ b/arch_init.c @@ -164,17 +164,15 @@ static struct { uint8_t *encoded_buf; /* buffer for storing page content */ uint8_t *current_buf; -/* buffer used for XBZRLE decoding */ -uint8_t *decoded_buf; /* Cache for XBZRLE */ PageCache *cache; } XBZRLE = { .encoded_buf = NULL, .current_buf = NULL, -.decoded_buf = NULL, .cache = NULL, }; - +/* buffer used for XBZRLE decoding */ +static uint8_t *xbzrle_decoded_buf = NULL; int64_t xbzrle_cache_resize(int64_t new_size) { @@ -602,6 +600,12 @@ uint64_t ram_bytes_total(void) return total; } +void free_xbzrle_decoded_buf(void) +{ +g_free(xbzrle_decoded_buf); +xbzrle_decoded_buf = NULL; +} + static void migration_end(void) { if (migration_bitmap) { @@ -615,8 +619,9 @@ static void migration_end(void) g_free(XBZRLE.cache); g_free(XBZRLE.encoded_buf); g_free(XBZRLE.current_buf); -g_free(XBZRLE.decoded_buf); XBZRLE.cache = NULL; +XBZRLE.encoded_buf = NULL; +XBZRLE.current_buf = NULL; } } @@ -807,8 +812,8 @@ static int load_xbzrle(QEMUFile *f, ram_addr_t addr, void *host) unsigned int xh_len; int xh_flags; -if (!XBZRLE.decoded_buf) { -XBZRLE.decoded_buf = g_malloc(TARGET_PAGE_SIZE); +if (!xbzrle_decoded_buf) { +xbzrle_decoded_buf = g_malloc(TARGET_PAGE_SIZE); } /* extract RLE header */ @@ -825,10 +830,10 @@ static int load_xbzrle(QEMUFile *f, ram_addr_t addr, void *host) return -1; } /* load data and decode */ -qemu_get_buffer(f, 
XBZRLE.decoded_buf, xh_len); +qemu_get_buffer(f, xbzrle_decoded_buf, xh_len); /* decode RLE */ -ret = xbzrle_decode_buffer(XBZRLE.decoded_buf, xh_len, host, +ret = xbzrle_decode_buffer(xbzrle_decoded_buf, xh_len, host, TARGET_PAGE_SIZE); if (ret == -1) { fprintf(stderr, "Failed to load XBZRLE page - decode error!\n"); diff --git a/include/migration/migration.h b/include/migration/migration.h index bfa3951..3e1e6c7 100644 --- a/include/migration/migration.h +++ b/include/migration/migration.h @@ -109,6 +109,7 @@ MigrationState *migrate_get_current(void); uint64_t ram_bytes_remaining(void); uint64_t ram_bytes_transferred(void); uint64_t ram_bytes_total(void); +void free_xbzrle_decoded_buf(void); void acct_update_position(QEMUFile *f, size_t size, bool zero); diff --git a/migration.c b/migration.c index 7235c23..3d46804 100644 --- a/migration.c +++ b/migration.c @@ -105,6 +105,7 @@ static void process_incoming_migration_co(void *opaque) ret = qemu_loadvm_state(f); qemu_fclose(f); +free_xbzrle_decoded_buf(); if (ret < 0) { fprintf(stderr, "load of migration failed\n"); exit(EXIT_FAILURE);
[Qemu-devel] [PATCH 1/2] Set xbzrle buffers to NULL after freeing them to avoid double free errors
Signed-off-by: Orit Wasserman
---
 arch_init.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch_init.c b/arch_init.c
index e0acbc5..5c55c68 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -587,6 +587,9 @@ static void migration_end(void)
         g_free(XBZRLE.current_buf);
         g_free(XBZRLE.decoded_buf);
         XBZRLE.cache = NULL;
+        XBZRLE.encoded_buf = NULL;
+        XBZRLE.current_buf = NULL;
+        XBZRLE.decoded_buf = NULL;
     }
 }
-- 
1.8.3.1
[Qemu-devel] [PATCH 2/2] Add check for cache size smaller than page size
Signed-off-by: Orit Wasserman --- arch_init.c | 4 migration.c | 10 +- 2 files changed, 13 insertions(+), 1 deletion(-) diff --git a/arch_init.c b/arch_init.c index 5c55c68..e52c9ba 100644 --- a/arch_init.c +++ b/arch_init.c @@ -176,6 +176,10 @@ static struct { int64_t xbzrle_cache_resize(int64_t new_size) { +if (new_size < TARGET_PAGE_SIZE) { +return -1; +} + if (XBZRLE.cache != NULL) { return cache_resize(XBZRLE.cache, new_size / TARGET_PAGE_SIZE) * TARGET_PAGE_SIZE; diff --git a/migration.c b/migration.c index 2b1ab20..f28aa1d 100644 --- a/migration.c +++ b/migration.c @@ -455,6 +455,7 @@ void qmp_migrate_cancel(Error **errp) void qmp_migrate_set_cache_size(int64_t value, Error **errp) { MigrationState *s = migrate_get_current(); +int64_t new_size; /* Check for truncation */ if (value != (size_t)value) { @@ -463,8 +464,14 @@ void qmp_migrate_set_cache_size(int64_t value, Error **errp) return; } -s->xbzrle_cache_size = xbzrle_cache_resize(value); +new_size = xbzrle_cache_resize(value); +if (new_size < 0) { +error_set(errp, QERR_INVALID_PARAMETER_VALUE, "cache size", + "is smaller than page size"); +return; +} + +s->xbzrle_cache_size = new_size; } int64_t qmp_query_migrate_cache_size(Error **errp) -- 1.8.3.1
Re: [Qemu-devel] [PATCH 01/38] bitmap: use long as index
long bits) { -int k; -int nr = BITS_TO_LONGS(bits); +long k; +long nr = BITS_TO_LONGS(bits); for (k = 0; k < nr; k++) { dst[k] = bitmap1[k] ^ bitmap2[k]; @@ -143,10 +143,10 @@ void slow_bitmap_xor(unsigned long *dst, const unsigned long *bitmap1, } int slow_bitmap_andnot(unsigned long *dst, const unsigned long *bitmap1, - const unsigned long *bitmap2, int bits) + const unsigned long *bitmap2, long bits) { -int k; -int nr = BITS_TO_LONGS(bits); +long k; +long nr = BITS_TO_LONGS(bits); unsigned long result = 0; for (k = 0; k < nr; k++) { @@ -157,10 +157,10 @@ int slow_bitmap_andnot(unsigned long *dst, const unsigned long *bitmap1, #define BITMAP_FIRST_WORD_MASK(start) (~0UL << ((start) % BITS_PER_LONG)) -void bitmap_set(unsigned long *map, int start, int nr) +void bitmap_set(unsigned long *map, long start, long nr) { unsigned long *p = map + BIT_WORD(start); -const int size = start + nr; +const long size = start + nr; int bits_to_set = BITS_PER_LONG - (start % BITS_PER_LONG); unsigned long mask_to_set = BITMAP_FIRST_WORD_MASK(start); @@ -177,10 +177,10 @@ void bitmap_set(unsigned long *map, int start, int nr) } } -void bitmap_clear(unsigned long *map, int start, int nr) +void bitmap_clear(unsigned long *map, long start, long nr) { unsigned long *p = map + BIT_WORD(start); -const int size = start + nr; +const long size = start + nr; int bits_to_clear = BITS_PER_LONG - (start % BITS_PER_LONG); unsigned long mask_to_clear = BITMAP_FIRST_WORD_MASK(start); @@ -212,10 +212,10 @@ void bitmap_clear(unsigned long *map, int start, int nr) * power of 2. A @align_mask of 0 means no alignment is required. 
*/ unsigned long bitmap_find_next_zero_area(unsigned long *map, -unsigned long size, -unsigned long start, -unsigned int nr, -unsigned long align_mask) + unsigned long size, + unsigned long start, + unsigned long nr, + unsigned long align_mask) { unsigned long index, end, i; again: @@ -237,9 +237,9 @@ again: } int slow_bitmap_intersects(const unsigned long *bitmap1, - const unsigned long *bitmap2, int bits) + const unsigned long *bitmap2, long bits) { -int k, lim = bits/BITS_PER_LONG; +long k, lim = bits/BITS_PER_LONG; for (k = 0; k < lim; ++k) { if (bitmap1[k] & bitmap2[k]) { Reviewed-by: Orit Wasserman
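Why the int-to-long change matters: with an `int` bit count, a dirty bitmap covering more than INT_MAX bits (roughly 8 TiB of guest RAM at 4 KiB pages) would overflow the index arithmetic. A sketch of the macro involved, using the same BITS_TO_LONGS shape as the patch:

```c
#include <assert.h>
#include <limits.h>

#define BITS_PER_LONG (CHAR_BIT * (int)sizeof(long))
#define BITS_TO_LONGS(bits) (((bits) + BITS_PER_LONG - 1) / BITS_PER_LONG)

/* With a long argument the rounding division happens in long
 * arithmetic, so bit counts beyond INT_MAX stay well-defined on
 * LP64 hosts. */
static long words_needed(long bits)
{
    return BITS_TO_LONGS(bits);
}
```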
Re: [Qemu-devel] [PATCH 37/38] migration: synchronize memory bitmap 64bits at a time
On 12/17/2013 05:26 PM, Juan Quintela wrote: We use the old code if the bitmaps are not aligned Signed-off-by: Juan Quintela --- arch_init.c | 38 +- 1 file changed, 29 insertions(+), 9 deletions(-) diff --git a/arch_init.c b/arch_init.c index 2cd3d00..77912e7 100644 --- a/arch_init.c +++ b/arch_init.c @@ -50,6 +50,7 @@ #include "exec/cpu-all.h" #include "exec/ram_addr.h" #include "hw/acpi/acpi.h" +#include "qemu/host-utils.h" #ifdef DEBUG_ARCH_INIT #define DPRINTF(fmt, ...) \ @@ -376,15 +377,34 @@ static inline bool migration_bitmap_set_dirty(ram_addr_t addr) static void migration_bitmap_sync_range(ram_addr_t start, ram_addr_t length) { ram_addr_t addr; - -for (addr = 0; addr < length; addr += TARGET_PAGE_SIZE) { -if (cpu_physical_memory_get_dirty(start + addr, - TARGET_PAGE_SIZE, - DIRTY_MEMORY_MIGRATION)) { -cpu_physical_memory_reset_dirty(start + addr, -TARGET_PAGE_SIZE, -DIRTY_MEMORY_MIGRATION); -migration_bitmap_set_dirty(start + addr); +unsigned long page = BIT_WORD(start >> TARGET_PAGE_BITS); + +/* start address is aligned at the start of a word? */ +if (((page * BITS_PER_LONG) << TARGET_PAGE_BITS) == start) { +int k; +int nr = BITS_TO_LONGS(length >> TARGET_PAGE_BITS); +unsigned long *src = ram_list.dirty_memory[DIRTY_MEMORY_MIGRATION]; + +for (k = page; k < page + nr; k++) { +if (src[k]) { +unsigned long new_dirty; +new_dirty = ~migration_bitmap[k]; +migration_bitmap[k] |= src[k]; +new_dirty &= src[k]; +migration_dirty_pages += ctpopl(new_dirty); +src[k] = 0; +} +} +} else { +for (addr = 0; addr < length; addr += TARGET_PAGE_SIZE) { +if (cpu_physical_memory_get_dirty(start + addr, + TARGET_PAGE_SIZE, + DIRTY_MEMORY_MIGRATION)) { +cpu_physical_memory_reset_dirty(start + addr, +TARGET_PAGE_SIZE, +DIRTY_MEMORY_MIGRATION); +migration_bitmap_set_dirty(start + addr); +} } } } Reviewed-by: Orit Wasserman
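The fast path's inner loop can be isolated into a one-word helper: merge the source dirty word into the migration bitmap, count only the bits that were newly set, and clear the source. This is a standalone sketch, with GCC/Clang's `__builtin_popcountl` standing in for QEMU's ctpopl():

```c
#include <assert.h>

/* One word of the word-at-a-time sync: returns how many pages
 * became newly dirty in the migration bitmap. */
static long sync_word(unsigned long *migration, unsigned long *src)
{
    long newly_dirty = 0;

    if (*src) {
        unsigned long new_dirty = ~*migration & *src;  /* bits not yet set */
        *migration |= *src;
        newly_dirty = __builtin_popcountl(new_dirty);  /* ctpopl() in QEMU */
        *src = 0;                                      /* consume the source */
    }
    return newly_dirty;
}
```

Masking with `~*migration` before counting is what keeps migration_dirty_pages accurate: pages already marked dirty in the migration bitmap must not be counted twice.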
Re: [Qemu-devel] [PATCH 36/38] ram: split function that synchronizes a range
On 12/17/2013 05:26 PM, Juan Quintela wrote: This function is the only bit where we care about speed. Signed-off-by: Juan Quintela --- arch_init.c | 34 -- 1 file changed, 20 insertions(+), 14 deletions(-) diff --git a/arch_init.c b/arch_init.c index 0e8c8b5..2cd3d00 100644 --- a/arch_init.c +++ b/arch_init.c @@ -360,11 +360,10 @@ ram_addr_t migration_bitmap_find_and_reset_dirty(MemoryRegion *mr, return (next - base) << TARGET_PAGE_BITS; } -static inline bool migration_bitmap_set_dirty(MemoryRegion *mr, - ram_addr_t offset) +static inline bool migration_bitmap_set_dirty(ram_addr_t addr) { bool ret; -int nr = (mr->ram_addr + offset) >> TARGET_PAGE_BITS; +int nr = addr >> TARGET_PAGE_BITS; ret = test_and_set_bit(nr, migration_bitmap); @@ -374,12 +373,28 @@ static inline bool migration_bitmap_set_dirty(MemoryRegion *mr, return ret; } +static void migration_bitmap_sync_range(ram_addr_t start, ram_addr_t length) +{ +ram_addr_t addr; + +for (addr = 0; addr < length; addr += TARGET_PAGE_SIZE) { +if (cpu_physical_memory_get_dirty(start + addr, + TARGET_PAGE_SIZE, + DIRTY_MEMORY_MIGRATION)) { +cpu_physical_memory_reset_dirty(start + addr, +TARGET_PAGE_SIZE, +DIRTY_MEMORY_MIGRATION); +migration_bitmap_set_dirty(start + addr); +} +} +} + + /* Needs iothread lock! 
*/ static void migration_bitmap_sync(void) { RAMBlock *block; -ram_addr_t addr; uint64_t num_dirty_pages_init = migration_dirty_pages; MigrationState *s = migrate_get_current(); static int64_t start_time; @@ -400,16 +415,7 @@ static void migration_bitmap_sync(void) address_space_sync_dirty_bitmap(&address_space_memory); QTAILQ_FOREACH(block, &ram_list.blocks, next) { -for (addr = 0; addr < block->length; addr += TARGET_PAGE_SIZE) { -if (cpu_physical_memory_get_dirty(block->mr->ram_addr + addr, - TARGET_PAGE_SIZE, - DIRTY_MEMORY_MIGRATION)) { -cpu_physical_memory_reset_dirty(block->mr->ram_addr + addr, -TARGET_PAGE_SIZE, -DIRTY_MEMORY_MIGRATION); -migration_bitmap_set_dirty(block->mr, addr); -} -} +migration_bitmap_sync_range(block->mr->ram_addr, block->length); } trace_migration_bitmap_sync_end(migration_dirty_pages - num_dirty_pages_init); Reviewed-by: Orit Wasserman
Re: [Qemu-devel] [PATCH 35/38] memory: synchronize kvm bitmap using bitmap operations
On 12/17/2013 05:26 PM, Juan Quintela wrote: If bitmaps are aligned properly, use bitmap operations. If they are not, just use old bit at a time code. Signed-off-by: Juan Quintela --- include/exec/ram_addr.h | 54 - 1 file changed, 36 insertions(+), 18 deletions(-) diff --git a/include/exec/ram_addr.h b/include/exec/ram_addr.h index c6736ed..33c8acc 100644 --- a/include/exec/ram_addr.h +++ b/include/exec/ram_addr.h @@ -83,29 +83,47 @@ static inline void cpu_physical_memory_set_dirty_lebitmap(unsigned long *bitmap, ram_addr_t start, ram_addr_t pages) { -unsigned int i, j; +unsigned long i, j; unsigned long page_number, c; hwaddr addr; ram_addr_t ram_addr; -unsigned int len = (pages + HOST_LONG_BITS - 1) / HOST_LONG_BITS; +unsigned long len = (pages + HOST_LONG_BITS - 1) / HOST_LONG_BITS; unsigned long hpratio = getpagesize() / TARGET_PAGE_SIZE; +unsigned long page = BIT_WORD(start >> TARGET_PAGE_BITS); -/* - * bitmap-traveling is faster than memory-traveling (for addr...) - * especially when most of the memory is not dirty. - */ -for (i = 0; i < len; i++) { -if (bitmap[i] != 0) { -c = leul_to_cpu(bitmap[i]); -do { -j = ffsl(c) - 1; -c &= ~(1ul << j); -page_number = (i * HOST_LONG_BITS + j) * hpratio; -addr = page_number * TARGET_PAGE_SIZE; -ram_addr = start + addr; -cpu_physical_memory_set_dirty_range(ram_addr, -TARGET_PAGE_SIZE * hpratio); -} while (c != 0); +/* start address is aligned at the start of a word? */ +if (((page * BITS_PER_LONG) << TARGET_PAGE_BITS) == start) { +long k; +long nr = BITS_TO_LONGS(pages); + +for (k = 0; k < nr; k++) { +if (bitmap[k]) { +unsigned long temp = leul_to_cpu(bitmap[k]); + +ram_list.dirty_memory[DIRTY_MEMORY_MIGRATION][page + k] |= temp; +ram_list.dirty_memory[DIRTY_MEMORY_VGA][page + k] |= temp; +ram_list.dirty_memory[DIRTY_MEMORY_CODE][page + k] |= temp; +} +} +xen_modified_memory(start, pages); +} else { +/* + * bitmap-traveling is faster than memory-traveling (for addr...) + * especially when most of the memory is not dirty. 
+ */ +for (i = 0; i < len; i++) { +if (bitmap[i] != 0) { +c = leul_to_cpu(bitmap[i]); +do { +j = ffsl(c) - 1; +c &= ~(1ul << j); +page_number = (i * HOST_LONG_BITS + j) * hpratio; +addr = page_number * TARGET_PAGE_SIZE; +ram_addr = start + addr; +cpu_physical_memory_set_dirty_range(ram_addr, + TARGET_PAGE_SIZE * hpratio); +} while (c != 0); +} } } } Reviewed-by: Orit Wasserman
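The guard that picks between the two paths can be expressed on its own. The whole-word path is only safe when the range's first page lands exactly on a word boundary of the dirty bitmap; otherwise the bit-at-a-time fallback runs. A standalone version, assuming 4 KiB target pages:

```c
#include <assert.h>
#include <limits.h>

#define TARGET_PAGE_BITS 12
#define BITS_PER_LONG (CHAR_BIT * (int)sizeof(long))

/* Mirrors the patch's check: compute the bitmap word holding the
 * start page, then verify that the first bit of that word maps
 * back to exactly this start address. */
static int start_is_word_aligned(unsigned long start)
{
    unsigned long page = (start >> TARGET_PAGE_BITS) / BITS_PER_LONG;

    return ((page * BITS_PER_LONG) << TARGET_PAGE_BITS) == start;
}
```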
Re: [Qemu-devel] [PATCH 34/38] memory: move bitmap synchronization to its own function
On 12/17/2013 05:26 PM, Juan Quintela wrote: We want to have all the functions that handle directly the dirty bitmap near. We will change it later. Signed-off-by: Juan Quintela --- include/exec/ram_addr.h | 31 +++ kvm-all.c | 27 ++- 2 files changed, 33 insertions(+), 25 deletions(-) diff --git a/include/exec/ram_addr.h b/include/exec/ram_addr.h index db977fb..c6736ed 100644 --- a/include/exec/ram_addr.h +++ b/include/exec/ram_addr.h @@ -79,6 +79,37 @@ static inline void cpu_physical_memory_set_dirty_range(ram_addr_t start, xen_modified_memory(start, length); } +static inline void cpu_physical_memory_set_dirty_lebitmap(unsigned long *bitmap, + ram_addr_t start, + ram_addr_t pages) +{ +unsigned int i, j; +unsigned long page_number, c; +hwaddr addr; +ram_addr_t ram_addr; +unsigned int len = (pages + HOST_LONG_BITS - 1) / HOST_LONG_BITS; +unsigned long hpratio = getpagesize() / TARGET_PAGE_SIZE; + +/* + * bitmap-traveling is faster than memory-traveling (for addr...) + * especially when most of the memory is not dirty. 
+ */ +for (i = 0; i < len; i++) { +if (bitmap[i] != 0) { +c = leul_to_cpu(bitmap[i]); +do { +j = ffsl(c) - 1; +c &= ~(1ul << j); +page_number = (i * HOST_LONG_BITS + j) * hpratio; +addr = page_number * TARGET_PAGE_SIZE; +ram_addr = start + addr; +cpu_physical_memory_set_dirty_range(ram_addr, +TARGET_PAGE_SIZE * hpratio); +} while (c != 0); +} +} +} + static inline void cpu_physical_memory_clear_dirty_range(ram_addr_t start, ram_addr_t length, unsigned client) diff --git a/kvm-all.c b/kvm-all.c index cb62ba4..0bfb060 100644 --- a/kvm-all.c +++ b/kvm-all.c @@ -380,33 +380,10 @@ static int kvm_set_migration_log(int enable) static int kvm_get_dirty_pages_log_range(MemoryRegionSection *section, unsigned long *bitmap) { -unsigned int i, j; -unsigned long page_number, c; -hwaddr addr; ram_addr_t start = section->offset_within_region + section->mr->ram_addr; -ram_addr_t ram_addr; -unsigned int pages = int128_get64(section->size) / getpagesize(); -unsigned int len = (pages + HOST_LONG_BITS - 1) / HOST_LONG_BITS; -unsigned long hpratio = getpagesize() / TARGET_PAGE_SIZE; +ram_addr_t pages = int128_get64(section->size) / getpagesize(); -/* - * bitmap-traveling is faster than memory-traveling (for addr...) - * especially when most of the memory is not dirty. - */ -for (i = 0; i < len; i++) { -if (bitmap[i] != 0) { -c = leul_to_cpu(bitmap[i]); -do { -j = ffsl(c) - 1; -c &= ~(1ul << j); -page_number = (i * HOST_LONG_BITS + j) * hpratio; -addr = page_number * TARGET_PAGE_SIZE; -ram_addr = start + addr; -cpu_physical_memory_set_dirty_range(ram_addr, -TARGET_PAGE_SIZE * hpratio); -} while (c != 0); -} -} +cpu_physical_memory_set_dirty_lebitmap(bitmap, start, pages); return 0; } Reviewed-by: Orit Wasserman
Re: [Qemu-devel] [PATCH 33/38] kvm: refactor start address calculation
On 12/17/2013 05:26 PM, Juan Quintela wrote: Signed-off-by: Juan Quintela --- kvm-all.c | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/kvm-all.c b/kvm-all.c index 308dfba..cb62ba4 100644 --- a/kvm-all.c +++ b/kvm-all.c @@ -382,7 +382,8 @@ static int kvm_get_dirty_pages_log_range(MemoryRegionSection *section, { unsigned int i, j; unsigned long page_number, c; -hwaddr addr, addr1; +hwaddr addr; +ram_addr_t start = section->offset_within_region + section->mr->ram_addr; ram_addr_t ram_addr; unsigned int pages = int128_get64(section->size) / getpagesize(); unsigned int len = (pages + HOST_LONG_BITS - 1) / HOST_LONG_BITS; @@ -399,9 +400,8 @@ static int kvm_get_dirty_pages_log_range(MemoryRegionSection *section, j = ffsl(c) - 1; c &= ~(1ul << j); page_number = (i * HOST_LONG_BITS + j) * hpratio; -addr1 = page_number * TARGET_PAGE_SIZE; -addr = section->offset_within_region + addr1; -ram_addr = section->mr->ram_addr + addr; +addr = page_number * TARGET_PAGE_SIZE; +ram_addr = start + addr; cpu_physical_memory_set_dirty_range(ram_addr, TARGET_PAGE_SIZE * hpratio); } while (c != 0); Reviewed-by: Orit Wasserman
Re: [Qemu-devel] [PATCH 32/38] kvm: use directly cpu_physical_memory_* api for tracking dirty pages
On 12/17/2013 05:26 PM, Juan Quintela wrote: Performance is important in this function, and we want to optimize even further. Signed-off-by: Juan Quintela --- kvm-all.c | 7 +-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/kvm-all.c b/kvm-all.c index 3937754..308dfba 100644 --- a/kvm-all.c +++ b/kvm-all.c @@ -31,6 +31,7 @@ #include "sysemu/kvm.h" #include "qemu/bswap.h" #include "exec/memory.h" +#include "exec/ram_addr.h" #include "exec/address-spaces.h" #include "qemu/event_notifier.h" #include "trace.h" @@ -382,6 +383,7 @@ static int kvm_get_dirty_pages_log_range(MemoryRegionSection *section, unsigned int i, j; unsigned long page_number, c; hwaddr addr, addr1; +ram_addr_t ram_addr; unsigned int pages = int128_get64(section->size) / getpagesize(); unsigned int len = (pages + HOST_LONG_BITS - 1) / HOST_LONG_BITS; unsigned long hpratio = getpagesize() / TARGET_PAGE_SIZE; @@ -399,8 +401,9 @@ static int kvm_get_dirty_pages_log_range(MemoryRegionSection *section, page_number = (i * HOST_LONG_BITS + j) * hpratio; addr1 = page_number * TARGET_PAGE_SIZE; addr = section->offset_within_region + addr1; -memory_region_set_dirty(section->mr, addr, -TARGET_PAGE_SIZE * hpratio); +ram_addr = section->mr->ram_addr + addr; +cpu_physical_memory_set_dirty_range(ram_addr, +TARGET_PAGE_SIZE * hpratio); } while (c != 0); } } Reviewed-by: Orit Wasserman
Re: [Qemu-devel] [PATCH 31/38] memory: unfold memory_region_test_and_clear()
On 12/17/2013 05:26 PM, Juan Quintela wrote: We are going to update the bitmap directly Signed-off-by: Juan Quintela --- arch_init.c | 10 +++--- 1 file changed, 7 insertions(+), 3 deletions(-) diff --git a/arch_init.c b/arch_init.c index e0acbc5..0e8c8b5 100644 --- a/arch_init.c +++ b/arch_init.c @@ -48,6 +48,7 @@ #include "qmp-commands.h" #include "trace.h" #include "exec/cpu-all.h" +#include "exec/ram_addr.h" #include "hw/acpi/acpi.h" #ifdef DEBUG_ARCH_INIT @@ -400,9 +401,12 @@ static void migration_bitmap_sync(void) QTAILQ_FOREACH(block, &ram_list.blocks, next) { for (addr = 0; addr < block->length; addr += TARGET_PAGE_SIZE) { -if (memory_region_test_and_clear_dirty(block->mr, - addr, TARGET_PAGE_SIZE, - DIRTY_MEMORY_MIGRATION)) { +if (cpu_physical_memory_get_dirty(block->mr->ram_addr + addr, + TARGET_PAGE_SIZE, + DIRTY_MEMORY_MIGRATION)) { +cpu_physical_memory_reset_dirty(block->mr->ram_addr + addr, +TARGET_PAGE_SIZE, +DIRTY_MEMORY_MIGRATION); migration_bitmap_set_dirty(block->mr, addr); } } Reviewed-by: Orit Wasserman
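The unfold above replaces one test-and-clear call with a separate "get" and "reset" on the bitmap. A minimal single-word model of that decomposition (illustrative names, not QEMU's API; note the two-step form is not atomic by itself, so it relies on the caller already serializing access to the bitmap):

```c
#include <assert.h>

#define BITS_PER_LONG (8u * (unsigned)sizeof(unsigned long))

/* read one bit of a long-array bitmap */
static int bitmap_get_bit(const unsigned long *map, unsigned int bit)
{
    return (map[bit / BITS_PER_LONG] >> (bit % BITS_PER_LONG)) & 1;
}

/* clear one bit of a long-array bitmap */
static void bitmap_clear_bit(unsigned long *map, unsigned int bit)
{
    map[bit / BITS_PER_LONG] &= ~(1ul << (bit % BITS_PER_LONG));
}

/* the old test-and-clear, rebuilt from the two separate pieces */
static int test_and_clear(unsigned long *map, unsigned int bit)
{
    int was_set = bitmap_get_bit(map, bit);
    if (was_set) {
        bitmap_clear_bit(map, bit);
    }
    return was_set;
}
```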
Re: [Qemu-devel] [PATCH 26/38] memory: cpu_physical_memory_clear_dirty_range() now uses bitmap operations
On 12/17/2013 05:26 PM, Juan Quintela wrote: We were clearing a range of bits, so use bitmap_clear(). Signed-off-by: Juan Quintela Reviewed-by: Eric Blake --- include/exec/memory-internal.h | 10 -- 1 file changed, 4 insertions(+), 6 deletions(-) diff --git a/include/exec/memory-internal.h b/include/exec/memory-internal.h index 4906cdf..e2f55ea 100644 --- a/include/exec/memory-internal.h +++ b/include/exec/memory-internal.h @@ -95,14 +95,12 @@ static inline void cpu_physical_memory_clear_dirty_range(ram_addr_t start, ram_addr_t length, unsigned client) { -ram_addr_t addr, end; +unsigned long end, page; assert(client < DIRTY_MEMORY_NUM); -end = TARGET_PAGE_ALIGN(start + length); -start &= TARGET_PAGE_MASK; -for (addr = start; addr < end; addr += TARGET_PAGE_SIZE) { -clear_bit(addr >> TARGET_PAGE_BITS, ram_list.dirty_memory[client]); -} +end = TARGET_PAGE_ALIGN(start + length) >> TARGET_PAGE_BITS; +page = start >> TARGET_PAGE_BITS; +bitmap_clear(ram_list.dirty_memory[client], page, end - page); } void cpu_physical_memory_reset_dirty(ram_addr_t start, ram_addr_t end, Reviewed-by: Orit Wasserman
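The page range handed to bitmap_clear() above comes from rounding the start address down to its page and the end address up, so every page the byte range touches is covered. A small sketch of that arithmetic, with TARGET_PAGE_BITS fixed at 12 (4 KiB pages) purely for illustration:

```c
#include <assert.h>

#define TARGET_PAGE_BITS 12
#define TARGET_PAGE_SIZE (1ul << TARGET_PAGE_BITS)
#define TARGET_PAGE_ALIGN(a) \
    (((a) + TARGET_PAGE_SIZE - 1) & ~(TARGET_PAGE_SIZE - 1))

/* first page covered by an address range starting at 'start' */
static unsigned long first_page(unsigned long start)
{
    return start >> TARGET_PAGE_BITS;
}

/* one past the last page covered by [start, start + length) */
static unsigned long end_page(unsigned long start, unsigned long length)
{
    return TARGET_PAGE_ALIGN(start + length) >> TARGET_PAGE_BITS;
}
```

bitmap_clear(map, first_page(start), end_page(start, len) - first_page(start)) then clears exactly the pages the loop in the removed code used to visit one by one.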
Re: [Qemu-devel] [PATCH 30/38] memory: split cpu_physical_memory_* functions to its own include
COPYING file in the top-level directory. + * + */ + +/* + * This header is for use by exec.c and memory.c ONLY. Do not include it. + * The functions declared here will be removed soon. + */ + +#ifndef RAM_ADDR_H +#define RAM_ADDR_H + +#ifndef CONFIG_USER_ONLY +#include "hw/xen/xen.h" + +ram_addr_t qemu_ram_alloc_from_ptr(ram_addr_t size, void *host, + MemoryRegion *mr); +ram_addr_t qemu_ram_alloc(ram_addr_t size, MemoryRegion *mr); +void *qemu_get_ram_ptr(ram_addr_t addr); +void qemu_ram_free(ram_addr_t addr); +void qemu_ram_free_from_ptr(ram_addr_t addr); + +static inline bool cpu_physical_memory_get_dirty(ram_addr_t start, + ram_addr_t length, + unsigned client) +{ +unsigned long end, page, next; + +assert(client < DIRTY_MEMORY_NUM); + +end = TARGET_PAGE_ALIGN(start + length) >> TARGET_PAGE_BITS; +page = start >> TARGET_PAGE_BITS; +next = find_next_bit(ram_list.dirty_memory[client], end, page); + +return next < end; +} + +static inline bool cpu_physical_memory_get_dirty_flag(ram_addr_t addr, + unsigned client) +{ +return cpu_physical_memory_get_dirty(addr, 1, client); +} + +static inline bool cpu_physical_memory_is_clean(ram_addr_t addr) +{ +bool vga = cpu_physical_memory_get_dirty_flag(addr, DIRTY_MEMORY_VGA); +bool code = cpu_physical_memory_get_dirty_flag(addr, DIRTY_MEMORY_CODE); +bool migration = +cpu_physical_memory_get_dirty_flag(addr, DIRTY_MEMORY_MIGRATION); +return !(vga && code && migration); +} + +static inline void cpu_physical_memory_set_dirty_flag(ram_addr_t addr, + unsigned client) +{ +assert(client < DIRTY_MEMORY_NUM); +set_bit(addr >> TARGET_PAGE_BITS, ram_list.dirty_memory[client]); +} + +static inline void cpu_physical_memory_set_dirty_range(ram_addr_t start, + ram_addr_t length) +{ +unsigned long end, page; + +end = TARGET_PAGE_ALIGN(start + length) >> TARGET_PAGE_BITS; +page = start >> TARGET_PAGE_BITS; +bitmap_set(ram_list.dirty_memory[DIRTY_MEMORY_MIGRATION], page, end - page); +bitmap_set(ram_list.dirty_memory[DIRTY_MEMORY_VGA], page, end 
- page); +bitmap_set(ram_list.dirty_memory[DIRTY_MEMORY_CODE], page, end - page); +xen_modified_memory(start, length); +} + +static inline void cpu_physical_memory_clear_dirty_range(ram_addr_t start, + ram_addr_t length, + unsigned client) +{ +unsigned long end, page; + +assert(client < DIRTY_MEMORY_NUM); +end = TARGET_PAGE_ALIGN(start + length) >> TARGET_PAGE_BITS; +page = start >> TARGET_PAGE_BITS; +bitmap_clear(ram_list.dirty_memory[client], page, end - page); +} + +void cpu_physical_memory_reset_dirty(ram_addr_t start, ram_addr_t length, + unsigned client); + +#endif +#endif diff --git a/memory.c b/memory.c index c010296..59ecc28 100644 --- a/memory.c +++ b/memory.c @@ -22,6 +22,7 @@ #include #include "exec/memory-internal.h" +#include "exec/ram_addr.h" //#define DEBUG_UNASSIGNED Reviewed-by: Orit Wasserman
Re: [Qemu-devel] [PATCH 29/38] memory: cpu_physical_memory_set_dirty_tracking() should return void
On 12/17/2013 05:26 PM, Juan Quintela wrote: Result was always 0, and not used anywhere. Once there, use bool type for the parameter. Signed-off-by: Juan Quintela Reviewed-by: Eric Blake --- exec.c | 10 -- 1 file changed, 4 insertions(+), 6 deletions(-) diff --git a/exec.c b/exec.c index ba5989c..d165fd8 100644 --- a/exec.c +++ b/exec.c @@ -56,7 +56,7 @@ //#define DEBUG_SUBPAGE #if !defined(CONFIG_USER_ONLY) -static int in_migration; +static bool in_migration; RAMList ram_list = { .blocks = QTAILQ_HEAD_INITIALIZER(ram_list.blocks) }; @@ -751,11 +751,9 @@ void cpu_physical_memory_reset_dirty(ram_addr_t start, ram_addr_t length, } } -static int cpu_physical_memory_set_dirty_tracking(int enable) +static void cpu_physical_memory_set_dirty_tracking(bool enable) { -int ret = 0; in_migration = enable; -return ret; } hwaddr memory_region_section_get_iotlb(CPUArchState *env, @@ -1797,12 +1795,12 @@ static void tcg_commit(MemoryListener *listener) static void core_log_global_start(MemoryListener *listener) { -cpu_physical_memory_set_dirty_tracking(1); +cpu_physical_memory_set_dirty_tracking(true); } static void core_log_global_stop(MemoryListener *listener) { -cpu_physical_memory_set_dirty_tracking(0); +cpu_physical_memory_set_dirty_tracking(false); } static MemoryListener core_memory_listener = { Reviewed-by: Orit Wasserman
Re: [Qemu-devel] [PATCH 28/38] memory: make cpu_physical_memory_reset_dirty() take a length parameter
On 12/17/2013 05:26 PM, Juan Quintela wrote: We have an end parameter in all the callers, and this make it coherent with the rest of cpu_physical_memory_* functions, that also take a length parameter. Once here, move the start/end calculation to tlb_reset_dirty_range_all() as we don't need it here anymore. Signed-off-by: Juan Quintela Reviewed-by: Eric Blake --- cputlb.c | 3 +-- exec.c | 19 --- include/exec/memory-internal.h | 2 +- memory.c | 8 ++-- 4 files changed, 12 insertions(+), 20 deletions(-) diff --git a/cputlb.c b/cputlb.c index 911d764..a5805e1 100644 --- a/cputlb.c +++ b/cputlb.c @@ -127,8 +127,7 @@ void tlb_flush_page(CPUArchState *env, target_ulong addr) can be detected */ void tlb_protect_code(ram_addr_t ram_addr) { -cpu_physical_memory_reset_dirty(ram_addr, -ram_addr + TARGET_PAGE_SIZE, +cpu_physical_memory_reset_dirty(ram_addr, TARGET_PAGE_SIZE, DIRTY_MEMORY_CODE); } diff --git a/exec.c b/exec.c index a2f89eb..ba5989c 100644 --- a/exec.c +++ b/exec.c @@ -723,11 +723,14 @@ found: return block; } -static void tlb_reset_dirty_range_all(ram_addr_t start, ram_addr_t end, - uintptr_t length) +static void tlb_reset_dirty_range_all(ram_addr_t start, ram_addr_t length) { -RAMBlock *block; ram_addr_t start1; +RAMBlock *block; +ram_addr_t end; + +end = TARGET_PAGE_ALIGN(start + length); +start &= TARGET_PAGE_MASK; block = qemu_get_ram_block(start); assert(block == qemu_get_ram_block(end - 1)); @@ -736,21 +739,15 @@ static void tlb_reset_dirty_range_all(ram_addr_t start, ram_addr_t end, } /* Note: start and end must be within the same ram block. 
*/ -void cpu_physical_memory_reset_dirty(ram_addr_t start, ram_addr_t end, +void cpu_physical_memory_reset_dirty(ram_addr_t start, ram_addr_t length, unsigned client) { -uintptr_t length; - -start &= TARGET_PAGE_MASK; -end = TARGET_PAGE_ALIGN(end); - -length = end - start; if (length == 0) return; cpu_physical_memory_clear_dirty_range(start, length, client); if (tcg_enabled()) { -tlb_reset_dirty_range_all(start, end, length); +tlb_reset_dirty_range_all(start, length); } } diff --git a/include/exec/memory-internal.h b/include/exec/memory-internal.h index 771b23f..cb2249f 100644 --- a/include/exec/memory-internal.h +++ b/include/exec/memory-internal.h @@ -102,7 +102,7 @@ static inline void cpu_physical_memory_clear_dirty_range(ram_addr_t start, bitmap_clear(ram_list.dirty_memory[client], page, end - page); } -void cpu_physical_memory_reset_dirty(ram_addr_t start, ram_addr_t end, +void cpu_physical_memory_reset_dirty(ram_addr_t start, ram_addr_t length, unsigned client); #endif diff --git a/memory.c b/memory.c index a490cbd..c010296 100644 --- a/memory.c +++ b/memory.c @@ -1191,9 +1191,7 @@ bool memory_region_test_and_clear_dirty(MemoryRegion *mr, hwaddr addr, assert(mr->terminates); ret = cpu_physical_memory_get_dirty(mr->ram_addr + addr, size, client); if (ret) { -cpu_physical_memory_reset_dirty(mr->ram_addr + addr, -mr->ram_addr + addr + size, -client); +cpu_physical_memory_reset_dirty(mr->ram_addr + addr, size, client); } return ret; } @@ -1239,9 +1237,7 @@ void memory_region_reset_dirty(MemoryRegion *mr, hwaddr addr, hwaddr size, unsigned client) { assert(mr->terminates); -cpu_physical_memory_reset_dirty(mr->ram_addr + addr, -mr->ram_addr + addr + size, -client); +cpu_physical_memory_reset_dirty(mr->ram_addr + addr, size, client); } void *memory_region_get_ram_ptr(MemoryRegion *mr) Reviewed-by: Orit Wasserman
Re: [Qemu-devel] [PATCH 27/38] memory: s/dirty/clean/ in cpu_physical_memory_is_dirty()
On 12/17/2013 05:26 PM, Juan Quintela wrote: All uses except one really want the other meaning. Signed-off-by: Juan Quintela Reviewed-by: Eric Blake --- cputlb.c | 3 ++- exec.c | 6 +++--- include/exec/memory-internal.h | 5 ++--- 3 files changed, 7 insertions(+), 7 deletions(-) diff --git a/cputlb.c b/cputlb.c index dfd747a..911d764 100644 --- a/cputlb.c +++ b/cputlb.c @@ -299,7 +299,8 @@ void tlb_set_page(CPUArchState *env, target_ulong vaddr, /* Write access calls the I/O callback. */ te->addr_write = address | TLB_MMIO; } else if (memory_region_is_ram(section->mr) - && !cpu_physical_memory_is_dirty(section->mr->ram_addr + xlat)) { + && cpu_physical_memory_is_clean(section->mr->ram_addr + + xlat)) { te->addr_write = address | TLB_NOTDIRTY; } else { te->addr_write = address; diff --git a/exec.c b/exec.c index 6bef3e5..a2f89eb 100644 --- a/exec.c +++ b/exec.c @@ -1513,7 +1513,7 @@ static void notdirty_mem_write(void *opaque, hwaddr ram_addr, cpu_physical_memory_set_dirty_flag(ram_addr, DIRTY_MEMORY_VGA); /* we remove the notdirty callback only if the code has been flushed */ -if (cpu_physical_memory_is_dirty(ram_addr)) { +if (!cpu_physical_memory_is_clean(ram_addr)) { CPUArchState *env = current_cpu->env_ptr; tlb_set_dirty(env, env->mem_io_vaddr); } @@ -1916,7 +1916,7 @@ int cpu_memory_rw_debug(CPUState *cpu, target_ulong addr, static void invalidate_and_set_dirty(hwaddr addr, hwaddr length) { -if (!cpu_physical_memory_is_dirty(addr)) { +if (cpu_physical_memory_is_clean(addr)) { /* invalidate code */ tb_invalidate_phys_page_range(addr, addr + length, 0); /* set dirty bit */ @@ -2499,7 +2499,7 @@ void stl_phys_notdirty(hwaddr addr, uint32_t val) stl_p(ptr, val); if (unlikely(in_migration)) { -if (!cpu_physical_memory_is_dirty(addr1)) { +if (cpu_physical_memory_is_clean(addr1)) { /* invalidate code */ tb_invalidate_phys_page_range(addr1, addr1 + 4, 0); /* set dirty bit */ diff --git a/include/exec/memory-internal.h b/include/exec/memory-internal.h index 
e2f55ea..771b23f 100644 --- a/include/exec/memory-internal.h +++ b/include/exec/memory-internal.h @@ -61,14 +61,13 @@ static inline bool cpu_physical_memory_get_dirty_flag(ram_addr_t addr, return cpu_physical_memory_get_dirty(addr, 1, client); } -/* read dirty bit (return 0 or 1) */ -static inline bool cpu_physical_memory_is_dirty(ram_addr_t addr) +static inline bool cpu_physical_memory_is_clean(ram_addr_t addr) { bool vga = cpu_physical_memory_get_dirty_flag(addr, DIRTY_MEMORY_VGA); bool code = cpu_physical_memory_get_dirty_flag(addr, DIRTY_MEMORY_CODE); bool migration = cpu_physical_memory_get_dirty_flag(addr, DIRTY_MEMORY_MIGRATION); -return vga && code && migration; +return !(vga && code && migration); } static inline void cpu_physical_memory_set_dirty_flag(ram_addr_t addr, Reviewed-by: Orit Wasserman
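The rename above is a plain logical negation: the old cpu_physical_memory_is_dirty() answered "dirty in all three clients at once", so "clean" means at least one client has no pending dirty bit for the page. A trivial model of the inverted predicate (illustrative, not QEMU's signature):

```c
#include <assert.h>
#include <stdbool.h>

/* clean == NOT (dirty for VGA AND code AND migration) */
static bool is_clean(bool vga, bool code, bool migration)
{
    return !(vga && code && migration);
}
```

Inverting the predicate lets almost every caller drop its `!`, which is why the commit message notes that all uses except one wanted the other meaning.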
Re: [Qemu-devel] [PATCH 25/38] memory: cpu_physical_memory_set_dirty_range() now uses bitmap operations
On 12/17/2013 05:26 PM, Juan Quintela wrote: We were setting a range of bits, so use bitmap_set(). Note: xen has always been wrong, and should have used start instead of addr from the beginning. Signed-off-by: Juan Quintela Reviewed-by: Eric Blake --- include/exec/memory-internal.h | 19 +++ 1 file changed, 7 insertions(+), 12 deletions(-) diff --git a/include/exec/memory-internal.h b/include/exec/memory-internal.h index b017c2e..4906cdf 100644 --- a/include/exec/memory-internal.h +++ b/include/exec/memory-internal.h @@ -81,19 +81,14 @@ static inline void cpu_physical_memory_set_dirty_flag(ram_addr_t addr, static inline void cpu_physical_memory_set_dirty_range(ram_addr_t start, ram_addr_t length) { -ram_addr_t addr, end; +unsigned long end, page; -end = TARGET_PAGE_ALIGN(start + length); -start &= TARGET_PAGE_MASK; -for (addr = start; addr < end; addr += TARGET_PAGE_SIZE) { -set_bit(addr >> TARGET_PAGE_BITS, -ram_list.dirty_memory[DIRTY_MEMORY_MIGRATION]); -set_bit(addr >> TARGET_PAGE_BITS, -ram_list.dirty_memory[DIRTY_MEMORY_VGA]); -set_bit(addr >> TARGET_PAGE_BITS, -ram_list.dirty_memory[DIRTY_MEMORY_CODE]); -} -xen_modified_memory(addr, length); +end = TARGET_PAGE_ALIGN(start + length) >> TARGET_PAGE_BITS; +page = start >> TARGET_PAGE_BITS; +bitmap_set(ram_list.dirty_memory[DIRTY_MEMORY_MIGRATION], page, end - page); +bitmap_set(ram_list.dirty_memory[DIRTY_MEMORY_VGA], page, end - page); +bitmap_set(ram_list.dirty_memory[DIRTY_MEMORY_CODE], page, end - page); +xen_modified_memory(start, length); } static inline void cpu_physical_memory_clear_dirty_range(ram_addr_t start, Reviewed-by: Orit Wasserman
Re: [Qemu-devel] [PATCH 24/38] memory: use find_next_bit() to find dirty bits
On 12/17/2013 05:26 PM, Juan Quintela wrote: This operation is way faster than doing it bit by bit. Signed-off-by: Juan Quintela Reviewed-by: Eric Blake --- include/exec/memory-internal.h | 16 ++-- 1 file changed, 6 insertions(+), 10 deletions(-) diff --git a/include/exec/memory-internal.h b/include/exec/memory-internal.h index c04a92a..b017c2e 100644 --- a/include/exec/memory-internal.h +++ b/include/exec/memory-internal.h @@ -44,19 +44,15 @@ static inline bool cpu_physical_memory_get_dirty(ram_addr_t start, ram_addr_t length, unsigned client) { -ram_addr_t addr, end; +unsigned long end, page, next; assert(client < DIRTY_MEMORY_NUM); -end = TARGET_PAGE_ALIGN(start + length); -start &= TARGET_PAGE_MASK; -for (addr = start; addr < end; addr += TARGET_PAGE_SIZE) { -if (test_bit(addr >> TARGET_PAGE_BITS, - ram_list.dirty_memory[client])) { -return true; -} -} -return false; +end = TARGET_PAGE_ALIGN(start + length) >> TARGET_PAGE_BITS; +page = start >> TARGET_PAGE_BITS; +next = find_next_bit(ram_list.dirty_memory[client], end, page); + +return next < end; } static inline bool cpu_physical_memory_get_dirty_flag(ram_addr_t addr, Reviewed-by: Orit Wasserman
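The dirty-range check above reduces to find_next_bit(map, end, page) < end: if the first set bit at or after `page` lies before `end`, something in the range is dirty. A minimal model of the semantics (note the argument order in the quoted call: the size limit comes before the starting offset; a real implementation also skips whole zero words, which this sketch omits):

```c
#include <assert.h>

#define BITS_PER_LONG (8u * (unsigned)sizeof(unsigned long))

/* return the index of the first set bit in [offset, size),
 * or size if there is none */
static unsigned long find_next_bit_sketch(const unsigned long *map,
                                          unsigned long size,
                                          unsigned long offset)
{
    for (unsigned long i = offset; i < size; i++) {
        if (map[i / BITS_PER_LONG] & (1ul << (i % BITS_PER_LONG))) {
            return i;
        }
    }
    return size;
}
```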
Re: [Qemu-devel] [PATCHv2] block: add native support for NFS
On 12/18/2013 01:11 PM, Peter Lieven wrote: Am 18.12.2013 um 11:00 schrieb Orit Wasserman : On 12/18/2013 01:03 AM, Peter Lieven wrote: Am 17.12.2013 um 18:32 schrieb "Daniel P. Berrange" : On Tue, Dec 17, 2013 at 10:15:25AM +0100, Peter Lieven wrote: This patch adds native support for accessing images on NFS shares without the requirement to actually mount the entire NFS share on the host. NFS Images can simply be specified by an url of the form: nfs: For example: qemu-img create -f qcow2 nfs://10.0.0.1/qemu-images/test.qcow2 Does it support other config tunables, eg specifying which NFS version to use 2/3/4 ? If so will they be available as URI parameters in the obvious manner ? currently only v3 is supported by libnfs. what other tunables would you like to see? For live migration we need the sync option (async ignores O_SYNC and O_DIRECT sadly), will it be supported? or will it be the default? the async you see in the libnfs calls refer to the async API. bdrv_flush will not return before the nfs server completes the request. That great! Thanks, Orit Peter Orit Daniel -- |: http://berrange.com -o-http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|
Re: [Qemu-devel] [PATCH 23/38] memory: s/mask/clear/ cpu_physical_memory_mask_dirty_range
On 12/17/2013 05:25 PM, Juan Quintela wrote: Now all functions use the same wording that bitops/bitmap operations Signed-off-by: Juan Quintela Reviewed-by: Eric Blake --- exec.c | 2 +- include/exec/memory-internal.h | 6 +++--- 2 files changed, 4 insertions(+), 4 deletions(-) diff --git a/exec.c b/exec.c index ad9866c..6bef3e5 100644 --- a/exec.c +++ b/exec.c @@ -747,7 +747,7 @@ void cpu_physical_memory_reset_dirty(ram_addr_t start, ram_addr_t end, length = end - start; if (length == 0) return; -cpu_physical_memory_mask_dirty_range(start, length, client); +cpu_physical_memory_clear_dirty_range(start, length, client); if (tcg_enabled()) { tlb_reset_dirty_range_all(start, end, length); diff --git a/include/exec/memory-internal.h b/include/exec/memory-internal.h index fa28fc6..c04a92a 100644 --- a/include/exec/memory-internal.h +++ b/include/exec/memory-internal.h @@ -100,9 +100,9 @@ static inline void cpu_physical_memory_set_dirty_range(ram_addr_t start, xen_modified_memory(addr, length); } -static inline void cpu_physical_memory_mask_dirty_range(ram_addr_t start, -ram_addr_t length, -unsigned client) +static inline void cpu_physical_memory_clear_dirty_range(ram_addr_t start, + ram_addr_t length, + unsigned client) { ram_addr_t addr, end; Reviewed-by: Orit Wasserman
Re: [Qemu-devel] [PATCH 18/38] memory: unfold cpu_physical_memory_clear_dirty_flag() in its only user
On 12/17/2013 05:25 PM, Juan Quintela wrote: Signed-off-by: Juan Quintela Reviewed-by: Eric Blake --- include/exec/memory-internal.h | 10 ++ 1 file changed, 2 insertions(+), 8 deletions(-) diff --git a/include/exec/memory-internal.h b/include/exec/memory-internal.h index 6fb1b64..2f6610a 100644 --- a/include/exec/memory-internal.h +++ b/include/exec/memory-internal.h @@ -86,13 +86,6 @@ static inline void cpu_physical_memory_set_dirty(ram_addr_t addr) cpu_physical_memory_set_dirty_flag(addr, DIRTY_MEMORY_CODE); } -static inline void cpu_physical_memory_clear_dirty_flag(ram_addr_t addr, - unsigned client) -{ -assert(client < DIRTY_MEMORY_NUM); -clear_bit(addr >> TARGET_PAGE_BITS, ram_list.dirty_memory[client]); -} - static inline void cpu_physical_memory_set_dirty_range(ram_addr_t start, ram_addr_t length) { @@ -112,10 +105,11 @@ static inline void cpu_physical_memory_mask_dirty_range(ram_addr_t start, { ram_addr_t addr, end; +assert(client < DIRTY_MEMORY_NUM); end = TARGET_PAGE_ALIGN(start + length); start &= TARGET_PAGE_MASK; for (addr = start; addr < end; addr += TARGET_PAGE_SIZE) { -cpu_physical_memory_clear_dirty_flag(addr, client); +clear_bit(addr >> TARGET_PAGE_BITS, ram_list.dirty_memory[client]); } } Reviewed-by: Orit Wasserman
Re: [Qemu-devel] [PATCH 22/38] memory: cpu_physical_memory_get_dirty() is used as returning a bool
On 12/17/2013 05:25 PM, Juan Quintela wrote: Signed-off-by: Juan Quintela Reviewed-by: Eric Blake --- include/exec/memory-internal.h | 15 --- 1 file changed, 8 insertions(+), 7 deletions(-) diff --git a/include/exec/memory-internal.h b/include/exec/memory-internal.h index edca8a8..fa28fc6 100644 --- a/include/exec/memory-internal.h +++ b/include/exec/memory-internal.h @@ -40,11 +40,10 @@ void *qemu_get_ram_ptr(ram_addr_t addr); void qemu_ram_free(ram_addr_t addr); void qemu_ram_free_from_ptr(ram_addr_t addr); -static inline int cpu_physical_memory_get_dirty(ram_addr_t start, -ram_addr_t length, -unsigned client) +static inline bool cpu_physical_memory_get_dirty(ram_addr_t start, + ram_addr_t length, + unsigned client) { -int ret = 0; ram_addr_t addr, end; assert(client < DIRTY_MEMORY_NUM); @@ -52,10 +51,12 @@ static inline int cpu_physical_memory_get_dirty(ram_addr_t start, end = TARGET_PAGE_ALIGN(start + length); start &= TARGET_PAGE_MASK; for (addr = start; addr < end; addr += TARGET_PAGE_SIZE) { -ret |= test_bit(addr >> TARGET_PAGE_BITS, -ram_list.dirty_memory[client]); +if (test_bit(addr >> TARGET_PAGE_BITS, + ram_list.dirty_memory[client])) { +return true; +} } -return ret; +return false; } static inline bool cpu_physical_memory_get_dirty_flag(ram_addr_t addr, Reviewed-by: Orit Wasserman
(Note: besides switching the return type to bool, the rewrite above also short-circuits: the old code kept OR-ing test results across the whole range, while the new one returns as soon as the first dirty page is found.)
Re: [Qemu-devel] [PATCH 21/38] memory: make cpu_physical_memory_get_dirty() the main function
On 12/17/2013 05:25 PM, Juan Quintela wrote: And make cpu_physical_memory_get_dirty_flag() to use it. It used to be the other way around. Signed-off-by: Juan Quintela Reviewed-by: Eric Blake --- include/exec/memory-internal.h | 36 +++- 1 file changed, 19 insertions(+), 17 deletions(-) diff --git a/include/exec/memory-internal.h b/include/exec/memory-internal.h index b99617a..edca8a8 100644 --- a/include/exec/memory-internal.h +++ b/include/exec/memory-internal.h @@ -40,11 +40,28 @@ void *qemu_get_ram_ptr(ram_addr_t addr); void qemu_ram_free(ram_addr_t addr); void qemu_ram_free_from_ptr(ram_addr_t addr); +static inline int cpu_physical_memory_get_dirty(ram_addr_t start, +ram_addr_t length, +unsigned client) +{ +int ret = 0; +ram_addr_t addr, end; + +assert(client < DIRTY_MEMORY_NUM); + +end = TARGET_PAGE_ALIGN(start + length); +start &= TARGET_PAGE_MASK; +for (addr = start; addr < end; addr += TARGET_PAGE_SIZE) { +ret |= test_bit(addr >> TARGET_PAGE_BITS, +ram_list.dirty_memory[client]); +} +return ret; +} + static inline bool cpu_physical_memory_get_dirty_flag(ram_addr_t addr, unsigned client) { -assert(client < DIRTY_MEMORY_NUM); -return test_bit(addr >> TARGET_PAGE_BITS, ram_list.dirty_memory[client]); +return cpu_physical_memory_get_dirty(addr, 1, client); } /* read dirty bit (return 0 or 1) */ @@ -57,21 +74,6 @@ static inline bool cpu_physical_memory_is_dirty(ram_addr_t addr) return vga && code && migration; } -static inline int cpu_physical_memory_get_dirty(ram_addr_t start, -ram_addr_t length, -unsigned client) -{ -int ret = 0; -ram_addr_t addr, end; - -end = TARGET_PAGE_ALIGN(start + length); -start &= TARGET_PAGE_MASK; -for (addr = start; addr < end; addr += TARGET_PAGE_SIZE) { -ret |= cpu_physical_memory_get_dirty_flag(addr, client); -} -return ret; -} - static inline void cpu_physical_memory_set_dirty_flag(ram_addr_t addr, unsigned client) { Reviewed-by: Orit Wasserman
Re: [Qemu-devel] [PATCH 20/38] memory: unfold cpu_physical_memory_set_dirty_flag()
On 12/17/2013 05:25 PM, Juan Quintela wrote: Signed-off-by: Juan Quintela Reviewed-by: Eric Blake --- include/exec/memory-internal.h | 9 ++--- 1 file changed, 6 insertions(+), 3 deletions(-) diff --git a/include/exec/memory-internal.h b/include/exec/memory-internal.h index 9d34434..b99617a 100644 --- a/include/exec/memory-internal.h +++ b/include/exec/memory-internal.h @@ -87,9 +87,12 @@ static inline void cpu_physical_memory_set_dirty_range(ram_addr_t start, end = TARGET_PAGE_ALIGN(start + length); start &= TARGET_PAGE_MASK; for (addr = start; addr < end; addr += TARGET_PAGE_SIZE) { -cpu_physical_memory_set_dirty_flag(addr, DIRTY_MEMORY_MIGRATION); -cpu_physical_memory_set_dirty_flag(addr, DIRTY_MEMORY_VGA); -cpu_physical_memory_set_dirty_flag(addr, DIRTY_MEMORY_CODE); +set_bit(addr >> TARGET_PAGE_BITS, +ram_list.dirty_memory[DIRTY_MEMORY_MIGRATION]); +set_bit(addr >> TARGET_PAGE_BITS, +ram_list.dirty_memory[DIRTY_MEMORY_VGA]); +set_bit(addr >> TARGET_PAGE_BITS, +ram_list.dirty_memory[DIRTY_MEMORY_CODE]); } xen_modified_memory(addr, length); } Reviewed-by: Orit Wasserman
Re: [Qemu-devel] [PATCH 19/38] memory: unfold cpu_physical_memory_set_dirty() in its only user
On 12/17/2013 05:25 PM, Juan Quintela wrote: Signed-off-by: Juan Quintela Reviewed-by: Eric Blake --- include/exec/memory-internal.h | 11 +++ 1 file changed, 3 insertions(+), 8 deletions(-) diff --git a/include/exec/memory-internal.h b/include/exec/memory-internal.h index 2f6610a..9d34434 100644 --- a/include/exec/memory-internal.h +++ b/include/exec/memory-internal.h @@ -79,13 +79,6 @@ static inline void cpu_physical_memory_set_dirty_flag(ram_addr_t addr, set_bit(addr >> TARGET_PAGE_BITS, ram_list.dirty_memory[client]); } -static inline void cpu_physical_memory_set_dirty(ram_addr_t addr) -{ -cpu_physical_memory_set_dirty_flag(addr, DIRTY_MEMORY_MIGRATION); -cpu_physical_memory_set_dirty_flag(addr, DIRTY_MEMORY_VGA); -cpu_physical_memory_set_dirty_flag(addr, DIRTY_MEMORY_CODE); -} - static inline void cpu_physical_memory_set_dirty_range(ram_addr_t start, ram_addr_t length) { @@ -94,7 +87,9 @@ static inline void cpu_physical_memory_set_dirty_range(ram_addr_t start, end = TARGET_PAGE_ALIGN(start + length); start &= TARGET_PAGE_MASK; for (addr = start; addr < end; addr += TARGET_PAGE_SIZE) { -cpu_physical_memory_set_dirty(addr); +cpu_physical_memory_set_dirty_flag(addr, DIRTY_MEMORY_MIGRATION); +cpu_physical_memory_set_dirty_flag(addr, DIRTY_MEMORY_VGA); +cpu_physical_memory_set_dirty_flag(addr, DIRTY_MEMORY_CODE); } xen_modified_memory(addr, length); } Reviewed-by: Orit Wasserman
Re: [Qemu-devel] [PATCHv2] block: add native support for NFS
On 12/18/2013 12:18 PM, Daniel P. Berrange wrote: On Wed, Dec 18, 2013 at 12:00:03PM +0200, Orit Wasserman wrote: On 12/18/2013 01:03 AM, Peter Lieven wrote: Am 17.12.2013 um 18:32 schrieb "Daniel P. Berrange" : On Tue, Dec 17, 2013 at 10:15:25AM +0100, Peter Lieven wrote: This patch adds native support for accessing images on NFS shares without the requirement to actually mount the entire NFS share on the host. NFS Images can simply be specified by an url of the form: nfs: For example: qemu-img create -f qcow2 nfs://10.0.0.1/qemu-images/test.qcow2 Does it support other config tunables, eg specifying which NFS version to use 2/3/4 ? If so will they be available as URI parameters in the obvious manner ? currently only v3 is supported by libnfs. what other tunables would you like to see? For live migration we need the sync option (async ignores O_SYNC and O_DIRECT sadly), will it be supported? or will it be the default? Since this is bypassing the client kernel FS I/O layer question around support of things like O_SYNC/O_DIRECT are not applicable. so no live migration support? Daniel
Re: [Qemu-devel] [PATCH 17/38] memory: split dirty bitmap into three
On 12/17/2013 05:25 PM, Juan Quintela wrote: After all the previous patches, spliting the bitmap gets direct. Note: For some reason, I have to move DIRTY_MEMORY_* definitions to the beginning of memory.h to make compilation work. Signed-off-by: Juan Quintela Reviewed-by: Eric Blake --- exec.c | 9 ++--- include/exec/cpu-all.h | 3 ++- include/exec/memory-internal.h | 9 +++-- include/exec/memory.h | 10 +- 4 files changed, 16 insertions(+), 15 deletions(-) diff --git a/exec.c b/exec.c index bed5c07..ad9866c 100644 --- a/exec.c +++ b/exec.c @@ -1276,9 +1276,12 @@ ram_addr_t qemu_ram_alloc_from_ptr(ram_addr_t size, void *host, new_ram_size = last_ram_offset() >> TARGET_PAGE_BITS; if (new_ram_size > old_ram_size) { -ram_list.phys_dirty = g_realloc(ram_list.phys_dirty, new_ram_size); -memset(ram_list.phys_dirty + (new_block->offset >> TARGET_PAGE_BITS), - 0, size >> TARGET_PAGE_BITS); +int i; +for (i = 0; i < DIRTY_MEMORY_NUM; i++) { +ram_list.dirty_memory[i] = +bitmap_zero_extend(ram_list.dirty_memory[i], + old_ram_size, new_ram_size); + } } cpu_physical_memory_set_dirty_range(new_block->offset, size); diff --git a/include/exec/cpu-all.h b/include/exec/cpu-all.h index b6998f0..4cb4b4a 100644 --- a/include/exec/cpu-all.h +++ b/include/exec/cpu-all.h @@ -21,6 +21,7 @@ #include "qemu-common.h" #include "exec/cpu-common.h" +#include "exec/memory.h" #include "qemu/thread.h" #include "qom/cpu.h" @@ -459,7 +460,7 @@ typedef struct RAMBlock { typedef struct RAMList { QemuMutex mutex; /* Protected by the iothread lock. */ -uint8_t *phys_dirty; +unsigned long *dirty_memory[DIRTY_MEMORY_NUM]; RAMBlock *mru_block; /* Protected by the ramlist lock. 
*/ QTAILQ_HEAD(, RAMBlock) blocks; diff --git a/include/exec/memory-internal.h b/include/exec/memory-internal.h index 666490c..6fb1b64 100644 --- a/include/exec/memory-internal.h +++ b/include/exec/memory-internal.h @@ -44,7 +44,7 @@ static inline bool cpu_physical_memory_get_dirty_flag(ram_addr_t addr, unsigned client) { assert(client < DIRTY_MEMORY_NUM); -return ram_list.phys_dirty[addr >> TARGET_PAGE_BITS] & (1 << client); +return test_bit(addr >> TARGET_PAGE_BITS, ram_list.dirty_memory[client]); } /* read dirty bit (return 0 or 1) */ @@ -76,7 +76,7 @@ static inline void cpu_physical_memory_set_dirty_flag(ram_addr_t addr, unsigned client) { assert(client < DIRTY_MEMORY_NUM); -ram_list.phys_dirty[addr >> TARGET_PAGE_BITS] |= (1 << client); +set_bit(addr >> TARGET_PAGE_BITS, ram_list.dirty_memory[client]); } static inline void cpu_physical_memory_set_dirty(ram_addr_t addr) @@ -89,11 +89,8 @@ static inline void cpu_physical_memory_set_dirty(ram_addr_t addr) static inline void cpu_physical_memory_clear_dirty_flag(ram_addr_t addr, unsigned client) { -int mask = ~(1 << client); - assert(client < DIRTY_MEMORY_NUM); - -ram_list.phys_dirty[addr >> TARGET_PAGE_BITS] &= mask; +clear_bit(addr >> TARGET_PAGE_BITS, ram_list.dirty_memory[client]); } static inline void cpu_physical_memory_set_dirty_range(ram_addr_t start, diff --git a/include/exec/memory.h b/include/exec/memory.h index d5e9d58..296d6ab 100644 --- a/include/exec/memory.h +++ b/include/exec/memory.h @@ -16,6 +16,11 @@ #ifndef CONFIG_USER_ONLY +#define DIRTY_MEMORY_VGA 0 +#define DIRTY_MEMORY_CODE 1 +#define DIRTY_MEMORY_MIGRATION 2 +#define DIRTY_MEMORY_NUM 3/* num of dirty bits */ + #include #include #include "qemu-common.h" @@ -33,11 +38,6 @@ typedef struct MemoryRegionOps MemoryRegionOps; typedef struct MemoryRegionMmio MemoryRegionMmio; -#define DIRTY_MEMORY_VGA 0 -#define DIRTY_MEMORY_CODE 1 -#define DIRTY_MEMORY_MIGRATION 2 -#define DIRTY_MEMORY_NUM 3 /* num of dirty bits */ - struct MemoryRegionMmio { 
CPUReadMemoryFunc *read[3]; CPUWriteMemoryFunc *write[3]; Reviewed-by: Orit Wasserman
Re: [Qemu-devel] [PATCH 16/38] bitmap: Add bitmap_zero_extend operation
On 12/17/2013 05:25 PM, Juan Quintela wrote: Signed-off-by: Juan Quintela Reviewed-by: Eric Blake --- include/qemu/bitmap.h | 9 + 1 file changed, 9 insertions(+) diff --git a/include/qemu/bitmap.h b/include/qemu/bitmap.h index afdd257..1babd5d 100644 --- a/include/qemu/bitmap.h +++ b/include/qemu/bitmap.h @@ -220,4 +220,13 @@ unsigned long bitmap_find_next_zero_area(unsigned long *map, unsigned long nr, unsigned long align_mask); +static inline unsigned long *bitmap_zero_extend(unsigned long *old, +long old_nbits, long new_nbits) +{ +long new_len = BITS_TO_LONGS(new_nbits) * sizeof(unsigned long); +unsigned long *new = g_realloc(old, new_len); +bitmap_clear(new, old_nbits, new_nbits - old_nbits); +return new; +} + #endif /* BITMAP_H */ Reviewed-by: Orit Wasserman
Re: [Qemu-devel] [PATCH 14/38] memory: only resize dirty bitmap when memory size increases
On 12/17/2013 05:25 PM, Juan Quintela wrote: Signed-off-by: Juan Quintela Reviewed-by: Eric Blake --- exec.c | 12 +--- 1 file changed, 9 insertions(+), 3 deletions(-) diff --git a/exec.c b/exec.c index 9996da2..bed5c07 100644 --- a/exec.c +++ b/exec.c @@ -1210,6 +1210,9 @@ ram_addr_t qemu_ram_alloc_from_ptr(ram_addr_t size, void *host, MemoryRegion *mr) { RAMBlock *block, *new_block; +ram_addr_t old_ram_size, new_ram_size; + +old_ram_size = last_ram_offset() >> TARGET_PAGE_BITS; size = TARGET_PAGE_ALIGN(size); new_block = g_malloc0(sizeof(*new_block)); @@ -1270,10 +1273,13 @@ ram_addr_t qemu_ram_alloc_from_ptr(ram_addr_t size, void *host, ram_list.version++; qemu_mutex_unlock_ramlist(); -ram_list.phys_dirty = g_realloc(ram_list.phys_dirty, - last_ram_offset() >> TARGET_PAGE_BITS); -memset(ram_list.phys_dirty + (new_block->offset >> TARGET_PAGE_BITS), +new_ram_size = last_ram_offset() >> TARGET_PAGE_BITS; + +if (new_ram_size > old_ram_size) { +ram_list.phys_dirty = g_realloc(ram_list.phys_dirty, new_ram_size); +memset(ram_list.phys_dirty + (new_block->offset >> TARGET_PAGE_BITS), 0, size >> TARGET_PAGE_BITS); +} cpu_physical_memory_set_dirty_range(new_block->offset, size); qemu_ram_setup_dump(new_block->host, size); Reviewed-by: Orit Wasserman
Re: [Qemu-devel] [PATCH 15/38] memory: cpu_physical_memory_clear_dirty_flag() result is never used
On 12/17/2013 05:25 PM, Juan Quintela wrote: Signed-off-by: Juan Quintela Reviewed-by: Eric Blake --- include/exec/memory-internal.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/include/exec/memory-internal.h b/include/exec/memory-internal.h index d09d6d8..666490c 100644 --- a/include/exec/memory-internal.h +++ b/include/exec/memory-internal.h @@ -86,14 +86,14 @@ static inline void cpu_physical_memory_set_dirty(ram_addr_t addr) cpu_physical_memory_set_dirty_flag(addr, DIRTY_MEMORY_CODE); } -static inline int cpu_physical_memory_clear_dirty_flag(ram_addr_t addr, +static inline void cpu_physical_memory_clear_dirty_flag(ram_addr_t addr, unsigned client) { int mask = ~(1 << client); assert(client < DIRTY_MEMORY_NUM); -return ram_list.phys_dirty[addr >> TARGET_PAGE_BITS] &= mask; +ram_list.phys_dirty[addr >> TARGET_PAGE_BITS] &= mask; } static inline void cpu_physical_memory_set_dirty_range(ram_addr_t start, Reviewed-by: Orit Wasserman
Re: [Qemu-devel] [PATCH 13/38] memory: make sure that client is always inside range
On 12/17/2013 05:25 PM, Juan Quintela wrote: Signed-off-by: Juan Quintela Reviewed-by: Eric Blake --- include/exec/memory-internal.h | 4 1 file changed, 4 insertions(+) diff --git a/include/exec/memory-internal.h b/include/exec/memory-internal.h index b58010f..d09d6d8 100644 --- a/include/exec/memory-internal.h +++ b/include/exec/memory-internal.h @@ -43,6 +43,7 @@ void qemu_ram_free_from_ptr(ram_addr_t addr); static inline bool cpu_physical_memory_get_dirty_flag(ram_addr_t addr, unsigned client) { +assert(client < DIRTY_MEMORY_NUM); return ram_list.phys_dirty[addr >> TARGET_PAGE_BITS] & (1 << client); } @@ -74,6 +75,7 @@ static inline int cpu_physical_memory_get_dirty(ram_addr_t start, static inline void cpu_physical_memory_set_dirty_flag(ram_addr_t addr, unsigned client) { +assert(client < DIRTY_MEMORY_NUM); ram_list.phys_dirty[addr >> TARGET_PAGE_BITS] |= (1 << client); } @@ -89,6 +91,8 @@ static inline int cpu_physical_memory_clear_dirty_flag(ram_addr_t addr, { int mask = ~(1 << client); +assert(client < DIRTY_MEMORY_NUM); + return ram_list.phys_dirty[addr >> TARGET_PAGE_BITS] &= mask; } Reviewed-by: Orit Wasserman
Re: [Qemu-devel] [PATCH 12/38] memory: use bit 2 for migration
On 12/17/2013 05:25 PM, Juan Quintela wrote: For historical reasons it was bit 3. Once there, create a constant to know the number of clients. Signed-off-by: Juan Quintela Reviewed-by: Eric Blake --- include/exec/memory.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/include/exec/memory.h b/include/exec/memory.h index b8e76f4..d5e9d58 100644 --- a/include/exec/memory.h +++ b/include/exec/memory.h @@ -35,7 +35,8 @@ typedef struct MemoryRegionMmio MemoryRegionMmio; #define DIRTY_MEMORY_VGA 0 #define DIRTY_MEMORY_CODE 1 -#define DIRTY_MEMORY_MIGRATION 3 +#define DIRTY_MEMORY_MIGRATION 2 +#define DIRTY_MEMORY_NUM 3/* num of dirty bits */ struct MemoryRegionMmio { CPUReadMemoryFunc *read[3]; Reviewed-by: Orit Wasserman
Re: [Qemu-devel] [PATCH 11/38] memory: cpu_physical_memory_mask_dirty_range() always clears a single flag
cpu_physical_memory_get_dirty(mr->ram_addr + addr, size, client); if (ret) { cpu_physical_memory_reset_dirty(mr->ram_addr + addr, mr->ram_addr + addr + size, -1 << client); +client); } return ret; } @@ -1243,7 +1241,7 @@ void memory_region_reset_dirty(MemoryRegion *mr, hwaddr addr, assert(mr->terminates); cpu_physical_memory_reset_dirty(mr->ram_addr + addr, mr->ram_addr + addr + size, - 1 << client); +client); } void *memory_region_get_ram_ptr(MemoryRegion *mr) Reviewed-by: Orit Wasserman
Re: [Qemu-devel] [PATCH 10/38] memory: cpu_physical_memory_set_dirty_range() always dirty all flags
On 12/17/2013 05:25 PM, Juan Quintela wrote: So remove the flag argument and do it directly. After this change, there is nothing else using cpu_physical_memory_set_dirty_flags() so remove it. Signed-off-by: Juan Quintela Reviewed-by: Eric Blake --- exec.c | 2 +- include/exec/memory-internal.h | 11 ++- memory.c | 2 +- 3 files changed, 4 insertions(+), 11 deletions(-) diff --git a/exec.c b/exec.c index 6981f73..45fdb84 100644 --- a/exec.c +++ b/exec.c @@ -1274,7 +1274,7 @@ ram_addr_t qemu_ram_alloc_from_ptr(ram_addr_t size, void *host, last_ram_offset() >> TARGET_PAGE_BITS); memset(ram_list.phys_dirty + (new_block->offset >> TARGET_PAGE_BITS), 0, size >> TARGET_PAGE_BITS); -cpu_physical_memory_set_dirty_range(new_block->offset, size, 0xff); +cpu_physical_memory_set_dirty_range(new_block->offset, size); qemu_ram_setup_dump(new_block->host, size); qemu_madvise(new_block->host, size, QEMU_MADV_HUGEPAGE); diff --git a/include/exec/memory-internal.h b/include/exec/memory-internal.h index 9f4ad69..681d63b 100644 --- a/include/exec/memory-internal.h +++ b/include/exec/memory-internal.h @@ -75,12 +75,6 @@ static inline int cpu_physical_memory_get_dirty(ram_addr_t start, return ret; } -static inline void cpu_physical_memory_set_dirty_flags(ram_addr_t addr, - int dirty_flags) -{ -ram_list.phys_dirty[addr >> TARGET_PAGE_BITS] |= dirty_flags; -} - static inline void cpu_physical_memory_set_dirty_flag(ram_addr_t addr, int dirty_flag) { @@ -103,15 +97,14 @@ static inline int cpu_physical_memory_clear_dirty_flags(ram_addr_t addr, } static inline void cpu_physical_memory_set_dirty_range(ram_addr_t start, - ram_addr_t length, - int dirty_flags) + ram_addr_t length) { ram_addr_t addr, end; end = TARGET_PAGE_ALIGN(start + length); start &= TARGET_PAGE_MASK; for (addr = start; addr < end; addr += TARGET_PAGE_SIZE) { -cpu_physical_memory_set_dirty_flags(addr, dirty_flags); +cpu_physical_memory_set_dirty(addr); } xen_modified_memory(addr, length); } diff --git a/memory.c b/memory.c index 
e497f99..fb52e1c 100644 --- a/memory.c +++ b/memory.c @@ -1182,7 +1182,7 @@ void memory_region_set_dirty(MemoryRegion *mr, hwaddr addr, hwaddr size) { assert(mr->terminates); -cpu_physical_memory_set_dirty_range(mr->ram_addr + addr, size, -1); +cpu_physical_memory_set_dirty_range(mr->ram_addr + addr, size); } bool memory_region_test_and_clear_dirty(MemoryRegion *mr, hwaddr addr, Reviewed-by: Orit Wasserman
Re: [Qemu-devel] [PATCH 09/38] memory: set single dirty flags when possible
On 12/17/2013 05:25 PM, Juan Quintela wrote: Signed-off-by: Juan Quintela Reviewed-by: Eric Blake --- exec.c | 7 --- include/exec/memory-internal.h | 4 +++- 2 files changed, 7 insertions(+), 4 deletions(-) diff --git a/exec.c b/exec.c index a1fc280..6981f73 100644 --- a/exec.c +++ b/exec.c @@ -1911,7 +1911,8 @@ static void invalidate_and_set_dirty(hwaddr addr, /* invalidate code */ tb_invalidate_phys_page_range(addr, addr + length, 0); /* set dirty bit */ -cpu_physical_memory_set_dirty_flags(addr, (0xff & ~CODE_DIRTY_FLAG)); +cpu_physical_memory_set_dirty_flag(addr, VGA_DIRTY_FLAG); +cpu_physical_memory_set_dirty_flag(addr, MIGRATION_DIRTY_FLAG); } xen_modified_memory(addr, length); } @@ -2493,8 +2494,8 @@ void stl_phys_notdirty(hwaddr addr, uint32_t val) /* invalidate code */ tb_invalidate_phys_page_range(addr1, addr1 + 4, 0); /* set dirty bit */ -cpu_physical_memory_set_dirty_flags( -addr1, (0xff & ~CODE_DIRTY_FLAG)); +cpu_physical_memory_set_dirty_flag(addr1, MIGRATION_DIRTY_FLAG); +cpu_physical_memory_set_dirty_flag(addr1, VGA_DIRTY_FLAG); } } } diff --git a/include/exec/memory-internal.h b/include/exec/memory-internal.h index 53cfe83..9f4ad69 100644 --- a/include/exec/memory-internal.h +++ b/include/exec/memory-internal.h @@ -89,7 +89,9 @@ static inline void cpu_physical_memory_set_dirty_flag(ram_addr_t addr, static inline void cpu_physical_memory_set_dirty(ram_addr_t addr) { -cpu_physical_memory_set_dirty_flags(addr, 0xff); +cpu_physical_memory_set_dirty_flag(addr, MIGRATION_DIRTY_FLAG); +cpu_physical_memory_set_dirty_flag(addr, VGA_DIRTY_FLAG); +cpu_physical_memory_set_dirty_flag(addr, CODE_DIRTY_FLAG); } static inline int cpu_physical_memory_clear_dirty_flags(ram_addr_t addr, Reviewed-by: Orit Wasserman
Re: [Qemu-devel] [PATCH 08/38] memory: all users of cpu_physical_memory_get_dirty used only one flag
On 12/17/2013 05:25 PM, Juan Quintela wrote: So cpu_physical_memory_get_dirty_flags is not needed anymore Signed-off-by: Juan Quintela Reviewed-by: Eric Blake --- include/exec/memory-internal.h | 9 ++--- 1 file changed, 2 insertions(+), 7 deletions(-) diff --git a/include/exec/memory-internal.h b/include/exec/memory-internal.h index 0b25c3f..53cfe83 100644 --- a/include/exec/memory-internal.h +++ b/include/exec/memory-internal.h @@ -44,11 +44,6 @@ void qemu_ram_free_from_ptr(ram_addr_t addr); #define CODE_DIRTY_FLAG 0x02 #define MIGRATION_DIRTY_FLAG 0x08 -static inline int cpu_physical_memory_get_dirty_flags(ram_addr_t addr) -{ -return ram_list.phys_dirty[addr >> TARGET_PAGE_BITS]; -} - static inline bool cpu_physical_memory_get_dirty_flag(ram_addr_t addr, int dirty_flag) { @@ -67,7 +62,7 @@ static inline bool cpu_physical_memory_is_dirty(ram_addr_t addr) static inline int cpu_physical_memory_get_dirty(ram_addr_t start, ram_addr_t length, -int dirty_flags) +int dirty_flag) { int ret = 0; ram_addr_t addr, end; @@ -75,7 +70,7 @@ static inline int cpu_physical_memory_get_dirty(ram_addr_t start, end = TARGET_PAGE_ALIGN(start + length); start &= TARGET_PAGE_MASK; for (addr = start; addr < end; addr += TARGET_PAGE_SIZE) { -ret |= cpu_physical_memory_get_dirty_flags(addr) & dirty_flags; +ret |= cpu_physical_memory_get_dirty_flag(addr, dirty_flag); } return ret; } Reviewed-by: Orit Wasserman
Re: [Qemu-devel] [PATCH 07/38] memory: make cpu_physical_memory_is_dirty return bool
On 12/17/2013 05:25 PM, Juan Quintela wrote: Signed-off-by: Juan Quintela Reviewed-by: Eric Blake --- exec.c | 7 ++- include/exec/memory-internal.h | 8 ++-- 2 files changed, 8 insertions(+), 7 deletions(-) diff --git a/exec.c b/exec.c index 95bcf70..a1fc280 100644 --- a/exec.c +++ b/exec.c @@ -1484,11 +1484,8 @@ found: static void notdirty_mem_write(void *opaque, hwaddr ram_addr, uint64_t val, unsigned size) { -int dirty_flags; -dirty_flags = cpu_physical_memory_get_dirty_flags(ram_addr); if (!cpu_physical_memory_get_dirty_flag(ram_addr, CODE_DIRTY_FLAG)) { tb_invalidate_phys_page_fast(ram_addr, size); -dirty_flags = cpu_physical_memory_get_dirty_flags(ram_addr); } switch (size) { case 1: @@ -1503,8 +1500,8 @@ static void notdirty_mem_write(void *opaque, hwaddr ram_addr, default: abort(); } -dirty_flags |= (0xff & ~CODE_DIRTY_FLAG); -cpu_physical_memory_set_dirty_flags(ram_addr, dirty_flags); +cpu_physical_memory_set_dirty_flag(ram_addr, MIGRATION_DIRTY_FLAG); +cpu_physical_memory_set_dirty_flag(ram_addr, VGA_DIRTY_FLAG); /* we remove the notdirty callback only if the code has been flushed */ if (cpu_physical_memory_is_dirty(ram_addr)) { diff --git a/include/exec/memory-internal.h b/include/exec/memory-internal.h index 136198c..0b25c3f 100644 --- a/include/exec/memory-internal.h +++ b/include/exec/memory-internal.h @@ -56,9 +56,13 @@ static inline bool cpu_physical_memory_get_dirty_flag(ram_addr_t addr, } /* read dirty bit (return 0 or 1) */ -static inline int cpu_physical_memory_is_dirty(ram_addr_t addr) +static inline bool cpu_physical_memory_is_dirty(ram_addr_t addr) { -return cpu_physical_memory_get_dirty_flags(addr) == 0xff; +bool vga = cpu_physical_memory_get_dirty_flag(addr, VGA_DIRTY_FLAG); +bool code = cpu_physical_memory_get_dirty_flag(addr, CODE_DIRTY_FLAG); +bool migration = +cpu_physical_memory_get_dirty_flag(addr, MIGRATION_DIRTY_FLAG); +return vga && code && migration; } static inline int cpu_physical_memory_get_dirty(ram_addr_t start, Reviewed-by: 
Orit Wasserman
Re: [Qemu-devel] [PATCHv2] block: add native support for NFS
On 12/18/2013 01:03 AM, Peter Lieven wrote:
> Am 17.12.2013 um 18:32 schrieb "Daniel P. Berrange":
>> On Tue, Dec 17, 2013 at 10:15:25AM +0100, Peter Lieven wrote:
>>> This patch adds native support for accessing images on NFS shares
>>> without the requirement to actually mount the entire NFS share on
>>> the host. NFS images can simply be specified by a URL of the form:
>>> nfs:
>>> For example:
>>> qemu-img create -f qcow2 nfs://10.0.0.1/qemu-images/test.qcow2
>>
>> Does it support other config tunables, e.g. specifying which NFS
>> version to use (2/3/4)? If so, will they be available as URI
>> parameters in the obvious manner?
>
> currently only v3 is supported by libnfs. what other tunables would
> you like to see?

For live migration we need the sync option (async ignores O_SYNC and
O_DIRECT sadly), will it be supported? or will it be the default?

Orit
Re: [Qemu-devel] [PATCH 0/9] vmstate code split + unit tests
On 11/28/2013 04:01 PM, Eduardo Habkost wrote: This series separates the QEMUFile and VMState code from savevm.c, and adds a few unit tests to the VMState code. Eduardo Habkost (9): qemu-file: Make a few functions non-static migration: Move QEMU_VM_* defines to migration/migration.h savevm: Convert all tabs to spaces savevm.c: Coding style fixes savevm.c: Coding style fix vmstate: Move VMState code to vmstate.c qemu-file: Move QEMUFile code to qemu-file.c savevm: Small comment about why timer QEMUFile/VMState code is in savevm.c tests: Some unit tests for vmstate.c Makefile.objs |2 + include/migration/migration.h | 11 + include/migration/qemu-file.h |4 + qemu-file.c | 826 + savevm.c | 1590 ++--- tests/.gitignore |1 + tests/Makefile|4 + tests/test-vmstate.c | 357 + vmstate.c | 650 + 9 files changed, 1921 insertions(+), 1524 deletions(-) create mode 100644 qemu-file.c create mode 100644 tests/test-vmstate.c create mode 100644 vmstate.c Series Reviewed-by: Orit Wasserman (with v2 of patch 8)
Re: [Qemu-devel] [PATCH] The calculation of bytes_xfer in qemu_put_buffer() is wrong
On 11/20/2013 01:26 PM, Juan Quintela wrote: From: "Wangting (Kathy)" In qemu_put_buffer(), bytes_xfer += size is wrong, it will be more than expected, and should be bytes_xfer += l. Signed-off-by: zhangmin Signed-off-by: Juan Quintela --- savevm.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/savevm.c b/savevm.c index 2f631d4..3f912dd 100644 --- a/savevm.c +++ b/savevm.c @@ -794,7 +794,7 @@ void qemu_put_buffer(QEMUFile *f, const uint8_t *buf, int size) if (l > size) l = size; memcpy(f->buf + f->buf_index, buf, l); -f->bytes_xfer += size; +f->bytes_xfer += l; if (f->ops->writev_buffer) { add_to_iovec(f, f->buf + f->buf_index, l); } Reviewed-by: Orit Wasserman
Re: [Qemu-devel] [PATCH 05/39] memory: create function to set a single dirty bit
On 11/06/2013 03:04 PM, Juan Quintela wrote: Signed-off-by: Juan Quintela --- cputlb.c | 2 +- include/exec/memory-internal.h | 6 ++ 2 files changed, 7 insertions(+), 1 deletion(-) diff --git a/cputlb.c b/cputlb.c index fff0afb..72452e5 100644 --- a/cputlb.c +++ b/cputlb.c @@ -137,7 +137,7 @@ void tlb_protect_code(ram_addr_t ram_addr) void tlb_unprotect_code_phys(CPUArchState *env, ram_addr_t ram_addr, target_ulong vaddr) { -cpu_physical_memory_set_dirty_flags(ram_addr, CODE_DIRTY_FLAG); +cpu_physical_memory_set_dirty_flag(ram_addr, CODE_DIRTY_FLAG); } static bool tlb_is_dirty_ram(CPUTLBEntry *tlbe) diff --git a/include/exec/memory-internal.h b/include/exec/memory-internal.h index c71a5e6..4ebab80 100644 --- a/include/exec/memory-internal.h +++ b/include/exec/memory-internal.h @@ -76,6 +76,12 @@ static inline void cpu_physical_memory_set_dirty_flags(ram_addr_t addr, ram_list.phys_dirty[addr >> TARGET_PAGE_BITS] |= dirty_flags; } +static inline void cpu_physical_memory_set_dirty_flag(ram_addr_t addr, + int dirty_flag) +{ +ram_list.phys_dirty[addr >> TARGET_PAGE_BITS] |= dirty_flag; +} + static inline void cpu_physical_memory_set_dirty(ram_addr_t addr) { cpu_physical_memory_set_dirty_flags(addr, 0xff); Reviewed-by: Orit Wasserman
Re: [Qemu-devel] [PATCH 04/39] exec: use accessor function to know if memory is dirty
On 11/06/2013 03:04 PM, Juan Quintela wrote: Signed-off-by: Juan Quintela --- exec.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/exec.c b/exec.c index 79610ce..7cf5634 100644 --- a/exec.c +++ b/exec.c @@ -1397,7 +1397,7 @@ static void notdirty_mem_write(void *opaque, hwaddr ram_addr, cpu_physical_memory_set_dirty_flags(ram_addr, dirty_flags); /* we remove the notdirty callback only if the code has been flushed */ -if (dirty_flags == 0xff) { +if (cpu_physical_memory_is_dirty(ram_addr)) { CPUArchState *env = current_cpu->env_ptr; tlb_set_dirty(env, env->mem_io_vaddr); } Reviewed-by: Orit Wasserman
Re: [Qemu-devel] [PATCH 03/39] memory: cpu_physical_memory_set_dirty_range() return void
On 11/06/2013 03:04 PM, Juan Quintela wrote: Signed-off-by: Juan Quintela --- memory.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/memory.c b/memory.c index 28f6449..9722f23 100644 --- a/memory.c +++ b/memory.c @@ -1182,7 +1182,7 @@ void memory_region_set_dirty(MemoryRegion *mr, hwaddr addr, hwaddr size) { assert(mr->terminates); -return cpu_physical_memory_set_dirty_range(mr->ram_addr + addr, size, -1); +cpu_physical_memory_set_dirty_range(mr->ram_addr + addr, size, -1); } bool memory_region_test_and_clear_dirty(MemoryRegion *mr, hwaddr addr, Reviewed-by: Orit Wasserman
Re: [Qemu-devel] [PATCH 02/39] memory: cpu_physical_memory_set_dirty_flags() result is never used
On 11/06/2013 03:04 PM, Juan Quintela wrote: So return void. Signed-off-by: Juan Quintela --- include/exec/memory-internal.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/include/exec/memory-internal.h b/include/exec/memory-internal.h index d0e0633..c71a5e6 100644 --- a/include/exec/memory-internal.h +++ b/include/exec/memory-internal.h @@ -70,10 +70,10 @@ static inline int cpu_physical_memory_get_dirty(ram_addr_t start, return ret; } -static inline int cpu_physical_memory_set_dirty_flags(ram_addr_t addr, +static inline void cpu_physical_memory_set_dirty_flags(ram_addr_t addr, int dirty_flags) { -return ram_list.phys_dirty[addr >> TARGET_PAGE_BITS] |= dirty_flags; +ram_list.phys_dirty[addr >> TARGET_PAGE_BITS] |= dirty_flags; } static inline void cpu_physical_memory_set_dirty(ram_addr_t addr) Reviewed-by: Orit Wasserman
Re: [Qemu-devel] [PATCH 01/39] Move prototypes to memory.h
On 11/06/2013 03:04 PM, Juan Quintela wrote: As the comment says, it should only be used on "core" memory files. Signed-off-by: Juan Quintela --- include/exec/cpu-common.h | 4 include/exec/memory.h | 4 +++- 2 files changed, 3 insertions(+), 5 deletions(-) diff --git a/include/exec/cpu-common.h b/include/exec/cpu-common.h index e4996e1..8110ef0 100644 --- a/include/exec/cpu-common.h +++ b/include/exec/cpu-common.h @@ -51,10 +51,6 @@ typedef void CPUWriteMemoryFunc(void *opaque, hwaddr addr, uint32_t value); typedef uint32_t CPUReadMemoryFunc(void *opaque, hwaddr addr); void qemu_ram_remap(ram_addr_t addr, ram_addr_t length); -/* This should not be used by devices. */ -MemoryRegion *qemu_ram_addr_from_host(void *ptr, ram_addr_t *ram_addr); -void qemu_ram_set_idstr(ram_addr_t addr, const char *name, DeviceState *dev); - void cpu_physical_memory_rw(hwaddr addr, uint8_t *buf, int len, int is_write); static inline void cpu_physical_memory_read(hwaddr addr, diff --git a/include/exec/memory.h b/include/exec/memory.h index 480dfbf..c7e151f 100644 --- a/include/exec/memory.h +++ b/include/exec/memory.h @@ -1055,7 +1055,9 @@ void *address_space_map(AddressSpace *as, hwaddr addr, void address_space_unmap(AddressSpace *as, void *buffer, hwaddr len, int is_write, hwaddr access_len); - +/* This should not be used by devices. */ +MemoryRegion *qemu_ram_addr_from_host(void *ptr, ram_addr_t *ram_addr); +void qemu_ram_set_idstr(ram_addr_t addr, const char *name, DeviceState *dev); #endif #endif Reviewed-by: Orit Wasserman
Re: [Qemu-devel] KVM call for agenda for 2013-10-01
On 09/24/2013 05:09 PM, Juan Quintela wrote:
> Hi
>
> Please, send any topic that you are interested in covering.
>
> Last week I forgot to send the call for topics. We still have a topic there.
>
> Thanks, Juan.
>
> Agenda so far:
> - Talk about qemu reverse executing (1st description was done this week)
>   How to handle IO when we want to do reverse execution.
>   How this relate to Kemari needs?

Hi Juan,
I'm sorry but I won't be able to attend the call tomorrow due to travel.

Regards,
Orit

>   And to icount changes?
>
> Call details:
>
> 10:00 AM to 11:00 AM EDT
> Every two weeks
>
> If you need phone number details, contact me privately.
Re: [Qemu-devel] [PATCH 3/3 resend v2] arch_init: right return for ram_save_iterate
On 09/11/2013 02:27 PM, Paolo Bonzini wrote:
> Il 11/09/2013 13:06, Juan Quintela ha scritto:
>>>> And I think that the right solution is to make qemu_get_rate_limit()
>>>> return -1 in case of error (or the error, I don't care).
>>>
>>> You might do both things, it would avoid the useless g_usleep you
>>> pointed out below. But Lei's patch is good, because an error could
>>> happen exactly during the qemu_put_be64 that writes RAM_SAVE_FLAG_EOS.
>>
>> Caller checks also. This is the reason I wanted qemu_file_* callers to
>> return an error. It has some advantages and some disadvantages. We
>> don't agree on which ones are bigger O:-)
>
> I think the disadvantages are bigger. It litters the code with error
> handling, hides where things actually happen, and doesn't even simplify
> QEMUFile itself. Checking only at the top level is simpler; all we need
> to do is ensure that we get there every now and then (and that's what
> qemu_file_rate_limit does).

I also prefer the error checking at the top level.

Orit

>>>> savevm.c: qemu_savevm_state_iterate()
>>>>     if (qemu_file_rate_limit(f)) {
>>>>         return 0;
>>>>     }
>>>> check is incorrect again, we should return an error if there is one
>>>> error.
>>>
>>> Nothing cares if qemu_savevm_state_iterate returns 0 or negative, so
>>> changing qemu_savevm_state_iterate to only return 0/1 would make
>>> sense too.
>>
>> In this case, 0 means:
>>     please, call us again
>> when what we mean is:
>>     don't care about calling us again, there is an error. Handle the error.
>
> Or alternatively, 0 means:
>
>     we haven't finished the work
>
> when what we mean is:
>
>     we haven't finished the work (BTW, please check if there is an error)
>
>> Notice that qemu_save_iterate() already returns errors in other code
>> paths
>
> Yes that's also unnecessary.
>
>> If we change the return value for qemu_file_rate_limit() the change that
>> came with this patch is not needed, that was my point.
>
> This is what an earlier patch from Lei did. I told him (or her?) to
> leave qemu_file_rate_limit aside since the idea behind QEMUFile is to
> only handle the error at the top.
>
> Paolo
Re: [Qemu-devel] [PATCH RFC 0/4] Curling: KVM Fault Tolerance
On 09/11/2013 04:54 AM, junqing.w...@cs2c.com.cn wrote:
> Hi,
>
>> The first is that if the VM failure happens in the middle of the live
>> migration, the backup VM state will be inconsistent, which means you
>> can't fail over to it.
>
> Yes, I was concerned about this problem. That is why we need a prefetch
> buffer.

You are right, I missed that.

>> Solving it is not simple as you need some transaction mechanism that
>> will change the backup VM state only when the transaction completes
>> (the live migration completes). Kemari has something like that.
>
> The backup VM state will be loaded only when one whole migration's data
> is prefetched. Otherwise, the VM state will not be loaded. So the backup
> VM is ensured to have a consistent state like a checkpoint.
> However, how close this checkpoint is to the point of the VM failure
> depends on the workload and bandwidth.

At the moment, in your implementation the prefetch buffer can be very
large (several copies of the guest memory size); are you planning to
address this issue?

>> The second is that sadly live migration doesn't always converge, which
>> means that the backup VM won't have a consistent state to fail over to.
>> You need to detect such a case and throttle down the guest to force
>> convergence.
>
> Yes, that's a problem. AFAIK, qemu already has an auto convergence
> feature.

How about activating it automatically when you do fault tolerance?

> From another perspective, if many migrations could not converge, maybe
> the workload is high and the bandwidth is low, and it is not recommended
> to use FT in general.

I agree, but we need some way to notify the user of such a problem.

Regards,
Orit
Re: [Qemu-devel] [PATCH RFC 0/4] Curling: KVM Fault Tolerance
On 09/10/2013 06:43 AM, Jules Wang wrote:
> The goal of Curling (sports) is to provide a fault tolerant mechanism for
> KVM, so that in the event of a hardware failure, the virtual machine fails
> over to the backup in a way that is completely transparent to the guest
> operating system.
>
> Our goal is exactly the same as the goal of Kemari, by which Curling is
> inspired. However, Curling is simpler than Kemari (too simple, I'm afraid):
>
> * By leveraging the live migration feature, we do endless live migrations
> between the sender and receiver, so the two virtual machines are
> synchronized.

Hi,
There are two issues I see with your solution.

The first is that if the VM failure happens in the middle of the live
migration, the backup VM state will be inconsistent, which means you can't
fail over to it.
Solving it is not simple as you need some transaction mechanism that will
change the backup VM state only when the transaction completes (the live
migration completes). Kemari has something like that.

The second is that sadly live migration doesn't always converge, which means
that the backup VM won't have a consistent state to fail over to. You need
to detect such a case and throttle down the guest to force convergence.

Regards,
Orit

> * The receiver does not load the vm state once the migration begins;
> instead, it prefetches one whole migration's worth of data into a buffer,
> then loads the vm state from that buffer afterwards. This "all or nothing"
> approach prevents the broken-in-the-middle problem Kemari has.
>
> * The sender sleeps a little while after each migration, to ease the
> performance penalty entailed by vm_stop and iothread locks. This is a
> tradeoff between performance and accuracy.
>
> Usage:
> The steps of curling are the same as the steps of live migration except
> the following:
> 1. Start the receiver vm with -incoming curling:tcp::
> 2. Start ft in the qemu monitor of the sender vm with the following cmdline:
>> migrate_set_speed
>> migrate curling:tcp::
> 3. Connect to the receiver vm by vnc or spice. The screen of the vm is
> displayed when curling is ready.
> 4. Now, the sender vm is protected by ft. When it encounters a failure,
> the failover kicks in.
>
> Problems to be discussed:
> 1. When the receiver is prefetching data, how does it know where the EOF
> of one migration is?
>
> Currently, we use a magic number 0xfeedcafe to indicate the EOF.
> Any better solutions?
>
> 2. How to reduce the overhead entailed by vm_stop and iothread locks?
>
> Any solutions other than sleeping?
>
> --
>
> Jules Wang (4):
> Curling: add doc
> Curling: cmdline interface
> Curling: the sender
> Curling: the receiver
>
> arch_init.c | 18 +++--
> docs/curling.txt | 52 ++
> include/migration/migration.h | 2 +
> include/migration/qemu-file.h | 1 +
> include/sysemu/sysemu.h | 1 +
> migration.c | 61 ++--
> savevm.c | 158 --
> 7 files changed, 277 insertions(+), 16 deletions(-)
> create mode 100644 docs/curling.txt
Re: [Qemu-devel] [PATCH 0/3 resend v2] Migration fix
On 09/04/2013 12:02 PM, Lei Li wrote: > This small patch series is extracted from the localhost migration > series that Paolo reviewed positively, sent separately at > his request. > > It fixes a wrong initialization in the RDMA hook ram_control_load_hook() > and gets rid of the negative check for qemu_file_rate_limit(), with > bytes_transferred and the return value adjusted correspondingly in > ram_save_iterate. > > > Lei Li (3): > savevm: add comments for qemu_file_get_error() > savevm: fix wrong initialization by ram_control_load_hook > arch_init: right return for ram_save_iterate > > arch_init.c | 15 ++- > savevm.c|9 - > 2 files changed, 18 insertions(+), 6 deletions(-) > > Looks good, Reviewed-by: Orit Wasserman
Re: [Qemu-devel] [PATCH v2] Fix query-migrate documentation in qmp-commands.hx
On 08/19/2013 09:41 PM, Luiz Capitulino wrote: > On Mon, 12 Aug 2013 10:19:52 -0400 > Luiz Capitulino wrote: > >> On Thu, 8 Aug 2013 20:05:48 +0300 >> Orit Wasserman wrote: >> >>> "ram" is present also when migration completes. >>> expected-downtime, total-time and downtime are no longer part of "ram" data. >>> >>> Signed-off-by: Orit Wasserman >> >> Applied to the qmp branch, thanks. > > This one missed the deadline for 1.6. I'm updating my tree for 1.7, > but this patch conflicts with 8f3067, which repeated for the setup-time > key the mistake this patch is fixing for other keys. > > We need a respin for this patch, which should also move setup-time > to the main dict. Orit, I can do it myself if you want. > That will be great, Thanks, Orit >> >>> --- >>> qmp-commands.hx | 20 ++-- >>> 1 file changed, 10 insertions(+), 10 deletions(-) >>> >>> diff --git a/qmp-commands.hx b/qmp-commands.hx >>> index 2e59b0d..a22a841 100644 >>> --- a/qmp-commands.hx >>> +++ b/qmp-commands.hx >>> @@ -2626,8 +2626,8 @@ The main json-object contains the following: >>> - "expected-downtime": only present while migration is active >>> total amount in ms for downtime that was calculated on >>> the last bitmap round (json-int) >>> -- "ram": only present if "status" is "active", it is a json-object with the >>> - following RAM information: >>> +- "ram": only present if "status" is "active" or "complete", it is a >>> + json-object with the following RAM information: >>> - "transferred": amount transferred in bytes (json-int) >>> - "remaining": amount remaining to transfer in bytes (json-int) >>> - "total": total amount of memory in bytes (json-int) >>> @@ -2669,12 +2669,12 @@ Examples: >>> -> { "execute": "query-migrate" } >>> <- { "return": { >>> "status": "completed", >>> +"total-time":12345, >>> +"downtime":12345, >>> "ram":{ >>>"transferred":123, >>>"remaining":123, >>>"total":246, >>> - "total-time":12345, >>> - "downtime":12345, >>>"duplicate":123, >>>"normal":123, 
>>>"normal-bytes":123456 >>> @@ -2693,12 +2693,12 @@ Examples: >>> <- { >>>"return":{ >>> "status":"active", >>> + "total-time":12345, >>> + "expected-downtime":12345, >>> "ram":{ >>> "transferred":123, >>> "remaining":123, >>> "total":246, >>> -"total-time":12345, >>> -"expected-downtime":12345, >>> "duplicate":123, >>> "normal":123, >>> "normal-bytes":123456 >>> @@ -2712,12 +2712,12 @@ Examples: >>> <- { >>>"return":{ >>> "status":"active", >>> + "total-time":12345, >>> + "expected-downtime":12345, >>> "ram":{ >>> "total":1057024, >>> "remaining":1053304, >>> "transferred":3720, >>> -"total-time":12345, >>> -"expected-downtime":12345, >>> "duplicate":123, >>> "normal":123, >>> "normal-bytes":123456 >>> @@ -2736,13 +2736,13 @@ Examples: >>> <- { >>>"return":{ >>> "status":"active", >>> + "total-time":12345, >>> + "expected-downtime":12345, >>> "capabilities" : [ { "capability": "xbzrle", "state" : true } ], >>> "ram":{ >>> "total":1057024, >>> "remaining":1053304, >>> "transferred":3720, >>> -"total-time":12345, >>> -"expected-downtime":12345, >>> "duplicate":10, >>> "normal":, >>> "normal-bytes":3412992 >> >
[Qemu-devel] [PATCH v2] Fix query-migrate documentation in qmp-commands.hx
"ram" is present also when migration completes. expected-downtime, total-time and downtime are no longer part of "ram" data. Signed-off-by: Orit Wasserman --- qmp-commands.hx | 20 ++-- 1 file changed, 10 insertions(+), 10 deletions(-) diff --git a/qmp-commands.hx b/qmp-commands.hx index 2e59b0d..a22a841 100644 --- a/qmp-commands.hx +++ b/qmp-commands.hx @@ -2626,8 +2626,8 @@ The main json-object contains the following: - "expected-downtime": only present while migration is active total amount in ms for downtime that was calculated on the last bitmap round (json-int) -- "ram": only present if "status" is "active", it is a json-object with the - following RAM information: +- "ram": only present if "status" is "active" or "complete", it is a + json-object with the following RAM information: - "transferred": amount transferred in bytes (json-int) - "remaining": amount remaining to transfer in bytes (json-int) - "total": total amount of memory in bytes (json-int) @@ -2669,12 +2669,12 @@ Examples: -> { "execute": "query-migrate" } <- { "return": { "status": "completed", +"total-time":12345, +"downtime":12345, "ram":{ "transferred":123, "remaining":123, "total":246, - "total-time":12345, - "downtime":12345, "duplicate":123, "normal":123, "normal-bytes":123456 @@ -2693,12 +2693,12 @@ Examples: <- { "return":{ "status":"active", + "total-time":12345, + "expected-downtime":12345, "ram":{ "transferred":123, "remaining":123, "total":246, -"total-time":12345, -"expected-downtime":12345, "duplicate":123, "normal":123, "normal-bytes":123456 @@ -2712,12 +2712,12 @@ Examples: <- { "return":{ "status":"active", + "total-time":12345, + "expected-downtime":12345, "ram":{ "total":1057024, "remaining":1053304, "transferred":3720, -"total-time":12345, -"expected-downtime":12345, "duplicate":123, "normal":123, "normal-bytes":123456 @@ -2736,13 +2736,13 @@ Examples: <- { "return":{ "status":"active", + "total-time":12345, + "expected-downtime":12345, "capabilities" : [ { "capability": 
"xbzrle", "state" : true } ], "ram":{ "total":1057024, "remaining":1053304, "transferred":3720, -"total-time":12345, -"expected-downtime":12345, "duplicate":10, "normal":, "normal-bytes":3412992 -- 1.8.1.4
Re: [Qemu-devel] [PATCH] Fix query-migrate documentation in qmp-commands.hx
On 08/08/2013 05:50 PM, Luiz Capitulino wrote: > On Thu, 8 Aug 2013 11:46:14 +0300 > Orit Wasserman wrote: > >> "ram" is present also when migration completes. >> total-time and downtime are no longer part of "ram" data. > > expected-downtime has to be moved too, looks good otherwise. I will fix it in a separate patch. Orit > >> >> Signed-off-by: Orit Wasserman >> --- >> qmp-commands.hx | 14 +++--- >> 1 file changed, 7 insertions(+), 7 deletions(-) >> >> diff --git a/qmp-commands.hx b/qmp-commands.hx >> index 2e59b0d..1d43f4c 100644 >> --- a/qmp-commands.hx >> +++ b/qmp-commands.hx >> @@ -2626,8 +2626,8 @@ The main json-object contains the following: >> - "expected-downtime": only present while migration is active >> total amount in ms for downtime that was calculated on >> the last bitmap round (json-int) >> -- "ram": only present if "status" is "active", it is a json-object with the >> - following RAM information: >> +- "ram": only present if "status" is "active" or "complete", it is a >> + json-object with the following RAM information: >> - "transferred": amount transferred in bytes (json-int) >> - "remaining": amount remaining to transfer in bytes (json-int) >> - "total": total amount of memory in bytes (json-int) >> @@ -2669,12 +2669,12 @@ Examples: >> -> { "execute": "query-migrate" } >> <- { "return": { >> "status": "completed", >> +"total-time":12345, >> +"downtime":12345, >> "ram":{ >>"transferred":123, >>"remaining":123, >>"total":246, >> - "total-time":12345, >> - "downtime":12345, >>"duplicate":123, >>"normal":123, >>"normal-bytes":123456 >> @@ -2693,11 +2693,11 @@ Examples: >> <- { >>"return":{ >> "status":"active", >> + "total-time":12345, >> "ram":{ >> "transferred":123, >> "remaining":123, >> "total":246, >> -"total-time":12345, >> "expected-downtime":12345, >> "duplicate":123, >> "normal":123, >> @@ -2712,11 +2712,11 @@ Examples: >> <- { >>"return":{ >> "status":"active", >> + "total-time":12345, >> "ram":{ >> "total":1057024, >> 
"remaining":1053304, >> "transferred":3720, >> -"total-time":12345, >> "expected-downtime":12345, >> "duplicate":123, >> "normal":123, >> @@ -2736,12 +2736,12 @@ Examples: >> <- { >>"return":{ >> "status":"active", >> + "total-time":12345, >> "capabilities" : [ { "capability": "xbzrle", "state" : true } ], >> "ram":{ >> "total":1057024, >> "remaining":1053304, >> "transferred":3720, >> -"total-time":12345, >> "expected-downtime":12345, >> "duplicate":10, >> "normal":, >
[Qemu-devel] [PATCH] Fix query-migrate documentation in qmp-commands.hx
"ram" is present also when migration completes. total-time and downtime are no longer part of "ram" data. Signed-off-by: Orit Wasserman --- qmp-commands.hx | 14 +++--- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/qmp-commands.hx b/qmp-commands.hx index 2e59b0d..1d43f4c 100644 --- a/qmp-commands.hx +++ b/qmp-commands.hx @@ -2626,8 +2626,8 @@ The main json-object contains the following: - "expected-downtime": only present while migration is active total amount in ms for downtime that was calculated on the last bitmap round (json-int) -- "ram": only present if "status" is "active", it is a json-object with the - following RAM information: +- "ram": only present if "status" is "active" or "complete", it is a + json-object with the following RAM information: - "transferred": amount transferred in bytes (json-int) - "remaining": amount remaining to transfer in bytes (json-int) - "total": total amount of memory in bytes (json-int) @@ -2669,12 +2669,12 @@ Examples: -> { "execute": "query-migrate" } <- { "return": { "status": "completed", +"total-time":12345, +"downtime":12345, "ram":{ "transferred":123, "remaining":123, "total":246, - "total-time":12345, - "downtime":12345, "duplicate":123, "normal":123, "normal-bytes":123456 @@ -2693,11 +2693,11 @@ Examples: <- { "return":{ "status":"active", + "total-time":12345, "ram":{ "transferred":123, "remaining":123, "total":246, -"total-time":12345, "expected-downtime":12345, "duplicate":123, "normal":123, @@ -2712,11 +2712,11 @@ Examples: <- { "return":{ "status":"active", + "total-time":12345, "ram":{ "total":1057024, "remaining":1053304, "transferred":3720, -"total-time":12345, "expected-downtime":12345, "duplicate":123, "normal":123, @@ -2736,12 +2736,12 @@ Examples: <- { "return":{ "status":"active", + "total-time":12345, "capabilities" : [ { "capability": "xbzrle", "state" : true } ], "ram":{ "total":1057024, "remaining":1053304, "transferred":3720, -"total-time":12345, "expected-downtime":12345, 
"duplicate":10, "normal":, -- 1.8.1.4
Re: [Qemu-devel] [PATCH for-1.6 4/4] rdma: proper getaddrinfo() handling
On 08/07/2013 07:05 PM, mrhi...@linux.vnet.ibm.com wrote: > From: "Michael R. Hines" > > getaddrinfo() already knows what it's doing, > and can potentially return multiple addresses. > > Signed-off-by: Michael R. Hines > --- > migration-rdma.c | 56 > -- > 1 file changed, 29 insertions(+), 27 deletions(-) > > diff --git a/migration-rdma.c b/migration-rdma.c > index 30e08cd..d71cca5 100644 > --- a/migration-rdma.c > +++ b/migration-rdma.c > @@ -392,7 +392,6 @@ typedef struct RDMAContext { > uint64_t unregistrations[RDMA_SIGNALED_SEND_MAX]; > > GHashTable *blockmap; > -bool ipv6; > } RDMAContext; > > /* > @@ -745,7 +744,7 @@ static int qemu_rdma_resolve_host(RDMAContext *rdma, > Error **errp) > char port_str[16]; > struct rdma_cm_event *cm_event; > char ip[40] = "unknown"; > -int af = rdma->ipv6 ? PF_INET6 : PF_INET; > +struct addrinfo *e; > > if (rdma->host == NULL || !strcmp(rdma->host, "")) { > ERROR(errp, "RDMA hostname has not been set"); > @@ -775,18 +774,23 @@ static int qemu_rdma_resolve_host(RDMAContext *rdma, > Error **errp) > goto err_resolve_get_addr; > } > > -inet_ntop(af, &((struct sockaddr_in *) res->ai_addr)->sin_addr, > -ip, sizeof ip); > -DPRINTF("%s => %s\n", rdma->host, ip); > +for (e = res; e != NULL; e = e->ai_next) { > +inet_ntop(e->ai_family, > +&((struct sockaddr_in *) e->ai_addr)->sin_addr, ip, sizeof ip); > +DPRINTF("Trying %s => %s\n", rdma->host, ip); > > -/* resolve the first address */ > -ret = rdma_resolve_addr(rdma->cm_id, NULL, res->ai_addr, > -RDMA_RESOLVE_TIMEOUT_MS); > -if (ret) { > -ERROR(errp, "could not resolve address %s", rdma->host); > -goto err_resolve_get_addr; > +/* resolve the first address */ > +ret = rdma_resolve_addr(rdma->cm_id, NULL, e->ai_addr, > +RDMA_RESOLVE_TIMEOUT_MS); > +if (!ret) { > +goto route; > +} > } > > +ERROR(errp, "could not resolve address %s", rdma->host); > +goto err_resolve_get_addr; > + > +route: > qemu_rdma_dump_gid("source_resolve_addr", rdma->cm_id); > > ret = rdma_get_cm_event(rdma->channel, 
&cm_event); > @@ -2260,8 +2264,6 @@ err_rdma_source_connect: > static int qemu_rdma_dest_init(RDMAContext *rdma, Error **errp) > { > int ret = -EINVAL, idx; > -int af = rdma->ipv6 ? PF_INET6 : PF_INET; > -struct sockaddr_in sin; > struct rdma_cm_id *listen_id; > char ip[40] = "unknown"; > struct addrinfo *res; > @@ -2292,35 +2294,36 @@ static int qemu_rdma_dest_init(RDMAContext *rdma, > Error **errp) > goto err_dest_init_create_listen_id; > } > > -memset(&sin, 0, sizeof(sin)); > -sin.sin_family = af; > -sin.sin_port = htons(rdma->port); > snprintf(port_str, 16, "%d", rdma->port); > port_str[15] = '\0'; > > if (rdma->host && strcmp("", rdma->host)) { > +struct addrinfo *e; > + > ret = getaddrinfo(rdma->host, port_str, NULL, &res); > if (ret < 0) { > ERROR(errp, "could not getaddrinfo address %s", rdma->host); > goto err_dest_init_bind_addr; > } > > +for (e = res; e != NULL; e = e->ai_next) { > +inet_ntop(e->ai_family, > +&((struct sockaddr_in *) e->ai_addr)->sin_addr, ip, sizeof > ip); > +DPRINTF("Trying %s => %s\n", rdma->host, ip); > +ret = rdma_bind_addr(listen_id, e->ai_addr); > +if (!ret) { > +goto listen; > +} > +} > > -inet_ntop(af, &((struct sockaddr_in *) res->ai_addr)->sin_addr, > -ip, sizeof ip); > +ERROR(errp, "Error: could not rdma_bind_addr!"); > +goto err_dest_init_bind_addr; > } else { > ERROR(errp, "migration host and port not specified!"); > ret = -EINVAL; > goto err_dest_init_bind_addr; > } > - > -DPRINTF("%s => %s\n", rdma->host, ip); > - > - ret = rdma_bind_addr(listen_id, res->ai_addr); > -if (ret) { > -ERROR(errp, "Error: could not rdma_bind_addr!"); > -goto err_dest_init_bind_addr; > -} > +listen: > > rdma->listen_id = listen_id; > qemu_rdma_dump_gid("dest_init", listen_id); > @@ -2351,7 +2354,6 @@ static void *qemu_rdma_data_init(const char *host_port, > Error **errp) > if (addr != NULL) { > rdma->port = atoi(addr->port); > rdma->host = g_strdup(addr->host); > -rdma->ipv6 = addr->ipv6; > } else { > ERROR(errp, "bad RDMA migration address 
'%s'", host_port); > g_free(rdma); > Looks good, Reviewed-by: Orit Wasserman
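The multiple-address handling Orit asked for, and which this patch implements, boils down to the loop sketched below: walk the whole `ai_next` list from getaddrinfo() and stop at the first address that works, instead of blindly using `res->ai_addr`. This is a hedged sketch; the `resolve_one` callback is an invented stand-in for `rdma_resolve_addr()`/`rdma_bind_addr()`.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>

/* Try every address getaddrinfo() returns; the first one the callback
 * accepts wins.  Returns 0 on success, -1 if the lookup fails or no
 * address is usable. */
int try_each_addr(const char *host, const char *port,
                  int (*resolve_one)(struct addrinfo *e))
{
    struct addrinfo *res, *e;
    int ret = getaddrinfo(host, port, NULL, &res);

    if (ret != 0) {
        return -1;           /* the lookup itself failed */
    }
    ret = -1;
    for (e = res; e != NULL; e = e->ai_next) {
        if (resolve_one(e) == 0) {
            ret = 0;         /* first usable address wins */
            break;
        }
    }
    freeaddrinfo(res);       /* always release the result list */
    return ret;
}
```

This is the same shape for both sides of the patch: the source loops over `rdma_resolve_addr()` candidates, the destination over `rdma_bind_addr()` candidates.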
Re: [Qemu-devel] [PATCH 0/3] rdma: validate remote provided RDMAControlHeader::len
On 08/07/2013 05:26 AM, Isaku Yamahata wrote: > RDMAControlHeader::len is remote-provided. So validate the value before use. > > Isaku Yamahata (3): > rdma: use resp.len after validation in qemu_rdma_registration_stop > rdma: validate RDMAControlHeader::len > rdma: check if RDMAControlHeader::len match transferred byte > > migration-rdma.c | 44 ++-- > 1 file changed, 30 insertions(+), 14 deletions(-) > Series Reviewed-by: Orit Wasserman
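The class of check this series adds can be illustrated with a minimal sketch: a length field received from the remote side must be validated against the local buffer's capacity before any copy uses it. The function name and buffer layout below are invented for illustration, not taken from migration-rdma.c.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Copy a control payload whose length was supplied by the remote peer.
 * Returns 0 on success, -1 if the remote-provided length exceeds the
 * destination capacity (the bug class the series guards against). */
int control_copy(uint8_t *dst, size_t dst_cap,
                 const uint8_t *src, uint32_t remote_len)
{
    if (remote_len > dst_cap) {
        return -1;   /* reject: never trust the wire length blindly */
    }
    memcpy(dst, src, remote_len);
    return 0;
}
```

Without such a bound check, a malicious or buggy peer could make the destination overrun its control buffer.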
Re: [Qemu-devel] [PATCH v3 For-1.6 3/7] rdma: correct newlines in error statements
goto err_dest_init_create_listen_id; > } > > @@ -2279,7 +2279,7 @@ static int qemu_rdma_dest_init(RDMAContext *rdma, Error > **errp) > if (rdma->host && strcmp("", rdma->host)) { > ret = getaddrinfo(rdma->host, port_str, NULL, &res); > if (ret < 0) { > -ERROR(errp, "could not getaddrinfo address %s\n", rdma->host); > +ERROR(errp, "could not getaddrinfo address %s", rdma->host); > goto err_dest_init_bind_addr; > } > > @@ -2287,7 +2287,7 @@ static int qemu_rdma_dest_init(RDMAContext *rdma, Error > **errp) > inet_ntop(af, &((struct sockaddr_in *) res->ai_addr)->sin_addr, > ip, sizeof ip); > } else { > -ERROR(errp, "migration host and port not specified!\n"); > +ERROR(errp, "migration host and port not specified!"); > ret = -EINVAL; > goto err_dest_init_bind_addr; > } > @@ -2296,7 +2296,7 @@ static int qemu_rdma_dest_init(RDMAContext *rdma, Error > **errp) > > ret = rdma_bind_addr(listen_id, res->ai_addr); > if (ret) { > -ERROR(errp, "Error: could not rdma_bind_addr!\n"); > +ERROR(errp, "Error: could not rdma_bind_addr!"); > goto err_dest_init_bind_addr; > } > > @@ -3036,7 +3036,7 @@ static int qemu_rdma_registration_stop(QEMUFile *f, > void *opaque, > ®_result_idx, rdma->pin_all ? > qemu_rdma_reg_whole_ram_blocks : NULL); > if (ret < 0) { > -ERROR(errp, "receiving remote info!\n"); > +ERROR(errp, "receiving remote info!"); > return ret; > } > > @@ -3061,7 +3061,7 @@ static int qemu_rdma_registration_stop(QEMUFile *f, > void *opaque, > if (local->nb_blocks != nb_remote_blocks) { > ERROR(errp, "ram blocks mismatch #1! " > "Your QEMU command line parameters are probably " > -"not identical on both the source and > destination.\n"); > +"not identical on both the source and destination."); > return -EINVAL; > } > > @@ -3077,7 +3077,7 @@ static int qemu_rdma_registration_stop(QEMUFile *f, > void *opaque, > if (rdma->block[i].length != local->block[j].length) { > ERROR(errp, "ram blocks mismatch #2! 
" > "Your QEMU command line parameters are probably " > -"not identical on both the source and > destination.\n"); > +"not identical on both the source and destination."); > return -EINVAL; > } > local->block[j].remote_host_addr = > @@ -3089,7 +3089,7 @@ static int qemu_rdma_registration_stop(QEMUFile *f, > void *opaque, > if (j >= local->nb_blocks) { > ERROR(errp, "ram blocks mismatch #3! " > "Your QEMU command line parameters are probably " > -"not identical on both the source and > destination.\n"); > +"not identical on both the source and destination."); > return -EINVAL; > } > } > @@ -3163,7 +3163,7 @@ static void rdma_accept_incoming_migration(void *opaque) > ret = qemu_rdma_accept(rdma); > > if (ret) { > -ERROR(errp, "RDMA Migration initialization failed!\n"); > +ERROR(errp, "RDMA Migration initialization failed!"); > return; > } > > @@ -3171,7 +3171,7 @@ static void rdma_accept_incoming_migration(void *opaque) > > f = qemu_fopen_rdma(rdma, "rb"); > if (f == NULL) { > -ERROR(errp, "could not qemu_fopen_rdma!\n"); > +ERROR(errp, "could not qemu_fopen_rdma!"); > qemu_rdma_cleanup(rdma); > return; > } > @@ -3204,7 +3204,7 @@ void rdma_start_incoming_migration(const char > *host_port, Error **errp) > ret = rdma_listen(rdma->listen_id, 5); > > if (ret) { > -ERROR(errp, "listening on socket!\n"); > +ERROR(errp, "listening on socket!"); > goto err; > } > > @@ -3228,7 +3228,7 @@ void rdma_start_outgoing_migration(void *opaque, > int ret = 0; > > if (rdma == NULL) { > -ERROR(temp, "Failed to initialize RDMA data structures! %d\n", ret); > +ERROR(temp, "Failed to initialize RDMA data structures! %d", ret); > goto err; > } > > Reviewed-by: Orit Wasserman
Re: [Qemu-devel] [PATCH v3 For-1.6 2/7] rdma: forgot to turn off the debugging flag
On 08/04/2013 05:54 AM, mrhi...@linux.vnet.ibm.com wrote: > From: "Michael R. Hines" > > Ooops. We forgot to turn off the flag. > > Signed-off-by: Michael R. Hines > --- > migration-rdma.c |2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/migration-rdma.c b/migration-rdma.c > index 9cf73e3..fe6118d 100644 > --- a/migration-rdma.c > +++ b/migration-rdma.c > @@ -27,7 +27,7 @@ > #include > #include > > -#define DEBUG_RDMA > +//#define DEBUG_RDMA > //#define DEBUG_RDMA_VERBOSE > //#define DEBUG_RDMA_REALLY_VERBOSE > > Reviewed-by: Orit Wasserman
Re: [Qemu-devel] [PATCH v3 For-1.6 1/7] rdma: bugfix: make IPv6 support work
On 08/04/2013 05:54 AM, mrhi...@linux.vnet.ibm.com wrote: > From: "Michael R. Hines" > > RDMA does not use sockets, so we cannot use many of the socket > helper functions, but we *do* use inet_parse() which gives > RDMA all the necessary details of the connection parameters. > > However, when testing with libvirt, a simple IPv6 migration test failed > because we were not using getaddrinfo() properly. > > This makes IPv6 migration over RDMA work. > > Signed-off-by: Michael R. Hines > --- > migration-rdma.c | 33 + > 1 file changed, 21 insertions(+), 12 deletions(-) > > diff --git a/migration-rdma.c b/migration-rdma.c > index d044830..9cf73e3 100644 > --- a/migration-rdma.c > +++ b/migration-rdma.c > @@ -392,6 +392,7 @@ typedef struct RDMAContext { > uint64_t unregistrations[RDMA_SIGNALED_SEND_MAX]; > > GHashTable *blockmap; > +bool ipv6; > } RDMAContext; > > /* > @@ -744,6 +745,7 @@ static int qemu_rdma_resolve_host(RDMAContext *rdma, > Error **errp) > char port_str[16]; > struct rdma_cm_event *cm_event; > char ip[40] = "unknown"; > +int af = rdma->ipv6 ? PF_INET6 : PF_INET; > > if (rdma->host == NULL || !strcmp(rdma->host, "")) { > ERROR(errp, "RDMA hostname has not been set\n"); > @@ -773,7 +775,7 @@ static int qemu_rdma_resolve_host(RDMAContext *rdma, > Error **errp) > goto err_resolve_get_addr; > } > > -inet_ntop(AF_INET, &((struct sockaddr_in *) res->ai_addr)->sin_addr, > +inet_ntop(af, &((struct sockaddr_in *) res->ai_addr)->sin_addr, > ip, sizeof ip); > DPRINTF("%s => %s\n", rdma->host, ip); > > @@ -2236,9 +2238,12 @@ err_rdma_source_connect: > static int qemu_rdma_dest_init(RDMAContext *rdma, Error **errp) > { > int ret = -EINVAL, idx; > +int af = rdma->ipv6 ? 
PF_INET6 : PF_INET; > struct sockaddr_in sin; > struct rdma_cm_id *listen_id; > char ip[40] = "unknown"; > +struct addrinfo *res; > +char port_str[16]; > > for (idx = 0; idx <= RDMA_WRID_MAX; idx++) { > rdma->wr_data[idx].control_len = 0; > @@ -2266,27 +2271,30 @@ static int qemu_rdma_dest_init(RDMAContext *rdma, > Error **errp) > } > > memset(&sin, 0, sizeof(sin)); > -sin.sin_family = AF_INET; > +sin.sin_family = af; > sin.sin_port = htons(rdma->port); > +snprintf(port_str, 16, "%d", rdma->port); > +port_str[15] = '\0'; > > if (rdma->host && strcmp("", rdma->host)) { > -struct hostent *dest_addr; > -dest_addr = gethostbyname(rdma->host); > -if (!dest_addr) { > -ERROR(errp, "migration could not gethostbyname!\n"); > -ret = -EINVAL; > +ret = getaddrinfo(rdma->host, port_str, NULL, &res); Hi Michael, getaddrinfo can return a list of addresses, you need to handle it. Look at qemu-sockets.c for example. Regards, Orit > +if (ret < 0) { > +ERROR(errp, "could not getaddrinfo address %s\n", rdma->host); > goto err_dest_init_bind_addr; > } > -memcpy(&sin.sin_addr.s_addr, dest_addr->h_addr, > -dest_addr->h_length); > -inet_ntop(AF_INET, dest_addr->h_addr, ip, sizeof ip); > + > + > +inet_ntop(af, &((struct sockaddr_in *) res->ai_addr)->sin_addr, > +ip, sizeof ip); > } else { > -sin.sin_addr.s_addr = INADDR_ANY; > +ERROR(errp, "migration host and port not specified!\n"); > +ret = -EINVAL; > +goto err_dest_init_bind_addr; > } > > DPRINTF("%s => %s\n", rdma->host, ip); > > -ret = rdma_bind_addr(listen_id, (struct sockaddr *)&sin); > +ret = rdma_bind_addr(listen_id, res->ai_addr); > if (ret) { > ERROR(errp, "Error: could not rdma_bind_addr!\n"); > goto err_dest_init_bind_addr; > @@ -2321,6 +2329,7 @@ static void *qemu_rdma_data_init(const char *host_port, > Error **errp) > if (addr != NULL) { > rdma->port = atoi(addr->port); > rdma->host = g_strdup(addr->host); > +rdma->ipv6 = addr->ipv6; > } else { > ERROR(errp, "bad RDMA migration address '%s'", host_port); > g_free(rdma); >
Re: [Qemu-devel] [PATCH] rdma: bugfix: make IPv6 support work
On 07/31/2013 04:39 PM, Michael R. Hines wrote: > On 07/30/2013 11:31 AM, Orit Wasserman wrote: >> On 07/30/2013 05:57 PM, Michael R. Hines wrote: >>> On 07/30/2013 04:14 AM, Orit Wasserman wrote: >>>> On 07/27/2013 05:23 AM, mrhi...@linux.vnet.ibm.com wrote: >>>>> From: "Michael R. Hines" >>>>> >>>>> When testing with libvirt, a simple IPv6 migration test failed >>>>> because we were not using getaddrinfo() properly. >>>>> This makes IPv6 migration over RDMA work. >>>>> >>>>> Also, we forgot to turn the DPRINTF flag off =). >>>>> >>>>> Signed-off-by: Michael R. Hines >>>>> --- >>>>>migration-rdma.c | 35 ++- >>>>>1 file changed, 22 insertions(+), 13 deletions(-) >>>>> >>>>> diff --git a/migration-rdma.c b/migration-rdma.c >>>>> index d044830..3256c9b 100644 >>>>> --- a/migration-rdma.c >>>>> +++ b/migration-rdma.c >>>>> @@ -27,7 +27,7 @@ >>>>>#include >>>>>#include >>>>>-#define DEBUG_RDMA >>>>> +//#define DEBUG_RDMA >>>> Can you put this in a separate patch? >>> Acknowledged. >>> >>>>>//#define DEBUG_RDMA_VERBOSE >>>>>//#define DEBUG_RDMA_REALLY_VERBOSE >>>>>@@ -392,6 +392,7 @@ typedef struct RDMAContext { >>>>>uint64_t unregistrations[RDMA_SIGNALED_SEND_MAX]; >>>>> GHashTable *blockmap; >>>>> +bool ipv6; >>>>>} RDMAContext; >>>>> /* >>>>> @@ -744,6 +745,7 @@ static int qemu_rdma_resolve_host(RDMAContext *rdma, >>>>> Error **errp) >>>>>char port_str[16]; >>>>>struct rdma_cm_event *cm_event; >>>>>char ip[40] = "unknown"; >>>>> +int af = rdma->ipv6 ? PF_INET6 : PF_INET; >>>> We have code that handles ipv6 in utils/qemu-sockets.c, it also handle host >>>> and port parsing please take a look at inet_parse_connect_opts. >>>> I think it can be reused here and for the destination. >>> RDMA cannot use that function - it creates a socket and RDMA does not use >>> sockets. >>> >>> RDMA is *already*, however, using inet_parse() which does exactly what you >>> said. >>> >> You can update the function so it can be used for RDMA also. 
> > The inet_parse() function does everything that we need - > it already understands IPv6 without opening an actual socket. > > Any more than that would require CONFIG_RDMA to be > conditionalized inside of the utils/ source code, > and that seems like a lot of work for such a small reward. > Agreed, Orit > - Michael > >> Orit >> >>>> Regards, >>>> Orit >>>>> if (rdma->host == NULL || !strcmp(rdma->host, "")) { >>>>>ERROR(errp, "RDMA hostname has not been set\n"); >>>>> @@ -773,7 +775,7 @@ static int qemu_rdma_resolve_host(RDMAContext *rdma, >>>>> Error **errp) >>>>>goto err_resolve_get_addr; >>>>>} >>>>>-inet_ntop(AF_INET, &((struct sockaddr_in *) >>>>> res->ai_addr)->sin_addr, >>>>> +inet_ntop(af, &((struct sockaddr_in *) res->ai_addr)->sin_addr, >>>>>ip, sizeof ip); >>>>>DPRINTF("%s => %s\n", rdma->host, ip); >>>>>@@ -2236,9 +2238,12 @@ err_rdma_source_connect: >>>>>static int qemu_rdma_dest_init(RDMAContext *rdma, Error **errp) >>>>>{ >>>>>int ret = -EINVAL, idx; >>>>> +int af = rdma->ipv6 ? PF_INET6 : PF_INET; >>>>>struct sockaddr_in sin; >>>>>struct rdma_cm_id *listen_id; >>>>>char ip[40] = "unknown"; >>>>> +struct addrinfo *res; >>>>> +char port_str[16]; >>>>> for (idx = 0; idx <= RDMA_WRID_MAX; idx++) { >>>>>rdma->wr_data[idx].control_len = 0; >>>>> @@ -2266,27 +2271,30 @@ static int qemu_rdma_dest_init(RDMAContext *rdma, >>>>> Error **errp) >>>>>} >>>>> memset(&sin, 0, sizeof(sin));
Re: [Qemu-devel] [PATCH] rdma: bugfix: make IPv6 support work
On 07/30/2013 05:57 PM, Michael R. Hines wrote: > On 07/30/2013 04:14 AM, Orit Wasserman wrote: >> On 07/27/2013 05:23 AM, mrhi...@linux.vnet.ibm.com wrote: >>> From: "Michael R. Hines" >>> >>> When testing with libvirt, a simple IPv6 migration test failed >>> because we were not using getaddrinfo() properly. >>> This makes IPv6 migration over RDMA work. >>> >>> Also, we forgot to turn the DPRINTF flag off =). >>> >>> Signed-off-by: Michael R. Hines >>> --- >>> migration-rdma.c | 35 ++- >>> 1 file changed, 22 insertions(+), 13 deletions(-) >>> >>> diff --git a/migration-rdma.c b/migration-rdma.c >>> index d044830..3256c9b 100644 >>> --- a/migration-rdma.c >>> +++ b/migration-rdma.c >>> @@ -27,7 +27,7 @@ >>> #include >>> #include >>> -#define DEBUG_RDMA >>> +//#define DEBUG_RDMA >> Can you put this in a separate patch? > > Acknowledged. > >>> //#define DEBUG_RDMA_VERBOSE >>> //#define DEBUG_RDMA_REALLY_VERBOSE >>> @@ -392,6 +392,7 @@ typedef struct RDMAContext { >>> uint64_t unregistrations[RDMA_SIGNALED_SEND_MAX]; >>> GHashTable *blockmap; >>> +bool ipv6; >>> } RDMAContext; >>> /* >>> @@ -744,6 +745,7 @@ static int qemu_rdma_resolve_host(RDMAContext *rdma, >>> Error **errp) >>> char port_str[16]; >>> struct rdma_cm_event *cm_event; >>> char ip[40] = "unknown"; >>> +int af = rdma->ipv6 ? PF_INET6 : PF_INET; >> We have code that handles ipv6 in utils/qemu-sockets.c, it also handle host >> and port parsing please take a look at inet_parse_connect_opts. >> I think it can be reused here and for the destination. > > RDMA cannot use that function - it creates a socket and RDMA does not use > sockets. > > RDMA is *already*, however, using inet_parse() which does exactly what you > said. > You can update the function so it can be used for RDMA also. 
Orit >> Regards, >> Orit >>> if (rdma->host == NULL || !strcmp(rdma->host, "")) { >>> ERROR(errp, "RDMA hostname has not been set\n"); >>> @@ -773,7 +775,7 @@ static int qemu_rdma_resolve_host(RDMAContext *rdma, >>> Error **errp) >>> goto err_resolve_get_addr; >>> } >>> -inet_ntop(AF_INET, &((struct sockaddr_in *) res->ai_addr)->sin_addr, >>> +inet_ntop(af, &((struct sockaddr_in *) res->ai_addr)->sin_addr, >>> ip, sizeof ip); >>> DPRINTF("%s => %s\n", rdma->host, ip); >>> @@ -2236,9 +2238,12 @@ err_rdma_source_connect: >>> static int qemu_rdma_dest_init(RDMAContext *rdma, Error **errp) >>> { >>> int ret = -EINVAL, idx; >>> +int af = rdma->ipv6 ? PF_INET6 : PF_INET; >>> struct sockaddr_in sin; >>> struct rdma_cm_id *listen_id; >>> char ip[40] = "unknown"; >>> +struct addrinfo *res; >>> +char port_str[16]; >>> for (idx = 0; idx <= RDMA_WRID_MAX; idx++) { >>> rdma->wr_data[idx].control_len = 0; >>> @@ -2266,27 +2271,30 @@ static int qemu_rdma_dest_init(RDMAContext *rdma, >>> Error **errp) >>> } >>> memset(&sin, 0, sizeof(sin)); >>> -sin.sin_family = AF_INET; >>> +sin.sin_family = af; >>> sin.sin_port = htons(rdma->port); >>> +snprintf(port_str, 16, "%d", rdma->port); >>> +port_str[15] = '\0'; >>> if (rdma->host && strcmp("", rdma->host)) { >>> -struct hostent *dest_addr; >>> -dest_addr = gethostbyname(rdma->host); >>> -if (!dest_addr) { >>> -ERROR(errp, "migration could not gethostbyname!\n"); >>> -ret = -EINVAL; >>> +ret = getaddrinfo(rdma->host, port_str, NULL, &res); >>> +if (ret < 0) { >>> +ERROR(errp, "could not getaddrinfo address %s\n", rdma->host); >>> goto err_dest_init_bind_addr; >>> } >>> -memcpy(&sin.sin_addr.s_addr, dest_addr->h_addr, >>> -dest_addr->h_length); >>> -inet_ntop(AF_INET, dest_addr->h_addr, ip, sizeof ip); >>> + >>
Re: [Qemu-devel] [PATCH] rdma: bugfix: make IPv6 support work
On 07/27/2013 05:23 AM, mrhi...@linux.vnet.ibm.com wrote:
> From: "Michael R. Hines"
>
> When testing with libvirt, a simple IPv6 migration test failed
> because we were not using getaddrinfo() properly.
> This makes IPv6 migration over RDMA work.
>
> Also, we forgot to turn the DPRINTF flag off =).
>
> Signed-off-by: Michael R. Hines
> ---
>  migration-rdma.c | 35 ++-
>  1 file changed, 22 insertions(+), 13 deletions(-)
>
> diff --git a/migration-rdma.c b/migration-rdma.c
> index d044830..3256c9b 100644
> --- a/migration-rdma.c
> +++ b/migration-rdma.c
> @@ -27,7 +27,7 @@
>  #include
>  #include
>
> -#define DEBUG_RDMA
> +//#define DEBUG_RDMA

Can you put this in a separate patch?

> //#define DEBUG_RDMA_VERBOSE
> //#define DEBUG_RDMA_REALLY_VERBOSE
>
> @@ -392,6 +392,7 @@ typedef struct RDMAContext {
>      uint64_t unregistrations[RDMA_SIGNALED_SEND_MAX];
>
>      GHashTable *blockmap;
> +    bool ipv6;
> } RDMAContext;
>
> /*
> @@ -744,6 +745,7 @@ static int qemu_rdma_resolve_host(RDMAContext *rdma,
> Error **errp)
>      char port_str[16];
>      struct rdma_cm_event *cm_event;
>      char ip[40] = "unknown";
> +    int af = rdma->ipv6 ? PF_INET6 : PF_INET;

We have code that handles ipv6 in utils/qemu-sockets.c; it also handles
host and port parsing. Please take a look at inet_parse_connect_opts.
I think it can be reused here and for the destination.

Regards,
Orit

>
>      if (rdma->host == NULL || !strcmp(rdma->host, "")) {
>          ERROR(errp, "RDMA hostname has not been set\n");
> @@ -773,7 +775,7 @@ static int qemu_rdma_resolve_host(RDMAContext *rdma,
> Error **errp)
>          goto err_resolve_get_addr;
>      }
>
> -    inet_ntop(AF_INET, &((struct sockaddr_in *) res->ai_addr)->sin_addr,
> +    inet_ntop(af, &((struct sockaddr_in *) res->ai_addr)->sin_addr,
>                ip, sizeof ip);
>      DPRINTF("%s => %s\n", rdma->host, ip);
>
> @@ -2236,9 +2238,12 @@ err_rdma_source_connect:
> static int qemu_rdma_dest_init(RDMAContext *rdma, Error **errp)
> {
>      int ret = -EINVAL, idx;
> +    int af = rdma->ipv6 ? PF_INET6 : PF_INET;
>      struct sockaddr_in sin;
>      struct rdma_cm_id *listen_id;
>      char ip[40] = "unknown";
> +    struct addrinfo *res;
> +    char port_str[16];
>
>      for (idx = 0; idx <= RDMA_WRID_MAX; idx++) {
>          rdma->wr_data[idx].control_len = 0;
> @@ -2266,27 +2271,30 @@ static int qemu_rdma_dest_init(RDMAContext *rdma,
> Error **errp)
>      }
>
>      memset(&sin, 0, sizeof(sin));
> -    sin.sin_family = AF_INET;
> +    sin.sin_family = af;
>      sin.sin_port = htons(rdma->port);
> +    snprintf(port_str, 16, "%d", rdma->port);
> +    port_str[15] = '\0';
>
>      if (rdma->host && strcmp("", rdma->host)) {
> -        struct hostent *dest_addr;
> -        dest_addr = gethostbyname(rdma->host);
> -        if (!dest_addr) {
> -            ERROR(errp, "migration could not gethostbyname!\n");
> -            ret = -EINVAL;
> +        ret = getaddrinfo(rdma->host, port_str, NULL, &res);
> +        if (ret < 0) {
> +            ERROR(errp, "could not getaddrinfo address %s\n", rdma->host);
>              goto err_dest_init_bind_addr;
>          }
> -        memcpy(&sin.sin_addr.s_addr, dest_addr->h_addr,
> -               dest_addr->h_length);
> -        inet_ntop(AF_INET, dest_addr->h_addr, ip, sizeof ip);
> +
> +
> +        inet_ntop(af, &((struct sockaddr_in *) res->ai_addr)->sin_addr,
> +                  ip, sizeof ip);
>      } else {
> -        sin.sin_addr.s_addr = INADDR_ANY;
> +        ERROR(errp, "migration host and port not specified!\n");
> +        ret = -EINVAL;
> +        goto err_dest_init_bind_addr;
>      }
>
>      DPRINTF("%s => %s\n", rdma->host, ip);
>
> -    ret = rdma_bind_addr(listen_id, (struct sockaddr *)&sin);
> +    ret = rdma_bind_addr(listen_id, res->ai_addr);
>      if (ret) {
>          ERROR(errp, "Error: could not rdma_bind_addr!\n");
>          goto err_dest_init_bind_addr;
> @@ -2321,6 +2329,7 @@ static void *qemu_rdma_data_init(const char *host_port,
> Error **errp)
>      if (addr != NULL) {
>          rdma->port = atoi(addr->port);
>          rdma->host = g_strdup(addr->host);
> +        rdma->ipv6 = addr->ipv6;
>      } else {
>          ERROR(errp, "bad RDMA migration address '%s'", host_port);
>          g_free(rdma);
>
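Orit's suggestion is to reuse the family-agnostic helpers in utils/qemu-sockets.c instead of hard-coding PF_INET vs PF_INET6 from a cached `ipv6` flag. As a rough standalone sketch (not QEMU code; the helper names `resolve_any` and `print_addr` are made up for illustration), letting getaddrinfo() pick the family via AF_UNSPEC looks like this:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>
#include <arpa/inet.h>

/* Hypothetical helper: resolve host/port without fixing the address
 * family up front -- AF_UNSPEC lets getaddrinfo() return IPv4 and/or
 * IPv6 results, so the caller never has to guess PF_INET vs PF_INET6. */
static int resolve_any(const char *host, const char *port,
                       struct addrinfo **res)
{
    struct addrinfo hints;

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_UNSPEC;      /* IPv4 or IPv6, whichever resolves */
    hints.ai_socktype = SOCK_STREAM;
    return getaddrinfo(host, port, &hints, res);
}

/* Print the first resolved address, choosing the sockaddr layout from
 * the *returned* ai_family rather than from a separate boolean. */
static void print_addr(const struct addrinfo *ai)
{
    char ip[INET6_ADDRSTRLEN] = "unknown";

    if (ai->ai_family == AF_INET) {
        inet_ntop(AF_INET,
                  &((struct sockaddr_in *)ai->ai_addr)->sin_addr,
                  ip, sizeof ip);
    } else if (ai->ai_family == AF_INET6) {
        inet_ntop(AF_INET6,
                  &((struct sockaddr_in6 *)ai->ai_addr)->sin6_addr,
                  ip, sizeof ip);
    }
    printf("%s\n", ip);
}

int main(void)
{
    struct addrinfo *res = NULL;

    if (resolve_any("127.0.0.1", "4444", &res) == 0) {
        print_addr(res);          /* prints 127.0.0.1 */
        freeaddrinfo(res);
    }
    return 0;
}
```

This is essentially what QEMU's inet_* socket helpers do internally, which is why reusing them here avoids the per-protocol branching in the patch.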
[Qemu-devel] [PATCH v3 2/2] Fix real mode guest segments dpl value in savevm
Older KVM versions put an invalid value (0x3) in the segment registers'
DPL field for real mode guests. This breaks migration from those hosts to
hosts with unrestricted guest support. We detect it by checking the CS DPL
value for a real mode guest and fix the DPL values of all the segment
registers.

Signed-off-by: Orit Wasserman
Reviewed-by: Juan Quintela
---
 target-i386/machine.c | 18 ++
 1 file changed, 18 insertions(+)

diff --git a/target-i386/machine.c b/target-i386/machine.c
index af4c167..f9ec581 100644
--- a/target-i386/machine.c
+++ b/target-i386/machine.c
@@ -252,6 +252,24 @@ static void cpu_pre_save(void *opaque)
     }

     env->fpregs_format_vmstate = 0;
+
+    /*
+     * Real mode guest segments register DPL should be zero.
+     * Older KVM version were setting it wrongly.
+     * Fixing it will allow live migration to host with unrestricted guest
+     * support (otherwise the migration will fail with invalid guest state
+     * error).
+     */
+    if (!(env->cr[0] & CR0_PE_MASK) &&
+        (env->segs[R_CS].flags >> DESC_DPL_SHIFT & 3) != 0) {
+        env->segs[R_CS].flags &= ~(env->segs[R_CS].flags & DESC_DPL_MASK);
+        env->segs[R_DS].flags &= ~(env->segs[R_DS].flags & DESC_DPL_MASK);
+        env->segs[R_ES].flags &= ~(env->segs[R_ES].flags & DESC_DPL_MASK);
+        env->segs[R_FS].flags &= ~(env->segs[R_FS].flags & DESC_DPL_MASK);
+        env->segs[R_GS].flags &= ~(env->segs[R_GS].flags & DESC_DPL_MASK);
+        env->segs[R_SS].flags &= ~(env->segs[R_SS].flags & DESC_DPL_MASK);
+    }
+
 }

 static int cpu_post_load(void *opaque, int version_id)
--
1.8.1.4
[Qemu-devel] [PATCH v3 1/2] Fix real mode guest migration
Older KVM versions set the CS DPL field to an invalid value (0x3) for real
mode guests. This patch detects this situation when loading the CPU state
and sets all the segment DPL values to zero. This allows migration from an
older KVM on a host without unrestricted guest support to a host with
unrestricted guest support. For example, migration of a real mode guest
from a Penryn host (with kernel 2.6.32) to a Westmere host would otherwise
fail with "kvm: unhandled exit 8021".

Signed-off-by: Orit Wasserman
Reviewed-by: Juan Quintela
---
 target-i386/machine.c | 18 ++
 1 file changed, 18 insertions(+)

diff --git a/target-i386/machine.c b/target-i386/machine.c
index 3659db9..af4c167 100644
--- a/target-i386/machine.c
+++ b/target-i386/machine.c
@@ -260,6 +260,24 @@ static int cpu_post_load(void *opaque, int version_id)
     CPUX86State *env = &cpu->env;
     int i;

+    /*
+     * Real mode guest segments register DPL should be zero.
+     * Older KVM version were setting it wrongly.
+     * Fixing it will allow live migration from such host that don't have
+     * restricted guest support to a host with unrestricted guest support
+     * (otherwise the migration will fail with invalid guest state
+     * error).
+     */
+    if (!(env->cr[0] & CR0_PE_MASK) &&
+        (env->segs[R_CS].flags >> DESC_DPL_SHIFT & 3) != 0) {
+        env->segs[R_CS].flags &= ~(env->segs[R_CS].flags & DESC_DPL_MASK);
+        env->segs[R_DS].flags &= ~(env->segs[R_DS].flags & DESC_DPL_MASK);
+        env->segs[R_ES].flags &= ~(env->segs[R_ES].flags & DESC_DPL_MASK);
+        env->segs[R_FS].flags &= ~(env->segs[R_FS].flags & DESC_DPL_MASK);
+        env->segs[R_GS].flags &= ~(env->segs[R_GS].flags & DESC_DPL_MASK);
+        env->segs[R_SS].flags &= ~(env->segs[R_SS].flags & DESC_DPL_MASK);
+    }
+
     /* XXX: restore FPU round state */
     env->fpstt = (env->fpus_vmstate >> 11) & 7;
     env->fpus = env->fpus_vmstate & ~0x3800;
--
1.8.1.4
[Qemu-devel] [PATCH v3 0/2] Fix real mode guest live migration
Older KVM versions set the CS DPL field to an invalid value (0x3) for real
mode guests. This caused migration from an older KVM on a host without
unrestricted guest support to a host with unrestricted guest support to
fail with "kvm: unhandled exit 8021" -- for example, migration of a real
mode guest from a Penryn host (with kernel 2.6.32) to a Westmere host.

This series fixes it in both the destination and the source QEMU.

Changes from v2:
  Fix more typos
Changes from v1:
  Fix typos and style

Orit Wasserman (2):
  Fix real mode guest migration
  Fix real mode guest segments dpl value in savevm

 target-i386/machine.c | 36
 1 file changed, 36 insertions(+)

--
1.8.1.4
[Qemu-devel] [PATCH v2 1/2] Fix real mode guest migration
Older KVM versions set the CS DPL field to an invalid value (0x3) for real
mode guests. This patch detects this situation when loading the CPU state
and sets all the segment DPL values to zero. This allows migration from an
older KVM on a host without unrestricted guest support to a host with
unrestricted guest support. For example, migration of a real mode guest
from a Penryn host (with kernel 2.6.32) to a Westmere host would otherwise
fail with "kvm: unhandled exit 8021".

Signed-off-by: Orit Wasserman
---
 target-i386/machine.c | 18 ++
 1 file changed, 18 insertions(+)

diff --git a/target-i386/machine.c b/target-i386/machine.c
index 3659db9..83ea6ac 100644
--- a/target-i386/machine.c
+++ b/target-i386/machine.c
@@ -260,6 +260,24 @@ static int cpu_post_load(void *opaque, int version_id)
     CPUX86State *env = &cpu->env;
     int i;

+    /*
+     * Real mode guest segments register DPL should be zero.
+     * Older KVM version were setting it wrongly.
+     * Fixing it will allow live migration from such host that don't have
+     * restricted guest support to an host with unrestricted guest support
+     * (otherwise the migration will fail with invalid guest state
+     * error).
+     */
+    if (!(env->cr[0] & CR0_PE_MASK) &&
+        (env->segs[R_CS].flags >> DESC_DPL_SHIFT & 3) != 0) {
+        env->segs[R_CS].flags &= ~(env->segs[R_CS].flags & DESC_DPL_MASK);
+        env->segs[R_DS].flags &= ~(env->segs[R_DS].flags & DESC_DPL_MASK);
+        env->segs[R_ES].flags &= ~(env->segs[R_ES].flags & DESC_DPL_MASK);
+        env->segs[R_FS].flags &= ~(env->segs[R_FS].flags & DESC_DPL_MASK);
+        env->segs[R_GS].flags &= ~(env->segs[R_GS].flags & DESC_DPL_MASK);
+        env->segs[R_SS].flags &= ~(env->segs[R_SS].flags & DESC_DPL_MASK);
+    }
+
     /* XXX: restore FPU round state */
     env->fpstt = (env->fpus_vmstate >> 11) & 7;
     env->fpus = env->fpus_vmstate & ~0x3800;
--
1.8.1.4
[Qemu-devel] [PATCH v2 0/2] Fix real mode guest live migration
Older KVM versions set the CS DPL field to an invalid value (0x3) for real
mode guests. This caused migration from an older KVM on a host without
unrestricted guest support to a host with unrestricted guest support to
fail with "kvm: unhandled exit 8021" -- for example, migration of a real
mode guest from a Penryn host (with kernel 2.6.32) to a Westmere host.

This series fixes it in both the destination and the source QEMU.

Changes from v1:
  Fix typos and style

Orit Wasserman (2):
  Fix real mode guest migration
  Fix real mode guest segments dpl value in savevm

 target-i386/machine.c | 36
 1 file changed, 36 insertions(+)

--
1.8.1.4
[Qemu-devel] [PATCH v2 2/2] Fix real mode guest segments dpl value in savevm
Older KVM versions put an invalid value (0x3) in the segment registers'
DPL field for real mode guests. This breaks migration from those hosts to
hosts with unrestricted guest support. We detect it by checking the CS DPL
value for a real mode guest and fix the DPL values of all the segment
registers.

Signed-off-by: Orit Wasserman
---
 target-i386/machine.c | 18 ++
 1 file changed, 18 insertions(+)

diff --git a/target-i386/machine.c b/target-i386/machine.c
index 83ea6ac..c98f1c9 100644
--- a/target-i386/machine.c
+++ b/target-i386/machine.c
@@ -252,6 +252,24 @@ static void cpu_pre_save(void *opaque)
     }

     env->fpregs_format_vmstate = 0;
+
+    /*
+     * Real mode guest segments register DPL should be zero.
+     * Older KVM version were setting it wrongly.
+     * Fixing it will allow live migration to host with unrestricted guest
+     * support (otherwise the migration will fail with invalid guest state
+     * error).
+     */
+    if (!(env->cr[0] & CR0_PE_MASK) &&
+        (env->segs[R_CS].flags >> DESC_DPL_SHIFT & 3) != 0) {
+        env->segs[R_CS].flags &= ~(env->segs[R_CS].flags & DESC_DPL_MASK);
+        env->segs[R_DS].flags &= ~(env->segs[R_DS].flags & DESC_DPL_MASK);
+        env->segs[R_ES].flags &= ~(env->segs[R_ES].flags & DESC_DPL_MASK);
+        env->segs[R_FS].flags &= ~(env->segs[R_FS].flags & DESC_DPL_MASK);
+        env->segs[R_GS].flags &= ~(env->segs[R_GS].flags & DESC_DPL_MASK);
+        env->segs[R_SS].flags &= ~(env->segs[R_SS].flags & DESC_DPL_MASK);
+    }
+
 }

 static int cpu_post_load(void *opaque, int version_id)
--
1.8.1.4
Re: [Qemu-devel] [PATCH 1/2] Fix real mode guest migration
On 07/22/2013 01:33 PM, Andreas Färber wrote:
> Am 22.07.2013 11:49, schrieb Paolo Bonzini:
>> Il 22/07/2013 08:49, Orit Wasserman ha scritto:
>>> Older KVM versions save CS dpl value to an invalid value for real mode
>>> guests (0x3). This patch detect this situation when loading CPU state
>>> and set all the segments dpl to zero.
>>> This will allow migration from older KVM on host without unrestricted guest
>>> to hosts with restricted guest support.
>>> For example migration from a Penryn host (with kernel 2.6.32) to
>>> a Westmere host.
>>>
>>> Signed-off-by: Orit Wasserman
>>> ---
>>>  target-i386/machine.c | 18 ++
>>>  1 file changed, 18 insertions(+)
>>>
>>> diff --git a/target-i386/machine.c b/target-i386/machine.c
>>> index 3659db9..7e95829 100644
>>> --- a/target-i386/machine.c
>>> +++ b/target-i386/machine.c
>>> @@ -260,6 +260,24 @@ static int cpu_post_load(void *opaque, int version_id)
>>>      CPUX86State *env = &cpu->env;
>>>      int i;
>>>
>>> +    /*
>>> +      Real mode guest segments register DPL should be zero.
>>> +      Older KVM version were setting it worngly.
>>> +      Fixing it will allow live migration from such host that don't have
>>> +      restricted guest support to an host with unrestricted guest support
>>> +      (otherwise the migration will fail with invalid guest state
>>> +      error).
>>> +    */
>>
>> Coding standard asks for *s on every line.
>>
>> As discussed offlist, I would prefer to have this in the kernel since
>> that's where the bug is. Gleb disagrees.
>>
>> We need to find a third person who mediates... Anthony, Eduardo, what
>> do you think?
>
> Having the code here does not look wrong to me, to enforce a consistent
> state inside QEMU.
>
> However I wonder what happens without this patch on Westmere? Might it
> make sense to sanitize or at least "assert" (whatever the kernel
> equivalent is ;)) in the ioctl setting X86CPU state to the vCPU that the
> incoming values will be valid for the host CPU? And optionally in QEMU's
> KVM code for the reverse direction, cpu_synchronize_state(), to cope
> with older kernels?
>

Without the patch we get a "kvm: unhandled exit 8021" error in incoming
migration or loadvm. This is a KVM error (kernel) which translates to
invalid guest state. It happens only when migrating a real mode guest.

The problem with fixing the values in cpu_synchronize_state is that the
function is called in many places in the code. As real mode code is very
complex (Gleb can attest to that), I prefer a fix with a very limited
scope, i.e. in the cpu_post_load and cpu_pre_save functions that are only
used for savevm and live migration.

Orit

> Regards,
> Andreas
>
>>> +    if (!(env->cr[0] & CR0_PE_MASK) &&
>>> +        (env->segs[R_CS].flags >> DESC_DPL_SHIFT & 3) != 0) {
>>> +        env->segs[R_CS].flags &= ~(env->segs[R_CS].flags & DESC_DPL_MASK);
>>> +        env->segs[R_DS].flags &= ~(env->segs[R_DS].flags & DESC_DPL_MASK);
>>> +        env->segs[R_ES].flags &= ~(env->segs[R_ES].flags & DESC_DPL_MASK);
>>> +        env->segs[R_FS].flags &= ~(env->segs[R_FS].flags & DESC_DPL_MASK);
>>> +        env->segs[R_GS].flags &= ~(env->segs[R_GS].flags & DESC_DPL_MASK);
>>> +        env->segs[R_SS].flags &= ~(env->segs[R_SS].flags & DESC_DPL_MASK);
>>> +    }
>>> +
>>>      /* XXX: restore FPU round state */
>>>      env->fpstt = (env->fpus_vmstate >> 11) & 7;
>>>      env->fpus = env->fpus_vmstate & ~0x3800;
>
Re: [Qemu-devel] [PATCH 1/2] Fix real mode guest migration
On 07/22/2013 12:49 PM, Paolo Bonzini wrote:
> Il 22/07/2013 08:49, Orit Wasserman ha scritto:
>> Older KVM versions save CS dpl value to an invalid value for real mode
>> guests (0x3). This patch detect this situation when loading CPU state
>> and set all the segments dpl to zero.
>> This will allow migration from older KVM on host without unrestricted guest
>> to hosts with restricted guest support.
>> For example migration from a Penryn host (with kernel 2.6.32) to
>> a Westmere host.
>>
>> Signed-off-by: Orit Wasserman
>> ---
>>  target-i386/machine.c | 18 ++
>>  1 file changed, 18 insertions(+)
>>
>> diff --git a/target-i386/machine.c b/target-i386/machine.c
>> index 3659db9..7e95829 100644
>> --- a/target-i386/machine.c
>> +++ b/target-i386/machine.c
>> @@ -260,6 +260,24 @@ static int cpu_post_load(void *opaque, int version_id)
>>      CPUX86State *env = &cpu->env;
>>      int i;
>>
>> +    /*
>> +      Real mode guest segments register DPL should be zero.
>> +      Older KVM version were setting it worngly.
>> +      Fixing it will allow live migration from such host that don't have
>> +      restricted guest support to an host with unrestricted guest support
>> +      (otherwise the migration will fail with invalid guest state
>> +      error).
>> +    */
>
> Coding standard asks for *s on every line.

I will fix it in v2. By the way, checkpatch.pl didn't warn about this;
it would be nice to add such a check.

>
> As discussed offlist, I would prefer to have this in the kernel since
> that's where the bug is. Gleb disagrees.
>

As this is a migration bug I prefer to fix it in the migration code only.
Fixing it in the kernel will affect all scenarios that load the segments;
in userspace it only affects the load VM code.

Orit

> We need to find a third person who mediates... Anthony, Eduardo, what
> do you think?
>
> Paolo
>
>> +    if (!(env->cr[0] & CR0_PE_MASK) &&
>> +        (env->segs[R_CS].flags >> DESC_DPL_SHIFT & 3) != 0) {
>> +        env->segs[R_CS].flags &= ~(env->segs[R_CS].flags & DESC_DPL_MASK);
>> +        env->segs[R_DS].flags &= ~(env->segs[R_DS].flags & DESC_DPL_MASK);
>> +        env->segs[R_ES].flags &= ~(env->segs[R_ES].flags & DESC_DPL_MASK);
>> +        env->segs[R_FS].flags &= ~(env->segs[R_FS].flags & DESC_DPL_MASK);
>> +        env->segs[R_GS].flags &= ~(env->segs[R_GS].flags & DESC_DPL_MASK);
>> +        env->segs[R_SS].flags &= ~(env->segs[R_SS].flags & DESC_DPL_MASK);
>> +    }
>> +
>>      /* XXX: restore FPU round state */
>>      env->fpstt = (env->fpus_vmstate >> 11) & 7;
>>      env->fpus = env->fpus_vmstate & ~0x3800;
>>
>