RFC patch is attached. Comments appreciated. I have two concerns left:

a) What happens if a page turns from zero to non-zero during the first stage? Is this page transferred in the same round or in the next one?

b) What happens if live migration fails or is aborted and then another migration to the same target is started (if that is possible)? Is the memory at the target reinitialized?

On 31.01.2013, at 10:37, Orit Wasserman <owass...@redhat.com> wrote:

> On 01/31/2013 11:25 AM, Peter Lieven wrote:
>> 
>> On 31.01.2013, at 10:19, Orit Wasserman <owass...@redhat.com> wrote:
>> 
>>> On 01/31/2013 11:00 AM, Peter Lieven wrote:
>>>> 
>>>> On 31.01.2013, at 09:59, Orit Wasserman <owass...@redhat.com> wrote:
>>>> 
>>>>> On 01/31/2013 10:37 AM, Peter Lieven wrote:
>>>>>> 
>>>>>> On 31.01.2013, at 09:33, Orit Wasserman <owass...@redhat.com> wrote:
>>>>>> 
>>>>>>> On 01/31/2013 10:10 AM, Peter Lieven wrote:
>>>>>>>> 
>>>>>>>> On 31.01.2013, at 08:47, Orit Wasserman <owass...@redhat.com> wrote:
>>>>>>>> 
>>>>>>>>> On 01/31/2013 08:57 AM, Peter Lieven wrote:
>>>>>>>>>> Hi,
>>>>>>>>>> 
>>>>>>>>>> I just came across an idea and would like to have feedback on
>>>>>>>>>> whether it makes sense or not.
>>>>>>>>>> 
>>>>>>>>>> If a VM is started without preallocated memory, all memory that
>>>>>>>>>> has not been written to reads as zeros, right?
>>>>>>>>> Hi,
>>>>>>>>> No, the memory will be unmapped (we allocate on demand).
>>>>>>>> 
>>>>>>>> Yes, but those unmapped pages will read as zeros if the guest
>>>>>>>> accesses them?
>>>>>>> yes.
>>>>>>>> 
>>>>>>>>>> If a VM with a lot of unwritten memory is migrated, or if the
>>>>>>>>>> memory contains a lot of zeroed-out memory (e.g. a Windows or
>>>>>>>>>> Linux guest with page sanitization), all this memory is allocated
>>>>>>>>>> on the target during live migration. Especially with KSM this
>>>>>>>>>> leads to the problem that this memory is allocated and might not
>>>>>>>>>> be completely available, because merging of the pages happens
>>>>>>>>>> asynchronously.
>>>>>>>>>> 
>>>>>>>>>> Wouldn't it make sense to not send zero pages in the first round,
>>>>>>>>>> where the complete RAM is sent (if it is detectable that we are
>>>>>>>>>> in this stage)?
>>>>>>>>> We send one byte per zero page at the moment (see is_dup_page); we
>>>>>>>>> could optimize it further by not sending it at all.
>>>>>>>>> I have to point out that this is a very idle guest and we need to
>>>>>>>>> work on a loaded guest, which is the harder problem in migration.
>>>>>>>> 
>>>>>>>> I was not talking about saving one byte (+ 8 bytes for the header);
>>>>>>>> my concern was that we memset all (dup) pages, including the special
>>>>>>>> case of a zero dup page, on the migration target. This allocates the
>>>>>>>> memory, or does it not?
>>>>>>> 
>>>>>>>> If my above assumption that the guest reads unmapped memory as zeros
>>>>>>>> is right, this mapping is not necessary in the case of a zero dup
>>>>>>>> page.
>>>>>>>> 
>>>>>>>> We just have to make sure that we are still in the very first round
>>>>>>>> when deciding not to send a zero page, because otherwise it could be
>>>>>>>> a page that has become zero during migration, and that of course has
>>>>>>>> to be transferred.
>>>>>>> 
>>>>>>> OK, so if we don't send the pages, they won't be allocated on the
>>>>>>> dst, which can improve memory usage and reduce cpu consumption there.
>>>>>>> That can be good for an overcommit scenario.
>>>>>> 
>>>>>> Yes. On the source host those zero pages have likely all been merged
>>>>>> by KSM already, but on the destination they are allocated and
>>>>>> initially consume real memory. This can be a problem if a lot of
>>>>>> incoming migrations happen at the same time.
>>>>> 
>>>>> That can be very effective.
>>>>>> 
>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> Also, I notice that the bottleneck in migrating unmapped pages is
>>>>>>>>> the detection of those pages, because we map the pages in order to
>>>>>>>>> check them. For a large guest this is very expensive, as mapping a
>>>>>>>>> page results in a page fault on the host.
>>>>>>>>> So what would be very helpful is locating those pages without
>>>>>>>>> mapping them, which looks very complicated.
>>>>>>>> 
>>>>>>>> This would be a nice improvement, but as you said, a guest will
>>>>>>>> sooner or later allocate all memory if it is not totally idle.
>>>>>>>> However, bigger parts of this memory might have been reset to zeros.
>>>>>>>> This happens on page deallocation in a Windows guest by default and
>>>>>>>> can also be enforced in Linux with page sanitization.
>>>>>>> 
>>>>>>> True, but in those cases we will want to zero the page on the dst,
>>>>>>> as this is done for security reasons.
>>>>>> 
>>>>>> If I migrate to a destination where initially all memory is unmapped,
>>>>>> not migrating the zero page turns it into an unmapped page (which
>>>>>> reads as zero?). Where is the security problem? It's like re-thinning
>>>>>> on a storage system. Or do I understand something wrong here? Is the
>>>>>> actual mapping information migrated?
>>>>> 
>>>>> I was referring to pages that had some data and were migrated, so when
>>>>> the guest OS zeros them we need to zero them in the destination as
>>>>> well, because the data is also there.
>>>> 
>>>> OK, so can we, with the current implementation, effectively decide
>>>> whether a page is transferred for the first time?
>>> 
>>> In the old code (before 1.3 or 1.2) we had a separate function for the
>>> first full transfer, but now we don't.
>>> So I guess you will need to implement it; it shouldn't be too complicated.
>>> I would add a flag to the existing code.
>>>> 
>>>> Do we always migrate the complete memory once and then iterate over
>>>> dirty pages? I have to check the code that searches for dirty pages to
>>>> confirm that.
>>> We set the whole bitmap as dirty at the beginning of migration, so in
>>> the first iteration all pages will be sent.
>>> The code is in arch_init.c; look at ram_save_setup and ram_save_iterate.
>> 
>> I will have a look and send an RFC patch once I have tested it.
> Great!

diff --git a/arch_init.c b/arch_init.c
index dada6de..33f3b12 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -426,6 +426,8 @@ static void migration_bitmap_sync(void)
  *           0 means no dirty pages
  */
 
+static uint64_t complete_rounds;
+
 static int ram_save_block(QEMUFile *f, bool last_stage)
 {
     RAMBlock *block = last_seen_block;
@@ -451,6 +453,10 @@ static int ram_save_block(QEMUFile *f, bool last_stage)
             if (!block) {
                 block = QTAILQ_FIRST(&ram_list.blocks);
                 complete_round = true;
+                if (!complete_rounds) {
+                    error_report("ram_save_block: finished bulk ram migration");
+                }
+                complete_rounds++;
             }
         } else {
             uint8_t *p;
@@ -463,10 +469,17 @@ static int ram_save_block(QEMUFile *f, bool last_stage)
             bytes_sent = -1;
             if (is_dup_page(p)) {
                 acct_info.dup_pages++;
-                bytes_sent = save_block_hdr(f, block, offset, cont,
+                /* we can skip transferring zero pages in the first round because
+                   memory is unmapped (reads as zero) at the target anyway or
+                   initialized to zero in case of mem-prealloc. */
+                if (complete_rounds || *p) {
+                    bytes_sent = save_block_hdr(f, block, offset, cont,
                                             RAM_SAVE_FLAG_COMPRESS);
-                qemu_put_byte(f, *p);
-                bytes_sent += 1;
+                    qemu_put_byte(f, *p);
+                    bytes_sent += 1;
+                } else {
+                    bytes_sent = 1;
+                }
             } else if (migrate_use_xbzrle()) {
                 current_addr = block->offset + offset;
                 bytes_sent = save_xbzrle_page(f, p, current_addr, block,
@@ -569,6 +582,7 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
 
     qemu_mutex_lock_ramlist();
     bytes_transferred = 0;
+    complete_rounds = 0;
     reset_ram_globals();
 
     if (migrate_use_xbzrle()) {
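
To illustrate the decision the patch takes, here is a small standalone sketch. It is not the actual QEMU code (is_dup_page and ram_save_block are only referenced above), and PAGE_SIZE here is just an illustrative constant standing in for the target page size. The idea: a dup page is one whose bytes are all identical, and an all-zero dup page is only skipped while no complete round has finished yet, because after the bulk round a zero page may be one the guest dirtied and zeroed again.

/* Standalone sketch, not QEMU code: illustrates the skip decision above. */
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define PAGE_SIZE 4096  /* illustrative, stands in for the target page size */

/* A "dup" page: every byte equals the first byte. */
static bool page_is_dup(const uint8_t *p)
{
    size_t i;

    for (i = 1; i < PAGE_SIZE; i++) {
        if (p[i] != p[0]) {
            return false;
        }
    }
    return true;
}

/*
 * A dup page must still be sent if we are past the bulk round (it may have
 * been dirtied and zeroed again) or if its fill byte is non-zero.  Only an
 * all-zero dup page in the bulk round can be skipped, because unsent memory
 * reads as zero on the target anyway.
 */
static bool must_send_dup_page(const uint8_t *p, uint64_t complete_rounds)
{
    return complete_rounds > 0 || p[0] != 0;
}

int main(void)
{
    static uint8_t zero_page[PAGE_SIZE];   /* all zeros */
    static uint8_t ff_page[PAGE_SIZE];

    memset(ff_page, 0xff, sizeof(ff_page));

    printf("zero page, bulk round:  send=%d\n",
           page_is_dup(zero_page) && must_send_dup_page(zero_page, 0));
    printf("zero page, later round: send=%d\n",
           page_is_dup(zero_page) && must_send_dup_page(zero_page, 1));
    printf("0xff page, any round:   send=%d\n",
           page_is_dup(ff_page) && must_send_dup_page(ff_page, 0));
    return 0;
}

Expected output is send=0 only for the zero page in the bulk round and send=1 for the other two cases, which matches the "if (complete_rounds || *p)" condition in the patch.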