Re: [Qemu-devel] Windows VM slow boot
On Wed, Sep 12, 2012 at 05:46:15PM +0100, Richard Davies wrote:
> Hi Mel - thanks for replying to my underhand bcc!
>
> Mel Gorman wrote:
> > I see that this is an old-ish bug but I did not read the full history.
> > Is it now booting faster than 3.5.0 was? I'm asking because I'm
> > interested to see if commit c67fe375 helped your particular case.
>
> Yes, I think 3.6.0-rc5 is already better than 3.5.x but can still be
> improved, as discussed.

What are the boot times for each kernel?

> PATCH SNIPPED
>
> I have applied and tested again - perf results below.
> isolate_migratepages_range is indeed much reduced.
>
> There is now a lot of time in isolate_freepages_block and still quite a
> lot of lock contention, although in a different place.

This on top please.

---8<---
From: Shaohua Li <s...@fusionio.com>
Subject: compaction: abort compaction loop if lock is contended or run too long

isolate_migratepages_range() might isolate no pages, for example, when
zone->lru_lock is contended and compaction is async. In this case, we
should abort compaction, otherwise compact_zone() will run a useless loop
and make zone->lru_lock even more contended.

V2: only abort the compaction if lock is contended or run too long.
    Rearranged the code by Andrea Arcangeli.
[minc...@kernel.org: Putback pages isolated for migration if aborting]
[a...@linux-foundation.org: Fixup one contended usage site]
Signed-off-by: Andrea Arcangeli <aarca...@redhat.com>
Signed-off-by: Shaohua Li <s...@fusionio.com>
Signed-off-by: Mel Gorman <mgor...@suse.de>
---
 mm/compaction.c | 17 ++++++++++++-----
 mm/internal.h   |  2 +-
 2 files changed, 13 insertions(+), 6 deletions(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index 7fcd3a5..a8de20d 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -70,8 +70,7 @@ static bool compact_checklock_irqsave(spinlock_t *lock, unsigned long *flags,

 		/* async aborts if taking too long or contended */
 		if (!cc->sync) {
-			if (cc->contended)
-				*cc->contended = true;
+			cc->contended = true;
 			return false;
 		}
@@ -634,7 +633,7 @@ static isolate_migrate_t isolate_migratepages(struct zone *zone,

 	/* Perform the isolation */
 	low_pfn = isolate_migratepages_range(zone, cc, low_pfn, end_pfn);
-	if (!low_pfn)
+	if (!low_pfn || cc->contended)
 		return ISOLATE_ABORT;

 	cc->migrate_pfn = low_pfn;
@@ -787,6 +786,8 @@ static int compact_zone(struct zone *zone, struct compact_control *cc)
 		switch (isolate_migratepages(zone, cc)) {
 		case ISOLATE_ABORT:
 			ret = COMPACT_PARTIAL;
+			putback_lru_pages(&cc->migratepages);
+			cc->nr_migratepages = 0;
 			goto out;
 		case ISOLATE_NONE:
 			continue;
@@ -831,6 +832,7 @@ static unsigned long compact_zone_order(struct zone *zone,
 				 int order, gfp_t gfp_mask,
 				 bool sync, bool *contended)
 {
+	unsigned long ret;
 	struct compact_control cc = {
 		.nr_freepages = 0,
 		.nr_migratepages = 0,
@@ -838,12 +840,17 @@ static unsigned long compact_zone_order(struct zone *zone,
 		.migratetype = allocflags_to_migratetype(gfp_mask),
 		.zone = zone,
 		.sync = sync,
-		.contended = contended,
 	};
 	INIT_LIST_HEAD(&cc.freepages);
 	INIT_LIST_HEAD(&cc.migratepages);

-	return compact_zone(zone, &cc);
+	ret = compact_zone(zone, &cc);
+
+	VM_BUG_ON(!list_empty(&cc.freepages));
+	VM_BUG_ON(!list_empty(&cc.migratepages));
+
+	*contended = cc.contended;
+	return ret;
 }

 int sysctl_extfrag_threshold = 500;

diff --git a/mm/internal.h b/mm/internal.h
index b8c91b3..4bd7c0e 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -130,7 +130,7 @@ struct compact_control {
 	int order;		/* order a direct compactor needs */
 	int migratetype;	/* MOVABLE, RECLAIMABLE etc */
 	struct zone *zone;
-	bool *contended;	/* True if a lock was contended */
+	bool contended;		/* True if a lock was contended */
 };

 unsigned long
Re: [Qemu-devel] Windows VM slow boot
[ adding linux-mm - previously at http://marc.info/?t=13451150943 ]

Hi Rik,

Since qemu-kvm 1.2.0 and Linux 3.6.0-rc5 came out, I thought that I would
retest with these.

The typical symptom now appears to be that the Windows VMs boot reasonably
fast, but then there is high CPU use and load for many minutes afterwards -
the high CPU use is both for the qemu-kvm processes themselves and also
for % sys.

I attach a perf report which seems to show that the high CPU use is in the
memory manager.

Cheers,

Richard.

# captured on: Wed Sep 12 10:25:43 2012
# os release : 3.6.0-rc5-elastic
# perf version : 3.5.2
# arch : x86_64
# nrcpus online : 16
# nrcpus avail : 16
# cpudesc : AMD Opteron(tm) Processor 6128
# cpuid : AuthenticAMD,16,9,1
# total memory : 131973280 kB
# cmdline : /home/root/bin/perf record -g -a
# event : name = cycles, type = 0, config = 0x0, config1 = 0x0, config2 = 0x0, excl_usr = 0, excl_kern = 0, id = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16 }
# HEADER_CPU_TOPOLOGY info available, use -I to display
# HEADER_NUMA_TOPOLOGY info available, use -I to display
#
# Samples: 870K of event 'cycles'
# Event count (approx.): 432968175910
#
# Overhead  Command   Shared Object      Symbol
# ........  .......   .............      ......
#
    89.14%  qemu-kvm  [kernel.kallsyms]  [k] _raw_spin_lock_irqsave
            |
            --- _raw_spin_lock_irqsave
               |
               |--95.47%-- isolate_migratepages_range
               |          compact_zone
               |          compact_zone_order
               |          try_to_compact_pages
               |          __alloc_pages_direct_compact
               |          __alloc_pages_nodemask
               |          alloc_pages_vma
               |          do_huge_pmd_anonymous_page
               |          handle_mm_fault
               |          __get_user_pages
               |          get_user_page_nowait
               |          hva_to_pfn.isra.17
               |          __gfn_to_pfn
               |          gfn_to_pfn_async
               |          try_async_pf
               |          tdp_page_fault
               |          kvm_mmu_page_fault
               |          pf_interception
               |          handle_exit
               |          kvm_arch_vcpu_ioctl_run
               |          kvm_vcpu_ioctl
               |          do_vfs_ioctl
               |          sys_ioctl
               |          system_call_fastpath
               |          ioctl
               |          |
               |          |--55.64%-- 0x1010002
               |          |
               |           --44.36%-- 0x1010006
               |
               |--4.53%-- compact_zone
               |          compact_zone_order
               |          try_to_compact_pages
               |          __alloc_pages_direct_compact
               |          __alloc_pages_nodemask
               |          alloc_pages_vma
               |          do_huge_pmd_anonymous_page
               |          handle_mm_fault
               |          __get_user_pages
               |          get_user_page_nowait
               |          hva_to_pfn.isra.17
               |          __gfn_to_pfn
               |          gfn_to_pfn_async
               |          try_async_pf
               |          tdp_page_fault
               |          kvm_mmu_page_fault
               |          pf_interception
               |          handle_exit
               |          kvm_arch_vcpu_ioctl_run
               |          kvm_vcpu_ioctl
               |          do_vfs_ioctl
               |          sys_ioctl
               |          system_call_fastpath
               |          ioctl
               |          |
               |          |--55.36%-- 0x1010002
               |          |
               |           --44.64%-- 0x1010006
                --0.00%-- [...]

     4.92%  qemu-kvm  [kernel.kallsyms]  [k] migrate_pages
            |
            --- migrate_pages
               |
               |--99.74%-- compact_zone
               |          compact_zone_order
               |          try_to_compact_pages
               |          __alloc_pages_direct_compact
Re: [Qemu-devel] Windows VM slow boot
On Wed, Sep 12, 2012 at 11:56:59AM +0100, Richard Davies wrote:
> [ adding linux-mm - previously at http://marc.info/?t=13451150943 ]
>
> Hi Rik,

I'm not Rik but hi anyway.

> Since qemu-kvm 1.2.0 and Linux 3.6.0-rc5 came out, I thought that I would
> retest with these.

Ok. 3.6.0-rc5 contains [c67fe375: mm: compaction: Abort async compaction if
locks are contended or taking too long] that should have helped mitigate
some of the lock contention problem but not all of it as we'll see later.

> The typical symptom now appears to be that the Windows VMs boot
> reasonably fast,

I see that this is an old-ish bug but I did not read the full history. Is
it now booting faster than 3.5.0 was? I'm asking because I'm interested to
see if commit c67fe375 helped your particular case.

> but then there is high CPU use and load for many minutes afterwards - the
> high CPU use is both for the qemu-kvm processes themselves and also
> for % sys.

Ok, I cannot comment on the userspace portion of things but the kernel
portion still indicates that there is a high percentage of time on what
appears to be lock contention.

> I attach a perf report which seems to show that the high CPU use is in
> the memory manager.

A follow-on from commit c67fe375 was the following patch (author cc'd)
which addresses lock contention in isolate_migratepages_range where your
perf report indicates that we're spending 95% of the time. Would you be
willing to test it please?

---8<---
From: Shaohua Li <s...@kernel.org>
Subject: mm: compaction: check lock contention first before taking lock

isolate_migratepages_range will take zone->lru_lock first and check if the
lock is contended; if yes, it will release the lock. This isn't efficient.
If the lock is truly contended, a lock/unlock pair will increase the lock
contention. We'd better check if the lock is contended first.
compact_trylock_irqsave perfectly meets the requirement.
Signed-off-by: Shaohua Li <s...@fusionio.com>
Acked-by: Mel Gorman <mgor...@suse.de>
Acked-by: Minchan Kim <minc...@kernel.org>
Signed-off-by: Andrew Morton <a...@linux-foundation.org>
---
 mm/compaction.c |    5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff -puN mm/compaction.c~mm-compaction-check-lock-contention-first-before-taking-lock mm/compaction.c
--- a/mm/compaction.c~mm-compaction-check-lock-contention-first-before-taking-lock
+++ a/mm/compaction.c
@@ -349,8 +349,9 @@ isolate_migratepages_range(struct zone *
 	/* Time to isolate some pages for migration */
 	cond_resched();
-	spin_lock_irqsave(&zone->lru_lock, flags);
-	locked = true;
+	locked = compact_trylock_irqsave(&zone->lru_lock, &flags, cc);
+	if (!locked)
+		return 0;
 	for (; low_pfn < end_pfn; low_pfn++) {
 		struct page *page;
Re: [Qemu-devel] Windows VM slow boot
Hi Mel - thanks for replying to my underhand bcc!

Mel Gorman wrote:
> I see that this is an old-ish bug but I did not read the full history.
> Is it now booting faster than 3.5.0 was? I'm asking because I'm
> interested to see if commit c67fe375 helped your particular case.

Yes, I think 3.6.0-rc5 is already better than 3.5.x but can still be
improved, as discussed.

> A follow-on from commit c67fe375 was the following patch (author cc'd)
> which addresses lock contention in isolate_migratepages_range where your
> perf report indicates that we're spending 95% of the time. Would you be
> willing to test it please?
>
> ---8<---
> From: Shaohua Li <s...@kernel.org>
> Subject: mm: compaction: check lock contention first before taking lock
>
> [ PATCH SNIPPED - quoted in full earlier in the thread ]

I have applied and tested again - perf results below.
isolate_migratepages_range is indeed much reduced.
There is now a lot of time in isolate_freepages_block and still quite a lot
of lock contention, although in a different place.

# captured on: Wed Sep 12 16:00:52 2012
# os release : 3.6.0-rc5-elastic+
# perf version : 3.5.2
# arch : x86_64
# nrcpus online : 16
# nrcpus avail : 16
# cpudesc : AMD Opteron(tm) Processor 6128
# cpuid : AuthenticAMD,16,9,1
# total memory : 131973280 kB
# cmdline : /home/root/bin/perf record -g -a
# event : name = cycles, type = 0, config = 0x0, config1 = 0x0, config2 = 0x0, excl_usr = 0, excl_kern = 0, id = { 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80 }
# HEADER_CPU_TOPOLOGY info available, use -I to display
# HEADER_NUMA_TOPOLOGY info available, use -I to display
#
# Samples: 1M of event 'cycles'
# Event count (approx.): 560365005583
#
# Overhead  Command   Shared Object      Symbol
# ........  .......   .............      ......
#
    43.95%  qemu-kvm  [kernel.kallsyms]  [k] isolate_freepages_block
            |
            --- isolate_freepages_block
               |
               |--99.99%-- compaction_alloc
               |          migrate_pages
               |          compact_zone
               |          compact_zone_order
               |          try_to_compact_pages
               |          __alloc_pages_direct_compact
               |          __alloc_pages_nodemask
               |          alloc_pages_vma
               |          do_huge_pmd_anonymous_page
               |          handle_mm_fault
               |          __get_user_pages
               |          get_user_page_nowait
               |          hva_to_pfn.isra.17
               |          __gfn_to_pfn
               |          gfn_to_pfn_async
               |          try_async_pf
               |          tdp_page_fault
               |          kvm_mmu_page_fault
               |          pf_interception
               |          handle_exit
               |          kvm_arch_vcpu_ioctl_run
               |          kvm_vcpu_ioctl
               |          do_vfs_ioctl
               |          sys_ioctl
               |          system_call_fastpath
               |          ioctl
               |          |
               |          |--95.17%-- 0x1010006
               |          |
               |           --4.83%-- 0x1010002
                --0.01%-- [...]

    15.98%  qemu-kvm  [kernel.kallsyms]  [k] _raw_spin_lock_irqsave
            |
            ---