[Bug 1788098] Comment bridged from LTC Bugzilla
--- Comment From p...@au1.ibm.com 2019-09-27 01:51 EDT---
(In reply to comment #73)
> Leonardo, can you elaborate on the 'other possible issues'? We're hesitant
> to pull 18 patches into a stable kernel under the assumption that they
> *might* fix some *potential* issues, without clear evidence. If you can test
> the single-patch kernel and report back that there are still issues then
> that's a much stronger case for the other patches.
>
> Commit 'KVM: PPC: Book3S HV: Avoid crash from THP collapse during radix page
> fault' that you're asking for requires all these additional backports to
> apply cleanly. Which makes me wonder if we're not actually introducing a
> problem with these backports just to fix it again later. Not saying that's
> the case, just wondering...
>
> Also, the following seem to be totally unrelated and unnecessary:
> - KVM: PPC: Remove unused kvm_unmap_hva callback
> - powerpc/mm/radix: Remove unused code
>
> While looking through the patches I also noticed that the following is the
> second patch of a series of 11 but it's the only one from the series that
> you're backporting.
> - powerpc/kvm: Switch kvm pmd allocator to custom allocator
> Its commit message mentions subsequent patches of that series so I'm
> wondering why we need/want only this single patch??
>
> Remember that we have to support this kernel for years and years to come so
> we only want to backport the absolute necessary.
>
> Lastly and FYI, the following is the minimal subset of your patches that all
> cherry-pick cleanly:
> - KVM: PPC: Book3S HV: Avoid crash from THP collapse during radix page fault
> - KVM: PPC: Book3S HV: Don't use compound_order to determine host mapping
> size
> - KVM: PPC: Book3S HV: Use correct pagesize in kvm_unmap_radix()
> - KVM: PPC: Book3S HV: radix: Refine IO region partition scope attributes
> - KVM: PPC: Book3S HV: Use __gfn_to_pfn_memslot() in page fault handler
> - KVM: PPC: Book3S HV: Handle 1GB pages in radix page fault handler
> - KVM: PPC: Book3S HV: Streamline setting of reference and change bits
> - KVM: PPC: Book3S HV: Radix page fault handler optimizations
>
> Please provide some context why we need all the above (and potentially more).

OK, so these are the ones *not* included in the above list (oldest to newest, with upstream commit IDs):

39c983ea0f96 KVM: PPC: Remove unused kvm_unmap_hva callback
This one is dead code removal, it can be dropped.

e2560b108fb1 KVM: PPC: Book3S HV: Make radix use correct tlbie sequence in kvmppc_radix_tlbie_page
This one adds barriers which are required according to the architecture specification. It is not strictly related to fixing this bug, but if not included here, another bug should be raised to include it. It is quite safe since it is just adding barrier instructions. Without it there is a possibility of occasional mis-translation of addresses (though perhaps a very small possibility). If another bug is raised for this patch, include df158189dbcc below as well in the same bug.

7e3d9a1d0f2c KVM: PPC: Book3S HV: Make radix clear pte when unmapping
This fixes a real bug, though it is not strictly related to the bug in this bugzilla. If it is not included here then another bug should be raised to include it. It is a small, simple and safe change. Without it there is a possibility of guests getting stuck doing continual hypervisor page faults.

df158189dbcc KVM: PPC: Book 3S HV: Do ptesync in radix guest exit path
This one, like e2560b108fb1 above, adds barriers which are required according to the architecture specification. It is not strictly related to fixing this bug, but if not included here, another bug should be raised to include it. It is quite safe since it is just adding barrier instructions.

21828c99ee91 powerpc/kvm: Switch kvm pmd allocator to custom allocator
This one is not needed and can be dropped.

99491e2d0e50 powerpc/mm/radix: Remove unused code
This is dead code removal and can be dropped.

0078778a86b1 powerpc/mm/radix: implement LPID based TLB flushes to be used by KVM
This is not strictly needed and can be dropped if d91cb39ffa7b and 9a4506e11b97 are being dropped.

a5fad1e95952 KVM: PPC: Book3S HV: Use a helper to unmap ptes in the radix fault path
This is not strictly needed (code refactoring) and can be dropped.

a5704e83aa3d KVM: PPC: Book3S HV: Recursively unmap all page table entries when unmapping
This one fixes a memory leak, so is not strictly related to this bug. The memory leak will probably not be apparent unless users are using 1GB huge pages to back guests.

d91cb39ffa7b KVM: PPC: Book3S HV: Make radix use the Linux translation flush functions for partition scope
This is code refactoring and can be dropped.

9a4506e11b97 KVM: PPC: Book3S HV: Make radix handle process scoped LPID flush in C, with relocation on
This is code refactoring and can be dropped.

878cf2bb2d8d KVM: PPC: Book3S HV: radix: Do not clear partition PTE when RC or write bits do not match
This one is a
[Bug 1788098] Comment bridged from LTC Bugzilla
--- Comment From mranw...@us.ibm.com 2019-09-26 15:42 EDT--- Re-opening on our side to test in 19.10. Everything should be there for that, but it would be good to confirm this in time to get any needed fixes to 20.04, too. Just being clear at this point we don't need to target bionic - but validate on 19.10. -- You received this bug notification because you are a member of Ubuntu Bugs, which is subscribed to Ubuntu. https://bugs.launchpad.net/bugs/1788098 Title: Avoid migration issues with aligned 2MB THB To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu-power-systems/+bug/1788098/+subscriptions -- ubuntu-bugs mailing list ubuntu-bugs@lists.ubuntu.com https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
[Bug 1788098] Comment bridged from LTC Bugzilla
--- Comment From mranw...@us.ibm.com 2019-05-13 17:04 EDT--- Adding Paul Mackerras - can you help with the context for the patches - beyond the potential performance impact? We were picking up this series because it fixes the migration problem, which appeared after adding a patch for bug 169712 for performance. Thanks!
[Bug 1788098] Comment bridged from LTC Bugzilla
--- Comment From leona...@ibm.com 2019-05-10 19:54 EDT--- Hello Juerg, As this complete list was suggested by Paul, I think he may be the best person to show the context of the patch series.
[Bug 1788098] Comment bridged from LTC Bugzilla
--- Comment From leona...@ibm.com 2019-04-29 18:50 EDT---
(In reply to comment #70)
> Leonardo, since you seem to have a reliable reproducer now, could you give
> this test kernel [1] a try? It just contains commit c066fafc595e ("KVM: PPC:
> Book3S HV: Use correct pagesize in kvm_unmap_radix()") and is basically what
> Joe gave you (comment #5) but at that time you weren't able to reproduce the
> issue.
>
> [1] https://kernel.ubuntu.com/~juergh/lp1788098/

Hello Juerg,

As you pointed out, this kernel has only one of the 19 patches of the patch series. IMHO it wouldn't be very productive to test this kernel as is. It may well work just fine, but it doesn't have the complete solution to these problems. The kernel with the whole patch series has already been tested, and it solves many other possible issues.

But if you think it's really important to test this one, I will try to schedule it for testing ASAP.
[Bug 1788098] Comment bridged from LTC Bugzilla
--- Comment From leona...@ibm.com 2019-04-10 12:08 EDT---
(In reply to comment #68)
> In comment #22 above, it states that "In a meeting with lagarcia, I was
> informed this patch is very important, and that it is already on kernel
> 4.18-15 onwards."
>
> So, I assume that the required patchset(s) are already applied to the 18.04
> HWE kernel, and this bug requests a backport to the bionic 4.15 kernel.
>
> Next step is for the Canonical kernel team to analyse this backport request,
> dropping the commit fb1522e099f0 ("KVM: update to new mmu_notifier semantic
> v2", 2017-08-31), to assess whether it can be SRU'ed into the bionic 4.15
> kernel.

I may be wrong, but the patch to be dropped is "KVM: PPC: Remove unused kvm_unmap_hva callback" (7fe24f427a09). Its commit message says it removes code that has been dead since commit fb1522e099f0.
[Bug 1788098] Comment bridged from LTC Bugzilla
--- Comment From p...@au1.ibm.com 2019-04-10 04:10 EDT---
(In reply to comment #66)
> Stefan NACK'ed the series. For some unknown reason that email did not make it
> into the archive, so here is its content:
>
> > Since commit fb1522e099f0 ("KVM: update to new mmu_notifier semantic
> > v2", 2017-08-31), the MMU notifier code in KVM no longer calls the
> > kvm_unmap_hva callback. This removes the PPC implementations of
> > kvm_unmap_hva().
>
> This is not really the way SRUs should be done. We cannot remove support for
> interfaces after release. Also the amount of change as a requisite should be
> kept as minimal as possible. This just feels like too many changes without a
> strong argument on why this must be done that way.
>
> -Stefan

Well it was just removing dead code, but whatever. The series should be fine without that patch.
[Bug 1788098] Comment bridged from LTC Bugzilla
--- Comment From leona...@ibm.com 2019-04-08 10:37 EDT---
(In reply to comment #64)
> Hi Leonardo,
> unfortunately there was an issue with the SRU request and Juerg NACK-ed it,
> please have a look here:
> https://lists.ubuntu.com/archives/kernel-team/2019-March/099128.html
> Please re-submit the SRU request with the requested corrections.

The email you posted is from March 10 and is outdated. The required changes were made, and the patchset was acked on March 13, as noted in the previous comment.

Please see https://lists.ubuntu.com/archives/kernel-team/2019-March/099221.html
[Bug 1788098] Comment bridged from LTC Bugzilla
--- Comment From leona...@ibm.com 2019-03-26 13:12 EDT--- Updating: The patchset was acked by Juerg Haefliger on Mar 13.
[Bug 1788098] Comment bridged from LTC Bugzilla
--- Comment From leona...@ibm.com 2019-03-14 15:47 EDT---
Patchset SRU

[Impact]
* VMs have a high chance of hitting guest migration issues if more than one guest migration happens at a time while using THP on ppc64le.
* Migrating VMs in parallel will cause at least one guest to crash about half the time. Since VM migration is an upgrade/uptime strategy, this has a fairly large customer impact.
* The uploaded patches correct the behavior of THP on guests. They are available from v4.18.x onwards.

[Test Case]
* One can reproduce the bug by running two guest migrations at the same time, following the instructions in comment 12: https://bugs.launchpad.net/ubuntu/+source/qemu/+bug/1788098/comments/12

[Regression Potential]
* These patches have been in linux-stable since v4.18.15 (and in the HWE kernel), so the chance of regression is low.

8afc7da95a7e [Bionic] KVM: PPC: Book3S HV: Avoid crash from THP collapse during radix page fault
82f7758a9c99 [Bionic] KVM: PPC: Book3S HV: Don't use compound_order to determine host mapping size
b0f7664dc993 [Bionic] KVM: PPC: Book3S HV: Use correct pagesize in kvm_unmap_radix()
1991612ab005 [Bionic] KVM: PPC: Book3S HV: radix: Do not clear partition PTE when RC or write bits do not match
04fea11aa5fe [Bionic] KVM: PPC: Book3S HV: radix: Refine IO region partition scope attributes
9037e89d8093 [Bionic] KVM: PPC: Book3S HV: Make radix handle process scoped LPID flush in C, with relocation on
ed0a86a433c7 [Bionic] KVM: PPC: Book3S HV: Make radix use the Linux translation flush functions for partition scope
0effe5dc3cf4 [Bionic] KVM: PPC: Book3S HV: Recursively unmap all page table entries when unmapping
42cbaef5361b [Bionic] KVM: PPC: Book3S HV: Use a helper to unmap ptes in the radix fault path
414207e08540 [Bionic] powerpc/mm/radix: implement LPID based TLB flushes to be used by KVM
eb2a70df7099 [Bionic] powerpc/mm/radix: Remove unused code
ad052e60a417 [Bionic] powerpc/kvm: Switch kvm pmd allocator to custom allocator
bb2c03e387f4 [Bionic] KVM: PPC: Book 3S HV: Do ptesync in radix guest exit path
699642e0a4f8 [Bionic] KVM: PPC: Book3S HV: Make radix clear pte when unmapping
297755f60b17 [Bionic] KVM: PPC: Book3S HV: Make radix use correct tlbie sequence in kvmppc_radix_tlbie_page
d5f5570b7df4 [Bionic] KVM: PPC: Book3S HV: Use __gfn_to_pfn_memslot() in page fault handler
b0adb3223100 [Bionic] KVM: PPC: Book3S HV: Handle 1GB pages in radix page fault handler
5be468e7408b [Bionic] KVM: PPC: Book3S HV: Streamline setting of reference and change bits
860816ea1680 [Bionic] KVM: PPC: Book3S HV: Radix page fault handler optimizations
7fe24f427a09 [Bionic] KVM: PPC: Remove unused kvm_unmap_hva callback
[Bug 1788098] Comment bridged from LTC Bugzilla
--- Comment From leona...@ibm.com 2019-03-12 14:52 EDT--- (In reply to comment #60) The patches were sent to Ubuntu kernel-team mailing list.
[Bug 1788098] Comment bridged from LTC Bugzilla
--- Comment From leona...@ibm.com 2019-02-26 12:58 EDT--- Here are the patches: https://gitlab.com/LeoBras/bionic/compare/master...lp1788098 Also, I attached a tgz with the patches.
[Bug 1788098] Comment bridged from LTC Bugzilla
--- Comment From leona...@ibm.com 2019-02-26 12:36 EDT---
(In reply to comment #41)
> ...or perhaps I've misunderstood. Are the patches listed in comment #23 the
> complete set required to resolve the issue (with no complex backporting
> required)?

Yes, the patches listed by Paul are the only ones required to fix the issue. As noted by Paul, there is only one patch that causes some conflict. I have solved this conflict and I will soon attach the full patch series.
[Bug 1788098] Comment bridged from LTC Bugzilla
--- Comment From leona...@ibm.com 2019-02-25 18:35 EDT---
I cherry-picked all patches on top of ubuntu-bionic (Ubuntu-4.15.0-45.48).

The next step was to find a way to reproduce the bug. After several tests I noticed that Michael Ranweiler's earlier suggestion was valid, but its reproduction rate is about 50%. Because I had previously tested only a few times, I could not get it to reproduce.

How it fails:
During the second part of 'memtest', on a 'migrated to' guest, one of the migrations (which occur in parallel) exits with a "Segmentation Fault" and does not complete the normal flow of the test (it never reaches the puts part).

After applying the kernel patches, it seems to work fine every time (I have tested 10+ times by now).

The kernel debs generated by the build process can be downloaded from the link below:
ftp://testcase.software.ibm.com/fromibm/linux/patched_kernel.tar.gz
- Please use user=anonymous, passwd=anonymous if asked
- Make sure to download it soon, as the link will be available for 3 business days.

Building info:
command: fakeroot debian/rules binary-generic binary-perarch
git repo (before patches): git://kernel.ubuntu.com/ubuntu/ubuntu-bionic.git (tag: Ubuntu-4.15.0-45.48)
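As a rough sanity check on the figures in this comment: if an unpatched kernel fails about half the time, the chance of seeing 10 clean runs purely by luck is small. The sketch below is only illustrative; it assumes the runs are independent and that the per-run failure rate is exactly 50%, neither of which was measured in this bug, and the function name is hypothetical.

```python
def chance_of_false_pass(failure_rate: float, clean_runs: int) -> float:
    """Probability that `clean_runs` consecutive runs all pass on a kernel
    that still fails with probability `failure_rate` on each run.

    Assumes the runs are independent (an assumption made for illustration).
    """
    if not 0.0 <= failure_rate <= 1.0:
        raise ValueError("failure_rate must be between 0 and 1")
    if clean_runs < 0:
        raise ValueError("clean_runs must not be negative")
    return (1.0 - failure_rate) ** clean_runs


# About 50% reproduction rate and 10 clean runs, as reported in the comment.
probability = chance_of_false_pass(0.5, 10)
print(f"Chance of 10 clean runs by luck: {probability:.4%}")
```

Under those assumptions the result is 0.5 to the power of 10, or roughly 0.1%, so 10+ clean runs are reasonable evidence that the patched kernel behaves differently.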
[Bug 1788098] Comment bridged from LTC Bugzilla
--- Comment From p...@au1.ibm.com 2019-02-19 20:52 EDT---
(In reply to comment #34)
> In a meeting with lagarcia, I was informed this patch is very important, and
> that it is already on kernel 4.18-15 onwards.
>
> In fact, including this one, there are two important patches on this subject:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc.git/commit/?h=kvm-ppc-next=c066fafc595eef5ae3c83ae3a8305956b8c3ef15
> https://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc.git/commit/?h=kvm-ppc-next=6579804c431712d56956a63b1a01509441cc6800

To get those you will need to cherry-pick the following patches from upstream:

39c983ea0f96 KVM: PPC: Remove unused kvm_unmap_hva callback
c4c8a7643e74 KVM: PPC: Book3S HV: Radix page fault handler optimizations
f7caf712d885 KVM: PPC: Book3S HV: Streamline setting of reference and change bits
58c5c276b4c2 KVM: PPC: Book3S HV: Handle 1GB pages in radix page fault handler
31c8b0d0694a KVM: PPC: Book3S HV: Use __gfn_to_pfn_memslot() in page fault handler
e2560b108fb1 KVM: PPC: Book3S HV: Make radix use correct tlbie sequence in kvmppc_radix_tlbie_page
7e3d9a1d0f2c KVM: PPC: Book3S HV: Make radix clear pte when unmapping
df158189dbcc KVM: PPC: Book 3S HV: Do ptesync in radix guest exit path
21828c99ee91 powerpc/kvm: Switch kvm pmd allocator to custom allocator
99491e2d0e50 powerpc/mm/radix: Remove unused code
0078778a86b1 powerpc/mm/radix: implement LPID based TLB flushes to be used by KVM
(note that this one will generate some conflicts)
a5fad1e95952 KVM: PPC: Book3S HV: Use a helper to unmap ptes in the radix fault path
a5704e83aa3d KVM: PPC: Book3S HV: Recursively unmap all page table entries when unmapping
d91cb39ffa7b KVM: PPC: Book3S HV: Make radix use the Linux translation flush functions for partition scope
9a4506e11b97 KVM: PPC: Book3S HV: Make radix handle process scoped LPID flush in C, with relocation on
bc64dd0e1c4e KVM: PPC: Book3S HV: radix: Refine IO region partition scope attributes
878cf2bb2d8d KVM: PPC: Book3S HV: radix: Do not clear partition PTE when RC or write bits do not match
c066fafc595e KVM: PPC: Book3S HV: Use correct pagesize in kvm_unmap_radix()
71d29f43b633 KVM: PPC: Book3S HV: Don't use compound_order to determine host mapping size
6579804c4317 KVM: PPC: Book3S HV: Avoid crash from THP collapse during radix page fault
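For anyone cross-checking this list against the later review (the comment #73 quoted in the reply dated 2019-09-27, which names eight patches that cherry-pick cleanly on their own), the leftover patches can be derived mechanically. The Python sketch below uses the commit IDs from the list above. Mapping the eight titles from comment #73 to these IDs was done by matching titles, and the helper name is hypothetical.

```python
# Upstream commits in the order given in the list above.
FULL_SERIES = [
    ("39c983ea0f96", "KVM: PPC: Remove unused kvm_unmap_hva callback"),
    ("c4c8a7643e74", "KVM: PPC: Book3S HV: Radix page fault handler optimizations"),
    ("f7caf712d885", "KVM: PPC: Book3S HV: Streamline setting of reference and change bits"),
    ("58c5c276b4c2", "KVM: PPC: Book3S HV: Handle 1GB pages in radix page fault handler"),
    ("31c8b0d0694a", "KVM: PPC: Book3S HV: Use __gfn_to_pfn_memslot() in page fault handler"),
    ("e2560b108fb1", "KVM: PPC: Book3S HV: Make radix use correct tlbie sequence in kvmppc_radix_tlbie_page"),
    ("7e3d9a1d0f2c", "KVM: PPC: Book3S HV: Make radix clear pte when unmapping"),
    ("df158189dbcc", "KVM: PPC: Book 3S HV: Do ptesync in radix guest exit path"),
    ("21828c99ee91", "powerpc/kvm: Switch kvm pmd allocator to custom allocator"),
    ("99491e2d0e50", "powerpc/mm/radix: Remove unused code"),
    ("0078778a86b1", "powerpc/mm/radix: implement LPID based TLB flushes to be used by KVM"),
    ("a5fad1e95952", "KVM: PPC: Book3S HV: Use a helper to unmap ptes in the radix fault path"),
    ("a5704e83aa3d", "KVM: PPC: Book3S HV: Recursively unmap all page table entries when unmapping"),
    ("d91cb39ffa7b", "KVM: PPC: Book3S HV: Make radix use the Linux translation flush functions for partition scope"),
    ("9a4506e11b97", "KVM: PPC: Book3S HV: Make radix handle process scoped LPID flush in C, with relocation on"),
    ("bc64dd0e1c4e", "KVM: PPC: Book3S HV: radix: Refine IO region partition scope attributes"),
    ("878cf2bb2d8d", "KVM: PPC: Book3S HV: radix: Do not clear partition PTE when RC or write bits do not match"),
    ("c066fafc595e", "KVM: PPC: Book3S HV: Use correct pagesize in kvm_unmap_radix()"),
    ("71d29f43b633", "KVM: PPC: Book3S HV: Don't use compound_order to determine host mapping size"),
    ("6579804c4317", "KVM: PPC: Book3S HV: Avoid crash from THP collapse during radix page fault"),
]

# The eight patches named in comment #73 as cherry-picking cleanly,
# matched to the commit IDs above by title.
MINIMAL_SUBSET = {
    "6579804c4317", "71d29f43b633", "c066fafc595e", "bc64dd0e1c4e",
    "31c8b0d0694a", "58c5c276b4c2", "f7caf712d885", "c4c8a7643e74",
}


def remaining_patches(series, subset):
    """Return the (commit, title) pairs of `series` not in `subset`, order kept."""
    return [(commit, title) for commit, title in series if commit not in subset]


for commit, title in remaining_patches(FULL_SERIES, MINIMAL_SUBSET):
    print(commit, title)
```

This leaves twelve patches, which are the ones discussed individually in the 2019-09-27 reply.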
[Bug 1788098] Comment bridged from LTC Bugzilla
--- Comment From leona...@ibm.com 2019-02-08 14:05 EDT---
In a meeting with lagarcia, I was informed this patch is very important, and that it is already in kernel 4.18-15 onwards.

In fact, including this one, there are two important patches on this subject:

https://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc.git/commit/?h=kvm-ppc-next=c066fafc595eef5ae3c83ae3a8305956b8c3ef15
https://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc.git/commit/?h=kvm-ppc-next=6579804c431712d56956a63b1a01509441cc6800

As I said before, for 18.10 onwards (kernel >= 4.18) the patches are available from the upstream kernel source, but for Ubuntu 18.04 they may not apply so easily. So I will work on backporting them to v4.15.
[Bug 1788098] Comment bridged from LTC Bugzilla
--- Comment From leona...@ibm.com 2019-01-24 09:34 EDT--- By the test results, the problem doesn't seem to reproduce. Are there any other suggestions to reproduce it?
[Bug 1788098] Comment bridged from LTC Bugzilla
--- Comment From leona...@ibm.com 2019-01-24 09:26 EDT--- By suggestion of Michael Ranweiler, I did some concurrent migration tests. In fact, I just repeated the procedure used before, but did it twice at roughly the same time (in parallel). The results are attached. Migration 1: from1.txt to1.txt Migration 2: from2.txt to2.txt
[Bug 1788098] Comment bridged from LTC Bugzilla
--- Comment From leona...@ibm.com 2019-01-04 14:29 EDT---
Test: Verify all memory after migration

### Host: ###
# uname -a
Linux host 4.15.0-20-generic #21-Ubuntu SMP Tue Apr 24 06:14:44 UTC 2018 ppc64le ppc64le ppc64le GNU/Linux
# cat /sys/kernel/mm/transparent_hugepage/enabled
[always] madvise never
# cat /proc/cpuinfo
[...]
processor : 159
cpu : POWER9, altivec supported
clock : 2300.00MHz
revision: 2.2 (pvr 004e 1202)
timebase: 51200
platform: PowerNV
model : 8375-42A
machine : PowerNV 8375-42A
firmware: OPAL
MMU : Radix

As before, I built QEMU 3.1.0 and made sure the patch that enables THP was included:
# ../configure --target-list=ppc-linux-user,ppc64-linux-user,ppc64le-linux-user,ppc-softmmu,ppc64-softmmu --enable-debug-info --enable-trace-backends=log --python=/usr/bin/python3 && make -j $(nproc)
# ./ppc-softmmu/qemu-system-ppc -version
QEMU emulator version 3.1.0 (v3.1.0-dirty)

### Guest: ###

### CLI 1: Migrating from:
MALLOC_PERTURB_=1 /home/leonardo/qemu/build/ppc64-softmmu/qemu-system-ppc64 \
-nographic \
-serial mon:stdio \
-name 'avocado-vt-vm1' \
-machine pseries \
-nodefaults \
-vga std \
-device pci-bridge,id=pci_bridge,bus=pci.0,addr=0x3,chassis_nr=1 \
-device virtio-serial-pci,id=virtio_serial_pci0,bus=pci.0,addr=0x4 \
-object rng-random,filename=/dev/random,id=passthrough-RHq4nIpF \
-device virtio-rng-pci,id=virtio-rng-pci-aXCni2OX,rng=passthrough-RHq4nIpF,bus=pci.0,addr=0x5 \
-device nec-usb-xhci,id=usb1,bus=pci.0,addr=0x6 \
-device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pci.0,addr=0x7 \
-drive id=drive_image1,if=none,snapshot=off,aio=native,cache=none,format=qcow2,file=/home/leonardo/images/ubuntu-18.04-ppc64le.qcow2 \
-device scsi-hd,id=image1,drive=drive_image1 \
-m 8192 \
-smp 4,maxcpus=4,cores=2,threads=1,sockets=2 \
-device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 \
-vnc :0 \
-rtc base=utc,clock=host \
-boot order=cdn,once=c,menu=off,strict=off \
-enable-kvm \
-watchdog i6300esb \
-watchdog-action reset \
-device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x9 \
-initrd /boot/initrd.img-4.15.0-20-generic \
-kernel /boot/vmlinux-4.15.0-20-generic \
-append "root=UUID=b4ef9412-06d6-4947-9969-f15c7cc2c986 ro quiet splash"

### CLI 2: Migrating To
Copy of CLI 1, changing:
- -name 'avocado-vt-vm1' \
+ -name 'avocado-vt-vm2' \
+ -S
- -vnc :0 \
+ -vnc :1 \
+ -incoming tcp:0:5801 \

### Inside Guest:
# uname -a
Linux localhost 4.15.0-20-generic #21-Ubuntu SMP Tue Apr 24 06:14:44 UTC 2018 ppc64le ppc64le ppc64le GNU/Linux
# cat /sys/kernel/mm/transparent_hugepage/enabled
[always] madvise never
# cat /proc/cpuinfo
processor : 3
cpu : POWER9 (architected), altivec supported
clock : 2900.00MHz
revision: 2.2 (pvr 004e 1202)
timebase: 51200
platform: pSeries
model : IBM pSeries (emulated by qemu)
machine : CHRP IBM pSeries (emulated by qemu)
MMU : Radix

### Test Software: ###
I created a simple C file to:
- allocate 2MB blocks,
- write urandom to them,
- md5sum all the blocks together,
- stop, allowing migration,
- re-md5sum everything,
- free the blocks.

The attached source file is copied to the guest, then compiled:
# gcc -o memtest memtest.c -lcrypto

### Procedure ###
Use the CLI commands to bring up the "Migrate from" and "Migrate to" guests.

On "Migrate from":
root@localhost:~# ./memtest
Block 0
Block 128
[...]
Block 3968
Allocated 4075 blocks of 2097152 size.
Md5 = 209a63b9c1f9acd13fa32236229daa9b
Press enter key to check memory integrity
[1]+ Stopped ./memtest
root@localhost:~# free -h
       total  used  free  shared  buff/cache  available
Mem:   8.0G   7.7G  246M  64K     21M         37M
Swap:  758M   758M  0B

- Enter Qemu Monitor:
QEMU 3.1.0 monitor - type 'help' for more information
(qemu) migrate -d tcp:0:5801
(qemu) info status
VM status: paused (postmigrate)
(qemu) info migrate
globals:
store-global-state: on
only-migratable: off
send-configuration: on
send-section-footer: on
decompress-error-check: on
capabilities: xbzrle: off rdma-pin-all: off auto-converge: off zero-blocks: off compress: off events: off postcopy-ram: off x-colo: off release-ram: off block: off return-path: off pause-before-switchover: off x-multifd: off dirty-bitmaps: off postcopy-blocktime: off late-block-activate: off
Migration status: completed
total time: 248950 milliseconds
downtime: 112 milliseconds
setup: 18 milliseconds
transferred ram: 9847781 kbytes
throughput: 269.52 mbps
remaining ram: 0 kbytes
total ram: 8405056 kbytes
duplicate: 143398 pages
skipped: 0 pages
normal: 2456826 pages
normal
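The attached memtest.c is not reproduced in this thread. As a rough illustration of the steps listed under "Test Software" (allocate 2MB blocks, fill them from urandom, checksum, pause for the migration, checksum again), here is a minimal Python sketch. It is not the original program: the function names are hypothetical, and Python's allocator gives no guarantee that the blocks end up backed by transparent huge pages, so it mirrors only the logic of the check.

```python
import hashlib
import os
from typing import Callable, List

BLOCK_SIZE = 2 * 1024 * 1024  # 2097152 bytes, the block size shown in the log


def allocate_blocks(count: int, block_size: int = BLOCK_SIZE) -> List[bytearray]:
    """Allocate `count` blocks and fill each one with random bytes."""
    return [bytearray(os.urandom(block_size)) for _ in range(count)]


def digest(blocks: List[bytearray]) -> str:
    """MD5 over all blocks in order, like md5sum over their concatenation."""
    md5 = hashlib.md5()
    for block in blocks:
        md5.update(block)
    return md5.hexdigest()


def run_check(count: int, pause: Callable[[], object],
              block_size: int = BLOCK_SIZE) -> bool:
    """Checksum the blocks, call `pause` (migrate the guest meanwhile),
    then checksum again. Returns True when the memory is unchanged."""
    blocks = allocate_blocks(count, block_size)
    before = digest(blocks)
    print(f"Allocated {count} blocks of {block_size} size.")
    print(f"Md5 = {before}")
    pause()  # the process waits here while the migration runs
    after = digest(blocks)
    return after == before
```

Inside a guest it could be driven with something like `run_check(4075, lambda: input("Press enter key to check memory integrity"))`, where 4075 is the block count from the log above.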
[Bug 1788098] Comment bridged from LTC Bugzilla
--- Comment From leona...@ibm.com 2019-01-04 06:12 EDT---
I have tried the following test in order to reproduce the bug:

##
root@localhost:~# uname -a
Linux localhost 4.15.0-20-generic #21-Ubuntu SMP Tue Apr 24 06:14:44 UTC 2018 ppc64le ppc64le ppc64le GNU/Linux
root@localhost:~# cat /sys/kernel/mm/transparent_hugepage/enabled
[always] madvise never
##

dd if=/dev/urandom of=/dev/shm/img bs=2M count=2000
md5sum /dev/shm/img > test.md5

After the migration, I did:
md5sum -c test.md5

And the result was OK (memory not corrupted).

I also modified the above test to allocate chunks of 2M, this way:
for i in {0001..2000} ; do dd if=/dev/urandom of=/dev/shm/img_${i} bs=2M count=1 ; done
md5sum /dev/shm/* > test.md5

After the migration, I did:
md5sum -c test.md5

And the result was OK for every file (memory not corrupted).

Conclusion:
- I found no difference between the patched and unpatched kernels during these tests.
- The memory seems fine after the migration: every block has the same checksum as before (tested with md5sum).

Is there any other suggestion about how to reproduce the bug?

Thanks!
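The dd/md5sum procedure above can also be expressed as a small script, which makes it easier to repeat across several migrations. The following Python sketch is an illustration only, not what was run for this bug: the function names are hypothetical, and the directory is a parameter (the commands above used /dev/shm).

```python
import hashlib
import os
from pathlib import Path
from typing import Dict, List


def write_random_files(directory: Path, count: int, size: int) -> None:
    """Create img_0001, img_0002, ... each holding `size` random bytes."""
    directory.mkdir(parents=True, exist_ok=True)
    for index in range(1, count + 1):
        (directory / f"img_{index:04d}").write_bytes(os.urandom(size))


def md5_of(path: Path) -> str:
    """MD5 of a file, read in 1 MB chunks to keep memory use small."""
    md5 = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            md5.update(chunk)
    return md5.hexdigest()


def build_manifest(directory: Path) -> Dict[str, str]:
    """Record a checksum per file (the role of `md5sum ... > test.md5`)."""
    return {p.name: md5_of(p) for p in sorted(directory.iterdir()) if p.is_file()}


def verify(directory: Path, manifest: Dict[str, str]) -> List[str]:
    """Return the names whose checksum changed (the role of `md5sum -c`)."""
    return [name for name, expected in manifest.items()
            if md5_of(directory / name) != expected]
```

A run would call `write_random_files` and `build_manifest` before the migration and `verify` afterwards; an empty list corresponds to every file reporting OK.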
[Bug 1788098] Comment bridged from LTC Bugzilla
--- Comment From leona...@ibm.com 2018-12-21 12:10 EDT---
Hello,
I have been trying to reproduce this bug over this week, but I couldn't do so on Ubuntu.
Could anyone verify what I have been doing wrong?

## QEMU
I built QEMU 3.1.0 and made sure the patch that enables THP was included:
../configure --target-list=ppc-linux-user,ppc64-linux-user,ppc64le-linux-user,ppc-softmmu,ppc64-softmmu --enable-debug-info --enable-trace-backends=log --python=/usr/bin/python3 && make -j $(nproc)
./ppc-softmmu/qemu-system-ppc -version
QEMU emulator version 3.1.0 (v3.1.0-dirty)

## Kernel
uname -a
Linux NAME 4.15.0-20-generic #21-Ubuntu SMP Tue Apr 24 06:14:44 UTC 2018 ppc64le ppc64le ppc64le GNU/Linux
cat /sys/kernel/mm/transparent_hugepage/enabled
[always] madvise never

## CLI command
Both commands were run on the same host; (1) is the "migrating from" instance and (2) is the "migrate to" instance.

(1)
MALLOC_PERTURB_=1 /home/leonardo/qemu/build/ppc64-softmmu/qemu-system-ppc64 \
-nographic \
-serial mon:stdio \
-S \
-name 'avocado-vt-vm1' \
-machine pseries \
-nodefaults \
-vga std \
-device pci-bridge,id=pci_bridge,bus=pci.0,addr=0x3,chassis_nr=1 \
-device virtio-serial-pci,id=virtio_serial_pci0,bus=pci.0,addr=0x4 \
-object rng-random,filename=/dev/random,id=passthrough-RHq4nIpF \
-device virtio-rng-pci,id=virtio-rng-pci-aXCni2OX,rng=passthrough-RHq4nIpF,bus=pci.0,addr=0x5 \
-device nec-usb-xhci,id=usb1,bus=pci.0,addr=0x6 \
-device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pci.0,addr=0x7 \
-drive id=drive_image1,if=none,snapshot=off,aio=native,cache=none,format=qcow2,file=/home/leonardo/images/ubuntu-18.04-ppc64le.qcow2 \
-device scsi-hd,id=image1,drive=drive_image1 \
-m 8192 \
-smp 4,maxcpus=4,cores=2,threads=1,sockets=2 \
-device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 \
-vnc :0 \
-rtc base=utc,clock=host \
-boot order=cdn,once=c,menu=off,strict=off \
-enable-kvm \
-watchdog i6300esb \
-watchdog-action reset \
-device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x9

(2)
Same as above, with only a few changes:
- -name 'avocado-vt-vm1' \
+ -name 'avocado-vt-vm2' \
- -vnc :0 \
+ -vnc :1 \
+ -incoming tcp:0:5801 \

## Testing and Results
(1) On guest:
# stress --io 5 --cpu 4
stress: info: [812] dispatching hogs: 4 cpu, 5 io, 0 vm, 0 hdd

(1) On Qemu terminal:
(qemu) migrate_set_speed 256
(qemu) migrate -d tcp:0:5801
(qemu) info migrate
globals:
store-global-state: on
only-migratable: off
send-configuration: on
send-section-footer: on
capabilities: xbzrle: off rdma-pin-all: off auto-converge: off zero-blocks: off compress: off events: off postcopy-ram: off x-colo: off release-ram: off block: off return-path: off pause-before-switchover: off x-multifd: off dirty-bitmaps: off
Migration status: completed
total time: 1776 milliseconds
downtime: 61 milliseconds
setup: 9 milliseconds
transferred ram: 422571 kbytes
throughput: 1964.89 mbps
remaining ram: 0 kbytes
total ram: 8405056 kbytes
duplicate: 2006371 pages
skipped: 0 pages
normal: 101037 pages
normal bytes: 404148 kbytes
dirty sync count: 3
page size: 4 kbytes
(qemu) info status
VM status: paused (postmigrate)

It is all over in ~2 seconds, with no issues.
stress keeps running on the new machine (after cont).

###
Other Qemu versions tested, with the same result:
v2.12 git
v3.0.0 git
Debian 1:2.12+dfsg-3ubuntu8

Other host kernels tested, with the same result:
4.18.0 - Vanilla, no patch
4.15.0-42-generic
4.15.0-42-generic + patch
4.15.0-32-generic (provided by jsalisbury)
4.15.0-20-generic
4.15.0 - Vanilla, no patch