On Thu, Jul 20, 2017 at 10:18:18PM -0300, jos...@linux.vnet.ibm.com wrote:
> On Thu, Jul 20, 2017 at 03:21:59PM +1000, Paul Mackerras wrote:
> > On Thu, Jul 20, 2017 at 12:02:23AM -0300, jos...@linux.vnet.ibm.com wrote:
> > > On Thu, Jul 20, 2017 at 09:42:50AM +1000, Benjamin Herrenschmidt wrote:
> > > > On Wed, 2017-07-19 at 16:46 -0300, jos...@linux.vnet.ibm.com wrote:
> > > > > Hello!
> > > > >
> > > > > We're not able to boot any KVM guest using upstream kernel
> > > > > (cb8c65ccff7f77d0285f1b126c72d37b2572c865 - 4.13.0-rc1+).
> > > > > After reaching the SLOF initial counting, the guest simply freezes:
> > > >
> > > > Can you send our .config ?
> > >
> > > Sure,
> > >
> > > Answering Michael as well:
> > >
> > > It's a P9 with RHEL kernel 4.11.0-10.el7a.ppc64le installed. The problem
> > > was noticed with kernel > 4.13 (I'm currently running 4.13.0-rc1+).
> > >
> > > QEMU is https://github.com/dgibson/qemu (ppc-for-2.10) but I gave the
> > > default packaged Qemu a try.
> > >
> > > For the guest, I tried both a vanilla Ubuntu 17.04 and the host kernel.
> > > But they had never a chance to run since the freezing happened in SLOF.
> > >
> > > Note that using the 4.11.0-10.el7a.ppc64le kernel it works fine
> > > (for any of these Qemu/Guest setup). With 4.13.0-rc1 I have it run after
> > > reverting that referred commit.
> >
> > Is the host kernel running in radix mode?
>
> yes
>
> > Did you check the host kernel logs for any oops messages?
>
> dmesg was clean but after sometime waiting (I forgot QEMU running in
> another terminal) I got the oops below (after rebooting the host I
> couldn't reproduce it again).
>
> Another test that I did was:
> Compile with transparent huge pages disabled: KVM works fine
> Compile with transparent huge pages enabled: doesn't work
>   + disabling it in /sys/kernel/mm/transparent_hugepage: doesn't work
>
> Just out of my own curiosity I made this small change:
>
> diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h
> b/arch/powerpc/include
> index c0737c8..f94a3b6 100644
> --- a/arch/powerpc/include/asm/book3s/64/pgtable.h
> +++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
> @@ -80,7 +80,7 @@
>
>  #define _PAGE_SOFT_DIRTY _RPAGE_SW3 /* software: software dirty
> tracking
>  #define _PAGE_SPECIAL _RPAGE_SW2 /* software: special page */
> -#define _PAGE_DEVMAP _RPAGE_SW1 /* software: ZONE_DEVICE page */
> +#define _PAGE_DEVMAP _RPAGE_RSV3
>  #define __HAVE_ARCH_PTE_DEVMAP
>
> and it works. I chose _RPAGE_RSV3 because it uses the same value that
> x86 uses (0x0400000000000000UL) but I don't if it could have any side
> effect
>
Does this change make any sense to you all? I didn't see any side effects,
except that device-backed memory will have a bigger address space in
transparent huge pages, if I understand that correctly. If it does make
sense, I can send a patch with this change.

Thank you!!