On Thu, Jan 31, 2019 at 07:15:26AM +0100, Christophe Leroy wrote:
>
>
> On 31/01/2019 at 07:06, Stephen Rothwell wrote:
> >Hi all,
> >
> >On Thu, 31 Jan 2019 16:38:54 +1100 Stephen Rothwell <s...@canb.auug.org.au> wrote:
> >>
> >>[I am guessing that it is something in Andrew's tree that has caused
> >>this.]
> >>
> >>My qemu boot of the powerpc pseries_le_defconfig config failed like this:
> >>
> >>htab_hash_mask = 0x1ffff
> >>-----------------------------------------------------
> >>numa: NODE_DATA [mem 0x7ffe7000-0x7ffebfff]
> >>Kernel panic - not syncing: sparse_buffer_init: Failed to allocate 2147483648 bytes align=0x10000 nid=0 from=fffffffffffffff
This means that sparse_buffer_init tries to allocate 2G for the
sparsemap_buf... Stephen, how much memory do you give to your VM?

> >>CPU: 0 PID: 0 Comm: swapper Not tainted 5.0.0-rc4 #2
> >>Call Trace:
> >>[c00000000105bbd0] [c000000000b1345c] dump_stack+0xb0/0xf4 (unreliable)
> >>[c00000000105bc10] [c000000000111120] panic+0x168/0x3b8
> >>[c00000000105bcb0] [c000000000e701c8] sparse_init_nid+0x178/0x550
> >>[c00000000105bd70] [c000000000e709b4] sparse_init+0x210/0x238
> >>[c00000000105bdb0] [c000000000e468f4] initmem_init+0x1e0/0x260
> >>[c00000000105be80] [c000000000e3b9b0] setup_arch+0x354/0x3d4
> >>[c00000000105bef0] [c000000000e33afc] start_kernel+0x98/0x648
> >>[c00000000105bf90] [c00000000000b270] start_here_common+0x1c/0x52c
> >
> >A quick bisect leads to this:
> >
> >1c3c9328cde027eb875ba4692f0a5d66b0afe862 is the first bad commit
> >commit 1c3c9328cde027eb875ba4692f0a5d66b0afe862
> >Author: Mike Rapoport <r...@linux.ibm.com>
> >Date: Thu Jan 31 10:51:32 2019 +1100
> >
> >    treewide: add checks for the return value of memblock_alloc*()
> >    Add check for the return value of memblock_alloc*() functions and call
> >    panic() in case of error. The panic message repeats the one used by
> >    panicing memblock allocators with adjustment of parameters to include
> >    only relevant ones.
> >    The replacement was mostly automated with semantic patches like the one
> >    below with manual massaging of format strings.
> >    @@
> >    expression ptr, size, align;
> >    @@
> >    ptr = memblock_alloc(size, align);
> >    + if (!ptr)
> >    + 	panic("%s: Failed to allocate %lu bytes align=0x%lx\n", __func__,
> >    size, align);
> >    Link: http://lkml.kernel.org/r/1548057848-15136-20-git-send-email-r...@linux.ibm.com
> >    Signed-off-by: Mike Rapoport <r...@linux.ibm.com>
> >    Reviewed-by: Guo Ren <ren_...@c-sky.com> [c-sky]
> >    Acked-by: Paul Burton <paul.bur...@mips.com> [MIPS]
> >    Acked-by: Heiko Carstens <heiko.carst...@de.ibm.com> [s390]
> >    Reviewed-by: Juergen Gross <jgr...@suse.com> [Xen]
> >    Reviewed-by: Geert Uytterhoeven <ge...@linux-m68k.org> [m68k]
> >    Cc: Catalin Marinas <catalin.mari...@arm.com>
> >    Cc: Christophe Leroy <christophe.le...@c-s.fr>
> >    Cc: Christoph Hellwig <h...@lst.de>
> >    Cc: "David S. Miller" <da...@davemloft.net>
> >    Cc: Dennis Zhou <den...@kernel.org>
> >    Cc: Greentime Hu <green...@gmail.com>
> >    Cc: Greg Kroah-Hartman <gre...@linuxfoundation.org>
> >    Cc: Guan Xuetao <g...@pku.edu.cn>
> >    Cc: Guo Ren <guo...@kernel.org>
> >    Cc: Mark Salter <msal...@redhat.com>
> >    Cc: Matt Turner <matts...@gmail.com>
> >    Cc: Max Filippov <jcmvb...@gmail.com>
> >    Cc: Michael Ellerman <m...@ellerman.id.au>
> >    Cc: Michal Simek <mon...@monstr.eu>
> >    Cc: Petr Mladek <pmla...@suse.com>
> >    Cc: Richard Weinberger <rich...@nod.at>
> >    Cc: Rich Felker <dal...@libc.org>
> >    Cc: Rob Herring <robh...@kernel.org>
> >    Cc: Rob Herring <r...@kernel.org>
> >    Cc: Russell King <li...@armlinux.org.uk>
> >    Cc: Stafford Horne <sho...@gmail.com>
> >    Cc: Tony Luck <tony.l...@intel.com>
> >    Cc: Vineet Gupta <vgu...@synopsys.com>
> >    Cc: Yoshinori Sato <ys...@users.sourceforge.jp>
> >    Signed-off-by: Andrew Morton <a...@linux-foundation.org>
> >
> >Which is just adding the panic we hit. So, presumably, the bug is in a
> >preceding patch :-(
> >
> >I have left the kernel not booting for today.
> >
>
> No I think the error is really in that patch, see my other mail.
>
> See https://elixir.bootlin.com/linux/v5.0-rc4/source/mm/memblock.c#L1455,
> memblock_alloc_try_nid_raw() is not supposed to panic, so the last hunk of
> this patch should be reverted.

It is not supposed to panic, but it can still fail, so simply ignoring its
return value seems a bit odd at least.

> Found in total three problematic hunks in that patch:
>
> @@ -48,6 +53,11 @@ static phys_addr_t __init kasan_alloc_raw_page(int node)
>  	void *p = memblock_alloc_try_nid_raw(PAGE_SIZE, PAGE_SIZE,
>  						__pa(MAX_DMA_ADDRESS),
>  						MEMBLOCK_ALLOC_KASAN, node);
> +	if (!p)
> +		panic("%s: Failed to allocate %lu bytes align=0x%lx nid=%d from=%llx\n",
> +		      __func__, PAGE_SIZE, PAGE_SIZE, node,
> +		      __pa(MAX_DMA_ADDRESS));
> +
>  	return __pa(p);
>  }
>
> @@ -211,6 +211,9 @@ static int __init iob_init(struct device_node *dn)
>  	iob_l2_base = memblock_alloc_try_nid_raw(1UL << 21, 1UL << 21,
>  					MEMBLOCK_LOW_LIMIT, 0x80000000,
>  					NUMA_NO_NODE);
> +	if (!iob_l2_base)
> +		panic("%s: Failed to allocate %lu bytes align=0x%lx max_addr=%x\n",
> +		      __func__, 1UL << 21, 1UL << 21, 0x80000000);
>
>  	pr_info("IOBMAP L2 allocated at: %p\n", iob_l2_base);
>
>
> @@ -425,6 +436,10 @@ static void __init sparse_buffer_init(unsigned long size, int nid)
>  		memblock_alloc_try_nid_raw(size, PAGE_SIZE,
>  						__pa(MAX_DMA_ADDRESS),
>  						MEMBLOCK_ALLOC_ACCESSIBLE, nid);
> +	if (!sparsemap_buf)
> +		panic("%s: Failed to allocate %lu bytes align=0x%lx nid=%d from=%lx\n",
> +		      __func__, size, PAGE_SIZE, nid, __pa(MAX_DMA_ADDRESS));
> +
>  	sparsemap_buf_end = sparsemap_buf + size;
>  }
>
>
>
> Christophe
>

-- 
Sincerely yours,
Mike.
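
To illustrate the point about checking the return value without panicking,
here is a minimal userspace sketch in C. It shows an up-front bulk allocation
that is allowed to fail, with the caller checking the result and falling back
to a smaller individual allocation. Every name in it (try_alloc_raw,
bulk_buf_init, bulk_buf_alloc, alloc_section, and the limit parameter) is
hypothetical and invented for this example; it uses malloc rather than
memblock and is not the code from mm/sparse.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Hypothetical stand-in for a "raw" allocator: it returns NULL on failure
 * and never panics. Requests larger than 'limit' fail, which mimics asking
 * for more memory than the machine has.
 */
static void *try_alloc_raw(size_t size, size_t limit)
{
	if (size == 0 || size > limit)
		return NULL;
	return malloc(size);
}

struct bulk_buf {
	char *cur;	/* next free byte, or NULL if the bulk allocation failed */
	char *end;	/* one past the last byte, or NULL */
};

/* Try to grab one large buffer up front. Failure is recorded, not fatal. */
static void bulk_buf_init(struct bulk_buf *b, size_t size, size_t limit)
{
	b->cur = try_alloc_raw(size, limit);
	b->end = b->cur ? b->cur + size : NULL;
}

/* Carve 'size' bytes from the bulk buffer, or return NULL if that is not possible. */
static void *bulk_buf_alloc(struct bulk_buf *b, size_t size)
{
	char *p;

	if (!b->cur || size > (size_t)(b->end - b->cur))
		return NULL;
	p = b->cur;
	b->cur += size;
	return p;
}

/*
 * Caller side: prefer the bulk buffer, fall back to an individual
 * allocation. *from_bulk reports which path was taken. The result can
 * still be NULL, so the caller must check it as well.
 */
static void *alloc_section(struct bulk_buf *b, size_t size, size_t limit,
			   int *from_bulk)
{
	void *p;

	assert(from_bulk != NULL);
	p = bulk_buf_alloc(b, size);
	*from_bulk = (p != NULL);
	if (!p)
		p = try_alloc_raw(size, limit);
	return p;
}
```

With a 1 MiB limit, a 2147483648-byte request to bulk_buf_init() fails and
the failure is only recorded; a later alloc_section() call for 4096 bytes
then takes the fallback path. Whether a failed allocation should be fatal is
a decision for the caller that knows if a fallback exists.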