Re: 2.6.18-mm2 boot failure on x86-64
On Mon, Oct 09, 2006 at 10:53:58AM +0100, Mel Gorman wrote:
> On Fri, 6 Oct 2006, Vivek Goyal wrote:
> > On Fri, Oct 06, 2006 at 01:03:50PM -0500, Steve Fox wrote:
> > > On Fri, 2006-10-06 at 18:11 +0100, Mel Gorman wrote:
> > > > On (06/10/06 11:36), Vivek Goyal didst pronounce:
> > > > > Where is bss placed in physical memory? I guess bss_start and
> > > > > bss_stop from System.map will tell us. That will confirm that the
> > > > > above memset step is stomping over bss. Then we just have to find
> > > > > where we allocated the wrong physical memory area for the bootmem
> > > > > allocator map.
> > > >
> > > > BSS is at 0x643000 - 0x777BC4
> > > > init_bootmem wipes from 0x777000 - 0x8F7000
> > > >
> > > > So the BSS bytes from 0x777000 - 0x777BC4 (which looks very
> > > > suspiciously like a page alignment of addr & PAGE_MASK) get set
> > > > to 0xFF.
> > > >
> > > > One possible fix is below. It adds a check in bad_addr() to see if
> > > > the BSS section is about to be used for bootmap. It Seems To Work
> > > > For Me (tm) and illustrates the source of the problem even if it's
> > > > not the 100% correct fix.
> > >
> > > I was able to boot the machine with Mel's patch applied on top of
> > > -git22.
> >
> > Please have a look at the attached patch. Does it make some sense?
>
> It makes some sense. As you state, it wastes memory but that is better
> than breaking.
>
> > Steve, can you please give this patch a try if it fixes the problem?
>
> I boot-tested the patch on the same machine as Steve was using and it
> completed successfully.

Hi Andrew,

Can you please have a look at the attached patch and include it in -mm.
This fixes the issue for Steve. It also figures in Adrian Bunk's list of
known regressions.

Subject    : oops in xfrm_register_mode
References : http://lkml.org/lkml/2006/10/4/170
Submitter  : Steve Fox [EMAIL PROTECTED]
Handled-By : Vivek Goyal [EMAIL PROTECTED]
Status     : patch available

o Currently some code pieces assume that addresses returned by
  find_e820_area() are page aligned. But it looks like find_e820_area()
  had no such intention, and hence one might end up stomping over some
  of the data. One such case is the bootmem allocator initialization
  code, which stomped over bss.

o This patch modifies find_e820_area() to return a page-aligned address.
  This might be a little wasteful of memory, but at the same time it is
  probably easier to handle page-aligned memory.

Signed-off-by: Vivek Goyal [EMAIL PROTECTED]
---

 arch/x86_64/kernel/e820.c |   14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff -puN arch/x86_64/kernel/e820.c~x86_64-return-page-aligned-phy-addr-from-find-e820-area arch/x86_64/kernel/e820.c
--- linux-2.6.19-rc1-1M/arch/x86_64/kernel/e820.c~x86_64-return-page-aligned-phy-addr-from-find-e820-area	2006-10-06 15:28:13.000000000 -0400
+++ linux-2.6.19-rc1-1M-root/arch/x86_64/kernel/e820.c	2006-10-06 15:44:45.000000000 -0400
@@ -54,13 +54,13 @@ static inline int bad_addr(unsigned long
 	/* various gunk below that needed for SMP startup */
 	if (addr < 0x8000) {
-		*addrp = 0x8000;
+		*addrp = PAGE_ALIGN(0x8000);
 		return 1;
 	}
 
 	/* direct mapping tables of the kernel */
 	if (last >= table_start<<PAGE_SHIFT &&
 	    addr < table_end<<PAGE_SHIFT) {
-		*addrp = table_end << PAGE_SHIFT;
+		*addrp = PAGE_ALIGN(table_end << PAGE_SHIFT);
 		return 1;
 	}
 
@@ -68,18 +68,18 @@ static inline int bad_addr(unsigned long
 #ifdef CONFIG_BLK_DEV_INITRD
 	if (LOADER_TYPE && INITRD_START && last >= INITRD_START &&
 	    addr < INITRD_START+INITRD_SIZE) {
-		*addrp = INITRD_START + INITRD_SIZE;
+		*addrp = PAGE_ALIGN(INITRD_START + INITRD_SIZE);
 		return 1;
 	}
 #endif
 	/* kernel code */
-	if (last >= __pa_symbol(&_text) && last < __pa_symbol(&_end)) {
-		*addrp = __pa_symbol(&_end);
+	if (last >= __pa_symbol(&_text) && addr < __pa_symbol(&_end)) {
+		*addrp = PAGE_ALIGN(__pa_symbol(&_end));
 		return 1;
 	}
 
 	if (last >= ebda_addr && addr < ebda_addr + ebda_size) {
-		*addrp = ebda_addr + ebda_size;
+		*addrp = PAGE_ALIGN(ebda_addr + ebda_size);
 		return 1;
 	}
 
@@ -152,7 +152,7 @@ unsigned long __init find_e820_area(unsi
 			continue;
 		while (bad_addr(&addr, size) && addr+size <= ei->addr+ei->size)
 			;
-		last = addr + size;
+		last = PAGE_ALIGN(addr) + size;
 		if (last > ei->addr + ei->size)
 			continue;
 		if (last > end)
_
-
To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
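The patch description above concedes that page-aligning the returned address may waste a little memory. As a rough user-space illustration of how much, the sketch below uses local stand-in macros (assuming 4 KiB pages; they imitate the kernel's PAGE_ALIGN() but are not taken from the kernel headers) and a hypothetical helper, align_waste(), that is not part of the patch.

```c
#include <stdint.h>

/* Assumption: 4 KiB pages, as on x86-64. These macros are local
 * stand-ins that imitate the kernel's PAGE_ALIGN(); they are not the
 * kernel headers themselves. */
#define PAGE_SHIFT 12
#define PAGE_SIZE  (UINT64_C(1) << PAGE_SHIFT)
#define PAGE_MASK  (~(PAGE_SIZE - 1))
#define PAGE_ALIGN(addr) (((uint64_t)(addr) + PAGE_SIZE - 1) & PAGE_MASK)

/* Hypothetical helper: number of bytes skipped when addr is rounded up
 * to the next page boundary (0 if addr is already aligned). */
uint64_t align_waste(uint64_t addr)
{
	return PAGE_ALIGN(addr) - addr;
}
```

With the _end value quoted in this thread, align_waste(0x777BC4) comes to 0x43C (1084) bytes, and the worst case for any single adjustment is PAGE_SIZE - 1 bytes.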
Re: 2.6.18-mm2 boot failure on x86-64
On Fri, Oct 06, 2006 at 03:33:12PM +0100, Mel Gorman wrote:
> Linux version 2.6.18-git22 ([EMAIL PROTECTED]) (gcc version 4.1.0 (SUSE Linux)) #2 SMP Thu Oct 5 19:05:36 PDT 2006
> Command line: root=/dev/sda1 vga=791 ip=9.47.67.239:9.47.67.50:9.47.67.1:255.255.255.0 resume=/dev/sdb1 showopts earlyprintk=serial,ttyS0,57600 console=tty0 console=ttyS0,57600 autobench_args: root=/dev/sda1 ABAT:1160100417
> BIOS-provided physical RAM map:
>  BIOS-e820: 0000000000000000 - 000000000009ac00 (usable)
>  BIOS-e820: 000000000009ac00 - 00000000000a0000 (reserved)
>  BIOS-e820: 00000000000e0000 - 0000000000100000 (reserved)
>  BIOS-e820: 0000000000100000 - 00000000bff764c0 (usable)
>  BIOS-e820: 00000000bff764c0 - 00000000bff98880 (ACPI data)
>  BIOS-e820: 00000000bff98880 - 00000000c0000000 (reserved)
>  BIOS-e820: 00000000fec00000 - 0000000100000000 (reserved)
>  BIOS-e820: 0000000100000000 - 0000000c00000000 (usable)
>
> I continued what Steve was doing this morning to see if this could be
> pinned down. After placing 'CHECK;' in a few places as suggested by
> Andi's check, the problem code was identified as the following in
> mm/bootmem.c#init_bootmem_core():
>
> 	mapsize = get_mapsize(bdata);
> 	memset(bdata->node_bootmem_map, 0xff, mapsize);
>
> That explains the value in the array at least. A few more printfs
> around this point printed out the following in the boot log:
>
> init_bootmem_core(0, 1909, 0, 12582912)
> init_bootmem_core: Calling memset(0x81775000, 1572864)
> AAGH: afinfo corrupted at mm/bootmem.c:121
>
> where
>
> 1909     == mapstart
> 0        == start
> 12582912 == end
> 1572864  == mapsize
>
> mapstart, start and end being the parameters passed to
> init_bootmem_core().
>
> This means we are calling memset for the physical range
> 0x775000 - 0x8F5000, which appears to be in a usable range according
> to the BIOS-e820 map.

Hi Mel,

Where is bss placed in physical memory? I guess bss_start and bss_stop
from System.map will tell us. That will confirm that the above memset
step is stomping over bss. Then we just have to find where we allocated
the wrong physical memory area for the bootmem allocator map.

Thanks
Vivek
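The arithmetic behind the physical range quoted above can be reproduced in user space. The sketch below assumes 4 KiB pages and a bitmap with one bit per page frame, rounded up to a multiple of sizeof(long); that is only a reading of what get_mapsize() appears to compute, not a copy of it, and the helper names are hypothetical.

```c
#include <stdint.h>

#define PAGE_SHIFT 12 /* assumption: 4 KiB pages */

/* One bit per page frame, rounded up to whole bytes and then to a
 * multiple of sizeof(long). This mirrors my understanding of the
 * bootmem bitmap sizing; it is not the kernel's code. */
uint64_t bootmap_bytes(uint64_t start_pfn, uint64_t end_pfn)
{
	uint64_t bytes = (end_pfn - start_pfn + 7) / 8;
	uint64_t align = sizeof(long);

	return (bytes + align - 1) & ~(align - 1);
}

/* First physical byte of a map that starts at page frame mapstart_pfn. */
uint64_t bootmap_phys_start(uint64_t mapstart_pfn)
{
	return mapstart_pfn << PAGE_SHIFT;
}

/* One past the last physical byte the memset() would write. */
uint64_t bootmap_phys_end(uint64_t mapstart_pfn, uint64_t start_pfn,
			  uint64_t end_pfn)
{
	return bootmap_phys_start(mapstart_pfn) +
	       bootmap_bytes(start_pfn, end_pfn);
}
```

Plugging in the values from the boot log (mapstart 1909, start 0, end 12582912) gives a map of 1572864 bytes covering 0x775000 - 0x8F5000, matching the range in the message.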
Re: 2.6.18-mm2 boot failure on x86-64
On Fri, Oct 06, 2006 at 06:11:05PM +0100, Mel Gorman wrote:
> On (06/10/06 11:36), Vivek Goyal didst pronounce:
> > On Fri, Oct 06, 2006 at 03:33:12PM +0100, Mel Gorman wrote:
> > > Linux version 2.6.18-git22 ([EMAIL PROTECTED]) (gcc version 4.1.0 (SUSE Linux)) #2 SMP Thu Oct 5 19:05:36 PDT 2006
> > > Command line: root=/dev/sda1 vga=791 ip=9.47.67.239:9.47.67.50:9.47.67.1:255.255.255.0 resume=/dev/sdb1 showopts earlyprintk=serial,ttyS0,57600 console=tty0 console=ttyS0,57600 autobench_args: root=/dev/sda1 ABAT:1160100417
> > > BIOS-provided physical RAM map:
> > >  BIOS-e820: 0000000000000000 - 000000000009ac00 (usable)
> > >  BIOS-e820: 000000000009ac00 - 00000000000a0000 (reserved)
> > >  BIOS-e820: 00000000000e0000 - 0000000000100000 (reserved)
> > >  BIOS-e820: 0000000000100000 - 00000000bff764c0 (usable)
> > >  BIOS-e820: 00000000bff764c0 - 00000000bff98880 (ACPI data)
> > >  BIOS-e820: 00000000bff98880 - 00000000c0000000 (reserved)
> > >  BIOS-e820: 00000000fec00000 - 0000000100000000 (reserved)
> > >  BIOS-e820: 0000000100000000 - 0000000c00000000 (usable)
> > >
> > > I continued what Steve was doing this morning to see if this could
> > > be pinned down. After placing 'CHECK;' in a few places as suggested
> > > by Andi's check, the problem code was identified as the following
> > > in mm/bootmem.c#init_bootmem_core():
> > >
> > > 	mapsize = get_mapsize(bdata);
> > > 	memset(bdata->node_bootmem_map, 0xff, mapsize);
> > >
> > > That explains the value in the array at least. A few more printfs
> > > around this point printed out the following in the boot log:
> > >
> > > init_bootmem_core(0, 1909, 0, 12582912)
> > > init_bootmem_core: Calling memset(0x81775000, 1572864)
> > > AAGH: afinfo corrupted at mm/bootmem.c:121
> > >
> > > where
> > >
> > > 1909     == mapstart
> > > 0        == start
> > > 12582912 == end
> > > 1572864  == mapsize
> > >
> > > mapstart, start and end being the parameters passed to
> > > init_bootmem_core().
> > >
> > > This means we are calling memset for the physical range
> > > 0x775000 - 0x8F5000, which appears to be in a usable range
> > > according to the BIOS-e820 map.
> >
> > Hi Mel,
>
> Hi.
>
> > Where is bss placed in physical memory? I guess bss_start and
> > bss_stop from System.map will tell us. That will confirm that the
> > above memset step is stomping over bss. Then we just have to find
> > where we allocated the wrong physical memory area for the bootmem
> > allocator map.
>
> BSS is at 0x643000 - 0x777BC4
> init_bootmem wipes from 0x777000 - 0x8F7000
>
> So the BSS bytes from 0x777000 - 0x777BC4 (which looks very suspiciously
> like a page alignment of addr & PAGE_MASK) get set to 0xFF.
>
> One possible fix is below. It adds a check in bad_addr() to see if the
> BSS section is about to be used for bootmap. It Seems To Work For Me
> (tm) and illustrates the source of the problem even if it's not the
> 100% correct fix.
>
> diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.18-git22-clean/arch/x86_64/kernel/e820.c linux-2.6.18-git22-bss_relocate_fix/arch/x86_64/kernel/e820.c
> --- linux-2.6.18-git22-clean/arch/x86_64/kernel/e820.c	2006-10-05 20:42:07.000000000 +0100
> +++ linux-2.6.18-git22-bss_relocate_fix/arch/x86_64/kernel/e820.c	2006-10-06 17:39:51.000000000 +0100
> @@ -51,6 +51,7 @@ extern struct resource code_resource, da
>  static inline int bad_addr(unsigned long *addrp, unsigned long size)
>  {
>  	unsigned long addr = *addrp, last = addr + size;
> +	unsigned long bss_start, bss_end;
>  
>  	/* various gunk below that needed for SMP startup */
>  	if (addr < 0x8000) {
> @@ -77,6 +78,14 @@ static inline int bad_addr(unsigned long
>  		*addrp = __pa_symbol(&_end);
>  		return 1;
>  	}
> +
> +	/* bss section */
> +	bss_start = __pa_symbol(__bss_start);
> +	bss_end = PAGE_ALIGN(__pa_symbol(__bss_stop));
> +	if (addr >= bss_start && addr < bss_end) {
> +		*addrp = bss_end;
> +		return 1;
> +	}

Surprising, the kernel code check just before this should have taken
care of it.

	/* kernel code */
	if (last >= __pa_symbol(&_text) && last < __pa_symbol(&_end)) {
		*addrp = __pa_symbol(&_end);
		return 1;
	}

Maybe it can be changed to

	if (last >= __pa_symbol(&_text) && last < PAGE_ALIGN(__pa_symbol(&_end))) {

But all this seems to be a stopgap fix. The real puzzle is still exactly
where it slipped through, and it should be fixed there. Maybe some more
printks will help us.

Thanks
Vivek
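To make the effect of the proposed BSS check concrete, here is a small user-space sketch of the same test. It assumes 4 KiB pages, takes the BSS bounds as plain parameters instead of linker symbols, and uses a hypothetical function name, skip_bss(); it is an illustration of the idea, not the kernel function.

```c
#include <stdint.h>

/* Assumption: 4 KiB pages; local stand-ins for the kernel macros. */
#define PAGE_SHIFT 12
#define PAGE_SIZE  (UINT64_C(1) << PAGE_SHIFT)
#define PAGE_MASK  (~(PAGE_SIZE - 1))
#define PAGE_ALIGN(addr) (((uint64_t)(addr) + PAGE_SIZE - 1) & PAGE_MASK)

/* Hypothetical stand-in for the proposed check: if *addrp lies inside
 * [bss_start, PAGE_ALIGN(bss_stop)), move it to the page-rounded end of
 * BSS and return 1 so the caller retries; otherwise return 0. */
int skip_bss(uint64_t *addrp, uint64_t bss_start, uint64_t bss_stop)
{
	uint64_t bss_end = PAGE_ALIGN(bss_stop);

	if (*addrp >= bss_start && *addrp < bss_end) {
		*addrp = bss_end;
		return 1;
	}
	return 0;
}
```

With the BSS bounds from the message (0x643000 - 0x777BC4), a candidate address of 0x777BC4 is pushed to 0x778000, so a page-granular user of that address no longer shares a page with BSS.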
Re: 2.6.18-mm2 boot failure on x86-64
On Fri, Oct 06, 2006 at 06:11:05PM +0100, Mel Gorman wrote:
> On (06/10/06 11:36), Vivek Goyal didst pronounce:
> > On Fri, Oct 06, 2006 at 03:33:12PM +0100, Mel Gorman wrote:
> > > Linux version 2.6.18-git22 ([EMAIL PROTECTED]) (gcc version 4.1.0 (SUSE Linux)) #2 SMP Thu Oct 5 19:05:36 PDT 2006
> > > Command line: root=/dev/sda1 vga=791 ip=9.47.67.239:9.47.67.50:9.47.67.1:255.255.255.0 resume=/dev/sdb1 showopts earlyprintk=serial,ttyS0,57600 console=tty0 console=ttyS0,57600 autobench_args: root=/dev/sda1 ABAT:1160100417
> > > BIOS-provided physical RAM map:
> > >  BIOS-e820: 0000000000000000 - 000000000009ac00 (usable)
> > >  BIOS-e820: 000000000009ac00 - 00000000000a0000 (reserved)
> > >  BIOS-e820: 00000000000e0000 - 0000000000100000 (reserved)
> > >  BIOS-e820: 0000000000100000 - 00000000bff764c0 (usable)
> > >  BIOS-e820: 00000000bff764c0 - 00000000bff98880 (ACPI data)
> > >  BIOS-e820: 00000000bff98880 - 00000000c0000000 (reserved)
> > >  BIOS-e820: 00000000fec00000 - 0000000100000000 (reserved)
> > >  BIOS-e820: 0000000100000000 - 0000000c00000000 (usable)
> > >
> > > I continued what Steve was doing this morning to see if this could
> > > be pinned down. After placing 'CHECK;' in a few places as suggested
> > > by Andi's check, the problem code was identified as the following
> > > in mm/bootmem.c#init_bootmem_core():
> > >
> > > 	mapsize = get_mapsize(bdata);
> > > 	memset(bdata->node_bootmem_map, 0xff, mapsize);
> > >
> > > That explains the value in the array at least. A few more printfs
> > > around this point printed out the following in the boot log:
> > >
> > > init_bootmem_core(0, 1909, 0, 12582912)
> > > init_bootmem_core: Calling memset(0x81775000, 1572864)
> > > AAGH: afinfo corrupted at mm/bootmem.c:121
> > >
> > > where
> > >
> > > 1909     == mapstart
> > > 0        == start
> > > 12582912 == end
> > > 1572864  == mapsize
> > >
> > > mapstart, start and end being the parameters passed to
> > > init_bootmem_core().
> > >
> > > This means we are calling memset for the physical range
> > > 0x775000 - 0x8F5000, which appears to be in a usable range
> > > according to the BIOS-e820 map.
> >
> > Hi Mel,
>
> Hi.
>
> > Where is bss placed in physical memory? I guess bss_start and
> > bss_stop from System.map will tell us. That will confirm that the
> > above memset step is stomping over bss. Then we just have to find
> > where we allocated the wrong physical memory area for the bootmem
> > allocator map.
>
> BSS is at 0x643000 - 0x777BC4
> init_bootmem wipes from 0x777000 - 0x8F7000
>
> So the BSS bytes from 0x777000 - 0x777BC4 (which looks very suspiciously
> like a page alignment of addr & PAGE_MASK) get set to 0xFF.
>
> One possible fix is below. It adds a check in bad_addr() to see if the
> BSS section is about to be used for bootmap. It Seems To Work For Me
> (tm) and illustrates the source of the problem even if it's not the
> 100% correct fix.

OK, it looks like the code is assuming that the memory area returned by
find_e820_area() is page aligned. I found two such instances, and that
is what is leading to the problem.

	bootmap_size = init_bootmem_node(NODE_DATA(nodeid),
					 bootmap_start >> PAGE_SHIFT,
					 start_pfn, end_pfn);

Here bootmap_start is not page aligned, and I guess it currently
contains the value 0x777BC4 (just beyond _end). But the moment I do
bootmap_start >> PAGE_SHIFT, I start stomping bss.

The case is similar here.

	bootmap = find_e820_area(0, end_pfn<<PAGE_SHIFT, bootmap_size);
	if (bootmap == -1L)
		panic("Cannot find bootmem map of size %ld\n",bootmap_size);
	bootmap_size = init_bootmem(bootmap >> PAGE_SHIFT, end_pfn);

So maybe we should return a page-aligned address from find_e820_area().
Maybe we can change bad_addr() to set *addrp to the next page-aligned
boundary for every check?

	*addrp = PAGE_ALIGN(__pa_symbol(&_end));

Thanks
Vivek
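The truncation described above is easy to check numerically. The following user-space sketch assumes 4 KiB pages; pfn_of() and bytes_below() are hypothetical helpers for illustration, not kernel functions.

```c
#include <stdint.h>

#define PAGE_SHIFT 12 /* assumption: 4 KiB pages */

/* Page frame number containing addr; the low 12 bits are discarded. */
uint64_t pfn_of(uint64_t addr)
{
	return addr >> PAGE_SHIFT;
}

/* Hypothetical helper: how many bytes below addr are also covered once
 * the address has been converted to a page frame number and back. */
uint64_t bytes_below(uint64_t addr)
{
	return addr - (pfn_of(addr) << PAGE_SHIFT);
}
```

For the suspected bootmap_start of 0x777BC4 the page frame number is 0x777, so anything written from the start of that page also covers the 0xBC4 bytes below 0x777BC4, which is exactly the tail of BSS reported earlier in the thread. An aligned address such as 0x778000 has no such overlap.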
Re: 2.6.18-mm2 boot failure on x86-64
On Fri, Oct 06, 2006 at 01:03:50PM -0500, Steve Fox wrote:
> On Fri, 2006-10-06 at 18:11 +0100, Mel Gorman wrote:
> > On (06/10/06 11:36), Vivek Goyal didst pronounce:
> > > Where is bss placed in physical memory? I guess bss_start and
> > > bss_stop from System.map will tell us. That will confirm that the
> > > above memset step is stomping over bss. Then we just have to find
> > > where we allocated the wrong physical memory area for the bootmem
> > > allocator map.
> >
> > BSS is at 0x643000 - 0x777BC4
> > init_bootmem wipes from 0x777000 - 0x8F7000
> >
> > So the BSS bytes from 0x777000 - 0x777BC4 (which looks very
> > suspiciously like a page alignment of addr & PAGE_MASK) get set to
> > 0xFF.
> >
> > One possible fix is below. It adds a check in bad_addr() to see if
> > the BSS section is about to be used for bootmap. It Seems To Work For
> > Me (tm) and illustrates the source of the problem even if it's not
> > the 100% correct fix.
>
> I was able to boot the machine with Mel's patch applied on top of
> -git22.

Please have a look at the attached patch. Does it make some sense?

Steve, can you please give this patch a try if it fixes the problem?

Thanks
Vivek

o Currently some code pieces assume that addresses returned by
  find_e820_area() are page aligned. But it looks like find_e820_area()
  had no such intention, and hence one might end up stomping over some
  of the data. One such case is the bootmem allocator initialization
  code, which stomped over bss.

o This patch modifies find_e820_area() to return a page-aligned address.
  This might be a little wasteful of memory, but at the same time it is
  probably easier to handle page-aligned memory.

Signed-off-by: Vivek Goyal [EMAIL PROTECTED]
---

 arch/x86_64/kernel/e820.c |   14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff -puN arch/x86_64/kernel/e820.c~x86_64-return-page-aligned-phy-addr-from-find-e820-area arch/x86_64/kernel/e820.c
--- linux-2.6.19-rc1-1M/arch/x86_64/kernel/e820.c~x86_64-return-page-aligned-phy-addr-from-find-e820-area	2006-10-06 15:28:13.000000000 -0400
+++ linux-2.6.19-rc1-1M-root/arch/x86_64/kernel/e820.c	2006-10-06 15:44:45.000000000 -0400
@@ -54,13 +54,13 @@ static inline int bad_addr(unsigned long
 	/* various gunk below that needed for SMP startup */
 	if (addr < 0x8000) {
-		*addrp = 0x8000;
+		*addrp = PAGE_ALIGN(0x8000);
 		return 1;
 	}
 
 	/* direct mapping tables of the kernel */
 	if (last >= table_start<<PAGE_SHIFT &&
 	    addr < table_end<<PAGE_SHIFT) {
-		*addrp = table_end << PAGE_SHIFT;
+		*addrp = PAGE_ALIGN(table_end << PAGE_SHIFT);
 		return 1;
 	}
 
@@ -68,18 +68,18 @@ static inline int bad_addr(unsigned long
 #ifdef CONFIG_BLK_DEV_INITRD
 	if (LOADER_TYPE && INITRD_START && last >= INITRD_START &&
 	    addr < INITRD_START+INITRD_SIZE) {
-		*addrp = INITRD_START + INITRD_SIZE;
+		*addrp = PAGE_ALIGN(INITRD_START + INITRD_SIZE);
 		return 1;
 	}
 #endif
 	/* kernel code */
-	if (last >= __pa_symbol(&_text) && last < __pa_symbol(&_end)) {
-		*addrp = __pa_symbol(&_end);
+	if (last >= __pa_symbol(&_text) && addr < __pa_symbol(&_end)) {
+		*addrp = PAGE_ALIGN(__pa_symbol(&_end));
 		return 1;
 	}
 
 	if (last >= ebda_addr && addr < ebda_addr + ebda_size) {
-		*addrp = ebda_addr + ebda_size;
+		*addrp = PAGE_ALIGN(ebda_addr + ebda_size);
 		return 1;
 	}
 
@@ -152,7 +152,7 @@ unsigned long __init find_e820_area(unsi
 			continue;
 		while (bad_addr(&addr, size) && addr+size <= ei->addr+ei->size)
 			;
-		last = addr + size;
+		last = PAGE_ALIGN(addr) + size;
 		if (last > ei->addr + ei->size)
 			continue;
 		if (last > end)
_
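Besides adding PAGE_ALIGN(), the patch changes the "kernel code" test from comparing last against _end to comparing addr against _end. The sketch below contrasts the two conditions in user space. The function names are hypothetical, and the _text value used in the note after the block (0x200000) is an assumption chosen for illustration; only the _end value 0x777BC4 comes from this thread.

```c
#include <stdint.h>

/* Condition before the patch: only the END of the candidate range
 * [addr, addr + size) is tested against the kernel image. */
int hits_kernel_old(uint64_t addr, uint64_t size,
		    uint64_t text, uint64_t end)
{
	uint64_t last = addr + size;

	return last >= text && last < end;
}

/* Condition after the patch: the range is rejected whenever it ends at
 * or after _text and starts before _end, i.e. whenever it overlaps. */
int hits_kernel_new(uint64_t addr, uint64_t size,
		    uint64_t text, uint64_t end)
{
	uint64_t last = addr + size;

	return last >= text && addr < end;
}
```

For a candidate that starts inside the image but ends beyond it, for example addr 0x700000 with size 0x180000 against an image spanning 0x200000 - 0x777BC4, the old condition reports no conflict while the new one does.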
Re: 2.6.18-mm2 boot failure on x86-64
On Thu, Oct 05, 2006 at 08:27:02PM +0200, Andi Kleen wrote:
> On Thursday 05 October 2006 19:57, Steve Fox wrote:
> > On Thu, 2006-10-05 at 17:40 +0200, Andi Kleen wrote:
> > > Please don't snip the Code: line. It is fairly important.
> >
> > Sorry about that. The remote console I was using appears to overwrite
> > some text after I force the reboot. Here's a clean one.
> >
> > global
>
> Ok that definitely shouldn't be in there. I guess we need to track when
> it gets corrupted. Can you send the full boot log with this patch
> applied?

Just recalled one more observation about the problem from when Keith
last reported it. If I just moved .bss before .data_nosave instead of
leaving it at the end, Keith's problem disappeared.

Thanks
Vivek
Re: 2.6.18-mm2 boot failure on x86-64
On Wed, Oct 04, 2006 at 08:45:40AM -0700, Andrew Morton wrote:
> On Wed, 04 Oct 2006 08:42:28 -0500 Steve Fox [EMAIL PROTECTED] wrote:
> > On Thu, 2006-09-28 at 14:01 -0700, Andrew Morton wrote:
> > > On Thu, 28 Sep 2006 17:50:31 +0000 (UTC) Steve Fox [EMAIL PROTECTED] wrote:
> > > > On Thu, 28 Sep 2006 01:46:23 -0700, Andrew Morton wrote:
> > > > > ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.18/2.6.18-mm2/
> > > >
> > > > Panic on boot. This machine booted 2.6.18-mm1 fine. em64t machine.
> > > >
> > > > TCP bic registered
> > > > TCP westwood registered
> > > > TCP htcp registered
> > > > NET: Registered protocol family 1
> > > > NET: Registered protocol family 17
> > > > Unable to handle kernel paging request at RIP:
> > > >  [8047ef93] packet_notifier+0x163/0x1a0
> > > > PGD 203027 PUD 2b031067 PMD 0
> > > > Oops: [1] SMP
> > > > last sysfs file:
> > > > CPU 0
> > > > Modules linked in:
> > > > Pid: 1, comm: swapper Not tainted 2.6.18-mm2-autokern1 #1
> > > > RIP: 0010:[8047ef93] [8047ef93] packet_notifier+0x163/0x1a0
> > > > RSP: :810bffcbde90 EFLAGS: 00010286
> > > > RAX: RBX: 810bff4a1000 RCX:
> > > > RDX: 810bff4a1000 RSI: 0005 RDI: 8055f5e0
> > > > RBP: R08: 7616 R09: 000e
> > > > R10: 0006 R11: 803373f0 R12:
> > > > R13: 0005 R14: 810bff4a1000 R15:
> > > > FS: () GS:805d8000() knlGS:
> > > > CS: 0010 DS: 0018 ES: 0018 CR0: 8005003b
> > > > CR2: CR3: 00201000 CR4: 06e0
> > > > Process swapper (pid: 1, threadinfo 810bffcbc000, task 810bffcbb510)
> > > > Stack: 810bff4a1000 8055f4c0 810bffcbdef0 8042736e 8061c68d 806260f0 80207182
> > > > Call Trace:
> > > >  [8042736e] register_netdevice_notifier+0x3e/0x70
> > > >  [8061c68d] packet_init+0x2d/0x53
> > > >  [80207182] init+0x162/0x330
> > > >  [8020a9d8] child_rip+0xa/0x12
> > > >  [8033c2a2] acpi_ds_init_one_object+0x0/0x82
> > > >  [80207020] init+0x0/0x330
> > > >  [8020a9ce] child_rip+0x0/0x12
> > > >
> > > > Code: 48 8b 45 00 0f 18 08 49 83 fd 02 4c 8d 65 f8 0f 84 f8 fe ff
> > > > RIP [8047ef93] packet_notifier+0x163/0x1a0
> > > >  RSP 810bffcbde90
> > > > CR2:
> > > >  <0>Kernel panic - not syncing: Attempted to kill init!
> > >
> > > I'm really struggling to work out what went wrong there. Comparing
> > > your miserable 20 bytes of code to my object code makes me think
> > > that this:
> > >
> > > 	struct packet_sock *po = pkt_sk(sk);
> > >
> > > returned -1, perhaps in %ebp. But it's all very crude.
> > >
> > > Perhaps you could compile that kernel with CONFIG_DEBUG_INFO, rerun
> > > it (the addresses might change) then have a poke around with
> > > `gdb vmlinux' (or maybe just addr2line) to work out where it's
> > > really oopsing?
> > >
> > > I don't see much which has changed in that area recently.
> >
> > Sorry for the delay. I was finally able to perform a bisect on this.
> > It turns out the patch that causes this is
> > x86_64-mm-re-positioning-the-bss-segment.patch, which seems like a
> > strange candidate, but sure enough I can boot to login: right up
> > until that patch is applied.
>
> hm, that patch was merged into mainline September 29. Does mainline
> work?

I thought the above patch was dropped because Keith ran into some boot
issues on one of the machines. There seems to be nothing wrong with the
patch as such, but it might have triggered some existing bug. At that
point in time I looked into the issue, but nothing was conclusive.

So it looks like this patch has come back. I am not sure how.

Thanks
Vivek
Re: 2.6.18-mm2 boot failure on x86-64
On Wed, Oct 04, 2006 at 05:06:59PM -0700, Andrew Morton wrote:
> On Wed, 04 Oct 2006 11:41:59 -0500 Steve Fox [EMAIL PROTECTED] wrote:
> > On Wed, 2006-10-04 at 08:45 -0700, Andrew Morton wrote:
> > > On Wed, 04 Oct 2006 08:42:28 -0500 Steve Fox [EMAIL PROTECTED] wrote:
> > > > Sorry for the delay. I was finally able to perform a bisect on
> > > > this. It turns out the patch that causes this is
> > > > x86_64-mm-re-positioning-the-bss-segment.patch, which seems like
> > > > a strange candidate, but sure enough I can boot to login: right
> > > > up until that patch is applied.
> > >
> > > hm, that patch was merged into mainline September 29. Does mainline
> > > work?
> >
> > -git21 also fails with this same error.
>
> OK, thanks. And we know that
> x86_64-mm-re-positioning-the-bss-segment.patch triggered this failure.
> And that patch is non-buggy, and the xfrm code is probably non-buggy.
>
> So we don't know squat, and we're going to need to debug this crash.
>
> Well. There is one trick we could use: apply
> x86_64-mm-re-positioning-the-bss-segment.patch to 2.6.18 base and see
> if it crashes. If it doesn't, then we can theorise that the bug is
> some buggy post 2.6.18 patch which is being exposed by

I think most likely it would crash on 2.6.18. Keith Mannthey had
reported a different crash on 2.6.18-rc4-mm2 when this patch was first
introduced. Following is the link to the thread.

http://marc.theaimsgroup.com/?l=linux-kernel&m=115629369729911&w=2

Following is the backtrace he had reported.

Unable to handle kernel NULL pointer dereference at 0007 RIP:
 [803d45b0] __unix_insert_socket+0x49/0x5a
PGD 115c934067 PUD 115c935067 PMD 0
Oops: 0002 [1] SMP
last sysfs file:
CPU 14
Modules linked in:
Pid: 1, comm: init Not tainted 2.6.18-rc4-mm2-smp #3
RIP: 0010:[803d45b0] [803d45b0] __unix_insert_socket+0x49/0x5a
RSP: 0018:810460605eb8 EFLAGS: 00010286
RAX: RBX: 81115c171c80 RCX:
RDX: 81115c171c88 RSI: 81115c171c80 RDI: 806656e0
RBP: 806656e0 R08: 81115c069200 R09: 8110700b4000
R10: R11: 0002 R12: 81115c170d00
R13: 0001 R14: 0001 R15:
FS: 2b793a4fd6d0() GS:81115c910e40() knlGS:
CS: 0010 DS: ES: CR0: 8005003b
CR2: 0007 CR3: 00115c92d000 CR4: 06e0
Process init (pid: 1, threadinfo 810460604000, task 81115cb10040)
Stack: 00010001 81115c171c80 803d58e9 8045bb30 000180298f61 80498080 0001 81115c170d00 803d595d 0004 80376061
Call Trace:
 [803d58e9] unix_create1+0xf3/0x107
 [803d595d] unix_create+0x60/0x6b
 [80376061] __sock_create+0x12f/0x227
 [80376429] sys_socket+0xf/0x37
 [8020968e] system_call+0x7e/0x83

Code: 48 89 50 08 48 89 55 00 48 89 6a 08 41 58 5b 5d c3 c7 47 08
RIP [803d45b0] __unix_insert_socket+0x49/0x5a
 RSP 810460605eb8
CR2: 0007
 <0>Kernel panic - not syncing: Attempted to kill init!

Thanks
Vivek