
Re: 2.6.18-mm2 boot failure on x86-64

2006-10-17 Thread Adrian Bunk

On Mon, Oct 16, 2006 at 04:58:14PM -0700, Andrew Morton wrote:
> On Mon, 16 Oct 2006 14:16:13 -0400
> Vivek Goyal [EMAIL PROTECTED] wrote:
>
> > Can you please have a look at the attached patch
>
> Looks like a fine patch to me, although it could benefit from a comment
> explaining why all those PAGE_ALIGN()s are in there.
>
> > and include it in -mm.
>
> Does it fix a patch in -mm or is it needed in mainline?

The bug in my list was reported to be present in mainline [1].

cu
Adrian

[1] http://lkml.org/lkml/2006/10/4/394

-- 

   "Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
   "Only a promise," Lao Er said.
   Pearl S. Buck - Dragon Seed

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: 2.6.18-mm2 boot failure on x86-64

2006-10-17 Thread Mel Gorman


On Tue, 17 Oct 2006, Adrian Bunk wrote:

> On Mon, Oct 16, 2006 at 04:58:14PM -0700, Andrew Morton wrote:
> > On Mon, 16 Oct 2006 14:16:13 -0400
> > Vivek Goyal [EMAIL PROTECTED] wrote:
> >
> > > Can you please have a look at the attached patch
> >
> > Looks like a fine patch to me, although it could benefit from a comment
> > explaining why all those PAGE_ALIGN()s are in there.
> >
> > > and include it in -mm.
> >
> > Does it fix a patch in -mm or is it needed in mainline?
>
> The bug in my list was reported to be present in mainline [1].



Confirmed. This bug is present in 2.6.19-rc2

--
Mel Gorman
Part-time Phd Student  Linux Technology Center
University of Limerick IBM Dublin Software Lab

Re: 2.6.18-mm2 boot failure on x86-64

2006-10-16 Thread Vivek Goyal

On Mon, Oct 09, 2006 at 10:53:58AM +0100, Mel Gorman wrote:
> On Fri, 6 Oct 2006, Vivek Goyal wrote:
>
> > On Fri, Oct 06, 2006 at 01:03:50PM -0500, Steve Fox wrote:
> > > On Fri, 2006-10-06 at 18:11 +0100, Mel Gorman wrote:
> > > > On (06/10/06 11:36), Vivek Goyal didst pronounce:
> > > > > Where is bss placed in physical memory? I guess bss_start and bss_stop
> > > > > from System.map will tell us. That will confirm that above memset step is
> > > > > stomping over bss. Then we have to just find that somewhere probably
> > > > > we allocated wrong physical memory area for bootmem allocator map.
> > > >
> > > > BSS is at 0x643000 - 0x777BC4
> > > > init_bootmem wipes from 0x777000 - 0x8F7000
> > > >
> > > > So the BSS bytes from 0x777000 - 0x777BC4 (which looks very suspiciously
> > > > like a page alignment of addr & PAGE_MASK) get set to 0xFF. One possible
> > > > fix is below. It adds a check in bad_addr() to see if the BSS section is
> > > > about to be used for bootmap. It Seems To Work For Me (tm) and illustrates
> > > > the source of the problem even if it's not the 100% correct fix.
> > >
> > > I was able to boot the machine with Mel's patch applied on top of
> > > -git22.
> >
> > Please have a look at the attached patch. Does it make some sense.
>
> It makes some sense. As you state, it wastes memory but that is better
> than breaking.
>
> > Steve, can you please give this patch a try if it fixes the problem?
>
> I boottested the patch on the same machine as Steve was using and it
> completed successfully.


Hi Andrew,

Can you please have a look at the attached patch and include it in -mm?
This fixes the issue for Steve. It also appears in Adrian Bunk's list of
known regressions:

Subject: oops in xfrm_register_mode
References : http://lkml.org/lkml/2006/10/4/170
Submitter  : Steve Fox [EMAIL PROTECTED]
Handled-By : Vivek Goyal [EMAIL PROTECTED]
Status : patch available



o Currently some pieces of code assume that the address returned by
  find_e820_area() is page aligned. But it looks like find_e820_area() had
  no such intention, and hence one might end up stomping over some data.
  One such case is the bootmem allocator initialization code stomping
  over bss.

o This patch modifies find_e820_area() to return a page-aligned address.
  This might be a little wasteful of memory, but page-aligned memory is
  probably easier to handle.

Signed-off-by: Vivek Goyal [EMAIL PROTECTED]
---

 arch/x86_64/kernel/e820.c |   14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff -puN arch/x86_64/kernel/e820.c~x86_64-return-page-aligned-phy-addr-from-find-e820-area arch/x86_64/kernel/e820.c
--- linux-2.6.19-rc1-1M/arch/x86_64/kernel/e820.c~x86_64-return-page-aligned-phy-addr-from-find-e820-area	2006-10-06 15:28:13.0 -0400
+++ linux-2.6.19-rc1-1M-root/arch/x86_64/kernel/e820.c	2006-10-06 15:44:45.0 -0400
@@ -54,13 +54,13 @@ static inline int bad_addr(unsigned long
 
 	/* various gunk below that needed for SMP startup */
 	if (addr < 0x8000) {
-		*addrp = 0x8000;
+		*addrp = PAGE_ALIGN(0x8000);
 		return 1;
 	}
 
 	/* direct mapping tables of the kernel */
 	if (last >= table_start<<PAGE_SHIFT && addr < table_end<<PAGE_SHIFT) {
-		*addrp = table_end << PAGE_SHIFT;
+		*addrp = PAGE_ALIGN(table_end << PAGE_SHIFT);
 		return 1;
 	}
 
@@ -68,18 +68,18 @@ static inline int bad_addr(unsigned long
 #ifdef CONFIG_BLK_DEV_INITRD
 	if (LOADER_TYPE && INITRD_START && last >= INITRD_START &&
 	    addr < INITRD_START+INITRD_SIZE) {
-		*addrp = INITRD_START + INITRD_SIZE;
+		*addrp = PAGE_ALIGN(INITRD_START + INITRD_SIZE);
 		return 1;
 	}
 #endif
 	/* kernel code */
-	if (last >= __pa_symbol(&_text) && last < __pa_symbol(&_end)) {
-		*addrp = __pa_symbol(&_end);
+	if (last >= __pa_symbol(&_text) && addr < __pa_symbol(&_end)) {
+		*addrp = PAGE_ALIGN(__pa_symbol(&_end));
 		return 1;
 	}
 
 	if (last >= ebda_addr && addr < ebda_addr + ebda_size) {
-		*addrp = ebda_addr + ebda_size;
+		*addrp = PAGE_ALIGN(ebda_addr + ebda_size);
 		return 1;
 	}
 
@@ -152,7 +152,7 @@ unsigned long __init find_e820_area(unsi
 			continue;
 		while (bad_addr(&addr, size) && addr+size <= ei->addr+ei->size)
 			;
-		last = addr + size;
+		last = PAGE_ALIGN(addr) + size;
 		if (last > ei->addr + ei->size)
 			continue;
 		if (last > end)
_

Re: 2.6.18-mm2 boot failure on x86-64

2006-10-16 Thread Andrew Morton

On Mon, 16 Oct 2006 14:16:13 -0400
Vivek Goyal [EMAIL PROTECTED] wrote:

 
> Can you please have a look at the attached patch

Looks like a fine patch to me, although it could benefit from a comment
explaining why all those PAGE_ALIGN()s are in there.

> and include it in -mm.

Does it fix a patch in -mm or is it needed in mainline?



Re: 2.6.18-mm2 boot failure on x86-64

2006-10-09 Thread Mel Gorman


On Fri, 6 Oct 2006, Vivek Goyal wrote:

> On Fri, Oct 06, 2006 at 01:03:50PM -0500, Steve Fox wrote:
> > On Fri, 2006-10-06 at 18:11 +0100, Mel Gorman wrote:
> > > On (06/10/06 11:36), Vivek Goyal didst pronounce:
> > > > Where is bss placed in physical memory? I guess bss_start and bss_stop
> > > > from System.map will tell us. That will confirm that above memset step is
> > > > stomping over bss. Then we have to just find that somewhere probably
> > > > we allocated wrong physical memory area for bootmem allocator map.
> > >
> > > BSS is at 0x643000 - 0x777BC4
> > > init_bootmem wipes from 0x777000 - 0x8F7000
> > >
> > > So the BSS bytes from 0x777000 - 0x777BC4 (which looks very suspiciously
> > > like a page alignment of addr & PAGE_MASK) get set to 0xFF. One possible
> > > fix is below. It adds a check in bad_addr() to see if the BSS section is
> > > about to be used for bootmap. It Seems To Work For Me (tm) and illustrates
> > > the source of the problem even if it's not the 100% correct fix.
> >
> > I was able to boot the machine with Mel's patch applied on top of
> > -git22.
>
> Please have a look at the attached patch. Does it make some sense.

It makes some sense. As you state, it wastes memory but that is better
than breaking.

> Steve, can you please give this patch a try if it fixes the problem?

I boottested the patch on the same machine as Steve was using and it
completed successfully.

> Thanks
> Vivek




> o Currently some pieces of code assume that the address returned by
>   find_e820_area() is page aligned. But it looks like find_e820_area() had
>   no such intention, and hence one might end up stomping over some data.
>   One such case is the bootmem allocator initialization code stomping
>   over bss.
>
> o This patch modifies find_e820_area() to return a page-aligned address.
>   This might be a little wasteful of memory, but page-aligned memory is
>   probably easier to handle.
>
> Signed-off-by: Vivek Goyal [EMAIL PROTECTED]
> ---
>
>  arch/x86_64/kernel/e820.c |   14 +++---
>  1 file changed, 7 insertions(+), 7 deletions(-)
>
> diff -puN arch/x86_64/kernel/e820.c~x86_64-return-page-aligned-phy-addr-from-find-e820-area arch/x86_64/kernel/e820.c
> --- linux-2.6.19-rc1-1M/arch/x86_64/kernel/e820.c~x86_64-return-page-aligned-phy-addr-from-find-e820-area	2006-10-06 15:28:13.0 -0400
> +++ linux-2.6.19-rc1-1M-root/arch/x86_64/kernel/e820.c	2006-10-06 15:44:45.0 -0400
> @@ -54,13 +54,13 @@ static inline int bad_addr(unsigned long
>  
>  	/* various gunk below that needed for SMP startup */
>  	if (addr < 0x8000) {
> -		*addrp = 0x8000;
> +		*addrp = PAGE_ALIGN(0x8000);
>  		return 1;
>  	}
>  
>  	/* direct mapping tables of the kernel */
>  	if (last >= table_start<<PAGE_SHIFT && addr < table_end<<PAGE_SHIFT) {
> -		*addrp = table_end << PAGE_SHIFT;
> +		*addrp = PAGE_ALIGN(table_end << PAGE_SHIFT);
>  		return 1;
>  	}
>  
> @@ -68,18 +68,18 @@ static inline int bad_addr(unsigned long
>  #ifdef CONFIG_BLK_DEV_INITRD
>  	if (LOADER_TYPE && INITRD_START && last >= INITRD_START &&
>  	    addr < INITRD_START+INITRD_SIZE) {
> -		*addrp = INITRD_START + INITRD_SIZE;
> +		*addrp = PAGE_ALIGN(INITRD_START + INITRD_SIZE);
>  		return 1;
>  	}
>  #endif
>  	/* kernel code */
> -	if (last >= __pa_symbol(&_text) && last < __pa_symbol(&_end)) {
> -		*addrp = __pa_symbol(&_end);
> +	if (last >= __pa_symbol(&_text) && addr < __pa_symbol(&_end)) {
> +		*addrp = PAGE_ALIGN(__pa_symbol(&_end));
>  		return 1;
>  	}
>  
>  	if (last >= ebda_addr && addr < ebda_addr + ebda_size) {
> -		*addrp = ebda_addr + ebda_size;
> +		*addrp = PAGE_ALIGN(ebda_addr + ebda_size);
>  		return 1;
>  	}
>  
> @@ -152,7 +152,7 @@ unsigned long __init find_e820_area(unsi
>  			continue;
>  		while (bad_addr(&addr, size) && addr+size <= ei->addr+ei->size)
>  			;
> -		last = addr + size;
> +		last = PAGE_ALIGN(addr) + size;
>  		if (last > ei->addr + ei->size)
>  			continue;
>  		if (last > end)
> _



--
Mel Gorman
Part-time Phd Student  Linux Technology Center
University of Limerick IBM Dublin Software Lab

Re: 2.6.18-mm2 boot failure on x86-64

2006-10-06 Thread Vivek Goyal

On Fri, Oct 06, 2006 at 03:33:12PM +0100, Mel Gorman wrote:
> > Linux version 2.6.18-git22 ([EMAIL PROTECTED]) (gcc version 4.1.0 (SUSE
> > Linux)) #2 SMP Thu Oct 5 19:05:36 PDT 2006
> > Command line: root=/dev/sda1 vga=791
> > ip=9.47.67.239:9.47.67.50:9.47.67.1:255.255.255.0 resume=/dev/sdb1 showopts
> > earlyprintk=serial,ttyS0,57600 console=tty0 console=ttyS0,57600
> > autobench_args: root=/dev/sda1 ABAT:1160100417
> > BIOS-provided physical RAM map:
> >  BIOS-e820:  - 0009ac00 (usable)
> >  BIOS-e820: 0009ac00 - 000a (reserved)
> >  BIOS-e820: 000e - 0010 (reserved)
> >  BIOS-e820: 0010 - bff764c0 (usable)
> >  BIOS-e820: bff764c0 - bff98880 (ACPI data)
> >  BIOS-e820: bff98880 - c000 (reserved)
> >  BIOS-e820: fec0 - 0001 (reserved)
> >  BIOS-e820: 0001 - 000c (usable)
>
> I continued what Steve was doing this morning to see could this be
> pinned down. After placing 'CHECK;' in a few places as suggested by
> Andi's check, the problem code was identified as that following in
> mm/bootmem.c#init_bootmem_core()
>
> mapsize = get_mapsize(bdata);
> memset(bdata->node_bootmem_map, 0xff, mapsize);
>
> That explains the value in the array at least. A few more printfs around
> this point printed out the following in the boot log
>
> init_bootmem_core(0, 1909, 0, 12582912)
> init_bootmem_core: Calling memset(0x81775000, 1572864)
> AAGH: afinfo corrupted at mm/bootmem.c:121
>
> where;
>
> 1909 == mapstart
> 0 == start
> 12582912 == end
> 1572864 == mapsize
>
> mapstart, start and end being the parameters being passed to
> init_bootmem_core(). This means we are calling memset for the physical
> range 0x775000 - 0x8F5000 which is in a usable range according to the
> BIOS-e820 map it appears.
 

Hi Mel,

Where is bss placed in physical memory? I guess bss_start and bss_stop
from System.map will tell us. That will confirm that above memset step is
stomping over bss. Then we have to just find that somewhere probably
we allocated wrong physical memory area for bootmem allocator map.

Thanks
Vivek


Re: 2.6.18-mm2 boot failure on x86-64

2006-10-06 Thread Vivek Goyal

On Fri, Oct 06, 2006 at 06:11:05PM +0100, Mel Gorman wrote:
> On (06/10/06 11:36), Vivek Goyal didst pronounce:
> > On Fri, Oct 06, 2006 at 03:33:12PM +0100, Mel Gorman wrote:
> > > > Linux version 2.6.18-git22 ([EMAIL PROTECTED]) (gcc version 4.1.0 (SUSE
> > > > Linux)) #2 SMP Thu Oct 5 19:05:36 PDT 2006
> > > > Command line: root=/dev/sda1 vga=791
> > > > ip=9.47.67.239:9.47.67.50:9.47.67.1:255.255.255.0 resume=/dev/sdb1
> > > > showopts earlyprintk=serial,ttyS0,57600 console=tty0
> > > > console=ttyS0,57600 autobench_args: root=/dev/sda1 ABAT:1160100417
> > > > BIOS-provided physical RAM map:
> > > >  BIOS-e820:  - 0009ac00 (usable)
> > > >  BIOS-e820: 0009ac00 - 000a (reserved)
> > > >  BIOS-e820: 000e - 0010 (reserved)
> > > >  BIOS-e820: 0010 - bff764c0 (usable)
> > > >  BIOS-e820: bff764c0 - bff98880 (ACPI data)
> > > >  BIOS-e820: bff98880 - c000 (reserved)
> > > >  BIOS-e820: fec0 - 0001 (reserved)
> > > >  BIOS-e820: 0001 - 000c (usable)
> > >
> > > I continued what Steve was doing this morning to see could this be
> > > pinned down. After placing 'CHECK;' in a few places as suggested by
> > > Andi's check, the problem code was identified as that following in
> > > mm/bootmem.c#init_bootmem_core()
> > >
> > > mapsize = get_mapsize(bdata);
> > > memset(bdata->node_bootmem_map, 0xff, mapsize);
> > >
> > > That explains the value in the array at least. A few more printfs around
> > > this point printed out the following in the boot log
> > >
> > > init_bootmem_core(0, 1909, 0, 12582912)
> > > init_bootmem_core: Calling memset(0x81775000, 1572864)
> > > AAGH: afinfo corrupted at mm/bootmem.c:121
> > >
> > > where;
> > >
> > > 1909 == mapstart
> > > 0 == start
> > > 12582912 == end
> > > 1572864 == mapsize
> > >
> > > mapstart, start and end being the parameters being passed to
> > > init_bootmem_core(). This means we are calling memset for the physical
> > > range 0x775000 - 0x8F5000 which is in a usable range according to the
> > > BIOS-e820 map it appears.
> >
> > Hi Mel,
>
> Hi.
>
> > Where is bss placed in physical memory? I guess bss_start and bss_stop
> > from System.map will tell us. That will confirm that above memset step is
> > stomping over bss. Then we have to just find that somewhere probably
> > we allocated wrong physical memory area for bootmem allocator map.
>
> BSS is at 0x643000 - 0x777BC4
> init_bootmem wipes from 0x777000 - 0x8F7000
>
> So the BSS bytes from 0x777000 - 0x777BC4 (which looks very suspiciously
> like a page alignment of addr & PAGE_MASK) get set to 0xFF. One possible
> fix is below. It adds a check in bad_addr() to see if the BSS section is
> about to be used for bootmap. It Seems To Work For Me (tm) and illustrates
> the source of the problem even if it's not the 100% correct fix.
>
> diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.18-git22-clean/arch/x86_64/kernel/e820.c linux-2.6.18-git22-bss_relocate_fix/arch/x86_64/kernel/e820.c
> --- linux-2.6.18-git22-clean/arch/x86_64/kernel/e820.c	2006-10-05 20:42:07.0 +0100
> +++ linux-2.6.18-git22-bss_relocate_fix/arch/x86_64/kernel/e820.c	2006-10-06 17:39:51.0 +0100
> @@ -51,6 +51,7 @@ extern struct resource code_resource, da
>  static inline int bad_addr(unsigned long *addrp, unsigned long size)
>  {
>  	unsigned long addr = *addrp, last = addr + size;
> +	unsigned long bss_start, bss_end;
>  
>  	/* various gunk below that needed for SMP startup */
>  	if (addr < 0x8000) {
> @@ -77,6 +78,14 @@ static inline int bad_addr(unsigned long
>  		*addrp = __pa_symbol(&_end);
>  		return 1;
>  	}
> +
> +	/* bss section */
> +	bss_start = __pa_symbol(__bss_start);
> +	bss_end = PAGE_ALIGN(__pa_symbol(__bss_stop));
> +	if (addr >= bss_start && addr < bss_end) {
> +		*addrp = bss_end;
> +		return 1;
> +	}
>  
  

Surprising, the kernel code check just before this should have taken care
of it.

	/* kernel code */
	if (last >= __pa_symbol(&_text) && last < __pa_symbol(&_end)) {
		*addrp = __pa_symbol(&_end);
		return 1;
	}

Maybe it can be changed to

	if (last >= __pa_symbol(&_text) &&
	    last < PAGE_ALIGN(__pa_symbol(&_end))) {

But all this seems to be a stopgap fix. The real puzzle is still exactly
where it slipped through, and it should be fixed there.

Maybe some more printks will help us.

Thanks
Vivek

Re: 2.6.18-mm2 boot failure on x86-64

2006-10-06 Thread Vivek Goyal

On Fri, Oct 06, 2006 at 06:11:05PM +0100, Mel Gorman wrote:
> On (06/10/06 11:36), Vivek Goyal didst pronounce:
> > On Fri, Oct 06, 2006 at 03:33:12PM +0100, Mel Gorman wrote:
> > > > Linux version 2.6.18-git22 ([EMAIL PROTECTED]) (gcc version 4.1.0 (SUSE
> > > > Linux)) #2 SMP Thu Oct 5 19:05:36 PDT 2006
> > > > Command line: root=/dev/sda1 vga=791
> > > > ip=9.47.67.239:9.47.67.50:9.47.67.1:255.255.255.0 resume=/dev/sdb1
> > > > showopts earlyprintk=serial,ttyS0,57600 console=tty0
> > > > console=ttyS0,57600 autobench_args: root=/dev/sda1 ABAT:1160100417
> > > > BIOS-provided physical RAM map:
> > > >  BIOS-e820:  - 0009ac00 (usable)
> > > >  BIOS-e820: 0009ac00 - 000a (reserved)
> > > >  BIOS-e820: 000e - 0010 (reserved)
> > > >  BIOS-e820: 0010 - bff764c0 (usable)
> > > >  BIOS-e820: bff764c0 - bff98880 (ACPI data)
> > > >  BIOS-e820: bff98880 - c000 (reserved)
> > > >  BIOS-e820: fec0 - 0001 (reserved)
> > > >  BIOS-e820: 0001 - 000c (usable)
> > >
> > > I continued what Steve was doing this morning to see could this be
> > > pinned down. After placing 'CHECK;' in a few places as suggested by
> > > Andi's check, the problem code was identified as that following in
> > > mm/bootmem.c#init_bootmem_core()
> > >
> > > mapsize = get_mapsize(bdata);
> > > memset(bdata->node_bootmem_map, 0xff, mapsize);
> > >
> > > That explains the value in the array at least. A few more printfs around
> > > this point printed out the following in the boot log
> > >
> > > init_bootmem_core(0, 1909, 0, 12582912)
> > > init_bootmem_core: Calling memset(0x81775000, 1572864)
> > > AAGH: afinfo corrupted at mm/bootmem.c:121
> > >
> > > where;
> > >
> > > 1909 == mapstart
> > > 0 == start
> > > 12582912 == end
> > > 1572864 == mapsize
> > >
> > > mapstart, start and end being the parameters being passed to
> > > init_bootmem_core(). This means we are calling memset for the physical
> > > range 0x775000 - 0x8F5000 which is in a usable range according to the
> > > BIOS-e820 map it appears.
> >
> > Hi Mel,
>
> Hi.
>
> > Where is bss placed in physical memory? I guess bss_start and bss_stop
> > from System.map will tell us. That will confirm that above memset step is
> > stomping over bss. Then we have to just find that somewhere probably
> > we allocated wrong physical memory area for bootmem allocator map.
>
> BSS is at 0x643000 - 0x777BC4
> init_bootmem wipes from 0x777000 - 0x8F7000
>
> So the BSS bytes from 0x777000 - 0x777BC4 (which looks very suspiciously
> like a page alignment of addr & PAGE_MASK) get set to 0xFF. One possible
> fix is below. It adds a check in bad_addr() to see if the BSS section is
> about to be used for bootmap. It Seems To Work For Me (tm) and illustrates
> the source of the problem even if it's not the 100% correct fix.
 

Ok, it looks like that code is assuming that the memory area returned by
find_e820_area() is page aligned. I found two such instances, and that's
what is leading to the problem.

	bootmap_size = init_bootmem_node(NODE_DATA(nodeid),
					 bootmap_start >> PAGE_SHIFT,
					 start_pfn, end_pfn);

Here bootmap_start is not page aligned and I guess it currently contains
the value 0x777BC4 (just beyond _end). But the moment I do
bootmap_start >> PAGE_SHIFT, I start stomping over bss.

The case is similar here.

	bootmap = find_e820_area(0, end_pfn<<PAGE_SHIFT, bootmap_size);
	if (bootmap == -1L)
		panic("Cannot find bootmem map of size %ld\n",bootmap_size);
	bootmap_size = init_bootmem(bootmap >> PAGE_SHIFT, end_pfn);

So maybe we should return a page-aligned address from find_e820_area().
Maybe we can change bad_addr() to set *addrp to the next page-aligned
boundary for every check?

	*addrp = PAGE_ALIGN(__pa_symbol(&_end));

Thanks
Vivek

Re: 2.6.18-mm2 boot failure on x86-64

2006-10-06 Thread Steve Fox

On Fri, 2006-10-06 at 18:11 +0100, Mel Gorman wrote:
> On (06/10/06 11:36), Vivek Goyal didst pronounce:
> > Where is bss placed in physical memory? I guess bss_start and bss_stop
> > from System.map will tell us. That will confirm that above memset step is
> > stomping over bss. Then we have to just find that somewhere probably
> > we allocated wrong physical memory area for bootmem allocator map.
>
> BSS is at 0x643000 - 0x777BC4
> init_bootmem wipes from 0x777000 - 0x8F7000
>
> So the BSS bytes from 0x777000 - 0x777BC4 (which looks very suspiciously
> like a page alignment of addr & PAGE_MASK) get set to 0xFF. One possible
> fix is below. It adds a check in bad_addr() to see if the BSS section is
> about to be used for bootmap. It Seems To Work For Me (tm) and illustrates
> the source of the problem even if it's not the 100% correct fix.

I was able to boot the machine with Mel's patch applied on top of
-git22.

-- 

Steve Fox
IBM Linux Technology Center

Re: 2.6.18-mm2 boot failure on x86-64

2006-10-06 Thread Vivek Goyal

On Fri, Oct 06, 2006 at 01:03:50PM -0500, Steve Fox wrote:
> On Fri, 2006-10-06 at 18:11 +0100, Mel Gorman wrote:
> > On (06/10/06 11:36), Vivek Goyal didst pronounce:
> > > Where is bss placed in physical memory? I guess bss_start and bss_stop
> > > from System.map will tell us. That will confirm that above memset step is
> > > stomping over bss. Then we have to just find that somewhere probably
> > > we allocated wrong physical memory area for bootmem allocator map.
> >
> > BSS is at 0x643000 - 0x777BC4
> > init_bootmem wipes from 0x777000 - 0x8F7000
> >
> > So the BSS bytes from 0x777000 - 0x777BC4 (which looks very suspiciously
> > like a page alignment of addr & PAGE_MASK) get set to 0xFF. One possible
> > fix is below. It adds a check in bad_addr() to see if the BSS section is
> > about to be used for bootmap. It Seems To Work For Me (tm) and illustrates
> > the source of the problem even if it's not the 100% correct fix.
>
> I was able to boot the machine with Mel's patch applied on top of
> -git22.


Please have a look at the attached patch. Does it make some sense. 

Steve, can you please give this patch a try if it fixes the problem?

Thanks
Vivek




o Currently some pieces of code assume that the address returned by
  find_e820_area() is page aligned. But it looks like find_e820_area() had
  no such intention, and hence one might end up stomping over some data.
  One such case is the bootmem allocator initialization code stomping
  over bss.

o This patch modifies find_e820_area() to return a page-aligned address.
  This might be a little wasteful of memory, but page-aligned memory is
  probably easier to handle.

Signed-off-by: Vivek Goyal [EMAIL PROTECTED]
---

 arch/x86_64/kernel/e820.c |   14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff -puN arch/x86_64/kernel/e820.c~x86_64-return-page-aligned-phy-addr-from-find-e820-area arch/x86_64/kernel/e820.c
--- linux-2.6.19-rc1-1M/arch/x86_64/kernel/e820.c~x86_64-return-page-aligned-phy-addr-from-find-e820-area	2006-10-06 15:28:13.0 -0400
+++ linux-2.6.19-rc1-1M-root/arch/x86_64/kernel/e820.c	2006-10-06 15:44:45.0 -0400
@@ -54,13 +54,13 @@ static inline int bad_addr(unsigned long
 
 	/* various gunk below that needed for SMP startup */
 	if (addr < 0x8000) {
-		*addrp = 0x8000;
+		*addrp = PAGE_ALIGN(0x8000);
 		return 1;
 	}
 
 	/* direct mapping tables of the kernel */
 	if (last >= table_start<<PAGE_SHIFT && addr < table_end<<PAGE_SHIFT) {
-		*addrp = table_end << PAGE_SHIFT;
+		*addrp = PAGE_ALIGN(table_end << PAGE_SHIFT);
 		return 1;
 	}
 
@@ -68,18 +68,18 @@ static inline int bad_addr(unsigned long
 #ifdef CONFIG_BLK_DEV_INITRD
 	if (LOADER_TYPE && INITRD_START && last >= INITRD_START &&
 	    addr < INITRD_START+INITRD_SIZE) {
-		*addrp = INITRD_START + INITRD_SIZE;
+		*addrp = PAGE_ALIGN(INITRD_START + INITRD_SIZE);
 		return 1;
 	}
 #endif
 	/* kernel code */
-	if (last >= __pa_symbol(&_text) && last < __pa_symbol(&_end)) {
-		*addrp = __pa_symbol(&_end);
+	if (last >= __pa_symbol(&_text) && addr < __pa_symbol(&_end)) {
+		*addrp = PAGE_ALIGN(__pa_symbol(&_end));
 		return 1;
 	}
 
 	if (last >= ebda_addr && addr < ebda_addr + ebda_size) {
-		*addrp = ebda_addr + ebda_size;
+		*addrp = PAGE_ALIGN(ebda_addr + ebda_size);
 		return 1;
 	}
 
@@ -152,7 +152,7 @@ unsigned long __init find_e820_area(unsi
 			continue;
 		while (bad_addr(&addr, size) && addr+size <= ei->addr+ei->size)
 			;
-		last = addr + size;
+		last = PAGE_ALIGN(addr) + size;
 		if (last > ei->addr + ei->size)
 			continue;
 		if (last > end)
_

Re: 2.6.18-mm2 boot failure on x86-64

2006-10-05 Thread Steve Fox

On Wed, 2006-10-04 at 18:08 -0700, Martin Bligh wrote:
> Andi Kleen wrote:
> > > I think most likely it would crash on 2.6.18. Keith mannthey had reported
> > > a different crash on 2.6.18-rc4-mm2 when this patch was introduced first
> > > time. Following is the link to the thread.
> >
> > Then maybe trying 2.6.17 + the patch and then bisect between that and -rc4?
>
> I think it's fixed already in -git22, or at least it is for the IBM box
> reporting to test.kernel.org. You might want to try that one ...

-git22 also panics for me.

-- 

Steve Fox
IBM Linux Technology Center
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: 2.6.18-mm2 boot failure on x86-64

2006-10-05 Thread Badari Pulavarty

On Thu, 2006-10-05 at 09:53 -0500, Steve Fox wrote:
> On Wed, 2006-10-04 at 18:08 -0700, Martin Bligh wrote:
> > Andi Kleen wrote:
> > > > I think most likely it would crash on 2.6.18. Keith mannthey had reported
> > > > a different crash on 2.6.18-rc4-mm2 when this patch was introduced first
> > > > time. Following is the link to the thread.
> > >
> > > Then maybe trying 2.6.17 + the patch and then bisect between that and
> > > -rc4?
> >
> > I think it's fixed already in -git22, or at least it is for the IBM box
> > reporting to test.kernel.org. You might want to try that one ...
>
> -git22 also panics for me.
 

Steve,

Can you post the latest panic stack again (with CONFIG_DEBUG_KERNEL) ? 
Last time I couldn't match your instruction dump to any code segment
in the routine. And also, can you post your .config file. I have
an amd64 and em64t machine and both work fine...

Thanks,
Badari


Re: 2.6.18-mm2 boot failure on x86-64

2006-10-05 Thread Steve Fox

On Thu, 2006-10-05 at 08:12 -0700, Badari Pulavarty wrote:

> Can you post the latest panic stack again (with CONFIG_DEBUG_KERNEL) ?

CONFIG_DEBUG_KERNEL should be on

> Last time I couldn't match your instruction dump to any code segment
> in the routine. And also, can you post your .config file. I have
> an amd64 and em64t machine and both work fine...

Unable to handle kernel NULL pointer dereference at 0827 RIP:
 [804705e6] xfrm_register_mode+0x36/0x60
PGD 0
Oops:  [1] SMP
CPU 0
Modules linked in:
Pid: 1, comm: swapper Not tainted 2.6.18-git22 #1
RIP: 0010:[804705e6]  [804705e6] 
xfrm_register_mode+0x36/0x60
RSP: :810bffcbded0  EFLAGS: 00010286
RAX: 081f RBX: 805588a0 RCX: 
RDX:  RSI: 0002 RDI: 80559550
RBP: ffef R08: 3f924371 R09: 
R10: 810bffcbdcb0 R11: 0154 R12: 
R13: 810bffcbdef0 R14:  R15: 
FS:  () GS:805d2000() knlGS:
CS:  0010 DS: 0018 ES: 0018 CR0: 8005003b
CR2: 0827 CR3: 00201000 CR4: 06e0
Process swapper (pid: 1, threadinfo 810bffcbc000, task 810bffcbb4e0)
Stack:   8061fb48  80207182
    
    0009

The base config file I'm using is at
http://flooterbu.net/kernel/elm3b239-2.6.17.config

-- 

Steve Fox
IBM Linux Technology Center

Re: 2.6.18-mm2 boot failure on x86-64

2006-10-05 Thread Steve Fox

On Thu, 2006-10-05 at 17:40 +0200, Andi Kleen wrote:

> Please don't snip the Code: line. It is fairly important.

Sorry about that. The remote console I was using appears to overwrite
some text after I force the reboot. Here's a clean one.

global 
Unable to handle kernel NULL pointer dereference at 0827 RIP:
 [80470766] xfrm_register_mode+0x36/0x60
PGD 0
Oops:  [1] SMP
CPU 0
Modules linked in:
Pid: 1, comm: swapper Not tainted 2.6.18-git22 #3
RIP: 0010:[80470766]  [80470766] 
xfrm_register_mode+0x36/0x60
RSP: :810bffcbded0  EFLAGS: 00010286
RAX: 081f RBX: 805588a0 RCX: 
RDX:  RSI: 0046 RDI: 80559550
RBP: ffef R08: 7a02 R09: 000e
R10: 0006 R11: 80334660 R12: 
R13: 810bffcbdef0 R14:  R15: 
FS:  () GS:805d2000() knlGS:
CS:  0010 DS: 0018 ES: 0018 CR0: 8005003b
CR2: 0827 CR3: 00201000 CR4: 06e0
Process swapper (pid: 1, threadinfo 810bffcbc000, task 810bffcbb4e0)
Stack:   8061fb48  80207182
    
    0009
Call Trace:
 [80207182] init+0x162/0x330
 [8020a9a8] child_rip+0xa/0x12
 [803394c2] acpi_ds_init_one_object+0x0/0x82
 [80207020] init+0x0/0x330
 [8020a99e] child_rip+0x0/0x12


Code: 48 83 78 08 00 75 06 48 89 58 08 31 ed 48 89 d7 e8 65 fd ff
RIP  [80470766] xfrm_register_mode+0x36/0x60
 RSP 810bffcbded0
CR2: 0827
 0Kernel panic - not syncing: Aiee, killing interrupt handler!

> My guess is that something is wrong with the global variable it is accessing.
> Can you post the output of grep -5 xfrm_policy_afinfo ?

elm3b239:/boot # grep -5 xfrm_policy_afinfo System.map-2.6.18-git22
805594c0 d xfrm4_state_afinfo
80559500 D xfrm_cfg_mutex
80559530 d xfrm_dev_notifier
80559548 d xfrm_policy_lock
8055954c d xfrm_policy_gc_lock
80559550 d xfrm_policy_afinfo_lock
80559560 d xfrm_hash_work
805595c0 d hash_resize_mutex
80559600 D sysctl_xfrm_aevent_etime
80559604 D sysctl_xfrm_aevent_rseqth
80559610 D km_waitq
--
8075bfd8 b idiagnl
8075bfe0 B xfrm_policy_count
8075bff8 b xfrm_policy_gc_list
8075c000 b dummy.28400
8075c038 b idx_generator.27450
8075c040 b xfrm_policy_afinfo
8075c140 b xfrm_policy_gc_work
8075c1a0 b xfrm_policy_inexact
8075c1e0 B xfrm_nl
8075c1e8 b xfrm_state_gc_list
8075c1f0 b acqseq.27386

> And please add a
> printk("global %p\n", xfrm_policy_afinfo[family]);
> at the beginning of net/xfrm/xfrm_policy.c:xfrm_policy_lock_afinfo
> and post the output.

Included above.

-- 

Steve Fox
IBM Linux Technology Center

Re: 2.6.18-mm2 boot failure on x86-64

2006-10-05 Thread Andi Kleen

On Thursday 05 October 2006 19:57, Steve Fox wrote:
> On Thu, 2006-10-05 at 17:40 +0200, Andi Kleen wrote:
>
> > Please don't snip the Code: line. It is fairly important.
>
> Sorry about that. The remote console I was using appears to overwrite
> some text after I force the reboot. Here's a clean one.
>
> global

Ok that definitely shouldn't be in there.

I guess we need to track when it gets corrupted. Can you send the full
boot log with this patch applied?


-Andi

Index: linux-2.6.19-rc1-hack/init/main.c
===
--- linux-2.6.19-rc1-hack.orig/init/main.c
+++ linux-2.6.19-rc1-hack/init/main.c
@@ -75,6 +75,9 @@
 
 static int init(void *);
 
+extern void bugcheck(char *, int);
+#define CHECK bugcheck(__FILE__, __LINE__)
+
 extern void init_IRQ(void);
 extern void fork_init(unsigned long);
 extern void mca_init(void);
@@ -480,6 +483,8 @@ asmlinkage void __init start_kernel(void
char * command_line;
extern struct kernel_param __start___param[], __stop___param[];
 
+   CHECK;
+
smp_setup_processor_id();
 
/*
@@ -502,7 +507,9 @@ asmlinkage void __init start_kernel(void
page_address_init();
printk(KERN_NOTICE);
printk(linux_banner);
+   CHECK;
setup_arch(command_line);
+   CHECK;
setup_per_cpu_areas();
smp_prepare_boot_cpu(); /* arch-specific boot-cpu hooks */
 
@@ -517,6 +524,7 @@ asmlinkage void __init start_kernel(void
 * fragile until we cpu_idle() for the first time.
 */
preempt_disable();
+   CHECK;
build_all_zonelists();
page_alloc_init();
printk(KERN_NOTICE "Kernel command line: %s\n", saved_command_line);
@@ -525,6 +533,7 @@ asmlinkage void __init start_kernel(void
   __stop___param - __start___param,
   unknown_bootoption);
sort_main_extable();
+   CHECK;
trap_init();
rcu_init();
init_IRQ();
@@ -533,8 +542,10 @@ asmlinkage void __init start_kernel(void
hrtimers_init();
softirq_init();
timekeeping_init();
+   CHECK;
time_init();
profile_init();
+   CHECK;
if (!irqs_disabled())
printk("start_kernel(): bug: interrupts were enabled early\n");
early_boot_irqs_on();
@@ -568,7 +579,9 @@ asmlinkage void __init start_kernel(void
 #endif
vfs_caches_init_early();
cpuset_init_early();
+   CHECK;
mem_init();
+   CHECK;
kmem_cache_init();
setup_per_cpu_pageset();
numa_policy_init();
@@ -577,6 +590,7 @@ asmlinkage void __init start_kernel(void
calibrate_delay();
pidmap_init();
pgtable_cache_init();
+   CHECK;
prio_tree_init();
anon_vma_init();
 #ifdef CONFIG_X86
@@ -586,12 +600,14 @@ asmlinkage void __init start_kernel(void
fork_init(num_physpages);
proc_caches_init();
buffer_init();
+   CHECK;
unnamed_dev_init();
key_init();
security_init();
vfs_caches_init(num_physpages);
radix_tree_init();
signals_init();
+   CHECK;
/* rootfs populating might need page-writeback */
page_writeback_init();
 #ifdef CONFIG_PROC_FS
@@ -599,6 +615,7 @@ asmlinkage void __init start_kernel(void
 #endif
cpuset_init();
taskstats_init_early();
+   CHECK;
delayacct_init();
 
check_bugs();
@@ -609,7 +626,7 @@ asmlinkage void __init start_kernel(void
rest_init();
 }
 
-static int __initdata initcall_debug;
+static int __initdata initcall_debug = 1;
 
 static int __init initcall_debug_setup(char *str)
 {
@@ -639,7 +656,11 @@ static void __init do_initcalls(void)
printk("\n");
}
 
+   CHECK;
+
result = (*call)();
+   
+   CHECK;
 
if (result && result != -ENODEV && initcall_debug) {
sprintf(msgbuf, "error code %d", result);
@@ -725,21 +746,32 @@ static int init(void * unused)
 
smp_prepare_cpus(max_cpus);
 
+   CHECK;
+
do_pre_smp_initcalls();
 
smp_init();
+
+   CHECK;
+
sched_init_smp();
 
cpuset_init_smp();
 
+   CHECK;
+
/*
 * Do this before initcalls, because some drivers want to access
 * firmware files.
 */
populate_rootfs();
 
+   CHECK;
+
do_basic_setup();
 
+   CHECK;
+
/*
 * check if there is an early userspace init.  If yes, let it do all
 * the work
Index: linux-2.6.19-rc1-hack/net/xfrm/xfrm_policy.c
===
--- linux-2.6.19-rc1-hack.orig/net/xfrm/xfrm_policy.c
+++ linux-2.6.19-rc1-hack/net/xfrm/xfrm_policy.c
@@ -39,6 +39,16 @@ EXPORT_SYMBOL(xfrm_policy_count);
 static DEFINE_RWLOCK(xfrm_policy_afinfo_lock);
 static struct

Re: 2.6.18-mm2 boot failure on x86-64

2006-10-05 Thread Steve Fox

On Thu, 2006-10-05 at 20:27 +0200, Andi Kleen wrote:

 I guess we need to track when it gets corrupted. Can you send the full
 boot log with this patch applied?

Here she blows!

root (hd0,0)
 Filesystem type is reiserfs, partition type 0x83
kernel /boot/vmlinuz-autobench root=/dev/sda1 vga=791
ip=9.47.67.239:9.47.67.50:9.47.67.1:255.255.255.0 resume=/dev/sdb1
showopts console=tty0 console=ttyS0,57600 autobench_args: root=/dev/sda1
ABAT:1160073474
   [Linux-bzImage, setup=0x1400, size=0x1dd755]
initrd /boot/initrd-autobench.img
   [Linux-initrd @ 0x37ceb000, 0x304c57 bytes]

Linux version 2.6.18-git22 ([EMAIL PROTECTED]) (gcc version 4.1.0 (SUSE
Linux)) #4 SMP Thu Oct 5 11:36:21 PDT 2006
Command line: root=/dev/sda1 vga=791
ip=9.47.67.239:9.47.67.50:9.47.67.1:255.255.255.0 resume=/dev/sdb1
showopts console=tty0 console=ttyS0,57600 autobench_args: root=/dev/sda1
ABAT:1160073474
BIOS-provided physical RAM map:
 BIOS-e820:  - 0009ac00 (usable)
 BIOS-e820: 0009ac00 - 000a (reserved)
 BIOS-e820: 000e - 0010 (reserved)
 BIOS-e820: 0010 - bff764c0 (usable)
 BIOS-e820: bff764c0 - bff98880 (ACPI data)
 BIOS-e820: bff98880 - c000 (reserved)
 BIOS-e820: fec0 - 0001 (reserved)
 BIOS-e820: 0001 - 000c (usable)
end_pfn_map = 12582912
DMI 2.3 present.
Zone PFN ranges:
  DMA             0 ->     4096
  DMA32        4096 ->  1048576
  Normal    1048576 -> 12582912
early_node_map[3] active PFN ranges
    0:        0 ->      154
    0:      256 ->   786294
    0:  1048576 -> 12582912
ACPI: PM-Timer IO Port: 0x9c
ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
Processor #0 (Bootup-CPU)
ACPI: LAPIC (acpi_id[0x01] lapic_id[0x01] enabled)
Processor #1
ACPI: LAPIC (acpi_id[0x02] lapic_id[0x06] enabled)
Processor #6
ACPI: LAPIC (acpi_id[0x03] lapic_id[0x07] enabled)
Processor #7
ACPI: LAPIC (acpi_id[0x04] lapic_id[0x10] enabled)
Processor #16
ACPI: LAPIC (acpi_id[0x05] lapic_id[0x11] enabled)
Processor #17
ACPI: LAPIC (acpi_id[0x06] lapic_id[0x16] enabled)
Processor #22
ACPI: LAPIC (acpi_id[0x07] lapic_id[0x17] enabled)
Processor #23
ACPI: LAPIC (acpi_id[0x10] lapic_id[0x20] enabled)
Processor #32
ACPI: LAPIC (acpi_id[0x11] lapic_id[0x21] enabled)
Processor #33
ACPI: LAPIC (acpi_id[0x12] lapic_id[0x26] enabled)
Processor #38
ACPI: LAPIC (acpi_id[0x13] lapic_id[0x27] enabled)
Processor #39
ACPI: LAPIC (acpi_id[0x14] lapic_id[0x30] enabled)
Processor #48
ACPI: LAPIC (acpi_id[0x15] lapic_id[0x31] enabled)
Processor #49
ACPI: LAPIC (acpi_id[0x16] lapic_id[0x36] enabled)
Processor #54
ACPI: LAPIC (acpi_id[0x17] lapic_id[0x37] enabled)
Processor #55
ACPI: LAPIC (acpi_id[0x20] lapic_id[0x40] enabled)
Processor #64
WARNING: NR_CPUS limit of 16 reached. Processor ignored.
ACPI: LAPIC (acpi_id[0x21] lapic_id[0x41] enabled)
Processor #65
WARNING: NR_CPUS limit of 16 reached. Processor ignored.
ACPI: LAPIC (acpi_id[0x22] lapic_id[0x46] enabled)
Processor #70
WARNING: NR_CPUS limit of 16 reached. Processor ignored.
ACPI: LAPIC (acpi_id[0x23] lapic_id[0x47] enabled)
Processor #71
WARNING: NR_CPUS limit of 16 reached. Processor ignored.
ACPI: LAPIC (acpi_id[0x24] lapic_id[0x50] enabled)
Processor #80
WARNING: NR_CPUS limit of 16 reached. Processor ignored.
ACPI: LAPIC (acpi_id[0x25] lapic_id[0x51] enabled)
Processor #81
WARNING: NR_CPUS limit of 16 reached. Processor ignored.
ACPI: LAPIC (acpi_id[0x26] lapic_id[0x56] enabled)
Processor #86
WARNING: NR_CPUS limit of 16 reached. Processor ignored.
ACPI: LAPIC (acpi_id[0x27] lapic_id[0x57] enabled)
Processor #87
WARNING: NR_CPUS limit of 16 reached. Processor ignored.
ACPI: LAPIC (acpi_id[0x30] lapic_id[0x60] enabled)
Processor #96
WARNING: NR_CPUS limit of 16 reached. Processor ignored.
ACPI: LAPIC (acpi_id[0x31] lapic_id[0x61] enabled)
Processor #97
WARNING: NR_CPUS limit of 16 reached. Processor ignored.
ACPI: LAPIC (acpi_id[0x32] lapic_id[0x66] enabled)
Processor #102
WARNING: NR_CPUS limit of 16 reached. Processor ignored.
ACPI: LAPIC (acpi_id[0x33] lapic_id[0x67] enabled)
Processor #103
WARNING: NR_CPUS limit of 16 reached. Processor ignored.
ACPI: LAPIC (acpi_id[0x34] lapic_id[0x70] enabled)
Processor #112
WARNING: NR_CPUS limit of 16 reached. Processor ignored.
ACPI: LAPIC (acpi_id[0x35] lapic_id[0x71] enabled)
Processor #113
WARNING: NR_CPUS limit of 16 reached. Processor ignored.
ACPI: LAPIC (acpi_id[0x36] lapic_id[0x76] enabled)
Processor #118
WARNING: NR_CPUS limit of 16 reached. Processor ignored.
ACPI: LAPIC (acpi_id[0x37] lapic_id[0x77] enabled)
Processor #119
WARNING: NR_CPUS limit of 16 reached. Processor ignored.
ACPI: LAPIC_NMI (acpi_id[0x00] dfl dfl lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x01] dfl dfl lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x02] dfl dfl lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x03] dfl dfl lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x04] dfl dfl lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x05] dfl dfl lint[0x1])

Re: 2.6.18-mm2 boot failure on x86-64

2006-10-05 Thread Vivek Goyal

On Thu, Oct 05, 2006 at 08:27:02PM +0200, Andi Kleen wrote:
 On Thursday 05 October 2006 19:57, Steve Fox wrote:
  On Thu, 2006-10-05 at 17:40 +0200, Andi Kleen wrote:
  
   Please don't snip the Code: line. It is fairly important.
  
  Sorry about that. The remote console I was using appears to overwrite
  some text after I force the reboot. Here's a clean one.
  
  global 
 
 Ok that definitely shouldn't be in there.
 
 I guess we need to track when it gets corrupted. Can you send the full
 boot log with this patch applied?
 

Just recalled one more observation about the problem when keith had
reported it last. If I just move .bss before .data_nosave instead
of it being at the end, keith's problem had disappeared.

Thanks
Vivek

Re: 2.6.18-mm2 boot failure on x86-64

2006-10-05 Thread Andi Kleen

On Thursday 05 October 2006 20:51, Steve Fox wrote:
 On Thu, 2006-10-05 at 20:27 +0200, Andi Kleen wrote:
 
  I guess we need to track when it gets corrupted. Can you send the full
  boot log with this patch applied?
 
 Here she blows!

Can you please try it again with this patch to narrow it down further?

-Andi

Index: linux-2.6.19-rc1-hack/init/main.c
===
--- linux-2.6.19-rc1-hack.orig/init/main.c
+++ linux-2.6.19-rc1-hack/init/main.c
@@ -75,6 +75,9 @@
 
 static int init(void *);
 
+extern void bugcheck(char *, int);
+#define CHECK bugcheck(__FILE__, __LINE__)
+
 extern void init_IRQ(void);
 extern void fork_init(unsigned long);
 extern void mca_init(void);
@@ -480,6 +483,8 @@ asmlinkage void __init start_kernel(void
char * command_line;
extern struct kernel_param __start___param[], __stop___param[];
 
+   CHECK;
+
smp_setup_processor_id();
 
/*
@@ -502,7 +507,9 @@ asmlinkage void __init start_kernel(void
page_address_init();
printk(KERN_NOTICE);
printk(linux_banner);
+   CHECK;
setup_arch(command_line);
+   CHECK;
setup_per_cpu_areas();
smp_prepare_boot_cpu(); /* arch-specific boot-cpu hooks */
 
@@ -517,6 +524,7 @@ asmlinkage void __init start_kernel(void
 * fragile until we cpu_idle() for the first time.
 */
preempt_disable();
+   CHECK;
build_all_zonelists();
page_alloc_init();
printk(KERN_NOTICE "Kernel command line: %s\n", saved_command_line);
@@ -525,6 +533,7 @@ asmlinkage void __init start_kernel(void
   __stop___param - __start___param,
   unknown_bootoption);
sort_main_extable();
+   CHECK;
trap_init();
rcu_init();
init_IRQ();
@@ -533,8 +542,10 @@ asmlinkage void __init start_kernel(void
hrtimers_init();
softirq_init();
timekeeping_init();
+   CHECK;
time_init();
profile_init();
+   CHECK;
if (!irqs_disabled())
printk("start_kernel(): bug: interrupts were enabled early\n");
early_boot_irqs_on();
@@ -568,7 +579,9 @@ asmlinkage void __init start_kernel(void
 #endif
vfs_caches_init_early();
cpuset_init_early();
+   CHECK;
mem_init();
+   CHECK;
kmem_cache_init();
setup_per_cpu_pageset();
numa_policy_init();
@@ -577,6 +590,7 @@ asmlinkage void __init start_kernel(void
calibrate_delay();
pidmap_init();
pgtable_cache_init();
+   CHECK;
prio_tree_init();
anon_vma_init();
 #ifdef CONFIG_X86
@@ -586,12 +600,14 @@ asmlinkage void __init start_kernel(void
fork_init(num_physpages);
proc_caches_init();
buffer_init();
+   CHECK;
unnamed_dev_init();
key_init();
security_init();
vfs_caches_init(num_physpages);
radix_tree_init();
signals_init();
+   CHECK;
/* rootfs populating might need page-writeback */
page_writeback_init();
 #ifdef CONFIG_PROC_FS
@@ -599,6 +615,7 @@ asmlinkage void __init start_kernel(void
 #endif
cpuset_init();
taskstats_init_early();
+   CHECK;
delayacct_init();
 
check_bugs();
@@ -609,7 +626,7 @@ asmlinkage void __init start_kernel(void
rest_init();
 }
 
-static int __initdata initcall_debug;
+static int __initdata initcall_debug = 1;
 
 static int __init initcall_debug_setup(char *str)
 {
@@ -639,7 +656,11 @@ static void __init do_initcalls(void)
printk("\n");
}
 
+   CHECK;
+
result = (*call)();
+   
+   CHECK;
 
if (result && result != -ENODEV && initcall_debug) {
sprintf(msgbuf, "error code %d", result);
@@ -725,21 +746,32 @@ static int init(void * unused)
 
smp_prepare_cpus(max_cpus);
 
+   CHECK;
+
do_pre_smp_initcalls();
 
smp_init();
+
+   CHECK;
+
sched_init_smp();
 
cpuset_init_smp();
 
+   CHECK;
+
/*
 * Do this before initcalls, because some drivers want to access
 * firmware files.
 */
populate_rootfs();
 
+   CHECK;
+
do_basic_setup();
 
+   CHECK;
+
/*
 * check if there is an early userspace init.  If yes, let it do all
 * the work
Index: linux-2.6.19-rc1-hack/net/xfrm/xfrm_policy.c
===
--- linux-2.6.19-rc1-hack.orig/net/xfrm/xfrm_policy.c
+++ linux-2.6.19-rc1-hack/net/xfrm/xfrm_policy.c
@@ -39,6 +39,16 @@ EXPORT_SYMBOL(xfrm_policy_count);
 static DEFINE_RWLOCK(xfrm_policy_afinfo_lock);
 static struct xfrm_policy_afinfo *xfrm_policy_afinfo[NPROTO];
 
+void bugcheck(char *where, int line)
+{
+   int i;
+   for (i = 0; i < NPROTO; i++)
+   if

Re: 2.6.18-mm2 boot failure on x86-64

2006-10-05 Thread Andi Kleen

On Thursday 05 October 2006 20:52, Vivek Goyal wrote:
 On Thu, Oct 05, 2006 at 08:27:02PM +0200, Andi Kleen wrote:
  On Thursday 05 October 2006 19:57, Steve Fox wrote:
   On Thu, 2006-10-05 at 17:40 +0200, Andi Kleen wrote:
   
Please don't snip the Code: line. It is fairly important.
   
   Sorry about that. The remote console I was using appears to overwrite
   some text after I force the reboot. Here's a clean one.
   
   global 
  
  Ok that definitely shouldn't be in there.
  
  I guess we need to track when it gets corrupted. Can you send the full
  boot log with this patch applied?
  
 
 Just recalled one more observation about the problem when keith had
 reported it last. If I just move .bss before .data_nosave instead
 of it being at the end, keith's problem had disappeared.

Yes, that could well be that it's something in the new bootmap 
management.  Steve's box failed at

Using ACPI (MADT) for SMP configuration information
Nosave address range: 0009a000 - 0009b000
Nosave address range: 0009b000 - 000a
Nosave address range: 000a - 000e
Nosave address range: 000e - 0010
Nosave address range: bff76000 - bff77000
Nosave address range: bff77000 - bff98000
Nosave address range: bff98000 - bff99000
Nosave address range: bff99000 - c000
Nosave address range: c000 - fec0
Nosave address range: fec0 - 0001
Allocating PCI resources starting at c400 (gap: c000:3ec0)
afinfo corrupted at init/main.c:512

which is directly after that code does lots of stuff.
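
The checkpoint approach behind that "afinfo corrupted at init/main.c:512" line can be tried in userspace. The sketch below is only an analogue under stated assumptions: the body of the real bugcheck() is cut off in this archive, so the loop here is a guess at its shape, and table, NSLOTS, run_steps() and the two step functions are hypothetical names introduced for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define NSLOTS 8                        /* hypothetical table size */

static void *table[NSLOTS];             /* expected to stay all-NULL */
static const char *first_bad_file;      /* checkpoint that first saw damage */
static int first_bad_line;

/* Scan the table and remember the first checkpoint where a slot is non-NULL. */
static void bugcheck(const char *file, int line)
{
    int i;

    if (first_bad_file)                 /* already reported once */
        return;
    for (i = 0; i < NSLOTS; i++) {
        if (table[i]) {
            first_bad_file = file;
            first_bad_line = line;
            fprintf(stderr, "table corrupted at %s:%d\n", file, line);
            return;
        }
    }
}

#define CHECK bugcheck(__FILE__, __LINE__)

typedef void (*step_fn)(void);

static void good_step(void) { }

/* Simulated stray write into the table. */
static void bad_step(void) { table[3] = (void *)&table[0]; }

/* Run each step followed by a check, much like wrapping every initcall.
 * Returns the index of the first step after which the check fails, or -1. */
static int run_steps(const step_fn *steps, int n)
{
    int i;

    for (i = 0; i < n; i++) {
        steps[i]();
        CHECK;
        if (first_bad_file)
            return i;
    }
    return -1;
}
```

With the steps { good_step, good_step, bad_step, good_step }, run_steps() returns 2. The check only narrows the damage to the interval between two checkpoints, so denser checkpoints give a tighter bound on the offending code.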

Mel might want to take a look (and perhaps
also cut down a little on the ugly printks ...) 

BTW I found one of my test systems too now which does a lot of:
I'm about to leave for vacation so i won't have time to track it down
any time soon. But here is it for reference.

-Andi

Please enable the IOMMU option in the BIOS setup
This costs you 64 MB of RAM
Mapping aperture over 65536 KB of RAM @ 800
Bad page state in process 'swapper'
page:810003ee5480 flags:0x mapping: 
mapcount:1 count:0
Trying to fix it up, but a reboot is needed
Backtrace:   

Call Trace:  
 [8020ac84] show_trace+0x34/0x47
 [8020aca9] dump_stack+0x12/0x17
 [802586a7] bad_page+0x57/0x81
 [80258791] __free_pages_ok+0x64/0x247
 [807cca72] free_all_bootmem_core+0xcc/0x1a9
 [807ca08b] numa_free_all_bootmem+0x3b/0x77
 [807c915e] mem_init+0x44/0x186
 [807bc5f0] start_kernel+0x17b/0x207
 [807bc168] _sinittext+0x168/0x16c

Bad page state in process 'swapper'
page:810003ee54b8 flags:0x mapping: 
mapcount:1 count:0
Trying to fix it up, but a reboot is needed
Backtrace:   

Call Trace:  
 [8020ac84] show_trace+0x34/0x47
 [8020aca9] dump_stack+0x12/0x17
 [802586a7] bad_page+0x57/0x81
 [80258791] __free_pages_ok+0x64/0x247
 [807cca72] free_all_bootmem_core+0xcc/0x1a9
 [807ca08b] numa_free_all_bootmem+0x3b/0x77
 [807c915e] mem_init+0x44/0x186
 [807bc5f0] start_kernel+0x17b/0x207
 [807bc168] _sinittext+0x168/0x16c


... lots more of those ...

Re: 2.6.18-mm2 boot failure on x86-64

2006-10-05 Thread Steve Fox

On Thu, 2006-10-05 at 21:08 +0200, Andi Kleen wrote:

 Mel might want to take a look (and perhaps
 also cut down a little on the ugly printks ...) 

I tested a patch from Mel which backs out the arch independent zone
sizing and got the same results (to my inexperienced eye). I've sent him
the boot log to verify they really are the same as without this
back-out.

-- 

Steve Fox
IBM Linux Technology Center

Re: 2.6.18-mm2 boot failure on x86-64

2006-10-05 Thread Mel Gorman


On Thu, 5 Oct 2006, Andi Kleen wrote:

 On Thursday 05 October 2006 20:52, Vivek Goyal wrote:
  On Thu, Oct 05, 2006 at 08:27:02PM +0200, Andi Kleen wrote:
   On Thursday 05 October 2006 19:57, Steve Fox wrote:
    On Thu, 2006-10-05 at 17:40 +0200, Andi Kleen wrote:

     Please don't snip the Code: line. It is fairly important.

    Sorry about that. The remote console I was using appears to overwrite
    some text after I force the reboot. Here's a clean one.

    global 

   Ok that definitely shouldn't be in there.

   I guess we need to track when it gets corrupted. Can you send the full
   boot log with this patch applied?

  Just recalled one more observation about the problem when keith had
  reported it last. If I just move .bss before .data_nosave instead
  of it being at the end, keith's problem had disappeared.

 Yes, that could well be that it's something in the new bootmap
 management.  Steve's box failed at

 Using ACPI (MADT) for SMP configuration information
 Nosave address range: 0009a000 - 0009b000
 Nosave address range: 0009b000 - 000a
 Nosave address range: 000a - 000e
 Nosave address range: 000e - 0010
 Nosave address range: bff76000 - bff77000
 Nosave address range: bff77000 - bff98000
 Nosave address range: bff98000 - bff99000
 Nosave address range: bff99000 - c000
 Nosave address range: c000 - fec0
 Nosave address range: fec0 - 0001
 Allocating PCI resources starting at c400 (gap: c000:3ec0)
 afinfo corrupted at init/main.c:512

 which is directly after that code does lots of stuff.

 Mel might want to take a look (and perhaps
 also cut down a little on the ugly printks ...)

Steve tested a patch with arch-independent zone-sizing backed out for 
x86_64 and things looked ok but that is no guarantee it is not a 
contributory factor. The "Nosave address range:" printks are related to a 
suspend problem that was reported at the end of June, I believe.


I'll pick this up in the morning because I should have access to the same 
machine Steve does and see what I can come up with.



 BTW I found one of my test systems too now which does a lot of:
 I'm about to leave for vacation so i won't have time to track it down
 any time soon. But here is it for reference.



hmm, rather than bugging you with patches now, I'll see what I can find 
with the x86_64 machines I have access to and see can I reproduce it.



 -Andi

 Please enable the IOMMU option in the BIOS setup
 This costs you 64 MB of RAM
 Mapping aperture over 65536 KB of RAM @ 800
 Bad page state in process 'swapper'
 page:810003ee5480 flags:0x mapping: 
 mapcount:1 count:0
 Trying to fix it up, but a reboot is needed
 Backtrace:

 Call Trace:
  [8020ac84] show_trace+0x34/0x47
  [8020aca9] dump_stack+0x12/0x17
  [802586a7] bad_page+0x57/0x81
  [80258791] __free_pages_ok+0x64/0x247
  [807cca72] free_all_bootmem_core+0xcc/0x1a9
  [807ca08b] numa_free_all_bootmem+0x3b/0x77
  [807c915e] mem_init+0x44/0x186
  [807bc5f0] start_kernel+0x17b/0x207
  [807bc168] _sinittext+0x168/0x16c

 Bad page state in process 'swapper'
 page:810003ee54b8 flags:0x mapping: 
 mapcount:1 count:0
 Trying to fix it up, but a reboot is needed
 Backtrace:

 Call Trace:
  [8020ac84] show_trace+0x34/0x47
  [8020aca9] dump_stack+0x12/0x17
  [802586a7] bad_page+0x57/0x81
  [80258791] __free_pages_ok+0x64/0x247
  [807cca72] free_all_bootmem_core+0xcc/0x1a9
  [807ca08b] numa_free_all_bootmem+0x3b/0x77
  [807c915e] mem_init+0x44/0x186
  [807bc5f0] start_kernel+0x17b/0x207
  [807bc168] _sinittext+0x168/0x16c

 ... lots more of those ...



--
Mel Gorman
Part-time Phd Student  Linux Technology Center
University of Limerick IBM Dublin Software Lab

Re: 2.6.18-mm2 boot failure on x86-64

2006-10-05 Thread Steve Fox

On Thu, 2006-10-05 at 21:05 +0200, Andi Kleen wrote:

 Can you please try it again with this patch to narrow it down further?

Unfortunately this is as far as it got before it hung.

root (hd0,0)
 Filesystem type is reiserfs, partition type 0x83
kernel /boot/vmlinuz-autobench root=/dev/sda1 vga=791
ip=9.47.67.239:9.47.67.50:9.47.67.1:255.255.255.0 resume=/dev/sdb1
showopts console=tty0 console=ttyS0,57600 autobench_args: root=/dev/sda1
ABAT:1160080320
   [Linux-bzImage, setup=0x1400, size=0x1dd871]
initrd /boot/initrd-autobench.img
   [Linux-initrd @ 0x37ceb000, 0x304c57 bytes]


-- 

Steve Fox
IBM Linux Technology Center

Re: 2.6.18-mm2 boot failure on x86-64

2006-10-05 Thread Andi Kleen

On Thursday 05 October 2006 22:42, Steve Fox wrote:
 On Thu, 2006-10-05 at 21:05 +0200, Andi Kleen wrote:
 
  Can you please try it again with this patch to narrow it down further?
 
 Unfortunately this is as far as it got before it hung.

Boot with earlyprintk=serial,ttyS0,57600
(or change the panic in the checkfunction back to a printk) 

-Andi


Re: 2.6.18-mm2 boot failure on x86-64

2006-10-05 Thread Andi Kleen


 hmm, rather than bugging you with patches now, I'll see what I can find 
 with the x86_64 machines I have access to and see can I reproduce it.

I started the bisect, should finish soon.

-Andi

Re: 2.6.18-mm2 boot failure on x86-64 II

2006-10-05 Thread Andi Kleen

On Thursday 05 October 2006 22:51, Andi Kleen wrote:
 
  hmm, rather than bugging you with patches now, I'll see what I can find 
  with the x86_64 machines I have access to and see can I reproduce it.
 
 I started the bisect, should finish soon.

It ended at 

diff-tree d5cdb67236dba94496de052c9f9f431e1fc658f4 (from 0dad3510ee82bcf8a380b81a2184a664a911ef9c)
Author: Satoru Takeuchi [EMAIL PROTECTED]
Date:   Tue Sep 12 10:19:00 2006 -0700

acpiphp: disable bridges

Currently acpiphp calls pci_enable_device() against all
hot-added bridges, but acpiphp does not call pci_disable_device()
against them in hot-remove. So ioapic hot-remove would fail.
This patch fixes this issue.

Not sure that is it really, it is possible i made a mistake during bisect
(the symptoms changed from bad page to just networking doesn't work
somewhere at 4cfee88ad30acc47f02b8b7ba3db8556262dce1e) 

I don't have time to rerun unfortunately
for some time. Anyone else looking would be useful.

-Andi


Re: 2.6.18-mm2 boot failure on x86-64 II

2006-10-05 Thread keith mannthey

On Fri, 2006-10-06 at 01:14 +0200, Andi Kleen wrote:
 On Thursday 05 October 2006 22:51, Andi Kleen wrote:
  
   hmm, rather than bugging you with patches now, I'll see what I can find 
   with the x86_64 machines I have access to and see can I reproduce it.
  
  I started the bisect, should finish soon.
 
 It ended at 
 
 diff-tree d5cdb67236dba94496de052c9f9f431e1fc658f4 (from 0dad3510ee82bcf8a380b81a2184a664a911ef9c)
 Author: Satoru Takeuchi [EMAIL PROTECTED]
 Date:   Tue Sep 12 10:19:00 2006 -0700
 
 acpiphp: disable bridges
 
 Currently acpiphp calls pci_enable_device() against all
 hot-added bridges, but acpiphp does not call pci_disable_device()
 against them in hot-remove. So ioapic hot-remove would fail.
 This patch fixes this issue.
 
 Not sure that is it really, it is possible i made a mistake during bisect
 (the symptoms changed from bad page to just networking doesn't work
 somewhere at 4cfee88ad30acc47f02b8b7ba3db8556262dce1e) 
 
 I don't have time to rerun unfortunately
 for some time. Anyone else looking would be useful.

As of yet I haven't been able to recreate the hang.  I am running
similar HW to Steve. 

Thanks,
  Keith 


Re: 2.6.18-mm2 boot failure on x86-64 II

2006-10-05 Thread Andi Kleen


 As of yet I haven't been able to recreate the hang.  I am running
 similar HW to Steve. 

That was on a 4 core Opteron with Tyan board  (S2881) and AMD-8111 
chipset.

-Andi

Re: 2.6.18-mm2 boot failure on x86-64 II

2006-10-05 Thread keith mannthey

On Fri, 2006-10-06 at 01:35 +0200, Andi Kleen wrote:
  As of yet I haven't been able to recreate the hang.  I am running
  similar HW to Steve. 

I ran into this with -mm3

Memory: 24150368k/26738688k available (1933k kernel code, 490260k
reserved, 978k data, 308k init)
[ cut here ]
kernel BUG in init_list at mm/slab.c:1334!
invalid opcode:  [1] SMP
last sysfs file:
CPU 0
Modules linked in:
Pid: 0, comm: swapper Not tainted 2.6.18-mm3-smp #1
RIP: 0010:[8027f8fa]  [8027f8fa] init_list+0x1d/0xfd
RSP: 0018:80577f48  EFLAGS: 00010212
RAX: 0040 RBX: 0001 RCX: 
RDX: 0001 RSI: 805ba848 RDI: 810460700040
RBP: 0001 R08: 0001 R09: 0003
R10:  R11: 805bc268 R12: 810460700040
R13: 805ba848 R14:  R15: 
FS:  () GS:804d8000()
knlGS:
CS:  0010 DS: 0018 ES: 0018 CR0: 8005003b
CR2:  CR3: 00201000 CR4: 06a0
Process swapper (pid: 0, threadinfo 80576000, task
80455840)
Stack:    0001
0001
 805ba848   80593aa8
 02c0 00010001 0008ef00 0008c000
Call Trace:
 [80593aa8] kmem_cache_init+0x344/0x406
 [805805ef] start_kernel+0x180/0x21b
 [8058016a] _sinittext+0x16a/0x16e


Code: 0f 0b 48 8b 3d 15 ab 1e 00 be d0 00 00 00 e8 c0 f5 ff ff 48
RIP  [8027f8fa] init_list+0x1d/0xfd
 RSP 80577f48
 <0>Kernel panic - not syncing: Attempted to kill the idle task!


I am going to revert the patch and see if it works.  I ran -git22 just
fine. 

Thanks,
  Keith 


Re: 2.6.18-mm2 boot failure on x86-64 II

2006-10-05 Thread Badari Pulavarty


keith mannthey wrote:
 On Fri, 2006-10-06 at 01:35 +0200, Andi Kleen wrote:
   As of yet I haven't been able to recreate the hang.  I am running
   similar HW to Steve. 

 I ran into this with -mm3

 Memory: 24150368k/26738688k available (1933k kernel code, 490260k
 reserved, 978k data, 308k init)
 [ cut here ]
 kernel BUG in init_list at mm/slab.c:1334!
 invalid opcode:  [1] SMP
 last sysfs file:
 CPU 0
 Modules linked in:
 Pid: 0, comm: swapper Not tainted 2.6.18-mm3-smp #1
 RIP: 0010:[8027f8fa]  [8027f8fa] init_list+0x1d/0xfd
 RSP: 0018:80577f48  EFLAGS: 00010212
 RAX: 0040 RBX: 0001 RCX: 
 RDX: 0001 RSI: 805ba848 RDI: 810460700040
 RBP: 0001 R08: 0001 R09: 0003
 R10:  R11: 805bc268 R12: 810460700040
 R13: 805ba848 R14:  R15: 
 FS:  () GS:804d8000()
 knlGS:
 CS:  0010 DS: 0018 ES: 0018 CR0: 8005003b
 CR2:  CR3: 00201000 CR4: 06a0
 Process swapper (pid: 0, threadinfo 80576000, task
 80455840)
 Stack:    0001
 0001
  805ba848   80593aa8
  02c0 00010001 0008ef00 0008c000
 Call Trace:
  [80593aa8] kmem_cache_init+0x344/0x406
  [805805ef] start_kernel+0x180/0x21b
  [8058016a] _sinittext+0x16a/0x16e


 Code: 0f 0b 48 8b 3d 15 ab 1e 00 be d0 00 00 00 e8 c0 f5 ff ff 48
 RIP  [8027f8fa] init_list+0x1d/0xfd
  RSP 80577f48
  <0>Kernel panic - not syncing: Attempted to kill the idle task!


 I am going to revert the patch and see if it works.  I ran -git22 just
 fine. 

 Thanks,
   Keith 

  

Keith,

I fixed this already. Can you look for it on lkml (look for 2.6.18-mm3 
in the subject line).

one typo in mm/slab.c

Thanks,
Badari


Re: 2.6.18-mm2 boot failure on x86-64 II

2006-10-05 Thread Andrew Morton

On Thu, 05 Oct 2006 17:02:54 -0700
Badari Pulavarty [EMAIL PROTECTED] wrote:

  Code: 0f 0b 48 8b 3d 15 ab 1e 00 be d0 00 00 00 e8 c0 f5 ff ff 48
  RIP  [8027f8fa] init_list+0x1d/0xfd
   RSP 80577f48
   <0>Kernel panic - not syncing: Attempted to kill the idle task!
 
 
  I am going to revert the patch and see if it works.  I ran -git22 just
  fine. 
 
  Thanks,
Keith 
 

 Keith,
 
 I fixed this already. Can you look for it on lkml (look for 2.6.18-mm3 
 in the subject line).
 one typo in mm/slab.c

ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.18/2.6.18-mm3/hot-fixes

Re: 2.6.18-mm2 boot failure on x86-64

2006-10-04 Thread Steve Fox

On Thu, 2006-09-28 at 14:01 -0700, Andrew Morton wrote:
 On Thu, 28 Sep 2006 17:50:31 + (UTC)
 Steve Fox [EMAIL PROTECTED] wrote:
 
  On Thu, 28 Sep 2006 01:46:23 -0700, Andrew Morton wrote:
  
   ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.18/2.6.18-mm2/
  
  Panic on boot. This machine booted 2.6.18-mm1 fine. em64t machine.
  
  TCP bic registered
  TCP westwood registered
  TCP htcp registered
  NET: Registered protocol family 1
  NET: Registered protocol family 17
  Unable to handle kernel paging request at  RIP: 
   [8047ef93] packet_notifier+0x163/0x1a0
  PGD 203027 PUD 2b031067 PMD 0 
  Oops:  [1] SMP 
  last sysfs file: 
  CPU 0 
  Modules linked in:
  Pid: 1, comm: swapper Not tainted 2.6.18-mm2-autokern1 #1
  RIP: 0010:[8047ef93]  [8047ef93] 
  packet_notifier+0x163/0x1a0
  RSP: :810bffcbde90  EFLAGS: 00010286
  RAX:  RBX: 810bff4a1000 RCX: 
  RDX: 810bff4a1000 RSI: 0005 RDI: 8055f5e0
  RBP:  R08: 7616 R09: 000e
  R10: 0006 R11: 803373f0 R12: 
  R13: 0005 R14: 810bff4a1000 R15: 
  FS:  () GS:805d8000() knlGS:
  CS:  0010 DS: 0018 ES: 0018 CR0: 8005003b
  CR2:  CR3: 00201000 CR4: 06e0
  Process swapper (pid: 1, threadinfo 810bffcbc000, task 810bffcbb510)
  Stack:  810bff4a1000 8055f4c0  810bffcbdef0
    8042736e  
    8061c68d 806260f0 80207182
  Call Trace:
   [8042736e] register_netdevice_notifier+0x3e/0x70
   [8061c68d] packet_init+0x2d/0x53
   [80207182] init+0x162/0x330
   [8020a9d8] child_rip+0xa/0x12
   [8033c2a2] acpi_ds_init_one_object+0x0/0x82
   [80207020] init+0x0/0x330
   [8020a9ce] child_rip+0x0/0x12
  
  
  Code: 48 8b 45 00 0f 18 08 49 83 fd 02 4c 8d 65 f8 0f 84 f8 fe ff 
  RIP  [8047ef93] packet_notifier+0x163/0x1a0
   RSP 810bffcbde90
  CR2: 
   0Kernel panic - not syncing: Attempted to kill init!
  
 
 I'm really struggling to work out what went wrong there.  Comparing your
 miserable 20 bytes of code to my object code makes me think that this:
 
   struct packet_sock *po = pkt_sk(sk);
 
 returned -1, perhaps in %ebp.  But it's all very crude.
 
 Perhaps you could compile that kernel with CONFIG_DEBUG_INFO, rerun it (the
 addresses might change) then have a poke around with `gdb vmlinux' (or
 maybe just addr2line) to work out where it's really oopsing?
 
 I don't see much which has changed in that area recently.
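
Since the absolute addresses may change after a rebuild, the symbol+offset
form printed in the oops is the stable handle for the lookup suggested above.
A minimal sketch in Python (the helper names are made up for illustration and
are not part of any kernel tooling) that pulls symbol+offset out of an oops
line and formats the `list *(symbol+offset)' command one could type at the
gdb prompt after `gdb vmlinux':

```python
import re

# Matches the "symbol+0xoffset/0xsize" part of an oops RIP or call-trace line.
SYMBOL_RE = re.compile(
    r"(?P<sym>[A-Za-z_][\w.]*)\+0x(?P<off>[0-9a-fA-F]+)/0x(?P<size>[0-9a-fA-F]+)"
)


def parse_location(line):
    """Return (symbol, offset, size) from an oops line, or None if absent."""
    match = SYMBOL_RE.search(line)
    if match is None:
        return None
    return (
        match.group("sym"),
        int(match.group("off"), 16),
        int(match.group("size"), 16),
    )


def gdb_list_command(line):
    """Format a gdb 'list' command for the location named in an oops line."""
    location = parse_location(line)
    if location is None:
        return None
    symbol, offset, _size = location
    return "list *(%s+0x%x)" % (symbol, offset)


if __name__ == "__main__":
    # Line copied from the oops quoted above.
    print(gdb_list_command("RIP  [8047ef93] packet_notifier+0x163/0x1a0"))
```

This only helps if vmlinux was built with CONFIG_DEBUG_INFO; otherwise gdb
has no line information to show for the address.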

Sorry for the delay. I was finally able to perform a bisect on this. It
turns out the patch that causes this is
x86_64-mm-re-positioning-the-bss-segment.patch, which seems like a
strange candidate, but sure enough I can boot to login: right up until
that patch is applied.

P.S. I had to comment usb-hubc-build-fix.patch out of the series file
because it would not apply cleanly and caused quilt (0.45) to simply
abort its 'push' operation.

-- 

Steve Fox
IBM Linux Technology Center

Re: 2.6.18-mm2 boot failure on x86-64

2006-10-04 Thread Andrew Morton

On Wed, 04 Oct 2006 08:42:28 -0500
Steve Fox [EMAIL PROTECTED] wrote:

 Sorry for the delay. I was finally able to perform a bisect on this. It
 turns out the patch that causes this is
 x86_64-mm-re-positioning-the-bss-segment.patch, which seems like a
 strange candidate, but sure enough I can boot to login: right up until
 that patch is applied.

hm, that patch was merged into mainline September 29.  Does mainline work?

 P.S. I had to comment usb-hubc-build-fix.patch out of the series file
 because it would not apply cleanly and caused quilt (0.45) to simply
 abort its 'push' operation.

Sorry about that.

If mainline _does_ work then perhaps it's an interaction between that patch
and something else in the -mm2 lineup (and at that point in the bisection,
it'll be one of the git trees or something else in the x86_64 tree).  Could
be that the problem remains in -mm3.

Re: 2.6.18-mm2 boot failure on x86-64

2006-10-04 Thread Andi Kleen

On Wednesday 04 October 2006 17:45, Andrew Morton wrote:
 On Wed, 04 Oct 2006 08:42:28 -0500
 Steve Fox [EMAIL PROTECTED] wrote:
 
  Sorry for the delay. I was finally able to perform a bisect on this. It
  turns out the patch that causes this is
  x86_64-mm-re-positioning-the-bss-segment.patch, which seems like a
  strange candidate, but sure enough I can boot to login: right up until
  that patch is applied.
 
 hm, that patch was merged into mainline September 29.  Does mainline work?

Yes, we have been through this before. But without this patch the kernel
doesn't compile for some people, so it was re-added.

And nobody knows why the reposition-bss patch actually breaks things :/

In theory the repositioning is OK, so it must be some marginal code
somewhere else that just ends up falling over.

-Andi


Re: 2.6.18-mm2 boot failure on x86-64

2006-10-04 Thread Vivek Goyal

On Wed, Oct 04, 2006 at 08:45:40AM -0700, Andrew Morton wrote:
 On Wed, 04 Oct 2006 08:42:28 -0500
 Steve Fox [EMAIL PROTECTED] wrote:
 
  Sorry for the delay. I was finally able to perform a bisect on this. It
  turns out the patch that causes this is
  x86_64-mm-re-positioning-the-bss-segment.patch, which seems like a
  strange candidate, but sure enough I can boot to login: right up until
  that patch is applied.
 
 hm, that patch was merged into mainline September 29.  Does mainline work?
 

I thought the above patch had been dropped because Keith ran into boot issues
on one of the machines. There seemed to be nothing wrong with the patch as
such, but it might have triggered some existing bug. I looked into the issue
at the time, but found nothing conclusive.

So it looks like this patch has come back. I am not sure how.

Thanks
Vivek

Re: 2.6.18-mm2 boot failure on x86-64

2006-10-04 Thread Steve Fox

On Wed, 2006-10-04 at 08:45 -0700, Andrew Morton wrote:
 On Wed, 04 Oct 2006 08:42:28 -0500
 Steve Fox [EMAIL PROTECTED] wrote:
  Sorry for the delay. I was finally able to perform a bisect on this. It
  turns out the patch that causes this is
  x86_64-mm-re-positioning-the-bss-segment.patch, which seems like a
  strange candidate, but sure enough I can boot to login: right up until
  that patch is applied.
 
 hm, that patch was merged into mainline September 29.  Does mainline work?

-git21 also fails with this same error.

-- 

Steve Fox
IBM Linux Technology Center

Re: 2.6.18-mm2 boot failure on x86-64

2006-10-04 Thread Andrew Morton

On Wed, 04 Oct 2006 11:41:59 -0500
Steve Fox [EMAIL PROTECTED] wrote:

 On Wed, 2006-10-04 at 08:45 -0700, Andrew Morton wrote:
  On Wed, 04 Oct 2006 08:42:28 -0500
  Steve Fox [EMAIL PROTECTED] wrote:
   Sorry for the delay. I was finally able to perform a bisect on this. It
   turns out the patch that causes this is
   x86_64-mm-re-positioning-the-bss-segment.patch, which seems like a
   strange candidate, but sure enough I can boot to login: right up until
   that patch is applied.
  
  hm, that patch was merged into mainline September 29.  Does mainline work?
 
 -git21 also fails with this same error.
 

OK, thanks.  And we know that
x86_64-mm-re-positioning-the-bss-segment.patch triggered this failure.  And
that patch is non-buggy, and the xfrm code is probably non-buggy.  So we don't
know squat, and we're going to need to debug this crash.

Well.  There is one trick we could use: apply
x86_64-mm-re-positioning-the-bss-segment.patch to 2.6.18 base and see if it
crashes.  If it doesn't, then we can theorise that the bug is some buggy
post 2.6.18 patch which is being exposed by
x86_64-mm-re-positioning-the-bss-segment.patch.  A technique I've used
before for identifying the buggy patch is to do a git-bisect, but apply
x86_64-mm-re-positioning-the-bss-segment.patch by hand at each bisection
step.  It's pretty straightforward as long as the patch roughly applies at
each step.  
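
As a rough illustration of that bisect-with-the-patch-applied idea, here is a
toy Python model (not real git tooling). The commit names, the boots()
predicate and the position of the latent bug are all invented for the example;
in practice each probe would be a checkout, a hand-applied patch, a build and
a boot test.

```python
# Toy model of "bisect, but apply the exposing patch at every step".
# Everything here is hypothetical: the commit names, the latent-bug position
# and the boots() predicate stand in for a real checkout/patch/build/boot.


def boots(history, index, latent_bug_index, patch_applied):
    """Return True if the tree at history[index] boots.

    The latent bug only bites when the exposing patch is applied on top
    of a tree that already contains the buggy commit.
    """
    has_latent_bug = index >= latent_bug_index
    return not (has_latent_bug and patch_applied)


def bisect_with_patch(history, latent_bug_index):
    """Find the first commit that fails to boot once the patch is applied.

    Assumes history[0] boots with the patch and history[-1] does not.
    Returns (first_bad_commit, number_of_boot_tests).
    """
    good, bad = 0, len(history) - 1
    tests = 0
    while bad - good > 1:
        mid = (good + bad) // 2
        tests += 1
        # Real life: check out `mid`, apply the patch by hand, build, boot.
        if boots(history, mid, latent_bug_index, patch_applied=True):
            good = mid
        else:
            bad = mid
    return history[bad], tests


if __name__ == "__main__":
    commits = ["c%02d" % i for i in range(16)]  # c00 = base, c15 = tip
    culprit, tests = bisect_with_patch(commits, latent_bug_index=11)
    print("first bad commit with patch applied:", culprit)
    print("boot tests needed:", tests)
```

The point of the model: without the patch every commit boots, so a plain
bisect has nothing to find; with the patch applied at each step the history
splits cleanly into good and bad halves.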

Or we could debug it.  Can you send the .config?  Let's see if it happens
with my toolchain+machine first.

Thanks.

Re: 2.6.18-mm2 boot failure on x86-64

2006-10-04 Thread Vivek Goyal

On Wed, Oct 04, 2006 at 05:06:59PM -0700, Andrew Morton wrote:
 On Wed, 04 Oct 2006 11:41:59 -0500
 Steve Fox [EMAIL PROTECTED] wrote:
 
  On Wed, 2006-10-04 at 08:45 -0700, Andrew Morton wrote:
   On Wed, 04 Oct 2006 08:42:28 -0500
   Steve Fox [EMAIL PROTECTED] wrote:
Sorry for the delay. I was finally able to perform a bisect on this. It
turns out the patch that causes this is
x86_64-mm-re-positioning-the-bss-segment.patch, which seems like a
strange candidate, but sure enough I can boot to login: right up until
that patch is applied.
   
   hm, that patch was merged into mainline September 29.  Does mainline work?
  
  -git21 also fails with this same error.
  
 
 OK, thanks.  And we know that
 x86_64-mm-re-positioning-the-bss-segment.patch triggered this failure.  And
 that patch is non-buggy, and the xfrm code is probably non-buggy.  So we don't
 know squat, and we're going to need to debug this crash.
 
 Well.  There is one trick we could use: apply
 x86_64-mm-re-positioning-the-bss-segment.patch to 2.6.18 base and see if it
 crashes.  If it doesn't, then we can theorise that the bug is some buggy
 post 2.6.18 patch which is being exposed by

I think it would most likely crash on 2.6.18 too. Keith Mannthey reported
a different crash on 2.6.18-rc4-mm2 when this patch was first introduced.
Here is the link to that thread:

http://marc.theaimsgroup.com/?l=linux-kernel&m=115629369729911&w=2

The following is the backtrace he reported.

 Unable to handle kernel NULL pointer dereference at 0007
 RIP:
  [803d45b0] __unix_insert_socket+0x49/0x5a
 PGD 115c934067 PUD 115c935067 PMD 0
 Oops: 0002 [1] SMP
 last sysfs file:
 CPU 14
 Modules linked in:
 Pid: 1, comm: init Not tainted 2.6.18-rc4-mm2-smp #3
 RIP: 0010:[803d45b0]  [803d45b0]
 __unix_insert_socket+0x49/0x5a
 RSP: 0018:810460605eb8  EFLAGS: 00010286
 RAX:  RBX: 81115c171c80 RCX: 
 RDX: 81115c171c88 RSI: 81115c171c80 RDI: 806656e0
 RBP: 806656e0 R08: 81115c069200 R09: 8110700b4000
 R10:  R11: 0002 R12: 81115c170d00
 R13: 0001 R14: 0001 R15: 
 FS:  2b793a4fd6d0() GS:81115c910e40()
 knlGS:
 CS:  0010 DS:  ES:  CR0: 8005003b
 CR2: 0007 CR3: 00115c92d000 CR4: 06e0
 Process init (pid: 1, threadinfo 810460604000, task
 81115cb10040)
 Stack:  00010001  81115c171c80
 803d58e9
  8045bb30 000180298f61 80498080 0001
  81115c170d00 803d595d 0004 80376061
 Call Trace:
  [803d58e9] unix_create1+0xf3/0x107
  [803d595d] unix_create+0x60/0x6b
  [80376061] __sock_create+0x12f/0x227
  [80376429] sys_socket+0xf/0x37
  [8020968e] system_call+0x7e/0x83


 Code: 48 89 50 08 48 89 55 00 48 89 6a 08 41 58 5b 5d c3 c7 47 08
 RIP  [803d45b0] __unix_insert_socket+0x49/0x5a
  RSP 810460605eb8
 CR2: 0007
  0Kernel panic - not syncing: Attempted to kill init!

Thanks
Vivek

Re: 2.6.18-mm2 boot failure on x86-64

2006-10-04 Thread Andi Kleen


 I think it would most likely crash on 2.6.18 too. Keith Mannthey reported
 a different crash on 2.6.18-rc4-mm2 when this patch was first introduced.
 Here is the link to that thread:

Then maybe try 2.6.17 + the patch and then bisect between that and -rc4?

-Andi

Re: 2.6.18-mm2 boot failure on x86-64

2006-10-04 Thread Keith Mannthey


On 10/4/06, Martin Bligh [EMAIL PROTECTED] wrote:

Andi Kleen wrote:
I think it would most likely crash on 2.6.18 too. Keith Mannthey reported
a different crash on 2.6.18-rc4-mm2 when this patch was first introduced.
Here is the link to that thread:


 Then maybe try 2.6.17 + the patch and then bisect between that and -rc4?

I think it's fixed already in -git22, or at least it is for the IBM box
reporting to test.kernel.org. You might want to try that one ...


Fixed or hidden... hard to say at this point.  I think it could be a
weird interaction between patches and/or config options.  I will see
tomorrow if I can recreate it again.

Thanks,
 Keith
