Re: system without RAM on node0 boot fail

2008-02-01 Thread Yinghai Lu
On Feb 1, 2008 10:11 AM, dean gaudet <[EMAIL PROTECTED]> wrote:
> actually yeah i've seen this... in a bizarre failure situation in a system
> which physically had RAM in the boot node but it was never enumerated for
> the kernel (other nodes had RAM which was enumerated).
>
> so technically there was boot node RAM but the kernel never saw it.

The BIOS sometimes disables DIMMs on a node when it thinks a DIMM is bad
and caused an MCE error during the last boot.

YH
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: system without RAM on node0 boot fail

2008-02-01 Thread dean gaudet
actually yeah i've seen this... in a bizarre failure situation in a system 
which physically had RAM in the boot node but it was never enumerated for 
the kernel (other nodes had RAM which was enumerated).

so technically there was boot node RAM but the kernel never saw it.

-dean

On Wed, 30 Jan 2008, Christoph Lameter wrote:

> x86 supports booting from a node without RAM?


Re: system without RAM on node0 boot fail

2008-01-30 Thread Andi Kleen
On Thursday 31 January 2008 08:22:15 H. Peter Anvin wrote:
> Yinghai Lu wrote:
> > On Jan 30, 2008 10:09 PM, H. Peter Anvin <[EMAIL PROTECTED]> wrote:
> >> Christoph Lameter wrote:
> >>> x86 supports booting from a node without RAM?
> > 
> > it is a two sockets system. only 4G RAM installed on node1.
> > 
> 
> "Node 1" is the boot CPU, though, right?
> 
> I don't know if the spec requires node 0 to be the boot node.  Probably not.

There is no spec I know of that completely defines "nodes" on x86.
Actually I think only Linux calls them that.

There is the ACPI 3.0 SRAT spec that defines memory affinity,
but I don't think it has any requirements about where the memory must be.

Even if there were a spec, people who actually put in DIMMs tend
to violate it. It seems to be not totally uncommon to just
stuff them all into the same corner of the motherboard to give
a "tidy appearance" (for non-physicists :-), and that usually results in
memoryless nodes.

Anyway, this area is something that regresses regularly. I have
fixed it several times and tested all cases on SimNow, but after
some time it tends to bit-rot again, unfortunately. The people who
usually test kernels probably know where to put the DIMMs.

It probably just happened again.

-Andi




Re: system without RAM on node0 boot fail

2008-01-30 Thread H. Peter Anvin

Yinghai Lu wrote:
> On Jan 30, 2008 10:09 PM, H. Peter Anvin <[EMAIL PROTECTED]> wrote:
>> Christoph Lameter wrote:
>>> x86 supports booting from a node without RAM?
>
> it is a two sockets system. only 4G RAM installed on node1.

"Node 1" is the boot CPU, though, right?

I don't know if the spec requires node 0 to be the boot node.  Probably not.

-hpa



Re: system without RAM on node0 boot fail

2008-01-30 Thread Yinghai Lu
On Jan 30, 2008 10:09 PM, H. Peter Anvin <[EMAIL PROTECTED]> wrote:
> Christoph Lameter wrote:
> > x86 supports booting from a node without RAM?

It is a two-socket system; only 4G of RAM is installed, and it is on node 1.

>
>  From the looks of it I would say he probably has the boot node numbered 1.
>
> The e820 map is also "interesting" - doesn't list the first 256 bytes,
> which corresponds to the first quarter(!) of the real-mode exception table.

I kexec'd that from 2.6.24 (with discontinuous and slab),
so that e820 map was passed by kexec from the first kernel. A normal pxeboot
will have the map start at 0.

YH


Re: system without RAM on node0 boot fail

2008-01-30 Thread H. Peter Anvin

Christoph Lameter wrote:
> x86 supports booting from a node without RAM?

From the looks of it I would say he probably has the boot node numbered 1.

The e820 map is also "interesting" - doesn't list the first 256 bytes, 
which corresponds to the first quarter(!) of the real-mode exception table.


-hpa
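
The "first quarter" remark above follows from simple arithmetic, sketched here in Python. The sketch assumes the standard real-mode interrupt vector table layout (256 vectors of 4 bytes each at physical address 0); the function name `ivt_overlap` is made up for illustration and is not from the thread.

```python
# Real-mode interrupt vector table (IVT): 256 vectors, 4 bytes each
# (16-bit offset + 16-bit segment), starting at physical address 0.
IVT_VECTORS = 256
IVT_ENTRY_BYTES = 4
IVT_BYTES = IVT_VECTORS * IVT_ENTRY_BYTES  # 1024 bytes in total


def ivt_overlap(missing_bytes):
    """Return (fraction of the IVT, number of vectors) covered by a hole
    that starts at address 0 and is `missing_bytes` long."""
    covered = min(missing_bytes, IVT_BYTES)
    return covered / IVT_BYTES, covered // IVT_ENTRY_BYTES


# The e820 map in the report starts at 0x100, so 256 bytes are unlisted.
fraction, vectors = ivt_overlap(0x100)
print(f"{fraction:.0%} of the IVT, vectors 0..{vectors - 1}")
# prints "25% of the IVT, vectors 0..63"
```

So a map that starts at 0x100 leaves out the first 64 of the 256 vectors.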


Re: system without RAM on node0 boot fail

2008-01-30 Thread Christoph Lameter
x86 supports booting from a node without RAM?


Re: system without RAM on node0 boot fail

2008-01-30 Thread Yinghai Lu
current x86.git
Command line: apic=debug acpi.debug_level=0x000F debug initcall_debug 
pci=routeirq ramdisk_size=131072 root=/dev/ram0 rw ip=dhcp 
console=uart8250,io,0x3f8,115200n8
BIOS-provided physical RAM map:
 BIOS-e820: 0100 - 0009bc00 (usable)
 BIOS-e820: 0009bc00 - 000a (reserved)
 BIOS-e820: 000e - 0010 (reserved)
 BIOS-e820: 0010 - dcff (usable)
 BIOS-e820: dcff - dcffe000 (ACPI data)
 BIOS-e820: dcffe000 - dd00 (ACPI NVS)
 BIOS-e820: e000 - f000 (reserved)
 BIOS-e820: fec0 - fec01000 (reserved)
 BIOS-e820: ff70 - 0001 (reserved)
 BIOS-e820: 0001 - 00012300 (usable)
Early serial console at I/O port 0x3f8 (options '115200n8')
console [uart0] enabled
end_pfn_map = 1191936
DMI present.
...
SRAT: PXM 0 -> APIC 0 -> Node 0
SRAT: PXM 0 -> APIC 1 -> Node 0
SRAT: PXM 1 -> APIC 2 -> Node 1
SRAT: PXM 1 -> APIC 3 -> Node 1
SRAT: Node 1 PXM 1 0-a
SRAT: Node 1 PXM 1 0-dd00
SRAT: Node 1 PXM 1 0-12300
ACPI: SLIT: nodes = 2
 10 13
 13 10
mapped APIC to ff5fb000 (fee0)
Bootmem setup node 1 -00012300
  NODE_DATA [e000 - 00014fff]
  bootmap [00015000 -  000395ff] pages 25
early res: 0 [0-fff] BIOS data page
early res: 1 [6000-7fff] SMP_TRAMPOLINE
early res: 2 [20-d9c273] TEXT DATA BSS
early res: 3 [7e6f4000-7fff3a25] RAMDISK
early res: 4 [9bc00-9dbff] EBDA
early res: 5 [8000-dfff] PGTABLE
Could not find start_pfn for node 0
Pid: 0, comm: swapper Not tainted 2.6.24-smp-04921-gbce08dc-dirty #43

Call Trace:
 [80c200d6] free_area_init_node+0x22/0x381
 [804539e5] generic_swap+0x0/0x17
 [80c06a1b] find_zone_movable_pfns_for_nodes+0x54/0x271
 [80c06f4a] free_area_init_nodes+0x239/0x287
 [80c0203a] paging_init+0x46/0x4c
 [80bf9dba] setup_arch+0x3c3/0x44e
 [80bf38d3] start_kernel+0x6f/0x2c7
 [80bf31e1] _sinittext+0x1e1/0x1e8

RIP 0x10
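
The SRAT lines in the log above already show the problem: both nodes have CPUs, but every memory range is assigned to node 1. A minimal Python sketch of that check follows. It is only an illustration of how to read this log, not kernel code; the regular expressions and the name `memoryless_nodes` are assumptions, and the address ranges are copied as the archive prints them (only the node numbers are used).

```python
import re

# Lines copied from the boot log quoted in this thread.
SRAT_LOG = """\
SRAT: PXM 0 -> APIC 0 -> Node 0
SRAT: PXM 0 -> APIC 1 -> Node 0
SRAT: PXM 1 -> APIC 2 -> Node 1
SRAT: PXM 1 -> APIC 3 -> Node 1
SRAT: Node 1 PXM 1 0-a
SRAT: Node 1 PXM 1 0-dd00
SRAT: Node 1 PXM 1 0-12300
"""

# CPU affinity lines: proximity domain -> APIC id -> node
CPU_RE = re.compile(r"SRAT: PXM (\d+) -> APIC (\d+) -> Node (\d+)")
# Memory affinity lines: node, proximity domain, address range
MEM_RE = re.compile(r"SRAT: Node (\d+) PXM (\d+) ([0-9a-f]+)-([0-9a-f]+)")


def memoryless_nodes(log_text):
    """Return the sorted list of nodes that have CPUs but no memory range."""
    cpu_nodes, mem_nodes = set(), set()
    for line in log_text.splitlines():
        line = line.strip()
        cpu = CPU_RE.match(line)
        if cpu:
            cpu_nodes.add(int(cpu.group(3)))
            continue
        mem = MEM_RE.match(line)
        if mem:
            mem_nodes.add(int(mem.group(1)))
    return sorted(cpu_nodes - mem_nodes)


print(memoryless_nodes(SRAT_LOG))  # prints [0]
```

Node 0 comes out as memoryless, which matches the "Could not find start_pfn for node 0" message.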


2.6.24 with discontinuous and slab works well.

2.6.24 with sparse and slub gets an oops:
ehci_hcd :00:02.1: EHCI Host Controller
Unable to handle kernel paging request at 3078 RIP:
 [8026585f] __alloc_pages+0x7d/0x33a
PGD 0
Oops:  [1] SMP
CPU 3
Modules linked in:
Pid: 1, comm: swapper Not tainted 2.6.24-smp #1
RIP: 0010:[8026585f]  [8026585f] __alloc_pages+0x7d/0x33a
RSP: 0018:810122a55bc0  EFLAGS: 00010246
RAX:  RBX:  RCX: 0002
RDX: 3070 RSI:  RDI: 00d0
RBP: 00d0 R08: 810001025be0 R09: 810122a55b00
R10: 8085e2a0 R11: 00a0 R12: 3070
R13:  R14: 00d0 R15: 810122a52000
FS:  () GS:810122c02f00() knlGS:
CS:  0010 DS: 0018 ES: 0018 CR0: 8005003b
CR2: 3078 CR3: 00201000 CR4: 06e0
DR0:  DR1:  DR2: 
DR3:  DR6: 0ff0 DR7: 0400
Process swapper (pid: 1, threadinfo 810122a54000, task 810122a52000)
Stack:  810122a55d20 80455b55  0080
 3078 22a55d10 001200d0 81011f5ea000
 0020  fff2 
Call Trace:
 [80282ea1] new_slab+0xdd/0x236
 [802831a1] __slab_alloc+0x1a7/0x397
 [804cea6f] dma_pool_create+0x86/0x147
 [80283697] kmem_cache_alloc_node+0x3e/0x6d
 [804cea6f] dma_pool_create+0x86/0x147
 [80667e52] hcd_buffer_create+0x57/0x89
 [80450032] compat_blkdev_ioctl+0xd72/0x11f4
 [80662da4] usb_add_hcd+0x72/0x59f
 [8066c541] usb_hcd_pci_probe+0x1e4/0x28b
 [80461de6] pci_device_probe+0xd1/0x136
 [804caf0f] driver_probe_device+0xd3/0x150
 [804cb02e] __driver_attach+0x0/0x93
 [804cb088] __driver_attach+0x5a/0x93
 [804ca36d] bus_for_each_dev+0x43/0x6e
 [804ca69d] bus_add_driver+0x79/0x1bd
 [80461fbe] __pci_register_driver+0x5b/0x8d
 [80bbf63a] kernel_init+0x175/0x2e1
 [8020cd28] child_rip+0xa/0x12
 [80bbf4c5] kernel_init+0x0/0x2e1
 [8020cd1e] child_rip+0x0/0x12


Code: 49 83 7c 24 08 00 75 0e 48 c7 44 24 38 00 00 00 00 e9 93 02
RIP  [8026585f] __alloc_pages+0x7d/0x33a
 RSP 810122a55bc0
CR2: 3078
---[ end trace c08baa60a7f2ad32 ]---
Kernel panic - not syncing: Attempted to kill init!
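
For readers following the oops above, the faulting address can be tied to the register dump by decoding the first instruction on the Code: line. The Python sketch below handles only that one x86-64 encoding and is not a general disassembler. It assumes the Code: line starts at the faulting instruction and that R12 and CR2 are exactly as the archive prints them; the function name `decode_cmp_sib_disp8` is made up for illustration.

```python
# Values copied from the oops as the archive prints them.
CODE = bytes.fromhex("49 83 7c 24 08 00")  # first 6 bytes of the Code: line
R12 = 0x3070
CR2 = 0x3078


def decode_cmp_sib_disp8(code):
    """Decode one specific form: REX 83 /7 ib with a SIB byte and disp8,
    i.e. CMP [base + disp8], imm8. Anything else is rejected."""
    if len(code) != 6:
        raise ValueError("expected exactly 6 bytes")
    rex, opcode, modrm, sib, disp8, imm8 = code
    if rex & 0xF0 != 0x40:
        raise ValueError("missing REX prefix")
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if opcode != 0x83 or reg != 7:
        raise ValueError("not CMP r/m, imm8")
    if mod != 1 or rm != 4:
        raise ValueError("not the SIB + disp8 addressing form")
    index, base = (sib >> 3) & 7, sib & 7
    if index != 4 or rex & 0x02:
        raise ValueError("unexpected index register")
    base_reg = base + (8 if rex & 0x01 else 0)  # REX.B selects r8..r15
    # disp8 and imm8 are signed bytes
    disp = disp8 - 256 if disp8 >= 128 else disp8
    imm = imm8 - 256 if imm8 >= 128 else imm8
    return {
        "base_reg": base_reg,
        "disp": disp,
        "imm": imm,
        "operand_bits": 64 if rex & 0x08 else 32,
    }


insn = decode_cmp_sib_disp8(CODE)
fault_address = R12 + insn["disp"]
print(insn)
print(hex(fault_address), fault_address == CR2)
```

The bytes decode to a 64-bit compare of the memory at 8(%r12) against 0, and R12 + 8 = 0x3078 equals CR2. In other words, `__alloc_pages` faulted reading a field at offset 8 from a pointer whose value was 0x3070. What that pointer was supposed to reference is not established in the thread.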


