Re: Regression in 2.6.23-pre Was: Problems with 2.6.23-rc6 on AMD Geode LX800
On 26/09/07 14:20 -0700, H. Peter Anvin wrote: > Testing this patch now: > > >From 2efa33f81ef56e7700c09a3d8a881c96692149e5 Mon Sep 17 00:00:00 2001 > From: H. Peter Anvin <[EMAIL PROTECTED]> > Date: Wed, 26 Sep 2007 14:11:43 -0700 > Subject: [PATCH] [x86 setup] Handle case of improperly terminated E820 chain > > At least one system (a Geode system with a Digital Logic BIOS) has > been found which suddenly stops reporting the SMAP signature when > reading the E820 memory chain. We can't know what, exactly, broke in > the BIOS, so if we detect this situation, declare the E820 data > unusable and fall back to E801. > > Also, revert to original behavior of always probing all memory > methods; that way all the memory information is available to the > kernel. > > Signed-off-by: H. Peter Anvin <[EMAIL PROTECTED]> > Cc: Jordan Crouse <[EMAIL PROTECTED]> > Cc: Joerg Pommnitz <[EMAIL PROTECTED]> > --- > arch/i386/boot/memory.c | 30 +++--- > 1 files changed, 23 insertions(+), 7 deletions(-) > > diff --git a/arch/i386/boot/memory.c b/arch/i386/boot/memory.c > index 1a2e62d..bccaa1c 100644 > --- a/arch/i386/boot/memory.c > +++ b/arch/i386/boot/memory.c > @@ -20,6 +20,7 @@ > > static int detect_memory_e820(void) > { > + int count = 0; > u32 next = 0; > u32 size, id; > u8 err; > @@ -33,14 +34,24 @@ static int detect_memory_e820(void) > "=m" (*desc) > : "D" (desc), "a" (0xe820)); > > - if (err || id != SMAP) > + /* Some BIOSes stop returning SMAP in the middle of > +the search loop. We don't know exactly how the BIOS > +screwed up the map at that point, we might have a > +partial map, the full map, or complete garbage, so > +just return failure. */ > + if (id != SMAP) { > + count = 0; > break; > + } > > - boot_params.e820_entries++; > + if (err) > + break; > + > + count++; > desc++; > - } while (next && boot_params.e820_entries < E820MAX); > + } while (next && count < E820MAX); > > - return boot_params.e820_entries; > + return boot_params.e820_entries = count; > } > > static int detect_memory_e801(void) > @@ -89,11 +100,16 @@ static int detect_memory_88(void) > > int detect_memory(void) > { > + int err = -1; > + > if (detect_memory_e820() > 0) > - return 0; > + err = 0; > > if (!detect_memory_e801()) > - return 0; > + err = 0; > + > + if (!detect_memory_88()) > + err = 0; > > - return detect_memory_88(); > + return err; > } > -- > 1.5.3.1 > Works here with the buggy BIOS. Acked-by: Jordan Crouse <[EMAIL PROTECTED]> Thanks. -- Jordan Crouse Systems Software Development Engineer Advanced Micro Devices, Inc. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Regression in 2.6.23-pre Was: Problems with 2.6.23-rc6 on AMD Geode LX800
Jordan Crouse wrote: > > Hmm - the old code seems to fail to e801 when CF was set too: > > int $0x15 # make the call > jc bail820 # fall to e801 if it fails > > cmpl$SMAP, %eax # check the return is `SMAP' > jne bail820 # fall to e801 if it fails > > Thats not to say that the old code was correct, mind you. Failing on a > bad ID and returning without error on a set CF seems to be good to me. > Testing this patch now: >From 2efa33f81ef56e7700c09a3d8a881c96692149e5 Mon Sep 17 00:00:00 2001 From: H. Peter Anvin <[EMAIL PROTECTED]> Date: Wed, 26 Sep 2007 14:11:43 -0700 Subject: [PATCH] [x86 setup] Handle case of improperly terminated E820 chain At least one system (a Geode system with a Digital Logic BIOS) has been found which suddenly stops reporting the SMAP signature when reading the E820 memory chain. We can't know what, exactly, broke in the BIOS, so if we detect this situation, declare the E820 data unusable and fall back to E801. Also, revert to original behavior of always probing all memory methods; that way all the memory information is available to the kernel. Signed-off-by: H. Peter Anvin <[EMAIL PROTECTED]> Cc: Jordan Crouse <[EMAIL PROTECTED]> Cc: Joerg Pommnitz <[EMAIL PROTECTED]> --- arch/i386/boot/memory.c | 30 +++--- 1 files changed, 23 insertions(+), 7 deletions(-) diff --git a/arch/i386/boot/memory.c b/arch/i386/boot/memory.c index 1a2e62d..bccaa1c 100644 --- a/arch/i386/boot/memory.c +++ b/arch/i386/boot/memory.c @@ -20,6 +20,7 @@ static int detect_memory_e820(void) { + int count = 0; u32 next = 0; u32 size, id; u8 err; @@ -33,14 +34,24 @@ static int detect_memory_e820(void) "=m" (*desc) : "D" (desc), "a" (0xe820)); - if (err || id != SMAP) + /* Some BIOSes stop returning SMAP in the middle of + the search loop. We don't know exactly how the BIOS + screwed up the map at that point, we might have a + partial map, the full map, or complete garbage, so + just return failure. */ + if (id != SMAP) { + count = 0; break; + } - boot_params.e820_entries++; + if (err) + break; + + count++; desc++; - } while (next && boot_params.e820_entries < E820MAX); + } while (next && count < E820MAX); - return boot_params.e820_entries; + return boot_params.e820_entries = count; } static int detect_memory_e801(void) @@ -89,11 +100,16 @@ static int detect_memory_88(void) int detect_memory(void) { + int err = -1; + if (detect_memory_e820() > 0) - return 0; + err = 0; if (!detect_memory_e801()) - return 0; + err = 0; + + if (!detect_memory_88()) + err = 0; - return detect_memory_88(); + return err; } -- 1.5.3.1
Re: Regression in 2.6.23-pre Was: Problems with 2.6.23-rc6 on AMD Geode LX800
On 26/09/07 14:04 -0700, H. Peter Anvin wrote: > Jordan Crouse wrote: > > On 26/09/07 12:14 -0700, H. Peter Anvin wrote: > >> Please try the following debug patch to let us know what is going on. > >> > >>-hpa > > > >> diff --git a/arch/i386/boot/memory.c b/arch/i386/boot/memory.c > >> index 1a2e62d..a0ccf29 100644 > >> --- a/arch/i386/boot/memory.c > >> +++ b/arch/i386/boot/memory.c > >> @@ -33,6 +33,12 @@ static int detect_memory_e820(void) > >> "=m" (*desc) > >>: "D" (desc), "a" (0xe820)); > >> > >> + printf("e820: err %d id 0x%08x next %u %08x:%08x %u\n", > >> + err, id, next, > >> + (unsigned int)desc->addr, > >> + (unsigned int)desc->size, > >> + desc->type); > >> + > >>if (err || id != SMAP) > >>break; > > > > Okay, we have clarity. Here is the output > > > > e820: err 0 id 0x534d4150 next 15476 :0009fc00 1 > > e820: err 0 id 0x534d4150 next 15496 0009fc00:0400 2 > > e820: err 0 id 0x534d4150 next 15516 000e:0002 2 > > e820: err 0 id 0x0e7b next 11536 0010:0e6b 1 > > > > In the last entry, id is obviously wrong (it should be 'SMAP' or > > 0x534d4150). This is the BIOS bug. > > > > Here's the reason why this bothers us now. In the old assembly code, > > if the returned ID wasn't equal to 'SMAP', we jumped straight to the e801 > > code. In the new code in memory.c, if id != SMAP, we break out of the > > int15 loop, and return boot_params.e820_entries, which in our case is > > 3. detect_memory() considers this to be successful, and no attempt to > > parse e801 is made. > > > > So thats where the problem is - in the old code with the buggy BIOS, we > > punted to reading the e801 information, and that was enough to keep us > > going. In the new code, we allow a partial table to be used, and we > > blow up. > > > > Attached is a patch to fix this - it returns -1 on error, and only sets > > boot_params.e820_entries to be non-zero if we have something useful > > in it. This punts the detection to the e801 code, which then is > > then successful. > > > > This fixes the problem with the DB800, and so it probably should > > with the other Geode platforms affected by this. > > > > Many thanks to hpa for the guiding hand. > > > > This patch is obviously wrong. There are a lot of e820 BIOSen out there > that terminate with CF=1, and that is a legitimate termination condition > for e820. Now, as far as what to do when id != SMAP, it probably is > still the right thing to do; since the BOS vendor couldn't get something > that elementary correct, we shouldn't trust the data. > > I'll write up a corrected patch. Hmm - the old code seems to fail to e801 when CF was set too: int $0x15 # make the call jc bail820 # fall to e801 if it fails cmpl$SMAP, %eax # check the return is `SMAP' jne bail820 # fall to e801 if it fails Thats not to say that the old code was correct, mind you. Failing on a bad ID and returning without error on a set CF seems to be good to me. Jordan -- Jordan Crouse Systems Software Development Engineer Advanced Micro Devices, Inc. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Regression in 2.6.23-pre Was: Problems with 2.6.23-rc6 on AMD Geode LX800
Jordan Crouse wrote: > On 26/09/07 12:14 -0700, H. Peter Anvin wrote: >> Please try the following debug patch to let us know what is going on. >> >> -hpa > >> diff --git a/arch/i386/boot/memory.c b/arch/i386/boot/memory.c >> index 1a2e62d..a0ccf29 100644 >> --- a/arch/i386/boot/memory.c >> +++ b/arch/i386/boot/memory.c >> @@ -33,6 +33,12 @@ static int detect_memory_e820(void) >>"=m" (*desc) >> : "D" (desc), "a" (0xe820)); >> >> +printf("e820: err %d id 0x%08x next %u %08x:%08x %u\n", >> + err, id, next, >> + (unsigned int)desc->addr, >> + (unsigned int)desc->size, >> + desc->type); >> + >> if (err || id != SMAP) >> break; > > Okay, we have clarity. Here is the output > > e820: err 0 id 0x534d4150 next 15476 :0009fc00 1 > e820: err 0 id 0x534d4150 next 15496 0009fc00:0400 2 > e820: err 0 id 0x534d4150 next 15516 000e:0002 2 > e820: err 0 id 0x0e7b next 11536 0010:0e6b 1 > > In the last entry, id is obviously wrong (it should be 'SMAP' or > 0x534d4150). This is the BIOS bug. > > Here's the reason why this bothers us now. In the old assembly code, > if the returned ID wasn't equal to 'SMAP', we jumped straight to the e801 > code. In the new code in memory.c, if id != SMAP, we break out of the > int15 loop, and return boot_params.e820_entries, which in our case is > 3. detect_memory() considers this to be successful, and no attempt to > parse e801 is made. > > So thats where the problem is - in the old code with the buggy BIOS, we > punted to reading the e801 information, and that was enough to keep us > going. In the new code, we allow a partial table to be used, and we > blow up. > > Attached is a patch to fix this - it returns -1 on error, and only sets > boot_params.e820_entries to be non-zero if we have something useful > in it. This punts the detection to the e801 code, which then is > then successful. > > This fixes the problem with the DB800, and so it probably should > with the other Geode platforms affected by this. > > Many thanks to hpa for the guiding hand. > This patch is obviously wrong. There are a lot of e820 BIOSen out there that terminate with CF=1, and that is a legitimate termination condition for e820. Now, as far as what to do when id != SMAP, it probably is still the right thing to do; since the BOS vendor couldn't get something that elementary correct, we shouldn't trust the data. I'll write up a corrected patch. -hpa - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Regression in 2.6.23-pre Was: Problems with 2.6.23-rc6 on AMD Geode LX800
On 26/09/07 12:14 -0700, H. Peter Anvin wrote: > Please try the following debug patch to let us know what is going on. > > -hpa > diff --git a/arch/i386/boot/memory.c b/arch/i386/boot/memory.c > index 1a2e62d..a0ccf29 100644 > --- a/arch/i386/boot/memory.c > +++ b/arch/i386/boot/memory.c > @@ -33,6 +33,12 @@ static int detect_memory_e820(void) > "=m" (*desc) > : "D" (desc), "a" (0xe820)); > > + printf("e820: err %d id 0x%08x next %u %08x:%08x %u\n", > +err, id, next, > +(unsigned int)desc->addr, > +(unsigned int)desc->size, > +desc->type); > + > if (err || id != SMAP) > break; Okay, we have clarity. Here is the output e820: err 0 id 0x534d4150 next 15476 :0009fc00 1 e820: err 0 id 0x534d4150 next 15496 0009fc00:0400 2 e820: err 0 id 0x534d4150 next 15516 000e:0002 2 e820: err 0 id 0x0e7b next 11536 0010:0e6b 1 In the last entry, id is obviously wrong (it should be 'SMAP' or 0x534d4150). This is the BIOS bug. Here's the reason why this bothers us now. In the old assembly code, if the returned ID wasn't equal to 'SMAP', we jumped straight to the e801 code. In the new code in memory.c, if id != SMAP, we break out of the int15 loop, and return boot_params.e820_entries, which in our case is 3. detect_memory() considers this to be successful, and no attempt to parse e801 is made. So thats where the problem is - in the old code with the buggy BIOS, we punted to reading the e801 information, and that was enough to keep us going. In the new code, we allow a partial table to be used, and we blow up. Attached is a patch to fix this - it returns -1 on error, and only sets boot_params.e820_entries to be non-zero if we have something useful in it. This punts the detection to the e801 code, which then is then successful. This fixes the problem with the DB800, and so it probably should with the other Geode platforms affected by this. Many thanks to hpa for the guiding hand. Jordan -- Jordan Crouse Systems Software Development Engineer Advanced Micro Devices, Inc. [i386]: Return an error if the e820 detection goes bad From: Jordan Crouse <[EMAIL PROTECTED]> Change the e820 code to always return an error if something bad happens while reading the e820 map. This matches the old code behavior, and allows brain-dead e820 implementations to still work. Signed-off-by: Jordan Crouse <[EMAIL PROTECTED]> --- arch/i386/boot/memory.c |9 + 1 files changed, 5 insertions(+), 4 deletions(-) diff --git a/arch/i386/boot/memory.c b/arch/i386/boot/memory.c index 1a2e62d..4c7f0f6 100644 --- a/arch/i386/boot/memory.c +++ b/arch/i386/boot/memory.c @@ -22,7 +22,7 @@ static int detect_memory_e820(void) { u32 next = 0; u32 size, id; - u8 err; + u8 err, count = 0; struct e820entry *desc = boot_params.e820_map; do { @@ -34,13 +34,14 @@ static int detect_memory_e820(void) : "D" (desc), "a" (0xe820)); if (err || id != SMAP) - break; + return -1; - boot_params.e820_entries++; + count++; desc++; } while (next && boot_params.e820_entries < E820MAX); - return boot_params.e820_entries; + boot_params.e820_entries = count; + return count; } static int detect_memory_e801(void)
Re: Regression in 2.6.23-pre Was: Problems with 2.6.23-rc6 on AMD Geode LX800
Jordan Crouse wrote: > > Its the latter - max_pfn as read by find_max_pfn() in arch/i386/e820.c > is being set to 9F (640k) in the broken case, this due to the > the e820 map looking something like this: > > Address Size Type > 0009FC00 1 > 0009FC00 0400 2 > 000E 2000 2 > > (Yep, thats it - thats the list. e820.nr_map is indeed 3). > > Long story short, bdata->node_low_pfn gets set to 9F, and When we > try to allocate the bootmem bitmap (at _pa_symbol(_text), which is > page 0x100), then the system gets appropriately angry. > > As background, I'm using syslinux 3.36 as my loader here - I've used this > exact same version for a very long time, so I don't blame it in the least. > Something is getting confused in the early kernel, and whatever that > something is, a still unknown change in a newer version of the BIOS > fixed it. The search goes on. > Please try the following debug patch to let us know what is going on. -hpa diff --git a/arch/i386/boot/memory.c b/arch/i386/boot/memory.c index 1a2e62d..a0ccf29 100644 --- a/arch/i386/boot/memory.c +++ b/arch/i386/boot/memory.c @@ -33,6 +33,12 @@ static int detect_memory_e820(void) "=m" (*desc) : "D" (desc), "a" (0xe820)); + printf("e820: err %d id 0x%08x next %u %08x:%08x %u\n", + err, id, next, + (unsigned int)desc->addr, + (unsigned int)desc->size, + desc->type); + if (err || id != SMAP) break;
Re: Regression in 2.6.23-pre Was: Problems with 2.6.23-rc6 on AMD Geode LX800
Jordan Crouse wrote: > > As background, I'm using syslinux 3.36 as my loader here - I've used this > exact same version for a very long time, so I don't blame it in the least. > Something is getting confused in the early kernel, and whatever that > something is, a still unknown change in a newer version of the BIOS > fixed it. The search goes on. > OK, we should put printf's in arch/i386/boot/memory.c and see what actually gets read out from the BIOS. This could either be a BIOS problem or a bug in memory.c (or a bug elsewhere in the code that the change in memory.c triggers, but that seems less likely.) -hpa P.S. Are you guys in the Bay Area by any chance? - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Regression in 2.6.23-pre Was: Problems with 2.6.23-rc6 on AMD Geode LX800
On 26/09/07 07:10 -0700, H. Peter Anvin wrote: > Joerg Pommnitz wrote: > > Hello all, > > this is what git bisect told me about the problem: > > > > [EMAIL PROTECTED]:~/linux-2.6$ git bisect good > > 4fd06960f120e02e9abc802a09f9511c400042a5 is first bad commit > > commit 4fd06960f120e02e9abc802a09f9511c400042a5 > > Author: H. Peter Anvin <[EMAIL PROTECTED]> > > Date: Wed Jul 11 12:18:56 2007 -0700 > > > > Use the new x86 setup code for i386 > > > > This patch hooks the new x86 setup code into the Makefile machinery. It > > also adapts boot/tools/build.c to a two-file (as opposed to three-file) > > universe, and simplifies it substantially. > > > > Signed-off-by: H. Peter Anvin <[EMAIL PROTECTED]> > > Signed-off-by: Linus Torvalds <[EMAIL PROTECTED]> > > > > :04 04 6560eb5b7e40d93813276544bced8c478f9067f5 > > fe5f90d9ca08e526559815789175602ba2c51743 M arch > > > > There is something very fishy. > > The only documentation you've given us so far is a screen shot which > contained a message ("BIOS data check successful") which doesn't occur > in the kernel. > > The loader string doesn't look all that familiar either; it looks like > an extremely old version of SYSLINUX, but that doesn't contain that > message either. > > INT 6 is #UD, the undefined instruction exception. This is consistent with: > > > Its hitting a bug - specifically (from bootmem.c:125): > > BUG_ON(PFN_DOWN(addr) >= bdata->node_low_pfn); > > However, all that tells us is that reserve_bootmem_core() was either > called with a bad address or bdata->node_low_pfn is garbage. In > particular, without knowing how it got there it's hard to know for sure. /me swings a +5 JTAG debugger Its the latter - max_pfn as read by find_max_pfn() in arch/i386/e820.c is being set to 9F (640k) in the broken case, this due to the the e820 map looking something like this: Address Size Type 0009FC00 1 0009FC00 0400 2 000E 2000 2 (Yep, thats it - thats the list. e820.nr_map is indeed 3). Long story short, bdata->node_low_pfn gets set to 9F, and When we try to allocate the bootmem bitmap (at _pa_symbol(_text), which is page 0x100), then the system gets appropriately angry. As background, I'm using syslinux 3.36 as my loader here - I've used this exact same version for a very long time, so I don't blame it in the least. Something is getting confused in the early kernel, and whatever that something is, a still unknown change in a newer version of the BIOS fixed it. The search goes on. Jordan -- Jordan Crouse Systems Software Development Engineer Advanced Micro Devices, Inc. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: Regression in 2.6.23-pre Was: Problems with 2.6.23-rc6 on AMD Geode LX800
> There is something very fishy. > > The only documentation you've given us so far is a screen shot which > contained a message ("BIOS data check successful") which doesn't occur > in the kernel. > > The loader string doesn't look all that familiar either; it looks like > an extremely old version of SYSLINUX, but that doesn't contain that > message either. The boot loader is LILO from Ubuntu 7.04, so it should be fairly recent. > INT 6 is #UD, the undefined instruction exception. This is consistent with: > > > Its hitting a bug - specifically (from bootmem.c:125): > > BUG_ON(PFN_DOWN(addr) >= bdata->node_low_pfn); > > However, all that tells us is that reserve_bootmem_core() was either > called with a bad address or bdata->node_low_pfn is garbage. In > particular, without knowing how it got there it's hard to know for sure. > > Could you send me the boot messages from a working kernel boot? Attached is a boot log I get where the last patch is f2d98ae63dc64dedb00499289e13a50677f771f9, e.g. "Linker script for the new x86 setup code". The kernel is directly from "git bisect, make defconfig, make", no local changes or strange patches applied. Build environment is plain Ubuntu-7.04. -- Kind regards Joerg __ Alles was der Gesundheit und Entspannung dient. BE A BETTER MEDIZINMANN! www.yahoo.de/cleverLinux version 2.6.22-gf2d98ae6 ([EMAIL PROTECTED]) (gcc version 4.1.2 (Ubuntu 4.1.2-0ubuntu4)) #12 SMP Wed Sep 26 12:45:21 CEST 2007 BIOS-provided physical RAM map: BIOS-e820: - 0009fc00 (usable) BIOS-e820: 0009fc00 - 000a (reserved) BIOS-e820: 000e - 0010 (reserved) BIOS-e820: 0010 - 1e7b (usable) BIOS-e820: 1e7b - 1e7bffc0 (ACPI data) BIOS-e820: 1e7bffc0 - 1e7c (ACPI NVS) BIOS-e820: 4040 - 40440004 (reserved) BIOS-e820: f000 - 0001 (reserved) 0MB HIGHMEM available. 487MB LOWMEM available. Entering add_active_range(0, 0, 124848) 0 entries of 256 used Zone PFN ranges: DMA 0 -> 4096 Normal 4096 -> 124848 HighMem124848 -> 124848 early_node_map[1] active PFN ranges 0:0 -> 124848 On node 0 totalpages: 124848 DMA zone: 32 pages used for memmap DMA zone: 0 pages reserved DMA zone: 4064 pages, LIFO batch:0 Normal zone: 943 pages used for memmap Normal zone: 119809 pages, LIFO batch:31 HighMem zone: 0 pages used for memmap DMI not present or invalid. Using APIC driver default ACPI: RSDP 000E9010, 0014 (r0 OID_00) ACPI: RSDT 1E7B2AE0, 0030 (r1 AMDRSDT_000 31303030 AMD 31303030) ACPI: FACP 1E7B29E0, 0084 (r1 AMDFACP_000 31303030 AMD 31303030) ACPI: DSDT 1E7B, 29D6 (r1 INSYDE CS553x 1007 INTL 20030122) ACPI: FACS 1E7BFFC0, 0040 ACPI: BOOT 1E7B2A70, 0028 (r1 AMDBOOT_000 31303030 AMD 31303030) ACPI: DBGP 1E7B2AA0, 0034 (r1 AMDDBGP_000 31303030 AMD 31303030) ACPI: no DMI BIOS year, acpi=force is required to enable ACPI ACPI: Disabling ACPI support Allocating PCI resources starting at 5000 (gap: 40440004:afbbfffc) Built 1 zonelists. Total pages: 123873 Kernel command line: BOOT_IMAGE=Linux2623 ro root=341 No local APIC present or hardware disabled mapped APIC to d000 (013dc000) Initializing CPU#0 PID hash table entries: 2048 (order: 11, 8192 bytes) Detected 498.434 MHz processor. Console: colour VGA+ 80x25 Dentry cache hash table entries: 65536 (order: 6, 262144 bytes) Inode-cache hash table entries: 32768 (order: 5, 131072 bytes) Memory: 488944k/499392k available (3041k kernel code, 9872k reserved, 1517k data, 292k init, 0k highmem) virtual kernel memory layout: fixmap : 0xffe16000 - 0xf000 (1956 kB) pkmap : 0xff80 - 0xffc0 (4096 kB) vmalloc : 0xdf00 - 0xff7fe000 ( 519 MB) lowmem : 0xc000 - 0xde7b ( 487 MB) .init : 0xc057b000 - 0xc05c4000 ( 292 kB) .data : 0xc03f86e0 - 0xc0573ecc (1517 kB) .text : 0xc010 - 0xc03f86e0 (3041 kB) Checking if this processor honours the WP bit even in supervisor mode... Ok. Calibrating delay using timer specific routine.. 998.38 BogoMIPS (lpj=1996769) Mount-cache hash table entries: 512 CPU: After generic identify, caps: 0088a93d c0c0a13d CPU: L1 I Cache: 64K (32 bytes/line), D cache 64K (32 bytes/line) CPU: L2 Cache: 128K (32 bytes/line) CPU: After all inits, caps: 0088a93d c0c0a13d Compat vDSO mapped to e000. Checking 'hlt' instruction... OK. SMP alternatives: switching to UP code Freeing SMP alternatives: 18k freed CPU0: AMD Geode(TM) Integrated Processor by AMD PCS stepping 02 SMP motherboard not detected. Local APIC not detected. Using dummy APIC emulation. Brought up 1 CPUs NET: Registered protocol family 16 PCI: PCI BIOS revision 2.10
Re: Regression in 2.6.23-pre Was: Problems with 2.6.23-rc6 on AMD Geode LX800
Joerg Pommnitz wrote: > Hello all, > this is what git bisect told me about the problem: > > [EMAIL PROTECTED]:~/linux-2.6$ git bisect good > 4fd06960f120e02e9abc802a09f9511c400042a5 is first bad commit > commit 4fd06960f120e02e9abc802a09f9511c400042a5 > Author: H. Peter Anvin <[EMAIL PROTECTED]> > Date: Wed Jul 11 12:18:56 2007 -0700 > > Use the new x86 setup code for i386 > > This patch hooks the new x86 setup code into the Makefile machinery. It > also adapts boot/tools/build.c to a two-file (as opposed to three-file) > universe, and simplifies it substantially. > > Signed-off-by: H. Peter Anvin <[EMAIL PROTECTED]> > Signed-off-by: Linus Torvalds <[EMAIL PROTECTED]> > > :04 04 6560eb5b7e40d93813276544bced8c478f9067f5 > fe5f90d9ca08e526559815789175602ba2c51743 M arch > There is something very fishy. The only documentation you've given us so far is a screen shot which contained a message ("BIOS data check successful") which doesn't occur in the kernel. The loader string doesn't look all that familiar either; it looks like an extremely old version of SYSLINUX, but that doesn't contain that message either. INT 6 is #UD, the undefined instruction exception. This is consistent with: > Its hitting a bug - specifically (from bootmem.c:125): > BUG_ON(PFN_DOWN(addr) >= bdata->node_low_pfn); However, all that tells us is that reserve_bootmem_core() was either called with a bad address or bdata->node_low_pfn is garbage. In particular, without knowing how it got there it's hard to know for sure. Could you send me the boot messages from a working kernel boot? -hpa - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Regression in 2.6.23-pre Was: Problems with 2.6.23-rc6 on AMD Geode LX800
Hello all, this is what git bisect told me about the problem: [EMAIL PROTECTED]:~/linux-2.6$ git bisect good 4fd06960f120e02e9abc802a09f9511c400042a5 is first bad commit commit 4fd06960f120e02e9abc802a09f9511c400042a5 Author: H. Peter Anvin <[EMAIL PROTECTED]> Date: Wed Jul 11 12:18:56 2007 -0700 Use the new x86 setup code for i386 This patch hooks the new x86 setup code into the Makefile machinery. It also adapts boot/tools/build.c to a two-file (as opposed to three-file) universe, and simplifies it substantially. Signed-off-by: H. Peter Anvin <[EMAIL PROTECTED]> Signed-off-by: Linus Torvalds <[EMAIL PROTECTED]> :04 04 6560eb5b7e40d93813276544bced8c478f9067f5 fe5f90d9ca08e526559815789175602ba2c51743 M arch -- Regards Joerg - Ursprüngliche Mail Von: Jordan Crouse <[EMAIL PROTECTED]> An: Joerg Pommnitz <[EMAIL PROTECTED]> CC: [EMAIL PROTECTED]; linux-kernel@vger.kernel.org Gesendet: Dienstag, den 25. September 2007, 17:04:52 Uhr Betreff: Re: Problems with 2.6.23-rc6 on AMD Geode LX800 On 25/09/07 01:38 -0700, Joerg Pommnitz wrote: > Chuck, Jordan, > thanks for taking an interest in this problem. As suggested by Jordan I tried > a new > BIOS revision from > http://www.digitallogic.ch/index.php?id=256=/MSEP800%20-%20SM800PCX%20%20-%20MPC20%20-%20MPC21=23 > > Unfortunately the kernel still fails to boot in the same way. You'll have to contact Digital Logic and have them check with the BIOS vendor to see if the fix was made in that version or not. I don't have that particular board, so I can't try out the fixes here. I'm still trying to track down the particulars of the fix from the BIOS vendor. I'll let you know. > Do you still need the disassembled reserve_bootmem_core? Sure - you might as well - just to make sure its the same problem. Jordan -- Jordan Crouse Systems Software Development Engineer Advanced Micro Devices, Inc. __ Yahoo! Clever: Sie haben Fragen? Yahoo! Nutzer antworten Ihnen. www.yahoo.de/clever - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Regression in 2.6.23-pre Was: Problems with 2.6.23-rc6 on AMD Geode LX800
Hello all, this is what git bisect told me about the problem: [EMAIL PROTECTED]:~/linux-2.6$ git bisect good 4fd06960f120e02e9abc802a09f9511c400042a5 is first bad commit commit 4fd06960f120e02e9abc802a09f9511c400042a5 Author: H. Peter Anvin [EMAIL PROTECTED] Date: Wed Jul 11 12:18:56 2007 -0700 Use the new x86 setup code for i386 This patch hooks the new x86 setup code into the Makefile machinery. It also adapts boot/tools/build.c to a two-file (as opposed to three-file) universe, and simplifies it substantially. Signed-off-by: H. Peter Anvin [EMAIL PROTECTED] Signed-off-by: Linus Torvalds [EMAIL PROTECTED] :04 04 6560eb5b7e40d93813276544bced8c478f9067f5 fe5f90d9ca08e526559815789175602ba2c51743 M arch -- Regards Joerg - Ursprüngliche Mail Von: Jordan Crouse [EMAIL PROTECTED] An: Joerg Pommnitz [EMAIL PROTECTED] CC: [EMAIL PROTECTED]; linux-kernel@vger.kernel.org Gesendet: Dienstag, den 25. September 2007, 17:04:52 Uhr Betreff: Re: Problems with 2.6.23-rc6 on AMD Geode LX800 On 25/09/07 01:38 -0700, Joerg Pommnitz wrote: Chuck, Jordan, thanks for taking an interest in this problem. As suggested by Jordan I tried a new BIOS revision from http://www.digitallogic.ch/index.php?id=256dir=/MSEP800%20-%20SM800PCX%20%20-%20MPC20%20-%20MPC21mountpoint=23 Unfortunately the kernel still fails to boot in the same way. You'll have to contact Digital Logic and have them check with the BIOS vendor to see if the fix was made in that version or not. I don't have that particular board, so I can't try out the fixes here. I'm still trying to track down the particulars of the fix from the BIOS vendor. I'll let you know. Do you still need the disassembled reserve_bootmem_core? Sure - you might as well - just to make sure its the same problem. Jordan -- Jordan Crouse Systems Software Development Engineer Advanced Micro Devices, Inc. __ Yahoo! Clever: Sie haben Fragen? Yahoo! Nutzer antworten Ihnen. www.yahoo.de/clever - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Regression in 2.6.23-pre Was: Problems with 2.6.23-rc6 on AMD Geode LX800
Joerg Pommnitz wrote: Hello all, this is what git bisect told me about the problem: [EMAIL PROTECTED]:~/linux-2.6$ git bisect good 4fd06960f120e02e9abc802a09f9511c400042a5 is first bad commit commit 4fd06960f120e02e9abc802a09f9511c400042a5 Author: H. Peter Anvin [EMAIL PROTECTED] Date: Wed Jul 11 12:18:56 2007 -0700 Use the new x86 setup code for i386 This patch hooks the new x86 setup code into the Makefile machinery. It also adapts boot/tools/build.c to a two-file (as opposed to three-file) universe, and simplifies it substantially. Signed-off-by: H. Peter Anvin [EMAIL PROTECTED] Signed-off-by: Linus Torvalds [EMAIL PROTECTED] :04 04 6560eb5b7e40d93813276544bced8c478f9067f5 fe5f90d9ca08e526559815789175602ba2c51743 M arch There is something very fishy. The only documentation you've given us so far is a screen shot which contained a message (BIOS data check successful) which doesn't occur in the kernel. The loader string doesn't look all that familiar either; it looks like an extremely old version of SYSLINUX, but that doesn't contain that message either. INT 6 is #UD, the undefined instruction exception. This is consistent with: Its hitting a bug - specifically (from bootmem.c:125): BUG_ON(PFN_DOWN(addr) = bdata-node_low_pfn); However, all that tells us is that reserve_bootmem_core() was either called with a bad address or bdata-node_low_pfn is garbage. In particular, without knowing how it got there it's hard to know for sure. Could you send me the boot messages from a working kernel boot? -hpa - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: Regression in 2.6.23-pre Was: Problems with 2.6.23-rc6 on AMD Geode LX800
There is something very fishy. The only documentation you've given us so far is a screen shot which contained a message (BIOS data check successful) which doesn't occur in the kernel. The loader string doesn't look all that familiar either; it looks like an extremely old version of SYSLINUX, but that doesn't contain that message either. The boot loader is LILO from Ubuntu 7.04, so it should be fairly recent. INT 6 is #UD, the undefined instruction exception. This is consistent with: Its hitting a bug - specifically (from bootmem.c:125): BUG_ON(PFN_DOWN(addr) = bdata-node_low_pfn); However, all that tells us is that reserve_bootmem_core() was either called with a bad address or bdata-node_low_pfn is garbage. In particular, without knowing how it got there it's hard to know for sure. Could you send me the boot messages from a working kernel boot? Attached is a boot log I get where the last patch is f2d98ae63dc64dedb00499289e13a50677f771f9, e.g. Linker script for the new x86 setup code. The kernel is directly from git bisect, make defconfig, make, no local changes or strange patches applied. Build environment is plain Ubuntu-7.04. -- Kind regards Joerg __ Alles was der Gesundheit und Entspannung dient. BE A BETTER MEDIZINMANN! www.yahoo.de/cleverLinux version 2.6.22-gf2d98ae6 ([EMAIL PROTECTED]) (gcc version 4.1.2 (Ubuntu 4.1.2-0ubuntu4)) #12 SMP Wed Sep 26 12:45:21 CEST 2007 BIOS-provided physical RAM map: BIOS-e820: - 0009fc00 (usable) BIOS-e820: 0009fc00 - 000a (reserved) BIOS-e820: 000e - 0010 (reserved) BIOS-e820: 0010 - 1e7b (usable) BIOS-e820: 1e7b - 1e7bffc0 (ACPI data) BIOS-e820: 1e7bffc0 - 1e7c (ACPI NVS) BIOS-e820: 4040 - 40440004 (reserved) BIOS-e820: f000 - 0001 (reserved) 0MB HIGHMEM available. 487MB LOWMEM available. Entering add_active_range(0, 0, 124848) 0 entries of 256 used Zone PFN ranges: DMA 0 - 4096 Normal 4096 - 124848 HighMem124848 - 124848 early_node_map[1] active PFN ranges 0:0 - 124848 On node 0 totalpages: 124848 DMA zone: 32 pages used for memmap DMA zone: 0 pages reserved DMA zone: 4064 pages, LIFO batch:0 Normal zone: 943 pages used for memmap Normal zone: 119809 pages, LIFO batch:31 HighMem zone: 0 pages used for memmap DMI not present or invalid. Using APIC driver default ACPI: RSDP 000E9010, 0014 (r0 OID_00) ACPI: RSDT 1E7B2AE0, 0030 (r1 AMDRSDT_000 31303030 AMD 31303030) ACPI: FACP 1E7B29E0, 0084 (r1 AMDFACP_000 31303030 AMD 31303030) ACPI: DSDT 1E7B, 29D6 (r1 INSYDE CS553x 1007 INTL 20030122) ACPI: FACS 1E7BFFC0, 0040 ACPI: BOOT 1E7B2A70, 0028 (r1 AMDBOOT_000 31303030 AMD 31303030) ACPI: DBGP 1E7B2AA0, 0034 (r1 AMDDBGP_000 31303030 AMD 31303030) ACPI: no DMI BIOS year, acpi=force is required to enable ACPI ACPI: Disabling ACPI support Allocating PCI resources starting at 5000 (gap: 40440004:afbbfffc) Built 1 zonelists. Total pages: 123873 Kernel command line: BOOT_IMAGE=Linux2623 ro root=341 No local APIC present or hardware disabled mapped APIC to d000 (013dc000) Initializing CPU#0 PID hash table entries: 2048 (order: 11, 8192 bytes) Detected 498.434 MHz processor. Console: colour VGA+ 80x25 Dentry cache hash table entries: 65536 (order: 6, 262144 bytes) Inode-cache hash table entries: 32768 (order: 5, 131072 bytes) Memory: 488944k/499392k available (3041k kernel code, 9872k reserved, 1517k data, 292k init, 0k highmem) virtual kernel memory layout: fixmap : 0xffe16000 - 0xf000 (1956 kB) pkmap : 0xff80 - 0xffc0 (4096 kB) vmalloc : 0xdf00 - 0xff7fe000 ( 519 MB) lowmem : 0xc000 - 0xde7b ( 487 MB) .init : 0xc057b000 - 0xc05c4000 ( 292 kB) .data : 0xc03f86e0 - 0xc0573ecc (1517 kB) .text : 0xc010 - 0xc03f86e0 (3041 kB) Checking if this processor honours the WP bit even in supervisor mode... Ok. Calibrating delay using timer specific routine.. 998.38 BogoMIPS (lpj=1996769) Mount-cache hash table entries: 512 CPU: After generic identify, caps: 0088a93d c0c0a13d CPU: L1 I Cache: 64K (32 bytes/line), D cache 64K (32 bytes/line) CPU: L2 Cache: 128K (32 bytes/line) CPU: After all inits, caps: 0088a93d c0c0a13d Compat vDSO mapped to e000. Checking 'hlt' instruction... OK. SMP alternatives: switching to UP code Freeing SMP alternatives: 18k freed CPU0: AMD Geode(TM) Integrated Processor by AMD PCS stepping 02 SMP motherboard not detected. Local APIC not detected. Using dummy APIC emulation. Brought up 1 CPUs NET: Registered protocol family 16 PCI: PCI BIOS revision 2.10 entry at 0xff8b7, last bus=0 PCI:
Re: Regression in 2.6.23-pre Was: Problems with 2.6.23-rc6 on AMD Geode LX800
On 26/09/07 07:10 -0700, H. Peter Anvin wrote: Joerg Pommnitz wrote: Hello all, this is what git bisect told me about the problem: [EMAIL PROTECTED]:~/linux-2.6$ git bisect good 4fd06960f120e02e9abc802a09f9511c400042a5 is first bad commit commit 4fd06960f120e02e9abc802a09f9511c400042a5 Author: H. Peter Anvin [EMAIL PROTECTED] Date: Wed Jul 11 12:18:56 2007 -0700 Use the new x86 setup code for i386 This patch hooks the new x86 setup code into the Makefile machinery. It also adapts boot/tools/build.c to a two-file (as opposed to three-file) universe, and simplifies it substantially. Signed-off-by: H. Peter Anvin [EMAIL PROTECTED] Signed-off-by: Linus Torvalds [EMAIL PROTECTED] :04 04 6560eb5b7e40d93813276544bced8c478f9067f5 fe5f90d9ca08e526559815789175602ba2c51743 M arch There is something very fishy. The only documentation you've given us so far is a screen shot which contained a message (BIOS data check successful) which doesn't occur in the kernel. The loader string doesn't look all that familiar either; it looks like an extremely old version of SYSLINUX, but that doesn't contain that message either. INT 6 is #UD, the undefined instruction exception. This is consistent with: Its hitting a bug - specifically (from bootmem.c:125): BUG_ON(PFN_DOWN(addr) = bdata-node_low_pfn); However, all that tells us is that reserve_bootmem_core() was either called with a bad address or bdata-node_low_pfn is garbage. In particular, without knowing how it got there it's hard to know for sure. /me swings a +5 JTAG debugger Its the latter - max_pfn as read by find_max_pfn() in arch/i386/e820.c is being set to 9F (640k) in the broken case, this due to the the e820 map looking something like this: Address Size Type 0009FC00 1 0009FC00 0400 2 000E 2000 2 (Yep, thats it - thats the list. e820.nr_map is indeed 3). Long story short, bdata-node_low_pfn gets set to 9F, and When we try to allocate the bootmem bitmap (at _pa_symbol(_text), which is page 0x100), then the system gets appropriately angry. As background, I'm using syslinux 3.36 as my loader here - I've used this exact same version for a very long time, so I don't blame it in the least. Something is getting confused in the early kernel, and whatever that something is, a still unknown change in a newer version of the BIOS fixed it. The search goes on. Jordan -- Jordan Crouse Systems Software Development Engineer Advanced Micro Devices, Inc. - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Regression in 2.6.23-pre Was: Problems with 2.6.23-rc6 on AMD Geode LX800
Jordan Crouse wrote: As background, I'm using syslinux 3.36 as my loader here - I've used this exact same version for a very long time, so I don't blame it in the least. Something is getting confused in the early kernel, and whatever that something is, a still unknown change in a newer version of the BIOS fixed it. The search goes on. OK, we should put printf's in arch/i386/boot/memory.c and see what actually gets read out from the BIOS. This could either be a BIOS problem or a bug in memory.c (or a bug elsewhere in the code that the change in memory.c triggers, but that seems less likely.) -hpa P.S. Are you guys in the Bay Area by any chance? - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Regression in 2.6.23-pre Was: Problems with 2.6.23-rc6 on AMD Geode LX800
Jordan Crouse wrote: Its the latter - max_pfn as read by find_max_pfn() in arch/i386/e820.c is being set to 9F (640k) in the broken case, this due to the the e820 map looking something like this: Address Size Type 0009FC00 1 0009FC00 0400 2 000E 2000 2 (Yep, thats it - thats the list. e820.nr_map is indeed 3). Long story short, bdata-node_low_pfn gets set to 9F, and When we try to allocate the bootmem bitmap (at _pa_symbol(_text), which is page 0x100), then the system gets appropriately angry. As background, I'm using syslinux 3.36 as my loader here - I've used this exact same version for a very long time, so I don't blame it in the least. Something is getting confused in the early kernel, and whatever that something is, a still unknown change in a newer version of the BIOS fixed it. The search goes on. Please try the following debug patch to let us know what is going on. -hpa diff --git a/arch/i386/boot/memory.c b/arch/i386/boot/memory.c index 1a2e62d..a0ccf29 100644 --- a/arch/i386/boot/memory.c +++ b/arch/i386/boot/memory.c @@ -33,6 +33,12 @@ static int detect_memory_e820(void) =m (*desc) : D (desc), a (0xe820)); + printf(e820: err %d id 0x%08x next %u %08x:%08x %u\n, + err, id, next, + (unsigned int)desc-addr, + (unsigned int)desc-size, + desc-type); + if (err || id != SMAP) break;
Re: Regression in 2.6.23-pre Was: Problems with 2.6.23-rc6 on AMD Geode LX800
On 26/09/07 12:14 -0700, H. Peter Anvin wrote: Please try the following debug patch to let us know what is going on. -hpa diff --git a/arch/i386/boot/memory.c b/arch/i386/boot/memory.c index 1a2e62d..a0ccf29 100644 --- a/arch/i386/boot/memory.c +++ b/arch/i386/boot/memory.c @@ -33,6 +33,12 @@ static int detect_memory_e820(void) =m (*desc) : D (desc), a (0xe820)); + printf(e820: err %d id 0x%08x next %u %08x:%08x %u\n, +err, id, next, +(unsigned int)desc-addr, +(unsigned int)desc-size, +desc-type); + if (err || id != SMAP) break; Okay, we have clarity. Here is the output e820: err 0 id 0x534d4150 next 15476 :0009fc00 1 e820: err 0 id 0x534d4150 next 15496 0009fc00:0400 2 e820: err 0 id 0x534d4150 next 15516 000e:0002 2 e820: err 0 id 0x0e7b next 11536 0010:0e6b 1 In the last entry, id is obviously wrong (it should be 'SMAP' or 0x534d4150). This is the BIOS bug. Here's the reason why this bothers us now. In the old assembly code, if the returned ID wasn't equal to 'SMAP', we jumped straight to the e801 code. In the new code in memory.c, if id != SMAP, we break out of the int15 loop, and return boot_params.e820_entries, which in our case is 3. detect_memory() considers this to be successful, and no attempt to parse e801 is made. So thats where the problem is - in the old code with the buggy BIOS, we punted to reading the e801 information, and that was enough to keep us going. In the new code, we allow a partial table to be used, and we blow up. Attached is a patch to fix this - it returns -1 on error, and only sets boot_params.e820_entries to be non-zero if we have something useful in it. This punts the detection to the e801 code, which then is then successful. This fixes the problem with the DB800, and so it probably should with the other Geode platforms affected by this. Many thanks to hpa for the guiding hand. Jordan -- Jordan Crouse Systems Software Development Engineer Advanced Micro Devices, Inc. [i386]: Return an error if the e820 detection goes bad From: Jordan Crouse [EMAIL PROTECTED] Change the e820 code to always return an error if something bad happens while reading the e820 map. This matches the old code behavior, and allows brain-dead e820 implementations to still work. Signed-off-by: Jordan Crouse [EMAIL PROTECTED] --- arch/i386/boot/memory.c |9 + 1 files changed, 5 insertions(+), 4 deletions(-) diff --git a/arch/i386/boot/memory.c b/arch/i386/boot/memory.c index 1a2e62d..4c7f0f6 100644 --- a/arch/i386/boot/memory.c +++ b/arch/i386/boot/memory.c @@ -22,7 +22,7 @@ static int detect_memory_e820(void) { u32 next = 0; u32 size, id; - u8 err; + u8 err, count = 0; struct e820entry *desc = boot_params.e820_map; do { @@ -34,13 +34,14 @@ static int detect_memory_e820(void) : D (desc), a (0xe820)); if (err || id != SMAP) - break; + return -1; - boot_params.e820_entries++; + count++; desc++; } while (next boot_params.e820_entries E820MAX); - return boot_params.e820_entries; + boot_params.e820_entries = count; + return count; } static int detect_memory_e801(void)
Re: Regression in 2.6.23-pre Was: Problems with 2.6.23-rc6 on AMD Geode LX800
Jordan Crouse wrote: On 26/09/07 12:14 -0700, H. Peter Anvin wrote: Please try the following debug patch to let us know what is going on. -hpa diff --git a/arch/i386/boot/memory.c b/arch/i386/boot/memory.c index 1a2e62d..a0ccf29 100644 --- a/arch/i386/boot/memory.c +++ b/arch/i386/boot/memory.c @@ -33,6 +33,12 @@ static int detect_memory_e820(void) =m (*desc) : D (desc), a (0xe820)); +printf(e820: err %d id 0x%08x next %u %08x:%08x %u\n, + err, id, next, + (unsigned int)desc-addr, + (unsigned int)desc-size, + desc-type); + if (err || id != SMAP) break; Okay, we have clarity. Here is the output e820: err 0 id 0x534d4150 next 15476 :0009fc00 1 e820: err 0 id 0x534d4150 next 15496 0009fc00:0400 2 e820: err 0 id 0x534d4150 next 15516 000e:0002 2 e820: err 0 id 0x0e7b next 11536 0010:0e6b 1 In the last entry, id is obviously wrong (it should be 'SMAP' or 0x534d4150). This is the BIOS bug. Here's the reason why this bothers us now. In the old assembly code, if the returned ID wasn't equal to 'SMAP', we jumped straight to the e801 code. In the new code in memory.c, if id != SMAP, we break out of the int15 loop, and return boot_params.e820_entries, which in our case is 3. detect_memory() considers this to be successful, and no attempt to parse e801 is made. So thats where the problem is - in the old code with the buggy BIOS, we punted to reading the e801 information, and that was enough to keep us going. In the new code, we allow a partial table to be used, and we blow up. Attached is a patch to fix this - it returns -1 on error, and only sets boot_params.e820_entries to be non-zero if we have something useful in it. This punts the detection to the e801 code, which then is then successful. This fixes the problem with the DB800, and so it probably should with the other Geode platforms affected by this. Many thanks to hpa for the guiding hand. This patch is obviously wrong. There are a lot of e820 BIOSen out there that terminate with CF=1, and that is a legitimate termination condition for e820. Now, as far as what to do when id != SMAP, it probably is still the right thing to do; since the BOS vendor couldn't get something that elementary correct, we shouldn't trust the data. I'll write up a corrected patch. -hpa - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Regression in 2.6.23-pre Was: Problems with 2.6.23-rc6 on AMD Geode LX800
On 26/09/07 14:04 -0700, H. Peter Anvin wrote: Jordan Crouse wrote: On 26/09/07 12:14 -0700, H. Peter Anvin wrote: Please try the following debug patch to let us know what is going on. -hpa diff --git a/arch/i386/boot/memory.c b/arch/i386/boot/memory.c index 1a2e62d..a0ccf29 100644 --- a/arch/i386/boot/memory.c +++ b/arch/i386/boot/memory.c @@ -33,6 +33,12 @@ static int detect_memory_e820(void) =m (*desc) : D (desc), a (0xe820)); + printf(e820: err %d id 0x%08x next %u %08x:%08x %u\n, + err, id, next, + (unsigned int)desc-addr, + (unsigned int)desc-size, + desc-type); + if (err || id != SMAP) break; Okay, we have clarity. Here is the output e820: err 0 id 0x534d4150 next 15476 :0009fc00 1 e820: err 0 id 0x534d4150 next 15496 0009fc00:0400 2 e820: err 0 id 0x534d4150 next 15516 000e:0002 2 e820: err 0 id 0x0e7b next 11536 0010:0e6b 1 In the last entry, id is obviously wrong (it should be 'SMAP' or 0x534d4150). This is the BIOS bug. Here's the reason why this bothers us now. In the old assembly code, if the returned ID wasn't equal to 'SMAP', we jumped straight to the e801 code. In the new code in memory.c, if id != SMAP, we break out of the int15 loop, and return boot_params.e820_entries, which in our case is 3. detect_memory() considers this to be successful, and no attempt to parse e801 is made. So thats where the problem is - in the old code with the buggy BIOS, we punted to reading the e801 information, and that was enough to keep us going. In the new code, we allow a partial table to be used, and we blow up. Attached is a patch to fix this - it returns -1 on error, and only sets boot_params.e820_entries to be non-zero if we have something useful in it. This punts the detection to the e801 code, which then is then successful. This fixes the problem with the DB800, and so it probably should with the other Geode platforms affected by this. Many thanks to hpa for the guiding hand. This patch is obviously wrong. There are a lot of e820 BIOSen out there that terminate with CF=1, and that is a legitimate termination condition for e820. Now, as far as what to do when id != SMAP, it probably is still the right thing to do; since the BOS vendor couldn't get something that elementary correct, we shouldn't trust the data. I'll write up a corrected patch. Hmm - the old code seems to fail to e801 when CF was set too: int $0x15 # make the call jc bail820 # fall to e801 if it fails cmpl$SMAP, %eax # check the return is `SMAP' jne bail820 # fall to e801 if it fails Thats not to say that the old code was correct, mind you. Failing on a bad ID and returning without error on a set CF seems to be good to me. Jordan -- Jordan Crouse Systems Software Development Engineer Advanced Micro Devices, Inc. - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Regression in 2.6.23-pre Was: Problems with 2.6.23-rc6 on AMD Geode LX800
Jordan Crouse wrote: Hmm - the old code seems to fail to e801 when CF was set too: int $0x15 # make the call jc bail820 # fall to e801 if it fails cmpl$SMAP, %eax # check the return is `SMAP' jne bail820 # fall to e801 if it fails Thats not to say that the old code was correct, mind you. Failing on a bad ID and returning without error on a set CF seems to be good to me. Testing this patch now: From 2efa33f81ef56e7700c09a3d8a881c96692149e5 Mon Sep 17 00:00:00 2001 From: H. Peter Anvin [EMAIL PROTECTED] Date: Wed, 26 Sep 2007 14:11:43 -0700 Subject: [PATCH] [x86 setup] Handle case of improperly terminated E820 chain At least one system (a Geode system with a Digital Logic BIOS) has been found which suddenly stops reporting the SMAP signature when reading the E820 memory chain. We can't know what, exactly, broke in the BIOS, so if we detect this situation, declare the E820 data unusable and fall back to E801. Also, revert to original behavior of always probing all memory methods; that way all the memory information is available to the kernel. Signed-off-by: H. Peter Anvin [EMAIL PROTECTED] Cc: Jordan Crouse [EMAIL PROTECTED] Cc: Joerg Pommnitz [EMAIL PROTECTED] --- arch/i386/boot/memory.c | 30 +++--- 1 files changed, 23 insertions(+), 7 deletions(-) diff --git a/arch/i386/boot/memory.c b/arch/i386/boot/memory.c index 1a2e62d..bccaa1c 100644 --- a/arch/i386/boot/memory.c +++ b/arch/i386/boot/memory.c @@ -20,6 +20,7 @@ static int detect_memory_e820(void) { + int count = 0; u32 next = 0; u32 size, id; u8 err; @@ -33,14 +34,24 @@ static int detect_memory_e820(void) =m (*desc) : D (desc), a (0xe820)); - if (err || id != SMAP) + /* Some BIOSes stop returning SMAP in the middle of + the search loop. We don't know exactly how the BIOS + screwed up the map at that point, we might have a + partial map, the full map, or complete garbage, so + just return failure. */ + if (id != SMAP) { + count = 0; break; + } - boot_params.e820_entries++; + if (err) + break; + + count++; desc++; - } while (next boot_params.e820_entries E820MAX); + } while (next count E820MAX); - return boot_params.e820_entries; + return boot_params.e820_entries = count; } static int detect_memory_e801(void) @@ -89,11 +100,16 @@ static int detect_memory_88(void) int detect_memory(void) { + int err = -1; + if (detect_memory_e820() 0) - return 0; + err = 0; if (!detect_memory_e801()) - return 0; + err = 0; + + if (!detect_memory_88()) + err = 0; - return detect_memory_88(); + return err; } -- 1.5.3.1
Re: Regression in 2.6.23-pre Was: Problems with 2.6.23-rc6 on AMD Geode LX800
On 26/09/07 14:20 -0700, H. Peter Anvin wrote: Testing this patch now: From 2efa33f81ef56e7700c09a3d8a881c96692149e5 Mon Sep 17 00:00:00 2001 From: H. Peter Anvin [EMAIL PROTECTED] Date: Wed, 26 Sep 2007 14:11:43 -0700 Subject: [PATCH] [x86 setup] Handle case of improperly terminated E820 chain At least one system (a Geode system with a Digital Logic BIOS) has been found which suddenly stops reporting the SMAP signature when reading the E820 memory chain. We can't know what, exactly, broke in the BIOS, so if we detect this situation, declare the E820 data unusable and fall back to E801. Also, revert to original behavior of always probing all memory methods; that way all the memory information is available to the kernel. Signed-off-by: H. Peter Anvin [EMAIL PROTECTED] Cc: Jordan Crouse [EMAIL PROTECTED] Cc: Joerg Pommnitz [EMAIL PROTECTED] --- arch/i386/boot/memory.c | 30 +++--- 1 files changed, 23 insertions(+), 7 deletions(-) diff --git a/arch/i386/boot/memory.c b/arch/i386/boot/memory.c index 1a2e62d..bccaa1c 100644 --- a/arch/i386/boot/memory.c +++ b/arch/i386/boot/memory.c @@ -20,6 +20,7 @@ static int detect_memory_e820(void) { + int count = 0; u32 next = 0; u32 size, id; u8 err; @@ -33,14 +34,24 @@ static int detect_memory_e820(void) =m (*desc) : D (desc), a (0xe820)); - if (err || id != SMAP) + /* Some BIOSes stop returning SMAP in the middle of +the search loop. We don't know exactly how the BIOS +screwed up the map at that point, we might have a +partial map, the full map, or complete garbage, so +just return failure. */ + if (id != SMAP) { + count = 0; break; + } - boot_params.e820_entries++; + if (err) + break; + + count++; desc++; - } while (next boot_params.e820_entries E820MAX); + } while (next count E820MAX); - return boot_params.e820_entries; + return boot_params.e820_entries = count; } static int detect_memory_e801(void) @@ -89,11 +100,16 @@ static int detect_memory_88(void) int detect_memory(void) { + int err = -1; + if (detect_memory_e820() 0) - return 0; + err = 0; if (!detect_memory_e801()) - return 0; + err = 0; + + if (!detect_memory_88()) + err = 0; - return detect_memory_88(); + return err; } -- 1.5.3.1 Works here with the buggy BIOS. Acked-by: Jordan Crouse [EMAIL PROTECTED] Thanks. -- Jordan Crouse Systems Software Development Engineer Advanced Micro Devices, Inc. - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Problems with 2.6.23-rc6 on AMD Geode LX800
On 25/09/07 01:38 -0700, Joerg Pommnitz wrote: > Chuck, Jordan, > thanks for taking an interest in this problem. As suggested by Jordan I tried > a new > BIOS revision from > http://www.digitallogic.ch/index.php?id=256=/MSEP800%20-%20SM800PCX%20%20-%20MPC20%20-%20MPC21=23 > > Unfortunately the kernel still fails to boot in the same way. You'll have to contact Digital Logic and have them check with the BIOS vendor to see if the fix was made in that version or not. I don't have that particular board, so I can't try out the fixes here. I'm still trying to track down the particulars of the fix from the BIOS vendor. I'll let you know. > Do you still need the disassembled reserve_bootmem_core? Sure - you might as well - just to make sure its the same problem. Jordan -- Jordan Crouse Systems Software Development Engineer Advanced Micro Devices, Inc. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: Problems with 2.6.23-rc6 on AMD Geode LX800
Chuck, Jordan, thanks for taking an interest in this problem. As suggested by Jordan I tried a new BIOS revision from http://www.digitallogic.ch/index.php?id=256=/MSEP800%20-%20SM800PCX%20%20-%20MPC20%20-%20MPC21=23 Unfortunately the kernel still fails to boot in the same way. Do you still need the disassembled reserve_bootmem_core? -- Thanks and kind regards Joerg Yahoo! Clever: Stellen Sie Fragen und finden Sie Antworten. Teilen Sie Ihr Wissen. www.yahoo.de/clever - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: Problems with 2.6.23-rc6 on AMD Geode LX800
Chuck, Jordan, thanks for taking an interest in this problem. As suggested by Jordan I tried a new BIOS revision from http://www.digitallogic.ch/index.php?id=256dir=/MSEP800%20-%20SM800PCX%20%20-%20MPC20%20-%20MPC21mountpoint=23 Unfortunately the kernel still fails to boot in the same way. Do you still need the disassembled reserve_bootmem_core? -- Thanks and kind regards Joerg Yahoo! Clever: Stellen Sie Fragen und finden Sie Antworten. Teilen Sie Ihr Wissen. www.yahoo.de/clever - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Problems with 2.6.23-rc6 on AMD Geode LX800
On 25/09/07 01:38 -0700, Joerg Pommnitz wrote: Chuck, Jordan, thanks for taking an interest in this problem. As suggested by Jordan I tried a new BIOS revision from http://www.digitallogic.ch/index.php?id=256dir=/MSEP800%20-%20SM800PCX%20%20-%20MPC20%20-%20MPC21mountpoint=23 Unfortunately the kernel still fails to boot in the same way. You'll have to contact Digital Logic and have them check with the BIOS vendor to see if the fix was made in that version or not. I don't have that particular board, so I can't try out the fixes here. I'm still trying to track down the particulars of the fix from the BIOS vendor. I'll let you know. Do you still need the disassembled reserve_bootmem_core? Sure - you might as well - just to make sure its the same problem. Jordan -- Jordan Crouse Systems Software Development Engineer Advanced Micro Devices, Inc. - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Problems with 2.6.23-rc6 on AMD Geode LX800
Chuck Ebbert wrote: > On 09/20/2007 08:32 AM, Joerg Pommnitz wrote: >> Hello all, >> yesterday I tried to boot a kernel built from the current wireless-dev git >> tree (ath5k branch) >> on a MSEP800/A board (see http://www.milesie.co.uk/pdf/MSEP800.pdf). The >> board >> contains an AMD Geode LX800 CPU. >> The wireless-dev tree is up to date with Linus kernel 2.6.23-rc6. >> >> Attached is a photographic screen shot. The EIP value of c0378dd6 seems to >> correspond with the >> reserve_bootmem_core from System.map: >> >> c0378d51 t free_bootmem_core >> c0378da7 T free_bootmem >> c0378db2 T free_bootmem_node >> c0378dba t reserve_bootmem_core >> c0378e14 T reserve_bootmem >> c0378e1f T reserve_bootmem_node >> > Can you post disassembled code for that function? Its hitting a bug - specifically (from bootmem.c:125): BUG_ON(PFN_DOWN(addr) >= bdata->node_low_pfn); I hit this problem on a db800 last week. It went away with a newer version of the BIOS, which doesn't help Joerg any, since its a different board (though I think it is the same BIOS vendor). Other BIOSes work just fine with the same kernel image (including known troublemakers like LinuxBIOS). I believe that 2.6.22 was good, so some change must have come along in 2.6.23-pre to cause the pain. Or, it may have exposed old breakage in the BIOS that was later repaired. I'll do the math to figure out whats happening - and I'll check the release notes to see what changed in the BIOS between the failing and working version. If anybody familiar with arch/i386 can think of something new in the kernel that may have precipitated this, do let me know. :) Jordan -- Jordan Crouse Systems Software Development Engineer Advanced Micro Devices, Inc. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Problems with 2.6.23-rc6 on AMD Geode LX800
On 09/20/2007 08:32 AM, Joerg Pommnitz wrote: > Hello all, > yesterday I tried to boot a kernel built from the current wireless-dev git > tree (ath5k branch) > on a MSEP800/A board (see http://www.milesie.co.uk/pdf/MSEP800.pdf). The board > contains an AMD Geode LX800 CPU. > The wireless-dev tree is up to date with Linus kernel 2.6.23-rc6. > > Attached is a photographic screen shot. The EIP value of c0378dd6 seems to > correspond with the > reserve_bootmem_core from System.map: > > c0378d51 t free_bootmem_core > c0378da7 T free_bootmem > c0378db2 T free_bootmem_node > c0378dba t reserve_bootmem_core > c0378e14 T reserve_bootmem > c0378e1f T reserve_bootmem_node > Can you post disassembled code for that function? - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Problems with 2.6.23-rc6 on AMD Geode LX800
On 09/20/2007 08:32 AM, Joerg Pommnitz wrote: Hello all, yesterday I tried to boot a kernel built from the current wireless-dev git tree (ath5k branch) on a MSEP800/A board (see http://www.milesie.co.uk/pdf/MSEP800.pdf). The board contains an AMD Geode LX800 CPU. The wireless-dev tree is up to date with Linus kernel 2.6.23-rc6. Attached is a photographic screen shot. The EIP value of c0378dd6 seems to correspond with the reserve_bootmem_core from System.map: c0378d51 t free_bootmem_core c0378da7 T free_bootmem c0378db2 T free_bootmem_node c0378dba t reserve_bootmem_core c0378e14 T reserve_bootmem c0378e1f T reserve_bootmem_node Can you post disassembled code for that function? - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Problems with 2.6.23-rc6 on AMD Geode LX800
Chuck Ebbert wrote: On 09/20/2007 08:32 AM, Joerg Pommnitz wrote: Hello all, yesterday I tried to boot a kernel built from the current wireless-dev git tree (ath5k branch) on a MSEP800/A board (see http://www.milesie.co.uk/pdf/MSEP800.pdf). The board contains an AMD Geode LX800 CPU. The wireless-dev tree is up to date with Linus kernel 2.6.23-rc6. Attached is a photographic screen shot. The EIP value of c0378dd6 seems to correspond with the reserve_bootmem_core from System.map: c0378d51 t free_bootmem_core c0378da7 T free_bootmem c0378db2 T free_bootmem_node c0378dba t reserve_bootmem_core c0378e14 T reserve_bootmem c0378e1f T reserve_bootmem_node Can you post disassembled code for that function? Its hitting a bug - specifically (from bootmem.c:125): BUG_ON(PFN_DOWN(addr) = bdata-node_low_pfn); I hit this problem on a db800 last week. It went away with a newer version of the BIOS, which doesn't help Joerg any, since its a different board (though I think it is the same BIOS vendor). Other BIOSes work just fine with the same kernel image (including known troublemakers like LinuxBIOS). I believe that 2.6.22 was good, so some change must have come along in 2.6.23-pre to cause the pain. Or, it may have exposed old breakage in the BIOS that was later repaired. I'll do the math to figure out whats happening - and I'll check the release notes to see what changed in the BIOS between the failing and working version. If anybody familiar with arch/i386 can think of something new in the kernel that may have precipitated this, do let me know. :) Jordan -- Jordan Crouse Systems Software Development Engineer Advanced Micro Devices, Inc. - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/