Re: Regression in 2.6.23-pre Was: Problems with 2.6.23-rc6 on AMD Geode LX800

2007-09-26 Thread Jordan Crouse
On 26/09/07 14:20 -0700, H. Peter Anvin wrote:
> Testing this patch now:
> 

> From 2efa33f81ef56e7700c09a3d8a881c96692149e5 Mon Sep 17 00:00:00 2001
> From: H. Peter Anvin <[EMAIL PROTECTED]>
> Date: Wed, 26 Sep 2007 14:11:43 -0700
> Subject: [PATCH] [x86 setup] Handle case of improperly terminated E820 chain
> 
> At least one system (a Geode system with a Digital Logic BIOS) has
> been found which suddenly stops reporting the SMAP signature when
> reading the E820 memory chain.  We can't know what, exactly, broke in
> the BIOS, so if we detect this situation, declare the E820 data
> unusable and fall back to E801.
> 
> Also, revert to original behavior of always probing all memory
> methods; that way all the memory information is available to the
> kernel.
> 
> Signed-off-by: H. Peter Anvin <[EMAIL PROTECTED]>
> Cc: Jordan Crouse <[EMAIL PROTECTED]>
> Cc: Joerg Pommnitz <[EMAIL PROTECTED]>
> ---
>  arch/i386/boot/memory.c |   30 +++++++++++++++++++++++-------
>  1 files changed, 23 insertions(+), 7 deletions(-)
> 
> diff --git a/arch/i386/boot/memory.c b/arch/i386/boot/memory.c
> index 1a2e62d..bccaa1c 100644
> --- a/arch/i386/boot/memory.c
> +++ b/arch/i386/boot/memory.c
> @@ -20,6 +20,7 @@
>  
>  static int detect_memory_e820(void)
>  {
> +	int count = 0;
>  	u32 next = 0;
>  	u32 size, id;
>  	u8 err;
> @@ -33,14 +34,24 @@ static int detect_memory_e820(void)
>  		  "=m" (*desc)
>  		: "D" (desc), "a" (0xe820));
>  
> -		if (err || id != SMAP)
> +		/* Some BIOSes stop returning SMAP in the middle of
> +		   the search loop.  We don't know exactly how the BIOS
> +		   screwed up the map at that point, we might have a
> +		   partial map, the full map, or complete garbage, so
> +		   just return failure. */
> +		if (id != SMAP) {
> +			count = 0;
>  			break;
> +		}
>  
> -		boot_params.e820_entries++;
> +		if (err)
> +			break;
> +
> +		count++;
>  		desc++;
> -	} while (next && boot_params.e820_entries < E820MAX);
> +	} while (next && count < E820MAX);
>  
> -	return boot_params.e820_entries;
> +	return boot_params.e820_entries = count;
>  }
>  
>  static int detect_memory_e801(void)
> @@ -89,11 +100,16 @@ static int detect_memory_88(void)
>  
>  int detect_memory(void)
>  {
> +	int err = -1;
> +
>  	if (detect_memory_e820() > 0)
> -		return 0;
> +		err = 0;
>  
>  	if (!detect_memory_e801())
> -		return 0;
> +		err = 0;
> +
> +	if (!detect_memory_88())
> +		err = 0;
>  
> -	return detect_memory_88();
> +	return err;
>  }
> -- 
> 1.5.3.1
> 

Works here with the buggy BIOS.  

Acked-by: Jordan Crouse <[EMAIL PROTECTED]>

Thanks.

-- 
Jordan Crouse
Systems Software Development Engineer 
Advanced Micro Devices, Inc.


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Regression in 2.6.23-pre Was: Problems with 2.6.23-rc6 on AMD Geode LX800

2007-09-26 Thread H. Peter Anvin
Jordan Crouse wrote:
> 
> Hmm - the old code seems to fall back to e801 when CF was set too:
> 
>   int  $0x15        # make the call
>   jc   bail820      # fall to e801 if it fails
> 
>   cmpl $SMAP, %eax  # check the return is `SMAP'
>   jne  bail820      # fall to e801 if it fails
> 
> That's not to say that the old code was correct, mind you.  Failing on a
> bad ID and returning without error on a set CF seems to be good to me.
> 

Testing this patch now:

From 2efa33f81ef56e7700c09a3d8a881c96692149e5 Mon Sep 17 00:00:00 2001
From: H. Peter Anvin <[EMAIL PROTECTED]>
Date: Wed, 26 Sep 2007 14:11:43 -0700
Subject: [PATCH] [x86 setup] Handle case of improperly terminated E820 chain

At least one system (a Geode system with a Digital Logic BIOS) has
been found which suddenly stops reporting the SMAP signature when
reading the E820 memory chain.  We can't know what, exactly, broke in
the BIOS, so if we detect this situation, declare the E820 data
unusable and fall back to E801.

Also, revert to original behavior of always probing all memory
methods; that way all the memory information is available to the
kernel.

Signed-off-by: H. Peter Anvin <[EMAIL PROTECTED]>
Cc: Jordan Crouse <[EMAIL PROTECTED]>
Cc: Joerg Pommnitz <[EMAIL PROTECTED]>
---
 arch/i386/boot/memory.c |   30 +++++++++++++++++++++++-------
 1 files changed, 23 insertions(+), 7 deletions(-)

diff --git a/arch/i386/boot/memory.c b/arch/i386/boot/memory.c
index 1a2e62d..bccaa1c 100644
--- a/arch/i386/boot/memory.c
+++ b/arch/i386/boot/memory.c
@@ -20,6 +20,7 @@
 
 static int detect_memory_e820(void)
 {
+	int count = 0;
 	u32 next = 0;
 	u32 size, id;
 	u8 err;
@@ -33,14 +34,24 @@ static int detect_memory_e820(void)
 		  "=m" (*desc)
 		: "D" (desc), "a" (0xe820));
 
-		if (err || id != SMAP)
+		/* Some BIOSes stop returning SMAP in the middle of
+		   the search loop.  We don't know exactly how the BIOS
+		   screwed up the map at that point, we might have a
+		   partial map, the full map, or complete garbage, so
+		   just return failure. */
+		if (id != SMAP) {
+			count = 0;
 			break;
+		}
 
-		boot_params.e820_entries++;
+		if (err)
+			break;
+
+		count++;
 		desc++;
-	} while (next && boot_params.e820_entries < E820MAX);
+	} while (next && count < E820MAX);
 
-	return boot_params.e820_entries;
+	return boot_params.e820_entries = count;
 }
 
 static int detect_memory_e801(void)
@@ -89,11 +100,16 @@ static int detect_memory_88(void)
 
 int detect_memory(void)
 {
+	int err = -1;
+
 	if (detect_memory_e820() > 0)
-		return 0;
+		err = 0;
 
 	if (!detect_memory_e801())
-		return 0;
+		err = 0;
+
+	if (!detect_memory_88())
+		err = 0;
 
-	return detect_memory_88();
+	return err;
 }
-- 
1.5.3.1
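
The termination rules in this patch can be modeled in a few lines of standalone C. This is a minimal sketch, not the kernel code: `struct fake_reply` and `count_e820_entries` are hypothetical names, the INT 0x15 call is replaced by a canned array of replies, and `E820MAX` is set to an assumed value.

```c
#include <stddef.h>
#include <stdint.h>

#define SMAP    0x534d4150u	/* 'SMAP' signature, as quoted in the thread */
#define E820MAX 128		/* assumed table capacity for this sketch */

/* One simulated BIOS reply; a stand-in for the real INT 0x15 call. */
struct fake_reply {
	uint32_t id;	/* value returned in EAX */
	uint32_t next;	/* continuation value; 0 means end of chain */
	uint8_t  err;	/* carry flag */
};

/*
 * Mirrors the loop in the patch: a bad signature discards everything
 * collected so far, while CF=1 merely ends the chain and keeps the
 * entries already counted.
 */
static int count_e820_entries(const struct fake_reply *replies, size_t n)
{
	int count = 0;
	size_t i = 0;
	uint32_t next;

	if (n == 0)
		return 0;

	do {
		const struct fake_reply *r = &replies[i++];

		next = r->next;
		if (r->id != SMAP) {
			count = 0;	/* map is untrustworthy */
			break;
		}
		if (r->err)
			break;		/* legitimate termination */
		count++;
	} while (next && count < E820MAX && i < n);

	return count;
}
```

Feeding it three good replies followed by one with a wrong signature yields 0, so the caller falls through to the other probes. A chain that ends with the carry flag set keeps the entries counted before it.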



Re: Regression in 2.6.23-pre Was: Problems with 2.6.23-rc6 on AMD Geode LX800

2007-09-26 Thread Jordan Crouse
On 26/09/07 14:04 -0700, H. Peter Anvin wrote:
> Jordan Crouse wrote:
> > On 26/09/07 12:14 -0700, H. Peter Anvin wrote:
> >> Please try the following debug patch to let us know what is going on.
> >>
> >>-hpa
> > 
> >> diff --git a/arch/i386/boot/memory.c b/arch/i386/boot/memory.c
> >> index 1a2e62d..a0ccf29 100644
> >> --- a/arch/i386/boot/memory.c
> >> +++ b/arch/i386/boot/memory.c
> >> @@ -33,6 +33,12 @@ static int detect_memory_e820(void)
> >>  "=m" (*desc)
> >>: "D" (desc), "a" (0xe820));
> >>  
> >> +  printf("e820: err %d id 0x%08x next %u %08x:%08x %u\n",
> >> + err, id, next,
> >> + (unsigned int)desc->addr,
> >> + (unsigned int)desc->size,
> >> + desc->type);
> >> +
> >>if (err || id != SMAP)
> >>break;
> > 
> > Okay, we have clarity.   Here is the output
> > 
> > e820: err 0 id 0x534d4150 next 15476 :0009fc00 1
> > e820: err 0 id 0x534d4150 next 15496 0009fc00:0400 2
> > e820: err 0 id 0x534d4150 next 15516 000e:0002 2
> > e820: err 0 id 0x0e7b next 11536 0010:0e6b 1
> > 
> > In the last entry,  id is obviously wrong (it should be 'SMAP' or
> > 0x534d4150).  This is the BIOS bug.
> > 
> > Here's the reason why this bothers us now.  In the old assembly code,
> > if the returned ID wasn't equal to 'SMAP', we jumped straight to the e801
> > code.  In the new code in memory.c, if id != SMAP, we break out of the
> > int15 loop, and return boot_params.e820_entries, which in our case is
> > 3.  detect_memory() considers this to be successful, and no attempt to
> > parse e801 is made.
> > 
> > So that's where the problem is - in the old code with the buggy BIOS, we
> > punted to reading the e801 information, and that was enough to keep us 
> > going.   In the new code, we allow a partial table to be used, and we
> > blow up.
> > 
> > Attached is a patch to fix this - it returns -1 on error, and only sets
> > boot_params.e820_entries to be non-zero if we have something useful
> > in it.  This punts the detection to the e801 code, which is then
> > successful.
> > 
> > This fixes the problem with the DB800, and so it probably should
> > with the other Geode platforms affected by this.
> > 
> > Many thanks to hpa for the guiding hand.
> > 
> 
> This patch is obviously wrong.  There are a lot of e820 BIOSen out there
> that terminate with CF=1, and that is a legitimate termination condition
> for e820.  Now, as far as what to do when id != SMAP, it probably is
> still the right thing to do; since the BIOS vendor couldn't get something
> that elementary correct, we shouldn't trust the data.
> 
> I'll write up a corrected patch.

Hmm - the old code seems to fall back to e801 when CF was set too:

int  $0x15        # make the call
jc   bail820      # fall to e801 if it fails

cmpl $SMAP, %eax  # check the return is `SMAP'
jne  bail820      # fall to e801 if it fails

That's not to say that the old code was correct, mind you.  Failing on a
bad ID and returning without error on a set CF seems to be good to me.

Jordan

-- 
Jordan Crouse
Systems Software Development Engineer 
Advanced Micro Devices, Inc.




Re: Regression in 2.6.23-pre Was: Problems with 2.6.23-rc6 on AMD Geode LX800

2007-09-26 Thread H. Peter Anvin
Jordan Crouse wrote:
> On 26/09/07 12:14 -0700, H. Peter Anvin wrote:
>> Please try the following debug patch to let us know what is going on.
>>
>>  -hpa
> 
>> diff --git a/arch/i386/boot/memory.c b/arch/i386/boot/memory.c
>> index 1a2e62d..a0ccf29 100644
>> --- a/arch/i386/boot/memory.c
>> +++ b/arch/i386/boot/memory.c
>> @@ -33,6 +33,12 @@ static int detect_memory_e820(void)
>>"=m" (*desc)
>>  : "D" (desc), "a" (0xe820));
>>  
>> +printf("e820: err %d id 0x%08x next %u %08x:%08x %u\n",
>> +   err, id, next,
>> +   (unsigned int)desc->addr,
>> +   (unsigned int)desc->size,
>> +   desc->type);
>> +
>>  if (err || id != SMAP)
>>  break;
> 
> Okay, we have clarity.   Here is the output
> 
> e820: err 0 id 0x534d4150 next 15476 :0009fc00 1
> e820: err 0 id 0x534d4150 next 15496 0009fc00:0400 2
> e820: err 0 id 0x534d4150 next 15516 000e:0002 2
> e820: err 0 id 0x0e7b next 11536 0010:0e6b 1
> 
> In the last entry,  id is obviously wrong (it should be 'SMAP' or
> 0x534d4150).  This is the BIOS bug.
> 
> Here's the reason why this bothers us now.  In the old assembly code,
> if the returned ID wasn't equal to 'SMAP', we jumped straight to the e801
> code.  In the new code in memory.c, if id != SMAP, we break out of the
> int15 loop, and return boot_params.e820_entries, which in our case is
> 3.  detect_memory() considers this to be successful, and no attempt to
> parse e801 is made.
> 
> So that's where the problem is - in the old code with the buggy BIOS, we
> punted to reading the e801 information, and that was enough to keep us 
> going.   In the new code, we allow a partial table to be used, and we
> blow up.
> 
> Attached is a patch to fix this - it returns -1 on error, and only sets
> boot_params.e820_entries to be non-zero if we have something useful
> in it.  This punts the detection to the e801 code, which is then
> successful.
> 
> This fixes the problem with the DB800, and so it probably should
> with the other Geode platforms affected by this.
> 
> Many thanks to hpa for the guiding hand.
> 

This patch is obviously wrong.  There are a lot of e820 BIOSen out there
that terminate with CF=1, and that is a legitimate termination condition
for e820.  Now, as far as what to do when id != SMAP, it probably is
still the right thing to do; since the BIOS vendor couldn't get something
that elementary correct, we shouldn't trust the data.

I'll write up a corrected patch.

-hpa


Re: Regression in 2.6.23-pre Was: Problems with 2.6.23-rc6 on AMD Geode LX800

2007-09-26 Thread Jordan Crouse
On 26/09/07 12:14 -0700, H. Peter Anvin wrote:
> Please try the following debug patch to let us know what is going on.
> 
>   -hpa

> diff --git a/arch/i386/boot/memory.c b/arch/i386/boot/memory.c
> index 1a2e62d..a0ccf29 100644
> --- a/arch/i386/boot/memory.c
> +++ b/arch/i386/boot/memory.c
> @@ -33,6 +33,12 @@ static int detect_memory_e820(void)
> "=m" (*desc)
>   : "D" (desc), "a" (0xe820));
>  
> + printf("e820: err %d id 0x%08x next %u %08x:%08x %u\n",
> +err, id, next,
> +(unsigned int)desc->addr,
> +(unsigned int)desc->size,
> +desc->type);
> +
>   if (err || id != SMAP)
>   break;

Okay, we have clarity.   Here is the output

e820: err 0 id 0x534d4150 next 15476 :0009fc00 1
e820: err 0 id 0x534d4150 next 15496 0009fc00:0400 2
e820: err 0 id 0x534d4150 next 15516 000e:0002 2
e820: err 0 id 0x0e7b next 11536 0010:0e6b 1

In the last entry,  id is obviously wrong (it should be 'SMAP' or
0x534d4150).  This is the BIOS bug.

Here's the reason why this bothers us now.  In the old assembly code,
if the returned ID wasn't equal to 'SMAP', we jumped straight to the e801
code.  In the new code in memory.c, if id != SMAP, we break out of the
int15 loop, and return boot_params.e820_entries, which in our case is
3.  detect_memory() considers this to be successful, and no attempt to
parse e801 is made.

So that's where the problem is - in the old code with the buggy BIOS, we
punted to reading the e801 information, and that was enough to keep us 
going.   In the new code, we allow a partial table to be used, and we
blow up.
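
The control flow described above can be exercised without a BIOS. A minimal sketch, assuming hypothetical names (`struct probe_results`, `struct probe_trace`, `detect_memory_short_circuit`) and with each probe's result passed in by the caller rather than read from hardware:

```c
#include <stdbool.h>

/* Outcome of each probe, supplied by the caller in place of real BIOS calls. */
struct probe_results {
	int e820_entries;	/* value returned by the E820 probe */
	int e801_status;	/* 0 on success, as in the quoted code */
	int e88_status;		/* 0 on success */
};

/* Records which of the later probes actually ran. */
struct probe_trace {
	bool ran_e801;
	bool ran_88;
};

/* Short-circuit flow: the first probe that reports success ends the search. */
static int detect_memory_short_circuit(const struct probe_results *r,
				       struct probe_trace *t)
{
	t->ran_e801 = false;
	t->ran_88 = false;

	if (r->e820_entries > 0)
		return 0;	/* a partial E820 table still counts as success */

	t->ran_e801 = true;
	if (!r->e801_status)
		return 0;

	t->ran_88 = true;
	return r->e88_status;
}
```

With `e820_entries` set to 3, the function returns 0 and `ran_e801` stays false, which is the situation seen on the DB800: the truncated table is accepted and E801 is never consulted.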

Attached is a patch to fix this - it returns -1 on error, and only sets
boot_params.e820_entries to be non-zero if we have something useful
in it.  This punts the detection to the e801 code, which is then
successful.

This fixes the problem with the DB800, and so it probably should
with the other Geode platforms affected by this.

Many thanks to hpa for the guiding hand.

Jordan

-- 
Jordan Crouse
Systems Software Development Engineer 
Advanced Micro Devices, Inc.

[i386]: Return an error if the e820 detection goes bad

From: Jordan Crouse <[EMAIL PROTECTED]>

Change the e820 code to always return an error if something
bad happens while reading the e820 map.  This matches the
old code behavior, and allows brain-dead e820 implementations
to still work.

Signed-off-by: Jordan Crouse <[EMAIL PROTECTED]>
---

 arch/i386/boot/memory.c |    9 +++++----
 1 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/arch/i386/boot/memory.c b/arch/i386/boot/memory.c
index 1a2e62d..4c7f0f6 100644
--- a/arch/i386/boot/memory.c
+++ b/arch/i386/boot/memory.c
@@ -22,7 +22,7 @@ static int detect_memory_e820(void)
 {
 	u32 next = 0;
 	u32 size, id;
-	u8 err;
+	u8 err, count = 0;
 	struct e820entry *desc = boot_params.e820_map;
 
 	do {
@@ -34,13 +34,14 @@ static int detect_memory_e820(void)
 		: "D" (desc), "a" (0xe820));
 
 		if (err || id != SMAP)
-			break;
+			return -1;
 
-		boot_params.e820_entries++;
+		count++;
 		desc++;
 	} while (next && boot_params.e820_entries < E820MAX);
 
-	return boot_params.e820_entries;
+	boot_params.e820_entries = count;
+	return count;
 }
 
 static int detect_memory_e801(void)


Re: Regression in 2.6.23-pre Was: Problems with 2.6.23-rc6 on AMD Geode LX800

2007-09-26 Thread H. Peter Anvin
Jordan Crouse wrote:
> 
> It's the latter - max_pfn as read by find_max_pfn() in arch/i386/e820.c
> is being set to 9F (640k) in the broken case, due to the
> e820 map looking something like this:
> 
> Address   Size  Type
>   0009FC00  1
> 0009FC00  0400  2
> 000E  2000  2
> 
> (Yep, that's it - that's the list.  e820.nr_map is indeed 3). 
> 
> Long story short, bdata->node_low_pfn gets set to 9F, and when we 
> try to allocate the bootmem bitmap (at _pa_symbol(_text), which is 
> page 0x100), then the system gets appropriately angry.
> 
> As background, I'm using syslinux 3.36 as my loader here - I've used this
> exact same version for a very long time, so I don't blame it in the least.
> Something is getting confused in the early kernel, and whatever that
> something is, a still unknown change in a newer version of the BIOS
> fixed it.  The search goes on.
> 

Please try the following debug patch to let us know what is going on.

-hpa
diff --git a/arch/i386/boot/memory.c b/arch/i386/boot/memory.c
index 1a2e62d..a0ccf29 100644
--- a/arch/i386/boot/memory.c
+++ b/arch/i386/boot/memory.c
@@ -33,6 +33,12 @@ static int detect_memory_e820(void)
 		  "=m" (*desc)
 		: "D" (desc), "a" (0xe820));
 
+		printf("e820: err %d id 0x%08x next %u %08x:%08x %u\n",
+		       err, id, next,
+		       (unsigned int)desc->addr,
+		       (unsigned int)desc->size,
+		       desc->type);
+
 		if (err || id != SMAP)
 			break;
 


Re: Regression in 2.6.23-pre Was: Problems with 2.6.23-rc6 on AMD Geode LX800

2007-09-26 Thread H. Peter Anvin
Jordan Crouse wrote:
> 
> As background, I'm using syslinux 3.36 as my loader here - I've used this
> exact same version for a very long time, so I don't blame it in the least.
> Something is getting confused in the early kernel, and whatever that
> something is, a still unknown change in a newer version of the BIOS
> fixed it.  The search goes on.
> 

OK, we should put printf's in arch/i386/boot/memory.c and see what
actually gets read out from the BIOS.  This could either be a BIOS
problem or a bug in memory.c (or a bug elsewhere in the code that the
change in memory.c triggers, but that seems less likely.)

-hpa

P.S. Are you guys in the Bay Area by any chance?


Re: Regression in 2.6.23-pre Was: Problems with 2.6.23-rc6 on AMD Geode LX800

2007-09-26 Thread Jordan Crouse
On 26/09/07 07:10 -0700, H. Peter Anvin wrote:
> Joerg Pommnitz wrote:
> > Hello all,
> > this is what git bisect told me about the problem:
> > 
> > [EMAIL PROTECTED]:~/linux-2.6$ git bisect good
> > 4fd06960f120e02e9abc802a09f9511c400042a5 is first bad commit
> > commit 4fd06960f120e02e9abc802a09f9511c400042a5
> > Author: H. Peter Anvin <[EMAIL PROTECTED]>
> > Date:   Wed Jul 11 12:18:56 2007 -0700
> > 
> > Use the new x86 setup code for i386
> > 
> > This patch hooks the new x86 setup code into the Makefile machinery.  It
> > also adapts boot/tools/build.c to a two-file (as opposed to three-file)
> > universe, and simplifies it substantially.
> > 
> > Signed-off-by: H. Peter Anvin <[EMAIL PROTECTED]>
> > Signed-off-by: Linus Torvalds <[EMAIL PROTECTED]>
> > 
> > :040000 040000 6560eb5b7e40d93813276544bced8c478f9067f5 fe5f90d9ca08e526559815789175602ba2c51743 M	arch
> > 
> 
> There is something very fishy.
> 
> The only documentation you've given us so far is a screen shot which
> contained a message ("BIOS data check successful") which doesn't occur
> in the kernel.
> 
> The loader string doesn't look all that familiar either; it looks like
> an extremely old version of SYSLINUX, but that doesn't contain that
> message either.
> 
> INT 6 is #UD, the undefined instruction exception.  This is consistent with:
> 
> > It's hitting a bug - specifically (from bootmem.c:125):
> > BUG_ON(PFN_DOWN(addr) >= bdata->node_low_pfn);
> 
> However, all that tells us is that reserve_bootmem_core() was either
> called with a bad address or bdata->node_low_pfn is garbage.  In
> particular, without knowing how it got there it's hard to know for sure.

/me swings a +5 JTAG debugger

It's the latter - max_pfn as read by find_max_pfn() in arch/i386/e820.c
is being set to 9F (640k) in the broken case, due to the
e820 map looking something like this:

Address   Size  Type
  0009FC00  1
0009FC00  0400  2
000E  2000  2

(Yep, that's it - that's the list.  e820.nr_map is indeed 3). 

Long story short, bdata->node_low_pfn gets set to 9F, and when we 
try to allocate the bootmem bitmap (at _pa_symbol(_text), which is 
page 0x100), then the system gets appropriately angry.
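
The arithmetic behind this can be checked in isolation. A minimal sketch, assuming 4 KiB pages and a usable region ending at 0x9fc00 (the end of the first usable range in the working boot log); `max_pfn_for` and `reservation_trips_bug` are hypothetical helper names, and the condition is the one from the quoted BUG_ON:

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT  12				/* 4 KiB pages on i386 */
#define PFN_DOWN(x) ((x) >> PAGE_SHIFT)		/* address -> page frame, rounded down */

/* Highest page frame covered by a usable region that starts at 0. */
static uint32_t max_pfn_for(uint32_t usable_end)
{
	return PFN_DOWN(usable_end);
}

/* True when a reservation at addr would trip the quoted BUG_ON. */
static bool reservation_trips_bug(uint32_t addr, uint32_t node_low_pfn)
{
	return PFN_DOWN(addr) >= node_low_pfn;
}
```

`max_pfn_for(0x9fc00)` gives 0x9f, and a reservation at 0x100000 (page 0x100) is above that limit, so the check fires. With the 124848 pages reported by the working boot, the same reservation passes.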

As background, I'm using syslinux 3.36 as my loader here - I've used this
exact same version for a very long time, so I don't blame it in the least.
Something is getting confused in the early kernel, and whatever that
something is, a still unknown change in a newer version of the BIOS
fixed it.  The search goes on.

Jordan
-- 
Jordan Crouse
Systems Software Development Engineer 
Advanced Micro Devices, Inc.




RE: Regression in 2.6.23-pre Was: Problems with 2.6.23-rc6 on AMD Geode LX800

2007-09-26 Thread Joerg Pommnitz
 > There is something very fishy.
 > 
 > The only documentation you've given us so far is a screen shot which
 > contained a message ("BIOS data check successful") which doesn't occur
 > in the kernel.
 >
 > The loader string doesn't look all that familiar either; it looks like
 > an extremely old version of SYSLINUX, but that doesn't contain that
 > message either.

The boot loader is LILO from Ubuntu 7.04, so it should be fairly recent.

 > INT 6 is #UD, the undefined instruction exception.  This is consistent with:
 > 
 > > It's hitting a bug - specifically (from bootmem.c:125):
 > > BUG_ON(PFN_DOWN(addr) >= bdata->node_low_pfn);
 > 
 > However, all that tells us is that reserve_bootmem_core() was either
 > called with a bad address or bdata->node_low_pfn is garbage.  In
 > particular, without knowing how it got there it's hard to know for sure.
 > 
 > Could you send me the boot messages from a working kernel boot?

Attached is a boot log I get where the last patch is 
f2d98ae63dc64dedb00499289e13a50677f771f9, i.e. "Linker script for the
new x86 setup code".

The kernel is directly from "git bisect, make defconfig, make", no local
changes or strange patches applied. Build environment is plain Ubuntu-7.04.

--  
Kind regards
 
   Joerg




Linux version 2.6.22-gf2d98ae6 ([EMAIL PROTECTED]) (gcc version 4.1.2 (Ubuntu 4.1.2-0ubuntu4)) #12 SMP Wed Sep 26 12:45:21 CEST 2007
BIOS-provided physical RAM map:
 BIOS-e820:  - 0009fc00 (usable)
 BIOS-e820: 0009fc00 - 000a (reserved)
 BIOS-e820: 000e - 0010 (reserved)
 BIOS-e820: 0010 - 1e7b (usable)
 BIOS-e820: 1e7b - 1e7bffc0 (ACPI data)
 BIOS-e820: 1e7bffc0 - 1e7c (ACPI NVS)
 BIOS-e820: 4040 - 40440004 (reserved)
 BIOS-e820: f000 - 0001 (reserved)
0MB HIGHMEM available.
487MB LOWMEM available.
Entering add_active_range(0, 0, 124848) 0 entries of 256 used
Zone PFN ranges:
  DMA 0 -> 4096
  Normal   4096 ->   124848
  HighMem124848 ->   124848
early_node_map[1] active PFN ranges
0:0 ->   124848
On node 0 totalpages: 124848
  DMA zone: 32 pages used for memmap
  DMA zone: 0 pages reserved
  DMA zone: 4064 pages, LIFO batch:0
  Normal zone: 943 pages used for memmap
  Normal zone: 119809 pages, LIFO batch:31
  HighMem zone: 0 pages used for memmap
DMI not present or invalid.
Using APIC driver default
ACPI: RSDP 000E9010, 0014 (r0 OID_00)
ACPI: RSDT 1E7B2AE0, 0030 (r1 AMDRSDT_000 31303030 AMD  31303030)
ACPI: FACP 1E7B29E0, 0084 (r1 AMDFACP_000 31303030 AMD  31303030)
ACPI: DSDT 1E7B, 29D6 (r1 INSYDE CS553x   1007 INTL 20030122)
ACPI: FACS 1E7BFFC0, 0040
ACPI: BOOT 1E7B2A70, 0028 (r1 AMDBOOT_000 31303030 AMD  31303030)
ACPI: DBGP 1E7B2AA0, 0034 (r1 AMDDBGP_000 31303030 AMD  31303030)
ACPI: no DMI BIOS year, acpi=force is required to enable ACPI
ACPI: Disabling ACPI support
Allocating PCI resources starting at 5000 (gap: 40440004:afbbfffc)
Built 1 zonelists.  Total pages: 123873
Kernel command line: BOOT_IMAGE=Linux2623 ro root=341
No local APIC present or hardware disabled
mapped APIC to d000 (013dc000)
Initializing CPU#0
PID hash table entries: 2048 (order: 11, 8192 bytes)
Detected 498.434 MHz processor.
Console: colour VGA+ 80x25
Dentry cache hash table entries: 65536 (order: 6, 262144 bytes)
Inode-cache hash table entries: 32768 (order: 5, 131072 bytes)
Memory: 488944k/499392k available (3041k kernel code, 9872k reserved, 1517k data, 292k init, 0k highmem)
virtual kernel memory layout:
fixmap  : 0xffe16000 - 0xf000   (1956 kB)
pkmap   : 0xff80 - 0xffc0   (4096 kB)
vmalloc : 0xdf00 - 0xff7fe000   ( 519 MB)
lowmem  : 0xc000 - 0xde7b   ( 487 MB)
  .init : 0xc057b000 - 0xc05c4000   ( 292 kB)
  .data : 0xc03f86e0 - 0xc0573ecc   (1517 kB)
  .text : 0xc010 - 0xc03f86e0   (3041 kB)
Checking if this processor honours the WP bit even in supervisor mode... Ok.
Calibrating delay using timer specific routine.. 998.38 BogoMIPS (lpj=1996769)
Mount-cache hash table entries: 512
CPU: After generic identify, caps: 0088a93d c0c0a13d    
  
CPU: L1 I Cache: 64K (32 bytes/line), D cache 64K (32 bytes/line)
CPU: L2 Cache: 128K (32 bytes/line)
CPU: After all inits, caps: 0088a93d c0c0a13d    
  
Compat vDSO mapped to e000.
Checking 'hlt' instruction... OK.
SMP alternatives: switching to UP code
Freeing SMP alternatives: 18k freed
CPU0: AMD Geode(TM) Integrated Processor by AMD PCS stepping 02
SMP motherboard not detected.
Local APIC not detected. Using dummy APIC emulation.
Brought up 1 CPUs
NET: Registered protocol family 16
PCI: PCI BIOS revision 2.10 

Re: Regression in 2.6.23-pre Was: Problems with 2.6.23-rc6 on AMD Geode LX800

2007-09-26 Thread H. Peter Anvin
Joerg Pommnitz wrote:
> Hello all,
> this is what git bisect told me about the problem:
> 
> [EMAIL PROTECTED]:~/linux-2.6$ git bisect good
> 4fd06960f120e02e9abc802a09f9511c400042a5 is first bad commit
> commit 4fd06960f120e02e9abc802a09f9511c400042a5
> Author: H. Peter Anvin <[EMAIL PROTECTED]>
> Date:   Wed Jul 11 12:18:56 2007 -0700
> 
> Use the new x86 setup code for i386
> 
> This patch hooks the new x86 setup code into the Makefile machinery.  It
> also adapts boot/tools/build.c to a two-file (as opposed to three-file)
> universe, and simplifies it substantially.
> 
> Signed-off-by: H. Peter Anvin <[EMAIL PROTECTED]>
> Signed-off-by: Linus Torvalds <[EMAIL PROTECTED]>
> 
> :040000 040000 6560eb5b7e40d93813276544bced8c478f9067f5 fe5f90d9ca08e526559815789175602ba2c51743 M	arch
> 

There is something very fishy.

The only documentation you've given us so far is a screen shot which
contained a message ("BIOS data check successful") which doesn't occur
in the kernel.

The loader string doesn't look all that familiar either; it looks like
an extremely old version of SYSLINUX, but that doesn't contain that
message either.

INT 6 is #UD, the undefined instruction exception.  This is consistent with:

> It's hitting a bug - specifically (from bootmem.c:125):
> BUG_ON(PFN_DOWN(addr) >= bdata->node_low_pfn);

However, all that tells us is that reserve_bootmem_core() was either
called with a bad address or bdata->node_low_pfn is garbage.  In
particular, without knowing how it got there it's hard to know for sure.

Could you send me the boot messages from a working kernel boot?

-hpa



Regression in 2.6.23-pre Was: Problems with 2.6.23-rc6 on AMD Geode LX800

2007-09-26 Thread Joerg Pommnitz
Hello all,
this is what git bisect told me about the problem:

[EMAIL PROTECTED]:~/linux-2.6$ git bisect good
4fd06960f120e02e9abc802a09f9511c400042a5 is first bad commit
commit 4fd06960f120e02e9abc802a09f9511c400042a5
Author: H. Peter Anvin <[EMAIL PROTECTED]>
Date:   Wed Jul 11 12:18:56 2007 -0700

Use the new x86 setup code for i386

This patch hooks the new x86 setup code into the Makefile machinery.  It
also adapts boot/tools/build.c to a two-file (as opposed to three-file)
universe, and simplifies it substantially.

Signed-off-by: H. Peter Anvin <[EMAIL PROTECTED]>
Signed-off-by: Linus Torvalds <[EMAIL PROTECTED]>

:040000 040000 6560eb5b7e40d93813276544bced8c478f9067f5 fe5f90d9ca08e526559815789175602ba2c51743 M	arch

 
--  
Regards
 
   Joerg
 


----- Original Message -----
From: Jordan Crouse <[EMAIL PROTECTED]>
To: Joerg Pommnitz <[EMAIL PROTECTED]>
CC: [EMAIL PROTECTED]; linux-kernel@vger.kernel.org
Sent: Tuesday, 25 September 2007, 17:04:52
Subject: Re: Problems with 2.6.23-rc6 on AMD Geode LX800

On 25/09/07 01:38 -0700, Joerg Pommnitz wrote:
> Chuck, Jordan,
> thanks for taking an interest in this problem. As suggested by Jordan I tried
> a new BIOS revision from
> http://www.digitallogic.ch/index.php?id=256&dir=/MSEP800%20-%20SM800PCX%20%20-%20MPC20%20-%20MPC21&mountpoint=23
> 
> Unfortunately the kernel still fails to boot in the same way.

You'll have to contact Digital Logic and have them check with the BIOS vendor
to see if the fix was made in that version or not.  I don't have that
particular board, so I can't try out the fixes here.

I'm still trying to track down the particulars of the fix from the BIOS 
vendor.  I'll let you know.

> Do you still need the disassembled reserve_bootmem_core? 

Sure - you might as well - just to make sure it's the same problem.

Jordan

-- 
Jordan Crouse
Systems Software Development Engineer 
Advanced Micro Devices, Inc.











Regression in 2.6.23-pre Was: Problems with 2.6.23-rc6 on AMD Geode LX800

2007-09-26 Thread Joerg Pommnitz
Hello all,
this is what git bisect told me about the problem:

[EMAIL PROTECTED]:~/linux-2.6$ git bisect good
4fd06960f120e02e9abc802a09f9511c400042a5 is first bad commit
commit 4fd06960f120e02e9abc802a09f9511c400042a5
Author: H. Peter Anvin [EMAIL PROTECTED]
Date:   Wed Jul 11 12:18:56 2007 -0700

Use the new x86 setup code for i386

This patch hooks the new x86 setup code into the Makefile machinery.  It
also adapts boot/tools/build.c to a two-file (as opposed to three-file)
universe, and simplifies it substantially.

Signed-off-by: H. Peter Anvin [EMAIL PROTECTED]
Signed-off-by: Linus Torvalds [EMAIL PROTECTED]

:04 04 6560eb5b7e40d93813276544bced8c478f9067f5 
fe5f90d9ca08e526559815789175602ba2c51743 M  arch

 
--  
Regards
 
   Joerg
 


- Ursprüngliche Mail 
Von: Jordan Crouse [EMAIL PROTECTED]
An: Joerg Pommnitz [EMAIL PROTECTED]
CC: [EMAIL PROTECTED]; linux-kernel@vger.kernel.org
Gesendet: Dienstag, den 25. September 2007, 17:04:52 Uhr
Betreff: Re: Problems with 2.6.23-rc6 on AMD Geode LX800

On 25/09/07 01:38 -0700, Joerg Pommnitz wrote:
 Chuck, Jordan,
 thanks for taking an interest in this problem. As suggested by Jordan I tried 
 a new
 BIOS revision from
 http://www.digitallogic.ch/index.php?id=256dir=/MSEP800%20-%20SM800PCX%20%20-%20MPC20%20-%20MPC21mountpoint=23
 
 Unfortunately the kernel still fails to boot in the same way.

You'll have to contact Digital Logic and have them check with the BIOS vendor
to see if the fix was made in that version or not.  I don't have that
particular board, so I can't try out the fixes here.

I'm still trying to track down the particulars of the fix from the BIOS 
vendor.  I'll let you know.

 Do you still need the disassembled reserve_bootmem_core? 

Sure - you might as well - just to make sure its the same problem.

Jordan

-- 
Jordan Crouse
Systems Software Development Engineer 
Advanced Micro Devices, Inc.








  __  
Yahoo! Clever: Sie haben Fragen? Yahoo! Nutzer antworten Ihnen. 
www.yahoo.de/clever

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Regression in 2.6.23-pre Was: Problems with 2.6.23-rc6 on AMD Geode LX800

2007-09-26 Thread H. Peter Anvin
Joerg Pommnitz wrote:
> Hello all,
> this is what git bisect told me about the problem:
>
> [EMAIL PROTECTED]:~/linux-2.6$ git bisect good
> 4fd06960f120e02e9abc802a09f9511c400042a5 is first bad commit
> commit 4fd06960f120e02e9abc802a09f9511c400042a5
> Author: H. Peter Anvin <[EMAIL PROTECTED]>
> Date:   Wed Jul 11 12:18:56 2007 -0700
>
> Use the new x86 setup code for i386
>
> This patch hooks the new x86 setup code into the Makefile machinery.  It
> also adapts boot/tools/build.c to a two-file (as opposed to three-file)
> universe, and simplifies it substantially.
>
> Signed-off-by: H. Peter Anvin <[EMAIL PROTECTED]>
> Signed-off-by: Linus Torvalds <[EMAIL PROTECTED]>
>
> :040000 040000 6560eb5b7e40d93813276544bced8c478f9067f5 fe5f90d9ca08e526559815789175602ba2c51743 M	arch
>

There is something very fishy.

The only documentation you've given us so far is a screen shot which
contained a message (BIOS data check successful) which doesn't occur
in the kernel.

The loader string doesn't look all that familiar either; it looks like
an extremely old version of SYSLINUX, but that doesn't contain that
message either.

INT 6 is #UD, the undefined instruction exception.  This is consistent with:

> Its hitting a bug - specifically (from bootmem.c:125):
> BUG_ON(PFN_DOWN(addr) >= bdata->node_low_pfn);

However, all that tells us is that reserve_bootmem_core() was either
called with a bad address or bdata->node_low_pfn is garbage.  In
particular, without knowing how it got there it's hard to know for sure.

Could you send me the boot messages from a working kernel boot?

-hpa



RE: Regression in 2.6.23-pre Was: Problems with 2.6.23-rc6 on AMD Geode LX800

2007-09-26 Thread Joerg Pommnitz
> There is something very fishy.
>
> The only documentation you've given us so far is a screen shot which
> contained a message (BIOS data check successful) which doesn't occur
> in the kernel.
>
> The loader string doesn't look all that familiar either; it looks like
> an extremely old version of SYSLINUX, but that doesn't contain that
> message either.

The boot loader is LILO from Ubuntu 7.04, so it should be fairly recent.

> INT 6 is #UD, the undefined instruction exception.  This is consistent with:
>
> > Its hitting a bug - specifically (from bootmem.c:125):
> > BUG_ON(PFN_DOWN(addr) >= bdata->node_low_pfn);
>
> However, all that tells us is that reserve_bootmem_core() was either
> called with a bad address or bdata->node_low_pfn is garbage.  In
> particular, without knowing how it got there it's hard to know for sure.
>
> Could you send me the boot messages from a working kernel boot?

Attached is the boot log I get when the last patch applied is
f2d98ae63dc64dedb00499289e13a50677f771f9, i.e. "Linker script for the
new x86 setup code".

The kernel is directly from git bisect, make defconfig, make, no local
changes or strange patches applied. Build environment is plain Ubuntu-7.04.

--  
Kind regards
 
   Joerg




Linux version 2.6.22-gf2d98ae6 ([EMAIL PROTECTED]) (gcc version 4.1.2 (Ubuntu 4.1.2-0ubuntu4)) #12 SMP Wed Sep 26 12:45:21 CEST 2007
BIOS-provided physical RAM map:
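 BIOS-e820: 0000000000000000 - 000000000009fc00 (usable)
 BIOS-e820: 000000000009fc00 - 00000000000a0000 (reserved)
 BIOS-e820: 00000000000e0000 - 0000000000100000 (reserved)
 BIOS-e820: 0000000000100000 - 000000001e7b0000 (usable)
 BIOS-e820: 000000001e7b0000 - 000000001e7bffc0 (ACPI data)
 BIOS-e820: 000000001e7bffc0 - 000000001e7c0000 (ACPI NVS)
 BIOS-e820: 0000000040400000 - 0000000040440004 (reserved)
 BIOS-e820: 00000000f0000000 - 0000000100000000 (reserved)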
0MB HIGHMEM available.
487MB LOWMEM available.
Entering add_active_range(0, 0, 124848) 0 entries of 256 used
Zone PFN ranges:
  DMA 0 - 4096
  Normal   4096 -   124848
  HighMem124848 -   124848
early_node_map[1] active PFN ranges
0:0 -   124848
On node 0 totalpages: 124848
  DMA zone: 32 pages used for memmap
  DMA zone: 0 pages reserved
  DMA zone: 4064 pages, LIFO batch:0
  Normal zone: 943 pages used for memmap
  Normal zone: 119809 pages, LIFO batch:31
  HighMem zone: 0 pages used for memmap
DMI not present or invalid.
Using APIC driver default
ACPI: RSDP 000E9010, 0014 (r0 OID_00)
ACPI: RSDT 1E7B2AE0, 0030 (r1 AMDRSDT_000 31303030 AMD  31303030)
ACPI: FACP 1E7B29E0, 0084 (r1 AMDFACP_000 31303030 AMD  31303030)
ACPI: DSDT 1E7B, 29D6 (r1 INSYDE CS553x   1007 INTL 20030122)
ACPI: FACS 1E7BFFC0, 0040
ACPI: BOOT 1E7B2A70, 0028 (r1 AMDBOOT_000 31303030 AMD  31303030)
ACPI: DBGP 1E7B2AA0, 0034 (r1 AMDDBGP_000 31303030 AMD  31303030)
ACPI: no DMI BIOS year, acpi=force is required to enable ACPI
ACPI: Disabling ACPI support
Allocating PCI resources starting at 50000000 (gap: 40440004:afbbfffc)
Built 1 zonelists.  Total pages: 123873
Kernel command line: BOOT_IMAGE=Linux2623 ro root=341
No local APIC present or hardware disabled
mapped APIC to ffffd000 (013dc000)
Initializing CPU#0
PID hash table entries: 2048 (order: 11, 8192 bytes)
Detected 498.434 MHz processor.
Console: colour VGA+ 80x25
Dentry cache hash table entries: 65536 (order: 6, 262144 bytes)
Inode-cache hash table entries: 32768 (order: 5, 131072 bytes)
Memory: 488944k/499392k available (3041k kernel code, 9872k reserved, 1517k 
data, 292k init, 0k highmem)
virtual kernel memory layout:
fixmap  : 0xffe16000 - 0xfffff000   (1956 kB)
pkmap   : 0xff800000 - 0xffc00000   (4096 kB)
vmalloc : 0xdf000000 - 0xff7fe000   ( 519 MB)
lowmem  : 0xc0000000 - 0xde7b0000   ( 487 MB)
  .init : 0xc057b000 - 0xc05c4000   ( 292 kB)
  .data : 0xc03f86e0 - 0xc0573ecc   (1517 kB)
  .text : 0xc0100000 - 0xc03f86e0   (3041 kB)
Checking if this processor honours the WP bit even in supervisor mode... Ok.
Calibrating delay using timer specific routine.. 998.38 BogoMIPS (lpj=1996769)
Mount-cache hash table entries: 512
CPU: After generic identify, caps: 0088a93d c0c0a13d    
  
CPU: L1 I Cache: 64K (32 bytes/line), D cache 64K (32 bytes/line)
CPU: L2 Cache: 128K (32 bytes/line)
CPU: After all inits, caps: 0088a93d c0c0a13d    
  
Compat vDSO mapped to ffffe000.
Checking 'hlt' instruction... OK.
SMP alternatives: switching to UP code
Freeing SMP alternatives: 18k freed
CPU0: AMD Geode(TM) Integrated Processor by AMD PCS stepping 02
SMP motherboard not detected.
Local APIC not detected. Using dummy APIC emulation.
Brought up 1 CPUs
NET: Registered protocol family 16
PCI: PCI BIOS revision 2.10 entry at 0xff8b7, last bus=0
PCI: 

Re: Regression in 2.6.23-pre Was: Problems with 2.6.23-rc6 on AMD Geode LX800

2007-09-26 Thread Jordan Crouse
On 26/09/07 07:10 -0700, H. Peter Anvin wrote:
> Joerg Pommnitz wrote:
> > Hello all,
> > this is what git bisect told me about the problem:
> >
> > [EMAIL PROTECTED]:~/linux-2.6$ git bisect good
> > 4fd06960f120e02e9abc802a09f9511c400042a5 is first bad commit
> > commit 4fd06960f120e02e9abc802a09f9511c400042a5
> > Author: H. Peter Anvin <[EMAIL PROTECTED]>
> > Date:   Wed Jul 11 12:18:56 2007 -0700
> >
> > Use the new x86 setup code for i386
> >
> > This patch hooks the new x86 setup code into the Makefile machinery.  It
> > also adapts boot/tools/build.c to a two-file (as opposed to three-file)
> > universe, and simplifies it substantially.
> >
> > Signed-off-by: H. Peter Anvin <[EMAIL PROTECTED]>
> > Signed-off-by: Linus Torvalds <[EMAIL PROTECTED]>
> >
> > :040000 040000 6560eb5b7e40d93813276544bced8c478f9067f5 fe5f90d9ca08e526559815789175602ba2c51743 M	arch
> >
>
> There is something very fishy.
>
> The only documentation you've given us so far is a screen shot which
> contained a message (BIOS data check successful) which doesn't occur
> in the kernel.
>
> The loader string doesn't look all that familiar either; it looks like
> an extremely old version of SYSLINUX, but that doesn't contain that
> message either.
>
> INT 6 is #UD, the undefined instruction exception.  This is consistent with:
>
> > Its hitting a bug - specifically (from bootmem.c:125):
> > BUG_ON(PFN_DOWN(addr) >= bdata->node_low_pfn);
>
> However, all that tells us is that reserve_bootmem_core() was either
> called with a bad address or bdata->node_low_pfn is garbage.  In
> particular, without knowing how it got there it's hard to know for sure.

/me swings a +5 JTAG debugger

It's the latter - max_pfn as read by find_max_pfn() in arch/i386/e820.c
is being set to 0x9F (640k) in the broken case, due to the e820 map
looking something like this:

Address   Size      Type
00000000  0009FC00  1
0009FC00  00000400  2
000E0000  00002000  2

(Yep, that's it - that's the list.  e820.nr_map is indeed 3).

Long story short, bdata->node_low_pfn gets set to 0x9F, and when we
try to allocate the bootmem bitmap (at _pa_symbol(_text), which is
page 0x100), the system gets appropriately angry.
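
A minimal sketch of the arithmetic behind this failure, assuming 4 KiB pages. The names `sketch_max_pfn`, `reservation_would_bug` and `struct e820_entry` are invented for illustration and are far simpler than the kernel's find_max_pfn() and reserve_bootmem_core(); the third entry's size is copied from the listing and does not affect the result because that entry is reserved.

```c
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT 12                       /* assumption: 4 KiB pages */
#define PFN_DOWN(x) ((x) >> PAGE_SHIFT)
#define E820_RAM 1                          /* "usable" entry type */

/* Hypothetical, simplified map entry for this sketch. */
struct e820_entry {
    uint64_t addr;
    uint64_t size;
    uint32_t type;
};

/* The truncated three-entry map described in the message. */
const struct e820_entry truncated_map[3] = {
    { 0x00000000, 0x0009FC00, E820_RAM },
    { 0x0009FC00, 0x00000400, 2 },
    { 0x000E0000, 0x00002000, 2 },
};

/* Highest page frame number covered by any usable entry. */
unsigned long sketch_max_pfn(const struct e820_entry *map, size_t n)
{
    unsigned long max_pfn = 0;

    for (size_t i = 0; i < n; i++) {
        unsigned long end;

        if (map[i].type != E820_RAM)
            continue;
        end = (unsigned long)PFN_DOWN(map[i].addr + map[i].size);
        if (end > max_pfn)
            max_pfn = end;
    }
    return max_pfn;
}

/* Same condition as BUG_ON(PFN_DOWN(addr) >= bdata->node_low_pfn). */
int reservation_would_bug(uint64_t addr, unsigned long node_low_pfn)
{
    return PFN_DOWN(addr) >= node_low_pfn;
}
```

With the truncated map, sketch_max_pfn() gives 0x9F, and a reservation at physical address 0x100000 (page 0x100) satisfies the BUG_ON condition.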

As background, I'm using syslinux 3.36 as my loader here - I've used this
exact same version for a very long time, so I don't blame it in the least.
Something is getting confused in the early kernel, and whatever that
something is, a still unknown change in a newer version of the BIOS
fixed it.  The search goes on.

Jordan
-- 
Jordan Crouse
Systems Software Development Engineer 
Advanced Micro Devices, Inc.




Re: Regression in 2.6.23-pre Was: Problems with 2.6.23-rc6 on AMD Geode LX800

2007-09-26 Thread H. Peter Anvin
Jordan Crouse wrote:
 
> As background, I'm using syslinux 3.36 as my loader here - I've used this
> exact same version for a very long time, so I don't blame it in the least.
> Something is getting confused in the early kernel, and whatever that
> something is, a still unknown change in a newer version of the BIOS
> fixed it.  The search goes on.
 

OK, we should put printf's in arch/i386/boot/memory.c and see what
actually gets read out from the BIOS.  This could either be a BIOS
problem or a bug in memory.c (or a bug elsewhere in the code that the
change in memory.c triggers, but that seems less likely.)

-hpa

P.S. Are you guys in the Bay Area by any chance?


Re: Regression in 2.6.23-pre Was: Problems with 2.6.23-rc6 on AMD Geode LX800

2007-09-26 Thread H. Peter Anvin
Jordan Crouse wrote:
 
> Its the latter - max_pfn as read by find_max_pfn() in arch/i386/e820.c
> is being set to 9F (640k) in the broken case, this due to the
> the e820 map looking something like this:
>
> Address   Size      Type
> 00000000  0009FC00  1
> 0009FC00  00000400  2
> 000E0000  00002000  2
>
> (Yep, thats it - thats the list.  e820.nr_map is indeed 3).
>
> Long story short, bdata->node_low_pfn gets set to 9F, and When we
> try to allocate the bootmem bitmap (at _pa_symbol(_text), which is
> page 0x100), then the system gets appropriately angry.
>
> As background, I'm using syslinux 3.36 as my loader here - I've used this
> exact same version for a very long time, so I don't blame it in the least.
> Something is getting confused in the early kernel, and whatever that
> something is, a still unknown change in a newer version of the BIOS
> fixed it.  The search goes on.
 

Please try the following debug patch to let us know what is going on.

-hpa
diff --git a/arch/i386/boot/memory.c b/arch/i386/boot/memory.c
index 1a2e62d..a0ccf29 100644
--- a/arch/i386/boot/memory.c
+++ b/arch/i386/boot/memory.c
@@ -33,6 +33,12 @@ static int detect_memory_e820(void)
 		  "=m" (*desc)
 		: "D" (desc), "a" (0xe820));
 
+		printf("e820: err %d id 0x%08x next %u %08x:%08x %u\n",
+		       err, id, next,
+		       (unsigned int)desc->addr,
+		       (unsigned int)desc->size,
+		       desc->type);
+
 		if (err || id != SMAP)
 			break;
 


Re: Regression in 2.6.23-pre Was: Problems with 2.6.23-rc6 on AMD Geode LX800

2007-09-26 Thread Jordan Crouse
On 26/09/07 12:14 -0700, H. Peter Anvin wrote:
> Please try the following debug patch to let us know what is going on.
>
> 	-hpa
>
> diff --git a/arch/i386/boot/memory.c b/arch/i386/boot/memory.c
> index 1a2e62d..a0ccf29 100644
> --- a/arch/i386/boot/memory.c
> +++ b/arch/i386/boot/memory.c
> @@ -33,6 +33,12 @@ static int detect_memory_e820(void)
>  		  "=m" (*desc)
>  		: "D" (desc), "a" (0xe820));
>  
> +		printf("e820: err %d id 0x%08x next %u %08x:%08x %u\n",
> +		       err, id, next,
> +		       (unsigned int)desc->addr,
> +		       (unsigned int)desc->size,
> +		       desc->type);
> +
>  		if (err || id != SMAP)
>  			break;

Okay, we have clarity.   Here is the output

e820: err 0 id 0x534d4150 next 15476 00000000:0009fc00 1
e820: err 0 id 0x534d4150 next 15496 0009fc00:00000400 2
e820: err 0 id 0x534d4150 next 15516 000e0000:00020000 2
e820: err 0 id 0x0e7b0000 next 11536 00100000:0e6b0000 1

In the last entry,  id is obviously wrong (it should be 'SMAP' or
0x534d4150).  This is the BIOS bug.
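
For reference, a small sketch of why 0x534d4150 is the expected value: it is the four ASCII characters 'S', 'M', 'A', 'P' packed with the first character in the most significant byte. The helper names `pack_signature` and `id_is_smap` are made up for this illustration and are not taken from the kernel sources.

```c
#include <stdint.h>

#define SMAP 0x534d4150u   /* 'S'=0x53 'M'=0x4d 'A'=0x41 'P'=0x50 */

/* Pack four ASCII characters into 32 bits, first character in the top byte. */
uint32_t pack_signature(const char s[4])
{
    return ((uint32_t)(unsigned char)s[0] << 24) |
           ((uint32_t)(unsigned char)s[1] << 16) |
           ((uint32_t)(unsigned char)s[2] << 8)  |
            (uint32_t)(unsigned char)s[3];
}

/* Nonzero when the value the BIOS returned in EAX is the signature. */
int id_is_smap(uint32_t id)
{
    return id == SMAP;
}
```

Any other value in the id column, as in the fourth line of the output, fails this comparison.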

Here's the reason why this bothers us now.  In the old assembly code,
if the returned ID wasn't equal to 'SMAP', we jumped straight to the e801
code.  In the new code in memory.c, if id != SMAP, we break out of the
int15 loop, and return boot_params.e820_entries, which in our case is
3.  detect_memory() considers this to be successful, and no attempt to
parse e801 is made.

So that's where the problem is - in the old code with the buggy BIOS, we
punted to reading the e801 information, and that was enough to keep us
going.  In the new code, we allow a partial table to be used, and we
blow up.
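
A rough simulation of that behaviour, as a sketch only: `struct bios_reply`, `count_entries_partial` and `falls_back_to_e801` are hypothetical names, and the replies are canned data standing in for successive INT 15h/E820 calls (the `next` values are the ones from the debug output; the bad id is an arbitrary non-SMAP value).

```c
#include <stddef.h>
#include <stdint.h>

#define SMAP 0x534d4150u

/* One simulated INT 15h/E820 reply (hypothetical type for this sketch). */
struct bios_reply {
    uint8_t  err;    /* carry flag after the call */
    uint32_t id;     /* value returned in EAX */
    uint32_t next;   /* continuation value; 0 means this was the last entry */
};

/* Three good replies, then one without the SMAP signature. */
const struct bios_reply geode_replies[4] = {
    { 0, SMAP, 15476 },
    { 0, SMAP, 15496 },
    { 0, SMAP, 15516 },
    { 0, 0,    11536 },   /* any value other than SMAP */
};

/* Loop shape of the regressed code: any failure just breaks, and the
   entries counted so far are returned as if the map were complete. */
int count_entries_partial(const struct bios_reply *r, size_t n)
{
    int entries = 0;

    for (size_t i = 0; i < n; i++) {
        if (r[i].err || r[i].id != SMAP)
            break;
        entries++;
        if (!r[i].next)
            break;
    }
    return entries;
}

/* Caller logic as described in the message: a positive count is treated
   as success, so the e801 probe is never consulted. */
int falls_back_to_e801(int e820_entries)
{
    return e820_entries <= 0;
}
```

With these replies the count is 3, which the caller accepts, so the truncated map is used and e801 is skipped.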

Attached is a patch to fix this - it returns -1 on error, and only sets
boot_params.e820_entries to be non-zero if we have something useful
in it.  This punts the detection to the e801 code, which is then
successful.

This fixes the problem with the DB800, and it probably should
for the other Geode platforms affected by this.

Many thanks to hpa for the guiding hand.

Jordan

-- 
Jordan Crouse
Systems Software Development Engineer 
Advanced Micro Devices, Inc.
[i386]: Return an error if the e820 detection goes bad

From: Jordan Crouse <[EMAIL PROTECTED]>

Change the e820 code to always return an error if something
bad happens while reading the e820 map.  This matches the
old code behavior, and allows brain-dead e820 implementations
to still work.

Signed-off-by: Jordan Crouse <[EMAIL PROTECTED]>
---

 arch/i386/boot/memory.c |    9 +++++----
 1 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/arch/i386/boot/memory.c b/arch/i386/boot/memory.c
index 1a2e62d..4c7f0f6 100644
--- a/arch/i386/boot/memory.c
+++ b/arch/i386/boot/memory.c
@@ -22,7 +22,7 @@ static int detect_memory_e820(void)
 {
 	u32 next = 0;
 	u32 size, id;
-	u8 err;
+	u8 err, count = 0;
 	struct e820entry *desc = boot_params.e820_map;
 
 	do {
@@ -34,13 +34,14 @@ static int detect_memory_e820(void)
 		: "D" (desc), "a" (0xe820));
 
 		if (err || id != SMAP)
-			break;
+			return -1;
 
-		boot_params.e820_entries++;
+		count++;
 		desc++;
 	} while (next && boot_params.e820_entries < E820MAX);
 
-	return boot_params.e820_entries;
+	boot_params.e820_entries = count;
+	return count;
 }
 
 static int detect_memory_e801(void)


Re: Regression in 2.6.23-pre Was: Problems with 2.6.23-rc6 on AMD Geode LX800

2007-09-26 Thread H. Peter Anvin
Jordan Crouse wrote:
> On 26/09/07 12:14 -0700, H. Peter Anvin wrote:
> > Please try the following debug patch to let us know what is going on.
> >
> > 	-hpa
> >
> > diff --git a/arch/i386/boot/memory.c b/arch/i386/boot/memory.c
> > index 1a2e62d..a0ccf29 100644
> > --- a/arch/i386/boot/memory.c
> > +++ b/arch/i386/boot/memory.c
> > @@ -33,6 +33,12 @@ static int detect_memory_e820(void)
> >  		  "=m" (*desc)
> >  		: "D" (desc), "a" (0xe820));
> >  
> > +		printf("e820: err %d id 0x%08x next %u %08x:%08x %u\n",
> > +		       err, id, next,
> > +		       (unsigned int)desc->addr,
> > +		       (unsigned int)desc->size,
> > +		       desc->type);
> > +
> >  		if (err || id != SMAP)
> >  			break;
>
> Okay, we have clarity.   Here is the output
>
> e820: err 0 id 0x534d4150 next 15476 00000000:0009fc00 1
> e820: err 0 id 0x534d4150 next 15496 0009fc00:00000400 2
> e820: err 0 id 0x534d4150 next 15516 000e0000:00020000 2
> e820: err 0 id 0x0e7b0000 next 11536 00100000:0e6b0000 1
>
> In the last entry,  id is obviously wrong (it should be 'SMAP' or
> 0x534d4150).  This is the BIOS bug.
>
> Here's the reason why this bothers us now.  In the old assembly code,
> if the returned ID wasn't equal to 'SMAP', we jumped straight to the e801
> code.  In the new code in memory.c, if id != SMAP, we break out of the
> int15 loop, and return boot_params.e820_entries, which in our case is
> 3.  detect_memory() considers this to be successful, and no attempt to
> parse e801 is made.
>
> So thats where the problem is - in the old code with the buggy BIOS, we
> punted to reading the e801 information, and that was enough to keep us
> going.   In the new code, we allow a partial table to be used, and we
> blow up.
>
> Attached is a patch to fix this - it returns -1 on error, and only sets
> boot_params.e820_entries to be non-zero if we have something useful
> in it.  This punts the detection to the e801 code, which then is
> then successful.
>
> This fixes the problem with the DB800, and so it probably should
> with the other Geode platforms affected by this.
>
> Many thanks to hpa for the guiding hand.
 

This patch is obviously wrong.  There are a lot of e820 BIOSen out there
that terminate with CF=1, and that is a legitimate termination condition
for e820.  Now, as far as what to do when id != SMAP, it probably is
still the right thing to do; since the BIOS vendor couldn't get something
that elementary correct, we shouldn't trust the data.
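
The distinction drawn here can be sketched as follows. This is only an illustration under stated assumptions: `struct bios_reply` and `count_entries_checked` are hypothetical names, the replies are canned data rather than real BIOS calls, and checking the signature before the carry flag is an assumption of this sketch.

```c
#include <stddef.h>
#include <stdint.h>

#define SMAP 0x534d4150u

/* One simulated INT 15h/E820 reply (hypothetical type for this sketch). */
struct bios_reply {
    uint8_t  err;    /* carry flag after the call */
    uint32_t id;     /* value returned in EAX */
    uint32_t next;   /* continuation value; 0 means this was the last entry */
};

/* A BIOS that ends its chain by setting CF on the third call. */
const struct bios_reply cf_terminated[3] = {
    { 0, SMAP, 1 }, { 0, SMAP, 2 }, { 1, SMAP, 0 },
};

/* A BIOS that stops returning the signature on the third call. */
const struct bios_reply bad_signature[3] = {
    { 0, SMAP, 1 }, { 0, SMAP, 2 }, { 0, 0, 3 },
};

int count_entries_checked(const struct bios_reply *r, size_t n)
{
    int count = 0;

    for (size_t i = 0; i < n; i++) {
        if (r[i].id != SMAP) {
            count = 0;          /* untrustworthy map: discard everything */
            break;
        }
        if (r[i].err)
            break;              /* CF=1: normal end, keep what we have */
        count++;
        if (!r[i].next)
            break;
    }
    return count;
}
```

The CF-terminated chain keeps its two entries, while the chain with the missing signature reports zero entries so that a caller can fall back to another probe.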

I'll write up a corrected patch.

-hpa


Re: Regression in 2.6.23-pre Was: Problems with 2.6.23-rc6 on AMD Geode LX800

2007-09-26 Thread Jordan Crouse
On 26/09/07 14:04 -0700, H. Peter Anvin wrote:
> Jordan Crouse wrote:
> > On 26/09/07 12:14 -0700, H. Peter Anvin wrote:
> > > Please try the following debug patch to let us know what is going on.
> > >
> > > 	-hpa
> > >
> > > diff --git a/arch/i386/boot/memory.c b/arch/i386/boot/memory.c
> > > index 1a2e62d..a0ccf29 100644
> > > --- a/arch/i386/boot/memory.c
> > > +++ b/arch/i386/boot/memory.c
> > > @@ -33,6 +33,12 @@ static int detect_memory_e820(void)
> > >  		  "=m" (*desc)
> > >  		: "D" (desc), "a" (0xe820));
> > >  
> > > +		printf("e820: err %d id 0x%08x next %u %08x:%08x %u\n",
> > > +		       err, id, next,
> > > +		       (unsigned int)desc->addr,
> > > +		       (unsigned int)desc->size,
> > > +		       desc->type);
> > > +
> > >  		if (err || id != SMAP)
> > >  			break;
> >
> > Okay, we have clarity.   Here is the output
> >
> > e820: err 0 id 0x534d4150 next 15476 00000000:0009fc00 1
> > e820: err 0 id 0x534d4150 next 15496 0009fc00:00000400 2
> > e820: err 0 id 0x534d4150 next 15516 000e0000:00020000 2
> > e820: err 0 id 0x0e7b0000 next 11536 00100000:0e6b0000 1
> >
> > In the last entry,  id is obviously wrong (it should be 'SMAP' or
> > 0x534d4150).  This is the BIOS bug.
> >
> > Here's the reason why this bothers us now.  In the old assembly code,
> > if the returned ID wasn't equal to 'SMAP', we jumped straight to the e801
> > code.  In the new code in memory.c, if id != SMAP, we break out of the
> > int15 loop, and return boot_params.e820_entries, which in our case is
> > 3.  detect_memory() considers this to be successful, and no attempt to
> > parse e801 is made.
> >
> > So thats where the problem is - in the old code with the buggy BIOS, we
> > punted to reading the e801 information, and that was enough to keep us
> > going.   In the new code, we allow a partial table to be used, and we
> > blow up.
> >
> > Attached is a patch to fix this - it returns -1 on error, and only sets
> > boot_params.e820_entries to be non-zero if we have something useful
> > in it.  This punts the detection to the e801 code, which then is
> > then successful.
> >
> > This fixes the problem with the DB800, and so it probably should
> > with the other Geode platforms affected by this.
> >
> > Many thanks to hpa for the guiding hand.
> >
>
> This patch is obviously wrong.  There are a lot of e820 BIOSen out there
> that terminate with CF=1, and that is a legitimate termination condition
> for e820.  Now, as far as what to do when id != SMAP, it probably is
> still the right thing to do; since the BOS vendor couldn't get something
> that elementary correct, we shouldn't trust the data.
>
> I'll write up a corrected patch.

Hmm - the old code seems to fall back to e801 when CF was set too:

	int	$0x15			# make the call
	jc	bail820			# fall to e801 if it fails

	cmpl	$SMAP, %eax		# check the return is `SMAP'
	jne	bail820			# fall to e801 if it fails

That's not to say that the old code was correct, mind you.  Failing on a
bad ID and returning without error on a set CF seems good to me.

Jordan

-- 
Jordan Crouse
Systems Software Development Engineer 
Advanced Micro Devices, Inc.




Re: Regression in 2.6.23-pre Was: Problems with 2.6.23-rc6 on AMD Geode LX800

2007-09-26 Thread H. Peter Anvin
Jordan Crouse wrote:
 
> Hmm - the old code seems to fail to e801 when CF was set too:
>
> 	int	$0x15			# make the call
> 	jc	bail820			# fall to e801 if it fails
>
> 	cmpl	$SMAP, %eax		# check the return is `SMAP'
> 	jne	bail820			# fall to e801 if it fails
>
> Thats not to say that the old code was correct, mind you.  Failing on a
> bad ID and returning without error on a set CF seems to be good to me.
 

Testing this patch now:

From 2efa33f81ef56e7700c09a3d8a881c96692149e5 Mon Sep 17 00:00:00 2001
From: H. Peter Anvin <[EMAIL PROTECTED]>
Date: Wed, 26 Sep 2007 14:11:43 -0700
Subject: [PATCH] [x86 setup] Handle case of improperly terminated E820 chain

At least one system (a Geode system with a Digital Logic BIOS) has
been found which suddenly stops reporting the SMAP signature when
reading the E820 memory chain.  We can't know what, exactly, broke in
the BIOS, so if we detect this situation, declare the E820 data
unusable and fall back to E801.

Also, revert to original behavior of always probing all memory
methods; that way all the memory information is available to the
kernel.

Signed-off-by: H. Peter Anvin <[EMAIL PROTECTED]>
Cc: Jordan Crouse <[EMAIL PROTECTED]>
Cc: Joerg Pommnitz <[EMAIL PROTECTED]>
---
 arch/i386/boot/memory.c |   30 +++++++++++++++++++++++-------
 1 files changed, 23 insertions(+), 7 deletions(-)

diff --git a/arch/i386/boot/memory.c b/arch/i386/boot/memory.c
index 1a2e62d..bccaa1c 100644
--- a/arch/i386/boot/memory.c
+++ b/arch/i386/boot/memory.c
@@ -20,6 +20,7 @@
 
 static int detect_memory_e820(void)
 {
+	int count = 0;
 	u32 next = 0;
 	u32 size, id;
 	u8 err;
@@ -33,14 +34,24 @@ static int detect_memory_e820(void)
 		  "=m" (*desc)
 		: "D" (desc), "a" (0xe820));
 
-		if (err || id != SMAP)
+		/* Some BIOSes stop returning SMAP in the middle of
+		   the search loop.  We don't know exactly how the BIOS
+		   screwed up the map at that point, we might have a
+		   partial map, the full map, or complete garbage, so
+		   just return failure. */
+		if (id != SMAP) {
+			count = 0;
 			break;
+		}
 
-		boot_params.e820_entries++;
+		if (err)
+			break;
+
+		count++;
 		desc++;
-	} while (next && boot_params.e820_entries < E820MAX);
+	} while (next && count < E820MAX);
 
-	return boot_params.e820_entries;
+	return boot_params.e820_entries = count;
 }
 
 static int detect_memory_e801(void)
@@ -89,11 +100,16 @@ static int detect_memory_88(void)
 
 int detect_memory(void)
 {
+	int err = -1;
+
 	if (detect_memory_e820() > 0)
-		return 0;
+		err = 0;
 
 	if (!detect_memory_e801())
-		return 0;
+		err = 0;
+
+	if (!detect_memory_88())
+		err = 0;
 
-	return detect_memory_88();
+	return err;
 }
-- 
1.5.3.1
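
The "always probe all memory methods" part of the commit message can be sketched in isolation. This is a simplified stand-in, not the boot code itself: `detect_all`, the `probe_fn` type and the stub probes are hypothetical, and the stubs only count how often they are called.

```c
/* Hypothetical probe signature for this sketch. */
typedef int (*probe_fn)(void);

int probes_run;   /* how many probes have been called so far */

/* Stub probes: e820 reports an entry count, the others return 0 on success. */
int stub_e820_discarded(void) { probes_run++; return 0; }   /* map rejected */
int stub_e801_ok(void)        { probes_run++; return 0; }
int stub_88_fail(void)        { probes_run++; return -1; }

/* Every probe runs regardless of earlier results; the overall result is
   success (0) if at least one of them produced usable information. */
int detect_all(probe_fn e820, probe_fn e801, probe_fn m88)
{
    int err = -1;

    if (e820() > 0)
        err = 0;
    if (!e801())
        err = 0;
    if (!m88())
        err = 0;
    return err;
}
```

Calling detect_all(stub_e820_discarded, stub_e801_ok, stub_88_fail) returns 0 and leaves probes_run at 3, showing that a rejected E820 map no longer prevents the other methods from being tried and that a successful probe does not short-circuit the rest.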



Re: Problems with 2.6.23-rc6 on AMD Geode LX800

2007-09-25 Thread Jordan Crouse
On 25/09/07 01:38 -0700, Joerg Pommnitz wrote:
> Chuck, Jordan,
> thanks for taking an interest in this problem. As suggested by Jordan I tried 
> a new
> BIOS revision from
> http://www.digitallogic.ch/index.php?id=256&dir=/MSEP800%20-%20SM800PCX%20%20-%20MPC20%20-%20MPC21&mountpoint=23
> 
> Unfortunately the kernel still fails to boot in the same way.

You'll have to contact Digital Logic and have them check with the BIOS vendor
to see if the fix was made in that version or not.  I don't have that
particular board, so I can't try out the fixes here.

I'm still trying to track down the particulars of the fix from the BIOS 
vendor.  I'll let you know.

> Do you still need the disassembled reserve_bootmem_core? 

Sure - you might as well - just to make sure it's the same problem.

Jordan

-- 
Jordan Crouse
Systems Software Development Engineer 
Advanced Micro Devices, Inc.




RE: Problems with 2.6.23-rc6 on AMD Geode LX800

2007-09-25 Thread Joerg Pommnitz
Chuck, Jordan,
thanks for taking an interest in this problem. As suggested by Jordan I tried a 
new
BIOS revision from
http://www.digitallogic.ch/index.php?id=256&dir=/MSEP800%20-%20SM800PCX%20%20-%20MPC20%20-%20MPC21&mountpoint=23

Unfortunately the kernel still fails to boot in the same way.

Do you still need the disassembled reserve_bootmem_core? 
--  
Thanks and kind regards
 Joerg



   





Re: Problems with 2.6.23-rc6 on AMD Geode LX800

2007-09-20 Thread Jordan Crouse
Chuck Ebbert wrote:

> On 09/20/2007 08:32 AM, Joerg Pommnitz wrote:
>> Hello all,
>> yesterday I tried to boot a kernel built from the current wireless-dev git
>> tree (ath5k branch)
>> on a MSEP800/A board (see http://www.milesie.co.uk/pdf/MSEP800.pdf). The
>> board
>> contains an AMD Geode LX800 CPU.
>> The wireless-dev tree is up to date with Linus kernel 2.6.23-rc6.
>> 
>> Attached is a photographic screen shot. The EIP value of c0378dd6 seems to
>> correspond with the
>> reserve_bootmem_core from System.map:
>> 
>> c0378d51 t free_bootmem_core
>> c0378da7 T free_bootmem
>> c0378db2 T free_bootmem_node
>> c0378dba t reserve_bootmem_core
>> c0378e14 T reserve_bootmem
>> c0378e1f T reserve_bootmem_node
>>

> Can you post disassembled code for that function?

It's hitting a bug - specifically (from bootmem.c:125):
BUG_ON(PFN_DOWN(addr) >= bdata->node_low_pfn);

I hit this problem on a db800 last week.  It went away with a newer
version of the BIOS, which doesn't help Joerg any, since it's a different
board (though I think it is the same BIOS vendor).  Other BIOSes work
just fine with the same kernel image (including known troublemakers like
LinuxBIOS).  I believe that 2.6.22 was good, so some change must
have come along in 2.6.23-pre to cause the pain.  Or, it may have exposed
old breakage in the BIOS that was later repaired.

I'll do the math to figure out what's happening - and I'll check the release
notes to see what changed in the BIOS between the failing and working
version.  If anybody familiar with arch/i386 can think of something
new in the kernel that may have precipitated this, do let me know. :)

Jordan
-- 
Jordan Crouse
Systems Software Development Engineer 
Advanced Micro Devices, Inc.




Re: Problems with 2.6.23-rc6 on AMD Geode LX800

2007-09-20 Thread Chuck Ebbert
On 09/20/2007 08:32 AM, Joerg Pommnitz wrote:
> Hello all,
> yesterday I tried to boot a kernel built from the current wireless-dev git 
> tree (ath5k branch)
> on a MSEP800/A board (see http://www.milesie.co.uk/pdf/MSEP800.pdf). The board
> contains an AMD Geode LX800 CPU.
> The wireless-dev tree is up to date with Linus kernel 2.6.23-rc6.
> 
> Attached is a photographic screen shot. The EIP value of c0378dd6 seems to 
> correspond with the
> reserve_bootmem_core from System.map:
> 
> c0378d51 t free_bootmem_core
> c0378da7 T free_bootmem
> c0378db2 T free_bootmem_node
> c0378dba t reserve_bootmem_core
> c0378e14 T reserve_bootmem
> c0378e1f T reserve_bootmem_node
> 

Can you post disassembled code for that function?
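
For reference, the EIP-to-symbol mapping quoted above can be
reproduced with a small lookup over the System.map entries. This is
only a sketch: struct ksym and ksym_lookup() are hypothetical names,
and the table holds just the six entries from the report.

```c
#include <stddef.h>

/* One System.map entry: start address plus symbol name. */
struct ksym {
	unsigned long addr;
	const char *name;
};

/* Entries copied from the System.map excerpt in the report, sorted by address. */
const struct ksym bootmem_syms[] = {
	{ 0xc0378d51UL, "free_bootmem_core" },
	{ 0xc0378da7UL, "free_bootmem" },
	{ 0xc0378db2UL, "free_bootmem_node" },
	{ 0xc0378dbaUL, "reserve_bootmem_core" },
	{ 0xc0378e14UL, "reserve_bootmem" },
	{ 0xc0378e1fUL, "reserve_bootmem_node" },
};
const size_t bootmem_syms_count = sizeof(bootmem_syms) / sizeof(bootmem_syms[0]);

/*
 * Return the last symbol whose address is <= eip, or NULL if eip lies
 * before the first entry. The table must be sorted by address.
 */
const struct ksym *ksym_lookup(const struct ksym *tab, size_t n, unsigned long eip)
{
	const struct ksym *best = NULL;

	for (size_t i = 0; i < n && tab[i].addr <= eip; i++)
		best = &tab[i];
	return best;
}
```

Calling ksym_lookup(bootmem_syms, bootmem_syms_count, 0xc0378dd6UL)
returns the reserve_bootmem_core entry, and 0xc0378dd6 - 0xc0378dba
gives an offset of 0x1c into that function. Since the table is only an
excerpt, any address past the last entry is attributed to
reserve_bootmem_node, which a full System.map would not do.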

