Re: More E820 brokenness

2007-09-28 Thread Rafael J. Wysocki
On Friday, 28 September 2007 02:12, H. Peter Anvin wrote:
> Jordan Crouse wrote:
> > 
> > Worked, but that just raises more questions.  Why didn't more x86 boxes
> > break or, alternatively, why did a new version of the BIOS fix the problem? 
> > I guess we shouldn't look a gift horse in the mouth. Or something.
> > 
> 
> Why didn't more x86 boxes break... well, it's pretty natural an
> implementation of the BIOS to not clobber registers that aren't outputs.
>  Arguably the BIOSes that do are still buggy, since there isn't a
> well-defined calling sequence for the BIOS and the convention that has
> evolved is "don't clobber anything unless it's an output."
> 
> It's still wrong, however, especially since it means omitting the *real*
> SMAP check.

I'd like to update http://bugzilla.kernel.org/show_bug.cgi?id=9086 with correct
information.

Should I add a pointer to the patch from your previous message to it?

Greetings,
Rafael
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


AW: More E820 brokenness

2007-09-28 Thread Joerg Pommnitz
Hello all,
I won't be able to test anything before next Tuesday, but I'm with
Jordan on this one. Have a look at the dmsg output I sent you some
time ago (http://article.gmane.org/gmane.linux.kernel/584681).

This was as plain from Linus' tree as you get it (git bisect, make defconfig, 
make)
and I see:
BIOS-provided physical RAM map:
 BIOS-e820:  - 0009fc00 (usable)
 BIOS-e820: 0009fc00 - 000a (reserved)
 BIOS-e820: 000e - 0010 (reserved)
 BIOS-e820: 0010 - 1e7b (usable)
 BIOS-e820: 1e7b - 1e7bffc0 (ACPI data)
 BIOS-e820: 1e7bffc0 - 1e7c (ACPI NVS)
 BIOS-e820: 4040 - 40440004 (reserved)
 BIOS-e820: f000 - 0001 (reserved) --  
Kind regards
 
   Joerg
 


- Ursprüngliche Mail 
Von: Jordan Crouse <[EMAIL PROTECTED]>
An: H. Peter Anvin <[EMAIL PROTECTED]>
CC: [EMAIL PROTECTED]; Joerg Pommnitz <[EMAIL PROTECTED]>; Chuck Ebbert <[EMAIL 
PROTECTED]>; Linux Kernel Mailing List ; Andi 
Kleen <[EMAIL PROTECTED]>
Gesendet: Freitag, den 28. September 2007, 01:15:52 Uhr
Betreff: Re: More E820 brokenness

On 27/09/07 15:47 -0700, H. Peter Anvin wrote:
> Jordan Crouse wrote:
> > 
> > Breaks on the Geode - original behavior.
> > 
> > I think that having boot_prams.e820_entries != 0 makes the kernel
> > assume the e820 data is correct.
> > 
> 
> Okay, now I'm utterly baffled how 2.6.22 ever worked on this Geode,
> because this, to the best of my reading, mimics the 2.6.22 behavior
> exactly.  DID IT REALLY, and/or did you make any kind of configuration
> changes?

I copied in a 2.6.22 kernel to see that it really did work, and it did.
But here's the crazy part - I did a dmesg, and it looks like it
*is* using e820 data, and it looks complete (I see the entire map - 
including the ACPI and reserved blocks way up high).

So apparently it was the 2.6.22 code that was buggy, but reading it,
I don't immediately see how. 

Jordan
-- 
Jordan Crouse
Systems Software Development Engineer 
Advanced Micro Devices, Inc.








  __  
Yahoo! Clever: Sie haben Fragen? Yahoo! Nutzer antworten Ihnen. 
www.yahoo.de/clever

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


AW: More E820 brokenness

2007-09-28 Thread Joerg Pommnitz
Hello all,
I won't be able to test anything before next Tuesday, but I'm with
Jordan on this one. Have a look at the dmsg output I sent you some
time ago (http://article.gmane.org/gmane.linux.kernel/584681).

This was as plain from Linus' tree as you get it (git bisect, make defconfig, 
make)
and I see:
BIOS-provided physical RAM map:
 BIOS-e820:  - 0009fc00 (usable)
 BIOS-e820: 0009fc00 - 000a (reserved)
 BIOS-e820: 000e - 0010 (reserved)
 BIOS-e820: 0010 - 1e7b (usable)
 BIOS-e820: 1e7b - 1e7bffc0 (ACPI data)
 BIOS-e820: 1e7bffc0 - 1e7c (ACPI NVS)
 BIOS-e820: 4040 - 40440004 (reserved)
 BIOS-e820: f000 - 0001 (reserved) --  
Kind regards
 
   Joerg
 


- Ursprüngliche Mail 
Von: Jordan Crouse [EMAIL PROTECTED]
An: H. Peter Anvin [EMAIL PROTECTED]
CC: [EMAIL PROTECTED]; Joerg Pommnitz [EMAIL PROTECTED]; Chuck Ebbert [EMAIL 
PROTECTED]; Linux Kernel Mailing List linux-kernel@vger.kernel.org; Andi 
Kleen [EMAIL PROTECTED]
Gesendet: Freitag, den 28. September 2007, 01:15:52 Uhr
Betreff: Re: More E820 brokenness

On 27/09/07 15:47 -0700, H. Peter Anvin wrote:
 Jordan Crouse wrote:
  
  Breaks on the Geode - original behavior.
  
  I think that having boot_prams.e820_entries != 0 makes the kernel
  assume the e820 data is correct.
  
 
 Okay, now I'm utterly baffled how 2.6.22 ever worked on this Geode,
 because this, to the best of my reading, mimics the 2.6.22 behavior
 exactly.  DID IT REALLY, and/or did you make any kind of configuration
 changes?

I copied in a 2.6.22 kernel to see that it really did work, and it did.
But here's the crazy part - I did a dmesg, and it looks like it
*is* using e820 data, and it looks complete (I see the entire map - 
including the ACPI and reserved blocks way up high).

So apparently it was the 2.6.22 code that was buggy, but reading it,
I don't immediately see how. 

Jordan
-- 
Jordan Crouse
Systems Software Development Engineer 
Advanced Micro Devices, Inc.








  __  
Yahoo! Clever: Sie haben Fragen? Yahoo! Nutzer antworten Ihnen. 
www.yahoo.de/clever

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: More E820 brokenness

2007-09-28 Thread Rafael J. Wysocki
On Friday, 28 September 2007 02:12, H. Peter Anvin wrote:
 Jordan Crouse wrote:
  
  Worked, but that just raises more questions.  Why didn't more x86 boxes
  break or, alternatively, why did a new version of the BIOS fix the problem? 
  I guess we shouldn't look a gift horse in the mouth. Or something.
  
 
 Why didn't more x86 boxes break... well, it's pretty natural an
 implementation of the BIOS to not clobber registers that aren't outputs.
  Arguably the BIOSes that do are still buggy, since there isn't a
 well-defined calling sequence for the BIOS and the convention that has
 evolved is don't clobber anything unless it's an output.
 
 It's still wrong, however, especially since it means omitting the *real*
 SMAP check.

I'd like to update http://bugzilla.kernel.org/show_bug.cgi?id=9086 with correct
information.

Should I add a pointer to the patch from your previous message to it?

Greetings,
Rafael
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: More E820 brokenness

2007-09-27 Thread H. Peter Anvin
Jordan Crouse wrote:
> 
> Worked, but that just raises more questions.  Why didn't more x86 boxes
> break or, alternatively, why did a new version of the BIOS fix the problem? 
> I guess we shouldn't look a gift horse in the mouth. Or something.
> 

Why didn't more x86 boxes break... well, it's pretty natural an
implementation of the BIOS to not clobber registers that aren't outputs.
 Arguably the BIOSes that do are still buggy, since there isn't a
well-defined calling sequence for the BIOS and the convention that has
evolved is "don't clobber anything unless it's an output."

It's still wrong, however, especially since it means omitting the *real*
SMAP check.

-hpa
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: More E820 brokenness

2007-09-27 Thread Jordan Crouse
On 27/09/07 16:36 -0700, H. Peter Anvin wrote:
> Jordan Crouse wrote:
> >>>
> >> Oh bugger, looks like this one might be genuinely my fault after all.
> >> The ID check in the new code is buggy.
> >>
> >> Can you please test this revised patch out (against current -git)?
> > 
> > 
> > That looks the same as the previous patch you sent?
> > 
> 
> Sorry, this is the right one...
> 
>   -hpa

> diff --git a/arch/i386/boot/memory.c b/arch/i386/boot/memory.c
> index bccaa1c..2f37568 100644
> --- a/arch/i386/boot/memory.c
> +++ b/arch/i386/boot/memory.c
> @@ -28,11 +28,10 @@ static int detect_memory_e820(void)
>  
>   do {
>   size = sizeof(struct e820entry);
> - id = SMAP;
>   asm("int $0x15; setc %0"
> - : "=am" (err), "+b" (next), "+d" (id), "+c" (size),
> + : "=dm" (err), "+b" (next), "=a" (id), "+c" (size),
> "=m" (*desc)
> - : "D" (desc), "a" (0xe820));
> + : "D" (desc), "d" (SMAP), "a" (0xe820));
>  
>   /* Some BIOSes stop returning SMAP in the middle of
>  the search loop.  We don't know exactly how the BIOS

Worked, but that just raises more questions.  Why didn't more x86 boxes
break or, alternatively, why did a new version of the BIOS fix the problem? 
I guess we shouldn't look a gift horse in the mouth. Or something.

Thanks very much for your help.

Jordan

-- 
Jordan Crouse
Systems Software Development Engineer 
Advanced Micro Devices, Inc.


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: More E820 brokenness

2007-09-27 Thread H. Peter Anvin
Jordan Crouse wrote:
>>>
>> Oh bugger, looks like this one might be genuinely my fault after all.
>> The ID check in the new code is buggy.
>>
>> Can you please test this revised patch out (against current -git)?
> 
> 
> That looks the same as the previous patch you sent?
> 

Sorry, this is the right one...

-hpa
diff --git a/arch/i386/boot/memory.c b/arch/i386/boot/memory.c
index bccaa1c..2f37568 100644
--- a/arch/i386/boot/memory.c
+++ b/arch/i386/boot/memory.c
@@ -28,11 +28,10 @@ static int detect_memory_e820(void)
 
 	do {
 		size = sizeof(struct e820entry);
-		id = SMAP;
 		asm("int $0x15; setc %0"
-		: "=am" (err), "+b" (next), "+d" (id), "+c" (size),
+		: "=dm" (err), "+b" (next), "=a" (id), "+c" (size),
 		  "=m" (*desc)
-		: "D" (desc), "a" (0xe820));
+		: "D" (desc), "d" (SMAP), "a" (0xe820));
 
 		/* Some BIOSes stop returning SMAP in the middle of
 		   the search loop.  We don't know exactly how the BIOS


Re: More E820 brokenness

2007-09-27 Thread Jordan Crouse
On 27/09/07 16:27 -0700, H. Peter Anvin wrote:
> Jordan Crouse wrote:
> > On 27/09/07 15:47 -0700, H. Peter Anvin wrote:
> >> Jordan Crouse wrote:
> >>> Breaks on the Geode - original behavior.
> >>>
> >>> I think that having boot_prams.e820_entries != 0 makes the kernel
> >>> assume the e820 data is correct.
> >>>
> >> Okay, now I'm utterly baffled how 2.6.22 ever worked on this Geode,
> >> because this, to the best of my reading, mimics the 2.6.22 behavior
> >> exactly.  DID IT REALLY, and/or did you make any kind of configuration
> >> changes?
> > 
> > I copied in a 2.6.22 kernel to see that it really did work, and it did.
> > But here's the crazy part - I did a dmesg, and it looks like it
> > *is* using e820 data, and it looks complete (I see the entire map - 
> > including the ACPI and reserved blocks way up high).
> > 
> > So apparently it was the 2.6.22 code that was buggy, but reading it,
> > I don't immediately see how. 
> > 
> 
> Oh bugger, looks like this one might be genuinely my fault after all.
> The ID check in the new code is buggy.
> 
> Can you please test this revised patch out (against current -git)?

>   -hpa
> 
> 
> 

> diff --git a/arch/i386/boot/memory.c b/arch/i386/boot/memory.c
> index bccaa1c..84939b7 100644
> --- a/arch/i386/boot/memory.c
> +++ b/arch/i386/boot/memory.c
> @@ -34,17 +34,7 @@ static int detect_memory_e820(void)
> "=m" (*desc)
>   : "D" (desc), "a" (0xe820));
>  
> - /* Some BIOSes stop returning SMAP in the middle of
> -the search loop.  We don't know exactly how the BIOS
> -screwed up the map at that point, we might have a
> -partial map, the full map, or complete garbage, so
> -just return failure. */
> - if (id != SMAP) {
> - count = 0;
> - break;
> - }
> -
> - if (err)
> + if (id != SMAP || err)
>   break;
>  
>   count++;


That looks the same as the previous patch you sent?

Jordan

-- 
Jordan Crouse
Systems Software Development Engineer 
Advanced Micro Devices, Inc.


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: More E820 brokenness

2007-09-27 Thread H. Peter Anvin
Jordan Crouse wrote:
> On 27/09/07 15:47 -0700, H. Peter Anvin wrote:
>> Jordan Crouse wrote:
>>> Breaks on the Geode - original behavior.
>>>
>>> I think that having boot_prams.e820_entries != 0 makes the kernel
>>> assume the e820 data is correct.
>>>
>> Okay, now I'm utterly baffled how 2.6.22 ever worked on this Geode,
>> because this, to the best of my reading, mimics the 2.6.22 behavior
>> exactly.  DID IT REALLY, and/or did you make any kind of configuration
>> changes?
> 
> I copied in a 2.6.22 kernel to see that it really did work, and it did.
> But here's the crazy part - I did a dmesg, and it looks like it
> *is* using e820 data, and it looks complete (I see the entire map - 
> including the ACPI and reserved blocks way up high).
> 
> So apparently it was the 2.6.22 code that was buggy, but reading it,
> I don't immediately see how. 
> 

Oh bugger, looks like this one might be genuinely my fault after all.
The ID check in the new code is buggy.

Can you please test this revised patch out (against current -git)?

-hpa



diff --git a/arch/i386/boot/memory.c b/arch/i386/boot/memory.c
index bccaa1c..84939b7 100644
--- a/arch/i386/boot/memory.c
+++ b/arch/i386/boot/memory.c
@@ -34,17 +34,7 @@ static int detect_memory_e820(void)
  "=m" (*desc)
: "D" (desc), "a" (0xe820));
 
-   /* Some BIOSes stop returning SMAP in the middle of
-  the search loop.  We don't know exactly how the BIOS
-  screwed up the map at that point, we might have a
-  partial map, the full map, or complete garbage, so
-  just return failure. */
-   if (id != SMAP) {
-   count = 0;
-   break;
-   }
-
-   if (err)
+   if (id != SMAP || err)
break;
 
count++;


Re: More E820 brokenness

2007-09-27 Thread H. Peter Anvin
Jordan Crouse wrote:
> 
> I copied in a 2.6.22 kernel to see that it really did work, and it did.
> But here's the crazy part - I did a dmesg, and it looks like it
> *is* using e820 data, and it looks complete (I see the entire map - 
> including the ACPI and reserved blocks way up high).
> 
> So apparently it was the 2.6.22 code that was buggy, but reading it,
> I don't immediately see how. 
> 

Was this a stock 2.6.22 kernel, or might it have been modified?

There is, of course, also the possibility that triggering the BIOS bug
in your case depends on some delicate combination of input state.

-hpa
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: More E820 brokenness

2007-09-27 Thread Jordan Crouse
On 27/09/07 15:47 -0700, H. Peter Anvin wrote:
> Jordan Crouse wrote:
> > 
> > Breaks on the Geode - original behavior.
> > 
> > I think that having boot_prams.e820_entries != 0 makes the kernel
> > assume the e820 data is correct.
> > 
> 
> Okay, now I'm utterly baffled how 2.6.22 ever worked on this Geode,
> because this, to the best of my reading, mimics the 2.6.22 behavior
> exactly.  DID IT REALLY, and/or did you make any kind of configuration
> changes?

I copied in a 2.6.22 kernel to see that it really did work, and it did.
But here's the crazy part - I did a dmesg, and it looks like it
*is* using e820 data, and it looks complete (I see the entire map - 
including the ACPI and reserved blocks way up high).

So apparently it was the 2.6.22 code that was buggy, but reading it,
I don't immediately see how. 

Jordan
-- 
Jordan Crouse
Systems Software Development Engineer 
Advanced Micro Devices, Inc.


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: More E820 brokenness

2007-09-27 Thread H. Peter Anvin
Jordan Crouse wrote:
> 
> Breaks on the Geode - original behavior.
> 
> I think that having boot_prams.e820_entries != 0 makes the kernel
> assume the e820 data is correct.
> 

Okay, now I'm utterly baffled how 2.6.22 ever worked on this Geode,
because this, to the best of my reading, mimics the 2.6.22 behavior
exactly.  DID IT REALLY, and/or did you make any kind of configuration
changes?

>> I want to emphasize that this is seriously broken.  Using a partial e820
>> map could have disastrous results, since the kernel will have partial
>> memory map information and not know about reserved areas, etc.  Part of
>> me feels that the right thing to do is what the current git kernel does
>> -- either fall back to e801, or stop and error.
> 
> I'm inclined to agree.  

Arguably the right thing to do is to find the responsible BIOS engineer
and shoot them, but that's hard to do without robotics.

-hpa



-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: More E820 brokenness

2007-09-27 Thread Jordan Crouse
On 27/09/07 15:17 -0700, H. Peter Anvin wrote:
> As luck would have it, it's not just an obscure Geode system which has a
> broken E820 implementation.  Today I received a bug report about a Dell
> system (XPS M1330) with broken E820.
> 
> Unfortunately, the workaround for the Geode breaks this system, because
> x86-64 doesn't fall back to the e801/88 information like the i386 kernel
> does.
> 
> I wonder if the relevant people could test out this patch to see how it
> works on their respective system.  This patch reverts to 2.6.23-rc8
> behaviour of simply truncating the map, but still makes e801/88 info
> available to the kernel; this hopefully should match 2.6.22 behaviour.

Breaks on the Geode - original behavior.

I think that having boot_prams.e820_entries != 0 makes the kernel
assume the e820 data is correct.

> I want to emphasize that this is seriously broken.  Using a partial e820
> map could have disastrous results, since the kernel will have partial
> memory map information and not know about reserved areas, etc.  Part of
> me feels that the right thing to do is what the current git kernel does
> -- either fall back to e801, or stop and error.

I'm inclined to agree.  

Jordan


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


More E820 brokenness

2007-09-27 Thread H. Peter Anvin
As luck would have it, it's not just an obscure Geode system which has a
broken E820 implementation.  Today I received a bug report about a Dell
system (XPS M1330) with broken E820.

Unfortunately, the workaround for the Geode breaks this system, because
x86-64 doesn't fall back to the e801/88 information like the i386 kernel
does.

I wonder if the relevant people could test out this patch to see how it
works on their respective system.  This patch reverts to 2.6.23-rc8
behaviour of simply truncating the map, but still makes e801/88 info
available to the kernel; this hopefully should match 2.6.22 behaviour.

I want to emphasize that this is seriously broken.  Using a partial e820
map could have disastrous results, since the kernel will have partial
memory map information and not know about reserved areas, etc.  Part of
me feels that the right thing to do is what the current git kernel does
-- either fall back to e801, or stop and error.

(Andi: I would particularly appreciate your opinion on this issue.)

-hpa
diff --git a/arch/i386/boot/memory.c b/arch/i386/boot/memory.c
index bccaa1c..84939b7 100644
--- a/arch/i386/boot/memory.c
+++ b/arch/i386/boot/memory.c
@@ -34,17 +34,7 @@ static int detect_memory_e820(void)
  "=m" (*desc)
: "D" (desc), "a" (0xe820));
 
-   /* Some BIOSes stop returning SMAP in the middle of
-  the search loop.  We don't know exactly how the BIOS
-  screwed up the map at that point, we might have a
-  partial map, the full map, or complete garbage, so
-  just return failure. */
-   if (id != SMAP) {
-   count = 0;
-   break;
-   }
-
-   if (err)
+   if (id != SMAP || err)
break;
 
count++;


More E820 brokenness

2007-09-27 Thread H. Peter Anvin
As luck would have it, it's not just an obscure Geode system which has a
broken E820 implementation.  Today I received a bug report about a Dell
system (XPS M1330) with broken E820.

Unfortunately, the workaround for the Geode breaks this system, because
x86-64 doesn't fall back to the e801/88 information like the i386 kernel
does.

I wonder if the relevant people could test out this patch to see how it
works on their respective system.  This patch reverts to 2.6.23-rc8
behaviour of simply truncating the map, but still makes e801/88 info
available to the kernel; this hopefully should match 2.6.22 behaviour.

I want to emphasize that this is seriously broken.  Using a partial e820
map could have disastrous results, since the kernel will have partial
memory map information and not know about reserved areas, etc.  Part of
me feels that the right thing to do is what the current git kernel does
-- either fall back to e801, or stop and error.

(Andi: I would particularly appreciate your opinion on this issue.)

-hpa
diff --git a/arch/i386/boot/memory.c b/arch/i386/boot/memory.c
index bccaa1c..84939b7 100644
--- a/arch/i386/boot/memory.c
+++ b/arch/i386/boot/memory.c
@@ -34,17 +34,7 @@ static int detect_memory_e820(void)
  =m (*desc)
: D (desc), a (0xe820));
 
-   /* Some BIOSes stop returning SMAP in the middle of
-  the search loop.  We don't know exactly how the BIOS
-  screwed up the map at that point, we might have a
-  partial map, the full map, or complete garbage, so
-  just return failure. */
-   if (id != SMAP) {
-   count = 0;
-   break;
-   }
-
-   if (err)
+   if (id != SMAP || err)
break;
 
count++;


Re: More E820 brokenness

2007-09-27 Thread H. Peter Anvin
Jordan Crouse wrote:
 
 I copied in a 2.6.22 kernel to see that it really did work, and it did.
 But here's the crazy part - I did a dmesg, and it looks like it
 *is* using e820 data, and it looks complete (I see the entire map - 
 including the ACPI and reserved blocks way up high).
 
 So apparently it was the 2.6.22 code that was buggy, but reading it,
 I don't immediately see how. 
 

Was this a stock 2.6.22 kernel, or might it have been modified?

There is, of course, also the possibility that triggering the BIOS bug
in your case depends on some delicate combination of input state.

-hpa
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: More E820 brokenness

2007-09-27 Thread Jordan Crouse
On 27/09/07 15:47 -0700, H. Peter Anvin wrote:
 Jordan Crouse wrote:
  
  Breaks on the Geode - original behavior.
  
  I think that having boot_prams.e820_entries != 0 makes the kernel
  assume the e820 data is correct.
  
 
 Okay, now I'm utterly baffled how 2.6.22 ever worked on this Geode,
 because this, to the best of my reading, mimics the 2.6.22 behavior
 exactly.  DID IT REALLY, and/or did you make any kind of configuration
 changes?

I copied in a 2.6.22 kernel to see that it really did work, and it did.
But here's the crazy part - I did a dmesg, and it looks like it
*is* using e820 data, and it looks complete (I see the entire map - 
including the ACPI and reserved blocks way up high).

So apparently it was the 2.6.22 code that was buggy, but reading it,
I don't immediately see how. 

Jordan
-- 
Jordan Crouse
Systems Software Development Engineer 
Advanced Micro Devices, Inc.


-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: More E820 brokenness

2007-09-27 Thread H. Peter Anvin
Jordan Crouse wrote:
 On 27/09/07 15:47 -0700, H. Peter Anvin wrote:
 Jordan Crouse wrote:
 Breaks on the Geode - original behavior.

 I think that having boot_prams.e820_entries != 0 makes the kernel
 assume the e820 data is correct.

 Okay, now I'm utterly baffled how 2.6.22 ever worked on this Geode,
 because this, to the best of my reading, mimics the 2.6.22 behavior
 exactly.  DID IT REALLY, and/or did you make any kind of configuration
 changes?
 
 I copied in a 2.6.22 kernel to see that it really did work, and it did.
 But here's the crazy part - I did a dmesg, and it looks like it
 *is* using e820 data, and it looks complete (I see the entire map - 
 including the ACPI and reserved blocks way up high).
 
 So apparently it was the 2.6.22 code that was buggy, but reading it,
 I don't immediately see how. 
 

Oh bugger, looks like this one might be genuinely my fault after all.
The ID check in the new code is buggy.

Can you please test this revised patch out (against current -git)?

-hpa



diff --git a/arch/i386/boot/memory.c b/arch/i386/boot/memory.c
index bccaa1c..84939b7 100644
--- a/arch/i386/boot/memory.c
+++ b/arch/i386/boot/memory.c
@@ -34,17 +34,7 @@ static int detect_memory_e820(void)
  =m (*desc)
: D (desc), a (0xe820));
 
-   /* Some BIOSes stop returning SMAP in the middle of
-  the search loop.  We don't know exactly how the BIOS
-  screwed up the map at that point, we might have a
-  partial map, the full map, or complete garbage, so
-  just return failure. */
-   if (id != SMAP) {
-   count = 0;
-   break;
-   }
-
-   if (err)
+   if (id != SMAP || err)
break;
 
count++;


Re: More E820 brokenness

2007-09-27 Thread H. Peter Anvin
Jordan Crouse wrote:
 
 Breaks on the Geode - original behavior.
 
 I think that having boot_prams.e820_entries != 0 makes the kernel
 assume the e820 data is correct.
 

Okay, now I'm utterly baffled how 2.6.22 ever worked on this Geode,
because this, to the best of my reading, mimics the 2.6.22 behavior
exactly.  DID IT REALLY, and/or did you make any kind of configuration
changes?

 I want to emphasize that this is seriously broken.  Using a partial e820
 map could have disastrous results, since the kernel will have partial
 memory map information and not know about reserved areas, etc.  Part of
 me feels that the right thing to do is what the current git kernel does
 -- either fall back to e801, or stop and error.
 
 I'm inclined to agree.  

Arguably the right thing to do is to find the responsible BIOS engineer
and shoot them, but that's hard to do without robotics.

-hpa



-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: More E820 brokenness

2007-09-27 Thread Jordan Crouse
On 27/09/07 15:17 -0700, H. Peter Anvin wrote:
 As luck would have it, it's not just an obscure Geode system which has a
 broken E820 implementation.  Today I received a bug report about a Dell
 system (XPS M1330) with broken E820.
 
 Unfortunately, the workaround for the Geode breaks this system, because
 x86-64 doesn't fall back to the e801/88 information like the i386 kernel
 does.
 
 I wonder if the relevant people could test out this patch to see how it
 works on their respective system.  This patch reverts to 2.6.23-rc8
 behaviour of simply truncating the map, but still makes e801/88 info
 available to the kernel; this hopefully should match 2.6.22 behaviour.

Breaks on the Geode - original behavior.

I think that having boot_prams.e820_entries != 0 makes the kernel
assume the e820 data is correct.

 I want to emphasize that this is seriously broken.  Using a partial e820
 map could have disastrous results, since the kernel will have partial
 memory map information and not know about reserved areas, etc.  Part of
 me feels that the right thing to do is what the current git kernel does
 -- either fall back to e801, or stop and error.

I'm inclined to agree.  

Jordan


-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: More E820 brokenness

2007-09-27 Thread Jordan Crouse
On 27/09/07 16:27 -0700, H. Peter Anvin wrote:
 Jordan Crouse wrote:
  On 27/09/07 15:47 -0700, H. Peter Anvin wrote:
  Jordan Crouse wrote:
  Breaks on the Geode - original behavior.
 
  I think that having boot_prams.e820_entries != 0 makes the kernel
  assume the e820 data is correct.
 
  Okay, now I'm utterly baffled how 2.6.22 ever worked on this Geode,
  because this, to the best of my reading, mimics the 2.6.22 behavior
  exactly.  DID IT REALLY, and/or did you make any kind of configuration
  changes?
  
  I copied in a 2.6.22 kernel to see that it really did work, and it did.
  But here's the crazy part - I did a dmesg, and it looks like it
  *is* using e820 data, and it looks complete (I see the entire map - 
  including the ACPI and reserved blocks way up high).
  
  So apparently it was the 2.6.22 code that was buggy, but reading it,
  I don't immediately see how. 
  
 
 Oh bugger, looks like this one might be genuinely my fault after all.
 The ID check in the new code is buggy.
 
 Can you please test this revised patch out (against current -git)?

   -hpa
 
 
 

 diff --git a/arch/i386/boot/memory.c b/arch/i386/boot/memory.c
 index bccaa1c..84939b7 100644
 --- a/arch/i386/boot/memory.c
 +++ b/arch/i386/boot/memory.c
 @@ -34,17 +34,7 @@ static int detect_memory_e820(void)
 =m (*desc)
   : D (desc), a (0xe820));
  
 - /* Some BIOSes stop returning SMAP in the middle of
 -the search loop.  We don't know exactly how the BIOS
 -screwed up the map at that point, we might have a
 -partial map, the full map, or complete garbage, so
 -just return failure. */
 - if (id != SMAP) {
 - count = 0;
 - break;
 - }
 -
 - if (err)
 + if (id != SMAP || err)
   break;
  
   count++;


That looks the same as the previous patch you sent?

Jordan

-- 
Jordan Crouse
Systems Software Development Engineer 
Advanced Micro Devices, Inc.


-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: More E820 brokenness

2007-09-27 Thread H. Peter Anvin
Jordan Crouse wrote:

 Oh bugger, looks like this one might be genuinely my fault after all.
 The ID check in the new code is buggy.

 Can you please test this revised patch out (against current -git)?
 
 
 That looks the same as the previous patch you sent?
 

Sorry, this is the right one...

-hpa
diff --git a/arch/i386/boot/memory.c b/arch/i386/boot/memory.c
index bccaa1c..2f37568 100644
--- a/arch/i386/boot/memory.c
+++ b/arch/i386/boot/memory.c
@@ -28,11 +28,10 @@ static int detect_memory_e820(void)
 
 	do {
 		size = sizeof(struct e820entry);
-		id = SMAP;
 		asm(int $0x15; setc %0
-		: =am (err), +b (next), +d (id), +c (size),
+		: =dm (err), +b (next), =a (id), +c (size),
 		  =m (*desc)
-		: D (desc), a (0xe820));
+		: D (desc), d (SMAP), a (0xe820));
 
 		/* Some BIOSes stop returning SMAP in the middle of
 		   the search loop.  We don't know exactly how the BIOS


Re: More E820 brokenness

2007-09-27 Thread Jordan Crouse
On 27/09/07 16:36 -0700, H. Peter Anvin wrote:
 Jordan Crouse wrote:
 
  Oh bugger, looks like this one might be genuinely my fault after all.
  The ID check in the new code is buggy.
 
  Can you please test this revised patch out (against current -git)?
  
  
  That looks the same as the previous patch you sent?
  
 
 Sorry, this is the right one...
 
   -hpa

 diff --git a/arch/i386/boot/memory.c b/arch/i386/boot/memory.c
 index bccaa1c..2f37568 100644
 --- a/arch/i386/boot/memory.c
 +++ b/arch/i386/boot/memory.c
 @@ -28,11 +28,10 @@ static int detect_memory_e820(void)
  
   do {
   size = sizeof(struct e820entry);
 - id = SMAP;
   asm(int $0x15; setc %0
 - : =am (err), +b (next), +d (id), +c (size),
 + : =dm (err), +b (next), =a (id), +c (size),
 =m (*desc)
 - : D (desc), a (0xe820));
 + : D (desc), d (SMAP), a (0xe820));
  
   /* Some BIOSes stop returning SMAP in the middle of
  the search loop.  We don't know exactly how the BIOS

Worked, but that just raises more questions.  Why didn't more x86 boxes
break or, alternatively, why did a new version of the BIOS fix the problem? 
I guess we shouldn't look a gift horse in the mouth. Or something.

Thanks very much for your help.

Jordan

-- 
Jordan Crouse
Systems Software Development Engineer 
Advanced Micro Devices, Inc.


-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: More E820 brokenness

2007-09-27 Thread H. Peter Anvin
Jordan Crouse wrote:
 
 Worked, but that just raises more questions.  Why didn't more x86 boxes
 break or, alternatively, why did a new version of the BIOS fix the problem? 
 I guess we shouldn't look a gift horse in the mouth. Or something.
 

Why didn't more x86 boxes break... well, it's pretty natural an
implementation of the BIOS to not clobber registers that aren't outputs.
 Arguably the BIOSes that do are still buggy, since there isn't a
well-defined calling sequence for the BIOS and the convention that has
evolved is don't clobber anything unless it's an output.

It's still wrong, however, especially since it means omitting the *real*
SMAP check.

-hpa
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/