Re: [PATCH v3 2/3] Fix undefined operation fault that can hang a cpu on crash or panic
On Tuesday, July 7, 2020 3:24pm, "Sean Christopherson" said: > On Tue, Jul 07, 2020 at 03:09:38PM -0400, David P. Reed wrote: >> >> On Tuesday, July 7, 2020 1:09am, "Sean Christopherson" >> said: >> Sean, are you the one who would get this particular fix pushed into Linus's >> tree, by the way? The "maintainership" is not clear to me. > > Nope, I'm just here to complain and nitpick :-) There's no direct maintainer > for virtext.h so it falls under the higher level arch/x86 umbrella, i.e. I > expect Boris/Thomas/Ingo will pick this up. > Thanks for your time and effort in helping.
Re: [PATCH v3 2/3] Fix undefined operation fault that can hang a cpu on crash or panic
On Tuesday, July 7, 2020 1:09am, "Sean Christopherson" said: > On Sat, Jul 04, 2020 at 04:38:08PM -0400, David P. Reed wrote: >> Fix: Mask undefined operation fault during emergency VMXOFF that must be >> attempted to force cpu exit from VMX root operation. >> Explanation: When a cpu may be in VMX root operation (only possible when >> CR4.VMXE is set), crash or panic reboot tries to exit VMX root operation >> using VMXOFF. This is necessary, because any INIT will be masked while cpu >> is in VMX root operation, but that state cannot be reliably >> discerned by the state of the cpu. >> VMXOFF faults if the cpu is not actually in VMX root operation, signalling >> undefined operation. >> Discovered while debugging an out-of-tree x-visor with a race. Can happen >> due to certain kinds of bugs in KVM. >> >> Fixes: 208067 <https://bugzilla.kernel.org/show_bug.cgi?id=208067> >> Reported-by: David P. Reed >> Suggested-by: Thomas Gleixner >> Suggested-by: Sean Christopherson >> Suggested-by: Andy Lutomirski >> Signed-off-by: David P. Reed >> --- >> arch/x86/include/asm/virtext.h | 20 ++-- >> 1 file changed, 14 insertions(+), 6 deletions(-) >> >> diff --git a/arch/x86/include/asm/virtext.h b/arch/x86/include/asm/virtext.h >> index 0ede8d04535a..0e0900eacb9c 100644 >> --- a/arch/x86/include/asm/virtext.h >> +++ b/arch/x86/include/asm/virtext.h >> @@ -30,11 +30,11 @@ static inline int cpu_has_vmx(void) >> } >> >> >> -/* Disable VMX on the current CPU >> +/* Exit VMX root mode and isable VMX on the current CPU. >> * >> * vmxoff causes a undefined-opcode exception if vmxon was not run >> - * on the CPU previously. Only call this function if you know VMX >> - * is enabled. >> + * on the CPU previously. Only call this function if you know cpu >> + * is in VMX root mode. 
>> */ >> static inline void cpu_vmxoff(void) >> { >> @@ -47,14 +47,22 @@ static inline int cpu_vmx_enabled(void) >> return __read_cr4() & X86_CR4_VMXE; >> } >> >> -/* Disable VMX if it is enabled on the current CPU >> +/* Safely exit VMX root mode and disable VMX if VMX enabled >> + * on the current CPU. Handle undefined-opcode fault >> + * that can occur if cpu is not in VMX root mode, due >> + * to a race. >> * >> * You shouldn't call this if cpu_has_vmx() returns 0. >> */ >> static inline void __cpu_emergency_vmxoff(void) >> { >> -if (cpu_vmx_enabled()) >> -cpu_vmxoff(); >> +if (!cpu_vmx_enabled()) >> +return; >> +asm volatile ("1:vmxoff\n\t" >> + "2:\n\t" >> + _ASM_EXTABLE(1b, 2b) >> + ::: "cc", "memory"); >> +cr4_clear_bits(X86_CR4_VMXE); > > Open coding vmxoff doesn't make sense, and IMO is flat out wrong as it fixes > flows that use __cpu_emergency_vmxoff() but leaves the same bug hanging > around in emergency_vmx_disable_all() until the next patch. > > The reason I say it doesn't make sense is that there is no sane scenario > where the generic vmxoff helper should _not_ eat the fault. All other VMXOFF > faults are mode related, i.e. any fault is guaranteed to be due to the > !post-VMXON check unless we're magically in RM, VM86, compat mode, or at > CPL>0. Given that the whole point of this series is that it's impossible to > determine whether or not the CPU is post-VMXON if CR4.VMXE=1 without taking a > fault of some form, there's simply no way that anything except the hypervisor > (in normal operation) can know the state of VMX. And given that the only > in-tree hypervisor (KVM) has its own version of vmxoff, that means there is > no scenario in which cpu_vmxoff() can safely be used. Case in point, after > the next patch there are no users of cpu_vmxoff(). > > TL;DR: Just do fixup on cpu_vmxoff(). Personally, I don't care either way: it fixes the bug in both cases, and since the function is inlined, no additional code is generated either way.
I was just being conservative, since cpu_vmxoff() is used throughout the kernel source, so it might be expected to keep its current behavior (when not in an "emergency"). I'll wait a day or two for any objections from other commenters to just doing the fix in cpu_vmxoff(). With no objection, I'll just do it that way. Sean, are you the one who would get this particular fix pushed into Linus's tree, by the way? The "maintainership" is not clear to me. If you are, happy to take direction from you as the primary input. > >> } >> >> /* Disable VMX if it is supported and enabled on the current CPU >> -- >> 2.26.2 >> >
Re: [PATCH v3 2/3] Fix undefined operation fault that can hang a cpu on crash or panic
On Sunday, July 5, 2020 4:55pm, "Andy Lutomirski" said: > On Sun, Jul 5, 2020 at 12:52 PM David P. Reed wrote: >> >> Thanks, will handle these. 2 questions below. >> >> On Sunday, July 5, 2020 2:22pm, "Andy Lutomirski" said: >> >> > On Sat, Jul 4, 2020 at 1:38 PM David P. Reed wrote: >> >> >> >> Fix: Mask undefined operation fault during emergency VMXOFF that must be >> >> attempted to force cpu exit from VMX root operation. >> >> Explanation: When a cpu may be in VMX root operation (only possible when >> >> CR4.VMXE is set), crash or panic reboot tries to exit VMX root operation >> >> using VMXOFF. This is necessary, because any INIT will be masked while cpu >> >> is in VMX root operation, but that state cannot be reliably >> >> discerned by the state of the cpu. >> >> VMXOFF faults if the cpu is not actually in VMX root operation, signalling >> >> undefined operation. >> >> Discovered while debugging an out-of-tree x-visor with a race. Can happen >> >> due to certain kinds of bugs in KVM. >> > >> > Can you re-wrap lines to 68 characters? Also, the Fix: and >> >> I used 'scripts/checkpatch.pl' and it had me wrap to 75 chars: >> "WARNING: Possible unwrapped commit description (prefer a maximum 75 chars >> per >> line)" >> >> Should I submit a fix to checkpatch.pl to say 68? > > 75 is probably fine too, but something is odd about your wrapping. > You have long lines mostly alternating with short lines. It's as if > you wrote 120-ish character lines and then wrapped to 75 without > reflowing. My emacs settings tend to wrap at about 85 depending on file type (big screens). I did the shortening manually, aimed at breaking at meaningful points, not worrying too much about line-length uniformity. > >> >> > Explanation: is probably unnecessary. You could say: >> > >> > Ignore a potential #UD failut during emergency VMXOFF ... >> > >> > When a cpu may be in VMX ... 
>> > >> >> >> >> Fixes: 208067 <https://bugzilla.kernel.org/show_bug.cgi?id=208067> >> >> Reported-by: David P. Reed >> > >> > It's not really necessary to say that you, the author, reported the >> > problem, but I guess it's harmless. >> > >> >> Suggested-by: Thomas Gleixner >> >> Suggested-by: Sean Christopherson >> >> Suggested-by: Andy Lutomirski >> >> Signed-off-by: David P. Reed >> >> --- >> >> arch/x86/include/asm/virtext.h | 20 ++-- >> >> 1 file changed, 14 insertions(+), 6 deletions(-) >> >> >> >> diff --git a/arch/x86/include/asm/virtext.h >> >> b/arch/x86/include/asm/virtext.h >> >> index 0ede8d04535a..0e0900eacb9c 100644 >> >> --- a/arch/x86/include/asm/virtext.h >> >> +++ b/arch/x86/include/asm/virtext.h >> >> @@ -30,11 +30,11 @@ static inline int cpu_has_vmx(void) >> >> } >> >> >> >> >> >> -/* Disable VMX on the current CPU >> >> +/* Exit VMX root mode and isable VMX on the current CPU. >> > >> > s/isable/disable/ >> > >> > >> >> /* Disable VMX if it is supported and enabled on the current CPU >> >> -- >> >> 2.26.2 >> >> >> > >> > Other than that: >> > >> > Reviewed-by: Andy Lutomirski >> >> As a newbie, I have a process question - should I resend the patch with the >> 'Reviewed-by' line, as well as correcting the other wording? Thanks! > > Probably. Sometimes a maintainer will apply the patch and make these > types of cosmetic changes, but it's easier if you resubmit. That > being said, for non-urgent patches, it's usually considered polite to > wait a day or two to give other people a chance to comment. I'm not sure which maintainer will move the patches along. I am waiting for additional input, but will resubmit in a day or two. > > --Andy >
Re: [PATCH v3 3/3] Force all cpus to exit VMX root operation on crash/panic reliably
On Sunday, July 5, 2020 2:26pm, "Andy Lutomirski" said: > On Sat, Jul 4, 2020 at 1:38 PM David P. Reed wrote: >> >> Fix the logic during crash/panic reboot on Intel processors that >> can support VMX operation to ensure that all processors are not >> in VMX root operation. Prior code made optimistic assumptions >> about other cpus that would leave other cpus in VMX root operation >> depending on timing of crash/panic reboot. >> Builds on cpu_ermergency_vmxoff() and __cpu_emergency_vmxoff() created >> in a prior patch. >> >> Suggested-by: Sean Christopherson >> Signed-off-by: David P. Reed >> --- >> arch/x86/kernel/reboot.c | 20 +++- >> 1 file changed, 7 insertions(+), 13 deletions(-) >> >> diff --git a/arch/x86/kernel/reboot.c b/arch/x86/kernel/reboot.c >> index 0ec7ced727fe..c8e96ba78efc 100644 >> --- a/arch/x86/kernel/reboot.c >> +++ b/arch/x86/kernel/reboot.c >> @@ -543,24 +543,18 @@ static void emergency_vmx_disable_all(void) >> * signals when VMX is enabled. >> * >> * We can't take any locks and we may be on an inconsistent >> -* state, so we use NMIs as IPIs to tell the other CPUs to disable >> -* VMX and halt. >> +* state, so we use NMIs as IPIs to tell the other CPUs to exit >> +* VMX root operation and halt. >> * >> * For safety, we will avoid running the nmi_shootdown_cpus() >> * stuff unnecessarily, but we don't have a way to check >> -* if other CPUs have VMX enabled. So we will call it only if the >> -* CPU we are running on has VMX enabled. >> -* >> -* We will miss cases where VMX is not enabled on all CPUs. This >> -* shouldn't do much harm because KVM always enable VMX on all >> -* CPUs anyway. But we can miss it on the small window where KVM >> -* is still enabling VMX. >> +* if other CPUs might be in VMX root operation. >> */ >> - if (cpu_has_vmx() && cpu_vmx_enabled()) { >> - /* Disable VMX on this CPU. */ >> - cpu_vmxoff(); >> + if (cpu_has_vmx()) { >> + /* Safely force out of VMX root operation on this CPU. 
*/ >> + __cpu_emergency_vmxoff(); >> >> - /* Halt and disable VMX on the other CPUs */ >> + /* Halt and exit VMX root operation on the other CPUs */ >> nmi_shootdown_cpus(vmxoff_nmi); >> >> } > > Seems reasonable to me. > > As a minor caveat, doing cr4_clear_bits() in NMI context is not really > okay, but we're about to reboot, so nothing too awful should happen. > And this has very little to do with your patch. I had wondered why the bit is cleared, too. (I assumed it was OK or desirable, because it was being cleared in NMI context before.) Happy to submit a separate patch to eliminate that issue as well, since the point of emergency vmxoff is only to get out of VMX root mode; CR4.VMXE's state is irrelevant. Of course, clearing it prevents any future emergency vmxoff attempts. (There seemed to be some confusion about "enabling" VMX vs. being "in VMX operation" in the comments.) Should I? > > Reviewed-by: Andy Lutomirski >
Re: [PATCH v3 2/3] Fix undefined operation fault that can hang a cpu on crash or panic
Thanks, will handle these. 2 questions below. On Sunday, July 5, 2020 2:22pm, "Andy Lutomirski" said: > On Sat, Jul 4, 2020 at 1:38 PM David P. Reed wrote: >> >> Fix: Mask undefined operation fault during emergency VMXOFF that must be >> attempted to force cpu exit from VMX root operation. >> Explanation: When a cpu may be in VMX root operation (only possible when >> CR4.VMXE is set), crash or panic reboot tries to exit VMX root operation >> using VMXOFF. This is necessary, because any INIT will be masked while cpu >> is in VMX root operation, but that state cannot be reliably >> discerned by the state of the cpu. >> VMXOFF faults if the cpu is not actually in VMX root operation, signalling >> undefined operation. >> Discovered while debugging an out-of-tree x-visor with a race. Can happen >> due to certain kinds of bugs in KVM. > > Can you re-wrap lines to 68 characters? Also, the Fix: and I used 'scripts/checkpatch.pl' and it had me wrap to 75 chars: "WARNING: Possible unwrapped commit description (prefer a maximum 75 chars per line)" Should I submit a fix to checkpatch.pl to say 68? > Explanation: is probably unnecessary. You could say: > > Ignore a potential #UD fault during emergency VMXOFF ... > > When a cpu may be in VMX ... > >> >> Fixes: 208067 <https://bugzilla.kernel.org/show_bug.cgi?id=208067> >> Reported-by: David P. Reed > > It's not really necessary to say that you, the author, reported the > problem, but I guess it's harmless. > >> Suggested-by: Thomas Gleixner >> Suggested-by: Sean Christopherson >> Suggested-by: Andy Lutomirski >> Signed-off-by: David P.
Reed >> --- >> arch/x86/include/asm/virtext.h | 20 ++-- >> 1 file changed, 14 insertions(+), 6 deletions(-) >> >> diff --git a/arch/x86/include/asm/virtext.h b/arch/x86/include/asm/virtext.h >> index 0ede8d04535a..0e0900eacb9c 100644 >> --- a/arch/x86/include/asm/virtext.h >> +++ b/arch/x86/include/asm/virtext.h >> @@ -30,11 +30,11 @@ static inline int cpu_has_vmx(void) >> } >> >> >> -/* Disable VMX on the current CPU >> +/* Exit VMX root mode and isable VMX on the current CPU. > > s/isable/disable/ > > >> /* Disable VMX if it is supported and enabled on the current CPU >> -- >> 2.26.2 >> > > Other than that: > > Reviewed-by: Andy Lutomirski As a newbie, I have a process question - should I resend the patch with the 'Reviewed-by' line, as well as correcting the other wording? Thanks! > > --Andy >
[PATCH v3 2/3] Fix undefined operation fault that can hang a cpu on crash or panic
Fix: Mask undefined operation fault during emergency VMXOFF that must be attempted to force cpu exit from VMX root operation. Explanation: When a cpu may be in VMX root operation (only possible when CR4.VMXE is set), crash or panic reboot tries to exit VMX root operation using VMXOFF. This is necessary, because any INIT will be masked while cpu is in VMX root operation, but that state cannot be reliably discerned by the state of the cpu. VMXOFF faults if the cpu is not actually in VMX root operation, signalling undefined operation. Discovered while debugging an out-of-tree x-visor with a race. Can happen due to certain kinds of bugs in KVM. Fixes: 208067 <https://bugzilla.kernel.org/show_bug.cgi?id=208067> Reported-by: David P. Reed Suggested-by: Thomas Gleixner Suggested-by: Sean Christopherson Suggested-by: Andy Lutomirski Signed-off-by: David P. Reed --- arch/x86/include/asm/virtext.h | 20 ++-- 1 file changed, 14 insertions(+), 6 deletions(-) diff --git a/arch/x86/include/asm/virtext.h b/arch/x86/include/asm/virtext.h index 0ede8d04535a..0e0900eacb9c 100644 --- a/arch/x86/include/asm/virtext.h +++ b/arch/x86/include/asm/virtext.h @@ -30,11 +30,11 @@ static inline int cpu_has_vmx(void) } -/* Disable VMX on the current CPU +/* Exit VMX root mode and isable VMX on the current CPU. * * vmxoff causes a undefined-opcode exception if vmxon was not run - * on the CPU previously. Only call this function if you know VMX - * is enabled. + * on the CPU previously. Only call this function if you know cpu + * is in VMX root mode. */ static inline void cpu_vmxoff(void) { @@ -47,14 +47,22 @@ static inline int cpu_vmx_enabled(void) return __read_cr4() & X86_CR4_VMXE; } -/* Disable VMX if it is enabled on the current CPU +/* Safely exit VMX root mode and disable VMX if VMX enabled + * on the current CPU. Handle undefined-opcode fault + * that can occur if cpu is not in VMX root mode, due + * to a race. * * You shouldn't call this if cpu_has_vmx() returns 0. 
*/ static inline void __cpu_emergency_vmxoff(void) { - if (cpu_vmx_enabled()) - cpu_vmxoff(); + if (!cpu_vmx_enabled()) + return; + asm volatile ("1:vmxoff\n\t" + "2:\n\t" + _ASM_EXTABLE(1b, 2b) + ::: "cc", "memory"); + cr4_clear_bits(X86_CR4_VMXE); } /* Disable VMX if it is supported and enabled on the current CPU -- 2.26.2
[PATCH v3 0/3] Fix undefined operation VMXOFF during reboot and crash
At the request of Sean Christopherson, the original patch was split into three patches, each fixing a distinct issue related to the original bug, of a hang due to VMXOFF causing an undefined operation fault when the kernel reboots with CR4.VMXE set. The combination of the patches is the complete fix to the reported bug, and a lurking error in asm side effects. David P. Reed (3): Correct asm VMXOFF side effects Fix undefined operation fault that can hang a cpu on crash or panic Force all cpus to exit VMX root operation on crash/panic reliably arch/x86/include/asm/virtext.h | 24 arch/x86/kernel/reboot.c | 20 +++- 2 files changed, 23 insertions(+), 21 deletions(-) -- 2.26.2
[PATCH v3 3/3] Force all cpus to exit VMX root operation on crash/panic reliably
Fix the logic during crash/panic reboot on Intel processors that can support VMX operation to ensure that all processors are not in VMX root operation. Prior code made optimistic assumptions about other cpus that would leave them in VMX root operation depending on timing of crash/panic reboot. Builds on cpu_emergency_vmxoff() and __cpu_emergency_vmxoff() created in a prior patch. Suggested-by: Sean Christopherson Signed-off-by: David P. Reed --- arch/x86/kernel/reboot.c | 20 +++- 1 file changed, 7 insertions(+), 13 deletions(-) diff --git a/arch/x86/kernel/reboot.c b/arch/x86/kernel/reboot.c index 0ec7ced727fe..c8e96ba78efc 100644 --- a/arch/x86/kernel/reboot.c +++ b/arch/x86/kernel/reboot.c @@ -543,24 +543,18 @@ static void emergency_vmx_disable_all(void) * signals when VMX is enabled. * * We can't take any locks and we may be on an inconsistent -* state, so we use NMIs as IPIs to tell the other CPUs to disable -* VMX and halt. +* state, so we use NMIs as IPIs to tell the other CPUs to exit +* VMX root operation and halt. * * For safety, we will avoid running the nmi_shootdown_cpus() * stuff unnecessarily, but we don't have a way to check -* if other CPUs have VMX enabled. So we will call it only if the -* CPU we are running on has VMX enabled. -* -* We will miss cases where VMX is not enabled on all CPUs. This -* shouldn't do much harm because KVM always enable VMX on all -* CPUs anyway. But we can miss it on the small window where KVM -* is still enabling VMX. +* if other CPUs might be in VMX root operation. */ - if (cpu_has_vmx() && cpu_vmx_enabled()) { - /* Disable VMX on this CPU. */ - cpu_vmxoff(); + if (cpu_has_vmx()) { + /* Safely force out of VMX root operation on this CPU. */ + __cpu_emergency_vmxoff(); - /* Halt and disable VMX on the other CPUs */ + /* Halt and exit VMX root operation on the other CPUs */ nmi_shootdown_cpus(vmxoff_nmi); } -- 2.26.2
[PATCH v3 1/3] Correct asm VMXOFF side effects
Tell gcc that VMXOFF instruction clobbers condition codes and memory when executed. Also, correct original comments to remove kernel-doc syntax per Randy Dunlap's request. Suggested-by: Randy Dunlap Signed-off-by: David P. Reed --- arch/x86/include/asm/virtext.h | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/arch/x86/include/asm/virtext.h b/arch/x86/include/asm/virtext.h index 9aad0e0876fb..0ede8d04535a 100644 --- a/arch/x86/include/asm/virtext.h +++ b/arch/x86/include/asm/virtext.h @@ -30,7 +30,7 @@ static inline int cpu_has_vmx(void) } -/** Disable VMX on the current CPU +/* Disable VMX on the current CPU * * vmxoff causes a undefined-opcode exception if vmxon was not run * on the CPU previously. Only call this function if you know VMX @@ -38,7 +38,7 @@ static inline int cpu_has_vmx(void) */ static inline void cpu_vmxoff(void) { - asm volatile ("vmxoff"); + asm volatile ("vmxoff" ::: "cc", "memory"); cr4_clear_bits(X86_CR4_VMXE); } @@ -47,7 +47,7 @@ static inline int cpu_vmx_enabled(void) return __read_cr4() & X86_CR4_VMXE; } -/** Disable VMX if it is enabled on the current CPU +/* Disable VMX if it is enabled on the current CPU * * You shouldn't call this if cpu_has_vmx() returns 0. */ @@ -57,7 +57,7 @@ static inline void __cpu_emergency_vmxoff(void) cpu_vmxoff(); } -/** Disable VMX if it is supported and enabled on the current CPU +/* Disable VMX if it is supported and enabled on the current CPU */ static inline void cpu_emergency_vmxoff(void) { -- 2.26.2
Re: [PATCH v2] Fix undefined operation VMXOFF during reboot and crash
On Monday, June 29, 2020 5:49pm, "Sean Christopherson" said: > On Mon, Jun 29, 2020 at 02:22:45PM -0700, Andy Lutomirski wrote: >> >> >> > On Jun 29, 2020, at 1:54 PM, David P. Reed wrote: >> > >> > Simple question for those on the To: and CC: list here. Should I >> > abandon any hope of this patch being accepted? It's been a long time. >> > >> > The non-response after I acknowledged that this was discovered while >> > working on a personal, non-commercial research project - which is >> > "out-of-tree" (apparently dirty words on LKML) has me thinking my >> > contribution is unwanted. That's fine, I suppose. I can maintain this patch >> > out-of-tree as well. I did incorporate all the helpful suggestions I >> > received in this second patch, and given some encouragement, will happily >> > submit a revised v3 if there is any likelihood of acceptance. I'm wary of >> > doing more radical changes (like combining emergency and normal paths). >> > >> >> Sorry about being slow and less actively encouraging than we should be. We >> absolutely welcome personal contributions. The actual problem is that >> everyone is overworked and we’re all slow. Also, you may be hitting a corner >> case >> in the process: is this a KVM patch or an x86 patch? > > It's an x86 patch as it's not KVM specific, e.g. this code also helps play > nice with out-of-tree hypervisors. > > The code change is mostly good, but it needs to be split up as there are > three separate fixes: > > 1. Handle #UD on VMXOFF due to a race. > 2. Mark memory and flags as clobbered by VMXOFF. > 3. Change emergency_vmx_disable_all() to not manually check > cpu_vmx_enabled(). > > Yes, the changes are tiny, but if for example #3 introduces a bug then we > don't have to revert #1 and #2. Or perhaps older kernels are only subject > to #1 and #2, and thus dumping all three changes into a single patch makes > it all harder to backport. In other words, all the usual "one change per > patch" reasons. > Thanks.
If no one else responds with additional suggestions, I will split it into three patches. I'm happy to learn the nuances of the kernel patch process.
Re: [PATCH v2] Fix undefined operation VMXOFF during reboot and crash
Simple question for those on the To: and CC: list here. Should I abandon any hope of this patch being accepted? It's been a long time. The non-response after I acknowledged that this was discovered while working on a personal, non-commercial research project - which is "out-of-tree" (apparently dirty words on LKML) has me thinking my contribution is unwanted. That's fine, I suppose. I can maintain this patch out-of-tree as well. I did incorporate all the helpful suggestions I received in this second patch, and given some encouragement, will happily submit a revised v3 if there is any likelihood of acceptance. I'm wary of doing more radical changes (like combining emergency and normal paths). On Thursday, June 25, 2020 10:59am, "David P. Reed" said: > Correction to my comment below. > On Thursday, June 25, 2020 10:45am, "David P. Reed" > said: > >> [Sorry: this is resent because my mailer included HTML, rejected by LKML] >> Thanks for the response, Sean ... I had thought everyone was too busy to >> follow >> up >> from the first version. >> >> I confess I'm not sure why this should be broken up into a patch series, >> given >> that it is so very small and is all aimed at the same category of bug. >> >> And the "emergency" path pre-existed, I didn't want to propose removing it, >> since >> I assumed it was there for a reason. I didn't want to include my own >> judgement as >> to whether there should only be one path. (I'm pretty sure I didn't find a >> VMXOFF >> in KVM separately from the instance in this include file, but I will check). > Just checked. Yes, the kvm code's handling of VMXOFF is separate, and though > it > uses exception masking, seems to do other things, perhaps related to nested > KVM, > but I haven't studied the deep logic of KVM nesting. > >> >> A question: if I make it a series, I have to test each patch doesn't break >> something individually, in order to handle the case where one patch is >> accepted >> and the others are not. 
Do I need to test each individual patch thoroughly >> as an >> independent patch against all those cases? >> I know the combination don't break anything and fixes the issues I've >> discovered >> by testing all combinations (and I've done some thorough testing of panics, >> oopses >> crashes, kexec, ... under all combinations of CR4.VMXE enablement and crash >> source >> to verify the fix fixes the problem's manifestations and to verify that it >> doesn't >> break any of the working paths. >> >> That said, I'm willing to do a v3 "series" based on these suggestions if it >> will >> smooth its acceptance. If it's not going to get accepted after doing that, my >> motivation is flagging. >> On Thursday, June 25, 2020 2:06am, "Sean Christopherson" >> said: >> >> >> >>> On Thu, Jun 11, 2020 at 03:48:18PM -0400, David P. Reed wrote: >>> > -/** Disable VMX on the current CPU >>> > +/* Disable VMX on the current CPU >>> > * >>> > - * vmxoff causes a undefined-opcode exception if vmxon was not run >>> > - * on the CPU previously. Only call this function if you know VMX >>> > - * is enabled. >>> > + * vmxoff causes an undefined-opcode exception if vmxon was not run >>> > + * on the CPU previously. Only call this function directly if you know >>> > VMX >>> > + * is enabled *and* CPU is in VMX root operation. >>> > */ >>> > static inline void cpu_vmxoff(void) >>> > { >>> > - asm volatile ("vmxoff"); >>> > + asm volatile ("vmxoff" ::: "cc", "memory"); /* clears all flags on >>> > success >>> */ >>> > cr4_clear_bits(X86_CR4_VMXE); >>> > } >>> > >>> > @@ -47,17 +47,35 @@ static inline int cpu_vmx_enabled(void) >>> > return __read_cr4() & X86_CR4_VMXE; >>> > } >>> > >>> > -/** Disable VMX if it is enabled on the current CPU >>> > - * >>> > - * You shouldn't call this if cpu_has_vmx() returns 0. 
>>> > +/* >>> > + * Safely disable VMX root operation if active >>> > + * Note that if CPU is not in VMX root operation this >>> > + * VMXOFF will fault an undefined operation fault, >>> > + * so use the exception masking facility to handle that RARE >>> > + * case. >
Re: [PATCH v2] Fix undefined operation VMXOFF during reboot and crash
Correction to my comment below. On Thursday, June 25, 2020 10:45am, "David P. Reed" said: > [Sorry: this is resent because my mailer included HTML, rejected by LKML] > Thanks for the response, Sean ... I had thought everyone was too busy to > follow up > from the first version. > > I confess I'm not sure why this should be broken up into a patch series, given > that it is so very small and is all aimed at the same category of bug. > > And the "emergency" path pre-existed, I didn't want to propose removing it, > since > I assumed it was there for a reason. I didn't want to include my own > judgement as > to whether there should only be one path. (I'm pretty sure I didn't find a > VMXOFF > in KVM separately from the instance in this include file, but I will check). Just checked. Yes, the kvm code's handling of VMXOFF is separate, and though it uses exception masking, seems to do other things, perhaps related to nested KVM, but I haven't studied the deep logic of KVM nesting. > > A question: if I make it a series, I have to test each patch doesn't break > something individually, in order to handle the case where one patch is > accepted > and the others are not. Do I need to test each individual patch thoroughly as > an > independent patch against all those cases? > I know the combination don't break anything and fixes the issues I've > discovered > by testing all combinations (and I've done some thorough testing of panics, > oopses > crashes, kexec, ... under all combinations of CR4.VMXE enablement and crash > source > to verify the fix fixes the problem's manifestations and to verify that it > doesn't > break any of the working paths. > > That said, I'm willing to do a v3 "series" based on these suggestions if it > will > smooth its acceptance. If it's not going to get accepted after doing that, my > motivation is flagging. > On Thursday, June 25, 2020 2:06am, "Sean Christopherson" > said: > > > >> On Thu, Jun 11, 2020 at 03:48:18PM -0400, David P. 
Reed wrote: >> > -/** Disable VMX on the current CPU >> > +/* Disable VMX on the current CPU >> > * >> > - * vmxoff causes a undefined-opcode exception if vmxon was not run >> > - * on the CPU previously. Only call this function if you know VMX >> > - * is enabled. >> > + * vmxoff causes an undefined-opcode exception if vmxon was not run >> > + * on the CPU previously. Only call this function directly if you know VMX >> > + * is enabled *and* CPU is in VMX root operation. >> > */ >> > static inline void cpu_vmxoff(void) >> > { >> > - asm volatile ("vmxoff"); >> > + asm volatile ("vmxoff" ::: "cc", "memory"); /* clears all flags on >> > success >> */ >> > cr4_clear_bits(X86_CR4_VMXE); >> > } >> > >> > @@ -47,17 +47,35 @@ static inline int cpu_vmx_enabled(void) >> > return __read_cr4() & X86_CR4_VMXE; >> > } >> > >> > -/** Disable VMX if it is enabled on the current CPU >> > - * >> > - * You shouldn't call this if cpu_has_vmx() returns 0. >> > +/* >> > + * Safely disable VMX root operation if active >> > + * Note that if CPU is not in VMX root operation this >> > + * VMXOFF will fault an undefined operation fault, >> > + * so use the exception masking facility to handle that RARE >> > + * case. >> > + * You shouldn't call this directly if cpu_has_vmx() returns 0 >> > + */ >> > +static inline void cpu_vmxoff_safe(void) >> > +{ >> > + asm volatile("1:vmxoff\n\t" /* clears all flags on success */ >> >> Eh, I wouldn't bother with the comment, there are a million other caveats >> with VMXOFF that are far more interesting. >> >> > + "2:\n\t" >> > + _ASM_EXTABLE(1b, 2b) >> > + ::: "cc", "memory"); >> >> Adding the memory and flags clobber should be a separate patch. >> >> > + cr4_clear_bits(X86_CR4_VMXE); >> > +} >> >> >> I don't see any value in safe/unsafe variants. The only in-kernel user of >> VMXOFF outside of the emergency flows is KVM, which has its own VMXOFF >> helper, i.e. all users of cpu_vmxoff() want the "safe" variant. 
Just add >> the exception fixup to cpu_vmxoff() and call it good. >> >> > + >> > +/* >> > + * Force disable VMX if it is enabled on the current CPU, >> > + * when it is unk
Re: [PATCH v2] Fix undefined operation VMXOFF during reboot and crash
[Sorry: this is resent because my mailer included HTML, rejected by LKML] Thanks for the response, Sean ... I had thought everyone was too busy to follow up from the first version. I confess I'm not sure why this should be broken up into a patch series, given that it is so very small and is all aimed at the same category of bug. And the "emergency" path pre-existed; I didn't want to propose removing it, since I assumed it was there for a reason. I didn't want to include my own judgement as to whether there should only be one path. (I'm pretty sure I didn't find a VMXOFF in KVM separately from the instance in this include file, but I will check.) A question: if I make it a series, I have to test that each patch doesn't break something individually, in order to handle the case where one patch is accepted and the others are not. Do I need to test each individual patch thoroughly as an independent patch against all those cases? I know the combination doesn't break anything and fixes the issues I've discovered, by testing all combinations. (I've done some thorough testing of panics, oopses, crashes, kexec, ... under all combinations of CR4.VMXE enablement and crash source, to verify that the fix fixes the problem's manifestations and that it doesn't break any of the working paths.) That said, I'm willing to do a v3 "series" based on these suggestions if it will smooth its acceptance. If it's not going to get accepted after doing that, my motivation is flagging. On Thursday, June 25, 2020 2:06am, "Sean Christopherson" said: > On Thu, Jun 11, 2020 at 03:48:18PM -0400, David P. Reed wrote: > > -/** Disable VMX on the current CPU > > +/* Disable VMX on the current CPU > > * > > - * vmxoff causes a undefined-opcode exception if vmxon was not run > > - * on the CPU previously. Only call this function if you know VMX > > - * is enabled. > > + * vmxoff causes an undefined-opcode exception if vmxon was not run > > + * on the CPU previously. 
Only call this function directly if you know VMX > > + * is enabled *and* CPU is in VMX root operation. > > */ > > static inline void cpu_vmxoff(void) > > { > > - asm volatile ("vmxoff"); > > + asm volatile ("vmxoff" ::: "cc", "memory"); /* clears all flags on success > */ > > cr4_clear_bits(X86_CR4_VMXE); > > } > > > > @@ -47,17 +47,35 @@ static inline int cpu_vmx_enabled(void) > > return __read_cr4() & X86_CR4_VMXE; > > } > > > > -/** Disable VMX if it is enabled on the current CPU > > - * > > - * You shouldn't call this if cpu_has_vmx() returns 0. > > +/* > > + * Safely disable VMX root operation if active > > + * Note that if CPU is not in VMX root operation this > > + * VMXOFF will fault an undefined operation fault, > > + * so use the exception masking facility to handle that RARE > > + * case. > > + * You shouldn't call this directly if cpu_has_vmx() returns 0 > > + */ > > +static inline void cpu_vmxoff_safe(void) > > +{ > > + asm volatile("1:vmxoff\n\t" /* clears all flags on success */ > > Eh, I wouldn't bother with the comment, there are a million other caveats > with VMXOFF that are far more interesting. > > > + "2:\n\t" > > + _ASM_EXTABLE(1b, 2b) > > + ::: "cc", "memory"); > > Adding the memory and flags clobber should be a separate patch. > > > + cr4_clear_bits(X86_CR4_VMXE); > > +} > > > I don't see any value in safe/unsafe variants. The only in-kernel user of > VMXOFF outside of the emergency flows is KVM, which has its own VMXOFF > helper, i.e. all users of cpu_vmxoff() want the "safe" variant. Just add > the exception fixup to cpu_vmxoff() and call it good. > > > + > > +/* > > + * Force disable VMX if it is enabled on the current CPU, > > + * when it is unknown whether CPU is in VMX operation. > > */ > > static inline void __cpu_emergency_vmxoff(void) > > { > > - if (cpu_vmx_enabled()) > > - cpu_vmxoff(); > > + if (!cpu_vmx_enabled()) > > + return; > > + cpu_vmxoff_safe(); > > Unnecessary churn. 
> > > } > > > > -/** Disable VMX if it is supported and enabled on the current CPU > > +/* Force disable VMX if it is supported on current CPU > > */ > > static inline void cpu_emergency_vmxoff(void) > > { > > diff --git a/arch/x86/kernel/reboot.c b/arch/x86/kernel/reboot.c > > index e040ba6be27b..b0e6b106a67e 100644 > > --- a/arch/x86/kernel/reboot.c > > +++ b/arch/x86/kernel/reboot.c > > @
[PATCH v2] Fix undefined operation VMXOFF during reboot and crash
If a panic/reboot occurs when CR4 has VMX enabled, a VMXOFF is done on all CPUS, to allow the INIT IPI to function, since INIT is suppressed when CPUs are in VMX root operation. The problem is that VMXOFF causes an undefined operation fault when the CPU is not in VMX operation, that is, when VMXON has not been executed yet, or VMXOFF has been executed but VMX is still enabled. This patch makes the reboot work more reliably by masking the exception on VMXOFF in the crash/panic/reboot path, which uses cpu_emergency_vmxoff(). This can happen with KVM due to a race, but that race is rare today. The problem was discovered doing out-of-tree x-visor development that uses VMX in a novel way for kernel performance analysis. The logic in reboot.c is also corrected, since the point of forcing the processor out of VMX root operation is to allow the INIT signal to be unmasked. See the Intel SDM section on differences between VMX root operation and normal operation. Thus every CPU must be forced out of VMX operation. Since the CPU may hang rather than restart if INIT fails, a manual hardware "reset" is the only way out of this state in a lights-out datacenter (well, if there is a BMC, it can issue a hardware RESET to the chip). Style errors in the original file fixed, at the request of Randy Dunlap: eliminate '/**' in non-kernel-doc comments. Fixes: 208067 <https://bugzilla.kernel.org/show_bug.cgi?id=208067> Reported-by: David P. Reed Reported-by: Randy Dunlap Suggested-by: Thomas Gleixner Suggested-by: Sean Christopherson Suggested-by: Andy Lutomirski Signed-off-by: David P. 
Reed --- arch/x86/include/asm/virtext.h | 40 -- arch/x86/kernel/reboot.c | 13 +++ 2 files changed, 32 insertions(+), 21 deletions(-) diff --git a/arch/x86/include/asm/virtext.h b/arch/x86/include/asm/virtext.h index 9aad0e0876fb..ed22c1983da8 100644 --- a/arch/x86/include/asm/virtext.h +++ b/arch/x86/include/asm/virtext.h @@ -30,15 +30,15 @@ static inline int cpu_has_vmx(void) } -/** Disable VMX on the current CPU +/* Disable VMX on the current CPU * - * vmxoff causes a undefined-opcode exception if vmxon was not run - * on the CPU previously. Only call this function if you know VMX - * is enabled. + * vmxoff causes an undefined-opcode exception if vmxon was not run + * on the CPU previously. Only call this function directly if you know VMX + * is enabled *and* CPU is in VMX root operation. */ static inline void cpu_vmxoff(void) { - asm volatile ("vmxoff"); + asm volatile ("vmxoff" ::: "cc", "memory"); /* clears all flags on success */ cr4_clear_bits(X86_CR4_VMXE); } @@ -47,17 +47,35 @@ static inline int cpu_vmx_enabled(void) return __read_cr4() & X86_CR4_VMXE; } -/** Disable VMX if it is enabled on the current CPU - * - * You shouldn't call this if cpu_has_vmx() returns 0. +/* + * Safely disable VMX root operation if active + * Note that if CPU is not in VMX root operation this + * VMXOFF will fault an undefined operation fault, + * so use the exception masking facility to handle that RARE + * case. + * You shouldn't call this directly if cpu_has_vmx() returns 0 + */ +static inline void cpu_vmxoff_safe(void) +{ + asm volatile("1:vmxoff\n\t" /* clears all flags on success */ + "2:\n\t" +_ASM_EXTABLE(1b, 2b) +::: "cc", "memory"); + cr4_clear_bits(X86_CR4_VMXE); +} + +/* + * Force disable VMX if it is enabled on the current CPU, + * when it is unknown whether CPU is in VMX operation. 
*/ static inline void __cpu_emergency_vmxoff(void) { - if (cpu_vmx_enabled()) - cpu_vmxoff(); + if (!cpu_vmx_enabled()) + return; + cpu_vmxoff_safe(); } -/** Disable VMX if it is supported and enabled on the current CPU +/* Force disable VMX if it is supported on current CPU */ static inline void cpu_emergency_vmxoff(void) { diff --git a/arch/x86/kernel/reboot.c b/arch/x86/kernel/reboot.c index e040ba6be27b..b0e6b106a67e 100644 --- a/arch/x86/kernel/reboot.c +++ b/arch/x86/kernel/reboot.c @@ -540,21 +540,14 @@ static void emergency_vmx_disable_all(void) * * For safety, we will avoid running the nmi_shootdown_cpus() * stuff unnecessarily, but we don't have a way to check -* if other CPUs have VMX enabled. So we will call it only if the -* CPU we are running on has VMX enabled. -* -* We will miss cases where VMX is not enabled on all CPUs. This -* shouldn't do much harm because KVM always enable VMX on all -* CPUs anyway. But we can miss it on the small window where KVM -* is still enabling VMX. +* if other CPUs have VMX enabled. */ - if (cpu_has_vmx() && cpu_vmx_enabled()) { + if (cpu_has_vmx()) { /* Disable VMX on this
[PATCH] Fix undefined operation VMXOFF during reboot and crash
If a panic/reboot occurs when CR4 has VMX enabled, a VMXOFF is done on all CPUS, to allow the INIT IPI to function, since INIT is suppressed when CPUs are in VMX root operation. However, VMXOFF causes an undefined operation fault if the CPU is not in VMX operation, that is, VMXON has not been executed, or VMXOFF has been executed, but VMX is enabled. This fix makes the reboot work more reliably by modifying the #UD handler to skip the VMXOFF if VMX is enabled on the CPU and the VMXOFF is executed as part of cpu_emergency_vmxoff(). The logic in reboot.c is also corrected, since the point of forcing the processor out of VMX root operation is because when VMX root operation is enabled, the processor INIT signal is always masked. See Intel SDM section on differences between VMX Root operation and normal operation. Thus every CPU must be forced out of VMX operation. Since the CPU will hang rather than restart, a manual "reset" is the only way out of this state (or if there is a BMC, it can issue a RESET to the chip). Signed-off-by: David P. Reed --- arch/x86/include/asm/virtext.h | 24 arch/x86/kernel/reboot.c | 13 ++--- arch/x86/kernel/traps.c| 52 -- 3 files changed, 71 insertions(+), 18 deletions(-) diff --git a/arch/x86/include/asm/virtext.h b/arch/x86/include/asm/virtext.h index 9aad0e0876fb..ea2d67191684 100644 --- a/arch/x86/include/asm/virtext.h +++ b/arch/x86/include/asm/virtext.h @@ -13,12 +13,16 @@ #ifndef _ASM_X86_VIRTEX_H #define _ASM_X86_VIRTEX_H +#include + #include #include #include #include +DECLARE_PER_CPU_READ_MOSTLY(int, doing_emergency_vmxoff); + /* * VMX functions: */ @@ -33,8 +37,8 @@ static inline int cpu_has_vmx(void) /** Disable VMX on the current CPU * * vmxoff causes a undefined-opcode exception if vmxon was not run - * on the CPU previously. Only call this function if you know VMX - * is enabled. + * on the CPU previously. Only call this function directly if you know VMX + * is enabled *and* CPU is in VMX root operation. 
*/ static inline void cpu_vmxoff(void) { @@ -47,17 +51,25 @@ static inline int cpu_vmx_enabled(void) return __read_cr4() & X86_CR4_VMXE; } -/** Disable VMX if it is enabled on the current CPU +/** Force disable VMX if it is enabled on the current CPU. + * Note that if CPU is not in VMX root operation this + * VMXOFF will fault an undefined operation fault. + * So the 'doing_emergency_vmxoff' percpu flag is set, + * the trap handler for just restarts execution after + * the VMXOFF instruction. * - * You shouldn't call this if cpu_has_vmx() returns 0. + * You shouldn't call this directly if cpu_has_vmx() returns 0. */ static inline void __cpu_emergency_vmxoff(void) { - if (cpu_vmx_enabled()) + if (cpu_vmx_enabled()) { + this_cpu_write(doing_emergency_vmxoff, 1); cpu_vmxoff(); + this_cpu_write(doing_emergency_vmxoff, 0); + } } -/** Disable VMX if it is supported and enabled on the current CPU +/** Force disable VMX if it is supported and enabled on the current CPU */ static inline void cpu_emergency_vmxoff(void) { diff --git a/arch/x86/kernel/reboot.c b/arch/x86/kernel/reboot.c index 3ca43be4f9cf..abc8b51a57c7 100644 --- a/arch/x86/kernel/reboot.c +++ b/arch/x86/kernel/reboot.c @@ -540,21 +540,14 @@ static void emergency_vmx_disable_all(void) * * For safety, we will avoid running the nmi_shootdown_cpus() * stuff unnecessarily, but we don't have a way to check -* if other CPUs have VMX enabled. So we will call it only if the -* CPU we are running on has VMX enabled. -* -* We will miss cases where VMX is not enabled on all CPUs. This -* shouldn't do much harm because KVM always enable VMX on all -* CPUs anyway. But we can miss it on the small window where KVM -* is still enabling VMX. +* if other CPUs have VMX enabled. */ - if (cpu_has_vmx() && cpu_vmx_enabled()) { + if (cpu_has_vmx()) { /* Disable VMX on this CPU. 
*/ - cpu_vmxoff(); + cpu_emergency_vmxoff(); /* Halt and disable VMX on the other CPUs */ nmi_shootdown_cpus(vmxoff_nmi); - } } diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c index 4cc541051994..2dcf57ef467e 100644 --- a/arch/x86/kernel/traps.c +++ b/arch/x86/kernel/traps.c @@ -39,6 +39,7 @@ #include #include #include +#include #include #include @@ -59,6 +60,7 @@ #include #include #include +#include #ifdef CONFIG_X86_64 #include @@ -70,6 +72,8 @@ #include #endif +DEFINE_PER_CPU_READ_MOSTLY(int, doing_emergency_vmxoff) = 0; + DECLARE_BITMAP(system_vectors, NR_VECTORS); static inline void cond_local_irq_enable(struct pt_regs *regs) @@ -115,6 +119,43 @@ int fixup_bu
Re: [linux-kernel] Re: [PATCH] x86: use explicit timing delay for pit accesses in kernel and pcspkr driver
Actually, disparaging things as "one idiotic system" doesn't seem like a long-term thoughtful process - it's not even accurate. There are more such systems that are running code today than the total number of 486 systems ever manufactured. The production rate is $1M/month. a) ENE chips are "documented" to receive port 80, and also it is the case that modern chipsets will happily diagnose writes to non-existent ports as MCE's. Using side effects that depend on non-existent ports just creates a brittle failure mode down the road. And it's not just post ACPI "initialization". The pcspkr use of port 80 caused solid freezes if you typed "tab" to complete a command line and there were more than one choice, leading to beeps. b) sad to say, Linux is not what hardware vendors use as the system that their BIOSes MUST work with. That's Windows, and Windows, whether we like it or not does not require hardware vendors to stay away from port 80. IMHO, calling something "idiotic" is hardly evidence-based decision making. Maybe you love to hate Microsoft, but until Intel writes an architecture standard that says explicitly that a "standard PC" must not use port 80 for any peripheral, the port 80 thing is folklore, and one that is solely Linux-defined. Rene Herman wrote: On 20-02-08 18:05, H. Peter Anvin wrote: Rene Herman wrote: _Something_ like this would seem to be the only remaining option. It seems fairly unuseful to #ifdef around that switch statement for kernels without support for the earlier families, but if you insist... "Only remaining option" other than the one we've had all along. Even on the one idiotic set of systems which break, it only breaks post-ACPI intialization, IIRC. Linus vetoed the DMI switch. Rene. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] x86: use explicit timing delay for pit accesses in kernel and pcspkr driver
x86: use explicit timing delay for pit accesses in kernel and pcspkr driver pit accesses in i8253.c and pcspkr driver use outb_p for timing. Fix them to use explicit timing delay for access to PIT, rather than inb_p/outb_p calls, which use insufficiently explicit delays (defaulting to port 80 writes) that can cause freeze problems on some machines, such as Quanta motherboard machines using ENE EC's. Since the pcspkr driver accesses PIT registers directly, it should also use outb_pit, which is inlined, so does not need to be exported. Explicit timing delay is only needed in pcspkr for accesses to the 8253 PIT. Fix pcspkr driver to use the new outb_pit call properly, use named PIT port values rather than hex constants, and drop its use of inb_p and outb_p in accessing port 61h where it has never been needed. Signed-off-by: David P. Reed <[EMAIL PROTECTED]> Index: linux-2.6/drivers/input/misc/pcspkr.c === --- linux-2.6.orig/drivers/input/misc/pcspkr.c +++ linux-2.6/drivers/input/misc/pcspkr.c @@ -36,6 +36,7 @@ static int pcspkr_event(struct input_dev { unsigned int count = 0; unsigned long flags; + unsigned char port61; if (type != EV_SND) return -1; @@ -51,17 +52,18 @@ static int pcspkr_event(struct input_dev spin_lock_irqsave(&i8253_lock, flags); + port61 = inb(0x61); if (count) { /* enable counter 2 */ - outb_p(inb_p(0x61) | 3, 0x61); + outb(port61 | 3, 0x61); /* set command for counter 2, 2 byte write */ - outb_p(0xB6, 0x43); + outb_pit(0xB6, PIT_MODE); /* select desired HZ */ - outb_p(count & 0xff, 0x42); - outb((count >> 8) & 0xff, 0x42); + outb_pit(count & 0xff, PIT_CH2); + outb((count >> 8) & 0xff, PIT_CH2); } else { /* disable counter 2 */ - outb(inb_p(0x61) & 0xFC, 0x61); + outb(port61 & 0xFC, 0x61); } spin_unlock_irqrestore(&i8253_lock, flags); Index: linux-2.6/include/asm-x86/i8253.h === --- linux-2.6.orig/include/asm-x86/i8253.h +++ linux-2.6/include/asm-x86/i8253.h @@ -12,7 +12,25 @@ extern struct clock_event_device *global extern void setup_pit_timer(void); 
-#define inb_pitinb_p -#define outb_pit outb_p +/* accesses to PIT registers need careful delays on some platforms. Define + them here in a common place */ +static inline unsigned char inb_pit(unsigned int port) +{ + /* delay for some accesses to PIT on motherboard or in chipset must be + at least one microsecond, but be safe here. */ + unsigned char value = inb(port); + udelay(2); + return value; +} + +static inline void outb_pit(unsigned char value, unsigned int port) +{ + /* delay for some accesses to PIT on motherboard or in chipset must be + at least one microsecond, but be safe here. */ + outb(value, port); + udelay(2); +} + + #endif /* __ASM_I8253_H__ */
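As background for the pcspkr changes above: the value written to PIT channel 2 is a 16-bit divisor of the PIT input clock (1193182 Hz on standard PCs), loaded low byte first, then high byte. A small sketch of that arithmetic (the helper names here are illustrative, not from the driver):

```c
/* PIT input clock in Hz; the standard PC value (assumed here). */
#define PIT_TICK_RATE 1193182u

/* Divisor the pcspkr driver loads into channel 2 for a tone of `hz`. */
static unsigned int pit_count_for_hz(unsigned int hz)
{
    return PIT_TICK_RATE / hz;
}

/* The two bytes written to PIT_CH2, low byte first. */
static unsigned char pit_low(unsigned int count)
{
    return count & 0xff;
}

static unsigned char pit_high(unsigned int count)
{
    return (count >> 8) & 0xff;
}
```

For example, a 440 Hz beep loads the divisor 1193182 / 440 = 2711, split across the two successive writes to channel 2 shown in the diff.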
[PATCH] x86: define outb_pic and inb_pic to stop using outb_p and inb_p
x86: define outb_pic and inb_pic to stop using outb_p and inb_p The delay between io port accesses to the PIC is now defined using outb_pic and inb_pic. This fix provides the next step, using udelay(2) to define the *PIC specific* timing requirements, rather than on bus-oriented timing, which is not well calibrated. Again, the primary reason for fixing this is to use proper delay strategy, and in particular to fix crashes that can result from using port 80 writes on machines that have resources on port 80, such as the ENE chips used by Quanta in laptops it designs and sells to, e.g. HP. Signed-off-by: David P. Reed <[EMAIL PROTECTED]> Index: linux-2.6/include/asm-x86/i8259.h === --- linux-2.6.orig/include/asm-x86/i8259.h +++ linux-2.6/include/asm-x86/i8259.h @@ -29,7 +29,23 @@ extern void enable_8259A_irq(unsigned in extern void disable_8259A_irq(unsigned int irq); extern unsigned int startup_8259A_irq(unsigned int irq); -#define inb_picinb_p -#define outb_pic outb_p +/* the PIC may need a careful delay on some platforms, hence specific calls */ +static inline unsigned char inb_pic(unsigned int port) +{ + /* delay for some accesses to PIC on motherboard or in chipset must be + at least one microsecond, but be safe here. */ + unsigned char value = inb(port); + udelay(2); + return value; +} + +static inline void outb_pic(unsigned char value, unsigned int port) +{ + /* delay for some accesses to PIC on motherboard or in chipset must be + at least one microsecond, but be safe here. */ + outb(value, port); + udelay(2); +} + #endif /* __ASM_I8259_H__ */
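The pattern in the patch above is simple: pair every slow-device port access with a bounded, explicit delay instead of relying on a side-effect write to port 0x80. It can be modeled in userspace with stubs standing in for the kernel's inb/outb/udelay (all names here are hypothetical stand-ins, for illustration only):

```c
/* Stubs standing in for real port I/O and udelay. */
static unsigned char fake_port_value = 0x42;
static int delays_issued;

static unsigned char stub_inb(unsigned int port)
{
    (void)port;
    return fake_port_value;
}

static void stub_outb(unsigned char value, unsigned int port)
{
    (void)port;
    fake_port_value = value;
}

static void stub_udelay(unsigned int usec)
{
    (void)usec;
    delays_issued++;            /* count delays so the pattern is visible */
}

/* The pattern from the patch: each PIC access is followed by a fixed,
 * device-specific delay rather than a write to port 0x80. */
static unsigned char inb_pic_model(unsigned int port)
{
    unsigned char value = stub_inb(port);
    stub_udelay(2);
    return value;
}

static void outb_pic_model(unsigned char value, unsigned int port)
{
    stub_outb(value, port);
    stub_udelay(2);
}
```

The design choice is that the delay documents a requirement of the *device* (PIC, PIT, RTC), so it lives with the device-specific accessor rather than in the generic inb_p/outb_p bus primitives.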
Re: [linux-kernel] Re: [patch 1/2] x86: define outb_pic and inb_pic to stop using outb_p and inb_p
Alan Cox wrote: +unsigned char inb_pic(unsigned int port) +{ + /* delay for some accesses to PIC on motherboard or in chipset must be + at least one microsecond, but be safe here. */ + unsigned char value = inb(port); + udelay(2); + return value; +} inline it. Its almost no instructions Will do. Assume you desire inlining of the outb_pic, and also the inb_pit and outb_pit routines. Didn't do it because the code is slightly bigger than the call.
[patch] x86: revised - use explicit timing delay for pit accesses
x86: revised - use explicit timing delay for pit accesses pit accesses in i8253.c and pcspkr driver use outb_p for timing. Fix them to use explicit timing delay for access to PIT, rather than inb_p/outb_p calls, which use insufficiently explicit delays (defaulting to port 80 writes) that can cause freeze problems on some machines, such as Quanta motherboard machines using ENE EC's. Since the pcspkr driver accesses PIT registers directly, it needs the symbol outb_pit exported, so it can be built as a module. Explicit timing delay is only needed in pcspkr for accesses to the 8253 PIT. Fix pcspkr driver to use the new outb_pit call properly, use named PIT port values rather than hex constants, and drop its use of inb_p and outb_p in accessing port 61h where it has never been needed. Signed-off-by: David P. Reed <[EMAIL PROTECTED]> Index: linux-2.6/drivers/input/misc/pcspkr.c === --- linux-2.6.orig/drivers/input/misc/pcspkr.c +++ linux-2.6/drivers/input/misc/pcspkr.c @@ -36,6 +36,7 @@ static int pcspkr_event(struct input_dev { unsigned int count = 0; unsigned long flags; + unsigned char port61; if (type != EV_SND) return -1; @@ -51,17 +52,18 @@ static int pcspkr_event(struct input_dev spin_lock_irqsave(&i8253_lock, flags); + port61 = inb(0x61); if (count) { /* enable counter 2 */ - outb_p(inb_p(0x61) | 3, 0x61); + outb(port61 | 3, 0x61); /* set command for counter 2, 2 byte write */ - outb_p(0xB6, 0x43); + outb_pit(0xB6, PIT_MODE); /* select desired HZ */ - outb_p(count & 0xff, 0x42); - outb((count >> 8) & 0xff, 0x42); + outb_pit(count & 0xff, PIT_CH2); + outb((count >> 8) & 0xff, PIT_CH2); } else { /* disable counter 2 */ - outb(inb_p(0x61) & 0xFC, 0x61); + outb(port61 & 0xFC, 0x61); } spin_unlock_irqrestore(&i8253_lock, flags); Index: linux-2.6/arch/x86/kernel/i8253.c === --- linux-2.6.orig/arch/x86/kernel/i8253.c +++ linux-2.6/arch/x86/kernel/i8253.c @@ -31,6 +31,29 @@ static inline void pit_disable_clocksour struct clock_event_device *global_clock_event; /* + * 
define the PIT specific port access routines, which define the timing + * needed by the PIT registers on some platforms. + */ +unsigned char inb_pit(unsigned int port) +{ + /* delay for some accesses to PIT on motherboard or in chipset must be + at least one microsecond, but be safe here. */ + unsigned char value = inb(port); + udelay(2); + return value; +} +EXPORT_SYMBOL(inb_pit); + +void outb_pit(unsigned char value, unsigned int port) +{ + /* delay for some accesses to PIT on motherboard or in chipset must be + at least one microsecond, but be safe here. */ + outb(value, port); + udelay(2); +} +EXPORT_SYMBOL(outb_pit); + +/* * Initialize the PIT timer. * * This is also called after resume to bring the PIT into operation again. Index: linux-2.6/include/asm-x86/i8253.h === --- linux-2.6.orig/include/asm-x86/i8253.h +++ linux-2.6/include/asm-x86/i8253.h @@ -12,7 +12,9 @@ extern struct clock_event_device *global extern void setup_pit_timer(void); -#define inb_pitinb_p -#define outb_pit outb_p +/* accesses to PIT registers need careful delays on some platforms. Define + them here in a common place */ +extern unsigned char inb_pit(unsigned int port); +extern void outb_pit(unsigned char value, unsigned int port); #endif /* __ASM_I8253_H__ */
Re: [linux-kernel] [patch 2/2] x86: use explicit timing delay for pit accesses
Oops. The patch I just submitted for i8253.c didn't export the symbol needed by the pcspkr driver to build it as a module. I will send the revised patch shortly.
[patch 2/2] x86: use explicit timing delay for pit accesses
pit accesses in i8253.c and pcspkr driver use outb_p for timing. Fix them to use explicit timing delay for access to PIT, rather than inb_p/outb_p calls, which use insufficiently explicit delays (defaulting to port 80 writes) that can cause freeze problems on some machines, such as Quanta motherboard machines using ENE EC's. The explicit timing delay is only needed in pcspkr for accesses to the 8253 PIT. Fix pcspkr driver to use the new outb_pit call properly, use named port values rather than hex constants, and drop its use of inb_p and outb_p in accessing port 61h where it has never been needed. Signed-off-by: David P. Reed <[EMAIL PROTECTED]> Index: linux-2.6/drivers/input/misc/pcspkr.c === --- linux-2.6.orig/drivers/input/misc/pcspkr.c +++ linux-2.6/drivers/input/misc/pcspkr.c @@ -36,6 +36,7 @@ static int pcspkr_event(struct input_dev { unsigned int count = 0; unsigned long flags; + unsigned char port61; if (type != EV_SND) return -1; @@ -51,17 +52,18 @@ static int pcspkr_event(struct input_dev spin_lock_irqsave(&i8253_lock, flags); + port61 = inb(0x61); if (count) { /* enable counter 2 */ - outb_p(inb_p(0x61) | 3, 0x61); + outb(port61 | 3, 0x61); /* set command for counter 2, 2 byte write */ - outb_p(0xB6, 0x43); + outb_pit(0xB6, PIT_MODE); /* select desired HZ */ - outb_p(count & 0xff, 0x42); - outb((count >> 8) & 0xff, 0x42); + outb_pit(count & 0xff, PIT_CH2); + outb((count >> 8) & 0xff, PIT_CH2); } else { /* disable counter 2 */ - outb(inb_p(0x61) & 0xFC, 0x61); + outb(port61 & 0xFC, 0x61); } spin_unlock_irqrestore(&i8253_lock, flags); Index: linux-2.6/arch/x86/kernel/i8253.c === --- linux-2.6.orig/arch/x86/kernel/i8253.c +++ linux-2.6/arch/x86/kernel/i8253.c @@ -31,6 +31,27 @@ static inline void pit_disable_clocksour struct clock_event_device *global_clock_event; /* + * define the PIT specific port access routines, which define the timing + * needed by the PIT registers on some platforms. 
+ */ +unsigned char inb_pit(unsigned int port) +{ + /* delay for some accesses to PIT on motherboard or in chipset must be + at least one microsecond, but be safe here. */ + unsigned char value = inb(port); + udelay(2); + return value; +} + +void outb_pit(unsigned char value, unsigned int port) +{ + /* delay for some accesses to PIT on motherboard or in chipset must be + at least one microsecond, but be safe here. */ + outb(value, port); + udelay(2); +} + +/* * Initialize the PIT timer. * * This is also called after resume to bring the PIT into operation again. Index: linux-2.6/include/asm-x86/i8253.h === --- linux-2.6.orig/include/asm-x86/i8253.h +++ linux-2.6/include/asm-x86/i8253.h @@ -12,7 +12,9 @@ extern struct clock_event_device *global extern void setup_pit_timer(void); -#define inb_pitinb_p -#define outb_pit outb_p +/* accesses to PIT registers need careful delays on some platforms. Define + them here in a common place */ +extern unsigned char inb_pit(unsigned int port); +extern void outb_pit(unsigned char value, unsigned int port); #endif /* __ASM_I8253_H__ */
[patch 1/2] x86: define outb_pic and inb_pic to stop using outb_p and inb_p
The delay between io port accesses to the PIC is now defined using outb_pic and inb_pic. This fix provides the next step, using udelay(2) to define the *PIC specific* timing requirements, rather than on bus-oriented timing, which is not well calibrated. Again, the primary reason for fixing this is to use proper delay strategy, and in particular to fix crashes that can result from using port 80 writes on machines that have resources on port 80, such as the ENE chips used by Quanta in laptops it designs and sells to, e.g. HP. Signed-off-by: David P. Reed <[EMAIL PROTECTED]> Index: linux-2.6/arch/x86/kernel/i8259_32.c === --- linux-2.6.orig/arch/x86/kernel/i8259_32.c +++ linux-2.6/arch/x86/kernel/i8259_32.c @@ -277,6 +277,23 @@ static int __init i8259A_init_sysfs(void device_initcall(i8259A_init_sysfs); +unsigned char inb_pic(unsigned int port) +{ + /* delay for some accesses to PIC on motherboard or in chipset must be + at least one microsecond, but be safe here. */ + unsigned char value = inb(port); + udelay(2); + return value; +} + +void outb_pic(unsigned char value, unsigned int port) +{ + /* delay for some accesses to PIC on motherboard or in chipset must be + at least one microsecond, but be safe here. */ + outb(value, port); + udelay(2); +} + void init_8259A(int auto_eoi) { unsigned long flags; Index: linux-2.6/arch/x86/kernel/i8259_64.c === --- linux-2.6.orig/arch/x86/kernel/i8259_64.c +++ linux-2.6/arch/x86/kernel/i8259_64.c @@ -347,6 +347,23 @@ static int __init i8259A_init_sysfs(void device_initcall(i8259A_init_sysfs); +unsigned char inb_pic(unsigned int port) +{ + /* delay for some accesses to PIC on motherboard or in chipset must be + at least one microsecond, but be safe here. */ + unsigned char value = inb(port); + udelay(2); + return value; +} + +void outb_pic(unsigned char value, unsigned int port) +{ + /* delay for some accesses to PIC on motherboard or in chipset must be + at least one microsecond, but be safe here. 
*/ + outb(value, port); + udelay(2); +} + void init_8259A(int auto_eoi) { unsigned long flags; Index: linux-2.6/include/asm-x86/i8259.h === --- linux-2.6.orig/include/asm-x86/i8259.h +++ linux-2.6/include/asm-x86/i8259.h @@ -28,8 +28,8 @@ extern void init_8259A(int auto_eoi); extern void enable_8259A_irq(unsigned int irq); extern void disable_8259A_irq(unsigned int irq); extern unsigned int startup_8259A_irq(unsigned int irq); - -#define inb_picinb_p -#define outb_pic outb_p +/* the PIC may need a careful delay on some platforms, hence specific calls */ +extern unsigned char inb_pic(unsigned int port); +extern void outb_pic(unsigned char value, unsigned int port); #endif /* __ASM_I8259_H__ */ -- -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[patch 0/2] replacement submission for motherboard/chipset iodelay fixes
Here are the two revised patches, based on Alan Cox's NAKs and suggestions regarding using the _pic and _pit versions of inb/outb. The new patches use udelay(2) as a conservative delay for the PIC and PIT, and isolate that usage in the respective i8253.c and i8259_*.c files. Together with the already-acked patch for the CMOS RTC (not included here), these should solve the problem on modern machines, the ones that don't use older devices but only motherboard/chipset resources.
Re: [linux-kernel] Re: [PATCH 1/3] x86: fix init_8259A() to not use outb_pic
Rene Herman wrote:

On 17-02-08 23:25, Alan Cox wrote:

On Sun, 17 Feb 2008 16:56:28 -0500 (EST) "David P. Reed" <[EMAIL PROTECTED]> wrote: fix init_8259A(), which initializes the 8259 PIC, to not use outb_pic, which is a renamed version of outb_p, and delete the outb_pic define.

NAK. The entire point of inb_pic/outb_pic is to isolate the various methods and keep the logic for delays in one place. Undoing this just creates a nasty mess. Quite probably inb_pic/outb_pic will end up as static inlines that do inb or outb with a udelay of 1 or 2, but that is where the knowledge belongs.

Additional NAK insofar as the PIC delays were reported to be necessary with some VIA chipsets earlier in these threads.

Rene.

This not being a place where performance matters, I will submit a new patch that changes inb_pic and outb_pic to use udelay(2). However, note that init_8259A does not use these consistently in its own accesses to the PIC registers. Should I change it to use the _pic calls wherever it touches the PIC registers, to be conservative? Note that there is a udelay(100) after the registers are all set up; perhaps this is the real VIA requirement...
[PATCH 3/3] x86: fix pcspkr to not use inb_p/outb_p calls.
Fix the pcspkr driver to use an explicit timing delay for access to the PIT, rather than inb_p/outb_p calls, which use insufficiently explicit delays (defaulting to port 80 writes) that can cause freeze problems on some machines, such as Quanta motherboard machines using ENE ECs. The explicit timing delay is only needed for accesses to the 8253 PIT. The standard requirement for the 8253 to respond to successive writes is 1 microsecond. The 8253 has never been on the expansion bus, so a proper delay has nothing to do with expansion bus timing, but rather with its internal logic's capability to react to input. Since udelay is correctly calibrated by the time the pcspkr driver is initialized, we use 1 microsecond as the timing. Also shorten lines to less than 80 characters.

Signed-off-by: David P. Reed <[EMAIL PROTECTED]>

Index: linux-2.6/drivers/input/misc/pcspkr.c
===
--- linux-2.6.orig/drivers/input/misc/pcspkr.c
+++ linux-2.6/drivers/input/misc/pcspkr.c
@@ -32,9 +32,11 @@ MODULE_ALIAS("platform:pcspkr");
 static DEFINE_SPINLOCK(i8253_lock);
 #endif

-static int pcspkr_event(struct input_dev *dev, unsigned int type, unsigned int code, int value)
+static int pcspkr_event(struct input_dev *dev, unsigned int type,
+			unsigned int code, int value)
 {
 	unsigned int count = 0;
+	unsigned char mask;
 	unsigned long flags;

 	if (type != EV_SND)
@@ -51,17 +53,21 @@ static int pcspkr_event(struct input_dev

 	spin_lock_irqsave(&i8253_lock, flags);

+	mask = inb(0x61);
 	if (count) {
 		/* enable counter 2 */
-		outb_p(inb_p(0x61) | 3, 0x61);
+		outb(mask | 3, 0x61);
+		/* some 8253's may require 1 usec. between accesses */
 		/* set command for counter 2, 2 byte write */
-		outb_p(0xB6, 0x43);
+		outb(0xB6, 0x43);
+		udelay(1);
 		/* select desired HZ */
-		outb_p(count & 0xff, 0x42);
+		outb(count & 0xff, 0x42);
+		udelay(1);
 		outb((count >> 8) & 0xff, 0x42);
 	} else {
 		/* disable counter 2 */
-		outb(inb_p(0x61) & 0xFC, 0x61);
+		outb(mask & 0xFC, 0x61);
 	}

 	spin_unlock_irqrestore(&i8253_lock, flags);
[PATCH 1/3] x86: fix init_8259A() to not use outb_pic
fix init_8259A(), which initializes the 8259 PIC, to not use outb_pic, which is a renamed version of outb_p, and delete the outb_pic define. There is already code in the .c files that accesses the CMD & IMR registers in successive outb() calls without _p. Thus the outb_p is obviously not needed, if it ever was. Research into chipset documentation and old BIOS listings shows that IODELAY was not used even in early machines. Thus the delay between I/O port writes was deleted for the 8259. Again, the primary reason for fixing this is to use a proper delay strategy, and in particular to fix crashes that can result from port 80 writes on machines that have resources on port 80, such as the ENE chips used by Quanta in laptops it designs and sells to, e.g., HP.

Signed-off-by: David P. Reed <[EMAIL PROTECTED]>

Index: linux-2.6/arch/x86/kernel/i8259_32.c
===
--- linux-2.6.orig/arch/x86/kernel/i8259_32.c
+++ linux-2.6/arch/x86/kernel/i8259_32.c
@@ -285,24 +285,30 @@ void init_8259A(int auto_eoi)

 	spin_lock_irqsave(&i8259A_lock, flags);

-	outb(0xff, PIC_MASTER_IMR);	/* mask all of 8259A-1 */
-	outb(0xff, PIC_SLAVE_IMR);	/* mask all of 8259A-2 */
-
-	/*
-	 * outb_pic - this has to work on a wide range of PC hardware.
-	 */
-	outb_pic(0x11, PIC_MASTER_CMD);	/* ICW1: select 8259A-1 init */
-	outb_pic(0x20 + 0, PIC_MASTER_IMR);	/* ICW2: 8259A-1 IR0-7 mapped to 0x20-0x27 */
-	outb_pic(1U << PIC_CASCADE_IR, PIC_MASTER_IMR);	/* 8259A-1 (the master) has a slave on IR2 */
+	/* mask all of 8259A-1 */
+	outb(0xff, PIC_MASTER_IMR);
+	/* mask all of 8259A-2 */
+	outb(0xff, PIC_SLAVE_IMR);
+
+	/* ICW1: select 8259A-1 init */
+	outb(0x11, PIC_MASTER_CMD);
+	/* ICW2: 8259A-1 IR0-7 mapped to 0x20-0x27 */
+	outb(0x20 + 0, PIC_MASTER_IMR);
+	/* 8259A-1 (the master) has a slave on IR2 */
+	outb(1U << PIC_CASCADE_IR, PIC_MASTER_IMR);

 	if (auto_eoi)	/* master does Auto EOI */
-		outb_pic(MASTER_ICW4_DEFAULT | PIC_ICW4_AEOI, PIC_MASTER_IMR);
+		outb(MASTER_ICW4_DEFAULT | PIC_ICW4_AEOI, PIC_MASTER_IMR);
 	else		/* master expects normal EOI */
-		outb_pic(MASTER_ICW4_DEFAULT, PIC_MASTER_IMR);
+		outb(MASTER_ICW4_DEFAULT, PIC_MASTER_IMR);

-	outb_pic(0x11, PIC_SLAVE_CMD);	/* ICW1: select 8259A-2 init */
-	outb_pic(0x20 + 8, PIC_SLAVE_IMR);	/* ICW2: 8259A-2 IR0-7 mapped to 0x28-0x2f */
-	outb_pic(PIC_CASCADE_IR, PIC_SLAVE_IMR);	/* 8259A-2 is a slave on master's IR2 */
-	outb_pic(SLAVE_ICW4_DEFAULT, PIC_SLAVE_IMR);	/* (slave's support for AEOI in flat mode is to be investigated) */
+	/* ICW1: select 8259A-2 init */
+	outb(0x11, PIC_SLAVE_CMD);
+	/* ICW2: 8259A-2 IR0-7 mapped to 0x28-0x2f */
+	outb(0x20 + 8, PIC_SLAVE_IMR);
+	/* 8259A-2 is a slave on master's IR2 */
+	outb(PIC_CASCADE_IR, PIC_SLAVE_IMR);
+	/* (slave's support for AEOI in flat mode is to be investigated) */
+	outb(SLAVE_ICW4_DEFAULT, PIC_SLAVE_IMR);

 	if (auto_eoi)
 		/*
 		 * In AEOI mode we just have to mask the interrupt

Index: linux-2.6/arch/x86/kernel/i8259_64.c
===
--- linux-2.6.orig/arch/x86/kernel/i8259_64.c
+++ linux-2.6/arch/x86/kernel/i8259_64.c
@@ -355,29 +355,30 @@ void init_8259A(int auto_eoi)

 	spin_lock_irqsave(&i8259A_lock, flags);

-	outb(0xff, PIC_MASTER_IMR);	/* mask all of 8259A-1 */
-	outb(0xff, PIC_SLAVE_IMR);	/* mask all of 8259A-2 */
+	/* mask all of 8259A-1 */
+	outb(0xff, PIC_MASTER_IMR);
+	/* mask all of 8259A-2 */
+	outb(0xff, PIC_SLAVE_IMR);

-	/*
-	 * outb_pic - this has to work on a wide range of PC hardware.
-	 */
-	outb_pic(0x11, PIC_MASTER_CMD);	/* ICW1: select 8259A-1 init */
+	/* ICW1: select 8259A-1 init */
+	outb(0x11, PIC_MASTER_CMD);
 	/* ICW2: 8259A-1 IR0-7 mapped to 0x30-0x37 */
-	outb_pic(IRQ0_VECTOR, PIC_MASTER_IMR);
+	outb(IRQ0_VECTOR, PIC_MASTER_IMR);
 	/* 8259A-1 (the master) has a slave on IR2 */
-	outb_pic(0x04, PIC_MASTER_IMR);
+	outb(0x04, PIC_MASTER_IMR);

 	if (auto_eoi)	/* master does Auto EOI */
-		outb_pic(MASTER_ICW4_DEFAULT | PIC_ICW4_AEOI, PIC_MASTER_IMR);
+		outb(MASTER_ICW4_DEFAULT | PIC_ICW4_AEOI, PIC_MASTER_IMR);
 	else		/* master expects normal EOI */
-		outb_pic(MASTER_ICW4_DEFAULT, PIC_MASTER_IMR);
+		outb(MASTER_ICW4_DEFAULT, PIC_MASTER_IMR);

-	outb_pic(0x11, PIC_SLAVE_CMD);	/* ICW1: select 8259A-2 init */
+	/* ICW1: select 8259A-2 init */
+	outb(0x11, PIC_SLAVE_CMD);
 	/* ICW2: 825
[PATCH 2/3] x86: fix cmos read and write to not use inb_p and outb_p
fix the code that accesses CMOS RTC registers so that it does not use the inb_p and outb_p routines, which are deprecated. Extensive research on all known CMOS RTC chipset timing shows that there is no need for a delay in accessing the registers of these chips, even on old machines. These chips are never on an expansion bus, but have always been "motherboard" resources, either in the processor chipset or explicitly on the motherboard; they are not part of the ISA/LPC or PCI buses, so delays should not be based on bus timing. The reasons to fix it: 1) port 80 writes often hang some laptops that use ENE EC chipsets, especially those designed and manufactured by Quanta for HP; 2) RTC accesses are timing sensitive, and extra microseconds may matter; 3) the new "io_delay" function is calibrated by expansion bus timing needs, and thus is not appropriate for access to CMOS RTC registers.

Signed-off-by: David P. Reed <[EMAIL PROTECTED]>

Index: linux-2.6/arch/x86/kernel/rtc.c
===
--- linux-2.6.orig/arch/x86/kernel/rtc.c
+++ linux-2.6/arch/x86/kernel/rtc.c
@@ -151,8 +151,8 @@ unsigned char rtc_cmos_read(unsigned cha
 	unsigned char val;

 	lock_cmos_prefix(addr);
-	outb_p(addr, RTC_PORT(0));
-	val = inb_p(RTC_PORT(1));
+	outb(addr, RTC_PORT(0));
+	val = inb(RTC_PORT(1));
 	lock_cmos_suffix(addr);
 	return val;
 }
@@ -161,8 +161,8 @@ EXPORT_SYMBOL(rtc_cmos_read);
 void rtc_cmos_write(unsigned char val, unsigned char addr)
 {
 	lock_cmos_prefix(addr);
-	outb_p(addr, RTC_PORT(0));
-	outb_p(val, RTC_PORT(1));
+	outb(addr, RTC_PORT(0));
+	outb(val, RTC_PORT(1));
 	lock_cmos_suffix(addr);
 }
 EXPORT_SYMBOL(rtc_cmos_write);
[PATCH 0/3] x86: cleanup primary motherboard chip port access delays
cleanup of motherboard chip I/O port delays. inb_p and outb_p have traditionally used a write to port 80 (a non-existent port) as a delay. Though there is an argument that that is a good delay for devices on the ISA or PCI expansion buses, it is not a good mechanism for devices in the processor chipset or on the "motherboard". The write to port 80 at best causes an abort on the ISA or LPC bus, and on some machines (like many of the HP laptops manufactured by Quanta) actually writes data to real I/O devices. For example, the ENE Embedded Controller chip family defaults to providing a register at port 80 that can be written, and which can cause an interrupt in the Embedded Controller. This has been shown to cause hangs on some machines, especially when accessing the CMOS RTC during bootup.

This patch series addresses three of the places where these are used in common kernel code - in particular the three uses that affect the HP laptops mentioned above - modifying the delays to match the worst known delay issues for the specific chips. The patch set is complementary to the io_delay= kernel parameter added in 2.6.25, since it means fewer users will need to add that parameter to run Linux "out of the box" without hanging.
Re: [linux-kernel] Re: [PATCH] x86: provide a DMI based port 0x80 I/O delay override.
David Woodhouse wrote:

On Fri, 2008-01-11 at 09:35 -0500, David P. Reed wrote: Using any "unused port" for a delay means that the machine check feature is wasted and utterly unusable.

Not entirely unusable. You can recover silently from the machine check if it was one of the known accesses to the 'unused port'. It certainly achieves a delay :)

I'm sure that's what the driver writers had in mind. ;-) And I think we probably have a great shot at getting Intel, Microsoft, HP, et al. to add a feature for Linux to one of the ACPI table specifications that define an "unused port for delay purposes" field in the ACPI 4.0 spec, and retrofit it into PC/104 machine BIOSes. At least Microsoft doesn't have a patent on using port 80 for delay purposes. :-)
Re: [linux-kernel] Re: [PATCH] x86: provide a DMI based port 0x80 I/O delay override.
Alan Cox wrote:

bus abort on the LPC bus". More problematic is that I would think some people might want to turn on the AMD feature that generates machine checks if a bus timeout happens. The whole point of machine checks is

An ISA/LPC bus timeout is fulfilled by the bridge so doesn't cause an MCE.

Good possibility, but the documentation on HyperTransport suggests otherwise, even for LPC bridges in this particular modern world of AMD64. I might do the experiment someday to see if my LPC bridge is implemented in a way that does or doesn't support enabling MCEs. Could be different between Intel and AMD - I haven't had reason to pore over the Intel chipset specs, since my poking into all this stuff has been driven by my personal machine's issues, and it's not got any Intel compatible parts.
Re: [linux-kernel] Re: [PATCH] x86: provide a DMI based port 0x80 I/O delay override.
Rene Herman wrote:

On 11-01-08 02:36, Zachary Amsden wrote: FWIW, I fixed the problem locally by recompiling, changing port 80 to port 84 in io.h; works great, and doesn't conflict with any occupied ports.

Might not give you a "proper" delay though. 0xed should be a better choice...

I don't think there is any magic here. I modified the patch to do *no delay at all* in the io_delay "quirk" and have been running reliably for weeks, including the very heavy I/O load that comes from using software RAID on this nice laptop that has two separate SATA drives! This particular laptop has no problematic devices - the only problem is actually in the CMOS_READ and CMOS_WRITE macros that *use* the _p operations in a way that is unnecessary on this machine. (In fact, it would be hard to add a problematic device - there's no PCMCIA slot either, so every option is USB or FireWire.)

Using 0xED happens to work, but it's not guaranteed to work either. There is no "standard" for an "unused port that is mapped to cause a bus abort on the LPC bus". More problematic is that I would think some people might want to turn on the AMD feature that generates machine checks if a bus timeout happens. The whole point of machine checks is to allow the machine to be more reliable. Using any "unused port" for a delay means that the machine check feature is wasted and utterly unusable.
Re: [linux-kernel] Re: [PATCH] x86: provide a DMI based port 0x80 I/O delay override.
Rene Herman wrote:

On 10-01-08 01:37, Robert Hancock wrote: I agree. In this case the BIOS on these laptops is trying to tell us "port 80 is used for our purposes, do not touch it". We should be listening.

Listening is fine but what are you going to do after you have listened? Right, not use port 0x80, but since that's the current idea anyway outside of legacy drivers, you don't actually need to listen. If the quirk-to-0xed or similar was to stay, it's a much better switching point than DMI strings, but if not, it's not actually important.

Well, I was just suggesting a warning that would come up when a driver that still used port 80 was initialized... I think that is what Alan Cox and others suggest for legacy drivers that have worked - I agree that it may not be the right thing to screw them up, especially since my laptop, and probably most machines that might start using port 80 or other "legacy ports", won't ever need those drivers.

I thought more about a complete solution last night. A clean idea that really fits the Linux design might be the following outline of a patch. I suspect it might seem far less ugly, and probably would meet Alan Cox's needs, too - I am very sympathetic about not breaking 8390's, etc.

- Define a "motherboard resources" driver that claims all the resources defined for PNP0C02 devices during the pnp process. I think Windoze actually does something quite similar. This would claim port 80.
- Define an iodelay driver. This driver exists largely to claim port 80 for the iodelay operation (you could even define an option for other ports). Legacy drivers would be modified to require iodelay. The iodelay driver would set up the iodelay mechanism to do something other than the "boot time" default - which could be no delay, or udelay. It would also set a flag that says "_p operations are safe".
- Put a WARN_ONCE() in the in/out*_p operations that checks the flag that is set by the iodelay driver. This would only trigger in the case that 80 or whatever was reserved by some other device driver - such as the motherboard resources driver above. Modern machines won't do that.
- Finally, anything that happens before the motherboard resources and iodelay drivers are initialized cannot use in*_p or out*_p (whether they can be loadable modules rather than built in is a question). This is a very small set, and I believe, with the exception of the PIT (8253/4), very safe.

Note that this idea is also compatible with rewriting all drivers to use "iodelay()" explicitly instead of _p(). But it doesn't require that.

Rene.
Re: [linux-kernel] Re: [PATCH] x86: provide a DMI based port 0x80 I/O delay override.
Zachary Amsden wrote:

According to Phoenix Technologies' book "System BIOS for IBM PCs, Compatibles and EISA Computers, 2nd Edition", the I/O port list gives: port 0080h R/W Extra page register (temporary storage). Despite looking, I've never seen it documented anywhere else, but I believe it works on just about every PC platform.

Except, apparently, my laptop. The port 80 problem was discovered by me, after months of "bisecting" the running code around a problem with hanging when using hwclock in 64-bit mode when ACPI is on. So it kills my laptop, too, and many current laptop motherboards designed by Quanta for HP and Compaq (dv6000, dv9000, tx1000, apparently).

In the last couple of weeks, I was able with luck to discover that the problem is the ENE KB3920 chip, which is the "big brother" of the KB3700 chip included in the OLPC XO "$100 laptop", made also by Quanta. I verified this by taking my laptop apart - a fun and risky experience. Didn't break any connectors, but I don't recommend it for those who are not experienced disassembling laptops and cellphones, etc.

The KB3920 contains an EC, an SMBus, a KBC, some watchdog timers, and a variety of other functions that keep the laptop going, coordinating the relationships among various peripherals. The firmware is part standard from ENE, part OEM-specific, in this case coded by Quanta or a BIOS subcontractor. You can read the spec sheet for the KB3700 online at laptop.org, since the specs of the laptop are "open". The 3920's spec is confidential, and the firmware is confidential as well, for both the 3700 and the 3920. Clues as to what it does can be gleaned by reading the disassembler output of the DSDT code in the particular laptops - though the SMM BIOS probably also talks to it.

Modern machines have many subsystems, and the ACPI and SMBIOS coordinate to run them; blade servers also have drawer controllers and backplane management buses. The part that runs Linux is only part of the machine. Your laptop isn't an aberration. It's part of the new generation of evolved machines that take advantage of the capabilities of the ACPI, SMBIOS, and DMI standards that are becoming core parts of the market.
Re: [linux-kernel] Re: [PATCH] x86: provide a DMI based port 0x80 I/O delay override.
Christer Weinigel wrote: Did I miss anything?

Nothing that seems *crucial* going forward for Linux. The fate of "legacy machines" is really important to get right. I have a small suggestion in mind that might be helpful in the future: the "motherboard resources" discovered as PNP0C02 devices in their _CRS settings during ACPI PnP startup should be reserved (or checked), and any drivers that still use port 80 implicitly should reserve that port.

This may be too late in the boot process to make a decision not to use port 80, and it doesn't help decide a strategy for using an alternate port (0xED happens to "work" on the dv9000 machines, in the sense that it generates a bus timeout on LPC, but there is no guarantee that 0xED is free on any particular motherboard, and "unusedness" is not declared in any BIOS/ACPI tables), or for using a udelay-based iodelay (but there is nothing in the BIOS tables that suggests the right delays for various I/O ports, if any modern parts need them... which I question, but can't prove a negative in general).

However, doing the reservations on such resources could generate a warning that would help diagnose new current and future designs, including devices like the ENE KB3920 that have a port defaulted to port 80 and routed to the EC for functions that the firmware and ACPI can agree to do. Or any other ports used in new ways and properly notified to the OS via the now-standard Wintel BIOS functions.

I don't know if /proc/ioports is being maintained, but the fact that it doesn't contain all of those PNP0C02 resources known on my machine seems to be a bug - which I am happy to code a patch or two for as a contribution back to Linux, if it isn't on the way out as the /sys hierarchy does a better job. /sys/bus/pnp/... does get built properly and has port 80 described properly - not as a DMA port, but as a port in use by device 05:00, which is the motherboard resource catchall. Thus the patch would be small.
Re: [linux-kernel] Re: [PATCH] x86: provide a DMI based port 0x80 I/O delay override.
Christer Weinigel wrote: Argument by personal authority. That's good.

There is no other kind of argument. Are you claiming supernatural authority drives your typing fingers, or is your argument based on what you think you know? I have piles of code that I wrote, spec sheets (now that I'm back in my home office), code that others wrote at the time, and documentation from vendors, all of which come from my personal experiences. That doesn't mean I'm always right - I'm always happy to learn something new. Just don't condescend to a 55-year-old who has been writing operating systems, compilers, and designing hardware for almost 40 years professionally (yes, I got my first job at 16, writing FORTRAN code to simulate hydrodynamic systems).

I guess that's why you don't seem to understand the difference between reading the serial port status register and not being allowed to access a register at all due to such things as the 4-cycle delay you quoted yourself from the 8390 data sheet,

If you read what I said carefully, I said that the 8390 was a very special case. The "chip select" problem it experienced was pretty much unique among boards of the time. Those of us who looked at its design and had any experience designing hardware for buses like the Unibus, or even the buses on PDP-8's and DG machines, thought it had to be a joke. Of course it saved money per board, so it beat the 3Com boards on price - and you could program it after a fashion. So it involved "cheaping out".

The normal timing problem was that an out or in operation to a board or chip required some time to elapse before the chip performed the side effects internally, so that the next operation to it would have an effect. This is exactly why most chips and boards are designed to have a pollable flag that indicates operation completion. The serial "buffer empty" flag is the simplest possible explanatory example of such handshaking that came to mind (writing a character to a serial output device twice often leads to surprises, unless you wait for the previous character to clock out). See my comment on the RTC below for an example that is more complex to explain.

and similar issues with the i8253 that I quoted from its data sheet a few posts ago.

The 8253 was a motherboard chip. I am not sure it had any timing problems with its electrical signalling; I just don't remember. The spec sheet doesn't say its internal state can get scrambled. I was thinking of another timer, the RTC, which is usually part of the Super I/O. The RTC has very well documented timing requirements. But none of the spec sheets, nor my experience with it, mention electrical issues that prevented back-to-back port operations. The documented timing requirements have to do with the state during the time it ticks over internally, once per second. But it is carefully designed to have a flag that is "on" during the 244 microseconds prior to, and covering, the time it is unsafe to read the registers. That design is special because it is designed to operate when the machine is powered off, so it has two internal clock domains, one of which is used in "low power" mode and is very slow, to minimize power.
Re: [linux-kernel] Re: [PATCH] x86: provide a DMI based port 0x80 I/O delay override.
Christer Weinigel wrote: There is no need to use io writes to supposedly/theoretically "unused ports" to make drivers work on any bus. ISA included! You can, for example, wait for an ISA bus serial adapter to put out its next character by looping reading the port that has the output-buffer-full flag in a tight loop, with no delay code at all. And if you need to time things, just call a timing loop subroutine that you calibrate at boot time.

Now you're totally confusing things. You're talking about looking at bits in a register to see if a transmit register is empty. That's easy. The delays needed for the Intel M8259 and M8253 say that you're not even allowed to access the registers _at_ _all_ for some time after a register access. If you do a write to a register immediately followed by any access, including a read of the status register, you can corrupt the state of the chip.

Not true. Even on the original IBM 5150 PC, the 8259 on the motherboard accepted back-to-back OUT and IN instructions, and it would NOT trash the chip state. You can read the original IBM BIOS code if you like. I don't remember about the 8253's timing, but I doubt the chip's state would be corrupted in any way. The data and address lines were the same data and address lines that the microprocessor used to access memory - it didn't "hold" the lines stable any longer than the OUT instruction.

And the Intel chips are not the only ones with that kind of brain damage. But what makes the 8259 and 8253 a big problem is that every modern PC has a descendant of those chips in them. Register compatible.

Not the same chips, or even the same masks or timing requirements.

The discrete Intel chips or clones got aggregated into Super I/O chips, and the Super I/O chips were put on an LPC bus (an ISA bus by another name) or integrated into the southbridge.

Don't try to teach your grandmother to suck eggs: I've been programming PC compatibles since probably before you were able to do long division - including writing code on the first prototype IBM PCs, the first pre-manufacturing PC-ATs, and zillions of clones. (And I was also involved in designing hardware, including the so-called "Lotus Intel" expanded memory cards and the original PC cards.) The 8259 PIC is an *interrupt controller*. It was NEVER present in a Super I/O chip or an LPC chip. Its functionality was absorbed into the chipsets that control interrupt mapping, like the PIIX and the nForce.

And the "if it ain't broken, don't fix it" mantra probably means that some modern chipsets are still using exactly the same internal design as the 25-year-old chips, and will still be subject to some of those ancient limitations.

Oh, come on. Give the VLSI designers some credit for brains. The CAD tools used to design the 8259 and 8253 were so primitive you couldn't even get a chip manufactured with designs from that era today. People design chips today with simulators, and with testers that run test suites that were not available at the time.
Re: [linux-kernel] Re: [PATCH] x86: provide a DMI based port 0x80 I/O delay override.
Alan - I dug up a DP83901A SNIC datasheet in a quick Google search; while that wasn't the only such chip, it was one of them. I can forward the PDF (the www.alldatasheet.com site dynamically creates the download URL), if anyone wants it. The relevant passage says, in regard to delaying between checking the CRDA addresses to see if a dummy "remote read" has been executed, and perhaps in regard to other card I/O register loops:

TIME BETWEEN CHIP SELECTS
The SNIC requires that successive chip selects be no closer than 4 bus clocks (BSCK) together. If the condition is violated, the SNIC may glitch ACK. CPUs that operate from pipelined instructions (i.e. 386) or have a cache (i.e. 486) can execute consecutive I/O cycles very quickly. The solution is to delay the execution of consecutive I/O cycles by either breaking the pipeline or forcing the CPU to access outside its cache.

The NE2000, as I recall, had no special logic on the board to protect the chip from successive chip selects that were too close - which is the reason for the problem. Clearly an out to port 80 takes more than 4 ISA bus clocks, so that works if the NE2000 is on the ISA bus. On the other hand, there are other ways to delay more than 4 ISA bus clocks. And as you say, one needs a delay for this chip that relates to the card's bus's clock speed, not absolute time.

Alan Cox wrote:

As well you should. I am honestly curious (for my own satisfaction) as to what the natsemi docs say the delay code should do (can't imagine they say "use io port 80 because it is unused"). I don't have any

They say you must allow 4 bus clocks for the address decode. They don't deal with the ISA side, as the chip itself has no ISA glue.

copies anymore. But mere curiosity on my part is not worth spending a lot of time on - I know you are super busy. If there's a copy online at a URL ...

Not that I know of. There may be. A good general source of info is Russ Nelson's old DOS packet driver collection.
-- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [linux-kernel] Re: [PATCH] x86: provide a DMI based port 0x80 I/O delay override.
Alan Cox wrote: The natsemi docs here say otherwise. I trust them, not you. As well you should. I am honestly curious (for my own satisfaction) as to what the natsemi docs say the delay code should do (can't imagine they say "use io port 80 because it is unused"). I don't have any copies anymore. But mere curiosity on my part is not worth spending a lot of time on - I know you are super busy. If there's a copy online at a URL ... The problem is that certain people, unfortunately those who know nothing about ISA related bus systems, keep trying to confuse ISA delay logic with core chip logic and end up trying to solve both a problem and a non-problem in one, creating a nasty mess in the process. I agree that the problems of chip logic and ISA delay are all tangled up, probably more than need be. I hope that the solution turns out to simplify matters, and hopefully to document the intention of the resulting code sections a bit more clearly for the future.
Re: [linux-kernel] Re: [PATCH] x86: provide a DMI based port 0x80 I/O delay override.
Ondrej Zary wrote: On Tuesday 08 January 2008 18:24:02 David P. Reed wrote: Windows these days does delays with timing loops or the scheduler. It doesn't use a "port". Also, Windows XP only supports machines that tend not to have timing problems that use delays. Instead, if a device takes a while to respond, it has a "busy bit" in some port or memory slot that can be tested. Windows XP can run on a machine with ISA slot(s) and has built-in drivers for some plug&play ISA cards - e.g. the famous 3Com EtherLink III. I think that there's a driver for NE2000-compatible cards too and it probably works. There is no need to use I/O writes to supposedly/theoretically "unused ports" to make drivers work on any bus. ISA included! You can, for example, wait for an ISA bus serial adapter to put out its next character by looping reading the port that has the output buffer full flag in a tight loop, with no delay code at all. And if you need to time things, just call a timing loop subroutine that you calibrate at boot time. I wrote DOS drivers for NE2000's on the ISA bus when they were brand new designs from Novell without such kludges as writes to I/O port 80. I don't remember writing a driver for the 3com devices - probably didn't, because 3Com's cards were expensive at the time. In any case, Linux *did* adopt this port 80 strategy - I'm sure all concerned thought it was frightfully clever at the time. Linus expressed his skepticism in the comments in io.h. The problem is to safely move away from it toward a proper strategy that doesn't depend on "bus aborts" which would trigger machine checks if they were properly enabled.
Re: [linux-kernel] Re: [PATCH] x86: provide a DMI based port 0x80 I/O delay override.
Windows these days does delays with timing loops or the scheduler. It doesn't use a "port". Also, Windows XP only supports machines that tend not to have timing problems that use delays. Instead, if a device takes a while to respond, it has a "busy bit" in some port or memory slot that can be tested. Almost all of the issues in Linux where _p operations are used are (or should be) historical - IMO. Ondrej Zary wrote: On Tuesday 08 January 2008 02:38:15 David P. Reed wrote: H. Peter Anvin wrote: And shoot the designer of this particular microcontroller firmware. Well, some days I want to shoot the "designer" of the entire Wintel architecture... it's not exactly "designed" by anybody of course, and today it's created largely by a collection of Taiwanese and Chinese ODM firms, coupled with Microsoft WinHEC and Intel folks. At least they follow the rules and their ACPI and BIOS code say that they are using port 80 very clearly if you use PnP and ACPI properly. And in the old days, you were "supposed" to use the system BIOS to talk to things like the PIT that had timing issues, not write your own code. Does anyone know what port does Windows use? I'm pretty sure that it isn't 80h as I run Windows 98 often with port 80h debug card inserted. The last POST code set by BIOS usually remains on the display and only changes when BIOS does something like suspend/resume. IIRC, there was a program that was able to display temperature from onboard sensors on the port 80h display that's integrated on some mainboards.
Re: [linux-kernel] Re: [PATCH] x86: provide a DMI based port 0x80 I/O delay override.
The last time I heard of a 12 MHz bus in a PC system was in the days of the PC-AT, when some clone makers sped up their buses (pre PCI!!!) in an attempt to allow adapter card *memory* to run at the 12 MHz speed. This caused so many industry-wide problems with adapter cards that couldn't be installed in certain machines and still run reliably that the industry learned a lesson. That doesn't mean that LPCs don't run at 12 MHz, but if they do, they don't have old 8 bit punky cards plugged into them for lots of practical reasons. (I have whole drawers full of such old cards, trying to figure out an environmentally responsible way to get rid of them - even third world countries would be fools to make machines with them). I can't believe that we are not supporting today's machines correctly because we are still trying to be compatible with a few (at most a hundred thousand were manufactured! Much less still functioning or running Linux) machines. Now I understand that PC/104 machines and other things are very non PC compatible, but are x86 processor architectures. Do they even run x86 under 2.6.24? Perhaps the rational solution here is to declare x86 the architecture for "relics" and develop a merged architecture called "modern machines" to include only those PCs that have been made to work since, say, the release of (cough) Windows 2000? Bodo Eggert wrote: On Tue, 8 Jan 2008, Rene Herman wrote: On 08-01-08 00:24, H. Peter Anvin wrote: Rene Herman wrote: Is this only about the ones then left for things like legacy PIC and PIT? Does anyone care about just sticking in a udelay(2) (or 1) there as a replacement and call it a day? PIT is problematic because the PIT may be necessary for udelay setup. Yes, can initialise loops_per_jiffy conservatively. Just didn't quite get why you guys are talking about an ISA bus speed parameter. If the ISA bus is below 8 MHz, we might need a longer delay.
If we default to the longer delay, the delay will be too long for more than 99.99% of all systems, not counting i586+. Especially if the driver is fine-tuned to give maximum throughput, this may be bad. OTOH, the DOS drivers I heard about use delays and would break on underclocked ISA buses if the n * ISA_HZ delay was needed. Maybe somebody having a configurable ISA bus speed and some problematic chips can test it ...
Re: [linux-kernel] Re: [PATCH] x86: provide a DMI based port 0x80 I/O delay override.
H. Peter Anvin wrote: And shoot the designer of this particular microcontroller firmware. Well, some days I want to shoot the "designer" of the entire Wintel architecture... it's not exactly "designed" by anybody of course, and today it's created largely by a collection of Taiwanese and Chinese ODM firms, coupled with Microsoft WinHEC and Intel folks. At least they follow the rules and their ACPI and BIOS code say that they are using port 80 very clearly if you use PnP and ACPI properly. And in the old days, you were "supposed" to use the system BIOS to talk to things like the PIT that had timing issues, not write your own code. Or perhaps the ACPI spec should specify a timing loop spec and precisely specify the desired timing after accessing an I/O port till that device has properly "acted" on that operation. The idea that Port 80 was "unused" and appropriate for delay purposes elicited skepticism by Linus that is recorded for posterity in the comments of the relevant Linux include files - especially since it was clearly "used" for non-delay purposes, by cards that could be plugged into a PCI (fast), not just an 8-bit ISA, bus. Perhaps we should declare the world of ACPI systems a separate "arch" from the world of l'ancien regime where folklore about which ports were used for what ruled. I lived through those old days, and they were not wonderful, either. The world sucks, and Linux is supposed to be able to adapt to that world, suckitude and all.
Re: [linux-kernel] Re: [PATCH] x86: provide a DMI based port 0x80 I/O delay override.
On another topic. I have indeed determined what device uses port 80 on Quanta AMD64 laptops from HP. I had lunch with Jim Gettys of OLPC a week ago; he's an old friend since he worked on the original X windows system. After telling him my story about port 80, he mentioned that the OLPC XO machine had some issues with port 80 which was by design handled by the ENE KBC device on its motherboard. He said the ENE was a very desirable chipset for AMD designs recommended by Quanta. Richard Smith of OLPC explained to me how the KB3700 they use works, and that they use the KB3700 to send POST codes out over a serial link during boot up. This gave me a reason to take apart my laptop, to discover that it has an ENE KB3920 B0 as its EC and KBC. The port interface for the KB3920 includes listening to port 80 which is then made available to firmware on the EC. It is recognized and decoded on the LPC bus, only for writes, and optionally can generate an interrupt in the 8051. Dumping both the ENE chip, and looking at the DSDT.dsl for my machine, I discovered that port 80 is used as an additional parameter for various DSDT methods that communicate to the EC, when it is operating in ACPI mode. More work is in progress as I play around with this. But the key thing is that ACPI and perhaps SMM both use port 80 as part of the base function of the chipset. And actually, if I had looked at the /sys/bus/pnp definitions, rather than /proc/ioports, I would have noticed that port 80 was part of a PNP0C02 resource set. That means exactly one thing: ACPI says that port 80 is NOT free to be used, for delays or anything else. This should make no difference here: it's just one more reason to stop using port 80 for delays on modern machines.
Re: [linux-kernel] Re: [PATCH] x86: provide a DMI based port 0x80 I/O delay override.
H. Peter Anvin wrote: Rene Herman wrote: Is this only about the ones then left for things like legacy PIC and PIT? Does anyone care about just sticking in a udelay(2) (or 1) there as a replacement and call it a day? PIT is problematic because the PIT may be necessary for udelay setup. The PIT usage for calibrating the delay loop can be moderated, if need be, by using the PC BIOS, which by definition uses the PIT correctly in its int 15 function 83 call. Just do it before coming up in a state where the PC BIOS int 15h calls no longer work. I gave code to do this in a much earlier message. This is the MOST reliable way to use the PIT early in boot, on a PC compatible. God knows how one should do it on a Macintosh running a 386/20 :-). But the ONLY old bad-PIT machines are, thank god, PC compatible, maybe.
Re: [linux-kernel] Re: [PATCH] x86: provide a DMI based port 0x80 I/O delay override.
FYI - another quirky Quanta motherboard from HP, with DMI readings reported to me. Original Message Date: Wed, 2 Jan 2008 16:23:27 +1030 From: Joel Stanley <[EMAIL PROTECTED]> To: David P. Reed <[EMAIL PROTECTED]> Subject: Re: [PATCH] Option to disable AMD C1E (allows dynticks to work) On Dec 30, 2007 1:13 AM, David P. Reed <[EMAIL PROTECTED]> wrote: I have also attached a c program that only touches port 80. Compile it for 32-bit mode (see comment), run it as root, and after two or three runs, it will hang a system that has the port 80 bug. Using port80.c, I could hard lock a HP Pavilion tx1000 laptop on the first go. This was with ubuntu hardy's stock kernel (a 2.6.24-rc) dmidecode -s baseboard-manufacturer dmidecode -s baseboard-product-name Quanta 30BF Tonight, I will try compiling a kernel with these values added to your patch. Some history, feel free to ignore if it's not relevant: ubuntu feisty's 2.6.22 based kernel worked fine, IIRC. We were having issues with sound, so tried fedora8's .23 based kernel, but this would sporadically hard lock. Ubuntu hardy's 2.6.24 appeared fine, for the 2 hours or so I used it last night, until using the port80.c program, obviously. Cheers, Joel
Re: [PATCH] x86: provide a DMI based port 0x80 I/O delay override.
Alan Cox wrote: That does imply some muppet 'extended' the debug interface for power management on your laptop. Also pretty much proves that for such systems we do have to move from port 0x80 to another delay approach. Alan - in googling around the net yesterday looking for SuperIO chipsets that claim to support port 80, I have found that "blade" servers from companies like IBM and HP *claim* to have a system for monitoring port 80 diagnostic codes and sending them to the "drawer" management processor through a management backplane. This is a little puzzling, because you'd think they would have noticed port 80 issues, since they run Linux in their systems. Maybe not hangs, but it seems unhelpful to have a lot of noise spewing over a bus that is supposed to provide "management" diagnostics. Anyway, what I did not find was whether there was a particular chipset that provided that port 80 feature on those machines. However, if it's a common "cell" in a design, it may have leaked into the notebook market chipsets too. Anyone know if the Linux kernels used on blade servers have been patched to not do the port 80 things? I don't think this would break anything there, but it might have been a helpful patch for their purposes. I don't do blades personally or at work (I focus on mobile devices these days, and my personal servers are discrete), so I have no knowledge. It could be that the blade servers have BIOSes that don't do POST codes over port 80, but send them directly to the "drawer" management bus, of course.
Re: [PATCH] x86: provide a DMI based port 0x80 I/O delay override.
Pavel Machek wrote: 2. there is some "meaning" to certain byte values being written (the _PTS and _WAK use of arguments that come from callers to store into port 80 makes me suspicious.) That might mean that the freeze happens only when certain values are written, or when they are written closely in time to some other action - being used to communicate something to the There's nothing easier than always writing 0 to the 0x80 to check if it hangs in such case...? Pavel I did try that. Machine in question does hang when you write 0 to 0x80 in a loop a few thousand times. This particular suspicion was that the problem was caused by the following sort of thing (it's a multi-cpu system...) First, some ACPI code writes "meaningful value" X to port 80 that is sort of a "parameter" to whatever follows. Just because the DSDT disassembly *calls* it the DBUG port doesn't mean it is *only* used for debugging. We (Linux) use it for timing delays, after all... then Linux driver writes some random value (!=X) including zero to port 80. then ACPI writes some other values that cause SMI or some other thing to happen. There are experiments that are not so simple that could rule this particular guess out. I have them on my queue of experiments I might try (locking out ACPI). Of course if the BIOS were GPL, we could look at the comments, etc... I may today pull the laptop apart to see if I can see what chips are on it, besides the nvidia chipset and the processor. That might give a clue as to what SuperIO or other logic chips are there.
Re: [PATCH] x86: provide a DMI based port 0x80 I/O delay override.
Alan Cox wrote: responds to reads differently than "unused" ports. In particular, an inb takes 1/2 the elapsed time compared to a read to "known" unused port 0xed - 792 tsc ticks for port 80 compared to about 1450 tsc ticks for port 0xed and other unused ports (tsc at 800 MHz). Well at least we know where the port is now - that's too fast for an LPC bus device, so it must be an SMI trap. Only easy way to find out is to use the debugging event counters and see how many instruction cycles are issued as part of the 0x80 port. If it's surprisingly high then you've got a firmware bug and can go spank HP. Alan, thank you for the pointers. I have been doing variations on this testing theme for a while - I get intrigued by a good debugging challenge, and after all it's my machine... Two relevant new data points, and then some more suggestions: 1. It appears to be a real port. SMI traps are not happening in the normal outb to 80. Hundreds of them execute perfectly with the expected instruction counts. If I can trace the particular event that creates the hard freeze (getting really creative, here) and stop before the freeze disables the entire computer, I will. That may be an SMI, or perhaps any other kind of interrupt or exception. Maybe someone knows how to safely trace through an impending SMI while doing printk's or something? 2. It appears to be the standard POST diagnostic port. On a whim, I disassembled my DSDT code, and studied it more closely. It turns out that there are a bunch of "Store(..., DBUG)" instructions scattered throughout, and when you look at what DBUG is defined as, it is defined as an IO Port at IO address DBGP, which is a 1-byte value = 0x80. So the ACPI BIOS thinks it has something to do with debugging. There's a little strangeness here, however, because the value sent to the port occasionally has something to do with arguments to the ACPI operations relating to sleep and wakeup ... could just be that those arguments are distinctive.
In thinking about this, I recognize a couple of things. ACPI is telling us something when it declares a reference to port 80 in its code. It's not telling us the function of this port on this machine, but it is telling us that it is being used by the BIOS. This could be a reason to put out a printk warning message... 'warning: port 80 is used by ACPI BIOS - if you are experiencing problems, you might try an alternate means of iodelay.' Second, it seems likely that there are one of two possible reasons that the port 80 writes cause hang/freezes: 1. buffer overflow in such a device. 2. there is some "meaning" to certain byte values being written (the _PTS and _WAK use of arguments that come from callers to store into port 80 makes me suspicious.) That might mean that the freeze happens only when certain values are written, or when they are written closely in time to some other action - being used to communicate something to the SMM code). If there is some race in when Linux's port 80 writes happen that happen to change the meaning of a request to the hardware or to SMM, then we could be rarely stepping on
Re: [PATCH] x86: provide a DMI based port 0x80 I/O delay override
H. Peter Anvin wrote: Now, I think there is a specific reason to believe that EGA/VGA (but perhaps not CGA/MDA) didn't need these kinds of hacks: the video cards of the day were touched, directly, by an interminable number of DOS applications. CGA/MDA generally *were not*, due to the unsynchronized memory of the original versions (writing could cause snow), so most applications tended to fall back to using the BIOS access methods for CGA and MDA. A little history... not that it really matters, but some might be interested in a 55-year-old hacker's sentimental recollections... As someone who actually wrote drivers for CGA and MDA on the original IBM PC, I can tell you that back-to-back I/O *port* writes and reads were perfectly fine. The "snow" problem had nothing to do with I/O ports. It had to do with the memory on the CGA adapter card not being dual ported, and in high-res (80x25) character mode (only!) a CPU read or write access caused a read of the adapter memory by the character-generator to fail, causing one character-position of the current scanline being output to get all random bits, which was then put through the character-generator and generated whatever the character generator did with 8 random bits of character or attributes as an index into the character generator's font table. In particular, the solution in both the BIOS and in Visicalc, 1-2-3, and other products that did NOT use the BIOS or DOS for I/O to the CGA or MDA because they were Dog Slow, was to detect the CGA, and do a *very* tight loop doing "inb" instructions from one of the CGA status registers, looking for a 0-1 transition on the horizontal retrace flag. It would then do a write to display memory with all interrupts locked out, because that was all it could do during the horizontal retrace, given the speed of the processor.
One of the hacks I did in those days (I wrote the CGA driver for Visicalc Advanced Version and several other Software Arts programs, some of which were sold to Lotus when they bought our assets, and hired me, in 1985) was to measure the "horizontal retrace time" and the "vertical blanking interval" when the program started, and compile screen-writing code that squeezed as many writes as possible into both horizontal retraces and vertical retraces. That was actually a "selling point" for spreadsheets - the reviewers actually measured whether you could use the down-arrow key in auto-repeat mode and keep the screen scrolling at the relevant rate! That was hard on an 8088 or 80286 processor, with a CGA card. It was great when EGA and VGA came out, but we still had to support the CGA long after. Which is why I fully understand the need not to break old machines. We had to run on every machine that was claimed to be "PC compatible" - many of which were hardly so compatible (the PS/2 model 50 had a completely erroneous serial chip that claimed to emulate the original 8250, but had an immense pile of bugs, for example, that IBM begged ISVs to call a software problem and fix so they didn't get sued). The IBM PC bus (predecessor of the current ISA bus, which came from the PC-AT's 16-bit bus), did just fine electrically - any I/O port-specific timing problems had to do with the timing of the chips attached to the bus. For example, if a bus write to a port was routed into a particular chip, the timing of that chip's subsequent processing might be such that it was not ready to respond to another read or write.) That's not a "signalling" problem - it has nothing to do with capacitance on the bus, e.g., but a functional speed problem in the chip (if on the motherboard) or the adapter card. Rant off. This has nothing, of course, to do with present issues. 
Re: [PATCH] Option to disable AMD C1E (allows dynticks to work)
Richard Harman wrote: I think I may have a monkey wrench to throw into this, I finally got around to testing the C1E patch, and the port80 patch. End result: port80 patch has zero effect on this laptop, and the C1E patch makes it stable. Stating that your system is "stable" is not very definitive. Does hwclock work when full Fedora 8 is running without the port80 patch, or have you disabled the uses of hwclock in your init and shutdown code? Have you set the hwclock setting to use the extremely dangerous "--directisa" option - which hides the problem because it avoids the port 80 i/o? Try compiling and running the test program port80.c a few times. If your machine doesn't hang, it would be interesting to see the results it gives. The C1E patch alone does not fix the port 80 problem several of us have observed. What does dmidecode say for your motherboard vendor and model?
Re: [PATCH] x86: provide a DMI based port 0x80 I/O delay override
Alan Cox wrote: Now what's interesting is that the outb to port 80 is *faster* than an outb to an unused port, on my machine. So there's something there - actually accepting the bus transaction. In the ancient 5150 PC, 80 was ... Yes, and I even told you a while back how to verify where it is. From the timing you get it's not on the LPC bus but chipset core so pretty certainly an SMM trap as other systems with the same chipset don't have the bug. Probably all that is needed is a BIOS upgrade. Actually, I could see whether it was SMM trapping due to AMD MSR's that would allow such trapping, performance or debug registers. Nothing was set to trap with SMI or other traps on any port outputs. But I'm continuing to investigate for a cause. It would be nice if it were a BIOS-fixable problem. It would be even nicer if the BIOS were GPL...
Re: [PATCH] x86: provide a DMI based port 0x80 I/O delay override
I am so happy that there will be a way for people who don't build their own kernels to run Linux on their HP and Compaq laptops that have problems with gazillions of writes to port 80, and I'm also happy that some of the strange driver code will be cleaned up over time. Thank you all. Some thoughts you all might consider, take or leave, in this process, from an old engineering manager who once had to worry about QA for software on nearly every personal computer model in the 1980-1992 period: You know, there is a class of devices that are defined to use port 0x80... it's that historically useful class of devices that show/record the POST diagnostics. It certainly was not designed for "delay" purposes. In fact, some of those same silly devices are still used in industry during manufacturing test. I wonder what would happen if Windows were not part of manufacturing test, and instead Linux were the "standard" for some category of machines... When I was still working at Lotus in the late '80's, when we still supported machines like 286's, there were lots of problems with timing loops in drivers in applications (even Win 3.0 had some in hard disk drivers, as did some of our printer drivers, ...), as clock speeds continued to ramp. There were major news stories of machines that "crashed when xyz application or zyx peripheral were added". It was Intel, as I recall, that started "publicly" berating companies in the PC industry for using the "two short jumps" solutions, and suggesting that they measure the processor speed at bootup, using the BIOS standard for doing that with the int 15 BIOS elapsed time calls, and always use "calibrated" timing loops. Which all of us who supported device drivers started to do (remember, apps had device drivers in those days for many devices that talked directly with the registers). 
I was impressed when I dug into Linux eventually, that this operating system "got it right" by measuring the timing during boot and creating a udelay function that really worked! So I have to say, that when I was tracing down the problem that originally kicked off this thread, which was that just accessing the RTC using the standard CMOS_READ macros in a loop caused a hang, that these "outb al,80h" things were there. And I noticed your skeptical comment in the code, Linus. Knowing that there was never in any of the documented RTC chipsets a need for a pause between accesses (going back to my days at Software Arts working on just about every old machine there was...) I changed it on a lark to do no pause at all. And my machine never hung... Now what's interesting is that the outb to port 80 is *faster* than an outb to an unused port, on my machine. So there's something there - actually accepting the bus transaction. In the ancient 5150 PC, 80 was unused because it was the DMA controller port that drove memory refresh, and had no meaning. Now my current hypothesis (not having access to Quanta's design specs for a board they designed and have shipped in quantity, or having taken the laptop apart recently) is that there is logic there on port 80, doing something. Perhaps even "POST diagnostic recording" as every PC since the XT has supported... perhaps supporting post-crash diagnostics... And that that something has a buffer, perhaps even in the "Embedded Controller" that may need emptying periodically. It takes several tens of thousands of "outb" to port 80 to hang the hardware solid - so something is either rare or overflowing. In any case, if this hypothesis is correct - the hardware may have an erratum, but the hardware is doing a very desirable thing - standardizing on an error mechanism that was already in the "standard" as an option... It's Linux that is using a "standard" in a wrong way (a diagnostic port as a delay).
So I say all this, mainly to point out that Linux has done timing loops right (udelay and ndelay) - except one place where there was some skepticism expressed, right there in the code. Linus may have some idea why it was thought important to do an essential delay with a bus transaction that had uncertain timing. My hypothesis is that "community" projects have the danger of "magical theories" and "coolness" overriding careful engineering design practices. Cleaning up that "clever hack" that seemed so good at the time is hugely difficult, especially when the driver writer didn't write down why he used it. Thus I would suggest that the _p functions be deprecated, and if there needs to be a timing-delay after in/out instructions, define in_pause(port, nsec_delay) with an explicit delay. And if the delay is dependent on bus speeds, define a bus-speed ratio calibration. Thus in future driver writing, people will be forced to think clearly about what the timing characteristics of their device on its bus must be. That presupposes that driver writers understand the timing is
Re: [PATCH] Option to disable AMD C1E (allows dynticks to work)
Islam Amer wrote: Hello. I was interested in getting dynticks to work on my Compaq Presario v6000 to help with the 1 hour thirty minutes battery time, but after this discussion I lost interest. I too had the early boot time hang, and found it was udev triggering the bug.

This early boot time hang is *almost certainly* due to the in/out port 80 bug, which I discovered a few weeks ago, and which affects hwclock and other I/O device drivers on a number of HP/Compaq machines in exactly this way. The proper fix for this bug is in dispute, and will probably not occur in the 2.6.24 release because it touches code in many, many drivers.

The simplest way to test if you have a problem of this sort is to try this shell line as root, after you boot successfully. If your machine hangs hard, you have a problem that really looks like the port 80 problem.

for ((i = 0; i < 1000; i = i + 1)); do cat /dev/nvram > /dev/null; done

I have also attached a C program that only touches port 80. Compile it for 32-bit mode (see comment), run it as root, and after two or three runs it will hang a system that has the port 80 bug. If it does, then the outputs of

dmidecode -s baseboard-manufacturer
dmidecode -s baseboard-product-name

are the values you should plug into the .matches field in the dmi_system_id struct in the attached patch. It would be great if you could do that, test, and post back with those values so they can be accumulated. HP/Compaq machines with Quanta m/b's are very popular and very common - so at least a quirk patch for all the broken models would be worth doing in 2.6.25 or downstream in the distros. The right patches will probably take a long time - there is a dispute as to what the semantics of port 80 writes even mean among the core kernel developers, because the hack is lost in the dim dark days of history, and safe resolution will take time.

There is also a C1E issue with the BIOS in my machine (an HP Pavilion dv9000z).
I don't know if it is a bug yet, but that's a different problem - associated with dynticks, perhaps. I have to say that researching the AMD kernel/BIOS docs on C1E (a very new feature in the last year on AMD) leaves me puzzled as to whether the dynticks problem exists on my machine at all - but the patch for it turns off dynticks!

Changing the /etc/init.d/udev script so that the line containing /sbin/udevtrigger reads /sbin/udevtrigger --subsystem-nomatch="*misc*" seemed to fix things. The hang is triggered specifically by echo add > /sys/class/misc/rtc/uevent after inserting rtc.ko. Also, using hwclock to set the rtc will cause a hard hang if you are using 64-bit Linux. Disable the init scripts that set the time, or use the 32-bit binary, as suggested here: http://www.mail-archive.com/[EMAIL PROTECTED]/msg41964.html I hope this helps. But your hardware is slightly different though.

commit c12c7a47b9af87e8d867d5aa0ca5c6bcdd2463da
Author: Rene Herman <[EMAIL PROTECTED]>
Date: Mon Dec 17 21:23:55 2007 +0100

x86: provide a DMI based port 0x80 I/O delay override.

Certain (HP) laptops experience trouble from our port 0x80 I/O delay writes. This patch provides for a DMI based switch to the "alternate diagnostic port" 0xed (as used by some BIOSes as well) for these. David P. Reed confirmed that port 0xed works for him and provides a proper delay. The symptoms of _not_ working are a hanging machine, with "hwclock" use being a direct trigger.

Earlier versions of this attempted to simply use udelay(2), with the 2 being a value tested to be a nicely conservative upper-bound with help from many on the linux-kernel mailing list, but that approach has two problems. First, pre-loops_per_jiffy calibration (which is post PIT init, while some implementations of the PIT are actually one of the historically problematic devices that need the delay) udelay() isn't particularly well-defined.
We could initialise loops_per_jiffy conservatively (and based on CPU family so as to not unduly delay old machines), which would sort of work, but still leaves: Second, delaying isn't the only effect that a write to port 0x80 has. It's also a PCI posting barrier which some devices may be explicitly or implicitly relying on. Alan Cox did a survey and found evidence that additionally various drivers are racy on SMP without the bus-locking outb. Switching to an inb() makes the timing too unpredictable, and as such, this DMI based switch should be the safest approach for now. Any more invasive changes should get more rigid testing first. Moreover, it's only very few machines with the problem and a DMI based hack seems to fit that situation. An early boot parameter to make the choice manually (and override any possible DMI based decision) is also provided:
Re: [PATCH] x86: provide a DMI based port 0x80 I/O delay override.
Besides the two reports of freezes on bugzilla.kernel.org (9511, 6307), the following two bug reports on bugzilla.redhat.com are almost certainly due to the same cause (imo, of course): 245834, 227234. Ubuntu launchpad bug 158849 also seems to report the same problem, for an HP dv6258se 64-bit machine. Also this one: http://www.mail-archive.com/[EMAIL PROTECTED]/msg10321.html If you want to collect dmidecode data from these folks, perhaps we might get a wider sense of what categories of machines are affected. They all seem to be recent HP and Compaq AMD64 laptops, probably all Quanta motherboards. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] x86: provide a DMI based port 0x80 I/O delay override.
H. Peter Anvin wrote: David P. Reed wrote: As support: port 80 on the reporter's (my) HP dv9000z laptop clearly responds to reads differently than "unused" ports. In particular, an inb takes 1/2 the elapsed time compared to a read of the "known" unused port 0xed - 792 TSC ticks for port 80 compared to about 1450 TSC ticks for port 0xed and other unused ports (TSC at 800 MHz). Any timings for port 0xf0 (write zero), out of curiosity?

Here's a bunch of data:

port 0xF0: out 919 cycles, in 933 cycles
port 0xED: out 2541 cycles, in 2036 cycles
port 0x70: out n/a, in 934 cycles
port 0x80: out 1424 cycles, in 795 cycles

AMD Turion 64x2 TL-60 CPU running at 800 MHz, nVidia MCP51 chipset, Quanta motherboard. Running 2.6.24-rc5 with Ingo's patch, so inb_p etc. use port 0xed. Note that I can run the port 80 test once; the second time I get the hard freeze. I didn't try writing to port 70 from userspace - that one's dangerous - but the read of it was included for a timing typical of a chipset-supported device. These are all pretty consistent. I find the "read" timing from 0x80 very interesting. The write timing is also interesting, being faster than an unused port.
Re: [PATCH] x86: provide a DMI based port 0x80 I/O delay override.
H. Peter Anvin wrote: Rene Herman wrote: I do not know how universal that is, but _reading_ port 0xf0 might in fact be sensible then? And should even work on a 386/387 pair? (I have a 386/387 in fact, although I'd need to dig it up.) No. Someone might have used 0xf0 as a read-only port for other uses. As support: port 80 on the reporter's (my) HP dv9000z laptop clearly responds to reads differently than "unused" ports. In particular, an inb takes 1/2 the elapsed time compared to a read of the "known" unused port 0xed - 792 TSC ticks for port 80 compared to about 1450 TSC ticks for port 0xed and other unused ports (TSC at 800 MHz).
Re: [PATCH] x86: provide a DMI based port 0x80 I/O delay override.
Ingo - I finished testing the rolled-up patch that you provided. It seems to work just fine. Thank you for putting this all together and persevering in this long and complex discussion. Here are the results, on the offending laptop, using 2.6.24-rc5 plus that one patch.

First: booted with normal boot parameters (no io_delay=). According to dmesg, 0xed is used.
- hwclock ran fine, hundreds of times.
- my shell script loop doing "cat /dev/nvram > /dev/null" ran fine, several times.
- Running Rene's "port 80" speed test ran fine once, then froze the system hard. (expected)

Second: booted with io_delay=0x80, several tests, rebooting after freezes:
- hwclock froze the system hard. (this is the problem that drove me to find this bug)
- my shell script loop froze the system hard.

Third: booted with io_delay=none:
- hwclock ran fine, also hundreds of times.
- my shell script loop ran fine several times.
- Running Rene's port80 test ran fine twice, froze the system hard on the third try.

Fourth: booted with io_delay=udelay:
- hwclock ran fine, also hundreds of times.
- my shell script loop ran fine several times.
- Running Rene's port80 test ran fine once, froze the system hard on the second try.

Analysis: the patch works fine, and the default of 0xed seems suitably conservative. I will probably use the boot parameter io_delay=none, because I don't seem to have any I/O devices that require any delays - and this way I can find any that do.

Still wondering: what the heck is going on with port 80 on my laptop motherboard. Clearly it "does something". I will in my spare time continue investigating, though having a reliable system is GREAT.
Re: [PATCH] x86: provide a DMI based port 0x80 I/O delay override.
About to start building and testing. It will take a few hours. Ingo Molnar wrote: here's an updated rollup patch, against 2.6.24-rc4. David, could you please try this? This should work out of box on your system, without any boot option or other tweak needed.
Re: [PATCH] x86: provide a DMI based port 0x80 I/O delay override.
Rene Herman wrote: No, most definitely not. Having the user select udelay or none through the kernel config and then the kernel deciding "ah, you know what, I'll know better and use port access anyway" is _utterly_ broken behaviour. Software needs to listen to its master. When acting as an ordinary user, the .config is beyond my control; it is under the control of the distro (Fedora, Ubuntu, ... but perhaps not Gentoo). I think the distro guys want a default behavior that is set in .config, with quirk overrides being done when needed. And of course the user, in his/her boot params, gets the final say.
Re: [PATCH] x86: provide a DMI based port 0x80 I/O delay override.
Rene Herman wrote: David: I've plugged in your DMI values in this. Could you perhaps test this to confirm that it works for you? Will test it by tomorrow morning.
Re: [PATCH] x86_64: fix problems due to use of "outb" to port 80 on some AMD64x2 laptops, etc.
The process of safely making delicate changes here is beyond my responsibility as just a user - believe me, I'm not suggesting that a risky fix be put in .24. I can patch my own kernels, and I can even share an unofficial patch with others for now, or suggest that Fedora and Ubuntu add it to their downstream.

May I make a small suggestion, though. If a DMI-keyed switch from out-80 to udelay(2) gets put in, perhaps there should also be a way for people to test their own configuration for the underlying problem, made available as a script. Though it is a "hack", all you need to freeze a problem system is to run a loop doing about 1000 "cat /dev/nvram > /dev/null" commands. If that leads to a freeze, one might ask to have the motherboard added to the DMI-key list.

H. Peter Anvin wrote: Ingo Molnar wrote: * H. Peter Anvin <[EMAIL PROTECTED]> wrote: Pavel Machek wrote: this is also something for v2.6.24 merging. As much as I like this patch, I do not think it is suitable for .24. Too risky, I'd say. No kidding! We're talking about removing a hack that has been successful on thousands of pieces of hardware over 15 years because it ^[*] breaks ONE machine. [*] "- none of which needs it anymore -" there, fixed it for you ;-) So let's keep this in perspective: this is a hack that only helps on a very low number of systems. (the PIT of one PII-era chipset is known to be affected) Yes, but the status quo has been *tested* on thousands of systems and is known to work. Thus, changing it puts things into unknown territory, even if only a small number of machines actually need the current configuration. Heck, there are only a small number of 386/486 machines still in operation and being actively updated. Unfortunately this hack's side-effects are mis-used by an unknown number of drivers to mask PCI posting bugs. We want to figure out those bugs (safely and carefully) and we want to remove this hack from modern machines that don't need it.
Doing anything else would be superstition. Anyway, we likely won't be doing anything about this in .24. Again, 24 is "right out". 25 is a "maybe", IMO. Rene's fix could be an exception, since it is a DMI-keyed workaround for a specific machine and doesn't change behaviour in general. -hpa
Re: More info on port 80 symptoms on MCP51 machine.
Allen Martin wrote: Nothing inside the chipset should be decoding port 80 writes. It's possible this board has a port 80 decoder wired onto the board that's misbehaving. I've seen other laptop boards with port 80 decoders wired onto the board, even if the 7-segment display is only populated on debug builds. This is very helpful. So the next question is whether there is something on the laptop mainboard. Any idea how to look for such a thing? I am not averse to taking the laptop apart to look at the mainboard.
Re: [PATCH] x86: fix problems due to use of "outb" to port 80 on some AMD64x2 laptops, etc.
Here we go.

# dmidecode -s baseboard-manufacturer
Quanta
# dmidecode -s baseboard-product-name
30B9

There do seem to be other systems, besides mine, that have the same problem. I think it's pretty likely that the other machines that have this problem are Quanta machines, since Quanta is one of the primary ODMs that does HP laptops. I don't know whether the product-name is common with the HP dv6000z family, which is another one reported to have this problem. We could try to ask all the reporters of hwclock freezes to report their results from dmidecode.

Rene Herman wrote: On 15-12-07 21:27, H. Peter Anvin wrote: Rene Herman wrote: Yes, just posted a Patch-For-Comments that switches on the availability of a TSC (tsc_init sets tsc_disable also for !cpu_has_tsc), which would mean that only really old stuff would be using the outb still. A TSC is really all we need to have a sensible udelay(). Uhm, no. You have no clue what the speed of the TSC is until you have been able to calibrate it against a fixed timesource - like the PIT. Yes. Hnng. Okay, this is going nowhere in a hurry, so back to the very first suggestion in this thread. How about this? This allows switching from port 0x80 to port 0xed based on DMI. David: I plugged in my own DMI values for testing, but obviously yours are needed. The values that are needed are retrieved by the "dmidecode" program, which you will probably have installed (it might be in an sbin directory) or will be able to install through whatever package manager you use.

dmidecode -s baseboard-manufacturer
dmidecode -s baseboard-product-name

are the values you should plug into the .matches field in the dmi_system_id struct in this. It would be great if you could do that, test, and post back with those values. .ident should be a nice human name. It's been tested on x86-32 and seems to work fine. It's not been tested on x86-64 but seems to stand a fair chance of working similarly.
It of course remains possible to switch to a udelay() based method later on anyway, but with all the pre-calibration trouble, this might be the lowest-risk method in the short run. This is partly based on previous patches by Pavel Machek and David P. Reed. I hope this is considered half-way correct/sane (note by the way that it's not a good idea to switch a "native_io_delay_port" value, since plugging in a variable port would clobber register dx for every outb_p, which would then have to be reloaded for the next outb again). Comments appreciated.

Signed-off-by: Rene Herman <[EMAIL PROTECTED]>

 arch/x86/boot/compressed/misc_32.c |  8 ++---
 arch/x86/boot/compressed/misc_64.c |  8 ++---
 arch/x86/kernel/Makefile_32        |  2 -
 arch/x86/kernel/Makefile_64        |  2 -
 arch/x86/kernel/io_delay.c         | 53 +
 arch/x86/kernel/setup_32.c         |  2 +
 arch/x86/kernel/setup_64.c         |  2 +
 include/asm-x86/io_32.h            | 17 ++-
 include/asm-x86/io_64.h            | 23 ++--
Re: [PATCH] x86_64: fix problems due to use of "outb" to port 80 on some AMD64x2 laptops, etc.
H. Peter Anvin wrote: David P. Reed wrote: Just a thought for a way to fix the "very early" timing needed to set up udelay to work in a way that works on all machines. Perhaps we haven't exploited the BIOS enough: the PC BIOS since the PC AT (286) has always had a standard "countdown timer" way to delay for n microseconds, which as far as I know still works. This can be used to measure the speed of a delay loop, without using ANY in/out instructions directly (the ones in the BIOS are presumably correctly delayed...). If we enter from the 32-bit entrypoint, we already don't have access to the BIOS, though. My understanding is that Linux starts in real mode, and uses the BIOS for such things as reading the very first image. arch/x86/boot/main.c seems to use BIOS calls, and one can do what I wrote in C or asm. A good place to measure the appropriate delay timing, and pass it on forward. That's what I was suggesting, which is why I copied the ASM routine from my old code listing as I did.
Re: [PATCH] x86_64: fix problems due to use of "outb" to port 80 on some AMD64x2 laptops, etc.
I understand the risks of such a fundamental change, and it may be only a minor concern, but I did also point out that using an unused port causes the bus to be tied up for a microsecond or two, which matters on a fast SMP machine. Of course all the other concerns you guys are worrying about are really important. I don't want to break anybody's running systems... I'd like to see my machine run smoothly, and all the other machines that may or may not have this problem (google "hwclock freeze" to see that I'm far from alone - I just have persevered in "bisecting" this problem with kernel tweaks for months, whereas the others did not or did not know how).

By the way, this laptop is really nice for Linux in lots of ways. Dual drives, so I set it up with software RAID for reliability, dual 64-bit processors, fast 3D graphics, etc. Great battery life. Just this one last kernel issue.

I also note that current machines like the problem machine have ACPI, and maybe those would be the ones on which vendors might start to define port 80 to mean something. As I noted, it /seems/ to be only when ACPI is turned on that this problem happens on my machine - that's when the OS starts to be involved in servicing various things - so it suggests that maybe something changes about port 80's unknown function on these machines when ACPI is servicing the system management code (that's not something I have been able to verify). My belief is that my machine has some device that is responding to port 80 by doing something, and that something requires some other program to "service" port 80 in some way. But it sure would be nice to know. I can't personally sand off the top of the chipset to put probes into it - so my normal approach of putting a logic analyzer on the bus doesn't work.

PS: If I have time, I may try to build Rene's port 80 test for Windows and run it under WinXP on this machine (I still have a crappy little partition that boots it).
If it freezes the same way, it's almost certainly a design "feature", and if it doesn't freeze, we might suspect that there is compensating logic either in Windows' ACPI code or in some way that Windows "sets up" the machine.

Alan Cox wrote: On Sat, 15 Dec 2007 14:27:25 +0100 Ingo Molnar <[EMAIL PROTECTED]> wrote: * Rene Herman <[EMAIL PROTECTED]> wrote: The issue is -- how do you safely replace the outb pre-loops_per_jiffy calibration? I'm currently running with the attached hack (not submitted, only for 32-bit and discussion), the idea of which might be the best we can do? how about doing a known-NOP chipset cycle? For example: inb(PIC_SLAVE_IMR) It needs to be a different chip to the main one (or macrocell anyway) - so PIC for PIT and vice versa. However, since we know 0x80 works for everything on the planet but this one specific type of laptop, which seems to need a firmware update, it's a very high risk approach. Alan
Re: [PATCH] x86_64: fix problems due to use of "outb" to port 80 on some AMD64x2 laptops, etc.
Just a thought for a way to fix the "very early" timing needed to set up udelay to work in a way that works on all machines. Perhaps we haven't exploited the BIOS enough: The PC BIOS since the PC AT (286) has always had a standard "countdown timer" way to delay for n microseconds, which as far as I know still works. This can be used to measure the speed of a delay loop, without using ANY in/out instructions directly (the ones in the BIOS are presumably correctly delayed...). So first thing in the boot sequence, one can calibrate a timing loop using this technique, and save the value to be used for all the "early" stuff. Here's skeleton code from old ASM code I found lying around in my archives to use BIOS to measure how many unrolled short jumps can execute in 10 msec. Note that it can run without knowing anything whatsoever about port timing.

haltbyte db 0
calibrate:
        les     bx,haltbyte     ; address of halt flag into es:bx
        mov     ax,8300h
        sub     cx,cx
        mov     dx,1            ; 10 msec. in cx:dx
        int     15h
        jc      bad
        sub     dx,dx
        sub     cx,cx           ; decrement counter in dx:cx
tloop:
        jmp     short $+2       ; 10 short jmps
        jmp     short $+2
        jmp     short $+2
        jmp     short $+2
        jmp     short $+2
        jmp     short $+2
        jmp     short $+2
        jmp     short $+2
        jmp     short $+2
        test    haltbyte
        loopz   tloop
        jnz     done
        dec     dx
        jnz     tloop
        ; overflowed 32 bits ... never happens, cancel BIOS event wait.
        mov     ax,8301h
        int     15h
        jmp     error
done:
        mov     ax,cx
        negl    ; here dx:ax contains 32 bit loop count corresponding to 10 msec.
        ret     ; return 32-bit value

Doc on function 83h of int 15h should be available online.

Alan Cox wrote: On Fri, 14 Dec 2007 14:13:46 -0800 "H. Peter Anvin" <[EMAIL PROTECTED]> wrote: Pavel Machek wrote: On Fri 2007-12-14 10:02:57, H. Peter Anvin wrote: Ingo Molnar wrote: wow, cool fix! (I remember that there were other systems as well that are affected by port 0x80 muckery - i thought we had removed port 0x80 accesses long ago.) how about the simpler fix below, as a first-level approach? We can then remove the _p in/out sequences after this.
I believe this will suffer from the issue that was raised: this will use udelay() long before loop calibration (and no, we can't just "be conservative", since there is no "conservative" value we can use.) ?? Just initialize bogomips to a 6 GHz equivalent... and we are fine until 6 GHz CPUs come out. How long will that take to boot on a 386? Well, the dumb approach to fix that would seem to be to initialise it based on cpu->family:

3 -> 50 MHz
4 -> 300 MHz
5 -> etc...

Alan
Re: [PATCH] x86_64: fix problems due to use of "outb" to port 80 on some AMD64x2 laptops, etc.
Avi Kivity wrote: kvm will forward a virtual machine's writes to port 0x80 to the real port. The reason is that the write is much faster than exiting and emulating it; the difference is measurable when compiling kernels. Now if the cause is simply writing to port 0x80, then we must stop doing that. But if the reason is the back-to-back writes, then we can keep it, since the other writes will be trapped by kvm and emulated. Do you know which is the case? As for kvm, I don't have enough info to know anything about that. Is there a test you'd like me to try? I think you are also asking if the crash on these laptops is caused only by back-to-back writes. Actually, it doesn't seem to matter if they are back to back. I can cause the crash if the writes to 80 are very much spread out in time - it seems only to matter how many of them get executed - almost as if there is a buffer overflow. (And of course if you do back-to-back writes to other ports that are apparently fully unused, such as 0xED on my machine, no crash occurs.) I believe (though no one seems to have confirming documentation from the chipset or motherboard vendor) that port 80 is actually functional for some unknown purpose on these machines. (They do respond to "in" instructions faster than a bus cycle abort does - more evidence.) I searched the DSDT to see if there is any evidence of an ACPI use for this port, but found nothing.
Re: [PATCH] x86_64: fix problems due to use of "outb" to port 80 on some AMD64x2 laptops, etc.
Andi Kleen wrote: With the additional call this should be completely out of line now, to save code size. Similar for the in variant. Sure. Want me to make a new patch with the _p routines out-of-line?
[PATCH] x86_64: fix problems due to use of "outb" to port 80 on some AMD64x2 laptops, etc.
Replace use of outb to "unused" diagnostic port 0x80 for time delay with udelay based time delay on x86_64 architecture machines. Fix for bugs 9511 and 6307 in bugzilla, plus bugs reported in bugzilla.redhat.com. Derived from a suggestion (that didn't compile) by Pavel Machek, and tested; also based on measurements of typical timings of outs collated by Rene Herman from many in the community. This patch fixes a number of bugs known to cause problems on HP Pavilion dv9000z and dv6000z laptops - in the form of solid freezes when hwclock is used to show or set the time. Also, it potentially improves bus utilization on SMP machines, by using a waiting process that doesn't tie up the ISA/LPC bus for 1 or 2 microseconds. i386 family fixes (completely parallel) were not included, considering that such machines might involve more risk of problems on legacy machines.

Signed-off-by: David P. Reed <[EMAIL PROTECTED]>

Index: linux-2.6/arch/x86/boot/compressed/misc_64.c
===================================================================
--- linux-2.6.orig/arch/x86/boot/compressed/misc_64.c
+++ linux-2.6/arch/x86/boot/compressed/misc_64.c
@@ -269,10 +269,10 @@ static void putstr(const char *s)
 	RM_SCREEN_INFO.orig_y = y;
 	pos = (x + cols * y) * 2;	/* Update cursor position */
-	outb_p(14, vidport);
-	outb_p(0xff & (pos >> 9), vidport+1);
-	outb_p(15, vidport);
-	outb_p(0xff & (pos >> 1), vidport+1);
+	outb(14, vidport);
+	outb(0xff & (pos >> 9), vidport+1);
+	outb(15, vidport);
+	outb(0xff & (pos >> 1), vidport+1);
 }
 
 static void* memset(void* s, int c, unsigned n)
Index: linux-2.6/include/asm/io_64.h
===================================================================
--- linux-2.6.orig/include/asm/io_64.h
+++ linux-2.6/include/asm/io_64.h
@@ -1,6 +1,7 @@
 #ifndef _ASM_IO_H
 #define _ASM_IO_H
 
+#include <linux/delay.h>
 
 /*
  * This file contains the definitions for the x86 IO instructions
@@ -15,19 +16,7 @@
  * mistake somewhere.
  */
 
-/*
- * Thanks to James van Artsdalen for a better timing-fix than
- * the two short jumps: using outb's to a nonexistent port seems
- * to guarantee better timings even on fast machines.
- *
- * On the other hand, I'd like to be sure of a non-existent port:
- * I feel a bit unsafe about using 0x80 (should be safe, though)
- *
- *		Linus
- */
-
-/*
- * Bit simplified and optimized by Jan Hubicka
+/* Bit simplified and optimized by Jan Hubicka
  * Support of BIGMEM added by Gerhard Wichert, Siemens AG, July 1999.
  *
  * isa_memset_io, isa_memcpy_fromio, isa_memcpy_toio added,
@@ -35,36 +24,36 @@
  * - Arnaldo Carvalho de Melo <[EMAIL PROTECTED]>
  */
 
-#define __SLOW_DOWN_IO "\noutb %%al,$0x80"
-
+/* the following delays are really conservative, at least for modern machines */
 #ifdef REALLY_SLOW_IO
-#define __FULL_SLOW_DOWN_IO __SLOW_DOWN_IO __SLOW_DOWN_IO __SLOW_DOWN_IO __SLOW_DOWN_IO
+#define _IOPORT_PAUSE_DELAY 10
 #else
-#define __FULL_SLOW_DOWN_IO __SLOW_DOWN_IO
+#define _IOPORT_PAUSE_DELAY 2
 #endif
 
 /*
  * Talk about misusing macros..
  */
-#define __OUT1(s,x) \
+#define __OUT1(s, x) \
 static inline void out##s(unsigned x value, unsigned short port) {
 
-#define __OUT2(s,s1,s2) \
-__asm__ __volatile__ ("out" #s " %" s1 "0,%" s2 "1"
-
-#define __OUT(s,s1,x) \
-__OUT1(s,x) __OUT2(s,s1,"w") : : "a" (value), "Nd" (port)); } \
-__OUT1(s##_p,x) __OUT2(s,s1,"w") __FULL_SLOW_DOWN_IO : : "a" (value), "Nd" (port));} \
+#define __OUT2(s, s1, s2) \
+	__asm__ __volatile__ ("out" #s " %" s1 "0,%" s2 "1" : : "a" (value), "Nd" \
+(port));
+
+#define __OUT(s, s1, x) \
+__OUT1(s, x) __OUT2(s, s1, "w") } \
+	__OUT1(s##_p, x) __OUT2(s, s1, "w") udelay(_IOPORT_PAUSE_DELAY); } \
 
 #define __IN1(s) \
 static inline RETURN_TYPE in##s(unsigned short port) { RETURN_TYPE _v;
 
-#define __IN2(s,s1,s2) \
-__asm__ __volatile__ ("in" #s " %" s2 "1,%" s1 "0"
+#define __IN2(s, s1, s2) \
+__asm__ __volatile__ ("in" #s " %" s2 "1,%" s1 "0" : "=a" (_v) : "Nd" (port));
 
-#define __IN(s,s1,i...) \
-__IN1(s) __IN2(s,s1,"w") : "=a" (_v) : "Nd" (port) ,##i ); return _v; } \
-__IN1(s##_p) __IN2(s,s1,"w") __FULL_SLOW_DOWN_IO : "=a" (_v) : "Nd" (port) ,##i ); return _v; } \
+#define __IN(s, s1) \
+__IN1(s) __IN2(s, s1, "w") return _v; } \
+	__IN1(s##_p) __IN2(s, s1, "w") udelay(_IOPORT_PAUSE_DELAY); return _v; } \
 
 #define __INS(s) \
 static inline void ins##s(unsigned short port, void * addr, unsigned long count) \
Re: RFC: outb 0x80 in inb_p, outb_p harmful on some modern AMD64 with MCP51 laptops
Simulating 1 microsecond delays (assuming LPC meets that goal for 0x80) is "absolutely correct" for devices provided on PCI-X running on 3 GHz or greater machines? Well, you are entitled to your opinion.

It seems more likely that reading the timing specs of such a chipset would be correct, delaying for a time proportional to what the device actually requires, rather than assuming that 3000 clock cycles on a 3 GHz machine are needed by a very fast emulation of an old device that probably runs at the fastest bus speed the chipset provides. Every device has different timing constraints. In the real world that I live in.

Alan Cox wrote:

On Thu, 13 Dec 2007 08:13:29 -0500 "David P. Reed" <[EMAIL PROTECTED]> wrote:

Perhaps what was meant is that ISA-tuned timings make little sense on devices that are part of the chipset or on the PCI or PCI-X buses?

No. ISA as LPC bus is alive and well inside and outside chipsets. Welcome to planet earth and the reality of 'it's cheaper to reuse cells than design a new one'. For the chipset logic like DMA controllers the _p is absolutely correct.

Alan
Re: RFC: outb 0x80 in inb_p, outb_p harmful on some modern AMD64 with MCP51 laptops
Perhaps what was meant is that ISA-tuned timings make little sense on devices that are part of the chipset or on the PCI or PCI-X buses?

On the other hand, since we don't know in many cases whether the "_p" was supposed to mean "the time it takes to execute an 'out al,80h' on whatever bus structure happens to be on whatever machine", the problem is unsolvable. Ranting about whether ISA/LPC is on what machines seems to be of little value in contributing to a constructive solution.

It seems to me that in the long term, driver writers would do well to think more clearly about the timings their devices require, when that is possible. Those timings are probably implementation dependent - depending on the clock speed of the particular clock that is driving the particular I/O device. Then there's the social problem of a community development project - which is to get people to tune their code but preserve its ability to run on older and variant machines.

Alan Cox wrote:

Yes, it's now clear that all of this is so. Regrettably, it's used in dozens of drivers, most having nothing to do with an ISA/LPC bus. If it really is specific to the ISA architecture, then it should only be used in architecture specific code.

ISA/LPC is not architecture specific. In fact ISA/LPC bus components get everywhere because the PC market drives the cost per unit for those components down to nearly nothing.
Re: More info on port 80 symptoms on MCP51 machine.
Port 0xED, just FYI:

cycles: out 1430, in 1370
cycles: out 1429, in 1370
(800 MHz)

Rene Herman wrote:

On 12-12-07 21:07, David P. Reed wrote:

Sadly, I've been busy with other crises in my day job for the last few days. I did modify Rene's test program and ran it on my "problem" machine, with the results below. The interesting part of this is that port 80 seems to respond to "in" instructions faster than the presumably "unused" ports 0xEC and 0xEF (those were mentioned by someone as alternatives to port 80).

Don't know if someone else mentioned those but I only said 0xed. That's the value Phoenix BIOSes use (yes, and which H. Peter Anvin reported as being generally problematic as well). It's in fact not all that unexpected, it seems, that port 0x80 responds to in, given that it's used by the DMA controller. It's a write that falls on deaf ears. The read is going to be faster if it doesn't time out on an unused port. Although it's not faster for everyone - such as for me, indicating that for us port 0x80 is really-really unused - it is for many. See results here: http://lkml.org/lkml/2007/12/12/309

That, and the fact that the port 80 test reliably freezes the machine solid the second time it is run, and the "hwclock" utility reliably hangs the machine if the port 80's are used in the CMOS_READ/CMOS_WRITE loop, seems to strongly indicate that this chipset or motherboard actually uses port 80, rather than there being a bus problem.

Yes, so it seems. In this case we could in fact also "fix" your situation by just going to 0xed depending on for example DMI. Alan Cox just posted a few further problems with a simple udelay() replacement...

Someone might have an in to nVidia to clarify this, since I don't. In any case, the udelay(2) approach seems to be a safe fix for this machine. Hope input from an "outsider" is helpful in going forward. I put a lot of time and effort into tracking down this problem on this particular machine model, largely because I like the machine.
Running the (slightly modified to test ports 80, ec, ef instead of just port 80) test when the 2 GHz max speed CPU is running at 800 MHz, here's what I get for port 80 and port ec and port ef.

port 80: cycles: out 1430, in 792

At 800 MHz, that's 1.79 / 0.99 microseconds. The precision of the "in" is somewhat interesting. Did someone at nVidia think it's an "in" from 0x80 which should get the 1 microsec delay?

port ef: cycles: out 1431, in 1378
port ec: cycles: out 1432, in 1372

Unused ports, bus timeouts.

System info: HP Pavilion dv9000z laptop (AMD64x2). PCI bus controller is nVidia MCP51.

processor : 0
vendor_id : AuthenticAMD
cpu family : 15
model : 72
model name : AMD Turion(tm) 64 X2 Mobile Technology TL-60
stepping : 2
cpu MHz : 800.000
cache size : 512 KB

Rene.
More info on port 80 symptoms on MCP51 machine.
Sadly, I've been busy with other crises in my day job for the last few days. I did modify Rene's test program and ran it on my "problem" machine, with the results below. The interesting part of this is that port 80 seems to respond to "in" instructions faster than the presumably "unused" ports 0xEC and 0xEF (those were mentioned by someone as alternatives to port 80).

That, and the fact that the port 80 test reliably freezes the machine solid the second time it is run, and the "hwclock" utility reliably hangs the machine if the port 80's are used in the CMOS_READ/CMOS_WRITE loop, seems to strongly indicate that this chipset or motherboard actually uses port 80, rather than there being a bus problem. Someone might have an in to nVidia to clarify this, since I don't.

In any case, the udelay(2) approach seems to be a safe fix for this machine. Hope input from an "outsider" is helpful in going forward. I put a lot of time and effort into tracking down this problem on this particular machine model, largely because I like the machine.

Running the (slightly modified to test ports 80, ec, ef instead of just port 80) test when the 2 GHz max speed CPU is running at 800 MHz, here's what I get for port 80 and port ec and port ef.

port 80: cycles: out 1430, in 792
port ef: cycles: out 1431, in 1378
port ec: cycles: out 1432, in 1372

System info: HP Pavilion dv9000z laptop (AMD64x2). PCI bus controller is nVidia MCP51.

processor : 0
vendor_id : AuthenticAMD
cpu family : 15
model : 72
model name : AMD Turion(tm) 64 X2 Mobile Technology TL-60
stepping : 2
cpu MHz : 800.000
cache size : 512 KB
Attitude problems.
Who has attitude problems here? I have indeed learned a lot about assholes.

linux-os (Dick Johnson) wrote:

Yep. We are all wrong. You come out of nowhere and claim to be right. Goodbye.
Re: [RFT] Port 0x80 I/O speed
I have been having a fun time testing this on my AMD64x2 system. Since out's to port 80 hang the system hard after a while, I can run a test just after booting, but the next run will typically hang it. I did also test two ports thought to be unused. They do *not* hang the system. Thus apparently there is some device responding to port 80!

Running the (slightly modified to test 3 ports instead of just port 80) test when the CPU is running at 800 MHz, here's what I get for port 80 and port ec and port ef.

port 80: cycles: out 1430, in 792
port ef: cycles: out 1431, in 1378
port ec: cycles: out 1432, in 1372

[Note: port 80, when it doesn't hang, which is very often, responds to a read twice as fast as a port which is "not there". Typically though, the second time I run the test, the system freezes solid. Seems like evidence of a present device, not a missing one. Also, port 80 responds when booted with acpi=off, but never seems to hang - sounds like ACPI causes it to be active in some way.]

System info: HP Pavilion dv9000z laptop (AMD64x2). PCI bus controller is nVidia MCP51.
/proc/cpuinfo

processor : 0
vendor_id : AuthenticAMD
cpu family : 15
model : 72
model name : AMD Turion(tm) 64 X2 Mobile Technology TL-60
stepping : 2
cpu MHz : 800.000
cache size : 512 KB
physical id : 0
siblings : 2
core id : 0
cpu cores : 2
fpu : yes
fpu_exception : yes
cpuid level : 1
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt rdtscp lm 3dnowext 3dnow rep_good pni cx16 lahf_lm cmp_legacy svm extapic cr8_legacy
bogomips : 1608.35
TLB size : 1024 4K pages
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management : ts fid vid ttp tm stc

processor : 1
vendor_id : AuthenticAMD
cpu family : 15
model : 72
model name : AMD Turion(tm) 64 X2 Mobile Technology TL-60
stepping : 2
cpu MHz : 800.000
cache size : 512 KB
physical id : 0
siblings : 2
core id : 1
cpu cores : 2
fpu : yes
fpu_exception : yes
cpuid level : 1
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt rdtscp lm 3dnowext 3dnow rep_good pni cx16 lahf_lm cmp_legacy svm extapic cr8_legacy
bogomips : 1608.35
TLB size : 1024 4K pages
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management : ts fid vid ttp tm stc
Re: RFC: outb 0x80 in inb_p, outb_p harmful on some modern AMD64 with MCP51 laptops
1) I found in a book, The Undocumented PC, that I have lying around, that the "pause" recommended for some old adapter chips on the ISA bus was 1 usec. The book carefully points out on various models of PCs how many short jumps are required to implement 1 usec, and suggests that for faster machines, 1 usec loops be calibrated. That seems like a good heuristic.

2) Also, Dick, you got me interested in doing more historical research into electrical specs and circuit diagrams (which did come with the IBM 5150). The bus in the original IBM PC had no problem with "bus capacity being charged" as you put it. Perhaps you don't remember that the I/O bus had the same electrical characteristics as the memory bus. Thus there is no issue with the bus being "charged". The issue of delays between I/O instructions was entirely a problem of whether the adapter card could clock data into its buffers and handle the clocked-in data in time for another bus cycle. This had nothing to do with "charging" - buses in those days happily handled edges that were much faster than 1 usec. We at Software Arts did what we did based on direct measurements and testing. We found that the early BIOS listings were usually fine, but in fact were misleading. After all, the guys who built the machine and wrote the BIOS were in a hurry. There were errata.

linux-os (Dick Johnson) wrote:

You do remember that the X86 can do back-to-back port instructions faster than the ISA bus capacity can be charged, don't you? You do remember the admonishment about:

intel asm
mov dx, port    ; One of two adjacent ports
mov al, ffh     ; All bits set
out dx, al      ; Output to port, bits start charging bus
inc al          ; Al becomes 0
inc dx          ; Next port
out dx, al      ; Write 0 there, data bits discharged

When the port at 'port' gets its data, it will likely be 0, not 0xff, because the intervening instructions can execute faster than a heavily-loaded ISA bus.
Re: RFC: outb 0x80 in inb_p, outb_p harmful on some modern AMD64 with MCP51 laptops
Dick - I didn't work for Don in Boca. I did know him, having met with him several times when he was still alive. I worked from 1979-1985 as a consultant, and eventually VP R&D, at Software Arts in Cambridge, MA, and there was a machine we developed the first IBM VisiCalc for, in a locked room which required NDA sign-in, with a list of authorized employees and consultants. The machine was a plywood board. It was not a 5150, yet. Note that I did not say I worked in Boca. Funny thing to twist my comments into that assertion.

Note: I am not trying to say that I know everything about the history of PC-compatibles, nor am I trying to prove some kind of macho thing. But I do happen to have a lot of practical experience in this space. In particular, I am trying to contribute to Linux to make it better. Largely because I think you guys are doing a great thing, and as a user of it, I think it's a good thing to give back.
Re: RFC: outb 0x80 in inb_p, outb_p harmful on some modern AMD64 with MCP51 laptops
Alan Cox wrote:

The vga driver is somewhat misnamed. In console mode we handle everything back to MDA/HGA and some HGA adapters do need delays.

No they don't. I really, really, really know this for a fact. I wrote ASM drivers for every early video adapter and ran them all through Lotus QA and Software Arts QA. Personally. The only delay needed is caused by not having dual-ported video frame buffers on the original CGA in high res character mode. This caused "snow" when a memory write was done concurrently with the read being done by the scanline character generator. And that delay was done by waiting for a bit in the I/O port space to change. There was NO reason to do waits between I/O instructions.

Produce a spec sheet, or even better a machine.

I may have an original PC-XT still lying around in the attic, don't know if I can fire it up, but it had such graphics cards. I also have several early high-speed clones that were "overclocked".

I do remind all that 0x80 is a BIOS-specific standard, and is per BIOS - other ports have been used in the history of the IBM PC family by some vendors, and some machines have used it for DMA port mapping!! And

All do - that's why it is suitable.

Not true. Again, I can produce machines that don't use 0x80. Perhaps that is because I am many years older than you are, and have been writing code for PC compatibles since 1981. (Not a typo - this was before the first IBM PC was released to the public.)

Windows XP does NOT use it at all. Therefore it may not be supported by

Older Windows does. Don't know about XP although DOS apps in XP will but they may virtualise the port.

Show me one line of Windows code written by Microsoft that uses port 80. I don't know what app hackers might have done - there was no protection, and someone might have copied the BIOS for some reason.
I have a simple patch that fixes my primary concern - just change the CMOS_READ and CMOS_WRITE, 64-bit versions of I/O and bootcode vga accesses (first group below) to use the straight inb and outb code.

Which requires care. Have you verified all the main chipset vendors?

I obviously have not. Clearly the guys who want this port 80 hack so desperately have not either. That's why we are in this pickle. (Well, "we" only to the extent that I am accepted as having useful input. I'm happy to go if I'm not perceived as being helpful.)

I may submit it so that the many others who share my pain will be made

All .. none of them?

There is a long standing set of reports of "hwclock" not working on HP dv.000v laptops, where the . stands for 2, 4, 6, and 9. These are all nVidia MCP51 chipset AMD64's. And if you choose to be such an insulting , I may just stop trying to be helpful. I presume that others in the community find my comments helpful.

I can do some of these off the top of my head:

drivers/net/8390.h: Needed for some 8390 devices on ISA bus
drivers/net/de600.c, drivers/net/de600.h: Uses the parallel port, which isn't guaranteed to be full ISA speed.
drivers/scsi/ppa.h: Parallel port
drivers/serial/8250.c: Some PCs need delays for certain ops.
drivers/watchdog/wdt_pci.c: That one is a mistake I believe, I'll dig out the docs.
Re: RFC: outb 0x80 in inb_p, outb_p harmful on some modern AMD64 with MCP51 laptops
Which port do you want me to test? Also, I can run the timing test on my machine if you share the source code so I can build it.

Rene Herman wrote:

On 11-12-07 17:30, Andi Kleen wrote:

Anyways, it looks like the discussion here is going in a loop. I had hoped David would post his test results with another port so that we know for sure that the bus aborts (and not port 80) are the problem on his box. But it looks like he doesn't want to do this. Still, removing the bus aborts is probably the correct way to go forward.

Yes, I do also still want to know that. David (Reed)?

Only needs a patch now. If nobody beats me to it I'll add one later to my tree.

Pavel Machek already posted one. His udelay(8) wants to be less -- 1 or "to be safe" perhaps 2. http://lkml.org/lkml/2007/12/9/131

Rene.
Re: RFC: outb 0x80 in inb_p, outb_p harmful on some modern AMD64 with MCP51 laptops
As the person who started this thread, I'm still puzzled by two things:

1) Is there anyone out there who wrote one of these drivers (most listed below, from a list of those I needed to patch to eliminate refs to _p calls) or arch specific code (also listed below), who might know why the _p macros are actually needed (for any platform)? Note that many of the devices are not on the ISA/LPC bus now, even if they once were, and the vga has never needed a bus-level pause since the original IBM PC existed. (It did need a sync with retrace, but that's another story.)

2) Why are Opterons and so forth so slow on out's to x80, as the measurements show? That seems to me like there is a hidden bus timeout going on. I'm still trying to figure out what is happening on my machine, which hangs when not in legacy mode (i.e. in ACPI mode) after a lot of out's to x80. Perhaps the bus timeout handling is the issue.

I do remind all that 0x80 is a BIOS-specific standard, and is per BIOS - other ports have been used in the history of the IBM PC family by some vendors, and some machines have used it for DMA port mapping!! And Windows XP does NOT use it at all. Therefore it may not be supported by vendors, and may in fact be used for other purposes, since it can be if the BIOS doesn't use it.

I have a simple patch that fixes my primary concern - just change the CMOS_READ and CMOS_WRITE, 64-bit versions of I/O and bootcode vga accesses (first group below) to use the straight inb and outb code. I may submit it so that the many others who share my pain will be made happy - at least on modern _64 x86 machines those instructions don't need the _p feature. The rest of the drivers and code are just lurking disasters, which I hope can be resolved somehow by the community figuring out what the timing delays were put there for...
arch/x86/boot/compressed/misc_64.c
arch/x86/kernel/i8259_64.c
arch/x86/pci/irq.c
include/asm/floppy.h
include/asm/io_64.h
include/asm/mc146818rtc_64.h
include/asm/vga.h
include/video/vga.h
drivers/video/console/vgacon.c
drivers/video/vesafb.c
drivers/video/vga16fb.c
drivers/acpi/processor_idle.c
drivers/bluetooth/bluecard_cs.c
drivers/char/pc8736x_gpio.c
drivers/char/rocket_int.h
drivers/hwmon/abituguru.c
drivers/hwmon/abituguru3.c
drivers/hwmon/it87.c
drivers/hwmon/lm78.c
drivers/hwmon/pc87360.c
drivers/hwmon/sis5595.c
drivers/hwmon/smsc47b397.c
drivers/hwmon/smsc47m1.c
drivers/hwmon/via686a.c
drivers/hwmon/vt8231.c
drivers/hwmon/w83627ehf.c
drivers/hwmon/w83627hf.c
drivers/hwmon/w83781d.c
drivers/i2c/busses/i2c-amd756.c
drivers/i2c/busses/i2c-i801.c
drivers/i2c/busses/i2c-nforce2.c
drivers/i2c/busses/i2c-viapro.c
drivers/input/misc/pcspkr.c
drivers/isdn/hisax/elsa_ser.c
drivers/isdn/hisax/s0box.c
drivers/misc/sony-laptop.c
drivers/net/8390.h
drivers/net/de600.c
drivers/net/de600.h
drivers/net/irda/nsc-ircc.c
drivers/net/irda/w83977af_ir.c
drivers/net/pcmcia/axnet_cs.c
drivers/net/pcmcia/pcnet_cs.c
drivers/net/wireless/wl3501_cs.c
drivers/scsi/megaraid.c
drivers/scsi/megaraid.h
drivers/scsi/ppa.h
drivers/serial/8250.c
drivers/video/console/vgacon.c
drivers/video/vesafb.c
drivers/video/vga16fb.c
drivers/watchdog/pcwd_pci.c
drivers/watchdog/w83627hf_wdt.c
drivers/watchdog/w83697hf_wdt.c
drivers/watchdog/w83877f_wdt.c
drivers/watchdog/w83977f_wdt.c
drivers/watchdog/wdt_pci.c
Re: RFC: outb 0x80 in inb_p, outb_p harmful on some modern AMD64 with MCP51 laptops
I am going to do a test on another "unused" port. However, I realized something as I was thinking about this: 0x80 is the "diagnostic device" port. It is not an "unused" port.

Normally, Linux would support a device like the diagnostic device by providing a character device, called /dev/post-diagnosis (for the power-on self-test diagnostic). That device would reserve port 80 for its use, and the driver could be loaded if there was such a device.

Now one possibility is that my laptop contains a diagnostic code device that stores all the out's to port 80 (documented only to the designers, and kept "secret"). That device may need "clearing" periodically, which is perhaps done by the SMM, which is turned off when I go into ACPI-on state. Or maybe it is designed to be cleared only when the system boots at the beginning of the BIOS. What happens when (as happens in hwclock's polling of the RTC) thousands of in/out*_p calls are made very fast? Well, perhaps it is not cleared quickly enough, and hangs the bus.

The point here is that Linux is NOT using a defined-to-be "unused" port. It IS using the "diagnostic" port, and talking to a diagnostic device that *is* used, and may be present. Just doesn't seem clean to me.

So I'd suggest 2 actions:

1) Figure out a better implementation of _p that is "safe" and doesn't use questionable heuristics. udelay seems reasonable because it doesn't drive contention on the busses on SMP machines, but perhaps someone has a better idea.

2) Start a background task with the maintainers of drivers to clean up their code regarding these short delays for slow devices (note that it's never because the *bus* is slow, but because the *device* is slow). Perhaps this could be helped by "deprecating" the _p calls and suggesting an alternative that requires the coder to be precise about what the delay is for, and how long it is supposed to be, perhaps on a per-machine basis.
Re: RFC: outb 0x80 in inb_p, outb_p harmful on some modern AMD64 with MCP51 laptops
Alan Cox wrote:

0x80 should be fine for anything PC compatible anyway, it's specifically reserved as a debug port and supported for *exactly* that purpose by many chipsets.

Disagree. The definitions of PC compatible are quite problematic. I have the advantage over some of you young guys, in that I actually wrote code on one of the first 5 breadboard IBM PCs on the planet at Software Arts, Inc., and I was directly involved in hardware spec projects with the original IBM and Compaq engineers. No one actually defined the port numbered 80h as a "standard" for anything. You won't find it documented in any early manual for an IBM machine. The ISA bus supported unterminated transactions safely. That allowed some clever folks to design BIOS diagnostic tools that optionally plugged into the bus. In any case, my machine does not have an ISA bus. Why should it? It's a laptop!

Now the interesting thing is that I have been scanning the source code of Linux, and I find gazillions of inb_p, outb_p and so forth instructions where they have NO value. It's as if some hacker who half understood hardware threw in the _p "just to be safe". Well, it's neither safe, nor is it economical of code or data. It hangs up the bus on an MP machine, for example, even when it works, to do the delay by "outb al,80h". Worse, the actual requirements of the gazillions of inb_p instructions for delays are not documented in the code!

This is interesting, because the number of devices likely to need a delay after providing data on an "in" instruction is very likely to be near zero. After all, the device has already serviced the bus and delivered data! Why put many microseconds into the bus, locking out other ISA transactions (and PCI maybe too), with an out to port 80?

Some of the code in Linux is really nice, really clean, really well-thought out. Some is ... well, I'm not trolling for a fight.
Re: RFC: outb 0x80 in inb_p, outb_p harmful on some modern AMD64 with MCP51 laptops
Andi Kleen wrote:

Changing the delay instruction sequence from the outb to short jumps might be the safe thing.

I don't think that makes sense to do on anything modern. The trouble is that the jumps will effectively execute near "infinitely fast" on any modern CPU compared to the bus. But the delay really needs to be something that is about I/O port speed.

This all presumes that you need any delay at all. From back in the early days (when I was writing DOS and BIOS code on 80286 class machines), the /only/ reason this was a problem was using really slow acting, non-buffered chips compared to the processor clock (8259?). If you think about it, if there is a sequence such as outb->device, inb<-device, the only reason for a delay would be that the device failed to process the out command, /and/ the device had no "done" flag. The other "slow" problem would be an out->device, out->device at a rate higher than the device could handle, because it had a one-level buffer that ignored input that came too fast after the previous, but didn't stall the bus to protect the device. Modern machines just are not designed that way - a few of the early PC compatibles were. My machine in question, for example, needs no waiting within CMOS_READs at all. And I doubt any other chip/device needs waiting that isn't already provided by the bus. The I/O to port 80 is very, very odd in this context.

Actually, modern machines have potentially more serious problems with I/O ops to non-existent addresses, which may cause real bus weirdness. So that's why I suggested the short-jump answer - it fixes the problem on the ancient machines, but doesn't do anything on the modern ones, where there should be no problem.

One patch that makes immediate sense is to use the "virtualization" hooks for the CMOS_READ/WRITE ops that are there in the 32-bit code to allow substitution of a workable sequence for the RTC, which is where I experience the problem on my machine.
This doesn't fix any lurking issues with the _p APIs, since they are not virtualized. I'd suggest the safest possible route that would fix my machine would be either an early_quirk, a boot parameter, or both, that would then control the virtualization hook logic. That patch would fix my machine's current issues, and would not harm any machines that need the 0x80 delay.

But I know it leaves a lurking issue for another day - for all the other inb_p and outb_p code in the kernel drivers. A grep suggests that they are used only in somewhat less modern drivers - perhaps for legacy machines. I don't think any such drivers are used on any of my machines.
Re: RFC: outb 0x80 in inb_p, outb_p harmful on some modern AMD64 with MCP51 laptops
Pardon my ignorance, but is port 0xed really safe to push an out cycle at across the entire x86_64 family? How long must real _p pauses be in reality? (And who cares about what the code calls "really slow i/o"?) Why are we waiting at all? I read the comments in io_64.h, and am a bit mystified. Does Windoze or DOS do this magical mystery wait?

Anyway, the virtualization hooks in 32-bit x86 almost make it possible to isolate this simply - maybe - after the merge of 32/64 being contemplated. And anyone who knows what the chipset might be doing with the 80 port rather than POST codes, perhaps could contribute. Any nVidia folks who know what's happening under NDA? Any Phoenix BIOS types?

I think the worst of the problems would be fixed by changing just the CMOS_READ/CMOS_WRITE macros. But the danger lingers in the *_p code.

Rene Herman wrote:

On 07-12-07 01:23, Robert Hancock wrote:

David P. Reed wrote:

After much, much testing (months, off and on, pursuing hypotheses), I've discovered that the use of "outb al,0x80" instructions to "delay" after inb and outb instructions causes solid freezes on my HP dv9000z laptop, when ACPI is enabled. It takes a fair number of out's to 0x80, but the hard freeze is reliably reproducible by writing a driver that solely does a loop of 50 outb's to 0x80 and calling it in a loop 1000 times from user space.

!!!

The serious impact is that the /dev/rtc and /dev/nvram devices are very unreliable - thus "hwclock" freezes very reliably while looping waiting for a new second value, and calling "cat /dev/nvram" in a loop freezes the machine if done a few times in a row. This is reproducible, but requires a fair number of outb's to the 0x80 diagnostic port, and seems to require ACPI to be on. io_64.h is the source of these particular instructions, via the CMOS_READ and CMOS_WRITE macros, which are defined in mc146818_64.h. (I wonder if the same problem occurs in 32-bit mode.)
I'm happy to complete and test a patch, but I'm curious what the right approach ought to be. I have to say I have no clue as to what ACPI is doing on this chipset (nvidia MCP51) that would make port 80 do this. A raw random guess is that something is logging POST codes, but if so, not clear what is problematic in ACPI mode. ANy help/suggestions? Changing the delay instruction sequence from the outb to short jumps might be the safe thing. But Linus, et al. may have experience with that on other architectures like older Pentiums etc. The fact that these "pausing" calls are needed in the first place seems rather cheesy. If there's hardware that's unable to respond to IO port writes as fast as possible, then surely there's a better solution than trying to stall the IOs by an arbitrary and hardware-dependent amount of time, like udelay calls, etc. Does any remotely recent hardware even need this? The idea is that the delay is not in fact hardware dependent. With in the the absense of a POST board port 0x80 being sort of guaranteeed to not be decoded on PCI but forwarded to and left to die on ISA/LPC one should get the effect that the _next_ write will have survived an ISA/LPC bus address cycle acknowledgement timeout. I believe. And no, I don't believe any remotely recent hardware needs it and have in fact wondered about it since actual 386 days, having since that time never found a device that wouldn't in fact take back to back I/O even. Even back then (ie, legacy only systems, no forwarding from PCI or anything) BIOSes provided ISA bus wait-state settings which should be involved in getting insanely stupid and old hardware to behave... Port 0xed has been suggested as an alternate port. Probably not a great "fix" but if replacing the out with a simple udelay() isn't that simple (during early boot I gather) then it might at least be something for you to try. 
I'd hope that the 0x80 in include/asm/io.h:native_io_delay() would be the only one you are running into, so you could change that to 0xed and see what catches fire. If there are no sensible fixes, an 0x80/0xed choice could I assume be hung of DMI or something (if that _is_ parsed soon enough). Rene. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
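The alternative debated above - replacing the port-0x80 write with a delay that is a function of time rather than of chipset decoding - can be approximated in user space with a busy-wait on a monotonic clock. This is only a sketch of the idea, not kernel code; the 1000 ns figure is an assumption standing in for the duration of an ISA/LPC bus cycle, not a measured timing.

```c
#define _POSIX_C_SOURCE 199309L
#include <time.h>

/* Sketch of a time-based io_delay: spin on CLOCK_MONOTONIC until at
 * least `ns` nanoseconds have elapsed, instead of issuing a write to
 * port 0x80 and relying on the bus to stall. In the kernel this role
 * would be played by a calibrated udelay(). */
static void timed_io_delay(long ns)
{
	struct timespec start, now;

	clock_gettime(CLOCK_MONOTONIC, &start);
	do {
		clock_gettime(CLOCK_MONOTONIC, &now);
	} while ((now.tv_sec - start.tv_sec) * 1000000000L +
		 (now.tv_nsec - start.tv_nsec) < ns);
}
```

The point of the sketch is only that the pause becomes hardware-independent by construction, rather than by hoping port 0x80 is left to die on the LPC bus.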
RFC: outb 0x80 in inb_p, outb_p harmful on some modern AMD64 with MCP51 laptops
After much, much testing (months, off and on, pursuing hypotheses), I've discovered that the use of "outb al,0x80" instructions to "delay" after inb and outb instructions causes solid freezes on my HP dv9000z laptop, when ACPI is enabled. It takes a fair number of out's to 0x80, but the hard freeze is reliably reproducible by writing a driver that solely does a loop of 50 outb's to 0x80 and calling it in a loop 1000 times from user space. !!!

The serious impact is that the /dev/rtc and /dev/nvram devices are very unreliable - thus "hwclock" freezes very reliably while looping waiting for a new second value and calling "cat /dev/nvram" in a loop freezes the machine if done a few times in a row. This is reproducible, but requires a fair number of outb's to the 0x80 diagnostic port, and seems to require ACPI to be on.

io_64.h is the source of these particular instructions, via the CMOS_READ and CMOS_WRITE macros, which are defined in mc146818_64.h. (I wonder if the same problem occurs in 32-bit mode.)

I'm happy to complete and test a patch, but I'm curious what the right approach ought to be. I have to say I have no clue as to what ACPI is doing on this chipset (nvidia MCP51) that would make port 80 do this. A raw random guess is that something is logging POST codes, but if so, not clear what is problematic in ACPI mode. Any help/suggestions? Changing the delay instruction sequence from the outb to short jumps might be the safe thing. But Linus, et al. may have experience with that on other architectures like older Pentiums etc.
Re: [PATCH] x86: on x86_64, correct reading of PC RTC when update in progress in time_64.c
There are a couple of things I don't understand on this one. And I presume you thought the other two bug fixing patches I sent before this were OK to go, since on my system

Thomas Gleixner wrote:
> Still whitespace wreckage in your patches. I guess the kernel tree you
> made your patches against is already white space wrecked. I fixed that up
> manually, but please be more careful about that next time.

Um ... I fixed the whitespaces I detected from the first round with checkpatch.pl. Then for good measure I ran checkpatch.pl on the patches, then pasted the files directly into the emails. No problems detected. And I also just tried checkpatch.pl on the "sent" folder copy. No problems detected there. Where was the whitespace? Was it in the patches? Would you mind showing me the output so I can do a better job in the future?

>> Correct potentially unstable PC RTC time register reading in time_64.c
>>
>> Stop the use of an incorrect technique for reading the standard PC RTC
>> timer, which is documented to "disconnect" time registers from the bus
>> while updates are in progress. The use of the UIP flag while interrupts
>> are disabled to protect a 244 microsecond window is one of the Motorola
>> spec sheet's documented ways to read the RTC time registers reliably.
>> The patch updates the misleading comments and also minimizes the amount
>> of time that the kernel disables interrupts during the reading.
>
> While I think that the UIP change is correct and a must have, the locking
> change is not really useful. read_persistent_clock is called from exactly
> three places:

What locking change? I didn't change how locking works in read_persistent_clock at all. I did introduce cpu_relax() because if anyone else ever calls from a hot path, that would be good practice, and it's one line.

> Right after boot, right before suspend and right after resume. None of
> those places is a hot path, where we really care about the interrupt
> enable/disable. IIRC, this is even called with interrupts disabled most
> of the time, so no real gain here. Another reason not to do the locking
> change is the paravirt stuff which is coming for 64bit. I looked into the
> existing 32bit code and doing the same lock thing would introduce a real
> nasty hackery, which is definitely not worth the trouble.

I presume time_64.c and time_32.c will be unified at some point, discarding time_64.c. There's no arch-specific reason to be separate. The current time_32.c depends on a different nmi path (that does some weird stuff saving and restoring the CMOS index register!), and I didn't dare usurp your long-term plan to unify architectures. But a simple cleanup here makes sense, lest someone copy the bad technique as if it were good.

> Thanks,
>
> 	tglx

Signed-off-by: David P. Reed <[EMAIL PROTECTED]>
---
Index: linux-2.6/arch/x86/kernel/time_64.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/time_64.c
+++ linux-2.6/arch/x86/kernel/time_64.c
@@ -160,22 +160,30 @@ unsigned long read_persistent_clock(void
 	unsigned long flags;
 	unsigned century = 0;
 
-	spin_lock_irqsave(&rtc_lock, flags);
+ retry:	spin_lock_irqsave(&rtc_lock, flags);
+	/* if UIP is clear, then we have >= 244 microseconds before RTC
+	 * registers will be updated.  Spec sheet says that this is the
+	 * reliable way to read RTC - registers invalid (off bus) during update
+	 */
+	if ((CMOS_READ(RTC_FREQ_SELECT) & RTC_UIP)) {
+		spin_unlock_irqrestore(&rtc_lock, flags);
+		cpu_relax();
+		goto retry;
+	}
 
-	do {
-		sec = CMOS_READ(RTC_SECONDS);
-		min = CMOS_READ(RTC_MINUTES);
-		hour = CMOS_READ(RTC_HOURS);
-		day = CMOS_READ(RTC_DAY_OF_MONTH);
-		mon = CMOS_READ(RTC_MONTH);
-		year = CMOS_READ(RTC_YEAR);
+	/* now read all RTC registers while stable with interrupts disabled */
+
+	sec = CMOS_READ(RTC_SECONDS);
+	min = CMOS_READ(RTC_MINUTES);
+	hour = CMOS_READ(RTC_HOURS);
+	day = CMOS_READ(RTC_DAY_OF_MONTH);
+	mon = CMOS_READ(RTC_MONTH);
+	year = CMOS_READ(RTC_YEAR);
 
 #ifdef CONFIG_ACPI
-	if (acpi_gbl_FADT.header.revision >= FADT2_REVISION_ID &&
-	    acpi_gbl_FADT.century)
-		century = CMOS_READ(acpi_gbl_FADT.century);
+	if (acpi_gbl_FADT.header.revision >= FADT2_REVISION_ID &&
+	    acpi_gbl_FADT.century)
+		century = CMOS_READ(acpi_gbl_FADT.century);
 #endif
-	} while (sec != CMOS_READ(RTC_SECONDS));
-
 	spin_unlock_irqrestore(&rtc_lock, flags);
 
 	/*
[PATCH] x86: on x86_64, correct reading of PC RTC when update in progress in time_64.c
Correct potentially unstable PC RTC time register reading in time_64.c

Stop the use of an incorrect technique for reading the standard PC RTC timer, which is documented to "disconnect" time registers from the bus while updates are in progress. The use of the UIP flag while interrupts are disabled to protect a 244 microsecond window is one of the Motorola spec sheet's documented ways to read the RTC time registers reliably. The patch updates the misleading comments and also minimizes the amount of time that the kernel disables interrupts during the reading.

Signed-off-by: David P. Reed <[EMAIL PROTECTED]>
---
Index: linux-2.6/arch/x86/kernel/time_64.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/time_64.c
+++ linux-2.6/arch/x86/kernel/time_64.c
@@ -160,22 +160,30 @@ unsigned long read_persistent_clock(void
 	unsigned long flags;
 	unsigned century = 0;
 
-	spin_lock_irqsave(&rtc_lock, flags);
+ retry:	spin_lock_irqsave(&rtc_lock, flags);
+	/* if UIP is clear, then we have >= 244 microseconds before RTC
+	 * registers will be updated.  Spec sheet says that this is the
+	 * reliable way to read RTC - registers invalid (off bus) during update
+	 */
+	if ((CMOS_READ(RTC_FREQ_SELECT) & RTC_UIP)) {
+		spin_unlock_irqrestore(&rtc_lock, flags);
+		cpu_relax();
+		goto retry;
+	}
 
-	do {
-		sec = CMOS_READ(RTC_SECONDS);
-		min = CMOS_READ(RTC_MINUTES);
-		hour = CMOS_READ(RTC_HOURS);
-		day = CMOS_READ(RTC_DAY_OF_MONTH);
-		mon = CMOS_READ(RTC_MONTH);
-		year = CMOS_READ(RTC_YEAR);
+	/* now read all RTC registers while stable with interrupts disabled */
+
+	sec = CMOS_READ(RTC_SECONDS);
+	min = CMOS_READ(RTC_MINUTES);
+	hour = CMOS_READ(RTC_HOURS);
+	day = CMOS_READ(RTC_DAY_OF_MONTH);
+	mon = CMOS_READ(RTC_MONTH);
+	year = CMOS_READ(RTC_YEAR);
 
 #ifdef CONFIG_ACPI
-	if (acpi_gbl_FADT.header.revision >= FADT2_REVISION_ID &&
-	    acpi_gbl_FADT.century)
-		century = CMOS_READ(acpi_gbl_FADT.century);
+	if (acpi_gbl_FADT.header.revision >= FADT2_REVISION_ID &&
+	    acpi_gbl_FADT.century)
+		century = CMOS_READ(acpi_gbl_FADT.century);
 #endif
-	} while (sec != CMOS_READ(RTC_SECONDS));
-
 	spin_unlock_irqrestore(&rtc_lock, flags);
 
 	/*
[PATCH] time: fix typo that makes sync_cmos_clock erratic
Fix a typo in ntp.c that has caused updating of the persistent (RTC) clock when synced to NTP to behave erratically.

When debugging a freeze that arises on my AMD64 machines when I run the ntpd service, I added a number of printk's to monitor the sync_cmos_clock procedure. I discovered that it was not syncing to cmos RTC every 11 minutes as documented, but instead would keep trying every second for hours at a time. The reason turned out to be a typo in sync_cmos_clock, where it attempts to ensure that update_persistent_clock is called very close to 500 msec after a 1 second boundary (required by the PC RTC's spec). That typo referred to "xtime" in one spot, rather than "now", which is derived from "xtime" but not equal to it. This makes the test erratic, creating a "coin-flip" that decides when update_persistent_clock is called - when it is called, which is rarely, it may be at any time during the one second period, rather than close to 500 msec, so the value written is needlessly incorrect, too.

Signed-off-by: David P. Reed
---
I have thoroughly tested this on several x86_64 machines, and also on one 32-bit machine. I have also submitted a patch to fix the freeze cited above, but both patches are independent, and should be treated separately.

Index: linux-2.6/kernel/time/ntp.c
===================================================================
--- linux-2.6.orig/kernel/time/ntp.c
+++ linux-2.6/kernel/time/ntp.c
@@ -205,7 +205,7 @@ static void sync_cmos_clock(unsigned lon
 		return;
 
 	getnstimeofday(&now);
-	if (abs(xtime.tv_nsec - (NSEC_PER_SEC / 2)) <= tick_nsec / 2)
+	if (abs(now.tv_nsec - (NSEC_PER_SEC / 2)) <= tick_nsec / 2)
 		fail = update_persistent_clock(now);
 
 	next.tv_nsec = (NSEC_PER_SEC / 2) - now.tv_nsec;
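The window test at the heart of this fix can be modeled in isolation. A user-space sketch of the corrected condition (the helper name is invented for illustration, and tick_nsec is assumed to be the HZ=1000 value of 1,000,000 ns):

```c
#include <stdlib.h>

#define NSEC_PER_SEC 1000000000L

/* Assumed tick length for HZ=1000; illustration only */
static const long tick_nsec = 1000000L;

/* Model of the corrected test in sync_cmos_clock(): write the RTC only
 * when the current time-of-day nanosecond count is within half a tick
 * of the 500 msec point. The bug was testing xtime.tv_nsec here rather
 * than now.tv_nsec, making the comparison effectively random. */
static int close_enough_to_half_second(long now_tv_nsec)
{
	return labs(now_tv_nsec - (NSEC_PER_SEC / 2)) <= tick_nsec / 2;
}
```

With these assumed values, the write window is 500 ms +/- 0.5 ms within each second, which is why a value derived from the actual `now` sample must be used.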
[PATCH] x86: fix freeze in x86_64 RTC update code in time_64.c
Fix hard freeze on x86_64 when the ntpd service calls update_persistent_clock()

A repeatable but randomly timed freeze has been happening in Fedora 6 and 7 for the last year, whenever I run the ntpd service on my AMD64x2 HP Pavilion dv9000z laptop. This freeze is due to the use of spin_lock(&rtc_lock) under the assumption (per a bad comment) that set_rtc_mmss is called only with interrupts disabled. The call from ntp.c to update_persistent_clock is made with interrupts enabled.

Signed-off-by: David P. Reed <[EMAIL PROTECTED]>
---
I have thoroughly tested this patch on a number of x86_64 machines with various numbers of cores and chipsets, using 2.6.24rc2 kernel source.

Index: linux-2.6/arch/x86/kernel/time_64.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/time_64.c
+++ linux-2.6/arch/x86/kernel/time_64.c
@@ -82,18 +82,15 @@ static int set_rtc_mmss(unsigned long no
 	int retval = 0;
 	int real_seconds, real_minutes, cmos_minutes;
 	unsigned char control, freq_select;
+	unsigned long flags;
 
 	/*
-	 * IRQs are disabled when we're called from the timer interrupt,
-	 * no need for spin_lock_irqsave()
+	 * set_rtc_mmss is called when irqs are enabled, so disable irqs here
 	 */
-
-	spin_lock(&rtc_lock);
-
+	spin_lock_irqsave(&rtc_lock, flags);
 	/*
 	 * Tell the clock it's being set and stop it.
 	 */
-
 	control = CMOS_READ(RTC_CONTROL);
 	CMOS_WRITE(control | RTC_SET, RTC_CONTROL);
 
@@ -138,7 +135,7 @@ static int set_rtc_mmss(unsigned long no
 	CMOS_WRITE(control, RTC_CONTROL);
 	CMOS_WRITE(freq_select, RTC_FREQ_SELECT);
 
-	spin_unlock(&rtc_lock);
+	spin_unlock_irqrestore(&rtc_lock, flags);
 
 	return retval;
 }
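The essential property of the irqsave variant is that it records the caller's interrupt state and puts it back afterwards, so the section is safe whether the caller arrives with interrupts enabled (the ntpd path) or already disabled (a timer-interrupt path). A small user-space model of that behavior - the flag variable and helper names are stand-ins invented for this sketch, not kernel code:

```c
/* Models the CPU's interrupt-enable flag */
static int irqs_enabled = 1;

static void local_irq_save_model(unsigned long *flags)
{
	*flags = (unsigned long)irqs_enabled;	/* remember caller's state */
	irqs_enabled = 0;			/* interrupts off in the section */
}

static void local_irq_restore_model(unsigned long flags)
{
	irqs_enabled = (int)flags;		/* back to whatever the caller had */
}

/* The RTC write section, shaped like the fixed set_rtc_mmss(): it no
 * longer assumes anything about the caller's interrupt state. */
static void rtc_write_section(void)
{
	unsigned long flags;

	local_irq_save_model(&flags);
	/* ... CMOS register writes would happen here, interrupts off ... */
	local_irq_restore_model(flags);
}
```

The plain spin_lock() version implicitly assumed the first case never happened, which is exactly the bad comment the patch removes.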
Re: [PATCH] x86: fix locking and sync bugs in x86_64 RTC code in time_64.c
Cool. More reason to separate this into two.

Matt Mackall wrote:
> On Wed, Nov 14, 2007 at 08:10:22AM -0500, David P. Reed wrote:
>> Will make two patches and resend.
>
> I've already got a set of patches brewing to fix the UIP problem across
> all the affected arches (11+).
Re: [PATCH] x86: fix locking and sync bugs in x86_64 RTC code in time_64.c
Will make two patches and resend.

Thomas Gleixner wrote:
> David,
>
> On Mon, 12 Nov 2007, David P. Reed wrote:
>> From: David P. Reed
>>
>> Fix two bugs in arch/x86/kernel/time_64.c that affect the x86_64 kernel.
>> 1) a repeatable hard freeze due to interrupts when the ntpd service
>> calls update_persistent_clock(), 2) potentially unstable PC RTC timer
>> values when timer is read.
>
> 1) Please send separate patches for separate problems.
>
> 2) Your patch is white space damaged and does not apply. Please run it
> through scripts/checkpatch.pl before submitting, maybe send it to
> yourself first and verify that it applies correctly.
>
>> Signed-off-by: David P. Reed <[EMAIL PROTECTED]>
>> ---
>> More explanation:
>> 1) A repeatable but randomly timed freeze has been happening in Fedora 6
>> and 7 for the last year, whenever I run the ntpd service on my AMD64x2
>> HP Pavilion dv9000z laptop. This freeze is due to the use of
>> spin_lock(&rtc_lock) under the assumption (per a bad comment) that
>> set_rtc_mmss is called only with interrupts disabled. The call from
>> ntp.c to update_persistent_clock is made with interrupts enabled.
>
> Good catch.
>
>> 2) the use of an incorrect technique for reading the standard PC RTC
>> timer, which is documented to "disconnect" time registers from the bus
>> while updates are in progress. The use of the UIP flag while interrupts
>> are disabled to protect a 244 microsecond window is one of the Motorola
>> spec sheet's documented ways to read the RTC time registers reliably. I
>> realize that not all "clones" of the
>
> of the what ?
>
> Also put the detailed description into the patch comment, please.
>
>> The patch updates the misleading comments and minimizes the amount of
>> time that the kernel disables interrupts. I have thoroughly tested this
>> patch on a number of x86_64 machines with various numbers of cores and
>> chipsets, using 2.6.24rc2 kernel source. Note that while testing the ntp
>> code I found another bug in kernel/time/ntp.c which is independent of
>> this fix - neither patch requires the other. If possible, I'd love to
>> see the patch merged with 2.6.24, and even with 2.6.23.
>
> Care to resend ?
>
> Thanks,
>
> 	tglx
[PATCH] time: fix typo that makes sync_cmos_clock erratic
From: David P. Reed

Fix a typo in ntp.c that has caused updating of the persistent (RTC) clock when synced to NTP to behave erratically.

Signed-off-by: David P. Reed
---
When debugging a freeze that arises on my AMD64 machines when I run the ntpd service, I added a number of printk's to monitor the sync_cmos_clock procedure. I discovered that it was not syncing to cmos RTC every 11 minutes as documented, but instead would keep trying every second for hours at a time. The reason turned out to be a typo in sync_cmos_clock, where it attempts to ensure that update_persistent_clock is called very close to 500 msec after a 1 second boundary (required by the PC RTC's spec). That typo referred to "xtime" in one spot, rather than "now", which is derived from "xtime" but not equal to it. This makes the test erratic, creating a "coin-flip" that decides when update_persistent_clock is called - when it is called, which is rarely, it may be at any time during the one second period, rather than close to 500 msec, so the value written is needlessly incorrect, too.

I rewrote the code to fix the typo and to be a little more clear about the logic that was intended here. I have thoroughly tested this on several x86_64 machines, and also on one 32-bit machine. I have also submitted a patch to fix the freeze cited above, but both patches are independent, and should be treated separately. The patch works in 2.6.24rc2, but it should also work in 2.6.23.

Index: linux-2.6/kernel/time/ntp.c
===================================================================
--- linux-2.6.orig/kernel/time/ntp.c
+++ linux-2.6/kernel/time/ntp.c
@@ -188,6 +188,7 @@ static DEFINE_TIMER(sync_cmos_timer, syn
 static void sync_cmos_clock(unsigned long dummy)
 {
 	struct timespec now, next;
+	long delta_from_target_time;
 	int fail = 1;
 
 	/*
@@ -205,7 +206,10 @@ static void sync_cmos_clock(unsigned lon
 		return;
 
 	getnstimeofday(&now);
-	if (abs(xtime.tv_nsec - (NSEC_PER_SEC / 2)) <= tick_nsec / 2)
+	delta_from_target_time = abs(now.tv_nsec - (NSEC_PER_SEC / 2));
+
+	/* set CMOS clock, only if close enough to 500 msec point */
+	if (delta_from_target_time <= tick_nsec / 2)
 		fail = update_persistent_clock(now);
 
 	next.tv_nsec = (NSEC_PER_SEC / 2) - now.tv_nsec;
[PATCH] x86: fix locking and sync bugs in x86_64 RTC code in time_64.c
From: David P. Reed

Fix two bugs in arch/x86/kernel/time_64.c that affect the x86_64 kernel. 1) a repeatable hard freeze due to interrupts when the ntpd service calls update_persistent_clock(), 2) potentially unstable PC RTC timer values when timer is read.

Signed-off-by: David P. Reed <[EMAIL PROTECTED]>
---
More explanation:

1) A repeatable but randomly timed freeze has been happening in Fedora 6 and 7 for the last year, whenever I run the ntpd service on my AMD64x2 HP Pavilion dv9000z laptop. This freeze is due to the use of spin_lock(&rtc_lock) under the assumption (per a bad comment) that set_rtc_mmss is called only with interrupts disabled. The call from ntp.c to update_persistent_clock is made with interrupts enabled.

2) the use of an incorrect technique for reading the standard PC RTC timer, which is documented to "disconnect" time registers from the bus while updates are in progress. The use of the UIP flag while interrupts are disabled to protect a 244 microsecond window is one of the Motorola spec sheet's documented ways to read the RTC time registers reliably. I realize that not all "clones" of the

The patch updates the misleading comments and minimizes the amount of time that the kernel disables interrupts. I have thoroughly tested this patch on a number of x86_64 machines with various numbers of cores and chipsets, using 2.6.24rc2 kernel source. Note that while testing the ntp code I found another bug in kernel/time/ntp.c which is independent of this fix - neither patch requires the other. If possible, I'd love to see the patch merged with 2.6.24, and even with 2.6.23.

Index: linux-2.6/arch/x86/kernel/time_64.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/time_64.c
+++ linux-2.6/arch/x86/kernel/time_64.c
@@ -82,13 +82,12 @@ static int set_rtc_mmss(unsigned long no
 	int retval = 0;
 	int real_seconds, real_minutes, cmos_minutes;
 	unsigned char control, freq_select;
+	unsigned long flags;
 
-	/*
-	 * IRQs are disabled when we're called from the timer interrupt,
-	 * no need for spin_lock_irqsave()
+	/*
+	 * set_rtc_mmss is called when irqs are enabled, so disable irqs here
 	 */
-
-	spin_lock(&rtc_lock);
+	spin_lock_irqsave(&rtc_lock, flags);
 
 	/*
 	 * Tell the clock it's being set and stop it.
@@ -138,7 +137,7 @@ static int set_rtc_mmss(unsigned long no
 	CMOS_WRITE(control, RTC_CONTROL);
 	CMOS_WRITE(freq_select, RTC_FREQ_SELECT);
 
-	spin_unlock(&rtc_lock);
+	spin_unlock_irqrestore(&rtc_lock, flags);
 
 	return retval;
 }
@@ -163,22 +162,30 @@ unsigned long read_persistent_clock(void
 	unsigned long flags;
 	unsigned century = 0;
 
-	spin_lock_irqsave(&rtc_lock, flags);
+ retry:	spin_lock_irqsave(&rtc_lock, flags);
+	/* if UIP is clear, then we have >= 244 microseconds before RTC
+	 * registers will be updated.  Spec sheet says that this is the
+	 * reliable way to read RTC - registers invalid (off bus) during update
+	 */
+	if ((CMOS_READ(RTC_FREQ_SELECT) & RTC_UIP)) {
+		spin_unlock_irqrestore(&rtc_lock, flags);
+		cpu_relax();
+		goto retry;
+	}
 
-	do {
-		sec = CMOS_READ(RTC_SECONDS);
-		min = CMOS_READ(RTC_MINUTES);
-		hour = CMOS_READ(RTC_HOURS);
-		day = CMOS_READ(RTC_DAY_OF_MONTH);
-		mon = CMOS_READ(RTC_MONTH);
-		year = CMOS_READ(RTC_YEAR);
+	/* now read all RTC registers while stable with interrupts disabled */
+
+	sec = CMOS_READ(RTC_SECONDS);
+	min = CMOS_READ(RTC_MINUTES);
+	hour = CMOS_READ(RTC_HOURS);
+	day = CMOS_READ(RTC_DAY_OF_MONTH);
+	mon = CMOS_READ(RTC_MONTH);
+	year = CMOS_READ(RTC_YEAR);
 
 #ifdef CONFIG_ACPI
-	if (acpi_gbl_FADT.header.revision >= FADT2_REVISION_ID &&
-	    acpi_gbl_FADT.century)
-		century = CMOS_READ(acpi_gbl_FADT.century);
+	if (acpi_gbl_FADT.header.revision >= FADT2_REVISION_ID &&
+	    acpi_gbl_FADT.century)
+		century = CMOS_READ(acpi_gbl_FADT.century);
 #endif
-	} while (sec != CMOS_READ(RTC_SECONDS));
-
 	spin_unlock_irqrestore(&rtc_lock, flags);
 
 	/*
Re: RFC: do get_rtc_time() correctly
Alan: Thanks for the comment. I will code a patch, and include a sanity check as you suggested, and send it for review.

Just to clarify one concern your note raised: I understand that SMM/SMI servicing can take a long time, but SMM/SMI shouldn't happen while interrupts are masked using local_irq_disable() [included in spin_lock_irq()], at least on x86 architectures. If SMM/SMI can happen even then, the NMI fix below could be generalized.

My mention of NMI (which by definition can't be masked) is because NMI can happen even while interrupts are masked. This is a timing problem that can't be dealt with by masking interrupts, and NMI's are used for watchdogs, etc. these days. It seems like just a general good thing to be able to ask if an NMI has happened. A per-cpu NMI event count that is incremented every NMI would allow one to detect NMI's that happen during an otherwise masked code sequence by reading it at the beginning and end of the code sequence. Don't know if NMIs are common on other architectures, or if this is an architecture-dependent concern.

Perhaps I'm really talking about two patches here. One for a mechanism to detect NMIs that happen during a critical piece of code (so it can be retried), and one that depends on that to be really proper in reading the RTC reliably.

Alan Cox wrote:
>> So the proper way to read the RTC contents is to read the UIP flag, and
>> if zero, read all the RTC registers with interrupts masked completely,
>> so all reads happen in the 244 usec window. (NMI can still be a problem,
>> but you can have NMI's set a flag that forces a retry.)
>
> SMM/SMI is more likely to be what bumps you 244 usec or more.
>
>> I'm happy to code and test a patch. Rather than just submit a patch, I
>> thought I'd request others' comments on this, since it affects so many
>> architectures. cc me, if you will, as I don't subscribe to LKML, just
>> check it periodically.
>
> Go for it. The other architectures generally inherit it by inheriting
> similar bridge chips or in more modern times the RTC macrocell. It should
> also be possible to debug by putting in an optional sanity check
> initially which checks the read made sense compared to a few more -
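The per-cpu NMI event count idea above can be sketched in user space: sample a counter that the NMI handler increments before and after the critical read, and retry if it changed. Every name here is illustrative, invented for this sketch, not an existing kernel API.

```c
/* Counter the (simulated) NMI handler bumps on every NMI */
static unsigned long nmi_count;

static void simulated_nmi_handler(void)
{
	nmi_count++;
}

/* Run a critical read; if the NMI count changed while it ran, the read
 * may have been torn by the NMI, so discard the value and retry. */
static int read_with_nmi_retry(int (*read_fn)(void))
{
	unsigned long before;
	int value;

	do {
		before = nmi_count;
		value = read_fn();		/* the retryable section */
	} while (nmi_count != before);		/* an NMI hit: try again */
	return value;
}

/* Demonstration reader: "interrupted" by an NMI on its first call */
static int calls;
static int demo_read(void)
{
	if (++calls == 1)
		simulated_nmi_handler();
	return 42;
}
```

In the real kernel the counter would be per-cpu and incremented from the NMI path; the retry loop is exactly the "set a flag that forces a retry" scheme mentioned in the RTC discussion.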
RFC: do get_rtc_time() correctly
In trying to track down a bug related to hwclock hanging my x86_64 machine, I found myself reading include/asm-generic/rtc.h carefully with a chipset spec for several RTC chips (in particular, the granddaddy, the MC146818A) in my hand. I found that the code in get_rtc_time() is very, very odd, and IMO very wrong.

The idea in the comment at the top seems to suggest that the author thought that the UIP flag indicates an update is in progress at that very instant, so one needs to synchronize with the "falling edge" of that flag to ensure that one can read the RTC state without instability in its buffered value.

That is not the way the UIP flag is defined to work. The UIP flag is =1 during a period PRIOR to the actual update, starting 244 usec before the update, and ending when the update is complete. It is done that way (which might seem odd) so that if you read UIP=0, you have a 244 usec window, EVEN IF the UIP were to become =1 just after you read it.

So the proper way to read the RTC contents is to read the UIP flag, and if zero, read all the RTC registers with interrupts masked completely, so all reads happen in the 244 usec window. (NMI can still be a problem, but you can have NMI's set a flag that forces a retry.) If the UIP flag is one, you need to try again. Pseudo-code is as follows:

retry:
	spin_lock_irq(&rtc_lock);
	if (UIP_flag != 0) {
		spin_unlock_irq(&rtc_lock);
		cpu_relax();
		goto retry;
	}
	... read RTC registers ...
	spin_unlock_irq(&rtc_lock);

This should work for all RTC's compatible with the MC146818A, and is also somewhat faster and less masked than the code in the current Linux (not that reading RTC's is crucial for performance, but the current code occasionally *loops with all interrupts masked for 10 msec!* Why anyone thought this necessary, I have no clue.)

I'm happy to code and test a patch. Rather than just submit a patch, I thought I'd request others' comments on this, since it affects so many architectures. cc me, if you will, as I don't subscribe to LKML, just check it periodically.

As noted in the comment, it *is* true that setting the RTC clock needs additional synchronization, which can be done in the drivers, as it seems to be. (Though I would use an API that is designed so that any delay during the set period actually adds to the value being stored, so that delaying during the set_rtc operation would not cause the value stored to be "old".)

- David

PS: I wrote code for 8088 versions of various PC DOS apps back in the '80's, such as VisiCalc and Lotus 1-2-3, and hacked many timing-related drivers back in the day - so I know this chip spec cold, and figured out this odd stuff back then. That's why the weirdness jumped out at me. I'm still an assembly language coder at heart.
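The UIP-gated read can be exercised against a mock CMOS register file. Everything here (the mock reader, the register numbers, the fixed "seconds" value) is an illustrative stand-in for the real CMOS_READ, not kernel code: UIP reads as set for a few polls, then clear, at which point the time registers are read inside the guaranteed 244 usec window.

```c
#define RTC_FREQ_SELECT	0x0A	/* register A, holds the UIP bit */
#define RTC_UIP		0x80
#define RTC_SECONDS	0x00

/* Mock CMOS: UIP appears set for the first few polls, then clear */
static int uip_polls_remaining = 3;

static unsigned char mock_cmos_read(int reg)
{
	if (reg == RTC_FREQ_SELECT)
		return uip_polls_remaining-- > 0 ? RTC_UIP : 0;
	return 30;	/* pretend the seconds register reads 30 */
}

/* The UIP-gated read: retry until UIP is clear, then read inside the
 * >= 244 usec window the MC146818A guarantees after UIP reads as 0. */
static unsigned char read_rtc_seconds(void)
{
	while (mock_cmos_read(RTC_FREQ_SELECT) & RTC_UIP)
		;	/* in the kernel: drop the lock, cpu_relax(), retry */
	return mock_cmos_read(RTC_SECONDS);
}
```

The mock collapses the lock/unlock dance, but the control flow is the same as the pseudo-code in the RFC: poll, bail out and retry while UIP is set, read only once it is clear.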