Re: [PATCH v3 2/3] Fix undefined operation fault that can hang a cpu on crash or panic
On Tuesday, July 7, 2020 3:24pm, "Sean Christopherson" said:
> On Tue, Jul 07, 2020 at 03:09:38PM -0400, David P. Reed wrote:
>>
>> On Tuesday, July 7, 2020 1:09am, "Sean Christopherson"
>> said:
>> Sean, are you the one who would get this particular fix pushed into Linus's
>> tree, by the way? The "maintainership" is not clear to me.
>
> Nope, I'm just here to complain and nitpick :-) There's no direct maintainer
> for virtext.h so it falls under the higher level arch/x86 umbrella, i.e. I
> expect Boris/Thomas/Ingo will pick this up.

Thanks for your time and effort in helping.
Re: [PATCH v3 2/3] Fix undefined operation fault that can hang a cpu on crash or panic
On Tuesday, July 7, 2020 1:09am, "Sean Christopherson" said:
> On Sat, Jul 04, 2020 at 04:38:08PM -0400, David P. Reed wrote:
>> Fix: Mask undefined operation fault during emergency VMXOFF that must be
>> attempted to force cpu exit from VMX root operation.
>> Explanation: When a cpu may be in VMX root operation (only possible when
>> CR4.VMXE is set), crash or panic reboot tries to exit VMX root operation
>> using VMXOFF. This is necessary, because any INIT will be masked while cpu
>> is in VMX root operation, but that state cannot be reliably
>> discerned by the state of the cpu.
>> VMXOFF faults if the cpu is not actually in VMX root operation, signalling
>> undefined operation.
>> Discovered while debugging an out-of-tree x-visor with a race. Can happen
>> due to certain kinds of bugs in KVM.
>>
>> Fixes: 208067 <https://bugzilla.kernel.org/show_bug.cgi?id=208067>
>> Reported-by: David P. Reed
>> Suggested-by: Thomas Gleixner
>> Suggested-by: Sean Christopherson
>> Suggested-by: Andy Lutomirski
>> Signed-off-by: David P. Reed
>> ---
>>  arch/x86/include/asm/virtext.h | 20 ++--
>>  1 file changed, 14 insertions(+), 6 deletions(-)
>>
>> diff --git a/arch/x86/include/asm/virtext.h b/arch/x86/include/asm/virtext.h
>> index 0ede8d04535a..0e0900eacb9c 100644
>> --- a/arch/x86/include/asm/virtext.h
>> +++ b/arch/x86/include/asm/virtext.h
>> @@ -30,11 +30,11 @@ static inline int cpu_has_vmx(void)
>>  }
>>
>>
>> -/* Disable VMX on the current CPU
>> +/* Exit VMX root mode and isable VMX on the current CPU.
>>  *
>>  * vmxoff causes a undefined-opcode exception if vmxon was not run
>> - * on the CPU previously. Only call this function if you know VMX
>> - * is enabled.
>> + * on the CPU previously. Only call this function if you know cpu
>> + * is in VMX root mode.
>>  */
>> static inline void cpu_vmxoff(void)
>> {
>> @@ -47,14 +47,22 @@ static inline int cpu_vmx_enabled(void)
>>  return __read_cr4() & X86_CR4_VMXE;
>>  }
>>
>> -/* Disable VMX if it is enabled on the current CPU
>> +/* Safely exit VMX root mode and disable VMX if VMX enabled
>> + * on the current CPU. Handle undefined-opcode fault
>> + * that can occur if cpu is not in VMX root mode, due
>> + * to a race.
>>  *
>>  * You shouldn't call this if cpu_has_vmx() returns 0.
>>  */
>> static inline void __cpu_emergency_vmxoff(void)
>> {
>> -if (cpu_vmx_enabled())
>> -cpu_vmxoff();
>> +if (!cpu_vmx_enabled())
>> +return;
>> +asm volatile ("1:vmxoff\n\t"
>> + "2:\n\t"
>> + _ASM_EXTABLE(1b, 2b)
>> + ::: "cc", "memory");
>> +cr4_clear_bits(X86_CR4_VMXE);
>
> Open coding vmxoff doesn't make sense, and IMO is flat out wrong as it fixes
> flows that use __cpu_emergency_vmxoff() but leaves the same bug hanging
> around in emergency_vmx_disable_all() until the next patch.
>
> The reason I say it doesn't make sense is that there is no sane scenario
> where the generic vmxoff helper should _not_ eat the fault. All other VMXOFF
> faults are mode related, i.e. any fault is guaranteed to be due to the
> !post-VMXON check unless we're magically in RM, VM86, compat mode, or at
> CPL>0. Given that the whole point of this series is that it's impossible to
> determine whether or not the CPU is post-VMXON if CR4.VMXE=1 without taking a
> fault of some form, there's simply no way that anything except the hypervisor
> (in normal operation) can know the state of VMX. And given that the only
> in-tree hypervisor (KVM) has its own version of vmxoff, that means there is
> no scenario in which cpu_vmxoff() can safely be used. Case in point, after
> the next patch there are no users of cpu_vmxoff().
>
> TL;DR: Just do fixup on cpu_vmxoff().

Personally, I don't care either way, since it fixes the bug either way (and
it's inlined, so either way no additional code is generated).
I was just being conservative, since cpu_vmxoff() is exported throughout the
kernel source, so it might be expected to stay the same (when not in an
"emergency"). I'll wait a day or two for any objections from other commenters
to just doing the fix in cpu_vmxoff(). With no objection, I'll just do it
that way.

Sean, are you the one who would get this particular fix pushed into Linus's
tree, by the way? The "maintainership" is not clear to me. If you are, happy
to take direction from you as the primary input.

>
>> }
>>
>> /* Disable VMX if it is supported and enabled on the current CPU
>> --
>> 2.26.2
>>
>
Re: [PATCH v3 2/3] Fix undefined operation fault that can hang a cpu on crash or panic
On Sunday, July 5, 2020 4:55pm, "Andy Lutomirski" said:
> On Sun, Jul 5, 2020 at 12:52 PM David P. Reed wrote:
>>
>> Thanks, will handle these. 2 questions below.
>>
>> On Sunday, July 5, 2020 2:22pm, "Andy Lutomirski" said:
>>
>> > On Sat, Jul 4, 2020 at 1:38 PM David P. Reed wrote:
>> >>
>> >> Fix: Mask undefined operation fault during emergency VMXOFF that must be
>> >> attempted to force cpu exit from VMX root operation.
>> >> Explanation: When a cpu may be in VMX root operation (only possible when
>> >> CR4.VMXE is set), crash or panic reboot tries to exit VMX root operation
>> >> using VMXOFF. This is necessary, because any INIT will be masked while cpu
>> >> is in VMX root operation, but that state cannot be reliably
>> >> discerned by the state of the cpu.
>> >> VMXOFF faults if the cpu is not actually in VMX root operation, signalling
>> >> undefined operation.
>> >> Discovered while debugging an out-of-tree x-visor with a race. Can happen
>> >> due to certain kinds of bugs in KVM.
>> >
>> > Can you re-wrap lines to 68 characters? Also, the Fix: and
>>
>> I used 'scripts/checkpatch.pl' and it had me wrap to 75 chars:
>> "WARNING: Possible unwrapped commit description (prefer a maximum 75 chars
>> per line)"
>>
>> Should I submit a fix to checkpatch.pl to say 68?
>
> 75 is probably fine too, but something is odd about your wrapping.
> You have long lines mostly alternating with short lines. It's as if
> you wrote 120-ish character lines and then wrapped to 75 without
> reflowing.

My emacs settings tend to wrap at about 85 depending on file type (big
screens). I did the shortening manually, aimed at breaking at meaningful
points, not worrying too much about line-length uniformity.

>
>> >
>> > Explanation: is probably unnecessary. You could say:
>> >
>> > Ignore a potential #UD fault during emergency VMXOFF ...
>> >
>> > When a cpu may be in VMX ...
>> >
>> >>
>> >> Fixes: 208067 <https://bugzilla.kernel.org/show_bug.cgi?id=208067>
>> >> Reported-by: David P. Reed
>> >
>> > It's not really necessary to say that you, the author, reported the
>> > problem, but I guess it's harmless.
>> >
>> >> Suggested-by: Thomas Gleixner
>> >> Suggested-by: Sean Christopherson
>> >> Suggested-by: Andy Lutomirski
>> >> Signed-off-by: David P. Reed
>> >> ---
>> >>  arch/x86/include/asm/virtext.h | 20 ++--
>> >>  1 file changed, 14 insertions(+), 6 deletions(-)
>> >>
>> >> diff --git a/arch/x86/include/asm/virtext.h
>> >> b/arch/x86/include/asm/virtext.h
>> >> index 0ede8d04535a..0e0900eacb9c 100644
>> >> --- a/arch/x86/include/asm/virtext.h
>> >> +++ b/arch/x86/include/asm/virtext.h
>> >> @@ -30,11 +30,11 @@ static inline int cpu_has_vmx(void)
>> >> }
>> >>
>> >>
>> >> -/* Disable VMX on the current CPU
>> >> +/* Exit VMX root mode and isable VMX on the current CPU.
>> >
>> > s/isable/disable/
>> >
>> >
>> >> /* Disable VMX if it is supported and enabled on the current CPU
>> >> --
>> >> 2.26.2
>> >>
>> >
>> > Other than that:
>> >
>> > Reviewed-by: Andy Lutomirski
>>
>> As a newbie, I have a process question - should I resend the patch with the
>> 'Reviewed-by' line, as well as correcting the other wording? Thanks!
>
> Probably. Sometimes a maintainer will apply the patch and make these
> types of cosmetic changes, but it's easier if you resubmit. That
> being said, for non-urgent patches, it's usually considered polite to
> wait a day or two to give other people a chance to comment.

I'm not sure which maintainer will move the patches along. I am waiting for
additional input, but will resubmit in a day or two.

>
> --Andy
>
Re: [PATCH v3 3/3] Force all cpus to exit VMX root operation on crash/panic reliably
On Sunday, July 5, 2020 2:26pm, "Andy Lutomirski" said:
> On Sat, Jul 4, 2020 at 1:38 PM David P. Reed wrote:
>>
>> Fix the logic during crash/panic reboot on Intel processors that
>> can support VMX operation to ensure that all processors are not
>> in VMX root operation. Prior code made optimistic assumptions
>> about other cpus that would leave other cpus in VMX root operation
>> depending on timing of crash/panic reboot.
>> Builds on cpu_emergency_vmxoff() and __cpu_emergency_vmxoff() created
>> in a prior patch.
>>
>> Suggested-by: Sean Christopherson
>> Signed-off-by: David P. Reed
>> ---
>>  arch/x86/kernel/reboot.c | 20 +++-
>>  1 file changed, 7 insertions(+), 13 deletions(-)
>>
>> diff --git a/arch/x86/kernel/reboot.c b/arch/x86/kernel/reboot.c
>> index 0ec7ced727fe..c8e96ba78efc 100644
>> --- a/arch/x86/kernel/reboot.c
>> +++ b/arch/x86/kernel/reboot.c
>> @@ -543,24 +543,18 @@ static void emergency_vmx_disable_all(void)
>>  * signals when VMX is enabled.
>>  *
>>  * We can't take any locks and we may be on an inconsistent
>> -* state, so we use NMIs as IPIs to tell the other CPUs to disable
>> -* VMX and halt.
>> +* state, so we use NMIs as IPIs to tell the other CPUs to exit
>> +* VMX root operation and halt.
>>  *
>>  * For safety, we will avoid running the nmi_shootdown_cpus()
>>  * stuff unnecessarily, but we don't have a way to check
>> -* if other CPUs have VMX enabled. So we will call it only if the
>> -* CPU we are running on has VMX enabled.
>> -*
>> -* We will miss cases where VMX is not enabled on all CPUs. This
>> -* shouldn't do much harm because KVM always enable VMX on all
>> -* CPUs anyway. But we can miss it on the small window where KVM
>> -* is still enabling VMX.
>> +* if other CPUs might be in VMX root operation.
>>  */
>> - if (cpu_has_vmx() && cpu_vmx_enabled()) {
>> - /* Disable VMX on this CPU. */
>> - cpu_vmxoff();
>> + if (cpu_has_vmx()) {
>> + /* Safely force out of VMX root operation on this CPU. */
>> + __cpu_emergency_vmxoff();
>>
>> - /* Halt and disable VMX on the other CPUs */
>> + /* Halt and exit VMX root operation on the other CPUs */
>>  nmi_shootdown_cpus(vmxoff_nmi);
>>
>> }
>
> Seems reasonable to me.
>
> As a minor caveat, doing cr4_clear_bits() in NMI context is not really
> okay, but we're about to reboot, so nothing too awful should happen.
> And this has very little to do with your patch.

I had wondered why the bit is cleared, too. (I assumed it was OK or
desirable, because it was being cleared in NMI context before.) Happy to
submit a separate patch to eliminate that issue as well, since the point of
emergency vmxoff is only to get out of VMX root mode - CR4.VMXE's state is
irrelevant. Of course, clearing it prevents any future emergency vmxoff
attempts. (There seemed to be some confusion about "enabling" VMX vs. "in
VMX operation" in the comments.) Should I?

> Reviewed-by: Andy Lutomirski
>
Re: [PATCH v3 2/3] Fix undefined operation fault that can hang a cpu on crash or panic
Thanks, will handle these. 2 questions below.

On Sunday, July 5, 2020 2:22pm, "Andy Lutomirski" said:
> On Sat, Jul 4, 2020 at 1:38 PM David P. Reed wrote:
>>
>> Fix: Mask undefined operation fault during emergency VMXOFF that must be
>> attempted to force cpu exit from VMX root operation.
>> Explanation: When a cpu may be in VMX root operation (only possible when
>> CR4.VMXE is set), crash or panic reboot tries to exit VMX root operation
>> using VMXOFF. This is necessary, because any INIT will be masked while cpu
>> is in VMX root operation, but that state cannot be reliably
>> discerned by the state of the cpu.
>> VMXOFF faults if the cpu is not actually in VMX root operation, signalling
>> undefined operation.
>> Discovered while debugging an out-of-tree x-visor with a race. Can happen
>> due to certain kinds of bugs in KVM.
>
> Can you re-wrap lines to 68 characters? Also, the Fix: and

I used 'scripts/checkpatch.pl' and it had me wrap to 75 chars:
"WARNING: Possible unwrapped commit description (prefer a maximum 75 chars per
line)"

Should I submit a fix to checkpatch.pl to say 68?

> Explanation: is probably unnecessary. You could say:
>
> Ignore a potential #UD fault during emergency VMXOFF ...
>
> When a cpu may be in VMX ...
>
>>
>> Fixes: 208067 <https://bugzilla.kernel.org/show_bug.cgi?id=208067>
>> Reported-by: David P. Reed
>
> It's not really necessary to say that you, the author, reported the
> problem, but I guess it's harmless.
>
>> Suggested-by: Thomas Gleixner
>> Suggested-by: Sean Christopherson
>> Suggested-by: Andy Lutomirski
>> Signed-off-by: David P. Reed
>> ---
>>  arch/x86/include/asm/virtext.h | 20 ++--
>>  1 file changed, 14 insertions(+), 6 deletions(-)
>>
>> diff --git a/arch/x86/include/asm/virtext.h b/arch/x86/include/asm/virtext.h
>> index 0ede8d04535a..0e0900eacb9c 100644
>> --- a/arch/x86/include/asm/virtext.h
>> +++ b/arch/x86/include/asm/virtext.h
>> @@ -30,11 +30,11 @@ static inline int cpu_has_vmx(void)
>> }
>>
>>
>> -/* Disable VMX on the current CPU
>> +/* Exit VMX root mode and isable VMX on the current CPU.
>
> s/isable/disable/
>
>
>> /* Disable VMX if it is supported and enabled on the current CPU
>> --
>> 2.26.2
>>
>
> Other than that:
>
> Reviewed-by: Andy Lutomirski

As a newbie, I have a process question - should I resend the patch with the
'Reviewed-by' line, as well as correcting the other wording? Thanks!

>
> --Andy
>
[PATCH v3 2/3] Fix undefined operation fault that can hang a cpu on crash or panic
Fix: Mask undefined operation fault during emergency VMXOFF that must be
attempted to force cpu exit from VMX root operation.
Explanation: When a cpu may be in VMX root operation (only possible when
CR4.VMXE is set), crash or panic reboot tries to exit VMX root operation
using VMXOFF. This is necessary, because any INIT will be masked while cpu
is in VMX root operation, but that state cannot be reliably
discerned by the state of the cpu.
VMXOFF faults if the cpu is not actually in VMX root operation, signalling
undefined operation.
Discovered while debugging an out-of-tree x-visor with a race. Can happen
due to certain kinds of bugs in KVM.

Fixes: 208067 <https://bugzilla.kernel.org/show_bug.cgi?id=208067>
Reported-by: David P. Reed
Suggested-by: Thomas Gleixner
Suggested-by: Sean Christopherson
Suggested-by: Andy Lutomirski
Signed-off-by: David P. Reed
---
 arch/x86/include/asm/virtext.h | 20 ++--
 1 file changed, 14 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/virtext.h b/arch/x86/include/asm/virtext.h
index 0ede8d04535a..0e0900eacb9c 100644
--- a/arch/x86/include/asm/virtext.h
+++ b/arch/x86/include/asm/virtext.h
@@ -30,11 +30,11 @@ static inline int cpu_has_vmx(void)
 }
 
 
-/* Disable VMX on the current CPU
+/* Exit VMX root mode and isable VMX on the current CPU.
  *
  * vmxoff causes a undefined-opcode exception if vmxon was not run
- * on the CPU previously. Only call this function if you know VMX
- * is enabled.
+ * on the CPU previously. Only call this function if you know cpu
+ * is in VMX root mode.
  */
 static inline void cpu_vmxoff(void)
 {
@@ -47,14 +47,22 @@ static inline int cpu_vmx_enabled(void)
 	return __read_cr4() & X86_CR4_VMXE;
 }
 
-/* Disable VMX if it is enabled on the current CPU
+/* Safely exit VMX root mode and disable VMX if VMX enabled
+ * on the current CPU. Handle undefined-opcode fault
+ * that can occur if cpu is not in VMX root mode, due
+ * to a race.
  *
  * You shouldn't call this if cpu_has_vmx() returns 0.
  */
 static inline void __cpu_emergency_vmxoff(void)
 {
-	if (cpu_vmx_enabled())
-		cpu_vmxoff();
+	if (!cpu_vmx_enabled())
+		return;
+	asm volatile ("1:vmxoff\n\t"
+		      "2:\n\t"
+		      _ASM_EXTABLE(1b, 2b)
+		      ::: "cc", "memory");
+	cr4_clear_bits(X86_CR4_VMXE);
 }
 
 /* Disable VMX if it is supported and enabled on the current CPU
-- 
2.26.2
[PATCH v3 0/3] Fix undefined operation VMXOFF during reboot and crash
At the request of Sean Christopherson, the original patch was split into
three patches, each fixing a distinct issue related to the original bug: a
hang due to VMXOFF causing an undefined operation fault when the kernel
reboots with CR4.VMXE set. The combination of the patches is the complete
fix to the reported bug, and a lurking error in asm side effects.

David P. Reed (3):
  Correct asm VMXOFF side effects
  Fix undefined operation fault that can hang a cpu on crash or panic
  Force all cpus to exit VMX root operation on crash/panic reliably

 arch/x86/include/asm/virtext.h | 24
 arch/x86/kernel/reboot.c       | 20 +++-
 2 files changed, 23 insertions(+), 21 deletions(-)

-- 
2.26.2
[PATCH v3 3/3] Force all cpus to exit VMX root operation on crash/panic reliably
Fix the logic during crash/panic reboot on Intel processors that
can support VMX operation to ensure that all processors are not
in VMX root operation. Prior code made optimistic assumptions
about other cpus that would leave other cpus in VMX root operation
depending on timing of crash/panic reboot.
Builds on cpu_emergency_vmxoff() and __cpu_emergency_vmxoff() created
in a prior patch.

Suggested-by: Sean Christopherson
Signed-off-by: David P. Reed
---
 arch/x86/kernel/reboot.c | 20 +++-
 1 file changed, 7 insertions(+), 13 deletions(-)

diff --git a/arch/x86/kernel/reboot.c b/arch/x86/kernel/reboot.c
index 0ec7ced727fe..c8e96ba78efc 100644
--- a/arch/x86/kernel/reboot.c
+++ b/arch/x86/kernel/reboot.c
@@ -543,24 +543,18 @@ static void emergency_vmx_disable_all(void)
 	 * signals when VMX is enabled.
 	 *
 	 * We can't take any locks and we may be on an inconsistent
-	 * state, so we use NMIs as IPIs to tell the other CPUs to disable
-	 * VMX and halt.
+	 * state, so we use NMIs as IPIs to tell the other CPUs to exit
+	 * VMX root operation and halt.
 	 *
 	 * For safety, we will avoid running the nmi_shootdown_cpus()
 	 * stuff unnecessarily, but we don't have a way to check
-	 * if other CPUs have VMX enabled. So we will call it only if the
-	 * CPU we are running on has VMX enabled.
-	 *
-	 * We will miss cases where VMX is not enabled on all CPUs. This
-	 * shouldn't do much harm because KVM always enable VMX on all
-	 * CPUs anyway. But we can miss it on the small window where KVM
-	 * is still enabling VMX.
+	 * if other CPUs might be in VMX root operation.
 	 */
-	if (cpu_has_vmx() && cpu_vmx_enabled()) {
-		/* Disable VMX on this CPU. */
-		cpu_vmxoff();
+	if (cpu_has_vmx()) {
+		/* Safely force out of VMX root operation on this CPU. */
+		__cpu_emergency_vmxoff();
 
-		/* Halt and disable VMX on the other CPUs */
+		/* Halt and exit VMX root operation on the other CPUs */
 		nmi_shootdown_cpus(vmxoff_nmi);
 
 	}
-- 
2.26.2
[PATCH v3 1/3] Correct asm VMXOFF side effects
Tell gcc that VMXOFF instruction clobbers condition codes and memory when
executed. Also, correct original comments to remove kernel-doc syntax per
Randy Dunlap's request.

Suggested-by: Randy Dunlap
Signed-off-by: David P. Reed
---
 arch/x86/include/asm/virtext.h | 8
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/virtext.h b/arch/x86/include/asm/virtext.h
index 9aad0e0876fb..0ede8d04535a 100644
--- a/arch/x86/include/asm/virtext.h
+++ b/arch/x86/include/asm/virtext.h
@@ -30,7 +30,7 @@ static inline int cpu_has_vmx(void)
 }
 
 
-/** Disable VMX on the current CPU
+/* Disable VMX on the current CPU
  *
  * vmxoff causes a undefined-opcode exception if vmxon was not run
  * on the CPU previously. Only call this function if you know VMX
@@ -38,7 +38,7 @@ static inline int cpu_has_vmx(void)
  */
 static inline void cpu_vmxoff(void)
 {
-	asm volatile ("vmxoff");
+	asm volatile ("vmxoff" ::: "cc", "memory");
 	cr4_clear_bits(X86_CR4_VMXE);
 }
 
@@ -47,7 +47,7 @@ static inline int cpu_vmx_enabled(void)
 	return __read_cr4() & X86_CR4_VMXE;
 }
 
-/** Disable VMX if it is enabled on the current CPU
+/* Disable VMX if it is enabled on the current CPU
  *
  * You shouldn't call this if cpu_has_vmx() returns 0.
  */
@@ -57,7 +57,7 @@ static inline void __cpu_emergency_vmxoff(void)
 		cpu_vmxoff();
 }
 
-/** Disable VMX if it is supported and enabled on the current CPU
+/* Disable VMX if it is supported and enabled on the current CPU
  */
 static inline void cpu_emergency_vmxoff(void)
 {
-- 
2.26.2
Re: [PATCH v2] Fix undefined operation VMXOFF during reboot and crash
On Monday, June 29, 2020 5:49pm, "Sean Christopherson" said:
> On Mon, Jun 29, 2020 at 02:22:45PM -0700, Andy Lutomirski wrote:
>>
>>
>> > On Jun 29, 2020, at 1:54 PM, David P. Reed wrote:
>> >
>> > Simple question for those on the To: and CC: list here. Should I
>> > abandon any hope of this patch being accepted? It's been a long time.
>> >
>> > The non-response after I acknowledged that this was discovered when
>> > working on a personal, non-commercial research project - which is
>> > "out-of-tree" (apparently dirty words on LKML) has me thinking my
>> > contribution is unwanted. That's fine, I suppose. I can maintain this patch
>> > out-of-tree as well. I did incorporate all the helpful suggestions I
>> > received in this second patch, and given some encouragement, will happily
>> > submit a revised v3 if there is any likelihood of acceptance. I'm wary of
>> > doing more radical changes (like combining emergency and normal paths).
>>
>> Sorry about being slow and less actively encouraging than we should be. We
>> absolutely welcome personal contributions. The actual problem is that
>> everyone is overworked and we're all slow. Also, you may be hitting a corner
>> case in the process: is this a KVM patch or an x86 patch?
>
> It's an x86 patch as it's not KVM specific, e.g. this code also helps play
> nice with out of tree hypervisors.
>
> The code change is mostly good, but it needs to be split up as there are
> three separate fixes:
>
> 1. Handle #UD on VMXOFF due to a race.
> 2. Mark memory and flags as clobbered by VMXOFF.
> 3. Change emergency_vmx_disable_all() to not manually check
>    cpu_vmx_enabled().
>
> Yes, the changes are tiny, but if for example #3 introduces a bug then we
> don't have to revert #1 and #2. Or perhaps older kernels are only subject
> to #1 and #2 and thus dumping all three changes into a single patch makes
> it all harder to backport. In other words, all the usual "one change per
> patch" reasons.

Thanks. If no one else responds with additional suggestions, I will make it
into 3 patches. I'm happy to learn the nuances of the kernel patch regimen.
Re: [PATCH v2] Fix undefined operation VMXOFF during reboot and crash
Simple question for those on the To: and CC: list here. Should I abandon any
hope of this patch being accepted? It's been a long time.

The non-response after I acknowledged that this was discovered when working
on a personal, non-commercial research project - which is "out-of-tree"
(apparently dirty words on LKML) has me thinking my contribution is
unwanted. That's fine, I suppose. I can maintain this patch out-of-tree as
well. I did incorporate all the helpful suggestions I received in this
second patch, and given some encouragement, will happily submit a revised v3
if there is any likelihood of acceptance. I'm wary of doing more radical
changes (like combining emergency and normal paths).

On Thursday, June 25, 2020 10:59am, "David P. Reed" said:
> Correction to my comment below.
> On Thursday, June 25, 2020 10:45am, "David P. Reed"
> said:
>
>> [Sorry: this is resent because my mailer included HTML, rejected by LKML]
>> Thanks for the response, Sean ... I had thought everyone was too busy to
>> follow up
>> from the first version.
>>
>> I confess I'm not sure why this should be broken up into a patch series,
>> given
>> that it is so very small and is all aimed at the same category of bug.
>>
>> And the "emergency" path pre-existed, I didn't want to propose removing it,
>> since
>> I assumed it was there for a reason. I didn't want to include my own
>> judgement as
>> to whether there should only be one path. (I'm pretty sure I didn't find a
>> VMXOFF
>> in KVM separately from the instance in this include file, but I will check.)
> Just checked. Yes, the kvm code's handling of VMXOFF is separate, and though
> it
> uses exception masking, seems to do other things, perhaps related to nested
> KVM,
> but I haven't studied the deep logic of KVM nesting.
>
>>
>> A question: if I make it a series, I have to test each patch doesn't break
>> something individually, in order to handle the case where one patch is
>> accepted
>> and the others are not. Do I need to test each individual patch thoroughly
>> as an
>> independent patch against all those cases?
>> I know the combination doesn't break anything and fixes the issues I've
>> discovered
>> by testing all combinations (and I've done some thorough testing of panics,
>> oopses,
>> crashes, kexec, ... under all combinations of CR4.VMXE enablement and crash
>> source
>> to verify the fix fixes the problem's manifestations and to verify that it
>> doesn't
>> break any of the working paths).
>>
>> That said, I'm willing to do a v3 "series" based on these suggestions if it
>> will
>> smooth its acceptance. If it's not going to get accepted after doing that, my
>> motivation is flagging.
>> On Thursday, June 25, 2020 2:06am, "Sean Christopherson"
>> said:
>>
>>
>>> On Thu, Jun 11, 2020 at 03:48:18PM -0400, David P. Reed wrote:
>>> > -/** Disable VMX on the current CPU
>>> > +/* Disable VMX on the current CPU
>>> > *
>>> > - * vmxoff causes a undefined-opcode exception if vmxon was not run
>>> > - * on the CPU previously. Only call this function if you know VMX
>>> > - * is enabled.
>>> > + * vmxoff causes an undefined-opcode exception if vmxon was not run
>>> > + * on the CPU previously. Only call this function directly if you know
>>> > VMX
>>> > + * is enabled *and* CPU is in VMX root operation.
>>> > */
>>> > static inline void cpu_vmxoff(void)
>>> > {
>>> > - asm volatile ("vmxoff");
>>> > + asm volatile ("vmxoff" ::: "cc", "memory"); /* clears all flags on
>>> > success
>>> */
>>> > cr4_clear_bits(X86_CR4_VMXE);
>>> > }
>>> >
>>> > @@ -47,17 +47,35 @@ static inline int cpu_vmx_enabled(void)
>>> > return __read_cr4() & X86_CR4_VMXE;
>>> > }
>>> >
>>> > -/** Disable VMX if it is enabled on the current CPU
>>> > - *
>>> > - * You shouldn't call this if cpu_has_vmx() returns 0.
>>> > +/*
>>> > + * Safely disable VMX root operation if active
>>> > + * Note that if CPU is not in VMX root operation this
>>> > + * VMXOFF will fault an undefined operation fault,
>>> > + * so use the exception masking facility to handle that RARE
>>> > + * case.
>>> > + * You shouldn't call this directly if cpu_has_vmx() returns 0
>>
Re: [PATCH v2] Fix undefined operation VMXOFF during reboot and crash
Correction to my comment below.

On Thursday, June 25, 2020 10:45am, "David P. Reed" said:
> [Sorry: this is resent because my mailer included HTML, rejected by LKML]
> Thanks for the response, Sean ... I had thought everyone was too busy to
> follow up
> from the first version.
>
> I confess I'm not sure why this should be broken up into a patch series, given
> that it is so very small and is all aimed at the same category of bug.
>
> And the "emergency" path pre-existed, I didn't want to propose removing it,
> since
> I assumed it was there for a reason. I didn't want to include my own
> judgement as
> to whether there should only be one path. (I'm pretty sure I didn't find a
> VMXOFF
> in KVM separately from the instance in this include file, but I will check.)

Just checked. Yes, the kvm code's handling of VMXOFF is separate, and though
it uses exception masking, seems to do other things, perhaps related to
nested KVM, but I haven't studied the deep logic of KVM nesting.

>
> A question: if I make it a series, I have to test each patch doesn't break
> something individually, in order to handle the case where one patch is
> accepted
> and the others are not. Do I need to test each individual patch thoroughly as
> an
> independent patch against all those cases?
> I know the combination doesn't break anything and fixes the issues I've
> discovered
> by testing all combinations (and I've done some thorough testing of panics,
> oopses,
> crashes, kexec, ... under all combinations of CR4.VMXE enablement and crash
> source
> to verify the fix fixes the problem's manifestations and to verify that it
> doesn't
> break any of the working paths).
>
> That said, I'm willing to do a v3 "series" based on these suggestions if it
> will
> smooth its acceptance. If it's not going to get accepted after doing that, my
> motivation is flagging.
> On Thursday, June 25, 2020 2:06am, "Sean Christopherson"
> said:
>
>
>> On Thu, Jun 11, 2020 at 03:48:18PM -0400, David P. Reed wrote:
>> > -/** Disable VMX on the current CPU
>> > +/* Disable VMX on the current CPU
>> > *
>> > - * vmxoff causes a undefined-opcode exception if vmxon was not run
>> > - * on the CPU previously. Only call this function if you know VMX
>> > - * is enabled.
>> > + * vmxoff causes an undefined-opcode exception if vmxon was not run
>> > + * on the CPU previously. Only call this function directly if you know VMX
>> > + * is enabled *and* CPU is in VMX root operation.
>> > */
>> > static inline void cpu_vmxoff(void)
>> > {
>> > - asm volatile ("vmxoff");
>> > + asm volatile ("vmxoff" ::: "cc", "memory"); /* clears all flags on
>> > success
>> */
>> > cr4_clear_bits(X86_CR4_VMXE);
>> > }
>> >
>> > @@ -47,17 +47,35 @@ static inline int cpu_vmx_enabled(void)
>> > return __read_cr4() & X86_CR4_VMXE;
>> > }
>> >
>> > -/** Disable VMX if it is enabled on the current CPU
>> > - *
>> > - * You shouldn't call this if cpu_has_vmx() returns 0.
>> > +/*
>> > + * Safely disable VMX root operation if active
>> > + * Note that if CPU is not in VMX root operation this
>> > + * VMXOFF will fault an undefined operation fault,
>> > + * so use the exception masking facility to handle that RARE
>> > + * case.
>> > + * You shouldn't call this directly if cpu_has_vmx() returns 0
>> > + */
>> > +static inline void cpu_vmxoff_safe(void)
>> > +{
>> > + asm volatile("1:vmxoff\n\t" /* clears all flags on success */
>>
>> Eh, I wouldn't bother with the comment, there are a million other caveats
>> with VMXOFF that are far more interesting.
>>
>> > + "2:\n\t"
>> > + _ASM_EXTABLE(1b, 2b)
>> > + ::: "cc", "memory");
>>
>> Adding the memory and flags clobber should be a separate patch.
>>
>> > + cr4_clear_bits(X86_CR4_VMXE);
>> > +}
>>
>>
>> I don't see any value in safe/unsafe variants. The only in-kernel user of
>> VMXOFF outside of the emergency flows is KVM, which has its own VMXOFF
>> helper, i.e. all users of cpu_vmxoff() want the "safe" variant. Just add
>> the exception fixup to cpu_vmxoff() and call it good.
>>
>> > +
>> > +/*
>> > + * Force disable VMX if it is enabled on the current CPU,
>> > + * when it is unknown whether CPU is in VMX operation.
>> > */
>> > static inline void __cp
Re: [PATCH v2] Fix undefined operation VMXOFF during reboot and crash
[Sorry: this is resent because my mailer included HTML, rejected by LKML]

Thanks for the response, Sean ... I had thought everyone was too busy to
follow up from the first version.

I confess I'm not sure why this should be broken up into a patch series,
given that it is so very small and is all aimed at the same category of bug.

And the "emergency" path pre-existed, I didn't want to propose removing it,
since I assumed it was there for a reason. I didn't want to include my own
judgement as to whether there should only be one path. (I'm pretty sure I
didn't find a VMXOFF in KVM separately from the instance in this include
file, but I will check.)

A question: if I make it a series, I have to test each patch doesn't break
something individually, in order to handle the case where one patch is
accepted and the others are not. Do I need to test each individual patch
thoroughly as an independent patch against all those cases?
I know the combination doesn't break anything and fixes the issues I've
discovered by testing all combinations (and I've done some thorough testing
of panics, oopses, crashes, kexec, ... under all combinations of CR4.VMXE
enablement and crash source to verify the fix fixes the problem's
manifestations and to verify that it doesn't break any of the working
paths).

That said, I'm willing to do a v3 "series" based on these suggestions if it
will smooth its acceptance. If it's not going to get accepted after doing
that, my motivation is flagging.

On Thursday, June 25, 2020 2:06am, "Sean Christopherson" said:
> On Thu, Jun 11, 2020 at 03:48:18PM -0400, David P. Reed wrote:
> > -/** Disable VMX on the current CPU
> > +/* Disable VMX on the current CPU
> > *
> > - * vmxoff causes a undefined-opcode exception if vmxon was not run
> > - * on the CPU previously. Only call this function if you know VMX
> > - * is enabled.
> > + * vmxoff causes an undefined-opcode exception if vmxon was not run
> > + * on the CPU previously. Only call this function directly if you know VMX
> > + * is enabled *and* CPU is in VMX root operation.
> > */
> > static inline void cpu_vmxoff(void)
> > {
> > - asm volatile ("vmxoff");
> > + asm volatile ("vmxoff" ::: "cc", "memory"); /* clears all flags on success
> */
> > cr4_clear_bits(X86_CR4_VMXE);
> > }
> >
> > @@ -47,17 +47,35 @@ static inline int cpu_vmx_enabled(void)
> > return __read_cr4() & X86_CR4_VMXE;
> > }
> >
> > -/** Disable VMX if it is enabled on the current CPU
> > - *
> > - * You shouldn't call this if cpu_has_vmx() returns 0.
> > +/*
> > + * Safely disable VMX root operation if active
> > + * Note that if CPU is not in VMX root operation this
> > + * VMXOFF will fault an undefined operation fault,
> > + * so use the exception masking facility to handle that RARE
> > + * case.
> > + * You shouldn't call this directly if cpu_has_vmx() returns 0
> > + */
> > +static inline void cpu_vmxoff_safe(void)
> > +{
> > + asm volatile("1:vmxoff\n\t" /* clears all flags on success */
>
> Eh, I wouldn't bother with the comment, there are a million other caveats
> with VMXOFF that are far more interesting.
>
> > + "2:\n\t"
> > + _ASM_EXTABLE(1b, 2b)
> > + ::: "cc", "memory");
>
> Adding the memory and flags clobber should be a separate patch.
>
> > + cr4_clear_bits(X86_CR4_VMXE);
> > +}
>
>
> I don't see any value in safe/unsafe variants. The only in-kernel user of
> VMXOFF outside of the emergency flows is KVM, which has its own VMXOFF
> helper, i.e. all users of cpu_vmxoff() want the "safe" variant. Just add
> the exception fixup to cpu_vmxoff() and call it good.
>
> > +
> > +/*
> > + * Force disable VMX if it is enabled on the current CPU,
> > + * when it is unknown whether CPU is in VMX operation.
> > */
> > static inline void __cpu_emergency_vmxoff(void)
> > {
> > - if (cpu_vmx_enabled())
> > - cpu_vmxoff();
> > + if (!cpu_vmx_enabled())
> > + return;
> > + cpu_vmxoff_safe();
>
> Unnecessary churn.
>
> > }
> >
> > -/** Disable VMX if it is supported and enabled on the current CPU
> > +/* Force disable VMX if it is supported on current CPU
> > */
> > static inline void cpu_emergency_vmxoff(void)
> > {
> > diff --git a/arch/x86/kernel/reboot.c b/arch/x86/kernel/reboot.c
> > index e040ba6be27b..b0e6b106a67e 100644
> > --- a/arch/x86/kernel/reboot.c
> > +++ b/arch/x86/kernel/reboot.c
> > @@ -540,21 +540,14 @@ static void emergency_vmx_disable_all(void)
> > *
> >
[PATCH v2] Fix undefined operation VMXOFF during reboot and crash
If a panic/reboot occurs when CR4 has VMX enabled, a VMXOFF is done on all CPUs, to allow the INIT IPI to function, since INIT is suppressed when CPUs are in VMX root operation. The problem is that VMXOFF causes an undefined operation fault when the CPU is not in VMX operation, that is, when VMXON has not been executed yet, or VMXOFF has been executed but VMX is still enabled. This patch makes the reboot work more reliably by masking the exception on VMXOFF in the crash/panic/reboot path, which uses cpu_emergency_vmxoff(). This can happen with KVM due to a race, but that race is rare today. The problem was discovered doing out-of-tree x-visor development that uses VMX in a novel way for kernel performance analysis. The logic in reboot.c is also corrected, since the point of forcing the processor out of VMX root operation is to allow the INIT signal to be unmasked. See the Intel SDM section on differences between VMX root operation and normal operation. Thus every CPU must be forced out of VMX operation. Since the CPU may hang rather than restart if INIT fails, a manual hardware "reset" is the only way out of this state in a lights-out datacenter (well, if there is a BMC, it can issue a hardware RESET to the chip). Style errors in the original file fixed, at request of Randy Dunlap: eliminate '/**' in non-kernel-doc comments. Fixes: 208067 <https://bugzilla.kernel.org/show_bug.cgi?id=208067> Reported-by: David P. Reed Reported-by: Randy Dunlap Suggested-by: Thomas Gleixner Suggested-by: Sean Christopherson Suggested-by: Andy Lutomirski Signed-off-by: David P. 
Reed --- arch/x86/include/asm/virtext.h | 40 -- arch/x86/kernel/reboot.c | 13 +++ 2 files changed, 32 insertions(+), 21 deletions(-) diff --git a/arch/x86/include/asm/virtext.h b/arch/x86/include/asm/virtext.h index 9aad0e0876fb..ed22c1983da8 100644 --- a/arch/x86/include/asm/virtext.h +++ b/arch/x86/include/asm/virtext.h @@ -30,15 +30,15 @@ static inline int cpu_has_vmx(void) } -/** Disable VMX on the current CPU +/* Disable VMX on the current CPU * - * vmxoff causes a undefined-opcode exception if vmxon was not run - * on the CPU previously. Only call this function if you know VMX - * is enabled. + * vmxoff causes an undefined-opcode exception if vmxon was not run + * on the CPU previously. Only call this function directly if you know VMX + * is enabled *and* CPU is in VMX root operation. */ static inline void cpu_vmxoff(void) { - asm volatile ("vmxoff"); + asm volatile ("vmxoff" ::: "cc", "memory"); /* clears all flags on success */ cr4_clear_bits(X86_CR4_VMXE); } @@ -47,17 +47,35 @@ static inline int cpu_vmx_enabled(void) return __read_cr4() & X86_CR4_VMXE; } -/** Disable VMX if it is enabled on the current CPU - * - * You shouldn't call this if cpu_has_vmx() returns 0. +/* + * Safely disable VMX root operation if active + * Note that if CPU is not in VMX root operation this + * VMXOFF will fault an undefined operation fault, + * so use the exception masking facility to handle that RARE + * case. + * You shouldn't call this directly if cpu_has_vmx() returns 0 + */ +static inline void cpu_vmxoff_safe(void) +{ + asm volatile("1:vmxoff\n\t" /* clears all flags on success */ + "2:\n\t" +_ASM_EXTABLE(1b, 2b) +::: "cc", "memory"); + cr4_clear_bits(X86_CR4_VMXE); +} + +/* + * Force disable VMX if it is enabled on the current CPU, + * when it is unknown whether CPU is in VMX operation. 
*/ static inline void __cpu_emergency_vmxoff(void) { - if (cpu_vmx_enabled()) - cpu_vmxoff(); + if (!cpu_vmx_enabled()) + return; + cpu_vmxoff_safe(); } -/** Disable VMX if it is supported and enabled on the current CPU +/* Force disable VMX if it is supported on current CPU */ static inline void cpu_emergency_vmxoff(void) { diff --git a/arch/x86/kernel/reboot.c b/arch/x86/kernel/reboot.c index e040ba6be27b..b0e6b106a67e 100644 --- a/arch/x86/kernel/reboot.c +++ b/arch/x86/kernel/reboot.c @@ -540,21 +540,14 @@ static void emergency_vmx_disable_all(void) * * For safety, we will avoid running the nmi_shootdown_cpus() * stuff unnecessarily, but we don't have a way to check -* if other CPUs have VMX enabled. So we will call it only if the -* CPU we are running on has VMX enabled. -* -* We will miss cases where VMX is not enabled on all CPUs. This -* shouldn't do much harm because KVM always enable VMX on all -* CPUs anyway. But we can miss it on the small window where KVM -* is still enabling VMX. +* if other CPUs have VMX enabled. */ - if (cpu_has_vmx() && cpu_vmx_enabled()) { + if (cpu_has_vmx()) { /* Disable VMX on this CPU. */ - cpu_v
[PATCH] Fix undefined operation VMXOFF during reboot and crash
If a panic/reboot occurs when CR4 has VMX enabled, a VMXOFF is done on all CPUS, to allow the INIT IPI to function, since INIT is suppressed when CPUs are in VMX root operation. However, VMXOFF causes an undefined operation fault if the CPU is not in VMX operation, that is, VMXON has not been executed, or VMXOFF has been executed, but VMX is enabled. This fix makes the reboot work more reliably by modifying the #UD handler to skip the VMXOFF if VMX is enabled on the CPU and the VMXOFF is executed as part of cpu_emergency_vmxoff(). The logic in reboot.c is also corrected, since the point of forcing the processor out of VMX root operation is because when VMX root operation is enabled, the processor INIT signal is always masked. See Intel SDM section on differences between VMX Root operation and normal operation. Thus every CPU must be forced out of VMX operation. Since the CPU will hang rather than restart, a manual "reset" is the only way out of this state (or if there is a BMC, it can issue a RESET to the chip). Signed-off-by: David P. Reed --- arch/x86/include/asm/virtext.h | 24 arch/x86/kernel/reboot.c | 13 ++--- arch/x86/kernel/traps.c| 52 -- 3 files changed, 71 insertions(+), 18 deletions(-) diff --git a/arch/x86/include/asm/virtext.h b/arch/x86/include/asm/virtext.h index 9aad0e0876fb..ea2d67191684 100644 --- a/arch/x86/include/asm/virtext.h +++ b/arch/x86/include/asm/virtext.h @@ -13,12 +13,16 @@ #ifndef _ASM_X86_VIRTEX_H #define _ASM_X86_VIRTEX_H +#include + #include #include #include #include +DECLARE_PER_CPU_READ_MOSTLY(int, doing_emergency_vmxoff); + /* * VMX functions: */ @@ -33,8 +37,8 @@ static inline int cpu_has_vmx(void) /** Disable VMX on the current CPU * * vmxoff causes a undefined-opcode exception if vmxon was not run - * on the CPU previously. Only call this function if you know VMX - * is enabled. + * on the CPU previously. Only call this function directly if you know VMX + * is enabled *and* CPU is in VMX root operation. 
*/ static inline void cpu_vmxoff(void) { @@ -47,17 +51,25 @@ static inline int cpu_vmx_enabled(void) return __read_cr4() & X86_CR4_VMXE; } -/** Disable VMX if it is enabled on the current CPU +/** Force disable VMX if it is enabled on the current CPU. + * Note that if CPU is not in VMX root operation this + * VMXOFF will raise an undefined operation fault. + * So the 'doing_emergency_vmxoff' percpu flag is set, and + * the trap handler then just restarts execution after + * the VMXOFF instruction. * - * You shouldn't call this if cpu_has_vmx() returns 0. + * You shouldn't call this directly if cpu_has_vmx() returns 0. */ static inline void __cpu_emergency_vmxoff(void) { - if (cpu_vmx_enabled()) + if (cpu_vmx_enabled()) { + this_cpu_write(doing_emergency_vmxoff, 1); cpu_vmxoff(); + this_cpu_write(doing_emergency_vmxoff, 0); + } } -/** Disable VMX if it is supported and enabled on the current CPU +/** Force disable VMX if it is supported and enabled on the current CPU */ static inline void cpu_emergency_vmxoff(void) { diff --git a/arch/x86/kernel/reboot.c b/arch/x86/kernel/reboot.c index 3ca43be4f9cf..abc8b51a57c7 100644 --- a/arch/x86/kernel/reboot.c +++ b/arch/x86/kernel/reboot.c @@ -540,21 +540,14 @@ static void emergency_vmx_disable_all(void) * * For safety, we will avoid running the nmi_shootdown_cpus() * stuff unnecessarily, but we don't have a way to check -* if other CPUs have VMX enabled. So we will call it only if the -* CPU we are running on has VMX enabled. -* -* We will miss cases where VMX is not enabled on all CPUs. This -* shouldn't do much harm because KVM always enable VMX on all -* CPUs anyway. But we can miss it on the small window where KVM -* is still enabling VMX. +* if other CPUs have VMX enabled. */ - if (cpu_has_vmx() && cpu_vmx_enabled()) { + if (cpu_has_vmx()) { /* Disable VMX on this CPU. 
*/ - cpu_vmxoff(); + cpu_emergency_vmxoff(); /* Halt and disable VMX on the other CPUs */ nmi_shootdown_cpus(vmxoff_nmi); - } } diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c index 4cc541051994..2dcf57ef467e 100644 --- a/arch/x86/kernel/traps.c +++ b/arch/x86/kernel/traps.c @@ -39,6 +39,7 @@ #include #include #include +#include #include #include @@ -59,6 +60,7 @@ #include #include #include +#include #ifdef CONFIG_X86_64 #include @@ -70,6 +72,8 @@ #include #endif +DEFINE_PER_CPU_READ_MOSTLY(int, doing_emergency_vmxoff) = 0; + DECLARE_BITMAP(system_vectors, NR_VECTORS); static inline void cond_local_irq_enable(struct pt_regs *regs) @@ -115,6 +119,43 @@ int fixup_bug(struct pt_regs *regs, int tr
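The `_ASM_EXTABLE` mechanism that later versions of this patch adopt in place of the per-cpu flag above is, at its core, a table mapping a possibly-faulting instruction address to a fixup address; on a fault, the handler searches the table and, on a hit, resumes at the fixup. A minimal C model of that lookup (simplified: a linear search over a tiny hand-built table with made-up addresses, not the kernel's sorted, PC-relative format):

```c
#include <stddef.h>

/* Simplified exception-table entry: where the fault may happen, and
 * where execution should resume.  The real kernel stores PC-relative
 * offsets and keeps the table sorted for binary search. */
struct extable_entry {
    unsigned long insn;   /* address of the possibly-faulting instruction */
    unsigned long fixup;  /* address to resume at */
};

static const struct extable_entry extable[] = {
    { 0x1000, 0x1002 },   /* e.g. a vmxoff at 0x1000, fixup just past it */
    { 0x2040, 0x2100 },
};

/* Mirrors the spirit of the kernel's fixup_exception(): return the fixup
 * address for a faulting IP, or 0 if the fault was not an expected one. */
static unsigned long search_extable(unsigned long fault_ip)
{
    for (size_t i = 0; i < sizeof(extable) / sizeof(extable[0]); i++)
        if (extable[i].insn == fault_ip)
            return extable[i].fixup;
    return 0;   /* unhandled: the real #UD handler would oops here */
}
```

This is why the extable route needs no per-cpu state: the "am I doing an emergency VMXOFF" question is answered by the faulting address itself.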
Re: [linux-kernel] Re: [PATCH] x86: use explicit timing delay for pit accesses in kernel and pcspkr driver
Actually, disparaging things as "one idiotic system" doesn't seem like a long-term thoughtful process - it's not even accurate. There are more such systems that are running code today than the total number of 486 systems ever manufactured. The production rate is $1M/month. a) ENE chips are "documented" to receive port 80, and also it is the case that modern chipsets will happily diagnose writes to non-existent ports as MCE's. Using side effects that depend on non-existent ports just creates a brittle failure mode down the road. And it's not just post ACPI "initialization". The pcspkr use of port 80 caused solid freezes if you typed "tab" to complete a command line and there were more than one choice, leading to beeps. b) sad to say, Linux is not what hardware vendors use as the system that their BIOSes MUST work with. That's Windows, and Windows, whether we like it or not does not require hardware vendors to stay away from port 80. IMHO, calling something "idiotic" is hardly evidence-based decision making. Maybe you love to hate Microsoft, but until Intel writes an architecture standard that says explicitly that a "standard PC" must not use port 80 for any peripheral, the port 80 thing is folklore, and one that is solely Linux-defined. Rene Herman wrote: On 20-02-08 18:05, H. Peter Anvin wrote: Rene Herman wrote: _Something_ like this would seem to be the only remaining option. It seems fairly unuseful to #ifdef around that switch statement for kernels without support for the earlier families, but if you insist... "Only remaining option" other than the one we've had all along. Even on the one idiotic set of systems which break, it only breaks post-ACPI intialization, IIRC. Linus vetoed the DMI switch. Rene. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] x86: use explicit timing delay for pit accesses in kernel and pcspkr driver
x86: use explicit timing delay for pit accesses in kernel and pcspkr driver pit accesses in i8253.c and pcspkr driver use outb_p for timing. Fix them to use explicit timing delay for access to the PIT, rather than inb_p/outb_p calls, which use insufficiently explicit delays (defaulting to port 80 writes) that can cause freeze problems on some machines, such as Quanta motherboard machines using ENE EC's. Since the pcspkr driver accesses PIT registers directly, it should also use outb_pit, which is inlined, so does not need to be exported. Explicit timing delay is only needed in pcspkr for accesses to the 8253 PIT. Fix pcspkr driver to use the new outb_pit call properly, use named PIT port values rather than hex constants, and drop its use of inb_p and outb_p in accessing port 61h where it has never been needed. Signed-off-by: David P. Reed <[EMAIL PROTECTED]> Index: linux-2.6/drivers/input/misc/pcspkr.c === --- linux-2.6.orig/drivers/input/misc/pcspkr.c +++ linux-2.6/drivers/input/misc/pcspkr.c @@ -36,6 +36,7 @@ static int pcspkr_event(struct input_dev { unsigned int count = 0; unsigned long flags; + unsigned char port61; if (type != EV_SND) return -1; @@ -51,17 +52,18 @@ static int pcspkr_event(struct input_dev spin_lock_irqsave(_lock, flags); + port61 = inb(0x61); if (count) { /* enable counter 2 */ - outb_p(inb_p(0x61) | 3, 0x61); + outb(port61 | 3, 0x61); /* set command for counter 2, 2 byte write */ - outb_p(0xB6, 0x43); + outb_pit(0xB6, PIT_MODE); /* select desired HZ */ - outb_p(count & 0xff, 0x42); - outb((count >> 8) & 0xff, 0x42); + outb_pit(count & 0xff, PIT_CH2); + outb((count >> 8) & 0xff, PIT_CH2); } else { /* disable counter 2 */ - outb(inb_p(0x61) & 0xFC, 0x61); + outb(port61 & 0xFC, 0x61); } spin_unlock_irqrestore(_lock, flags); Index: linux-2.6/include/asm-x86/i8253.h === --- linux-2.6.orig/include/asm-x86/i8253.h +++ linux-2.6/include/asm-x86/i8253.h @@ -12,7 +12,25 @@ extern struct clock_event_device *global extern void setup_pit_timer(void); -#define 
inb_pit inb_p -#define outb_pit outb_p +/* accesses to PIT registers need careful delays on some platforms. Define + them here in a common place */ +static inline unsigned char inb_pit(unsigned int port) +{ + /* delay for some accesses to PIT on motherboard or in chipset must be + at least one microsecond, but be safe here. */ + unsigned char value = inb(port); + udelay(2); + return value; +} + +static inline void outb_pit(unsigned char value, unsigned int port) +{ + /* delay for some accesses to PIT on motherboard or in chipset must be + at least one microsecond, but be safe here. */ + outb(value, port); + udelay(2); +} + + #endif /* __ASM_I8253_H__ */
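The shape of these wrappers — perform the port access, then an explicit, device-specific settle delay — can be modeled in plain C with the hardware primitives injected as stubs. This is only a sketch of the pattern; `fake_outb`/`fake_udelay` are hypothetical stand-ins for the kernel's `outb()` and `udelay()`, recording calls so the behavior can be checked:

```c
/* Recorded effects of the stubbed-out hardware primitives. */
static int writes_done;
static int delay_us_total;

/* Stand-in for the kernel's outb(): just count the write. */
static void fake_outb(unsigned char value, unsigned int port)
{
    (void)value;
    (void)port;
    writes_done++;
}

/* Stand-in for the kernel's udelay(): accumulate requested microseconds. */
static void fake_udelay(int usec)
{
    delay_us_total += usec;
}

/* The pattern from the patch: every PIT register write is followed by a
 * fixed 2us settle delay, instead of the old implicit "write to port 80"
 * delay that outb_p() provided. */
static void outb_pit_model(unsigned char value, unsigned int port)
{
    fake_outb(value, port);
    fake_udelay(2);
}
```

The design point the thread is arguing for is exactly this: the delay becomes an explicit, calibrated property of the device being touched, not a side effect of a write to an unrelated (and possibly occupied) I/O port.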
[PATCH] x86: define outb_pic and inb_pic to stop using outb_p and inb_p
x86: define outb_pic and inb_pic to stop using outb_p and inb_p The delay between io port accesses to the PIC is now defined using outb_pic and inb_pic. This fix provides the next step, using udelay(2) to define the *PIC specific* timing requirements, rather than relying on bus-oriented timing, which is not well calibrated. Again, the primary reason for fixing this is to use proper delay strategy, and in particular to fix crashes that can result from using port 80 writes on machines that have resources on port 80, such as the ENE chips used by Quanta in laptops it designs and sells to, e.g. HP. Signed-off-by: David P. Reed <[EMAIL PROTECTED]> Index: linux-2.6/include/asm-x86/i8259.h === --- linux-2.6.orig/include/asm-x86/i8259.h +++ linux-2.6/include/asm-x86/i8259.h @@ -29,7 +29,23 @@ extern void enable_8259A_irq(unsigned in extern void disable_8259A_irq(unsigned int irq); extern unsigned int startup_8259A_irq(unsigned int irq); -#define inb_pic inb_p -#define outb_pic outb_p +/* the PIC may need a careful delay on some platforms, hence specific calls */ +static inline unsigned char inb_pic(unsigned int port) +{ + /* delay for some accesses to PIC on motherboard or in chipset must be + at least one microsecond, but be safe here. */ + unsigned char value = inb(port); + udelay(2); + return value; +} + +static inline void outb_pic(unsigned char value, unsigned int port) +{ + /* delay for some accesses to PIC on motherboard or in chipset must be + at least one microsecond, but be safe here. */ + outb(value, port); + udelay(2); +} + #endif /* __ASM_I8259_H__ */
Re: [linux-kernel] Re: [patch 1/2] x86: define outb_pic and inb_pic to stop using outb_p and inb_p
Alan Cox wrote: +unsigned char inb_pic(unsigned int port) +{ + /* delay for some accesses to PIC on motherboard or in chipset must be + at least one microsecond, but be safe here. */ + unsigned char value = inb(port); + udelay(2); + return value; +} inline it. It's almost no instructions Will do. Assume you desire inlining of the outb_pic, and also the inb_pit and outb_pit routines. Didn't do it because the code is slightly bigger than the call.
[patch] x86: revised - use explicit timing delay for pit accesses
x86: revised - use explicit timing delay for pit accesses pit accesses in i8253.c and pcspkr driver use outb_p for timing. Fix them to use explicit timing delay for access to the PIT, rather than inb_p/outb_p calls, which use insufficiently explicit delays (defaulting to port 80 writes) that can cause freeze problems on some machines, such as Quanta motherboard machines using ENE EC's. Since the pcspkr driver accesses PIT registers directly, it needs the symbol outb_pit exported, so it can be built as a module. Explicit timing delay is only needed in pcspkr for accesses to the 8253 PIT. Fix pcspkr driver to use the new outb_pit call properly, use named PIT port values rather than hex constants, and drop its use of inb_p and outb_p in accessing port 61h where it has never been needed. Signed-off-by: David P. Reed <[EMAIL PROTECTED]> Index: linux-2.6/drivers/input/misc/pcspkr.c === --- linux-2.6.orig/drivers/input/misc/pcspkr.c +++ linux-2.6/drivers/input/misc/pcspkr.c @@ -36,6 +36,7 @@ static int pcspkr_event(struct input_dev { unsigned int count = 0; unsigned long flags; + unsigned char port61; if (type != EV_SND) return -1; @@ -51,17 +52,18 @@ static int pcspkr_event(struct input_dev spin_lock_irqsave(_lock, flags); + port61 = inb(0x61); if (count) { /* enable counter 2 */ - outb_p(inb_p(0x61) | 3, 0x61); + outb(port61 | 3, 0x61); /* set command for counter 2, 2 byte write */ - outb_p(0xB6, 0x43); + outb_pit(0xB6, PIT_MODE); /* select desired HZ */ - outb_p(count & 0xff, 0x42); - outb((count >> 8) & 0xff, 0x42); + outb_pit(count & 0xff, PIT_CH2); + outb((count >> 8) & 0xff, PIT_CH2); } else { /* disable counter 2 */ - outb(inb_p(0x61) & 0xFC, 0x61); + outb(port61 & 0xFC, 0x61); } spin_unlock_irqrestore(_lock, flags); Index: linux-2.6/arch/x86/kernel/i8253.c === --- linux-2.6.orig/arch/x86/kernel/i8253.c +++ linux-2.6/arch/x86/kernel/i8253.c @@ -31,6 +31,29 @@ static inline void pit_disable_clocksour struct clock_event_device *global_clock_event; /* + * define the PIT 
specific port access routines, which define the timing + * needed by the PIT registers on some platforms. + */ +unsigned char inb_pit(unsigned int port) +{ + /* delay for some accesses to the PIT on motherboard or in chipset must be + at least one microsecond, but be safe here. */ + unsigned char value = inb(port); + udelay(2); + return value; +} +EXPORT_SYMBOL(inb_pit); + +void outb_pit(unsigned char value, unsigned int port) +{ + /* delay for some accesses to the PIT on motherboard or in chipset must be + at least one microsecond, but be safe here. */ + outb(value, port); + udelay(2); +} +EXPORT_SYMBOL(outb_pit); + +/* * Initialize the PIT timer. * * This is also called after resume to bring the PIT into operation again. Index: linux-2.6/include/asm-x86/i8253.h === --- linux-2.6.orig/include/asm-x86/i8253.h +++ linux-2.6/include/asm-x86/i8253.h @@ -12,7 +12,9 @@ extern struct clock_event_device *global extern void setup_pit_timer(void); -#define inb_pit inb_p -#define outb_pit outb_p +/* accesses to PIT registers need careful delays on some platforms. Define + them here in a common place */ +extern unsigned char inb_pit(unsigned int port); +extern void outb_pit(unsigned char value, unsigned int port); #endif /* __ASM_I8253_H__ */
Re: [linux-kernel] [patch 2/2] x86: use explicit timing delay for pit accesses
Oops: the patch I just submitted for i8253.c didn't export the symbol needed by the pcspkr driver to build it as a module. I will send the revised patch shortly.
[patch 2/2] x86: use explicit timing delay for pit accesses
pit accesses in i8253.c and pcspkr driver use outb_p for timing. Fix them to use explicit timing delay for access to the PIT, rather than inb_p/outb_p calls, which use insufficiently explicit delays (defaulting to port 80 writes) that can cause freeze problems on some machines, such as Quanta motherboard machines using ENE EC's. The explicit timing delay is only needed in pcspkr for accesses to the 8253 PIT. Fix pcspkr driver to use the new outb_pit call properly, use named port values rather than hex constants, and drop its use of inb_p and outb_p in accessing port 61h where it has never been needed. Signed-off-by: David P. Reed <[EMAIL PROTECTED]> Index: linux-2.6/drivers/input/misc/pcspkr.c === --- linux-2.6.orig/drivers/input/misc/pcspkr.c +++ linux-2.6/drivers/input/misc/pcspkr.c @@ -36,6 +36,7 @@ static int pcspkr_event(struct input_dev { unsigned int count = 0; unsigned long flags; + unsigned char port61; if (type != EV_SND) return -1; @@ -51,17 +52,18 @@ static int pcspkr_event(struct input_dev spin_lock_irqsave(_lock, flags); + port61 = inb(0x61); if (count) { /* enable counter 2 */ - outb_p(inb_p(0x61) | 3, 0x61); + outb(port61 | 3, 0x61); /* set command for counter 2, 2 byte write */ - outb_p(0xB6, 0x43); + outb_pit(0xB6, PIT_MODE); /* select desired HZ */ - outb_p(count & 0xff, 0x42); - outb((count >> 8) & 0xff, 0x42); + outb_pit(count & 0xff, PIT_CH2); + outb((count >> 8) & 0xff, PIT_CH2); } else { /* disable counter 2 */ - outb(inb_p(0x61) & 0xFC, 0x61); + outb(port61 & 0xFC, 0x61); } spin_unlock_irqrestore(_lock, flags); Index: linux-2.6/arch/x86/kernel/i8253.c === --- linux-2.6.orig/arch/x86/kernel/i8253.c +++ linux-2.6/arch/x86/kernel/i8253.c @@ -31,6 +31,27 @@ static inline void pit_disable_clocksour struct clock_event_device *global_clock_event; /* + * define the PIT specific port access routines, which define the timing + * needed by the PIT registers on some platforms. 
+ */ +unsigned char inb_pit(unsigned int port) +{ + /* delay for some accesses to the PIT on motherboard or in chipset must be + at least one microsecond, but be safe here. */ + unsigned char value = inb(port); + udelay(2); + return value; +} + +void outb_pit(unsigned char value, unsigned int port) +{ + /* delay for some accesses to the PIT on motherboard or in chipset must be + at least one microsecond, but be safe here. */ + outb(value, port); + udelay(2); +} + +/* * Initialize the PIT timer. * * This is also called after resume to bring the PIT into operation again. Index: linux-2.6/include/asm-x86/i8253.h === --- linux-2.6.orig/include/asm-x86/i8253.h +++ linux-2.6/include/asm-x86/i8253.h @@ -12,7 +12,9 @@ extern struct clock_event_device *global extern void setup_pit_timer(void); -#define inb_pit inb_p -#define outb_pit outb_p +/* accesses to PIT registers need careful delays on some platforms. Define + them here in a common place */ +extern unsigned char inb_pit(unsigned int port); +extern void outb_pit(unsigned char value, unsigned int port); #endif /* __ASM_I8253_H__ */
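For reference, the two-byte value the pcspkr code above loads into PIT channel 2 is the PIT input clock divided by the desired tone frequency, written low byte first, then high byte. A small sketch of that arithmetic (PIT_TICK_RATE is the standard 1193182 Hz i8254 input clock; the helper names are illustrative, not from the driver):

```c
#define PIT_TICK_RATE 1193182UL   /* i8254 input clock in Hz */

/* Divisor for a desired speaker frequency, as loaded into counter 2. */
static unsigned int pit_count_for_hz(unsigned int hz)
{
    return (unsigned int)(PIT_TICK_RATE / hz);
}

/* The two writes the driver performs to PIT_CH2: low byte, then high. */
static unsigned char pit_low(unsigned int count)
{
    return count & 0xff;
}

static unsigned char pit_high(unsigned int count)
{
    return (count >> 8) & 0xff;
}
```

The 0xB6 command byte written to PIT_MODE first is what selects counter 2, two-byte (low/high) access, square-wave mode, so the counter only reloads once both bytes have arrived.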
[patch 1/2] x86: define outb_pic and inb_pic to stop using outb_p and inb_p
The delay between io port accesses to the PIC is now defined using outb_pic and inb_pic. This fix provides the next step, using udelay(2) to define the *PIC specific* timing requirements, rather than bus-oriented timing, which is not well calibrated. Again, the primary reason for fixing this is to use a proper delay strategy, and in particular to fix crashes that can result from using port 80 writes on machines that have resources on port 80, such as the ENE chips used by Quanta in laptops it designs and sells to, e.g. HP.

Signed-off-by: David P. Reed <[EMAIL PROTECTED]>

Index: linux-2.6/arch/x86/kernel/i8259_32.c
===
--- linux-2.6.orig/arch/x86/kernel/i8259_32.c
+++ linux-2.6/arch/x86/kernel/i8259_32.c
@@ -277,6 +277,23 @@ static int __init i8259A_init_sysfs(void
 
 device_initcall(i8259A_init_sysfs);
 
+unsigned char inb_pic(unsigned int port)
+{
+	/* delay for some accesses to PIC on motherboard or in chipset must be
+	   at least one microsecond, but be safe here. */
+	unsigned char value = inb(port);
+	udelay(2);
+	return value;
+}
+
+void outb_pic(unsigned char value, unsigned int port)
+{
+	/* delay for some accesses to PIC on motherboard or in chipset must be
+	   at least one microsecond, but be safe here. */
+	outb(value, port);
+	udelay(2);
+}
+
 void init_8259A(int auto_eoi)
 {
 	unsigned long flags;

Index: linux-2.6/arch/x86/kernel/i8259_64.c
===
--- linux-2.6.orig/arch/x86/kernel/i8259_64.c
+++ linux-2.6/arch/x86/kernel/i8259_64.c
@@ -347,6 +347,23 @@ static int __init i8259A_init_sysfs(void
 
 device_initcall(i8259A_init_sysfs);
 
+unsigned char inb_pic(unsigned int port)
+{
+	/* delay for some accesses to PIC on motherboard or in chipset must be
+	   at least one microsecond, but be safe here. */
+	unsigned char value = inb(port);
+	udelay(2);
+	return value;
+}
+
+void outb_pic(unsigned char value, unsigned int port)
+{
+	/* delay for some accesses to PIC on motherboard or in chipset must be
+	   at least one microsecond, but be safe here. */
+	outb(value, port);
+	udelay(2);
+}
+
 void init_8259A(int auto_eoi)
 {
 	unsigned long flags;

Index: linux-2.6/include/asm-x86/i8259.h
===
--- linux-2.6.orig/include/asm-x86/i8259.h
+++ linux-2.6/include/asm-x86/i8259.h
@@ -28,8 +28,8 @@ extern void init_8259A(int auto_eoi);
 extern void enable_8259A_irq(unsigned int irq);
 extern void disable_8259A_irq(unsigned int irq);
 extern unsigned int startup_8259A_irq(unsigned int irq);
-
-#define inb_pic		inb_p
-#define outb_pic	outb_p
+/* the PIC may need a careful delay on some platforms, hence specific calls */
+extern unsigned char inb_pic(unsigned int port);
+extern void outb_pic(unsigned char value, unsigned int port);
 
 #endif	/* __ASM_I8259_H__ */
[patch 0/2] replacement submission for motherboard/chipset iodelay fixes
Here are the two revised patches based on Alan Cox's NAKs and suggestions regarding using the _pic and _pit versions of inb/outb. The new patches use udelay(2) as a conservative delay for the PIC and PIT, and isolate that usage in the respective i8253.c and i8259_*.c files. Together with the already-acked patch for the CMOS RTC (not included here), these should solve the problem on modern machines, that is, the ones that don't use older devices, but only motherboard/chipset resources.
Re: [linux-kernel] [patch 2/2] x86: use explicit timing delay for pit accesses
Oops. The patch I just submitted for i8253.c didn't export the symbol needed by the pcspkr driver to build it as a module. I will send the revised patch shortly.
[patch] x86: revised - use explicit timing delay for pit accesses
x86: revised - use explicit timing delay for pit accesses

PIT accesses in i8253.c and the pcspkr driver use outb_p for timing. Fix them to use an explicit timing delay for access to the PIT, rather than inb_p/outb_p calls, which use insufficiently explicit delays (defaulting to port 80 writes) that can cause freeze problems on some machines, such as Quanta motherboard machines using ENE ECs. Since the pcspkr driver accesses PIT registers directly, it needs the symbol outb_pit exported, so it can be built as a module. An explicit timing delay is only needed in pcspkr for accesses to the 8253 PIT. Fix the pcspkr driver to use the new outb_pit call properly, use named PIT port values rather than hex constants, and drop its use of inb_p and outb_p in accessing port 61h, where it has never been needed.

Signed-off-by: David P. Reed <[EMAIL PROTECTED]>

Index: linux-2.6/drivers/input/misc/pcspkr.c
===
--- linux-2.6.orig/drivers/input/misc/pcspkr.c
+++ linux-2.6/drivers/input/misc/pcspkr.c
@@ -36,6 +36,7 @@ static int pcspkr_event(struct input_dev
 {
 	unsigned int count = 0;
 	unsigned long flags;
+	unsigned char port61;
 
 	if (type != EV_SND)
 		return -1;
@@ -51,17 +52,18 @@ static int pcspkr_event(struct input_dev
 	spin_lock_irqsave(&i8253_lock, flags);
 
+	port61 = inb(0x61);
 	if (count) {
 		/* enable counter 2 */
-		outb_p(inb_p(0x61) | 3, 0x61);
+		outb(port61 | 3, 0x61);
 		/* set command for counter 2, 2 byte write */
-		outb_p(0xB6, 0x43);
+		outb_pit(0xB6, PIT_MODE);
 		/* select desired HZ */
-		outb_p(count & 0xff, 0x42);
-		outb((count >> 8) & 0xff, 0x42);
+		outb_pit(count & 0xff, PIT_CH2);
+		outb((count >> 8) & 0xff, PIT_CH2);
 	} else {
 		/* disable counter 2 */
-		outb(inb_p(0x61) & 0xFC, 0x61);
+		outb(port61 & 0xFC, 0x61);
 	}
 
 	spin_unlock_irqrestore(&i8253_lock, flags);

Index: linux-2.6/arch/x86/kernel/i8253.c
===
--- linux-2.6.orig/arch/x86/kernel/i8253.c
+++ linux-2.6/arch/x86/kernel/i8253.c
@@ -31,6 +31,29 @@ static inline void pit_disable_clocksour
 struct clock_event_device *global_clock_event;
 
 /*
+ * define the PIT specific port access routines, which define the timing
+ * needed by the PIT registers on some platforms.
+ */
+unsigned char inb_pit(unsigned int port)
+{
+	/* delay for some accesses to PIT on motherboard or in chipset must be
+	   at least one microsecond, but be safe here. */
+	unsigned char value = inb(port);
+	udelay(2);
+	return value;
+}
+EXPORT_SYMBOL(inb_pit);
+
+void outb_pit(unsigned char value, unsigned int port)
+{
+	/* delay for some accesses to PIT on motherboard or in chipset must be
+	   at least one microsecond, but be safe here. */
+	outb(value, port);
+	udelay(2);
+}
+EXPORT_SYMBOL(outb_pit);
+
+/*
  * Initialize the PIT timer.
  *
  * This is also called after resume to bring the PIT into operation again.

Index: linux-2.6/include/asm-x86/i8253.h
===
--- linux-2.6.orig/include/asm-x86/i8253.h
+++ linux-2.6/include/asm-x86/i8253.h
@@ -12,7 +12,9 @@ extern struct clock_event_device *global
 extern void setup_pit_timer(void);
 
-#define inb_pit		inb_p
-#define outb_pit	outb_p
+/* accesses to PIT registers need careful delays on some platforms. Define
+   them here in a common place */
+extern unsigned char inb_pit(unsigned int port);
+extern void outb_pit(unsigned char value, unsigned int port);
 
 #endif	/* __ASM_I8253_H__ */
Re: [linux-kernel] Re: [patch 1/2] x86: define outb_pic and inb_pic to stop using outb_p and inb_p
Alan Cox wrote:
>> +unsigned char inb_pic(unsigned int port)
>> +{
>> +	/* delay for some accesses to PIC on motherboard or in chipset must be
>> +	   at least one microsecond, but be safe here. */
>> +	unsigned char value = inb(port);
>> +	udelay(2);
>> +	return value;
>> +}
>
> inline it. Its almost no instructions

Will do. Assume you desire inlining of outb_pic, and also of the inb_pit and outb_pit routines. I didn't do it before because the code is slightly bigger than the call.
[PATCH] x86: define outb_pic and inb_pic to stop using outb_p and inb_p
x86: define outb_pic and inb_pic to stop using outb_p and inb_p

The delay between io port accesses to the PIC is now defined using outb_pic and inb_pic. This fix provides the next step, using udelay(2) to define the *PIC specific* timing requirements, rather than bus-oriented timing, which is not well calibrated. Again, the primary reason for fixing this is to use a proper delay strategy, and in particular to fix crashes that can result from using port 80 writes on machines that have resources on port 80, such as the ENE chips used by Quanta in laptops it designs and sells to, e.g. HP.

Signed-off-by: David P. Reed <[EMAIL PROTECTED]>

Index: linux-2.6/include/asm-x86/i8259.h
===
--- linux-2.6.orig/include/asm-x86/i8259.h
+++ linux-2.6/include/asm-x86/i8259.h
@@ -29,7 +29,23 @@ extern void enable_8259A_irq(unsigned in
 extern void disable_8259A_irq(unsigned int irq);
 extern unsigned int startup_8259A_irq(unsigned int irq);
 
-#define inb_pic		inb_p
-#define outb_pic	outb_p
+/* the PIC may need a careful delay on some platforms, hence specific calls */
+static inline unsigned char inb_pic(unsigned int port)
+{
+	/* delay for some accesses to PIC on motherboard or in chipset must be
+	   at least one microsecond, but be safe here. */
+	unsigned char value = inb(port);
+	udelay(2);
+	return value;
+}
+
+static inline void outb_pic(unsigned char value, unsigned int port)
+{
+	/* delay for some accesses to PIC on motherboard or in chipset must be
+	   at least one microsecond, but be safe here. */
+	outb(value, port);
+	udelay(2);
+}
+
 #endif	/* __ASM_I8259_H__ */
[PATCH] x86: use explicit timing delay for pit accesses in kernel and pcspkr driver
x86: use explicit timing delay for pit accesses in kernel and pcspkr driver

PIT accesses in i8253.c and the pcspkr driver use outb_p for timing. Fix them to use an explicit timing delay for access to the PIT, rather than inb_p/outb_p calls, which use insufficiently explicit delays (defaulting to port 80 writes) that can cause freeze problems on some machines, such as Quanta motherboard machines using ENE ECs. Since the pcspkr driver accesses PIT registers directly, it should also use outb_pit, which is inlined, so it does not need to be exported. An explicit timing delay is only needed in pcspkr for accesses to the 8253 PIT. Fix the pcspkr driver to use the new outb_pit call properly, use named PIT port values rather than hex constants, and drop its use of inb_p and outb_p in accessing port 61h, where it has never been needed.

Signed-off-by: David P. Reed <[EMAIL PROTECTED]>

Index: linux-2.6/drivers/input/misc/pcspkr.c
===
--- linux-2.6.orig/drivers/input/misc/pcspkr.c
+++ linux-2.6/drivers/input/misc/pcspkr.c
@@ -36,6 +36,7 @@ static int pcspkr_event(struct input_dev
 {
 	unsigned int count = 0;
 	unsigned long flags;
+	unsigned char port61;
 
 	if (type != EV_SND)
 		return -1;
@@ -51,17 +52,18 @@ static int pcspkr_event(struct input_dev
 	spin_lock_irqsave(&i8253_lock, flags);
 
+	port61 = inb(0x61);
 	if (count) {
 		/* enable counter 2 */
-		outb_p(inb_p(0x61) | 3, 0x61);
+		outb(port61 | 3, 0x61);
 		/* set command for counter 2, 2 byte write */
-		outb_p(0xB6, 0x43);
+		outb_pit(0xB6, PIT_MODE);
 		/* select desired HZ */
-		outb_p(count & 0xff, 0x42);
-		outb((count >> 8) & 0xff, 0x42);
+		outb_pit(count & 0xff, PIT_CH2);
+		outb((count >> 8) & 0xff, PIT_CH2);
 	} else {
 		/* disable counter 2 */
-		outb(inb_p(0x61) & 0xFC, 0x61);
+		outb(port61 & 0xFC, 0x61);
 	}
 
 	spin_unlock_irqrestore(&i8253_lock, flags);

Index: linux-2.6/include/asm-x86/i8253.h
===
--- linux-2.6.orig/include/asm-x86/i8253.h
+++ linux-2.6/include/asm-x86/i8253.h
@@ -12,7 +12,25 @@ extern struct clock_event_device *global
 extern void setup_pit_timer(void);
 
-#define inb_pit		inb_p
-#define outb_pit	outb_p
+/* accesses to PIT registers need careful delays on some platforms. Define
+   them here in a common place */
+static inline unsigned char inb_pit(unsigned int port)
+{
+	/* delay for some accesses to PIT on motherboard or in chipset must be
+	   at least one microsecond, but be safe here. */
+	unsigned char value = inb(port);
+	udelay(2);
+	return value;
+}
+
+static inline void outb_pit(unsigned char value, unsigned int port)
+{
+	/* delay for some accesses to PIT on motherboard or in chipset must be
+	   at least one microsecond, but be safe here. */
+	outb(value, port);
+	udelay(2);
+}
+
 #endif	/* __ASM_I8253_H__ */
Re: [linux-kernel] Re: [PATCH 1/3] x86: fix init_8259A() to not use outb_pic
Rene Herman wrote:
> On 17-02-08 23:25, Alan Cox wrote:
>> On Sun, 17 Feb 2008 16:56:28 -0500 (EST) "David P. Reed" <[EMAIL PROTECTED]> wrote:
>>> fix init_8259A() which initializes the 8259 PIC to not use outb_pic,
>>> which is a renamed version of outb_p, and delete outb_pic define.
>>
>> NAK
>> The entire point of inb_pic/outb_pic is to isolate the various methods
>> and keep the logic for delays in one place. Undoing this just creates a
>> nasty mess. Quite probably inb_pic/outb_pic will end up as static
>> inlines that do inb or outb with a udelay of 1 or 2 but that is where
>> the knowledge belongs.
>
> Additional NAK insofar as the PIC delays were reported to be necessary
> with some VIA chipsets earlier in these threads.
>
> Rene.

This not being a place where performance matters, I will submit a new patch that changes inb_pic and outb_pic to use udelay(2). However, note that init_8259A does not use these consistently in its own accesses to the PIC registers. Should I change it to use the _pic calls wherever it touches the PIC registers, to be conservative? Note that there is a udelay(100) after the registers are all set up; perhaps this is the real VIA requirement...
[PATCH 3/3] x86: fix pcspkr to not use inb_p/outb_p calls.
Fix the pcspkr driver to use an explicit timing delay for access to the PIT, rather than inb_p/outb_p calls, which use insufficiently explicit delays (defaulting to port 80 writes) that can cause freeze problems on some machines, such as Quanta motherboard machines using ENE ECs. The explicit timing delay is only needed for accesses to the 8253 PIT. The standard requirement for the 8253 to respond to successive writes is 1 microsecond. The 8253 has never been on the expansion bus, so a proper delay has nothing to do with expansion bus timing, but instead with its internal logic's capability to react to input. Since udelay is correctly calibrated by the time the pcspkr driver is initialized, we use 1 microsecond as the timing. Also shorten lines to less than 80 characters.

Signed-off-by: David P. Reed <[EMAIL PROTECTED]>

Index: linux-2.6/drivers/input/misc/pcspkr.c
===
--- linux-2.6.orig/drivers/input/misc/pcspkr.c
+++ linux-2.6/drivers/input/misc/pcspkr.c
@@ -32,9 +32,11 @@ MODULE_ALIAS("platform:pcspkr");
 static DEFINE_SPINLOCK(i8253_lock);
 #endif
 
-static int pcspkr_event(struct input_dev *dev, unsigned int type, unsigned int code, int value)
+static int pcspkr_event(struct input_dev *dev, unsigned int type,
+			unsigned int code, int value)
 {
 	unsigned int count = 0;
+	unsigned char mask;
 	unsigned long flags;
 
 	if (type != EV_SND)
@@ -51,17 +53,21 @@ static int pcspkr_event(struct input_dev
 	spin_lock_irqsave(&i8253_lock, flags);
 
+	mask = inb(0x61);
 	if (count) {
 		/* enable counter 2 */
-		outb_p(inb_p(0x61) | 3, 0x61);
+		outb(mask | 3, 0x61);
+		/* some 8253's may require 1 usec. between accesses */
 		/* set command for counter 2, 2 byte write */
-		outb_p(0xB6, 0x43);
+		outb(0xB6, 0x43);
+		udelay(1);
 		/* select desired HZ */
-		outb_p(count & 0xff, 0x42);
+		outb(count & 0xff, 0x42);
+		udelay(1);
 		outb((count >> 8) & 0xff, 0x42);
 	} else {
 		/* disable counter 2 */
-		outb(inb_p(0x61) & 0xFC, 0x61);
+		outb(mask & 0xFC, 0x61);
 	}
 
 	spin_unlock_irqrestore(&i8253_lock, flags);
[PATCH 1/3] x86: fix init_8259A() to not use outb_pic
fix init_8259A() which initializes the 8259 PIC to not use outb_pic, which is a renamed version of outb_p, and delete the outb_pic define. There is already code in the .c files that does accesses to CMD & IMR registers in successive outb() calls without _p. Thus the outb_p is obviously not needed, if it ever was. Research into chipset documentation and old BIOS listings shows that IODELAY was not used even in early machines. Thus the delay between i/o port writes was deleted for the 8259. Again, the primary reason for fixing this is to use a proper delay strategy, and in particular to fix crashes that can result from using port 80 writes on machines that have resources on port 80, such as the ENE chips used by Quanta in laptops it designs and sells to, e.g. HP. Signed-off-by: David P. Reed <[EMAIL PROTECTED]> Index: linux-2.6/arch/x86/kernel/i8259_32.c === --- linux-2.6.orig/arch/x86/kernel/i8259_32.c +++ linux-2.6/arch/x86/kernel/i8259_32.c @@ -285,24 +285,30 @@ void init_8259A(int auto_eoi) spin_lock_irqsave(&i8259A_lock, flags); - outb(0xff, PIC_MASTER_IMR); /* mask all of 8259A-1 */ - outb(0xff, PIC_SLAVE_IMR); /* mask all of 8259A-2 */ - - /* -* outb_pic - this has to work on a wide range of PC hardware.
-*/ - outb_pic(0x11, PIC_MASTER_CMD); /* ICW1: select 8259A-1 init */ - outb_pic(0x20 + 0, PIC_MASTER_IMR); /* ICW2: 8259A-1 IR0-7 mapped to 0x20-0x27 */ - outb_pic(1U << PIC_CASCADE_IR, PIC_MASTER_IMR); /* 8259A-1 (the master) has a slave on IR2 */ + /* mask all of 8259A-1 */ + outb(0xff, PIC_MASTER_IMR); + /* mask all of 8259A-2 */ + outb(0xff, PIC_SLAVE_IMR); + + /* ICW1: select 8259A-1 init */ + outb(0x11, PIC_MASTER_CMD); + /* ICW2: 8259A-1 IR0-7 mapped to 0x20-0x27 */ + outb(0x20 + 0, PIC_MASTER_IMR); + /* 8259A-1 (the master) has a slave on IR2 */ + outb(1U << PIC_CASCADE_IR, PIC_MASTER_IMR); if (auto_eoi) /* master does Auto EOI */ - outb_pic(MASTER_ICW4_DEFAULT | PIC_ICW4_AEOI, PIC_MASTER_IMR); + outb(MASTER_ICW4_DEFAULT | PIC_ICW4_AEOI, PIC_MASTER_IMR); else /* master expects normal EOI */ - outb_pic(MASTER_ICW4_DEFAULT, PIC_MASTER_IMR); + outb(MASTER_ICW4_DEFAULT, PIC_MASTER_IMR); - outb_pic(0x11, PIC_SLAVE_CMD); /* ICW1: select 8259A-2 init */ - outb_pic(0x20 + 8, PIC_SLAVE_IMR); /* ICW2: 8259A-2 IR0-7 mapped to 0x28-0x2f */ - outb_pic(PIC_CASCADE_IR, PIC_SLAVE_IMR);/* 8259A-2 is a slave on master's IR2 */ - outb_pic(SLAVE_ICW4_DEFAULT, PIC_SLAVE_IMR); /* (slave's support for AEOI in flat mode is to be investigated) */ + /* ICW1: select 8259A-2 init */ + outb(0x11, PIC_SLAVE_CMD); + /* ICW2: 8259A-2 IR0-7 mapped to 0x28-0x2f */ + outb(0x20 + 8, PIC_SLAVE_IMR); + /* 8259A-2 is a slave on master's IR2 */ + outb(PIC_CASCADE_IR, PIC_SLAVE_IMR); + /* (slave's support for AEOI in flat mode is to be investigated) */ + outb(SLAVE_ICW4_DEFAULT, PIC_SLAVE_IMR); if (auto_eoi) /* * In AEOI mode we just have to mask the interrupt Index: linux-2.6/arch/x86/kernel/i8259_64.c === --- linux-2.6.orig/arch/x86/kernel/i8259_64.c +++ linux-2.6/arch/x86/kernel/i8259_64.c @@ -355,29 +355,30 @@ void init_8259A(int auto_eoi) spin_lock_irqsave(&i8259A_lock, flags); - outb(0xff, PIC_MASTER_IMR); /* mask all of 8259A-1 */ - outb(0xff, PIC_SLAVE_IMR); /* mask all of 8259A-2 */ + /* mask 
all of 8259A-1 */ + outb(0xff, PIC_MASTER_IMR); + /* mask all of 8259A-2 */ + outb(0xff, PIC_SLAVE_IMR); - /* -* outb_pic - this has to work on a wide range of PC hardware. -*/ - outb_pic(0x11, PIC_MASTER_CMD); /* ICW1: select 8259A-1 init */ + /* ICW1: select 8259A-1 init */ + outb(0x11, PIC_MASTER_CMD); /* ICW2: 8259A-1 IR0-7 mapped to 0x30-0x37 */ - outb_pic(IRQ0_VECTOR, PIC_MASTER_IMR); + outb(IRQ0_VECTOR, PIC_MASTER_IMR); /* 8259A-1 (the master) has a slave on IR2 */ - outb_pic(0x04, PIC_MASTER_IMR); + outb(0x04, PIC_MASTER_IMR); if (auto_eoi) /* master does Auto EOI */ - outb_pic(MASTER_ICW4_DEFAULT | PIC_ICW4_AEOI, PIC_MASTER_IMR); + outb(MASTER_ICW4_DEFAULT | PIC_ICW4_AEOI, PIC_MASTER_IMR); else/* master expects normal EOI */ - outb_pic(MASTER_ICW4_DEFAULT, PIC_MASTER_IMR); + outb(MASTER_ICW4_DEFAULT, PIC_MASTER_IMR); - outb_pic(0x11, PIC_SLAVE_CMD); /* ICW1: select 8259A-2 init */ + /* ICW1: select 8259A-2 init */ + outb(0x11, PIC_SLAVE_CMD); /* ICW2: 8259A-2 IR0-7 mapped to 0x38-0x3f */ -
[PATCH 2/3] x86: fix cmos read and write to not use inb_p and outb_p
fix code to access CMOS RTC registers so that it does not use the inb_p and outb_p routines, which are deprecated. Extensive research on all known CMOS RTC chipset timing shows that there is no need for a delay in accessing the registers of these chips, even on old machines. These chips have never been on an expansion bus, but have always been "motherboard" resources, either in the processor chipset or explicitly on the motherboard, and they are not part of the ISA/LPC or PCI buses, so delays should not be based on bus timing. The reasons to fix it: 1) port 80 writes often hang some laptops that use ENE EC chipsets, esp. those designed and manufactured by Quanta for HP; 2) RTC accesses are timing sensitive, and extra microseconds may matter; 3) the new "io_delay" function is calibrated by expansion bus timing needs, and thus is not appropriate for access to CMOS RTC registers.

Signed-off-by: David P. Reed <[EMAIL PROTECTED]>

Index: linux-2.6/arch/x86/kernel/rtc.c
===
--- linux-2.6.orig/arch/x86/kernel/rtc.c
+++ linux-2.6/arch/x86/kernel/rtc.c
@@ -151,8 +151,8 @@ unsigned char rtc_cmos_read(unsigned cha
 	unsigned char val;
 
 	lock_cmos_prefix(addr);
-	outb_p(addr, RTC_PORT(0));
-	val = inb_p(RTC_PORT(1));
+	outb(addr, RTC_PORT(0));
+	val = inb(RTC_PORT(1));
 	lock_cmos_suffix(addr);
 	return val;
 }
@@ -161,8 +161,8 @@ EXPORT_SYMBOL(rtc_cmos_read);
 void rtc_cmos_write(unsigned char val, unsigned char addr)
 {
 	lock_cmos_prefix(addr);
-	outb_p(addr, RTC_PORT(0));
-	outb_p(val, RTC_PORT(1));
+	outb(addr, RTC_PORT(0));
+	outb(val, RTC_PORT(1));
 	lock_cmos_suffix(addr);
 }
 EXPORT_SYMBOL(rtc_cmos_write);
[PATCH 0/3] x86: cleanup primary motherboard chip port access delays
cleanup motherboard chip io port delays. inb_p and outb_p have traditionally used a write to port 80 (a non-existent port) as a delay. Though there is an argument that that is a good delay for devices on the ISA or PCI expansion buses, it is not a good mechanism for devices in the processor chipset or on the "motherboard". The write to port 80 at best causes an abort on the ISA or LPC bus, and on some machines (like many of the HP laptops manufactured by Quanta) actually writes data to real i/o devices. For example, the ENE Embedded Controller chip family defaults to providing a register at port 80 that can be written, and which can cause an interrupt in the Embedded Controller. This has been shown to cause hangs on some machines, especially in accessing the CMOS RTC during bootup. This patch series addresses three of the places where these are used in common kernel code, in particular the three uses that affect the HP laptops mentioned above, modifying the delays to match the worst known delay issues for the specific chips. The patch set is complementary to the io_delay= kernel parameter added in 2.6.25, since it means fewer users will need to add that parameter to run Linux "out of the box" without hanging.
[PATCH 1/3] x86: fix init_8259A() to not use outb_pic
fix init_8259A() which initializes the 8259 PIC to not use outb_pic, which is a renamed version of outb_p, and delete the outb_pic define. There is already code in the .c files that does accesses to the CMD and IMR registers in successive outb() calls without _p. Thus the outb_p is obviously not needed, if it ever was. Research into chipset documentation and old BIOS listings shows that IODELAY was not used even in early machines. Thus the delay between i/o port writes was deleted for the 8259. Again, the primary reason for fixing this is to use a proper delay strategy, and in particular to fix crashes that can result from using port 80 writes on machines that have resources on port 80, such as the ENE chips used by Quanta in laptops it designs and sells to, e.g., HP.

Signed-off-by: David P. Reed [EMAIL PROTECTED]

Index: linux-2.6/arch/x86/kernel/i8259_32.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/i8259_32.c
+++ linux-2.6/arch/x86/kernel/i8259_32.c
@@ -285,24 +285,30 @@ void init_8259A(int auto_eoi)
 	spin_lock_irqsave(&i8259A_lock, flags);
 
-	outb(0xff, PIC_MASTER_IMR);	/* mask all of 8259A-1 */
-	outb(0xff, PIC_SLAVE_IMR);	/* mask all of 8259A-2 */
-
-	/*
-	 * outb_pic - this has to work on a wide range of PC hardware.
-	 */
-	outb_pic(0x11, PIC_MASTER_CMD);	/* ICW1: select 8259A-1 init */
-	outb_pic(0x20 + 0, PIC_MASTER_IMR);	/* ICW2: 8259A-1 IR0-7 mapped to 0x20-0x27 */
-	outb_pic(1U << PIC_CASCADE_IR, PIC_MASTER_IMR);	/* 8259A-1 (the master) has a slave on IR2 */
+	/* mask all of 8259A-1 */
+	outb(0xff, PIC_MASTER_IMR);
+	/* mask all of 8259A-2 */
+	outb(0xff, PIC_SLAVE_IMR);
+
+	/* ICW1: select 8259A-1 init */
+	outb(0x11, PIC_MASTER_CMD);
+	/* ICW2: 8259A-1 IR0-7 mapped to 0x20-0x27 */
+	outb(0x20 + 0, PIC_MASTER_IMR);
+	/* 8259A-1 (the master) has a slave on IR2 */
+	outb(1U << PIC_CASCADE_IR, PIC_MASTER_IMR);
 	if (auto_eoi)	/* master does Auto EOI */
-		outb_pic(MASTER_ICW4_DEFAULT | PIC_ICW4_AEOI, PIC_MASTER_IMR);
+		outb(MASTER_ICW4_DEFAULT | PIC_ICW4_AEOI, PIC_MASTER_IMR);
 	else		/* master expects normal EOI */
-		outb_pic(MASTER_ICW4_DEFAULT, PIC_MASTER_IMR);
+		outb(MASTER_ICW4_DEFAULT, PIC_MASTER_IMR);
 
-	outb_pic(0x11, PIC_SLAVE_CMD);	/* ICW1: select 8259A-2 init */
-	outb_pic(0x20 + 8, PIC_SLAVE_IMR);	/* ICW2: 8259A-2 IR0-7 mapped to 0x28-0x2f */
-	outb_pic(PIC_CASCADE_IR, PIC_SLAVE_IMR);	/* 8259A-2 is a slave on master's IR2 */
-	outb_pic(SLAVE_ICW4_DEFAULT, PIC_SLAVE_IMR);	/* (slave's support for AEOI in flat mode is to be investigated) */
+	/* ICW1: select 8259A-2 init */
+	outb(0x11, PIC_SLAVE_CMD);
+	/* ICW2: 8259A-2 IR0-7 mapped to 0x28-0x2f */
+	outb(0x20 + 8, PIC_SLAVE_IMR);
+	/* 8259A-2 is a slave on master's IR2 */
+	outb(PIC_CASCADE_IR, PIC_SLAVE_IMR);
+	/* (slave's support for AEOI in flat mode is to be investigated) */
+	outb(SLAVE_ICW4_DEFAULT, PIC_SLAVE_IMR);
 	if (auto_eoi)
 		/*
 		 * In AEOI mode we just have to mask the interrupt
Index: linux-2.6/arch/x86/kernel/i8259_64.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/i8259_64.c
+++ linux-2.6/arch/x86/kernel/i8259_64.c
@@ -355,29 +355,30 @@ void init_8259A(int auto_eoi)
 	spin_lock_irqsave(&i8259A_lock, flags);
 
-	outb(0xff, PIC_MASTER_IMR);	/* mask all of 8259A-1 */
-	outb(0xff, PIC_SLAVE_IMR);	/* mask all of 8259A-2 */
+	/* mask all of 8259A-1 */
+	outb(0xff, PIC_MASTER_IMR);
+	/* mask all of 8259A-2 */
+	outb(0xff, PIC_SLAVE_IMR);
 
-	/*
-	 * outb_pic - this has to work on a wide range of PC hardware.
-	 */
-	outb_pic(0x11, PIC_MASTER_CMD);	/* ICW1: select 8259A-1 init */
+	/* ICW1: select 8259A-1 init */
+	outb(0x11, PIC_MASTER_CMD);
 	/* ICW2: 8259A-1 IR0-7 mapped to 0x30-0x37 */
-	outb_pic(IRQ0_VECTOR, PIC_MASTER_IMR);
+	outb(IRQ0_VECTOR, PIC_MASTER_IMR);
 	/* 8259A-1 (the master) has a slave on IR2 */
-	outb_pic(0x04, PIC_MASTER_IMR);
+	outb(0x04, PIC_MASTER_IMR);
 	if (auto_eoi)	/* master does Auto EOI */
-		outb_pic(MASTER_ICW4_DEFAULT | PIC_ICW4_AEOI, PIC_MASTER_IMR);
+		outb(MASTER_ICW4_DEFAULT | PIC_ICW4_AEOI, PIC_MASTER_IMR);
 	else		/* master expects normal EOI */
-		outb_pic(MASTER_ICW4_DEFAULT, PIC_MASTER_IMR);
+		outb(MASTER_ICW4_DEFAULT, PIC_MASTER_IMR);
 
-	outb_pic(0x11, PIC_SLAVE_CMD);	/* ICW1: select 8259A-2 init */
+	/* ICW1: select 8259A-2 init */
+	outb(0x11, PIC_SLAVE_CMD);
 	/* ICW2: 8259A-2 IR0-7 mapped to 0x38-0x3f */
-	outb_pic
[PATCH 2/3] x86: fix cmos read and write to not use inb_p and outb_p
fix code to access CMOS RTC registers so that it does not use inb_p and outb_p routines, which are deprecated. Extensive research on all known CMOS RTC chipset timing shows that there is no need for a delay in accessing the registers of these chips, even on old machines. These chips are never on an expansion bus, but have always been motherboard resources, either in the processor chipset or explicitly on the motherboard, and they are not part of the ISA/LPC or PCI buses, so delays should not be based on bus timing. The reasons to fix it: 1) port 80 writes often hang some laptops that use ENE EC chipsets, esp. those designed and manufactured by Quanta for HP; 2) RTC accesses are timing sensitive, and extra microseconds may matter; 3) the new io_delay function is calibrated by expansion bus timing needs, and thus is not appropriate for access to CMOS RTC registers.

Signed-off-by: David P. Reed [EMAIL PROTECTED]

Index: linux-2.6/arch/x86/kernel/rtc.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/rtc.c
+++ linux-2.6/arch/x86/kernel/rtc.c
@@ -151,8 +151,8 @@ unsigned char rtc_cmos_read(unsigned cha
 	unsigned char val;
 
 	lock_cmos_prefix(addr);
-	outb_p(addr, RTC_PORT(0));
-	val = inb_p(RTC_PORT(1));
+	outb(addr, RTC_PORT(0));
+	val = inb(RTC_PORT(1));
 	lock_cmos_suffix(addr);
 	return val;
 }
@@ -161,8 +161,8 @@ EXPORT_SYMBOL(rtc_cmos_read);
 void rtc_cmos_write(unsigned char val, unsigned char addr)
 {
 	lock_cmos_prefix(addr);
-	outb_p(addr, RTC_PORT(0));
-	outb_p(val, RTC_PORT(1));
+	outb(addr, RTC_PORT(0));
+	outb(val, RTC_PORT(1));
 	lock_cmos_suffix(addr);
 }
 EXPORT_SYMBOL(rtc_cmos_write);
[PATCH 0/3] x86: cleanup primary motherboard chip port access delays
cleanup motherboard chip io port delays. inb_p and outb_p have traditionally used a write to port 80 (a non-existent port) as a delay. Though there is an argument that that is a good delay for devices on the ISA or PCI expansion buses, it is not a good mechanism for devices in the processor chipset or on the "motherboard". The write to port 80 at best causes an abort on the ISA or LPC bus, and on some machines (like many of the HP laptops manufactured by Quanta) actually writes data to real i/o devices. For example, the ENE Embedded Controller chip family defaults to provide a register at port 80 that can be written, and which can cause an interrupt in the Embedded Controller. This has been shown to cause hangs on some machines, especially in accessing the CMOS RTC during bootup. This patch series addresses three of the places where these are used in common kernel code - in particular the three uses that affect the HP laptops mentioned above, modifying the delays to match the worst known delay issues for the specific chips. The patch set is complementary to the iodelay= kernel parameter added in 2.6.25, since it means fewer users will need to add that parameter to run linux "out of the box" without hanging.
Re: [linux-kernel] Re: [PATCH 1/3] x86: fix init_8259A() to not use outb_pic
Rene Herman wrote:
On 17-02-08 23:25, Alan Cox wrote:
On Sun, 17 Feb 2008 16:56:28 -0500 (EST) David P. Reed [EMAIL PROTECTED] wrote:
fix init_8259A() which initializes the 8259 PIC to not use outb_pic, which is a renamed version of outb_p, and delete outb_pic define.
NAK. The entire point of inb_pic/outb_pic is to isolate the various methods and keep the logic for delays in one place. Undoing this just creates a nasty mess. Quite probably inb_pic/outb_pic will end up as static inlines that do inb or outb with a udelay of 1 or 2, but that is where the knowledge belongs.
Additional NAK insofar as the PIC delays were reported to be necessary with some VIA chipsets earlier in these threads.
Rene.

This not being a place where performance matters, I will submit a new patch that changes inb_pic and outb_pic to use udelay(2). However, note that init_8259A does not use these consistently in its own accesses to the PIC registers. Should I change it to use the _pic calls wherever it touches the PIC registers, to be conservative? Note that there is a udelay(100) after the registers are all set up; perhaps this is the real VIA requirement...
Re: [linux-kernel] Re: [PATCH] x86: provide a DMI based port 0x80 I/O delay override.
David Woodhouse wrote:
On Fri, 2008-01-11 at 09:35 -0500, David P. Reed wrote:
Using any "unused port" for a delay means that the machine check feature is wasted and utterly unusable.
Not entirely unusable. You can recover silently from the machine check if it was one of the known accesses to the 'unused port'. It certainly achieves a delay :)

I'm sure that's what the driver writers had in mind. ;-) And I think we probably have a great shot at getting Intel, Microsoft, HP, et al. to add a feature for Linux to one of the ACPI table specifications that defines an "unused port for delay purposes" field in the ACPI 4.0 spec, and retrofit it into PC/104 machine BIOSes. At least Microsoft doesn't have a patent on using port 80 for delay purposes. :-)
Re: [linux-kernel] Re: [PATCH] x86: provide a DMI based port 0x80 I/O delay override.
Alan Cox wrote:
bus abort on the LPC bus". More problematic is that I would think some people might want to turn on the AMD feature that generates machine checks if a bus timeout happens. The whole point of machine checks is
An ISA/LPC bus timeout is fulfilled by the bridge so doesn't cause an MCE.

Good possibility, but the documentation on HyperTransport suggests otherwise, even for LPC bridges in this particular modern world of AMD64. I might do the experiment someday to see if my LPC bridge is implemented in a way that does or doesn't support enabling MCE's. Could be different between Intel and AMD - I haven't had reason to pore over the Intel chipset specs, since my poking into all this stuff has been driven by my personal machine's issues, and it's not got any Intel compatible parts.
Re: [linux-kernel] Re: [PATCH] x86: provide a DMI based port 0x80 I/O delay override.
Rene Herman wrote:
On 11-01-08 02:36, Zachary Amsden wrote:
FWIW, I fixed the problem locally by recompiling, changing port 80 to port 84 in io.h; works great, and doesn't conflict with any occupied ports.
Might not give you a "proper" delay though. 0xed should be a better choice...

I don't think there is any magic here. I modified the patch to do *no delay at all* in the io_delay "quirk" and have been running reliably for weeks, including the very heavy I/O load that comes from using software RAID on this nice laptop that has two separate SATA drives! This particular laptop has no problematic devices - the only problem is actually in the CMOS_READ and CMOS_WRITE macros that *use* the _p operations in a way that is unnecessary on this machine. (In fact, it would be hard to add a problematic device - there's no PCMCIA slot either, and so every option is USB or Firewire.)

Using 0xED happens to work, but it's not guaranteed to work either. There is not a "standard" for an "unused port that is mapped to cause a bus abort on the LPC bus". More problematic is that I would think some people might want to turn on the AMD feature that generates machine checks if a bus timeout happens. The whole point of machine checks is to allow the machine to be more reliable. Using any "unused port" for a delay means that the machine check feature is wasted and utterly unusable.
Re: [linux-kernel] Re: [PATCH] x86: provide a DMI based port 0x80 I/O delay override.
Rene Herman wrote:
On 10-01-08 01:37, Robert Hancock wrote:
I agree. In this case the BIOS on these laptops is trying to tell us "port 80 is used for our purposes, do not touch it". We should be listening.
Listening is fine but what are you going to do after you have listened? Right, not use port 0x80 but since that's the current idea anyway outside of legacy drivers, you don't actually need to listen. If the quirk-to-0xed or similar was to stay, it's a much better switching point than DMI strings but if not, it's not actually important.

Well, I was just suggesting a warning that would come up when a driver that still used port 80 was initialized... I think that is what Alan Cox and others suggest for legacy drivers that have worked - I agree that it may not be the right thing to screw them up, especially since my laptop, and probably most machines that might start using port 80 or other "legacy ports", won't ever need those drivers.

I thought more about a complete solution last night. A clean idea that really fits the linux design might be the following outline of a patch. I suspect it might seem far less ugly, and probably would meet Alan Cox's needs, too - I am very sympathetic about not breaking 8390's, etc.

Define a "motherboard resources" driver that claims all the resources defined for PNP0C02 devices during the pnp process. I think Windoze actually does something quite similar. This would claim port 80.

Define an "iodelay" driver. This driver exists largely to claim port 80 for the iodelay operation (you could even define an option for other ports). Legacy drivers would be modified to require iodelay. The iodelay driver would set up the iodelay mechanism to do something other than the "boot time" default - which could be no delay, or udelay. It would also set a flag that says "_p operations are safe".

Put a WARN_ONCE() in the in/out*_p operations that checks the flag that is set by the iodelay driver. This would only trigger in the case that 80 or whatever was reserved by some other device driver - such as the motherboard resources driver above. Modern machines won't do that.

Finally, anything that happens before the motherboard resources and iodelay drivers are initialized cannot use in*_p or out*_p (whether they can be loadable modules rather than built in is a question). This is a very small set, and I believe, with the exception of the PIT (8253/4), very safe.

Note that this idea is also compatible with rewriting all drivers to use "iodelay()" explicitly instead of _p(). But it doesn't require that.

Rene.
Re: [linux-kernel] Re: [PATCH] x86: provide a DMI based port 0x80 I/O delay override.
Zachary Amsden wrote:
According to Phoenix Technologies book "System BIOS for IBM PCs, Compatibles and EISA Computers, 2nd Edition", the I/O port list gives port 0080h R/W Extra page register (temporary storage). Despite looking, I've never seen it documented anywhere else, but I believe it works on just about every PC platform.

Except, apparently, my laptop. The port 80 problem was discovered by me, after months of "bisecting" the running code around a problem with hanging when using hwclock in 64-bit mode when ACPI is on. So it kills my laptop, too, and many current laptop motherboards designed by Quanta for HP and Compaq (dv6000, dv9000, tx1000, apparently).

In the last couple of weeks, I was able with luck to discover that the problem is the ENE KB3920 chip, which is the "big brother" of the KB3700 chip included in the OLPC XO "$100 laptop", made also by Quanta. I verified this by taking my laptop apart - a fun and risky experience. Didn't break any connectors, but I don't recommend it for those who are not experienced disassembling laptops and cellphones, etc.

The KB3920 contains an EC, an SMBus, a KBC, some watchdog timers, and a variety of other functions that keep the laptop going, coordinating the relationships among various peripherals. The firmware is part standard from ENE, part OEM-specific, in this case coded by Quanta or a BIOS subcontractor. You can read the spec sheet for the KB3700 online at laptop.org, since the specs of the laptop are "open". The 3920's spec is confidential. And the firmware is confidential as well for both the 3700 and 3920. Clues as to what it does can be gleaned by reading the disassembler output of the DSDT code in the particular laptops - though the SMM BIOS probably also talks to it.

Modern machines have many subsystems, and the ACPI and SMBIOS coordinate to run them; blade servers also have drawer controllers and backplane management buses. The part that runs Linux is only part of the machine. Your laptop isn't an aberration. It's part of the new generation of evolved machines that take advantage of the capabilities of ACPI and SMBIOS and DMI standards that are becoming core parts of the market.
Re: [linux-kernel] Re: [PATCH] x86: provide a DMI based port 0x80 I/O delay override.
Christer Weinigel wrote:
Did I miss anything?

Nothing that seems *crucial* going forward for Linux. The fate of "legacy machines" is really important to get right. I have a small suggestion in mind that might be helpful in the future: the "motherboard resources" discovered as PNP0C02 devices in their _CRS settings in ACPI during ACPI PnP startup should be reserved (or checked), and any drivers that still use port 80 implicitly should reserve that port.

This may be too late in the boot process to make a decision not to use port 80, and it doesn't help decide a strategy to use an alternate port (0xED happens to "work" on the dv9000 machines in the sense that it generates a bus timeout on LPC, but there is no guarantee that 0xED is free on any particular motherboard, and "unusedness" is not declared in any BIOS/ACPI tables) or to use a udelay-based iodelay (but there is nothing in the BIOS tables that suggests the right delays for various I/O ports, if any modern parts need them... which I question, but can't prove a negative in general).

However, doing the reservations on such resources could generate a warning that would help diagnose new, current, and future designs, including devices like the ENE KB3920 that have a port that is defaulted to port 80 and routed to the EC for functions that the firmware and ACPI can agree to do. Or any other ports used in new ways and properly notified to the OS via the now-standard Wintel BIOS functions.

I don't know if /proc/ioports is being maintained, but the fact that it doesn't contain all of those PNP0C02 resources known on my machine seems to be a bug - which I am happy to code a patch or two for as a contribution back to Linux, if it isn't on the way out as the /sys hierarchy does a better job. /sys/bus/pnp/... does get built properly and has port 80 described properly - not as a DMA port, but as a port in use by device 05:00, which is the motherboard resource catchall. Thus the patch would be small.
Re: [linux-kernel] Re: [PATCH] x86: provide a DMI based port 0x80 I/O delay override.
Christer Weinigel wrote: Argument by personal authority. That's good. There is no other kind of argument. Are you claiming supernatural authority drives your typing fingers, or is your argument based on what you think you know? I have piles of code that I wrote, spec sheets (now that I'm back in my home office), code that others wrote at the time, and documentation from vendors that come from my personal experiences. That doesn't mean I'm always right - always happy to learn something new. Just don't condescend to a 55-year-old who has been writing operating systems, compilers, and designing hardware for almost 40 years professionally (yes, I got my first job at 16 writing FORTRAN code to simulate hydrodynamic systems). I guess that's why you don't seem to understand the difference between reading the serial port status register and not being allowed to access a register at all due to such things as the 4 cycle delay you quoted yourself from the 8390 data sheet. If you read what I said carefully, I said that the 8390 was a very special case. The "chip select" problem it experienced was pretty much unique among boards of the time. Those of us who looked at its design and had any experience designing hardware for buses like the Unibus or even the buses on PDP-8's and DG machines thought it had to be a joke. Of course it saved money per board, so it beat the 3Com boards on price - and you could program it after a fashion. So it involved "cheaping out". The normal timing problem was that an out or in operation to a board or chip required some time to elapse before the chip performed the side effects internally so that the next operation to it would have an effect. This is exactly the reason why most chips and boards are designed to have a pollable flag that indicates operation completion. 
The serial "buffer empty" flag is the simplest possible explanatory example of such handshaking that came to mind (writing a character to a serial output device twice often leads to surprises, unless you wait for the previous character to clock out). See my comment on the RTC below, for an example that is more complex to explain. and similar issues with the I8253 that I quoted from its data sheet a few posts ago. The 8253 was a motherboard chip. I am not sure it had any timing problems with its electrical signalling. I just don't remember. The spec sheet doesn't say its internal state can get scrambled. I was thinking of another timer, the RTC, which is usually a part of the Super I/O. The RTC has very well documented timing requirements. But none of the spec sheets, nor my experience with it, mention electrical issues that prevented back-to-back port operations. The documented timing requirements have to do with the state during the time it ticks over internally once per second. But it is carefully designed to have a flag that is "on" during 244 microseconds prior to and covering the time it is unsafe to read the registers. That design is special because it is designed to operate when the machine is powered off, so it has two internal clock domains, one of which is used in "low power" mode and is very slow to minimize power.
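The RTC handshake described above can be sketched in C. This is a minimal sketch, not a real driver: the register names match the usual CMOS RTC conventions (bit 7 of Status Register A is the "update in progress" flag), but the CMOS accessor below is a mock so the code is self-contained; real hardware would be reached through the 0x70/0x71 index/data port pair.

```c
#include <stdint.h>

/* CMOS RTC register offsets and the UIP flag (bit 7 of Status Register A),
 * which is set ~244 us before and during the once-per-second internal
 * update, i.e. the window in which the time registers are unsafe to read. */
#define RTC_SECONDS    0x00
#define RTC_STATUS_A   0x0A
#define RTC_UIP        0x80

/* Mocked CMOS read so the sketch runs anywhere; a real driver would do an
 * outb(reg, 0x70) / inb(0x71) pair here. */
static uint8_t fake_cmos[128] = { [RTC_SECONDS] = 42 };
static uint8_t cmos_read(uint8_t reg) { return fake_cmos[reg]; }

/* Read the seconds register only while no update is in progress --
 * the flag-polling handshake, with no delay port involved. */
uint8_t rtc_read_seconds(void)
{
    while (cmos_read(RTC_STATUS_A) & RTC_UIP)
        ;                       /* spin until the UIP window has passed */
    return cmos_read(RTC_SECONDS);
}
```

The point of the sketch is the protocol, not the mock: the chip itself tells you when it is safe to touch it, so no fixed delay is needed.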
Re: [linux-kernel] Re: [PATCH] x86: provide a DMI based port 0x80 I/O delay override.
Christer Weinigel wrote: There is no need to use io writes to supposedly/theoretically "unused ports" to make drivers work on any bus. ISA included! You can, for example, wait for an ISA bus serial adapter to put out its next character by looping reading the port that has the output buffer full flag in a tight loop, with no delay code at all. And if you need to time things, just call a timing loop subroutine that you calibrate at boot time. Now you're totally confusing things. You're talking about looking at bits in a register to see if a transmit register is empty. That's easy. The delays needed for the Intel M8259 and M8253 say that you're not even allowed to access the registers _at_ _all_ for some time after a register access. If you do a write to a register immediately followed by any access, including a read of the status register, you can corrupt the state of the chip. Not true. Even on the original IBM 5150 PC, the 8259 on the motherboard accepted back-to-back OUT and IN instructions, and it would NOT trash the chip state. You can read the original IBM BIOS code if you like. I don't remember about the 8253's timing. I doubt the chip's state would be corrupted in any way. The data and address lines were the same data and address lines that the microprocessor used to access memory - it didn't "hold" the lines stable any longer than the OUT instruction. And the Intel chips are not the only ones with that kind of brain damage. But what makes the 8259 and 8253 a big problem is that every modern PC has a descendant of those chips in them. Register compatible. Not the same chips or even the same masks or timing requirements. The discrete Intel chips or clones got aggregated into Super I/O chips, and the Super I/O chips were put on an LPC bus (an ISA bus with another name) or integrated into the southbridge. 
Don't try to teach your grandmother to suck eggs: I've been programming PC compatibles since probably before you were able to do long division - including writing code on the first prototype IBM PCs, the first pre-manufacturing PC-ATs, and zillions of clones. (and I was also involved in designing hardware including the so-called "Lotus Intel" expanded memory cards and the original PC cards) The 8259 PIC is an *interrupt controller*. It was NEVER present in a Super I/O chip, or an LPC chip. Its functionality was absorbed into the chipsets that control interrupt mapping, like the PIIX and the nForce. And the "if it ain't broken, don't fix it" mantra probably means that some modern chipsets are still using exactly the same internal design as the 25-year-old chips and will still be subject to some of those ancient limitations. Oh, come on. Give the VLSI designers some credit for brains. The CAD tools used to design the 8259 and 8253 were so primitive you couldn't even get a chip manufactured with designs from that era today. When people design chips today they do it with simulators and testers running test suites that were not available at the time.
Re: [linux-kernel] Re: [PATCH] x86: provide a DMI based port 0x80 I/O delay override.
Alan - I dug up a DP83901A SNIC datasheet in a quick Google search; while that wasn't the only such chip, it was one of them. I can forward the PDF (the www.alldatasheet.com site dynamically creates the download URL), if anyone wants it. The relevant passage says, in regard to delaying between checking the CRDA addresses to see if a dummy "remote read" has been executed, and in regard perhaps to other card I/O register loops: TIME BETWEEN CHIP SELECTS The SNIC requires that successive chip selects be no closer than 4 bus clocks (BSCK) together. If the condition is violated the SNIC may glitch ACK. CPUs that operate from pipelined instructions (i.e. 386) or have a cache (i.e. 486) can execute consecutive I/O cycles very quickly. The solution is to delay the execution of consecutive I/O cycles by either breaking the pipeline or forcing the CPU to access outside its cache. The NE2000 as I recall had no special logic on the board to protect the chip from successive chip selects that were too close - which is the reason for the problem. Clearly an out to port 80 takes more than 4 ISA bus clocks, so that works if the NE2000 is on the ISA bus. On the other hand, there are other ways to delay more than 4 ISA bus clocks. And as you say, one needs a delay for this chip that relates to the chip's card's bus's clock speed, not absolute time. Alan Cox wrote: As well you should. I am honestly curious (for my own satisfaction) as to what the natsemi docs say the delay code should do (can't imagine they say "use io port 80 because it is unused"). I don't have any They say you must allow 4 bus clocks for the address decode. They don't deal with the ISA side as the chip itself has no ISA glue. copies anymore. But mere curiosity on my part is not worth spending a lot of time on - I know you are super busy. If there's a copy online at a URL ... Not that I know of. There may be. A good general source of info is Russ Nelson's old DOS packet driver collection. 
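A back-of-envelope check of the figure quoted above: the datasheet constraint is relative to BSCK, not absolute time, so at a nominal 8 MHz ISA bus clock (an assumption for illustration), 4 BSCK periods work out to 500 ns - which is why a single ISA-speed port access, at roughly a microsecond, comfortably covers it.

```c
/* Minimum gap between chip selects, in nanoseconds, given a bus clock
 * frequency and a required number of bus clocks.  The 8 MHz ISA figure
 * used below is illustrative; the real requirement scales with the bus. */
static long chip_select_gap_ns(long bus_hz, int clocks)
{
    return clocks * (1000000000L / bus_hz);
}
```

At 8 MHz, `chip_select_gap_ns(8000000L, 4)` gives 500 ns; on a slower (or underclocked) bus the required gap grows proportionally, which is the crux of the "bus clocks, not absolute time" point.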
Re: [linux-kernel] Re: [PATCH] x86: provide a DMI based port 0x80 I/O delay override.
Alan Cox wrote: The natsemi docs here say otherwise. I trust them not you. As well you should. I am honestly curious (for my own satisfaction) as to what the natsemi docs say the delay code should do (can't imagine they say "use io port 80 because it is unused"). I don't have any copies anymore. But mere curiosity on my part is not worth spending a lot of time on - I know you are super busy. If there's a copy online at a URL ... The problem is that certain people, unfortunately those who know nothing about ISA related bus systems, keep trying to confuse ISA delay logic with core chip logic and end up trying to solve both a problem and a non-problem in one, creating a nasty mess in the process. I agree that the problems of chip logic and ISA delay are all tangled up, probably more than need be. I hope that the solution turns out to simplify matters, and hopefully to document the intention of the resulting code sections a bit more clearly for the future.
Re: [linux-kernel] Re: [PATCH] x86: provide a DMI based port 0x80 I/O delay override.
Ondrej Zary wrote: On Tuesday 08 January 2008 18:24:02 David P. Reed wrote: Windows these days does delays with timing loops or the scheduler. It doesn't use a "port". Also, Windows XP only supports machines that tend not to have timing problems that use delays. Instead, if a device takes a while to respond, it has a "busy bit" in some port or memory slot that can be tested. Windows XP can run on a machine with ISA slot(s) and has built-in drivers for some plug-and-play ISA cards - e.g. the famous 3Com EtherLink III. I think that there's a driver for NE2000-compatible cards too and it probably works. There is no need to use I/O writes to supposedly/theoretically "unused ports" to make drivers work on any bus. ISA included! You can, for example, wait for an ISA bus serial adapter to put out its next character by reading the port that has the output-buffer-full flag in a tight loop, with no delay code at all. And if you need to time things, just call a timing loop subroutine that you calibrate at boot time. I wrote DOS drivers for NE2000's on the ISA bus when they were brand-new designs from Novell without such kludges as writes to I/O port 80. I don't remember writing a driver for the 3Com devices - probably didn't, because 3Com's cards were expensive at the time. In any case, Linux *did* adopt this port 80 strategy - I'm sure all concerned thought it was frightfully clever at the time. Linus expressed his skepticism in the comments in io.h. The problem is to safely move away from it toward a proper strategy that doesn't depend on "bus aborts" which would trigger machine checks if they were properly enabled.
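The flag-polling approach described above can be sketched as follows. The 16550-style register offsets and the THRE bit are the standard UART conventions; the port accessors here are mocks so the sketch is self-contained and runnable, where a real ISA driver would use inb()/outb() against the adapter's base address.

```c
#include <stdint.h>

#define UART_THR       0       /* transmit holding register (write) */
#define UART_LSR       5       /* line status register offset */
#define UART_LSR_THRE  0x20    /* transmit-holding-register empty flag */

/* Mocked port I/O so the sketch compiles anywhere.  The fake LSR always
 * reports "empty", standing in for hardware that has drained its buffer. */
static uint8_t fake_regs[8] = { [UART_LSR] = UART_LSR_THRE };
static int tx_count;
static uint8_t mock_inb(int off) { return fake_regs[off]; }
static void mock_outb(uint8_t v, int off)
{
    (void)v;
    if (off == UART_THR)
        tx_count++;             /* count characters "clocked out" */
}

/* Wait for the output buffer to drain, then send the character --
 * handshaking on the device's own flag, with no delay port at all. */
void serial_putc(uint8_t c)
{
    while (!(mock_inb(UART_LSR) & UART_LSR_THRE))
        ;                       /* poll the buffer-empty flag */
    mock_outb(c, UART_THR);
}

int serial_tx_count(void) { return tx_count; }
```

Writing a second character without the polling loop is exactly the "surprises" case mentioned earlier in the thread: the first character would be overwritten before it clocked out.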
Re: [linux-kernel] Re: [PATCH] x86: provide a DMI based port 0x80 I/O delay override.
Windows these days does delays with timing loops or the scheduler. It doesn't use a "port". Also, Windows XP only supports machines that tend not to have timing problems that use delays. Instead, if a device takes a while to respond, it has a "busy bit" in some port or memory slot that can be tested. Almost all of the issues in Linux where _p operations are used are (or should be) historical - IMO. Ondrej Zary wrote: On Tuesday 08 January 2008 02:38:15 David P. Reed wrote: H. Peter Anvin wrote: And shoot the designer of this particular microcontroller firmware. Well, some days I want to shoot the "designer" of the entire Wintel architecture... it's not exactly "designed" by anybody of course, and today it's created largely by a collection of Taiwanese and Chinese ODM firms, coupled with Microsoft WinHEC and Intel folks. At least they follow the rules and their ACPI and BIOS code say that they are using port 80 very clearly if you use PnP and ACPI properly. And in the old days, you were "supposed" to use the system BIOS to talk to things like the PIT that had timing issues, not write your own code. Does anyone know what port Windows uses? I'm pretty sure that it isn't 80h as I run Windows 98 often with a port 80h debug card inserted. The last POST code set by BIOS usually remains on the display and only changes when BIOS does something like suspend/resume. IIRC, there was a program that was able to display temperature from onboard sensors on the port 80h display that's integrated on some mainboards.
Re: [linux-kernel] Re: [PATCH] x86: provide a DMI based port 0x80 I/O delay override.
The last time I heard of a 12 MHz bus in a PC system was in the days of the PC-AT, when some clone makers sped up their buses (pre PCI!!!) in an attempt to allow adapter card *memory* to run at the 12 MHz speed. This caused so many industry-wide problems with adapter cards that couldn't be installed in certain machines and still run reliably that the industry learned a lesson. That doesn't mean that LPCs don't run at 12 MHz, but if they do, they don't have old 8-bit punky cards plugged into them for lots of practical reasons. (I have whole drawers full of such old cards, trying to figure out an environmentally responsible way to get rid of them - even third world countries would be fools to make machines with them). I can't believe that we are not supporting today's machines correctly because we are still trying to be compatible with a few (at most a hundred thousand were manufactured, much less still functioning or running Linux) machines. Now I understand that PC/104 machines and other things are very non-PC-compatible, but are x86 processor architectures. Do they even run x86 under 2.6.24? Perhaps the rational solution here is to declare x86 the architecture for "relics" and develop a merged architecture called "modern machines" to include only those PCs that have been made to work since, say, the release of (cough) Windows 2000? Bodo Eggert wrote: On Tue, 8 Jan 2008, Rene Herman wrote: On 08-01-08 00:24, H. Peter Anvin wrote: Rene Herman wrote: Is this only about the ones then left for things like legacy PIC and PIT? Does anyone care about just sticking in a udelay(2) (or 1) there as a replacement and call it a day? PIT is problematic because the PIT may be necessary for udelay setup. Yes, can initialise loops_per_jiffy conservatively. Just didn't quite get why you guys are talking about an ISA bus speed parameter. If the ISA bus is below 8 MHz, we might need a longer delay. 
If we default to the longer delay, the delay will be too long for more than 99,99 % of all systems, not counting i586+. Especially if the driver is fine-tuned to give maximum throughput, this may be bad. OTOH, the DOS drivers I heard about use delays and would break on underclocked ISA busses if the n * ISA_HZ delay was needed. Maybe somebody having a configurable ISA bus speed and some problematic chips can test it ...
Re: [linux-kernel] Re: [PATCH] x86: provide a DMI based port 0x80 I/O delay override.
H. Peter Anvin wrote:
> And shoot the designer of this particular microcontroller firmware.

Well, some days I want to shoot the "designer" of the entire Wintel architecture... it's not exactly "designed" by anybody, of course, and today it's created largely by a collection of Taiwanese and Chinese ODM firms, coupled with Microsoft WinHEC and Intel folks. At least they follow the rules, and their ACPI and BIOS code says that they are using port 80 - very clearly, if you use PnP and ACPI properly. And in the old days, you were "supposed" to use the system BIOS to talk to things like the PIT that had timing issues, not write your own code. Or perhaps the ACPI spec should specify a timing loop spec, and precisely specify the desired timing after accessing an I/O port until that device has properly "acted" on that operation.

The idea that port 80 was "unused" and appropriate for delay purposes elicited skepticism from Linus that is recorded for posterity in the comments of the relevant Linux include files - especially since it was clearly "used" for non-delay purposes, by cards that could be plugged into a PCI (fast) bus, not just an 8-bit ISA bus. Perhaps we should declare the world of ACPI systems a separate "arch" from the world of l'ancien regime, where folklore about which ports were used for what ruled. I lived through those old days, and they were not wonderful, either. The world sucks, and Linux is supposed to be able to adapt to that world, suckitude and all.
Re: [linux-kernel] Re: [PATCH] x86: provide a DMI based port 0x80 I/O delay override.
On another topic: I have indeed determined what device uses port 80 on Quanta AMD64 laptops from HP.

I had lunch with Jim Gettys of OLPC a week ago; he's an old friend since he worked on the original X Window System. After telling him my story about port 80, he mentioned that the OLPC XO machine had some issues with port 80, which was by design handled by the ENE KBC device on its motherboard. He said the ENE was a very desirable chipset for AMD designs, recommended by Quanta. Richard Smith of OLPC explained to me how the KB3700 they use works, and that they use the KB3700 to send POST codes out over a serial link during boot up.

This gave me a reason to take apart my laptop, to discover that it has an ENE KB3920 B0 as its EC and KBC. The port interface for the KB3920 includes listening to port 80, which is then made available to firmware on the EC. It is recognized and decoded on the LPC bus, only for writes, and optionally can generate an interrupt in the 8051. Dumping the ENE chip and looking at the DSDT.dsl for my machine, I discovered that port 80 is used as an additional parameter for various DSDT methods that communicate with the EC when it is operating in ACPI mode. More work is in progress as I play around with this. But the key thing is that ACPI, and perhaps SMM, both use port 80 as part of the base function of the chipset.

And actually, if I had looked at the /sys/bus/pnp definitions, rather than /proc/ioports, I would have noticed that port 80 was part of a PNP0C02 resource set. That means exactly one thing: ACPI says that port 80 is NOT free to be used, for delays or anything else. This should make no difference here: it's just one more reason to stop using port 80 for delays on modern machines.
Re: [linux-kernel] Re: [PATCH] x86: provide a DMI based port 0x80 I/O delay override.
H. Peter Anvin wrote:
> Rene Herman wrote:
>> Is this only about the ones then left for things like legacy PIC and
>> PIT? Does anyone care about just sticking in a udelay(2) (or 1) there
>> as a replacement and call it a day?
> PIT is problematic because the PIT may be necessary for udelay setup.

The PIT usage for calibrating the delay loop can be moderated, if need be, by using the PC BIOS, which by definition uses the PIT correctly in its int 15h function 83h call. Just do it before coming up in a state where the PC BIOS int 15h calls no longer work. I gave code to do this in a much earlier message. This is the MOST reliable way to use the PIT early in boot, on a PC compatible. God knows how one should do it on a Macintosh running a 386/20 :-). But the only machines with the old, problematic PIT are, thank god, PC compatible. Maybe.
Re: [linux-kernel] Re: [PATCH] x86: provide a DMI based port 0x80 I/O delay override.
FYI - another quirky Quanta motherboard from HP, with DMI readings reported to me.

-------- Original Message --------
Date: Wed, 2 Jan 2008 16:23:27 +1030
From: Joel Stanley <[EMAIL PROTECTED]>
To: David P. Reed <[EMAIL PROTECTED]>
Subject: Re: [PATCH] Option to disable AMD C1E (allows dynticks to work)

On Dec 30, 2007 1:13 AM, David P. Reed <[EMAIL PROTECTED]> wrote:
> I have also attached a c program that only touches port 80. Compile it
> for 32-bit mode (see comment), run it as root, and after two or three
> runs, it will hang a system that has the port 80 bug.

Using port80.c, I could hard lock an HP Pavilion tx1000 laptop on the first go. This was with ubuntu hardy's stock kernel (a 2.6.24-rc).

  dmidecode -s baseboard-manufacturer
  Quanta
  dmidecode -s baseboard-product-name
  30BF

Tonight, I will try compiling a kernel with these values added to your patch.

Some history, feel free to ignore if it's not relevant: ubuntu feisty's 2.6.22-based kernel worked fine, iirc. We were having issues with sound, so tried fedora 8's .23-based kernel, but this would sporadically hard lock. Ubuntu hardy's 2.6.24 appeared fine, for the 2 hours or so I used it last night - until using the port80.c program, obviously.

Cheers, Joel
Re: [PATCH] x86: provide a DMI based port 0x80 I/O delay override.
Alan Cox wrote:
> That does imply some muppet 'extended' the debug interface for power
> management on your laptop. Also pretty much proves that for such
> systems we do have to move from port 0x80 to another delay approach.

Alan - in googling around the net yesterday looking for SuperIO chipsets that claim to support port 80, I found that "blade" servers from companies like IBM and HP *claim* to have a system for monitoring port 80 diagnostic codes and sending them to the "drawer" management processor through a management backplane. This is a little puzzling, because you'd think they would have noticed port 80 issues, since they run Linux on their systems. Maybe not hangs, but it seems unhelpful to have a lot of noise spewing over a bus that is supposed to provide "management" diagnostics.

Anyway, what I did not find was whether there is a particular chipset that provides that port 80 feature on those machines. However, if it's a common "cell" in a design, it may have leaked into the notebook market chipsets too. Anyone know if the Linux kernels used on blade servers have been patched to not do the port 80 things? I don't think this would break anything there, but it might have been a helpful patch for their purposes. I don't do blades personally or at work (I focus on mobile devices these days, and my personal servers are discrete), so I have no knowledge. It could be that the blade servers have BIOSes that don't do POST codes over port 80, but send them directly to the "drawer" management bus, of course.
Re: [PATCH] x86: provide a DMI based port 0x80 I/O delay override.
Pavel Machek wrote:
>> 2. there is some "meaning" to certain byte values being written (the
>> _PTS and _WAK use of arguments that come from callers to store into
>> port 80 makes me suspicious.) That might mean that the freeze happens
>> only when certain values are written, or when they are written
>> closely in time to some other action - being used to communicate
>> something to the
> There's nothing easier than always writing 0 to the 0x80 to check if
> it hangs in such case...?
> Pavel

I did try that. The machine in question does hang when you write 0 to 0x80 in a loop a few thousand times. This particular suspicion was that the problem is caused by the following sort of thing (it's a multi-cpu system...):

1. First, some ACPI code writes "meaningful value" X to port 80 that is sort of a "parameter" to whatever follows. Just because the DSDT disassembly *calls* it the DBUG port doesn't mean it is *only* used for debugging. We (Linux) use it for timing delays, after all...

2. Then a Linux driver writes some random value (!=X), possibly zero, to port 80.

3. Then ACPI writes some other values that cause an SMI or some other thing to happen.

There are experiments, not so simple, that could rule this particular guess out. I have them on my queue of experiments I might try (locking out ACPI). Of course, if the BIOS were GPL, we could look at the comments, etc... I may pull the laptop apart today to see what chips are on it, besides the nvidia chipset and the processor. That might give a clue as to what SuperIO or other logic chips are there.
Re: [PATCH] x86: provide a DMI based port 0x80 I/O delay override.
Alan Cox wrote:
>> responds to reads differently than "unused" ports. In particular, an
>> inb takes 1/2 the elapsed time compared to a read of "known" unused
>> port 0xed - 792 tsc ticks for port 80 compared to about 1450 tsc
>> ticks for port 0xed and other unused ports (tsc at 800 MHz).
> Well at least we know where the port is now - that's too fast for an
> LPC bus device, so it must be an SMI trap. Only easy way to find out
> is to use the debugging event counters and see how many instruction
> cycles are issued as part of the 0x80 port. If it's surprisingly high
> then you've got a firmware bug and can go spank HP.

Alan, thank you for the pointers. I have been doing variations on this testing theme for a while - I get intrigued by a good debugging challenge, and after all, it's my machine... Two relevant new data points, and then some more suggestions:

1. It appears to be a real port. SMI traps are not happening on the normal outb to 80. Hundreds of them execute perfectly, with the expected instruction counts. If I can trace the particular event that creates the hard freeze (getting really creative, here) and stop before the freeze disables the entire computer, I will. That may be an SMI, or perhaps some other kind of interrupt or exception. Maybe someone knows how to safely trace through an impending SMI while doing printk's or something?

2. It appears to be the standard POST diagnostic port. On a whim, I disassembled my DSDT code and studied it more closely. It turns out that there are a bunch of "Store(..., DBUG)" instructions scattered throughout, and when you look at what DBUG is defined as, it is an I/O port at address DBGP, which is a 1-byte value = 0x80. So the ACPI BIOS thinks it has something to do with debugging. There's a little strangeness here, however, because the value sent to the port occasionally has something to do with arguments to the ACPI operations relating to sleep and wakeup... it could just be that those arguments are distinctive.

In thinking about this, I recognize a couple of things. ACPI is telling us something when it declares a reference to port 80 in its code. It's not telling us the function of this port on this machine, but it is telling us that the port is being used by the BIOS. This could be a reason to put out a printk warning message: 'warning: port 80 is used by ACPI BIOS - if you are experiencing problems, you might try an alternate means of iodelay.'

Second, it seems likely that there is one of two possible reasons that the port 80 writes cause hangs/freezes:

1. A buffer overflow in such a device.

2. There is some "meaning" to certain byte values being written (the _PTS and _WAK use of arguments that come from callers to store into port 80 makes me suspicious.) That might mean that the freeze happens only when certain values are written, or when they are written closely in time to some other action - being used to communicate something to the SMM code. If there is some race in when Linux's port 80 writes happen that happens to change the meaning of a request to the hardware or to SMM, then we could be rarely stepping on
Re: [PATCH] x86: provide a DMI based port 0x80 I/O delay override
H. Peter Anvin wrote:
> Now, I think there is a specific reason to believe that EGA/VGA (but
> perhaps not CGA/MDA) didn't need these kinds of hacks: the video cards
> of the day were touched, directly, by an interminable number of DOS
> applications. CGA/MDA generally *were not*, due to the unsynchronized
> memory of the original versions (writing could cause snow), so most
> applications tended to fall back to using the BIOS access methods for
> CGA and MDA.

A little history... not that it really matters, but some might be interested in a 55-year-old hacker's sentimental recollections... As someone who actually wrote drivers for CGA and MDA on the original IBM PC, I can tell you that back-to-back I/O *port* writes and reads were perfectly fine. The "snow" problem had nothing to do with I/O ports. It had to do with the memory on the CGA adapter card not being dual-ported: in high-res (80x25) character mode (only!), a CPU read or write access caused a read of the adapter memory by the character generator to fail, so one character position of the current scanline being output got all random bits, and the character generator then displayed whatever those 8 random bits of character or attributes selected from its font table.

In particular, the solution in both the BIOS and in Visicalc, 1-2-3, and other products that did NOT use the BIOS or DOS for I/O to the CGA or MDA (because they were Dog Slow) was to detect the CGA and do a *very* tight loop doing "inb" instructions from one of the CGA status registers, looking for a 0-1 transition on the horizontal retrace flag. It would then do a write to display memory with all interrupts locked out, because that was all it could do during the horizontal retrace, given the speed of the processor.

One of the hacks I did in those days (I wrote the CGA driver for Visicalc Advanced Version and several other Software Arts programs, some of which were sold to Lotus when they bought our assets, and hired me, in 1985) was to measure the "horizontal retrace time" and the "vertical blanking interval" when the program started, and compile screen-writing code that squeezed as many writes as possible into both horizontal retraces and vertical retraces. That was actually a "selling point" for spreadsheets - the reviewers actually measured whether you could use the down-arrow key in auto-repeat mode and keep the screen scrolling at the relevant rate! That was hard on an 8088 or 80286 processor with a CGA card. It was great when EGA and VGA came out, but we still had to support the CGA long after.

Which is why I fully understand the need not to break old machines. We had to run on every machine that claimed to be "PC compatible" - many of which were hardly so compatible (the PS/2 Model 50, for example, had a completely erroneous serial chip that claimed to emulate the original 8250 but had an immense pile of bugs, which IBM begged ISVs to call a software problem and fix so they didn't get sued). The IBM PC bus (predecessor of the current ISA bus, which came from the PC-AT's 16-bit bus) did just fine electrically - any I/O port-specific timing problems had to do with the timing of the chips attached to the bus. For example, if a bus write to a port was routed into a particular chip, the timing of that chip's subsequent processing might be such that it was not ready to respond to another read or write. That's not a "signalling" problem - it has nothing to do with capacitance on the bus, e.g. - but a functional speed problem in the chip (if on the motherboard) or the adapter card.

Rant off. This has nothing, of course, to do with present issues.
-- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Option to disable AMD C1E (allows dynticks to work)
Richard Harman wrote: I think I may have a monkey wrench to throw into this, I finally got around to testing the C1E patch, and the port80 patch. End result: port80 patch has zero effect on this laptop, and the C1E patch makes it stable. Stating that your system is "stable" is not very definitive. Does hwclock work when full Fedora 8 is running without the port80 patch, or have you disabled the uses of hwclock in your init and shutdown code? Have you set the hwclock setting to use the extremely dangerous "--directisa" option - which hides the problem because it avoids the port 80 I/O? Try compiling and running the test program port80.c a few times. If your machine doesn't hang, it would be interesting to see the results it gives. The C1E patch alone does not fix the port 80 problem several of us have observed. What does dmidecode say for your motherboard vendor and model?
Re: [PATCH] x86: provide a DMI based port 0x80 I/O delay override
Alan Cox wrote: Now what's interesting is that the outb to port 80 is *faster* than an outb to an unused port, on my machine. So there's something there - actually accepting the bus transaction. In the ancient 5150 PC, 80 was Yes and I even told you a while back how to verify where it is. From the timing you get it's not on the LPC bus but chipset core so pretty certainly an SMM trap as other systems with the same chipset don't have the bug. Probably all that is needed is a BIOS upgrade Actually, I could see whether it was SMM trapping due to AMD MSR's that would allow such trapping, performance or debug registers. Nothing was set to trap with SMI or other traps on any port outputs. But I'm continuing to investigate for a cause. It would be nice if it were a BIOS-fixable problem. It would be even nicer if the BIOS were GPL...
Re: [PATCH] x86: provide a DMI based port 0x80 I/O delay override
I am so happy that there will be a way for people who don't build their own kernels to run Linux on their HP and Compaq laptops that have problems with gazillions of writes to port 80, and I'm also happy that some of the strange driver code will be cleaned up over time. Thank you all. Some thoughts you all might consider, take or leave, in this process, from an old engineering manager who once had to worry about QA for software on nearly every personal computer model in the 1980-1992 period: You know, there is a class of devices that are defined to use port 0x80... it's that historically useful class of devices that show/record the POST diagnostics. It certainly was not designed for "delay" purposes. In fact, some of those same silly devices are still used in industry during manufacturing test. I wonder what would happen if Windows were not part of manufacturing test, and instead Linux were the "standard" for some category of machines... When I was still working at Lotus in the late '80's, when we still supported machines like 286's, there were lots of problems with timing loops in drivers in applications (even Win 3.0 had some in hard disk drivers, as did some of our printer drivers, ...), as clock speeds continued to ramp. There were major news stories of machines that "crashed when xyz application or zyx peripheral were added". It was Intel, as I recall, that started "publicly" berating companies in the PC industry for using the "two short jumps" solutions, and suggesting that they measure the processor speed at bootup, using the BIOS standard for doing that with the int 15 BIOS elapsed time calls, and always use "calibrated" timing loops. Which all of us who supported device drivers started to do (remember, apps had device drivers in those days for many devices that talked directly with the registers). 
I was impressed when I dug into Linux eventually, that this operating system "got it right" by measuring the timing during boot and creating a udelay function that really worked! So I have to say, that when I was tracing down the problem that originally kicked off this thread, which was that just accessing the RTC using the standard CMOS_READ macros in a loop caused a hang, that these "outb al,80h" things were there. And I noticed your skeptical comment in the code, Linus. Knowing that there was never in any of the documented RTC chipsets a need for a pause between accesses (going back to my days at Software Arts working on just about every old machine there was...) I changed it on a lark to do no pause at all. And my machine never hung... Now what's interesting is that the outb to port 80 is *faster* than an outb to an unused port, on my machine. So there's something there - actually accepting the bus transaction. In the ancient 5150 PC, 80 was unused because it was the DMA controller port that drove memory refresh, and had no meaning. Now my current hypothesis (not having access to Quanta's design specs for a board they designed and have shipped in quantity, or having taken the laptop apart recently) is that there is logic there on port 80, doing something. Perhaps even "POST diagnostic recording" as every PC since the XT has supported... perhaps supporting post-crash diagnostics... And that that something has a buffer, perhaps even in the "Embedded Controller" that may need emptying periodically. It takes several tens of thousands of "outb" to port 80 to hang the hardware solid - so something is either rare or overflowing. In any case, if this hypothesis is correct - the hardware may have an erratum, but the hardware is doing a very desirable thing - standardizing on an error mechanism that was already in the "standard" as an option... It's Linux that is using a "standard" in a wrong way (a diagnostic port as a delay). 
So I say all this, mainly to point out that Linux has done timing loops right (udelay and ndelay) - except one place where there was some skepticism expressed, right there in the code. Linus may have some idea why it was thought important to do an essential delay with a bus transaction that had uncertain timing. My hypothesis is that "community" projects have the danger of "magical theories" and "coolness" overriding careful engineering design practices. Cleaning up that "clever hack" that seemed so good at the time is hugely difficult, especially when the driver writer didn't write down why he used it. Thus I would suggest that the _p functions be deprecated, and if there needs to be a timing-delay after in/out instructions, define in_pause(port, nsec_delay) with an explicit delay. And if the delay is dependent on bus speeds, define a bus-speed ratio calibration. Thus in future driver writing, people will be forced to think clearly about what the timing characteristics of their device on its bus must be. That presupposes that driver writers understand the timing issues.
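As a userspace model of what the in_pause() proposed above might look like: the name and signature come from the proposal in the message, but everything else here is my illustration. The simulated port space stands in for a real inb instruction, and nanosleep() stands in for the kernel's ndelay(); a kernel implementation would call inb() and ndelay() directly.

```c
/* Model of the proposed in_pause(port, nsec_delay): a port read
 * followed by an explicit, caller-chosen delay, instead of an
 * implicit write to diagnostic port 0x80. */
#include <time.h>

/* Simulated 64K ISA port space so this model is self-contained and
 * testable; a real implementation would execute inb(port) instead. */
unsigned char sim_ports[0x10000];

unsigned char in_pause(unsigned short port, long nsec_delay)
{
    unsigned char v = sim_ports[port];          /* would be inb(port) */
    struct timespec ts = { nsec_delay / 1000000000L,
                           nsec_delay % 1000000000L };
    nanosleep(&ts, NULL);                       /* would be ndelay(nsec_delay) */
    return v;
}
```

The point of the interface is that the delay becomes an explicit, documented parameter chosen from the device's datasheet timing, rather than a side effect of a bus transaction with unknown cost.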
Re: [PATCH] Option to disable AMD C1E (allows dynticks to work)
Islam Amer wrote: Hello. I was interested in getting dynticks to work on my compaq presario v6000 to help with the 1 hour thirty minutes battery time, but after this discussion I lost interest. I too had the early boot time hang, and found it was udev triggering the bug. This early boot time hang is *almost certainly* due to the in/out port 80 bug, which I discovered a few weeks ago, which affects hwclock and other I/O device drivers on a number of HP/Compaq machines in exactly this way. The proper fix for this bug is in dispute, and will probably not occur in the 2.6.24 release because it touches code in many, many drivers. The simplest way to test if you have a problem of this sort is to try this shell line as root, after you boot successfully. If your machine hangs hard, you have a problem that really looks like the port 80 problem. for ((i = 0; i < 1000; i = i + 1)); do cat /dev/nvram > /dev/null; done I have also attached a C program that only touches port 80. Compile it for 32-bit mode (see comment), run it as root, and after two or three runs, it will hang a system that has the port 80 bug. If you then run: dmidecode -s baseboard-manufacturer and dmidecode -s baseboard-product-name, the values printed are the ones you should plug into the .matches field in the dmi_system_id struct in the attached patch. It would be great if you could do that, test, and post back with those values so they can be accumulated. HP/Compaq machines with Quanta m/b's are very popular, and very common - so at least a quirk patch for all the broken models would be worth doing in 2.6.25 or downstream in the distros. The right patches will probably take a long time - there is a dispute as to what the semantics of port 80 writes even mean among the core kernel developers, because the hack is lost in the dim dark days of history, and safe resolution will take time. There is also a C1E issue with the BIOS in my machine (an HP Pavilion dv9000z). 
I don't know if it is a bug, yet, but that's a different problem - associated with dynticks, perhaps. I have to say that researching the AMD Kernel/BIOS docs on C1E (a very new feature in the last year on AMD) leaves me puzzled as to whether the dynticks problem exists on my machine at all, but the patch for it turns off dynticks! Changing the /etc/init.d/udev script so that the line containing /sbin/udevtrigger reads /sbin/udevtrigger --subsystem-nomatch="*misc*" seemed to fix things. The hang is triggered specifically by echo add > /sys/class/misc/rtc/uevent after inserting rtc.ko. Also, using hwclock to set the RTC will cause a hard hang if you are using 64-bit Linux. Disable the init scripts that set the time, or use the 32-bit binary, as suggested here: http://www.mail-archive.com/[EMAIL PROTECTED]/msg41964.html I hope this helps. But your hardware is slightly different, though. commit c12c7a47b9af87e8d867d5aa0ca5c6bcdd2463da Author: Rene Herman <[EMAIL PROTECTED]> Date: Mon Dec 17 21:23:55 2007 +0100 x86: provide a DMI based port 0x80 I/O delay override. Certain (HP) laptops experience trouble from our port 0x80 I/O delay writes. This patch provides for a DMI based switch to the "alternate diagnostic port" 0xed (as used by some BIOSes as well) for these. David P. Reed confirmed that port 0xed works for him and provides a proper delay. The symptoms of _not_ working are a hanging machine, with "hwclock" use being a direct trigger. Earlier versions of this attempted to simply use udelay(2), with the 2 being a value tested to be a nicely conservative upper-bound with help from many on the linux-kernel mailinglist, but that approach has two problems. First, pre-loops_per_jiffy calibration (which is post PIT init while some implementations of the PIT are actually one of the historically problematic devices that need the delay) udelay() isn't particularly well-defined. 
We could initialise loops_per_jiffy conservatively (and based on CPU family so as to not unduly delay old machines) which would sort of work, but still leaves: Second, delaying isn't the only effect that a write to port 0x80 has. It's also a PCI posting barrier which some devices may be explicitly or implicitly relying on. Alan Cox did a survey and found evidence that additionally various drivers are racy on SMP without the bus locking outb. Switching to an inb() makes the timing too unpredictable and as such, this DMI based switch should be the safest approach for now. Any more invasive changes should get more rigid testing first. It's moreover only very few machines with the problem and a DMI based hack seems to fit that situation. An early boot parameter to make the choice manually (and override any possible DMI based decision) is also provided: io_delay=standard|alternate
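To illustrate where the dmidecode values requested above end up, here is a userspace mock of a DMI-based quirk table. The kernel's real interface is struct dmi_system_id with DMI_MATCH() macros and a callback; the simplified struct below and the board strings in it are placeholders for illustration, not an actual quirk entry from the patch.

```c
/* Userspace mock of a DMI quirk table: board strings reported by
 * dmidecode select machines that should use the alternate delay port.
 * Placeholder data only; real entries would come from affected users. */
#include <string.h>

struct board_match {
    const char *board_vendor;   /* dmidecode -s baseboard-manufacturer */
    const char *board_name;     /* dmidecode -s baseboard-product-name */
};

static const struct board_match io_delay_0xed_quirks[] = {
    { "Quanta", "30B9" },       /* hypothetical board */
    { NULL, NULL }              /* table terminator */
};

/* Returns 1 if this board should avoid port 0x80 delay writes. */
int needs_io_delay_quirk(const char *vendor, const char *name)
{
    const struct board_match *m;

    for (m = io_delay_0xed_quirks; m->board_vendor; m++)
        if (strcmp(vendor, m->board_vendor) == 0 &&
            strcmp(name, m->board_name) == 0)
            return 1;
    return 0;
}
```

The kernel variant walks such a table once at boot and flips the io_delay method before any driver runs, which is why collecting accurate vendor/product strings from affected users matters.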
Re: [PATCH] x86: provide a DMI based port 0x80 I/O delay override.
Besides the two reports of freezes on bugzilla.kernel.org (9511, 6307), the following two bug reports on bugzilla.redhat.com are almost certainly due to the same cause (imo, of course): 245834, 227234. Ubuntu launchpad bug 158849 also seems to report the same problem, for an HP dv6258se 64-bit machine. Also this one: http://www.mail-archive.com/[EMAIL PROTECTED]/msg10321.html If you want to collect dmidecode data from these folks, perhaps we might get a wider sense of what categories of machines are affected. They all seem to be recent HP and Compaq AMD64 laptops, probably all Quanta motherboards.
Re: [PATCH] x86: provide a DMI based port 0x80 I/O delay override.
H. Peter Anvin wrote: David P. Reed wrote: As support: port 80 on the reporter's (my) HP dv9000z laptop clearly responds to reads differently than "unused" ports. In particular, an inb takes 1/2 the elapsed time compared to a read to "known" unused port 0xed - 792 tsc ticks for port 80 compared to about 1450 tsc ticks for port 0xed and other unused ports (tsc at 800 MHz). Any timings for port 0xf0 (write zero), out of curiosity? Here's a bunch of data:

port 0xF0: cycles: out 919, in 933
port 0xed: cycles: out 2541, in 2036
port 0x70: cycles: out n/a, in 934
port 0x80: cycles: out 1424, in 795

AMD Turion 64x2 TL-60 CPU running at 800 MHz, nVidia MCP51 chipset, Quanta motherboard. Running 2.6.24-rc5 with Ingo's patch so inb_p, etc. use port 0xed. Note that I can run the port 80 test once; the second time I get the hard freeze. I didn't try writing to port 70 from userspace - that one's dangerous, but the reading of it was included for a timing typical of a chipset-supported device. These are all pretty consistent. I find the "read" timing from 0x80 very interesting. The write timing is also interesting, being faster than an unused port.
Re: [PATCH] x86: provide a DMI based port 0x80 I/O delay override.
H. Peter Anvin wrote: Rene Herman wrote: I do not know how universal that is, but _reading_ port 0xf0 might in fact be sensible then? And should even work on a 386/387 pair? (I have a 386/387 in fact, although I'd need to dig it up). No. Someone might have used 0xf0 as a readonly port for other uses. As support: port 80 on the reporter's (my) HP dv9000z laptop clearly responds to reads differently than "unused" ports. In particular, an inb takes 1/2 the elapsed time compared to a read to "known" unused port 0xed - 792 tsc ticks for port 80 compared to about 1450 tsc ticks for port 0xed and other unused ports (tsc at 800 MHz).
Re: [PATCH] x86: provide a DMI based port 0x80 I/O delay override.
Ingo - I finished testing the rolled up patch that you provided. It seems to work just fine. Thank you for putting this all together and persevering in this long and complex discussion. Here are the results, on the offending laptop, using 2.6.24-rc5 plus that one patch.

First: booted with normal boot parameters (no io_delay=): According to dmesg, 0xed is used. hwclock ran fine, hundreds of times. My shell script loop doing "cat /dev/nvram > /dev/null" ran fine, several times. Running Rene's "port 80" speed test ran fine once, then froze the system hard. (expected)

Second: booted with io_delay=0x80, several tests, rebooting after freezes: hwclock froze system hard (this is the problem that drove me to find this bug). My shell script loop froze system hard.

Third: booted with io_delay=none: hwclock ran fine, also hundreds of times. My shell script loop ran fine several times. Running Rene's port80 test ran fine twice, froze system hard on third try.

Fourth: booted with io_delay=udelay: hwclock ran fine, also hundreds of times. My shell script loop ran fine several times. Running Rene's port80 test ran fine, froze system hard on second try.

Analysis: the patch works fine, and defaulting to 0xed seems super conservative. I will probably use the boot parameter io_delay=none, because I don't seem to have any I/O devices that require any delays - and this way I can find any that do.

Still wondering: what the heck is going on with port 80 on my laptop motherboard. Clearly it "does something". I will in my spare time continue investigating, though having a reliable system is GREAT.
Re: [PATCH] x86: provide a DMI based port 0x80 I/O delay override.
About to start building and testing. It will take a few hours. Ingo Molnar wrote: here's an updated rollup patch, against 2.6.24-rc4. David, could you please try this? This should work out of box on your system, without any boot option or other tweak needed.
Re: [PATCH] x86: provide a DMI based port 0x80 I/O delay override.
Rene Herman wrote: No, most definitely not. Having the user select udelay or none through the kernel config and then the kernel deciding "ah, you know what, I'll know better and use port access anyway" is _utterly_ broken behaviour. Software needs to listen to its master. When acting as an ordinary user, the .config is beyond my control (except on Gentoo). It is in control of the distro (Fedora, Ubuntu, ... but perhaps not Gentoo). I think the distro guys want a default behavior that is set in .config, with quirk overrides being done when needed. And of course the user in his/her boot params gets the final say.