Re: [PATCH v3 2/3] Fix undefined operation fault that can hang a cpu on crash or panic
On Tuesday, July 7, 2020 3:24pm, "Sean Christopherson" said: > On Tue, Jul 07, 2020 at 03:09:38PM -0400, David P. Reed wrote: >> >> On Tuesday, July 7, 2020 1:09am, "Sean Christopherson" >> said: >> Sean, are you the one who would get this particular fix pushed into Linus's >> tree, by the way? The "maintainership" is not clear to me. > > Nope, I'm just here to complain and nitpick :-) There's no direct maintainer > for virtext.h so it falls under the higher level arch/x86 umbrella, i.e. I > expect Boris/Thomas/Ingo will pick this up. > Thanks for your time and effort in helping.
Re: [PATCH v3 2/3] Fix undefined operation fault that can hang a cpu on crash or panic
On Tuesday, July 7, 2020 1:09am, "Sean Christopherson" said: > On Sat, Jul 04, 2020 at 04:38:08PM -0400, David P. Reed wrote: >> Fix: Mask undefined operation fault during emergency VMXOFF that must be >> attempted to force cpu exit from VMX root operation. >> Explanation: When a cpu may be in VMX root operation (only possible when >> CR4.VMXE is set), crash or panic reboot tries to exit VMX root operation >> using VMXOFF. This is necessary, because any INIT will be masked while cpu >> is in VMX root operation, but that state cannot be reliably >> discerned by the state of the cpu. >> VMXOFF faults if the cpu is not actually in VMX root operation, signalling >> undefined operation. >> Discovered while debugging an out-of-tree x-visor with a race. Can happen >> due to certain kinds of bugs in KVM. >> >> Fixes: 208067 <https://bugzilla.kernel.org/show_bug.cgi?id=208067> >> Reported-by: David P. Reed >> Suggested-by: Thomas Gleixner >> Suggested-by: Sean Christopherson >> Suggested-by: Andy Lutomirski >> Signed-off-by: David P. Reed >> --- >> arch/x86/include/asm/virtext.h | 20 ++-- >> 1 file changed, 14 insertions(+), 6 deletions(-) >> >> diff --git a/arch/x86/include/asm/virtext.h b/arch/x86/include/asm/virtext.h >> index 0ede8d04535a..0e0900eacb9c 100644 >> --- a/arch/x86/include/asm/virtext.h >> +++ b/arch/x86/include/asm/virtext.h >> @@ -30,11 +30,11 @@ static inline int cpu_has_vmx(void) >> } >> >> >> -/* Disable VMX on the current CPU >> +/* Exit VMX root mode and isable VMX on the current CPU. >> * >> * vmxoff causes a undefined-opcode exception if vmxon was not run >> - * on the CPU previously. Only call this function if you know VMX >> - * is enabled. >> + * on the CPU previously. Only call this function if you know cpu >> + * is in VMX root mode. 
>> */ >> static inline void cpu_vmxoff(void) >> { >> @@ -47,14 +47,22 @@ static inline int cpu_vmx_enabled(void) >> return __read_cr4() & X86_CR4_VMXE; >> } >> >> -/* Disable VMX if it is enabled on the current CPU >> +/* Safely exit VMX root mode and disable VMX if VMX enabled >> + * on the current CPU. Handle undefined-opcode fault >> + * that can occur if cpu is not in VMX root mode, due >> + * to a race. >> * >> * You shouldn't call this if cpu_has_vmx() returns 0. >> */ >> static inline void __cpu_emergency_vmxoff(void) >> { >> -if (cpu_vmx_enabled()) >> -cpu_vmxoff(); >> +if (!cpu_vmx_enabled()) >> +return; >> +asm volatile ("1:vmxoff\n\t" >> + "2:\n\t" >> + _ASM_EXTABLE(1b, 2b) >> + ::: "cc", "memory"); >> +cr4_clear_bits(X86_CR4_VMXE); > > Open coding vmxoff doesn't make sense, and IMO is flat out wrong as it fixes > flows that use __cpu_emergency_vmxoff() but leaves the same bug hanging > around in emergency_vmx_disable_all() until the next patch. > > The reason I say it doesn't make sense is that there is no sane scenario > where the generic vmxoff helper should _not_ eat the fault. All other VMXOFF > faults are mode related, i.e. any fault is guaranteed to be due to the > !post-VMXON check unless we're magically in RM, VM86, compat mode, or at > CPL>0. Given that the whole point of this series is that it's impossible to > determine whether or not the CPU is post-VMXON if CR4.VMXE=1 without taking a > fault of some form, there's simply no way that anything except the hypervisor > (in normal operation) can know the state of VMX. And given that the only > in-tree hypervisor (KVM) has its own version of vmxoff, that means there is > no scenario in which cpu_vmxoff() can safely be used. Case in point, after > the next patch there are no users of cpu_vmxoff(). > > TL;DR: Just do fixup on cpu_vmxoff(). Personally, I don't care either way: it fixes the bug in both cases, and since the function is inlined, no additional code is generated either way.
I was just being conservative, since cpu_vmxoff() is used throughout the kernel source, so it might be expected to keep its current behavior (when not in an "emergency"). I'll wait a day or two for any objections from other commenters to just doing the fix in cpu_vmxoff(). With no objection, I'll just do it that way. Sean, are you the one who would get this particular fix pushed into Linus's tree, by the way? The "maintainership" is not clear to me. If you are, happy to take direction from you as the primary input. > >> } >> >> /* Disable VMX if it is supported and enabled on the current CPU >> -- >> 2.26.2 >> >
Re: [PATCH v3 2/3] Fix undefined operation fault that can hang a cpu on crash or panic
On Sunday, July 5, 2020 4:55pm, "Andy Lutomirski" said: > On Sun, Jul 5, 2020 at 12:52 PM David P. Reed wrote: >> >> Thanks, will handle these. 2 questions below. >> >> On Sunday, July 5, 2020 2:22pm, "Andy Lutomirski" said: >> >> > On Sat, Jul 4, 2020 at 1:38 PM David P. Reed wrote: >> >> >> >> Fix: Mask undefined operation fault during emergency VMXOFF that must be >> >> attempted to force cpu exit from VMX root operation. >> >> Explanation: When a cpu may be in VMX root operation (only possible when >> >> CR4.VMXE is set), crash or panic reboot tries to exit VMX root operation >> >> using VMXOFF. This is necessary, because any INIT will be masked while cpu >> >> is in VMX root operation, but that state cannot be reliably >> >> discerned by the state of the cpu. >> >> VMXOFF faults if the cpu is not actually in VMX root operation, signalling >> >> undefined operation. >> >> Discovered while debugging an out-of-tree x-visor with a race. Can happen >> >> due to certain kinds of bugs in KVM. >> > >> > Can you re-wrap lines to 68 characters? Also, the Fix: and >> >> I used 'scripts/checkpatch.pl' and it had me wrap to 75 chars: >> "WARNING: Possible unwrapped commit description (prefer a maximum 75 chars >> per >> line)" >> >> Should I submit a fix to checkpatch.pl to say 68? > > 75 is probably fine too, but something is odd about your wrapping. > You have long lines mostly alternating with short lines. It's as if > you wrote 120-ish character lines and then wrapped to 75 without > reflowing. My emacs settings tend to wrap at about 85 depending on file type (big screens). I did the shortening manually, aimed at breaking at meaningful points, not worrying too much about line-length uniformity. > >> >> > Explanation: is probably unnecessary. You could say: >> > >> > Ignore a potential #UD failut during emergency VMXOFF ... >> > >> > When a cpu may be in VMX ... 
>> > >> >> >> >> Fixes: 208067 <https://bugzilla.kernel.org/show_bug.cgi?id=208067> >> >> Reported-by: David P. Reed >> > >> > It's not really necessary to say that you, the author, reported the >> > problem, but I guess it's harmless. >> > >> >> Suggested-by: Thomas Gleixner >> >> Suggested-by: Sean Christopherson >> >> Suggested-by: Andy Lutomirski >> >> Signed-off-by: David P. Reed >> >> --- >> >> arch/x86/include/asm/virtext.h | 20 ++-- >> >> 1 file changed, 14 insertions(+), 6 deletions(-) >> >> >> >> diff --git a/arch/x86/include/asm/virtext.h >> >> b/arch/x86/include/asm/virtext.h >> >> index 0ede8d04535a..0e0900eacb9c 100644 >> >> --- a/arch/x86/include/asm/virtext.h >> >> +++ b/arch/x86/include/asm/virtext.h >> >> @@ -30,11 +30,11 @@ static inline int cpu_has_vmx(void) >> >> } >> >> >> >> >> >> -/* Disable VMX on the current CPU >> >> +/* Exit VMX root mode and isable VMX on the current CPU. >> > >> > s/isable/disable/ >> > >> > >> >> /* Disable VMX if it is supported and enabled on the current CPU >> >> -- >> >> 2.26.2 >> >> >> > >> > Other than that: >> > >> > Reviewed-by: Andy Lutomirski >> >> As a newbie, I have a process question - should I resend the patch with the >> 'Reviewed-by' line, as well as correcting the other wording? Thanks! > > Probably. Sometimes a maintainer will apply the patch and make these > types of cosmetic changes, but it's easier if you resubmit. That > being said, for non-urgent patches, it's usually considered polite to > wait a day or two to give other people a chance to comment. I'm not sure which maintainer will move the patches along. I am waiting for additional input, but will resubmit in a day or two. > > --Andy >
Re: [PATCH v3 3/3] Force all cpus to exit VMX root operation on crash/panic reliably
On Sunday, July 5, 2020 2:26pm, "Andy Lutomirski" said: > On Sat, Jul 4, 2020 at 1:38 PM David P. Reed wrote: >> >> Fix the logic during crash/panic reboot on Intel processors that >> can support VMX operation to ensure that all processors are not >> in VMX root operation. Prior code made optimistic assumptions >> about other cpus that would leave other cpus in VMX root operation >> depending on timing of crash/panic reboot. >> Builds on cpu_ermergency_vmxoff() and __cpu_emergency_vmxoff() created >> in a prior patch. >> >> Suggested-by: Sean Christopherson >> Signed-off-by: David P. Reed >> --- >> arch/x86/kernel/reboot.c | 20 +++- >> 1 file changed, 7 insertions(+), 13 deletions(-) >> >> diff --git a/arch/x86/kernel/reboot.c b/arch/x86/kernel/reboot.c >> index 0ec7ced727fe..c8e96ba78efc 100644 >> --- a/arch/x86/kernel/reboot.c >> +++ b/arch/x86/kernel/reboot.c >> @@ -543,24 +543,18 @@ static void emergency_vmx_disable_all(void) >> * signals when VMX is enabled. >> * >> * We can't take any locks and we may be on an inconsistent >> -* state, so we use NMIs as IPIs to tell the other CPUs to disable >> -* VMX and halt. >> +* state, so we use NMIs as IPIs to tell the other CPUs to exit >> +* VMX root operation and halt. >> * >> * For safety, we will avoid running the nmi_shootdown_cpus() >> * stuff unnecessarily, but we don't have a way to check >> -* if other CPUs have VMX enabled. So we will call it only if the >> -* CPU we are running on has VMX enabled. >> -* >> -* We will miss cases where VMX is not enabled on all CPUs. This >> -* shouldn't do much harm because KVM always enable VMX on all >> -* CPUs anyway. But we can miss it on the small window where KVM >> -* is still enabling VMX. >> +* if other CPUs might be in VMX root operation. >> */ >> - if (cpu_has_vmx() && cpu_vmx_enabled()) { >> - /* Disable VMX on this CPU. */ >> - cpu_vmxoff(); >> + if (cpu_has_vmx()) { >> + /* Safely force out of VMX root operation on this CPU. 
*/ >> + __cpu_emergency_vmxoff(); >> >> - /* Halt and disable VMX on the other CPUs */ >> + /* Halt and exit VMX root operation on the other CPUs */ >> nmi_shootdown_cpus(vmxoff_nmi); >> >> } > > Seems reasonable to me. > > As a minor caveat, doing cr4_clear_bits() in NMI context is not really > okay, but we're about to reboot, so nothing too awful should happen. > And this has very little to do with your patch. I had wondered why the bit is cleared, too. (I assumed it was OK or desirable, because it was being cleared in NMI context before.) Happy to submit a separate patch to eliminate that issue as well, since the point of emergency vmxoff is only to get out of VMX root mode; CR4.VMXE's state is irrelevant. Of course, clearing it prevents any future emergency vmxoff attempts. (There seemed to be some confusion about "enabling" VMX vs. being "in VMX operation" in the comments.) Should I? > > Reviewed-by: Andy Lutomirski >
Re: [PATCH v3 2/3] Fix undefined operation fault that can hang a cpu on crash or panic
Thanks, will handle these. 2 questions below. On Sunday, July 5, 2020 2:22pm, "Andy Lutomirski" said: > On Sat, Jul 4, 2020 at 1:38 PM David P. Reed wrote: >> >> Fix: Mask undefined operation fault during emergency VMXOFF that must be >> attempted to force cpu exit from VMX root operation. >> Explanation: When a cpu may be in VMX root operation (only possible when >> CR4.VMXE is set), crash or panic reboot tries to exit VMX root operation >> using VMXOFF. This is necessary, because any INIT will be masked while cpu >> is in VMX root operation, but that state cannot be reliably >> discerned by the state of the cpu. >> VMXOFF faults if the cpu is not actually in VMX root operation, signalling >> undefined operation. >> Discovered while debugging an out-of-tree x-visor with a race. Can happen >> due to certain kinds of bugs in KVM. > > Can you re-wrap lines to 68 characters? Also, the Fix: and I used 'scripts/checkpatch.pl' and it had me wrap to 75 chars: "WARNING: Possible unwrapped commit description (prefer a maximum 75 chars per line)" Should I submit a fix to checkpatch.pl to say 68? > Explanation: is probably unnecessary. You could say: > > Ignore a potential #UD fault during emergency VMXOFF ... > > When a cpu may be in VMX ... > >> >> Fixes: 208067 <https://bugzilla.kernel.org/show_bug.cgi?id=208067> >> Reported-by: David P. Reed > > It's not really necessary to say that you, the author, reported the > problem, but I guess it's harmless. > >> Suggested-by: Thomas Gleixner >> Suggested-by: Sean Christopherson >> Suggested-by: Andy Lutomirski >> Signed-off-by: David P.
Reed >> --- >> arch/x86/include/asm/virtext.h | 20 ++-- >> 1 file changed, 14 insertions(+), 6 deletions(-) >> >> diff --git a/arch/x86/include/asm/virtext.h b/arch/x86/include/asm/virtext.h >> index 0ede8d04535a..0e0900eacb9c 100644 >> --- a/arch/x86/include/asm/virtext.h >> +++ b/arch/x86/include/asm/virtext.h >> @@ -30,11 +30,11 @@ static inline int cpu_has_vmx(void) >> } >> >> >> -/* Disable VMX on the current CPU >> +/* Exit VMX root mode and isable VMX on the current CPU. > > s/isable/disable/ > > >> /* Disable VMX if it is supported and enabled on the current CPU >> -- >> 2.26.2 >> > > Other than that: > > Reviewed-by: Andy Lutomirski As a newbie, I have a process question - should I resend the patch with the 'Reviewed-by' line, as well as correcting the other wording? Thanks! > > --Andy >
[PATCH v3 2/3] Fix undefined operation fault that can hang a cpu on crash or panic
Fix: Mask undefined operation fault during emergency VMXOFF that must be attempted to force cpu exit from VMX root operation. Explanation: When a cpu may be in VMX root operation (only possible when CR4.VMXE is set), crash or panic reboot tries to exit VMX root operation using VMXOFF. This is necessary, because any INIT will be masked while cpu is in VMX root operation, but that state cannot be reliably discerned by the state of the cpu. VMXOFF faults if the cpu is not actually in VMX root operation, signalling undefined operation. Discovered while debugging an out-of-tree x-visor with a race. Can happen due to certain kinds of bugs in KVM. Fixes: 208067 <https://bugzilla.kernel.org/show_bug.cgi?id=208067> Reported-by: David P. Reed Suggested-by: Thomas Gleixner Suggested-by: Sean Christopherson Suggested-by: Andy Lutomirski Signed-off-by: David P. Reed --- arch/x86/include/asm/virtext.h | 20 ++-- 1 file changed, 14 insertions(+), 6 deletions(-) diff --git a/arch/x86/include/asm/virtext.h b/arch/x86/include/asm/virtext.h index 0ede8d04535a..0e0900eacb9c 100644 --- a/arch/x86/include/asm/virtext.h +++ b/arch/x86/include/asm/virtext.h @@ -30,11 +30,11 @@ static inline int cpu_has_vmx(void) } -/* Disable VMX on the current CPU +/* Exit VMX root mode and isable VMX on the current CPU. * * vmxoff causes a undefined-opcode exception if vmxon was not run - * on the CPU previously. Only call this function if you know VMX - * is enabled. + * on the CPU previously. Only call this function if you know cpu + * is in VMX root mode. */ static inline void cpu_vmxoff(void) { @@ -47,14 +47,22 @@ static inline int cpu_vmx_enabled(void) return __read_cr4() & X86_CR4_VMXE; } -/* Disable VMX if it is enabled on the current CPU +/* Safely exit VMX root mode and disable VMX if VMX enabled + * on the current CPU. Handle undefined-opcode fault + * that can occur if cpu is not in VMX root mode, due + * to a race. * * You shouldn't call this if cpu_has_vmx() returns 0. 
*/ static inline void __cpu_emergency_vmxoff(void) { - if (cpu_vmx_enabled()) - cpu_vmxoff(); + if (!cpu_vmx_enabled()) + return; + asm volatile ("1:vmxoff\n\t" + "2:\n\t" + _ASM_EXTABLE(1b, 2b) + ::: "cc", "memory"); + cr4_clear_bits(X86_CR4_VMXE); } /* Disable VMX if it is supported and enabled on the current CPU -- 2.26.2
[PATCH v3 0/3] Fix undefined operation VMXOFF during reboot and crash
At the request of Sean Christopherson, the original patch was split into three patches, each fixing a distinct issue related to the original bug, of a hang due to VMXOFF causing an undefined operation fault when the kernel reboots with CR4.VMXE set. The combination of the patches is the complete fix to the reported bug, and a lurking error in asm side effects. David P. Reed (3): Correct asm VMXOFF side effects Fix undefined operation fault that can hang a cpu on crash or panic Force all cpus to exit VMX root operation on crash/panic reliably arch/x86/include/asm/virtext.h | 24 arch/x86/kernel/reboot.c | 20 +++- 2 files changed, 23 insertions(+), 21 deletions(-) -- 2.26.2
[PATCH v3 3/3] Force all cpus to exit VMX root operation on crash/panic reliably
Fix the logic during crash/panic reboot on Intel processors that can support VMX operation to ensure that all processors are not in VMX root operation. Prior code made optimistic assumptions about other cpus that would leave them in VMX root operation depending on timing of crash/panic reboot. Builds on cpu_emergency_vmxoff() and __cpu_emergency_vmxoff() created in a prior patch. Suggested-by: Sean Christopherson Signed-off-by: David P. Reed --- arch/x86/kernel/reboot.c | 20 +++- 1 file changed, 7 insertions(+), 13 deletions(-) diff --git a/arch/x86/kernel/reboot.c b/arch/x86/kernel/reboot.c index 0ec7ced727fe..c8e96ba78efc 100644 --- a/arch/x86/kernel/reboot.c +++ b/arch/x86/kernel/reboot.c @@ -543,24 +543,18 @@ static void emergency_vmx_disable_all(void) * signals when VMX is enabled. * * We can't take any locks and we may be on an inconsistent -* state, so we use NMIs as IPIs to tell the other CPUs to disable -* VMX and halt. +* state, so we use NMIs as IPIs to tell the other CPUs to exit +* VMX root operation and halt. * * For safety, we will avoid running the nmi_shootdown_cpus() * stuff unnecessarily, but we don't have a way to check -* if other CPUs have VMX enabled. So we will call it only if the -* CPU we are running on has VMX enabled. -* -* We will miss cases where VMX is not enabled on all CPUs. This -* shouldn't do much harm because KVM always enable VMX on all -* CPUs anyway. But we can miss it on the small window where KVM -* is still enabling VMX. +* if other CPUs might be in VMX root operation. */ - if (cpu_has_vmx() && cpu_vmx_enabled()) { - /* Disable VMX on this CPU. */ - cpu_vmxoff(); + if (cpu_has_vmx()) { + /* Safely force out of VMX root operation on this CPU. */ + __cpu_emergency_vmxoff(); - /* Halt and disable VMX on the other CPUs */ + /* Halt and exit VMX root operation on the other CPUs */ nmi_shootdown_cpus(vmxoff_nmi); } -- 2.26.2
[PATCH v3 1/3] Correct asm VMXOFF side effects
Tell gcc that VMXOFF instruction clobbers condition codes and memory when executed. Also, correct original comments to remove kernel-doc syntax per Randy Dunlap's request. Suggested-by: Randy Dunlap Signed-off-by: David P. Reed --- arch/x86/include/asm/virtext.h | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/arch/x86/include/asm/virtext.h b/arch/x86/include/asm/virtext.h index 9aad0e0876fb..0ede8d04535a 100644 --- a/arch/x86/include/asm/virtext.h +++ b/arch/x86/include/asm/virtext.h @@ -30,7 +30,7 @@ static inline int cpu_has_vmx(void) } -/** Disable VMX on the current CPU +/* Disable VMX on the current CPU * * vmxoff causes a undefined-opcode exception if vmxon was not run * on the CPU previously. Only call this function if you know VMX @@ -38,7 +38,7 @@ static inline int cpu_has_vmx(void) */ static inline void cpu_vmxoff(void) { - asm volatile ("vmxoff"); + asm volatile ("vmxoff" ::: "cc", "memory"); cr4_clear_bits(X86_CR4_VMXE); } @@ -47,7 +47,7 @@ static inline int cpu_vmx_enabled(void) return __read_cr4() & X86_CR4_VMXE; } -/** Disable VMX if it is enabled on the current CPU +/* Disable VMX if it is enabled on the current CPU * * You shouldn't call this if cpu_has_vmx() returns 0. */ @@ -57,7 +57,7 @@ static inline void __cpu_emergency_vmxoff(void) cpu_vmxoff(); } -/** Disable VMX if it is supported and enabled on the current CPU +/* Disable VMX if it is supported and enabled on the current CPU */ static inline void cpu_emergency_vmxoff(void) { -- 2.26.2
Re: [PATCH v2] Fix undefined operation VMXOFF during reboot and crash
On Monday, June 29, 2020 5:49pm, "Sean Christopherson" said: > On Mon, Jun 29, 2020 at 02:22:45PM -0700, Andy Lutomirski wrote: >> >> >> > On Jun 29, 2020, at 1:54 PM, David P. Reed wrote: >> > >> > Simple question for those on the To: and CC: list here. Should I >> > abandon any hope of this patch being accepted? It's been a long time. >> > >> > The non-response after I acknowledged that this was discovered while >> > working on a personal, non-commercial research project - which is >> > "out-of-tree" (apparently dirty words on LKML) has me thinking my >> > contribution is unwanted. That's fine, I suppose. I can maintain this patch >> > out-of-tree as well. I did incorporate all the helpful suggestions I >> > received in this second patch, and given some encouragement, will happily >> > submit a revised v3 if there is any likelihood of acceptance. I'm wary of >> > doing more radical changes (like combining emergency and normal paths). >> > >> >> Sorry about being slow and less actively encouraging than we should be. We >> absolutely welcome personal contributions. The actual problem is that >> everyone is overworked and we’re all slow. Also, you may be hitting a corner >> case >> in the process: is this a KVM patch or an x86 patch? > > It's an x86 patch as it's not KVM specific, e.g. this code also helps play > nice with out-of-tree hypervisors. > > The code change is mostly good, but it needs to be split up as there are > three separate fixes: > > 1. Handle #UD on VMXOFF due to a race. > 2. Mark memory and flags as clobbered by VMXOFF. > 3. Change emergency_vmx_disable_all() to not manually check > cpu_vmx_enabled(). > > Yes, the changes are tiny, but if for example #3 introduces a bug then we > don't have to revert #1 and #2. Or perhaps older kernels are only subject > to #1 and #2, and thus dumping all three changes into a single patch makes > it all harder to backport. In other words, all the usual "one change per > patch" reasons. > Thanks.
If no one else responds with additional suggestions, I will split it into three patches. I'm happy to learn the nuances of the kernel patch process.
Re: [PATCH v2] Fix undefined operation VMXOFF during reboot and crash
Simple question for those on the To: and CC: list here. Should I abandon any hope of this patch being accepted? It's been a long time. The non-response after I acknowledged that this was discovered while working on a personal, non-commercial research project - which is "out-of-tree" (apparently dirty words on LKML) has me thinking my contribution is unwanted. That's fine, I suppose. I can maintain this patch out-of-tree as well. I did incorporate all the helpful suggestions I received in this second patch, and given some encouragement, will happily submit a revised v3 if there is any likelihood of acceptance. I'm wary of doing more radical changes (like combining emergency and normal paths). On Thursday, June 25, 2020 10:59am, "David P. Reed" said: > Correction to my comment below. > On Thursday, June 25, 2020 10:45am, "David P. Reed" > said: > >> [Sorry: this is resent because my mailer included HTML, rejected by LKML] >> Thanks for the response, Sean ... I had thought everyone was too busy to >> follow >> up >> from the first version. >> >> I confess I'm not sure why this should be broken up into a patch series, >> given >> that it is so very small and is all aimed at the same category of bug. >> >> And the "emergency" path pre-existed, I didn't want to propose removing it, >> since >> I assumed it was there for a reason. I didn't want to include my own >> judgement as >> to whether there should only be one path. (I'm pretty sure I didn't find a >> VMXOFF >> in KVM separately from the instance in this include file, but I will check). > Just checked. Yes, the kvm code's handling of VMXOFF is separate, and though > it > uses exception masking, seems to do other things, perhaps related to nested > KVM, > but I haven't studied the deep logic of KVM nesting. > >> >> A question: if I make it a series, I have to test each patch doesn't break >> something individually, in order to handle the case where one patch is >> accepted >> and the others are not. 
Do I need to test each individual patch thoroughly >> as an >> independent patch against all those cases? >> I know the combination don't break anything and fixes the issues I've >> discovered >> by testing all combinations (and I've done some thorough testing of panics, >> oopses >> crashes, kexec, ... under all combinations of CR4.VMXE enablement and crash >> source >> to verify the fix fixes the problem's manifestations and to verify that it >> doesn't >> break any of the working paths. >> >> That said, I'm willing to do a v3 "series" based on these suggestions if it >> will >> smooth its acceptance. If it's not going to get accepted after doing that, my >> motivation is flagging. >> On Thursday, June 25, 2020 2:06am, "Sean Christopherson" >> said: >> >> >> >>> On Thu, Jun 11, 2020 at 03:48:18PM -0400, David P. Reed wrote: >>> > -/** Disable VMX on the current CPU >>> > +/* Disable VMX on the current CPU >>> > * >>> > - * vmxoff causes a undefined-opcode exception if vmxon was not run >>> > - * on the CPU previously. Only call this function if you know VMX >>> > - * is enabled. >>> > + * vmxoff causes an undefined-opcode exception if vmxon was not run >>> > + * on the CPU previously. Only call this function directly if you know >>> > VMX >>> > + * is enabled *and* CPU is in VMX root operation. >>> > */ >>> > static inline void cpu_vmxoff(void) >>> > { >>> > - asm volatile ("vmxoff"); >>> > + asm volatile ("vmxoff" ::: "cc", "memory"); /* clears all flags on >>> > success >>> */ >>> > cr4_clear_bits(X86_CR4_VMXE); >>> > } >>> > >>> > @@ -47,17 +47,35 @@ static inline int cpu_vmx_enabled(void) >>> > return __read_cr4() & X86_CR4_VMXE; >>> > } >>> > >>> > -/** Disable VMX if it is enabled on the current CPU >>> > - * >>> > - * You shouldn't call this if cpu_has_vmx() returns 0. 
>>> > +/* >>> > + * Safely disable VMX root operation if active >>> > + * Note that if CPU is not in VMX root operation this >>> > + * VMXOFF will fault an undefined operation fault, >>> > + * so use the exception masking facility to handle that RARE >>> > + * case. >
Re: [PATCH v2] Fix undefined operation VMXOFF during reboot and crash
Correction to my comment below. On Thursday, June 25, 2020 10:45am, "David P. Reed" said: > [Sorry: this is resent because my mailer included HTML, rejected by LKML] > Thanks for the response, Sean ... I had thought everyone was too busy to > follow up > from the first version. > > I confess I'm not sure why this should be broken up into a patch series, given > that it is so very small and is all aimed at the same category of bug. > > And the "emergency" path pre-existed, I didn't want to propose removing it, > since > I assumed it was there for a reason. I didn't want to include my own > judgement as > to whether there should only be one path. (I'm pretty sure I didn't find a > VMXOFF > in KVM separately from the instance in this include file, but I will check). Just checked. Yes, the kvm code's handling of VMXOFF is separate, and though it uses exception masking, seems to do other things, perhaps related to nested KVM, but I haven't studied the deep logic of KVM nesting. > > A question: if I make it a series, I have to test each patch doesn't break > something individually, in order to handle the case where one patch is > accepted > and the others are not. Do I need to test each individual patch thoroughly as > an > independent patch against all those cases? > I know the combination don't break anything and fixes the issues I've > discovered > by testing all combinations (and I've done some thorough testing of panics, > oopses > crashes, kexec, ... under all combinations of CR4.VMXE enablement and crash > source > to verify the fix fixes the problem's manifestations and to verify that it > doesn't > break any of the working paths. > > That said, I'm willing to do a v3 "series" based on these suggestions if it > will > smooth its acceptance. If it's not going to get accepted after doing that, my > motivation is flagging. > On Thursday, June 25, 2020 2:06am, "Sean Christopherson" > said: > > > >> On Thu, Jun 11, 2020 at 03:48:18PM -0400, David P. 
Reed wrote: >> > -/** Disable VMX on the current CPU >> > +/* Disable VMX on the current CPU >> > * >> > - * vmxoff causes a undefined-opcode exception if vmxon was not run >> > - * on the CPU previously. Only call this function if you know VMX >> > - * is enabled. >> > + * vmxoff causes an undefined-opcode exception if vmxon was not run >> > + * on the CPU previously. Only call this function directly if you know VMX >> > + * is enabled *and* CPU is in VMX root operation. >> > */ >> > static inline void cpu_vmxoff(void) >> > { >> > - asm volatile ("vmxoff"); >> > + asm volatile ("vmxoff" ::: "cc", "memory"); /* clears all flags on >> > success >> */ >> > cr4_clear_bits(X86_CR4_VMXE); >> > } >> > >> > @@ -47,17 +47,35 @@ static inline int cpu_vmx_enabled(void) >> > return __read_cr4() & X86_CR4_VMXE; >> > } >> > >> > -/** Disable VMX if it is enabled on the current CPU >> > - * >> > - * You shouldn't call this if cpu_has_vmx() returns 0. >> > +/* >> > + * Safely disable VMX root operation if active >> > + * Note that if CPU is not in VMX root operation this >> > + * VMXOFF will fault an undefined operation fault, >> > + * so use the exception masking facility to handle that RARE >> > + * case. >> > + * You shouldn't call this directly if cpu_has_vmx() returns 0 >> > + */ >> > +static inline void cpu_vmxoff_safe(void) >> > +{ >> > + asm volatile("1:vmxoff\n\t" /* clears all flags on success */ >> >> Eh, I wouldn't bother with the comment, there are a million other caveats >> with VMXOFF that are far more interesting. >> >> > + "2:\n\t" >> > + _ASM_EXTABLE(1b, 2b) >> > + ::: "cc", "memory"); >> >> Adding the memory and flags clobber should be a separate patch. >> >> > + cr4_clear_bits(X86_CR4_VMXE); >> > +} >> >> >> I don't see any value in safe/unsafe variants. The only in-kernel user of >> VMXOFF outside of the emergency flows is KVM, which has its own VMXOFF >> helper, i.e. all users of cpu_vmxoff() want the "safe" variant. 
Just add >> the exception fixup to cpu_vmxoff() and call it good. >> >> > + >> > +/* >> > + * Force disable VMX if it is enabled on the current CPU, >> > + * when it is unk
Re: [PATCH v2] Fix undefined operation VMXOFF during reboot and crash
[Sorry: this is resent because my mailer included HTML, rejected by LKML] Thanks for the response, Sean ... I had thought everyone was too busy to follow up from the first version. I confess I'm not sure why this should be broken up into a patch series, given that it is so very small and is all aimed at the same category of bug. And the "emergency" path pre-existed; I didn't want to propose removing it, since I assumed it was there for a reason. I didn't want to include my own judgement as to whether there should only be one path. (I'm pretty sure I didn't find a VMXOFF in KVM separately from the instance in this include file, but I will check.) A question: if I make it a series, I have to test that each patch doesn't break something individually, in order to handle the case where one patch is accepted and the others are not. Do I need to test each individual patch thoroughly as an independent patch against all those cases? I know the combination doesn't break anything and fixes the issues I've discovered, by testing all combinations. (I've done some thorough testing of panics, oopses, crashes, kexec, ... under all combinations of CR4.VMXE enablement and crash source, to verify that the fix fixes the problem's manifestations and that it doesn't break any of the working paths.) That said, I'm willing to do a v3 "series" based on these suggestions if it will smooth its acceptance. If it's not going to get accepted after doing that, my motivation is flagging. On Thursday, June 25, 2020 2:06am, "Sean Christopherson" said: > On Thu, Jun 11, 2020 at 03:48:18PM -0400, David P. Reed wrote: > > -/** Disable VMX on the current CPU > > +/* Disable VMX on the current CPU > > * > > - * vmxoff causes a undefined-opcode exception if vmxon was not run > > - * on the CPU previously. Only call this function if you know VMX > > - * is enabled. > > + * vmxoff causes an undefined-opcode exception if vmxon was not run > > + * on the CPU previously. 
Only call this function directly if you know VMX > > + * is enabled *and* CPU is in VMX root operation. > > */ > > static inline void cpu_vmxoff(void) > > { > > - asm volatile ("vmxoff"); > > + asm volatile ("vmxoff" ::: "cc", "memory"); /* clears all flags on success > */ > > cr4_clear_bits(X86_CR4_VMXE); > > } > > > > @@ -47,17 +47,35 @@ static inline int cpu_vmx_enabled(void) > > return __read_cr4() & X86_CR4_VMXE; > > } > > > > -/** Disable VMX if it is enabled on the current CPU > > - * > > - * You shouldn't call this if cpu_has_vmx() returns 0. > > +/* > > + * Safely disable VMX root operation if active > > + * Note that if CPU is not in VMX root operation this > > + * VMXOFF will fault an undefined operation fault, > > + * so use the exception masking facility to handle that RARE > > + * case. > > + * You shouldn't call this directly if cpu_has_vmx() returns 0 > > + */ > > +static inline void cpu_vmxoff_safe(void) > > +{ > > + asm volatile("1:vmxoff\n\t" /* clears all flags on success */ > > Eh, I wouldn't bother with the comment, there are a million other caveats > with VMXOFF that are far more interesting. > > > + "2:\n\t" > > + _ASM_EXTABLE(1b, 2b) > > + ::: "cc", "memory"); > > Adding the memory and flags clobber should be a separate patch. > > > + cr4_clear_bits(X86_CR4_VMXE); > > +} > > > I don't see any value in safe/unsafe variants. The only in-kernel user of > VMXOFF outside of the emergency flows is KVM, which has its own VMXOFF > helper, i.e. all users of cpu_vmxoff() want the "safe" variant. Just add > the exception fixup to cpu_vmxoff() and call it good. > > > + > > +/* > > + * Force disable VMX if it is enabled on the current CPU, > > + * when it is unknown whether CPU is in VMX operation. > > */ > > static inline void __cpu_emergency_vmxoff(void) > > { > > - if (cpu_vmx_enabled()) > > - cpu_vmxoff(); > > + if (!cpu_vmx_enabled()) > > + return; > > + cpu_vmxoff_safe(); > > Unnecessary churn. 
> > > } > > > > -/** Disable VMX if it is supported and enabled on the current CPU > > +/* Force disable VMX if it is supported on current CPU > > */ > > static inline void cpu_emergency_vmxoff(void) > > { > > diff --git a/arch/x86/kernel/reboot.c b/arch/x86/kernel/reboot.c > > index e040ba6be27b..b0e6b106a67e 100644 > > --- a/arch/x86/kernel/reboot.c > > +++ b/arch/x86/kernel/reboot.c > > @
[PATCH v2] Fix undefined operation VMXOFF during reboot and crash
If a panic/reboot occurs when CR4 has VMX enabled, a VMXOFF is done on all CPUS, to allow the INIT IPI to function, since INIT is suppressed when CPUs are in VMX root operation. The problem is that VMXOFF causes an undefined operation fault when the CPU is not in VMX operation, that is, when VMXON has not been executed yet, or VMXOFF has been executed but VMX is still enabled. This patch makes the reboot work more reliably by masking the exception on VMXOFF in the crash/panic/reboot path, which uses cpu_emergency_vmxoff(). This can happen with KVM due to a race, but that race is rare today. The problem was discovered doing out-of-tree x-visor development that uses VMX in a novel way for kernel performance analysis. The logic in reboot.c is also corrected, since the point of forcing the processor out of VMX root operation is to allow the INIT signal to be unmasked. See the Intel SDM section on differences between VMX root operation and normal operation. Thus every CPU must be forced out of VMX operation. Since the CPU may hang rather than restart if INIT fails, a manual hardware "reset" is the only way out of this state in a lights-out datacenter (well, if there is a BMC, it can issue a hardware RESET to the chip). Style errors in the original file fixed, at the request of Randy Dunlap: eliminate '/**' in non-kernel-doc comments. Fixes: 208067 <https://bugzilla.kernel.org/show_bug.cgi?id=208067> Reported-by: David P. Reed Reported-by: Randy Dunlap Suggested-by: Thomas Gleixner Suggested-by: Sean Christopherson Suggested-by: Andy Lutomirski Signed-off-by: David P. 
Reed --- arch/x86/include/asm/virtext.h | 40 -- arch/x86/kernel/reboot.c | 13 +++ 2 files changed, 32 insertions(+), 21 deletions(-) diff --git a/arch/x86/include/asm/virtext.h b/arch/x86/include/asm/virtext.h index 9aad0e0876fb..ed22c1983da8 100644 --- a/arch/x86/include/asm/virtext.h +++ b/arch/x86/include/asm/virtext.h @@ -30,15 +30,15 @@ static inline int cpu_has_vmx(void) } -/** Disable VMX on the current CPU +/* Disable VMX on the current CPU * - * vmxoff causes a undefined-opcode exception if vmxon was not run - * on the CPU previously. Only call this function if you know VMX - * is enabled. + * vmxoff causes an undefined-opcode exception if vmxon was not run + * on the CPU previously. Only call this function directly if you know VMX + * is enabled *and* CPU is in VMX root operation. */ static inline void cpu_vmxoff(void) { - asm volatile ("vmxoff"); + asm volatile ("vmxoff" ::: "cc", "memory"); /* clears all flags on success */ cr4_clear_bits(X86_CR4_VMXE); } @@ -47,17 +47,35 @@ static inline int cpu_vmx_enabled(void) return __read_cr4() & X86_CR4_VMXE; } -/** Disable VMX if it is enabled on the current CPU - * - * You shouldn't call this if cpu_has_vmx() returns 0. +/* + * Safely disable VMX root operation if active + * Note that if CPU is not in VMX root operation this + * VMXOFF will fault an undefined operation fault, + * so use the exception masking facility to handle that RARE + * case. + * You shouldn't call this directly if cpu_has_vmx() returns 0 + */ +static inline void cpu_vmxoff_safe(void) +{ + asm volatile("1:vmxoff\n\t" /* clears all flags on success */ + "2:\n\t" +_ASM_EXTABLE(1b, 2b) +::: "cc", "memory"); + cr4_clear_bits(X86_CR4_VMXE); +} + +/* + * Force disable VMX if it is enabled on the current CPU, + * when it is unknown whether CPU is in VMX operation. 
*/ static inline void __cpu_emergency_vmxoff(void) { - if (cpu_vmx_enabled()) - cpu_vmxoff(); + if (!cpu_vmx_enabled()) + return; + cpu_vmxoff_safe(); } -/** Disable VMX if it is supported and enabled on the current CPU +/* Force disable VMX if it is supported on current CPU */ static inline void cpu_emergency_vmxoff(void) { diff --git a/arch/x86/kernel/reboot.c b/arch/x86/kernel/reboot.c index e040ba6be27b..b0e6b106a67e 100644 --- a/arch/x86/kernel/reboot.c +++ b/arch/x86/kernel/reboot.c @@ -540,21 +540,14 @@ static void emergency_vmx_disable_all(void) * * For safety, we will avoid running the nmi_shootdown_cpus() * stuff unnecessarily, but we don't have a way to check -* if other CPUs have VMX enabled. So we will call it only if the -* CPU we are running on has VMX enabled. -* -* We will miss cases where VMX is not enabled on all CPUs. This -* shouldn't do much harm because KVM always enable VMX on all -* CPUs anyway. But we can miss it on the small window where KVM -* is still enabling VMX. +* if other CPUs have VMX enabled. */ - if (cpu_has_vmx() && cpu_vmx_enabled()) { + if (cpu_has_vmx()) { /* Disable VMX on this
[PATCH] Fix undefined operation VMXOFF during reboot and crash
If a panic/reboot occurs when CR4 has VMX enabled, a VMXOFF is done on all CPUS, to allow the INIT IPI to function, since INIT is suppressed when CPUs are in VMX root operation. However, VMXOFF causes an undefined operation fault if the CPU is not in VMX operation, that is, VMXON has not been executed, or VMXOFF has been executed, but VMX is enabled. This fix makes the reboot work more reliably by modifying the #UD handler to skip the VMXOFF if VMX is enabled on the CPU and the VMXOFF is executed as part of cpu_emergency_vmxoff(). The logic in reboot.c is also corrected, since the point of forcing the processor out of VMX root operation is because when VMX root operation is enabled, the processor INIT signal is always masked. See Intel SDM section on differences between VMX Root operation and normal operation. Thus every CPU must be forced out of VMX operation. Since the CPU will hang rather than restart, a manual "reset" is the only way out of this state (or if there is a BMC, it can issue a RESET to the chip). Signed-off-by: David P. Reed --- arch/x86/include/asm/virtext.h | 24 arch/x86/kernel/reboot.c | 13 ++--- arch/x86/kernel/traps.c| 52 -- 3 files changed, 71 insertions(+), 18 deletions(-) diff --git a/arch/x86/include/asm/virtext.h b/arch/x86/include/asm/virtext.h index 9aad0e0876fb..ea2d67191684 100644 --- a/arch/x86/include/asm/virtext.h +++ b/arch/x86/include/asm/virtext.h @@ -13,12 +13,16 @@ #ifndef _ASM_X86_VIRTEX_H #define _ASM_X86_VIRTEX_H +#include + #include #include #include #include +DECLARE_PER_CPU_READ_MOSTLY(int, doing_emergency_vmxoff); + /* * VMX functions: */ @@ -33,8 +37,8 @@ static inline int cpu_has_vmx(void) /** Disable VMX on the current CPU * * vmxoff causes a undefined-opcode exception if vmxon was not run - * on the CPU previously. Only call this function if you know VMX - * is enabled. + * on the CPU previously. Only call this function directly if you know VMX + * is enabled *and* CPU is in VMX root operation. 
*/ static inline void cpu_vmxoff(void) { @@ -47,17 +51,25 @@ static inline int cpu_vmx_enabled(void) return __read_cr4() & X86_CR4_VMXE; } -/** Disable VMX if it is enabled on the current CPU +/** Force disable VMX if it is enabled on the current CPU. + * Note that if CPU is not in VMX root operation this + * VMXOFF will fault an undefined operation fault. + * So the 'doing_emergency_vmxoff' percpu flag is set, + * the trap handler for just restarts execution after + * the VMXOFF instruction. * - * You shouldn't call this if cpu_has_vmx() returns 0. + * You shouldn't call this directly if cpu_has_vmx() returns 0. */ static inline void __cpu_emergency_vmxoff(void) { - if (cpu_vmx_enabled()) + if (cpu_vmx_enabled()) { + this_cpu_write(doing_emergency_vmxoff, 1); cpu_vmxoff(); + this_cpu_write(doing_emergency_vmxoff, 0); + } } -/** Disable VMX if it is supported and enabled on the current CPU +/** Force disable VMX if it is supported and enabled on the current CPU */ static inline void cpu_emergency_vmxoff(void) { diff --git a/arch/x86/kernel/reboot.c b/arch/x86/kernel/reboot.c index 3ca43be4f9cf..abc8b51a57c7 100644 --- a/arch/x86/kernel/reboot.c +++ b/arch/x86/kernel/reboot.c @@ -540,21 +540,14 @@ static void emergency_vmx_disable_all(void) * * For safety, we will avoid running the nmi_shootdown_cpus() * stuff unnecessarily, but we don't have a way to check -* if other CPUs have VMX enabled. So we will call it only if the -* CPU we are running on has VMX enabled. -* -* We will miss cases where VMX is not enabled on all CPUs. This -* shouldn't do much harm because KVM always enable VMX on all -* CPUs anyway. But we can miss it on the small window where KVM -* is still enabling VMX. +* if other CPUs have VMX enabled. */ - if (cpu_has_vmx() && cpu_vmx_enabled()) { + if (cpu_has_vmx()) { /* Disable VMX on this CPU. 
*/ - cpu_vmxoff(); + cpu_emergency_vmxoff(); /* Halt and disable VMX on the other CPUs */ nmi_shootdown_cpus(vmxoff_nmi); - } } diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c index 4cc541051994..2dcf57ef467e 100644 --- a/arch/x86/kernel/traps.c +++ b/arch/x86/kernel/traps.c @@ -39,6 +39,7 @@ #include #include #include +#include #include #include @@ -59,6 +60,7 @@ #include #include #include +#include #ifdef CONFIG_X86_64 #include @@ -70,6 +72,8 @@ #include #endif +DEFINE_PER_CPU_READ_MOSTLY(int, doing_emergency_vmxoff) = 0; + DECLARE_BITMAP(system_vectors, NR_VECTORS); static inline void cond_local_irq_enable(struct pt_regs *regs) @@ -115,6 +119,43 @@ int fixup_bu
Re: [linux-kernel] Re: [PATCH] x86: use explicit timing delay for pit accesses in kernel and pcspkr driver
Actually, disparaging things as "one idiotic system" doesn't seem like a long-term thoughtful process - it's not even accurate. There are more such systems that are running code today than the total number of 486 systems ever manufactured. The production rate is $1M/month. a) ENE chips are "documented" to receive port 80, and also it is the case that modern chipsets will happily diagnose writes to non-existent ports as MCE's. Using side effects that depend on non-existent ports just creates a brittle failure mode down the road. And it's not just post ACPI "initialization". The pcspkr use of port 80 caused solid freezes if you typed "tab" to complete a command line and there were more than one choice, leading to beeps. b) sad to say, Linux is not what hardware vendors use as the system that their BIOSes MUST work with. That's Windows, and Windows, whether we like it or not does not require hardware vendors to stay away from port 80. IMHO, calling something "idiotic" is hardly evidence-based decision making. Maybe you love to hate Microsoft, but until Intel writes an architecture standard that says explicitly that a "standard PC" must not use port 80 for any peripheral, the port 80 thing is folklore, and one that is solely Linux-defined. Rene Herman wrote: On 20-02-08 18:05, H. Peter Anvin wrote: Rene Herman wrote: _Something_ like this would seem to be the only remaining option. It seems fairly unuseful to #ifdef around that switch statement for kernels without support for the earlier families, but if you insist... "Only remaining option" other than the one we've had all along. Even on the one idiotic set of systems which break, it only breaks post-ACPI intialization, IIRC. Linus vetoed the DMI switch. Rene. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] x86: use explicit timing delay for pit accesses in kernel and pcspkr driver
x86: use explicit timing delay for pit accesses in kernel and pcspkr driver pit accesses in i8253.c and pcspkr driver use outb_p for timing. Fix them to use explicit timing delay for access to PIT, rather than inb_p/outb_p calls, which use insufficiently explicit delays (defaulting to port 80 writes) that can cause freeze problems on some machines, such as Quanta motherboard machines using ENE EC's. Since the pcspkr driver accesses PIT registers directly, it should also use outb_pit, which is inlined, so does not need to be exported. Explicit timing delay is only needed in pcspkr for accesses to the 8253 PIT. Fix pcspkr driver to use the new outb_pit call properly, use named PIT port values rather than hex constants, and drop its use of inb_p and outb_p in accessing port 61h where it has never been needed. Signed-off-by: David P. Reed <[EMAIL PROTECTED]> Index: linux-2.6/drivers/input/misc/pcspkr.c === --- linux-2.6.orig/drivers/input/misc/pcspkr.c +++ linux-2.6/drivers/input/misc/pcspkr.c @@ -36,6 +36,7 @@ static int pcspkr_event(struct input_dev { unsigned int count = 0; unsigned long flags; + unsigned char port61; if (type != EV_SND) return -1; @@ -51,17 +52,18 @@ static int pcspkr_event(struct input_dev spin_lock_irqsave(&i8253_lock, flags); + port61 = inb(0x61); if (count) { /* enable counter 2 */ - outb_p(inb_p(0x61) | 3, 0x61); + outb(port61 | 3, 0x61); /* set command for counter 2, 2 byte write */ - outb_p(0xB6, 0x43); + outb_pit(0xB6, PIT_MODE); /* select desired HZ */ - outb_p(count & 0xff, 0x42); - outb((count >> 8) & 0xff, 0x42); + outb_pit(count & 0xff, PIT_CH2); + outb((count >> 8) & 0xff, PIT_CH2); } else { /* disable counter 2 */ - outb(inb_p(0x61) & 0xFC, 0x61); + outb(port61 & 0xFC, 0x61); } spin_unlock_irqrestore(&i8253_lock, flags); Index: linux-2.6/include/asm-x86/i8253.h === --- linux-2.6.orig/include/asm-x86/i8253.h +++ linux-2.6/include/asm-x86/i8253.h @@ -12,7 +12,25 @@ extern struct clock_event_device *global extern void setup_pit_timer(void); 
-#define inb_pitinb_p -#define outb_pit outb_p +/* accesses to PIT registers need careful delays on some platforms. Define + them here in a common place */ +static inline unsigned char inb_pit(unsigned int port) +{ + /* delay for some accesses to PIT on motherboard or in chipset must be + at least one microsecond, but be safe here. */ + unsigned char value = inb(port); + udelay(2); + return value; +} + +static inline void outb_pit(unsigned char value, unsigned int port) +{ + /* delay for some accesses to PIT on motherboard or in chipset must be + at least one microsecond, but be safe here. */ + outb(value, port); + udelay(2); +} + + #endif /* __ASM_I8253_H__ */
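As background for the pcspkr changes above: the value written to PIT channel 2 is a 16-bit divisor of the PIT input clock (1193182 Hz on standard PCs), loaded low byte first, then high byte. A small sketch of that arithmetic (the helper names here are illustrative, not from the driver):

```c
/* PIT input clock in Hz; the standard PC value (assumed here). */
#define PIT_TICK_RATE 1193182u

/* Divisor the pcspkr driver loads into channel 2 for a tone of `hz`. */
static unsigned int pit_count_for_hz(unsigned int hz)
{
    return PIT_TICK_RATE / hz;
}

/* The two bytes written to PIT_CH2, low byte first. */
static unsigned char pit_low(unsigned int count)
{
    return count & 0xff;
}

static unsigned char pit_high(unsigned int count)
{
    return (count >> 8) & 0xff;
}
```

For example, a 440 Hz beep loads the divisor 1193182 / 440 = 2711, split across the two successive writes to channel 2 shown in the diff.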
[PATCH] x86: define outb_pic and inb_pic to stop using outb_p and inb_p
x86: define outb_pic and inb_pic to stop using outb_p and inb_p The delay between io port accesses to the PIC is now defined using outb_pic and inb_pic. This fix provides the next step, using udelay(2) to define the *PIC specific* timing requirements, rather than on bus-oriented timing, which is not well calibrated. Again, the primary reason for fixing this is to use proper delay strategy, and in particular to fix crashes that can result from using port 80 writes on machines that have resources on port 80, such as the ENE chips used by Quanta in laptops it designs and sells to, e.g. HP. Signed-off-by: David P. Reed <[EMAIL PROTECTED]> Index: linux-2.6/include/asm-x86/i8259.h === --- linux-2.6.orig/include/asm-x86/i8259.h +++ linux-2.6/include/asm-x86/i8259.h @@ -29,7 +29,23 @@ extern void enable_8259A_irq(unsigned in extern void disable_8259A_irq(unsigned int irq); extern unsigned int startup_8259A_irq(unsigned int irq); -#define inb_picinb_p -#define outb_pic outb_p +/* the PIC may need a careful delay on some platforms, hence specific calls */ +static inline unsigned char inb_pic(unsigned int port) +{ + /* delay for some accesses to PIC on motherboard or in chipset must be + at least one microsecond, but be safe here. */ + unsigned char value = inb(port); + udelay(2); + return value; +} + +static inline void outb_pic(unsigned char value, unsigned int port) +{ + /* delay for some accesses to PIC on motherboard or in chipset must be + at least one microsecond, but be safe here. */ + outb(value, port); + udelay(2); +} + #endif /* __ASM_I8259_H__ */
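The pattern in the patch above is simple: pair every slow-device port access with a bounded, explicit delay instead of relying on a side-effect write to port 0x80. It can be modeled in userspace with stubs standing in for the kernel's inb/outb/udelay (all names here are hypothetical stand-ins, for illustration only):

```c
/* Stubs standing in for real port I/O and udelay. */
static unsigned char fake_port_value = 0x42;
static int delays_issued;

static unsigned char stub_inb(unsigned int port)
{
    (void)port;
    return fake_port_value;
}

static void stub_outb(unsigned char value, unsigned int port)
{
    (void)port;
    fake_port_value = value;
}

static void stub_udelay(unsigned int usec)
{
    (void)usec;
    delays_issued++;            /* count delays so the pattern is visible */
}

/* The pattern from the patch: each PIC access is followed by a fixed,
 * device-specific delay rather than a write to port 0x80. */
static unsigned char inb_pic_model(unsigned int port)
{
    unsigned char value = stub_inb(port);
    stub_udelay(2);
    return value;
}

static void outb_pic_model(unsigned char value, unsigned int port)
{
    stub_outb(value, port);
    stub_udelay(2);
}
```

The design choice is that the delay documents a requirement of the *device* (PIC, PIT, RTC), so it lives with the device-specific accessor rather than in the generic inb_p/outb_p bus primitives.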
Re: [linux-kernel] Re: [patch 1/2] x86: define outb_pic and inb_pic to stop using outb_p and inb_p
Alan Cox wrote: +unsigned char inb_pic(unsigned int port) +{ + /* delay for some accesses to PIC on motherboard or in chipset must be + at least one microsecond, but be safe here. */ + unsigned char value = inb(port); + udelay(2); + return value; +} inline it. Its almost no instructions Will do. Assume you desire inlining of the outb_pic, and also the inb_pit and outb_pit routines. Didn't do it because the code is slightly bigger than the call.
[patch] x86: revised - use explicit timing delay for pit accesses
x86: revised - use explicit timing delay for pit accesses pit accesses in i8253.c and pcspkr driver use outb_p for timing. Fix them to use explicit timing delay for access to PIT, rather than inb_p/outb_p calls, which use insufficiently explicit delays (defaulting to port 80 writes) that can cause freeze problems on some machines, such as Quanta motherboard machines using ENE EC's. Since the pcspkr driver accesses PIT registers directly, it needs the symbol outb_pit exported, so it can be built as a module. Explicit timing delay is only needed in pcspkr for accesses to the 8253 PIT. Fix pcspkr driver to use the new outb_pit call properly, use named PIT port values rather than hex constants, and drop its use of inb_p and outb_p in accessing port 61h where it has never been needed. Signed-off-by: David P. Reed <[EMAIL PROTECTED]> Index: linux-2.6/drivers/input/misc/pcspkr.c === --- linux-2.6.orig/drivers/input/misc/pcspkr.c +++ linux-2.6/drivers/input/misc/pcspkr.c @@ -36,6 +36,7 @@ static int pcspkr_event(struct input_dev { unsigned int count = 0; unsigned long flags; + unsigned char port61; if (type != EV_SND) return -1; @@ -51,17 +52,18 @@ static int pcspkr_event(struct input_dev spin_lock_irqsave(&i8253_lock, flags); + port61 = inb(0x61); if (count) { /* enable counter 2 */ - outb_p(inb_p(0x61) | 3, 0x61); + outb(port61 | 3, 0x61); /* set command for counter 2, 2 byte write */ - outb_p(0xB6, 0x43); + outb_pit(0xB6, PIT_MODE); /* select desired HZ */ - outb_p(count & 0xff, 0x42); - outb((count >> 8) & 0xff, 0x42); + outb_pit(count & 0xff, PIT_CH2); + outb((count >> 8) & 0xff, PIT_CH2); } else { /* disable counter 2 */ - outb(inb_p(0x61) & 0xFC, 0x61); + outb(port61 & 0xFC, 0x61); } spin_unlock_irqrestore(&i8253_lock, flags); Index: linux-2.6/arch/x86/kernel/i8253.c === --- linux-2.6.orig/arch/x86/kernel/i8253.c +++ linux-2.6/arch/x86/kernel/i8253.c @@ -31,6 +31,29 @@ static inline void pit_disable_clocksour struct clock_event_device *global_clock_event; /* + * 
define the PIT specific port access routines, which define the timing + * needed by the PIT registers on some platforms. + */ +unsigned char inb_pit(unsigned int port) +{ + /* delay for some accesses to PIT on motherboard or in chipset must be + at least one microsecond, but be safe here. */ + unsigned char value = inb(port); + udelay(2); + return value; +} +EXPORT_SYMBOL(inb_pit); + +void outb_pit(unsigned char value, unsigned int port) +{ + /* delay for some accesses to PIT on motherboard or in chipset must be + at least one microsecond, but be safe here. */ + outb(value, port); + udelay(2); +} +EXPORT_SYMBOL(outb_pit); + +/* * Initialize the PIT timer. * * This is also called after resume to bring the PIT into operation again. Index: linux-2.6/include/asm-x86/i8253.h === --- linux-2.6.orig/include/asm-x86/i8253.h +++ linux-2.6/include/asm-x86/i8253.h @@ -12,7 +12,9 @@ extern struct clock_event_device *global extern void setup_pit_timer(void); -#define inb_pitinb_p -#define outb_pit outb_p +/* accesses to PIT registers need careful delays on some platforms. Define + them here in a common place */ +extern unsigned char inb_pit(unsigned int port); +extern void outb_pit(unsigned char value, unsigned int port); #endif /* __ASM_I8253_H__ */
Re: [linux-kernel] [patch 2/2] x86: use explicit timing delay for pit accesses
Oops. The patch I just submitted for i8253.c didn't export the symbol needed by the pcspkr driver to build it as a module. I will send the revised patch shortly.
[patch 2/2] x86: use explicit timing delay for pit accesses
pit accesses in i8253.c and pcspkr driver use outb_p for timing. Fix them to use explicit timing delay for access to PIT, rather than inb_p/outb_p calls, which use insufficiently explicit delays (defaulting to port 80 writes) that can cause freeze problems on some machines, such as Quanta motherboard machines using ENE EC's. The explicit timing delay is only needed in pcspkr for accesses to the 8253 PIT. Fix pcspkr driver to use the new outb_pit call properly, use named port values rather than hex constants, and drop its use of inb_p and outb_p in accessing port 61h where it has never been needed. Signed-off-by: David P. Reed <[EMAIL PROTECTED]> Index: linux-2.6/drivers/input/misc/pcspkr.c === --- linux-2.6.orig/drivers/input/misc/pcspkr.c +++ linux-2.6/drivers/input/misc/pcspkr.c @@ -36,6 +36,7 @@ static int pcspkr_event(struct input_dev { unsigned int count = 0; unsigned long flags; + unsigned char port61; if (type != EV_SND) return -1; @@ -51,17 +52,18 @@ static int pcspkr_event(struct input_dev spin_lock_irqsave(&i8253_lock, flags); + port61 = inb(0x61); if (count) { /* enable counter 2 */ - outb_p(inb_p(0x61) | 3, 0x61); + outb(port61 | 3, 0x61); /* set command for counter 2, 2 byte write */ - outb_p(0xB6, 0x43); + outb_pit(0xB6, PIT_MODE); /* select desired HZ */ - outb_p(count & 0xff, 0x42); - outb((count >> 8) & 0xff, 0x42); + outb_pit(count & 0xff, PIT_CH2); + outb((count >> 8) & 0xff, PIT_CH2); } else { /* disable counter 2 */ - outb(inb_p(0x61) & 0xFC, 0x61); + outb(port61 & 0xFC, 0x61); } spin_unlock_irqrestore(&i8253_lock, flags); Index: linux-2.6/arch/x86/kernel/i8253.c === --- linux-2.6.orig/arch/x86/kernel/i8253.c +++ linux-2.6/arch/x86/kernel/i8253.c @@ -31,6 +31,27 @@ static inline void pit_disable_clocksour struct clock_event_device *global_clock_event; /* + * define the PIT specific port access routines, which define the timing + * needed by the PIT registers on some platforms. 
+ */ +unsigned char inb_pit(unsigned int port) +{ + /* delay for some accesses to PIT on motherboard or in chipset must be + at least one microsecond, but be safe here. */ + unsigned char value = inb(port); + udelay(2); + return value; +} + +void outb_pit(unsigned char value, unsigned int port) +{ + /* delay for some accesses to PIT on motherboard or in chipset must be + at least one microsecond, but be safe here. */ + outb(value, port); + udelay(2); +} + +/* * Initialize the PIT timer. * * This is also called after resume to bring the PIT into operation again. Index: linux-2.6/include/asm-x86/i8253.h === --- linux-2.6.orig/include/asm-x86/i8253.h +++ linux-2.6/include/asm-x86/i8253.h @@ -12,7 +12,9 @@ extern struct clock_event_device *global extern void setup_pit_timer(void); -#define inb_pitinb_p -#define outb_pit outb_p +/* accesses to PIT registers need careful delays on some platforms. Define + them here in a common place */ +extern unsigned char inb_pit(unsigned int port); +extern void outb_pit(unsigned char value, unsigned int port); #endif /* __ASM_I8253_H__ */
[patch 1/2] x86: define outb_pic and inb_pic to stop using outb_p and inb_p
The delay between io port accesses to the PIC is now defined using outb_pic and inb_pic. This fix provides the next step, using udelay(2) to define the *PIC specific* timing requirements, rather than on bus-oriented timing, which is not well calibrated. Again, the primary reason for fixing this is to use proper delay strategy, and in particular to fix crashes that can result from using port 80 writes on machines that have resources on port 80, such as the ENE chips used by Quanta in laptops it designs and sells to, e.g. HP. Signed-off-by: David P. Reed <[EMAIL PROTECTED]> Index: linux-2.6/arch/x86/kernel/i8259_32.c === --- linux-2.6.orig/arch/x86/kernel/i8259_32.c +++ linux-2.6/arch/x86/kernel/i8259_32.c @@ -277,6 +277,23 @@ static int __init i8259A_init_sysfs(void device_initcall(i8259A_init_sysfs); +unsigned char inb_pic(unsigned int port) +{ + /* delay for some accesses to PIC on motherboard or in chipset must be + at least one microsecond, but be safe here. */ + unsigned char value = inb(port); + udelay(2); + return value; +} + +void outb_pic(unsigned char value, unsigned int port) +{ + /* delay for some accesses to PIC on motherboard or in chipset must be + at least one microsecond, but be safe here. */ + outb(value, port); + udelay(2); +} + void init_8259A(int auto_eoi) { unsigned long flags; Index: linux-2.6/arch/x86/kernel/i8259_64.c === --- linux-2.6.orig/arch/x86/kernel/i8259_64.c +++ linux-2.6/arch/x86/kernel/i8259_64.c @@ -347,6 +347,23 @@ static int __init i8259A_init_sysfs(void device_initcall(i8259A_init_sysfs); +unsigned char inb_pic(unsigned int port) +{ + /* delay for some accesses to PIC on motherboard or in chipset must be + at least one microsecond, but be safe here. */ + unsigned char value = inb(port); + udelay(2); + return value; +} + +void outb_pic(unsigned char value, unsigned int port) +{ + /* delay for some accesses to PIC on motherboard or in chipset must be + at least one microsecond, but be safe here. 
*/ + outb(value, port); + udelay(2); +} + void init_8259A(int auto_eoi) { unsigned long flags; Index: linux-2.6/include/asm-x86/i8259.h === --- linux-2.6.orig/include/asm-x86/i8259.h +++ linux-2.6/include/asm-x86/i8259.h @@ -28,8 +28,8 @@ extern void init_8259A(int auto_eoi); extern void enable_8259A_irq(unsigned int irq); extern void disable_8259A_irq(unsigned int irq); extern unsigned int startup_8259A_irq(unsigned int irq); - -#define inb_picinb_p -#define outb_pic outb_p +/* the PIC may need a careful delay on some platforms, hence specific calls */ +extern unsigned char inb_pic(unsigned int port); +extern void outb_pic(unsigned char value, unsigned int port); #endif /* __ASM_I8259_H__ */ -- -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[patch 0/2] replacement submission for motherboard/chipset iodelay fixes
Here are the two revised patches, based on Alan Cox's NAKs and suggestions regarding using the _pic and _pit versions of inb/outb. The new patches use udelay(2) as a conservative delay for the PIC and PIT, and isolate that usage in the respective i8253.c and i8259_*.c files. Together with the already-acked patch for the CMOS RTC (not included here), these should solve the problem on modern machines, the ones that don't use older devices but only motherboard/chipset resources.
Re: [linux-kernel] Re: [PATCH 1/3] x86: fix init_8259A() to not use outb_pic
Rene Herman wrote:

On 17-02-08 23:25, Alan Cox wrote:

On Sun, 17 Feb 2008 16:56:28 -0500 (EST) "David P. Reed" <[EMAIL PROTECTED]> wrote: fix init_8259A(), which initializes the 8259 PIC, to not use outb_pic, which is a renamed version of outb_p, and delete the outb_pic define.

NAK. The entire point of inb_pic/outb_pic is to isolate the various methods and keep the logic for delays in one place. Undoing this just creates a nasty mess. Quite probably inb_pic/outb_pic will end up as static inlines that do inb or outb with a udelay of 1 or 2, but that is where the knowledge belongs.

Additional NAK insofar as the PIC delays were reported to be necessary with some VIA chipsets earlier in these threads.

Rene.

This not being a place where performance matters, I will submit a new patch that changes inb_pic and outb_pic to use udelay(2). However, note that init_8259A does not use these consistently in its own accesses to the PIC registers. Should I change it to use the _pic calls wherever it touches the PIC registers, to be conservative? Note that there is a udelay(100) after the registers are all set up; perhaps this is the real VIA requirement...
[PATCH 3/3] x86: fix pcspkr to not use inb_p/outb_p calls.
Fix the pcspkr driver to use an explicit timing delay for access to the PIT, rather than inb_p/outb_p calls, which use insufficiently explicit delays (defaulting to port 80 writes) that can cause freeze problems on some machines, such as Quanta motherboard machines using ENE ECs. The explicit timing delay is only needed for accesses to the 8253 PIT. The standard requirement for the 8253 to respond to successive writes is 1 microsecond. The 8253 has never been on the expansion bus, so a proper delay has nothing to do with expansion bus timing, but rather with its internal logic's capability to react to input. Since udelay is correctly calibrated by the time the pcspkr driver is initialized, we use 1 microsecond as the timing. Also shorten lines to less than 80 characters.

Signed-off-by: David P. Reed <[EMAIL PROTECTED]>

Index: linux-2.6/drivers/input/misc/pcspkr.c
===
--- linux-2.6.orig/drivers/input/misc/pcspkr.c
+++ linux-2.6/drivers/input/misc/pcspkr.c
@@ -32,9 +32,11 @@ MODULE_ALIAS("platform:pcspkr");
 static DEFINE_SPINLOCK(i8253_lock);
 #endif

-static int pcspkr_event(struct input_dev *dev, unsigned int type, unsigned int code, int value)
+static int pcspkr_event(struct input_dev *dev, unsigned int type,
+			unsigned int code, int value)
 {
 	unsigned int count = 0;
+	unsigned char mask;
 	unsigned long flags;

 	if (type != EV_SND)
@@ -51,17 +53,21 @@ static int pcspkr_event(struct input_dev

 	spin_lock_irqsave(&i8253_lock, flags);

+	mask = inb(0x61);
 	if (count) {
 		/* enable counter 2 */
-		outb_p(inb_p(0x61) | 3, 0x61);
+		outb(mask | 3, 0x61);
+		/* some 8253's may require 1 usec. between accesses */
 		/* set command for counter 2, 2 byte write */
-		outb_p(0xB6, 0x43);
+		outb(0xB6, 0x43);
+		udelay(1);
 		/* select desired HZ */
-		outb_p(count & 0xff, 0x42);
+		outb(count & 0xff, 0x42);
+		udelay(1);
 		outb((count >> 8) & 0xff, 0x42);
 	} else {
 		/* disable counter 2 */
-		outb(inb_p(0x61) & 0xFC, 0x61);
+		outb(mask & 0xFC, 0x61);
 	}

 	spin_unlock_irqrestore(&i8253_lock, flags);
[PATCH 1/3] x86: fix init_8259A() to not use outb_pic
fix init_8259A(), which initializes the 8259 PIC, to not use outb_pic, which is a renamed version of outb_p, and delete the outb_pic define. There is already code in the .c files that accesses the CMD & IMR registers in successive outb() calls without _p. Thus the outb_p is obviously not needed, if it ever was. Research into chipset documentation and old BIOS listings shows that IODELAY was not used even in early machines. Thus the delay between I/O port writes was deleted for the 8259. Again, the primary reason for fixing this is to use a proper delay strategy, and in particular to fix crashes that can result from port 80 writes on machines that have resources on port 80, such as the ENE chips used by Quanta in laptops it designs and sells to, e.g., HP.

Signed-off-by: David P. Reed <[EMAIL PROTECTED]>

Index: linux-2.6/arch/x86/kernel/i8259_32.c
===
--- linux-2.6.orig/arch/x86/kernel/i8259_32.c
+++ linux-2.6/arch/x86/kernel/i8259_32.c
@@ -285,24 +285,30 @@ void init_8259A(int auto_eoi)

 	spin_lock_irqsave(&i8259A_lock, flags);

-	outb(0xff, PIC_MASTER_IMR);	/* mask all of 8259A-1 */
-	outb(0xff, PIC_SLAVE_IMR);	/* mask all of 8259A-2 */
-
-	/*
-	 * outb_pic - this has to work on a wide range of PC hardware.
-	 */
-	outb_pic(0x11, PIC_MASTER_CMD);	/* ICW1: select 8259A-1 init */
-	outb_pic(0x20 + 0, PIC_MASTER_IMR);	/* ICW2: 8259A-1 IR0-7 mapped to 0x20-0x27 */
-	outb_pic(1U << PIC_CASCADE_IR, PIC_MASTER_IMR);	/* 8259A-1 (the master) has a slave on IR2 */
+	/* mask all of 8259A-1 */
+	outb(0xff, PIC_MASTER_IMR);
+	/* mask all of 8259A-2 */
+	outb(0xff, PIC_SLAVE_IMR);
+
+	/* ICW1: select 8259A-1 init */
+	outb(0x11, PIC_MASTER_CMD);
+	/* ICW2: 8259A-1 IR0-7 mapped to 0x20-0x27 */
+	outb(0x20 + 0, PIC_MASTER_IMR);
+	/* 8259A-1 (the master) has a slave on IR2 */
+	outb(1U << PIC_CASCADE_IR, PIC_MASTER_IMR);

 	if (auto_eoi)	/* master does Auto EOI */
-		outb_pic(MASTER_ICW4_DEFAULT | PIC_ICW4_AEOI, PIC_MASTER_IMR);
+		outb(MASTER_ICW4_DEFAULT | PIC_ICW4_AEOI, PIC_MASTER_IMR);
 	else		/* master expects normal EOI */
-		outb_pic(MASTER_ICW4_DEFAULT, PIC_MASTER_IMR);
+		outb(MASTER_ICW4_DEFAULT, PIC_MASTER_IMR);

-	outb_pic(0x11, PIC_SLAVE_CMD);	/* ICW1: select 8259A-2 init */
-	outb_pic(0x20 + 8, PIC_SLAVE_IMR);	/* ICW2: 8259A-2 IR0-7 mapped to 0x28-0x2f */
-	outb_pic(PIC_CASCADE_IR, PIC_SLAVE_IMR);	/* 8259A-2 is a slave on master's IR2 */
-	outb_pic(SLAVE_ICW4_DEFAULT, PIC_SLAVE_IMR);	/* (slave's support for AEOI in flat mode is to be investigated) */
+	/* ICW1: select 8259A-2 init */
+	outb(0x11, PIC_SLAVE_CMD);
+	/* ICW2: 8259A-2 IR0-7 mapped to 0x28-0x2f */
+	outb(0x20 + 8, PIC_SLAVE_IMR);
+	/* 8259A-2 is a slave on master's IR2 */
+	outb(PIC_CASCADE_IR, PIC_SLAVE_IMR);
+	/* (slave's support for AEOI in flat mode is to be investigated) */
+	outb(SLAVE_ICW4_DEFAULT, PIC_SLAVE_IMR);

 	if (auto_eoi)
 		/*
 		 * In AEOI mode we just have to mask the interrupt

Index: linux-2.6/arch/x86/kernel/i8259_64.c
===
--- linux-2.6.orig/arch/x86/kernel/i8259_64.c
+++ linux-2.6/arch/x86/kernel/i8259_64.c
@@ -355,29 +355,30 @@ void init_8259A(int auto_eoi)

 	spin_lock_irqsave(&i8259A_lock, flags);

-	outb(0xff, PIC_MASTER_IMR);	/* mask all of 8259A-1 */
-	outb(0xff, PIC_SLAVE_IMR);	/* mask all of 8259A-2 */
+	/* mask all of 8259A-1 */
+	outb(0xff, PIC_MASTER_IMR);
+	/* mask all of 8259A-2 */
+	outb(0xff, PIC_SLAVE_IMR);

-	/*
-	 * outb_pic - this has to work on a wide range of PC hardware.
-	 */
-	outb_pic(0x11, PIC_MASTER_CMD);	/* ICW1: select 8259A-1 init */
+	/* ICW1: select 8259A-1 init */
+	outb(0x11, PIC_MASTER_CMD);
 	/* ICW2: 8259A-1 IR0-7 mapped to 0x30-0x37 */
-	outb_pic(IRQ0_VECTOR, PIC_MASTER_IMR);
+	outb(IRQ0_VECTOR, PIC_MASTER_IMR);
 	/* 8259A-1 (the master) has a slave on IR2 */
-	outb_pic(0x04, PIC_MASTER_IMR);
+	outb(0x04, PIC_MASTER_IMR);

 	if (auto_eoi)	/* master does Auto EOI */
-		outb_pic(MASTER_ICW4_DEFAULT | PIC_ICW4_AEOI, PIC_MASTER_IMR);
+		outb(MASTER_ICW4_DEFAULT | PIC_ICW4_AEOI, PIC_MASTER_IMR);
 	else		/* master expects normal EOI */
-		outb_pic(MASTER_ICW4_DEFAULT, PIC_MASTER_IMR);
+		outb(MASTER_ICW4_DEFAULT, PIC_MASTER_IMR);

-	outb_pic(0x11, PIC_SLAVE_CMD);	/* ICW1: select 8259A-2 init */
+	/* ICW1: select 8259A-2 init */
+	outb(0x11, PIC_SLAVE_CMD);
 	/* ICW2: 825
[PATCH 2/3] x86: fix cmos read and write to not use inb_p and outb_p
fix the code that accesses CMOS RTC registers so that it does not use the inb_p and outb_p routines, which are deprecated. Extensive research on all known CMOS RTC chipset timing shows that there is no need for a delay in accessing the registers of these chips, even on old machines. These chips are never on an expansion bus, but have always been "motherboard" resources, either in the processor chipset or explicitly on the motherboard; they are not part of the ISA/LPC or PCI buses, so delays should not be based on bus timing. The reasons to fix it: 1) port 80 writes often hang some laptops that use ENE EC chipsets, especially those designed and manufactured by Quanta for HP; 2) RTC accesses are timing sensitive, and extra microseconds may matter; 3) the new "io_delay" function is calibrated by expansion bus timing needs, and thus is not appropriate for access to CMOS RTC registers.

Signed-off-by: David P. Reed <[EMAIL PROTECTED]>

Index: linux-2.6/arch/x86/kernel/rtc.c
===
--- linux-2.6.orig/arch/x86/kernel/rtc.c
+++ linux-2.6/arch/x86/kernel/rtc.c
@@ -151,8 +151,8 @@ unsigned char rtc_cmos_read(unsigned cha
 	unsigned char val;

 	lock_cmos_prefix(addr);
-	outb_p(addr, RTC_PORT(0));
-	val = inb_p(RTC_PORT(1));
+	outb(addr, RTC_PORT(0));
+	val = inb(RTC_PORT(1));
 	lock_cmos_suffix(addr);
 	return val;
 }
@@ -161,8 +161,8 @@ EXPORT_SYMBOL(rtc_cmos_read);
 void rtc_cmos_write(unsigned char val, unsigned char addr)
 {
 	lock_cmos_prefix(addr);
-	outb_p(addr, RTC_PORT(0));
-	outb_p(val, RTC_PORT(1));
+	outb(addr, RTC_PORT(0));
+	outb(val, RTC_PORT(1));
 	lock_cmos_suffix(addr);
 }
 EXPORT_SYMBOL(rtc_cmos_write);
[PATCH 0/3] x86: cleanup primary motherboard chip port access delays
cleanup of motherboard chip I/O port delays. inb_p and outb_p have traditionally used a write to port 80 (a non-existent port) as a delay. Though there is an argument that that is a good delay for devices on the ISA or PCI expansion buses, it is not a good mechanism for devices in the processor chipset or on the "motherboard". The write to port 80 at best causes an abort on the ISA or LPC bus, and on some machines (like many of the HP laptops manufactured by Quanta) actually writes data to real I/O devices. For example, the ENE Embedded Controller chip family defaults to providing a register at port 80 that can be written, and which can cause an interrupt in the Embedded Controller. This has been shown to cause hangs on some machines, especially when accessing the CMOS RTC during bootup.

This patch series addresses three of the places where these are used in common kernel code - in particular the three uses that affect the HP laptops mentioned above - modifying the delays to match the worst known delay issues for the specific chips. The patch set is complementary to the io_delay= kernel parameter added in 2.6.25, since it means fewer users will need to add that parameter to run Linux "out of the box" without hanging.
Re: [linux-kernel] Re: [PATCH] x86: provide a DMI based port 0x80 I/O delay override.
David Woodhouse wrote:

On Fri, 2008-01-11 at 09:35 -0500, David P. Reed wrote: Using any "unused port" for a delay means that the machine check feature is wasted and utterly unusable.

Not entirely unusable. You can recover silently from the machine check if it was one of the known accesses to the 'unused port'. It certainly achieves a delay :)

I'm sure that's what the driver writers had in mind. ;-) And I think we probably have a great shot at getting Intel, Microsoft, HP, et al. to add a feature for Linux to one of the ACPI table specifications that define an "unused port for delay purposes" field in the ACPI 4.0 spec, and retrofit it into PC/104 machine BIOSes. At least Microsoft doesn't have a patent on using port 80 for delay purposes. :-)
Re: [linux-kernel] Re: [PATCH] x86: provide a DMI based port 0x80 I/O delay override.
Alan Cox wrote:

bus abort on the LPC bus". More problematic is that I would think some people might want to turn on the AMD feature that generates machine checks if a bus timeout happens. The whole point of machine checks is

An ISA/LPC bus timeout is fulfilled by the bridge so doesn't cause an MCE.

Good possibility, but the documentation on HyperTransport suggests otherwise, even for LPC bridges in this particular modern world of AMD64. I might do the experiment someday to see if my LPC bridge is implemented in a way that does or doesn't support enabling MCEs. Could be different between Intel and AMD - I haven't had reason to pore over the Intel chipset specs, since my poking into all this stuff has been driven by my personal machine's issues, and it's not got any Intel compatible parts.
Re: [linux-kernel] Re: [PATCH] x86: provide a DMI based port 0x80 I/O delay override.
Rene Herman wrote:

On 11-01-08 02:36, Zachary Amsden wrote: FWIW, I fixed the problem locally by recompiling, changing port 80 to port 84 in io.h; works great, and doesn't conflict with any occupied ports.

Might not give you a "proper" delay though. 0xed should be a better choice...

I don't think there is any magic here. I modified the patch to do *no delay at all* in the io_delay "quirk" and have been running reliably for weeks, including the very heavy I/O load that comes from using software RAID on this nice laptop that has two separate SATA drives! This particular laptop has no problematic devices - the only problem is actually in the CMOS_READ and CMOS_WRITE macros that *use* the _p operations in a way that is unnecessary on this machine. (In fact, it would be hard to add a problematic device - there's no PCMCIA slot either, so every option is USB or FireWire.)

Using 0xED happens to work, but it's not guaranteed to work either. There is no "standard" for an "unused port that is mapped to cause a bus abort on the LPC bus". More problematic is that I would think some people might want to turn on the AMD feature that generates machine checks if a bus timeout happens. The whole point of machine checks is to allow the machine to be more reliable. Using any "unused port" for a delay means that the machine check feature is wasted and utterly unusable.
Re: [linux-kernel] Re: [PATCH] x86: provide a DMI based port 0x80 I/O delay override.
Rene Herman wrote:

On 10-01-08 01:37, Robert Hancock wrote: I agree. In this case the BIOS on these laptops is trying to tell us "port 80 is used for our purposes, do not touch it". We should be listening.

Listening is fine but what are you going to do after you have listened? Right, not use port 0x80, but since that's the current idea anyway outside of legacy drivers, you don't actually need to listen. If the quirk-to-0xed or similar was to stay, it's a much better switching point than DMI strings, but if not, it's not actually important.

Well, I was just suggesting a warning that would come up when a driver that still used port 80 was initialized... I think that is what Alan Cox and others suggest for legacy drivers that have worked - I agree that it may not be the right thing to screw them up, especially since my laptop, and probably most machines that might start using port 80 or other "legacy ports", won't ever need those drivers.

I thought more about a complete solution last night. A clean idea that really fits the Linux design might be the following outline of a patch. I suspect it might seem far less ugly, and probably would meet Alan Cox's needs, too - I am very sympathetic about not breaking 8390's, etc.

- Define a "motherboard resources" driver that claims all the resources defined for PNP0C02 devices during the pnp process. I think Windoze actually does something quite similar. This would claim port 80.
- Define an iodelay driver. This driver exists largely to claim port 80 for the iodelay operation (you could even define an option for other ports). Legacy drivers would be modified to require iodelay. The iodelay driver would set up the iodelay mechanism to do something other than the "boot time" default - which could be no delay, or udelay. It would also set a flag that says "_p operations are safe".
- Put a WARN_ONCE() in the in/out*_p operations that checks the flag that is set by the iodelay driver. This would only trigger in the case that 80 or whatever was reserved by some other device driver - such as the motherboard resources driver above. Modern machines won't do that.
- Finally, anything that happens before the motherboard resources and iodelay drivers are initialized cannot use in*_p or out*_p (whether they can be loadable modules rather than built in is a question). This is a very small set, and I believe, with the exception of the PIT (8253/4), very safe.

Note that this idea is also compatible with rewriting all drivers to use "iodelay()" explicitly instead of _p(). But it doesn't require that.

Rene.
Re: [linux-kernel] Re: [PATCH] x86: provide a DMI based port 0x80 I/O delay override.
Zachary Amsden wrote:

According to Phoenix Technologies' book "System BIOS for IBM PCs, Compatibles and EISA Computers, 2nd Edition", the I/O port list gives: port 0080h R/W Extra page register (temporary storage). Despite looking, I've never seen it documented anywhere else, but I believe it works on just about every PC platform.

Except, apparently, my laptop. The port 80 problem was discovered by me, after months of "bisecting" the running code around a problem with hanging when using hwclock in 64-bit mode when ACPI is on. So it kills my laptop, too, and many current laptop motherboards designed by Quanta for HP and Compaq (dv6000, dv9000, tx1000, apparently).

In the last couple of weeks, I was able with luck to discover that the problem is the ENE KB3920 chip, which is the "big brother" of the KB3700 chip included in the OLPC XO "$100 laptop", made also by Quanta. I verified this by taking my laptop apart - a fun and risky experience. Didn't break any connectors, but I don't recommend it for those who are not experienced disassembling laptops and cellphones, etc.

The KB3920 contains an EC, an SMBus, a KBC, some watchdog timers, and a variety of other functions that keep the laptop going, coordinating the relationships among various peripherals. The firmware is part standard from ENE, part OEM-specific, in this case coded by Quanta or a BIOS subcontractor. You can read the spec sheet for the KB3700 online at laptop.org, since the specs of the laptop are "open". The 3920's spec is confidential, and the firmware is confidential as well, for both the 3700 and the 3920. Clues as to what it does can be gleaned by reading the disassembler output of the DSDT code in the particular laptops - though the SMM BIOS probably also talks to it.

Modern machines have many subsystems, and the ACPI and SMBIOS coordinate to run them; blade servers also have drawer controllers and backplane management buses. The part that runs Linux is only part of the machine. Your laptop isn't an aberration. It's part of the new generation of evolved machines that take advantage of the capabilities of the ACPI, SMBIOS, and DMI standards that are becoming core parts of the market.
Re: [linux-kernel] Re: [PATCH] x86: provide a DMI based port 0x80 I/O delay override.
Christer Weinigel wrote: Did I miss anything?

Nothing that seems *crucial* going forward for Linux. The fate of "legacy machines" is really important to get right. I have a small suggestion in mind that might be helpful in the future: the "motherboard resources" discovered as PNP0C02 devices in their _CRS settings during ACPI PnP startup should be reserved (or checked), and any drivers that still use port 80 implicitly should reserve that port.

This may be too late in the boot process to make a decision not to use port 80, and it doesn't help decide a strategy for using an alternate port (0xED happens to "work" on the dv9000 machines, in the sense that it generates a bus timeout on LPC, but there is no guarantee that 0xED is free on any particular motherboard, and "unusedness" is not declared in any BIOS/ACPI tables), or for using a udelay-based iodelay (but there is nothing in the BIOS tables that suggests the right delays for various I/O ports, if any modern parts need them... which I question, but can't prove a negative in general).

However, doing the reservations on such resources could generate a warning that would help diagnose new current and future designs, including devices like the ENE KB3920 that have a port defaulted to port 80 and routed to the EC for functions that the firmware and ACPI can agree to do. Or any other ports used in new ways and properly notified to the OS via the now-standard Wintel BIOS functions.

I don't know if /proc/ioports is being maintained, but the fact that it doesn't contain all of those PNP0C02 resources known on my machine seems to be a bug - which I am happy to code a patch or two for as a contribution back to Linux, if it isn't on the way out as the /sys hierarchy does a better job. /sys/bus/pnp/... does get built properly and has port 80 described properly - not as a DMA port, but as a port in use by device 05:00, which is the motherboard resource catchall. Thus the patch would be small.
Re: [linux-kernel] Re: [PATCH] x86: provide a DMI based port 0x80 I/O delay override.
Christer Weinigel wrote: Argument by personal authority. That's good.

There is no other kind of argument. Are you claiming supernatural authority drives your typing fingers, or is your argument based on what you think you know? I have piles of code that I wrote, spec sheets (now that I'm back in my home office), code that others wrote at the time, and documentation from vendors, all of which come from my personal experiences. That doesn't mean I'm always right - I'm always happy to learn something new. Just don't condescend to a 55-year-old who has been writing operating systems, compilers, and designing hardware for almost 40 years professionally (yes, I got my first job at 16, writing FORTRAN code to simulate hydrodynamic systems).

I guess that's why you don't seem to understand the difference between reading the serial port status register and not being allowed to access a register at all due to such things as the 4-cycle delay you quoted yourself from the 8390 data sheet,

If you read what I said carefully, I said that the 8390 was a very special case. The "chip select" problem it experienced was pretty much unique among boards of the time. Those of us who looked at its design and had any experience designing hardware for buses like the Unibus, or even the buses on PDP-8's and DG machines, thought it had to be a joke. Of course it saved money per board, so it beat the 3Com boards on price - and you could program it after a fashion. So it involved "cheaping out".

The normal timing problem was that an out or in operation to a board or chip required some time to elapse before the chip performed the side effects internally, so that the next operation to it would have an effect. This is exactly why most chips and boards are designed to have a pollable flag that indicates operation completion. The serial "buffer empty" flag is the simplest possible explanatory example of such handshaking that came to mind (writing a character to a serial output device twice often leads to surprises, unless you wait for the previous character to clock out). See my comment on the RTC below for an example that is more complex to explain.

and similar issues with the i8253 that I quoted from its data sheet a few posts ago.

The 8253 was a motherboard chip. I am not sure it had any timing problems with its electrical signalling; I just don't remember. The spec sheet doesn't say its internal state can get scrambled. I was thinking of another timer, the RTC, which is usually part of the Super I/O. The RTC has very well documented timing requirements. But none of the spec sheets, nor my experience with it, mention electrical issues that prevented back-to-back port operations. The documented timing requirements have to do with the state during the time it ticks over internally, once per second. But it is carefully designed to have a flag that is "on" during the 244 microseconds prior to, and covering, the time it is unsafe to read the registers. That design is special because it is designed to operate when the machine is powered off, so it has two internal clock domains, one of which is used in "low power" mode and is very slow, to minimize power.
Re: [linux-kernel] Re: [PATCH] x86: provide a DMI based port 0x80 I/O delay override.
Christer Weinigel wrote: There is no need to use io writes to supposedly/theoretically "unused ports" to make drivers work on any bus. ISA included! You can, for example, wait for an ISA bus serial adapter to put out its next character by looping reading the port that has the output-buffer-full flag in a tight loop, with no delay code at all. And if you need to time things, just call a timing loop subroutine that you calibrate at boot time.

Now you're totally confusing things. You're talking about looking at bits in a register to see if a transmit register is empty. That's easy. The delays needed for the Intel M8259 and M8253 say that you're not even allowed to access the registers _at_ _all_ for some time after a register access. If you do a write to a register immediately followed by any access, including a read of the status register, you can corrupt the state of the chip.

Not true. Even on the original IBM 5150 PC, the 8259 on the motherboard accepted back-to-back OUT and IN instructions, and it would NOT trash the chip state. You can read the original IBM BIOS code if you like. I don't remember about the 8253's timing, but I doubt the chip's state would be corrupted in any way. The data and address lines were the same data and address lines that the microprocessor used to access memory - it didn't "hold" the lines stable any longer than the OUT instruction.

And the Intel chips are not the only ones with that kind of brain damage. But what makes the 8259 and 8253 a big problem is that every modern PC has a descendant of those chips in them. Register compatible.

Not the same chips, or even the same masks or timing requirements.

The discrete Intel chips or clones got aggregated into Super I/O chips, and the Super I/O chips were put on an LPC bus (an ISA bus by another name) or integrated into the southbridge.

Don't try to teach your grandmother to suck eggs: I've been programming PC compatibles since probably before you were able to do long division - including writing code on the first prototype IBM PCs, the first pre-manufacturing PC-ATs, and zillions of clones. (And I was also involved in designing hardware, including the so-called "Lotus Intel" expanded memory cards and the original PC cards.) The 8259 PIC is an *interrupt controller*. It was NEVER present in a Super I/O chip or an LPC chip. Its functionality was absorbed into the chipsets that control interrupt mapping, like the PIIX and the nForce.

And the "if it ain't broken, don't fix it" mantra probably means that some modern chipsets are still using exactly the same internal design as the 25-year-old chips, and will still be subject to some of those ancient limitations.

Oh, come on. Give the VLSI designers some credit for brains. The CAD tools used to design the 8259 and 8253 were so primitive you couldn't even get a chip manufactured with designs from that era today. People design chips today with simulators, and with testers that run test suites that were not available at the time.
Re: [linux-kernel] Re: [PATCH] x86: provide a DMI based port 0x80 I/O delay override.
Alan - I dug up a DP83901A SNIC datasheet in a quick Google search; while that wasn't the only such chip, it was one of them. I can forward the PDF (the www.alldatasheet.com site dynamically creates the download URL), if anyone wants it. The relevant passage says, in regard to delaying between checking the CRDA addresses to see if a dummy "remote read" has been executed, and perhaps in regard to other card I/O register loops:

TIME BETWEEN CHIP SELECTS
The SNIC requires that successive chip selects be no closer than 4 bus clocks (BSCK) together. If the condition is violated, the SNIC may glitch ACK. CPUs that operate from pipelined instructions (i.e. 386) or have a cache (i.e. 486) can execute consecutive I/O cycles very quickly. The solution is to delay the execution of consecutive I/O cycles by either breaking the pipeline or forcing the CPU to access outside its cache.

The NE2000, as I recall, had no special logic on the board to protect the chip from successive chip selects that were too close - which is the reason for the problem. Clearly an out to port 80 takes more than 4 ISA bus clocks, so that works if the NE2000 is on the ISA bus. On the other hand, there are other ways to delay more than 4 ISA bus clocks. And as you say, one needs a delay for this chip that relates to the card's bus's clock speed, not absolute time.

Alan Cox wrote:

As well you should. I am honestly curious (for my own satisfaction) as to what the natsemi docs say the delay code should do (can't imagine they say "use io port 80 because it is unused"). I don't have any

They say you must allow 4 bus clocks for the address decode. They don't deal with the ISA side, as the chip itself has no ISA glue.

copies anymore. But mere curiosity on my part is not worth spending a lot of time on - I know you are super busy. If there's a copy online at a URL ...

Not that I know of. There may be. A good general source of info is Russ Nelson's old DOS packet driver collection.
-- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [linux-kernel] Re: [PATCH] x86: provide a DMI based port 0x80 I/O delay override.
Alan Cox wrote: The natsemi docs here say otherwise. I trust them, not you. As well you should. I am honestly curious (for my own satisfaction) as to what the natsemi docs say the delay code should do (can't imagine they say "use io port 80 because it is unused"). I don't have any copies anymore. But mere curiosity on my part is not worth spending a lot of time on - I know you are super busy. If there's a copy online at a URL ... The problem is that certain people, unfortunately those who know nothing about ISA related bus systems, keep trying to confuse ISA delay logic with core chip logic and end up trying to solve both a problem and a non-problem in one, creating a nasty mess in the process. I agree that the problems of chip logic and ISA delay are all tangled up, probably more than need be. I hope that the solution turns out to simplify matters, and hopefully to document the intention of the resulting code sections a bit more clearly for the future.
Re: [linux-kernel] Re: [PATCH] x86: provide a DMI based port 0x80 I/O delay override.
Ondrej Zary wrote: On Tuesday 08 January 2008 18:24:02 David P. Reed wrote: Windows these days does delays with timing loops or the scheduler. It doesn't use a "port". Also, Windows XP only supports machines that tend not to have timing problems that use delays. Instead, if a device takes a while to respond, it has a "busy bit" in some port or memory slot that can be tested. Windows XP can run on a machine with ISA slot(s) and has built-in drivers for some plug&play ISA cards - e.g. the famous 3Com EtherLink III. I think that there's a driver for NE2000-compatible cards too and it probably works. There is no need to use I/O writes to supposedly/theoretically "unused ports" to make drivers work on any bus. ISA included! You can, for example, wait for an ISA bus serial adapter to put out its next character by looping reading the port that has the output buffer full flag in a tight loop, with no delay code at all. And if you need to time things, just call a timing loop subroutine that you calibrate at boot time. I wrote DOS drivers for NE2000's on the ISA bus when they were brand new designs from Novell without such kludges as writes to I/O port 80. I don't remember writing a driver for the 3com devices - probably didn't, because 3Com's cards were expensive at the time. In any case, Linux *did* adopt this port 80 strategy - I'm sure all concerned thought it was frightfully clever at the time. Linus expressed his skepticism in the comments in io.h. The problem is to safely move away from it toward a proper strategy that doesn't depend on "bus aborts" which would trigger machine checks if they were properly enabled.
Re: [linux-kernel] Re: [PATCH] x86: provide a DMI based port 0x80 I/O delay override.
Windows these days does delays with timing loops or the scheduler. It doesn't use a "port". Also, Windows XP only supports machines that tend not to have timing problems that use delays. Instead, if a device takes a while to respond, it has a "busy bit" in some port or memory slot that can be tested. Almost all of the issues in Linux where _p operations are used are (or should be) historical - IMO. Ondrej Zary wrote: On Tuesday 08 January 2008 02:38:15 David P. Reed wrote: H. Peter Anvin wrote: And shoot the designer of this particular microcontroller firmware. Well, some days I want to shoot the "designer" of the entire Wintel architecture... it's not exactly "designed" by anybody of course, and today it's created largely by a collection of Taiwanese and Chinese ODM firms, coupled with Microsoft WinHEC and Intel folks. At least they follow the rules and their ACPI and BIOS code say that they are using port 80 very clearly if you use PnP and ACPI properly. And in the old days, you were "supposed" to use the system BIOS to talk to things like the PIT that had timing issues, not write your own code. Does anyone know what port does Windows use? I'm pretty sure that it isn't 80h as I run Windows 98 often with port 80h debug card inserted. The last POST code set by BIOS usually remains on the display and only changes when BIOS does something like suspend/resume. IIRC, there was a program that was able to display temperature from onboard sensors on the port 80h display that's integrated on some mainboards.
Re: [linux-kernel] Re: [PATCH] x86: provide a DMI based port 0x80 I/O delay override.
The last time I heard of a 12 MHz bus in a PC system was in the days of the PC-AT, when some clone makers sped up their buses (pre PCI!!!) in an attempt to allow adapter card *memory* to run at the 12 MHz speed. This caused so many industry-wide problems with adapter cards that couldn't be installed in certain machines and still run reliably that the industry learned a lesson. That doesn't mean that LPCs don't run at 12 MHz, but if they do, they don't have old 8 bit punky cards plugged into them for lots of practical reasons. (I have whole drawers full of such old cards, trying to figure out an environmentally responsible way to get rid of them - even third world countries would be fools to make machines with them). I can't believe that we are not supporting today's machines correctly because we are still trying to be compatible with a few (at most a hundred thousand were manufactured! Much less still functioning or running Linux) machines. Now I understand that PC/104 machines and other things are very non PC compatible, but are x86 processor architectures. Do they even run x86 under 2.6.24? Perhaps the rational solution here is to declare x86 the architecture for "relics" and develop a merged architecture called "modern machines" to include only those PCs that have been made to work since, say, the release of (cough) Windows 2000? Bodo Eggert wrote: On Tue, 8 Jan 2008, Rene Herman wrote: On 08-01-08 00:24, H. Peter Anvin wrote: Rene Herman wrote: Is this only about the ones then left for things like legacy PIC and PIT? Does anyone care about just sticking in a udelay(2) (or 1) there as a replacement and call it a day? PIT is problematic because the PIT may be necessary for udelay setup. Yes, can initialise loops_per_jiffy conservatively. Just didn't quite get why you guys are talking about an ISA bus speed parameter. If the ISA bus is below 8 MHz, we might need a longer delay.
If we default to the longer delay, the delay will be too long for more than 99.99% of all systems, not counting i586+. Especially if the driver is fine-tuned to give maximum throughput, this may be bad. OTOH, the DOS drivers I heard about use delays and would break on underclocked ISA buses if the n * ISA_HZ delay was needed. Maybe somebody having a configurable ISA bus speed and some problematic chips can test it ...
Re: [linux-kernel] Re: [PATCH] x86: provide a DMI based port 0x80 I/O delay override.
H. Peter Anvin wrote: And shoot the designer of this particular microcontroller firmware. Well, some days I want to shoot the "designer" of the entire Wintel architecture... it's not exactly "designed" by anybody of course, and today it's created largely by a collection of Taiwanese and Chinese ODM firms, coupled with Microsoft WinHEC and Intel folks. At least they follow the rules and their ACPI and BIOS code say that they are using port 80 very clearly if you use PnP and ACPI properly. And in the old days, you were "supposed" to use the system BIOS to talk to things like the PIT that had timing issues, not write your own code. Or perhaps the ACPI spec should specify a timing loop spec and precisely specify the desired timing after accessing an I/O port till that device has properly "acted" on that operation. The idea that Port 80 was "unused" and appropriate for delay purposes elicited skepticism by Linus that is recorded for posterity in the comments of the relevant Linux include files - especially since it was clearly "used" for non-delay purposes, by cards that could be plugged into a PCI (fast), not just an 8-bit ISA, bus. Perhaps we should declare the world of ACPI systems a separate "arch" from the world of l'ancien regime where folklore about which ports were used for what ruled. I lived through those old days, and they were not wonderful, either. The world sucks, and Linux is supposed to be able to adapt to that world, suckitude and all.
Re: [linux-kernel] Re: [PATCH] x86: provide a DMI based port 0x80 I/O delay override.
On another topic. I have indeed determined what device uses port 80 on Quanta AMD64 laptops from HP. I had lunch with Jim Gettys of OLPC a week ago; he's an old friend since he worked on the original X windows system. After telling him my story about port 80, he mentioned that the OLPC XO machine had some issues with port 80 which was by design handled by the ENE KBC device on its motherboard. He said the ENE was a very desirable chipset for AMD designs recommended by Quanta. Richard Smith of OLPC explained to me how the KB3700 they use works, and that they use the KB3700 to send POST codes out over a serial link during boot up. This gave me a reason to take apart my laptop, to discover that it has an ENE KB3920 B0 as its EC and KBC. The port interface for the KB3920 includes listening to port 80 which is then made available to firmware on the EC. It is recognized and decoded on the LPC bus, only for writes, and optionally can generate an interrupt in the 8051. Dumping both the ENE chip, and looking at the DSDT.dsl for my machine, I discovered that port 80 is used as an additional parameter for various DSDT methods that communicate to the EC, when it is operating in ACPI mode. More work is in progress as I play around with this. But the key thing is that ACPI and perhaps SMM both use port 80 as part of the base function of the chipset. And actually, if I had looked at the /sys/bus/pnp definitions, rather than /proc/ioports, I would have noticed that port 80 was part of a PNP0C02 resource set. That means exactly one thing: ACPI says that port 80 is NOT free to be used, for delays or anything else. This should make no difference here: it's just one more reason to stop using port 80 for delays on modern machines.
Re: [linux-kernel] Re: [PATCH] x86: provide a DMI based port 0x80 I/O delay override.
H. Peter Anvin wrote: Rene Herman wrote: Is this only about the ones then left for things like legacy PIC and PIT? Does anyone care about just sticking in a udelay(2) (or 1) there as a replacement and call it a day? PIT is problematic because the PIT may be necessary for udelay setup. The PIT usage for calibrating the delay loop can be moderated, if need be, by using the PC BIOS, which by definition uses the PIT correctly in its int 15 function 83 call. Just do it before coming up in a state where the PC BIOS int 15h calls no longer work. I gave code to do this in a much earlier message. This is the MOST reliable way to use the PIT early in boot, on a PC compatible. God knows how one should do it on a Macintosh running a 386/20 :-). But the ONLY old bad-PIT machines are, thank god, PC compatible, maybe.
Re: [linux-kernel] Re: [PATCH] x86: provide a DMI based port 0x80 I/O delay override.
FYI - another quirky Quanta motherboard from HP, with DMI readings reported to me. Original Message Date: Wed, 2 Jan 2008 16:23:27 +1030 From: Joel Stanley <[EMAIL PROTECTED]> To: David P. Reed <[EMAIL PROTECTED]> Subject: Re: [PATCH] Option to disable AMD C1E (allows dynticks to work) On Dec 30, 2007 1:13 AM, David P. Reed <[EMAIL PROTECTED]> wrote: I have also attached a c program that only touches port 80. Compile it for 32-bit mode (see comment), run it as root, and after two or three runs, it will hang a system that has the port 80 bug. Using port80.c, I could hard lock a HP Pavilion tx1000 laptop on the first go. This was with ubuntu hardy's stock kernel (a 2.6.24-rc) dmidecode -s baseboard-manufacturer dmidecode -s baseboard-product-name Quanta 30BF Tonight, I will try compiling a kernel with these values added to your patch. Some history, feel free to ignore if it's not relevant: ubuntu feisty's 2.6.22 based kernel worked fine, IIRC. We were having issues with sound, so tried fedora8's .23 based kernel, but this would sporadically hard lock. Ubuntu hardy's 2.6.24 appeared fine, for the 2 hours or so I used it last night, until using the port80.c program, obviously. Cheers, Joel
Re: [PATCH] x86: provide a DMI based port 0x80 I/O delay override.
Alan Cox wrote: That does imply some muppet 'extended' the debug interface for power management on your laptop. Also pretty much proves that for such systems we do have to move from port 0x80 to another delay approach. Alan - in googling around the net yesterday looking for SuperIO chipsets that claim to support port 80, I have found that "blade" servers from companies like IBM and HP *claim* to have a system for monitoring port 80 diagnostic codes and sending them to the "drawer" management processor through a management backplane. This is a little puzzling, because you'd think they would have noticed port 80 issues, since they run Linux in their systems. Maybe not hangs, but it seems unhelpful to have a lot of noise spewing over a bus that is supposed to provide "management" diagnostics. Anyway, what I did not find was whether there was a particular chipset that provided that port 80 feature on those machines. However, if it's a common "cell" in a design, it may have leaked into the notebook market chipsets too. Anyone know if the Linux kernels used on blade servers have been patched to not do the port 80 things? I don't think this would break anything there, but it might have been a helpful patch for their purposes. I don't do blades personally or at work (I focus on mobile devices these days, and my personal servers are discrete), so I have no knowledge. It could be that the blade servers have BIOSes that don't do POST codes over port 80, but send them directly to the "drawer" management bus, of course.
Re: [PATCH] x86: provide a DMI based port 0x80 I/O delay override.
Pavel Machek wrote: 2. there is some "meaning" to certain byte values being written (the _PTS and _WAK use of arguments that come from callers to store into port 80 makes me suspicious.) That might mean that the freeze happens only when certain values are written, or when they are written closely in time to some other action - being used to communicate something to the There's nothing easier than always writing 0 to the 0x80 to check if it hangs in such case...? Pavel I did try that. Machine in question does hang when you write 0 to 0x80 in a loop a few thousand times. This particular suspicion was that the problem was caused by the following sort of thing (it's a multi-cpu system...) First, some ACPI code writes "meaningful value" X to port 80 that is sort of a "parameter" to whatever follows. Just because the DSDT disassembly *calls* it the DBUG port doesn't mean it is *only* used for debugging. We (Linux) use it for timing delays, after all... then Linux driver writes some random value (!=X) including zero to port 80. then ACPI writes some other values that cause SMI or some other thing to happen. There are experiments that are not so simple that could rule this particular guess out. I have them on my queue of experiments I might try (locking out ACPI). Of course if the BIOS were GPL, we could look at the comments, etc... I may today pull the laptop apart to see if I can see what chips are on it, besides the nvidia chipset and the processor. That might give a clue as to what SuperIO or other logic chips are there.
Re: [PATCH] x86: provide a DMI based port 0x80 I/O delay override.
Alan Cox wrote: responds to reads differently than "unused" ports. In particular, an inb takes 1/2 the elapsed time compared to a read to "known" unused port 0xed - 792 tsc ticks for port 80 compared to about 1450 tsc ticks for port 0xed and other unused ports (tsc at 800 MHz). Well at least we know where the port is now - that's too fast for an LPC bus device, so it must be an SMI trap. Only easy way to find out is to use the debugging event counters and see how many instruction cycles are issued as part of the 0x80 port. If it's surprisingly high then you've got a firmware bug and can go spank HP. Alan, thank you for the pointers. I have been doing variations on this testing theme for a while - I get intrigued by a good debugging challenge, and after all it's my machine... Two relevant new data points, and then some more suggestions: 1. It appears to be a real port. SMI traps are not happening in the normal outb to 80. Hundreds of them execute perfectly with the expected instruction counts. If I can trace the particular event that creates the hard freeze (getting really creative, here) and stop before the freeze disables the entire computer, I will. That may be an SMI, or perhaps any other kind of interrupt or exception. Maybe someone knows how to safely trace through an impending SMI while doing printk's or something? 2. It appears to be the standard POST diagnostic port. On a whim, I disassembled my DSDT code, and studied it more closely. It turns out that there are a bunch of "Store(..., DBUG)" instructions scattered throughout, and when you look at what DBUG is defined as, it is defined as an IO Port at IO address DBGP, which is a 1-byte value = 0x80. So the ACPI BIOS thinks it has something to do with debugging. There's a little strangeness here, however, because the value sent to the port occasionally has something to do with arguments to the ACPI operations relating to sleep and wakeup ... could just be that those arguments are distinctive.
In thinking about this, I recognize a couple of things. ACPI is telling us something when it declares a reference to port 80 in its code. It's not telling us the function of this port on this machine, but it is telling us that it is being used by the BIOS. This could be a reason to put out a printk warning message... 'warning: port 80 is used by ACPI BIOS - if you are experiencing problems, you might try an alternate means of iodelay.' Second, it seems likely that there are one of two possible reasons that the port 80 writes cause hang/freezes: 1. buffer overflow in such a device. 2. there is some "meaning" to certain byte values being written (the _PTS and _WAK use of arguments that come from callers to store into port 80 makes me suspicious.) That might mean that the freeze happens only when certain values are written, or when they are written closely in time to some other action - being used to communicate something to the SMM code). If there is some race in when Linux's port 80 writes happen that happen to change the meaning of a request to the hardware or to SMM, then we could be rarely stepping on
Re: [PATCH] x86: provide a DMI based port 0x80 I/O delay override
H. Peter Anvin wrote: Now, I think there is a specific reason to believe that EGA/VGA (but perhaps not CGA/MDA) didn't need these kinds of hacks: the video cards of the day were touched, directly, by an interminable number of DOS applications. CGA/MDA generally *were not*, due to the unsynchronized memory of the original versions (writing could cause snow), so most applications tended to fall back to using the BIOS access methods for CGA and MDA. A little history... not that it really matters, but some might be interested in a 55-year-old hacker's sentimental recollections... As someone who actually wrote drivers for CGA and MDA on the original IBM PC, I can tell you that back-to-back I/O *port* writes and reads were perfectly fine. The "snow" problem had nothing to do with I/O ports. It had to do with the memory on the CGA adapter card not being dual ported, and in high-res (80x25) character mode (only!) a CPU read or write access caused a read of the adapter memory by the character-generator to fail, causing one character-position of the current scanline being output to get all random bits, which was then put through the character-generator and generated whatever the character generator did with 8 random bits of character or attributes as an index into the character generator's font table. In particular, the solution in both the BIOS and in Visicalc, 1-2-3, and other products that did NOT use the BIOS or DOS for I/O to the CGA or MDA because they were Dog Slow, was to detect the CGA, and do a *very* tight loop doing "inb" instructions from one of the CGA status registers, looking for a 0-1 transition on the horizontal retrace flag. It would then do a write to display memory with all interrupts locked out, because that was all it could do during the horizontal retrace, given the speed of the processor.
One of the hacks I did in those days (I wrote the CGA driver for Visicalc Advanced Version and several other Software Arts programs, some of which were sold to Lotus when they bought our assets, and hired me, in 1985) was to measure the "horizontal retrace time" and the "vertical blanking interval" when the program started, and compile screen-writing code that squeezed as many writes as possible into both horizontal retraces and vertical retraces. That was actually a "selling point" for spreadsheets - the reviewers actually measured whether you could use the down-arrow key in auto-repeat mode and keep the screen scrolling at the relevant rate! That was hard on an 8088 or 80286 processor, with a CGA card. It was great when EGA and VGA came out, but we still had to support the CGA long after. Which is why I fully understand the need not to break old machines. We had to run on every machine that was claimed to be "PC compatible" - many of which were hardly so compatible (the PS/2 model 50 had a completely erroneous serial chip that claimed to emulate the original 8250, but had an immense pile of bugs, for example, that IBM begged ISVs to call a software problem and fix so they didn't get sued). The IBM PC bus (predecessor of the current ISA bus, which came from the PC-AT's 16-bit bus), did just fine electrically - any I/O port-specific timing problems had to do with the timing of the chips attached to the bus. For example, if a bus write to a port was routed into a particular chip, the timing of that chip's subsequent processing might be such that it was not ready to respond to another read or write.) That's not a "signalling" problem - it has nothing to do with capacitance on the bus, e.g., but a functional speed problem in the chip (if on the motherboard) or the adapter card. Rant off. This has nothing, of course, to do with present issues. 
Re: [PATCH] Option to disable AMD C1E (allows dynticks to work)
Richard Harman wrote: I think I may have a monkey wrench to throw into this, I finally got around to testing the C1E patch, and the port80 patch. End result: port80 patch has zero effect on this laptop, and the C1E patch makes it stable. Stating that your system is "stable" is not very definitive. Does hwclock work when full Fedora 8 is running without the port80 patch, or have you disabled the uses of hwclock in your init and shutdown code? Have you set the hwclock setting to use the extremely dangerous "--directisa" option - which hides the problem because it avoids the port 80 i/o? Try compiling and running the test program port80.c a few times. If your machine doesn't hang, it would be interesting to see the results it gives. The C1E patch alone does not fix the port 80 problem several of us have observed. What does dmidecode say for your motherboard vendor and model?
Re: [PATCH] x86: provide a DMI based port 0x80 I/O delay override
Alan Cox wrote: Now what's interesting is that the outb to port 80 is *faster* than an outb to an unused port, on my machine. So there's something there - actually accepting the bus transaction. In the ancient 5150 PC, 80 was ... Yes, and I even told you a while back how to verify where it is. From the timing you get it's not on the LPC bus but chipset core so pretty certainly an SMM trap as other systems with the same chipset don't have the bug. Probably all that is needed is a BIOS upgrade. Actually, I could see whether it was SMM trapping due to AMD MSR's that would allow such trapping, performance or debug registers. Nothing was set to trap with SMI or other traps on any port outputs. But I'm continuing to investigate for a cause. It would be nice if it were a BIOS-fixable problem. It would be even nicer if the BIOS were GPL...
Re: [PATCH] x86: provide a DMI based port 0x80 I/O delay override
I am so happy that there will be a way for people who don't build their own kernels to run Linux on their HP and Compaq laptops that have problems with gazillions of writes to port 80, and I'm also happy that some of the strange driver code will be cleaned up over time. Thank you all. Some thoughts you all might consider, take or leave, in this process, from an old engineering manager who once had to worry about QA for software on nearly every personal computer model in the 1980-1992 period: You know, there is a class of devices that are defined to use port 0x80... it's that historically useful class of devices that show/record the POST diagnostics. It certainly was not designed for "delay" purposes. In fact, some of those same silly devices are still used in industry during manufacturing test. I wonder what would happen if Windows were not part of manufacturing test, and instead Linux were the "standard" for some category of machines... When I was still working at Lotus in the late '80's, when we still supported machines like 286's, there were lots of problems with timing loops in drivers in applications (even Win 3.0 had some in hard disk drivers, as did some of our printer drivers, ...), as clock speeds continued to ramp. There were major news stories of machines that "crashed when xyz application or zyx peripheral were added". It was Intel, as I recall, that started "publicly" berating companies in the PC industry for using the "two short jumps" solutions, and suggesting that they measure the processor speed at bootup, using the BIOS standard for doing that with the int 15 BIOS elapsed time calls, and always use "calibrated" timing loops. Which all of us who supported device drivers started to do (remember, apps had device drivers in those days for many devices that talked directly with the registers). 
I was impressed when I dug into Linux eventually, that this operating system "got it right" by measuring the timing during boot and creating a udelay function that really worked! So I have to say, that when I was tracing down the problem that originally kicked off this thread, which was that just accessing the RTC using the standard CMOS_READ macros in a loop caused a hang, that these "outb al,80h" things were there. And I noticed your skeptical comment in the code, Linus. Knowing that there was never in any of the documented RTC chipsets a need for a pause between accesses (going back to my days at Software Arts working on just about every old machine there was...) I changed it on a lark to do no pause at all. And my machine never hung... Now what's interesting is that the outb to port 80 is *faster* than an outb to an unused port, on my machine. So there's something there - actually accepting the bus transaction. In the ancient 5150 PC, 80 was unused because it was the DMA controller port that drove memory refresh, and had no meaning. Now my current hypothesis (not having access to Quanta's design specs for a board they designed and have shipped in quantity, or having taken the laptop apart recently) is that there is logic there on port 80, doing something. Perhaps even "POST diagnostic recording" as every PC since the XT has supported... perhaps supporting post-crash diagnostics... And that that something has a buffer, perhaps even in the "Embedded Controller" that may need emptying periodically. It takes several tens of thousands of "outb" to port 80 to hang the hardware solid - so something is either rare or overflowing. In any case, if this hypothesis is correct - the hardware may have an erratum, but the hardware is doing a very desirable thing - standardizing on an error mechanism that was already in the "standard" as an option... It's Linux that is using a "standard" in a wrong way (a diagnostic port as a delay).
So I say all this, mainly to point out that Linux has done timing loops right (udelay and ndelay) - except one place where there was some skepticism expressed, right there in the code. Linus may have some idea why it was thought important to do an essential delay with a bus transaction that had uncertain timing. My hypothesis is that "community" projects have the danger of "magical theories" and "coolness" overriding careful engineering design practices. Cleaning up that "clever hack" that seemed so good at the time is hugely difficult, especially when the driver writer didn't write down why he used it. Thus I would suggest that the _p functions be deprecated, and if there needs to be a timing-delay after in/out instructions, define in_pause(port, nsec_delay) with an explicit delay. And if the delay is dependent on bus speeds, define a bus-speed ratio calibration. Thus in future driver writing, people will be forced to think clearly about what the timing characteristics of their device on its bus must be. That presupposes that driver writers understand the timing is
Re: [PATCH] Option to disable AMD C1E (allows dynticks to work)
Islam Amer wrote: Hello. I was interested in getting dynticks to work on my Compaq Presario v6000 to help with the 1 hour thirty minutes battery time, but after this discussion I lost interest. I too had the early boot time hang, and found it was udev triggering the bug.

This early boot time hang is *almost certainly* due to the in/out port 80 bug, which I discovered a few weeks ago, and which affects hwclock and other I/O device drivers on a number of HP/Compaq machines in exactly this way. The proper fix for this bug is in dispute, and will probably not occur in the 2.6.24 release because it touches code in many, many drivers.

The simplest way to test if you have a problem of this sort is to try this shell line as root, after you boot successfully. If your machine hangs hard, you have a problem that really looks like the port 80 problem.

for ((i = 0; i < 1000; i = i + 1)); do cat /dev/nvram > /dev/null; done

I have also attached a C program that only touches port 80. Compile it for 32-bit mode (see comment), run it as root, and after two or three runs it will hang a system that has the port 80 bug. If it does, then the outputs of

dmidecode -s baseboard-manufacturer
dmidecode -s baseboard-product-name

are the values you should plug into the .matches field in the dmi_system_id struct in the attached patch. It would be great if you could do that, test, and post back with those values so they can be accumulated. HP/Compaq machines with Quanta m/b's are very popular and very common - so at least a quirk patch for all the broken models would be worth doing in 2.6.25 or downstream in the distros. The right patches will probably take a long time - there is a dispute as to what the semantics of port 80 writes even mean among the core kernel developers, because the hack is lost in the dim dark days of history, and safe resolution will take time.

There is also a C1E issue with the BIOS in my machine (an HP Pavilion dv9000z).
I don't know if it is a bug yet, but that's a different problem - associated with dynticks, perhaps. I have to say that researching the AMD kernel/BIOS docs on C1E (a very new feature in the last year on AMD) leaves me puzzled as to whether the dynticks problem exists on my machine at all - but the patch for it turns off dynticks!

Changing the /etc/init.d/udev script so that the line containing /sbin/udevtrigger reads /sbin/udevtrigger --subsystem-nomatch="*misc*" seemed to fix things. The hang is triggered specifically by echo add > /sys/class/misc/rtc/uevent after inserting rtc.ko. Also, using hwclock to set the rtc will cause a hard hang if you are using 64-bit Linux. Disable the init scripts that set the time, or use the 32-bit binary, as suggested here: http://www.mail-archive.com/[EMAIL PROTECTED]/msg41964.html I hope this helps. But your hardware is slightly different though.

commit c12c7a47b9af87e8d867d5aa0ca5c6bcdd2463da
Author: Rene Herman <[EMAIL PROTECTED]>
Date: Mon Dec 17 21:23:55 2007 +0100

x86: provide a DMI based port 0x80 I/O delay override.

Certain (HP) laptops experience trouble from our port 0x80 I/O delay writes. This patch provides for a DMI based switch to the "alternate diagnostic port" 0xed (as used by some BIOSes as well) for these. David P. Reed confirmed that port 0xed works for him and provides a proper delay. The symptoms of _not_ working are a hanging machine, with "hwclock" use being a direct trigger.

Earlier versions of this attempted to simply use udelay(2), with the 2 being a value tested to be a nicely conservative upper-bound with help from many on the linux-kernel mailing list, but that approach has two problems. First, pre-loops_per_jiffy calibration (which is post PIT init, while some implementations of the PIT are actually one of the historically problematic devices that need the delay) udelay() isn't particularly well-defined.
We could initialise loops_per_jiffy conservatively (and based on CPU family so as to not unduly delay old machines), which would sort of work, but still leaves: Second, delaying isn't the only effect that a write to port 0x80 has. It's also a PCI posting barrier which some devices may be explicitly or implicitly relying on. Alan Cox did a survey and found evidence that additionally various drivers are racy on SMP without the bus-locking outb. Switching to an inb() makes the timing too unpredictable, and as such, this DMI based switch should be the safest approach for now. Any more invasive changes should get more rigid testing first. Moreover, it's only very few machines with the problem and a DMI based hack seems to fit that situation. An early boot parameter to make the choice manually (and override any possible DMI based decision) is also provided:
Re: [PATCH] x86: provide a DMI based port 0x80 I/O delay override.
Besides the two reports of freezes on bugzilla.kernel.org (9511, 6307), the following two bug reports on bugzilla.redhat.com are almost certainly due to the same cause (imo, of course): 245834, 227234. Ubuntu launchpad bug 158849 also seems to report the same problem, for an HP dv6258se 64-bit machine. Also this one: http://www.mail-archive.com/[EMAIL PROTECTED]/msg10321.html If you want to collect dmidecode data from these folks, perhaps we might get a wider sense of what categories of machines are affected. They all seem to be recent HP and Compaq AMD64 laptops, probably all Quanta motherboards. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] x86: provide a DMI based port 0x80 I/O delay override.
H. Peter Anvin wrote: David P. Reed wrote: As support: port 80 on the reporter's (my) HP dv9000z laptop clearly responds to reads differently than "unused" ports. In particular, an inb takes 1/2 the elapsed time compared to a read of the "known" unused port 0xed - 792 TSC ticks for port 80 compared to about 1450 TSC ticks for port 0xed and other unused ports (TSC at 800 MHz). Any timings for port 0xf0 (write zero), out of curiosity?

Here's a bunch of data:

port 0xF0: out 919 cycles, in 933 cycles
port 0xED: out 2541 cycles, in 2036 cycles
port 0x70: out n/a, in 934 cycles
port 0x80: out 1424 cycles, in 795 cycles

AMD Turion 64x2 TL-60 CPU running at 800 MHz, nVidia MCP51 chipset, Quanta motherboard. Running 2.6.24-rc5 with Ingo's patch, so inb_p etc. use port 0xed. Note that I can run the port 80 test once; the second time I get the hard freeze. I didn't try writing to port 70 from userspace - that one's dangerous - but the read of it was included for a timing typical of a chipset-supported device. These are all pretty consistent. I find the "read" timing from 0x80 very interesting. The write timing is also interesting, being faster than an unused port.
Re: [PATCH] x86: provide a DMI based port 0x80 I/O delay override.
H. Peter Anvin wrote: Rene Herman wrote: I do not know how universal that is, but _reading_ port 0xf0 might in fact be sensible then? And should even work on a 386/387 pair? (I have a 386/387 in fact, although I'd need to dig it up.) No. Someone might have used 0xf0 as a read-only port for other uses. As support: port 80 on the reporter's (my) HP dv9000z laptop clearly responds to reads differently than "unused" ports. In particular, an inb takes 1/2 the elapsed time compared to a read of the "known" unused port 0xed - 792 TSC ticks for port 80 compared to about 1450 TSC ticks for port 0xed and other unused ports (TSC at 800 MHz).
Re: [PATCH] x86: provide a DMI based port 0x80 I/O delay override.
Ingo - I finished testing the rolled-up patch that you provided. It seems to work just fine. Thank you for putting this all together and persevering in this long and complex discussion. Here are the results, on the offending laptop, using 2.6.24-rc5 plus that one patch.

First: booted with normal boot parameters (no io_delay=). According to dmesg, 0xed is used.
- hwclock ran fine, hundreds of times.
- my shell script loop doing "cat /dev/nvram > /dev/null" ran fine, several times.
- Running Rene's "port 80" speed test ran fine once, then froze the system hard. (expected)

Second: booted with io_delay=0x80, several tests, rebooting after freezes:
- hwclock froze the system hard. (this is the problem that drove me to find this bug)
- my shell script loop froze the system hard.

Third: booted with io_delay=none:
- hwclock ran fine, also hundreds of times.
- my shell script loop ran fine several times.
- Running Rene's port80 test ran fine twice, froze the system hard on the third try.

Fourth: booted with io_delay=udelay:
- hwclock ran fine, also hundreds of times.
- my shell script loop ran fine several times.
- Running Rene's port80 test ran fine once, froze the system hard on the second try.

Analysis: the patch works fine, and the default of 0xed seems suitably conservative. I will probably use the boot parameter io_delay=none, because I don't seem to have any I/O devices that require any delays - and this way I can find any that do.

Still wondering: what the heck is going on with port 80 on my laptop motherboard. Clearly it "does something". I will in my spare time continue investigating, though having a reliable system is GREAT.
Re: [PATCH] x86: provide a DMI based port 0x80 I/O delay override.
About to start building and testing. It will take a few hours. Ingo Molnar wrote: here's an updated rollup patch, against 2.6.24-rc4. David, could you please try this? This should work out of box on your system, without any boot option or other tweak needed.
Re: [PATCH] x86: provide a DMI based port 0x80 I/O delay override.
Rene Herman wrote: No, most definitely not. Having the user select udelay or none through the kernel config and then the kernel deciding "ah, you know what, I'll know better and use port access anyway" is _utterly_ broken behaviour. Software needs to listen to its master. When acting as an ordinary user, the .config is beyond my control; it is under the control of the distro (Fedora, Ubuntu, ... but perhaps not Gentoo). I think the distro guys want a default behavior that is set in .config, with quirk overrides being done when needed. And of course the user, in his/her boot params, gets the final say.
Re: [PATCH] x86: provide a DMI based port 0x80 I/O delay override.
Rene Herman wrote: David: I've plugged in your DMI values in this. Could you perhaps test this to confirm that it works for you? Will test it by tomorrow morning.
Re: [PATCH] x86_64: fix problems due to use of "outb" to port 80 on some AMD64x2 laptops, etc.
The process of safely making delicate changes here is beyond my responsibility as just a user - believe me, I'm not suggesting that a risky fix be put in .24. I can patch my own kernels, and I can even share an unofficial patch with others for now, or suggest that Fedora and Ubuntu add it to their downstream.

May I make a small suggestion, though. If a DMI-keyed switch from out-80 to udelay(2) gets put in, perhaps there should also be a way for people to test their own configuration for the underlying problem, made available as a script. Though it is a "hack", all you need to freeze a problem system is to run a loop doing about 1000 "cat /dev/nvram > /dev/null" commands. If that leads to a freeze, one might ask to have the motherboard added to the DMI-key list.

H. Peter Anvin wrote: Ingo Molnar wrote: * H. Peter Anvin <[EMAIL PROTECTED]> wrote: Pavel Machek wrote: this is also something for v2.6.24 merging. As much as I like this patch, I do not think it is suitable for .24. Too risky, I'd say. No kidding! We're talking about removing a hack that has been successful on thousands of pieces of hardware over 15 years because it ^[*] breaks ONE machine. [*] "- none of which needs it anymore -" there, fixed it for you ;-) So let's keep this in perspective: this is a hack that only helps on a very low number of systems. (the PIT of one PII-era chipset is known to be affected) Yes, but the status quo has been *tested* on thousands of systems and is known to work. Thus, changing it puts things into unknown territory, even if only a small number of machines actually need the current configuration. Heck, there are only a small number of 386/486 machines still in operation and being actively updated. Unfortunately this hack's side-effects are mis-used by an unknown number of drivers to mask PCI posting bugs. We want to figure out those bugs (safely and carefully) and we want to remove this hack from modern machines that don't need it.
Doing anything else would be superstition. Anyway, we likely won't be doing anything about this in .24. Again, 24 is "right out". 25 is a "maybe", IMO. Rene's fix could be an exception, since it is a DMI-keyed workaround for a specific machine and doesn't change behaviour in general. -hpa
Re: More info on port 80 symptoms on MCP51 machine.
Allen Martin wrote: Nothing inside the chipset should be decoding port 80 writes. It's possible this board has a port 80 decoder wired onto the board that's misbehaving. I've seen other laptop boards with port 80 decoders wired onto the board, even if the 7-segment display is only populated on debug builds. This is very helpful. So the next question is whether there is something on the laptop mainboard. Any idea how to look for such a thing? I am not averse to taking the laptop apart to look at the mainboard.
Re: [PATCH] x86: fix problems due to use of "outb" to port 80 on some AMD64x2 laptops, etc.
Here we go.

# dmidecode -s baseboard-manufacturer
Quanta
# dmidecode -s baseboard-product-name
30B9

There do seem to be other systems, besides mine, that have the same problem. I think it's pretty likely that the other machines that have this problem are Quanta machines, since Quanta is one of the primary ODMs that does HP laptops. I don't know whether the product-name is common with the HP dv6000z family, which is another one reported to have this problem. We could try to ask all the reporters of hwclock freezes to report their results from dmidecode.

Rene Herman wrote: On 15-12-07 21:27, H. Peter Anvin wrote: Rene Herman wrote: Yes, just posted a Patch-For-Comments that switches on the availability of a TSC (tsc_init sets tsc_disable also for !cpu_has_tsc), which would mean that only really old stuff would be using the outb still. A TSC is really all we need to have a sensible udelay(). Uhm, no. You have no clue what the speed of the TSC is until you have been able to calibrate it against a fixed timesource - like the PIT. Yes. Hnng. Okay, this is going nowhere in a hurry, so back to the very first suggestion in this thread. How about this? This allows switching from port 0x80 to port 0xed based on DMI. David: I plugged in my own DMI values for testing, but obviously yours are needed. The values that are needed are retrieved by the "dmidecode" program, which you will probably have installed (it might be in an sbin directory) or will be able to install through whatever package manager you use.

dmidecode -s baseboard-manufacturer
dmidecode -s baseboard-product-name

are the values you should plug into the .matches field in the dmi_system_id struct in this. It would be great if you could do that, test, and post back with those values. .ident should be a nice human name. It's been tested on x86-32 and seems to work fine. It's not been tested on x86-64 but seems to stand a fair chance of working similarly.
It of course remains possible to switch to a udelay() based method later on anyway, but with all the pre-calibration trouble, this might be the lowest-risk method in the short run. This is partly based on previous patches by Pavel Machek and David P. Reed. I hope this is considered half-way correct/sane (note by the way that it's not a good idea to switch a "native_io_delay_port" value, since plugging in a variable port would clobber register dx for every outb_p, which would then have to be reloaded for the next outb again). Comments appreciated.

Signed-off-by: Rene Herman <[EMAIL PROTECTED]>

 arch/x86/boot/compressed/misc_32.c |  8 ++---
 arch/x86/boot/compressed/misc_64.c |  8 ++---
 arch/x86/kernel/Makefile_32        |  2 -
 arch/x86/kernel/Makefile_64        |  2 -
 arch/x86/kernel/io_delay.c         | 53 +
 arch/x86/kernel/setup_32.c         |  2 +
 arch/x86/kernel/setup_64.c         |  2 +
 include/asm-x86/io_32.h            | 17 ++-
 include/asm-x86/io_64.h            | 23 ++--
Re: [PATCH] x86_64: fix problems due to use of "outb" to port 80 on some AMD64x2 laptops, etc.
H. Peter Anvin wrote: David P. Reed wrote: Just a thought for a way to fix the "very early" timing needed to set up udelay to work in a way that works on all machines. Perhaps we haven't exploited the BIOS enough: the PC BIOS since the PC AT (286) has always had a standard "countdown timer" way to delay for n microseconds, which as far as I know still works. This can be used to measure the speed of a delay loop, without using ANY in/out instructions directly (the ones in the BIOS are presumably correctly delayed...). If we enter from the 32-bit entrypoint, we already don't have access to the BIOS, though. My understanding is that Linux starts in real mode, and uses the BIOS for such things as reading the very first image. arch/x86/boot/main.c seems to use BIOS calls, and one can do what I wrote in C or asm. A good place to measure the appropriate delay timing, and pass it on forward. That's what I was suggesting, which is why I copied the ASM routine from my old code listing as I did.
Re: [PATCH] x86_64: fix problems due to use of "outb" to port 80 on some AMD64x2 laptops, etc.
I understand the risks of such a fundamental change, and it may be only a minor concern, but I did also point out that using an unused port causes the bus to be tied up for a microsecond or two, which matters on a fast SMP machine. Of course all the other concerns you guys are worrying about are really important. I don't want to break anybody's running systems... I'd like to see my machine run smoothly, and all the other machines that may or may not have this problem (google "hwclock freeze" to see that I'm far from alone - I just have persevered in "bisecting" this problem with kernel tweaks for months, whereas the others did not or did not know how).

By the way, this laptop is really nice for Linux in lots of ways. Dual drives, so I set it up with software RAID for reliability, dual 64-bit processors, fast 3D graphics, etc. Great battery life. Just this one last kernel issue.

I also note that current machines like the problem machine have ACPI, and maybe those would be the ones on which vendors might start to define port 80 to mean something. As I noted, it /seems/ to be only when ACPI is turned on that this problem happens on my machine - that's when the OS starts to be involved in servicing various things - so it suggests that maybe something changes about port 80's unknown function on these machines when ACPI is servicing the system management code (that's not something I have been able to verify). My belief is that my machine has some device that is responding to port 80 by doing something, and that something requires some other program to "service" port 80 in some way. But it sure would be nice to know. I can't personally sand off the top of the chipset to put probes into it - so my normal approach of putting a logic analyzer on the bus doesn't work.

PS: If I have time, I may try to build Rene's port 80 test for Windows and run it under WinXP on this machine (I still have a crappy little partition that boots it).
If it freezes the same way, it's almost certainly a design "feature", and if it doesn't freeze, we might suspect that there is compensating logic either in Windows' ACPI code or in some way that Windows "sets up" the machine.

Alan Cox wrote: On Sat, 15 Dec 2007 14:27:25 +0100 Ingo Molnar <[EMAIL PROTECTED]> wrote: * Rene Herman <[EMAIL PROTECTED]> wrote: The issue is -- how do you safely replace the outb pre-loops_per_jiffy calibration? I'm currently running with the attached hack (not submitted, only for 32-bit and discussion), the idea of which might be the best we can do? how about doing a known-NOP chipset cycle? For example: inb(PIC_SLAVE_IMR) It needs to be a different chip to the main one (or macrocell anyway) - so PIC for PIT and vice versa. However, since we know 0x80 works for everything on the planet but this one specific type of laptop, which seems to need a firmware update, it's a very high risk approach. Alan
Re: [PATCH] x86_64: fix problems due to use of "outb" to port 80 on some AMD64x2 laptops, etc.
Just a thought for a way to fix the "very early" timing needed to set up udelay to work in a way that works on all machines. Perhaps we haven't exploited the BIOS enough: The PC BIOS since the PC AT (286) has always had a standard "countdown timer" way to delay for n microseconds, which as far as I know still works. This can be used to measure the speed of a delay loop, without using ANY in/out instructions directly (the ones in the BIOS are presumably correctly delayed...). So first thing in the boot sequence, one can calibrate a timing loop using this technique, and save the value to be used for all the "early" stuff. Here's skeleton code from old ASM code I found lying around in my archives to use BIOS to measure how many unrolled short jumps can execute in 10 msec. Note that it can run without knowing anything whatsoever about port timing.

haltbyte db 0
calibrate:
        les     bx,haltbyte     ; address of halt flag into es:bx
        mov     ax,8300h
        sub     cx,cx
        mov     dx,1            ; 10 msec. in cx:dx
        int     15h
        jc      bad
        sub     dx,dx
        sub     cx,cx           ; decrement counter in dx:cx
tloop:
        jmp     short $+2       ; 10 short jmps
        jmp     short $+2
        jmp     short $+2
        jmp     short $+2
        jmp     short $+2
        jmp     short $+2
        jmp     short $+2
        jmp     short $+2
        jmp     short $+2
        test    haltbyte
        loopz   tloop
        jnz     done
        dec     dx
        jnz     tloop
        ; overflowed 32 bits ... never happens, cancel BIOS event wait.
        mov     ax,8301h
        int     15h
        jmp     error
done:
        mov     ax,cx
        negl    ; here dx:ax contains 32 bit loop count corresponding to 10 msec.
        ret     ; return 32-bit value

Doc on function 83h of int 15h should be available online.

Alan Cox wrote: On Fri, 14 Dec 2007 14:13:46 -0800 "H. Peter Anvin" <[EMAIL PROTECTED]> wrote: Pavel Machek wrote: On Fri 2007-12-14 10:02:57, H. Peter Anvin wrote: Ingo Molnar wrote: wow, cool fix! (I remember that there were other systems as well that are affected by port 0x80 muckery - i thought we had removed port 0x80 accesses long ago.) how about the simpler fix below, as a first-level approach? We can then remove the _p in/out sequences after this.
I believe this will suffer from the issue that was raised: this will use udelay() long before loop calibration (and no, we can't just "be conservative", since there is no "conservative" value we can use.) ?? Just initialize bogomips to a 6 GHz equivalent... and we are fine until 6 GHz CPUs come out. How long will that take to boot on a 386? Well, the dumb approach to fix that would seem to be to initialise it based on cpu->family:

3 -> 50 MHz
4 -> 300 MHz
5 -> etc...

Alan
Re: [PATCH] x86_64: fix problems due to use of "outb" to port 80 on some AMD64x2 laptops, etc.
Avi Kivity wrote: kvm will forward a virtual machine's writes to port 0x80 to the real port. The reason is that the write is much faster than exiting and emulating it; the difference is measurable when compiling kernels. Now if the cause is simply writing to port 0x80, then we must stop doing that. But if the reason is the back-to-back writes, then we can keep it, since the other writes will be trapped by kvm and emulated. Do you know which is the case? As for kvm, I don't have enough info to know anything about that. Is there a test you'd like me to try? I think you are also asking if the crash on these laptops is caused only by back-to-back writes. Actually, it doesn't seem to matter if they are back to back. I can cause the crash if the writes to 80 are very much spread out in time - it seems only to matter how many of them get executed - almost as if there is a buffer overflow. (And of course if you do back-to-back writes to other ports that are apparently fully unused, such as 0xED on my machine, no crash occurs.) I believe (though no one seems to have confirming documentation from the chipset or motherboard vendor) that port 80 is actually functional for some unknown purpose on these machines. (They do respond to "in" instructions faster than a bus cycle abort does - more evidence.) I searched the DSDT to see if there is any evidence of an ACPI use for this port, but found nothing.
Re: [PATCH] x86_64: fix problems due to use of "outb" to port 80 on some AMD64x2 laptops, etc.
Andi Kleen wrote: With the additional call this should be completely out of line now, to save code size. Similar for the in variant. Sure. Want me to make a new patch with the _p routines out-of-line?
[PATCH] x86_64: fix problems due to use of "outb" to port 80 on some AMD64x2 laptops, etc.
Replace use of outb to "unused" diagnostic port 0x80 for time delay with udelay based time delay on x86_64 architecture machines. Fix for bugs 9511 and 6307 in bugzilla, plus bugs reported in bugzilla.redhat.com. Derived from a suggestion (that didn't compile) by Pavel Machek, and tested; also based on measurements of typical timings of outs collated by Rene Herman from many in the community. This patch fixes a number of bugs known to cause problems on HP Pavilion dv9000z and dv6000z laptops - in the form of solid freezes when hwclock is used to show or set the time. Also, it potentially improves bus utilization on SMP machines, by using a waiting process that doesn't tie up the ISA/LPC bus for 1 or 2 microseconds. i386 family fixes (completely parallel) were not included, considering that such machines might involve more risk of problems on legacy machines.

Signed-off-by: David P. Reed <[EMAIL PROTECTED]>

Index: linux-2.6/arch/x86/boot/compressed/misc_64.c
===================================================================
--- linux-2.6.orig/arch/x86/boot/compressed/misc_64.c
+++ linux-2.6/arch/x86/boot/compressed/misc_64.c
@@ -269,10 +269,10 @@ static void putstr(const char *s)
 	RM_SCREEN_INFO.orig_y = y;
 	pos = (x + cols * y) * 2;	/* Update cursor position */
-	outb_p(14, vidport);
-	outb_p(0xff & (pos >> 9), vidport+1);
-	outb_p(15, vidport);
-	outb_p(0xff & (pos >> 1), vidport+1);
+	outb(14, vidport);
+	outb(0xff & (pos >> 9), vidport+1);
+	outb(15, vidport);
+	outb(0xff & (pos >> 1), vidport+1);
 }
 
 static void* memset(void* s, int c, unsigned n)
Index: linux-2.6/include/asm/io_64.h
===================================================================
--- linux-2.6.orig/include/asm/io_64.h
+++ linux-2.6/include/asm/io_64.h
@@ -1,6 +1,7 @@
 #ifndef _ASM_IO_H
 #define _ASM_IO_H
 
+#include <linux/delay.h>
 
 /*
  * This file contains the definitions for the x86 IO instructions
@@ -15,19 +16,7 @@
  * mistake somewhere.
  */
 
-/*
- * Thanks to James van Artsdalen for a better timing-fix than
- * the two short jumps: using outb's to a nonexistent port seems
- * to guarantee better timings even on fast machines.
- *
- * On the other hand, I'd like to be sure of a non-existent port:
- * I feel a bit unsafe about using 0x80 (should be safe, though)
- *
- *		Linus
- */
-
-/*
- * Bit simplified and optimized by Jan Hubicka
+/* Bit simplified and optimized by Jan Hubicka
  * Support of BIGMEM added by Gerhard Wichert, Siemens AG, July 1999.
  *
  * isa_memset_io, isa_memcpy_fromio, isa_memcpy_toio added,
@@ -35,36 +24,36 @@
  * - Arnaldo Carvalho de Melo <[EMAIL PROTECTED]>
  */
 
-#define __SLOW_DOWN_IO "\noutb %%al,$0x80"
-
+/* the following delays are really conservative, at least for modern machines */
 #ifdef REALLY_SLOW_IO
-#define __FULL_SLOW_DOWN_IO __SLOW_DOWN_IO __SLOW_DOWN_IO __SLOW_DOWN_IO __SLOW_DOWN_IO
+#define _IOPORT_PAUSE_DELAY 10
 #else
-#define __FULL_SLOW_DOWN_IO __SLOW_DOWN_IO
+#define _IOPORT_PAUSE_DELAY 2
 #endif
 
 /*
  * Talk about misusing macros..
  */
-#define __OUT1(s,x) \
+#define __OUT1(s, x) \
 static inline void out##s(unsigned x value, unsigned short port) {
 
-#define __OUT2(s,s1,s2) \
-__asm__ __volatile__ ("out" #s " %" s1 "0,%" s2 "1"
-
-#define __OUT(s,s1,x) \
-__OUT1(s,x) __OUT2(s,s1,"w") : : "a" (value), "Nd" (port)); } \
-__OUT1(s##_p,x) __OUT2(s,s1,"w") __FULL_SLOW_DOWN_IO : : "a" (value), "Nd" (port));} \
+#define __OUT2(s, s1, s2) \
+	__asm__ __volatile__ ("out" #s " %" s1 "0,%" s2 "1" : : "a" (value), "Nd" \
+(port));
+
+#define __OUT(s, s1, x) \
+__OUT1(s, x) __OUT2(s, s1, "w") } \
+	__OUT1(s##_p, x) __OUT2(s, s1, "w") udelay(_IOPORT_PAUSE_DELAY); } \
 
 #define __IN1(s) \
 static inline RETURN_TYPE in##s(unsigned short port) { RETURN_TYPE _v;
 
-#define __IN2(s,s1,s2) \
-__asm__ __volatile__ ("in" #s " %" s2 "1,%" s1 "0"
+#define __IN2(s, s1, s2) \
+__asm__ __volatile__ ("in" #s " %" s2 "1,%" s1 "0" : "=a" (_v) : "Nd" (port));
 
-#define __IN(s,s1,i...) \
-__IN1(s) __IN2(s,s1,"w") : "=a" (_v) : "Nd" (port) ,##i ); return _v; } \
-__IN1(s##_p) __IN2(s,s1,"w") __FULL_SLOW_DOWN_IO : "=a" (_v) : "Nd" (port) ,##i ); return _v; } \
+#define __IN(s, s1) \
+__IN1(s) __IN2(s, s1, "w") return _v; } \
+	__IN1(s##_p) __IN2(s, s1, "w") udelay(_IOPORT_PAUSE_DELAY); return _v; } \
 
 #define __INS(s) \
 static inline void ins##s(unsigned short port, void * addr, unsigned long count) \
Re: RFC: outb 0x80 in inb_p, outb_p harmful on some modern AMD64 with MCP51 laptops
Simulating 1 microsecond delays (assuming LPC meets that goal for 0x80) is "absolutely correct" for devices provided on PCI-X running on 3 GHz or greater machines? Well, you are entitled to your opinion.

It seems more likely that reading the timing specs of such a chipset would be correct, delaying for a time proportional to what the device actually requires, rather than assuming that 3000 clock cycles on a 3 GHz machine are needed by a very fast emulation of an old device that probably runs at the fastest bus speed the chipset provides. Every device has different timing constraints. In the real world that I live in.

Alan Cox wrote:

On Thu, 13 Dec 2007 08:13:29 -0500 "David P. Reed" <[EMAIL PROTECTED]> wrote:

Perhaps what was meant is that ISA-tuned timings make little sense on devices that are part of the chipset or on the PCI or PCI-X buses?

No. ISA as LPC bus is alive and well inside and outside chipsets. Welcome to planet earth and the reality of 'it's cheaper to reuse cells than design a new one'. For the chipset logic like DMA controllers the _p is absolutely correct.

Alan
Re: RFC: outb 0x80 in inb_p, outb_p harmful on some modern AMD64 with MCP51 laptops
Perhaps what was meant is that ISA-tuned timings make little sense on devices that are part of the chipset or on the PCI or PCI-X buses?

On the other hand, since we don't know in many cases whether the "_p" was supposed to mean "the time it takes to execute an 'out al,80h' on whatever bus structure happens to be on whatever machine", the problem is unsolvable. Ranting about whether ISA/LPC is on what machines seems to be of little value in contributing to a constructive solution.

It seems to me that in the long term, driver writers would do well to think more clearly about the timings their devices require, when that is possible. Those timings are probably implementation dependent - depending on the clock speed of the particular clock that is driving the particular I/O device. Then there's the social problem of a community development project - which is to get people to tune their code but preserve its ability to run on older and variant machines.

Alan Cox wrote:

Yes, it's now clear that all of this is so. Regrettably, it's used in dozens of drivers, most having nothing to do with an ISA/LPC bus. If it really is specific to the ISA architecture, then it should only be used in architecture specific code.

ISA/LPC is not architecture specific. In fact ISA/LPC bus components get everywhere because the PC market drives the cost per unit for those components down to nearly nothing.
Re: More info on port 80 symptoms on MCP51 machine.
Port 0xED, just FYI:

cycles: out 1430, in 1370
cycles: out 1429, in 1370
(800 MHz)

Rene Herman wrote:

On 12-12-07 21:07, David P. Reed wrote:

Sadly, I've been busy with other crises in my day job for the last few days. I did modify Rene's test program and ran it on my "problem" machine, with the results below. The interesting part of this is that port 80 seems to respond to "in" instructions faster than the presumably "unused" ports 0xEC and 0xEF (those were mentioned by someone as alternatives to port 80).

Don't know if someone else mentioned those but I only said 0xed. That's the value Phoenix BIOSes use (yes, and which H. Peter Anvin reported as being generally problematic as well). It's in fact not all that unexpected, it seems, that port 0x80 responds to in, given that it's used by the DMA controller. It's a write that falls on deaf ears. The read is going to be faster if it doesn't time out on an unused port. Although it's not faster for everyone - such as for me, indicating that for us port 0x80 is really-really unused - it is for many. See results here: http://lkml.org/lkml/2007/12/12/309

That, and the fact that the port 80 test reliably freezes the machine solid the second time it is run, and the "hwclock" utility reliably hangs the machine if the port 80's are used in the CMOS_READ/CMOS_WRITE loop, seems to strongly indicate that this chipset or motherboard actually uses port 80, rather than there being a bus problem.

Yes, so it seems. In this case we could in fact also "fix" your situation by just going to 0xed depending on for example DMI. Alan Cox just posted a few further problems with a simple udelay() replacement...

Someone might have an in to nVidia to clarify this, since I don't. In any case, the udelay(2) approach seems to be a safe fix for this machine. Hope input from an "outsider" is helpful in going forward. I put a lot of time and effort into tracking down this problem on this particular machine model, largely because I like the machine.
Running the (slightly modified to test ports 80, ec, ef instead of just port 80) test when the 2 GHz max speed CPU is running at 800 MHz, here's what I get for port 80 and port ec and port ef.

port 80: cycles: out 1430, in 792

At 800 MHz, that's 1.79 / 0.99 microseconds. The precision of the "in" is somewhat interesting. Did someone at nVidia think it's an "in" from 0x80 which should get the 1 microsec delay?

port ef: cycles: out 1431, in 1378
port ec: cycles: out 1432, in 1372

Unused ports, bus timeouts.

System info: HP Pavilion dv9000z laptop (AMD64x2). PCI bus controller is nVidia MCP51.

processor : 0
vendor_id : AuthenticAMD
cpu family : 15
model : 72
model name : AMD Turion(tm) 64 X2 Mobile Technology TL-60
stepping : 2
cpu MHz : 800.000
cache size : 512 KB

Rene.
More info on port 80 symptoms on MCP51 machine.
Sadly, I've been busy with other crises in my day job for the last few days. I did modify Rene's test program and ran it on my "problem" machine, with the results below. The interesting part of this is that port 80 seems to respond to "in" instructions faster than the presumably "unused" ports 0xEC and 0xEF (those were mentioned by someone as alternatives to port 80).

That, and the fact that the port 80 test reliably freezes the machine solid the second time it is run, and the "hwclock" utility reliably hangs the machine if the port 80's are used in the CMOS_READ/CMOS_WRITE loop, seems to strongly indicate that this chipset or motherboard actually uses port 80, rather than there being a bus problem. Someone might have an in to nVidia to clarify this, since I don't.

In any case, the udelay(2) approach seems to be a safe fix for this machine. Hope input from an "outsider" is helpful in going forward. I put a lot of time and effort into tracking down this problem on this particular machine model, largely because I like the machine.

Running the (slightly modified to test ports 80, ec, ef instead of just port 80) test when the 2 GHz max speed CPU is running at 800 MHz, here's what I get for port 80 and port ec and port ef.

port 80: cycles: out 1430, in 792
port ef: cycles: out 1431, in 1378
port ec: cycles: out 1432, in 1372

System info: HP Pavilion dv9000z laptop (AMD64x2). PCI bus controller is nVidia MCP51.

processor : 0
vendor_id : AuthenticAMD
cpu family : 15
model : 72
model name : AMD Turion(tm) 64 X2 Mobile Technology TL-60
stepping : 2
cpu MHz : 800.000
cache size : 512 KB
Attitude problems.
Who has attitude problems here? I have indeed learned a lot about assholes.

linux-os (Dick Johnson) wrote:

Yep. We are all wrong. You come out of nowhere and claim to be right. Goodbye.
Re: [RFT] Port 0x80 I/O speed
I have been having a fun time testing this on my AMD64x2 system. Since out's to port 80 hang the system hard after a while, I can run a test just after booting, but the next run will typically hang it. I did also test two ports thought to be unused. They do *not* hang the system. Thus apparently there is some device responding to port 80!

Running the (slightly modified to test 3 ports instead of just port 80) test when the CPU is running at 800 MHz, here's what I get for port 80 and port ec and port ef.

port 80: cycles: out 1430, in 792
port ef: cycles: out 1431, in 1378
port ec: cycles: out 1432, in 1372

[Note: port 80, when it doesn't hang, which is very often, responds to a read twice as fast as a port which is "not there". Typically though, the second time I run the test, the system freezes solid. Seems like evidence of a present device, not a missing one. Also, port 80 responds when booted with acpi=off, but never seems to hang - sounds like ACPI causes it to be active in some way.]

System info: HP Pavilion dv9000z laptop (AMD64x2). PCI bus controller is nVidia MCP51.
/proc/cpuinfo

processor : 0
vendor_id : AuthenticAMD
cpu family : 15
model : 72
model name : AMD Turion(tm) 64 X2 Mobile Technology TL-60
stepping : 2
cpu MHz : 800.000
cache size : 512 KB
physical id : 0
siblings : 2
core id : 0
cpu cores : 2
fpu : yes
fpu_exception : yes
cpuid level : 1
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt rdtscp lm 3dnowext 3dnow rep_good pni cx16 lahf_lm cmp_legacy svm extapic cr8_legacy
bogomips : 1608.35
TLB size : 1024 4K pages
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management : ts fid vid ttp tm stc

processor : 1
vendor_id : AuthenticAMD
cpu family : 15
model : 72
model name : AMD Turion(tm) 64 X2 Mobile Technology TL-60
stepping : 2
cpu MHz : 800.000
cache size : 512 KB
physical id : 0
siblings : 2
core id : 1
cpu cores : 2
fpu : yes
fpu_exception : yes
cpuid level : 1
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt rdtscp lm 3dnowext 3dnow rep_good pni cx16 lahf_lm cmp_legacy svm extapic cr8_legacy
bogomips : 1608.35
TLB size : 1024 4K pages
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management : ts fid vid ttp tm stc
Re: RFC: outb 0x80 in inb_p, outb_p harmful on some modern AMD64 with MCP51 laptops
1) I found in a book, The Undocumented PC, that I have lying around, that the "pause" recommended for some old adapter chips on the ISA bus was 1 usec. The book carefully points out on various models of PCs how many short jumps are required to implement 1 usec, and suggests that for faster machines, 1 usec loops be calibrated. That seems like a good heuristic.

2) Also, Dick, you got me interested in doing more historical research into electrical specs and circuit diagrams (which did come with the IBM 5150). The bus in the original IBM PC had no problem with "bus capacity being charged" as you put it. Perhaps you don't remember that the I/O bus had the same electrical characteristics as the memory bus. Thus there is no issue with the bus being "charged". The issue of delays between I/O instructions was entirely a problem of whether the adapter card could clock data into its buffers and handle the clocked-in data in time for another bus cycle. This had nothing to do with "charging" - buses in those days happily handled edges that were much faster than 1 usec. We at Software Arts did what we did based on direct measurements and testing. We found that the early BIOS listings were usually fine, but in fact were misleading. After all, the guys who built the machine and wrote the BIOS were in a hurry. There were errata.

linux-os (Dick Johnson) wrote:

You do remember that the X86 can do back-to-back port instructions faster than the ISA bus capacity can be charged, don't you? You do remember the admonishment about:

intel asm
mov dx, port    ; One of two adjacent ports
mov al, ffh     ; All bits set
out dx, al      ; Output to port, bits start charging bus
inc al          ; Al becomes 0
inc dx          ; Next port
out dx, al      ; Write 0 there, data bits discharged

When the port at 'port' gets its data, it will likely be 0, not 0xff, because the intervening instructions can execute faster than a heavily-loaded ISA bus.
Re: RFC: outb 0x80 in inb_p, outb_p harmful on some modern AMD64 with MCP51 laptops
Dick - I didn't work for Don in Boca. I did know him, having met with him several times when he was still alive. I worked from 1979-1985 as a consultant, and eventually VP R&D, at Software Arts in Cambridge, MA, and there was a machine we developed the first IBM VisiCalc for, in a locked room which required NDA sign-in, with a list of authorized employees and consultants. The machine was a plywood board. It was not a 5150, yet. Note that I did not say I worked in Boca. Funny thing to twist my comments into that assertion.

Note: I am not trying to say that I know everything about the history of PC-compatibles, nor am I trying to prove some kind of macho thing. But I do happen to have a lot of practical experience in this space. In particular, I am trying to contribute to Linux to make it better. Largely because I think you guys are doing a great thing, and as a user of it, I think it's a good thing to give back.
Re: RFC: outb 0x80 in inb_p, outb_p harmful on some modern AMD64 with MCP51 laptops
Alan Cox wrote:

The vga driver is somewhat misnamed. In console mode we handle everything back to MDA/HGA and some HGA adapters do need delays.

No they don't. I really, really, really know this for a fact. I wrote ASM drivers for every early video adapter and ran them all through Lotus QA and Software Arts QA. Personally. The only delay needed is caused by not having dual-ported video frame buffers on the original CGA in high res character mode. This caused "snow" when a memory write was done concurrently with the read being done by the scanline character generator. And that delay was done by waiting for a bit in the I/O port space to change. There was NO reason to do waits between I/O instructions.

Produce a spec sheet, or even better a machine.

I may have an original PC-XT still lying around in the attic, don't know if I can fire it up, but it had such graphics cards. I also have several early high-speed clones that were "overclocked".

I do remind all that 0x80 is a BIOS-specific standard, and is per BIOS - other ports have been used in the history of the IBM PC family by some vendors, and some machines have used it for DMA port mapping!! And

All do - that's why it is suitable.

Not true. Again, I can produce machines that don't use 0x80. Perhaps that is because I am many years older than you are, and have been writing code for PC compatibles since 1981. (Not a typo - this was before the first IBM PC was released to the public.)

Windows XP does NOT use it at all. Therefore it may not be supported by

Older Windows does. Don't know about XP although DOS apps in XP will but they may virtualise the port.

Show me one line of Windows code written by Microsoft that uses port 80. I don't know what app hackers might have done - there was no protection, and someone might have copied the BIOS for some reason.
I have a simple patch that fixes my primary concern - just change the CMOS_READ and CMOS_WRITE, 64-bit versions of I/O and bootcode vga accesses (first group below) to use the straight inb and outb code.

Which requires care. Have you verified all the main chipset vendors?

I obviously have not. Clearly the guys who want this port 80 hack so desperately have not either. That's why we are in this pickle. (Well, "we" only to the extent that I am accepted as having useful input. I'm happy to go if I'm not perceived as being helpful.)

I may submit it so that the many others who share my pain will be made

All .. none of them?

There is a long standing set of reports of "hwclock" not working on HP dv.000v laptops, where the . stands for 2, 4, 6, and 9. These are all nVidia MCP51 chipset AMD64's. And if you choose to be such an insulting , I may just stop trying to be helpful. I presume that others in the community find my comments helpful.

I can do some of these off the top of my head:

drivers/net/8390.h: Needed for some 8390 devices on ISA bus
drivers/net/de600.c, drivers/net/de600.h: Uses the parallel port, which isn't guaranteed to be full ISA speed.
drivers/scsi/ppa.h: Parallel port
drivers/serial/8250.c: Some PCs need delays for certain ops.
drivers/watchdog/wdt_pci.c: That one is a mistake I believe, I'll dig out the docs.
Re: RFC: outb 0x80 in inb_p, outb_p harmful on some modern AMD64 with MCP51 laptops
Which port do you want me to test? Also, I can run the timing test on my machine if you share the source code so I can build it.

Rene Herman wrote:

On 11-12-07 17:30, Andi Kleen wrote:

Anyways, it looks like the discussion here is going in a loop. I had hoped David would post his test results with another port so that we know for sure that the bus aborts (and not port 80) are the problem on his box. But it looks like he doesn't want to do this. Still, removing the bus aborts is probably the correct way to go forward.

Yes, I do also still want to know that. David (Reed)?

Only needs a patch now. If nobody beats me to it I'll add one later to my tree.

Pavel Machek already posted one. His udelay(8) wants to be less -- 1 or "to be safe" perhaps 2. http://lkml.org/lkml/2007/12/9/131

Rene.
Re: RFC: outb 0x80 in inb_p, outb_p harmful on some modern AMD64 with MCP51 laptops
As the person who started this thread, I'm still puzzled by two things:

1) Is there anyone out there who wrote one of these drivers (most listed below, from a list of those I needed to patch to eliminate refs to _p calls) or arch specific code (also listed below), who might know why the _p macros are actually needed (for any platform)? Note that many of the devices are not on the ISA/LPC bus now, even if they once were, and the vga has never needed a bus-level pause since the original IBM PC existed. (It did need a sync with retrace, but that's another story.)

2) Why are Opterons and so forth so slow on out's to x80, as the measurements show? That seems to me like there is a hidden bus timeout going on. I'm still trying to figure out what is happening on my machine, which hangs when not in legacy mode (i.e. in ACPI mode) after a lot of out's to x80. Perhaps the bus timeout handling is the issue.

I do remind all that 0x80 is a BIOS-specific standard, and is per BIOS - other ports have been used in the history of the IBM PC family by some vendors, and some machines have used it for DMA port mapping!! And Windows XP does NOT use it at all. Therefore it may not be supported by vendors, and may in fact be used for other purposes, since it can be if the BIOS doesn't use it.

I have a simple patch that fixes my primary concern - just change the CMOS_READ and CMOS_WRITE, 64-bit versions of I/O and bootcode vga accesses (first group below) to use the straight inb and outb code. I may submit it so that the many others who share my pain will be made happy - at least on modern _64 x86 machines those instructions don't need the _p feature. The rest of the drivers and code are just lurking disasters, which I hope can be resolved somehow by the community figuring out what the timing delays were put there for...
arch/x86/boot/compressed/misc_64.c
arch/x86/kernel/i8259_64.c
arch/x86/pci/irq.c
include/asm/floppy.h
include/asm/io_64.h
include/asm/mc146818rtc_64.h
include/asm/vga.h
include/video/vga.h
drivers/video/console/vgacon.c
drivers/video/vesafb.c
drivers/video/vga16fb.c
drivers/acpi/processor_idle.c
drivers/bluetooth/bluecard_cs.c
drivers/char/pc8736x_gpio.c
drivers/char/rocket_int.h
drivers/hwmon/abituguru.c
drivers/hwmon/abituguru3.c
drivers/hwmon/it87.c
drivers/hwmon/lm78.c
drivers/hwmon/pc87360.c
drivers/hwmon/sis5595.c
drivers/hwmon/smsc47b397.c
drivers/hwmon/smsc47m1.c
drivers/hwmon/via686a.c
drivers/hwmon/vt8231.c
drivers/hwmon/w83627ehf.c
drivers/hwmon/w83627hf.c
drivers/hwmon/w83781d.c
drivers/i2c/busses/i2c-amd756.c
drivers/i2c/busses/i2c-i801.c
drivers/i2c/busses/i2c-nforce2.c
drivers/i2c/busses/i2c-viapro.c
drivers/input/misc/pcspkr.c
drivers/isdn/hisax/elsa_ser.c
drivers/isdn/hisax/s0box.c
drivers/misc/sony-laptop.c
drivers/net/8390.h
drivers/net/de600.c
drivers/net/de600.h
drivers/net/irda/nsc-ircc.c
drivers/net/irda/w83977af_ir.c
drivers/net/pcmcia/axnet_cs.c
drivers/net/pcmcia/pcnet_cs.c
drivers/net/wireless/wl3501_cs.c
drivers/scsi/megaraid.c
drivers/scsi/megaraid.h
drivers/scsi/ppa.h
drivers/serial/8250.c
drivers/video/console/vgacon.c
drivers/video/vesafb.c
drivers/video/vga16fb.c
drivers/watchdog/pcwd_pci.c
drivers/watchdog/w83627hf_wdt.c
drivers/watchdog/w83697hf_wdt.c
drivers/watchdog/w83877f_wdt.c
drivers/watchdog/w83977f_wdt.c
drivers/watchdog/wdt_pci.c
Re: RFC: outb 0x80 in inb_p, outb_p harmful on some modern AMD64 with MCP51 laptops
I am going to do a test on another "unused" port. However, I realized something as I was thinking about this: 0x80 is the "diagnostic device" port. It is not an "unused" port.

Normally, Linux would support a device like the diagnostic device by providing a character device, called /dev/post-diagnosis (for the power-on self-test diagnostic). That device would reserve port 80 for its use, and the driver could be loaded if there was such a device.

Now one possibility is that my laptop contains a diagnostic code device that stores all the out's to port 80 (documented only to the designers, and kept "secret"). That device may need "clearing" periodically, which is perhaps done by the SMM, which is turned off when I go into ACPI-on state. Or maybe it is designed to be cleared only when the system boots at the beginning of the BIOS. What happens when (as happens in hwclock's polling of the RTC) thousands of in/out*_p calls are made very fast? Well, perhaps it is not cleared quickly enough, and hangs the bus.

The point here is that Linux is NOT using a defined-to-be "unused" port. It IS using the "diagnostic" port, and talking to a diagnostic device that *is* used, and may be present. Just doesn't seem clean to me.

So I'd suggest 2 actions:

1) Figure out a better implementation of _p that is "safe" and doesn't use questionable heuristics. udelay seems reasonable because it doesn't drive contention on the busses on SMP machines, but perhaps someone has a better idea.

2) Start a background task with the maintainers of drivers to clean up their code regarding these short delays for slow devices (note that it's never because the *bus* is slow, but because the *device* is slow). Perhaps this could be helped by "deprecating" the _p calls and suggesting an alternative that requires the coder to be precise about what the delay is for, and how long it is supposed to be, perhaps on a per-machine basis.
Re: RFC: outb 0x80 in inb_p, outb_p harmful on some modern AMD64 with MCP51 laptops
Alan Cox wrote:

0x80 should be fine for anything PC compatible anyway, it's specifically reserved as a debug port and supported for *exactly* that purpose by many chipsets.

Disagree. The definitions of PC compatible are quite problematic. I have the advantage over some of you young guys, in that I actually wrote code on one of the first 5 breadboard IBM PCs on the planet at Software Arts, Inc., and I was directly involved in hardware spec projects with the original IBM and Compaq engineers. No one actually defined the port numbered 80h as a "standard" for anything. You won't find it documented in any early manual for an IBM machine. The ISA bus supported unterminated transactions safely. That allowed some clever folks to design BIOS diagnostic tools that optionally plugged into the bus. In any case, my machine does not have an ISA bus. Why should it? It's a laptop!

Now the interesting thing is that I have been scanning the source code of Linux, and I find gazillions of inb_p, outb_p and so forth instructions where they have NO value. It's as if some hacker who half understood hardware threw in the _p "just to be safe". Well, it's neither safe, nor is it economical of code or data. It hangs up the bus on an MP machine, for example, even when it works, to do the delay by "outb al,80h". Worse, the actual requirements of the gazillions of inb_p instructions for delays are not documented in the code!

This is interesting, because the number of devices likely to need a delay after providing data on an "in" instruction is very likely to be near zero. After all, the device has already serviced the bus and delivered data! Why put many microseconds into the bus, locking out other ISA transactions (and PCI maybe too), with an out to port 80?

Some of the code in Linux is really nice, really clean, really well-thought out. Some is ... well, I'm not trolling for a fight.
Re: RFC: outb 0x80 in inb_p, outb_p harmful on some modern AMD64 with MCP51 laptops
Andi Kleen wrote:

Changing the delay instruction sequence from the outb to short jumps might be the safe thing.

I don't think that makes sense to do on anything modern. The trouble is that the jumps will effectively execute near "infinitely fast" on any modern CPU compared to the bus. But the delay really needs to be something that is about I/O port speed.

This all presumes that you need any delay at all. From back in the early days (when I was writing DOS and BIOS code on 80286 class machines), the /only/ reason this was a problem was using really slow acting, non-buffered chips compared to the processor clock (8259?). If you think about it, if there is a sequence such as outb->device, inb<-device, the only reason for a delay would be that the device failed to process the out command, /and/ the device had no "done" flag. The other "slow" problem would be an out->device, out->device at a rate higher than the device could handle, because it had a one-level buffer that ignored input that came too fast after the previous, but didn't stall the bus to protect the device. Modern machines just are not designed that way - a few of the early PC compatibles were. My machine in question, for example, needs no waiting within CMOS_READs at all. And I doubt any other chip/device needs waiting that isn't already provided by the bus. The I/O to port 80 is very, very odd in this context.

Actually, modern machines have potentially more serious problems with I/O ops to non-existent addresses, which may cause real bus weirdness. So that's why I suggested the short-jump answer - it fixes the problem on the ancient machines, but doesn't do anything on the modern ones, where there should be no problem.

One patch that makes immediate sense is to use the "virtualization" hooks for the CMOS_READ/WRITE ops that are there in the 32-bit code to allow substitution of a workable sequence for the RTC, which is where I experience the problem on my machine.
This doesn't fix any lurking issues with the _p APIs, since they are not virtualized. I'd suggest the safest possible route that would fix my machine would be either an early_quirk, a boot parameter, or both, that would then control the virtualization hook logic. That patch would fix my machine's current issues, and would not harm any machines that need the 0x80 delay.

But I know it leaves a lurking issue for another day - for all the other inb_p and outb_p code in the kernel drivers. A grep suggests that they are used only in somewhat less modern drivers - perhaps for legacy machines. I don't think any such drivers are used on any of my machines.
Re: RFC: outb 0x80 in inb_p, outb_p harmful on some modern AMD64 with MCP51 laptops
Pardon my ignorance, but is port 0xed really safe to push an out cycle at across the entire x86_64 family? How long must real _p pauses be in reality? (And who cares about what the code calls "really slow i/o"?) Why are we waiting at all? I read the comments in io_64.h, and am a bit mystified. Does Windoze or DOS do this magical mystery wait?

Anyway, the virtualization hooks in 32-bit x86 almost make it possible to isolate this simply - maybe - after the merge of 32/64 being contemplated. And anyone who knows what the chipset might be doing with the 80 port rather than POST codes, perhaps could contribute. Any nVidia folks who know what's happening under NDA? Any Phoenix BIOS types?

I think the worst of the problems would be fixed by changing just the CMOS_READ/CMOS_WRITE macros. But the danger lingers in the *_p code.

Rene Herman wrote:

On 07-12-07 01:23, Robert Hancock wrote:

David P. Reed wrote:

After much, much testing (months, off and on, pursuing hypotheses), I've discovered that the use of "outb al,0x80" instructions to "delay" after inb and outb instructions causes solid freezes on my HP dv9000z laptop, when ACPI is enabled. It takes a fair number of out's to 0x80, but the hard freeze is reliably reproducible by writing a driver that solely does a loop of 50 outb's to 0x80 and calling it in a loop 1000 times from user space.

!!!

The serious impact is that the /dev/rtc and /dev/nvram devices are very unreliable - thus "hwclock" freezes very reliably while looping waiting for a new second value, and calling "cat /dev/nvram" in a loop freezes the machine if done a few times in a row. This is reproducible, but requires a fair number of outb's to the 0x80 diagnostic port, and seems to require ACPI to be on. io_64.h is the source of these particular instructions, via the CMOS_READ and CMOS_WRITE macros, which are defined in mc146818_64.h. (I wonder if the same problem occurs in 32-bit mode.)
I'm happy to complete and test a patch, but I'm curious what the right approach ought to be. I have to say I have no clue as to what ACPI is doing on this chipset (nvidia MCP51) that would make port 80 do this. A raw random guess is that something is logging POST codes, but if so, not clear what is problematic in ACPI mode. ANy help/suggestions? Changing the delay instruction sequence from the outb to short jumps might be the safe thing. But Linus, et al. may have experience with that on other architectures like older Pentiums etc. The fact that these "pausing" calls are needed in the first place seems rather cheesy. If there's hardware that's unable to respond to IO port writes as fast as possible, then surely there's a better solution than trying to stall the IOs by an arbitrary and hardware-dependent amount of time, like udelay calls, etc. Does any remotely recent hardware even need this? The idea is that the delay is not in fact hardware dependent. With in the the absense of a POST board port 0x80 being sort of guaranteeed to not be decoded on PCI but forwarded to and left to die on ISA/LPC one should get the effect that the _next_ write will have survived an ISA/LPC bus address cycle acknowledgement timeout. I believe. And no, I don't believe any remotely recent hardware needs it and have in fact wondered about it since actual 386 days, having since that time never found a device that wouldn't in fact take back to back I/O even. Even back then (ie, legacy only systems, no forwarding from PCI or anything) BIOSes provided ISA bus wait-state settings which should be involved in getting insanely stupid and old hardware to behave... Port 0xed has been suggested as an alternate port. Probably not a great "fix" but if replacing the out with a simple udelay() isn't that simple (during early boot I gather) then it might at least be something for you to try. 
I'd hope that the 0x80 in include/asm/io.h:native_io_delay() would be the only one you are running into, so you could change that to 0xed and see what catches fire. If there are no sensible fixes, an 0x80/0xed choice could I assume be hung of DMI or something (if that _is_ parsed soon enough). Rene. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
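The alternative debated above - replacing the port-0x80 write with a delay that is a function of time rather than of chipset decoding - can be approximated in user space with a busy-wait on a monotonic clock. This is only a sketch of the idea, not kernel code; the 1000 ns figure is an assumption standing in for the duration of an ISA/LPC bus cycle, not a measured timing.

```c
#define _POSIX_C_SOURCE 199309L
#include <time.h>

/* Sketch of a time-based io_delay: spin on CLOCK_MONOTONIC until at
 * least `ns` nanoseconds have elapsed, instead of issuing a write to
 * port 0x80 and relying on the bus to stall. In the kernel this role
 * would be played by a calibrated udelay(). */
static void timed_io_delay(long ns)
{
	struct timespec start, now;

	clock_gettime(CLOCK_MONOTONIC, &start);
	do {
		clock_gettime(CLOCK_MONOTONIC, &now);
	} while ((now.tv_sec - start.tv_sec) * 1000000000L +
		 (now.tv_nsec - start.tv_nsec) < ns);
}
```

The point of the sketch is only that the pause becomes hardware-independent by construction, rather than by hoping port 0x80 is left to die on the LPC bus.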
RFC: outb 0x80 in inb_p, outb_p harmful on some modern AMD64 with MCP51 laptops
After much, much testing (months, off and on, pursuing hypotheses), I've discovered that the use of "outb al,0x80" instructions to "delay" after inb and outb instructions causes solid freezes on my HP dv9000z laptop, when ACPI is enabled. It takes a fair number of out's to 0x80, but the hard freeze is reliably reproducible by writing a driver that solely does a loop of 50 outb's to 0x80 and calling it in a loop 1000 times from user space. !!!

The serious impact is that the /dev/rtc and /dev/nvram devices are very unreliable - thus "hwclock" freezes very reliably while looping waiting for a new second value and calling "cat /dev/nvram" in a loop freezes the machine if done a few times in a row. This is reproducible, but requires a fair number of outb's to the 0x80 diagnostic port, and seems to require ACPI to be on.

io_64.h is the source of these particular instructions, via the CMOS_READ and CMOS_WRITE macros, which are defined in mc146818_64.h. (I wonder if the same problem occurs in 32-bit mode.)

I'm happy to complete and test a patch, but I'm curious what the right approach ought to be. I have to say I have no clue as to what ACPI is doing on this chipset (nvidia MCP51) that would make port 80 do this. A raw random guess is that something is logging POST codes, but if so, not clear what is problematic in ACPI mode. Any help/suggestions? Changing the delay instruction sequence from the outb to short jumps might be the safe thing. But Linus, et al. may have experience with that on other architectures like older Pentiums etc.
Re: [PATCH] x86: on x86_64, correct reading of PC RTC when update in progress in time_64.c
There are a couple of things I don't understand on this one. And I presume you thought the other two bug fixing patches I sent before this were OK to go, since on my system

Thomas Gleixner wrote:
> Still whitespace wreckage in your patches. I guess the kernel tree you
> made your patches against is already white space wrecked. I fixed that up
> manually, but please be more careful about that next time.

Um ... I fixed the whitespaces I detected from the first round with checkpatch.pl. Then for good measure I ran checkpatch.pl on the patches, then pasted the files directly into the emails. No problems detected. And I also just tried checkpatch.pl on the "sent" folder copy. No problems detected there. Where was the whitespace? Was it in the patches? Would you mind showing me the output so I can do a better job in the future?

>> Correct potentially unstable PC RTC time register reading in time_64.c
>>
>> Stop the use of an incorrect technique for reading the standard PC RTC
>> timer, which is documented to "disconnect" time registers from the bus
>> while updates are in progress. The use of the UIP flag while interrupts
>> are disabled to protect a 244 microsecond window is one of the Motorola
>> spec sheet's documented ways to read the RTC time registers reliably.
>> The patch updates the misleading comments and also minimizes the amount
>> of time that the kernel disables interrupts during the reading.
>
> While I think that the UIP change is correct and a must have, the locking
> change is not really useful. read_persistent_clock is called from exactly
> three places:

What locking change? I didn't change how locking works in read_persistent_clock at all. I did introduce cpu_relax() because if anyone else ever calls from a hot path, that would be good practice, and it's one line.

> Right after boot, right before suspend and right after resume. None of
> those places is a hot path, where we really care about the interrupt
> enable/disable. IIRC, this is even called with interrupts disabled most
> of the time, so no real gain here. Another reason not to do the locking
> change is the paravirt stuff which is coming for 64bit. I looked into the
> existing 32bit code and doing the same lock thing would introduce a real
> nasty hackery, which is definitely not worth the trouble.

I presume time_64.c and time_32.c will be unified at some point, discarding time_64.c. There's no arch-specific reason to be separate. The current time_32.c depends on a different nmi path (that does some weird stuff saving and restoring the CMOS index register!), and I didn't dare usurp your long-term plan to unify architectures. But a simple cleanup here makes sense, lest someone copy the bad technique as if it were good.

> Thanks,
>
> 	tglx

Signed-off-by: David P. Reed <[EMAIL PROTECTED]>
---
Index: linux-2.6/arch/x86/kernel/time_64.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/time_64.c
+++ linux-2.6/arch/x86/kernel/time_64.c
@@ -160,22 +160,30 @@ unsigned long read_persistent_clock(void
 	unsigned long flags;
 	unsigned century = 0;
 
-	spin_lock_irqsave(&rtc_lock, flags);
+ retry:	spin_lock_irqsave(&rtc_lock, flags);
+	/* if UIP is clear, then we have >= 244 microseconds before RTC
+	 * registers will be updated.  Spec sheet says that this is the
+	 * reliable way to read RTC - registers invalid (off bus) during update
+	 */
+	if ((CMOS_READ(RTC_FREQ_SELECT) & RTC_UIP)) {
+		spin_unlock_irqrestore(&rtc_lock, flags);
+		cpu_relax();
+		goto retry;
+	}
 
-	do {
-		sec = CMOS_READ(RTC_SECONDS);
-		min = CMOS_READ(RTC_MINUTES);
-		hour = CMOS_READ(RTC_HOURS);
-		day = CMOS_READ(RTC_DAY_OF_MONTH);
-		mon = CMOS_READ(RTC_MONTH);
-		year = CMOS_READ(RTC_YEAR);
+	/* now read all RTC registers while stable with interrupts disabled */
+
+	sec = CMOS_READ(RTC_SECONDS);
+	min = CMOS_READ(RTC_MINUTES);
+	hour = CMOS_READ(RTC_HOURS);
+	day = CMOS_READ(RTC_DAY_OF_MONTH);
+	mon = CMOS_READ(RTC_MONTH);
+	year = CMOS_READ(RTC_YEAR);
 
 #ifdef CONFIG_ACPI
-	if (acpi_gbl_FADT.header.revision >= FADT2_REVISION_ID &&
-	    acpi_gbl_FADT.century)
-		century = CMOS_READ(acpi_gbl_FADT.century);
+	if (acpi_gbl_FADT.header.revision >= FADT2_REVISION_ID &&
+	    acpi_gbl_FADT.century)
+		century = CMOS_READ(acpi_gbl_FADT.century);
 #endif
-	} while (sec != CMOS_READ(RTC_SECONDS));
-
 	spin_unlock_irqrestore(&rtc_lock, flags);
 
 	/*
[PATCH] x86: on x86_64, correct reading of PC RTC when update in progress in time_64.c
Correct potentially unstable PC RTC time register reading in time_64.c

Stop the use of an incorrect technique for reading the standard PC RTC timer, which is documented to "disconnect" time registers from the bus while updates are in progress. The use of the UIP flag while interrupts are disabled to protect a 244 microsecond window is one of the Motorola spec sheet's documented ways to read the RTC time registers reliably. The patch updates the misleading comments and also minimizes the amount of time that the kernel disables interrupts during the reading.

Signed-off-by: David P. Reed <[EMAIL PROTECTED]>
---
Index: linux-2.6/arch/x86/kernel/time_64.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/time_64.c
+++ linux-2.6/arch/x86/kernel/time_64.c
@@ -160,22 +160,30 @@ unsigned long read_persistent_clock(void
 	unsigned long flags;
 	unsigned century = 0;
 
-	spin_lock_irqsave(&rtc_lock, flags);
+ retry:	spin_lock_irqsave(&rtc_lock, flags);
+	/* if UIP is clear, then we have >= 244 microseconds before RTC
+	 * registers will be updated.  Spec sheet says that this is the
+	 * reliable way to read RTC - registers invalid (off bus) during update
+	 */
+	if ((CMOS_READ(RTC_FREQ_SELECT) & RTC_UIP)) {
+		spin_unlock_irqrestore(&rtc_lock, flags);
+		cpu_relax();
+		goto retry;
+	}
 
-	do {
-		sec = CMOS_READ(RTC_SECONDS);
-		min = CMOS_READ(RTC_MINUTES);
-		hour = CMOS_READ(RTC_HOURS);
-		day = CMOS_READ(RTC_DAY_OF_MONTH);
-		mon = CMOS_READ(RTC_MONTH);
-		year = CMOS_READ(RTC_YEAR);
+	/* now read all RTC registers while stable with interrupts disabled */
+
+	sec = CMOS_READ(RTC_SECONDS);
+	min = CMOS_READ(RTC_MINUTES);
+	hour = CMOS_READ(RTC_HOURS);
+	day = CMOS_READ(RTC_DAY_OF_MONTH);
+	mon = CMOS_READ(RTC_MONTH);
+	year = CMOS_READ(RTC_YEAR);
 
 #ifdef CONFIG_ACPI
-	if (acpi_gbl_FADT.header.revision >= FADT2_REVISION_ID &&
-	    acpi_gbl_FADT.century)
-		century = CMOS_READ(acpi_gbl_FADT.century);
+	if (acpi_gbl_FADT.header.revision >= FADT2_REVISION_ID &&
+	    acpi_gbl_FADT.century)
+		century = CMOS_READ(acpi_gbl_FADT.century);
 #endif
-	} while (sec != CMOS_READ(RTC_SECONDS));
-
 	spin_unlock_irqrestore(&rtc_lock, flags);
 
 	/*
[PATCH] time: fix typo that makes sync_cmos_clock erratic
Fix a typo in ntp.c that has caused updating of the persistent (RTC) clock when synced to NTP to behave erratically.

When debugging a freeze that arises on my AMD64 machines when I run the ntpd service, I added a number of printk's to monitor the sync_cmos_clock procedure. I discovered that it was not syncing to cmos RTC every 11 minutes as documented, but instead would keep trying every second for hours at a time. The reason turned out to be a typo in sync_cmos_clock, where it attempts to ensure that update_persistent_clock is called very close to 500 msec after a 1 second boundary (required by the PC RTC's spec). That typo referred to "xtime" in one spot, rather than "now", which is derived from "xtime" but not equal to it. This makes the test erratic, creating a "coin-flip" that decides when update_persistent_clock is called - when it is called, which is rarely, it may be at any time during the one second period, rather than close to 500 msec, so the value written is needlessly incorrect, too.

Signed-off-by: David P. Reed
---
I have thoroughly tested this on several x86_64 machines, and also on one 32-bit machine. I have also submitted a patch to fix the freeze cited above, but both patches are independent, and should be treated separately.

Index: linux-2.6/kernel/time/ntp.c
===================================================================
--- linux-2.6.orig/kernel/time/ntp.c
+++ linux-2.6/kernel/time/ntp.c
@@ -205,7 +205,7 @@ static void sync_cmos_clock(unsigned lon
 		return;
 
 	getnstimeofday(&now);
-	if (abs(xtime.tv_nsec - (NSEC_PER_SEC / 2)) <= tick_nsec / 2)
+	if (abs(now.tv_nsec - (NSEC_PER_SEC / 2)) <= tick_nsec / 2)
 		fail = update_persistent_clock(now);
 
 	next.tv_nsec = (NSEC_PER_SEC / 2) - now.tv_nsec;
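The window test at the heart of this fix can be modeled in isolation. A user-space sketch of the corrected condition (the helper name is invented for illustration, and tick_nsec is assumed to be the HZ=1000 value of 1,000,000 ns):

```c
#include <stdlib.h>

#define NSEC_PER_SEC 1000000000L

/* Assumed tick length for HZ=1000; illustration only */
static const long tick_nsec = 1000000L;

/* Model of the corrected test in sync_cmos_clock(): write the RTC only
 * when the current time-of-day nanosecond count is within half a tick
 * of the 500 msec point. The bug was testing xtime.tv_nsec here rather
 * than now.tv_nsec, making the comparison effectively random. */
static int close_enough_to_half_second(long now_tv_nsec)
{
	return labs(now_tv_nsec - (NSEC_PER_SEC / 2)) <= tick_nsec / 2;
}
```

With these assumed values, the write window is 500 ms +/- 0.5 ms within each second, which is why a value derived from the actual `now` sample must be used.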
[PATCH] x86: fix freeze in x86_64 RTC update code in time_64.c
Fix hard freeze on x86_64 when the ntpd service calls update_persistent_clock()

A repeatable but randomly timed freeze has been happening in Fedora 6 and 7 for the last year, whenever I run the ntpd service on my AMD64x2 HP Pavilion dv9000z laptop. This freeze is due to the use of spin_lock(&rtc_lock) under the assumption (per a bad comment) that set_rtc_mmss is called only with interrupts disabled. The call from ntp.c to update_persistent_clock is made with interrupts enabled.

Signed-off-by: David P. Reed <[EMAIL PROTECTED]>
---
I have thoroughly tested this patch on a number of x86_64 machines with various numbers of cores and chipsets, using 2.6.24rc2 kernel source.

Index: linux-2.6/arch/x86/kernel/time_64.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/time_64.c
+++ linux-2.6/arch/x86/kernel/time_64.c
@@ -82,18 +82,15 @@ static int set_rtc_mmss(unsigned long no
 	int retval = 0;
 	int real_seconds, real_minutes, cmos_minutes;
 	unsigned char control, freq_select;
+	unsigned long flags;
 
 	/*
-	 * IRQs are disabled when we're called from the timer interrupt,
-	 * no need for spin_lock_irqsave()
+	 * set_rtc_mmss is called when irqs are enabled, so disable irqs here
 	 */
-
-	spin_lock(&rtc_lock);
-
+	spin_lock_irqsave(&rtc_lock, flags);
 	/*
 	 * Tell the clock it's being set and stop it.
 	 */
-
 	control = CMOS_READ(RTC_CONTROL);
 	CMOS_WRITE(control | RTC_SET, RTC_CONTROL);
 
@@ -138,7 +135,7 @@ static int set_rtc_mmss(unsigned long no
 	CMOS_WRITE(control, RTC_CONTROL);
 	CMOS_WRITE(freq_select, RTC_FREQ_SELECT);
 
-	spin_unlock(&rtc_lock);
+	spin_unlock_irqrestore(&rtc_lock, flags);
 
 	return retval;
 }
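The essential property of the irqsave variant is that it records the caller's interrupt state and puts it back afterwards, so the section is safe whether the caller arrives with interrupts enabled (the ntpd path) or already disabled (a timer-interrupt path). A small user-space model of that behavior - the flag variable and helper names are stand-ins invented for this sketch, not kernel code:

```c
/* Models the CPU's interrupt-enable flag */
static int irqs_enabled = 1;

static void local_irq_save_model(unsigned long *flags)
{
	*flags = (unsigned long)irqs_enabled;	/* remember caller's state */
	irqs_enabled = 0;			/* interrupts off in the section */
}

static void local_irq_restore_model(unsigned long flags)
{
	irqs_enabled = (int)flags;		/* back to whatever the caller had */
}

/* The RTC write section, shaped like the fixed set_rtc_mmss(): it no
 * longer assumes anything about the caller's interrupt state. */
static void rtc_write_section(void)
{
	unsigned long flags;

	local_irq_save_model(&flags);
	/* ... CMOS register writes would happen here, interrupts off ... */
	local_irq_restore_model(flags);
}
```

The plain spin_lock() version implicitly assumed the first case never happened, which is exactly the bad comment the patch removes.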
Re: [PATCH] x86: fix locking and sync bugs in x86_64 RTC code in time_64.c
Cool. More reason to separate this into two.

Matt Mackall wrote:
> On Wed, Nov 14, 2007 at 08:10:22AM -0500, David P. Reed wrote:
>> Will make two patches and resend.
>
> I've already got a set of patches brewing to fix the UIP problem across
> all the affected arches (11+).
Re: [PATCH] x86: fix locking and sync bugs in x86_64 RTC code in time_64.c
Will make two patches and resend.

Thomas Gleixner wrote:
> David,
>
> On Mon, 12 Nov 2007, David P. Reed wrote:
>> From: David P. Reed
>>
>> Fix two bugs in arch/x86/kernel/time_64.c that affect the x86_64 kernel.
>> 1) a repeatable hard freeze due to interrupts when the ntpd service
>> calls update_persistent_clock(), 2) potentially unstable PC RTC timer
>> values when timer is read.
>
> 1) Please send separate patches for separate problems.
>
> 2) Your patch is white space damaged and does not apply. Please run it
> through scripts/checkpatch.pl before submitting, maybe send it to
> yourself first and verify that it applies correctly.
>
>> Signed-off-by: David P. Reed <[EMAIL PROTECTED]>
>> ---
>> More explanation:
>> 1) A repeatable but randomly timed freeze has been happening in Fedora 6
>> and 7 for the last year, whenever I run the ntpd service on my AMD64x2
>> HP Pavilion dv9000z laptop. This freeze is due to the use of
>> spin_lock(&rtc_lock) under the assumption (per a bad comment) that
>> set_rtc_mmss is called only with interrupts disabled. The call from
>> ntp.c to update_persistent_clock is made with interrupts enabled.
>
> Good catch.
>
>> 2) the use of an incorrect technique for reading the standard PC RTC
>> timer, which is documented to "disconnect" time registers from the bus
>> while updates are in progress. The use of the UIP flag while interrupts
>> are disabled to protect a 244 microsecond window is one of the Motorola
>> spec sheet's documented ways to read the RTC time registers reliably. I
>> realize that not all "clones" of the
>
> of the what ?
>
> Also put the detailed description into the patch comment, please.
>
>> The patch updates the misleading comments and minimizes the amount of
>> time that the kernel disables interrupts. I have thoroughly tested this
>> patch on a number of x86_64 machines with various numbers of cores and
>> chipsets, using 2.6.24rc2 kernel source. Note that while testing the ntp
>> code I found another bug in kernel/time/ntp.c which is independent of
>> this fix - neither patch requires the other. If possible, I'd love to
>> see the patch merged with 2.6.24, and even with 2.6.23.
>
> Care to resend ?
>
> Thanks,
>
> 	tglx
[PATCH] time: fix typo that makes sync_cmos_clock erratic
From: David P. Reed

Fix a typo in ntp.c that has caused updating of the persistent (RTC) clock when synced to NTP to behave erratically.

Signed-off-by: David P. Reed
---
When debugging a freeze that arises on my AMD64 machines when I run the ntpd service, I added a number of printk's to monitor the sync_cmos_clock procedure. I discovered that it was not syncing to cmos RTC every 11 minutes as documented, but instead would keep trying every second for hours at a time. The reason turned out to be a typo in sync_cmos_clock, where it attempts to ensure that update_persistent_clock is called very close to 500 msec after a 1 second boundary (required by the PC RTC's spec). That typo referred to "xtime" in one spot, rather than "now", which is derived from "xtime" but not equal to it. This makes the test erratic, creating a "coin-flip" that decides when update_persistent_clock is called - when it is called, which is rarely, it may be at any time during the one second period, rather than close to 500 msec, so the value written is needlessly incorrect, too.

I rewrote the code to fix the typo and to be a little more clear about the logic that was intended here. I have thoroughly tested this on several x86_64 machines, and also on one 32-bit machine. I have also submitted a patch to fix the freeze cited above, but both patches are independent, and should be treated separately. The patch works in 2.6.24rc2, but it should also work in 2.6.23.

Index: linux-2.6/kernel/time/ntp.c
===================================================================
--- linux-2.6.orig/kernel/time/ntp.c
+++ linux-2.6/kernel/time/ntp.c
@@ -188,6 +188,7 @@ static DEFINE_TIMER(sync_cmos_timer, syn
 static void sync_cmos_clock(unsigned long dummy)
 {
 	struct timespec now, next;
+	long delta_from_target_time;
 	int fail = 1;
 
 	/*
@@ -205,7 +206,10 @@ static void sync_cmos_clock(unsigned lon
 		return;
 
 	getnstimeofday(&now);
-	if (abs(xtime.tv_nsec - (NSEC_PER_SEC / 2)) <= tick_nsec / 2)
+	delta_from_target_time = abs(now.tv_nsec - (NSEC_PER_SEC / 2));
+
+	/* set CMOS clock, only if close enough to 500 msec point */
+	if (delta_from_target_time <= tick_nsec / 2)
 		fail = update_persistent_clock(now);
 
 	next.tv_nsec = (NSEC_PER_SEC / 2) - now.tv_nsec;
[PATCH] x86: fix locking and sync bugs in x86_64 RTC code in time_64.c
From: David P. Reed

Fix two bugs in arch/x86/kernel/time_64.c that affect the x86_64 kernel. 1) a repeatable hard freeze due to interrupts when the ntpd service calls update_persistent_clock(), 2) potentially unstable PC RTC timer values when timer is read.

Signed-off-by: David P. Reed <[EMAIL PROTECTED]>
---
More explanation:

1) A repeatable but randomly timed freeze has been happening in Fedora 6 and 7 for the last year, whenever I run the ntpd service on my AMD64x2 HP Pavilion dv9000z laptop. This freeze is due to the use of spin_lock(&rtc_lock) under the assumption (per a bad comment) that set_rtc_mmss is called only with interrupts disabled. The call from ntp.c to update_persistent_clock is made with interrupts enabled.

2) the use of an incorrect technique for reading the standard PC RTC timer, which is documented to "disconnect" time registers from the bus while updates are in progress. The use of the UIP flag while interrupts are disabled to protect a 244 microsecond window is one of the Motorola spec sheet's documented ways to read the RTC time registers reliably. I realize that not all "clones" of the

The patch updates the misleading comments and minimizes the amount of time that the kernel disables interrupts. I have thoroughly tested this patch on a number of x86_64 machines with various numbers of cores and chipsets, using 2.6.24rc2 kernel source. Note that while testing the ntp code I found another bug in kernel/time/ntp.c which is independent of this fix - neither patch requires the other. If possible, I'd love to see the patch merged with 2.6.24, and even with 2.6.23.

Index: linux-2.6/arch/x86/kernel/time_64.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/time_64.c
+++ linux-2.6/arch/x86/kernel/time_64.c
@@ -82,13 +82,12 @@ static int set_rtc_mmss(unsigned long no
 	int retval = 0;
 	int real_seconds, real_minutes, cmos_minutes;
 	unsigned char control, freq_select;
+	unsigned long flags;
 
-	/*
-	 * IRQs are disabled when we're called from the timer interrupt,
-	 * no need for spin_lock_irqsave()
+	/*
+	 * set_rtc_mmss is called when irqs are enabled, so disable irqs here
 	 */
-
-	spin_lock(&rtc_lock);
+	spin_lock_irqsave(&rtc_lock, flags);
 
 	/*
 	 * Tell the clock it's being set and stop it.
@@ -138,7 +137,7 @@ static int set_rtc_mmss(unsigned long no
 	CMOS_WRITE(control, RTC_CONTROL);
 	CMOS_WRITE(freq_select, RTC_FREQ_SELECT);
 
-	spin_unlock(&rtc_lock);
+	spin_unlock_irqrestore(&rtc_lock, flags);
 
 	return retval;
 }
@@ -163,22 +162,30 @@ unsigned long read_persistent_clock(void
 	unsigned long flags;
 	unsigned century = 0;
 
-	spin_lock_irqsave(&rtc_lock, flags);
+ retry:	spin_lock_irqsave(&rtc_lock, flags);
+	/* if UIP is clear, then we have >= 244 microseconds before RTC
+	 * registers will be updated.  Spec sheet says that this is the
+	 * reliable way to read RTC - registers invalid (off bus) during update
+	 */
+	if ((CMOS_READ(RTC_FREQ_SELECT) & RTC_UIP)) {
+		spin_unlock_irqrestore(&rtc_lock, flags);
+		cpu_relax();
+		goto retry;
+	}
 
-	do {
-		sec = CMOS_READ(RTC_SECONDS);
-		min = CMOS_READ(RTC_MINUTES);
-		hour = CMOS_READ(RTC_HOURS);
-		day = CMOS_READ(RTC_DAY_OF_MONTH);
-		mon = CMOS_READ(RTC_MONTH);
-		year = CMOS_READ(RTC_YEAR);
+	/* now read all RTC registers while stable with interrupts disabled */
+
+	sec = CMOS_READ(RTC_SECONDS);
+	min = CMOS_READ(RTC_MINUTES);
+	hour = CMOS_READ(RTC_HOURS);
+	day = CMOS_READ(RTC_DAY_OF_MONTH);
+	mon = CMOS_READ(RTC_MONTH);
+	year = CMOS_READ(RTC_YEAR);
 
 #ifdef CONFIG_ACPI
-	if (acpi_gbl_FADT.header.revision >= FADT2_REVISION_ID &&
-	    acpi_gbl_FADT.century)
-		century = CMOS_READ(acpi_gbl_FADT.century);
+	if (acpi_gbl_FADT.header.revision >= FADT2_REVISION_ID &&
+	    acpi_gbl_FADT.century)
+		century = CMOS_READ(acpi_gbl_FADT.century);
 #endif
-	} while (sec != CMOS_READ(RTC_SECONDS));
-
 	spin_unlock_irqrestore(&rtc_lock, flags);
 
 	/*
Re: RFC: do get_rtc_time() correctly
Alan: Thanks for the comment. I will code a patch, and include a sanity check as you suggested, and send it for review.

Just to clarify one concern your note raised: I understand that SMM/SMI servicing can take a long time, but SMM/SMI shouldn't happen while interrupts are masked using local_irq_disable() [included in spin_lock_irq()], at least on x86 architectures. If SMM/SMI can happen even then, the NMI fix below could be generalized.

My mention of NMI (which by definition can't be masked) is because NMI can happen even while interrupts are masked. This is a timing problem that can't be dealt with by masking interrupts, and NMI's are used for watchdogs, etc. these days. It seems like just a general good thing to be able to ask if an NMI has happened. A per-cpu NMI event count that is incremented every NMI would allow one to detect NMI's that happen during an otherwise masked code sequence by reading it at the beginning and end of the code sequence. Don't know if NMIs are common on other architectures, or if this is an architecture-dependent concern.

Perhaps I'm really talking about two patches here. One for a mechanism to detect NMIs that happen during a critical piece of code (so it can be retried), and one that depends on that to be really proper in reading the RTC reliably.

Alan Cox wrote:
>> So the proper way to read the RTC contents is to read the UIP flag, and
>> if zero, read all the RTC registers with interrupts masked completely,
>> so all reads happen in the 244 usec window. (NMI can still be a problem,
>> but you can have NMI's set a flag that forces a retry.)
>
> SMM/SMI is more likely to be what bumps you 244 usec or more.
>
>> I'm happy to code and test a patch. Rather than just submit a patch, I
>> thought I'd request others' comments on this, since it affects so many
>> architectures. cc me, if you will, as I don't subscribe to LKML, just
>> check it periodically.
>
> Go for it. The other architectures generally inherit it by inheriting
> similar bridge chips or in more modern times the RTC macrocell. It should
> also be possible to debug by putting in an optional sanity check
> initially which checks the read made sense compared to a few more -
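The per-cpu NMI event count idea above can be sketched in user space: sample a counter that the NMI handler increments before and after the critical read, and retry if it changed. Every name here is illustrative, invented for this sketch, not an existing kernel API.

```c
/* Counter the (simulated) NMI handler bumps on every NMI */
static unsigned long nmi_count;

static void simulated_nmi_handler(void)
{
	nmi_count++;
}

/* Run a critical read; if the NMI count changed while it ran, the read
 * may have been torn by the NMI, so discard the value and retry. */
static int read_with_nmi_retry(int (*read_fn)(void))
{
	unsigned long before;
	int value;

	do {
		before = nmi_count;
		value = read_fn();		/* the retryable section */
	} while (nmi_count != before);		/* an NMI hit: try again */
	return value;
}

/* Demonstration reader: "interrupted" by an NMI on its first call */
static int calls;
static int demo_read(void)
{
	if (++calls == 1)
		simulated_nmi_handler();
	return 42;
}
```

In the real kernel the counter would be per-cpu and incremented from the NMI path; the retry loop is exactly the "set a flag that forces a retry" scheme mentioned in the RTC discussion.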
RFC: do get_rtc_time() correctly
In trying to track down a bug related to hwclock hanging my x86_64 machine, I found myself reading include/asm-generic/rtc.h carefully with a chipset spec for several RTC chips (in particular, the granddaddy, the MC146818A) in my hand. I found that the code in get_rtc_time() is very, very odd, and IMO very wrong.

The idea in the comment at the top seems to suggest that the author thought that the UIP flag indicates an update is in progress at that very instant, so one needs to synchronize with the "falling edge" of that flag to ensure that one can read the RTC state without instability in its buffered value.

That is not the way the UIP flag is defined to work. The UIP flag is =1 during a period PRIOR to the actual update, starting 244 usec before the update, and ending when the update is complete. It is done that way (which might seem odd) so that if you read UIP=0, you have a 244 usec window, EVEN IF the UIP were to become =1 just after you read it.

So the proper way to read the RTC contents is to read the UIP flag, and if zero, read all the RTC registers with interrupts masked completely, so all reads happen in the 244 usec window. (NMI can still be a problem, but you can have NMI's set a flag that forces a retry.) If the UIP flag is one, you need to try again. Pseudo-code is as follows:

retry:
	spin_lock_irq(&rtc_lock);
	if (UIP_flag != 0) {
		spin_unlock_irq(&rtc_lock);
		cpu_relax();
		goto retry;
	}
	... read RTC registers ...
	spin_unlock_irq(&rtc_lock);

This should work for all RTC's compatible with the MC146818A, and is also somewhat faster and less masked than the code in the current Linux (not that reading RTC's is crucial for performance, but the current code occasionally *loops with all interrupts masked for 10 msec!* Why anyone thought this necessary, I have no clue.)

I'm happy to code and test a patch. Rather than just submit a patch, I thought I'd request others' comments on this, since it affects so many architectures. cc me, if you will, as I don't subscribe to LKML, just check it periodically.

As noted in the comment, it *is* true that setting the RTC clock needs additional synchronization, which can be done in the drivers, as it seems to be. (Though I would use an API that is designed so that any delay during the set period actually adds to the value being stored, so that delaying during the set_rtc operation would not cause the value stored to be "old".)

- David

PS: I wrote code for 8088 versions of various PC DOS apps back in the '80's, such as VisiCalc and Lotus 1-2-3, and hacked many timing-related drivers back in the day - so I know this chip spec cold, and figured out this odd stuff back then. That's why the weirdness jumped out at me. I'm still an assembly language coder at heart.
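The UIP-gated read can be exercised against a mock CMOS register file. Everything here (the mock reader, the register numbers, the fixed "seconds" value) is an illustrative stand-in for the real CMOS_READ, not kernel code: UIP reads as set for a few polls, then clear, at which point the time registers are read inside the guaranteed 244 usec window.

```c
#define RTC_FREQ_SELECT	0x0A	/* register A, holds the UIP bit */
#define RTC_UIP		0x80
#define RTC_SECONDS	0x00

/* Mock CMOS: UIP appears set for the first few polls, then clear */
static int uip_polls_remaining = 3;

static unsigned char mock_cmos_read(int reg)
{
	if (reg == RTC_FREQ_SELECT)
		return uip_polls_remaining-- > 0 ? RTC_UIP : 0;
	return 30;	/* pretend the seconds register reads 30 */
}

/* The UIP-gated read: retry until UIP is clear, then read inside the
 * >= 244 usec window the MC146818A guarantees after UIP reads as 0. */
static unsigned char read_rtc_seconds(void)
{
	while (mock_cmos_read(RTC_FREQ_SELECT) & RTC_UIP)
		;	/* in the kernel: drop the lock, cpu_relax(), retry */
	return mock_cmos_read(RTC_SECONDS);
}
```

The mock collapses the lock/unlock dance, but the control flow is the same as the pseudo-code in the RFC: poll, bail out and retry while UIP is set, read only once it is clear.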