Re: [BUG] 2.6.24 refuses to boot - ATA problem?

2008-02-04 Thread Mark Lord

Gene Heskett wrote:

On Sunday 03 February 2008, Ingo Molnar wrote:

* Gene Heskett [EMAIL PROTECTED] wrote:

I believe its the same, but lemme paste it for sure, yes:
[   26.339926] ENABLING IO-APIC IRQs
[   26.340119] ..TIMER: vector=0x31 apic1=0 pin1=0 apic2=-1 pin2=-1
[   26.350129] ..MP-BIOS bug: 8254 timer not connected to IO-APIC
[   26.350182] ...trying to set up timer (IRQ0) through the 8259A ...  failed.
[   26.350185] ...trying to set up timer as Virtual Wire IRQ... failed.
[   26.360186] ...trying to set up timer as ExtINT IRQ... works.

The third line is the only line that makes it to the screen during the
boot trace.

Now, what does this tell us?

the question would be:

- if you remove the acpi_use_timer_override boot flag
- and if you boot a kernel with this hack applied

= do those weird PATA failures come back?

If the failures do _not_ come back then the problem is somehow
affected/worked-around by the IO-APIC code that generates the above 4
lines. If the failures are still the same then the above 4 lines are
really just an uninteresting side-effect of the acpi_use_timer_override
flag - and the real side-effects (that fixes PATA on your box) are to be
found elsewhere.

Sadly, the latter variant is the expected answer.

Ingo


And at this point, I can't tell.  This reboot was from a cold start, without 
the argument, and cold by long enough to make the rounds about the house and 
pick up a beer, but not take my evening pillbox.  A minute cold, maybe 2 max.  
The log is clean since except for a kudzu nag of some sort:

..

Just to muddy your observations:  it is quite possible that a cold (power-off)
reboot may be required to properly observe what happens here.

Cheers
-
To unsubscribe from this list: send the line unsubscribe linux-ide in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [BUG] 2.6.24 refuses to boot - ATA problem?

2008-02-04 Thread Gene Heskett
On Monday 04 February 2008, Mark Lord wrote:
Gene Heskett wrote:
 On Sunday 03 February 2008, Ingo Molnar wrote:
 * Gene Heskett [EMAIL PROTECTED] wrote:
 I believe its the same, but lemme paste it for sure, yes:
 [   26.339926] ENABLING IO-APIC IRQs
 [   26.340119] ..TIMER: vector=0x31 apic1=0 pin1=0 apic2=-1 pin2=-1
 [   26.350129] ..MP-BIOS bug: 8254 timer not connected to IO-APIC
 [   26.350182] ...trying to set up timer (IRQ0) through the 8259A ...  failed.
 [   26.350185] ...trying to set up timer as Virtual Wire IRQ... failed.
 [   26.360186] ...trying to set up timer as ExtINT IRQ... works.

 The third line is the only line that makes it to the screen during the
 boot trace.

 Now, what does this tell us?

 the question would be:

 - if you remove the acpi_use_timer_override boot flag
 - and if you boot a kernel with this hack applied

 = do those weird PATA failures come back?

 If the failures do _not_ come back then the problem is somehow
 affected/worked-around by the IO-APIC code that generates the above 4
 lines. If the failures are still the same then the above 4 lines are
 really just an uninteresting side-effect of the acpi_use_timer_override
 flag - and the real side-effects (that fixes PATA on your box) are to be
 found elsewhere.

 Sadly, the latter variant is the expected answer.

 Ingo

 And at this point, I can't tell.  This reboot was from a cold start,
 without the argument, and cold by long enough to make the rounds about the
 house and pick up a beer, but not take my evening pillbox.  A minute cold,
 maybe 2 max. The log is clean since except for a kudzu nag of some sort:

..

Just to muddy your observations:  it is quite possible that a cold
 (power-off) reboot may be required to properly observe what happens here.

Precisely why I've now done that twice, without using the extra argument.  No 
recurrence dammit.

Cheers



-- 
Cheers, Gene
There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order.
-Ed Howdershelt (Author)
He who makes a beast of himself gets rid of the pain of being a man.
-- Dr. Johnson


Re: [BUG] 2.6.24 refuses to boot - ATA problem?

2008-02-02 Thread Jeff Garzik

Chris Rankin wrote:

ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
ata1.00: cmd c8/00:08:00:00:00/00:00:00:00:00/e0 tag 0 dma 4096 in
res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
ata1.00: status: { DRDY }
ata1: soft resetting link
ata1.00: configured for UDMA/66
ata1: EH complete
ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
ata1.00: cmd c8/00:08:00:00:00/00:00:00:00:00/e0 tag 0 dma 4096 in
res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
ata1.00: status: { DRDY }
ata1: soft resetting link
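
For anyone reading the report above: the first byte of the `res` line (0x40) is the ATA status register, and `Emask 0x4` is a bitmask describing the error. What follows is a minimal sketch, assuming the bit values match libata's `AC_ERR_*` flags in `include/linux/libata.h` (worth checking against the kernel in question). The helper names `emask_name` and `ready_but_timed_out` are hypothetical and not kernel functions.

```c
#include <stddef.h>

/* Bit values assumed to mirror libata's AC_ERR_* flags; verify against
 * include/linux/libata.h for the kernel being debugged. */
enum {
    ERR_DEV     = 1 << 0,  /* device reported an error */
    ERR_HSM     = 1 << 1,  /* host state machine violation */
    ERR_TIMEOUT = 1 << 2,  /* command timed out */
    ERR_MEDIA   = 1 << 3,  /* media error */
    ERR_ATA_BUS = 1 << 4   /* ATA bus error */
};

#define STATUS_DRDY 0x40   /* ATA status register bit 6: device ready */

/* Hypothetical helper: describe the first matching bit of an Emask value,
 * checking the bits in the order listed above. */
const char *emask_name(unsigned int emask)
{
    if (emask & ERR_DEV)     return "device error";
    if (emask & ERR_HSM)     return "HSM violation";
    if (emask & ERR_TIMEOUT) return "timeout";
    if (emask & ERR_MEDIA)   return "media error";
    if (emask & ERR_ATA_BUS) return "ATA bus error";
    return emask ? "other" : "none";
}

/* Returns 1 when the result matches the pattern in the report: the drive
 * claims to be ready (DRDY) yet the command timed out. */
int ready_but_timed_out(unsigned int status, unsigned int emask)
{
    return (status & STATUS_DRDY) && (emask & ERR_TIMEOUT);
}
```

With the values from the report, `emask_name(0x4)` yields "timeout" and `ready_but_timed_out(0x40, 0x4)` is true. That pattern is at least consistent with a completion interrupt that never arrived rather than a drive that reported a fault, though the log alone does not prove it.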



Had at least one other report like this...  Sleepiness prevents me from 
recalling more at the moment, but I think the other report was fixed 
with a special ACPI switch...


/me puts in pile for Monday...

Jeff




Re: [BUG] 2.6.24 refuses to boot - ATA problem?

2008-02-02 Thread Gene Heskett
On Saturday 02 February 2008, Jeff Garzik wrote:
Chris Rankin wrote:
 ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
 ata1.00: cmd c8/00:08:00:00:00/00:00:00:00:00/e0 tag 0 dma 4096 in
 res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
 ata1.00: status: { DRDY }
 ata1: soft resetting link
 ata1.00: configured for UDMA/66
 ata1: EH complete
 ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
 ata1.00: cmd c8/00:08:00:00:00/00:00:00:00:00/e0 tag 0 dma 4096 in
 res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
 ata1.00: status: { DRDY }
 ata1: soft resetting link

Had at least one other report like this...  Sleepiness prevents me from
recalling more at the moment, but I think the other report was fixed
with a special ACPI switch...

I think that one came from me, but it also gets over 14,000 hits on google.

Now Jeff, here is the strange part.  That error was killing me, many times 
an hour and eventually crashing completely, repeatedly.

I applied that kernel argument acpi_use_timer_override once and have not 
had the error since, and that includes one test of a full let it cool for 
a minute powerdown reboot to see if it would come back, which it did not.

That argument causes the kernel to log this as it's responding to that command:

[   27.097095] ENABLING IO-APIC IRQs
[   27.097287] ..TIMER: vector=0x31 apic1=0 pin1=2 apic2=-1 pin2=-1
[   27.107291] ..MP-BIOS bug: 8254 timer not connected to IO-APIC
[   27.107343] ...trying to set up timer (IRQ0) through the 8259A ...  failed.
[   27.107346] ...trying to set up timer as Virtual Wire IRQ... failed.
[   27.117353] ...trying to set up timer as ExtINT IRQ... works.

The last 4 lines above are not logged without that argument.  So my theory ATM
is that this forced the kernel to initialize something in the board's
registers that it does not initialize without that command, and that its
going fubar as shown in the msg quoted above is a totally random thing, perhaps 
dependent on the phase of one of Jupiter's moons as to what state it powers 
up in.  And I got lucky so far, in that my single powerdown reset didn't 
trigger it again...  And you _know_ what that knocking sound is by now. :)
 
That's my admittedly hardware oriented view of the goings on.  But I also
think it should be a good clue as to what piece of the acpi code
needs walked around in and its tires kicked again, with an eye toward 
making that item a wee bit more intelligently done.  If you can cobble
up something that will extract the data and prove what fails, I'll be 
glad to play guinea pig.  With ccache, a kernel build is  15 minutes to
actually running it.

My $0.02 in 1934 dollars.  Adjust for inflation since.

/me puts in pile for Monday...

   Jeff

Thanks Jeff.  I'm glad to see that this isn't scheduled to 'fall through
the cracks' as does happen when folks get busy.

-- 
Cheers, Gene
There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order.
-Ed Howdershelt (Author)
What!?  Me worry?
-- Alfred E. Newman


Re: [BUG] 2.6.24 refuses to boot - ATA problem?

2008-02-02 Thread Ingo Molnar

* Gene Heskett [EMAIL PROTECTED] wrote:

 I think that one came from me, but it also gets over 14,000 hits on 
 google.
 
 Now Jeff, here is the strange part.  That error was killing me, many 
 times an hour and eventually crashing completely, repeatedly.
 
 I applied that kernel argument acpi_use_timer_override once and have 
 not had the error since, and that includes one test of a full let it 
 cool for a minute powerdown reboot to see if it would come back, which 
 it did not.
 
 That argument causes the kernel to log this as its responding to that 
 command:
 
 [   27.097095] ENABLING IO-APIC IRQs
 [   27.097287] ..TIMER: vector=0x31 apic1=0 pin1=2 apic2=-1 pin2=-1
 [   27.107291] ..MP-BIOS bug: 8254 timer not connected to IO-APIC
 [   27.107343] ...trying to set up timer (IRQ0) through the 8259A ...  failed.
 [   27.107346] ...trying to set up timer as Virtual Wire IRQ... failed.
 [   27.117353] ...trying to set up timer as ExtINT IRQ... works.
 
 The last 4 lines above are not logged without that argument.  So my 
 theory ATM is that this forced the kernel to initialize something in 
 the boards registers that it does not initialize without that command, 
 and that its going fubar as shown in the msg quoted above is a totally 
 random thing, perhaps dependent on the phase of one of jupiters moons 
 as to what state it powers up in.  And I got lucky, so far in that my 
 single powerdown reset didn't trigger it again...  And you _know_ what 
 that knocking sound is by now. :)

that's weird. Could you try the hack below and _remove_ the 
acpi_use_timer_override flag? The change should artificially cause the 
above 4 lines to appear again, in all cases.

This would test the following aspects of your theory: is this unknown 
side-effect of the acpi_use_timer_override flag related to the timer 
setup sequence in io_apic_32.c? If not, then the difference most likely 
lies in the different ACPI setup sequence.

Ingo

---
 arch/x86/kernel/io_apic_32.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Index: linux/arch/x86/kernel/io_apic_32.c
===
--- linux.orig/arch/x86/kernel/io_apic_32.c
+++ linux/arch/x86/kernel/io_apic_32.c
@@ -2208,7 +2208,7 @@ static inline void __init check_timer(vo
 		 * Ok, does IRQ0 through the IOAPIC work?
 		 */
 		unmask_IO_APIC_irq(0);
-		if (timer_irq_works()) {
+		if (timer_irq_works() && 0) {
 			if (nmi_watchdog == NMI_IO_APIC) {
 				disable_8259A_irq(0);
 				setup_nmi();
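
To make the intent of the one-line change above easier to follow: it forces the first `timer_irq_works()` test to be false, so the timer setup always falls through to the fallback attempts that print the extra log lines. The following is a rough userspace model of that fallback order only, not the kernel's `check_timer()`. The type and function names (`probe_results`, `pick_timer_route`) are hypothetical, and the boolean fields stand in for hardware probes.

```c
#include <stdbool.h>

/* Which timer routing a simplified check_timer() would end up using. */
enum timer_route {
    ROUTE_IOAPIC,        /* IRQ0 through the IO-APIC pin */
    ROUTE_8259A,         /* IRQ0 through the 8259A */
    ROUTE_VIRTUAL_WIRE,  /* timer as Virtual Wire IRQ */
    ROUTE_EXTINT,        /* timer as ExtINT IRQ */
    ROUTE_NONE           /* nothing worked */
};

/* Hypothetical stand-ins for the hardware probes: each field says whether
 * the timer would tick when routed that way. */
struct probe_results {
    bool ioapic;
    bool through_8259a;
    bool virtual_wire;
    bool extint;
};

/* Model of the fallback order. 'hack' models the forced-false test: the
 * IO-APIC probe result is ignored even when it would have succeeded. */
enum timer_route pick_timer_route(const struct probe_results *p, bool hack)
{
    if (p->ioapic && !hack)
        return ROUTE_IOAPIC;   /* normal case: no fallback messages logged */

    /* From here on the kernel logs the "MP-BIOS bug" line and one line
     * per attempt, ending in "failed." or "works." */
    if (p->through_8259a)
        return ROUTE_8259A;
    if (p->virtual_wire)
        return ROUTE_VIRTUAL_WIRE;
    if (p->extint)
        return ROUTE_EXTINT;
    return ROUTE_NONE;
}
```

In this model, a box where only the IO-APIC and ExtINT routes work ends up on `ROUTE_EXTINT` once `hack` is set, which corresponds to the "failed / failed / works" sequence in the logs quoted in this thread.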


Re: [BUG] 2.6.24 refuses to boot - ATA problem?

2008-02-02 Thread Ingo Molnar

* Ingo Molnar [EMAIL PROTECTED] wrote:

  [   27.097095] ENABLING IO-APIC IRQs
  [   27.097287] ..TIMER: vector=0x31 apic1=0 pin1=2 apic2=-1 pin2=-1
  [   27.107291] ..MP-BIOS bug: 8254 timer not connected to IO-APIC
  [   27.107343] ...trying to set up timer (IRQ0) through the 8259A ...  failed.
  [   27.107346] ...trying to set up timer as Virtual Wire IRQ... failed.
  [   27.117353] ...trying to set up timer as ExtINT IRQ... works.
  
  The last 4 lines above are not logged without that argument.  So my 
  theory ATM is that this forced the kernel to initialize something in 
  the boards registers that it does not initialize without that 
  command, and that its going fubar as shown in the msg quoted above 
  is a totally random thing, perhaps dependent on the phase of one of 
  jupiters moons as to what state it powers up in.  And I got lucky, 
  so far in that my single powerdown reset didn't trigger it again...  
  And you _know_ what that knocking sound is by now. :)
 
 that's weird. Could you try the hack below and _remove_ the 
 acpi_use_timer_override flag? The change should artificially cause the 
 above 4 lines to appear again, in all cases.
 
 This would test the following aspects of your theory: is this unknown 
 side-effect of the acpi_use_timer_override flag related to the 
 timer setup sequence in io_apic_32.c? If not, then the difference most 
 likely lies in the different ACPI setup sequence.

i tried that patch on a box here, and it produces similar 4 lines:

[0.172141] ENABLING IO-APIC IRQs
[0.175498] init IO_APIC IRQs
[0.176059]  IO-APIC (apicid-pin) 2-16, 2-17, 2-18, 2-19, 2-20, 2-21, 2-22, 2-23 not connected.
[0.187942] ..TIMER: vector=0x31 apic1=0 pin1=0 apic2=-1 pin2=-1
[0.233859] ..MP-BIOS bug: 8254 timer not connected to IO-APIC
[0.236014] ...trying to set up timer (IRQ0) through the 8259A ...  failed.
[0.236014] ...trying to set up timer as Virtual Wire IRQ... failed.
[0.236014] ...trying to set up timer as ExtINT IRQ... works.
[0.277879] Using local APIC timer interrupts.

but ... in all likelihood it's some ACPI side-effects of the 
acpi_use_timer_override flag, not really this IO-APIC/timer-setup detail 
that matters.

Ingo


Re: [BUG] 2.6.24 refuses to boot - ATA problem?

2008-02-02 Thread Gene Heskett
On Saturday 02 February 2008, Ingo Molnar wrote:
* Gene Heskett [EMAIL PROTECTED] wrote:
 I think that one came from me, but it also gets over 14,000 hits on
 google.

 Now Jeff, here is the strange part.  That error was killing me, many
 times an hour and eventually crashing completely, repeatedly.

 I applied that kernel argument acpi_use_timer_override once and have
 not had the error since, and that includes one test of a full let it
 cool for a minute powerdown reboot to see if it would come back, which
 it did not.

 That argument causes the kernel to log this as its responding to that
 command:

 [   27.097095] ENABLING IO-APIC IRQs
 [   27.097287] ..TIMER: vector=0x31 apic1=0 pin1=2 apic2=-1 pin2=-1
 [   27.107291] ..MP-BIOS bug: 8254 timer not connected to IO-APIC
 [   27.107343] ...trying to set up timer (IRQ0) through the 8259A ...  failed.
 [   27.107346] ...trying to set up timer as Virtual Wire IRQ... failed.
 [   27.117353] ...trying to set up timer as ExtINT IRQ... works.

 The last 4 lines above are not logged without that argument.  So my
 theory ATM is that this forced the kernel to initialize something in
 the boards registers that it does not initialize without that command,
 and that its going fubar as shown in the msg quoted above is a totally
 random thing, perhaps dependent on the phase of one of jupiters moons
 as to what state it powers up in.  And I got lucky, so far in that my
 single powerdown reset didn't trigger it again...  And you _know_ what
 that knocking sound is by now. :)

that's weird. Could you try the hack below and _remove_ the
acpi_use_timer_override flag? The change should artificially cause the
above 4 lines to appear again, in all cases.

This would test the following aspects of your theory: is this unknown
side-effect of the acpi_use_timer_override flag related to the timer
setup sequence in io_apic_32.c? If not, then the difference most likely
lies in the different ACPI setup sequence.

   Ingo

---
 arch/x86/kernel/io_apic_32.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Index: linux/arch/x86/kernel/io_apic_32.c
===
--- linux.orig/arch/x86/kernel/io_apic_32.c
+++ linux/arch/x86/kernel/io_apic_32.c
@@ -2208,7 +2208,7 @@ static inline void __init check_timer(vo
* Ok, does IRQ0 through the IOAPIC work?
*/
   unmask_IO_APIC_irq(0);
-  if (timer_irq_works()) {
+  if (timer_irq_works() && 0) {
   if (nmi_watchdog == NMI_IO_APIC) {
   disable_8259A_irq(0);
   setup_nmi();

I believe it's the same, but lemme paste it for sure, yes:
[   26.339926] ENABLING IO-APIC IRQs
[   26.340119] ..TIMER: vector=0x31 apic1=0 pin1=0 apic2=-1 pin2=-1
[   26.350129] ..MP-BIOS bug: 8254 timer not connected to IO-APIC
[   26.350182] ...trying to set up timer (IRQ0) through the 8259A ...  failed.
[   26.350185] ...trying to set up timer as Virtual Wire IRQ... failed.
[   26.360186] ...trying to set up timer as ExtINT IRQ... works.
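
One detail worth checking when comparing this paste with the earlier one taken with acpi_use_timer_override: the `..TIMER:` lines are not quite identical. The earlier log shows `pin1=2`, this one shows `pin1=0`. Here is a small sketch for pulling the fields out of such a line so two boots can be compared mechanically. The struct and function names are hypothetical, and the format string is inferred only from the lines pasted in this thread.

```c
#include <stdio.h>
#include <string.h>

/* Fields of a "..TIMER: vector=0x31 apic1=0 pin1=2 apic2=-1 pin2=-1" line. */
struct timer_line {
    unsigned int vector;
    int apic1, pin1, apic2, pin2;
};

/* Parse one dmesg line; any "[ timestamp ]" prefix is skipped.
 * Returns 1 on success, 0 if the line is not a ..TIMER: line. */
int parse_timer_line(const char *line, struct timer_line *out)
{
    const char *p = strstr(line, "..TIMER:");
    if (p == NULL)
        return 0;
    /* %x accepts the leading "0x"; %d accepts the negative values. */
    return sscanf(p, "..TIMER: vector=%x apic1=%d pin1=%d apic2=%d pin2=%d",
                  &out->vector, &out->apic1, &out->pin1,
                  &out->apic2, &out->pin2) == 5;
}

/* Returns 1 if the two parsed lines differ only in pin1. */
int differ_only_in_pin1(const struct timer_line *a, const struct timer_line *b)
{
    return a->vector == b->vector && a->apic1 == b->apic1 &&
           a->apic2 == b->apic2 && a->pin2 == b->pin2 &&
           a->pin1 != b->pin1;
}
```

Feeding it the two `..TIMER:` lines from this thread gives `pin1` values of 2 and 0 with every other field equal. Whether that difference matters for the PATA timeouts is not something the logs settle.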

The third line is the only line that makes it to the screen during the boot 
trace.

Now, what does this tell us?

-- 
Cheers, Gene
There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order.
-Ed Howdershelt (Author)
As far as the laws of mathematics refer to reality, they are not
certain, and as far as they are certain, they do not refer to reality.
-- Albert Einstein


Re: [BUG] 2.6.24 refuses to boot - ATA problem?

2008-02-02 Thread Ingo Molnar

* Gene Heskett [EMAIL PROTECTED] wrote:

 I believe its the same, but lemme paste it for sure, yes:
 [   26.339926] ENABLING IO-APIC IRQs
 [   26.340119] ..TIMER: vector=0x31 apic1=0 pin1=0 apic2=-1 pin2=-1
 [   26.350129] ..MP-BIOS bug: 8254 timer not connected to IO-APIC
 [   26.350182] ...trying to set up timer (IRQ0) through the 8259A ...  failed.
 [   26.350185] ...trying to set up timer as Virtual Wire IRQ... failed.
 [   26.360186] ...trying to set up timer as ExtINT IRQ... works.
 
 The third line is the only line that makes it to the screen during the 
 boot trace.
 
 Now, what does this tell us?

the question would be:

 - if you remove the acpi_use_timer_override boot flag
 - and if you boot a kernel with this hack applied

= do those weird PATA failures come back?

If the failures do _not_ come back then the problem is somehow 
affected/worked-around by the IO-APIC code that generates the above 4 
lines. If the failures are still the same then the above 4 lines are 
really just an uninteresting side-effect of the acpi_use_timer_override 
flag - and the real side-effects (that fixes PATA on your box) are to be 
found elsewhere.

Sadly, the latter variant is the expected answer.
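
The experiment above boils down to a two-way decision, which can be written out as a small sketch. The enum and function names are made up for illustration; the logic simply restates the reasoning in this message.

```c
#include <stdbool.h>

enum conclusion {
    TEST_NOT_VALID,        /* preconditions of the experiment were not met */
    IOAPIC_PATH_INVOLVED,  /* failures stay away: the fallback timer setup matters */
    LOOK_ELSEWHERE         /* failures return: the 4 log lines are just a side effect */
};

/* Interpret one boot of the experiment: flag removed, hack kernel booted,
 * then watch whether the PATA failures come back. */
enum conclusion interpret_experiment(bool override_flag_removed,
                                     bool hack_kernel_booted,
                                     bool pata_failures_returned)
{
    if (!override_flag_removed || !hack_kernel_booted)
        return TEST_NOT_VALID;
    return pata_failures_returned ? LOOK_ELSEWHERE : IOAPIC_PATH_INVOLVED;
}
```

Since the failures are described elsewhere in the thread as possibly random, a single clean boot would only weakly support `IOAPIC_PATH_INVOLVED`; several cold boots per configuration would make either result more convincing.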

Ingo


Re: [BUG] 2.6.24 refuses to boot - ATA problem?

2008-02-02 Thread Gene Heskett
On Sunday 03 February 2008, Ingo Molnar wrote:
* Gene Heskett [EMAIL PROTECTED] wrote:
 I believe its the same, but lemme paste it for sure, yes:
 [   26.339926] ENABLING IO-APIC IRQs
 [   26.340119] ..TIMER: vector=0x31 apic1=0 pin1=0 apic2=-1 pin2=-1
 [   26.350129] ..MP-BIOS bug: 8254 timer not connected to IO-APIC
 [   26.350182] ...trying to set up timer (IRQ0) through the 8259A ...  failed.
 [   26.350185] ...trying to set up timer as Virtual Wire IRQ... failed.
 [   26.360186] ...trying to set up timer as ExtINT IRQ... works.

 The third line is the only line that makes it to the screen during the
 boot trace.

 Now, what does this tell us?

the question would be:

 - if you remove the acpi_use_timer_override boot flag
 - and if you boot a kernel with this hack applied

= do those weird PATA failures come back?

If the failures do _not_ come back then the problem is somehow
affected/worked-around by the IO-APIC code that generates the above 4
lines. If the failures are still the same then the above 4 lines are
really just an uninteresting side-effect of the acpi_use_timer_override
flag - and the real side-effects (that fixes PATA on your box) are to be
found elsewhere.

Sadly, the latter variant is the expected answer.

   Ingo

And at this point, I can't tell.  This reboot was from a cold start, without 
the argument, and cold by long enough to make the rounds about the house and 
pick up a beer, but not take my evening pillbox.  A minute cold, maybe 2 max.  
The log is clean since except for a kudzu nag of some sort:

[   50.535388] warning: process `kudzu' used the deprecated sysctl system call with 1.23.

which isn't your problem, but fedora's.

As I said before, that error has not returned since the first time I used that 
argument, and I have booted several times now without it.  Uptime now is just 
over an hour though, so I'm not taking bets just yet. :)

-- 
Cheers, Gene
There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order.
-Ed Howdershelt (Author)
Now I lay me down to sleep
I pray the double lock will keep;
May no brick through the window break,
And, no one rob me till I awake.