On 2007.01.22 19:24:22 -0600, Robert Hancock wrote:
> Björn Steinbrink wrote:
> >>>Running a kernel with the return statement replace by a line that prints
> >>>the irq_stat instead.
> >>>
> >>>Currently I'm seeing lots of 0x10 on ata1 and 0x0 on ata2.
> >>40 minutes stress test now and no exception yet. What's interesting is
> >>that ata1 saw exactly one interrupt with irq_stat 0x0, all others that
> >>might have get dropped are as above.
> >>I'll keep it running for some time and will then re-enable the return
> >>statement to see if there's a relation between the irq_stat 0x0 and the
> >>exception.
> >
> >No, doesn't seem to be related, did get 2 exceptions, but no irq_stat
> >0x0 for ata1. Syslog/dmesg has nothing new either, still the same
> >pattern of dismissed irq_stats.
>
> I've finally managed to reproduce this problem on my box, by doing:
>
> watch --interval=0.1 /sbin/hdparm -I /dev/sda
>
> on one drive and then running bonnie++ on /dev/sdb connected to the
> other port on the same controller device. Usually within a few minutes
> one of the IDENTIFY commands would time out in the same way you guys
> have been seeing.
>
> Through some various trials and tribulations, the only conclusion I can
> come to is that this controller really doesn't like that
> NV_INT_STATUS_CK804 register being looked at in ADMA mode. I tried
> adding some debug code to the qc_issue function that would check to see
> if the BUSY flag in altstatus went high or that register showed an
> interrupt within a certain time afterwards, however that really seemed
> to hose things, the system wouldn't even boot.

Hm, I don't think it is unhappy about looking at NV_INT_STATUS_CK804.
I've been running 2.6.20-rc5 with the INT_DEV check removed for 8 hours
now without a single problem, and that should still look at
NV_INT_STATUS_CK804, right?

I just noticed that my last email might not have been clear enough. The
exceptions happened when I re-enabled the return statement in addition
to the debug message. Without the INT_DEV check, it is completely fine
AFAICT.

> Try out this patch, it just calls the ata_host_intr function where
> appropriate without using nv_host_intr which looks at the
> NV_INT_STATUS_CK804 register. This is what the original ADMA patch from
> Mr. Mysterious NVIDIA Person did, I'm guessing there may be a reason for
> that. With this patch I can get through a whole bonnie++ run with the
> repeated IDENTIFY requests running without seeing the error.

I'll see if I can schedule a test run for tomorrow, I currently need
this box.

Thanks,
Björn
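
P.S. To make the experiment above concrete, here is a rough userspace model of the debug change (print the irq_stat instead of returning early). It is only a sketch, not the actual sata_nv.c code: SKETCH_INT_DEV, struct irq_tally, would_dismiss() and tally_irq() are names made up for illustration, and the bit value 0x01 is an assumption that should be checked against the driver source.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* ASSUMPTION: stand-in for the driver's INT_DEV bit; verify in sata_nv.c. */
#define SKETCH_INT_DEV 0x01u

/* Hypothetical counters, only used by this model. */
struct irq_tally {
    unsigned long handled;   /* irq_stat had the INT_DEV bit set */
    unsigned long dismissed; /* bit clear: the early return would have fired */
};

/* Returns 1 if a check of the form "if (!(irq_stat & INT_DEV)) return 0;"
 * would bail out for this irq_stat value. */
static int would_dismiss(uint8_t irq_stat)
{
    return (irq_stat & SKETCH_INT_DEV) == 0;
}

/* Debug variant: log the value that would have been dropped and keep
 * going instead of returning early. */
static void tally_irq(struct irq_tally *t, int port, uint8_t irq_stat)
{
    if (would_dismiss(irq_stat)) {
        t->dismissed++;
        fprintf(stderr, "ata%d: irq_stat 0x%x without INT_DEV\n",
                port, (unsigned int)irq_stat);
    } else {
        t->handled++;
    }
}
```

Under that assumed bit value, both of the values reported earlier in the thread (0x10 on ata1, 0x0 on ata2) count as dismissed, which matches the pattern seen in the logs.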