On 10/5/07, Torsten Kaiser <[EMAIL PROTECTED]> wrote:
> So I will use the weekend to see if I can find out who issues this
> command and add more debug to that place...
I added some DPRINTKs to sil24_qc_issue and sil24_fill_sg, but found only
one suspicious thing. My sil24_fill_sg now looks like this:

static inline void sil24_fill_sg(struct ata_queued_cmd *qc,
                                 struct sil24_sge *sge)
{
        struct scatterlist *sg;

        ata_for_each_sg(sg, qc) {
                sge->addr = cpu_to_le64(sg_dma_address(sg));
                sge->cnt = cpu_to_le32(sg_dma_len(sg));
                if (ata_sg_is_last(sg, qc))
                        sge->flags = cpu_to_le32(SGE_TRM);
                else
                        sge->flags = 0;
                DPRINTK("flags,addr,cnt = 0x%x, 0x%X, 0x%X\n",
                        sge->flags, sge->addr, sge->cnt);
                sge++;
        }
}

What is suspicious is that *all* output from this DPRINTK shows flags as
0x0, so the last sg entry never seems to be terminated (SGE_TRM is 1<<31).
But if that is the cause, how does this work at all? Or am I doing
something stupid?

Timing and outputs from five boots:

  good:                                     bad:
  col 1         col 2         col 3         col 4         col 5
  3->35         3->35         3->35         3->35         3->35
  3->2a         2->35         2->35         3->2a         3->2a
  3->setup      2->2a         2->2a         3->setup      3->setup
  2->35         2->35         2->35         2->35         2->35
  1->35         3->2a         3->2a         1->35         1->35
  2->2a         3->setup      3->setup      2->2a         2->2a
  1->2a         1->35         1->35         1->2a         1->2a
  2->35         1->2a         1->2a         2->35         1->35
  1->35         1->35         1->35
  3->int        3->int        3->int        3->int        3->int
  3->35         3->35         3->35         3->35         3->35
  -             1->5DF/1439C  1->5DC/1439C  -             1->5DE/1439C
  -             2->5E0/143BC  2->5DE/143BC  -             2->5DF/143BC
  -             sg:170E       sg:1AAB       -             sg:1A60
XXX:
  5DD           5DF           5DC           5DF           5DE
  5E0           5E0           5DE           5E0           5DF

The first three columns were working tries; in the last two, one drive
failed.

  column 1: ATA_DEBUG added, reboot
  column 2: + my additions, reboot
  column 3: + my additions, cold boot (I wanted to make it fail, but it worked)
  column 4: ATA_DEBUG added, cold boot
  column 5: + my additions, cold boot

[x]->[y]: x is the ata port: 1 and 2 are on the sata_sil24, 3 is on the
sata_nv with swncq.
  y:35    -> SYNCHRONIZE_CACHE commands that were sent to the drive
  y:2a    -> WRITE_10 commands that were sent to the drive
  y:setup -> Debug from swncq: nv_swncq_dmafis: dma setup tag 0x0
  y:int   -> Debug from swncq: nv_swncq_host_interrupt: id 0x3 SWNCQ: qc_active 0x1 ...
The lines before the XXX:
  x->a/b: x is the ata port, a is the paddr and b the activate from
          sil24_qc_issue. All outputs from sil24_qc_issue were identical
          within each boot sequence and differed only from run to run.
  sg:a:   a is the sge->addr from sil24_fill_sg.
The lines after the XXX:
  These are the addresses printed by the XXX printk in sil24_port_start.

I hope this explains well enough what the table above means.

This whole sequence (two syncs and one write to each drive) happens between
the output

[ 40.300000] md1: bitmap initialized from disk: read 10/10 pages, set 87 bits
[ 40.320000] created bitmap (145 pages) for device md1

and, on a bad boot, the error

[ 70.680000] ata2.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x6 frozen
[ 70.700000] ata2.00: cmd 61/08:00:09:d6:42/00:00:25:00:00/40 tag 0 cdb 0x0 data 4096 out

or, on a good boot,

[ 40.910000] md: considering sdb1 ...

(sdb1 is part of another raid). If someone wants complete boot logs, just
ask.

So now I have two questions:
1) What happens in sil24_fill_sg with SGE_TRM?
2) If that is OK, should I try to add debug to sil24_error_intr and/or
   sil24_host_intr?

Torsten
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/