Torsten Kaiser wrote: > Looking closer at > http://git.kernel.org/?p=linux/kernel/git/axboe/linux-2.6-block.git;a=commitdiff;h=ec6fdded4d76aa54aa57341e5dfdd61c507b1dcd > the change to libata.h seems bogus : > > in ata_qc_first_sg: > old new > return qc->__sg return qc->__sg > qc->__sg - qc->__sg == 0 qc->n_iter=0 > -> sg - qc->__sg corresponds to qc->n_iter > > in ata_qc_next_sg: > sg++; sg_next(sg); qc->n_iter++; > sg - qc->__sg < qc->n_elem qc->n_iter < qc->nelem > -> sg - qc->__sg corresponds to qc->n_iter > > but in ata_sg_is_last: > (sg - qc->__sg) +1 == qc->n_elem qc->n_iter == qc->n_elem > if sg - qc->__sg corresponds to qc->n_iter then shoudn't it be > qc->n_iter+1 == qc->n_elem? > > That missing +1 would explain, why the SGE_TRM never gets set.
Thanks a lot for tracking this down. Does changing the above code fix your problem? Jens, Torsten's analysis looks correct && depending on qc state (n_iter) during iteration doesn't look like a good idea. Those iterators are not supposed to have side effects. Would it be difficult to implement sg_last() test? > And it would fit the symptoms, that the boot would fail at random. If > the "correct" garbage was in place to where the sglist runs off it > hangs the drive. > And that would even fit the two different errors that I only got one time > each: > * a completely illegal access (PCI master abort while fetching SGT) > * wrong alignment of the SGT (SGT no on qword boundary) > At that that times the garbage seemed to point invalid addresses. > > But I'm still not understanding, how the kernel could only fail > sometimes at bootup, but after that working without any visible > errors? Is the sil-chip rather intelligent about detecting corrupted > sglists and silently ignoring them? I have no idea why it fails only sometimes. -- tejun - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/

