On Thu, 25 May 2006, Norbert Preining wrote: > Hi all! > > On Mit, 24 Mai 2006, Alan Stern wrote: > > It might not be a bad idea to follow Pete's advice and use usbmon. > > Done. > > On Mit, 24 Mai 2006, Pete Zaitcev wrote: > > I suggest looking for -104 (unlink). It can be done with grep. Use > > Attached is the region around the first -104, I hope I didn't delete too > much. The full output for the time frame: Start of copying till after > the second reset (not for all, that would be bad) is here: > http://www.logic.at/people/preining/usbmon.bus5.out.gz > size: 2716164 > > Tell me if I should edit/cut more or less.
Hmm. It's a little tricky to make sense of this, partly because it looks like the usbmon buffer filled up. I've chopped out a bunch of useless data-buffer contents from the log; here's what we get: c2b26240 677194319 S Bo:002:01 -115 4096 = c2b261c0 677194323 S Bo:002:01 -115 4096 = c2b26140 677194327 S Bo:002:01 -115 4096 = c2b260c0 677194330 S Bo:002:01 -115 4096 = d8b23f40 677194333 S Bo:002:01 -115 4096 = d8b23ec0 677194336 S Bo:002:01 -115 8192 = c2b26840 677194339 S Bo:002:01 -115 8192 = d8b23e40 677194342 S Bo:002:01 -115 4096 = d8b23dc0 677194345 S Bo:002:01 -115 4096 = These lines are last few URBs of a WRITE data transfer. It looks like two lines are missing from the end; note that the last URB address is d8b23dc0. c2b26ac0 677195112 C Bo:002:01 0 4096 > c2b26a40 677195116 C Bo:002:01 0 4096 > c2b269c0 677195120 C Bo:002:01 0 4096 > c2b266c0 677202754 C Bo:002:01 0 4096 > c2b268c0 677202759 C Bo:002:01 0 4096 > c2b26940 677202762 C Bo:002:01 0 4096 > c2b267c0 677202764 C Bo:002:01 0 8192 > c2b26740 677202766 C Bo:002:01 0 4096 > c2b26640 677202769 C Bo:002:01 0 4096 > c2b26540 677202772 C Bo:002:01 0 8192 > c2b264c0 677202775 C Bo:002:01 0 4096 > c2b26440 677202777 C Bo:002:01 0 4096 > c2b263c0 677202780 C Bo:002:01 0 4096 > c2b26340 677202782 C Bo:002:01 0 4096 > c2b262c0 677202785 C Bo:002:01 0 4096 > c2b26240 677202788 C Bo:002:01 0 4096 > c2b261c0 677202791 C Bo:002:01 0 4096 > c2b26140 677202794 C Bo:002:01 0 4096 > c2b260c0 677202796 C Bo:002:01 0 4096 > d8b23f40 677202799 C Bo:002:01 0 4096 > d8b23ec0 677202802 C Bo:002:01 0 8192 > c2b26840 677202804 C Bo:002:01 0 8192 > d8b23e40 677202807 C Bo:002:01 0 4096 > d8b23dc0 677202809 C Bo:002:01 0 4096 > d8b23d40 677202812 C Bo:002:01 -32 2048 > d8b23cc0 677202818 C Bo:002:01 -104 0 These are the completions of those data transfers. Notice that after d8b23dc0 there are two more lines; presumably these were missing above. All of the transfers succeeded up until the last two. The second-to-last stopped after transferring only 2048 bytes and the last one was cancelled. By looking at the complete log it's clear that each of the last two lines was meant to transfer 4096 bytes. de811940 677202833 S Co:002:00 s 02 01 0000 0001 0000 0 de811940 677203004 C Co:002:00 0 0 The endpoint halt is cleared. de811940 677203012 S Bi:002:02 -115 13 < de811940 677203134 C Bi:002:02 0 13 = 55534253 be020000 00200000 00 The computer fetches the status from the previous command. The status indicates that the command succeeded but the last 0x2000 = 8192 bytes were not accepted. This is reasonable, considering that the last 6144 bytes were not transferred at all. It could be, as you guessed, that the drive reached some sort of internal limit and then stopped accepting any more data. Perhaps we should treat this short transfer as an indication that we need to pause -- but for how long? de811940 677203264 S Bo:002:01 -115 31 = 55534243 bf020000 00e00100 00000a2a 00047134 2d0000f0 00000000 000000 de811940 677203381 C Bo:002:01 0 31 > This is the next write command. It is in fact a retry of the previous command. It shouldn't be necessary to transfer the first 224 sectors over again since they were accepted, but the SCSI core is conservative and resends them anyway. The interesting thing here is the total length of the write. It is 0x01e000 bytes, or 240 sectors, or 30 4096-byte pages. c2b26ac0 677203397 S Bo:002:01 -115 4096 = c2b26a40 677203400 S Bo:002:01 -115 4096 = c2b269c0 677203404 S Bo:002:01 -115 4096 = c2b266c0 677203407 S Bo:002:01 -115 4096 = c2b268c0 677203410 S Bo:002:01 -115 4096 = c2b26940 677203413 S Bo:002:01 -115 4096 = c2b267c0 677203416 S Bo:002:01 -115 8192 = c2b26740 677203421 S Bo:002:01 -115 4096 = c2b26640 677203424 S Bo:002:01 -115 4096 = c2b26540 677203427 S Bo:002:01 -115 8192 = c2b264c0 677203430 S Bo:002:01 -115 4096 = c2b26440 677203433 S Bo:002:01 -115 4096 = c2b263c0 677203436 S Bo:002:01 -115 4096 = c2b26340 677203440 S Bo:002:01 -115 4096 = c2b262c0 677203443 S Bo:002:01 -115 4096 = c2b26240 677203445 S Bo:002:01 -115 4096 = c2b261c0 677203448 S Bo:002:01 -115 4096 = c2b26140 677203451 S Bo:002:01 -115 4096 = c2b260c0 677203454 S Bo:002:01 -115 4096 = d8b23f40 677203457 S Bo:002:01 -115 4096 = d8b23ec0 677203460 S Bo:002:01 -115 8192 = c2b26840 677203463 S Bo:002:01 -115 8192 = d8b23e40 677203466 S Bo:002:01 -115 4096 = d8b23dc0 677203469 S Bo:002:01 -115 4096 = If you add up the total number of pages listed, you'll see it comes to 28, not 30. Once again it appears that two lines, each for 4096 bytes, have been dropped from the end. c2b26ac0 737210713 C Bo:002:01 -104 2048 > c2b26a40 737210729 C Bo:002:01 -104 0 c2b269c0 737210732 C Bo:002:01 -104 0 c2b266c0 737210735 C Bo:002:01 -104 0 c2b268c0 737210738 C Bo:002:01 -104 0 c2b26940 737210740 C Bo:002:01 -104 0 c2b267c0 737210743 C Bo:002:01 -104 0 c2b26740 737210745 C Bo:002:01 -104 0 c2b26640 737210748 C Bo:002:01 -104 0 c2b26540 737210752 C Bo:002:01 -104 0 c2b264c0 737210754 C Bo:002:01 -104 0 c2b26440 737210757 C Bo:002:01 -104 0 c2b263c0 737210759 C Bo:002:01 -104 0 c2b26340 737210762 C Bo:002:01 -104 0 c2b262c0 737210765 C Bo:002:01 -104 0 c2b26240 737210767 C Bo:002:01 -104 0 c2b261c0 737210770 C Bo:002:01 -104 0 c2b26140 737210773 C Bo:002:01 -104 0 c2b260c0 737210776 C Bo:002:01 -104 0 d8b23f40 737210779 C Bo:002:01 -104 0 d8b23ec0 737210781 C Bo:002:01 -104 0 c2b26840 737210784 C Bo:002:01 -104 0 d8b23e40 737210786 C Bo:002:01 -104 0 d8b23dc0 737210789 C Bo:002:01 -104 0 d8b23d40 737210791 C Bo:002:01 -104 0 This time the transfer stopped after only 2048 bytes, and everything else was cancelled. Note that it didn't fail, in the sense that the drive did not halt the endpoint like it did before. Instead the transfer timed out: The timestamp jumps from 677.203469 to 737.210713 (i.e., 30 seconds). c2b26ec0 737266617 S Ci:001:00 s a3 00 0000 0003 0004 4 < c2b26ec0 737266935 C Ci:001:00 0 4 = 03051000 c2b26ec0 737322613 S Co:001:00 s 23 01 0014 0003 0000 0 c2b26ec0 737322621 C Co:001:00 0 0 c2b26ec0 737322643 S Ci:000:00 s 80 06 0100 0000 0040 64 < c2b26ec0 737328992 C Ci:000:00 0 18 = 12010002 00000040 06100330 00014950 6301 c2b26ec0 737329008 S Co:001:00 s 23 03 0004 0003 0000 0 c2b26ec0 737329014 C Co:001:00 0 0 c2b26ec0 737382620 S Ci:001:00 s a3 00 0000 0003 0004 4 < c2b26ec0 737382833 C Ci:001:00 0 4 = 03051000 c2b26ec0 737438627 S Co:001:00 s 23 01 0014 0003 0000 0 c2b26ec0 737438635 C Co:001:00 0 0 c2b26ec0 737438640 S Co:000:00 s 00 05 0002 0000 0000 0 c2b26ec0 737438755 C Co:000:00 0 0 c2b26ec0 737458632 S Ci:002:00 s 80 06 0100 0000 0012 18 < c2b26ec0 737464889 C Ci:002:00 0 18 = 12010002 00000040 06100330 00014950 6301 c2b26ec0 737464907 S Ci:002:00 s 80 06 0200 0000 0027 39 < c2b26ec0 737471521 C Ci:002:00 0 39 = 09022700 010200c0 31090400 00030806 50000705 82020002 01070501 02000201 c2b26ec0 737471538 S Co:002:00 s 00 09 0002 0000 0000 0 c2b26ec0 737471634 C Co:002:00 0 0 c2b26ec0 737471645 S Co:002:00 s 01 0b 0000 0000 0000 0 c2b26ec0 737471760 C Co:002:00 0 0 de811940 737471860 S Bo:002:01 -115 31 = 55534243 c0020000 00000000 00000600 00000000 00000000 00000000 000000 de811940 737472011 C Bo:002:01 0 31 > de811940 737472023 S Bi:002:02 -115 13 < de811940 737472138 C Bi:002:02 0 13 = 55534253 c0020000 00000000 00 All that stuff is the USB port reset and the drive reporting that it is now okay. de811940 737472173 S Bo:002:01 -115 31 = 55534243 c1020000 00e00100 00000a2a 00047134 2d0000f0 00000000 000000 de811940 737472385 C Bo:002:01 0 31 > Here the command is retried again. c2b26ac0 737472403 S Bo:002:01 -115 4096 = c2b26a40 737472407 S Bo:002:01 -115 4096 = c2b269c0 737472411 S Bo:002:01 -115 4096 = c2b266c0 737472415 S Bo:002:01 -115 4096 = c2b268c0 737472418 S Bo:002:01 -115 4096 = c2b26940 737472421 S Bo:002:01 -115 4096 = c2b267c0 737472424 S Bo:002:01 -115 8192 = c2b26740 737472427 S Bo:002:01 -115 4096 = c2b26640 737472432 S Bo:002:01 -115 4096 = c2b26540 737472435 S Bo:002:01 -115 8192 = c2b264c0 737472439 S Bo:002:01 -115 4096 = c2b26440 737472442 S Bo:002:01 -115 4096 = c2b263c0 737472445 S Bo:002:01 -115 4096 = c2b26340 737472449 S Bo:002:01 -115 4096 = c2b262c0 737472452 S Bo:002:01 -115 4096 = c2b26240 737472456 S Bo:002:01 -115 4096 = c2b261c0 737472459 S Bo:002:01 -115 4096 = c2b26140 737472462 S Bo:002:01 -115 4096 = c2b260c0 737472466 S Bo:002:01 -115 4096 = d8b23f40 737472468 S Bo:002:01 -115 4096 = d8b23ec0 737472471 S Bo:002:01 -115 8192 = c2b26840 737472475 S Bo:002:01 -115 8192 = d8b23e40 737472478 S Bo:002:01 -115 4096 = d8b23dc0 737472482 S Bo:002:01 -115 4096 = c2b26ac0 737474631 C Bo:002:01 0 4096 > c2b26a40 737474636 C Bo:002:01 0 4096 > c2b269c0 737474639 C Bo:002:01 0 4096 > c2b266c0 737474642 C Bo:002:01 0 4096 > c2b268c0 737474647 C Bo:002:01 0 4096 > c2b26940 737474649 C Bo:002:01 0 4096 > c2b267c0 737474652 C Bo:002:01 0 8192 > c2b26740 737480764 C Bo:002:01 0 4096 > c2b26640 737480768 C Bo:002:01 0 4096 > c2b26540 737480771 C Bo:002:01 0 8192 > c2b264c0 737480773 C Bo:002:01 0 4096 > c2b26440 737480776 C Bo:002:01 0 4096 > c2b263c0 737480778 C Bo:002:01 0 4096 > c2b26340 737480781 C Bo:002:01 0 4096 > c2b262c0 737480783 C Bo:002:01 0 4096 > c2b26240 737480786 C Bo:002:01 0 4096 > And this time it does succeed, although the ultimate outcome is beyond the end of the log extract (the full log contains the result). It could well be that we are just sending data faster than the drive can absorb. Normally one would expect the drive simply to hold off on accepting the data until it was ready. Apparently the firmware crashed instead. If you want to try an interesting test, go back to the usb_stor_invoke_transport() routine in transport.c, just above the Handle_Errors line. Where it says: /* Did we transfer less than the minimum amount required? */ if (srb->result == SAM_STAT_GOOD && srb->request_bufflen - srb->resid < srb->underflow) srb->result = (DID_ERROR << 16) | (SUGGEST_RETRY << 24); If the conditional succeeds, have it call msleep(1000) as well as setting srb->result. This delay may allow the drive to flush its buffers and get ready to accept more data. Alan Stern ------------------------------------------------------- All the advantages of Linux Managed Hosting--Without the Cost and Risk! Fully trained technicians. The highest number of Red Hat certifications in the hosting industry. Fanatical Support. Click to learn more http://sel.as-us.falkag.net/sel?cmd=lnk&kid=107521&bid=248729&dat=121642 _______________________________________________ linux-usb-devel@lists.sourceforge.net To unsubscribe, use the last form field at: https://lists.sourceforge.net/lists/listinfo/linux-usb-devel