Re: Some questions about da0 on USB2 (recent bad behaviour)

2009-10-26 Thread Andrew Reilly
On Sat, Oct 24, 2009 at 07:07:00PM +, b. f. wrote:
 That is: it seems to work fine for some fraction of a minute
 (doesn't seem to be longer than a minute, anyway), and then
 stops completely for several minutes (processes reading or
 writing sit in D state in ps) and then starts again, after
 logging Request completed with CAM_REQ_CMP_ERR\nRetrying
 Command.
 
 In the past week or so, Alexander Motin (m...@freebsd.org) and Andrew
 Thompson (thom...@freebsd.org) have made a number of related changes
 to cam and usb in the P4 repository, and in 9-CURRENT.  Some of these
 may address your problem.  I'm not sure when they will be back-ported
 to 8.X.  You may wish to try out the latest version of -CURRENT, to
 see if it solves your problem(s); or to contact them.

I've done this, and it seems to have worked.  It seems possible
that the bulk throughput (measured by systat while doing a cat
/backup/bigfile /dev/null) might even have increased a bit,
but maybe not.  The big improvement is that the transfer isn't
pausing any more.  No more CAM_REQ_CMP_ERR messages.

Thanks b. f. for the suggestion, and thanks Alexander and Andrew
for the fixes!

I haven't run -current for, probably, ten years, and the
occasional messages about lock-order-reversals worry me a bit,
but don't seem to be doing any harm.  Should I report them?  In
a PR?

Please count this message as a vote for MFC'ing those cam and
usb changes to 8-STABLE.

Cheers,

-- 
Andrew
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org


Re: Some questions about da0 on USB2 (recent bad behaviour)

2009-10-26 Thread b. f.
On 10/26/09, Andrew Reilly arei...@bigpond.net.au wrote:
 On Sat, Oct 24, 2009 at 07:07:00PM +, b. f. wrote:
...

 I haven't run -current for, probably, ten years, and the
 occasional messages about lock-order-reversals worry me a bit,
 but don't seem to be doing any harm.  Should I report them?  In
 a PR?


Some of them are known, and are either harmless or a false positive.
If you get an LOR, you can check to see if a similar LOR has already
been noted on Bjoern Zeeb's page:

http://sources.zabbadoz.net/freebsd/lor.html

If it hasn't, send him an email with the LOR, and the circumstances
under which it arose, as described on that page.  Don't forget that
there is a lot of debugging code enabled by default under -CURRENT
(which at this point is still pretty close to 8-STABLE, and will be
until after the release of 8.0,  so it ought to be fairly stable).  If
you feel confident about the safety of using -CURRENT, and want to get
the best performance, disable the debugging code as described in
/usr/src/UPDATING.

 Please count this message as a vote for MFC'ing those cam and
 usb changes to 8-STABLE.


I'm guessing that these changes will make it into 8.0, or at least
8-STABLE, fairly soon;  but mav@, thompsa@, and re@ will know more.

b.
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org


Re: Some questions about da0 on USB2 (recent bad behaviour)

2009-10-24 Thread Andrew Reilly
Just a follow-up with some more information:

I now doubt that the problem that I reported in the original
message is the drive dying: I've just done some read tests (cat
largefile /dev/null) on the other USB2-attached drive (also a
Western Digital MyBook, but this one is a USB2+Firewire one with
1TB, while the other one was just USB2 with 750G.)  I'm seeing
essentially the same behaviour on that drive, too.

That is: it seems to work fine for some fraction of a minute
(doesn't seem to be longer than a minute, anyway), and then
stops completely for several minutes (processes reading or
writing sit in D state in ps) and then starts again, after
logging Request completed with CAM_REQ_CMP_ERR\nRetrying
Command.

I reckon that the duty cycle of useful behaviour is is a bit
less than a third.

Any chance this is some new badness in the USB+umass stack?

Anything that I can poke or prod to make it behave better?

Any way that I can find out where it's going awry?  I don't have
kdb in my kernel, but everything not directly connected to these
USB devices seem to be behaving themselves completely.

Oh: stoppage on the two drives doesn't seem to be
chronologically correllated.

Cheers,

-- 
Andrew
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org


Re: Some questions about da0 on USB2 (recent bad behaviour)

2009-10-24 Thread b. f.
That is: it seems to work fine for some fraction of a minute
(doesn't seem to be longer than a minute, anyway), and then
stops completely for several minutes (processes reading or
writing sit in D state in ps) and then starts again, after
logging Request completed with CAM_REQ_CMP_ERR\nRetrying
Command.

In the past week or so, Alexander Motin (m...@freebsd.org) and Andrew
Thompson (thom...@freebsd.org) have made a number of related changes
to cam and usb in the P4 repository, and in 9-CURRENT.  Some of these
may address your problem.  I'm not sure when they will be back-ported
to 8.X.  You may wish to try out the latest version of -CURRENT, to
see if it solves your problem(s); or to contact them.

b.
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org


Some questions about da0 on USB2 (recent bad behaviour)

2009-10-23 Thread Andrew Reilly
Hi there,

I have a system with a couple of Western Digital MyBook
USB2 drives connected to it, and have started seeing some odd
behaviour that I am not sure how to identify the cause of.
Perhaps someone could offer a suggestion or two?

The behaviour that I've noticed (and I can't remember any
particular event precipitating this, but I do track 8-STABLE
approximately weekly, so things do change from time to time...)
is that the drive will just stop for a couple of minutes, and
then continue what it was doing.  For a while that's all I
could see: no error messages at all.  The last time I booted, I
turned on verbose booting and now I see that these periods of
inactivity result in streams of syslog messages like:

Oct 24 12:48:55 duncan kernel: (da0:umass-sim0:0:0:0): Request completed with 
CAM_REQ_CMP_ERR
Oct 24 12:48:55 duncan kernel: (da0:umass-sim0:0:0:0): Retrying Command
Oct 24 12:50:24 duncan kernel: (da0:umass-sim0:0:0:0): Request completed with 
CAM_REQ_CMP_ERR
Oct 24 12:50:24 duncan kernel: (da0:umass-sim0:0:0:0): Retrying Command

The retry seems to be successful, because I'm not getting any
hard error messages anywhere, and the disk activity does seem to
proceed afterwards.  The disk drive isn't making any bad/broken
noises, either.

That drive is, according to dmesg.boot:

ugen1.2: Western Digital at usbus1
umass0: Western Digital External HDD, class 0/0, rev 2.00/1.06, addr 2 on 
usbus1
umass0:  SCSI over Bulk-Only; quirks = 0x
Root mount waiting for: usbus1
umass0:1:0:-1: Attached to scbus1
(probe0:umass-sim0:0:0:0): Down reving Protocol Version from 2 to 0?
pass0 at umass-sim0 bus 0 target 0 lun 0
pass0: WD 7500AAK External 1.06 Fixed Direct Access SCSI-0 device 
pass0: 40.000MB/s transfers
GEOM: new disk da0
da0 at umass-sim0 bus 0 target 0 lun 0
da0: WD 7500AAK External 1.06 Fixed Direct Access SCSI-0 device 
da0: 40.000MB/s transfers
da0: 715404MB (1465149168 512 byte sectors: 255H 63S/T 91201C)

What is the likelihood that these pauses and command retries are
a sign that this specific drive is in the process of dying,
physically?  If that were the case, are there any diagnostic
tools that I could run against it to show, say, internal error
logs?

What is the significance of the sim part of the device
designation umass-sim0?  I've looked in all of the manual pages
I can think of, but that clearly isn't enough.

usbdevs -v says no USB controllers found, which I thought a
bit unuseful.  I assume it is *supposed* to work, is there a
trick?  usbconfig shows my connected USB devices and hubs:

ugen0.1: OHCI root HUB nVidia at usbus0, cfg=0 md=HOST spd=FULL (12Mbps) 
pwr=ON
ugen1.1: EHCI root HUB nVidia at usbus1, cfg=0 md=HOST spd=HIGH (480Mbps) 
pwr=ON
ugen1.2: External HDD Western Digital at usbus1, cfg=0 md=HOST spd=HIGH 
(480Mbps) pwr=ON
ugen1.3: My Book Western Digital at usbus1, cfg=0 md=HOST spd=HIGH (480Mbps) 
pwr=ON
ugen0.2: USB-PS/2 Trackball Logitech at usbus0, cfg=0 md=HOST spd=LOW 
(1.5Mbps) pwr=ON
ugen0.3: SNAPSCAN e20 AGFA at usbus0, cfg=0 md=HOST spd=FULL (12Mbps) pwr=ON

At this rate the dump/restore backup that I'm running to get the
data off it will take a little over a day to finish (according
to dump), even though systat shows the drive doing about 8MB/s
while it's working, which would allow the dump to finish in
about eight hours.  These modern, large drives are all very
well, but they make doing any kind of system reconfiguration or
backup really time consuming...

Cheers,

-- 
Andrew
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org