-- Forwarded Message --
Subject: Re: System lockups caused by USB external HDD
Date: Tuesday 25 January 2011, 01:48:03
From: CDP dr.c...@gmail.com
To: freebsd-...@freebsd.org
CC: Hans Petter Selasky hsela...@c2i.net, m...@freebsd.org
On 01/24/11 13:27, Hans Petter Selasky wrote:
On Monday 24 January 2011 12:08:47 CDP wrote:
On 01/24/11 11:34, Hans Petter Selasky wrote:
On Monday 24 January 2011 10:00:53 CDP wrote:
On 01/24/11 01:56, Daniel O'Connor wrote:
On 24/01/2011, at 9:10, CDP wrote:
g_vfs_done():da0s2[WRITE(offset=, length=16384)]error = 5
[several more lines similar to the above]
panic: softdep_move_dependencies: need merge code
cpuid = 0
KDB: stack backtrace:
#0 0x... at kdb_backtrace+0x5e
#1 0x... at panic+0x182
It looks like the disk is dying, or the FS is corrupt (the former might
cause the latter).
Can you run smartctl on the disk? Unfortunately a lot of enclosures
reject SMART commands so you might not be able to :(
I have attached the output of smartctl -d sat -a /dev/da0. I haven't yet
run a SMART long test, for the simple reason that the disk goes into
sleep mode, which interrupts it. I haven't bothered to keep it awake for a
long test, but I might just do that.
I doubt it's a disk failure, though, since I back up to it without
problems under FreeBSD 7.3, using the same space where FreeBSD 8.x
fails. I am talking about over 150GB of data in one run, while
8.2-RC2 crashes after 5-10GB. I have experienced disk failures in the
past, on SATA, and a few read/write errors never caused a system lockup.
My feeling is that enough traffic on USB causes the problem, and that
this problem is only present in the new USB stack.
Unfortunately downgrading to 7.x is not an option because there are
things that won't work on this notebook.
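
A minimal sketch of one way to keep the disk awake during a SMART long test: read a single sector at a fixed interval. keep_awake is a hypothetical helper, not part of smartmontools, and whether periodic reads actually stop this enclosure from spinning down is an assumption that would need checking.

```shell
# keep_awake DEVICE INTERVAL ROUNDS (hypothetical helper)
# Reads one 512-byte block from DEVICE every INTERVAL seconds, ROUNDS times.
# Assumption: any host read resets the enclosure's idle timer.
keep_awake() {
    ka_dev=$1
    ka_interval=$2
    ka_rounds=$3
    ka_i=0
    while [ "$ka_i" -lt "$ka_rounds" ]; do
        # Stop early if the device cannot be read at all.
        dd if="$ka_dev" of=/dev/null bs=512 count=1 2>/dev/null || return 1
        ka_i=$((ka_i + 1))
        sleep "$ka_interval"
    done
}
```

Possible usage: start the test with "smartctl -d sat -t long /dev/da0", run "keep_awake /dev/da0 60 240" for roughly four hours of keep-alive reads, then check the result with "smartctl -d sat -l selftest /dev/da0".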
If you run a simple test like this:
dd if=/dev/da0 of=/dev/null bs=65536
dd if=/dev/da0 of=/dev/null bs=16384
Do you then see any errors?
Do you have a spare USB memory stick which you could run similar write
tests on?
Both reads fail with an I/O error, while writes to an unused partition seem
to be fine (I interrupted the writes after a while):
% dd if=/dev/da0 of=/dev/null bs=65536
dd: /dev/da0: Input/output error
191732+0 records in
191732+0 records out
12565348352 bytes transferred in 429.999272 secs (29221790 bytes/sec)
% dd if=/dev/da0 of=/dev/null bs=16384
dd: /dev/da0: Input/output error
126427+0 records in
126427+0 records out
2071379968 bytes transferred in 169.431766 secs (12225452 bytes/sec)
# dd if=/dev/random of=/dev/da0s3 bs=65536
^C329378+0 records in
329377+0 records out
21586051072 bytes transferred in 1003.020293 secs (21521051 bytes/sec)
# dd if=/dev/random of=/dev/da0s3 bs=16384
^C679571+0 records in
679571+0 records out
11134091264 bytes transferred in 690.135793 secs (16133189 bytes/sec)
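
The record counts above give the byte offset at which each read stopped (records times block size). The 64k run stopped at 12565348352 bytes, well past the 2071379968 bytes where the 16k run failed, which may suggest the error is not tied to one fixed bad sector. A small sketch for doing that arithmetic and re-reading the area around an offset; fail_offset and probe_region are hypothetical helper names.

```shell
# fail_offset RECORDS BS (hypothetical helper)
# Byte offset reached after RECORDS full blocks of BS bytes.
fail_offset() {
    echo $(( $1 * $2 ))
}

# probe_region DEVICE BS SKIP COUNT (hypothetical helper)
# Re-reads COUNT blocks of BS bytes, starting SKIP blocks into DEVICE.
# The exit status is dd's: non-zero means the read failed again.
probe_region() {
    dd if="$1" of=/dev/null bs="$2" skip="$3" count="$4" 2>/dev/null
}
```

For example, "fail_offset 191732 65536" prints 12565348352, and "probe_region /dev/da0 16384 126427 64" re-reads 1MB starting where the 16k run stopped.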
This is what I get in /var/log/messages when the I/O error occurs:
(da0:umass-sim0:0:0:0): AutoSense failed
However, I experience no lockup. Maybe this situation is not handled
correctly at another level?
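
To see how often this happens relative to the write errors, the two message types quoted in this thread can be counted in the log. A minimal sketch; log_summary is a hypothetical helper, and the patterns only match the message text as it appears above.

```shell
# log_summary LOGFILE (hypothetical helper)
# Counts the two kinds of kernel messages quoted in this thread.
log_summary() {
    ls_file=$1
    # grep -c prints 0 when nothing matches, so both lines are always emitted.
    echo "autosense_failed=$(grep -c 'AutoSense failed' "$ls_file")"
    echo "g_vfs_done_errors=$(grep -c 'g_vfs_done().*error = ' "$ls_file")"
}
```

Possible usage: "log_summary /var/log/messages" before and after a copy run, then compare the counts.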
I haven't looked into the CAM or GEOM code that much, so I won't say too
much about that. I believe USB/umass is not to blame. What you could do
is add a conditional error printout in umass_t_bbb_status_callback() in
/sys/dev/usb/storage/umass.c for when the error happens. If that error is
not a USB transport error, then we are most likely seeing a SCSI issue in
the layers above umass. Or, if you have access to a USB analyser, use that.
There is now also the option to trace USB from the kernel itself, but the
feature is in early development.
The panics I was able to catch/inspect (latest from add_to_worklist() /
ffs_softdep.c) indicated they were thrown by ffs/softupdates code,
therefore I tried disabling softupdates.
The system doesn't panic anymore. Operations on the USB HDD still
stop, but after several tens of seconds the system logs the 'autosense
failed' error and a bunch of write errors, and the copy operation resumes.
md5 shows the copied files are identical to the source files.
In 7.x I don't recall seeing any errors or temporary stalls in disk
operations, so I'm guessing the 'autosense failed' situation is
handled differently in 8.x than in 7.x.
Claudiu.
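
For reference, the md5 check mentioned above can be scripted roughly as follows. This is a sketch, not the exact commands that were run; file_md5 and verify_copy are hypothetical helpers, and directory arguments are assumed to have no trailing slash and file names no embedded newlines.

```shell
# file_md5 FILE (hypothetical helper)
# Prints the MD5 digest using md5(1) where available (FreeBSD),
# falling back to md5sum(1) elsewhere.
file_md5() {
    if command -v md5 >/dev/null 2>&1; then
        md5 -q "$1"
    else
        md5sum "$1" | cut -d ' ' -f 1
    fi
}

# verify_copy SRC_DIR DST_DIR (hypothetical helper)
# Compares every regular file under SRC_DIR with the file at the same
# relative path under DST_DIR; lists missing or differing files.
verify_copy() {
    vc_src=$1
    vc_dst=$2
    vc_bad=$(
        find "$vc_src" -type f | while IFS= read -r vc_f; do
            vc_rel=${vc_f#"$vc_src"/}
            if [ ! -f "$vc_dst/$vc_rel" ] ||
               [ "$(file_md5 "$vc_f")" != "$(file_md5 "$vc_dst/$vc_rel")" ]; then
                echo "$vc_rel"
            fi
        done
    )
    if [ -n "$vc_bad" ]; then
        echo "MISMATCH:"
        echo "$vc_bad"
        return 1
    fi
    echo "all files match"
}
```

Possible usage: "verify_copy /home/data /mnt/usb/data", where both paths are placeholders for the source and the copy on the USB disk.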
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable