Re: Debugging an USB array issue

2024-03-16 Thread gene heskett

On 3/16/24 12:27, Max Nikulin wrote:
> On 16/03/2024 00:45, Marc SCHAEFER wrote:
> > On Fri, Mar 15, 2024 at 01:30:08PM -0400, Dan Ritter wrote:
> > > I have never had long-term happiness with multiple disks
> > > connected via USB.
> >
> > However: I have a similar disk array running 24h/24h for the last three years
> > on a Debian buster with no problem. I am going to upgrade this system soon, so
> > if there is something bad with bullseye's kernel I would love to learn about
> > it :)
>
> You may search https://bugs.debian.org for known issues.
>
> If it is really a software issue rather than a hardware one, I would at
> least try the bookworm-backports kernel package. A further step may be a
> git bisect with custom builds of the vanilla kernel. It would be tedious,
> since each iteration requires 4 hours.
>
> From my point of view, a failure of the USB-to-SATA converter is more
> probable.

And if it's not a StarTech, probable. I have had several StarTechs in
service for years, and have NOT had to replace any of them.  Ignore that
faint knocking-on-wood sound. :o)>


Cheers, Gene Heskett, CET.
--
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author, 1940)
If we desire respect for the law, we must first make the law respectable.
 - Louis D. Brandeis



Re: Debugging an USB array issue

2024-03-16 Thread Max Nikulin

On 16/03/2024 00:45, Marc SCHAEFER wrote:
> On Fri, Mar 15, 2024 at 01:30:08PM -0400, Dan Ritter wrote:
> > I have never had long-term happiness with multiple disks
> > connected via USB.
>
> However: I have a similar disk array running 24h/24h for the last three years
> on a Debian buster with no problem. I am going to upgrade this system soon, so
> if there is something bad with bullseye's kernel I would love to learn about
> it :)


You may search https://bugs.debian.org for known issues.

If it is really a software issue rather than a hardware one, I would at
least try the bookworm-backports kernel package. A further step may be a
git bisect with custom builds of the vanilla kernel. It would be tedious,
since each iteration requires 4 hours.
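
For a rough idea of how long such a bisect could take: git bisect halves the
candidate range at every step, so roughly log2(N) kernels have to be built
and tested for N commits between the last good and the first bad version.
Below is a back-of-the-envelope sketch in Python using the 4 hours per
iteration mentioned above. The function name is illustrative, and the commit
count in the example is a made-up placeholder, not the real distance between
the buster and bullseye kernels.

```python
import math


def bisect_estimate(n_commits, hours_per_step=4.0):
    """Return (steps, hours) for bisecting a range of n_commits candidates.

    Assumes every step yields a clear good/bad verdict after hours_per_step.
    """
    if n_commits < 1:
        raise ValueError("n_commits must be at least 1")
    steps = math.ceil(math.log2(n_commits))
    return steps, steps * hours_per_step


# Hypothetical range of 100000 commits (placeholder, not a measured value).
steps, hours = bisect_estimate(100_000)
print(f"{steps} steps, about {hours:.0f} hours")
```

With the placeholder value this prints "17 steps, about 68 hours". Because
the failure only shows up every 3-4 hours under load, a "good" verdict is
never fully certain, so the real effort is likely to be higher.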


From my point of view, a failure of the USB-to-SATA converter is more
probable.





Re: Debugging an USB array issue

2024-03-16 Thread tomas
On Fri, Mar 15, 2024 at 08:24:04PM +0100, Marc SCHAEFER wrote:
> Hello,
> 
> On Fri, Mar 15, 2024 at 06:54:38PM +0100, to...@tuxteam.de wrote:
> > I may be stating the obvious, but have you made sure the USB hub
> > is providing enough power to keep your disks happy?
> 
> It's a 60W external power supply, for 4 disks.

Thanks, that seems to settle that :-)

Cheers
-- 
t




Re: Debugging an USB array issue

2024-03-15 Thread Marc SCHAEFER
Hello,

On Fri, Mar 15, 2024 at 06:54:38PM +0100, to...@tuxteam.de wrote:
> I may be stating the obvious, but have you made sure the USB hub
> is providing enough power to keep your disks happy?

It's a 60W external power supply, for 4 disks.



Re: Debugging an USB array issue

2024-03-15 Thread tomas
On Fri, Mar 15, 2024 at 05:32:30PM +0100, Marc SCHAEFER wrote:
> Hello,
> 
> on an up-to-date Debian bullseye system [1], I experience frequent (every
> 3-4 hours under heavy load) disk disconnections from an md RAID10 array with
> 4 drives connected to a USB 10000M adapter [2].
> 
> Errors do not look like a timeout, but like a DMA error [3].

I may be stating the obvious, but have you made sure the USB hub
is providing enough power to keep your disks happy?

Cheers
-- 
t




Re: Debugging an USB array issue

2024-03-15 Thread Dan Ritter
Marc SCHAEFER wrote: 
> on an up-to-date Debian bullseye system [1], I experience frequent (every
> 3-4 hours under heavy load) disk disconnections from an md RAID10 array with
> 4 drives connected to a USB 10000M adapter [2].
> 
> Errors do not look like a timeout, but like a DMA error [3].
> 
> Immediately after, the disk reappears as a new drive name and can be
> re-added quickly to the md RAID array (I am doing those tests with a
> read-only mounted filesystem for obvious reasons).
> 
> Initially, I was wondering if it was maybe a disk doing a too long
> recovery procedure, but it is to be noted that it's not always the same
> disk which has an error, and smartctl -a shows no recorded errors for
> any of the 4 drives [4]. The drives are connected to a SATA-to-USB
> enclosure [6].
> 
> This is on a 3.1 USB PCI-Express card [5].
> 
> I already applied this work-around (which does not seem to apply to a
> non-idle system):
>    echo -1 > /sys/module/usbcore/parameters/autosuspend
> 
> What would be your recommendations?  I have thought about downgrading to
> a slower port (it should not be much different with 5000M), changing the
> cable, or maybe it's the enclosure?

I have never had long-term happiness with multiple disks
connected via USB. I strongly recommend that you find a 4 or 8
disk SATA/SAS PCIe card -- an LSI 2008, for example -- and connect
through that, instead. US prices are $40-45 new. Add $15 for an 8087-to-4xSATA
cable and you will have happiness for less than $75.

-dsr-



Re: Debugging an USB array issue

2024-03-15 Thread Marc SCHAEFER
Hello,

On Fri, Mar 15, 2024 at 01:30:08PM -0400, Dan Ritter wrote:
> I have never had long-term happiness with multiple disks
> connected via USB. I strongly recommend that you find a 4 or 8
> disk SATA/SAS PCIe card -- an LSI 2008, for example -- and connect
> through that, instead. US prices are $40-45 new. Add $15 for an 8087-to-4xSATA
> cable and you will have happiness for less than $75.

Interesting. I will keep the idea in mind.  I also had a prejudice against USB
in the beginning.

However: I have a similar disk array running 24h/24h for the last three years
on a Debian buster with no problem. I am going to upgrade this system soon, so
if there is something bad with bullseye's kernel I would love to learn about
it :)



Debugging an USB array issue

2024-03-15 Thread Marc SCHAEFER
Hello,

on an up-to-date Debian bullseye system [1], I experience frequent (every
3-4 hours under heavy load) disk disconnections from an md RAID10 array with
4 drives connected to a USB 10000M adapter [2].

Errors do not look like a timeout, but like a DMA error [3].
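
To check whether every disconnection is preceded by the same kind of IOMMU
fault, the kernel log can be filtered for the DMAR lines. Below is a minimal
sketch in Python, assuming the log is available as text with one message per
line. The function name `dmar_faults` is illustrative, and the regular
expression is fitted only to the fault line shown in [3], so other kernel
versions may need a different pattern.

```python
import re

# Pattern fitted to the DMAR fault line quoted in [3]; this is an assumption
# about the message format, not a stable kernel interface.
DMAR_FAULT = re.compile(
    r"DMAR: \[(?P<op>DMA \w+)\] Request device \[(?P<dev>[0-9a-f:.]+)\]"
    r".*?fault addr (?:0x)?(?P<addr>[0-9a-f]+)"
    r" \[fault reason (?P<reason>\d+)\] (?P<text>.*)"
)


def dmar_faults(lines):
    """Yield one dict per DMAR fault message found in an iterable of log lines."""
    for line in lines:
        match = DMAR_FAULT.search(line)
        if match:
            yield match.groupdict()
```

Feeding it the lines of a saved kernel log gives the requesting PCI device,
the operation, the fault address and the reason text for each fault, which
makes it easy to see whether the faulting device is always the USB
controller at 01:00.0.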

Immediately after, the disk reappears as a new drive name and can be
re-added quickly to the md RAID array (I am doing those tests with a
read-only mounted filesystem for obvious reasons).

Initially, I was wondering if it was maybe a disk doing a too long
recovery procedure, but it is to be noted that it's not always the same
disk which has an error, and smartctl -a shows no recorded errors for
any of the 4 drives [4]. The drives are connected to a SATA-to-USB
enclosure [6].

This is on a 3.1 USB PCI-Express card [5].

I already applied this work-around (which does not seem to apply to a
non-idle system):
   echo -1 > /sys/module/usbcore/parameters/autosuspend
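
To confirm that the setting is in place, the value can be read back from the
same sysfs file. Below is a small sketch in Python. `read_autosuspend` is a
hypothetical helper, and the path is the one used in the command above.

```python
from pathlib import Path

# Same sysfs file as in the echo command above.
AUTOSUSPEND = Path("/sys/module/usbcore/parameters/autosuspend")


def read_autosuspend(path=AUTOSUSPEND):
    """Return the usbcore autosuspend delay in seconds as an int.

    A negative value means autosuspend is disabled.
    """
    return int(Path(path).read_text().strip())
```

According to the kernel parameter documentation, this value is the default
for newly detected devices, so disks that were already attached when it was
changed may keep their earlier per-device setting.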

What would be your recommendations?  I have thought about downgrading to
a slower port (it should not be much different with 5000M), changing the
cable, or maybe it's the enclosure?

Or is this a known issue (maybe with the xhci_hcd driver) and I should
try another driver?

Thank you for any idea or pointer.



[1] Linux video 5.10.0-28-amd64 #1 SMP Debian 5.10.209-2 (2024-01-31) x86_64 GNU/Linux
[2]
/:  Bus 03.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/2p, 10000M
    |__ Port 1: Dev 2, If 0, Class=Hub, Driver=hub/4p, 10000M
        |__ Port 2: Dev 5, If 0, Class=Mass Storage, Driver=usb-storage, 5000M
        |__ Port 1: Dev 4, If 0, Class=Hub, Driver=hub/4p, 10000M
            |__ Port 3: Dev 8, If 0, Class=Mass Storage, Driver=uas, 10000M
            |__ Port 1: Dev 6, If 0, Class=Mass Storage, Driver=uas, 10000M
            |__ Port 4: Dev 10, If 0, Class=Mass Storage, Driver=uas, 10000M
            |__ Port 2: Dev 7, If 0, Class=Mass Storage, Driver=uas, 10000M
    |__ Port 2: Dev 3, If 0, Class=Mass Storage, Driver=usb-storage, 5000M
[3]
Mar 15 17:08:06 video kernel: [ 6607.383180] xhci_hcd 0000:01:00.0: WARN Set TR Deq Ptr cmd invalid because of stream ID configuration
Mar 15 17:08:06 video kernel: [ 6607.386754] DMAR: DRHD: handling fault status reg 3
Mar 15 17:08:06 video kernel: [ 6607.386762] DMAR: [DMA Write] Request device [01:00.0] PASID  fault addr f98be000 [fault reason 05] PTE Write access is not set
Mar 15 17:08:06 video kernel: [ 6607.386774] sd 18:0:0:0: [sde] tag#5 data cmplt err -75 uas-tag 1 inflight: CMD
Mar 15 17:08:06 video kernel: [ 6607.386780] sd 18:0:0:0: [sde] tag#5 CDB: Read(16) 88 00 00 00 00 01 5e 1d 88 00 00 00 01 00 00 00
Mar 15 17:08:06 video kernel: [ 6607.479406] xhci_hcd 0000:01:00.0: WARN Event TRB for slot 12 ep 10 with no TDs queued?
Mar 15 17:08:06 video kernel: [ 6607.479708] xhci_hcd 0000:01:00.0: WARN Set TR deq ptr command for freed stream ID 38885
Mar 15 17:08:06 video kernel: [ 6607.510551] xhci_hcd 0000:01:00.0: WARN Event TRB for slot 12 ep 10 with no TDs queued?
[ ... many ... ]
Mar 15 17:08:13 video kernel: [ 6614.443826] sd 18:0:0:0: [sde] tag#2 uas_eh_abort_handler 0 uas-tag 3 inflight: CMD IN
Mar 15 17:08:13 video kernel: [ 6614.443829] sd 18:0:0:0: [sde] tag#2 CDB: ATA command pass through(12)/Blank a1 08 2e d0 01 00 4f c2 00 b0 00 00
Mar 15 17:08:13 video kernel: [ 6614.457969] xhci_hcd 0000:01:00.0: WARN Event TRB for slot 12 ep 10 with no TDs queued?
Mar 15 17:08:13 video kernel: [ 6614.458274] xhci_hcd 0000:01:00.0: WARN Set TR deq ptr command for freed stream ID 38885
[ ... many ... ]
Mar 15 17:08:25 video kernel: [ 6626.497696] sd 18:0:0:0: [sde] tag#5 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE cmd_age=19s
Mar 15 17:08:25 video kernel: [ 6626.497725] sd 18:0:0:0: [sde] tag#5 Sense Key : Illegal Request [current]
Mar 15 17:08:25 video kernel: [ 6626.497731] sd 18:0:0:0: [sde] tag#5 Add. Sense: Invalid command operation code
Mar 15 17:08:25 video kernel: [ 6626.497739] sd 18:0:0:0: [sde] tag#5 CDB: Read(16) 88 00 00 00 00 01 5e 1d 88 00 00 00 01 00 00 00
Mar 15 17:08:25 video kernel: [ 6626.497746] blk_update_request: critical target error, dev sde, sector 5873960960 op 0x0:(READ) flags 0x0 phys_seg 32 prio class 0
Mar 15 17:08:25 video kernel: [ 6626.497755] md/raid10:md0: sde: rescheduling sector 11747394560
Mar 15 17:08:25 video kernel: [ 6626.497801] usb 3-1.1.4: stat urb: no pending cmd for uas-tag 3
Mar 15 17:08:25 video kernel: [ 6626.497807] md/raid10:md0: sdd: redirecting sector 11747394560 to another mirror
Mar 15 17:08:25 video kernel: [ 6626.519426] xhci_hcd 0000:01:00.0: WARN Event TRB for slot 12 ep 10 with no TDs queued?
Mar 15 17:08:25 video kernel: [ 6626.519719] xhci_hcd 0000:01:00.0: WARN Set TR deq ptr command for freed stream ID 38885
Mar 15 17:08:25 video kernel: [ 6626.550583] xhci_hcd 0000:01:00.0: WARN Event TRB for slot 12 ep 10 with no TDs queued?
Mar 15 17:08:25 video kernel: [ 6626.550875] xhci_hcd 0000:01:00.0: WARN Set TR deq ptr command for freed