Re: Debugging an USB array issue
On 3/16/24 12:27, Max Nikulin wrote:
> On 16/03/2024 00:45, Marc SCHAEFER wrote:
> > On Fri, Mar 15, 2024 at 01:30:08PM -0400, Dan Ritter wrote:
> > > I have never had long-term happiness with multiple disks
> > > connected via USB.
> >
> > However: I have a similar disk array running 24h/24h for the last
> > three years on a Debian buster with no problem. I am going to
> > upgrade this system soon, so if there is something bad with
> > bullseye's kernel I would love to learn about it :)
>
> You may search https://bugs.debian.org for known issues. If it is
> really a software issue rather than a hardware one, I would try at
> least the bookworm-backports kernel package. A further step may be a
> git bisect game with custom builds of the vanilla kernel. It would be
> tedious, since 4 hours is required for each iteration.
>
> From my point of view, some failure of the USB-to-SATA converter is
> more probable.

And if it's not a StarTech, probable. I have had several StarTechs in
service for years, and have NOT had to replace any of them. Ignore that
faint knocking-on-wood sound. :o)

Cheers, Gene Heskett, CET.
-- 
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author, 1940)
If we desire respect for the law, we must first make the law respectable.
 - Louis D. Brandeis
Re: Debugging an USB array issue
On 16/03/2024 00:45, Marc SCHAEFER wrote:
> On Fri, Mar 15, 2024 at 01:30:08PM -0400, Dan Ritter wrote:
> > I have never had long-term happiness with multiple disks
> > connected via USB.
>
> However: I have a similar disk array running 24h/24h for the last
> three years on a Debian buster with no problem. I am going to upgrade
> this system soon, so if there is something bad with bullseye's kernel
> I would love to learn about it :)

You may search https://bugs.debian.org for known issues. If it is
really a software issue rather than a hardware one, I would try at
least the bookworm-backports kernel package. A further step may be a
git bisect game with custom builds of the vanilla kernel. It would be
tedious, since 4 hours is required for each iteration.

From my point of view, some failure of the USB-to-SATA converter is
more probable.
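To put a number on "tedious": a bisection over N candidate commits needs roughly ceil(log2(N)) test builds. The sketch below only does that arithmetic. The function names are made up for illustration, the commit count is whatever range you would actually bisect, and the real step count reported by git bisect can differ by one or two.

```shell
#!/bin/sh
# Rough estimate of bisection effort (illustrative helper names,
# not part of git).

# bisect_steps N: smallest k such that 2^k >= N.
bisect_steps() {
    n=$1
    steps=0
    span=1
    while [ "$span" -lt "$n" ]; do
        span=$((span * 2))
        steps=$((steps + 1))
    done
    echo "$steps"
}

# bisect_hours N HOURS_PER_TEST: total waiting time if every step
# needs HOURS_PER_TEST hours of load before the failure shows up.
bisect_hours() {
    echo $(( $(bisect_steps "$1") * $2 ))
}
```

For example, `bisect_hours 1024 4` gives 40, i.e. about ten iterations of four hours each for a range of 1024 commits, and that assumes a "good" run can be trusted after only four hours without a disconnect.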
Re: Debugging an USB array issue
On Fri, Mar 15, 2024 at 08:24:04PM +0100, Marc SCHAEFER wrote:
> Hello,
>
> On Fri, Mar 15, 2024 at 06:54:38PM +0100, to...@tuxteam.de wrote:
> > I may be stating the obvious, but have you made sure the USB hub
> > is providing enough power to keep your disks happy?
>
> It's a 60W external power supply, for 4 disks.

Thanks, that seems to settle that :-)

Cheers
-- 
t
Re: Debugging an USB array issue
Hello,

On Fri, Mar 15, 2024 at 06:54:38PM +0100, to...@tuxteam.de wrote:
> I may be stating the obvious, but have you made sure the USB hub
> is providing enough power to keep your disks happy?

It's a 60W external power supply, for 4 disks.
Re: Debugging an USB array issue
On Fri, Mar 15, 2024 at 05:32:30PM +0100, Marc SCHAEFER wrote:
> Hello,
>
> on an up-to-date Debian bullseye system [1], I experience frequent
> (every 3-4 hours under heavy load) disk disconnections from an md
> RAID10 array with 4 drives connected to a USB 10000M adapter [2].
>
> Errors do not look like a timeout, but like a DMA error [3].

I may be stating the obvious, but have you made sure the USB hub
is providing enough power to keep your disks happy?

Cheers
-- 
t
Re: Debugging an USB array issue
Marc SCHAEFER wrote:
> on an up-to-date Debian bullseye system [1], I experience frequent
> (every 3-4 hours under heavy load) disk disconnections from an md
> RAID10 array with 4 drives connected to a USB 10000M adapter [2].
>
> Errors do not look like a timeout, but like a DMA error [3].
>
> Immediately after, the disk reappears under a new drive name and can
> be re-added quickly to the md RAID array (I am doing those tests with
> a read-only mounted filesystem for obvious reasons).
>
> Initially, I was wondering if it was maybe a disk doing a too long
> recovery procedure, but it is to be noted that it's not always the
> same disk which has an error, and smartctl -a shows no recorded
> errors for any of the 4 drives [4]. The drives are connected to a
> SATA-to-USB enclosure [6].
>
> This is on a 3.1 USB PCI-Express card [5].
>
> I already applied this work-around (which does not seem to apply to
> a non-idle system):
>    echo -1 > /sys/module/usbcore/parameters/autosuspend
>
> What would be your recommendations? I have thought about downgrading
> to a slower port (it should not be much different with 5000M),
> changing the cable, or maybe it's the enclosure?

I have never had long-term happiness with multiple disks
connected via USB. I strongly recommend that you find a 4 or 8
disk SATA/SAS PCIe card -- an LSI 2008, for example -- and connect
through that, instead. US prices are $40-45 new. Add $15 for an
8087-to-4xSATA cable, and you will have happiness for less than $75.

-dsr-
Re: Debugging an USB array issue
Hello,

On Fri, Mar 15, 2024 at 01:30:08PM -0400, Dan Ritter wrote:
> I have never had long-term happiness with multiple disks
> connected via USB. I strongly recommend that you find a 4 or 8
> disk SATA/SAS PCIe card -- an LSI 2008, for example -- and connect
> through that, instead. US prices are $40-45 new. Add $15 for an
> 8087-to-4xSATA cable, you will have happiness for less than $75.

Interesting. I will keep the idea in mind. I also had a prejudice
against USB in the beginning.

However: I have a similar disk array running 24h/24h for the last
three years on a Debian buster with no problem. I am going to upgrade
this system soon, so if there is something bad with bullseye's kernel
I would love to learn about it :)
Debugging an USB array issue
Hello,

on an up-to-date Debian bullseye system [1], I experience frequent
(every 3-4 hours under heavy load) disk disconnections from an md
RAID10 array with 4 drives connected to a USB 10000M adapter [2].

Errors do not look like a timeout, but like a DMA error [3].

Immediately after, the disk reappears under a new drive name and can be
re-added quickly to the md RAID array (I am doing those tests with a
read-only mounted filesystem for obvious reasons).

Initially, I was wondering if it was maybe a disk doing a too long
recovery procedure, but it is to be noted that it's not always the same
disk which has an error, and smartctl -a shows no recorded errors for
any of the 4 drives [4]. The drives are connected to a SATA-to-USB
enclosure [6].

This is on a 3.1 USB PCI-Express card [5].

I already applied this work-around (which does not seem to apply to a
non-idle system):

   echo -1 > /sys/module/usbcore/parameters/autosuspend

What would be your recommendations? I have thought about downgrading to
a slower port (it should not be much different with 5000M), changing
the cable, or maybe it's the enclosure?

Or is this a known issue (maybe with the xhci_hcd driver) and I should
try another driver?

Thank you for any idea or pointer.
[1]
Linux video 5.10.0-28-amd64 #1 SMP Debian 5.10.209-2 (2024-01-31) x86_64 GNU/Linux

[2]
/:  Bus 03.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/2p, 10000M
|__ Port 1: Dev 2, If 0, Class=Hub, Driver=hub/4p, 10000M
|__ Port 2: Dev 5, If 0, Class=Mass Storage, Driver=usb-storage, 5000M
|__ Port 1: Dev 4, If 0, Class=Hub, Driver=hub/4p, 10000M
|__ Port 3: Dev 8, If 0, Class=Mass Storage, Driver=uas, 10000M
|__ Port 1: Dev 6, If 0, Class=Mass Storage, Driver=uas, 10000M
|__ Port 4: Dev 10, If 0, Class=Mass Storage, Driver=uas, 10000M
|__ Port 2: Dev 7, If 0, Class=Mass Storage, Driver=uas, 10000M
|__ Port 2: Dev 3, If 0, Class=Mass Storage, Driver=usb-storage, 5000M

[3]
Mar 15 17:08:06 video kernel: [ 6607.383180] xhci_hcd 0000:01:00.0: WARN Set TR Deq Ptr cmd invalid because of stream ID configuration
Mar 15 17:08:06 video kernel: [ 6607.386754] DMAR: DRHD: handling fault status reg 3
Mar 15 17:08:06 video kernel: [ 6607.386762] DMAR: [DMA Write] Request device [01:00.0] PASID fault addr f98be000 [fault reason 05] PTE Write access is not set
Mar 15 17:08:06 video kernel: [ 6607.386774] sd 18:0:0:0: [sde] tag#5 data cmplt err -75 uas-tag 1 inflight: CMD
Mar 15 17:08:06 video kernel: [ 6607.386780] sd 18:0:0:0: [sde] tag#5 CDB: Read(16) 88 00 00 00 00 01 5e 1d 88 00 00 00 01 00 00 00
Mar 15 17:08:06 video kernel: [ 6607.479406] xhci_hcd 0000:01:00.0: WARN Event TRB for slot 12 ep 10 with no TDs queued?
Mar 15 17:08:06 video kernel: [ 6607.479708] xhci_hcd 0000:01:00.0: WARN Set TR deq ptr command for freed stream ID 38885
Mar 15 17:08:06 video kernel: [ 6607.510551] xhci_hcd 0000:01:00.0: WARN Event TRB for slot 12 ep 10 with no TDs queued?
[ ... many ... ]
Mar 15 17:08:13 video kernel: [ 6614.443826] sd 18:0:0:0: [sde] tag#2 uas_eh_abort_handler 0 uas-tag 3 inflight: CMD IN
Mar 15 17:08:13 video kernel: [ 6614.443829] sd 18:0:0:0: [sde] tag#2 CDB: ATA command pass through(12)/Blank a1 08 2e d0 01 00 4f c2 00 b0 00 00
Mar 15 17:08:13 video kernel: [ 6614.457969] xhci_hcd 0000:01:00.0: WARN Event TRB for slot 12 ep 10 with no TDs queued?
Mar 15 17:08:13 video kernel: [ 6614.458274] xhci_hcd 0000:01:00.0: WARN Set TR deq ptr command for freed stream ID 38885
[ ... many ... ]
Mar 15 17:08:25 video kernel: [ 6626.497696] sd 18:0:0:0: [sde] tag#5 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE cmd_age=19s
Mar 15 17:08:25 video kernel: [ 6626.497725] sd 18:0:0:0: [sde] tag#5 Sense Key : Illegal Request [current]
Mar 15 17:08:25 video kernel: [ 6626.497731] sd 18:0:0:0: [sde] tag#5 Add. Sense: Invalid command operation code
Mar 15 17:08:25 video kernel: [ 6626.497739] sd 18:0:0:0: [sde] tag#5 CDB: Read(16) 88 00 00 00 00 01 5e 1d 88 00 00 00 01 00 00 00
Mar 15 17:08:25 video kernel: [ 6626.497746] blk_update_request: critical target error, dev sde, sector 5873960960 op 0x0:(READ) flags 0x0 phys_seg 32 prio class 0
Mar 15 17:08:25 video kernel: [ 6626.497755] md/raid10:md0: sde: rescheduling sector 11747394560
Mar 15 17:08:25 video kernel: [ 6626.497801] usb 3-1.1.4: stat urb: no pending cmd for uas-tag 3
Mar 15 17:08:25 video kernel: [ 6626.497807] md/raid10:md0: sdd: redirecting sector 11747394560 to another mirror
Mar 15 17:08:25 video kernel: [ 6626.519426] xhci_hcd 0000:01:00.0: WARN Event TRB for slot 12 ep 10 with no TDs queued?
Mar 15 17:08:25 video kernel: [ 6626.519719] xhci_hcd 0000:01:00.0: WARN Set TR deq ptr command for freed stream ID 38885
Mar 15 17:08:25 video kernel: [ 6626.550583] xhci_hcd 0000:01:00.0: WARN Event TRB for slot 12 ep 10 with no TDs queued?
Mar 15 17:08:25 video kernel: [ 6626.550875] xhci_hcd 0000:01:00.0: WARN Set TR deq ptr command for freed
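
Since the failing disk is reported to change from one incident to the next, it may help to tally the failures per device over a longer log. This is a minimal sketch, assuming the kernel log contains "critical target error, dev sdX," lines like the one in [3]. The function name is made up for illustration, and other kernel versions may word this message differently.

```shell
#!/bin/sh
# Tally "critical target error" lines per block device.
# Reads a kernel log on stdin and prints "<count> <device>" lines,
# most frequent device first. (Illustrative helper, not a standard tool.)
tally_disk_errors() {
    # Keep only the device name from each matching line,
    # then count identical names and sort by count.
    sed -n 's/.*critical target error, dev \([^,]*\),.*/\1/p' |
        sort | uniq -c | sort -rn | awk '{ print $1, $2 }'
}
```

Possible usage: `journalctl -k | tally_disk_errors`, or `tally_disk_errors < /var/log/kern.log` if the system writes that file. Because a dropped disk reappears under a new drive name, the counts describe names rather than physical drives. Mapping a name back to a USB port, as in the "usb 3-1.1.4" line above, takes an extra step.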