Re: WDCTL_RST failed for drive 0 / wd0: IDENTIFY failed (SATA autodetection issue after installation)
Thank you for the detailed report! I've added these controllers to the quirk list. With ahcisata_pci.c rev 1.63 and later, the AHCISATA_EXTRA_DELAY kernel option is no longer required.

Thanks, rin

On 2022/05/27 15:02, Matthias Petermann wrote:

Hello Rin,

the option AHCISATA_EXTRA_DELAY seems to fix the problem for both systems below. As discussed, I am sending the two dmesg outputs:

- dmesg.nuc5.txt: from my NUC5 with AHCI and a Seagate hard disk.
- dmesg.fujitsu.txt: from my Esprimo, with AHCI and wd2 (Seagate) and wd3 (WD).

A few more notes:

- On the NUC, I had temporarily replaced the hard drive. In the process, the problem became less reproducible. Before I moved the cables, I saw the problem on every boot. Now it happens only occasionally (even with the original hard drive installed).
- On the Esprimo, where the error occurs at almost every cold boot, both mechanical hard disks (wd2 and wd3) are always affected, according to my observations. The SSDs (wd0 and wd1), on the other hand, are always detected correctly.

More generally, the state of the cabling seems to contribute at least somewhat to the problems. With the NUC, unplugging and replugging changed the probability of occurrence. With the Fujitsu, I have noticed the problems more since I installed a 4x SATA dock.

That the problem is almost certainly related to the AHCI SATA delay is suggested by the fact that it occurs only with NetBSD 9.99.x and not with 9.2 or FreeBSD/Linux. Especially with the Fujitsu, however, I had already exchanged cables several times beforehand and tried different things, because I had initially suspected a pure cabling problem. At the moment it seems to me that the cabling at most changes the timing, and that the timing is so close to the edge that the problem sometimes occurs and sometimes does not.

Kind regards
Matthias
Re: WDCTL_RST failed for drive 0 / wd0: IDENTIFY failed (SATA autodetection issue after installation)
The boot issue I’m seeing on a VM in Linux with 9.99.96 was with a build from 2022-05-11, so I need to download a new set of files and try again. It may take me a week or so to get some results, but I’ll report back when I get them.

Thanks for the hint!

-bob

On May 25, 2022, at 2:44 PM, matthew green wrote:
> [ .. ]
>> install 9.99.96 in a Virtual Machine (on Linux using KVM) I noticed that
>> after installing to a qcow2 disk any attempt to boot the disk results in
>> not being able to find the boot device. However, the boot log shows
>
> was this between 2022-05-08 and 2022-05-22? i accidentally
> broke some types of bootable images that Jared fixed, and
> i think this error matches the failure seen.
>
>
> .mrg.
>
> https://mail-index.netbsd.org/source-changes/2022/05/08/msg138416.html
> https://mail-index.netbsd.org/source-changes/2022/05/22/msg138783.html
Re: WDCTL_RST failed for drive 0 / wd0: IDENTIFY failed (SATA autodetection issue after installation)
Well, the test went quicker than I expected. I downloaded the amd64 image for 9.99.97 (assuming it had the fix, since it was built 2022-05-25). When I tried booting the CD in a new VM, it showed the same issue: the log shows it found the CD, but then it claims it can’t find the root device. Entering cd0 allows it to proceed.

There also seems to be another issue I don’t recall seeing previously. Although the VM is configured with both a PS/2 and a USB keyboard and mouse, the keyboard isn’t usable unless I remove the USB keyboard and mouse from the VM.

-bob

On May 25, 2022, at 2:44 PM, matthew green wrote:
> [ .. ]
>> install 9.99.96 in a Virtual Machine (on Linux using KVM) I noticed that
>> after installing to a qcow2 disk any attempt to boot the disk results in
>> not being able to find the boot device. However, the boot log shows
>
> was this between 2022-05-08 and 2022-05-22? i accidentally
> broke some types of bootable images that Jared fixed, and
> i think this error matches the failure seen.
>
>
> .mrg.
>
> https://mail-index.netbsd.org/source-changes/2022/05/08/msg138416.html
> https://mail-index.netbsd.org/source-changes/2022/05/22/msg138783.html
re: WDCTL_RST failed for drive 0 / wd0: IDENTIFY failed (SATA autodetection issue after installation)
[ .. ]
> install 9.99.96 in a Virtual Machine (on Linux using KVM) I noticed that
> after installing to a qcow2 disk any attempt to boot the disk results in
> not being able to find the boot device. However, the boot log shows

was this between 2022-05-08 and 2022-05-22? i accidentally
broke some types of bootable images that Jared fixed, and
i think this error matches the failure seen.

.mrg.

https://mail-index.netbsd.org/source-changes/2022/05/08/msg138416.html
https://mail-index.netbsd.org/source-changes/2022/05/22/msg138783.html
Re: WDCTL_RST failed for drive 0 / wd0: IDENTIFY failed (SATA autodetection issue after installation)
I first saw this issue on a system trying to install and run 9.92, and adding the suggested AHCISATA_EXTRA_DELAY and disabling TPM seemed to fix it for me. But then I tried 9.99.96 and saw the same problems, and the fixes had no effect.

However, I may have stumbled onto something that could be one of the causes, although I haven’t completely tested it yet. While trying to install 9.99.96 in a Virtual Machine (on Linux using KVM) I noticed that after installing to a qcow2 disk any attempt to boot the disk results in not being able to find the boot device. However, the boot log shows the disks were located and, in the case of GPT partitioning, all the wedges were found and identified correctly. Responding to the prompt for a boot device with “dk1”, where the system was installed, allows the system to come up and run.

This makes me suspect that there may be some timing issue with disk identification in the boot code. Maybe something is not being detected and passed to the kernel correctly for a successful boot? So far all I’ve tested is an installation with UEFI boot and GPT partitions. I don’t remember if I saw the problem on real hardware using a BIOS boot, and I don’t know if I ever tried doing an installation with MBR instead of GPT.

BTW, this (for me) could just be an issue with KVM on Linux and have nothing at all to do with NetBSD, but so far I haven’t seen anything similar with other installations I’ve done under KVM. At this point I’ve successfully installed and run 3 different Linux systems, FreeBSD, MSDOS, FreeDOS, Solaris and Windows 95, 98, XP and 10. The only one showing a problem so far has been 9.99.96 of NetBSD, and an 8.0 version of NetBSD installs and runs OK as well. I tried NetBSD 9.92 and it had problems, but I don’t recall offhand what they were.

-bob
Re: WDCTL_RST failed for drive 0 / wd0: IDENTIFY failed (SATA autodetection issue after installation)
Hi Rin,

thank you for your quick response. I can first confirm that the controller installed in the system is ahcisata(4). I have two different model variants where the problem occurs: on one very reliably at every boot, and on the other after almost every cold start (and only two of four disks are affected on the latter).

I will build and test the kernel with AHCISATA_EXTRA_DELAY and give feedback in a timely manner.

Many greetings
Matthias
Re: WDCTL_RST failed for drive 0 / wd0: IDENTIFY failed (SATA autodetection issue after installation)
Hi,

The recent change for probe timing should only affect ahcisata(4). Is your SATA controller ahcisata(4)? If so,

(1) please try a kernel built with:

---
options AHCISATA_EXTRA_DELAY
---

If it works around the problem,

(2) please send us the full dmesg of your machine.

Then we can add your controller to the quirk list. Once it is registered in the list, the AHCISATA_EXTRA_DELAY option is no longer required.

Thanks, rin
Re: WDCTL_RST failed for drive 0 / wd0: IDENTIFY failed (SATA autodetection issue after installation)
A small addendum: disabling the Intel Platform Trust technology in the BIOS did not help me (I had read this in another post of the linked thread). However, by plugging in additional USB devices (a mouse) I apparently caused the necessary delay, which the disk would have needed in the first place to execute the WDCTL_RST without errors.

This "workaround" is a shaky one though, an extremely close call. I don't even want to think about what I would do to a production server if this happened to me on a reboot.

Kind regards
Matthias
WDCTL_RST failed for drive 0 / wd0: IDENTIFY failed (SATA autodetection issue after installation)
Hello all,

with one of the newer builds of 9.99 (unfortunately I can't narrow it down more) I have a problem on a NUC5 with a Seagate Firecuda SATA hard drive (hybrid HDD/SSD).

As long as I boot from the USB stick (for installation, as well as later for booting the kernel with root redirected to the wd0) the hard drive wd0 is recognized correctly and works without problems.

When I boot directly from the wd0 hard drive, I get through the boot loader fine, which also still loads the kernel correctly into memory. However, when running the initialization or hardware detection, there is then a problem with the initialization of wd0:

```
WDCTL_RST failed for drive 0
wd0: IDENTIFY failed
```

The error pattern seems to be not quite rare, and probably the closest to it is this post:

http://mail-index.netbsd.org/current-users/2022/03/01/msg042073.html

Recent changes to the SATA autodetection timing are mentioned there. This would fit my experience, since I had the problem neither with 9.1 (build from 02/16/2021) nor with older 9.99 versions.

Does anyone know more specifics about this timing thing, as well as known workarounds if there are any? I have several NUC5s with exactly this model of hard drive running stably for several years - it would be a shame if I now have to replace them for such a reason.

Many greetings
Matthias
Re: IDENTIFY failed
Sorry for the late reply. Patrick, Jun, thank you very much for testing!

I've committed the patch:

http://mail-index.netbsd.org/source-changes/2021/11/19/msg133924.html

Thanks, rin
Re: IDENTIFY failed
On Mon, Nov 08, 2021 at 08:42:44PM +0900, Rin Okuyama wrote:
> Jun, Patrick, thank you for dmesg (and discussion offlist).
>
> For Jun, the problem is no longer reproducible even with the original
> copy of kernel, which failed before.
>
> So, I've just added AHCI_QUIRK_EXTRA_DELAY quirk for Patrick's machine:
>
> https://gist.github.com/rokuyama/7535594fc42a7867e3890702aee34c5c
>
> With this patch, AHCISATA_EXTRA_DELAY option is no longer required for
> this machine.

I cvs updated, rebuilt the kernel without the DELAY, and checked that the problem still existed. (it does)

Then applied your gist patch, and had a successful reboot!

(I haven't tried reducing the delay)

Thanks,

Patrick
Re: IDENTIFY failed
From: Rin Okuyama
Subject: Re: IDENTIFY failed
Date: Mon, 8 Nov 2021 20:42:44 +0900

> So, I've just added AHCI_QUIRK_EXTRA_DELAY quirk for Patrick's
> machine:
> https://gist.github.com/rokuyama/7535594fc42a7867e3890702aee34c5c
> With this patch, AHCISATA_EXTRA_DELAY option is no longer required for
> this machine.

Applied, and it boots fine.

https://cdn.netbsd.org/pub/NetBSD/misc/jun/amd64/kernel/netbsd-AHCI_QUIRK_EXTRA_DELAY-9.99.92.gz
https://github.com/ebijun/NetBSD/blob/master/dmesg/amd64/SONY_VGN-NW50JB

Thanx!
--
Jun Ebihara
Re: IDENTIFY failed
On Nov 4, 2021, at 08:00, Rin Okuyama wrote:
>
> Hmm, if affected hardware is somehow limited, we can just introduce something
> like AHCI_QUIRK_EXTRADELAY. Otherwise, we can reconsider, for example, before
> NetBSD 10 is released.
>
> Jun, Patrick, can you please provide full dmesg for your machines?
>

Is it a function of the AHCI controller, or the drive attached to it, or a mismatch between how the two handle the ATA protocol? A quirk table would be a good solution. I can only hope that it’s as easy as tagging a controller.

For the systems that demonstrate the failure, do other drives work fine? That may be the easiest way to check. The two drives in Ebihara-san’s dmesg output are low-cost mechanical drives, and I’m more suspicious of the drives than the controllers.

jf
--
John Franklin
frank...@elfie.org
Re: IDENTIFY failed
Jun, Patrick, thank you for dmesg (and discussion offlist).

For Jun, the problem is no longer reproducible even with the original copy of the kernel, which failed before.

So, I've just added the AHCI_QUIRK_EXTRA_DELAY quirk for Patrick's machine:

https://gist.github.com/rokuyama/7535594fc42a7867e3890702aee34c5c

With this patch, the AHCISATA_EXTRA_DELAY option is no longer required for this machine.

Also, I've added the AHCISATA_EXTRA_DELAY_MS option. You can specify how many extra milliseconds the driver should sleep when the AHCI_QUIRK_EXTRA_DELAY quirk or the AHCISATA_EXTRA_DELAY option is in effect. The default is still 500ms, but you can adjust the delay like:

options AHCISATA_EXTRA_DELAY_MS=100

I will commit the patch if there's no objection.

Thanks, rin
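For readers who want a picture of how a per-controller quirk and a tunable delay can fit together, here is a minimal sketch in plain C. It is not the committed driver code: every identifier in it (EXTRA_DELAY_MS, QUIRK_EXTRA_DELAY, quirk_table, lookup_quirks, probe_delay_ms) and both PCI IDs are made up for illustration, and the real patch is the one in the gist linked above.

```c
#include <stddef.h>
#include <stdint.h>

/* All names below are hypothetical; this is not the NetBSD driver code. */
#ifndef EXTRA_DELAY_MS
#define EXTRA_DELAY_MS 500	/* default, mirroring the 500ms discussed above */
#endif

#define QUIRK_EXTRA_DELAY 0x1	/* controller needs the extra probe delay */

struct quirk_entry {
	uint16_t vendor;
	uint16_t product;
	unsigned flags;
};

/* Placeholder IDs invented for this example, not real controllers. */
static const struct quirk_entry quirk_table[] = {
	{ 0x1234, 0x0001, QUIRK_EXTRA_DELAY },
	{ 0x1234, 0x0002, 0 },
};

/* Return the quirk flags for a controller, or 0 if it is not listed. */
unsigned
lookup_quirks(uint16_t vendor, uint16_t product)
{
	size_t n = sizeof(quirk_table) / sizeof(quirk_table[0]);

	for (size_t i = 0; i < n; i++) {
		if (quirk_table[i].vendor == vendor &&
		    quirk_table[i].product == product)
			return quirk_table[i].flags;
	}
	return 0;
}

/*
 * Decide how long to sleep during probe/reset. force_option models a
 * kernel built with the global option; otherwise only controllers that
 * carry the quirk flag pay for the delay.
 */
unsigned
probe_delay_ms(unsigned quirks, int force_option)
{
	if (force_option || (quirks & QUIRK_EXTRA_DELAY))
		return EXTRA_DELAY_MS;
	return 0;
}
```

The point of the split is that unlisted controllers keep the fast probe path, while compiling with a different EXTRA_DELAY_MS (for example -DEXTRA_DELAY_MS=100) changes the delay for every affected controller at once.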
Re: IDENTIFY failed
On 2021/11/04 23:28, Brian Buhrow wrote:
> Hello. Without going and reading the probe routines, I wonder if we can create some sort of
> hybrid approach? Specifically, probe with the shorter delays, then, if we get a timeout, reset
> and probe with the longer delays? That will cause hardware that doesn't exhibit the behavior to
> work with the faster probes, while slowing the non-working hardware, slightly during boot,
> while it's probed twice. Again, I'm not sure how difficult it is to introduce that logic, but
> it's a similar logic we used to determine if old PATA drives needed specific ATA commands to
> address blocks over 148GB, or something like that. (We'd try the command with the standard
> command and, if it failed, then try it with the altered command and set a quirk.)

I'm not sure whether this is possible. The failure should be related to ahci_probe_drive(), but the error itself occurs afterward in wdattach(). I wonder whether we can start over with extra delays once wdattach() fails. If that is possible, it needs modifications to the MI ata(4) layer.

If the affected hardware is limited, it should be cleaner to add a quirk to work around it.

Thanks, rin
Re: IDENTIFY failed
From: Rin Okuyama
Subject: Re: IDENTIFY failed
Date: Thu, 4 Nov 2021 21:18:35 +0900

> Yeah. Patrick, Jun, experiment to adjust delays will be appreciated a
> lot,
> if you have time. But, dmesg should be helpful enough :)

In my environment:

1. After that, I went back to the original kernel, and it boots fine.

>>>>>>>>> NetBSD 9.99.92 (GENERIC) #0: Mon Oct 25 20:32:38 UTC 2021 Failed

Eight reboots, and a cold boot after the system had been stopped for 15 minutes, all seem OK.

2. Changed the delay time through 500/250/125/60/30/15/7/0: boots fine.

+#define AHCISATA_EXTRA_DELAY_TIME 250

--- ahcisata_core.c	2021/11/04 23:58:09	1.1
+++ ahcisata_core.c	2021/11/05 00:00:11
@@ -114,6 +114,7 @@
 #define ATA_DELAY 10000 /* 10s for a drive I/O */
 #define ATA_RESET_DELAY 31000 /* 31s for a drive reset */
 #define AHCI_RST_WAIT (ATA_RESET_DELAY / 10)
+#define AHCISATA_EXTRA_DELAY_TIME 250
 
 const struct ata_bustype ahci_ata_bustype = {
 	.bustype_type = SCSIPI_BUSTYPE_ATA,
@@ -971,7 +972,7 @@
 end:
 	ahci_channel_stop(sc, chp, flags);
 #ifdef AHCISATA_EXTRA_DELAY
-	ata_delay(chp, 500, "ahcirst", flags);
+	ata_delay(chp, AHCISATA_EXTRA_DELAY_TIME, "ahcirst", flags);
 #endif
 	/* clear port interrupt register */
 	AHCI_WRITE(sc, AHCI_P_IS(chp->ch_channel), 0xffffffff);
@@ -997,7 +998,7 @@
 	}
 	ata_kill_active(chp, KILL_RESET, flags);
 #ifdef AHCISATA_EXTRA_DELAY
-	ata_delay(chp, 500, "ahcirst", flags);
+	ata_delay(chp, AHCISATA_EXTRA_DELAY_TIME, "ahcirst", flags);
 #endif
 	/* clear port interrupt register */
 	AHCI_WRITE(sc, AHCI_P_IS(chp->ch_channel), 0xffffffff);
@@ -1069,7 +1070,7 @@
 	    achp->ahcic_sstatus, AT_WAIT)) {
 	case SStatus_DET_DEV:
 #ifdef AHCISATA_EXTRA_DELAY
-	ata_delay(chp, 500, "ahcidv", AT_WAIT);
+	ata_delay(chp, AHCISATA_EXTRA_DELAY_TIME, "ahcidv", AT_WAIT);
 #endif
 	/* Initial value, used in case the soft reset fails */
@@ -1111,7 +1112,7 @@
 	    AHCI_P_IX_PSS | AHCI_P_IX_DHRS | AHCI_P_IX_SDBS);
 #ifdef AHCISATA_EXTRA_DELAY
 	/* wait 500ms before actually starting operations */
-	ata_delay(chp, 500, "ahciprb", AT_WAIT);
+	ata_delay(chp, AHCISATA_EXTRA_DELAY_TIME, "ahciprb", AT_WAIT);
 #endif
 	break;
Re: IDENTIFY failed
From: Rin Okuyama
Subject: Re: IDENTIFY failed
Date: Thu, 4 Nov 2021 21:00:58 +0900

> Hmm, if affected hardware is somehow limited, we can just introduce
> something
> like AHCI_QUIRK_EXTRADELAY. Otherwise, we can reconsider, for example,
> before
> NetBSD 10 is released.
> Jun, Patrick, can you please provide full dmesg for your machines?

two machines,

https://github.com/ebijun/NetBSD/blob/master/dmesg/amd64/SONY_VGN-NW50JB
https://github.com/ebijun/NetBSD/blob/master/dmesg/amd64/ASUS_X200M

thanx.
--
Jun Ebihara
Re: IDENTIFY failed
Hello. Without going and reading the probe routines, I wonder if we can create some sort of hybrid approach? Specifically, probe with the shorter delays, then, if we get a timeout, reset and probe with the longer delays? That will cause hardware that doesn't exhibit the behavior to work with the faster probes, while slowing the non-working hardware slightly during boot, while it's probed twice.

Again, I'm not sure how difficult it is to introduce that logic, but it's a similar logic we used to determine if old PATA drives needed specific ATA commands to address blocks over 148GB, or something like that. (We'd try the command with the standard command and, if it failed, then try it with the altered command and set a quirk.)

-thanks
-Brian
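The control flow of such a hybrid probe might look roughly like the sketch below. It only illustrates the "try fast, retry slow, remember the result" idea in standalone C and does not claim the approach fits the actual ahcisata(4)/ata(4) attach sequence. The names probe_fn, probe_result, probe_with_fallback and fake_drive_probe are all hypothetical, and the fake drive is a toy model that answers only when given enough settling time.

```c
#include <stdbool.h>

/*
 * Hypothetical probe callback: returns true if the drive answered
 * after the driver waited delay_ms. Not a real NetBSD interface.
 */
typedef bool (*probe_fn)(void *ctx, unsigned delay_ms);

struct probe_result {
	bool ok;		/* drive was detected */
	bool needs_extra_delay;	/* detected only on the slow retry */
	int attempts;		/* number of probe attempts made */
};

/* Probe with the short delay first; on failure retry once with the long one. */
struct probe_result
probe_with_fallback(probe_fn probe, void *ctx, unsigned fast_ms,
    unsigned slow_ms)
{
	struct probe_result r = { false, false, 0 };

	r.attempts++;
	if (probe(ctx, fast_ms)) {
		r.ok = true;
		return r;
	}

	/* The fast attempt failed: pay for the longer delay only now. */
	r.attempts++;
	if (probe(ctx, slow_ms)) {
		r.ok = true;
		r.needs_extra_delay = true;	/* could be recorded as a quirk */
	}
	return r;
}

/* Toy drive model: ctx points to the settling time (ms) the drive needs. */
bool
fake_drive_probe(void *ctx, unsigned delay_ms)
{
	unsigned required = *(const unsigned *)ctx;

	return delay_ms >= required;
}
```

With this shape, hardware that responds quickly is probed once with no added delay, and only hardware that fails the first attempt is probed a second time.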
Re: IDENTIFY failed
Yeah. Patrick, Jun, experiments with adjusting the delays would be much appreciated, if you have time. But, dmesg should be helpful enough :)

Thanks, rin
Re: IDENTIFY failed
It's also possible that 2 full seconds of delays are unnecessary. Do those delays really need to be 500ms each?
Re: IDENTIFY failed
Yeah, I know that. But, we already have two problem reports. What I am concerned about is similar problems will occur for a lot of machines.

(Thinking again...)

But, yes, by this way, innocent people will be punished forever by extra seconds per boot...

Hmm, if affected hardware is somehow limited, we can just introduce something like AHCI_QUIRK_EXTRADELAY. Otherwise, we can reconsider, for example, before NetBSD 10 is released.

Jun, Patrick, can you please provide full dmesg for your machines?

Thanks,
rin

On 2021/11/04 19:58, Jared McNeill wrote:

From the commit message:

There are a handful of inexplicable 500ms delays introduced to the drive detect path in this driver, slowing boot. They can be re-enabled with options AHCISATA_EXTRA_DELAY, but should not be enabled for normal kernels. If a delay does need to be introduced in these places, the value should either be more carefully selected or the scope limited to hardware that requires the extra delay.

I don't have any hardware that has problems with the delays removed, so go ahead and revert this commit if you're happy with that as a solution. It would be better to fix the problem properly though as this costs multiple seconds per drive at boot.

Take care,
Jared

On Thu, 4 Nov 2021, Rin Okuyama wrote:

Can't we put back AHCISATA_EXTRA_DELAY by default? IIUC, the option affects only probe/reset; no bad effects for I/O performance.

Thanks,
rin

On 2021/11/01 21:19, Patrick Welche wrote:

On Fri, Oct 29, 2021 at 01:05:26PM +0900, Jun Ebihara wrote:

From: matthew green
Subject: re: IDENTIFY failed
Date: Fri, 29 Oct 2021 07:18:09 +1100

autoconfiguration error: ahcisata0 port 1: setting WDCTL_RST failed for drive 0

https://mail-index.netbsd.org/current-users/2021/10/27/msg041615.html

this one has reduced timeframe, too:

between
NetBSD 9.99.91 (GENERIC) #0: Tue Oct 12 19:57:53 UTC 2021 OK
NetBSD 9.99.92 (GENERIC) #0: Mon Oct 25 20:32:38 UTC 2021 Failed

which changed how some interrupt handling works, and:
http://mail-index.netbsd.org/source-changes/2021/10/11/msg132941.html
which removed some delays in the probe path. possibly this one is more likely to be at fault since it touches the probe path directly.

add
/usr/src/sys/arch/amd64/conf/GENERIC.local
options AHCISATA_EXTRA_DELAY

compile kernel

That did the trick - thanks! (Wanted to be near the box before trying it)

Cheers,
Patrick
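
The two approaches weighed in this message (a global AHCISATA_EXTRA_DELAY option versus a per-hardware quirk such as AHCI_QUIRK_EXTRADELAY) can be illustrated with a small standalone sketch. This is not the driver's code. The table layout, the function names, the placeholder PCI IDs, and the way the option and the quirk bit are combined are all assumptions made for illustration; only the option name, the quirk name, and the 500ms figure come from the thread.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical quirk bit; the name follows the suggestion in the thread. */
#define AHCI_QUIRK_EXTRADELAY	0x0001u

/* The thread mentions 500ms delays in the drive detect path. */
#define EXTRA_DELAY_MS		500u

/* Hypothetical quirk table entry keyed by PCI vendor/product ID. */
struct example_quirk_entry {
	uint16_t vendor;
	uint16_t product;
	uint32_t quirks;
};

/*
 * Placeholder entry for illustration only. 0xffff is not a valid PCI
 * vendor ID, so this cannot be mistaken for a real affected controller.
 */
static const struct example_quirk_entry example_quirk_table[] = {
	{ 0xffff, 0x0001, AHCI_QUIRK_EXTRADELAY },
};

/* Return the quirk bits for a controller, or 0 if it is not listed. */
static uint32_t
example_lookup_quirks(uint16_t vendor, uint16_t product)
{
	size_t i;
	size_t n = sizeof(example_quirk_table) / sizeof(example_quirk_table[0]);

	for (i = 0; i < n; i++) {
		if (example_quirk_table[i].vendor == vendor &&
		    example_quirk_table[i].product == product)
			return example_quirk_table[i].quirks;
	}
	return 0;
}

/*
 * Decide how long to wait during probe/reset. With the kernel option
 * defined, every controller pays the delay; without it, only controllers
 * carrying the quirk bit do.
 */
static unsigned int
example_probe_delay_ms(uint32_t quirks)
{
#ifdef AHCISATA_EXTRA_DELAY
	(void)quirks;
	return EXTRA_DELAY_MS;
#else
	return (quirks & AHCI_QUIRK_EXTRADELAY) ? EXTRA_DELAY_MS : 0;
#endif
}
```

With a scheme like this, unlisted controllers keep the faster boot, which is the trade-off Jared's commit message asks for, while listed ones get the old timing without a custom kernel.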
Re: IDENTIFY failed
From the commit message:

There are a handful of inexplicable 500ms delays introduced to the drive detect path in this driver, slowing boot. They can be re-enabled with options AHCISATA_EXTRA_DELAY, but should not be enabled for normal kernels. If a delay does need to be introduced in these places, the value should either be more carefully selected or the scope limited to hardware that requires the extra delay.

I don't have any hardware that has problems with the delays removed, so go ahead and revert this commit if you're happy with that as a solution. It would be better to fix the problem properly though as this costs multiple seconds per drive at boot.

Take care,
Jared

On Thu, 4 Nov 2021, Rin Okuyama wrote:

Can't we put back AHCISATA_EXTRA_DELAY by default? IIUC, the option affects only probe/reset; no bad effects for I/O performance.

Thanks,
rin

On 2021/11/01 21:19, Patrick Welche wrote:

On Fri, Oct 29, 2021 at 01:05:26PM +0900, Jun Ebihara wrote:

From: matthew green
Subject: re: IDENTIFY failed
Date: Fri, 29 Oct 2021 07:18:09 +1100

autoconfiguration error: ahcisata0 port 1: setting WDCTL_RST failed for drive 0

https://mail-index.netbsd.org/current-users/2021/10/27/msg041615.html

this one has reduced timeframe, too:

between
NetBSD 9.99.91 (GENERIC) #0: Tue Oct 12 19:57:53 UTC 2021 OK
NetBSD 9.99.92 (GENERIC) #0: Mon Oct 25 20:32:38 UTC 2021 Failed

which changed how some interrupt handling works, and:
http://mail-index.netbsd.org/source-changes/2021/10/11/msg132941.html
which removed some delays in the probe path. possibly this one is more likely to be at fault since it touches the probe path directly.

add
/usr/src/sys/arch/amd64/conf/GENERIC.local
options AHCISATA_EXTRA_DELAY

compile kernel

That did the trick - thanks! (Wanted to be near the box before trying it)

Cheers,
Patrick
Re: IDENTIFY failed
Can't we put back AHCISATA_EXTRA_DELAY by default? IIUC, the option affects only probe/reset; no bad effects for I/O performance.

Thanks,
rin

On 2021/11/01 21:19, Patrick Welche wrote:

On Fri, Oct 29, 2021 at 01:05:26PM +0900, Jun Ebihara wrote:

From: matthew green
Subject: re: IDENTIFY failed
Date: Fri, 29 Oct 2021 07:18:09 +1100

autoconfiguration error: ahcisata0 port 1: setting WDCTL_RST failed for drive 0

https://mail-index.netbsd.org/current-users/2021/10/27/msg041615.html

this one has reduced timeframe, too:

between
NetBSD 9.99.91 (GENERIC) #0: Tue Oct 12 19:57:53 UTC 2021 OK
NetBSD 9.99.92 (GENERIC) #0: Mon Oct 25 20:32:38 UTC 2021 Failed

which changed how some interrupt handling works, and:
http://mail-index.netbsd.org/source-changes/2021/10/11/msg132941.html
which removed some delays in the probe path. possibly this one is more likely to be at fault since it touches the probe path directly.

add
/usr/src/sys/arch/amd64/conf/GENERIC.local
options AHCISATA_EXTRA_DELAY

compile kernel

That did the trick - thanks! (Wanted to be near the box before trying it)

Cheers,
Patrick
Re: IDENTIFY failed
On Fri, Oct 29, 2021 at 01:05:26PM +0900, Jun Ebihara wrote:
> From: matthew green
> Subject: re: IDENTIFY failed
> Date: Fri, 29 Oct 2021 07:18:09 +1100
>
> >> > autoconfiguration error: ahcisata0 port 1: setting WDCTL_RST failed for
> >> > drive 0
> >> https://mail-index.netbsd.org/current-users/2021/10/27/msg041615.html
> > this one has reduced timeframe, too:
> >> between
> >> NetBSD 9.99.91 (GENERIC) #0: Tue Oct 12 19:57:53 UTC 2021 OK
> >> NetBSD 9.99.92 (GENERIC) #0: Mon Oct 25 20:32:38 UTC 2021 Failed
> > which changed how some interrupt handling works, and:
> > http://mail-index.netbsd.org/source-changes/2021/10/11/msg132941.html
> > which removed some delays in the probe path. possibly this one
> > is more likely to be at fault since it touches the probe path
> > directly.
>
> add
> /usr/src/sys/arch/amd64/conf/GENERIC.local
> options AHCISATA_EXTRA_DELAY
>
> compile kernel

That did the trick - thanks! (Wanted to be near the box before trying it)

Cheers,

Patrick
Re: IDENTIFY failed
From: matthew green
Subject: re: IDENTIFY failed
Date: Fri, 29 Oct 2021 07:18:09 +1100

>> > autoconfiguration error: ahcisata0 port 1: setting WDCTL_RST failed for
>> > drive 0
>> https://mail-index.netbsd.org/current-users/2021/10/27/msg041615.html
> this one has reduced timeframe, too:
>> between
>> NetBSD 9.99.91 (GENERIC) #0: Tue Oct 12 19:57:53 UTC 2021 OK
>> NetBSD 9.99.92 (GENERIC) #0: Mon Oct 25 20:32:38 UTC 2021 Failed
> which changed how some interrupt handling works, and:
> http://mail-index.netbsd.org/source-changes/2021/10/11/msg132941.html
> which removed some delays in the probe path. possibly this one
> is more likely to be at fault since it touches the probe path
> directly.

add
/usr/src/sys/arch/amd64/conf/GENERIC.local
options AHCISATA_EXTRA_DELAY

compile kernel

https://cdn.netbsd.org/pub/NetBSD/misc/jun/amd64/kernel/netbsd-AHCISATA_EXTRA_DELAY-9.99.92.gz

seems ok. Thanx.
--
Jun Ebihara
re: IDENTIFY failed
> > wd1 at atabus1 drive 0
> > autoconfiguration error: ahcisata0 port 1: setting WDCTL_RST failed for
> > drive 0
> > wd1: autoconfiguration error: IDENTIFY failed
> > wd1(ahcisata0:1:0): using PIO mode 0
> >
> > and booting fails. Reverting and booting with 9.99.90 gets me a working box:
> >
> > wd1 at atabus1 drive 0
> > wd1:
> > wd1: drive supports 16-sector PIO transfers, LBA48 addressing
> > wd1: 9314 GB, 19377850 cyl, 16 head, 63 sec, 512 bytes/sect...
> > ...
> > wd1(ahcisata0:1:0): using PIO mode 4, DMA mode 2, Ultra-DMA mode 6
> > (Ultra/133) (using DMA), NCQ (31 tags)
> >
> > I'm sure someone else saw this too, but I can't find the original post...
>
> https://mail-index.netbsd.org/current-users/2021/10/27/msg041615.html

this one has reduced timeframe, too:

> between
> NetBSD 9.99.91 (GENERIC) #0: Tue Oct 12 19:57:53 UTC 2021 OK
> NetBSD 9.99.92 (GENERIC) #0: Mon Oct 25 20:32:38 UTC 2021 Failed

two possible changes to test reverting:

http://mail-index.netbsd.org/source-changes/2021/10/05/msg132733.html

which changed how some interrupt handling works, and:

http://mail-index.netbsd.org/source-changes/2021/10/11/msg132941.html

which removed some delays in the probe path. possibly this one
is more likely to be at fault since it touches the probe path
directly.

.mrg.
Re: IDENTIFY failed
On Thu, 28 Oct 2021 at 14:11, Patrick Welche wrote:
>
> Updating from NetBSD-9.99.90/amd64 to 9.99.92, I get the following failure:
>
> wd1 at atabus1 drive 0
> autoconfiguration error: ahcisata0 port 1: setting WDCTL_RST failed for drive 0
> wd1: autoconfiguration error: IDENTIFY failed
> wd1(ahcisata0:1:0): using PIO mode 0
>
> and booting fails. Reverting and booting with 9.99.90 gets me a working box:
>
> wd1 at atabus1 drive 0
> wd1:
> wd1: drive supports 16-sector PIO transfers, LBA48 addressing
> wd1: 9314 GB, 19377850 cyl, 16 head, 63 sec, 512 bytes/sect...
> ...
> wd1(ahcisata0:1:0): using PIO mode 4, DMA mode 2, Ultra-DMA mode 6
> (Ultra/133) (using DMA), NCQ (31 tags)
>
> I'm sure someone else saw this too, but I can't find the original post...

https://mail-index.netbsd.org/current-users/2021/10/27/msg041615.html

>
>
> Cheers,
>
> Patrick
--
IDENTIFY failed
Updating from NetBSD-9.99.90/amd64 to 9.99.92, I get the following failure:

wd1 at atabus1 drive 0
autoconfiguration error: ahcisata0 port 1: setting WDCTL_RST failed for drive 0
wd1: autoconfiguration error: IDENTIFY failed
wd1(ahcisata0:1:0): using PIO mode 0

and booting fails. Reverting and booting with 9.99.90 gets me a working box:

wd1 at atabus1 drive 0
wd1:
wd1: drive supports 16-sector PIO transfers, LBA48 addressing
wd1: 9314 GB, 19377850 cyl, 16 head, 63 sec, 512 bytes/sect...
...
wd1(ahcisata0:1:0): using PIO mode 4, DMA mode 2, Ultra-DMA mode 6 (Ultra/133) (using DMA), NCQ (31 tags)

I'm sure someone else saw this too, but I can't find the original post...

Cheers,

Patrick