Re: WDCTL_RST failed for drive 0 / wd0: IDENTIFY failed (SATA autodetection issue after installation)

2022-05-29 Thread Rin Okuyama

Thank you for the detailed report!

I've added these controllers to the quirk list. With ahcisata_pci.c rev 1.63
and later, the AHCISATA_EXTRA_DELAY kernel option is no longer required.

Thanks,
rin

On 2022/05/27 15:02, Matthias Petermann wrote:

Hello Rin,

the option AHCISATA_EXTRA_DELAY seems to fix the problem for both systems below.

As discussed, here are the two dmesg outputs:

  - dmesg.nuc5.txt: from my NUC5 with AHCI and a Seagate hard disk.

  - dmesg.fujitsu.txt: from my Esprimo, with AHCI and wd2 (Seagate) and wd3 
(WD).

A few more notes:

  - On the NUC, I had temporarily replaced the hard drive in the meantime. In the 
process, the problem became harder to reproduce. Before I "moved" the 
cables, I could see the problem every time I booted. Now it happens only 
sporadically (even with the original hard drive installed).

  - On the Esprimo - where the error occurs at almost every cold boot - 
according to my observations, both mechanical hard disks are always affected 
(wd2 and wd3). The SSDs (wd0 and wd1), on the other hand, are always detected 
correctly.

More generally, the state of the cabling seems to contribute at least somewhat 
to the problems. With the NUC, unplugging and plugging in changed the 
probability of occurrence. With the Fujitsu, I noticed the problems more since 
I installed a 4x SATA dock. That the problem is almost certainly related to the 
AHCI SATA delay is suggested by the fact that it only occurs with NetBSD 
9.99.x and not with 9.2 or FreeBSD/Linux.

Especially with the Fujitsu, however, I had already exchanged cables several times 
beforehand and tried different things, because I had initially suspected a pure cabling 
problem. However, it seems to me at the moment that the cabling at most changes the 
timing and this is set so "on edge" that the problem sometimes occurs and 
sometimes not.

Kind regards
Matthias


On 24.05.2022 at 18:23, Rin Okuyama wrote:

Hi,

The recent change for probe timing should only affect ahcisata(4).
Is your SATA controller ahcisata(4)? If so,

(1) please try kernel built with:

---
options AHCISATA_EXTRA_DELAY
---

If it works around the problem,

(2) please send us full dmesg of your machine.

Then, we can add your controller to the quirk list. Once it is
registered in the list, the AHCISATA_EXTRA_DELAY option is no longer
required.

Thanks,
rin

On 2022/05/25 0:49, Matthias Petermann wrote:

A small addendum: disabling the Intel Platform Trust technology in the BIOS did 
not help me (had read this in another post of the linked thread).

However, by plugging in additional USB devices (a mouse) I apparently caused the 
necessary delay, which the disk would have needed in the first place to execute the 
WDCTL_RST without errors. This "workaround" is a shaky one though, an extremely 
close call. I don't even want to think about what I would do to a production server if 
this happened to me on a reboot.

Kind regards
Matthias


On 24.05.2022 at 17:31, Matthias Petermann wrote:


Hello all,

with one of the newer builds of 9.99 (unfortunately I can't narrow it down 
more) I have a problem on a NUC5 with a Seagate Firecuda SATA hard drive 
(hybrid HDD/SSD).

As long as I boot from the USB stick (for installation, as well as later for 
booting the kernel with root redirected to the wd0) the hard drive wd0 is 
recognized correctly and works without problems.

When I boot directly from the wd0 hard drive, I get through the boot loader 
fine, which also still loads the kernel correctly into memory. However, when 
running the initialization or hardware detection, there is then a problem with 
the initialization of wd0:

```
WDCTL_RST failed for drive 0
wd0: IDENTIFY failed
```

The error pattern does not seem to be rare, and the closest match is probably 
this post:

http://mail-index.netbsd.org/current-users/2022/03/01/msg042073.html

Recent changes to the SATA autodetection timing are mentioned there. This would 
fit my experience, since I had the problem neither with 9.1 (build from 
02/16/2021) nor with older 9.99 versions. Does anyone know more specifics about 
this timing thing, as well as known workarounds if there are any? I have 
several NUC5s with exactly this model of hard drive running stably for several 
years - it would be a shame if I now have to replace them for such a reason.

Many greetings
Matthias




Re: WDCTL_RST failed for drive 0 / wd0: IDENTIFY failed (SATA autodetection issue after installation)

2022-05-25 Thread Robert Nestor
The boot issue I’m seeing on a VM in Linux with 9.99.96 was with a build from 
2022-05-11, so I need to download a new set of files and try again. May take me 
a week or so to get some results but I’ll report back when I get them.

Thanks for the hint!
-bob

On May 25, 2022, at 2:44 PM, matthew green  wrote:

> [ .. ]
>> install 9.99.96 in a Virtual Machine (on Linux using KVM) I noticed that
>> after installing to a qcow2 disk any attempt to boot the disk results in
>> not being able to find the boot device.  However, the boot log shows
> 
> was this between 2022-05-08 and 2022-05-22?  i accidentally
> broke some types of bootable images that Jared fixed, and
> i think this error matches the failure seen.
> 
> 
> .mrg.
> 
> https://mail-index.netbsd.org/source-changes/2022/05/08/msg138416.html
> https://mail-index.netbsd.org/source-changes/2022/05/22/msg138783.html



Re: WDCTL_RST failed for drive 0 / wd0: IDENTIFY failed (SATA autodetection issue after installation)

2022-05-25 Thread Robert Nestor
Well, test went quicker than I expected.  I downloaded the amd64 image for 
9.99.97 (assuming it had the fix since it was built 2022-05-25).  When I tried 
booting the CD in a new VM it shows the same issue - the log shows it found the 
cd but then it claims it can’t find the root device.  Entering cd0 allows it to 
proceed.  There also seems to be another issue I don’t recall seeing previously. 
Although the VM is configured with both a PS/2 and USB keyboard and mouse, the 
keyboard isn’t usable unless I remove the USB keyboard and mouse from the VM.

-bob

On May 25, 2022, at 2:44 PM, matthew green  wrote:

> [ .. ]
>> install 9.99.96 in a Virtual Machine (on Linux using KVM) I noticed that
>> after installing to a qcow2 disk any attempt to boot the disk results in
>> not being able to find the boot device.  However, the boot log shows
> 
> was this between 2022-05-08 and 2022-05-22?  i accidentally
> broke some types of bootable images that Jared fixed, and
> i think this error matches the failure seen.
> 
> 
> .mrg.
> 
> https://mail-index.netbsd.org/source-changes/2022/05/08/msg138416.html
> https://mail-index.netbsd.org/source-changes/2022/05/22/msg138783.html



re: WDCTL_RST failed for drive 0 / wd0: IDENTIFY failed (SATA autodetection issue after installation)

2022-05-25 Thread matthew green
[ .. ]
> install 9.99.96 in a Virtual Machine (on Linux using KVM) I noticed that
> after installing to a qcow2 disk any attempt to boot the disk results in
> not being able to find the boot device.  However, the boot log shows

was this between 2022-05-08 and 2022-05-22?  i accidentally
broke some types of bootable images that Jared fixed, and
i think this error matches the failure seen.


.mrg.

https://mail-index.netbsd.org/source-changes/2022/05/08/msg138416.html
https://mail-index.netbsd.org/source-changes/2022/05/22/msg138783.html


Re: WDCTL_RST failed for drive 0 / wd0: IDENTIFY failed (SATA autodetection issue after installation)

2022-05-24 Thread Robert Nestor
I first saw this issue on a system trying to install and run 9.92, and adding 
the suggested AHCISATA_EXTRA_DELAY and disabling TPM seemed to fix it for me.  
But then I tried 9.99.96 and saw the same problems and the fixes had no effect.

However I may have stumbled onto something that could be one of the causes, 
although I haven’t completely tested it yet.  While trying to install 9.99.96 
in a Virtual Machine (on Linux using KVM) I noticed that after installing to a 
qcow2 disk any attempt to boot the disk results in not being able to find the 
boot device.  However, the boot log shows the disks were located and in the 
case of GPT partitioning all the wedges were found and identified correctly.  
Responding to the prompt for a boot device with “dk1” where the system was 
installed, allows the system to come up and run.  This makes me suspect that 
there may be some timing issue with disk identification in the boot code - 
maybe there’s something not being detected and passed to the kernel correctly 
for successful boot?

So far all I’ve tested is an installation with UEFI boot and GPT partitions.  I 
don’t remember if I saw the problem on real hardware using a BIOS boot though 
and don’t know if I ever tried doing an installation with MBR instead of GPT.

BTW, this (for me) could just be an issue with KVM on Linux and have nothing at 
all to do with NetBSD, but so far I haven’t seen anything similar with other 
installations I’ve done under KVM.  At this point I’ve successfully installed 
and run 3 different Linux systems, FreeBSD, MSDOS, FreeDOS, Solaris and Windows 
95, 98, XP and 10.  The only one showing a problem so far has been 9.99.96 of 
NetBSD, and an 8.0 version of NetBSD installs and runs OK as well.  Tried 
NetBSD 9.92 and it had problems, but don’t recall offhand what they were at the 
moment.

-bob

Re: WDCTL_RST failed for drive 0 / wd0: IDENTIFY failed (SATA autodetection issue after installation)

2022-05-24 Thread Matthias Petermann

Hi Rin,

thank you for your quick response. I can first confirm that the 
controller installed in the system is ahcisata(4). I have two different 
model variants where the problem occurs - on one very reliably at every 
boot, and on the other after almost every cold start (and only two of 
four disks affected on the latter). I will build and test the kernel 
with AHCISATA_EXTRA_DELAY and give feedback in a timely manner.


Many greetings
Matthias

On 24.05.2022 at 18:23, Rin Okuyama wrote:

Hi,

The recent change for probe timing should only affect ahcisata(4).
Is your SATA controller ahcisata(4)? If so,

(1) please try kernel built with:

---
options AHCISATA_EXTRA_DELAY
---

If it works around the problem,

(2) please send us full dmesg of your machine.

Then, we can add your controller to the quirk list. Once it is
registered in the list, the AHCISATA_EXTRA_DELAY option is no longer
required.

Thanks,
rin

On 2022/05/25 0:49, Matthias Petermann wrote:
A small addendum: disabling the Intel Platform Trust technology in the 
BIOS did not help me (had read this in another post of the linked 
thread).


However, by plugging in additional USB devices (a mouse) I apparently 
caused the necessary delay, which the disk would have needed in the 
first place to execute the WDCTL_RST without errors. This "workaround" 
is a shaky one though, an extremely close call. I don't even want to 
think about what I would do to a production server if this happened to 
me on a reboot.


Kind regards
Matthias


On 24.05.2022 at 17:31, Matthias Petermann wrote:


Hello all,

with one of the newer builds of 9.99 (unfortunately I can't narrow it 
down more) I have a problem on a NUC5 with a Seagate Firecuda SATA 
hard drive (hybrid HDD/SSD).


As long as I boot from the USB stick (for installation, as well as 
later for booting the kernel with root redirected to the wd0) the 
hard drive wd0 is recognized correctly and works without problems.


When I boot directly from the wd0 hard drive, I get through the boot 
loader fine, which also still loads the kernel correctly into memory. 
However, when running the initialization or hardware detection, there 
is then a problem with the initialization of wd0:


```
WDCTL_RST failed for drive 0
wd0: IDENTIFY failed
```

The error pattern does not seem to be rare, and the closest match 
is probably this post:


http://mail-index.netbsd.org/current-users/2022/03/01/msg042073.html

Recent changes to the SATA autodetection timing are mentioned there. 
This would fit my experience, since I had the problem neither with 
9.1 (build from 02/16/2021) nor with older 9.99 versions. Does anyone 
know more specifics about this timing thing, as well as known 
workarounds if there are any? I have several NUC5s with exactly this 
model of hard drive running stably for several years - it would be a 
shame if I now have to replace them for such a reason.


Many greetings
Matthias






Re: WDCTL_RST failed for drive 0 / wd0: IDENTIFY failed (SATA autodetection issue after installation)

2022-05-24 Thread Rin Okuyama

Hi,

The recent change for probe timing should only affect ahcisata(4).
Is your SATA controller ahcisata(4)? If so,

(1) please try kernel built with:

---
options AHCISATA_EXTRA_DELAY
---

If it works around the problem,

(2) please send us full dmesg of your machine.

Then, we can add your controller to the quirk list. Once it is
registered in the list, the AHCISATA_EXTRA_DELAY option is no longer
required.
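
The quirk-list idea can be sketched as a small table lookup. This is only a
minimal illustration, not the actual ahcisata_pci.c code: the struct layout,
the helper names quirk_lookup() and controller_quirks(), and the
vendor/product IDs are made up for the example. Only the flag name
AHCI_QUIRK_EXTRA_DELAY appears elsewhere in this thread, and its bit value
here is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Flag name taken from the thread; the bit value is an assumption. */
#define AHCI_QUIRK_EXTRA_DELAY 0x01

/* Hypothetical quirk entry: matches one PCI vendor/product pair. */
struct quirk_entry {
	uint16_t vendor;
	uint16_t product;
	int      quirks;
};

/* Placeholder IDs for illustration only, not real controllers. */
static const struct quirk_entry quirk_table[] = {
	{ 0x1234, 0x0001, AHCI_QUIRK_EXTRA_DELAY },
	{ 0x1234, 0x0002, AHCI_QUIRK_EXTRA_DELAY },
};

/* Return the quirk flags for a controller, or 0 if it is not listed. */
int
quirk_lookup(const struct quirk_entry *table, size_t n,
    uint16_t vendor, uint16_t product)
{
	assert(table != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (table[i].vendor == vendor && table[i].product == product)
			return table[i].quirks;
	}
	return 0;
}

/* Convenience wrapper over the built-in table. */
int
controller_quirks(uint16_t vendor, uint16_t product)
{
	return quirk_lookup(quirk_table,
	    sizeof(quirk_table) / sizeof(quirk_table[0]), vendor, product);
}
```

A listed controller gets the extra delay automatically, while an unlisted one
returns 0 and keeps the fast probe path, which is why the kernel option
becomes unnecessary once a controller is registered.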

Thanks,
rin

On 2022/05/25 0:49, Matthias Petermann wrote:

A small addendum: disabling the Intel Platform Trust technology in the BIOS did 
not help me (had read this in another post of the linked thread).

However, by plugging in additional USB devices (a mouse) I apparently caused the 
necessary delay, which the disk would have needed in the first place to execute the 
WDCTL_RST without errors. This "workaround" is a shaky one though, an extremely 
close call. I don't even want to think about what I would do to a production server if 
this happened to me on a reboot.

Kind regards
Matthias


On 24.05.2022 at 17:31, Matthias Petermann wrote:


Hello all,

with one of the newer builds of 9.99 (unfortunately I can't narrow it down 
more) I have a problem on a NUC5 with a Seagate Firecuda SATA hard drive 
(hybrid HDD/SSD).

As long as I boot from the USB stick (for installation, as well as later for 
booting the kernel with root redirected to the wd0) the hard drive wd0 is 
recognized correctly and works without problems.

When I boot directly from the wd0 hard drive, I get through the boot loader 
fine, which also still loads the kernel correctly into memory. However, when 
running the initialization or hardware detection, there is then a problem with 
the initialization of wd0:

```
WDCTL_RST failed for drive 0
wd0: IDENTIFY failed
```

The error pattern does not seem to be rare, and the closest match is probably 
this post:

http://mail-index.netbsd.org/current-users/2022/03/01/msg042073.html

Recent changes to the SATA autodetection timing are mentioned there. This would 
fit my experience, since I had the problem neither with 9.1 (build from 
02/16/2021) nor with older 9.99 versions. Does anyone know more specifics about 
this timing thing, as well as known workarounds if there are any? I have 
several NUC5s with exactly this model of hard drive running stably for several 
years - it would be a shame if I now have to replace them for such a reason.

Many greetings
Matthias




Re: WDCTL_RST failed for drive 0 / wd0: IDENTIFY failed (SATA autodetection issue after installation)

2022-05-24 Thread Matthias Petermann
A small addendum: disabling the Intel Platform Trust technology in the 
BIOS did not help me (had read this in another post of the linked thread).


However, by plugging in additional USB devices (a mouse) I apparently 
caused the necessary delay, which the disk would have needed in the 
first place to execute the WDCTL_RST without errors. This "workaround" is 
a shaky one though, an extremely close call. I don't even want to think 
about what I would do to a production server if this happened to me on a 
reboot.


Kind regards
Matthias


On 24.05.2022 at 17:31, Matthias Petermann wrote:


Hello all,

with one of the newer builds of 9.99 (unfortunately I can't narrow it 
down more) I have a problem on a NUC5 with a Seagate Firecuda SATA hard 
drive (hybrid HDD/SSD).


As long as I boot from the USB stick (for installation, as well as later 
for booting the kernel with root redirected to the wd0) the hard drive 
wd0 is recognized correctly and works without problems.


When I boot directly from the wd0 hard drive, I get through the boot 
loader fine, which also still loads the kernel correctly into memory. 
However, when running the initialization or hardware detection, there is 
then a problem with the initialization of wd0:


```
WDCTL_RST failed for drive 0
wd0: IDENTIFY failed
```

The error pattern does not seem to be rare, and the closest match is 
probably this post:


http://mail-index.netbsd.org/current-users/2022/03/01/msg042073.html

Recent changes to the SATA autodetection timing are mentioned there. 
This would fit my experience, since I had the problem neither with 9.1 
(build from 02/16/2021) nor with older 9.99 versions. Does anyone know 
more specifics about this timing thing, as well as known workarounds if 
there are any? I have several NUC5s with exactly this model of hard 
drive running stably for several years - it would be a shame if I now 
have to replace them for such a reason.


Many greetings
Matthias




WDCTL_RST failed for drive 0 / wd0: IDENTIFY failed (SATA autodetection issue after installation)

2022-05-24 Thread Matthias Petermann



Hello all,

with one of the newer builds of 9.99 (unfortunately I can't narrow it 
down more) I have a problem on a NUC5 with a Seagate Firecuda SATA hard 
drive (hybrid HDD/SSD).


As long as I boot from the USB stick (for installation, as well as later 
for booting the kernel with root redirected to the wd0) the hard drive 
wd0 is recognized correctly and works without problems.


When I boot directly from the wd0 hard drive, I get through the boot 
loader fine, which also still loads the kernel correctly into memory. 
However, when running the initialization or hardware detection, there is 
then a problem with the initialization of wd0:


```
WDCTL_RST failed for drive 0
wd0: IDENTIFY failed
```

The error pattern does not seem to be rare, and the closest match is 
probably this post:


http://mail-index.netbsd.org/current-users/2022/03/01/msg042073.html

Recent changes to the SATA autodetection timing are mentioned there. 
This would fit my experience, since I had the problem neither with 9.1 
(build from 02/16/2021) nor with older 9.99 versions. Does anyone know 
more specifics about this timing thing, as well as known workarounds if 
there are any? I have several NUC5s with exactly this model of hard 
drive running stably for several years - it would be a shame if I now 
have to replace them for such a reason.


Many greetings
Matthias


Re: IDENTIFY failed

2021-11-19 Thread Rin Okuyama

Sorry for the late reply.

Patrick, Jun, thank you very much for testing!

I've committed the patch:
http://mail-index.netbsd.org/source-changes/2021/11/19/msg133924.html

Thanks,
rin

On 2021/11/10 1:10, Patrick Welche wrote:

On Mon, Nov 08, 2021 at 08:42:44PM +0900, Rin Okuyama wrote:

Jun, Patrick, thank you for dmesg (and discussion offlist).

For Jun, the problem is no longer reproducible even with the original
copy of kernel, which failed before.

So, I've just added AHCI_QUIRK_EXTRA_DELAY quirk for Patrick's machine:

https://gist.github.com/rokuyama/7535594fc42a7867e3890702aee34c5c

With this patch, AHCISATA_EXTRA_DELAY option is no longer required for
this machine.


I cvs updated, rebuilt the kernel without the DELAY, and checked that
the problem still existed. (it does) Then applied your gist patch, and
had a successful reboot!

(I haven't tried reducing the delay)


Thanks,

Patrick



Re: IDENTIFY failed

2021-11-09 Thread Patrick Welche
On Mon, Nov 08, 2021 at 08:42:44PM +0900, Rin Okuyama wrote:
> Jun, Patrick, thank you for dmesg (and discussion offlist).
> 
> For Jun, the problem is no longer reproducible even with the original
> copy of kernel, which failed before.
> 
> So, I've just added AHCI_QUIRK_EXTRA_DELAY quirk for Patrick's machine:
> 
> https://gist.github.com/rokuyama/7535594fc42a7867e3890702aee34c5c
> 
> With this patch, AHCISATA_EXTRA_DELAY option is no longer required for
> this machine.

I cvs updated, rebuilt the kernel without the DELAY, and checked that
the problem still existed. (it does) Then applied your gist patch, and
had a successful reboot!

(I haven't tried reducing the delay)


Thanks,

Patrick


Re: IDENTIFY failed

2021-11-08 Thread Jun Ebihara
From: Rin Okuyama 
Subject: Re: IDENTIFY failed
Date: Mon, 8 Nov 2021 20:42:44 +0900

> So, I've just added AHCI_QUIRK_EXTRA_DELAY quirk for Patrick's
> machine:
> https://gist.github.com/rokuyama/7535594fc42a7867e3890702aee34c5c
> With this patch, AHCISATA_EXTRA_DELAY option is no longer required for
> this machine.

Applied, and it boots fine.
https://cdn.netbsd.org/pub/NetBSD/misc/jun/amd64/kernel/netbsd-AHCI_QUIRK_EXTRA_DELAY-9.99.92.gz
https://github.com/ebijun/NetBSD/blob/master/dmesg/amd64/SONY_VGN-NW50JB

Thanx!
--
Jun Ebihara



Re: IDENTIFY failed

2021-11-08 Thread John Franklin


On Nov 4, 2021, at 08:00, Rin Okuyama  wrote:
> 
> Hmm, if affected hardware is somehow limited, we can just introduce something
> like AHCI_QUIRK_EXTRADELAY. Otherwise, we can reconsider, for example, before
> NetBSD 10 is released.
> 
> Jun, Patrick, can you please provide full dmesg for your machines?
> 

Is it a function of the AHCI controller, or the drive attached to it, or a 
mismatch between how the two handle the ATA protocol?  A quirk table would be a 
good solution.  I can only hope that it’s as easy as tagging a controller.

For the systems that demonstrate the failure, do other drives work fine?  That 
may be the easiest way to check.  The two drives in Ebihara-san’s dmesg output 
are low-cost mechanical drives, and I’m more suspicious of the drives than the 
controllers. 

jf
-- 
John Franklin
frank...@elfie.org



Re: IDENTIFY failed

2021-11-08 Thread Rin Okuyama

Jun, Patrick, thank you for dmesg (and discussion offlist).

For Jun, the problem is no longer reproducible even with the original
copy of kernel, which failed before.

So, I've just added AHCI_QUIRK_EXTRA_DELAY quirk for Patrick's machine:

https://gist.github.com/rokuyama/7535594fc42a7867e3890702aee34c5c

With this patch, AHCISATA_EXTRA_DELAY option is no longer required for
this machine.

Also, I've added the AHCISATA_EXTRA_DELAY_MS option. It specifies how
many extra milliseconds the driver should sleep when the AHCI_QUIRK_EXTRA_DELAY
quirk or the AHCISATA_EXTRA_DELAY option is in effect. The default is still
500ms, but you can adjust the delay like this:

options AHCISATA_EXTRA_DELAY_MS=100

I will commit the patch if there's no objection.
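
How the option, the quirk, and the configurable delay could fit together can
be sketched roughly as below. This is a standalone illustration and not the
committed patch: the helper extra_delay_ms() and the bit value of
AHCI_QUIRK_EXTRA_DELAY are assumptions, and the kernel option is modelled as a
plain boolean argument.

```c
#include <assert.h>
#include <stdbool.h>

/* Default from this thread; override with -DAHCISATA_EXTRA_DELAY_MS=100. */
#ifndef AHCISATA_EXTRA_DELAY_MS
#define AHCISATA_EXTRA_DELAY_MS 500
#endif

/* Flag name from the thread; the bit value is an assumption. */
#define AHCI_QUIRK_EXTRA_DELAY 0x01

/*
 * Hypothetical helper: how long to sleep at one delay site.
 * Sleeps only if the option is set or the controller carries the quirk.
 */
int
extra_delay_ms(int quirks, bool option_enabled)
{
	assert(AHCISATA_EXTRA_DELAY_MS >= 0);
	if (option_enabled || (quirks & AHCI_QUIRK_EXTRA_DELAY) != 0)
		return AHCISATA_EXTRA_DELAY_MS;
	return 0;
}
```

With this shape, unaffected hardware pays no delay at all, and the length of
the delay stays a single tunable for the machines that need it.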

Thanks,
rin


Re: IDENTIFY failed

2021-11-08 Thread Rin Okuyama

On 2021/11/04 23:28, Brian Buhrow wrote:

Hello.  Without going and reading the probe routines, I wonder if we can create some sort
of hybrid approach?  Specifically, probe with the shorter delays, then, if we get a timeout,
reset and probe with the longer delays?  That will let hardware that doesn't exhibit the
behavior work with the faster probes, while slightly slowing the non-working hardware
during boot, since it's probed twice.  Again, I'm not sure how difficult it is to introduce that
logic, but it's similar to the logic we used to determine if old PATA drives needed specific ATA
commands to address blocks over 148GB, or something like that.  (We'd try the
standard command and, if it failed, then try the altered command and set a quirk.)


I'm not sure whether this is possible. The failure should be related to
ahci_probe_drive(), but the error itself occurs afterward in wdattach().
I wonder whether we can start it over with extra delays from when
wdattach() fails.

Even if it is possible, this would need modifications to the MI ata(4)
layer. If the affected hardware is limited, it should be cleaner to add a
quirk to work around it.

Thanks,
rin


Re: IDENTIFY failed

2021-11-04 Thread Jun Ebihara
From: Rin Okuyama 
Subject: Re: IDENTIFY failed
Date: Thu, 4 Nov 2021 21:18:35 +0900

> Yeah. Patrick, Jun, experiments with adjusting the delays would be
> appreciated a lot, if you have time. But, dmesg should be helpful enough :)

On my environment,

1. After that, I went back to the original kernel, and it boots fine.
>>>>>>>>> NetBSD 9.99.92 (GENERIC) #0: Mon Oct 25 20:32:38 UTC 2021 Failed

 8 reboots, and a cold boot after the system had been stopped for 15 min, seem OK.

2. Changed the delay time to 500/250/125/60/30/15/7/0: boots fine.
+#define AHCISATA_EXTRA_DELAY_TIME 250


--- ahcisata_core.c 2021/11/04 23:58:09 1.1
+++ ahcisata_core.c 2021/11/05 00:00:11
@@ -114,6 +114,7 @@
 #define ATA_DELAY 10000 /* 10s for a drive I/O */
 #define ATA_RESET_DELAY 31000 /* 31s for a drive reset */
 #define AHCI_RST_WAIT (ATA_RESET_DELAY / 10)
+#define AHCISATA_EXTRA_DELAY_TIME 250
 
 const struct ata_bustype ahci_ata_bustype = {
.bustype_type = SCSIPI_BUSTYPE_ATA,
@@ -971,7 +972,7 @@
 end:
ahci_channel_stop(sc, chp, flags);
 #ifdef AHCISATA_EXTRA_DELAY
-   ata_delay(chp, 500, "ahcirst", flags);
+   ata_delay(chp, AHCISATA_EXTRA_DELAY_TIME, "ahcirst", flags);
 #endif
/* clear port interrupt register */
AHCI_WRITE(sc, AHCI_P_IS(chp->ch_channel), 0xffffffff);
@@ -997,7 +998,7 @@
}
ata_kill_active(chp, KILL_RESET, flags);
 #ifdef AHCISATA_EXTRA_DELAY
-   ata_delay(chp, 500, "ahcirst", flags);
+   ata_delay(chp, AHCISATA_EXTRA_DELAY_TIME, "ahcirst", flags);
 #endif
/* clear port interrupt register */
AHCI_WRITE(sc, AHCI_P_IS(chp->ch_channel), 0xffffffff);
@@ -1069,7 +1070,7 @@
achp->ahcic_sstatus, AT_WAIT)) {
case SStatus_DET_DEV:
 #ifdef AHCISATA_EXTRA_DELAY
-   ata_delay(chp, 500, "ahcidv", AT_WAIT);
+   ata_delay(chp, AHCISATA_EXTRA_DELAY_TIME, "ahcidv", AT_WAIT);
 #endif
 
/* Initial value, used in case the soft reset fails */
@@ -1111,7 +1112,7 @@
AHCI_P_IX_PSS | AHCI_P_IX_DHRS | AHCI_P_IX_SDBS);
 #ifdef AHCISATA_EXTRA_DELAY
/* wait 500ms before actually starting operations */
-   ata_delay(chp, 500, "ahciprb", AT_WAIT);
+   ata_delay(chp, AHCISATA_EXTRA_DELAY_TIME, "ahciprb", AT_WAIT);
 #endif
break;
 



Re: IDENTIFY failed

2021-11-04 Thread Jun Ebihara
From: Rin Okuyama 
Subject: Re: IDENTIFY failed
Date: Thu, 4 Nov 2021 21:00:58 +0900

> Hmm, if affected hardware is somehow limited, we can just introduce something
> like AHCI_QUIRK_EXTRADELAY. Otherwise, we can reconsider, for example, before
> NetBSD 10 is released.
> Jun, Patrick, can you please provide full dmesg for your machines?

two machines,
https://github.com/ebijun/NetBSD/blob/master/dmesg/amd64/SONY_VGN-NW50JB
https://github.com/ebijun/NetBSD/blob/master/dmesg/amd64/ASUS_X200M

thanx.
--
Jun Ebihara


Re: IDENTIFY failed

2021-11-04 Thread Brian Buhrow
Hello.  Without going and reading the probe routines, I wonder if we can create some sort
of hybrid approach?  Specifically, probe with the shorter delays, then, if we get a timeout,
reset and probe with the longer delays?  That will let hardware that doesn't exhibit the
behavior work with the faster probes, while slightly slowing the non-working hardware
during boot, since it's probed twice.  Again, I'm not sure how difficult it is to introduce that
logic, but it's similar to the logic we used to determine if old PATA drives needed specific ATA
commands to address blocks over 148GB, or something like that.  (We'd try the
standard command and, if it failed, then try the altered command and set a quirk.)

-thanks
-Brian
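
The control flow of such a hybrid probe might look roughly like the sketch
below. It is a standalone simulation, not driver code: probe_with_fallback(),
the probe_fn callback type and fake_drive_probe() are hypothetical names, and
the sketch deliberately ignores where in the real driver a failure would
actually surface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical probe callback: returns true if the drive answered
 * after waiting delay_ms milliseconds.
 */
typedef bool (*probe_fn)(int delay_ms, void *arg);

/*
 * Try the fast probe first; fall back to the long delay on failure.
 * Returns the delay that worked, or -1 if both attempts failed.
 */
int
probe_with_fallback(probe_fn probe, void *arg, int short_ms, int long_ms)
{
	assert(probe != NULL);
	if (probe(short_ms, arg))
		return short_ms;
	if (probe(long_ms, arg))
		return long_ms;
	return -1;
}

/* Simulated drive that needs at least *arg milliseconds to be ready. */
bool
fake_drive_probe(int delay_ms, void *arg)
{
	int needed_ms = *(const int *)arg;

	return delay_ms >= needed_ms;
}
```

Well-behaved hardware returns on the first attempt and never pays the long
delay. Only hardware that fails the fast probe is probed twice.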



Re: IDENTIFY failed

2021-11-04 Thread Rin Okuyama

Yeah. Patrick, Jun, experiments with adjusting the delays would be appreciated
a lot, if you have time. But, dmesg should be helpful enough :)

Thanks,
rin

On 2021/11/04 21:04, Jared McNeill wrote:

It's also possible that 2 full seconds of delays are unnecessary. Do those 
delays really need to be 500ms each?

On Thu, 4 Nov 2021, Rin Okuyama wrote:


Yeah, I know that. But, we already have two problem reports. What I am
concerned about is similar problems will occur for a lot of machines.

(Thinking again...) But, yes, by this way, innocent people will be punished
forever by extra seconds per boot...

Hmm, if affected hardware is somehow limited, we can just introduce something
like AHCI_QUIRK_EXTRADELAY. Otherwise, we can reconsider, for example, before
NetBSD 10 is released.

Jun, Patrick, can you please provide full dmesg for your machines?

Thanks,
rin

On 2021/11/04 19:58, Jared McNeill wrote:

 From the commit message:

   There are a handful of inexplicable 500ms delays introduced to the drive
   detect path in this driver, slowing boot. They can be re-enabled with
   options AHCISATA_EXTRA_DELAY, but should not be enabled for normal
   kernels.
   If a delay does need to be introduced in these places, the value should
   either be more carefully selected or the scope limited to hardware that
   requires the extra delay.

I don't have any hardware that has problems with the delays removed, so go 
ahead and revert this commit if you're happy with that as a solution. It would 
be better to fix the problem properly though as this costs multiple seconds per 
drive at boot.

Take care,
Jared


On Thu, 4 Nov 2021, Rin Okuyama wrote:


Can't we put back AHCISATA_EXTRA_DELAY by default?

IIUC, the option affects only probe/reset; no bad effects for
I/O performance.

Thanks,
rin

On 2021/11/01 21:19, Patrick Welche wrote:

On Fri, Oct 29, 2021 at 01:05:26PM +0900, Jun Ebihara wrote:

From: matthew green 
Subject: re: IDENTIFY failed
Date: Fri, 29 Oct 2021 07:18:09 +1100


autoconfiguration error: ahcisata0 port 1: setting WDCTL_RST failed for drive 0

https://mail-index.netbsd.org/current-users/2021/10/27/msg041615.html

this one has reduced timeframe, too:

between
NetBSD 9.99.91 (GENERIC) #0: Tue Oct 12 19:57:53 UTC 2021 OK
NetBSD 9.99.92 (GENERIC) #0: Mon Oct 25 20:32:38 UTC 2021 Failed

which changed how some interrupt handling works, and:
http://mail-index.netbsd.org/source-changes/2021/10/11/msg132941.html
which removed some delays in the probe path.  possibly this one
is more likely to be at fault since it touches the probe path
directly.


add
/usr/src/sys/arch/amd64/conf/GENERIC.local
options AHCISATA_EXTRA_DELAY

compile kernel


That did the trick - thanks! (Wanted to be near the box before trying it)


Cheers,

Patrick







Re: IDENTIFY failed

2021-11-04 Thread Jared McNeill
It's also possible that 2 full seconds of delays are unnecessary. Do those 
delays really need to be 500ms each?
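
The "2 full seconds" figure follows from the number of delay sites. As a rough
sketch, assuming the four ata_delay() call sites guarded by
AHCISATA_EXTRA_DELAY in the diff posted elsewhere in this thread, and assuming
each one is hit once (which may not hold on every probe or reset path), the
added wait can be estimated as below. The names EXTRA_DELAY_SITES and
total_extra_delay_ms() are made up for this illustration.

```c
#include <assert.h>

/* Number of ata_delay() sites under AHCISATA_EXTRA_DELAY in the posted diff. */
#define EXTRA_DELAY_SITES 4

/* Estimated added wait in milliseconds if every site is hit exactly once. */
int
total_extra_delay_ms(int per_site_ms)
{
	assert(per_site_ms >= 0);
	return EXTRA_DELAY_SITES * per_site_ms;
}
```

For example, total_extra_delay_ms(500) gives 2000, and reducing each site to
125 ms would bring the estimate down to 500 ms.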


On Thu, 4 Nov 2021, Rin Okuyama wrote:

Yeah, I know that. But, we already have two problem reports. What I am
concerned about is similar problems will occur for a lot of machines.

(Thinking again...) But, yes, by this way, innocent people will be punished
forever by extra seconds per boot...

Hmm, if affected hardware is somehow limited, we can just introduce something
like AHCI_QUIRK_EXTRADELAY. Otherwise, we can reconsider, for example, before
NetBSD 10 is released.

Jun, Patrick, can you please provide full dmesg for your machines?

Thanks,
rin

On 2021/11/04 19:58, Jared McNeill wrote:

 From the commit message:

   There are a handful of inexplicable 500ms delays introduced to the drive
   detect path in this driver, slowing boot. They can be re-enabled with
   options AHCISATA_EXTRA_DELAY, but should not be enabled for normal
   kernels.
   If a delay does need to be introduced in these places, the value should
   either be more carefully selected or the scope limited to hardware that
   requires the extra delay.

I don't have any hardware that has problems with the delays 
removed, so go ahead and revert this commit if you're happy with 
that as a solution. It would be better to fix the problem 
properly though as this costs multiple seconds per drive at boot.


Take care,
Jared


On Thu, 4 Nov 2021, Rin Okuyama wrote:


Can't we put back AHCISATA_EXTRA_DELAY by default?

IIUC, the option affects only probe/reset; no bad effects for
I/O performance.

Thanks,
rin

On 2021/11/01 21:19, Patrick Welche wrote:

On Fri, Oct 29, 2021 at 01:05:26PM +0900, Jun Ebihara wrote:

From: matthew green 
Subject: re: IDENTIFY failed
Date: Fri, 29 Oct 2021 07:18:09 +1100

autoconfiguration error: ahcisata0 port 1: setting WDCTL_RST failed for drive 0

https://mail-index.netbsd.org/current-users/2021/10/27/msg041615.html

this one has reduced timeframe, too:

between
NetBSD 9.99.91 (GENERIC) #0: Tue Oct 12 19:57:53 UTC 2021 OK
NetBSD 9.99.92 (GENERIC) #0: Mon Oct 25 20:32:38 UTC 2021 Failed

which changed how some interrupt handling works, and:
http://mail-index.netbsd.org/source-changes/2021/10/11/msg132941.html
which removed some delays in the probe path.  possibly this one
is more likely to be at fault since it touches the probe path
directly.


add
/usr/src/sys/arch/amd64/conf/GENERIC.local
options AHCISATA_EXTRA_DELAY

compile kernel


That did the trick - thanks! (Wanted to be near the box before trying it)



Cheers,

Patrick





Re: IDENTIFY failed

2021-11-04 Thread Rin Okuyama

Yeah, I know that. But, we already have two problem reports. What I am
concerned about is similar problems will occur for a lot of machines.

(Thinking again...) But, yes, by this way, innocent people will be punished
forever by extra seconds per boot...

Hmm, if affected hardware is somehow limited, we can just introduce something
like AHCI_QUIRK_EXTRADELAY. Otherwise, we can reconsider, for example, before
NetBSD 10 is released.

Jun, Patrick, can you please provide full dmesg for your machines?

Thanks,
rin

On 2021/11/04 19:58, Jared McNeill wrote:

 From the commit message:

   There are a handful of inexplicable 500ms delays introduced to the drive
   detect path in this driver, slowing boot. They can be re-enabled with
   options AHCISATA_EXTRA_DELAY, but should not be enabled for normal
   kernels.
   If a delay does need to be introduced in these places, the value should
   either be more carefully selected or the scope limited to hardware that
   requires the extra delay.

I don't have any hardware that has problems with the delays removed, so go 
ahead and revert this commit if you're happy with that as a solution. It would 
be better to fix the problem properly though as this costs multiple seconds per 
drive at boot.

Take care,
Jared


On Thu, 4 Nov 2021, Rin Okuyama wrote:


Can't we put back AHCISATA_EXTRA_DELAY by default?

IIUC, the option affects only probe/reset; no bad effects for
I/O performance.

Thanks,
rin

On 2021/11/01 21:19, Patrick Welche wrote:

On Fri, Oct 29, 2021 at 01:05:26PM +0900, Jun Ebihara wrote:

From: matthew green 
Subject: re: IDENTIFY failed
Date: Fri, 29 Oct 2021 07:18:09 +1100


autoconfiguration error: ahcisata0 port 1: setting WDCTL_RST failed for drive 0

https://mail-index.netbsd.org/current-users/2021/10/27/msg041615.html

this one has reduced timeframe, too:

between
NetBSD 9.99.91 (GENERIC) #0: Tue Oct 12 19:57:53 UTC 2021 OK
NetBSD 9.99.92 (GENERIC) #0: Mon Oct 25 20:32:38 UTC 2021 Failed

which changed how some interrupt handling works, and:
    http://mail-index.netbsd.org/source-changes/2021/10/11/msg132941.html
which removed some delays in the probe path.  possibly this one
is more likely to be at fault since it touches the probe path
directly.


add
/usr/src/sys/arch/amd64/conf/GENERIC.local
options AHCISATA_EXTRA_DELAY

compile kernel


That did the trick - thanks! (Wanted to be near the box before trying it)


Cheers,

Patrick





Re: IDENTIFY failed

2021-11-04 Thread Jared McNeill

From the commit message:


  There are a handful of inexplicable 500ms delays introduced to the drive
  detect path in this driver, slowing boot. They can be re-enabled with
  options AHCISATA_EXTRA_DELAY, but should not be enabled for normal
  kernels.
  If a delay does need to be introduced in these places, the value should
  either be more carefully selected or the scope limited to hardware that
  requires the extra delay.

I don't have any hardware that has problems with the delays removed, so go 
ahead and revert this commit if you're happy with that as a solution. It 
would be better to fix the problem properly though as this costs multiple 
seconds per drive at boot.


Take care,
Jared


On Thu, 4 Nov 2021, Rin Okuyama wrote:


Can't we put back AHCISATA_EXTRA_DELAY by default?

IIUC, the option affects only probe/reset; no bad effects for
I/O performance.

Thanks,
rin

On 2021/11/01 21:19, Patrick Welche wrote:

On Fri, Oct 29, 2021 at 01:05:26PM +0900, Jun Ebihara wrote:

From: matthew green 
Subject: re: IDENTIFY failed
Date: Fri, 29 Oct 2021 07:18:09 +1100

autoconfiguration error: ahcisata0 port 1: setting WDCTL_RST failed for drive 0

https://mail-index.netbsd.org/current-users/2021/10/27/msg041615.html

this one has reduced timeframe, too:

between
NetBSD 9.99.91 (GENERIC) #0: Tue Oct 12 19:57:53 UTC 2021 OK
NetBSD 9.99.92 (GENERIC) #0: Mon Oct 25 20:32:38 UTC 2021 Failed

which changed how some interrupt handling works, and:
http://mail-index.netbsd.org/source-changes/2021/10/11/msg132941.html
which removed some delays in the probe path.  possibly this one
is more likely to be at fault since it touches the probe path
directly.


add
/usr/src/sys/arch/amd64/conf/GENERIC.local
options AHCISATA_EXTRA_DELAY

compile kernel


That did the trick - thanks! (Wanted to be near the box before trying it)


Cheers,

Patrick





Re: IDENTIFY failed

2021-11-04 Thread Rin Okuyama

Can't we put back AHCISATA_EXTRA_DELAY by default?

IIUC, the option affects only probe/reset; no bad effects for
I/O performance.

Thanks,
rin

On 2021/11/01 21:19, Patrick Welche wrote:

On Fri, Oct 29, 2021 at 01:05:26PM +0900, Jun Ebihara wrote:

From: matthew green 
Subject: re: IDENTIFY failed
Date: Fri, 29 Oct 2021 07:18:09 +1100


autoconfiguration error: ahcisata0 port 1: setting WDCTL_RST failed for drive 0

https://mail-index.netbsd.org/current-users/2021/10/27/msg041615.html

this one has reduced timeframe, too:

between
NetBSD 9.99.91 (GENERIC) #0: Tue Oct 12 19:57:53 UTC 2021 OK
NetBSD 9.99.92 (GENERIC) #0: Mon Oct 25 20:32:38 UTC 2021 Failed

which changed how some interrupt handling works, and:
http://mail-index.netbsd.org/source-changes/2021/10/11/msg132941.html
which removed some delays in the probe path.  possibly this one
is more likely to be at fault since it touches the probe path
directly.


add
/usr/src/sys/arch/amd64/conf/GENERIC.local
options AHCISATA_EXTRA_DELAY

compile kernel


That did the trick - thanks! (Wanted to be near the box before trying it)


Cheers,

Patrick



Re: IDENTIFY failed

2021-11-01 Thread Patrick Welche
On Fri, Oct 29, 2021 at 01:05:26PM +0900, Jun Ebihara wrote:
> From: matthew green 
> Subject: re: IDENTIFY failed
> Date: Fri, 29 Oct 2021 07:18:09 +1100
> 
> >> > autoconfiguration error: ahcisata0 port 1: setting WDCTL_RST failed for 
> >> > drive 0
> >> https://mail-index.netbsd.org/current-users/2021/10/27/msg041615.html
> > this one has reduced timeframe, too:
> >> between
> >> NetBSD 9.99.91 (GENERIC) #0: Tue Oct 12 19:57:53 UTC 2021 OK
> >> NetBSD 9.99.92 (GENERIC) #0: Mon Oct 25 20:32:38 UTC 2021 Failed 
> > which changed how some interrupt handling works, and:
> >http://mail-index.netbsd.org/source-changes/2021/10/11/msg132941.html
> > which removed some delays in the probe path.  possibly this one
> > is more likely to be at fault since it touches the probe path
> > directly.
> 
> add 
> /usr/src/sys/arch/amd64/conf/GENERIC.local
> options AHCISATA_EXTRA_DELAY
> 
> compile kernel

That did the trick - thanks! (Wanted to be near the box before trying it)


Cheers,

Patrick


Re: IDENTIFY failed

2021-10-28 Thread Jun Ebihara
From: matthew green 
Subject: re: IDENTIFY failed
Date: Fri, 29 Oct 2021 07:18:09 +1100

>> > autoconfiguration error: ahcisata0 port 1: setting WDCTL_RST failed for 
>> > drive 0
>> https://mail-index.netbsd.org/current-users/2021/10/27/msg041615.html
> this one has reduced timeframe, too:
>> between
>> NetBSD 9.99.91 (GENERIC) #0: Tue Oct 12 19:57:53 UTC 2021 OK
>> NetBSD 9.99.92 (GENERIC) #0: Mon Oct 25 20:32:38 UTC 2021 Failed 
> which changed how some interrupt handling works, and:
>http://mail-index.netbsd.org/source-changes/2021/10/11/msg132941.html
> which removed some delays in the probe path.  possibly this one
> is more likely to be at fault since it touches the probe path
> directly.

add 
/usr/src/sys/arch/amd64/conf/GENERIC.local
options AHCISATA_EXTRA_DELAY

compile kernel
https://cdn.netbsd.org/pub/NetBSD/misc/jun/amd64/kernel/netbsd-AHCISATA_EXTRA_DELAY-9.99.92.gz

seems ok.

Thanx.
--
Jun Ebihara


re: IDENTIFY failed

2021-10-28 Thread matthew green
> > wd1 at atabus1 drive 0
> > autoconfiguration error: ahcisata0 port 1: setting WDCTL_RST failed for 
> > drive 0
> > wd1: autoconfiguration error: IDENTIFY failed
> > wd1(ahcisata0:1:0): using PIO mode 0
> >
> > and booting fails. Reverting and booting with 9.99.90 gets me a working box:
> >
> > wd1 at atabus1 drive 0
> > wd1: 
> > wd1: drive supports 16-sector PIO transfers, LBA48 addressing
> > wd1: 9314 GB, 19377850 cyl, 16 head, 63 sec, 512 bytes/sect...
> > ...
> > wd1(ahcisata0:1:0): using PIO mode 4, DMA mode 2, Ultra-DMA mode 6 
> > (Ultra/133) (using DMA), NCQ (31 tags)
> >
> > I'm sure someone else saw this too, but I can't find the original post...
>
> https://mail-index.netbsd.org/current-users/2021/10/27/msg041615.html

this one has reduced timeframe, too:

> between
> NetBSD 9.99.91 (GENERIC) #0: Tue Oct 12 19:57:53 UTC 2021 OK
> NetBSD 9.99.92 (GENERIC) #0: Mon Oct 25 20:32:38 UTC 2021 Failed 

two possible changes to test reverting:

   http://mail-index.netbsd.org/source-changes/2021/10/05/msg132733.html

which changed how some interrupt handling works, and:

   http://mail-index.netbsd.org/source-changes/2021/10/11/msg132941.html

which removed some delays in the probe path.  possibly this one
is more likely to be at fault since it touches the probe path
directly.


.mrg.


Re: IDENTIFY failed

2021-10-28 Thread Chavdar Ivanov
On Thu, 28 Oct 2021 at 14:11, Patrick Welche  wrote:
>
> Updating from NetBSD-9.99.90/amd64 to 9.99.92, I get the following failure:
>
> wd1 at atabus1 drive 0
> autoconfiguration error: ahcisata0 port 1: setting WDCTL_RST failed for drive 0
> wd1: autoconfiguration error: IDENTIFY failed
> wd1(ahcisata0:1:0): using PIO mode 0
>
> and booting fails. Reverting and booting with 9.99.90 gets me a working box:
>
> wd1 at atabus1 drive 0
> wd1: 
> wd1: drive supports 16-sector PIO transfers, LBA48 addressing
> wd1: 9314 GB, 19377850 cyl, 16 head, 63 sec, 512 bytes/sect...
> ...
> wd1(ahcisata0:1:0): using PIO mode 4, DMA mode 2, Ultra-DMA mode 6 
> (Ultra/133) (using DMA), NCQ (31 tags)
>
> I'm sure someone else saw this too, but I can't find the original post...

https://mail-index.netbsd.org/current-users/2021/10/27/msg041615.html

>
>
> Cheers,
>
> Patrick



-- 



IDENTIFY failed

2021-10-28 Thread Patrick Welche
Updating from NetBSD-9.99.90/amd64 to 9.99.92, I get the following failure:

wd1 at atabus1 drive 0
autoconfiguration error: ahcisata0 port 1: setting WDCTL_RST failed for drive 0
wd1: autoconfiguration error: IDENTIFY failed
wd1(ahcisata0:1:0): using PIO mode 0

and booting fails. Reverting and booting with 9.99.90 gets me a working box:

wd1 at atabus1 drive 0
wd1: 
wd1: drive supports 16-sector PIO transfers, LBA48 addressing
wd1: 9314 GB, 19377850 cyl, 16 head, 63 sec, 512 bytes/sect...
...
wd1(ahcisata0:1:0): using PIO mode 4, DMA mode 2, Ultra-DMA mode 6 (Ultra/133) 
(using DMA), NCQ (31 tags)

I'm sure someone else saw this too, but I can't find the original post...


Cheers,

Patrick