Re: zfs problems after rebuilding system [SOLVED]
> I based my fix heavily on that patch from the PR, but I rewrote it
> enough that I might've made any number of mistakes, so it needs fresh
> testing.

OK, I have been rebooting with the patch every ten minutes for 24 hours
now, and it comes back up perfectly every time. As far as I am
concerned, that's sufficient testing for me to say it's fixed, and I
would be very happy to have it merged into STABLE (I'll then roll it
out everywhere).

Thanks!

-pete.

___
freebsd-stable@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: zfs problems after rebuilding system [SOLVED]
On Mon, 2018-03-12 at 17:21 +, Pete French wrote:
> On 10/03/2018 23:48, Ian Lepore wrote:
> > I based my fix heavily on that patch from the PR, but I rewrote it
> > enough that I might've made any number of mistakes, so it needs fresh
> > testing. The main change I made was to make it a lot less noisy while
> > waiting (it only mentions the wait once, unless bootverbose is set, in
> > which case it's once per second). I also removed the logic that
> > limited the retries to nfs and zfs, because I think we can remove all
> > the old code related to waiting that only worked for ufs and let this
> > new retry be the way it waits for all filesystems. But that's a bigger
> > change we can do separately; I didn't want to hold up this fix any
> > longer.
>
> Thanks for the patch, it is very much appreciated! I applied this
> earlier today, and have been continuously rebooting the machine in Azure
> ever since (every ten minutes). This has worked flawlessly, so I am very
> happy that this fixes the issue for me. I am going to leave it running
> though, just to see if anything happens. I haven't examined dmesg, but I
> should be able to see the output from the patch there to verify that it's
> waiting, yes?
>
> cheers,
>
> -pete.

Yes, if the root filesystem isn't available on the first attempt, it
should emit a single line saying it will wait for up to N seconds for
it to arrive, where N is the vfs.mountroot.timeout value (3 seconds if
not set in loader.conf).

-- Ian
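For anyone who wants to check their own boot log for that line, here is a minimal sketch. The function name find_mountroot_wait is made up for illustration, and the pattern is only a guess at the wording of the notice, not the kernel's exact text, so adjust it to whatever your dmesg actually shows.

```shell
#!/bin/sh
# Hypothetical helper: print boot-log lines that look like the
# root-mount wait/retry notice. Reads the log on stdin so it can be
# fed from dmesg or from a saved copy of the boot messages.
# ASSUMPTION: the pattern below is a loose guess at the wording.
find_mountroot_wait() {
    grep -i -E 'mount(ing|root).*(retry|wait)'
}
```

Typical use would be `dmesg | find_mountroot_wait`, or `find_mountroot_wait < /var/run/dmesg.boot` after the message buffer has rolled over. No output most likely means the root filesystem was there on the first attempt. Running `kenv vfs.mountroot.timeout` should show the timeout if it was set from loader.conf.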
Re: zfs problems after rebuilding system [SOLVED]
On 10/03/2018 23:48, Ian Lepore wrote:
> I based my fix heavily on that patch from the PR, but I rewrote it
> enough that I might've made any number of mistakes, so it needs fresh
> testing. The main change I made was to make it a lot less noisy while
> waiting (it only mentions the wait once, unless bootverbose is set, in
> which case it's once per second). I also removed the logic that
> limited the retries to nfs and zfs, because I think we can remove all
> the old code related to waiting that only worked for ufs and let this
> new retry be the way it waits for all filesystems. But that's a bigger
> change we can do separately; I didn't want to hold up this fix any
> longer.

Thanks for the patch, it is very much appreciated! I applied this
earlier today, and have been continuously rebooting the machine in Azure
ever since (every ten minutes). This has worked flawlessly, so I am very
happy that this fixes the issue for me. I am going to leave it running
though, just to see if anything happens. I haven't examined dmesg, but I
should be able to see the output from the patch there to verify that it's
waiting, yes?

cheers,

-pete.
Re: zfs problems after rebuilding system [SOLVED]
On Sat, 2018-03-10 at 23:42 +, Pete French wrote:
> > It looks like r330745 applies fine to stable-11 without any changes,
> > and there's plenty of value in testing that as well, if you're already
> > set up for that world.
>
> I've been running the patch from the PR in production since the original
> bug report and it works fine. I haven't looked at r330745 yet, but I can
> replace the PR patch with that and give it a whirl. Will take a look
> Monday at what's possible.
>
> -pete.

I based my fix heavily on that patch from the PR, but I rewrote it
enough that I might've made any number of mistakes, so it needs fresh
testing. The main change I made was to make it a lot less noisy while
waiting (it only mentions the wait once, unless bootverbose is set, in
which case it's once per second). I also removed the logic that
limited the retries to nfs and zfs, because I think we can remove all
the old code related to waiting that only worked for ufs and let this
new retry be the way it waits for all filesystems. But that's a bigger
change we can do separately; I didn't want to hold up this fix any
longer.

-- Ian
Re: zfs problems after rebuilding system [SOLVED]
> It looks like r330745 applies fine to stable-11 without any changes,
> and there's plenty of value in testing that as well, if you're already
> set up for that world.

I've been running the patch from the PR in production since the original
bug report and it works fine. I haven't looked at r330745 yet, but I can
replace the PR patch with that and give it a whirl. Will take a look
Monday at what's possible.

-pete.
Re: zfs problems after rebuilding system [SOLVED]
On Sat, 2018-03-10 at 23:08 +, Pete French wrote:
> Ah, thank you! I haven't run current before, but as this is such an issue
> for us I'll set up an Azure machine running it and have it reboot every
> five minutes or so to check it works OK. Unfortunately the error doesn't
> show up consistently, as it's a race condition. Will let you know if it
> fails for any reason.
>
> -pete.
>
> [time to take a dive into the exciting world of current]

It looks like r330745 applies fine to stable-11 without any changes,
and there's plenty of value in testing that as well, if you're already
set up for that world.

-- Ian
Re: zfs problems after rebuilding system [SOLVED]
Ah, thank you! I haven't run current before, but as this is such an issue
for us I'll set up an Azure machine running it and have it reboot every
five minutes or so to check it works OK. Unfortunately the error doesn't
show up consistently, as it's a race condition. Will let you know if it
fails for any reason.

-pete.

[time to take a dive into the exciting world of current]
Re: zfs problems after rebuilding system [SOLVED]
On Sat, 2018-03-03 at 16:19 +, Pete French wrote:
> > That won't work for the boot drive.
> >
> > When no boot drive is detected early enough, the kernel goes to the
> > mountroot prompt. That seems to hold a Giant lock which inhibits
> > further progress being made. Sometimes progress can be made by trying
> > to mount unmountable partitions on other drives, but this usually goes
> > too fast, especially if the USB drive often times out.
>
> We have this problem in Azure with a ZFS root; it was fixed by the patch
> in this bug report, which actually starts off being about USB.
>
> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=208882
>
> You can then set the mountroot timeout as normal and it works.
>
> I would really like this patch to be applied, but it seems to have
> languished since last summer. We use this as standard on all our cloud
> machines now, and it works very nicely.
>
> -pete.

I've committed a fix to -current (r330745) based on that patch. It
would be good if people running -current who've had this problem could
give it some testing. I'd like to get it merged back to 11 before the
11.2 release (and back to 10-stable as well).

With r330745 in place, the only setting that should be needed if your
rootfs is on a device that is slow to arrive is vfs.mountroot.timeout=
in loader.conf; the value is the number of seconds to wait before
giving up and going to the mountroot prompt.

-- Ian
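As an illustration of that one setting, here is a minimal sketch of a shell helper that sets (or replaces) a single name="value" line in a loader.conf-style file. The function name set_loader_tunable is hypothetical and not part of FreeBSD; editing /boot/loader.conf by hand does the same job.

```shell
#!/bin/sh
# Hypothetical helper (not a FreeBSD tool): set or replace one
# name="value" line in a loader.conf-style file.
# Usage: set_loader_tunable FILE NAME VALUE
set_loader_tunable() {
    file=$1
    name=$2
    value=$3
    tmp="${file}.tmp.$$"
    if [ -f "$file" ]; then
        # Keep every line that does not start with NAME= (literal
        # match, so the dots in the tunable name are not wildcards).
        awk -v n="$name" 'index($0, n "=") != 1' "$file" > "$tmp"
    else
        : > "$tmp"
    fi
    # Append the new assignment in loader.conf's quoted form.
    printf '%s="%s"\n' "$name" "$value" >> "$tmp"
    mv "$tmp" "$file"
}
```

For example, `set_loader_tunable /boot/loader.conf vfs.mountroot.timeout 30` would leave exactly one `vfs.mountroot.timeout="30"` line in the file. The value 30 is an arbitrary choice for the example; pick whatever covers the slowest arrival you have seen.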
Re: zfs problems after rebuilding system [SOLVED]
Eugene Grosbein eugen at grosbein.net wrote on
Mon Mar 5 12:20:47 UTC 2018 :

> 05.03.2018 19:10, Dimitry Andric wrote:
>
>>> When no boot drive is detected early enough, the kernel goes to the
>>> mountroot prompt. That seems to hold a Giant lock which inhibits
>>> further progress being made. Sometimes progress can be made by trying
>>> to mount unmountable partitions on other drives, but this usually goes
>>> too fast, especially if the USB drive often times out.
>>
>> What I would like to know, is why our USB stack has such timeout issues
>> at all. When I boot Linux on the same type of hardware, I never see USB
>> timeouts. They must be doing something right, or maybe they just don't
>> bother checking some status bits that we are very strict about?
>
> This is heavily hardware-dependent. You may have no issues with some
> software+hardware combination and long timeouts with same software
> but different hardware.

Dimitry's example is for changing the software for the same(?)
hardware, if I understand right. (FreeBSD vs. some Linux
distribution.) (?: He did say "type of".) Perhaps that type of
hardware can be used to figure out the difference.

===
Mark Millard
marklmi at yahoo.com
( markmi at dsl-only.net is
going away in 2018-Feb, late)
Re: zfs problems after rebuilding system [SOLVED]
05.03.2018 19:10, Dimitry Andric wrote:

>> When no boot drive is detected early enough, the kernel goes to the
>> mountroot prompt. That seems to hold a Giant lock which inhibits
>> further progress being made. Sometimes progress can be made by trying
>> to mount unmountable partitions on other drives, but this usually goes
>> too fast, especially if the USB drive often times out.
>
> What I would like to know, is why our USB stack has such timeout issues
> at all. When I boot Linux on the same type of hardware, I never see USB
> timeouts. They must be doing something right, or maybe they just don't
> bother checking some status bits that we are very strict about?

This is heavily hardware-dependent. You may have no issues with some
software+hardware combination and long timeouts with same software
but different hardware.
Re: zfs problems after rebuilding system [SOLVED]
On 3 Mar 2018, at 13:56, Bruce Evans wrote:
>
> On Sat, 3 Mar 2018, tech-lists wrote:
>> On 03/03/2018 00:23, Dimitry Andric wrote:
...
>>> Whether this is due to some sort of BIOS handover trouble, or due to
>>> cheap and/or crappy USB-to-SATA bridges (even with brand WD and Seagate
>>> disks!), I have no idea. I attempted to debug it at some point, but
>>> a well-placed "sleep 10" was an acceptable workaround... :)
>>
>> That fixed it, thank you again :D
>
> That won't work for the boot drive.
>
> When no boot drive is detected early enough, the kernel goes to the
> mountroot prompt. That seems to hold a Giant lock which inhibits
> further progress being made. Sometimes progress can be made by trying
> to mount unmountable partitions on other drives, but this usually goes
> too fast, especially if the USB drive often times out.

What I would like to know, is why our USB stack has such timeout issues
at all. When I boot Linux on the same type of hardware, I never see USB
timeouts. They must be doing something right, or maybe they just don't
bother checking some status bits that we are very strict about?

-Dimitry
Re: zfs problems after rebuilding system [SOLVED]
> That won't work for the boot drive.
>
> When no boot drive is detected early enough, the kernel goes to the
> mountroot prompt. That seems to hold a Giant lock which inhibits
> further progress being made. Sometimes progress can be made by trying
> to mount unmountable partitions on other drives, but this usually goes
> too fast, especially if the USB drive often times out.

We have this problem in Azure with a ZFS root; it was fixed by the patch
in this bug report, which actually starts off being about USB.

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=208882

You can then set the mountroot timeout as normal and it works.

I would really like this patch to be applied, but it seems to have
languished since last summer. We use this as standard on all our cloud
machines now, and it works very nicely.

-pete.
Re: zfs problems after rebuilding system [SOLVED]
03.03.2018 19:56, Bruce Evans wrote:

> On Sat, 3 Mar 2018, tech-lists wrote:
>
>> On 03/03/2018 00:23, Dimitry Andric wrote:
>>> Indeed. I have had the following for a few years now, due to USB drives
>>> with ZFS pools:
>>>
>>> --- /usr/src/etc/rc.d/zfs 2016-11-08 10:21:29.820131000 +0100
>>> +++ /etc/rc.d/zfs 2016-11-08 12:49:52.971161000 +0100
>>> @@ -25,6 +25,8 @@
>>>
>>>  zfs_start_main()
>>>  {
>>> +        echo "Sleeping for 10 seconds to let USB devices settle..."
>>> +        sleep 10
>>>          zfs mount -va
>>>          zfs share -a
>>>          if [ ! -r /etc/zfs/exports ]; then
>>>
>>> For some reason, USB3 (xhci) controllers can take a very, very long time
>>> to correctly attach mass storage devices: I usually see many timeouts
>>> before they finally get detected. After that, the devices always work
>>> just fine, though.
>
> I have one that works for an old USB hard drive but never works for a not
> so old USB flash drive and a new SSD in a USB dock (just to check the SSD
> speed when handicapped by USB). Win7 has no problems with the xhci and
> USB flash drive combination, and FreeBSD has no problems with the drive
> on other systems.
>
>>> Whether this is due to some sort of BIOS handover trouble, or due to
>>> cheap and/or crappy USB-to-SATA bridges (even with brand WD and Seagate
>>> disks!), I have no idea. I attempted to debug it at some point, but
>>> a well-placed "sleep 10" was an acceptable workaround... :)
>>
>> That fixed it, thank you again :D
>
> That won't work for the boot drive.
>
> When no boot drive is detected early enough, the kernel goes to the
> mountroot prompt. That seems to hold a Giant lock which inhibits
> further progress being made. Sometimes progress can be made by trying
> to mount unmountable partitions on other drives, but this usually goes
> too fast, especially if the USB drive often times out.

In fact, we have enough loader.conf quirks for that:

kern.cam.boot_delay         "Bus registration wait time"  # milliseconds
vfs.mountroot.timeout       "Wait for root mount"         # seconds
vfs.root_mount_always_wait  "Wait for root mount holds even if the root device already exists"  # boolean

No need for extra hacks to the zfs rc.d script.
Re: zfs problems after rebuilding system [SOLVED]
On 03/03/2018 12:56, Bruce Evans wrote:
> That won't work for the boot drive.

In my case the workaround is fine because it's not a boot drive

--
J.
Re: zfs problems after rebuilding system [SOLVED]
On Sat, 3 Mar 2018, tech-lists wrote:

> On 03/03/2018 00:23, Dimitry Andric wrote:
>> Indeed. I have had the following for a few years now, due to USB drives
>> with ZFS pools:
>>
>> --- /usr/src/etc/rc.d/zfs 2016-11-08 10:21:29.820131000 +0100
>> +++ /etc/rc.d/zfs 2016-11-08 12:49:52.971161000 +0100
>> @@ -25,6 +25,8 @@
>>
>>  zfs_start_main()
>>  {
>> +        echo "Sleeping for 10 seconds to let USB devices settle..."
>> +        sleep 10
>>          zfs mount -va
>>          zfs share -a
>>          if [ ! -r /etc/zfs/exports ]; then
>>
>> For some reason, USB3 (xhci) controllers can take a very, very long time
>> to correctly attach mass storage devices: I usually see many timeouts
>> before they finally get detected. After that, the devices always work
>> just fine, though.

I have one that works for an old USB hard drive but never works for a not
so old USB flash drive and a new SSD in a USB dock (just to check the SSD
speed when handicapped by USB). Win7 has no problems with the xhci and
USB flash drive combination, and FreeBSD has no problems with the drive
on other systems.

>> Whether this is due to some sort of BIOS handover trouble, or due to
>> cheap and/or crappy USB-to-SATA bridges (even with brand WD and Seagate
>> disks!), I have no idea. I attempted to debug it at some point, but
>> a well-placed "sleep 10" was an acceptable workaround... :)
>
> That fixed it, thank you again :D

That won't work for the boot drive.

When no boot drive is detected early enough, the kernel goes to the
mountroot prompt. That seems to hold a Giant lock which inhibits
further progress being made. Sometimes progress can be made by trying
to mount unmountable partitions on other drives, but this usually goes
too fast, especially if the USB drive often times out.

Bruce
Re: zfs problems after rebuilding system [SOLVED]
On 03/03/2018 00:23, Dimitry Andric wrote:
> Indeed. I have had the following for a few years now, due to USB drives
> with ZFS pools:
>
> --- /usr/src/etc/rc.d/zfs 2016-11-08 10:21:29.820131000 +0100
> +++ /etc/rc.d/zfs 2016-11-08 12:49:52.971161000 +0100
> @@ -25,6 +25,8 @@
>
>  zfs_start_main()
>  {
> +        echo "Sleeping for 10 seconds to let USB devices settle..."
> +        sleep 10
>          zfs mount -va
>          zfs share -a
>          if [ ! -r /etc/zfs/exports ]; then
>
> For some reason, USB3 (xhci) controllers can take a very, very long time
> to correctly attach mass storage devices: I usually see many timeouts
> before they finally get detected. After that, the devices always work
> just fine, though.
>
> Whether this is due to some sort of BIOS handover trouble, or due to
> cheap and/or crappy USB-to-SATA bridges (even with brand WD and Seagate
> disks!), I have no idea. I attempted to debug it at some point, but
> a well-placed "sleep 10" was an acceptable workaround... :)

That fixed it, thank you again :D

--
J.
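A possible variation on the fixed "sleep 10", sketched here under assumptions: poll for the device node and stop waiting as soon as it shows up, with an upper bound. The function name wait_for_path and the example device path are made up for illustration; this is not what the rc.d script or the patch in the thread does, and like the original workaround it only helps non-boot pools.

```shell
#!/bin/sh
# Hypothetical helper: wait until TARGET exists or TIMEOUT seconds
# have passed (default 10). Returns 0 if TARGET appeared, 1 on timeout.
# Usage: wait_for_path TARGET [TIMEOUT]
wait_for_path() {
    target=$1
    timeout=${2:-10}
    waited=0
    while [ ! -e "$target" ]; do
        # Give up once the full timeout has been spent.
        [ "$waited" -ge "$timeout" ] && return 1
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}
```

In place of the two added lines in the diff, one could call something like `wait_for_path /dev/da0 10` (the device name is only an example) before `zfs mount -va`. When the drive attaches quickly the boot is not held up for the full ten seconds, and when it never attaches the script still moves on after the timeout.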