Re: geli - is it better to partition then encrypt, or vice versa ?
On 4/17/2021 15:52, Pete French wrote: So, am building a zpool on some encrypted discs - and what I have done is to partition the disc with GPT, add a single big partition, and encrypt that. So the pool is on nda1p1.eli. But I could, of course, encrypt the disc first, and then partition the encrypted disc, or indeed just put the zpool directly onto it. Just wondering what the general consensus is as to the best way to go here ... if there is one! :-) What do other people do?

IMHO one reason to partition first (and the reason I do it) is to prevent "drive attachment point hopping" from causing an unwelcome surprise if/when there is a failure or if, for some reason, you plug a drive into a different machine at some point. If you partition and label, then geli init and attach at "/dev/gpt/the-label", you can now label the drive carrier with that and, irrespective of the slot or adapter it gets connected to on whatever machine, it will be in the same place. If it fails this also means (assuming you labeled the carrier) you know which carrier to yank and replace. Yanking the wrong drive can be an unpleasant surprise.

This also makes "geli groups" trivial in /etc/rc.conf for attachment at boot time, irrespective of whether the drives physically come up in the same place (again, typically yes, but not necessarily in the case of a failure or if you plug one into a different adapter). -- Karl Denninger k...@denninger.net <mailto:k...@denninger.net> /The Market Ticker/ /[S/MIME encrypted email preferred]/ smime.p7s Description: S/MIME Cryptographic Signature
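For illustration, a minimal sketch of that workflow, assuming a drive at da3 and a label of "tank-d3" (names, sizes and the single-disk pool are examples, not a recommendation):

  gpart create -s gpt da3
  gpart add -t freebsd-zfs -l tank-d3 da3    # label appears as /dev/gpt/tank-d3
  geli init -s 4096 /dev/gpt/tank-d3         # prompts for passphrase/keyfile setup
  geli attach /dev/gpt/tank-d3               # provider shows up as /dev/gpt/tank-d3.eli
  zpool create tank /dev/gpt/tank-d3.eli

and then, for attachment at boot, something along these lines in /etc/rc.conf (check rc.conf(5) and the geli rc script for the exact variable names on your release):

  geli_groups="tank"
  geli_tank_devices="/dev/gpt/tank-d3"

Because everything is keyed to the GPT label, none of this cares which slot or adapter the drive lands on.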
Re: freebsd-update and speed
On 4/15/2021 08:28, Ferdinand Goldmann wrote: Following up on my own mail: to type this mail while waiting for '8778 patches'. Which has ended in: 71107120 done. Applying patches... done. Fetching 1965 files... failed. and after restarting it: Fetching 1750 patches [...] Applying patches... done. Fetching 326 files... This does not seem very reassuring to me. :( It already got the others, so it now only has to fetch 326 more. -- Karl Denninger k...@denninger.net <mailto:k...@denninger.net> /The Market Ticker/ /[S/MIME encrypted email preferred]/ smime.p7s Description: S/MIME Cryptographic Signature
Re: possibly silly question regarding freebsd-update
On 3/30/2021 12:02, Gary Palmer wrote: On Tue, Mar 30, 2021 at 11:55:24AM -0400, Karl Denninger wrote: On 3/30/2021 11:22, Guido Falsi via freebsd-stable wrote: On 30/03/21 15:35, tech-lists wrote: Hi, Recently there was https://lists.freebsd.org/pipermail/freebsd-security/2021-March/010380.html about openssl. Upgraded to 12.2-p5 with freebsd-update and rebooted. What I'm unsure about is the openssl version. Up-to-date 12.1-p5 instances report OpenSSL 1.1.1h-freebsd? 22 Sep 2020 Up-to-date stable/13-n245043-7590d7800c4 reports OpenSSL 1.1.1k-freebsd 25 Mar 2021 shouldn't the 12.2-p5 be reporting openssl 1.1.1k-freebsd as well? No, as you can see in the commit in the official git [1] while for current and stable the new upstream version of openssl was imported for the release the fix was applied without importing the new release and without changing the reported version of the library. So with 12.2p5 you do get the fix but don't get a new version of the library. [1] https://cgit.freebsd.org/src/commit/?h=releng/12.2&id=af61348d61f51a88b438d41c3c91b56b2b65ed9b Excuse me $ uname -v FreeBSD 12.2-RELEASE-p4 GENERIC $ sudo sh # freebsd-update fetch Looking up update.FreeBSD.org mirrors... 3 mirrors found. Fetching metadata signature for 12.2-RELEASE from update4.freebsd.org... done. Fetching metadata index... done. Inspecting system... done. Preparing to download files... done. No updates needed to update system to 12.2-RELEASE-p5. I am running 12.2-RELEASE-p4, so says uname -v IMHO it is an *extraordinarily* bad practice to change a library that in fact will result in a revision change while leaving the revision number alone. How do I *know*, without source to go look at, whether or not the fix is present on a binary system? If newvers.sh gets bumped then a build and -p5 release should have resulted from that, and in turn a fetch/install (and reboot of course since it's in the kernel) should result in uname -v returning "-p5" Most of my deployed "stuff" is on -STABLE but I do have a handful of machines on cloud infrastructure that are binary-only and on which I rely on freebsd-update and pkg to keep current with security-related items. What does "freebsd-version -u" report? The fix was only to a userland library, so I would not expect the kernel version as reported by uname to change. Regards, Gary Ok, that's fair; it DOES show -p5 for the user side. $ freebsd-version -ru 12.2-RELEASE-p4 12.2-RELEASE-p5 So that says my userland is -p5 while the kernel, which did not change (even though if you built from source it would carry the -p5 number) is -p4. I can live with that as it allows me to "see" that indeed the revision is present without having source on the box. I recognize that this is probably a reasonably-infrequent thing but it certainly is one that for people running binary releases is likely quite important given that the issue is in the openssl libraries. It was enough for me to rebuild all the firewall machines the other day since a DOS (which is reasonably possible for one of the flaws) aimed at my VPN server causing the server process to exit would be.. bad. -- Karl Denninger k...@denninger.net <mailto:k...@denninger.net> /The Market Ticker/ /[S/MIME encrypted email preferred]/ smime.p7s Description: S/MIME Cryptographic Signature
Re: possibly silly question regarding freebsd-update
On 3/30/2021 11:22, Guido Falsi via freebsd-stable wrote: On 30/03/21 15:35, tech-lists wrote: Hi, Recently there was https://lists.freebsd.org/pipermail/freebsd-security/2021-March/010380.html about openssl. Upgraded to 12.2-p5 with freebsd-update and rebooted. What I'm unsure about is the openssl version. Up-to-date 12.1-p5 instances report OpenSSL 1.1.1h-freebsd 22 Sep 2020 Up-to-date stable/13-n245043-7590d7800c4 reports OpenSSL 1.1.1k-freebsd 25 Mar 2021 shouldn't the 12.2-p5 be reporting openssl 1.1.1k-freebsd as well? No, as you can see in the commit in the official git [1] while for current and stable the new upstream version of openssl was imported for the release the fix was applied without importing the new release and without changing the reported version of the library. So with 12.2p5 you do get the fix but don't get a new version of the library. [1] https://cgit.freebsd.org/src/commit/?h=releng/12.2&id=af61348d61f51a88b438d41c3c91b56b2b65ed9b Excuse me $ uname -v FreeBSD 12.2-RELEASE-p4 GENERIC $ sudo sh # freebsd-update fetch Looking up update.FreeBSD.org mirrors... 3 mirrors found. Fetching metadata signature for 12.2-RELEASE from update4.freebsd.org... done. Fetching metadata index... done. Inspecting system... done. Preparing to download files... done. No updates needed to update system to 12.2-RELEASE-p5. I am running 12.2-RELEASE-p4, so says uname -v IMHO it is an *extraordinarily* bad practice to change a library that in fact will result in a revision change while leaving the revision number alone. How do I *know*, without source to go look at, whether or not the fix is present on a binary system? If newvers.sh gets bumped then a build and -p5 release should have resulted from that, and in turn a fetch/install (and reboot of course since it's in the kernel) should result in uname -v returning "-p5" Most of my deployed "stuff" is on -STABLE but I do have a handful of machines on cloud infrastructure that are binary-only and on which I rely on freebsd-update and pkg to keep current with security-related items. -- Karl Denninger k...@denninger.net <mailto:k...@denninger.net> /The Market Ticker/ /[S/MIME encrypted email preferred]/ smime.p7s Description: S/MIME Cryptographic Signature
Re: possibly silly question regarding freebsd-update
On 3/30/2021 10:40, tech-lists wrote: On Tue, Mar 30, 2021 at 09:14:56AM -0500, Doug McIntyre wrote: Like the patch referenced in the SA. https://security.FreeBSD.org/patches/SA-21:07/openssl-12.patch Again, it seems like confusion over what happens in RELEASE, STABLE and CURRENT.. Hi, I'm not sure what you mean by this. In https://lists.freebsd.org/pipermail/freebsd-security/2021-March/010380.html it says 1) To update your vulnerable system via a binary patch: Systems running a RELEASE version of FreeBSD on the i386 or amd64 platforms can be updated via the freebsd-update(8) utility: # freebsd-update fetch # freebsd-update install # which I did. If openssl updated, would it not be logical to expect openssl version information to indicate it had in fact been updated? If not, then how am I able to tell that it has updated? On an un-upgraded 12.2-p4 system *and* on an upgraded one, openssl version reports 1.1.1h-freebsd

It is not updating; as I noted it appears this security patch was NOT backported and thus 12.2-RELEASE does not "see" it. You cannot go to "-STABLE" via freebsd-update; to run -STABLE you must be doing buildworld/buildkernel from source. I can confirm that 12.2-STABLE *does* have the patch as I checked it recently. From a system I cross-build for and updated yesterday:

$ uname -v
FreeBSD 12.2-STABLE stable/12-n232909-4fd5354e85e KSD-SMP
$ openssl version
OpenSSL 1.1.1k-freebsd 25 Mar 2021

-- Karl Denninger k...@denninger.net <mailto:k...@denninger.net> /The Market Ticker/ /[S/MIME encrypted email preferred]/ smime.p7s Description: S/MIME Cryptographic Signature
Re: possibly silly question regarding freebsd-update
On 3/30/2021 10:14, Doug McIntyre wrote: Like the patch referenced in the SA. https://security.FreeBSD.org/patches/SA-21:07/openssl-12.patch Again, it seems like confusion over what happens in RELEASE, STABLE and CURRENT.. On Tue, Mar 30, 2021 at 04:05:32PM +0200, Ruben via freebsd-stable wrote: Hi, Did you mean 12.1-p5 or 12.2-p5 ? I'm asking because you refer to both 12.1-p5 and 12.2-p5 (typo?). If you meant 12.2-p5: Perhaps the FreeBSD security team did not bump the version, but "only" backported the patches to version 1.1.1h ? Regards, Ruben On 3/30/21 3:35 PM, tech-lists wrote: Hi, Recently there was https://lists.freebsd.org/pipermail/freebsd-security/2021-March/010380.html about openssl. Upgraded to 12.2-p5 with freebsd-update and rebooted. What I'm unsure about is the openssl version. Up-to-date 12.1-p5 instances report OpenSSL 1.1.1h-freebsd 22 Sep 2020 Up-to-date stable/13-n245043-7590d7800c4 reports OpenSSL 1.1.1k-freebsd 25 Mar 2021 shouldn't the 12.2-p5 be reporting openssl 1.1.1k-freebsd as well? thanks, _ Ok, except # uname -v FreeBSD 12.2-RELEASE-p4 GENERIC # openssl version OpenSSL 1.1.1h-freebsd 22 Sep 2020 # freebsd-update fetch Looking up update.FreeBSD.org mirrors... 3 mirrors found. Fetching metadata signature for 12.2-RELEASE from update4.freebsd.org... done. Fetching metadata index... done. Fetching 2 metadata patches.. done. Applying metadata patches... done. Fetching 2 metadata files... done. Inspecting system... done. Preparing to download files... done. No updates needed to update system to 12.2-RELEASE-p5. So if you're running RELEASE then /security patches /don't get backported? And you CAN'T upgrade to 12.2-STABLE via freebsd-update: # freebsd-update -r 12.2-STABLE upgrade Looking up update.FreeBSD.org mirrors... 3 mirrors found. Fetching metadata signature for 12.2-RELEASE from update1.freebsd.org... done. Fetching metadata index... done. Inspecting system... done. The following components of FreeBSD seem to be installed: kernel/generic src/src world/base world/doc world/lib32 The following components of FreeBSD do not seem to be installed: kernel/generic-dbg world/base-dbg world/lib32-dbg Does this look reasonable (y/n)? y Fetching metadata signature for 12.2-STABLE from update1.freebsd.org... failed. Fetching metadata signature for 12.2-STABLE from update2.freebsd.org... failed. Fetching metadata signature for 12.2-STABLE from update4.freebsd.org... failed. No mirrors remaining, giving up. This may be because upgrading from this platform (amd64) or release (12.2-STABLE) is unsupported by freebsd-update. Only platforms with Tier 1 support can be upgraded by freebsd-update. See https://www.freebsd.org/platforms/index.html for more info. If unsupported, FreeBSD must be upgraded by source. -- Karl Denninger k...@denninger.net <mailto:k...@denninger.net> /The Market Ticker/ /[S/MIME encrypted email preferred]/ smime.p7s Description: S/MIME Cryptographic Signature
Re: How do I know if my 13-stable has security patches?
On 2/26/2021 10:22, Ed Maste wrote: On Thu, 25 Feb 2021 at 16:57, Karl Denninger wrote: The time (and present items) on a given machine to know whether it is covered by a given advisory under the "svn view of the world" is one command, and no sources. That is, if the advisory says "r123456" has the fix, then if I do a "uname -v" and get something larger, it's safe. Yes, as previously stated the commit count will be included in future advisories. On stable/13 today uname will include: uname displays e.g. stable/13-n244688-66308a13dddc The advisory would report stable/13-n244572; 244688 is greater than 244572, so it will have the fix.

Sounds like the issue has been addressed -- thank you! -- Karl Denninger k...@denninger.net <mailto:k...@denninger.net> /The Market Ticker/ /[S/MIME encrypted email preferred]/ smime.p7s Description: S/MIME Cryptographic Signature
Re: How do I know if my 13-stable has security patches?
On 2/25/2021 15:56, Warner Losh wrote: On Thu, Feb 25, 2021 at 6:37 AM Karl Denninger <mailto:k...@denninger.net>> wrote: On 2/25/2021 04:30, Olivier Certner wrote: >> Neither command is what I'd call 'intuitive', so it would have taken me a >> long time to find either of them. I cut and pasted the 'git branch' command >> and it took me a moment to realize what that meant. Never ran "grep -l" on >> a pipe, I guess. > You made me laugh! Apart from relatively simple commands, git's interface is > far from intuitive. That's the reason why I regret that it became the hugely > dominant DVCS. Regression doesn't have to come to a project, but if the tools you choose do things like this then you have to work around them as a project to avoid the issue, and that might wind up being somewhat of a PITA. This specific issue is IMHO quite severe in terms of operational impact. I track -STABLE but don't load "new things" all the time. For security-related things it's more important to know if I've got something out there in a specific instance where it may apply (and not care in others where it doesn't; aka the recent Xen thing if you're not using Xen.) Otherwise if everything is running as it should do I wish to risk introducing bugs along with improvements? If not in a security-related context, frequently not. Well, this used to be easy. Is your "uname" r-number HIGHER than the "when fixed" revision? You're good. Now, nope. Now I have to go dig source to know because there is no longer a "revision number" that monotonically increments with each commit so there is no longer a way to have a "point in time" view of the source, as-committed, for a given checked-out version. IMHO that's a fairly serious regression for the person responsible for keeping security-related things up to date and something the project should find a way to fix before rolling the next -RELEASE. (Yeah, I know that's almost-certain to not happen but it's not like this issue wasn't known since moving things over to git.) We should likely just publish the 'v' number in the advisories. It's basically a count back to the start of the project. We put that number in uname already. You can also find out the 'v' number in the latest advisories by cloning the repo and doing the same thing we do in newvers.sh: % git rev-list --first-parent --count $HASH and that will tell you. This needn't be on the target machine since the hashes are stable across the world. (list of further "stuff") But that's my entire point Warner. The time (and present items) on a given machine to know whether it is covered by a given advisory under the "svn view of the world" is one command, and no sources. That is, if the advisory says "r123456" has the fix, then if I do a "uname -v" and get something larger, it's safe. If I get something smaller it's not. I don't need the source on the machine, I don't need svn on the target or, for that matter, do I need to know if the source tree I have on a build machine is coherent with whatever is on the running machine. I simply need to know if the source that built the code that is running was updated *after* the commit that fixes the problem. What if the source /isn't on that machine /because you build on some system and then distribute? Does every machine now have to be coherent with your source repository in order to be able to figure out where you are or worse, it must keep the source from which that specific installation, individually, was built? 
/What if the source isn't there at all /because you run binary code and update with freebsd-update? Unless I've missed something that's what was lost and IMHO needs to be restored; a way to know that in seconds with nothing other than the operating OS on the box (e.g. via uname) and the advisory with its "greater than X is safe" from the mailing list. Am I misunderstanding the current state of things in this regard? -- Karl Denninger k...@denninger.net <mailto:k...@denninger.net> /The Market Ticker/ /[S/MIME encrypted email preferred]/ smime.p7s Description: S/MIME Cryptographic Signature
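To make the comparison concrete (the hash is a placeholder, the uname output is illustrative, and the counts are the ones from Ed's note elsewhere in this thread, reused purely as an example):

  # on any box with a clone of the repo -- it need not be the target machine
  git -C /usr/src rev-list --first-parent --count <advisory-hash>
  244572

  # on the target machine
  uname -v
  FreeBSD 13.0-STABLE stable/13-n244688-66308a13dddc GENERIC

If the n-number reported by uname is at least the advisory's count (here 244688 >= 244572) the fix is present; no source tree is needed on the running box for that second step.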
Re: How do I know if my 13-stable has security patches?
On 2/25/2021 04:30, Olivier Certner wrote: Neither command is what I'd call 'intuitive', so it would have taken me a long time to find either of them. I cut and pasted the 'git branch' command and it took me a moment to realize what that meant. Never ran "grep -l" on a pipe, I guess. You made me laugh! Apart from relatively simple commands, git's interface is far from intuitive. That's the reason why I regret that it became the hugely dominant DVCS. Regression doesn't have to come to a project, but if the tools you choose do things like this then you have to work around them as a project to avoid the issue, and that might wind up being somewhat of a PITA. This specific issue is IMHO quite severe in terms of operational impact. I track -STABLE but don't load "new things" all the time. For security-related things it's more important to know if I've got something out there in a specific instance where it may apply (and not care in others where it doesn't; aka the recent Xen thing if you're not using Xen.) Otherwise if everything is running as it should do I wish to risk introducing bugs along with improvements? If not in a security-related context, frequently not. Well, this used to be easy. Is your "uname" r-number HIGHER than the "when fixed" revision? You're good. Now, nope. Now I have to go dig source to know because there is no longer a "revision number" that monotonically increments with each commit so there is no longer a way to have a "point in time" view of the source, as-committed, for a given checked-out version. IMHO that's a fairly serious regression for the person responsible for keeping security-related things up to date and something the project should find a way to fix before rolling the next -RELEASE. (Yeah, I know that's almost-certain to not happen but it's not like this issue wasn't known since moving things over to git.) -- Karl Denninger k...@denninger.net <mailto:k...@denninger.net> /The Market Ticker/ /[S/MIME encrypted email preferred]/ smime.p7s Description: S/MIME Cryptographic Signature
Re: HEADS UP: FreeBSD src repo transitioning to git this weekend
On 12/23/2020 12:01, Warner Losh wrote: On Wed, Dec 23, 2020 at 7:32 AM Michael Grimm wrote: Hi, Warner Losh wrote: The FreeBSD project will be moving its source repo from subversion to git starting this weekend. First of all I'd like to thank all those involved in this for their efforts. Following https://github.com/bsdimp/freebsd-git-docs/blob/main/mini-primer.md from your other mail I was able to migrate from svn to git without running into any issues. Right now I am learning how to use git the way I used svn before. I am just following 12-STABLE in order to build world and kernel. I am not developing, neither am I committing. I wonder how one would switch from a currently used branch (OLD) to another branch (NEW). With svn I used: svn switch svn://svn.freebsd.org/base/stable/NEW /usr/src For git I found: git branch -m stable/OLD stable/NEW or git branch -M stable/OLD stable/NEW git-branch(1): With a -m or -M option, <oldbranch> will be renamed to <newbranch>. If <oldbranch> had a corresponding reflog, it is renamed to match <newbranch>, and a reflog entry is created to remember the branch renaming. If <newbranch> exists, -M must be used to force the rename to happen. I don't understand that text completely, because I don't know what a reflog is, yet ;-) Thus: Should I use "-m" or "-M" in my scenario when switching from stable/12 to stable/13 in the near future? I think the answer is a simple "git checkout NEW". This will replace the current tree at branch OLD with the contents of branch NEW. git branch -m is different and changes what the branch means. If you did what you suggested then you'd be renaming the OLD branch to NEW, which isn't what I think you're asking about.

Correct -- "git checkout NEW", where "NEW" is the desired branch you wish to have "active." If you have made local changes it will tell you to act on that first; the usual is "git stash" to save them. You can then apply them with "git stash apply" to the *new* branch, assuming that makes sense to do (e.g. a kernel configuration file, etc.) "Stash" maintains a stack which can be manipulated as well (so a "stash", if you already "stash"ed and did not drop it, creates a second one, aka stash@{0} and stash@{1}). -- Karl Denninger k...@denninger.net <mailto:k...@denninger.net> /The Market Ticker/ /[S/MIME encrypted email preferred]/ smime.p7s Description: S/MIME Cryptographic Signature
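For the common "follow a newer stable branch" case, a minimal sketch (branch names are examples; substitute whatever you actually track):

  cd /usr/src
  git stash                  # set aside local changes, e.g. a custom kernel config
  git checkout stable/13     # switch the working tree to the new branch
  git pull --ff-only         # bring it up to date
  git stash pop              # re-apply the stashed changes on the new branch

"git stash pop" applies the newest stash entry and drops it from the stack; "git stash apply" does the same but leaves the entry in place.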
Re: URGENT: Microsoft overwrites boot loader!
On 7/16/2020 16:28, Alan Somers wrote: > On Thu, Jul 16, 2020 at 2:20 PM Don Wilde wrote: > >> The [deleted] ones in Redmond have done it again. My multi-OS GRUB2 boot >> loader is gone, and in its place is a 500M partition called 'Windows >> boot loader'. >> >> The purpose is to force us to look at MS' new version of Edge. All my >> old boot files are gone. >> >> It's taken me much of the morning to get underneath this, since on this >> unit my only OS (other than Doze 10) with a WM and GUI is Ubuntu. >> >> That's the last time I will allow this, and I'm calling those [deleted]s >> tomorrow to give them a piece of my mind. After that I will erase every >> vestige of that obscene OS from my disk. >> >> -- >> Don Wilde >> >> * What is the Internet of Things but a system * >> * of systems including humans? * >> >> > Edge? I thought that was a browser. What does it have to do with boot > loaders? Microsoft does this on any of their "Feature" updates. I managed to figure out how to arrange my EFI setup so that all I have to do is restore the index in the BIOS to point back at REFIND, and everything else is still there. But if you stick the FreeBSD loader where Microsoft wants to clobber it, yeah, it'll do that. It doesn't actually blast the partition though -- just the single file where they want to stuff it. -- Karl Denninger k...@denninger.net <mailto:k...@denninger.net> /The Market Ticker/ /[S/MIME encrypted email preferred]/ smime.p7s Description: S/MIME Cryptographic Signature
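For reference, the firmware boot selection can also be inspected and nudged from the FreeBSD side with efibootmgr(8); the entry number below is purely an example, and the exact flags are worth confirming against the man page on your release:

  efibootmgr -v          # list boot entries and the current BootOrder
  efibootmgr -n -b 0003  # one-shot: boot entry 0003 (e.g. rEFIND) on the next restart

The permanent order can be rewritten as well, or fixed from the firmware setup screen as described above.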
Re: 12.1p7 no longer boots after doing zpool upgrade -a
On 7/9/2020 09:32, Pete French wrote: > > On 09/07/2020 14:24, Kyle Evans wrote: > >>> gpart bootcode -p /boot/boot1.efifat -i 1 ada0 >>> gpart bootcode -p /boot/boot1.efifat -i 1 ada1 >> >> >> This method of updating the ESP is no longer recommended for new 12.x >> installations -- we now more carefully construct the ESP with an >> /EFI/FreeBSD/loader.efi where loader.efi is /boot/loader.efi. You will >> want to rebuild this as such, and that may fix part of your problem. > > Out of interest, how should the ESP partition be upgraded then ? I > dont have any EFI machines...yet. But one day I will, and I was > assuming that an upgrade would be done using the above lines too. > Nope. An EFI partition is just a "funky" MSDOS (FAT) one, really. Thus the upgrade of the loader on one would be just a copy onto it as with any other file on a filesystem (e.g. mount the partition, copy the file to the correct place, unmount it); the gpart command does a byte-copy onto what is, for all intents and purposes, an unformatted (no directory, etc) reserved part of the disk. My laptop dual boot (Windows 10 / FreeBSD) is EFI and I've yet to have to screw with the loader, but if you do then it's just a copy over. Windows has several times blown up my Refind install -- all the "Feature" upgrades from Windows have a habit of resetting the BIOS boot order which makes the machine "Windows boots immediately and only", so I have to go back and reset it whenever Microslug looses one of those on me. If I had cause to update the loader for FreeBSD then I'd just mount the partition and copy it over. -- Karl Denninger k...@denninger.net <mailto:k...@denninger.net> /The Market Ticker/ /[S/MIME encrypted email preferred]/ smime.p7s Description: S/MIME Cryptographic Signature
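Concretely, for an ESP laid out the way Kyle describes, the update is roughly this (device name and mount point are examples; check where your ESP actually lives and which directory layout it uses):

  mount -t msdosfs /dev/ada0p1 /mnt
  cp /boot/loader.efi /mnt/EFI/freebsd/loader.efi
  umount /mnt

Older ESPs may instead boot via the default removable-media path /EFI/BOOT/BOOTX64.EFI, in which case that is the file to replace.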
Re: support of PCIe NVME drives
On 4/16/2020 13:23, Pete Wright wrote: > > > On 4/16/20 11:12 AM, Miroslav Lachman wrote: >> Kurt Jaeger wrote on 04/16/2020 20:07: >>> Hi! >>> >>>> I was requested to install FreeBSD 11.3 on a new Dell machine with >>>> only 2 >>>> NVME drives in ZFS mirror. The problem is that installer does not >>>> see the >>>> drives. Are there any special procedure to use NVME drives for >>>> installation a later for booting? >>> >>> I use 2 NVMe drives as zfs mirror to boot from on my testbox, >>> but it runs CURRENT, since approx. November 2018. >>> >>> So maybe try it with 12.1 ? I know, that does not help if you are asked >>> to install 11.3, but at least it gives you an idea... >>> >> >> I tried 12.1 few minutes ago but the result is the same - no NVME >> drives listed. >> Should I try something with kernel modules, some sysctl tweaks? >> Should I try UEFI boot? (I never did) >> > > I would try booting via UEFI if you can. I just installed a laptop > yesterday which has a nvme root device, it was detected by the > 12-STABLE snapshot I used to boot from. no other modifications were > necessary on my end. > > -pete > Yeah my Lenovo Carbon X1 has an nVME drive in it, and nothing else - 12-Stable found it immediately and works fine. -- Karl Denninger k...@denninger.net <mailto:k...@denninger.net> /The Market Ticker/ /[S/MIME encrypted email preferred]/ smime.p7s Description: S/MIME Cryptographic Signature
Re: Running FreeBSD on M.2 SSD
On 2/25/2020 9:53 AM, John Kennedy wrote: On Tue, Feb 25, 2020 at 11:07:48AM +, Pete French wrote: I have often wondered if ZFS is more aggressive with discs, because until very recently any solid state drive I have used ZFS on broke very quicky. ... I've always wondered if ZFS (and other snapshotting file systems) would help kill SSD disks by locking up blocks longer than other filesystems might. For example, I've got snapshot-backups going back, say, a year then those blocks that haven't changed aren't going back into the pool to be rewritten (and perhaps favored because of low write-cycle count). As the disk fills up, the blocks that aren't locked up get reused more and more, leading to extra wear on them. Eventually one of those will get to the point of erroring out. Personally, I just size generously but that isn't always an option for everybody. I have a ZFS RaidZ2 on SSDs that has been running for several /years /without any problems. The drives are Intel 730s, which Intel CLAIMS don't have power-loss protection but in fact appear to; not only do they have caps in them but in addition they pass a "pull the cord out of the wall and then check to see if the data is corrupted on restart" test on a repeated basis, which I did several times before trusting them. BTW essentially all non-data-center SSDs fail that test and some fail it spectacularly (destroying the OS due to some of the in-flight data being comingled on an allocated block with something important; if the read/erase/write cycle interrupts you're cooked as the "other" data that was not being modified gets destroyed too!) -- the Intels are one of the very, very few that have passed it. -- -- Karl Denninger /The Market-Ticker/ S/MIME Email accepted and preferred smime.p7s Description: S/MIME Cryptographic Signature
Re: Running FreeBSD on M.2 SSD
On 2/25/2020 8:28 AM, Mario Olofo wrote: Good morning all, @Pete French, you have trim activated on your SSDs right? I heard that if its not activated, the SSD disc can stop working very quickly. @Daniel Kalchev, I used UFS2 with SU+J as suggested on the forums for me, and in this case the filesystem didn't "corrupted", it justs kernel panic from time to time so I gave up. I think that the problem was related to the size of the journal, that become full when I put so many files at once on the system, or was deadlocks in the version of the OS that I was using. @Alexander Leidinger I have the original HDD 1TB Hybrid that came with the notebook will try to reinstall FreeBSD on it to see if it works correctly. Besides my notebook been a 2019 model Dell G3 with no customizations other than the m.2 SSD, I never trust that the system is 100%, so I'll try all possibilities. 1- The BIOS received an update last month but I'll look if there's something newer. 2- Reinstall the FreeBSD on the Hybrid HDD, but if the problem is the FreeBSD driver, it'll work correctly on that HD. 3- Will try with other RAM. This I really don't think that is the problem because is a brand new notebook, but... who knows =). Thank you, Mario I have a Lenovo Carbon X1 that has a Samsung nVME SSD in it and it's fine with both FreeBSD12-STABLE and Windows (I have it set up for dual EFI boot using REFIND.) It does not have a "custom" driver for Win10; it is using Microsoft's "built-in" stuff. Zero problems and I beat on it pretty-heavily. -- -- Karl Denninger /The Market-Ticker/ S/MIME Email accepted and preferred smime.p7s Description: S/MIME Cryptographic Signature
Re: ZFS and power management
On 1/5/2020 16:10, Peter wrote: > On Wed, 18 Dec 2019 17:22:16 +0100, Karl Denninger > wrote: > >> I'm curious if anyone has come up with a way to do this... >> >> I have a system here that has two pools -- one comprised of SSD disks >> that are the "most commonly used" things including user home directories >> and mailboxes, and another that is comprised of very large things that >> are far less-commonly used (e.g. video data files, media, build >> environments for various devices, etc.) > > I'm using such a configuration for more than 10 years already, and > didn't perceive the problems You describe. > Disks are powered down with gstopd or other means, and they stay > powered down until filesystems in the pool are actively accessed. > A difficulty for me was that postgres autovacuum must be completeley > disabled if there are tablespaces on the quiesced pools. Another thing > that comes to mind is smartctl in daemon mode (but I never used that). > There are probably a whole bunch more of potential culprits, so I > suggest You work thru all the housekeeping stuff (daemons, cronjobs, > etc.) to find it. I found a number of things and managed to kill them off in terms of active access, and now it is behaving. I'm using "camcontrol idle -t 240 da{xxx}", which interestingly enough appears NOT to survive a reboot, but otherwise does what's expected. -- Karl Denninger k...@denninger.net <mailto:k...@denninger.net> /The Market Ticker/ /[S/MIME encrypted email preferred]/ smime.p7s Description: S/MIME Cryptographic Signature
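Since the timer setting doesn't persist, a small boot-time re-arm works; a sketch assuming the rust pool lives on da2 through da5 (device list and timeout are examples):

  # /etc/rc.local
  for d in da2 da3 da4 da5; do
      camcontrol idle $d -t 240    # set the drive's idle timer to 240 seconds of inactivity
  done

Note that camcontrol(8) takes the device name before the -t option in my reading of the syntax; adjust to taste.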
ZFS and power management
I'm curious if anyone has come up with a way to do this... I have a system here that has two pools -- one comprised of SSD disks that are the "most commonly used" things including user home directories and mailboxes, and another that is comprised of very large things that are far less-commonly used (e.g. video data files, media, build environments for various devices, etc.) The second pool has perhaps two dozen filesystems that are mounted, but again, rarely accessed. However, despite them being rarely accessed ZFS performs various maintenance checkpoint functions on a nearly-continuous basis (it appears) because there's a low level, but not zero, amount of I/O traffic to and from them. Thus if I set power control (e.g. spin down after 5 minutes of inactivity) they never do. I could simply export the pool but I prefer (greatly) to not do that because some of the data on that pool (e.g. backups from PCs) is information that if a user wants to get to it it ought to "just work." Well, one disk is no big deal. A rack full of them is another matter. I could materially cut the power consumption of this box down (likely by a third or more) if those disks were spun down during 95% of the time the box is up, but with the "standard" way ZFS does things that doesn't appear to be possible. Has anyone taken a crack at changing the paradigm (e.g. using the automounter, perhaps?) to get around this? -- Karl Denninger k...@denninger.net <mailto:k...@denninger.net> /The Market Ticker/ /[S/MIME encrypted email preferred]/ smime.p7s Description: S/MIME Cryptographic Signature
Re: Kernel panic in zfs code; 12-STABLE
On 7/18/2019 15:35, Karl Denninger wrote: > On 7/18/2019 15:19, Eugene Grosbein wrote: >> 19.07.2019 3:13, Karl Denninger wrote: >> >>> FreeBSD 12.0-STABLE #2 r349024M: Thu Jun 13 18:01:16 CDT 2019 >>> k...@newfs.denninger.net:/usr/obj/usr/src/amd64.amd64/sys/KSD-SMP >>> >>> Note -- no patches of any sort in the ZFS code; I am NOT running any of >>> my former patch set. >>> >>> NewFS.denninger.net dumped core - see /var/crash/vmcore.8 >>> >>> Thu Jul 18 15:02:54 CDT 2019 >>> >>> FreeBSD NewFS.denninger.net 12.0-STABLE FreeBSD 12.0-STABLE #2 r349024M: >>> Thu Jun 13 18:01:16 CDT 2019 >>> k...@newfs.denninger.net:/usr/obj/usr/src/amd64.amd64/sys/KSD-SMP amd64 >>> >>> panic: double fault >> [skip] >> >>> #283 0x82748d91 in zio_vdev_io_done (zio=0xf8000b8b8000) >>> at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:3376 >>> #284 0x82744eac in zio_execute (zio=0xf8000b8b8000) >>> at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:1786 >>> #285 0x80c3b7f4 in taskqueue_run_locked (queue=0xf801a8b35100) >>> at /usr/src/sys/kern/subr_taskqueue.c:467 >>> #286 0x80c3cb28 in taskqueue_thread_loop (arg=) >>> at /usr/src/sys/kern/subr_taskqueue.c:773 >>> #287 0x80b9ab23 in fork_exit ( >>> callout=0x80c3ca90 , >>> arg=0xf801a0577520, frame=0xfe009d4edc00) >>> at /usr/src/sys/kern/kern_fork.c:1063 >>> #288 0x810b367e in fork_trampoline () >>> at /usr/src/sys/amd64/amd64/exception.S:996 >>> #289 0x in ?? () >>> Current language: auto; currently minimal >>> (kgdb) >> You have "double fault" and completely insane number of stack frames in the >> trace. >> This is obviously infinite recursion resulting in kernel stack overflow and >> panic. > Yes, but why and how? > > What's executing at the time is this command: > > zfs send -RI $i@zfs-old $i@zfs-base | zfs receive -Fudv $BACKUP > > Which in turn results in the old snapshots on the target not on the > source being deleted, then the new ones being sent. It never gets to > the sending part; it blows up during the delete of the OLD snapshots. > > The one(s) it deletes, however, it DOES delete. When the box is > rebooted those two snapshots on the target are indeed gone. > > That is, it is NOT getting "stuck" on one (which would imply there's an > un-detected fault in the filesystem on the target in the metadata for > that snapshot, resulting in a recursive call that blows up the stack) > and it never gets to send the new snapshot, so whatever is going on is > NOT on the source filesystem. Neither source or destination shows any > errors on the filesystem; both pools are healthy with zero error counts. > > Therefore the question -- is the system queueing enough work to blow the > stack *BUT* the work it queues is all legitimate? If so there's a > serious problem in the way the code now functions in that an "ordinary" > operation can result in what amounts to kernel stack exhaustion. > > One note -- I haven't run this backup for the last five days, as I do it > manually and I've been out of town. Previous running it on a daily > basis completed without trouble. This smells like a backlog of "things > to do" when the send runs that results in the allegedly-infinite > recursion (that isn't really infinite) that runs the stack out of space > -- and THAT implies that the system is trying to queue a crazy amount of > work on a recursive basis for what is a perfectly-legitimate operation > -- which it should *NOT* do. Update: This looks like an OLD bug that came back. 
Previously the system would go absolutely insane on the first few accesses to spinning rust during a snapshot delete and ATTEMPT to send thousands of TRIM requests -- which spinning rust does not support. On a system with mixed vdevs, where some pools are rust and some are SSD, this was a problem since you can't turn TRIM off because you REALLY want it on those disks. The FIX for this was to do this on the import of said pool comprised of spinning rust:

#
# Now try to trigger TRIM so that we don't have a storm of them
#
#
echo "Attempting to disable TRIM on spinning rust"
mount -t zfs $BACKUP/no-trim /mnt
dd if=/dev/random of=/mnt/kill-trim bs=128k count=2
echo "Performed 2 writes"
sleep 2
rm /mnt/kill-trim
echo "Performed delete of written file; wait"
sleep 35
umount /mnt
echo "Unmounted tempo
Re: Kernel panic in zfs code; 12-STABLE
On 7/18/2019 15:19, Eugene Grosbein wrote: > 19.07.2019 3:13, Karl Denninger wrote: > >> FreeBSD 12.0-STABLE #2 r349024M: Thu Jun 13 18:01:16 CDT 2019 >> k...@newfs.denninger.net:/usr/obj/usr/src/amd64.amd64/sys/KSD-SMP >> >> Note -- no patches of any sort in the ZFS code; I am NOT running any of >> my former patch set. >> >> NewFS.denninger.net dumped core - see /var/crash/vmcore.8 >> >> Thu Jul 18 15:02:54 CDT 2019 >> >> FreeBSD NewFS.denninger.net 12.0-STABLE FreeBSD 12.0-STABLE #2 r349024M: >> Thu Jun 13 18:01:16 CDT 2019 >> k...@newfs.denninger.net:/usr/obj/usr/src/amd64.amd64/sys/KSD-SMP amd64 >> >> panic: double fault > [skip] > >> #283 0x82748d91 in zio_vdev_io_done (zio=0xf8000b8b8000) >> at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:3376 >> #284 0x82744eac in zio_execute (zio=0xf8000b8b8000) >> at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:1786 >> #285 0x80c3b7f4 in taskqueue_run_locked (queue=0xf801a8b35100) >> at /usr/src/sys/kern/subr_taskqueue.c:467 >> #286 0x80c3cb28 in taskqueue_thread_loop (arg=) >> at /usr/src/sys/kern/subr_taskqueue.c:773 >> #287 0x80b9ab23 in fork_exit ( >> callout=0x80c3ca90 , >> arg=0xf801a0577520, frame=0xfe009d4edc00) >> at /usr/src/sys/kern/kern_fork.c:1063 >> #288 0x810b367e in fork_trampoline () >> at /usr/src/sys/amd64/amd64/exception.S:996 >> #289 0x in ?? () >> Current language: auto; currently minimal >> (kgdb) > You have "double fault" and completely insane number of stack frames in the > trace. > This is obviously infinite recursion resulting in kernel stack overflow and > panic. Yes, but why and how? What's executing at the time is this command: zfs send -RI $i@zfs-old $i@zfs-base | zfs receive -Fudv $BACKUP Which in turn results in the old snapshots on the target not on the source being deleted, then the new ones being sent. It never gets to the sending part; it blows up during the delete of the OLD snapshots. The one(s) it deletes, however, it DOES delete. When the box is rebooted those two snapshots on the target are indeed gone. That is, it is NOT getting "stuck" on one (which would imply there's an un-detected fault in the filesystem on the target in the metadata for that snapshot, resulting in a recursive call that blows up the stack) and it never gets to send the new snapshot, so whatever is going on is NOT on the source filesystem. Neither source or destination shows any errors on the filesystem; both pools are healthy with zero error counts. Therefore the question -- is the system queueing enough work to blow the stack *BUT* the work it queues is all legitimate? If so there's a serious problem in the way the code now functions in that an "ordinary" operation can result in what amounts to kernel stack exhaustion. One note -- I haven't run this backup for the last five days, as I do it manually and I've been out of town. Previous running it on a daily basis completed without trouble. This smells like a backlog of "things to do" when the send runs that results in the allegedly-infinite recursion (that isn't really infinite) that runs the stack out of space -- and THAT implies that the system is trying to queue a crazy amount of work on a recursive basis for what is a perfectly-legitimate operation -- which it should *NOT* do. -- Karl Denninger k...@denninger.net <mailto:k...@denninger.net> /The Market Ticker/ /[S/MIME encrypted email preferred]/ smime.p7s Description: S/MIME Cryptographic Signature
Kernel panic in zfs code; 12-STABLE
o_done (zio=0xf8000b8b8000) at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:3376 #284 0x82744eac in zio_execute (zio=0xf8000b8b8000) at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:1786 #285 0x80c3b7f4 in taskqueue_run_locked (queue=0xf801a8b35100) at /usr/src/sys/kern/subr_taskqueue.c:467 #286 0x80c3cb28 in taskqueue_thread_loop (arg=) at /usr/src/sys/kern/subr_taskqueue.c:773 #287 0x80b9ab23 in fork_exit ( callout=0x80c3ca90 , arg=0xf801a0577520, frame=0xfe009d4edc00) at /usr/src/sys/kern/kern_fork.c:1063 #288 0x810b367e in fork_trampoline () at /usr/src/sys/amd64/amd64/exception.S:996 #289 0x in ?? () Current language: auto; currently minimal (kgdb) This is currently repeatable. What was going on at the instant in time was: root@NewFS:~ # /root/backup-zfs/run-backup Begin local ZFS backup by SEND Run backups of default [zsr/R/12.STABLE-2019-06-14 zsr/home zs/archive zs/colo-archive zs/disk zsr/dbms/pgsql zs/work zs/dbms/ticker-9.6] Thu Jul 18 14:57:57 CDT 2019 Import backup pool Imported; ready to proceed Processing zsr/R/12.STABLE-2019-06-14 Bring incremental backup up to date attempting destroy backup/R/12.STABLE-2019-06-14@zfs-auto-snap_daily-2019-07-10-00h07 success attempting destroy backup/R/12.STABLE-2019-06-14@zfs-auto-snap_daily-2019-07-11-00h07 success It destroyed the snapshot on the backup volume, and panic'd immediately thereafter. This is an incremental send. If I reboot the machine and re-start the backup job it will blow up when a couple more of the incremental deletes get done. Given the depth of the callback stack is this simply a kernel stack exhaustion problem? I wouldn't THINK it would be, but.. I do have this in /boot/loader.conf: # Try to avoid kernel stack exhaustion due to TRIM storms. kern.kstack_pages="6" The backup volumes are spinning rust, so there should be no "TRIM" attempts to them. In theory. I have the dump if someone wants me to run anything specific against it in terms of stack frames, etc. -- Karl Denninger k...@denninger.net <mailto:k...@denninger.net> /The Market Ticker/ /[S/MIME encrypted email preferred]/ smime.p7s Description: S/MIME Cryptographic Signature
Re: ZFS...
On 5/8/2019 19:28, Kevin P. Neal wrote:
> On Wed, May 08, 2019 at 11:28:57AM -0500, Karl Denninger wrote:
>> If you have pool(s) that are taking *two weeks* to run a scrub IMHO
>> either something is badly wrong or you need to rethink organization of
>> the pool structure -- that is, IMHO you likely either have a severe
>> performance problem with one or more members or an architectural problem
>> you *really* need to determine and fix. If a scrub takes two weeks
>> *then a resilver could conceivably take that long as well* and that's
>> *extremely* bad as the window for getting screwed is at its worst when a
>> resilver is being run.
> Wouldn't having multiple vdevs mitigate the issue for resilvers (but not
> scrubs)? My understanding, please correct me if I'm wrong, is that a
> resilver only reads the surviving drives in that specific vdev.

Yes. In addition, while "most-modern" revisions have material improvements (very much so) in scrub times "out of the box", a bit of tuning makes for very material differences in older revisions. Specifically, maxinflight can be a big deal given a reasonable amount of RAM (e.g. 16 or 32Gb), as is async_write_min_active (raise it to "2"; you may get a bit more with "3", but not a lot).

I have a scrub running right now and this is what it looks like:

Disks   da2    da3    da4    da5    da8    da9   da10
KB/t  10.40  11.03    103    108    122  98.11  98.48
tps      46     45   1254   1205   1062   1324   1319
MB/s   0.46   0.48    127    127    127    127    127
%busy     0      0     48     62     97     28     31

Here's the current stat on that pool:

  pool: zs
 state: ONLINE
  scan: scrub in progress since Thu May 9 03:10:00 2019
        11.9T scanned at 643M/s, 11.0T issued at 593M/s, 12.8T total
        0 repaired, 85.58% done, 0 days 00:54:29 to go
config:

        NAME               STATE     READ WRITE CKSUM
        zs                 ONLINE       0     0     0
          raidz2-0         ONLINE       0     0     0
            gpt/rust1.eli  ONLINE       0     0     0
            gpt/rust2.eli  ONLINE       0     0     0
            gpt/rust3.eli  ONLINE       0     0     0
            gpt/rust4.eli  ONLINE       0     0     0
            gpt/rust5.eli  ONLINE       0     0     0

errors: No known data errors

Indeed it will be done in about an hour; this is an "automatic" one kicked off out of periodic. It's comprised of 4Tb disks and is about 70% occupied. When I get somewhere around another 5-10% I'll swap in 6Tb drives for the 4Tb ones and swap in 8Tb "primary" backup disks for the existing 6Tb ones.

This particular machine has a spinning rust pool (which is this one) and another that's comprised of 240Gb Intel 730 SSDs (fairly old as SSDs go but much faster than spinning rust and they have power protection, which IMHO is utterly mandatory for SSDs in any environment where you actually care about the data being there after a forced, unexpected plug-pull.) This machine is UPS-backed with apcupsd monitoring it so *in theory* it should never have an unsolicited power failure without notice but "crap happens"; a few years ago there was an undetected fault in one of the batteries (the UPS didn't know about it despite it being programmed to do automated self-tests and hadn't reported the fault), power glitched and blammo -- down it went, no warning.

My current "consider those" SSDs for similar replacement or size upgrades would likely be the Micron units -- not the fastest out there but plenty fast, reasonably priced, available in several different versions depending on write endurance and power-protected. -- Karl Denninger k...@denninger.net <mailto:k...@denninger.net> /The Market Ticker/ /[S/MIME encrypted email preferred]/ smime.p7s Description: S/MIME Cryptographic Signature
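For the curious, a sketch of where those two knobs live on the pre-OpenZFS in-tree ZFS -- the OID names below are what I believe they are called there, and the values are starting points to experiment with rather than recommendations (confirm with sysctl -d on your revision):

  # /etc/sysctl.conf (or set live with sysctl)
  vfs.zfs.top_maxinflight=128             # scrub I/Os allowed in flight per top-level vdev
  vfs.zfs.vdev.async_write_min_active=2   # default is 1; "2" buys most of the gain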
Re: ZFS...
On 5/8/2019 11:53, Freddie Cash wrote: > On Wed, May 8, 2019 at 9:31 AM Karl Denninger wrote: > >> I have a system here with about the same amount of net storage on it as >> you did. It runs scrubs regularly; none of them take more than 8 hours >> on *any* of the pools. The SSD-based pool is of course *much* faster >> but even the many-way RaidZ2 on spinning rust is an ~8 hour deal; it >> kicks off automatically at 2:00 AM when the time comes but is complete >> before noon. I run them on 14 day intervals. >> > (description elided) That is a /lot /bigger pool than either Michelle or I are describing. We're both in the ~20Tb of storage space area. You're running 5-10x that in usable space in some of these pools and yet seeing ~2 day scrub times on a couple of them (that is, the organization looks pretty reasonable given the size and so is the scrub time), one that's ~5 days and likely has some issues with parallelism and fragmentation, and then, well, two awfuls which are both dedup-enabled. -- Karl Denninger k...@denninger.net <mailto:k...@denninger.net> /The Market Ticker/ /[S/MIME encrypted email preferred]/ smime.p7s Description: S/MIME Cryptographic Signature
Re: ZFS...
On 5/8/2019 10:14, Michelle Sullivan wrote: > Paul Mather wrote: >> On May 8, 2019, at 9:59 AM, Michelle Sullivan >> wrote: >> >>>> Did you have regular pool scrubs enabled? It would have picked up >>>> silent data corruption like this. It does for me. >>> Yes, every month (once a month because, (1) the data doesn't change >>> much (new data is added, old it not touched), and (2) because to >>> complete it took 2 weeks.) >> >> >> Do you also run sysutils/smartmontools to monitor S.M.A.R.T. >> attributes? Although imperfect, it can sometimes signal trouble >> brewing with a drive (e.g., increasing Reallocated_Sector_Ct and >> Current_Pending_Sector counts) that can lead to proactive remediation >> before catastrophe strikes. > not Automatically >> >> Unless you have been gathering periodic drive metrics, you have no >> way of knowing whether these hundreds of bad sectors have happened >> suddenly or slowly over a period of time. > no, it something i have thought about but been unable to spend the > time on. > There are two issues here that would concern me greatly and IMHO you should address. I have a system here with about the same amount of net storage on it as you did. It runs scrubs regularly; none of them take more than 8 hours on *any* of the pools. The SSD-based pool is of course *much* faster but even the many-way RaidZ2 on spinning rust is an ~8 hour deal; it kicks off automatically at 2:00 AM when the time comes but is complete before noon. I run them on 14 day intervals. If you have pool(s) that are taking *two weeks* to run a scrub IMHO either something is badly wrong or you need to rethink organization of the pool structure -- that is, IMHO you likely either have a severe performance problem with one or more members or an architectural problem you *really* need to determine and fix. If a scrub takes two weeks *then a resilver could conceivably take that long as well* and that's *extremely* bad as the window for getting screwed is at its worst when a resilver is being run. Second, smartmontools/smartd isn't the be-all, end-all but it *does* sometimes catch incipient problems with specific units before they turn into all-on death and IMHO in any installation of any material size where one cares about the data (as opposed to "if it fails just restore it from backup") it should be running. It's very easy to set up and there are no real downsides to using it. I have one disk that I rotate in and out that was bought as a "refurb" and has 70 permanent relocated sectors on it. It has never grown another one since I acquired it, but every time it goes in the machine within minutes I get an alert on that. If I was to ever get *71*, or a *different* drive grew a new one said drive would get replaced *instantly*. Over the years it has flagged two disks before they "hard failed" and both were immediately taken out of service, replaced and then destroyed and thrown away. Maybe that's me being paranoid but IMHO it's the correct approach to such notifications. BTW that tool will *also* tell you if something else software-wise is going on that you *might* think is drive-related. For example recently here on the list I ran into a really oddball thing happening with SAS expanders that showed up with 12-STABLE and was *not* present in the same box with 11.1. 
Smartmontools confirmed that while the driver was reporting errors from the disks *the disks themselves were not in fact taking errors.* Had I not had that information I might well have traveled down a road that led to a catastrophic pool failure by attempting to replace disks that weren't actually bad. The SAS expander wound up being taken out of service and replaced with an HBA that has more ports -- the issues disappeared. Finally, while you *think* you only have a metadata problem I'm with the other people here in expressing disbelief that the damage is limited to that. There is enough redundancy in the metadata on ZFS that if *all* copies are destroyed or inconsistent to the degree that they're unusable it's extremely likely that if you do get some sort of "disaster recovery" tool working you're going to find out that what you thought was a metadata problem is really a "you're hosed; the data is also gone" sort of problem. -- Karl Denninger k...@denninger.net <mailto:k...@denninger.net> /The Market Ticker/ /[S/MIME encrypted email preferred]/ smime.p7s Description: S/MIME Cryptographic Signature
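Getting smartd going is a small job; a minimal sketch after installing sysutils/smartmontools (mail target, devices and thresholds are examples to tailor):

  # /usr/local/etc/smartd.conf
  DEVICESCAN -a -m root               # watch everything smartd finds, mail root on trouble
  #/dev/da2 -a -m root -W 4,45,50     # or list drives explicitly, with temperature warnings

then enable and start it:

  sysrc smartd_enable=YES
  service smartd start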
Re: ZFS...
to try to separate them until you get well into the terabytes of storage range and a half-dozen or so physical volumes. There's a very clean argument that prior to that point, if you have more than one drive, a mirror is always the better choice.

Note that if you have an *adapter* go insane (and as I've noted here I've had it happen TWICE in my IT career!) then *all* of the data on the disks served by that adapter is screwed. It doesn't make a bit of difference what filesystem you're using in that scenario, and thus you had better have a backup scheme and make sure it works as well, never mind software bugs or administrator stupidity ("dd" as root to the wrong target, for example, will reliably screw you every single time!)

For a single-disk machine ZFS is no *less* safe than UFS and provides a number of advantages, with arguably the most important being easily-used snapshots. Not only does this simplify backups, since coherency during the backup is never at issue and incremental backups become fast and easily done; in addition, boot environments make roll-forward and even *roll-back* reasonable to implement for software updates -- a critical capability if you ever run an OS version update and something goes seriously wrong with it. If you've never had that happen then consider yourself blessed; it's NOT fun to manage in a UFS environment and often winds up leading to a "restore from backup" scenario. (To be fair it can be with ZFS too, if you're foolish enough to upgrade the pool before being sure you're happy with the new OS rev.) -- Karl Denninger k...@denninger.net <mailto:k...@denninger.net> /The Market Ticker/ /[S/MIME encrypted email preferred]/ smime.p7s Description: S/MIME Cryptographic Signature
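To illustrate the snapshot/backup point (pool and snapshot names are invented; the send/receive flags mirror what appears elsewhere in this thread):

  zfs snapshot -r zroot@2019-05-01      # an instant, coherent point-in-time view
  zfs send -RI zroot@2019-04-24 zroot@2019-05-01 | zfs receive -Fudv backup/zroot

The send is incremental against the previous snapshot, so routine backups stay fast, and the receiving pool ends up with the same snapshot history to roll back to.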
Re: ZFS...
On 4/30/2019 20:59, Michelle Sullivan wrote >> On 01 May 2019, at 11:33, Karl Denninger wrote: >> >>> On 4/30/2019 19:14, Michelle Sullivan wrote: >>> >>> Michelle Sullivan >>> http://www.mhix.org/ >>> Sent from my iPad >>> >> Nope. I'd much rather *know* the data is corrupt and be forced to >> restore from backups than to have SILENT corruption occur and perhaps >> screw me 10 years down the road when the odds are my backups have >> long-since been recycled. > Ahh yes the be all and end all of ZFS.. stops the silent corruption of data.. > but don’t install it on anything unless it’s server grade with backups and > ECC RAM, but it’s good on laptops because it protects you from silent > corruption of your data when 10 years later the backups have long-since been > recycled... umm is that not a circular argument? > > Don’t get me wrong here.. and I know you (and some others are) zfs in the DC > with 10s of thousands in redundant servers and/or backups to keep your > critical data corruption free = good thing. > > ZFS on everything is what some say (because it prevents silent corruption) > but then you have default policies to install it everywhere .. including > hardware not equipped to function safely with it (in your own arguments) and > yet it’s still good because it will still prevent silent corruption even > though it relies on hardware that you can trust... umm say what? > > Anyhow veered way way off (the original) topic... > > Modest (part consumer grade, part commercial) suffered irreversible data loss > because of a (very unusual, but not impossible) double power outage.. and no > tools to recover the data (or part data) unless you have some form of backup > because the file system deems the corruption to be too dangerous to let you > access any of it (even the known good bits) ... > > Michelle IMHO you're dead wrong Michelle. I respect your opinion but disagree vehemently. I run ZFS on both of my laptops under FreeBSD. Both have non-power-protected SSDs in them. Neither is mirrored or Raidz-anything. So why run ZFS instead of UFS? Because a scrub will detect data corruption that UFS cannot detect *at all.* It is a balance-of-harms test and you choose. I can make a very clean argument that *greater information always wins*; that is, I prefer in every case to *know* I'm screwed rather than not. I can defend against being screwed with some amount of diligence but in order for that diligence to be reasonable I have to know about the screwing in a reasonable amount of time after it happens. You may have never had silent corruption bite you. I have had it happen several times over my IT career. If that happens to you the odds are that it's absolutely unrecoverable and whatever gets corrupted is *gone.* The defensive measures against silent corruption require retention of backup data *literally forever* for the entire useful life of the information because from the point of corruption forward *the backups are typically going to be complete and correct copies of the corrupt data and thus equally worthless to what's on the disk itself.* With non-ZFS filesystems quite a lot of thought and care has to go into defending against that, and said defense usually requires the active cooperation of whatever software wrote said file in the first place (e.g. a database, etc.) If said software has no tools to "walk" said data or if it's impractical to have it do so you're at severe risk of being hosed. Prior to ZFS there really wasn't any comprehensive defense against this sort of event. 
There are a whole host of applications that manipulate data that are absolutely reliant on that sort of thing not happening (e.g. anything using a btree data structure) and recovery if it *does* happen is a five-alarm nightmare if it's possible at all. In the worst-case scenario you don't detect the corruption and the data whose pointer got corrupted is overwritten and destroyed. A ZFS scrub on a volume that has no redundancy cannot *fix* that corruption but it can and will detect it. This puts a boundary on the backups that I must keep in order to *not* have that happen. This is of very high value to me and is why, even on systems without ECC memory and without redundant disks, provided there is enough RAM to make it reasonable (e.g. not on the embedded systems I do development on, which are severely RAM-constrained) I run ZFS. BTW if you've never had a UFS volume unlink all the blocks within a file on an fsck and then recover them back into the free list after a crash you're a rare bird indeed. If you think a corrupt ZFS volume is fun, try to get your data back from said file after that happens. -- Karl Denninger k...@denninger.net <mailto:k...@denninger.net> /The Market Ticker/ /[S/MIME encrypted email preferred]/ smime.p7s Description: S/MIME Cryptographic Signature
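The scheduling can be left to periodic(8); a sketch, assuming the stock knobs in /etc/defaults/periodic.conf:

    # /etc/periodic.conf
    daily_scrub_zfs_enable="YES"
    daily_scrub_zfs_default_threshold="7"    # days between scrubs of a given pool

That keeps the detection window for silent corruption to roughly a week without anyone having to remember to kick off the scrub by hand.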
Re: ZFS...
On 4/30/2019 19:14, Michelle Sullivan wrote: > > Michelle Sullivan > http://www.mhix.org/ > Sent from my iPad > >> On 01 May 2019, at 01:15, Karl Denninger wrote: >> >> >> IMHO non-ECC memory systems are ok for personal desktop and laptop >> machines where loss of stored data requiring a restore is acceptable >> (assuming you have a reasonable backup paradigm for same) but not for >> servers and *especially* not for ZFS storage. I don't like the price of >> ECC memory and I really don't like Intel's practices when it comes to >> only enabling ECC RAM on their "server" class line of CPUs either but it >> is what it is. Pay up for the machines where it matters. > And the irony is the FreeBSD policy to default to zfs on new installs using > the complete drive.. even when there is only one disk available and > regardless of the cpu or ram class... with one usb stick I have around here > it attempted to use zfs on one of my laptops. > > Damned if you do, damned if you don’t comes to mind. > Nope. I'd much rather *know* the data is corrupt and be forced to restore from backups than to have SILENT corruption occur and perhaps screw me 10 years down the road when the odds are my backups have long-since been recycled. -- Karl Denninger k...@denninger.net <mailto:k...@denninger.net> /The Market Ticker/ /[S/MIME encrypted email preferred]/ smime.p7s Description: S/MIME Cryptographic Signature
Re: ZFS...
On 4/30/2019 09:12, Alan Somers wrote: > On Tue, Apr 30, 2019 at 8:05 AM Michelle Sullivan wrote: > . >> I know this... unless I misread Karl’s message he implied the ECC would have >> saved the corruption in the crash... which is patently false... I think >> you’ll agree.. > I don't think that's what Karl meant. I think he meant that the > non-ECC RAM could've caused latent corruption that was only detected > when the crash forced a reboot and resilver. Exactly. Non-ECC memory means you can potentially write data to *all* copies of a block (and its parity in the case of a Raidz) where the checksum is invalid and there is no way for the code to know it happened or defend against it. Unfortunately since the checksum is very small compared to the data size the odds are that IF that happens it's the *data* and not the checksum that's bad and there are *no* good copies. Contrary to popular belief the "power good" signal on your PSU and MB do not provide 100% protection against transient power problems causing this to occur with non-ECC memory either. IMHO non-ECC memory systems are ok for personal desktop and laptop machines where loss of stored data requiring a restore is acceptable (assuming you have a reasonable backup paradigm for same) but not for servers and *especially* not for ZFS storage. I don't like the price of ECC memory and I really don't like Intel's practices when it comes to only enabling ECC RAM on their "server" class line of CPUs either but it is what it is. Pay up for the machines where it matters. One of the ironies is that there's better data *integrity* with ZFS than other filesystems in this circumstance; you're much more-likely to *know* you're hosed even if the situation is unrecoverable and requires a restore. With UFS and other filesystems you can quite-easily wind up with silent corruption that can go undetected; the filesystem "works" just fine but the data is garbage. From my point of view that's *much* worse. In addition IMHO consumer drives are not exactly safe for online ZFS storage. Ironically they're *safer* for archival use because when not actively in use they're dismounted and thus not subject to "you're silently hosed" sort of failures. What sort of "you're hosed" failures? Oh, for example, claiming to have flushed their cache buffers before returning "complete" on that request when they really did not! In combination with write re-ordering that can *really* screw you and there's nothing that any filesystem can defensively do about it either. This sort of "cheat" is much-more likely to be present in consumer drives than ones sold for either enterprise or NAS purposes and it's quite difficult to accurately test for this sort of thing on an individual basis too. -- Karl Denninger k...@denninger.net <mailto:k...@denninger.net> /The Market Ticker/ /[S/MIME encrypted email preferred]/ smime.p7s Description: S/MIME Cryptographic Signature
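You can at least see what a drive *claims* about its volatile write cache (whether it honors a flush request is another matter entirely); a sketch, with device names being examples:

    camcontrol identify ada0 | grep -i "write cache"    # SATA: shows supported / enabled
    camcontrol modepage da0 -m 8                        # SAS/SCSI: look at the WCE bit in the caching mode page

None of that proves the firmware actually flushes when told to, which is exactly the problem with the cheaters.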
Re: ZFS...
On 4/30/2019 08:38, Michelle Sullivan wrote: > Karl Denninger wrote: >> On 4/30/2019 03:09, Michelle Sullivan wrote: >>> Consider.. >>> >>> If one triggers such a fault on a production server, how can one >>> justify transferring from backup multiple terabytes (or even >>> petabytes now) of data to repair an unmountable/faulted array >>> because all backup solutions I know currently would take days if not >>> weeks to restore the sort of store ZFS is touted with supporting. >> Had it happen on a production server a few years back with ZFS. The >> *hardware* went insane (disk adapter) and scribbled on *all* of the >> vdevs. >> >> >> Time to recover essential functions was ~8 hours (and over 24 >> hours for everything to be restored.) >> > How big was the storage area? > In that specific instance approximately 10Tb in-use. The working set that allowed critical functions to come online (and which got restored first, obviously) was ~3Tb. BTW my personal "primary server" working set is approximately 20Tb. There is data on that server dating back to 1982 -- yes, data all the way back to systems I owned that ran on a Z-80 processor with 64Kb (not MB) of RAM. I started running ZFS a fairly long time ago on FreeBSD -- 9.0, I believe, and have reorganized and upgraded drives over time. If my storage fails "hard" in a way that I have no local backups available (e.g. building fire, adapter scribbles on drives including not-mounted ones, etc) critical functionality (e.g. email receipt, etc) can be back online in roughly 3-4 hours, assuming the bank is open and I can get to the backup volumes. A full restore will require more than a day. I've tested restore of each individual piece of the backup structure but do not have the hardware available in the building to restore a complete clone. With the segregated structure of it, however, I'm 100% certain it is all restorable. That's tested regularly -- just to be sure. Now if we get nuked and the bank vault is also destroyed then it's over, but then again I'm probably a burnt piece of charcoal in such a circumstance so that's a risk I accept. When I ran my ISP in the 1990s we had both local copies and vault copies because a "scribbles on disk" failure on a Saturday couldn't be unable to be addressed until Monday morning. We would have been out of business instantly if that had happened in any situation short of the office with our primary data center burning down. Incidentally one of my adapter failures was in exactly the worst possible place for it to occur while running that company -- the adapter on the machine that held our primary authentication and billing database. At the time the best option for "large" working sets was DLT. Now for most purposes it's another disk. Disks, however, must be re-verified more-frequently than DLT -- MUCH more frequently. Further, if you have only ONE backup then it cannot be singular (e.g. there must be two or more, whether via mirroring or some other mechanism) on ANY media. While DLT, for example, has a typical expected 30 year archival life that doesn't mean the ONE tape you have will be readable 30 years later. As data size expands noodling on how you segregate data into read-only, write-very-occasionally and read/write, along with how you handle backups of each component and how, or if, you subdivide those categories for backup purposes becomes quite important. 
If performance matters (and it usually does) then what goes where in what pool (and across pools of similar base storage types) matters too; in my personal working set there are both SSDs (all "power-pull safe" drives, which cost more and tend to be somewhat slower than "consumer" SSDs) and spinning rust storage for that exact reason. Note that on this list right now I'm chasing a potential "gotcha" interaction between geli and ZFS that thus far has eluded isolation. While it has yet to corrupt data, the potential is there and the hair on the back of my neck is standing up a bit as a consequence. It appears to date to either 11.2 or 12.0 and *definitely* is present in 12-STABLE; it was *not* present on 11.1. The price of keeping your data intact is always eternal vigilance. -- Karl Denninger k...@denninger.net <mailto:k...@denninger.net> /The Market Ticker/ /[S/MIME encrypted email preferred]/ smime.p7s Description: S/MIME Cryptographic Signature
Re: ZFS...
On 4/30/2019 03:09, Michelle Sullivan wrote: > Consider.. > > If one triggers such a fault on a production server, how can one justify > transferring from backup multiple terabytes (or even petabytes now) of data > to repair an unmountable/faulted array because all backup solutions I > know currently would take days if not weeks to restore the sort of store ZFS > is touted with supporting. Had it happen on a production server a few years back with ZFS. The *hardware* went insane (disk adapter) and scribbled on *all* of the vdevs. The machine crashed and would not come back up -- at all. I insist on (and had) emergency boot media physically in the box (a USB key) in any production machine and it was quite-quickly obvious that all of the vdevs were corrupted beyond repair. There was no rational option other than to restore. It was definitely not a pleasant experience, but this is why when you get into systems and data store sizes where it's a five-alarm pain in the neck you must figure out some sort of strategy that covers you 99% of the time without a large amount of downtime involved, and in the 1% case accept said downtime. In this particular circumstance the customer didn't want to spend on a doubled-and-transaction-level protected on-site (in the same DC) redundancy setup originally so restore, as opposed to fail-over/promote and then restore and build a new "redundant" box where the old "primary" resided was the most-viable option. Time to recover essential functions was ~8 hours (and over 24 hours for everything to be restored.) Incidentally that's not the first time I've had a disk adapter failure on a production machine in my career as a systems dude; it was, in fact, the *third* such failure. Then again I've been doing this stuff since the 1980s and learned long ago that if it can break it eventually will, and that Murphy is a real b**. The answer to your question Michelle is that when restore times get into "seriously disruptive" amounts of time (e.g. hours, days or worse depending on the application involved and how critical it is) you spend the time and money to have redundancy in multiple places and via paths that do not destroy the redundant copies when things go wrong, and you spend the engineering time to figure out what those potential faults are and how to design such that a fault which can destroy the data set does not propagate to the redundant copies before it is detected. -- Karl Denninger k...@denninger.net <mailto:k...@denninger.net> /The Market Ticker/ /[S/MIME encrypted email preferred]/ smime.p7s Description: S/MIME Cryptographic Signature
Re: ZFS...
On 4/30/2019 05:14, Michelle Sullivan wrote: >> On 30 Apr 2019, at 19:50, Xin LI wrote: >>> On Tue, Apr 30, 2019 at 5:08 PM Michelle Sullivan >>> wrote: >>> but in my recent experience 2 issues colliding at the same time results in >>> disaster >> Do we know exactly what kind of corruption happen to your pool? If you see >> it twice in a row, it might suggest a software bug that should be >> investigated. >> >> All I know is it’s a checksum error on a meta slab (122) and from what I can >> gather it’s the spacemap that is corrupt... but I am no expert. I don’t >> believe it’s a software fault as such, because this was cause by a hard >> outage (damaged UPSes) whilst resilvering a single (but completely failed) >> drive. ...and after the first outage a second occurred (same as the first >> but more damaging to the power hardware)... the host itself was not damaged >> nor were the drives or controller. . >> Note that ZFS stores multiple copies of its essential metadata, and in my >> experience with my old, consumer grade crappy hardware (non-ECC RAM, with >> several faulty, single hard drive pool: bad enough to crash almost monthly >> and damages my data from time to time), > This was a top end consumer grade mb with non ecc ram that had been running > for 8+ years without fault (except for hard drive platter failures.). Uptime > would have been years if it wasn’t for patching. Yuck. I'm sorry, but that may well be what nailed you. ECC is not just about the random cosmic ray. It also saves your bacon when there are power glitches. Unfortunately however there is also cache memory on most modern hard drives, most of the time (unless you explicitly shut it off) it's on for write caching, and it'll nail you too. Oh, and it's never, in my experience, ECC. In addition, however, and this is something I learned a LONG time ago (think Z-80 processors!) is that as in so many very important things "two is one and one is none." In other words without a backup you WILL lose data eventually, and it WILL be important. Raidz2 is very nice, but as the name implies it you have two redundancies. If you take three errors, or if, God forbid, you *write* a block that has a bad checksum in it because it got scrambled while in RAM, you're dead if that happens in the wrong place. > Yeah.. unlike UFS that has to get really really hosed to restore from backup > with nothing recoverable it seems ZFS can get hosed where issues occur in > just the wrong bit... but mostly it is recoverable (and my experience has > been some nasty shit that always ended up being recoverable.) > > Michelle Oh that is definitely NOT true again, from hard experience, including (but not limited to) on FreeBSD. My experience is that ZFS is materially more-resilient but there is no such thing as "can never be corrupted by any set of events." Backup strategies for moderately large (e.g. many Terabytes) to very large (e.g. Petabytes and beyond) get quite complex but they're also very necessary. -- Karl Denninger k...@denninger.net <mailto:k...@denninger.net> /The Market Ticker/ /[S/MIME encrypted email preferred]/ smime.p7s Description: S/MIME Cryptographic Signature
Re: Concern: ZFS Mirror issues (12.STABLE and firmware 19 .v. 20) [[UPDATE w/more tests]]
On 4/20/2019 15:56, Steven Hartland wrote: > Thanks for extra info, the next question would be have you eliminated > that corruption exists before the disk is removed? > > Would be interesting to add a zpool scrub to confirm this isn't the > case before the disk removal is attempted. > > Regards > Steve > > On 20/04/2019 18:35, Karl Denninger wrote: >> >> On 4/20/2019 10:50, Steven Hartland wrote: >>> Have you eliminated geli as possible source? >> No; I could conceivably do so by re-creating another backup volume >> set without geli-encrypting the drives, but I do not have an extra >> set of drives of the capacity required laying around to do that. I >> would have to do it with lower-capacity disks, which I can attempt if >> you think it would help. I *do* have open slots in the drive >> backplane to set up a second "test" unit of this sort. For reasons >> below it will take at least a couple of weeks to get good data on >> whether the problem exists without geli, however. >> Ok, following up on this with more data First step taken was to create a *second* backup pool (I have the backplane slots open, fortunately) with three different disks but *no encryption.* I ran both side-by-side for several days, with the *unencrypted* one operating with one disk detached and offline (pulled physically) just as I do normally. Then I swapped the two using the same paradigm. The difference was *dramatic* -- the resilver did *not* scan the entire disk; it only copied the changed blocks and was finished FAST. A subsequent scrub came up 100% clean. Next I put THOSE disks in the vault (so as to make sure I didn't get hosed if something went wrong) and re-initialized the pool in question, leaving only the "geli" alone (in other words I zpool destroy'd the pool and then re-created it with all three disks connected and geli-attached.) The purpose for doing this was to eliminate the possibility of old corruption somewhere, or some sort of problem with multiple, spanning years, in-place "zpool upgrade" commands. Then I ran a base backup to initialize all three volumes, detached one and yanked it out of the backplane, as would be the usual, leaving the other two online and operating. I ran backups as usual for most of last week after doing this, with the 61.eli and 62-1.eli volumes online, and 62-2 physically out of the backplane. Today I swapped them again as I usually do (e.g. offline 62.1, geli detach, camcontrol standby and then yank it -- then insert the 62-2 volume, geli attach and zpool online) and this is happening: [\u@NewFS /home/karl]# zpool status backup pool: backup state: DEGRADED status: One or more devices is currently being resilvered. The pool will continue to function, possibly in a degraded state. action: Wait for the resilver to complete. scan: resilver in progress since Sun Apr 28 12:57:47 2019 2.48T scanned at 202M/s, 1.89T issued at 154M/s, 3.27T total 1.89T resilvered, 57.70% done, 0 days 02:37:14 to go config: NAME STATE READ WRITE CKSUM backup DEGRADED 0 0 0 mirror-0 DEGRADED 0 0 0 gpt/backup61.eli ONLINE 0 0 0 11295390187305954877 OFFLINE 0 0 0 was /dev/gpt/backup62-1.eli gpt/backup62-2.eli ONLINE 0 0 0 errors: No known data errors The "3.27T" number is accurate (by "zpool list") for the space in use. There is not a snowball's chance in Hades that anywhere near 1.89T of that data (thus far, and it ain't done as you can see!) was modified between when all three disks were online and when the 62-2.eli volume was swapped back in for 62.1.eli. No possible way. 
Maybe some 100-200Gb of data has been touched across the backed-up filesystems in the last three-ish days but there's just flat-out no way it's more than that; this would imply an entropy of well over 50% of the writeable data on this box in less than a week! That's NOT possible. Further, it's not 100%; it shows 2.48T scanned but 1.89T actually written to the other drive. So something is definitely foooged here and it does appear that geli is involved in it. Whatever is foooging zfs, the resilver process thinks it has to recopy MOST (but not all!) of the blocks in use, it appears, from the 61.eli volume to the 62-2.eli volume. The question is what would lead ZFS to think it has to do that -- it clearly DOES NOT, as a *much* smaller percentage of the total TXG set on 61.eli was modified while 62-2.eli was offline and 62.1.eli was online. Again I note that on 11.1 and previous this resilver was a rapid operation; whatever was actually changed got copied, but the system never copied *nearly everything.*
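For reference, the swap procedure itself is nothing exotic; spelled out as a sketch (device and label names are examples; the geli keyfile/passphrase options are omitted):

    zpool offline backup gpt/backup62-1.eli
    geli detach gpt/backup62-1.eli
    camcontrol standby da4            # let the spindle stop, then pull the carrier
    # ...insert the incoming carrier...
    geli attach gpt/backup62-2
    zpool online backup gpt/backup62-2.eli
    zpool status backup               # watch the resilver

Nothing in that sequence should leave unflushed data behind, which is what makes the near-complete recopy so suspicious.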
Pkg upgrade for 12-STABLE builds in "Latest" broken?
[\u@NewFS /home/karl]# pkg upgrade dovecot Updating FreeBSD repository catalogue... FreeBSD repository is up to date. All repositories are up to date. The following 1 package(s) will be affected (of 0 checked): Installed packages to be UPGRADED: dovecot: 2.3.4.1 -> 2.3.5 Number of packages to be upgraded: 1 4 MiB to be downloaded. Proceed with this action? [y/N]: y pkg: http://pkg.FreeBSD.org/FreeBSD:12:amd64/latest/All/dovecot-2.3.5.txz: Not Found [\u@NewFS /home/karl]# -- Karl Denninger k...@denninger.net <mailto:k...@denninger.net> /The Market Ticker/ /[S/MIME encrypted email preferred]/ smime.p7s Description: S/MIME Cryptographic Signature
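If the "latest" package set stays inconsistent for a while, one workaround (it does nothing for the build infrastructure itself) is to point the box at the quarterly set with a repository override; a sketch, with path and syntax per pkg.conf(5):

    # /usr/local/etc/pkg/repos/FreeBSD.conf
    FreeBSD: {
      url: "pkg+http://pkg.FreeBSD.org/${ABI}/quarterly"
    }

followed by a "pkg update -f" to refresh the catalogue.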
Re: Concern: ZFS Mirror issues (12.STABLE and firmware 19 .v. 20)
No; I can, but of course that's another ~8 hour (overnight) delay between swaps. That's not a bad idea however On 4/20/2019 15:56, Steven Hartland wrote: > Thanks for extra info, the next question would be have you eliminated > that corruption exists before the disk is removed? > > Would be interesting to add a zpool scrub to confirm this isn't the > case before the disk removal is attempted. > > Regards > Steve > > On 20/04/2019 18:35, Karl Denninger wrote: >> >> On 4/20/2019 10:50, Steven Hartland wrote: >>> Have you eliminated geli as possible source? >> No; I could conceivably do so by re-creating another backup volume >> set without geli-encrypting the drives, but I do not have an extra >> set of drives of the capacity required laying around to do that. I >> would have to do it with lower-capacity disks, which I can attempt if >> you think it would help. I *do* have open slots in the drive >> backplane to set up a second "test" unit of this sort. For reasons >> below it will take at least a couple of weeks to get good data on >> whether the problem exists without geli, however. >>> >>> I've just setup an old server which has a LSI 2008 running and old >>> FW (11.0) so was going to have a go at reproducing this. >>> >>> Apart from the disconnect steps below is there anything else needed >>> e.g. read / write workload during disconnect? >> >> Yes. An attempt to recreate this on my sandbox machine using smaller >> disks (WD RE-320s) and a decent amount of read/write activity (tens >> to ~100 gigabytes) on a root mirror of three disks with one taken >> offline did not succeed. It *reliably* appears, however, on my >> backup volumes with every drive swap. The sandbox machine is >> physically identical other than the physical disks; both are Xeons >> with ECC RAM in them. >> >> The only operational difference is that the backup volume sets have a >> *lot* of data written to them via zfs send|zfs recv over the >> intervening period where with "ordinary" activity from I/O (which was >> the case on my sandbox) the I/O pattern is materially different. The >> root pool on the sandbox where I tried to reproduce it synthetically >> *is* using geli (in fact it boots native-encrypted.) >> >> The "ordinary" resilver on a disk swap typically covers ~2-3Tb and is >> a ~6-8 hour process. >> >> The usual process for the backup pool looks like this: >> >> Have 2 of the 3 physical disks mounted; the third is in the bank vault. >> >> Over the space of a week, the backup script is run daily. It first >> imports the pool and then for each zfs filesystem it is backing up >> (which is not all of them; I have a few volatile ones that I don't >> care if I lose, such as object directories for builds and such, plus >> some that are R/O data sets that are backed up separately) it does: >> >> If there is no "...@zfs-base": zfs snapshot -r ...@zfs-base; zfs send >> -R ...@zfs-base | zfs receive -Fuvd $BACKUP >> >> else >> >> zfs rename -r ...@zfs-base ...@zfs-old >> zfs snapshot -r ...@zfs-base >> >> zfs send -RI ...@zfs-old ...@zfs-base |zfs recv -Fudv $BACKUP >> >> if ok then zfs destroy -vr ...@zfs-old otherwise print a >> complaint and stop. >> >> When all are complete it then does a "zpool export backup" to detach >> the pool in order to reduce the risk of "stupid root user" (me) >> accidents. >> >> In short I send an incremental of the changes since the last backup, >> which in many cases includes a bunch of automatic snapshots that are >> taken on frequent basis out of the cron. 
Typically there are a week's >> worth of these that accumulate between swaps of the disk to the >> vault, and the offline'd disk remains that way for a week. I also >> wait for the zpool destroy on each of the targets to drain before >> continuing, as not doing so back in the 9 and 10.x days was a good >> way to stimulate an instant panic on re-import the next day due to >> kernel stack page exhaustion if the previous operation destroyed >> hundreds of gigabytes of snapshots (which does routinely happen as >> part of the backed up data is Macrium images from PCs, so when a new >> month comes around the PC's backup routine removes a huge amount of >> old data from the filesystem.) >> >> Trying to simulate the checksum errors in a few hours' time thus far >> has failed. But every time I swap the
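A stripped-down sketch of that per-filesystem loop (dataset names are placeholders; the real script carries more error handling and treats the R/O sets separately):

    #!/bin/sh
    # incremental zfs send/recv backup cycle as described above
    BACKUP=backup
    for fs in zsr/mail zsr/home; do    # placeholder dataset list
        if ! zfs list -t snapshot "${fs}@zfs-base" >/dev/null 2>&1; then
            zfs snapshot -r "${fs}@zfs-base"
            zfs send -R "${fs}@zfs-base" | zfs receive -Fuvd "${BACKUP}"
        else
            zfs rename -r "${fs}@zfs-base" "${fs}@zfs-old"
            zfs snapshot -r "${fs}@zfs-base"
            if zfs send -RI "${fs}@zfs-old" "${fs}@zfs-base" | zfs recv -Fudv "${BACKUP}"; then
                zfs destroy -vr "${fs}@zfs-old"
            else
                echo "incremental send of ${fs} failed; keeping @zfs-old" >&2
                exit 1
            fi
        fi
    done
    zpool export "${BACKUP}"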
Re: Concern: ZFS Mirror issues (12.STABLE and firmware 19 .v. 20)
0 0 13282812295755460479 OFFLINE 0 0 0 was /dev/gpt/backup62-2.eli errors: No known data errors It knows it fixed the checksums but the error count is zero -- I did NOT "zpool clear". This may have been present in 11.2; I didn't run that long enough in this environment to know. It definitely was *not* present in 11.1 and before; the same data structure and script for backups has been in use for a very long time without any changes and this first appeared when I upgraded from 11.1 to 12.0 on this specific machine, with the exact same physical disks being used for over a year (they're currently 6Tb units; the last change out for those was ~1.5 years ago when I went from 4Tb to 6Tb volumes.) I have both HGST-NAS and He-Enterprise disks in the rotation and both show identical behavior so it doesn't appear to be related to a firmware problem in one disk .vs. the other (e.g. firmware that fails to flush the on-drive cache before going to standby even though it was told to.) > > mps0: port 0xe000-0xe0ff mem > 0xfaf3c000-0xfaf3,0xfaf4-0xfaf7 irq 26 at device 0.0 on pci3 > mps0: Firmware: 11.00.00.00, Driver: 21.02.00.00-fbsd > mps0: IOCCapabilities: > 185c > > Regards > Steve > > On 20/04/2019 15:39, Karl Denninger wrote: >> I can confirm that 20.00.07.00 does *not* stop this. >> The previous write/scrub on this device was on 20.00.07.00. It was >> swapped back in from the vault yesterday, resilvered without incident, >> but a scrub says >> >> root@NewFS:/home/karl # zpool status backup >> pool: backup >> state: DEGRADED >> status: One or more devices has experienced an unrecoverable error. An >> attempt was made to correct the error. Applications are >> unaffected. >> action: Determine if the device needs to be replaced, and clear the >> errors >> using 'zpool clear' or replace the device with 'zpool replace'. >> see: http://illumos.org/msg/ZFS-8000-9P >> scan: scrub repaired 188K in 0 days 09:40:18 with 0 errors on Sat Apr >> 20 08:45:09 2019 >> config: >> >> NAME STATE READ WRITE CKSUM >> backup DEGRADED 0 0 0 >> mirror-0 DEGRADED 0 0 0 >> gpt/backup61.eli ONLINE 0 0 0 >> gpt/backup62-1.eli ONLINE 0 0 47 >> 13282812295755460479 OFFLINE 0 0 0 was >> /dev/gpt/backup62-2.eli >> >> errors: No known data errors >> >> So this is firmware-invariant (at least between 19.00.00.00 and >> 20.00.07.00); the issue persists. >> >> Again, in my instance these devices are never removed "unsolicited" so >> there can't be (or at least shouldn't be able to) unflushed data in the >> device or kernel cache. The procedure is and remains: >> >> zpool offline . >> geli detach . >> camcontrol standby ... >> >> Wait a few seconds for the spindle to spin down. >> >> Remove disk. >> >> Then of course on the other side after insertion and the kernel has >> reported "finding" the device: >> >> geli attach ... >> zpool online >> >> Wait... >> >> If this is a boogered TXG that's held in the metadata for the >> "offline"'d device (maybe "off by one"?) that's potentially bad in that >> if there is an unknown failure in the other mirror component the >> resilver will complete but data has been irrevocably destroyed. >> >> Granted, this is a very low probability scenario (the area where the bad >> checksums are has to be where the corruption hits, and it has to happen >> between the resilver and access to that data.) Those are long odds but >> nonetheless a window of "you're hosed" does appear to exist. 
>> > -- Karl Denninger k...@denninger.net <mailto:k...@denninger.net> /The Market Ticker/ /[S/MIME encrypted email preferred]/ smime.p7s Description: S/MIME Cryptographic Signature
Re: Concern: ZFS Mirror issues (12.STABLE and firmware 19 .v. 20)
On 4/13/2019 06:00, Karl Denninger wrote: > On 4/11/2019 13:57, Karl Denninger wrote: >> On 4/11/2019 13:52, Zaphod Beeblebrox wrote: >>> On Wed, Apr 10, 2019 at 10:41 AM Karl Denninger wrote: >>> >>> >>>> In this specific case the adapter in question is... >>>> >>>> mps0: port 0xc000-0xc0ff mem >>>> 0xfbb3c000-0xfbb3,0xfbb4-0xfbb7 irq 30 at device 0.0 on pci3 >>>> mps0: Firmware: 20.00.07.00, Driver: 21.02.00.00-fbsd >>>> mps0: IOCCapabilities: >>>> 1285c >>>> >>>> Which is indeed a "dumb" HBA (in IT mode), and Zeephod says he connects >>>> his drives via dumb on-MoBo direct SATA connections. >>>> >>> Maybe I'm in good company. My current setup has 8 of the disks connected >>> to: >>> >>> mps0: port 0xb000-0xb0ff mem >>> 0xfe24-0xfe24,0xfe20-0xfe23 irq 32 at device 0.0 on pci6 >>> mps0: Firmware: 19.00.00.00, Driver: 21.02.00.00-fbsd >>> mps0: IOCCapabilities: >>> 5a85c >>> >>> ... just with a cable that breaks out each of the 2 connectors into 4 >>> SATA-style connectors, and the other 8 disks (plus boot disks and SSD >>> cache/log) connected to ports on... >>> >>> - ahci0: port >>> 0xd050-0xd057,0xd040-0xd043,0xd030-0xd037,0xd020-0xd023,0xd000-0xd01f mem >>> 0xfe90-0xfe9001ff irq 44 at device 0.0 on pci2 >>> - ahci2: port >>> 0xa050-0xa057,0xa040-0xa043,0xa030-0xa037,0xa020-0xa023,0xa000-0xa01f mem >>> 0xfe61-0xfe6107ff irq 40 at device 0.0 on pci7 >>> - ahci3: port >>> 0xf040-0xf047,0xf030-0xf033,0xf020-0xf027,0xf010-0xf013,0xf000-0xf00f mem >>> 0xfea07000-0xfea073ff irq 19 at device 17.0 on pci0 >>> >>> ... each drive connected to a single port. >>> >>> I can actually reproduce this at will. Because I have 16 drives, when one >>> fails, I need to find it. I pull the sata cable for a drive, determine if >>> it's the drive in question, if not, reconnect, "ONLINE" it and wait for >>> resilver to stop... usually only a minute or two. >>> >>> ... if I do this 4 to 6 odd times to find a drive (I can tell, in general, >>> that a drive is part of the SAS controller or the SATA controllers... so >>> I'm only looking among 8, ever) ... then I "REPLACE" the problem drive. >>> More often than not, the a scrub will find a few problems. In fact, it >>> appears that the most recent scrub is an example: >>> >>> [1:7:306]dgilbert@vr:~> zpool status >>> pool: vr1 >>> state: ONLINE >>> scan: scrub repaired 32K in 47h16m with 0 errors on Mon Apr 1 23:12:03 >>> 2019 >>> config: >>> >>> NAMESTATE READ WRITE CKSUM >>> vr1 ONLINE 0 0 0 >>> raidz2-0 ONLINE 0 0 0 >>> gpt/v1-d0 ONLINE 0 0 0 >>> gpt/v1-d1 ONLINE 0 0 0 >>> gpt/v1-d2 ONLINE 0 0 0 >>> gpt/v1-d3 ONLINE 0 0 0 >>> gpt/v1-d4 ONLINE 0 0 0 >>> gpt/v1-d5 ONLINE 0 0 0 >>> gpt/v1-d6 ONLINE 0 0 0 >>> gpt/v1-d7 ONLINE 0 0 0 >>> raidz2-2 ONLINE 0 0 0 >>> gpt/v1-e0c ONLINE 0 0 0 >>> gpt/v1-e1b ONLINE 0 0 0 >>> gpt/v1-e2b ONLINE 0 0 0 >>> gpt/v1-e3b ONLINE 0 0 0 >>> gpt/v1-e4b ONLINE 0 0 0 >>> gpt/v1-e5a ONLINE 0 0 0 >>> gpt/v1-e6a ONLINE 0 0 0 >>> gpt/v1-e7c ONLINE 0 0 0 >>> logs >>> gpt/vr1logONLINE 0 0 0 >>> cache >>> gpt/vr1cache ONLINE 0 0 0 >>> >>> errors: No known data errors >>> >>> ... it doesn't say it now, but there were 5 CKSUM errors on one of the >>> drives that I had trial-removed (and not on the one replaced). >>> ___ >> That is EXACTLY what I'm seeing; the "OFFLINE'd" drive is the one that, >> after a scrub, comes up with the checksum errors. It does *not* fla
Re: Concern: ZFS Mirror issues (12.STABLE and firmware 19 .v. 20)
On 4/11/2019 13:57, Karl Denninger wrote: > On 4/11/2019 13:52, Zaphod Beeblebrox wrote: >> On Wed, Apr 10, 2019 at 10:41 AM Karl Denninger wrote: >> >> >>> In this specific case the adapter in question is... >>> >>> mps0: port 0xc000-0xc0ff mem >>> 0xfbb3c000-0xfbb3,0xfbb4-0xfbb7 irq 30 at device 0.0 on pci3 >>> mps0: Firmware: 20.00.07.00, Driver: 21.02.00.00-fbsd >>> mps0: IOCCapabilities: >>> 1285c >>> >>> Which is indeed a "dumb" HBA (in IT mode), and Zeephod says he connects >>> his drives via dumb on-MoBo direct SATA connections. >>> >> Maybe I'm in good company. My current setup has 8 of the disks connected >> to: >> >> mps0: port 0xb000-0xb0ff mem >> 0xfe24-0xfe24,0xfe20-0xfe23 irq 32 at device 0.0 on pci6 >> mps0: Firmware: 19.00.00.00, Driver: 21.02.00.00-fbsd >> mps0: IOCCapabilities: >> 5a85c >> >> ... just with a cable that breaks out each of the 2 connectors into 4 >> SATA-style connectors, and the other 8 disks (plus boot disks and SSD >> cache/log) connected to ports on... >> >> - ahci0: port >> 0xd050-0xd057,0xd040-0xd043,0xd030-0xd037,0xd020-0xd023,0xd000-0xd01f mem >> 0xfe90-0xfe9001ff irq 44 at device 0.0 on pci2 >> - ahci2: port >> 0xa050-0xa057,0xa040-0xa043,0xa030-0xa037,0xa020-0xa023,0xa000-0xa01f mem >> 0xfe61-0xfe6107ff irq 40 at device 0.0 on pci7 >> - ahci3: port >> 0xf040-0xf047,0xf030-0xf033,0xf020-0xf027,0xf010-0xf013,0xf000-0xf00f mem >> 0xfea07000-0xfea073ff irq 19 at device 17.0 on pci0 >> >> ... each drive connected to a single port. >> >> I can actually reproduce this at will. Because I have 16 drives, when one >> fails, I need to find it. I pull the sata cable for a drive, determine if >> it's the drive in question, if not, reconnect, "ONLINE" it and wait for >> resilver to stop... usually only a minute or two. >> >> ... if I do this 4 to 6 odd times to find a drive (I can tell, in general, >> that a drive is part of the SAS controller or the SATA controllers... so >> I'm only looking among 8, ever) ... then I "REPLACE" the problem drive. >> More often than not, the a scrub will find a few problems. In fact, it >> appears that the most recent scrub is an example: >> >> [1:7:306]dgilbert@vr:~> zpool status >> pool: vr1 >> state: ONLINE >> scan: scrub repaired 32K in 47h16m with 0 errors on Mon Apr 1 23:12:03 >> 2019 >> config: >> >> NAMESTATE READ WRITE CKSUM >> vr1 ONLINE 0 0 0 >> raidz2-0 ONLINE 0 0 0 >> gpt/v1-d0 ONLINE 0 0 0 >> gpt/v1-d1 ONLINE 0 0 0 >> gpt/v1-d2 ONLINE 0 0 0 >> gpt/v1-d3 ONLINE 0 0 0 >> gpt/v1-d4 ONLINE 0 0 0 >> gpt/v1-d5 ONLINE 0 0 0 >> gpt/v1-d6 ONLINE 0 0 0 >> gpt/v1-d7 ONLINE 0 0 0 >> raidz2-2 ONLINE 0 0 0 >> gpt/v1-e0c ONLINE 0 0 0 >> gpt/v1-e1b ONLINE 0 0 0 >> gpt/v1-e2b ONLINE 0 0 0 >> gpt/v1-e3b ONLINE 0 0 0 >> gpt/v1-e4b ONLINE 0 0 0 >> gpt/v1-e5a ONLINE 0 0 0 >> gpt/v1-e6a ONLINE 0 0 0 >> gpt/v1-e7c ONLINE 0 0 0 >> logs >> gpt/vr1logONLINE 0 0 0 >> cache >> gpt/vr1cache ONLINE 0 0 0 >> >> errors: No known data errors >> >> ... it doesn't say it now, but there were 5 CKSUM errors on one of the >> drives that I had trial-removed (and not on the one replaced). >> ___ > That is EXACTLY what I'm seeing; the "OFFLINE'd" drive is the one that, > after a scrub, comes up with the checksum errors. It does *not* flag > any errors during the resilver and the drives *not* taken offline do not > (ever) show checksum errors either. > > Interestingly enough you have 19.00.00.00 firmware on your card as well > -- which is what was on mine. 
> > I have flashed my card forward to 20.00.07.00 -- we'll see if it still > does it when I do the next swap of the backup set. Verry interes
Re: Concern: ZFS Mirror issues (12.STABLE and firmware 19 .v. 20)
On 4/11/2019 13:52, Zaphod Beeblebrox wrote: > On Wed, Apr 10, 2019 at 10:41 AM Karl Denninger wrote: > > >> In this specific case the adapter in question is... >> >> mps0: port 0xc000-0xc0ff mem >> 0xfbb3c000-0xfbb3,0xfbb4-0xfbb7 irq 30 at device 0.0 on pci3 >> mps0: Firmware: 20.00.07.00, Driver: 21.02.00.00-fbsd >> mps0: IOCCapabilities: >> 1285c >> >> Which is indeed a "dumb" HBA (in IT mode), and Zeephod says he connects >> his drives via dumb on-MoBo direct SATA connections. >> > Maybe I'm in good company. My current setup has 8 of the disks connected > to: > > mps0: port 0xb000-0xb0ff mem > 0xfe24-0xfe24,0xfe20-0xfe23 irq 32 at device 0.0 on pci6 > mps0: Firmware: 19.00.00.00, Driver: 21.02.00.00-fbsd > mps0: IOCCapabilities: > 5a85c > > ... just with a cable that breaks out each of the 2 connectors into 4 > SATA-style connectors, and the other 8 disks (plus boot disks and SSD > cache/log) connected to ports on... > > - ahci0: port > 0xd050-0xd057,0xd040-0xd043,0xd030-0xd037,0xd020-0xd023,0xd000-0xd01f mem > 0xfe90-0xfe9001ff irq 44 at device 0.0 on pci2 > - ahci2: port > 0xa050-0xa057,0xa040-0xa043,0xa030-0xa037,0xa020-0xa023,0xa000-0xa01f mem > 0xfe61-0xfe6107ff irq 40 at device 0.0 on pci7 > - ahci3: port > 0xf040-0xf047,0xf030-0xf033,0xf020-0xf027,0xf010-0xf013,0xf000-0xf00f mem > 0xfea07000-0xfea073ff irq 19 at device 17.0 on pci0 > > ... each drive connected to a single port. > > I can actually reproduce this at will. Because I have 16 drives, when one > fails, I need to find it. I pull the sata cable for a drive, determine if > it's the drive in question, if not, reconnect, "ONLINE" it and wait for > resilver to stop... usually only a minute or two. > > ... if I do this 4 to 6 odd times to find a drive (I can tell, in general, > that a drive is part of the SAS controller or the SATA controllers... so > I'm only looking among 8, ever) ... then I "REPLACE" the problem drive. > More often than not, the a scrub will find a few problems. In fact, it > appears that the most recent scrub is an example: > > [1:7:306]dgilbert@vr:~> zpool status > pool: vr1 > state: ONLINE > scan: scrub repaired 32K in 47h16m with 0 errors on Mon Apr 1 23:12:03 > 2019 > config: > > NAMESTATE READ WRITE CKSUM > vr1 ONLINE 0 0 0 > raidz2-0 ONLINE 0 0 0 > gpt/v1-d0 ONLINE 0 0 0 > gpt/v1-d1 ONLINE 0 0 0 > gpt/v1-d2 ONLINE 0 0 0 > gpt/v1-d3 ONLINE 0 0 0 > gpt/v1-d4 ONLINE 0 0 0 > gpt/v1-d5 ONLINE 0 0 0 > gpt/v1-d6 ONLINE 0 0 0 > gpt/v1-d7 ONLINE 0 0 0 > raidz2-2 ONLINE 0 0 0 > gpt/v1-e0c ONLINE 0 0 0 > gpt/v1-e1b ONLINE 0 0 0 > gpt/v1-e2b ONLINE 0 0 0 > gpt/v1-e3b ONLINE 0 0 0 > gpt/v1-e4b ONLINE 0 0 0 > gpt/v1-e5a ONLINE 0 0 0 > gpt/v1-e6a ONLINE 0 0 0 > gpt/v1-e7c ONLINE 0 0 0 > logs > gpt/vr1logONLINE 0 0 0 > cache > gpt/vr1cache ONLINE 0 0 0 > > errors: No known data errors > > ... it doesn't say it now, but there were 5 CKSUM errors on one of the > drives that I had trial-removed (and not on the one replaced). > ___ That is EXACTLY what I'm seeing; the "OFFLINE'd" drive is the one that, after a scrub, comes up with the checksum errors. It does *not* flag any errors during the resilver and the drives *not* taken offline do not (ever) show checksum errors either. Interestingly enough you have 19.00.00.00 firmware on your card as well -- which is what was on mine. I have flashed my card forward to 20.00.07.00 -- we'll see if it still does it when I do the next swap of the backup set. 
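(Checking what a card is actually running takes seconds, by the way -- the probe line in dmesg has it:

    dmesg | grep 'mps0: Firmware'
    mps0: Firmware: 19.00.00.00, Driver: 21.02.00.00-fbsd

so anyone wondering whether they are exposed can look before their next disk swap.)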
-- Karl Denninger k...@denninger.net <mailto:k...@denninger.net> /The Market Ticker/ /[S/MIME encrypted email preferred]/ smime.p7s Description: S/MIME Cryptographic Signature
Re: Concern: ZFS Mirror issues (12.STABLE and firmware 19 .v. 20)
On 4/10/2019 08:45, Andriy Gapon wrote: > On 10/04/2019 04:09, Karl Denninger wrote: >> Specifically, I *explicitly* OFFLINE the disk in question, which is a >> controlled operation and *should* result in a cache flush out of the ZFS >> code into the drive before it is OFFLINE'd. >> >> This should result in the "last written" TXG that the remaining online >> members have, and the one in the offline member, being consistent. >> >> Then I "camcontrol standby" the involved drive, which forces a writeback >> cache flush and a spindown; in other words, re-ordered or not, the >> on-platter data *should* be consistent with what the system thinks >> happened before I yank the physical device. > This may not be enough for a specific [RAID] controller and a specific > configuration. It should be enough for a dumb HBA. But, for example, > mrsas(9) > can simply ignore the synchronize cache command (meaning neither the on-board > cache is flushed nor the command is propagated to a disk). So, if you use > some > advanced controller it would make sense to use its own management tool to > offline a disk before pulling it. > > I do not preclude a possibility of an issue in ZFS. But it's not the only > possibility either. In this specific case the adapter in question is... mps0: port 0xc000-0xc0ff mem 0xfbb3c000-0xfbb3,0xfbb4-0xfbb7 irq 30 at device 0.0 on pci3 mps0: Firmware: 20.00.07.00, Driver: 21.02.00.00-fbsd mps0: IOCCapabilities: 1285c Which is indeed a "dumb" HBA (in IT mode), and Zeephod says he connects his drives via dumb on-MoBo direct SATA connections. What I don't know (yet) is if the update to firmware 20.00.07.00 in the HBA has fixed it. The 11.2 and 12.0 revs of FreeBSD through some mechanism changed timing quite materially in the mps driver; prior to 11.2 I ran with a Lenovo SAS expander connected to SATA disks without any problems at all, even across actual disk failures through the years, but in 11.2 and 12.0 doing this resulted in spurious retries out of the CAM layer that allegedly came from timeouts on individual units (which looked very much like a lost command sent to the disk), but only on mirrored volume sets -- yet there were no errors reported by the drive itself, nor did either of my RaidZ2 pools (one spinning rust, one SSD) experience problems of any sort. Flashing the HBA forward to 20.00.07.00 with the expander in resulted in the *driver* (mps) taking disconnects and resets instead of the targets, which in turn caused random drive fault events across all of the pools. For obvious reasons that got backed out *fast*. Without the expander 19.00.00.00 has been stable over the last few months *except* for this circumstance, where an intentionally OFFLINE'd disk in a mirror that is brought back online after some reasonably long period of time (days to a week) results in a successful resilver but then a small number of checksum errors on that drive -- always on the one that was OFFLINEd, never on the one(s) not taken OFFLINE -- appear and are corrected when a scrub is subsequently performed. I am now on 20.00.07.00 and so far -- no problems. But I've yet to do the backup disk swap on 20.00.07.00 (scheduled for late week or Monday) so I do not know if the 20.00.07.00 roll-forward addresses the scrub issue or not. I have no reason to believe it is involved, but given the previous "iffy" nature of 11.2 and 12.0 on 19.0 with the expander it very well might be due to what appear to be timing changes in the driver architecture. 
-- Karl Denninger k...@denninger.net <mailto:k...@denninger.net> /The Market Ticker/ /[S/MIME encrypted email preferred]/ smime.p7s Description: S/MIME Cryptographic Signature
Re: Concern: ZFS Mirror issues (12.STABLE and firmware 19 .v. 20)
On 4/9/2019 16:27, Zaphod Beeblebrox wrote: > I have a "Ghetto" home RAID array. It's built on compromises and makes use > of RAID-Z2 to survive. It consists of two plexes of 8x 4T units of > "spinning rust". It's been upgraded and upgraded. It started as 8x 2T, > then 8x 2T + 8x 4T then the current 16x 4T. The first 8 disks are > connected to motherboard SATA. IIRC, there are 10. Two ports are used for > a mirror that it boots from. There's also an SSD in there somhow, so it > might be 12 ports on the motherboard. > > The other 8 disks started life in eSATA port multiplier boxes. That was > doubleplusungood, so I got a RAID card based on LSI pulled from a fujitsu > server in Japan. That's been upgraded a couple of times... not always a > good experience. One problem is that cheap or refurbished drives don't > always "like" SAS controllers and FreeBSD. YMMV. > > Anyways, this is all to introduce the fact that I've seen this behaviour > multiple times. You have a drive that leaves the array for some amount of > time, and after resilvering, a scrub will find a small amount of bad data. > 32 k or 40k or somesuch. In my cranial schema of things, I've chalked it > up to out-of-order writing of the drives ... or other such behavior s.t. > ZFS doesn't know exactly what has been written. I've often wondered if the > fix would be to add an amount of fuzz to the transaction range that is > resilvered. > > > On Tue, Apr 9, 2019 at 4:32 PM Karl Denninger wrote: > >> On 4/9/2019 15:04, Andriy Gapon wrote: >>> On 09/04/2019 22:01, Karl Denninger wrote: >>>> the resilver JUST COMPLETED with no errors which means the ENTIRE DISK'S >>>> IN USE AREA was examined, compared, and blocks not on the "new member" >>>> or changed copied over. >>> I think that that's not entirely correct. >>> ZFS maintains something called DTL, a dirty-time log, for a missing / >> offlined / >>> removed device. When the device re-appears and gets resilvered, ZFS >> walks only >>> those blocks that were born within the TXG range(s) when the device was >> missing. >>> In any case, I do not have an explanation for what you are seeing. >> That implies something much more-serious could be wrong such as given >> enough time -- a week, say -- that the DTL marker is incorrect and some >> TXGs that were in fact changed since the OFFLINE are not walked through >> and synchronized. That would explain why it gets caught by a scrub -- >> the resilver is in fact not actually copying all the blocks that got >> changed and so when you scrub the blocks are not identical. Assuming >> the detached disk is consistent that's not catastrophically bad IF >> CAUGHT; where you'd get screwed HARD is in the situation where (for >> example) you had a 2-unit mirror, detached one, re-attached it, resilver >> says all is well, there is no scrub performed and then the >> *non-detached* disk fails before there is a scrub. In that case you >> will have permanently destroyed or corrupted data since the other disk >> is allegedly consistent but there are blocks *missing* that were never >> copied over. >> >> Again this just showed up on 12.x; it definitely was *not* at issue in >> 11.1 at all. I never ran 11.2 in production for a material amount of >> time (I went from 11.1 to 12.0 STABLE after the IPv6 fixes were posted >> to 12.x) so I don't know if it is in play on 11.2 or not. >> >> I'll see if it shows up again with 20.00.07.00 card firmware. 
>> >> Of note I cannot reproduce this on my test box with EITHER 19.00.00.00 >> or 20.00.07.00 firmware when I set up a 3-unit mirror, offline one, make >> a crap-ton of changes, offline the second and reattach the third (in >> effect mirroring the "take one to the vault" thing) with a couple of >> hours elapsed time and a synthetic (e.g. "dd if=/dev/random of=outfile >> bs=1m" sort of thing) "make me some new data that has to be resilvered" >> workload. I don't know if that's because I need more entropy in the >> filesystem than I can reasonably generate this way (e.g. more >> fragmentation of files, etc) or whether it's a time-based issue (e.g. >> something's wrong with the DTL/TXG thing as you note above in terms of >> how it functions and it only happens if the time elapsed causes >> something to be subject to a rollover or similar problem.) >> >> I spent quite a lot of time trying to make reproduce the issue on my >> "sandbox" machine
Re: Concern: ZFS Mirror issues (12.STABLE and firmware 19 .v. 20)
On 4/9/2019 15:04, Andriy Gapon wrote: > On 09/04/2019 22:01, Karl Denninger wrote: >> the resilver JUST COMPLETED with no errors which means the ENTIRE DISK'S >> IN USE AREA was examined, compared, and blocks not on the "new member" >> or changed copied over. > I think that that's not entirely correct. > ZFS maintains something called DTL, a dirty-time log, for a missing / > offlined / > removed device. When the device re-appears and gets resilvered, ZFS walks > only > those blocks that were born within the TXG range(s) when the device was > missing. > > In any case, I do not have an explanation for what you are seeing. That implies something much more-serious could be wrong such as given enough time -- a week, say -- that the DTL marker is incorrect and some TXGs that were in fact changed since the OFFLINE are not walked through and synchronized. That would explain why it gets caught by a scrub -- the resilver is in fact not actually copying all the blocks that got changed and so when you scrub the blocks are not identical. Assuming the detached disk is consistent that's not catastrophically bad IF CAUGHT; where you'd get screwed HARD is in the situation where (for example) you had a 2-unit mirror, detached one, re-attached it, resilver says all is well, there is no scrub performed and then the *non-detached* disk fails before there is a scrub. In that case you will have permanently destroyed or corrupted data since the other disk is allegedly consistent but there are blocks *missing* that were never copied over. Again this just showed up on 12.x; it definitely was *not* at issue in 11.1 at all. I never ran 11.2 in production for a material amount of time (I went from 11.1 to 12.0 STABLE after the IPv6 fixes were posted to 12.x) so I don't know if it is in play on 11.2 or not. I'll see if it shows up again with 20.00.07.00 card firmware. Of note I cannot reproduce this on my test box with EITHER 19.00.00.00 or 20.00.07.00 firmware when I set up a 3-unit mirror, offline one, make a crap-ton of changes, offline the second and reattach the third (in effect mirroring the "take one to the vault" thing) with a couple of hours elapsed time and a synthetic (e.g. "dd if=/dev/random of=outfile bs=1m" sort of thing) "make me some new data that has to be resilvered" workload. I don't know if that's because I need more entropy in the filesystem than I can reasonably generate this way (e.g. more fragmentation of files, etc) or whether it's a time-based issue (e.g. something's wrong with the DTL/TXG thing as you note above in terms of how it functions and it only happens if the time elapsed causes something to be subject to a rollover or similar problem.) I spent quite a lot of time trying to make reproduce the issue on my "sandbox" machine and was unable -- and of note it is never a large quantity of data that is impacted, it's usually only a couple of dozen checksums that show as bad and fixed. Of note it's also never just one; if there was a single random hit on a data block due to ordinary bitrot sort of issues I'd expect only one checksum to be bad. But generating a realistic synthetic workload over the amount of time involved on a sandbox is not trivial at all; the system on which this is now happening handles a lot of email and routine processing of various sorts including a fair bit of database activity associated with network monitoring and statistical analysis. 
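For completeness, the general shape of the synthetic attempt was roughly this (shown with throwaway file-backed vdevs purely for illustration; the sandbox itself used whole disks under geli):

    truncate -s 4g /tmp/d0 /tmp/d1 /tmp/d2
    mdconfig -a -t vnode -f /tmp/d0    # md0
    mdconfig -a -t vnode -f /tmp/d1    # md1
    mdconfig -a -t vnode -f /tmp/d2    # md2
    zpool create test mirror md0 md1 md2
    zpool offline test md2                              # "send it to the vault"
    dd if=/dev/random of=/test/junk bs=1m count=1024    # churn new data while it is out
    zpool online test md2                               # resilver
    zpool scrub test                                    # then check 'zpool status -v test' for CKSUM errors

It never tripped the problem, which is part of why I suspect either entropy or elapsed time matters.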
I'm assuming that using "offline" as a means to do this hasn't become "invalid" as something that's considered "ok" as a means of doing this sort of thing it certainly has worked perfectly well for a very long time! -- Karl Denninger k...@denninger.net <mailto:k...@denninger.net> /The Market Ticker/ /[S/MIME encrypted email preferred]/ smime.p7s Description: S/MIME Cryptographic Signature
Concern: ZFS Mirror issues (12.STABLE and firmware 19 .v. 20)
.1 and 11.2/12.0, as I discovered when the SAS expander I used to have in these boxes started returning timeout errors that were false. Again -- this same configuration was completely stable under 11.1 and previous over a period of years. With 20.00.07.00 I have yet to have this situation recur -- so far -- but I have limited time with 20.00.07.00 and as such my confidence that the issue is in fact resolved by the card firmware change is only modest at this point. Over the next month or so, if it doesn't happen again, my confidence will of course improve. Checksum errors on ZFS volumes are extraordinarily uncool for the obvious reason -- they imply the disk thinks the data is fine (since it is not recording any errors on the interface or at the drive level) BUT ZFS thinks the data off that particular record was corrupt as the checksum fails. Silent corruption is the worst sort in that it can hide for months or even years before being discovered, long after your redundant copies have been re-used or overwritten. Assuming I do not see a recurrence with the 20.00.07.00 firmware I would suggest that UPDATING, the Release Notes or Errata have an entry added that for 12.x forward card firmware revisions prior to 20.00.07.00 carry *strong* cautions and that those with these HBAs be strongly urged to flash the card forward to 20.00.07.00 before upgrading or installing. If you get a surprise of this sort and have no second copy that is not impacted you could find yourself severely hosed. -- Karl Denninger k...@denninger.net <mailto:k...@denninger.net> /The Market Ticker/ /[S/MIME encrypted email preferred]/ smime.p7s Description: S/MIME Cryptographic Signature
Re: Observations from a ZFS reorganization on 12-STABLE
On 3/18/2019 08:37, Walter Cramer wrote: > I suggest caution in raising vm.v_free_min, at least on 11.2-RELEASE > systems with less RAM. I tried "65536" (256MB) on a 4GB mini-server, > with vfs.zfs.arc_max of 2.5GB. Bad things happened when the cron > daemon merely tried to run `periodic daily`. > > A few more details - ARC was mostly full, and "bad things" was 1: > `pagedaemon` seemed to be thrashing memory - using 100% of CPU, with > little disk activity, and 2: many normal processes seemed unable to > run. The latter is probably explained by `man 3 sysctl` (see entry for > "VM_V_FREE_MIN"). > > > On Mon, 18 Mar 2019, Pete French wrote: > >> On 17/03/2019 21:57, Eugene Grosbein wrote: >>> I agree. Recently I've found kind-of-workaround for this problem: >>> increase vm.v_free_min so when "FREE" memory goes low, >>> page daemon wakes earlier and shrinks UMA (and ZFS ARC too) moving >>> some memory >>> from WIRED to FREE quick enough so it can be re-used before bad >>> things happen. >>> >>> But avoid increasing vm.v_free_min too much (e.g. over 1/4 of total >>> RAM) >>> because kernel may start behaving strange. For 16Gb system it should >>> be enough >>> to raise vm.v_free_min upto 262144 (1GB) or 131072 (512M). >>> >>> This is not permanent solution in any way but it really helps. >> >> Ah, thats very interesting, thankyou for that! I;ve been bitten by >> this issue too in the past, and it is (as mentioned) much improved on >> 12, but the act it could still cause issues worries me. >> Raising free_target should *not* result in that sort of thrashing. However, that's not really a fix standing alone either since the underlying problem is not being addressed by either change. It is especially dangerous to raise the pager wakeup thresholds if you still run into UMA allocated-but-not-in-use not being cleared out issues as there's a risk of severe pathological behavior arising that's worse than the original problem. 11.1 and before (I didn't have enough operational experience with 11.2 to know, as I went to 12.x from mostly-11.1 installs around here) were essentially unusable in my workload without either my patch set or the Phabricator one. This is *very* workload-specific however, or nobody would use ZFS on earlier releases, and many do without significant problems. -- Karl Denninger k...@denninger.net <mailto:k...@denninger.net> /The Market Ticker/ /[S/MIME encrypted email preferred]/ smime.p7s Description: S/MIME Cryptographic Signature
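For anyone who wants to experiment with Eugene's suggestion, the knobs are ordinary sysctls and the values are in 4 KB pages; a sketch (the numbers are examples from this thread, not recommendations):

    sysctl vm.v_free_min vm.v_free_target vfs.zfs.arc_max    # current values
    sysctl vm.v_free_target=262144                           # 262144 pages = 1 GB
    # persist across reboots via /etc/sysctl.conf:
    # vm.v_free_target=262144

Watch the pager statistics afterward; as noted above, pushing these too far can do more harm than good.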
Re: Observations from a ZFS reorganization on 12-STABLE
On 3/18/2019 08:07, Pete French wrote: > > > On 17/03/2019 21:57, Eugene Grosbein wrote: >> I agree. Recently I've found kind-of-workaround for this problem: >> increase vm.v_free_min so when "FREE" memory goes low, >> page daemon wakes earlier and shrinks UMA (and ZFS ARC too) moving >> some memory >> from WIRED to FREE quick enough so it can be re-used before bad >> things happen. >> >> But avoid increasing vm.v_free_min too much (e.g. over 1/4 of total RAM) >> because kernel may start behaving strange. For 16Gb system it should >> be enough >> to raise vm.v_free_min upto 262144 (1GB) or 131072 (512M). >> >> This is not permanent solution in any way but it really helps. > > Ah, thats very interesting, thankyou for that! I;ve been bitten by > this issue too in the past, and it is (as mentioned) much improved on > 12, but the act it could still cause issues worries me. > > The code patch I developed originally essentially sought to have the ARC code pare back before the pager started evicting working set. A second crack went after clearing allocated-but-not-in-use UMA. v_free_min may not be the right place to do this -- see if bumping up vm.v_free_target also works. I'll stick this on my "to do" list, but it's much less critical in my applications than it was with 10.x and 11.x, both of which suffered from it much more-severely to the point that I was getting "stalls" that in some cases went on for 10 or more seconds due to things like your shell being evicted to swap to make room for arc, which is flat-out nuts. That, at least, doesn't appear to be a problem with 12. -- Karl Denninger k...@denninger.net <mailto:k...@denninger.net> /The Market Ticker/ /[S/MIME encrypted email preferred]/ smime.p7s Description: S/MIME Cryptographic Signature
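The other blunt instrument, if ARC keeps winning the fight for RAM, is simply to cap it; a sketch (the 8 GB figure is an arbitrary example for a 16 GB box):

    # /boot/loader.conf
    vfs.zfs.arc_max="8589934592"    # 8 GB, in bytes; takes effect at the next boot

That papers over the interaction rather than fixing it, which is why the patch set went after the eviction logic itself.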
Observations from a ZFS reorganization on 12-STABLE
I've long argued that the VM system's interaction with ZFS's ARC cache and UMA has serious, even severe issues. 12.x appeared to have addressed some of them, and as such I've yet to roll forward any part of the patch series that is found here [ https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=187594 ] or the Phabricator version referenced in the bug thread (which is more complex and attempts to dig at the root of the issue more effectively, particularly when UMA is involved, as it usually is.) Yesterday I decided to perform a fairly significant reorganization of the ZFS pools on one of my personal machines, including the root pool, which was on mirrored SSDs, changing to a raidz2 (also on SSDs.) This of course required booting single-user from a 12-STABLE memstick. A simple "zfs send -R zs/root-save/R | zfs recv -Fuev zsr/R" should have done it, no sweat. The root that was copied over before I started is uncomplicated; it's compressed, but not de-duped. While it has snapshots on it too, it's by no means complex. *The system failed to execute that command with an "out of swap space" error, killing the job; there was indeed no swap configured since I booted from a memstick.* Huh? A simple *filesystem copy* managed to force a 16GB system into requiring page-file backing store? I was able to complete the copy by temporarily adding the swap space back on (where it would be when the move was complete), but that requirement is pure insanity, and it appears, from what I was able to determine, that it came about from the same root cause that's been plaguing VM/ZFS interaction since 2014 when I started working on this issue -- specifically, when RAM gets low, rather than evict ARC (or clean up UMA that is allocated but unused) the system will attempt to page out working set. In this case, since there was nowhere to page the working set out to, the process involved got an OOM error and was terminated. *I continue to argue that this decision is ALWAYS wrong.* It's wrong because if you invalidate cache and reclaim it you *might* have to perform a physical read later to bring that data back into the cache (since it's no longer in RAM), but in exchange for that *potential* I/O you perform a GUARANTEED physical I/O (to page out some amount of working set) and possibly TWO physical I/Os (to page said working set out and, later, page it back in.) It has always appeared to me to be flat-out nonsensical to trade a possible physical I/O (if there is a future cache miss) for a guaranteed physical I/O and a possible second one. It's even worse if the reason you make that decision is that UMA is allocated but unused; in that case you are paging when no physical I/O is required at all, as the "memory pressure" is a phantom! While UMA is a very material performance win in the general case, allowing allocated-but-unused UMA to force paging appears, from a performance perspective, to be flat-out insanity. I find it very difficult to come up with any reasonable scenario where releasing allocated-but-unused UMA rather than paging out working set is a net performance loser. In this case, since the system was running in single-user mode with no swap available, the process that got selected for destruction when that circumstance arose was the copy process itself. The copy itself did not require anywhere near all of the available non-kernel RAM.
I'm going to dig into this further but IMHO the base issue still exists, even though the impact of it for my workloads with everything "running normally" has materially decreased with 12.x. -- Karl Denninger k...@denninger.net <mailto:k...@denninger.net> /The Market Ticker/ /[S/MIME encrypted email preferred]/ smime.p7s Description: S/MIME Cryptographic Signature
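(The temporary-swap step referred to above is nothing more exotic than the following from the single-user shell; the partition name is an example and should be whatever the machine's normal swap device is:)

  # attach the machine's usual swap partition while booted from the memstick
  swapon /dev/da1p3
  # ... run the zfs send | zfs recv ...
  swapoff /dev/da1p3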
Coffee Lake Xeons...
Anyone used them yet with FreeBSD? The server boards available with IPMI/iKVM are sparse thus far -- in fact, I've only found one that looks like it might fit my requirements (specifically I need iKVM, a legacy serial port and enough PCIe to handle both an LSI SAS/SATA card *and* forward expansion to 10Gb networking as required.) I'm specifically considering this board https://www.asrockrack.com/general/productdetail.asp?Model=E3C246D4U#Specifications ... with one of the E-2100 "G" series chips to replace an aging (but still fully-functional) Westmere Xeon board. The goal is to gain CPU, memory and I/O bandwidth (collectively "performance") and keep forward optionality for network performance improvements while materially reducing power consumption. -- Karl Denninger k...@denninger.net <mailto:k...@denninger.net> /The Market Ticker/ /[S/MIME encrypted email preferred]/ smime.p7s Description: S/MIME Cryptographic Signature
Re: Binary update to -STABLE? And if so, what do I get?
On 2/13/2019 07:49, Kurt Jaeger wrote: > Hi! > >> I know (and have done) binary updates between -RELEASE versions > [...] >> How do I do this, say, coming from 11.2 and wanting to target 12 post >> the IPv6 fix MFC? > You can't. Either wait until a 12.0 with the fix included or > 12.1 is released, or you fetch the source with the fix included, > and build from source. Got it -- thanks. Wait it shall be. -- Karl Denninger k...@denninger.net <mailto:k...@denninger.net> /The Market Ticker/ /[S/MIME encrypted email preferred]/ smime.p7s Description: S/MIME Cryptographic Signature
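(For reference, the "build from source" route mentioned above is the standard cycle; a rough sketch assuming the stable/12 branch -- git today, svn at the time this thread was written:)

  git clone -b stable/12 https://git.freebsd.org/src.git /usr/src
  cd /usr/src
  make -j8 buildworld buildkernel
  make installkernel
  shutdown -r now
  # after rebooting onto the new kernel:
  cd /usr/src
  make installworld
  mergemaster -Ui        # or etcupdate
  shutdown -r now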
Binary update to -STABLE? And if so, what do I get?
I know (and have done) binary updates between -RELEASE versions But 12 has a problem with -RELEASE and IPv6, which was recently fixed and MFC'd. So now I have an interesting situation in that I have two machines in the field running 11.2 that do things for me at one of the "shared colo" joints, and I would like to roll them forward -- but they have to roll forward to a reasonably-recent -STABLE. How do I do this, say, coming from 11.2 and wanting to target 12 post the IPv6 fix MFC? (e.g. how do I specify the target, since it wouldn't be "12-RELEASE"?) I'm assuming (perhaps incorrectly) that "12-STABLE" is not the correct means to do so. Or is it, since at first blush it doesn't blow up if I use that... but I'm hesitant to say "yeah, go at it." # freebsd-update -r 12-STABLE upgrade src component not installed, skipped Looking up update.FreeBSD.org mirrors... 3 mirrors found. Fetching metadata signature for 11.2-RELEASE from update2.freebsd.org... done. Fetching metadata index... done. Inspecting system... done. The following components of FreeBSD seem to be installed: kernel/generic world/base world/lib32 The following components of FreeBSD do not seem to be installed: kernel/generic-dbg world/base-dbg world/doc world/lib32-dbg Does this look reasonable (y/n)? -- Karl Denninger k...@denninger.net <mailto:k...@denninger.net> /The Market Ticker/ /[S/MIME encrypted email preferred]/ smime.p7s Description: S/MIME Cryptographic Signature
Re: 11.2-STABLE kernel wired memory leak
(Columns: total bytes, zone, bytes in use, bytes free; rows sorted by total. The first row was truncated in the original message.)

        TOTAL  ZONE                       IN USE           FREE
          ...  (row truncated)         2,932,160      2,884,928
    5,853,184  512                     1,529,344      4,323,840
    5,935,104  256_Bucket              5,836,800         98,304
    6,039,544  vmem_btag               4,980,528      1,059,016
    6,502,272  L_VFS_Cache             2,106,088      4,396,184
    7,471,104  zio_buf_393216                  0      7,471,104
    8,519,680  zio_buf_65536                   0      8,519,680
    8,988,000  32                      8,279,456        708,544
    9,175,040  zio_buf_458752                  0      9,175,040
    9,535,488  1,024                   9,174,016        361,472
   11,376,288  BUF_TRIE                        0     11,376,288
   11,640,832  zio_data_buf_57344        860,160     10,780,672
   11,657,216  mbuf_cluster           11,446,272        210,944
   11,796,480  zio_buf_655360                  0     11,796,480
   13,271,040  zio_data_buf_81920        737,280     12,533,760
   14,024,704  zio_data_buf_65536        917,504     13,107,200
   17,039,360  zio_buf_1310720                 0     17,039,360
   17,301,504  zio_buf_524288                  0     17,301,504
   18,087,936  zio_data_buf_98304      1,277,952     16,809,984
   18,153,456  zio_cache                 388,808     17,764,648
   24,144,120  MAP_ENTRY              16,120,200      8,023,920
   26,214,400  zio_buf_1048576         2,097,152     24,117,248
   29,379,240  range_seg_cache        21,302,856      8,076,384
   29,782,080  RADIX_NODE             17,761,104     12,020,976
   34,511,400  S_VFS_Cache            31,512,672      2,998,728
   38,535,168  zio_buf_786432                  0     38,535,168
   40,680,144  sa_cache               40,548,816        131,328
   41,517,056  zio_data_buf_114688     1,490,944     40,026,112
   42,205,184  zio_buf_917504                  0     42,205,184
   50,147,328  zio_data_buf_4096          98,304     50,049,024
   50,675,712  zio_data_buf_49152        983,040     49,692,672
   53,972,736  64                     29,877,888     24,094,848
   61,341,696  zio_buf_131072         42,205,184     19,136,512
   72,019,200  VM_OBJECT              71,597,056        422,144
   76,731,200  zfs_znode_cache        76,592,208        138,992
   88,972,800  256                    82,925,568      6,047,232
   90,390,528  4,096                  89,911,296        479,232
   94,036,000  UMA_Slabs              94,033,280          2,720
  135,456,000  VNODE                 135,273,600        182,400
  171,928,320  arc_buf_hdr_t_full    119,737,344     52,190,976
  221,970,432  zio_data_buf_8192     166,076,416     55,894,016
  277,923,840  dmu_buf_impl_t        130,788,240    147,135,600
  376,586,240  zio_buf_16384         372,719,616      3,866,624
  397,680,896  128                   195,944,448    201,736,448
  443,023,360  zio_data_buf_131072   158,072,832    284,950,528
  535,584,768  zio_buf_512           255,641,088    279,943,680
  713,552,840  dnode_t               373,756,656    339,796,184
3,849,744,384  abd_chunk           3,848,310,784      1,433,600
8,418,769,848  TOTAL               6,459,507,936  1,959,261,912

So far running 12-STABLE "neat" is behaving well for me -- Karl Denninger k...@denninger.net <mailto:k...@denninger.net> /The Market Ticker/ /[S/MIME encrypted email preferred]/ smime.p7s Description: S/MIME Cryptographic Signature
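(A per-zone byte summary like the one above can be produced from vmstat -z; a rough sketch -- not necessarily the exact script used here -- assuming the stock output where the comma-separated fields after the zone name are size, limit, used and free:)

  vmstat -z | awk -F: 'NF == 2 {
      split($2, f, ",")       # f[1]=size f[2]=limit f[3]=used f[4]=free
      printf "%15.0f %-22s %15.0f %15.0f\n", f[1]*(f[3]+f[4]), $1, f[1]*f[3], f[1]*f[4]
      used += f[1]*f[3]; free += f[1]*f[4]
  } END {
      printf "%15.0f %-22s %15.0f %15.0f\n", used+free, "TOTAL", used, free
  }' | sort -n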
Re: 11.2-STABLE kernel wired memory leak
On 2/12/2019 10:49, Eugene Grosbein wrote: > 12.02.2019 23:34, Mark Johnston wrote: > >> I suspect that the "leaked" memory is simply being used to cache UMA >> items. Note that the values in the FREE column of vmstat -z output are >> quite large. The cached items are reclaimed only when the page daemon >> wakes up to reclaim memory; if there are no memory shortages, large >> amounts of memory may accumulate in UMA caches. In this case, the sum >> of the product of columns 2 and 5 gives a total of roughly 4GB cached. > Forgot to note, that before I got system to single user mode, there was heavy > swap usage (over 3.5GB) > and heavy page-in/page-out, 10-20 megabytes per second and system was > crawling slow due to pageing. This is a manifestation of the general issue I've had an ongoing "discussion" running in a long-running thread on bugzilla and the interaction between UMA, ARC and the VM system. The short version is that the VM system does pathological things including paging out working set when there is a large amount of allocated-but-unused UMA and the means by which the ARC code is "told" that it needs to release RAM also interacts with the same mechanisms and exacerbates the problem. I've basically given up on getting anything effective to deal with this merged into the code and have my own private set of patches that I published for a while in that thread (and which had some collaborative development and testing) but have given up on seeing anything meaningful put into the base code. To the extent I need them in a given workload and environment I simply apply them on my own and go on my way. I don't have enough experience with 12 yet to know if the same approach will be necessary there (that is, what if any changes got into the 12.x code), and never ran 11.2 much, choosing to stay on 11.1 where said patches may not have been the most-elegant means of dealing with it but were successful. There was also a phabricator thread on this but I don't know what part of it, if any, got merged (it was more-elegant, in my opinion, than what I had coded up.) Under certain workloads here without the patches I was experiencing "freezes" due to unnecessary page-outs onto spinning rust that in some cases reached into double-digit *seconds.* With them the issue was entirely resolved. At the core of the issue is that "something" has to be taught that *before* the pager starts evicting working set to swap if you have large amounts of UMA allocated to ARC but not in use that RAM should be released, and beyond that if you have ARC allocated and in use but are approaching where VM is going to page working set out you need to come up with some meaningful way of deciding whether to release some of the ARC rather than take the page hit -- and in virtually every case the answer to that question is to release the RAM consumed by ARC. Part of the issue is that UMA can be allocated for other things besides ARC yet you really only want to release the ARC-related UMA that is allocated-but-unused in this instance. The logic is IMHO pretty simple on this -- a page-out of a process that will run again always requires TWO disk operations -- one to page it out right now and a second at a later time to page it back in. A released ARC cache *MAY* (if there would have been a cache hit in the future) require ONE disk operation (to retrieve it from disk.) 
Two is always greater than one, and one is never worse than "maybe one later." Therefore, choosing two *definite* disk I/Os (or one definite I/O now plus a possible one later) over a single *possible* disk I/O later is always a net loss, and thus IMHO substantial effort should be made to avoid doing that. -- Karl Denninger k...@denninger.net <mailto:k...@denninger.net> /The Market Ticker/ /[S/MIME encrypted email preferred]/ smime.p7s Description: S/MIME Cryptographic Signature
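(A blunt stop-gap many people use while this remains unresolved -- it is not the patch set discussed above -- is simply to cap the ARC so the pager never gets squeezed in the first place; the value is an example only and should leave headroom for the working set:)

  # /boot/loader.conf
  vfs.zfs.arc_max="8G"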
Re: Oddball error from "batch"
On 2/10/2019 16:01, Karl Denninger wrote: > Note -- working fine on 11.1 and 11.2, upgraded machine to 12.0-STABLE > and everything is ok that I'm aware of *except*. > > # batch > who > df > ^D > > Job 170 will be executed using /bin/sh > > Then the time comes and... no output is emailed to me. > > In the cron log file I find: > > Feb 10 16:00:00 NewFS atrun[65142]: cannot open input file > E000aa018a24c3: No such file or directory > > Note that scheduled cron jobs are running as expected, and the > permissions on /var/at are correct (match exactly my 11. 1 and 11.2 > boxes), and in addition of looking BEFORE the job runs the named job > number IS THERE. > > [\u@NewFS /var/at/jobs]# ls -al > total 13 > drwxr-xr-x 2 daemon wheel 5 Feb 10 15:55 . > drwxr-xr-x 4 root wheel 5 Oct 8 2013 .. > -rw-r--r-- 1 root wheel 6 Feb 10 15:55 .SEQ > -rw--- 1 root wheel 0 Jul 5 2008 .lockfile > -rwx-- 1 root wheel 615 Feb 10 15:55 E000aa018a24c3 > > After the error the file isn't there. It was removed (as one would > expect when the job is complete.) > > What the blankety-blank?! Turns out it's a nasty race in the atrun code I have no idea why this hasn't bit the living daylights out of lots of people before, but it's sure biting me! https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=235657 Includes a proposed fix... :) -- Karl Denninger k...@denninger.net <mailto:k...@denninger.net> /The Market Ticker/ /[S/MIME encrypted email preferred]/ smime.p7s Description: S/MIME Cryptographic Signature
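(For context on when these jobs fire: on a stock system, at/batch jobs are picked up by atrun out of the system crontab, so the failure above happens on one of these five-minute ticks:)

  # stock /etc/crontab entry
  */5     *       *       *       *       root    /usr/libexec/atrun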
Oddball error from "batch"
Note -- working fine on 11.1 and 11.2, upgraded machine to 12.0-STABLE and everything is OK that I'm aware of *except*. # batch who df ^D Job 170 will be executed using /bin/sh Then the time comes and... no output is emailed to me. In the cron log file I find: Feb 10 16:00:00 NewFS atrun[65142]: cannot open input file E000aa018a24c3: No such file or directory Note that scheduled cron jobs are running as expected, the permissions on /var/at are correct (they match my 11.1 and 11.2 boxes exactly), and in addition, looking BEFORE the job runs, the named job file IS THERE. [\u@NewFS /var/at/jobs]# ls -al total 13 drwxr-xr-x 2 daemon wheel 5 Feb 10 15:55 . drwxr-xr-x 4 root wheel 5 Oct 8 2013 .. -rw-r--r-- 1 root wheel 6 Feb 10 15:55 .SEQ -rw------- 1 root wheel 0 Jul 5 2008 .lockfile -rwx------ 1 root wheel 615 Feb 10 15:55 E000aa018a24c3 After the error the file isn't there. It was removed (as one would expect when the job is complete.) What the blankety-blank?! -- Karl Denninger k...@denninger.net <mailto:k...@denninger.net> /The Market Ticker/ /[S/MIME encrypted email preferred]/ smime.p7s Description: S/MIME Cryptographic Signature
Re: Geli prompts on gptzfsboot (Was:: Serious ZFS Bootcode Problem (GPT NON-UEFI -- RESOLVED)
On 2/10/2019 12:40, Ian Lepore wrote: > On Sun, 2019-02-10 at 12:35 -0600, Karl Denninger wrote: >> On 2/10/2019 12:01, Ian Lepore wrote: >>> On Sun, 2019-02-10 at 11:54 -0600, Karl Denninger wrote: >>>> On 2/10/2019 11:50, Ian Lepore wrote: >>>>> On Sun, 2019-02-10 at 11:37 -0600, Karl Denninger wrote: >>>>> >>>>>> [...] >>>>>> >>>>>> BTW am I correct that gptzfsboot did *not* get the ability to >>>>>> read >>>>>> geli-encrypted pools in 12.0? The UEFI loader does know how >>>>>> (which I'm >>>>>> using on my laptop) but I was under the impression that for >>>>>> non- >>>>>> UEFI >>>>>> systems you still needed the unencrypted boot partition from >>>>>> which to >>>>>> load the kernel. >>>>>> >>>>> Nope, that's not correct. GELI support was added to the boot >>>>> and >>>>> loader >>>>> programs for both ufs and zfs in freebsd 12. You must set the >>>>> geli >>>>> '-g' >>>>> option to be prompted for the passphrase while booting (this is >>>>> separate from the '-b' flag that enables mounting the encrypted >>>>> partition as the rootfs). You can use "geli configure -g" to >>>>> turn >>>>> on >>>>> the flag on any existing geli partition. >>>>> >>>>> -- Ian >>>> Excellent - this will eliminate the need for me to run down the >>>> foot-shooting that occurred in my update script since the >>>> unencrypted >>>> kernel partition is no longer needed at all. That also >>>> significantly >>>> reduces the attack surface on such a machine (although you could >>>> still >>>> tamper with the contents of freebsd-boot of course.) >>>> >>>> The "-g" flag I knew about from experience in putting 12 on my X1 >>>> Carbon >>>> (which works really well incidentally; the only issue I'm aware >>>> of is >>>> that there's no 5Ghz WiFi support.) >>>> >>> One thing that is rather unfortunate... if you have multiple geli >>> encrypted partitions that all have the same passphrase, you will be >>> required to enter that passphrase twice while booting -- once in >>> gpt[zfs]boot, then again during kernel startup when the rest of the >>> drives/partitions get tasted by geom. This is because APIs within >>> the >>> boot process got changed to pass keys instead of the passphrase >>> itself >>> from one stage of booting to the next, and the fallout of that is >>> the >>> key for the rootfs is available to the kernel for mountroot, but >>> the >>> passphrase is not available to the system when geom is probing all >>> the >>> devices, so you get prompted for it again. >>> >>> -- Ian >> Let me see if I understand this before I do it then... :-) >> >> I have the following layout: >> >> 1. Two SSDs that contain the OS as a two-provider ZFS pool, which has >> "-b" set on both members; I get the "GELI Passphrase:" prompt from >> the >> loader and those two providers (along with encrypted swap) attach >> early >> in the boot process. The same SSDs contain a mirrored non-encrypted >> pool that has /boot (and only /boot) on it because previously you >> couldn't boot from an EFI-encrypted pool at all. >> >> Thus: >> >> [\u@NewFS /root]# gpart show da1 >> => 34 468862061 da1 GPT (224G) >> 34 2014 - free - (1.0M) >> 2048 1024 1 freebsd-boot (512K) >> 3072 1024 - free - (512K) >> 4096 20971520 2 freebsd-zfs [bootme] (10G) >> 20975616 134217728 3 freebsd-swap (64G) >> 155193344 313667584 4 freebsd-zfs (150G) >> 468860928 1167 - free - (584K) >> >> There is of course a "da2" that is identical. The actual encrypted >> root >> pool is on partition 4 with "-b" set at present.
I get prompted from >> loader as a result after the unencrypted partition (#2) boots. >> >> 2. Multiple additional "user space" pools on a bunch of other disks. >> >> Right now #2 is using geli groups. Prior to 12.0 they were
Re: Geli prompts on gptzfsboot (Was:: Serious ZFS Bootcode Problem (GPT NON-UEFI -- RESOLVED)
On 2/10/2019 12:01, Ian Lepore wrote: > On Sun, 2019-02-10 at 11:54 -0600, Karl Denninger wrote: >> On 2/10/2019 11:50, Ian Lepore wrote: >>> On Sun, 2019-02-10 at 11:37 -0600, Karl Denninger wrote: >>> >>>> [...] >>>> >>>> BTW am I correct that gptzfsboot did *not* get the ability to >>>> read >>>> geli-encrypted pools in 12.0? The UEFI loader does know how >>>> (which I'm >>>> using on my laptop) but I was under the impression that for non- >>>> UEFI >>>> systems you still needed the unencrypted boot partition from >>>> which to >>>> load the kernel. >>>> >>> Nope, that's not correct. GELI support was added to the boot and >>> loader >>> programs for both ufs and zfs in freebsd 12. You must set the geli >>> '-g' >>> option to be prompted for the passphrase while booting (this is >>> separate from the '-b' flag that enables mounting the encrypted >>> partition as the rootfs). You can use "geli configure -g" to turn >>> on >>> the flag on any existing geli partition. >>> >>> -- Ian >> Excellent - this will eliminate the need for me to run down the >> foot-shooting that occurred in my update script since the unencrypted >> kernel partition is no longer needed at all. That also significantly >> reduces the attack surface on such a machine (although you could >> still >> tamper with the contents of freebsd-boot of course.) >> >> The "-g" flag I knew about from experience in putting 12 on my X1 >> Carbon >> (which works really well incidentally; the only issue I'm aware of is >> that there's no 5Ghz WiFi support.) >> > One thing that is rather unfortunate... if you have multiple geli > encrypted partitions that all have the same passphrase, you will be > required to enter that passphrase twice while booting -- once in > gpt[zfs]boot, then again during kernel startup when the rest of the > drives/partitions get tasted by geom. This is because APIs within the > boot process got changed to pass keys instead of the passphrase itself > from one stage of booting to the next, and the fallout of that is the > key for the rootfs is available to the kernel for mountroot, but the > passphrase is not available to the system when geom is probing all the > devices, so you get prompted for it again. > > -- Ian Let me see if I understand this before I do it then... :-) I have the following layout: 1. Two SSDs that contain the OS as a two-provider ZFS pool, which has "-b" set on both members; I get the "GELI Passphrase:" prompt from the loader and those two providers (along with encrypted swap) attach early in the boot process. The same SSDs contain a mirrored non-encrypted pool that has /boot (and only /boot) on it because previously you couldn't boot from an EFI-encrypted pool at all. Thus: [\u@NewFS /root]# gpart show da1 => 34 468862061 da1 GPT (224G) 34 2014 - free - (1.0M) 2048 1024 1 freebsd-boot (512K) 3072 1024 - free - (512K) 4096 20971520 2 freebsd-zfs [bootme] (10G) 20975616 134217728 3 freebsd-swap (64G) 155193344 313667584 4 freebsd-zfs (150G) 468860928 1167 - free - (584K) There is of course a "da2" that is identical. The actual encrypted root pool is on partition 4 with "-b" set at present. I get prompted from loader as a result after the unencrypted partition (#2) boots. 2. Multiple additional "user space" pools on a bunch of other disks. Right now #2 is using geli groups. 
Prior to 12.0 they were handled using a custom /etc/rc.d script I wrote that did basically the same thing that geli groups does, because they all use the same passphrase and entering the same thing over and over at boot was a pain in the butt. It prompted cleanly with no echo, took a password and then iterated over a list of devices, attaching them one at a time. That requirement is now gone with geli groups, which is nice since mergemaster always complained about it being a "non-standard" thing; it *had* to go in /etc/rc.d and not in /usr/local/etc/rc.d or else I couldn't get it to run early enough -- unfortunately. So if I remove the non-encrypted freebsd-zfs mirror that the system boots from in favor of setting "-g" on the root pool (both providers), gptzfsboot will find the encrypted pool and prompt for the passphrase before the loader gets invoked at all, much like the EFI loader does. That's good. (My assumption is that the "-g" is sufficient; I don't need (or want) "bootme" set -- correct?)
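(The two pieces described above look roughly like this; the root-pool providers match the layout quoted earlier, the group device list is a placeholder, and the rc.conf variable names should be double-checked against rc.conf(5) on your release:)

  # enable the boot-time passphrase prompt on both root-pool providers
  geli configure -g da1p4
  geli configure -g da2p4

  # /etc/rc.conf -- attach the "user space" providers as one group, one prompt
  geli_groups="storage"
  geli_storage_devices="da3 da4 da5"       # placeholder device list
  # geli_storage_flags="..."               # extra attach flags if needed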
Re: Fwd: Serious ZFS Bootcode Problem (GPT NON-UEFI)
On 2/10/2019 11:50, Ian Lepore wrote: > On Sun, 2019-02-10 at 11:37 -0600, Karl Denninger wrote: >> On 2/10/2019 09:28, Allan Jude wrote: >>> Are you sure it is non-UEFI? As the instructions you followed, >>> overwriting da0p1 with gptzfsboot, will make quite a mess if that >>> happens to be the EFI system partition, rather than the freebsd- >>> boot >>> partition. >> [...] >> >> BTW am I correct that gptzfsboot did *not* get the ability to read >> geli-encrypted pools in 12.0? The UEFI loader does know how (which I'm >> using on my laptop) but I was under the impression that for non-UEFI >> systems you still needed the unencrypted boot partition from which to >> load the kernel. >> > Nope, that's not correct. GELI support was added to the boot and loader > programs for both ufs and zfs in freebsd 12. You must set the geli '-g' > option to be prompted for the passphrase while booting (this is > separate from the '-b' flag that enables mounting the encrypted > partition as the rootfs). You can use "geli configure -g" to turn on > the flag on any existing geli partition. > > -- Ian Excellent - this will eliminate the need for me to run down the foot-shooting that occurred in my update script since the unencrypted kernel partition is no longer needed at all. That also significantly reduces the attack surface on such a machine (although you could still tamper with the contents of freebsd-boot of course.) The "-g" flag I knew about from experience in putting 12 on my X1 Carbon (which works really well incidentally; the only issue I'm aware of is that there's no 5Ghz WiFi support.) -- Karl Denninger k...@denninger.net <mailto:k...@denninger.net> /The Market Ticker/ /[S/MIME encrypted email preferred]/ smime.p7s Description: S/MIME Cryptographic Signature
Re: Fwd: Serious ZFS Bootcode Problem (GPT NON-UEFI)
On 2/10/2019 09:28, Allan Jude wrote: > Are you sure it is non-UEFI? As the instructions you followed, > overwriting da0p1 with gptzfsboot, will make quite a mess if that > happens to be the EFI system partition, rather than the freebsd-boot > partition. Absolutely certain. The system board in this machine (and a bunch I have in the field) are SuperMicro X8DTL-IFs which do not support UEFI at all (they have no available EFI-capable bios.) They have encrypted root pools but due to the inability of gptzfsboot to read them they have a small freebsd-zfs partition that, when upgraded, I copy /boot/* to after the kernel upgrade is done but before they are rebooted. That partition is not mounted during normal operation; it's only purpose is to load the kernel (and pre-boot .kos such as geli.) > Can you show 'gpart show' output? [karl@NewFS ~]$ gpart show da1 => 34 468862061 da1 GPT (224G) 34 2014 - free - (1.0M) 2048 1024 1 freebsd-boot (512K) 3072 1024 - free - (512K) 4096 20971520 2 freebsd-zfs [bootme] (10G) 20975616 134217728 3 freebsd-swap (64G) 155193344 313667584 4 freebsd-zfs (150G) 468860928 1167 - free - (584K) Partition "2" is the one that should boot. There is also a da2 that has an identical layout (mirrored; the drives are 240Gb Intel 730 SSDs) > What is the actual boot error? It says it can't load the kernel and gives me a prompt. "lsdev" shows all the disks and all except the two (zfs mirror) that have the "bootme" partition on them don't show up as zfs pools at all (they're geli-encrypted, so that's not unexpected.) I don't believe the loader ever gets actually loaded. An attempt to use "ls" from the bootloader to look inside that "bootme" partition fails; gptzfsboot cannot get it open. My belief was that I screwed up and wrote the old 11.1 gptzfsboot to the freebsd-boot partition originally -- but that is clearly not the case. Late last night I took my "rescue media" (which is a "make memstick" from the build of -STABLE), booted that on my sandbox machine, stuck two disks in there and made a base system -- which booted. Thus whatever is going on here it is not as simple as it first appears as that system had the spacemap_v2 flag on and active once it came up. This may be my own foot-shooting since I was able to make a bootable system on my sandbox using the same media (a clone hardware-wise so also no EFI) -- there may have been some part of the /boot hierarchy that didn't get copied over, and if so that would explain it. Update: Indeed that appears to be what it was -- a couple of the *other* files in the boot partition didn't get copied from the -STABLE build (although the entire kernel directory did) I need to look at why that happened as the update process is my own due to the dual-partition requirement for booting with non-EFI but that's not your problem -- it's mine. Sorry about this one; turns out to be something in my update scripts that failed to move over some of the files to the non-encrypted /boot BTW am I correct that gptzfsboot did *not* get the ability to read geli-encrypted pools in 12.0? The UEFI loader does know how (which I'm using on my laptop) but I was under the impression that for non-UEFI systems you still needed the unencrypted boot partition from which to load the kernel. -- Karl Denninger k...@denninger.net <mailto:k...@denninger.net> /The Market Ticker/ /[S/MIME encrypted email preferred]/ smime.p7s Description: S/MIME Cryptographic Signature
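(The copy step that went wrong is, in rough outline, something like the following -- the pool name zsboot comes from this thread; everything else is an assumption about the layout:)

  zpool import -N zsboot          # import the small unencrypted pool without mounting
  mount -t zfs zsboot /mnt
  cp -Rp /boot/ /mnt/boot         # trailing slash: copy the contents of /boot
  umount /mnt
  zpool export zsboot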
Serious ZFS Bootcode Problem (GPT NON-UEFI)
FreeBSD 12.0-STABLE r343809 After upgrading to this (without material incident) zfs was telling me that the pools could be upgraded (this machine was running 11.1, then 11.2.) I did so, /and put the new bootcode on with gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i da... /on both of the candidate (mirrored ZFS boot disk) devices, in the correct partition. Then I rebooted to test and. /could not find the zsboot pool containing the kernel./ I booted the rescue image off my SD and checked -- the copy of gptzfsboot that I put on the boot partition is exactly identical to the one on the rescue image SD. Then, to be /absolutely sure /I wasn't going insane I grabbed the mini-memstick img for 12-RELEASE and tried THAT copy of gptzfsboot. /Nope; that won't boot either!/ Fortunately I had a spare drive slot so I stuck in a piece of spinning rust, gpart'ed THAT with an old-style UFS boot filesystem, wrote bootcode on that, mounted the ZFS "zsboot" filesystem and copied it over. That boots fine (of course) and mounts the root pool, and off it goes. I'm going to blow away the entire /usr/obj tree and rebuild the kernel to see if that gets me anything that's more-sane, but right now this looks pretty bad. BTW just to be absolutely sure I blew away the entire /usr/obj directory and rebuilt -- same size and checksum on the binary that I have installed, so. Not sure what's going on here -- did something get moved? -- Karl Denninger k...@denninger.net <mailto:k...@denninger.net> /The Market Ticker/ /[S/MIME encrypted email preferred]/ smime.p7s Description: S/MIME Cryptographic Signature
Re: 9211 (LSI/SAS) issues on 11.2-STABLE
On 2/6/2019 09:18, Borja Marcos wrote: >> On 5 Feb 2019, at 23:49, Karl Denninger wrote: >> >> BTW under 12.0-STABLE (built this afternoon after the advisories came >> out, with the patches) it's MUCH worse. I get the same device resets >> BUT it's followed by an immediate panic which I cannot dump as it >> generates a page-fault (supervisor read data, page not present) in the >> mps *driver* at mpssas_send_abort+0x21. >> This precludes a dump of course since attempting to do so gives you a >> double-panic (I was wondering why I didn't get a crash dump!); I'll >> re-jigger the box to stick a dump device on an internal SATA device so I >> can successfully get the dump when it happens and see if I can obtain a >> proper crash dump on this. >> >> I think it's fair to assume that 12.0-STABLE should not panic on a disk >> problem (unless of course the problem is trying to page something back >> in -- it's not, the drive that aborts and resets is on a data pack doing >> a scrub) > It shouldn’t panic I imagine. > >>>>> mps0: Sending reset from mpssas_send_abort for target ID 37 > >>> 0x06 = = = === == Transport Statistics (rev 1) == >>> 0x06 0x008 4 6 --- Number of Hardware Resets >>> 0x06 0x010 4 0 --- Number of ASR Events >>> 0x06 0x018 4 0 --- Number of Interface CRC Errors >>> |||_ C monitored condition met >>> ||__ D supports DSN >>> |___ N normalized value >>> >>> 0x06 0x008 4 7 --- Number of Hardware Resets >>> 0x06 0x010 4 0 --- Number of ASR Events >>> 0x06 0x018 4 0 --- Number of Interface CRC Errors >>> |||_ C monitored condition met >>> ||__ D supports DSN >>> |___ N normalized value >>> >>> Number of Hardware Resets has incremented. There are no other errors shown: > What is _exactly_ that value? Is it related to the number of resets sent from > the HBA > _or_ the device resetting by itself? Good question. What counts as a "reset"; UNIT ATTENTION is what the controller receives but whether that's a power reset, a reset *command* from the HBA or a firmware crash (yikes!) in the disk I'm not certain. >>> I'd throw possible shade at the backplane or cable /but I have already >>> swapped both out for spares without any change in behavior./ > What about the power supply? > There are multiple other devices and the system board on that supply (and thus voltage rails) but it too has been swapped out without change. In fact at this point other than the system board and RAM (which is ECC, and is showing no errors in the system's BMC log) /everything /in the server case (HBA, SATA expander, backplane, power supply and cables) has been swapped for spares. No change in behavior. Note that with 20.0.7.0 firmware in the HBA instead of a unit attention I get a *controller* reset (!!!) which detaches some random number of devices from ZFS's point of view before it comes back up (depending on what's active at the time) which is potentially catastrophic if it hits the system pool. I immediately went back to 19.0.0.0 firmware on the HBA; I had upgraded to 20.0.7.0 since there had been good reports of stability with it when I first saw this, thinking there was a drive change that might have resulted in issues with it when running 19.0 firmware on the card. This system was completely stable for over a year on 11.1-STABLE and in fact hadn't been rebooted or logged a single "event" in over six months; the problems started immediately upon upgrade to 11.2-STABLE and persists on 12.0-STABLE. 
The disks in question haven't changed either (so it can't be a difference in firmware that is in a newer purchased disk, for example.) I'm thinking perhaps *something* in the codebase change made the HBA -> SAS Expander combination trouble where it wasn't before; I've got a couple of 16i HBAs on the way which will allow me to remove the SAS expander to see if that causes the problem to disappear. I've got a bunch of these Lenovo expanders and have been using them without any sort of trouble in multiple machines; it's only when I went beyond 11.1 that I started having trouble with them. -- Karl Denninger k...@denninger.net <mailto:k...@denninger.net> /The Market Ticker/ /[S/MIME encrypted email preferred]/ smime.p7s Description: S/MIME Cryptographic Signature
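(The reset counter discussed above lives in the drive's device-statistics log and can be read per drive with smartmontools; da12 is just the device from the messages earlier, and -d sat may or may not be needed depending on how a SATA drive sits behind the SAS expander:)

  smartctl -l devstat /dev/da12
  # or the full extended report, which includes the same page
  smartctl -x /dev/da12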
Re: 9211 (LSI/SAS) issues on 11.2-STABLE
BTW under 12.0-STABLE (built this afternoon after the advisories came out, with the patches) it's MUCH worse. I get the same device resets BUT it's followed by an immediate panic which I cannot dump as it generates a page-fault (supervisor read data, page not present) in the mps *driver* at mpssas_send_abort+0x21. This precludes a dump of course since attempting to do so gives you a double-panic (I was wondering why I didn't get a crash dump!); I'll re-jigger the box to stick a dump device on an internal SATA device so I can successfully get the dump when it happens and see if I can obtain a proper crash dump on this. I think it's fair to assume that 12.0-STABLE should not panic on a disk problem (unless of course the problem is trying to page something back in -- it's not, the drive that aborts and resets is on a data pack doing a scrub) On 2/5/2019 10:26, Karl Denninger wrote: > On 2/5/2019 09:22, Karl Denninger wrote: >> On 2/2/2019 12:02, Karl Denninger wrote: >>> I recently started having some really oddball things happening under >>> stress. This coincided with the machine being updated to 11.2-STABLE >>> (FreeBSD 11.2-STABLE #1 r342918:) from 11.1. >>> >>> Specifically, I get "errors" like this: >>> >>> (da12:mps0:0:37:0): READ(10). CDB: 28 00 af 82 bb 08 00 01 00 00 >>> length 131072 SMID 269 Aborting command 0xfe0001179110 >>> mps0: Sending reset from mpssas_send_abort for target ID 37 >>> (da12:mps0:0:37:0): READ(10). CDB: 28 00 af 82 bc 08 00 01 00 00 >>> length 131072 SMID 924 terminated ioc 804b loginfo 3114 scsi 0 state >>> c xfer 0 >>> (da12:mps0:0:37:0): READ(10). CDB: 28 00 af 82 ba 08 00 01 00 00 >>> length 131072 SMID 161 terminated ioc 804b loginfo 3114 scsi 0 state >>> c xfer 0 >>> mps0: Unfreezing devq for target ID 37 >>> (da12:mps0:0:37:0): READ(10). CDB: 28 00 af 82 bc 08 00 01 00 00 >>> (da12:mps0:0:37:0): CAM status: CCB request completed with an error >>> (da12:mps0:0:37:0): Retrying command >>> (da12:mps0:0:37:0): READ(10). CDB: 28 00 af 82 bb 08 00 01 00 00 >>> (da12:mps0:0:37:0): CAM status: Command timeout >>> (da12:mps0:0:37:0): Retrying command >>> (da12:mps0:0:37:0): READ(10). CDB: 28 00 af 82 ba 08 00 01 00 00 >>> (da12:mps0:0:37:0): CAM status: CCB request completed with an error >>> (da12:mps0:0:37:0): Retrying command >>> (da12:mps0:0:37:0): READ(10). CDB: 28 00 af 82 ba 08 00 01 00 00 >>> (da12:mps0:0:37:0): CAM status: SCSI Status Error >>> (da12:mps0:0:37:0): SCSI status: Check Condition >>> (da12:mps0:0:37:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, >>> reset, or bus device reset occurred) >>> (da12:mps0:0:37:0): Retrying command (per sense data) >>> >>> The "Unit Attention" implies the drive reset. It only occurs on certain >>> drives under very heavy load (e.g. a scrub.) I've managed to provoke it >>> on two different brands of disk across multiple firmware and capacities, >>> however, which tends to point away from a drive firmware problem. >>> >>> A look at the pool data shows /no /errors (e.g. no checksum problems, >>> etc) and a look at the disk itself (using smartctl) shows no problems >>> either -- whatever is going on here the adapter is recovering from it >>> without any data corruption or loss registered on *either end*! 
>>> >>> The configuration is an older SuperMicro Xeon board (X8DTL-IF) and shows: >>> >>> mps0: port 0xc000-0xc0ff mem >>> 0xfbb3c000-0xfbb3,0xfbb4-0xfbb7 irq 30 at device 0.0 on pci3 >>> mps0: Firmware: 19.00.00.00, Driver: 21.02.00.00-fbsd >>> mps0: IOCCapabilities: >>> 1285c >> After considerable additional work this looks increasingly like either a >> missed interrupt or a command is getting lost between the host adapter >> and the expander. >> >> I'm going to turn the driver debug level up and see if I can capture >> more information. whatever is behind this, however, it is >> almost-certainly related to something that changed between 11.1 and >> 11.2, as I never saw these on the 11.1-STABLE build. >> >> -- >> Karl Denninger >> k...@denninger.net <mailto:k...@denninger.net> >> /The Market Ticker/ >> /[S/MIME encrypted email preferred]/ > Pretty decent trace here -- any ideas? > > mps0: timedout cm 0xfe00011b5020 allocated tm 0xfe00011812a0 > (da11:mps0:0:37:0): READ(10). CDB: 28 00 82 b5 3b 80 00 01 00 00 > length 131072 SMID 634 Aborting comma
Re: 9211 (LSI/SAS) issues on 11.2-STABLE
On 2/5/2019 09:22, Karl Denninger wrote: > On 2/2/2019 12:02, Karl Denninger wrote: >> I recently started having some really oddball things happening under >> stress. This coincided with the machine being updated to 11.2-STABLE >> (FreeBSD 11.2-STABLE #1 r342918:) from 11.1. >> >> Specifically, I get "errors" like this: >> >> (da12:mps0:0:37:0): READ(10). CDB: 28 00 af 82 bb 08 00 01 00 00 >> length 131072 SMID 269 Aborting command 0xfe0001179110 >> mps0: Sending reset from mpssas_send_abort for target ID 37 >> (da12:mps0:0:37:0): READ(10). CDB: 28 00 af 82 bc 08 00 01 00 00 >> length 131072 SMID 924 terminated ioc 804b loginfo 3114 scsi 0 state >> c xfer 0 >> (da12:mps0:0:37:0): READ(10). CDB: 28 00 af 82 ba 08 00 01 00 00 >> length 131072 SMID 161 terminated ioc 804b loginfo 3114 scsi 0 state >> c xfer 0 >> mps0: Unfreezing devq for target ID 37 >> (da12:mps0:0:37:0): READ(10). CDB: 28 00 af 82 bc 08 00 01 00 00 >> (da12:mps0:0:37:0): CAM status: CCB request completed with an error >> (da12:mps0:0:37:0): Retrying command >> (da12:mps0:0:37:0): READ(10). CDB: 28 00 af 82 bb 08 00 01 00 00 >> (da12:mps0:0:37:0): CAM status: Command timeout >> (da12:mps0:0:37:0): Retrying command >> (da12:mps0:0:37:0): READ(10). CDB: 28 00 af 82 ba 08 00 01 00 00 >> (da12:mps0:0:37:0): CAM status: CCB request completed with an error >> (da12:mps0:0:37:0): Retrying command >> (da12:mps0:0:37:0): READ(10). CDB: 28 00 af 82 ba 08 00 01 00 00 >> (da12:mps0:0:37:0): CAM status: SCSI Status Error >> (da12:mps0:0:37:0): SCSI status: Check Condition >> (da12:mps0:0:37:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, >> reset, or bus device reset occurred) >> (da12:mps0:0:37:0): Retrying command (per sense data) >> >> The "Unit Attention" implies the drive reset. It only occurs on certain >> drives under very heavy load (e.g. a scrub.) I've managed to provoke it >> on two different brands of disk across multiple firmware and capacities, >> however, which tends to point away from a drive firmware problem. >> >> A look at the pool data shows /no /errors (e.g. no checksum problems, >> etc) and a look at the disk itself (using smartctl) shows no problems >> either -- whatever is going on here the adapter is recovering from it >> without any data corruption or loss registered on *either end*! >> >> The configuration is an older SuperMicro Xeon board (X8DTL-IF) and shows: >> >> mps0: port 0xc000-0xc0ff mem >> 0xfbb3c000-0xfbb3,0xfbb4-0xfbb7 irq 30 at device 0.0 on pci3 >> mps0: Firmware: 19.00.00.00, Driver: 21.02.00.00-fbsd >> mps0: IOCCapabilities: >> 1285c > After considerable additional work this looks increasingly like either a > missed interrupt or a command is getting lost between the host adapter > and the expander. > > I'm going to turn the driver debug level up and see if I can capture > more information. whatever is behind this, however, it is > almost-certainly related to something that changed between 11.1 and > 11.2, as I never saw these on the 11.1-STABLE build. > > -- > Karl Denninger > k...@denninger.net <mailto:k...@denninger.net> > /The Market Ticker/ > /[S/MIME encrypted email preferred]/ Pretty decent trace here -- any ideas? mps0: timedout cm 0xfe00011b5020 allocated tm 0xfe00011812a0 (da11:mps0:0:37:0): READ(10). 
CDB: 28 00 82 b5 3b 80 00 01 00 00 length 131072 SMID 634 Aborting command 0xfe00011b5020 mps0: Sending reset from mpssas_send_abort for target ID 37 mps0: queued timedout cm 0xfe00011c2760 for processing by tm 0xfe00011812a0 mps0: queued timedout cm 0xfe00011a74f0 for processing by tm 0xfe00011812a0 mps0: queued timedout cm 0xfe00011cfd50 for processing by tm 0xfe00011812a0 mps0: EventReply : EventDataLength: 2 AckRequired: 0 Event: SasDiscovery (0x16) EventContext: 0x0 Flags: 1 ReasonCode: Discovery Started PhysicalPort: 0 DiscoveryStatus: 0 mps0: (0)->(mpssas_fw_work) Working on Event: [16] mps0: (1)->(mpssas_fw_work) Event Free: [16] (da11:mps0:0:37:0): READ(10). CDB: 28 00 82 b5 3c 80 00 01 00 00 length 131072 SMID 961 completed timedout cm 0xfe00011cfd50 ccb 0xf8019458e000 during recovery ioc 804b scsi 0 state c (da11:mps0:0:37:0): READ(10). CDB: 28 00 82 b5 3c 80 00 01 00 00 length 131072 SMID 961 terminated ioc 804b loginfo 3114 scsi 0 state c xfer 0 (da11:mps0:0:37:0): READ(10). CDB: 28 00 82 b5 3b 80 00 01 00 00 length 131072 SMID 634 completed timedout cm 0xfe00011b5(da11:mps0:0:37:0): READ(10). CDB: 28 00 82 b5 3c 80
Re: 9211 (LSI/SAS) issues on 11.2-STABLE
On 2/2/2019 12:02, Karl Denninger wrote: > I recently started having some really oddball things happening under > stress. This coincided with the machine being updated to 11.2-STABLE > (FreeBSD 11.2-STABLE #1 r342918:) from 11.1. > > Specifically, I get "errors" like this: > > (da12:mps0:0:37:0): READ(10). CDB: 28 00 af 82 bb 08 00 01 00 00 > length 131072 SMID 269 Aborting command 0xfe0001179110 > mps0: Sending reset from mpssas_send_abort for target ID 37 > (da12:mps0:0:37:0): READ(10). CDB: 28 00 af 82 bc 08 00 01 00 00 > length 131072 SMID 924 terminated ioc 804b loginfo 3114 scsi 0 state > c xfer 0 > (da12:mps0:0:37:0): READ(10). CDB: 28 00 af 82 ba 08 00 01 00 00 > length 131072 SMID 161 terminated ioc 804b loginfo 3114 scsi 0 state > c xfer 0 > mps0: Unfreezing devq for target ID 37 > (da12:mps0:0:37:0): READ(10). CDB: 28 00 af 82 bc 08 00 01 00 00 > (da12:mps0:0:37:0): CAM status: CCB request completed with an error > (da12:mps0:0:37:0): Retrying command > (da12:mps0:0:37:0): READ(10). CDB: 28 00 af 82 bb 08 00 01 00 00 > (da12:mps0:0:37:0): CAM status: Command timeout > (da12:mps0:0:37:0): Retrying command > (da12:mps0:0:37:0): READ(10). CDB: 28 00 af 82 ba 08 00 01 00 00 > (da12:mps0:0:37:0): CAM status: CCB request completed with an error > (da12:mps0:0:37:0): Retrying command > (da12:mps0:0:37:0): READ(10). CDB: 28 00 af 82 ba 08 00 01 00 00 > (da12:mps0:0:37:0): CAM status: SCSI Status Error > (da12:mps0:0:37:0): SCSI status: Check Condition > (da12:mps0:0:37:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, > reset, or bus device reset occurred) > (da12:mps0:0:37:0): Retrying command (per sense data) > > The "Unit Attention" implies the drive reset. It only occurs on certain > drives under very heavy load (e.g. a scrub.) I've managed to provoke it > on two different brands of disk across multiple firmware and capacities, > however, which tends to point away from a drive firmware problem. > > A look at the pool data shows /no /errors (e.g. no checksum problems, > etc) and a look at the disk itself (using smartctl) shows no problems > either -- whatever is going on here the adapter is recovering from it > without any data corruption or loss registered on *either end*! > > The configuration is an older SuperMicro Xeon board (X8DTL-IF) and shows: > > mps0: port 0xc000-0xc0ff mem > 0xfbb3c000-0xfbb3,0xfbb4-0xfbb7 irq 30 at device 0.0 on pci3 > mps0: Firmware: 19.00.00.00, Driver: 21.02.00.00-fbsd > mps0: IOCCapabilities: > 1285c After considerable additional work this looks increasingly like either a missed interrupt or a command is getting lost between the host adapter and the expander. I'm going to turn the driver debug level up and see if I can capture more information. whatever is behind this, however, it is almost-certainly related to something that changed between 11.1 and 11.2, as I never saw these on the 11.1-STABLE build. -- Karl Denninger k...@denninger.net <mailto:k...@denninger.net> /The Market Ticker/ /[S/MIME encrypted email preferred]/ smime.p7s Description: S/MIME Cryptographic Signature
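(Turning up the driver debug level is a per-instance sysctl on mps(4); the mask below is only an example -- the bit meanings are listed in mps(4) -- and it can also be set as a loader tunable to catch attach-time events:)

  sysctl dev.mps.0.debug_level            # show the current mask
  sysctl dev.mps.0.debug_level=0x1f       # example: enable several categories
  # /boot/loader.conf equivalent:
  hw.mps.0.debug_level="0x1f"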
9211 (LSI/SAS) issues on 11.2-STABLE
I recently started having some really oddball things happening under stress. This coincided with the machine being updated to 11.2-STABLE (FreeBSD 11.2-STABLE #1 r342918:) from 11.1. Specifically, I get "errors" like this: (da12:mps0:0:37:0): READ(10). CDB: 28 00 af 82 bb 08 00 01 00 00 length 131072 SMID 269 Aborting command 0xfe0001179110 mps0: Sending reset from mpssas_send_abort for target ID 37 (da12:mps0:0:37:0): READ(10). CDB: 28 00 af 82 bc 08 00 01 00 00 length 131072 SMID 924 terminated ioc 804b loginfo 3114 scsi 0 state c xfer 0 (da12:mps0:0:37:0): READ(10). CDB: 28 00 af 82 ba 08 00 01 00 00 length 131072 SMID 161 terminated ioc 804b loginfo 3114 scsi 0 state c xfer 0 mps0: Unfreezing devq for target ID 37 (da12:mps0:0:37:0): READ(10). CDB: 28 00 af 82 bc 08 00 01 00 00 (da12:mps0:0:37:0): CAM status: CCB request completed with an error (da12:mps0:0:37:0): Retrying command (da12:mps0:0:37:0): READ(10). CDB: 28 00 af 82 bb 08 00 01 00 00 (da12:mps0:0:37:0): CAM status: Command timeout (da12:mps0:0:37:0): Retrying command (da12:mps0:0:37:0): READ(10). CDB: 28 00 af 82 ba 08 00 01 00 00 (da12:mps0:0:37:0): CAM status: CCB request completed with an error (da12:mps0:0:37:0): Retrying command (da12:mps0:0:37:0): READ(10). CDB: 28 00 af 82 ba 08 00 01 00 00 (da12:mps0:0:37:0): CAM status: SCSI Status Error (da12:mps0:0:37:0): SCSI status: Check Condition (da12:mps0:0:37:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred) (da12:mps0:0:37:0): Retrying command (per sense data) The "Unit Attention" implies the drive reset. It only occurs on certain drives under very heavy load (e.g. a scrub.) I've managed to provoke it on two different brands of disk across multiple firmware and capacities, however, which tends to point away from a drive firmware problem. A look at the pool data shows /no /errors (e.g. no checksum problems, etc) and a look at the disk itself (using smartctl) shows no problems either -- whatever is going on here the adapter is recovering from it without any data corruption or loss registered on *either end*! The configuration is an older SuperMicro Xeon board (X8DTL-IF) and shows: mps0: port 0xc000-0xc0ff mem 0xfbb3c000-0xfbb3,0xfbb4-0xfbb7 irq 30 at device 0.0 on pci3 mps0: Firmware: 19.00.00.00, Driver: 21.02.00.00-fbsd mps0: IOCCapabilities: 1285c There is also a SAS expander connected to that with all but the boot drives on it (the LSI card will not boot from the expander so the boot mirror is directly connected to the adapter.) Thinking this might be a firmware/driver compatibility related problem I flashed the card to 20.00.07.00, which is the latest available. That made the situation **MUCH** worse; now instead of getting unit attention issues I got *controller* resets (!!) which invariably some random device (and sometimes more than one) in one of the pools to get detached, as the controller didn't come back up fast enough for ZFS and it declares the device(s) in question "removed". Needless to say I immediately flashed the card back to 19.00.00.00! This configuration has been completely stable on 11.1 for upwards of a year, and only started misbehaving when I updated the OS to 11.2. I've pounded the living daylights out of this box for a very long time on a succession of FreeBSD OS builds and up to 11.1 have never seen anything like this; if I had a bad drive, it was clearly the drive. 
Looking at the commit logs for the mps driver it appears there isn't much here that *could* be involved, unless there's an interrupt issue with some of the MSI changes that is interacting with my specific motherboard line. Any ideas on running this down would be appreciated; it's not easy to trigger it on the 19.0 firmware but on 20. I can force a controller reset and detach within a few minutes by running scrubs so if there are things I can try (I have a sandbox machine with the same hardware in it that won't make me cry much if I blow it up) that would great. Thanks! -- Karl Denninger k...@denninger.net <mailto:k...@denninger.net> /The Market Ticker/ /[S/MIME encrypted email preferred]/ smime.p7s Description: S/MIME Cryptographic Signature
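(For anyone trying to reproduce this, the trigger described above is just sustained scrub load; the pool name is a placeholder:)

  zpool scrub tank
  gstat -p                        # per-disk load/latency while the scrub runs
  tail -f /var/log/messages       # the aborts/resets show up here
  zpool status -v tank            # confirm ZFS itself logged no errors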
Re: Not sure if this is the correct place.... (laptop, dual-boot EFI)
Here's a write-up on it -- it was /much /simpler than I expected and unlike my X220 didn't require screwing with group policy for Bitlocker to coexist with a dual-boot environment. https://market-ticker.org/akcs-www?post=234936 Feel free to grab/reproduce/link to/whatever; hope this helps others. It runs very nicely on 12-RELEASE -- the only thing I've noted thus far is the expected lack of 5g WiFi support. On 1/26/2019 15:04, Karl Denninger wrote: > Nevermind! > > I set the "-g" flag on the provider and voila. Up she comes; the > loader figured out that it had to prompt for the password and it was > immediately good. > > Now THAT'S easy compared with the convoluted BS I had to do (two > partitions, fully "by-hand" install, etc) for 11 on my X220. > > Off to the races I go; now I have to figure out what I have to set in > Windows group policy so Bitlocker doesn't throw up every time I boot > FreeBSD (this took a bit with my X220 since the boot manager tickled > something that Bitlocker interpreted as "someone tampered with the > system.") Maybe this will be a nothingburger too (which would be great > if true.) > > I'm going to write this one up when I've got it all solid and post it on > my blog; hopefully it will help others. > > On 1/26/2019 14:26, Karl Denninger wrote: >> 1/26/2019 14:10, Warner Losh wrote: >>> On Sat, Jan 26, 2019 at 1:01 PM Karl Denninger >> <mailto:k...@denninger.net>> wrote: >>> >>> Further question does boot1.efi (which I assume has to be >>> placed on >>> the EFI partition and then something like rEFInd can select it) >>> know how >>> to handle a geli-encrypted primary partition (e.g. for root/boot so I >>> don't need an unencrypted /boot partition), and if so how do I tell it >>> that's the case and to prompt for the password? >>> >>> >>> Not really. The whole reason we ditched boot1.efi is because it is >>> quite limited in what it can do. You must loader.efi for that. >>> >>> >>> (If not I know how to set up for geli-encryption using a non-encrypted >>> /boot partition, but my understanding is that for 12 the loader was >>> taught how to handle geli internally and thus you can now install >>> 12 -- >>> at least for ZFS -- with encryption on root. However, that wipes the >>> disk if you try to select it in the installer, so that's no good >>> -- and >>> besides, on a laptop zfs is overkill.) >>> >>> >>> For MBR stuff, yes. For loader.efi, yes. For boot1.efi, no: it did not >>> and will not grow that functionality. >>> >>> Warner >>> >> Ok, next dumb question -- can I put loader.efi in the EFI partition >> under EFI/FreeBSD as "bootx64.efi" there (from reading mailing list >> archives that appears to be yes -- just copy it in) and, if yes, how do >> I "tell" it that when it finds the freebsd-ufs partition on the disk it >> was started from (which, if I'm reading correctly, it will scan and look >> for) that it needs to geli attach the partition before it dig into there >> and find the rest of what it needs to boot? >> >> That SHOULD allow me to use an EFI boot manager to come up on initial >> boot, select FreeBSD and the loader.efi (named as bootx64.efi in >> EFI/FreeBSD) code will then boot the system. >> >> I've looked as the 12-RELEASE man page(s) and it's not obvious how you >> tell the loader to look for the partition and then attach it via GELI >> (prompting for the password of course) before attempting to boot it; >> obviously a "load" directive (e.g. 
geom_eli_load ="YES") makes no sense >> as the thing you'd "load" is on the disk you'd be loading it from and >> its encrypted.. .never mind that loader.conf violates the 8.3 filename >> rules for a DOS filesystem. >> >> Thanks! >> -- Karl Denninger k...@denninger.net <mailto:k...@denninger.net> /The Market Ticker/ /[S/MIME encrypted email preferred]/ smime.p7s Description: S/MIME Cryptographic Signature
Re: Not sure if this is the correct place.... (laptop, dual-boot EFI)
Nevermind! I set the "-g" flag on the provider and voila. Up she comes; the loader figured out that it had to prompt for the password and it was immediately good. Now THAT'S easy compared with the convoluted BS I had to do (two partitions, fully "by-hand" install, etc) for 11 on my X220. Off to the races I go; now I have to figure out what I have to set in Windows group policy so Bitlocker doesn't throw up every time I boot FreeBSD (this took a bit with my X220 since the boot manager tickled something that Bitlocker interpreted as "someone tampered with the system.") Maybe this will be a nothingburger too (which would be great if true.) I'm going to write this one up when I've got it all solid and post it on my blog; hopefully it will help others. On 1/26/2019 14:26, Karl Denninger wrote: > 1/26/2019 14:10, Warner Losh wrote: >> >> On Sat, Jan 26, 2019 at 1:01 PM Karl Denninger > <mailto:k...@denninger.net>> wrote: >> >> Further question does boot1.efi (which I assume has to be >> placed on >> the EFI partition and then something like rEFInd can select it) >> know how >> to handle a geli-encrypted primary partition (e.g. for root/boot so I >> don't need an unencrypted /boot partition), and if so how do I tell it >> that's the case and to prompt for the password? >> >> >> Not really. The whole reason we ditched boot1.efi is because it is >> quite limited in what it can do. You must loader.efi for that. >> >> >> (If not I know how to set up for geli-encryption using a non-encrypted >> /boot partition, but my understanding is that for 12 the loader was >> taught how to handle geli internally and thus you can now install >> 12 -- >> at least for ZFS -- with encryption on root. However, that wipes the >> disk if you try to select it in the installer, so that's no good >> -- and >> besides, on a laptop zfs is overkill.) >> >> >> For MBR stuff, yes. For loader.efi, yes. For boot1.efi, no: it did not >> and will not grow that functionality. >> >> Warner >> > Ok, next dumb question -- can I put loader.efi in the EFI partition > under EFI/FreeBSD as "bootx64.efi" there (from reading mailing list > archives that appears to be yes -- just copy it in) and, if yes, how do > I "tell" it that when it finds the freebsd-ufs partition on the disk it > was started from (which, if I'm reading correctly, it will scan and look > for) that it needs to geli attach the partition before it dig into there > and find the rest of what it needs to boot? > > That SHOULD allow me to use an EFI boot manager to come up on initial > boot, select FreeBSD and the loader.efi (named as bootx64.efi in > EFI/FreeBSD) code will then boot the system. > > I've looked as the 12-RELEASE man page(s) and it's not obvious how you > tell the loader to look for the partition and then attach it via GELI > (prompting for the password of course) before attempting to boot it; > obviously a "load" directive (e.g. geom_eli_load ="YES") makes no sense > as the thing you'd "load" is on the disk you'd be loading it from and > its encrypted.. .never mind that loader.conf violates the 8.3 filename > rules for a DOS filesystem. > > Thanks! > -- Karl Denninger k...@denninger.net <mailto:k...@denninger.net> /The Market Ticker/ /[S/MIME encrypted email preferred]/ smime.p7s Description: S/MIME Cryptographic Signature
Re: Not sure if this is the correct place.... (laptop, dual-boot EFI)
1/26/2019 14:10, Warner Losh wrote: > > > On Sat, Jan 26, 2019 at 1:01 PM Karl Denninger <mailto:k...@denninger.net>> wrote: > > Further question does boot1.efi (which I assume has to be > placed on > the EFI partition and then something like rEFInd can select it) > know how > to handle a geli-encrypted primary partition (e.g. for root/boot so I > don't need an unencrypted /boot partition), and if so how do I tell it > that's the case and to prompt for the password? > > > Not really. The whole reason we ditched boot1.efi is because it is > quite limited in what it can do. You must loader.efi for that. > > > (If not I know how to set up for geli-encryption using a non-encrypted > /boot partition, but my understanding is that for 12 the loader was > taught how to handle geli internally and thus you can now install > 12 -- > at least for ZFS -- with encryption on root. However, that wipes the > disk if you try to select it in the installer, so that's no good > -- and > besides, on a laptop zfs is overkill.) > > > For MBR stuff, yes. For loader.efi, yes. For boot1.efi, no: it did not > and will not grow that functionality. > > Warner > Ok, next dumb question -- can I put loader.efi in the EFI partition under EFI/FreeBSD as "bootx64.efi" there (from reading mailing list archives that appears to be yes -- just copy it in) and, if yes, how do I "tell" it that when it finds the freebsd-ufs partition on the disk it was started from (which, if I'm reading correctly, it will scan and look for) that it needs to geli attach the partition before it dig into there and find the rest of what it needs to boot? That SHOULD allow me to use an EFI boot manager to come up on initial boot, select FreeBSD and the loader.efi (named as bootx64.efi in EFI/FreeBSD) code will then boot the system. I've looked as the 12-RELEASE man page(s) and it's not obvious how you tell the loader to look for the partition and then attach it via GELI (prompting for the password of course) before attempting to boot it; obviously a "load" directive (e.g. geom_eli_load ="YES") makes no sense as the thing you'd "load" is on the disk you'd be loading it from and its encrypted.. .never mind that loader.conf violates the 8.3 filename rules for a DOS filesystem. Thanks! -- Karl Denninger k...@denninger.net <mailto:k...@denninger.net> /The Market Ticker/ /[S/MIME encrypted email preferred]/ smime.p7s Description: S/MIME Cryptographic Signature
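(For anyone reading later: the mechanical half of the question -- getting loader.efi onto the ESP under the name the firmware looks for -- is just a copy. A sketch, assuming the ESP is ada0p1 and using /mnt as a temporary mount point:

# mount_msdosfs /dev/ada0p1 /mnt
# mkdir -p /mnt/EFI/FreeBSD
# cp /boot/loader.efi /mnt/EFI/FreeBSD/bootx64.efi
# umount /mnt

Whether the loader then attaches the GELI provider is controlled by the provider's own flags rather than by anything placed on the ESP; see the "-g" follow-up above.)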
Re: Not sure if this is the correct place.... (laptop, dual-boot EFI)
Further question does boot1.efi (which I assume has to be placed on the EFI partition and then something like rEFInd can select it) know how to handle a geli-encrypted primary partition (e.g. for root/boot so I don't need an unencrypted /boot partition), and if so how do I tell it that's the case and to prompt for the password? (If not I know how to set up for geli-encryption using a non-encrypted /boot partition, but my understanding is that for 12 the loader was taught how to handle geli internally and thus you can now install 12 -- at least for ZFS -- with encryption on root. However, that wipes the disk if you try to select it in the installer, so that's no good -- and besides, on a laptop zfs is overkill.) Thanks! On 1/26/2019 08:08, Kamila Součková wrote: > I'm just booting the installer, going to do this on my X1 Carbon (5th gen), > and I'm planning to use the efibootmgr entry first (which is sufficient for > booting), and later I might add rEFInd if I feel like it. I'll be posting > my steps online, I can post the link once it's out there if you're > interested. > > I'm very curious about HW support on the 6th gen Carbon, it'd be great to > hear how it goes. > > Have fun! > > Kamila > > On Sat, 26 Jan 2019, 06:54 Kyle Evans, wrote: > >> On Fri, Jan 25, 2019 at 6:30 PM Jonathan Chen wrote: >>> On Sat, 26 Jan 2019 at 13:00, Karl Denninger wrote: >>> [...] >>>> I'd like to repartition it to be able to dual boot it much as I do with >>>> my X220 (I wish I could ditch Windows entirely, but that is just not >>>> going to happen), but I'm not sure how to accomplish that in the EFI >>>> world -- or if it reasonably CAN be done in the EFI world. Fortunately >>>> the BIOS has an option to turn off secure boot (which I surmise from >>>> reading the Wiki FreeBSD doesn't yet support) but I still need a means >>>> to select from some reasonably-friendly way *what* to boot. >>> The EFI partition is just a MS-DOS partition, and most EFI aware BIOS >>> will (by default) load /EFI/Boot/boot64.efi when starting up. On my >>> Dell Inspiron 17, I created /EFI/FreeBSD and copied FreeBSD's >>> /boot/loader.efi to /EFI/FreeBSD/boot64.efi. My laptop's BIOS setup >>> allowed me to specify a boot-entry to for \EFI\FreeBSD\boot64.efi. On >>> a cold start, I have to be quick to hit the F12 key, which then allows >>> me to specify whether to boot Windows or FreeBSD. I'm not sure how >>> Lenovo's BIOS setup works, but I'm pretty sure that it should have >>> something similar. >>> >> Adding a boot-entry can also be accomplished with efibootmgr. This is >> effectively what the installer in -CURRENT does, copying loader to >> \EFI\FreeBSD on the ESP and using efibootmgr to insert a "FreeBSD" >> entry for that loader and activating it. >> ___ >> freebsd-stable@freebsd.org mailing list >> https://lists.freebsd.org/mailman/listinfo/freebsd-stable >> To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org" >> > ___ > freebsd-stable@freebsd.org mailing list > https://lists.freebsd.org/mailman/listinfo/freebsd-stable > To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org" -- Karl Denninger k...@denninger.net <mailto:k...@denninger.net> /The Market Ticker/ /[S/MIME encrypted email preferred]/ smime.p7s Description: S/MIME Cryptographic Signature
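(The efibootmgr(8) route Kyle describes, on systems that ship it, might look roughly like this -- a sketch, assuming the ESP is mounted at /mnt and loader.efi has been copied to EFI/FreeBSD:

# efibootmgr -c -l /mnt/EFI/FreeBSD/loader.efi -L FreeBSD
# efibootmgr -v                      (note the Boot number of the new entry)
# efibootmgr -a -b 0003              (activate it; 0003 is a placeholder)

The alternatives are naming the copied loader bootx64.efi under EFI/Boot, the firmware's default fallback path, or adding the entry from the firmware's own setup screen, as discussed elsewhere in the thread.)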
Not sure if this is the correct place.... (laptop, dual-boot EFI)
-mobile appears to be pretty much a dead-letter, so I'm posting here... I have dual-boot working well on my Lenovo X220, and have for quite some time, between Win10 and FreeBSD 11. This is set up for MBR however, not EFI. I just picked up an X1 Carbon Gen 6, which is a UEFI machine, with Win10 on it. I'd like to repartition it to be able to dual boot it much as I do with my X220 (I wish I could ditch Windows entirely, but that is just not going to happen), but I'm not sure how to accomplish that in the EFI world -- or if it reasonably CAN be done in the EFI world. Fortunately the BIOS has an option to turn off secure boot (which I surmise from reading the Wiki FreeBSD doesn't yet support) but I still need some reasonably friendly means to select *what* to boot. With the X220, the FreeBSD boot manager does this reasonably easily; you get an "F" key for the desired partition, and if you press nothing after a few seconds whatever you pressed last is booted. Works fine. What options exist for doing this in a UEFI world, if any, and is there a "cookbook" for putting this together? I assume *someone* has set up dual-boot, given that the X1 Carbon Gen 6 is listed as working in the laptop database. Thanks in advance! -- Karl Denninger k...@denninger.net <mailto:k...@denninger.net> /The Market Ticker/ /[S/MIME encrypted email preferred]/ smime.p7s Description: S/MIME Cryptographic Signature
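(Two quick checks that help frame the options here -- a sketch, and the device name is an assumption for an NVMe laptop drive:

$ sysctl machdep.bootmethod          (reports whether the running FreeBSD system came up via BIOS or UEFI)
$ gpart show -p nvd0                 (look for the small "efi" partition Windows created)

If Windows was installed EFI-style there is normally already an ESP that FreeBSD's loader.efi can be copied into, which is where the follow-ups above pick up.)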
11.2-STABLE and MMC problems on pcEngines apu2c0
way 192.168.10.100 fib 0: route already in table . Updating motd:. Mounting late filesystems:. Starting ntpd. Starting powerd. Starting rtadvd. Dec 10 15:31:01 IpGw ntpd[985]: leapsecond file ('/var/db/ntpd.leap-seconds.list'): expired less than 558 days ago Starting dhcpd. Internet Systems Consortium DHCP Server 4.3.5 Copyright 2004-2016 Internet Systems Consortium. All rights reserved. For info, please visit https://www.isc.org/software/dhcp/ Config file: /usr/local/etc/dhcpd.conf Database file: /var/db/dhcpd.leases PID file: /var/run/dhcpd.pid Wrote 0 deleted host decls to leases file. Wrote 0 new dynamic host decls to leases file. Wrote 0 leases to leases file. Listening on BPF/igb1.3/00:0d:b9:46:71:89/192.168.4.0/24 Sending on BPF/igb1.3/00:0d:b9:46:71:89/192.168.4.0/24 Listening on BPF/igb1/00:0d:b9:46:71:89/two-on-cable Sending on BPF/igb1/00:0d:b9:46:71:89/two-on-cable Sending on Socket/fallback/fallback-net Starting sshguard. Starting snmpd. cannot forward src fe80:2::396c:673e:9cf:ba21, dst 2600:1402:16::17db:a212, nxt 6, rcvif igb1, outif igb0 Starting openvpn. tun0: link state changed to UP Configuring vt: blanktime. Performing sanity check on sshd configuration. Starting sshd. Starting sendmail_submit. Starting sendmail_msp_queue. Starting cron. Starting background file system checks in 60 seconds. Mon Dec 10 15:31:06 CST 2018 cannot forward src fe80:2::396c:673e:9cf:ba21, dst 2600:1402:16::17db:a212, nxt 6, rcvif igb1, outif igb0 cannot forward src fe80:2::396c:673e:9cf:ba21, dst 2600:1402:16::17db:a212, nxt 6, rcvif igb1, outif igb0 cannot forward src fe80:2::396c:673e:9cf:ba21, dst 2600:1402:16::17db:a212, nxt 6, rcvif igb1, outif igb0 cannot forward src fe80:2::396c:673e:9cf:ba21, dst 2600:1402:16::17db:a212, nxt 6, rcvif igb1, outif igb0 -- Karl Denninger k...@denninger.net <mailto:k...@denninger.net> /The Market Ticker/ /[S/MIME encrypted email preferred]/ smime.p7s Description: S/MIME Cryptographic Signature
Re: Random freezes of my FreeBSD droplet (DigitalOcean)
Yeah, don't do that. I have a DO Zfs-enabled FreeBSD deployment and I have the swap on /dev/gpt/swap0 (a regular slice)... no problems. $ uptime 9:17AM up 174 days, 2:06, 1 users, load averages: 0.89, 0.80, 0.83 $ On 11/22/2017 08:15, Stefan Lambrev wrote: > Hi, > > Thanks for this idea. > > Device 1K-blocks UsedAvail Capacity > /dev/zvol/zroot/swap 2097152 310668 178648415% > > Will check why at all swap is used when the VM is not used. But yes - as > it's a image provided by DO the swap is on the zvol... > > On Wed, Nov 22, 2017 at 4:08 PM, Adam Vande More > wrote: > >> On Wed, Nov 22, 2017 at 7:17 AM, Stefan Lambrev >> wrote: >> >>> Greetings, >>> >>> I have a droplet in DO with very light load, currently >>> running 11.0-RELEASE-p15 amd64 GENERIC kernel + zfs (1 GB Memory / 30 GB >>> Disk / FRA1 - FreeBSD 11.0 zfs) >>> >>> I know ZFS needs more memory, but the load is really light. Unfortunatelly >>> last few weeks I'm experiencing those freezes almost every second day. >>> There are no logs or console messages - just freeze. Networks seems to >>> work, but nothing else. >>> >>> Is there anyone with similar experience here? >>> Are there any updates in 11.1 that may affect positively my experience in >>> the digital ocean cloud? >>> >> It's entirely possible to run a stable VM using that configuration so you >> haven't provided enough details to give any real help. A common foot >> shooting method is putting swap on zvol, but the possibilities are endless. >> >> -- >> Adam >> > ___ > freebsd-stable@freebsd.org mailing list > https://lists.freebsd.org/mailman/listinfo/freebsd-stable > To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org" -- Karl Denninger k...@denninger.net <mailto:k...@denninger.net> /The Market Ticker/ /[S/MIME encrypted email preferred]/ smime.p7s Description: S/MIME Cryptographic Signature
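(For anyone wanting to move a droplet's swap off the zvol, a rough sketch -- the vtbd0 device name and the presence of free space at the end of the disk are assumptions, so check gpart show first:

# gpart add -t freebsd-swap -l swap0 -s 2G vtbd0
# swapoff /dev/zvol/zroot/swap
# swapon /dev/gpt/swap0

and in /etc/fstab:

/dev/gpt/swap0   none   swap   sw   0   0

with the old zvol line removed, and the zvol itself destroyed once the new swap is in service.)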
Re: Installing amd64 FreeBSD 11.1 in dual-boot with Windows 7 on an MBR partitioned disk
On 10/7/2017 12:12, Eugene Grosbein wrote: > 07.10.2017 22:26, Warner Losh wrote: > >> Sorry for top posting. Sounds like your BIOS will read the botox64.efi from >> the removable USB drive, >> but won't from the hard drive. Force BIOS booting instead of UEFI and it >> will install correctly. >> However, it may not boot Windows, which I think requires UEFI these days. > My home desktop is UEFI-capable and but switched to BIOS/MBR mode > and it dual-boots FreeBSD/Windows 8.1 just fine. Windows (including Windows 10) doesn't "require" UEFI but the current installer will set it up that way on a "from scratch" installation. If you have it on an MBR disk (e.g. you started with 7 or 8, for example) it will boot and run just fine from it, and in fact if you have a legacy license and try to change to UEFI (with a full, from-scratch reload) you run the risk of it declaring your license invalid! You can /probably /get around that by getting in touch with Microsoft but why do so without good reason? -- Karl Denninger k...@denninger.net <mailto:k...@denninger.net> /The Market Ticker/ /[S/MIME encrypted email preferred]/ smime.p7s Description: S/MIME Cryptographic Signature
Re: Installing amd64 FreeBSD 11.1 in dual-boot with Windows 7 on an MBR partitioned disk
On 10/7/2017 11:27, Rostislav Krasny wrote: > On Sat, Oct 7, 2017 at 6:26 PM, Warner Losh wrote: >> Sorry for top posting. Sounds like your BIOS will read the botox64.efi from >> the removable USB drive, but won't from the hard drive. Force BIOS booting >> instead of UEFI and it will install correctly. However, it may not boot >> Windows, which I think requires UEFI these days. >> >> The root of the problem is that we have no way to setup the EFI boot >> variables in the installer that we need to properly installed under UEFI. >> I'm working on that, so you'll need to be patient... >> >> Warner > My computer doesn't have any EFI partition and this explains why the > installed FreeBSD boots in the BIOS mode on it. The installation media > probably has the EFI partition (with the bootx64.efi) and then BIOS > probably boots the installation media in the UEFI mode instead of the > BIOS mode. So the "machdep.bootmethod" sysctl doesn't represent the > BIOS boot mode configuration but a boot method the currently running > system was booted in. If this is true then the "machdep.bootmethod" > sysctl should not be used in bsdinstall. At least not for the > bootability check. Something else should be used for the bootability > check or the bsdinstall should trust the user choice. > > BTW this is how the EFI partition looks like in someone's Windows 7 > disk manager: > https://www.easyuefi.com/wintousb/images/en_US/efi-system-partition.png > and this how it looks without any EFI partition in my system (with > Windows 7 / FreeBSD dual-boot) > http://i68.tinypic.com/9u19b8.png > > I think even that NTFS System Reserved partition is not mandatory for > Windows installation. It just used to keep Windows boot files in a > safe, place preventing accidental deletion by a user. It's being > created if Windows is installed on an empty disk but if you create > just one big NTFS partition prior to the Windows installation and > install it on that single partition it will be ok. There will be just > more Windows system files on the C disk, for example ntldr, > NTDETECT.COM. It can be checked on VM, for example on VirtualBox. > ___ The problem with the new installer appears to be that it follows this heuristic when you boot FreeBSD media from a USB stick or similar media: 1. If the system CAN boot EFI then it WILL EFI boot the FreeBSD installer from that media. 2. The installer sees that it booted from EFI. It also sees a fixed disk with a non-EFI boot environment. It declares that fixed disk environment "non-bootable", which is not by any means a known fact. 3. Having done that it will then "offer" to re-initialize the disk as EFI/GPT, which is ok if you don't have anything else on there that you want. If you DO then it's obviously not ok, and in that instance it both won't load the MBR boot manager *and* won't load the second-stage MBR boot code either. You can get around this by hand-installing both parts of the boot code, which is what I wound up doing on my Lenovo laptop. That machine was originally delivered with Windows 7 and upgraded "in place" to Win10, which is why the disk is MBR-partitioned rather than EFI/GPT, although the machine itself does support EFI booting. I would suggest that the FreeBSD installer should be more-intelligent about this, but I suspect it's a fairly uncommon set of circumstances. Far more troublesome in the EFI world is the fact that "out-of-the-box" multi-boot in an EFI environment is a five-alarm pain in the butt although there are EFI boot managers that make it reasonable. 
-- Karl Denninger k...@denninger.net <mailto:k...@denninger.net> /The Market Ticker/ /[S/MIME encrypted email preferred]/ smime.p7s Description: S/MIME Cryptographic Signature
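(Concretely, "hand-installing both parts of the boot code" on an MBR disk like this amounts to roughly the following -- a sketch, assuming the disk is ada0 and FreeBSD lives in slice 3:

# boot0cfg -B -v ada0                      (install the boot0 boot manager into the MBR)
# gpart bootcode -b /boot/boot ada0s3      (install the second-stage boot blocks into the BSD-labeled slice)

The second command is the same one shown in the earlier messages further down the thread.)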
Re: Installing amd64 FreeBSD 11.1 in dual-boot with Windows 7 on an MBR partitioned disk
On 10/6/2017 10:42, Karl Denninger wrote: > On 10/6/2017 10:17, Rostislav Krasny wrote: >> Hi there, >> >> I try to install amd64 FreeBSD 11.1 in dual-boot with Windows 7 on an >> MBR partitioned disk and I can't make it bootable. My Windows 7 uses >> its standard MBR partitioning scheme (1. 100MB System Reserved >> Partition; 2 - 127GB disk C partition) and there is about 112GB of >> free unallocated disk space that I want to use to install FreeBSD on >> it. As an installation media I use the >> FreeBSD-11.1-RELEASE-amd64-mini-memstick.img flashed on a USB drive. >> >> During the installation, if I choose to use Guided Partitioning Tool >> and automatic partitioning of the free space, I get a pop-up message >> that says: >> >> == >> The existing partition scheme on this disk >> (MBR) is not bootable on this platform. To >> install FreeBSD, it must be repartitioned. >> This will destroy all data on the disk. >> Are you sure you want to proceed? >> [Yes] [No] >> == >> >> If instead of the Guided Partitioning Tool I choose to partition the >> disk manually I get a similar message as a warning and the >> installation process continues without an error, but the installed >> FreeBSD system is not bootable. Installing boot0 manually (boot0cfg >> -Bv /dev/ada0) doesn't fix it. The boot0 boot loader is able to boot >> Windows but it's unable to start the FreeBSD boot process. It only >> prints hash symbols when I press F3 (the FreeBSD slice/MBR partition >> number). >> >> I consider this as a critical bug. But maybe there is some workaround >> that allows me to install the FreeBSD 11.1 as a second OS without >> repartitioning the entire disk? >> >> My hardware is an Intel Core i7 4790 3.6GHz based machine with 16GB >> RAM. The ada0 disk is 238GB SanDisk SD8SBAT256G1122 (SSD). >> > You have to do the partitioning and then install FreeBSD's boot > manager by hand. It /does /work; I ran into the same thing with my > Lenovo X220 and was able to manually install it, which works fine to > dual-boot between Windows and FreeBSD-11. I had to do it manually > because the installer detected that the X220 was UEFI capable and > insisted on GPT-partitioning the disk, which is incompatible with > dual-boot and the existing MBR-partitioned Windows installation. > > You want the partition layout to look like this: > > $ gpart show > => 63 500118129 ada0 MBR (238G) > 63 4208967 1 ntfs (2.0G) > 4209030 307481727 2 ntfs (147G) > 311690757 3 - free - (1.5K) > 311690760 165675008 3 freebsd [active] (79G) > 477365768 808957 - free - (395M) > 478174725 21928725 4 ntfs (10G) > 500103450 14742 - free - (7.2M) > > => 0 165675008 ada0s3 BSD (79G) > 0 8388608 1 freebsd-ufs (4.0G) > 8388608 136314880 2 freebsd-ufs (65G) > 144703488 20971519 4 freebsd-swap (10G) > 165675007 1 - free - (512B) > > MBR has only four partitions; the "standard" Windows (7+) install uses > /three. /The "boot"/repair area, the main partition and, on most > machines, a "recovery" partition. That usually leaves partition 3 > free which is where I stuck FreeBSD. Note that you must then set up > slices on Partition 3 (e.g. root/usr/swap) as usual. > BTW if you're getting the "#" when you hit the partition key that means the /second stage /boot loader is /not /on the partition you selected; the bootmanager can't find it. This can be manually installed with: # gpart bootcode -b /boot/boot ada0s3 "s3" is the partition in question upon which you created the BSD-labeled structure. 
One thing to be aware of is that you must adjust Windows group policy if you intend to use Bitlocker, or it will declare the disk structure changed and refuse to take your key (demanding the recovery key) whenever the FreeBSD boot manager changes the active "next boot" flag. By default /any /change in the boot structure will set off Bitlocker; you can relax it to not get so cranked, but you need to do so /before /encrypting the partition under Windows. I run GELI encryption on the FreeBSD partition which is why I have a separate (small) boot filesystem; that too has to be manually set up for an installation like this using MBR. It works well. -- Karl Denninger k...@denninger.net <mailto:k...@denninger.net> /The Market Ticker/ /[S/MIME encrypted email preferred]/ smime.p7s Description: S/MIME Cryptographic Signature
Re: Installing amd64 FreeBSD 11.1 in dual-boot with Windows 7 on an MBR partitioned disk
On 10/6/2017 10:17, Rostislav Krasny wrote: Hi there, I try to install amd64 FreeBSD 11.1 in dual-boot with Windows 7 on an MBR partitioned disk and I can't make it bootable. My Windows 7 uses its standard MBR partitioning scheme (1. 100MB System Reserved Partition; 2 - 127GB disk C partition) and there is about 112GB of free unallocated disk space that I want to use to install FreeBSD on it. As an installation media I use the FreeBSD-11.1-RELEASE-amd64-mini-memstick.img flashed on a USB drive. During the installation, if I choose to use Guided Partitioning Tool and automatic partitioning of the free space, I get a pop-up message that says: == The existing partition scheme on this disk (MBR) is not bootable on this platform. To install FreeBSD, it must be repartitioned. This will destroy all data on the disk. Are you sure you want to proceed? [Yes] [No] == If instead of the Guided Partitioning Tool I choose to partition the disk manually I get a similar message as a warning and the installation process continues without an error, but the installed FreeBSD system is not bootable. Installing boot0 manually (boot0cfg -Bv /dev/ada0) doesn't fix it. The boot0 boot loader is able to boot Windows but it's unable to start the FreeBSD boot process. It only prints hash symbols when I press F3 (the FreeBSD slice/MBR partition number). I consider this as a critical bug. But maybe there is some workaround that allows me to install the FreeBSD 11.1 as a second OS without repartitioning the entire disk? My hardware is an Intel Core i7 4790 3.6GHz based machine with 16GB RAM. The ada0 disk is 238GB SanDisk SD8SBAT256G1122 (SSD). You have to do the partitioning and then install FreeBSD's boot manager by hand. It /does /work; I ran into the same thing with my Lenovo X220 and was able to manually install it, which works fine to dual-boot between Windows and FreeBSD-11. I had to do it manually because the installer detected that the X220 was UEFI capable and insisted on GPT-partitioning the disk, which is incompatible with dual-boot and the existing MBR-partitioned Windows installation. You want the partition layout to look like this: $ gpart show => 63 500118129 ada0 MBR (238G) 63 4208967 1 ntfs (2.0G) 4209030 307481727 2 ntfs (147G) 311690757 3 - free - (1.5K) 311690760 165675008 3 freebsd [active] (79G) 477365768 808957 - free - (395M) 478174725 21928725 4 ntfs (10G) 500103450 14742 - free - (7.2M) => 0 165675008 ada0s3 BSD (79G) 0 8388608 1 freebsd-ufs (4.0G) 8388608 136314880 2 freebsd-ufs (65G) 144703488 20971519 4 freebsd-swap (10G) 165675007 1 - free - (512B) MBR has only four partitions; the "standard" Windows (7+) install uses /three. /The "boot"/repair area, the main partition and, on most machines, a "recovery" partition. That usually leaves partition 3 free which is where I stuck FreeBSD. Note that you must then set up slices on Partition 3 (e.g. root/usr/swap) as usual. -- Karl Denninger k...@denninger.net <mailto:k...@denninger.net> /The Market Ticker/ /[S/MIME encrypted email preferred]/ smime.p7s Description: S/MIME Cryptographic Signature
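(A rough sketch of carving that layout out of the free space by hand -- the slice number and sizes are taken loosely from the gpart output above and will differ on other disks:

# gpart add -t freebsd -i 3 ada0            (create the FreeBSD slice in the unallocated gap)
# gpart set -a active -i 3 ada0
# gpart create -s BSD ada0s3
# gpart add -t freebsd-ufs -s 4G ada0s3     (small root/boot filesystem)
# gpart add -t freebsd-ufs -s 65G ada0s3    (main filesystem)
# gpart add -t freebsd-swap -i 4 ada0s3     (swap in the remainder)

followed by newfs on the UFS partitions and the boot code installation described in the follow-ups.)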
Re: issues with powerd/freq_levels
On 8/2/2017 13:06, Ian Smith wrote: > Is it working on the others? Does it actually idle at 600MHz? If in > doubt, running 'powerd -v' for a while will show you what's happening. > Despite being low power, running slower when more or less idle - along > with hopefully getting to use C2 state - should cool these down a lot. > Yes. "powerd -v" sez (once it gets going) load 3%, current freq 600 MHz ( 2), wanted freq 600 MHz load 6%, current freq 600 MHz ( 2), wanted freq 600 MHz load 0%, current freq 600 MHz ( 2), wanted freq 600 MHz load 0%, current freq 600 MHz ( 2), wanted freq 600 MHz load 0%, current freq 600 MHz ( 2), wanted freq 600 MHz load 8%, current freq 600 MHz ( 2), wanted freq 600 MHz load 4%, current freq 600 MHz ( 2), wanted freq 600 MHz load 3%, current freq 600 MHz ( 2), wanted freq 600 MHz load 4%, current freq 600 MHz ( 2), wanted freq 600 MHz load 0%, current freq 600 MHz ( 2), wanted freq 600 MHz load 14%, current freq 600 MHz ( 2), wanted freq 600 MHz dev.cpu.3.temperature: 58.5C dev.cpu.2.temperature: 58.5C dev.cpu.1.temperature: 58.5C dev.cpu.0.temperature: 58.5C -- Karl Denninger k...@denninger.net <mailto:k...@denninger.net> /The Market Ticker/ /[S/MIME encrypted email preferred]/ smime.p7s Description: S/MIME Cryptographic Signature
Re: issues with powerd/freq_levels
ure is a totally separate issue. It is VERY sensitive to external > > issue like airflow and position of the CPU in relation to other components > > in the chassis Also, unless you have a lot of cores, you probably should > > set both economy_cx_lowest and performance_cx_lowest to Cmax. Economy > > should default to that, but performance will not as that can cause issues > > on systems with large numbers of cores, so is set to C2. Many such system > > used to disable deeper sleep modes in BIOS, but I am way behind the times > > and don't know about the current state of affairs. Certainly for systems > > with 32 or fewer cores, this should not be an issue. In any case, Cx state > > can sharply impact temperature. > > Indeed. But as these are low-power devices already, it's likely less of > a concern, but maximising efficiency and minimising stress never hurts. > > > Finally, the last case with power levels of -1 for all frequencies is > > probably because the CPU manufacturer (Intel?) has not published this > > information. For a while they were treating this as "proprietary" > > information. Very annoying! It's always something that is not readily > > available. Thi is one reason I suspect your CPUs are not identical. > > Hmm, bought as a batch, that sounds unlikely, though their BIOSes (ono) > may vary, and would be worth checking on each - and BIOS settings, too. > > Danny, is powerd running on all these? I doubt it would load on apu-1 > as it stands. Note these are 'pure' 1/8 factors of 1000, p4tcc-alike, > and I think quite likely indicate that cpufreq(4) failed to initialise? > debug.cpufreq.verbose=1 in /boot/loader.conf might show a clue, with a > verbose dmesg.boot anyway. > > Later: oops, just reread Karl's message, where I was unfaniliar with > different CPUs showing different C-states, and noticing that despite > cpu0 showing C2(io) available, and cx_lowest as C2, yet it used 100% C1 > state, which was all that was available to cpu1 to 3. > > But then I twigged to Karl's hwpstate errors, so with 'apropos hwpstate' > still showing nothing after all these years, along with other cpufreq(4) > drivers, I used the list search via duckduckgo to finally find one (1) > message, which lead to one detailed thread (that I even bought into!) > > https://lists.freebsd.org/pipermail/freebsd-stable/2012-May/subject.html > https://lists.freebsd.org/pipermail/freebsd-stable/2012-June/thread.html > > /hwpstate Note the May one needs following by Subject, else it splits > into 5 separate threads (?) > > Which may be interesting to cpufreq nerds, but had me remember that > hwpstate(0) is for AMD not Intel CPUs. So now I'm totally confused :) > > Danny, do your results from Karl's sysctl listings agree with his? These are not Intel CPUs; they are an embedded AMD 64-bit CPU. The specs on the one I have are: * CPU: AMD Embedded G series GX-412TC, 1 GHz quad Jaguar core with 64 bit and AES-NI support, 32K data + 32K instruction cache per core, shared 2MB L2 cache. * DRAM: 2 or 4 GB DDR3-1333 DRAM * Storage: Boot from m-SATA SSD, SD card (internal sdhci controller), or external USB. 1 SATA + power connector. * 12V DC, about 6 to 12W depending on CPU load. 
Jack = 2.5 mm, center positive * Connectivity: 2 or 3 Gigabit Ethernet channels (Intel i211AT on apu2b2, i210AT on apu2b4) * I/O: DB9 serial port, 2 USB 3.0 external + + 2 USB 2.0 internal, three front panel LEDs, pushbutton * Expansion: 2 miniPCI express (one with SIM socket), LPC bus, GPIO header, I2C bus, COM2 (3.3V RXD / TXD) * Board size: 6 x 6" (152.4 x 152.4 mm) - same as apu1d, alix2d13 and wrap1e. * Firmware: coreboot <http://www.coreboot.org/> (please contact supp...@pcengines.ch for source code if desired). * Cooling: Conductive cooling from the CPU to the enclosure using a 3 mm alu heat spreader (included). The one I have here is a 2Gb RAM/2 IGB Ethernet interface unit. They're surprisingly capable for their size, conductive cooling and (especially) price. As a firewall/VPN ingress point they perform nicely. I boot the one I have here from an SD card in a NanoBSD config but you can stick a mSATA SSD (laptop-computer style) in the case and boot from that if you want (I've tried it; the internal BIOS it comes with boots from it just fine.) -- Karl Denninger k...@denninger.net <mailto:k...@denninger.net> /The Market Ticker/ /[S/MIME encrypted email preferred]/ smime.p7s Description: S/MIME Cryptographic Signature
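(For completeness, the knobs being discussed are all reachable from /etc/rc.conf; a minimal sketch, with the values as examples rather than recommendations:

powerd_enable="YES"
powerd_flags="-a hiadaptive -b adaptive"   # powerd(8) policy on AC / battery
performance_cx_lowest="Cmax"               # deepest C-state, AC profile
economy_cx_lowest="Cmax"                   # deepest C-state, battery profile

The cx_lowest settings end up in hw.acpi.cpu.cx_lowest and dev.cpu.N.cx_lowest, which is what the sysctl output quoted in this thread is reporting.)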
Re: issues with powerd/freq_levels
ew PCEngines unit here running 11.0-STABLE and this is what I have in the related sysctls: $ sysctl -a|grep cpu.0 dev.cpu.0.cx_method: C1/hlt C2/io dev.cpu.0.cx_usage_counters: 2261969965 3038 dev.cpu.0.cx_usage: 99.99% 0.00% last 798us dev.cpu.0.cx_lowest: C2 dev.cpu.0.cx_supported: C1/1/0 C2/2/400 dev.cpu.0.freq_levels: 1000/924 800/760 600/571 dev.cpu.0.freq: 1000 dev.cpu.0.temperature: 59.2C dev.cpu.0.%parent: acpi0 dev.cpu.0.%pnpinfo: _HID=none _UID=0 dev.cpu.0.%location: handle=\_PR_.P000 dev.cpu.0.%driver: cpu dev.cpu.0.%desc: ACPI CPU $ sysctl -a|grep cx hw.acpi.cpu.cx_lowest: C2 dev.cpu.3.cx_method: C1/hlt dev.cpu.3.cx_usage_counters: 111298364 dev.cpu.3.cx_usage: 100.00% last 30us dev.cpu.3.cx_lowest: C2 dev.cpu.3.cx_supported: C1/1/0 dev.cpu.2.cx_method: C1/hlt dev.cpu.2.cx_usage_counters: 127978480 dev.cpu.2.cx_usage: 100.00% last 35us dev.cpu.2.cx_lowest: C2 dev.cpu.2.cx_supported: C1/1/0 dev.cpu.1.cx_method: C1/hlt dev.cpu.1.cx_usage_counters: 108161434 dev.cpu.1.cx_usage: 100.00% last 29us dev.cpu.1.cx_lowest: C2 dev.cpu.1.cx_supported: C1/1/0 dev.cpu.0.cx_method: C1/hlt C2/io dev.cpu.0.cx_usage_counters: 2261916773 3038 dev.cpu.0.cx_usage: 99.99% 0.00% last 378us dev.cpu.0.cx_lowest: C2 dev.cpu.0.cx_supported: C1/1/0 C2/2/400 These are fanless, 4-core devices that are pretty cool -- they've got AES instructions in them and thus make very nice VPN gateways running something like Strongswan, and come with either 2 or 3 gigabit interfaces on the board. Oh, and they run on 12V. Powerd is logging this, however... hwpstate0: set freq failed, err 6 hwpstate0: set freq failed, err 6 H -- Karl Denninger k...@denninger.net <mailto:k...@denninger.net> /The Market Ticker/ /[S/MIME encrypted email preferred]/ smime.p7s Description: S/MIME Cryptographic Signature
Re: Interesting permissions difference on NanoBSD build
On 6/16/2017 09:55, Karl Denninger wrote: > On 6/16/2017 08:21, Karl Denninger wrote: >> On 6/16/2017 07:52, Guido Falsi wrote: >>> On 06/16/17 14:25, Karl Denninger wrote: >>>> I've recently started playing with the "base" NanoBSD scripts and have >>>> run into an interesting issue. >>> [...] >>>> Note the missing "r" bit for "other" in usr and etc directories -- and >>>> the missing "x" bit (at minimum) for the root! The same is carried down >>>> to "local" under usr: >>>> >>>> root@NewFS:/pics/Crochet-work-AMD/obj/_.w # ls -al usr >>>> total 134 >>>> drwxr-x--x 12 root wheel 12 Jun 15 17:10 . >>>> drwxr-x--- 18 root wheel 24 Jun 15 17:10 .. >>>> drwxr-xr-x 2 root wheel 497 Jun 15 17:09 bin >>>> drwxr-xr-x 52 root wheel 327 Jun 15 17:10 include >>>> drwxr-xr-x 8 root wheel 655 Jun 15 17:10 lib >>>> drwxr-xr-x 4 root wheel 670 Jun 15 17:09 lib32 >>>> drwxr-xr-x 5 root wheel5 Jun 15 17:10 libdata >>>> drwxr-xr-x 7 root wheel 70 Jun 15 17:10 libexec >>>> drwxr-x--x 10 root wheel 11 Jun 15 17:10 local >>>> drwxr-xr-x 2 root wheel 294 Jun 15 17:08 sbin >>>> drwxr-xr-x 31 root wheel 31 Jun 15 17:10 share >>>> drwxr-xr-x 14 root wheel 17 Jun 15 17:10 tests >>>> root@NewFS:/pics/Crochet-work-AMD/obj/_.w # >>> I have no idea why this is happening on your system but I'm not >>> observing it here: >>> >>>> ls -al usr >>> total 85 >>> drwxr-xr-x 9 root wheel9 Jun 15 13:32 . >>> drwxr-xr-x 22 root wheel 29 Jun 15 13:32 .. >>> drwxr-xr-x 2 root wheel 359 Jun 15 13:32 bin >>> drwxr-xr-x 4 root wheel 446 Jun 15 13:32 lib >>> drwxr-xr-x 3 root wheel3 Jun 15 13:32 libdata >>> drwxr-xr-x 5 root wheel 47 Jun 15 13:32 libexec >>> drwxr-xr-x 12 root wheel 13 Jun 15 13:32 local >>> drwxr-xr-x 2 root wheel 218 Jun 15 13:32 sbin >>> drwxr-xr-x 17 root wheel 17 Jun 15 13:32 share >>> >>> >>> and I get (almost) the same on the installed nanobsd system: >>>> ls -al usr >>> total 24 >>> drwxr-xr-x 9 root wheel512 Jun 15 13:32 . >>> drwxr-xr-x 23 root wheel512 Jun 15 13:34 .. >>> drwxr-xr-x 2 root wheel 6144 Jun 15 13:32 bin >>> drwxr-xr-x 4 root wheel 10752 Jun 15 13:32 lib >>> drwxr-xr-x 3 root wheel512 Jun 15 13:32 libdata >>> drwxr-xr-x 5 root wheel 1024 Jun 15 13:32 libexec >>> drwxr-xr-x 12 root wheel512 Jun 15 13:32 local >>> drwxr-xr-x 2 root wheel 4096 Jun 15 13:32 sbin >>> drwxr-xr-x 17 root wheel512 Jun 15 13:32 share >>> >>> The machine I'm building the NanoBSD image on is running head r318959, >>> and is running ZFS, while the NanoBSD system I've built is tracking >>> 11-STABLE and is at r319971 at present, so a BETA1. >>> >>> Could you report version information too? maybe it's a problem present >>> on head NanoBSD scripts? >> FreeBSD 11.0-STABLE #15 r312669M: Mon Jan 23 14:01:03 CST 2017 >> k...@newfs.denninger.net:/usr/obj/usr/src/sys/KSD-SMP >> >> I also build using Crochet against both /usr/src (my "primary" source >> repo, which is on the rev noted here) and against a second one (-HEAD), >> which I need to use for the RPI3. Neither winds up with this sort of >> permission issue. >> >> The obj directory is on /pics/Crochet-Work-AMD, which is a zfs >> filesystem mounted off a "scratch" SSD. >> >> The problem appears to stem from the creation of "_.w" and since >> directory permissions are "normally" inherited it promulgates from there >> unless an explicit permission set occurs. Yet I see nothing that would >> create the world directory with anything other than the umask at the >> time it runs. >> >> I *am* running this from "batch" -- perhaps that's where the problem is >> coming from? 
I'll try adding a "umask 022" to the nanobsd.sh script at >> the top and see what that does. > Nope. > > It's something in the installworld subset; I put a stop in after the > clean/create world directory and I have a 0755 permission mask on the > (empty) directory. > > Hmmm... > > I do not know where this is coming from now but this test implies that > it's the "installworld" action
Re: Interesting permissions difference on NanoBSD build
On 6/16/2017 08:21, Karl Denninger wrote: > On 6/16/2017 07:52, Guido Falsi wrote: >> On 06/16/17 14:25, Karl Denninger wrote: >>> I've recently started playing with the "base" NanoBSD scripts and have >>> run into an interesting issue. >> [...] >>> Note the missing "r" bit for "other" in usr and etc directories -- and >>> the missing "x" bit (at minimum) for the root! The same is carried down >>> to "local" under usr: >>> >>> root@NewFS:/pics/Crochet-work-AMD/obj/_.w # ls -al usr >>> total 134 >>> drwxr-x--x 12 root wheel 12 Jun 15 17:10 . >>> drwxr-x--- 18 root wheel 24 Jun 15 17:10 .. >>> drwxr-xr-x 2 root wheel 497 Jun 15 17:09 bin >>> drwxr-xr-x 52 root wheel 327 Jun 15 17:10 include >>> drwxr-xr-x 8 root wheel 655 Jun 15 17:10 lib >>> drwxr-xr-x 4 root wheel 670 Jun 15 17:09 lib32 >>> drwxr-xr-x 5 root wheel5 Jun 15 17:10 libdata >>> drwxr-xr-x 7 root wheel 70 Jun 15 17:10 libexec >>> drwxr-x--x 10 root wheel 11 Jun 15 17:10 local >>> drwxr-xr-x 2 root wheel 294 Jun 15 17:08 sbin >>> drwxr-xr-x 31 root wheel 31 Jun 15 17:10 share >>> drwxr-xr-x 14 root wheel 17 Jun 15 17:10 tests >>> root@NewFS:/pics/Crochet-work-AMD/obj/_.w # >> I have no idea why this is happening on your system but I'm not >> observing it here: >> >>> ls -al usr >> total 85 >> drwxr-xr-x 9 root wheel9 Jun 15 13:32 . >> drwxr-xr-x 22 root wheel 29 Jun 15 13:32 .. >> drwxr-xr-x 2 root wheel 359 Jun 15 13:32 bin >> drwxr-xr-x 4 root wheel 446 Jun 15 13:32 lib >> drwxr-xr-x 3 root wheel3 Jun 15 13:32 libdata >> drwxr-xr-x 5 root wheel 47 Jun 15 13:32 libexec >> drwxr-xr-x 12 root wheel 13 Jun 15 13:32 local >> drwxr-xr-x 2 root wheel 218 Jun 15 13:32 sbin >> drwxr-xr-x 17 root wheel 17 Jun 15 13:32 share >> >> >> and I get (almost) the same on the installed nanobsd system: >>> ls -al usr >> total 24 >> drwxr-xr-x 9 root wheel512 Jun 15 13:32 . >> drwxr-xr-x 23 root wheel512 Jun 15 13:34 .. >> drwxr-xr-x 2 root wheel 6144 Jun 15 13:32 bin >> drwxr-xr-x 4 root wheel 10752 Jun 15 13:32 lib >> drwxr-xr-x 3 root wheel512 Jun 15 13:32 libdata >> drwxr-xr-x 5 root wheel 1024 Jun 15 13:32 libexec >> drwxr-xr-x 12 root wheel512 Jun 15 13:32 local >> drwxr-xr-x 2 root wheel 4096 Jun 15 13:32 sbin >> drwxr-xr-x 17 root wheel512 Jun 15 13:32 share >> >> The machine I'm building the NanoBSD image on is running head r318959, >> and is running ZFS, while the NanoBSD system I've built is tracking >> 11-STABLE and is at r319971 at present, so a BETA1. >> >> Could you report version information too? maybe it's a problem present >> on head NanoBSD scripts? > FreeBSD 11.0-STABLE #15 r312669M: Mon Jan 23 14:01:03 CST 2017 > k...@newfs.denninger.net:/usr/obj/usr/src/sys/KSD-SMP > > I also build using Crochet against both /usr/src (my "primary" source > repo, which is on the rev noted here) and against a second one (-HEAD), > which I need to use for the RPI3. Neither winds up with this sort of > permission issue. > > The obj directory is on /pics/Crochet-Work-AMD, which is a zfs > filesystem mounted off a "scratch" SSD. > > The problem appears to stem from the creation of "_.w" and since > directory permissions are "normally" inherited it promulgates from there > unless an explicit permission set occurs. Yet I see nothing that would > create the world directory with anything other than the umask at the > time it runs. > > I *am* running this from "batch" -- perhaps that's where the problem is > coming from? I'll try adding a "umask 022" to the nanobsd.sh script at > the top and see what that does. Nope. 
It's something in the installworld subset; I put a stop in after the clean/create world directory and I have a 0755 permission mask on the (empty) directory. Hmmm... I do not know where this is coming from now but this test implies that it's the "installworld" action that causes it. root@NewFS:/pics/Crochet-work-AMD/obj # ls -al total 2176760 drwxr-xr-x 5 root wheel 24 Jun 16 09:41 . drwxr-xr-x 3 root wheel 3 Jun 16 08:25 .. -rw-r--r-- 1 root wheel 7658918 Jun 16 09:22 _.bk -rw-r--r-- 1 root wheel53768368 Jun 16 09:15 _.bw -rw-r--r-- 1 root wheel 200 Jun 16 09:25 _.cust.cust_comconsole -rw-r--r-- 1 root wheel 733 J
Re: Interesting permissions difference on NanoBSD build
On 6/16/2017 07:52, Guido Falsi wrote: > On 06/16/17 14:25, Karl Denninger wrote: >> I've recently started playing with the "base" NanoBSD scripts and have >> run into an interesting issue. > [...] >> Note the missing "r" bit for "other" in usr and etc directories -- and >> the missing "x" bit (at minimum) for the root! The same is carried down >> to "local" under usr: >> >> root@NewFS:/pics/Crochet-work-AMD/obj/_.w # ls -al usr >> total 134 >> drwxr-x--x 12 root wheel 12 Jun 15 17:10 . >> drwxr-x--- 18 root wheel 24 Jun 15 17:10 .. >> drwxr-xr-x 2 root wheel 497 Jun 15 17:09 bin >> drwxr-xr-x 52 root wheel 327 Jun 15 17:10 include >> drwxr-xr-x 8 root wheel 655 Jun 15 17:10 lib >> drwxr-xr-x 4 root wheel 670 Jun 15 17:09 lib32 >> drwxr-xr-x 5 root wheel5 Jun 15 17:10 libdata >> drwxr-xr-x 7 root wheel 70 Jun 15 17:10 libexec >> drwxr-x--x 10 root wheel 11 Jun 15 17:10 local >> drwxr-xr-x 2 root wheel 294 Jun 15 17:08 sbin >> drwxr-xr-x 31 root wheel 31 Jun 15 17:10 share >> drwxr-xr-x 14 root wheel 17 Jun 15 17:10 tests >> root@NewFS:/pics/Crochet-work-AMD/obj/_.w # > I have no idea why this is happening on your system but I'm not > observing it here: > >> ls -al usr > total 85 > drwxr-xr-x 9 root wheel9 Jun 15 13:32 . > drwxr-xr-x 22 root wheel 29 Jun 15 13:32 .. > drwxr-xr-x 2 root wheel 359 Jun 15 13:32 bin > drwxr-xr-x 4 root wheel 446 Jun 15 13:32 lib > drwxr-xr-x 3 root wheel3 Jun 15 13:32 libdata > drwxr-xr-x 5 root wheel 47 Jun 15 13:32 libexec > drwxr-xr-x 12 root wheel 13 Jun 15 13:32 local > drwxr-xr-x 2 root wheel 218 Jun 15 13:32 sbin > drwxr-xr-x 17 root wheel 17 Jun 15 13:32 share > > > and I get (almost) the same on the installed nanobsd system: >> ls -al usr > total 24 > drwxr-xr-x 9 root wheel512 Jun 15 13:32 . > drwxr-xr-x 23 root wheel512 Jun 15 13:34 .. > drwxr-xr-x 2 root wheel 6144 Jun 15 13:32 bin > drwxr-xr-x 4 root wheel 10752 Jun 15 13:32 lib > drwxr-xr-x 3 root wheel512 Jun 15 13:32 libdata > drwxr-xr-x 5 root wheel 1024 Jun 15 13:32 libexec > drwxr-xr-x 12 root wheel512 Jun 15 13:32 local > drwxr-xr-x 2 root wheel 4096 Jun 15 13:32 sbin > drwxr-xr-x 17 root wheel512 Jun 15 13:32 share > > The machine I'm building the NanoBSD image on is running head r318959, > and is running ZFS, while the NanoBSD system I've built is tracking > 11-STABLE and is at r319971 at present, so a BETA1. > > Could you report version information too? maybe it's a problem present > on head NanoBSD scripts? FreeBSD 11.0-STABLE #15 r312669M: Mon Jan 23 14:01:03 CST 2017 k...@newfs.denninger.net:/usr/obj/usr/src/sys/KSD-SMP I also build using Crochet against both /usr/src (my "primary" source repo, which is on the rev noted here) and against a second one (-HEAD), which I need to use for the RPI3. Neither winds up with this sort of permission issue. The obj directory is on /pics/Crochet-Work-AMD, which is a zfs filesystem mounted off a "scratch" SSD. The problem appears to stem from the creation of "_.w" and since directory permissions are "normally" inherited it promulgates from there unless an explicit permission set occurs. Yet I see nothing that would create the world directory with anything other than the umask at the time it runs. I *am* running this from "batch" -- perhaps that's where the problem is coming from? I'll try adding a "umask 022" to the nanobsd.sh script at the top and see what that does. -- Karl Denninger k...@denninger.net <mailto:k...@denninger.net> /The Market Ticker/ /[S/MIME encrypted email preferred]/ smime.p7s Description: S/MIME Cryptographic Signature
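(Since the umask experiment mentioned above didn't pan out -- the follow-ups further up the thread point at installworld itself -- one blunt workaround would be a customize function in the nanobsd configuration that forces the expected modes. A hypothetical sketch; cust_fix_perms is my own name, not part of nanobsd:

cust_fix_perms () (
        chmod 755 ${NANO_WORLDDIR} ${NANO_WORLDDIR}/etc ${NANO_WORLDDIR}/usr ${NANO_WORLDDIR}/usr/local
)
customize_cmd cust_fix_perms

That only papers over the symptom, of course; the interesting question remains why installworld produces 0750/0751 directories in this environment.)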
Interesting permissions difference on NanoBSD build
I've recently started playing with the "base" NanoBSD scripts and have run into an interesting issue. Specifically, this is what winds up in the "_.w" (world) directory base when the build completes: root@NewFS:/pics/Crochet-work-AMD/obj/_.w # ls -al total 112 drwxr-x--- 18 root wheel24 Jun 15 17:10 . drwxr-xr-x 5 root wheel24 Jun 15 17:11 .. -rw-r--r-- 2 root wheel 955 Jun 15 17:09 .cshrc -rw-r--r-- 2 root wheel 247 Jun 15 17:09 .profile -r--r--r-- 1 root wheel 6197 Jun 15 17:09 COPYRIGHT drwxr-xr-x 2 root wheel47 Jun 15 17:08 bin drwxr-xr-x 8 root wheel51 Jun 15 17:09 boot -rw-r--r-- 1 root wheel12 Jun 15 17:09 boot.config drwxr-xr-x 2 root wheel 2 Jun 15 17:09 cfg drwxr-xr-x 4 root wheel 4 Jun 15 17:10 conf dr-xr-xr-x 2 root wheel 3 Jun 15 17:09 dev drwxr-x--x 28 root wheel 110 Jun 15 17:10 etc drwxr-xr-x 4 root wheel56 Jun 15 17:08 lib drwxr-xr-x 3 root wheel 5 Jun 15 17:09 libexec drwxr-xr-x 2 root wheel 2 Jun 15 17:07 media drwxr-xr-x 2 root wheel 2 Jun 15 17:07 mnt dr-xr-xr-x 2 root wheel 2 Jun 15 17:07 proc drwxr-xr-x 2 root wheel 146 Jun 15 17:08 rescue drwxr-xr-x 2 root wheel12 Jun 15 17:10 root drwxr-xr-x 2 root wheel 137 Jun 15 17:08 sbin lrwxr-xr-x 1 root wheel11 Jun 15 17:07 sys -> usr/src/sys lrwxr-xr-x 1 root wheel 7 Jun 15 17:10 tmp -> var/tmp drwxr-x--x 12 root wheel12 Jun 15 17:10 usr drwxr-xr-x 25 root wheel25 Jun 15 17:10 var root@NewFS:/pics/Crochet-work-AMD/obj/_.w # Note the missing "r" bit for "other" in usr and etc directories -- and the missing "x" bit (at minimum) for the root! The same is carried down to "local" under usr: root@NewFS:/pics/Crochet-work-AMD/obj/_.w # ls -al usr total 134 drwxr-x--x 12 root wheel 12 Jun 15 17:10 . drwxr-x--- 18 root wheel 24 Jun 15 17:10 .. drwxr-xr-x 2 root wheel 497 Jun 15 17:09 bin drwxr-xr-x 52 root wheel 327 Jun 15 17:10 include drwxr-xr-x 8 root wheel 655 Jun 15 17:10 lib drwxr-xr-x 4 root wheel 670 Jun 15 17:09 lib32 drwxr-xr-x 5 root wheel5 Jun 15 17:10 libdata drwxr-xr-x 7 root wheel 70 Jun 15 17:10 libexec drwxr-x--x 10 root wheel 11 Jun 15 17:10 local drwxr-xr-x 2 root wheel 294 Jun 15 17:08 sbin drwxr-xr-x 31 root wheel 31 Jun 15 17:10 share drwxr-xr-x 14 root wheel 17 Jun 15 17:10 tests root@NewFS:/pics/Crochet-work-AMD/obj/_.w # I do not know if this is intentional, but it certainly was not expected. It does carry through to the disk image that is created as well and then there's this, which if you mount the image leads me to wonder what's going on: root@NewFS:/pics/Crochet-work-AMD/obj # mount -o ro /dev/md0s1a /mnt root@NewFS:/pics/Crochet-work-AMD/obj # cd /mnt root@NewFS:/mnt # ls -al total 34 drwxr-x--- 19 root wheel 512 Jun 15 17:10 . drwxr-xr-x 45 root wheel 55 Jun 1 10:58 .. 
-rw-r--r-- 2 root wheel 955 Jun 15 17:09 .cshrc -rw-r--r-- 2 root wheel 247 Jun 15 17:09 .profile drwxrwxr-x 2 root operator 512 Jun 15 17:10 .snap -r--r--r-- 1 root wheel 6197 Jun 15 17:09 COPYRIGHT drwxr-xr-x 2 root wheel 1024 Jun 15 17:08 bin drwxr-xr-x 8 root wheel 1024 Jun 15 17:09 boot -rw-r--r-- 1 root wheel 12 Jun 15 17:09 boot.config drwxr-xr-x 2 root wheel 512 Jun 15 17:09 cfg drwxr-xr-x 4 root wheel 512 Jun 15 17:10 conf dr-xr-xr-x 2 root wheel 512 Jun 15 17:09 dev drwxr-x--x 28 root wheel 2048 Jun 15 17:10 etc drwxr-xr-x 4 root wheel 1536 Jun 15 17:08 lib drwxr-xr-x 3 root wheel 512 Jun 15 17:09 libexec drwxr-xr-x 2 root wheel 512 Jun 15 17:07 media drwxr-xr-x 2 root wheel 512 Jun 15 17:07 mnt dr-xr-xr-x 2 root wheel 512 Jun 15 17:07 proc drwxr-xr-x 2 root wheel 2560 Jun 15 17:08 rescue drwxr-xr-x 2 root wheel 512 Jun 15 17:10 root drwxr-xr-x 2 root wheel 2560 Jun 15 17:08 sbin lrwxr-xr-x 1 root wheel 11 Jun 15 17:07 sys -> usr/src/sys lrwxr-xr-x 1 root wheel7 Jun 15 17:10 tmp -> var/tmp drwxr-x--x 12 root wheel 512 Jun 15 17:10 usr drwxr-xr-x 25 root wheel 512 Jun 15 17:10 var Note the permissions at the root -- that denies *search* for others it is an exact copy of the "_.w" permission list of course, but if you create a non-root user as a part of the NanoBSD build you wind up with some "interesting" behavior when that user logs in! I'm assuming this is unintentional but wondering where it comes from (and whether it needs / should be fixed); it's easy to fix it, of course, once the embedded system boots but you need to (obviously) mount read/write long enough to update it -- Karl Denninger k...@denninger.net <mailto:k...@denninger.net> /The Market Ticker/ /[S/MIME encrypted email preferred]/ smime.p7s Description: S/MIME Cryptographic Signature
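(For an image that has already been built, the fix the last paragraph alludes to might look like the following -- a sketch reusing the md0s1a device from the listing above:

# mount /dev/md0s1a /mnt
# chmod 755 /mnt /mnt/etc /mnt/usr /mnt/usr/local
# umount /mnt

or the equivalent chmod on the running system's root filesystem after remounting it read/write.)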
Re: FreeBSD 10.2-RELEASE #0 r286666: Panic and crash
On 2/6/2017 15:01, Shawn Bakhtiar wrote: > Hi all! > > http://pastebin.com/niXrjF0D > > Please refer to full output from crash above. > > This morning our IMAP server decided to go belly up. I could not remote in, > and the machine would not respond to any pings. > > Checking the physical console I had the following worrisome messages on > screen: > > • g_vfs_done():da1p1[READ(offset=7265561772032, length=32768)]error = 5 > • g_vfs_done():da1p1[WRITE(offset=7267957735424, length=131072)]error = 16 > • /mnt/USBBD: got error 16 while accessing filesystem > • panic: softdep_deallocate_dependencies: unrecovered I/O error > • cpuid = 5 > > /mnt/USBDB is a MyBook USB 8TB drive that we use for daily backups of the > IMAP data using rsync. Everything so far has worked without issue. > > I also noticed a bunch of: > > • fstat: can't read file 2 at 0x41f > • fstat: can't read file 4 at 0x78 > • fstat: can't read file 5 at 0x6 > • fstat: can't read file 1 at 0x27f > • fstat: can't read file 2 at 0x41f > • fstat: can't read file 4 at 0x78 > • fstat: can't read file 5 at 0x6 > > > but I have no idea what these are from. > > df -h output: > /dev/da0p21.8T226G1.5T13%/ > devfs 1.0K1.0K 0B 100%/dev > /dev/da1p17.0T251G6.2T 4%/mnt/USBBD > > > da0p2 is a RAID level 5 on an HP Smart Array > > Here is the output of dmsg after reboot: > http://pastebin.com/rHVjgZ82 > > Obviously both the RAID and USB drive did not walk away from the crash > cleaning. Should I be running a fsck at this point on both from single user > mode to verify and clean up. My concern is the: > WARNING: /: mount pending error: blocks 0 files 26 > when mounting /dev/da0p2 > > For some reason I was under the impression that fsck was run automatically on > reboot. > > Any help in this matter would be greatly appreciated. I'm a little concerned > that a backup strategy that has worked for us for many MANY years would so > easily throw the OS into panic. If an I/O error occurred on the USB Drive I > would frankly think it should just back out, without panic. Or am I missing > something? > > Any recommendations / insights would be most welcome. > Shawn > > The "mount pending error" is normal on a disk that has softupdates turned on; fsck runs in the background after the boot, and this is "safe" because of how the metadata and data writes are ordered. In other words the filesystem in this situation is missing uncommitted data, but the state of the system is consistent. As a result the system can mount root read-write without having to fsck it first and the background cleanup is safe from a disk consistency problem. The panic itself appears to have resulted from an I/O error that resulted in a failed operation. I was part of a thread in 2016 on this you can find here: https://lists.freebsd.org/pipermail/freebsd-stable/2016-July/084944.html The basic problem is that the softupdates code cannot deal with a hard I/O error on write because it no longer can guarantee filesystem integrity if it continues. I argued in that thread that the superior solution would be forcibly detach the volume, which would leave you with a "dirty" filesystem and a failed operation but not a panic. The file(s) involved in the write error might be lost, but the integrity of the filesystem is recoverable (as it is in the panic case) -- at least it is if the fsck doesn't require writing to a block that *also* errors out. The decision in the code is to panic rather than detach the volume, however, so panic it is. 
This one has bitten me with SD cards in small embedded-style machines (where turning off softupdates makes things VERY slow), and at some point I may look into developing a patch to forcibly detach the volume instead. That obviously won't help you if the system volume is the one the error happens on (now you've just forcibly detached the root filesystem, which is going to get you an immediate panic anyway), but in the event of a data disk it would prevent the system from crashing. -- Karl Denninger k...@denninger.net <mailto:k...@denninger.net> /The Market Ticker/ /[S/MIME encrypted email preferred]/ smime.p7s Description: S/MIME Cryptographic Signature
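(As to the immediate "should I run fsck" question, a conservative sequence for the backup volume might be -- a sketch, using the da1p1 device from the report above:

# umount /mnt/USBBD              (if it is still mounted)
# fsck_ufs -y /dev/da1p1
# tunefs -p /dev/da1p1           (prints, among other things, whether soft updates are enabled)

The background fsck that runs after boot covers the "mount pending error" on the root filesystem; a foreground fsck of da0p2 from single-user mode should only be necessary if that background pass reports problems it cannot fix.)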
Re: Ugh -- attempted to update this morning, and got a nasty panic in ZFS....
A second attempt to come up on the new kernel was successful -- so this had to be due to queued I/Os that were pending at the time of the shutdown On 1/11/2017 08:31, Karl Denninger wrote: > During the reboot, immediately after the daemons started up on the > machine (the boot got beyond mounting all the disks and was well into > starting up all the background stuff it runs), I got a double-fault. > > . (there were a LOT more of this same; it pretty clearly was a > recursive call sequence that ran the system out of stack space) > > #294 0x822fdcfd in zio_execute (zio=) > at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:1666 > #295 0x8230130e in zio_vdev_io_start (zio=0xf8010c8f27b0) > at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:3127 > #296 0x822fdcfd in zio_execute (zio=) > at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:1666 > #297 0x822e464d in vdev_queue_io_done (zio=) > at > /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_queue.c:913 > #298 0x823014c9 in zio_vdev_io_done (zio=0xf8010cff0b88) > at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:3152 > #299 0x822fdcfd in zio_execute (zio=) > at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:1666 > #300 0x8230130e in zio_vdev_io_start (zio=0xf8010cff0b88) > at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:3127 > #301 0x822fdcfd in zio_execute (zio=) > at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:1666 > #302 0x822e464d in vdev_queue_io_done (zio=) > at > /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_queue.c:913 > #303 0x823014c9 in zio_vdev_io_done (zio=0xf8010c962000) > at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:3152 > #304 0x822fdcfd in zio_execute (zio=) > at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:1666 > #305 0x8230130e in zio_vdev_io_start (zio=0xf8010c962000) > at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:3127 > #306 0x822fdcfd in zio_execute (zio=) > at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:1666 > #307 0x822e464d in vdev_queue_io_done (zio=) > at > /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_queue.c:913 > #308 0x823014c9 in zio_vdev_io_done (zio=0xf80102175000) > at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:3152 > #309 0x822fdcfd in zio_execute (zio=) > at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:1666 > #310 0x80b2585a in taskqueue_run_locked (queue= out>) > at /usr/src/sys/kern/subr_taskqueue.c:454 > #311 0x80b26a48 in taskqueue_thread_loop (arg=) > at /usr/src/sys/kern/subr_taskqueue.c:724 > #312 0x80a7eb05 in fork_exit ( > callout=0x80b26960 , > arg=0xf800b8824c30, frame=0xfe0667430c00) > at /usr/src/sys/kern/kern_fork.c:1040 > #313 0x80f87c3e in fork_trampoline () > at /usr/src/sys/amd64/amd64/exception.S:611 > #314 0x in ?? () > Current language: auto; currently minimal > (kgdb) > > . > > > NewFS.denninger.net dumped core - see /var/crash/vmcore.3 > > Wed Jan 11 08:15:33 CST 2017 > > FreeBSD NewFS.denninger.net 11.0-STABLE FreeBSD 11.0-STABLE #14 > r311927M: Wed Ja > n 11 07:55:20 CST 2017 > k...@newfs.denninger.net:/usr/obj/usr/src/sys/KSD-SMP > amd64 > > panic: double fault > > GNU gdb 6.1.1 [FreeBSD] > Copyright 2004 Free Software Foundation, Inc. > GDB is free software, covered by the GNU General Public License, and you are > welcome to change it and/or distribute copies of it under certain > conditions. 
> Type "show copying" to see the conditions. > There is absolutely no warranty for GDB. Type "show warranty" for details. > This GDB was configured as "amd64-marcel-freebsd"... > > Unread portion of the kernel message buffer: > > Fatal double fault > rip = 0x822e3c5d > rsp = 0xfe066742af90 > rbp = 0xfe066742b420 > cpuid = 15; apic id = 35 > panic: double fault > cpuid = 15 > KDB: stack backtrace: > db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame > 0xfe0649ddee30 > vpanic() at vpanic+0x186/frame 0xfe0649ddeeb0 > panic() at panic+0x43/frame 0xfe0649ddef10 > dblfault_handler() at dblfault_handler+0xa2/frame 0xfe0649ddef30 > Xdblfault() at Xdblfault+0xac/frame 0xfe0649ddef30 > --- trap 0x17, rip = 0x822e3c5d, rsp = 0xfe066742af90, rbp = > 0xf > e0
Ugh -- attempted to update this morning, and got a nasty panic in ZFS....
blem -- and setting stackpages didn't help! I've got the dump if anything in particular would be of help. The prompt to do this in the first place was the openssh CVE that was recently issued. -- Karl Denninger k...@denninger.net <mailto:k...@denninger.net> /The Market Ticker/ /[S/MIME encrypted email preferred]/ smime.p7s Description: S/MIME Cryptographic Signature
Re: Repeatable panic on ZFS filesystem (used for backups); 11.0-STABLE
On 10/17/2016 18:32, Steven Hartland wrote: > > On 17/10/2016 22:50, Karl Denninger wrote: >> I will make some effort on the sandbox machine to see if I can come up >> with a way to replicate this. I do have plenty of spare larger drives >> laying around that used to be in service and were obsolesced due to >> capacity -- but what I don't know if whether the system will misbehave >> if the source is all spinning rust. >> >> In other words: >> >> 1. Root filesystem is mirrored spinning rust (production is mirrored >> SSDs) >> >> 2. Backup is mirrored spinning rust (of approx the same size) >> >> 3. Set up auto-snapshot exactly as the production system has now (which >> the sandbox is NOT since I don't care about incremental recovery on that >> machine; it's a sandbox!) >> >> 4. Run a bunch of build-somethings (e.g. buildworlds, cross-build for >> the Pi2s I have here, etc) to generate a LOT of filesystem entropy >> across lots of snapshots. >> >> 5. Back that up. >> >> 6. Export the backup pool. >> >> 7. Re-import it and "zfs destroy -r" the backup filesystem. >> >> That is what got me in a reboot loop after the *first* panic; I was >> simply going to destroy the backup filesystem and re-run the backup, but >> as soon as I issued that zfs destroy the machine panic'd and as soon as >> I re-attached it after a reboot it panic'd again. Repeat until I set >> trim=0. >> >> But... if I CAN replicate it that still shouldn't be happening, and the >> system should *certainly* survive attempting to TRIM on a vdev that >> doesn't support TRIMs, even if the removal is for a large amount of >> space and/or files on the target, without blowing up. >> >> BTW I bet it isn't that rare -- if you're taking timed snapshots on an >> active filesystem (with lots of entropy) and then make the mistake of >> trying to remove those snapshots (as is the case with a zfs destroy -r >> or a zfs recv of an incremental copy that attempts to sync against a >> source) on a pool that has been imported before the system realizes that >> TRIM is unavailable on those vdevs. >> >> Noting this: >> >> Yes need to find some time to have a look at it, but given how rare >> this is and with TRIM being re-implemented upstream in a totally >> different manor I'm reticent to spend any real time on it. >> >> What's in-process in this regard, if you happen to have a reference? > Looks like it may be still in review: https://reviews.csiden.org/r/263/ > > Having increased the kernel stack page count I have not had another instance of this in the last couple of weeks+, and I am running daily backup jobs as usual... So this *does not* appear to be an infinite recursion problem... -- Karl Denninger k...@denninger.net <mailto:k...@denninger.net> /The Market Ticker/ /[S/MIME encrypted email preferred]/ smime.p7s Description: S/MIME Cryptographic Signature
Re: Repeatable panic on ZFS filesystem (used for backups); 11.0-STABLE
On 10/17/2016 18:32, Steven Hartland wrote: > > > On 17/10/2016 22:50, Karl Denninger wrote: >> I will make some effort on the sandbox machine to see if I can come up >> with a way to replicate this. I do have plenty of spare larger drives >> laying around that used to be in service and were obsolesced due to >> capacity -- but what I don't know if whether the system will misbehave >> if the source is all spinning rust. >> >> In other words: >> >> 1. Root filesystem is mirrored spinning rust (production is mirrored >> SSDs) >> >> 2. Backup is mirrored spinning rust (of approx the same size) >> >> 3. Set up auto-snapshot exactly as the production system has now (which >> the sandbox is NOT since I don't care about incremental recovery on that >> machine; it's a sandbox!) >> >> 4. Run a bunch of build-somethings (e.g. buildworlds, cross-build for >> the Pi2s I have here, etc) to generate a LOT of filesystem entropy >> across lots of snapshots. >> >> 5. Back that up. >> >> 6. Export the backup pool. >> >> 7. Re-import it and "zfs destroy -r" the backup filesystem. >> >> That is what got me in a reboot loop after the *first* panic; I was >> simply going to destroy the backup filesystem and re-run the backup, but >> as soon as I issued that zfs destroy the machine panic'd and as soon as >> I re-attached it after a reboot it panic'd again. Repeat until I set >> trim=0. >> >> But... if I CAN replicate it that still shouldn't be happening, and the >> system should *certainly* survive attempting to TRIM on a vdev that >> doesn't support TRIMs, even if the removal is for a large amount of >> space and/or files on the target, without blowing up. >> >> BTW I bet it isn't that rare -- if you're taking timed snapshots on an >> active filesystem (with lots of entropy) and then make the mistake of >> trying to remove those snapshots (as is the case with a zfs destroy -r >> or a zfs recv of an incremental copy that attempts to sync against a >> source) on a pool that has been imported before the system realizes that >> TRIM is unavailable on those vdevs. >> >> Noting this: >> >> Yes need to find some time to have a look at it, but given how rare >> this is and with TRIM being re-implemented upstream in a totally >> different manor I'm reticent to spend any real time on it. >> >> What's in-process in this regard, if you happen to have a reference? > Looks like it may be still in review: https://reviews.csiden.org/r/263/ > > Initial attempts to provoke the panic has failed on the sandbox machine -- it appears that I need a materially-fragmented backup volume (which makes sense, as that would greatly increase the number of TRIM's queued.) Running a bunch of builds with snapshots taken between generates a metric ton of entropy in the filesystem, but it appears that the number of TRIMs actually issued when you bulk-remove them (with zfs destroy -r) is small enough to not cause it -- probably because the system issues one per area of freed disk, and since there is no interleaving with other (non-removed) data that number is "reasonable" since there's little fragmentation of that free space. The TRIMs *are* attempted, and they *do* fail, however. I'm running with the 6 pages of kstack now on the production machine, and we'll see if I get another panic... -- Karl Denninger k...@denninger.net <mailto:k...@denninger.net> /The Market Ticker/ /[S/MIME encrypted email preferred]/ smime.p7s Description: S/MIME Cryptographic Signature
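For reference, the "6 pages of kstack" mentioned above is a single boot-time tunable; a minimal sketch of the /boot/loader.conf entry plus a post-reboot check follows, with the value 6 taken from the suggestion earlier in this thread rather than from any official guidance:
# /boot/loader.conf -- raise the per-thread kernel stack from the amd64
# default of 4 pages to 6; the tunable is only read at boot, so a reboot
# is required for it to take effect
kern.kstack_pages="6"

# after rebooting, confirm the running value
sysctl kern.kstack_pages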
Re: Repeatable panic on ZFS filesystem (used for backups); 11.0-STABLE
I will make some effort on the sandbox machine to see if I can come up with a way to replicate this. I do have plenty of spare larger drives laying around that used to be in service and were obsolesced due to capacity -- but what I don't know if whether the system will misbehave if the source is all spinning rust. In other words: 1. Root filesystem is mirrored spinning rust (production is mirrored SSDs) 2. Backup is mirrored spinning rust (of approx the same size) 3. Set up auto-snapshot exactly as the production system has now (which the sandbox is NOT since I don't care about incremental recovery on that machine; it's a sandbox!) 4. Run a bunch of build-somethings (e.g. buildworlds, cross-build for the Pi2s I have here, etc) to generate a LOT of filesystem entropy across lots of snapshots. 5. Back that up. 6. Export the backup pool. 7. Re-import it and "zfs destroy -r" the backup filesystem. That is what got me in a reboot loop after the *first* panic; I was simply going to destroy the backup filesystem and re-run the backup, but as soon as I issued that zfs destroy the machine panic'd and as soon as I re-attached it after a reboot it panic'd again. Repeat until I set trim=0. But... if I CAN replicate it that still shouldn't be happening, and the system should *certainly* survive attempting to TRIM on a vdev that doesn't support TRIMs, even if the removal is for a large amount of space and/or files on the target, without blowing up. BTW I bet it isn't that rare -- if you're taking timed snapshots on an active filesystem (with lots of entropy) and then make the mistake of trying to remove those snapshots (as is the case with a zfs destroy -r or a zfs recv of an incremental copy that attempts to sync against a source) on a pool that has been imported before the system realizes that TRIM is unavailable on those vdevs. Noting this: Yes need to find some time to have a look at it, but given how rare this is and with TRIM being re-implemented upstream in a totally different manor I'm reticent to spend any real time on it. What's in-process in this regard, if you happen to have a reference? On 10/17/2016 16:40, Steven Hartland wrote: > Setting those values will only effect what's queued to the device not > what's actually outstanding. > > On 17/10/2016 21:22, Karl Denninger wrote: >> Since I cleared it (by setting TRIM off on the test machine, rebooting, >> importing the pool and noting that it did not panic -- pulled drives, >> re-inserted into the production machine and ran backup routine -- all >> was normal) it may be a while before I see it again (a week or so is >> usual.) >> >> It appears to be related to entropy in the filesystem that comes up as >> "eligible" to be removed from the backup volume, which (not >> surprisingly) tends to happen a few days after I do a new world build or >> something similar (the daily and/or periodic snapshots roll off at about >> that point.) >> >> I don't happen to have a spare pair of high-performance SSDs I can stick >> in the sandbox machine in an attempt to force the condition to assert >> itself in test, unfortunately. >> >> I *am* concerned that it's not "simple" stack exhaustion because setting >> the max outstanding TRIMs on a per-vdev basis down quite-dramatically >> did *not* prevent it from happening -- and if it was simply stack depth >> related I would have expected that to put a stop to it. >> >> On 10/17/2016 15:16, Steven Hartland wrote: >>> Be good to confirm its not an infinite loop by giving it a good bump >>> first. 
>>> >>> On 17/10/2016 19:58, Karl Denninger wrote: >>>> I can certainly attempt setting that higher but is that not just >>>> hiding the problem rather than addressing it? >>>> >>>> >>>> On 10/17/2016 13:54, Steven Hartland wrote: >>>>> You're hitting stack exhaustion, have you tried increasing the kernel >>>>> stack pages? >>>>> It can be changed from /boot/loader.conf >>>>> kern.kstack_pages="6" >>>>> >>>>> Default on amd64 is 4 IIRC >>>>> >>>>> On 17/10/2016 19:08, Karl Denninger wrote: >>>>>> The target (and devices that trigger this) are a pair of 4Gb 7200RPM >>>>>> SATA rotating rust drives (zmirror) with each provider >>>>>> geli-encrypted >>>>>> (that is, the actual devices used for the pool create are the >>>>>> .eli's) >>>>>> >>>>>> The machine generating the problem has both rotating rust device
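As a concrete sketch of steps 5 through 7 of the replication recipe above (the pool name "backup" matches the rest of this thread; the dataset name is purely illustrative):
# step 6: after the backup completes, export the backup pool
zpool export backup
# step 7: re-import it and bulk-remove the backed-up filesystem;
# on an affected system this destroy is what triggers the panic
zpool import backup
zfs destroy -r backup/some-dataset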
Re: Repeatable panic on ZFS filesystem (used for backups); 11.0-STABLE
Since I cleared it (by setting TRIM off on the test machine, rebooting, importing the pool and noting that it did not panic -- pulled drives, re-inserted into the production machine and ran backup routine -- all was normal) it may be a while before I see it again (a week or so is usual.) It appears to be related to entropy in the filesystem that comes up as "eligible" to be removed from the backup volume, which (not surprisingly) tends to happen a few days after I do a new world build or something similar (the daily and/or periodic snapshots roll off at about that point.) I don't happen to have a spare pair of high-performance SSDs I can stick in the sandbox machine in an attempt to force the condition to assert itself in test, unfortunately. I *am* concerned that it's not "simple" stack exhaustion because setting the max outstanding TRIMs on a per-vdev basis down quite-dramatically did *not* prevent it from happening -- and if it was simply stack depth related I would have expected that to put a stop to it. On 10/17/2016 15:16, Steven Hartland wrote: > Be good to confirm its not an infinite loop by giving it a good bump > first. > > On 17/10/2016 19:58, Karl Denninger wrote: >> I can certainly attempt setting that higher but is that not just >> hiding the problem rather than addressing it? >> >> >> On 10/17/2016 13:54, Steven Hartland wrote: >>> You're hitting stack exhaustion, have you tried increasing the kernel >>> stack pages? >>> It can be changed from /boot/loader.conf >>> kern.kstack_pages="6" >>> >>> Default on amd64 is 4 IIRC >>> >>> On 17/10/2016 19:08, Karl Denninger wrote: >>>> The target (and devices that trigger this) are a pair of 4Gb 7200RPM >>>> SATA rotating rust drives (zmirror) with each provider geli-encrypted >>>> (that is, the actual devices used for the pool create are the .eli's) >>>> >>>> The machine generating the problem has both rotating rust devices >>>> *and* >>>> SSDs, so I can't simply shut TRIM off system-wide and call it a day as >>>> TRIM itself is heavily-used; both the boot/root pools and a Postgresql >>>> database pool are on SSDs, while several terabytes of lesser-used data >>>> is on a pool of Raidz2 that is made up of spinning rust. >>> snip... >>>> NewFS.denninger.net dumped core - see /var/crash/vmcore.1 >>>> >>>> Mon Oct 17 09:02:33 CDT 2016 >>>> >>>> FreeBSD NewFS.denninger.net 11.0-STABLE FreeBSD 11.0-STABLE #13 >>>> r307318M: Fri Oct 14 09:23:46 CDT 2016 >>>> k...@newfs.denninger.net:/usr/obj/usr/src/sys/KSD-SMP amd64 >>>> >>>> panic: double fault >>>> >>>> GNU gdb 6.1.1 [FreeBSD] >>>> Copyright 2004 Free Software Foundation, Inc. >>>> GDB is free software, covered by the GNU General Public License, and >>>> you are >>>> welcome to change it and/or distribute copies of it under certain >>>> conditions. >>>> Type "show copying" to see the conditions. >>>> There is absolutely no warranty for GDB. Type "show warranty" for >>>> details. >>>> This GDB was configured as "amd64-marcel-freebsd"... 
>>>> >>>> Unread portion of the kernel message buffer: >>>> >>>> Fatal double fault >>>> rip = 0x8220d9ec >>>> rsp = 0xfe066821f000 >>>> rbp = 0xfe066821f020 >>>> cpuid = 6; apic id = 14 >>>> panic: double fault >>>> cpuid = 6 >>>> KDB: stack backtrace: >>>> db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame >>>> 0xfe0649d78e30 >>>> vpanic() at vpanic+0x182/frame 0xfe0649d78eb0 >>>> panic() at panic+0x43/frame 0xfe0649d78f10 >>>> dblfault_handler() at dblfault_handler+0xa2/frame 0xfe0649d78f30 >>>> Xdblfault() at Xdblfault+0xac/frame 0xfe0649d78f30 >>>> --- trap 0x17, rip = 0x8220d9ec, rsp = 0xfe066821f000, >>>> rbp = >>>> 0xfe066821f020 --- >>>> avl_rotation() at avl_rotation+0xc/frame 0xfe066821f020 >>>> avl_remove() at avl_remove+0x1c8/frame 0xfe066821f070 >>>> vdev_queue_io_to_issue() at vdev_queue_io_to_issue+0x87f/frame >>>> 0xfe066821f530 >>>> vdev_queue_io_done() at vdev_queue_io_done+0x83/frame >>>> 0xfe066821f570 >>>> zio_vdev_io_done() at zio_vdev_io_done+0xd9/frame 0xfe066821f5a0 >>>&g
Re: Repeatable panic on ZFS filesystem (used for backups); 11.0-STABLE
I can certainly attempt setting that higher but is that not just hiding the problem rather than addressing it? On 10/17/2016 13:54, Steven Hartland wrote: > You're hitting stack exhaustion, have you tried increasing the kernel > stack pages? > It can be changed from /boot/loader.conf > kern.kstack_pages="6" > > Default on amd64 is 4 IIRC > > On 17/10/2016 19:08, Karl Denninger wrote: >> The target (and devices that trigger this) are a pair of 4Gb 7200RPM >> SATA rotating rust drives (zmirror) with each provider geli-encrypted >> (that is, the actual devices used for the pool create are the .eli's) >> >> The machine generating the problem has both rotating rust devices *and* >> SSDs, so I can't simply shut TRIM off system-wide and call it a day as >> TRIM itself is heavily-used; both the boot/root pools and a Postgresql >> database pool are on SSDs, while several terabytes of lesser-used data >> is on a pool of Raidz2 that is made up of spinning rust. > snip... >> >> NewFS.denninger.net dumped core - see /var/crash/vmcore.1 >> >> Mon Oct 17 09:02:33 CDT 2016 >> >> FreeBSD NewFS.denninger.net 11.0-STABLE FreeBSD 11.0-STABLE #13 >> r307318M: Fri Oct 14 09:23:46 CDT 2016 >> k...@newfs.denninger.net:/usr/obj/usr/src/sys/KSD-SMP amd64 >> >> panic: double fault >> >> GNU gdb 6.1.1 [FreeBSD] >> Copyright 2004 Free Software Foundation, Inc. >> GDB is free software, covered by the GNU General Public License, and >> you are >> welcome to change it and/or distribute copies of it under certain >> conditions. >> Type "show copying" to see the conditions. >> There is absolutely no warranty for GDB. Type "show warranty" for >> details. >> This GDB was configured as "amd64-marcel-freebsd"... >> >> Unread portion of the kernel message buffer: >> >> Fatal double fault >> rip = 0x8220d9ec >> rsp = 0xfe066821f000 >> rbp = 0xfe066821f020 >> cpuid = 6; apic id = 14 >> panic: double fault >> cpuid = 6 >> KDB: stack backtrace: >> db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame >> 0xfe0649d78e30 >> vpanic() at vpanic+0x182/frame 0xfe0649d78eb0 >> panic() at panic+0x43/frame 0xfe0649d78f10 >> dblfault_handler() at dblfault_handler+0xa2/frame 0xfe0649d78f30 >> Xdblfault() at Xdblfault+0xac/frame 0xfe0649d78f30 >> --- trap 0x17, rip = 0x8220d9ec, rsp = 0xfe066821f000, rbp = >> 0xfe066821f020 --- >> avl_rotation() at avl_rotation+0xc/frame 0xfe066821f020 >> avl_remove() at avl_remove+0x1c8/frame 0xfe066821f070 >> vdev_queue_io_to_issue() at vdev_queue_io_to_issue+0x87f/frame >> 0xfe066821f530 >> vdev_queue_io_done() at vdev_queue_io_done+0x83/frame 0xfe066821f570 >> zio_vdev_io_done() at zio_vdev_io_done+0xd9/frame 0xfe066821f5a0 >> zio_execute() at zio_execute+0x23d/frame 0xfe066821f5f0 >> zio_vdev_io_start() at zio_vdev_io_start+0x34d/frame 0xfe066821f650 >> zio_execute() at zio_execute+0x23d/frame 0xfe066821f6a0 >> vdev_queue_io_done() at vdev_queue_io_done+0xcd/frame 0xfe066821f6e0 >> zio_vdev_io_done() at zio_vdev_io_done+0xd9/frame 0xfe066821f710 >> zio_execute() at zio_execute+0x23d/frame 0xfe066821f760 >> zio_vdev_io_start() at zio_vdev_io_start+0x34d/frame 0xfe066821f7c0 >> zio_execute() at zio_execute+0x23d/frame 0xfe066821f810 >> vdev_queue_io_done() at vdev_queue_io_done+0xcd/frame 0xfe066821f850 >> zio_vdev_io_done() at zio_vdev_io_done+0xd9/frame 0xfe066821f880 >> zio_execute() at zio_execute+0x23d/frame 0xfe066821f8d0 >> zio_vdev_io_start() at zio_vdev_io_start+0x34d/frame 0xfe066821f930 >> zio_execute() at zio_execute+0x23d/frame 0xfe066821f980 >> vdev_queue_io_done() at 
vdev_queue_io_done+0xcd/frame 0xfe066821f9c0 >> zio_vdev_io_done() at zio_vdev_io_done+0xd9/frame 0xfe066821f9f0 >> zio_execute() at zio_execute+0x23d/frame 0xfe066821fa40 >> zio_vdev_io_start() at zio_vdev_io_start+0x34d/frame 0xfe066821faa0 >> zio_execute() at zio_execute+0x23d/frame 0xfe066821faf0 >> vdev_queue_io_done() at vdev_queue_io_done+0xcd/frame 0xfe066821fb30 >> zio_vdev_io_done() at zio_vdev_io_done+0xd9/frame 0xfe066821fb60 >> zio_execute() at zio_execute+0x23d/frame 0xfe066821fbb0 >> zio_vdev_io_start() at zio_vdev_io_start+0x34d/frame 0xfe066821fc10 >> zio_execute() at zio_execute+0x23d/frame 0xfe066821fc60 >> vdev_queue_io_done() at vdev_queue_io_done+0xcd/frame 0xfe066821fca0 >> zio_vdev_io_d
Re: Repeatable panic on ZFS filesystem (used for backups); 11.0-STABLE
The target (and devices that trigger this) are a pair of 4Gb 7200RPM SATA rotating rust drives (zmirror) with each provider geli-encrypted (that is, the actual devices used for the pool create are the .eli's) The machine generating the problem has both rotating rust devices *and* SSDs, so I can't simply shut TRIM off system-wide and call it a day as TRIM itself is heavily-used; both the boot/root pools and a Postgresql database pool are on SSDs, while several terabytes of lesser-used data is on a pool of Raidz2 that is made up of spinning rust. vfs.zfs.trim.max_interval: 1 vfs.zfs.trim.timeout: 30 vfs.zfs.trim.txg_delay: 32 vfs.zfs.trim.enabled: 1 vfs.zfs.vdev.trim_max_pending: 1 vfs.zfs.vdev.trim_max_active: 64 vfs.zfs.vdev.trim_min_active: 1 vfs.zfs.vdev.trim_on_init: 1 kstat.zfs.misc.zio_trim.failed: 0 kstat.zfs.misc.zio_trim.unsupported: 1080 kstat.zfs.misc.zio_trim.success: 573768 kstat.zfs.misc.zio_trim.bytes: 28964282368 The machine in question has been up for ~3 hours now since the last panic, so obviously TRIM is being heavily used... The issue, once the problem has been created, is *portable* and it is not being caused by the SSD source drives. That is, once the machine panics if I remove the two disks that form the backup pool, physically move them to my sandbox machine, geli attach the drives and import the pool within seconds the second machine will panic in the identical fashion. It's possible (but have not proved) that if I were to reboot enough times the filesystem would eventually reach consistency with the removed snapshots all gone and the panics would stop, but I got a half-dozen of them sequentially this morning on my test machine so I'm not at all sure how many more I'd need to allow to run, or whether *any* of the removals committed before the panic (if not then the cycle of reboot/attach/panic would never end) :-) Reducing trim_max_active (to 10, a quite-drastic reduction) did not stop the panics. What appears to be happening is that the removal of the datasets in question on a reasonably newly-imported pool, whether it occurs by the incremental zfs recv -Fudv or by zfs destroy -r from the command line, generates a large number of TRIM requests which are of course rejected by the providers as spinning rust does not support them. However the attempt to queue them generates a stack overflow and double-fault panic as a result, and since once the command is issued the filesystem now has the deletes pending and the consistent state is in fact with them gone, any attempt to reattach the drives with TRIM enabled can result in an immediate additional panic. I tried to work around this in my backup script by creating and then destroying a file on the backup volume, then sleeping for a few seconds before the backup actually commenced, in the hope that this would (1) trigger a TRIM attempt and (2) lead the system to recognize that the target volume cannot support TRIM and thus stop trying to do so (and thus not lead to the storm that exhausts the stack and panic.) That approach, however (see below), failed to prevent the problem. 
# # Now try to trigger TRIM so that we don't have a storm of them # echo "Attempting to disable TRIM on spinning rust" mount -t zfs backup/no-trim /mnt dd if=/dev/random of=/mnt/kill-trim bs=128k count=2 echo "Performed 2 writes" sleep 2 rm /mnt/kill-trim echo "Performed delete of written file" sleep 5 umount /mnt echo "Unmounted temporary filesystem" sleep 2 echo "TRIM disable theoretically done" On 10/17/2016 12:43, Warner Losh wrote: > what's your underlying media? > > Warner > > > On Mon, Oct 17, 2016 at 10:02 AM, Karl Denninger wrote: >> Update from my test system: >> >> Setting vfs.zfs.vdev_trim_max_active to 10 (from default 64) does *not* >> stop the panics. >> >> Setting vfs.zfs.vdev.trim.enabled = 0 (which requires a reboot) DOES >> stop the panics. >> >> I am going to run a scrub on the pack, but I suspect the pack itself >> (now that I can actually mount it without the machine blowing up!) is fine. >> >> THIS (OBVIOUSLY) NEEDS ATTENTION! >> >> On 10/17/2016 09:17, Karl Denninger wrote: >>> This is a situation I've had happen before, and reported -- it appeared >>> to be a kernel stack overflow, and it has gotten materially worse on >>> 11.0-STABLE. >>> >>> The issue occurs after some period of time (normally a week or so.) The >>> system has a mirrored pair of large drives used for backup purposes to >>> which ZFS snapshots are written using a script that iterates over the >>> system. >>> >>> The panic /only /happens when the root filesystem is being sent, and it >>> appears that the panic itself is being triggered by an I/O pattern
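The counters quoted in the sysctl dump above are the quickest way to watch this happen; a short sketch using the same names, which should show the "unsupported" counter climbing while the backup filesystem is being destroyed on spinning rust:
# TRIM requests issued, and how many were rejected by providers that
# cannot honor them (the geli-backed spinning-rust mirror)
sysctl kstat.zfs.misc.zio_trim.success
sysctl kstat.zfs.misc.zio_trim.unsupported
sysctl kstat.zfs.misc.zio_trim.failed
# whether ZFS TRIM is enabled at all for this boot
sysctl vfs.zfs.trim.enabled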
Re: Repeatable panic on ZFS filesystem (used for backups); 11.0-STABLE
Update from my test system: Setting vfs.zfs.vdev_trim_max_active to 10 (from default 64) does *not* stop the panics. Setting vfs.zfs.vdev.trim.enabled = 0 (which requires a reboot) DOES stop the panics. I am going to run a scrub on the pack, but I suspect the pack itself (now that I can actually mount it without the machine blowing up!) is fine. THIS (OBVIOUSLY) NEEDS ATTENTION! On 10/17/2016 09:17, Karl Denninger wrote: > This is a situation I've had happen before, and reported -- it appeared > to be a kernel stack overflow, and it has gotten materially worse on > 11.0-STABLE. > > The issue occurs after some period of time (normally a week or so.) The > system has a mirrored pair of large drives used for backup purposes to > which ZFS snapshots are written using a script that iterates over the > system. > > The panic /only /happens when the root filesystem is being sent, and it > appears that the panic itself is being triggered by an I/O pattern on > the /backup /drive -- not the source drives. Zpool scrubs on the source > are clean; I am going to run one now on the backup, but in the past that > has been clean as well. > > I now have a *repeatable* panic in that if I attempt a "zfs list -rt all > backup" on the backup volume I get the below panic. A "zfs list" > does*__*not panic the system. > > The operating theory previously (after digging through the passed > structures in the dump) was that the ZFS system was attempting to issue > TRIMs on a device that can't do them before the ZFS system realizes this > and stops asking (the backup volume is comprised of spinning rust) but > the appearance of the panic now on the volume when I simply do a "zfs > list -rt all backup" appears to negate that theory since no writes are > performed by that operation, and thus no TRIM calls should be issued. > > I can leave the backup volume in the state that causes this for a short > period of time in an attempt to find and fix this. > > > NewFS.denninger.net dumped core - see /var/crash/vmcore.1 > > Mon Oct 17 09:02:33 CDT 2016 > > FreeBSD NewFS.denninger.net 11.0-STABLE FreeBSD 11.0-STABLE #13 > r307318M: Fri Oct 14 09:23:46 CDT 2016 > k...@newfs.denninger.net:/usr/obj/usr/src/sys/KSD-SMP amd64 > > panic: double fault > > GNU gdb 6.1.1 [FreeBSD] > Copyright 2004 Free Software Foundation, Inc. > GDB is free software, covered by the GNU General Public License, and you are > welcome to change it and/or distribute copies of it under certain > conditions. > Type "show copying" to see the conditions. > There is absolutely no warranty for GDB. Type "show warranty" for details. > This GDB was configured as "amd64-marcel-freebsd"... 
> > Unread portion of the kernel message buffer: > > Fatal double fault > rip = 0x8220d9ec > rsp = 0xfe066821f000 > rbp = 0xfe066821f020 > cpuid = 6; apic id = 14 > panic: double fault > cpuid = 6 > KDB: stack backtrace: > db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame > 0xfe0649d78e30 > vpanic() at vpanic+0x182/frame 0xfe0649d78eb0 > panic() at panic+0x43/frame 0xfe0649d78f10 > dblfault_handler() at dblfault_handler+0xa2/frame 0xfe0649d78f30 > Xdblfault() at Xdblfault+0xac/frame 0xfe0649d78f30 > --- trap 0x17, rip = 0x8220d9ec, rsp = 0xfe066821f000, rbp = > 0xfe066821f020 --- > avl_rotation() at avl_rotation+0xc/frame 0xfe066821f020 > avl_remove() at avl_remove+0x1c8/frame 0xfe066821f070 > vdev_queue_io_to_issue() at vdev_queue_io_to_issue+0x87f/frame > 0xfe066821f530 > vdev_queue_io_done() at vdev_queue_io_done+0x83/frame 0xfe066821f570 > zio_vdev_io_done() at zio_vdev_io_done+0xd9/frame 0xfe066821f5a0 > zio_execute() at zio_execute+0x23d/frame 0xfe066821f5f0 > zio_vdev_io_start() at zio_vdev_io_start+0x34d/frame 0xfe066821f650 > zio_execute() at zio_execute+0x23d/frame 0xfe066821f6a0 > vdev_queue_io_done() at vdev_queue_io_done+0xcd/frame 0xfe066821f6e0 > zio_vdev_io_done() at zio_vdev_io_done+0xd9/frame 0xfe066821f710 > zio_execute() at zio_execute+0x23d/frame 0xfe066821f760 > zio_vdev_io_start() at zio_vdev_io_start+0x34d/frame 0xfe066821f7c0 > zio_execute() at zio_execute+0x23d/frame 0xfe066821f810 > vdev_queue_io_done() at vdev_queue_io_done+0xcd/frame 0xfe066821f850 > zio_vdev_io_done() at zio_vdev_io_done+0xd9/frame 0xfe066821f880 > zio_execute() at zio_execute+0x23d/frame 0xfe066821f8d0 > zio_vdev_io_start() at zio_vdev_io_start+0x34d/frame 0xfe066821f930 > zio_execute() at zio_execute+0x23d/frame 0xfe066821f980 > vdev_queue_io_done() at vdev_queue_io_done+0xcd/frame 0xfe066821f9c0 > zio_
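The workaround that held, turning TRIM off before the pool is touched, is a boot-time setting; a minimal /boot/loader.conf sketch follows, using the tunable names as they appear in the sysctl dump earlier in the thread (the message above spells them slightly differently), and noting that on this box it also disables TRIM for the SSD pools, which is exactly the trade-off discussed here:
# /boot/loader.conf -- disable ZFS TRIM entirely; only read at boot,
# hence the reboot noted above
vfs.zfs.trim.enabled="0"

# the queue-depth knob that was tried first (runtime-settable) and
# did not prevent the panics:
sysctl vfs.zfs.vdev.trim_max_active=10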
Repeatable panic on ZFS filesystem (used for backups); 11.0-STABLE
t /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_queue.c:913 #170 0x822bf919 in zio_vdev_io_done (zio=0xf8056e47c000) at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:3137 #171 0x822bbefd in zio_execute (zio=) at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:1651 #172 0x822bf75d in zio_vdev_io_start (zio=0xf8056e47c000) at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:3112 #173 0x822bbefd in zio_execute (zio=) at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:1651 #174 0x822a216d in vdev_queue_io_done (zio=) at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_queue.c:913 #175 0x822bf919 in zio_vdev_io_done (zio=0xf800b17b23d8) at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:3137 #176 0x822bbefd in zio_execute (zio=) at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:1651 #177 0x822bf75d in zio_vdev_io_start (zio=0xf800b17b23d8) at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:3112 #178 0x822bbefd in zio_execute (zio=) at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:1651 #179 0x822a216d in vdev_queue_io_done (zio=) at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_queue.c:913 #180 0x822bf919 in zio_vdev_io_done (zio=0xf800a6d367b0) at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:3137 #181 0x822bbefd in zio_execute (zio=) at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:1651 #182 0x822bf75d in zio_vdev_io_start (zio=0xf800a6d367b0) at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:3112 #183 0x822bbefd in zio_execute (zio=) at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:1651 #184 0x822a216d in vdev_queue_io_done (zio=) at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_queue.c:913 #185 0x822bf919 in zio_vdev_io_done (zio=0xf8056e99fb88) at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:3137 #186 0x822bbefd in zio_execute (zio=) at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:1651 #187 0x822bf75d in zio_vdev_io_start (zio=0xf8056e99fb88) at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:3112 #188 0x822bbefd in zio_execute (zio=) at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:1651 #189 0x822a216d in vdev_queue_io_done (zio=) at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_queue.c:913 #190 0x822bf919 in zio_vdev_io_done (zio=0xf8056e5227b0) at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:3137 #191 0x822bbefd in zio_execute (zio=) at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:1651 #192 0x822bf75d in zio_vdev_io_start (zio=0xf8056e5227b0) at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:3112 #193 0x822bbefd in zio_execute (zio=) at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:1651 #194 0x822a216d in vdev_queue_io_done (zio=) at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_queue.c:913 #195 0x822bf919 in zio_vdev_io_done (zio=0xf800a6d3c000) at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:3137 #196 0x822bbefd in zio_execute (zio=) at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:1651 #197 0x80b4895a in taskqueue_run_locked (queue=) at /usr/src/sys/kern/subr_taskqueue.c:454 #198 0x80b49b58 in taskqueue_thread_loop (arg=) at /usr/src/sys/kern/subr_taskqueue.c:724 #199 0x80a9f255 in fork_exit ( callout=0x80b49a70 , arg=0xf8057caa72a0, frame=0xfe0668222c00) at 
/usr/src/sys/kern/kern_fork.c:1040 #200 0x80fb44ae in fork_trampoline () at /usr/src/sys/amd64/amd64/exception.S:611 #201 0x in ?? () Current language: auto; currently minimal (kgdb) -- Karl Denninger k...@denninger.net <mailto:k...@denninger.net> /The Market Ticker/ /[S/MIME encrypted email preferred]/ smime.p7s Description: S/MIME Cryptographic Signature
Errata notice for 11.0 -- question
I noted this /after /svn updating my 11.x box to the most-current -STABLE: A bug was diagnosed in interaction of the |pmap_activate()| function and TLB shootdown IPI handler on amd64 systems which have PCID features but do not implement the INVPCID instruction. On such machines, such as SandyBridge™ and IvyBridge™ microarchitectures, set the loader tunable |vm.pmap.pcid_enabled=0| during boot: set vm.pmap.pcid_enabled=0 boot Add this line to |/boot/loader.conf| for the change to persist across reboots: vm.pmap.pcid_enabled=0 To check if the system is affected, check dmesg(8) <http://www.freebsd.org/cgi/man.cgi?query=dmesg&sektion=8&manpath=freebsd-release-ports> for PCID listed in the "Features2", and absence of INVPCID in the "Structured Extended Features". If the PCID feature is not present, or INVPCID is present, the system is not affected. Well, I'm allegedly subject to this: Copyright (c) 1992-2016 The FreeBSD Project. Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994 The Regents of the University of California. All rights reserved. FreeBSD is a registered trademark of The FreeBSD Foundation. FreeBSD 11.0-STABLE #13 r307318M: Fri Oct 14 09:23:46 CDT 2016 k...@newfs.denninger.net:/usr/obj/usr/src/sys/KSD-SMP amd64 FreeBSD clang version 3.8.0 (tags/RELEASE_380/final 262564) (based on LLVM 3.8.0) VT(vga): text 80x25 CPU: Intel(R) Xeon(R) CPU E5620 @ 2.40GHz (2400.13-MHz K8-class CPU) Origin="GenuineIntel" Id=0x206c2 Family=0x6 Model=0x2c Stepping=2 Features=0xbfebfbff Features2=0x29ee3ff AMD Features=0x2c100800 AMD Features2=0x1 VT-x: PAT,HLT,MTF,PAUSE,EPT,UG,VPID TSC: P-state invariant, performance statistics And I do _*not*_ have it turned off at this moment. vm.pmap.invpcid_works: 0 vm.pmap.pcid_enabled: 1 But I've also yet to see any sort of misbehavior. So the questions are: 1. What's the misbehavior I should expect to see if this is not shut off (and it isn't)? 2. Should I take this machine down immediately to reboot it with vm.pmap.pcid_enabled=0 in /boot/loader.conf? An svn log perusal didn't turn up anything post the errata date that appears to be related, which makes me wonder. Thanks in advance! -- Karl Denninger k...@denninger.net <mailto:k...@denninger.net> /The Market Ticker/ /[S/MIME encrypted email preferred]/ smime.p7s Description: S/MIME Cryptographic Signature
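As a worked example of the check the errata describes, something along the following lines should answer question 2 for a given box; the grep patterns simply follow the dmesg field names quoted in the notice:
# affected only if PCID is listed here...
dmesg | grep "Features2"
# ...and INVPCID does NOT appear here
dmesg | grep "Structured Extended Features"
# current state of the workaround and of INVPCID support
sysctl vm.pmap.pcid_enabled vm.pmap.invpcid_works
# workaround from the notice: add the tunable and reboot
echo 'vm.pmap.pcid_enabled=0' >> /boot/loader.conf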
Re: FreeBSD 11.0-RC1 regression with regard to mouse integration in VirtualBox 5.1.4
On 8/23/2016 12:48, David Boyd wrote: > Using FreeBSD 10.3-RELEASE-p6 with virtualbox-guest-additions 5.0.26 on > VirtualBox 5.1.4 (CentOS EL7 host) as a baseline I didn't experience any > difficulties. > > After fresh install of FreeBSD 11.0-RC1 with virtualbox-guest-additions > 5.0.26 on VirtualBox 5.1.4 (CentOS EL7 host) mouse integration is > missing. > > I have time and resources to test any changes you have to suggest. > > Thanks. > Does the mouse normally attach as what appears to be a USB port? If so the problem is likely here: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=211884 -- Karl Denninger k...@denninger.net <mailto:k...@denninger.net> /The Market Ticker/ /[S/MIME encrypted email preferred]/ smime.p7s Description: S/MIME Cryptographic Signature
Re: Delay with 11.0-RC2 builds
On 8/22/2016 23:01, Glen Barber wrote: > On Mon, Aug 22, 2016 at 10:53:06PM -0500, Karl Denninger wrote: >> On 8/22/2016 22:43, Glen Barber wrote: >>> On Thu, Aug 18, 2016 at 11:30:24PM +, Glen Barber wrote: >>>> Two issues have been brought to our attention, and as a result, 11.0-RC2 >>>> builds will be delayed a day or two while these are investigated. >>>> >>>> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=211872 >>>> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=211926 >>>> >>>> An update will be sent if the delay is longer than anticipated. >>>> >>> Just an update, the 11.0-RC2 will be delayed at least two days. One of >>> the issues mentioned in the above PR URLs does not affect releng/11.0, >>> and is a non-issue, but we are awaiting one more change to the stable/11 >>> and releng/11.0 branches that we hope will be the final major changes to >>> 11.0. >>> >>> If this is the case, we may be able to eliminate 11.0-RC3 entirely, and >>> still release on time (or, on time as the current schedule suggests). >>> >>> However, as you know, FreeBSD releases prioritize quality over schedule, >>> so we may still need to adjust the schedule appropriately. >>> >>> So, help with testing 11.0-RC1 (or the latest releng/11.0 from svn) is >>> greatly appreciated. >>> >>> Glen >>> On behalf of: re@ >>> >> Has any decision been made on this? >> >> It is not local to me (others have reported problems with combination >> devices) and rolling back the change in question eliminates the >> problem. It remains un-triaged as of this point. >> >> Note that this impacts a system that is booting and needs manual >> intervention, which is not a good place to a have a problem >> >> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=211884 >> > Well, it's an EN candidate if we cannot get it fixed before the release. > But, I've put it on our radar, which it was not on mine previously... > > Glen Thank you. As far as I can tell reverting that one commit (which results in just one file being rolled back with a handful of lines) fixes it. The other PR (which is linked in this one) reporter also reported that reverting that same commit fixes the problem for him as well. -- Karl Denninger k...@denninger.net <mailto:k...@denninger.net> /The Market Ticker/ /[S/MIME encrypted email preferred]/ smime.p7s Description: S/MIME Cryptographic Signature
Re: Delay with 11.0-RC2 builds
On 8/22/2016 22:43, Glen Barber wrote: > On Thu, Aug 18, 2016 at 11:30:24PM +, Glen Barber wrote: >> Two issues have been brought to our attention, and as a result, 11.0-RC2 >> builds will be delayed a day or two while these are investigated. >> >> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=211872 >> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=211926 >> >> An update will be sent if the delay is longer than anticipated. >> > Just an update, the 11.0-RC2 will be delayed at least two days. One of > the issues mentioned in the above PR URLs does not affect releng/11.0, > and is a non-issue, but we are awaiting one more change to the stable/11 > and releng/11.0 branches that we hope will be the final major changes to > 11.0. > > If this is the case, we may be able to eliminate 11.0-RC3 entirely, and > still release on time (or, on time as the current schedule suggests). > > However, as you know, FreeBSD releases prioritize quality over schedule, > so we may still need to adjust the schedule appropriately. > > So, help with testing 11.0-RC1 (or the latest releng/11.0 from svn) is > greatly appreciated. > > Glen > On behalf of: re@ > Has any decision been made on this? It is not local to me (others have reported problems with combination devices) and rolling back the change in question eliminates the problem. It remains un-triaged as of this point. Note that this impacts a system that is booting and needs manual intervention, which is not a good place to a have a problem https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=211884 -- Karl Denninger k...@denninger.net <mailto:k...@denninger.net> /The Market Ticker/ /[S/MIME encrypted email preferred]/ smime.p7s Description: S/MIME Cryptographic Signature
Re: Very odd behavior with RC2
On 8/15/2016 15:52, Karl Denninger wrote: > FreeBSD 11.0-PRERELEASE #2 r304166: Mon Aug 15 13:17:09 CDT 2016 > k...@newfs.denninger.net:/usr/obj/usr/src/sys/KSD-SMP > > Symptoms: > > This machine is on a SuperMicro board with the integrated KVM. > > After updating to this from the previous Alpha release this morning > (built circa July 15th) the emulated keyboard disappeared intermittently > (!) and would not register keypresses. There appears to have been > something that has changed quite-materially in the loader and/or the > kernel in this regard. Screen display was unaffected. > > Toggling the mouse mode would restore the keyboard; this causes a detach > and reattach of the virtual keyboard to the system, and it would then work. > > Just a heads-up as this was wildly unexpected and needless to say caused > me quite a bit of heartburn trying to perform the upgrade and mergemaster! > From the PR I filed on this... Scanning back through recent commits I am wondering if this one is related; the problem occurs after the kernel is loaded (I can use the keyboard on the KVM perfectly well in the BIOS, etc) and once the system is fully up and running it works as well. It is only if/when geli wants a password *during the boot process* that the keyboard is hosed. r304124 | hselasky | 2016-08-15 03:58:55 -0500 (Mon, 15 Aug 2016) | 7 lines MFC r303765: Keep a reference count on USB keyboard polling to allow recursive cngrab() during a panic for example, similar to what the AT-keyboard driver is doing. Found by: Bruce Evans The reason this looks possibly-related is that the KVM attaches as a USB keyboard and a plugged-in USB keyboard also exhibits the problem during the boot-time process, as shown here from the boot log on one of the impacted machines Enter passphrase for da8p4: ugen1.2: at usbus1 ukbd0: on usbus1 kbd2 at ukbd0 And... uhid0: on usbus4 Hmmm. -- Karl Denninger k...@denninger.net <mailto:k...@denninger.net> /The Market Ticker/ /[S/MIME encrypted email preferred]/ smime.p7s Description: S/MIME Cryptographic Signature
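For anyone wanting to test the same theory locally, reverse-merging the suspect revision in a Subversion checkout of /usr/src is straightforward; this is only a sketch of how the revert discussed in the Delay-with-11.0-RC2 thread might be done, not a claim that r304124 is in fact the culprit, and the KERNCONF name is just the one used on this machine:
# in an svn checkout of stable/11 (or releng/11.0)
cd /usr/src
svn merge -c -304124 .
# rebuild and install the kernel as usual, then reboot and retest
make buildkernel KERNCONF=KSD-SMP
make installkernel KERNCONF=KSD-SMP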
Very odd behavior with RC2
FreeBSD 11.0-PRERELEASE #2 r304166: Mon Aug 15 13:17:09 CDT 2016 k...@newfs.denninger.net:/usr/obj/usr/src/sys/KSD-SMP Symptoms: This machine is on a SuperMicro board with the integrated KVM. After updating to this from the previous Alpha release this morning (built circa July 15th) the emulated keyboard disappeared intermittently (!) and would not register keypresses. There appears to have been something that has changed quite-materially in the loader and/or the kernel in this regard. Screen display was unaffected. Toggling the mouse mode would restore the keyboard; this causes a detach and reattach of the virtual keyboard to the system, and it would then work. Just a heads-up as this was wildly unexpected and needless to say caused me quite a bit of heartburn trying to perform the upgrade and mergemaster! -- Karl Denninger k...@denninger.net <mailto:k...@denninger.net> /The Market Ticker/ /[S/MIME encrypted email preferred]/ smime.p7s Description: S/MIME Cryptographic Signature
Re: Postfix and tcpwrappers?
On 7/25/2016 14:48, Willem Jan Withagen wrote: > On 25-7-2016 19:32, Karl Denninger wrote: >> On 7/25/2016 12:04, Ronald Klop wrote: >>> On Mon, 25 Jul 2016 18:48:25 +0200, Karl Denninger >>> wrote: >>> >>>> This may not belong in "stable", but since Postfix is one of the >>>> high-performance alternatives to sendmail >>>> >>>> Question is this -- I have sshguard protecting connections inbound, but >>>> Postfix appears to be ignoring it, which implies that it is not paying >>>> attention to the hosts.allow file (and the wrapper that enables it.) >>>> >>>> Recently a large body of clowncars have been targeting my sasl-enabled >>>> https gateway (which I use for client machines and thus do in fact need) >>>> and while sshguard picks up the attacks and tries to ban them, postfix >>>> is ignoring the entries it makes which implies it is not linked with the >>>> tcp wrappers. >>>> >>>> A quick look at the config for postfix doesn't disclose an obvious >>>> configuration solutiondid I miss it? >>>> >>> Don't know if postfix can handle tcp wrappers, but I use bruteblock >>> [1] for protecting connections via the ipfw firewall. I use this for >>> ssh and postfix. > Given the fact that both tcpwrappers and postfix originate from the same > author (Wietse Venenma) I'd be very surprised it you could not do this. > http://www.postfix.org/linuxsecurity-200407.html > > But grepping the binary for libwrap it does seems to be the case. > Note that you can also educate sshguard to actually use a script to do > whatever you want it to do. I'm using it to add rules to an ipfw table > that is used in a deny-rule. > > Reloading the fw keeps the deny-rules, flushing the table deletes all > blocked hosts without reloading the firewall. > Both times a bonus. > > --WjW > --WjW That's why I was surprised too... .but it is what it is. I just rebuilt sshguard to use an ipfw table instead of hosts.allow, since I use ipfw anyway for firewall/routing/ipsec/etc adding one line up near the top of my ruleset to match against the table and send back a reset (I'm considering black-holing attempts instead as that will slow the clowncar brigade down and thus "helps" others) and resolved the issue. It's interesting that all of a sudden the clowncar folks figured out that if they hit my email server with SSL they could then attempt an auth. I have always had auth turned off for non-SSL connections for obvious reasons (passing passwords around plain is bad news, yanno) and until recently the clowns hadn't bothered with the overhead of setting up SSL connections. That appears to now have changed, so -- Karl Denninger k...@denninger.net <mailto:k...@denninger.net> /The Market Ticker/ /[S/MIME encrypted email preferred]/ smime.p7s Description: S/MIME Cryptographic Signature
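For anyone wanting to copy the approach, the "one line up near the top of my ruleset" is roughly the following; the table number 22 is only sshguard's conventional default and, like the rule number, should be treated as an assumption rather than the exact configuration used here:
# make sure the table sshguard populates exists (FreeBSD 11 syntax;
# older releases create tables implicitly on first insert)
ipfw table 22 create type addr
# near the top of the ruleset: answer blocked hosts with a TCP reset
ipfw add 100 reset tcp from 'table(22)' to any
# or black-hole them instead, which slows the clowncar brigade down:
#ipfw add 100 deny ip from 'table(22)' to any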
Re: Postfix and tcpwrappers?
On 7/25/2016 14:38, Tim Daneliuk wrote: > On 07/25/2016 01:20 PM, Shawn Bakhtiar wrote: >> ecently a large body of clowncars have been targeting my sasl-enabled >> https gateway (which I use for client machines and thus do in fact need) >> and while sshguard picks up the attacks and tries to ban them, postfix >> is ignoring the entries it makes which implies it is not linked with the >> tcp wrappers. >> >> A quick look at the config for postfix doesn't disclose an obvious >> configuration solutiondid I miss it? >> > > You can more-or-less run anything from a wrapper if you don't daemonize it > and kick it off on-demand from inetd. Essentially, you have inetd.conf > configured with a stanza that - upon connection attempt - launches an > instance of your desired program (postfix in this case), if and only > if the hosts.allow rules are satisfied. > > This works nicely for smaller installations, but is very slow in high > arrival rate environments because each connection attempt incurs the full > startup overhead of the program you're running. > Tcpwrapper works with many persistent system services (sshd being a notable ones) and integrates nicely, so you can use hosts.allow. The package (or default build in ports) for sshguard uses the hosts.allow file. But, sshguard does know (if you build it by hand or use the right subport) how to insert into an ipfw table instead so I switched over to that. I was rather curious, however, if/why postfix wasn't integrated with the hosts.allow file as are many other system services (or if I just missed the config option to turn it on) since it's offered by FreeBSD as a "stock sendmail replacement" option for higher-volume (and more-secure) sites -- Karl Denninger k...@denninger.net <mailto:k...@denninger.net> /The Market Ticker/ /[S/MIME encrypted email preferred]/ smime.p7s Description: S/MIME Cryptographic Signature
Re: Postfix and tcpwrappers?
On 7/25/2016 12:04, Ronald Klop wrote: > On Mon, 25 Jul 2016 18:48:25 +0200, Karl Denninger > wrote: > >> This may not belong in "stable", but since Postfix is one of the >> high-performance alternatives to sendmail >> >> Question is this -- I have sshguard protecting connections inbound, but >> Postfix appears to be ignoring it, which implies that it is not paying >> attention to the hosts.allow file (and the wrapper that enables it.) >> >> Recently a large body of clowncars have been targeting my sasl-enabled >> https gateway (which I use for client machines and thus do in fact need) >> and while sshguard picks up the attacks and tries to ban them, postfix >> is ignoring the entries it makes which implies it is not linked with the >> tcp wrappers. >> >> A quick look at the config for postfix doesn't disclose an obvious >> configuration solutiondid I miss it? >> > > Don't know if postfix can handle tcp wrappers, but I use bruteblock > [1] for protecting connections via the ipfw firewall. I use this for > ssh and postfix. > I recompiled sshguard to use ipfw and stuck the table lookup in my firewall config. works, and is software-agnostic (thus doesn't care if something was linked against tcpwrappers or not.) -- Karl Denninger k...@denninger.net <mailto:k...@denninger.net> /The Market Ticker/ /[S/MIME encrypted email preferred]/ smime.p7s Description: S/MIME Cryptographic Signature
Postfix and tcpwrappers?
This may not belong in "stable", but since Postfix is one of the high-performance alternatives to sendmail ... Question is this -- I have sshguard protecting connections inbound, but Postfix appears to be ignoring it, which implies that it is not paying attention to the hosts.allow file (and the wrapper that enables it.) Recently a large body of clowncars have been targeting my sasl-enabled https gateway (which I use for client machines and thus do in fact need) and while sshguard picks up the attacks and tries to ban them, postfix is ignoring the entries it makes which implies it is not linked with the tcp wrappers. A quick look at the config for postfix doesn't disclose an obvious configuration solution ... did I miss it? -- Karl Denninger k...@denninger.net <mailto:k...@denninger.net> /The Market Ticker/ /[S/MIME encrypted email preferred]/ smime.p7s Description: S/MIME Cryptographic Signature