Re: geli - is it better to partition then encrypt, or vice versa ?
On 4/17/2021 15:52, Pete French wrote: So, am building a zpool on some encrypted discs - and what I have done is to partition the disc with GPT, add a single big partition, and encrypt that. So the pool is on nda1p1.eli. But I could, of course, encrypt the disc first, and then partition the encrypted disc, or indeed just put the zpool directly onto it. Just wondering what the general consensus is as to the best way to go here ... if there is one! :-) What do other people do?

IMHO one reason to partition first (and the reason I do it) is to prevent "drive attachment point hopping" from causing an unwelcome surprise if/when there is a failure or if, for some reason, you plug a drive into a different machine at some point. If you partition and label, then geli init and attach at "/dev/gpt/the-label", you can now label the drive carrier with that and, irrespective of the slot or adapter it gets connected to on whatever machine, it will be in the same place. If it fails this also means (assuming you labeled the carrier) you know which carrier to yank and replace. Yanking the wrong drive can be an unpleasant surprise.

This also makes "geli groups" trivial in /etc/rc.conf for attachment at boot time, irrespective of whether the drives physically come up in the same place (again, typically yes, but not necessarily in the case of a failure or if you plug one into a different adapter). -- Karl Denninger k...@denninger.net <mailto:k...@denninger.net> /The Market Ticker/ /[S/MIME encrypted email preferred]/ smime.p7s Description: S/MIME Cryptographic Signature
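For illustration, a minimal sketch of that workflow, assuming a drive at da3 and a label of "tank-d3" (names, sizes and the single-disk pool are examples, not a recommendation):

  gpart create -s gpt da3
  gpart add -t freebsd-zfs -l tank-d3 da3    # label appears as /dev/gpt/tank-d3
  geli init -s 4096 /dev/gpt/tank-d3         # prompts for passphrase/keyfile setup
  geli attach /dev/gpt/tank-d3               # provider shows up as /dev/gpt/tank-d3.eli
  zpool create tank /dev/gpt/tank-d3.eli

and then, for attachment at boot, something along these lines in /etc/rc.conf (check rc.conf(5) and the geli rc script for the exact variable names on your release):

  geli_groups="tank"
  geli_tank_devices="/dev/gpt/tank-d3"

Because everything is keyed to the GPT label, none of this cares which slot or adapter the drive lands on.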
Re: freebsd-update and speed
On 4/15/2021 08:28, Ferdinand Goldmann wrote: Following up on my own mail: to type this mail while waiting for '8778 patches'. Which has ended in: 71107120 done. Applying patches... done. Fetching 1965 files... failed. and after restarting it: Fetching 1750 patches [...] Applying patches... done. Fetching 326 files... This does not seem very reassuring to me. :( It already got the others, so it now only has to fetch 326 more. -- Karl Denninger k...@denninger.net <mailto:k...@denninger.net> /The Market Ticker/ /[S/MIME encrypted email preferred]/ smime.p7s Description: S/MIME Cryptographic Signature
Re: possibly silly question regarding freebsd-update
On 3/30/2021 12:02, Gary Palmer wrote: On Tue, Mar 30, 2021 at 11:55:24AM -0400, Karl Denninger wrote: On 3/30/2021 11:22, Guido Falsi via freebsd-stable wrote: On 30/03/21 15:35, tech-lists wrote: Hi, Recently there was https://lists.freebsd.org/pipermail/freebsd-security/2021-March/010380.html about openssl. Upgraded to 12.2-p5 with freebsd-update and rebooted. What I'm unsure about is the openssl version. Up-to-date 12.1-p5 instances report OpenSSL 1.1.1h-freebsd? 22 Sep 2020 Up-to-date stable/13-n245043-7590d7800c4 reports OpenSSL 1.1.1k-freebsd 25 Mar 2021 shouldn't the 12.2-p5 be reporting openssl 1.1.1k-freebsd as well? No, as you can see in the commit in the official git [1] while for current and stable the new upstream version of openssl was imported for the release the fix was applied without importing the new release and without changing the reported version of the library. So with 12.2p5 you do get the fix but don't get a new version of the library. [1] https://cgit.freebsd.org/src/commit/?h=releng/12.2&id=af61348d61f51a88b438d41c3c91b56b2b65ed9b Excuse me $ uname -v FreeBSD 12.2-RELEASE-p4 GENERIC $ sudo sh # freebsd-update fetch Looking up update.FreeBSD.org mirrors... 3 mirrors found. Fetching metadata signature for 12.2-RELEASE from update4.freebsd.org... done. Fetching metadata index... done. Inspecting system... done. Preparing to download files... done. No updates needed to update system to 12.2-RELEASE-p5. I am running 12.2-RELEASE-p4, so says uname -v IMHO it is an *extraordinarily* bad practice to change a library that in fact will result in a revision change while leaving the revision number alone. How do I *know*, without source to go look at, whether or not the fix is present on a binary system? If newvers.sh gets bumped then a build and -p5 release should have resulted from that, and in turn a fetch/install (and reboot of course since it's in the kernel) should result in uname -v returning "-p5" Most of my deployed "stuff" is on -STABLE but I do have a handful of machines on cloud infrastructure that are binary-only and on which I rely on freebsd-update and pkg to keep current with security-related items. What does "freebsd-version -u" report? The fix was only to a userland library, so I would not expect the kernel version as reported by uname to change. Regards, Gary Ok, that's fair; it DOES show -p5 for the user side. $ freebsd-version -ru 12.2-RELEASE-p4 12.2-RELEASE-p5 So that says my userland is -p5 while the kernel, which did not change (even though if you built from source it would carry the -p5 number) is -p4. I can live with that as it allows me to "see" that indeed the revision is present without having source on the box. I recognize that this is probably a reasonably-infrequent thing but it certainly is one that for people running binary releases is likely quite important given that the issue is in the openssl libraries. It was enough for me to rebuild all the firewall machines the other day since a DOS (which is reasonably possible for one of the flaws) aimed at my VPN server causing the server process to exit would be.. bad. -- Karl Denninger k...@denninger.net <mailto:k...@denninger.net> /The Market Ticker/ /[S/MIME encrypted email preferred]/ smime.p7s Description: S/MIME Cryptographic Signature
Re: possibly silly question regarding freebsd-update
On 3/30/2021 11:22, Guido Falsi via freebsd-stable wrote: On 30/03/21 15:35, tech-lists wrote: Hi, Recently there was https://lists.freebsd.org/pipermail/freebsd-security/2021-March/010380.html about openssl. Upgraded to 12.2-p5 with freebsd-update and rebooted. What I'm unsure about is the openssl version. Up-to-date 12.1-p5 instances report OpenSSL 1.1.1h-freebsd 22 Sep 2020 Up-to-date stable/13-n245043-7590d7800c4 reports OpenSSL 1.1.1k-freebsd 25 Mar 2021 shouldn't the 12.2-p5 be reporting openssl 1.1.1k-freebsd as well? No, as you can see in the commit in the official git [1] while for current and stable the new upstream version of openssl was imported for the release the fix was applied without importing the new release and without changing the reported version of the library. So with 12.2p5 you do get the fix but don't get a new version of the library. [1] https://cgit.freebsd.org/src/commit/?h=releng/12.2&id=af61348d61f51a88b438d41c3c91b56b2b65ed9b Excuse me $ uname -v FreeBSD 12.2-RELEASE-p4 GENERIC $ sudo sh # freebsd-update fetch Looking up update.FreeBSD.org mirrors... 3 mirrors found. Fetching metadata signature for 12.2-RELEASE from update4.freebsd.org... done. Fetching metadata index... done. Inspecting system... done. Preparing to download files... done. No updates needed to update system to 12.2-RELEASE-p5. I am running 12.2-RELEASE-p4, so says uname -v IMHO it is an *extraordinarily* bad practice to change a library that in fact will result in a revision change while leaving the revision number alone. How do I *know*, without source to go look at, whether or not the fix is present on a binary system? If newvers.sh gets bumped then a build and -p5 release should have resulted from that, and in turn a fetch/install (and reboot of course since it's in the kernel) should result in uname -v returning "-p5" Most of my deployed "stuff" is on -STABLE but I do have a handful of machines on cloud infrastructure that are binary-only and on which I rely on freebsd-update and pkg to keep current with security-related items. -- Karl Denninger k...@denninger.net <mailto:k...@denninger.net> /The Market Ticker/ /[S/MIME encrypted email preferred]/ smime.p7s Description: S/MIME Cryptographic Signature
Re: possibly silly question regarding freebsd-update
On 3/30/2021 10:40, tech-lists wrote: On Tue, Mar 30, 2021 at 09:14:56AM -0500, Doug McIntyre wrote: Like the patch referenced in the SA. https://security.FreeBSD.org/patches/SA-21:07/openssl-12.patch Again, it seems like confusion over what happens in RELEASE, STABLE and CURRENT.. Hi, I'm not sure what you mean by this. In https://lists.freebsd.org/pipermail/freebsd-security/2021-March/010380.html it says 1) To update your vulnerable system via a binary patch: Systems running a RELEASE version of FreeBSD on the i386 or amd64 platforms can be updated via the freebsd-update(8) utility: # freebsd-update fetch # freebsd-update install # which I did. If openssl updated, would it not be logical to expect openssl version information to indicate it had in fact been updated? If not, then how am I able to tell that it has updated? On an un-upgraded 12.2-p4 system *and* on an upgraded one, openssl version reports 1.1.1h-freebsd

It is not updating; as I noted it appears this security patch was NOT backported and thus 12.2-RELEASE does not "see" it. You cannot go to "-STABLE" via freebsd-update; to run -STABLE you must be doing buildworld/buildkernel from source. I can confirm that 12.2-STABLE *does* have the patch as I checked it recently. From a system I cross-build for and updated yesterday:

$ uname -v
FreeBSD 12.2-STABLE stable/12-n232909-4fd5354e85e KSD-SMP
$ openssl version
OpenSSL 1.1.1k-freebsd 25 Mar 2021

-- Karl Denninger k...@denninger.net <mailto:k...@denninger.net> /The Market Ticker/ /[S/MIME encrypted email preferred]/ smime.p7s Description: S/MIME Cryptographic Signature
Re: possibly silly question regarding freebsd-update
On 3/30/2021 10:14, Doug McIntyre wrote: Like the patch referenced in the SA. https://security.FreeBSD.org/patches/SA-21:07/openssl-12.patch Again, it seems like confusion over what happens in RELEASE, STABLE and CURRENT.. On Tue, Mar 30, 2021 at 04:05:32PM +0200, Ruben via freebsd-stable wrote: Hi, Did you mean 12.1-p5 or 12.2-p5 ? I'm asking because you refer to both 12.1-p5 and 12.2-p5 (typo?). If you meant 12.2-p5: Perhaps the FreeBSD security team did not bump the version, but "only" backported the patches to version 1.1.1h ? Regards, Ruben On 3/30/21 3:35 PM, tech-lists wrote: Hi, Recently there was https://lists.freebsd.org/pipermail/freebsd-security/2021-March/010380.html about openssl. Upgraded to 12.2-p5 with freebsd-update and rebooted. What I'm unsure about is the openssl version. Up-to-date 12.1-p5 instances report OpenSSL 1.1.1h-freebsd 22 Sep 2020 Up-to-date stable/13-n245043-7590d7800c4 reports OpenSSL 1.1.1k-freebsd 25 Mar 2021 shouldn't the 12.2-p5 be reporting openssl 1.1.1k-freebsd as well? thanks, _ Ok, except # uname -v FreeBSD 12.2-RELEASE-p4 GENERIC # openssl version OpenSSL 1.1.1h-freebsd 22 Sep 2020 # freebsd-update fetch Looking up update.FreeBSD.org mirrors... 3 mirrors found. Fetching metadata signature for 12.2-RELEASE from update4.freebsd.org... done. Fetching metadata index... done. Fetching 2 metadata patches.. done. Applying metadata patches... done. Fetching 2 metadata files... done. Inspecting system... done. Preparing to download files... done. No updates needed to update system to 12.2-RELEASE-p5. So if you're running RELEASE then /security patches /don't get backported? And you CAN'T upgrade to 12.2-STABLE via freebsd-update: # freebsd-update -r 12.2-STABLE upgrade Looking up update.FreeBSD.org mirrors... 3 mirrors found. Fetching metadata signature for 12.2-RELEASE from update1.freebsd.org... done. Fetching metadata index... done. Inspecting system... done. The following components of FreeBSD seem to be installed: kernel/generic src/src world/base world/doc world/lib32 The following components of FreeBSD do not seem to be installed: kernel/generic-dbg world/base-dbg world/lib32-dbg Does this look reasonable (y/n)? y Fetching metadata signature for 12.2-STABLE from update1.freebsd.org... failed. Fetching metadata signature for 12.2-STABLE from update2.freebsd.org... failed. Fetching metadata signature for 12.2-STABLE from update4.freebsd.org... failed. No mirrors remaining, giving up. This may be because upgrading from this platform (amd64) or release (12.2-STABLE) is unsupported by freebsd-update. Only platforms with Tier 1 support can be upgraded by freebsd-update. See https://www.freebsd.org/platforms/index.html for more info. If unsupported, FreeBSD must be upgraded by source. -- Karl Denninger k...@denninger.net <mailto:k...@denninger.net> /The Market Ticker/ /[S/MIME encrypted email preferred]/ smime.p7s Description: S/MIME Cryptographic Signature
Re: How do I know if my 13-stable has security patches?
On 2/26/2021 10:22, Ed Maste wrote: On Thu, 25 Feb 2021 at 16:57, Karl Denninger wrote: The time (and present items) on a given machine to know whether it is covered by a given advisory under the "svn view of the world" is one command, and no sources. That is, if the advisory says "r123456" has the fix, then if I do a "uname -v" and get something larger, it's safe. Yes, as previously stated the commit count will be included in future advisories. On stable/13 today uname will include: uname displays e.g. stable/13-n244688-66308a13dddc The advisory would report stable/13-n244572; 244688 is greater than 244572, so it will have the fix.

Sounds like the issue has been addressed -- thank you! -- Karl Denninger k...@denninger.net <mailto:k...@denninger.net> /The Market Ticker/ /[S/MIME encrypted email preferred]/ smime.p7s Description: S/MIME Cryptographic Signature
Re: How do I know if my 13-stable has security patches?
On 2/25/2021 15:56, Warner Losh wrote: On Thu, Feb 25, 2021 at 6:37 AM Karl Denninger <mailto:k...@denninger.net>> wrote: On 2/25/2021 04:30, Olivier Certner wrote: >> Neither command is what I'd call 'intuitive', so it would have taken me a >> long time to find either of them. I cut and pasted the 'git branch' command >> and it took me a moment to realize what that meant. Never ran "grep -l" on >> a pipe, I guess. > You made me laugh! Apart from relatively simple commands, git's interface is > far from intuitive. That's the reason why I regret that it became the hugely > dominant DVCS. Regression doesn't have to come to a project, but if the tools you choose do things like this then you have to work around them as a project to avoid the issue, and that might wind up being somewhat of a PITA. This specific issue is IMHO quite severe in terms of operational impact. I track -STABLE but don't load "new things" all the time. For security-related things it's more important to know if I've got something out there in a specific instance where it may apply (and not care in others where it doesn't; aka the recent Xen thing if you're not using Xen.) Otherwise if everything is running as it should do I wish to risk introducing bugs along with improvements? If not in a security-related context, frequently not. Well, this used to be easy. Is your "uname" r-number HIGHER than the "when fixed" revision? You're good. Now, nope. Now I have to go dig source to know because there is no longer a "revision number" that monotonically increments with each commit so there is no longer a way to have a "point in time" view of the source, as-committed, for a given checked-out version. IMHO that's a fairly serious regression for the person responsible for keeping security-related things up to date and something the project should find a way to fix before rolling the next -RELEASE. (Yeah, I know that's almost-certain to not happen but it's not like this issue wasn't known since moving things over to git.) We should likely just publish the 'v' number in the advisories. It's basically a count back to the start of the project. We put that number in uname already. You can also find out the 'v' number in the latest advisories by cloning the repo and doing the same thing we do in newvers.sh: % git rev-list --first-parent --count $HASH and that will tell you. This needn't be on the target machine since the hashes are stable across the world. (list of further "stuff") But that's my entire point Warner. The time (and present items) on a given machine to know whether it is covered by a given advisory under the "svn view of the world" is one command, and no sources. That is, if the advisory says "r123456" has the fix, then if I do a "uname -v" and get something larger, it's safe. If I get something smaller it's not. I don't need the source on the machine, I don't need svn on the target or, for that matter, do I need to know if the source tree I have on a build machine is coherent with whatever is on the running machine. I simply need to know if the source that built the code that is running was updated *after* the commit that fixes the problem. What if the source /isn't on that machine /because you build on some system and then distribute? Does every machine now have to be coherent with your source repository in order to be able to figure out where you are or worse, it must keep the source from which that specific installation, individually, was built? 
/What if the source isn't there at all /because you run binary code and update with freebsd-update? Unless I've missed something that's what was lost and IMHO needs to be restored; a way to know that in seconds with nothing other than the operating OS on the box (e.g. via uname) and the advisory with its "greater than X is safe" from the mailing list. Am I misunderstanding the current state of things in this regard? -- Karl Denninger k...@denninger.net <mailto:k...@denninger.net> /The Market Ticker/ /[S/MIME encrypted email preferred]/ smime.p7s Description: S/MIME Cryptographic Signature
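To make the comparison concrete (the hash is a placeholder, the uname output is illustrative, and the counts are the ones from Ed's note elsewhere in this thread, reused purely as an example):

  # on any box with a clone of the repo -- it need not be the target machine
  git -C /usr/src rev-list --first-parent --count <advisory-hash>
  244572

  # on the target machine
  uname -v
  FreeBSD 13.0-STABLE stable/13-n244688-66308a13dddc GENERIC

If the n-number reported by uname is at least the advisory's count (here 244688 >= 244572) the fix is present; no source tree is needed on the running box for that second step.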
Re: How do I know if my 13-stable has security patches?
On 2/25/2021 04:30, Olivier Certner wrote: Neither command is what I'd call 'intuitive', so it would have taken me a long time to find either of them. I cut and pasted the 'git branch' command and it took me a moment to realize what that meant. Never ran "grep -l" on a pipe, I guess. You made me laugh! Apart from relatively simple commands, git's interface is far from intuitive. That's the reason why I regret that it became the hugely dominant DVCS. Regression doesn't have to come to a project, but if the tools you choose do things like this then you have to work around them as a project to avoid the issue, and that might wind up being somewhat of a PITA. This specific issue is IMHO quite severe in terms of operational impact. I track -STABLE but don't load "new things" all the time. For security-related things it's more important to know if I've got something out there in a specific instance where it may apply (and not care in others where it doesn't; aka the recent Xen thing if you're not using Xen.) Otherwise if everything is running as it should do I wish to risk introducing bugs along with improvements? If not in a security-related context, frequently not. Well, this used to be easy. Is your "uname" r-number HIGHER than the "when fixed" revision? You're good. Now, nope. Now I have to go dig source to know because there is no longer a "revision number" that monotonically increments with each commit so there is no longer a way to have a "point in time" view of the source, as-committed, for a given checked-out version. IMHO that's a fairly serious regression for the person responsible for keeping security-related things up to date and something the project should find a way to fix before rolling the next -RELEASE. (Yeah, I know that's almost-certain to not happen but it's not like this issue wasn't known since moving things over to git.) -- Karl Denninger k...@denninger.net <mailto:k...@denninger.net> /The Market Ticker/ /[S/MIME encrypted email preferred]/ smime.p7s Description: S/MIME Cryptographic Signature
Re: HEADS UP: FreeBSD src repo transitioning to git this weekend
On 12/23/2020 12:01, Warner Losh wrote: On Wed, Dec 23, 2020 at 7:32 AM Michael Grimm wrote: Hi, Warner Losh wrote: The FreeBSD project will be moving its source repo from subversion to git starting this weekend. First of all I'd like to thank all those involved in this for their efforts. Following https://github.com/bsdimp/freebsd-git-docs/blob/main/mini-primer.md from your other mail I was able to migrate from svn to git without running into any issues. Right now I am learning how to use git the way I used svn before. I am just following 12-STABLE in order to build world and kernel. I am not developing, neither am I committing. I wonder how one would switch from a currently used branch (OLD) to another branch (NEW). With svn I used: svn switch svn://svn.freebsd.org/base/stable/NEW /usr/src For git I found: git branch -m stable/OLD stable/NEW or git branch -M stable/OLD stable/NEW git-branch(1): With a -m or -M option, <oldbranch> will be renamed to <newbranch>. If <oldbranch> had a corresponding reflog, it is renamed to match <newbranch>, and a reflog entry is created to remember the branch renaming. If <newbranch> exists, -M must be used to force the rename to happen. I don't understand that text completely, because I don't know what a reflog is, yet ;-) Thus: Should I use "-m" or "-M" in my scenario when switching from stable/12 to stable/13 in the near future? I think the answer is a simple "git checkout NEW". This will replace the current tree at branch OLD with the contents of branch NEW. git branch -m is different and changes what the branch means. If you did what you suggested then you'd be renaming the OLD branch to NEW, which isn't what I think you're asking about.

Correct -- "git checkout NEW", where "NEW" is the desired branch you wish to have "active." If you have made local changes it will tell you to act on that first; the usual is "git stash" to save them. You can then apply them with "git stash apply" to the *new* branch, assuming that makes sense to do (e.g. a kernel configuration file, etc.) "Stash" maintains a stack which can be manipulated as well (so a "stash", if you already "stash"ed and did not drop it, creates a second one, aka stash@{0} and stash@{1}). -- Karl Denninger k...@denninger.net <mailto:k...@denninger.net> /The Market Ticker/ /[S/MIME encrypted email preferred]/ smime.p7s Description: S/MIME Cryptographic Signature
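For the common "follow a newer stable branch" case, a minimal sketch (branch names are examples; substitute whatever you actually track):

  cd /usr/src
  git stash                  # set aside local changes, e.g. a custom kernel config
  git checkout stable/13     # switch the working tree to the new branch
  git pull --ff-only         # bring it up to date
  git stash pop              # re-apply the stashed changes on the new branch

"git stash pop" applies the newest stash entry and drops it from the stack; "git stash apply" does the same but leaves the entry in place.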
Re: URGENT: Microsoft overwrites boot loader!
On 7/16/2020 16:28, Alan Somers wrote: > On Thu, Jul 16, 2020 at 2:20 PM Don Wilde wrote: > >> The [deleted] ones in Redmond have done it again. My multi-OS GRUB2 boot >> loader is gone, and in its place is a 500M partition called 'Windows >> boot loader'. >> >> The purpose is to force us to look at MS' new version of Edge. All my >> old boot files are gone. >> >> It's taken me much of the morning to get underneath this, since on this >> unit my only OS (other than Doze 10) with a WM and GUI is Ubuntu. >> >> That's the last time I will allow this, and I'm calling those [deleted]s >> tomorrow to give them a piece of my mind. After that I will erase every >> vestige of that obscene OS from my disk. >> >> -- >> Don Wilde >> >> * What is the Internet of Things but a system * >> * of systems including humans? * >> >> > Edge? I thought that was a browser. What does it have to do with boot > loaders? Microsoft does this on any of their "Feature" updates. I managed to figure out how to arrange my EFI setup so that all I have to do is restore the index in the BIOS to point back at REFIND, and everything else is still there. But if you stick the FreeBSD loader where Microsoft wants to clobber it, yeah, it'll do that. It doesn't actually blast the partition though -- just the single file where they want to stuff it. -- Karl Denninger k...@denninger.net <mailto:k...@denninger.net> /The Market Ticker/ /[S/MIME encrypted email preferred]/ smime.p7s Description: S/MIME Cryptographic Signature
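For reference, the firmware boot selection can also be inspected and nudged from the FreeBSD side with efibootmgr(8); the entry number below is purely an example, and the exact flags are worth confirming against the man page on your release:

  efibootmgr -v          # list boot entries and the current BootOrder
  efibootmgr -n -b 0003  # one-shot: boot entry 0003 (e.g. rEFIND) on the next restart

The permanent order can be rewritten as well, or fixed from the firmware setup screen as described above.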
Re: 12.1p7 no longer boots after doing zpool upgrade -a
On 7/9/2020 09:32, Pete French wrote: > > On 09/07/2020 14:24, Kyle Evans wrote: > >>> gpart bootcode -p /boot/boot1.efifat -i 1 ada0 >>> gpart bootcode -p /boot/boot1.efifat -i 1 ada1 >> >> >> This method of updating the ESP is no longer recommended for new 12.x >> installations -- we now more carefully construct the ESP with an >> /EFI/FreeBSD/loader.efi where loader.efi is /boot/loader.efi. You will >> want to rebuild this as such, and that may fix part of your problem. > > Out of interest, how should the ESP partition be upgraded then ? I > dont have any EFI machines...yet. But one day I will, and I was > assuming that an upgrade would be done using the above lines too. > Nope. An EFI partition is just a "funky" MSDOS (FAT) one, really. Thus the upgrade of the loader on one would be just a copy onto it as with any other file on a filesystem (e.g. mount the partition, copy the file to the correct place, unmount it); the gpart command does a byte-copy onto what is, for all intents and purposes, an unformatted (no directory, etc) reserved part of the disk. My laptop dual boot (Windows 10 / FreeBSD) is EFI and I've yet to have to screw with the loader, but if you do then it's just a copy over. Windows has several times blown up my Refind install -- all the "Feature" upgrades from Windows have a habit of resetting the BIOS boot order which makes the machine "Windows boots immediately and only", so I have to go back and reset it whenever Microslug looses one of those on me. If I had cause to update the loader for FreeBSD then I'd just mount the partition and copy it over. -- Karl Denninger k...@denninger.net <mailto:k...@denninger.net> /The Market Ticker/ /[S/MIME encrypted email preferred]/ smime.p7s Description: S/MIME Cryptographic Signature
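Concretely, for an ESP laid out the way Kyle describes, the update is roughly this (device name and mount point are examples; check where your ESP actually lives and which directory layout it uses):

  mount -t msdosfs /dev/ada0p1 /mnt
  cp /boot/loader.efi /mnt/EFI/freebsd/loader.efi
  umount /mnt

Older ESPs may instead boot via the default removable-media path /EFI/BOOT/BOOTX64.EFI, in which case that is the file to replace.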
Re: support of PCIe NVME drives
On 4/16/2020 13:23, Pete Wright wrote: > > > On 4/16/20 11:12 AM, Miroslav Lachman wrote: >> Kurt Jaeger wrote on 04/16/2020 20:07: >>> Hi! >>> >>>> I was requested to install FreeBSD 11.3 on a new Dell machine with >>>> only 2 >>>> NVME drives in ZFS mirror. The problem is that installer does not >>>> see the >>>> drives. Are there any special procedure to use NVME drives for >>>> installation a later for booting? >>> >>> I use 2 NVMe drives as zfs mirror to boot from on my testbox, >>> but it runs CURRENT, since approx. November 2018. >>> >>> So maybe try it with 12.1 ? I know, that does not help if you are asked >>> to install 11.3, but at least it gives you an idea... >>> >> >> I tried 12.1 few minutes ago but the result is the same - no NVME >> drives listed. >> Should I try something with kernel modules, some sysctl tweaks? >> Should I try UEFI boot? (I never did) >> > > I would try booting via UEFI if you can. I just installed a laptop > yesterday which has a nvme root device, it was detected by the > 12-STABLE snapshot I used to boot from. no other modifications were > necessary on my end. > > -pete > Yeah my Lenovo Carbon X1 has an nVME drive in it, and nothing else - 12-Stable found it immediately and works fine. -- Karl Denninger k...@denninger.net <mailto:k...@denninger.net> /The Market Ticker/ /[S/MIME encrypted email preferred]/ smime.p7s Description: S/MIME Cryptographic Signature
Re: Running FreeBSD on M.2 SSD
On 2/25/2020 9:53 AM, John Kennedy wrote: On Tue, Feb 25, 2020 at 11:07:48AM +, Pete French wrote: I have often wondered if ZFS is more aggressive with discs, because until very recently any solid state drive I have used ZFS on broke very quicky. ... I've always wondered if ZFS (and other snapshotting file systems) would help kill SSD disks by locking up blocks longer than other filesystems might. For example, I've got snapshot-backups going back, say, a year then those blocks that haven't changed aren't going back into the pool to be rewritten (and perhaps favored because of low write-cycle count). As the disk fills up, the blocks that aren't locked up get reused more and more, leading to extra wear on them. Eventually one of those will get to the point of erroring out. Personally, I just size generously but that isn't always an option for everybody. I have a ZFS RaidZ2 on SSDs that has been running for several /years /without any problems. The drives are Intel 730s, which Intel CLAIMS don't have power-loss protection but in fact appear to; not only do they have caps in them but in addition they pass a "pull the cord out of the wall and then check to see if the data is corrupted on restart" test on a repeated basis, which I did several times before trusting them. BTW essentially all non-data-center SSDs fail that test and some fail it spectacularly (destroying the OS due to some of the in-flight data being comingled on an allocated block with something important; if the read/erase/write cycle interrupts you're cooked as the "other" data that was not being modified gets destroyed too!) -- the Intels are one of the very, very few that have passed it. -- -- Karl Denninger /The Market-Ticker/ S/MIME Email accepted and preferred smime.p7s Description: S/MIME Cryptographic Signature
Re: Running FreeBSD on M.2 SSD
On 2/25/2020 8:28 AM, Mario Olofo wrote: Good morning all, @Pete French, you have trim activated on your SSDs right? I heard that if its not activated, the SSD disc can stop working very quickly. @Daniel Kalchev, I used UFS2 with SU+J as suggested on the forums for me, and in this case the filesystem didn't "corrupted", it justs kernel panic from time to time so I gave up. I think that the problem was related to the size of the journal, that become full when I put so many files at once on the system, or was deadlocks in the version of the OS that I was using. @Alexander Leidinger I have the original HDD 1TB Hybrid that came with the notebook will try to reinstall FreeBSD on it to see if it works correctly. Besides my notebook been a 2019 model Dell G3 with no customizations other than the m.2 SSD, I never trust that the system is 100%, so I'll try all possibilities. 1- The BIOS received an update last month but I'll look if there's something newer. 2- Reinstall the FreeBSD on the Hybrid HDD, but if the problem is the FreeBSD driver, it'll work correctly on that HD. 3- Will try with other RAM. This I really don't think that is the problem because is a brand new notebook, but... who knows =). Thank you, Mario I have a Lenovo Carbon X1 that has a Samsung nVME SSD in it and it's fine with both FreeBSD12-STABLE and Windows (I have it set up for dual EFI boot using REFIND.) It does not have a "custom" driver for Win10; it is using Microsoft's "built-in" stuff. Zero problems and I beat on it pretty-heavily. -- -- Karl Denninger /The Market-Ticker/ S/MIME Email accepted and preferred smime.p7s Description: S/MIME Cryptographic Signature
Re: ZFS and power management
On 1/5/2020 16:10, Peter wrote: > On Wed, 18 Dec 2019 17:22:16 +0100, Karl Denninger > wrote: > >> I'm curious if anyone has come up with a way to do this... >> >> I have a system here that has two pools -- one comprised of SSD disks >> that are the "most commonly used" things including user home directories >> and mailboxes, and another that is comprised of very large things that >> are far less-commonly used (e.g. video data files, media, build >> environments for various devices, etc.) > > I'm using such a configuration for more than 10 years already, and > didn't perceive the problems You describe. > Disks are powered down with gstopd or other means, and they stay > powered down until filesystems in the pool are actively accessed. > A difficulty for me was that postgres autovacuum must be completeley > disabled if there are tablespaces on the quiesced pools. Another thing > that comes to mind is smartctl in daemon mode (but I never used that). > There are probably a whole bunch more of potential culprits, so I > suggest You work thru all the housekeeping stuff (daemons, cronjobs, > etc.) to find it. I found a number of things and managed to kill them off in terms of active access, and now it is behaving. I'm using "camcontrol idle -t 240 da{xxx}", which interestingly enough appears NOT to survive a reboot, but otherwise does what's expected. -- Karl Denninger k...@denninger.net <mailto:k...@denninger.net> /The Market Ticker/ /[S/MIME encrypted email preferred]/ smime.p7s Description: S/MIME Cryptographic Signature
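Since the timer setting doesn't persist, a small boot-time re-arm works; a sketch assuming the rust pool lives on da2 through da5 (device list and timeout are examples):

  # /etc/rc.local
  for d in da2 da3 da4 da5; do
      camcontrol idle $d -t 240    # set the drive's idle timer to 240 seconds of inactivity
  done

Note that camcontrol(8) takes the device name before the -t option in my reading of the syntax; adjust to taste.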
ZFS and power management
I'm curious if anyone has come up with a way to do this... I have a system here that has two pools -- one comprised of SSD disks that are the "most commonly used" things including user home directories and mailboxes, and another that is comprised of very large things that are far less-commonly used (e.g. video data files, media, build environments for various devices, etc.) The second pool has perhaps two dozen filesystems that are mounted, but again, rarely accessed. However, despite them being rarely accessed ZFS performs various maintenance checkpoint functions on a nearly-continuous basis (it appears) because there's a low level, but not zero, amount of I/O traffic to and from them. Thus if I set power control (e.g. spin down after 5 minutes of inactivity) they never do. I could simply export the pool but I prefer (greatly) to not do that because some of the data on that pool (e.g. backups from PCs) is information that if a user wants to get to it it ought to "just work." Well, one disk is no big deal. A rack full of them is another matter. I could materially cut the power consumption of this box down (likely by a third or more) if those disks were spun down during 95% of the time the box is up, but with the "standard" way ZFS does things that doesn't appear to be possible. Has anyone taken a crack at changing the paradigm (e.g. using the automounter, perhaps?) to get around this? -- Karl Denninger k...@denninger.net <mailto:k...@denninger.net> /The Market Ticker/ /[S/MIME encrypted email preferred]/ smime.p7s Description: S/MIME Cryptographic Signature
Re: Kernel panic in zfs code; 12-STABLE
On 7/18/2019 15:35, Karl Denninger wrote: > On 7/18/2019 15:19, Eugene Grosbein wrote: >> 19.07.2019 3:13, Karl Denninger wrote: >> >>> FreeBSD 12.0-STABLE #2 r349024M: Thu Jun 13 18:01:16 CDT 2019 >>> k...@newfs.denninger.net:/usr/obj/usr/src/amd64.amd64/sys/KSD-SMP >>> >>> Note -- no patches of any sort in the ZFS code; I am NOT running any of >>> my former patch set. >>> >>> NewFS.denninger.net dumped core - see /var/crash/vmcore.8 >>> >>> Thu Jul 18 15:02:54 CDT 2019 >>> >>> FreeBSD NewFS.denninger.net 12.0-STABLE FreeBSD 12.0-STABLE #2 r349024M: >>> Thu Jun 13 18:01:16 CDT 2019 >>> k...@newfs.denninger.net:/usr/obj/usr/src/amd64.amd64/sys/KSD-SMP amd64 >>> >>> panic: double fault >> [skip] >> >>> #283 0x82748d91 in zio_vdev_io_done (zio=0xf8000b8b8000) >>> at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:3376 >>> #284 0x82744eac in zio_execute (zio=0xf8000b8b8000) >>> at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:1786 >>> #285 0x80c3b7f4 in taskqueue_run_locked (queue=0xf801a8b35100) >>> at /usr/src/sys/kern/subr_taskqueue.c:467 >>> #286 0x80c3cb28 in taskqueue_thread_loop (arg=) >>> at /usr/src/sys/kern/subr_taskqueue.c:773 >>> #287 0x80b9ab23 in fork_exit ( >>> callout=0x80c3ca90 , >>> arg=0xf801a0577520, frame=0xfe009d4edc00) >>> at /usr/src/sys/kern/kern_fork.c:1063 >>> #288 0x810b367e in fork_trampoline () >>> at /usr/src/sys/amd64/amd64/exception.S:996 >>> #289 0x in ?? () >>> Current language: auto; currently minimal >>> (kgdb) >> You have "double fault" and completely insane number of stack frames in the >> trace. >> This is obviously infinite recursion resulting in kernel stack overflow and >> panic. > Yes, but why and how? > > What's executing at the time is this command: > > zfs send -RI $i@zfs-old $i@zfs-base | zfs receive -Fudv $BACKUP > > Which in turn results in the old snapshots on the target not on the > source being deleted, then the new ones being sent. It never gets to > the sending part; it blows up during the delete of the OLD snapshots. > > The one(s) it deletes, however, it DOES delete. When the box is > rebooted those two snapshots on the target are indeed gone. > > That is, it is NOT getting "stuck" on one (which would imply there's an > un-detected fault in the filesystem on the target in the metadata for > that snapshot, resulting in a recursive call that blows up the stack) > and it never gets to send the new snapshot, so whatever is going on is > NOT on the source filesystem. Neither source or destination shows any > errors on the filesystem; both pools are healthy with zero error counts. > > Therefore the question -- is the system queueing enough work to blow the > stack *BUT* the work it queues is all legitimate? If so there's a > serious problem in the way the code now functions in that an "ordinary" > operation can result in what amounts to kernel stack exhaustion. > > One note -- I haven't run this backup for the last five days, as I do it > manually and I've been out of town. Previous running it on a daily > basis completed without trouble. This smells like a backlog of "things > to do" when the send runs that results in the allegedly-infinite > recursion (that isn't really infinite) that runs the stack out of space > -- and THAT implies that the system is trying to queue a crazy amount of > work on a recursive basis for what is a perfectly-legitimate operation > -- which it should *NOT* do. Update: This looks like an OLD bug that came back. 
Previously the system would go absolutely insane on the first few accesses to spinning rust during a snapshot delete and ATTEMPT to send thousands of TRIM requests -- which spinning rust does not support. On a system with mixed vdevs, where some pools are rust and some are SSD, this was a problem since you can't turn TRIM off because you REALLY want it on those disks. The FIX for this was to do this on the import of said pool comprised of spinning rust:

#
# Now try to trigger TRIM so that we don't have a storm of them
#
#
echo "Attempting to disable TRIM on spinning rust"
mount -t zfs $BACKUP/no-trim /mnt
dd if=/dev/random of=/mnt/kill-trim bs=128k count=2
echo "Performed 2 writes"
sleep 2
rm /mnt/kill-trim
echo "Performed delete of written file; wait"
sleep 35
umount /mnt
echo "Unmounted tempo
Re: Kernel panic in zfs code; 12-STABLE
On 7/18/2019 15:19, Eugene Grosbein wrote: > 19.07.2019 3:13, Karl Denninger wrote: > >> FreeBSD 12.0-STABLE #2 r349024M: Thu Jun 13 18:01:16 CDT 2019 >> k...@newfs.denninger.net:/usr/obj/usr/src/amd64.amd64/sys/KSD-SMP >> >> Note -- no patches of any sort in the ZFS code; I am NOT running any of >> my former patch set. >> >> NewFS.denninger.net dumped core - see /var/crash/vmcore.8 >> >> Thu Jul 18 15:02:54 CDT 2019 >> >> FreeBSD NewFS.denninger.net 12.0-STABLE FreeBSD 12.0-STABLE #2 r349024M: >> Thu Jun 13 18:01:16 CDT 2019 >> k...@newfs.denninger.net:/usr/obj/usr/src/amd64.amd64/sys/KSD-SMP amd64 >> >> panic: double fault > [skip] > >> #283 0x82748d91 in zio_vdev_io_done (zio=0xf8000b8b8000) >> at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:3376 >> #284 0x82744eac in zio_execute (zio=0xf8000b8b8000) >> at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:1786 >> #285 0x80c3b7f4 in taskqueue_run_locked (queue=0xf801a8b35100) >> at /usr/src/sys/kern/subr_taskqueue.c:467 >> #286 0x80c3cb28 in taskqueue_thread_loop (arg=) >> at /usr/src/sys/kern/subr_taskqueue.c:773 >> #287 0x80b9ab23 in fork_exit ( >> callout=0x80c3ca90 , >> arg=0xf801a0577520, frame=0xfe009d4edc00) >> at /usr/src/sys/kern/kern_fork.c:1063 >> #288 0x810b367e in fork_trampoline () >> at /usr/src/sys/amd64/amd64/exception.S:996 >> #289 0x in ?? () >> Current language: auto; currently minimal >> (kgdb) > You have "double fault" and completely insane number of stack frames in the > trace. > This is obviously infinite recursion resulting in kernel stack overflow and > panic. Yes, but why and how? What's executing at the time is this command: zfs send -RI $i@zfs-old $i@zfs-base | zfs receive -Fudv $BACKUP Which in turn results in the old snapshots on the target not on the source being deleted, then the new ones being sent. It never gets to the sending part; it blows up during the delete of the OLD snapshots. The one(s) it deletes, however, it DOES delete. When the box is rebooted those two snapshots on the target are indeed gone. That is, it is NOT getting "stuck" on one (which would imply there's an un-detected fault in the filesystem on the target in the metadata for that snapshot, resulting in a recursive call that blows up the stack) and it never gets to send the new snapshot, so whatever is going on is NOT on the source filesystem. Neither source or destination shows any errors on the filesystem; both pools are healthy with zero error counts. Therefore the question -- is the system queueing enough work to blow the stack *BUT* the work it queues is all legitimate? If so there's a serious problem in the way the code now functions in that an "ordinary" operation can result in what amounts to kernel stack exhaustion. One note -- I haven't run this backup for the last five days, as I do it manually and I've been out of town. Previous running it on a daily basis completed without trouble. This smells like a backlog of "things to do" when the send runs that results in the allegedly-infinite recursion (that isn't really infinite) that runs the stack out of space -- and THAT implies that the system is trying to queue a crazy amount of work on a recursive basis for what is a perfectly-legitimate operation -- which it should *NOT* do. -- Karl Denninger k...@denninger.net <mailto:k...@denninger.net> /The Market Ticker/ /[S/MIME encrypted email preferred]/ smime.p7s Description: S/MIME Cryptographic Signature
Kernel panic in zfs code; 12-STABLE
o_done (zio=0xf8000b8b8000) at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:3376 #284 0x82744eac in zio_execute (zio=0xf8000b8b8000) at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:1786 #285 0x80c3b7f4 in taskqueue_run_locked (queue=0xf801a8b35100) at /usr/src/sys/kern/subr_taskqueue.c:467 #286 0x80c3cb28 in taskqueue_thread_loop (arg=) at /usr/src/sys/kern/subr_taskqueue.c:773 #287 0x80b9ab23 in fork_exit ( callout=0x80c3ca90 , arg=0xf801a0577520, frame=0xfe009d4edc00) at /usr/src/sys/kern/kern_fork.c:1063 #288 0x810b367e in fork_trampoline () at /usr/src/sys/amd64/amd64/exception.S:996 #289 0x in ?? () Current language: auto; currently minimal (kgdb) This is currently repeatable. What was going on at the instant in time was: root@NewFS:~ # /root/backup-zfs/run-backup Begin local ZFS backup by SEND Run backups of default [zsr/R/12.STABLE-2019-06-14 zsr/home zs/archive zs/colo-archive zs/disk zsr/dbms/pgsql zs/work zs/dbms/ticker-9.6] Thu Jul 18 14:57:57 CDT 2019 Import backup pool Imported; ready to proceed Processing zsr/R/12.STABLE-2019-06-14 Bring incremental backup up to date attempting destroy backup/R/12.STABLE-2019-06-14@zfs-auto-snap_daily-2019-07-10-00h07 success attempting destroy backup/R/12.STABLE-2019-06-14@zfs-auto-snap_daily-2019-07-11-00h07 success It destroyed the snapshot on the backup volume, and panic'd immediately thereafter. This is an incremental send. If I reboot the machine and re-start the backup job it will blow up when a couple more of the incremental deletes get done. Given the depth of the callback stack is this simply a kernel stack exhaustion problem? I wouldn't THINK it would be, but.. I do have this in /boot/loader.conf: # Try to avoid kernel stack exhaustion due to TRIM storms. kern.kstack_pages="6" The backup volumes are spinning rust, so there should be no "TRIM" attempts to them. In theory. I have the dump if someone wants me to run anything specific against it in terms of stack frames, etc. -- Karl Denninger k...@denninger.net <mailto:k...@denninger.net> /The Market Ticker/ /[S/MIME encrypted email preferred]/ smime.p7s Description: S/MIME Cryptographic Signature
Re: ZFS...
On 5/8/2019 19:28, Kevin P. Neal wrote:
> On Wed, May 08, 2019 at 11:28:57AM -0500, Karl Denninger wrote:
>> If you have pool(s) that are taking *two weeks* to run a scrub IMHO
>> either something is badly wrong or you need to rethink organization of
>> the pool structure -- that is, IMHO you likely either have a severe
>> performance problem with one or more members or an architectural problem
>> you *really* need to determine and fix. If a scrub takes two weeks
>> *then a resilver could conceivably take that long as well* and that's
>> *extremely* bad as the window for getting screwed is at its worst when a
>> resilver is being run.
> Wouldn't having multiple vdevs mitigate the issue for resilvers (but not
> scrubs)? My understanding, please correct me if I'm wrong, is that a
> resilver only reads the surviving drives in that specific vdev.

Yes. In addition, while "most-modern" revisions have material improvements (very much so) in scrub times "out of the box", a bit of tuning makes for very material differences in older revisions. Specifically, maxinflight can be a big deal given a reasonable amount of RAM (e.g. 16 or 32Gb), as is async_write_min_active (raise it to "2"; you may get a bit more with "3", but not a lot).

I have a scrub running right now and this is what it looks like:

Disks   da2    da3    da4    da5    da8    da9   da10
KB/t  10.40  11.03    103    108    122  98.11  98.48
tps      46     45   1254   1205   1062   1324   1319
MB/s   0.46   0.48    127    127    127    127    127
%busy     0      0     48     62     97     28     31

Here's the current stat on that pool:

  pool: zs
 state: ONLINE
  scan: scrub in progress since Thu May 9 03:10:00 2019
        11.9T scanned at 643M/s, 11.0T issued at 593M/s, 12.8T total
        0 repaired, 85.58% done, 0 days 00:54:29 to go
config:

        NAME               STATE     READ WRITE CKSUM
        zs                 ONLINE       0     0     0
          raidz2-0         ONLINE       0     0     0
            gpt/rust1.eli  ONLINE       0     0     0
            gpt/rust2.eli  ONLINE       0     0     0
            gpt/rust3.eli  ONLINE       0     0     0
            gpt/rust4.eli  ONLINE       0     0     0
            gpt/rust5.eli  ONLINE       0     0     0

errors: No known data errors

Indeed it will be done in about an hour; this is an "automatic" one kicked off out of periodic. It's comprised of 4Tb disks and is about 70% occupied. When I get somewhere around another 5-10% I'll swap in 6Tb drives for the 4Tb ones and swap in 8Tb "primary" backup disks for the existing 6Tb ones.

This particular machine has a spinning rust pool (which is this one) and another that's comprised of 240Gb Intel 730 SSDs (fairly old as SSDs go but much faster than spinning rust and they have power protection, which IMHO is utterly mandatory for SSDs in any environment where you actually care about the data being there after a forced, unexpected plug-pull.) This machine is UPS-backed with apcupsd monitoring it so *in theory* it should never have an unsolicited power failure without notice but "crap happens"; a few years ago there was an undetected fault in one of the batteries (the UPS didn't know about it despite it being programmed to do automated self-tests and hadn't reported the fault), power glitched and blammo -- down it went, no warning.

My current "consider those" SSDs for similar replacement or size upgrades would likely be the Micron units -- not the fastest out there but plenty fast, reasonably priced, available in several different versions depending on write endurance and power-protected. -- Karl Denninger k...@denninger.net <mailto:k...@denninger.net> /The Market Ticker/ /[S/MIME encrypted email preferred]/ smime.p7s Description: S/MIME Cryptographic Signature
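For the curious, a sketch of where those two knobs live on the pre-OpenZFS in-tree ZFS -- the OID names below are what I believe they are called there, and the values are starting points to experiment with rather than recommendations (confirm with sysctl -d on your revision):

  # /etc/sysctl.conf (or set live with sysctl)
  vfs.zfs.top_maxinflight=128             # scrub I/Os allowed in flight per top-level vdev
  vfs.zfs.vdev.async_write_min_active=2   # default is 1; "2" buys most of the gain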
Re: ZFS...
On 5/8/2019 11:53, Freddie Cash wrote: > On Wed, May 8, 2019 at 9:31 AM Karl Denninger wrote: > >> I have a system here with about the same amount of net storage on it as >> you did. It runs scrubs regularly; none of them take more than 8 hours >> on *any* of the pools. The SSD-based pool is of course *much* faster >> but even the many-way RaidZ2 on spinning rust is an ~8 hour deal; it >> kicks off automatically at 2:00 AM when the time comes but is complete >> before noon. I run them on 14 day intervals. >> > (description elided) That is a /lot /bigger pool than either Michelle or I are describing. We're both in the ~20Tb of storage space area. You're running 5-10x that in usable space in some of these pools and yet seeing ~2 day scrub times on a couple of them (that is, the organization looks pretty reasonable given the size and so is the scrub time), one that's ~5 days and likely has some issues with parallelism and fragmentation, and then, well, two awfuls which are both dedup-enabled. -- Karl Denninger k...@denninger.net <mailto:k...@denninger.net> /The Market Ticker/ /[S/MIME encrypted email preferred]/ smime.p7s Description: S/MIME Cryptographic Signature
Re: ZFS...
On 5/8/2019 10:14, Michelle Sullivan wrote: > Paul Mather wrote: >> On May 8, 2019, at 9:59 AM, Michelle Sullivan >> wrote: >> >>>> Did you have regular pool scrubs enabled? It would have picked up >>>> silent data corruption like this. It does for me. >>> Yes, every month (once a month because, (1) the data doesn't change >>> much (new data is added, old it not touched), and (2) because to >>> complete it took 2 weeks.) >> >> >> Do you also run sysutils/smartmontools to monitor S.M.A.R.T. >> attributes? Although imperfect, it can sometimes signal trouble >> brewing with a drive (e.g., increasing Reallocated_Sector_Ct and >> Current_Pending_Sector counts) that can lead to proactive remediation >> before catastrophe strikes. > not Automatically >> >> Unless you have been gathering periodic drive metrics, you have no >> way of knowing whether these hundreds of bad sectors have happened >> suddenly or slowly over a period of time. > no, it something i have thought about but been unable to spend the > time on. > There are two issues here that would concern me greatly and IMHO you should address. I have a system here with about the same amount of net storage on it as you did. It runs scrubs regularly; none of them take more than 8 hours on *any* of the pools. The SSD-based pool is of course *much* faster but even the many-way RaidZ2 on spinning rust is an ~8 hour deal; it kicks off automatically at 2:00 AM when the time comes but is complete before noon. I run them on 14 day intervals. If you have pool(s) that are taking *two weeks* to run a scrub IMHO either something is badly wrong or you need to rethink organization of the pool structure -- that is, IMHO you likely either have a severe performance problem with one or more members or an architectural problem you *really* need to determine and fix. If a scrub takes two weeks *then a resilver could conceivably take that long as well* and that's *extremely* bad as the window for getting screwed is at its worst when a resilver is being run. Second, smartmontools/smartd isn't the be-all, end-all but it *does* sometimes catch incipient problems with specific units before they turn into all-on death and IMHO in any installation of any material size where one cares about the data (as opposed to "if it fails just restore it from backup") it should be running. It's very easy to set up and there are no real downsides to using it. I have one disk that I rotate in and out that was bought as a "refurb" and has 70 permanent relocated sectors on it. It has never grown another one since I acquired it, but every time it goes in the machine within minutes I get an alert on that. If I was to ever get *71*, or a *different* drive grew a new one said drive would get replaced *instantly*. Over the years it has flagged two disks before they "hard failed" and both were immediately taken out of service, replaced and then destroyed and thrown away. Maybe that's me being paranoid but IMHO it's the correct approach to such notifications. BTW that tool will *also* tell you if something else software-wise is going on that you *might* think is drive-related. For example recently here on the list I ran into a really oddball thing happening with SAS expanders that showed up with 12-STABLE and was *not* present in the same box with 11.1. 
Smartmontools confirmed that while the driver was reporting errors from the disks *the disks themselves were not in fact taking errors.* Had I not had that information I might well have traveled down a road that led to a catastrophic pool failure by attempting to replace disks that weren't actually bad. The SAS expander wound up being taken out of service and replaced with an HBA that has more ports -- the issues disappeared. Finally, while you *think* you only have a metadata problem I'm with the other people here in expressing disbelief that the damage is limited to that. There is enough redundancy in the metadata on ZFS that if *all* copies are destroyed or inconsistent to the degree that they're unusable it's extremely likely that if you do get some sort of "disaster recovery" tool working you're going to find out that what you thought was a metadata problem is really a "you're hosed; the data is also gone" sort of problem. -- Karl Denninger k...@denninger.net <mailto:k...@denninger.net> /The Market Ticker/ /[S/MIME encrypted email preferred]/ smime.p7s Description: S/MIME Cryptographic Signature
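Getting smartd going is a small job; a minimal sketch after installing sysutils/smartmontools (mail target, devices and thresholds are examples to tailor):

  # /usr/local/etc/smartd.conf
  DEVICESCAN -a -m root               # watch everything smartd finds, mail root on trouble
  #/dev/da2 -a -m root -W 4,45,50     # or list drives explicitly, with temperature warnings

then enable and start it:

  sysrc smartd_enable=YES
  service smartd start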
Re: ZFS...
to try to separate them until you get well into the terabytes of storage range and a half-dozen or so physical volumes. There's a very clean argument that prior to that point, if you have more than one drive, a mirror is always the better choice.

Note that if you have an *adapter* go insane (and as I've noted here I've had it happen TWICE in my IT career!) then *all* of the data on the disks served by that adapter is screwed. It doesn't make a bit of difference what filesystem you're using in that scenario, and thus you had better have a backup scheme and make sure it works as well, never mind software bugs or administrator stupidity ("dd" as root to the wrong target, for example, will reliably screw you every single time!)

For a single-disk machine ZFS is no *less* safe than UFS and provides a number of advantages, with arguably the most important being easily-used snapshots. Not only does this simplify backups, since coherency during the backup is never at issue and incremental backups become fast and easily done; in addition, boot environments make roll-forward and even *roll-back* reasonable to implement for software updates -- a critical capability if you ever run an OS version update and something goes seriously wrong with it. If you've never had that happen then consider yourself blessed; it's NOT fun to manage in a UFS environment and often winds up leading to a "restore from backup" scenario. (To be fair it can be with ZFS too, if you're foolish enough to upgrade the pool before being sure you're happy with the new OS rev.) -- Karl Denninger k...@denninger.net <mailto:k...@denninger.net> /The Market Ticker/ /[S/MIME encrypted email preferred]/ smime.p7s Description: S/MIME Cryptographic Signature
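To illustrate the snapshot/backup point (pool and snapshot names are invented; the send/receive flags mirror what appears elsewhere in this thread):

  zfs snapshot -r zroot@2019-05-01      # an instant, coherent point-in-time view
  zfs send -RI zroot@2019-04-24 zroot@2019-05-01 | zfs receive -Fudv backup/zroot

The send is incremental against the previous snapshot, so routine backups stay fast, and the receiving pool ends up with the same snapshot history to roll back to.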
Re: ZFS...
On 4/30/2019 20:59, Michelle Sullivan wrote >> On 01 May 2019, at 11:33, Karl Denninger wrote: >> >>> On 4/30/2019 19:14, Michelle Sullivan wrote: >>> >>> Michelle Sullivan >>> http://www.mhix.org/ >>> Sent from my iPad >>> >> Nope. I'd much rather *know* the data is corrupt and be forced to >> restore from backups than to have SILENT corruption occur and perhaps >> screw me 10 years down the road when the odds are my backups have >> long-since been recycled. > Ahh yes the be all and end all of ZFS.. stops the silent corruption of data.. > but don’t install it on anything unless it’s server grade with backups and > ECC RAM, but it’s good on laptops because it protects you from silent > corruption of your data when 10 years later the backups have long-since been > recycled... umm is that not a circular argument? > > Don’t get me wrong here.. and I know you (and some others are) zfs in the DC > with 10s of thousands in redundant servers and/or backups to keep your > critical data corruption free = good thing. > > ZFS on everything is what some say (because it prevents silent corruption) > but then you have default policies to install it everywhere .. including > hardware not equipped to function safely with it (in your own arguments) and > yet it’s still good because it will still prevent silent corruption even > though it relies on hardware that you can trust... umm say what? > > Anyhow veered way way off (the original) topic... > > Modest (part consumer grade, part commercial) suffered irreversible data loss > because of a (very unusual, but not impossible) double power outage.. and no > tools to recover the data (or part data) unless you have some form of backup > because the file system deems the corruption to be too dangerous to let you > access any of it (even the known good bits) ... > > Michelle IMHO you're dead wrong Michelle. I respect your opinion but disagree vehemently. I run ZFS on both of my laptops under FreeBSD. Both have non-power-protected SSDs in them. Neither is mirrored or Raidz-anything. So why run ZFS instead of UFS? Because a scrub will detect data corruption that UFS cannot detect *at all.* It is a balance-of-harms test and you choose. I can make a very clean argument that *greater information always wins*; that is, I prefer in every case to *know* I'm screwed rather than not. I can defend against being screwed with some amount of diligence but in order for that diligence to be reasonable I have to know about the screwing in a reasonable amount of time after it happens. You may have never had silent corruption bite you. I have had it happen several times over my IT career. If that happens to you the odds are that it's absolutely unrecoverable and whatever gets corrupted is *gone.* The defensive measures against silent corruption require retention of backup data *literally forever* for the entire useful life of the information because from the point of corruption forward *the backups are typically going to be complete and correct copies of the corrupt data and thus equally worthless to what's on the disk itself.* With non-ZFS filesystems quite a lot of thought and care has to go into defending against that, and said defense usually requires the active cooperation of whatever software wrote said file in the first place (e.g. a database, etc.) If said software has no tools to "walk" said data or if it's impractical to have it do so you're at severe risk of being hosed. Prior to ZFS there really wasn't any comprehensive defense against this sort of event. 
There are a whole host of applications that manipulate data that are absolutely reliant on that sort of thing not happening (e.g. anything using a btree data structure) and recovery if it *does* happen is a five-alarm nightmare if it's possible at all. In the worst-case scenario you don't detect the corruption and the data whose pointer got corrupted is overwritten and destroyed. A ZFS scrub on a volume that has no redundancy cannot *fix* that corruption but it can and will detect it. This puts a boundary on the backups that I must keep in order to *not* have that happen. This is of very high value to me and is why, even on systems without ECC memory and without redundant disks, provided there is enough RAM to make it reasonable (e.g. not on the embedded systems I do development on, which are severely RAM-constrained) I run ZFS. BTW if you've never had a UFS volume unlink all the blocks within a file on an fsck and then recover them back into the free list after a crash you're a rare bird indeed. If you think a corrupt ZFS volume is fun, try to get your data back from said file after that happens. -- Karl Denninger k...@denninger.net <mailto:k...@denninger.net> /The Market Ticker/ /[S/MIME encrypted email preferred]/ smime.p7s Description: S/MIME Cryptographic Signature
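The scheduling can be left to periodic(8); a sketch, assuming the stock knobs in /etc/defaults/periodic.conf:

    # /etc/periodic.conf
    daily_scrub_zfs_enable="YES"
    daily_scrub_zfs_default_threshold="7"    # days between scrubs of a given pool

That keeps the detection window for silent corruption to roughly a week without anyone having to remember to kick off the scrub by hand.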
Re: ZFS...
On 4/30/2019 19:14, Michelle Sullivan wrote: > > Michelle Sullivan > http://www.mhix.org/ > Sent from my iPad > >> On 01 May 2019, at 01:15, Karl Denninger wrote: >> >> >> IMHO non-ECC memory systems are ok for personal desktop and laptop >> machines where loss of stored data requiring a restore is acceptable >> (assuming you have a reasonable backup paradigm for same) but not for >> servers and *especially* not for ZFS storage. I don't like the price of >> ECC memory and I really don't like Intel's practices when it comes to >> only enabling ECC RAM on their "server" class line of CPUs either but it >> is what it is. Pay up for the machines where it matters. > And the irony is the FreeBSD policy to default to zfs on new installs using > the complete drive.. even when there is only one disk available and > regardless of the cpu or ram class... with one usb stick I have around here > it attempted to use zfs on one of my laptops. > > Damned if you do, damned if you don’t comes to mind. > Nope. I'd much rather *know* the data is corrupt and be forced to restore from backups than to have SILENT corruption occur and perhaps screw me 10 years down the road when the odds are my backups have long-since been recycled. -- Karl Denninger k...@denninger.net <mailto:k...@denninger.net> /The Market Ticker/ /[S/MIME encrypted email preferred]/ smime.p7s Description: S/MIME Cryptographic Signature
Re: ZFS...
On 4/30/2019 09:12, Alan Somers wrote: > On Tue, Apr 30, 2019 at 8:05 AM Michelle Sullivan wrote: > . >> I know this... unless I misread Karl’s message he implied the ECC would have >> saved the corruption in the crash... which is patently false... I think >> you’ll agree.. > I don't think that's what Karl meant. I think he meant that the > non-ECC RAM could've caused latent corruption that was only detected > when the crash forced a reboot and resilver. Exactly. Non-ECC memory means you can potentially write data to *all* copies of a block (and its parity in the case of a Raidz) where the checksum is invalid and there is no way for the code to know it happened or defend against it. Unfortunately since the checksum is very small compared to the data size the odds are that IF that happens it's the *data* and not the checksum that's bad and there are *no* good copies. Contrary to popular belief the "power good" signal on your PSU and MB do not provide 100% protection against transient power problems causing this to occur with non-ECC memory either. IMHO non-ECC memory systems are ok for personal desktop and laptop machines where loss of stored data requiring a restore is acceptable (assuming you have a reasonable backup paradigm for same) but not for servers and *especially* not for ZFS storage. I don't like the price of ECC memory and I really don't like Intel's practices when it comes to only enabling ECC RAM on their "server" class line of CPUs either but it is what it is. Pay up for the machines where it matters. One of the ironies is that there's better data *integrity* with ZFS than other filesystems in this circumstance; you're much more-likely to *know* you're hosed even if the situation is unrecoverable and requires a restore. With UFS and other filesystems you can quite-easily wind up with silent corruption that can go undetected; the filesystem "works" just fine but the data is garbage. From my point of view that's *much* worse. In addition IMHO consumer drives are not exactly safe for online ZFS storage. Ironically they're *safer* for archival use because when not actively in use they're dismounted and thus not subject to "you're silently hosed" sort of failures. What sort of "you're hosed" failures? Oh, for example, claiming to have flushed their cache buffers before returning "complete" on that request when they really did not! In combination with write re-ordering that can *really* screw you and there's nothing that any filesystem can defensively do about it either. This sort of "cheat" is much-more likely to be present in consumer drives than ones sold for either enterprise or NAS purposes and it's quite difficult to accurately test for this sort of thing on an individual basis too. -- Karl Denninger k...@denninger.net <mailto:k...@denninger.net> /The Market Ticker/ /[S/MIME encrypted email preferred]/ smime.p7s Description: S/MIME Cryptographic Signature
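You can at least see what a drive *claims* about its volatile write cache (whether it honors a flush request is another matter entirely); a sketch, with device names being examples:

    camcontrol identify ada0 | grep -i "write cache"    # SATA: shows supported / enabled
    camcontrol modepage da0 -m 8                        # SAS/SCSI: look at the WCE bit in the caching mode page

None of that proves the firmware actually flushes when told to, which is exactly the problem with the cheaters.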
Re: ZFS...
On 4/30/2019 08:38, Michelle Sullivan wrote: > Karl Denninger wrote: >> On 4/30/2019 03:09, Michelle Sullivan wrote: >>> Consider.. >>> >>> If one triggers such a fault on a production server, how can one >>> justify transferring from backup multiple terabytes (or even >>> petabytes now) of data to repair an unmountable/faulted array >>> because all backup solutions I know currently would take days if not >>> weeks to restore the sort of store ZFS is touted with supporting. >> Had it happen on a production server a few years back with ZFS. The >> *hardware* went insane (disk adapter) and scribbled on *all* of the >> vdevs. >> >> >> Time to recover essential functions was ~8 hours (and over 24 >> hours for everything to be restored.) >> > How big was the storage area? > In that specific instance approximately 10Tb in-use. The working set that allowed critical functions to come online (and which got restored first, obviously) was ~3Tb. BTW my personal "primary server" working set is approximately 20Tb. There is data on that server dating back to 1982 -- yes, data all the way back to systems I owned that ran on a Z-80 processor with 64Kb (not MB) of RAM. I started running ZFS a fairly long time ago on FreeBSD -- 9.0, I believe, and have reorganized and upgraded drives over time. If my storage fails "hard" in a way that I have no local backups available (e.g. building fire, adapter scribbles on drives including not-mounted ones, etc) critical functionality (e.g. email receipt, etc) can be back online in roughly 3-4 hours, assuming the bank is open and I can get to the backup volumes. A full restore will require more than a day. I've tested restore of each individual piece of the backup structure but do not have the hardware available in the building to restore a complete clone. With the segregated structure of it, however, I'm 100% certain it is all restorable. That's tested regularly -- just to be sure. Now if we get nuked and the bank vault is also destroyed then it's over, but then again I'm probably a burnt piece of charcoal in such a circumstance so that's a risk I accept. When I ran my ISP in the 1990s we had both local copies and vault copies because a "scribbles on disk" failure on a Saturday couldn't be unable to be addressed until Monday morning. We would have been out of business instantly if that had happened in any situation short of the office with our primary data center burning down. Incidentally one of my adapter failures was in exactly the worst possible place for it to occur while running that company -- the adapter on the machine that held our primary authentication and billing database. At the time the best option for "large" working sets was DLT. Now for most purposes it's another disk. Disks, however, must be re-verified more-frequently than DLT -- MUCH more frequently. Further, if you have only ONE backup then it cannot be singular (e.g. there must be two or more, whether via mirroring or some other mechanism) on ANY media. While DLT, for example, has a typical expected 30 year archival life that doesn't mean the ONE tape you have will be readable 30 years later. As data size expands noodling on how you segregate data into read-only, write-very-occasionally and read/write, along with how you handle backups of each component and how, or if, you subdivide those categories for backup purposes becomes quite important. 
If performance matters (and it usually does) then what goes where in what pool (and across pools of similar base storage types) matters too; in my personal working set there are both SSDs (all "power-pull safe" drives, which cost more and tend to be somewhat slower than "consumer" SSDs) and spinning rust storage for that exact reason. Note that on this list right now I'm chasing a potential "gotcha" interaction between geli and ZFS that thus far has eluded isolation. While it has yet to corrupt data, the potential is there and the hair on the back of my neck is standing up a bit as a consequence. It appears to date to either 11.2 or 12.0 and *definitely* is present in 12-STABLE; it was *not* present on 11.1. The price of keeping your data intact is always eternal vigilance. -- Karl Denninger k...@denninger.net <mailto:k...@denninger.net> /The Market Ticker/ /[S/MIME encrypted email preferred]/ smime.p7s Description: S/MIME Cryptographic Signature
Re: ZFS...
On 4/30/2019 03:09, Michelle Sullivan wrote: > Consider.. > > If one triggers such a fault on a production server, how can one justify > transferring from backup multiple terabytes (or even petabytes now) of data > to repair an unmountable/faulted array because all backup solutions I > know currently would take days if not weeks to restore the sort of store ZFS > is touted with supporting. Had it happen on a production server a few years back with ZFS. The *hardware* went insane (disk adapter) and scribbled on *all* of the vdevs. The machine crashed and would not come back up -- at all. I insist on (and had) emergency boot media physically in the box (a USB key) in any production machine and it was quite-quickly obvious that all of the vdevs were corrupted beyond repair. There was no rational option other than to restore. It was definitely not a pleasant experience, but this is why when you get into systems and data store sizes where it's a five-alarm pain in the neck you must figure out some sort of strategy that covers you 99% of the time without a large amount of downtime involved, and in the 1% case accept said downtime. In this particular circumstance the customer didn't want to spend on a doubled-and-transaction-level protected on-site (in the same DC) redundancy setup originally so restore, as opposed to fail-over/promote and then restore and build a new "redundant" box where the old "primary" resided was the most-viable option. Time to recover essential functions was ~8 hours (and over 24 hours for everything to be restored.) Incidentally that's not the first time I've had a disk adapter failure on a production machine in my career as a systems dude; it was, in fact, the *third* such failure. Then again I've been doing this stuff since the 1980s and learned long ago that if it can break it eventually will, and that Murphy is a real b**. The answer to your question Michelle is that when restore times get into "seriously disruptive" amounts of time (e.g. hours, days or worse depending on the application involved and how critical it is) you spend the time and money to have redundancy in multiple places and via paths that do not destroy the redundant copies when things go wrong, and you spend the engineering time to figure out what those potential faults are and how to design such that a fault which can destroy the data set does not propagate to the redundant copies before it is detected. -- Karl Denninger k...@denninger.net <mailto:k...@denninger.net> /The Market Ticker/ /[S/MIME encrypted email preferred]/ smime.p7s Description: S/MIME Cryptographic Signature
Re: ZFS...
On 4/30/2019 05:14, Michelle Sullivan wrote: >> On 30 Apr 2019, at 19:50, Xin LI wrote: >>> On Tue, Apr 30, 2019 at 5:08 PM Michelle Sullivan >>> wrote: >>> but in my recent experience 2 issues colliding at the same time results in >>> disaster >> Do we know exactly what kind of corruption happen to your pool? If you see >> it twice in a row, it might suggest a software bug that should be >> investigated. >> >> All I know is it’s a checksum error on a meta slab (122) and from what I can >> gather it’s the spacemap that is corrupt... but I am no expert. I don’t >> believe it’s a software fault as such, because this was cause by a hard >> outage (damaged UPSes) whilst resilvering a single (but completely failed) >> drive. ...and after the first outage a second occurred (same as the first >> but more damaging to the power hardware)... the host itself was not damaged >> nor were the drives or controller. . >> Note that ZFS stores multiple copies of its essential metadata, and in my >> experience with my old, consumer grade crappy hardware (non-ECC RAM, with >> several faulty, single hard drive pool: bad enough to crash almost monthly >> and damages my data from time to time), > This was a top end consumer grade mb with non ecc ram that had been running > for 8+ years without fault (except for hard drive platter failures.). Uptime > would have been years if it wasn’t for patching. Yuck. I'm sorry, but that may well be what nailed you. ECC is not just about the random cosmic ray. It also saves your bacon when there are power glitches. Unfortunately however there is also cache memory on most modern hard drives, most of the time (unless you explicitly shut it off) it's on for write caching, and it'll nail you too. Oh, and it's never, in my experience, ECC. In addition, however, and this is something I learned a LONG time ago (think Z-80 processors!) is that as in so many very important things "two is one and one is none." In other words without a backup you WILL lose data eventually, and it WILL be important. Raidz2 is very nice, but as the name implies it you have two redundancies. If you take three errors, or if, God forbid, you *write* a block that has a bad checksum in it because it got scrambled while in RAM, you're dead if that happens in the wrong place. > Yeah.. unlike UFS that has to get really really hosed to restore from backup > with nothing recoverable it seems ZFS can get hosed where issues occur in > just the wrong bit... but mostly it is recoverable (and my experience has > been some nasty shit that always ended up being recoverable.) > > Michelle Oh that is definitely NOT true again, from hard experience, including (but not limited to) on FreeBSD. My experience is that ZFS is materially more-resilient but there is no such thing as "can never be corrupted by any set of events." Backup strategies for moderately large (e.g. many Terabytes) to very large (e.g. Petabytes and beyond) get quite complex but they're also very necessary. -- Karl Denninger k...@denninger.net <mailto:k...@denninger.net> /The Market Ticker/ /[S/MIME encrypted email preferred]/ smime.p7s Description: S/MIME Cryptographic Signature
Re: Concern: ZFS Mirror issues (12.STABLE and firmware 19 .v. 20) [[UPDATE w/more tests]]
On 4/20/2019 15:56, Steven Hartland wrote: > Thanks for extra info, the next question would be have you eliminated > that corruption exists before the disk is removed? > > Would be interesting to add a zpool scrub to confirm this isn't the > case before the disk removal is attempted. > > Regards > Steve > > On 20/04/2019 18:35, Karl Denninger wrote: >> >> On 4/20/2019 10:50, Steven Hartland wrote: >>> Have you eliminated geli as possible source? >> No; I could conceivably do so by re-creating another backup volume >> set without geli-encrypting the drives, but I do not have an extra >> set of drives of the capacity required laying around to do that. I >> would have to do it with lower-capacity disks, which I can attempt if >> you think it would help. I *do* have open slots in the drive >> backplane to set up a second "test" unit of this sort. For reasons >> below it will take at least a couple of weeks to get good data on >> whether the problem exists without geli, however. >> Ok, following up on this with more data First step taken was to create a *second* backup pool (I have the backplane slots open, fortunately) with three different disks but *no encryption.* I ran both side-by-side for several days, with the *unencrypted* one operating with one disk detached and offline (pulled physically) just as I do normally. Then I swapped the two using the same paradigm. The difference was *dramatic* -- the resilver did *not* scan the entire disk; it only copied the changed blocks and was finished FAST. A subsequent scrub came up 100% clean. Next I put THOSE disks in the vault (so as to make sure I didn't get hosed if something went wrong) and re-initialized the pool in question, leaving only the "geli" alone (in other words I zpool destroy'd the pool and then re-created it with all three disks connected and geli-attached.) The purpose for doing this was to eliminate the possibility of old corruption somewhere, or some sort of problem with multiple, spanning years, in-place "zpool upgrade" commands. Then I ran a base backup to initialize all three volumes, detached one and yanked it out of the backplane, as would be the usual, leaving the other two online and operating. I ran backups as usual for most of last week after doing this, with the 61.eli and 62-1.eli volumes online, and 62-2 physically out of the backplane. Today I swapped them again as I usually do (e.g. offline 62.1, geli detach, camcontrol standby and then yank it -- then insert the 62-2 volume, geli attach and zpool online) and this is happening: [\u@NewFS /home/karl]# zpool status backup pool: backup state: DEGRADED status: One or more devices is currently being resilvered. The pool will continue to function, possibly in a degraded state. action: Wait for the resilver to complete. scan: resilver in progress since Sun Apr 28 12:57:47 2019 2.48T scanned at 202M/s, 1.89T issued at 154M/s, 3.27T total 1.89T resilvered, 57.70% done, 0 days 02:37:14 to go config: NAME STATE READ WRITE CKSUM backup DEGRADED 0 0 0 mirror-0 DEGRADED 0 0 0 gpt/backup61.eli ONLINE 0 0 0 11295390187305954877 OFFLINE 0 0 0 was /dev/gpt/backup62-1.eli gpt/backup62-2.eli ONLINE 0 0 0 errors: No known data errors The "3.27T" number is accurate (by "zpool list") for the space in use. There is not a snowball's chance in Hades that anywhere near 1.89T of that data (thus far, and it ain't done as you can see!) was modified between when all three disks were online and when the 62-2.eli volume was swapped back in for 62.1.eli. No possible way. 
Maybe some 100-200Gb of data has been touched across the backed-up filesystems in the last three-ish days but there's just flat-out no way it's more than that; this would imply an entropy of well over 50% of the writeable data on this box in less than a week! That's NOT possible. Further, it's not 100%; it shows 2.48T scanned but 1.89T actually written to the other drive. So something is definitely foooged here and it does appear that geli is involved in it. Whatever is foooging zfs, the resilver process thinks it has to recopy MOST (but not all!) of the blocks in use, it appears, from the 61.eli volume to the 62-2.eli volume. The question is what would lead ZFS to think it has to do that -- it clearly DOES NOT, as a *much* smaller percentage of the total TXG set on 61.eli was modified while 62-2.eli was offline and 62.1.eli was online. Again I note that on 11.1 and previous this resilver was a rapid operation; whatever was actually changed got copied, but the system never copied *nearly everything.*
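For reference, the swap procedure itself is nothing exotic; spelled out as a sketch (device and label names are examples; the geli keyfile/passphrase options are omitted):

    zpool offline backup gpt/backup62-1.eli
    geli detach gpt/backup62-1.eli
    camcontrol standby da4            # let the spindle stop, then pull the carrier
    # ...insert the incoming carrier...
    geli attach gpt/backup62-2
    zpool online backup gpt/backup62-2.eli
    zpool status backup               # watch the resilver

Nothing in that sequence should leave unflushed data behind, which is what makes the near-complete recopy so suspicious.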
Pkg upgrade for 12-STABLE builds in "Latest" broken?
[\u@NewFS /home/karl]# pkg upgrade dovecot Updating FreeBSD repository catalogue... FreeBSD repository is up to date. All repositories are up to date. The following 1 package(s) will be affected (of 0 checked): Installed packages to be UPGRADED: dovecot: 2.3.4.1 -> 2.3.5 Number of packages to be upgraded: 1 4 MiB to be downloaded. Proceed with this action? [y/N]: y pkg: http://pkg.FreeBSD.org/FreeBSD:12:amd64/latest/All/dovecot-2.3.5.txz: Not Found [\u@NewFS /home/karl]# -- Karl Denninger k...@denninger.net <mailto:k...@denninger.net> /The Market Ticker/ /[S/MIME encrypted email preferred]/ smime.p7s Description: S/MIME Cryptographic Signature
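If the "latest" package set stays inconsistent for a while, one workaround (it does nothing for the build infrastructure itself) is to point the box at the quarterly set with a repository override; a sketch, with path and syntax per pkg.conf(5):

    # /usr/local/etc/pkg/repos/FreeBSD.conf
    FreeBSD: {
      url: "pkg+http://pkg.FreeBSD.org/${ABI}/quarterly"
    }

followed by a "pkg update -f" to refresh the catalogue.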
Re: Concern: ZFS Mirror issues (12.STABLE and firmware 19 .v. 20)
No; I can, but of course that's another ~8 hour (overnight) delay between swaps. That's not a bad idea however On 4/20/2019 15:56, Steven Hartland wrote: > Thanks for extra info, the next question would be have you eliminated > that corruption exists before the disk is removed? > > Would be interesting to add a zpool scrub to confirm this isn't the > case before the disk removal is attempted. > > Regards > Steve > > On 20/04/2019 18:35, Karl Denninger wrote: >> >> On 4/20/2019 10:50, Steven Hartland wrote: >>> Have you eliminated geli as possible source? >> No; I could conceivably do so by re-creating another backup volume >> set without geli-encrypting the drives, but I do not have an extra >> set of drives of the capacity required laying around to do that. I >> would have to do it with lower-capacity disks, which I can attempt if >> you think it would help. I *do* have open slots in the drive >> backplane to set up a second "test" unit of this sort. For reasons >> below it will take at least a couple of weeks to get good data on >> whether the problem exists without geli, however. >>> >>> I've just setup an old server which has a LSI 2008 running and old >>> FW (11.0) so was going to have a go at reproducing this. >>> >>> Apart from the disconnect steps below is there anything else needed >>> e.g. read / write workload during disconnect? >> >> Yes. An attempt to recreate this on my sandbox machine using smaller >> disks (WD RE-320s) and a decent amount of read/write activity (tens >> to ~100 gigabytes) on a root mirror of three disks with one taken >> offline did not succeed. It *reliably* appears, however, on my >> backup volumes with every drive swap. The sandbox machine is >> physically identical other than the physical disks; both are Xeons >> with ECC RAM in them. >> >> The only operational difference is that the backup volume sets have a >> *lot* of data written to them via zfs send|zfs recv over the >> intervening period where with "ordinary" activity from I/O (which was >> the case on my sandbox) the I/O pattern is materially different. The >> root pool on the sandbox where I tried to reproduce it synthetically >> *is* using geli (in fact it boots native-encrypted.) >> >> The "ordinary" resilver on a disk swap typically covers ~2-3Tb and is >> a ~6-8 hour process. >> >> The usual process for the backup pool looks like this: >> >> Have 2 of the 3 physical disks mounted; the third is in the bank vault. >> >> Over the space of a week, the backup script is run daily. It first >> imports the pool and then for each zfs filesystem it is backing up >> (which is not all of them; I have a few volatile ones that I don't >> care if I lose, such as object directories for builds and such, plus >> some that are R/O data sets that are backed up separately) it does: >> >> If there is no "...@zfs-base": zfs snapshot -r ...@zfs-base; zfs send >> -R ...@zfs-base | zfs receive -Fuvd $BACKUP >> >> else >> >> zfs rename -r ...@zfs-base ...@zfs-old >> zfs snapshot -r ...@zfs-base >> >> zfs send -RI ...@zfs-old ...@zfs-base |zfs recv -Fudv $BACKUP >> >> if ok then zfs destroy -vr ...@zfs-old otherwise print a >> complaint and stop. >> >> When all are complete it then does a "zpool export backup" to detach >> the pool in order to reduce the risk of "stupid root user" (me) >> accidents. >> >> In short I send an incremental of the changes since the last backup, >> which in many cases includes a bunch of automatic snapshots that are >> taken on frequent basis out of the cron. 
Typically there are a week's >> worth of these that accumulate between swaps of the disk to the >> vault, and the offline'd disk remains that way for a week. I also >> wait for the zpool destroy on each of the targets to drain before >> continuing, as not doing so back in the 9 and 10.x days was a good >> way to stimulate an instant panic on re-import the next day due to >> kernel stack page exhaustion if the previous operation destroyed >> hundreds of gigabytes of snapshots (which does routinely happen as >> part of the backed up data is Macrium images from PCs, so when a new >> month comes around the PC's backup routine removes a huge amount of >> old data from the filesystem.) >> >> Trying to simulate the checksum errors in a few hours' time thus far >> has failed. But every time I swap the
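A stripped-down sketch of that per-filesystem loop (dataset names are placeholders; the real script carries more error handling and treats the R/O sets separately):

    #!/bin/sh
    # incremental zfs send/recv backup cycle as described above
    BACKUP=backup
    for fs in zsr/mail zsr/home; do    # placeholder dataset list
        if ! zfs list -t snapshot "${fs}@zfs-base" >/dev/null 2>&1; then
            zfs snapshot -r "${fs}@zfs-base"
            zfs send -R "${fs}@zfs-base" | zfs receive -Fuvd "${BACKUP}"
        else
            zfs rename -r "${fs}@zfs-base" "${fs}@zfs-old"
            zfs snapshot -r "${fs}@zfs-base"
            if zfs send -RI "${fs}@zfs-old" "${fs}@zfs-base" | zfs recv -Fudv "${BACKUP}"; then
                zfs destroy -vr "${fs}@zfs-old"
            else
                echo "incremental send of ${fs} failed; keeping @zfs-old" >&2
                exit 1
            fi
        fi
    done
    zpool export "${BACKUP}"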
Re: Concern: ZFS Mirror issues (12.STABLE and firmware 19 .v. 20)
0 0 13282812295755460479 OFFLINE 0 0 0 was /dev/gpt/backup62-2.eli errors: No known data errors It knows it fixed the checksums but the error count is zero -- I did NOT "zpool clear". This may have been present in 11.2; I didn't run that long enough in this environment to know. It definitely was *not* present in 11.1 and before; the same data structure and script for backups has been in use for a very long time without any changes and this first appeared when I upgraded from 11.1 to 12.0 on this specific machine, with the exact same physical disks being used for over a year (they're currently 6Tb units; the last change out for those was ~1.5 years ago when I went from 4Tb to 6Tb volumes.) I have both HGST-NAS and He-Enterprise disks in the rotation and both show identical behavior so it doesn't appear to be related to a firmware problem in one disk .vs. the other (e.g. firmware that fails to flush the on-drive cache before going to standby even though it was told to.) > > mps0: port 0xe000-0xe0ff mem > 0xfaf3c000-0xfaf3,0xfaf4-0xfaf7 irq 26 at device 0.0 on pci3 > mps0: Firmware: 11.00.00.00, Driver: 21.02.00.00-fbsd > mps0: IOCCapabilities: > 185c > > Regards > Steve > > On 20/04/2019 15:39, Karl Denninger wrote: >> I can confirm that 20.00.07.00 does *not* stop this. >> The previous write/scrub on this device was on 20.00.07.00. It was >> swapped back in from the vault yesterday, resilvered without incident, >> but a scrub says >> >> root@NewFS:/home/karl # zpool status backup >> pool: backup >> state: DEGRADED >> status: One or more devices has experienced an unrecoverable error. An >> attempt was made to correct the error. Applications are >> unaffected. >> action: Determine if the device needs to be replaced, and clear the >> errors >> using 'zpool clear' or replace the device with 'zpool replace'. >> see: http://illumos.org/msg/ZFS-8000-9P >> scan: scrub repaired 188K in 0 days 09:40:18 with 0 errors on Sat Apr >> 20 08:45:09 2019 >> config: >> >> NAME STATE READ WRITE CKSUM >> backup DEGRADED 0 0 0 >> mirror-0 DEGRADED 0 0 0 >> gpt/backup61.eli ONLINE 0 0 0 >> gpt/backup62-1.eli ONLINE 0 0 47 >> 13282812295755460479 OFFLINE 0 0 0 was >> /dev/gpt/backup62-2.eli >> >> errors: No known data errors >> >> So this is firmware-invariant (at least between 19.00.00.00 and >> 20.00.07.00); the issue persists. >> >> Again, in my instance these devices are never removed "unsolicited" so >> there can't be (or at least shouldn't be able to) unflushed data in the >> device or kernel cache. The procedure is and remains: >> >> zpool offline . >> geli detach . >> camcontrol standby ... >> >> Wait a few seconds for the spindle to spin down. >> >> Remove disk. >> >> Then of course on the other side after insertion and the kernel has >> reported "finding" the device: >> >> geli attach ... >> zpool online >> >> Wait... >> >> If this is a boogered TXG that's held in the metadata for the >> "offline"'d device (maybe "off by one"?) that's potentially bad in that >> if there is an unknown failure in the other mirror component the >> resilver will complete but data has been irrevocably destroyed. >> >> Granted, this is a very low probability scenario (the area where the bad >> checksums are has to be where the corruption hits, and it has to happen >> between the resilver and access to that data.) Those are long odds but >> nonetheless a window of "you're hosed" does appear to exist. 
>> > -- Karl Denninger k...@denninger.net <mailto:k...@denninger.net> /The Market Ticker/ /[S/MIME encrypted email preferred]/ smime.p7s Description: S/MIME Cryptographic Signature
Re: Concern: ZFS Mirror issues (12.STABLE and firmware 19 .v. 20)
On 4/13/2019 06:00, Karl Denninger wrote: > On 4/11/2019 13:57, Karl Denninger wrote: >> On 4/11/2019 13:52, Zaphod Beeblebrox wrote: >>> On Wed, Apr 10, 2019 at 10:41 AM Karl Denninger wrote: >>> >>> >>>> In this specific case the adapter in question is... >>>> >>>> mps0: port 0xc000-0xc0ff mem >>>> 0xfbb3c000-0xfbb3,0xfbb4-0xfbb7 irq 30 at device 0.0 on pci3 >>>> mps0: Firmware: 20.00.07.00, Driver: 21.02.00.00-fbsd >>>> mps0: IOCCapabilities: >>>> 1285c >>>> >>>> Which is indeed a "dumb" HBA (in IT mode), and Zeephod says he connects >>>> his drives via dumb on-MoBo direct SATA connections. >>>> >>> Maybe I'm in good company. My current setup has 8 of the disks connected >>> to: >>> >>> mps0: port 0xb000-0xb0ff mem >>> 0xfe24-0xfe24,0xfe20-0xfe23 irq 32 at device 0.0 on pci6 >>> mps0: Firmware: 19.00.00.00, Driver: 21.02.00.00-fbsd >>> mps0: IOCCapabilities: >>> 5a85c >>> >>> ... just with a cable that breaks out each of the 2 connectors into 4 >>> SATA-style connectors, and the other 8 disks (plus boot disks and SSD >>> cache/log) connected to ports on... >>> >>> - ahci0: port >>> 0xd050-0xd057,0xd040-0xd043,0xd030-0xd037,0xd020-0xd023,0xd000-0xd01f mem >>> 0xfe90-0xfe9001ff irq 44 at device 0.0 on pci2 >>> - ahci2: port >>> 0xa050-0xa057,0xa040-0xa043,0xa030-0xa037,0xa020-0xa023,0xa000-0xa01f mem >>> 0xfe61-0xfe6107ff irq 40 at device 0.0 on pci7 >>> - ahci3: port >>> 0xf040-0xf047,0xf030-0xf033,0xf020-0xf027,0xf010-0xf013,0xf000-0xf00f mem >>> 0xfea07000-0xfea073ff irq 19 at device 17.0 on pci0 >>> >>> ... each drive connected to a single port. >>> >>> I can actually reproduce this at will. Because I have 16 drives, when one >>> fails, I need to find it. I pull the sata cable for a drive, determine if >>> it's the drive in question, if not, reconnect, "ONLINE" it and wait for >>> resilver to stop... usually only a minute or two. >>> >>> ... if I do this 4 to 6 odd times to find a drive (I can tell, in general, >>> that a drive is part of the SAS controller or the SATA controllers... so >>> I'm only looking among 8, ever) ... then I "REPLACE" the problem drive. >>> More often than not, the a scrub will find a few problems. In fact, it >>> appears that the most recent scrub is an example: >>> >>> [1:7:306]dgilbert@vr:~> zpool status >>> pool: vr1 >>> state: ONLINE >>> scan: scrub repaired 32K in 47h16m with 0 errors on Mon Apr 1 23:12:03 >>> 2019 >>> config: >>> >>> NAMESTATE READ WRITE CKSUM >>> vr1 ONLINE 0 0 0 >>> raidz2-0 ONLINE 0 0 0 >>> gpt/v1-d0 ONLINE 0 0 0 >>> gpt/v1-d1 ONLINE 0 0 0 >>> gpt/v1-d2 ONLINE 0 0 0 >>> gpt/v1-d3 ONLINE 0 0 0 >>> gpt/v1-d4 ONLINE 0 0 0 >>> gpt/v1-d5 ONLINE 0 0 0 >>> gpt/v1-d6 ONLINE 0 0 0 >>> gpt/v1-d7 ONLINE 0 0 0 >>> raidz2-2 ONLINE 0 0 0 >>> gpt/v1-e0c ONLINE 0 0 0 >>> gpt/v1-e1b ONLINE 0 0 0 >>> gpt/v1-e2b ONLINE 0 0 0 >>> gpt/v1-e3b ONLINE 0 0 0 >>> gpt/v1-e4b ONLINE 0 0 0 >>> gpt/v1-e5a ONLINE 0 0 0 >>> gpt/v1-e6a ONLINE 0 0 0 >>> gpt/v1-e7c ONLINE 0 0 0 >>> logs >>> gpt/vr1logONLINE 0 0 0 >>> cache >>> gpt/vr1cache ONLINE 0 0 0 >>> >>> errors: No known data errors >>> >>> ... it doesn't say it now, but there were 5 CKSUM errors on one of the >>> drives that I had trial-removed (and not on the one replaced). >>> ___ >> That is EXACTLY what I'm seeing; the "OFFLINE'd" drive is the one that, >> after a scrub, comes up with the checksum errors. It does *not* fla
Re: Concern: ZFS Mirror issues (12.STABLE and firmware 19 .v. 20)
On 4/11/2019 13:57, Karl Denninger wrote: > On 4/11/2019 13:52, Zaphod Beeblebrox wrote: >> On Wed, Apr 10, 2019 at 10:41 AM Karl Denninger wrote: >> >> >>> In this specific case the adapter in question is... >>> >>> mps0: port 0xc000-0xc0ff mem >>> 0xfbb3c000-0xfbb3,0xfbb4-0xfbb7 irq 30 at device 0.0 on pci3 >>> mps0: Firmware: 20.00.07.00, Driver: 21.02.00.00-fbsd >>> mps0: IOCCapabilities: >>> 1285c >>> >>> Which is indeed a "dumb" HBA (in IT mode), and Zeephod says he connects >>> his drives via dumb on-MoBo direct SATA connections. >>> >> Maybe I'm in good company. My current setup has 8 of the disks connected >> to: >> >> mps0: port 0xb000-0xb0ff mem >> 0xfe24-0xfe24,0xfe20-0xfe23 irq 32 at device 0.0 on pci6 >> mps0: Firmware: 19.00.00.00, Driver: 21.02.00.00-fbsd >> mps0: IOCCapabilities: >> 5a85c >> >> ... just with a cable that breaks out each of the 2 connectors into 4 >> SATA-style connectors, and the other 8 disks (plus boot disks and SSD >> cache/log) connected to ports on... >> >> - ahci0: port >> 0xd050-0xd057,0xd040-0xd043,0xd030-0xd037,0xd020-0xd023,0xd000-0xd01f mem >> 0xfe90-0xfe9001ff irq 44 at device 0.0 on pci2 >> - ahci2: port >> 0xa050-0xa057,0xa040-0xa043,0xa030-0xa037,0xa020-0xa023,0xa000-0xa01f mem >> 0xfe61-0xfe6107ff irq 40 at device 0.0 on pci7 >> - ahci3: port >> 0xf040-0xf047,0xf030-0xf033,0xf020-0xf027,0xf010-0xf013,0xf000-0xf00f mem >> 0xfea07000-0xfea073ff irq 19 at device 17.0 on pci0 >> >> ... each drive connected to a single port. >> >> I can actually reproduce this at will. Because I have 16 drives, when one >> fails, I need to find it. I pull the sata cable for a drive, determine if >> it's the drive in question, if not, reconnect, "ONLINE" it and wait for >> resilver to stop... usually only a minute or two. >> >> ... if I do this 4 to 6 odd times to find a drive (I can tell, in general, >> that a drive is part of the SAS controller or the SATA controllers... so >> I'm only looking among 8, ever) ... then I "REPLACE" the problem drive. >> More often than not, the a scrub will find a few problems. In fact, it >> appears that the most recent scrub is an example: >> >> [1:7:306]dgilbert@vr:~> zpool status >> pool: vr1 >> state: ONLINE >> scan: scrub repaired 32K in 47h16m with 0 errors on Mon Apr 1 23:12:03 >> 2019 >> config: >> >> NAMESTATE READ WRITE CKSUM >> vr1 ONLINE 0 0 0 >> raidz2-0 ONLINE 0 0 0 >> gpt/v1-d0 ONLINE 0 0 0 >> gpt/v1-d1 ONLINE 0 0 0 >> gpt/v1-d2 ONLINE 0 0 0 >> gpt/v1-d3 ONLINE 0 0 0 >> gpt/v1-d4 ONLINE 0 0 0 >> gpt/v1-d5 ONLINE 0 0 0 >> gpt/v1-d6 ONLINE 0 0 0 >> gpt/v1-d7 ONLINE 0 0 0 >> raidz2-2 ONLINE 0 0 0 >> gpt/v1-e0c ONLINE 0 0 0 >> gpt/v1-e1b ONLINE 0 0 0 >> gpt/v1-e2b ONLINE 0 0 0 >> gpt/v1-e3b ONLINE 0 0 0 >> gpt/v1-e4b ONLINE 0 0 0 >> gpt/v1-e5a ONLINE 0 0 0 >> gpt/v1-e6a ONLINE 0 0 0 >> gpt/v1-e7c ONLINE 0 0 0 >> logs >> gpt/vr1logONLINE 0 0 0 >> cache >> gpt/vr1cache ONLINE 0 0 0 >> >> errors: No known data errors >> >> ... it doesn't say it now, but there were 5 CKSUM errors on one of the >> drives that I had trial-removed (and not on the one replaced). >> ___ > That is EXACTLY what I'm seeing; the "OFFLINE'd" drive is the one that, > after a scrub, comes up with the checksum errors. It does *not* flag > any errors during the resilver and the drives *not* taken offline do not > (ever) show checksum errors either. > > Interestingly enough you have 19.00.00.00 firmware on your card as well > -- which is what was on mine. 
> > I have flashed my card forward to 20.00.07.00 -- we'll see if it still > does it when I do the next swap of the backup set. Verry interes
Re: Concern: ZFS Mirror issues (12.STABLE and firmware 19 .v. 20)
On 4/11/2019 13:52, Zaphod Beeblebrox wrote: > On Wed, Apr 10, 2019 at 10:41 AM Karl Denninger wrote: > > >> In this specific case the adapter in question is... >> >> mps0: port 0xc000-0xc0ff mem >> 0xfbb3c000-0xfbb3,0xfbb4-0xfbb7 irq 30 at device 0.0 on pci3 >> mps0: Firmware: 20.00.07.00, Driver: 21.02.00.00-fbsd >> mps0: IOCCapabilities: >> 1285c >> >> Which is indeed a "dumb" HBA (in IT mode), and Zeephod says he connects >> his drives via dumb on-MoBo direct SATA connections. >> > Maybe I'm in good company. My current setup has 8 of the disks connected > to: > > mps0: port 0xb000-0xb0ff mem > 0xfe24-0xfe24,0xfe20-0xfe23 irq 32 at device 0.0 on pci6 > mps0: Firmware: 19.00.00.00, Driver: 21.02.00.00-fbsd > mps0: IOCCapabilities: > 5a85c > > ... just with a cable that breaks out each of the 2 connectors into 4 > SATA-style connectors, and the other 8 disks (plus boot disks and SSD > cache/log) connected to ports on... > > - ahci0: port > 0xd050-0xd057,0xd040-0xd043,0xd030-0xd037,0xd020-0xd023,0xd000-0xd01f mem > 0xfe90-0xfe9001ff irq 44 at device 0.0 on pci2 > - ahci2: port > 0xa050-0xa057,0xa040-0xa043,0xa030-0xa037,0xa020-0xa023,0xa000-0xa01f mem > 0xfe61-0xfe6107ff irq 40 at device 0.0 on pci7 > - ahci3: port > 0xf040-0xf047,0xf030-0xf033,0xf020-0xf027,0xf010-0xf013,0xf000-0xf00f mem > 0xfea07000-0xfea073ff irq 19 at device 17.0 on pci0 > > ... each drive connected to a single port. > > I can actually reproduce this at will. Because I have 16 drives, when one > fails, I need to find it. I pull the sata cable for a drive, determine if > it's the drive in question, if not, reconnect, "ONLINE" it and wait for > resilver to stop... usually only a minute or two. > > ... if I do this 4 to 6 odd times to find a drive (I can tell, in general, > that a drive is part of the SAS controller or the SATA controllers... so > I'm only looking among 8, ever) ... then I "REPLACE" the problem drive. > More often than not, the a scrub will find a few problems. In fact, it > appears that the most recent scrub is an example: > > [1:7:306]dgilbert@vr:~> zpool status > pool: vr1 > state: ONLINE > scan: scrub repaired 32K in 47h16m with 0 errors on Mon Apr 1 23:12:03 > 2019 > config: > > NAMESTATE READ WRITE CKSUM > vr1 ONLINE 0 0 0 > raidz2-0 ONLINE 0 0 0 > gpt/v1-d0 ONLINE 0 0 0 > gpt/v1-d1 ONLINE 0 0 0 > gpt/v1-d2 ONLINE 0 0 0 > gpt/v1-d3 ONLINE 0 0 0 > gpt/v1-d4 ONLINE 0 0 0 > gpt/v1-d5 ONLINE 0 0 0 > gpt/v1-d6 ONLINE 0 0 0 > gpt/v1-d7 ONLINE 0 0 0 > raidz2-2 ONLINE 0 0 0 > gpt/v1-e0c ONLINE 0 0 0 > gpt/v1-e1b ONLINE 0 0 0 > gpt/v1-e2b ONLINE 0 0 0 > gpt/v1-e3b ONLINE 0 0 0 > gpt/v1-e4b ONLINE 0 0 0 > gpt/v1-e5a ONLINE 0 0 0 > gpt/v1-e6a ONLINE 0 0 0 > gpt/v1-e7c ONLINE 0 0 0 > logs > gpt/vr1logONLINE 0 0 0 > cache > gpt/vr1cache ONLINE 0 0 0 > > errors: No known data errors > > ... it doesn't say it now, but there were 5 CKSUM errors on one of the > drives that I had trial-removed (and not on the one replaced). > ___ That is EXACTLY what I'm seeing; the "OFFLINE'd" drive is the one that, after a scrub, comes up with the checksum errors. It does *not* flag any errors during the resilver and the drives *not* taken offline do not (ever) show checksum errors either. Interestingly enough you have 19.00.00.00 firmware on your card as well -- which is what was on mine. I have flashed my card forward to 20.00.07.00 -- we'll see if it still does it when I do the next swap of the backup set. 
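(Checking what a card is actually running takes seconds, by the way -- the probe line in dmesg has it:

    dmesg | grep 'mps0: Firmware'
    mps0: Firmware: 19.00.00.00, Driver: 21.02.00.00-fbsd

so anyone wondering whether they are exposed can look before their next disk swap.)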
-- Karl Denninger k...@denninger.net <mailto:k...@denninger.net> /The Market Ticker/ /[S/MIME encrypted email preferred]/ smime.p7s Description: S/MIME Cryptographic Signature
Re: Concern: ZFS Mirror issues (12.STABLE and firmware 19 .v. 20)
On 4/10/2019 08:45, Andriy Gapon wrote: > On 10/04/2019 04:09, Karl Denninger wrote: >> Specifically, I *explicitly* OFFLINE the disk in question, which is a >> controlled operation and *should* result in a cache flush out of the ZFS >> code into the drive before it is OFFLINE'd. >> >> This should result in the "last written" TXG that the remaining online >> members have, and the one in the offline member, being consistent. >> >> Then I "camcontrol standby" the involved drive, which forces a writeback >> cache flush and a spindown; in other words, re-ordered or not, the >> on-platter data *should* be consistent with what the system thinks >> happened before I yank the physical device. > This may not be enough for a specific [RAID] controller and a specific > configuration. It should be enough for a dumb HBA. But, for example, > mrsas(9) > can simply ignore the synchronize cache command (meaning neither the on-board > cache is flushed nor the command is propagated to a disk). So, if you use > some > advanced controller it would make sense to use its own management tool to > offline a disk before pulling it. > > I do not preclude a possibility of an issue in ZFS. But it's not the only > possibility either. In this specific case the adapter in question is... mps0: port 0xc000-0xc0ff mem 0xfbb3c000-0xfbb3,0xfbb4-0xfbb7 irq 30 at device 0.0 on pci3 mps0: Firmware: 20.00.07.00, Driver: 21.02.00.00-fbsd mps0: IOCCapabilities: 1285c Which is indeed a "dumb" HBA (in IT mode), and Zeephod says he connects his drives via dumb on-MoBo direct SATA connections. What I don't know (yet) is if the update to firmware 20.00.07.00 in the HBA has fixed it. The 11.2 and 12.0 revs of FreeBSD through some mechanism changed timing quite materially in the mps driver; prior to 11.2 I ran with a Lenovo SAS expander connected to SATA disks without any problems at all, even across actual disk failures through the years, but in 11.2 and 12.0 doing this resulted in spurious retries out of the CAM layer that allegedly came from timeouts on individual units (which looked very much like a lost command sent to the disk), but only on mirrored volume sets -- yet there were no errors reported by the drive itself, nor did either of my RaidZ2 pools (one spinning rust, one SSD) experience problems of any sort. Flashing the HBA forward to 20.00.07.00 with the expander in resulted in the *driver* (mps) taking disconnects and resets instead of the targets, which in turn caused random drive fault events across all of the pools. For obvious reasons that got backed out *fast*. Without the expander 19.00.00.00 has been stable over the last few months *except* for this circumstance, where an intentionally OFFLINE'd disk in a mirror that is brought back online after some reasonably long period of time (days to a week) results in a successful resilver but then a small number of checksum errors on that drive -- always on the one that was OFFLINEd, never on the one(s) not taken OFFLINE -- appear and are corrected when a scrub is subsequently performed. I am now on 20.00.07.00 and so far -- no problems. But I've yet to do the backup disk swap on 20.00.07.00 (scheduled for late week or Monday) so I do not know if the 20.00.07.00 roll-forward addresses the scrub issue or not. I have no reason to believe it is involved, but given the previous "iffy" nature of 11.2 and 12.0 on 19.0 with the expander it very well might be due to what appear to be timing changes in the driver architecture. 
-- Karl Denninger k...@denninger.net <mailto:k...@denninger.net> /The Market Ticker/ /[S/MIME encrypted email preferred]/ smime.p7s Description: S/MIME Cryptographic Signature
Re: Concern: ZFS Mirror issues (12.STABLE and firmware 19 .v. 20)
On 4/9/2019 16:27, Zaphod Beeblebrox wrote: > I have a "Ghetto" home RAID array. It's built on compromises and makes use > of RAID-Z2 to survive. It consists of two plexes of 8x 4T units of > "spinning rust". It's been upgraded and upgraded. It started as 8x 2T, > then 8x 2T + 8x 4T then the current 16x 4T. The first 8 disks are > connected to motherboard SATA. IIRC, there are 10. Two ports are used for > a mirror that it boots from. There's also an SSD in there somhow, so it > might be 12 ports on the motherboard. > > The other 8 disks started life in eSATA port multiplier boxes. That was > doubleplusungood, so I got a RAID card based on LSI pulled from a fujitsu > server in Japan. That's been upgraded a couple of times... not always a > good experience. One problem is that cheap or refurbished drives don't > always "like" SAS controllers and FreeBSD. YMMV. > > Anyways, this is all to introduce the fact that I've seen this behaviour > multiple times. You have a drive that leaves the array for some amount of > time, and after resilvering, a scrub will find a small amount of bad data. > 32 k or 40k or somesuch. In my cranial schema of things, I've chalked it > up to out-of-order writing of the drives ... or other such behavior s.t. > ZFS doesn't know exactly what has been written. I've often wondered if the > fix would be to add an amount of fuzz to the transaction range that is > resilvered. > > > On Tue, Apr 9, 2019 at 4:32 PM Karl Denninger wrote: > >> On 4/9/2019 15:04, Andriy Gapon wrote: >>> On 09/04/2019 22:01, Karl Denninger wrote: >>>> the resilver JUST COMPLETED with no errors which means the ENTIRE DISK'S >>>> IN USE AREA was examined, compared, and blocks not on the "new member" >>>> or changed copied over. >>> I think that that's not entirely correct. >>> ZFS maintains something called DTL, a dirty-time log, for a missing / >> offlined / >>> removed device. When the device re-appears and gets resilvered, ZFS >> walks only >>> those blocks that were born within the TXG range(s) when the device was >> missing. >>> In any case, I do not have an explanation for what you are seeing. >> That implies something much more-serious could be wrong such as given >> enough time -- a week, say -- that the DTL marker is incorrect and some >> TXGs that were in fact changed since the OFFLINE are not walked through >> and synchronized. That would explain why it gets caught by a scrub -- >> the resilver is in fact not actually copying all the blocks that got >> changed and so when you scrub the blocks are not identical. Assuming >> the detached disk is consistent that's not catastrophically bad IF >> CAUGHT; where you'd get screwed HARD is in the situation where (for >> example) you had a 2-unit mirror, detached one, re-attached it, resilver >> says all is well, there is no scrub performed and then the >> *non-detached* disk fails before there is a scrub. In that case you >> will have permanently destroyed or corrupted data since the other disk >> is allegedly consistent but there are blocks *missing* that were never >> copied over. >> >> Again this just showed up on 12.x; it definitely was *not* at issue in >> 11.1 at all. I never ran 11.2 in production for a material amount of >> time (I went from 11.1 to 12.0 STABLE after the IPv6 fixes were posted >> to 12.x) so I don't know if it is in play on 11.2 or not. >> >> I'll see if it shows up again with 20.00.07.00 card firmware. 
>> >> Of note I cannot reproduce this on my test box with EITHER 19.00.00.00 >> or 20.00.07.00 firmware when I set up a 3-unit mirror, offline one, make >> a crap-ton of changes, offline the second and reattach the third (in >> effect mirroring the "take one to the vault" thing) with a couple of >> hours elapsed time and a synthetic (e.g. "dd if=/dev/random of=outfile >> bs=1m" sort of thing) "make me some new data that has to be resilvered" >> workload. I don't know if that's because I need more entropy in the >> filesystem than I can reasonably generate this way (e.g. more >> fragmentation of files, etc) or whether it's a time-based issue (e.g. >> something's wrong with the DTL/TXG thing as you note above in terms of >> how it functions and it only happens if the time elapsed causes >> something to be subject to a rollover or similar problem.) >> >> I spent quite a lot of time trying to make reproduce the issue on my >> "sandbox" machine
Re: Concern: ZFS Mirror issues (12.STABLE and firmware 19 .v. 20)
On 4/9/2019 15:04, Andriy Gapon wrote: > On 09/04/2019 22:01, Karl Denninger wrote: >> the resilver JUST COMPLETED with no errors which means the ENTIRE DISK'S >> IN USE AREA was examined, compared, and blocks not on the "new member" >> or changed copied over. > I think that that's not entirely correct. > ZFS maintains something called DTL, a dirty-time log, for a missing / > offlined / > removed device. When the device re-appears and gets resilvered, ZFS walks > only > those blocks that were born within the TXG range(s) when the device was > missing. > > In any case, I do not have an explanation for what you are seeing. That implies something much more-serious could be wrong such as given enough time -- a week, say -- that the DTL marker is incorrect and some TXGs that were in fact changed since the OFFLINE are not walked through and synchronized. That would explain why it gets caught by a scrub -- the resilver is in fact not actually copying all the blocks that got changed and so when you scrub the blocks are not identical. Assuming the detached disk is consistent that's not catastrophically bad IF CAUGHT; where you'd get screwed HARD is in the situation where (for example) you had a 2-unit mirror, detached one, re-attached it, resilver says all is well, there is no scrub performed and then the *non-detached* disk fails before there is a scrub. In that case you will have permanently destroyed or corrupted data since the other disk is allegedly consistent but there are blocks *missing* that were never copied over. Again this just showed up on 12.x; it definitely was *not* at issue in 11.1 at all. I never ran 11.2 in production for a material amount of time (I went from 11.1 to 12.0 STABLE after the IPv6 fixes were posted to 12.x) so I don't know if it is in play on 11.2 or not. I'll see if it shows up again with 20.00.07.00 card firmware. Of note I cannot reproduce this on my test box with EITHER 19.00.00.00 or 20.00.07.00 firmware when I set up a 3-unit mirror, offline one, make a crap-ton of changes, offline the second and reattach the third (in effect mirroring the "take one to the vault" thing) with a couple of hours elapsed time and a synthetic (e.g. "dd if=/dev/random of=outfile bs=1m" sort of thing) "make me some new data that has to be resilvered" workload. I don't know if that's because I need more entropy in the filesystem than I can reasonably generate this way (e.g. more fragmentation of files, etc) or whether it's a time-based issue (e.g. something's wrong with the DTL/TXG thing as you note above in terms of how it functions and it only happens if the time elapsed causes something to be subject to a rollover or similar problem.) I spent quite a lot of time trying to make reproduce the issue on my "sandbox" machine and was unable -- and of note it is never a large quantity of data that is impacted, it's usually only a couple of dozen checksums that show as bad and fixed. Of note it's also never just one; if there was a single random hit on a data block due to ordinary bitrot sort of issues I'd expect only one checksum to be bad. But generating a realistic synthetic workload over the amount of time involved on a sandbox is not trivial at all; the system on which this is now happening handles a lot of email and routine processing of various sorts including a fair bit of database activity associated with network monitoring and statistical analysis. 
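For completeness, the general shape of the synthetic attempt was roughly this (shown with throwaway file-backed vdevs purely for illustration; the sandbox itself used whole disks under geli):

    truncate -s 4g /tmp/d0 /tmp/d1 /tmp/d2
    mdconfig -a -t vnode -f /tmp/d0    # md0
    mdconfig -a -t vnode -f /tmp/d1    # md1
    mdconfig -a -t vnode -f /tmp/d2    # md2
    zpool create test mirror md0 md1 md2
    zpool offline test md2                              # "send it to the vault"
    dd if=/dev/random of=/test/junk bs=1m count=1024    # churn new data while it is out
    zpool online test md2                               # resilver
    zpool scrub test                                    # then check 'zpool status -v test' for CKSUM errors

It never tripped the problem, which is part of why I suspect either entropy or elapsed time matters.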
I'm assuming that using "offline" as a means to do this hasn't become "invalid" as something that's considered "ok" as a means of doing this sort of thing it certainly has worked perfectly well for a very long time! -- Karl Denninger k...@denninger.net <mailto:k...@denninger.net> /The Market Ticker/ /[S/MIME encrypted email preferred]/ smime.p7s Description: S/MIME Cryptographic Signature
Concern: ZFS Mirror issues (12.STABLE and firmware 19 .v. 20)
.1 and 11.2/12.0, as I discovered when the SAS expander I used to have in these boxes started returning timeout errors that were false. Again -- this same configuration was completely stable under 11.1 and previous over a period of years. With 20.00.07.00 I have yet to have this situation recur -- so far -- but I have limited time with 20.00.07.00 and as such my confidence that the issue is in fact resolved by the card firmware change is only modest at this point. Over the next month or so, if it doesn't happen again, my confidence will of course improve. Checksum errors on ZFS volumes are extraordinarily uncool for the obvious reason -- they imply the disk thinks the data is fine (since it is not recording any errors on the interface or at the drive level) BUT ZFS thinks the data off that particular record was corrupt as the checksum fails. Silent corruption is the worst sort in that it can hide for months or even years before being discovered, long after your redundant copies have been re-used or overwritten. Assuming I do not see a recurrence with the 20.00.07.00 firmware I would suggest that UPDATING, the Release Notes or Errata have an entry added that for 12.x forward card firmware revisions prior to 20.00.07.00 carry *strong* cautions and that those with these HBAs be strongly urged to flash the card forward to 20.00.07.00 before upgrading or installing. If you get a surprise of this sort and have no second copy that is not impacted you could find yourself severely hosed. -- Karl Denninger k...@denninger.net <mailto:k...@denninger.net> /The Market Ticker/ /[S/MIME encrypted email preferred]/ smime.p7s Description: S/MIME Cryptographic Signature
Re: Observations from a ZFS reorganization on 12-STABLE
On 3/18/2019 08:37, Walter Cramer wrote: > I suggest caution in raising vm.v_free_min, at least on 11.2-RELEASE > systems with less RAM. I tried "65536" (256MB) on a 4GB mini-server, > with vfs.zfs.arc_max of 2.5GB. Bad things happened when the cron > daemon merely tried to run `periodic daily`. > > A few more details - ARC was mostly full, and "bad things" was 1: > `pagedaemon` seemed to be thrashing memory - using 100% of CPU, with > little disk activity, and 2: many normal processes seemed unable to > run. The latter is probably explained by `man 3 sysctl` (see entry for > "VM_V_FREE_MIN"). > > > On Mon, 18 Mar 2019, Pete French wrote: > >> On 17/03/2019 21:57, Eugene Grosbein wrote: >>> I agree. Recently I've found kind-of-workaround for this problem: >>> increase vm.v_free_min so when "FREE" memory goes low, >>> page daemon wakes earlier and shrinks UMA (and ZFS ARC too) moving >>> some memory >>> from WIRED to FREE quick enough so it can be re-used before bad >>> things happen. >>> >>> But avoid increasing vm.v_free_min too much (e.g. over 1/4 of total >>> RAM) >>> because kernel may start behaving strange. For 16Gb system it should >>> be enough >>> to raise vm.v_free_min upto 262144 (1GB) or 131072 (512M). >>> >>> This is not permanent solution in any way but it really helps. >> >> Ah, thats very interesting, thankyou for that! I;ve been bitten by >> this issue too in the past, and it is (as mentioned) much improved on >> 12, but the act it could still cause issues worries me. >> Raising free_target should *not* result in that sort of thrashing. However, that's not really a fix standing alone either since the underlying problem is not being addressed by either change. It is especially dangerous to raise the pager wakeup thresholds if you still run into UMA allocated-but-not-in-use not being cleared out issues as there's a risk of severe pathological behavior arising that's worse than the original problem. 11.1 and before (I didn't have enough operational experience with 11.2 to know, as I went to 12.x from mostly-11.1 installs around here) were essentially unusable in my workload without either my patch set or the Phabricator one. This is *very* workload-specific however, or nobody would use ZFS on earlier releases, and many do without significant problems. -- Karl Denninger k...@denninger.net <mailto:k...@denninger.net> /The Market Ticker/ /[S/MIME encrypted email preferred]/ smime.p7s Description: S/MIME Cryptographic Signature
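For anyone who wants to experiment with Eugene's suggestion, the knobs are ordinary sysctls and the values are in 4 KB pages; a sketch (the numbers are examples from this thread, not recommendations):

    sysctl vm.v_free_min vm.v_free_target vfs.zfs.arc_max    # current values
    sysctl vm.v_free_target=262144                           # 262144 pages = 1 GB
    # persist across reboots via /etc/sysctl.conf:
    # vm.v_free_target=262144

Watch the pager statistics afterward; as noted above, pushing these too far can do more harm than good.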
Re: Observations from a ZFS reorganization on 12-STABLE
On 3/18/2019 08:07, Pete French wrote: > > > On 17/03/2019 21:57, Eugene Grosbein wrote: >> I agree. Recently I've found kind-of-workaround for this problem: >> increase vm.v_free_min so when "FREE" memory goes low, >> page daemon wakes earlier and shrinks UMA (and ZFS ARC too) moving >> some memory >> from WIRED to FREE quick enough so it can be re-used before bad >> things happen. >> >> But avoid increasing vm.v_free_min too much (e.g. over 1/4 of total RAM) >> because kernel may start behaving strange. For 16Gb system it should >> be enough >> to raise vm.v_free_min upto 262144 (1GB) or 131072 (512M). >> >> This is not permanent solution in any way but it really helps. > > Ah, thats very interesting, thankyou for that! I;ve been bitten by > this issue too in the past, and it is (as mentioned) much improved on > 12, but the act it could still cause issues worries me. > > The code patch I developed originally essentially sought to have the ARC code pare back before the pager started evicting working set. A second crack went after clearing allocated-but-not-in-use UMA. v_free_min may not be the right place to do this -- see if bumping up vm.v_free_target also works. I'll stick this on my "to do" list, but it's much less critical in my applications than it was with 10.x and 11.x, both of which suffered from it much more-severely to the point that I was getting "stalls" that in some cases went on for 10 or more seconds due to things like your shell being evicted to swap to make room for arc, which is flat-out nuts. That, at least, doesn't appear to be a problem with 12. -- Karl Denninger k...@denninger.net <mailto:k...@denninger.net> /The Market Ticker/ /[S/MIME encrypted email preferred]/ smime.p7s Description: S/MIME Cryptographic Signature
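The other blunt instrument, if ARC keeps winning the fight for RAM, is simply to cap it; a sketch (the 8 GB figure is an arbitrary example for a 16 GB box):

    # /boot/loader.conf
    vfs.zfs.arc_max="8589934592"    # 8 GB, in bytes; takes effect at the next boot

That papers over the interaction rather than fixing it, which is why the patch set went after the eviction logic itself.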
Observations from a ZFS reorganization on 12-STABLE
I've long argued that the VM system's interaction with ZFS's ARC cache and UMA has serious, even severe issues. 12.x appeared to have addressed some of them, and as such I've yet to roll forward any part of the patch series that is found here [ https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=187594 ] or the Phabricator version referenced in the bug thread (which is more complex and attempts to dig at the root of the issue more effectively, particularly when UMA is involved, as it usually is.) Yesterday I decided to perform a fairly significant reorganization of the ZFS pools on one of my personal machines, including the root pool, which was on mirrored SSDs, changing to a raidz2 (also on SSDs.) This of course required booting single-user from a 12-STABLE memstick. A simple "zfs send -R zs/root-save/R | zfs recv -Fuev zsr/R" should have done it, no sweat. The root that was copied over before I started is uncomplicated; it's compressed, but not de-duped. While it has snapshots on it too, it's by no means complex. *The system failed to execute that command with an "out of swap space" error, killing the job; there was indeed no swap configured since I booted from a memstick.* Huh? A simple *filesystem copy* managed to force a 16GB system into requiring page-file backing store? I was able to complete the copy by temporarily adding the swap space back on (where it would be when the move was complete), but that requirement is pure insanity, and it appears, from what I was able to determine, that it came about from the same root cause that's been plaguing VM/ZFS interaction since 2014 when I started working on this issue -- specifically, when RAM gets low, rather than evict ARC (or clean up UMA that is allocated but unused) the system will attempt to page out working set. In this case, since there was nowhere to page the working set out to, the process involved got an OOM error and was terminated. *I continue to argue that this decision is ALWAYS wrong.* It's wrong because if you invalidate cache and reclaim it you *might* have to perform a physical read later to bring that data back into the cache (since it's no longer in RAM), but in exchange for that *potential* I/O you perform a GUARANTEED physical I/O (to page out some amount of working set) and possibly TWO physical I/Os (to page said working set out and, later, page it back in.) It has always appeared to me to be flat-out nonsensical to trade a possible physical I/O (if there is a future cache miss) for a guaranteed physical I/O and a possible second one. It's even worse if the reason you make that decision is that UMA is allocated but unused; in that case you are paging when no physical I/O is required at all, as the "memory pressure" is a phantom! While UMA is a very material performance win in the general case, allowing allocated-but-unused UMA to force paging appears, from a performance perspective, to be flat-out insanity. I find it very difficult to come up with any reasonable scenario where releasing allocated-but-unused UMA rather than paging out working set is a net performance loser. In this case, since the system was running in single-user mode with no swap available, the process that got selected for destruction when that circumstance arose was the copy process itself. The copy itself did not require anywhere near all of the available non-kernel RAM.
I'm going to dig into this further but IMHO the base issue still exists, even though the impact of it for my workloads with everything "running normally" has materially decreased with 12.x. -- Karl Denninger k...@denninger.net <mailto:k...@denninger.net> /The Market Ticker/ /[S/MIME encrypted email preferred]/ smime.p7s Description: S/MIME Cryptographic Signature
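(The temporary-swap step referred to above is nothing more exotic than the following from the single-user shell; the partition name is an example and should be whatever the machine's normal swap device is:)

  # attach the machine's usual swap partition while booted from the memstick
  swapon /dev/da1p3
  # ... run the zfs send | zfs recv ...
  swapoff /dev/da1p3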
Coffee Lake Xeons...
Anyone used them yet with FreeBSD? The server boards available with IPMI/iKVM are sparse thus far -- in fact, I've only found one that looks like it might fit my requirements (specifically I need iKVM, a legacy serial port and enough PCIe to handle both an LSI SAS/SATA card *and* forward expansion to 10Gb networking as required.) I'm specifically considering this board https://www.asrockrack.com/general/productdetail.asp?Model=E3C246D4U#Specifications ... with one of the E-2100 "G" series chips to replace an aging (but still fully-functional) Westmere Xeon board. The goal is to gain CPU, memory and I/O bandwidth (collectively "performance") and keep forward optionality for network performance improvements while materially reducing power consumption. -- Karl Denninger k...@denninger.net <mailto:k...@denninger.net> /The Market Ticker/ /[S/MIME encrypted email preferred]/ smime.p7s Description: S/MIME Cryptographic Signature
Re: Binary update to -STABLE? And if so, what do I get?
On 2/13/2019 07:49, Kurt Jaeger wrote: > Hi! > >> I know (and have done) binary updates between -RELEASE versions > [...] >> How do I do this, say, coming from 11.2 and wanting to target 12 post >> the IPv6 fix MFC? > You can't. Either wait until a 12.0 with the fix included or > 12.1 is released, or you fetch the source with the fix included, > and build from source. Got it -- thanks. Wait it shall be. -- Karl Denninger k...@denninger.net <mailto:k...@denninger.net> /The Market Ticker/ /[S/MIME encrypted email preferred]/ smime.p7s Description: S/MIME Cryptographic Signature
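(For reference, the "build from source" route mentioned above is the standard cycle; a rough sketch assuming the stable/12 branch -- git today, svn at the time this thread was written:)

  git clone -b stable/12 https://git.freebsd.org/src.git /usr/src
  cd /usr/src
  make -j8 buildworld buildkernel
  make installkernel
  shutdown -r now
  # after rebooting onto the new kernel:
  cd /usr/src
  make installworld
  mergemaster -Ui        # or etcupdate
  shutdown -r now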
Binary update to -STABLE? And if so, what do I get?
I know (and have done) binary updates between -RELEASE versions But 12 has a problem with -RELEASE and IPv6, which was recently fixed and MFC'd. So now I have an interesting situation in that I have two machines in the field running 11.2 that do things for me at one of the "shared colo" joints, and I would like to roll them forward -- but they have to roll forward to a reasonably-recent -STABLE. How do I do this, say, coming from 11.2 and wanting to target 12 post the IPv6 fix MFC? (e.g. how do I specify the target, since it wouldn't be "12-RELEASE"?) I'm assuming (perhaps incorrectly) that "12-STABLE" is not the correct means to do so. Or is it, since at first blush it doesn't blow up if I use that... but I'm hesitant to say "yeah, go at it." # freebsd-update -r 12-STABLE upgrade src component not installed, skipped Looking up update.FreeBSD.org mirrors... 3 mirrors found. Fetching metadata signature for 11.2-RELEASE from update2.freebsd.org... done. Fetching metadata index... done. Inspecting system... done. The following components of FreeBSD seem to be installed: kernel/generic world/base world/lib32 The following components of FreeBSD do not seem to be installed: kernel/generic-dbg world/base-dbg world/doc world/lib32-dbg Does this look reasonable (y/n)? -- Karl Denninger k...@denninger.net <mailto:k...@denninger.net> /The Market Ticker/ /[S/MIME encrypted email preferred]/ smime.p7s Description: S/MIME Cryptographic Signature
Re: 11.2-STABLE kernel wired memory leak
(Columns: total bytes, zone, bytes in use, bytes free; rows sorted by total. The first row was truncated in the original message.)

        TOTAL  ZONE                       IN USE           FREE
          ...  (row truncated)         2,932,160      2,884,928
    5,853,184  512                     1,529,344      4,323,840
    5,935,104  256_Bucket              5,836,800         98,304
    6,039,544  vmem_btag               4,980,528      1,059,016
    6,502,272  L_VFS_Cache             2,106,088      4,396,184
    7,471,104  zio_buf_393216                  0      7,471,104
    8,519,680  zio_buf_65536                   0      8,519,680
    8,988,000  32                      8,279,456        708,544
    9,175,040  zio_buf_458752                  0      9,175,040
    9,535,488  1,024                   9,174,016        361,472
   11,376,288  BUF_TRIE                        0     11,376,288
   11,640,832  zio_data_buf_57344        860,160     10,780,672
   11,657,216  mbuf_cluster           11,446,272        210,944
   11,796,480  zio_buf_655360                  0     11,796,480
   13,271,040  zio_data_buf_81920        737,280     12,533,760
   14,024,704  zio_data_buf_65536        917,504     13,107,200
   17,039,360  zio_buf_1310720                 0     17,039,360
   17,301,504  zio_buf_524288                  0     17,301,504
   18,087,936  zio_data_buf_98304      1,277,952     16,809,984
   18,153,456  zio_cache                 388,808     17,764,648
   24,144,120  MAP_ENTRY              16,120,200      8,023,920
   26,214,400  zio_buf_1048576         2,097,152     24,117,248
   29,379,240  range_seg_cache        21,302,856      8,076,384
   29,782,080  RADIX_NODE             17,761,104     12,020,976
   34,511,400  S_VFS_Cache            31,512,672      2,998,728
   38,535,168  zio_buf_786432                  0     38,535,168
   40,680,144  sa_cache               40,548,816        131,328
   41,517,056  zio_data_buf_114688     1,490,944     40,026,112
   42,205,184  zio_buf_917504                  0     42,205,184
   50,147,328  zio_data_buf_4096          98,304     50,049,024
   50,675,712  zio_data_buf_49152        983,040     49,692,672
   53,972,736  64                     29,877,888     24,094,848
   61,341,696  zio_buf_131072         42,205,184     19,136,512
   72,019,200  VM_OBJECT              71,597,056        422,144
   76,731,200  zfs_znode_cache        76,592,208        138,992
   88,972,800  256                    82,925,568      6,047,232
   90,390,528  4,096                  89,911,296        479,232
   94,036,000  UMA_Slabs              94,033,280          2,720
  135,456,000  VNODE                 135,273,600        182,400
  171,928,320  arc_buf_hdr_t_full    119,737,344     52,190,976
  221,970,432  zio_data_buf_8192     166,076,416     55,894,016
  277,923,840  dmu_buf_impl_t        130,788,240    147,135,600
  376,586,240  zio_buf_16384         372,719,616      3,866,624
  397,680,896  128                   195,944,448    201,736,448
  443,023,360  zio_data_buf_131072   158,072,832    284,950,528
  535,584,768  zio_buf_512           255,641,088    279,943,680
  713,552,840  dnode_t               373,756,656    339,796,184
3,849,744,384  abd_chunk           3,848,310,784      1,433,600
8,418,769,848  TOTAL               6,459,507,936  1,959,261,912

So far running 12-STABLE "neat" is behaving well for me -- Karl Denninger k...@denninger.net <mailto:k...@denninger.net> /The Market Ticker/ /[S/MIME encrypted email preferred]/ smime.p7s Description: S/MIME Cryptographic Signature
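(A per-zone byte summary like the one above can be produced from vmstat -z; a rough sketch -- not necessarily the exact script used here -- assuming the stock output where the comma-separated fields after the zone name are size, limit, used and free:)

  vmstat -z | awk -F: 'NF == 2 {
      split($2, f, ",")       # f[1]=size f[2]=limit f[3]=used f[4]=free
      printf "%15.0f %-22s %15.0f %15.0f\n", f[1]*(f[3]+f[4]), $1, f[1]*f[3], f[1]*f[4]
      used += f[1]*f[3]; free += f[1]*f[4]
  } END {
      printf "%15.0f %-22s %15.0f %15.0f\n", used+free, "TOTAL", used, free
  }' | sort -n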
Re: 11.2-STABLE kernel wired memory leak
On 2/12/2019 10:49, Eugene Grosbein wrote: > 12.02.2019 23:34, Mark Johnston wrote: > >> I suspect that the "leaked" memory is simply being used to cache UMA >> items. Note that the values in the FREE column of vmstat -z output are >> quite large. The cached items are reclaimed only when the page daemon >> wakes up to reclaim memory; if there are no memory shortages, large >> amounts of memory may accumulate in UMA caches. In this case, the sum >> of the product of columns 2 and 5 gives a total of roughly 4GB cached. > Forgot to note, that before I got system to single user mode, there was heavy > swap usage (over 3.5GB) > and heavy page-in/page-out, 10-20 megabytes per second and system was > crawling slow due to pageing. This is a manifestation of the general issue I've had an ongoing "discussion" running in a long-running thread on bugzilla and the interaction between UMA, ARC and the VM system. The short version is that the VM system does pathological things including paging out working set when there is a large amount of allocated-but-unused UMA and the means by which the ARC code is "told" that it needs to release RAM also interacts with the same mechanisms and exacerbates the problem. I've basically given up on getting anything effective to deal with this merged into the code and have my own private set of patches that I published for a while in that thread (and which had some collaborative development and testing) but have given up on seeing anything meaningful put into the base code. To the extent I need them in a given workload and environment I simply apply them on my own and go on my way. I don't have enough experience with 12 yet to know if the same approach will be necessary there (that is, what if any changes got into the 12.x code), and never ran 11.2 much, choosing to stay on 11.1 where said patches may not have been the most-elegant means of dealing with it but were successful. There was also a phabricator thread on this but I don't know what part of it, if any, got merged (it was more-elegant, in my opinion, than what I had coded up.) Under certain workloads here without the patches I was experiencing "freezes" due to unnecessary page-outs onto spinning rust that in some cases reached into double-digit *seconds.* With them the issue was entirely resolved. At the core of the issue is that "something" has to be taught that *before* the pager starts evicting working set to swap if you have large amounts of UMA allocated to ARC but not in use that RAM should be released, and beyond that if you have ARC allocated and in use but are approaching where VM is going to page working set out you need to come up with some meaningful way of deciding whether to release some of the ARC rather than take the page hit -- and in virtually every case the answer to that question is to release the RAM consumed by ARC. Part of the issue is that UMA can be allocated for other things besides ARC yet you really only want to release the ARC-related UMA that is allocated-but-unused in this instance. The logic is IMHO pretty simple on this -- a page-out of a process that will run again always requires TWO disk operations -- one to page it out right now and a second at a later time to page it back in. A released ARC cache *MAY* (if there would have been a cache hit in the future) require ONE disk operation (to retrieve it from disk.) 
Two is always greater than one, and one is never worse than "maybe one later." Therefore, choosing two *definite* disk I/Os (or one definite I/O now plus a possible one later) over a single *possible* disk I/O later is always a net loss, and thus IMHO substantial effort should be made to avoid doing that. -- Karl Denninger k...@denninger.net <mailto:k...@denninger.net> /The Market Ticker/ /[S/MIME encrypted email preferred]/ smime.p7s Description: S/MIME Cryptographic Signature
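(A blunt stop-gap many people use while this remains unresolved -- it is not the patch set discussed above -- is simply to cap the ARC so the pager never gets squeezed in the first place; the value is an example only and should leave headroom for the working set:)

  # /boot/loader.conf
  vfs.zfs.arc_max="8G"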
Re: Oddball error from "batch"
On 2/10/2019 16:01, Karl Denninger wrote: > Note -- working fine on 11.1 and 11.2, upgraded machine to 12.0-STABLE > and everything is ok that I'm aware of *except*. > > # batch > who > df > ^D > > Job 170 will be executed using /bin/sh > > Then the time comes and... no output is emailed to me. > > In the cron log file I find: > > Feb 10 16:00:00 NewFS atrun[65142]: cannot open input file > E000aa018a24c3: No such file or directory > > Note that scheduled cron jobs are running as expected, and the > permissions on /var/at are correct (match exactly my 11. 1 and 11.2 > boxes), and in addition of looking BEFORE the job runs the named job > number IS THERE. > > [\u@NewFS /var/at/jobs]# ls -al > total 13 > drwxr-xr-x 2 daemon wheel 5 Feb 10 15:55 . > drwxr-xr-x 4 root wheel 5 Oct 8 2013 .. > -rw-r--r-- 1 root wheel 6 Feb 10 15:55 .SEQ > -rw--- 1 root wheel 0 Jul 5 2008 .lockfile > -rwx-- 1 root wheel 615 Feb 10 15:55 E000aa018a24c3 > > After the error the file isn't there. It was removed (as one would > expect when the job is complete.) > > What the blankety-blank?! Turns out it's a nasty race in the atrun code I have no idea why this hasn't bit the living daylights out of lots of people before, but it's sure biting me! https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=235657 Includes a proposed fix... :) -- Karl Denninger k...@denninger.net <mailto:k...@denninger.net> /The Market Ticker/ /[S/MIME encrypted email preferred]/ smime.p7s Description: S/MIME Cryptographic Signature
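(For context on when these jobs fire: on a stock system, at/batch jobs are picked up by atrun out of the system crontab, so the failure above happens on one of these five-minute ticks:)

  # stock /etc/crontab entry
  */5     *       *       *       *       root    /usr/libexec/atrun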
Oddball error from "batch"
Note -- working fine on 11.1 and 11.2, upgraded machine to 12.0-STABLE and everything is OK that I'm aware of *except*. # batch who df ^D Job 170 will be executed using /bin/sh Then the time comes and... no output is emailed to me. In the cron log file I find: Feb 10 16:00:00 NewFS atrun[65142]: cannot open input file E000aa018a24c3: No such file or directory Note that scheduled cron jobs are running as expected, the permissions on /var/at are correct (they match my 11.1 and 11.2 boxes exactly), and in addition, looking BEFORE the job runs, the named job file IS THERE. [\u@NewFS /var/at/jobs]# ls -al total 13 drwxr-xr-x 2 daemon wheel 5 Feb 10 15:55 . drwxr-xr-x 4 root wheel 5 Oct 8 2013 .. -rw-r--r-- 1 root wheel 6 Feb 10 15:55 .SEQ -rw------- 1 root wheel 0 Jul 5 2008 .lockfile -rwx------ 1 root wheel 615 Feb 10 15:55 E000aa018a24c3 After the error the file isn't there. It was removed (as one would expect when the job is complete.) What the blankety-blank?! -- Karl Denninger k...@denninger.net <mailto:k...@denninger.net> /The Market Ticker/ /[S/MIME encrypted email preferred]/ smime.p7s Description: S/MIME Cryptographic Signature
Re: Geli prompts on gptzfsboot (Was:: Serious ZFS Bootcode Problem (GPT NON-UEFI -- RESOLVED)
On 2/10/2019 12:40, Ian Lepore wrote: > On Sun, 2019-02-10 at 12:35 -0600, Karl Denninger wrote: >> On 2/10/2019 12:01, Ian Lepore wrote: >>> On Sun, 2019-02-10 at 11:54 -0600, Karl Denninger wrote: >>>> On 2/10/2019 11:50, Ian Lepore wrote: >>>>> On Sun, 2019-02-10 at 11:37 -0600, Karl Denninger wrote: >>>>> >>>>>> [...] >>>>>> >>>>>> BTW am I correct that gptzfsboot did *not* get the ability to >>>>>> read >>>>>> geli-encrypted pools in 12.0? The UEFI loader does know how >>>>>> (which I'm >>>>>> using on my laptop) but I was under the impression that for >>>>>> non- >>>>>> UEFI >>>>>> systems you still needed the unencrypted boot partition from >>>>>> which to >>>>>> load the kernel. >>>>>> >>>>> Nope, that's not correct. GELI support was added to the boot >>>>> and >>>>> loader >>>>> programs for both ufs and zfs in freebsd 12. You must set the >>>>> geli >>>>> '-g' >>>>> option to be prompted for the passphrase while booting (this is >>>>> separate from the '-b' flag that enables mounting the encrypted >>>>> partition as the rootfs). You can use "geli configure -g" to >>>>> turn >>>>> on >>>>> the flag on any existing geli partition. >>>>> >>>>> -- Ian >>>> Excellent - this will eliminate the need for me to run down the >>>> foot-shooting that occurred in my update script since the >>>> unencrypted >>>> kernel partition is no longer needed at all. That also >>>> significantly >>>> reduces the attack surface on such a machine (although you could >>>> still >>>> tamper with the contents of freebsd-boot of course.) >>>> >>>> The "-g" flag I knew about from experience in putting 12 on my X1 >>>> Carbon >>>> (which works really well incidentally; the only issue I'm aware >>>> of is >>>> that there's no 5Ghz WiFi support.) >>>> >>> One thing that is rather unfortunate... if you have multiple geli >>> encrypted partitions that all have the same passphrase, you will be >>> required to enter that passphrase twice while booting -- once in >>> gpt[zfs]boot, then again during kernel startup when the rest of the >>> drives/partitions get tasted by geom. This is because APIs within >>> the >>> boot process got changed to pass keys instead of the passphrase >>> itself >>> from one stage of booting to the next, and the fallout of that is >>> the >>> key for the rootfs is available to the kernel for mountroot, but >>> the >>> passphrase is not available to the system when geom is probing all >>> the >>> devices, so you get prompted for it again. >>> >>> -- Ian >> Let me see if I understand this before I do it then... :-) >> >> I have the following layout: >> >> 1. Two SSDs that contain the OS as a two-provider ZFS pool, which has >> "-b" set on both members; I get the "GELI Passphrase:" prompt from >> the >> loader and those two providers (along with encrypted swap) attach >> early >> in the boot process. The same SSDs contain a mirrored non-encrypted >> pool that has /boot (and only /boot) on it because previously you >> couldn't boot from an EFI-encrypted pool at all. >> >> Thus: >> >> [\u@NewFS /root]# gpart show da1 >> => 34 468862061 da1 GPT (224G) >> 34 2014 - free - (1.0M) >> 2048 1024 1 freebsd-boot (512K) >> 3072 1024 - free - (512K) >> 4096 20971520 2 freebsd-zfs [bootme] (10G) >> 20975616 134217728 3 freebsd-swap (64G) >> 155193344 313667584 4 freebsd-zfs (150G) >> 468860928 1167 - free - (584K) >> >> There is of course a "da2" that is identical. The actual encrypted >> root >> pool is on partition 4 with "-b" set at present.
I get prompted from >> loader as a result after the unencrypted partition (#2) boots. >> >> 2. Multiple additional "user space" pools on a bunch of other disks. >> >> Right now #2 is using geli groups. Prior to 12.0 they were
Re: Geli prompts on gptzfsboot (Was:: Serious ZFS Bootcode Problem (GPT NON-UEFI -- RESOLVED)
On 2/10/2019 12:01, Ian Lepore wrote: > On Sun, 2019-02-10 at 11:54 -0600, Karl Denninger wrote: >> On 2/10/2019 11:50, Ian Lepore wrote: >>> On Sun, 2019-02-10 at 11:37 -0600, Karl Denninger wrote: >>> >>>> [...] >>>> >>>> BTW am I correct that gptzfsboot did *not* get the ability to >>>> read >>>> geli-encrypted pools in 12.0? The UEFI loader does know how >>>> (which I'm >>>> using on my laptop) but I was under the impression that for non- >>>> UEFI >>>> systems you still needed the unencrypted boot partition from >>>> which to >>>> load the kernel. >>>> >>> Nope, that's not correct. GELI support was added to the boot and >>> loader >>> programs for both ufs and zfs in freebsd 12. You must set the geli >>> '-g' >>> option to be prompted for the passphrase while booting (this is >>> separate from the '-b' flag that enables mounting the encrypted >>> partition as the rootfs). You can use "geli configure -g" to turn >>> on >>> the flag on any existing geli partition. >>> >>> -- Ian >> Excellent - this will eliminate the need for me to run down the >> foot-shooting that occurred in my update script since the unencrypted >> kernel partition is no longer needed at all. That also significantly >> reduces the attack surface on such a machine (although you could >> still >> tamper with the contents of freebsd-boot of course.) >> >> The "-g" flag I knew about from experience in putting 12 on my X1 >> Carbon >> (which works really well incidentally; the only issue I'm aware of is >> that there's no 5Ghz WiFi support.) >> > One thing that is rather unfortunate... if you have multiple geli > encrypted partitions that all have the same passphrase, you will be > required to enter that passphrase twice while booting -- once in > gpt[zfs]boot, then again during kernel startup when the rest of the > drives/partitions get tasted by geom. This is because APIs within the > boot process got changed to pass keys instead of the passphrase itself > from one stage of booting to the next, and the fallout of that is the > key for the rootfs is available to the kernel for mountroot, but the > passphrase is not available to the system when geom is probing all the > devices, so you get prompted for it again. > > -- Ian Let me see if I understand this before I do it then... :-) I have the following layout: 1. Two SSDs that contain the OS as a two-provider ZFS pool, which has "-b" set on both members; I get the "GELI Passphrase:" prompt from the loader and those two providers (along with encrypted swap) attach early in the boot process. The same SSDs contain a mirrored non-encrypted pool that has /boot (and only /boot) on it because previously you couldn't boot from an EFI-encrypted pool at all. Thus: [\u@NewFS /root]# gpart show da1 => 34 468862061 da1 GPT (224G) 34 2014 - free - (1.0M) 2048 1024 1 freebsd-boot (512K) 3072 1024 - free - (512K) 4096 20971520 2 freebsd-zfs [bootme] (10G) 20975616 134217728 3 freebsd-swap (64G) 155193344 313667584 4 freebsd-zfs (150G) 468860928 1167 - free - (584K) There is of course a "da2" that is identical. The actual encrypted root pool is on partition 4 with "-b" set at present. I get prompted from loader as a result after the unencrypted partition (#2) boots. 2. Multiple additional "user space" pools on a bunch of other disks. Right now #2 is using geli groups. 
Prior to 12.0 they were handled using a custom /etc/rc.d script I wrote that did basically the same thing that geli groups does, because they all use the same passphrase and entering the same thing over and over at boot was a pain in the butt. It prompted cleanly with no echo, took a password and then iterated over a list of devices, attaching them one at a time. That requirement is now gone with geli groups, which is nice since mergemaster always complained about it being a "non-standard" thing; it *had* to go in /etc/rc.d and not in /usr/local/etc/rc.d or else I couldn't get it to run early enough -- unfortunately. So if I remove the non-encrypted freebsd-zfs mirror that the system boots from in favor of setting "-g" on the root pool (both providers), gptzfsboot will find the encrypted pool and prompt for the passphrase before the loader gets invoked at all, much like the EFI loader does. That's good. (My assumption is that the "-g" is sufficient; I don't need (or want) "bootme" set -- correct?)
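(The two pieces described above look roughly like this; the root-pool providers match the layout quoted earlier, the group device list is a placeholder, and the rc.conf variable names should be double-checked against rc.conf(5) on your release:)

  # enable the boot-time passphrase prompt on both root-pool providers
  geli configure -g da1p4
  geli configure -g da2p4

  # /etc/rc.conf -- attach the "user space" providers as one group, one prompt
  geli_groups="storage"
  geli_storage_devices="da3 da4 da5"       # placeholder device list
  # geli_storage_flags="..."               # extra attach flags if needed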
Re: Fwd: Serious ZFS Bootcode Problem (GPT NON-UEFI)
On 2/10/2019 11:50, Ian Lepore wrote: > On Sun, 2019-02-10 at 11:37 -0600, Karl Denninger wrote: >> On 2/10/2019 09:28, Allan Jude wrote: >>> Are you sure it is non-UEFI? As the instructions you followed, >>> overwriting da0p1 with gptzfsboot, will make quite a mess if that >>> happens to be the EFI system partition, rather than the freebsd- >>> boot >>> partition. >> [...] >> >> BTW am I correct that gptzfsboot did *not* get the ability to read >> geli-encrypted pools in 12.0? The UEFI loader does know how (which I'm >> using on my laptop) but I was under the impression that for non-UEFI >> systems you still needed the unencrypted boot partition from which to >> load the kernel. >> > Nope, that's not correct. GELI support was added to the boot and loader > programs for both ufs and zfs in freebsd 12. You must set the geli '-g' > option to be prompted for the passphrase while booting (this is > separate from the '-b' flag that enables mounting the encrypted > partition as the rootfs). You can use "geli configure -g" to turn on > the flag on any existing geli partition. > > -- Ian Excellent - this will eliminate the need for me to run down the foot-shooting that occurred in my update script since the unencrypted kernel partition is no longer needed at all. That also significantly reduces the attack surface on such a machine (although you could still tamper with the contents of freebsd-boot of course.) The "-g" flag I knew about from experience in putting 12 on my X1 Carbon (which works really well incidentally; the only issue I'm aware of is that there's no 5Ghz WiFi support.) -- Karl Denninger k...@denninger.net <mailto:k...@denninger.net> /The Market Ticker/ /[S/MIME encrypted email preferred]/ smime.p7s Description: S/MIME Cryptographic Signature
Re: Fwd: Serious ZFS Bootcode Problem (GPT NON-UEFI)
On 2/10/2019 09:28, Allan Jude wrote: > Are you sure it is non-UEFI? As the instructions you followed, > overwriting da0p1 with gptzfsboot, will make quite a mess if that > happens to be the EFI system partition, rather than the freebsd-boot > partition. Absolutely certain. The system board in this machine (and a bunch I have in the field) are SuperMicro X8DTL-IFs which do not support UEFI at all (they have no available EFI-capable bios.) They have encrypted root pools but due to the inability of gptzfsboot to read them they have a small freebsd-zfs partition that, when upgraded, I copy /boot/* to after the kernel upgrade is done but before they are rebooted. That partition is not mounted during normal operation; it's only purpose is to load the kernel (and pre-boot .kos such as geli.) > Can you show 'gpart show' output? [karl@NewFS ~]$ gpart show da1 => 34 468862061 da1 GPT (224G) 34 2014 - free - (1.0M) 2048 1024 1 freebsd-boot (512K) 3072 1024 - free - (512K) 4096 20971520 2 freebsd-zfs [bootme] (10G) 20975616 134217728 3 freebsd-swap (64G) 155193344 313667584 4 freebsd-zfs (150G) 468860928 1167 - free - (584K) Partition "2" is the one that should boot. There is also a da2 that has an identical layout (mirrored; the drives are 240Gb Intel 730 SSDs) > What is the actual boot error? It says it can't load the kernel and gives me a prompt. "lsdev" shows all the disks and all except the two (zfs mirror) that have the "bootme" partition on them don't show up as zfs pools at all (they're geli-encrypted, so that's not unexpected.) I don't believe the loader ever gets actually loaded. An attempt to use "ls" from the bootloader to look inside that "bootme" partition fails; gptzfsboot cannot get it open. My belief was that I screwed up and wrote the old 11.1 gptzfsboot to the freebsd-boot partition originally -- but that is clearly not the case. Late last night I took my "rescue media" (which is a "make memstick" from the build of -STABLE), booted that on my sandbox machine, stuck two disks in there and made a base system -- which booted. Thus whatever is going on here it is not as simple as it first appears as that system had the spacemap_v2 flag on and active once it came up. This may be my own foot-shooting since I was able to make a bootable system on my sandbox using the same media (a clone hardware-wise so also no EFI) -- there may have been some part of the /boot hierarchy that didn't get copied over, and if so that would explain it. Update: Indeed that appears to be what it was -- a couple of the *other* files in the boot partition didn't get copied from the -STABLE build (although the entire kernel directory did) I need to look at why that happened as the update process is my own due to the dual-partition requirement for booting with non-EFI but that's not your problem -- it's mine. Sorry about this one; turns out to be something in my update scripts that failed to move over some of the files to the non-encrypted /boot BTW am I correct that gptzfsboot did *not* get the ability to read geli-encrypted pools in 12.0? The UEFI loader does know how (which I'm using on my laptop) but I was under the impression that for non-UEFI systems you still needed the unencrypted boot partition from which to load the kernel. -- Karl Denninger k...@denninger.net <mailto:k...@denninger.net> /The Market Ticker/ /[S/MIME encrypted email preferred]/ smime.p7s Description: S/MIME Cryptographic Signature
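(The copy step that went wrong is, in rough outline, something like the following -- the pool name zsboot comes from this thread; everything else is an assumption about the layout:)

  zpool import -N zsboot          # import the small unencrypted pool without mounting
  mount -t zfs zsboot /mnt
  cp -Rp /boot/ /mnt/boot         # trailing slash: copy the contents of /boot
  umount /mnt
  zpool export zsboot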
Serious ZFS Bootcode Problem (GPT NON-UEFI)
FreeBSD 12.0-STABLE r343809 After upgrading to this (without material incident) zfs was telling me that the pools could be upgraded (this machine was running 11.1, then 11.2.) I did so, /and put the new bootcode on with gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i da... /on both of the candidate (mirrored ZFS boot disk) devices, in the correct partition. Then I rebooted to test and. /could not find the zsboot pool containing the kernel./ I booted the rescue image off my SD and checked -- the copy of gptzfsboot that I put on the boot partition is exactly identical to the one on the rescue image SD. Then, to be /absolutely sure /I wasn't going insane I grabbed the mini-memstick img for 12-RELEASE and tried THAT copy of gptzfsboot. /Nope; that won't boot either!/ Fortunately I had a spare drive slot so I stuck in a piece of spinning rust, gpart'ed THAT with an old-style UFS boot filesystem, wrote bootcode on that, mounted the ZFS "zsboot" filesystem and copied it over. That boots fine (of course) and mounts the root pool, and off it goes. I'm going to blow away the entire /usr/obj tree and rebuild the kernel to see if that gets me anything that's more-sane, but right now this looks pretty bad. BTW just to be absolutely sure I blew away the entire /usr/obj directory and rebuilt -- same size and checksum on the binary that I have installed, so. Not sure what's going on here -- did something get moved? -- Karl Denninger k...@denninger.net <mailto:k...@denninger.net> /The Market Ticker/ /[S/MIME encrypted email preferred]/ smime.p7s Description: S/MIME Cryptographic Signature
Re: 9211 (LSI/SAS) issues on 11.2-STABLE
On 2/6/2019 09:18, Borja Marcos wrote: >> On 5 Feb 2019, at 23:49, Karl Denninger wrote: >> >> BTW under 12.0-STABLE (built this afternoon after the advisories came >> out, with the patches) it's MUCH worse. I get the same device resets >> BUT it's followed by an immediate panic which I cannot dump as it >> generates a page-fault (supervisor read data, page not present) in the >> mps *driver* at mpssas_send_abort+0x21. >> This precludes a dump of course since attempting to do so gives you a >> double-panic (I was wondering why I didn't get a crash dump!); I'll >> re-jigger the box to stick a dump device on an internal SATA device so I >> can successfully get the dump when it happens and see if I can obtain a >> proper crash dump on this. >> >> I think it's fair to assume that 12.0-STABLE should not panic on a disk >> problem (unless of course the problem is trying to page something back >> in -- it's not, the drive that aborts and resets is on a data pack doing >> a scrub) > It shouldn’t panic I imagine. > >>>>> mps0: Sending reset from mpssas_send_abort for target ID 37 > >>> 0x06 = = = === == Transport Statistics (rev 1) == >>> 0x06 0x008 4 6 --- Number of Hardware Resets >>> 0x06 0x010 4 0 --- Number of ASR Events >>> 0x06 0x018 4 0 --- Number of Interface CRC Errors >>> |||_ C monitored condition met >>> ||__ D supports DSN >>> |___ N normalized value >>> >>> 0x06 0x008 4 7 --- Number of Hardware Resets >>> 0x06 0x010 4 0 --- Number of ASR Events >>> 0x06 0x018 4 0 --- Number of Interface CRC Errors >>> |||_ C monitored condition met >>> ||__ D supports DSN >>> |___ N normalized value >>> >>> Number of Hardware Resets has incremented. There are no other errors shown: > What is _exactly_ that value? Is it related to the number of resets sent from > the HBA > _or_ the device resetting by itself? Good question. What counts as a "reset"; UNIT ATTENTION is what the controller receives but whether that's a power reset, a reset *command* from the HBA or a firmware crash (yikes!) in the disk I'm not certain. >>> I'd throw possible shade at the backplane or cable /but I have already >>> swapped both out for spares without any change in behavior./ > What about the power supply? > There are multiple other devices and the system board on that supply (and thus voltage rails) but it too has been swapped out without change. In fact at this point other than the system board and RAM (which is ECC, and is showing no errors in the system's BMC log) /everything /in the server case (HBA, SATA expander, backplane, power supply and cables) has been swapped for spares. No change in behavior. Note that with 20.0.7.0 firmware in the HBA instead of a unit attention I get a *controller* reset (!!!) which detaches some random number of devices from ZFS's point of view before it comes back up (depending on what's active at the time) which is potentially catastrophic if it hits the system pool. I immediately went back to 19.0.0.0 firmware on the HBA; I had upgraded to 20.0.7.0 since there had been good reports of stability with it when I first saw this, thinking there was a drive change that might have resulted in issues with it when running 19.0 firmware on the card. This system was completely stable for over a year on 11.1-STABLE and in fact hadn't been rebooted or logged a single "event" in over six months; the problems started immediately upon upgrade to 11.2-STABLE and persists on 12.0-STABLE. 
The disks in question haven't changed either (so it can't be a difference in firmware that is in a newer purchased disk, for example.) I'm thinking perhaps *something* in the codebase change made the HBA -> SAS Expander combination trouble where it wasn't before; I've got a couple of 16i HBAs on the way which will allow me to remove the SAS expander to see if that causes the problem to disappear. I've got a bunch of these Lenovo expanders and have been using them without any sort of trouble in multiple machines; it's only when I went beyond 11.1 that I started having trouble with them. -- Karl Denninger k...@denninger.net <mailto:k...@denninger.net> /The Market Ticker/ /[S/MIME encrypted email preferred]/ smime.p7s Description: S/MIME Cryptographic Signature
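(The reset counter discussed above lives in the drive's device-statistics log and can be read per drive with smartmontools; da12 is just the device from the messages earlier, and -d sat may or may not be needed depending on how a SATA drive sits behind the SAS expander:)

  smartctl -l devstat /dev/da12
  # or the full extended report, which includes the same page
  smartctl -x /dev/da12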
Re: 9211 (LSI/SAS) issues on 11.2-STABLE
BTW under 12.0-STABLE (built this afternoon after the advisories came out, with the patches) it's MUCH worse. I get the same device resets BUT it's followed by an immediate panic which I cannot dump as it generates a page-fault (supervisor read data, page not present) in the mps *driver* at mpssas_send_abort+0x21. This precludes a dump of course since attempting to do so gives you a double-panic (I was wondering why I didn't get a crash dump!); I'll re-jigger the box to stick a dump device on an internal SATA device so I can successfully get the dump when it happens and see if I can obtain a proper crash dump on this. I think it's fair to assume that 12.0-STABLE should not panic on a disk problem (unless of course the problem is trying to page something back in -- it's not, the drive that aborts and resets is on a data pack doing a scrub) On 2/5/2019 10:26, Karl Denninger wrote: > On 2/5/2019 09:22, Karl Denninger wrote: >> On 2/2/2019 12:02, Karl Denninger wrote: >>> I recently started having some really oddball things happening under >>> stress. This coincided with the machine being updated to 11.2-STABLE >>> (FreeBSD 11.2-STABLE #1 r342918:) from 11.1. >>> >>> Specifically, I get "errors" like this: >>> >>> (da12:mps0:0:37:0): READ(10). CDB: 28 00 af 82 bb 08 00 01 00 00 >>> length 131072 SMID 269 Aborting command 0xfe0001179110 >>> mps0: Sending reset from mpssas_send_abort for target ID 37 >>> (da12:mps0:0:37:0): READ(10). CDB: 28 00 af 82 bc 08 00 01 00 00 >>> length 131072 SMID 924 terminated ioc 804b loginfo 3114 scsi 0 state >>> c xfer 0 >>> (da12:mps0:0:37:0): READ(10). CDB: 28 00 af 82 ba 08 00 01 00 00 >>> length 131072 SMID 161 terminated ioc 804b loginfo 3114 scsi 0 state >>> c xfer 0 >>> mps0: Unfreezing devq for target ID 37 >>> (da12:mps0:0:37:0): READ(10). CDB: 28 00 af 82 bc 08 00 01 00 00 >>> (da12:mps0:0:37:0): CAM status: CCB request completed with an error >>> (da12:mps0:0:37:0): Retrying command >>> (da12:mps0:0:37:0): READ(10). CDB: 28 00 af 82 bb 08 00 01 00 00 >>> (da12:mps0:0:37:0): CAM status: Command timeout >>> (da12:mps0:0:37:0): Retrying command >>> (da12:mps0:0:37:0): READ(10). CDB: 28 00 af 82 ba 08 00 01 00 00 >>> (da12:mps0:0:37:0): CAM status: CCB request completed with an error >>> (da12:mps0:0:37:0): Retrying command >>> (da12:mps0:0:37:0): READ(10). CDB: 28 00 af 82 ba 08 00 01 00 00 >>> (da12:mps0:0:37:0): CAM status: SCSI Status Error >>> (da12:mps0:0:37:0): SCSI status: Check Condition >>> (da12:mps0:0:37:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, >>> reset, or bus device reset occurred) >>> (da12:mps0:0:37:0): Retrying command (per sense data) >>> >>> The "Unit Attention" implies the drive reset. It only occurs on certain >>> drives under very heavy load (e.g. a scrub.) I've managed to provoke it >>> on two different brands of disk across multiple firmware and capacities, >>> however, which tends to point away from a drive firmware problem. >>> >>> A look at the pool data shows /no /errors (e.g. no checksum problems, >>> etc) and a look at the disk itself (using smartctl) shows no problems >>> either -- whatever is going on here the adapter is recovering from it >>> without any data corruption or loss registered on *either end*! 
>>> >>> The configuration is an older SuperMicro Xeon board (X8DTL-IF) and shows: >>> >>> mps0: port 0xc000-0xc0ff mem >>> 0xfbb3c000-0xfbb3,0xfbb4-0xfbb7 irq 30 at device 0.0 on pci3 >>> mps0: Firmware: 19.00.00.00, Driver: 21.02.00.00-fbsd >>> mps0: IOCCapabilities: >>> 1285c >> After considerable additional work this looks increasingly like either a >> missed interrupt or a command is getting lost between the host adapter >> and the expander. >> >> I'm going to turn the driver debug level up and see if I can capture >> more information. whatever is behind this, however, it is >> almost-certainly related to something that changed between 11.1 and >> 11.2, as I never saw these on the 11.1-STABLE build. >> >> -- >> Karl Denninger >> k...@denninger.net <mailto:k...@denninger.net> >> /The Market Ticker/ >> /[S/MIME encrypted email preferred]/ > Pretty decent trace here -- any ideas? > > mps0: timedout cm 0xfe00011b5020 allocated tm 0xfe00011812a0 > (da11:mps0:0:37:0): READ(10). CDB: 28 00 82 b5 3b 80 00 01 00 00 > length 131072 SMID 634 Aborting comma
Re: 9211 (LSI/SAS) issues on 11.2-STABLE
On 2/5/2019 09:22, Karl Denninger wrote: > On 2/2/2019 12:02, Karl Denninger wrote: >> I recently started having some really oddball things happening under >> stress. This coincided with the machine being updated to 11.2-STABLE >> (FreeBSD 11.2-STABLE #1 r342918:) from 11.1. >> >> Specifically, I get "errors" like this: >> >> (da12:mps0:0:37:0): READ(10). CDB: 28 00 af 82 bb 08 00 01 00 00 >> length 131072 SMID 269 Aborting command 0xfe0001179110 >> mps0: Sending reset from mpssas_send_abort for target ID 37 >> (da12:mps0:0:37:0): READ(10). CDB: 28 00 af 82 bc 08 00 01 00 00 >> length 131072 SMID 924 terminated ioc 804b loginfo 3114 scsi 0 state >> c xfer 0 >> (da12:mps0:0:37:0): READ(10). CDB: 28 00 af 82 ba 08 00 01 00 00 >> length 131072 SMID 161 terminated ioc 804b loginfo 3114 scsi 0 state >> c xfer 0 >> mps0: Unfreezing devq for target ID 37 >> (da12:mps0:0:37:0): READ(10). CDB: 28 00 af 82 bc 08 00 01 00 00 >> (da12:mps0:0:37:0): CAM status: CCB request completed with an error >> (da12:mps0:0:37:0): Retrying command >> (da12:mps0:0:37:0): READ(10). CDB: 28 00 af 82 bb 08 00 01 00 00 >> (da12:mps0:0:37:0): CAM status: Command timeout >> (da12:mps0:0:37:0): Retrying command >> (da12:mps0:0:37:0): READ(10). CDB: 28 00 af 82 ba 08 00 01 00 00 >> (da12:mps0:0:37:0): CAM status: CCB request completed with an error >> (da12:mps0:0:37:0): Retrying command >> (da12:mps0:0:37:0): READ(10). CDB: 28 00 af 82 ba 08 00 01 00 00 >> (da12:mps0:0:37:0): CAM status: SCSI Status Error >> (da12:mps0:0:37:0): SCSI status: Check Condition >> (da12:mps0:0:37:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, >> reset, or bus device reset occurred) >> (da12:mps0:0:37:0): Retrying command (per sense data) >> >> The "Unit Attention" implies the drive reset. It only occurs on certain >> drives under very heavy load (e.g. a scrub.) I've managed to provoke it >> on two different brands of disk across multiple firmware and capacities, >> however, which tends to point away from a drive firmware problem. >> >> A look at the pool data shows /no /errors (e.g. no checksum problems, >> etc) and a look at the disk itself (using smartctl) shows no problems >> either -- whatever is going on here the adapter is recovering from it >> without any data corruption or loss registered on *either end*! >> >> The configuration is an older SuperMicro Xeon board (X8DTL-IF) and shows: >> >> mps0: port 0xc000-0xc0ff mem >> 0xfbb3c000-0xfbb3,0xfbb4-0xfbb7 irq 30 at device 0.0 on pci3 >> mps0: Firmware: 19.00.00.00, Driver: 21.02.00.00-fbsd >> mps0: IOCCapabilities: >> 1285c > After considerable additional work this looks increasingly like either a > missed interrupt or a command is getting lost between the host adapter > and the expander. > > I'm going to turn the driver debug level up and see if I can capture > more information. whatever is behind this, however, it is > almost-certainly related to something that changed between 11.1 and > 11.2, as I never saw these on the 11.1-STABLE build. > > -- > Karl Denninger > k...@denninger.net <mailto:k...@denninger.net> > /The Market Ticker/ > /[S/MIME encrypted email preferred]/ Pretty decent trace here -- any ideas? mps0: timedout cm 0xfe00011b5020 allocated tm 0xfe00011812a0 (da11:mps0:0:37:0): READ(10). 
CDB: 28 00 82 b5 3b 80 00 01 00 00 length 131072 SMID 634 Aborting command 0xfe00011b5020 mps0: Sending reset from mpssas_send_abort for target ID 37 mps0: queued timedout cm 0xfe00011c2760 for processing by tm 0xfe00011812a0 mps0: queued timedout cm 0xfe00011a74f0 for processing by tm 0xfe00011812a0 mps0: queued timedout cm 0xfe00011cfd50 for processing by tm 0xfe00011812a0 mps0: EventReply : EventDataLength: 2 AckRequired: 0 Event: SasDiscovery (0x16) EventContext: 0x0 Flags: 1 ReasonCode: Discovery Started PhysicalPort: 0 DiscoveryStatus: 0 mps0: (0)->(mpssas_fw_work) Working on Event: [16] mps0: (1)->(mpssas_fw_work) Event Free: [16] (da11:mps0:0:37:0): READ(10). CDB: 28 00 82 b5 3c 80 00 01 00 00 length 131072 SMID 961 completed timedout cm 0xfe00011cfd50 ccb 0xf8019458e000 during recovery ioc 804b scsi 0 state c (da11:mps0:0:37:0): READ(10). CDB: 28 00 82 b5 3c 80 00 01 00 00 length 131072 SMID 961 terminated ioc 804b loginfo 3114 scsi 0 state c xfer 0 (da11:mps0:0:37:0): READ(10). CDB: 28 00 82 b5 3b 80 00 01 00 00 length 131072 SMID 634 completed timedout cm 0xfe00011b5(da11:mps0:0:37:0): READ(10). CDB: 28 00 82 b5 3c 80
Re: 9211 (LSI/SAS) issues on 11.2-STABLE
On 2/2/2019 12:02, Karl Denninger wrote: > I recently started having some really oddball things happening under > stress. This coincided with the machine being updated to 11.2-STABLE > (FreeBSD 11.2-STABLE #1 r342918:) from 11.1. > > Specifically, I get "errors" like this: > > (da12:mps0:0:37:0): READ(10). CDB: 28 00 af 82 bb 08 00 01 00 00 > length 131072 SMID 269 Aborting command 0xfe0001179110 > mps0: Sending reset from mpssas_send_abort for target ID 37 > (da12:mps0:0:37:0): READ(10). CDB: 28 00 af 82 bc 08 00 01 00 00 > length 131072 SMID 924 terminated ioc 804b loginfo 3114 scsi 0 state > c xfer 0 > (da12:mps0:0:37:0): READ(10). CDB: 28 00 af 82 ba 08 00 01 00 00 > length 131072 SMID 161 terminated ioc 804b loginfo 3114 scsi 0 state > c xfer 0 > mps0: Unfreezing devq for target ID 37 > (da12:mps0:0:37:0): READ(10). CDB: 28 00 af 82 bc 08 00 01 00 00 > (da12:mps0:0:37:0): CAM status: CCB request completed with an error > (da12:mps0:0:37:0): Retrying command > (da12:mps0:0:37:0): READ(10). CDB: 28 00 af 82 bb 08 00 01 00 00 > (da12:mps0:0:37:0): CAM status: Command timeout > (da12:mps0:0:37:0): Retrying command > (da12:mps0:0:37:0): READ(10). CDB: 28 00 af 82 ba 08 00 01 00 00 > (da12:mps0:0:37:0): CAM status: CCB request completed with an error > (da12:mps0:0:37:0): Retrying command > (da12:mps0:0:37:0): READ(10). CDB: 28 00 af 82 ba 08 00 01 00 00 > (da12:mps0:0:37:0): CAM status: SCSI Status Error > (da12:mps0:0:37:0): SCSI status: Check Condition > (da12:mps0:0:37:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, > reset, or bus device reset occurred) > (da12:mps0:0:37:0): Retrying command (per sense data) > > The "Unit Attention" implies the drive reset. It only occurs on certain > drives under very heavy load (e.g. a scrub.) I've managed to provoke it > on two different brands of disk across multiple firmware and capacities, > however, which tends to point away from a drive firmware problem. > > A look at the pool data shows /no /errors (e.g. no checksum problems, > etc) and a look at the disk itself (using smartctl) shows no problems > either -- whatever is going on here the adapter is recovering from it > without any data corruption or loss registered on *either end*! > > The configuration is an older SuperMicro Xeon board (X8DTL-IF) and shows: > > mps0: port 0xc000-0xc0ff mem > 0xfbb3c000-0xfbb3,0xfbb4-0xfbb7 irq 30 at device 0.0 on pci3 > mps0: Firmware: 19.00.00.00, Driver: 21.02.00.00-fbsd > mps0: IOCCapabilities: > 1285c After considerable additional work this looks increasingly like either a missed interrupt or a command is getting lost between the host adapter and the expander. I'm going to turn the driver debug level up and see if I can capture more information. whatever is behind this, however, it is almost-certainly related to something that changed between 11.1 and 11.2, as I never saw these on the 11.1-STABLE build. -- Karl Denninger k...@denninger.net <mailto:k...@denninger.net> /The Market Ticker/ /[S/MIME encrypted email preferred]/ smime.p7s Description: S/MIME Cryptographic Signature
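(Turning up the driver debug level is a per-instance sysctl on mps(4); the mask below is only an example -- the bit meanings are listed in mps(4) -- and it can also be set as a loader tunable to catch attach-time events:)

  sysctl dev.mps.0.debug_level            # show the current mask
  sysctl dev.mps.0.debug_level=0x1f       # example: enable several categories
  # /boot/loader.conf equivalent:
  hw.mps.0.debug_level="0x1f"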
9211 (LSI/SAS) issues on 11.2-STABLE
I recently started having some really oddball things happening under stress. This coincided with the machine being updated to 11.2-STABLE (FreeBSD 11.2-STABLE #1 r342918:) from 11.1. Specifically, I get "errors" like this: (da12:mps0:0:37:0): READ(10). CDB: 28 00 af 82 bb 08 00 01 00 00 length 131072 SMID 269 Aborting command 0xfe0001179110 mps0: Sending reset from mpssas_send_abort for target ID 37 (da12:mps0:0:37:0): READ(10). CDB: 28 00 af 82 bc 08 00 01 00 00 length 131072 SMID 924 terminated ioc 804b loginfo 3114 scsi 0 state c xfer 0 (da12:mps0:0:37:0): READ(10). CDB: 28 00 af 82 ba 08 00 01 00 00 length 131072 SMID 161 terminated ioc 804b loginfo 3114 scsi 0 state c xfer 0 mps0: Unfreezing devq for target ID 37 (da12:mps0:0:37:0): READ(10). CDB: 28 00 af 82 bc 08 00 01 00 00 (da12:mps0:0:37:0): CAM status: CCB request completed with an error (da12:mps0:0:37:0): Retrying command (da12:mps0:0:37:0): READ(10). CDB: 28 00 af 82 bb 08 00 01 00 00 (da12:mps0:0:37:0): CAM status: Command timeout (da12:mps0:0:37:0): Retrying command (da12:mps0:0:37:0): READ(10). CDB: 28 00 af 82 ba 08 00 01 00 00 (da12:mps0:0:37:0): CAM status: CCB request completed with an error (da12:mps0:0:37:0): Retrying command (da12:mps0:0:37:0): READ(10). CDB: 28 00 af 82 ba 08 00 01 00 00 (da12:mps0:0:37:0): CAM status: SCSI Status Error (da12:mps0:0:37:0): SCSI status: Check Condition (da12:mps0:0:37:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred) (da12:mps0:0:37:0): Retrying command (per sense data) The "Unit Attention" implies the drive reset. It only occurs on certain drives under very heavy load (e.g. a scrub.) I've managed to provoke it on two different brands of disk across multiple firmware and capacities, however, which tends to point away from a drive firmware problem. A look at the pool data shows /no /errors (e.g. no checksum problems, etc) and a look at the disk itself (using smartctl) shows no problems either -- whatever is going on here the adapter is recovering from it without any data corruption or loss registered on *either end*! The configuration is an older SuperMicro Xeon board (X8DTL-IF) and shows: mps0: port 0xc000-0xc0ff mem 0xfbb3c000-0xfbb3,0xfbb4-0xfbb7 irq 30 at device 0.0 on pci3 mps0: Firmware: 19.00.00.00, Driver: 21.02.00.00-fbsd mps0: IOCCapabilities: 1285c There is also a SAS expander connected to that with all but the boot drives on it (the LSI card will not boot from the expander so the boot mirror is directly connected to the adapter.) Thinking this might be a firmware/driver compatibility related problem I flashed the card to 20.00.07.00, which is the latest available. That made the situation **MUCH** worse; now instead of getting unit attention issues I got *controller* resets (!!) which invariably some random device (and sometimes more than one) in one of the pools to get detached, as the controller didn't come back up fast enough for ZFS and it declares the device(s) in question "removed". Needless to say I immediately flashed the card back to 19.00.00.00! This configuration has been completely stable on 11.1 for upwards of a year, and only started misbehaving when I updated the OS to 11.2. I've pounded the living daylights out of this box for a very long time on a succession of FreeBSD OS builds and up to 11.1 have never seen anything like this; if I had a bad drive, it was clearly the drive. 
Looking at the commit logs for the mps driver it appears there isn't much here that *could* be involved, unless there's an interrupt issue with some of the MSI changes that is interacting with my specific motherboard line. Any ideas on running this down would be appreciated; it's not easy to trigger it on the 19.0 firmware but on 20. I can force a controller reset and detach within a few minutes by running scrubs so if there are things I can try (I have a sandbox machine with the same hardware in it that won't make me cry much if I blow it up) that would great. Thanks! -- Karl Denninger k...@denninger.net <mailto:k...@denninger.net> /The Market Ticker/ /[S/MIME encrypted email preferred]/ smime.p7s Description: S/MIME Cryptographic Signature
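(For anyone trying to reproduce this, the trigger described above is just sustained scrub load; the pool name is a placeholder:)

  zpool scrub tank
  gstat -p                        # per-disk load/latency while the scrub runs
  tail -f /var/log/messages       # the aborts/resets show up here
  zpool status -v tank            # confirm ZFS itself logged no errors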
Re: Not sure if this is the correct place.... (laptop, dual-boot EFI)
Here's a write-up on it -- it was /much /simpler than I expected and unlike my X220 didn't require screwing with group policy for Bitlocker to coexist with a dual-boot environment. https://market-ticker.org/akcs-www?post=234936 Feel free to grab/reproduce/link to/whatever; hope this helps others. It runs very nicely on 12-RELEASE -- the only thing I've noted thus far is the expected lack of 5g WiFi support. On 1/26/2019 15:04, Karl Denninger wrote: > Nevermind! > > I set the "-g" flag on the provider and voila. Up she comes; the > loader figured out that it had to prompt for the password and it was > immediately good. > > Now THAT'S easy compared with the convoluted BS I had to do (two > partitions, fully "by-hand" install, etc) for 11 on my X220. > > Off to the races I go; now I have to figure out what I have to set in > Windows group policy so Bitlocker doesn't throw up every time I boot > FreeBSD (this took a bit with my X220 since the boot manager tickled > something that Bitlocker interpreted as "someone tampered with the > system.") Maybe this will be a nothingburger too (which would be great > if true.) > > I'm going to write this one up when I've got it all solid and post it on > my blog; hopefully it will help others. > > On 1/26/2019 14:26, Karl Denninger wrote: >> 1/26/2019 14:10, Warner Losh wrote: >>> On Sat, Jan 26, 2019 at 1:01 PM Karl Denninger >> <mailto:k...@denninger.net>> wrote: >>> >>> Further question does boot1.efi (which I assume has to be >>> placed on >>> the EFI partition and then something like rEFInd can select it) >>> know how >>> to handle a geli-encrypted primary partition (e.g. for root/boot so I >>> don't need an unencrypted /boot partition), and if so how do I tell it >>> that's the case and to prompt for the password? >>> >>> >>> Not really. The whole reason we ditched boot1.efi is because it is >>> quite limited in what it can do. You must loader.efi for that. >>> >>> >>> (If not I know how to set up for geli-encryption using a non-encrypted >>> /boot partition, but my understanding is that for 12 the loader was >>> taught how to handle geli internally and thus you can now install >>> 12 -- >>> at least for ZFS -- with encryption on root. However, that wipes the >>> disk if you try to select it in the installer, so that's no good >>> -- and >>> besides, on a laptop zfs is overkill.) >>> >>> >>> For MBR stuff, yes. For loader.efi, yes. For boot1.efi, no: it did not >>> and will not grow that functionality. >>> >>> Warner >>> >> Ok, next dumb question -- can I put loader.efi in the EFI partition >> under EFI/FreeBSD as "bootx64.efi" there (from reading mailing list >> archives that appears to be yes -- just copy it in) and, if yes, how do >> I "tell" it that when it finds the freebsd-ufs partition on the disk it >> was started from (which, if I'm reading correctly, it will scan and look >> for) that it needs to geli attach the partition before it dig into there >> and find the rest of what it needs to boot? >> >> That SHOULD allow me to use an EFI boot manager to come up on initial >> boot, select FreeBSD and the loader.efi (named as bootx64.efi in >> EFI/FreeBSD) code will then boot the system. >> >> I've looked as the 12-RELEASE man page(s) and it's not obvious how you >> tell the loader to look for the partition and then attach it via GELI >> (prompting for the password of course) before attempting to boot it; >> obviously a "load" directive (e.g. 
geom_eli_load ="YES") makes no sense >> as the thing you'd "load" is on the disk you'd be loading it from and >> its encrypted.. .never mind that loader.conf violates the 8.3 filename >> rules for a DOS filesystem. >> >> Thanks! >> -- Karl Denninger k...@denninger.net <mailto:k...@denninger.net> /The Market Ticker/ /[S/MIME encrypted email preferred]/ smime.p7s Description: S/MIME Cryptographic Signature
Re: Not sure if this is the correct place.... (laptop, dual-boot EFI)
Nevermind! I set the "-g" flag on the provider and voila. Up she comes; the loader figured out that it had to prompt for the password and it was immediately good. Now THAT'S easy compared with the convoluted BS I had to do (two partitions, fully "by-hand" install, etc) for 11 on my X220. Off to the races I go; now I have to figure out what I have to set in Windows group policy so Bitlocker doesn't throw up every time I boot FreeBSD (this took a bit with my X220 since the boot manager tickled something that Bitlocker interpreted as "someone tampered with the system.") Maybe this will be a nothingburger too (which would be great if true.) I'm going to write this one up when I've got it all solid and post it on my blog; hopefully it will help others. On 1/26/2019 14:26, Karl Denninger wrote: > 1/26/2019 14:10, Warner Losh wrote: >> >> On Sat, Jan 26, 2019 at 1:01 PM Karl Denninger > <mailto:k...@denninger.net>> wrote: >> >> Further question does boot1.efi (which I assume has to be >> placed on >> the EFI partition and then something like rEFInd can select it) >> know how >> to handle a geli-encrypted primary partition (e.g. for root/boot so I >> don't need an unencrypted /boot partition), and if so how do I tell it >> that's the case and to prompt for the password? >> >> >> Not really. The whole reason we ditched boot1.efi is because it is >> quite limited in what it can do. You must loader.efi for that. >> >> >> (If not I know how to set up for geli-encryption using a non-encrypted >> /boot partition, but my understanding is that for 12 the loader was >> taught how to handle geli internally and thus you can now install >> 12 -- >> at least for ZFS -- with encryption on root. However, that wipes the >> disk if you try to select it in the installer, so that's no good >> -- and >> besides, on a laptop zfs is overkill.) >> >> >> For MBR stuff, yes. For loader.efi, yes. For boot1.efi, no: it did not >> and will not grow that functionality. >> >> Warner >> > Ok, next dumb question -- can I put loader.efi in the EFI partition > under EFI/FreeBSD as "bootx64.efi" there (from reading mailing list > archives that appears to be yes -- just copy it in) and, if yes, how do > I "tell" it that when it finds the freebsd-ufs partition on the disk it > was started from (which, if I'm reading correctly, it will scan and look > for) that it needs to geli attach the partition before it dig into there > and find the rest of what it needs to boot? > > That SHOULD allow me to use an EFI boot manager to come up on initial > boot, select FreeBSD and the loader.efi (named as bootx64.efi in > EFI/FreeBSD) code will then boot the system. > > I've looked as the 12-RELEASE man page(s) and it's not obvious how you > tell the loader to look for the partition and then attach it via GELI > (prompting for the password of course) before attempting to boot it; > obviously a "load" directive (e.g. geom_eli_load ="YES") makes no sense > as the thing you'd "load" is on the disk you'd be loading it from and > its encrypted.. .never mind that loader.conf violates the 8.3 filename > rules for a DOS filesystem. > > Thanks! > -- Karl Denninger k...@denninger.net <mailto:k...@denninger.net> /The Market Ticker/ /[S/MIME encrypted email preferred]/ smime.p7s Description: S/MIME Cryptographic Signature
Re: Not sure if this is the correct place.... (laptop, dual-boot EFI)
1/26/2019 14:10, Warner Losh wrote: > > > On Sat, Jan 26, 2019 at 1:01 PM Karl Denninger <mailto:k...@denninger.net>> wrote: > > Further question does boot1.efi (which I assume has to be > placed on > the EFI partition and then something like rEFInd can select it) > know how > to handle a geli-encrypted primary partition (e.g. for root/boot so I > don't need an unencrypted /boot partition), and if so how do I tell it > that's the case and to prompt for the password? > > > Not really. The whole reason we ditched boot1.efi is because it is > quite limited in what it can do. You must loader.efi for that. > > > (If not I know how to set up for geli-encryption using a non-encrypted > /boot partition, but my understanding is that for 12 the loader was > taught how to handle geli internally and thus you can now install > 12 -- > at least for ZFS -- with encryption on root. However, that wipes the > disk if you try to select it in the installer, so that's no good > -- and > besides, on a laptop zfs is overkill.) > > > For MBR stuff, yes. For loader.efi, yes. For boot1.efi, no: it did not > and will not grow that functionality. > > Warner > Ok, next dumb question -- can I put loader.efi in the EFI partition under EFI/FreeBSD as "bootx64.efi" there (from reading mailing list archives that appears to be yes -- just copy it in) and, if yes, how do I "tell" it that when it finds the freebsd-ufs partition on the disk it was started from (which, if I'm reading correctly, it will scan and look for) that it needs to geli attach the partition before it dig into there and find the rest of what it needs to boot? That SHOULD allow me to use an EFI boot manager to come up on initial boot, select FreeBSD and the loader.efi (named as bootx64.efi in EFI/FreeBSD) code will then boot the system. I've looked as the 12-RELEASE man page(s) and it's not obvious how you tell the loader to look for the partition and then attach it via GELI (prompting for the password of course) before attempting to boot it; obviously a "load" directive (e.g. geom_eli_load ="YES") makes no sense as the thing you'd "load" is on the disk you'd be loading it from and its encrypted.. .never mind that loader.conf violates the 8.3 filename rules for a DOS filesystem. Thanks! -- Karl Denninger k...@denninger.net <mailto:k...@denninger.net> /The Market Ticker/ /[S/MIME encrypted email preferred]/ smime.p7s Description: S/MIME Cryptographic Signature
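(For anyone reading later: the mechanical half of the question -- getting loader.efi onto the ESP under the name the firmware looks for -- is just a copy. A sketch, assuming the ESP is ada0p1 and using /mnt as a temporary mount point:

# mount_msdosfs /dev/ada0p1 /mnt
# mkdir -p /mnt/EFI/FreeBSD
# cp /boot/loader.efi /mnt/EFI/FreeBSD/bootx64.efi
# umount /mnt

Whether the loader then attaches the GELI provider is controlled by the provider's own flags rather than by anything placed on the ESP; see the "-g" follow-up above.)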
Re: Not sure if this is the correct place.... (laptop, dual-boot EFI)
Further question does boot1.efi (which I assume has to be placed on the EFI partition and then something like rEFInd can select it) know how to handle a geli-encrypted primary partition (e.g. for root/boot so I don't need an unencrypted /boot partition), and if so how do I tell it that's the case and to prompt for the password? (If not I know how to set up for geli-encryption using a non-encrypted /boot partition, but my understanding is that for 12 the loader was taught how to handle geli internally and thus you can now install 12 -- at least for ZFS -- with encryption on root. However, that wipes the disk if you try to select it in the installer, so that's no good -- and besides, on a laptop zfs is overkill.) Thanks! On 1/26/2019 08:08, Kamila Součková wrote: > I'm just booting the installer, going to do this on my X1 Carbon (5th gen), > and I'm planning to use the efibootmgr entry first (which is sufficient for > booting), and later I might add rEFInd if I feel like it. I'll be posting > my steps online, I can post the link once it's out there if you're > interested. > > I'm very curious about HW support on the 6th gen Carbon, it'd be great to > hear how it goes. > > Have fun! > > Kamila > > On Sat, 26 Jan 2019, 06:54 Kyle Evans, wrote: > >> On Fri, Jan 25, 2019 at 6:30 PM Jonathan Chen wrote: >>> On Sat, 26 Jan 2019 at 13:00, Karl Denninger wrote: >>> [...] >>>> I'd like to repartition it to be able to dual boot it much as I do with >>>> my X220 (I wish I could ditch Windows entirely, but that is just not >>>> going to happen), but I'm not sure how to accomplish that in the EFI >>>> world -- or if it reasonably CAN be done in the EFI world. Fortunately >>>> the BIOS has an option to turn off secure boot (which I surmise from >>>> reading the Wiki FreeBSD doesn't yet support) but I still need a means >>>> to select from some reasonably-friendly way *what* to boot. >>> The EFI partition is just a MS-DOS partition, and most EFI aware BIOS >>> will (by default) load /EFI/Boot/boot64.efi when starting up. On my >>> Dell Inspiron 17, I created /EFI/FreeBSD and copied FreeBSD's >>> /boot/loader.efi to /EFI/FreeBSD/boot64.efi. My laptop's BIOS setup >>> allowed me to specify a boot-entry to for \EFI\FreeBSD\boot64.efi. On >>> a cold start, I have to be quick to hit the F12 key, which then allows >>> me to specify whether to boot Windows or FreeBSD. I'm not sure how >>> Lenovo's BIOS setup works, but I'm pretty sure that it should have >>> something similar. >>> >> Adding a boot-entry can also be accomplished with efibootmgr. This is >> effectively what the installer in -CURRENT does, copying loader to >> \EFI\FreeBSD on the ESP and using efibootmgr to insert a "FreeBSD" >> entry for that loader and activating it. >> ___ >> freebsd-stable@freebsd.org mailing list >> https://lists.freebsd.org/mailman/listinfo/freebsd-stable >> To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org" >> > ___ > freebsd-stable@freebsd.org mailing list > https://lists.freebsd.org/mailman/listinfo/freebsd-stable > To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org" -- Karl Denninger k...@denninger.net <mailto:k...@denninger.net> /The Market Ticker/ /[S/MIME encrypted email preferred]/ smime.p7s Description: S/MIME Cryptographic Signature
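(The efibootmgr(8) route Kyle describes, on systems that ship it, might look roughly like this -- a sketch, assuming the ESP is mounted at /mnt and loader.efi has been copied to EFI/FreeBSD:

# efibootmgr -c -l /mnt/EFI/FreeBSD/loader.efi -L FreeBSD
# efibootmgr -v                      (note the Boot number of the new entry)
# efibootmgr -a -b 0003              (activate it; 0003 is a placeholder)

The alternatives are naming the copied loader bootx64.efi under EFI/Boot, the firmware's default fallback path, or adding the entry from the firmware's own setup screen, as discussed elsewhere in the thread.)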
Not sure if this is the correct place.... (laptop, dual-boot EFI)
-mobile appears to be pretty much a dead-letter, so I'm posting here... I have dual-boot working well on my Lenovo X220, and have for quite some time, between Win10 and FreeBSD 11. This is set up for MBR however, not EFI. I just picked up an X1 Carbon Gen 6, which is a UEFI machine, with Win10 on it. I'd like to repartition it to be able to dual boot it much as I do with my X220 (I wish I could ditch Windows entirely, but that is just not going to happen), but I'm not sure how to accomplish that in the EFI world -- or if it reasonably CAN be done in the EFI world. Fortunately the BIOS has an option to turn off secure boot (which I surmise from reading the Wiki FreeBSD doesn't yet support) but I still need some reasonably friendly means to select *what* to boot. With the X220, the FreeBSD boot manager does this reasonably easily; you get an "F" key for the desired partition, and if you press nothing after a few seconds whatever you pressed last is booted. Works fine. What options exist for doing this in a UEFI world, if any, and is there a "cookbook" for putting this together? I assume *someone* has set up dual-boot, given that the X1 Carbon Gen 6 is listed as working in the laptop database. Thanks in advance! -- Karl Denninger k...@denninger.net <mailto:k...@denninger.net> /The Market Ticker/ /[S/MIME encrypted email preferred]/ smime.p7s Description: S/MIME Cryptographic Signature
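(Two quick checks that help frame the options here -- a sketch, and the device name is an assumption for an NVMe laptop drive:

$ sysctl machdep.bootmethod          (reports whether the running FreeBSD system came up via BIOS or UEFI)
$ gpart show -p nvd0                 (look for the small "efi" partition Windows created)

If Windows was installed EFI-style there is normally already an ESP that FreeBSD's loader.efi can be copied into, which is where the follow-ups above pick up.)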
11.2-STABLE and MMC problems on pcEngines apu2c0
way 192.168.10.100 fib 0: route already in table . Updating motd:. Mounting late filesystems:. Starting ntpd. Starting powerd. Starting rtadvd. Dec 10 15:31:01 IpGw ntpd[985]: leapsecond file ('/var/db/ntpd.leap-seconds.list'): expired less than 558 days ago Starting dhcpd. Internet Systems Consortium DHCP Server 4.3.5 Copyright 2004-2016 Internet Systems Consortium. All rights reserved. For info, please visit https://www.isc.org/software/dhcp/ Config file: /usr/local/etc/dhcpd.conf Database file: /var/db/dhcpd.leases PID file: /var/run/dhcpd.pid Wrote 0 deleted host decls to leases file. Wrote 0 new dynamic host decls to leases file. Wrote 0 leases to leases file. Listening on BPF/igb1.3/00:0d:b9:46:71:89/192.168.4.0/24 Sending on BPF/igb1.3/00:0d:b9:46:71:89/192.168.4.0/24 Listening on BPF/igb1/00:0d:b9:46:71:89/two-on-cable Sending on BPF/igb1/00:0d:b9:46:71:89/two-on-cable Sending on Socket/fallback/fallback-net Starting sshguard. Starting snmpd. cannot forward src fe80:2::396c:673e:9cf:ba21, dst 2600:1402:16::17db:a212, nxt 6, rcvif igb1, outif igb0 Starting openvpn. tun0: link state changed to UP Configuring vt: blanktime. Performing sanity check on sshd configuration. Starting sshd. Starting sendmail_submit. Starting sendmail_msp_queue. Starting cron. Starting background file system checks in 60 seconds. Mon Dec 10 15:31:06 CST 2018 cannot forward src fe80:2::396c:673e:9cf:ba21, dst 2600:1402:16::17db:a212, nxt 6, rcvif igb1, outif igb0 cannot forward src fe80:2::396c:673e:9cf:ba21, dst 2600:1402:16::17db:a212, nxt 6, rcvif igb1, outif igb0 cannot forward src fe80:2::396c:673e:9cf:ba21, dst 2600:1402:16::17db:a212, nxt 6, rcvif igb1, outif igb0 cannot forward src fe80:2::396c:673e:9cf:ba21, dst 2600:1402:16::17db:a212, nxt 6, rcvif igb1, outif igb0 -- Karl Denninger k...@denninger.net <mailto:k...@denninger.net> /The Market Ticker/ /[S/MIME encrypted email preferred]/ smime.p7s Description: S/MIME Cryptographic Signature
Re: Random freezes of my FreeBSD droplet (DigitalOcean)
Yeah, don't do that. I have a DO Zfs-enabled FreeBSD deployment and I have the swap on /dev/gpt/swap0 (a regular slice)... no problems. $ uptime 9:17AM up 174 days, 2:06, 1 users, load averages: 0.89, 0.80, 0.83 $ On 11/22/2017 08:15, Stefan Lambrev wrote: > Hi, > > Thanks for this idea. > > Device 1K-blocks UsedAvail Capacity > /dev/zvol/zroot/swap 2097152 310668 178648415% > > Will check why at all swap is used when the VM is not used. But yes - as > it's a image provided by DO the swap is on the zvol... > > On Wed, Nov 22, 2017 at 4:08 PM, Adam Vande More > wrote: > >> On Wed, Nov 22, 2017 at 7:17 AM, Stefan Lambrev >> wrote: >> >>> Greetings, >>> >>> I have a droplet in DO with very light load, currently >>> running 11.0-RELEASE-p15 amd64 GENERIC kernel + zfs (1 GB Memory / 30 GB >>> Disk / FRA1 - FreeBSD 11.0 zfs) >>> >>> I know ZFS needs more memory, but the load is really light. Unfortunatelly >>> last few weeks I'm experiencing those freezes almost every second day. >>> There are no logs or console messages - just freeze. Networks seems to >>> work, but nothing else. >>> >>> Is there anyone with similar experience here? >>> Are there any updates in 11.1 that may affect positively my experience in >>> the digital ocean cloud? >>> >> It's entirely possible to run a stable VM using that configuration so you >> haven't provided enough details to give any real help. A common foot >> shooting method is putting swap on zvol, but the possibilities are endless. >> >> -- >> Adam >> > ___ > freebsd-stable@freebsd.org mailing list > https://lists.freebsd.org/mailman/listinfo/freebsd-stable > To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org" -- Karl Denninger k...@denninger.net <mailto:k...@denninger.net> /The Market Ticker/ /[S/MIME encrypted email preferred]/ smime.p7s Description: S/MIME Cryptographic Signature
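(For anyone wanting to move a droplet's swap off the zvol, a rough sketch -- the vtbd0 device name and the presence of free space at the end of the disk are assumptions, so check gpart show first:

# gpart add -t freebsd-swap -l swap0 -s 2G vtbd0
# swapoff /dev/zvol/zroot/swap
# swapon /dev/gpt/swap0

and in /etc/fstab:

/dev/gpt/swap0   none   swap   sw   0   0

with the old zvol line removed, and the zvol itself destroyed once the new swap is in service.)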
Re: Installing amd64 FreeBSD 11.1 in dual-boot with Windows 7 on an MBR partitioned disk
On 10/7/2017 12:12, Eugene Grosbein wrote: > 07.10.2017 22:26, Warner Losh wrote: > >> Sorry for top posting. Sounds like your BIOS will read the botox64.efi from >> the removable USB drive, >> but won't from the hard drive. Force BIOS booting instead of UEFI and it >> will install correctly. >> However, it may not boot Windows, which I think requires UEFI these days. > My home desktop is UEFI-capable and but switched to BIOS/MBR mode > and it dual-boots FreeBSD/Windows 8.1 just fine. Windows (including Windows 10) doesn't "require" UEFI but the current installer will set it up that way on a "from scratch" installation. If you have it on an MBR disk (e.g. you started with 7 or 8, for example) it will boot and run just fine from it, and in fact if you have a legacy license and try to change to UEFI (with a full, from-scratch reload) you run the risk of it declaring your license invalid! You can /probably /get around that by getting in touch with Microsoft but why do so without good reason? -- Karl Denninger k...@denninger.net <mailto:k...@denninger.net> /The Market Ticker/ /[S/MIME encrypted email preferred]/ smime.p7s Description: S/MIME Cryptographic Signature
Re: Installing amd64 FreeBSD 11.1 in dual-boot with Windows 7 on an MBR partitioned disk
On 10/7/2017 11:27, Rostislav Krasny wrote: > On Sat, Oct 7, 2017 at 6:26 PM, Warner Losh wrote: >> Sorry for top posting. Sounds like your BIOS will read the botox64.efi from >> the removable USB drive, but won't from the hard drive. Force BIOS booting >> instead of UEFI and it will install correctly. However, it may not boot >> Windows, which I think requires UEFI these days. >> >> The root of the problem is that we have no way to setup the EFI boot >> variables in the installer that we need to properly installed under UEFI. >> I'm working on that, so you'll need to be patient... >> >> Warner > My computer doesn't have any EFI partition and this explains why the > installed FreeBSD boots in the BIOS mode on it. The installation media > probably has the EFI partition (with the bootx64.efi) and then BIOS > probably boots the installation media in the UEFI mode instead of the > BIOS mode. So the "machdep.bootmethod" sysctl doesn't represent the > BIOS boot mode configuration but a boot method the currently running > system was booted in. If this is true then the "machdep.bootmethod" > sysctl should not be used in bsdinstall. At least not for the > bootability check. Something else should be used for the bootability > check or the bsdinstall should trust the user choice. > > BTW this is how the EFI partition looks like in someone's Windows 7 > disk manager: > https://www.easyuefi.com/wintousb/images/en_US/efi-system-partition.png > and this how it looks without any EFI partition in my system (with > Windows 7 / FreeBSD dual-boot) > http://i68.tinypic.com/9u19b8.png > > I think even that NTFS System Reserved partition is not mandatory for > Windows installation. It just used to keep Windows boot files in a > safe, place preventing accidental deletion by a user. It's being > created if Windows is installed on an empty disk but if you create > just one big NTFS partition prior to the Windows installation and > install it on that single partition it will be ok. There will be just > more Windows system files on the C disk, for example ntldr, > NTDETECT.COM. It can be checked on VM, for example on VirtualBox. > ___ The problem with the new installer appears to be that it follows this heuristic when you boot FreeBSD media from a USB stick or similar media: 1. If the system CAN boot EFI then it WILL EFI boot the FreeBSD installer from that media. 2. The installer sees that it booted from EFI. It also sees a fixed disk with a non-EFI boot environment. It declares that fixed disk environment "non-bootable", which is not by any means a known fact. 3. Having done that it will then "offer" to re-initialize the disk as EFI/GPT, which is ok if you don't have anything else on there that you want. If you DO then it's obviously not ok, and in that instance it both won't load the MBR boot manager *and* won't load the second-stage MBR boot code either. You can get around this by hand-installing both parts of the boot code, which is what I wound up doing on my Lenovo laptop. That machine was originally delivered with Windows 7 and upgraded "in place" to Win10, which is why the disk is MBR-partitioned rather than EFI/GPT, although the machine itself does support EFI booting. I would suggest that the FreeBSD installer should be more-intelligent about this, but I suspect it's a fairly uncommon set of circumstances. Far more troublesome in the EFI world is the fact that "out-of-the-box" multi-boot in an EFI environment is a five-alarm pain in the butt although there are EFI boot managers that make it reasonable. 
-- Karl Denninger k...@denninger.net <mailto:k...@denninger.net> /The Market Ticker/ /[S/MIME encrypted email preferred]/ smime.p7s Description: S/MIME Cryptographic Signature
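(Concretely, "hand-installing both parts of the boot code" on an MBR disk like this amounts to roughly the following -- a sketch, assuming the disk is ada0 and FreeBSD lives in slice 3:

# boot0cfg -B -v ada0                      (install the boot0 boot manager into the MBR)
# gpart bootcode -b /boot/boot ada0s3      (install the second-stage boot blocks into the BSD-labeled slice)

The second command is the same one shown in the earlier messages further down the thread.)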
Re: Installing amd64 FreeBSD 11.1 in dual-boot with Windows 7 on an MBR partitioned disk
On 10/6/2017 10:42, Karl Denninger wrote: > On 10/6/2017 10:17, Rostislav Krasny wrote: >> Hi there, >> >> I try to install amd64 FreeBSD 11.1 in dual-boot with Windows 7 on an >> MBR partitioned disk and I can't make it bootable. My Windows 7 uses >> its standard MBR partitioning scheme (1. 100MB System Reserved >> Partition; 2 - 127GB disk C partition) and there is about 112GB of >> free unallocated disk space that I want to use to install FreeBSD on >> it. As an installation media I use the >> FreeBSD-11.1-RELEASE-amd64-mini-memstick.img flashed on a USB drive. >> >> During the installation, if I choose to use Guided Partitioning Tool >> and automatic partitioning of the free space, I get a pop-up message >> that says: >> >> == >> The existing partition scheme on this disk >> (MBR) is not bootable on this platform. To >> install FreeBSD, it must be repartitioned. >> This will destroy all data on the disk. >> Are you sure you want to proceed? >> [Yes] [No] >> == >> >> If instead of the Guided Partitioning Tool I choose to partition the >> disk manually I get a similar message as a warning and the >> installation process continues without an error, but the installed >> FreeBSD system is not bootable. Installing boot0 manually (boot0cfg >> -Bv /dev/ada0) doesn't fix it. The boot0 boot loader is able to boot >> Windows but it's unable to start the FreeBSD boot process. It only >> prints hash symbols when I press F3 (the FreeBSD slice/MBR partition >> number). >> >> I consider this as a critical bug. But maybe there is some workaround >> that allows me to install the FreeBSD 11.1 as a second OS without >> repartitioning the entire disk? >> >> My hardware is an Intel Core i7 4790 3.6GHz based machine with 16GB >> RAM. The ada0 disk is 238GB SanDisk SD8SBAT256G1122 (SSD). >> > You have to do the partitioning and then install FreeBSD's boot > manager by hand. It /does /work; I ran into the same thing with my > Lenovo X220 and was able to manually install it, which works fine to > dual-boot between Windows and FreeBSD-11. I had to do it manually > because the installer detected that the X220 was UEFI capable and > insisted on GPT-partitioning the disk, which is incompatible with > dual-boot and the existing MBR-partitioned Windows installation. > > You want the partition layout to look like this: > > $ gpart show > => 63 500118129 ada0 MBR (238G) > 63 4208967 1 ntfs (2.0G) > 4209030 307481727 2 ntfs (147G) > 311690757 3 - free - (1.5K) > 311690760 165675008 3 freebsd [active] (79G) > 477365768 808957 - free - (395M) > 478174725 21928725 4 ntfs (10G) > 500103450 14742 - free - (7.2M) > > => 0 165675008 ada0s3 BSD (79G) > 0 8388608 1 freebsd-ufs (4.0G) > 8388608 136314880 2 freebsd-ufs (65G) > 144703488 20971519 4 freebsd-swap (10G) > 165675007 1 - free - (512B) > > MBR has only four partitions; the "standard" Windows (7+) install uses > /three. /The "boot"/repair area, the main partition and, on most > machines, a "recovery" partition. That usually leaves partition 3 > free which is where I stuck FreeBSD. Note that you must then set up > slices on Partition 3 (e.g. root/usr/swap) as usual. > BTW if you're getting the "#" when you hit the partition key that means the /second stage /boot loader is /not /on the partition you selected; the bootmanager can't find it. This can be manually installed with: # gpart bootcode -b /boot/boot ada0s3 "s3" is the partition in question upon which you created the BSD-labeled structure. 
One thing to be aware of is that you must adjust Windows group policy if you intend to use Bitlocker, or it will declare the disk structure changed and refuse to take your key (demanding the recovery key) whenever the FreeBSD boot manager changes the active "next boot" flag. By default /any /change in the boot structure will set off Bitlocker; you can relax it to not get so cranked, but you need to do so /before /encrypting the partition under Windows. I run GELI encryption on the FreeBSD partition which is why I have a separate (small) boot filesystem; that too has to be manually set up for an installation like this using MBR. It works well. -- Karl Denninger k...@denninger.net <mailto:k...@denninger.net> /The Market Ticker/ /[S/MIME encrypted email preferred]/ smime.p7s Description: S/MIME Cryptographic Signature
Re: Installing amd64 FreeBSD 11.1 in dual-boot with Windows 7 on an MBR partitioned disk
On 10/6/2017 10:17, Rostislav Krasny wrote: Hi there, I try to install amd64 FreeBSD 11.1 in dual-boot with Windows 7 on an MBR partitioned disk and I can't make it bootable. My Windows 7 uses its standard MBR partitioning scheme (1. 100MB System Reserved Partition; 2 - 127GB disk C partition) and there is about 112GB of free unallocated disk space that I want to use to install FreeBSD on it. As an installation media I use the FreeBSD-11.1-RELEASE-amd64-mini-memstick.img flashed on a USB drive. During the installation, if I choose to use Guided Partitioning Tool and automatic partitioning of the free space, I get a pop-up message that says: == The existing partition scheme on this disk (MBR) is not bootable on this platform. To install FreeBSD, it must be repartitioned. This will destroy all data on the disk. Are you sure you want to proceed? [Yes] [No] == If instead of the Guided Partitioning Tool I choose to partition the disk manually I get a similar message as a warning and the installation process continues without an error, but the installed FreeBSD system is not bootable. Installing boot0 manually (boot0cfg -Bv /dev/ada0) doesn't fix it. The boot0 boot loader is able to boot Windows but it's unable to start the FreeBSD boot process. It only prints hash symbols when I press F3 (the FreeBSD slice/MBR partition number). I consider this as a critical bug. But maybe there is some workaround that allows me to install the FreeBSD 11.1 as a second OS without repartitioning the entire disk? My hardware is an Intel Core i7 4790 3.6GHz based machine with 16GB RAM. The ada0 disk is 238GB SanDisk SD8SBAT256G1122 (SSD). You have to do the partitioning and then install FreeBSD's boot manager by hand. It /does /work; I ran into the same thing with my Lenovo X220 and was able to manually install it, which works fine to dual-boot between Windows and FreeBSD-11. I had to do it manually because the installer detected that the X220 was UEFI capable and insisted on GPT-partitioning the disk, which is incompatible with dual-boot and the existing MBR-partitioned Windows installation. You want the partition layout to look like this: $ gpart show => 63 500118129 ada0 MBR (238G) 63 4208967 1 ntfs (2.0G) 4209030 307481727 2 ntfs (147G) 311690757 3 - free - (1.5K) 311690760 165675008 3 freebsd [active] (79G) 477365768 808957 - free - (395M) 478174725 21928725 4 ntfs (10G) 500103450 14742 - free - (7.2M) => 0 165675008 ada0s3 BSD (79G) 0 8388608 1 freebsd-ufs (4.0G) 8388608 136314880 2 freebsd-ufs (65G) 144703488 20971519 4 freebsd-swap (10G) 165675007 1 - free - (512B) MBR has only four partitions; the "standard" Windows (7+) install uses /three. /The "boot"/repair area, the main partition and, on most machines, a "recovery" partition. That usually leaves partition 3 free which is where I stuck FreeBSD. Note that you must then set up slices on Partition 3 (e.g. root/usr/swap) as usual. -- Karl Denninger k...@denninger.net <mailto:k...@denninger.net> /The Market Ticker/ /[S/MIME encrypted email preferred]/ smime.p7s Description: S/MIME Cryptographic Signature
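(A rough sketch of carving that layout out of the free space by hand -- the slice number and sizes are taken loosely from the gpart output above and will differ on other disks:

# gpart add -t freebsd -i 3 ada0            (create the FreeBSD slice in the unallocated gap)
# gpart set -a active -i 3 ada0
# gpart create -s BSD ada0s3
# gpart add -t freebsd-ufs -s 4G ada0s3     (small root/boot filesystem)
# gpart add -t freebsd-ufs -s 65G ada0s3    (main filesystem)
# gpart add -t freebsd-swap -i 4 ada0s3     (swap in the remainder)

followed by newfs on the UFS partitions and the boot code installation described in the follow-ups.)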
Re: issues with powerd/freq_levels
On 8/2/2017 13:06, Ian Smith wrote: > Is it working on the others? Does it actually idle at 600MHz? If in > doubt, running 'powerd -v' for a while will show you what's happening. > Despite being low power, running slower when more or less idle - along > with hopefully getting to use C2 state - should cool these down a lot. > Yes. "powerd -v" sez (once it gets going) load 3%, current freq 600 MHz ( 2), wanted freq 600 MHz load 6%, current freq 600 MHz ( 2), wanted freq 600 MHz load 0%, current freq 600 MHz ( 2), wanted freq 600 MHz load 0%, current freq 600 MHz ( 2), wanted freq 600 MHz load 0%, current freq 600 MHz ( 2), wanted freq 600 MHz load 8%, current freq 600 MHz ( 2), wanted freq 600 MHz load 4%, current freq 600 MHz ( 2), wanted freq 600 MHz load 3%, current freq 600 MHz ( 2), wanted freq 600 MHz load 4%, current freq 600 MHz ( 2), wanted freq 600 MHz load 0%, current freq 600 MHz ( 2), wanted freq 600 MHz load 14%, current freq 600 MHz ( 2), wanted freq 600 MHz dev.cpu.3.temperature: 58.5C dev.cpu.2.temperature: 58.5C dev.cpu.1.temperature: 58.5C dev.cpu.0.temperature: 58.5C -- Karl Denninger k...@denninger.net <mailto:k...@denninger.net> /The Market Ticker/ /[S/MIME encrypted email preferred]/ smime.p7s Description: S/MIME Cryptographic Signature
Re: issues with powerd/freq_levels
ure is a totally separate issue. It is VERY sensitive to external > > issue like airflow and position of the CPU in relation to other components > > in the chassis Also, unless you have a lot of cores, you probably should > > set both economy_cx_lowest and performance_cx_lowest to Cmax. Economy > > should default to that, but performance will not as that can cause issues > > on systems with large numbers of cores, so is set to C2. Many such system > > used to disable deeper sleep modes in BIOS, but I am way behind the times > > and don't know about the current state of affairs. Certainly for systems > > with 32 or fewer cores, this should not be an issue. In any case, Cx state > > can sharply impact temperature. > > Indeed. But as these are low-power devices already, it's likely less of > a concern, but maximising efficiency and minimising stress never hurts. > > > Finally, the last case with power levels of -1 for all frequencies is > > probably because the CPU manufacturer (Intel?) has not published this > > information. For a while they were treating this as "proprietary" > > information. Very annoying! It's always something that is not readily > > available. Thi is one reason I suspect your CPUs are not identical. > > Hmm, bought as a batch, that sounds unlikely, though their BIOSes (ono) > may vary, and would be worth checking on each - and BIOS settings, too. > > Danny, is powerd running on all these? I doubt it would load on apu-1 > as it stands. Note these are 'pure' 1/8 factors of 1000, p4tcc-alike, > and I think quite likely indicate that cpufreq(4) failed to initialise? > debug.cpufreq.verbose=1 in /boot/loader.conf might show a clue, with a > verbose dmesg.boot anyway. > > Later: oops, just reread Karl's message, where I was unfaniliar with > different CPUs showing different C-states, and noticing that despite > cpu0 showing C2(io) available, and cx_lowest as C2, yet it used 100% C1 > state, which was all that was available to cpu1 to 3. > > But then I twigged to Karl's hwpstate errors, so with 'apropos hwpstate' > still showing nothing after all these years, along with other cpufreq(4) > drivers, I used the list search via duckduckgo to finally find one (1) > message, which lead to one detailed thread (that I even bought into!) > > https://lists.freebsd.org/pipermail/freebsd-stable/2012-May/subject.html > https://lists.freebsd.org/pipermail/freebsd-stable/2012-June/thread.html > > /hwpstate Note the May one needs following by Subject, else it splits > into 5 separate threads (?) > > Which may be interesting to cpufreq nerds, but had me remember that > hwpstate(0) is for AMD not Intel CPUs. So now I'm totally confused :) > > Danny, do your results from Karl's sysctl listings agree with his? These are not Intel CPUs; they are an embedded AMD 64-bit CPU. The specs on the one I have are: * CPU: AMD Embedded G series GX-412TC, 1 GHz quad Jaguar core with 64 bit and AES-NI support, 32K data + 32K instruction cache per core, shared 2MB L2 cache. * DRAM: 2 or 4 GB DDR3-1333 DRAM * Storage: Boot from m-SATA SSD, SD card (internal sdhci controller), or external USB. 1 SATA + power connector. * 12V DC, about 6 to 12W depending on CPU load. 
Jack = 2.5 mm, center positive * Connectivity: 2 or 3 Gigabit Ethernet channels (Intel i211AT on apu2b2, i210AT on apu2b4) * I/O: DB9 serial port, 2 USB 3.0 external + + 2 USB 2.0 internal, three front panel LEDs, pushbutton * Expansion: 2 miniPCI express (one with SIM socket), LPC bus, GPIO header, I2C bus, COM2 (3.3V RXD / TXD) * Board size: 6 x 6" (152.4 x 152.4 mm) - same as apu1d, alix2d13 and wrap1e. * Firmware: coreboot <http://www.coreboot.org/> (please contact supp...@pcengines.ch for source code if desired). * Cooling: Conductive cooling from the CPU to the enclosure using a 3 mm alu heat spreader (included). The one I have here is a 2Gb RAM/2 IGB Ethernet interface unit. They're surprisingly capable for their size, conductive cooling and (especially) price. As a firewall/VPN ingress point they perform nicely. I boot the one I have here from an SD card in a NanoBSD config but you can stick a mSATA SSD (laptop-computer style) in the case and boot from that if you want (I've tried it; the internal BIOS it comes with boots from it just fine.) -- Karl Denninger k...@denninger.net <mailto:k...@denninger.net> /The Market Ticker/ /[S/MIME encrypted email preferred]/ smime.p7s Description: S/MIME Cryptographic Signature
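(For completeness, the knobs being discussed are all reachable from /etc/rc.conf; a minimal sketch, with the values as examples rather than recommendations:

powerd_enable="YES"
powerd_flags="-a hiadaptive -b adaptive"   # powerd(8) policy on AC / battery
performance_cx_lowest="Cmax"               # deepest C-state, AC profile
economy_cx_lowest="Cmax"                   # deepest C-state, battery profile

The cx_lowest settings end up in hw.acpi.cpu.cx_lowest and dev.cpu.N.cx_lowest, which is what the sysctl output quoted in this thread is reporting.)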
Re: issues with powerd/freq_levels
ew PCEngines unit here running 11.0-STABLE and this is what I have in the related sysctls: $ sysctl -a|grep cpu.0 dev.cpu.0.cx_method: C1/hlt C2/io dev.cpu.0.cx_usage_counters: 2261969965 3038 dev.cpu.0.cx_usage: 99.99% 0.00% last 798us dev.cpu.0.cx_lowest: C2 dev.cpu.0.cx_supported: C1/1/0 C2/2/400 dev.cpu.0.freq_levels: 1000/924 800/760 600/571 dev.cpu.0.freq: 1000 dev.cpu.0.temperature: 59.2C dev.cpu.0.%parent: acpi0 dev.cpu.0.%pnpinfo: _HID=none _UID=0 dev.cpu.0.%location: handle=\_PR_.P000 dev.cpu.0.%driver: cpu dev.cpu.0.%desc: ACPI CPU $ sysctl -a|grep cx hw.acpi.cpu.cx_lowest: C2 dev.cpu.3.cx_method: C1/hlt dev.cpu.3.cx_usage_counters: 111298364 dev.cpu.3.cx_usage: 100.00% last 30us dev.cpu.3.cx_lowest: C2 dev.cpu.3.cx_supported: C1/1/0 dev.cpu.2.cx_method: C1/hlt dev.cpu.2.cx_usage_counters: 127978480 dev.cpu.2.cx_usage: 100.00% last 35us dev.cpu.2.cx_lowest: C2 dev.cpu.2.cx_supported: C1/1/0 dev.cpu.1.cx_method: C1/hlt dev.cpu.1.cx_usage_counters: 108161434 dev.cpu.1.cx_usage: 100.00% last 29us dev.cpu.1.cx_lowest: C2 dev.cpu.1.cx_supported: C1/1/0 dev.cpu.0.cx_method: C1/hlt C2/io dev.cpu.0.cx_usage_counters: 2261916773 3038 dev.cpu.0.cx_usage: 99.99% 0.00% last 378us dev.cpu.0.cx_lowest: C2 dev.cpu.0.cx_supported: C1/1/0 C2/2/400 These are fanless, 4-core devices that are pretty cool -- they've got AES instructions in them and thus make very nice VPN gateways running something like Strongswan, and come with either 2 or 3 gigabit interfaces on the board. Oh, and they run on 12V. Powerd is logging this, however... hwpstate0: set freq failed, err 6 hwpstate0: set freq failed, err 6 H -- Karl Denninger k...@denninger.net <mailto:k...@denninger.net> /The Market Ticker/ /[S/MIME encrypted email preferred]/ smime.p7s Description: S/MIME Cryptographic Signature
Re: Interesting permissions difference on NanoBSD build
On 6/16/2017 09:55, Karl Denninger wrote: > On 6/16/2017 08:21, Karl Denninger wrote: >> On 6/16/2017 07:52, Guido Falsi wrote: >>> On 06/16/17 14:25, Karl Denninger wrote: >>>> I've recently started playing with the "base" NanoBSD scripts and have >>>> run into an interesting issue. >>> [...] >>>> Note the missing "r" bit for "other" in usr and etc directories -- and >>>> the missing "x" bit (at minimum) for the root! The same is carried down >>>> to "local" under usr: >>>> >>>> root@NewFS:/pics/Crochet-work-AMD/obj/_.w # ls -al usr >>>> total 134 >>>> drwxr-x--x 12 root wheel 12 Jun 15 17:10 . >>>> drwxr-x--- 18 root wheel 24 Jun 15 17:10 .. >>>> drwxr-xr-x 2 root wheel 497 Jun 15 17:09 bin >>>> drwxr-xr-x 52 root wheel 327 Jun 15 17:10 include >>>> drwxr-xr-x 8 root wheel 655 Jun 15 17:10 lib >>>> drwxr-xr-x 4 root wheel 670 Jun 15 17:09 lib32 >>>> drwxr-xr-x 5 root wheel5 Jun 15 17:10 libdata >>>> drwxr-xr-x 7 root wheel 70 Jun 15 17:10 libexec >>>> drwxr-x--x 10 root wheel 11 Jun 15 17:10 local >>>> drwxr-xr-x 2 root wheel 294 Jun 15 17:08 sbin >>>> drwxr-xr-x 31 root wheel 31 Jun 15 17:10 share >>>> drwxr-xr-x 14 root wheel 17 Jun 15 17:10 tests >>>> root@NewFS:/pics/Crochet-work-AMD/obj/_.w # >>> I have no idea why this is happening on your system but I'm not >>> observing it here: >>> >>>> ls -al usr >>> total 85 >>> drwxr-xr-x 9 root wheel9 Jun 15 13:32 . >>> drwxr-xr-x 22 root wheel 29 Jun 15 13:32 .. >>> drwxr-xr-x 2 root wheel 359 Jun 15 13:32 bin >>> drwxr-xr-x 4 root wheel 446 Jun 15 13:32 lib >>> drwxr-xr-x 3 root wheel3 Jun 15 13:32 libdata >>> drwxr-xr-x 5 root wheel 47 Jun 15 13:32 libexec >>> drwxr-xr-x 12 root wheel 13 Jun 15 13:32 local >>> drwxr-xr-x 2 root wheel 218 Jun 15 13:32 sbin >>> drwxr-xr-x 17 root wheel 17 Jun 15 13:32 share >>> >>> >>> and I get (almost) the same on the installed nanobsd system: >>>> ls -al usr >>> total 24 >>> drwxr-xr-x 9 root wheel512 Jun 15 13:32 . >>> drwxr-xr-x 23 root wheel512 Jun 15 13:34 .. >>> drwxr-xr-x 2 root wheel 6144 Jun 15 13:32 bin >>> drwxr-xr-x 4 root wheel 10752 Jun 15 13:32 lib >>> drwxr-xr-x 3 root wheel512 Jun 15 13:32 libdata >>> drwxr-xr-x 5 root wheel 1024 Jun 15 13:32 libexec >>> drwxr-xr-x 12 root wheel512 Jun 15 13:32 local >>> drwxr-xr-x 2 root wheel 4096 Jun 15 13:32 sbin >>> drwxr-xr-x 17 root wheel512 Jun 15 13:32 share >>> >>> The machine I'm building the NanoBSD image on is running head r318959, >>> and is running ZFS, while the NanoBSD system I've built is tracking >>> 11-STABLE and is at r319971 at present, so a BETA1. >>> >>> Could you report version information too? maybe it's a problem present >>> on head NanoBSD scripts? >> FreeBSD 11.0-STABLE #15 r312669M: Mon Jan 23 14:01:03 CST 2017 >> k...@newfs.denninger.net:/usr/obj/usr/src/sys/KSD-SMP >> >> I also build using Crochet against both /usr/src (my "primary" source >> repo, which is on the rev noted here) and against a second one (-HEAD), >> which I need to use for the RPI3. Neither winds up with this sort of >> permission issue. >> >> The obj directory is on /pics/Crochet-Work-AMD, which is a zfs >> filesystem mounted off a "scratch" SSD. >> >> The problem appears to stem from the creation of "_.w" and since >> directory permissions are "normally" inherited it promulgates from there >> unless an explicit permission set occurs. Yet I see nothing that would >> create the world directory with anything other than the umask at the >> time it runs. >> >> I *am* running this from "batch" -- perhaps that's where the problem is >> coming from? 
I'll try adding a "umask 022" to the nanobsd.sh script at >> the top and see what that does. > Nope. > > It's something in the installworld subset; I put a stop in after the > clean/create world directory and I have a 0755 permission mask on the > (empty) directory. > > Hmmm... > > I do not know where this is coming from now but this test implies that > it's the "installworld" action
Re: Interesting permissions difference on NanoBSD build
On 6/16/2017 08:21, Karl Denninger wrote: > On 6/16/2017 07:52, Guido Falsi wrote: >> On 06/16/17 14:25, Karl Denninger wrote: >>> I've recently started playing with the "base" NanoBSD scripts and have >>> run into an interesting issue. >> [...] >>> Note the missing "r" bit for "other" in usr and etc directories -- and >>> the missing "x" bit (at minimum) for the root! The same is carried down >>> to "local" under usr: >>> >>> root@NewFS:/pics/Crochet-work-AMD/obj/_.w # ls -al usr >>> total 134 >>> drwxr-x--x 12 root wheel 12 Jun 15 17:10 . >>> drwxr-x--- 18 root wheel 24 Jun 15 17:10 .. >>> drwxr-xr-x 2 root wheel 497 Jun 15 17:09 bin >>> drwxr-xr-x 52 root wheel 327 Jun 15 17:10 include >>> drwxr-xr-x 8 root wheel 655 Jun 15 17:10 lib >>> drwxr-xr-x 4 root wheel 670 Jun 15 17:09 lib32 >>> drwxr-xr-x 5 root wheel5 Jun 15 17:10 libdata >>> drwxr-xr-x 7 root wheel 70 Jun 15 17:10 libexec >>> drwxr-x--x 10 root wheel 11 Jun 15 17:10 local >>> drwxr-xr-x 2 root wheel 294 Jun 15 17:08 sbin >>> drwxr-xr-x 31 root wheel 31 Jun 15 17:10 share >>> drwxr-xr-x 14 root wheel 17 Jun 15 17:10 tests >>> root@NewFS:/pics/Crochet-work-AMD/obj/_.w # >> I have no idea why this is happening on your system but I'm not >> observing it here: >> >>> ls -al usr >> total 85 >> drwxr-xr-x 9 root wheel9 Jun 15 13:32 . >> drwxr-xr-x 22 root wheel 29 Jun 15 13:32 .. >> drwxr-xr-x 2 root wheel 359 Jun 15 13:32 bin >> drwxr-xr-x 4 root wheel 446 Jun 15 13:32 lib >> drwxr-xr-x 3 root wheel3 Jun 15 13:32 libdata >> drwxr-xr-x 5 root wheel 47 Jun 15 13:32 libexec >> drwxr-xr-x 12 root wheel 13 Jun 15 13:32 local >> drwxr-xr-x 2 root wheel 218 Jun 15 13:32 sbin >> drwxr-xr-x 17 root wheel 17 Jun 15 13:32 share >> >> >> and I get (almost) the same on the installed nanobsd system: >>> ls -al usr >> total 24 >> drwxr-xr-x 9 root wheel512 Jun 15 13:32 . >> drwxr-xr-x 23 root wheel512 Jun 15 13:34 .. >> drwxr-xr-x 2 root wheel 6144 Jun 15 13:32 bin >> drwxr-xr-x 4 root wheel 10752 Jun 15 13:32 lib >> drwxr-xr-x 3 root wheel512 Jun 15 13:32 libdata >> drwxr-xr-x 5 root wheel 1024 Jun 15 13:32 libexec >> drwxr-xr-x 12 root wheel512 Jun 15 13:32 local >> drwxr-xr-x 2 root wheel 4096 Jun 15 13:32 sbin >> drwxr-xr-x 17 root wheel512 Jun 15 13:32 share >> >> The machine I'm building the NanoBSD image on is running head r318959, >> and is running ZFS, while the NanoBSD system I've built is tracking >> 11-STABLE and is at r319971 at present, so a BETA1. >> >> Could you report version information too? maybe it's a problem present >> on head NanoBSD scripts? > FreeBSD 11.0-STABLE #15 r312669M: Mon Jan 23 14:01:03 CST 2017 > k...@newfs.denninger.net:/usr/obj/usr/src/sys/KSD-SMP > > I also build using Crochet against both /usr/src (my "primary" source > repo, which is on the rev noted here) and against a second one (-HEAD), > which I need to use for the RPI3. Neither winds up with this sort of > permission issue. > > The obj directory is on /pics/Crochet-Work-AMD, which is a zfs > filesystem mounted off a "scratch" SSD. > > The problem appears to stem from the creation of "_.w" and since > directory permissions are "normally" inherited it promulgates from there > unless an explicit permission set occurs. Yet I see nothing that would > create the world directory with anything other than the umask at the > time it runs. > > I *am* running this from "batch" -- perhaps that's where the problem is > coming from? I'll try adding a "umask 022" to the nanobsd.sh script at > the top and see what that does. Nope. 
It's something in the installworld subset; I put a stop in after the clean/create world directory and I have a 0755 permission mask on the (empty) directory. Hmmm... I do not know where this is coming from now but this test implies that it's the "installworld" action that causes it. root@NewFS:/pics/Crochet-work-AMD/obj # ls -al total 2176760 drwxr-xr-x 5 root wheel 24 Jun 16 09:41 . drwxr-xr-x 3 root wheel 3 Jun 16 08:25 .. -rw-r--r-- 1 root wheel 7658918 Jun 16 09:22 _.bk -rw-r--r-- 1 root wheel53768368 Jun 16 09:15 _.bw -rw-r--r-- 1 root wheel 200 Jun 16 09:25 _.cust.cust_comconsole -rw-r--r-- 1 root wheel 733 J
Re: Interesting permissions difference on NanoBSD build
On 6/16/2017 07:52, Guido Falsi wrote: > On 06/16/17 14:25, Karl Denninger wrote: >> I've recently started playing with the "base" NanoBSD scripts and have >> run into an interesting issue. > [...] >> Note the missing "r" bit for "other" in usr and etc directories -- and >> the missing "x" bit (at minimum) for the root! The same is carried down >> to "local" under usr: >> >> root@NewFS:/pics/Crochet-work-AMD/obj/_.w # ls -al usr >> total 134 >> drwxr-x--x 12 root wheel 12 Jun 15 17:10 . >> drwxr-x--- 18 root wheel 24 Jun 15 17:10 .. >> drwxr-xr-x 2 root wheel 497 Jun 15 17:09 bin >> drwxr-xr-x 52 root wheel 327 Jun 15 17:10 include >> drwxr-xr-x 8 root wheel 655 Jun 15 17:10 lib >> drwxr-xr-x 4 root wheel 670 Jun 15 17:09 lib32 >> drwxr-xr-x 5 root wheel5 Jun 15 17:10 libdata >> drwxr-xr-x 7 root wheel 70 Jun 15 17:10 libexec >> drwxr-x--x 10 root wheel 11 Jun 15 17:10 local >> drwxr-xr-x 2 root wheel 294 Jun 15 17:08 sbin >> drwxr-xr-x 31 root wheel 31 Jun 15 17:10 share >> drwxr-xr-x 14 root wheel 17 Jun 15 17:10 tests >> root@NewFS:/pics/Crochet-work-AMD/obj/_.w # > I have no idea why this is happening on your system but I'm not > observing it here: > >> ls -al usr > total 85 > drwxr-xr-x 9 root wheel9 Jun 15 13:32 . > drwxr-xr-x 22 root wheel 29 Jun 15 13:32 .. > drwxr-xr-x 2 root wheel 359 Jun 15 13:32 bin > drwxr-xr-x 4 root wheel 446 Jun 15 13:32 lib > drwxr-xr-x 3 root wheel3 Jun 15 13:32 libdata > drwxr-xr-x 5 root wheel 47 Jun 15 13:32 libexec > drwxr-xr-x 12 root wheel 13 Jun 15 13:32 local > drwxr-xr-x 2 root wheel 218 Jun 15 13:32 sbin > drwxr-xr-x 17 root wheel 17 Jun 15 13:32 share > > > and I get (almost) the same on the installed nanobsd system: >> ls -al usr > total 24 > drwxr-xr-x 9 root wheel512 Jun 15 13:32 . > drwxr-xr-x 23 root wheel512 Jun 15 13:34 .. > drwxr-xr-x 2 root wheel 6144 Jun 15 13:32 bin > drwxr-xr-x 4 root wheel 10752 Jun 15 13:32 lib > drwxr-xr-x 3 root wheel512 Jun 15 13:32 libdata > drwxr-xr-x 5 root wheel 1024 Jun 15 13:32 libexec > drwxr-xr-x 12 root wheel512 Jun 15 13:32 local > drwxr-xr-x 2 root wheel 4096 Jun 15 13:32 sbin > drwxr-xr-x 17 root wheel512 Jun 15 13:32 share > > The machine I'm building the NanoBSD image on is running head r318959, > and is running ZFS, while the NanoBSD system I've built is tracking > 11-STABLE and is at r319971 at present, so a BETA1. > > Could you report version information too? maybe it's a problem present > on head NanoBSD scripts? FreeBSD 11.0-STABLE #15 r312669M: Mon Jan 23 14:01:03 CST 2017 k...@newfs.denninger.net:/usr/obj/usr/src/sys/KSD-SMP I also build using Crochet against both /usr/src (my "primary" source repo, which is on the rev noted here) and against a second one (-HEAD), which I need to use for the RPI3. Neither winds up with this sort of permission issue. The obj directory is on /pics/Crochet-Work-AMD, which is a zfs filesystem mounted off a "scratch" SSD. The problem appears to stem from the creation of "_.w" and since directory permissions are "normally" inherited it promulgates from there unless an explicit permission set occurs. Yet I see nothing that would create the world directory with anything other than the umask at the time it runs. I *am* running this from "batch" -- perhaps that's where the problem is coming from? I'll try adding a "umask 022" to the nanobsd.sh script at the top and see what that does. -- Karl Denninger k...@denninger.net <mailto:k...@denninger.net> /The Market Ticker/ /[S/MIME encrypted email preferred]/ smime.p7s Description: S/MIME Cryptographic Signature
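(Since the umask experiment mentioned above didn't pan out -- the follow-ups further up the thread point at installworld itself -- one blunt workaround would be a customize function in the nanobsd configuration that forces the expected modes. A hypothetical sketch; cust_fix_perms is my own name, not part of nanobsd:

cust_fix_perms () (
        chmod 755 ${NANO_WORLDDIR} ${NANO_WORLDDIR}/etc ${NANO_WORLDDIR}/usr ${NANO_WORLDDIR}/usr/local
)
customize_cmd cust_fix_perms

That only papers over the symptom, of course; the interesting question remains why installworld produces 0750/0751 directories in this environment.)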
Interesting permissions difference on NanoBSD build
I've recently started playing with the "base" NanoBSD scripts and have run into an interesting issue. Specifically, this is what winds up in the "_.w" (world) directory base when the build completes: root@NewFS:/pics/Crochet-work-AMD/obj/_.w # ls -al total 112 drwxr-x--- 18 root wheel24 Jun 15 17:10 . drwxr-xr-x 5 root wheel24 Jun 15 17:11 .. -rw-r--r-- 2 root wheel 955 Jun 15 17:09 .cshrc -rw-r--r-- 2 root wheel 247 Jun 15 17:09 .profile -r--r--r-- 1 root wheel 6197 Jun 15 17:09 COPYRIGHT drwxr-xr-x 2 root wheel47 Jun 15 17:08 bin drwxr-xr-x 8 root wheel51 Jun 15 17:09 boot -rw-r--r-- 1 root wheel12 Jun 15 17:09 boot.config drwxr-xr-x 2 root wheel 2 Jun 15 17:09 cfg drwxr-xr-x 4 root wheel 4 Jun 15 17:10 conf dr-xr-xr-x 2 root wheel 3 Jun 15 17:09 dev drwxr-x--x 28 root wheel 110 Jun 15 17:10 etc drwxr-xr-x 4 root wheel56 Jun 15 17:08 lib drwxr-xr-x 3 root wheel 5 Jun 15 17:09 libexec drwxr-xr-x 2 root wheel 2 Jun 15 17:07 media drwxr-xr-x 2 root wheel 2 Jun 15 17:07 mnt dr-xr-xr-x 2 root wheel 2 Jun 15 17:07 proc drwxr-xr-x 2 root wheel 146 Jun 15 17:08 rescue drwxr-xr-x 2 root wheel12 Jun 15 17:10 root drwxr-xr-x 2 root wheel 137 Jun 15 17:08 sbin lrwxr-xr-x 1 root wheel11 Jun 15 17:07 sys -> usr/src/sys lrwxr-xr-x 1 root wheel 7 Jun 15 17:10 tmp -> var/tmp drwxr-x--x 12 root wheel12 Jun 15 17:10 usr drwxr-xr-x 25 root wheel25 Jun 15 17:10 var root@NewFS:/pics/Crochet-work-AMD/obj/_.w # Note the missing "r" bit for "other" in usr and etc directories -- and the missing "x" bit (at minimum) for the root! The same is carried down to "local" under usr: root@NewFS:/pics/Crochet-work-AMD/obj/_.w # ls -al usr total 134 drwxr-x--x 12 root wheel 12 Jun 15 17:10 . drwxr-x--- 18 root wheel 24 Jun 15 17:10 .. drwxr-xr-x 2 root wheel 497 Jun 15 17:09 bin drwxr-xr-x 52 root wheel 327 Jun 15 17:10 include drwxr-xr-x 8 root wheel 655 Jun 15 17:10 lib drwxr-xr-x 4 root wheel 670 Jun 15 17:09 lib32 drwxr-xr-x 5 root wheel5 Jun 15 17:10 libdata drwxr-xr-x 7 root wheel 70 Jun 15 17:10 libexec drwxr-x--x 10 root wheel 11 Jun 15 17:10 local drwxr-xr-x 2 root wheel 294 Jun 15 17:08 sbin drwxr-xr-x 31 root wheel 31 Jun 15 17:10 share drwxr-xr-x 14 root wheel 17 Jun 15 17:10 tests root@NewFS:/pics/Crochet-work-AMD/obj/_.w # I do not know if this is intentional, but it certainly was not expected. It does carry through to the disk image that is created as well and then there's this, which if you mount the image leads me to wonder what's going on: root@NewFS:/pics/Crochet-work-AMD/obj # mount -o ro /dev/md0s1a /mnt root@NewFS:/pics/Crochet-work-AMD/obj # cd /mnt root@NewFS:/mnt # ls -al total 34 drwxr-x--- 19 root wheel 512 Jun 15 17:10 . drwxr-xr-x 45 root wheel 55 Jun 1 10:58 .. 
-rw-r--r-- 2 root wheel 955 Jun 15 17:09 .cshrc -rw-r--r-- 2 root wheel 247 Jun 15 17:09 .profile drwxrwxr-x 2 root operator 512 Jun 15 17:10 .snap -r--r--r-- 1 root wheel 6197 Jun 15 17:09 COPYRIGHT drwxr-xr-x 2 root wheel 1024 Jun 15 17:08 bin drwxr-xr-x 8 root wheel 1024 Jun 15 17:09 boot -rw-r--r-- 1 root wheel 12 Jun 15 17:09 boot.config drwxr-xr-x 2 root wheel 512 Jun 15 17:09 cfg drwxr-xr-x 4 root wheel 512 Jun 15 17:10 conf dr-xr-xr-x 2 root wheel 512 Jun 15 17:09 dev drwxr-x--x 28 root wheel 2048 Jun 15 17:10 etc drwxr-xr-x 4 root wheel 1536 Jun 15 17:08 lib drwxr-xr-x 3 root wheel 512 Jun 15 17:09 libexec drwxr-xr-x 2 root wheel 512 Jun 15 17:07 media drwxr-xr-x 2 root wheel 512 Jun 15 17:07 mnt dr-xr-xr-x 2 root wheel 512 Jun 15 17:07 proc drwxr-xr-x 2 root wheel 2560 Jun 15 17:08 rescue drwxr-xr-x 2 root wheel 512 Jun 15 17:10 root drwxr-xr-x 2 root wheel 2560 Jun 15 17:08 sbin lrwxr-xr-x 1 root wheel 11 Jun 15 17:07 sys -> usr/src/sys lrwxr-xr-x 1 root wheel7 Jun 15 17:10 tmp -> var/tmp drwxr-x--x 12 root wheel 512 Jun 15 17:10 usr drwxr-xr-x 25 root wheel 512 Jun 15 17:10 var Note the permissions at the root -- that denies *search* for others it is an exact copy of the "_.w" permission list of course, but if you create a non-root user as a part of the NanoBSD build you wind up with some "interesting" behavior when that user logs in! I'm assuming this is unintentional but wondering where it comes from (and whether it needs / should be fixed); it's easy to fix it, of course, once the embedded system boots but you need to (obviously) mount read/write long enough to update it -- Karl Denninger k...@denninger.net <mailto:k...@denninger.net> /The Market Ticker/ /[S/MIME encrypted email preferred]/ smime.p7s Description: S/MIME Cryptographic Signature
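(For an image that has already been built, the fix the last paragraph alludes to might look like the following -- a sketch reusing the md0s1a device from the listing above:

# mount /dev/md0s1a /mnt
# chmod 755 /mnt /mnt/etc /mnt/usr /mnt/usr/local
# umount /mnt

or the equivalent chmod on the running system's root filesystem after remounting it read/write.)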
Re: FreeBSD 10.2-RELEASE #0 r286666: Panic and crash
On 2/6/2017 15:01, Shawn Bakhtiar wrote: > Hi all! > > http://pastebin.com/niXrjF0D > > Please refer to full output from crash above. > > This morning our IMAP server decided to go belly up. I could not remote in, > and the machine would not respond to any pings. > > Checking the physical console I had the following worrisome messages on > screen: > > • g_vfs_done():da1p1[READ(offset=7265561772032, length=32768)]error = 5 > • g_vfs_done():da1p1[WRITE(offset=7267957735424, length=131072)]error = 16 > • /mnt/USBBD: got error 16 while accessing filesystem > • panic: softdep_deallocate_dependencies: unrecovered I/O error > • cpuid = 5 > > /mnt/USBDB is a MyBook USB 8TB drive that we use for daily backups of the > IMAP data using rsync. Everything so far has worked without issue. > > I also noticed a bunch of: > > • fstat: can't read file 2 at 0x41f > • fstat: can't read file 4 at 0x78 > • fstat: can't read file 5 at 0x6 > • fstat: can't read file 1 at 0x27f > • fstat: can't read file 2 at 0x41f > • fstat: can't read file 4 at 0x78 > • fstat: can't read file 5 at 0x6 > > > but I have no idea what these are from. > > df -h output: > /dev/da0p21.8T226G1.5T13%/ > devfs 1.0K1.0K 0B 100%/dev > /dev/da1p17.0T251G6.2T 4%/mnt/USBBD > > > da0p2 is a RAID level 5 on an HP Smart Array > > Here is the output of dmsg after reboot: > http://pastebin.com/rHVjgZ82 > > Obviously both the RAID and USB drive did not walk away from the crash > cleaning. Should I be running a fsck at this point on both from single user > mode to verify and clean up. My concern is the: > WARNING: /: mount pending error: blocks 0 files 26 > when mounting /dev/da0p2 > > For some reason I was under the impression that fsck was run automatically on > reboot. > > Any help in this matter would be greatly appreciated. I'm a little concerned > that a backup strategy that has worked for us for many MANY years would so > easily throw the OS into panic. If an I/O error occurred on the USB Drive I > would frankly think it should just back out, without panic. Or am I missing > something? > > Any recommendations / insights would be most welcome. > Shawn > > The "mount pending error" is normal on a disk that has softupdates turned on; fsck runs in the background after the boot, and this is "safe" because of how the metadata and data writes are ordered. In other words the filesystem in this situation is missing uncommitted data, but the state of the system is consistent. As a result the system can mount root read-write without having to fsck it first and the background cleanup is safe from a disk consistency problem. The panic itself appears to have resulted from an I/O error that resulted in a failed operation. I was part of a thread in 2016 on this you can find here: https://lists.freebsd.org/pipermail/freebsd-stable/2016-July/084944.html The basic problem is that the softupdates code cannot deal with a hard I/O error on write because it no longer can guarantee filesystem integrity if it continues. I argued in that thread that the superior solution would be forcibly detach the volume, which would leave you with a "dirty" filesystem and a failed operation but not a panic. The file(s) involved in the write error might be lost, but the integrity of the filesystem is recoverable (as it is in the panic case) -- at least it is if the fsck doesn't require writing to a block that *also* errors out. The decision in the code is to panic rather than detach the volume, however, so panic it is. 
This one has bitten me with SD cards in small embedded-style machines (where turning off softupdates makes things VERY slow), and at some point I may look into developing a patch to forcibly detach the volume instead. That obviously won't help you if the system volume is the one the error happens on (now you've just forcibly detached the root filesystem, which is going to get you an immediate panic anyway), but in the event of a data disk it would prevent the system from crashing. -- Karl Denninger k...@denninger.net <mailto:k...@denninger.net> /The Market Ticker/ /[S/MIME encrypted email preferred]/ smime.p7s Description: S/MIME Cryptographic Signature
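(As to the immediate "should I run fsck" question, a conservative sequence for the backup volume might be -- a sketch, using the da1p1 device from the report above:

# umount /mnt/USBBD              (if it is still mounted)
# fsck_ufs -y /dev/da1p1
# tunefs -p /dev/da1p1           (prints, among other things, whether soft updates are enabled)

The background fsck that runs after boot covers the "mount pending error" on the root filesystem; a foreground fsck of da0p2 from single-user mode should only be necessary if that background pass reports problems it cannot fix.)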
Re: Ugh -- attempted to update this morning, and got a nasty panic in ZFS....
A second attempt to come up on the new kernel was successful -- so this had to be due to queued I/Os that were pending at the time of the shutdown On 1/11/2017 08:31, Karl Denninger wrote: > During the reboot, immediately after the daemons started up on the > machine (the boot got beyond mounting all the disks and was well into > starting up all the background stuff it runs), I got a double-fault. > > . (there were a LOT more of this same; it pretty clearly was a > recursive call sequence that ran the system out of stack space) > > #294 0x822fdcfd in zio_execute (zio=) > at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:1666 > #295 0x8230130e in zio_vdev_io_start (zio=0xf8010c8f27b0) > at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:3127 > #296 0x822fdcfd in zio_execute (zio=) > at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:1666 > #297 0x822e464d in vdev_queue_io_done (zio=) > at > /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_queue.c:913 > #298 0x823014c9 in zio_vdev_io_done (zio=0xf8010cff0b88) > at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:3152 > #299 0x822fdcfd in zio_execute (zio=) > at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:1666 > #300 0x8230130e in zio_vdev_io_start (zio=0xf8010cff0b88) > at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:3127 > #301 0x822fdcfd in zio_execute (zio=) > at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:1666 > #302 0x822e464d in vdev_queue_io_done (zio=) > at > /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_queue.c:913 > #303 0x823014c9 in zio_vdev_io_done (zio=0xf8010c962000) > at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:3152 > #304 0x822fdcfd in zio_execute (zio=) > at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:1666 > #305 0x8230130e in zio_vdev_io_start (zio=0xf8010c962000) > at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:3127 > #306 0x822fdcfd in zio_execute (zio=) > at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:1666 > #307 0x822e464d in vdev_queue_io_done (zio=) > at > /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_queue.c:913 > #308 0x823014c9 in zio_vdev_io_done (zio=0xf80102175000) > at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:3152 > #309 0x822fdcfd in zio_execute (zio=) > at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:1666 > #310 0x80b2585a in taskqueue_run_locked (queue= out>) > at /usr/src/sys/kern/subr_taskqueue.c:454 > #311 0x80b26a48 in taskqueue_thread_loop (arg=) > at /usr/src/sys/kern/subr_taskqueue.c:724 > #312 0x80a7eb05 in fork_exit ( > callout=0x80b26960 , > arg=0xf800b8824c30, frame=0xfe0667430c00) > at /usr/src/sys/kern/kern_fork.c:1040 > #313 0x80f87c3e in fork_trampoline () > at /usr/src/sys/amd64/amd64/exception.S:611 > #314 0x in ?? () > Current language: auto; currently minimal > (kgdb) > > . > > > NewFS.denninger.net dumped core - see /var/crash/vmcore.3 > > Wed Jan 11 08:15:33 CST 2017 > > FreeBSD NewFS.denninger.net 11.0-STABLE FreeBSD 11.0-STABLE #14 > r311927M: Wed Ja > n 11 07:55:20 CST 2017 > k...@newfs.denninger.net:/usr/obj/usr/src/sys/KSD-SMP > amd64 > > panic: double fault > > GNU gdb 6.1.1 [FreeBSD] > Copyright 2004 Free Software Foundation, Inc. > GDB is free software, covered by the GNU General Public License, and you are > welcome to change it and/or distribute copies of it under certain > conditions. 
> Type "show copying" to see the conditions. > There is absolutely no warranty for GDB. Type "show warranty" for details. > This GDB was configured as "amd64-marcel-freebsd"... > > Unread portion of the kernel message buffer: > > Fatal double fault > rip = 0x822e3c5d > rsp = 0xfe066742af90 > rbp = 0xfe066742b420 > cpuid = 15; apic id = 35 > panic: double fault > cpuid = 15 > KDB: stack backtrace: > db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame > 0xfe0649ddee30 > vpanic() at vpanic+0x186/frame 0xfe0649ddeeb0 > panic() at panic+0x43/frame 0xfe0649ddef10 > dblfault_handler() at dblfault_handler+0xa2/frame 0xfe0649ddef30 > Xdblfault() at Xdblfault+0xac/frame 0xfe0649ddef30 > --- trap 0x17, rip = 0x822e3c5d, rsp = 0xfe066742af90, rbp = > 0xf > e0
Ugh -- attempted to update this morning, and got a nasty panic in ZFS....
blem -- and setting stackpages didn't help! I've got the dump if anything in particular would be of help. The prompt to do this in the first place was the openssh CVE that was recently issued. -- Karl Denninger k...@denninger.net <mailto:k...@denninger.net> /The Market Ticker/ /[S/MIME encrypted email preferred]/ smime.p7s Description: S/MIME Cryptographic Signature
Re: Repeatable panic on ZFS filesystem (used for backups); 11.0-STABLE
On 10/17/2016 18:32, Steven Hartland wrote: > > On 17/10/2016 22:50, Karl Denninger wrote: >> I will make some effort on the sandbox machine to see if I can come up >> with a way to replicate this. I do have plenty of spare larger drives >> laying around that used to be in service and were obsolesced due to >> capacity -- but what I don't know if whether the system will misbehave >> if the source is all spinning rust. >> >> In other words: >> >> 1. Root filesystem is mirrored spinning rust (production is mirrored >> SSDs) >> >> 2. Backup is mirrored spinning rust (of approx the same size) >> >> 3. Set up auto-snapshot exactly as the production system has now (which >> the sandbox is NOT since I don't care about incremental recovery on that >> machine; it's a sandbox!) >> >> 4. Run a bunch of build-somethings (e.g. buildworlds, cross-build for >> the Pi2s I have here, etc) to generate a LOT of filesystem entropy >> across lots of snapshots. >> >> 5. Back that up. >> >> 6. Export the backup pool. >> >> 7. Re-import it and "zfs destroy -r" the backup filesystem. >> >> That is what got me in a reboot loop after the *first* panic; I was >> simply going to destroy the backup filesystem and re-run the backup, but >> as soon as I issued that zfs destroy the machine panic'd and as soon as >> I re-attached it after a reboot it panic'd again. Repeat until I set >> trim=0. >> >> But... if I CAN replicate it that still shouldn't be happening, and the >> system should *certainly* survive attempting to TRIM on a vdev that >> doesn't support TRIMs, even if the removal is for a large amount of >> space and/or files on the target, without blowing up. >> >> BTW I bet it isn't that rare -- if you're taking timed snapshots on an >> active filesystem (with lots of entropy) and then make the mistake of >> trying to remove those snapshots (as is the case with a zfs destroy -r >> or a zfs recv of an incremental copy that attempts to sync against a >> source) on a pool that has been imported before the system realizes that >> TRIM is unavailable on those vdevs. >> >> Noting this: >> >> Yes need to find some time to have a look at it, but given how rare >> this is and with TRIM being re-implemented upstream in a totally >> different manor I'm reticent to spend any real time on it. >> >> What's in-process in this regard, if you happen to have a reference? > Looks like it may be still in review: https://reviews.csiden.org/r/263/ > > Having increased the kernel stack page count I have not had another instance of this in the last couple of weeks+, and I am running daily backup jobs as usual... So this *does not* appear to be an infinite recursion problem... -- Karl Denninger k...@denninger.net <mailto:k...@denninger.net> /The Market Ticker/ /[S/MIME encrypted email preferred]/ smime.p7s Description: S/MIME Cryptographic Signature
Re: Repeatable panic on ZFS filesystem (used for backups); 11.0-STABLE
On 10/17/2016 18:32, Steven Hartland wrote: > > > On 17/10/2016 22:50, Karl Denninger wrote: >> I will make some effort on the sandbox machine to see if I can come up >> with a way to replicate this. I do have plenty of spare larger drives >> laying around that used to be in service and were obsolesced due to >> capacity -- but what I don't know if whether the system will misbehave >> if the source is all spinning rust. >> >> In other words: >> >> 1. Root filesystem is mirrored spinning rust (production is mirrored >> SSDs) >> >> 2. Backup is mirrored spinning rust (of approx the same size) >> >> 3. Set up auto-snapshot exactly as the production system has now (which >> the sandbox is NOT since I don't care about incremental recovery on that >> machine; it's a sandbox!) >> >> 4. Run a bunch of build-somethings (e.g. buildworlds, cross-build for >> the Pi2s I have here, etc) to generate a LOT of filesystem entropy >> across lots of snapshots. >> >> 5. Back that up. >> >> 6. Export the backup pool. >> >> 7. Re-import it and "zfs destroy -r" the backup filesystem. >> >> That is what got me in a reboot loop after the *first* panic; I was >> simply going to destroy the backup filesystem and re-run the backup, but >> as soon as I issued that zfs destroy the machine panic'd and as soon as >> I re-attached it after a reboot it panic'd again. Repeat until I set >> trim=0. >> >> But... if I CAN replicate it that still shouldn't be happening, and the >> system should *certainly* survive attempting to TRIM on a vdev that >> doesn't support TRIMs, even if the removal is for a large amount of >> space and/or files on the target, without blowing up. >> >> BTW I bet it isn't that rare -- if you're taking timed snapshots on an >> active filesystem (with lots of entropy) and then make the mistake of >> trying to remove those snapshots (as is the case with a zfs destroy -r >> or a zfs recv of an incremental copy that attempts to sync against a >> source) on a pool that has been imported before the system realizes that >> TRIM is unavailable on those vdevs. >> >> Noting this: >> >> Yes need to find some time to have a look at it, but given how rare >> this is and with TRIM being re-implemented upstream in a totally >> different manor I'm reticent to spend any real time on it. >> >> What's in-process in this regard, if you happen to have a reference? > Looks like it may be still in review: https://reviews.csiden.org/r/263/ > > Initial attempts to provoke the panic has failed on the sandbox machine -- it appears that I need a materially-fragmented backup volume (which makes sense, as that would greatly increase the number of TRIM's queued.) Running a bunch of builds with snapshots taken between generates a metric ton of entropy in the filesystem, but it appears that the number of TRIMs actually issued when you bulk-remove them (with zfs destroy -r) is small enough to not cause it -- probably because the system issues one per area of freed disk, and since there is no interleaving with other (non-removed) data that number is "reasonable" since there's little fragmentation of that free space. The TRIMs *are* attempted, and they *do* fail, however. I'm running with the 6 pages of kstack now on the production machine, and we'll see if I get another panic... -- Karl Denninger k...@denninger.net <mailto:k...@denninger.net> /The Market Ticker/ /[S/MIME encrypted email preferred]/ smime.p7s Description: S/MIME Cryptographic Signature
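For reference, the "6 pages of kstack" mentioned above is a single boot-time tunable; a minimal sketch of the /boot/loader.conf entry plus a post-reboot check follows, with the value 6 taken from the suggestion earlier in this thread rather than from any official guidance:
# /boot/loader.conf -- raise the per-thread kernel stack from the amd64
# default of 4 pages to 6; the tunable is only read at boot, so a reboot
# is required for it to take effect
kern.kstack_pages="6"

# after rebooting, confirm the running value
sysctl kern.kstack_pages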
Re: Repeatable panic on ZFS filesystem (used for backups); 11.0-STABLE
I will make some effort on the sandbox machine to see if I can come up with a way to replicate this. I do have plenty of spare larger drives laying around that used to be in service and were obsolesced due to capacity -- but what I don't know if whether the system will misbehave if the source is all spinning rust. In other words: 1. Root filesystem is mirrored spinning rust (production is mirrored SSDs) 2. Backup is mirrored spinning rust (of approx the same size) 3. Set up auto-snapshot exactly as the production system has now (which the sandbox is NOT since I don't care about incremental recovery on that machine; it's a sandbox!) 4. Run a bunch of build-somethings (e.g. buildworlds, cross-build for the Pi2s I have here, etc) to generate a LOT of filesystem entropy across lots of snapshots. 5. Back that up. 6. Export the backup pool. 7. Re-import it and "zfs destroy -r" the backup filesystem. That is what got me in a reboot loop after the *first* panic; I was simply going to destroy the backup filesystem and re-run the backup, but as soon as I issued that zfs destroy the machine panic'd and as soon as I re-attached it after a reboot it panic'd again. Repeat until I set trim=0. But... if I CAN replicate it that still shouldn't be happening, and the system should *certainly* survive attempting to TRIM on a vdev that doesn't support TRIMs, even if the removal is for a large amount of space and/or files on the target, without blowing up. BTW I bet it isn't that rare -- if you're taking timed snapshots on an active filesystem (with lots of entropy) and then make the mistake of trying to remove those snapshots (as is the case with a zfs destroy -r or a zfs recv of an incremental copy that attempts to sync against a source) on a pool that has been imported before the system realizes that TRIM is unavailable on those vdevs. Noting this: Yes need to find some time to have a look at it, but given how rare this is and with TRIM being re-implemented upstream in a totally different manor I'm reticent to spend any real time on it. What's in-process in this regard, if you happen to have a reference? On 10/17/2016 16:40, Steven Hartland wrote: > Setting those values will only effect what's queued to the device not > what's actually outstanding. > > On 17/10/2016 21:22, Karl Denninger wrote: >> Since I cleared it (by setting TRIM off on the test machine, rebooting, >> importing the pool and noting that it did not panic -- pulled drives, >> re-inserted into the production machine and ran backup routine -- all >> was normal) it may be a while before I see it again (a week or so is >> usual.) >> >> It appears to be related to entropy in the filesystem that comes up as >> "eligible" to be removed from the backup volume, which (not >> surprisingly) tends to happen a few days after I do a new world build or >> something similar (the daily and/or periodic snapshots roll off at about >> that point.) >> >> I don't happen to have a spare pair of high-performance SSDs I can stick >> in the sandbox machine in an attempt to force the condition to assert >> itself in test, unfortunately. >> >> I *am* concerned that it's not "simple" stack exhaustion because setting >> the max outstanding TRIMs on a per-vdev basis down quite-dramatically >> did *not* prevent it from happening -- and if it was simply stack depth >> related I would have expected that to put a stop to it. >> >> On 10/17/2016 15:16, Steven Hartland wrote: >>> Be good to confirm its not an infinite loop by giving it a good bump >>> first. 
>>> >>> On 17/10/2016 19:58, Karl Denninger wrote: >>>> I can certainly attempt setting that higher but is that not just >>>> hiding the problem rather than addressing it? >>>> >>>> >>>> On 10/17/2016 13:54, Steven Hartland wrote: >>>>> You're hitting stack exhaustion, have you tried increasing the kernel >>>>> stack pages? >>>>> It can be changed from /boot/loader.conf >>>>> kern.kstack_pages="6" >>>>> >>>>> Default on amd64 is 4 IIRC >>>>> >>>>> On 17/10/2016 19:08, Karl Denninger wrote: >>>>>> The target (and devices that trigger this) are a pair of 4Gb 7200RPM >>>>>> SATA rotating rust drives (zmirror) with each provider >>>>>> geli-encrypted >>>>>> (that is, the actual devices used for the pool create are the >>>>>> .eli's) >>>>>> >>>>>> The machine generating the problem has both rotating rust device
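As a concrete sketch of steps 5 through 7 of the replication recipe above (the pool name "backup" matches the rest of this thread; the dataset name is purely illustrative):
# step 6: after the backup completes, export the backup pool
zpool export backup
# step 7: re-import it and bulk-remove the backed-up filesystem;
# on an affected system this destroy is what triggers the panic
zpool import backup
zfs destroy -r backup/some-dataset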
Re: Repeatable panic on ZFS filesystem (used for backups); 11.0-STABLE
Since I cleared it (by setting TRIM off on the test machine, rebooting, importing the pool and noting that it did not panic -- pulled drives, re-inserted into the production machine and ran backup routine -- all was normal) it may be a while before I see it again (a week or so is usual.) It appears to be related to entropy in the filesystem that comes up as "eligible" to be removed from the backup volume, which (not surprisingly) tends to happen a few days after I do a new world build or something similar (the daily and/or periodic snapshots roll off at about that point.) I don't happen to have a spare pair of high-performance SSDs I can stick in the sandbox machine in an attempt to force the condition to assert itself in test, unfortunately. I *am* concerned that it's not "simple" stack exhaustion because setting the max outstanding TRIMs on a per-vdev basis down quite-dramatically did *not* prevent it from happening -- and if it was simply stack depth related I would have expected that to put a stop to it. On 10/17/2016 15:16, Steven Hartland wrote: > Be good to confirm its not an infinite loop by giving it a good bump > first. > > On 17/10/2016 19:58, Karl Denninger wrote: >> I can certainly attempt setting that higher but is that not just >> hiding the problem rather than addressing it? >> >> >> On 10/17/2016 13:54, Steven Hartland wrote: >>> You're hitting stack exhaustion, have you tried increasing the kernel >>> stack pages? >>> It can be changed from /boot/loader.conf >>> kern.kstack_pages="6" >>> >>> Default on amd64 is 4 IIRC >>> >>> On 17/10/2016 19:08, Karl Denninger wrote: >>>> The target (and devices that trigger this) are a pair of 4Gb 7200RPM >>>> SATA rotating rust drives (zmirror) with each provider geli-encrypted >>>> (that is, the actual devices used for the pool create are the .eli's) >>>> >>>> The machine generating the problem has both rotating rust devices >>>> *and* >>>> SSDs, so I can't simply shut TRIM off system-wide and call it a day as >>>> TRIM itself is heavily-used; both the boot/root pools and a Postgresql >>>> database pool are on SSDs, while several terabytes of lesser-used data >>>> is on a pool of Raidz2 that is made up of spinning rust. >>> snip... >>>> NewFS.denninger.net dumped core - see /var/crash/vmcore.1 >>>> >>>> Mon Oct 17 09:02:33 CDT 2016 >>>> >>>> FreeBSD NewFS.denninger.net 11.0-STABLE FreeBSD 11.0-STABLE #13 >>>> r307318M: Fri Oct 14 09:23:46 CDT 2016 >>>> k...@newfs.denninger.net:/usr/obj/usr/src/sys/KSD-SMP amd64 >>>> >>>> panic: double fault >>>> >>>> GNU gdb 6.1.1 [FreeBSD] >>>> Copyright 2004 Free Software Foundation, Inc. >>>> GDB is free software, covered by the GNU General Public License, and >>>> you are >>>> welcome to change it and/or distribute copies of it under certain >>>> conditions. >>>> Type "show copying" to see the conditions. >>>> There is absolutely no warranty for GDB. Type "show warranty" for >>>> details. >>>> This GDB was configured as "amd64-marcel-freebsd"... 
>>>> >>>> Unread portion of the kernel message buffer: >>>> >>>> Fatal double fault >>>> rip = 0x8220d9ec >>>> rsp = 0xfe066821f000 >>>> rbp = 0xfe066821f020 >>>> cpuid = 6; apic id = 14 >>>> panic: double fault >>>> cpuid = 6 >>>> KDB: stack backtrace: >>>> db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame >>>> 0xfe0649d78e30 >>>> vpanic() at vpanic+0x182/frame 0xfe0649d78eb0 >>>> panic() at panic+0x43/frame 0xfe0649d78f10 >>>> dblfault_handler() at dblfault_handler+0xa2/frame 0xfe0649d78f30 >>>> Xdblfault() at Xdblfault+0xac/frame 0xfe0649d78f30 >>>> --- trap 0x17, rip = 0x8220d9ec, rsp = 0xfe066821f000, >>>> rbp = >>>> 0xfe066821f020 --- >>>> avl_rotation() at avl_rotation+0xc/frame 0xfe066821f020 >>>> avl_remove() at avl_remove+0x1c8/frame 0xfe066821f070 >>>> vdev_queue_io_to_issue() at vdev_queue_io_to_issue+0x87f/frame >>>> 0xfe066821f530 >>>> vdev_queue_io_done() at vdev_queue_io_done+0x83/frame >>>> 0xfe066821f570 >>>> zio_vdev_io_done() at zio_vdev_io_done+0xd9/frame 0xfe066821f5a0 >>>&g
Re: Repeatable panic on ZFS filesystem (used for backups); 11.0-STABLE
I can certainly attempt setting that higher but is that not just hiding the problem rather than addressing it? On 10/17/2016 13:54, Steven Hartland wrote: > You're hitting stack exhaustion, have you tried increasing the kernel > stack pages? > It can be changed from /boot/loader.conf > kern.kstack_pages="6" > > Default on amd64 is 4 IIRC > > On 17/10/2016 19:08, Karl Denninger wrote: >> The target (and devices that trigger this) are a pair of 4Gb 7200RPM >> SATA rotating rust drives (zmirror) with each provider geli-encrypted >> (that is, the actual devices used for the pool create are the .eli's) >> >> The machine generating the problem has both rotating rust devices *and* >> SSDs, so I can't simply shut TRIM off system-wide and call it a day as >> TRIM itself is heavily-used; both the boot/root pools and a Postgresql >> database pool are on SSDs, while several terabytes of lesser-used data >> is on a pool of Raidz2 that is made up of spinning rust. > snip... >> >> NewFS.denninger.net dumped core - see /var/crash/vmcore.1 >> >> Mon Oct 17 09:02:33 CDT 2016 >> >> FreeBSD NewFS.denninger.net 11.0-STABLE FreeBSD 11.0-STABLE #13 >> r307318M: Fri Oct 14 09:23:46 CDT 2016 >> k...@newfs.denninger.net:/usr/obj/usr/src/sys/KSD-SMP amd64 >> >> panic: double fault >> >> GNU gdb 6.1.1 [FreeBSD] >> Copyright 2004 Free Software Foundation, Inc. >> GDB is free software, covered by the GNU General Public License, and >> you are >> welcome to change it and/or distribute copies of it under certain >> conditions. >> Type "show copying" to see the conditions. >> There is absolutely no warranty for GDB. Type "show warranty" for >> details. >> This GDB was configured as "amd64-marcel-freebsd"... >> >> Unread portion of the kernel message buffer: >> >> Fatal double fault >> rip = 0x8220d9ec >> rsp = 0xfe066821f000 >> rbp = 0xfe066821f020 >> cpuid = 6; apic id = 14 >> panic: double fault >> cpuid = 6 >> KDB: stack backtrace: >> db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame >> 0xfe0649d78e30 >> vpanic() at vpanic+0x182/frame 0xfe0649d78eb0 >> panic() at panic+0x43/frame 0xfe0649d78f10 >> dblfault_handler() at dblfault_handler+0xa2/frame 0xfe0649d78f30 >> Xdblfault() at Xdblfault+0xac/frame 0xfe0649d78f30 >> --- trap 0x17, rip = 0x8220d9ec, rsp = 0xfe066821f000, rbp = >> 0xfe066821f020 --- >> avl_rotation() at avl_rotation+0xc/frame 0xfe066821f020 >> avl_remove() at avl_remove+0x1c8/frame 0xfe066821f070 >> vdev_queue_io_to_issue() at vdev_queue_io_to_issue+0x87f/frame >> 0xfe066821f530 >> vdev_queue_io_done() at vdev_queue_io_done+0x83/frame 0xfe066821f570 >> zio_vdev_io_done() at zio_vdev_io_done+0xd9/frame 0xfe066821f5a0 >> zio_execute() at zio_execute+0x23d/frame 0xfe066821f5f0 >> zio_vdev_io_start() at zio_vdev_io_start+0x34d/frame 0xfe066821f650 >> zio_execute() at zio_execute+0x23d/frame 0xfe066821f6a0 >> vdev_queue_io_done() at vdev_queue_io_done+0xcd/frame 0xfe066821f6e0 >> zio_vdev_io_done() at zio_vdev_io_done+0xd9/frame 0xfe066821f710 >> zio_execute() at zio_execute+0x23d/frame 0xfe066821f760 >> zio_vdev_io_start() at zio_vdev_io_start+0x34d/frame 0xfe066821f7c0 >> zio_execute() at zio_execute+0x23d/frame 0xfe066821f810 >> vdev_queue_io_done() at vdev_queue_io_done+0xcd/frame 0xfe066821f850 >> zio_vdev_io_done() at zio_vdev_io_done+0xd9/frame 0xfe066821f880 >> zio_execute() at zio_execute+0x23d/frame 0xfe066821f8d0 >> zio_vdev_io_start() at zio_vdev_io_start+0x34d/frame 0xfe066821f930 >> zio_execute() at zio_execute+0x23d/frame 0xfe066821f980 >> vdev_queue_io_done() at 
vdev_queue_io_done+0xcd/frame 0xfe066821f9c0 >> zio_vdev_io_done() at zio_vdev_io_done+0xd9/frame 0xfe066821f9f0 >> zio_execute() at zio_execute+0x23d/frame 0xfe066821fa40 >> zio_vdev_io_start() at zio_vdev_io_start+0x34d/frame 0xfe066821faa0 >> zio_execute() at zio_execute+0x23d/frame 0xfe066821faf0 >> vdev_queue_io_done() at vdev_queue_io_done+0xcd/frame 0xfe066821fb30 >> zio_vdev_io_done() at zio_vdev_io_done+0xd9/frame 0xfe066821fb60 >> zio_execute() at zio_execute+0x23d/frame 0xfe066821fbb0 >> zio_vdev_io_start() at zio_vdev_io_start+0x34d/frame 0xfe066821fc10 >> zio_execute() at zio_execute+0x23d/frame 0xfe066821fc60 >> vdev_queue_io_done() at vdev_queue_io_done+0xcd/frame 0xfe066821fca0 >> zio_vdev_io_d
Re: Repeatable panic on ZFS filesystem (used for backups); 11.0-STABLE
The target (and devices that trigger this) are a pair of 4Gb 7200RPM SATA rotating rust drives (zmirror) with each provider geli-encrypted (that is, the actual devices used for the pool create are the .eli's) The machine generating the problem has both rotating rust devices *and* SSDs, so I can't simply shut TRIM off system-wide and call it a day as TRIM itself is heavily-used; both the boot/root pools and a Postgresql database pool are on SSDs, while several terabytes of lesser-used data is on a pool of Raidz2 that is made up of spinning rust. vfs.zfs.trim.max_interval: 1 vfs.zfs.trim.timeout: 30 vfs.zfs.trim.txg_delay: 32 vfs.zfs.trim.enabled: 1 vfs.zfs.vdev.trim_max_pending: 1 vfs.zfs.vdev.trim_max_active: 64 vfs.zfs.vdev.trim_min_active: 1 vfs.zfs.vdev.trim_on_init: 1 kstat.zfs.misc.zio_trim.failed: 0 kstat.zfs.misc.zio_trim.unsupported: 1080 kstat.zfs.misc.zio_trim.success: 573768 kstat.zfs.misc.zio_trim.bytes: 28964282368 The machine in question has been up for ~3 hours now since the last panic, so obviously TRIM is being heavily used... The issue, once the problem has been created, is *portable* and it is not being caused by the SSD source drives. That is, once the machine panics if I remove the two disks that form the backup pool, physically move them to my sandbox machine, geli attach the drives and import the pool within seconds the second machine will panic in the identical fashion. It's possible (but have not proved) that if I were to reboot enough times the filesystem would eventually reach consistency with the removed snapshots all gone and the panics would stop, but I got a half-dozen of them sequentially this morning on my test machine so I'm not at all sure how many more I'd need to allow to run, or whether *any* of the removals committed before the panic (if not then the cycle of reboot/attach/panic would never end) :-) Reducing trim_max_active (to 10, a quite-drastic reduction) did not stop the panics. What appears to be happening is that the removal of the datasets in question on a reasonably newly-imported pool, whether it occurs by the incremental zfs recv -Fudv or by zfs destroy -r from the command line, generates a large number of TRIM requests which are of course rejected by the providers as spinning rust does not support them. However the attempt to queue them generates a stack overflow and double-fault panic as a result, and since once the command is issued the filesystem now has the deletes pending and the consistent state is in fact with them gone, any attempt to reattach the drives with TRIM enabled can result in an immediate additional panic. I tried to work around this in my backup script by creating and then destroying a file on the backup volume, then sleeping for a few seconds before the backup actually commenced, in the hope that this would (1) trigger a TRIM attempt and (2) lead the system to recognize that the target volume cannot support TRIM and thus stop trying to do so (and thus not lead to the storm that exhausts the stack and panic.) That approach, however (see below), failed to prevent the problem. 
# # Now try to trigger TRIM so that we don't have a storm of them # echo "Attempting to disable TRIM on spinning rust" mount -t zfs backup/no-trim /mnt dd if=/dev/random of=/mnt/kill-trim bs=128k count=2 echo "Performed 2 writes" sleep 2 rm /mnt/kill-trim echo "Performed delete of written file" sleep 5 umount /mnt echo "Unmounted temporary filesystem" sleep 2 echo "TRIM disable theoretically done" On 10/17/2016 12:43, Warner Losh wrote: > what's your underlying media? > > Warner > > > On Mon, Oct 17, 2016 at 10:02 AM, Karl Denninger wrote: >> Update from my test system: >> >> Setting vfs.zfs.vdev_trim_max_active to 10 (from default 64) does *not* >> stop the panics. >> >> Setting vfs.zfs.vdev.trim.enabled = 0 (which requires a reboot) DOES >> stop the panics. >> >> I am going to run a scrub on the pack, but I suspect the pack itself >> (now that I can actually mount it without the machine blowing up!) is fine. >> >> THIS (OBVIOUSLY) NEEDS ATTENTION! >> >> On 10/17/2016 09:17, Karl Denninger wrote: >>> This is a situation I've had happen before, and reported -- it appeared >>> to be a kernel stack overflow, and it has gotten materially worse on >>> 11.0-STABLE. >>> >>> The issue occurs after some period of time (normally a week or so.) The >>> system has a mirrored pair of large drives used for backup purposes to >>> which ZFS snapshots are written using a script that iterates over the >>> system. >>> >>> The panic /only /happens when the root filesystem is being sent, and it >>> appears that the panic itself is being triggered by an I/O pattern
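The counters quoted in the sysctl dump above are the quickest way to watch this happen; a short sketch using the same names, which should show the "unsupported" counter climbing while the backup filesystem is being destroyed on spinning rust:
# TRIM requests issued, and how many were rejected by providers that
# cannot honor them (the geli-backed spinning-rust mirror)
sysctl kstat.zfs.misc.zio_trim.success
sysctl kstat.zfs.misc.zio_trim.unsupported
sysctl kstat.zfs.misc.zio_trim.failed
# whether ZFS TRIM is enabled at all for this boot
sysctl vfs.zfs.trim.enabled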
Re: Repeatable panic on ZFS filesystem (used for backups); 11.0-STABLE
Update from my test system: Setting vfs.zfs.vdev_trim_max_active to 10 (from default 64) does *not* stop the panics. Setting vfs.zfs.vdev.trim.enabled = 0 (which requires a reboot) DOES stop the panics. I am going to run a scrub on the pack, but I suspect the pack itself (now that I can actually mount it without the machine blowing up!) is fine. THIS (OBVIOUSLY) NEEDS ATTENTION! On 10/17/2016 09:17, Karl Denninger wrote: > This is a situation I've had happen before, and reported -- it appeared > to be a kernel stack overflow, and it has gotten materially worse on > 11.0-STABLE. > > The issue occurs after some period of time (normally a week or so.) The > system has a mirrored pair of large drives used for backup purposes to > which ZFS snapshots are written using a script that iterates over the > system. > > The panic /only /happens when the root filesystem is being sent, and it > appears that the panic itself is being triggered by an I/O pattern on > the /backup /drive -- not the source drives. Zpool scrubs on the source > are clean; I am going to run one now on the backup, but in the past that > has been clean as well. > > I now have a *repeatable* panic in that if I attempt a "zfs list -rt all > backup" on the backup volume I get the below panic. A "zfs list" > does*__*not panic the system. > > The operating theory previously (after digging through the passed > structures in the dump) was that the ZFS system was attempting to issue > TRIMs on a device that can't do them before the ZFS system realizes this > and stops asking (the backup volume is comprised of spinning rust) but > the appearance of the panic now on the volume when I simply do a "zfs > list -rt all backup" appears to negate that theory since no writes are > performed by that operation, and thus no TRIM calls should be issued. > > I can leave the backup volume in the state that causes this for a short > period of time in an attempt to find and fix this. > > > NewFS.denninger.net dumped core - see /var/crash/vmcore.1 > > Mon Oct 17 09:02:33 CDT 2016 > > FreeBSD NewFS.denninger.net 11.0-STABLE FreeBSD 11.0-STABLE #13 > r307318M: Fri Oct 14 09:23:46 CDT 2016 > k...@newfs.denninger.net:/usr/obj/usr/src/sys/KSD-SMP amd64 > > panic: double fault > > GNU gdb 6.1.1 [FreeBSD] > Copyright 2004 Free Software Foundation, Inc. > GDB is free software, covered by the GNU General Public License, and you are > welcome to change it and/or distribute copies of it under certain > conditions. > Type "show copying" to see the conditions. > There is absolutely no warranty for GDB. Type "show warranty" for details. > This GDB was configured as "amd64-marcel-freebsd"... 
> > Unread portion of the kernel message buffer: > > Fatal double fault > rip = 0x8220d9ec > rsp = 0xfe066821f000 > rbp = 0xfe066821f020 > cpuid = 6; apic id = 14 > panic: double fault > cpuid = 6 > KDB: stack backtrace: > db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame > 0xfe0649d78e30 > vpanic() at vpanic+0x182/frame 0xfe0649d78eb0 > panic() at panic+0x43/frame 0xfe0649d78f10 > dblfault_handler() at dblfault_handler+0xa2/frame 0xfe0649d78f30 > Xdblfault() at Xdblfault+0xac/frame 0xfe0649d78f30 > --- trap 0x17, rip = 0x8220d9ec, rsp = 0xfe066821f000, rbp = > 0xfe066821f020 --- > avl_rotation() at avl_rotation+0xc/frame 0xfe066821f020 > avl_remove() at avl_remove+0x1c8/frame 0xfe066821f070 > vdev_queue_io_to_issue() at vdev_queue_io_to_issue+0x87f/frame > 0xfe066821f530 > vdev_queue_io_done() at vdev_queue_io_done+0x83/frame 0xfe066821f570 > zio_vdev_io_done() at zio_vdev_io_done+0xd9/frame 0xfe066821f5a0 > zio_execute() at zio_execute+0x23d/frame 0xfe066821f5f0 > zio_vdev_io_start() at zio_vdev_io_start+0x34d/frame 0xfe066821f650 > zio_execute() at zio_execute+0x23d/frame 0xfe066821f6a0 > vdev_queue_io_done() at vdev_queue_io_done+0xcd/frame 0xfe066821f6e0 > zio_vdev_io_done() at zio_vdev_io_done+0xd9/frame 0xfe066821f710 > zio_execute() at zio_execute+0x23d/frame 0xfe066821f760 > zio_vdev_io_start() at zio_vdev_io_start+0x34d/frame 0xfe066821f7c0 > zio_execute() at zio_execute+0x23d/frame 0xfe066821f810 > vdev_queue_io_done() at vdev_queue_io_done+0xcd/frame 0xfe066821f850 > zio_vdev_io_done() at zio_vdev_io_done+0xd9/frame 0xfe066821f880 > zio_execute() at zio_execute+0x23d/frame 0xfe066821f8d0 > zio_vdev_io_start() at zio_vdev_io_start+0x34d/frame 0xfe066821f930 > zio_execute() at zio_execute+0x23d/frame 0xfe066821f980 > vdev_queue_io_done() at vdev_queue_io_done+0xcd/frame 0xfe066821f9c0 > zio_
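The workaround that held, turning TRIM off before the pool is touched, is a boot-time setting; a minimal /boot/loader.conf sketch follows, using the tunable names as they appear in the sysctl dump earlier in the thread (the message above spells them slightly differently), and noting that on this box it also disables TRIM for the SSD pools, which is exactly the trade-off discussed here:
# /boot/loader.conf -- disable ZFS TRIM entirely; only read at boot,
# hence the reboot noted above
vfs.zfs.trim.enabled="0"

# the queue-depth knob that was tried first (runtime-settable) and
# did not prevent the panics:
sysctl vfs.zfs.vdev.trim_max_active=10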
Repeatable panic on ZFS filesystem (used for backups); 11.0-STABLE
t /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_queue.c:913 #170 0x822bf919 in zio_vdev_io_done (zio=0xf8056e47c000) at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:3137 #171 0x822bbefd in zio_execute (zio=) at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:1651 #172 0x822bf75d in zio_vdev_io_start (zio=0xf8056e47c000) at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:3112 #173 0x822bbefd in zio_execute (zio=) at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:1651 #174 0x822a216d in vdev_queue_io_done (zio=) at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_queue.c:913 #175 0x822bf919 in zio_vdev_io_done (zio=0xf800b17b23d8) at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:3137 #176 0x822bbefd in zio_execute (zio=) at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:1651 #177 0x822bf75d in zio_vdev_io_start (zio=0xf800b17b23d8) at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:3112 #178 0x822bbefd in zio_execute (zio=) at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:1651 #179 0x822a216d in vdev_queue_io_done (zio=) at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_queue.c:913 #180 0x822bf919 in zio_vdev_io_done (zio=0xf800a6d367b0) at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:3137 #181 0x822bbefd in zio_execute (zio=) at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:1651 #182 0x822bf75d in zio_vdev_io_start (zio=0xf800a6d367b0) at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:3112 #183 0x822bbefd in zio_execute (zio=) at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:1651 #184 0x822a216d in vdev_queue_io_done (zio=) at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_queue.c:913 #185 0x822bf919 in zio_vdev_io_done (zio=0xf8056e99fb88) at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:3137 #186 0x822bbefd in zio_execute (zio=) at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:1651 #187 0x822bf75d in zio_vdev_io_start (zio=0xf8056e99fb88) at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:3112 #188 0x822bbefd in zio_execute (zio=) at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:1651 #189 0x822a216d in vdev_queue_io_done (zio=) at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_queue.c:913 #190 0x822bf919 in zio_vdev_io_done (zio=0xf8056e5227b0) at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:3137 #191 0x822bbefd in zio_execute (zio=) at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:1651 #192 0x822bf75d in zio_vdev_io_start (zio=0xf8056e5227b0) at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:3112 #193 0x822bbefd in zio_execute (zio=) at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:1651 #194 0x822a216d in vdev_queue_io_done (zio=) at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_queue.c:913 #195 0x822bf919 in zio_vdev_io_done (zio=0xf800a6d3c000) at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:3137 #196 0x822bbefd in zio_execute (zio=) at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:1651 #197 0x80b4895a in taskqueue_run_locked (queue=) at /usr/src/sys/kern/subr_taskqueue.c:454 #198 0x80b49b58 in taskqueue_thread_loop (arg=) at /usr/src/sys/kern/subr_taskqueue.c:724 #199 0x80a9f255 in fork_exit ( callout=0x80b49a70 , arg=0xf8057caa72a0, frame=0xfe0668222c00) at 
/usr/src/sys/kern/kern_fork.c:1040 #200 0x80fb44ae in fork_trampoline () at /usr/src/sys/amd64/amd64/exception.S:611 #201 0x in ?? () Current language: auto; currently minimal (kgdb) -- Karl Denninger k...@denninger.net <mailto:k...@denninger.net> /The Market Ticker/ /[S/MIME encrypted email preferred]/ smime.p7s Description: S/MIME Cryptographic Signature
Errata notice for 11.0 -- question
I noted this /after /svn updating my 11.x box to the most-current -STABLE: A bug was diagnosed in interaction of the |pmap_activate()| function and TLB shootdown IPI handler on amd64 systems which have PCID features but do not implement the INVPCID instruction. On such machines, such as SandyBridge™ and IvyBridge™ microarchitectures, set the loader tunable |vm.pmap.pcid_enabled=0| during boot: set vm.pmap.pcid_enabled=0 boot Add this line to |/boot/loader.conf| for the change to persist across reboots: vm.pmap.pcid_enabled=0 To check if the system is affected, check dmesg(8) <http://www.freebsd.org/cgi/man.cgi?query=dmesg&sektion=8&manpath=freebsd-release-ports> for PCID listed in the "Features2", and absence of INVPCID in the "Structured Extended Features". If the PCID feature is not present, or INVPCID is present, the system is not affected. Well, I'm allegedly subject to this: Copyright (c) 1992-2016 The FreeBSD Project. Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994 The Regents of the University of California. All rights reserved. FreeBSD is a registered trademark of The FreeBSD Foundation. FreeBSD 11.0-STABLE #13 r307318M: Fri Oct 14 09:23:46 CDT 2016 k...@newfs.denninger.net:/usr/obj/usr/src/sys/KSD-SMP amd64 FreeBSD clang version 3.8.0 (tags/RELEASE_380/final 262564) (based on LLVM 3.8.0) VT(vga): text 80x25 CPU: Intel(R) Xeon(R) CPU E5620 @ 2.40GHz (2400.13-MHz K8-class CPU) Origin="GenuineIntel" Id=0x206c2 Family=0x6 Model=0x2c Stepping=2 Features=0xbfebfbff Features2=0x29ee3ff AMD Features=0x2c100800 AMD Features2=0x1 VT-x: PAT,HLT,MTF,PAUSE,EPT,UG,VPID TSC: P-state invariant, performance statistics And I do _*not*_ have it turned off at this moment. vm.pmap.invpcid_works: 0 vm.pmap.pcid_enabled: 1 But I've also yet to see any sort of misbehavior. So the questions are: 1. What's the misbehavior I should expect to see if this is not shut off (and it isn't)? 2. Should I take this machine down immediately to reboot it with vm.pmap.pcid_enabled=0 in /boot/loader.conf? An svn log perusal didn't turn up anything post the errata date that appears to be related, which makes me wonder. Thanks in advance! -- Karl Denninger k...@denninger.net <mailto:k...@denninger.net> /The Market Ticker/ /[S/MIME encrypted email preferred]/ smime.p7s Description: S/MIME Cryptographic Signature
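As a worked example of the check the errata describes, something along the following lines should answer question 2 for a given box; the grep patterns simply follow the dmesg field names quoted in the notice:
# affected only if PCID is listed here...
dmesg | grep "Features2"
# ...and INVPCID does NOT appear here
dmesg | grep "Structured Extended Features"
# current state of the workaround and of INVPCID support
sysctl vm.pmap.pcid_enabled vm.pmap.invpcid_works
# workaround from the notice: add the tunable and reboot
echo 'vm.pmap.pcid_enabled=0' >> /boot/loader.conf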
Re: FreeBSD 11.0-RC1 regression with regard to mouse integration in VirtualBox 5.1.4
On 8/23/2016 12:48, David Boyd wrote: > Using FreeBSD 10.3-RELEASE-p6 with virtualbox-guest-additions 5.0.26 on > VirtualBox 5.1.4 (CentOS EL7 host) as a baseline I didn't experience any > difficulties. > > After fresh install of FreeBSD 11.0-RC1 with virtualbox-guest-additions > 5.0.26 on VirtualBox 5.1.4 (CentOS EL7 host) mouse integration is > missing. > > I have time and resources to test any changes you have to suggest. > > Thanks. > Does the mouse normally attach as what appears to be a USB port? If so the problem is likely here: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=211884 -- Karl Denninger k...@denninger.net <mailto:k...@denninger.net> /The Market Ticker/ /[S/MIME encrypted email preferred]/ smime.p7s Description: S/MIME Cryptographic Signature
Re: Delay with 11.0-RC2 builds
On 8/22/2016 23:01, Glen Barber wrote: > On Mon, Aug 22, 2016 at 10:53:06PM -0500, Karl Denninger wrote: >> On 8/22/2016 22:43, Glen Barber wrote: >>> On Thu, Aug 18, 2016 at 11:30:24PM +, Glen Barber wrote: >>>> Two issues have been brought to our attention, and as a result, 11.0-RC2 >>>> builds will be delayed a day or two while these are investigated. >>>> >>>> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=211872 >>>> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=211926 >>>> >>>> An update will be sent if the delay is longer than anticipated. >>>> >>> Just an update, the 11.0-RC2 will be delayed at least two days. One of >>> the issues mentioned in the above PR URLs does not affect releng/11.0, >>> and is a non-issue, but we are awaiting one more change to the stable/11 >>> and releng/11.0 branches that we hope will be the final major changes to >>> 11.0. >>> >>> If this is the case, we may be able to eliminate 11.0-RC3 entirely, and >>> still release on time (or, on time as the current schedule suggests). >>> >>> However, as you know, FreeBSD releases prioritize quality over schedule, >>> so we may still need to adjust the schedule appropriately. >>> >>> So, help with testing 11.0-RC1 (or the latest releng/11.0 from svn) is >>> greatly appreciated. >>> >>> Glen >>> On behalf of: re@ >>> >> Has any decision been made on this? >> >> It is not local to me (others have reported problems with combination >> devices) and rolling back the change in question eliminates the >> problem. It remains un-triaged as of this point. >> >> Note that this impacts a system that is booting and needs manual >> intervention, which is not a good place to a have a problem >> >> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=211884 >> > Well, it's an EN candidate if we cannot get it fixed before the release. > But, I've put it on our radar, which it was not on mine previously... > > Glen Thank you. As far as I can tell reverting that one commit (which results in just one file being rolled back with a handful of lines) fixes it. The other PR (which is linked in this one) reporter also reported that reverting that same commit fixes the problem for him as well. -- Karl Denninger k...@denninger.net <mailto:k...@denninger.net> /The Market Ticker/ /[S/MIME encrypted email preferred]/ smime.p7s Description: S/MIME Cryptographic Signature
Re: Delay with 11.0-RC2 builds
On 8/22/2016 22:43, Glen Barber wrote: > On Thu, Aug 18, 2016 at 11:30:24PM +, Glen Barber wrote: >> Two issues have been brought to our attention, and as a result, 11.0-RC2 >> builds will be delayed a day or two while these are investigated. >> >> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=211872 >> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=211926 >> >> An update will be sent if the delay is longer than anticipated. >> > Just an update, the 11.0-RC2 will be delayed at least two days. One of > the issues mentioned in the above PR URLs does not affect releng/11.0, > and is a non-issue, but we are awaiting one more change to the stable/11 > and releng/11.0 branches that we hope will be the final major changes to > 11.0. > > If this is the case, we may be able to eliminate 11.0-RC3 entirely, and > still release on time (or, on time as the current schedule suggests). > > However, as you know, FreeBSD releases prioritize quality over schedule, > so we may still need to adjust the schedule appropriately. > > So, help with testing 11.0-RC1 (or the latest releng/11.0 from svn) is > greatly appreciated. > > Glen > On behalf of: re@ > Has any decision been made on this? It is not local to me (others have reported problems with combination devices) and rolling back the change in question eliminates the problem. It remains un-triaged as of this point. Note that this impacts a system that is booting and needs manual intervention, which is not a good place to a have a problem https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=211884 -- Karl Denninger k...@denninger.net <mailto:k...@denninger.net> /The Market Ticker/ /[S/MIME encrypted email preferred]/ smime.p7s Description: S/MIME Cryptographic Signature
Re: Very odd behavior with RC2
On 8/15/2016 15:52, Karl Denninger wrote: > FreeBSD 11.0-PRERELEASE #2 r304166: Mon Aug 15 13:17:09 CDT 2016 > k...@newfs.denninger.net:/usr/obj/usr/src/sys/KSD-SMP > > Symptoms: > > This machine is on a SuperMicro board with the integrated KVM. > > After updating to this from the previous Alpha release this morning > (built circa July 15th) the emulated keyboard disappeared intermittently > (!) and would not register keypresses. There appears to have been > something that has changed quite-materially in the loader and/or the > kernel in this regard. Screen display was unaffected. > > Toggling the mouse mode would restore the keyboard; this causes a detach > and reattach of the virtual keyboard to the system, and it would then work. > > Just a heads-up as this was wildly unexpected and needless to say caused > me quite a bit of heartburn trying to perform the upgrade and mergemaster! > From the PR I filed on this... Scanning back through recent commits I am wondering if this one is related; the problem occurs after the kernel is loaded (I can use the keyboard on the KVM perfectly well in the BIOS, etc) and once the system is fully up and running it works as well. It is only if/when geli wants a password *during the boot process* that the keyboard is hosed. r304124 | hselasky | 2016-08-15 03:58:55 -0500 (Mon, 15 Aug 2016) | 7 lines MFC r303765: Keep a reference count on USB keyboard polling to allow recursive cngrab() during a panic for example, similar to what the AT-keyboard driver is doing. Found by: Bruce Evans The reason this looks possibly-related is that the KVM attaches as a USB keyboard and a plugged-in USB keyboard also exhibits the problem during the boot-time process, as shown here from the boot log on one of the impacted machines Enter passphrase for da8p4: ugen1.2: at usbus1 ukbd0: on usbus1 kbd2 at ukbd0 And... uhid0: on usbus4 Hmmm. -- Karl Denninger k...@denninger.net <mailto:k...@denninger.net> /The Market Ticker/ /[S/MIME encrypted email preferred]/ smime.p7s Description: S/MIME Cryptographic Signature
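For anyone wanting to test the same theory locally, reverse-merging the suspect revision in a Subversion checkout of /usr/src is straightforward; this is only a sketch of how the revert discussed in the Delay-with-11.0-RC2 thread might be done, not a claim that r304124 is in fact the culprit, and the KERNCONF name is just the one used on this machine:
# in an svn checkout of stable/11 (or releng/11.0)
cd /usr/src
svn merge -c -304124 .
# rebuild and install the kernel as usual, then reboot and retest
make buildkernel KERNCONF=KSD-SMP
make installkernel KERNCONF=KSD-SMP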
Very odd behavior with RC2
FreeBSD 11.0-PRERELEASE #2 r304166: Mon Aug 15 13:17:09 CDT 2016 k...@newfs.denninger.net:/usr/obj/usr/src/sys/KSD-SMP Symptoms: This machine is on a SuperMicro board with the integrated KVM. After updating to this from the previous Alpha release this morning (built circa July 15th) the emulated keyboard disappeared intermittently (!) and would not register keypresses. There appears to have been something that has changed quite-materially in the loader and/or the kernel in this regard. Screen display was unaffected. Toggling the mouse mode would restore the keyboard; this causes a detach and reattach of the virtual keyboard to the system, and it would then work. Just a heads-up as this was wildly unexpected and needless to say caused me quite a bit of heartburn trying to perform the upgrade and mergemaster! -- Karl Denninger k...@denninger.net <mailto:k...@denninger.net> /The Market Ticker/ /[S/MIME encrypted email preferred]/ smime.p7s Description: S/MIME Cryptographic Signature
Re: Postfix and tcpwrappers?
On 7/25/2016 14:48, Willem Jan Withagen wrote: > On 25-7-2016 19:32, Karl Denninger wrote: >> On 7/25/2016 12:04, Ronald Klop wrote: >>> On Mon, 25 Jul 2016 18:48:25 +0200, Karl Denninger >>> wrote: >>> >>>> This may not belong in "stable", but since Postfix is one of the >>>> high-performance alternatives to sendmail >>>> >>>> Question is this -- I have sshguard protecting connections inbound, but >>>> Postfix appears to be ignoring it, which implies that it is not paying >>>> attention to the hosts.allow file (and the wrapper that enables it.) >>>> >>>> Recently a large body of clowncars have been targeting my sasl-enabled >>>> https gateway (which I use for client machines and thus do in fact need) >>>> and while sshguard picks up the attacks and tries to ban them, postfix >>>> is ignoring the entries it makes which implies it is not linked with the >>>> tcp wrappers. >>>> >>>> A quick look at the config for postfix doesn't disclose an obvious >>>> configuration solutiondid I miss it? >>>> >>> Don't know if postfix can handle tcp wrappers, but I use bruteblock >>> [1] for protecting connections via the ipfw firewall. I use this for >>> ssh and postfix. > Given the fact that both tcpwrappers and postfix originate from the same > author (Wietse Venenma) I'd be very surprised it you could not do this. > http://www.postfix.org/linuxsecurity-200407.html > > But grepping the binary for libwrap it does seems to be the case. > Note that you can also educate sshguard to actually use a script to do > whatever you want it to do. I'm using it to add rules to an ipfw table > that is used in a deny-rule. > > Reloading the fw keeps the deny-rules, flushing the table deletes all > blocked hosts without reloading the firewall. > Both times a bonus. > > --WjW > --WjW That's why I was surprised too... .but it is what it is. I just rebuilt sshguard to use an ipfw table instead of hosts.allow, since I use ipfw anyway for firewall/routing/ipsec/etc adding one line up near the top of my ruleset to match against the table and send back a reset (I'm considering black-holing attempts instead as that will slow the clowncar brigade down and thus "helps" others) and resolved the issue. It's interesting that all of a sudden the clowncar folks figured out that if they hit my email server with SSL they could then attempt an auth. I have always had auth turned off for non-SSL connections for obvious reasons (passing passwords around plain is bad news, yanno) and until recently the clowns hadn't bothered with the overhead of setting up SSL connections. That appears to now have changed, so -- Karl Denninger k...@denninger.net <mailto:k...@denninger.net> /The Market Ticker/ /[S/MIME encrypted email preferred]/ smime.p7s Description: S/MIME Cryptographic Signature
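For anyone wanting to copy the approach, the "one line up near the top of my ruleset" is roughly the following; the table number 22 is only sshguard's conventional default and, like the rule number, should be treated as an assumption rather than the exact configuration used here:
# make sure the table sshguard populates exists (FreeBSD 11 syntax;
# older releases create tables implicitly on first insert)
ipfw table 22 create type addr
# near the top of the ruleset: answer blocked hosts with a TCP reset
ipfw add 100 reset tcp from 'table(22)' to any
# or black-hole them instead, which slows the clowncar brigade down:
#ipfw add 100 deny ip from 'table(22)' to any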
Re: Postfix and tcpwrappers?
On 7/25/2016 14:38, Tim Daneliuk wrote: > On 07/25/2016 01:20 PM, Shawn Bakhtiar wrote: >> ecently a large body of clowncars have been targeting my sasl-enabled >> https gateway (which I use for client machines and thus do in fact need) >> and while sshguard picks up the attacks and tries to ban them, postfix >> is ignoring the entries it makes which implies it is not linked with the >> tcp wrappers. >> >> A quick look at the config for postfix doesn't disclose an obvious >> configuration solutiondid I miss it? >> > > You can more-or-less run anything from a wrapper if you don't daemonize it > and kick it off on-demand from inetd. Essentially, you have inetd.conf > configured with a stanza that - upon connection attempt - launches an > instance of your desired program (postfix in this case), if and only > if the hosts.allow rules are satisfied. > > This works nicely for smaller installations, but is very slow in high > arrival rate environments because each connection attempt incurs the full > startup overhead of the program you're running. > Tcpwrapper works with many persistent system services (sshd being a notable ones) and integrates nicely, so you can use hosts.allow. The package (or default build in ports) for sshguard uses the hosts.allow file. But, sshguard does know (if you build it by hand or use the right subport) how to insert into an ipfw table instead so I switched over to that. I was rather curious, however, if/why postfix wasn't integrated with the hosts.allow file as are many other system services (or if I just missed the config option to turn it on) since it's offered by FreeBSD as a "stock sendmail replacement" option for higher-volume (and more-secure) sites -- Karl Denninger k...@denninger.net <mailto:k...@denninger.net> /The Market Ticker/ /[S/MIME encrypted email preferred]/ smime.p7s Description: S/MIME Cryptographic Signature
Re: Postfix and tcpwrappers?
On 7/25/2016 12:04, Ronald Klop wrote: > On Mon, 25 Jul 2016 18:48:25 +0200, Karl Denninger > wrote: > >> This may not belong in "stable", but since Postfix is one of the >> high-performance alternatives to sendmail >> >> Question is this -- I have sshguard protecting connections inbound, but >> Postfix appears to be ignoring it, which implies that it is not paying >> attention to the hosts.allow file (and the wrapper that enables it.) >> >> Recently a large body of clowncars have been targeting my sasl-enabled >> https gateway (which I use for client machines and thus do in fact need) >> and while sshguard picks up the attacks and tries to ban them, postfix >> is ignoring the entries it makes which implies it is not linked with the >> tcp wrappers. >> >> A quick look at the config for postfix doesn't disclose an obvious >> configuration solutiondid I miss it? >> > > Don't know if postfix can handle tcp wrappers, but I use bruteblock > [1] for protecting connections via the ipfw firewall. I use this for > ssh and postfix. > I recompiled sshguard to use ipfw and stuck the table lookup in my firewall config. works, and is software-agnostic (thus doesn't care if something was linked against tcpwrappers or not.) -- Karl Denninger k...@denninger.net <mailto:k...@denninger.net> /The Market Ticker/ /[S/MIME encrypted email preferred]/ smime.p7s Description: S/MIME Cryptographic Signature
Postfix and tcpwrappers?
This may not belong in "stable", but since Postfix is one of the high-performance alternatives to sendmail ... Question is this -- I have sshguard protecting connections inbound, but Postfix appears to be ignoring it, which implies that it is not paying attention to the hosts.allow file (and the wrapper that enables it.) Recently a large body of clowncars have been targeting my sasl-enabled https gateway (which I use for client machines and thus do in fact need) and while sshguard picks up the attacks and tries to ban them, postfix is ignoring the entries it makes which implies it is not linked with the tcp wrappers. A quick look at the config for postfix doesn't disclose an obvious configuration solution ... did I miss it? -- Karl Denninger k...@denninger.net <mailto:k...@denninger.net> /The Market Ticker/ /[S/MIME encrypted email preferred]/ smime.p7s Description: S/MIME Cryptographic Signature