I've long argued that the VM system's interaction with ZFS's ARC cache and UMA has serious, even severe, issues. 12.x appeared to have addressed some of them, and as such I've yet to roll forward any part of the patch series found here [ https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=187594 ] or the Phabricator version referenced in the bug thread (which is more complex and attempts to dig more effectively at the root of the issue, particularly when UMA is involved, as it usually is).
Yesterday I decided to perform a fairly significant reorganization of the ZFS pools on one of my personal machines, including the root pool, which was on mirrored SSDs, changing it to a raidz2 (also on SSDs). This of course required booting single-user from a 12-STABLE memstick. A simple "zfs send -R zs/root-save/R | zfs recv -Fuev zsr/R" should have done it, no sweat. The root that was copied over before I started is uncomplicated: it's compressed but not de-duped, and while it has snapshots on it, it's by no means complex.

*The system failed to execute that command with an "out of swap space" error, killing the job; there was indeed no swap configured, since I had booted from a memstick.*

Huh? A simple *filesystem copy* managed to force a 16GB system into requiring swap backing store? I was able to complete the copy by temporarily adding the swap space back (where it would live once the move was complete), but that requirement is pure insanity. From what I was able to determine, it came about from the same root cause that has been plaguing VM/ZFS interaction since 2014, when I started working on this issue -- specifically, when RAM gets low, rather than evict ARC (or clean up UMA that is allocated but unused), the system attempts to page out working set. In this case there was nowhere to page working set out to, so the process involved got an OOM error and was terminated.

*I continue to argue that this decision is ALWAYS wrong.* It's wrong because if you invalidate cache and reclaim it, you *might* take a physical read in the future to bring that data back into the cache (since it's no longer in RAM), but in exchange for that *potential* I/O you perform a GUARANTEED physical I/O (to page out some amount of working set) and possibly TWO physical I/Os (to page said working set out and, later, page it back in). It has always appeared flat-out nonsensical to me to trade a possible physical I/O (if there is a future cache miss) for a guaranteed physical I/O and a possible second one.

It's even worse if the reason you make that decision is that UMA is allocated but unused; in that case you are paging when no physical I/O is required at all, because the "memory pressure" is a phantom! While UMA is a very material performance win in the general case, allowing allocated-but-unused UMA to force paging appears, from a performance perspective, to be flat-out insanity. I find it very difficult to come up with any reasonable scenario where releasing allocated-but-unused UMA, rather than paging out working set, is a net performance loser.

In this case, because the system was running single-user with no swap available, the process selected for termination when that circumstance arose was the copy itself -- and the copy did not require anywhere near all of the available non-kernel RAM.

I'm going to dig into this further, but IMHO the base issue still exists, even though its impact on my workloads with everything "running normally" has materially decreased with 12.x.
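For concreteness, the failure and the workaround looked roughly like this from the single-user shell (the swap device name below is hypothetical; use whichever partition the swap will live on after the move):

    # No swap configured at this point -- booted from the memstick.
    zfs send -R zs/root-save/R | zfs recv -Fuev zsr/R
    # ...killed with "out of swap space"

    # Temporarily re-enable the swap partition, then retry:
    swapon /dev/gpt/swap0       # device name hypothetical
    zfs send -R zs/root-save/R | zfs recv -Fuev zsr/R
    # ...now runs to completion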
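As for the phantom pressure: stock tools will show roughly how much UMA is allocated but idle. A rough sketch, not a precise accounting -- "vmstat -z" reports, per zone, items in use (USED) versus allocated-but-idle (FREE), and the arcstats sysctl reports the current ARC size:

    sysctl kstat.zfs.misc.arcstats.size    # current ARC size, bytes
    vmstat -z                              # per-zone SIZE/USED/FREE counts

    # Approximate bytes sitting idle in UMA; assumes the 12.x line
    # layout of "name: size, limit, used, free, ..."
    vmstat -z | awk -F'[:,] *' 'NR>1 { idle += $2 * $5 } \
        END { print idle, "bytes idle in UMA" }'

Every byte counted in those FREE items could be reclaimed without a single disk operation, which is exactly why paging out working set in that situation is a pure loss.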
-- 
Karl Denninger
k...@denninger.net
/The Market Ticker/
/[S/MIME encrypted email preferred]/