I'm having trouble with zfs causing a system to run out of memory when I think it should work OK. I have tried to err on the side of TMI.
I have a semi-old computer (2010):

  netbsd-10 amd64, 8 GB RAM, 1 TB SSD
  cpu0: "Pentium(R) Dual-Core CPU E5700 @ 3.00GHz"
  cpu1: "Pentium(R) Dual-Core CPU E5700 @ 3.00GHz"

and it basically works fine, besides being a bit slow by today's standards. I am using it as a build and file server, with the eventual goal of running pbulk, either in domUs or in chroots. I have recently moved two physical machines (netbsd-9 i386 and amd64) to domUs; I use these to build packages for production use. (The machines are 2006 and 2008 Mac notebooks, with painfully slow spinning disks and 4 GB of RAM each -- but they work.)

wd0 has a disklabel, with / and /usr as normal FFSv2 (a and e) and normal swap on wd0b. wd0f is defined as most of the disk, and is the sole component of tank0:

  #> zpool status
    pool: tank0
   state: ONLINE
    scan: scrub repaired 0 in 0h8m with 0 errors on Tue Jul  4 20:31:03 2023
  config:

          NAME                   STATE     READ WRITE CKSUM
          tank0                  ONLINE       0     0     0
            /etc/zfs/tank0/wd0f  ONLINE       0     0     0

  errors: No known data errors

I have a bunch of filesystems, for various pkgsrc branches (created from snapshots, roughly as sketched below), etc.:

  NAME                   USED  AVAIL  REFER  MOUNTPOINT
  tank0                  138G   699G    26K  /tank0
  tank0/b0              6.16G   699G  6.16G  /tank0/b0
  tank0/ccache          24.1G   699G  24.1G  /tank0/ccache
  tank0/distfiles       35.1G   699G  35.1G  /tank0/distfiles
  tank0/n0              31.5K   699G  31.5K  /tank0/n0
  tank0/obj             3.48G   699G  3.48G  /tank0/obj
  tank0/packages        7.27G   699G  7.27G  /tank0/packages
  tank0/pkgsrc-2022Q1    130M   699G   567M  /tank0/pkgsrc-2022Q1
  tank0/pkgsrc-2022Q2    145M   699G   569M  /tank0/pkgsrc-2022Q2
  tank0/pkgsrc-2022Q3    194M   699G   566M  /tank0/pkgsrc-2022Q3
  tank0/pkgsrc-2022Q4    130M   699G   573M  /tank0/pkgsrc-2022Q4
  tank0/pkgsrc-2023Q1    147M   699G   582M  /tank0/pkgsrc-2023Q1
  tank0/pkgsrc-2023Q2    148M   699G   583M  /tank0/pkgsrc-2023Q2
  tank0/pkgsrc-current  10.3G   699G  1.14G  /tank0/pkgsrc-current
  tank0/pkgsrc-wip       623M   699G   623M  /tank0/pkgsrc-wip
  tank0/u0              1.91M   699G  1.91M  /tank0/u0
  tank0/vm              49.5G   699G    23K  /tank0/vm
  tank0/vm/n9-amd64     33.0G   722G  10.1G  -
  tank0/vm/n9-i386      16.5G   711G  4.38G  -
  tank0/ztmp             121M   699G   121M  /tank0/ztmp

which all feels normal to me. I used to usually boot this as GENERIC.
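For reference, the per-branch pkgsrc filesystems were created roughly like this (the snapshot name and the cvs step here are illustrative, not the exact commands I ran; the dataset names are from the list above):

  # snapshot an existing checkout, clone it into a new filesystem,
  # then switch the clone's working copy to the new branch
  zfs snapshot tank0/pkgsrc-2023Q1@base
  zfs clone tank0/pkgsrc-2023Q1@base tank0/pkgsrc-2023Q2
  cd /tank0/pkgsrc-2023Q2 && cvs update -dP -r pkgsrc-2023Q2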
Now I'm booting xen with 4G for the dom0:

  menu=GENERIC:rndseed /var/db/entropy-file;boot netbsd
  menu=GENERIC single user:rndseed /var/db/entropy-file;boot netbsd -s
  menu=Xen:load /netbsd-XEN3_DOM0.gz root=wd0a rndseed=/var/db/entropy-file console=pc;multiboot /xen.gz dom0_mem=4096M
  menu=Xen single user:load /netbsd-XEN3_DOM0.gz root=wd0a rndseed=/var/db/entropy-file console=pc -s;multiboot /xen.gz dom0_mem=4096M
  menu=GENERIC.ok:rndseed /var/db/entropy-file;boot netbsd
  menu=Drop to boot prompt:prompt
  default=3
  timeout=5
  clear=1

I find that after doing things like cvs update in pkgsrc, I have a vast amount of memory in pools:

  Memory: 629M Act, 341M Inact, 16M Wired, 43M Exec, 739M File, 66M Free
  Swap: 16G Total, 16G Free / Pools: 3372M Used

vmstat -m, sorted by Npage and showing only pools with Npage > 1e4:

  Memory resource pool statistics
  Name             Size Requests Fail Releases  Pgreq Pgrel  Npage  Hiwat Minpg Maxpg  Idle
  zio_buf_16384   16384    57643    1    53341  33786 22341  11445  30831     0   inf  7143
  zio_buf_2560     2560    18636    0    17890  15244  2467  12777  12777     0   inf 12031
  ffsdino2          264   540607    0   348374  28691 15875  12816  13522     0   inf     0
  zfs_znode_cache   248   245152    0   206469  13015    18  12997  13015     0   inf   665
  ffsino            280   540249    0   348016  30887 17156  13731  14488     0   inf     0
  zio_buf_2048     2048    36944    0    36004  15617   599  15018  15026     0   inf 14259
  zio_buf_1536     2048    41491    0    40737  18313     6  18307  18313     0   inf 17657
  zio_buf_1024     1536    55808    0    54191  22942   357  22585  22942     0   inf 21442
  dmu_buf_impl_t    216   538828    0   440673  23016    11  23005  23016     0   inf   380
  arc_buf_hdr_t_f   208   657474    0   556468  25273   638  24635  25096     0   inf  7913
  zio_data_buf_51  1024   187177    0   157005  45575 14127  31448  45575     0   inf 10220
  vcachepl          640   266639    0    56918  34959     2  34957  34958     0   inf     1
  dnode_t           640   576198    0   485522  70645  9470  61175  70645     0   inf 11511
  zio_buf_512      1024   848240    0   798838 141743 15535 126208 128224     0   inf 96759

and sysctl reports:

  kstat.zfs.misc.arcstats.size = 283598992

If I continue to do things, the system locks up and needs to have the reset button pushed. I'm now trying an external tickle watchdog with a script that does sync/tickle/sleep-60 (a rough sketch is in the P.S. below), in the hope that it will reboot the machine once sync starts hanging. My memory is that this happens with non-xen too, but it takes longer. Other than the lockups, zfs behaves as I expect it to.

So what I don't understand is: Is there any mechanism to cause zfs (I'm guessing ARC) to limit the amount of memory in use? Is there any mechanism to cause zfs to free ARC under memory pressure? Do people think this is a xen/zfs interaction bug that doesn't happen without xen? Basically, especially with an SSD, ARC is not such a win, and ARC causing the machine to run out of memory is dysfunctional.

Questions:

  - Have I misconfigured or misused zfs?
  - Is there really no reclaiming under pressure?
  - Is there some way to limit ARC to, say, 1 GB?
  - Why isn't x% of memory a default limit, if there's no functioning reclaim under memory pressure?
  - Are others having this problem?

Greg
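P.S. For concreteness, the watchdog script is roughly the sketch below. The wdogctl(8) call is just how I'd expect to tickle an external-mode timer; substitute whatever your watchdog hardware actually needs.

  #!/bin/sh
  # Tickle the external watchdog only after sync returns; if sync hangs
  # because zfs has wedged, the tickles stop and the watchdog resets the box.
  while true; do
      sync
      wdogctl -t      # refresh the external-mode timer (assumed invocation)
      sleep 60
  done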