Re: "zfs send" freezes system (was: Re: pgdaemon high CPU consumption)
On Tue, Jul 19, 2022 at 08:46:07AM +0200, Matthias Petermann wrote:
> Hello,
>
> On 13.07.22 12:30, Matthias Petermann wrote:
>
> > I can now confirm that reverting the patch also solved my problem. Of
> > course I first fell into the trap, because I had not considered that the
> > ZFS code is loaded as a module and had only changed the kernel. As a
> > result, it looked at first as if this would not help. Finally it did...I
> > am now glad that I can use a zfs send again in this way. This previously
> > led reproducibly to a crash, whereby I could not make backups. This is
> > critical for me and I would like to support tests regarding this.
> >
> > In contrast to the PR, there are hardly any xcalls in my use case -
> > however, my system only has 4 CPU cores, 2 of which are physical.
> >
> >
> > Many greetings
> > Matthias
> >
>
> Roundabout one week after removing the patch, my system with ZFS is behaving
> "normally" for the most part and the freezes have disappeared. What is the
> recommended way given the 10 branch? If it is not foreseeable that the basic
> problem can be solved shortly, would it also be an option to withdraw the
> patch in the sources to get at least a stable behavior? (Not only) on the
> sidelines, I would still be interested in whether this "zfs send" problem
> occurs in general, or whether certain hardware requirements have a favorable
> effect on it.
>
> Kind regards
> Matthias

hi, sorry for the delay in getting to this.

what is happening here is that the pagedaemon is hitting the check for
"uvm_km_va_starved_p()", which tries to keep the usage of kernel memory
below 90% of the virtual space available for kernel memory. the checks
that I changed (effectively removed for 64-bit kernels) in that previous
patch tried to keep the ARC kernel memory usage below 75% of the kernel
virtual space.
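The interaction between the two thresholds described above can be sketched in a few lines of userland C. This is only an illustrative model, not the kernel's code: the function names va_starved() and arc_over_limit() and the byte counts passed to them are hypothetical, and the real checks operate on the kernel's own virtual memory arena.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical model of the pagedaemon-side check: report starvation
 * when less than 10% of the kernel virtual space is free, i.e. when
 * usage has climbed above 90%.
 */
static bool
va_starved(size_t total, size_t free_bytes)
{
	assert(free_bytes <= total);
	return free_bytes < total / 10;
}

/*
 * Hypothetical model of the ARC-side check that the earlier patch
 * effectively removed on 64-bit kernels: ask the ARC to shrink when
 * less than 25% of the kernel virtual space is free (usage above 75%).
 */
static bool
arc_over_limit(size_t total, size_t free_bytes)
{
	assert(free_bytes <= total);
	return free_bytes < total / 4;
}
```

With both checks in place, the ARC-side limit trips well before the 90% mark. With only the first check left, nothing asks the ARC to shrink until the kernel virtual space is already nearly exhausted, which matches the behaviour described in the message.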
on other OSs that support ZFS, the kernel allocates enough virtual space
for the kernel to be able to allocate almost all of RAM for itself if it
wants, but on netbsd we have this calculation in kmeminit_nkmempages():

	#if defined(KMSAN)
		npages = (physmem / 8);
	#elif defined(PMAP_MAP_POOLPAGE)
		npages = (physmem / 4);
	#else
		npages = (physmem / 3) * 2;
	#endif /* defined(PMAP_MAP_POOLPAGE) */

	#ifndef NKMEMPAGES_MAX_UNLIMITED
		if (npages > NKMEMPAGES_MAX)
			npages = NKMEMPAGES_MAX;
	#endif

this limits the amount of kernel memory to 1/4 of RAM on 64-bit platforms.

PMAP_MAP_POOLPAGE is for accessing pool objects that are smaller than a page
using a direct-mapped region of virtual addresses. all 64-bit kernels can
do this... though it looks like sparc64 doesn't do this for such pool
allocations even though it could? weird.

most 64-bit kernels also define NKMEMPAGES_MAX_UNLIMITED to indicate that
no arbitrary fixed limit should be imposed on kernel memory usage. though
again not all platforms that could define this actually do. this time
it's the mips kernels that don't enable this one.

for ZFS, the memory used for the ARC cache is allocated through pools but
the allocation sizes are almost all larger than a page, so basically none
of these allocations will be able to use the direct map, and instead they
will all have to allocate kernel virtual space. I don't think it makes
sense for the kernel to arbitrarily limit the ZFS ARC cache to 1/4 of RAM
just because that's how much virtual space is made available for kernel
memory mappings, so instead I think we should increase the size of the
kernel virtual space on 64-bit kernels to support mapping all of RAM,
something like the attached patch.

however even with this change, reading a bunch of data into the ZFS ARC
still results in the system hanging, this time due to running out of
physical memory. there are other mechanisms that ZFS also uses to try to
control its memory usage, and some part of that is apparently not working
either.
I'm continuing to look into this.

-Chuck

Index: src/sys/uvm/uvm_km.c
===
RCS file: /home/chs/netbsd/cvs/src/sys/uvm/uvm_km.c,v
retrieving revision 1.160
diff -u -p -r1.160 uvm_km.c
--- src/sys/uvm/uvm_km.c 13 Mar 2021 15:29:55 - 1.160
+++ src/sys/uvm/uvm_km.c 26 Jul 2022 20:24:14 -
@@ -237,6 +237,8 @@ kmeminit_nkmempages(void)
 #ifndef NKMEMPAGES_MAX_UNLIMITED
 	if (npages > NKMEMPAGES_MAX)
 		npages = NKMEMPAGES_MAX;
+#else
+	npages = physmem;
 #endif
 
 	if (npages < NKMEMPAGES_MIN)
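To make the effect of the patch concrete, the page-count calculation can be modelled in userland C. This is a minimal sketch under stated assumptions, not the kernel function itself: model_nkmempages(), the kmem_config enum, and the max_pages and patched parameters are hypothetical names introduced for illustration, the preprocessor conditionals are replaced by run-time arguments, and the NKMEMPAGES_MIN clamp is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Which branch of the #if chain in kmeminit_nkmempages() is modelled. */
enum kmem_config { CFG_KMSAN, CFG_MAP_POOLPAGE, CFG_DEFAULT };

/*
 * Hypothetical userland model of the calculation quoted in the message.
 * physmem is the amount of RAM in pages.
 * max_pages == 0 models NKMEMPAGES_MAX_UNLIMITED being defined; any
 * other value models a fixed NKMEMPAGES_MAX.
 * patched selects the behaviour with the proposed "#else" branch.
 */
static size_t
model_nkmempages(size_t physmem, enum kmem_config cfg, size_t max_pages,
    bool patched)
{
	size_t npages;

	assert(physmem > 0);

	switch (cfg) {
	case CFG_KMSAN:
		npages = physmem / 8;
		break;
	case CFG_MAP_POOLPAGE:
		npages = physmem / 4;
		break;
	default:
		npages = (physmem / 3) * 2;
		break;
	}

	if (max_pages != 0) {
		/* Fixed limit: clamp as in the #ifndef branch. */
		if (npages > max_pages)
			npages = max_pages;
	} else if (patched) {
		/* Proposed #else branch: allow mapping all of RAM. */
		npages = physmem;
	}

	return npages;
}
```

As an example, assume a machine with 16 GiB of RAM and 4 KiB pages, so physmem is 4194304 pages. In the PMAP_MAP_POOLPAGE configuration with no fixed limit, the unpatched model yields 1048576 pages (4 GiB, one quarter of RAM), while the patched model yields all 4194304 pages.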
Automated report: NetBSD-current/i386 build failure
This is an automatically generated notice of a NetBSD-current/i386
build failure.

The failure occurred on babylon5.netbsd.org, a NetBSD/amd64 host,
using sources from CVS date 2022.07.28.22.00.23.

An extract from the build.sh output follows:

# compile GENERIC/intel_huc.o
/tmp/build/2022.07.28.22.00.23-i386/tools/bin/i486--netbsdelf-gcc -fwrapv -msoft-float -mno-mmx -mno-sse -mno-avx -mindirect-branch=thunk -mindirect-branch-register -ffreestanding -fno-zero-initialized-in-bss -fno-delete-null-pointer-checks -O2 -fno-omit-frame-pointer -fstack-protector -Wstack-protector --param ssp-buffer-size=1 -fstack-usage -Wstack-usage=3584 -fno-strict-aliasing -fno-common -fwrapv -std=gnu99 -Werror -Wall -Wno-main -Wno-format-zero-length -Wpointer-arith -Wmissing-prototypes -Wstrict-prototypes -Wold-style-definition -Wswitch -Wshadow -Wcast-qual -Wwrite-strings -Wno-unreachable-code -Wno-pointer-sign -Wno-attributes -Wextra -Wno-unused-parameter -Wold-style-definition -Wno-sign-compare -Walloca -Wno-missing-field-initializers -Wno-pointer-arith -Wno-shadow -Wno-address-of-packed-member --sysroot=/tmp/build/2022.07.28.22.00.23-i386/destdir -Di386 -I.
-I/tmp/build/2022.07.28.22.00.23-i386/src/sys/external/mit/xen-include-public/dist/ -I/tmp/build/2022.07.28.22.00.23-i386/src/sys/external/bsd/libnv/dist -I/tmp/build/2022.07.28.22.00.23-i386/src/sys/external/bsd/acpica/dist -I/tmp/build/2022.07.28.22.00.23-i386/src/sys/../common/lib/libx86emu -I/tmp/build/2022.07.28.22.00.23-i386/src/sys/../common/lib/libc/misc -I/tmp/build/2022.07.28.22.00.23-i386/src/sys/../common/include -I/tmp/build/2022.07.28.22.00.23-i386/src/sys/arch -I/tmp/build/2022.07.28.22.00.23-i386/src/sys -nostdinc -DCOMPAT_UTILS -D__XEN_INTERFACE_VERSION__=0x3020a -DDIAGNOSTIC -DCOMPAT_44 -D_KERNEL -D_KERNEL_OPT -std=gnu99 -I/tmp/build/2022.07.28.22.00.23-i386/src/sys/lib/libkern/../../../common/lib/libc/quad -I/tmp/build/2022.07.28.22.00.23-i386/src/sys/lib/libkern/../../../common/lib/libc/string -I/tmp/build/2022.07.28.22.00.23-i386/src/sys/lib/libkern/../../../common/lib/libc/arch/i386/string -I/tmp/build/2022.07.28.22.00.23-i386/src/sys/lib/libkern/../../../common/lib/libc/arch/i386/atomic -I/tmp/build/2022.07.28.22.00.23-i386/src/sys/lib/libkern/../../../common/lib/libc/hash/sha3 -I/tmp/build/2022.07.28.22.00.23-i386/src/sys/external/bsd/drm2/include -I/tmp/build/2022.07.28.22.00.23-i386/src/sys/external/bsd/drm2/include/drm -I/tmp/build/2022.07.28.22.00.23-i386/src/sys/external/bsd/common/include -I/tmp/build/2022.07.28.22.00.23-i386/src/sys/external/bsd/drm2/dist/include -I/tmp/build/2022.07.28.22.00.23-i386/src/sys/external/bsd/drm2/dist/include/drm -I/tmp/build/2022.07.28.22.00.23-i386/src/sys/external/bsd/drm2/dist/include/uapi -D__KERNEL__ -DCONFIG_X86 -DCONFIG_X86_PAT -DCONFIG_BACKLIGHT_CLASS_DEVICE=0 -DCONFIG_BACKLIGHT_CLASS_DEVICE_MODULE=0 -DCONFIG_DRM_FBDEV_EMULATION=1 -DCONFIG_DRM_FBDEV_OVERALLOC=100 -DCONFIG_FB=0 -DCONFIG_LOCKDEP=0 -DCONFIG_PCI=1 -DCONFIG_DRM_LEGACY -I/tmp/build/2022.07.28.22.00.23-i386/src/sys/external/bsd/drm2/i915drm -I/tmp/build/2022.07.28.22.00.23-i386/src/sys/external/bsd/drm2/dist/drm/i915
-DCONFIG_DRM_FBDEV_EMULATION=1 -DCONFIG_DRM_I915_DEBUG=1 -DCONFIG_DRM_I915_DEBUG_GEM=1 -DCONFIG_DRM_I915_DEBUG_RUNTIME_PM=0 -DCONFIG_DRM_I915_PREEMPT_TIMEOUT=640 -DCONFIG_DRM_I915_TIMESLICE_DURATION=1 -DCONFIG_DRM_I915_ALPHA_SUPPORT=0 -DCONFIG_DRM_I915_FBDEV=1 -DCONFIG_DRM_I915_GVT=0 -DCONFIG_DRM_I915_PRELIMINARY_HW_SUPPORT=0 -DCONFIG_DRM_I915_DEBUG_MMIO=1 -DCONFIG_DRM_I915_FORCE_PROBE=0 -DCONFIG_DRM_I915_SPIN_REQUEST=0 -DCONFIG_DRM_I915_SW_FENCE_CHECK_DAG=1 -DCONFIG_DRM_I915_HEARTBEAT_INTERVAL=2500 -DCONFIG_DRM_I915_STOP_TIMEOUT=100 -DCONFIG_DRM_I915_PREEMPT_TIMEOUT=640 -DCONFIG_DRM_I915_CAPTURE_ERROR=0 -DCONFIG_DRM_I915_SELFTEST=0 -DCONFIG_DRM_I915_USERFAULT_AUTOSUSPEND=0 -DCONFIG_PM=1 -DCONFIG_INTEL_MEI_HDCP=0 -D_FORTIFY_SOURCE=2 -I/tmp/build/2022.07.28.22.00.23-i386/src/sys/external/isc/atheros_hal/dist -I/tmp/build/2022.07.28.22.00.23-i386/src/sys/external/isc/atheros_hal/ic -I/tmp/build/2022.07.28.22.00.23-i386/src/sys/../common/include -I/tmp/build/2022.07.28.22.00.23-i386/src/sys/external/bsd/acpica/dist/include -I/tmp/build/2022.07.28.22.00.23-i386/src/sys/external/bsd/libnv/dist -c /tmp/build/2022.07.28.22.00.23-i386/src/sys/external/bsd/drm2/dist/drm/i915/gt/uc/intel_huc.c -o intel_huc.o
--- kern-MONOLITHIC ---
--- intel_lvds.o ---
/tmp/build/2022.07.28.22.00.23-i386/tools/bin/nbctfconvert -g -L VERSION intel_lvds.o
--- kern-INSTALL ---
=> @/tmp/build/2022.07.28.22.00.23-i386/tools/bin/nbsed '/const char sccs/!d;s/.*@(.)//;s/" "//;s/\\.*//' vers.c && /tmp/build/2022.07.28.22.00.23-i386/tools/bin/i486--netbsdelf-size netbsd && /tmp/build/2022.07.28.22.00.23-i386/tools/bin/nbctfmerge -t -g -L VERSION -o netbsd locore.o copy.o spl.o vector.o lock_stubs.o multiboot.o multiboot2.o autoconf.o aout_machdep.o busfun
daily CVS update output
Updating src tree:
P src/distrib/sets/lists/debug/module.md.amd64
P src/distrib/sets/lists/debug/module.md.i386
P src/distrib/sets/lists/modules/md.amd64
P src/distrib/sets/lists/modules/md.i386
P src/sys/arch/aarch64/aarch64/cpu_machdep.c
P src/sys/arch/arm/arm32/arm32_machdep.c
P src/sys/arch/arm/pic/pic.c
P src/sys/arch/i386/conf/GENERIC
P src/sys/dev/pci/if_wm.c
P src/sys/external/bsd/drm2/dist/drm/scheduler/sched_fence.c
P src/sys/external/bsd/drm2/drm/files.drmkms
cvs update: `src/sys/external/bsd/drm2/drm/sched_module.c' is no longer in the repository
P src/sys/modules/Makefile
P src/sys/modules/drmkms_sched/Makefile
P src/sys/net/if.h

Updating xsrc tree:
P xsrc/external/mit/xf86-video-nv/dist/src/nv_driver.c

Killing core files:

Updating file list:
-rw-rw-r-- 1 srcmastr netbsd 40588525 Jul 29 03:03 ls-lRA.gz