Re: i made a mess (forgot the buildworld and installworld)
On 2024-07-15 03:05, Alexander Ziaee wrote: Eventually, you want to learn to snapshot your system when you're touching critical directories. Then it never matters; you can just roll back stress-free. Specific to updates, and if on ZFS: man bectl. Even better than snapshots. Bye, Alexander.
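For readers new to boot environments, a minimal sketch of the update workflow that bectl(8) enables; the BE name is just an example:
---snip---
# create a new boot environment before touching the system
bectl create 2024-07-15-pre-update

# ... run the update (installworld, installkernel, etcupdate, ...) ...

# if the update went wrong: list BEs, activate the old one, reboot
bectl list
bectl activate 2024-07-15-pre-update && shutdown -r now
---snip---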
Re: bridge: no traffic with vnet (epair) beyond bridge device
On 2024-06-03 21:02, FreeBSD User wrote: Hello, I'm running a dual-socket NUMA CURRENT host (Fujitsu RX host) running several jails. Jails are attached to a bridge device (bridge1); the physical device on that bridge is igb1 (i350-based NIC). The bridge is created via the host's rc scripts; adding and/or deleting epair members of the bridge is performed by the jail.conf script. I do not know how long the setup worked, but out of the blue last week, after a longish poudriere run following an update of the host to the most recent CURRENT (as of today, kernel and world from the latest update) and performing "etcupdate" on both the host and all jails, traffic beyond the bridge is not seen on the network! All jails can communicate with each other. Traffic from the host itself is routed via igb0 to the network and back via igb1 onto the bridge. I checked all settings for net.link.bridge:
net.link.bridge.ipfw: 0
net.link.bridge.log_mac_flap: 1
net.link.bridge.allow_llz_overlap: 0
net.link.bridge.inherit_mac: 0
net.link.bridge.log_stp: 0
net.link.bridge.pfil_local_phys: 0
net.link.bridge.pfil_member: 0
net.link.bridge.ipfw_arp: 0
net.link.bridge.pfil_bridge: 0
net.link.bridge.pfil_onlyip: 0
I did not change anything (knowingly). I also have an oldish box with a single-socket processor, driven by the very same CURRENT and a similar, but not identical, setup. That box is running very well and the bridge is working as expected. I was wondering if something has changed in the handling of jails, epair and bridges. I followed the setup "after the book", nothing suspicious. "After the book" = the IP of the host itself is not on igb1 but on a different interface or on the bridge? Is there a firewall active on the box itself? Which one? What does wireshark / a traffic dump at the physical interface level tell compared to a traffic dump at the switch interface? Did you replace the cable / SFP / move to a different switch port as a test? I suggest providing the output of ifconfig -a and netstat -rn (feel free to mangle the IPs, as long as the mangling is a consistent replacement and not a cut-off). Bye, Alexander.
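To narrow down where the packets disappear, a sketch of the setup being described and of capturing traffic on both sides of the bridge; the interface names follow the mail and are assumptions for other setups:
---snip---
# host side: bridge with the physical NIC as a member
ifconfig bridge1 create
ifconfig bridge1 addm igb1 up

# per jail (normally done from jail.conf exec lines): create an epair,
# put one end on the bridge, move the other end into the jail's vnet
ifconfig epair0 create
ifconfig bridge1 addm epair0a
ifconfig epair0a up
ifconfig epair0b vnet myjail

# compare what the bridge sees with what reaches the wire
tcpdump -ni bridge1 -c 20 icmp
tcpdump -ni igb1 -c 20 icmp
---snip---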
Re: May 2024 stabilization week
On 2024-05-28 06:24, Gleb Smirnoff wrote: On Mon, May 27, 2024 at 01:00:24AM -0700, Gleb Smirnoff wrote: T> This is an automated email to inform you that the May 2024 stabilization week T> started with FreeBSD/main at main-n270422-cca0ce62f367, which was tagged as T> main-stabweek-2024-May. Monday night status update: - Updated my personal desktop and home router, no issues noticed. - Testing at Netflix is delayed due to several issues: the test cluster is busy with other stuff, some small difficulties with merging, etc. Usually we run the test Monday night to Tuesday, but this time we plan to do it Tuesday to Wednesday. Regressions I am aware of and tracking: - Linuxulator too strict on Netlink (PR 279012) Replying to this email thread with your success reports as well as reporting any regressions is very much appreciated. Thanks! Intel 32-bit users which use ZFS may want to have https://cgit.FreeBSD.org/src/commit/?id=4c053c17f2c8a715988f215d16284879857ca376 Apart from that, it is much more stable on my 30-jails + poudriere host than the src from the middle of the month. Bye, Alexander.
Re: _mtx_lock_sleep: recursed on non-recursive mutex CAM device lock @ /..../sys/cam/nvme/nvme_da.c:469
On 2024-05-22 22:45, Alexander Leidinger wrote: On 2024-05-22 20:53, Warner Losh wrote: First order: Looks like we're trying to schedule a trim, but that fails due to a malloc issue. So then, since it's a malloc issue, we wind up trying to automatically reschedule this I/O, which recurses into the driver with a bad lock held and boop. Can you reproduce this? So far I had it once. At least I have only one crashdump. I had one more reboot/crash, but no dump. I also have a watchdog running on this system, so I am not sure what caused the (unusual) reboot. I had a poudriere build running at both times. Since the crashdump I didn't run poudriere anymore. If so, can you test this patch? I'll give it a try tomorrow anyway, and I will try to stress the system again with poudriere. The nvme is a cache and also a log device for a zpool, so there is no really deterministic way to trigger access to it. I've run a lot of poudriere builds together with other load (about 30 jails with mysql, postgresql, redis, webmail, postfix, imap, java stuff, ...) on this system since Thursday. So far no panic in the nvme part. Bye, Alexander.
panic: lock "tmpfsni" 0xfffff80721307090 already initialized
Hi,
---snip---
[123095] panic: lock "tmpfsni" 0xfffff80721307090 already initialized
[123095] cpuid = 8
[123095] time = 1716597585
[123095] KDB: stack backtrace:
[123095] db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfe08285c9690
[123095] vpanic() at vpanic+0x13f/frame 0xfe08285c97c0
[123095] panic() at panic+0x43/frame 0xfe08285c9820
[123095] lock_init() at lock_init+0x155/frame 0xfe08285c9830
[123095] _mtx_init() at _mtx_init+0x89/frame 0xfe08285c9850
[123095] tmpfs_node_init() at tmpfs_node_init+0x28/frame 0xfe08285c9870
[123095] keg_alloc_slab() at keg_alloc_slab+0x28d/frame 0xfe08285c98c0
[123095] zone_import() at zone_import+0xec/frame 0xfe08285c9950
[123095] cache_alloc() at cache_alloc+0x3b3/frame 0xfe08285c99b0
[123095] cache_alloc_retry() at cache_alloc_retry+0x23/frame 0xfe08285c99f0
[123095] tmpfs_alloc_node() at tmpfs_alloc_node+0x108/frame 0xfe08285c9a40
[123095] tmpfs_alloc_file() at tmpfs_alloc_file+0xbf/frame 0xfe08285c9ad0
[123095] tmpfs_create() at tmpfs_create+0x38/frame 0xfe08285c9b00
[123095] VOP_CREATE_APV() at VOP_CREATE_APV+0x3c/frame 0xfe08285c9b20
[123095] vn_open_cred() at vn_open_cred+0x2e2/frame 0xfe08285c9c80
[123095] openatfp() at openatfp+0x268/frame 0xfe08285c9dc0
[123095] sys_openat() at sys_openat+0x28/frame 0xfe08285c9de0
[123095] filemon_wrapper_openat() at filemon_wrapper_openat+0x12/frame 0xfe08285c9e00
[123095] amd64_syscall() at amd64_syscall+0x15b/frame 0xfe08285c9f30
[123095] fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfe08285c9f30
[123095] --- syscall (499, FreeBSD ELF64, openat), rip = 0xab82ba, rsp = 0x8217439e8, rbp = 0x821743a20 ---
[123095] Uptime: 1d10h11m35s
---snip---
This is with a world from 2024-05-17-084543. Full logs available at https://wiki.leidinger.net/core.txt.7 (1.1 MB). This was in the middle of the night; poudriere was running. Bye, Alexander.
Re: _mtx_lock_sleep: recursed on non-recursive mutex CAM device lock @ /..../sys/cam/nvme/nvme_da.c:469
On 2024-05-22 20:53, Warner Losh wrote: First order: Looks like we're trying to schedule a trim, but that fails due to a malloc issue. So then, since it's a malloc issue, we wind up trying to automatically reschedule this I/O, which recurses into the driver with a bad lock held and boop. Can you reproduce this? So far I had it once. At least I have only one crashdump. I had one more reboot/crash, but no dump. I also have a watchdog running on this system, so I am not sure what caused the (unusual) reboot. I had a poudriere build running at both times. Since the crashdump I didn't run poudriere anymore. If so, can you test this patch? I'll give it a try tomorrow anyway, and I will try to stress the system again with poudriere. The nvme is a cache and also a log device for a zpool, so there is no really deterministic way to trigger access to it. Bye, Alexander.
_mtx_lock_sleep: recursed on non-recursive mutex CAM device lock @ /..../sys/cam/nvme/nvme_da.c:469
Hi, I got the panic in $Subject. Does anyone have an idea? Complete crashlog available at https://wiki.leidinger.net/core.txt.6 (1.2 MB) Short version:
---snip---
[11417] KDB: stack backtrace:
[11417] db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfe043133f830
[11417] vpanic() at vpanic+0x13f/frame 0xfe043133f960
[11417] panic() at panic+0x43/frame 0xfe043133f9c0
[11417] __mtx_lock_sleep() at __mtx_lock_sleep+0x491/frame 0xfe043133fa50
[11417] __mtx_lock_flags() at __mtx_lock_flags+0x9c/frame 0xfe043133fa70
[11417] ndastrategy() at ndastrategy+0x3c/frame 0xfe043133faa0
[11417] g_disk_start() at g_disk_start+0x569/frame 0xfe043133fb00
[11417] g_io_request() at g_io_request+0x2b6/frame 0xfe043133fb30
[11417] g_io_deliver() at g_io_deliver+0x1cc/frame 0xfe043133fb80
[11417] g_disk_done() at g_disk_done+0xee/frame 0xfe043133fbc0
[11417] ndastart() at ndastart+0x4a3/frame 0xfe043133fc20
[11417] xpt_run_allocq() at xpt_run_allocq+0xa5/frame 0xfe043133fc70
[11417] ndastrategy() at ndastrategy+0x6d/frame 0xfe043133fca0
[11417] g_disk_start() at g_disk_start+0x569/frame 0xfe043133fd00
[11417] g_io_request() at g_io_request+0x2b6/frame 0xfe043133fd30
[11417] g_io_request() at g_io_request+0x2b6/frame 0xfe043133fd60
[11417] g_io_request() at g_io_request+0x2b6/frame 0xfe043133fd90
[11417] vdev_geom_io_start() at vdev_geom_io_start+0x257/frame 0xfe043133fdc0
[11417] zio_vdev_io_start() at zio_vdev_io_start+0x321/frame 0xfe043133fe10
[11417] zio_execute() at zio_execute+0x78/frame 0xfe043133fe40
[11417] taskqueue_run_locked() at taskqueue_run_locked+0x1c7/frame 0xfe043133fec0
[11417] taskqueue_thread_loop() at taskqueue_thread_loop+0xd3/frame 0xfe043133fef0
---snip---
This is with a world from 2024-05-17-084543. Bye, Alexander.
Re: Graph of the FreeBSD memory fragmentation
On 2024-05-14 03:54, Ryan Libby wrote: That was a long-winded way of saying: the "UMA bucket" axis is actually "vm phys free list order". That said, I find that dimension confusing because in fact there's just one piece of information there, the average size of a free list entry, and it doesn't actually depend on the free list order. The graph could be 2D. It evolved into that... At first I had a 3-dimensional dataset, and the first try was to plot it as is (3D). The outcome (as points) was not as good as I wanted it to be, and plotting as lines gave the wrong direction of lines. I massaged the plotting instructions until it looked good enough. I did not try a 2D plot. I agree, with different colors for each free list order a 2D plot may work too. Whether a 2D plot is better than a 3D plot in this case depends on the mental model of the topic the viewer has. One size may not fit all. Feel free to experiment with other plotting styles. The paper that defines this fragmentation index also says that "the fragmentation index is only meaningful when an allocation fails". Are you actually seeing any contiguous allocation failures in your measurements? I'm not aware of such. The index may only be meaningful for the purposes of the goal of the paper when there are such failures, but if you look at the graph and how it changed when Bojan changed the guard pages, I see value in the graph for more than what the paper suggests. Without that context, it seems like what the proposed sysctl reports is indirectly just the average size of free list entries. We could just report that. The calculation of the value is part of a bigger picture. The value returned is used by some other code to make decisions. Bye, Alexander.
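As an illustration of the 2D alternative discussed above, a sketch of producing such a plot with gnuplot from a shell script; the three-column data layout (sample, free list order, fragmentation index) is an assumption, not the format actually used for the blog graphs:
---snip---
#!/bin/sh
# 2D plot: fragmentation index over time, one colored line per order
gnuplot <<'EOF'
set terminal png size 1200,600
set output "frag2d.png"
set xlabel "sample"
set ylabel "fragmentation index"
# assumed input: frag.dat with columns "sample order index"
plot for [o=0:12] "frag.dat" using 1:($2==o ? $3 : 1/0) title sprintf("order %d", o) with lines
EOF
---snip---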
Re: Graph of the FreeBSD memory fragmentation
On 2024-05-08 18:45, Bojan Novković wrote: Hi, On 5/7/24 14:02, Alexander Leidinger wrote: Hi, I created some graphs of the memory fragmentation. https://www.leidinger.net/blog/2024/05/07/plotting-the-freebsd-memory-fragmentation/ My goal was not comparing a specific change on a given benchmark, but to "have something which visualizes memory fragmentation". As part of that, Bojan's commit https://cgit.freebsd.org/src/commit/?id=7a79d066976149349ecb90240d02eed0c4268737 was just in the middle of my data collection. I have the impression that it made a positive difference in my non-deterministic workload. Thank you for working on this, the plots look great! They provide a really clean visual overview of what's happening. I'm working on another type of memory visualization which might interest you, I'll share it with you once it's done. One small nit - the fragmentation index does not quantify fragmentation for UMA buckets, but for page allocator freelists. Do I get it more correctly now: UMA buckets are type/structure-specific allocation lists, and the page allocator freelists are size-specific allocation lists (which are used by UMA when no free item is available in a bucket)? Is there anything which prevents https://reviews.freebsd.org/D40575 from being committed? D40575 is closely tied to the compaction patch (D40772), which is currently on hold until another issue is solved (see D45046 and related revisions for more details). Any idea about https://reviews.freebsd.org/D16620 ? Is D45046 supposed to replace this, or is it about something else? I wanted to try D16620, but it doesn't apply, and my naive/mechanical way of applying it panics. I didn't consider landing D40575 because of that, but I guess it could be useful on its own. It at least gives a way to quantify the fragmentation numerically and to visualize it qualitatively. And as such it may help in visualizing differences like with your guard-pages commit. I wonder if the segregation of nofree allocations may result in a similar improvement for long-running systems. Bye, Alexander.
Graph of the FreeBSD memory fragmentation
Hi, I created some graphs of the memory fragmentation. https://www.leidinger.net/blog/2024/05/07/plotting-the-freebsd-memory-fragmentation/ My goal was not comparing a specific change on a given benchmark, but to "have something which visualizes memory fragmentation". As part of that, Bojan's commit https://cgit.freebsd.org/src/commit/?id=7a79d066976149349ecb90240d02eed0c4268737 was just in the middle of my data collection. I have the impression that it made a positive difference in my non-deterministic workload. Is there anything which prevents https://reviews.freebsd.org/D40575 from being committed? Maybe some other people want to have a look at the memory fragmentation and some of Bojan's work (https://wiki.freebsd.org/SummerOfCode2023Projects/PhysicalMemoryAntiFragmentationMechanisms). Bye, Alexander.
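For readers who want to look at the raw data behind such graphs: the physical memory free lists that the fragmentation metric is computed from can already be inspected on a stock kernel, e.g.:
---snip---
# dump the buddy allocator free lists (per domain, pool and order)
sysctl vm.phys_free

# related overview of the physical memory segments
sysctl vm.phys_segs
---snip---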
Re: Strange network/socket anomalies since about a month
On 2024-04-22 18:12, Gleb Smirnoff wrote: There were several preparatory commits that were not reverted and one of them had a bug. The bug manifested itself as failure to send(2) zero bytes over unix/stream. It was fixed with e6a4b57239dafc6c944473326891d46d966c0264. Can you please check you have this revision? Other than that there are no known bugs left. Yes, I have this fix in my running kernel. A> Any ideas how to track this down more easily than running the entire A> poudriere in ktrace (e.g. a hint/script which dtrace probes to use)? I don't have any better idea than ktrace over the failing application. Yep, I understand that poudriere will produce a lot. But first we need to determine what syscall fails and on what type of socket. After that we can scope down to using dtrace on very particular functions. Yes, it does produce a lot: 4.4 GB just for the start of poudriere until the first package build fails due to a failed sccache start (luckily in the first builder, but I had at least 2 builders automatically spun up by poudriere at the time when I validated the failure in the logs and disabled the tracing). I'm not sure I managed to find the cause of the failure... the only thing which remotely looks like an issue is "Resource temporarily unavailable", but this is from the process which waits for the server to have started:
---snip---
58406 sccache 1713947887.504834367 RET __sysctl 0
58406 sccache 1713947887.505521884 CALL rfork(0x8000<>2147483648)
58406 sccache 1713947887.50575 CAP system call not allowed: rfork
58406 sccache 1713947887.505774176 RET rfork 58426/0xe43a
58406 sccache 1713947887.507304865 CALL compat11.kevent(0x3,0x371d360f89e8,0x2,0x371d360f89e8,0x2,0)
58406 sccache 1713947887.507657906 STRU struct freebsd11_kevent[] = { { ident=11, filter=EVFILT_READ, flags=0x61, fflags=0, data=0, udata=0x0 } { ident=11, filter=EVFILT_WRITE, flags=0x61, fflags=0, data=0, udata=0x0 } }
58406 sccache 1713947887.507689980 STRU struct freebsd11_kevent[] = { { ident=11, filter=EVFILT_READ, flags=0x4000, fflags=0, data=0, udata=0x0 } { ident=11, filter=EVFILT_WRITE, flags=0x4000, fflags=0, data=0, udata=0x0 } }
58406 sccache 1713947887.507977155 RET compat11.kevent 2
58406 sccache 1713947887.508015751 CALL write(0x5,0x371515685bcc,0x1)
58406 sccache 1713947887.508086434 GIO fd 5 wrote 1 byte 0x 01 |.|
58406 sccache 1713947887.508145930 RET write 1
58406 sccache 1713947887.508183140 CALL compat11.kevent(0x7,0,0,0x5a5689ab0c40,0x400,0)
58406 sccache 1713947887.508396614 STRU struct freebsd11_kevent[] = { }
58406 sccache 1713947887.508156537 STRU struct freebsd11_kevent[] = { { ident=4, filter=EVFILT_READ, flags=0x60, fflags=0, data=0x1, udata=0x } }
58406 sccache 1713947887.508530888 RET compat11.kevent 1
58406 sccache 1713947887.508563736 CALL read(0x4,0x371d3a2887c0,0x80)
58406 sccache 1713947887.508729102 GIO fd 4 read 1 byte 0x 01 |.|
58406 sccache 1713947887.508967661 RET read 1
58406 sccache 1713947887.508996753 CALL read(0x4,0x371d3a2887c0,0x80)
58406 sccache 1713947887.509028311 RET read -1 errno 35 Resource temporarily unavailable
58406 sccache 1713947887.509068838 CALL compat11.kevent(0x3,0,0,0x5a5689a97540,0x400,0x371d3a2887c8)
..
58406 sccache 1713947897.514352552 CALL _umtx_op(0x5a5689a3d290,0x10,0x7fff,0,0)
58406 sccache 1713947897.514383653 RET _umtx_op 0
58406 sccache 1713947897.514421273 CALL write(0x5,0x371515685bcc,0x1)
58406 sccache 1713947897.515050967 STRU struct freebsd11_kevent[] = { { ident=4, filter=EVFILT_READ, flags=0x60, fflags=0, data=0x1, udata=0x } }
58406 sccache 1713947897.515146151 RET compat11.kevent 1
58406 sccache 1713947897.515178978 CALL read(0x4,0x371d3a2887c0,0x80)
58406 sccache 1713947897.515368070 GIO fd 4 read 1 byte 0x 01 |.|
58406 sccache 1713947897.515396600 RET read 1
58406 sccache 1713947897.515426523 CALL read(0x4,0x371d3a2887c0,0x80)
58406 sccache 1713947897.515457073 RET read -1 errno 35 Resource temporarily unavailable
58406 sccache 1713947897.515004494 GIO fd 5 wrote 1 byte 0x 01 |.|
---snip---
https://www.leidinger.net/test/sccache.tar.bz2 contains the parts of the trace of the sccache processes (in case someone wants to have a look). The poudriere run had several builders in parallel; at least 2 were running at that point in time. What the overlay does is start (sccache --start-server) the sccache server process (which forks to return control to the command line), which creates a filesystem socket, and then it queries the stats (sccache --show-stats). So some of the traces in the tarball are the server start (those with "Timed
Strange network/socket anomalies since about a month
Hi, I have seen a higher failure rate of socket/network related stuff for a while. Those failures are transient. Directly executing the same thing again may or may not result in success/failure. I'm not able to reproduce this at will. Sometimes they show up. Examples: - poudriere runs with the sccache overlay (like ccache, but it also works for rust) sometimes fail to create the communication socket, and as such the build fails. I have 3 different poudriere bulk runs after each other in my build script, and when the first one fails, the second and third still run. If the first fails due to the sccache issue, the second and third may or may not fail. Sometimes the first fails and the rest is OK. Sometimes all fail, and if I then run one by hand it works (the script does the same as the manual run, the script is simply a "for type in A B C; do; poudriere bulk -O sccache -j $type -f ${type}.pkglist; done" which I execute from the same shell, and the script doesn't do any env-sanitizing). - A webmail interface (inet / local net -> nginx (rev-proxy) -> nginx (webmail service) -> php -> imap) sees intermittent issues sometimes. Opening the same email directly again afterwards normally works. I've also seen transient issues with pgp signing (webmail interface -> gnupg / gpg-agent on the server); simply hitting send again after a failure works fine. Gleb, could this be related to the socket stuff you did 2 weeks ago? My world is from 2024-04-17-112537. I have noticed this since at least then, but I'm not sure if the failures were there before that and I simply didn't notice them. They are surely new recently; I hadn't seen that amount of issues in January. The last two updates of current I did before the last one were on 2024-03-31-120210 and 2024-04-08-112551. I could also imagine that some memory-related transient failure could cause this, but with >3 GB free I do not expect this. Important here may be that I have https://reviews.freebsd.org/D40575 in my tree, which is memory related, but it's only a metric to quantify memory fragmentation. Any ideas how to track this down more easily than running the entire poudriere in ktrace (e.g. a hint/script which dtrace probes to use)? Bye, Alexander.
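As a starting point for the dtrace question above, a sketch of a one-liner that logs socket-related syscalls returning an error, system-wide, so the failing call and process can be spotted without ktrace'ing all of poudriere; the syscall list is an assumption about where the failure might sit:
---snip---
#!/bin/sh
dtrace -n '
syscall::socket:return,
syscall::connect:return,
syscall::bind:return,
syscall::listen:return,
syscall::sendto:return,
syscall::recvfrom:return
/ errno != 0 /
{
    printf("%s[%d] %s failed, errno=%d", execname, pid, probefunc, errno);
}'
---snip---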
Re: Multiple issues with current (kldload failures, missing CTF stuff, pty issues, ...)
On 2024-03-29 18:21, Alexander Leidinger wrote: On 2024-03-29 18:13, Mark Johnston wrote: On Fri, Mar 29, 2024 at 04:52:55PM +0100, Alexander Leidinger wrote: Hi, sources from 2024-03-11 work. Sources from 2024-03-25 and today don't work (see below for the issue). As the monthly stabilisation pass didn't find obvious issues, it is something related to my setup:
- not a generic kernel
- very modular kernel (as much as possible as a module)
- bind_now (a build without fails too, tested with clean /usr/obj)
- ccache (a build without fails too, tested with clean /usr/obj)
- kernel retpoline (build without in progress)
- userland retpoline (build without in progress)
- kernel build with WITH_CTF / DDB_CTF (next one to test if it isn't retpoline)
- -fno-builtin
- CPUFLAGS=native (except for stuff in /usr/src/sys/boot)
- malloc production
- COPTFLAGS= -O2 -pipe
The issue is that kernel modules load OK from the loader, but once it starts init, any module fails to load (e.g. via autodetection of hardware or rc.conf kld_list) with the message that the kernel and module versions are out of sync, and the module refuses to load. What is the exact revision you're running? There were some unrelated changes to the kernel linker around the same time. The working src is from 2024-03-11-094351 (GMT+0100). The failing src was fetched after Gleb's stabilization week message (and today's src before the sound stuff still fails). Retpoline wasn't the cause, next test is the CTF stuff in the kernel... A rather obscure problem was causing this. The "last" BE had canmount set to "on" instead of "noauto". No idea how this happened, but this resulted in the "last" BE being mounted on "zfs mount -a" on top of the current BE. This means that all modules loaded after the zfs rc script had run were old kernel modules, and the error message about the kernel version mismatch was correct. I found the issue while bisecting the tree: suddenly the error message went away, but the new issue of missing dev entries popped up (/dev was mounted correctly on the booting dataset, but the last BE was mounted on top of it and /dev went empty...). It looks to me like bectl was doing this (from "zpool history")...
2024-03-11.14:16:31 zpool set bootfs=rpool/ROOT/2024-03-11-094351 rpool
2024-03-11.14:16:31 zfs set canmount=noauto rpool/ROOT/2024-01-18-092730
2024-03-11.14:16:31 zfs set canmount=noauto rpool/ROOT/2024-02-10-144617
2024-03-11.14:16:32 zfs set canmount=noauto rpool/ROOT/2024-02-11-212006
2024-03-11.14:16:32 zfs set canmount=noauto rpool/ROOT/2024-02-16-082836
2024-03-11.14:16:32 zfs set canmount=noauto rpool/ROOT/2024-02-24-140211
2024-03-11.14:16:32 zfs set canmount=noauto rpool/ROOT/2024-02-24-140211_ok
2024-03-11.14:16:33 zfs set canmount=on rpool/ROOT/2024-03-11-094351
2024-03-11.14:16:33 zfs promote rpool/ROOT/2024-03-11-094351
2024-03-11.14:17:03 zfs destroy -r rpool/ROOT/2024-02-24-140211_ok
I surely didn't do the "zfs set canmount=..." for those by hand. Bye, Alexander.
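A quick way to audit and repair the situation described above, sketched for the dataset layout from the mail (rpool/ROOT; adjust for other pool names):
---snip---
# show canmount for all boot environments; everything except the
# active BE should be "noauto", or "zfs mount -a" will stack mounts
zfs get -r -t filesystem canmount rpool/ROOT

# fix a stray BE
zfs set canmount=noauto rpool/ROOT/2024-01-18-092730
---snip---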
Re: Multiple issues with current (kldload failures, missing CTF stuff, pty issues, ...)
On 2024-03-29 18:13, Mark Johnston wrote: On Fri, Mar 29, 2024 at 04:52:55PM +0100, Alexander Leidinger wrote: Hi, sources from 2024-03-11 work. Sources from 2024-03-25 and today don't work (see below for the issue). As the monthly stabilisation pass didn't find obvious issues, it is something related to my setup:
- not a generic kernel
- very modular kernel (as much as possible as a module)
- bind_now (a build without fails too, tested with clean /usr/obj)
- ccache (a build without fails too, tested with clean /usr/obj)
- kernel retpoline (build without in progress)
- userland retpoline (build without in progress)
- kernel build with WITH_CTF / DDB_CTF (next one to test if it isn't retpoline)
- -fno-builtin
- CPUFLAGS=native (except for stuff in /usr/src/sys/boot)
- malloc production
- COPTFLAGS= -O2 -pipe
The issue is that kernel modules load OK from the loader, but once it starts init, any module fails to load (e.g. via autodetection of hardware or rc.conf kld_list) with the message that the kernel and module versions are out of sync, and the module refuses to load. What is the exact revision you're running? There were some unrelated changes to the kernel linker around the same time. The working src is from 2024-03-11-094351 (GMT+0100). The failing src was fetched after Gleb's stabilization week message (and today's src before the sound stuff still fails). Retpoline wasn't the cause, next test is the CTF stuff in the kernel... I tried the workaround of loading the modules from the loader, which works, but then I can't log in remotely as ssh fails to allocate a pty. By loading modules via the loader, I can see messages about missing CTF info when the nvidia modules (from ports = not yet rebuilt = in /boot/modules/...ko instead of /boot/kernel/...ko) try to get initialised... and it looks like they are failing to get initialised because of this missing CTF stuff (I'm back to the previous boot env to be able to log in remotely and send mails; I don't have a copy of the failure message at hand). I assume the missing CTF stuff is due to the CTF-based pretty printing (https://cgit.freebsd.org/src/commit/?id=c21bc6f3c2425de74141bfee07b609bf65b5a6b3). Is this supposed to fail to load modules which are compiled without CTF data? Shouldn't this work gracefully (e.g. spit out a warning that pretty printing is not available for module X and have the module working)? From my reading of linker_ctf_load_file(), this is exactly how it already works. Great that it works this way; I still suggest printing a message explaining what the warning about the missing stuff means. Bye, Alexander.
Multiple issues with current (kldload failures, missing CTF stuff, pty issues, ...)
Hi, sources from 2024-03-11 work. Sources from 2024-03-25 and today don't work (see below for the issue). As the monthly stabilisation pass didn't find obvious issues, it is something related to my setup:
- not a generic kernel
- very modular kernel (as much as possible as a module)
- bind_now (a build without fails too, tested with clean /usr/obj)
- ccache (a build without fails too, tested with clean /usr/obj)
- kernel retpoline (build without in progress)
- userland retpoline (build without in progress)
- kernel build with WITH_CTF / DDB_CTF (next one to test if it isn't retpoline)
- -fno-builtin
- CPUFLAGS=native (except for stuff in /usr/src/sys/boot)
- malloc production
- COPTFLAGS= -O2 -pipe
The issue is that kernel modules load OK from the loader, but once it starts init, any module fails to load (e.g. via autodetection of hardware or rc.conf kld_list) with the message that the kernel and module versions are out of sync, and the module refuses to load. I tried the workaround of loading the modules from the loader, which works, but then I can't log in remotely as ssh fails to allocate a pty. By loading modules via the loader, I can see messages about missing CTF info when the nvidia modules (from ports = not yet rebuilt = in /boot/modules/...ko instead of /boot/kernel/...ko) try to get initialised... and it looks like they are failing to get initialised because of this missing CTF stuff (I'm back to the previous boot env to be able to log in remotely and send mails; I don't have a copy of the failure message at hand). I assume the missing CTF stuff is due to the CTF-based pretty printing (https://cgit.freebsd.org/src/commit/?id=c21bc6f3c2425de74141bfee07b609bf65b5a6b3). Is this supposed to fail to load modules which are compiled without CTF data? Shouldn't this work gracefully (e.g. spit out a warning that pretty printing is not available for module X and have the module working)? Next steps:
- try a world without retpoline (bind_now and ccache active)
- try a kernel without CTF (bind_now, ccache, retpoline active)
- try a world without bind_now, retpoline, CTF, CPUFLAGS, COPTFLAGS
If anyone has an idea how to debug this in some other way... Bye, Alexander.
Re: Reason why "nocache" option is not displayed in "mount"?
On 2024-03-10 22:57, Konstantin Belousov wrote: We are already low on free bits in the flags, even after expanding them to 64 bits. More, there are useful common fs services continuously consuming these flags, e.g. the recent NFS TLS options. I object against using the flags for absolutely not important things, like this nullfs "cache" option. In the long term, we would have to export nmount(2) strings, since bits in flags are finite, but I prefer to delay it as much as possible. Why do you want to delay this? Personal priorities, or technical reasons? Bye, Alexander.
Re: Reason why "nocache" option is not displayed in "mount"?
On 2024-03-09 15:27, Rick Macklem wrote: On Sat, Mar 9, 2024 at 5:08 AM Alexander Leidinger wrote: On 2024-03-09 06:07, Warner Losh wrote:
> On Thu, Mar 7, 2024 at 1:05 PM Jamie Landeg-Jones
> wrote:
>
>> Alexander Leidinger wrote:
>>
>>> Hi,
>>>
>>> what is the reason why "nocache" is not displayed in the output of
>>> "mount" for nullfs options?
>>
>> Good catch. I also notice that "hidden" is not shown either.
>>
>> I guess that as for some time, "nocache" was a "secret" option, no-one
>> updated "mount" to display it?
>
> So a couple of things to know.
>
> First, there's a list of known options. These are converted to a
> bitmask. This is then decoded and reported by mount. The other strings
> are passed to the filesystem directly. They decode it and do things,
> but they don't export them (that I can find). I believe that's why they
> aren't reported with 'mount'. There's a couple of other options in
> /etc/fstab that are pseudo options too.
That's the technical explanation why it doesn't work. I'm a step further since the initial mail: I even had a look at the code and know that nocache is recorded in a nullfs-private flag and that the userland cannot access this (mount looks at struct statfs, which doesn't provide info about this and some other things). My question was targeted more in the direction of whether there is a conceptual reason, or if it was an oversight that it is not displayed. I admit that this was lost in translation... Regarding the issue of not being able to see all options which are in effect for a given mount point (not specific to nocache): I consider this to be a bug. Pseudo options like "late" or "noauto" in fstab, which don't make sense when you mount(8) a FS by hand, I do not consider here. As a data point, I added the "-m" option to nfsstat(1) so that all the nfs related options get displayed. Part of the problem is that this will be file system specific, since nmount() defers processing options to the file systems. There exist values for a lot of the mount options which are not displayed. For example, the nocache option for nullfs is MNTK_NULL_NOCACHE in https://cgit.freebsd.org/src/tree/sys/sys/mount.h#n515 This may not be useable as is, but I use it to show that there are already bits public about it, just not in the proper place to be useful to the userland. Even FS-specific options could be set as part of statfs (by letting the FS set them in struct statfs). Or there could be a per-mount callback / ioctl / whatever which provides the options in some way to the userland if requested. So we either have something which could be used but requires some interface to let a FS set a value somewhere, or, if this is too gross a hack, we would need to come up with a new interface to query this info. Bye, Alexander.
Re: Reason why "nocache" option is not displayed in "mount"?
On 2024-03-09 06:07, Warner Losh wrote: On Thu, Mar 7, 2024 at 1:05 PM Jamie Landeg-Jones wrote: Alexander Leidinger wrote: Hi, what is the reason why "nocache" is not displayed in the output of "mount" for nullfs options? Good catch. I also notice that "hidden" is not shown either. I guess that as for some time, "nocache" was a "secret" option, no-one updated "mount" to display it? So a couple of things to know. First, there's a list of known options. These are converted to a bitmask. This is then decoded and reported by mount. The other strings are passed to the filesystem directly. They decode it and do things, but they don't export them (that I can find). I believe that's why they aren't reported with 'mount'. There's a couple of other options in /etc/fstab that are pseudo options too. That's the technical explanation why it doesn't work. I'm a step further since the initial mail: I even had a look at the code and know that nocache is recorded in a nullfs-private flag and that the userland cannot access this (mount looks at struct statfs, which doesn't provide info about this and some other things). My question was targeted more in the direction of whether there is a conceptual reason, or if it was an oversight that it is not displayed. I admit that this was lost in translation... Regarding the issue of not being able to see all options which are in effect for a given mount point (not specific to nocache): I consider this to be a bug. Pseudo options like "late" or "noauto" in fstab, which don't make sense when you mount(8) a FS by hand, I do not consider here. I'm not sure if this warrants a bug tracker item (which maybe nobody is interested in taking ownership of), or if we need to extend the man pages with info about which options will not be displayed in the output for mounted FSes, or both. Bye, Alexander.
Re: Reason why "nocache" option is not displayed in "mount"?
On 2024-03-07 14:59, Christos Chatzaras wrote: what is the reason why "nocache" is not displayed in the output of "mount" for nullfs options?
# grep packages /etc/fstab.commit_leidinger_net
/shared/ports/packages /space/jails/commit.leidinger.net/shared/ports/packages nullfs rw,noatime,nocache 0 0
# mount | grep commit | grep packages
/shared/ports/packages on /space/jails/commit.leidinger.net/shared/ports/packages (nullfs, local, noatime, noexec, nosuid, nfsv4acls)
Context: I wanted to check if poudriere is mounting with or without "nocache", and instead of reading the source I wanted to do it more quickly by looking at the mount options. In my setup, I mount the /home directory using nullfs with the nocache option to facilitate access for certain jails. The primary reason for employing nocache is due to the implementation of ZFS quotas on the main system, which do not accurately reflect changes in file usage by users within the jail unless nocache is used. When files are added or removed by a user within the jail, their disk usage wasn't properly updated on the main system until I started using nocache. Based on this experience, I'm confident that applying nocache works as expected in your scenario as well. It does. The question is: how do I _see_ that a mount point is _set up_ with nocache? In the above example the FS _is_ mounted with nocache, but it is _not displayed_ in the output. Bye, Alexander.
Reason why "nocache" option is not displayed in "mount"?
Hi, what is the reason why "nocache" is not displayed in the output of "mount" for nullfs options?
# grep packages /etc/fstab.commit_leidinger_net
/shared/ports/packages /space/jails/commit.leidinger.net/shared/ports/packages nullfs rw,noatime,nocache 0 0
# mount | grep commit | grep packages
/shared/ports/packages on /space/jails/commit.leidinger.net/shared/ports/packages (nullfs, local, noatime, noexec, nosuid, nfsv4acls)
Context: I wanted to check if poudriere is mounting with or without "nocache", and instead of reading the source I wanted to do it more quickly by looking at the mount options. Bye, Alexander.
Re: February 2024 stabilization week
On 2024-02-24 21:18, Konstantin Belousov wrote: On Fri, Feb 23, 2024 at 08:34:21PM -0800, Gleb Smirnoff wrote: Hi FreeBSD/main users, the February 2024 stabilization week started with 03cc3489a02d, which was tagged as main-stabweek-2024-Feb. At the moment of the tag creation we already knew about several regressions caused by the libc/libsys split. In the stabilization branch stabweek-2024-Feb we accumulated the following cherry-picks from FreeBSD/main:
1) closefrom() syscall was failing unless you have COMPAT_FREEBSD12 in the kernel
99ea67573164637d633e8051eb0a5d52f1f9488e
eb90239d08863bcff3cf82a556ad9d89776cdf3f
2) nextboot -k broken on ZFS
3aefe6759669bbadeb1a24a8956bf222ce279c68
0c3ade2cf13df1ed5cd9db4081137ec90fcd19d0
3) libsys links to libc
baa7d0741b9a2117410d558c6715906980723eed
4) sleep(3) no longer being a pthread cancellation point
7d233b2220cd3d23c028bdac7eb3b6b7b2025125
We are aware of two regressions still unresolved:
1) libsys/rtld breaks bind 9.18 / mysql / java / ...
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=277222
Konstantin, can you please check me? Is this the same issue fixed by baa7d0741b9a2117410d558c6715906980723eed or a different one? Most likely. Since no useful diagnostic was provided, I cannot confirm. It is. And for the curious reader: this affected a world which was built with WITH_BIND_NOW (ports built with RELRO and BIND_NOW were unaffected, as long as the base system was not built with BIND_NOW). Bye, Alexander.
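For anyone unsure whether their own world is affected: whether a binary or library was linked with BIND_NOW can be checked in its ELF dynamic section, e.g. (a sketch; the pattern is meant to match both the DT_BIND_NOW tag and the NOW bits in DT_FLAGS/DT_FLAGS_1):
---snip---
readelf -d /usr/lib/libc.so.7 | grep -i now
---snip---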
Re: sanitizers broken (was RE: libc/libsys split coming soon)
On 2024-02-21 10:52, hartmut.bra...@dlr.de wrote: Hi, I updated yesterday and now even a minimal program with cc -fsanitize=address produces
---snip---
ld: error: undefined symbol: __elf_aux_vector
referenced by sanitizer_linux_libcdep.cpp:950 (/usr/src/contrib/llvm-project/compiler-rt/lib/sanitizer_common/sanitizer_linux_libcdep.cpp:950)
sanitizer_linux_libcdep.o:(__sanitizer::ReExec()) in archive /usr/lib/clang/17/lib/freebsd/libclang_rt.asan-x86_64.a
cc: error: linker command failed with exit code 1 (use -v to see invocation)
---snip---
I think this is caused by the libsys split. There are other issues too. Discussed in multiple places. I opened https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=277222 this morning, maybe it can be used to centralize the libsys issues (= I don't mind if you add a comment there, but maybe brooks wants to have a separate PR). Bye, Alexander.
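A minimal reproducer for the failure described above, as a sketch (any trivial C file should do):
---snip---
printf 'int main(void) { return 0; }\n' > min.c
cc -fsanitize=address min.c -o min
---snip---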
Re: segfault in ld-elf.so.1
On 2024-02-13 01:58, Konstantin Belousov wrote: On Mon, Feb 12, 2024 at 11:54:02AM +0200, Konstantin Belousov wrote: On Mon, Feb 12, 2024 at 10:35:56AM +0100, Alexander Leidinger wrote:
> Hi,
>
> dovecot (and no other program I use on this machine... at least not that I
> notice it) segfaults in ld-elf.so.1 after an update from 2024-01-18-092730
> to 2024-02-10-144617 (and now 2024-02-11-212006 in the hope the issue would
> have been fixed by changes to libc/libsys since 2024-02-10-144617). The
> issue shows up when I try to do an IMAP login. A successful authentication
> starts the imap process which immediately segfaults.
>
> I didn't recompile dovecot for the initial update, but I did now to rule
> out a regression in this area (and to get access via imap to my normal mail
> account).
>
> Backtrace:
The backtrace looks incomplete. It might be the case of infinite recursion, but I cannot claim it from the trace. Does the program segfault if you run it manually? No. If yes, please provide me with the tarball of the binary and all required shared libs, including base system libraries, from your machine. Regardless of my request, you might try the following. Note that I did not test the patch; ensure that you have a way to recover ld-elf.so.1 if something goes wrong. [inline patch] This did the trick and I have IMAP access to my emails again. As this runs in a jail, it was easy to test without fear of killing something. I will try the patch in the review next. Bye, Alexander.
kernel crash in tcp_subr.c:2386
Hi, I got a coredump with sources from 2024-02-10-144617 (GMT+0100):
---snip---
__curthread () at /space/system/usr_src/sys/amd64/include/pcpu_aux.h:57
57 __asm("movq %%gs:%P1,%0" : "=r" (td) : "n" (offsetof(struct pcpu,
(kgdb)
#0 __curthread () at /space/system/usr_src/sys/amd64/include/pcpu_aux.h:57 td =
#1 doadump (textdump=textdump@entry=1) at /space/system/usr_src/sys/kern/kern_shutdown.c:403 error = 0 coredump =
#2 0x8052fe85 in kern_reboot (howto=260) at /space/system/usr_src/sys/kern/kern_shutdown.c:521 once = 0 __pc =
#3 0x80530382 in vpanic ( fmt=0x808df476 "Assertion %s failed at %s:%d", ap=ap@entry=0xfe08a079ebf0) at /space/system/usr_src/sys/kern/kern_shutdown.c:973 buf = "Assertion !callout_active(>t_callout) failed at /space/system/usr_src/sys/netinet/tcp_subr.c:2386", '\000' __pc = __pc = __pc = other_cpus = {__bits = {14680063, 0 }} td = 0xf8068ef99740 bootopt = newpanic =
#4 0x805301d3 in panic (fmt=) at /space/system/usr_src/sys/kern/kern_shutdown.c:889 ap = {{gp_offset = 32, fp_offset = 48, overflow_arg_area = 0xfe08a079ec20, reg_save_area = 0xfe08a079ebc0}}
#5 0x806c9d8c in tcp_discardcb (tp=tp@entry=0xf80af441ba80) at /space/system/usr_src/sys/netinet/tcp_subr.c:2386 inp = 0xf80af441ba80 so = 0xf804d23d2780 m = isipv6 =
#6 0x806d6291 in tcp_usr_detach (so=0xf804d23d2780) at /space/system/usr_src/sys/netinet/tcp_usrreq.c:214 inp = 0xf80af441ba80 tp = 0xf80af441ba80
#7 0x805dba57 in sofree (so=0xf804d23d2780) at /space/system/usr_src/sys/kern/uipc_socket.c:1205 pr = 0x80a8bd18
#8 sorele_locked (so=so@entry=0xf804d23d2780) at /space/system/usr_src/sys/kern/uipc_socket.c:1232 No locals.
#9 0x805dc8c0 in soclose (so=0xf804d23d2780) at /space/system/usr_src/sys/kern/uipc_socket.c:1302 lqueue = {tqh_first = 0xf8068ef99740, tqh_last = 0xfe08a079ed40} error = 0 saved_vnet = 0x0 last = listening =
#10 0x804ccbd1 in fo_close (fp=0xf805f2dfc500, td=) at /space/system/usr_src/sys/sys/file.h:390 No locals.
#11 _fdrop (fp=fp@entry=0xf805f2dfc500, td=, td@entry=0xf8068ef99740) at /space/system/usr_src/sys/kern/kern_descrip.c:3666 count = error =
#12 0x804d02f3 in closef (fp=fp@entry=0xf805f2dfc500, td=td@entry=0xf8068ef99740) at /space/system/usr_src/sys/kern/kern_descrip.c:2839 _error = 0 _fp = 0xf805f2dfc500 lf = {l_start = -8791759350504, l_len = -8791759350528, l_pid = 0, l_type = 0, l_whence = 0, l_sysid = 0} vp = fdtol = fdp =
#13 0x804cd50c in closefp_impl (fdp=0xfe07afebf860, fd=19, fp=0xf805f2dfc500, td=0xf8068ef99740, audit=) at /space/system/usr_src/sys/kern/kern_descrip.c:1315 error =
#14 closefp (fdp=0xfe07afebf860, fd=19, fp=0xf805f2dfc500, td=0xf8068ef99740, holdleaders=true, audit=) at /space/system/usr_src/sys/kern/kern_descrip.c:1372 No locals.
#15 0x808597d6 in syscallenter (td=0xf8068ef99740) at /space/system/usr_src/sys/amd64/amd64/../../kern/subr_syscall.c:186 se = 0x80a48330 p = 0xfe07f29995c0 sa = 0xf8068ef99b30 error = sy_thr_static = traced =
#16 amd64_syscall (td=0xf8068ef99740, traced=0) at /space/system/usr_src/sys/amd64/amd64/trap.c:1192 ksi = {ksi_link = {tqe_next = 0xfe08a079ef30, tqe_prev = 0x808588af }, ksi_info = { si_signo = 1, si_errno = 0, si_code = 2015268872, si_pid = -512, si_uid = 2398721856, si_status = -2042, si_addr = 0xfe08a079ef40, si_value = {sival_int = -1602621824, sival_ptr = 0xfe08a079ee80, sigval_int = -1602621824, sigval_ptr = 0xfe08a079ee80}, _reason = {_fault = { _trapno = 1489045984}, _timer = {_timerid = 1489045984, _overrun = 17999}, _mesgq = {_mqd = 1489045984}, _poll = { _band = 77306605406688}, _capsicum = {_syscall = 1489045984}, __spare__ = {__spare1__ = 77306605406688, __spare2__ = { 1489814048, 17999, 208, 0, 0, 0, 992191072, ksi_flags = 975329968, ksi_sigq = 0x8082f8f3 }
#17 No locals.
#18 0x3af13b17fc9a in ?? () No symbol table info available. Backtrace stopped: Cannot access memory at address 0x3af13a225ab8
---snip---
Any ideas? Due to another issue in userland, I updated to 2024-02-11-212006, but I have the above mentioned version and core still in a BE if needed. Bye, Alexander.
segfault in ld-elf.so.1
Hi, dovecot (and no other program I use on this machine... at least not that I notice it) segfaults in ld-elf.so.1 after an update from 2024-01-18-092730 to 2024-02-10-144617 (and now 2024-02-11-212006, in the hope the issue would have been fixed by changes to libc/libsys since 2024-02-10-144617). The issue shows up when I try to do an IMAP login. A successful authentication starts the imap process, which immediately segfaults. I didn't recompile dovecot for the initial update, but I did now to rule out a regression in this area (and to get access via imap to my normal mail account). Backtrace:
---snip---
(lldb) target create "/usr/local/libexec/dovecot/imap" --core "/var/run/dovecot/imap.core"
Core file '/var/run/dovecot/imap.core' (x86_64) was loaded.
* thread #1, name = 'imap', stop reason = signal SIGSEGV
* frame #0: 0x4d3dfa2a4761 ld-elf.so.1`load_object [inlined] object_match_name(obj=0x49a47c203408, name="") at rtld.c:5606:6
frame #1: 0x4d3dfa2a4742 ld-elf.so.1`load_object(name="", fd_u=-1, refobj=0x49a47c228008, flags=0) at rtld.c:2704:10
frame #2: 0x4d3dfa2a3eaa ld-elf.so.1`dlopen_object(name="", fd=-1, refobj=0x49a47c228008, lo_flags=0, mode=1, lockstate=0x1ded0f98cb80) at rtld.c:3747:8
frame #3: 0x4d3dfa2a2274 ld-elf.so.1`symlook_obj [inlined] load_filtee1(obj=, needed=0x49a47c2007c8, flags=, lockstate=) at rtld.c:2576:16
frame #4: 0x4d3dfa2a2245 ld-elf.so.1`symlook_obj [inlined] load_filtees(obj=0x49a47c228008, flags=0, lockstate=0x1ded0f98cb80) at rtld.c:2589:2
frame #5: 0x4d3dfa2a223e ld-elf.so.1`symlook_obj(req=0x1ded011502e0, obj=0x49a47c228008) at rtld.c:4735:6
frame #6: 0x4d3dfa2a6992 ld-elf.so.1`symlook_list(req=0x1ded01150368, objlist=, dlp=0x1ded011504b0) at rtld.c:4637:13
frame #7: 0x4d3dfa2a680b ld-elf.so.1`symlook_global(req=0x1ded01150470, donelist=0x1ded011504b0) at rtld.c:4541:8
frame #8: 0x4d3dfa2a6673 ld-elf.so.1`get_program_var_addr(name=, lockstate=0x1ded0f98cb80) at rtld.c:4483:9
frame #9: 0x4d3dfa2a4374 ld-elf.so.1`dlopen_object [inlined] distribute_static_tls(list=0x1ded01150988, lockstate=0x1ded0f98cb80) at rtld.c:5908:6
frame #10: 0x4d3dfa2a4364 ld-elf.so.1`dlopen_object(name="", fd=-1, refobj=0x49a47c228008, lo_flags=0, mode=1, lockstate=0x1ded0f98cb80) at rtld.c:3831:6
frame #11: 0x4d3dfa2a2274 ld-elf.so.1`symlook_obj [inlined] load_filtee1(obj=, needed=0x49a47c2007c8, flags=, lockstate=) at rtld.c:2576:16
frame #12: 0x4d3dfa2a2245 ld-elf.so.1`symlook_obj [inlined] load_filtees(obj=0x49a47c228008, flags=0, lockstate=0x1ded0f98cb80) at rtld.c:2589:2
frame #13: 0x4d3dfa2a223e ld-elf.so.1`symlook_obj(req=0x1ded01150a80, obj=0x49a47c228008) at rtld.c:4735:6
frame #14: 0x4d3dfa2a6992 ld-elf.so.1`symlook_list(req=0x1ded01150b08, objlist=, dlp=0x1ded01150c50) at rtld.c:4637:13
frame #15: 0x4d3dfa2a680b ld-elf.so.1`symlook_global(req=0x1ded01150c10, donelist=0x1ded01150c50) at rtld.c:4541:8
frame #16: 0x4d3dfa2a6673 ld-elf.so.1`get_program_var_addr(name=, lockstate=0x1ded0f98cb80) at rtld.c:4483:9
frame #17: 0x4d3dfa2a4374 ld-elf.so.1`dlopen_object [inlined] distribute_static_tls(list=0x1ded01151128, lockstate=0x1ded0f98cb80) at rtld.c:5908:6
frame #18: 0x4d3dfa2a4364 ld-elf.so.1`dlopen_object(name="", fd=-1, refobj=0x49a47c228008, lo_flags=0, mode=1, lockstate=0x1ded0f98cb80) at rtld.c:3831:6
frame #19: 0x4d3dfa2a2274 ld-elf.so.1`symlook_obj [inlined] load_filtee1(obj=, needed=0x49a47c2007c8, flags=, lockstate=) at rtld.c:2576:16
frame #20: 0x4d3dfa2a2245 ld-elf.so.1`symlook_obj [inlined] load_filtees(obj=0x49a47c228008, flags=0, lockstate=0x1ded0f98cb80) at rtld.c:2589:2
frame #21: 0x4d3dfa2a223e ld-elf.so.1`symlook_obj(req=0x1ded01151220, obj=0x49a47c228008) at rtld.c:4735:6
frame #22: 0x4d3dfa2a6992 ld-elf.so.1`symlook_list(req=0x1ded011512a8, objlist=, dlp=0x1ded011513f0) at rtld.c:4637:13
frame #23: 0x4d3dfa2a680b ld-elf.so.1`symlook_global(req=0x1ded011513b0, donelist=0x1ded011513f0) at rtld.c:4541:8
frame #24: 0x4d3dfa2a6673 ld-elf.so.1`get_program_var_addr(name=, lockstate=0x1ded0f98cb80) at rtld.c:4483:9
frame #25: 0x4d3dfa2a4374 ld-elf.so.1`dlopen_object [inlined] distribute_static_tls(list=0x1ded011518c8, lockstate=0x1ded0f98cb80) at rtld.c:5908:6
frame #26: 0x4d3dfa2a4364 ld-elf.so.1`dlopen_object(name="", fd=-1, refobj=0x49a47c228008, lo_flags=0, mode=1, lockstate=0x1ded0f98cb80) at rtld.c:3831:6
frame #27: 0x4d3dfa2a2274 ld-elf.so.1`symlook_obj [inlined] load_filtee1(obj=, needed=0x49a47c2007c8, flags=, lockstate=) at rtld.c:2576:16
frame #28: 0x4d3dfa2a2245 ld-elf.so.1`symlook_obj [inlined] load_filtees(obj=0x49a47c228008, flags=0,
Re: noatime on ufs2
On 2024-01-30 01:21, Warner Losh wrote: On Mon, Jan 29, 2024 at 2:31 PM Olivier Certner wrote: It also seems undesirable to add a sysctl to control a value that the kernel doesn't use. The kernel has to use it to guarantee some uniform behavior irrespective of the mount being performed through mount(8) or by a direct call to nmount(2). I think this consistency is important. Perhaps all auto-mounters and mount helpers always run mount(8) and never deal with nmount(2), I would have to check (I seem to remember that, a long time ago, when nmount(2) was introduced as an enhancement over mount(2), the stance was that applications should use mount(8) and not nmount(2) directly). Even if there were no obvious callers of nmount(2), I would be a bit uncomfortable with this discrepancy in behavior. I disagree. I think Mike's suggestion was better and dealt with POLA and POLA breaking in a sane way. If the default is applied universally in user space, then we need not change the kernel at all. We lose all the chicken-and-egg problems and the non-linearness of the sysctl idea. I would like to add that a sysctl is some kind of a hidden setting, whereas /etc/fstab + /etc/defaults/fstab is a "right in the face" way of setting filesystem / mount related stuff. [...] It could also be generalized so that the FSTYPE could have different settings for different types of filesystem (maybe unique flags that some file systems don't understand). +1 nosuid for tmpfs comes to mind here... One could also put it in /etc/defaults/fstab too and not break POLA since that's the pattern we use elsewhere. +1 Anyway, I've said my piece. I agree with Mike that there's consensus for this from the installer, and after that the consensus falls away. Mike's idea is one that I can get behind since it elegantly solves the general problem. +1 Bye, Alexander.
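To make the /etc/defaults/fstab idea concrete, a hypothetical sketch of what per-FSTYPE default options could look like; this file and syntax do not exist today, they only illustrate the proposal (with noatime as the general default and nosuid added for tmpfs):
---snip---
# /etc/defaults/fstab (hypothetical): default mount options per FSTYPE
# FSTYPE   Options
ufs        noatime
zfs        noatime
tmpfs      noatime,nosuid
*          noatime
---snip---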
Re: Removing fdisk and bsdlabel (legacy partition tools)
On 2024-01-25 18:49, Rodney W. Grimes wrote: On Thu, Jan 25, 2024, 9:11 AM Ed Maste wrote:
> On Thu, 25 Jan 2024 at 11:00, Rodney W. Grimes wrote:
> > > These will need to be addressed before actually removing any of these
> > > binaries, of course.
> >
> > You seem to have missed /rescue. Now think about that long
> > and hard: these tools are classified as so important that they
> > are part of /rescue. Again, I cannot stress enough how often
> > I turn to these tools in a repair-mode situation.
>
> I haven't missed rescue, it is included in the work in progress I
> mentioned. Note that rescue has included gpart since 2007.
> What can fdisk and/or disklabel repair that gpart can't?
As far as I know there is no way in gpart to get to the MBR cyl/hd/sec values; you can only get to the LBA start and end values:
---snip---
sysid 165 (0xa5),(FreeBSD/NetBSD/386BSD)
start 63, size 8388513 (4095 Meg), flag 80 (active)
beg: cyl 0/ head 1/ sector 1;
end: cyl 1023/ head 15/ sector 63

gpart show ada0
=>      63  8388545  ada0  MBR  (4.0G)
        63  8388513     1  freebsd  [active]  (4.0G)
   8388576       32        - free -  (16K)
---snip---
What are you using cyl/hd/sec values for on a system which runs FreeBSD current, or on which you would have to use FreeBSD-current in case of a repair need? What is the disk hardware on those systems where you still need cyl/hd/sec and LBA doesn't work? Serious questions out of curiosity. Bye, Alexander.
Re: noatime on ufs2
Am 2024-01-15 00:08, schrieb Olivier Certner: Hi Warner, The consensus was we'd fix it in the installer. Isn't speaking about a "consensus", at least as a general response to the idea of making 'noatime' the default, a little premature? I have more to say on this topic (see below). Also, I would not dismiss Lyndon's last mail too quickly, and in particular the last paragraph. I'm as interested as he is about possible answers for it. We can't change ZFS easily, and discovery of the problem, should your assertion be wrong, for UFS means metadata loss that can't be recovered. Why would ZFS need changing? If you're referring to an earlier objection from Mark Millard, I don't think it stands, please see my response to him. Else, I don't know what you're referring to. ZFS by default has atime=on. It is our installer which sets atime=off in the ZFS properties. I understood Warner's comment about changing ZFS in the sense of changing the ZFS code to have a default of atime=off. I agree with Warner that we should not do that. And in my opinion we should keep the other filesystems which support atime/noatime consistent (those which don't support atime/noatime due to technical limitations don't count, in my opinion). By pushing to the installer, most installations get most of the benefits. And people with special needs see the issue and can make an informed choice. I agree for those who use the installer. But I'm not sure which proportion of users they represent, and especially of those who care about disabling access times. As for me, I don't think I have used the installer in the past 10 years (to be safe). Is this such an atypical behavior? I haven't used an installer myself in even longer (either I was creating a new system by attaching a disk and prepping it from an existing system, or by creating an image and transferring it to the target over the network). But I would say this is atypical behavior by people who know exactly what they are doing, not what a normal consumer would do. Such experts know exactly what they want to do with atime and handle it as needed. Additionally, the installer doesn't cover other use cases:
- Mounting filesystems by hand on USB keys or other removable media (which are not in '/etc/fstab'). This causes users to have to remember to add 'noatime' on the command line. Those who care about that and know where it makes a difference have it in their finger-memory.
- Using auto-mounters. They have to be configured to use 'noatime'. If our automounter is not able to handle that, it is a bug / missing feature we can change. Personally I would have no objection to changing the automounter config to mount with noatime (if specifying noatime for a FS which doesn't support atime/noatime doesn't create failures; see the sketch after this message).
- Desktop environments shipping a mount helper. Again, they have to be configured, if at all possible. If they are not able to handle that, it is a bug. Typical media in desktop use cases doesn't really need this. If you handle media which really _needs_ noatime in such a case, you may want to reconsider your way of operating.
So limiting action to the installer, while certainly a sensible and pragmatic step, still seems to miss a lot. Nobody said we should limit this action to the installer only. The pragmatic part here is to ask if it really matters for those use cases. For mounting by hand I disagree that it matters.
For our automounter we should do something (at least making sure it is able to handle it, and if we don't want to switch the default, at least have a commented-out entry in the config with a suitable comment). For the desktop helpers it is not our responsibility, but interested people surely can file a bug report upstream. Though in all honesty, I've never been able to measure a speed difference. Nor have I worn out an SSD due to the tiny increase in write amplification. Old (<100MB) SD and CF cards included. This includes my armv7-based DNS server that I ran for a decade on a 256MB SD card with no special settings and full read/write and lots of logging. So the harm is typically minimal. I'm sure there are cases where it matters more than in my experience. And it is good practice to enable noatime. Just that failure to do so typically has only a marginal effect. It seemed to make a difference on slow USB keys (well, not just evenly slow, but ones which could exhibit strange pauses while writing), but I didn't gather enough hard data to call that "scientific". I sometimes manage to saturate M2 SSD I/O bandwidth, but then I always use 'noatime', so I am not sure how much of a difference it makes. The "updatedb" scenario that runs 'find' causes access time updates for all directories, causing spikes in the number of writes which may affect the response time during the process. That said, it is only run once a week by default. I would say that most of the value of having 'noatime' the default is in [...]
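For the automounter point above, a sketch of what this could look like with our autofs(8), assuming the stock -media special map entry in /etc/auto_master (which, as far as I recall, ships with -nosuid only); treat this as an illustration, not a tested recommendation:

---snip---
# /etc/auto_master: append noatime to the options of the media map
/media  -media  -nosuid,noatime
---snip---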
Re: noatime on ufs2
Am 2024-01-11 18:15, schrieb Rodney W. Grimes: Am 2024-01-10 22:49, schrieb Mark Millard: > I never use atime, always noatime, for UFS. That said, I'd never > propose > changing the long standing defaults for commands and calls. I'd avoid: [good points I fully agree on] There's one possibility which nobody talked about yet... changing the default to noatime at install time in fstab / zfs set. Perhaps you should take a closer look at what bsdinstall does when it creates a zfs install pool and boot environment, you might just find that noatime is already set everywhere but on /var/mail:
---snip---
/usr/libexec/bsdinstall/zfsboot:: ${ZFSBOOT_POOL_CREATE_OPTIONS:=-O compress=lz4 -O atime=off}
/usr/libexec/bsdinstall/zfsboot:/var/mail atime=on
---snip---
While zfs is a part of what I talked about, it is not the complete picture. bsdinstall covers UFS and ZFS, and we should keep them in sync in this regard. Ideally with an option the user can modify. Personally I don't mind if the default setting for this option would be noatime. A quick search in the scripts of bsdinstall didn't reveal to me what we use for UFS. I assume we use atime. I fully agree not to violate POLA by changing the default to noatime in any FS. I always set noatime everywhere on systems I take care of, no exceptions (any user-visible mail is handled via maildir/IMAP, not mbox). I haven't made up my mind if it would be a good idea to change bsdinstall to set noatime (after asking the user about it, and later maybe offer the possibility to use relatime in case it gets implemented). I think it is at least worthwhile to discuss this possibility (including what the default setting of bsdinstall should be for this option). Little late... iirc it's been that way since day one of zfs support in bsdinstall. Which I don't mind, as this is what I use anyway. But the correct way would be to let the user decide. Bye, Alexander. -- http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF http://www.FreeBSD.orgnetch...@freebsd.org : PGP 0x8F31830F9F2772BF signature.asc Description: OpenPGP digital signature
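For reference, checking and changing this after installation is a one-liner per dataset; a sketch assuming the default zroot pool name used by bsdinstall:

---snip---
zfs get -r atime zroot           # see what bsdinstall configured
zfs set atime=off zroot          # inherited by all datasets below
zfs set atime=on zroot/var/mail  # keep atime where mbox handling needs it
---snip---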
Re: noatime on ufs2
Am 2024-01-10 22:49, schrieb Mark Millard: I never use atime, always noatime, for UFS. That said, I'd never propose changing the long standing defaults for commands and calls. I'd avoid: [good points I fully agree on] There's one possibility which nobody talked about yet... changing the default to noatime at install time in fstab / zfs set. I fully agree not to violate POLA by changing the default to noatime in any FS. I always set noatime everywhere on systems I take care of, no exceptions (any user-visible mail is handled via maildir/IMAP, not mbox). I haven't made up my mind if it would be a good idea to change bsdinstall to set noatime (after asking the user about it, and later maybe offer the possibility to use relatime in case it gets implemented). I think it is at least worthwhile to discuss this possibility (including what the default setting of bsdinstall should be for this option). Bye, Alexander. -- http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF http://www.FreeBSD.orgnetch...@freebsd.org : PGP 0x8F31830F9F2772BF signature.asc Description: OpenPGP digital signature
Re: ZFS problems since recently ?
Am 2024-01-02 08:22, schrieb Kurt Jaeger: Hi! The sysctl for block cloning is vfs.zfs.bclone_enabled. To check if a pool has made use of block cloning: zpool get all poolname | grep bclone One more thing: I have two pools on that box, and one of them has some bclone files:
---snip---
# zpool get all ref | grep bclone
ref  bcloneused   21.8M  -
ref  bclonesaved  24.4M  -
ref  bcloneratio  2.12x  -
# zpool get all pou | grep bclone
pou  bcloneused   0      -
pou  bclonesaved  0      -
pou  bcloneratio  1.00x  -
---snip---
The ref pool contains the system and some files. The pou pool is for poudriere only. How do I find which files on ref are bcloned, and how can I remove the bcloning from them? No idea about the detection (I don't expect an easy way), but the answer to the second part is to copy the files after disabling block cloning. As this is system stuff, I would expect it is not much data, and you could copy everything and then move it back to the original place. I would also assume original log files are not affected, and only files which were copied (installworld or installkernel or backup files or manual copies or port install (not sure about pkg install)) are possible targets. Bye, Alexander. -- http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF http://www.FreeBSD.orgnetch...@freebsd.org : PGP 0x8F31830F9F2772BF signature.asc Description: OpenPGP digital signature
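A sketch of the copy-it-back approach for a single known file (the detection part remains unsolved; the path is a placeholder, and the bcloneused counter only shrinks once the old blocks are actually freed):

---snip---
sysctl vfs.zfs.bclone_enabled=0   # make sure the new copy is not cloned again
cp -p /some/file /some/file.tmp && mv /some/file.tmp /some/file
---snip---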
Re: bridge(4) and IPv6 broken?
Am 2024-01-02 00:40, schrieb Lexi Winter: hello, i'm having an issue with bridge(4) and IPv6, with a configuration which is essentially identical to a working system running releng/14.0. ifconfig:
---snip---
lo0: flags=1008049 metric 0 mtu 16384
	options=680003
	inet 127.0.0.1 netmask 0xff00
	inet6 ::1 prefixlen 128
	inet6 fe80::1%lo0 prefixlen 64 scopeid 0x1
	groups: lo
	nd6 options=21
pflog0: flags=1000141 metric 0 mtu 33152
	options=0
	groups: pflog
alc0: flags=1008943 metric 0 mtu 1500
	options=c3098
	ether 30:9c:23:a8:89:a0
	inet6 fe80::329c:23ff:fea8:89a0%alc0 prefixlen 64 scopeid 0x3
	media: Ethernet autoselect (1000baseT )
	status: active
	nd6 options=1
wg0: flags=10080c1 metric 0 mtu 1420
	options=8
	inet 172.16.145.21 netmask 0x
	inet6 fd00:0:1337:cafe:::829a:595e prefixlen 128
	groups: wg
	tunnelfib: 1
	nd6 options=101
bridge0: flags=1008843 metric 0 mtu 1500
	options=0
	ether 58:9c:fc:10:ff:b6
	inet 10.1.4.101 netmask 0xff00 broadcast 10.1.4.255
	inet6 2001:8b0:aab5:104:3::101 prefixlen 64
	id 00:00:00:00:00:00 priority 32768 hellotime 2 fwddelay 15
	maxage 20 holdcnt 6 proto rstp maxaddr 2000 timeout 1200
	root id 00:00:00:00:00:00 priority 32768 ifcost 0 port 0
	member: tap0 flags=143
	        ifmaxaddr 0 port 6 priority 128 path cost 200
	member: alc0 flags=143
	        ifmaxaddr 0 port 3 priority 128 path cost 55
	groups: bridge
	nd6 options=1
tap0: flags=9903 metric 0 mtu 1500
	options=8
	ether 58:9c:fc:10:ff:89
	groups: tap
	media: Ethernet 1000baseT
	status: no carrier
	nd6 options=29
---snip---
the issue is that the bridge doesn't seem to respond to IPv6 ICMP Neighbour Solicitation. for example, while running ping, tcpdump shows this:
---snip---
23:30:16.567071 58:9c:fc:10:ff:b6 > 1e:ab:48:c1:f6:62, ethertype IPv6 (0x86dd), length 70: 2001:8b0:aab5:104:3::101 > 2001:8b0:aab5:106::12: ICMP6, echo request, id 34603, seq 13, length 16
23:30:16.634860 1e:ab:48:c1:f6:62 > 33:33:ff:00:01:01, ethertype IPv6 (0x86dd), length 86: fe80::1cab:48ff:fec1:f662 > ff02::1:ff00:101: ICMP6, neighbor solicitation, who has 2001:8b0:aab5:104:3::101, length 32
23:30:17.567080 58:9c:fc:10:ff:b6 > 1e:ab:48:c1:f6:62, ethertype IPv6 (0x86dd), length 70: 2001:8b0:aab5:104:3::101 > 2001:8b0:aab5:106::12: ICMP6, echo request, id 34603, seq 14, length 16
23:30:17.674842 1e:ab:48:c1:f6:62 > 33:33:ff:00:01:01, ethertype IPv6 (0x86dd), length 86: fe80::1cab:48ff:fec1:f662 > ff02::1:ff00:101: ICMP6, neighbor solicitation, who has 2001:8b0:aab5:104:3::101, length 32
23:30:17.936956 1e:ab:48:c1:f6:62 > 33:33:00:00:00:01, ethertype IPv6 (0x86dd), length 166: fe80::1cab:48ff:fec1:f662 > ff02::1: ICMP6, router advertisement, length 112
23:30:18.567093 58:9c:fc:10:ff:b6 > 1e:ab:48:c1:f6:62, ethertype IPv6 (0x86dd), length 70: 2001:8b0:aab5:104:3::101 > 2001:8b0:aab5:106::12: ICMP6, echo request, id 34603, seq 15, length 16
23:30:19.567104 58:9c:fc:10:ff:b6 > 1e:ab:48:c1:f6:62, ethertype IPv6 (0x86dd), length 70: 2001:8b0:aab5:104:3::101 > 2001:8b0:aab5:106::12: ICMP6, echo request, id 34603, seq 16, length 16
23:30:19.567529 1e:ab:48:c1:f6:62 > 33:33:ff:00:01:01, ethertype IPv6 (0x86dd), length 86: fe80::1cab:48ff:fec1:f662 > ff02::1:ff00:101: ICMP6, neighbor solicitation, who has 2001:8b0:aab5:104:3::101, length 32
---snip---
fe80::1cab:48ff:fec1:f662 is the subnet router; it's sending solicitations but FreeBSD doesn't send a response, if i remove alc0 from the bridge and configure the IPv6 address directly on alc0 instead, everything works fine. i'm testing without any packet filter (ipfw/pf) in the kernel. it's possible i'm missing something obvious here; does anyone have an idea?

Just an idea.
I'm not sure if it is the right track... There is code in the kernel which is ignoring NS stuff from "non-valid" sources (security / spoofing reasons). The NS request is from a link-local address. Your bridge has no link-local address (and your tap has the auto linklocal flag set, which I would have expected to be on the bridge instead). I'm not sure, but I would guess it could be because of this. If my guess is not too far off, I would suggest trying:
- remove auto linklocal from tap0 (like for alc0)
- add auto linklocal to bridge0
If this doesn't help, there is the sysctl net.inet6.icmp6.nd6_onlink_ns_rfc4861 which you could try to set to 1. Please read https://www.freebsd.org/security/advisories/FreeBSD-SA-08:10.nd6.asc before you do that. Bye, Alexander. -- http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF http://www.FreeBSD.orgnetch...@freebsd.org : PGP 0x8F31830F9F2772BF signature.asc Description: OpenPGP digital signature
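Spelled out as commands, the suggestion could look roughly like this (a sketch per ifconfig(8); note that -auto_linklocal only prevents future auto-configuration, an already assigned fe80:: address would have to be deleted by hand):

---snip---
ifconfig tap0 inet6 -auto_linklocal    # stop tap0 from auto-configuring fe80::
ifconfig bridge0 inet6 auto_linklocal  # let the bridge pick up the fe80:: address
# last resort, read FreeBSD-SA-08:10.nd6 first:
sysctl net.inet6.icmp6.nd6_onlink_ns_rfc4861=1
---snip---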
Re: ZFS problems since recently ?
Am 2023-12-31 19:34, schrieb Kurt Jaeger: I already have vfs.zfs.dmu_offset_next_sync=0 which is supposed to disable block-cloning. It isn't. This one is supposed to fix an issue which is unrelated to block cloning (but can be amplified by block cloning). This issue has been fixed for some weeks now, your Dec 23 build should not need it (when the issue happens, you get files with zeroes in parts of the data instead of the real data, and only if you copy files at the same time as those files are modified, and then only if you happen to get the timing right). The sysctl for block cloning is vfs.zfs.bclone_enabled. To check if a pool has made use of block cloning: zpool get all poolname | grep bclone Bye, Alexander. -- http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF http://www.FreeBSD.orgnetch...@freebsd.org : PGP 0x8F31830F9F2772BF signature.asc Description: OpenPGP digital signature
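The two checks side by side (zpool get accepts a comma-separated property list, so the grep from above can be avoided):

---snip---
sysctl vfs.zfs.bclone_enabled
zpool get bcloneused,bclonesaved,bcloneratio poolname
---snip---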
What is rc.d/opensm?
Hi, for my work on service jails (https://reviews.freebsd.org/D40370) I am trying to find out what opensm is. On my amd64 system I have neither a man page nor the binary (and man.freebsd.org doesn't know about opensm either). Bye, Alexander. -- http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF http://www.FreeBSD.orgnetch...@freebsd.org : PGP 0x8F31830F9F2772BF signature.asc Description: OpenPGP digital signature
Re: openzfs and block cloning question
Am 2023-11-24 08:10, schrieb Oleksandr Kryvulia: Hi, Recently cperciva@ published on his twitter [1] that enabling the block cloning feature can lead to data loss on 14. Is this statement true for current? Since I am using current for daily work and block cloning is enabled by default, how can I verify that my data is not affected? Thank you. Block cloning may have an issue, or it does things which amplify an old existing issue, or there are two issues... The full story is at https://github.com/openzfs/zfs/issues/15526 To be on the safe side, you may want to have vfs.zfs.dmu_offset_next_sync=0 (loader.conf / sysctl.conf) for the moment. Bye, Alexander. -- http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF http://www.FreeBSD.orgnetch...@freebsd.org : PGP 0x8F31830F9F2772BF signature.asc Description: OpenPGP digital signature
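To set this both at runtime and persistently across reboots:

---snip---
sysctl vfs.zfs.dmu_offset_next_sync=0                       # takes effect immediately
echo 'vfs.zfs.dmu_offset_next_sync=0' >> /etc/sysctl.conf   # survives reboots
---snip---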
Re: Request for Testing: TCP RACK
Am 2023-11-17 14:29, schrieb void: On Thu, Nov 16, 2023 at 10:13:05AM +0100, tue...@freebsd.org wrote: You can load the kernel module using kldload tcp_rack You can make the RACK stack the default stack using sysctl net.inet.tcp.functions_default=rack Hi, thank you for this. https://klarasystems.com/articles/using-the-freebsd-rack-tcp-stack/ mentions this needs to be set in /etc/src.conf : WITH_EXTRA_TCP_STACKS=1 Is this still the case? Context here is -current both in a vm and bare metal, on various machines, on various connections, from DSL to 10Gb. On a recent -current: this is not needed anymore, it is part of the defaults now. But you may still have to compile the kernel with "options TCPHPTS" (until it's added to the defaults too). Is there a method (yet) for enabling this functionality in various -RELENG maybe where one can compile in a vm built for that purpose, then transferring to the production vm? Copy the kernel which was built according to the article from Klara Systems to your target VM. Would it be expected to work on arm64? Yes (I use it on an Ampere VM in the cloud). Bye, Alexander. -- http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF http://www.FreeBSD.orgnetch...@freebsd.org : PGP 0x8F31830F9F2772BF signature.asc Description: OpenPGP digital signature
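A compact sketch of the persistent setup on a recent -current, assuming the kernel was built with "options TCPHPTS":

---snip---
# /boot/loader.conf
tcp_rack_load="YES"

# /etc/sysctl.conf
net.inet.tcp.functions_default=rack
---snip---

At runtime, sysctl net.inet.tcp.functions_available shows which TCP stacks are loaded and which one is the default.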
Re: poudriere job && find jobs which received signal 11
Am 2023-10-18 09:54, schrieb Matthias Apitz: Hello, I'm compiling with poudriere on 14.0-CURRENT 1400094 amd64 "my" ports, from git October 14, 2023. In the last two days 2229 packages were produced fine, one job failed (p5-Gtk2-1.24993_3, which is known to be broken). This morning I was looking for something in /var/log/messages and accidentally noticed that yesterday a few compilations failed:
---snip---
# grep 'signal 11' /var/log/messages | grep -v conftest
Oct 17 10:58:02 jet kernel: pid 12765 (cc1plus), jid 24, uid 65534: exited on signal 11 (core dumped)
Oct 17 10:59:32 jet kernel: pid 27104 (cc1plus), jid 24, uid 65534: exited on signal 11 (core dumped)
Oct 17 12:07:38 jet kernel: pid 85640 (cc1plus), jid 24, uid 65534: exited on signal 11 (core dumped)
Oct 17 12:08:17 jet kernel: pid 94451 (cc1plus), jid 24, uid 65534: exited on signal 11 (core dumped)
Oct 17 12:36:01 jet kernel: pid 77914 (cc1plus), jid 24, uid 65534: exited on signal 11 (core dumped)
---snip---
As I said, this happened without any of the 2229 jobs failing:
---snip---
# cd /usr/local/poudriere/data/logs/bulk/140-CURRENT-ports20231014/latest-per-pkg
# ls -C1 | wc -l
2229
# grep -l 'build failure' *
p5-Gtk2-1.24993_3.log
---snip---
How is it possible that the make engines didn't fail?

That can be part of configure runs which try to test some features.

The uid 65534 is the one used by poudriere, can I use the jid 24 somehow to find the job which received the signal 11? Or is the time the only way to look which of the 4 poudriere engines were running at this time? I'd like to rerun/reproduce the package again.

jid = jail ID, the first column in the output of "jls". If you have the poudriere runtime logs (where it lists which package it is processing ATM), you will see a number from 1 to the max number of jails which run in parallel. This number is part of the hostname of the jail. So if you have the poudriere jails still running, you can make a mapping from the jid to the name to the number, and together with the time you can see which package it was building at that time. Unfortunately poudriere doesn't list the hostname of the builder nor the jid (feature request anyone?). Example poudriere runtime log:
---snip---
[00:54:11] [03] [00:00:00] Building security/nss | nss-3.94
[00:56:46] [03] [00:02:35] Finished security/nss | nss-3.94: Success
[00:56:47] [03] [00:00:00] Building textproc/gsed | gsed-4.9
[00:57:41] [01] [00:06:18] Finished x11-toolkits/gtk30 | gtk3-3.24.34_1: Success
[00:57:42] [01] [00:00:00] Building devel/qt6-base | qt6-base-6.5.3
---snip---
While poudriere is running, jls reports this:
---snip---
# jls jid host.hostname
[...]
91 poudriere-bastille-default
92 poudriere-bastille-default
93 poudriere-bastille-default-job-01
94 poudriere-bastille-default-job-01
95 poudriere-bastille-default-job-02
96 poudriere-bastille-default-job-03
97 poudriere-bastille-default-job-02
98 poudriere-bastille-default-job-03
---snip---
So if we assume a coredump in jid 96 or 98, this means it was in builder 3. nss and gsed were built by poudriere builder number 3 (both about 56 minutes after the start of poudriere), and gtk30 and qt6-base by poudriere builder number 1. If we further assume that the coredumps are in the time range of 54 to 56 minutes after the poudriere start, the logs of nss may have a trace of it (or not: if it was part of configure, you would have to do the configure run yourself and check whether it generates similar coredumps). Bye, Alexander.
-- http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF http://www.FreeBSD.orgnetch...@freebsd.org : PGP 0x8F31830F9F2772BF signature.asc Description: OpenPGP digital signature
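To make the jid-to-builder mapping concrete while the jails are still running (jls(8) prints only the named parameters when they are given on the command line):

---snip---
jls -j 24 host.hostname   # resolve the jid from the kernel message
jls jid host.hostname     # or list all jails with their hostnames
---snip---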
Re: issue: poudriere jail update fails after recent changes around certctl
Am 2023-10-13 17:42, schrieb Dag-Erling Smørgrav: Alexander Leidinger writes: some change around certctl (world from 2023-10-09) has broken the poudriere jail update command. The complete install finishes, certctl is run, and then there is an exit code 1. This is because I have some certs listed as untrusted, and this seems to give a retval of 1 inside certctl. This only happens if a certificate is listed as both trusted and untrusted, and I'm pretty sure the previous version would return 1 in that case as well. Can you check? I compared /usr/share/certs/untrusted/ with /usr/share/certs/trusted/, and some of the untrusted certs indeed also appear in /usr/share/certs/trusted/. Nothing in /usr/local/etc/ssl/untrusted/, one cert (as hash) in /usr/local/etc/ssl/blacklisted/ which is also in /usr/share/certs/untrusted/. If FreeBSD provides some certs as trusted (as part of e.g. installworld), and I have some of them listed as untrusted, I would not expect an error case, but a failsafe action of not trusting them and not complaining... am I doing something wrong? Bye, Alexander. -- http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF http://www.FreeBSD.orgnetch...@freebsd.org : PGP 0x8F31830F9F2772BF signature.asc Description: OpenPGP digital signature
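A sketch for finding the overlap, assuming matching file names in both directories (certctl rehash works on hash links, so a name-based comparison is only an approximation):

---snip---
ls /usr/share/certs/trusted | sort > /tmp/trusted.lst
ls /usr/share/certs/untrusted | sort > /tmp/untrusted.lst
comm -12 /tmp/trusted.lst /tmp/untrusted.lst   # names present in both
---snip---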
issue: poudriere jail update fails after recent changes around certctl
Hi, some change around certctl (world from 2023-10-09) has broken the poudriere jail update command. The complete install finishes, certctl is run, and then there is an exit code 1. This is because I have some certs listed as untrusted, and this seems to give a retval of 1 inside certctl. Testcase: set a cert as untrusted and try to use "poudriere jail -u -j YOUR_JAIL_NAME -m src=/usr/src" Relevant log:
---snip---
-- Installing everything completed on Fri Oct 13 10:00:04 CEST 2023 --
       83.55 real       103.83 user       109.42 sys
certctl.sh: Skipping untrusted certificate ad088e1d (/space/poudriere/jails/poudriere-x11/etc/ssl/untrusted/ad088e1d.0)
[some more untrusted]
*** [installworld] Error code 1

make[1]: stopped in /space/system/usr_src
1 error

make[1]: stopped in /space/system/usr_src
make: stopped in /usr/src
[00:01:32] Error: Failed to 'make installworld'
---snip---
Bye, Alexander. -- http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF http://www.FreeBSD.orgnetch...@freebsd.org : PGP 0x8F31830F9F2772BF signature.asc Description: OpenPGP digital signature
Re: git: 989c5f6da990 - main - freebsd-update: create deep BEs by default [really about if -r for bectl create should just go away]
Am 2023-10-12 07:08, schrieb Mark Millard: I use the likes of:
---snip---
BE                        Active Mountpoint Space Created
build_area_for-main-CA72  -      -          1.99G 2023-09-20 10:19
main-CA72                 NR     /          4.50G 2023-09-21 10:10

NAME                                 CANMOUNT  MOUNTPOINT
zopt0                                on        /zopt0
. . .
zopt0/ROOT                           on        none
zopt0/ROOT/build_area_for-main-CA72  noauto    none
zopt0/ROOT/main-CA72                 noauto    none
zopt0/poudriere                      on        /usr/local/poudriere
zopt0/poudriere/data                 on        /usr/local/poudriere/data
zopt0/poudriere/data/.m              on        /usr/local/poudriere/data/.m
zopt0/poudriere/data/cache           on        /usr/local/poudriere/data/cache
zopt0/poudriere/data/images          on        /usr/local/poudriere/data/images
zopt0/poudriere/data/logs            on        /usr/local/poudriere/data/logs
zopt0/poudriere/data/packages        on        /usr/local/poudriere/data/packages
zopt0/poudriere/data/wrkdirs         on        /usr/local/poudriere/data/wrkdirs
zopt0/poudriere/jails                on        /usr/local/poudriere/jails
zopt0/poudriere/ports                on        /usr/local/poudriere/ports
zopt0/tmp                            on        /tmp
zopt0/usr                            off       /usr
zopt0/usr/13_0R-src                  on        /usr/13_0R-src
zopt0/usr/alt-main-src               on        /usr/alt-main-src
zopt0/usr/home                       on        /usr/home
zopt0/usr/local                      on        /usr/local
---snip---
[...] If such ends up as unsupportable, it will effectively eliminate my reason for using bectl (and, so, zfs): the sharing is important to my use. Additionally/complementary to what Kyle said... The -r option is about:
zopt0/ROOT/main-CA72
zopt0/ROOT/main-CA72/subDS1
zopt0/ROOT/main-CA72/subDS2
A shallow clone only takes zopt0/ROOT/main-CA72 into account, while a -r clone also clones subDS1 and subDS2. So, as Kyle said, your (and my) use case is not affected by this. Bye, Alexander. -- http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF http://www.FreeBSD.orgnetch...@freebsd.org : PGP 0x8F31830F9F2772BF signature.asc Description: OpenPGP digital signature
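The difference in commands, using the dataset names from this thread:

---snip---
bectl create newbe      # shallow: clones only zopt0/ROOT/main-CA72 itself
bectl create -r newbe   # deep: also clones children like .../subDS1, .../subDS2
---snip---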
base-krb5 issues (segfaults when adding principals in openssl)
Hi, does someone else have issues with krb5 on -current when adding principals? With -current as of 2023-09-11 I get a segfault in openssl:
---snip---
Reading symbols from /usr/bin/kadmin...
Reading symbols from /usr/lib/debug//usr/bin/kadmin.debug...
[New LWP 270171]
Core was generated by `kadmin -l'.
Program terminated with signal SIGSEGV, Segmentation fault.
Address not mapped to object.
#0  0x in ?? ()
(gdb) bt
#0  0x in ?? ()
#1  0x0e118da145f8 in ARCFOUR_string_to_key (context=0x44f9fba1a000, enctype=KRB5_ENCTYPE_ARCFOUR_HMAC_MD5, password=..., salt=..., opaque=..., key=0x44f9fba211d8) at /space/system/usr_src/crypto/heimdal/lib/krb5/salt-arcfour.c:84
#2  0x0e118da156e9 in krb5_string_to_key_data_salt_opaque (enctype=KRB5_ENCTYPE_ARCFOUR_HMAC_MD5, salt=..., opaque=..., context=, password=..., key=) at /space/system/usr_src/crypto/heimdal/lib/krb5/salt.c:201
#3  krb5_string_to_key_data_salt (context=0x44f9fba1a000, enctype=KRB5_ENCTYPE_ARCFOUR_HMAC_MD5, password=..., salt=..., key=0x44f9fba211d8) at /space/system/usr_src/crypto/heimdal/lib/krb5/salt.c:173
#4  0x0e118da158cb in krb5_string_to_key_salt (context=0x44f9fba4bc60, context@entry=0x44f9fba1a000, enctype=-1980854121, password=0x0, password@entry=0xe1189ee9510 "1kad$uwi6!", salt=..., key=0x5) at /space/system/usr_src/crypto/heimdal/lib/krb5/salt.c:225
#5  0x0e118ba75423 in hdb_generate_key_set_password (context=0x44f9fba1a000, principal=, password=password@entry=0xe1189ee9510 "1kad$uwi6!", keys=keys@entry=0xe1189ee9210, num_keys=num_keys@entry=0xe1189ee9208) at /space/system/usr_src/crypto/heimdal/lib/hdb/keys.c:381
#6  0x0e118ca91c9a in _kadm5_set_keys (context=context@entry=0x44f9fba1a140, ent=ent@entry=0xe1189ee9258, password=0x1 , password@entry=0xe1189ee9510 "1kad$uwi6!") at /space/system/usr_src/crypto/heimdal/lib/kadm5/set_keys.c:51
#7  0x0e118ca8caac in kadm5_s_create_principal (server_handle=0x44f9fba1a140, princ=, mask=out>, password=0xe1189ee9510 "1kad$uwi6!") at /space/system/usr_src/crypto/heimdal/lib/kadm5/create_s.c:172
#8  0x0e0969e1a57b in add_one_principal (name=, rand_key=0, rand_password=0, use_defaults=0, password=0xe1189ee9510 "1kad$uwi6!", key_data=0x0, max_ticket_life=, max_renewable_life=, attributes=0x0, expiration=, pw_expiration=0x0) at /space/system/usr_src/crypto/heimdal/kadmin/ank.c:141
#9  add_new_key (opt=opt@entry=0xe1189ee9960, argc=argc@entry=1, argv=0x44f9fba49238, argv@entry=0x44f9fba49230) at /space/system/usr_src/crypto/heimdal/kadmin/ank.c:243
#10 0x0e0969e1e124 in add_wrap (argc=, argv=0x44f9fba49230) at kadmin-commands.c:210
#11 0x0e0969e23945 in sl_command (cmds=, argc=2, argv=0x44f9fba49230) at /space/system/usr_src/crypto/heimdal/lib/sl/sl.c:209
#12 sl_command_loop (cmds=cmds@entry=0xe0969e282a0 , prompt=prompt@entry=0xe0969e15cca "kadmin> ", data=) at /space/system/usr_src/crypto/heimdal/lib/sl/sl.c:328
#13 0x0e0969e1d876 in main (argc=, argv=out>) at /space/system/usr_src/crypto/heimdal/kadmin/kadmin.c:275
(gdb) up 1
#1  0x0e118da145f8 in ARCFOUR_string_to_key (context=0x44f9fba1a000, enctype=KRB5_ENCTYPE_ARCFOUR_HMAC_MD5, password=..., salt=..., opaque=..., key=0x44f9fba211d8) at /space/system/usr_src/crypto/heimdal/lib/krb5/salt-arcfour.c:84
84          EVP_DigestUpdate (m, &p, 1);
(gdb) list
79
80          /* LE encoding */
81          for (i = 0; i < len; i++) {
82              unsigned char p;
83              p = (s[i] & 0xff);
84              EVP_DigestUpdate (m, &p, 1);
85              p = (s[i] >> 8) & 0xff;
86              EVP_DigestUpdate (m, &p, 1);
87          }
88
(gdb) print i
$1 = 0
(gdb) print len
$2 = 
(gdb) print p
$3 = 49 '1'
(gdb) print m
$4 = (EVP_MD_CTX *) 0x43e31de4bc60
(gdb) print *m
$5 = {reqdigest = 0x17e678afd470, digest = 0x0, engine = 0x0, flags = 0, md_data = 0x0, pctx = 0x0, update = 0x0, algctx = 0x0, fetched_digest = 0x0}
(gdb)
---snip---
Bye, Alexander. -- http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF http://www.FreeBSD.orgnetch...@freebsd.org : PGP 0x8F31830F9F2772BF signature.asc Description: OpenPGP digital signature
Re: Speed improvements in ZFS
Am 2023-09-15 13:40, schrieb George Michaelson: Not wanting to hijack threads, I am interested if any of this can translate back up the tree and make Linux ZFS faster. And if there is simple sysctl tuning worth trying in large (TB) memory model pre-14 FreeBSD systems with slow ZFS. Older FreeBSD, alas. The current part of the discussion is not really about ZFS (I use a lot of nullfs on top of ZFS). So no to the first part. The tuning I did (maxvnodes) doesn't really depend on the FreeBSD version, but on the number of files touched/contained in the FS. The only other change I made is updating the OS itself, so this part doesn't apply to pre-14 systems. If you think your ZFS (with a large ARC) is slow, you need to review your primary cache settings per dataset, check the arcstats, and maybe think about a 2nd-level ARC on fast storage (cache device on NVMe or SSD). If you have a read-once workload, none of this will help. So it all depends on your workload. Bye, Alexander. -- http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF http://www.FreeBSD.orgnetch...@freebsd.org : PGP 0x8F31830F9F2772BF signature.asc Description: OpenPGP digital signature
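For the per-dataset review mentioned above, a sketch (dataset and device names are placeholders):

---snip---
zfs get primarycache,secondarycache pool/dataset   # values: all | metadata | none
zfs set primarycache=all pool/dataset
sysctl kstat.zfs.misc.arcstats.hits kstat.zfs.misc.arcstats.misses
zpool add pool cache nvd0                          # add a fast L2ARC device
---snip---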
Re: Speed improvements in ZFS
Am 2023-09-04 14:26, schrieb Mateusz Guzik: On 9/4/23, Alexander Leidinger wrote: Am 2023-08-28 22:33, schrieb Alexander Leidinger: Am 2023-08-22 18:59, schrieb Mateusz Guzik: On 8/22/23, Alexander Leidinger wrote: Am 2023-08-21 10:53, schrieb Konstantin Belousov: On Mon, Aug 21, 2023 at 08:19:28AM +0200, Alexander Leidinger wrote: Am 2023-08-20 23:17, schrieb Konstantin Belousov: > On Sun, Aug 20, 2023 at 11:07:08PM +0200, Mateusz Guzik wrote: > > On 8/20/23, Alexander Leidinger wrote: > > > Am 2023-08-20 22:02, schrieb Mateusz Guzik: > > >> On 8/20/23, Alexander Leidinger > > >> wrote: > > >>> Am 2023-08-20 19:10, schrieb Mateusz Guzik: > > >>>> On 8/18/23, Alexander Leidinger > > >>>> wrote: > > >>> > > >>>>> I have a 51MB text file, compressed to about 1MB. Are you > > >>>>> interested > > >>>>> to > > >>>>> get it? > > >>>>> > > >>>> > > >>>> Your problem is not the vnode limit, but nullfs. > > >>>> > > >>>> https://people.freebsd.org/~mjg/netchild-periodic-find.svg > > >>> > > >>> 122 nullfs mounts on this system. And every jail I setup has > > >>> several > > >>> null mounts. One basesystem mounted into every jail, and then > > >>> shared > > >>> ports (packages/distfiles/ccache) across all of them. > > >>> > > >>>> First, some of the contention is notorious VI_LOCK in order > > >>>> to > > >>>> do > > >>>> anything. > > >>>> > > >>>> But more importantly the mind-boggling off-cpu time comes > > >>>> from > > >>>> exclusive locking which should not be there to begin with -- > > >>>> as > > >>>> in > > >>>> that xlock in stat should be a slock. > > >>>> > > >>>> Maybe I'm going to look into it later. > > >>> > > >>> That would be fantastic. > > >>> > > >> > > >> I did a quick test, things are shared locked as expected. > > >> > > >> However, I found the following: > > >> if ((xmp->nullm_flags & NULLM_CACHE) != 0) { > > >> mp->mnt_kern_flag |= > > >> lowerrootvp->v_mount->mnt_kern_flag & > > >> (MNTK_SHARED_WRITES | MNTK_LOOKUP_SHARED | > > >> MNTK_EXTENDED_SHARED); > > >> } > > >> > > >> are you using the "nocache" option? it has a side effect of > > >> xlocking > > > > > > I use noatime, noexec, nosuid, nfsv4acls. I do NOT use nocache. > > > > > > > If you don't have "nocache" on null mounts, then I don't see how > > this > > could happen. > > There is also MNTK_NULL_NOCACHE on lower fs, which is currently set > for > fuse and nfs at least. 11 of those 122 nullfs mounts are ZFS datasets which are also NFS exported. 6 of those nullfs mounts are also exported via Samba. The NFS exports shouldn't be needed anymore, I will remove them. By nfs I meant nfs client, not nfs exports. No NFS client mounts anywhere on this system. So where is this exclusive lock coming from then... This is a ZFS system. 2 pools: one for the root, one for anything I need space for. Both pools reside on the same disks. The root pool is a 3-way mirror, the "space-pool" is a 5-disk raidz2. All jails are on the space-pool. The jails are all basejail-style jails. While I don't see why xlocking happens, you should be able to dtrace or printf your way into finding out. dtrace looks to me like a faster approach to get to the root than printf... my first naive try is to detect exclusive locks. I'm not 100% sure I got it right, but at least dtrace doesn't complain about it: ---snip--- #pragma D option dynvarsize=32m fbt:nullfs:null_lock:entry /args[0]->a_flags & 0x08 != 0/ { stack(); } ---snip--- In which direction should I look with dtrace if this works in tonights run of periodic? 
I don't have enough knowledge about VFS to come up with some immediate ideas. After your sysctl fix for maxvnodes I increased the number of vnodes 10 times compared to the initial report. This has increased the speed of the operation: the find runs in all those jails finished today after ~5h (at ~8am) instead of in the afternoon as before. Could this suggest that in parallel some null_reclaim() is running which does the exclusive locks and slows down the entire operation? That may be a slowdown to some extent, but the primary problem is exclusive vnode locking for stat lookup, which should not be happening. With -current as of 2023-09-03 (and right now 2023-09-11), the periodic daily runs are down to less than an hour... and this didn't happen directly after switching to 2023-09-03. First it went down to 4h, then down to 1h without any update of the OS. The only thing I did was modify maxvnodes. First to some huge amount, after your commit fixing the sysctl handling. Then, after noticing way more freevnodes than configured, down to 5. Bye, Alexander. -- http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF http://www.FreeBSD.orgnetch...@freebsd.org : PGP 0x8F31830F9F2772BF signature.asc Description: OpenPGP digital signature
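If the dtrace script from earlier in the thread confirms exclusive locking, aggregating the stacks may be more readable than printing each one; a sketch keeping the author's 0x08 flag test from that script:

---snip---
dtrace -n 'fbt:nullfs:null_lock:entry /args[0]->a_flags & 0x08/ { @[stack()] = count(); }'
---snip---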
Re: sed in CURRENT fails in textproc/jq
Am 2023-09-10 18:53, schrieb Robert Clausecker: Hi Warner, Thank you for your response. Am Sun, Sep 10, 2023 at 09:53:03AM -0600 schrieb Warner Losh: On Sun, Sep 10, 2023, 7:36 AM Robert Clausecker wrote: > Hi Warner, > > I have pushed a fix. It should hopefully address those failing tests. > The same issue should also affect memcmp(), but unlike for memchr(), it is > illegal to pass a length to memcmp() that extends past the actual end of > the buffer as memcmp() is permitted to examine the whole buffer regardless > of where the first mismatch is. > > I am considering a change to improve the behaviour of memcmp() on such > erroneous inputs. There are two options: (a) I could change memcmp() the > same way I fixed memchr() and have implausible buffer lengths behave as if > the buffer goes to the end of the address space or (b) I could change > memcmp() to crash loudly if it detects such a case. I could also > (c) leave memcmp() as is. Which of these three choices is preferable? > What does the standard say? I'm highly skeptical that these corner cases are UB. I'd like actual support for this statement, rather than your conjecture that it's illegal. Even if you can come up with that, preserving the old behavior is my first choice. Especially since many of these functions aren't well defined by a standard, but are extensions. As for memchr, https://pubs.opengroup.org/onlinepubs/009696799/functions/memchr.html has no such permission to examine 'the entire buffer at once' nor any restriction as to the length extending beyond the address space. I'm skeptical of your reading that it allows one to examine all of [b, b + len), so please explain where the standard supports reading past the first occurrence. memchr() in particular is specified to only examine the input until the matching character is found (ISO/IEC 9899:2011 § 7.24.5.1): *** The memchr function locates the first occurrence of c (converted to an unsigned char) in the initial n characters (each interpreted as unsigned char) of the object pointed to by s. The implementation shall behave as if it reads the characters sequentially and stops as soon as a matching character is found. *** Therefore, it appears reasonable that calls with fake buffer lengths (e.g. SIZE_MAX, to read until a mismatch occurs) must be supported. However, memcmp() has no such language and the text explicitly states that the whole buffer is compared (ISO/IEC 9899:2011 § 7.24.4.1): *** The memcmp function compares the first n characters of the object pointed to by s1 to the first n characters of the object pointed to by s2. *** By omission, this seems to give license to e.g. implement memcmp() like timingsafe_memcmp() where it inspects all n characters of both buffers and only then gives a result. So if n is longer than the actual buffer (e.g. n == SIZE_MAX), behaviour may not be defined (e.g. there could be a crash due to crossing into an unmapped page). Thus I have patched memchr() to behave correctly when length SIZE_MAX is given (commit b2618b65). My memcmp() suffers from similarly flawed logic and may need to be patched. However, as the language I cited above does not indicate that such usage needs to be supported for memcmp() (whereas it must be for memchr(), contrary to my assumptions), I was asking you how to proceed with memcmp (hence choices (a)--(c)). My 2ct: What did the previous implementation of memcmp() do in this case?
- If it was generous and behaved similar to the requirements of memchr(), POLA requires having the same behavior now too.
- If it was crashing or silently going on (= lurking bugs in 3rd-party code), we may have the possibility to do a coredump in case of running past the end of the buffer, to prevent malicious use.
- In general I go with the robustness principle, "be liberal in what you accept, but strict in what you provide" = memcmp() should behave as if it is supported.
Bye, Alexander. -- http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF http://www.FreeBSD.orgnetch...@freebsd.org : PGP 0x8F31830F9F2772BF signature.asc Description: OpenPGP digital signature
Re: 100% CPU time for sysctl command, not killable
Am 2023-09-03 21:22, schrieb Alexander Leidinger: Am 2023-09-02 16:56, schrieb Mateusz Guzik: On 8/20/23, Alexander Leidinger wrote: Hi, sysctl kern.maxvnodes=1048576000 results in 100% CPU and a non-killable sysctl program. This is somewhat unexpected... fixed here https://cgit.freebsd.org/src/commit/?id=32988c1499f8698b41e15ed40a46d271e757bba3 I confirm. There may be dragons...:
---snip---
kern.maxvnodes: 1048576000
vfs.wantfreevnodes: 262144000
vfs.freevnodes: 0 <---
vfs.vnodes_created: 11832359
vfs.numvnodes: 146699
vfs.recycles_free: 4700765
vfs.recycles: 0
vfs.vnode_alloc_sleeps: 0
---snip---
Another time I got an insanely huge amount of free vnodes (more than maxvnodes). Bye, Alexander. -- http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF http://www.FreeBSD.orgnetch...@freebsd.org : PGP 0x8F31830F9F2772BF signature.asc Description: OpenPGP digital signature
Re: Speed improvements in ZFS
Am 2023-08-28 22:33, schrieb Alexander Leidinger: Am 2023-08-22 18:59, schrieb Mateusz Guzik: On 8/22/23, Alexander Leidinger wrote: Am 2023-08-21 10:53, schrieb Konstantin Belousov: On Mon, Aug 21, 2023 at 08:19:28AM +0200, Alexander Leidinger wrote: Am 2023-08-20 23:17, schrieb Konstantin Belousov: > On Sun, Aug 20, 2023 at 11:07:08PM +0200, Mateusz Guzik wrote: > > On 8/20/23, Alexander Leidinger wrote: > > > Am 2023-08-20 22:02, schrieb Mateusz Guzik: > > >> On 8/20/23, Alexander Leidinger wrote: > > >>> Am 2023-08-20 19:10, schrieb Mateusz Guzik: > > >>>> On 8/18/23, Alexander Leidinger > > >>>> wrote: > > >>> > > >>>>> I have a 51MB text file, compressed to about 1MB. Are you > > >>>>> interested > > >>>>> to > > >>>>> get it? > > >>>>> > > >>>> > > >>>> Your problem is not the vnode limit, but nullfs. > > >>>> > > >>>> https://people.freebsd.org/~mjg/netchild-periodic-find.svg > > >>> > > >>> 122 nullfs mounts on this system. And every jail I setup has > > >>> several > > >>> null mounts. One basesystem mounted into every jail, and then > > >>> shared > > >>> ports (packages/distfiles/ccache) across all of them. > > >>> > > >>>> First, some of the contention is notorious VI_LOCK in order to > > >>>> do > > >>>> anything. > > >>>> > > >>>> But more importantly the mind-boggling off-cpu time comes from > > >>>> exclusive locking which should not be there to begin with -- as > > >>>> in > > >>>> that xlock in stat should be a slock. > > >>>> > > >>>> Maybe I'm going to look into it later. > > >>> > > >>> That would be fantastic. > > >>> > > >> > > >> I did a quick test, things are shared locked as expected. > > >> > > >> However, I found the following: > > >> if ((xmp->nullm_flags & NULLM_CACHE) != 0) { > > >> mp->mnt_kern_flag |= > > >> lowerrootvp->v_mount->mnt_kern_flag & > > >> (MNTK_SHARED_WRITES | MNTK_LOOKUP_SHARED | > > >> MNTK_EXTENDED_SHARED); > > >> } > > >> > > >> are you using the "nocache" option? it has a side effect of > > >> xlocking > > > > > > I use noatime, noexec, nosuid, nfsv4acls. I do NOT use nocache. > > > > > > > If you don't have "nocache" on null mounts, then I don't see how > > this > > could happen. > > There is also MNTK_NULL_NOCACHE on lower fs, which is currently set > for > fuse and nfs at least. 11 of those 122 nullfs mounts are ZFS datasets which are also NFS exported. 6 of those nullfs mounts are also exported via Samba. The NFS exports shouldn't be needed anymore, I will remove them. By nfs I meant nfs client, not nfs exports. No NFS client mounts anywhere on this system. So where is this exclusive lock coming from then... This is a ZFS system. 2 pools: one for the root, one for anything I need space for. Both pools reside on the same disks. The root pool is a 3-way mirror, the "space-pool" is a 5-disk raidz2. All jails are on the space-pool. The jails are all basejail-style jails. While I don't see why xlocking happens, you should be able to dtrace or printf your way into finding out. dtrace looks to me like a faster approach to get to the root than printf... my first naive try is to detect exclusive locks. I'm not 100% sure I got it right, but at least dtrace doesn't complain about it: ---snip--- #pragma D option dynvarsize=32m fbt:nullfs:null_lock:entry /args[0]->a_flags & 0x08 != 0/ { stack(); } ---snip--- In which direction should I look with dtrace if this works in tonights run of periodic? I don't have enough knowledge about VFS to come up with some immediate ideas. 
After your sysctl fix for maxvnodes I increased the amount of vnodes 10 times compared to the initial report. This has increased the speed of the operation, the find runs in all those jails finished today after ~5h (@~8am) instead of in the afternoon as before. Could this suggest that in parallel some null_reclaim() is running which does the exclusive locks and slows down the entire operation? Bye, Alexander. -- http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF http://www.FreeBSD.orgnetch...@freebsd.org : PGP 0x8F31830F9F2772BF
Re: 100% CPU time for sysctl command, not killable
Am 2023-09-02 16:56, schrieb Mateusz Guzik: On 8/20/23, Alexander Leidinger wrote: Hi, sysctl kern.maxvnodes=1048576000 results in 100% CPU and a non-killable sysctl program. This is somewhat unexpected... fixed here https://cgit.freebsd.org/src/commit/?id=32988c1499f8698b41e15ed40a46d271e757bba3 I confirm. Thanks! Alexander. -- http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF http://www.FreeBSD.orgnetch...@freebsd.org : PGP 0x8F31830F9F2772BF
Re: 100% CPU time for sysctl command, not killable
Am 2023-08-20 21:23, schrieb Alexander Leidinger: Am 2023-08-20 18:55, schrieb Mina Galić: procstat(1) kstack could be helpful here. Original Message On 20 Aug 2023, 17:29, Alexander Leidinger alexan...@leidinger.net> wrote: Hi, sysctl kern.maxvnodes=1048576000 results in 100% CPU and a non-killable sysctl program. This is somewhat unexpected... Bye, Alexander. -- http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF http://www.FreeBSD.org netch...@freebsd.org : PGP 0x8F31830F9F2772BF
---snip---
  PID     TID COMM    TDNAME  KSTACK
94391  118678 sysctl  -       sysctl_maxvnodes sysctl_root_handler_locked sysctl_root userland_sysctl sys___sysctl amd64_syscall fast_syscall_common
---snip---
I experimented a bit by multiplying my initial value of 104857600. It fails between 5 and 6 times the initial value. sysctl kern.maxvnodes=524288000 is successful within 4 seconds. sysctl kern.maxvnodes=629145600 goes into a loop with the same procstat -k output. Bye, Alexander. -- http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF http://www.FreeBSD.orgnetch...@freebsd.org : PGP 0x8F31830F9F2772BF
Re: Possible issue with linux xattr support?
Am 2023-08-29 21:31, schrieb Felix Palmen: * Shawn Webb [20230829 15:25]: On Tue, Aug 29, 2023 at 09:15:03PM +0200, Felix Palmen wrote: > * Kyle Evans [20230829 14:07]: > > On 8/29/23 14:02, Shawn Webb wrote: > > > Back in 2019, I had a similar issue: I needed access to be able to > > > read/write to the system extended attribute namespace from within a > > > jailed context. I wrote a rather simple patch that provides that > > > support on a per-jail basis: > > > > > > https://git.hardenedbsd.org/hardenedbsd/HardenedBSD/-/commit/96c85982b45e44a6105664c7068a92d0a61da2a3 > > > > > > Hopefully that's useful to someone. > > > > > > Thanks, > > > > > > > FWIW (which likely isn't much), I like this approach much better; it makes > > more sense to me that it's a feature controlled by the creator of the jail > > and not one allowed just by using a compat ABI within a jail. > > Well, a typical GNU userland won't work in a jail without this, that's > what I know now. But I'm certainly with you, it doesn't feel logical > that a Linux binary can do something in a jail a FreeBSD binary can't. > > So, indeed, making it a jail option sounds better. > > Unless, bringing back a question raised earlier in this thread: What's > the reason to restrict this in a jailed context in the first place? IOW, > could it just be allowed unconditionally? In HardenedBSD's case, since we use filesystem extended attributes to toggle exploit mitigations on a per-application basis, there's now a conceptual security boundary between the host and the jail. Should the jail and the host share resources, like executables, a jailed process could toggle an exploit mitigation, and the toggle would bubble up to the host. So the next time the host executed /shared/app/executable/here, the security posture of the host would be affected. Isn't the sane approach here *not* to share any executables with a jail other than via a read-only nullfs mount? In https://reviews.freebsd.org/D40370 I provide infrastructure to automatically jail rc.d services. It will use the complete filesystem of the system, but uses all the other restrictions of jails. So the answer to your questions is "it depends". Bye, Alexander. -- http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF http://www.FreeBSD.orgnetch...@freebsd.org : PGP 0x8F31830F9F2772BF
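A minimal jail.conf sketch of the read-only nullfs pattern mentioned above (jail name and paths are examples only; the mount parameter takes fstab-format lines per jail(8)):

---snip---
myjail {
        # share a prepared base system read-only via nullfs
        mount += "/jails/base /jails/myjail/base nullfs ro 0 0";
}
---snip---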
Re: Possible issue with linux xattr support?
Am 2023-08-29 21:02, schrieb Shawn Webb: Back in 2019, I had a similar issue: I needed access to be able to read/write to the system extended attribute namespace from within a jailed context. I wrote a rather simple patch that provides that support on a per-jail basis: https://git.hardenedbsd.org/hardenedbsd/HardenedBSD/-/commit/96c85982b45e44a6105664c7068a92d0a61da2a3 You enabled it by default. I would assume you had a thought about the implications... any memories about it? What I'm after is: - What can go wrong if we enable it by default? - Why would we like to disable it (or any ideas why it is disabled by default in FreeBSD)? Depending in the answers we may even use a simpler patch and have it allowed in jails even without the possibility to configure it. Bye, Alexander. -- http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF http://www.FreeBSD.orgnetch...@freebsd.org : PGP 0x8F31830F9F2772BF
Re: Speed improvements in ZFS
Am 2023-08-22 18:59, schrieb Mateusz Guzik: On 8/22/23, Alexander Leidinger wrote: Am 2023-08-21 10:53, schrieb Konstantin Belousov: On Mon, Aug 21, 2023 at 08:19:28AM +0200, Alexander Leidinger wrote: Am 2023-08-20 23:17, schrieb Konstantin Belousov: > On Sun, Aug 20, 2023 at 11:07:08PM +0200, Mateusz Guzik wrote: > > On 8/20/23, Alexander Leidinger wrote: > > > Am 2023-08-20 22:02, schrieb Mateusz Guzik: > > >> On 8/20/23, Alexander Leidinger wrote: > > >>> Am 2023-08-20 19:10, schrieb Mateusz Guzik: > > >>>> On 8/18/23, Alexander Leidinger > > >>>> wrote: > > >>> > > >>>>> I have a 51MB text file, compressed to about 1MB. Are you > > >>>>> interested > > >>>>> to > > >>>>> get it? > > >>>>> > > >>>> > > >>>> Your problem is not the vnode limit, but nullfs. > > >>>> > > >>>> https://people.freebsd.org/~mjg/netchild-periodic-find.svg > > >>> > > >>> 122 nullfs mounts on this system. And every jail I setup has > > >>> several > > >>> null mounts. One basesystem mounted into every jail, and then > > >>> shared > > >>> ports (packages/distfiles/ccache) across all of them. > > >>> > > >>>> First, some of the contention is notorious VI_LOCK in order to > > >>>> do > > >>>> anything. > > >>>> > > >>>> But more importantly the mind-boggling off-cpu time comes from > > >>>> exclusive locking which should not be there to begin with -- as > > >>>> in > > >>>> that xlock in stat should be a slock. > > >>>> > > >>>> Maybe I'm going to look into it later. > > >>> > > >>> That would be fantastic. > > >>> > > >> > > >> I did a quick test, things are shared locked as expected. > > >> > > >> However, I found the following: > > >> if ((xmp->nullm_flags & NULLM_CACHE) != 0) { > > >> mp->mnt_kern_flag |= > > >> lowerrootvp->v_mount->mnt_kern_flag & > > >> (MNTK_SHARED_WRITES | MNTK_LOOKUP_SHARED | > > >> MNTK_EXTENDED_SHARED); > > >> } > > >> > > >> are you using the "nocache" option? it has a side effect of > > >> xlocking > > > > > > I use noatime, noexec, nosuid, nfsv4acls. I do NOT use nocache. > > > > > > > If you don't have "nocache" on null mounts, then I don't see how > > this > > could happen. > > There is also MNTK_NULL_NOCACHE on lower fs, which is currently set > for > fuse and nfs at least. 11 of those 122 nullfs mounts are ZFS datasets which are also NFS exported. 6 of those nullfs mounts are also exported via Samba. The NFS exports shouldn't be needed anymore, I will remove them. By nfs I meant nfs client, not nfs exports. No NFS client mounts anywhere on this system. So where is this exclusive lock coming from then... This is a ZFS system. 2 pools: one for the root, one for anything I need space for. Both pools reside on the same disks. The root pool is a 3-way mirror, the "space-pool" is a 5-disk raidz2. All jails are on the space-pool. The jails are all basejail-style jails. While I don't see why xlocking happens, you should be able to dtrace or printf your way into finding out. dtrace looks to me like a faster approach to get to the root than printf... my first naive try is to detect exclusive locks. I'm not 100% sure I got it right, but at least dtrace doesn't complain about it: ---snip--- #pragma D option dynvarsize=32m fbt:nullfs:null_lock:entry /args[0]->a_flags & 0x08 != 0/ { stack(); } ---snip--- In which direction should I look with dtrace if this works in tonights run of periodic? I don't have enough knowledge about VFS to come up with some immediate ideas. Bye, Alexander. 
-- http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF http://www.FreeBSD.orgnetch...@freebsd.org : PGP 0x8F31830F9F2772BF
Re: Possible issue with linux xattr support?
Am 2023-08-28 13:06, schrieb Dmitry Chagin: On Sun, Aug 27, 2023 at 09:55:23PM +0200, Felix Palmen wrote: * Dmitry Chagin [20230827 22:46]: > I can fix this completely disabling extattr for jailed proc, > however, it's gonna be bullshit, though Would probably be better than nothing. AFAIK, "Linux jails" are used a lot, probably with userlands from distributions actually using xattr. It might make sense to allow this priv (PRIV_VFS_EXTATTR_SYSTEM) for linux jails by default? What do you think, James? I think the question is more whether we want to allow it in jails (not specific to linux jails, as in: if it is ok for linux jails, it should be ok for FreeBSD jails too). So the question is: what does this protect the host from, if this is not allowed in jails? Some kind of possibility to DoS the host? Bye, Alexander. -- http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF http://www.FreeBSD.orgnetch...@freebsd.org : PGP 0x8F31830F9F2772BF
Re: Speed improvements in ZFS
Am 2023-08-21 10:53, schrieb Konstantin Belousov: On Mon, Aug 21, 2023 at 08:19:28AM +0200, Alexander Leidinger wrote: Am 2023-08-20 23:17, schrieb Konstantin Belousov: > On Sun, Aug 20, 2023 at 11:07:08PM +0200, Mateusz Guzik wrote: > > On 8/20/23, Alexander Leidinger wrote: > > > Am 2023-08-20 22:02, schrieb Mateusz Guzik: > > >> On 8/20/23, Alexander Leidinger wrote: > > >>> Am 2023-08-20 19:10, schrieb Mateusz Guzik: > > >>>> On 8/18/23, Alexander Leidinger wrote: > > >>> > > >>>>> I have a 51MB text file, compressed to about 1MB. Are you interested > > >>>>> to > > >>>>> get it? > > >>>>> > > >>>> > > >>>> Your problem is not the vnode limit, but nullfs. > > >>>> > > >>>> https://people.freebsd.org/~mjg/netchild-periodic-find.svg > > >>> > > >>> 122 nullfs mounts on this system. And every jail I setup has several > > >>> null mounts. One basesystem mounted into every jail, and then shared > > >>> ports (packages/distfiles/ccache) across all of them. > > >>> > > >>>> First, some of the contention is notorious VI_LOCK in order to do > > >>>> anything. > > >>>> > > >>>> But more importantly the mind-boggling off-cpu time comes from > > >>>> exclusive locking which should not be there to begin with -- as in > > >>>> that xlock in stat should be a slock. > > >>>> > > >>>> Maybe I'm going to look into it later. > > >>> > > >>> That would be fantastic. > > >>> > > >> > > >> I did a quick test, things are shared locked as expected. > > >> > > >> However, I found the following: > > >> if ((xmp->nullm_flags & NULLM_CACHE) != 0) { > > >> mp->mnt_kern_flag |= > > >> lowerrootvp->v_mount->mnt_kern_flag & > > >> (MNTK_SHARED_WRITES | MNTK_LOOKUP_SHARED | > > >> MNTK_EXTENDED_SHARED); > > >> } > > >> > > >> are you using the "nocache" option? it has a side effect of xlocking > > > > > > I use noatime, noexec, nosuid, nfsv4acls. I do NOT use nocache. > > > > > > > If you don't have "nocache" on null mounts, then I don't see how this > > could happen. > > There is also MNTK_NULL_NOCACHE on lower fs, which is currently set for > fuse and nfs at least. 11 of those 122 nullfs mounts are ZFS datasets which are also NFS exported. 6 of those nullfs mounts are also exported via Samba. The NFS exports shouldn't be needed anymore, I will remove them. By nfs I meant nfs client, not nfs exports. No NFS client mounts anywhere on this system. So where is this exclusive lock coming from then... This is a ZFS system. 2 pools: one for the root, one for anything I need space for. Both pools reside on the same disks. The root pool is a 3-way mirror, the "space-pool" is a 5-disk raidz2. All jails are on the space-pool. The jails are all basejail-style jails. Bye, Alexander. -- http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF http://www.FreeBSD.orgnetch...@freebsd.org : PGP 0x8F31830F9F2772BF
Re: Speed improvements in ZFS
Am 2023-08-20 23:17, schrieb Konstantin Belousov: On Sun, Aug 20, 2023 at 11:07:08PM +0200, Mateusz Guzik wrote: On 8/20/23, Alexander Leidinger wrote: > Am 2023-08-20 22:02, schrieb Mateusz Guzik: >> On 8/20/23, Alexander Leidinger wrote: >>> Am 2023-08-20 19:10, schrieb Mateusz Guzik: >>>> On 8/18/23, Alexander Leidinger wrote: >>> >>>>> I have a 51MB text file, compressed to about 1MB. Are you interested >>>>> to >>>>> get it? >>>>> >>>> >>>> Your problem is not the vnode limit, but nullfs. >>>> >>>> https://people.freebsd.org/~mjg/netchild-periodic-find.svg >>> >>> 122 nullfs mounts on this system. And every jail I setup has several >>> null mounts. One basesystem mounted into every jail, and then shared >>> ports (packages/distfiles/ccache) across all of them. >>> >>>> First, some of the contention is notorious VI_LOCK in order to do >>>> anything. >>>> >>>> But more importantly the mind-boggling off-cpu time comes from >>>> exclusive locking which should not be there to begin with -- as in >>>> that xlock in stat should be a slock. >>>> >>>> Maybe I'm going to look into it later. >>> >>> That would be fantastic. >>> >> >> I did a quick test, things are shared locked as expected. >> >> However, I found the following: >> if ((xmp->nullm_flags & NULLM_CACHE) != 0) { >> mp->mnt_kern_flag |= >> lowerrootvp->v_mount->mnt_kern_flag & >> (MNTK_SHARED_WRITES | MNTK_LOOKUP_SHARED | >> MNTK_EXTENDED_SHARED); >> } >> >> are you using the "nocache" option? it has a side effect of xlocking > > I use noatime, noexec, nosuid, nfsv4acls. I do NOT use nocache. > If you don't have "nocache" on null mounts, then I don't see how this could happen. There is also MNTK_NULL_NOCACHE on lower fs, which is currently set for fuse and nfs at least. 11 of those 122 nullfs mounts are ZFS datasets which are also NFS exported. 6 of those nullfs mounts are also exported via Samba. The NFS exports shouldn't be needed anymore, I will remove them. Shouldn't this implicit nocache propagate to the mount of the upper fs to give the user feedback about the effective state? Bye, Alexander. -- http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF http://www.FreeBSD.orgnetch...@freebsd.org : PGP 0x8F31830F9F2772BF
Re: Speed improvements in ZFS
Am 2023-08-20 22:02, schrieb Mateusz Guzik: On 8/20/23, Alexander Leidinger wrote: Am 2023-08-20 19:10, schrieb Mateusz Guzik: On 8/18/23, Alexander Leidinger wrote: I have a 51MB text file, compressed to about 1MB. Are you interested to get it? Your problem is not the vnode limit, but nullfs. https://people.freebsd.org/~mjg/netchild-periodic-find.svg 122 nullfs mounts on this system. And every jail I setup has several null mounts. One basesystem mounted into every jail, and then shared ports (packages/distfiles/ccache) across all of them. First, some of the contention is notorious VI_LOCK in order to do anything. But more importantly the mind-boggling off-cpu time comes from exclusive locking which should not be there to begin with -- as in that xlock in stat should be a slock. Maybe I'm going to look into it later. That would be fantastic. I did a quick test, things are shared locked as expected. However, I found the following: if ((xmp->nullm_flags & NULLM_CACHE) != 0) { mp->mnt_kern_flag |= lowerrootvp->v_mount->mnt_kern_flag & (MNTK_SHARED_WRITES | MNTK_LOOKUP_SHARED | MNTK_EXTENDED_SHARED); } are you using the "nocache" option? it has a side effect of xlocking I use noatime, noexec, nosuid, nfsv4acls. I do NOT use nocache. Bye, Alexander. -- http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF http://www.FreeBSD.orgnetch...@freebsd.org : PGP 0x8F31830F9F2772BF
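For reference, a null mount with these options (paths invented) would look like the following; note that "nocache" is deliberately absent, as it would trigger the exclusive locking discussed above:
---snip---
# hedged example of a per-jail basesystem null mount without "nocache"
mount -t nullfs -o noatime,noexec,nosuid /jails/basejail /jails/j1/basejail
---snip---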
Re: Speed improvements in ZFS
Am 2023-08-20 19:10, schrieb Mateusz Guzik: On 8/18/23, Alexander Leidinger wrote: I have a 51MB text file, compressed to about 1MB. Are you interested to get it? Your problem is not the vnode limit, but nullfs. https://people.freebsd.org/~mjg/netchild-periodic-find.svg 122 nullfs mounts on this system. And every jail I setup has several null mounts. One basesystem mounted into every jail, and then shared ports (packages/distfiles/ccache) across all of them. First, some of the contention is notorious VI_LOCK in order to do anything. But more importantly the mind-boggling off-cpu time comes from exclusive locking which should not be there to begin with -- as in that xlock in stat should be a slock. Maybe I'm going to look into it later. That would be fantastic. Bye, Alexander. -- http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF http://www.FreeBSD.orgnetch...@freebsd.org : PGP 0x8F31830F9F2772BF
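A minimal off-CPU measurement along those lines (a sketch assuming DTrace is available; this is not the exact script Mateusz used) sums the time "find" spends blocked, per kernel stack:
---snip---
dtrace -n '
sched:::off-cpu /execname == "find"/ { self->ts = timestamp; }
sched:::on-cpu /self->ts/ {
    @[stack()] = sum(timestamp - self->ts);   # ns blocked, keyed by stack
    self->ts = 0;
}'
---snip---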
Re: 100% CPU time for sysctl command, not killable
Am 2023-08-20 18:55, schrieb Mina Galić: procstat(1) kstack could be helpful here. Original Message On 20 Aug 2023, 17:29, Alexander Leidinger alexan...@leidinger.net> wrote: Hi, sysctl kern.maxvnodes=1048576000 results in 100% CPU and a non-killable sysctl program. This is somewhat unexpected... Bye, Alexander. -- http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF http://www.FreeBSD.org netch...@freebsd.org : PGP 0x8F31830F9F2772BF
PID   TID    COMM    TDNAME  KSTACK
94391 118678 sysctl  -       sysctl_maxvnodes sysctl_root_handler_locked sysctl_root userland_sysctl sys___sysctl amd64_syscall fast_syscall_common
Bye, Alexander. -- http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF http://www.FreeBSD.orgnetch...@freebsd.org : PGP 0x8F31830F9F2772BF
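The kstack output above can be produced like this (PID taken from the output; a hedged example):
---snip---
procstat -k 94391     # kernel stacks of the threads of the wedged sysctl
procstat -kk 94391    # repeat -k for more stack detail
---snip---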
100% CPU time for sysctl command, not killable
Hi, sysctl kern.maxvnodes=1048576000 results in 100% CPU and a non-killable sysctl program. This is somewhat unexpected... Bye, Alexander. -- http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF http://www.FreeBSD.orgnetch...@freebsd.org : PGP 0x8F31830F9F2772BF
Re: Speed improvements in ZFS
Am 2023-08-16 18:48, schrieb Alexander Leidinger: Am 2023-08-15 23:29, schrieb Mateusz Guzik: On 8/15/23, Alexander Leidinger wrote: Am 2023-08-15 14:41, schrieb Mateusz Guzik: With this in mind can you provide: sysctl kern.maxvnodes vfs.wantfreevnodes vfs.freevnodes vfs.vnodes_created vfs.numvnodes vfs.recycles_free vfs.recycles After a reboot: kern.maxvnodes: 10485760 vfs.wantfreevnodes: 2621440 vfs.freevnodes: 24696 vfs.vnodes_created: 1658162 vfs.numvnodes: 173937 vfs.recycles_free: 0 vfs.recycles: 0 New values after one run of periodic: kern.maxvnodes: 10485760 vfs.wantfreevnodes: 2621440 vfs.freevnodes: 356202 vfs.vnodes_created: 427696288 vfs.numvnodes: 532620 vfs.recycles_free: 20213257 vfs.recycles: 0 And after the second round which only took 7h this night: kern.maxvnodes: 10485760 vfs.wantfreevnodes: 2621440 vfs.freevnodes: 3071754 vfs.vnodes_created: 1275963316 vfs.numvnodes: 3414906 vfs.recycles_free: 58411371 vfs.recycles: 0 Meanwhile if there is tons of recycles, you can damage control by bumping kern.maxvnodes. What's the difference between recycles and recycles_free? Does the above count as bumping the maxvnodes? Looks like there are not many free vnodes directly after the reboot. I will check the values tomorrow after the periodic run again and maybe increase by 10 or 100 to see if it makes a difference. If this is not the problem you can use dtrace to figure it out. dtrace-count on vnlru_read_freevnodes() and vnlru_free_locked()? Or something else? I mean checking where find is spending time instead of speculating. There is no productized way to do it so to speak, but the following crapper should be good enough: [script] I will let it run this night. I have a 51MB text file, compressed to about 1MB. Are you interested to get it? Bye, Alexander. -- http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF http://www.FreeBSD.orgnetch...@freebsd.org : PGP 0x8F31830F9F2772BF
Re: Speed improvements in ZFS
Am 2023-08-15 23:29, schrieb Mateusz Guzik: On 8/15/23, Alexander Leidinger wrote: Am 2023-08-15 14:41, schrieb Mateusz Guzik: With this in mind can you provide: sysctl kern.maxvnodes vfs.wantfreevnodes vfs.freevnodes vfs.vnodes_created vfs.numvnodes vfs.recycles_free vfs.recycles After a reboot: kern.maxvnodes: 10485760 vfs.wantfreevnodes: 2621440 vfs.freevnodes: 24696 vfs.vnodes_created: 1658162 vfs.numvnodes: 173937 vfs.recycles_free: 0 vfs.recycles: 0 New values after one run of periodic: kern.maxvnodes: 10485760 vfs.wantfreevnodes: 2621440 vfs.freevnodes: 356202 vfs.vnodes_created: 427696288 vfs.numvnodes: 532620 vfs.recycles_free: 20213257 vfs.recycles: 0 Meanwhile if there is tons of recycles, you can damage control by bumping kern.maxvnodes. What's the difference between recycles and recycles_free? Does the above count as bumping the maxvnodes? Looks like there are not many free vnodes directly after the reboot. I will check the values tomorrow after the periodic run again and maybe increase by 10 or 100 to see if it makes a difference. If this is not the problem you can use dtrace to figure it out. dtrace-count on vnlru_read_freevnodes() and vnlru_free_locked()? Or something else? I mean checking where find is spending time instead of speculating. There is no productized way to do it so to speak, but the following crapper should be good enough: [script] I will let it run this night. Bye, Alexander. -- http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF http://www.FreeBSD.orgnetch...@freebsd.org : PGP 0x8F31830F9F2772BF
Re: Speed improvements in ZFS
Am 2023-08-15 14:41, schrieb Mateusz Guzik: With this in mind can you provide: sysctl kern.maxvnodes vfs.wantfreevnodes vfs.freevnodes vfs.vnodes_created vfs.numvnodes vfs.recycles_free vfs.recycles After a reboot: kern.maxvnodes: 10485760 vfs.wantfreevnodes: 2621440 vfs.freevnodes: 24696 vfs.vnodes_created: 1658162 vfs.numvnodes: 173937 vfs.recycles_free: 0 vfs.recycles: 0 Meanwhile if there is tons of recycles, you can damage control by bumping kern.maxvnodes. Looks like there are not many free vnodes directly after the reboot. I will check the values tomorrow after the periodic run again and maybe increase by 10 or 100 to see if it makes a difference. If this is not the problem you can use dtrace to figure it out. dtrace-count on vnlru_read_freevnodes() and vnlru_free_locked()? Or something else? Bye, Alexander. -- http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF http://www.FreeBSD.orgnetch...@freebsd.org : PGP 0x8F31830F9F2772BF
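The fbt-based counting asked about at the end would look roughly like this (a sketch; it assumes the fbt provider can attach to these symbols on the running kernel):
---snip---
dtrace -n '
fbt::vnlru_free_locked:entry,
fbt::vnlru_read_freevnodes:entry { @[probefunc] = count(); }'
---snip---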
Re: Strange network issues with -current
Am 2023-08-15 14:24, schrieb Alexander Leidinger: Am 2023-08-15 13:48, schrieb Alexander Leidinger: for a while now I have had some strange network issues in some parts of a particular system. I just stumbled upon the mail which discusses issues with commit e3ba0d6adde3, and when I look into this I see changes related to the use of SO_REUSEPORT flags, and all my nginx systems use the reuseport directive in their config. I'm compiling right now with this change reverted. Once tested I will report back. Unfortunately it wasn't that. Bye, Alexander. -- http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF http://www.FreeBSD.orgnetch...@freebsd.org : PGP 0x8F31830F9F2772BF
Re: Strange network issues with -current
Am 2023-08-15 13:48, schrieb Alexander Leidinger: for a while now I have had some strange network issues in some parts of a particular system. I just stumbled upon the mail which discusses issues with commit e3ba0d6adde3, and when I look into this I see changes related to the use of SO_REUSEPORT flags, and all my nginx systems use the reuseport directive in their config. I'm compiling right now with this change reverted. Once tested I will report back. Bye, Alexander. -- http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF http://www.FreeBSD.orgnetch...@freebsd.org : PGP 0x8F31830F9F2772BF
Speed improvements in ZFS
Hi, just a report that I noticed a very high speed improvement in ZFS in -current. For a looong time (at least since last year), on a jail-host of mine with >20 jails, each of which runs periodic daily, the periodic daily runs of the jails have taken from about 3 am to 5 pm or longer. I don't remember when this started, and I thought at the time that the problem may be data related. It's the long runs of "find" in one of the periodic daily jobs which take that long; the number of jails, together with the null-mounted basesystem and a null-mounted package repository inside each jail, means the number of files and the concurrent access to the spinning rust (with first SSD and now NVMe based cache) may have reached some tipping point. I have all the periodic daily mails around, so theoretically I may be able to find when this started, but as can be seen in another mail to this mailinglist, the system which has all the periodic mails has some issues which have higher priority for me to track down... Since I updated to a src from 2023-07-20, this is not the case anymore. The data is the same (maybe even a bit more, as I have added 2 more jails since then, and the periodic daily runs, which run more or less in parallel, are not taking considerably longer). The speed increase with the July build is in the area of 3-4 hours for 23 parallel periodic daily runs. So instead of finishing the periodic runs around 5 pm, they finish already around 1 pm/2 pm. So whatever was done inside ZFS or VFS or nullfs between 2023-06-19 and 2023-07-20 has given a huge speed improvement. From my memory I would say there is still room for improvement, as I think it may be the case that the periodic daily runs once ended in the morning instead of the afternoon, but my memory may be flaky in this regard... Great work to whoever was involved. Bye, Alexander. -- http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF http://www.FreeBSD.orgnetch...@freebsd.org : PGP 0x8F31830F9F2772BF
Strange network issues with -current
Hi, for a while now I have had some strange network issues in some parts of a particular system. A build with src from 2023-07-26 was still working ok. An update to 2023-08-07 broke some parts in a strange way. Trying again with src from 2023-08-11 didn't fix things. What I see is... strange and complex. I have a jail host with about 23 jails. All the jails are sitting on a bridge, and have IPv6 and IPv4 addresses. One jail is a DNS server for a domain which contains all the DNS entries for all the jails on the system (and more). Other jails have mysql (FS socket for mysql nullfs-mounted into other jails for connecting to mysql via the FS socket instead of the network), a dovecot IMAP server, a postfix SMTP server, an nginx based reverse proxy and 2 different kinds of webmail solutions (the old php74 based one on the way out in favour of a php81 based one), a wiki and other things. With the old working basesystem I can log in to the old webmail system and read mails. With the newer non-working basesystem I can still log in, but the auth-credentials are not stored in the backend-session and as such no mail is listed at all, as this requires subsequent connections from php to dovecot. This webmail system goes via the reverse proxy to the webmail-jail, which has another nginx configured to connect to the php-fpm backend. With the new webmail system I can log in, read mails, and am even writing this email from it. The first login to it fails. The second succeeds. It is not behind the reverse proxy (as it is not fully ready yet for access from the outside (DSL with NAT on the DSL-box to the reverse proxy)), but a single nginx with php-fpm backend (instead of 2 nginx + php-fpm as in the old webmail). The wiki behind the reverse proxy is sometimes working, and sometimes not. Sometimes it is providing everything, sometimes parts of the site are missing (e.g. pictures / icons). Sometimes there is simply a blank page, sometimes it gives an error message from the wiki about an unforeseen bug... The error message in the nginx reverse proxy log for all the strange failure cases is "accept4() failed (53: Software caused connection abort)". Sometimes I get "upstream timed out". When it times out in the reverse proxy instead of getting the accept4-errors, I see the same accept4-error message in the nginx inside the wiki or webmail jail instead. I tried to recompile all the components of the wiki and reverse proxy and php81 based webmail, to no avail. The issue persists. Does this ring a bell for someone? Maybe some network or socket or VM based changes in this timeframe which smell like they could be related and may be good candidates for a backout test? Any ideas on how to drill down with debugging to get a simpler test-case than the complex setup of if_bridge, epair, jails, wiki, php, nginx, ...? Bye, Alexander. -- http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF http://www.FreeBSD.orgnetch...@freebsd.org : PGP 0x8F31830F9F2772BF
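One possible starting point for the drill-down asked for at the end: capture on both sides of the jail path and compare where connections die (interface names and port are placeholders):
---snip---
tcpdump -ni bridge0 'tcp port 80'      # what reaches the bridge
tcpdump -ni epair5b 'tcp port 80'      # what reaches the jail side of the epair
---snip---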
Re: kernel: sonewconn: pcb 0xfffff8002b255a00 (local:/var/run/devd.seqpacket.pipe): Listen queue overflow: 1 already in queue awaiting acceptance (60 occurrences), ?
Quoting Gary Jennejohn (from Tue, 20 Jun 2023 14:41:41 +): On Tue, 20 Jun 2023 12:04:13 +0200 Alexander Leidinger wrote: "listen X backlog=y" and "sysctl kern.ipx.somaxconn=X" for FreeBSD On my FreeBSD14 system these things are all under kern.ipc. Typo on my side... it was supposed to read ipc, not ipx. Bye, Alexander. -- http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF http://www.FreeBSD.orgnetch...@freebsd.org : PGP 0x8F31830F9F2772BF pgpTjpqaetzBB.pgp Description: Digitale PGP-Signatur
Re: kernel: sonewconn: pcb 0xfffff8002b255a00 (local:/var/run/devd.seqpacket.pipe): Listen queue overflow: 1 already in queue awaiting acceptance (60 occurrences), ?
Quoting Gary Jennejohn (from Tue, 20 Jun 2023 07:41:08 +): On Tue, 20 Jun 2023 06:25:05 +0100 Graham Perrin wrote: Please, what's the meaning of the sonewconn lines? sonewconn is described in socket(9). Below a copy/paste of the description from socket(9): Protocol implementations can use sonewconn() to create a socket and attach protocol state to that socket. This can be used to create new sockets available for soaccept() on a listen socket. The returned socket has a reference count of zero. Apparently there was already a listen socket in the queue which had not been consumed by soaccept() when a new sonewconn() call was made. Anyway, that's my understanding. Might be wrong. In other words the software listening on it didn't process the request fast enough and a backlog piled up (e.g apache ListenBacklog or nginx "listen X backlog=y" and "sysctl kern.ipx.somaxconn=X" for FreeBSD itself). You may need faster hardware, more processes/threads to handle the traffic, or configure your software to do less to produce the same result (e.g. no real-time DNS resolution in the logging of a webserver or increasing the amount of allowed items in the backlog). If you can change the software, there's also the possibility to switch from blocking sockets to non-blocking sockets (to not have the select/accept loop block / run into contention) or kqueue. Bye, Alexander. -- http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF http://www.FreeBSD.orgnetch...@freebsd.org : PGP 0x8F31830F9F2772BF pgpAjQQlBmAmQ.pgp Description: Digitale PGP-Signatur
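A hedged example of the tuning described above (values are illustrative; note the sysctl is kern.ipc.somaxconn, per the correction in the follow-up above):
---snip---
sysctl kern.ipc.somaxconn=1024
echo 'kern.ipc.somaxconn=1024' >> /etc/sysctl.conf    # persist across reboots
# nginx side, in nginx.conf: listen 443 ssl backlog=1024;
---snip---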
Re: Surprise null root password
Quoting bob prohaska (from Tue, 30 May 2023 08:36:21 -0700): I suggest to review changes ("df" instead of "tf" in etcupdate) to at least those files which you know you have modified, including the password/group stuff. After that you can decide if the diff which is shown with "df" can be applied ("tf"), or if you want to keep the old version ("mf"), or if you want to modify the current file ("e", with both versions present in the file so that you can copy/paste between the different versions and keep what you need). The key sequences required to copy and paste between files in the edit screen were elusive. Probably it was thought self-evident, but not for me. I last tried it long ago, via mergemaster. Is there a guide to commands for merging files using etcupdate? Is it in the vi man page? I couldn't find it. etcupdate respects the EDITOR env-variable. You can use any editor you like there. Typically I use the mouse to copy, and google every time I can't (https://linuxize.com/post/how-to-copy-cut-paste-in-vim/). Bye, Alexander. -- http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF http://www.FreeBSD.orgnetch...@freebsd.org : PGP 0x8F31830F9F2772BF pgp4RaBvVQJjb.pgp Description: Digitale PGP-Signatur
Re: Surprise null root password
Quoting bob prohaska (from Fri, 26 May 2023 16:26:06 -0700): On Fri, May 26, 2023 at 10:55:49PM +0200, Yuri wrote: The question is how you update the configuration files, mergemaster/etcupdate/something else? Via etcupdate after installworld. In the event the system requests manual intervention I accept "theirs all". It seems odd if that can null a root password. Still, it does seem an outside possibility. I could see it adding system users, but messing with root's existing password seems a bit unexpected. As you are posting to -current@, I expect you to report this issue about 14-current systems. As such: there was a "recent" change (2021-10-20) to the root entry to change the shell. https://cgit.freebsd.org/src/commit/etc/master.passwd?id=d410b585b6f00a26c2de7724d6576a3ea7d548b7 By blindly accepting all changes, this has reset the PW to the default setting (empty). I suggest to review changes ("df" instead of "tf" in etcupdate) to at least those files which you know you have modified, including the password/group stuff. After that you can decide if the diff which is shown with "df" can be applied ("tf"), or if you want to keep the old version ("mf"), or if you want to modify the current file ("e", with both versions present in the file so that you can copy/paste between the different versions and keep what you need). Bye, Alexander. -- http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF http://www.FreeBSD.orgnetch...@freebsd.org : PGP 0x8F31830F9F2772BF pgpGEjDP92h3s.pgp Description: Digitale PGP-Signatur
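A sketch of the more careful workflow described above, using etcupdate's own subcommands:
---snip---
etcupdate diff       # review local changes against the stock files first
etcupdate            # apply the update; conflicts are left for resolution
etcupdate resolve    # walk the conflicts with the df/tf/mf/e choices above
---snip---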
Re: change in compat/linux breaking net/citrix_ica
Quoting Jakob Alvermark (from Wed, 26 Apr 2023 09:01:00 +0200): Hi, I use net/citrix_ica for work. After a recent change to -current in compat/linux it no longer works. The binary just segfaults. What does "sysctl compat.linux.osrelease" display? If it is not 2.6.30 or higher, try to set it to 2.6.30 or higher. Bye, Alexander. I have bisected and it happened after this commit: commit 40c36c4674eb9602709cf9d0483a4f34ad9753f6 Author: Dmitry Chagin Date: Sat Apr 22 22:17:17 2023 +0300 linux(4): Export the AT_RANDOM depending on the process osreldata AT_RANDOM has appeared in the 2.6.30 Linux kernel first time. -- http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF http://www.FreeBSD.orgnetch...@freebsd.org : PGP 0x8F31830F9F2772BF pgpvwszGFGPAo.pgp Description: Digitale PGP-Signatur
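A hedged example of checking and bumping the value (2.6.32 is just an illustrative value >= 2.6.30):
---snip---
sysctl compat.linux.osrelease
sysctl compat.linux.osrelease=2.6.32
echo 'compat.linux.osrelease=2.6.32' >> /etc/sysctl.conf    # make it persistent
---snip---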
Re: git: 2a58b312b62f - main - zfs: merge openzfs/zfs@431083f75
Quoting Mark Millard (from Wed, 12 Apr 2023 22:28:13 -0700): A fair number of errors are of the form: the build installing a previously built package for use in the builder but later the builder can not find some file from the package's installation. As a data point, last year I had such issues with one particular package. It was consistent no matter how often I updated the ports tree. Poudriere always failed on port X, which was depending on port Y (don't remember the names). The problem was that port Y was built successfully, but an extraction of it did not have a file it was supposed to have. IIRC I fixed the issue by building port Y manually, as re-building port Y with poudriere didn't change the outcome. So it seems this may not be specific to the most recent ZFS version, but could be an older issue. It may be the case that the more recent ZFS version amplifies the problem. It can also be that it is related to a specific use case in poudriere. I remember a recent mail which talks about poudriere failing to copy files in resource-limited environments, see https://lists.freebsd.org/archives/dev-commits-src-all/2023-April/025153.html While the issue you are trying to pin-point may not be related to this discussion, I mention it because it smells to me like we could be in a situation where a similar combination of mutually unrelated FreeBSD features triggers the issue at hand. Bye, Alexander. -- http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF http://www.FreeBSD.orgnetch...@freebsd.org : PGP 0x8F31830F9F2772BF pgpjoaPNf5aAM.pgp Description: Digitale PGP-Signatur
Re: py-libzfs build failure on current, zpool_search_import() missing
Quoting Ryan Moeller (from Fri, 3 Feb 2023 10:48:35 -0500): The build still fails on -current as of end of Jan with "too few arguments to function call, expected 4, have 3" for zfs_iter_filesystems. Is a patch for openzfs in -current missing? I haven't seen a commit to -current in openzfs in the last 2 days. The openzfs changes aren't that recent, but the py-libzfs port has been out of date for a while. I'll spin up a new snapshot VM and fix whatever is still broken. I can confirm that the 20230207 version of py-libzfs builds (and works) on -current. Thanks! Bye, Alexander. -- http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF http://www.FreeBSD.orgnetch...@freebsd.org : PGP 0x8F31830F9F2772BF pgpMIsS8likHB.pgp Description: Digitale PGP-Signatur
Re: py-libzfs build failure on current, zpool_search_import() missing
Quoting Ryan Moeller (from Thu, 2 Feb 2023 10:43:53 -0500): I've updated the py-libzfs port to fix the build. The build still fails on -current as of end of Jan with "too few arguments to function call, expected 4, have 3" for zfs_iter_filesystems. Is a patch for openzfs in -current missing? I haven't seen a commit to -current in openzfs in the last 2 days. Bye, Alexander. -- http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF http://www.FreeBSD.orgnetch...@freebsd.org : PGP 0x8F31830F9F2772BF pgpXHOYNsPe15.pgp Description: Digitale PGP-Signatur
Re: py-libzfs build failure on current, zpool_search_import() missing
Quoting Alan Somers (from Thu, 2 Feb 2023 06:58:35 -0700): Unfortunately libzfs doesn't have a stable API, so this kind of breakage is to be expected. libzfs_core does, but libzfs_core is incomplete. You should report this problem upstream at https://github.com/truenas/py-libzfs . I did already. https://github.com/truenas/py-libzfs/issues/224 There is no libzfs_core.h in /usr/include, can it be that we need to install this there? Bye, Alexander. -- http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF http://www.FreeBSD.orgnetch...@freebsd.org : PGP 0x8F31830F9F2772BF pgpqkRgAqlhhN.pgp Description: Digitale PGP-Signatur
py-libzfs build failure on current, zpool_search_import() missing
Hi, the build of py-libzfs fails on -current due to a missing zpool_search_import(), and as such iocage can not be built (and the old iocage segfaults, so the ABI seems to have changed too). The symbol is available in libzutil, but I can not find zpool_search_import() in /usr/include. Anyone with an idea if there is something missing (maybe something to be installed into /usr/include), or what needs to be done to py-libzfs? Bye, Alexander. -- http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF http://www.FreeBSD.orgnetch...@freebsd.org : PGP 0x8F31830F9F2772BF pgpT2LpLYHf04.pgp Description: Digitale PGP-Signatur
Re: RFC: nfsd in a vnet jail
Quoting Alan Somers (from Tue, 29 Nov 2022 17:28:10 -0700): On Tue, Nov 29, 2022 at 5:21 PM Rick Macklem wrote: So, what do others think of enforcing the requirement that each jail have its own file systems for this? I think that's a totally reasonable requirement. Especially so for ZFS users, who already create a filesystem per jail for other reasons. While I agree that it is a reasonable requirement, just a note that we can not assume that every existing jail resides on its own file system. The base system jail infrastructure doesn't check this, and the ezjail port doesn't either. The iocage port does it. Is there a way to detect this inside a jail and error out in nfsd/mountd? Bye, Alexander. -- http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF http://www.FreeBSD.orgnetch...@freebsd.org : PGP 0x8F31830F9F2772BF pgpRjJWWhBIKb.pgp Description: Digitale PGP-Signatur
Re: ULE realtime scheduler advice needed
Quoting Hans Petter Selasky (from Fri, 18 Nov 2022 05:47:58 +0100): Hi, I'm doing some work with audio and have noticed some problems with the ULE scheduler. I have a program that generates audio based on key-presses. When no keys are pressed, the load is near 0%, but as soon as you start pressing keys, the load goes maybe to 80% of a CPU core. This program I run with rtprio 8 xxx. The issue I observe, or hear actually, is that it takes too long until the scheduler grasps that this program needs its own CPU core and stops time-sharing the program. When I however use cpuset -l xxx rtprio 8 yyy everything is good, and the program outputs realtime audio in-time. I have something in my mind about ULE not handling idleprio and/or rtprio correctly, but I have no pointer to a validation of this. Or is this perhaps a CPU frequency stepping issue? You could play with rc.conf (/etc/rc.d/power_profile): performance_cpu_freq="HIGH" performance_cx_lowest="C3" # see sysctl dev.cpu.0 | grep cx economy_cx_lowest="C3" # see sysctl dev.cpu.0 | grep cx Your system may provide other Cx possibilities, and going to a lower number (e.g. C1) means less power-saving but faster response from the CPU (I do not expect that this is causing the issue you have). Any advice on where to look? Potential sysctl to play with to change "interactivity detection" in ULE: https://www.mail-archive.com/freebsd-stable@freebsd.org/msg112118.html Bye, Alexander. -- http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF http://www.FreeBSD.orgnetch...@freebsd.org : PGP 0x8F31830F9F2772BF pgpkBXOWiYjFT.pgp Description: Digitale PGP-Signatur
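For completeness, a sketch of the combined workaround plus the inspection commands (core number and program name are placeholders):
---snip---
cpuset -l 3 rtprio 8 ./synth                         # pin to core 3, realtime prio 8
sysctl dev.cpu.0.cx_supported dev.cpu.0.cx_lowest    # available/active C-states
sysctl dev.cpu.0.freq_levels                         # available frequency levels
---snip---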
Re: changes to the zfs boot (was: Re: git: 72a1cb05cd23 - main - rc(8): Add a zpoolupgrade rc.d script)
Quoting Warner Losh (from Wed, 9 Nov 2022 08:54:33 -0700): On Wed, Nov 9, 2022 at 5:46 AM Alexander Leidinger wrote: While most of these options look OK on the surface, I'd feel a lot better if there were tests for these to prove they work. I'd also feel better if the ZFS experts could explain how those come to be set on a zpool as well. I'd settle for a good script that could be run as root (better It is explained in the zpool-features man page. would be not as root) that would take a filesystem that was created by makefs -t zfs and turn on these features after an zpool upgrade. Script attached. Maybe a little bit too verbose, but you can see which features are active directly, and which ones only enabled. It expects a zroot.img in the current directory and creates copies to zroot_num_featurename.img where it enables the features. In the beginning are some variables to adapt to pool/image name and destination directory. Bye, Alexander. -- http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF http://www.FreeBSD.orgnetch...@freebsd.org : PGP 0x8F31830F9F2772BF zpool_features.sh Description: Bourne shell script pgpnwcnaQ1dYc.pgp Description: Digitale PGP-Signatur
Re: changes to the zfs boot (was: Re: git: 72a1cb05cd23 - main - rc(8): Add a zpoolupgrade rc.d script)
Quoting "Patrick M. Hausen" (from Wed, 9 Nov 2022 22:47:28 +0100): Hi, Am 09.11.2022 um 22:38 schrieb Patrick M. Hausen : Am 09.11.2022 um 22:26 schrieb Alexander Leidinger : On quick look I haven't found a place where a compatibility setting is used for the rpool during the creation, so I can't point out what the exact difference is. Given that empty_bpobj is not in the list of the boot code, it can't be the same, some limit of enabled features has to be in place during initial install, and your example has to be different. That feature was imported into FreeBSD in 2012 so it should be enabled in every pool created since then. I apologize, should have included that in the last mail. This is a current FreeBSD 13.1-p2 hosting system we run. Boots quite fine ;-) There are several in the list which are not in the list in zfsipl.c. So that list is not the full truth... Bye, Alexander. --- [ry93@pdn006 ~]$ zpool get all zroot|grep feature zroot feature@async_destroy enabledlocal zroot feature@empty_bpobjactive local zroot feature@lz4_compress active local zroot feature@multi_vdev_crash_dump enabledlocal zroot feature@spacemap_histogram active local zroot feature@enabled_txgactive local zroot feature@hole_birth active local zroot feature@extensible_dataset active local zroot feature@embedded_data active local zroot feature@bookmarks enabledlocal zroot feature@filesystem_limits enabledlocal zroot feature@large_blocks enabledlocal zroot feature@large_dnodeenabledlocal zroot feature@sha512 enabledlocal zroot feature@skein enabledlocal zroot feature@userobj_accounting active local zroot feature@encryption enabledlocal zroot feature@project_quota active local zroot feature@device_removal enabledlocal zroot feature@obsolete_countsenabledlocal zroot feature@zpool_checkpoint enabledlocal zroot feature@spacemap_v2active local zroot feature@allocation_classes enabledlocal zroot feature@resilver_defer enabledlocal zroot feature@bookmark_v2enabledlocal zroot feature@redaction_bookmarksenabledlocal zroot feature@redacted_datasets enabledlocal zroot feature@bookmark_written enabledlocal zroot feature@log_spacemap active local zroot feature@livelist active local zroot feature@device_rebuild enabledlocal zroot feature@zstd_compress enabledlocal zroot feature@draid enabledlocal --- -- http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF http://www.FreeBSD.orgnetch...@freebsd.org : PGP 0x8F31830F9F2772BF pgpfPnLCToEq6.pgp Description: Digitale PGP-Signatur
Re: changes to the zfs boot (was: Re: git: 72a1cb05cd23 - main - rc(8): Add a zpoolupgrade rc.d script)
Quoting Brooks Davis (from Wed, 9 Nov 2022 21:18:41 +): On Wed, Nov 09, 2022 at 09:19:47PM +0100, Alexander Leidinger wrote: Quoting Mark Millard (from Wed, 9 Nov 2022 12:10:18 -0800): > On Nov 9, 2022, at 11:58, Alexander Leidinger > wrote: > >> Quoting "Patrick M. Hausen" (from Wed, 9 Nov 2022 >> 20:49:37 +0100): >> >>> Hi, >>> >>>> Am 09.11.2022 um 20:45 schrieb Alexander Leidinger >>>> : >>>> But "zpool set feature@edonr=enabled rpool" (or any other feature >>>> not in the list we talk about) would render it unbootable. >>> >>> Sorry, just to be sure. So an active change of e.g. checksum or >>> compression algorithm >>> might render the system unbootable but a zpool upgrade never will? >>> At least not intentionally? ;-) >> >> If you mean "zpool upgrade", then no (modulo bugs). OpenZFS uses >> the feature flags instead of zpool upgrade. > > I'm confused by that answer: See my correction in another mail, the behavior seems to have changed and yes, doing a zpool upgrade on a boot pool should not be done. Maybe someone wants to check or add provisions to not do that on a pool which has the bootfs property set. Literally the entire point of the script added in the commit this thread is about upgrade the boot pool on first boot so that seems like it would be counterproductive. Something is missing here. Either some pointer to some safetynet for pools with the bootfs property set (or a similar "this is a bootable pool" flag), or a real-world test of the script. Any brave soul around to spin up a test-VM and perform a "echo before; zpool get all rpool | grep feature; zpool upgrade rpool; echo after; zpool get all rpool | grep feature" inside? Bye, Alexander. -- http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF http://www.FreeBSD.orgnetch...@freebsd.org : PGP 0x8F31830F9F2772BF pgpbcaJMrp3Tk.pgp Description: Digitale PGP-Signatur
Re: changes to the zfs boot (was: Re: git: 72a1cb05cd23 - main - rc(8): Add a zpoolupgrade rc.d script)
Quoting "Patrick M. Hausen" (from Wed, 9 Nov 2022 22:11:29 +0100): Hi, Am 09.11.2022 um 22:05 schrieb Alexander Leidinger : Attention, "upgrade" is overloaded here. "OS upgrade" will not render the pool unbootable (modulo bugs), but "zpool upgrade rpool" will (except we have provisions that zpool upgrade doesn't enable all features in case the bootfs property is set). And we are back at the start. The "problem" is that I really like consistency. So when "zpool status" throws that ominous message at me - any you have to admit that it is phrased like a warning - I want simply to get rid of that. After a reasonable after-update grace period. But during our discussion I have come to wonder: - I upgrade from 13.0 to 13.1, I do a "zpool upgrade" afterwards, I also upgrade the boot loader - I install 13.1 with ZFS What is the difference? Shouldn't these two imaginary systems be absolutely the same in terms of ZFS features, boot loader, and all that? On quick look I haven't found a place where a compatibility setting is used for the rpool during the creation, so I can't point out what the exact difference is. Given that empty_bpobj is not in the list of the boot code, it can't be the same, some limit of enabled features has to be in place during initial install, and your example has to be different. Bye, Alexander. -- http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF http://www.FreeBSD.orgnetch...@freebsd.org : PGP 0x8F31830F9F2772BF pgpGlO3ESlGcF.pgp Description: Digitale PGP-Signatur
Re: changes to the zfs boot (was: Re: git: 72a1cb05cd23 - main - rc(8): Add a zpoolupgrade rc.d script)
Quoting Warner Losh (from Wed, 9 Nov 2022 13:53:59 -0700): On Wed, Nov 9, 2022 at 12:47 PM Alexander Leidinger wrote: Quoting Warner Losh (from Wed, 9 Nov 2022 08:54:33 -0700): as well. I'd settle for a good script that could be run as root (better would be not as root) that would take a filesystem that was created by makefs -t zfs and turn on these features after an zpool upgrade. I have the vague outlines of a test suite for the boot loader that I could see about integrating something like that into, but most of my time these days is chasing after 'the last bug' in some kboot stuff I'm working on (which includes issues with our ZFS in the boot loader integration). How would you test a given image? bhyve/qemu/...? I have a script that creates a number of image files and a number of qemu scripts that look like the following: /home/imp/git/qemu/00-build/qemu-system-aarch64 -nographic -machine virt,gic-version=3 -m 512M -smp 4 \ -cpu cortex-a57 \ -drive file=/home/imp/stand-test-root/images/arm64-aarch64/linuxboot-arm64-aarch64-zfs.img,if=none,id=drive0,cache=writeback \ -device virtio-blk,drive=drive0,bootindex=0 \ -drive file=/home/imp/stand-test-root/bios/edk2-arm64-aarch64-code.fd,format=raw,if=pflash \ -drive file=/home/imp/stand-test-root/bios/edk2-arm64-aarch64-vars.fd,format=raw,if=pflash \ -monitor telnet::,server,nowait \ -serial stdio $* There's a list of these files that's generated and looks to see if it gets to the 'success' echo in the minimal root I have for them. So a little script which makes a copy of a source image, enables features on the copies and spits out a list of image files would suit your needs? e.g.:
for feature in A B C; do    # ignoring inter-feature dependencies for a moment
    cp $source_image zfs_feature_$feature.img
    pool_name=$(import_pool zfs_feature_$feature.img)
    enable_feature $pool_name $feature
    export_pool $pool_name
    echo zfs_feature_$feature.img
done
Bye, Alexander. -- http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF http://www.FreeBSD.orgnetch...@freebsd.org : PGP 0x8F31830F9F2772BF pgpK5i4nalL3R.pgp Description: Digitale PGP-Signatur
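A runnable variant of the sketched loop, with the pseudo helpers replaced by mdconfig/zpool commands (assumptions: the pool in the image is named zroot, and the feature list is illustrative):
---snip---
#!/bin/sh
src=zroot.img
for feature in async_destroy bookmarks empty_bpobj; do
    img="zfs_feature_${feature}.img"
    cp "$src" "$img"
    md=$(mdconfig -a -t vnode -f "$img")     # attach image as memory disk
    zpool import -N -d "/dev/$md" zroot      # import without mounting datasets
    zpool set "feature@${feature}=enabled" zroot
    zpool export zroot
    mdconfig -d -u "$md"                     # detach the memory disk
    echo "$img"
done
---snip---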
Re: changes to the zfs boot (was: Re: git: 72a1cb05cd23 - main - rc(8): Add a zpoolupgrade rc.d script)
Quoting Warner Losh (from Wed, 9 Nov 2022 13:56:43 -0700): On Wed, Nov 9, 2022 at 1:54 PM Patrick M. Hausen wrote: Hi Warner, > Am 09.11.2022 um 21:51 schrieb Warner Losh : > Yes. For safety, boot loader upgrade is mandatory when you do a zpool upgrade of the root filesystem. > It was definitely needed in the OpenZFS jump, and we've had one or two other flag days since. That's a given and not a problem. What I fear from my understanding of this thread so far is that there might be a situation when I upgrade the zpool and the boot loader and the system ends up unbootable nonetheless. Possible or not? If all you do is upgrade, then no, modulo bugs that we've thankfully not had yet. It's when you enable something on the zpool that you can run into trouble, but that's true independent of upgrade :) Attention, "upgrade" is overloaded here. "OS upgrade" will not render the pool unbootable (modulo bugs), but "zpool upgrade rpool" will (except we have provisions that zpool upgrade doesn't enable all features in case the bootfs property is set). Bye, Alexander. -- http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF http://www.FreeBSD.orgnetch...@freebsd.org : PGP 0x8F31830F9F2772BF pgpm5pT61kFUD.pgp Description: Digitale PGP-Signatur
Re: changes to the zfs boot (was: Re: git: 72a1cb05cd23 - main - rc(8): Add a zpoolupgrade rc.d script)
Quoting "Patrick M. Hausen" (from Wed, 9 Nov 2022 21:19:23 +0100): Hi, Am 09.11.2022 um 21:15 schrieb Alexander Leidinger : Quoting "Patrick M. Hausen" (from Wed, 9 Nov 2022 21:02:52 +0100): Yet, I made it a habit to whenever I see this message: --- status: Some supported features are not enabled on the pool. The pool can still be used, but some features are unavailable. action: Enable all features using 'zpool upgrade'. Once this is done, the pool may no longer be accessible by software that does not support the features. See zpool-features(7) for details. --- to do a "zpool upgrade" after some time of burn in followed by an update of the boot loader. I desire to know if that is in fact dangerous. Ugh. This changed. It is indeed dangerous now. I just tested it with a non-root pool which didn't had all flags enabled. "zpool upgrade " will now enable all features. I know. But until now I assumed that features *enabled* but not *used* were not impeding booting. And that for all others the boot loader was supposed to keep track. Some features are used directly when enabled. Some features go back to the enabled state when some conditions are met. Some features are not reversible without re-creating the pool (e.g. device_removal). The zzpool-features man-page gives explanations which features belong into which category. Bye, Alexander. -- http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF http://www.FreeBSD.orgnetch...@freebsd.org : PGP 0x8F31830F9F2772BF pgp9Oy22Z69dx.pgp Description: Digitale PGP-Signatur
Re: changes to the zfs boot (was: Re: git: 72a1cb05cd23 - main - rc(8): Add a zpoolupgrade rc.d script)
Quoting Mark Millard (from Wed, 9 Nov 2022 12:10:18 -0800): On Nov 9, 2022, at 11:58, Alexander Leidinger wrote: Quoting "Patrick M. Hausen" (from Wed, 9 Nov 2022 20:49:37 +0100): Hi, Am 09.11.2022 um 20:45 schrieb Alexander Leidinger : But "zpool set feature@edonr=enabled rpool" (or any other feature not in the list we talk about) would render it unbootable. Sorry, just to be sure. So an active change of e.g. checksum or compression algorithm might render the system unbootable but a zpool upgrade never will? At least not intentionally? ;-) If you mean "zpool upgrade", then no (modulo bugs). OpenZFS uses the feature flags instead of zpool upgrade. I'm confused by that answer: See my correction in another mail, the behavior seems to have changed and yes, doing a zpool upgrade on a boot pool should not be done. Maybe someone wants to check or add provisions to not do that on a pool which has the bootfs property set. Bye, Alexander. -- http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF http://www.FreeBSD.orgnetch...@freebsd.org : PGP 0x8F31830F9F2772BF pgpmpZ1ZW63NA.pgp Description: Digitale PGP-Signatur
Re: changes to the zfs boot (was: Re: git: 72a1cb05cd23 - main - rc(8): Add a zpoolupgrade rc.d script)
Quoting "Patrick M. Hausen" (from Wed, 9 Nov 2022 21:02:52 +0100): Hi, Am 09.11.2022 um 20:58 schrieb Alexander Leidinger : Quoting "Patrick M. Hausen" (from Wed, 9 Nov 2022 20:49:37 +0100): Hi, Am 09.11.2022 um 20:45 schrieb Alexander Leidinger : But "zpool set feature@edonr=enabled rpool" (or any other feature not in the list we talk about) would render it unbootable. Sorry, just to be sure. So an active change of e.g. checksum or compression algorithm might render the system unbootable but a zpool upgrade never will? At least not intentionally? ;-) If you mean "zpool upgrade", then no (modulo bugs). OpenZFS uses the feature flags instead of zpool upgrade. I know about feature flags and all my pools are recent enough to have them. Yet, I made it a habit to whenever I see this message: --- status: Some supported features are not enabled on the pool. The pool can still be used, but some features are unavailable. action: Enable all features using 'zpool upgrade'. Once this is done, the pool may no longer be accessible by software that does not support the features. See zpool-features(7) for details. --- to do a "zpool upgrade" after some time of burn in followed by an update of the boot loader. I desire to know if that is in fact dangerous. Ugh. This changed. It is indeed dangerous now. I just tested it with a non-root pool which didn't had all flags enabled. "zpool upgrade " will now enable all features. I remember that it wasn't in the past and I had to enable the feature flags by hand. I don't know if a pool with bootfs set is behaving differently, but I consider testing this with a real rpool to be dangerous. Bye, Alexander. -- http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF http://www.FreeBSD.orgnetch...@freebsd.org : PGP 0x8F31830F9F2772BF pgp55U6MDTOxF.pgp Description: Digitale PGP-Signatur
Re: changes to the zfs boot (was: Re: git: 72a1cb05cd23 - main - rc(8): Add a zpoolupgrade rc.d script)
Quoting "Patrick M. Hausen" (from Wed, 9 Nov 2022 20:49:37 +0100): Hi, Am 09.11.2022 um 20:45 schrieb Alexander Leidinger : But "zpool set feature@edonr=enabled rpool" (or any other feature not in the list we talk about) would render it unbootable. Sorry, just to be sure. So an active change of e.g. checksum or compression algorithm might render the system unbootable but a zpool upgrade never will? At least not intentionally? ;-) If you mean "zpool upgrade", then no (modulo bugs). OpenZFS uses the feature flags instead of zpool upgrade. Bye, Alexander. -- http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF http://www.FreeBSD.orgnetch...@freebsd.org : PGP 0x8F31830F9F2772BF pgpO78tAvNPWO.pgp Description: Digitale PGP-Signatur
Re: changes to the zfs boot (was: Re: git: 72a1cb05cd23 - main - rc(8): Add a zpoolupgrade rc.d script)
Quoting Warner Losh (from Wed, 9 Nov 2022 08:54:33 -0700): as well. I'd settle for a good script that could be run as root (better would be not as root) that would take a filesystem that was created by makefs -t zfs and turn on these features after an zpool upgrade. I have the vague outlines of a test suite for the boot loader that I could see about integrating something like that into, but most of my time these days is chasing after 'the last bug' in some kboot stuff I'm working on (which includes issues with our ZFS in the boot loader integration). How would you test a given image? bhyve/qemu/...? Bye, Alexander. -- http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF http://www.FreeBSD.orgnetch...@freebsd.org : PGP 0x8F31830F9F2772BF pgpBddbGiB5KJ.pgp Description: Digitale PGP-Signatur
Re: changes to the zfs boot (was: Re: git: 72a1cb05cd23 - main - rc(8): Add a zpoolupgrade rc.d script)
Quoting "Patrick M. Hausen" (from Wed, 9 Nov 2022 20:02:55 +0100): Hi all, Am 09.11.2022 um 16:54 schrieb Warner Losh : >>There is a fixed list of features we support in the boot loader: >>[...] >>Any feature not on this list will cause the boot loader to >> reject the pool. I admit that I do not grasp the full implications of this thread and the proposed and debated changes. Does that imply that a simple "zpool upgrade" of the boot/root pool might lead to an unbootable system in the future - even if the boot loader is upgraded as it should, too? For a recent pool (zpool get all rpool | grep -q feature && echo recent enough): no. But "zpool set feature@edonr=enabled rpool" (or any other feature not in the list we talk about) would render it unbootable. Bye, Alexander. -- http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF http://www.FreeBSD.orgnetch...@freebsd.org : PGP 0x8F31830F9F2772BF pgp2qrhU55d75.pgp Description: Digitale PGP-Signatur
Re: changes to the zfs boot (was: Re: git: 72a1cb05cd23 - main - rc(8): Add a zpoolupgrade rc.d script)
Quoting Warner Losh (from Wed, 9 Nov 2022 08:54:33 -0700): On Wed, Nov 9, 2022 at 5:46 AM Alexander Leidinger wrote: Quoting Alexander Leidinger (from Tue, 08 Nov 2022 10:50:53 +0100): > Should the above list be sorted in some way? Maybe in the same order > as the zpool-features lists them (sort by feature name after the > colon), or alphabetical? Is it OK if I commit this alphabetical sorting? [diff of feature-sorting] This patch looks good because it's a nop and just tidies things up a bit. Reviewed by: imp Will do later. > As Mark already mentioned some flags, I checked the features marked > as read only (I checked in the zpool-features man page, including > the dependencies documented there) and here are those not listed in > zfsimpl.c. I would assume as they are read-only compatible, we > should add them: > com.delphix:async_destroy > com.delphix:bookmarks > org.openzfs:device_rebuild > com.delphix:empty_bpobj > com.delphix:enable_txg > com.joyent:filesystem_limits > com.delphix:livelist > com.delphix:log_spacemap > com.zfsonlinux:project_quota > com.zfsonlinux:userobj_accounting > com.openzfs:zilsaxattr If my understanding is correct that the read-only compatible parts (according to the zpool-features man page) are safe to add to the zfs boot, here is what I have only build-tested (relative to the above alphabetical sorting): ---snip--- --- zfsimpl.c_sorted2022-11-09 12:55:06.346083000 +0100 +++ zfsimpl.c2022-11-09 13:01:24.083364000 +0100 @@ -121,24 +121,35 @@ "com.datto:bookmark_v2", "com.datto:encryption", "com.datto:resilver_defer", +"com.delphix:async_destroy", "com.delphix:bookmark_written", +"com.delphix:bookmarks", "com.delphix:device_removal", "com.delphix:embedded_data", +"com.delphix:empty_bpobj", +"com.delphix:enable_txg", "com.delphix:extensible_dataset", "com.delphix:head_errlog", "com.delphix:hole_birth", +"com.delphix:livelist", +"com.delphix:log_spacemap", "com.delphix:obsolete_counts", "com.delphix:spacemap_histogram", "com.delphix:spacemap_v2", "com.delphix:zpool_checkpoint", "com.intel:allocation_classes", +"com.joyent:filesystem_limits", "com.joyent:multi_vdev_crash_dump", +"com.openzfs:zilsaxattr", +"com.zfsonlinux:project_quota", +"com.zfsonlinux:userobj_accounting", "org.freebsd:zstd_compress", "org.illumos:lz4_compress", "org.illumos:sha512", "org.illumos:skein", "org.open-zfs:large_blocks", "org.openzfs:blake3", +"org.openzfs:device_rebuild", "org.zfsonlinux:allocation_classes", "org.zfsonlinux:large_dnode", NULL ---snip--- Anyone able to test some of those or confirms my understanding is correct and would sign-off on a "reviewed by" level? I'm inclined to strongly NAK this patch, absent some way to test it. There's no issues today with any of them being absent causing problems on boot that have been reported. The ZFS that's in the boot loader is a reduced copy of what's in base and not everything is supported. There's no urgency here to rush into this. The ones that are on the list already are for things that we know we support in the boot loader because we've gone to the trouble to put blake3 or sha512 into it (note: Not all boot loaders will support all ZFS features in the future... x86 BIOS booting likely is going to have to be frozen at its current ZFS feature set due to code size issues). While most of these options look OK on the surface, I'd feel a lot better if there were tests for these to prove they work. I'd also feel better if the ZFS experts could explain how those come to be set on a zpool as well. 
I'd settle for a good script that could be run as root (better would be not as root) that would take a filesystem that was created by makefs -t zfs and turn on these features after an zpool upgrade. I have the vague outlines of a test suite for the boot loader that I could see about integrating something like that into, but most of my time these days is chasing after 'the last bug' in some kboot stuff I'm working on (which includes issues with our ZFS in the boot loader integration). So not a hard no, but I plea for additional scripts to create images that can be tested. I didn't want to commit untested or unverified stuff. I fully agree with your reasoning. Bye, Alexander. -- http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF http://www.FreeBSD.orgnetch...@freebsd.org : PGP 0x8F31830F9F2772BF pgppq4Jrt1La5.pgp Description: Digitale PGP-Signatur
changes to the zfs boot (was: Re: git: 72a1cb05cd23 - main - rc(8): Add a zpoolupgrade rc.d script)
Quoting Alexander Leidinger (from Tue, 08 Nov 2022 10:50:53 +0100):

Quoting Warner Losh (from Mon, 7 Nov 2022 14:23:11 -0700):

On Mon, Nov 7, 2022 at 4:15 AM Alexander Leidinger wrote:

Quoting Li-Wen Hsu (from Mon, 7 Nov 2022 03:39:19 GMT):

The branch main has been updated by lwhsu:

URL: https://cgit.FreeBSD.org/src/commit/?id=72a1cb05cd230ce0d12a7180ae65ddbba2e0cb6d

commit 72a1cb05cd230ce0d12a7180ae65ddbba2e0cb6d
Author:     Li-Wen Hsu
AuthorDate: 2022-11-07 03:30:09 +0000
Commit:     Li-Wen Hsu
CommitDate: 2022-11-07 03:30:09 +0000

    rc(8): Add a zpoolupgrade rc.d script

    If a zpool is created by makefs(8), its version is 5000, i.e., all
    feature flags are off. Introduce an rc script to run `zpool upgrade`
    over the assigned zpools on the first boot. This is useful to the
    ZFS based VM images built from release(7).

diff --git a/share/man/man5/rc.conf.5 b/share/man/man5/rc.conf.5
index f9ceabc83120..43fa44a5f1cb 100644
--- a/share/man/man5/rc.conf.5
+++ b/share/man/man5/rc.conf.5
@@ -24,7 +24,7 @@
 .\"
 .\" $FreeBSD$
 .\"
-.Dd August 28, 2022
+.Dd November 7, 2022
 .Dt RC.CONF 5
 .Os
 .Sh NAME
@@ -2109,6 +2109,13 @@
 A space-separated list of ZFS pool names for which new pool GUIDs
 should be assigned upon first boot.
 This is useful when using a ZFS pool copied from a template, such as
 a virtual machine image.
+.It Va zpool_upgrade
+.Pq Vt str
+A space-separated list of ZFS pool names for which version should be upgraded
+upon first boot.
+This is useful when using a ZFS pool generated by
+.Xr makefs 8
+utility.

For someone who knows ZFS well it is clear that only a zpool upgrade is done. Less experienced people may assume there is a combination of zpool upgrade and zfs upgrade (even more so for people who do not know what the difference is). Maybe you want to add explicit documentation that a zfs upgrade and the enabling of feature flags need to be done by hand.

And this brings me to a second topic: we don't have an explicit list of the features which are supported by the bootloader (I had a look at the ZFS and the boot related man pages; if I overlooked a place, the other pages should at least reference this important part).

There is a fixed list of features we support in the boot loader:

/*
 * List of ZFS features supported for read
 */
static const char *features_for_read[] = {
	"org.illumos:lz4_compress",
	"com.delphix:hole_birth",
	"com.delphix:extensible_dataset",
	"com.delphix:embedded_data",
	"org.open-zfs:large_blocks",
	"org.illumos:sha512",
	"org.illumos:skein",
	"org.zfsonlinux:large_dnode",
	"com.joyent:multi_vdev_crash_dump",
	"com.delphix:spacemap_histogram",
	"com.delphix:zpool_checkpoint",
	"com.delphix:spacemap_v2",
	"com.datto:encryption",
	"com.datto:bookmark_v2",
	"org.zfsonlinux:allocation_classes",
	"com.datto:resilver_defer",
	"com.delphix:device_removal",
	"com.delphix:obsolete_counts",
	"com.intel:allocation_classes",
	"org.freebsd:zstd_compress",
	"com.delphix:bookmark_written",
	"com.delphix:head_errlog",
	"org.openzfs:blake3",
	NULL
};

Any feature not on this list will cause the boot loader to reject the pool. Whether it should do that by default, always, or never is an open question. I've thought there should be a 'foot-shooting' override that isn't there today.

Thanks for the list. For those interested, it is in $SRC/stand/libsa/zfs/zfsimpl.c.

Just to make the opinion I expressed before explicit again: this should be documented in a boot/bootloader related man page, but isn't.

Should the above list be sorted in some way? Maybe in the same order as zpool-features(7) lists them (sorted by the feature name after the colon), or alphabetically? Is it OK if I commit this alphabetical sorting?

---snip---
diff --git a/stand/libsa/zfs/zfsimpl.c b/stand/libsa/zfs/zfsimpl.c
index 6b961f3110a..36c90613e82 100644
--- a/stand/libsa/zfs/zfsimpl.c
+++ b/stand/libsa/zfs/zfsimpl.c
@@ -118,29 +118,29 @@ static vdev_list_t zfs_vdevs;
 /*
  * List of ZFS features supported for read
  */
 static const char *features_for_read[] = {
-	"org.illumos:lz4_compress",
-	"com.delphix:hole_birth",
-	"com.delphix:extensible_dataset",
-	"com.delphix:embedded_data",
-	"org.open-zfs:large_blocks",
-	"org.illumos:sha512",
-	"org.illumos:skein",
-	"org.zfsonlinux:large_dnode",
-	"com.j
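For readers who want to try this at home, a small sketch (the pool name "zroot" is a placeholder): the new rc.conf(5) knob from the commit above, plus a one-liner to list which feature flags are enabled or active on a pool, for comparing by hand against the loader's features_for_read list.

---snip---
# In /etc/rc.conf: upgrade the named pool(s) on first boot
# (the knob added by the commit discussed above):
zpool_upgrade="zroot"

# List the feature flags that are not disabled on a pool, to compare
# against stand/libsa/zfs/zfsimpl.c:features_for_read:
zpool get all zroot | awk '$2 ~ /^feature@/ && $3 != "disabled" { print $2, $3 }'
---snip---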
Re: Did clang 14 lose some intrinsics support?
Quoting Dimitry Andric (from Mon, 26 Sep 2022 12:03:03 +0200):

> Sure, but if you are compiling without -mavx, why would you want the
> AVX intrinsics? You cannot use AVX intrinsics anyway, if AVX is not
> enabled. So I don't fully understand the problem this configure
> scripting is supposed to solve?

Think about a runtime check for available CPU features which then uses such code only in performance-critical sections. This allows building programs whose main code paths are generic to all CPUs, but which can switch to high-performance implementations of the critical code paths depending on the features of the CPU they run on.

Bye,
Alexander.

--
http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.org netch...@freebsd.org : PGP 0x8F31830F9F2772BF
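To illustrate the pattern: a minimal sketch (not from the thread; all names are made up) that compiles without -mavx2 for the generic paths, while the AVX2 variant is enabled per-function and selected at run time. Both GCC and Clang provide __builtin_cpu_supports() and the target function attribute used here.

---snip---
#include <immintrin.h>
#include <stddef.h>

/*
 * AVX2 is enabled only for this function via the target attribute,
 * so the rest of the file can be compiled without -mavx2.
 */
__attribute__((target("avx2")))
static void
add_avx2(const float *a, const float *b, float *out, size_t n)
{
	size_t i;

	for (i = 0; i + 8 <= n; i += 8) {
		__m256 va = _mm256_loadu_ps(a + i);
		__m256 vb = _mm256_loadu_ps(b + i);

		_mm256_storeu_ps(out + i, _mm256_add_ps(va, vb));
	}
	for (; i < n; i++)
		out[i] = a[i] + b[i];
}

/* Generic fallback; runs on any CPU. */
static void
add_scalar(const float *a, const float *b, float *out, size_t n)
{
	for (size_t i = 0; i < n; i++)
		out[i] = a[i] + b[i];
}

/* Dispatch based on the CPU the program actually runs on. */
void
add(const float *a, const float *b, float *out, size_t n)
{
	if (__builtin_cpu_supports("avx2"))
		add_avx2(a, b, out, n);
	else
		add_scalar(a, b, out, n);
}
---snip---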
Re: Good practices with bectl
Quoting David Wolfskill (from Wed, 21 Sep 2022 03:25:52 -0700):

> On Wed, Sep 21, 2022 at 11:27:06AM +0200, Alexander Leidinger wrote:
> ...
> > make DESTDIR=${BASEDIR} -DBATCH_DELETE_OLD_FILES delete-old delete-old-libs
> >
> > Usually I replace the delete-old-libs with check-old, as I don't
> > want to blindly delete them (some ports may depend on them... at
> > least for the few libs which don't have symbol versioning).
>
> A way to address that issue that may work for you is to install
> appropriate misc/compat* ports/packages.

I'm running exclusively on -current, and in the cases where this happens there are no compat packages yet. And I'd rather update the ports than install a compat package; it doesn't hurt me to keep the libs during the pkg rebuild.

In the generic case I prefer to stay safe and keep the libs until I have validated that nothing uses them anymore. That's the reason why I made the delete-old-libs functionality separate from delete-old in the initial implementation.

Bye,
Alexander.

--
http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.org netch...@freebsd.org : PGP 0x8F31830F9F2772BF
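Spelled out as a sketch (same ${BASEDIR} as in the thread), the conservative sequence looks roughly like this:

---snip---
cd /usr/src
# Delete stale files non-interactively, but only *list* stale libraries:
make DESTDIR=${BASEDIR} -DBATCH_DELETE_OLD_FILES delete-old
make DESTDIR=${BASEDIR} check-old-libs
# Only after verifying that no installed port/package still links
# against them:
make DESTDIR=${BASEDIR} -DBATCH_DELETE_OLD_FILES delete-old-libs
---snip---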
Re: Good practices with bectl
Quoting Alan Somers (from Tue, 20 Sep 2022 16:19:49 -0600):

> sudo bectl activate ${RELEASE}

Failsafe, if the machine is too far away to simply walk over and switch back to the old BE:

bectl activate -t ${RELEASE}

This needs an activate without -t later on, to make the new BE permanent.

Bye,
Alexander.

--
http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.org netch...@freebsd.org : PGP 0x8F31830F9F2772BF
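The complete failsafe flow, as a sketch (${RELEASE} stands for the name of the new boot environment):

---snip---
bectl activate -t ${RELEASE}	# temporary: affects only the next boot
shutdown -r now
# If the new BE comes up fine, make it permanent:
bectl activate ${RELEASE}
# If it does not come up, a power cycle boots the still permanently
# active old BE again.
---snip---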
Re: Good practices with bectl
Quoting Nuno Teixeira (from Wed, 21 Sep 2022 00:11:41 +0100):

> (...)
> maybe:
> > yes | make DESTDIR=${BASEDIR} delete-old delete-old-libs

make DESTDIR=${BASEDIR} -DBATCH_DELETE_OLD_FILES delete-old delete-old-libs

Usually I replace the delete-old-libs with check-old, as I don't want to blindly delete them (some ports may depend on them... at least for the few libs which don't have symbol versioning).

Bye,
Alexander.

--
http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.org netch...@freebsd.org : PGP 0x8F31830F9F2772BF
Re: domain names and internationalization?
Quoting Rick Macklem (from Mon, 19 Sep 2022 20:27:29 +0000):

> Hi,
> Recently there has been discussion on the NFSv4 IETF working group
> email list w.r.t. internationalization for the domain name it uses
> for users/groups. Right now, I am pretty sure the FreeBSD nfsuserd(8)
> only works for ASCII domain names, but... I am hoping someone knows
> what DNS does in this area (the working group list uses terms like
> umlaut, which I have never even heard of;-).

DNS does this: https://en.wikipedia.org/wiki/Punycode

This page also shows some umlauts (German ones to be precise, e.g. "Bücher") as well as examples with Chinese and other characters. There are libraries which do the conversion, e.g. https://www.gnu.org/software/libidn/doxygen/index.html; I don't know whether there are libraries under more preferable licenses.

Bye,
Alexander.

--
http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.org netch...@freebsd.org : PGP 0x8F31830F9F2772BF
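As a quick illustration of what Punycode/IDNA conversion produces (a sketch, assuming GNU libidn's idn(1) command-line tool, e.g. from the dns/libidn port, is installed and the shell locale is UTF-8):

---snip---
$ idn --idna-to-ascii bücher.example
xn--bcher-kva.example
---snip---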
Re: nullfs and ZFS issues
Quoting Eirik Øverby (from Mon, 25 Apr 2022 18:44:19 +0200):

> On Mon, 2022-04-25 at 15:27 +0200, Alexander Leidinger wrote:
> > Quoting Alexander Leidinger (from Sun, 24 Apr 2022 19:58:17 +0200):
> > > Quoting Alexander Leidinger (from Fri, 22 Apr 2022 09:04:39 +0200):
> > > > Quoting Doug Ambrisko (from Thu, 21 Apr 2022 09:38:35 -0700):
> > > > > I've attached mount.patch that when doing mount -v should show
> > > > > the vnode usage per filesystem. Note that the problem I was
> > > > > running into was that after some operations arc_prune and
> > > > > arc_evict would consume 100% of 2 cores and make ZFS really
> > > > > slow. If you are not running into that issue then nocache etc.
> > > > > shouldn't be needed.
> > > >
> > > > I don't run into this issue, but I have a huge perf difference
> > > > when using nocache in the nightly periodic runs: 4h instead of
> > > > 12-24h (22 jails on this system).
> > > >
> > > > > On my laptop I set ARC to 1G since I don't use swap and in the
> > > > > past ARC would consume too much memory and things would die.
> > > > > When the nullfs holds a bunch of vnodes then ZFS couldn't
> > > > > release them.
> > > > >
> > > > > FYI, on my laptop with nocache and limited vnodes I haven't
> > > > > run into this problem. I haven't tried the patch to let ZFS
> > > > > free its and nullfs vnodes on my laptop. I have only tried it
> > > > > via
> > > >
> > > > I have this patch and your mount patch installed now, without
> > > > nocache and reduced arc reclaim settings (100, 1). I will check
> > > > the runtime for the next 2 days.
> > >
> > > 9-10h runtime with the above settings (compared to 4h with nocache
> > > and 12-24h without any patch and without nocache).
> > > I changed the sysctls back to the defaults and will see in the
> > > next run (in 7h) what the result is with just the patches.
> >
> > And again 9-10h runtime (I've seen a lot of the find processes in
> > the periodic daily run of those 22 jails in the state "*vnode").
> > Seems nocache gives the best perf for me in this case.
>
> Sorry for jumping in here - I've got a couple of questions:
> - Will this also apply to nullfs read-only mounts? Or is it only in
>   case of writing "through" a nullfs mount that these problems are
>   seen?
> - Is it a problem also in 13, or is this "new" in -CURRENT? We're
>   having weird and unexplained CPU spikes on several systems, even
>   after tuning geli to not use gazillions of threads. So far our
>   suspicion has been ZFS snapshot cleanups, but this is an interesting
>   contender - unless the whole "read only" part makes it moot.

For me this started after creating one more jail on this system, and I don't see CPU spikes (the system is running permanently at 100% and the distribution of the CPU load looks as I would expect). Doug's experience is a little bit different: he sees a high amount of CPU usage "for nothing", or even a deadlock-like situation. So I would say we see different things based on similar triggers.

The nocache option for nullfs affects the number of vnodes in use on the system, no matter if the mount is ro or rw. As such, you can give it a try. Note that, depending on the usage pattern, the nocache option may increase lock contention, so it may have a positive or a negative performance impact.

Bye,
Alexander.

--
http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.org netch...@freebsd.org : PGP 0x8F31830F9F2772BF
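A sketch of what trying it looks like (paths are placeholders):

---snip---
# One-off mount with the nocache option:
mount -t nullfs -o ro,nocache /usr/local/base /jails/web/usr/local

# Or the equivalent /etc/fstab line:
/usr/local/base /jails/web/usr/local nullfs ro,nocache 0 0
---snip---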
Re: nullfs and ZFS issues
Quoting Alexander Leidinger (from Sun, 24 Apr 2022 19:58:17 +0200):

> Quoting Alexander Leidinger (from Fri, 22 Apr 2022 09:04:39 +0200):
>
> > Quoting Doug Ambrisko (from Thu, 21 Apr 2022 09:38:35 -0700):
> >
> > > I've attached mount.patch that when doing mount -v should show the
> > > vnode usage per filesystem. Note that the problem I was running
> > > into was that after some operations arc_prune and arc_evict would
> > > consume 100% of 2 cores and make ZFS really slow. If you are not
> > > running into that issue then nocache etc. shouldn't be needed.
> >
> > I don't run into this issue, but I have a huge perf difference when
> > using nocache in the nightly periodic runs: 4h instead of 12-24h
> > (22 jails on this system).
> >
> > > On my laptop I set ARC to 1G since I don't use swap and in the
> > > past ARC would consume too much memory and things would die. When
> > > the nullfs holds a bunch of vnodes then ZFS couldn't release them.
> > >
> > > FYI, on my laptop with nocache and limited vnodes I haven't run
> > > into this problem. I haven't tried the patch to let ZFS free its
> > > and nullfs vnodes on my laptop. I have only tried it via
> >
> > I have this patch and your mount patch installed now, without
> > nocache and reduced arc reclaim settings (100, 1). I will check the
> > runtime for the next 2 days.
>
> 9-10h runtime with the above settings (compared to 4h with nocache
> and 12-24h without any patch and without nocache).
> I changed the sysctls back to the defaults and will see in the next
> run (in 7h) what the result is with just the patches.

And again 9-10h runtime (I've seen a lot of the find processes in the periodic daily run of those 22 jails in the state "*vnode"). Seems nocache gives the best perf for me in this case.

Bye,
Alexander.

--
http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.org netch...@freebsd.org : PGP 0x8F31830F9F2772BF
Re: nullfs and ZFS issues
Quoting Alexander Leidinger (from Fri, 22 Apr 2022 09:04:39 +0200):

> Quoting Doug Ambrisko (from Thu, 21 Apr 2022 09:38:35 -0700):
>
> > I've attached mount.patch that when doing mount -v should show the
> > vnode usage per filesystem. Note that the problem I was running into
> > was that after some operations arc_prune and arc_evict would consume
> > 100% of 2 cores and make ZFS really slow. If you are not running
> > into that issue then nocache etc. shouldn't be needed.
>
> I don't run into this issue, but I have a huge perf difference when
> using nocache in the nightly periodic runs: 4h instead of 12-24h
> (22 jails on this system).
>
> > On my laptop I set ARC to 1G since I don't use swap and in the past
> > ARC would consume too much memory and things would die. When the
> > nullfs holds a bunch of vnodes then ZFS couldn't release them.
> >
> > FYI, on my laptop with nocache and limited vnodes I haven't run into
> > this problem. I haven't tried the patch to let ZFS free its and
> > nullfs vnodes on my laptop. I have only tried it via
>
> I have this patch and your mount patch installed now, without nocache
> and reduced arc reclaim settings (100, 1). I will check the runtime
> for the next 2 days.

9-10h runtime with the above settings (compared to 4h with nocache and 12-24h without any patch and without nocache).

I changed the sysctls back to the defaults and will see in the next run (in 7h) what the result is with just the patches.

Bye,
Alexander.

--
http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.org netch...@freebsd.org : PGP 0x8F31830F9F2772BF
Re: nullfs and ZFS issues
Quoting Doug Ambrisko (from Thu, 21 Apr 2022 09:38:35 -0700):

> On Thu, Apr 21, 2022 at 03:44:02PM +0200, Alexander Leidinger wrote:
> | Quoting Mateusz Guzik (from Thu, 21 Apr 2022 14:50:42 +0200):
> |
> | > On 4/21/22, Alexander Leidinger wrote:
> | >> I tried nocache on a system with a lot of jails which use nullfs,
> | >> which showed very slow behavior in the daily periodic runs (12h
> | >> runs in the night after boot, 24h or more in subsequent nights).
> | >> Now the first nightly run after boot was finished after 4h.
> | >>
> | >> What is the benefit of not disabling the cache in nullfs? I would
> | >> expect zfs (or ufs) to cache the (meta)data anyway.
> | >
> | > does the poor performance show up with
> | > https://people.freebsd.org/~mjg/vnlru_free_pick.diff ?
> |
> | I would like to have all the 22 jails run the periodic scripts a
> | second night in a row before trying this.
> |
> | > if the long runs are still there, can you get some profiling from
> | > it? sysctl -a before and after would be a start.
> | >
> | > My guess is that you are at the vnode limit and bumping into the
> | > 1 second sleep.
> |
> | That would explain the behavior I see since I added the last jail,
> | which seems to have crossed a threshold which triggers the slow
> | behavior.
> |
> | Current status (with the 112 nullfs mounts with nocache):
> | kern.maxvnodes:              10485760
> | kern.numvnodes:               3791064
> | kern.freevnodes:              3613694
> | kern.cache.stats.heldvnodes:   151707
> | kern.vnodes_created:        260288639
> |
> | The maxvnodes value is already increased by 10 times compared to the
> | default value on this system.
>
> I've attached mount.patch that when doing mount -v should show the
> vnode usage per filesystem. Note that the problem I was running into
> was that after some operations arc_prune and arc_evict would consume
> 100% of 2 cores and make ZFS really slow. If you are not running into
> that issue then nocache etc. shouldn't be needed.

I don't run into this issue, but I have a huge perf difference when using nocache in the nightly periodic runs: 4h instead of 12-24h (22 jails on this system).

> On my laptop I set ARC to 1G since I don't use swap and in the past
> ARC would consume too much memory and things would die. When the
> nullfs holds a bunch of vnodes then ZFS couldn't release them.
>
> FYI, on my laptop with nocache and limited vnodes I haven't run into
> this problem. I haven't tried the patch to let ZFS free its and
> nullfs vnodes on my laptop. I have only tried it via

I have this patch and your mount patch installed now, without nocache and reduced arc reclaim settings (100, 1). I will check the runtime for the next 2 days.

Your mount patch to show the per-mount vnode count looks useful, not only for this particular case. Do you intend to commit it?

Bye,
Alexander.

--
http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.org netch...@freebsd.org : PGP 0x8F31830F9F2772BF
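For reference, the vnode knobs involved can be inspected and tuned like this (the value is the one from the thread, already ten times the default on that system; tune to taste):

---snip---
sysctl kern.maxvnodes kern.numvnodes kern.freevnodes \
    kern.cache.stats.heldvnodes kern.vnodes_created
sysctl kern.maxvnodes=10485760
---snip---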