Re: Files not deleted via update procedure: rescue/gbde usr/include/machine/fiq.h usr/lib/include/machine/fiq.h usr/share/man/man4/CAM.4.gz
On 7/27/24 19:28, Mark Millard wrote: On Jul 27, 2024, at 16:07, Mark Millard wrote: The following old files were in the historically incrementally updated directory tree but not in the installation to an empty directory tree (checked via diff -rq): /usr/obj/DESTDIRs/main-CA7-poud/rescue/gbde /usr/obj/DESTDIRs/main-CA7-poud/usr/include/machine/fiq.h /usr/obj/DESTDIRs/main-CA7-poud/usr/lib/include/machine/fiq.h /usr/obj/DESTDIRs/main-CA7-poud/usr/share/man/man4/CAM.4.gz That was an armv7 context. For comparison/contrast, aarch64 had: /usr/obj/DESTDIRs/main-CA76-poud/rescue/gbde /usr/obj/DESTDIRs/main-CA76-poud/usr/lib/debug/usr/tests/cddl/usr.sbin/dtrace/amd64/kinst/ /usr/obj/DESTDIRs/main-CA76-poud/usr/lib/debug/usr/tests/lib/libc/ssp/h_raw.debug /usr/obj/DESTDIRs/main-CA76-poud/usr/share/examples/IPv6/USAGE /usr/obj/DESTDIRs/main-CA76-poud/usr/share/man/man2/recvmmsg.2.gz /usr/obj/DESTDIRs/main-CA76-poud/usr/share/man/man2/sendmmsg.2.gz /usr/obj/DESTDIRs/main-CA76-poud/usr/share/man/man4/CAM.4.gz /usr/obj/DESTDIRs/main-CA76-poud/usr/share/man/man4/geom_map.4.gz /usr/obj/DESTDIRs/main-CA76-poud/usr/tests/cddl/usr.sbin/dtrace/amd64/kinst/ /usr/obj/DESTDIRs/main-CA76-poud/usr/tests/lib/libc/ssp/h_raw Thanks, I've pushed fixes for most of these. The *mmsg.2.gz links are actually not supposed to be stale and D46200 should fix those. h_raw is a bit more of an odd duck that isn't easily solved. I'm not sure why it was installed in the past for you but isn't installed anymore. -- John Baldwin
Re: aesni_load present in /boot/loader.conf on arm64
On 7/31/24 08:15, void wrote: Hi, Looking at man 4 aesni it appears this pertains to intel and AMD only? is its prescence on arm64 a bug? It seems to be added to /boot/loader.conf by default. The method I used to install is to boot to the latest snapshot at the time, then plug in a usb3 disk, ran bsdinstall to that disk, rebooted (this booted initially to the installer image), mounted the msdos partition on /mnt. moved the /boot/efi/efi from the installed-to disk out of the way, copied everything in /mnt to /boot/efi, moved the /boot/efi/efi back to where it originally was, halted the machine and removed the installer image. This was to achieve zfs-on-root. Maybe something about the way I installed meant aesni was added? Looks like bsdinstall hardcodes aesni without doing an architecture check for both ZFS and geli. Probably the bits of the zfsboot script referencing aesni need to switch on the architecture. The trick is that depending on the architecture you may want to load more than one module. For 14 I think you could get by with something like: crypto_kld() { case `uname -m` in amd64|i386) echo "aesni" ;; arm64) echo "armv8crypto" ;; *) echo "" } Then in the other parts of zfsboot call this function and treat it as a list of modules. On main I think you would want 32-bit arm and powerpc64 to list ossl, and you might want to include ossl for x86 and arm64 as well (eventually ossl should replace aesni and armv8crypto IMO). Side topic: the ossl(4) manpage in main is stale and needs to be updated to reflect armv7 and powerpc64 support. I'm not sure yet if it supports AES-GCM for armv8 as well. -- John Baldwin
Re: 41dfea24eec panics during ata attach on ESXi VM
On 6/5/24 4:35 AM, Yuri Pankov wrote: After updating to 41dfea24eec (GENERIC-NODEBUG), ESXi VM started to panic while attaching atapci children. I was unable to grab original boot panic data ("keyboard" dead), but was able to boot with hint.ata.0.disabled=1, hint.ata.1.disabled=1, and `devctl enable ata0` reproduced the issue: ata0: at channel 0 on atapci0 This should be fixed now by commit 56b822a17cde5940909633c50623d463191a7852. Sorry for the breakage. -- John Baldwin
Re: gcc behavior of init priority of .ctors and .dtors section
On 5/16/24 4:05 PM, Lorenzo Salvadore wrote: On Thursday, May 16th, 2024 at 20:26, Konstantin Belousov wrote: gcc13 from ports `# gcc ctors.c && ./a.out init 1 init 2 init 5 init 4 init 3 main fini 3 fini 4 fini 5 fini 2 fini 1` The above order is not expected. I think clang's one is correct. Further hacking with readelf shows that clang produces the right order of section .rela.ctors but gcc does not. ``` # clang -fno-use-init-array -c ctors.c && readelf -r ctors.o | grep 'Relocation section with addend (.rela.ctors)' -A5 > clang.txt # gcc -c ctors.c && readelf -r ctors.o | grep 'Relocation section with addend (.rela.ctors)' -A5 > gcc.txt # diff clang.txt gcc.txt 3,5c3,5 < 00080001 R_X86_64_64 0060 init_65535_2 + 0 < 0008 00070001 R_X86_64_64 0040 init + 0 < 0010 00060001 R_X86_64_64 0020 init_65535 + 0 --- 00060001 R_X86_64_64 0011 init_65535 + 0 0008 00070001 R_X86_64_64 0022 init + 0 0010 00080001 R_X86_64_64 0033 init_65535_2 + 0 ``` The above show clearly gcc produces the wrong order of section `.rela.ctors`. Is that expected behavior ? I have not tried Linux version of gcc. Note that init array vs. init function behavior is encoded by a note added by crt1.o. I suspect that the problem is that gcc port is built without --enable-initfini-array configure option. Indeed, support for .init_array and .fini_array has been added to the GCC ports but is present in the *-devel ports only for now. I will soon proceed to enable it for the GCC standard ports too. lang/gcc14 is soon to be added to the ports tree and will have it since the beginning. If this is indeed the issue, switching to a -devel GCC port should fix it. FWIW, the devel/freebsd-gcc* ports have passed this flag to GCC's configure for a long time (since we made the switch in clang). -- John Baldwin
Re: Recent commits reject RPi4B booting: pcib0 vs. pcib1 "rman_manage_region: request" leads to panic [now fixed]
On 2/14/24 11:03 PM, Mark Millard wrote: On Feb 14, 2024, at 18:19, Mark Millard wrote: Your changes have the RPi4B that previously got the panic to boot all the way instead. Details: I have updated my pkg base environment to have the downloaded kernels (and kernel source) with your changes and have booted with each of: /boot/kernel/kernel /boot/kernel.GENERIC-NODEBUG/kernel For reference: # uname -apKU FreeBSD aarch64-main-pkgs 15.0-CURRENT FreeBSD 15.0-CURRENT main-n268300-d79b6b8ec267 GENERIC-NODEBUG arm64 aarch64 1500014 1500012 Thanks for the fix. Now I'll update the rest of pkg base materials. Question: Are any of the changes to be MFC'd at some point? If I do I will merge a large batch at once, and probably adjust the order. For example, I'll merge the pci_host_generic changes before pci_pci changes so that stable branches will be bisectable. -- John Baldwin
Re: Recent commits reject RPi4B booting: pcib0 vs. pcib1 "rman_manage_region: request" leads to panic
On 2/14/24 10:16 AM, Mark Millard wrote: Top posting a related but separate item: I looked up some old (2022-Dec-17) lspci -v output from a Linux boot. Note the "Memory at" value 6 (in the 35 bit BCM2711 address space) and the "(64-bit, non-prefetchable)" (and "[size=4K]"). 01:00.0 USB controller: VIA Technologies, Inc. VL805/806 xHCI USB 3.0 Controller (rev 01) (prog-if 30 [XHCI]) Subsystem: VIA Technologies, Inc. VL805/806 xHCI USB 3.0 Controller Device tree node: /sys/firmware/devicetree/base/scb/pcie@7d50/pci@0,0/usb@0,0 Flags: bus master, fast devsel, latency 0, IRQ 51 Memory at 6 (64-bit, non-prefetchable) [size=4K] Capabilities: [80] Power Management version 3 Capabilities: [90] MSI: Enable+ Count=1/4 Maskable- 64bit+ Capabilities: [c4] Express Endpoint, MSI 00 Capabilities: [100] Advanced Error Reporting Kernel driver in use: xhci_hcd "Memory at 6 (64-bit, non-prefetchable)": Violation of a PCIe standard? No, this is a device BAR which can be 64-bit (memory BARs can either be 32-bits or 64-bits). However, the "window" in a PCI _bridge_ for memory is only defined to be 32-bits. Windows in PCI-PCI bridges are a special type of BAR that defines the address ranges that the bridge decodes on the parent side and passes down to child devices. The prefetchable window in PCI-PCI bridges can optionally be 64-bit. BAR == a range of memory or I/O port addresses decoded by a device, usually mapped to a register bank, but sometimes mapped to internal memory (e.g. a framebuffer) Window == a range of memory or I/O port addresses decoded by a bridge for which transactions are passed across the bridge to be handled by a child device. -- John Baldwin
Re: Recent commits reject RPi4B booting: pcib0 vs. pcib1 "rman_manage_region: request" leads to panic
On 2/14/24 9:57 AM, Mark Millard wrote: On Feb 14, 2024, at 08:08, John Baldwin wrote: On 2/12/24 5:57 PM, Mark Millard wrote: On Feb 12, 2024, at 16:36, Mark Millard wrote: On Feb 12, 2024, at 16:10, Mark Millard wrote: On Feb 12, 2024, at 12:00, Mark Millard wrote: [Gack: I was looking at the wrong vintage of source code, predating your changes: wrong system used.] On Feb 12, 2024, at 10:41, Mark Millard wrote: On Feb 12, 2024, at 09:32, John Baldwin wrote: On 2/9/24 8:13 PM, Mark Millard wrote: Summary: pcib0: mem 0x7d50-0x7d50930f irq 80,81 on simplebus2 pcib0: parsing FDT for ECAM0: pcib0: PCI addr: 0xc000, CPU addr: 0x6, Size: 0x4000 . . . rman_manage_region: request: start 0x6, end 0x6000f panic: Failed to add resource to rman Hmmm, I suspect this is due to the way that bus_translate_resource works which is fundamentally broken. It rewrites the start address of a resource in-situ instead of keeping downstream resources separate from the upstream resources. For example, I don't see how you could ever release a resource in this design without completely screwing up your rman. That is, I expect trying to detach a PCI device behind a translating bridge that uses the current approach should corrupt the allocated resource ranges in an rman long before my changes. That said, that doesn't really explain the panic. Hmm, the panic might be because for PCI bridge windows the driver now passes RF_ACTIVE and the bus_translate_resource hack only kicks in the activate_resource method of pci_host_generic.c. Detail: . . . pcib0: mem 0x7d50-0x7d50930f irq 80,81 on simplebus2 pcib0: parsing FDT for ECAM0: pcib0: PCI addr: 0xc000, CPU addr: 0x6, Size: 0x4000 This indicates this is a translating bus. pcib1: irq 91 at device 0.0 on pci0 rman_manage_region: request: start 0x1, end 0x1 pcib0: rman_reserve_resource: start=0xc000, end=0xc00f, count=0x10 rman_reserve_resource_bound: request: [0xc000, 0xc00f], length 0x10, flags 102, device pcib1 rman_reserve_resource_bound: trying 0x <0xc000,0xf> considering [0xc000, 0x] truncated region: [0xc000, 0xc00f]; size 0x10 (requested 0x10) candidate region: [0xc000, 0xc00f], size 0x10 allocating from the beginning rman_manage_region: request: start 0x6, end 0x6000f What you later typed does not match: 0x6 0x6000f You later typed: 0x6000 0x600fff This seems to have lead to some confusion from using the wrong figure(s). The fact that we are trying to reserve the CPU addresses in the rman is because bus_translate_resource rewrote the start address in the resource after it was allocated. That said, I can't see why rman_manage_region would actually fail. At this point the rman is empty (this is the first call to rman_manage_region for "pcib1 memory window"), so only the check that should be failing are the checks against rm_start and rm_end. For the memory window, rm_start is always 0, and rm_end is always 0x, so both the old (0xc - 0xc00f) and new (0x6000 - 0x600fff) ranges are within those bounds. No: 0x .vs (actual): 0x6 0x6000f Ok, then this explains the failure if the "raw" addresses are above 4G. I have access to an emag I'm currently using to test fixes to pci_host_generic.c to avoid corrupting struct resource objects. I'll post the diff once I've got something verified to work. It looks to me like in sys/dev/pci/pci_pci.c the: static void pcib_probe_windows(struct pcib_softc *sc) { . . . pcib_alloc_window(sc, &sc->mem, SYS_RES_MEMORY, 0, 0x); . . . is just inappropriately restrictive about where in the system address space a PCIe can validly be mapped to on the high end. That, in turn, leads to the rejection on the RPi4B now that the range use is checked. No, the physical register in PCI-PCI bridges is only 32-bits. Only the prefetchable BAR supports 64-bit addresses. Just for my edification . . . As I understand, SYS_RES_MEMORY for the BCM2711 means the 35 bit addressing space in the BCM2711, not a PCIe device internal address range that corresponds. Am I wrong about that? If I'm wrong, what does identify the 35 bit addressing space in the BCM2711? If I'm correct, then the 0..0x seems to be from the wrong address space up front. Or, may be, the SYS_RES_MEMORY and the 0x argments are not related as I expected and the 0x is not a SYS_RES_MEMORY value? We use SYS_RES_MEMORY for both address spaces. SYS_RES_MEMORY is more of an address space "type" and doesn't necessarily name a single, unique address space. The way to think about these address spaces is instances of 'struct rman'. There's a global 'struct rman' in the arm64
Re: Recent commits reject RPi4B booting: pcib0 vs. pcib1 "rman_manage_region: request" leads to panic
On 2/14/24 8:42 AM, Warner Losh wrote: On Wed, Feb 14, 2024 at 9:08 AM John Baldwin wrote: On 2/12/24 5:57 PM, Mark Millard wrote: On Feb 12, 2024, at 16:36, Mark Millard wrote: On Feb 12, 2024, at 16:10, Mark Millard wrote: On Feb 12, 2024, at 12:00, Mark Millard wrote: [Gack: I was looking at the wrong vintage of source code, predating your changes: wrong system used.] On Feb 12, 2024, at 10:41, Mark Millard wrote: On Feb 12, 2024, at 09:32, John Baldwin wrote: On 2/9/24 8:13 PM, Mark Millard wrote: Summary: pcib0: mem 0x7d50-0x7d50930f irq 80,81 on simplebus2 pcib0: parsing FDT for ECAM0: pcib0: PCI addr: 0xc000, CPU addr: 0x6, Size: 0x4000 . . . rman_manage_region: request: start 0x6, end 0x6000f panic: Failed to add resource to rman Hmmm, I suspect this is due to the way that bus_translate_resource works which is fundamentally broken. It rewrites the start address of a resource in-situ instead of keeping downstream resources separate from the upstream resources. For example, I don't see how you could ever release a resource in this design without completely screwing up your rman. That is, I expect trying to detach a PCI device behind a translating bridge that uses the current approach should corrupt the allocated resource ranges in an rman long before my changes. That said, that doesn't really explain the panic. Hmm, the panic might be because for PCI bridge windows the driver now passes RF_ACTIVE and the bus_translate_resource hack only kicks in the activate_resource method of pci_host_generic.c. Detail: . . . pcib0: mem 0x7d50-0x7d50930f irq 80,81 on simplebus2 pcib0: parsing FDT for ECAM0: pcib0: PCI addr: 0xc000, CPU addr: 0x6, Size: 0x4000 This indicates this is a translating bus. pcib1: irq 91 at device 0.0 on pci0 rman_manage_region: request: start 0x1, end 0x1 pcib0: rman_reserve_resource: start=0xc000, end=0xc00f, count=0x10 rman_reserve_resource_bound: request: [0xc000, 0xc00f], length 0x10, flags 102, device pcib1 rman_reserve_resource_bound: trying 0x <0xc000,0xf> considering [0xc000, 0x] truncated region: [0xc000, 0xc00f]; size 0x10 (requested 0x10) candidate region: [0xc000, 0xc00f], size 0x10 allocating from the beginning rman_manage_region: request: start 0x6, end 0x6000f What you later typed does not match: 0x6 0x6000f You later typed: 0x6000 0x600fff This seems to have lead to some confusion from using the wrong figure(s). The fact that we are trying to reserve the CPU addresses in the rman is because bus_translate_resource rewrote the start address in the resource after it was allocated. That said, I can't see why rman_manage_region would actually fail. At this point the rman is empty (this is the first call to rman_manage_region for "pcib1 memory window"), so only the check that should be failing are the checks against rm_start and rm_end. For the memory window, rm_start is always 0, and rm_end is always 0x, so both the old (0xc - 0xc00f) and new (0x6000 - 0x600fff) ranges are within those bounds. No: 0x .vs (actual): 0x6 0x6000f Ok, then this explains the failure if the "raw" addresses are above 4G. I have access to an emag I'm currently using to test fixes to pci_host_generic.c to avoid corrupting struct resource objects. I'll post the diff once I've got something verified to work. It looks to me like in sys/dev/pci/pci_pci.c the: static void pcib_probe_windows(struct pcib_softc *sc) { . . . pcib_alloc_window(sc, &sc->mem, SYS_RES_MEMORY, 0, 0x); . . . is just inappropriately restrictive about where in the system address space a PCIe can validly be mapped to on the high end. That, in turn, leads to the rejection on the RPi4B now that the range use is checked. No, the physical register in PCI-PCI bridges is only 32-bits. Only the prefetchable BAR supports 64-bit addresses. This is why the host bridge is doing a translation from the CPU side (0x6) to the PCI BAR addresses (0xc000) so that the BAR addresses are down in the 32-bit address range. It's also true that many PCI devices only support 32-bit addresses in memory BARs. 64-bit BARs are an optional extension not universally supported. The translation here is somewhat akin to a type of MMU where the CPU addresses are mapped to PCI addresses. The problem here is that the PCI BAR resources need to "stay" as PCI addresses since we depend on being able to use rman_get_start/end to get the PCI addresses of allocated resources, but pci_host_generic.c currently rewrites the addresses. Probably I should remove rman_set_start/end entirely (Warner added them back in 2004) as the methods don't
Re: Recent commits reject RPi4B booting: pcib0 vs. pcib1 "rman_manage_region: request" leads to panic
On 2/12/24 5:57 PM, Mark Millard wrote: On Feb 12, 2024, at 16:36, Mark Millard wrote: On Feb 12, 2024, at 16:10, Mark Millard wrote: On Feb 12, 2024, at 12:00, Mark Millard wrote: [Gack: I was looking at the wrong vintage of source code, predating your changes: wrong system used.] On Feb 12, 2024, at 10:41, Mark Millard wrote: On Feb 12, 2024, at 09:32, John Baldwin wrote: On 2/9/24 8:13 PM, Mark Millard wrote: Summary: pcib0: mem 0x7d50-0x7d50930f irq 80,81 on simplebus2 pcib0: parsing FDT for ECAM0: pcib0: PCI addr: 0xc000, CPU addr: 0x6, Size: 0x4000 . . . rman_manage_region: request: start 0x6, end 0x6000f panic: Failed to add resource to rman Hmmm, I suspect this is due to the way that bus_translate_resource works which is fundamentally broken. It rewrites the start address of a resource in-situ instead of keeping downstream resources separate from the upstream resources. For example, I don't see how you could ever release a resource in this design without completely screwing up your rman. That is, I expect trying to detach a PCI device behind a translating bridge that uses the current approach should corrupt the allocated resource ranges in an rman long before my changes. That said, that doesn't really explain the panic. Hmm, the panic might be because for PCI bridge windows the driver now passes RF_ACTIVE and the bus_translate_resource hack only kicks in the activate_resource method of pci_host_generic.c. Detail: . . . pcib0: mem 0x7d50-0x7d50930f irq 80,81 on simplebus2 pcib0: parsing FDT for ECAM0: pcib0: PCI addr: 0xc000, CPU addr: 0x6, Size: 0x4000 This indicates this is a translating bus. pcib1: irq 91 at device 0.0 on pci0 rman_manage_region: request: start 0x1, end 0x1 pcib0: rman_reserve_resource: start=0xc000, end=0xc00f, count=0x10 rman_reserve_resource_bound: request: [0xc000, 0xc00f], length 0x10, flags 102, device pcib1 rman_reserve_resource_bound: trying 0x <0xc000,0xf> considering [0xc000, 0x] truncated region: [0xc000, 0xc00f]; size 0x10 (requested 0x10) candidate region: [0xc000, 0xc00f], size 0x10 allocating from the beginning rman_manage_region: request: start 0x6, end 0x6000f What you later typed does not match: 0x6 0x6000f You later typed: 0x6000 0x600fff This seems to have lead to some confusion from using the wrong figure(s). The fact that we are trying to reserve the CPU addresses in the rman is because bus_translate_resource rewrote the start address in the resource after it was allocated. That said, I can't see why rman_manage_region would actually fail. At this point the rman is empty (this is the first call to rman_manage_region for "pcib1 memory window"), so only the check that should be failing are the checks against rm_start and rm_end. For the memory window, rm_start is always 0, and rm_end is always 0x, so both the old (0xc - 0xc00f) and new (0x6000 - 0x600fff) ranges are within those bounds. No: 0x .vs (actual): 0x6 0x6000f Ok, then this explains the failure if the "raw" addresses are above 4G. I have access to an emag I'm currently using to test fixes to pci_host_generic.c to avoid corrupting struct resource objects. I'll post the diff once I've got something verified to work. It looks to me like in sys/dev/pci/pci_pci.c the: static void pcib_probe_windows(struct pcib_softc *sc) { . . . pcib_alloc_window(sc, &sc->mem, SYS_RES_MEMORY, 0, 0x); . . . is just inappropriately restrictive about where in the system address space a PCIe can validly be mapped to on the high end. That, in turn, leads to the rejection on the RPi4B now that the range use is checked. No, the physical register in PCI-PCI bridges is only 32-bits. Only the prefetchable BAR supports 64-bit addresses. This is why the host bridge is doing a translation from the CPU side (0x6) to the PCI BAR addresses (0xc000) so that the BAR addresses are down in the 32-bit address range. It's also true that many PCI devices only support 32-bit addresses in memory BARs. 64-bit BARs are an optional extension not universally supported. The translation here is somewhat akin to a type of MMU where the CPU addresses are mapped to PCI addresses. The problem here is that the PCI BAR resources need to "stay" as PCI addresses since we depend on being able to use rman_get_start/end to get the PCI addresses of allocated resources, but pci_host_generic.c currently rewrites the addresses. Probably I should remove rman_set_start/end entirely (Warner added them back in 2004) as the methods don't do anything to deal with the fallout that the rman.rm_list linked-list is no longer sorted by address once some addresses get rewritten, etc. -- John Baldwin
Re: Recent commits reject RPi4B booting: pcib0 vs. pcib1 "rman_manage_region: request" leads to panic
On 2/10/24 2:09 PM, Michael Butler wrote: I have stability problems with anything at or after this commit (b377ff8) on an amd64 laptop. While I see the following panic logged, no crash dump is preserved :-( It happens after ~5-6 minutes running in KDE (X). Reverting to 36efc64 seems to work reliably (after ACPI changes but before the problematic PCI one) kernel: Fatal trap 12: page fault while in kernel mode kernel: cpuid = 2; apic id = 02 kernel: fault virtual address = 0x48 kernel: fault code= supervisor read data, page not present kernel: instruction pointer = 0x20:0x80acb962 kernel: stack pointer = 0x28:0xfe00c4318d80 kernel: frame pointer = 0x28:0xfe00c4318d80 kernel: code segment = base 0x0, limit 0xf, type 0x1b kernel: = DPL 0, pres 1, long 1, def32 0, gran 1 kernel: processor eflags = interrupt enabled, resume, IOPL = 0 kernel: current process = 2 (clock (0)) kernel: rdi: f802e460c000 rsi: rdx: 0002 kernel: rcx: r8: 001e r9: fe00c4319000 kernel: rax: 0002 rbx: f802e460c000 rbp: fe00c4318d80 kernel: r10: 1388 r11: 7ffc765d r12: 000f kernel: r13: 0002 r14: f8000193e740 r15: kernel: trap number = 12 kernel: panic: page fault kernel: cpuid = 2 kernel: time = 1707573802 kernel: Uptime: 6m19s kernel: Dumping 942 out of 16242 MB:..2%..11%..21%..31%..41%..51%..62%..72%..82%..92% kernel: Dump complete kernel: Automatic reboot in 15 seconds - press a key on the console to abort Without a stack trace it is pretty much impossible to debug a panic like this. Do you have KDB_TRACE enabled in your kernel config? I'm also not sure how the PCI changes can result in a panic post-boot. If you were going to have problems they would be during device attach, not after you are booted and running X. Short of a stack trace, you can at least use lldb or gdb to lookup the source line associated with the faulting instruction pointer (as long as it isn't in a kernel module), e.g. for gdb you would use 'gdb /boot/kernel/kernel' and then 'l *', e.g. from above: 'l *0x80acb962' -- John Baldwin
Re: Recent commits reject RPi4B booting: pcib0 vs. pcib1 "rman_manage_region: request" leads to panic
On 2/9/24 8:13 PM, Mark Millard wrote: Summary: pcib0: mem 0x7d50-0x7d50930f irq 80,81 on simplebus2 pcib0: parsing FDT for ECAM0: pcib0: PCI addr: 0xc000, CPU addr: 0x6, Size: 0x4000 . . . rman_manage_region: request: start 0x6, end 0x6000f panic: Failed to add resource to rman Hmmm, I suspect this is due to the way that bus_translate_resource works which is fundamentally broken. It rewrites the start address of a resource in-situ instead of keeping downstream resources separate from the upstream resources. For example, I don't see how you could ever release a resource in this design without completely screwing up your rman. That is, I expect trying to detach a PCI device behind a translating bridge that uses the current approach should corrupt the allocated resource ranges in an rman long before my changes. That said, that doesn't really explain the panic. Hmm, the panic might be because for PCI bridge windows the driver now passes RF_ACTIVE and the bus_translate_resource hack only kicks in the activate_resource method of pci_host_generic.c. Detail: . . . pcib0: mem 0x7d50-0x7d50930f irq 80,81 on simplebus2 pcib0: parsing FDT for ECAM0: pcib0: PCI addr: 0xc000, CPU addr: 0x6, Size: 0x4000 This indicates this is a translating bus. pcib1: irq 91 at device 0.0 on pci0 rman_manage_region: request: start 0x1, end 0x1 pcib0: rman_reserve_resource: start=0xc000, end=0xc00f, count=0x10 rman_reserve_resource_bound: request: [0xc000, 0xc00f], length 0x10, flags 102, device pcib1 rman_reserve_resource_bound: trying 0x <0xc000,0xf> considering [0xc000, 0x] truncated region: [0xc000, 0xc00f]; size 0x10 (requested 0x10) candidate region: [0xc000, 0xc00f], size 0x10 allocating from the beginning rman_manage_region: request: start 0x6, end 0x6000f The fact that we are trying to reserve the CPU addresses in the rman is because bus_translate_resource rewrote the start address in the resource after it was allocated. That said, I can't see why rman_manage_region would actually fail. At this point the rman is empty (this is the first call to rman_manage_region for "pcib1 memory window"), so only the check that should be failing are the checks against rm_start and rm_end. For the memory window, rm_start is always 0, and rm_end is always 0x, so both the old (0xc - 0xc00f) and new (0x6000 - 0x600fff) ranges are within those bounds. I would instead expect to see some other issue later on where we fail to allocate a resource for a child BAR, but I wouldn't expect rman_manage_region to fail. Logging the return value from rman_manage_region would be the first step I think to see which error value it is returning. Probably I should fix pci_host_generic.c to handle translation properly however. I can work on a patch for that. -- John Baldwin
Re: make installworld fails because /usr/include/c++/v1/__tuple is a file
On 12/10/23 8:43 AM, Dimitry Andric wrote: On 10 Dec 2023, at 15:11, Herbert J. Skuhra wrote: On Sun, Dec 10, 2023 at 01:22:38PM +, John F Carr wrote: On arm64 running CURRENT from two weeks ago I updated to c711af772782 Bump __FreeBSD_version for llvm 17.0.6 merge and built and installed from source. make installworld failed: install: target directory `/usr/include/c++/v1/__tuple/' does not exist That pathname is a file: -r--r--r-- 1 root wheel 20512 Feb 15 2023 /usr/include/c++/v1/__tuple Early in make output is mtree -deU -i -f /usr/src/etc/mtree/BSD.include.dist -p /usr/include ./c++/v1/__algorithm/pstl_backends missing (created) [...] ./c++/v1/__tuple missing (not created: File exists) Should I remove the file and try again, or is there a more elegant fix? The word "tuple" does not appear in UPDATING. 'make delete-old' should have removed this file. bdd1243df58e6 (Dimitry Andric 2023-04-14 23:41:27 +0200 965) OLD_FILES+=usr/include/c++/v1/__tuple Ah yes, that's it. The file was removed during the upgrade from libc++ 15.0 to 16.0, while its contents was split into a subdirectory named __tuple_dir. In libc++ 17.0.0 they renamed this subdirectory back to just __tuple. This means that apparently people are not running "make delete-old" after installations. Please don't forget that. :) Well, but if you have an old system with LLVM 15 that you upgrade directly to LLVM 17 you will hit this even if you ran delete-old after your last upgrading that used LLVM 15. We might need something to cope with this during the install target for libc++ in particular where this has occurred multiple times historically. -- John Baldwin
Re: bhyve -G
On 11/15/23 3:06 PM, Bakul Shah wrote: On Nov 15, 2023, at 7:57 AM, John Baldwin wrote: On 10/9/23 5:21 PM, Bakul Shah wrote: Any hints on how to use bhyve's -G option to debug a VM kernel? I can connect to it from gdb with "target remote :" & bhyve stops the VM initially but beyond that I am not sure. Ideally this should work just like an in-circuit-emulator, not requiring anything special in the VM or kernel itself. step only works on Intel CPUs currently (and is a bit fragile anyway due to interrupts firing while you try to step, but that happens for me in QEMU as well). Breakpoints should work fine. I tend to use 'until' to do stepping (basically stepping via temporary breakpoints) when debugging the kernel this way. Thanks for your response! I can ^C to stop the VM, examine the stack, set breakpoints, continue etc. but when the breakpoint is hit, kgdb doesn't regain control -- instead I get the usual db> ... prompt on the console. I guess I have to set some sysctl for this? Hmm, no, it shouldn't be breaking into DDB in the guest as the breakpoint exception should be intercepted by the stub and never made visible to the guest. -- John Baldwin
Re: [HEADS-UP] Quick update to 14.0-RELEASE schedule
On 11/14/23 8:52 PM, Glen Barber wrote: On Tue, Nov 14, 2023 at 08:10:23PM -0700, The Doctor wrote: On Wed, Nov 15, 2023 at 02:27:01AM +, Glen Barber wrote: On Tue, Nov 14, 2023 at 05:15:48PM -0700, The Doctor wrote: On Tue, Nov 14, 2023 at 08:36:54PM +, Glen Barber wrote: We are still waiting for a few (non-critical) things to complete before the announcement of 14.0-RELEASE will be ready. It should only be another day or so before these things complete. Thank you for your understanding. I always just installed my copy. Ok. I do not know what exactly is your point, but releases are never official until there is a PGP-signed email sent. The email is intended for the general public of consumers of official releases, not "yeah, but"s. Howver if you do a freebsd-update upgrade, you can upgrade. Is that suppose to happen? That does not say that the freebsd-update bits will not change *until* the official release announcement has been sent. In my past 15 years involved in the Project, I think we have been very clear on that. A RELEASE IS NOT FINAL UNTIL THE PGP-SIGNED ANNOUNCEMENT IS SENT. I mean, c'mon, dude. We really, seriously, for all intents and purposes, cannot be any more clear than that. So, yes, *IF* an update necessitates a new freebsd-update build, what you are running is *NOT* official. For at least 15 years, we have all said the same entire thing. Yes, but, if at this point we had to rebuild, it would have to be 14.0.1 or something (which we have done a few times in the past). It would be too confusing otherwise once the bits are built and published (where published means "uploaded to our CDN"). It is the 14.0 release bits, the only question is if for some reason we had a dire emergency that meant we had to pull it at the last minute and publish different bits (under a different release name). Realistically, once the bits are available, we can't prevent people from using them, it's just at their own risk to do so until the project says "yes, we believe these are good". Granted, they are under the same risk if they are still running the last RC. The best way to minimize that risk going forward is to add more automation of testing/CI to go along with the process of building release bits so that the build artifacts from the release build run through CI and are only published if the CI is green as that would give us greater confidence of "we believe these are good" before they are uploaded for publishing. -- John Baldwin
Re: bsdinstall/scriptedpart could not run ;-(
On 11/12/23 11:00 PM, KIRIYAMA Kazuhiko wrote: Hi, all I usually run bsdinstall by instllerconfig, but bsdinstall/scriptedpart could not run ;-( My installerconfig is: PARTITIONS='nda0 gpt { 200M efi, 804G freebsd-ufs /, 128G freebsd-swap }' DISTRIBUTIONS='base.txz kernel-dbg.txz kernel.txz lib32.txz tests.txz' ZFSBOOT_DISKS="" #!/bin/sh /bin/mkdir -p /.dake /bin/cp /usr/share/zoneinfo/Asia/Tokyo /etc/localtime /bin/cp /root/.cshrc /root/.cshrc.org /bin/cat <> /etc/fstab 192.168.1.17:/.dake /.dake nfs rw 0 0 EOF sed -i".bak" -Ee '/^#BDS_install.sh_added:start_line$/,/^#BDS_install.sh_added:end_line$/d' /root/.cshrc /bin/cat <<'EOF' >> /root/.cshrc #BDS_install.sh_added:start_line setenv PATH${PATH}:/.dake/bin setenv MGRHOME /usr/home/admin setenv OPENTOOLSDIR/.dake setenv DAKEDIR /.dake #BDS_install.sh_added:end_line EOF : (snip) : I investigated in bsdinstall script and found scriptedpart which acutually run partedit with scriptedpart would not be destroy existing partition. In fact scriptedpart -> partedit changed in script as follows, then parttion editor run at terminal. My guess is something to do with commit 23099099196548550461ba427dcf09dcfb01878d, though I don't see how it could work any differently in this case as the only change to part_config there was to return if if geom_gettree fails, and if it fails, provider_for_name would presumably have failed anyway. -- John Baldwin
Re: bhyve -G
On 10/9/23 5:21 PM, Bakul Shah wrote: Any hints on how to use bhyve's -G option to debug a VM kernel? I can connect to it from gdb with "target remote :" & bhyve stops the VM initially but beyond that I am not sure. Ideally this should work just like an in-circuit-emulator, not requiring anything special in the VM or kernel itself. step only works on Intel CPUs currently (and is a bit fragile anyway due to interrupts firing while you try to step, but that happens for me in QEMU as well). Breakpoints should work fine. I tend to use 'until' to do stepping (basically stepping via temporary breakpoints) when debugging the kernel this way. -- John Baldwin
Re: KTLS thread on 14.0-RC3
On 10/30/23 3:41 AM, Zhenlei Huang wrote: On Oct 30, 2023, at 12:09 PM, Zhenlei Huang wrote: On Oct 29, 2023, at 5:43 PM, Gordon Bergling wrote: Hi, I am currently building a new system, which should be based on 14.0-RELEASE. Therefor I am tracking releng/14.0 since its creation and updating it currently via the usualy buildworld steps. What I have noticed recently is, that the [KTLS] is missing. I have a stable/13 system which shows the [KTLS] thread and a very recent -CURRENT that also shows the [KTLS] thread. The stable/13 and releng/14.0 systems both use the GENERIC kernel, without any custom modifications. Loaded KLDs are also the same. Did I miss something, or is there something in releng/14.0 missing, which is currenlty enabled in stable/13? KTLS shall still work as intended, the creation of it threads is deferred. See a72ee355646c (ktls: Defer creation of threads and zones until first use) Run ktls_init() when the first KTLS session is created rather than unconditionally during boot. This avoids creating unused threads and allocating unused resources on systems which do not use KTLS. ``` -SYSINIT(ktls, SI_SUB_SMP + 1, SI_ORDER_ANY, ktls_init, NULL); ``` Seems 14.0 only create one KTLS thread. IIRC 13.2 create one thread per core. That part should not be different. There should always be one thread per core. -- John Baldwin
15/14 upgrades break old sudo, maybe bump PAM's shlib?
I upgraded my laptop from a late June current to current from yesterday today, and after installworld sudo stopped working (dies with a SIGBUS). After some debugging, the issue ended up being OpenSSL library version mismatches as sudo uses PAM and PAM is linked agianst OpenSSL 3, but sudo is linked against OpenSSL 1.1.1. Both shlibs get mapped into the the process and at some point sudo crosses the streams and the crash occurs inside OpenSSL 3's libcrypto. I realize that we do have a generate note about needing to update third party packages after an upgrade, but I tend to use sudo as part of my workflow for doing that sort of thing. I generally build all my own packages via poudriere and use sudo at various points in that process, but even if I were using FreeBSD.org packages I would be using sudo to try to run 'pkg upgrade'. su(8) in base works fine, so that's my workaround for now on my laptop, but I wonder if we want to make this particular bump on the upgrade path a little less bumpy? Either by being clear in our release notes that tools like sudo (and I suspect any other third-party su wrappers that also use PAM, xscreensaver's screen lock doesn't seem to be affected since it probably doesn't use OpenSSL directly thankfully) can break, or another route we could take would be to bump the DSO versions of things that depend on libcrypto/libssl in base. We did not do this latter approach for the OpenSSL 1.0.2 -> 1.1.1 upgrade FWIW. If we wanted to do the shlib bump approach, Enji had a good list from a while back (though Enji wanted to make them all private rather than bumping): - kerberos - libarchive - libbsnmp - libfetch - libgeli - libldns - libmp - libradius - libunbound From my research it seems that PAM (library and modules), gssapi libraries, and libzfs would also need to be on the list. libldns is already private as is libunbound, though bumping them might be safter anyway. There is on libgeli, instead there is geli_eli.so which has no version, but hopefully is not widely used in ports the same as PAM. Note also that if we did this, we would want to do it for 14.0 as 13.x -> 14 upgrades are affected in the same way. -- John Baldwin
Re: user problems when upgrading to v15
On 9/2/23 7:11 AM, Dimitry Andric wrote: On 1 Sep 2023, at 03:42, brian whalen wrote: Repeating the entire process: I created a 13.2 vm with 6 cores and 8GB of ram. Ran freebsd-update fetch and install. Ran pkg install git bash ccache open-vm-tools-nox11 Used git clone to get current and ports source files. Edited /etc/make.conf to use ccache Ran make -j6 buildworld && make -j6 kernel I then rebooted in single user mode and did the next steps saving output to a file with > filename. etcupdate -p was pretty uneventful. It did show the below and did not prompt to edit. root@f15:~ # less etcupdatep C /etc/group C /etc/master.passwd This is a problem: the "C" characters mean there were conflicts, and it's indeed very unfortunate that etcupdate does not immediately force you to resolve them. Because now you basically have mangled group and master.passwd files, with conflict markers in them! No, the conflicted files are in /var/db/etcupdate/conflicts, the files in /etc are still the old ones at this point and won't be updated until you run 'etcupdate resolve' to fix them. I suspect what happened here is that Brian chose the 'tf' (theirs-full) option for 'etcupdate resolve' when he really wanted to do 'e' to edit the conflicted version. Immediately after this, you should run "etcupdate resolve", and fix any conflicts that it has found. Note that recently there was a lot of churn due to the removal of $FreeBSD$ keywords, and this almost always creates conflicts in the group and passwd files. For lots of other files in /etc, the conflicts are resolved automatically, but unfortunately not for the files that are essential to log in! make installworld seemed mostly error free though I did see a nonzero status for a man page failed inn the man4 directory. etcupdate -B only showed the below. This was my first build after install. root@f15:~ # less etcupdateB Conflicts remain from previous update, aborting. Yes, that is indeed the problem. You must first resolve conflicts from any previous etcupdate run, before doing anything else. As to why it does not immediately forces you to do so, and delegates this to a separate step, which can easily be forgotten, I have no idea. So that if you are doing scripted upgrades, you don't hang forever in a script. The intention is that after doing a bunch of scripted installworld + etcupdate's on various hosts you can use 'etcupdate status' to see if there are any remaining steps requiring manual intervention. There could be an option to request batched vs interactivate updates perhaps. If I type exit in single user mode to go multi user mode, the local user still works. After a reboot the local user still works. This local user can also sudo as expected. This wasn't the case for the previous build when I first reported this. However, if I run etcupdate resolve it is still presenting /etc/group and /etc/master/passwd as problems. If this is is expected behavior for current then no big deal. I just wasn't sure. The conflicts themselves are expected, alas. But you _must_ resolve them, otherwise you can end up with a mostly-bricked system. No, the conflict markers are not placed in the versions in /etc. However, etucpdate does refuse to do a "new" upgrade until you resolve all the conflicts from your previous upgrade to ensure that conflicted upgrades aren't missed. -- John Baldwin
Re: Support for more than 256 CPU cores
On 5/5/23 6:38 AM, Ed Maste wrote: FreeBSD supports up to 256 CPU cores in the default kernel configuration (on Tier-1 architectures). Systems with more than 256 cores are available now, and will become increasingly common over FreeBSD 14’s lifetime. The FreeBSD Foundation is supporting the effort to increase MAXCPU, and PR269572[1] is open to track tasks and changes. As a project we have scalability work ahead of us to make best use of high core count machines, but at a minimum we should be able to boot a GENERIC kernel on such systems, and have an ABI for the FreeBSD 14 release that supports such a configuration. Some changes have already been committed in support of increased MAXCPU, including increasing MAX_APIC_ID (commit c8113dad7ed4) and a number of changes to reduce bloat (such as commits 42f722e721cd, e72f7ed43eef, 78cfa762ebf2 and 74ac712f72cf). The next step is to increase the maximum cpuset size for userland. I have this change open in review D39941[2] and an exp-run request in PR271213[3]. Following that the kernel change for increasing MAXCPU is in D36838[4]. Additional work on bloat reduction will continue after this change, and looking forward FreeBSD is going to need ongoing effort from the community and the FreeBSD Foundation to continue improving scalability. [1] https://bugs.freebsd.org/269572 [2] https://reviews.freebsd.org/D39941 [3] https://bugs.freebsd.org/271213 [4] https://reviews.freebsd.org/D36838 FWIW, I think it will be useful for main to run with a larger userspace MAXCPU than kernel for at least a while so that we have better testing of that configuration and to give headroom for bumping MAXCPU in the kernel during the 14.x branch. The only other viable path I think which would be more work would be to rework cpuset_t in userspace to always use a dynamically sized mask. This could perhaps be done in an API-preserving manner by making cpuset_t an opaque wrapper type in userland and requiring CPU_* to indirect to functions in libc, etc. That's a fair bit more work however. -- John Baldwin
Re: How to Enable support for IPsec deprecated algorithms: 3DES, MD5-HMAC
On 10/4/22 1:53 AM, alfadev wrote: Hi, i am trying to move my gateway from FreeBSD 11.0 to FreeBSD 14.0 to use newly added ipfw table lookup for mac addresses (https://reviews.freebsd.org/D35103) Also I have too many IPSec connections between fortigate, cisco etc. And their operators use only 3DES algorithms and they have no intention to change it for me. So, now i have to enable 3DES support for FreeBSD 14.0 . To add 3DES support again i changed some files shown below. I am not sure what i did any help welcomes. You do not want to just restore the files as-is. You instead want to revert some of the diffs from the first commit. The second commit for /dev/crypto doesn't matter for IPsec and you can ignore it. However, you will need to also partially revert commit 0e00c709d7f1cdaeb584d244df9534bcdd0ac527 which removes DES and 3DES from OCF itself. This is what removed enc_xform_des for example. -- John Baldwin
Re: pkg: Newer FreeBSD version for package... but why?
On 7/13/22 3:17 AM, Andriy Gapon wrote: On 2022-07-13 13:09, Michael Gmelin wrote: On Wed, 13 Jul 2022 10:29:06 +0300 Andriy Gapon wrote: # uname -U 1400063 # uname -K 1400063 # pkg upgrade Updating FreeBSD repository catalogue... Fetching packagesite.pkg: 100%5 MiB 4.8MB/s00:01 Processing entries: 0% Newer FreeBSD version for package zyre: To ignore this error set IGNORE_OSVERSION=yes - package: 1400063 - running kernel: 1400051 Ignore the mismatch and continue? [y/N]: Does anyone know why this would happen? Where does pkg get its notion of the running kernel version? If I'm reading the sources correctly, it's determining the OS version by looking at the elf headers of various files in this order: getenv("ABI_FILE") /usr/bin/uname /bin/sh So I would assume that `file /usr/bin/uname` shows 1400051 on your system. Thank you very much! That's it: # file /usr/bin/uname /usr/bin/uname: ELF 32-bit LSB executable, ARM, EABI5 version 1 (FreeBSD), dynamically linked, interpreter /libexec/ld-elf.so.1, FreeBSD-style, for FreeBSD 14.0 (1400051), stripped You can point it to checking another file by setting ABI_FILE[0] in the environment or ignore the check by setting IGNORE_OSVERSION (like advised). The "running kernel:" label seems a bit misleading. Indeed. Now the next thing (for me) to research is why the binaries were built "for FreeBSD 14.0 (1400051)" when the source tree has 1400063 and uname -U also reports 1400063. FWIW, this was a cross-build, maybe that played a role too. If you do a NO_CLEAN=yes build, we don't relink binaries just because crt*.o changed (where the note is stored). -- John Baldwin
Re: BLAKE3 unstability?
On 7/12/22 1:41 AM, Evgeniy Khramtsov wrote: I can reproduce via: $ truncate -s 10G /tmp/test $ mdconfig -f /tmp/test -S 4096 $ zpool create test /dev/md1 $ zfs create -o checksum=blake3 test/b $ dd if=/dev/random of=/test/b/noise bs=1M count=4096 $ sync $ zpool scrub test $ zpool status I cannot reproduce this on openzfs/zfs@cb01da68057 (the commit that was most recently merged) built out of tree on either stable/13 70fd40edb86 or main 9aa02d5120a. I'll update a system and see if I can reproduce it with the in-tree ZFS. - Ryan It did not reproduce for me with in-tree ZFS on main@3c9ad9398fcd either. Could you share sysctl kstat.zfs.misc.chksum_bench, maybe we are using different implementations? I do see that blake3 went in with only a Linux module parameter for the implementation selection, so I'll have to fix that. For now we can at least see which was fastest, which should be the one selected. You just won't be able to manually change it to see if that helps. - Ryan I found the culprit (kernel and base from download.FreeBSD.org kernel.txz and base.txz respectively) (I forgot about local sysctl.conf...): kern.sched.steal_thresh=1 kern.sched.preempt_thresh=121 Then #!/bin/sh truncate -s 10G /tmp/test mdconfig -f /tmp/test -S 4096 zpool create test /dev/md0 zfs create -o checksum=blake3 test/b dd if=/dev/random of=/test/b/noise bs=1M count=4096 sync zpool scrub test sleep 3 zpool status zpool destroy test mdconfig -d -u 0 rm /tmp/test As for ULE "tuning", these values give me fine desktop interactivity when building lang/rust when nice and idprio did not help, so I left them in sysctl.conf. Not sure if scheduling parameters are worthy of a ZFS PR, maybe something essential is preempted. It could be missing fpu_kern_enter/leave that lack of preemption would cover over. I thought that missing that would give a panic in the kernel though due to FPU instructions being disabled (including vector instructions). Maybe ZFS isn't using fpu_kern_enter(FPU_NOCTX) and is instead trying to juggle contexts and it has a bug in how it manages saved FPU contexts and reuses a context? If so, I would just suggest that ZFS switch to using FPU_KERN_NOCTX instead which runs all SSE type code in a critical section to disable preemption but avoids having to allocate and manage FPU contexts. -- John Baldwin
Re: Profiled libraries on freebsd-current
On 5/4/22 1:38 PM, Steve Kargl wrote: On Wed, May 04, 2022 at 01:22:57PM -0700, John Baldwin wrote: On 5/4/22 12:53 PM, Steve Kargl wrote: On Wed, May 04, 2022 at 11:12:55AM -0700, John Baldwin wrote: I don't know the entire FreeBSD ecosystem. Do people use FreeBSD on embedded systems (e.g., nanobsd) where libthr may be stripped out? Thus, --enable-threads=no is needed. If they do, they are also using a constrained userland and probably are not shipping a GCC binary either. However, it's not clear to me what --enable-threads means. Does this enable -pthread as an option? If so, that should definitely just always be on. It's still an option users have to opt into via a command line flag and doesn't prevent building non-threaded programs. If it's enabling use of threads at runtime within GCC itself, I'd say that also should probably just be allowed to be on. I can't really imagine what else it might mean (and I doubt it means the latter). AFAICT, it controls whether -lpthread is automatically added to the command line. In the case of -pg, it is -lpthread_p. The relevant lines are #ifdef FBSD_NO_THREADS #define FBSD_LIB_SPEC "\ %{pthread: %eThe -pthread option is only supported on FreeBSD when gcc \ is built with the --enable-threads configure-time option.} \ %{!shared: \ %{!pg: -lc} \ %{pg: -lc_p} \ }" #else #define FBSD_LIB_SPEC "\ %{!shared: \ %{!pg: %{pthread:-lpthread} -lc} \ %{pg: %{pthread:-lpthread_p} -lc_p} \ }\ %{shared:\ %{pthread:-lpthread} -lc \ }" #endif Ed is wondering if one can get rid of FBSD_NO_THREADS. With the pending removal of WITH_PROFILE, the above reduces to #define FBSD_LIB_SPEC " \ %{!shared:\ %{pthread:-lpthread} -lc\ } \ %{shared: \ %{pthread:-lpthread} -lc\ }" If one can do the above, then freebsd-nthr.h is no longer needed and can be deleted and config.gcc's handling of --enable-threads can be updated/removed. Ok, so it's just if -pthread is supported (%{pthread:-lpthread} only adds -lpthread if -pthread was given on the command line). That can just be on all the time and Ed is correct that it is safe to remove the FBSD_NO_THREADS case and assume it is always present instead. -- John Baldwin
Re: Profiled libraries on freebsd-current
On 5/4/22 12:53 PM, Steve Kargl wrote: On Wed, May 04, 2022 at 11:12:55AM -0700, John Baldwin wrote: On 5/2/22 10:37 AM, Steve Kargl wrote: On Mon, May 02, 2022 at 12:32:25PM -0400, Ed Maste wrote: On Sun, 1 May 2022 at 11:54, Steve Kargl wrote: diff --git a/gcc/config/freebsd-spec.h b/gcc/config/freebsd-spec.h index 594487829b5..1e8ab2e1827 100644 --- a/gcc/config/freebsd-spec.h +++ b/gcc/config/freebsd-spec.h @@ -93,14 +93,22 @@ see the files COPYING3 and COPYING.RUNTIME respectively. If not, see (similar to the default, except no -lg, and no -p). */ #ifdef FBSD_NO_THREADS I wonder if we can simplify things now, and remove this `FBSD_NO_THREADS` case. I didn't see anything similar in other GCC targets I looked at. That I don't know. FBSD_NO_THREADS is defined in freebsd-nthr.h. In fact, it's the only thing in that header (except copyright broilerplate). freebsd-nthr.h only appears in config.gcc and seems to only get added to the build if someone runs configure with --enable-threads=no. Looking at my last config.log for gcc trunk, I see "Thread model: posix", which appears to be the default case or if someone does --enable-threads=yes or --enable-threads=posix. So, I suppose it comes down to two questions: (1) is libpthread.* available on all supported targets and versions? (2) does anyone build gcc without threads support? libpthread is available on all supported architectures on all supported versions. libthr has been the default threading library since 7.0 and the only supported library since 8.0. In GDB I just assume libthr style threads, and I think GCC can safely do the same. I don't know the entire FreeBSD ecosystem. Do people use FreeBSD on embedded systems (e.g., nanobsd) where libthr may be stripped out? Thus, --enable-threads=no is needed. If they do, they are also using a constrained userland and probably are not shipping a GCC binary either. However, it's not clear to me what --enable-threads means. Does this enable -pthread as an option? If so, that should definitely just always be on. It's still an option users have to opt into via a command line flag and doesn't prevent building non-threaded programs. If it's enabling use of threads at runtime within GCC itself, I'd say that also should probably just be allowed to be on. I can't really imagine what else it might mean (and I doubt it means the latter). -- John Baldwin
Re: Profiled libraries on freebsd-current
On 5/2/22 10:37 AM, Steve Kargl wrote: On Mon, May 02, 2022 at 12:32:25PM -0400, Ed Maste wrote: On Sun, 1 May 2022 at 11:54, Steve Kargl wrote: diff --git a/gcc/config/freebsd-spec.h b/gcc/config/freebsd-spec.h index 594487829b5..1e8ab2e1827 100644 --- a/gcc/config/freebsd-spec.h +++ b/gcc/config/freebsd-spec.h @@ -93,14 +93,22 @@ see the files COPYING3 and COPYING.RUNTIME respectively. If not, see (similar to the default, except no -lg, and no -p). */ #ifdef FBSD_NO_THREADS I wonder if we can simplify things now, and remove this `FBSD_NO_THREADS` case. I didn't see anything similar in other GCC targets I looked at. That I don't know. FBSD_NO_THREADS is defined in freebsd-nthr.h. In fact, it's the only thing in that header (except copyright broilerplate). freebsd-nthr.h only appears in config.gcc and seems to only get added to the build if someone runs configure with --enable-threads=no. Looking at my last config.log for gcc trunk, I see "Thread model: posix", which appears to be the default case or if someone does --enable-threads=yes or --enable-threads=posix. So, I suppose it comes down to two questions: (1) is libpthread.* available on all supported targets and versions? (2) does anyone build gcc without threads support? libpthread is available on all supported architectures on all supported versions. libthr has been the default threading library since 7.0 and the only supported library since 8.0. In GDB I just assume libthr style threads, and I think GCC can safely do the same. -- John Baldwin
Re: 'set but unused' breaks drm-*-kmod
On 4/21/22 6:45 AM, Emmanuel Vadot wrote: On Thu, 21 Apr 2022 08:51:26 -0400 Michael Butler wrote: On 4/21/22 03:42, Emmanuel Vadot wrote: Hello Michael, On Wed, 20 Apr 2022 23:39:12 -0400 Michael Butler wrote: Seems this new requirement breaks kmod builds too .. The first of many errors was (I stopped chasing them all for lack of time) .. --- amdgpu_cs.o --- /usr/ports/graphics/drm-devel-kmod/work/drm-kmod-drm_v5.7.19_3/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c:1210:26: error: variable 'priority' set but not used [-Werror,-Wunused-but-set-variable] enum drm_sched_priority priority; ^ 1 error generated. *** [amdgpu_cs.o] Error code 1 How are you building the port, directly or with PORTS_MODULES ? I do make passes on the warning for drm and I did for set-but-not-used case but unfortunately this option doesn't exists in 13.0 so I couldn't apply those in every branch. I build this directly on -current. I'm guessing that these are what triggered this behaviour: commit 8b83d7e0ee54416b0ee58bd85f9c0ae7fb3357a1 Author: John Baldwin Date: Mon Apr 18 16:06:27 2022 -0700 Make -Wunused-but-set-variable a fatal error for clang 13+ for kernel builds. Reviewed by:imp, emaste Differential Revision: https://reviews.freebsd.org/D34949 commit 615d289ffefe2b175f80caa9b1e113c975576472 Author: John Baldwin Date: Mon Apr 18 16:06:14 2022 -0700 Re-enable set but not used warnings for kernel builds. make tinderbox now passes with this warning enabled as a fatal error, so revert the change to hide it in preparation for making it fatal. This reverts commit e8e691983bb75e80153b802f47733f1531615fa2. Reviewed by:imp, emaste Differential Revision: https://reviews.freebsd.org/D34948 Ok I see, I won't have time until monday (maybe tuesday to fix this) but if someone wants to beat me to it we should add some new CWARNFLAGS for each problematic files in the 5.4-lts and 5.7-table branches of drm-kmod (master which is following 5.10 is already good) only if $ {COMPILER_VERSION} >= 13. There is already a helper you can use that deals with compiler versions: CWARNFLAGS+= ${NO_WUNUSED_BUT_SET_VARIABLE} or some such. -- John Baldwin
Re: ktrace on NFSroot failing?
On 3/10/22 8:14 AM, Mateusz Guzik wrote: On 3/10/22, Bjoern A. Zeeb wrote: Hi, I am having a weird issue with ktrace on an nfsroot machine: root:/tmp # ktrace sleep 1 root:/tmp # kdump -559038242 Events dropped. kdump: bogus length 0xdeadc0de Anyone seen something like this before? I just did a quick check and it definitely fails on nfs mounts: # ktrace pwd /root/mjg # kdump -559038242 Events dropped. kdump: bogus length 0xdeadc0de I don't have time to look into it this week though. Possibly related: core dumps are no longer working for me on NFS mounts. I get a 0 byte foo.core instead of a valid core dump. -- John Baldwin
Re: Buildworld fails with external GCC toolchain
On 2/12/22 11:34 AM, Yasuhiro Kimura wrote: From: Dimitry Andric Subject: Re: Buildworld fails with external GCC toolchain Date: Fri, 11 Feb 2022 22:53:44 +0100 Not really, the gcc 9 build has been broken for months, as far as I know. See also: https://ci.freebsd.org/job/FreeBSD-main-amd64-gcc9_build/ The last build(s) show a different error from yours, though: /workspace/src/tests/sys/netinet/libalias/util.c: In function 'set_udp': /workspace/src/tests/sys/netinet/libalias/util.c:112:2: error: converting a packed 'struct ip' pointer (alignment 2) to a 'uint32_t' {aka 'unsigned int'} pointer (alignment 4) may result in an unaligned pointer value [-Werror=address-of-packed-member] 112 | uint32_t *up = (void *)p; | ^~~~ In file included from /workspace/src/tests/sys/netinet/libalias/util.h:37, from /workspace/src/tests/sys/netinet/libalias/util.c:39: /workspace/src/sys/netinet/ip.h:51:8: note: defined here 51 | struct ip { |^~ -Dimitry Thanks for information. I went back the commit history of main branch about every month and check if buildworld succeeds with GCC. But it didn't succeed even if I went back about a year. And devel/binutils port was update to 2.37 on last August. So I suspect external GCC toolchain doesn't work well after binutils is updated to current version. I have amd64 world + kernel building with GCC 9 and the only remaining open review not merged yet is https://reviews.freebsd.org/D34147. It is work to keep it working though and I hadn't worked on it again until recently. -- John Baldwin
Re: Dragonfly Mail Agent (dma) in the base system
On 1/27/22 1:34 PM, Ed Maste wrote: The Dragonfly Mail Agent (dma) is a small Mail Transport Agent (MTA) which accepts mail from a local Mail User Agent (MUA) and delivers it locally or to a smarthost for delivery. dma does not accept inbound mail (i.e., it does not listen on port 25) and is not intended to provide the same functionality as a full MTA like postfix or sendmail. It is intended for use cases such as delivering cron(8) mail. Since 2014 we have a copy of dma in the base system available as an optional component, enabled via the WITH_DMAGENT src.conf knob. I am interested in determining whether dma is a viable minimal base system MTA, and if not what gaps remain. If you have enabled DMA on your systems (or are willing to give it a try) and have any feedback or are aware of issues please follow up or submit a PR as appropriate. I've used DMA on systems without local mail accounts to forward cron periodic e-mails just fine. It even supports STARTTLS and SMTP AUTH. I haven't tried using it for simple local delivery to /var/mail/root. -- John Baldwin
Re: git: 5e6a2d6eb220 - main - Reapply: move libc++ from /usr/lib to /lib [add /usr/lib/libc++.so.1 -> ../../lib/libc++.so.1 ?]
On 1/1/22 9:00 AM, Ed Maste wrote: On Fri, 31 Dec 2021 at 18:04, John Baldwin wrote: However, your point about libcxxrt.so.1 is valid. It needs to also be moved to /lib if libc++.so.1 is moved to /lib. libcxxrt.so.1 has always been in /lib. Oh, I was thrown off by the .so indirection for libcxxrt in the linker script. -- John Baldwin
Re: git: 5e6a2d6eb220 - main - Reapply: move libc++ from /usr/lib to /lib [add /usr/lib/libc++.so.1 -> ../../lib/libc++.so.1 ?]
On 12/31/21 2:59 PM, Mark Millard wrote: On 2021-Dec-31, at 14:28, Mark Millard wrote: On 2021-Dec-30, at 14:04, John Baldwin wrote: On 12/30/21 1:09 PM, Mark Millard wrote: On 2021-Dec-30, at 13:05, Mark Millard wrote: This asks a question in a different direction that my prior reports about my builds vs. Cy's reported build. Background: /usr/obj/BUILDs/main-amd64-nodbg-clang/usr/main-src/amd64.amd64/tmp/usr/lib/libc++.so:GROUP ( /lib/libc++.so.1 /usr/lib/libcxxrt.so and: lrwxr-xr-x 1 root wheel23 Dec 29 13:17:01 2021 /usr/lib/libcxxrt.so -> ../../lib/libcxxrt.so.1 Why did libc++.so.1 not get a: /usr/lib/libc++.so.1 -> ../../lib/libc++.so.1 I forgot to remove the .1 on the left hand side: /usr/lib/libc++.so -> ../../lib/libc++.so.1 Because for libc++.so we don't just symlink to the current version of the library (as we do for most other shared libraries) to tell the compiler what to link against for -lc++, instead we use a linker script that tells the compiler to link against both of those libraries when -lc++ is encountered. A better identification of what looks odd to me is the path variations in: # more /usr/lib/libc++.so Another not great day on my part: That path alone makes the mix of /lib/ and /usr/lib/ use involved, given the reference to /lib/libc++.so.1 . That would still be true if the other path had been /lib/libcxxrt.so . /usr/lib/libc++.so is only used by the compiler/linker when linking a binary. The resulting binary has the associated paths (/lib/libc++.so.1 and /usr/lib/libcxxrt.so.1) in its DT_NEEDED. So it is fine for the .so to be in /usr/lib. This is the same with /usr/lib/libc.so vs /lib/libc.so.7. However, your point about libcxxrt.so.1 is valid. It needs to also be moved to /lib if libc++.so.1 is moved to /lib. Doing so will also require yet another depend-clean.sh fixup (well, probably just adjusting the one I added to check the libcxxrt path instead of libc++ path). -- John Baldwin
Re: git: 5e6a2d6eb220 - main - Reapply: move libc++ from /usr/lib to /lib [add /usr/lib/libc++.so.1 -> ../../lib/libc++.so.1 ?]
On 12/30/21 1:09 PM, Mark Millard wrote: On 2021-Dec-30, at 13:05, Mark Millard wrote: This asks a question in a different direction that my prior reports about my builds vs. Cy's reported build. Background: /usr/obj/BUILDs/main-amd64-nodbg-clang/usr/main-src/amd64.amd64/tmp/usr/lib/libc++.so:GROUP ( /lib/libc++.so.1 /usr/lib/libcxxrt.so and: lrwxr-xr-x 1 root wheel23 Dec 29 13:17:01 2021 /usr/lib/libcxxrt.so -> ../../lib/libcxxrt.so.1 Why did libc++.so.1 not get a: /usr/lib/libc++.so.1 -> ../../lib/libc++.so.1 I forgot to remove the .1 on the left hand side: /usr/lib/libc++.so -> ../../lib/libc++.so.1 Because for libc++.so we don't just symlink to the current version of the library (as we do for most other shared libraries) to tell the compiler what to link against for -lc++, instead we use a linker script that tells the compiler to link against both of those libraries when -lc++ is encountered. I have finally reproduced Cy's build error locally and am testing my fix. If it works I'll commit it. -- John Baldwin
Re: smr inp breaks some jail use cases and panics with i915kms don't switch to the console anymore
On 12/14/21 9:40 AM, Gleb Smirnoff wrote: On Tue, Dec 14, 2021 at 09:28:07AM -0800, John Baldwin wrote: J> > AFAIK, today it will always panic only with WITNESS. Without WITNESS it would J> > pass through mtx_lock as long as the mutex is not locked. J> J> Yes, but the default kernel on head is GENERIC which has witness enabled, hence J> the out of the box kernel panics reliably. :) J> J> > So, do you suggest to push D33340 before finalizing D9? J> J> Yes, I think so. Pushed. And I plan to post new version of D33339 today. Thanks! -- John Baldwin
Re: smr inp breaks some jail use cases and panics with i915kms don't switch to the console anymore
On 12/13/21 12:25 PM, Gleb Smirnoff wrote: On Mon, Dec 13, 2021 at 11:56:35AM -0800, John Baldwin wrote: J> > J> So there are two things here. The root issue is that the devel/apr1 port J> > J> runs a configure test for TCP_NDELAY being inherited by accepted sockets. J> > J> This test panics because prison_check_ip4() tries to lock a prison mutex J> > J> to walk the IPs assigned to a jail, but the caller (in_pcblookup_hash()) has J> > J> done an smr_enter() which is a critical_enter(): J> > J> > The first one is known, and I got a patch to fix it: J> > J> > https://reviews.freebsd.org/D33340 J> > J> > However, a pre-requisite to this simple patch is more complex: J> > J> > https://reviews.freebsd.org/D9 J> > J> > There is some discussion on how to improve that, and I decided to do that J> > rather than stick to original version. So I takes a few extra days. J> > J> > We could push D33340 into main, if the negative effects (raciness of J> > the prison check) is considered lesser evil then potentially contested J> > mtx_lock in smr section. J> J> I think raciness is probably better than always panicking as it does today. AFAIK, today it will always panic only with WITNESS. Without WITNESS it would pass through mtx_lock as long as the mutex is not locked. Yes, but the default kernel on head is GENERIC which has witness enabled, hence the out of the box kernel panics reliably. :) So, do you suggest to push D33340 before finalizing D9? Yes, I think so. -- John Baldwin
Re: smr inp breaks some jail use cases and panics with i915kms don't switch to the console anymore
On 12/14/21 2:14 AM, Alexey Dokuchaev wrote: How do you mean? Most FreeBSD people, not some random Twitter crowd, want the bell to be on by default, but it's still off. I don't know that that's true, and I myself am not sure that I want it back on by default. Previously my laptop had a rather annoying beep whose volume I couldn't control that I actually prefer to have off normally. On further reflection, the beep I was looking for for bad input may actually be an xscreensaver thing for an invalid character to unlock the screen vs a sysbeep anyway. -- John Baldwin
Re: RFC: What to do about Allocate in the NFS server for FreeBSD13?
On 12/13/21 8:30 AM, Konstantin Belousov wrote: On Mon, Dec 13, 2021 at 04:26:42PM +, Rick Macklem wrote: Hi, There are two problems with Allocate in the NFSv4.2 server in FreeBSD13: 1 - It uses the thread credentials instead of the ones for the RPC. 2 - It does not ensure that file changes are committed to stable storage. These problems are fixed by commit f0c9847a6c47 in main, which added ioflag and cred arguments to VOP_ALLOCATE(). I can think of 3 ways to fix Allocate in FreeBSD13: 1 - Apply a *hackish* patch like this: + savcred = p->td_ucred; + td->td_ucred = cred; do { olen = len; error = VOP_ALLOCATE(vp, &off, &len); if (error == 0 && len > 0 && olen > len) maybe_yield(); } while (error == 0 && len > 0 && olen > len); + p->td_ucred = savcred; if (error == 0 && len > 0) error = NFSERR_IO; + if (error == 0) + error = VOP_FSYNC(vp, MNT_WAIT, p); The worst part of it is temporarily setting td_ucred to cred. 2 - MFC'ng commit f0c9847a6c47. Normally changes to the VOP/VFS are not MFC'd. However, in this case, it might be ok to do so, since it is unlikely there is an out of source tree file system with a custom VOP_ALLOCATE() method? I do not see much wrong with #2, this is what I would do myself. I also think this is fine. -- John Baldwin
Re: smr inp breaks some jail use cases and panics with i915kms don't switch to the console anymore
On 12/13/21 9:35 AM, Gleb Smirnoff wrote: Hi John, On Mon, Dec 13, 2021 at 07:45:07AM -0800, John Baldwin wrote: J> So there are two things here. The root issue is that the devel/apr1 port J> runs a configure test for TCP_NDELAY being inherited by accepted sockets. J> This test panics because prison_check_ip4() tries to lock a prison mutex J> to walk the IPs assigned to a jail, but the caller (in_pcblookup_hash()) has J> done an smr_enter() which is a critical_enter(): The first one is known, and I got a patch to fix it: https://reviews.freebsd.org/D33340 However, a pre-requisite to this simple patch is more complex: https://reviews.freebsd.org/D9 There is some discussion on how to improve that, and I decided to do that rather than stick to original version. So I takes a few extra days. We could push D33340 into main, if the negative effects (raciness of the prison check) is considered lesser evil then potentially contested mtx_lock in smr section. I think raciness is probably better than always panicking as it does today. J> However, it was a bit harder to see this originally as the 915kms driver J> tries to do a malloc(M_WAITOK) from cn_grab() when entering DDB which J> recursively panics (even a malloc(M_NOWAIT) from cn_grab() is probably a J> bad idea). When it panicked in X the result was that the screen just froze J> on whatever it had most recently drawn and the machine looked hung. (The J> fact that that sysbeep is off so I couldn't tell if typing in commands was J> doing anything vs emitting errors probably didn't improve trying to diagnose J> the hang as "sitting in ddb" initially, though I don't know if DDB itself J> emits a beep for invalid commands, etc.) Didn't know about this one. Is this isolated to actually entering DDB or there is some path that in a normal inpcb lookup we would M_WAITOK? This is in the drm(4) driver, nothing to do with in_pcb, just made it harder to see the in_pcb issue. -- John Baldwin
smr inp breaks some jail use cases and panics with i915kms don't switch to the console anymore
mprove trying to diagnose the hang as "sitting in ddb" initially, though I don't know if DDB itself emits a beep for invalid commands, etc.) -- John Baldwin
Re: Make etcupdate bootstrap requirement due to previous mergemaster usage more clear in handbook
On 12/3/21 6:09 PM, Tomoaki AOKI wrote: On Fri, 03 Dec 2021 05:54:37 -0800 (PST) "Jeffrey Bouquet" wrote: On Fri, 3 Dec 2021 13:58:39 +0100, Miroslav Lachman <000.f...@quip.cz> wrote: On 03/12/2021 12:52, Yetoo Happy wrote: [...] Quick Start* and follow the instructions and get to step 7 and may think that even though etcupdate is different from mergemaster from the last time they used the handbook they have faith that following the instructions won't brick their system. This user will instead find that faith in general is just a very complex facade for the pain and suffering of not following *24.5.6.1 Merging Configuration Files* because the user doesn't know that step exists or relevant to the current step and ends up unknowingly having etcupdate append "<<<< yours ... >>>>> new" to the top of the user's very important configuration files that they didn't expect the program to actually modify that way when they resolved differences nor could they predict easily because the diff format is so unintuitive and different from mergemaster. Now unable to login or boot into single user mode because redirections instead of the actual configuration is parsed the user goes to the handbook to find out what might have happened and scrolls down to find *24.5.6.1 Merging Configuration Files* is under *24.5.6. [...] That's why I think etcupdate is not so intuitive as tool like this should be and etcupdate is extremely dangerous because it intentionally breaks syntax of files vital to have system up and running. If anything goes wrong with mergemaster automatic process then your have configuration not updated which is almost always fine to boot the system and fix it. But after etcupdate? Much worse... I maintain about 30 machines for 2 decades and had problems with etcupdate many times. I had ti use mergemaster as fall back many times. Mainly because of etcupdate said "Reference tree to diff against unavailable" or "No previous tree to compare against, a sane comparison is not possible.". And sometimes because etcupdate cannot automatically update many files in /etc/rc.d and manual merging of a lot of files with "<<<< >>>>" is realy painful while with mergemaster only simple keyboard shortcuts will solve it. All of this must be very stressful for beginners. So beside the update of documentation I really would like to see some changes to etcupdate workflow where files are modified in temporary location and moved to destination only if they do not contain any syntax breaking changes like <<<<, , >>>>. Kind regards Miroslav Lachman Agree. I fell back to mergemaster this Nov on 13-stable when the /var files pertaining to etcupdate were all missing current /etc data, and no study of man etcupdate was clear enough to rectify such a scenario, and suspect my initial use of etcupdate will or may require a planned reinstall, not having had to do so since Jan 2004 iirc, [ vs failed hard disk migrations ] and I am just hoping mergemaster stays in /usr/src and updated for system changes, even if moved to 'tools' or something, since its use seems intuitive and much less of a black box. Also, /usr/src/UPDATING still at the bottom emphasizes mergemaster still. Not sure it's fixed or not (tooo dangerous to try...), -n (dry-run) option of etcupdate is now quite harmful. Maybe by any commit done in this april on main (MFC'ed to stable/13 in june). *I got busy manually checking and applying changes to /etc, and forgot to file PR. Doing `etcupdate -n` itself runs OK, but following `etcupdate -B` does NOT do anything, hence nothing is actually updated. The only workaround I have is NOT to try dry-run. Humm. It would be because the same trees are used for dry-run and actual run. (Not looked into the code. Just a thought.) So the new changes always build a temporary tree (vs trying to build /var/db/etupdate/current in place). For -n it should be that it just doesn't change /var/db/etcupdate/current at the end, but if it did the move anyway that would explain the bug you are seeing. That does indeed look broken. Please file a PR as a reminder for me to fix it. -- John Baldwin
Re: Make etcupdate bootstrap requirement due to previous mergemaster usage more clear in handbook
On 12/3/21 4:58 AM, Miroslav Lachman wrote: So beside the update of documentation I really would like to see some changes to etcupdate workflow where files are modified in temporary location and moved to destination only if they do not contain any syntax breaking changes like <<<<, , >>>>. This is what etcupdate does, so I'm a bit confused why you are getting merge markers in /etc. When an automated 3-way merge doesn't work due to conflicts, the file with the conflicts is saved in /var/db/etcupdate/conflicts/. It is only copied to /etc when you mark it as fully resolved when running 'etcupdate resolve'. Perhaps you had multiple conflicts in a modified file and when editing the file you only fixed the first one and then marked it as resolved at the prompt? Even in that case etcupdate explicitly prompts you a second time after you say "r" with "File still has conflicts, are you sure?", so it will only install a file to /etc with those changes if you have explicitly confirmed you want it. -- John Baldwin
Re: amd64 (example) main [so: 14]: delete-old check-old delete-old-libs missing a bunch of files?
d without updating ObsoleteFiles.inc. Only in /usr/obj/DESTDIRs/main-amd64-poud/usr/tests/usr.sbin/mixer: Kyuafile Only in /usr/obj/DESTDIRs/main-amd64-poud/usr/tests/usr.sbin/mixer: mixer_test Fallout from recent mixer changes? Hans might know more. Only in /usr/obj/DESTDIRs/main-amd64-poud/usr/tests/usr.sbin/sa: v1-sparc64-sav.in Only in /usr/obj/DESTDIRs/main-amd64-poud/usr/tests/usr.sbin/sa: v1-sparc64-sav.out Only in /usr/obj/DESTDIRs/main-amd64-poud/usr/tests/usr.sbin/sa: v1-sparc64-u.out Only in /usr/obj/DESTDIRs/main-amd64-poud/usr/tests/usr.sbin/sa: v1-sparc64-usr.in Only in /usr/obj/DESTDIRs/main-amd64-poud/usr/tests/usr.sbin/sa: v1-sparc64-usr.out Only in /usr/obj/DESTDIRs/main-amd64-poud/usr/tests/usr.sbin/sa: v2-sparc64-sav.in Only in /usr/obj/DESTDIRs/main-amd64-poud/usr/tests/usr.sbin/sa: v2-sparc64-u.out Only in /usr/obj/DESTDIRs/main-amd64-poud/usr/tests/usr.sbin/sa: v2-sparc64-usr.in I'll commit fixes for some of these. -- John Baldwin
Re: make cleandiry tries to access /lib/geom
On 11/24/21 3:30 AM, Bjoern A. Zeeb wrote: Hi, 673 ===> usr.bin/diff/tests (cleandir) 674 ===> lib/geom (cleandir) 675 ===> sbin/mount_udf (cleandir) 676 make[6] warning: /lib/geom: Permission denied. not sure what is going on here? 677 ===> share/i18n/esdb/ISO-8859 (cleandir) 678 ===> tests/sys/cddl/zfs/tests/cli_root/zfs_clone (cleandir) I think Jess has a possible fix. This is some regression added in the build system several months ago. -- John Baldwin
Re: "Khelp module "ertt" can't unload until its refcount drops from 1 to 0." after "All buffers synced."?
On 11/19/21 4:29 AM, tue...@freebsd.org wrote: On 19. Nov 2021, at 00:11, Mark Millard wrote: On 2021-Nov-18, at 12:31, tue...@freebsd.org wrote: On 17. Nov 2021, at 21:13, Mark Millard via freebsd-current wrote: I've not noticed the ertt message before in: . . . Waiting (max 60 seconds) for system thread `bufspacedaemon-1' to stop... done All buffers synced. Uptime: 1d9h57m18s Khelp module "ertt" can't unload until its refcount drops from 1 to 0. Hi Mark, what kernel configuration are you using? What kernel modules are loaded? The shutdown was of my ZFS boot media but the machine is currently doing builds on the UFS media. (The ZFS media is present but not mounted). For now I provide information from the booted UFS system. The UFS context is intended to be nearly a copy of the brctl selection for main [so: 14] from the ZFS media. Both systems have been doing the same poudriere builds for various comparison/contrast purposes. The current build activity will likely take 16+ hrs. Hi Mark, thanks a lot for the information. I was contemplating whether this message was related to a recent change in the TCP congestion module handling, but since it was already there in February, this is not the case. Will try to reproduce this, but wasn't able up to now. The congestion control changes just probably exacerbated the bug by adding a new reference on this module, just as they exposed the bug with khelp using the wrong SYSINIT subsystem. -- John Baldwin
Re: cross-compiling for i386 on amd64 fails
On 11/15/21 8:34 PM, Michael Butler via freebsd-current wrote: Haven't had time to identify which change caused this yet but I now get .. ===> lib/libsbuf (obj,all,install) ===> cddl/lib/libumem (obj,all,install) ===> cddl/lib/libnvpair (obj,all,install) ===> cddl/lib/libavl (obj,all,install) ld: error: /usr/obj/usr/src/i386.i386/tmp/usr/lib/libspl.a(assert.o) is incompatible with elf_i386_fbsd ===> cddl/lib/libspl (obj,all,install) cc: error: linker command failed with exit code 1 (use -v to see invocation) --- libavl.so.2 --- *** [libavl.so.2] Error code 1 make[4]: stopped in /usr/src/cddl/lib/libavl My guess is that this was fixed by git: 9e9c651caceb - main - cddl: fix missing ZFS library dependencies -- John Baldwin
Re: git: 2f7f8995367b - main - libdialog: Bump shared library version to 10. [ the .so.10 is listed in mk/OptionalObsoleteFiles.inc ?]
On 10/27/21 3:23 PM, Mark Millard via freebsd-current wrote: On 2021-Oct-27, at 15:21, Mark Millard wrote: Unfortunately(?) this update added the .so.10 to mk/OptionalObsoleteFiles.inc : diff --git a/tools/build/mk/OptionalObsoleteFiles.inc b/tools/build/mk/OptionalObsoleteFiles.inc index a8b0329104c4..91822aac492a 100644 --- a/tools/build/mk/OptionalObsoleteFiles.inc +++ b/tools/build/mk/OptionalObsoleteFiles.inc _at__at_ -1663,11 +1663,11 _at__at_ OLD_FILES+=usr/bin/dialog . . . OLD_FILES+=usr/lib/libdialog.so -OLD_FILES+=usr/lib/libdialog.so.8 +OLD_FILES+=usr/lib/libdialog.so.10 . . . Looks to my like that +line should have been: +OLD_FILES+=usr/lib/libdialog.so.9 (presuming the original .so.8 was correct during .so.9 's time frame). Looks like: +OLD_FILES+=usr/lib/libdpv.so.3 is the same sort of issue and possibly should have been: +OLD_FILES+=usr/lib/libdpv.so.2 No, these lines are for removing the current versions of the libraries if you do 'make delete-old WITHOUT_DIALOG=yes'. They weren't bumped previously when I bumped them for ncurses (probably my fault). -- John Baldwin
Re: main changed DIALOG_STATE, DIALOG_VARS, and DIALOG_COLORS but /usr/lib/libdialog.so.? naming was not adjusted? (crashes in releng/13 programs on main [so: 14] can result)
On 10/22/21 1:08 AM, Mark Millard via freebsd-current wrote: main [soi: 14] commit a96ef450 (2021-02-26 09:16:49 +) changed DIALOG_STATE, DIALOG_VARS, and DIALOG_COLORS . These are publicly exposed in (ones that I noticed): /usr/include/dialog.h:extern DIALOG_STATE dialog_state; /usr/include/dialog.h:extern DIALOG_VARS dialog_vars; /usr/include/dialog.h:extern DIALOG_COLORS dlg_color_table[]; Then we need to bump libdialog's so version to 10? (I don't think libdialog has symbol versioning) -- John Baldwin
Re: ELF binary type "0" not known. (while compiling buildworld on risc-v/qemu)
On 9/27/21 7:40 AM, Karel Gardas wrote: Hello, I'm playing with compiling freebsd 13 (releng/13.0 2 days ago) and current (git HEAD as of today) on qemu-5.1.0/qemu-6.1.0 on risv64 platform. The emulator invocation is: qemu-system-riscv64 -machine virt -smp 8 -m 16G -nographic -device virtio-blk-device,drive=hd -drive file=FreeBSD-14.0-CURRENT-riscv-riscv64.qcow2,if=none,id=hd -device virtio-net-device,netdev=net -netdev user,id=net,hostfwd=tcp::2233-:22 -bios /usr/lib/riscv64-linux-gnu/opensbi/generic/fw_jump.elf -kernel /usr/lib/u-boot/qemu-riscv64_smode/uboot.elf -object rng-random,filename=/dev/urandom,id=rng -device virtio-rng-device,rng=rng -nographic -append "root=LABEL=rootfs console=ttyS0" and the host is Ubuntu 20.04.x LTS. Both qemu 5.1.0 and qemu 6.1.0 are compiled from, source, but both OpenSBI and u-boot for risc-v are Ubuntu packages provided (to accompany ubuntu provided qemu 4.2.1) My issue while compiling both 13 and current is that compilation after some time fails with: root@freebsd:/usr/src # time make -j8 buildworld > /tmp/build-j8-2.txt ELF binary type "0" not known. 17784.134u 21388.907s 1:50:13.83 592.2% 30721+572k 10+2177io 0pf+0w I'm curious if this is a know issue either in Qemu or in FreeBSD for risc-v or if I'm doing anything wrong here? It is a known issue with how we brand FreeBSD/riscv binaries. Jess (cc'd) has a WIP review with a possible fix IIRC. -- John Baldwin
Re: [HEADSUP] making /bin/sh the default shell for root
On 9/22/21 1:36 AM, Baptiste Daroussin wrote: Hello, TL;DR: this is not a proposal to deorbit csh from base!!! For years now, csh is the default root shell for FreeBSD, csh can be confusing as a default shell for many as all other unix like settled on a bourne shell compatible interactive shell: zsh, bash, or variant of ksh. Recently our sh(1) has receive update to make it more user friendly in interactive mode: * command completion (thanks pstef@) * improvement in the emacs mode, to make it behave by default like other shells * improvement in the vi mode (in particular the vi edit to respect $EDITOR) * support for history as described by POSIX. This makes it a usable shell by default, which is why I would like to propose to make it the default shell for root starting FreeBSD 14.0-RELEASE (not MFCed) If no strong arguments has been raised until October 15th, I will make this proposal happen. Again just in case: THIS IS NOT A PROPOSAL TO REMOVE CSH FROM BASE! I think this is fine. I would also be fine with either removing 'toor' from the default password file or just leaving it as-is for POLA. (I would probably prefer removing it outright.) -- John Baldwin
Re: rescue/sh check failed, installation aborted
On 8/23/21 12:18 PM, Graham Perrin wrote: Encountered whilst attempting to build and install 14.0-CURRENT over 13.0-RELEASE-p3 (experimental, helloSystem): <https://i.imgur.com/euFBA8M.png> Background, condensed, to the best of my recollection: cd /usr/src make buildworld # succeeded make kernel # failed make clean LOCAL_MODULES= # added to /etc/src.conf make kernel-toolchain make kernel restarted in single user mode mount -uw / service zfs start cd /usr/src make installworld – failed as pictured. I see <https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=231325>, fixed in 2018. Any suggestions? I'm not sure what the 'make clean' would have done. Did you mean 'make cleanworld'? If so, you will need to do a 'make buildworld' again before trying to do 'make installworld'. The error message implies that there is no 'make buildworld' output in /usr/obj (as if you had run 'make cleanworld' up above where you list 'make clean') -- John Baldwin
Re: etcupdate: Failed to build new tree
On 7/2/21 2:30 AM, Nuno Teixeira wrote: Hello, Last update I have some issues with etcupdate: etcupdate warning: "No previous tree to compare against, a sane comparison is not possible." That I corrected with: etcupdate extract etcupdate diff > /tmp/etc.diff patch -R < /tmp/etc.diff (etcupdate diff doesn't show any diffs.) Today I've just updated current and etcupdate -p gives: "Failed to build new tree" What might be wrong? You can look in /var/db/etcupdate/log to check for errors. -- John Baldwin
Re: CURRENT: acpi_wakecode.S error: unknown -Werror warning specifier: '-Wno-error-tautological-compare'
On 6/22/21 11:13 AM, O. Hartmann wrote: Hello, on a recent CURRENT (FreeBSD 14.0-CURRENT #6 main-n247512-e3be51b2bc7c: Tue Jun 22 15:31:03 CEST 2021 amd64) we build a 13-STABLE based NanoBSD from a dedicated source tree for a small routing appliance. It should be " ...we built ..." because sinde the introduction of LLVM/CLANG 12 on FreeBSD, the build of the source tree fails with the error shown below. Since these errors a re die to some compiler knobs, the question is how to avoid them and make the tree of 13-STABLE build again? We do not do explicetely cross compiling, so if there is in general an issue with this "brute force method" I would appreciate any recommendation to avoid such malfunctions using other techniques - as long as they are moderate to implement. Thanks in advance and kind regards, You can use 'make buildworld WITHOUT_SYSTEM_COMPILER=yes' to force your builds to use the clang 11 included in stable/13 instead of the host clang 12. You could also MFC the fixes from head to use -Wno-error= instead of -Wno-error-. -- John Baldwin
Re: etcupdate warning: "No previous tree to compare against, a sane comparison is not possible."
On 6/22/21 12:34 AM, Nuno Teixeira wrote: Hello, Should I be worry about etcupdate warning "No previous tree to compare against, a sane comparison is not possible." when I recompile and update current? I receive same warning when I do a 'etcupdate -p' after installworld too. Yes, this means etcupdate is not merging any changes to /etc. You should run 'etcupdate extract' before your next upgrade cycle. You should then review the output of 'etcupdate diff' to see if there are files in /etc that need updating. If there are files that you want to update to stock versions you can use 'etcupdate revert /path/to/file'. Otherwise, you can use the patch generated by 'etcupdate diff' either as a guide to manually update files to remove unwanted differences, or as input to patch -R. -- John Baldwin
Re: drm-kmod kernel crash fatal trap 12
On 6/15/21 11:22 AM, Bakul Shah wrote: On Jun 15, 2021, at 9:03 AM, John Baldwin wrote: On 6/10/21 8:13 AM, Bakul Shah wrote: On Jun 10, 2021, at 7:13 AM, Thomas Laus wrote: The drm-kmod module is the latest from the pkg server. It all worked this past Monday after the recent drm-kmod update. This is what I did: git clone https://github.com/freebsd/drm-kmod ln -s $PWD/drm-kmod /usr/local/sys/modules Now it gets compiled every time you do make buildkernel. If things break you can do a git pull in the drm-kmod dir and rebuild. This is what I do now as well. I think this is probably the sanest approach to use on HEAD at least. IIRC I learned this from one of your posts. The PORTS_MODULES approach results in installing kernel modules /boot/modules, which doesn't track /boot/kernel*/. Yes, PORTS_MODULES is not so great when you are building test kernels from branches that are different points in time and then go back to booting your "stock" kernel as the module is now built against the wrong ABI and breaks your "stock" kernel. This is why I added LOCAL_MODULES and the SRC knob to drm-kmod, but the source knob is a bit bumpy in practice as you sometimes need newer source than your current package. (For example, if your "stock" kernel only changes every few months, but you pull newer work trees for test kernels.) For that case, it has proven simpler to just do the direct checkout that I can git pull when needed. -- John Baldwin
Re: drm-kmod kernel crash fatal trap 12
On 6/10/21 8:13 AM, Bakul Shah wrote: On Jun 10, 2021, at 7:13 AM, Thomas Laus wrote: The drm-kmod module is the latest from the pkg server. It all worked this past Monday after the recent drm-kmod update. This is what I did: git clone https://github.com/freebsd/drm-kmod ln -s $PWD/drm-kmod /usr/local/sys/modules Now it gets compiled every time you do make buildkernel. If things break you can do a git pull in the drm-kmod dir and rebuild. This is what I do now as well. I think this is probably the sanest approach to use on HEAD at least. -- John Baldwin
Re: Files in /etc containing empty VCSId header
On 6/7/21 12:58 PM, Ian Lepore wrote: On Mon, 2021-06-07 at 13:53 -0600, Warner Losh wrote: On Mon, Jun 7, 2021 at 12:26 PM John Baldwin wrote: On 5/20/21 9:37 AM, Michael Gmelin wrote: Hi, After a binary update using freebsd-update, all files in /etc contain "empty" VCS Id headers, e.g., $ head /etc/nsswitch.conf # # nsswitch.conf(5) - name service switch configuration file # $FreeBSD$ # group: compat group_compat: nis hosts: files dns netgroup: compat networks: files passwd: compat After migrating to git, I would've expected those to contain something else or disappear completely. Is this expected and are there any plans to remove them completely? I believe we might eventually remove them in the future, but doing so right now would introduce a lot of churn and the conversion to git had enough other churn going on. We'd planned on not removing things that might be merged to stable/12 since those releases (12.3 only I think) will be built out of svn. We'll likely start to remove things more widely as the stable/12 branch reaches EOL and after. Warner It would be really nice if, instead of just deleting the $FreeBSD$ markers, they could be replaced with the path/filename of the file in the source tree. Sometimes it's a real interesting exercise to figure out where a file on your runtime system comes from in the source world. All the source tree layout changes that happened for packaged-base makes it even more interesting. My hope is that we un-break src/etc. :( A few folks have looked at doing that (notably Kyle). -- John Baldwin
Re: Files in /etc containing empty VCSId header
On 5/20/21 9:37 AM, Michael Gmelin wrote: Hi, After a binary update using freebsd-update, all files in /etc contain "empty" VCS Id headers, e.g., $ head /etc/nsswitch.conf # # nsswitch.conf(5) - name service switch configuration file # $FreeBSD$ # group: compat group_compat: nis hosts: files dns netgroup: compat networks: files passwd: compat After migrating to git, I would've expected those to contain something else or disappear completely. Is this expected and are there any plans to remove them completely? I believe we might eventually remove them in the future, but doing so right now would introduce a lot of churn and the conversion to git had enough other churn going on. -- John Baldwin
Re: etcupdate -p: No previous tree to compare against, a sane comparison is not possible. (was: Review D28062 …)
On 4/24/21 4:42 AM, Graham Perrin wrote: On 21/04/2021 18:19, John Baldwin wrote: On 4/17/21 12:52 PM, Graham Perrin wrote: 2) <https://reviews.freebsd.org/D28062#change-5KzY5tEtVUor> line 2274 etcupdate -p I get: > No previous tree to compare against, a sane comparison is not possible. Hmm, how did you initially install this machine? Release images should generally include a pre-populated /var/db/etcupdate so that etcupdate works. If you don't have one of those, you will have to perform an initial bootstrap of etcupdate (only once) by running 'etcupdate extract'. If you do this before you update /usr/src then 'etcupdate' will later work fine. If you are doing this after you have already updated /usr/src, you will need to run 'etcupdate diff' after 'etcupdate extract' and fix any unexpected local differences in the generated patch, e.g. by copying files from /var/db/etcupdate/current/etc to /etc. Once you have done this, 'etcupdate' will work fine on the next upgrade. However, I'm curious how you didn't get the etcupdate bootstrap when you initially installed. Sorry for not replying sooner. It's not an answer to your question, but might the thread at <https://lists.freebsd.org/pipermail/freebsd-current/2021-April/079538.html> be relevant? Yes, you might indeed have hit this bug (which has since been fixed). You might have to 'etcupdate extract' and then manually review 'etcupdate diff' to see if you have any unexpected diffs to recover. Sorry. :-/ -- John Baldwin ___ freebsd-current@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
Re: Despite the documentation, "etcupdate extract" handles -D destdir (and its contribution to the default workdir)
On 4/24/21 12:22 PM, Mark Millard via freebsd-current wrote: # etcupdate -? Illegal option -? usage: etcupdate [-npBF] [-d workdir] [-r | -s source | -t tarball] [-A patterns] [-D destdir] [-I patterns] [-L logfile] [-M options] etcupdate build [-B] [-d workdir] [-s source] [-L logfile] [-M options] etcupdate diff [-d workdir] [-D destdir] [-I patterns] [-L logfile] etcupdate extract [-B] [-d workdir] [-s source | -t tarball] [-L logfile] [-M options] etcupdate resolve [-p] [-d workdir] [-D destdir] [-L logfile] etcupdate status [-d workdir] [-D destdir] The "etcupdate extract" material does not show -D destdir as valid. Thanks, it was a documentation oversight I've just fixed. It is definitely supposed to work and is quite useful for cross-builds (e.g. I use it frequently to update rootfs images I use with qemu for RISC-V or MIPS that I run under qemu, or when updating the SD-card for my RPI that I cross-build on an x86 host). -- John Baldwin ___ freebsd-current@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
Re: "etcupdate -p" vs. $OLDTREE?
On 4/23/21 6:59 AM, Olivier Cochard-Labbé wrote: On Fri, Apr 23, 2021 at 1:10 PM David Wolfskill wrote: After the set of updates to etcupdate (main-n246232-0611aec3cf3a .. main-n246235-ba30215ae0ef), I find that "etcupdate -B -p" is working as expected, but after the following "make installworld", a subsequent "etcupdate -B" chugs along for a bit, then stops, whining: No previous tree to compare against, a sane comparison is not possible. Same problem here while using /usr/src/tools/build/beinstall.sh: (...) Skipping blacklisted certificate /usr/share/certs/trusted/Verisign_Class_1_Public_Primary_Certification_Authority_-_G3.pem (/etc/ssl[0/1831]sted/ee1365c0.0) Skipping blacklisted certificate /usr/share/certs/trusted/Verisign_Class_2_Public_Primary_Certification_Authority_-_G3.pem (/etc/ssl/blacklisted/dc45b0bd.0) Scanning /usr/local/share/certs for certificates... + [ -n etcupdate ] + update_etcupdate + /usr/src/usr.sbin/etcupdate/etcupdate.sh -s /usr/src -D /tmp/beinstall.MZ4oy8/mnt -F No previous tree to compare against, a sane comparison is not possible. + return 1 + [ 1 -ne 0 ] + errx 'etcupdate (post-world) failed!' + cleanup (...) Sorry, this should be fixed. beinstall.sh still has a bug in that it needs this change: diff --git a/tools/build/beinstall.sh b/tools/build/beinstall.sh index 46c65d87e61a..fab21edc0fd5 100755 --- a/tools/build/beinstall.sh +++ b/tools/build/beinstall.sh @@ -133,7 +133,7 @@ update_mergemaster() { update_etcupdate_pre() { ${ETCUPDATE_CMD} -p -s ${srcdir} -D ${BE_MNTPT} ${ETCUPDATE_FLAGS} || return $? - ${ETCUPDATE_CMD} resolve -D ${BE_MNTPT} || return $? + ${ETCUPDATE_CMD} resolve -p -D ${BE_MNTPT} || return $? } update_etcupdate() { -- John Baldwin ___ freebsd-current@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
Re: Review D28062 of /usr/src/UPDATING with regard to upgrading FreeBSD and inconsistency with the FreeBSD Handbook
On 4/17/21 12:52 PM, Graham Perrin wrote: 2) <https://reviews.freebsd.org/D28062#change-5KzY5tEtVUor> line 2274 etcupdate -p I get: > No previous tree to compare against, a sane comparison is not possible. Hmm, how did you initially install this machine? Release images should generally include a pre-populated /var/db/etcupdate so that etcupdate works. If you don't have one of those, you will have to perform an initial bootstrap of etcupdate (only once) by running 'etcupdate extract'. If you do this before you update /usr/src then 'etcupdate' will later work fine. If you are doing this after you have already updated /usr/src, you will need to run 'etcupdate diff' after 'etcupdate extract' and fix any unexpected local differences in the generated patch, e.g. by copying files from /var/db/etcupdate/current/etc to /etc. Once you have done this, 'etcupdate' will work fine on the next upgrade. However, I'm curious how you didn't get the etcupdate bootstrap when you initially installed. -- John Baldwin ___ freebsd-current@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
Re: Review D28062 of /usr/src/UPDATING with regard to upgrading FreeBSD and inconsistency with the FreeBSD Handbook
On 4/18/21 1:48 AM, driesm.michi...@gmail.com wrote: If etcupdate -p fails before make installworld, then should the subsequent etcupdate be with or without option -B ? -p and -B are not related. -p deals with changes needed for a correct run of installworld (see above). -B uses a freshly built world to speed up the tree comparison; although no flags work fine here as well, so -B is not necessarily required. Technically -B speeds up generating the tree to compare against as it assumes /usr/obj is up to date and uses that instead of rebuilding some files generated by buildworld normally. The actual tree comparison isn't affected. -- John Baldwin ___ freebsd-current@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
Re: Getting started with ktls
On 3/18/21 8:31 AM, tech-lists wrote: On Wed, Mar 17, 2021 at 08:39:02PM +, Rick Macklem wrote: Make sure you've done the following: ktls_ocf - is loaded these sysctls are set to 1 kern.ipc.tls.enable kern.ipc.mb_use_ext_pgs [on stable/13] % sysctl kern.ipc.tls.enable kern.ipc.mb_use_ext_pgs kern.ipc.tls.enable: 1 kern.ipc.mb_use_ext_pgs: 1 % kldstat | grep ktls 71 0x0135300025520 ktls_ocf.ko % % sysctl -a | fgrep kern.ipc.tls.stats kern.ipc.tls.stats.ocf.retries: 0 kern.ipc.tls.stats.ocf.separate_output: 0 kern.ipc.tls.stats.ocf.inplace: 0 kern.ipc.tls.stats.ocf.tls13_gcm_crypts: 0 kern.ipc.tls.stats.ocf.tls12_gcm_crypts: 0 kern.ipc.tls.stats.ocf.tls11_cbc_crypts: 0 kern.ipc.tls.stats.ocf.tls10_cbc_crypts: 0 kern.ipc.tls.stats.switch_failed: 0 kern.ipc.tls.stats.switch_to_sw: 0 kern.ipc.tls.stats.switch_to_ifnet: 0 kern.ipc.tls.stats.failed_crypto: 0 kern.ipc.tls.stats.corrupted_records: 0 kern.ipc.tls.stats.active: 0 kern.ipc.tls.stats.enable_calls: 535 kern.ipc.tls.stats.offload_total: 0 kern.ipc.tls.stats.sw_rx_inqueue: 0 kern.ipc.tls.stats.sw_tx_inqueue: 0 kern.ipc.tls.stats.threads: 4 % FYI, you can do this a bit more efficiently with just 'sysctl kern.ipc.tls.stats' The 'enable_calls' means that OpenSSL is trying to offload connections, but those attempts are all failing (offload_total is a count of how many of those setsockopt() calls succeed). If you are familiar with dtrace, you can use some DTrace probes to see why 'ktls_enable_tx' and 'ktls_enable_rx' are failing, or barring that printf. For example, does ktls_create_session() fail, or does ktls_try_sw() fail? It is probably easiest to debug this using a userland application using openssl than trying NFS over TLS. -- John Baldwin ___ freebsd-current@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
Re: On 14-CURRENT: no ports options anymore?
On 3/13/21 12:58 PM, Guido Falsi via freebsd-current wrote: On 13/03/21 20:17, Hartmann, O. wrote: Since I moved on to 14-CURRENT, I face a very strange behaviour when trying to set options via "make config" or via poudriere accordingly. I always get "===> Options unchanged" (when options has been already set and I'd expect a dialog menu). This misbehaviour is throughout ALL 14-CURRENT systems (the oldest is at FreeBSD 14.0-CURRENT #49 main-n245422-cecfaf9bede9: Fri Mar 12 16:08:09 CET 2021 amd64). I do not see such a behaviour with 13-STABLE, 12-STABLE, 12.2-RELENG. How to fix this? What happened? I encountered something similar, some base shared library has changed, guess this is related with the ncurses changes in base. If I remember correctly force reinstalling dialog4ports package fixed it. Make sure you reinstall a freshly rebuilt one. Most probably anything using ncurses will require rebuild/reinstall. The cause is dialog4ports failing to start and the system sees no option changed. If that's not enough try # ldd -v /usr/local/bin/dialog4ports And see if it reports some useful information. There was an ABI breakage for ncurses that broke 12.x dialog4ports binaries. The shared library versions for everything that depended on ncurses were bumped for 13 and 14 after the branch of stable/13.0 (commit 6e1fe6d26ea2). After that commit, if you upgraded from 12 to 13 you should have been fine, but if you had updated before that, the 12.x dialog4ports was still going to fail as the 12.x version of those libraries were already broken. I haven't checked to see if the affected libraries have been added to misc/compat12x. -- John Baldwin ___ freebsd-current@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
Re: Getting started with ktls
On 3/10/21 4:18 PM, Alan Somers wrote: I'm trying to make ktls work with "zfs send/recv" to substantially reduce the CPU utilization of applications like zrepl. But I have a few questions: * ktls(4)'s "Transmit" section says "Once TLS transmit is enabled by a successful set of the TCP_TXTLS_ENABLE socket option", but the "Supported Libraries" section says "Applications using a supported library should generally work with ktls without any changes". These sentences seem to be contradictory. I think it means that the TCP_TXTLS_ENABLE option is necessary, but OpenSSL sets it automatically? Yes, you can do it by hand if you want but you'd have to do all the key exchange by hand as well. * When using OpenSSL, the library will automatically call setsockopt(_, TCP_TXTLS_ENABLE). But it swallows the error, if any. How is an application to tell if ktls is enabled on a particular socket or OpenSSL session? BIO_get_ktls_send() and BIO_get_ktls_recv() on the write and read BIO's of the connection, respectively. * From experiment, I can see that OpenSSL attempts to set TCP_TXTLS_ENABLE. But it doesn't try to set TCP_RXTLS_ENABLE. Why not? From reading ktls_start and ossl_statem_server_post_work, it looks like maybe a single socket cannot have ktls enabled for both sending and receiving at the same time. Is that true? Neither FreeBSD nor OpenSSL yet support RX offload on TLS 1.3. If you use TLS 1.2 you will get KTLS in both directions (or if you use TLS 1.1 with TOE offload on a Chelsio T6). -- John Baldwin ___ freebsd-current@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
Re: (n244517-f17fc5439f5) svn stuck forever in /usr/ports?
On 2/11/21 9:59 AM, Hartmann, O. wrote: On Wed, 10 Feb 2021 07:21:20 +0100 "Hartmann, O." wrote: On Tue, 9 Feb 2021 15:15:38 -0800 John Baldwin wrote: On 2/9/21 2:16 PM, Hartmann, O. wrote: On Wed, 3 Feb 2021 17:34:24 +0100 Guido Falsi via freebsd-current wrote: On 03/02/21 17:02, John Baldwin wrote: On 2/2/21 10:16 PM, Hartmann, O. wrote: On Mon, 1 Feb 2021 03:24:45 + Rick Macklem wrote: Rick Macklem wrote: Guido Falsi wrote: [good stuff snipped] Performed a full bisect. Tracked it down to commit aa906e2a4957, adding KTLS support to embedded OpenSSL. I filed a bug report about this: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=253135 Apart from switching to svn:// scheme, another workaround is to build base using WITHOUT_OPENSSL_KTLS. Just fyi, when I tested the daemons I have for nfs-over-tls (which use ktls), they acted like things were ok (no handshake problems), but the data ended up on the wire unencrypted (nfs-over-tls doesn't do a SSL_write(), so it depends on ktls to do the encryption). Since these daemons work fine with openssl3 in ports/security/openssl-devel, I suspect the ktls backport is not quite right. I've sent jhb@ email. I was wrong on the above. I did a full buildworld/installworld and the daemons now seem to work with the openssl in head/main. Btw, did anyone try rebuilding svn from sources after doing the system upgrade? (The openssl library calls and .h files definitely changed.) Yes, I did, on all boxes and its a pain in the a..., we had to rebuild EVERY port (at least, I did, to avoid further problem). Yesterday, on of our fastes boxes got ready and even with a full rebuild of the system AND a full rebuild of the ports (no poudriere, traditional way via make), the Apache 2.4 webservice doesn't work, and so does subversion not (Firefox reports problems with SSL handshake, subversion is stuck/frozen forever). I will run today another full world build today, hopefully finishing on friday (portmaster -dfR doesn't get everything in line on some ports, I assume). oh I tracked the subversion hang down to a bug in serf (an Apache library used by subversion). It would also affect any other software using serf. The serf in ports will also have to be patched. I submitted your patch as a bug report to the serf port: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=253214 What is the status of this bug? As PR 253214 might suggest, the patch to www/serf has been commited. We still face a problem with FreeBSD CURRENT-14 based systems running Apache24: FreeBSD 14.0-CURRENT #4 main-n244672-866c8b8d5dd: Mon Feb 8 08:38:59 CET 2021 amd64 /usr/ports is at Revision: 564736. www/apache24, www/serf have been rebuilt using "portmaster -f www/apache24 www/serf". Restarting Apache 2.4 still fails on any access with SSL enabled, firefox reports: SSL_ERROR_HANDSHAKE_UNEXPECTED_ALERT This is the first report I've had after the serf update. Here's an untested patch that is similar to the serf bug. You would apply this in the www/apache24 port. Index: files/patch-modules_ssl_ssl__engine__io.c === --- files/patch-modules_ssl_ssl__engine__io.c (nonexistent) +++ files/patch-modules_ssl_ssl__engine__io.c (working copy) @@ -0,0 +1,11 @@ +--- modules/ssl/ssl_engine_io.c.orig 2021-02-09 15:09:39.362123000 -0800 modules/ssl/ssl_engine_io.c2021-02-09 15:12:13.59669 -0800 +@@ -542,7 +542,7 @@ static int bio_filter_in_gets(BIO *bio, char *buf, int + + static long bio_filter_in_ctrl(BIO *bio, int cmd, long num, void *ptr) + { +-return -1; ++return 0; + } + + #if MODSSL_USE_OPENSSL_PRE_1_1_API Thank you very much for investigating and the patch. I haven't got the chance to apply the patch yet, I'll do within the next two hours. For the record: I filed a PR on this specific problem in Apache 2.4, please see here: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=253394 Kind regards, O. Hartmann I tried the patch, it doesn't work. Assuming that it is sufficient to recompile from scratch/clean tree the whole OS and then recompile every port required by www/apach24, applying then the patch, I tried to connect to pages served by the 14-CURRENT server running the pacthed Apache 2.4 (ports tree at the most recent state at that time), I still get the error described above. Kind regards, oh I finally reproduced this today and was able to at least get a valid response back from the server using openssl s_client as the client with a larger version of this patch. You can find the full patch at https://reviews.freebsd.org/D28932 -- John Baldwin ___ freebsd-current@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
Re: (n244517-f17fc5439f5) svn stuck forever in /usr/ports?
On 2/9/21 2:16 PM, Hartmann, O. wrote: On Wed, 3 Feb 2021 17:34:24 +0100 Guido Falsi via freebsd-current wrote: On 03/02/21 17:02, John Baldwin wrote: On 2/2/21 10:16 PM, Hartmann, O. wrote: On Mon, 1 Feb 2021 03:24:45 + Rick Macklem wrote: Rick Macklem wrote: Guido Falsi wrote: [good stuff snipped] Performed a full bisect. Tracked it down to commit aa906e2a4957, adding KTLS support to embedded OpenSSL. I filed a bug report about this: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=253135 Apart from switching to svn:// scheme, another workaround is to build base using WITHOUT_OPENSSL_KTLS. Just fyi, when I tested the daemons I have for nfs-over-tls (which use ktls), they acted like things were ok (no handshake problems), but the data ended up on the wire unencrypted (nfs-over-tls doesn't do a SSL_write(), so it depends on ktls to do the encryption). Since these daemons work fine with openssl3 in ports/security/openssl-devel, I suspect the ktls backport is not quite right. I've sent jhb@ email. I was wrong on the above. I did a full buildworld/installworld and the daemons now seem to work with the openssl in head/main. Btw, did anyone try rebuilding svn from sources after doing the system upgrade? (The openssl library calls and .h files definitely changed.) Yes, I did, on all boxes and its a pain in the a..., we had to rebuild EVERY port (at least, I did, to avoid further problem). Yesterday, on of our fastes boxes got ready and even with a full rebuild of the system AND a full rebuild of the ports (no poudriere, traditional way via make), the Apache 2.4 webservice doesn't work, and so does subversion not (Firefox reports problems with SSL handshake, subversion is stuck/frozen forever). I will run today another full world build today, hopefully finishing on friday (portmaster -dfR doesn't get everything in line on some ports, I assume). oh I tracked the subversion hang down to a bug in serf (an Apache library used by subversion). It would also affect any other software using serf. The serf in ports will also have to be patched. I submitted your patch as a bug report to the serf port: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=253214 What is the status of this bug? As PR 253214 might suggest, the patch to www/serf has been commited. We still face a problem with FreeBSD CURRENT-14 based systems running Apache24: FreeBSD 14.0-CURRENT #4 main-n244672-866c8b8d5dd: Mon Feb 8 08:38:59 CET 2021 amd64 /usr/ports is at Revision: 564736. www/apache24, www/serf have been rebuilt using "portmaster -f www/apache24 www/serf". Restarting Apache 2.4 still fails on any access with SSL enabled, firefox reports: SSL_ERROR_HANDSHAKE_UNEXPECTED_ALERT This is the first report I've had after the serf update. Here's an untested patch that is similar to the serf bug. You would apply this in the www/apache24 port. Index: files/patch-modules_ssl_ssl__engine__io.c === --- files/patch-modules_ssl_ssl__engine__io.c (nonexistent) +++ files/patch-modules_ssl_ssl__engine__io.c (working copy) @@ -0,0 +1,11 @@ +--- modules/ssl/ssl_engine_io.c.orig 2021-02-09 15:09:39.362123000 -0800 modules/ssl/ssl_engine_io.c2021-02-09 15:12:13.59669 -0800 +@@ -542,7 +542,7 @@ static int bio_filter_in_gets(BIO *bio, char *buf, int + + static long bio_filter_in_ctrl(BIO *bio, int cmd, long num, void *ptr) + { +-return -1; ++return 0; + } + + #if MODSSL_USE_OPENSSL_PRE_1_1_API -- John Baldwin ___ freebsd-current@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
Re: (n244517-f17fc5439f5) svn stuck forever in /usr/ports?
On 2/2/21 10:16 PM, Hartmann, O. wrote: On Mon, 1 Feb 2021 03:24:45 + Rick Macklem wrote: Rick Macklem wrote: Guido Falsi wrote: [good stuff snipped] Performed a full bisect. Tracked it down to commit aa906e2a4957, adding KTLS support to embedded OpenSSL. I filed a bug report about this: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=253135 Apart from switching to svn:// scheme, another workaround is to build base using WITHOUT_OPENSSL_KTLS. Just fyi, when I tested the daemons I have for nfs-over-tls (which use ktls), they acted like things were ok (no handshake problems), but the data ended up on the wire unencrypted (nfs-over-tls doesn't do a SSL_write(), so it depends on ktls to do the encryption). Since these daemons work fine with openssl3 in ports/security/openssl-devel, I suspect the ktls backport is not quite right. I've sent jhb@ email. I was wrong on the above. I did a full buildworld/installworld and the daemons now seem to work with the openssl in head/main. Btw, did anyone try rebuilding svn from sources after doing the system upgrade? (The openssl library calls and .h files definitely changed.) Yes, I did, on all boxes and its a pain in the a..., we had to rebuild EVERY port (at least, I did, to avoid further problem). Yesterday, on of our fastes boxes got ready and even with a full rebuild of the system AND a full rebuild of the ports (no poudriere, traditional way via make), the Apache 2.4 webservice doesn't work, and so does subversion not (Firefox reports problems with SSL handshake, subversion is stuck/frozen forever). I will run today another full world build today, hopefully finishing on friday (portmaster -dfR doesn't get everything in line on some ports, I assume). oh I tracked the subversion hang down to a bug in serf (an Apache library used by subversion). It would also affect any other software using serf. The serf in ports will also have to be patched. -- John Baldwin ___ freebsd-current@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
Re: Can In-Kernel TLS (kTLS) work with any OpenSSL Application?
On 1/20/21 12:21 PM, Neel Chauhan wrote: Hi freebsd-current@, I know that In-Kernel TLS was merged into the FreeBSD HEAD tree a while back. With 13.0-RELEASE around the corner, I'm thinking about upgrading my home server, well if I can accelerate any SSL application. I'm asking because I have a home server on a symmetrical Gigabit connection (Google Fiber/Webpass), and that server runs a Tor relay. If you're interested in how Tor works, the EFF has a writeup: https://www.eff.org/pages/what-tor-relay But the main point for you all is: more-or-less Tor relays deal with 1000s TLS connections going into and out of the server. Would In-Kernel TLS help with an application like Tor (or even load balancers/TLS termination), or is it more for things like web servers sending static files via sendfile() (e.g. CDN used by Netflix). It depends. Applications with allow OpenSSL to use a socket directly (e.g. via SSL_set_fd() or via SSL_connect() or the like) will work with kernel TLS transparently. This includes things like apache, nginx, fetch, wget, curl, etc. However, some applications use OpenSSL purely as a data transformation library and manage the socket I/O separately (e.g. OpenVPN). KTLS will not work with these applications since OpenSSL doesn't "know" about the socket in question. My server could also work with Intel's QuickAssist (since it has an Intel Xeon "Scalable" CPU). Would QuickAssist SSL be more helpful here? You can use this with ktls_ocf.ko and the qat(4) drivers. I am working, btw, on merging KTLS into base OpenSSL and hope to have it present in 13.0. As you noted, applications would need to be changed to use SSL_sendfile() to get the best performance on TX. We don't really have an analog on the receive side in our syscall API. One might be able to do some creative things with aio_read(4) perhaps, but I haven't implemented that. Also, currently RX offload always returns individual records with the full TLS header via recvmsg(). Linux's RX offload only includes the message for non-application-data messages so that one could in theory do bulk read(2) calls larger than a single TLS record. OpenSSL itself though always reads a single TLS record at a time, so if I were to change this (e.g. with a new socket option to toggle headers for application data), this would only be relevant to software that "knew" it was using KTLS and would use direct read/write after letting OpenSSL (or a similar library) handle the handshake. -- John Baldwin ___ freebsd-current@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
Re: service -e doesn't really sort does it? the cool tip is slightly off
On 1/16/21 3:28 PM, Dennis Clarke wrote: root@rhea:/usr/src/freebsd-src # diff -u usr.bin/fortune/datfiles/freebsd-tips.orig usr.bin/fortune/datfiles/freebsd-tips --- usr.bin/fortune/datfiles/freebsd-tips.orig 2021-01-15 00:37:37.863506000 + +++ usr.bin/fortune/datfiles/freebsd-tips 2021-01-16 07:46:57.335803000 + @@ -517,7 +517,7 @@ -- Lars Engels % -If you want to get a sorted list of all services that are started when FreeBSD boots, +If you want to get a list of all services that are started when FreeBSD boots, enter "service -e". -- Lars Engels root@rhea:/usr/src/freebsd-src # Sorry for being all OCD here. Perhaps it should say sorted in the order in which they were started. Something like that. I think the "sorted" detail is relevant, but describing the key (how it is sorted) is certainly a missing detail, maybe: If you want to get a list of all services started when FreeBSD boots in the order they are started, enter "service-e". ? -- John Baldwin ___ freebsd-current@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
Re: git non-time-sequential logs
On 1/4/21 8:52 AM, John Kennedy wrote: > On Mon, Jan 04, 2021 at 08:22:56AM -0800, John Kennedy wrote: >> The git logs in /usr/src aren't time-sequential, so maybe I shouldn't trust >> those dates above (I pulled it ~Jan 3rd and let it compile overnight), but >> I'm going to repull all the sources and recompile, just in case. I might >> have initiall pulled it during the git conversion and maybe it is confused. > > This might be perfectly natural and just new to me, but when I look at the > git logs this morning I see things like this (editing by me): > > commit e5df46055add3bcf074c9ba275ceb4481802ba04 (HEAD -> main, > freebsd/main, freebsd/HEAD) > Author: Emmanuel Vadot > Date: Mon Jan 4 17:30:00 2021 +0100 > > commit f61a3898bb989edef7ca308043224e495ed78f64 > Author: Emmanuel Vadot > Date: Mon Dec 14 18:56:56 2020 +0100 > > commit b6cc69322a77fa778b00db873781be04f26bd2ee > Author: Emmanuel Vadot > Date: Tue Dec 15 13:50:00 2020 +0100 > > commit 4401fa9bf1a3f2a7f2ca04fae9291218e1ca56bf > Author: Emmanuel Vadot > Date: Mon Jan 4 16:23:10 2021 +0100 > > This is a fresh clone+pull off of anon...@git.freebsd.org:src.git. > > I've always assumed that the "Date:" there was when the commit happened, > so they'd be increasing (most recent on top), but I suppose that you might > have developers in branches that are committing to their branch at one > point in time and it's getting merged into current (main) later, but the > original date is preserved? > > I guess I only care because I was trying to use time to bisect the > time I thought the problem might have been introduced. For commits to gdb (which uses git), the project asks that all series be rebased via 'git rebase --ignore-date' prior to pushing to master to give monotonically increasing commit dates. We could do something similar in FreeBSD either by asking folks to do that explicitly (though I know I sometimes forget when pushing to gdb myself), or we could avoid direct pushes to main. One option some folks mentioned on IRC was to have a separate "staging" branch that developers push to and then have a bot that does a rebase --ignore-date of that branch to main periodically, though that opens the question of how to deal with cherry-picks to stable (for which asking developers to do a rebase --ignore-date prior to pushing is probably the simpler approach). If we did want monotonically increasing dates without having a staging branch, we could perhaps use a server-side push hook to reject them and developers would just have to do a rebase --ignore-date before pushing again. -- John Baldwin ___ freebsd-current@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
Re: Enabling AESNI by default
On 12/31/20 11:51 AM, Allan Jude wrote: > We've had the AESNI module for quite a few years now, and it has not > caused any problems. > > I am wondering if there are any objections to including it in GENERIC, > so that users get the benefit without having to have the "tribal > knowledge" that 'to accelerate kernel crypto (GELI, ZFS, IPSEC, etc), > you need to load aesni.ko' > > Userspace crypto that uses openssl or similar libraries is already > taking advantage of these CPU instructions if they are available, by > excluding this feature from GENERIC we are just causing the "out of the > box" experience to by very very slow for crypto. > > For example, writing 1MB blocks to a GELI encrypted swap-backed md(4) > device: > > with 8 jobs on a 10 core Intel Xeon CPU E5-2630 v4 @ 2.20GHz > > fio --filename=/dev/md0.eli --device=1 --name=geli --rw=write --bs=1m > --numjobs=8 --iodepth=16 --end_fsync=1 --ioengine=pvsync > --group_reporting --fallocate=none --runtime=60 --time_based > > > stock: > write: IOPS=530, BW=530MiB/s (556MB/s) (31.1GiB/60012msec) > > with aesni.ko loaded: > write: IOPS=2824, BW=2825MiB/s (2962MB/s) (166GiB/60002msec) > > > Does anyone have a compelling reason to deny our users the 5x speedup? I think this is fine. I do hope to add AES support to ossl(4) at some point, and it may be that ossl(4) will supplant aesni(4) (and armv8crypto(4)) at that point, but aesni in GENERIC makes sense right now (and I'd say to MFC it to 12). armv8crypto in arm64 GENERIC will make sense once AES-XTS support (currently in a review) lands. -- John Baldwin ___ freebsd-current@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
Re: Intel TigerLake NVMe vmd: Adding Support & Debugging a Patch
On 12/31/20 2:40 PM, Chuck Tuffli wrote: > On Wed, Dec 30, 2020 at 4:38 PM Neel Chauhan wrote: >> >> Hi Chuck, >> >> On 2020-12-30 10:04, Chuck Tuffli wrote: >>> What is the output from >>> # pciconf -rb pci0:0:14:0 0x40:0x48 >> >> The output is: >> >> 01 00 00 00 01 2e 68 02 00 > > Perfect. The Linux driver says the 8086:9a0b device you have "... may > provide root port configuration information which limits bus > numbering" which causes the code to read the VM Capability register > (0x40) and the VM Configuration register (0x44). Here, VMCAP = 0x0001 > where bit 0 set appears to mean the config register has starting bus > number information. VMCFG = 0x2e01 where bits 5:4 give the coded start > number of bus 224 or 0xe0 which matches the PCI bridge shown in the > lspci output (i.e. 1:e0:06.0). > > I wonder if mirroring the logic in [1] and setting > bus->rman.rm_start = 224; > in vmd_attach() might help. > >> I was also able to stop kernel panics by adding: >> >> rman_fini(&sc->vmd_bus.rman); >> >> In the fail: statement in vmd_attach(). >> >> But I still cannot detect the SSD. > > [1] > https://github.com/torvalds/linux/blob/master/drivers/pci/controller/vmd.c#L507 You will also need to subtract that starting bus number from the bus number used to compute the offset into the PCI-express region for config register read/write as this code does: https://github.com/torvalds/linux/blob/master/drivers/pci/controller/vmd.c#L339 Also, that means the vm_bus.c can't hardcode reading from bus 0. Instead, vmd(4) might need to export an IVAR to vmd_bus(4) that is the starting bus number and vm_bus needs to use that instead of hardcoding 0. -- John Baldwin ___ freebsd-current@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
Re: Enabling AESNI by default
On 12/31/20 12:15 PM, Franco Fichtner wrote: > https://cgit.freebsd.org/src/commit/sys/crypto/aesni?h=stable/12&id=95b37a4ed741fd116809d0f2cb295c4e9977f5b6 > > may have subtly broken a number of IPsec installations by stalling active > connections after certain amounts of traffic transferred. We're still > trying to confirm, but it looks like this had an overall impact on 12.0 > and 12.1 except that only one person in OPNsense traced it back to aesni.ko > to our knowledge to effective work around an apparent issue there. > > If that is not the actual fix, the problem still exists in 12.2 and onward ;) We don't support AES-CCM for IPsec, so there is 0 chance that commit has any effect on IPsec in 12. There's not much detail in the forum posts though (e.g. netstat -s output to get ipsec, esp, and ah stats). Also, at least one forum post mentioned it happened when doing an upgrade from 11.2 to 12.1 which is a larger set of changes. I know the pfsense folks had a major performance regression due to iflib with Intel e1000 devices that might manifest as this perhaps? Disabling aseni might just be throttling the connection slow enough to avoid hitting a bug in a NIC driver for example. I think netstat -s would be a better place to start to try to debug this. > https://github.com/opnsense/core/issues/4415 -- John Baldwin ___ freebsd-current@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
Re: PCIe Root Port/Bus Not Detected in VMD
On 12/30/20 9:45 PM, Neel Chauhan wrote: > For reference, I am attaching the `pciconf -lv` and `acpidump -dt` > dumps. Hmm, the acpidump doesn't have the -d contents, only the -t, and PCI bridges are generally enumerated in the the -d part. These PCI bridges aren't enumerated in ACPI though, so that probably doesn't matter. The dinfo getting 0x means that somehow the way the PCI config access is being handled for the child devices in vmd.c is wrong for this bridge. You might have to spelunk in the Linux driver to see if the logic in vmd_read_config() and vmd_write_config() is correct. -- -- John Baldwin ___ freebsd-current@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
Re: git and the loss of revision numbers
On 12/29/20 7:11 AM, monochrome wrote: > ok, this appears to be what I was looking for > > example: > git reset --hard f20c0e331 > then: > git pull --ff-only > is again able to update as normal > > I should point out also that this is from the point of view of any > random person just building freebsd from source, not a developer, so > there are no local changes. Though it does blow away changes to the conf > file, that's a lesser issue to deal with. One other thing to consider is that if you are trying to track down a regression, you can use 'git bisect' to do this and it will do the binary search for you. In the case of searching for a regression, you will be better served by that tool than trying to use 'git reset --hard' directly. The other approach you can use to avoid having to look up hashes would be to create your own local tags each time you update. Then you can easily go back to that tag by name instead of having to look up the hash. -- John Baldwin ___ freebsd-current@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
Re: panic: Assertion pgrp->pg_jobc > 0 failed at kern_proc.c:816
On 12/28/20 12:24 PM, John Baldwin wrote: > I got this panic again today in a VM when quitting a gdb > session after killing a child process via 'kill'. > > panic: Assertion pgrp->pg_jobc > 0 failed at > /git/bhyve/sys/kern/kern_proc.c:816 > cpuid = 1 > time = 1609185862 > KDB: stack backtrace: > db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfe00946547f0 > vpanic() at vpanic+0x181/frame 0xfe0094654840 > panic() at panic+0x43/frame 0xfe00946548a0 > _refcount_update_saturated() at _refcount_update_saturated/frame > 0xfe00946548e0 > killjobc() at killjobc+0x6a6/frame 0xfe0094654940 > exit1() at exit1+0x6af/frame 0xfe00946549b0 > sys_sys_exit() at sys_sys_exit+0xd/frame 0xfe00946549c0 > amd64_syscall() at amd64_syscall+0x12e/frame 0xfe0094654af0 > fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfe0094654af0 > --- syscall (1, FreeBSD ELF64, sys_sys_exit), rip = 0x8024c8f0a, rsp = > 0x7fffe358, rbp = 0x7fffe370 --- > KDB: enter: panic > [ thread pid 44034 tid 102484 ] > Stopped at kdb_enter+0x37: movq$0,0x10ab066(%rip) > > From what I can tell, the child process that was killed via 'kill' has > not yet exited and is stuck in ptracestop() from fork(): > > (kgdb) where > #0 sched_switch (td=0xfe0094001a00, flags=) at > /git/bhyve/sys/kern/sched_ule.c:2147 > #1 0x80bf4015 in mi_switch (flags=266) at > /git/bhyve/sys/kern/kern_synch.c:542 > #2 0x80bfeba5 in thread_suspend_switch (td=, > p=) > at /git/bhyve/sys/kern/kern_thread.c:1477 > #3 0x80bef04b in ptracestop (td=0xfe0094001a00, sig=17, si=0x0) > at /git/bhyve/sys/kern/kern_sig.c:2642 > #4 0x80ba1a54 in fork_return (td=0xfe0094001a00, > frame=0xfe0094671b00) > at /git/bhyve/sys/kern/kern_fork.c:1106 > #5 0x80ba18b0 in fork_exit (callout=0x80ba1950 > , arg=0xfe0094001a00, > frame=0xfe0094671b00) at /git/bhyve/sys/kern/kern_fork.c:1069 > #6 > #7 0x0008007b71aa in ?? () > > kgdb can't find the panicking process due to the zombproc removal, so I will > have to go work on kgdb to recover from that change. :( I've come up with a shorter reproducer (original was trying to debug a perl script in OpenSSL's test suite). Compile this program: #include #include #include #include #include #include int main(void) { pid_t pid, wpid; pid = fork(); if (pid == -1) err(1, "fork"); if (pid == 0) { printf("I'm in the child\n"); exit(1); } printf("I'm in the parent\n"); wpid = waitpid(pid, NULL, 0); if (wpid < 0) err(1, "waitpid"); return (0); } Then in gdb do the following: # gdb101 ./forktest ... (gdb) catch fork Catchpoint 1 (fork) (gdb) r Starting program: /mnt/forktest/forktest Catchpoint 1 (forked process 830), _fork () at _fork.S:4 4 _fork.S: No such file or directory. (gdb) kill Kill the program being debugged? (y or n) y [Inferior 1 (process 828) killed] (gdb) quit -- John Baldwin ___ freebsd-current@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
panic: Assertion pgrp->pg_jobc > 0 failed at kern_proc.c:816
I got this panic again today in a VM when quitting a gdb session after killing a child process via 'kill'. panic: Assertion pgrp->pg_jobc > 0 failed at /git/bhyve/sys/kern/kern_proc.c:816 cpuid = 1 time = 1609185862 KDB: stack backtrace: db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfe00946547f0 vpanic() at vpanic+0x181/frame 0xfe0094654840 panic() at panic+0x43/frame 0xfe00946548a0 _refcount_update_saturated() at _refcount_update_saturated/frame 0xfe00946548e0 killjobc() at killjobc+0x6a6/frame 0xfe0094654940 exit1() at exit1+0x6af/frame 0xfe00946549b0 sys_sys_exit() at sys_sys_exit+0xd/frame 0xfe00946549c0 amd64_syscall() at amd64_syscall+0x12e/frame 0xfe0094654af0 fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfe0094654af0 --- syscall (1, FreeBSD ELF64, sys_sys_exit), rip = 0x8024c8f0a, rsp = 0x7fffe358, rbp = 0x7fffe370 --- KDB: enter: panic [ thread pid 44034 tid 102484 ] Stopped at kdb_enter+0x37: movq$0,0x10ab066(%rip) >From what I can tell, the child process that was killed via 'kill' has not yet exited and is stuck in ptracestop() from fork(): (kgdb) where #0 sched_switch (td=0xfe0094001a00, flags=) at /git/bhyve/sys/kern/sched_ule.c:2147 #1 0x80bf4015 in mi_switch (flags=266) at /git/bhyve/sys/kern/kern_synch.c:542 #2 0x80bfeba5 in thread_suspend_switch (td=, p=) at /git/bhyve/sys/kern/kern_thread.c:1477 #3 0x80bef04b in ptracestop (td=0xfe0094001a00, sig=17, si=0x0) at /git/bhyve/sys/kern/kern_sig.c:2642 #4 0x80ba1a54 in fork_return (td=0xfe0094001a00, frame=0xfe0094671b00) at /git/bhyve/sys/kern/kern_fork.c:1106 #5 0x80ba18b0 in fork_exit (callout=0x80ba1950 , arg=0xfe0094001a00, frame=0xfe0094671b00) at /git/bhyve/sys/kern/kern_fork.c:1069 #6 #7 0x0008007b71aa in ?? () kgdb can't find the panicking process due to the zombproc removal, so I will have to go work on kgdb to recover from that change. :( -- John Baldwin ___ freebsd-current@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
Re: OpenZFS and L2ARC
On 9/9/20 1:30 AM, Stefan Esser wrote: > Am 09.09.20 um 08:46 schrieb Stefan Esser: >> Am 09.09.20 um 00:45 schrieb Graham Perrin: >>> Recalling >>> <https://lists.freebsd.org/pipermail/freebsd-current/2020-March/075661.html>, >>> >>> on 28/03/2020 15:17,28/03/2020 15:17, Allan Jude wrote: >>> >>> >> … >>> >> >>> >> Basically 'arc' was converted to a subtree. >>> >> >>> >> We should add some backwards compat sysctls to cover some of >>> >> these renames etc so configs and scripts don't break etc. >> >> This is not possible for quite a number of sysctls, since there is >> no simple 1:1 mapping for many of them. >> >> >> And there is an annoyance that I had noticed before but now have >> tracked down: >> >> $ time sysctl kstat.zfs.misc.dbufs | wc >> 55327 2047031 16333472 >> >> real 0m16,446s >> user 0m0,055s >> sys 0m16,397s >> >> Somebody decided to put a complete list of dbufs under this sysctl >> and thus querying "kstat.zfs.misc" takes that long (16 seconds to >> generate 16 MB of output on my system), even if only a few other >> values in "kstat.zfs.misc" are needed. >> >> I do not know whether there is any chance to get that debug output >> moved out of the "misc", e.g. into a new "debug" sub-tree. I'm afraid, >> that on Linux there are scripts that expect it under this name. >> >> If it is not acceptable to the upstream, we should locally modify the >> sysctl tree to move that variable out of "misc", IMHO. (While not >> taking much time, "kstat.zfs.misc.dbgmsg" should also be relocted to >> a "debug" sub-tree, IMHO ...) >> >> zfs-stats needs tens of values from "misc", and if they are not all >> added individually to the Kstat array, this will limit the response >> time to any zfs-stats invocation. >> >> It is not too hard to add the new variables in zfs-stats and to >> adapt the calculations to derive meaningful values to display. >> >> But if it always takes 16 seconds to generate any output, I'm not >> likely to use it too often ... > > Update: I have created a fork of zfs-stats to work on: > > https://github.com/stesser/zfs-stats > > Initial change is to work around the long delay mentioned above and to > use the correct name for the vdev cache size variable and to display > the size, data contents and the corresponding compression factor of the > compressed L2ARC. > > I'll create pull requests to inform the upstream of these changes. A simple fix might be to use CTLFLAG_SKIP so that you only invoke the expensive sysctls if you request them by name, but not if you request the 'kstat.zfs' tree. -- John Baldwin ___ freebsd-current@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
Re: weird Ctrl-T debug messages
On 6/27/20 2:59 AM, Michael Gmelin wrote: > > > On Sat, 27 Jun 2020 12:06:17 +0300 > Andriy Gapon wrote: > >> On 27/06/2020 10:44, Li-Wen Hsu wrote: >>> On Sat, Jun 27, 2020 at 3:04 PM Hartmann, O. >>> wrote: >>>> >>>> Running poudriere on recent CURRENT with (recent) 12-STABLE and >>>> CURRENT jails reveals a weird behaviour recently when hitting >>>> Ctrl-T: >>> ... >>>> Is this debug fallout from /bin/sh? >>> >>> It's because kern.tty_info_kstacks is on by default now: >>> >>> https://svnweb.freebsd.org/changeset/base/362141 >> >> May I suggest that the stack trace is printed procstat -kk style >> (single line) ? I think that the more compact output would be more >> convenient. > > It's a cool feature and having it on by default on CURRENT certainly > helps to discover it, which is great. Thanks for implementing this! > > I wouldn't enable it by default on RELEASE versions though, as CTRL-T > is a user interface to get status information (at least this is how I > use it personally, e.g., while running commands like dd[0], cp, mv, > poudriere etc.), not for getting debug output. I agree with this. > Question: Speaking of discovering the feature, wouldn't it make sense > to document this tunable on the stack(9) and/or tty(4) man page(s)? This sounds like a great idea. Would you able to come up with a patch? I'd be happy to review it. -- John Baldwin ___ freebsd-current@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
Re: acpi timer reads all ones [Was: efirtc + atrtc at the same time]
On 5/27/20 2:05 PM, Hans Petter Selasky wrote: > On 2020-05-27 15:41, Justin Hibbits wrote: >> On Wed, 27 May 2020 06:27:16 -0700 >> John Baldwin wrote: >> >>> On 5/27/20 2:39 AM, Andriy Gapon wrote: >>>> On 27/05/2020 11:13, Andriy Gapon wrote: >>>>> I added more diagnostics and it seems to support the idea that the >>>>> problem is related to I/O cycles and bridges. >>>>> >>>>> ACPI timer suddenly starts returning 0x and that lasts for >>>>> tens of microseconds before the timer goes back to returning >>>>> normal values with an expected increase. >>>>> AMD provides a proprietary way to access ACPI registers via MMIO >>>>> (0xfed808xx). That mechanism is unaffected, ACPI timer register >>>>> always returns good values. >>>>> >>>>> The problem seems to happen when restoring configuration of a >>>>> particular PCI bridge. What's interesting is that the bridge >>>>> decodes one memory range and one I/O range. >>>>> >>>>> Looking at pci_cfg_restore() I wonder if it is wise to restore >>>>> PCIR_COMMAND so early. Could it be that after the resume the >>>>> bridge is configured with a wrong I/O range (e.g., too wide) and >>>>> by writing PCIR_COMMAND we enable that decoding. So, the bridge >>>>> steals I/O cycles destined for ACPI support hardware. If there is >>>>> nothing behind the bridge to handle those ports, then we get those >>>>> bad readings. Once the bridge configuration is fully restored, the >>>>> I/O handling goes back to normal. >>>> >>>> From what I see, this looks like a BIOS bug. >>>> Upon resume, it swaps window configurations of pcib1 and pcib2 >>>> (until FreeBSD restores them). pcib1 originally does not have an >>>> I/O window. So, BIOS programs both base and limit of pcib2 I/O >>>> window to zero. When FreeBSD writes its command register to >>>> enable I/O decoding it starts claiming 0x0 - 0xFFF I/O port range. >>>> That covers the ACPI ports at 0x8xx. >>>> >>>> Some printf-s. >>>> From (verbose) boot time: >>>> pcib1: domain0 >>>> pcib1: secondary bus 1 >>>> pcib1: subordinate bus 1 >>>> pcib1: memory decode 0xfea0-0xfeaf >>>> pcib2: domain0 >>>> pcib2: secondary bus 2 >>>> pcib2: subordinate bus 2 >>>> pcib2: I/O decode0xf000-0x >>>> pcib2: memory decode 0xfe90-0xfe9f >>>> >>>> My printf-s from resume time: >>>> pcib1: old I/O base (low): 0xf1 >>>> pcib1: old I/O base (high): 0x0 >>>> pcib1: old I/O limit (low): 0x1 >>>> pcib1: old I/O limit (high): 0x0 >>>> pcib2: old I/O base (low): 0x1 >>>> pcib2: old I/O base (high): 0x0 >>>> pcib2: old I/O limit (low): 0x1 >>>> pcib2: old I/O limit (high): 0x0 >>> >>> The "solution" I think is to have resume be multi-pass and to resume >>> all the bridges first before trying to resume leaf devices (including >>> timers), but that's a fair bit of work. It might be that we just >>> need to resume timer interrupts later after the new-bus resume (I >>> think we currently do it before?), though the reason for that was to >>> allow resume methods in devices to sleep (I'm not sure if any do). >>> >> >> That sounds like a good fit for https://reviews.freebsd.org/D203 . >> Someone (TM) just needs to take it over the finish line... 6 years >> later. > > Is this perhaps related to: > https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=237666 No. I get that constantly on a desktop that never suspends/resumes. It only started after upgrading to 12.0. -- John Baldwin ___ freebsd-current@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
Re: acpi timer reads all ones [Was: efirtc + atrtc at the same time]
On 5/27/20 2:39 AM, Andriy Gapon wrote: > On 27/05/2020 11:13, Andriy Gapon wrote: >> I added more diagnostics and it seems to support the idea that the problem is >> related to I/O cycles and bridges. >> >> ACPI timer suddenly starts returning 0x and that lasts for tens of >> microseconds before the timer goes back to returning normal values with an >> expected increase. >> AMD provides a proprietary way to access ACPI registers via MMIO >> (0xfed808xx). >> That mechanism is unaffected, ACPI timer register always returns good values. >> >> The problem seems to happen when restoring configuration of a particular PCI >> bridge. What's interesting is that the bridge decodes one memory range and >> one >> I/O range. >> >> Looking at pci_cfg_restore() I wonder if it is wise to restore PCIR_COMMAND >> so >> early. Could it be that after the resume the bridge is configured with a >> wrong >> I/O range (e.g., too wide) and by writing PCIR_COMMAND we enable that >> decoding. >> So, the bridge steals I/O cycles destined for ACPI support hardware. If >> there >> is nothing behind the bridge to handle those ports, then we get those bad >> readings. >> Once the bridge configuration is fully restored, the I/O handling goes back >> to >> normal. > > From what I see, this looks like a BIOS bug. > Upon resume, it swaps window configurations of pcib1 and pcib2 (until FreeBSD > restores them). pcib1 originally does not have an I/O window. So, BIOS > programs both base and limit of pcib2 I/O window to zero. When FreeBSD > writes > its command register to enable I/O decoding it starts claiming 0x0 - 0xFFF I/O > port range. That covers the ACPI ports at 0x8xx. > > Some printf-s. > From (verbose) boot time: > pcib1: domain0 > pcib1: secondary bus 1 > pcib1: subordinate bus 1 > pcib1: memory decode 0xfea0-0xfeaf > pcib2: domain0 > pcib2: secondary bus 2 > pcib2: subordinate bus 2 > pcib2: I/O decode0xf000-0x > pcib2: memory decode 0xfe90-0xfe9f > > My printf-s from resume time: > pcib1: old I/O base (low): 0xf1 > pcib1: old I/O base (high): 0x0 > pcib1: old I/O limit (low): 0x1 > pcib1: old I/O limit (high): 0x0 > pcib2: old I/O base (low): 0x1 > pcib2: old I/O base (high): 0x0 > pcib2: old I/O limit (low): 0x1 > pcib2: old I/O limit (high): 0x0 The "solution" I think is to have resume be multi-pass and to resume all the bridges first before trying to resume leaf devices (including timers), but that's a fair bit of work. It might be that we just need to resume timer interrupts later after the new-bus resume (I think we currently do it before?), though the reason for that was to allow resume methods in devices to sleep (I'm not sure if any do). -- John Baldwin ___ freebsd-current@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
Re: acpi timer reads all ones [Was: efirtc + atrtc at the same time]
On 5/26/20 11:55 AM, Konstantin Belousov wrote: > On Tue, May 26, 2020 at 06:22:13PM +0300, Andriy Gapon wrote: >> On 25/05/2020 11:37, Andriy Gapon wrote: >>> Also, there is another issue related to atrtc. >>> When I have both drivers attached, and also when I have only atrtc attached >>> (efi.rt.disabled=1), system clock jumps 10 minutes forward after each >>> suspend / >>> resume cycle (S0 -> S3 -> S0). That does not happen for reboot and shutdown >>> cycles. I haven't investigated this deeper, but it is a curious problem. >> >> Actually, I was wrong. The problem can also occur with efirtc alone. >> Also, sometimes there is a different problem where there are no callouts for >> a >> period of time on the order of minutes. I tracked it to cc_lastscan being >> set >> to a value greater than the current uptime. So, any scheduled callout gets >> scheduled at cc_lastscan and it is a while before the uptime catches up. >> >> It seemed that both issues were connected and were a result of the uptime >> jumping forward by some minutes and then jumping back to a sane value. >> If something important happened during the weird period, like getting time of >> day from hardware or invoking a callout, it lead to the observed effects. >> >> So, that gave me some ideas where to add debugging checks. >> What I determined is that ACPI timer (ACPI-fast) could produce a reading of >> all >> 1-s like happens when there is no hardware response. >> >> I caught one such instance and got a stack trace for it (but no crash dump >> because devices had not resumed yet): >> tc_windup() at tc_windup+0x318/frame 0xfe00a7a19300 >> tc_ticktock() at tc_ticktock+0x4b/frame 0xfe00a7a19320 >> hardclock() at hardclock+0x107/frame 0xfe00a7a19360 >> handleevents() at handleevents+0xb3/frame 0xfe00a7a193a0 >> timercb() at timercb+0x196/frame 0xfe00a7a193f0 >> lapic_handle_timer() at lapic_handle_timer+0x98/frame 0xfe00a7a19420 >> Xtimerint() at Xtimerint+0xb1/frame 0xfe00a7a19420 >> --- interrupt, rip = 0x80b34500, rsp = 0xfe00a7a194f8, rbp = >> 0xfe00a7a19540 --- >> acpi_pcib_write_config() at acpi_pcib_write_config/frame 0xfe00a7a19540 >> pci_cfg_restore() at pci_cfg_restore+0x2cc/frame 0xfe00a7a195a0 >> pci_resume_child() at pci_resume_child+0xee/frame 0xfe00a7a195e0 >> pci_resume() at pci_resume+0x49/frame 0xfe00a7a19630 >> bus_generic_resume_child() at bus_generic_resume_child+0x43/frame >> 0xfe00a7a19650 >> bus_generic_resume() at bus_generic_resume+0x29/frame 0xfe00a7a19680 >> bus_generic_resume_child() at bus_generic_resume_child+0x43/frame >> 0xfe00a7a196a0 >> bus_generic_resume() at bus_generic_resume+0x29/frame 0xfe00a7a196d0 >> bus_generic_resume_child() at bus_generic_resume_child+0x43/frame >> 0xfe00a7a196f0 >> bus_generic_resume() at bus_generic_resume+0x29/frame 0xfe00a7a19720 >> bus_generic_resume_child() at bus_generic_resume_child+0x43/frame >> 0xfe00a7a19740 >> root_resume() at root_resume+0x29/frame 0xfe00a7a19770 >> acpi_EnterSleepState() at acpi_EnterSleepState+0x73b/frame 0xfe00a7a197f0 >> acpi_AckSleepState() at acpi_AckSleepState+0x144/frame 0xfe00a7a19820 >> devfs_ioctl() at devfs_ioctl+0xcb/frame 0xfe00a7a19870 >> vn_ioctl() at vn_ioctl+0x132/frame 0xfe00a7a19980 >> devfs_ioctl_f() at devfs_ioctl_f+0x1e/frame 0xfe00a7a199a0 >> kern_ioctl() at kern_ioctl+0x27b/frame 0xfe00a7a19a00 >> sys_ioctl() at sys_ioctl+0x123/frame 0xfe00a7a19ad0 >> amd64_syscall() at amd64_syscall+0x140/frame 0xfe00a7a19bf0 >> fast_syscall_common() at fast_syscall_common+0x101/frame 0xfe00a7a19bf0 >> >> I am not sure if this is just a coincidence but it appears as if a write to >> some >> PCI configuration register could temporarily interfere with access to the PM >> timer I/O port. >> Is that plausible? > If something disabled a BAR, then typical response of x86 chipset for timed > out read from PCIe is 0xf... . And the ACPI timer might be "behind" the isab0 bridge device which would indeed cause this. -- John Baldwin ___ freebsd-current@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
Re: RFC: merging nfs-over-tls changes into head/sys
On 5/21/20 2:01 PM, Rick Macklem wrote: > Hi, > > I have now completed changes to the code in projects/nfs-over-tls, which > implements TLS encryption of NFS RPC messages. (This roughly conforms > to the internet draft "Towards Remote Procedure Call Encryption By Default", > which should soon become an RFC. For now, TLS1.2 is used instead of TLS1.3, > since FreeBSD's KERN_TLS does not yet implement TLS1.3.) > > I'd like to start merging some of the kernel changes into head/sys. > > The first of these would be creation of the syscall used by the daemons. > (The code in projects/nfs-over-tls cheats and uses the syscall for the gssd, > but it needs to have its own syscall so that the gssd daemon can run > concurrently > with it. I didn't want testers to need to build userland just to get a > syscall stub > in libc.) > > After this, there are a bunch of changes to the NFS code to add support for > ext_pgs mbufs (these are significant patches, but should not affect the > non-ext_pgs mbuf case, since they'll be conditional on ND_EXTPGS/M_EXTPGS). > > Does this sound ok to do? > > Please let me know if you see problems with me doing this? I don't see any problems, per se, but I still need to do some changes on my end for software KTLS RX before it's ready to merge (I'm hoping to kill the iovecs in the kthreads entirely). -- John Baldwin ___ freebsd-current@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
Re: ${COMPILER_VERSION} < 40300
On 5/7/20 10:17 AM, Warner Losh wrote: > On Thu, May 7, 2020 at 10:38 AM Eric van Gyzen wrote: > >> If I were to clean up obsolete ${COMPILER_VERSION} tests in the tree, >> which ones should I keep? I would probably confine it to head, so I >> could prune quite a few. >> > > Anything in the bootstrap path should remain, especially in the install > portion of the bootstrap path since we don't require new compilers for > that. I doubt there's more than one or two of these and there may be zero. > The rest can go away. > > We should also look at taking out the fmake workarounds in the tree too. > Most of these are in src/Makefile and src/Makefile.inc. I think Eric though was asking about and the like. Right now we still have conditional support for some really old compilers that are likely to never be used with FreeBSD 13 (ancient Intel icc, gcc 2.95, etc.). Like, do we keep support for pre-ANSI C to delete 'const' etc. via macros? Admittedly there isn't a tremendous amount of cruft in cdefs.h. What would seem more invasive would be to do things like require C99 and use 'restrict' directly instead of __restrict, but the first step towards any of that is probably to remove some of the cruft from cdefs.h and possibly some other places. (BTW, it would be good to know if it's at all useful to keep any of the icc bits around.) -- John Baldwin ___ freebsd-current@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
Re: toolchain status
On 4/18/20 5:46 PM, Eric van Gyzen wrote: > Which architectures are still often built with an external toolchain? > I'd like to re-commit jemalloc 5.2.1. It was reverted because > "compilation fails for non-llvm-based platforms." I just built > tinderbox worlds with 5.2.1 with no problems, albeit with llvm. All platforms now use LLVM. You can still use GCC to build platforms, but if amd64 builds fine with GCC 9 with the patch applied, I think you should be fine to move forward (that is, if only GCC 6 fails to build, we can just deprecate using GCC 6 as an external toolchain for 13) -- John Baldwin ___ freebsd-current@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
Re: buildkernel failure because ctfconvert not installed
On 4/7/20 11:32 PM, Gary Jennejohn wrote: > Has anyone else seen this error? > > I tried to build a kernel yesterday, but the build failed while compiling > modules because ctfconvert was not found. > > I've had WITH_CTF=no in my src.conf for years, so neither ctfconvert nor > ctfmerge were installed. > > OK, I'll just go to the source dirctories and build and install. > > Nope. I got this error: > make: exec(ctfconvert) failed (No such file or directory) > and the build failed. > > WTF? ctfconvert requires ctfconvert to build? That makes no sense and is > a real chicken-and-egg problem if I've ever seen one. > > I ended up creating /usr/bin/ctf{convert,merge} shell scripts which simply > did exit 0. That allowed me to finally compile and install the utilities. > > Now I'm forced to have WITH_CTF=yes in my src.conf. No big deal. > > Still, it seems like the change to the make infrastructure which assumed > that cft{convert,merge} are always installed was rather premature. The change is that GENERIC has 'makeoptions WITH_CTF=yes'. If you build a kernel without that, you shouldn't need to have ctfconvert installed. This does mean you need to use a custom kernel instead of GENERIC. -- John Baldwin ___ freebsd-current@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
Re: Emacs tramp mode doesn't work with CURRENT
On 1/28/20 8:57 AM, John F Carr wrote: > I use emacs tramp mode, which opens an ssh connection to a remote machine for > file access. It works to Linux and FreeBSD 12.1, but not to CURRENT. There > has been a change in the way characters are echoed by the shell, with 12.1 > treating a consecutive run of backspace as an atomic unit and CURRENT > processing them one at a time. This is not necessarily a bug, but it is a > nuisance and independently it is suboptimal. I have the same breakage with an amd64 laptop running HEAD (and using tramp-mode from emacs on a 12.x host) Have you been able to bisect it at all? I think libedit is probably a good candidate as well. What I see is that tramp-mode just hangs until I kill the ssh session it is using, and then I see the same output you had below in the debug window showing the extraneous newlines. -- John Baldwin ___ freebsd-current@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
Re: how to use the ktls
On 1/26/20 8:08 PM, Rick Macklem wrote: John Baldwin wrote: [stuff snipped] Hmmm, this might be a fair bit of work indeed. Right now KTLS only works for transmit (though I have some WIP for receive). KTLS does assumes that the initial handshake and key negotiation is handled by OpenSSL. OpenSSL uses custom setockopt() calls to tell the kernel which session keys to use. I think what you would want to do is use something like OpenSSL_connect() in userspace, and then check to see if KTLS "worked". If it did, you can tell the kernel it can write to the socket directly, otherwise you will have to bounce data back out to userspace to run it through SSL_write() and have userspace do SSL_read() and then feed data into the kernel. The pseudo-code might look something like: SSL *s; s = SSL_new(...); /* fd is the existing TCP socket */ SSL_set_fd(s, fd); OpenSSL_connect(s); if (BIO_get_ktls_send(SSL_get_wbio(s)) { /* Can use KTLS for transmit. */ } if (BIO_get_ktls_recv(SSL_get_rbio(s)) { /* Can use KTLS for receive. */ } So, I've been making some progress. The first stab at the daemons that do the handshake are now on svn in base/projects/nfs-over-tls/usr.sbin/rpctlscd and rpctlssd. A couple of questions... 1 - I haven't found BIO_get_ktls_send() or BIO_get_ktls_recv(). Are they in some different library? They only existing currently in OpenSSL master (which will be OpenSSL 3.0.0 when it is released). I have some not-yet-tested WIP changes to backport those changes into the base OpenSSL, but it will also add overhead to future OpenSSL imports perhaps, so it is something I need to work with secteam@ on to decide if it's viable once I have a tested PoC. I will try to at least provide a patch to the security/openssl port to add a KTLS option "soon" that you could use for testing. 2 - After a successful SSL_connect(), the receive queue for the socket has 478bytes of stuff in it. SSL_read() seems to know how to skip over it, but I haven't figured out a good way to do this. (I currently just do a recv(..478,0) on the socket.) Any idea what to do with this? (Or will the receive side of the ktls figure out how to skip over it?) I don't know yet. :-/ With the TOE-based TLS I had been testing with, this doesn't happen because the NIC blocks the data until it gets the key and then it's always available via KTLS. With software-based KTLS for RX (which I'm going to start working on soon), this won't be the case and you will potentially have some data already ready by OpenSSL that needs to be drained from OpenSSL before you can depend on KTLS. It's probably only the first few messsages, but I will need to figure out a way that you can tell how much pending data in userland you need to read via SSL_read() and then pass back into the kernel before relying on KTLS (it would just be a single chunk of data after SSL_connect you would have to do this for). I'm currently testing with a kernel that doesn't have options KERN_TLS and (so long as I get rid of the 478 bytes), it then just does unencrypted RPCs. So, I guess the big question is can I get access to your WIP code for KTLS receive? (I have no idea if I can make progress on it, but I can't do a lot more before I have that.) The WIP only works right now if you have a Chelsio T6 NIC as it uses the T6's TCP offload engine to do TLS. If you don't have that gear, ping me off-list. It would also let you not worry about the SSL_read case for now for initial testing. -- John Baldwin ___ freebsd-current@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
Re: how to use the ktls
On 1/12/20 8:23 PM, Benjamin Kaduk wrote: > On Thu, Jan 09, 2020 at 10:53:38PM +, Rick Macklem wrote: >> John Baldwin wrote: >>> On 1/7/20 3:02 PM, Rick Macklem wrote: >>>> Hi, >>>> >>>> Now that I've completed NFSv4.2 I'm on to the next project, which is >>>> making NFS >>>> work over TLS. >>>> Of course, I know absolutely nothing about TLS, which will make this an >>>> interesting >>>> exercise for me. >>>> I did find simple server code in the OpenSSL doc. which at least gives me >>>> a starting >>>> point for the initialization stuff. >>>> As I understand it, this initialization must be done in userspace? >>>> >>>> Then somehow, the ktls takes over and does the encryption of the >>>> data being sent on the socket via sosend_generic(). Does that sound right? >>>> >>>> So, how does the kernel know the stuff that the initialization phase >>>> (handshake) >>>> figures out, or is it magic I don't have to worry about? >>>> >>>> Don't waste much time replying to this. A few quick hints will keep me >>>> going for >>>> now. (From what I've seen sofar, this TLS stuff isn't simple. And I >>>> thought Kerberos >>>> was a pain.;-) >>>> >>>> Thanks in advance for any hints, rick >>> >>> Hmmm, this might be a fair bit of work indeed. >> If it was easy, it wouldn't be fun;-) FreeBSD13 is a ways off and if it >> doesn't make that, oh well.. >> >>> Right now KTLS only works for transmit (though I have some WIP for receive). >> Hopefully your WIP will make progress someday, or I might be able to work on >> it. >> >>> KTLS does assumes that the initial handshake and key negotiation is handled >>> by >>> OpenSSL. OpenSSL uses custom setockopt() calls to tell the kernel which >>> session keys to use. >> Yea, I figured I'd need a daemon like the gssd for this. The krpc makes it a >> little >> more fun, since it handles TCP connections in the kernel. >> >>> I think what you would want to do is use something like OpenSSL_connect() in >>> userspace, and then check to see if KTLS "worked". >> Thanks (and for the code below). I found the simple server code in the >> OpenSSL doc, >> but the client code gets a web page and is quite involved. >> >>> If it did, you can tell >>> the kernel it can write to the socket directly, otherwise you will have to >>> bounce data back out to userspace to run it through SSL_write() and have >>> userspace do SSL_read() and then feed data into the kernel. >> I don't think bouncing the data up/down to/from userland would work well. >> I'd say "if it can't be done in the kernel, too bad". The above could be >> used for >> a NULL RPC to see it is working, for the client. > > So you're saying that we'd only support rpc-over-tls as an NFS client and > not as a server, at least until the WIP for ktls read appears? To be clear, I have KTLS RX working with TOE right now. I have a design in my head for KTLS RX that would use software and co-processor engines via OCF such as aesni(4) and ccr(4) that I hope to implement in the next few months, so KTLS RX isn't too far off. OpenSSL already supports KTLS RX on Linux and the FreeBSD patches I already have use the same API. (Each received TLS frame is read via recvmsg() with the TLS header fields in a cmsg.) -- John Baldwin ___ freebsd-current@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
Re: how to use the ktls
On 1/7/20 3:02 PM, Rick Macklem wrote: > Hi, > > Now that I've completed NFSv4.2 I'm on to the next project, which is making > NFS > work over TLS. > Of course, I know absolutely nothing about TLS, which will make this an > interesting > exercise for me. > I did find simple server code in the OpenSSL doc. which at least gives me a > starting > point for the initialization stuff. > As I understand it, this initialization must be done in userspace? > > Then somehow, the ktls takes over and does the encryption of the > data being sent on the socket via sosend_generic(). Does that sound right? > > So, how does the kernel know the stuff that the initialization phase > (handshake) > figures out, or is it magic I don't have to worry about? > > Don't waste much time replying to this. A few quick hints will keep me going > for > now. (From what I've seen sofar, this TLS stuff isn't simple. And I thought > Kerberos > was a pain.;-) > > Thanks in advance for any hints, rick Hmmm, this might be a fair bit of work indeed. Right now KTLS only works for transmit (though I have some WIP for receive). KTLS does assumes that the initial handshake and key negotiation is handled by OpenSSL. OpenSSL uses custom setockopt() calls to tell the kernel which session keys to use. I think what you would want to do is use something like OpenSSL_connect() in userspace, and then check to see if KTLS "worked". If it did, you can tell the kernel it can write to the socket directly, otherwise you will have to bounce data back out to userspace to run it through SSL_write() and have userspace do SSL_read() and then feed data into the kernel. The pseudo-code might look something like: SSL *s; s = SSL_new(...); /* fd is the existing TCP socket */ SSL_set_fd(s, fd); OpenSSL_connect(s); if (BIO_get_ktls_send(SSL_get_wbio(s)) { /* Can use KTLS for transmit. */ } if (BIO_get_ktls_recv(SSL_get_rbio(s)) { /* Can use KTLS for receive. */ } -- John Baldwin ___ freebsd-current@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
Re: New external GCC toolchain ports/packages
On 12/19/19 12:06 PM, Ryan Libby wrote: > On Wed, Dec 18, 2019 at 1:49 PM John Baldwin wrote: >> >> In the interest of supporting newer versions of GCC for a base system >> toolchain, I've renamed the external GCC packages from -gcc >> to -gcc6. These are built as flavors of a new devel/freebsd-gcc6 >> port. The xtoolchain package is not used for these new packages, instead >> one does 'pkg install mips-gcc6' to get the GCC 6.x MIPS compiler and >> uses 'CROSS_TOOLCHAIN=mips-gcc6'. I've also gone ahead and updated this >> compiler to 6.5.0. >> >> I will leave the old ports/packages around for now to permit an easy >> transition, but going forward, the -gcc6 packages should be preferred >> to -xtoolchain-gcc for all but riscv (riscv64-gcc and >> riscv64-xtoolchain-gcc >> are separate from the powerpc64-gcc set of packages). >> >> In addition, I've also just added a devel/freebsd-gcc9 package which >> builds -gcc9 packages. It adds powerpc and riscv flavors relative >> to freebsd-gcc6 and uses GCC 9.2.0. To date in my testing I've yet to >> be able to finish a buildworld on any of the platforms I've tried >> (amd64, mips, sparc64), but the packages should permit other developers >> to get the tree building with GCC 9. To use these packages one would do >> something like: >> >> # pkg install amd64-gcc9 >> # make buildworld CROSS_TOOLCHAIN=amd64-gcc9 >> >> You can install both the gcc6 and gcc9 versions of a package at the same >> time, e.g. amd64-gcc6 and amd64-gcc9. Having different packages for major >> versions is similar to llvm and will also let us keep a known-good >> toolchain package for older releases while using newer major versions on >> newer FreeBSD releases (e.g gcc9 for 13.0 and gcc6 for 12.x). >> >> I do plan to switch the default toolchains for make universe/tinderbox >> for targets using -xtoolchain-gcc based on GCC 6 over to the >> freebsd-gcc6 variants in the next week or so. >> >> -- >> John Baldwin > > Awesome, thanks! I was able to get amd64 buildworld and buildkernel to > succeed with just a few changes, and none to the port. I'll work on > getting the changes in. I have been able to get it building as well, mostly by muting a few warnings, adding libcompiler_rt to rtld's link for i386, disabling googletest (needs an upstream patch to stop using signed wchar_t), and a hack to jemalloc. I was able to build riscv as well with those same changes and am working through builds of other platforms. I'm happy to compare notes. The jemalloc one is a bit weird. -- John Baldwin ___ freebsd-current@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
Re: New external GCC toolchain ports/packages
On 12/18/19 4:16 PM, Mark Millard wrote: > > > On 2019-Dec-18, at 13:48, John Baldwin wrote: > >> In the interest of supporting newer versions of GCC for a base system >> toolchain, I've renamed the external GCC packages from -gcc >> to -gcc6. These are built as flavors of a new devel/freebsd-gcc6 >> port. The xtoolchain package is not used for these new packages, instead >> one does 'pkg install mips-gcc6' to get the GCC 6.x MIPS compiler and >> uses 'CROSS_TOOLCHAIN=mips-gcc6'. I've also gone ahead and updated this >> compiler to 6.5.0. >> >> I will leave the old ports/packages around for now to permit an easy >> transition, but going forward, the -gcc6 packages should be preferred >> to -xtoolchain-gcc for all but riscv (riscv64-gcc and >> riscv64-xtoolchain-gcc >> are separate from the powerpc64-gcc set of packages). >> >> In addition, I've also just added a devel/freebsd-gcc9 package which >> builds -gcc9 packages. It adds powerpc and riscv flavors relative >> to freebsd-gcc6 and uses GCC 9.2.0. To date in my testing I've yet to >> be able to finish a buildworld on any of the platforms I've tried >> (amd64, mips, sparc64), but the packages should permit other developers >> to get the tree building with GCC 9. To use these packages one would do >> something like: >> >> # pkg install amd64-gcc9 >> # make buildworld CROSS_TOOLCHAIN=amd64-gcc9 >> >> You can install both the gcc6 and gcc9 versions of a package at the same >> time, e.g. amd64-gcc6 and amd64-gcc9. Having different packages for major >> versions is similar to llvm and will also let us keep a known-good >> toolchain package for older releases while using newer major versions on >> newer FreeBSD releases (e.g gcc9 for 13.0 and gcc6 for 12.x). >> >> I do plan to switch the default toolchains for make universe/tinderbox >> for targets using -xtoolchain-gcc based on GCC 6 over to the >> freebsd-gcc6 variants in the next week or so. >> > > How about base/binutils and base/gcc ? Is their (future?) status > changed by any of this activity? I plan to rename base/gcc to base/gcc6 (and update it to 6.5) and then add a base/gcc9 that would provide GCC 9 as /usr/bin/cc. -- John Baldwin ___ freebsd-current@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
New external GCC toolchain ports/packages
In the interest of supporting newer versions of GCC for a base system toolchain, I've renamed the external GCC packages from -gcc to -gcc6. These are built as flavors of a new devel/freebsd-gcc6 port. The xtoolchain package is not used for these new packages, instead one does 'pkg install mips-gcc6' to get the GCC 6.x MIPS compiler and uses 'CROSS_TOOLCHAIN=mips-gcc6'. I've also gone ahead and updated this compiler to 6.5.0. I will leave the old ports/packages around for now to permit an easy transition, but going forward, the -gcc6 packages should be preferred to -xtoolchain-gcc for all but riscv (riscv64-gcc and riscv64-xtoolchain-gcc are separate from the powerpc64-gcc set of packages). In addition, I've also just added a devel/freebsd-gcc9 package which builds -gcc9 packages. It adds powerpc and riscv flavors relative to freebsd-gcc6 and uses GCC 9.2.0. To date in my testing I've yet to be able to finish a buildworld on any of the platforms I've tried (amd64, mips, sparc64), but the packages should permit other developers to get the tree building with GCC 9. To use these packages one would do something like: # pkg install amd64-gcc9 # make buildworld CROSS_TOOLCHAIN=amd64-gcc9 You can install both the gcc6 and gcc9 versions of a package at the same time, e.g. amd64-gcc6 and amd64-gcc9. Having different packages for major versions is similar to llvm and will also let us keep a known-good toolchain package for older releases while using newer major versions on newer FreeBSD releases (e.g gcc9 for 13.0 and gcc6 for 12.x). I do plan to switch the default toolchains for make universe/tinderbox for targets using -xtoolchain-gcc based on GCC 6 over to the freebsd-gcc6 variants in the next week or so. -- John Baldwin ___ freebsd-current@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
Re: make delete-old: missing some files?
On 10/22/19 8:42 PM, Alexey Dokuchaev wrote: > On Tue, Oct 22, 2019 at 04:34:53PM -0700, John Baldwin wrote: >> On 10/18/19 10:05 AM, Alexey Dokuchaev wrote: >>> hi there, >>> >>> i've made my -CURRENT world and installed, but "make delete-old" tells >>> me it cannot remove some directories: >>> >>>>>> Removing old directories >>> rmdir: /usr/share/dtrace: Directory not empty >>> rmdir: /usr/lib/dtrace: Directory not empty > > Apparently, these are because I've started to put WITHOUT_CDDL=yes in > /etc/src.conf since recently: > > $ find /usr/lib/dtrace -type f > /usr/lib/dtrace/siftr.d > /usr/lib/dtrace/mbuf.d > /usr/lib/dtrace/socket.d > > $ find /usr/share/dtrace -type f > /usr/share/dtrace/nfsattrstats > /usr/share/dtrace/siftr > /usr/share/dtrace/blocking > /usr/share/dtrace/tcpdebug> > I can see some dtrace/*.d files in OptionalObsoleteFiles.inc, perhaps > these are missing? Probably. >>> # find /usr/lib/debug/usr/lib/engines >>> /usr/lib/debug/usr/lib/engines >>> /usr/lib/debug/usr/lib/engines/lib4758cca.so.debug >>> ... >> >> These are from the OpenSSL 1.1.1 commit. However, they are tagged as >> OLD_LIBS and check-old-libs and delete-old-libs should be automatically >> deleting these? Does 'make check-old' report these files as >> old libraries? > > I've manually placed one of those back on the filesystem and `make > check-old' reported it (twice!) under libraries. But after r353907 it > get cleaned up properly with `make delete-old'. Hmm, then 'make delete-old-libs' should already delete them without needing r353907. The issue with r353907 is if someone doesn't delete the actual libraries via 'make delete-old-libs' but then tries to debug an application that was using the old openssl and crashed, we'd no longer have debug symbols if the crash was in one of those libraries. That matters less for OpenSSL engines, but matters more for something like libutil, etc. hence why we delete debug symbols as part of delete-old-libs instead of delete-old. If 'make delete-old-libs' deletes these files already, then we should probably revert r353907. -- John Baldwin ___ freebsd-current@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
Re: make delete-old: missing some files?
On 10/18/19 10:05 AM, Alexey Dokuchaev wrote: > hi there, > > i've made my -CURRENT world and installed, but "make delete-old" tells > me it cannot remove some directories: > >>>> Removing old directories > rmdir: /usr/share/dtrace: Directory not empty > rmdir: /usr/lib/dtrace: Directory not empty > rmdir: /usr/lib/debug/usr/tests/libexec/rtld-elf: Directory not empty > rmdir: /usr/lib/debug/usr/tests/libexec: Directory not empty > rmdir: /usr/lib/debug/usr/tests: Directory not empty > rmdir: /usr/lib/debug/usr/lib/i18n: Directory not empty > rmdir: /usr/lib/debug/usr/lib/engines: Directory not empty > rmdir: /usr/lib/debug/usr/lib: Directory not empty > rmdir: /usr/lib/debug/usr: Directory not empty > > taking /usr/lib/debug/usr/lib/engines as an example: > > # find /usr/lib/debug/usr/lib/engines > /usr/lib/debug/usr/lib/engines > /usr/lib/debug/usr/lib/engines/lib4758cca.so.debug > /usr/lib/debug/usr/lib/engines/libaep.so.debug > /usr/lib/debug/usr/lib/engines/libatalla.so.debug > /usr/lib/debug/usr/lib/engines/libcapi.so.debug > /usr/lib/debug/usr/lib/engines/libchil.so.debug > /usr/lib/debug/usr/lib/engines/libcswift.so.debug > /usr/lib/debug/usr/lib/engines/libgost.so.debug > /usr/lib/debug/usr/lib/engines/libnuron.so.debug > /usr/lib/debug/usr/lib/engines/libsureware.so.debug > /usr/lib/debug/usr/lib/engines/libubsec.so.debug > > am i missing something, or ObsoleteFiles.inc lacks a few entries? These are from the OpenSSL 1.1.1 commit. However, they are tagged as OLD_LIBS and check-old-libs and delete-old-libs should be automatically deleting these? Does 'make check-old' report these files as old libraries? -- John Baldwin ___ freebsd-current@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
Re: ktrace/kdump give incorrect message on unlinkat() failure due to capabilities
On 9/25/19 10:33 AM, Sergey Kandaurov wrote: > On Sat, Sep 21, 2019 at 08:43:58PM -0400, Ryan Stone wrote: >> I have written a short test program that runs unlinkat(2) in >> capability mode and fails due to not having the write capabilities: >> >> https://people.freebsd.org/~rstone/src/unlink.c >> >> If I run the binary under ktrace and look at the kdump output, it >> gives the following incorrect output: >> >> 43775 unlink CALL unlinkat(0x3,0x7fffe995,0) >> 43775 unlink NAMI "from.QAUlAA0" >> 43775 unlink CAP operation requires CAP_LOOKUP, descriptor holds >> CAP_LOOKUP >> 43775 unlink RET unlinkat -1 errno 93 Capabilities insufficient >> >> The message should instead say that the operation requires >> CAP_UNLINKAT. Looking at sys/capsicum.h, I suspect that the problem >> is related to the strange definition of CAP_UNLINKAT: >> >> #define CAP_UNLINKAT (CAP_LOOKUP | 0x1000ULL) > > FYI, with this grep it was able to decode capabilities. > > Index: lib/libsysdecode/mktables > === > --- lib/libsysdecode/mktables (revision 352685) > +++ lib/libsysdecode/mktables (working copy) > @@ -157,7 +157,7 @@ > gen_table "sigcode" "SI_[A-Z]+[[:space:]]+0(x[0-9abcdef]+)?" > "sys/signal.h" > gen_table "umtxcvwaitflags" "CVWAIT_[A-Z_]+[[:space:]]+0x[0-9]+" > "sys/umtx.h" > gen_table "umtxrwlockflags" "URWLOCK_PREFER_READER[[:space:]]+0x[0-9]+" > "sys/umtx.h" > -gen_table "caprights" > "CAP_[A-Z_]+[[:space:]]+CAPRIGHT\([0-9],[[:space:]]+0x[0-9]{16}ULL\)" > "sys/capsicum.h" > +gen_table "caprights" > "CAP_[A-Z_]+[[:space:]]+(CAPRIGHT|[()A-Z_|[:space:]]+CAP_LOOKUP)" > "sys/capsicum.h" > gen_table "sctpprpolicy""SCTP_PR_SCTP_[A-Z_]+[[:space:]]+0x[0-9]+" > "netinet/sctp_uio.h" "SCTP_PR_SCTP_ALL" > gen_table "cmsgtypesocket" "SCM_[A-Z_]+[[:space:]]+0x[0-9]+" > "sys/socket.h" > if [ -e "${include_dir}/x86/sysarch.h" ]; then CAP_SEEK and CAP_MMAP_X might also be subject to this. However, I'm not quite understanding the regex, or at least why the modified portion of the regex isn't something like this: (CAPRIGHT\(|\(CAP_LOOKUP) That is, you currently have [()A-Z_|[:space:]]+ for an expression that I think will only ever match a single '(' character. A more general form that might work for CAP_SEEK and CAP_MMAP_X might be to match on 'CAP_ | 0xhttps://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
Re: problem with LOCAL_MODULES
On 8/30/19 10:42 AM, Kyle Evans wrote: > On Fri, Aug 16, 2019 at 7:38 PM John Baldwin wrote: >> >> On 8/16/19 3:05 AM, Gary Jennejohn wrote: >>> I tried to build a kernel today and it failed in modules-all even >>> though I had LOCAL_MODULES="" in /etc/src.conf, as recommended by >>> jhb. >>> >>> That's wrong. It has to be LOCAL_MODULES=, otherwise >>> /sys/conf/kern.post.mk seems to conclude that there should be a >>> module under /usr/local/sys/modules with the name "". >> >> I think this will permit both versions to work: >> >> Index: sys/conf/kern.post.mk >> === >> --- kern.post.mk(revision 351151) >> +++ kern.post.mk(working copy) >> @@ -76,6 +76,7 @@ modules-${target}: >> cd $S/modules; ${MKMODULESENV} ${MAKE} \ >> ${target:S/^reinstall$/install/:S/^clobber$/cleandir/} >> .endif >> +.if !empty(LOCAL_MODULES) >> .for module in ${LOCAL_MODULES} >> @${ECHODIR} "===> ${module} >> (${target:S/^reinstall$/install/:S/^clobber$/cleandir/})" >> @cd ${LOCAL_MODULES_DIR}/${module}; ${MKMODULESENV} ${MAKE} \ >> @@ -83,6 +84,7 @@ modules-${target}: >> ${target:S/^reinstall$/install/:S/^clobber$/cleandir/} >> .endfor >> .endif >> +.endif >> .endfor >> >> # Handle ports (as defined by the user) that build kernel modules >> > > I think I'd like to see this with !empty(LOCAL_MODULES) && > EXISTS(${LOCAL_MODULES_DIR}) or maybe just the latter condition to > prevent accidental foot-shooting... I was testing a problem with doing > this stuff in a poudriere build for swills@ and set LOCAL_MODULES="" > only to get an error because LOCAL_MODULES_DIR doesn't yet exist on > the machine I was testing with -- which we can trivially avoid. Did this work for you? Gary said in a followup that it didn't work, so that's why I hadn't committed it. -- John Baldwin ___ freebsd-current@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
Re: HEADSUP: drm-current-kmod now installs sources
On 8/16/19 5:33 PM, Rozhuk Ivan wrote: > On Fri, 16 Aug 2019 17:23:08 -0700 > John Baldwin wrote: > >>> I use better way: >>> /etc/make.conf: >>> # Modules to build with kernel. >>> PORTS_MODULES+= graphics/drm-fbsd12.0-kmod >>> graphics/gpu-firmware-kmod >> >> This doesn't work for folks who use pre-built packages. >> > > I update mine /usr/src via rsync from other mine server, so any > changes made by port or me or ... will lost. > > Probably there is must be some solution like special folder where > ports can store some file with port name, that automaticly go to > PORTS_MODULES on build kernel. > And probably pkg can do with this something for "folks who use pre-built > packages". That is what this framework does. Ports install a Makefile to /usr/local/sys/modules//Makefile. That makefile is invoked during the 'modules' stages of buildkernel and installkernel. In the case of drm-current-kmod, the port installs sources into subdirectories of /usr/local/sys/modules/drm-current-kmod and installs a top-level Makefile in /usr/local/sys/modules/drm-current-kmod/Makefile that uses bsd.subdir.mk to recurse into those subdirectories. Currently, LOCAL_MODULES is automatically populated to a list of the directories in /usr/local/sys/modules and the debate is about when LOCAL_MODULES should be auto-populated or not. -- John Baldwin ___ freebsd-current@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
Re: problem with LOCAL_MODULES
On 8/16/19 3:05 AM, Gary Jennejohn wrote: > I tried to build a kernel today and it failed in modules-all even > though I had LOCAL_MODULES="" in /etc/src.conf, as recommended by > jhb. > > That's wrong. It has to be LOCAL_MODULES=, otherwise > /sys/conf/kern.post.mk seems to conclude that there should be a > module under /usr/local/sys/modules with the name "". I think this will permit both versions to work: Index: sys/conf/kern.post.mk === --- kern.post.mk(revision 351151) +++ kern.post.mk(working copy) @@ -76,6 +76,7 @@ modules-${target}: cd $S/modules; ${MKMODULESENV} ${MAKE} \ ${target:S/^reinstall$/install/:S/^clobber$/cleandir/} .endif +.if !empty(LOCAL_MODULES) .for module in ${LOCAL_MODULES} @${ECHODIR} "===> ${module} (${target:S/^reinstall$/install/:S/^clobber$/cleandir/})" @cd ${LOCAL_MODULES_DIR}/${module}; ${MKMODULESENV} ${MAKE} \ @@ -83,6 +84,7 @@ modules-${target}: ${target:S/^reinstall$/install/:S/^clobber$/cleandir/} .endfor .endif +.endif .endfor # Handle ports (as defined by the user) that build kernel modules -- John Baldwin ___ freebsd-current@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"