Re: Files not deleted via update procedure: rescue/gbde usr/include/machine/fiq.h usr/lib/include/machine/fiq.h usr/share/man/man4/CAM.4.gz

2024-07-31 Thread John Baldwin

On 7/27/24 19:28, Mark Millard wrote:

On Jul 27, 2024, at 16:07, Mark Millard  wrote:


The following old files were in the historically incrementally
updated directory tree but not in the installation to an empty
directory tree (checked via diff -rq):

/usr/obj/DESTDIRs/main-CA7-poud/rescue/gbde
/usr/obj/DESTDIRs/main-CA7-poud/usr/include/machine/fiq.h
/usr/obj/DESTDIRs/main-CA7-poud/usr/lib/include/machine/fiq.h
/usr/obj/DESTDIRs/main-CA7-poud/usr/share/man/man4/CAM.4.gz


That was an armv7 context.

For comparison/contrast, aarch64 had:

/usr/obj/DESTDIRs/main-CA76-poud/rescue/gbde
/usr/obj/DESTDIRs/main-CA76-poud/usr/lib/debug/usr/tests/cddl/usr.sbin/dtrace/amd64/kinst/
/usr/obj/DESTDIRs/main-CA76-poud/usr/lib/debug/usr/tests/lib/libc/ssp/h_raw.debug
/usr/obj/DESTDIRs/main-CA76-poud/usr/share/examples/IPv6/USAGE
/usr/obj/DESTDIRs/main-CA76-poud/usr/share/man/man2/recvmmsg.2.gz
/usr/obj/DESTDIRs/main-CA76-poud/usr/share/man/man2/sendmmsg.2.gz
/usr/obj/DESTDIRs/main-CA76-poud/usr/share/man/man4/CAM.4.gz
/usr/obj/DESTDIRs/main-CA76-poud/usr/share/man/man4/geom_map.4.gz
/usr/obj/DESTDIRs/main-CA76-poud/usr/tests/cddl/usr.sbin/dtrace/amd64/kinst/
/usr/obj/DESTDIRs/main-CA76-poud/usr/tests/lib/libc/ssp/h_raw


Thanks, I've pushed fixes for most of these.  The *mmsg.2.gz links are actually
not supposed to be stale and D46200 should fix those.  h_raw is a bit more of an
odd duck that isn't easily solved.  I'm not sure why it was installed in the
past for you but isn't installed anymore.
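
For reference, a comparison of this kind can be sketched as below. This is only an illustration: the function name `only_in_old` and the throwaway directory layout are made up for the example and do not come from the report; `diff -rq` is the only piece taken from it.

```sh
#!/bin/sh
# List entries that exist only in the old (incrementally updated) tree.
# $1 = old tree, $2 = freshly installed tree.
only_in_old()
{
	# diff -rq prints "Only in DIR: NAME" for entries present on one side.
	# diff exits non-zero when the trees differ, which is expected here.
	diff -rq "$1" "$2" | grep "^Only in $1" || true
}

# Tiny demonstration with throwaway trees (hypothetical layout).
demo=$(mktemp -d)
mkdir -p "$demo/old/rescue" "$demo/new/rescue"
touch "$demo/old/rescue/gbde" "$demo/old/rescue/sh" "$demo/new/rescue/sh"
only_in_old "$demo/old" "$demo/new"
```

Each reported entry is a candidate for the obsolete-files list rather than proof that it is stale, as the *mmsg.2.gz links show.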

--
John Baldwin




Re: aesni_load present in /boot/loader.conf on arm64

2024-07-31 Thread John Baldwin

On 7/31/24 08:15, void wrote:

Hi,

Looking at man 4 aesni, it appears this pertains to Intel and AMD only.
Is its presence on arm64 a bug?

It seems to be added to /boot/loader.conf by default.

The method I used to install was to boot the latest snapshot at
the time, plug in a USB3 disk, run bsdinstall to that disk,
reboot (this booted initially to the installer image), and mount the
msdos partition on /mnt. I then moved the /boot/efi/efi from the installed-to
disk out of the way, copied everything in /mnt to /boot/efi,
moved the /boot/efi/efi back to where it originally was, halted the machine and
removed the installer image. This was to achieve zfs-on-root.

Maybe something about the way I installed meant aesni was added?


Looks like bsdinstall hardcodes aesni without doing an architecture
check for both ZFS and geli.

Probably the bits of the zfsboot script referencing aesni need to
switch on the architecture.  The trick is that depending on the
architecture you may want to load more than one module.  For 14
I think you could get by with something like:

crypto_kld()
{
	case `uname -m` in
	amd64|i386)
		echo "aesni"
		;;
	arm64)
		echo "armv8crypto"
		;;
	*)
		echo ""
		;;
	esac
}

Then in the other parts of zfsboot call this function and treat it as
a list of modules.  On main I think you would want 32-bit arm and
powerpc64 to list ossl, and you might want to include ossl for
x86 and arm64 as well (eventually ossl should replace aesni and
armv8crypto IMO).
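
A minimal sketch of "treat it as a list of modules" might look like the following. The helper names `crypto_klds_for` and `loader_conf_lines` are hypothetical, the machine name is passed as an argument only to make the example easy to try, and the mapping simply mirrors the suggestion for 14 above; this is not the actual zfsboot code.

```sh
#!/bin/sh
# Hypothetical helper: map a machine name (as printed by `uname -m`) to a
# space-separated list of crypto modules.
crypto_klds_for()
{
	case "$1" in
	amd64|i386)
		echo "aesni"
		;;
	arm64)
		echo "armv8crypto"
		;;
	*)
		echo ""
		;;
	esac
}

# Print one loader.conf line per module in the list (none for an empty list).
loader_conf_lines()
{
	for kld in $(crypto_klds_for "$1"); do
		printf '%s_load="YES"\n' "$kld"
	done
}

loader_conf_lines "$(uname -m)"
```

Because the caller iterates over the list, adding ossl for an architecture would only require changing the string echoed for that case.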

Side topic: the ossl(4) manpage in main is stale and needs to be updated
to reflect armv7 and powerpc64 support.  I'm not sure yet if it supports
AES-GCM for armv8 as well.

--
John Baldwin




Re: 41dfea24eec panics during ata attach on ESXi VM

2024-06-05 Thread John Baldwin

On 6/5/24 4:35 AM, Yuri Pankov wrote:

After updating to 41dfea24eec (GENERIC-NODEBUG), ESXi VM started to
panic while attaching atapci children.  I was unable to grab original
boot panic data ("keyboard" dead), but was able to boot with
hint.ata.0.disabled=1, hint.ata.1.disabled=1, and `devctl enable ata0`
reproduced the issue:

ata0:  at channel 0 on atapci0


This should be fixed now by commit 56b822a17cde5940909633c50623d463191a7852.
Sorry for the breakage.

--
John Baldwin




Re: gcc behavior of init priority of .ctors and .dtors section

2024-06-04 Thread John Baldwin

On 5/16/24 4:05 PM, Lorenzo Salvadore wrote:

On Thursday, May 16th, 2024 at 20:26, Konstantin Belousov wrote:

gcc13 from ports
`# gcc ctors.c && ./a.out init 1 init 2 init 5 init 4 init 3 main fini 3 fini 4 fini 5 fini 2 fini 1`

The above order is not expected. I think clang's one is correct.

Further hacking with readelf shows that clang produces the right order of
section .rela.ctors but gcc does not.

```
# clang -fno-use-init-array -c ctors.c && readelf -r ctors.o | grep 'Relocation section with addend (.rela.ctors)' -A5 > clang.txt
# gcc -c ctors.c && readelf -r ctors.o | grep 'Relocation section with addend (.rela.ctors)' -A5 > gcc.txt
# diff clang.txt gcc.txt
3,5c3,5
<  00080001 R_X86_64_64 0060 init_65535_2 + 0
< 0008 00070001 R_X86_64_64 0040 init + 0
< 0010 00060001 R_X86_64_64 0020 init_65535 + 0
---
>  00060001 R_X86_64_64 0011 init_65535 + 0
> 0008 00070001 R_X86_64_64 0022 init + 0
> 0010 00080001 R_X86_64_64 0033 init_65535_2 + 0
```


The above clearly shows that gcc produces the wrong order in section `.rela.ctors`.

Is that expected behavior?

I have not tried the Linux version of gcc.


Note that init array vs. init function behavior is encoded by a note added
by crt1.o. I suspect that the problem is that gcc port is built without
--enable-initfini-array configure option.


Indeed, support for .init_array and .fini_array has been added to the GCC ports,
but is present in the *-devel ports only for now. I will soon proceed to enable
it for the standard GCC ports too. lang/gcc14 is soon to be added to the ports
tree and will have it from the beginning.

If this is indeed the issue, switching to a -devel GCC port should fix it.


FWIW, the devel/freebsd-gcc* ports have passed this flag to GCC's configure
for a long time (since we made the switch in clang).

--
John Baldwin




Re: Recent commits reject RPi4B booting: pcib0 vs. pcib1 "rman_manage_region: request" leads to panic [now fixed]

2024-02-15 Thread John Baldwin

On 2/14/24 11:03 PM, Mark Millard wrote:

On Feb 14, 2024, at 18:19, Mark Millard  wrote:


Your changes have the RPi4B that previously got the
panic to boot all the way instead. Details:

I have updated my pkg base environment to have the
downloaded kernels (and kernel source) with your
changes and have booted with each of:

/boot/kernel/kernel
/boot/kernel.GENERIC-NODEBUG/kernel

For reference:

# uname -apKU
FreeBSD aarch64-main-pkgs 15.0-CURRENT FreeBSD 15.0-CURRENT main-n268300-d79b6b8ec267 GENERIC-NODEBUG arm64 aarch64 1500014 1500012

Thanks for the fix.

Now I'll update the rest of pkg base materials.


Question: Are any of the changes to be MFC'd at some point?


If I do, I will merge a large batch at once, and probably adjust
the order.  For example, I'll merge the pci_host_generic changes
before pci_pci changes so that stable branches will be bisectable.

--
John Baldwin




Re: Recent commits reject RPi4B booting: pcib0 vs. pcib1 "rman_manage_region: request" leads to panic

2024-02-14 Thread John Baldwin

On 2/14/24 10:16 AM, Mark Millard wrote:

Top posting a related but separate item:

I looked up some old (2022-Dec-17) lspci -v output from
a Linux boot. Note the "Memory at" value 6 (in
the 35 bit BCM2711 address space) and the "(64-bit,
non-prefetchable)" (and "[size=4K]").

01:00.0 USB controller: VIA Technologies, Inc. VL805/806 xHCI USB 3.0 Controller (rev 01) (prog-if 30 [XHCI])
 Subsystem: VIA Technologies, Inc. VL805/806 xHCI USB 3.0 Controller
 Device tree node: /sys/firmware/devicetree/base/scb/pcie@7d50/pci@0,0/usb@0,0
 Flags: bus master, fast devsel, latency 0, IRQ 51
 Memory at 6 (64-bit, non-prefetchable) [size=4K]
 Capabilities: [80] Power Management version 3
 Capabilities: [90] MSI: Enable+ Count=1/4 Maskable- 64bit+
 Capabilities: [c4] Express Endpoint, MSI 00
 Capabilities: [100] Advanced Error Reporting
 Kernel driver in use: xhci_hcd


"Memory at 6 (64-bit, non-prefetchable)":
Violation of a PCIe standard?


No, this is a device BAR which can be 64-bit (memory BARs can either
be 32-bits or 64-bits).  However, the "window" in a PCI _bridge_ for
memory is only defined to be 32-bits.  Windows in PCI-PCI bridges
are a special type of BAR that defines the address ranges that the
bridge decodes on the parent side and passes down to child devices.
The prefetchable window in PCI-PCI bridges can optionally be 64-bit.

BAR == a range of memory or I/O port addresses decoded by a device,
usually mapped to a register bank, but sometimes mapped to internal
memory (e.g. a framebuffer)

Window == a range of memory or I/O port addresses decoded by a bridge
for which transactions are passed across the bridge to be handled by
a child device.
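
As a rough illustration of the 32-bit window limit, the check below asks whether an address range could be described by a 32-bit (non-prefetchable) bridge window. The function name `fits_32bit_window` and the sample ranges are made up for the example; they are not the values from the RPi4B log.

```sh
#!/bin/sh
# Largest address a 32-bit bridge window register can describe.
WINDOW_MAX=$((0xffffffff))

# Succeeds if the inclusive range [$1, $2] fits in a 32-bit window.
fits_32bit_window()
{
	[ "$(( ($1) <= ($2) && ($2) <= $WINDOW_MAX ))" -eq 1 ]
}

# Illustrative values only, not taken from the log above.
for range in "0x80000000 0x800fffff" "0x100000000 0x1000fffff"; do
	set -- $range
	if fits_32bit_window "$1" "$2"; then
		echo "$1-$2: fits in a 32-bit window"
	else
		echo "$1-$2: above 4G, needs a 64-bit prefetchable window or translation"
	fi
done
```

A device BAR has no such restriction when it is a 64-bit BAR, which is why the lspci output by itself is not a violation.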

--
John Baldwin




Re: Recent commits reject RPi4B booting: pcib0 vs. pcib1 "rman_manage_region: request" leads to panic

2024-02-14 Thread John Baldwin

On 2/14/24 9:57 AM, Mark Millard wrote:

On Feb 14, 2024, at 08:08, John Baldwin  wrote:


On 2/12/24 5:57 PM, Mark Millard wrote:

On Feb 12, 2024, at 16:36, Mark Millard  wrote:

On Feb 12, 2024, at 16:10, Mark Millard  wrote:


On Feb 12, 2024, at 12:00, Mark Millard  wrote:


[Gack: I was looking at the wrong vintage of source code, predating
your changes: wrong system used.]


On Feb 12, 2024, at 10:41, Mark Millard  wrote:


On Feb 12, 2024, at 09:32, John Baldwin  wrote:


On 2/9/24 8:13 PM, Mark Millard wrote:

Summary:
pcib0:  mem 0x7d50-0x7d50930f irq 80,81 on simplebus2
pcib0: parsing FDT for ECAM0:
pcib0:  PCI addr: 0xc000, CPU addr: 0x6, Size: 0x4000
. . .
rman_manage_region:  request: start 0x6, end 0x6000f
panic: Failed to add resource to rman


Hmmm, I suspect this is due to the way that bus_translate_resource works, which
is fundamentally broken.  It rewrites the start address of a resource in-situ
instead of keeping downstream resources separate from the upstream resources.
For example, I don't see how you could ever release a resource in this design
without completely screwing up your rman.  That is, I expect trying to detach a
PCI device behind a translating bridge that uses the current approach should
corrupt the allocated resource ranges in an rman long before my changes.

That said, that doesn't really explain the panic.  Hmm, the panic might be
because for PCI bridge windows the driver now passes RF_ACTIVE and the
bus_translate_resource hack only kicks in in the activate_resource method of
pci_host_generic.c.


Detail:
. . .
pcib0:  mem 0x7d50-0x7d50930f irq 80,81 on simplebus2
pcib0: parsing FDT for ECAM0:
pcib0: PCI addr: 0xc000, CPU addr: 0x6, Size: 0x4000


This indicates this is a translating bus.


pcib1:  irq 91 at device 0.0 on pci0
rman_manage_region:  request: start 0x1, end 0x1
pcib0: rman_reserve_resource: start=0xc000, end=0xc00f, count=0x10
rman_reserve_resource_bound:  request: [0xc000, 0xc00f], length 0x10, flags 102, device pcib1
rman_reserve_resource_bound: trying 0x <0xc000,0xf>
considering [0xc000, 0x]
truncated region: [0xc000, 0xc00f]; size 0x10 (requested 0x10)
candidate region: [0xc000, 0xc00f], size 0x10
allocating from the beginning
rman_manage_region:  request: start 0x6, end 0x6000f


What you later typed does not match:

0x6
0x6000f

You later typed:

0x6000
0x600fff

This seems to have led to some confusion from using the
wrong figure(s).


The fact that we are trying to reserve the CPU addresses in the rman is because
bus_translate_resource rewrote the start address in the resource after it was
allocated.

That said, I can't see why rman_manage_region would actually fail.  At this
point the rman is empty (this is the first call to rman_manage_region for
"pcib1 memory window"), so the only checks that could be failing are the
checks against rm_start and rm_end.  For the memory window, rm_start is always
0, and rm_end is always 0x, so both the old (0xc - 0xc00f) and new (0x6000 -
0x600fff) ranges are within those bounds.


No:

0x

vs. (actual):

0x6
0x6000f


Ok, then this explains the failure if the "raw" addresses are above 4G.  I have
access to an emag I'm currently using to test fixes to pci_host_generic.c to
avoid corrupting struct resource objects.  I'll post the diff once I've got
something verified to work.


It looks to me like in sys/dev/pci/pci_pci.c the:
static void
pcib_probe_windows(struct pcib_softc *sc)
{
. . .
 pcib_alloc_window(sc, &sc->mem, SYS_RES_MEMORY, 0, 0x);
. . .
is just inappropriately restrictive about where in the system
address space a PCIe can validly be mapped to on the high end.
That, in turn, leads to the rejection on the RPi4B now that
the range use is checked.


No, the physical register in PCI-PCI bridges is only 32-bits.  Only the
prefetchable BAR supports 64-bit addresses.


Just for my edification . . .

As I understand, SYS_RES_MEMORY for the BCM2711
means the 35 bit addressing space in the BCM2711,
not a PCIe device internal address range that
corresponds. Am I wrong about that?

If I'm wrong, what does identify the 35 bit
addressing space in the BCM2711?

If I'm correct, then the 0..0x
seems to be from the wrong address space up
front. Or, maybe, the SYS_RES_MEMORY and the
0x arguments are not related as I
expected and the 0x is not a
SYS_RES_MEMORY value?


We use SYS_RES_MEMORY for both address spaces.  SYS_RES_MEMORY is more of
an address space "type" and doesn't necessarily name a single, unique
address space.  The way to think about these address spaces is instances
of 'struct rman'.  There's a global 'struct rman' in the arm64 

Re: Recent commits reject RPi4B booting: pcib0 vs. pcib1 "rman_manage_region: request" leads to panic

2024-02-14 Thread John Baldwin

On 2/14/24 8:42 AM, Warner Losh wrote:

On Wed, Feb 14, 2024 at 9:08 AM John Baldwin  wrote:


On 2/12/24 5:57 PM, Mark Millard wrote:

On Feb 12, 2024, at 16:36, Mark Millard  wrote:


On Feb 12, 2024, at 16:10, Mark Millard  wrote:


On Feb 12, 2024, at 12:00, Mark Millard  wrote:


[Gack: I was looking at the wrong vintage of source code, predating
your changes: wrong system used.]


On Feb 12, 2024, at 10:41, Mark Millard  wrote:


On Feb 12, 2024, at 09:32, John Baldwin  wrote:


On 2/9/24 8:13 PM, Mark Millard wrote:

Summary:
pcib0:  mem 0x7d50-0x7d50930f irq 80,81 on simplebus2
pcib0: parsing FDT for ECAM0:
pcib0:  PCI addr: 0xc000, CPU addr: 0x6, Size: 0x4000
. . .
rman_manage_region:  request: start 0x6, end 0x6000f
panic: Failed to add resource to rman


Hmmm, I suspect this is due to the way that bus_translate_resource works, which
is fundamentally broken.  It rewrites the start address of a resource in-situ
instead of keeping downstream resources separate from the upstream resources.
For example, I don't see how you could ever release a resource in this design
without completely screwing up your rman.  That is, I expect trying to detach a
PCI device behind a translating bridge that uses the current approach should
corrupt the allocated resource ranges in an rman long before my changes.

That said, that doesn't really explain the panic.  Hmm, the panic might be
because for PCI bridge windows the driver now passes RF_ACTIVE and the
bus_translate_resource hack only kicks in in the activate_resource method of
pci_host_generic.c.



Detail:
. . .
pcib0:  mem 0x7d50-0x7d50930f irq 80,81 on simplebus2
pcib0: parsing FDT for ECAM0:
pcib0: PCI addr: 0xc000, CPU addr: 0x6, Size: 0x4000


This indicates this is a translating bus.


pcib1:  irq 91 at device 0.0 on pci0
rman_manage_region:  request: start 0x1, end 0x1
pcib0: rman_reserve_resource: start=0xc000, end=0xc00f, count=0x10
rman_reserve_resource_bound:  request: [0xc000, 0xc00f], length 0x10, flags 102, device pcib1
rman_reserve_resource_bound: trying 0x <0xc000,0xf>
considering [0xc000, 0x]
truncated region: [0xc000, 0xc00f]; size 0x10 (requested 0x10)
candidate region: [0xc000, 0xc00f], size 0x10
allocating from the beginning
rman_manage_region:  request: start 0x6, end 0x6000f


What you later typed does not match:

0x6
0x6000f

You later typed:

0x6000
0x600fff

This seems to have led to some confusion from using the
wrong figure(s).


The fact that we are trying to reserve the CPU addresses in the rman is because
bus_translate_resource rewrote the start address in the resource after it was
allocated.

That said, I can't see why rman_manage_region would actually fail.  At this
point the rman is empty (this is the first call to rman_manage_region for
"pcib1 memory window"), so the only checks that could be failing are the
checks against rm_start and rm_end.  For the memory window, rm_start is always
0, and rm_end is always 0x, so both the old (0xc - 0xc00f) and new (0x6000 -
0x600fff) ranges are within those bounds.


No:

0x

vs. (actual):

0x6
0x6000f


Ok, then this explains the failure if the "raw" addresses are above 4G.  I have
access to an emag I'm currently using to test fixes to pci_host_generic.c to
avoid corrupting struct resource objects.  I'll post the diff once I've got
something verified to work.


It looks to me like in sys/dev/pci/pci_pci.c the:

static void
pcib_probe_windows(struct pcib_softc *sc)
{
. . .
  pcib_alloc_window(sc, &sc->mem, SYS_RES_MEMORY, 0, 0x);
. . .

is just inappropriately restrictive about where in the system
address space a PCIe can validly be mapped to on the high end.
That, in turn, leads to the rejection on the RPi4B now that
the range use is checked.


No, the physical register in PCI-PCI bridges is only 32-bits.  Only the
prefetchable BAR supports 64-bit addresses.  This is why the host bridge
is doing a translation from the CPU side (0x6) to the PCI BAR
addresses (0xc000) so that the BAR addresses are down in the 32-bit
address range.  It's also true that many PCI devices only support 32-bit
addresses in memory BARs.  64-bit BARs are an optional extension not
universally supported.

The translation here is somewhat akin to a type of MMU where the CPU
addresses are mapped to PCI addresses.  The problem here is that the
PCI BAR resources need to "stay" as PCI addresses since we depend on
being able to use rman_get_start/end to get the PCI addresses of
allocated resources, but pci_host_generic.c currently rewrites the
addresses.

Probably I should remove rman_set_start/end entirely (Warner added them
back in 2004) as the methods don't 

Re: Recent commits reject RPi4B booting: pcib0 vs. pcib1 "rman_manage_region: request" leads to panic

2024-02-14 Thread John Baldwin

On 2/12/24 5:57 PM, Mark Millard wrote:

On Feb 12, 2024, at 16:36, Mark Millard  wrote:


On Feb 12, 2024, at 16:10, Mark Millard  wrote:


On Feb 12, 2024, at 12:00, Mark Millard  wrote:


[Gack: I was looking at the wrong vintage of source code, predating
your changes: wrong system used.]


On Feb 12, 2024, at 10:41, Mark Millard  wrote:


On Feb 12, 2024, at 09:32, John Baldwin  wrote:


On 2/9/24 8:13 PM, Mark Millard wrote:

Summary:
pcib0:  mem 0x7d50-0x7d50930f irq 80,81 on simplebus2
pcib0: parsing FDT for ECAM0:
pcib0:  PCI addr: 0xc000, CPU addr: 0x6, Size: 0x4000
. . .
rman_manage_region:  request: start 0x6, end 0x6000f
panic: Failed to add resource to rman


Hmmm, I suspect this is due to the way that bus_translate_resource works, which
is fundamentally broken.  It rewrites the start address of a resource in-situ
instead of keeping downstream resources separate from the upstream resources.
For example, I don't see how you could ever release a resource in this design
without completely screwing up your rman.  That is, I expect trying to detach a
PCI device behind a translating bridge that uses the current approach should
corrupt the allocated resource ranges in an rman long before my changes.

That said, that doesn't really explain the panic.  Hmm, the panic might be
because for PCI bridge windows the driver now passes RF_ACTIVE and the
bus_translate_resource hack only kicks in in the activate_resource method of
pci_host_generic.c.


Detail:
. . .
pcib0:  mem 0x7d50-0x7d50930f irq 80,81 on simplebus2
pcib0: parsing FDT for ECAM0:
pcib0: PCI addr: 0xc000, CPU addr: 0x6, Size: 0x4000


This indicates this is a translating bus.


pcib1:  irq 91 at device 0.0 on pci0
rman_manage_region:  request: start 0x1, end 0x1
pcib0: rman_reserve_resource: start=0xc000, end=0xc00f, count=0x10
rman_reserve_resource_bound:  request: [0xc000, 0xc00f], length 0x10, flags 102, device pcib1
rman_reserve_resource_bound: trying 0x <0xc000,0xf>
considering [0xc000, 0x]
truncated region: [0xc000, 0xc00f]; size 0x10 (requested 0x10)
candidate region: [0xc000, 0xc00f], size 0x10
allocating from the beginning
rman_manage_region:  request: start 0x6, end 0x6000f


What you later typed does not match:

0x6
0x6000f

You later typed:

0x6000
0x600fff

This seems to have led to some confusion from using the
wrong figure(s).


The fact that we are trying to reserve the CPU addresses in the rman is because
bus_translate_resource rewrote the start address in the resource after it was
allocated.

That said, I can't see why rman_manage_region would actually fail.  At this
point the rman is empty (this is the first call to rman_manage_region for
"pcib1 memory window"), so the only checks that could be failing are the
checks against rm_start and rm_end.  For the memory window, rm_start is always
0, and rm_end is always 0x, so both the old (0xc - 0xc00f) and new (0x6000 -
0x600fff) ranges are within those bounds.


No:

0x

vs. (actual):

0x6
0x6000f


Ok, then this explains the failure if the "raw" addresses are above 4G.  I have
access to an emag I'm currently using to test fixes to pci_host_generic.c to
avoid corrupting struct resource objects.  I'll post the diff once I've got
something verified to work.


It looks to me like in sys/dev/pci/pci_pci.c the:

static void
pcib_probe_windows(struct pcib_softc *sc)
{
. . .
 pcib_alloc_window(sc, &sc->mem, SYS_RES_MEMORY, 0, 0x);
. . .

is just inappropriately restrictive about where in the system
address space a PCIe can validly be mapped to on the high end.
That, in turn, leads to the rejection on the RPi4B now that
the range use is checked.


No, the physical register in PCI-PCI bridges is only 32-bits.  Only the
prefetchable BAR supports 64-bit addresses.  This is why the host bridge
is doing a translation from the CPU side (0x6) to the PCI BAR
addresses (0xc000) so that the BAR addresses are down in the 32-bit
address range.  It's also true that many PCI devices only support 32-bit
addresses in memory BARs.  64-bit BARs are an optional extension not
universally supported.

The translation here is somewhat akin to a type of MMU where the CPU
addresses are mapped to PCI addresses.  The problem here is that the
PCI BAR resources need to "stay" as PCI addresses since we depend on
being able to use rman_get_start/end to get the PCI addresses of
allocated resources, but pci_host_generic.c currently rewrites the
addresses.
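
To make the MMU analogy concrete, here is a small sketch of a host-bridge style translation that computes the PCI-side address from a CPU-side address without modifying the original value. The base addresses, the size, and the function name `cpu_to_pci` are all hypothetical and chosen only for illustration; they are not read from the FDT output above and this is not the pci_host_generic.c code.

```sh
#!/bin/sh
# Hypothetical translation range: CPU addresses starting at CPU_BASE map to
# PCI addresses starting at PCI_BASE for SIZE bytes. Values are illustrative.
CPU_BASE=$((0x100000000))
PCI_BASE=$((0x80000000))
SIZE=$((0x40000000))

# Print the PCI address for CPU address $1, or fail if it is out of range.
cpu_to_pci()
{
	off=$(( ($1) - $CPU_BASE ))
	if [ "$(( $off >= 0 && $off < $SIZE ))" -ne 1 ]; then
		return 1
	fi
	printf '0x%x\n' "$(( $PCI_BASE + $off ))"
}

cpu_to_pci 0x100000010
```

The point of the sketch is that the translated address is a derived value: the caller still has the untranslated one, which is what an in-place rewrite of the resource loses.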

Probably I should remove rman_set_start/end entirely (Warner added them
back in 2004) as the methods don't do anything to deal with the fallout
that the rman.rm_list linked-list is no longer sorted by address once
some addresses get rewritten, etc.

--
John Baldwin




Re: Recent commits reject RPi4B booting: pcib0 vs. pcib1 "rman_manage_region: request" leads to panic

2024-02-12 Thread John Baldwin

On 2/10/24 2:09 PM, Michael Butler wrote:

I have stability problems with anything at or after this commit
(b377ff8) on an amd64 laptop. While I see the following panic logged, no
crash dump is preserved :-( It happens after ~5-6 minutes running in KDE
(X).

Reverting to 36efc64 seems to work reliably (after ACPI changes but
before the problematic PCI one)

kernel: Fatal trap 12: page fault while in kernel mode
kernel: cpuid = 2; apic id = 02
kernel: fault virtual address = 0x48
kernel: fault code= supervisor read data, page not present
kernel: instruction pointer   = 0x20:0x80acb962
kernel: stack pointer = 0x28:0xfe00c4318d80
kernel: frame pointer = 0x28:0xfe00c4318d80
kernel: code segment  = base 0x0, limit 0xf, type 0x1b
kernel:   = DPL 0, pres 1, long 1, def32 0, gran 1
kernel: processor eflags  = interrupt enabled, resume, IOPL = 0
kernel: current process   = 2 (clock (0))
kernel: rdi: f802e460c000 rsi:  rdx: 0002
kernel: rcx:   r8: 001e  r9: fe00c4319000
kernel: rax: 0002 rbx: f802e460c000 rbp: fe00c4318d80
kernel: r10: 1388 r11: 7ffc765d r12: 000f
kernel: r13: 0002 r14: f8000193e740 r15: 
kernel: trap number   = 12
kernel: panic: page fault
kernel: cpuid = 2
kernel: time = 1707573802
kernel: Uptime: 6m19s
kernel: Dumping 942 out of 16242
MB:..2%..11%..21%..31%..41%..51%..62%..72%..82%..92%
kernel: Dump complete
kernel: Automatic reboot in 15 seconds - press a key on the console to abort


Without a stack trace it is pretty much impossible to debug a panic like this.
Do you have KDB_TRACE enabled in your kernel config?  I'm also not sure how the
PCI changes can result in a panic post-boot.  If you were going to have problems
they would be during device attach, not after you are booted and running X.

Short of a stack trace, you can at least use lldb or gdb to look up the source
line associated with the faulting instruction pointer (as long as it isn't in
a kernel module), e.g. for gdb you would use 'gdb /boot/kernel/kernel' and then
'l *' followed by the address, e.g. from above: 'l *0x80acb962'
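
The same lookup can be done non-interactively. The sketch below only builds and prints the command line; `gdb_list_cmd` and `FAULT_PC` are names invented for the example, and the address is a placeholder to be replaced with the full instruction pointer value from the panic message.

```sh
#!/bin/sh
# Build a non-interactive gdb invocation that lists the source around a
# faulting address. $1 = address, $2 = kernel image (optional).
gdb_list_cmd()
{
	printf "gdb -batch -ex 'list *%s' %s\n" "$1" "${2:-/boot/kernel/kernel}"
}

# FAULT_PC is a placeholder; substitute the instruction pointer from the
# panic message before running the printed command.
FAULT_PC="${FAULT_PC:-0xADDRESS}"
gdb_list_cmd "$FAULT_PC"
```

Running the printed command requires a kernel image that matches the one that panicked, ideally with debug symbols available.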

--
John Baldwin




Re: Recent commits reject RPi4B booting: pcib0 vs. pcib1 "rman_manage_region: request" leads to panic

2024-02-12 Thread John Baldwin

On 2/9/24 8:13 PM, Mark Millard wrote:

Summary:

pcib0:  mem 0x7d50-0x7d50930f irq 80,81 on simplebus2
pcib0: parsing FDT for ECAM0:
pcib0:  PCI addr: 0xc000, CPU addr: 0x6, Size: 0x4000
. . .
rman_manage_region:  request: start 0x6, end 0x6000f
panic: Failed to add resource to rman


Hmmm, I suspect this is due to the way that bus_translate_resource works, which
is fundamentally broken.  It rewrites the start address of a resource in-situ
instead of keeping downstream resources separate from the upstream resources.
For example, I don't see how you could ever release a resource in this design
without completely screwing up your rman.  That is, I expect trying to detach a
PCI device behind a translating bridge that uses the current approach should
corrupt the allocated resource ranges in an rman long before my changes.

That said, that doesn't really explain the panic.  Hmm, the panic might be
because for PCI bridge windows the driver now passes RF_ACTIVE and the
bus_translate_resource hack only kicks in in the activate_resource method of
pci_host_generic.c.


Detail:


. . .
pcib0:  mem 0x7d50-0x7d50930f irq 80,81 on simplebus2
pcib0: parsing FDT for ECAM0:
pcib0:  PCI addr: 0xc000, CPU addr: 0x6, Size: 0x4000


This indicates this is a translating bus.


pcib1:  irq 91 at device 0.0 on pci0
rman_manage_region:  request: start 0x1, end 0x1
pcib0: rman_reserve_resource: start=0xc000, end=0xc00f, count=0x10
rman_reserve_resource_bound:  request: [0xc000, 0xc00f], length 0x10, flags 102, device pcib1
rman_reserve_resource_bound: trying 0x <0xc000,0xf>
considering [0xc000, 0x]
truncated region: [0xc000, 0xc00f]; size 0x10 (requested 0x10)
candidate region: [0xc000, 0xc00f], size 0x10
allocating from the beginning
rman_manage_region:  request: start 0x6, end 0x6000f


The fact that we are trying to reserve the CPU addresses in the rman is because
bus_translate_resource rewrote the start address in the resource after it was
allocated.

That said, I can't see why rman_manage_region would actually fail.  At this
point the rman is empty (this is the first call to rman_manage_region for
"pcib1 memory window"), so the only checks that could be failing are the
checks against rm_start and rm_end.  For the memory window, rm_start is always
0, and rm_end is always 0x, so both the old (0xc - 0xc00f) and new (0x6000 -
0x600fff) ranges are within those bounds.  I would instead expect to see some
other issue later on where we fail to allocate a resource for a child BAR, but
I wouldn't expect rman_manage_region to fail.

Logging the return value from rman_manage_region would be the first step I think
to see which error value it is returning.

Probably I should fix pci_host_generic.c to handle translation properly however.
I can work on a patch for that.

--
John Baldwin




Re: make installworld fails because /usr/include/c++/v1/__tuple is a file

2023-12-13 Thread John Baldwin

On 12/10/23 8:43 AM, Dimitry Andric wrote:

On 10 Dec 2023, at 15:11, Herbert J. Skuhra  wrote:


On Sun, Dec 10, 2023 at 01:22:38PM +, John F Carr wrote:

On arm64 running CURRENT from two weeks ago I updated to

  c711af772782 Bump __FreeBSD_version for llvm 17.0.6 merge

and built and installed from source.  make installworld failed:

  install: target directory `/usr/include/c++/v1/__tuple/' does not exist

That pathname is a file:

  -r--r--r--  1 root wheel 20512 Feb 15  2023 /usr/include/c++/v1/__tuple

Early in make output is

  mtree -deU -i -f /usr/src/etc/mtree/BSD.include.dist -p /usr/include
  ./c++/v1/__algorithm/pstl_backends missing (created)
  [...]
  ./c++/v1/__tuple missing (not created: File exists)

Should I remove the file and try again, or is there a more elegant fix?

The word "tuple" does not appear in UPDATING.


'make delete-old' should have removed this file.

bdd1243df58e6 (Dimitry Andric  2023-04-14 23:41:27 +0200   965)
OLD_FILES+=usr/include/c++/v1/__tuple


Ah yes, that's it. The file was removed during the upgrade from libc++
15.0 to 16.0, while its contents were split into a subdirectory named
__tuple_dir. In libc++ 17.0.0 they renamed this subdirectory back to
just __tuple.

This means that apparently people are not running "make delete-old"
after installations. Please don't forget that. :)


Well, but if you have an old system with LLVM 15 that you upgrade directly
to LLVM 17, you will hit this even if you ran delete-old after your last
upgrade that used LLVM 15.  We might need something to cope with this
during the install target, for libc++ in particular, where this has occurred
multiple times historically.
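
One possible shape for such a safeguard, purely as a sketch: before installing into a path that must be a directory, remove a stale non-directory that occupies it. The function name `ensure_dir_slot` is hypothetical, and this is not what the install target currently does.

```sh
#!/bin/sh
# Make sure $1 can be used as a directory: if a non-directory (for example a
# stale header file) occupies the path, remove it first.
ensure_dir_slot()
{
	if [ -e "$1" ] && [ ! -d "$1" ]; then
		rm -f "$1"
	fi
	mkdir -p "$1"
}

# Demonstration in a throwaway directory, not under /usr/include.
demo=$(mktemp -d)
touch "$demo/__tuple"            # stale file left by an older install
ensure_dir_slot "$demo/__tuple"  # now a directory, ready for new headers
```

An existing directory is left untouched, so the step is safe to repeat on systems that never had the stale file.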

--
John Baldwin




Re: bhyve -G

2023-11-15 Thread John Baldwin

On 11/15/23 3:06 PM, Bakul Shah wrote:




On Nov 15, 2023, at 7:57 AM, John Baldwin  wrote:

On 10/9/23 5:21 PM, Bakul Shah wrote:

Any hints on how to use bhyve's -G  option to debug a VM
kernel? I can connect to it from gdb with "target remote :"
& bhyve stops the VM initially but beyond that I am not sure.
Ideally this should work just like an in-circuit-emulator, not
requiring anything special in the VM or kernel itself.


step only works on Intel CPUs currently (and is a bit fragile
anyway due to interrupts firing while you try to step, but that
happens for me in QEMU as well).   Breakpoints should work fine.
I tend to use 'until' to do stepping (basically stepping via
temporary breakpoints) when debugging the kernel this way.


Thanks for your response!

I can ^C to stop the VM, examine the stack, set breakpoints,
continue etc. but when the breakpoint is hit, kgdb doesn't
regain control -- instead I get the usual

db> ...

prompt on the console. I guess I have to set some sysctl for
this?


Hmm, no, it shouldn't be breaking into DDB in the guest as the
breakpoint exception should be intercepted by the stub and
never made visible to the guest.

--
John Baldwin




Re: [HEADS-UP] Quick update to 14.0-RELEASE schedule

2023-11-15 Thread John Baldwin

On 11/14/23 8:52 PM, Glen Barber wrote:

On Tue, Nov 14, 2023 at 08:10:23PM -0700, The Doctor wrote:

On Wed, Nov 15, 2023 at 02:27:01AM +, Glen Barber wrote:

On Tue, Nov 14, 2023 at 05:15:48PM -0700, The Doctor wrote:

On Tue, Nov 14, 2023 at 08:36:54PM +, Glen Barber wrote:

We are still waiting for a few (non-critical) things to complete before
the announcement of 14.0-RELEASE will be ready.

It should only be another day or so before these things complete.

Thank you for your understanding.



I always just installed my copy.



Ok.  I do not know what exactly is your point, but releases are never
official until there is a PGP-signed email sent.  The email is intended
for the general public of consumers of official releases, not "yeah,
but"s.



However, if you do a freebsd-update upgrade, you can upgrade.

Is that supposed to happen?



That does not say that the freebsd-update bits will not change *until*
the official release announcement has been sent.

In my past 15 years involved in the Project, I think we have been very
clear on that.

A RELEASE IS NOT FINAL UNTIL THE PGP-SIGNED ANNOUNCEMENT IS SENT.

I mean, c'mon, dude.

We really, seriously, for all intents and purposes, cannot be any more
clear than that.

So, yes, *IF* an update necessitates a new freebsd-update build, what
you are running is *NOT* official.

For at least 15 years, we have all said the same entire thing.


Yes, but, if at this point we had to rebuild, it would have to be 14.0.1
or something (which we have done a few times in the past).  It would be
too confusing otherwise once the bits are built and published (where
published means "uploaded to our CDN").  It is the 14.0 release bits,
the only question is if for some reason we had a dire emergency that
meant we had to pull it at the last minute and publish different bits
(under a different release name).

Realistically, once the bits are available, we can't prevent people from
using them, it's just at their own risk to do so until the project says
"yes, we believe these are good".  Granted, they are under the same risk
if they are still running the last RC.  The best way to minimize that
risk going forward is to add more automation of testing/CI to go along
with the process of building release bits so that the build artifacts
from the release build run through CI and are only published if the CI
is green as that would give us greater confidence of "we believe these
are good" before they are uploaded for publishing.

--
John Baldwin




Re: bsdinstall/scriptedpart could not run ;-(

2023-11-15 Thread John Baldwin

On 11/12/23 11:00 PM, KIRIYAMA Kazuhiko wrote:

Hi, all

I usually run bsdinstall with an installerconfig, but
bsdinstall/scriptedpart could not run ;-(

My installerconfig is:

PARTITIONS='nda0 gpt { 200M efi, 804G freebsd-ufs /, 128G freebsd-swap }'
DISTRIBUTIONS='base.txz kernel-dbg.txz kernel.txz lib32.txz tests.txz'
ZFSBOOT_DISKS=""

#!/bin/sh
/bin/mkdir -p /.dake
/bin/cp /usr/share/zoneinfo/Asia/Tokyo /etc/localtime
/bin/cp /root/.cshrc /root/.cshrc.org
/bin/cat <<EOF >> /etc/fstab
192.168.1.17:/.dake /.dake  nfs rw  0   0
EOF
sed -i".bak" -Ee '/^#BDS_install.sh_added:start_line$/,/^#BDS_install.sh_added:end_line$/d' /root/.cshrc
/bin/cat <<'EOF' >> /root/.cshrc
#BDS_install.sh_added:start_line
setenv  PATH    ${PATH}:/.dake/bin
setenv  MGRHOME /usr/home/admin
setenv  OPENTOOLSDIR    /.dake
setenv  DAKEDIR /.dake
#BDS_install.sh_added:end_line
EOF
   :
(snip)
   :

I looked into the bsdinstall script and found that scriptedpart,
which actually runs partedit under the name scriptedpart, does not
destroy the existing partitions. In fact, when I changed
scriptedpart -> partedit in the script as follows, the partition
editor ran at the terminal.


My guess is that it has something to do with commit
23099099196548550461ba427dcf09dcfb01878d, though I don't see how it could work
any differently in this case as the only change to part_config there was to
return if geom_gettree fails, and if it fails, provider_for_name would
presumably have failed anyway.

--
John Baldwin




Re: bhyve -G

2023-11-15 Thread John Baldwin

On 10/9/23 5:21 PM, Bakul Shah wrote:

Any hints on how to use bhyve's -G  option to debug a VM
kernel? I can connect to it from gdb with "target remote :"
& bhyve stops the VM initially but beyond that I am not sure.
Ideally this should work just like an in-circuit-emulator, not
requiring anything special in the VM or kernel itself.


step only works on Intel CPUs currently (and is a bit fragile
anyway due to interrupts firing while you try to step, but that
happens for me in QEMU as well).   Breakpoints should work fine.
I tend to use 'until' to do stepping (basically stepping via
temporary breakpoints) when debugging the kernel this way.
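
As a rough sketch of that workflow: the snippet below only generates a gdb
command file that attaches to the stub, sets a breakpoint, and then uses
'until' instead of 'step'. Port 1234, the breakpoint on vfs_mountroot, and the
helper name gdb_script are arbitrary choices for illustration; it assumes
bhyve was started with -G using the same port.

```sh
#!/bin/sh
# Port the bhyve GDB stub is assumed to listen on (bhyve -G <port>).
GDB_PORT=${GDB_PORT:-1234}

# Emit gdb commands: attach to the stub, stop at a breakpoint, then advance
# with 'until' (which relies on temporary breakpoints rather than single-step).
gdb_script() {
    cat <<EOF
target remote localhost:${GDB_PORT}
break vfs_mountroot
continue
until
EOF
}

# Possible use, with a kernel that has debug symbols (path is an assumption):
#   gdb_script > /tmp/bhyve.gdb
#   gdb -x /tmp/bhyve.gdb /usr/lib/debug/boot/kernel/kernel.debug
```

The same commands can of course be typed interactively; the file form is just
convenient when re-attaching repeatedly.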

--
John Baldwin




Re: KTLS thread on 14.0-RC3

2023-10-31 Thread John Baldwin

On 10/30/23 3:41 AM, Zhenlei Huang wrote:




On Oct 30, 2023, at 12:09 PM, Zhenlei Huang  wrote:




On Oct 29, 2023, at 5:43 PM, Gordon Bergling  wrote:

Hi,

I am currently building a new system, which should be based on 14.0-RELEASE.
Therefore I have been tracking releng/14.0 since its creation and am currently
updating it via the usual buildworld steps.

What I have noticed recently is that the [KTLS] thread is missing. I have a stable/13
system which shows the [KTLS] thread and a very recent -CURRENT that also shows
the [KTLS] thread.

The stable/13 and releng/14.0 systems both use the GENERIC kernel, without any
custom modifications.

Loaded KLDs are also the same.

Did I miss something, or is there something in releng/14.0 missing, which
is currently enabled in stable/13?


KTLS should still work as intended; the creation of its threads is deferred.

See a72ee355646c (ktls: Defer creation of threads and zones until first use)

Run ktls_init() when the first KTLS session is created rather than
unconditionally during boot.  This avoids creating unused threads and
allocating unused resources on systems which do not use KTLS.


```
-SYSINIT(ktls, SI_SUB_SMP + 1, SI_ORDER_ANY, ktls_init, NULL);
```


It seems 14.0 only creates one KTLS thread.

IIRC 13.2 creates one thread per core.


That part should not be different.  There should always be one thread per core.

--
John Baldwin




15/14 upgrades break old sudo, maybe bump PAM's shlib?

2023-09-09 Thread John Baldwin

I upgraded my laptop from a late June current to current from yesterday
today, and after installworld sudo stopped working (dies with a SIGBUS).
After some debugging, the issue ended up being OpenSSL library version
mismatches as sudo uses PAM and PAM is linked against OpenSSL 3, but
sudo is linked against OpenSSL 1.1.1.  Both shlibs get mapped into the
process and at some point sudo crosses the streams and the crash
occurs inside OpenSSL 3's libcrypto.

I realize that we do have a general note about needing to update third
party packages after an upgrade, but I tend to use sudo as part of my
workflow for doing that sort of thing.  I generally build all my own
packages via poudriere and use sudo at various points in that process,
but even if I were using FreeBSD.org packages I would be using sudo to
try to run 'pkg upgrade'.

su(8) in base works fine, so that's my workaround for now on my laptop,
but I wonder if we want to make this particular bump on the upgrade
path a little less bumpy?  Either by being clear in our release notes
that tools like sudo (and I suspect any other third-party su wrappers
that also use PAM, xscreensaver's screen lock doesn't seem to be affected
since it probably doesn't use OpenSSL directly thankfully) can break,
or another route we could take would be to bump the DSO versions of
things that depend on libcrypto/libssl in base.  We did not do this
latter approach for the OpenSSL 1.0.2 -> 1.1.1 upgrade FWIW.

If we wanted to do the shlib bump approach, Enji had a good list from a
while back (though Enji wanted to make them all private rather than
bumping):

- kerberos
- libarchive
- libbsnmp
- libfetch
- libgeli
- libldns
- libmp
- libradius
- libunbound

From my research it seems that PAM (library and modules), gssapi libraries,
and libzfs would also need to be on the list.

libldns is already private as is libunbound, though bumping them might
be safer anyway.

There is no libgeli, instead there is geom_eli.so which has no version,
but hopefully is not widely used in ports the same as PAM.  Note also
that if we did this, we would want to do it for 14.0 as 13.x -> 14 upgrades
are affected in the same way.

--
John Baldwin



Re: user problems when upgrading to v15

2023-09-09 Thread John Baldwin

On 9/2/23 7:11 AM, Dimitry Andric wrote:

On 1 Sep 2023, at 03:42, brian whalen  wrote:


Repeating the entire process:

I created a 13.2 vm with 6 cores and 8GB of ram.

Ran freebsd-update fetch and install.

Ran pkg install git bash ccache open-vm-tools-nox11

Used git clone to get current and ports source files.

Edited /etc/make.conf to use ccache

Ran make -j6 buildworld && make -j6 kernel

I then rebooted in single user mode and did the next steps saving output to a
file with > filename.

etcupdate -p was pretty uneventful. It did show the below and did not prompt
to edit.

root@f15:~ # less etcupdatep
   C /etc/group
   C /etc/master.passwd


This is a problem: the "C" characters mean there were conflicts, and it's
indeed very unfortunate that etcupdate does not immediately force you to
resolve them. Because now you basically have mangled group and master.passwd
files, with conflict markers in them!


No, the conflicted files are in /var/db/etcupdate/conflicts, the files in /etc
are still the old ones at this point and won't be updated until you run
'etcupdate resolve' to fix them.

I suspect what happened here is that Brian chose the 'tf' (theirs-full) option
for 'etcupdate resolve' when he really wanted to do 'e' to edit the conflicted
version.


Immediately after this, you should run "etcupdate resolve", and fix any 
conflicts that it has found.

Note that recently there was a lot of churn due to the removal of $FreeBSD$ 
keywords, and this almost always creates conflicts in the group and passwd 
files. For lots of other files in /etc, the conflicts are resolved 
automatically, but unfortunately not for the files that are essential to log in!



make installworld seemed mostly error free though I did see a nonzero status
for a man page that failed in the man4 directory.

etcupdate -B only showed the below. This was my first build after install.

root@f15:~ # less etcupdateB
Conflicts remain from previous update, aborting.


Yes, that is indeed the problem. You must first resolve conflicts from any
previous etcupdate run, before doing anything else. As to why it does not
immediately force you to do so, and delegates this to a separate step, which
can easily be forgotten, I have no idea.


So that if you are doing scripted upgrades, you don't hang forever in a script.
The intention is that after doing a bunch of scripted installworld + etcupdate's
on various hosts you can use 'etcupdate status' to see if there are any
remaining steps requiring manual intervention.
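
A minimal sketch of that kind of after-the-fact check, assuming that
'etcupdate status' reports each unresolved conflict as a line of the form
"  C /etc/group" (the form shown in the transcript earlier in this thread).
The helper names count_conflicts and report_host are made up for illustration
and are not part of etcupdate:

```sh
#!/bin/sh
# Count unresolved conflicts in etcupdate-style status output read from stdin.
# Assumption: a conflicted file appears as optional whitespace, "C", whitespace,
# then the path.
count_conflicts() {
    # grep -c prints 0 and exits non-zero when nothing matches; keep going.
    grep -c '^[[:space:]]*C[[:space:]]' || true
}

# Print a one-line summary for a host, given status output on stdin.
report_host() {
    host=$1
    n=$(count_conflicts)
    if [ "$n" -gt 0 ]; then
        echo "$host: $n conflict(s); run 'etcupdate resolve'"
    else
        echo "$host: clean"
    fi
}

# On a real host this would be driven by something like:
#   etcupdate status | report_host "$(hostname)"
```

Run across a set of hosts after a batch of upgrades, this only tells you where
manual intervention is still pending; the merge itself is still done
interactively with 'etcupdate resolve'.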

There could be an option to request batched vs interactive updates perhaps.


If I type exit in single user mode to go to multi user mode, the local user
still works. After a reboot the local user still works. This local user can
also sudo as expected. This wasn't the case for the previous build when I first
reported this. However, if I run etcupdate resolve it is still presenting
/etc/group and /etc/master.passwd as problems.

If this is expected behavior for current then no big deal. I just wasn't
sure.


The conflicts themselves are expected, alas. But you _must_ resolve them, 
otherwise you can end up with a mostly-bricked system.


No, the conflict markers are not placed in the versions in /etc.  However,
etcupdate does refuse to do a "new" upgrade until you resolve all the conflicts
from your previous upgrade to ensure that conflicted upgrades aren't missed.

--
John Baldwin




Re: Support for more than 256 CPU cores

2023-05-08 Thread John Baldwin

On 5/5/23 6:38 AM, Ed Maste wrote:

FreeBSD supports up to 256 CPU cores in the default kernel configuration
(on Tier-1 architectures).  Systems with more than 256 cores are
available now, and will become increasingly common over FreeBSD 14’s
lifetime.  The FreeBSD Foundation is supporting the effort to increase
MAXCPU, and PR269572[1] is open to track tasks and changes.

As a project we have scalability work ahead of us to make best use of
high core count machines, but at a minimum we should be able to boot a
GENERIC kernel on such systems, and have an ABI for the FreeBSD 14
release that supports such a configuration.

Some changes have already been committed in support of increased MAXCPU,
including increasing MAX_APIC_ID (commit c8113dad7ed4) and a number of
changes to reduce bloat (such as commits 42f722e721cd, e72f7ed43eef,
78cfa762ebf2 and 74ac712f72cf).

The next step is to increase the maximum cpuset size for userland.
I have this change open in review D39941[2] and an exp-run request in
PR271213[3].  Following that the kernel change for increasing MAXCPU is
in D36838[4].

Additional work on bloat reduction will continue after this change, and
looking forward FreeBSD is going to need ongoing effort from the
community and the FreeBSD Foundation to continue improving scalability.

[1] https://bugs.freebsd.org/269572
[2] https://reviews.freebsd.org/D39941
[3] https://bugs.freebsd.org/271213
[4] https://reviews.freebsd.org/D36838


FWIW, I think it will be useful for main to run with a larger userspace
MAXCPU than kernel for at least a while so that we have better testing
of that configuration and to give headroom for bumping MAXCPU in the
kernel during the 14.x branch.

The only other viable path I think which would be more work would be to
rework cpuset_t in userspace to always use a dynamically sized mask.
This could perhaps be done in an API-preserving manner by making cpuset_t
an opaque wrapper type in userland and requiring CPU_* to indirect to
functions in libc, etc.  That's a fair bit more work however.

--
John Baldwin




Re: How to Enable support for IPsec deprecated algorithms: 3DES, MD5-HMAC

2022-10-04 Thread John Baldwin

On 10/4/22 1:53 AM, alfadev wrote:

Hi, I am trying to move my gateway from FreeBSD 11.0 to FreeBSD 14.0 to use the
newly added ipfw table lookup for MAC addresses
(https://reviews.freebsd.org/D35103)

Also, I have many IPsec connections to Fortigate, Cisco, etc.
Their operators use only 3DES algorithms and have no intention of
changing that for me.
So, now I have to enable 3DES support for FreeBSD 14.0.

To add 3DES support again I changed some files, shown below.
I am not sure about what I did; any help is welcome.


You do not want to just restore the files as-is.  You instead want to revert
some of the diffs from the first commit.  The second commit for /dev/crypto
doesn't matter for IPsec and you can ignore it.

However, you will need to also partially revert commit
0e00c709d7f1cdaeb584d244df9534bcdd0ac527 which removes DES and 3DES from OCF
itself.  This is what removed enc_xform_des for example.

--
John Baldwin



Re: pkg: Newer FreeBSD version for package... but why?

2022-07-13 Thread John Baldwin

On 7/13/22 3:17 AM, Andriy Gapon wrote:

On 2022-07-13 13:09, Michael Gmelin wrote:



On Wed, 13 Jul 2022 10:29:06 +0300
Andriy Gapon  wrote:


# uname -U
1400063

# uname -K
1400063

# pkg upgrade
Updating FreeBSD repository catalogue...
Fetching packagesite.pkg: 100%    5 MiB   4.8MB/s    00:01
Processing entries:   0%
Newer FreeBSD version for package zyre:
To ignore this error set IGNORE_OSVERSION=yes
- package: 1400063
- running kernel: 1400051
Ignore the mismatch and continue? [y/N]:

Does anyone know why this would happen?
Where does pkg get its notion of the running kernel version?



If I'm reading the sources correctly, it's determining the OS version
by looking at the elf headers of various files in this order:

  getenv("ABI_FILE")
  /usr/bin/uname
  /bin/sh

So I would assume that `file /usr/bin/uname` shows 1400051 on your
system.


Thank you very much!  That's it:
# file /usr/bin/uname
/usr/bin/uname: ELF 32-bit LSB executable, ARM, EABI5 version 1
(FreeBSD), dynamically linked, interpreter /libexec/ld-elf.so.1,
FreeBSD-style, for FreeBSD 14.0 (1400051), stripped


You can point it to checking another file by setting ABI_FILE[0] in the
environment or ignore the check by setting IGNORE_OSVERSION (like
advised). The "running kernel:" label seems a bit misleading.
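
To make that comparison concrete, here is a small sketch that pulls the version
number out of file(1) output and compares it with what uname -U reports. It
assumes the "for FreeBSD <release> (<version>)" wording shown in the output
above; abi_osversion and check_osversion are hypothetical helper names, and
this is not how pkg itself does the check (it reads the ELF note directly):

```sh
#!/bin/sh
# Extract the FreeBSD version number from file(1) output supplied on stdin.
# Assumption: the text contains "for FreeBSD 14.0 (1400051)" or similar.
abi_osversion() {
    sed -n 's/.*for FreeBSD [0-9.]* (\([0-9][0-9]*\)).*/\1/p'
}

# Compare the version recorded in a binary with the installed userland version.
check_osversion() {
    binary_ver=$1
    userland_ver=$2
    if [ "$binary_ver" -lt "$userland_ver" ]; then
        echo "stale: binary $binary_ver < userland $userland_ver"
    else
        echo "ok: binary $binary_ver"
    fi
}

# On a FreeBSD host, something like:
#   check_osversion "$(file /usr/bin/uname | abi_osversion)" "$(uname -U)"
```

A "stale" result for /usr/bin/uname or /bin/sh would explain pkg reporting an
older "running kernel" than uname -K does.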


Indeed.

Now the next thing (for me) to research is why the binaries were built
"for FreeBSD 14.0 (1400051)" when the source tree has 1400063 and uname
-U also reports 1400063.
FWIW, this was a cross-build, maybe that played a role too.


If you do a NO_CLEAN=yes build, we don't relink binaries just because
crt*.o changed (where the note is stored).

--
John Baldwin



Re: BLAKE3 unstability?

2022-07-12 Thread John Baldwin

On 7/12/22 1:41 AM, Evgeniy Khramtsov wrote:

I can reproduce via:

$ truncate -s 10G /tmp/test
$ mdconfig -f /tmp/test -S 4096
$ zpool create test /dev/md1
$ zfs create -o checksum=blake3 test/b
$ dd if=/dev/random of=/test/b/noise bs=1M count=4096
$ sync
$ zpool scrub test
$ zpool status


I cannot reproduce this on openzfs/zfs@cb01da68057 (the commit that was
most recently merged) built out of tree on either stable/13 70fd40edb86
or main 9aa02d5120a.

I'll update a system and see if I can reproduce it with the in-tree ZFS.

- Ryan


It did not reproduce for me with in-tree ZFS on main@3c9ad9398fcd either.

Could you share sysctl kstat.zfs.misc.chksum_bench, maybe we are using
different implementations?
I do see that blake3 went in with only a Linux module parameter for the
implementation selection, so I'll have to fix that. For now we can at least
see which was fastest, which should be the one selected. You just won't be
able to manually change it to see if that helps.

- Ryan


I found the culprit (kernel and base from download.FreeBSD.org
kernel.txz and base.txz respectively) (I forgot about local sysctl.conf...):

kern.sched.steal_thresh=1
kern.sched.preempt_thresh=121

Then

#!/bin/sh

truncate -s 10G /tmp/test
mdconfig -f /tmp/test -S 4096
zpool create test /dev/md0
zfs create -o checksum=blake3 test/b
dd if=/dev/random of=/test/b/noise bs=1M count=4096
sync
zpool scrub test
sleep 3
zpool status

zpool destroy test
mdconfig -d -u 0
rm /tmp/test

As for ULE "tuning", these values give me fine desktop interactivity
when building lang/rust when nice and idprio did not help, so I left
them in sysctl.conf. Not sure if scheduling parameters are worthy of
a ZFS PR, maybe something essential is preempted.


It could be missing fpu_kern_enter/leave that lack of preemption would
cover over.  I thought that missing that would give a panic in the
kernel though due to FPU instructions being disabled (including vector
instructions).  Maybe ZFS isn't using fpu_kern_enter(FPU_NOCTX) and is
instead trying to juggle contexts and it has a bug in how it manages
saved FPU contexts and reuses a context?  If so, I would just suggest
that ZFS switch to using FPU_KERN_NOCTX instead which runs all SSE
type code in a critical section to disable preemption but avoids
having to allocate and manage FPU contexts.

--
John Baldwin



Re: Profiled libraries on freebsd-current

2022-05-04 Thread John Baldwin

On 5/4/22 1:38 PM, Steve Kargl wrote:

On Wed, May 04, 2022 at 01:22:57PM -0700, John Baldwin wrote:

On 5/4/22 12:53 PM, Steve Kargl wrote:

On Wed, May 04, 2022 at 11:12:55AM -0700, John Baldwin wrote:

I don't know the entire FreeBSD ecosystem.  Do people
use FreeBSD on embedded systems (e.g., nanobsd) where
libthr may be stripped out?  Thus, --enable-threads=no
is needed.


If they do, they are also using a constrained userland and
probably are not shipping a GCC binary either.  However, it's
not clear to me what --enable-threads means.

Does this enable -pthread as an option?  If so, that should
definitely just always be on.  It's still an option users have
to opt into via a command line flag and doesn't prevent
building non-threaded programs.

If it's enabling use of threads at runtime within GCC itself,
I'd say that also should probably just be allowed to be on.

I can't really imagine what else it might mean (and I doubt
it means the latter).



AFAICT, it controls whether -lpthread is automatically added to
the command line.  In the case of -pg, it is -lpthread_p.
The relevant lines are

#ifdef FBSD_NO_THREADS
#define FBSD_LIB_SPEC "\
   %{pthread: %eThe -pthread option is only supported on FreeBSD when gcc \
is built with the --enable-threads configure-time option.}  \
   %{!shared:   \
 %{!pg: -lc}    \
 %{pg:  -lc_p}  \
   }"
#else
#define FBSD_LIB_SPEC "\
   %{!shared:   \
 %{!pg: %{pthread:-lpthread} -lc}   \
 %{pg:  %{pthread:-lpthread_p} -lc_p}   \
   }\
   %{shared:\
 %{pthread:-lpthread} -lc   \
   }"
#endif

Ed is wondering if one can get rid of FBSD_NO_THREADS. With the
pending removal of WITH_PROFILE, the above reduces to

#define FBSD_LIB_SPEC " \
   %{!shared:\
 %{pthread:-lpthread} -lc\
   } \
   %{shared: \
 %{pthread:-lpthread} -lc\
   }"

If one can do the above, then freebsd-nthr.h is no longer needed
and can be deleted and config.gcc's handling of --enable-threads
can be updated/removed.


Ok, so it's just if -pthread is supported (%{pthread:-lpthread} only
adds -lpthread if -pthread was given on the command line).  That can just
be on all the time and Ed is correct that it is safe to remove the
FBSD_NO_THREADS case and assume it is always present instead.

--
John Baldwin



Re: Profiled libraries on freebsd-current

2022-05-04 Thread John Baldwin

On 5/4/22 12:53 PM, Steve Kargl wrote:

On Wed, May 04, 2022 at 11:12:55AM -0700, John Baldwin wrote:

On 5/2/22 10:37 AM, Steve Kargl wrote:

On Mon, May 02, 2022 at 12:32:25PM -0400, Ed Maste wrote:

On Sun, 1 May 2022 at 11:54, Steve Kargl
 wrote:


diff --git a/gcc/config/freebsd-spec.h b/gcc/config/freebsd-spec.h
index 594487829b5..1e8ab2e1827 100644
--- a/gcc/config/freebsd-spec.h
+++ b/gcc/config/freebsd-spec.h
@@ -93,14 +93,22 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
  (similar to the default, except no -lg, and no -p).  */

   #ifdef FBSD_NO_THREADS


I wonder if we can simplify things now, and remove this
`FBSD_NO_THREADS` case. I didn't see anything similar in other GCC
targets I looked at.


That I don't know.  FBSD_NO_THREADS is defined in freebsd-nthr.h.
In fact, it's the only thing in that header (except copyright
boilerplate).  freebsd-nthr.h only appears in config.gcc and
seems to only get added to the build if someone runs configure
with --enable-threads=no.  Looking at my last config.log for
gcc trunk, I see "Thread model: posix", which appears to be
the default case or if someone does --enable-threads=yes or
--enable-threads=posix.  So, I suppose it comes down to
two questions: (1) is libpthread.* available on all supported
targets and versions? (2) does anyone build gcc without
threads support?


libpthread is available on all supported architectures on all
supported versions.  libthr has been the default threading library
since 7.0 and the only supported library since 8.0.  In GDB I just
assume libthr style threads, and I think GCC can safely do the
same.



I don't know the entire FreeBSD ecosystem.  Do people
use FreeBSD on embedded systems (e.g., nanobsd) where
libthr may be stripped out?  Thus, --enable-threads=no
is needed.


If they do, they are also using a constrained userland and
probably are not shipping a GCC binary either.  However, it's
not clear to me what --enable-threads means.

Does this enable -pthread as an option?  If so, that should
definitely just always be on.  It's still an option users have
to opt into via a command line flag and doesn't prevent
building non-threaded programs.

If it's enabling use of threads at runtime within GCC itself,
I'd say that also should probably just be allowed to be on.

I can't really imagine what else it might mean (and I doubt
it means the latter).

--
John Baldwin



Re: Profiled libraries on freebsd-current

2022-05-04 Thread John Baldwin

On 5/2/22 10:37 AM, Steve Kargl wrote:

On Mon, May 02, 2022 at 12:32:25PM -0400, Ed Maste wrote:

On Sun, 1 May 2022 at 11:54, Steve Kargl
 wrote:


diff --git a/gcc/config/freebsd-spec.h b/gcc/config/freebsd-spec.h
index 594487829b5..1e8ab2e1827 100644
--- a/gcc/config/freebsd-spec.h
+++ b/gcc/config/freebsd-spec.h
@@ -93,14 +93,22 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
 (similar to the default, except no -lg, and no -p).  */

  #ifdef FBSD_NO_THREADS


I wonder if we can simplify things now, and remove this
`FBSD_NO_THREADS` case. I didn't see anything similar in other GCC
targets I looked at.


That I don't know.  FBSD_NO_THREADS is defined in freebsd-nthr.h.
In fact, it's the only thing in that header (except copyright
boilerplate).  freebsd-nthr.h only appears in config.gcc and
seems to only get added to the build if someone runs configure
with --enable-threads=no.  Looking at my last config.log for
gcc trunk, I see "Thread model: posix", which appears to be
the default case or if someone does --enable-threads=yes or
--enable-threads=posix.  So, I suppose it comes down to
two questions: (1) is libpthread.* available on all supported
targets and versions? (2) does anyone build gcc without
threads support?


libpthread is available on all supported architectures on all
supported versions.  libthr has been the default threading library
since 7.0 and the only supported library since 8.0.  In GDB I just
assume libthr style threads, and I think GCC can safely do the
same.

--
John Baldwin



Re: 'set but unused' breaks drm-*-kmod

2022-04-27 Thread John Baldwin

On 4/21/22 6:45 AM, Emmanuel Vadot wrote:

On Thu, 21 Apr 2022 08:51:26 -0400
Michael Butler  wrote:


On 4/21/22 03:42, Emmanuel Vadot wrote:


   Hello Michael,

On Wed, 20 Apr 2022 23:39:12 -0400
Michael Butler  wrote:


Seems this new requirement breaks kmod builds too ..

The first of many errors was (I stopped chasing them all for lack of
time) ..

--- amdgpu_cs.o ---
/usr/ports/graphics/drm-devel-kmod/work/drm-kmod-drm_v5.7.19_3/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c:1210:26:
error: variable 'priority' set but not used
[-Werror,-Wunused-but-set-variable]
   enum drm_sched_priority priority;
   ^
1 error generated.
*** [amdgpu_cs.o] Error code 1



   How are you building the port, directly or with PORTS_MODULES?
   I do make passes over the warnings for drm, and I did so for the
set-but-not-used case, but unfortunately this option doesn't exist in 13.0
so I couldn't apply those in every branch.


I build this directly on -current. I'm guessing that these are what
triggered this behaviour:

commit 8b83d7e0ee54416b0ee58bd85f9c0ae7fb3357a1
Author: John Baldwin 
Date:   Mon Apr 18 16:06:27 2022 -0700

  Make -Wunused-but-set-variable a fatal error for clang 13+ for
kernel builds.

  Reviewed by:imp, emaste
  Differential Revision:  https://reviews.freebsd.org/D34949

commit 615d289ffefe2b175f80caa9b1e113c975576472
Author: John Baldwin 
Date:   Mon Apr 18 16:06:14 2022 -0700

  Re-enable set but not used warnings for kernel builds.

  make tinderbox now passes with this warning enabled as a fatal error,
  so revert the change to hide it in preparation for making it fatal.

  This reverts commit e8e691983bb75e80153b802f47733f1531615fa2.

  Reviewed by:imp, emaste
  Differential Revision:  https://reviews.freebsd.org/D34948




  Ok I see,

  I won't have time until Monday (maybe Tuesday) to fix this, but if
someone wants to beat me to it, we should add some new CWARNFLAGS for
each problematic file in the 5.4-lts and 5.7-stable branches of
drm-kmod (master, which is following 5.10, is already good) only
if ${COMPILER_VERSION} >= 13.


There is already a helper you can use that deals with compiler versions:

CWARNFLAGS+= ${NO_WUNUSED_BUT_SET_VARIABLE}

or some such.

--
John Baldwin



Re: ktrace on NFSroot failing?

2022-03-25 Thread John Baldwin

On 3/10/22 8:14 AM, Mateusz Guzik wrote:

On 3/10/22, Bjoern A. Zeeb  wrote:

Hi,

I am having a weird issue with ktrace on an nfsroot machine:

root:/tmp # ktrace sleep 1
root:/tmp # kdump
-559038242  Events dropped.
kdump: bogus length 0xdeadc0de

Anyone seen something like this before?



I just did a quick check and it definitely fails on nfs mounts:
# ktrace pwd
/root/mjg
# kdump
-559038242  Events dropped.
kdump: bogus length 0xdeadc0de

I don't have time to look into it this week though.


Possibly related: core dumps are no longer working for me on NFS
mounts.  I get a 0 byte foo.core instead of a valid core dump.

--
John Baldwin



Re: Buildworld fails with external GCC toolchain

2022-02-14 Thread John Baldwin

On 2/12/22 11:34 AM, Yasuhiro Kimura wrote:

From: Dimitry Andric 
Subject: Re: Buildworld fails with external GCC toolchain
Date: Fri, 11 Feb 2022 22:53:44 +0100


Not really, the gcc 9 build has been broken for months, as far as I know.

See also: https://ci.freebsd.org/job/FreeBSD-main-amd64-gcc9_build/

The last build(s) show a different error from yours, though:

/workspace/src/tests/sys/netinet/libalias/util.c: In function 'set_udp':
/workspace/src/tests/sys/netinet/libalias/util.c:112:2: error: converting a packed 'struct ip' pointer (alignment 2) to a 'uint32_t' {aka 'unsigned int'} pointer (alignment 4) may result in an unaligned pointer value [-Werror=address-of-packed-member]
   112 |  uint32_t *up = (void *)p;
   |  ^~~~
In file included from /workspace/src/tests/sys/netinet/libalias/util.h:37,
  from /workspace/src/tests/sys/netinet/libalias/util.c:39:
/workspace/src/sys/netinet/ip.h:51:8: note: defined here
51 | struct ip {
   |^~

-Dimitry



Thanks for the information. I went back through the commit history of the
main branch at roughly one-month intervals and checked whether buildworld
succeeds with GCC. But it didn't succeed even when I went back about a year.
And the devel/binutils port was updated to 2.37 last August. So I suspect the
external GCC toolchain hasn't worked well since binutils was updated to the
current version.


I have amd64 world + kernel building with GCC 9 and the only remaining
open review not merged yet is https://reviews.freebsd.org/D34147.

It is work to keep it working though and I hadn't worked on it again
until recently.

--
John Baldwin



Re: Dragonfly Mail Agent (dma) in the base system

2022-01-27 Thread John Baldwin

On 1/27/22 1:34 PM, Ed Maste wrote:

The Dragonfly Mail Agent (dma) is a small Mail Transport Agent (MTA)
which accepts mail from a local Mail User Agent (MUA) and delivers it
locally or to a smarthost for delivery. dma does not accept inbound
mail (i.e., it does not listen on port 25) and is not intended to
provide the same functionality as a full MTA like postfix or sendmail.
It is intended for use cases such as delivering cron(8) mail.

Since 2014 we have a copy of dma in the base system available as an
optional component, enabled via the WITH_DMAGENT src.conf knob.

I am interested in determining whether dma is a viable minimal base
system MTA, and if not what gaps remain. If you have enabled DMA on
your systems (or are willing to give it a try) and have any feedback
or are aware of issues please follow up or submit a PR as appropriate.


I've used DMA on systems without local mail accounts to forward cron
periodic e-mails just fine.  It even supports STARTTLS and SMTP AUTH.
I haven't tried using it for simple local delivery to /var/mail/root.

--
John Baldwin



Re: git: 5e6a2d6eb220 - main - Reapply: move libc++ from /usr/lib to /lib [add /usr/lib/libc++.so.1 -> ../../lib/libc++.so.1 ?]

2022-01-03 Thread John Baldwin

On 1/1/22 9:00 AM, Ed Maste wrote:

On Fri, 31 Dec 2021 at 18:04, John Baldwin  wrote:


However, your point about libcxxrt.so.1 is valid.  It needs to also be moved
to /lib if libc++.so.1 is moved to /lib.


libcxxrt.so.1 has always been in /lib.


Oh, I was thrown off by the .so indirection for libcxxrt in the linker script.

--
John Baldwin



Re: git: 5e6a2d6eb220 - main - Reapply: move libc++ from /usr/lib to /lib [add /usr/lib/libc++.so.1 -> ../../lib/libc++.so.1 ?]

2021-12-31 Thread John Baldwin

On 12/31/21 2:59 PM, Mark Millard wrote:

On 2021-Dec-31, at 14:28, Mark Millard  wrote:


On 2021-Dec-30, at 14:04, John Baldwin  wrote:


On 12/30/21 1:09 PM, Mark Millard wrote:

On 2021-Dec-30, at 13:05, Mark Millard  wrote:

This asks a question in a different direction than my prior
reports about my builds vs. Cy's reported build.

Background:

/usr/obj/BUILDs/main-amd64-nodbg-clang/usr/main-src/amd64.amd64/tmp/usr/lib/libc++.so:GROUP ( /lib/libc++.so.1 /usr/lib/libcxxrt.so
and:
lrwxr-xr-x  1 root  wheel  23 Dec 29 13:17:01 2021 /usr/lib/libcxxrt.so -> ../../lib/libcxxrt.so.1

Why did libc++.so.1 not get a:

/usr/lib/libc++.so.1 -> ../../lib/libc++.so.1

I forgot to remove the .1 on the left hand side:
/usr/lib/libc++.so -> ../../lib/libc++.so.1


Because for libc++.so we don't just symlink to the current version of the
library (as we do for most other shared libraries) to tell the compiler what
to link against for -lc++, instead we use a linker script that tells the
compiler to link against both of those libraries when -lc++ is encountered.


A better identification of what looks odd to me is the
path variations in:

# more /usr/lib/libc++.so


Another not great day on my part: That path alone makes
the mix of /lib/ and /usr/lib/ use involved, given the
reference to /lib/libc++.so.1 . That would still be true
if the other path had been /lib/libcxxrt.so .


/usr/lib/libc++.so is only used by the compiler/linker when linking a binary.
The resulting binary has the associated paths (/lib/libc++.so.1 and
/usr/lib/libcxxrt.so.1) in its DT_NEEDED.  So it is fine for the .so to be
in /usr/lib.  This is the same with /usr/lib/libc.so vs /lib/libc.so.7.

However, your point about libcxxrt.so.1 is valid.  It needs to also be moved
to /lib if libc++.so.1 is moved to /lib.  Doing so will also require yet another
depend-clean.sh fixup (well, probably just adjusting the one I added to
check the libcxxrt path instead of libc++ path).
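
To make the linker-script point concrete, here is a small sketch that lists
the libraries a "GROUP ( ... )" script such as /usr/lib/libc++.so pulls in.
The group_members function is a hypothetical helper, and it assumes the
script is a simple GROUP of absolute paths like the one quoted above; real
linker scripts can be more elaborate.

```shell
#!/bin/sh
# group_members FILE: print each absolute path named in a simple
# "GROUP ( ... )" linker script, one per line.
group_members() {
    # Turn spaces, parentheses and tabs into newlines, then keep only
    # words that look like absolute paths (skipping "/*" comment openers).
    tr -s ' ()\t' '\n\n\n\n' < "$1" | grep '^/[^*]'
}
```

Run against the script from this thread, group_members /usr/lib/libc++.so
would be expected to print /lib/libc++.so.1 and /usr/lib/libcxxrt.so, which
are the two things the linker is told to use for -lc++.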

--
John Baldwin



Re: git: 5e6a2d6eb220 - main - Reapply: move libc++ from /usr/lib to /lib [add /usr/lib/libc++.so.1 -> ../../lib/libc++.so.1 ?]

2021-12-30 Thread John Baldwin

On 12/30/21 1:09 PM, Mark Millard wrote:



On 2021-Dec-30, at 13:05, Mark Millard  wrote:


This asks a question in a different direction than my prior
reports about my builds vs. Cy's reported build.

Background:

/usr/obj/BUILDs/main-amd64-nodbg-clang/usr/main-src/amd64.amd64/tmp/usr/lib/libc++.so:GROUP ( /lib/libc++.so.1 /usr/lib/libcxxrt.so
and:
lrwxr-xr-x  1 root  wheel  23 Dec 29 13:17:01 2021 /usr/lib/libcxxrt.so -> ../../lib/libcxxrt.so.1

Why did libc++.so.1 not get a:

/usr/lib/libc++.so.1 -> ../../lib/libc++.so.1


I forgot to remove the .1 on the left hand side:

/usr/lib/libc++.so -> ../../lib/libc++.so.1


Because for libc++.so we don't just symlink to the current version of the
library (as we do for most other shared libraries) to tell the compiler what
to link against for -lc++, instead we use a linker script that tells the
compiler to link against both of those libraries when -lc++ is encountered.

I have finally reproduced Cy's build error locally and am testing my fix.  If it
works I'll commit it.

--
John Baldwin



Re: smr inp breaks some jail use cases and panics with i915kms don't switch to the console anymore

2021-12-14 Thread John Baldwin

On 12/14/21 9:40 AM, Gleb Smirnoff wrote:

On Tue, Dec 14, 2021 at 09:28:07AM -0800, John Baldwin wrote:
J> > AFAIK, today it will always panic only with WITNESS. Without WITNESS it would
J> > pass through mtx_lock as long as the mutex is not locked.
J>
J> Yes, but the default kernel on head is GENERIC which has witness enabled, hence
J> the out of the box kernel panics reliably. :)
J>
J> > So, do you suggest to push D33340 before finalizing D9?
J>
J> Yes, I think so.

Pushed. And I plan to post new version of D33339 today.


Thanks!

--
John Baldwin



Re: smr inp breaks some jail use cases and panics with i915kms don't switch to the console anymore

2021-12-14 Thread John Baldwin

On 12/13/21 12:25 PM, Gleb Smirnoff wrote:

On Mon, Dec 13, 2021 at 11:56:35AM -0800, John Baldwin wrote:
J> > J> So there are two things here.  The root issue is that the devel/apr1 port
J> > J> runs a configure test for TCP_NDELAY being inherited by accepted sockets.
J> > J> This test panics because prison_check_ip4() tries to lock a prison mutex
J> > J> to walk the IPs assigned to a jail, but the caller (in_pcblookup_hash()) has
J> > J> done an smr_enter() which is a critical_enter():
J> >
J> > The first one is known, and I got a patch to fix it:
J> >
J> > https://reviews.freebsd.org/D33340
J> >
J> > However, a pre-requisite to this simple patch is more complex:
J> >
J> > https://reviews.freebsd.org/D9
J> >
J> > There is some discussion on how to improve that, and I decided to do that
J> > rather than stick to original version. So it takes a few extra days.
J> >
J> > We could push D33340 into main, if the negative effects (raciness of
J> > the prison check) are considered a lesser evil than a potentially contested
J> > mtx_lock in smr section.
J>
J> I think raciness is probably better than always panicking as it does today.

AFAIK, today it will always panic only with WITNESS. Without WITNESS it would
pass through mtx_lock as long as the mutex is not locked.


Yes, but the default kernel on head is GENERIC which has witness enabled, hence
the out of the box kernel panics reliably. :)


So, do you suggest to push D33340 before finalizing D9?


Yes, I think so.

--
John Baldwin



Re: smr inp breaks some jail use cases and panics with i915kms don't switch to the console anymore

2021-12-14 Thread John Baldwin

On 12/14/21 2:14 AM, Alexey Dokuchaev wrote:

How do you mean?  Most FreeBSD people, not some random Twitter crowd, want
the bell to be on by default, but it's still off.


I don't know that that's true, and I myself am not sure that I want it back
on by default.  Previously my laptop had a rather annoying beep whose volume
I couldn't control that I actually prefer to have off normally.  On further
reflection, the beep I was looking for for bad input may actually be an
xscreensaver thing for an invalid character to unlock the screen vs a sysbeep
anyway.

--
John Baldwin



Re: RFC: What to do about Allocate in the NFS server for FreeBSD13?

2021-12-14 Thread John Baldwin

On 12/13/21 8:30 AM, Konstantin Belousov wrote:

On Mon, Dec 13, 2021 at 04:26:42PM +, Rick Macklem wrote:

Hi,

There are two problems with Allocate in the NFSv4.2 server in FreeBSD13:
1 - It uses the thread credentials instead of the ones for the RPC.
2 - It does not ensure that file changes are committed to stable storage.
These problems are fixed by commit f0c9847a6c47 in main, which added
ioflag and cred arguments to VOP_ALLOCATE().

I can think of 3 ways to fix Allocate in FreeBSD13:
1 - Apply a *hackish* patch like this:
+   savcred = p->td_ucred;
+   td->td_ucred = cred;
    do {
        olen = len;
        error = VOP_ALLOCATE(vp, &off, &len);
        if (error == 0 && len > 0 && olen > len)
            maybe_yield();
    } while (error == 0 && len > 0 && olen > len);
+   p->td_ucred = savcred;
    if (error == 0 && len > 0)
        error = NFSERR_IO;
+   if (error == 0)
+       error = VOP_FSYNC(vp, MNT_WAIT, p);
The worst part of it is temporarily setting td_ucred to cred.

2 - MFC'ng commit f0c9847a6c47. Normally changes to the
  VOP/VFS are not MFC'd. However, in this case, it might be
  ok to do so, since it is unlikely there is an out of source tree
  file system with a custom VOP_ALLOCATE() method?

I do not see much wrong with #2, this is what I would do myself.


I also think this is fine.

--
John Baldwin



Re: smr inp breaks some jail use cases and panics with i915kms don't switch to the console anymore

2021-12-13 Thread John Baldwin

On 12/13/21 9:35 AM, Gleb Smirnoff wrote:

   Hi John,

On Mon, Dec 13, 2021 at 07:45:07AM -0800, John Baldwin wrote:
J> So there are two things here.  The root issue is that the devel/apr1 port
J> runs a configure test for TCP_NDELAY being inherited by accepted sockets.
J> This test panics because prison_check_ip4() tries to lock a prison mutex
J> to walk the IPs assigned to a jail, but the caller (in_pcblookup_hash()) has
J> done an smr_enter() which is a critical_enter():

The first one is known, and I got a patch to fix it:

https://reviews.freebsd.org/D33340

However, a pre-requisite to this simple patch is more complex:

https://reviews.freebsd.org/D9

There is some discussion on how to improve that, and I decided to do that
rather than stick to original version. So it takes a few extra days.

We could push D33340 into main, if the negative effects (raciness of
the prison check) are considered a lesser evil than a potentially contested
mtx_lock in smr section.


I think raciness is probably better than always panicking as it does today.
 

J> However, it was a bit harder to see this originally as the i915kms driver
J> tries to do a malloc(M_WAITOK) from cn_grab() when entering DDB which
J> recursively panics (even a malloc(M_NOWAIT) from cn_grab() is probably a
J> bad idea).  When it panicked in X the result was that the screen just froze
J> on whatever it had most recently drawn and the machine looked hung.  (The
J> fact that that sysbeep is off so I couldn't tell if typing in commands was
J> doing anything vs emitting errors probably didn't improve trying to diagnose
J> the hang as "sitting in ddb" initially, though I don't know if DDB itself
J> emits a beep for invalid commands, etc.)

Didn't know about this one. Is this isolated to actually entering DDB or
there is some path that in a normal inpcb lookup we would M_WAITOK?


This is in the drm(4) driver, nothing to do with in_pcb, just made it harder
to see the in_pcb issue.

--
John Baldwin



smr inp breaks some jail use cases and panics with i915kms don't switch to the console anymore

2021-12-13 Thread John Baldwin
[...] improve trying to diagnose
the hang as "sitting in ddb" initially, though I don't know if DDB itself
emits a beep for invalid commands, etc.)

--
John Baldwin



Re: Make etcupdate bootstrap requirement due to previous mergemaster usage more clear in handbook

2021-12-08 Thread John Baldwin

On 12/3/21 6:09 PM, Tomoaki AOKI wrote:

On Fri, 03 Dec 2021 05:54:37 -0800 (PST)
"Jeffrey Bouquet"  wrote:




On Fri, 3 Dec 2021 13:58:39 +0100, Miroslav Lachman <000.f...@quip.cz> wrote:


On 03/12/2021 12:52, Yetoo Happy wrote:

[...]


  Quick Start* and follow the instructions and get to step
7 and may think that even though etcupdate is different from mergemaster
from the last time they used the handbook they have faith that following
the instructions won't brick their system. This user will instead find that
faith in general is just a very complex facade for the pain and suffering
of not following *24.5.6.1 Merging Configuration Files* because the user
doesn't know that step exists or relevant to the current step and ends up
unknowingly having etcupdate append "<<<< yours ... >>>>> new" to the top
of the user's very important configuration files that they didn't expect
the program to actually modify that way when they resolved differences nor
could they predict easily because the diff format is so unintuitive and
different from mergemaster. Now unable to login or boot into single user
mode because redirections instead of the actual configuration is parsed the
user goes to the handbook to find out what might have happened and scrolls
down to find *24.5.6.1 Merging Configuration Files* is under *24.5.6.


[...]

That's why I think etcupdate is not so intuitive as tool like this
should be and etcupdate is extremely dangerous because it intentionally
breaks syntax of files vital to have system up and running.
If anything goes wrong with mergemaster automatic process then you have
configuration not updated which is almost always fine to boot the system
and fix it. But after etcupdate? Much worse...

I maintain about 30 machines for 2 decades and had problems with
etcupdate many times. I had to use mergemaster as fall back many times.
Mainly because of etcupdate said "Reference tree to diff against
unavailable" or "No previous tree to compare against, a sane comparison
is not possible.". And sometimes because etcupdate cannot automatically
update many files in /etc/rc.d and manual merging of a lot of files with
"<<<<  >>>>" is really painful while with mergemaster only simple
keyboard shortcuts will solve it.
All of this must be very stressful for beginners.

So beside the update of documentation I really would like to see some
changes to etcupdate workflow where files are modified in temporary
location and moved to destination only if they do not contain any syntax
breaking changes like <<<<, , >>>>.

Kind regards
Miroslav Lachman



Agree. I fell back to mergemaster this Nov on 13-stable when the /var files
pertaining to etcupdate were all missing current /etc data, and no study of
man etcupdate was clear enough to rectify such a scenario, and suspect my
initial use of etcupdate will or may require a planned reinstall,  not having
had to do so since Jan 2004 iirc,  [ vs failed hard disk migrations ] and
I am just hoping mergemaster stays in /usr/src and updated
for system changes, even if moved to 'tools' or
something, since its use seems intuitive and much less of a black box.
Also, /usr/src/UPDATING still at the bottom emphasizes mergemaster still.



Not sure whether it's fixed or not (too dangerous to try...), but the -n (dry-run)
option of etcupdate is now quite harmful. Maybe caused by a commit done
this April on main (MFC'ed to stable/13 in June).

  *I got busy manually checking and applying changes to /etc, and
   forgot to file PR.

Doing `etcupdate -n` itself runs OK, but following `etcupdate -B` does
NOT do anything, hence nothing is actually updated.
The only workaround I have is NOT to try dry-run.


Humm.


It would be because the same trees are used for dry-run and actual run.
(Not looked into the code. Just a thought.)


So the new changes always build a temporary tree (vs trying to build
/var/db/etcupdate/current in place).  For -n it should be that it just
doesn't change /var/db/etcupdate/current at the end, but if it did the
move anyway that would explain the bug you are seeing.  That does indeed
look broken.  Please file a PR as a reminder for me to fix it.
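
The intended -n behaviour described above can be sketched roughly as
follows. This is not etcupdate's actual code: update_current, the stamp
file and the tree.XXXXXX template are hypothetical stand-ins, and only the
idea (always build in a temporary tree, move it into place only on a real
run) comes from the explanation above.

```shell
#!/bin/sh
# update_current DBDIR DRYRUN: hypothetical stand-in for the tree handling
# described above. A new tree is always built in a temporary directory;
# it replaces DBDIR/current only when DRYRUN is not "yes".
update_current() {
    dbdir=$1
    dryrun=$2
    newtree=$(mktemp -d "$dbdir/tree.XXXXXX") || return 1
    # Placeholder for the real work of building the new tree.
    echo "new" > "$newtree/stamp"

    if [ "$dryrun" = yes ]; then
        # Dry run: discard the temporary tree, leave current untouched.
        rm -rf "$newtree"
        return 0
    fi

    # Real run: rotate current to old, then move the new tree into place.
    rm -rf "$dbdir/old"
    if [ -d "$dbdir/current" ]; then
        mv "$dbdir/current" "$dbdir/old" || return 1
    fi
    mv "$newtree" "$dbdir/current"
}
```

If the dry-run branch performed the rotation anyway, a later real run would
compare two identical trees and find nothing to merge, which would match
the symptom reported above.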

--
John Baldwin



Re: Make etcupdate bootstrap requirement due to previous mergemaster usage more clear in handbook

2021-12-08 Thread John Baldwin

On 12/3/21 4:58 AM, Miroslav Lachman wrote:

So beside the update of documentation I really would like to see some
changes to etcupdate workflow where files are modified in temporary
location and moved to destination only if they do not contain any syntax
breaking changes like <<<<, , >>>>.


This is what etcupdate does, so I'm a bit confused why you are getting
merge markers in /etc.  When an automated 3-way merge doesn't work due
to conflicts, the file with the conflicts is saved in
/var/db/etcupdate/conflicts/.  It is only copied to /etc when you
mark it as fully resolved when running 'etcupdate resolve'.  Perhaps
you had multiple conflicts in a modified file and when editing the file
you only fixed the first one and then marked it as resolved at the
prompt?  Even in that case etcupdate explicitly prompts you a second time
after you say "r" with "File  still has conflicts, are you sure?",
so it will only install a file to /etc with those changes if you have
explicitly confirmed you want it.
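
As an illustration of the kind of check involved, the sketch below refuses
to install a file that still contains merge markers. Both function names
are hypothetical and the pattern is an assumption about what 3-way merge
markers look like; it is not copied from etcupdate itself.

```shell
#!/bin/sh
# has_conflict_markers FILE: succeed if FILE still contains lines that
# start with 3-way merge conflict markers.
has_conflict_markers() {
    grep -Eq '^(<<<<<<<|=======|>>>>>>>|\|\|\|\|\|\|\|)' "$1"
}

# install_if_resolved SRC DEST: copy SRC to DEST only when no markers remain.
install_if_resolved() {
    if has_conflict_markers "$1"; then
        echo "File $1 still has conflicts" >&2
        return 1
    fi
    cp "$1" "$2"
}
```

A check like this is deliberately conservative: a config file that
legitimately has a line starting with "=======" would also be flagged,
which is one reason to ask for confirmation rather than refuse outright.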

--
John Baldwin



Re: amd64 (example) main [so: 14]: delete-old check-old delete-old-libs missing a bunch of files?

2021-12-01 Thread John Baldwin
d without updating 
ObsoleteFiles.inc.


Only in /usr/obj/DESTDIRs/main-amd64-poud/usr/tests/usr.sbin/mixer: Kyuafile
Only in /usr/obj/DESTDIRs/main-amd64-poud/usr/tests/usr.sbin/mixer: mixer_test


Fallout from recent mixer changes?  Hans might know more.


Only in /usr/obj/DESTDIRs/main-amd64-poud/usr/tests/usr.sbin/sa: v1-sparc64-sav.in
Only in /usr/obj/DESTDIRs/main-amd64-poud/usr/tests/usr.sbin/sa: v1-sparc64-sav.out
Only in /usr/obj/DESTDIRs/main-amd64-poud/usr/tests/usr.sbin/sa: v1-sparc64-u.out
Only in /usr/obj/DESTDIRs/main-amd64-poud/usr/tests/usr.sbin/sa: v1-sparc64-usr.in
Only in /usr/obj/DESTDIRs/main-amd64-poud/usr/tests/usr.sbin/sa: v1-sparc64-usr.out
Only in /usr/obj/DESTDIRs/main-amd64-poud/usr/tests/usr.sbin/sa: v2-sparc64-sav.in
Only in /usr/obj/DESTDIRs/main-amd64-poud/usr/tests/usr.sbin/sa: v2-sparc64-u.out
Only in /usr/obj/DESTDIRs/main-amd64-poud/usr/tests/usr.sbin/sa: v2-sparc64-usr.in


I'll commit fixes for some of these.

--
John Baldwin



Re: make cleandiry tries to access /lib/geom

2021-11-24 Thread John Baldwin

On 11/24/21 3:30 AM, Bjoern A. Zeeb wrote:

Hi,

  673 ===> usr.bin/diff/tests (cleandir)
  674 ===> lib/geom (cleandir)
  675 ===> sbin/mount_udf (cleandir)
  676 make[6] warning: /lib/geom: Permission denied.

 not sure what is going on here?

  677 ===> share/i18n/esdb/ISO-8859 (cleandir)
  678 ===> tests/sys/cddl/zfs/tests/cli_root/zfs_clone (cleandir)


I think Jess has a possible fix.  This is some regression added in the
build system several months ago.

--
John Baldwin



Re: "Khelp module "ertt" can't unload until its refcount drops from 1 to 0." after "All buffers synced."?

2021-11-22 Thread John Baldwin

On 11/19/21 4:29 AM, tue...@freebsd.org wrote:

On 19. Nov 2021, at 00:11, Mark Millard  wrote:


On 2021-Nov-18, at 12:31, tue...@freebsd.org  wrote:


On 17. Nov 2021, at 21:13, Mark Millard via freebsd-current  wrote:

I've not noticed the ertt message before in:

. . .
Waiting (max 60 seconds) for system thread `bufspacedaemon-1' to stop... done
All buffers synced.
Uptime: 1d9h57m18s
Khelp module "ertt" can't unload until its refcount drops from 1 to 0.

Hi Mark,

what kernel configuration are you using? What kernel modules are loaded?


The shutdown was of my ZFS boot media but the machine is
currently doing builds on the UFS media. (The ZFS media is
present but not mounted). For now I provide information
from the booted UFS system. The UFS context is intended
to be nearly a copy of the brctl selection for main [so: 14]
from the ZFS media. Both systems have been doing the same
poudriere builds for various comparison/contrast purposes.
The current build activity will likely take 16+ hrs.

Hi Mark,

thanks a lot for the information. I was contemplating whether this message
was related to a recent change in the TCP congestion module handling, but
since it was already there in February, this is not the case. Will try to
reproduce this, but wasn't able up to now.


The congestion control changes just probably exacerbated the bug by
adding a new reference on this module, just as they exposed the bug
with khelp using the wrong SYSINIT subsystem.

--
John Baldwin



Re: cross-compiling for i386 on amd64 fails

2021-11-16 Thread John Baldwin

On 11/15/21 8:34 PM, Michael Butler via freebsd-current wrote:

Haven't had time to identify which change caused this yet but I now get ..

===> lib/libsbuf (obj,all,install)
===> cddl/lib/libumem (obj,all,install)
===> cddl/lib/libnvpair (obj,all,install)
===> cddl/lib/libavl (obj,all,install)
ld: error: /usr/obj/usr/src/i386.i386/tmp/usr/lib/libspl.a(assert.o) is
incompatible with elf_i386_fbsd
===> cddl/lib/libspl (obj,all,install)
cc: error: linker command failed with exit code 1 (use -v to see invocation)
--- libavl.so.2 ---
*** [libavl.so.2] Error code 1

make[4]: stopped in /usr/src/cddl/lib/libavl


My guess is that this was fixed by
git: 9e9c651caceb - main - cddl: fix missing ZFS library dependencies

--
John Baldwin



Re: git: 2f7f8995367b - main - libdialog: Bump shared library version to 10. [ the .so.10 is listed in mk/OptionalObsoleteFiles.inc ?]

2021-10-27 Thread John Baldwin

On 10/27/21 3:23 PM, Mark Millard via freebsd-current wrote:



On 2021-Oct-27, at 15:21, Mark Millard  wrote:


Unfortunately(?) this update added the .so.10 to mk/OptionalObsoleteFiles.inc :

diff --git a/tools/build/mk/OptionalObsoleteFiles.inc b/tools/build/mk/OptionalObsoleteFiles.inc
index a8b0329104c4..91822aac492a 100644
--- a/tools/build/mk/OptionalObsoleteFiles.inc
+++ b/tools/build/mk/OptionalObsoleteFiles.inc
@@ -1663,11 +1663,11 @@ OLD_FILES+=usr/bin/dialog
. . .
OLD_FILES+=usr/lib/libdialog.so
-OLD_FILES+=usr/lib/libdialog.so.8
+OLD_FILES+=usr/lib/libdialog.so.10
. . .

Looks to me like that +line should have been:

+OLD_FILES+=usr/lib/libdialog.so.9

(presuming the original .so.8 was correct during
.so.9 's time frame).



Looks like:

+OLD_FILES+=usr/lib/libdpv.so.3

is the same sort of issue and possibly should have been:

+OLD_FILES+=usr/lib/libdpv.so.2


No, these lines are for removing the current versions of the
libraries if you do 'make delete-old WITHOUT_DIALOG=yes'.

They weren't bumped previously when I bumped them for ncurses
(probably my fault).

--
John Baldwin



Re: main changed DIALOG_STATE, DIALOG_VARS, and DIALOG_COLORS but /usr/lib/libdialog.so.? naming was not adjusted? (crashes in releng/13 programs on main [so: 14] can result)

2021-10-22 Thread John Baldwin

On 10/22/21 1:08 AM, Mark Millard via freebsd-current wrote:

main [so: 14] commit a96ef450 (2021-02-26 09:16:49 +)
changed DIALOG_STATE, DIALOG_VARS, and DIALOG_COLORS .
These are publicly exposed in (ones that I noticed):

/usr/include/dialog.h:extern DIALOG_STATE dialog_state;
/usr/include/dialog.h:extern DIALOG_VARS dialog_vars;
/usr/include/dialog.h:extern DIALOG_COLORS dlg_color_table[];


Then we need to bump libdialog's so version to 10?  (I don't
think libdialog has symbol versioning)

--
John Baldwin



Re: ELF binary type "0" not known. (while compiling buildworld on risc-v/qemu)

2021-09-27 Thread John Baldwin

On 9/27/21 7:40 AM, Karel Gardas wrote:


Hello,

I'm playing with compiling FreeBSD 13 (releng/13.0 2 days ago) and
current (git HEAD as of today) on qemu-5.1.0/qemu-6.1.0 on the riscv64
platform. The emulator invocation is:

qemu-system-riscv64 -machine virt -smp 8 -m 16G -nographic -device
virtio-blk-device,drive=hd -drive
file=FreeBSD-14.0-CURRENT-riscv-riscv64.qcow2,if=none,id=hd -device
virtio-net-device,netdev=net -netdev user,id=net,hostfwd=tcp::2233-:22
-bios /usr/lib/riscv64-linux-gnu/opensbi/generic/fw_jump.elf -kernel
/usr/lib/u-boot/qemu-riscv64_smode/uboot.elf -object
rng-random,filename=/dev/urandom,id=rng -device
virtio-rng-device,rng=rng -nographic -append "root=LABEL=rootfs
console=ttyS0"

and the host is Ubuntu 20.04.x LTS. Both qemu 5.1.0 and qemu 6.1.0 are
compiled from source, but both OpenSBI and u-boot for risc-v are the
Ubuntu-provided packages (meant to accompany the Ubuntu-provided qemu 4.2.1).

My issue while compiling both 13 and current is that compilation after
some time fails with:

root@freebsd:/usr/src # time make -j8 buildworld > /tmp/build-j8-2.txt
ELF binary type "0" not known.
17784.134u 21388.907s 1:50:13.83 592.2% 30721+572k 10+2177io 0pf+0w

I'm curious if this is a known issue either in Qemu or in FreeBSD for
risc-v or if I'm doing anything wrong here?


It is a known issue with how we brand FreeBSD/riscv binaries.  Jess
(cc'd) has a WIP review with a possible fix IIRC.

--
John Baldwin



Re: [HEADSUP] making /bin/sh the default shell for root

2021-09-22 Thread John Baldwin

On 9/22/21 1:36 AM, Baptiste Daroussin wrote:

Hello,

TL;DR: this is not a proposal to deorbit csh from base!!!

For years now, csh has been the default root shell for FreeBSD. csh can be confusing
as a default shell for many, as all other Unix-like systems have settled on a Bourne
shell compatible interactive shell: zsh, bash, or a variant of ksh.

Recently our sh(1) has received updates to make it more user friendly in
interactive mode:
* command completion (thanks pstef@)
* improvement in the emacs mode, to make it behave by default like other shells
* improvement in the vi mode (in particular the vi edit to respect $EDITOR)
* support for history as described by POSIX.

This makes it a usable shell by default, which is why I would like to propose to
make it the default shell for root starting with FreeBSD 14.0-RELEASE (not MFCed).

If no strong arguments have been raised by October 15th, I will make this
proposal happen.

Again just in case: THIS IS NOT A PROPOSAL TO REMOVE CSH FROM BASE!


I think this is fine.  I would also be fine with either removing 'toor' from the
default password file or just leaving it as-is for POLA.  (I would probably
prefer removing it outright.)

--
John Baldwin



Re: rescue/sh check failed, installation aborted

2021-08-23 Thread John Baldwin

On 8/23/21 12:18 PM, Graham Perrin wrote:

Encountered whilst attempting to build and install 14.0-CURRENT over
13.0-RELEASE-p3 (experimental, helloSystem):

<https://i.imgur.com/euFBA8M.png>

Background, condensed, to the best of my recollection:

cd /usr/src
make buildworld    # succeeded
make kernel    # failed
make clean
LOCAL_MODULES=    # added to /etc/src.conf
make kernel-toolchain
make kernel
restarted in single user mode
mount -uw /
service zfs start
cd /usr/src
make installworld – failed as pictured.

I see <https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=231325>, fixed
in 2018.

Any suggestions?


I'm not sure what the 'make clean' would have done.  Did you mean 'make cleanworld'?
If so, you will need to do a 'make buildworld' again before trying to do
'make installworld'.  The error message implies that there is no 'make buildworld'
output in /usr/obj (as if you had run 'make cleanworld' up above where you list
'make clean').

--
John Baldwin



Re: etcupdate: Failed to build new tree

2021-07-14 Thread John Baldwin

On 7/2/21 2:30 AM, Nuno Teixeira wrote:

Hello,

Last update I have some issues with etcupdate:

etcupdate warning: "No previous tree to compare against, a sane comparison
is not possible."

That I corrected with:

etcupdate extract
etcupdate diff > /tmp/etc.diff
patch -R < /tmp/etc.diff
(etcupdate diff doesn't show any diffs.)

Today I've just updated current and etcupdate -p gives:

"Failed to build new tree"

What might be wrong?


You can look in /var/db/etcupdate/log to check for errors.

--
John Baldwin



Re: CURRENT: acpi_wakecode.S error: unknown -Werror warning specifier: '-Wno-error-tautological-compare'

2021-06-22 Thread John Baldwin

On 6/22/21 11:13 AM, O. Hartmann wrote:

Hello,

on a recent CURRENT (FreeBSD 14.0-CURRENT #6 main-n247512-e3be51b2bc7c: Tue Jun 22 15:31:03
CEST 2021 amd64) we build a 13-STABLE based NanoBSD from a dedicated source tree for a small
routing appliance. It should be " ...we built ..." because since the introduction of
LLVM/CLANG 12 on FreeBSD, the build of the source tree fails with the error shown below.

Since these errors are due to some compiler knobs, the question is how to avoid them and make
the tree of 13-STABLE build again?
We do not explicitly cross-compile, so if there is in general an issue with this "brute
force method" I would appreciate any recommendation to avoid such malfunctions using other
techniques - as long as they are moderate to implement.

Thanks in advance and kind regards,


You can use 'make buildworld WITHOUT_SYSTEM_COMPILER=yes' to force your builds to use the
clang 11 included in stable/13 instead of the host clang 12.  You could also MFC the fixes
from head to use -Wno-error= instead of -Wno-error-.
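
For illustration only, the difference between the two spellings can be
shown with a small filter that rewrites the form rejected in the error
from the subject line ('-Wno-error-tautological-compare') into the
'-Wno-error=' form. The fix_wno_error function is a hypothetical helper,
not part of the build system, and the actual fix on head edits the
makefiles rather than filtering flags like this.

```shell
#!/bin/sh
# fix_wno_error FLAGS...: rewrite the "-Wno-error-NAME" spelling to
# "-Wno-error=NAME", leaving every other flag alone. One flag per line.
fix_wno_error() {
    for flag in "$@"; do
        case $flag in
        -Wno-error-*) printf '%s\n' "-Wno-error=${flag#-Wno-error-}" ;;
        *)            printf '%s\n' "$flag" ;;
        esac
    done
}
```

For example, fix_wno_error -Wno-error-tautological-compare prints
-Wno-error=tautological-compare.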

--
John Baldwin



Re: etcupdate warning: "No previous tree to compare against, a sane comparison is not possible."

2021-06-22 Thread John Baldwin

On 6/22/21 12:34 AM, Nuno Teixeira wrote:

Hello,

Should I be worry about etcupdate warning "No previous tree to compare
against, a sane comparison is not possible." when I recompile and update
current?

I receive same warning when I do a 'etcupdate -p' after installworld too.


Yes, this means etcupdate is not merging any changes to /etc.  You should
run 'etcupdate extract' before your next upgrade cycle.  You should then
review the output of 'etcupdate diff' to see if there are files in /etc
that need updating.  If there are files that you want to update to stock
versions you can use 'etcupdate revert /path/to/file'.  Otherwise, you can
use the patch generated by 'etcupdate diff' either as a guide to manually
update files to remove unwanted differences, or as input to patch -R.
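
The bootstrap steps above could be wrapped up roughly like this. The
bootstrap_etcupdate function, the ETCUPDATE override and the default
/tmp/etc.diff path are assumptions made for the sketch; only the
'etcupdate extract', 'etcupdate diff' and 'etcupdate revert' commands and
'patch -R' come from the advice above.

```shell
#!/bin/sh
# ETCUPDATE can be overridden (e.g. with "echo") to preview the steps.
: "${ETCUPDATE:=etcupdate}"

# bootstrap_etcupdate [PATCHFILE]: create the reference tree, then save
# the differences between it and /etc for review.
bootstrap_etcupdate() {
    patchfile=${1:-/tmp/etc.diff}
    "$ETCUPDATE" extract || return 1
    "$ETCUPDATE" diff > "$patchfile"
    echo "Review $patchfile, then use 'etcupdate revert <file>' or 'patch -R'."
}
```

The review step is left manual on purpose: which differences are unwanted
and which are deliberate local changes is a judgment call for each file.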

--
John Baldwin



Re: drm-kmod kernel crash fatal trap 12

2021-06-15 Thread John Baldwin

On 6/15/21 11:22 AM, Bakul Shah wrote:

On Jun 15, 2021, at 9:03 AM, John Baldwin  wrote:


On 6/10/21 8:13 AM, Bakul Shah wrote:

On Jun 10, 2021, at 7:13 AM, Thomas Laus  wrote:

The drm-kmod module is the latest from the pkg server.  It all
worked this past Monday after the recent drm-kmod update.

This is what I did:
git clone https://github.com/freebsd/drm-kmod
ln -s $PWD/drm-kmod /usr/local/sys/modules
Now it gets compiled every time you do make buildkernel.
If things break you can do a git pull in the drm-kmod dir
and rebuild.


This is what I do now as well.  I think this is probably the
sanest approach to use on HEAD at least.


IIRC I learned this from one of your posts.

The PORTS_MODULES approach results in installing kernel modules in
/boot/modules, which doesn't track /boot/kernel*/.


Yes, PORTS_MODULES is not so great when you are building test kernels
from branches that are different points in time and then go back to
booting your "stock" kernel as the module is now built against the
wrong ABI and breaks your "stock" kernel.

This is why I added LOCAL_MODULES and the SRC knob to drm-kmod,
but the source knob is a bit bumpy in practice as you sometimes
need newer source than your current package.  (For example, if your
"stock" kernel only changes every few months, but you pull newer
work trees for test kernels.)  For that case, it has proven simpler
to just do the direct checkout that I can git pull when needed.

--
John Baldwin



Re: drm-kmod kernel crash fatal trap 12

2021-06-15 Thread John Baldwin

On 6/10/21 8:13 AM, Bakul Shah wrote:

On Jun 10, 2021, at 7:13 AM, Thomas Laus  wrote:

The drm-kmod module is the latest from the pkg server.  It all
worked this past Monday after the recent drm-kmod update.


This is what I did:

git clone https://github.com/freebsd/drm-kmod
ln -s $PWD/drm-kmod /usr/local/sys/modules

Now it gets compiled every time you do make buildkernel.
If things break you can do a git pull in the drm-kmod dir
and rebuild.


This is what I do now as well.  I think this is probably the
sanest approach to use on HEAD at least.

--
John Baldwin



Re: Files in /etc containing empty VCSId header

2021-06-08 Thread John Baldwin

On 6/7/21 12:58 PM, Ian Lepore wrote:

On Mon, 2021-06-07 at 13:53 -0600, Warner Losh wrote:

On Mon, Jun 7, 2021 at 12:26 PM John Baldwin  wrote:


On 5/20/21 9:37 AM, Michael Gmelin wrote:

Hi,

After a binary update using freebsd-update, all files in /etc contain
"empty" VCS Id headers, e.g.,

  $ head  /etc/nsswitch.conf
  #
  # nsswitch.conf(5) - name service switch configuration file
  # $FreeBSD$
  #
  group: compat
  group_compat: nis
  hosts: files dns
  netgroup: compat
  networks: files
  passwd: compat

After migrating to git, I would've expected those to contain something
else or disappear completely. Is this expected and are there any plans
to remove them completely?


I believe we might eventually remove them in the future, but doing so
right now would introduce a lot of churn and the conversion to git
had enough other churn going on.



We'd planned on not removing things that might be merged to stable/12 since
those releases (12.3 only I think) will be built out of svn. We'll likely
start to remove things more widely as the stable/12 branch reaches EOL and
after.

Warner


It would be really nice if, instead of just deleting the $FreeBSD$
markers, they could be replaced with the path/filename of the file in
the source tree.  Sometimes it's a real interesting exercise to figure
out where a file on your runtime system comes from in the source world.
All the source tree layout changes that happened for packaged-base
makes it even more interesting.


My hope is that we un-break src/etc. :(  A few folks have looked at doing
that (notably Kyle).

--
John Baldwin



Re: Files in /etc containing empty VCSId header

2021-06-07 Thread John Baldwin

On 5/20/21 9:37 AM, Michael Gmelin wrote:

Hi,

After a binary update using freebsd-update, all files in /etc contain
"empty" VCS Id headers, e.g.,

 $ head  /etc/nsswitch.conf
 #
 # nsswitch.conf(5) - name service switch configuration file
 # $FreeBSD$
 #
 group: compat
 group_compat: nis
 hosts: files dns
 netgroup: compat
 networks: files
 passwd: compat

After migrating to git, I would've expected those to contain something
else or disappear completely. Is this expected and are there any plans
to remove them completely?


I believe we might eventually remove them in the future, but doing so
right now would introduce a lot of churn and the conversion to git
had enough other churn going on.

--
John Baldwin



Re: etcupdate -p: No previous tree to compare against, a sane comparison is not possible. (was: Review D28062 …)

2021-04-26 Thread John Baldwin

On 4/24/21 4:42 AM, Graham Perrin wrote:

On 21/04/2021 18:19, John Baldwin wrote:

On 4/17/21 12:52 PM, Graham Perrin wrote:

2)

<https://reviews.freebsd.org/D28062#change-5KzY5tEtVUor> line 2274

etcupdate -p

I get:

   > No previous tree to compare against, a sane comparison is not possible.


Hmm, how did you initially install this machine?  Release images should
generally include a pre-populated /var/db/etcupdate so that etcupdate
works.  If you don't have one of those, you will have to perform an
initial bootstrap of etcupdate (only once) by running 'etcupdate extract'.
If you do this before you update /usr/src then 'etcupdate' will later
work fine.  If you are doing this after you have already updated
/usr/src, you will need to run 'etcupdate diff' after 'etcupdate extract'
and fix any unexpected local differences in the generated patch, e.g.
by copying files from /var/db/etcupdate/current/etc to /etc.  Once
you have done this, 'etcupdate' will work fine on the next upgrade.

However, I'm curious how you didn't get the etcupdate bootstrap when
you initially installed.



Sorry for not replying sooner.

It's not an answer to your question, but might the thread at
<https://lists.freebsd.org/pipermail/freebsd-current/2021-April/079538.html>
be relevant?


Yes, you might indeed have hit this bug (which has since been fixed).
You might have to 'etcupdate extract' and then manually review
'etcupdate diff' to see if you have any unexpected diffs to recover.
Sorry. :-/

--
John Baldwin
___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: Despite the documentation, "etcupdate extract" handles -D destdir (and its contribution to the default workdir)

2021-04-26 Thread John Baldwin

On 4/24/21 12:22 PM, Mark Millard via freebsd-current wrote:

# etcupdate -?
Illegal option -?

usage: etcupdate [-npBF] [-d workdir] [-r | -s source | -t tarball]
                 [-A patterns] [-D destdir] [-I patterns] [-L logfile]
                 [-M options]
       etcupdate build [-B] [-d workdir] [-s source] [-L logfile] [-M options]
                 <tarball>
       etcupdate diff [-d workdir] [-D destdir] [-I patterns] [-L logfile]
       etcupdate extract [-B] [-d workdir] [-s source | -t tarball] [-L logfile]
                 [-M options]
       etcupdate resolve [-p] [-d workdir] [-D destdir] [-L logfile]
       etcupdate status [-d workdir] [-D destdir]

The "etcupdate extract" material does not show -D destdir as valid.


Thanks, it was a documentation oversight that I've just fixed.  It is definitely
supposed to work and is quite useful for cross-builds (e.g. I use it frequently
to update rootfs images for RISC-V or MIPS that I run under qemu, or to update
the SD card for my RPi that I cross-build on an x86 host).

--
John Baldwin


Re: "etcupdate -p" vs. $OLDTREE?

2021-04-23 Thread John Baldwin

On 4/23/21 6:59 AM, Olivier Cochard-Labbé wrote:

On Fri, Apr 23, 2021 at 1:10 PM David Wolfskill wrote:


After the set of updates to etcupdate (main-n246232-0611aec3cf3a ..
main-n246235-ba30215ae0ef), I find that "etcupdate -B -p" is working as
expected, but after the following "make installworld", a subsequent
"etcupdate -B" chugs along for a bit, then stops, whining:

No previous tree to compare against, a sane comparison is not possible.




Same problem here while using /usr/src/tools/build/beinstall.sh:
(...)
Skipping blacklisted certificate
/usr/share/certs/trusted/Verisign_Class_1_Public_Primary_Certification_Authority_-_G3.pem
(/etc/ssl/blacklisted/ee1365c0.0)
Skipping blacklisted certificate
/usr/share/certs/trusted/Verisign_Class_2_Public_Primary_Certification_Authority_-_G3.pem
(/etc/ssl/blacklisted/dc45b0bd.0)
Scanning /usr/local/share/certs for certificates...
+ [ -n etcupdate ]
+ update_etcupdate
+ /usr/src/usr.sbin/etcupdate/etcupdate.sh -s /usr/src -D
/tmp/beinstall.MZ4oy8/mnt -F
No previous tree to compare against, a sane comparison is not possible.
+ return 1
+ [ 1 -ne 0 ]
+ errx 'etcupdate (post-world) failed!'
+ cleanup
(...)


Sorry, this should be fixed.  beinstall.sh still has a bug in that it needs
this change:

diff --git a/tools/build/beinstall.sh b/tools/build/beinstall.sh
index 46c65d87e61a..fab21edc0fd5 100755
--- a/tools/build/beinstall.sh
+++ b/tools/build/beinstall.sh
@@ -133,7 +133,7 @@ update_mergemaster() {
 
 update_etcupdate_pre() {
    ${ETCUPDATE_CMD} -p -s ${srcdir} -D ${BE_MNTPT} ${ETCUPDATE_FLAGS} || return $?
-   ${ETCUPDATE_CMD} resolve -D ${BE_MNTPT} || return $?
+   ${ETCUPDATE_CMD} resolve -p -D ${BE_MNTPT} || return $?
 }
 
 update_etcupdate() {



--
John Baldwin


Re: Review D28062 of /usr/src/UPDATING with regard to upgrading FreeBSD and inconsistency with the FreeBSD Handbook

2021-04-21 Thread John Baldwin

On 4/17/21 12:52 PM, Graham Perrin wrote:

2)

<https://reviews.freebsd.org/D28062#change-5KzY5tEtVUor> line 2274

etcupdate -p

I get:

  > No previous tree to compare against, a sane comparison is not possible.


Hmm, how did you initially install this machine?  Release images should
generally include a pre-populated /var/db/etcupdate so that etcupdate
works.  If you don't have one of those, you will have to perform an
initial bootstrap of etcupdate (only once) by running 'etcupdate extract'.
If you do this before you update /usr/src then 'etcupdate' will later
work fine.  If you are doing this after you have already updated
/usr/src, you will need to run 'etcupdate diff' after 'etcupdate extract'
and fix any unexpected local differences in the generated patch, e.g.
by copying files from /var/db/etcupdate/current/etc to /etc.  Once
you have done this, 'etcupdate' will work fine on the next upgrade.

However, I'm curious how you didn't get the etcupdate bootstrap when
you initially installed.
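
A minimal sketch of the bootstrap check described above; it is not part of
etcupdate itself.  It assumes the work directory keeps its reference tree in
a "current" subdirectory (as the /var/db/etcupdate/current/etc path suggests).
The function names needs_bootstrap and bootstrap_plan are made up for
illustration, and the sketch only prints the commands rather than running them.

```sh
#!/bin/sh
# Hypothetical helpers: decide whether etcupdate still needs its one-time
# bootstrap and print the commands to run by hand.

needs_bootstrap() {
    # $1: etcupdate work directory; defaults to the path named in the thread.
    workdir=${1:-/var/db/etcupdate}
    # Assumption: "current" holds the tree that later runs compare against.
    [ ! -d "$workdir/current" ]
}

bootstrap_plan() {
    if needs_bootstrap "${1:-}"; then
        echo "etcupdate extract"
        echo "etcupdate diff"
    else
        echo "nothing to do"
    fi
}
```

With no argument, bootstrap_plan checks /var/db/etcupdate.  The output of
'etcupdate diff' still has to be reviewed by hand before copying anything
into /etc.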

--
John Baldwin


Re: Review D28062 of /usr/src/UPDATING with regard to upgrading FreeBSD and inconsistency with the FreeBSD Handbook

2021-04-21 Thread John Baldwin

On 4/18/21 1:48 AM, driesm.michi...@gmail.com wrote:

If etcupdate -p fails before make installworld, then should the subsequent
etcupdate be with or without option -B ?


-p and -B are not related.

-p deals with changes needed for a correct run of installworld (see above).

-B uses a freshly built world to speed up the tree comparison; although no
flags work fine here as well, so -B is not necessarily required.


Technically -B speeds up generating the tree to compare against: it assumes
/usr/obj is up to date and uses that instead of rebuilding some files that
buildworld normally generates.  The actual tree comparison isn't affected.
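
As a rough sketch of where the two options fit in a source upgrade:
upgrade_plan is a hypothetical helper that only prints the steps.  It assumes
the ordering discussed in this thread ('etcupdate -p' before installworld, a
full 'etcupdate' afterwards) and takes optional extra flags such as -B.

```sh
#!/bin/sh
# Hypothetical helper: print the etcupdate steps around installworld.
# Pass "-B" only if /usr/obj is current, i.e. buildworld has just completed.

upgrade_plan() {
    flags=${1:-}
    # -p: only the changes installworld itself depends on.
    echo "etcupdate -p${flags:+ $flags}"
    echo "make installworld"
    # Full merge of the remaining /etc changes.
    echo "etcupdate${flags:+ $flags}"
}
```

Calling 'upgrade_plan -B' prints the variant that reuses the freshly built
world; calling it without arguments prints the slower but equivalent steps.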

--
John Baldwin


Re: Getting started with ktls

2021-03-22 Thread John Baldwin

On 3/18/21 8:31 AM, tech-lists wrote:

On Wed, Mar 17, 2021 at 08:39:02PM +, Rick Macklem wrote:


Make sure you've done the following:
ktls_ocf - is loaded
these sysctls are set to 1
kern.ipc.tls.enable
kern.ipc.mb_use_ext_pgs


[on stable/13]

% sysctl kern.ipc.tls.enable kern.ipc.mb_use_ext_pgs
kern.ipc.tls.enable: 1
kern.ipc.mb_use_ext_pgs: 1

% kldstat | grep ktls
71 0x0135300025520 ktls_ocf.ko
%

% sysctl -a | fgrep kern.ipc.tls.stats
kern.ipc.tls.stats.ocf.retries: 0
kern.ipc.tls.stats.ocf.separate_output: 0
kern.ipc.tls.stats.ocf.inplace: 0
kern.ipc.tls.stats.ocf.tls13_gcm_crypts: 0
kern.ipc.tls.stats.ocf.tls12_gcm_crypts: 0
kern.ipc.tls.stats.ocf.tls11_cbc_crypts: 0
kern.ipc.tls.stats.ocf.tls10_cbc_crypts: 0
kern.ipc.tls.stats.switch_failed: 0
kern.ipc.tls.stats.switch_to_sw: 0
kern.ipc.tls.stats.switch_to_ifnet: 0
kern.ipc.tls.stats.failed_crypto: 0
kern.ipc.tls.stats.corrupted_records: 0
kern.ipc.tls.stats.active: 0
kern.ipc.tls.stats.enable_calls: 535
kern.ipc.tls.stats.offload_total: 0
kern.ipc.tls.stats.sw_rx_inqueue: 0
kern.ipc.tls.stats.sw_tx_inqueue: 0
kern.ipc.tls.stats.threads: 4
%


FYI, you can do this a bit more efficiently with just
'sysctl kern.ipc.tls.stats'.

The 'enable_calls' count means that OpenSSL is trying to offload connections,
but those attempts are all failing (offload_total is a count of how many
of those setsockopt() calls succeed).  If you are familiar with DTrace,
you can use some DTrace probes to see why 'ktls_enable_tx' and 'ktls_enable_rx'
are failing, or, failing that, add printfs.  For example, does
ktls_create_session() fail, or does ktls_try_sw() fail?  It is probably easiest
to debug this with a userland application using OpenSSL rather than with NFS
over TLS.
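
A small sketch of the comparison described above, assuming the
"name: value" output format shown in the quoted sysctl listing.  The function
name ktls_summary is made up for illustration; it only reads text on stdin and
does not query the kernel itself.

```sh
#!/bin/sh
# Hypothetical helper: summarise KTLS offload attempts versus successes.
# Reads "name: value" lines, e.g. from: sysctl kern.ipc.tls.stats

ktls_summary() {
    awk -F': ' '
        $1 == "kern.ipc.tls.stats.enable_calls"  { calls = $2 }
        $1 == "kern.ipc.tls.stats.offload_total" { ok = $2 }
        END {
            printf "attempts=%d succeeded=%d\n", calls, ok
            # Attempts were made but none succeeded: the case in this thread.
            if (calls > 0 && ok == 0) print "all offload attempts failed"
        }'
}
```

On a FreeBSD host this would be used as
'sysctl kern.ipc.tls.stats | ktls_summary'.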

--
John Baldwin


Re: On 14-CURRENT: no ports options anymore?

2021-03-22 Thread John Baldwin

On 3/13/21 12:58 PM, Guido Falsi via freebsd-current wrote:

On 13/03/21 20:17, Hartmann, O. wrote:

Since I moved on to 14-CURRENT, I face a very strange behaviour when trying to
set options via "make config" or via poudriere accordingly. I always get
"===> Options unchanged" (when options have already been set and I'd expect a
dialog menu). This misbehaviour is throughout ALL 14-CURRENT systems (the oldest
is at FreeBSD 14.0-CURRENT #49 main-n245422-cecfaf9bede9: Fri Mar 12 16:08:09
CET 2021 amd64).

I do not see such a behaviour with 13-STABLE, 12-STABLE, 12.2-RELENG.

How to fix this? What happened?


I encountered something similar, some base shared library has changed,
guess this is related with the ncurses changes in base.

If I remember correctly force reinstalling dialog4ports package fixed
it. Make sure you reinstall a freshly rebuilt one.

Most probably anything using ncurses will require rebuild/reinstall.

The cause is dialog4ports failing to start and the system sees no option
changed.

If that's not enough try

# ldd -v /usr/local/bin/dialog4ports

And see if it reports some useful information.


There was an ABI breakage for ncurses that broke 12.x dialog4ports binaries.
The shared library versions for everything that depended on ncurses were bumped
for 13 and 14 after the branch of stable/13.0 (commit 6e1fe6d26ea2).  After
that commit, if you upgraded from 12 to 13 you should have been fine, but if
you had updated before that, the 12.x dialog4ports was still going to fail
as the 12.x versions of those libraries were already broken.  I haven't checked
to see if the affected libraries have been added to misc/compat12x.

--
John Baldwin


Re: Getting started with ktls

2021-03-11 Thread John Baldwin

On 3/10/21 4:18 PM, Alan Somers wrote:

I'm trying to make ktls work with "zfs send/recv" to substantially reduce
the CPU utilization of applications like zrepl.  But I have a few questions:

* ktls(4)'s "Transmit" section says "Once TLS transmit is enabled by a
successful set of the TCP_TXTLS_ENABLE socket option", but the "Supported
Libraries" section says "Applications using a supported library should
generally work with ktls without any changes".  These sentences seem to be
contradictory.  I think it means that the TCP_TXTLS_ENABLE option is
necessary, but OpenSSL sets it automatically?


Yes, you can do it by hand if you want but you'd have to do all the key
exchange by hand as well.


* When using OpenSSL, the library will automatically call setsockopt(_,
TCP_TXTLS_ENABLE).  But it swallows the error, if any.  How is an
application to tell if ktls is enabled on a particular socket or OpenSSL
session?


Use BIO_get_ktls_send() and BIO_get_ktls_recv() on the write and read BIOs of
the connection, respectively.


* From experiment, I can see that OpenSSL attempts to set
TCP_TXTLS_ENABLE.  But it doesn't try to set TCP_RXTLS_ENABLE.  Why not?
 From reading ktls_start and ossl_statem_server_post_work, it looks like
maybe a single socket cannot have ktls enabled for both sending and
receiving at the same time.  Is that true?


Neither FreeBSD nor OpenSSL yet support RX offload on TLS 1.3.  If you use
TLS 1.2 you will get KTLS in both directions (or if you use TLS 1.1 with
TOE offload on a Chelsio T6).

--
John Baldwin


Re: (n244517-f17fc5439f5) svn stuck forever in /usr/ports?

2021-02-25 Thread John Baldwin

On 2/11/21 9:59 AM, Hartmann, O. wrote:

On Wed, 10 Feb 2021 07:21:20 +0100
"Hartmann, O."  wrote:


On Tue, 9 Feb 2021 15:15:38 -0800
John Baldwin  wrote:


On 2/9/21 2:16 PM, Hartmann, O. wrote:

On Wed, 3 Feb 2021 17:34:24 +0100
Guido Falsi via freebsd-current  wrote:
 

On 03/02/21 17:02, John Baldwin wrote:

On 2/2/21 10:16 PM, Hartmann, O. wrote:

On Mon, 1 Feb 2021 03:24:45 +
Rick Macklem  wrote:
   

Rick Macklem wrote:

Guido Falsi wrote:
[good stuff snipped]

Performed a full bisect. Tracked it down to commit aa906e2a4957, adding
KTLS support to embedded OpenSSL.

I filed a bug report about this:

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=253135


Apart from switching to svn:// scheme, another workaround is to build
base using WITHOUT_OPENSSL_KTLS.

Just fyi, when I tested the daemons I have for nfs-over-tls (which use ktls),
they acted like things were ok (no handshake problems), but the data
ended up on the wire unencrypted (nfs-over-tls doesn't do a SSL_write(),
so it depends on ktls to do the encryption).

Since these daemons work fine with openssl3 in ports/security/openssl-devel,
I suspect the ktls backport is not quite right. I've sent jhb@ email.

I was wrong on the above. I did a full buildworld/installworld and the daemons
now seem to work with the openssl in head/main.

Btw, did anyone try rebuilding svn from sources after doing
the system upgrade?
(The openssl library calls and .h files definitely changed.)


Yes, I did, on all boxes, and it's a pain in the a...; we had to rebuild EVERY
port (at least, I did, to avoid further problems). Yesterday, one of our fastest
boxes got ready, and even with a full rebuild of the system AND a full rebuild
of the ports (no poudriere, traditional way via make), the Apache 2.4 webservice
doesn't work, and neither does subversion (Firefox reports problems with the SSL
handshake, subversion is stuck/frozen forever).
I will run another full world build today, hopefully finishing on Friday
(portmaster -dfR doesn't get everything in line on some ports, I assume).

oh


I tracked the subversion hang down to a bug in serf (an Apache library used by
subversion).  It would also affect any other software using serf.  The serf in
ports will also have to be patched.



I submitted your patch as a bug report to the serf port:

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=253214



What is the status of this bug?
As PR 253214 might suggest, the patch to www/serf has been committed. We still
face a problem with FreeBSD CURRENT-14 based systems running Apache24:

FreeBSD 14.0-CURRENT #4 main-n244672-866c8b8d5dd: Mon Feb  8 08:38:59 CET 2021 amd64

/usr/ports is at Revision: 564736.

www/apache24, www/serf have been rebuilt using "portmaster -f www/apache24
www/serf".

Restarting Apache 2.4 still fails on any access with SSL enabled, firefox reports:

SSL_ERROR_HANDSHAKE_UNEXPECTED_ALERT


This is the first report I've had after the serf update.

Here's an untested patch that is similar to the serf bug.  You would
apply this in the www/apache24 port.

Index: files/patch-modules_ssl_ssl__engine__io.c
===
--- files/patch-modules_ssl_ssl__engine__io.c   (nonexistent)
+++ files/patch-modules_ssl_ssl__engine__io.c   (working copy)
@@ -0,0 +1,11 @@
+--- modules/ssl/ssl_engine_io.c.orig   2021-02-09 15:09:39.362123000 -0800
++++ modules/ssl/ssl_engine_io.c        2021-02-09 15:12:13.59669 -0800
+@@ -542,7 +542,7 @@ static int bio_filter_in_gets(BIO *bio, char *buf, int
+
+ static long bio_filter_in_ctrl(BIO *bio, int cmd, long num, void *ptr)
+ {
+-    return -1;
++    return 0;
+ }
+
+ #if MODSSL_USE_OPENSSL_PRE_1_1_API
   


Thank you very much for investigating and the patch.

I haven't got the chance to apply the patch yet, I'll do so within the next two
hours. For the record: I filed a PR on this specific problem in Apache 2.4,
please see here:

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=253394

Kind regards,

O. Hartmann



I tried the patch, it doesn't work.
Assuming that it is sufficient to recompile the whole OS from a scratch/clean
tree and then recompile every port required by www/apache24, then applying the
patch, I tried to connect to pages served by the 14-CURRENT server running the
patched Apache 2.4 (ports tree at the most recent state at that time); I still
get the error described above.

Kind regards,

oh


I finally reproduced this today and was able to at least get a valid response
back from the server using openssl s_client as the client with a larger version
of this patch.

You can find the full patch at https://reviews.freebsd.org/D28932

--
John Baldwin


Re: (n244517-f17fc5439f5) svn stuck forever in /usr/ports?

2021-02-09 Thread John Baldwin

On 2/9/21 2:16 PM, Hartmann, O. wrote:

On Wed, 3 Feb 2021 17:34:24 +0100
Guido Falsi via freebsd-current  wrote:


On 03/02/21 17:02, John Baldwin wrote:

On 2/2/21 10:16 PM, Hartmann, O. wrote:

On Mon, 1 Feb 2021 03:24:45 +
Rick Macklem  wrote:
  

Rick Macklem wrote:

Guido Falsi wrote:
[good stuff snipped]

Performed a full bisect. Tracked it down to commit aa906e2a4957, adding
KTLS support to embedded OpenSSL.

I filed a bug report about this:

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=253135


Apart from switching to svn:// scheme, another workaround is to build
base using WITHOUT_OPENSSL_KTLS.

Just fyi, when I tested the daemons I have for nfs-over-tls (which use ktls),
they acted like things were ok (no handshake problems), but the data
ended up on the wire unencrypted (nfs-over-tls doesn't do a SSL_write(),
so it depends on ktls to do the encryption).

Since these daemons work fine with openssl3 in ports/security/openssl-devel,
I suspect the ktls backport is not quite right. I've sent jhb@ email.

I was wrong on the above. I did a full buildworld/installworld and the daemons
now seem to work with the openssl in head/main.

Btw, did anyone try rebuilding svn from sources after doing
the system upgrade?
(The openssl library calls and .h files definitely changed.)


Yes, I did, on all boxes, and it's a pain in the a...; we had to rebuild EVERY
port (at least, I did, to avoid further problems). Yesterday, one of our fastest
boxes got ready, and even with a full rebuild of the system AND a full rebuild
of the ports (no poudriere, traditional way via make), the Apache 2.4 webservice
doesn't work, and neither does subversion (Firefox reports problems with the SSL
handshake, subversion is stuck/frozen forever).
I will run another full world build today, hopefully finishing on Friday
(portmaster -dfR doesn't get everything in line on some ports, I assume).

oh


I tracked the subversion hang down to a bug in serf (an Apache library used by
subversion).  It would also affect any other software using serf.  The serf in
ports will also have to be patched.
   


I submitted your patch as a bug report to the serf port:

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=253214



What is the status of this bug?
As PR 253214 might suggest, the patch to www/serf has been committed. We still
face a problem with FreeBSD CURRENT-14 based systems running Apache24:

FreeBSD 14.0-CURRENT #4 main-n244672-866c8b8d5dd: Mon Feb  8 08:38:59 CET 2021 amd64

/usr/ports is at Revision: 564736.

www/apache24, www/serf have been rebuilt using "portmaster -f www/apache24
www/serf".

Restarting Apache 2.4 still fails on any access with SSL enabled, firefox reports:

SSL_ERROR_HANDSHAKE_UNEXPECTED_ALERT


This is the first report I've had after the serf update.

Here's an untested patch that is similar to the serf bug.  You would
apply this in the www/apache24 port.

Index: files/patch-modules_ssl_ssl__engine__io.c
===
--- files/patch-modules_ssl_ssl__engine__io.c   (nonexistent)
+++ files/patch-modules_ssl_ssl__engine__io.c   (working copy)
@@ -0,0 +1,11 @@
+--- modules/ssl/ssl_engine_io.c.orig   2021-02-09 15:09:39.362123000 -0800
++++ modules/ssl/ssl_engine_io.c        2021-02-09 15:12:13.59669 -0800
+@@ -542,7 +542,7 @@ static int bio_filter_in_gets(BIO *bio, char *buf, int
+
+ static long bio_filter_in_ctrl(BIO *bio, int cmd, long num, void *ptr)
+ {
+-    return -1;
++    return 0;
+ }
+
+ #if MODSSL_USE_OPENSSL_PRE_1_1_API

--
John Baldwin


Re: (n244517-f17fc5439f5) svn stuck forever in /usr/ports?

2021-02-03 Thread John Baldwin

On 2/2/21 10:16 PM, Hartmann, O. wrote:

On Mon, 1 Feb 2021 03:24:45 +
Rick Macklem  wrote:


Rick Macklem wrote:

Guido Falsi wrote:
[good stuff snipped]

Performed a full bisect. Tracked it down to commit aa906e2a4957, adding
KTLS support to embedded OpenSSL.

I filed a bug report about this:

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=253135


Apart from switching to svn:// scheme, another workaround is to build
base using WITHOUT_OPENSSL_KTLS.

Just fyi, when I tested the daemons I have for nfs-over-tls (which use ktls),
they acted like things were ok (no handshake problems), but the data
ended up on the wire unencrypted (nfs-over-tls doesn't do a SSL_write(),
so it depends on ktls to do the encryption).

Since these daemons work fine with openssl3 in ports/security/openssl-devel,
I suspect the ktls backport is not quite right. I've sent jhb@ email.

I was wrong on the above. I did a full buildworld/installworld and the daemons
now seem to work with the openssl in head/main.

Btw, did anyone try rebuilding svn from sources after doing
the system upgrade?
(The openssl library calls and .h files definitely changed.)


Yes, I did, on all boxes, and it's a pain in the a...; we had to rebuild EVERY
port (at least, I did, to avoid further problems). Yesterday, one of our fastest
boxes got ready, and even with a full rebuild of the system AND a full rebuild
of the ports (no poudriere, traditional way via make), the Apache 2.4 webservice
doesn't work, and neither does subversion (Firefox reports problems with the SSL
handshake, subversion is stuck/frozen forever).
I will run another full world build today, hopefully finishing on Friday
(portmaster -dfR doesn't get everything in line on some ports, I assume).

oh


I tracked the subversion hang down to a bug in serf (an Apache library used by
subversion).  It would also affect any other software using serf.  The serf in
ports will also have to be patched.

--
John Baldwin


Re: Can In-Kernel TLS (kTLS) work with any OpenSSL Application?

2021-01-25 Thread John Baldwin

On 1/20/21 12:21 PM, Neel Chauhan wrote:

Hi freebsd-current@,

I know that In-Kernel TLS was merged into the FreeBSD HEAD tree a while
back.

With 13.0-RELEASE around the corner, I'm thinking about upgrading my
home server, well if I can accelerate any SSL application.

I'm asking because I have a home server on a symmetrical Gigabit
connection (Google Fiber/Webpass), and that server runs a Tor relay. If
you're interested in how Tor works, the EFF has a writeup:
https://www.eff.org/pages/what-tor-relay

But the main point for you all is: more-or-less Tor relays deal with
1000s TLS connections going into and out of the server.

Would In-Kernel TLS help with an application like Tor (or even load
balancers/TLS termination), or is it more for things like web servers
sending static files via sendfile() (e.g. CDN used by Netflix).


It depends.  Applications which allow OpenSSL to use a socket directly
(e.g. via SSL_set_fd() or via SSL_connect() or the like) will work with
kernel TLS transparently.  This includes things like apache, nginx,
fetch, wget, curl, etc.  However, some applications use OpenSSL purely
as a data transformation library and manage the socket I/O separately
(e.g. OpenVPN).  KTLS will not work with these applications since
OpenSSL doesn't "know" about the socket in question.


My server could also work with Intel's QuickAssist (since it has an
Intel Xeon "Scalable" CPU). Would QuickAssist SSL be more helpful here?


You can use this with ktls_ocf.ko and the qat(4) drivers.

I am working, btw, on merging KTLS into base OpenSSL and hope to have
it present in 13.0.

As you noted, applications would need to be changed to use SSL_sendfile()
to get the best performance on TX.  We don't really have an analog on
the receive side in our syscall API.  One might be able to do some
creative things with aio_read(2) perhaps, but I haven't implemented that.

Also, currently RX offload always returns individual records with the
full TLS header via recvmsg().  Linux's RX offload only includes the
message for non-application-data messages so that one could in theory
do bulk read(2) calls larger than a single TLS record.  OpenSSL itself
though always reads a single TLS record at a time, so if I were to change
this (e.g. with a new socket option to toggle headers for application
data), this would only be relevant to software that "knew" it was using
KTLS and would use direct read/write after letting OpenSSL (or a similar
library) handle the handshake.

--
John Baldwin


Re: service -e doesn't really sort does it? the cool tip is slightly off

2021-01-25 Thread John Baldwin

On 1/16/21 3:28 PM, Dennis Clarke wrote:

root@rhea:/usr/src/freebsd-src # diff -u usr.bin/fortune/datfiles/freebsd-tips.orig usr.bin/fortune/datfiles/freebsd-tips
--- usr.bin/fortune/datfiles/freebsd-tips.orig  2021-01-15 00:37:37.863506000 +
+++ usr.bin/fortune/datfiles/freebsd-tips   2021-01-16 07:46:57.335803000 +
@@ -517,7 +517,7 @@

 -- Lars Engels
  %
-If you want to get a sorted list of all services that are started when FreeBSD boots,
+If you want to get a list of all services that are started when FreeBSD boots,
  enter "service -e".

 -- Lars Engels
root@rhea:/usr/src/freebsd-src #

Sorry for being all OCD here. Perhaps it should say sorted in the order
in which they were started. Something like that.


I think the "sorted" detail is relevant, but describing the key (how it is
sorted) is certainly a missing detail, maybe:

If you want to get a list of all services started when FreeBSD boots
in the order they are started, enter "service -e".

?

--
John Baldwin


Re: git non-time-sequential logs

2021-01-05 Thread John Baldwin
On 1/4/21 8:52 AM, John Kennedy wrote:
> On Mon, Jan 04, 2021 at 08:22:56AM -0800, John Kennedy wrote:
>> The git logs in /usr/src aren't time-sequential, so maybe I shouldn't trust
>> those dates above (I pulled it ~Jan 3rd and let it compile overnight), but
>> I'm going to repull all the sources and recompile, just in case.  I might
>> have initially pulled it during the git conversion and maybe it is confused.
> 
> This might be perfectly natural and just new to me, but when I look at the
> git logs this morning I see things like this (editing by me):
> 
>   commit e5df46055add3bcf074c9ba275ceb4481802ba04 (HEAD -> main, freebsd/main, freebsd/HEAD)
>   Author: Emmanuel Vadot 
>   Date:   Mon Jan 4 17:30:00 2021 +0100
> 
>   commit f61a3898bb989edef7ca308043224e495ed78f64
>   Author: Emmanuel Vadot 
>   Date:   Mon Dec 14 18:56:56 2020 +0100
> 
>   commit b6cc69322a77fa778b00db873781be04f26bd2ee
>   Author: Emmanuel Vadot 
>   Date:   Tue Dec 15 13:50:00 2020 +0100
> 
>   commit 4401fa9bf1a3f2a7f2ca04fae9291218e1ca56bf
>   Author: Emmanuel Vadot 
>   Date:   Mon Jan 4 16:23:10 2021 +0100
> 
>   This is a fresh clone+pull off of anon...@git.freebsd.org:src.git.
> 
>   I've always assumed that the "Date:" there was when the commit happened,
> so they'd be increasing (most recent on top), but I suppose that you might
> have developers in branches that are committing to their branch at one
> point in time and it's getting merged into current (main) later, but the
> original date is preserved?
> 
>   I guess I only care because I was trying to use time to bisect the
> time I thought the problem might have been introduced.

For commits to gdb (which uses git), the project asks that all series be
rebased via 'git rebase --ignore-date' prior to pushing to master to give
monotonically increasing commit dates.  We could do something similar in
FreeBSD either by asking folks to do that explicitly (though I know I
sometimes forget when pushing to gdb myself), or we could avoid direct
pushes to main.  One option some folks mentioned on IRC was to have a
separate "staging" branch that developers push to and then have a bot that
does a rebase --ignore-date of that branch to main periodically, though
that opens the question of how to deal with cherry-picks to stable (for
which asking developers to do a rebase --ignore-date prior to pushing is
probably the simpler approach).

If we did want monotonically increasing dates without having a staging
branch, we could perhaps use a server-side push hook to reject them and
developers would just have to do a rebase --ignore-date before pushing
again.
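
For anyone bisecting by date in the meantime, here is a small sketch that
finds the first point where dates go backwards in a history.  It is not a
proposed hook, and first_date_inversion is a made-up name.  It assumes input
lines of the form "<unix-timestamp> <hash>", newest commit first, which
something like git log --format='%at %h' can produce (%at is the author
timestamp, the one 'git log' prints as "Date:" by default).

```sh
#!/bin/sh
# Hypothetical helper: report the first commit whose date is later than
# that of its child.  Exit status is 1 if an inversion is found, else 0.

first_date_inversion() {
    awk '
        NR > 1 && $1 > prev {
            printf "%s has a later date than its child %s\n", $2, prevhash
            found = 1
            exit
        }
        { prev = $1; prevhash = $2 }
        END { exit (found + 0) }'
}
```

Usage would be along the lines of
"git log --first-parent --format='%at %h' main | first_date_inversion".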

-- 
John Baldwin


Re: Enabling AESNI by default

2020-12-31 Thread John Baldwin
On 12/31/20 11:51 AM, Allan Jude wrote:
> We've had the AESNI module for quite a few years now, and it has not
> caused any problems.
> 
> I am wondering if there are any objections to including it in GENERIC,
> so that users get the benefit without having to have the "tribal
> knowledge" that 'to accelerate kernel crypto (GELI, ZFS, IPSEC, etc),
> you need to load aesni.ko'
> 
> Userspace crypto that uses openssl or similar libraries is already
> taking advantage of these CPU instructions if they are available, by
> excluding this feature from GENERIC we are just causing the "out of the
> box" experience to be very, very slow for crypto.
> 
> For example, writing 1MB blocks to a GELI encrypted swap-backed md(4)
> device:
> 
> with 8 jobs on a 10 core Intel Xeon CPU E5-2630 v4 @ 2.20GHz
> 
> fio --filename=/dev/md0.eli --device=1 --name=geli --rw=write --bs=1m
> --numjobs=8 --iodepth=16 --end_fsync=1 --ioengine=pvsync
> --group_reporting --fallocate=none --runtime=60 --time_based
> 
> 
> stock:
> write: IOPS=530, BW=530MiB/s (556MB/s) (31.1GiB/60012msec)
> 
> with aesni.ko loaded:
> write: IOPS=2824, BW=2825MiB/s (2962MB/s) (166GiB/60002msec)
> 
> 
> Does anyone have a compelling reason to deny our users the 5x speedup?

I think this is fine.  I do hope to add AES support to ossl(4) at
some point, and it may be that ossl(4) will supplant aesni(4) (and
armv8crypto(4)) at that point, but aesni in GENERIC makes sense
right now (and I'd say to MFC it to 12).  armv8crypto in arm64
GENERIC will make sense once AES-XTS support (currently in a review)
lands.

-- 
John Baldwin


Re: Intel TigerLake NVMe vmd: Adding Support & Debugging a Patch

2020-12-31 Thread John Baldwin
On 12/31/20 2:40 PM, Chuck Tuffli wrote:
> On Wed, Dec 30, 2020 at 4:38 PM Neel Chauhan  wrote:
>>
>> Hi Chuck,
>>
>> On 2020-12-30 10:04, Chuck Tuffli wrote:
>>> What is the output from
>>> # pciconf -rb pci0:0:14:0 0x40:0x48
>>
>> The output is:
>>
>> 01 00 00 00 01 2e 68 02  00
> 
> Perfect. The Linux driver says the 8086:9a0b device you have "... may
> provide root port configuration information which limits bus
> numbering" which causes the code to read the VM Capability register
> (0x40) and the VM Configuration register (0x44). Here, VMCAP = 0x0001
> where bit 0 set appears to mean the config register has starting bus
> number information. VMCFG = 0x2e01 where bits 5:4 give the coded start
> number of bus 224 or 0xe0 which matches the PCI bridge shown in the
> lspci output (i.e. 1:e0:06.0).
> 
> I wonder if mirroring the logic in [1] and setting
> bus->rman.rm_start = 224;
> in vmd_attach() might help.
> 
>> I was also able to stop kernel panics by adding:
>>
>> rman_fini(&sc->vmd_bus.rman);
>>
>> In the fail: statement in vmd_attach().
>>
>> But I still cannot detect the SSD.
> 
> [1] 
> https://github.com/torvalds/linux/blob/master/drivers/pci/controller/vmd.c#L507

You will also need to subtract that starting bus number from the bus number used
to compute the offset into the PCI-express region for config register read/write
as this code does:

https://github.com/torvalds/linux/blob/master/drivers/pci/controller/vmd.c#L339

Also, that means vmd_bus.c can't hardcode reading from bus 0.  Instead,
vmd(4) might need to export an IVAR to vmd_bus(4) that is the starting bus
number and vmd_bus needs to use that instead of hardcoding 0.

-- 
John Baldwin


Re: Enabling AESNI by default

2020-12-31 Thread John Baldwin
On 12/31/20 12:15 PM, Franco Fichtner wrote:
> https://cgit.freebsd.org/src/commit/sys/crypto/aesni?h=stable/12&id=95b37a4ed741fd116809d0f2cb295c4e9977f5b6
> 
> may have subtly broken a number of IPsec installations by stalling active
> connections after certain amounts of traffic transferred.  We're still
> trying to confirm, but it looks like this had an overall impact on 12.0
> and 12.1 except that only one person in OPNsense traced it back to aesni.ko,
> to our knowledge, to effectively work around an apparent issue there.
> 
> If that is not the actual fix, the problem still exists in 12.2 and onward ;)

We don't support AES-CCM for IPsec, so there is 0 chance that commit has any
effect on IPsec in 12.  There's not much detail in the forum posts though
(e.g. netstat -s output to get ipsec, esp, and ah stats).  Also, at least
one forum post mentioned it happened when doing an upgrade from 11.2 to 12.1
which is a larger set of changes.  I know the pfsense folks had a major
performance regression due to iflib with Intel e1000 devices that might
manifest as this perhaps?  Disabling aesni might just be throttling the
connection slow enough to avoid hitting a bug in a NIC driver for example.
I think netstat -s would be a better place to start to try to debug this.

> https://github.com/opnsense/core/issues/4415

-- 
John Baldwin


Re: PCIe Root Port/Bus Not Detected in VMD

2020-12-31 Thread John Baldwin
On 12/30/20 9:45 PM, Neel Chauhan wrote:
> For reference, I am attaching the `pciconf -lv` and `acpidump -dt` 
> dumps.

Hmm, the acpidump doesn't have the -d contents, only the -t, and
PCI bridges are generally enumerated in the -d part.  These
PCI bridges aren't enumerated in ACPI though, so that probably
doesn't matter.

The dinfo getting 0x means that somehow the way the PCI config
access is being handled for the child devices in vmd.c is wrong for
this bridge.  You might have to spelunk in the Linux driver to see
if the logic in vmd_read_config() and vmd_write_config() is
correct.

-- 
John Baldwin


Re: git and the loss of revision numbers

2020-12-29 Thread John Baldwin
On 12/29/20 7:11 AM, monochrome wrote:
> ok, this appears to be what I was looking for
> 
> example:
> git reset --hard f20c0e331
> then:
> git pull --ff-only
> is again able to update as normal
> 
> I should point out also that this is from the point of view of any 
> random person just building freebsd from source, not a developer, so 
> there are no local changes. Though it does blow away changes to the conf 
> file, that's a lesser issue to deal with.

One other thing to consider is that if you are trying to track down
a regression, you can use 'git bisect' to do this and it will do
the binary search for you.  In the case of searching for a regression,
you will be better served by that tool than trying to use
'git reset --hard' directly.

The other approach you can use to avoid having to look up hashes would
be to create your own local tags each time you update.  Then you can
easily go back to that tag by name instead of having to look up the
hash.

-- 
John Baldwin


Re: panic: Assertion pgrp->pg_jobc > 0 failed at kern_proc.c:816

2020-12-28 Thread John Baldwin
On 12/28/20 12:24 PM, John Baldwin wrote:
> I got this panic again today in a VM when quitting a gdb
> session after killing a child process via 'kill'.
> 
> panic: Assertion pgrp->pg_jobc > 0 failed at 
> /git/bhyve/sys/kern/kern_proc.c:816
> cpuid = 1
> time = 1609185862
> KDB: stack backtrace:
> db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfe00946547f0
> vpanic() at vpanic+0x181/frame 0xfe0094654840
> panic() at panic+0x43/frame 0xfe00946548a0
> _refcount_update_saturated() at _refcount_update_saturated/frame 
> 0xfe00946548e0
> killjobc() at killjobc+0x6a6/frame 0xfe0094654940
> exit1() at exit1+0x6af/frame 0xfe00946549b0
> sys_sys_exit() at sys_sys_exit+0xd/frame 0xfe00946549c0
> amd64_syscall() at amd64_syscall+0x12e/frame 0xfe0094654af0
> fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfe0094654af0
> --- syscall (1, FreeBSD ELF64, sys_sys_exit), rip = 0x8024c8f0a, rsp = 
> 0x7fffe358, rbp = 0x7fffe370 ---
> KDB: enter: panic
> [ thread pid 44034 tid 102484 ]
> Stopped at  kdb_enter+0x37: movq$0,0x10ab066(%rip)
> 
> From what I can tell, the child process that was killed via 'kill' has
> not yet exited and is stuck in ptracestop() from fork():
> 
> (kgdb) where
> #0  sched_switch (td=0xfe0094001a00, flags=) at 
> /git/bhyve/sys/kern/sched_ule.c:2147
> #1  0x80bf4015 in mi_switch (flags=266) at 
> /git/bhyve/sys/kern/kern_synch.c:542
> #2  0x80bfeba5 in thread_suspend_switch (td=, 
> p=)
> at /git/bhyve/sys/kern/kern_thread.c:1477
> #3  0x80bef04b in ptracestop (td=0xfe0094001a00, sig=17, si=0x0)
> at /git/bhyve/sys/kern/kern_sig.c:2642
> #4  0x80ba1a54 in fork_return (td=0xfe0094001a00, 
> frame=0xfe0094671b00)
> at /git/bhyve/sys/kern/kern_fork.c:1106
> #5  0x80ba18b0 in fork_exit (callout=0x80ba1950 
> , arg=0xfe0094001a00, 
> frame=0xfe0094671b00) at /git/bhyve/sys/kern/kern_fork.c:1069
> #6  
> #7  0x0008007b71aa in ?? ()
> 
> kgdb can't find the panicking process due to the zombproc removal, so I will
> have to go work on kgdb to recover from that change. :(

I've come up with a shorter reproducer (original was trying to debug a perl
script in OpenSSL's test suite).

Compile this program:

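#include <sys/types.h>
#include <sys/wait.h>
#include <err.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int
main(void)
{
	pid_t pid, wpid;

	/* gdb's "catch fork" stops the parent right after this call. */
	pid = fork();
	if (pid == -1)
		err(1, "fork");
	if (pid == 0) {
		printf("I'm in the child\n");
		exit(1);
	}
	printf("I'm in the parent\n");

	/* Reap the child so it does not linger as a zombie. */
	wpid = waitpid(pid, NULL, 0);
	if (wpid < 0)
		err(1, "waitpid");

	return (0);
}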

Then in gdb do the following:

# gdb101 ./forktest
...
(gdb) catch fork
Catchpoint 1 (fork)
(gdb) r
Starting program: /mnt/forktest/forktest 

Catchpoint 1 (forked process 830), _fork () at _fork.S:4
4   _fork.S: No such file or directory.
(gdb) kill
Kill the program being debugged? (y or n) y
[Inferior 1 (process 828) killed]
(gdb) quit


-- 
John Baldwin


panic: Assertion pgrp->pg_jobc > 0 failed at kern_proc.c:816

2020-12-28 Thread John Baldwin
I got this panic again today in a VM when quitting a gdb
session after killing a child process via 'kill'.

panic: Assertion pgrp->pg_jobc > 0 failed at /git/bhyve/sys/kern/kern_proc.c:816
cpuid = 1
time = 1609185862
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfe00946547f0
vpanic() at vpanic+0x181/frame 0xfe0094654840
panic() at panic+0x43/frame 0xfe00946548a0
_refcount_update_saturated() at _refcount_update_saturated/frame 
0xfe00946548e0
killjobc() at killjobc+0x6a6/frame 0xfe0094654940
exit1() at exit1+0x6af/frame 0xfe00946549b0
sys_sys_exit() at sys_sys_exit+0xd/frame 0xfe00946549c0
amd64_syscall() at amd64_syscall+0x12e/frame 0xfe0094654af0
fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfe0094654af0
--- syscall (1, FreeBSD ELF64, sys_sys_exit), rip = 0x8024c8f0a, rsp = 
0x7fffe358, rbp = 0x7fffe370 ---
KDB: enter: panic
[ thread pid 44034 tid 102484 ]
Stopped at  kdb_enter+0x37: movq$0,0x10ab066(%rip)

From what I can tell, the child process that was killed via 'kill' has
not yet exited and is stuck in ptracestop() from fork():

(kgdb) where
#0  sched_switch (td=0xfe0094001a00, flags=) at 
/git/bhyve/sys/kern/sched_ule.c:2147
#1  0x80bf4015 in mi_switch (flags=266) at 
/git/bhyve/sys/kern/kern_synch.c:542
#2  0x80bfeba5 in thread_suspend_switch (td=, 
p=)
at /git/bhyve/sys/kern/kern_thread.c:1477
#3  0x80bef04b in ptracestop (td=0xfe0094001a00, sig=17, si=0x0)
at /git/bhyve/sys/kern/kern_sig.c:2642
#4  0x80ba1a54 in fork_return (td=0xfe0094001a00, 
frame=0xfe0094671b00)
at /git/bhyve/sys/kern/kern_fork.c:1106
#5  0x80ba18b0 in fork_exit (callout=0x80ba1950 , 
arg=0xfe0094001a00, 
frame=0xfe0094671b00) at /git/bhyve/sys/kern/kern_fork.c:1069
#6  
#7  0x0008007b71aa in ?? ()

kgdb can't find the panicking process due to the zombproc removal, so I will
have to go work on kgdb to recover from that change. :(

-- 
John Baldwin


Re: OpenZFS and L2ARC

2020-09-09 Thread John Baldwin
On 9/9/20 1:30 AM, Stefan Esser wrote:
> Am 09.09.20 um 08:46 schrieb Stefan Esser:
>> Am 09.09.20 um 00:45 schrieb Graham Perrin:
>>> Recalling
>>> <https://lists.freebsd.org/pipermail/freebsd-current/2020-March/075661.html>,
>>> on 28/03/2020 15:17, Allan Jude wrote:
>>>
>>>  >> …
>>>  >>
>>>  >> Basically 'arc' was converted to a subtree.
>>>  >>
>>>  >> We should add some backwards compat sysctls to cover some of
>>>  >> these renames etc so configs and scripts don't break etc.
>>
>> This is not possible for quite a number of sysctls, since there is
>> no simple 1:1 mapping for many of them.
>>
>>
>> And there is an annoyance that I had noticed before but now have
>> tracked down:
>>
>> $ time sysctl kstat.zfs.misc.dbufs | wc
>>     55327 2047031 16333472
>>
>> real    0m16,446s
>> user    0m0,055s
>> sys    0m16,397s
>>
>> Somebody decided to put a complete list of dbufs under this sysctl
>> and thus querying "kstat.zfs.misc" takes that long (16 seconds to
>> generate 16 MB of output on my system), even if only a few other
>> values in "kstat.zfs.misc" are needed.
>>
>> I do not know whether there is any chance to get that debug output
>> moved out of the "misc", e.g. into a new "debug" sub-tree. I'm afraid,
>> that on Linux there are scripts that expect it under this name.
>>
>> If it is not acceptable to the upstream, we should locally modify the
>> sysctl tree to move that variable out of "misc", IMHO. (While not
>> taking much time, "kstat.zfs.misc.dbgmsg" should also be relocated to
>> a "debug" sub-tree, IMHO ...)
>>
>> zfs-stats needs tens of values from "misc", and if they are not all
>> added individually to the Kstat array, this will limit the response
>> time to any zfs-stats invocation.
>>
>> It is not too hard to add the new variables in zfs-stats and to
>> adapt the calculations to derive meaningful values to display.
>>
>> But if it always takes 16 seconds to generate any output, I'm not
>> likely to use it too often ...
> 
> Update: I have created a fork of zfs-stats to work on:
> 
> https://github.com/stesser/zfs-stats
> 
> Initial change is to work around the long delay mentioned above and to
> use the correct name for the vdev cache size variable and to display
> the size, data contents and the corresponding compression factor of the
> compressed L2ARC.
> 
> I'll create pull requests to inform the upstream of these changes.

A simple fix might be to use CTLFLAG_SKIP so that you only invoke the
expensive sysctls if you request them by name, but not if you request
the 'kstat.zfs' tree.

-- 
John Baldwin


Re: weird Ctrl-T debug messages

2020-06-27 Thread John Baldwin
On 6/27/20 2:59 AM, Michael Gmelin wrote:
> 
> 
> On Sat, 27 Jun 2020 12:06:17 +0300
> Andriy Gapon  wrote:
> 
>> On 27/06/2020 10:44, Li-Wen Hsu wrote:
>>> On Sat, Jun 27, 2020 at 3:04 PM Hartmann, O.
>>>  wrote:  
>>>>
>>>> Running poudriere on recent CURRENT with (recent) 12-STABLE and
>>>> CURRENT jails reveals a weird behaviour recently when hitting
>>>> Ctrl-T:  
>>> ...  
>>>> Is this debug fallout from /bin/sh?  
>>>
>>> It's because kern.tty_info_kstacks is on by default now:
>>>
>>> https://svnweb.freebsd.org/changeset/base/362141  
>>
>> May I suggest that the stack trace is printed procstat -kk style
>> (single line) ? I think that the more compact output would be more
>> convenient.
> 
> It's a cool feature and having it on by default on CURRENT certainly
> helps to discover it, which is great. Thanks for implementing this!
> 
> I wouldn't enable it by default on RELEASE versions though, as CTRL-T
> is a user interface to get status information (at least this is how I
> use it personally, e.g., while running commands like dd[0], cp, mv,
> poudriere etc.), not for getting debug output.

I agree with this.

> Question: Speaking of discovering the feature, wouldn't it make sense
> to document this tunable on the stack(9) and/or tty(4) man page(s)?

This sounds like a great idea.  Would you be able to come up with a patch?
I'd be happy to review it.

-- 
John Baldwin


Re: acpi timer reads all ones [Was: efirtc + atrtc at the same time]

2020-05-27 Thread John Baldwin
On 5/27/20 2:05 PM, Hans Petter Selasky wrote:
> On 2020-05-27 15:41, Justin Hibbits wrote:
>> On Wed, 27 May 2020 06:27:16 -0700
>> John Baldwin  wrote:
>>
>>> On 5/27/20 2:39 AM, Andriy Gapon wrote:
>>>> On 27/05/2020 11:13, Andriy Gapon wrote:
>>>>> I added more diagnostics and it seems to support the idea that the
>>>>> problem is related to I/O cycles and bridges.
>>>>>
>>>>> ACPI timer suddenly starts returning 0x and that lasts for
>>>>> tens of microseconds before the timer goes back to returning
>>>>> normal values with an expected increase.
>>>>> AMD provides a proprietary way to access ACPI registers via MMIO
>>>>> (0xfed808xx). That mechanism is unaffected, ACPI timer register
>>>>> always returns good values.
>>>>>
>>>>> The problem seems to happen when restoring configuration of a
>>>>> particular PCI bridge.  What's interesting is that the bridge
>>>>> decodes one memory range and one I/O range.
>>>>>
>>>>> Looking at pci_cfg_restore() I wonder if it is wise to restore
>>>>> PCIR_COMMAND so early.  Could it be that after the resume the
>>>>> bridge is configured with a wrong I/O range (e.g., too wide) and
>>>>> by writing PCIR_COMMAND we enable that decoding. So, the bridge
>>>>> steals I/O cycles destined for ACPI support hardware.  If there is
>>>>> nothing behind the bridge to handle those ports, then we get those
>>>>> bad readings. Once the bridge configuration is fully restored, the
>>>>> I/O handling goes back to normal.
>>>>
>>>>  From what I see, this looks like a BIOS bug.
>>>> Upon resume, it swaps window configurations of pcib1 and pcib2
>>>> (until FreeBSD restores them).  pcib1 originally does not have an
>>>> I/O window.  So, BIOS programs both base and limit of pcib2 I/O
>>>> window to zero.   When FreeBSD writes its command register to
>>>> enable I/O decoding it starts claiming 0x0 - 0xFFF I/O port range.
>>>> That covers the ACPI ports at 0x8xx.
>>>>
>>>> Some printf-s.
>>>>  From (verbose) boot time:
>>>> pcib1:   domain0
>>>> pcib1:   secondary bus 1
>>>> pcib1:   subordinate bus   1
>>>> pcib1:   memory decode 0xfea0-0xfeaf
>>>> pcib2:   domain0
>>>> pcib2:   secondary bus 2
>>>> pcib2:   subordinate bus   2
>>>> pcib2:   I/O decode0xf000-0x
>>>> pcib2:   memory decode 0xfe90-0xfe9f
>>>>
>>>> My printf-s from resume time:
>>>> pcib1: old I/O base (low): 0xf1
>>>> pcib1: old I/O base (high): 0x0
>>>> pcib1: old I/O limit (low): 0x1
>>>> pcib1: old I/O limit (high): 0x0
>>>> pcib2: old I/O base (low): 0x1
>>>> pcib2: old I/O base (high): 0x0
>>>> pcib2: old I/O limit (low): 0x1
>>>> pcib2: old I/O limit (high): 0x0
>>>
>>> The "solution" I think is to have resume be multi-pass and to resume
>>> all the bridges first before trying to resume leaf devices (including
>>> timers), but that's a fair bit of work.  It might be that we just
>>> need to resume timer interrupts later after the new-bus resume (I
>>> think we currently do it before?), though the reason for that was to
>>> allow resume methods in devices to sleep (I'm not sure if any do).
>>>
>>
>> That sounds like a good fit for https://reviews.freebsd.org/D203 .
>> Someone (TM) just needs to take it over the finish line... 6 years
>> later.
> 
> Is this perhaps related to:
> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=237666

No.  I get that constantly on a desktop that never suspends/resumes.
It only started after upgrading to 12.0.

-- 
John Baldwin


Re: acpi timer reads all ones [Was: efirtc + atrtc at the same time]

2020-05-27 Thread John Baldwin
On 5/27/20 2:39 AM, Andriy Gapon wrote:
> On 27/05/2020 11:13, Andriy Gapon wrote:
>> I added more diagnostics and it seems to support the idea that the problem is
>> related to I/O cycles and bridges.
>>
>> ACPI timer suddenly starts returning 0x and that lasts for tens of
>> microseconds before the timer goes back to returning normal values with an
>> expected increase.
>> AMD provides a proprietary way to access ACPI registers via MMIO
>> (0xfed808xx).  That mechanism is unaffected, ACPI timer register always
>> returns good values.
>>
>> The problem seems to happen when restoring configuration of a particular PCI
>> bridge.  What's interesting is that the bridge decodes one memory range and
>> one I/O range.
>>
>> Looking at pci_cfg_restore() I wonder if it is wise to restore PCIR_COMMAND
>> so early.  Could it be that after the resume the bridge is configured with a
>> wrong I/O range (e.g., too wide) and by writing PCIR_COMMAND we enable that
>> decoding.  So, the bridge steals I/O cycles destined for ACPI support
>> hardware.  If there is nothing behind the bridge to handle those ports, then
>> we get those bad readings.  Once the bridge configuration is fully restored,
>> the I/O handling goes back to normal.
> 
> From what I see, this looks like a BIOS bug.
> Upon resume, it swaps window configurations of pcib1 and pcib2 (until FreeBSD
> restores them).  pcib1 originally does not have an I/O window.  So, BIOS
> programs both base and limit of pcib2 I/O window to zero.  When FreeBSD
> writes its command register to enable I/O decoding it starts claiming
> 0x0 - 0xFFF I/O port range.  That covers the ACPI ports at 0x8xx.
> 
> Some printf-s.
> From (verbose) boot time:
> pcib1:   domain0
> pcib1:   secondary bus 1
> pcib1:   subordinate bus   1
> pcib1:   memory decode 0xfea0-0xfeaf
> pcib2:   domain0
> pcib2:   secondary bus 2
> pcib2:   subordinate bus   2
> pcib2:   I/O decode0xf000-0x
> pcib2:   memory decode 0xfe90-0xfe9f
> 
> My printf-s from resume time:
> pcib1: old I/O base (low): 0xf1
> pcib1: old I/O base (high): 0x0
> pcib1: old I/O limit (low): 0x1
> pcib1: old I/O limit (high): 0x0
> pcib2: old I/O base (low): 0x1
> pcib2: old I/O base (high): 0x0
> pcib2: old I/O limit (low): 0x1
> pcib2: old I/O limit (high): 0x0

The "solution" I think is to have resume be multi-pass and to resume all the
bridges first before trying to resume leaf devices (including timers), but
that's a fair bit of work.  It might be that we just need to resume timer
interrupts later after the new-bus resume (I think we currently do it
before?), though the reason for that was to allow resume methods in devices
to sleep (I'm not sure if any do).

-- 
John Baldwin


Re: acpi timer reads all ones [Was: efirtc + atrtc at the same time]

2020-05-26 Thread John Baldwin
On 5/26/20 11:55 AM, Konstantin Belousov wrote:
> On Tue, May 26, 2020 at 06:22:13PM +0300, Andriy Gapon wrote:
>> On 25/05/2020 11:37, Andriy Gapon wrote:
>>> Also, there is another issue related to atrtc.
>>> When I have both drivers attached, and also when I have only atrtc attached
>>> (efi.rt.disabled=1), system clock jumps 10 minutes forward after each
>>> suspend / resume cycle (S0 -> S3 -> S0).  That does not happen for reboot
>>> and shutdown cycles.  I haven't investigated this deeper, but it is a
>>> curious problem.
>>
>> Actually, I was wrong.  The problem can also occur with efirtc alone.
>> Also, sometimes there is a different problem where there are no callouts for
>> a period of time on the order of minutes.  I tracked it to cc_lastscan being
>> set to a value greater than the current uptime.  So, any scheduled callout
>> gets scheduled at cc_lastscan and it is a while before the uptime catches up.
>>
>> It seemed that both issues were connected and were a result of the uptime
>> jumping forward by some minutes and then jumping back to a sane value.
>> If something important happened during the weird period, like getting time of
>> day from hardware or invoking a callout, it led to the observed effects.
>>
>> So, that gave me some ideas where to add debugging checks.
>> What I determined is that ACPI timer (ACPI-fast) could produce a reading of
>> all 1-s like happens when there is no hardware response.
>>
>> I caught one such instance and got a stack trace for it (but no crash dump
>> because devices had not resumed yet):
>> tc_windup() at tc_windup+0x318/frame 0xfe00a7a19300
>> tc_ticktock() at tc_ticktock+0x4b/frame 0xfe00a7a19320
>> hardclock() at hardclock+0x107/frame 0xfe00a7a19360
>> handleevents() at handleevents+0xb3/frame 0xfe00a7a193a0
>> timercb() at timercb+0x196/frame 0xfe00a7a193f0
>> lapic_handle_timer() at lapic_handle_timer+0x98/frame 0xfe00a7a19420
>> Xtimerint() at Xtimerint+0xb1/frame 0xfe00a7a19420
>> --- interrupt, rip = 0x80b34500, rsp = 0xfe00a7a194f8, rbp =
>> 0xfe00a7a19540 ---
>> acpi_pcib_write_config() at acpi_pcib_write_config/frame 0xfe00a7a19540
>> pci_cfg_restore() at pci_cfg_restore+0x2cc/frame 0xfe00a7a195a0
>> pci_resume_child() at pci_resume_child+0xee/frame 0xfe00a7a195e0
>> pci_resume() at pci_resume+0x49/frame 0xfe00a7a19630
>> bus_generic_resume_child() at bus_generic_resume_child+0x43/frame 
>> 0xfe00a7a19650
>> bus_generic_resume() at bus_generic_resume+0x29/frame 0xfe00a7a19680
>> bus_generic_resume_child() at bus_generic_resume_child+0x43/frame 
>> 0xfe00a7a196a0
>> bus_generic_resume() at bus_generic_resume+0x29/frame 0xfe00a7a196d0
>> bus_generic_resume_child() at bus_generic_resume_child+0x43/frame 
>> 0xfe00a7a196f0
>> bus_generic_resume() at bus_generic_resume+0x29/frame 0xfe00a7a19720
>> bus_generic_resume_child() at bus_generic_resume_child+0x43/frame 
>> 0xfe00a7a19740
>> root_resume() at root_resume+0x29/frame 0xfe00a7a19770
>> acpi_EnterSleepState() at acpi_EnterSleepState+0x73b/frame 0xfe00a7a197f0
>> acpi_AckSleepState() at acpi_AckSleepState+0x144/frame 0xfe00a7a19820
>> devfs_ioctl() at devfs_ioctl+0xcb/frame 0xfe00a7a19870
>> vn_ioctl() at vn_ioctl+0x132/frame 0xfe00a7a19980
>> devfs_ioctl_f() at devfs_ioctl_f+0x1e/frame 0xfe00a7a199a0
>> kern_ioctl() at kern_ioctl+0x27b/frame 0xfe00a7a19a00
>> sys_ioctl() at sys_ioctl+0x123/frame 0xfe00a7a19ad0
>> amd64_syscall() at amd64_syscall+0x140/frame 0xfe00a7a19bf0
>> fast_syscall_common() at fast_syscall_common+0x101/frame 0xfe00a7a19bf0
>>
>> I am not sure if this is just a coincidence but it appears as if a write to
>> some PCI configuration register could temporarily interfere with access to
>> the PM timer I/O port.
>> Is that plausible?
> If something disabled a BAR, then typical response of x86 chipset for timed
> out read from PCIe is 0xf... . 

And the ACPI timer might be "behind" the isab0 bridge device which would indeed
cause this.

-- 
John Baldwin


Re: RFC: merging nfs-over-tls changes into head/sys

2020-05-22 Thread John Baldwin
On 5/21/20 2:01 PM, Rick Macklem wrote:
> Hi,
> 
> I have now completed changes to the code in projects/nfs-over-tls, which
> implements TLS encryption of NFS RPC messages. (This roughly conforms
> to the internet draft "Towards Remote Procedure Call Encryption By Default",
> which should soon become an RFC. For now, TLS1.2 is used instead of TLS1.3,
> since FreeBSD's KERN_TLS does not yet implement TLS1.3.)
> 
> I'd like to start merging some of the kernel changes into head/sys.
> 
> The first of these would be creation of the syscall used by the daemons.
> (The code in projects/nfs-over-tls cheats and uses the syscall for the gssd,
>  but it needs to have its own syscall so that the gssd daemon can run
>  concurrently with it. I didn't want testers to need to build userland just
>  to get a syscall stub in libc.)
> 
> After this, there are a bunch of changes to the NFS code to add support for
> ext_pgs mbufs (these are significant patches, but should not affect the
> non-ext_pgs mbuf case, since they'll be conditional on ND_EXTPGS/M_EXTPGS).
> 
> Does this sound ok to do?
> 
> Please let me know if you see problems with me doing this?

I don't see any problems, per se, but I still need to do some changes on my
end for software KTLS RX before it's ready to merge (I'm hoping to kill
the iovecs in the kthreads entirely).

-- 
John Baldwin


Re: ${COMPILER_VERSION} < 40300

2020-05-08 Thread John Baldwin
On 5/7/20 10:17 AM, Warner Losh wrote:
> On Thu, May 7, 2020 at 10:38 AM Eric van Gyzen  wrote:
> 
>> If I were to clean up obsolete ${COMPILER_VERSION} tests in the tree,
>> which ones should I keep?  I would probably confine it to head, so I
>> could prune quite a few.
>>
> 
> Anything in the bootstrap path should remain, especially in the install
> portion of the bootstrap path since we don't require new compilers for
> that. I doubt there's more than one or two of these and there may be zero.
> The rest can go away.
> 
> We should also look at taking out the fmake workarounds in the tree too.
> Most of these are in src/Makefile and src/Makefile.inc.

I think Eric, though, was asking about cdefs.h and the like.  Right now
we still have conditional support for some really old compilers that are
likely to never be used with FreeBSD 13 (ancient Intel icc, gcc 2.95, etc.).

Like, do we keep support for pre-ANSI C to delete 'const' etc. via
macros?   Admittedly there isn't a tremendous amount of cruft in cdefs.h.
What would seem more invasive would be to do things like require C99 and
use 'restrict' directly instead of __restrict, but the first step towards
any of that is probably to remove some of the cruft from cdefs.h and
possibly some other places.

(BTW, it would be good to know if it's at all useful to keep any of the
icc bits around.)

-- 
John Baldwin


Re: toolchain status

2020-04-20 Thread John Baldwin
On 4/18/20 5:46 PM, Eric van Gyzen wrote:
> Which architectures are still often built with an external toolchain? 
> I'd like to re-commit jemalloc 5.2.1.  It was reverted because 
> "compilation fails for non-llvm-based platforms."  I just built 
> tinderbox worlds with 5.2.1 with no problems, albeit with llvm.

All platforms now use LLVM.  You can still use GCC to build platforms,
but if amd64 builds fine with GCC 9 with the patch applied, I think
you should be fine to move forward (that is, if only GCC 6 fails to
build, we can just deprecate using GCC 6 as an external toolchain
for 13).

-- 
John Baldwin


Re: buildkernel failure because ctfconvert not installed

2020-04-08 Thread John Baldwin
On 4/7/20 11:32 PM, Gary Jennejohn wrote:
> Has anyone else seen this error?
> 
> I tried to build a kernel yesterday, but the build failed while compiling
> modules because ctfconvert was not found.
> 
> I've had WITH_CTF=no in my src.conf for years, so neither ctfconvert nor
> ctfmerge were installed.
> 
> OK, I'll just go to the source dirctories and build and install.
> 
> Nope.  I got this error:
>   make: exec(ctfconvert) failed (No such file or directory)
> and the build failed.
> 
> WTF?  ctfconvert requires ctfconvert to build?  That makes no sense and is
> a real chicken-and-egg problem if I've ever seen one.
> 
> I ended up creating /usr/bin/ctf{convert,merge} shell scripts which simply
> did exit 0.  That allowed me to finally compile and install the utilities.
> 
> Now I'm forced to have WITH_CTF=yes in my src.conf.  No big deal.
> 
> Still, it seems like the change to the make infrastructure which assumed
> that ctf{convert,merge} are always installed was rather premature.

The change is that GENERIC has 'makeoptions WITH_CTF=yes'.  If you build a
kernel without that, you shouldn't need to have ctfconvert installed.  This
does mean you need to use a custom kernel instead of GENERIC.

-- 
John Baldwin


Re: Emacs tramp mode doesn't work with CURRENT

2020-03-10 Thread John Baldwin
On 1/28/20 8:57 AM, John F Carr wrote:
> I use emacs tramp mode, which opens an ssh connection to a remote machine for 
> file access.  It works to Linux and FreeBSD 12.1, but not to CURRENT.  There 
> has been a change in the way characters are echoed by the shell, with 12.1 
> treating a consecutive run of backspace as an atomic unit and CURRENT 
> processing them one at a time.  This is not necessarily a bug, but it is a 
> nuisance and independently it is suboptimal.

I have the same breakage with an amd64 laptop running HEAD (and using
tramp-mode from emacs on a 12.x host)

Have you been able to bisect it at all?  I think libedit is probably a
good candidate as well.

What I see is that tramp-mode just hangs until I kill the ssh session it
is using, and then I see the same output you had below in the debug
window showing the extraneous newlines.

-- 
John Baldwin


Re: how to use the ktls

2020-01-27 Thread John Baldwin

On 1/26/20 8:08 PM, Rick Macklem wrote:

> John Baldwin wrote:
> [stuff snipped]
>
>> Hmmm, this might be a fair bit of work indeed.
>>
>> Right now KTLS only works for transmit (though I have some WIP for receive).
>>
>> KTLS does assume that the initial handshake and key negotiation is handled by
>> OpenSSL.  OpenSSL uses custom setsockopt() calls to tell the kernel which
>> session keys to use.
>>
>> I think what you would want to do is use something like SSL_connect() in
>> userspace, and then check to see if KTLS "worked".  If it did, you can tell
>> the kernel it can write to the socket directly, otherwise you will have to
>> bounce data back out to userspace to run it through SSL_write() and have
>> userspace do SSL_read() and then feed data into the kernel.
>>
>> The pseudo-code might look something like:
>>
>> SSL *s;
>>
>> s = SSL_new(...);
>>
>> /* fd is the existing TCP socket */
>> SSL_set_fd(s, fd);
>> SSL_connect(s);
>> if (BIO_get_ktls_send(SSL_get_wbio(s))) {
>>    /* Can use KTLS for transmit. */
>> }
>> if (BIO_get_ktls_recv(SSL_get_rbio(s))) {
>>    /* Can use KTLS for receive. */
>> }


> So, I've been making some progress. The first stab at the daemons that do the
> handshake are now on svn in base/projects/nfs-over-tls/usr.sbin/rpctlscd and
> rpctlssd.
>
> A couple of questions...
> 1 - I haven't found BIO_get_ktls_send() or BIO_get_ktls_recv(). Are they in some
>    different library?


They only exist currently in OpenSSL master (which will be OpenSSL 3.0.0 when
it is released).  I have some not-yet-tested WIP changes to backport those
changes into the base OpenSSL, but it will also add overhead to future OpenSSL
imports perhaps, so it is something I need to work with secteam@ on to decide
if it's viable once I have a tested PoC.

I will try to at least provide a patch to the security/openssl port to add a
KTLS option "soon" that you could use for testing.


> 2 - After a successful SSL_connect(), the receive queue for the socket has
>     478 bytes of stuff in it. SSL_read() seems to know how to skip over it,
>     but I haven't figured out a good way to do this. (I currently just do a
>     recv(..478,0) on the socket.)
>     Any idea what to do with this? (Or will the receive side of the ktls
>     figure out how to skip over it?)


I don't know yet. :-/  With the TOE-based TLS I had been testing with, this
doesn't happen because the NIC blocks the data until it gets the key and then
it's always available via KTLS.  With software-based KTLS for RX (which I'm
going to start working on soon), this won't be the case and you will
potentially have some data already read by OpenSSL that needs to be drained
from OpenSSL before you can depend on KTLS.  It's probably only the first few
messages, but I will need to figure out a way that you can tell how much
pending data in userland you need to read via SSL_read() and then pass back
into the kernel before relying on KTLS (it would just be a single chunk of
data after SSL_connect() you would have to do this for).


> I'm currently testing with a kernel that doesn't have options KERN_TLS and
> (so long as I get rid of the 478 bytes), it then just does unencrypted RPCs.
>
> So, I guess the big question is can I get access to your WIP code for KTLS
> receive? (I have no idea if I can make progress on it, but I can't do a lot more
> before I have that.)


The WIP only works right now if you have a Chelsio T6 NIC as it uses the T6's
TCP offload engine to do TLS.  If you don't have that gear, ping me off-list.
It would also let you not worry about the SSL_read case for now for initial
testing.

--
John Baldwin


Re: how to use the ktls

2020-01-13 Thread John Baldwin
On 1/12/20 8:23 PM, Benjamin Kaduk wrote:
> On Thu, Jan 09, 2020 at 10:53:38PM +, Rick Macklem wrote:
>> John Baldwin wrote:
>>> On 1/7/20 3:02 PM, Rick Macklem wrote:
>>>> Hi,
>>>>
>>>> Now that I've completed NFSv4.2 I'm on to the next project, which is
>>>> making NFS work over TLS.
>>>> Of course, I know absolutely nothing about TLS, which will make this an
>>>> interesting exercise for me.
>>>> I did find simple server code in the OpenSSL doc. which at least gives me
>>>> a starting point for the initialization stuff.
>>>> As I understand it, this initialization must be done in userspace?
>>>>
>>>> Then somehow, the ktls takes over and does the encryption of the
>>>> data being sent on the socket via sosend_generic(). Does that sound right?
>>>>
>>>> So, how does the kernel know the stuff that the initialization phase
>>>> (handshake) figures out, or is it magic I don't have to worry about?
>>>>
>>>> Don't waste much time replying to this. A few quick hints will keep me
>>>> going for now. (From what I've seen so far, this TLS stuff isn't simple.
>>>> And I thought Kerberos was a pain.;-)
>>>>
>>>> Thanks in advance for any hints, rick
>>>
>>> Hmmm, this might be a fair bit of work indeed.
>> If it was easy, it wouldn't be fun;-) FreeBSD13 is a ways off and if it
>> doesn't make that, oh well..
>>
>>> Right now KTLS only works for transmit (though I have some WIP for receive).
>> Hopefully your WIP will make progress someday, or I might be able to work on
>> it.
>>
>>> KTLS does assume that the initial handshake and key negotiation is handled
>>> by OpenSSL.  OpenSSL uses custom setsockopt() calls to tell the kernel which
>>> session keys to use.
>> Yea, I figured I'd need a daemon like the gssd for this. The krpc makes it a
>> little more fun, since it handles TCP connections in the kernel.
>>
>>> I think what you would want to do is use something like SSL_connect() in
>>> userspace, and then check to see if KTLS "worked".
>> Thanks (and for the code below). I found the simple server code in the
>> OpenSSL doc, but the client code gets a web page and is quite involved.
>>
>>> If it did, you can tell
>>> the kernel it can write to the socket directly, otherwise you will have to
>>> bounce data back out to userspace to run it through SSL_write() and have
>>> userspace do SSL_read() and then feed data into the kernel.
>> I don't think bouncing the data up/down to/from userland would work well.
>> I'd say "if it can't be done in the kernel, too bad". The above could be
>> used for a NULL RPC to see it is working, for the client.
> 
> So you're saying that we'd only support rpc-over-tls as an NFS client and
> not as a server, at least until the WIP for ktls read appears?

To be clear, I have KTLS RX working with TOE right now.  I have a design in my
head for KTLS RX that would use software and co-processor engines via OCF such
as aesni(4) and ccr(4) that I hope to implement in the next few months, so KTLS
RX isn't too far off.  OpenSSL already supports KTLS RX on Linux and the FreeBSD
patches I already have use the same API.  (Each received TLS frame is read via
recvmsg() with the TLS header fields in a cmsg.)
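
As a rough illustration of the receive side, here is a minimal sketch of
reading one message with recvmsg() and looking at the first control message.
It is generic ancillary-data code, not the KTLS patches themselves: the names
ex_cmsg_info and ex_recv_record are made up, and the actual cmsg level, type
and payload layout used for TLS records are deliberately left unspecified.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/uio.h>

/* Summary of the first control message seen, if any (hypothetical type). */
struct ex_cmsg_info {
	int	present;	/* non-zero if a control message arrived */
	int	level;		/* cmsg_level of that control message */
	int	type;		/* cmsg_type of that control message */
	size_t	datalen;	/* bytes of ancillary payload it carries */
};

/*
 * Receive one message from fd into buf and describe the first control
 * message that came with it.  Returns the payload length or -1 on error.
 */
static ssize_t
ex_recv_record(int fd, void *buf, size_t len, struct ex_cmsg_info *info)
{
	union {
		struct cmsghdr	hdr;	/* forces suitable alignment */
		char		space[256];
	} control;
	struct msghdr msg;
	struct iovec iov;
	struct cmsghdr *cmsg;
	ssize_t n;

	assert(buf != NULL && info != NULL);
	memset(info, 0, sizeof(*info));
	memset(&msg, 0, sizeof(msg));
	iov.iov_base = buf;
	iov.iov_len = len;
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = control.space;
	msg.msg_controllen = sizeof(control.space);

	n = recvmsg(fd, &msg, 0);
	if (n < 0)
		return (-1);

	/* A TLS-aware caller would decode the header fields from here. */
	cmsg = CMSG_FIRSTHDR(&msg);
	if (cmsg != NULL) {
		info->present = 1;
		info->level = cmsg->cmsg_level;
		info->type = cmsg->cmsg_type;
		info->datalen = cmsg->cmsg_len - CMSG_LEN(0);
	}
	return (n);
}
```

A caller would check info.present after each call and, when it is set, decode
the payload according to whatever cmsg type the kernel ends up defining.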

-- 
John Baldwin


Re: how to use the ktls

2020-01-08 Thread John Baldwin
On 1/7/20 3:02 PM, Rick Macklem wrote:
> Hi,
> 
> Now that I've completed NFSv4.2 I'm on to the next project, which is making
> NFS work over TLS.
> Of course, I know absolutely nothing about TLS, which will make this an
> interesting exercise for me.
> I did find simple server code in the OpenSSL doc. which at least gives me a
> starting point for the initialization stuff.
> As I understand it, this initialization must be done in userspace?
> 
> Then somehow, the ktls takes over and does the encryption of the
> data being sent on the socket via sosend_generic(). Does that sound right?
> 
> So, how does the kernel know the stuff that the initialization phase
> (handshake) figures out, or is it magic I don't have to worry about?
> 
> Don't waste much time replying to this. A few quick hints will keep me going
> for now. (From what I've seen so far, this TLS stuff isn't simple. And I
> thought Kerberos was a pain.;-)
> 
> Thanks in advance for any hints, rick

Hmmm, this might be a fair bit of work indeed.

Right now KTLS only works for transmit (though I have some WIP for receive).

KTLS does assume that the initial handshake and key negotiation is handled by
OpenSSL.  OpenSSL uses custom setsockopt() calls to tell the kernel which
session keys to use.

I think what you would want to do is use something like SSL_connect() in
userspace, and then check to see if KTLS "worked".  If it did, you can tell
the kernel it can write to the socket directly, otherwise you will have to
bounce data back out to userspace to run it through SSL_write() and have
userspace do SSL_read() and then feed data into the kernel.

The pseudo-code might look something like:

SSL *s;

s = SSL_new(...);

/* fd is the existing TCP socket */
SSL_set_fd(s, fd);
SSL_connect(s);
if (BIO_get_ktls_send(SSL_get_wbio(s))) {
   /* Can use KTLS for transmit. */
}
if (BIO_get_ktls_recv(SSL_get_rbio(s))) {
   /* Can use KTLS for receive. */
}


-- 
John Baldwin


Re: New external GCC toolchain ports/packages

2019-12-20 Thread John Baldwin
On 12/19/19 12:06 PM, Ryan Libby wrote:
> On Wed, Dec 18, 2019 at 1:49 PM John Baldwin  wrote:
>>
>> In the interest of supporting newer versions of GCC for a base system
>> toolchain, I've renamed the external GCC packages from -gcc
>> to -gcc6.  These are built as flavors of a new devel/freebsd-gcc6
>> port.  The xtoolchain package is not used for these new packages, instead
>> one does 'pkg install mips-gcc6' to get the GCC 6.x MIPS compiler and
>> uses 'CROSS_TOOLCHAIN=mips-gcc6'.  I've also gone ahead and updated this
>> compiler to 6.5.0.
>>
>> I will leave the old ports/packages around for now to permit an easy
>> transition, but going forward, the -gcc6 packages should be preferred
>> to -xtoolchain-gcc for all but riscv (riscv64-gcc and riscv64-xtoolchain-gcc
>> are separate from the powerpc64-gcc set of packages).
>>
>> In addition, I've also just added a devel/freebsd-gcc9 package which
>> builds -gcc9 packages.  It adds powerpc and riscv flavors relative
>> to freebsd-gcc6 and uses GCC 9.2.0.  To date in my testing I've yet to
>> be able to finish a buildworld on any of the platforms I've tried
>> (amd64, mips, sparc64), but the packages should permit other developers
>> to get the tree building with GCC 9.  To use these packages one would do
>> something like:
>>
>> # pkg install amd64-gcc9
>> # make buildworld CROSS_TOOLCHAIN=amd64-gcc9
>>
>> You can install both the gcc6 and gcc9 versions of a package at the same
>> time, e.g. amd64-gcc6 and amd64-gcc9.  Having different packages for major
>> versions is similar to llvm and will also let us keep a known-good
>> toolchain package for older releases while using newer major versions on
>> newer FreeBSD releases (e.g gcc9 for 13.0 and gcc6 for 12.x).
>>
>> I do plan to switch the default toolchains for make universe/tinderbox
>> for targets using -xtoolchain-gcc based on GCC 6 over to the
>> freebsd-gcc6 variants in the next week or so.
>>
>> --
>> John Baldwin
> 
> Awesome, thanks!  I was able to get amd64 buildworld and buildkernel to
> succeed with just a few changes, and none to the port.  I'll work on
> getting the changes in.

I have been able to get it building as well, mostly by muting a few
warnings, adding libcompiler_rt to rtld's link for i386, disabling
googletest (needs an upstream patch to stop using signed wchar_t),
and a hack to jemalloc.  I was able to build riscv as well with those
same changes and am working through builds of other platforms.

I'm happy to compare notes.  The jemalloc one is a bit weird.

-- 
John Baldwin


Re: New external GCC toolchain ports/packages

2019-12-19 Thread John Baldwin
On 12/18/19 4:16 PM, Mark Millard wrote:
> 
> 
> On 2019-Dec-18, at 13:48, John Baldwin  wrote:
> 
>> In the interest of supporting newer versions of GCC for a base system
>> toolchain, I've renamed the external GCC packages from -gcc
>> to -gcc6.  These are built as flavors of a new devel/freebsd-gcc6
>> port.  The xtoolchain package is not used for these new packages, instead
>> one does 'pkg install mips-gcc6' to get the GCC 6.x MIPS compiler and
>> uses 'CROSS_TOOLCHAIN=mips-gcc6'.  I've also gone ahead and updated this
>> compiler to 6.5.0.
>>
>> I will leave the old ports/packages around for now to permit an easy
>> transition, but going forward, the -gcc6 packages should be preferred
>> to -xtoolchain-gcc for all but riscv (riscv64-gcc and riscv64-xtoolchain-gcc
>> are separate from the powerpc64-gcc set of packages).
>>
>> In addition, I've also just added a devel/freebsd-gcc9 package which
>> builds -gcc9 packages.  It adds powerpc and riscv flavors relative
>> to freebsd-gcc6 and uses GCC 9.2.0.  To date in my testing I've yet to
>> be able to finish a buildworld on any of the platforms I've tried
>> (amd64, mips, sparc64), but the packages should permit other developers
>> to get the tree building with GCC 9.  To use these packages one would do
>> something like:
>>
>> # pkg install amd64-gcc9
>> # make buildworld CROSS_TOOLCHAIN=amd64-gcc9
>>
>> You can install both the gcc6 and gcc9 versions of a package at the same
>> time, e.g. amd64-gcc6 and amd64-gcc9.  Having different packages for major
>> versions is similar to llvm and will also let us keep a known-good
>> toolchain package for older releases while using newer major versions on
>> newer FreeBSD releases (e.g gcc9 for 13.0 and gcc6 for 12.x).
>>
>> I do plan to switch the default toolchains for make universe/tinderbox
>> for targets using -xtoolchain-gcc based on GCC 6 over to the
>> freebsd-gcc6 variants in the next week or so.
>>
> 
> How about base/binutils and base/gcc ? Is their (future?) status
> changed by any of this activity?

I plan to rename base/gcc to base/gcc6 (and update it to 6.5) and then
add a base/gcc9 that would provide GCC 9 as /usr/bin/cc.

-- 
John Baldwin


New external GCC toolchain ports/packages

2019-12-18 Thread John Baldwin
In the interest of supporting newer versions of GCC for a base system
toolchain, I've renamed the external GCC packages from -gcc
to -gcc6.  These are built as flavors of a new devel/freebsd-gcc6
port.  The xtoolchain package is not used for these new packages, instead
one does 'pkg install mips-gcc6' to get the GCC 6.x MIPS compiler and
uses 'CROSS_TOOLCHAIN=mips-gcc6'.  I've also gone ahead and updated this
compiler to 6.5.0.

I will leave the old ports/packages around for now to permit an easy
transition, but going forward, the -gcc6 packages should be preferred
to -xtoolchain-gcc for all but riscv (riscv64-gcc and riscv64-xtoolchain-gcc
are separate from the powerpc64-gcc set of packages).

In addition, I've also just added a devel/freebsd-gcc9 package which
builds -gcc9 packages.  It adds powerpc and riscv flavors relative
to freebsd-gcc6 and uses GCC 9.2.0.  To date in my testing I've yet to
be able to finish a buildworld on any of the platforms I've tried
(amd64, mips, sparc64), but the packages should permit other developers
to get the tree building with GCC 9.  To use these packages one would do
something like:

# pkg install amd64-gcc9
# make buildworld CROSS_TOOLCHAIN=amd64-gcc9

You can install both the gcc6 and gcc9 versions of a package at the same
time, e.g. amd64-gcc6 and amd64-gcc9.  Having different packages for major
versions is similar to llvm and will also let us keep a known-good
toolchain package for older releases while using newer major versions on
newer FreeBSD releases (e.g gcc9 for 13.0 and gcc6 for 12.x).

I do plan to switch the default toolchains for make universe/tinderbox
for targets using -xtoolchain-gcc based on GCC 6 over to the
freebsd-gcc6 variants in the next week or so.

-- 
John Baldwin


Re: make delete-old: missing some files?

2019-10-24 Thread John Baldwin
On 10/22/19 8:42 PM, Alexey Dokuchaev wrote:
> On Tue, Oct 22, 2019 at 04:34:53PM -0700, John Baldwin wrote:
>> On 10/18/19 10:05 AM, Alexey Dokuchaev wrote:
>>> hi there,
>>>
>>> i've made my -CURRENT world and installed, but "make delete-old" tells
>>> me it cannot remove some directories:
>>>
>>>>>> Removing old directories
>>> rmdir: /usr/share/dtrace: Directory not empty
>>> rmdir: /usr/lib/dtrace: Directory not empty
> 
> Apparently, these are because I've started to put WITHOUT_CDDL=yes in
> /etc/src.conf since recently:
> 
> $ find /usr/lib/dtrace -type f
> /usr/lib/dtrace/siftr.d
> /usr/lib/dtrace/mbuf.d
> /usr/lib/dtrace/socket.d
> 
> $ find /usr/share/dtrace -type f
> /usr/share/dtrace/nfsattrstats
> /usr/share/dtrace/siftr
> /usr/share/dtrace/blocking
> /usr/share/dtrace/tcpdebug
> 
> I can see some dtrace/*.d files in OptionalObsoleteFiles.inc, perhaps
> these are missing?

Probably.

>>> # find /usr/lib/debug/usr/lib/engines
>>> /usr/lib/debug/usr/lib/engines
>>> /usr/lib/debug/usr/lib/engines/lib4758cca.so.debug
>>> ...
>>
>> These are from the OpenSSL 1.1.1 commit.  However, they are tagged as
>> OLD_LIBS and check-old-libs and delete-old-libs should be automatically
>> deleting these?  Does 'make check-old' report these files as
>> old libraries?
> 
> I've manually placed one of those back on the filesystem and `make
> check-old' reported it (twice!) under libraries.  But after r353907 it
> get cleaned up properly with `make delete-old'.

Hmm, then 'make delete-old-libs' should already delete them without needing
r353907.  The issue with r353907 is if someone doesn't delete the
actual libraries via 'make delete-old-libs' but then tries to debug an
application that was using the old openssl and crashed, we'd no longer
have debug symbols if the crash was in one of those libraries.  That
matters less for OpenSSL engines, but matters more for something like
libutil, etc. hence why we delete debug symbols as part of delete-old-libs
instead of delete-old.

If 'make delete-old-libs' deletes these files already, then we should
probably revert r353907.

-- 
John Baldwin


Re: make delete-old: missing some files?

2019-10-22 Thread John Baldwin
On 10/18/19 10:05 AM, Alexey Dokuchaev wrote:
> hi there,
> 
> i've made my -CURRENT world and installed, but "make delete-old" tells
> me it cannot remove some directories:
> 
>>>> Removing old directories
> rmdir: /usr/share/dtrace: Directory not empty
> rmdir: /usr/lib/dtrace: Directory not empty
> rmdir: /usr/lib/debug/usr/tests/libexec/rtld-elf: Directory not empty
> rmdir: /usr/lib/debug/usr/tests/libexec: Directory not empty
> rmdir: /usr/lib/debug/usr/tests: Directory not empty
> rmdir: /usr/lib/debug/usr/lib/i18n: Directory not empty
> rmdir: /usr/lib/debug/usr/lib/engines: Directory not empty
> rmdir: /usr/lib/debug/usr/lib: Directory not empty
> rmdir: /usr/lib/debug/usr: Directory not empty
> 
> taking /usr/lib/debug/usr/lib/engines as an example:
> 
> # find /usr/lib/debug/usr/lib/engines
> /usr/lib/debug/usr/lib/engines
> /usr/lib/debug/usr/lib/engines/lib4758cca.so.debug
> /usr/lib/debug/usr/lib/engines/libaep.so.debug
> /usr/lib/debug/usr/lib/engines/libatalla.so.debug
> /usr/lib/debug/usr/lib/engines/libcapi.so.debug
> /usr/lib/debug/usr/lib/engines/libchil.so.debug
> /usr/lib/debug/usr/lib/engines/libcswift.so.debug
> /usr/lib/debug/usr/lib/engines/libgost.so.debug
> /usr/lib/debug/usr/lib/engines/libnuron.so.debug
> /usr/lib/debug/usr/lib/engines/libsureware.so.debug
> /usr/lib/debug/usr/lib/engines/libubsec.so.debug
> 
> am i missing something, or ObsoleteFiles.inc lacks a few entries?

These are from the OpenSSL 1.1.1 commit.  However, they are tagged as
OLD_LIBS and check-old-libs and delete-old-libs should be automatically
deleting these?  Does 'make check-old' report these files as
old libraries?

-- 
John Baldwin


Re: ktrace/kdump give incorrect message on unlinkat() failure due to capabilities

2019-10-07 Thread John Baldwin
On 9/25/19 10:33 AM, Sergey Kandaurov wrote:
> On Sat, Sep 21, 2019 at 08:43:58PM -0400, Ryan Stone wrote:
>> I have written a short test program that runs unlinkat(2) in
>> capability mode and fails due to not having the write capabilities:
>>
>> https://people.freebsd.org/~rstone/src/unlink.c
>>
>> If I run the binary under ktrace and look at the kdump output, it
>> gives the following incorrect output:
>>
>> 43775 unlink   CALL  unlinkat(0x3,0x7fffe995,0)
>> 43775 unlink   NAMI  "from.QAUlAA0"
>> 43775 unlink   CAP   operation requires CAP_LOOKUP, descriptor holds CAP_LOOKUP
>> 43775 unlink   RET   unlinkat -1 errno 93 Capabilities insufficient
>>
>> The message should instead say that the operation requires
>> CAP_UNLINKAT.  Looking at sys/capsicum.h, I suspect that the problem
>> is related to the strange definition of CAP_UNLINKAT:
>>
>> #define CAP_UNLINKAT (CAP_LOOKUP | 0x1000ULL)
> 
> FYI, with this grep it was able to decode capabilities.
> 
> Index: lib/libsysdecode/mktables
> ===
> --- lib/libsysdecode/mktables (revision 352685)
> +++ lib/libsysdecode/mktables (working copy)
> @@ -157,7 +157,7 @@
>  gen_table "sigcode" "SI_[A-Z]+[[:space:]]+0(x[0-9abcdef]+)?"   "sys/signal.h"
>  gen_table "umtxcvwaitflags" "CVWAIT_[A-Z_]+[[:space:]]+0x[0-9]+"   "sys/umtx.h"
>  gen_table "umtxrwlockflags" "URWLOCK_PREFER_READER[[:space:]]+0x[0-9]+"   "sys/umtx.h"
> -gen_table "caprights"   "CAP_[A-Z_]+[[:space:]]+CAPRIGHT\([0-9],[[:space:]]+0x[0-9]{16}ULL\)"   "sys/capsicum.h"
> +gen_table "caprights"   "CAP_[A-Z_]+[[:space:]]+(CAPRIGHT|[()A-Z_|[:space:]]+CAP_LOOKUP)"   "sys/capsicum.h"
>  gen_table "sctpprpolicy"   "SCTP_PR_SCTP_[A-Z_]+[[:space:]]+0x[0-9]+"   "netinet/sctp_uio.h" "SCTP_PR_SCTP_ALL"
>  gen_table "cmsgtypesocket"  "SCM_[A-Z_]+[[:space:]]+0x[0-9]+"   "sys/socket.h"
>  if [ -e "${include_dir}/x86/sysarch.h" ]; then

CAP_SEEK and CAP_MMAP_X might also be subject to this.  However, I'm not quite
understanding the regex, or at least why the modified portion of the regex isn't
something like this:

(CAPRIGHT\(|\(CAP_LOOKUP)

That is, you currently have [()A-Z_|[:space:]]+ for an expression that I think
will only ever match a single '(' character.
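
For what it's worth, here is a minimal sketch of how the simpler alternation
could be checked against a few lines.  It uses POSIX regcomp(3) purely as a
stand-in matcher (I'm not claiming that is what mktables runs), the names
ex_ere_match and ex_caprights_re are made up, and the sample #define lines in
the usage note are illustrative shapes rather than text copied from
sys/capsicum.h.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/*
 * Return 1 if line matches the POSIX extended regular expression pattern,
 * 0 if it does not, and -1 if the pattern fails to compile.
 */
static int
ex_ere_match(const char *pattern, const char *line)
{
	regex_t re;
	int rc;

	assert(pattern != NULL && line != NULL);
	if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
		return (-1);
	rc = regexec(&re, line, 0, NULL, 0);
	regfree(&re);
	return (rc == 0);
}

/* The simpler alternation, spliced into the "caprights" pattern. */
static const char ex_caprights_re[] =
    "CAP_[A-Z_]+[[:space:]]+(CAPRIGHT\\(|\\(CAP_LOOKUP)";
```

With that pattern, a line shaped like '#define CAP_READ CAPRIGHT(0, 0x...ULL)'
and one shaped like '#define CAP_UNLINKAT (CAP_LOOKUP | 0x1000ULL)' both
match, while a composite built on some other right, say
'#define CAP_SEEK (CAP_SEEK_TELL | 0x4ULL)', does not.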

A more general form that might work for CAP_SEEK and CAP_MMAP_X might be
to match on 'CAP_ | 0x


Re: problem with LOCAL_MODULES

2019-08-30 Thread John Baldwin
On 8/30/19 10:42 AM, Kyle Evans wrote:
> On Fri, Aug 16, 2019 at 7:38 PM John Baldwin  wrote:
>>
>> On 8/16/19 3:05 AM, Gary Jennejohn wrote:
>>> I tried to build a kernel today and it failed in modules-all even
>>> though I had LOCAL_MODULES="" in /etc/src.conf, as recommended by
>>> jhb.
>>>
>>> That's wrong.  It has to be LOCAL_MODULES=, otherwise
>>> /sys/conf/kern.post.mk seems to conclude that there should be a
>>> module under /usr/local/sys/modules with the name "".
>>
>> I think this will permit both versions to work:
>>
>> Index: sys/conf/kern.post.mk
>> ===
>> --- kern.post.mk(revision 351151)
>> +++ kern.post.mk(working copy)
>> @@ -76,6 +76,7 @@ modules-${target}:
>> cd $S/modules; ${MKMODULESENV} ${MAKE} \
>> ${target:S/^reinstall$/install/:S/^clobber$/cleandir/}
>>  .endif
>> +.if !empty(LOCAL_MODULES)
>>  .for module in ${LOCAL_MODULES}
>> @${ECHODIR} "===> ${module} (${target:S/^reinstall$/install/:S/^clobber$/cleandir/})"
>> @cd ${LOCAL_MODULES_DIR}/${module}; ${MKMODULESENV} ${MAKE} \
>> @@ -83,6 +84,7 @@ modules-${target}:
>> ${target:S/^reinstall$/install/:S/^clobber$/cleandir/}
>>  .endfor
>>  .endif
>> +.endif
>>  .endfor
>>
>>  # Handle ports (as defined by the user) that build kernel modules
>>
> 
> I think I'd like to see this with !empty(LOCAL_MODULES) &&
> EXISTS(${LOCAL_MODULES_DIR}) or maybe just the latter condition to
> prevent accidental foot-shooting... I was testing a problem with doing
> this stuff in a poudriere build for swills@ and set LOCAL_MODULES=""
> only to get an error because LOCAL_MODULES_DIR doesn't yet exist on
> the machine I was testing with -- which we can trivially avoid.

Did this work for you?  Gary said in a followup that it didn't work,
so that's why I hadn't committed it.

-- 
John Baldwin


Re: HEADSUP: drm-current-kmod now installs sources

2019-08-19 Thread John Baldwin
On 8/16/19 5:33 PM, Rozhuk Ivan wrote:
> On Fri, 16 Aug 2019 17:23:08 -0700
> John Baldwin  wrote:
> 
>>> I use better way:
>>> /etc/make.conf:
>>> # Modules to build with kernel.
>>> PORTS_MODULES+= graphics/drm-fbsd12.0-kmod graphics/gpu-firmware-kmod
>>
>> This doesn't work for folks who use pre-built packages.
>>
> 
> I update mine /usr/src via rsync from other mine server, so any
> changes made by port or me or ... will lost.
> 
> Probably there is must be some solution like special folder where
> ports can store some file with port name, that automaticly go to
> PORTS_MODULES on build kernel.
> And probably pkg can do with this something for "folks who use pre-built
> packages".

That is what this framework does.  Ports install a Makefile to
/usr/local/sys/modules//Makefile.  That makefile is invoked during
the 'modules' stages of buildkernel and installkernel.  In the case of
drm-current-kmod, the port installs sources into subdirectories of
/usr/local/sys/modules/drm-current-kmod and installs a top-level Makefile
in /usr/local/sys/modules/drm-current-kmod/Makefile that uses bsd.subdir.mk
to recurse into those subdirectories.

Currently, LOCAL_MODULES is automatically populated to a list of the
directories in /usr/local/sys/modules and the debate is about when
LOCAL_MODULES should be auto-populated or not.

-- 
John Baldwin


Re: problem with LOCAL_MODULES

2019-08-16 Thread John Baldwin
On 8/16/19 3:05 AM, Gary Jennejohn wrote:
> I tried to build a kernel today and it failed in modules-all even
> though I had LOCAL_MODULES="" in /etc/src.conf, as recommended by
> jhb.
> 
> That's wrong.  It has to be LOCAL_MODULES=, otherwise
> /sys/conf/kern.post.mk seems to conclude that there should be a
> module under /usr/local/sys/modules with the name "".

I think this will permit both versions to work:

Index: sys/conf/kern.post.mk
===
--- kern.post.mk(revision 351151)
+++ kern.post.mk(working copy)
@@ -76,6 +76,7 @@ modules-${target}:
cd $S/modules; ${MKMODULESENV} ${MAKE} \
${target:S/^reinstall$/install/:S/^clobber$/cleandir/}
 .endif
+.if !empty(LOCAL_MODULES)
 .for module in ${LOCAL_MODULES}
@${ECHODIR} "===> ${module} (${target:S/^reinstall$/install/:S/^clobber$/cleandir/})"
@cd ${LOCAL_MODULES_DIR}/${module}; ${MKMODULESENV} ${MAKE} \
@@ -83,6 +84,7 @@ modules-${target}:
${target:S/^reinstall$/install/:S/^clobber$/cleandir/}
 .endfor
 .endif
+.endif
 .endfor
 
 # Handle ports (as defined by the user) that build kernel modules


-- 
John Baldwin

