Re: armv7-on-aarch64 stuck at urdlck

2024-07-24 Thread John F Carr



> On Jul 24, 2024, at 06:50, Konstantin Belousov  wrote:
> 
> On Wed, Jul 24, 2024 at 12:34:57PM +0200, m...@freebsd.org wrote:
>> 
>> 
>> On 24.07.2024 12:24, Konstantin Belousov wrote:
>>> On Tue, Jul 23, 2024 at 08:11:13PM +, John F Carr wrote:
>>>> On Jul 23, 2024, at 13:46, Michal Meloun  wrote:
>>>>> 
>>>>> On 23.07.2024 11:36, Konstantin Belousov wrote:
>>>>>> On Tue, Jul 23, 2024 at 09:53:41AM +0200, Michal Meloun wrote:
>>>>>>> The good news is that I'm finally able to generate a working/locking
>>>>>>> test case.  The culprit (at least for me) is if "-mcpu" is used when
>>>>>>> compiling libthr (e.g. indirectly injected via CPUTYPE in 
>>>>>>> /etc/make.conf).
>>>>>>> If it is not used, libthr is broken (regardless of -O level or 
>>>>>>> debug/normal
>>>>>>> build), but -mcpu=cortex-a15 will always produce a working libthr.
>>>>>> I think this is very significant progress.
>>>>>> Do you plan to drill down more to see what is going on?
>>>>> 
>>>>> So the problem is now clear, and I fear it may apply to other 
>>>>> architectures as well.
>>>>> dlopen_object() (from rtld_elf),
>>>>> https://cgit.freebsd.org/src/tree/libexec/rtld-elf/rtld.c#n3766,
>>>>> holds the rtld_bind_lock write lock for almost the entire time a new 
>>>>> library is loaded.
>>>>> If the code uses a yet unresolved symbol to load the library, the 
>>>>> rtl_bind() function attempts to get read lock of  rtld_bind_lock and a 
>>>>> deadlock occurs.
>>>>> 
>>>>> In this case, it round_up() in _thr_stack_fix_protection,
>>>>> https://cgit.freebsd.org/src/tree/lib/libthr/thread/thr_stack.c#n136.
>>>>> Issued by __aeabi_uidiv (since not all armv7 processors support HW 
>>>>> divide).
>>>>> 
>>>>> Unfortunately, I'm not sure how to fix it.  The compiler can emit 
>>>>> __aeabi_<> in any place, and I'm not sure if it can resolve all the 
>>>>> symbols used by rtld_eld and libthr beforehand.
>>>>> 
>>>>> 
>>>>> Michal
>>>>> 
>>>> 
>>>> In this case (but not for all _aeabi_ functions) we can avoid division
>>>> as long as page size is a power of 2.
>>>> 
>>>> The function is
>>>> 
>>>>   static inline size_t
>>>>   round_up(size_t size)
>>>>   {
>>>>if (size % _thr_page_size != 0)
>>>>size = ((size / _thr_page_size) + 1) *
>>>>_thr_page_size;
>>>>return size;
>>>>   }
>>>> 
>>>> The body can be condensed to
>>>> 
>>>>   return (size + _thr_page_size - 1) & ~(_thr_page_size - 1);
>>>> 
>>>> This is shorter in both lines of code and instruction bytes.
>>> 
>>> Lets not allow this to be lost.  Could anybody confirm that the patch
>>> below fixes the issue?
>>> 
>>> commit d560f4f6690a48476565278fd07ca131bf4eeb3c
>>> Author: Konstantin Belousov 
>>> Date:   Wed Jul 24 13:17:55 2024 +0300
>>> 
>>> rtld: avoid division in __thr_map_stacks_exec()
>>> The function is called by rtld with the rtld bind lock write-locked,
>>> when fixing the stack permission during dso load.  Not every ARMv7 CPU
>>> supports the div, which causes the recursive entry into rtld to resolve
>>> the  __aeabi_uidiv symbol, causing self-lock.
>>> Workaround the problem by using roundup2() instead of open-coding less
>>> efficient formula.
>>> Diagnosed by:   mmel
>>> Based on submission by: John F Carr 
>>> Sponsored by:   The FreeBSD Foundation
>>> MFC after:  1 week
>>> 
> Just realized that it is wrong.  Stack size is user-controlled and it does
> not need to be power of two.

Your change is correct.  _thr_page_size is set to getpagesize(),
which is a power of 2.   The call to roundup2 takes a user-provided
size and rounds it up to a multiple of the system page size.

I tested the change and it works.  My change also works and
should compile to identical code.  I forgot there was a standard
function to do the rounding.

> For final resolving of deadlocks, after a full day of digging, I'm very much
>> incline  of adding -znow to the linker flags for libthr.so (and maybe also
>> for ld-elf.so). The runtime cost of resolving all symbols at startup is very
>> low. Direct pre-solving in _thr_rtld_init() is problematic for the _aeabi_*
>> symbols, since they don't have an official C prototypes, and some are not
>> compatible with C calling conventions.
> I do not like it. `-z now' changes (breaks) the ABI and makes some symbols
> not preemtible.
> 
> In the worst case, we would need a call to the asm routine which causes the
> resolution of the _eabi_* symbols on arm.
> 

It would also be possible to link libthr with libgcc.a and use a linker map
to hide the _eabi_ symbols.





Re: armv7-on-aarch64 stuck at urdlck

2024-07-23 Thread John F Carr
On Jul 23, 2024, at 13:46, Michal Meloun  wrote:
> 
> On 23.07.2024 11:36, Konstantin Belousov wrote:
>> On Tue, Jul 23, 2024 at 09:53:41AM +0200, Michal Meloun wrote:
>>> The good news is that I'm finally able to generate a working/locking
>>> test case.  The culprit (at least for me) is if "-mcpu" is used when
>>> compiling libthr (e.g. indirectly injected via CPUTYPE in /etc/make.conf).
>>> If it is not used, libthr is broken (regardless of -O level or debug/normal
>>> build), but -mcpu=cortex-a15 will always produce a working libthr.
>> I think this is very significant progress.
>> Do you plan to drill down more to see what is going on?
> 
> So the problem is now clear, and I fear it may apply to other architectures 
> as well.
> dlopen_object() (from rtld_elf),
> https://cgit.freebsd.org/src/tree/libexec/rtld-elf/rtld.c#n3766,
> holds the rtld_bind_lock write lock for almost the entire time a new library 
> is loaded.
> If the code uses a yet unresolved symbol to load the library, the rtl_bind() 
> function attempts to get read lock of  rtld_bind_lock and a deadlock occurs.
> 
> In this case, it round_up() in _thr_stack_fix_protection,
> https://cgit.freebsd.org/src/tree/lib/libthr/thread/thr_stack.c#n136.
> Issued by __aeabi_uidiv (since not all armv7 processors support HW divide).
> 
> Unfortunately, I'm not sure how to fix it.  The compiler can emit __aeabi_<> 
> in any place, and I'm not sure if it can resolve all the symbols used by 
> rtld_eld and libthr beforehand.
> 
> 
> Michal
> 

In this case (but not for all _aeabi_ functions) we can avoid division
as long as page size is a power of 2.

The function is

  static inline size_t
  round_up(size_t size)
  {
if (size % _thr_page_size != 0)
size = ((size / _thr_page_size) + 1) *
_thr_page_size;
return size;
  }

The body can be condensed to

  return (size + _thr_page_size - 1) & ~(_thr_page_size - 1);

This is shorter in both lines of code and instruction bytes.

John Carr




Re: armv7-on-aarch64 stuck at urdlck: I got a replication of the "ampere2" bulk build hangup problem on a Windows DevKit 2023

2024-07-22 Thread John F Carr



> On Jul 22, 2024, at 12:51, Mark Millard  wrote:
> 
> Another systematic difference in my personal builds vs.
> official pkgbase builds, snapshots, releases, etc. is
> that my armv7 builds are built on aarch64-as-armv7, not
> on amd64. Not that I have any specific evidence that
> such matters here.
> 
> But Michal Meloun's report indicated not using builds
> done on amd64 as well. ("Tegra" models and examples of
> ARMv7-A and of ARMv8-A.)
> 
> For John Carr, I do not know if amd64 based builds of
> the world were systematically in use, never in use,
> or some mix in his tests.
> 
> ===
> Mark Millard
> marklmi at yahoo.com
> 

I reproduced the hang with code built on aarch64.
I have not been cross-compiling from amd64.

For poudriere I use armv7 jails running on aarch64.
One of them just hit the hang with 14.1-STABLE
kernel and 15.0-CURRENT userspace.

# ps -d -J 1021
  PID TT  STATTIME COMMAND
77550  1  IJ   0:00.27 /usr/bin/make -C /usr/ports/graphics/librsvg2-rust stage
77574  1  IJ   0:00.00 - /bin/sh -e 
/wrkdirs/usr/ports/graphics/librsvg2-rust/work/makeiFVIOP
77575  1  IJ   0:00.06 `-- gmake -f Makefile 
DESTDIR=/wrkdirs/usr/ports/graphics/librsvg2-rust/wo
77576  1  IJ   0:00.06   `-- gmake INSTALL_PROGRAM=/bin/sh 
/wrkdirs/usr/ports/graphics/librsvg2-r
77577  1  IJ   0:00.06 `-- gmake install-recursive
77578  1  IJ   0:00.00   `-- /bin/sh -c fail=; \\\nif (target_option=k; 
case ${target_option-
77709  1  IJ   0:00.01 `-- gmake install
77710  1  IJ   0:00.00   `-- /bin/sh -c ( 
/usr/local/bin/gdk-pixbuf-query-loaders ./libpi
77711  1  IJ   0:00.01 `-- /usr/local/bin/gdk-pixbuf-query-loaders 
./libpixbufloader-
# ps -l -p 77711
  UID   PID  PPID  C PRI NI   VSZ   RSS MWCHAN STAT TT TIME COMMAND
65534 77711 77710 27  20  0 27520 16660 urdlck IJ1  0:00.01 
/usr/local/bin/gdk-pixbuf-query-l

Poudriere told me I shouldn't run a newer userspace than kernel.
It usually works despite the warning.





Re: 41dfea24eec panics during ata attach on ESXi VM

2024-06-05 Thread John Baldwin

On 6/5/24 4:35 AM, Yuri Pankov wrote:

After updating to 41dfea24eec (GENERIC-NODEBUG), ESXi VM started to
panic while attaching atapci children.  I was unable to grab original
boot panic data ("keyboard" dead), but was able to boot with
hint.ata.0.disabled=1, hint.ata.1.disabled=1, and `devctl enable ata0`
reproduced the issue:

ata0:  at channel 0 on atapci0


This should be fixed now by commit 56b822a17cde5940909633c50623d463191a7852.
Sorry for the breakage.

--
John Baldwin




Re: Deprecating smbfs(5) and removing it before FreeBSD 14

2024-06-04 Thread John Hixson
> 
> Thank you for the message. I'm glad someone has the courage to take the
> plunge. Smbfs is still very important to me. In a heterogeneous environment
> it is still the most common way to share data between systems.
> Are you planning the final version as a kernel module, or will the final
> version be via FUSE? I have had bad experiences with FUSE in the past with
> stability and performance.

The final version will be a kernel module. It  will also be BSD
licensed. I am not an expert at the VFS layer so I want to get the stack
ironed out in FUSE before moving it into kernel space.

- John


signature.asc
Description: PGP signature


Re: Deprecating smbfs(5) and removing it before FreeBSD 14

2024-06-04 Thread John Hixson
On Mon, Jun 27, 2022 at 03:27:54PM +0200, Miroslav Lachman wrote:
> On 16/06/2022 15:56, Rick Macklem wrote:
> > Miroslav Lachman <000.f...@quip.cz> wrote:
> > > On 24/01/2022 16:13, Rick Macklem wrote:
> > > 
> > [...]
> > > 
> > > > So, I think Mark and Yuri are correct and looking at up to date
> > > > Illumos sources is the next step.
> > > > (As I mentioned, porting the Apple sources is beyond what I am
> > > >willing to attempt.)
> > > > 
> > > > rick
> > > 
> > > Hello Rick,
> > > I would like to ask you I there is some progress with porting newer
> > > SMBFS / CIFS version to FreeBSD? Did you find Illumos sources as a
> > > possibility where to start porting?
> > Yes. I have the stuff off Illumos-gate, which I think is pretty up-to-date
> > and I agree that it should be easier than the Apple stuff to port into
> > FreeBSD.  I don't think it is "straightforward" as someone involved
> > with Illumos said, due to the big differences in VFS/locking, but...
> > 
> > Having said the above, I have not done much yet. I've been cleaning up
> > NFS stuff, although I am nearly done with that now.
> > I do plan on starting to work on it soon, but have no idea if/when I
> > will have something that might be useful for others.
> 
> I'm glad to hear that.
> 
> > > We have more and more problems with current state of mount_smbfs. I
> > > would be really glad if "somebody" can do the heroic work of
> > > implementing SMBv2 in FreeBSD.
> > > Maybe it's time to start some fundraising for sponsoring this work?
> > Well, funding isn't an issue for me (I'm just a retired guy who does this
> > stuff as a hobby). However, if there is someone else who is capable of
> > doing it if they are funded, I have no problem with that.
> > I could either help them, or simply stick with working on NFS and leave
> > SMBv23 to them.
> > 
> > Sorry, but I cannot report real progress on this as yet, rick
> 
> No need to sorry. I really appreciate your endless work on NFS and that you
> still have kind of interest to try porting SMBv2/3.
> Unfortunately I don't know anybody else trying to do this tremendous work.
> 

I am working on a from scratch implementation of smbfs. I do not have
any kind of time estimate since it is in my spare time. I chose this
route after spending considerable time looking at Apple and Solaris
implementations and wanting something without all of the legacy 1.0
crap. I do have a very minimal working FUSE version at this point, but
there is much to do, and even more to abide by the various
specifications.

I just thought I'd share in case anyone is interested.

- John


signature.asc
Description: PGP signature


Re: gcc behavior of init priority of .ctors and .dtors section

2024-06-04 Thread John Baldwin

On 5/16/24 4:05 PM, Lorenzo Salvadore wrote:

On Thursday, May 16th, 2024 at 20:26, Konstantin Belousov  
wrote:

gcc13 from ports
`# gcc ctors.c && ./a.out init 1 init 2 init 5 init 4 init 3 main fini 3 fini 4 
fini 5 fini 2 fini 1`

The above order is not expected. I think clang's one is correct.

Further hacking with readelf shows that clang produces the right order of
section .rela.ctors but gcc does not.

```
# clang -fno-use-init-array -c ctors.c && readelf -r ctors.o | grep 'Relocation 
section with addend (.rela.ctors)' -A5 > clang.txt
# gcc -c ctors.c && readelf -r ctors.o | grep 'Relocation section with addend 
(.rela.ctors)' -A5 > gcc.txt
# diff clang.txt gcc.txt
3,5c3,5
<  00080001 R_X86_64_64 0060 init_65535_2 + 0
< 0008 00070001 R_X86_64_64 0040 init + 0
< 0010 00060001 R_X86_64_64 0020 init_65535 + 0
---


 00060001 R_X86_64_64 0011 init_65535 + 0
0008 00070001 R_X86_64_64 0022 init + 0
0010 00080001 R_X86_64_64 0033 init_65535_2 + 0
```


The above show clearly gcc produces the wrong order of section `.rela.ctors`.

Is that expected behavior ?

I have not tried Linux version of gcc.


Note that init array vs. init function behavior is encoded by a note added
by crt1.o. I suspect that the problem is that gcc port is built without
--enable-initfini-array configure option.


Indeed, support for .init_array and .fini_array has been added to the GCC ports
but is present in the *-devel ports only for now. I will
soon proceed to enable it for the GCC standard ports too. lang/gcc14 is soon
to be added to the ports tree and will have it since the beginning.

If this is indeed the issue, switching to a -devel GCC port should fix it.


FWIW, the devel/freebsd-gcc* ports have passed this flag to GCC's configure
for a long time (since we made the switch in clang).

--
John Baldwin




Kernel build broken without "options KTRACE"

2024-03-06 Thread John Nielsen
Getting a set but not used warning for “td” in sys/kern/kern_condvar.c when 
doing a buildkernel for a config file without “options KTRACE”. I failed to 
copy the full error message/line numbers but I will reproduce this evening if 
needed.

JN




Re: Recent commits reject RPi4B booting: pcib0 vs. pcib1 "rman_manage_region: request" leads to panic [now fixed]

2024-02-15 Thread John Kennedy
On Wed, Feb 14, 2024 at 06:19:04PM -0800, Mark Millard wrote:
> Your changes have the RPi4B that previously got the
> panic to boot all the way instead. Details:
> 
> I have updated my pkg base environment to have the
> downloaded kernels (and kernel source) with your
> changes and have booted with each of:
> 
> /boot/kernel/kernel
> /boot/kernel.GENERIC-NODEBUG/kernel
> 
> For reference:
> 
> # uname -apKU
> FreeBSD aarch64-main-pkgs 15.0-CURRENT FreeBSD 15.0-CURRENT 
> main-n268300-d79b6b8ec267 GENERIC-NODEBUG arm64 aarch64 1500014 1500012
> 
> Thanks for the fix.
> 
> Now I'll update the rest of pkg base materials.

  The recent changes resolved my boot issues as well.

FreeBSD 15.0-CURRENT #245 main-n268300-d79b6b8ec26 (GENERIC-NODEBUG 
arm64 1500014)



Re: Recent commits reject RPi4B booting: pcib0 vs. pcib1 "rman_manage_region: request" leads to panic [now fixed]

2024-02-15 Thread John Baldwin

On 2/14/24 11:03 PM, Mark Millard wrote:

On Feb 14, 2024, at 18:19, Mark Millard  wrote:


Your changes have the RPi4B that previously got the
panic to boot all the way instead. Details:

I have updated my pkg base environment to have the
downloaded kernels (and kernel source) with your
changes and have booted with each of:

/boot/kernel/kernel
/boot/kernel.GENERIC-NODEBUG/kernel

For reference:

# uname -apKU
FreeBSD aarch64-main-pkgs 15.0-CURRENT FreeBSD 15.0-CURRENT 
main-n268300-d79b6b8ec267 GENERIC-NODEBUG arm64 aarch64 1500014 1500012

Thanks for the fix.

Now I'll update the rest of pkg base materials.


Question: Are any of the changes to be MFC'd at some point?


If I do I will merge a large batch at once, and probably adjust
the order.  For example, I'll merge the pci_host_generic changes
before pci_pci changes so that stable branches will be bisectable.

--
John Baldwin




Re: Recent commits reject RPi4B booting: pcib0 vs. pcib1 "rman_manage_region: request" leads to panic

2024-02-14 Thread John Baldwin

On 2/14/24 10:16 AM, Mark Millard wrote:

Top posting a related but separate item:

I looked up some old (2022-Dec-17) lspci -v output from
a Linux boot. Note the "Memory at" value 6 (in
the 35 bit BCM2711 address space) and the "(64-bit,
non-prefetchable)" (and "[size=4K]").

01:00.0 USB controller: VIA Technologies, Inc. VL805/806 xHCI USB 3.0 
Controller (rev 01) (prog-if 30 [XHCI])
 Subsystem: VIA Technologies, Inc. VL805/806 xHCI USB 3.0 Controller
 Device tree node: 
/sys/firmware/devicetree/base/scb/pcie@7d50/pci@0,0/usb@0,0
 Flags: bus master, fast devsel, latency 0, IRQ 51
 Memory at 6 (64-bit, non-prefetchable) [size=4K]
 Capabilities: [80] Power Management version 3
 Capabilities: [90] MSI: Enable+ Count=1/4 Maskable- 64bit+
 Capabilities: [c4] Express Endpoint, MSI 00
 Capabilities: [100] Advanced Error Reporting
 Kernel driver in use: xhci_hcd


"Memory at 6 (64-bit, non-prefetchable)":
Violation of a PCIe standard?


No, this is a device BAR which can be 64-bit (memory BARs can either
be 32-bits or 64-bits).  However, the "window" in a PCI _bridge_ for
memory is only defined to be 32-bits.  Windows in PCI-PCI bridges
are a special type of BAR that defines the address ranges that the
bridge decodes on the parent side and passes down to child devices.
The prefetchable window in PCI-PCI bridges can optionally be 64-bit.

BAR == a range of memory or I/O port addresses decoded by a device,
usually mapped to a register bank, but sometimes mapped to internal
memory (e.g. a framebuffer)

Window == a range of memory or I/O port addresses decoded by a bridge
for which transactions are passed across the bridge to be handled by
a child device.

--
John Baldwin




Re: Recent commits reject RPi4B booting: pcib0 vs. pcib1 "rman_manage_region: request" leads to panic

2024-02-14 Thread John Baldwin

On 2/14/24 9:57 AM, Mark Millard wrote:

On Feb 14, 2024, at 08:08, John Baldwin  wrote:


On 2/12/24 5:57 PM, Mark Millard wrote:

On Feb 12, 2024, at 16:36, Mark Millard  wrote:

On Feb 12, 2024, at 16:10, Mark Millard  wrote:


On Feb 12, 2024, at 12:00, Mark Millard  wrote:


[Gack: I was looking at the wrong vintage of source code, predating
your changes: wrong system used.]


On Feb 12, 2024, at 10:41, Mark Millard  wrote:


On Feb 12, 2024, at 09:32, John Baldwin  wrote:


On 2/9/24 8:13 PM, Mark Millard wrote:

Summary:
pcib0:  mem 0x7d50-0x7d50930f 
irq 80,81 on simplebus2
pcib0: parsing FDT for ECAM0:
pcib0:  PCI addr: 0xc000, CPU addr: 0x6, Size: 0x4000
. . .
rman_manage_region:  request: start 0x6, end 
0x6000f
panic: Failed to add resource to rman


Hmmm, I suspect this is due to the way that bus_translate_resource works which 
is
fundamentally broken.  It rewrites the start address of a resource in-situ 
instead
of keeping downstream resources separate from the upstream resources.   For 
example,
I don't see how you could ever release a resource in this design without 
completely
screwing up your rman.  That is, I expect trying to detach a PCI device behind a
translating bridge that uses the current approach should corrupt the allocated
resource ranges in an rman long before my changes.

That said, that doesn't really explain the panic.  Hmm, the panic might be 
because
for PCI bridge windows the driver now passes RF_ACTIVE and the 
bus_translate_resource
hack only kicks in the activate_resource method of pci_host_generic.c.


Detail:
. . .
pcib0:  mem 0x7d50-0x7d50930f 
irq 80,81 on simplebus2
pcib0: parsing FDT for ECAM0:
pcib0: PCI addr: 0xc000, CPU addr: 0x6, Size: 0x4000


This indicates this is a translating bus.


pcib1:  irq 91 at device 0.0 on pci0
rman_manage_region:  request: start 0x1, end 0x1
pcib0: rman_reserve_resource: start=0xc000, end=0xc00f, count=0x10
rman_reserve_resource_bound:  request: [0xc000, 0xc00f], 
length 0x10, flags 102, device pcib1
rman_reserve_resource_bound: trying 0x <0xc000,0xf>
considering [0xc000, 0x]
truncated region: [0xc000, 0xc00f]; size 0x10 (requested 0x10)
candidate region: [0xc000, 0xc00f], size 0x10
allocating from the beginning
rman_manage_region:  request: start 0x6, end 
0x6000f


What you later typed does not match:

0x6
0x6000f

You later typed:

0x6000
0x600fff

This seems to have lead to some confusion from using the
wrong figure(s).


The fact that we are trying to reserve the CPU addresses in the rman is because
bus_translate_resource rewrote the start address in the resource after it was 
allocated.

That said, I can't see why rman_manage_region would actually fail.  At this 
point the
rman is empty (this is the first call to rman_manage_region for "pcib1 memory 
window"),
so only the check that should be failing are the checks against rm_start and
rm_end.  For the memory window, rm_start is always 0, and rm_end is always
0x, so both the old (0xc - 0xc00f) and new (0x6000 - 
0x600fff)
ranges are within those bounds.


No:

0x

.vs (actual):

0x6
0x6000f


Ok, then this explains the failure if the "raw" addresses are above 4G.  I have
access to an emag I'm currently using to test fixes to pci_host_generic.c to
avoid corrupting struct resource objects.  I'll post the diff once I've got
something verified to work.


It looks to me like in sys/dev/pci/pci_pci.c the:
static void
pcib_probe_windows(struct pcib_softc *sc)
{
. . .
 pcib_alloc_window(sc, >mem, SYS_RES_MEMORY, 0, 0x);
. . .
is just inappropriately restrictive about where in the system
address space a PCIe can validly be mapped to on the high end.
That, in turn, leads to the rejection on the RPi4B now that
the range use is checked.


No, the physical register in PCI-PCI bridges is only 32-bits.  Only the
prefetchable BAR supports 64-bit addresses.


Just for my edification . . .

As I understand, SYS_RES_MEMORY for the BCM2711
means the 35 bit addressing space in the BCM2711,
not a PCIe device internal address range that
corresponds. Am I wrong about that?

If I'm wrong, what does identify the 35 bit
addressing space in the BCM2711?

If I'm correct, then the 0..0x
seems to be from the wrong address space up
front. Or, may be, the SYS_RES_MEMORY and the
0x argments are not related as I
expected and the 0x is not a
SYS_RES_MEMORY value?


We use SYS_RES_MEMORY for both address spaces.  SYS_RES_MEMORY is more of
an address space "type" and doesn't necessarily name a single, unique
address space.  The way to think about these address spaces is instances
of 'struct rman'.  There's a global 'struct rman' in the arm64 nexus
driver that represents the CPU physical memory address space.  The
pci_host_

Re: Recent commits reject RPi4B booting: pcib0 vs. pcib1 "rman_manage_region: request" leads to panic

2024-02-14 Thread John Baldwin

On 2/14/24 8:42 AM, Warner Losh wrote:

On Wed, Feb 14, 2024 at 9:08 AM John Baldwin  wrote:


On 2/12/24 5:57 PM, Mark Millard wrote:

On Feb 12, 2024, at 16:36, Mark Millard  wrote:


On Feb 12, 2024, at 16:10, Mark Millard  wrote:


On Feb 12, 2024, at 12:00, Mark Millard  wrote:


[Gack: I was looking at the wrong vintage of source code, predating
your changes: wrong system used.]


On Feb 12, 2024, at 10:41, Mark Millard  wrote:


On Feb 12, 2024, at 09:32, John Baldwin  wrote:


On 2/9/24 8:13 PM, Mark Millard wrote:

Summary:
pcib0:  mem

0x7d50-0x7d50930f irq 80,81 on simplebus2

pcib0: parsing FDT for ECAM0:
pcib0:  PCI addr: 0xc000, CPU addr: 0x6, Size:

0x4000

. . .
rman_manage_region:  request: start

0x6, end 0x6000f

panic: Failed to add resource to rman


Hmmm, I suspect this is due to the way that bus_translate_resource

works which is

fundamentally broken.  It rewrites the start address of a resource

in-situ instead

of keeping downstream resources separate from the upstream

resources.   For example,

I don't see how you could ever release a resource in this design

without completely

screwing up your rman.  That is, I expect trying to detach a PCI

device behind a

translating bridge that uses the current approach should corrupt

the allocated

resource ranges in an rman long before my changes.

That said, that doesn't really explain the panic.  Hmm, the panic

might be because

for PCI bridge windows the driver now passes RF_ACTIVE and the

bus_translate_resource

hack only kicks in the activate_resource method of

pci_host_generic.c.



Detail:
. . .
pcib0:  mem

0x7d50-0x7d50930f irq 80,81 on simplebus2

pcib0: parsing FDT for ECAM0:
pcib0: PCI addr: 0xc000, CPU addr: 0x6, Size:

0x4000


This indicates this is a translating bus.


pcib1:  irq 91 at device 0.0 on pci0
rman_manage_region:  request: start 0x1, end 0x1
pcib0: rman_reserve_resource: start=0xc000, end=0xc00f,

count=0x10

rman_reserve_resource_bound:  request: [0xc000,

0xc00f], length 0x10, flags 102, device pcib1

rman_reserve_resource_bound: trying 0x <0xc000,0xf>
considering [0xc000, 0x]
truncated region: [0xc000, 0xc00f]; size 0x10

(requested 0x10)

candidate region: [0xc000, 0xc00f], size 0x10
allocating from the beginning
rman_manage_region:  request: start

0x6, end 0x6000f


What you later typed does not match:

0x6
0x6000f

You later typed:

0x6000
0x600fff

This seems to have lead to some confusion from using the
wrong figure(s).


The fact that we are trying to reserve the CPU addresses in the

rman is because

bus_translate_resource rewrote the start address in the resource

after it was allocated.


That said, I can't see why rman_manage_region would actually fail.

At this point the

rman is empty (this is the first call to rman_manage_region for

"pcib1 memory window"),

so only the check that should be failing are the checks against

rm_start and

rm_end.  For the memory window, rm_start is always 0, and rm_end is

always

0x, so both the old (0xc - 0xc00f) and new

(0x6000 - 0x600fff)

ranges are within those bounds.


No:

0x

.vs (actual):

0x6
0x6000f


Ok, then this explains the failure if the "raw" addresses are above 4G.  I
have
access to an emag I'm currently using to test fixes to pci_host_generic.c
to
avoid corrupting struct resource objects.  I'll post the diff once I've got
something verified to work.


It looks to me like in sys/dev/pci/pci_pci.c the:

static void
pcib_probe_windows(struct pcib_softc *sc)
{
. . .
  pcib_alloc_window(sc, >mem, SYS_RES_MEMORY, 0, 0x);
. . .

is just inappropriately restrictive about where in the system
address space a PCIe can validly be mapped to on the high end.
That, in turn, leads to the rejection on the RPi4B now that
the range use is checked.


No, the physical register in PCI-PCI bridges is only 32-bits.  Only the
prefetchable BAR supports 64-bit addresses.  This is why the host bridge
is doing a translation from the CPU side (0x6) to the PCI BAR
addresses (0xc000) so that the BAR addresses are down in the 32-bit
address range.  It's also true that many PCI devices only support 32-bit
addresses in memory BARs.  64-bit BARs are an optional extension not
universally supported.

The translation here is somewhat akin to a type of MMU where the CPU
addresses are mapped to PCI addresses.  The problem here is that the
PCI BAR resources need to "stay" as PCI addresses since we depend on
being able to use rman_get_start/end to get the PCI addresses of
allocated resources, but pci_host_generic.c currently rewrites the
addresses.

Probably I should remove rman_set_start/end entirely (Warner added them
back in 2004) as the methods don't do anything to deal with the fallout
that th

Re: Recent commits reject RPi4B booting: pcib0 vs. pcib1 "rman_manage_region: request" leads to panic

2024-02-14 Thread John Baldwin

On 2/12/24 5:57 PM, Mark Millard wrote:

On Feb 12, 2024, at 16:36, Mark Millard  wrote:


On Feb 12, 2024, at 16:10, Mark Millard  wrote:


On Feb 12, 2024, at 12:00, Mark Millard  wrote:


[Gack: I was looking at the wrong vintage of source code, predating
your changes: wrong system used.]


On Feb 12, 2024, at 10:41, Mark Millard  wrote:


On Feb 12, 2024, at 09:32, John Baldwin  wrote:


On 2/9/24 8:13 PM, Mark Millard wrote:

Summary:
pcib0:  mem 0x7d50-0x7d50930f 
irq 80,81 on simplebus2
pcib0: parsing FDT for ECAM0:
pcib0:  PCI addr: 0xc000, CPU addr: 0x6, Size: 0x4000
. . .
rman_manage_region:  request: start 0x6, end 
0x6000f
panic: Failed to add resource to rman


Hmmm, I suspect this is due to the way that bus_translate_resource works which 
is
fundamentally broken.  It rewrites the start address of a resource in-situ 
instead
of keeping downstream resources separate from the upstream resources.   For 
example,
I don't see how you could ever release a resource in this design without 
completely
screwing up your rman.  That is, I expect trying to detach a PCI device behind a
translating bridge that uses the current approach should corrupt the allocated
resource ranges in an rman long before my changes.

That said, that doesn't really explain the panic.  Hmm, the panic might be 
because
for PCI bridge windows the driver now passes RF_ACTIVE and the 
bus_translate_resource
hack only kicks in the activate_resource method of pci_host_generic.c.


Detail:
. . .
pcib0:  mem 0x7d50-0x7d50930f 
irq 80,81 on simplebus2
pcib0: parsing FDT for ECAM0:
pcib0: PCI addr: 0xc000, CPU addr: 0x6, Size: 0x4000


This indicates this is a translating bus.


pcib1:  irq 91 at device 0.0 on pci0
rman_manage_region:  request: start 0x1, end 0x1
pcib0: rman_reserve_resource: start=0xc000, end=0xc00f, count=0x10
rman_reserve_resource_bound:  request: [0xc000, 0xc00f], 
length 0x10, flags 102, device pcib1
rman_reserve_resource_bound: trying 0x <0xc000,0xf>
considering [0xc000, 0x]
truncated region: [0xc000, 0xc00f]; size 0x10 (requested 0x10)
candidate region: [0xc000, 0xc00f], size 0x10
allocating from the beginning
rman_manage_region:  request: start 0x6, end 
0x6000f


What you later typed does not match:

0x6
0x6000f

You later typed:

0x6000
0x600fff

This seems to have lead to some confusion from using the
wrong figure(s).


The fact that we are trying to reserve the CPU addresses in the rman is because
bus_translate_resource rewrote the start address in the resource after it was 
allocated.

That said, I can't see why rman_manage_region would actually fail.  At this 
point the
rman is empty (this is the first call to rman_manage_region for "pcib1 memory 
window"),
so only the check that should be failing are the checks against rm_start and
rm_end.  For the memory window, rm_start is always 0, and rm_end is always
0x, so both the old (0xc - 0xc00f) and new (0x6000 - 
0x600fff)
ranges are within those bounds.


No:

0x

.vs (actual):

0x6
0x6000f


Ok, then this explains the failure if the "raw" addresses are above 4G.  I have
access to an emag I'm currently using to test fixes to pci_host_generic.c to
avoid corrupting struct resource objects.  I'll post the diff once I've got
something verified to work.


It looks to me like in sys/dev/pci/pci_pci.c the:

static void
pcib_probe_windows(struct pcib_softc *sc)
{
. . .
 pcib_alloc_window(sc, >mem, SYS_RES_MEMORY, 0, 0x);
. . .

is just inappropriately restrictive about where in the system
address space a PCIe can validly be mapped to on the high end.
That, in turn, leads to the rejection on the RPi4B now that
the range use is checked.


No, the physical register in PCI-PCI bridges is only 32-bits.  Only the
prefetchable BAR supports 64-bit addresses.  This is why the host bridge
is doing a translation from the CPU side (0x6) to the PCI BAR
addresses (0xc000) so that the BAR addresses are down in the 32-bit
address range.  It's also true that many PCI devices only support 32-bit
addresses in memory BARs.  64-bit BARs are an optional extension not
universally supported.

The translation here is somewhat akin to a type of MMU where the CPU
addresses are mapped to PCI addresses.  The problem here is that the
PCI BAR resources need to "stay" as PCI addresses since we depend on
being able to use rman_get_start/end to get the PCI addresses of
allocated resources, but pci_host_generic.c currently rewrites the
addresses.

Probably I should remove rman_set_start/end entirely (Warner added them
back in 2004) as the methods don't do anything to deal with the fallout
that the rman.rm_list linked-list is no longer sorted by address once
some addresses get rewritten, etc.

--
John Baldwin




Re: Recent commits reject RPi4B booting: pcib0 vs. pcib1 "rman_manage_region: request" leads to panic

2024-02-12 Thread John Baldwin

On 2/10/24 2:09 PM, Michael Butler wrote:

I have stability problems with anything at or after this commit
(b377ff8) on an amd64 laptop. While I see the following panic logged, no
crash dump is preserved :-( It happens after ~5-6 minutes running in KDE
(X).

Reverting to 36efc64 seems to work reliably (after ACPI changes but
before the problematic PCI one)

kernel: Fatal trap 12: page fault while in kernel mode
kernel: cpuid = 2; apic id = 02
kernel: fault virtual address = 0x48
kernel: fault code= supervisor read data, page not present
kernel: instruction pointer   = 0x20:0x80acb962
kernel: stack pointer = 0x28:0xfe00c4318d80
kernel: frame pointer = 0x28:0xfe00c4318d80
kernel: code segment  = base 0x0, limit 0xf, type 0x1b
kernel:   = DPL 0, pres 1, long 1, def32 0, gran 1
kernel: processor eflags  = interrupt enabled, resume, IOPL = 0
kernel: current process   = 2 (clock (0))
kernel: rdi: f802e460c000 rsi:  rdx: 0002
kernel: rcx:   r8: 001e  r9: fe00c4319000
kernel: rax: 0002 rbx: f802e460c000 rbp: fe00c4318d80
kernel: r10: 1388 r11: 7ffc765d r12: 000f
kernel: r13: 0002 r14: f8000193e740 r15: 
kernel: trap number   = 12
kernel: panic: page fault
kernel: cpuid = 2
kernel: time = 1707573802
kernel: Uptime: 6m19s
kernel: Dumping 942 out of 16242
MB:..2%..11%..21%..31%..41%..51%..62%..72%..82%..92%
kernel: Dump complete
kernel: Automatic reboot in 15 seconds - press a key on the console to abort


Without a stack trace it is pretty much impossible to debug a panic like this.
Do you have KDB_TRACE enabled in your kernel config?  I'm also not sure how the
PCI changes can result in a panic post-boot.  If you were going to have problems
they would be during device attach, not after you are booted and running X.

Short of a stack trace, you can at least use lldb or gdb to lookup the source
line associated with the faulting instruction pointer (as long as it isn't in
a kernel module), e.g. for gdb you would use 'gdb /boot/kernel/kernel' and then
'l *', e.g. from above: 'l *0x80acb962'

--
John Baldwin




Re: Recent commits reject RPi4B booting: pcib0 vs. pcib1 "rman_manage_region: request" leads to panic

2024-02-12 Thread John Baldwin

On 2/9/24 8:13 PM, Mark Millard wrote:

Summary:

pcib0:  mem 0x7d50-0x7d50930f 
irq 80,81 on simplebus2
pcib0: parsing FDT for ECAM0:
pcib0:  PCI addr: 0xc000, CPU addr: 0x6, Size: 0x4000
. . .
rman_manage_region:  request: start 0x6, end 
0x6000f
panic: Failed to add resource to rman


Hmmm, I suspect this is due to the way that bus_translate_resource works which 
is
fundamentally broken.  It rewrites the start address of a resource in-situ 
instead
of keeping downstream resources separate from the upstream resources.   For 
example,
I don't see how you could ever release a resource in this design without 
completely
screwing up your rman.  That is, I expect trying to detach a PCI device behind a
translating bridge that uses the current approach should corrupt the allocated
resource ranges in an rman long before my changes.

That said, that doesn't really explain the panic.  Hmm, the panic might be 
because
for PCI bridge windows the driver now passes RF_ACTIVE and the 
bus_translate_resource
hack only kicks in the activate_resource method of pci_host_generic.c.


Detail:


. . .
pcib0:  mem 0x7d50-0x7d50930f 
irq 80,81 on simplebus2
pcib0: parsing FDT for ECAM0:
pcib0:  PCI addr: 0xc000, CPU addr: 0x6, Size: 0x4000


This indicates this is a translating bus.


pcib1:  irq 91 at device 0.0 on pci0
rman_manage_region:  request: start 0x1, end 0x1
pcib0: rman_reserve_resource: start=0xc000, end=0xc00f, count=0x10
rman_reserve_resource_bound:  request: [0xc000, 0xc00f], 
length 0x10, flags 102, device pcib1
rman_reserve_resource_bound: trying 0x <0xc000,0xf>
considering [0xc000, 0x]
truncated region: [0xc000, 0xc00f]; size 0x10 (requested 0x10)
candidate region: [0xc000, 0xc00f], size 0x10
allocating from the beginning
rman_manage_region:  request: start 0x6, end 
0x6000f


The fact that we are trying to reserve the CPU addresses in the rman is because
bus_translate_resource rewrote the start address in the resource after it was 
allocated.

That said, I can't see why rman_manage_region would actually fail.  At this 
point the
rman is empty (this is the first call to rman_manage_region for "pcib1 memory 
window"),
so only the check that should be failing are the checks against rm_start and
rm_end.  For the memory window, rm_start is always 0, and rm_end is always
0x, so both the old (0xc - 0xc00f) and new (0x6000 - 
0x600fff)
ranges are within those bounds.  I would instead expect to see some other issue 
later
on where we fail to allocate a resource for a child BAR, but I wouldn't expect
rman_manage_region to fail.

Logging the return value from rman_manage_region would be the first step I think
to see which error value it is returning.

Probably I should fix pci_host_generic.c to handle translation properly however.
I can work on a patch for that.

--
John Baldwin




Re: Alder lake supported? (graphics)

2024-01-16 Thread John D Groenveld
In message , Chris writes:
>I upgraded to an alder lake based machine and installed 14.
>But I can't seem to get the intel graphics loaded (drm-515-kmod).
>It simply freezes at load.

Shot in the dark:
# pkg delete drm-515-kmod && pkg install drm-510-kmod && kldload i915kms

John
groenv...@acm.org



Re: How to upgrade an EOL FreeBSD release or how to make it working again

2024-01-15 Thread John F Carr
Judging by a commit message BSD on the ARM Chromebook didn't work
when support was removed in 2019.

>RK* Exynos* and Meson*/Odroid* don't even work with current
>source code, if someone wants to make them work again they
>better use the Linux DTS.
https://cgit.freebsd.org/src/commit?id=9dfa2a54684978d1d6cef67bbf6242e825801f18

I have one of the "snow" Chromebooks.  The warnings in the web page
https://wiki.freebsd.org/arm/Chromebook led me not to try FreeBSD.
None of the many bugs seemed likely to ever be fixed.  I'm not using it
so I could try an experiment, but fighting with u-boot is not how I want
to spend my days.  Even the popular Raspberry Pi takes skill or luck.

(So "build an arm6 world and copy X, Y, and Z to the DOS partition
on your USB drive" is the kind of advice I need to supplement the old
Chromebook wiki page.)

There is at least a little value in getting it to work because the armv6
code is bit rotting and will go away entirely unless people use it.

John Carr


> On Jan 15, 2024, at 10:59, Mario Marietto  wrote:
> 
> Hello to everyone.
> 
> I'm trying to install FreeBSD 14 natively on my ARM Chromebook model xe303c12 
> ; I've found only one tutorial that teaches how to do that,that's it :
> 
> https://wiki.freebsd.org/arm/Chromebook
> 
> The problem is that it ends with the installation of FreeBSD 11,that's very 
> EOL.
> I can't use it as is. I need to upgrade it to 14 (but I'm on arm 32 
> bit,that's TIER-2,so I can't upgrade it automatically using the 
> freebsd-update script. It is also true that I can't install 14 directly on 
> that machine,as you can read below :
> 
> 
> 
> 
> I've looked all around and I found the tool pkgbase,that I'm talking about on 
> the FreeBSD forum,to understand if it allows the 11 to be usable or 
> upgradable. It does not seem to be the proper tool to achieve my goal. Do you 
> have any suggestions that can help me ? Thanks.
> 
> -- 
> Mario.




Re: ZFS problems since recently ?

2024-01-10 Thread John Kennedy
On Tue, Jan 02, 2024 at 05:51:32PM -0500, Alexander Motin wrote:
> Please see/test: https://github.com/openzfs/zfs/pull/15732 .

  Looks like that has landed in current:

commit f552d7adebb13e24f65276a6c4822bffeeac3993
Merge: 13720136fbf a382e21194c
Author: Martin Matuska 
Date:   Wed Jan 10 09:07:45 2024 +0100

zfs: merge openzfs/zfs@a382e2119

Notable upstream pull request merges:
 #15693 a382e2119 Add Gotify notification support to ZED
-->  #15732 e78aca3b3 Fix livelist assertions for dedup and cloning
 #15733 7ecaa0758 make zdb_decompress_block check decompression 
reliably
 #15735 255741fc9 Improve block sizes checks during cloning

Obtained from:  OpenZFS
OpenZFS commit: a382e21194c1690951d2eee8ebd98bc096f01c83



Re: ZFS problems since recently ?

2024-01-04 Thread John Kennedy
On Tue, Jan 02, 2024 at 08:02:04PM -0800, John Kennedy wrote:
> On Tue, Jan 02, 2024 at 05:51:32PM -0500, Alexander Motin wrote:
> > On 01.01.2024 08:59, John Kennedy wrote:
> > >  ...
> > >My poudriere build did eventually fail as well:
> > >   ...
> > >   [05:40:24] [01] [00:17:20] Finished devel/gdb@py39 | gdb-13.2_1: Success
> > >   [05:40:24] Stopping 2 builders
> > >   panic: VERIFY(BP_GET_DEDUP(bp)) failed
> > 
> > Please see/test: https://github.com/openzfs/zfs/pull/15732 .
> 
>   It came back today at the end of my poudriere build.  Your patch has fixed
> it, so far at least.

  At the risk of conflating this with other ZFS issues, I beat on the VM a lot
more last night without triggering any panics.  My usual busy-workload is a
total kernel+world rebuild (with whatever pending patches might be out), then
a poudriere run (~230 or so packages).  It's weird that the first (much bigger)
run worked but later ones didn't (where maybe I had one port that failed to
build), triggering the panic.  Seemed repeatable, but don't have a feel for
the exact trigger like the sysctl issue.



Re: ZFS problems since recently ?

2024-01-02 Thread John Kennedy
On Tue, Jan 02, 2024 at 05:51:32PM -0500, Alexander Motin wrote:
> On 01.01.2024 08:59, John Kennedy wrote:
> >  ...
> >My poudriere build did eventually fail as well:
> > ...
> > [05:40:24] [01] [00:17:20] Finished devel/gdb@py39 | gdb-13.2_1: Success
> > [05:40:24] Stopping 2 builders
> > panic: VERIFY(BP_GET_DEDUP(bp)) failed
> 
> Please see/test: https://github.com/openzfs/zfs/pull/15732 .

  It came back today at the end of my poudriere build.  Your patch has fixed
it, so far at least.




Re: ZFS problems since recently ?

2024-01-01 Thread John Kennedy
On Mon, Jan 01, 2024 at 02:27:17PM +0100, Kurt Jaeger wrote:
> > On Mon, Jan 01, 2024 at 06:43:58AM +0100, Kurt Jaeger wrote:
> > > markj@ pointed me in
> > > https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=276039
> > > to
> > > https://github.com/openzfs/zfs/pull/15719 
> > > 
> > > So it will probably be fixed sooner or later.
> > > 
> > > The other ZFS crashes I've seen are still an issue.
> > 
> >   My poudriere build did eventually fail as well:
> > ...
> > [05:40:24] [01] [00:17:20] Finished devel/gdb@py39 | gdb-13.2_1: Success
> > [05:40:24] Stopping 2 builders
> > panic: VERIFY(BP_GET_DEDUP(bp)) failed
> 
> That's one of the panic messages I had as well.
> 
> See
> 
> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=276051
> 
> for additional crashes and dumps.
> 
> >   I didn't tweak this system off defaults for block-cloning.  I haven't 
> > been following
> > that issue 100%.
> 
> Do you have
>   vfs.zfs.dmu_offset_next_sync=0
> ?

  I reverted everything and reinstalled.  The VERIFY(BP_GET_DEDUP(bp)) panic
hasn't reoccurred (tended to happen on poudriere-build cleanup), which may
lean it more towards corruption, or maybe I just haven't been "lucky" with my
small random chance of corruption.

  I did set vfs.zfs.dmu_offset_next_sync=0 after the bsdinstall was complete
(maybe I could have loaded the zfs kernel module from the shell and set it
before things kicked off).




Re: ZFS problems since recently ?

2024-01-01 Thread John Kennedy
On Mon, Jan 01, 2024 at 08:42:26AM -0800, John Kennedy wrote:
>   Applying the two ZFS kernel patches fixes that issue:

commit 09af4bf2c987f6f57804162cef8aeee05575ad1d (zfs: Fix SPA sysctl handlers) 
landed too.

root@bsd15:~ # sysctl -a | grep vfs.zfs.zio
vfs.zfs.zio.deadman_log_all: 0
vfs.zfs.zio.dva_throttle_enabled: 1
vfs.zfs.zio.requeue_io_start_cut_in_line: 1
vfs.zfs.zio.slow_io_ms: 3
vfs.zfs.zio.taskq_wr_iss_ncpus: 0
vfs.zfs.zio.taskq_write: sync fixed,1,5 scale fixed,1,5
vfs.zfs.zio.taskq_read: fixed,1,8 null scale null
vfs.zfs.zio.taskq_batch_tpq: 0
vfs.zfs.zio.taskq_batch_pct: 80
vfs.zfs.zio.exclude_metadata: 0

root@bsd15:~ # uname -aUK
FreeBSD bsd15 15.0-CURRENT FreeBSD 15.0-CURRENT #1 
main-n267336-09af4bf2c98: Mon Jan  1 12:04:15 PST 2024 
warlock@bsd15:/usr/obj/usr/src/amd64.amd64/sys/GENERIC amd64 158 158




Re: ZFS problems since recently ?

2024-01-01 Thread John Kennedy
On Mon, Jan 01, 2024 at 06:43:58AM +0100, Kurt Jaeger wrote:
> > >   I can crash mine with "sysctl -a" as well.
> 
> markj@ pointed me in
> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=276039
> to
> https://github.com/openzfs/zfs/pull/15719 
> 
> So it will probably be fixed sooner or later.
> 
> The other ZFS crashes I've seen are still an issue.

  Applying the two ZFS kernel patches fixes that issue:

root@bsd15:~ # sysctl -a | grep vfs.zfs.zio
vfs.zfs.zio.deadman_log_all: 0
vfs.zfs.zio.dva_throttle_enabled: 1
vfs.zfs.zio.requeue_io_start_cut_in_line: 1
vfs.zfs.zio.slow_io_ms: 3
vfs.zfs.zio.taskq_wr_iss_ncpus: 0
vfs.zfs.zio.taskq_write: sync fixed,1,5 scale fixed,1,5
vfs.zfs.zio.taskq_read: fixed,1,8 null scale null
vfs.zfs.zio.taskq_batch_tpq: 0
vfs.zfs.zio.taskq_batch_pct: 80
vfs.zfs.zio.exclude_metadata: 0

root@bsd15:~ # uname -aUK
FreeBSD bsd15 15.0-CURRENT FreeBSD 15.0-CURRENT #2 
main-n267335-499e84e16f5-dirty: Mon Jan  1 08:04:59 PST 2024 
warlock@bsd15:/usr/obj/usr/src/amd64.amd64/sys/GENERIC amd64 158 158




Re: ZFS problems since recently ?

2024-01-01 Thread John Kennedy
On Mon, Jan 01, 2024 at 02:27:17PM +0100, Kurt Jaeger wrote:
> Do you have
>vfs.zfs.dmu_offset_next_sync=0

  I didn't initially, I do now.  Like I said, I haven't been following that one
100%.  I know it isn't block-clone per say, so much as some underlying problem
it pokes with a pointy stick.  Small chance multiplied by a bunch of ZFS IOPS.

  Seems like I'd have to revert it all the way back to fresh install if I want
to get rid of all potential corruption unrelated to sysctl panic.

  But I'll do myh busy-work cycle (*) with that one and maybe another with it
off and see what happens.


  * full kernel+world, plus my local poudriere package build, currenly wedged
a bit with the heimdall build issue.



Re: ZFS problems since recently ?

2024-01-01 Thread John Kennedy
On Mon, Jan 01, 2024 at 06:43:58AM +0100, Kurt Jaeger wrote:
> markj@ pointed me in
> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=276039
> to
> https://github.com/openzfs/zfs/pull/15719 
> 
> So it will probably be fixed sooner or later.
> 
> The other ZFS crashes I've seen are still an issue.

  My poudriere build did eventually fail as well:

...
[05:40:24] [01] [00:17:20] Finished devel/gdb@py39 | gdb-13.2_1: Success
[05:40:24] Stopping 2 builders
panic: VERIFY(BP_GET_DEDUP(bp)) failed

cpuid = 2
time = 1704091946
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 
0xfe00f62898c0
vpanic() at vpanic+0x131/frame 0xfe00f62899f0
spl_panic() at spl_panic+0x3a/frame 0xfe00f6289a50
dsl_livelist_iterate() at dsl_livelist_iterate+0x2de/frame 
0xfe00f6289b30
bpobj_iterate_blkptrs() at bpobj_iterate_blkptrs+0x235/frame 
0xfe00f6289bf0
bpobj_iterate_impl() at bpobj_iterate_impl+0x16e/frame 
0xfe00f6289c80
dsl_process_sub_livelist() at dsl_process_sub_livelist+0x5c/frame 
0xfe00f6289d00
spa_livelist_delete_cb() at spa_livelist_delete_cb+0xf6/frame 
0xfe00f6289ea0
zthr_procedure() at zthr_procedure+0xa5/frame 0xfe00f6289ef0
fork_exit() at fork_exit+0x82/frame 0xfe00f6289f30
fork_trampoline() at fork_trampoline+0xe/frame 0xfe00f6289f30
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
KDB: enter: panic
[ thread pid 9 tid 100223 ]
Stopped at  kdb_enter+0x33: movq$0,0xe3a582(%rip)
db>

  Trying to do another poudriere build fails almost immediatly with that verify 
error.

  Your verify errors don't match up exactly.  I've got snapshots from before I 
started
freaking it out with the sysctl calls and possibly inducing corruption.

  I didn't tweak this system off defaults for block-cloning.  I haven't been 
following
that issue 100%.



Re: ZFS problems since recently ?

2023-12-31 Thread John Kennedy
>   I can crash mine with "sysctl -a" as well.

  Smaller test, this is sufficient to crash things:

root@bsd15:~ # sysctl vfs.zfs.zio
vfs.zfs.zio.deadman_log_all: 0
vfs.zfs.zio.dva_throttle_enabled: 1
vfs.zfs.ziopanic: sbuf_clear makes no sense on sbuf 0xf8002c8dc300 
with drain
cpuid = 3
time = 1704069514
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 
0xfe00fa502960
vpanic() at vpanic+0x131/frame 0xfe00fa502a90
panic() at panic+0x43/frame 0xfe00fa502af0
sbuf_clear() at sbuf_clear+0xa8/frame 0xfe00fa502b00
sbuf_cpy() at sbuf_cpy+0x56/frame 0xfe00fa502b20
spa_taskq_write_param() at spa_taskq_write_param+0x85/frame 
0xfe00fa502bd0
sysctl_root_handler_locked() at sysctl_root_handler_locked+0x9c/frame 
0xfe00fa502c20
sysctl_root() at sysctl_root+0x21e/frame 0xfe00fa502ca0
userland_sysctl() at userland_sysctl+0x184/frame 0xfe00fa502d50
sys___sysctl() at sys___sysctl+0x60/frame 0xfe00fa502e00
amd64_syscall() at amd64_syscall+0x153/frame 0xfe00fa502f30
fast_syscall_common() at fast_syscall_common+0xf8/frame 
0xfe00fa502f30
--- syscall (202, FreeBSD ELF64, __sysctl), rip = 0x3733c1e5619a, rsp = 
0x3733bf494538, rbp = 0x3733bf494570 ---
KDB: enter: panic
[ thread pid 780 tid 100237 ]
Stopped at  kdb_enter+0x33: movq$0,0xe3a582(%rip)
db> 



Re: ZFS problems since recently ?

2023-12-31 Thread John Kennedy
On Sun, Dec 31, 2023 at 07:34:45PM +0100, Kurt Jaeger wrote:
> Hi!
> 
> Short overview:
> - Had CURRENT system from around September
> - Upgrade on the 23th of December
> - crashes in ZFS, see
>   https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=261538
>   for details
> - Reinstalled from scratch with new SSDs drives from
> https://download.freebsd.org/snapshots/amd64/amd64/ISO-IMAGES/15.0/
> freebsd-openzfs-amd64-2020081900-memstick.img.xz
> - Had one crash with
>   sysctl -a
>   https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=276039
> - Still see crashes with ZFS (and other) when using poudriere to
>   build ports.
> 
> Problem:
> 
> I happen to run in several cases of crashes in ZFS, some of
> them fatal (zpool non-recoverable).

  I can crash mine with "sysctl -a" as well.

  I seeded my bhyve with:
FreeBSD-15.0-CURRENT-amd64-20231228-fb03f7f8e30d-267242-disc1.iso

  Rebuilt the kernel (so now at main-n267320-4d08b569a01) and started
crunching through poudriere package builds.  Sorta stock install of encrypted
ZFS.  I didn't get it to crash with poudriere (yet).  Mine lives in bhyve,
so maybe less possible destruction via crashes.

KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 
0xfe00fa5f3960
vpanic() at vpanic+0x131/frame 0xfe00fa5f3a90
panic() at panic+0x43/frame 0xfe00fa5f3af0
sbuf_clear() at sbuf_clear+0xa8/frame 0xfe00fa5f3b00
sbuf_cpy() at sbuf_cpy+0x56/frame 0xfe00fa5f3b20
spa_taskq_write_param() at spa_taskq_write_param+0x85/frame 
0xfe00fa5f3bd0
sysctl_root_handler_locked() at sysctl_root_handler_locked+0x9c/frame 
0xfe00fa5f3c20
sysctl_root() at sysctl_root+0x21e/frame 0xfe00fa5f3ca0
userland_sysctl() at userland_sysctl+0x184/frame 0xfe00fa5f3d50
sys___sysctl() at sys___sysctl+0x60/frame 0xfe00fa5f3e00
amd64_syscall() at amd64_syscall+0x153/frame 0xfe00fa5f3f30
fast_syscall_common() at fast_syscall_common+0xf8/frame 
0xfe00fa5f3f30
--- syscall (202, FreeBSD ELF64, __sysctl), rip = 0x22e42167019a, rsp = 
0x22e41ee72518, rbp = 0x22e41ee72550 ---
KDB: enter: panic

  The sysctl died at this point, but who knows if it had pending buffered 
output or anything...

...
vfs.zfs.zio.deadman_log_all: 0
vfs.zfs.zio.dva_throttle_enabled: 1
vfs.zfs.zio.requeue_io_start_cut_in_line: 1
vfs.zfs.zio.slow_io_ms: 3
vfs.zfs.zio.taskq_wr_iss_ncpus: 0




Re: make installworld fails because /usr/include/c++/v1/__tuple is a file

2023-12-13 Thread John Baldwin

On 12/10/23 8:43 AM, Dimitry Andric wrote:

On 10 Dec 2023, at 15:11, Herbert J. Skuhra  wrote:


On Sun, Dec 10, 2023 at 01:22:38PM +, John F Carr wrote:

On arm64 running CURRENT from two weeks ago I updated to

  c711af772782 Bump __FreeBSD_version for llvm 17.0.6 merge

and built and installed from source.  make installworld failed:

  install: target directory `/usr/include/c++/v1/__tuple/' does not exist

That pathname is a file:

  -r--r--r--  1 root wheel 20512 Feb 15  2023 /usr/include/c++/v1/__tuple

Early in make output is

  mtree -deU -i -f /usr/src/etc/mtree/BSD.include.dist -p /usr/include
  ./c++/v1/__algorithm/pstl_backends missing (created)
  [...]
  ./c++/v1/__tuple missing (not created: File exists)

Should I remove the file and try again, or is there a more elegant fix?

The word "tuple" does not appear in UPDATING.


'make delete-old' should have removed this file.

bdd1243df58e6 (Dimitry Andric  2023-04-14 23:41:27 +0200   965)
OLD_FILES+=usr/include/c++/v1/__tuple


Ah yes, that's it. The file was removed during the upgrade from libc++
15.0 to 16.0, while its contents was split into a subdirectory named
__tuple_dir. In libc++ 17.0.0 they renamed this subdirectory back to
just __tuple.

This means that apparently people are not running "make delete-old"
after installations. Please don't forget that. :)


Well, but if you have an old system with LLVM 15 that you upgrade directly
to LLVM 17 you will hit this even if you ran delete-old after your last
upgrading that used LLVM 15.  We might need something to cope with this
during the install target for libc++ in particular where this has occurred
multiple times historically.

--
John Baldwin




make installworld fails because /usr/include/c++/v1/__tuple is a file

2023-12-10 Thread John F Carr
On arm64 running CURRENT from two weeks ago I updated to

  c711af772782 Bump __FreeBSD_version for llvm 17.0.6 merge

and built and installed from source.  make installworld failed:

  install: target directory `/usr/include/c++/v1/__tuple/' does not exist

That pathname is a file:

  -r--r--r--  1 root wheel 20512 Feb 15  2023 /usr/include/c++/v1/__tuple

Early in make output is

  mtree -deU -i -f /usr/src/etc/mtree/BSD.include.dist -p /usr/include
  ./c++/v1/__algorithm/pstl_backends missing (created)
  [...]
  ./c++/v1/__tuple missing (not created: File exists)

Should I remove the file and try again, or is there a more elegant fix?

The word "tuple" does not appear in UPDATING.




Re: How do I update the kernel of FreeBSD-CURRENT

2023-11-29 Thread John Nielsen
On Nov 29, 2023, at 12:21 PM, Manoel Games  wrote:
> 
> I am a new FreeBSD user, and I am using FreeBSD-CURRENT. How do I update the 
> FreeBSD-CURRENT kernel, and is it done through pkg? I installed 
> FreeBSD-CURRENT without src.

As a new user you should probably run a supported release version, such as 
14.0.  Releases have binary updates available via freebsd-update. (Upgrading 
the base OS via pkg is still experimental.) Current has no such feature, so you 
need to download/update the source and recompile.

See the Handbook chapter on upgrading FreeBSD:
https://docs.freebsd.org/en/books/handbook/cutting-edge/

JN



Re: bhyve -G

2023-11-15 Thread John Baldwin

On 11/15/23 3:06 PM, Bakul Shah wrote:




On Nov 15, 2023, at 7:57 AM, John Baldwin  wrote:

On 10/9/23 5:21 PM, Bakul Shah wrote:

Any hints on how to use bhyve's -G  option to debug a VM
kernel? I can connect to it from gdb with "target remote :"
& bhyve stops the VM initially but beyond that I am not sure.
Ideally this should work just like an in-circuit-emulator, not
requiring anything special in the VM or kernel itself.


step only works on Intel CPUs currently (and is a bit fragile
anyway due to interrupts firing while you try to step, but that
happens for me in QEMU as well).   Breakpoints should work fine.
I tend to use 'until' to do stepping (basically stepping via
temporary breakpoints) when debugging the kernel this way.


Thanks for your response!

I can ^C to stop the VM, examine the stack, set breakpoints,
continue etc. but when the breakpoint is hit, kgdb doesn't
regain control -- instead I get the usual

db> ...

prompt on the console. I guess I have to set some sysctl for
this?


Hmm, no, it shouldn't be breaking into DDB in the guest as the
breakpoint exception should be intercepted by the stub and
never made visible to the guest.

--
John Baldwin




Re: [HEADS-UP] Quick update to 14.0-RELEASE schedule

2023-11-15 Thread John Baldwin

On 11/14/23 8:52 PM, Glen Barber wrote:

On Tue, Nov 14, 2023 at 08:10:23PM -0700, The Doctor wrote:

On Wed, Nov 15, 2023 at 02:27:01AM +, Glen Barber wrote:

On Tue, Nov 14, 2023 at 05:15:48PM -0700, The Doctor wrote:

On Tue, Nov 14, 2023 at 08:36:54PM +, Glen Barber wrote:

We are still waiting for a few (non-critical) things to complete before
the announcement of 14.0-RELEASE will be ready.

It should only be another day or so before these things complete.

Thank you for your understanding.



I always just installed my copy.



Ok.  I do not know what exactly is your point, but releases are never
official until there is a PGP-signed email sent.  The email is intended
for the general public of consumers of official releases, not "yeah,
but"s.



Howver if you do a freebsd-update upgrade, you can upgrade.

Is that suppose to happen?



That does not say that the freebsd-update bits will not change *until*
the official release announcement has been sent.

In my past 15 years involved in the Project, I think we have been very
clear on that.

A RELEASE IS NOT FINAL UNTIL THE PGP-SIGNED ANNOUNCEMENT IS SENT.

I mean, c'mon, dude.

We really, seriously, for all intents and purposes, cannot be any more
clear than that.

So, yes, *IF* an update necessitates a new freebsd-update build, what
you are running is *NOT* official.

For at least 15 years, we have all said the same entire thing.


Yes, but, if at this point we had to rebuild, it would have to be 14.0.1
or something (which we have done a few times in the past).  It would be
too confusing otherwise once the bits are built and published (where
published means "uploaded to our CDN").  It is the 14.0 release bits,
the only question is if for some reason we had a dire emergency that
meant we had to pull it at the last minute and publish different bits
(under a different release name).

Realistically, once the bits are available, we can't prevent people from
using them, it's just at their own risk to do so until the project says
"yes, we believe these are good".  Granted, they are under the same risk
if they are still running the last RC.  The best way to minimize that
risk going forward is to add more automation of testing/CI to go along
with the process of building release bits so that the build artifacts
from the release build run through CI and are only published if the CI
is green as that would give us greater confidence of "we believe these
are good" before they are uploaded for publishing.

--
John Baldwin




Re: bsdinstall/scriptedpart could not run ;-(

2023-11-15 Thread John Baldwin

On 11/12/23 11:00 PM, KIRIYAMA Kazuhiko wrote:

Hi, all

I usually run bsdinstall by instllerconfig, but
bsdinstall/scriptedpart could not run ;-(

My installerconfig is:

PARTITIONS='nda0 gpt { 200M efi, 804G freebsd-ufs /, 128G freebsd-swap }'
DISTRIBUTIONS='base.txz kernel-dbg.txz kernel.txz lib32.txz tests.txz'
ZFSBOOT_DISKS=""

#!/bin/sh
/bin/mkdir -p /.dake
/bin/cp /usr/share/zoneinfo/Asia/Tokyo /etc/localtime
/bin/cp /root/.cshrc /root/.cshrc.org
/bin/cat <> /etc/fstab
192.168.1.17:/.dake /.dake  nfs rw  0   0
EOF
sed -i".bak" -Ee 
'/^#BDS_install.sh_added:start_line$/,/^#BDS_install.sh_added:end_line$/d' /root/.cshrc
/bin/cat <<'EOF' >> /root/.cshrc
#BDS_install.sh_added:start_line
setenv  PATH${PATH}:/.dake/bin
setenv  MGRHOME /usr/home/admin
setenv  OPENTOOLSDIR/.dake
setenv  DAKEDIR /.dake
#BDS_install.sh_added:end_line
EOF
   :
(snip)
   :

I investigated in bsdinstall script and found scriptedpart
which acutually run partedit with scriptedpart would not be
destroy existing partition. In fact scriptedpart -> partedit
changed in script as follows, then parttion editor run at
terminal.


My guess is something to do with commit 
23099099196548550461ba427dcf09dcfb01878d,
though I don't see how it could work any differently in this case as the only
change to part_config there was to return if if geom_gettree fails, and if it
fails, provider_for_name would presumably have failed anyway.

--
John Baldwin




Re: bhyve -G

2023-11-15 Thread John Baldwin

On 10/9/23 5:21 PM, Bakul Shah wrote:

Any hints on how to use bhyve's -G  option to debug a VM
kernel? I can connect to it from gdb with "target remote :"
& bhyve stops the VM initially but beyond that I am not sure.
Ideally this should work just like an in-circuit-emulator, not
requiring anything special in the VM or kernel itself.


step only works on Intel CPUs currently (and is a bit fragile
anyway due to interrupts firing while you try to step, but that
happens for me in QEMU as well).   Breakpoints should work fine.
I tend to use 'until' to do stepping (basically stepping via
temporary breakpoints) when debugging the kernel this way.

--
John Baldwin




Re: KTLS thread on 14.0-RC3

2023-10-31 Thread John Baldwin

On 10/30/23 3:41 AM, Zhenlei Huang wrote:




On Oct 30, 2023, at 12:09 PM, Zhenlei Huang  wrote:




On Oct 29, 2023, at 5:43 PM, Gordon Bergling  wrote:

Hi,

I am currently building a new system, which should be based on 14.0-RELEASE.
Therefor I am tracking releng/14.0 since its creation and updating it currently
via the usualy buildworld steps.

What I have noticed recently is, that the [KTLS] is missing. I have a stable/13
system which shows the [KTLS] thread and a very recent -CURRENT that also shows
the [KTLS] thread.

The stable/13 and releng/14.0 systems both use the GENERIC kernel, without any
custom modifications.

Loaded KLDs are also the same.

Did I miss something, or is there something in releng/14.0 missing, which
is currenlty enabled in stable/13?


KTLS shall still work as intended, the creation of it threads is deferred.

See a72ee355646c (ktls: Defer creation of threads and zones until first use)

Run ktls_init() when the first KTLS session is created rather than
unconditionally during boot.  This avoids creating unused threads and
allocating unused resources on systems which do not use KTLS.


```
-SYSINIT(ktls, SI_SUB_SMP + 1, SI_ORDER_ANY, ktls_init, NULL);
```


Seems 14.0 only create one KTLS thread.

IIRC 13.2 create one thread per core.


That part should not be different.  There should always be one thread per core.

--
John Baldwin




15/14 upgrades break old sudo, maybe bump PAM's shlib?

2023-09-09 Thread John Baldwin

I upgraded my laptop from a late June current to current from yesterday
today, and after installworld sudo stopped working (dies with a SIGBUS).
After some debugging, the issue ended up being OpenSSL library version
mismatches as sudo uses PAM and PAM is linked agianst OpenSSL 3, but
sudo is linked against OpenSSL 1.1.1.  Both shlibs get mapped into the
the process and at some point sudo crosses the streams and the crash
occurs inside OpenSSL 3's libcrypto.

I realize that we do have a generate note about needing to update third
party packages after an upgrade, but I tend to use sudo as part of my
workflow for doing that sort of thing.  I generally build all my own
packages via poudriere and use sudo at various points in that process,
but even if I were using FreeBSD.org packages I would be using sudo to
try to run 'pkg upgrade'.

su(8) in base works fine, so that's my workaround for now on my laptop,
but I wonder if we want to make this particular bump on the upgrade
path a little less bumpy?  Either by being clear in our release notes
that tools like sudo (and I suspect any other third-party su wrappers
that also use PAM, xscreensaver's screen lock doesn't seem to be affected
since it probably doesn't use OpenSSL directly thankfully) can break,
or another route we could take would be to bump the DSO versions of
things that depend on libcrypto/libssl in base.  We did not do this
latter approach for the OpenSSL 1.0.2 -> 1.1.1 upgrade FWIW.

If we wanted to do the shlib bump approach, Enji had a good list from a
while back (though Enji wanted to make them all private rather than
bumping):

- kerberos
- libarchive
- libbsnmp
- libfetch
- libgeli
- libldns
- libmp
- libradius
- libunbound

From my research it seems that PAM (library and modules), gssapi libraries,
and libzfs would also need to be on the list.

libldns is already private as is libunbound, though bumping them might
be safter anyway.

There is on libgeli, instead there is geli_eli.so which has no version,
but hopefully is not widely used in ports the same as PAM.  Note also
that if we did this, we would want to do it for 14.0 as 13.x -> 14 upgrades
are affected in the same way.

--
John Baldwin



Re: user problems when upgrading to v15

2023-09-09 Thread John Baldwin

On 9/2/23 7:11 AM, Dimitry Andric wrote:

On 1 Sep 2023, at 03:42, brian whalen  wrote:


Repeating the entire process:

I created a 13.2 vm with 6 cores and 8GB of ram.

Ran freebsd-update fetch and install.

Ran pkg install git bash ccache open-vm-tools-nox11

Used git clone to get current and ports source files.

Edited /etc/make.conf to use ccache

Ran make -j6 buildworld && make -j6 kernel

I then rebooted in single user mode and did the next steps saving output to a file 
with > filename.

etcupdate -p was pretty uneventful. It did  show the below and did not prompt 
to edit.

root@f15:~ # less etcupdatep
   C /etc/group
   C /etc/master.passwd


This is a problem: the "C" characters mean there were conflicts, and it's 
indeed very unfortunate that etcupdate does not immediately force you to resolve them. 
Because now you basically have mangled group and master.passwd files, with conflict 
markers in them!


No, the conflicted files are in /var/db/etcupdate/conflicts, the files in /etc 
are still
the old ones at this point and won't be updated until you run 'etcupdate 
resolve' to
fix them.

I suspect what happened here is that Brian chose the 'tf' (theirs-full) option 
for
'etcupdate resolve' when he really wanted to do 'e' to edit the conflicted 
version.


Immediately after this, you should run "etcupdate resolve", and fix any 
conflicts that it has found.

Note that recently there was a lot of churn due to the removal of $FreeBSD$ 
keywords, and this almost always creates conflicts in the group and passwd 
files. For lots of other files in /etc, the conflicts are resolved 
automatically, but unfortunately not for the files that are essential to log in!



make installworld seemed mostly error free though I did see a nonzero status 
for a man page failed inn the man4 directory.

etcupdate -B only showed the below. This was my first build after install.

root@f15:~ # less etcupdateB
Conflicts remain from previous update, aborting.


Yes, that is indeed the problem. You must first resolve conflicts from any 
previous etcupdate run, before doing anything else. As to why it does not 
immediately forces you to do so, and delegates this to a separate step, which 
can easily be forgotten, I have no idea.


So that if you are doing scripted upgrades, you don't hang forever in a script.
The intention is that after doing a bunch of scripted installworld + etcupdate's
on various hosts you can use 'etcupdate status' to see if there are any 
remaining
steps requiring manual intervention.

There could be an option to request batched vs interactivate updates perhaps.


If I type exit in single user mode to go multi user mode, the local user still 
works. After a reboot the local user still works. This local user can also sudo 
as expected. This wasn't the case for the previous build when I first reported 
this. However, if I run etcupdate resolve it is still presenting /etc/group and 
/etc/master/passwd as problems.

If this is is expected behavior for current then no big deal. I just wasn't 
sure.


The conflicts themselves are expected, alas. But you _must_ resolve them, 
otherwise you can end up with a mostly-bricked system.


No, the conflict markers are not placed in the versions in /etc.  However, 
etucpdate
does refuse to do a "new" upgrade until you resolve all the conflicts from your
previous upgrade to ensure that conflicted upgrades aren't missed.

--
John Baldwin




sscanf change prevents build of CURRENT

2023-08-30 Thread John F Carr
I had a problem yesterday and today rebuilding a -CURRENT system from source:

  --- magic.mgc ---
  ./mkmagic magic
  magic, 4979: Warning: Current entry does not yet have a description for 
adding a MIME type
  mkmagic: could not find any valid magic files!

The cause was an sscanf call unexpectedly failing to parse the input.  This 
caused
the mkmagic program (internal tool used to build magic number table for file) 
to fail.

If I link mkmagic against the static libc.a in /usr/obj then it works.  So my 
installed
libc.so is broken and the latest source works.  I think.  My installed kernel 
is at
76edfabbecde, the end of the binary integer parsing commit series, so my libc
should be the same.

The program below demonstrates the bug.  See src/contrib/file/src for context.

I am trying to manually compile a working mkmagic and restart the build to get 
unstuck.

#include 
#include 

struct guid {
uint32_t data1;
uint16_t data2;
uint16_t data3;
uint8_t data4[8];
};

int main(int argc, char *argv[])
{
  struct guid g = {0, 0, 0, {0}};
  char *text = "75B22630-668E-11CF-A6D9-00AA0062CE6C";

  if (argc > 1)
text = argv[1];
  int count =
sscanf(text,
   "%8x-%4hx-%4hx-%2hhx%2hhx-%2hhx%2hhx%2hhx%2hhx%2hhx%2hhx",
   , , , [0], [1],
   [2], [3], [4], [5],
   [6], [7]);

  fprintf(stdout,
  
"[%d]:\n%08x-%04hx-%04hx-%02hhx%02hhx-%02hhx%02hhx%02hhx%02hhx%02hhx%02hhx\n",
  count,
  g.data1, g.data2, g.data3, g.data4[0], g.data4[1],
  g.data4[2], g.data4[3], g.data4[4], g.data4[5],
  g.data4[6], g.data4[7]);
  return count != 11;
}




Re: shell hung in fork system call

2023-07-10 Thread John F Carr



> On Jul 9, 2023, at 19:59, Konstantin Belousov  wrote:
> 
> On Sun, Jul 09, 2023 at 11:36:03PM +0000, John F Carr wrote:
>> 
>> 
>>> On Jul 9, 2023, at 19:25, Konstantin Belousov  wrote:
>>> 
>>> On Sun, Jul 09, 2023 at 10:41:27PM +, John F Carr wrote:
>>>> Kernel and system at a146207d66f320ed239c1059de9df854b66b55b7 plus some 
>>>> irrelevant local changes, four 64 bit ARM processors, make.conf sets 
>>>> CPUTYPE?=cortex-a57.
>>>> 
>>>> I typed ^C while /bin/sh was starting a pipeline and my shell got hung in 
>>>> the middle of fork().
>>>> 
>>>>> From the terminal:
>>>> 
>>>> # git log --oneline --|more
>>>> ^C^C^C
>>>> load: 3.26  cmd: sh 95505 [fork] 5308.67r 0.00u 0.03s 0% 2860k
>>>> mi_switch+0x198 sleepq_switch+0xfc sleepq_timedwait+0x40 _sleep+0x264 
>>>> fork1+0x67c sys_fork+0x34 do_el0_sync+0x4c8 handle_el0_sync+0x44 
>>>> load: 3.16  cmd: sh 95505 [fork] 5311.75r 0.00u 0.03s 0% 2860k
>>>> mi_switch+0x198 sleepq_switch+0xfc sleepq_timedwait+0x40 _sleep+0x264 
>>>> fork1+0x67c sys_fork+0x34 do_el0_sync+0x4c8 handle_el0_sync+0x44 
>>>> 
>>>> According to ps -d on another terminal the shell has no children:
>>>> 
>>>> PID TT  STAT   TIME COMMAND
>>>> [...]
>>>> 873 u0  IWs 0:00.00 `-- login [pam] (login)
>>>> 874 u0  I   0:00.17   `-- -sh (sh)
>>>> 95504 u0  I   0:00.01 `-- su -
>>>> 95505 u0  D+  0:00.05   `-- -su (sh)
>>>> [...]
>>>> 
>>>> Nothing on the (115200 bps serial) console.  No change in system 
>>>> performance.
>>>> 
>>>> The system is busy copying a large amount of data from the network to a 
>>>> ZFS pool on spinning disks.  The git|more pipeline could have taken some 
>>>> time to get going while I/O requests worked their way through the queue.  
>>>> It would not have touched the busy pool, only the zroot pool on an SSD.
>>>> 
>>>> Has anything changed recently that might cause this?
>>> 
>>> There was some change around fork, but your sleep seems to be not from
>>> that change.  Can you show the wait channel for the process?  Do something
>>> like
>>> $ ps alxww
>>> 
>> 
>> UID   PID  PPID  C PRI NI   VSZ   RSS MWCHAN   STAT TTTIME COMMAND
>>   0 95505 95504  2  20  0 13508  2876 fork D+   u0 0:00.13 -su (sh)
>> 
>> This is probably the same information displayed as [fork] in the output from 
>> ^T.
>> 
>> Does it correspond to the source line
>> 
>> pause("fork", hz / 2);
>> 
>> ?
> 
> Yes, it is rate-limiting code.  Still it is interesting to see the whole
> ps output.
> 
> Do you have 7a70f17ac4bd64dc1a5020f in your source?

No, I do not have that commit.

The comment mentions livelock.  CPU use as reported by iostat did not change 
after the process hung.






Re: shell hung in fork system call

2023-07-09 Thread John F Carr



> On Jul 9, 2023, at 19:25, Konstantin Belousov  wrote:
> 
> On Sun, Jul 09, 2023 at 10:41:27PM +0000, John F Carr wrote:
>> Kernel and system at a146207d66f320ed239c1059de9df854b66b55b7 plus some 
>> irrelevant local changes, four 64 bit ARM processors, make.conf sets 
>> CPUTYPE?=cortex-a57.
>> 
>> I typed ^C while /bin/sh was starting a pipeline and my shell got hung in 
>> the middle of fork().
>> 
>>> From the terminal:
>> 
>> # git log --oneline --|more
>> ^C^C^C
>> load: 3.26  cmd: sh 95505 [fork] 5308.67r 0.00u 0.03s 0% 2860k
>> mi_switch+0x198 sleepq_switch+0xfc sleepq_timedwait+0x40 _sleep+0x264 
>> fork1+0x67c sys_fork+0x34 do_el0_sync+0x4c8 handle_el0_sync+0x44 
>> load: 3.16  cmd: sh 95505 [fork] 5311.75r 0.00u 0.03s 0% 2860k
>> mi_switch+0x198 sleepq_switch+0xfc sleepq_timedwait+0x40 _sleep+0x264 
>> fork1+0x67c sys_fork+0x34 do_el0_sync+0x4c8 handle_el0_sync+0x44 
>> 
>> According to ps -d on another terminal the shell has no children:
>> 
>>  PID TT  STAT   TIME COMMAND
>> [...]
>>  873 u0  IWs 0:00.00 `-- login [pam] (login)
>>  874 u0  I   0:00.17   `-- -sh (sh)
>> 95504 u0  I   0:00.01 `-- su -
>> 95505 u0  D+  0:00.05   `-- -su (sh)
>> [...]
>> 
>> Nothing on the (115200 bps serial) console.  No change in system performance.
>> 
>> The system is busy copying a large amount of data from the network to a ZFS 
>> pool on spinning disks.  The git|more pipeline could have taken some time to 
>> get going while I/O requests worked their way through the queue.  It would 
>> not have touched the busy pool, only the zroot pool on an SSD.
>> 
>> Has anything changed recently that might cause this?
> 
> There was some change around fork, but your sleep seems to be not from
> that change.  Can you show the wait channel for the process?  Do something
> like
> $ ps alxww
> 

 UID   PID  PPID  C PRI NI   VSZ   RSS MWCHAN   STAT TTTIME COMMAND
   0 95505 95504  2  20  0 13508  2876 fork D+   u0 0:00.13 -su (sh)

This is probably the same information displayed as [fork] in the output from ^T.

Does it correspond to the source line

pause("fork", hz / 2);

?




shell hung in fork system call

2023-07-09 Thread John F Carr
Kernel and system at a146207d66f320ed239c1059de9df854b66b55b7 plus some 
irrelevant local changes, four 64 bit ARM processors, make.conf sets 
CPUTYPE?=cortex-a57.

I typed ^C while /bin/sh was starting a pipeline and my shell got hung in the 
middle of fork().

>From the terminal:

# git log --oneline --|more
^C^C^C
load: 3.26  cmd: sh 95505 [fork] 5308.67r 0.00u 0.03s 0% 2860k
mi_switch+0x198 sleepq_switch+0xfc sleepq_timedwait+0x40 _sleep+0x264 
fork1+0x67c sys_fork+0x34 do_el0_sync+0x4c8 handle_el0_sync+0x44 
load: 3.16  cmd: sh 95505 [fork] 5311.75r 0.00u 0.03s 0% 2860k
mi_switch+0x198 sleepq_switch+0xfc sleepq_timedwait+0x40 _sleep+0x264 
fork1+0x67c sys_fork+0x34 do_el0_sync+0x4c8 handle_el0_sync+0x44 

According to ps -d on another terminal the shell has no children:

  PID TT  STAT   TIME COMMAND
[...]
  873 u0  IWs 0:00.00 `-- login [pam] (login)
  874 u0  I   0:00.17   `-- -sh (sh)
95504 u0  I   0:00.01 `-- su -
95505 u0  D+  0:00.05   `-- -su (sh)
[...]

Nothing on the (115200 bps serial) console.  No change in system performance.

The system is busy copying a large amount of data from the network to a ZFS 
pool on spinning disks.  The git|more pipeline could have taken some time to 
get going while I/O requests worked their way through the queue.  It would not 
have touched the busy pool, only the zroot pool on an SSD.

Has anything changed recently that might cause this?





Re: For snapshot builds: armv7 chroot on aarch64 has kyua test -k /usr/tests/Kyuafile sys/kern/kern_copyin hung up [in getpid?], unkillable, prevents reboot

2023-07-07 Thread John F Carr
On Jul 6, 2023, at 20:42, Mike Karels  wrote:
> 
> 
> Thanks for isolating this.  Let me know when you have the bug number.
> I just tested a fix (the compat code drops the reference on the current
> address space an extra time, probably freeing it).
> 
> Mike

The bug was introduced in January, 2022.   It allows 32 bit binaries to crash a 
64 bit system when COMPAT_FREEBSD32 is on.  Test coverage of the buggy function 
(sysctl_kern_proc_vm_layout) was added at the same time.

There should be routine runs of 32 bit test suites on 64 bit systems.  Although 
i386 and armv7 are tier 2 systems, the tier 1 COMPAT_FREEBSD32 kernel code 
needs to be exercised.  This bug was only discovered by manually running tests 
in the right environment, 17 months after automated testing could have 
discovered it.





Re: For snapshot builds: armv7 chroot on aarch64 has kyua test -k /usr/tests/Kyuafile sys/kern/kern_copyin hung up [in getpid?], unkillable, prevents reboot

2023-07-06 Thread John F Carr



> On Jun 25, 2023, at 20:16, Mark Millard  wrote:
> 
> Using the likes of:
> 
> FreeBSD-14.0-CURRENT-arm64-aarch64-ROCK64-20230622-b95d2237af40-263748.img
> and:
> FreeBSD-14.0-CURRENT-arm-armv7-GENERICSD-20230622-b95d2237af40-263748.img
> 
> I have shown the following behavior after setting up storage
> media based on them. (This was a test that my builds were not
> odd for the issue.)
> 
> Boot the aarch64 media and log in. (Note: I logged in
> as root.)
> 
> mount the armv7 media (-noatime is just my habit)
> and then put it to use:
> 
> # mount -onoatime /dev/da1s2a /mnt
> 
> # chroot /mnt/
> 
> # kyua test -k /usr/tests/Kyuafile sys/kern/kern_copyin
> sys/kern/kern_copyin:kern_copyin  ->  
> 
> On the serial console:
> 
> # ps -xu
> USER  PID   %CPU %MEM   VSZ  RSS TT  STAT STARTED  TIME COMMAND
> root   11 1498.4  0.0 0  256  -  RNL  23:24   542:52.92 [idle]
> root 1174  100.0  0.0 0   16  -  Rs   23:37 0:00.00 
> /usr/tests/sys/kern/kern_copyin -vunprivileged-user=tests 
> -r/tmp/kyua.9YUttj/2/result.atf kern_copyin
> root00.0  0.0 0 1616  -  DLs  23:24 0:00.50 [kernel]
> root10.0  0.0 11704 1288  -  ILs  23:24 0:00.02 /sbin/init
> root20.0  0.0 0  256  -  WL   23:24 0:00.26 [clock]
> root30.0  0.0 0  272  -  DL   23:24 0:00.00 [crypto]
> root40.0  0.0 0   80  -  DL   23:24 0:00.95 [cam]
> root50.0  0.0 0   16  -  DL   23:24 0:00.00 [busdma]
> root60.0  0.0 0   16  -  DL   23:24 0:00.03 [rand_harvestq]
> root70.0  0.0 0   48  -  DL   23:24 0:00.06 [pagedaemon]
> root80.0  0.0 0   16  -  DL   23:24 0:00.00 [vmdaemon]
> root90.0  0.0 0  160  -  DL   23:24 0:00.38 [bufdaemon]
> root   100.0  0.0 0   16  -  DL   23:24 0:00.00 [audit]
> root   120.0  0.0 0  880  -  WL   23:24 0:11.81 [intr]
> root   130.0  0.0 0   48  -  DL   23:24 0:00.04 [geom]
> root   140.0  0.0 0   16  -  DL   23:24 0:00.00 [sequencer 00]
> root   150.0  0.0 0  160  -  DL   23:24 0:06.42 [usb]
> root   160.0  0.0 0   16  -  DL   23:24 0:00.10 [acpi_thermal]
> root   170.0  0.0 0   16  -  DL   23:24 0:00.00 [acpi_cooling0]
> root   180.0  0.0 0   16  -  DL   23:24 0:00.04 [syncer]
> root   190.0  0.0 0   16  -  DL   23:24 0:00.00 [vnlru]
> root  6710.0  0.0 13260 2600  -  Is   23:25 0:00.00 dhclient: 
> system.syslog (dhclient)
> root  6740.0  0.0 13260 2752  -  Is   23:25 0:00.00 dhclient: dpni0 
> [priv] (dhclient)
> root  7610.0  0.0 14572 3972  -  Ss   23:25 0:00.02 /sbin/devd
> root  9640.0  0.0 12832 2764  -  Is   23:25 0:00.02 /usr/sbin/syslogd 
> -s
> root 10330.0  0.0 13012 2604  -  Ss   23:25 0:00.01 /usr/sbin/cron -s
> root 10580.0  0.0 21052 8308  -  Is   23:25 0:00.01 sshd: 
> /usr/sbin/sshd [listener] 0 of 10-100 startups (sshd)
> root 10780.0  0.0 21288 9304  -  Is   23:26 0:00.09 sshd: root@pts/0 
> (sshd)
> root 11750.0  0.0 21288 9496  -  Is   23:37 0:00.04 sshd: root@pts/1 
> (sshd)
> root 10740.0  0.0 13380 3008 u0  Is   23:25 0:00.01 login [pam] 
> (login)
> root 10750.0  0.0 13460 3292 u0  S23:25 0:00.02 -sh (sh)
> root 12330.0  0.0 13588 3016 u0  R+   00:00 0:00.00 ps -xu
> root 10810.0  0.0 13460 3328  0  Is   23:26 0:00.02 -sh (sh)
> root 11700.0  0.0  5788 2884  0  I23:36 0:00.02 /bin/sh -i
> root 11720.0  0.0 10408 7192  0  I+   23:37 0:00.30 kyua test -k 
> /usr/tests/Kyuafile sys/kern/kern_copyin
> root 11780.0  0.0 13460 3320  1  Is+  23:38 0:00.01 -sh (sh)
> 
> 1174 is stuck, even if one waits for 30min+.
> kill and kill -9 will not kill 1174.
> 
> "shutdown -r now" hangs before the reboot happens
> and reports: "some processes would not die".
> 
> An interesting property is that ps and top disagree
> about 1174 CPU usage: ps 100%, top 0%. But top also
> indicates 1174 always has CPU0 "STATE". (Across
> tests CPUn varies but within a test it has
> a fixed n.)
> 
> I have also seen ps "STAT" being RXs.
> 
> The following is from my earlier activity with my own
> builds involved, here 1119, not the 1174 from above.
> truss reports as the last thing for the stuck process
> as "getpid()".
> 
> . . .
> 1119: 0.588983953 fstatat(AT_FDCWD,"/usr/tests/sys/kern/kern_copyin",{ 
> mode=-r-xr-xr-x ,inode=111756,size=9776,blksize=10240 },AT_SYMLINK_NOFOLLOW) 
> = 0 (0x0)
> 1119: 0.589065030 
> mmap(0x0,20480,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANON|MAP_ALIGNED(12),-1,0x0)
>  = 1074188288 (0x4006d000)
> 1119: 0.589227544 
> openat(AT_FDCWD,"/tmp/kyua.aBQv6E/2/result.atf",O_WRONLY|O_CREAT|O_TRUNC,0644)
>  = 3 (0x3)
> 1119: 0.589276503 getpid()  = 1119 (0x45f)
> 
> 
> 
> For reference, from inside an armv7 chroot session
> before doing such a test:
> 
> # uname -apKU
> FreeBSD generic 

Re: aarch64 main-n263493-4e8d558c9d1c-dirty (so: 2023-Jun-10) Kyuafile run: "Fatal data abort" crash during vnet_register_sysinit

2023-06-26 Thread John F Carr


> On Jun 26, 2023, at 04:32, Mark Millard  wrote:
> 
> On Jun 24, 2023, at 17:25, Mark Millard  wrote:
> 
>> On Jun 24, 2023, at 14:26, John F Carr  wrote:
>> 
>>> 
>>>> On Jun 24, 2023, at 13:00, Mark Millard  wrote:
>>>> 
>>>> The running system build is a non-debug build (but
>>>> with symbols not stripped).
>>>> 
>>>> The HoneyComb's console log shows:
>>>> 
>>>> . . .
>>>> GEOM_STRIPE: Device stripe.IMfBZr destroyed.
>>>> GEOM_NOP: Device md0.nop created.
>>>> g_vfs_done():md0.nop[READ(offset=5885952, length=8192)]error = 5
>>>> GEOM_NOP: Device md0.nop removed.
>>>> GEOM_NOP: Device md0.nop created.
>>>> g_vfs_done():md0.nop[READ(offset=5935104, length=4096)]error = 5
>>>> g_vfs_done():md0.nop[READ(offset=5935104, length=4096)]error = 5
>>>> GEOM_NOP: Device md0.nop removed.
>>>> GEOM_NOP: Device md0.nop created.
>>>> GEOM_NOP: Device md0.nop removed.
>>>> Fatal data abort:
>>>> x0: a02506e64400
>>>> x1: 0001ea401880 (g_raid3_post_sync + 3a145f8)
>>>> x2:   4b
>>>> x3: a343932b0b22fb30
>>>> x4:0
>>>> x5:  3310b0d062d0e1d
>>>> x6: 1d0e2d060d0b3103
>>>> x7:0
>>>> x8: ea325df8
>>>> x9: 0001eec946d0 ($d.6 + 0)
>>>> x10: 0001ea401880 (g_raid3_post_sync + 3a145f8)
>>>> x11:0
>>>> x12:0
>>>> x13: 00cd8960 (lock_class_mtx_sleep + 0)
>>>> x14:0
>>>> x15: a02506e64405
>>>> x16: 0001eec94860 (_DYNAMIC + 160)
>>>> x17: 0063a450 (ifc_attach_cloner + 0)
>>>> x18: 0001eb290400 (g_raid3_post_sync + 48a3178)
>>>> x19: 0001eec94600 (vnet_epair_init_vnet_init + 0)
>>>> x20: 00fa5b68 (vnet_sysinit_sxlock + 18)
>>>> x21: 00d8e000 (sdt_vfs_vop_vop_spare4_return + 0)
>>>> x22: 00d8e000 (sdt_vfs_vop_vop_spare4_return + 0)
>>>> x23: a042e500
>>>> x24: a042e500
>>>> x25: 00ce0788 (linker_lookup_set_desc + 0)
>>>> x26: a0203cdef780
>>>> x27: 0001eec94698 (__set_sysinit_set_sym_if_epairmodule_sys_init + 0)
>>>> x28: 00d8e000 (sdt_vfs_vop_vop_spare4_return + 0)
>>>> x29: 0001eb290430 (g_raid3_post_sync + 48a31a8)
>>>> sp: 0001eb290400
>>>> lr: 0001eec82a4c ($x.1 + 3c)
>>>> elr: 0001eec82a60 ($x.1 + 50)
>>>> spsr: 6045
>>>> far: 0002d8fba4c8
>>>> esr: 9646
>>>> panic: vm_fault failed: 0001eec82a60 error 1
>>>> cpuid = 14
>>>> time = 1687625470
>>>> KDB: stack backtrace:
>>>> db_trace_self() at db_trace_self
>>>> db_trace_self_wrapper() at db_trace_self_wrapper+0x30
>>>> vpanic() at vpanic+0x13c
>>>> panic() at panic+0x44
>>>> data_abort() at data_abort+0x2fc
>>>> handle_el1h_sync() at handle_el1h_sync+0x14
>>>> --- exception, esr 0x9646
>>>> $x.1() at $x.1+0x50
>>>> vnet_register_sysinit() at vnet_register_sysinit+0x114
>>>> linker_load_module() at linker_load_module+0xae4
>>>> kern_kldload() at kern_kldload+0xfc
>>>> sys_kldload() at sys_kldload+0x60
>>>> do_el0_sync() at do_el0_sync+0x608
>>>> handle_el0_sync() at handle_el0_sync+0x44
>>>> --- exception, esr 0x5600
>>>> KDB: enter: panic
>>>> [ thread pid 70419 tid 101003 ]
>>>> Stopped at  kdb_enter+0x44: str xzr, [x19, #3200]
>>>> db> 
>>> 
>>> The failure appears to be initializing module if_epair.
>> 
>> Yep: trying:
>> 
>> # kldload if_epair.ko
>> 
>> was enough to cause the crash. (Just a HoneyComb context at
>> that point.)
>> 
>> I tried media dd'd from the recent main snapshot, booting the
>> same system. No crash. I moved my build boot media to some
>> other systems and tested them: crashes. I tried my boot media
>> built optimized for Cortex-A53 or Cortex-X1C/Cortex-A78C
>> instead of Cortex-A72: no crashes. (But only one system can
>> use the X1C/A78C code in that build.)
>> 
>> So variation testing only gets the crashes for my builds
>> that are code-optimized for Cortex-A72's. The same source
>> tree vinta

Re: twe(4) removed

2023-06-25 Thread John Nielsen
> On Jun 24, 2023, at 4:16 AM, Marcin Cieslak  wrote:
> 
> I just noticed that I had to remove "device twe"
> from my kernel configuration when rebuilding my -CURRENT today.
> 
> Is there any problem with this driver that makes it difficult
> to keep around?
> 
> Believe or not, I still rent a machine using it in JBOD mode
> (running 13 right now but I could switch it to -CURRENT for testing if 
> needed).

The deprecation notice and partial justification are here:
https://cgit.freebsd.org/src/commit/?id=4b22ce07306243d6641c93efcf315a787dd0876c

JN

Re: aarch64 main-n263493-4e8d558c9d1c-dirty (so: 2023-Jun-10) Kyuafile run: "Fatal data abort" crash during vnet_register_sysinit

2023-06-24 Thread John F Carr


> On Jun 24, 2023, at 13:00, Mark Millard  wrote:
> 
> The running system build is a non-debug build (but
> with symbols not stripped).
> 
> The HoneyComb's console log shows:
> 
> . . .
> GEOM_STRIPE: Device stripe.IMfBZr destroyed.
> GEOM_NOP: Device md0.nop created.
> g_vfs_done():md0.nop[READ(offset=5885952, length=8192)]error = 5
> GEOM_NOP: Device md0.nop removed.
> GEOM_NOP: Device md0.nop created.
> g_vfs_done():md0.nop[READ(offset=5935104, length=4096)]error = 5
> g_vfs_done():md0.nop[READ(offset=5935104, length=4096)]error = 5
> GEOM_NOP: Device md0.nop removed.
> GEOM_NOP: Device md0.nop created.
> GEOM_NOP: Device md0.nop removed.
> Fatal data abort:
>  x0: a02506e64400
>  x1: 0001ea401880 (g_raid3_post_sync + 3a145f8)
>  x2:   4b
>  x3: a343932b0b22fb30
>  x4:0
>  x5:  3310b0d062d0e1d
>  x6: 1d0e2d060d0b3103
>  x7:0
>  x8: ea325df8
>  x9: 0001eec946d0 ($d.6 + 0)
> x10: 0001ea401880 (g_raid3_post_sync + 3a145f8)
> x11:0
> x12:0
> x13: 00cd8960 (lock_class_mtx_sleep + 0)
> x14:0
> x15: a02506e64405
> x16: 0001eec94860 (_DYNAMIC + 160)
> x17: 0063a450 (ifc_attach_cloner + 0)
> x18: 0001eb290400 (g_raid3_post_sync + 48a3178)
> x19: 0001eec94600 (vnet_epair_init_vnet_init + 0)
> x20: 00fa5b68 (vnet_sysinit_sxlock + 18)
> x21: 00d8e000 (sdt_vfs_vop_vop_spare4_return + 0)
> x22: 00d8e000 (sdt_vfs_vop_vop_spare4_return + 0)
> x23: a042e500
> x24: a042e500
> x25: 00ce0788 (linker_lookup_set_desc + 0)
> x26: a0203cdef780
> x27: 0001eec94698 (__set_sysinit_set_sym_if_epairmodule_sys_init + 0)
> x28: 00d8e000 (sdt_vfs_vop_vop_spare4_return + 0)
> x29: 0001eb290430 (g_raid3_post_sync + 48a31a8)
>  sp: 0001eb290400
>  lr: 0001eec82a4c ($x.1 + 3c)
> elr: 0001eec82a60 ($x.1 + 50)
> spsr: 6045
> far: 0002d8fba4c8
> esr: 9646
> panic: vm_fault failed: 0001eec82a60 error 1
> cpuid = 14
> time = 1687625470
> KDB: stack backtrace:
> db_trace_self() at db_trace_self
> db_trace_self_wrapper() at db_trace_self_wrapper+0x30
> vpanic() at vpanic+0x13c
> panic() at panic+0x44
> data_abort() at data_abort+0x2fc
> handle_el1h_sync() at handle_el1h_sync+0x14
> --- exception, esr 0x9646
> $x.1() at $x.1+0x50
> vnet_register_sysinit() at vnet_register_sysinit+0x114
> linker_load_module() at linker_load_module+0xae4
> kern_kldload() at kern_kldload+0xfc
> sys_kldload() at sys_kldload+0x60
> do_el0_sync() at do_el0_sync+0x608
> handle_el0_sync() at handle_el0_sync+0x44
> --- exception, esr 0x5600
> KDB: enter: panic
> [ thread pid 70419 tid 101003 ]
> Stopped at  kdb_enter+0x44: str xzr, [x19, #3200]
> db> 

The failure appears to be initializing module if_epair.  I see no recent 
changes in that module that would be likely to break initialization.

a9bfd080d09a if_epair: do not transmit packets that exceed the interface MTU
4d846d260e2b spdx: The BSD-2-Clause-FreeBSD identifier is obsolete, drop 
-FreeBSD
a6b55ee6be15 net: replace IFF_KNOWSEPOCH with IFF_NEEDSEPOCH
c69ae8419734 if_epair: also remove vlan metadata from mbufs
29c9b1673305 epair: Remove unneeded includes and sort some of the rest








Re: Support for more than 256 CPU cores

2023-05-08 Thread John Baldwin

On 5/5/23 6:38 AM, Ed Maste wrote:

FreeBSD supports up to 256 CPU cores in the default kernel configuration
(on Tier-1 architectures).  Systems with more than 256 cores are
available now, and will become increasingly common over FreeBSD 14’s
lifetime.  The FreeBSD Foundation is supporting the effort to increase
MAXCPU, and PR269572[1] is open to track tasks and changes.

As a project we have scalability work ahead of us to make best use of
high core count machines, but at a minimum we should be able to boot a
GENERIC kernel on such systems, and have an ABI for the FreeBSD 14
release that supports such a configuration.

Some changes have already been committed in support of increased MAXCPU,
including increasing MAX_APIC_ID (commit c8113dad7ed4) and a number of
changes to reduce bloat (such as commits 42f722e721cd, e72f7ed43eef,
78cfa762ebf2 and 74ac712f72cf).

The next step is to increase the maximum cpuset size for userland.
I have this change open in review D39941[2] and an exp-run request in
PR271213[3].  Following that the kernel change for increasing MAXCPU is
in D36838[4].

Additional work on bloat reduction will continue after this change, and
looking forward FreeBSD is going to need ongoing effort from the
community and the FreeBSD Foundation to continue improving scalability.

[1] https://bugs.freebsd.org/269572
[2] https://reviews.freebsd.org/D39941
[3] https://bugs.freebsd.org/271213
[4] https://reviews.freebsd.org/D36838


FWIW, I think it will be useful for main to run with a larger userspace
MAXCPU than kernel for at least a while so that we have better testing
of that configuration and to give headroom for bumping MAXCPU in the
kernel during the 14.x branch.

The only other viable path I think which would be more work would be to
rework cpuset_t in userspace to always use a dynamically sized mask.
This could perhaps be done in an API-preserving manner by making cpuset_t
an opaque wrapper type in userland and requiring CPU_* to indirect to
functions in libc, etc.  That's a fair bit more work however.

--
John Baldwin




Re: morse(6) sound

2022-10-28 Thread John-Mark Gurney
Nuno Teixeira wrote this message on Fri, Oct 28, 2022 at 19:36 +0100:
> Is there any way to get sound from morse(6) without speaker(4) device?

I mean, I guess you could use sox (play command) and sed to make the audio..

morse -s converts it to . and -'s, so then you convert each one of those to
a frequency and necessary delay.

-- 
  John-Mark Gurney  Voice: +1 415 225 5579

 "All that I will do, has been done, All that I have, has not."



Re: How to Enable support for IPsec deprecated algorithms: 3DES, MD5-HMAC

2022-10-04 Thread John Baldwin

On 10/4/22 1:53 AM, alfadev wrote:

Hi, i am trying to move my gateway from FreeBSD 11.0 to FreeBSD 14.0 to use
newly added ipfw table lookup for mac addresses 
(https://reviews.freebsd.org/D35103)

Also I have too many IPSec connections between fortigate, cisco etc.
And their operators use only 3DES algorithms and they have no intention to 
change it for me.
So, now i have to enable 3DES support for FreeBSD 14.0 .

To add 3DES support again i changed some files shown below.
I am not sure what i did any help welcomes.


You do not want to just restore the files as-is.  You instead want to revert 
some of the
diffs from the first commit.  The second commit for /dev/crypto doesn't matter 
for IPsec
and you can ignore it.

However, you will need to also partially revert commit 
0e00c709d7f1cdaeb584d244df9534bcdd0ac527
which removes DES and 3DES from OCF itself.  This is what removed enc_xform_des 
for example.

--
John Baldwin



Re: pkg: Newer FreeBSD version for package... but why?

2022-07-13 Thread John Baldwin

On 7/13/22 3:17 AM, Andriy Gapon wrote:

On 2022-07-13 13:09, Michael Gmelin wrote:



On Wed, 13 Jul 2022 10:29:06 +0300
Andriy Gapon  wrote:


# uname -U
1400063

# uname -K
1400063

# pkg upgrade
Updating FreeBSD repository catalogue...
Fetching packagesite.pkg: 100%5 MiB   4.8MB/s00:01
Processing entries:   0%
Newer FreeBSD version for package zyre:
To ignore this error set IGNORE_OSVERSION=yes
- package: 1400063
- running kernel: 1400051
Ignore the mismatch and continue? [y/N]:

Does anyone know why this would happen?
Where does pkg get its notion of the running kernel version?



If I'm reading the sources correctly, it's determining the OS version
by looking at the elf headers of various files in this order:

  getenv("ABI_FILE")
  /usr/bin/uname
  /bin/sh

So I would assume that `file /usr/bin/uname` shows 1400051 on your
system.


Thank you very much!  That's it:
# file /usr/bin/uname
/usr/bin/uname: ELF 32-bit LSB executable, ARM, EABI5 version 1
(FreeBSD), dynamically linked, interpreter /libexec/ld-elf.so.1,
FreeBSD-style, for FreeBSD 14.0 (1400051), stripped


You can point it to checking another file by setting ABI_FILE[0] in the
environment or ignore the check by setting IGNORE_OSVERSION (like
advised). The "running kernel:" label seems a bit misleading.


Indeed.

Now the next thing (for me) to research is why the binaries were built
"for FreeBSD 14.0 (1400051)" when the source tree has 1400063 and uname
-U also reports 1400063.
FWIW, this was a cross-build, maybe that played a role too.


If you do a NO_CLEAN=yes build, we don't relink binaries just because
crt*.o changed (where the note is stored).

--
John Baldwin



Re: BLAKE3 unstability?

2022-07-12 Thread John Baldwin

On 7/12/22 1:41 AM, Evgeniy Khramtsov wrote:

I can reproduce via:

$ truncate -s 10G /tmp/test
$ mdconfig -f /tmp/test -S 4096
$ zpool create test /dev/md1
$ zfs create -o checksum=blake3 test/b
$ dd if=/dev/random of=/test/b/noise bs=1M count=4096
$ sync
$ zpool scrub test
$ zpool status


I cannot reproduce this on openzfs/zfs@cb01da68057 (the commit that was
most recently merged) built out of tree on either stable/13 70fd40edb86
or main 9aa02d5120a.

I'll update a system and see if I can reproduce it with the in-tree ZFS.

- Ryan


It did not reproduce for me with in-tree ZFS on main@3c9ad9398fcd either.

Could you share sysctl kstat.zfs.misc.chksum_bench, maybe we are using
different implementations?
I do see that blake3 went in with only a Linux module parameter for the
implementation selection, so I'll have to fix that. For now we can at least
see which was fastest, which should be the one selected. You just won't be
able to manually change it to see if that helps.

- Ryan


I found the culprit (kernel and base from download.FreeBSD.org
kernel.txz and base.txz respectively) (I forgot about local sysctl.conf...):

kern.sched.steal_thresh=1
kern.sched.preempt_thresh=121

Then

#!/bin/sh

truncate -s 10G /tmp/test
mdconfig -f /tmp/test -S 4096
zpool create test /dev/md0
zfs create -o checksum=blake3 test/b
dd if=/dev/random of=/test/b/noise bs=1M count=4096
sync
zpool scrub test
sleep 3
zpool status

zpool destroy test
mdconfig -d -u 0
rm /tmp/test

As for ULE "tuning", these values give me fine desktop interactivity
when building lang/rust when nice and idprio did not help, so I left
them in sysctl.conf. Not sure if scheduling parameters are worthy of
a ZFS PR, maybe something essential is preempted.


It could be missing fpu_kern_enter/leave that lack of preemption would
cover over.  I thought that missing that would give a panic in the
kernel though due to FPU instructions being disabled (including vector
instructions).  Maybe ZFS isn't using fpu_kern_enter(FPU_NOCTX) and is
instead trying to juggle contexts and it has a bug in how it manages
saved FPU contexts and reuses a context?  If so, I would just suggest
that ZFS switch to using FPU_KERN_NOCTX instead which runs all SSE
type code in a critical section to disable preemption but avoids
having to allocate and manage FPU contexts.

--
John Baldwin



Re: Accessibility in the FreeBSD installer and console

2022-07-07 Thread John Kennedy
On Thu, Jul 07, 2022 at 10:11:52PM +0200, Klaus Küchemann wrote:
> > Am 07.07.2022 um 19:32 schrieb Hans Petter Selasky :
> > The only argument I've heard from some non-sighted friends about not using 
> > FreeBSD natively is that ooh, MacOSX is so cool. It starts speaking from 
> > the start if I press this and this key. Is anyone here working on or 
> > wanting such a feature?
> 
> Possibly they didn’t want to  be rude and your friends didn't tell you the 
> other argument  :-)  : according to the corresponding wiki page FreeBSD 
> doesn't natively support any audio output at all on your friends current M1 
> Mac hardware.
>  since quite nothing is currently supported you probably will first take over 
> working on the Audio driver …..and of course USB  :-)

  I think a huge benefit that Apple would have is that they might be
able to guarantee some sort of audio speaker, period, since they
control the hardware that the software runs on.  That might be a big ask
on FreeBSD, but maybe if there was some relatively ubiquitous assistance
hardware, maybe doable.  But text-to-speech (and then WHAT language's
speech) is a big software chunk, audio layers seems large, and then
having to worry about the potential driver issues (while not being able
to see-to-hear any potential setup issues) seems huge.

  Everybody seems happy farming that out to the internet, except on
system setup you're not connected to the internet yet.

  Plus Apple has some deep hooks into the app-stack since you're
basically using their toolkit to make a graphical app, so they can
guarantee some potential for GUI-textbox-speech, where FreeBSD has a
hodgepodge of graphical toolkits (KDE, GTK, Gnome, etc).



Re: Accessibility in the FreeBSD installer and console

2022-07-07 Thread John Kennedy
On Thu, Jul 07, 2022 at 10:11:52PM +0200, Klaus Küchemann wrote:
> > Am 07.07.2022 um 19:32 schrieb Hans Petter Selasky :
> > The only argument I've heard from some non-sighted friends about not using 
> > FreeBSD natively is that ooh, MacOSX is so cool. It starts speaking from 
> > the start if I press this and this key. Is anyone here working on or 
> > wanting such a feature?
> 
> Possibly they didn’t want to  be rude and your friends didn't tell you the 
> other argument  :-)  : according to the corresponding wiki page FreeBSD 
> doesn't natively support any audio output at all on your friends current M1 
> Mac hardware.
>  since quite nothing is currently supported you probably will first take over 
> working on the Audio driver …..and of course USB  :-)

  I think a huge benefit that Apple would have is that they might be
able to guarantee some sort of audio speaker, period, since they
control the hardware that the software runs on.  That might be a big ask
on FreeBSD, but maybe if there was some relatively ubiqitous 



Re: Posting Netiquette [ref: Threads "look definitely like" unreadable mess. Handbook project.]

2022-07-01 Thread John-Mark Gurney
Greg 'groggy' Lehey wrote this message on Thu, Jun 23, 2022 at 16:33 +1000:
> Does anybody have an opinion on character set recommendations?  I
> think we should ask for UTF-8 if at all possible.

I don't think there's any need for a recommendation.  All [modern] MUA
should tag the post appropriately and each MUA be able to convert as
needed between them.

-- 
  John-Mark Gurney  Voice: +1 415 225 5579

 "All that I will do, has been done, All that I have, has not."


signature.asc
Description: PGP signature


Re: Profiled libraries on freebsd-current

2022-05-04 Thread John Baldwin

On 5/4/22 1:38 PM, Steve Kargl wrote:

On Wed, May 04, 2022 at 01:22:57PM -0700, John Baldwin wrote:

On 5/4/22 12:53 PM, Steve Kargl wrote:

On Wed, May 04, 2022 at 11:12:55AM -0700, John Baldwin wrote:

I don't know the entire FreeBSD ecosystem.  Do people
use FreeBSD on embedded systems (e.g., nanobsd) where
libthr may be stripped out?  Thus, --enable-threads=no
is needed.


If they do, they are also using a constrained userland and
probably are not shipping a GCC binary either.  However, it's
not clear to me what --enable-threads means.

Does this enable -pthread as an option?  If so, that should
definitely just always be on.  It's still an option users have
to opt into via a command line flag and doesn't prevent
building non-threaded programs.

If it's enabling use of threads at runtime within GCC itself,
I'd say that also should probably just be allowed to be on.

I can't really imagine what else it might mean (and I doubt
it means the latter).



AFAICT, it controls whether -lpthread is automatically added to
the command line.  In the case of -pg, it is -lpthread_p.
The relevant lines are

#ifdef FBSD_NO_THREADS
#define FBSD_LIB_SPEC "\
   %{pthread: %eThe -pthread option is only supported on FreeBSD when gcc \
is built with the --enable-threads configure-time option.}  \
   %{!shared:   \
 %{!pg: -lc}
\
 %{pg:  -lc_p}  \
   }"
#else
#define FBSD_LIB_SPEC "\
   %{!shared:   \
 %{!pg: %{pthread:-lpthread} -lc}   \
 %{pg:  %{pthread:-lpthread_p} -lc_p}   \
   }\
   %{shared:\
 %{pthread:-lpthread} -lc   \
   }"
#endif

Ed is wondering if one can get rid of FBSD_NO_THREADS. With the
pending removal of WITH_PROFILE, the above reduces to

#define FBSD_LIB_SPEC " \
   %{!shared:\
 %{pthread:-lpthread} -lc\
   } \
   %{shared: \
 %{pthread:-lpthread} -lc\
   }"

If one can do the above, then freebsd-nthr.h is no longer needed
and can be deleted and config.gcc's handling of --enable-threads
can be updated/removed.


Ok, so it's just if -pthread is supported (%{pthread:-lpthread} only
adds -lpthread if -pthread was given on the command line).  That can just
be on all the time and Ed is correct that it is safe to remove the
FBSD_NO_THREADS case and assume it is always present instead.

--
John Baldwin



Re: Profiled libraries on freebsd-current

2022-05-04 Thread John Baldwin

On 5/4/22 12:53 PM, Steve Kargl wrote:

On Wed, May 04, 2022 at 11:12:55AM -0700, John Baldwin wrote:

On 5/2/22 10:37 AM, Steve Kargl wrote:

On Mon, May 02, 2022 at 12:32:25PM -0400, Ed Maste wrote:

On Sun, 1 May 2022 at 11:54, Steve Kargl
 wrote:


diff --git a/gcc/config/freebsd-spec.h b/gcc/config/freebsd-spec.h
index 594487829b5..1e8ab2e1827 100644
--- a/gcc/config/freebsd-spec.h
+++ b/gcc/config/freebsd-spec.h
@@ -93,14 +93,22 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  
If not, see
  (similar to the default, except no -lg, and no -p).  */

   #ifdef FBSD_NO_THREADS


I wonder if we can simplify things now, and remove this
`FBSD_NO_THREADS` case. I didn't see anything similar in other GCC
targets I looked at.


That I don't know.  FBSD_NO_THREADS is defined in freebsd-nthr.h.
In fact, it's the only thing in that header (except copyright
broilerplate).  freebsd-nthr.h only appears in config.gcc and
seems to only get added to the build if someone runs configure
with --enable-threads=no.  Looking at my last config.log for
gcc trunk, I see "Thread model: posix", which appears to be
the default case or if someone does --enable-threads=yes or
--enable-threads=posix.  So, I suppose it comes down to
two questions: (1) is libpthread.* available on all supported
targets and versions? (2) does anyone build gcc without
threads support?


libpthread is available on all supported architectures on all
supported versions.  libthr has been the default threading library
since 7.0 and the only supported library since 8.0.  In GDB I just
assume libthr style threads, and I think GCC can safely do the
same.



I don't know the entire FreeBSD ecosystem.  Do people
use FreeBSD on embedded systems (e.g., nanobsd) where
libthr may be stripped out?  Thus, --enable-threads=no
is needed.


If they do, they are also using a constrained userland and
probably are not shipping a GCC binary either.  However, it's
not clear to me what --enable-threads means.

Does this enable -pthread as an option?  If so, that should
definitely just always be on.  It's still an option users have
to opt into via a command line flag and doesn't prevent
building non-threaded programs.

If it's enabling use of threads at runtime within GCC itself,
I'd say that also should probably just be allowed to be on.

I can't really imagine what else it might mean (and I doubt
it means the latter).

--
John Baldwin



Re: Profiled libraries on freebsd-current

2022-05-04 Thread John Baldwin

On 5/2/22 10:37 AM, Steve Kargl wrote:

On Mon, May 02, 2022 at 12:32:25PM -0400, Ed Maste wrote:

On Sun, 1 May 2022 at 11:54, Steve Kargl
 wrote:


diff --git a/gcc/config/freebsd-spec.h b/gcc/config/freebsd-spec.h
index 594487829b5..1e8ab2e1827 100644
--- a/gcc/config/freebsd-spec.h
+++ b/gcc/config/freebsd-spec.h
@@ -93,14 +93,22 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  
If not, see
 (similar to the default, except no -lg, and no -p).  */

  #ifdef FBSD_NO_THREADS


I wonder if we can simplify things now, and remove this
`FBSD_NO_THREADS` case. I didn't see anything similar in other GCC
targets I looked at.


That I don't know.  FBSD_NO_THREADS is defined in freebsd-nthr.h.
In fact, it's the only thing in that header (except copyright
broilerplate).  freebsd-nthr.h only appears in config.gcc and
seems to only get added to the build if someone runs configure
with --enable-threads=no.  Looking at my last config.log for
gcc trunk, I see "Thread model: posix", which appears to be
the default case or if someone does --enable-threads=yes or
--enable-threads=posix.  So, I suppose it comes down to
two questions: (1) is libpthread.* available on all supported
targets and versions? (2) does anyone build gcc without
threads support?


libpthread is available on all supported architectures on all
supported versions.  libthr has been the default threading library
since 7.0 and the only supported library since 8.0.  In GDB I just
assume libthr style threads, and I think GCC can safely do the
same.

--
John Baldwin



Re: 'set but unused' breaks drm-*-kmod

2022-04-27 Thread John Baldwin

On 4/21/22 6:45 AM, Emmanuel Vadot wrote:

On Thu, 21 Apr 2022 08:51:26 -0400
Michael Butler  wrote:


On 4/21/22 03:42, Emmanuel Vadot wrote:


   Hello Michael,

On Wed, 20 Apr 2022 23:39:12 -0400
Michael Butler  wrote:


Seems this new requirement breaks kmod builds too ..

The first of many errors was (I stopped chasing them all for lack of
time) ..

--- amdgpu_cs.o ---
/usr/ports/graphics/drm-devel-kmod/work/drm-kmod-drm_v5.7.19_3/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c:1210:26:
error: variable 'priority' set but not used
[-Werror,-Wunused-but-set-variable]
   enum drm_sched_priority priority;
   ^
1 error generated.
*** [amdgpu_cs.o] Error code 1



   How are you building the port, directly or with PORTS_MODULES ?
   I do make passes on the warning for drm and I did for set-but-not-used
case but unfortunately this option doesn't exists in 13.0 so I couldn't
apply those in every branch.


I build this directly on -current. I'm guessing that these are what
triggered this behaviour:

commit 8b83d7e0ee54416b0ee58bd85f9c0ae7fb3357a1
Author: John Baldwin 
Date:   Mon Apr 18 16:06:27 2022 -0700

  Make -Wunused-but-set-variable a fatal error for clang 13+ for
kernel builds.

  Reviewed by:imp, emaste
  Differential Revision:  https://reviews.freebsd.org/D34949

commit 615d289ffefe2b175f80caa9b1e113c975576472
Author: John Baldwin 
Date:   Mon Apr 18 16:06:14 2022 -0700

  Re-enable set but not used warnings for kernel builds.

  make tinderbox now passes with this warning enabled as a fatal error,
  so revert the change to hide it in preparation for making it fatal.

  This reverts commit e8e691983bb75e80153b802f47733f1531615fa2.

  Reviewed by:imp, emaste
  Differential Revision:  https://reviews.freebsd.org/D34948




  Ok I see,

  I won't have time until monday (maybe tuesday to fix this) but if
someone wants to beat me to it we should add some new CWARNFLAGS for
each problematic files in the 5.4-lts and 5.7-table branches of
drm-kmod (master which is following 5.10 is already good) only
if $ {COMPILER_VERSION} >= 13.


There is already a helper you can use that deals with compiler versions:

CWARNFLAGS+= ${NO_WUNUSED_BUT_SET_VARIABLE}

or some such.

--
John Baldwin



Can't build with INVARIANTS but not WITNESS

2022-04-27 Thread John F Carr
My -CURRENT kernel has INVARIANTS (inherited from GENERIC) but not WITNESS:

include GENERIC
ident   STRIATUS
nooptions   WITNESS
nooptions   WITNESS_SKIPSPIN

My kernel build fails:

/usr/home/jfc/freebsd/src/sys/kern/vfs_lookup.c:102:13: error: variable 'line' 
set but not used [-Werror,-Wunused-but-set-variable]
int flags, line __diagused;
   ^
/usr/home/jfc/freebsd/src/sys/kern/vfs_lookup.c:101:14: error: variable 'file' 
set but not used [-Werror,-Wunused-but-set-variable]
const char *file __diagused;

The problem is, __diagused expands to nothing if INVARIANTS _or_ WITNESS is 
defined, but the variable in vfs_lookup.c is only used if WITNESS is defined.

#if defined(INVARIANTS) || defined(WITNESS)
#define __diagused
#else
#define __diagused  __unused
#endif

I think this code is trying to be too clever and causing more trouble than it 
prevents.  Change the || to &&, or replace __diagused with __unused everywhere.





Re: ktrace on NFSroot failing?

2022-03-25 Thread John Baldwin

On 3/10/22 8:14 AM, Mateusz Guzik wrote:

On 3/10/22, Bjoern A. Zeeb  wrote:

Hi,

I am having a weird issue with ktrace on an nfsroot machine:

root:/tmp # ktrace sleep 1
root:/tmp # kdump
-559038242  Events dropped.
kdump: bogus length 0xdeadc0de

Anyone seen something like this before?



I just did a quick check and it definitely fails on nfs mounts:
# ktrace pwd
/root/mjg
# kdump
-559038242  Events dropped.
kdump: bogus length 0xdeadc0de

I don't have time to look into it this week though.


Possibly related: core dumps are no longer working for me on NFS
mounts.  I get a 0 byte foo.core instead of a valid core dump.

--
John Baldwin



Re: Buildworld fails with external GCC toolchain

2022-02-14 Thread John Baldwin

On 2/12/22 11:34 AM, Yasuhiro Kimura wrote:

From: Dimitry Andric 
Subject: Re: Buildworld fails with external GCC toolchain
Date: Fri, 11 Feb 2022 22:53:44 +0100


Not really, the gcc 9 build has been broken for months, as far as I know.

See also: https://ci.freebsd.org/job/FreeBSD-main-amd64-gcc9_build/

The last build(s) show a different error from yours, though:

/workspace/src/tests/sys/netinet/libalias/util.c: In function 'set_udp':
/workspace/src/tests/sys/netinet/libalias/util.c:112:2: error: converting a 
packed 'struct ip' pointer (alignment 2) to a 'uint32_t' {aka 'unsigned int'} 
pointer (alignment 4) may result in an unaligned pointer value 
[-Werror=address-of-packed-member]
   112 |  uint32_t *up = (void *)p;
   |  ^~~~
In file included from /workspace/src/tests/sys/netinet/libalias/util.h:37,
  from /workspace/src/tests/sys/netinet/libalias/util.c:39:
/workspace/src/sys/netinet/ip.h:51:8: note: defined here
51 | struct ip {
   |^~

-Dimitry



Thanks for information. I went back the commit history of main branch
about every month and check if buildworld succeeds with GCC. But it
didn't succeed even if I went back about a year. And devel/binutils
port was update to 2.37 on last August. So I suspect external GCC
toolchain doesn't work well after binutils is updated to current
version.


I have amd64 world + kernel building with GCC 9 and the only remaining
open review not merged yet is https://reviews.freebsd.org/D34147.

It is work to keep it working though and I hadn't worked on it again
until recently.

--
John Baldwin



Re: Dragonfly Mail Agent (dma) in the base system

2022-01-27 Thread John Baldwin

On 1/27/22 1:34 PM, Ed Maste wrote:

The Dragonfly Mail Agent (dma) is a small Mail Transport Agent (MTA)
which accepts mail from a local Mail User Agent (MUA) and delivers it
locally or to a smarthost for delivery. dma does not accept inbound
mail (i.e., it does not listen on port 25) and is not intended to
provide the same functionality as a full MTA like postfix or sendmail.
It is intended for use cases such as delivering cron(8) mail.

Since 2014 we have a copy of dma in the base system available as an
optional component, enabled via the WITH_DMAGENT src.conf knob.

I am interested in determining whether dma is a viable minimal base
system MTA, and if not what gaps remain. If you have enabled DMA on
your systems (or are willing to give it a try) and have any feedback
or are aware of issues please follow up or submit a PR as appropriate.


I've used DMA on systems without local mail accounts to forward cron
periodic e-mails just fine.  It even supports STARTTLS and SMTP AUTH.
I haven't tried using it for simple local delivery to /var/mail/root.

--
John Baldwin



Re: git: 5e6a2d6eb220 - main - Reapply: move libc++ from /usr/lib to /lib [add /usr/lib/libc++.so.1 -> ../../lib/libc++.so.1 ?]

2022-01-03 Thread John Baldwin

On 1/1/22 9:00 AM, Ed Maste wrote:

On Fri, 31 Dec 2021 at 18:04, John Baldwin  wrote:


However, your point about libcxxrt.so.1 is valid.  It needs to also be moved
to /lib if libc++.so.1 is moved to /lib.


libcxxrt.so.1 has always been in /lib.


Oh, I was thrown off by the .so indirection for libcxxrt in the linker script.

--
John Baldwin



Re: git: 5e6a2d6eb220 - main - Reapply: move libc++ from /usr/lib to /lib [add /usr/lib/libc++.so.1 -> ../../lib/libc++.so.1 ?]

2021-12-31 Thread John Baldwin

On 12/31/21 2:59 PM, Mark Millard wrote:

On 2021-Dec-31, at 14:28, Mark Millard  wrote:


On 2021-Dec-30, at 14:04, John Baldwin  wrote:


On 12/30/21 1:09 PM, Mark Millard wrote:

On 2021-Dec-30, at 13:05, Mark Millard  wrote:

This asks a question in a different direction that my prior
reports about my builds vs. Cy's reported build.

Background:

/usr/obj/BUILDs/main-amd64-nodbg-clang/usr/main-src/amd64.amd64/tmp/usr/lib/libc++.so:GROUP
 ( /lib/libc++.so.1 /usr/lib/libcxxrt.so
and:
lrwxr-xr-x  1 root  wheel23 Dec 29 13:17:01 2021 /usr/lib/libcxxrt.so 
-> ../../lib/libcxxrt.so.1

Why did libc++.so.1 not get a:

/usr/lib/libc++.so.1 -> ../../lib/libc++.so.1

I forgot to remove the .1 on the left hand side:
/usr/lib/libc++.so -> ../../lib/libc++.so.1


Because for libc++.so we don't just symlink to the current version of the 
library
(as we do for most other shared libraries) to tell the compiler what to link 
against
for -lc++, instead we use a linker script that tells the compiler to link 
against
both of those libraries when -lc++ is encountered.


A better identification of what looks odd to me is the
path variations in:

# more /usr/lib/libc++.so


Another not great day on my part: That path alone makes
the mix of /lib/ and /usr/lib/ use involved, given the
reference to /lib/libc++.so.1 . That would still be true
if the other path had been /lib/libcxxrt.so .


/usr/lib/libc++.so is only used by the compiler/linker when linking a binary.
The resulting binary has the associated paths (/lib/libc++.so.1 and
/usr/lib/libcxxrt.so.1) in its DT_NEEDED.  So it is fine for the .so to be
in /usr/lib.  This is the same with /usr/lib/libc.so vs /lib/libc.so.7.

However, your point about libcxxrt.so.1 is valid.  It needs to also be moved
to /lib if libc++.so.1 is moved to /lib.  Doing so will also require yet another
depend-clean.sh fixup (well, probably just adjusting the one I added to
check the libcxxrt path instead of libc++ path).

--
John Baldwin



Re: git: 5e6a2d6eb220 - main - Reapply: move libc++ from /usr/lib to /lib [add /usr/lib/libc++.so.1 -> ../../lib/libc++.so.1 ?]

2021-12-30 Thread John Baldwin

On 12/30/21 1:09 PM, Mark Millard wrote:



On 2021-Dec-30, at 13:05, Mark Millard  wrote:


This asks a question in a different direction that my prior
reports about my builds vs. Cy's reported build.

Background:

/usr/obj/BUILDs/main-amd64-nodbg-clang/usr/main-src/amd64.amd64/tmp/usr/lib/libc++.so:GROUP
 ( /lib/libc++.so.1 /usr/lib/libcxxrt.so
and:
lrwxr-xr-x  1 root  wheel23 Dec 29 13:17:01 2021 /usr/lib/libcxxrt.so 
-> ../../lib/libcxxrt.so.1

Why did libc++.so.1 not get a:

/usr/lib/libc++.so.1 -> ../../lib/libc++.so.1


I forgot to remove the .1 on the left hand side:

/usr/lib/libc++.so -> ../../lib/libc++.so.1


Because for libc++.so we don't just symlink to the current version of the 
library
(as we do for most other shared libraries) to tell the compiler what to link 
against
for -lc++, instead we use a linker script that tells the compiler to link 
against
both of those libraries when -lc++ is encountered.

I have finally reproduced Cy's build error locally and am testing my fix.  If it
works I'll commit it.

--
John Baldwin



Re: smr inp breaks some jail use cases and panics with i915kms don't switch to the console anymore

2021-12-14 Thread John Baldwin

On 12/14/21 9:40 AM, Gleb Smirnoff wrote:

On Tue, Dec 14, 2021 at 09:28:07AM -0800, John Baldwin wrote:
J> > AFAIK, today it will always panic only with WITNESS. Without WITNESS it 
would
J> > pass through mtx_lock as long as the mutex is not locked.
J>
J> Yes, but the default kernel on head is GENERIC which has witness enabled, 
hence
J> the out of the box kernel panics reliably. :)
J>
J> > So, do you suggest to push D33340 before finalizing D9?
J>
J> Yes, I think so.

Pushed. And I plan to post new version of D9 today.


Thanks!

--
John Baldwin



Re: smr inp breaks some jail use cases and panics with i915kms don't switch to the console anymore

2021-12-14 Thread John Baldwin

On 12/13/21 12:25 PM, Gleb Smirnoff wrote:

On Mon, Dec 13, 2021 at 11:56:35AM -0800, John Baldwin wrote:
J> > J> So there are two things here.  The root issue is that the devel/apr1 
port
J> > J> runs a configure test for TCP_NDELAY being inherited by accepted 
sockets.
J> > J> This test panics because prison_check_ip4() tries to lock a prison mutex
J> > J> to walk the IPs assigned to a jail, but the caller 
(in_pcblookup_hash()) has
J> > J> done an smr_enter() which is a critical_enter():
J> >
J> > The first one is known, and I got a patch to fix it:
J> >
J> > https://reviews.freebsd.org/D33340
J> >
J> > However, a pre-requisite to this simple patch is more complex:
J> >
J> > https://reviews.freebsd.org/D9
J> >
J> > There is some discussion on how to improve that, and I decided to do that
J> > rather than stick to original version. So I takes a few extra days.
J> >
J> > We could push D33340 into main, if the negative effects (raciness of
J> > the prison check) is considered lesser evil then potentially contested
J> > mtx_lock in smr section.
J>
J> I think raciness is probably better than always panicking as it does today.

AFAIK, today it will always panic only with WITNESS. Without WITNESS it would
pass through mtx_lock as long as the mutex is not locked.


Yes, but the default kernel on head is GENERIC which has witness enabled, hence
the out of the box kernel panics reliably. :)


So, do you suggest to push D33340 before finalizing D9?


Yes, I think so.

--
John Baldwin



Re: smr inp breaks some jail use cases and panics with i915kms don't switch to the console anymore

2021-12-14 Thread John Baldwin

On 12/14/21 2:14 AM, Alexey Dokuchaev wrote:

How do you mean?  Most FreeBSD people, not some random Twitter crowd, want
the bell to be on by default, but it's still off.


I don't know that that's true, and I myself am not sure that I want it back
on by default.  Previously my laptop had a rather annoying beep whose volume
I couldn't control that I actually prefer to have off normally.  On further
reflection, the beep I was looking for for bad input may actually be an
xscreensaver thing for an invalid character to unlock the screen vs a sysbeep
anyway.

--
John Baldwin



Re: RFC: What to do about Allocate in the NFS server for FreeBSD13?

2021-12-14 Thread John Baldwin

On 12/13/21 8:30 AM, Konstantin Belousov wrote:

On Mon, Dec 13, 2021 at 04:26:42PM +, Rick Macklem wrote:

Hi,

There are two problems with Allocate in the NFSv4.2 server in FreeBSD13:
1 - It uses the thread credentials instead of the ones for the RPC.
2 - It does not ensure that file changes are committed to stable storage.
These problems are fixed by commit f0c9847a6c47 in main, which added
ioflag and cred arguments to VOP_ALLOCATE().

I can think of 3 ways to fix Allocate in FreeBSD13:
1 - Apply a *hackish* patch like this:
+   savcred = p->td_ucred;
+   td->td_ucred = cred;
do {
olen = len;
error = VOP_ALLOCATE(vp, , );
if (error == 0 && len > 0 && olen > len)
maybe_yield();
} while (error == 0 && len > 0 && olen > len);
+   p->td_ucred = savcred;
if (error == 0 && len > 0)
error = NFSERR_IO;
+   if (error == 0)
+   error = VOP_FSYNC(vp, MNT_WAIT, p);
The worst part of it is temporarily setting td_ucred to cred.

2 - MFC'ng commit f0c9847a6c47. Normally changes to the
  VOP/VFS are not MFC'd. However, in this case, it might be
  ok to do so, since it is unlikely there is an out of source tree
  file system with a custom VOP_ALLOCATE() method?

I do not see much wrong with #2, this is what I would do myself.


I also think this is fine.

--
John Baldwin



Re: smr inp breaks some jail use cases and panics with i915kms don't switch to the console anymore

2021-12-13 Thread John Baldwin

On 12/13/21 9:35 AM, Gleb Smirnoff wrote:

   Hi John,

On Mon, Dec 13, 2021 at 07:45:07AM -0800, John Baldwin wrote:
J> So there are two things here.  The root issue is that the devel/apr1 port
J> runs a configure test for TCP_NDELAY being inherited by accepted sockets.
J> This test panics because prison_check_ip4() tries to lock a prison mutex
J> to walk the IPs assigned to a jail, but the caller (in_pcblookup_hash()) has
J> done an smr_enter() which is a critical_enter():

The first one is known, and I got a patch to fix it:

https://reviews.freebsd.org/D33340

However, a pre-requisite to this simple patch is more complex:

https://reviews.freebsd.org/D9

There is some discussion on how to improve that, and I decided to do that
rather than stick to original version. So I takes a few extra days.

We could push D33340 into main, if the negative effects (raciness of
the prison check) is considered lesser evil then potentially contested
mtx_lock in smr section.


I think raciness is probably better than always panicking as it does today.
 

J> However, it was a bit harder to see this originally as the 915kms driver
J> tries to do a malloc(M_WAITOK) from cn_grab() when entering DDB which
J> recursively panics (even a malloc(M_NOWAIT) from cn_grab() is probably a
J> bad idea).  When it panicked in X the result was that the screen just froze
J> on whatever it had most recently drawn and the machine looked hung.  (The
J> fact that that sysbeep is off so I couldn't tell if typing in commands was
J> doing anything vs emitting errors probably didn't improve trying to diagnose
J> the hang as "sitting in ddb" initially, though I don't know if DDB itself
J> emits a beep for invalid commands, etc.)

Didn't know about this one. Is this isolated to actually entering DDB or
there is some path that in a normal inpcb lookup we would M_WAITOK?


This is in the drm(4) driver, nothing to do with in_pcb, just made it harder
to see the in_pcb issue.

--
John Baldwin



smr inp breaks some jail use cases and panics with i915kms don't switch to the console anymore

2021-12-13 Thread John Baldwin
itting in ddb" initially, though I don't know if DDB itself
emits a beep for invalid commands, etc.)

--
John Baldwin



Re: Make etcupdate bootstrap requirement due to previous mergemaster usage more clear in handbook

2021-12-08 Thread John Baldwin

On 12/3/21 6:09 PM, Tomoaki AOKI wrote:

On Fri, 03 Dec 2021 05:54:37 -0800 (PST)
"Jeffrey Bouquet"  wrote:




On Fri, 3 Dec 2021 13:58:39 +0100, Miroslav Lachman <000.f...@quip.cz> wrote:


On 03/12/2021 12:52, Yetoo Happy wrote:

[...]


  Quick Start* and follow the instructions and get to step
7 and may think that even though etcupdate is different from mergemaster
from the last time they used the handbook they have faith that following
the instructions won't brick their system. This user will instead find that
faith in general is just a very complex facade for the pain and suffering
of not following *24.5.6.1 Merging Configuration Files* because the user
doesn't know that step exists or relevant to the current step and ends up
unknowingly having etcupdate append "<<<< yours ... >>>>> new" to the top
of the user's very important configuration files that they didn't expect
the program to actually modify that way when they resolved differences nor
could they predict easily because the diff format is so unintuitive and
different from mergemaster. Now unable to login or boot into single user
mode because redirections instead of the actual configuration is parsed the
user goes to the handbook to find out what might have happened and scrolls
down to find *24.5.6.1 Merging Configuration Files* is under *24.5.6.


[...]

That's why I think etcupdate is not so intuitive as tool like this
should be and etcupdate is extremely dangerous because it intentionally
breaks syntax of files vital to have system up and running.
If anything goes wrong with mergemaster automatic process then your have
configuration not updated which is almost always fine to boot the system
and fix it. But after etcupdate? Much worse...

I maintain about 30 machines for 2 decades and had problems with
etcupdate many times. I had ti use mergemaster as fall back many times.
Mainly because of etcupdate said "Reference tree to diff against
unavailable" or "No previous tree to compare against, a sane comparison
is not possible.". And sometimes because etcupdate cannot automatically
update many files in /etc/rc.d and manual merging of a lot of files with
"<<<<  >>>>" is realy painful while with mergemaster only simple
keyboard shortcuts will solve it.
All of this must be very stressful for beginners.

So beside the update of documentation I really would like to see some
changes to etcupdate workflow where files are modified in temporary
location and moved to destination only if they do not contain any syntax
breaking changes like <<<<, , >>>>.

Kind regards
Miroslav Lachman



Agree. I fell back to mergemaster this Nov on 13-stable when the /var files
pertaining to etcupdate were all missing current /etc data, and no study of
man etcupdate was clear enough to rectify such a scenario, and suspect my
initial use of etcupdate will or may require a planned reinstall,  not having
had to do so since Jan 2004 iirc,  [ vs failed hard disk migrations ] and
I am just hoping mergemaster stays in /usr/src and updated
for system changes, even if moved to 'tools' or
something, since its use seems intuitive and much less of a black box.
Also, /usr/src/UPDATING still at the bottom emphasizes mergemaster still.



Not sure it's fixed or not (tooo dangerous to try...), -n (dry-run)
option of etcupdate is now quite harmful. Maybe by any commit done in
this april on main (MFC'ed to stable/13 in june).

  *I got busy manually checking and applying changes to /etc, and
   forgot to file PR.

Doing `etcupdate -n` itself runs OK, but following `etcupdate -B` does
NOT do anything, hence nothing is actually updated.
The only workaround I have is NOT to try dry-run.


Humm.


It would be because the same trees are used for dry-run and actual run.
(Not looked into the code. Just a thought.)


So the new changes always build a temporary tree (vs trying to build
/var/db/etupdate/current in place).  For -n it should be that it just
doesn't change /var/db/etcupdate/current at the end, but if it did the
move anyway that would explain the bug you are seeing.  That does indeed
look broken.  Please file a PR as a reminder for me to fix it.

--
John Baldwin



Re: Make etcupdate bootstrap requirement due to previous mergemaster usage more clear in handbook

2021-12-08 Thread John Baldwin

On 12/3/21 4:58 AM, Miroslav Lachman wrote:

So beside the update of documentation I really would like to see some
changes to etcupdate workflow where files are modified in temporary
location and moved to destination only if they do not contain any syntax
breaking changes like <<<<, , >>>>.


This is what etcupdate does, so I'm a bit confused why you are getting
merge markers in /etc.  When an automated 3-way merge doesn't work due
to conflicts, the file with the conflicts is saved in
/var/db/etcupdate/conflicts/.  It is only copied to /etc when you
mark it as fully resolved when running 'etcupdate resolve'.  Perhaps
you had multiple conflicts in a modified file and when editing the file
you only fixed the first one and then marked it as resolved at the
prompt?  Even in that case etcupdate explicitly prompts you a second time
after you say "r" with "File  still has conflicts, are you sure?",
so it will only install a file to /etc with those changes if you have
explicitly confirmed you want it.

--
John Baldwin



Re: failure of pructl (atexit/_Block_copy/--no-allow-shlib-undefined)

2021-12-04 Thread John-Mark Gurney
John-Mark Gurney wrote this message on Thu, Dec 02, 2021 at 15:43 -0800:
> David Chisnall wrote this message on Thu, Dec 02, 2021 at 10:34 +:
> > On 02/12/2021 09:51, Dimitry Andric wrote:
> > > Apparently the "block runtime" is supposed to provide the actual object,
> > > so I guess you have to explicitly link to that runtime?
> > 
> > The block runtime provides this symbol.  You use this libc API, you must 
> > be compiling with a toolchain that supports blocks and must be providing 
> > the blocks symbols.  If you don't use `atexit_b` or any of the other 
> > `_b` APIs then you don't need to link the blocks runtime.
> > 
> > I am not sure why this is causing linker failures - if it's a weak 
> > symbol and it's not defined then that's entirely expected: the point of 
> > a weak symbol is that it might not be defined.  This avoids the need to 
> > link libc to the blocks runtime for code that doesn't use blocks (i.e. 
> > most code that doesn't come from macOS).
> > 
> > This code is not using `atexit_b`, but because it is using `atexit` the 
> > linker is complaining that the compilation unit containing `atexit` is 
> > referring to a symbol that isn't defined.
> 
> I assume that this failure was due to a recent llvm change, because I
> haven't received any failures about pructl until Nov 16th, 2021,
> despite the port and code being untouched since 2020-09-22.
> 
> Digging in a bit more, it looks like libpru is compiled w/ -fblocks,
> and so depending upon the _Block_copy symbol, the atexit is just the
> "closest" symbol that's defined".  pructl is not, but even compiling
> pructl w/ -fblocks, doesn't fix the link error, as it looks like the
> block runtime isn't linked.  If I manually include
> /usr/lib/libBlocksRuntime.so, then pructl is able to link.
> 
> I can't seem to find any docs on clang about how to properly compile
> code that uses blocks, so, unless someone points me to docs on how to
> compile blocks enable programs, I'll just patch libpru to not use
> blocks since it seems like blocks is well supported.  I don't want
> to fix this code every few years when things change.

Thanks to some off-list comms, it appears that this was a regression
in lld 13, and will be fixed by:
https://reviews.llvm.org/D115041

Thanks to jrtc27 for [helping] tracking this down!

-- 
  John-Mark Gurney  Voice: +1 415 225 5579

 "All that I will do, has been done, All that I have, has not."



Re: failure of pructl (atexit/_Block_copy/--no-allow-shlib-undefined)

2021-12-02 Thread John-Mark Gurney
David Chisnall wrote this message on Thu, Dec 02, 2021 at 10:34 +:
> On 02/12/2021 09:51, Dimitry Andric wrote:
> > Apparently the "block runtime" is supposed to provide the actual object,
> > so I guess you have to explicitly link to that runtime?
> 
> The block runtime provides this symbol.  You use this libc API, you must 
> be compiling with a toolchain that supports blocks and must be providing 
> the blocks symbols.  If you don't use `atexit_b` or any of the other 
> `_b` APIs then you don't need to link the blocks runtime.
> 
> I am not sure why this is causing linker failures - if it's a weak 
> symbol and it's not defined then that's entirely expected: the point of 
> a weak symbol is that it might not be defined.  This avoids the need to 
> link libc to the blocks runtime for code that doesn't use blocks (i.e. 
> most code that doesn't come from macOS).
> 
> This code is not using `atexit_b`, but because it is using `atexit` the 
> linker is complaining that the compilation unit containing `atexit` is 
> referring to a symbol that isn't defined.

I assume that this failure was due to a recent llvm change, because I
haven't received any failures about pructl until Nov 16th, 2021,
despite the port and code being untouched since 2020-09-22.

Digging in a bit more, it looks like libpru is compiled w/ -fblocks,
and so depending upon the _Block_copy symbol, the atexit is just the
"closest" symbol that's defined".  pructl is not, but even compiling
pructl w/ -fblocks, doesn't fix the link error, as it looks like the
block runtime isn't linked.  If I manually include
/usr/lib/libBlocksRuntime.so, then pructl is able to link.

I can't seem to find any docs on clang about how to properly compile
code that uses blocks, so, unless someone points me to docs on how to
compile blocks enable programs, I'll just patch libpru to not use
blocks since it seems like blocks is well supported.  I don't want
to fix this code every few years when things change.

-- 
  John-Mark Gurney  Voice: +1 415 225 5579

 "All that I will do, has been done, All that I have, has not."



failure of pructl (atexit/_Block_copy/--no-allow-shlib-undefined)

2021-12-01 Thread John-Mark Gurney
Hello,

It seems like the recent changes to make --no-allow-shlib-undefined
broke pructl.

lib/libc/stdlib/atexit.c uses a weak _Block_copy symbol, but
pructl does not use atexit_b, and yet gets the following error:
: && /usr/bin/cc -Werror -O2 -pipe  -fstack-protector-strong -isystem 
/usr/local/include -fno-strict-aliasing -std=c99 -fstack-protector-strong 
CMakeFiles/pructl.dir/pructl.c.o -o pructl  -Wl,-rpath,/usr/local/lib:  
/usr/local/lib/libpru.so && :
ld: error: /lib/libc.so.7: undefined reference to _Block_copy 
[--no-allow-shlib-undefined]
cc: error: linker command failed with exit code 1 (use -v to see invocation)

What is the correct fix?  It seems like atexit.c or the linker should
be fixed, as pructl doesn't use atexit_b at all.

-- 
  John-Mark Gurney  Voice: +1 415 225 5579

 "All that I will do, has been done, All that I have, has not."



Re: amd64 (example) main [so: 14]: delete-old check-old delete-old-libs missing a bunch of files?

2021-12-01 Thread John Baldwin
 in /usr/obj/DESTDIRs/main-amd64-poud/usr/tests/usr.sbin/mixer: Kyuafile
Only in /usr/obj/DESTDIRs/main-amd64-poud/usr/tests/usr.sbin/mixer: mixer_test


Fallout from recent mixer changes?  Hans might know more.


Only in /usr/obj/DESTDIRs/main-amd64-poud/usr/tests/usr.sbin/sa: 
v1-sparc64-sav.in
Only in /usr/obj/DESTDIRs/main-amd64-poud/usr/tests/usr.sbin/sa: 
v1-sparc64-sav.out
Only in /usr/obj/DESTDIRs/main-amd64-poud/usr/tests/usr.sbin/sa: 
v1-sparc64-u.out
Only in /usr/obj/DESTDIRs/main-amd64-poud/usr/tests/usr.sbin/sa: 
v1-sparc64-usr.in
Only in /usr/obj/DESTDIRs/main-amd64-poud/usr/tests/usr.sbin/sa: 
v1-sparc64-usr.out
Only in /usr/obj/DESTDIRs/main-amd64-poud/usr/tests/usr.sbin/sa: 
v2-sparc64-sav.in
Only in /usr/obj/DESTDIRs/main-amd64-poud/usr/tests/usr.sbin/sa: 
v2-sparc64-u.out
Only in /usr/obj/DESTDIRs/main-amd64-poud/usr/tests/usr.sbin/sa: 
v2-sparc64-usr.in


I'll commit fixes for some of these.

--
John Baldwin



Re: make cleandiry tries to access /lib/geom

2021-11-24 Thread John Baldwin

On 11/24/21 3:30 AM, Bjoern A. Zeeb wrote:

Hi,

  673 ===> usr.bin/diff/tests (cleandir)
  674 ===> lib/geom (cleandir)
  675 ===> sbin/mount_udf (cleandir)
  676 make[6] warning: /lib/geom: Permission denied.

 not sure what is going on here?

  677 ===> share/i18n/esdb/ISO-8859 (cleandir)
  678 ===> tests/sys/cddl/zfs/tests/cli_root/zfs_clone (cleandir)


I think Jess has a possible fix.  This is some regression added in the
build system several months ago.

--
John Baldwin



Re: "Khelp module "ertt" can't unload until its refcount drops from 1 to 0." after "All buffers synced."?

2021-11-22 Thread John Baldwin

On 11/19/21 4:29 AM, tue...@freebsd.org wrote:

On 19. Nov 2021, at 00:11, Mark Millard  wrote:


On 2021-Nov-18, at 12:31, tue...@freebsd.org  wrote:


On 17. Nov 2021, at 21:13, Mark Millard via freebsd-current 
 wrote:

I've not noticed the ertt message before in:

. . .
Waiting (max 60 seconds) for system thread `bufspacedaemon-1' to stop... done
All buffers synced.
Uptime: 1d9h57m18s
Khelp module "ertt" can't unload until its refcount drops from 1 to 0.

Hi Mark,

what kernel configuration are you using? What kernel modules are loaded?


The shutdown was of my ZFS boot media but the machine is
currently doing builds on the UFS media. (The ZFS media is
present but not mounted). For now I provide information
from the booted UFS system. The UFS context is intended
to be nearly a copy of the brctl selection for main [so: 14]
from the ZFS media. Both systems have been doing the same
poudriere builds for various comparison/contrast purposes.
The current build activity will likely take 16+ hrs.

Hi Mark,

thanks a lot for the information. I was contemplating whether this message
was related to a recent change in the TCP congestion module handling, but
since it was already there in February, this is not the case. Will try to
reproduce this, but wasn't able up to now.


The congestion control changes just probably exacerbated the bug by
adding a new reference on this module, just as they exposed the bug
with khelp using the wrong SYSINIT subsystem.

--
John Baldwin



Re: cross-compiling for i386 on amd64 fails

2021-11-16 Thread John Baldwin

On 11/15/21 8:34 PM, Michael Butler via freebsd-current wrote:

Haven't had time to identify which change caused this yet but I now get ..

===> lib/libsbuf (obj,all,install)
===> cddl/lib/libumem (obj,all,install)
===> cddl/lib/libnvpair (obj,all,install)
===> cddl/lib/libavl (obj,all,install)
ld: error: /usr/obj/usr/src/i386.i386/tmp/usr/lib/libspl.a(assert.o) is
incompatible with elf_i386_fbsd
===> cddl/lib/libspl (obj,all,install)
cc: error: linker command failed with exit code 1 (use -v to see invocation)
--- libavl.so.2 ---
*** [libavl.so.2] Error code 1

make[4]: stopped in /usr/src/cddl/lib/libavl


My guess is that this was fixed by
git: 9e9c651caceb - main - cddl: fix missing ZFS library dependencies

--
John Baldwin



Re: Problems with getting a crash dump

2021-11-08 Thread John Kennedy
On Mon, Nov 08, 2021 at 07:08:31PM +, Alexander wrote:
> Hello, I am currently using FreeBSD 14.0-CURRENT and I found a bug that
> triggers a kernel panic. I wanted to make a kernel crash dump to further
> investigate the issue, but after a few tries I still did not manage to do it.
> I started by following the instructions in the FreeBSD Handbook. ...
> /dev/nvd0p2.eli is an active swap device and I configured it to be used as a
> dump device like this: ...

  Much like you, I found that my current (encryptd) swap files weren't going to
work and I used an external USB stick.

[/etc/rc.conf]
# Set dumpdev to "AUTO" to enable crash dumps, "NO" to disable
#dumpdev="AUTO"
dumpdev="/dev/da0p1"

[dumpon -vl]
kernel dumps on priority: device
0: da0p1

[gpart show da0]
=>   40  240353200  da0  GPT  (115G)
 40  2403532001  freebsd-swap  (115G)

[swapctl -lm]
Device:   1MB-blocks  Used:
/dev/nvd1p3.eli   8192   2932

  Apparently the last time I crashed was ~Mar 2021 so your version mileage may
vary (not 14), but make sure the OS didn't already do it for you (at least if
you're booting up fully into multi-user mode; you did say single).  The
/var/crash directory is the default location for where savecore stashes the
info for you.

  Note that I made da0p1 swap, but I didn't actually configure it that way
in /etc/fstab so I'm not using slow, unencrypted USB for swap, just dumps.

  The stick had a little write-LED on it, so it was obvious when it was
being hit and I think the kernel panic-dump had a status output of some sort
(it's been a while), although that might be obscured (under X11, etc).  I
sort of remember a prompt where I could have done something interactive
that I might have had to continue on from before it did the dump.  Again,
it's been a while since I had a dump that I was trying hard to report.

  115G is more than enough to hold 32G of RAM and 8G of swap.  Remember that
some of your RAM might *be* swapped out (so, worse cast, RAM+swap).  Seems
like you'd have good odds in a nice, controlled test of not needing all that
space but kernel crash dumps are often pretty brainless because they know
they've just lost at Russian roulette and don't know what they can trust
(don't know about FreeBSD specifically).  Lets just say that it has a very
different approach to swap than ancient SunOS.

  You've got some interesting physical quirks (ala, 14 + USB stick) that
I couldn't test with my setup, but I do have a bhyve running 14 that I
could probably try crashing in a similar way (no USB of course).

  It sounds like you're going down the right path, although I'd try to
borrow a bigger USB stick and see if that helps.




Re: LAN ure interface problem

2021-10-29 Thread John-Mark Gurney
Ludovit Koren wrote this message on Fri, Oct 22, 2021 at 16:00 +0200:
> I have installed FreeBSD 14.0-CURRENT #1 main-n250134-225639e7db6-dirty
> on my notebook HP EliteBook 830 G7 and I am using RealTek usb LAN
> interface:
> 
> ure0 on uhub0
> ure0:  on 
> usbus1
> miibus0:  on ure0
> rgephy0:  PHY 0 on miibus0
> rgephy0: OUI 0x00e04c, model 0x, rev. 0
> rgephy0:  none, 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 
> 1000baseT-FDX, 1000baseT-FDX-master, auto
> ue0:  on ure0
> ue0: bpf attached
> ue0: Ethernet address: 00:e0:4c:68:04:20
> 
> 
> When there is bigger load on the interface, for example rsync of the big
> directory, the carrier is lost. The only solution I found is to remove
> and insert the usb interface; ifconfig ue0 down, ifconfig ue0 up did not
> help. The output of the ifconfig:
> 
> ue0: flags=8843 metric 0 mtu 1500
> 
> options=68009b
> ether 00:e0:4c:68:04:20
> inet 192.168.1.18 netmask 0xff00 broadcast 192.168.1.255
> media: Ethernet autoselect (100baseTX )
> status: active
> nd6 options=29
> 
> I do not know and did not find anything relevant, if the driver is buggy
> or the hardware has some problems. Please, advice.

I have seen similar behavior, and unable to get an vendor support, so
have stopped working on the driver.  I have not found a reliable way to
reset the hardware to a working state, even via power_off/power_on
commands.

Sorry that I don't have a solution for you.  The closest that I could
suggest is to try to drop the USB id from the ure driver or switch it's
mode to try the ucdce driver instead.  I've seen that it's been more
reliable, but it could be because it also runs MUCH slower, and doesn't
hit the same bug.

-- 
  John-Mark Gurney  Voice: +1 415 225 5579

 "All that I will do, has been done, All that I have, has not."



Re: git: 2f7f8995367b - main - libdialog: Bump shared library version to 10. [ the .so.10 is listed in mk/OptionalObsoleteFiles.inc ?]

2021-10-27 Thread John Baldwin

On 10/27/21 3:23 PM, Mark Millard via freebsd-current wrote:



On 2021-Oct-27, at 15:21, Mark Millard  wrote:


Unfortunately(?) this update added the .so.10 to mk/OptionalObsoleteFiles.inc :

diff --git a/tools/build/mk/OptionalObsoleteFiles.inc 
b/tools/build/mk/OptionalObsoleteFiles.inc
index a8b0329104c4..91822aac492a 100644
--- a/tools/build/mk/OptionalObsoleteFiles.inc
+++ b/tools/build/mk/OptionalObsoleteFiles.inc
_at__at_ -1663,11 +1663,11 _at__at_ OLD_FILES+=usr/bin/dialog
. . .
OLD_FILES+=usr/lib/libdialog.so
-OLD_FILES+=usr/lib/libdialog.so.8
+OLD_FILES+=usr/lib/libdialog.so.10
. . .

Looks to my like that +line should have been:

+OLD_FILES+=usr/lib/libdialog.so.9

(presuming the original .so.8 was correct during
.so.9 's time frame).



Looks like:

+OLD_FILES+=usr/lib/libdpv.so.3

is the same sort of issue and possibly should have been:

+OLD_FILES+=usr/lib/libdpv.so.2


No, these lines are for removing the current versions of the
libraries if you do 'make delete-old WITHOUT_DIALOG=yes'.

They weren't bumped previously when I bumped them for ncurses
(probably my fault).

--
John Baldwin



Re: main changed DIALOG_STATE, DIALOG_VARS, and DIALOG_COLORS but /usr/lib/libdialog.so.? naming was not adjusted? (crashes in releng/13 programs on main [so: 14] can result)

2021-10-22 Thread John Baldwin

On 10/22/21 1:08 AM, Mark Millard via freebsd-current wrote:

main [soi: 14] commit a96ef450 (2021-02-26 09:16:49 +)
changed DIALOG_STATE, DIALOG_VARS, and DIALOG_COLORS .
These are publicly exposed in (ones that I noticed):

/usr/include/dialog.h:extern DIALOG_STATE dialog_state;
/usr/include/dialog.h:extern DIALOG_VARS dialog_vars;
/usr/include/dialog.h:extern DIALOG_COLORS dlg_color_table[];


Then we need to bump libdialog's so version to 10?  (I don't
think libdialog has symbol versioning)

--
John Baldwin



Re: ELF binary type "0" not known. (while compiling buildworld on risc-v/qemu)

2021-09-27 Thread John Baldwin

On 9/27/21 7:40 AM, Karel Gardas wrote:


Hello,

I'm playing with compiling freebsd 13 (releng/13.0 2 days ago) and
current (git HEAD as of today) on qemu-5.1.0/qemu-6.1.0 on risv64
platform. The emulator invocation is:

qemu-system-riscv64 -machine virt -smp 8 -m 16G -nographic -device
virtio-blk-device,drive=hd -drive
file=FreeBSD-14.0-CURRENT-riscv-riscv64.qcow2,if=none,id=hd -device
virtio-net-device,netdev=net -netdev user,id=net,hostfwd=tcp::2233-:22
-bios /usr/lib/riscv64-linux-gnu/opensbi/generic/fw_jump.elf -kernel
/usr/lib/u-boot/qemu-riscv64_smode/uboot.elf -object
rng-random,filename=/dev/urandom,id=rng -device
virtio-rng-device,rng=rng -nographic -append "root=LABEL=rootfs
console=ttyS0"

and the host is Ubuntu 20.04.x LTS. Both qemu 5.1.0 and qemu 6.1.0 are
compiled from, source, but both OpenSBI and u-boot for risc-v are Ubuntu
packages provided (to accompany ubuntu provided qemu 4.2.1)

My issue while compiling both 13 and current is that compilation after
some time fails with:

root@freebsd:/usr/src # time make -j8 buildworld > /tmp/build-j8-2.txt
ELF binary type "0" not known.
17784.134u 21388.907s 1:50:13.83 592.2% 30721+572k 10+2177io 0pf+0w

I'm curious if this is a know issue either in Qemu or in FreeBSD for
risc-v or if I'm doing anything wrong here?


It is a known issue with how we brand FreeBSD/riscv binaries.  Jess
(cc'd) has a WIP review with a possible fix IIRC.

--
John Baldwin



Re: [HEADSUP] making /bin/sh the default shell for root

2021-09-22 Thread John Baldwin

On 9/22/21 1:36 AM, Baptiste Daroussin wrote:

Hello,

TL;DR: this is not a proposal to deorbit csh from base!!!

For years now, csh is the default root shell for FreeBSD, csh can be confusing
as a default shell for many as all other unix like settled on a bourne shell
compatible interactive shell: zsh, bash, or variant of ksh.

Recently our sh(1) has receive update to make it more user friendly in
interactive mode:
* command completion (thanks pstef@)
* improvement in the emacs mode, to make it behave by default like other shells
* improvement in the vi mode (in particular the vi edit to respect $EDITOR)
* support for history as described by POSIX.

This makes it a usable shell by default, which is why I would like to propose to
make it the default shell for root starting FreeBSD 14.0-RELEASE (not MFCed)

If no strong arguments has been raised until October 15th, I will make this
proposal happen.

Again just in case: THIS IS NOT A PROPOSAL TO REMOVE CSH FROM BASE!


I think this is fine.  I would also be fine with either removing 'toor' from the
default password file or just leaving it as-is for POLA.  (I would probably
prefer removing it outright.)

--
John Baldwin



Re: rescue/sh check failed, installation aborted

2021-08-23 Thread John Baldwin

On 8/23/21 12:18 PM, Graham Perrin wrote:

Encountered whilst attempting to build and install 14.0-CURRENT over
13.0-RELEASE-p3 (experimental, helloSystem):

<https://i.imgur.com/euFBA8M.png>

Background, condensed, to the best of my recollection:

cd /usr/src
make buildworld    # succeeded
make kernel    # failed
make clean
LOCAL_MODULES=    # added to /etc/src.conf
make kernel-toolchain
make kernel
restarted in single user mode
mount -uw /
service zfs start
cd /usr/src
make installworld – failed as pictured.

I see <https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=231325>, fixed
in 2018.

Any suggestions?


I'm not sure what the 'make clean' would have done.  Did you mean 'make 
cleanworld'?
If so, you will need to do a 'make buildworld' again before trying to do
'make installworld'.  The error message implies that there is no 'make 
buildworld'
output in /usr/obj (as if you had run 'make cleanworld' up above where you list
'make clean')

--
John Baldwin



Re: etcupdate: Failed to build new tree

2021-07-14 Thread John Baldwin

On 7/2/21 2:30 AM, Nuno Teixeira wrote:

Hello,

Last update I have some issues with etcupdate:

etcupdate warning: "No previous tree to compare against, a sane comparison
is not possible."

That I corrected with:

etcupdate extract
etcupdate diff > /tmp/etc.diff
patch -R < /tmp/etc.diff
(etcupdate diff doesn't show any diffs.)

Today I've just updated current and etcupdate -p gives:

"Failed to build new tree"

What might be wrong?


You can look in /var/db/etcupdate/log to check for errors.

--
John Baldwin



Re: CURRENT: acpi_wakecode.S error: unknown -Werror warning specifier: '-Wno-error-tautological-compare'

2021-06-22 Thread John Baldwin

On 6/22/21 11:13 AM, O. Hartmann wrote:

Hello,

on a recent CURRENT (FreeBSD 14.0-CURRENT #6 main-n247512-e3be51b2bc7c: Tue Jun 
22 15:31:03
CEST 2021 amd64) we build a 13-STABLE based NanoBSD from a dedicated source 
tree for a small
routing appliance. It should be " ...we built ..." because sinde the 
introduction of
LLVM/CLANG 12 on FreeBSD, the build of the source tree fails with the error 
shown below.

Since these errors a re die to some compiler knobs, the question is how to 
avoid them and make
the tree of 13-STABLE build again?
We do not do explicetely cross compiling, so if there is in general an issue with 
this "brute
force method" I would appreciate any recommendation to avoid such malfunctions 
using other
techniques - as long as they are moderate to implement.

Thanks in advance and kind regards,


You can use 'make buildworld WITHOUT_SYSTEM_COMPILER=yes' to force your builds 
to use the
clang 11 included in stable/13 instead of the host clang 12.  You could also 
MFC the fixes
from head to use -Wno-error= instead of -Wno-error-.

--
John Baldwin



Re: etcupdate warning: "No previous tree to compare against, a sane comparison is not possible."

2021-06-22 Thread John Baldwin

On 6/22/21 12:34 AM, Nuno Teixeira wrote:

Hello,

Should I be worry about etcupdate warning "No previous tree to compare
against, a sane comparison is not possible." when I recompile and update
current?

I receive same warning when I do a 'etcupdate -p' after installworld too.


Yes, this means etcupdate is not merging any changes to /etc.  You should
run 'etcupdate extract' before your next upgrade cycle.  You should then
review the output of 'etcupdate diff' to see if there are files in /etc
that need updating.  If there are files that you want to update to stock
versions you can use 'etcupdate revert /path/to/file'.  Otherwise, you can
use the patch generated by 'etcupdate diff' either as a guide to manually
update files to remove unwanted differences, or as input to patch -R.

--
John Baldwin



Re: drm-kmod kernel crash fatal trap 12

2021-06-15 Thread John Baldwin

On 6/15/21 11:22 AM, Bakul Shah wrote:

On Jun 15, 2021, at 9:03 AM, John Baldwin  wrote:


On 6/10/21 8:13 AM, Bakul Shah wrote:

On Jun 10, 2021, at 7:13 AM, Thomas Laus  wrote:

The drm-kmod module is the latest from the pkg server.  It all
worked this past Monday after the recent drm-kmod update.

This is what I did:
git clone https://github.com/freebsd/drm-kmod
ln -s $PWD/drm-kmod /usr/local/sys/modules
Now it gets compiled every time you do make buildkernel.
If things break you can do a git pull in the drm-kmod dir
and rebuild.


This is what I do now as well.  I think this is probably the
sanest approach to use on HEAD at least.


IIRC I learned this from one of your posts.

The PORTS_MODULES approach results in installing kernel modules
/boot/modules, which doesn't track /boot/kernel*/.


Yes, PORTS_MODULES is not so great when you are building test kernels
from branches that are different points in time and then go back to
booting your "stock" kernel as the module is now built against the
wrong ABI and breaks your "stock" kernel.

This is why I added LOCAL_MODULES and the SRC knob to drm-kmod,
but the source knob is a bit bumpy in practice as you sometimes
need newer source than your current package.  (For example, if your
"stock" kernel only changes every few months, but you pull newer
work trees for test kernels.)  For that case, it has proven simpler
to just do the direct checkout that I can git pull when needed.

--
John Baldwin



Re: drm-kmod kernel crash fatal trap 12

2021-06-15 Thread John Baldwin

On 6/10/21 8:13 AM, Bakul Shah wrote:

On Jun 10, 2021, at 7:13 AM, Thomas Laus  wrote:

The drm-kmod module is the latest from the pkg server.  It all
worked this past Monday after the recent drm-kmod update.


This is what I did:

git clone https://github.com/freebsd/drm-kmod
ln -s $PWD/drm-kmod /usr/local/sys/modules

Now it gets compiled every time you do make buildkernel.
If things break you can do a git pull in the drm-kmod dir
and rebuild.


This is what I do now as well.  I think this is probably the
sanest approach to use on HEAD at least.

--
John Baldwin



Re: Files in /etc containing empty VCSId header

2021-06-08 Thread John Baldwin

On 6/7/21 12:58 PM, Ian Lepore wrote:

On Mon, 2021-06-07 at 13:53 -0600, Warner Losh wrote:

On Mon, Jun 7, 2021 at 12:26 PM John Baldwin  wrote:


On 5/20/21 9:37 AM, Michael Gmelin wrote:

Hi,

After a binary update using freebsd-update, all files in /etc
contain
"empty" VCS Id headers, e.g.,

  $ head  /etc/nsswitch.conf
  #
  # nsswitch.conf(5) - name service switch configuration file
  # $FreeBSD$
  #
  group: compat
  group_compat: nis
  hosts: files dns
  netgroup: compat
  networks: files
  passwd: compat

After migrating to git, I would've expected those to contain
something
else or disappear completely. Is this expected and are there any
plans
to remove them completely?


I believe we might eventually remove them in the future, but doing
so
right now would introduce a lot of churn and the conversion to git
had enough other churn going on.



We'd planned on not removing things that might be merged to stable/12
since
those releases (12.3 only I think) will be built out of svn. We'll
likely
start to
remove things more widely as the stable/12 branch reaches EOL and
after.

Warner


It would be really nice if, instead of just deleting the $FreeBSD$
markers, they could be replaced with the path/filename of the file in
the source tree.  Sometimes it's a real interesting exercise to figure
out where a file on your runtime system comes from in the source world.
All the source tree layout changes that happened for packaged-base
makes it even more interesting.


My hope is that we un-break src/etc. :(  A few folks have looked at doing
that (notably Kyle).

--
John Baldwin



Re: Files in /etc containing empty VCSId header

2021-06-07 Thread John Baldwin

On 5/20/21 9:37 AM, Michael Gmelin wrote:

Hi,

After a binary update using freebsd-update, all files in /etc contain
"empty" VCS Id headers, e.g.,

 $ head  /etc/nsswitch.conf
 #
 # nsswitch.conf(5) - name service switch configuration file
 # $FreeBSD$
 #
 group: compat
 group_compat: nis
 hosts: files dns
 netgroup: compat
 networks: files
 passwd: compat

After migrating to git, I would've expected those to contain something
else or disappear completely. Is this expected and are there any plans
to remove them completely?


I believe we might eventually remove them in the future, but doing so
right now would introduce a lot of churn and the conversion to git
had enough other churn going on.

--
John Baldwin



RFT: improvements to if_cdce driver

2021-06-03 Thread John-Mark Gurney
Hello,

I decided to make some improvements to the CDCE driver as at least
the RealTek devices (what I tested them with) when they aren't supported
by ure will present as cdce devices.

https://reviews.freebsd.org/D30625

This adds if_media support and link state support.

The most significant change is this means that if a ue device is
configured for DHCP, devd will now launch dhclient, where previously
it would not, as it would neither receive the link up status (for when
a cable is plugged in) nor would it be the requisit ethernet media
type.

The device I tested with was a RealTek 2.5G device.  So, other
non-RealTek devices would be great to test with.

Let me know if you have any issues with the change!

Thanks.

-- 
  John-Mark Gurney  Voice: +1 415 225 5579

 "All that I will do, has been done, All that I have, has not."



Re: tuning a zfs-mounted /var

2021-05-23 Thread John-Mark Gurney
Michael Gmelin wrote this message on Sat, May 22, 2021 at 21:13 +0200:
> > On 22. May 2021, at 20:32, tech-lists  wrote:
> > 
> > ???Hi,
> > 
> > What options could one pass to zfs to speed it up to characteristics
> > favourable to what's usually in /var ? Like lots of fast writes, lots of
> > files smaller than what's on /usr, lots of file creation and deletion
> > but also quite a few files that might become large, like what's in
> > /var/log, things like that.
> > 
> 
> Make sure your pool (or at least the /var file system) has compression=lz4 
> and that atime is off, beyond that I wouldn???t bother to try to optimize 
> manually there, unless you run a database like MySQL in /var/db/???, in which 
> case setting a fixed record size might make sense.

And if you're running a db in /var, you should just create a new dataset
for the database instead of reuse /var's dataset, that way the fixed
record size does not cause problems for the rest of /var...

-- 
  John-Mark Gurney  Voice: +1 415 225 5579

 "All that I will do, has been done, All that I have, has not."



Re: etcupdate -p: No previous tree to compare against, a sane comparison is not possible. (was: Review D28062 …)

2021-04-26 Thread John Baldwin

On 4/24/21 4:42 AM, Graham Perrin wrote:

On 21/04/2021 18:19, John Baldwin wrote:

On 4/17/21 12:52 PM, Graham Perrin wrote:

2)

<https://reviews.freebsd.org/D28062#change-5KzY5tEtVUor> line 2274

etcupdate -p

I get:

   > No previous tree to compare against, a sane comparison is not
possible.


Hmm, how did you initially install this machine?  Release images should
generally include a pre-populated /var/db/etcupdate so that etcupdate
works.  If you don't have one of those, you will have to perform an
initial bootstrap of etcupdate (only once) by running 'etcupdate
extract'.
If you do this before you update /usr/src then 'etcupdate' will later
work fine.  If you are doing this after you have already updated
/usr/src, you will need to run 'etcupdate diff' after 'etcupdate extract'
and fix any unexpected local differences in the generated patch, e.g.
by copying files from /var/db/etcupdate/current/etc to /etc.  Once
you have done this, 'etcupdate' will work fine on the next upgrade.

However, I'm curious how you didn't get the etcupdate bootstrap when
you initially installed.



Sorry for not replying sooner.

It's not an answer to your question, but might the thread at
<https://lists.freebsd.org/pipermail/freebsd-current/2021-April/079538.html>
be relevant?


Yes, you might indeed have hit this bug (which has since been fixed).
You might have to 'etcupdate extract' and then manually review
'etcupdate diff' to see if you have any unexpected diffs to recover.
Sorry. :-/

--
John Baldwin
___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: Despite the documentation, "etcupdate extract" handles -D destdir (and its contribution to the default workdir)

2021-04-26 Thread John Baldwin

On 4/24/21 12:22 PM, Mark Millard via freebsd-current wrote:

# etcupdate -?
Illegal option -?

usage: etcupdate [-npBF] [-d workdir] [-r | -s source | -t tarball]
  [-A patterns] [-D destdir] [-I patterns] [-L logfile]
  [-M options]
etcupdate build [-B] [-d workdir] [-s source] [-L logfile] [-M options]
  
etcupdate diff [-d workdir] [-D destdir] [-I patterns] [-L logfile]
etcupdate extract [-B] [-d workdir] [-s source | -t tarball] [-L 
logfile]
  [-M options]
etcupdate resolve [-p] [-d workdir] [-D destdir] [-L logfile]
etcupdate status [-d workdir] [-D destdir]

The "etcupdate extract" material does not show -D destdir as valid.


Thanks, it was a documentation oversight I've just fixed.  It is definitely
supposed to work and is quite useful for cross-builds (e.g. I use it frequently
to update rootfs images I use with qemu for RISC-V or MIPS that I run under
qemu, or when updating the SD-card for my RPI that I cross-build on an x86
host).

--
John Baldwin
___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: "etcupdate -p" vs. $OLDTREE?

2021-04-23 Thread John Baldwin

On 4/23/21 6:59 AM, Olivier Cochard-Labbé wrote:

On Fri, Apr 23, 2021 at 1:10 PM David Wolfskill 
wrote:


After the set of updates to etcupdate (main-n246232-0611aec3cf3a ..
main-n246235-ba30215ae0ef), I find that "etcupdate -B -p" is working as
expected, but after the following "make installworld", a subsequent
"etcupdate -B" chugs along for a bit, then stops, whining:

No previous tree to compare against, a sane comparison is not possible.




Same problem here while using /usr/src/tools/build/beinstall.sh:
(...)
Skipping blacklisted certificate
/usr/share/certs/trusted/Verisign_Class_1_Public_Primary_Certification_Authority_-_G3.pem
(/etc/ssl[0/1831]sted/ee1365c0.0)
Skipping blacklisted certificate
/usr/share/certs/trusted/Verisign_Class_2_Public_Primary_Certification_Authority_-_G3.pem
(/etc/ssl/blacklisted/dc45b0bd.0)
Scanning /usr/local/share/certs for certificates...
+ [ -n etcupdate ]
+ update_etcupdate
+ /usr/src/usr.sbin/etcupdate/etcupdate.sh -s /usr/src -D
/tmp/beinstall.MZ4oy8/mnt -F
No previous tree to compare against, a sane comparison is not possible.
+ return 1
+ [ 1 -ne 0 ]
+ errx 'etcupdate (post-world) failed!'
+ cleanup
(...)


Sorry, this should be fixed.  beinstall.sh still has a bug in that it needs 
this change:

diff --git a/tools/build/beinstall.sh b/tools/build/beinstall.sh
index 46c65d87e61a..fab21edc0fd5 100755
--- a/tools/build/beinstall.sh
+++ b/tools/build/beinstall.sh
@@ -133,7 +133,7 @@ update_mergemaster() {
 
 update_etcupdate_pre() {

${ETCUPDATE_CMD} -p -s ${srcdir} -D ${BE_MNTPT} ${ETCUPDATE_FLAGS} || 
return $?
-   ${ETCUPDATE_CMD} resolve -D ${BE_MNTPT} || return $?
+   ${ETCUPDATE_CMD} resolve -p -D ${BE_MNTPT} || return $?
 }
 
 update_etcupdate() {
    



--
John Baldwin
___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


  1   2   3   4   5   6   7   8   9   10   >