Update:
I encountered a post-syspatch relink failure on the box where I'm testing this
fix with a modified relink Makefile:
LINKFLAGS+= --no-mmap-output-file
Relinking to create unique kernel... failed!
!!! "/usr/libexec/reorder_kernel" must be run manually to install the new kernel
relink.log showed a segfault in ld during the link. Could this be caused by an
out-of-memory situation? /usr was only 25% full, with more than 6G free.
(SHA256) /bsd: OK
LD="ld" sh makegap.sh 0xcccccccc gapdummy.o
ld -T ld.script -X --warn-common -nopie --no-mmap-output-file -o newbsd ${SYSTEM_HEAD} vers.o ${OBJS}
Segmentation fault (core dumped)
*** Error 139 in /usr/share/relink/kernel/GENERIC (Makefile:2432 'newbsd': @echo ld -T ld.script -X --warn-common -nopie --no-mmap-output-fi...)
So this may be an unreliable solution. Interestingly, I re-ran reorder_kernel
manually (making no changes) and it completed without error.
Lloyd wrote:
> Did a fix ever get merged for this?
>
> I tested enough to validate that passing --no-mmap-output-file does indeed
> fix the pesky broken-kernel-upon-ENOSPC problem; I'm not sure of any
> ancillary impacts.
>
> On Wednesday, May 21st, 2025, Jonathan Matthew wrote:
>
> > On Thu, May 15, 2025 at 12:05:25PM +0200, Mark Kettenis wrote:
> >
> > > > Date: Thu, 15 May 2025 11:22:17 +0200
> > > > From: Claudio Jeker
> > > >
> > > > On Thu, May 15, 2025 at 06:28:42PM +1000, Jonathan Matthew wrote:
> > > >
> > > > > On Tue, May 13, 2025 at 07:55:03AM +0200, Otto Moerbeek wrote:
> > > > >
> > > > > > On Mon, May 12, 2025 at 08:07:11PM +0200, Sebastien Marie wrote:
> > > > > >
> > > > > > > > I suppose the same could occur with lld (untested for now).
> > > > > > >
> > > > > > > I confirm it is the same problem with lld.
> > > > > > >
> > > > > > > $ cd /usr/share/relink/kernel/GENERIC.MP
> > > > > > > $ ld -T ld.script -X --warn-common -nopie -o /tmp/newbsd *.o && echo ok
> > > > > > >
> > > > > > > /tmp: write failed, file system is full
> > > > > > > ok
> > > > > > > $ ls -l /tmp/newbsd
> > > > > > > -rwxr-xr-x 1 semarie wheel 236434608 May 12 20:05 /tmp/newbsd*
> > > > > > >
> > > > > > > And my dmesg has the following:
> > > > > > > uvn_flush: obj=0xfffffd86e3311608, offset=0x16b0000. error during pageout.
> > > > > > > uvn_flush: WARNING: changes to page may be lost!
> > > > > >
> > > > > > So this code is using mmapped files for writing, which makes proper
> > > > > > error handling extremely difficult or even impossible. Best bet is
> > > > > > making sure enough space is available before starting.
> > > > >
> > > > > lld has a --no-mmap-output-file option that causes it to use plain
> > > > > write(2) calls to generate the output file. Perhaps it'd be worth
> > > > > using that for kernel linking and other stuff we relink at boot time?
> > > >
> > > > Maybe that should be the default. Having lld produce bad binaries but
> > > > exit 0 is just very wrong. Not sure if this is a problem that only
> > > > manifests on OpenBSD since there is no unified buffer cache or if other
> > > > systems would hit the same issue as well. As Otto mentioned, detecting
> > > > I/O errors when using mmap to write files is not trivial.
> > >
> > > I think the same problem exists on other OSes, even those with a
> > > unified buffer cache. I suppose lld(1) could use msync(2) (with the
> > > MS_SYNC flag) to make sure everything lands on disk correctly. But
> > > that obviously would remove some of the benefits of using mmap(),
> > > namely the async completion of the writes.
> > >
> > > It would be interesting to see what the impact on build times of
> > > changing the defaults would be. But I'm somewhat hesitant to change
> > > the default since the --no-mmap-output-file code path isn't tested
> > > much on other OSes.
> >
> > The code path difference is pretty small; it's entirely within
> > src/gnu/llvm/llvm/lib/Support/FileOutputBuffer.cpp, which is under 200
> > lines. Everything outside that just sees the address of a memory buffer
> > to write to. There are some cases where the mmap code falls back to the
> > non-mmap path, but I agree it's probably not well tested.
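Otto's suggestion above, making sure enough space is available before starting, could be approximated with statvfs(3). This C sketch is illustrative only: has_free_space() is a hypothetical helper, not part of reorder_kernel or the relink Makefile, and how many bytes to require is left to the caller. f_bavail excludes the root-only reserve, so the check errs on the conservative side for a process running as root. It is also only a snapshot: another writer can still fill the filesystem before the link finishes.

```c
#include <sys/statvfs.h>

/*
 * Hypothetical helper (not from reorder_kernel): report whether the
 * filesystem holding `dir` has at least `need` bytes available to
 * unprivileged writers.
 * Returns 1 if there is room, 0 if not, -1 (errno set) on error.
 * The result is a snapshot; other writers can use the space afterwards.
 */
int
has_free_space(const char *dir, unsigned long long need)
{
	struct statvfs sv;
	unsigned long long avail;

	if (statvfs(dir, &sv) == -1)
		return -1;
	/* f_bavail counts blocks of f_frsize bytes usable by non-root. */
	avail = (unsigned long long)sv.f_bavail * sv.f_frsize;
	return avail >= need ? 1 : 0;
}
```

For example, a relink script could require has_free_space("/usr/share/relink/kernel", need) == 1 before invoking ld, with need chosen generously (the kernel in Sebastien's example was 236434608 bytes).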
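Jonathan's description above (callers only see a memory buffer, and --no-mmap-output-file makes the final output go through plain write(2) calls) suggests why that path can report a full filesystem. This is a hypothetical C illustration, not lld's FileOutputBuffer.cpp: struct outbuf, outbuf_init(), write_all(), outbuf_commit() and outbuf_free() are made-up names. Because each write(2), fsync(2) and close(2) returns an error such as ENOSPC to the caller, a tool built this way can exit non-zero instead of leaving a truncated file behind.

```c
#include <sys/types.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Hypothetical in-memory output buffer: callers fill `base` through a
 * plain pointer, and outbuf_commit() writes it out with write(2), so a
 * full filesystem shows up as an error return rather than lost pages.
 */
struct outbuf {
	unsigned char	*base;
	size_t		 size;
};

int
outbuf_init(struct outbuf *ob, size_t size)
{
	ob->base = calloc(1, size ? size : 1);
	ob->size = size;
	return ob->base == NULL ? -1 : 0;
}

/* Write all of buf, retrying short writes and EINTR. */
int
write_all(int fd, const unsigned char *buf, size_t len)
{
	while (len > 0) {
		ssize_t n = write(fd, buf, len);

		if (n == -1) {
			if (errno == EINTR)
				continue;
			return -1;	/* e.g. ENOSPC or EIO */
		}
		buf += n;
		len -= (size_t)n;
	}
	return 0;
}

/*
 * Write the buffer to `path`, checking every step. On failure the
 * partial file is removed. Returns 0, or -1 with errno set.
 */
int
outbuf_commit(struct outbuf *ob, const char *path)
{
	int fd, saved;

	fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0755);
	if (fd == -1)
		return -1;
	if (write_all(fd, ob->base, ob->size) == -1 || fsync(fd) == -1) {
		saved = errno;
		close(fd);
		unlink(path);
		errno = saved;
		return -1;
	}
	if (close(fd) == -1) {
		saved = errno;
		unlink(path);
		errno = saved;
		return -1;
	}
	return 0;
}

void
outbuf_free(struct outbuf *ob)
{
	free(ob->base);
	ob->base = NULL;
}
```

The fsync(2) call is a design choice for this sketch, not something the thread says lld does; it trades some speed for surfacing deferred I/O errors before the file is considered complete.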
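Mark's msync(2) idea can be sketched the same way. mmap_write_sync() is a hypothetical helper: it copies data into a shared mapping and calls msync(2) with MS_SYNC before unmapping, so a failed write-back has a return value to land in. This assumes the kernel reports the failed page-out through msync's return value; the thread does not confirm how OpenBSD's uvn_flush() error path surfaces to user space.

```c
#include <sys/types.h>
#include <sys/mman.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: write `len` bytes to `path` through a shared
 * mapping, then msync(2) with MS_SYNC before unmapping.
 * Assumes the kernel returns write-back errors from msync(2).
 * Returns 0 on success, -1 with errno set on failure.
 */
int
mmap_write_sync(const char *path, const void *data, size_t len)
{
	void *p;
	int fd, saved;

	fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0644);
	if (fd == -1)
		return -1;
	if (len == 0)		/* mmap(2) rejects zero-length mappings */
		return close(fd);
	/* Extending the file typically sets its size without allocating blocks. */
	if (ftruncate(fd, (off_t)len) == -1)
		goto fail;
	p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (p == MAP_FAILED)
		goto fail;
	/* Plain stores into the mapping have no way to return ENOSPC. */
	memcpy(p, data, len);
	/* MS_SYNC waits for write-back, giving an error a place to be reported. */
	if (msync(p, len, MS_SYNC) == -1) {
		saved = errno;
		munmap(p, len);
		errno = saved;
		goto fail;
	}
	if (munmap(p, len) == -1)
		goto fail;
	return close(fd);
fail:
	saved = errno;
	close(fd);
	unlink(path);
	errno = saved;
	return -1;
}
```

As Mark notes, MS_SYNC blocks until the pages are written, which gives up the asynchronous write-back that makes mmap output attractive in the first place.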