On 2023-08-21 18:32, Dragan Simic wrote:
On 2023-08-21 18:19, Andrew Gregory wrote:
On 08/21/23 at 04:42am, Dragan Simic wrote:
Running "pacman -Syyu" to perform a full system update resulted
in a rather conspicuous error message in the output of pacman:

( 3/18) Creating temporary files...
Assertion 'fd' failed at src/tmpfiles/tmpfiles.c:843, function fd_set_perms(). Aborting.
/usr/share/libalpm/scripts/systemd-hook: line 28: 1735 Aborted (core dumped) /usr/bin/systemd-tmpfiles --create
error: command failed to execute correctly

As expected, running "systemd-tmpfiles --create" manually afterwards
went just fine and resulted in no errors, leading to a conclusion
that executing /usr/share/libalpm/hooks/30-systemd-tmpfiles.hook
failed, but only when it was run from within pacman.

After a detailed and rather lengthy investigation, it turned out
the code in lib/libalpm/util.c that executes the hooks by forking
a child has some bugs that allow the error to occur under certain
circumstances.  In particular, function _alpm_run_chroot() that
executes hooks in a fork()ed child does not employ dup2() properly,
but instead executes close() followed by dup2().

The man page for dup2() clearly states in the quotation below that
attempts to re-implement the equivalent functionality, which is the
case in function _alpm_run_chroot(), must be avoided:

    The dup2() system call performs the same task as dup(), but
    instead of using the lowest-numbered unused file descriptor,
    it uses the file descriptor number specified in newfd.  In other
    words, the file descriptor newfd is adjusted so that it now
    refers to the same open file description as oldfd.

    If the file descriptor newfd was previously open, it is closed
    before being reused;  the close is performed silently (i.e., any
    errors during the close are not reported by dup2()).

    The steps of closing and reusing the file descriptor newfd are
    performed atomically.  This is important, because trying to
    implement equivalent functionality using close(2) and dup()
    would be subject to race conditions, whereby newfd might be
    reused between the two steps.  Such reuse could happen because
    the main program is interrupted by a signal handler that
    allocates a file descriptor, or because a parallel thread
    allocates a file descriptor.

As a result, a condition can occur in which the file descriptor 0 is
closed by calling close(0), and left closed after the while loop that
fails to execute dup2() because of receiving EBUSY, resulting in the
described issues.  Also, failed attempts to execute dup2() should be
treated as fatal errors instead of being silently ignored.
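
To illustrate the point above, here is a minimal sketch of the pattern,
not the actual pacman code.  The helper name redirect_fd() is
hypothetical, and retrying on EINTR and on the Linux-specific EBUSY is
an assumption about which errors are worth retrying.  The target
descriptor is never closed beforehand, so a failed dup2() leaves it
exactly as it was, and the failure is reported to the caller.

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper, not taken from libalpm: make newfd refer to the
 * same open file description as oldfd.  newfd is deliberately NOT closed
 * first; dup2() closes and reuses it atomically.
 * Returns 0 on success, -1 on failure with errno set by dup2(). */
static int redirect_fd(int oldfd, int newfd)
{
    assert(oldfd >= 0 && newfd >= 0);

    for (;;) {
        if (dup2(oldfd, newfd) != -1) {
            return 0;
        }
        /* Assumption: only EINTR and EBUSY (Linux only) are transient;
         * anything else is reported to the caller. */
        if (errno != EINTR && errno != EBUSY) {
            return -1;
        }
    }
}
```

In a fork()ed child, the caller would treat a -1 return as fatal, for
example by calling _exit(1), instead of carrying on with a descriptor in
an unknown state.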

I am trying to reproduce this to wrap my head around what's actually
happening and verify that this actually fixes it, but I am unable to
get pacman to execute a hook with fd 0 closed with or without the ARM
patches.  Can you provide a reproducible case?

Ah, I know very well how it feels; I spent at least a couple of days
hunting the issue down and fixing it.

One of the issues with reproducing the error is that you must have an
old systemd version installed, which still contains assert(fd) checks
in src/tmpfiles/tmpfiles.c that actually trigger the error.  IIRC,
systemd removed those checks a couple of years ago, presumably
to fix similar complaints.
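
For context on why an assert(fd) check would fire here: presumably,
once fd 0 has been left closed, the next open() in the child returns 0,
because the kernel hands out the lowest-numbered unused descriptor, and
a truthiness check such as assert(fd) treats that perfectly valid
descriptor as a failure.  The sketch below only demonstrates the
descriptor reuse; the helper name reopen_after_close() is hypothetical
and the code is not taken from systemd or pacman.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical demo helper: close the given descriptor, then open
 * /dev/null and return whatever descriptor number open() hands back. */
static int reopen_after_close(int fd_to_close)
{
    assert(fd_to_close >= 0);

    /* The result of close() is ignored on purpose; the descriptor may
     * already be closed, which is the situation being demonstrated. */
    close(fd_to_close);

    /* open() returns the lowest-numbered descriptor not currently open. */
    return open("/dev/null", O_RDONLY);
}
```

Calling reopen_after_close(0) yields descriptor 0, which is valid but
evaluates as false, so a check written as fd >= 0 would pass where
assert(fd) aborts.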

You may also want to see my original report of the issue, which also
contains the fix:
https://gitlab.manjaro.org/manjaro-arm/packages/core/pacman/-/issues/1

Just checking, have you managed to reproduce the issue?  If not, please
let me know, and I'll put together some kind of instructions for
creating an appropriate environment and a test case that reproduces
the error.
