I'm trying to build Postgres using the Nix language and the Nix package
manager on macOS (see [1]). After some work I was able to build, and even
run Postgres. But `make check` failed with the error

pg_regress: could not exec "sh": No such file or directory

The reason is that pg_regress uses execl() function to execute the shell
recorded in its shellprog variable, whose value is provided using the SHELL
variable via the makefile. Specifically, this error is because execl()
function expects the full path of the executable as the first parameter,
and it does _not_ perform a lookup in $PATH.

Using execl() in pg_regress has worked in the past, because the default
value of SHELL used by GNUmake (and I presume other make implementations)
is /bin/sh, and /bin/sh expected to be present on any Unix-like system.

But in Nixpkgs (the default and the central package repository of Nix/NixOS
distro), they have chosen to patch GNU make, and turn the default value of
SHELL from '/bin/sh' to just 'sh'; see their patch [2]. They did this
because Nixpkgs consider any files outside the Nix Store (the path
/nix/store/, by default) to be "impure". They want the packagers to use
$PATH (consisting solely of paths that begin with /nix/store/...), to
lookup their binaries and other files.

So when pg_regress tries to run a program (the postmaster, in this case),
the execl() function complains that it could not find 'sh',  since there's
no file ./sh in the directory where pg_regress is being run.

Please see attached the one-letter patch that fixes this problem. I have
chosen to replace the execl() call with execlp(), which performs a lookup
in $PATH, and finds the 'sh' to use for running the postmaster. This patch
does _not_ cause 'make check' or any other failures when Postgres is built
with non-Nix build tools available on macOS.

There is one other use of execl(), in pg_ctl.c, but that is safe from the
behaviour introduced by Nixpkgs, since that call site uses the absolute
path /bin/sh, and hence there's no ambiguity in where to look for the
executable.

There are no previous uses of execlp() in Postgres, which made me rethink
this patch. But I think it's safe to use execlp() since it's part of POSIX,
and it's also supported by Windows (see [3]; they say the function name is
"deprecated" but function is "supported" in the same paragraph!!).

There's one mention of execl in src/pl/plperl/ppport.h, and since it's a
generated file, I believe now execlp also needs to be added to that list.
But I'm not sure how to generate that file, or if it really needs to be
generated and included in this patch; maybe the file is re-generated during
a release process. Need advice on that.

GNU make's docs clearly explain (see [4]) the special handling of variable
SHELL, and it seems impossible to pass this variable from an env variable
down to the GNUmakefile of interest. The only alternative I see for us
being able to pass a custom value via SHELL, is to detect and declare the
SHELL variable in one of our higher-level files; and I don't think that'd
be a good idea.

We could propose to Nixpkgs community that they stop patching make, and
leave the default SHELL value alone. But I see very low likelihood of them
accepting our arguments, or changing their ways.

It took many days of debugging, troubleshooting etc, to get to this
root-cause. I first tried to coax autoconf, make, etc. to pass my custom
SHELL through to pg_regress' makefile. Changing CONFIG_SHELL, or SHELL does
not have any impact. Then I had to read the Nixpkgs code, and work out the
archaic ways the packages are defined, and after much code wrangling I was
able to find out that _they_ changed the default value of SHELL by patching
the make sources.

The Nix language is not so bad, but the way it's used to write code in the
Nix community leaves a lot to be desired; ad-hoc environment variable
naming, polluting the built environment with all kinds of variables, almost
non-existent comments, no standards on indentation, etc. These reasons made
me decide to use the plain Nix language as much as possible, and not rely
on Nixpkgs, whenever I can avoid it.

The Nixpkgs and NixOS distro includes all the supported versions of
Postgres, so one would assume they would've also encountered, and solved,
this problem. But they didn't. My best guess as to why, is, I believe they
never bothered to run `make check` on their built binaries.

[1]: https://github.com/DrPostgres/HaathiFarm/blob/master/default.nix
[2]:
https://github.com/NixOS/nixpkgs/blob/release-22.05/pkgs/development/tools/build-managers/gnumake/0001-No-impure-bin-sh.patch
[3]:
https://docs.microsoft.com/en-us/cpp/c-runtime-library/reference/execlp?view=msvc-170
[4]:
https://www.gnu.org/software/make/manual/html_node/Choosing-the-Shell.html

Best regards,
Gurjeet
http://Gurje.et

Attachment: pg_regress_use_execlp_to_lookup_shellprog.patch
Description: Binary data

Reply via email to