Hello!
I have devised a patch.
Notable findings/details/concerns regarding the implementation are
provided below.
* Shebangs and length limits
Substituting /two/ commands (i.e. "env" and the interpreter it calls) in
addition to appending other arguments has a high likelihood to exceed
127 characters. This was a previous default for Linux (BINPRM_BUF_SIZE
constant), but appears to have been updated to 255 characters in version
5.1, according to execve(2) [1]:
--8<---------------cut here---------------start------------->8---
The kernel imposes a maximum length on the text that follows the
"#!" characters at the start of a script; characters beyond the
limit are ignored. Before Linux 5.1, the limit is 127 characters.
Since Linux 5.1, the limit is 255 characters.
--8<---------------cut here---------------end--------------->8---
This new limit should permit patching "env -S CMD ..." shebangs,
although I've noted the old limit in case backwards compatibility is an
issue.
Tangentially related: Fuzzy-searching "shebang length" or "shebang
limit" in the repository indicates some workarounds/hacks/references to
the old limit. Perhaps this is worth opening a separate issue to
address?
* Handling binary-not-found cases with a catch-throw
Since these changes add new cases where binaries may not be found in
$PATH, I opted to use an exception handler to catch all the cases.
Testing seemed to indicate that the bootstrap process didn't support the
#:unwind? keyword argument for with-exception-handler (possibly due to
executing on an older Guile version?), so I used a catch-throw instead.
* Testing
I've included a patch to add tests for patch-shebang.
The last two test cases ("patch-shebang: fail with \"env -S CMD\" form
when {CMD,\"env\"} not found") have alternative expectations that I'd
like to hear thoughts about. The current expectation is that a shebang
of this form is only patched when /both/ "env" and CMD is found, but we
could also patch binaries as long as either is found. I decided not to
do this since I wasn't sure how to encode the partial success in the
return value (e.g. do we consider "partially successful" as #t or #f?
Or return a tertiary value?), but maybe there are edge cases that could
be fixed by doing this?
My local build of gash-and-dependencies (on glibc-headers-mesboot as of
writing) also appears to be chugging along okay so far, FWIW.
* Footnotes
[1] <https://www.man7.org/linux/man-pages/man2/execve.2.html#VERSIONS>
Cheers,
aurtzy
aurtzy (2):
utils: Handle "env -S CMD" forms in `patch-shebang'.
tests: Add tests for `patch-shebang'.
guix/build/utils.scm | 84 +++++++++++++++++++++++++++-------------
tests/build-utils.scm | 90 +++++++++++++++++++++++++++++++++++++++++++
2 files changed, 147 insertions(+), 27 deletions(-)
base-commit: c31662f7294b194663bc521358b01c3a7d7e4e27
--
2.49.0