Re: 1.16.90 regression: configure now takes 7 seconds to start
Bruno Haible wrote: Jacob Bachmeyer wrote: under what conditions can "checking that generated files are newer than configure" actually fail? I mentioned two such conditions in [1]: - Skewed clocks. (I see this regularly on VMs that have 1 or 2 hours of skew.) - If the configure file was created less than 1 second ago and the file system time resolution is 1 second. (This happens frequently in the Automake test suite.) In the first of those scenarios, AM_SANITY_CHECK should bail out. In the second case, AM_SANITY_CHECK should delay for 1 second, and then find the test file newer than configure. One (or both?) of us is misunderstanding something here. First, configure performs AM_SANITY_CHECK ("checking that build environment is sane") and bails out if that test fails. For that test to pass, a generated file (conftest.file in the old version) must test to be newer than configure. If that test fails, configure aborts and "checking that generated files are newer than configure" is never reached. Given that "checking that generated files are newer than configure" is reached, which implies that a file produced before any actual tests were run was found to be newer than configure, how can config.status, which is produced /after/ tests are run, now fail to be newer than configure? -- Jacob
Re: 1.16.90 regression: configure now takes 7 seconds to start
Zack Weinberg wrote: On Tue, Jun 18, 2024, at 12:02 AM, Jacob Bachmeyer wrote: [...] Wait... all of configure's non-system dependencies are in the release tarball and presumably (if "make dist" worked correctly) backdated older than configure when the tarball is unpacked. In my experience, tarballs cannot be trusted to get this right, *and* tar implementations cannot be trusted to unpack them accurately (e.g. despite POSIX I have run into implementations that defaulted to the equivalent of GNU tar's --touch mode). Subsequent bounces through downstream repackaging do not help. Literally as I type this I am watching gettext 0.22 run its ridiculous number of configure scripts a second time from inside `make`. First, "make dist" should get the tarball right. Second, absent some special flag (--enable-maintainer-mode?), a package using Automake should have no problem if all distributed files have the same timestamp. I see a possibility of a lazy tar(1) implementation not restoring timestamps at all, with the result that the unpacked files get mtimes in the order they were unpacked from the archive. Perhaps "make dist" should sort the files into the proper order while packing the tarball? Automake should have the dependency graph available while generating the "make dist" commands... Does "make dist" need to touch configure to ensure that it is newer than its dependencies before rolling the tarball? It ought to, but I don't think that will be more than a marginal improvement, and touching the top-level configure won't be enough, you'd need to do a full topological sort on the dependency graph leading into every configure + every Makefile.in + every other generated-but-shipped file and make sure that each tier of generated files is newer than its inputs. I wonder if a more effective approach would be to disable the rules to regenerate configure, Makefile.in, etc. unless either --enable-maintainer-mode or we detect that we are building out of a VCS checkout. I thought that that /was/ the effect of --enable-maintainer-mode? I would also suggest not handling VCS checkouts specially. If you want the Makefile rules for generating GNU build system scripts, you should have to say --enable-maintainer-mode. Otherwise, you can always use the tools directly or put an autogen.sh or bootstrap.sh or similar in the VCS. -- Jacob
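[The touch ordering being discussed might look roughly like the following shell, run over the assembled distdir (e.g. from a dist-hook); the file list is illustrative only and, as Zack notes above, a real fix would need the full dependency order rather than this handful of files.]
8<--
# Backdate inputs before outputs so the unpacked tree never looks stale.
# Run from inside the assembled distdir before the tarball is rolled.
touch aclocal.m4
find . -name Makefile.in -exec touch '{}' +
touch configure    # configure last, so it ends up newest
8<--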
Re: 1.16.90 regression: configure now takes 7 seconds to start
Zack Weinberg wrote: On Mon, Jun 17, 2024, at 10:30 PM, Jacob Bachmeyer wrote: ... Don't have enough brain right now to comment on any of the rest of your suggestions, but: once conftest.file is newer than configure, surely config.status, which is produced after all tests are run, /must/ also be newer than configure? How is this last check/delay actually necessary? Are there broken systems out there that play games with causality? I regret to say, yes, there are. For example, this can happen with NFS if there are multiple clients updating the same files and they don't all agree on the current time. Think build farm with several different configurations being built out of the same srcdir - separate build dirs, of course, but that doesn't actually help here since the issue is ensuring the Makefile doesn't think *configure* (not config.status) needs rebuilt. Wait... all of configure's non-system dependencies are in the release tarball and presumably (if "make dist" worked correctly) backdated older than configure when the tarball is unpacked. Does "make dist" need to touch configure to ensure that it is newer than its dependencies before rolling the tarball? How can configure [appear to] need to be rebuilt here? No build should touch it or its dependencies. Or, to put this another way, under what conditions can "checking that generated files are newer than configure" actually fail? If we do not know of any, then perhaps we should add a hidden "--enable--wait-for-newer-config.status" (double-hyphen intentional) option, and unless that option is given, bail out with a message asking (1) to report the system and environment configuration to the Automake list and (2) rerun configure with that option to sleep until config.status is newer instead of bailing out. -- Jacob
Re: 1.16.90 regression: configure now takes 7 seconds to start
Nick Bowler wrote: On 2024-06-16 21:35, Jacob Bachmeyer wrote: I think we might best be able to avoid this by using AC_CONFIG_COMMANDS_POST to touch config.status if necessary, instead of trying to decide whether to sleep before writing config.status. If the problem is simply that we want to avoid the situation where "make" considers config.status to be out of date wrt. configure, or something similar with any other pair of files, then this should be solvable fairly easily with a pattern like this (but see below):
AC_CONFIG_COMMANDS_POST([cat >conftest.mk <<'EOF'
configure: config.status
	false
EOF
while ${MAKE-make} -f conftest.mk >/dev/null 2>&1
do touch config.status
done])
In my own experience the above pattern is portable. It works with HP-UX make. It works with a "touch" that truncates timestamps. In the common case where configure is sufficiently old the loop condition will always be false and there is no delay. It won't guarantee that config.status has a strictly newer timestamp than configure (except on HP-UX), but it sounds like that's fine. We can guarantee that by reusing the pattern in AM_SANITY_CHECK, which uses `ls -t`, with the advantage that we have already used that pattern, so it cannot add "new" possible portability problems. I would also suggest a `sleep 1` in the loop instead of spinning on the test, since we expect the common case to not loop at all. Also, if we use `echo >> config.status` as Bruno Haible suggested in another reply, every cycle will add one newline to the end of config.status, so spinning at the test could make config.status very large. If we want to allow "checking that generated files are newer than configure" to fail, I would suggest bounding this at 5 seconds and bailing out after 5 `sleep 1` if config.status is not newer by then, but see below. One missing element is that there is no limit, which would be a bit of a problem if the clock skew is severe (e.g., if configure's mtime is years or even minutes in the future), so something extra is probably desirable to bound the amount of time this runs to something practical. This will not be a problem: AM_SANITY_CHECK bails out (or will bail out) if a recently-created file cannot be made newer than configure by sleeping briefly. If configure's mtime is in the future, config.status will never be written and this code will never be reached. The delay here is thus bounded by the filesystem timestamp resolution, since we may have to wait until config.status is newer than configure---but no longer---and that only if configure was regenerated just before being run. In the case of a tree of configure scripts that started this current mess, time will march on as the first run waits for config.status to be newer, and the later configure runs will each find that their config.status is newer when it is first written. In fact, now that I think about it, I am not sure how this could ever be a problem: time marches on as AM_SANITY_CHECK is doing its thing before any tests are run, so once conftest.file is newer than configure, surely config.status, which is produced after all tests are run, /must/ also be newer than configure? How is this last check/delay actually necessary? Are there broken systems out there that play games with causality? -- Jacob
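[For comparison, a bounded variant along the lines suggested above, reusing the `ls -t` idiom, could look something like this untested sketch; the five-iteration cap is the bound proposed here and the error text is only illustrative.]
8<--
AC_CONFIG_COMMANDS_POST([am_tries=5
while am_newest=`ls -t "$srcdir/configure" config.status 2>/dev/null | sed 1q`
      test "$am_newest" != config.status
do
  if test $am_tries -le 0; then
    AC_MSG_ERROR([config.status cannot be made newer than configure])
  fi
  sleep 1
  touch config.status
  am_tries=`expr $am_tries - 1`
done])
8<--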
Re: use of make in AM_SANITY_CHECK
Karl Berry wrote: make(1) in AM_SANITY_CHECK seems to be a logic error, since the user may want to build with a different $MAKE, You're right. Crap. It never ends. In practice it probably doesn't matter, though. Although in theory one can imagine that "make" succeeds while $MAKE fails, resulting in a false positive, in practice that seems next to zero probability to me. Much more likely is that "make" fails and $MAKE succeeds, and the only downside of that is an extra second of sleeping. The problem is that we still sleep unnecessarily in the sanity check. While there is no way to avoid sleeping if we need to /measure/ the filesystem timestamp resolution, few packages actually need that information (Automake itself is one of them, for its testsuite) and the sanity check can be (and previously was) done without actually measuring it. have a way to revise AM_SANITY_CHECK that can avoid any sleep in the most common cases. Bruno's last patch already does that, doesn't it? I'll apply it shortly. No, that patch does not: it promotes _AM_FILESYSTEM_TIMESTAMP_RESOLUTION to AM_FILESYSTEM_TIMESTAMP_RESOLUTION (removing the underscore), but still calls it as part of AM_SANITY_CHECK. I propose first mostly reverting to the code in commit f6b3f7fb620580356865ebedfbaf76af3e534369: revising AM_SANITY_CHECK to create a test file and immediately check if that file is newer than configure itself, then (if needed) sleep for one second, overwrite the test file and test again, then (if needed) sleep for one more second and repeat to allow FAT filesystems to be considered "sane". Then, replace the effect of commit 333c18a898e9042938be0e5709ec46ff0ead0797 and fix the problem with config.status not being newer than configure by adding an AC_CONFIG_COMMANDS_POST block that checks if config.status is newer than configure, and if not, sleeps one second and executes "touch config.status", then repeats that test once (again to accommodate FAT filesystem limitations) if needed. In Mike Frysinger's situation of a Gentoo build with many small configure scripts, this /should/ result in at most one configure sleeping once, after which all of the other freshly regenerated configure scripts will already be old enough to avoid delays. -- Jacob
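[In outline, the proposed sanity check would be roughly the following plain shell (not the actual macro text); the two retries correspond to the 1-second POSIX and 2-second FAT cases described above.]
8<--
am_ok=false
for am_try in 1 2 3; do
  test $am_try -gt 1 && sleep 1      # no sleep at all on the first attempt
  echo timestamp > conftest.file
  am_newest=`ls -Lt "$srcdir/configure" conftest.file 2>/dev/null | sed 1q`
  if test "$am_newest" = conftest.file; then
    am_ok=:
    break
  fi
done
rm -f conftest.file
$am_ok || { echo "conftest.file is not newer than configure; check your system clock" >&2
            exit 1; }
8<--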
Re: 1.16.90 regression: configure now takes 7 seconds to start
Karl Berry wrote: Find here attached a revised proposed patch. Ok on the reorg, but sorry, I remain confused. This whole thing started with Mike Vapier's change in Feb 2022 (commit 720a11531): https://lists.gnu.org/archive/html/automake-commit/2022-02/msg9.html As I read it now, his goal was to speed up other projects, not Automake, by reducing the "sleep 1" to "sleep " in AM_SANITY_CHECK, via AC_CONFIG_COMMANDS_PRE, i.e., before creating config.status. But that is only one instance of generating files. I must be missing something obvious. There are zillions of generated files in the world. For instance, why aren't there problems when a small C file is created and compiled? That could easily take less than 1 second, if that is the mtime resolution. I understand that equal timestamps are considered up to date, and presumably the .c and .o (say) would be equal in such a case. Ok, but then why is configure generating config.status/etc. such a special case that it requires the sleep, and nothing else? I mean, I know the sleep is needed; I've experienced the problems without that sleep myself. But I don't understand why it's the only place (in normal compilations; forget the Automake test suite specifically) that needs it. The sleep appears to have been introduced in commit 333c18a898e9042938be0e5709ec46ff0ead0797, which also added an item in NEWS:
8<--
* Miscellaneous changes:

  - Automake's early configure-time sanity check now tries to avoid
    sleeping for a second, which slowed down cached configure runs
    noticeably.  In that case, it will check back at the end of the
    configure script to ensure that at least one second has passed, to
    avoid time stamp issues with makefile rules rerunning autotools
    programs.
8<--
Mike Frysinger then complained that the above change, which enacted a policy of ensuring that any configure run requires at least one second, significantly delayed building packages that use many small configure scripts; his example in commit 720a1153134b833de9298927a432b4ea266216fb showed an elimination of nearly two minutes of useless delays. He appears to have also been trying to improve the performance of such a package in commit be55eaaa0bae0d6def92d5720b0e81f1d21a9db2, which may have actually made the problem worse by changing the test that determines whether to sleep at all. I think we might best be able to avoid this by using AC_CONFIG_COMMANDS_POST to touch config.status if necessary, instead of trying to decide whether to sleep before writing config.status. Can someone please educate me as to what is really going on underneath all this endless agonizing tweaking of the mtime tests? I think that the main problem is that the test itself is difficult to do portably. -- Jacob
Re: use of make in AM_SANITY_CHECK (was: improved timestamp resolution test)
Karl Berry wrote: Jacob, [*sigh*] You said it. About this whole thing. I rather wish this bright idea had never come to pass. It has delayed the release by months. Oh well. Still, could we use make(1) for *all* of the testing and not use `ls -t` I guess it is technically possible, but I somehow feel doubtful about relying entirely on make. Using ls already has plenty of portability issues; I shudder to think how many strange problems we'll run into when we start exercising timing edge cases in make. Well, after having had some time to think about this, I have noticed a logic error in the current code. When _AM_FILESYSTEM_TIMESTAMP_RESOLUTION was introduced in commit 720a1153134b833de9298927a432b4ea266216fb, it did not use make. Commit 23e69f6e6d29b0f9aa5aa3aab2464b3cf38a59bf introduced the use of make in that test to work around a limitation on MacOS, but using make(1) in AM_SANITY_CHECK seems to be a logic error, since the user may want to build with a different $MAKE, which may have different characteristics from the system make. I think that we actually need a new AM_PROG_MAKE_FILESYSTEM_TICK_DELAY or similar that packages needing that information can use, and I think I have a way to revise AM_SANITY_CHECK that can avoid any sleep in the most common cases. There is no way to avoid sleeping when we need to measure the exact delay needed for files to be distinguishably newer, but most packages probably do not care about that, and (in the most common case) we can expect configure's mtime to be backdated according to the tarball it was unpacked from. If configure was recently regenerated, we need only sleep 1 (classic POSIX) or 2 (FAT) seconds before either passing the test or declaring the build environment insane. However, a package with a large number of configure scripts will only need for one of them to sleep; the rest will all then be old enough to take the zero-delay path. Are you willing to consider patches on this? -- Jacob
Re: End of life dates vs regression test matrix
Dan Kegel wrote: Does automake have a policy on when to stop supporting a CPU, operating system, or compiler? I am pondering the size of the matrix of supported operating systems, cpus, and compilers, and wonder where a policy like "Automake drops support 20 years after the release of a CPU, operating system, or compiler version" would fall on the heresy/utility plane. The way I understand that the GNU build system is supposed to work is that there are no "supported" CPUs, operating systems, etc. The GNU build system adapts packages to features found on the current machine by testing for those features just before building the package, using an often very lengthy shell script named "configure" that is itself generated by the relevant maintainer tools. This system has worked surprisingly well---releases made years ago can often be adapted to processor architectures that literally did not exist when the source tarball was built by simply replacing config.{guess,sub} with current versions that recognize the newer architecture. As far as those scripts embodying lists of known architectures go, entries appear to /never/ expire, and config.guess still today can identify (or so we think) systems that predate POSIX. -- Jacob
Re: improved timestamp resolution test (was: 1.16.90 regression: configure now takes 7 seconds to start)
Karl Berry wrote: Does BSD ls(1) support "--time=ctime --time-style=full-iso"? BSD ls does not support any --longopts. Looking at the man page, I don't see "millisecond" or "subsecond" etc. mentioned, though I could easily be missing it. E.g., https://man.freebsd.org/cgi/man.cgi?ls Even if there is such an option, I am skeptical of how portable it would be, or trying to discern whether it is really working or not. All the evidence so far is that it is very difficult to determine whether subsecond mtimes are sufficiently supported or not. Speaking in general, I don't think trying to get into system-specific behaviors, of whatever kind, is going to help. [*sigh*] It seems that there is no good way for configure to read timestamps, so we are limited to testing if file ages are distinguishable. Still, could we use make(1) for *all* of the testing and not use `ls -t` at all? A rough outline would be something like: (lightly tested; runs in about 2.2s here)
8<--
# The case below depends on the 1/10 + 9/10 = 10/10 pattern.
am_try_resolutions="0.01 0.09 0.1 0.9 1"
echo '#' > conftest.mk
i=0
for am_try_res in $am_try_resolutions; do
  echo ts${i} > conftest.ts${i}
  sleep $am_try_res
  echo "conftest.ts${i}: conftest.ts"`expr 1 + $i` >> conftest.mk
  # recipe lines in conftest.mk must begin with a literal tab
  printf '\techo %s\n' $am_try_res >> conftest.mk
  i=`expr 1 + $i`
done
echo end > conftest.ts${i}
# This guess can be one step too fast, if the shorter delay just
# happened to span a clock tick boundary.
am_resolution_guess=`make -f conftest.mk conftest.ts0 | tail -1`
case $am_resolution_guess in
  *9)
    i=no
    for am_try_res in $am_try_resolutions; do
      if test x$i = xyes; then
        am_resolution=$am_try_res
        break
      fi
      test x$am_try_res = x$am_resolution_guess && i=yes
    done
    ;;
  *) am_resolution=$am_resolution_guess ;;
esac
8<--
The trick is that the various options form a dependency chain, but the command make will execute does /not/ actually touch the target, so it stops when the files are no longer distinguishable. This distinguishes between a tmpfs (which has nanosecond resolution here) and /home (which is an older filesystem with only 1-second resolution). I am not sure what it does with FAT yet. There should be some way to use 0.1+0.9+1 = 2 and 0.01+0.09+0.1+0.9+1 > 2 to check for that (accurately!) without further sleeps. -- Jacob
Re: 1.16.90 regression: configure now takes 7 seconds to start
dherr...@tentpost.com wrote: At some point, it becomes unreasonable to burden common platforms with delays that only support relatively obscure and obsolete platforms. Configure scripts already have a bad reputation for wasting time. Even if they are faster than editing a custom makefile, they are idle instead of active time for the user, so waiting is harder. I feel that 6-second test delays or 2-second incremental delays later qualify as clearly unreasonable. The 1-second timestamps are borderline unreasonable. Cross-compiling with a decent filesystem is more reasonable. One second timestamp granularity is classic POSIX, and apparently also modern NetBSD. We must support it. Why can't we resolve this by requiring systems with 2-second resolution to set a flag in config.site? That moves the burden closer to where it belongs. First, because configure scripts are supposed to Just Work without particular expertise on the part of the user. (Users with such deficient systems are least likely to have the expertise to handle that.) Second, because timestamp resolution is actually per-volume, which in the POSIX model, means it varies by directory. You can even have a modern filesystem (with nanosecond granularity) mounted on a directory in a FAT filesystem (with two second granularity) and ultimately a root filesystem with one second granularity. In fact, the machine on which I type this has all three: any tmpfs has nanosecond resolution, but /home has been carried for many years since mkfs and has one-second resolution, and I have removable media that is formatted FAT with its infamous two-second resolution. All of these, when in use, appear in the same hierarchical filesystem namespace. -- Jacob
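[To make the per-volume point concrete, a GNU-system demonstration might look like this; the paths are only examples of the three volume types mentioned above, and `stat -c %y` (GNU coreutils) prints the mtime with its fractional part.]
8<--
touch /dev/shm/ts && stat -c %y /dev/shm/ts        # tmpfs: nonzero nanoseconds
touch "$HOME/ts" && stat -c %y "$HOME/ts"          # old ext filesystem: fraction reads .000000000
touch /media/usb/ts && stat -c %y /media/usb/ts    # FAT: whole seconds only, always even
8<--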
Re: Bug Resilience Program of German Sovereign Tech Fund
Karl Berry wrote: [...] > and reduce technical debt. I don't know what that means. I instinctively shy away from such vague buzzwords. Essentially, "technical debt" means "stuff on the TODO list" and more specifically the accumulation of "good enough for now; fix it later" that tends to happen in software projects. As for "modernizing" autoconf/make, mentioned in other msgs, that's the last thing that should be done. We go to a lot of trouble to make the tools work on old systems that no one else supports. For example, I can just picture them saying "oh yes, you should use $(...) instead of `...`" and other such "modern" shell constructs. Or "use Perl module xyz to simplify", where xyz only became available a few years ago. Etc. If you make them run their patches past the mailing list, I will happily complain if they try to break backwards compatibility without a very good reason. Remember Time::HiRes and perl 5.6? :-) -- Jacob
Re: 1.16.90 regression: configure now takes 7 seconds to start
Karl Berry wrote: bh> Seen e.g. on NetBSD 10.0. Which doesn't support subsecond mtimes? jb> Maybe the best answer is to test for subsecond timestamp granularity first, and then only do the slow test to distinguish between 1-second and 2-second granularity if the subsecond granularity test gives a negative result? Unfortunately, that is already the case. The function (_AM_FILESYSTEM_TIMESTAMP_RESOLUTION in m4/sanity.m4) does the tests starting with .01, then .1, then 1. Searching for sleep [012] in Bruno's log confirms this. So we are hitting the one-second timestamp granularity path because there is a modern system that does not have sub-second timestamp granularity, and that path is annoyingly slow. If I understand correctly, Bruno's goal is to omit the "1" test if we can detect that we're not on a fat filesystem. But I admit I don't like trying to inspect filesystem types. That way lies madness, it seems to me, and this whole function is already maddening enough. E.g., mount and/or df could hang if NFS is involved. I agree, although I had not considered the possibilities of problems with NFS. It seems to me that using stat doesn't help because it's not available by default on the bsds etc., as Bruno pointed out. Does BSD ls(1) support "--time=ctime --time-style=full-iso"? That would give equivalent information as stat(1) and, if at least one file has an odd seconds field, would rule out FAT quickly. It could also indicate one-second granularity, if all subsecond parts are zero. The slow test would still be required to confirm the worst case: all timestamps are even because the filesystem has 2-second timestamp granularity. The simple change is to omit the make test if we are at resolution 1. That will save 4 seconds. Omitting it is justified because the make test is only there for the sake of makes that are broken wrt subsecond timestamps. I will do that. If the critical issue is whether or not make(1) correctly handles subsecond timestamp granularity, why not simply test if make(1) recognizes subsecond timestamp differences and remove the other tests? If we are on a filesystem that does not have subsecond timestamp granularity, make will not have it either. That will leave 2+ sec of sleeping, but if we are to reliably support fat, I don't see a good alternative. At least it's not as bad as 6+. Any other ideas? As I hinted at, could we move the entire test into make(1) somehow? Could we lay out a set of files with timestamps differing by .01, .1, 1 seconds and then see which pairs have ages distinguishable by make(1)? That should complete in less than 2 seconds: 1.11 seconds to make the files and less than half a second to run make(1). If we can find a portable way to read timestamps to 1-second resolution, we can confirm not being on FAT---there will be one "odd file out" and the others will all have either odd or even timestamps. If the timestamps all match, we can assume 2-second granularity without further testing, or do the slow test to confirm it. If the "odd file out" has a timestamp two seconds ahead of the others, we *know* the filesystem has 2-second granularity and we crossed a "tick" boundary while making the files. Alternately, could we improve the UI by emitting one additional dot per approximate second during the test? Reassure the user that, yes, configure is doing something, even if all we can actually do is wait for the clock to advance. -- Jacob
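[A sketch of that quick odd-seconds check, untested; `--time-style` is exactly the portability question raised above, so as written this is GNU-ls-only.]
8<--
# Any file whose mtime has an odd seconds field rules out FAT's
# 2-second timestamp granularity for the directory being examined.
am_fat_possible=yes
for am_sec in `ls -l --time-style=full-iso "$srcdir" 2>/dev/null |
               sed -n 's/.* [0-9][0-9]:[0-9][0-9]:\([0-9][0-9]\)\..*/\1/p'`
do
  case $am_sec in
    *[13579]) am_fat_possible=no; break ;;
  esac
done
echo "2-second granularity still possible: $am_fat_possible"
8<--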
Re: 1.16.90 regression: configure now takes 7 seconds to start
Bruno Haible wrote: Hi Jacob, AFAIU, the 4x sleep 0.1 are to determine whether am_cv_filesystem_timestamp_resolution should be set to 0.1 or to 1. OK, so be it. But the 6x sleep 1 are to determine whether am_cv_filesystem_timestamp_resolution should be set to 1 or 2. 2 is known to be the case only for FAT/VFAT file systems. Therefore here is a proposed patch to speed this up. On NetBSD, it reduces the execution time of the test from ca. 7 seconds to ca. 0.5 seconds. The problem with the proposed patch is that it tries to read a filesystem name instead of testing for the feature. This would not be portable to new systems that use a different name for their FAT filesystem driver. I can amend the patch so that it uses `uname -s` first, and does the optimization only for the known systems (Linux, macOS, FreeBSD, NetBSD, OpenBSD, Solaris). This still has the same philosophical problem: testing for a known system rather than for the feature we actually care about. (We could also identify FAT with fair confidence by attempting to create a file with a name containing a character not allowed on the FAT filesystem, but I remember Linux having had at least one extended FAT driver ("umsdos" if I remember correctly) that lifted the name limits, but I do not remember if it also provided improved timestamps.) I think the test can be better optimized for the common case by first checking if stat(1) from GNU coreutils is available ([[case `stat --version` in *coreutils*) YES;; *) NO;; esac]]) Sure, if GNU coreutils 'stat -f' is available, things would be easy. But typically, from macOS to Solaris, it isn't. You can't achieve portability by using a highly unportable program like 'stat'. That's why my patch only uses 'df' and 'mount'. You can use anything in configure, *if* you first test for it and have a fallback if it is not available. In this case, I am proposing testing for 'stat -f', using it to examine conveniently-available timestamps to establish an upper bound on timestamp granularity if we can, and falling back to the current (slow) tests if not. Users of the GNU system will definitely get the fast path. and, if it is (common case and definitely so on the GNU system), checking [[case `stat --format=%y .` in *:??.0) SUBSEC_RESOLUTION=no;; *) SUBSEC_RESOLUTION=yes;; esac]] to determine if sub-second timestamps are likely to be available I don't care much about the 0.4 seconds spent on determining sub-second resolution. It's the 6 seconds that bug me. If 'stat -f' is available, we should be able to cut that to milliseconds. GNU systems will have 'stat -f', others might. The slow path would remain available if the fast path cannot be used. Using a direct feature test for 'stat -f' might motivate the *BSDs to also support it. To handle filesystems with 2-second timestamp resolution, check the timestamp on configure, and arrange for autoconf to ensure that the timestamp of a generated configure script is always odd Since a tarball can be created on ext4 and unpacked on vfat FS, That is exactly the situation I am anticipating here. this would mean that autoconf needs to introduce a sleep() of up to 1 second, _regardless_ on which FS it is running. No, thank you, that is not a good cure to the problem. One second, once, when building configure, to ensure that configure will have an odd timestamp... does autoconf normally complete in less than one second? Would this actually increase the running time significantly? 
Or, as Simon Richter mentioned, use the utime builtin (Autoconf is now written in Perl) to advance the mtime of the created file by one second before returning with no actual delay. The bigger problem would be that it would be impossible to properly package such a configure script if using a filesystem with 2-second granularity. Such a configure script would always be unpacked with an even timestamp (because it was packaged with an even timestamp) and the 2-second granularity test would give a false positive if the filesystem actually has 1-second granularity, but configure itself was generated on a 2-second granularity filesystem. The suggested tests for sub-second granularity would still work correctly on the unpacked files, however---if you can see non-zero fractional seconds in timestamps, you know that you are not on a 2-second granularity filesystem. Maybe the best answer is to test for subsecond timestamp granularity first, and then only do the slow test to distinguish between 1-second and 2-second granularity if the subsecond granularity test gives a negative result? Most modern systems will have the subsecond timestamp granularity, so would need only the 0.4 second test; older systems would need the full 6.4 second test, but would still work reliably. At worst, we might need to extend the 0.4 second test to 0.5 seconds, to confirm that we did not just happen to span a clock tick boundary during the sub-second test.
Re: 1.16.90 regression: configure now takes 7 seconds to start
Bruno Haible wrote: [I'm writing to automake@gnu.org because bug-autom...@gnu.org appears to be equivalent to /dev/null: no echo in https://lists.gnu.org/archive/html/bug-automake/2024-06/threads.html nor in https://debbugs.gnu.org/cgi/pkgreport.cgi?package=automake, even after several hours.] In configure scripts generated by Autoconf 2.72 and Automake 1.16.90, one of the early tests checking filesystem timestamp resolution... takes 7 seconds! Seen e.g. on NetBSD 10.0. Logging the execution time, via sh -x ./configure 2>&1 | gawk '{ print strftime("%H:%M:%S"), $0; fflush(); }' > log1 I get the attached output. There are 6x "sleep 1" and 4x "sleep 0.1". That is, 6.4 seconds are wasted in sleeps. IBM software may do this; but GNU software shouldn't. AFAIU, the 4x sleep 0.1 are to determine whether am_cv_filesystem_timestamp_resolution should be set to 0.1 or to 1. OK, so be it. But the 6x sleep 1 are to determine whether am_cv_filesystem_timestamp_resolution should be set to 1 or 2. 2 is known to be the case only for FAT/VFAT file systems. Therefore here is a proposed patch to speed this up. On NetBSD, it reduces the execution time of the test from ca. 7 seconds to ca. 0.5 seconds. The problem with the proposed patch is that it tries to read a filesystem name instead of testing for the feature. This would not be portable to new systems that use a different name for their FAT filesystem driver. I think the test can be better optimized for the common case by first checking if stat(1) from GNU coreutils is available ([[case `stat --version` in *coreutils*) YES;; *) NO;; esac]]) and, if it is (common case and definitely so on the GNU system), checking [[case `stat --format=%y .` in *:??.0) SUBSEC_RESOLUTION=no;; *) SUBSEC_RESOLUTION=yes;; esac]] to determine if sub-second timestamps are likely to be available; this has a 1-in-(actual-ticks-per-second) chance of giving a false negative. These checks would be very fast, so could also be repeated with the access and inode change timestamps and/or extended to other files (`stat *`) for better certainty. The basic concept should be sound, although the pattern matching used in the examples is a first cut. The essential idea is that the fractional part beyond what the filesystem actually records will always read as zero, and unpacking an archive is not instant, so we should see every implemented fractional bit set at least once across files in the tree containing configure. To handle filesystems with 2-second timestamp resolution, check the timestamp on configure, and arrange for autoconf to ensure that the timestamp of a generated configure script is always odd---that least-significant bit will be dropped when the script is unpacked on a filesystem with 2-second timestamp resolution. If stat from GNU coreutils is not available, fall back to the current sleep(1)-based test and just eat the delay in the name of portability. The test checks only for "coreutils" because very old versions did not say GNU. A better, functional test for stat(1) is probably also possible. -- Jacob
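[Sketching that two-step test a bit further, still a first cut per the caveat above; every pattern here is illustrative and untested.]
8<--
am_subsec=unknown
case `stat --version 2>/dev/null` in
  *coreutils*)
    am_subsec=no
    # Any nonzero fractional part in an mtime proves the filesystem
    # records sub-second timestamps.
    for am_frac in `stat --format=%y "$srcdir"/* 2>/dev/null |
                    sed -n 's/.*:[0-9][0-9]\.\([0-9]*\) .*/\1/p'`
    do
      case $am_frac in
        *[1-9]*) am_subsec=yes; break ;;
      esac
    done
    ;;
esac
# am_subsec=unknown means no usable stat; fall back to the sleep-based test.
8<--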
Re: follow-up on backdoor CPU usage (was: libsystemd dependencies)
Jacob Bachmeyer wrote: [...] The preliminary reports that it was an RCE backdoor that would pass commands smuggled in public key material in SSH certificates to system(3) (as root of course, since that is sshd's context at that stage) are inconsistent with the slowdown that caused the backdoor to be discovered. I doubt that SSH logins were using that code path, and the SSH scanning botnets almost certainly are not presenting certificates, yet it apparently (reports have been unclear on this point) was the botnet scanning traffic that led to the discovery of sshd wasting considerable CPU time in liblzma... I am waiting for the proverbial other shoe to drop on that one. I have been given (<https://www.openwall.com/lists/oss-security/2024/04/18/1>) a satisfactory explanation for the inconsistency: OpenSSH sshd uses exec(2) to reshuffle ASLR before accepting each connection, and the backdoor blob's tampering with the dynamic linking process greatly reduces the efficiency of ld.so on top of its own processing. The observable wasted CPU time was the backdoor's excessively-complex initialization, rather than any direct effect on sshd connection processing. -- Jacob
Re: GCC reporting piped input as a security feature
Zack Weinberg wrote: On Tue, Apr 9, 2024, at 11:35 PM, Jacob Bachmeyer wrote: Jan Engelhardt wrote: On Tuesday 2024-04-09 05:37, Jacob Bachmeyer wrote: In principle it could be possible to output something different to describe this strange situation explicitly. For instance, output "via stdin" as a comment, or output `stdin/../filename' as the file name. (Programs that optimize the file name by deleting XXX/.../ are likely not to check whether XXX is a real directory.) ... How about `/dev/stdin/-` if no filename has been specified with #line or whatever, and `/dev/stdin/[filename]` if one has, where [filename] is the specified filename with all leading dots and slashes stripped, falling back to `-` if empty? /dev/stdin can be relied on to either not exist or not be a directory, so these shouldn't ever be openable. I like that idea, but would suggest expanding on it as "/dev/stdin/[working directory]//-" or "/dev/stdin/[working directory]//[specified filename]". The double slash allows tools that care to parse out the specified filename, while the working directory preceding it provides a hint where to find that file if the specified filename is relative, but the kernel will collapse it to a single slash if a tool just passes the "[working directory]//[specified filename]" to open(2). Since the working directory should itself be an absolute name, there would typically be a double slash after the "/dev/stdin" prefix. Something like "/dev/stdin//var/cache/build/foopkg-1.0.0///usr/src/foopkg-1.0.0/special.c.m4" as an artificial example. -- Jacob
Re: GCC reporting piped input as a security feature
Jan Engelhardt wrote: On Tuesday 2024-04-09 05:37, Jacob Bachmeyer wrote: In principle it could be possible to output something different to describe this strange situation explicitly. For instance, output "via stdin" as a comment, or output `stdin/../filename' as the file name. (Programs that optimize the file name by deleting XXX/.../ are likely not to check whether XXX is a real directory.) With the small difference that I believe the special marker should be '<stdin>' (with the angle brackets, as it is now), this could be another good idea. Example output: "[working directory][specified filename]" or "[specified filename]///<>/[working directory]/". GDB could be modified [...] This will likely backfire. Assuming you have a userspace program which does not care about any particular substring being present, the fullpath is passed as-is to the OS kernel, which *will* resolve it component by component, and in doing so, stumble over the XXX/ part. And upon so stumbling, return ENOENT or ENOTDIR. Where is the harm there? Input read from a pipe does not exist in the filesystem. Better introduce a new DW_AT_ field for a stdin flag. That would mean that older tools could be confused. How about a new field for "source-specified filename" when that differs from the actual file being read? That way, existing tools would still see "[working directory]/" and avoid confusion, which could be a security risk here. -- Jacob
Re: GCC reporting piped input as a security feature
Alan D. Salewski wrote: On 2024-04-08 22:37:50, Jacob Bachmeyer spake thus: Richard Stallman wrote: [...] In principle it could be possible to output something different to describe this strange situation explicitly. For instance, output "via stdin" as a comment, or output `stdin/../filename' as the file name. (Programs that optimize the file name by deleting XXX/.../ are likely not to check whether XXX is a real directory.) With the small difference that I believe the special marker should be '<stdin>' (with the angle brackets, as it is now), this could be another good idea. Example output: "[working directory][specified filename]" or "[specified filename]///<>/[working directory]/". GDB could be modified to recognize either form and read the specified file (presumably some form of augmented C) but report that the sources were transformed prior to compilation. The use of triple-slash ensures that these combined strings cannot be confused with valid POSIX filenames, although I suspect that uses of these strings would have to be a GNU extension to the debugging info format. I do not think that the use of triple-slash (or any-N-slash) would entirely prevent potential confusion with valid POSIX filenames, as POSIX treats multiple slashes as equivalent to a single slash (except at the beginning of a path, where two slash characters may have a different, implementation-defined meaning). Since a pathname component name can basically contain any bytes except <slash> and <NUL>, any token value chosen will likely have some non-zero potential for confusion with a valid POSIX pathname. Yes, this is the downside of the extreme flexibility of POSIX filename semantics. Any C string is potentially a valid filename. From SUSv4 2018[0] (update from 2020-04-30, which is what I happen to have handy): 3.271 Pathname A string that is used to identify a file. In the context of POSIX.1-2017, a pathname may be limited to {PATH_MAX} bytes, including the terminating null byte. It has optional beginning <slash> characters, followed by zero or more filenames separated by <slash> characters. A pathname can optionally contain one or more trailing <slash> characters. Multiple successive <slash> characters are considered to be the same as one <slash>, except for the case of exactly two leading <slash> characters. Rats, I had forgotten that detail. Emacs treats typing a second slash as effectively invalidating everything to the left, I remembered that some systems (and many URL schemes) use double-slash to indicate a network host, and I expected that 3 slashes would mean starting over at the root if that were ever presented to the kernel's filename resolution service. On the other hand, we could use multiple slashes as a delimiter if GCC normalizes such sequences in input filename strings to single slash, which POSIX allows, according to the quote above. The simplest solution would be to simply document and preserve the current behavior, which appears to be ignoring directives and recording the working directory and "<stdin>" in the case of reading from a pipe, and making sure that no normal build procedure for any GNU package pipes source into the compiler. -- Jacob
Re: GCC reporting piped input as a security feature
Richard Stallman wrote: [[[ To any NSA and FBI agents reading my email: please consider]]] [[[ whether defending the US Constitution against all enemies, ]]] [[[ foreign or domestic, requires you to follow Snowden's example. ]]] > While it does not /prevent/ cracks, there is something we can ensure > that we *keep* doing: GCC, when reading from a pipe, records the input > file as "<stdin>" in debug info *even* if a "#" directive to set the > filename has been included. This was noticed by Adrien Nader (who > posted it to oss-security; > <https://www.openwall.com/lists/oss-security/2024/04/03/2> and > <https://marc.info/?l=oss-security&m=171214932201156&w=2>; those are > the same post at different public archives) and should provide a > "smoking gun" test to detect this type of backdoor dropping technique in > the future. This GCC behavior should be documented as a security > feature, because most program sources are not read from pipes. Are you suggesting fixing GCC to put the specified file into those linenumbers, or are you suggesting we keep this behavior to help with analysis? I am suggesting that we keep this behavior (and document it as an explicit security feature) to help with detection of any future similar cracks, and add provisions to the GNU Coding Standards to avoid false positives by requiring generated sources to appear in the filesystem instead of being piped to the compiler. In principle it could be possible to output something different to describe this strange situation explicitly. For instance, output "via stdin" as a comment, or output `stdin/../filename' as the file name. (Programs that optimize the file name by deleting XXX/.../ are likely not to check whether XXX is a real directory.) With the small difference that I believe the special marker should be '<stdin>' (with the angle brackets, as it is now), this could be another good idea. Example output: "[working directory][specified filename]" or "[specified filename]///<>/[working directory]/". GDB could be modified to recognize either form and read the specified file (presumably some form of augmented C) but report that the sources were transformed prior to compilation. The use of triple-slash ensures that these combined strings cannot be confused with valid POSIX filenames, although I suspect that uses of these strings would have to be a GNU extension to the debugging info format. (If GNU-extended debugging information is inhibited, I think it is more important to declare that the input came from a pipe than to carry the specified filename.) This might actually be a good idea in general if a directive specifies a filename with the same suffix but not the file being read. As another layer against similar attacks, distribution packaging tools could grep the debug symbols for '<stdin>' and raise alarms if matches are found. Forbidding piping source to the compiler in the GNU Coding Standards would eliminate false positives. -- Jacob
Re: detecting modified m4 files (was: GNU Coding Standards, automake, and the recent xz-utils backdoor)
Bruno Haible wrote: Richard Stallman commented on Jacob Bachmeyer's idea: > > Another related check that /would/ have caught this attempt would be > > comparing the aclocal m4 files in a release against their (meta)upstream > > sources before building a package. This is something distribution > > maintainers could do without cooperation from upstream. If > > m4/build-to-host.m4 had been recognized as coming from gnulib and > > compared to the copy in gnulib, the nonempty diff would have been > > suspicious. I have a hunch that some effort is needed to do that comparison, but that it is feasible to write a script to do it could make it easy. Is that so? Yes, the technical side of such a comparison is relatively easy to implement: - There are less than about 2000 or 5000 *.m4 files that are shared between projects. Downloading and storing all historical versions of these files will take ca. 0.1 to 1 GB. - They would be stored in a content-based index, i.e. indexed by sha256 hash code. - A distribution could then quickly test whether a *.m4 file found in a distrib tarball is "known". The recurrently time-consuming part is, whenever an "unknown" *.m4 file appears, to - manually review it, - update the list of upstream git repositories (e.g. when a project has been forked) or the list of releases to consider (e.g. snapshots of GNU Autoconf or GNU libtool, or distribution-specific modifications). I agree with Jacob that a distro can put this in place, without needing to bother upstream developers. I have since thought of a simple solution that /would/ have caught this backdoor campaign in its tracks: an "autopoint --check" command that simply compares the m4/ files (and possibly others?) present in the package against the files that autopoint would copy in if m4/ were empty, and reports any differences. A newer serial in the package tree than the system m4 library produces a minor complaint; a file with the same serial and different contents produces a major complaint. An older serial in the package tree should be reported, but is likely to be of no consequence if a distribution's packaging routine will copy in the known-good newer version before rebuilding configure. Any m4/ files local to the package are simply reported, but those are also in the package's Git repository. Distribution package maintainers would run "autopoint --check" and pass any suspicious files to upstream maintainers for evaluation. (The distribution's own packaging system can trace an m4 file in the system library back to its upstream package.) The modified build-to-host.m4 would have been very /unlikely/ to slip past the gnulib/gettext/Automake/Autoconf maintainers, although few distribution packagers would have had suspicions. The gnulib maintainers would know that gl_BUILD_TO_HOST should not be checking /anything/ itself and the crackers would have been caught. This should be effective in closing off a large swath of possible attacks: a backdoor concealed in binary test data (or documentation) requires some visible means to unpack it, which means the unpacker must appear in source somewhere. While the average package maintainer might not be able to make sense of a novel m4 file, the maintainers of GNU's version of that file /will/ be able to recognize such chicanery, and the "red herrings" the cracker added for obfuscation would become a liability. Without them, the effect of the new code is more obvious, so the crackers lose either way. -- Jacob
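[Pending such an option, a distribution-side approximation might look like the following sketch; the candidate directories are only guesses at where a distribution keeps its reference copies, and the script is purely illustrative.]
8<--
#!/bin/sh
# Compare a package's m4/ files against system-installed reference copies.
status=0
for f in m4/*.m4; do
  base=`basename "$f"`
  ref=
  for dir in /usr/share/aclocal /usr/local/share/aclocal; do
    test -f "$dir/$base" && { ref=$dir/$base; break; }
  done
  if test -z "$ref"; then
    echo "NOTE: $base has no system copy; treat as package-local and review it"
  elif cmp -s "$f" "$ref"; then
    : # identical to the reference copy
  else
    echo "WARNING: $base differs from $ref"
    diff -u "$ref" "$f"
    status=1
  fi
done
exit $status
8<--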
Re: GCC reporting piped input as a security feature (was: GNU Coding Standards, automake, and the recent xz-utils backdoor)
Richard Stallman wrote: [[[ To any NSA and FBI agents reading my email: please consider]]] [[[ whether defending the US Constitution against all enemies, ]]] [[[ foreign or domestic, requires you to follow Snowden's example. ]]] [...] When considering any such change, we still should consider the question: will this actually prevent cracks, or will it rather give crackers an additional way to check that their activities can't be detected. While it does not /prevent/ cracks, there is something we can ensure that we *keep* doing: GCC, when reading from a pipe, records the input file as "<stdin>" in debug info *even* if a "#" directive to set the filename has been included. This was noticed by Adrien Nader (who posted it to oss-security; <https://www.openwall.com/lists/oss-security/2024/04/03/2> and <https://marc.info/?l=oss-security&m=171214932201156&w=2>; those are the same post at different public archives) and should provide a "smoking gun" test to detect this type of backdoor dropping technique in the future. This GCC behavior should be documented as a security feature, because most program sources are not read from pipes. The xz backdoor dropper took great pains to minimize its use of the filesystem; only the binary blob ever touches the disk, and that presumably because there is no other way to feed it into the linker. If debug info is regularly checked for symbols obtained from "<stdin>" and the presence of such symbols reliably indicates funny business, then we force crackers to risk leaving more direct traces in the filesystem, instead of being able to patch the code "in memory" and feed an ephemeral stream to the compiler. The "Jia Tan" crackers seem to have put a lot of work into minimizing the "footprint" of their dropper, so we can assume that this must have been important to them. To avoid false positives if this test is used, we might want to add a rule to the GNU Coding Standards (probably in the "Makefile Conventions" section) that code generated with other utilities MUST always be materialized in the filesystem and MUST NOT be piped into the compiler. -- Jacob
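[A minimal form of such a check, for ELF objects with DWARF debug info; readelf is from GNU binutils, and the rest is only a sketch of the grep described above.]
8<--
#!/bin/sh
# Flag objects whose DWARF compilation units claim to come from <stdin>,
# i.e. code that was piped into the compiler instead of read from a file.
for obj in "$@"; do
  if readelf --debug-dump=info "$obj" 2>/dev/null | grep -q '<stdin>'; then
    echo "SUSPICIOUS: $obj contains a compilation unit named <stdin>" >&2
  fi
done
8<--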
Re: compressed release distribution formats (was: GNU Coding Standards, automake, and the recent xz-utils backdoor)
Eric Blake wrote: [adding in coreutils, for some history] [...] At any rate, it is now obvious (in hindsight) that zstd has a much larger development team than xz, which may alter the ability of zstd being backdoored in the same way that xz was, merely by social engineering of a lone maintainer. That just means that a cracker group needs to plant a mole in a larger team, which was effectively the goal of the sockpuppet campaign against the xz-utils maintainer, except that the cracker's sockpuppet was the second member of a two-member team. I see no real difference here. I would argue that GNU software should be consistently available in at least one format that can be unpacked using only tools that are also provided by the GNU project. I believe that currently means "gzip", unfortunately. We should probably look to adopt another one; perhaps the lzip maintainer might be interested? It is also obvious that having GNU distributions available through only a SINGLE compression format, when that format may be vulnerable, The xz format is not vulnerable, or at least has not been shown to be so in the sense of security risks, and only xz-utils was backdoored. Nor is there only one implementation: 7-zip can also handle xz files. is a dis-service to users when it is not much harder to provide tarballs in multiple formats. Having multiple tarballs as the recommendation can at least let us automate that each of the tarballs has the same contents, Agreed. In fact, if multiple formats can be produced concurrently, we could validate that the compressed tarballs are actually identical. (Generate using `tar -cf - [...] | tee >(compress1 >[...].tar.comp1) | tee >(compress2 >[...].tar.comp2) | gzip -9 >[...].tar.gz` if you do not want to actually write the uncompressed tarball to the disk.) But if tarlz is to be used to write the lzipped tarball, you probably want to settle for "same file contents", since tarlz only supports pax format and we may want to allow older tar programs to unpack GNU releases. although it won't make it any more obvious whether those contents match what was in git (which was how the xz backdoor got past so many people in the first place). This is another widespread misunderstanding---almost all of the xz backdoor was hidden in plain sight (admittedly, compressed and/or encrypted) *in* the Git repository. The only piece of the backdoor not found in Git was the modified build-to-host.m4. The xz-utils project's standard practice had been to /not/ commit imported m4 files, but to bring them in when preparing release tarballs. The cracker simply rolled the "key" to the dropper into the release tarball. I still have not seen whether the configure script in the release tarball was built with the modified build-to-host.m4 or if the crackers were depending on distribution packagers to regenerate configure. Again, everything present in both Git and the release tarball /was/ /identical/. There were no mismatches, only files added to the release that are not in the repository, and that are /expected/ to be added to a release. -- Jacob
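[Checking after the fact that differently-compressed tarballs carry the same bytes is also cheap; a sketch with placeholder names follows, and, as noted above, a pax-format tarlz archive would need a file-content comparison instead of this byte-for-byte one.]
8<--
# All release tarballs should decompress to an identical tar stream.
ref=`gzip -cd package-1.0.tar.gz | sha256sum`
for t in package-1.0.tar.lz package-1.0.tar.xz; do
  case $t in
    *.lz) sum=`lzip -cd "$t" | sha256sum` ;;
    *.xz) sum=`xz -cd "$t" | sha256sum` ;;
  esac
  test "$sum" = "$ref" || echo "MISMATCH: $t differs from the .tar.gz" >&2
done
8<--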
Re: GNU Coding Standards, automake, and the recent xz-utils backdoor
Richard Stallman wrote: [[[ To any NSA and FBI agents reading my email: please consider]]] [[[ whether defending the US Constitution against all enemies, ]]] [[[ foreign or domestic, requires you to follow Snowden's example. ]]] > My first thought was that Autoconf is a relatively trivial attack vector > since it is so complex and the syntax used for some parts (e.g. m4 and > shell scripts) is so arcane. In particular, it is common for Autotools > stuff to be installed on a computer (e.g. by installing a package from > an OS package manager) and then used while building. For example, there > are large collections of ".m4" files installed. If one of the m4 files > consumed has been modified, then the resulting configure script has been > modified. Can anyone think of a feasible way to prevent this sort of attack? There have been some possibilities suggested on other branches of the discussion. I have changed the subject of one of those to "checking aclocal m4 files" to highlight it. There is progress being made, but the solutions appear to be outside the direct scope of the GNU build system packages. Someone suggested that configure should not use m4 files that are lying around, but rather should fetch them from standard release points, WDYT of that idea? Autoconf configure scripts do not use nearby m4 files and do not require m4 at all; aclocal collects the files in question into aclocal.m4 (I think) and then autoconf uses that (and other inputs) to /produce/ configure. (This may seem like a trivial point, but exact derivations and their timing were critical to how the backdoor dropper worked.) Other tools (at least autopoint from GNU gettext, possibly others) are used to automatically scan a larger set of m4 files stored on the system and copy those needed into the m4/ directory of a package source tree, in a process conceptually similar to how the linker pulls only needed members from static libraries when building an executable. All of this is done on the maintainer's machine, so that the finished configure script is included in the release tarball. There have been past incidents where malicious code was directly added to autoconf-generated configure scripts, so (as I understand) distribution packagers often regenerate configure before building a package. In /this/ case, the crackers (likely) modified the version of build-to-host.m4 on /their/ computer, so the modified file would be copied into the xz-utils/m4 directory in the release tarball and used when distribution packagers regenerate configure before building the package. Fetching these files from standard release points would require an index of those standard release points, and packages are allowed to have their own package-specific macros as well. The entire system dates from well before ubiquitous network connectivity could be assumed (anywhere---and that is still a bad assumption in the less prosperous parts of the world), so release tarballs are meant to be self-contained, including copies of "standard" macros needed for configure but not supplied by autoconf/automake/etc. -- Jacob
Re: checking aclocal m4 files (was: GNU Coding Standards, automake, and the recent xz-utils backdoor)
Bruno Haible wrote: Jacob Bachmeyer wrote: Another related check that /would/ have caught this attempt would be comparing the aclocal m4 files in a release against their (meta)upstream sources before building a package. This is something distribution maintainers could do without cooperation from upstream. If m4/build-to-host.m4 had been recognized as coming from gnulib and compared to the copy in gnulib, the nonempty diff would have been suspicious. True. Note, however, that there would be some false positives: True; all of these are Free Software, so a non-empty diff would still require manual review. libtool.m4 is often shipped modified, a) if the maintainer happens to use /usr/bin/libtoolize and is using a distro that has modified libtool.m4 (such as Gentoo), or Distribution libtool patches could be accumulated into the set of "known sources". b) if the maintainer intentionally improved the support of specific platforms, such as Solaris 11.3. In this case, the distribution maintainer should ideally take up pushing those improvements back to upstream libtool, if they are suitably general. Also, for pkg.m4 there is no single upstream source. They distribute a pkg.m4.in, from which pkg.m4 is generated on the developer's machine. This would be a special case, but could be treated as a package-specific m4 file anyway, since the developer must generate it. The developer could also write their own m4 macros to use with autoconf. But for macros from Gnulib or the Autoconf macros archive, this is a reasonable check to make. This type of check could also allow "sweeping" improvements upstream, in the case of a package maintainer that may be unsure of how to upstream their changes. (Of course, upstream needs to be careful about blindly collecting improvements, lest some of those "improvements" turn out to have come from cracker sockpuppets...) -- Jacob
Re: binary data in source trees (was: GNU Coding Standards, automake, and the recent xz-utils backdoor)
Richard Stallman wrote: [[[ To any NSA and FBI agents reading my email: please consider]]] [[[ whether defending the US Constitution against all enemies, ]]] [[[ foreign or domestic, requires you to follow Snowden's example. ]]] > The issue seems to be releases containing binary data for unit tests, > instead of source or scripts to generate that data. In this case, that > binary data was used to smuggle in heavily obfuscated object code. If this is the crucial point, we could put in the coding standards (or the maintainers' guide) not to do this. On another branch of this discussion, Zack Weinberg noted that binary test data may be unavoidable in some cases. (A base64 blob or hex dump may as well be a binary blob.) Further, manuals often contain images, some of which may be in binary formats, such as PNG. To take this all the way, we would have to require that all documentation graphics be generated from readable sources. I know TikZ exists but am unsure how well it could be integrated into Texinfo, for example. -- Jacob
Re: reproducible dists and builds (was: GNU Coding Standards, automake, and the recent xz-utils backdoor)
Richard Stallman wrote: [[[ To any NSA and FBI agents reading my email: please consider]]] [[[ whether defending the US Constitution against all enemies, ]]] [[[ foreign or domestic, requires you to follow Snowden's example. ]]] > What would be helpful is if `make dist' would guarantee to produce the same > tarball (bit-to-bit) each time it is run, assuming the tooling is the same > version. Currently I believe that is not the case (at least due to timestamps) Isn't this a description of "reproducible compilation"? No, but it is closely related. Compilation produces binary executables, while `make dist` produces a freestanding /source/ archive. We want to make that standard, but progress is inevitably slow because many packages need to be changed. I am not sure that that is actually a good idea. (Well, it is mostly a good idea except for one issue.) If compilation is strictly deterministic, then everyone ends up with identical binaries, which means an exploit that cracks one will crack all. Varied binaries make life harder for crackers developing exploits, and may even make "one exploit to crack them all" impossible. This is one of the reasons that exploits have long hit Windows (where all the systems are identical) so much harder than the various GNU/Linux distributions (where the binaries are likely different even before distribution-specific patches are considered). Ultimately, this probably means that we should have both an /ability/ for deterministic compilation and either a compiler mode or post-processing pass (a linker option?) to intentionally shuffle the final executable. -- Jacob
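On the `make dist` side, most of the bit-level variation really does come from timestamps, ownership, and file ordering inside the tarball, and GNU tar can already pin all three down. A rough sketch of a deterministic dist step, assuming GNU tar and a chosen reference timestamp (the package name is hypothetical):

    # Roll a tarball whose bytes do not depend on who runs it or when.
    # $SOURCE_DATE_EPOCH is the conventional reference timestamp.
    tar --sort=name \
        --mtime="@${SOURCE_DATE_EPOCH:-0}" \
        --owner=0 --group=0 --numeric-owner \
        -cf - foo-1.0 | gzip -n > foo-1.0.tar.gz   # gzip -n omits gzip's own timestamp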
Re: GNU Coding Standards, automake, and the recent xz-utils backdoor
Eric Gallager wrote: On Tue, Apr 2, 2024 at 12:04 AM Jacob Bachmeyer wrote: Russ Allbery wrote: [...] I think one useful principle that's emerged that doesn't disrupt the world *too* much is that the release tarball should differ from the Git tag only in the form of added files. From what I understand, the xz backdoor would have passed this check. [...] [...] In other words, even if a proposal wouldn't have stopped this particular attack, I don't think that's a reason not to try it. I agree that there may be dumber crackers who /would/ get caught by such a check, but I want to ensure that we do not end up thinking that we have a solution and the problem is solved and everyone is happy ... and then we get caught out when it happens again. I should clarify also that I think that this proposal *is* a good idea, but we should remain aware that it would not have prevented this incident. Maneuvering around back to topic, aclocal m4 files are fairly small, perhaps always carrying all of them that a package uses in the repository should be considered a good practice? (In other words, autogen.sh should *not* run autopoint---the files autopoint adds should be in the repository.) If such a practice were followed, that would have made checking for altered files between repository and release effective, or it would have forced the cracker to target the backdoor more widely and place the altered build-to-host.m4 in the repository, increasing the probability of discovery. Wording that as a policy: "All data inputs used to construct the build scripts for a package shall be stored in the package's repository." Another related check that /would/ have caught this attempt would be comparing the aclocal m4 files in a release against their (meta)upstream sources before building a package. This is something distribution maintainers could do without cooperation from upstream. If m4/build-to-host.m4 had been recognized as coming from gnulib and compared to the copy in gnulib, the nonempty diff would have been suspicious. -- Jacob
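The "release differs from the Git tag only by added files" principle is mechanically checkable from the distribution side. A rough sketch, assuming a trusted checkout of the upstream repository in ./repo and the unpacked release in ./xz-5.6.0 (names hypothetical):

    # Every file present in both the Git tag and the release tarball must be
    # byte-for-byte identical; the tarball is only allowed to *add* files.
    git -C repo checkout v5.6.0
    ( cd repo && git ls-files ) | while read f; do
      cmp -s "repo/$f" "xz-5.6.0/$f" 2>/dev/null \
        || echo "REVIEW: $f differs from, or is missing in, the tarball"
    done

As the message above notes, the xz attack would have passed this check, since the modified m4 file existed only as an added file in the tarball; the check raises the bar but does not close the hole.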
Re: GNU Coding Standards, automake, and the recent xz-utils backdoor
Richard Stallman wrote: [[[ To any NSA and FBI agents reading my email: please consider]]] [[[ whether defending the US Constitution against all enemies, ]]] [[[ foreign or domestic, requires you to follow Snowden's example. ]]] > `distcheck` target's prominence to recommend it in the "Standard > Targets for All Users" section of the GCS? > Replying as an Automake developer, I have nothing against it in > principle, but it's clearly up to the GNU coding standards > maintainers. As far as I know, that's still rms (for anything > substantive) To make a change in the coding standards calls for a clear and specific proposal. If people think a change is desirable, I suggest making one or more such proposals. Now for a bit of speculation. I speculate that a cracker was careless and failed to adjust certain details of a bogus tar ball to be fully consistent, and that `make distcheck' enabled someone to notice those errors. I don't have any real info about whether that is so. If my speculation is mistaken, please say so. I believe it is completely mistaken. As I understand, the crocked tarballs would have passed `make distcheck` with flying colors. The rest of your questions about it therefore have no answer. On a side note, thanks for Emacs: when I finally extracted a copy of the second shell script in the backdoor dropper, Emacs made short work (M-< M-> C-M-\) of properly indenting it and making the control flow obvious. Misunderstandings of that control flow have been fairly common. (I too had it wrong before I finally had a nicely indented copy.) The backdoor was actually discovered in operation on machines running testing package versions. It caused sshd to consume an inordinate amount of CPU time, with profiling reporting that sshd was spending most of its time in liblzma, a library that sshd is not even directly linked against. (The "rogue" library had been loaded as a dependency of libsystemd, which the affected distributions had patched sshd to use for startup notification.) I will send a more detailed reply on the other thread, since its subject is more appropriate. -- Jacob
Re: role of GNU build system in recent xz-utils backdoor
Richard Stallman wrote: [[[ To any NSA and FBI agents reading my email: please consider]]] [[[ whether defending the US Constitution against all enemies, ]]] [[[ foreign or domestic, requires you to follow Snowden's example. ]]] > I was recently reading about the backdoor announced in xz-utils the > other day, and one of the things that caught my attention was how > (ab)use of the GNU build system played a role in allowing the backdoor > to go unnoticed: https://openwall.com/lists/oss-security/2024/03/29/4 [...] I don't want to get involved in fixing the bug, but I want to make sure the GNU Project is working on it. I believe the GNU Project is blameless and uninvolved in this matter. I am aware of possible elements used in the attack from the GNU Project, however two of them are innocent improvements abused by the cracker and the third was modified by the cracker. The backdoor is in two major segments: a binary blob hidden in a testsuite data file and two shell scripts that drop it, also hidden in testsuite data files. A modified version of build-to-host.m4 from gnulib is used to insert code into configure to initially extract the dropper and run it (via pipeline---the dropper shell scripts never touch the filesystem). If several conditions are met (building a shared library on 'x86_64-*-linux-gnu', HAVE_FUNC_ATTRIBUTE_IFUNC, using the GNU toolchain (or at least a linker that claims to be "GNU ld" and a compiler invoked as "gcc"), and building under either dpkg or rpm), the dropper extracts a binary blob and links it with a legitimate object, which is patched and recompiled (using sed in a pipeline; the modified C source never touches the filesystem) to call a function exported from the blob. The aclocal m4 files involved in the attack were never committed to the xz Git repository, instead being added to each release tarball using autopoint. (This was the package's standard practice before the attack.) The offending build-to-host.m4 was modified by the cracker, either directly in the release tree or at the location where autopoint will find it. Some of the modifications "sound like" the cracker may have used a language model trained on other GNU sources---they are very plausible at first glance. The elements from the GNU Project potentially implicated are build-to-host.m4, gettext.m4, and the ifunc feature in glibc. All of these turn out to be innocent. The initial "key" that activated the backdoor dropper was a modified version of the gl_BUILD_TO_HOST macro from gnulib. The dropper also checks m4/gettext.m4 for the text "dnl Convert it to C string syntax." and fails to extract the blob if found. It turns out that gl_BUILD_TO_HOST is used only as a dependency of gettext.m4 and that that comment was removed in the same commit that factored out gl_BUILD_TO_HOST to gnulib. (commit 3adaddd73c8edcceaed059e859bd5262df65fc5a in GNU gettext repository is by Bruno Haible; his involvement in the backdoor campaign is *extremely* unlikely in my view) The "ifunc" feature merely allows the programmer to store function pointers in the PLT instead of the data segment when alternate implementations of a function are involved. Theoretically, it should actually be a security improvement, as the PLT can be made read-only after all links are resolved, while the data segment must remain generally writable. The backdoor will not be dropped if the use of ifunc is disabled or if the feature is unavailable. 
I currently believe that the cracker used ifunc support as a covert flag to disable the backdoor when the oss-fuzz project was scanning the package. I also suspect that the cracker's claim that ifuncs cause segfaults under -fsanitize=address (in the message for commit ee44863ae88e377a5df10db007ba9bfadde3d314 in the xz Git repository) may have been less than honest; that commit also gives credit to another of the cracker's sockpuppets for the original patch and was committed by the cracker's main "Jia Tan" sockpuppet, so the involvement of the primary maintainer (who is listed as the author of the commit in Git) is uncertain. (In other words, the xz Git repository likely contains blatant lies put there by the cracker.) Looking into this a little more, I now know what the dropper's C source patch does: the blob's initialization entrypoint is named _get_cpuid (note only one leading underscore) and is called from an inserted static inline function that crc{32,64}_resolve (the ifunc resolvers that choose CRC implementations) are patched to call. The dropper also ensures (by modifying liblzma_la_LDFLAGS in src/liblzma/Makefile) that liblzma.so will be linked with -Wl,-z,now so that ifuncs are resolved as the shared object is loaded. That is how the backdoor blob initially gains control at a time during early process initialization when the PLT is still writable despite other hardening. -- Jacob
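To make the mechanism concrete, here is a minimal sketch of an ifunc in the style described above. The names are illustrative, not the actual liblzma code; it assumes GCC or Clang on x86-64 and the <cpuid.h> helper:

    #include <stdint.h>
    #include <stddef.h>
    #include <cpuid.h>

    static uint64_t
    crc64_generic (const uint8_t *buf, size_t len, uint64_t crc)
    {
      /* portable table/bitwise implementation would go here */
      (void) buf; (void) len;
      return crc;
    }

    static uint64_t
    crc64_clmul (const uint8_t *buf, size_t len, uint64_t crc)
    {
      /* CLMUL-accelerated implementation would go here */
      (void) buf; (void) len;
      return crc;
    }

    /* The resolver is run by the dynamic linker while it processes
       relocations; the message above notes the backdoor forced -Wl,-z,now so
       that this happens as the shared object is loaded.  */
    static uint64_t (*crc64_resolve (void)) (const uint8_t *, size_t, uint64_t)
    {
      unsigned int a, b, c, d;
      if (__get_cpuid (1, &a, &b, &c, &d) && (c & bit_PCLMUL))
        return crc64_clmul;
      return crc64_generic;
    }

    /* Public entry point: an indirect function whose address is whatever
       the resolver returned.  */
    uint64_t lzma_crc64 (const uint8_t *buf, size_t len, uint64_t crc)
      __attribute__ ((ifunc ("crc64_resolve")));

The dropper's patch, as described above, amounted to making resolvers like these call into the injected blob, giving it control during that early-load window.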
Re: GNU Coding Standards, automake, and the recent xz-utils backdoor
Zack Weinberg wrote: On Mon, Apr 1, 2024, at 2:04 PM, Russ Allbery wrote: "Zack Weinberg" writes: It might indeed be worth thinking about ways to minimize the difference between the tarball "make dist" produces and the tarball "git archive" produces, starting from the same clean git checkout, and also ways to identify and audit those differences. There is extensive ongoing discussion of this on debian-devel. There's no real consensus in that discussion, but I think one useful principle that's emerged that doesn't disrupt the world *too* much is that the release tarball should differ from the Git tag only in the form of added files. Any files that are present in both Git and in the release tarball should be byte-for-byte identical. That dovetails nicely with something I was thinking about myself. Obviously the result of "make dist" should be reproducible except for signatures; to the extent it isn't already, those are bugs in automake. But also, what if "make dist" produced *two* disjoint tarballs? One of which is guaranteed to be byte-for-byte identical to an archive of the VCS at the release tag (in some clearly documented fashion; AIUI, "git archive" does *not* do what we want). The other contains all the files that "autoreconf -i" or "./bootstrap.sh" or whatever would create, but nothing else. Diffs could be provided for both tarballs, or only for the VCS-archive tarball, whichever turns out to be more compact (I can imagine the diff for the generated-files tarball turning out to be comparable in size to the generated-files tarball itself). The way to do that is to detect that "make dist" is being run in a VCS checkout, ask the VCS which files are in version control, and assume the others were somehow "brought in" by autogen.sh or whatever. The first problem is that Automake then needs to start growing support for varying version control systems, unless we /really/ want to say that this feature only works with Git. The second problem is that the disjoint tarballs both need to be unpacked in the same directory to build the package, and once that is done, how does "make dist" rebuild the distribution it was run from? The file lists would need to be stored in the generated-files tarball. Another problem is that this really needs to be an option. DejaGnu, for example, stores the Autotools-generated files in Git and releases are just snapshots of the working tree. (DejaGnu can also now *run* from a Git checkout without actually installing it, but that is a convenience limited to interpreted languages.) Lastly, publishing a modified (third-party) distribution derived from a release instead of VCS *is* permitted. (I believe this is a case of freedom 3.) How would this feature interact with that? -- Jacob
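For the Git-only case, the file-list split that the two-tarball idea needs is easy to compute, which illustrates both its appeal and its VCS-specific nature. A rough sketch, assuming Git, an Automake package, and a clean checkout (the package directory name is hypothetical):

    # Partition the distributed files into VCS-tracked and generated-but-shipped sets.
    git ls-files | sort > vcs-files.list
    make distdir                              # Automake target: populate ./foo-1.0/ as "make dist" would
    ( cd foo-1.0 && find . -type f | sed 's|^\./||' | sort ) > dist-files.list
    comm -13 vcs-files.list dist-files.list   # files shipped in the dist but not under version control

Everything after the first line is VCS-agnostic; it is only the "which files are in version control" question that ties the feature to a particular VCS.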
Re: GNU Coding Standards, automake, and the recent xz-utils backdoor
Russ Allbery wrote: [...] There is extensive ongoing discussion of this on debian-devel. There's no real consensus in that discussion, but I think one useful principle that's emerged that doesn't disrupt the world *too* much is that the release tarball should differ from the Git tag only in the form of added files. From what I understand, the xz backdoor would have passed this check. The backdoor dropper was hidden in test data files that /were/ in the repository, and required code in the modified build-to-host.m4 to activate it. The m4 files were not checked into the repository, instead being added (presumably by running autogen.sh with a rigged local m4 file collection) while preparing the release. Someone with a copy of a crocked release tarball should check if configure even had the backdoor "as released" or if the attacker was /depending/ on distributions to regenerate configure before packaging xz. -- Jacob
Re: GNU Coding Standards, automake, and the recent xz-utils backdoor
Zack Weinberg wrote: [...] but I do think there's a valid point here: the malicious xz maintainer *might* have been caught earlier if they had committed the build-to-host.m4 modification to xz's VCS. That would require someone to notice that xz.git has a build-to-host.m4 that does not exist anywhere in the history of gnulib.git. That is a fairly complex scan, although it does look straightforward to implement. That said, the m4 files in Gnulib *are* Free Software, so having a modified version cannot itself raise too many concerns. (Or they might not have! Witness the three (and counting) malicious patches that they barefacedly submitted to *other* software and got accepted because the malice was subtle enough to pass through code review.) Exactly. :-/ That said, the whole thing looks to me like the attackers were trying to /not/ hit the more (what is the best word?) "advanced" users---the backdoor would only be inserted if building distribution packages, and then only under dpkg or rpm, not other systems like Gentoo's Portage or in an unpackaged "./configure && make && sudo make install" build. This would, of course, hit the most widely used systems, including (reports are that the sock farm tried very hard to get Ubuntu to ship the crocked version in their upcoming release, but the freeze was upheld) the systems most commonly used by less technically-skilled users, but pointedly exclude systems that require greater skill to use---and whose users would be more likely to notice anything amiss and start tearing the system apart with the debugger. Unfortunately for Mr. Sockmaster, it turns out that some highly-skilled users *do* use the widely-used systems and the backdoor caused sshd to misbehave enough to draw suspicion. (A profiling report that sshd is spending most of its time in liblzma---a library it has no reason to use---will tend to raise a few eyebrows. :-) ) [...] Maybe the best revision to the GNU Coding Standards would be that releases should, if at all possible, contain only text? Any binary files needed for testing can be generated during "make check" if necessary I don't think this is a good idea. It's only a speed bump for someone trying to smuggle malicious data into a package (think "base64 -d") and it makes life substantially harder for honest authors of programs that work with binary data, and authors of material whose "source code" (as GPLv3 uses that term) *is* binary data. Consider pngsuite, for instance (http://www.schaik.com/pngsuite/) -- it would be a *ton* of work to convert each of these test PNG files into GNU Poke scripts, and probably the result would be *less* ergonomic for purposes of improving the test suite. That is a bad example because SNG (https://sng.sourceforge.net/) exists precisely to provide a text representation of PNG binary structures. (Admittedly, if I recall correctly, the contents of IDAT are simply a hexdump.) While we are on the topic, this leaves the other obvious place to hide binary data: images used as part of the manual. There is a reason that I added the "if at all possible" caveat, and I am not certain that it is always possible. I would like to suggest that a more useful policy would be "files written to $prefix by 'make install' should not have any data dependency on files labeled as part of the package's testsuite". That doesn't constrain honest authors and it seems within the scope of what the reproducible builds people could test for.
(Build the package, install to nonce prefix 1, unpack the tarball again, delete the test suite, build again, install to prefix 2, compare.) Of course a sufficiently determined malicious coder could detect the reproducible-build test environment, but unlike "no binary data" this is a substantial difficulty increment. This could be a good idea. Another way to check this even without reproducible builds would be to ensure that the access timestamps on testsuite files do not change while "make" is processing the main sources. Checking this is slightly more invasive, since you would need to run a hook between processing top-level directories during the main build, but for packages using recursive Automake, you could simply run "make -C src" (or wherever the main sources are) and make sure that the testsuite files still have the same atime afterwards. I admit that this is harder to automate in general, but distribution packaging processes already have other metadata that is manually maintained, so identifying the source subtrees that yield the installable artifacts should not be difficult. Now that I think about it, I suggest tightening that policy a bit further: "files produced by make in the source subtree (typically src/) shall have no data dependency on files outside of that tree" I doubt anyone ever thought that recursive make could end up as a security/verifiability feature. 8-| -- Jacob
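A rough sketch of the two-prefix comparison described in the parenthetical above (the package name is hypothetical, and a real harness would also have to normalize embedded prefix paths, e.g. in .la or pkg-config files, before comparing):

    # Build and install once with the testsuite present...
    tar xf foo-1.0.tar.gz
    ( cd foo-1.0 && ./configure --prefix=/tmp/with-tests && make && make install )
    rm -rf foo-1.0

    # ...then again with the testsuite deleted, and compare the installed trees.
    tar xf foo-1.0.tar.gz
    rm -rf foo-1.0/tests
    ( cd foo-1.0 && ./configure --prefix=/tmp/without-tests && make && make install )

    diff -r /tmp/with-tests /tmp/without-tests   # any difference is suspicious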
Re: automated release building service
Bruno Haible wrote: Jacob Bachmeyer wrote: Essentially, this would be an automated release building service: upon request, make a Git checkout, run autogen.sh or equivalent, make dist, and publish or hash the result. The problem is that an attacker who manages to gain commit access to a repository may be able to launch attacks on the release building service, since "make dist" can run scripts. The service could probably mount the working filesystem noexec since preparing source releases should not require running (non-system) binaries and scripts can be run by directly feeding them into their interpreters even if the filesystem is mounted noexec, but this still leaves all available interpreters and system tools potentially available. Well, it'd at least make things more difficult for the attacker, even if it wouldn't stop them completely. Actually, no, it would open a *new* target for attackers---the release building service itself. Mounting the scratchpad noexec would help to complicate attacks on that service, but right now there is *no* central point for an attacker to hit to compromise releases. If a central release building service were set up, it would be a target, and an attacker able to arrange a persistent compromise of the service could then tamper with later releases as they are built. This should be fairly easy to catch, if an honest maintainer has a secure environment, ("Why does the central release service tarball not match mine? And what is the extra code in this diff between its tarball and mine!?") but there is a risk that, especially for large projects, maintainers start relying on the central release service instead of building their own tarballs. The problem here was not a maintainer with a compromised system---it seems that "Jia Tan" was a malefactor's sock puppet from the start. There are several problems that such an automated release building service would create. Here are a couple of them:

* First of all, it's a statement of mistrust towards the developer/maintainer, if developers get pressured into using an automated release building service rather than producing the tarballs on their own. This demotivates and turns off developers, and it does not fix the original problem: If a developer is in fact a malefactor, they can also commit malicious code; they don't need to own the release process in order to do evil things.

Limiting trust also limits the value of an attack, thus protecting the developers/maintainers from at least sane attackers in some ways. I also think that this point misunderstands the original proposal (or I have misunderstood it). To some extent, projects using Automake already have that automated release building service; we call it "make dist" and it is a distributed service running on each maintainer's machine, including distribution package maintainers who regenerate the Autotools files. A compromise of a developer's machine is thus valuable as it allows tampering with releases, but the risk is managed somewhat by each developer building only their own releases. A central service as a "second opinion" would be a risk, but would also make those compromises even more difficult---now the attacker must hit both the central service *and* the dev box *and* coordinate to ensure that only packages prepared at the central service for which the maintainer's own machine is cracked are tampered with, lest the whole thing be detected. This is even harder on the attacker, which is a good thing, of course.
The more dangerous risk is that the central service becomes overly trusted and ceases to be merely the "second opinion" on a release. If that occurs, not only would we be right back to no real check on the process, but now *all* the releases come from one place. A compromise of the central release service would then allow *all* releases to be tampered with, which is considerably more valuable to an attacker.

* Such an automated release building service is a piece of SaaSS. I can hardly imagine how we at GNU tell people "SaaSS is as bad as, or worse than, proprietary software" and at the same time advocate the use of such a service.

As long as it runs on published Free Software and anyone is free to set up their own instance, I would disagree here. I think we need to work out where the line between "hosting" and "SaaSS" actually is, and I am not sure that it has a clear technical description, since SaaSS is ultimately an ethical issue.

* Like Jacob mentioned, such a service quickly becomes a target for attackers. So, instead of trusting a developer, you now need to trust the technical architecture and the maintainers of such a service.

I think I may know an example of something similar:
Re: libsystemd dependencies
Bruno Haible wrote: Jacob Bachmeyer wrote: some of the blame for this needs to fall on the systemd maintainers and their "katamari" architecture. There is no good reason for notifications of daemon startup to pull in liblzma, but using libsystemd for that purpose does exactly that, and ended up getting xz-utils targeted as a means of getting to sshd without the OpenSSH maintainers noticing. The systemd people are working on reducing the libsystemd dependencies: https://github.com/systemd/systemd/issues/32028 However, the question remains unanswered why it needs 3 different compression libraries (liblzma, libzstd, and liblz4). Why would one not suffice? From reading other discussions, the only reason libsystemd pulls in compression libraries at all is its "katamari" architecture: the systemd journal can be optionally compressed with any of those algorithms, and the support for reading the journal (which libsystemd also provides) therefore requires support for all of them. No, sshd (even with the distribution patches at issue) does /not/ use that support whatsoever. Better design would split libsystemd into separate libraries: libsystemd-notify, libsystemd-journal, etc. I suspect that there are more logically distinct modules that have been "katamaried" into one excessively large library. The C runtime library has an excuse for being such an agglomeration, but also note that libc has *zero* hard external dependencies. You can ridicule NSS if you like, but NSS modules are only loaded if NSS is used. (To be fair, sshd almost certainly /does/ use functions provided by NSS.) The systemd developers do not have that excuse, and their library *has* external dependencies. I believe the systemd developers cite convenience as justification for the practice, because apparently figuring out which libraries (out of a set partitioned based on functionality) you need to link is "too hard" for developers these days. (Perhaps that is the real reason they want to replace X11?) That "convenience" nearly got all servers on the Internet running the major distributions backdoored with critical severity and we do not yet know exactly what the backdoor blob did. The preliminary reports that it was an RCE backdoor that would pass commands smuggled in public key material in SSH certificates to system(3) (as root of course, since that is sshd's context at that stage) are inconsistent with the slowdown that caused the backdoor to be discovered. I doubt that SSH logins were using that code path, and the SSH scanning botnets almost certainly are not presenting certificates, yet it apparently (reports have been unclear on this point) was the botnet scanning traffic that led to the discovery of sshd wasting considerable CPU time in liblzma... I am waiting for the proverbial other shoe to drop on that one. -- Jacob
Re: GNU Coding Standards, automake, and the recent xz-utils backdoor
Jose E. Marchesi wrote: Jose E. Marchesi wrote: [...] I agree that distcheck is good but not a cure all. Any static system can be attacked when there is motive, and unit tests are easily gamed. The issue seems to be releases containing binary data for unit tests, instead of source or scripts to generate that data. In this case, that binary data was used to smuggle in heavily obfuscated object code. As a side note, GNU poke (https://jemarch.net/poke) is good for generating arbitrarily complex binary data from clear textual descriptions. While it is suitable for that use, at last check poke is itself very complex, complete with its own JIT-capable VM. This is good for interactive use, but I get nervous about complexity in testsuites, where simplicity can greatly aid debugging, and it /might/ be possible to hide a backdoor similarly in a poke pickle. (This seems to be a general problem with powerful interactive editors.) Yes, I agree simplicity is very desirable, in testsuites and actually everywhere else. I also am not fond of dragging in dependencies. Exactly---I am sure that poke is great for interactive use, but a self-contained solution is probably better for a testsuite. But I suppose we also agree that it is not possible to assemble non-trivial binary data structures in a simple way, without somehow moving the complexity of the encoding into some sort of generator, which will not be simple. The GDB testsuite, for example, ships with a DWARF assembler written in around 3000 lines of Tcl. Sure, it is simpler than poke and doesn't drag in additional dependencies. But it has to be carefully maintained and kept up to date, and the complexity is there. The problem for a compression tool testsuite is that compression formats are (I believe) defined as byte-streams or bit-streams. Further, the generator(s) must be able to produce /incorrect/ output as well, in order to test error handling. Further, GNU poke defines its own specialized programming language for manipulating binary data. Supplying generator programs in C (or C++) for binary test data in a package that itself uses C (or C++) ensures that every developer with the skills to improve or debug the package can also understand the testcase generators. Here we will have to disagree. IMO it is precisely the many and tricky details of properly marshaling binary data in general-purpose programming languages that would have greater odds of leading to difficult-to-understand, difficult-to-maintain, and possibly buggy or malicious encoders. The domain-specific language is here an advantage, not a liability.
This you need to do in C to encode and generate test data for a single signed 32-bit NUMBER in an output file in a _more or less_ portable way:

  void
  generate_testdata (off_t offset, int endian, int number)
  {
    int bin_flag = 0, fd;
  #ifdef _WIN32
    int bin_flag = O_BINARY;
  #endif

    fd = open ("testdata.bin", bin_flag, S_IWUSR);
    if (fd == -1)
      fatal ("error generating data.");

    if (endian == BIG)
      {
        b[0] = (number >> 24) & 0xff;
        b[1] = (number >> 16) & 0xff;
        b[2] = (number >> 8) & 0xff;
        b[3] = number & 0xff;
      }
    else
      {
        b[3] = (number >> 24) & 0xff;
        b[2] = (number >> 16) & 0xff;
        b[1] = (number >> 8) & 0xff;
        b[0] = number & 0xff;
      }

    lseek (fd, offset, SEEK_SET);
    for (i = 0; i < 4; ++i)
      write (fd, &b[i], 1);
    close (fd);
  }

While that is a nice general solution, (aside from neglecting the declaration "uint8_t b[4];"; with "int b[4];", the code would only work on a little-endian processor; with no declaration, the compiler will reject it) a compression format would be expected to define the endianness of stored values, so the major branch in that function would collapse to just one of its alternatives. Compression formats are generally defined as streams, so a different decomposition of the problem would likely make more sense: (example untested)

  void
  emit_int32le (FILE * out, int value)
  {
    unsigned int R, i;
    for (R = (unsigned int)value, i = 0; i < 4; R = R >> 8, i++)
      if (fputc(R & 0xff, out) == EOF)
        fatal("error writing int32le");
  }

Other code handles opening OUT, or OUT is actually stdout and we are writing down a pipe or the shell handled opening the file. (The main function can easily check that stdout is not a terminal and bail out if it is.) Remember that I am suggesting test generator programs, which do not need to be as general as ordinary code, nor do they need the same level of user-friendliness, since they are expected to be run from scripts that encode the precise knowledge of how to call them. (That this version is also probably more efficient by avoiding a syscall for every byte written is irrelevant for its intended use.) This i
Re: GNU Coding Standards, automake, and the recent xz-utils backdoor
Tomas Volf wrote: On 2024-03-31 14:50:47 -0400, Eric Gallager wrote: With a reproducible build system, multiple maintainers can "make dist" and compare the output to cross-check for erroneous / malicious dist environments. Multiple signatures should be harder to compromise, assuming each is independent and generally trustworthy. This can only work if a package /has/ multiple active maintainers. Well, other people besides the maintainers can also run `make dist` and `make distcheck`. My idea was to get end-users in the habit of running `make distcheck` themselves before installing stuff. And if that's too much to ask of end users, I'd also point out that there are multiple kinds of maintainer: besides the upstream maintainer, there are also usually separate distro maintainers. Even if there's only 1 upstream maintainer, as was the case here, I still think that it would be good to get distro maintainers in the habit of including `make distcheck` as part of their own release process, before they accept updates from upstream. What would be helpful is if `make dist' would guarantee to produce the same tarball (bit-to-bit) each time it is run, assuming the tooling is the same version. Currently I believe that is not the case (at least due to timestamps). A "tardiff" tool that ignores timestamps would be a solution to that problem, but not to this backdoor. Combined with GNU Guix that would allow a simple way to verify that `make dist' was used, and the resulting artifact not tampered with, even without any central signing. The Guix "challenge" operation would not have detected this backdoor because *it* *was* *in* *the* *upstream* *release*. The build service works from that release tarball and you build from that same release tarball. GNU Guix ensures an equivalent build environment and your results *will* match---either the backdoor was not inserted or it was inserted in both builds. The flow of the attack as I understand it was:

(0) (speculation on motivation) The attacker wanted a "Golden Key" to SSH and started looking for ways to backdoor sshd.

(1) The attacker starts a sockpuppet campaign and manages to get one of his sockpuppets appointed co-maintainer of xz-utils.

(2) [2023-06-27] The sockpuppet merges a pull request believed to be from another sockpuppet in commit ee44863ae88e377a5df10db007ba9bfadde3d314.

(3) [2024-02-15] The sockpuppet "updates m4/.gitignore" to add build-to-host.m4 to the list in commit 4323bc3e0c1e1d2037d5e670a3bf6633e8a3031e.

(4) [2024-02-23] The sockpuppet adds 5 files to the xz-utils testsuite in commit cf44e4b7f5dfdbf8c78aef377c10f71e274f63c0.

(5) [2024-03-08] To cover tracks, the sockpuppet finally adds a test using bad-3-corrupt_lzma2.xz in commit a3a29bbd5d86183fc7eae8f0182dace374e778d8.

(6) [2024-03-08] The sockpuppet revises two of those files with a lame excuse in commit a3a29bbd5d86183fc7eae8f0182dace374e778d8.

The quick analysis of the Git history supporting steps 2 - 6 above has turned up another interesting detail: no version of configure.ac actually committed ever used the gl_BUILD_TO_HOST macro. An analysis found on pastebin noted that build-to-host.m4 is a dependency of gettext.m4. Following up finds that commit 3adaddd73c8edcceaed059e859bd5262df65fc5a of 2023-02-18 in the GNU gettext repository introduced the use of gl_BUILD_TO_HOST, apparently as part of moving some existing path translation logic to gnulib and generalizing it for use elsewhere.
This commit is innocent (it is *extremely* unlikely that Bruno Haible was involved in the backdoor campaign) and also explains why the backdoor was checking for "dnl Convert it to C string syntax." in m4/gettext.m4: that comment was removed in the same commit that switched to using gl_BUILD_TO_HOST. The change to gettext also occurred about a year before the sockpuppet began to take advantage of it. It almost "feels like" the attacker was waiting for an opportunity to make plausible changes to autoconf macros and finally got one when updating the m4/ files for the 5.6.0 release. Could someone with the release tarballs confirm that m4/gettext.m4 was updated between v5.5.2beta and v5.6.0? I doubt the entire backdoor was developed in the week between those two commits. In fact, the timing around introducing ifuncs suggests to me that the binary blob was at least well into development by mid-2023. The commit message at step 2 claims that using ifuncs with -fsanitize=address causes segfaults. If this is true generally, the glibc team should probably reconsider whether the abuse potential is worth the benefit of the feature and possibly investigate how the feature was introduced to glibc. If this was an excuse, it provided a clever way to prevent oss-fuzz from finding the backdoor, as disabling ifuncs provides a conveniently hidden flag to disable the backdoor. While double-checking the above, I stumb
Re: GNU Coding Standards, automake, and the recent xz-utils backdoor
Eric Gallager wrote: On Sun, Mar 31, 2024 at 3:20 AM Jacob Bachmeyer wrote: dherr...@tentpost.com wrote: [...] The issue seems to be releases containing binary data for unit tests, instead of source or scripts to generate that data. In this case, that binary data was used to smuggle in heavily obfuscated object code. [...] Maybe this is something that the GNU project could start making stronger recommendations about. The key issue seems to be generating binary test data during `make` or `make check`, using GNU poke, GNU Awk, Perl, Tcl, small C programs, or something else, instead of packaging it in the release. The xz-utils backdoor was smuggled into the repository wrapped in compressed test data. With a reproducible build system, multiple maintainers can "make dist" and compare the output to cross-check for erroneous / malicious dist environments. Multiple signatures should be harder to compromise, assuming each is independent and generally trustworthy. This can only work if a package /has/ multiple active maintainers. Well, other people besides the maintainers can also run `make dist` and `make distcheck`. My idea was to get end-users in the habit of running `make distcheck` themselves before installing stuff. And if that's too much to ask of end users, I'd also point out that there are multiple kinds of maintainer: besides the upstream maintainer, there are also usually separate distro maintainers. Even if there's only 1 upstream maintainer, as was the case here, I still think that it would be good to get distro maintainers in the habit of including `make distcheck` as part of their own release process, before they accept updates from upstream. The problem with that is that `make distcheck` only verifies that the working tree can produce a reasonable release tarball. The backdoored xz-utils releases *would* *have* *passed* *this* *test* as far as I can determine. It catches errors like omitting files from the lists in Makefile.am. It will *not* catch a modified m4 file or questionable test data that has been properly listed as part of the release. Maybe GNU should establish a cross-verification signing standard and "dist verification service" that automates this process? Point it to a repo and tag, request a signed hash of the dist package... Then downstream projects could check package signatures from both the maintainer and such third-party verifiers to check that nothing was inserted outside of version control. Essentially, this would be an automated release building service: upon request, make a Git checkout, run autogen.sh or equivalent, make dist, and publish or hash the result. The problem is that an attacker who manages to gain commit access to a repository may be able to launch attacks on the release building service, since "make dist" can run scripts. The service could probably mount the working filesystem noexec since preparing source releases should not require running (non-system) binaries and scripts can be run by directly feeding them into their interpreters even if the filesystem is mounted noexec, but this still leaves all available interpreters and system tools potentially available. Well, it'd at least make things more difficult for the attacker, even if it wouldn't stop them completely. Actually, no, it would open a *new* target for attackers---the release building service itself. Mounting the scratchpad noexec would help to complicate attacks on that service, but right now there is *no* central point for an attacker to hit to compromise releases. 
If a central release building service were set up, it would be a target, and an attacker able to arrange a persistent compromise of the service could then tamper with later releases as they are built. This should be fairly easy to catch, if an honest maintainer has a secure environment, ("Why does the central release service tarball not match mine? And what is the extra code in this diff between its tarball and mine!?") but there is a risk that, especially for large projects, maintainers start relying on the central release service instead of building their own tarballs. The problem here was not a maintainer with a compromised system---it seems that "Jia Tan" was a malefactor's sock puppet from the start. -- Jacob
Re: GNU Coding Standards, automake, and the recent xz-utils backdoor
Jose E. Marchesi wrote: [...] I agree that distcheck is good but not a cure all. Any static system can be attacked when there is motive, and unit tests are easily gamed. The issue seems to be releases containing binary data for unit tests, instead of source or scripts to generate that data. In this case, that binary data was used to smuggle in heavily obfuscated object code. As a side note, GNU poke (https://jemarch.net/poke) is good for generating arbitrarily complex binary data from clear textual descriptions. While it is suitable for that use, at last check poke is itself very complex, complete with its own JIT-capable VM. This is good for interactive use, but I get nervous about complexity in testsuites, where simplicity can greatly aid debugging, and it /might/ be possible to hide a backdoor similarly in a poke pickle. (This seems to be a general problem with powerful interactive editors.) Further, GNU poke defines its own specialized programming language for manipulating binary data. Supplying generator programs in C (or C++) for binary test data in a package that itself uses C (or C++) ensures that every developer with the skills to improve or debug the package can also understand the testcase generators. -- Jacob
Re: GNU Coding Standards, automake, and the recent xz-utils backdoor
dherr...@tentpost.com wrote: On 2024-03-30 18:25, Bruno Haible wrote: Eric Gallager wrote: Hm, so should automake's `distcheck` target be updated to perform these checks as well, then? The first mentioned check can not be automated. ... The second mentioned check could be done by the maintainer, ... I agree that distcheck is good but not a cure all. Any static system can be attacked when there is motive, and unit tests are easily gamed. The issue seems to be releases containing binary data for unit tests, instead of source or scripts to generate that data. In this case, that binary data was used to smuggle in heavily obfuscated object code. The best analysis in one place that I have found so far is <https://gynvael.coldwind.pl/?lang=en&id=782>. In brief, grep is used to locate the main backdoor files by searching for marker strings. After running tests/files/bad-3-corrupt_lzma2.xz through tr(1), it becomes a /valid/ xz file that decompresses to a shell script that extracts a second shell script from part of the compressed data in tests/files/good-large_compressed.lzma and pipes it to a shell. That second script has two major functions: first, it searches the test files for four six-byte markers, and it then extracts and decrypts (using a simple RC4-alike implemented in Awk) the binary backdoor also found in tests/files/good-large_compressed.lzma. The six-byte markers mark the beginning and end of raw LZMA2 streams obfuscated with a simple substitution cipher. Any such streams found would be decompressed and read by the shell, but neither of the known crocked releases had any files containing those markers. The binary backdoor is an x86-64 object that gets unpacked into liblzma_la-crc64-fast.o, unless m4/gettext.m4 contains "dnl Convert it to C string syntax." which is a clever flag because almost no one actually checks that those m4 files in release tarballs actually match what the GNU project distributes. The object itself is just the backdoor and presumably provides the symbol _get_cpuid as its entrypoint, since the unpacker script patches the src/liblzma/check/crc{64,32}_fast.c files in a pipeline to add calls to that function and drops the compiled objects in .libs/. Running make will then skip building those objects, since they are already up-to-date, and the backdoored objects get linked into the final binary. Commit 6e636819e8f070330d835fce46289a3ff72a7b89 (<https://git.tukaani.org/?p=xz.git;a=commitdiff;h=6e636819e8f070330d835fce46289a3ff72a7b89>) was an update to the backdoor. The commit message is suspicious, claiming the use of "a constant seed" to generate reproducible test files, but /not/ declaring how the files were produced, which of course prevents reproducibility. With a reproducible build system, multiple maintainers can "make dist" and compare the output to cross-check for erroneous / malicious dist environments. Multiple signatures should be harder to compromise, assuming each is independent and generally trustworthy. This can only work if a package /has/ multiple active maintainers. You also have a small misunderstanding here: "make dist" prepares a (source) release tarball, not a binary build, so this is a closely-related issue but actually distinct from reproducible builds. Also easier to solve, since we only have to make the source tarball reproducible. Maybe GNU should establish a cross-verification signing standard and "dist verification service" that automates this process? Point it to a repo and tag, request a signed hash of the dist package...
Then downstream projects could check package signatures from both the maintainer and such third-party verifiers to check that nothing was inserted outside of version control. Essentially, this would be an automated release building service: upon request, make a Git checkout, run autogen.sh or equivalent, make dist, and publish or hash the result. The problem is that an attacker who manages to gain commit access to a repository may be able to launch attacks on the release building service, since "make dist" can run scripts. The service could probably mount the working filesystem noexec since preparing source releases should not require running (non-system) binaries and scripts can be run by directly feeding them into their interpreters even if the filesystem is mounted noexec, but this still leaves all available interpreters and system tools potentially available. -- Jacob
Re: GNU Coding Standards, automake, and the recent xz-utils backdoor
Eric Gallager wrote: Specifically, what caught my attention was how the release tarball containing the backdoor didn't match the history of the project in its git repository. That made me think about automake's `distcheck` target, whose entire purpose is to make it easier to verify that a distribution tarball can be rebuilt from itself and contains all the things it ought to contain. The problem is that a release tarball is a freestanding object, with no dependency on the repository from which it was produced. In this case, the attacker added a bogus "update" of build-to-host.m4 from gnulib to the release tarball, but that file is not stored in the Git repository. This would not have tripped "make distcheck" because the crocked tarball can indeed be used to rebuild another crocked tarball. As Alexandre Oliva mentioned in his reply, there is not really any good way to prevent this, since the attacker could also patch the generated configure script more directly. (I seem to remember past incidents where tampered release tarballs had configure scripts that would download and run shell scripts. If you ran configure as root, well...) The *user* could catch issues like this backdoor, since the backdoor appears (based on what I have read so far) to materialize certain object files while configure is running, while `find . -iname '*.o'` /should/ return nothing before make is run. This also suggests that running "make clean" after configure would kill at least this backdoor. A *very* observant (unreasonably so) user might notice that "make" did not build the objects that the backdoor provided. Of course, an attacker could sneak around this as well by moving the process for unpacking the backdoor object to a Makefile rule, but that is more likely to "stick out" to an observant user, as well as being an easy target for automated analysis ("Which files have 'special' rules?") since you cannot obfuscate those from make(1) and expect them to still work. In this case, the backdoor was ultimately discovered when it caused performance problems in sshd, which should not be using liblzma at all, but gets linked with it courtesy of libsystemd on major GNU/Linux distributions. Yes, this means that systemd is a contributing factor to this incident, and that is aggravated by its unnecessary use of excessive dependencies. (Sending a notification that a daemon is ready should /not/ require compression support of any type. The "katamari" architecture model used in systemd had the effect here of broadening the supply-chain attack surface for OpenSSH sshd to include xz-utils, which is insane.) The bulk of the attack payload seems to have been stored in the Git repository, disguised as binary test data in files tests/files/{bad-3-corrupt_lzma2.xz,good-large_compressed.lzma}. The modified build-to-host.m4 merely added code to configure to start the process of unpacking the backdoor. In a build from Git, the legitimate build-to-host.m4 would get copied in from gnulib and the backdoor would remain hidden. Maybe the best revision to the GNU Coding Standards would be that releases should, if at all possible, contain only text? Any binary files needed for testing can be generated during "make check" if necessary, with generator programs packaged (as source or scripts) in the release. -- Jacob
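The user-level check described above is simple enough to script; a rough sketch (any hits before make has run deserve investigation):

    # After configure finishes but before make runs, no compiled objects should exist.
    ./configure
    find . \( -name '*.o' -o -name '*.lo' -o -name '*.so' \) -print
    # Any output here means something other than make has already produced object
    # code; as noted above, running "make clean" right after configure would also
    # have removed this particular backdoor's planted objects.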
Re: [RFC PATCH]: autom4te: report subsecond timestamp support in --version
Zack Weinberg wrote: On Mon, Dec 4, 2023, at 7:26 PM, Jacob Bachmeyer wrote: Now that I have seen the actual patch, yes, this test should be accurate. The test in the main autom4te script will also work, even if there is a mismatch between the script and its library Good. This appears to be misaligned with the GNU Coding Standards, which states: "The first line is meant to be easy for a program to parse; the version number proper starts after the last space." Perhaps the best option would be to conditionally add a line "This autom4te supports subsecond timestamps." after the license notice? I don't like putting anything after the license notice because it's convenient to be able to pipe --version output to sed '/Copyright/,$d' without losing anything relevant for troubleshooting. So how about $ autom4te --version autom4te (GNU Autoconf) 2.71 Features: subsecond-timestamps Copyright (C) 2021 Free Software Foundation, Inc. License GPLv3+/Autoconf: GNU GPL version 3 or later <https://gnu.org/licenses/gpl.html>, <https://gnu.org/licenses/exceptions.html> This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law. Written by Akim Demaille. This preserves the effectiveness of sed '/Copyright/,$d' and also leaves room for future additions to the "Features:" line. That looks like a good idea to me, although the GNU Coding Standards do say (section 4.8.1, "--version") that the copyright and license notices "should" immediately follow the version numbers. The presence or absence of this feature is effectively determined by something similar to a library version (the availability of the Perl Time::HiRes module) and it is expected to be important for debugging, which is the criteria stated for listing library versions. Further, "should" does not express an absolute requirement and no rationale that would effectively make an absolute requirement (like a rule for automated parsing) is given here, unlike for the version in the first line. -- Jacob
Re: rhel8 test failure confirmation?
Zack Weinberg wrote: On Mon, Dec 4, 2023, at 7:14 PM, Jacob Bachmeyer wrote: Zack Weinberg wrote: [snip everything addressed in the other thread] Yes, there was a bit of confusion here; not only is the FileUtils module synchronized between autom4te and automake Thanks for reminding me that I need to make sure all those files are actually in sync before I cut the final 2.72 release. require Time::HiRes; import Time::HiRes qw(stat); I believe that the import is not actually necessary The previous line is a "require", not a "use", so I believe it _is_ necessary. Have I misunderstood? ... should do no harm as long as any use of stat in the code is prepared to handle floating-point timestamps. There's only one use, in 'sub mtime', and that's the place where we actively want the floating-point timestamps. Yes, before seeing your actual patch, I had the mistaken impression that this code was in autom4te itself, not the FileUtils module. The import is needed in the FileUtils module, so the patch is correct. -- Jacob
Re: [RFC PATCH]: autom4te: report subsecond timestamp support in --version
Zack Weinberg wrote: The Automake test suite wants this in order to know if it’s safe to reduce the length of various delays for the purpose of ensuring files in autom4te.cache are newer than the corresponding source files. * lib/Autom4te/FileUtils.pm: Provide (but do not export) a flag $subsecond_mtime, indicating whether the ‘mtime’ function reports modification time with precision greater than one second. Reorganize commentary and import logic for clarity. Add configuration for emacs’ perl-mode to the bottom of the file. Now that I have seen the actual patch, yes, this test should be accurate. The test in the main autom4te script will also work, even if there is a mismatch between the script and its library, since Perl accepts a fully-qualified variable name even if that variable has never been declared; its value is undef, which is falsish in Boolean context. * bin/autom4te.in ($version): If $Autom4te::FileUtils::subsecond_mtime is true, add the text “ (subsecond timestamps supported)” to the first line of --version output. This appears to be misaligned with the GNU Coding Standards, which states: "The first line is meant to be easy for a program to parse; the version number proper starts after the last space." Perhaps the best option would be to conditionally add a line "This autom4te supports subsecond timestamps." after the license notice? -- Jacob
Re: rhel8 test failure confirmation?
Zack Weinberg wrote: On Sun, Dec 3, 2023, at 4:49 PM, Karl Berry wrote: There would not need to be much parsing, just "automake --version | grep > HiRes" in that case, or "case `automake --version` in *HiRes*) ...;; > easc" to avoid running grep if you want. I specifically want to hear what Karl thinks. I lean towards Jacob's view that automake --version | grep HiRes will suffice. Not having a new option seems simpler/better in terms of later understanding, too. --thanks, karl. Did I misunderstand which program's --version output we are talking about? I thought we were talking about automake's testsuite probing the behavior of *autom4te*, but all the quoted text seems to be imagining a probe of *automake* instead. Yes, there was a bit of confusion here; not only is the FileUtils module synchronized between autom4te and automake, but those two are near "sound-alikes" as I read them. Oops. The issue here seems to be determining if a fix that (I think) originated in automake has been applied to the active autom4te. [...] I'm not using the identifier "HiRes" because the use of Time::HiRes is an implementation detail that could change. For instance, if there's a third party CPAN module that lets us get at nanosecond-resolution timestamps *without* loss of precision due to conversion to an NV (aka IEEE double) we could, in the future, look for that first. That is fine, but "[HiRes]" or "[HiResTime]" is much shorter and we could use it as the name of the feature regardless of the underlying implementation. Characters in the first line of `autom4te --version` are a finite resource if we want it to fit on a standard 80-column terminal without wrapping. If we need to distinguish, "[HiRes] [HiRes-ns]" could be used to indicate your hypothetical integer nanosecond-resolution timestamp support, which would indicate also having sub-second timestamp support. I also suggest changing the tag, since the GNU Coding Standards call for the version number to be indicated by the last space, but parenthesized text between the name and version is supposed to be the package, so this would lead to: $ ./tests/autom4te --version autom4te [HiResTime] (GNU Autoconf) 2.72d.6-49ab3-dirty Copyright (C) 2023 Free Software Foundation, Inc. [...] Is this workable all the way around, everyone? Or should the feature be indicated with another line after the license notice? ("This autom4te has subsecond timestamp resolution.") My apologies for neglecting to check this before suggesting a tag in the --version output. The implementation is just BEGIN { our $subsecond_timestamps = 0; eval { require Time::HiRes; import Time::HiRes qw(stat); $subsecond_timestamps = 1; } } Jacob, can you confirm that's an accurate test, given all the things you said earlier about ways that grepping the source code might get it wrong? That will determine if (a) Time::HiRes could be loaded and (b) Time::HiRes::stat could be imported. This is the same test that Autom{ak,4t}e::FileUtils effectively uses to use Time::HiRes::stat. I believe that the import is not actually necessary (i.e. Time::HiRes always exported Time::HiRes::stat) but it should do no harm as long as any use of stat in the code is prepared to handle floating-point timestamps. As long as the autom4te script and its library are consistent (which is the distribution's problem if they screw that up), this test should be accurate. -- Jacob
Re: rhel8 test failure confirmation?
Karl Berry wrote: > There would not need to be much parsing, just "automake --version | grep > HiRes" in that case, or "case `automake --version` in *HiRes*) ...;; > esac" to avoid running grep if you want. I specifically want to hear what Karl thinks. I lean towards Jacob's view that automake --version | grep HiRes will suffice. Not having a new option seems simpler/better in terms of later understanding, too. --thanks, karl. P.S. As for case vs. grep, personally I find a simple if...grep easier to comprehend/test/debug than a case statement. (Especially the macro-ized AS_CASE, which just makes me have to look up its syntax every time I see it.) Also fewer lines of source. Granted calling the external grep is less efficient, but that seems insignificant to me. I understand Paul and others may disagree ... I agree that if...grep is more direct. I suggested the case alternative because it stands out in my memory after I needed it once, but I do not recall exactly why that contortion was needed. In configure, the efficiency difference is trivial because configure already runs many, many, many subprocesses. One more grep will not make a difference on any reasonable platform. -- Jacob
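For the record, the if...grep form everyone seems to prefer would look roughly like this (a sketch; the string to match and the variable name are placeholders, since the exact --version wording is still being settled upthread):
8<--
if $AUTOM4TE --version 2>/dev/null | grep 'subsecond' >/dev/null; then
  am_subsecond_mtime=yes
else
  am_subsecond_mtime=no
fi
8<--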
Re: rhel8 test failure confirmation?
Mike Frysinger wrote: On 02 Dec 2023 17:07, Jacob Bachmeyer wrote: Mike Frysinger wrote: On 06 Apr 2023 21:29, Jacob Bachmeyer wrote: Karl Berry wrote: jb> a more thorough test would locate the autom4te script and grep it for the perllibdir that was substituted when autoconf was configured. I guess that would work. Challenge accepted. Here's a refined version: (lines \-folded for email) if $PERL -I${autom4te_perllibdir:-$(sed -n \ '/autom4te_perllibdir/{s/^.*|| //;s/;$//;s/^.//;s/.$//;p;q}' \ <$(command -v autom4te))} -MAutom4te::FileUtils \ -e 'exit defined $INC{q[Time/HiRes.pm]} ? 0 : 1'; then # autom4te uses Time::HiRes else # autom4te does not use Time::HiRes fi this doesn't work on systems that wrap `autom4te`. [...] [...] so i don't know why we would need to set/export autom4te_perllibdir in our wrapper. we've been doing this for over 20 years without ever setting that var (or any other internal autoconf/automake var), so i'm pretty confident our approach is OK. No, not in the wrapper---in the automake ebuild script that runs configure to match the autom4te that the wrapper will run. That test I proposed checks for autom4te_perllibdir in the environment before extracting it from autom4te precisely so distributions like Gentoo would have a knob to turn if their packaging breaks that test. That said, it turns out that this whole line of discussion is now a red herring; see below. [...] i'm not aware of anything loading the Autom4te perl modules outside of the autoconf projects. does that actually happen ? i don't think we want to have automake start loading autoconf perl modules directly. going through the CLI interface seems better at this point. Autoconf and Automake are very closely associated; there is even some shared code that is synchronized between them. Autom4te::FileUtils is also Automake::FileUtils, for example. The test we are discussing here was intended for Automake's configure script to use to check if the installed Autoconf has high-resolution timestamp support. It turned out that the test I wrote can give a false positive, as some versions of other dependencies of Autom4te::FileUtils /also/ use Time::HiRes, causing the test to correctly report that Time::HiRes was loaded, but Autom4te::FileUtils nonetheless does not actually use it. The test could probably be improved to fix the false positives, but that would be getting into deep magic in Perl that might not be fully reliable across Perl versions. (It would be necessary to determine if (a) Time::HiRes::stat exists and (b) Autom4te::FileUtils::stat is an alias to it. Having configure build a special XSUB just to check this is well into "ridiculous" territory.) As such, the Automake maintainers replaced this particular test with a simpler test that just locates Autom4te/FileUtils.pm and greps it for "Time::HiRes", thus the error message you received, which initially had me confused because the test I proposed cannot produce that message as it does not use grep. An important bit of context to keep in mind is that I am not certain that timestamp resolution is still a problem outside of the Automake and Autoconf testsuites, since Autoconf and Automake now require cache files to actually be newer than their sources and consider the cache files stale if the timestamps are equal. 
This is a problem for the testsuite because the testsuite is supposed to actually exercise the caching mechanism, and higher-resolution timestamps can significantly reduce the amount of time required to run the tests by reducing the delays needed to ensure the caches will be valid. -- Jacob
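For readers wondering what the simpler replacement test amounts to, a rough sketch (not the maintainers' exact code, and the perllibdir caveat discussed in this message still applies, since Autom4te::FileUtils must be findable on @INC): ask Perl where it loaded the module from, then grep that file.
8<--
am_fileutils=`$PERL -MAutom4te::FileUtils \
  -e 'print $INC{q[Autom4te/FileUtils.pm]}' 2>/dev/null`
if test -n "$am_fileutils" && grep 'Time::HiRes' "$am_fileutils" >/dev/null; then
  : # this autom4te's FileUtils at least mentions Time::HiRes
else
  : # assume whole-second timestamps
fi
8<--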
Re: rhel8 test failure confirmation?
Zack Weinberg wrote: On Sat, Dec 2, 2023, at 7:33 PM, Jacob Bachmeyer wrote: Zack Weinberg wrote: Would it help if we added a command line option to autom4te that made it report whether it thought it could use high resolution timestamps? Versions of autom4te that didn't recognize this option should be conservatively assumed not to support them. Why not just add that information to the --version message? Add a "(HiRes)" tag somewhere if Time::HiRes is available? Either way is no problem from my end, but it would be more work for automake (parsing --version output, instead of just checking the exit status of autom4te --assert-high-resolution-timestamp-support) Karl, do you have a preference here? I can make whatever you decide on happen, in the next couple of days. There would not need to be much parsing, just "automake --version | grep HiRes" in that case, or "case `automake --version` in *HiRes*) ...;; esac" to avoid running grep if you want. -- Jacob
Re: rhel8 test failure confirmation?
Zack Weinberg wrote: On Sat, Dec 2, 2023, at 6:37 PM, Karl Berry wrote: The best way to check if high-resolution timestamps are available to autom4te is to have perl load Autom4te::FileUtils and check if that also loaded Time::HiRes. The problem with that turned out to be that Time::HiRes got loaded from other system modules, resulting in the test thinking that autom4te used it when that wasn't actually the case. That's what happened in practice with your patch. Would it help if we added a command line option to autom4te that made it report whether it thought it could use high resolution timestamps? Versions of autom4te that didn't recognize this option should be conservatively assumed not to support them. Why not just add that information to the --version message? Add a "(HiRes)" tag somewhere if Time::HiRes is available? All versions that know to check if Time::HiRes is loaded will also know how to use it, unlike the earlier test. (Of course there's the additional wrinkle that whether high resolution timestamps *work* depends on what filesystem autom4te.cache is stored in, but that's even harder to probe... one problem at a time?) Yes; even standard-resolution timestamps might not be "all there" with FAT and its infamous 2-second timestamp resolution. Is this actually still a problem (other than for ensuring the cache is used in the testsuite) after Bogdan's patches to require that cache files be strictly newer than their source files? -- Jacob
Re: rhel8 test failure confirmation?
Mike Frysinger wrote: On 06 Apr 2023 21:29, Jacob Bachmeyer wrote: Karl Berry wrote: jb> a more thorough test would locate the autom4te script and grep it for the perllibdir that was substituted when autoconf was configured. I guess that would work. Challenge accepted. Here's a refined version: (lines \-folded for email) if $PERL -I${autom4te_perllibdir:-$(sed -n \ '/autom4te_perllibdir/{s/^.*|| //;s/;$//;s/^.//;s/.$//;p;q}' \ <$(command -v autom4te))} -MAutom4te::FileUtils \ -e 'exit defined $INC{q[Time/HiRes.pm]} ? 0 : 1'; then # autom4te uses Time::HiRes else # autom4te does not use Time::HiRes fi this doesn't work on systems that wrap `autom4te`. Gentoo for example wraps all autoconf & automake scripts to support parallel installs of different versions. this way we can easily have access to every autoconf version. we got this idea from Mandrake, so we aren't the only ones ;). If you install a wrapper script, (instead of, for example, making autom4te, etc. easily-repointable symlinks), then you must also set autom4te_perllibdir in the environment to the appropriate directory when building autoconf/automake. This (with the Gentoo-specific knowledge of where the active autom4te is actually located) should be easy to add to the ebuild. If autom4te_perllibdir is set in the environment, its value will be used instead of extracting that information from the autom4te script. [...] seems like the only reliable option is to invoke autom4te. am_autom4te_ver=`$AUTOM4TE --version | sed -n '1{s:.*) ::;p}' AS_CASE([$am_autom4te_ver], ... do the matching ... what is the first autoconf release that has the fix ? The problem with testing autoconf versions for this is that Time::HiRes is an *optional* module in Perl. It was available from CPAN before it was bundled with Perl, and distributions technically can *unbundle* it from later Perl releases if they want. The only reliable way to know if Time::HiRes is available (without effectively reimplementing Perl's module search) is to try to load it. Autom4te now (correctly) uses Time::HiRes if it is available and falls back to Perl builtins if not, for any version of Perl. The best way to check if high-resolution timestamps are available to autom4te is to have perl load Autom4te::FileUtils and check if that also loaded Time::HiRes. -- Jacob
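The "try to load it" probe itself is a one-liner; a sketch:
8<--
if $PERL -MTime::HiRes -e 1 >/dev/null 2>&1; then
  : # this perl can load Time::HiRes
else
  : # not available (too old and never installed from CPAN, or unbundled)
fi
8<--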
Re: Getting long SOURCES lines with subdirs shorter
Jan Engelhardt wrote: Given a_SOURCES = aprog/main.c aprog/foo.c aprog/bar.c aprog/baz.c ... The more source files there are to be listed, the longer that line gets, the bigger the Makefile.am fragment becomes, etc. I am thinking about how to cut that repetition down. Current automake likely won't have anything in store already, so I'm thinking of editing automake and targeting a future automake release. While this does not reduce the repetition, Automake allows backslash-continuation on these lines. DejaGnu uses it to list files one per line in some places; see <http://git.savannah.gnu.org/cgit/dejagnu.git/tree/Makefile.am>. -- Jacob
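That is, using the file names from the example above, the list can be written one file per line (sketch):
8<--
a_SOURCES = \
    aprog/main.c \
    aprog/foo.c \
    aprog/bar.c \
    aprog/baz.c
8<--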
Re: rhel8 test failure confirmation?
Karl Berry wrote: Hi Jacob, The guess was the two most probable locations: /usr/share/autoconf and /usr/local/share/autoconf. Wouldn't have worked on my own system :). Challenge accepted. Thanks! if $PERL -I${autom4te_perllibdir:-$(sed -n \ '/autom4te_perllibdir/{s/^.*|| //;s/;$//;s/^.//;s/.$//;p;q}' \ <$(command -v autom4te))} -MAutom4te::FileUtils \ -e 'exit defined $INC{q[Time/HiRes.pm]} ? 0 : 1'; then # autom4te uses Time::HiRes unfortunately we are highly restricted in what we can use in basic automake/conf shell code (as opposed to in the tests). Neither the "command" command nor $(...) syntax can be used. Are you sure about that? I got a fair bit of pushback on removing $(...) from config.guess (where it actually is a problem because config.guess is supposed to identify a variety of pre-POSIX systems and can be run independently of configure) on the grounds that Autoconf locates a POSIX shell and uses it for the bulk of configure (and the auxiliary scripts like config.guess). Of course, Autoconf's "find a POSIX shell" logic does not help DejaGnu, which installs a copy of config.guess and runs it with /bin/sh according to its #! line... For the former, I think there's an autoconf/make macro to look up a program name along PATH? From a quick glance at the manual, that would be AC_PATH_PROG([AUTOM4TE], [autom4te]). [...] Would you be up for tweaking the check to use such least-common-denominator shell stuff? Let's try: AC_PATH_PROG([AUTOM4TE], [autom4te]) if test x$autom4te_perllibdir = x; then autom4te_perllibdir=`sed -n \ '/autom4te_perllibdir/{s/^.*|| //;s/;$//;s/^.//;s/.$//;p;q}' <$AUTOM4TE` fi if $PERL -I$autom4te_perllibdir -MAutom4te::FileUtils \ -e 'exit defined $INC{q[Time/HiRes.pm]} ? 0 : 1'; then ... The backslash-newline in the sed command was added as a precaution against line-wrap in email; the line could be combined. Ordinarily Perl could not be used either, but since Automake is written in Perl, I don't see a problem with doing so here. (If the system doesn't have Perl, Automake won't get far.) If the system lacks Perl, autom4te will not work either. The proposed test uses Perl to determine a characteristic of a program that is written in Perl. :-) Not sure if $PERL is already defined by the time at which this would be run, but it should be possible to arrange with an ac prerequisite if needed. That should be easy enough to rearrange, since this check must come /after/ the autoconf version check---the pattern is only valid since autoconf-2.52f, but Automake requires autoconf-2.65 or later. -- Jacob
Re: rhel8 test failure confirmation?
Karl Berry wrote: jb> The test also guesses the location of autoconf's Perl libraries; I'm skeptical that any "guessing" of library locations would be reliable enough. The guess was the two most probable locations: /usr/share/autoconf and /usr/local/share/autoconf. jb> a more thorough test would locate the autom4te script and grep it for the perllibdir that was substituted when autoconf was configured. I guess that would work. Challenge accepted. Here's a refined version: (lines \-folded for email) if $PERL -I${autom4te_perllibdir:-$(sed -n \ '/autom4te_perllibdir/{s/^.*|| //;s/;$//;s/^.//;s/.$//;p;q}' \ <$(command -v autom4te))} -MAutom4te::FileUtils \ -e 'exit defined $INC{q[Time/HiRes.pm]} ? 0 : 1'; then # autom4te uses Time::HiRes else # autom4te does not use Time::HiRes fi This version matches a pattern that was introduced in commit c737451f8c17afdb477ad0fe72f534ea837e001e on 2001-09-13 preceding autoconf-2.52f, and Automake currently requires autoconf-2.65 or later, so this should work. Getting the single quotes away from the value without directly mentioning them is the purpose of the "s/^.//;s/.$//;" part of the sed command. Wrapping it as "$(eval echo $(sed ...))" would have been another option to have the shell strip the single quotes. Automake and autoconf are not two independent tools. Automake completely relies on autoconf. It's not for me to hand down any final pronouncements, but personally I feel strongly that the tests should not paper over this problem by changing the way tests work in general. With rm -rf of the cache, or autoconf -f, etc. That is not what users do, so that's not what the tests should do, either. Such global changes could have all kinds of other unknown/undesirable effects on the tests. In contrast to setting the sleep value "as appropriate", which is what is/should be already done, so changing the conditions under which it is set is unlikely to cause any unforeseen additional problems. While potentially compromising the real-world validity of the testsuite is a legitimate concern, the fact that Automake depends on Autoconf does not preclude the Automake testsuite from working around Autoconf limitations in order to accurately test /Automake/. -- Jacob
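To make the sed command less opaque: in an installed autom4te the line being matched typically looks something like the example below (the path is illustrative, not a guarantee), and the substitutions peel away everything up to and including "|| ", the trailing semicolon, and the surrounding single quotes, leaving just the directory.
8<--
# line in the installed autom4te script (example):
my $perllibdir = $ENV{'autom4te_perllibdir'} || '/usr/share/autoconf';
# what the sed command prints for it:
/usr/share/autoconf
8<--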
Re: rhel8 test failure confirmation?
Bogdan wrote: Jacob Bachmeyer , Mon Apr 03 2023 06:16:53 GMT+0200 (Central European Summer Time) Karl Berry wrote: [...] What can we do about this? As for automake: can we (you :) somehow modify the computation of the sleep value to determine if autom4te can handle the HiRes testing or not (i.e., has the patch installed)? And then use the longer sleep in automake testing if needed. If you can locate Autom4te::FileUtils, grepping it for "Time::HiRes" will tell you if autom4te supports sub-second timestamps, but then you need more checks to validate that the filesystem actually has sub-second timestamps. A simple check: if $PERL -I${autom4te_perllibdir:-/usr/share/autoconf} -I/usr/local/share/autoconf \ -MAutom4te::FileUtils -e 'exit defined $INC{q[Time/HiRes.pm]} ? 0 : 1'; then # autom4te uses Time::HiRes else # autom4te does not use Time::HiRes fi This method also has the advantage of implicitly also checking that $PERL has Time::HiRes installed by determining if loading Autom4te::FileUtils causes Time::HiRes to be loaded. (In other words, this will give the correct answer on Perl 5.6 if Time::HiRes was installed from CPAN or on later Perls if a distribution packager has unbundled Time::HiRes and the user has not installed its package.) Nice. The 0 and 1 may not be portable to each OS in the Universe (see EXIT_SUCCESS and EXIT_FAILURE in exit(3)), but should be good/portable enough for our goals. Or maybe some other simple solution. Generally, "exit 0" reports success to the shell and any other exit value is taken as false. I am unsure if POSIX actually requires that, however. As I understand, this could even be used to actually call the sub which checks the timestamps, so we'd have a read-to-use test. Only a matter of where to put it... Is there some code that runs *before* all tests that could set some environment variable passed to the tests, create a file, or whatever? The intended implication was that that test would go in configure. Verifying that the filesystem actually /has/ subsecond timestamps is a separate issue; that test only detects whether autom4te will use subsecond timestamps /if/ they are available. The test also guesses the location of autoconf's Perl libraries; a more thorough test would locate the autom4te script and grep it for the perllibdir that was substituted when autoconf was configured. -- Jacob
Re: rhel8 test failure confirmation?
Karl Berry wrote: [...] What can we do about this? As for automake: can we (you :) somehow modify the computation of the sleep value to determine if autom4te can handle the HiRes testing or not (i.e., has the patch installed)? And then use the longer sleep in automake testing if needed. If you can locate Autom4te::FileUtils, grepping it for "Time::HiRes" will tell you if autom4te supports sub-second timestamps, but then you need more checks to validate that the filesystem actually has sub-second timestamps. A simple check: if $PERL -I${autom4te_perllibdir:-/usr/share/autoconf} -I/usr/local/share/autoconf \ -MAutom4te::FileUtils -e 'exit defined $INC{q[Time/HiRes.pm]} ? 0 : 1'; then # autom4te uses Time::HiRes else # autom4te does not use Time::HiRes fi This method also has the advantage of implicitly also checking that $PERL has Time::HiRes installed by determining if loading Autom4te::FileUtils causes Time::HiRes to be loaded. (In other words, this will give the correct answer on Perl 5.6 if Time::HiRes was installed from CPAN or on later Perls if a distribution packager has unbundled Time::HiRes and the user has not installed its package.) [...] It seems to me that using autoconf -f or similar is papering over the problem, so that the tests are no longer testing the normal behavior. Which does not sound desirable. The Automake testsuite is supposed to test Automake, not Autoconf, so working around Autoconf issues is appropriate. In this case, if always using "autoconf -f" allows us to eliminate the sleeps entirely (and does not expand the running time of Autoconf too much), we should do that, at least in my view. -- Jacob
Re: rhel8 test failure confirmation? [PATCH for problem affecting Automake testsuite]
A quick introduction to the situation for the Autoconf list: The Automake maintainers have encountered a bizarre issue with sporadic random test failures, seemingly due to "disk writes not taking effect" (as Karl Berry mentioned when starting the thread). Bogdan appears to have traced the issue to autom4te caching and offered a patch. I have attached a copy of Bogdan's patch. Bogdan's patch is a subtle change: the cache is now considered stale unless it is /newer/ than the source files, rather than being considered stale only if the source files are newer. In short, this patch causes the cache to be considered stale if its timestamp /matches/ the source file, while it is currently considered valid if the timestamps match. I am forwarding the patch to the Autoconf list now because I concur with the change, noting that Time::HiRes is also limited by the underlying filesystem and therefore is not a "magic bullet" solution. Assuming the cache files are stale unless proven otherwise is therefore correct. Note again that this is _Bogdan's_ patch I am forwarding unchanged. I did not write it (but I agree with it). [further comments inline below] Bogdan wrote: Bogdan , Sun Mar 05 2023 22:31:55 GMT+0100 (Central European Standard Time) Karl Berry , Sat Mar 04 2023 00:00:56 GMT+0100 (Central European Standard Time) Note that 'config.h' is older (4 seconds) than './configure', which shouldn't be the case as it should get updated with new values. Indeed. That is the same sort of thing as I was observing with nodef. But what (at any level) could be causing that to happen? Files just aren't getting updated as they should be. I haven't yet tried older releases of automake to see if their tests succeed on the systems that are failing now. That's next on my list. [...] Another tip, maybe: cache again. When I compare which files are newer than the only trace file I get in the failing 'backcompat2' test ('autom4te.cache/traces.0'), I see that 'configure.ac' is older than this file in the succeeding run, but it's newer in the failing run. This could explain why 'configure' doesn't get updated to put new values in config.h (in my case) - 'autom4te' thinks it's up-to-date. The root cause may be in 'autom4te', sub 'up_to_date': # The youngest of the cache files must be older than the oldest of # the dependencies. # FIXME: These timestamps have only 1-second resolution. # Time::HiRes fixes this, but assumes Perl 5.8 or later. (lines 913-916 in my version). This comment Bogdan cites is not correct: Time::HiRes could be installed from CPAN on Perls older than 5.8, and might be missing from a 5.8 or later installation if the distribution packager separated it into another package. Nor is Time::HiRes guaranteed to fix the issue; the infamous example is the FAT filesystem, where timestamps only have 2-second resolution. Either way, Time::HiRes is now used if available, so this "FIXME" is fixed now. :-) Perhaps 'configure.ac' in the case that fails is created "not late enough" (still within 1 second) when compared to the cache, and the cached values are taken, generating the old version of 'configure' which, in turn, generates old versions of the output files. Still a guess, but maybe a bit more probable now. Does it work when you add '-f' to '$AUTOCONF'? It does for me - again, about 20 sequential runs of the same set of tests and about 5 parallel with 4 threads. Zero failures. I'd probably get the same result if I did a 'rm -fr autom4te.cache' before each '$AUTOCONF' invocation. [...] 
More input (or noise): 1) The t/backcompat2.sh test (the only test which fails for me) is a test which modifies configure.ac and calls $AUTOCONF several times. 2) Autom4te (part of Autoconf) has a 1-second resolution in checking if the input files are newer than the cache. Maybe. That comment could be wrong; the actual "sub mtime" is in Autom4te::FileUtils. Does your version of that module use Time::HiRes? Git indicates that use of Time::HiRes was added to Autoconf at commit 3a9802d60156809c139e9b4620bf04917e143ee2 which is between the 2.72a and 2.72c snapshot tags. 3) Thus, a sequence: 'autoconf' + quickly modify configure.ac + quickly run 'autoconf' may cause autom4te to use the old values from the cache instead of processing the new configure.ac. "Quickly" means within the same second. It might be broader than that if your version is already using Time::HiRes. If so, what filesystems are involved? I could see a possible bug where multiple writes get the same mtime if they get flushed to disk together. Time::HiRes will not help if this happens; your patch will work around such a bug. 4) I ran the provided list of tests (t/backcompat2.sh, t/backcompat3.sh, t/get-sysconf.sh, t/lex-depend.sh, t/nodef.sh, t/remake-aclocal-version-mismatch.sh, t/subdir-add2-pr46.sh, t/testsuite-summary-reference-log.sh) in batches of 2
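Summarizing the semantic change for anyone skimming (my paraphrase in pseudo-Perl, not Bogdan's literal diff; the variable names are illustrative):
8<--
# old behaviour: a timestamp tie counts as up to date, so the cache is reused
$stale = ($mtime_of_source > $mtime_of_cache);
# patched behaviour: a tie counts as stale, so the cache is regenerated
$stale = ($mtime_of_source >= $mtime_of_cache);
8<--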
Re: if vs. ifdef in Makefile.am
Bogdan wrote: [...] Probably Nick's suggestion (a new option to ./configure or the AC_HEADER_ASSERT macro) would be the most future-proof, but it requires running ./configure each time you wish to change the build type (which maybe is not a bad idea, it depends). That would probably be a very good idea, to avoid mixing files built for one mode with files built for another. Even easier: use separate build directories for each type, from a common source directory, like so: $ : ... starting one directory above the source tree in ./src/ ... $ (mkdir test-build; cd ./test-build && ../src/configure --enable-assert ...) $ (mkdir release-build; cd ./release-build && ../src/configure --disable-assert ...) Now you avoid conflating modules for test and release builds and ending up with an executable that you cannot reliably replicate. A simple flag to make is unlikely to be properly recognized as a dependency for all objects built. -- Jacob
Re: Generating missing depfiles by an automake based makefile
Dmitry Goncharov wrote: On Thursday, February 9, 2023, Tom Tromey wrote: It's been a long time since I worked on automake, but the dependency tracking in automake is designed not to need to rebuild or pre-build dep files. Doing that means invoking the compiler twice, which is slow. Instead, automake computes dependencies as a side effect of compilation. The hello.Po example presented above computes depfiles as a side effect of compilation. Moreover, when hello.Po is absent that makefile compiles hello.o as a side effect of hello.Po computation. In total there is only one compilation. What is the scenario where you both end up with an empty depfile and a compilation that isn't out of date for some other reason? That seems like it shouldn't be possible. When a depfile is missing (for any reason) the current automake makefile creates a dummy depfile. From that point on the user has to notice that make is no longer tracking dependencies and their build is incorrect. I am asking if automake can be enhanced to do something similar to hello.Po example above, in those cases when make supports that. If I understand correctly, the problem here is that the depfile is both empty and current. If Automake could set the dummy depfile's mtime to some appropriate past timestamp (maybe the Makefile itself?), it would appear out-of-date immediately and therefore be remade, also rebuilding the corresponding object. A quick check of the POSIX manual finds that touch(1) accepts the '-r' option to name a reference file and can create a file. Could we simply use "touch -r Makefile $DEPFILE" to create depfiles when we need dummies? -- Jacob
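A sketch of that suggestion (untested, and the path is only an example): when a dummy depfile has to be fabricated, give it the Makefile's timestamp instead of "now".
8<--
# create (or re-date) the dummy depfile with the Makefile's mtime;
# touch -r names the reference file and will create the target if needed
touch -r Makefile .deps/hello.Po
8<--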
Re: man_MANS install locations
Karl Berry wrote: Hi Jan, As for GNU/Linux, what was the rationale to only permit [0-9ln]? No idea. Maybe just didn't think about "m", or maybe it didn't exist at that time? Jim, Paul, anyone? Should automake be relaxed? I see no harm in allowing more (any) letters, if that's what you mean. When running automake on Solaris, placing svcadm.1m into man1 rather than man1m seems outright wrong. But is Automake's purpose to reproduce platform-specific behavior, or to have consistent behavior across platforms? I think the latter. This would be adapting to platform-specific requirements. I suspect that Solaris man(1) will not look for svcadm.1m in man1 at all but only in man1m. I guess a new option to install *.1m in man1m/, etc., would be ok, if you want it. If you or anyone can provide a patch, that would be great. Unfortunately I doubt it's anything I will ever implement myself. Maybe the best answer is to install into an existing directory if one is found and otherwise trim the suffix to the "standard" set? Should the rpmlint check be adjusted to cater to the GNU FHS? I guess that's a question for the rpmlint people, whoever they are. I don't see that Automake's default behavior is going to change. Also, GNU (as an organization) never had anything to do with the FHS, so far as I know. I don't think the GNU coding standards/maintainer information have anything to say about this topic ... I seem to remember reading somewhere that /usr is supposed to be a symlink to / on the GNU system, so no, GNU is not intended to follow FHS. -- Jacob
Re: Old .Po file references old directory, how to start fresh?
Travis Pressler via Discussion list for automake wrote: Hi, I'm learning how to make an autotools project and have created a test project to work with. I ran make with a directory `nested` and then deleted it and deleted the reference to it in my `Makefile.am`. Now I'm running ./configure && make and I get the following: *** No rule to make target 'nested/main.c', needed by 'main.o'. Stop. How can I run `make` so that it doesn't reference this old nested directory? I was curious if I could find where this reference is, so I did a grep -r nested . I think the only relevant hit is: ./src/.deps/main.Po:main.o nested/main.c /usr/include/stdc-predef.h /usr/include/stdio.h \ Have you rerun automake to regenerate Makefile.in since changing Makefile.am? -- Jacob
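If not, a typical refresh looks something like the following (a sketch, assuming the usual autotools layout; removing the stale dependency fragment is another way out of the immediate error):
8<--
autoreconf              # or: aclocal && automake && autoconf
./configure && make
# or drop the stale fragment so it gets regenerated:
rm -f src/.deps/main.Po
8<--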
Re: type errors, command length limits, and Awk
Mike Frysinger wrote: On 15 Feb 2022 21:17, Jacob Bachmeyer wrote: Mike Frysinger wrote: context: https://bugs.gnu.org/53340 Looking at the highlighted line in the context: thanks for getting into the weeds with me You are welcome. echo "$$py_files" | $(am__pep3147_tweak) | $(am__base_list) | \ It seems that the problem is that am__base_list expects ListOf/File (and produces ChunkedListOf/File) but am__pep3147_tweak emits ListOf/Glob. This works in the usual case because the shell implicitly converts Glob -> ListOf/File and implicitly flattens argument lists, but results in the overall command line being longer than expected if the globs expand to more filenames than expected, as described there. It seems that the proper solution to the problem at hand is to have am__pep3147_tweak expand globs itself somehow and thus provide ListOf/File as am__base_list expects. Do I misunderstand? Is there some other use for xargs? if i did not care about double expansion, this might work. the pipeline quoted here handles the arguments correctly (other than whitespace splitting on the initial input, but that's a much bigger task) before passing them to the rest of the pipeline. so the full context: echo "$$py_files" | $(am__pep3147_tweak) | $(am__base_list) | \ while read files; do \ $(am__uninstall_files_from_dir) || st=$$?; \ done || exit $$?; \ ... am__uninstall_files_from_dir = { \ test -z "$$files" \ || { test ! -d "$$dir" && test ! -f "$$dir" && test ! -r "$$dir"; } \ || { echo " ( cd '$$dir' && rm -f" $$files ")"; \ $(am__cd) "$$dir" && rm -f $$files; }; \ } leveraging xargs would allow me to maintain a single shell expansion. the pathological situation being: bar.py __pycache__/ bar.pyc bar*.pyc bar**.pyc py_files="bar.py" which turns into "__pycache__/bar*.pyc" by the pipeline, and then am__uninstall_files_from_dir will expand it when calling `rm -f`. if the pipeline expanded the glob, it would be: __pycache__/bar.pyc __pycache__/bar*.pyc __pycache__/bar**.pyc and then when calling rm, those would expand a 2nd time. If we know that there will be _exactly_ one additional shell expansion, why not simply filter the glob results through `sed 's/[?*]/\\&/g'` to escape potential glob metacharacters before emitting them from am__pep3147_tweak? (Or is that not portable sed?) Back to the pseudo-type model I used earlier, the difference between File and Glob is that Glob contains unescaped glob metacharacters, so escaping them should solve the problem, no? (Or is there another thorn nearby?) [...] which at this point i've written `xargs -n40`, but not as fast :p. Not as fast, yes, but certainly portable! :p The real question would be if it is faster than simply running rm once per file. I would guess probably _so_ on MinGW (bash on Windows, where that logic would use shell builtins but running a new process is extremely slow) and probably _not_ on an archaic Unix system where "test" is not a shell builtin so saving the overhead and just running rm once per file would be faster. automake jumps through some hoops to try and limit the length of generated command lines, like deleting output objects in a non-recursive build. it's not perfect -- it breaks arguments up into 40 at a time (akin to xargs -n40) and assumes that it won't have 40 paths with long enough names to exceed the command line length. 
it also has some logic where it's deleting paths by globs, but the process to partition the file list into groups of 40 happens before the glob is expanded, so there are cases where it's 40 globs that can expand into many many more files and then exceed the command line length. First, I thought that GNU-ish systems were not supposed to have such arbitrary limits, one person's "arbitrary limits" is another person's "too small limit" :). i'm most familiar with Linux, so i'll focus on that. [...] plus, backing up, Automake can't assume Linux. so i think we have to proceed as if there is a command line limit we need to respect. So then the answer to my next question is that it is still an issue, even if the GNU system were to allow arguments up to available memory. and this issue (the context) originated from Gentoo GNU/Linux. Is this a more fundamental bug in Gentoo or still an issue because Automake build scripts are supposed to be portable to foreign system that do have those limits? to be clear, what's failing is an Automake test. it sets the `rm` limit to an articially low one. [...] Gentoo happened to find this error before Automake because Gentoo also found and fixe
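To spell out the escaping idea (a sketch; whether this exact sed expression is portable enough is precisely the open question above): if am__pep3147_tweak expanded the glob itself and then escaped any metacharacters in the resulting literal names, the single remaining expansion in am__uninstall_files_from_dir would resolve each escaped name back to the one file it denotes.
8<--
$ echo '__pycache__/bar*.pyc' | sed 's/[?*]/\\&/g'
__pycache__/bar\*.pyc
8<--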
type errors, command length limits, and Awk (was: portability of xargs)
Mike Frysinger wrote: context: https://bugs.gnu.org/53340 Looking at the highlighted line in the context: > echo "$$py_files" | $(am__pep3147_tweak) | $(am__base_list) | \ It seems that the problem is that am__base_list expects ListOf/File (and produces ChunkedListOf/File) but am__pep3147_tweak emits ListOf/Glob. This works in the usual case because the shell implicitly converts Glob -> ListOf/File and implicitly flattens argument lists, but results in the overall command line being longer than expected if the globs expand to more filenames than expected, as described there. It seems that the proper solution to the problem at hand is to have am__pep3147_tweak expand globs itself somehow and thus provide ListOf/File as am__base_list expects. Do I misunderstand? Is there some other use for xargs? I note that the current version of standards.texi also allows configure and make rules to use awk(1); could that be useful here instead? (see below) [...] automake jumps through some hoops to try and limit the length of generated command lines, like deleting output objects in a non-recursive build. it's not perfect -- it breaks arguments up into 40 at a time (akin to xargs -n40) and assumes that it won't have 40 paths with long enough names to exceed the command line length. it also has some logic where it's deleting paths by globs, but the process to partition the file list into groups of 40 happens before the glob is expanded, so there are cases where it's 40 globs that can expand into many many more files and then exceed the command line length. First, I thought that GNU-ish systems were not supposed to have such arbitrary limits, and this issue (the context) originated from Gentoo GNU/Linux. Is this a more fundamental bug in Gentoo or still an issue because Automake build scripts are supposed to be portable to foreign system that do have those limits? Second, counting files in the list, as you note, does not necessarily actually conform to the system limits, while Awk can track both number of elements in the list and the length of the list as a string, allowing to break the list to meet both command tail length limits (on Windows or total size of block to transfer with execve on POSIX) and argument count limits (length of argv acceptable to execve on POSIX). POSIX Awk should be fairly widely available, although at least Solaris 10 has a non-POSIX awk in /usr/bin and a POSIX awk in /usr/xpg4/bin; I found this while working on DejaGnu. I ended up using this test to ensure that "awk" is suitable: 8<-- # The non-POSIX awk in /usr/bin on Solaris 10 fails this test if echo | "$awkbin" '1 && 1 {exit 0}' > /dev/null 2>&1 ; then have_awk=true else have_awk=false fi 8<-- Another "gotcha" with Solaris 10 /usr/bin/awk is that it will accept "--version" as a valid Awk program, so if you use that to test whether "awk" is GNU Awk, you must redirect input from /dev/null or it will hang. Automake may want to do more extensive testing to find a suitable Awk; the above went into a script that remains generic when installed and so must run its tests every time the user invokes it, so "quick" was a high priority. -- Jacob
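As a concrete (untested) sketch of the kind of splitter I mean: read one file name per line and flush a chunk whenever adding the next name would exceed either an item-count limit or a length limit. The 40 and 4096 below are placeholders, not measured system limits.
8<--
# emit one space-separated chunk per output line
awk '
  {
    if (n >= 40 || len + length($0) + 1 > 4096) { print line; line = ""; n = 0; len = 0 }
    if (n) line = line " " $0; else line = $0
    n++; len += length($0) + 1
  }
  END { if (n) print line }'
8<--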
Re: portability of xargs
Dan Kegel wrote: Meson is a candidate for such a next-gen config system. It is in python, which does not quite qualify as usable during early uplift/bootstrap, but there are C ports in progress, see e.g. https://sr.ht/~lattis/muon/ *Please* do not introduce a dependency on Python; they do not worry much about backwards compatibility. If there is ever a Python 4 with a 3->4 transition anything like the 2->3 transition, you could end up with every past release relying on current Python becoming unbuildable. Having complex dependencies for creating the build scripts is one thing, but needing major packages (like Python) to *use* the build scripts is a serious problem for anything below the "user application" tier, especially the "base system" tier. -- Jacob
Re: Automake for RISC-V
Billa Surendra wrote: On Sun, 21 Nov, 2021, 2:28 am Nick Bowler, wrote: On 20/11/2021, Billa Surendra wrote: I have RISC-V native compiler on target image, but when I am compiling automake on target image it needs automake on target. This is the main problem. Automake should not be required to install automake if you are using a released version and have not modified the build system Could you please explain more, What is the released version ? . Modified build system means ? Automake should only be needed if you have changed a "Makefile.am" file somewhere. Are you using some kind of packaging system that likes to regenerate build files as a matter of course? The normal "/path/to/src/configure && make && make install" procedure should not require Automake to be installed. -- Jacob
Re: Automake for RISC-V
Billa Surendra wrote: Thanks for your reply. I have installed perl on target system but target image and build system perl version were different. And second, thing I have noticed that in aclocal script very first line is #! /bin/perl A simple workaround is to find perl on the target system image (probably /usr/bin/perl, but it could have been installed somewhere else) and make a symlink at /bin/perl to the real interpreter. It is possible that your build system has /bin as a symlink to /usr/bin, as a certain widely-loathed developer has been rather forcefully advocating the past few years... -- Jacob
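In other words, something along these lines on the target image (adjust the source path to wherever perl actually landed):
8<--
test -x /bin/perl || ln -s /usr/bin/perl /bin/perl
8<--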
Re: Automake testsuite misuses DejaGnu [PATCH v0]
Jim Meyering wrote: [...] Even a sample fix for one of the currently-failing tests would be helpful. This is the first draft; this patch breaks 1.6.1 because versions of DejaGnu prior to 1.6.3 require srcdir to point exactly to the testsuite, while 1.6.3 allows the testsuite to be in ${srcdir}/testsuite. 8<-- diff -urN -x '*~' automake-1.16.3-original/t/check12.sh automake-1.16.3/t/check12.sh --- automake-1.16.3-original/t/check12.sh 2020-11-18 19:21:03.0 -0600 +++ automake-1.16.3/t/check12.sh2021-06-29 01:47:21.669276386 -0500 @@ -60,8 +60,8 @@ DEJATOOL = hammer spanner AM_RUNTESTFLAGS = HAMMER=$(srcdir)/hammer SPANNER=$(srcdir)/spanner EXTRA_DIST += $(DEJATOOL) -EXTRA_DIST += hammer.test/hammer.exp -EXTRA_DIST += spanner.test/spanner.exp +EXTRA_DIST += testsuite/hammer.test/hammer.exp +EXTRA_DIST += testsuite/spanner.test/spanner.exp END cat > hammer << 'END' @@ -77,9 +77,10 @@ END chmod +x hammer spanner -mkdir hammer.test spanner.test +mkdir testsuite +mkdir testsuite/hammer.test testsuite/spanner.test -cat > hammer.test/hammer.exp << 'END' +cat > testsuite/hammer.test/hammer.exp << 'END' set test test_hammer spawn $HAMMER expect { @@ -88,7 +89,7 @@ } END -cat > spanner.test/spanner.exp << 'END' +cat > testsuite/spanner.test/spanner.exp << 'END' set test test_spanner spawn $SPANNER expect { diff -urN -x '*~' automake-1.16.3-original/t/dejagnu3.sh automake-1.16.3/t/dejagnu3.sh --- automake-1.16.3-original/t/dejagnu3.sh 2020-11-18 19:21:03.0 -0600 +++ automake-1.16.3/t/dejagnu3.sh 2021-06-29 01:19:19.161147525 -0500 @@ -34,12 +34,13 @@ AUTOMAKE_OPTIONS = dejagnu DEJATOOL = hammer AM_RUNTESTFLAGS = HAMMER=$(srcdir)/hammer -EXTRA_DIST = hammer hammer.test/hammer.exp +EXTRA_DIST = hammer testsuite/hammer.test/hammer.exp END -mkdir hammer.test +mkdir testsuite +mkdir testsuite/hammer.test -cat > hammer.test/hammer.exp << 'END' +cat > testsuite/hammer.test/hammer.exp << 'END' set test test spawn $HAMMER expect { diff -urN -x '*~' automake-1.16.3-original/t/dejagnu4.sh automake-1.16.3/t/dejagnu4.sh --- automake-1.16.3-original/t/dejagnu4.sh 2020-11-18 19:21:03.0 -0600 +++ automake-1.16.3/t/dejagnu4.sh 2021-06-29 01:25:08.309780437 -0500 @@ -49,13 +49,14 @@ AM_RUNTESTFLAGS = HAMMER=$(srcdir)/hammer SPANNER=$(srcdir)/spanner -EXTRA_DIST = hammer hammer.test/hammer.exp -EXTRA_DIST += spanner spanner.test/spanner.exp +EXTRA_DIST = hammer testsuite/hammer.test/hammer.exp +EXTRA_DIST += spanner testsuite/spanner.test/spanner.exp END -mkdir hammer.test spanner.test +mkdir testsuite +mkdir testsuite/hammer.test testsuite/spanner.test -cat > hammer.test/hammer.exp << 'END' +cat > testsuite/hammer.test/hammer.exp << 'END' set test test spawn $HAMMER expect { @@ -64,7 +65,7 @@ } END -cat > spanner.test/spanner.exp << 'END' +cat > testsuite/spanner.test/spanner.exp << 'END' set test test spawn $SPANNER expect { diff -urN -x '*~' automake-1.16.3-original/t/dejagnu5.sh automake-1.16.3/t/dejagnu5.sh --- automake-1.16.3-original/t/dejagnu5.sh 2020-11-18 19:21:03.0 -0600 +++ automake-1.16.3/t/dejagnu5.sh 2021-06-29 01:26:36.511645792 -0500 @@ -34,12 +34,13 @@ cat > Makefile.am << END AUTOMAKE_OPTIONS = dejagnu -EXTRA_DIST = $package $package.test/$package.exp +EXTRA_DIST = $package testsuite/$package.test/$package.exp AM_RUNTESTFLAGS = PACKAGE=\$(srcdir)/$package END -mkdir $package.test -cat > $package.test/$package.exp << 'END' +mkdir testsuite +mkdir testsuite/$package.test +cat > testsuite/$package.test/$package.exp << 'END' set test "a_dejagnu_test" spawn $PACKAGE expect { diff -urN -x '*~' 
automake-1.16.3-original/t/dejagnu6.sh automake-1.16.3/t/dejagnu6.sh --- automake-1.16.3-original/t/dejagnu6.sh 2020-11-18 19:21:03.0 -0600 +++ automake-1.16.3/t/dejagnu6.sh 2021-06-29 01:28:07.151396859 -0500 @@ -35,8 +35,9 @@ AM_RUNTESTFLAGS = FAILDEJA=$(srcdir)/faildeja END -mkdir faildeja.test -cat > faildeja.test/faildeja.exp << 'END' +mkdir testsuite +mkdir testsuite/faildeja.test +cat > testsuite/faildeja.test/faildeja.exp << 'END' set test failing_deja_test spawn $FAILDEJA expect { diff -urN -x '*~' automake-1.16.3-original/t/dejagnu7.sh automake-1.16.3/t/dejagnu7.sh --- automake-1.16.3-original/t/dejagnu7.sh 2020-11-18 19:21:03.0 -0600 +++ automake-1.16.3/t/dejagnu7.sh 2021-06-29 01:29:38.877097021 -0500 @@ -39,8 +39,9 @@ AM_RUNTESTFLAGS = --status FAILTCL=$(srcdir)/failtcl END -mkdir failtcl.test -cat > failtcl.test/failtcl.exp << 'END' +mkdir testsuite +mkdir testsuite/failtcl.test +cat > testsuite/failtcl.test/failtcl.exp << 'END' set test test spawn $FAILTCL expect { diff -urN -x '*~' automake-1.16.3-original/t/dejagnu-absolute-builddir.sh automake-1.16.3/t/dejagnu-absolute-builddir.sh --- automake-1.16.3-original/t/dejagnu-absolute-builddir.sh 2020-11-18 19:21:03.0 -0600 +++ automake-1.16.3/t/dejagnu-absolute-builddir.sh 2021-06-29 01:36:15.6
Re: Automake testsuite misuses DejaGnu
Daniel Herring wrote: It seems fragile for DejaGnu to probe for a testsuite directory and change its behavior as you describe. For example, I could have a project without the testsuite dir, invoke the tester, and have it find and run some unrelated files in the parent directory. Unexpected behavior (chaos) may ensue. This already happens and this is the behavior that is deprecated and even more fragile. Without a testsuite/ directory, DejaGnu will end up searching the tree for *.exp files and running them all. Eventually, if $srcdir neither is nor contains "testsuite", DejaGnu will throw an error and abort. The testsuite/ directory is a long-documented requirement. Is there an explicit command-line argument that could be added to the Automake invocation? Not easily; the probing is done specifically to allow for two different ways of using DejaGnu: using recursive Makefiles that invoke DejaGnu with the testsuite/ directory current, and using non-recursive Makefiles, which with Automake will invoke DejaGnu with the top-level directory, presumably containing the "testsuite" directory. Both of these cases must be supported: the toolchain packages use the former and Automake's basic DejaGnu support will use the latter if a non-recursive layout is desired. Both of these use the same command line argument --srcdir and site.exp variable srcdir; the difference is that srcdir has acquired two different meanings. -- Jacob
Re: Automake testsuite misuses DejaGnu
Karl Berry wrote: DejaGnu has always required a DejaGnu testsuite to be rooted at a "testsuite" directory If something was stated in the documentation, but not enforced by the code, hardly surprising that "non-conformance" is widespread. It is not widespread -- all of the toolchain packages correctly place their testsuites in testsuite/ directories. As far as I know, the Automake tests are the only outlier. Anyway, it seems like an unfriendly requirement for users. And even more to incompatibly enforce something now that has not been enforced for previous decades. Why? (Just wondering.) -k Previous versions of DejaGnu did not properly handle non-recursive make with Automake-produced makefiles. Beginning with 1.6.3, the testsuite is allowed to be in ${srcdir}/testsuite instead of ${srcdir} exactly. Enforcing the long-documented (and mostly followed) requirement that there be a directory named "testsuite" containing the testsuite allows DejaGnu to resolve the ambiguity and determine if it has been invoked at package top-level or in the testsuite/ directory directly. Even in 1.6.3, there was intent to continue to allow the broken cases to work with a warning, but I made the conditional for that case too narrow (oops!) and some of the Automake test cases fail as a result. Fixing this now is appropriate because no one is going to see the future deprecation warnings due to the way Automake tests are run. -- Jacob
Re: Automake testsuite misuses DejaGnu
Jim Meyering wrote: On Sun, Jul 11, 2021 at 9:03 PM Jacob Bachmeyer wrote: [...] The affected tests are: check12, dejagnu3, dejagnu4, dejagnu5, dejagnu6, dejagnu7, dejagnu-absolute-builddir, dejagnu-relative-srcdir, dejagnu-siteexp-extend, dejagnu-siteexp-useredit. [...] Thank you for the analysis and heads-up. I see that Fedora 34 currently has only dejagnu-1.6.1. If this is something you can help with now, I can certainly wait a few days. Even a sample fix for one of the currently-failing tests would be helpful. That is part of the problem: I have a patch, but applying it will cause the tests to fail with DejaGnu 1.6.1. Older versions of DejaGnu require $srcdir to be exactly the root of the testsuite, while 1.6.3 accepts a testsuite in $srcdir or ${srcdir}/testsuite; the latter is needed to allow Automake to invoke DejaGnu from the top-level in the tree. I expect to have time to try a recursive make solution later tonight or tomorrow. Do I understand correctly that I will need to add "SUBDIRS = testsuite" to the top-level TEST_CASE/Makefile.am in the test case and move the "AUTOMAKE_OPTIONS = dejagnu" and "DEJATOOL" definitions to TEST_CASE/testsuite/Makefile.am to get Automake to invoke DejaGnu in the testsuite subdirectory instead of top-level? -- Jacob
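That is, a split along these lines (a sketch of my understanding, reusing the hammer example from t/dejagnu3.sh; the $(top_srcdir) adjustment is my guess, since the helper script stays at the test's top level):
8<--
# TEST_CASE/Makefile.am
SUBDIRS = testsuite

# TEST_CASE/testsuite/Makefile.am
AUTOMAKE_OPTIONS = dejagnu
DEJATOOL = hammer
AM_RUNTESTFLAGS = HAMMER=$(top_srcdir)/hammer
8<--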
Automake testsuite misuses DejaGnu
I was planning to find a solution with a complete patch before mentioning this, but since a release is imminent I will just state the problem: several tests in the Automake testsuite misuse DejaGnu and fail with the 1.6.3 DejaGnu release as a result. DejaGnu has always required a DejaGnu testsuite to be rooted at a "testsuite" directory and this has long been documented in the manual. However, prior to 1.6.3, DejaGnu did not actually depend on this requirement being met. Changes during the development process to properly support non-recursive Automake makefiles required relying on this requirement to resolve the ambiguity between recursive and non-recursive usage. Several tests in the Automake testsuite do not meet this requirement and fail if run with DejaGnu 1.6.3. The simple change of updating the tests to use a testsuite/ directory causes the tests to fail with older versions of DejaGnu, due to lack of support for non-recursive "make check" in those versions. I have not yet tried a patch that also switches the tests to use recursive make, but I believe that is probably the only way for the tests to pass with old and new DejaGnu. Note that, according to the original author, Rob Savoye, DejaGnu has always been intended to require that testsuites be rooted at a "testsuite" directory and the behavior that Automake's test cases rely on was never supported. The affected tests are: check12, dejagnu3, dejagnu4, dejagnu5, dejagnu6, dejagnu7, dejagnu-absolute-builddir, dejagnu-relative-srcdir, dejagnu-siteexp-extend, dejagnu-siteexp-useredit. Note that these tests do not all fail with the 1.6.3 release, but will all fail with some future release when the undocumented support for a testsuite not rooted at "testsuite" will eventually be removed. -- Jacob
Re: parallel build issues
Bob Friesenhahn wrote: It is possible to insert additional dependency lines in Makefile.am so software is always built in the desired order, but this approach might only work if you always build using the top level Makefile. This should actually work here: the problem is that a target in doc/ also depends on a target in frontend/ and uses recursive make to build that target. When the top-level Makefile is used in parallel mode, sub-makes are concurrently run in both doc/ and frontend/ but the doc/ sub-make invokes another make in frontend/ leading to a race and failure. If only doc/Makefile is used, it will spawn a sub-make in frontend/ that will be the only make running there and will succeed. If only frontend/Makefile is used, everything works similarly. Since the problem can only occur when building with the top-level Makefile, adding a dependency in the top-level Makefile should prevent it. -- Jacob