Re: 1.16.90 regression: configure now takes 7 seconds to start

2024-06-18 Thread Jacob Bachmeyer

Bruno Haible wrote:

Jacob Bachmeyer wrote:
  
under what conditions can "checking that 
generated files are newer than configure" actually fail?



I mentioned two such conditions in [1]:
  - Skewed clocks. (I see this regularly on VMs that have 1 or 2 hours
of skew.)
  - If the configure file was created less than 1 second ago and the
file system time resolution is 1 second. (This happens frequently
in the Automake test suite.)


In the first of those scenarios, AM_SANITY_CHECK should bail out.  In 
the second case, AM_SANITY_CHECK should delay for 1 second, and then 
find the test file newer than configure.


One (or both?) of us is misunderstanding something here.  First, 
configure performs AM_SANITY_CHECK ("checking that build environment is 
sane") and bails out if that test fails.  For that test to pass, a 
generated file (conftest.file in the old version) must be found to be 
newer than configure.  If that test fails, configure aborts and "checking 
that generated files are newer than configure" is never reached.


Given that "checking that generated files are newer than configure" is 
reached, which implies that a file produced before any actual tests were 
run was found to be newer than configure, how can config.status, which 
is produced /after/ tests are run, now fail to be newer than configure?



-- Jacob



Re: 1.16.90 regression: configure now takes 7 seconds to start

2024-06-18 Thread Jacob Bachmeyer

Zack Weinberg wrote:

On Tue, Jun 18, 2024, at 12:02 AM, Jacob Bachmeyer wrote:
  

[...]
Wait... all of configure's non-system dependencies are in the release
tarball and presumably (if "make dist" worked correctly) backdated
older than configure when the tarball is unpacked.



In my experience, tarballs cannot be trusted to get this right, *and*
tar implementations cannot be trusted to unpack them accurately
(e.g. despite POSIX I have run into implementations that defaulted to
the equivalent of GNU tar's --touch mode).  Subsequent bounces through
downstream repackaging do not help.   Literally as I type this I am
watching gettext 0.22 run its ridiculous number of configure scripts a
second time from inside `make`.
  
First, "make dist" should get the tarball right.  Second, absent some 
special flag (--enable-maintainer-mode?), a package using Automake 
should have no problem if all distributed files have the same timestamp.


I see a possibility of a lazy tar(1) implementation not restoring 
timestamps at all, with the result that the unpacked files get mtimes in 
the order they were unpacked from the archive.  Perhaps "make dist" 
should sort the files into the proper order while packing the tarball?  
Automake should have the dependency graph available while generating the 
"make dist" commands...



Does "make dist" need to touch configure to ensure that it is newer
than its dependencies before rolling the tarball?



It ought to, but I don't think that will be more than a marginal
improvement, and touching the top-level configure won't be enough,
you'd need to do a full topological sort on the dependency graph
leading into every configure + every Makefile.in + every other
generated-but-shipped file and make sure that each tier of generated
files is newer than its inputs.

I wonder if a more effective approach would be to disable the rules to
regenerate configure, Makefile.in, etc. unless either --enable-maintainer-mode
or we detect that we are building out of a VCS checkout.


I thought that that /was/ the effect of --enable-maintainer-mode?  I 
would also suggest not handling VCS checkouts specially.  If you want 
the Makefile rules for generating GNU build system scripts, you should 
have to say --enable-maintainer-mode.  Otherwise, you can always use the 
tools directly or put an autogen.sh or bootstrap.sh or similar in the VCS.



-- Jacob




Re: 1.16.90 regression: configure now takes 7 seconds to start

2024-06-17 Thread Jacob Bachmeyer

Zack Weinberg wrote:

On Mon, Jun 17, 2024, at 10:30 PM, Jacob Bachmeyer wrote:
...

Don't have enough brain right now to comment on any of the rest of your 
suggestions, but:

  
once conftest.file is newer than configure, surely 
config.status, which is produced after all tests are run, /must/ also be 
newer than configure?


How is this last check/delay actually necessary?  Are there broken 
systems out there that play games with causality?



I regret to say, yes, there are. For example, this can happen with NFS if there 
are multiple clients updating the same files and they don't all agree on the 
current time. Think build farm with several different configurations being 
built out of the same srcdir - separate build dirs, of course, but that doesn't 
actually help here since the issue is ensuring the Makefile doesn't think 
*configure* (not config.status) needs rebuilt.


Wait... all of configure's non-system dependencies are in the release 
tarball and presumably (if "make dist" worked correctly) backdated older 
than configure when the tarball is unpacked.  Does "make dist" need to 
touch configure to ensure that it is newer than its dependencies before 
rolling the tarball?


How can configure [appear to] need to be rebuilt here?  No build should 
touch it or its dependencies.


Or, to put this another way, under what conditions can "checking that 
generated files are newer than configure" actually fail?  If we do not 
know of any, then perhaps we should add a hidden 
"--enable--wait-for-newer-config.status" (double-hyphen intentional) 
option, and unless that option is given, bail out with a message asking 
the user (1) to report the system and environment configuration to the 
Automake list and (2) to rerun configure with that option, which would 
sleep until config.status is newer instead of bailing out.



-- Jacob




Re: 1.16.90 regression: configure now takes 7 seconds to start

2024-06-17 Thread Jacob Bachmeyer

Nick Bowler wrote:

On 2024-06-16 21:35, Jacob Bachmeyer wrote:
  

I think we might best be able to avoid this by using AC_CONFIG_COMMANDS_POST
to touch config.status if necessary, instead of trying to decide
whether to sleep before writing config.status.



If the problem is simply that we want to avoid the situation where
"make" considers config.status to be out of date wrt. configure, or
something similar with any other pair of files, then this should be
solveable fairly easily with a pattern like this (but see below):

  AC_CONFIG_COMMANDS_POST([cat >conftest.mk <<'EOF'
configure: config.status
	false
EOF
  while ${MAKE-make} -f conftest.mk >/dev/null 2>&1
  do
    touch config.status
  done])

In my own experience the above pattern is portable.  It works with HP-UX
make.  It works with a "touch" that truncates timestamps.  In the common
case where configure is sufficiently old the loop condition will always
be false and there is no delay.

It won't guarantee that config.status has a strictly newer timestamp
than configure (except on HP-UX), but it sounds like that's fine.
  


We can guarantee that by reusing the pattern in AM_SANITY_CHECK, which 
uses `ls -t`, with the advantage that we have already used that pattern, 
so it cannot add "new" possible portability problems.  I would also 
suggest a `sleep 1` in the loop instead of spinning on the test, since 
we expect the common case to not loop at all.


Also, if we use `echo >> config.status` as Bruno Haible suggested in 
another reply, every cycle will add one newline to the end of 
config.status, so spinning at the test could make config.status very large.


If we want to allow "checking that generated files are newer than 
configure" to fail, I would suggest bounding this at 5 seconds and 
bailing out after five `sleep 1` iterations if config.status is not 
newer by then, but see below.
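
A minimal sketch of that bounded loop, reusing the `ls -t` comparison 
(untested; the file paths and the error wording are only illustrative):

8<--
AC_CONFIG_COMMANDS_POST([am_tries=5
while test "`ls -t config.status "$srcdir/configure" 2>/dev/null | sed 1q`" != config.status
do
  if test $am_tries -eq 0; then
    AC_MSG_ERROR([config.status could not be made newer than configure])
  fi
  sleep 1
  touch config.status
  am_tries=`expr $am_tries - 1`
done])
8<--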



One missing element is that there is no limit, which would be a bit of a
problem if the clock skew is severe (e.g., if configure's mtime is years
or even minutes in the future), so something extra is probably desirable
to bound the amount of time this runs to something practical.
  


This will not be a problem:  AM_SANITY_CHECK bails out (or will bail 
out) if a recently-created file cannot be made newer than configure by 
sleeping briefly.  If configure's mtime is in the future, config.status 
will never be written and this code will never be reached.  The delay 
here is thus bounded by the filesystem timestamp resolution, since we 
may have to wait until config.status is newer than configure---but no 
longer---and that only if configure was regenerated just before being 
run.  In the case of a tree of configure scripts that started this 
current mess, time will march on as the first run waits for 
config.status to be newer, and the later configure runs will each find 
that their config.status is newer when it is first written.


In fact, now that I think about it, I am not sure how this could ever be 
a problem:  time marches on as AM_SANITY_CHECK is doing its thing before 
any tests are run, so once conftest.file is newer than configure, surely 
config.status, which is produced after all tests are run, /must/ also be 
newer than configure?


How is this last check/delay actually necessary?  Are there broken 
systems out there that play games with causality?



-- Jacob



Re: use of make in AM_SANITY_CHECK

2024-06-16 Thread Jacob Bachmeyer

Karl Berry wrote:

make(1) in AM_SANITY_CHECK seems to be a logic error, since the user
may want to build with a different $MAKE,

You're right. Crap. It never ends.

In practice it probably doesn't matter, though.  Although in theory one
can imagine that "make" succeeds while $MAKE fails, resulting in a false
positive, in practice that seems next to zero probability to me. Much
more likely is that "make" fails and $MAKE succeeds, and the only
downside of that is an extra second of sleeping.
  


The problem is that we still sleep unnecessarily in the sanity check.  
While there is no way to avoid sleeping if we need to /measure/ the 
filesystem timestamp resolution, few packages actually need that 
information (Automake itself is one of them, for its testsuite) and the 
sanity check can be (and previously was) done without actually measuring it.


have a way to revise AM_SANITY_CHECK that can avoid any sleep in the 
most common cases.


Bruno's last patch already does that, doesn't it? I'll apply it shortly.
  


No, that patch does not:  it promotes 
_AM_FILESYSTEM_TIMESTAMP_RESOLUTION to 
AM_FILESYSTEM_TIMESTAMP_RESOLUTION (removing the underscore), but still 
calls it as part of AM_SANITY_CHECK.


I propose first mostly reverting to the code in commit 
f6b3f7fb620580356865ebedfbaf76af3e534369:  revise AM_SANITY_CHECK to 
create a test file and immediately check whether that file is newer than 
configure itself; if not, sleep for one second, overwrite the test file, 
and test again; if still not, sleep one more second and repeat, so that 
FAT filesystems can be considered "sane".  Then replace the effect of 
commit 333c18a898e9042938be0e5709ec46ff0ead0797, and fix the problem of 
config.status not being newer than configure, by adding an 
AC_CONFIG_COMMANDS_POST block that checks whether config.status is newer 
than configure and, if not, sleeps one second and runs "touch 
config.status", repeating that test once more (again to accommodate FAT 
filesystem limitations) if needed.
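
In shell terms, the AM_SANITY_CHECK half of that proposal would look 
roughly like this (a sketch of the idea only, not the actual sanity.m4 
code; the config.status half would be a similar short loop in 
AC_CONFIG_COMMANDS_POST):

8<--
# Create a test file and check whether it sorts newer than configure;
# sleep and retry up to twice so that a FAT filesystem (2-second
# granularity) can still pass.
am_sane=no
for am_try in 1 2 3; do
  echo timestamp > conftest.file
  set X `ls -t "$srcdir/configure" conftest.file 2>/dev/null`
  if test "$2" = conftest.file; then
    am_sane=yes
    break
  fi
  test $am_try -lt 3 && sleep 1
done
rm -f conftest.file
if test $am_sane != yes; then
  echo "configure: error: newly created files are older than configure;" \
       "check your system clock" >&2
  exit 1
fi
8<--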


In Mike Frysinger's situation of a Gentoo build with many small 
configure scripts, this /should/ result in at most one configure 
sleeping once, after which all of the other freshly regenerated 
configure scripts will already be old enough to avoid delays.



-- Jacob




Re: 1.16.90 regression: configure now takes 7 seconds to start

2024-06-16 Thread Jacob Bachmeyer

Karl Berry wrote:

Find here attached a revised proposed patch.

Ok on the reorg, but sorry, I remain confused. This whole thing started
with Mike Vapier's change in Feb 2022 (commit 720a11531):
https://lists.gnu.org/archive/html/automake-commit/2022-02/msg9.html

As I read it now, his goal was to speed up other projects, not Automake,
by reducing the "sleep 1" to "sleep " in
AM_SANITY_CHECK, via AC_CONFIG_COMMANDS_PRE, i.e., before creating
config.status.

But that is only one instance of generating files. I must be missing
something obvious. There are zillions of generated files in the
world. For instance, why aren't there problems when a small C file is
created and compiled? That could easily take less than 1 second, if that
is the mtime resolution.

I understand that equal timestamps are considered up to date, and
presumably the .c and .o (say) would be equal in such a case. Ok, but
then why is configure generating config.status/etc. such a special case
that it requires the sleep, and nothing else? I mean, I know the sleep
is needed; I've experienced the problems without that sleep myself. But
I don't understand why it's the only place (in normal compilations;
forget the Automake test suite specifically) that needs it.
  


The sleep appears to have been introduced in commit 
333c18a898e9042938be0e5709ec46ff0ead0797, which also added an item in NEWS:


8<--
* Miscellaneous changes:

 - Automake's early configure-time sanity check now tries to avoid sleeping
   for a second, which slowed down cached configure runs noticeably.  In that
   case, it will check back at the end of the configure script to ensure that
   at least one second has passed, to avoid time stamp issues with makefile
   rules rerunning autotools programs.
8<--


Mike Frysinger then complained that the above 
change, which enacted a policy of ensuring that any configure run 
requires at least one second, significantly delayed building packages 
that use many small configure scripts; his example in commit 
720a1153134b833de9298927a432b4ea266216fb showed an elimination of nearly 
two minutes of useless delays.  He appears to have also been trying to 
improve the performance of such a package in commit 
be55eaaa0bae0d6def92d5720b0e81f1d21a9db2, which may have actually made 
the problem worse by changing the test that determines whether to sleep 
at all.


I think we might best be able to avoid this by using 
AC_CONFIG_COMMANDS_POST to touch config.status if necessary, instead of 
trying to decide whether to sleep before writing config.status.



Can someone please educate me as to what is really going on underneath
all this endless agonizing tweaking of the mtime tests?


I think that the main problem is that the test itself is difficult to do 
portably.



-- Jacob




Re: use of make in AM_SANITY_CHECK (was: improved timestamp resolution test)

2024-06-15 Thread Jacob Bachmeyer

Karl Berry wrote:

Jacob,

[*sigh*]

You said it. About this whole thing. I rather wish this bright idea had
never come to pass. It has delayed the release by months. Oh well.

Still, could we use make(1) for *all* of the testing and not use `ls -t` 


I guess it is technically possible, but I somehow feel doubtful about
relying entirely on make. Using ls already has plenty of portability
issues; I shudder to think how many strange problems we'll run into when
we start exercising timing edge cases in make.


Well, after having had some time to think about this, I have noticed a 
logic error in the current code.  When 
_AM_FILESYSTEM_TIMESTAMP_RESOLUTION was introduced in commit 
720a1153134b833de9298927a432b4ea266216fb, it did not use make.  Commit 
23e69f6e6d29b0f9aa5aa3aab2464b3cf38a59bf introduced the use of make in 
that test to work around a limitation on MacOS, but using make(1) in 
AM_SANITY_CHECK seems to be a logic error, since the user may want to 
build with a different $MAKE, which may have different characteristics 
from the system make.


I think that we actually need a new AM_PROG_MAKE_FILESYSTEM_TICK_DELAY 
or similar that packages needing that information can use, and I think I 
have a way to revise AM_SANITY_CHECK that can avoid any sleep in the 
most common cases.  There is no way to avoid sleeping when we need to 
measure the exact delay needed for files to be distinguishably newer, 
but most packages probably do not care about that, and (in the most 
common case) we can expect configure's mtime to be backdated according 
to the tarball it was unpacked from.  If configure was recently 
regenerated, we need only sleep 1 (classic POSIX) or 2 (FAT) seconds 
before either passing the test or declaring the build environment 
insane.  However, a package with a large number of configure scripts 
will only need for one of them to sleep; the rest will all then be old 
enough to take the zero-delay path.


Are you willing to consider patches on this?


-- Jacob



Re: End of life dates vs regression test matrix

2024-06-13 Thread Jacob Bachmeyer

Dan Kegel wrote:

Does automake have a policy on when to stop supporting a CPU, operating
system, or compiler?

I am pondering the size of the matrix of supported operating systems, cpus,
and compilers, and wonder where a policy like
"Automake drops support 20 years after the release of a CPU, operating
system, or compiler version" would fall on the heresy/utility plane.


The way I understand that the GNU build system is supposed to work is 
that there are no "supported" CPUs, operating systems, etc.  The GNU 
build system adapts packages to features found on the current machine by 
testing for those features just before building the package, using an 
often very lengthy shell script named "configure" that is itself 
generated by the relevant maintainer tools.


This system has worked surprisingly well---releases made years ago can 
often be adapted to processor architectures that literally did not exist 
when the source tarball was built by simply replacing config.{guess,sub} 
with current versions that recognize the newer architecture.  As far as 
those scripts embodying lists of known architectures go, entries appear 
to /never/ expire, and config.guess still today can identify (or so we 
think) systems that predate POSIX.



-- Jacob




Re: improved timestamp resolution test (was: 1.16.90 regression: configure now takes 7 seconds to start)

2024-06-12 Thread Jacob Bachmeyer

Karl Berry wrote:
Does BSD ls(1) support "--time=ctime --time-style=full-iso"?  


BSD ls does not support any --longopts. Looking at the man page,
I don't see "millisecond" or "subsecond" etc. mentioned, though I could
easily be missing it. E.g.,
  https://man.freebsd.org/cgi/man.cgi?ls

Even if there is such an option, I am skeptical of how portable it would
be, or trying to discern whether it is really working or not. All the
evidence so far is that it is very difficult to determine whether
subsecond mtimes are sufficiently supported or not. Speaking in general,
I don't think trying to get into system-specific behaviors, of whatever
kind, is going to help.


[*sigh*]

It seems that there is no good way for configure to read timestamps, so 
we are limited to testing if file ages are distinguishable.


Still, could we use make(1) for *all* of the testing and not use `ls -t` 
at all?  A rough outline would be something like:  (lightly tested; runs 
in about 2.2s here)


8<--
# The case below depends on the 1/10 + 9/10 = 10/10 pattern.
am_try_resolutions="0.01 0.09 0.1 0.9 1"
echo '#' > conftest.mk
i=0
for am_try_res in $am_try_resolutions; do
  echo ts${i} > conftest.ts${i}
  sleep $am_try_res
  echo "conftest.ts${i}: conftest.ts"`expr 1 + $i` >> conftest.mk
  # each makefile recipe line must begin with a tab, hence printf '\t...'
  printf '\techo %s\n' $am_try_res >> conftest.mk
  i=`expr 1 + $i`
done
echo end > conftest.ts${i}
# This guess can be one step too fast, if the shorter delay just
# happened to span a clock tick boundary.
am_resolution_guess=`make -f conftest.mk conftest.ts0 | tail -1`
case $am_resolution_guess in
  *9)
    i=no
    for am_try_res in $am_try_resolutions; do
      if test x$i = xyes; then
        am_resolution=$am_try_res
        break
      fi
      test x$am_try_res = x$am_resolution_guess && i=yes
    done
    ;;
  *)
    am_resolution=$am_resolution_guess
    ;;
esac
8<--


The trick is that the conftest.ts files form a dependency chain, but the 
command make executes does /not/ actually touch the target, so it 
stops when the files are no longer distinguishable.  This distinguishes 
between a tmpfs (which has nanosecond resolution here) and /home (which 
is an older filesystem with only 1-second resolution).  I am not sure 
what it does with FAT yet.  There should be some way to use 0.1+0.9+1 = 
2 and 0.01+0.09+0.1+0.9+1 > 2 to check for that (accurately!) without 
further sleeps.



-- Jacob



Re: 1.16.90 regression: configure now takes 7 seconds to start

2024-06-12 Thread Jacob Bachmeyer

dherr...@tentpost.com wrote:
At some point, it becomes unreasonable to burden common platforms with 
delays that only support relatively obscure and obsolete platforms.  
Configure scripts already have a bad reputation for wasting time.  
Even if they are faster than editing a custom makefile, they are idle 
instead of active time for the user, so waiting is harder.


I feel that 6-second test delays or 2-second incremental delays later 
qualify as clearly unreasonable.  The 1-second timestamps are 
borderline unreasonable.  Cross-compiling with a decent filesystem is 
more reasonable.


One second timestamp granularity is classic POSIX, and apparently also 
modern NetBSD.  We must support it.


Why can't we resolve this by requiring systems with 2-second 
resolution to set a flag in config.site?  That moves the burden closer 
to where it belongs.


First, because configure scripts are supposed to Just Work without 
particular expertise on the part of the user.  (Users with such 
deficient systems are least likely to have the expertise to handle 
that.)  Second, because timestamp resolution is actually per-volume, 
which, in the POSIX model, means that it varies by directory.  You can even 
have a modern filesystem (with nanosecond granularity) mounted on a 
directory in a FAT filesystem (with two second granularity) and 
ultimately a root filesystem with one second granularity.


In fact, the machine on which I type this has all three:  any tmpfs has 
nanosecond resolution, but /home has been carried for many years since 
mkfs and has one-second resolution, and I have removable media that is 
formatted FAT with its infamous two-second resolution.  All of these, 
when in use, appear in the same hierarchical filesystem namespace.



-- Jacob



Re: Bug Resilience Program of German Sovereign Tech Fund

2024-06-12 Thread Jacob Bachmeyer

Karl Berry wrote:

[...]

 > and reduce technical debt.

I don't know what that means. I instinctively shy away from such
vague buzzwords.
  


Essentially, "technical debt" means "stuff on the TODO list" and more 
specifically the accumulation of "good enough for now; fix it later" 
that tends to happen in software projects.



As for "modernizing" autoconf/make, mentioned in other msgs, that's the
last thing that should be done. We go to a lot of trouble to make the
tools work on old systems that no one else supports. For example, I can
just picture them saying "oh yes, you should use $(...) instead of
`...`" and other such "modern" shell constructs. Or "use Perl module
xyz to simplify", where xyz only became available a few years ago. Etc.
  


If you make them run their patches past the mailing list, I will happily 
complain if they try to break backwards compatibility without a very 
good reason.  Remember Time::HiRes and perl 5.6?  :-)



-- Jacob



Re: 1.16.90 regression: configure now takes 7 seconds to start

2024-06-11 Thread Jacob Bachmeyer

Karl Berry wrote:

bh> Seen e.g. on NetBSD 10.0.

Which doesn't support subsecond mtimes?

jb> Maybe the best answer is to test for subsecond timestamp
granularity first, and then only do the slow test to distinguish
between 1-second and 2-second granularity if the subsecond
granularity test gives a negative result?

Unfortunately, that is already the case. The function
(_AM_FILESYSTEM_TIMESTAMP_RESOLUTION in m4/sanity.m4) does the tests
starting with .01, then .1, then 1. Searching for sleep [012] in Bruno's
log confirms this.
  


So we are hitting the one-second timestamp granularity path because 
there is a modern system that does not have sub-second timestamp 
granularity, and that path is annoyingly slow.



If I understand correctly, Bruno's goal is to omit the "1" test if we
can detect that we're not on a fat filesystem. But I admit I don't like
trying to inspect filesystem types. That way lies madness, it seems to
me, and this whole function is already maddening enough. E.g., mount
and/or df could hang if NFS is involved.
  


I agree, although I had not considered the possibilities of problems 
with NFS.



It seems to me that using stat doesn't help because it's not available
by default on the bsds etc., as Bruno pointed out.
  


Does BSD ls(1) support "--time=ctime --time-style=full-iso"?  That would 
give information equivalent to stat(1) and, if at least one file has an 
odd seconds field, would rule out FAT quickly.  It could also indicate 
one-second granularity, if all subsecond parts are zero.  The slow test 
would still be required to confirm the worst case:  all timestamps are 
even because the filesystem has 2-second timestamp granularity.



The simple change is to omit the make test if we are at resolution 1.
That will save 4 seconds. Omitting it is justified because the make test
is only there for the sake of makes that are broken wrt subsecond
timestamps. I will do that.
  


If the critical issue is whether or not make(1) correctly handles 
subsecond timestamp granularity, why not simply test if make(1) 
recognizes subsecond timestamp differences and remove the other tests?  
If we are on a filesystem that does not have subsecond timestamp 
granularity, make will not have it either.
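
For example, something along these lines could probe make directly (a 
rough sketch, untested; it shares the known caveats of fractional sleep 
and of the two writes possibly straddling a whole-second boundary):

8<--
echo target > conftest.t
sleep 0.1
echo newer > conftest.d
# note: the recipe line inside the here-document begins with a literal tab
cat > conftest.mk <<'EOF'
conftest.t: conftest.d
	@echo stale
EOF
# If make (and the filesystem) can see the 0.1s difference, the recipe
# runs and prints "stale"; otherwise the target appears up to date.
case `${MAKE-make} -f conftest.mk 2>/dev/null` in
  *stale*) am_make_subsecond=yes ;;
  *)       am_make_subsecond=no ;;
esac
rm -f conftest.t conftest.d conftest.mk
8<--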



That will leave 2+ sec of sleeping, but if we are to reliably support
fat, I don't see a good alternative. At least it's not as bad as 6+.
Any other ideas?


As I hinted at, could we move the entire test into make(1) somehow?  
Could we lay out a set of files with timestamps differing by .01, .1, 1 
seconds and then see which pairs have ages distinguishable by make(1)?  
That should complete in less than 2 seconds:  1.11 seconds to make the 
files and less than half a second to run make(1).  If we can find a 
portable way to read timestamps to 1-second resolution, we can confirm 
not being on FAT---there will be one "odd file out" and the others will 
all have either odd or even timestamps.  If the timestamps all match, we 
can assume 2-second granularity without further testing, or do the slow 
test to confirm it.  If the "odd file out" has a timestamp two seconds 
ahead of the others, we *know* the filesystem has 2-second granularity 
and we crossed a "tick" boundary while making the files.


Alternately, could we improve the UI by emitting one additional dot per 
approximate second during the test?  Reassure the user that, yes, 
configure is doing something, even if all we can actually do is wait for 
the clock to advance.



-- Jacob



Re: 1.16.90 regression: configure now takes 7 seconds to start

2024-06-07 Thread Jacob Bachmeyer

Bruno Haible wrote:

Hi Jacob,

  

AFAIU, the 4x sleep 0.1 are to determine whether
am_cv_filesystem_timestamp_resolution should be set to 0.1 or to 1.
OK, so be it.

But the 6x sleep 1 are to determine whether
am_cv_filesystem_timestamp_resolution should be set to 1 or 2.
2 is known to be the case only for FAT/VFAT file systems. Therefore
here is a proposed patch to speed this up. On NetBSD, it reduces
the execution time of the test from ca. 7 seconds to ca. 0.5 seconds.
  
The problem with the proposed patch is that it tries to read a 
filesystem name instead of testing for the feature.  This would not be 
portable to new systems that use a different name for their FAT 
filesystem driver.



I can amend the patch so that it uses `uname -s` first, and does the
optimization only for the known systems (Linux, macOS, FreeBSD, NetBSD,
OpenBSD, Solaris).
  


This still has the same philosophical problem:  testing for a known 
system rather than for the feature we actually care about.  (We could 
also identify FAT with fair confidence by attempting to create a file 
with a name containing a character not allowed on the FAT filesystem, 
but I remember Linux having had at least one extended FAT driver 
("umsdos" if I remember correctly) that lifted the name limits, but I do 
not remember if it also provided improved timestamps.)


I think the test can be better optimized for the common case by first 
checking if stat(1) from GNU coreutils is available ([[case `stat 
--version` in *coreutils*) YES;; *) NO;; esac]])



Sure, if GNU coreutils 'stat -f' is available, things would be easy.
But typically, from macOS to Solaris, it isn't.

You can't achieve portability by using a highly unportable program
like 'stat'. That's why my patch only uses 'df' and 'mount'.
  


You can use anything in configure, *if* you first test for it and have a 
fallback if it is not available.  In this case, I am proposing testing 
for 'stat -f', using it to examine conveniently-available timestamps to 
establish an upper bound on timestamp granularity if we can, and falling 
back to the current (slow) tests if not.  Users of the GNU system will 
definitely get the fast path.


and, if it is (common 
case and definitely so on the GNU system), checking [[case `stat 
--format=%y .` in *:??.0) SUBSEC_RESOLUTION=no;; *) 
SUBSEC_RESOLUTION=yes;; esac]] to determine if sub-second timestamps are 
likely to be available



I don't care much about the 0.4 seconds spent on determining sub-second
resolution. It's the 6 seconds that bug me.
  


If 'stat -f' is available, we should be able to cut that to 
milliseconds.  GNU systems will have 'stat -f', others might.  The slow 
path would remain available if the fast path cannot be used.  Using a 
direct feature test for 'stat -f' might motivate the *BSDs to also 
support it.


To handle filesystems with 2-second timestamp resolution, check the 
timestamp on configure, and arrange for autoconf to ensure that the 
timestamp of a generated configure script is always odd



Since a tarball can be created on ext4 and unpacked on vfat FS,
  

That is exactly the situation I am anticipating here.

this would mean that autoconf needs to introduce a sleep() of up to
1 second, _regardless_ on which FS it is running. No, thank you,
that is not a good cure to the problem.


One second, once, when building configure, to ensure that configure will 
have an odd timestamp... does autoconf normally complete in less than 
one second?  Would this actually increase the running time 
significantly?  Or, as Simon Richter mentioned, use the utime builtin 
(Autoconf is now written in Perl) to advance the mtime of the created 
file by one second before returning with no actual delay.


The bigger problem would be that it would be impossible to properly 
package such a configure script if using a filesystem with 2-second 
granularity.  Such a configure script would always be unpacked with an 
even timestamp (because it was packaged with an even timestamp) and the 
2-second granularity test would give a false positive if the filesystem 
actually has 1-second granularity, but configure itself was generated on 
a 2-second granularity filesystem.  The suggested tests for sub-second 
granularity would still work correctly on the unpacked files, 
however---if you can see non-zero fractional seconds in timestamps, you 
know that you are not on a 2-second granularity filesystem.


Maybe the best answer is to test for subsecond timestamp granularity 
first, and then only do the slow test to distinguish between 1-second 
and 2-second granularity if the subsecond granularity test gives a 
negative result?  Most modern systems will have the subsecond timestamp 
granularity, so would need only the 0.4 second test; older systems would 
need the full 6.4 second test, but would still work reliably.  At worst, 
we might need to extend the 0.4 second test to 0.5 seconds, to confirm 
that we did not just happen to straddle a clock tick boundary.

Re: 1.16.90 regression: configure now takes 7 seconds to start

2024-06-07 Thread Jacob Bachmeyer

Bruno Haible wrote:

[I'm writing to automake@gnu.org because bug-autom...@gnu.org
appears to be equivalent to /dev/null: no echo in
https://lists.gnu.org/archive/html/bug-automake/2024-06/threads.html
nor in https://debbugs.gnu.org/cgi/pkgreport.cgi?package=automake,
even after several hours.]

In configure scripts generated by Autoconf 2.72 and Automake 1.16.90,
one of the early tests
  checking filesystem timestamp resolution...
takes 7 seconds! Seen e.g. on NetBSD 10.0.

Logging the execution time, via
  sh -x ./configure 2>&1 | gawk '{ print strftime("%H:%M:%S"), $0; fflush(); }' > log1
I get the attached output. There are
  6x sleep 1
  4x sleep 0.1
That is, 6.4 seconds are wasted in sleeps. IBM software may do this;
but GNU software shouldn't.

AFAIU, the 4x sleep 0.1 are to determine whether
am_cv_filesystem_timestamp_resolution should be set to 0.1 or to 1.
OK, so be it.

But the 6x sleep 1 are to determine whether
am_cv_filesystem_timestamp_resolution should be set to 1 or 2.
2 is known to be the case only for FAT/VFAT file systems. Therefore
here is a proposed patch to speed this up. On NetBSD, it reduces
the execution time of the test from ca. 7 seconds to ca. 0.5 seconds.


The problem with the proposed patch is that it tries to read a 
filesystem name instead of testing for the feature.  This would not be 
portable to new systems that use a different name for their FAT 
filesystem driver.


I think the test can be better optimized for the common case by first 
checking if stat(1) from GNU coreutils is available ([[case `stat 
--version` in *coreutils*) YES;; *) NO;; esac]]) and, if it is (common 
case and definitely so on the GNU system), checking [[case `stat 
--format=%y .` in *:??.0) SUBSEC_RESOLUTION=no;; *) 
SUBSEC_RESOLUTION=yes;; esac]] to determine if sub-second timestamps are 
likely to be available; this has roughly a 1-in-(actual ticks per second) 
chance of giving a false negative.  These checks would be very fast, so could also 
be repeated with the access and inode change timestamps and/or extended 
to other files (`stat *`) for better certainty.  The basic concept 
should be sound, although the pattern matching used in the examples is a 
first cut.
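
A sketch of that scan (GNU coreutils stat assumed; the file list and the 
sed expression are illustrative first cuts, in the same spirit as the 
patterns above):

8<--
am_subsec=no
if stat --version 2>/dev/null | grep coreutils >/dev/null; then
  for am_f in "$srcdir"/configure "$srcdir"/aclocal.m4 .; do
    # %y prints e.g. "2024-06-07 12:34:56.123456789 -0500"; capture the
    # fractional-seconds field and look for any non-zero digit.
    am_frac=`stat --format=%y "$am_f" 2>/dev/null | sed -n 's/.*:[0-5][0-9]\.\([0-9]*\).*/\1/p'`
    case $am_frac in
      *[1-9]*) am_subsec=yes; break ;;
    esac
  done
fi
echo "sub-second timestamps visible: $am_subsec"
8<--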


The essential idea is that the fractional part beyond what the 
filesystem actually records will always read as zero, and unpacking an 
archive is not instant, so we should see every implemented fractional 
bit set at least once across files in the tree containing configure.


To handle filesystems with 2-second timestamp resolution, check the 
timestamp on configure, and arrange for autoconf to ensure that the 
timestamp of a generated configure script is always odd---that 
least-significant bit will be dropped when the script is unpacked on a 
filesystem with 2-second timestamp resolution.
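
A sketch of the mechanism (GNU date and touch used here only for 
brevity; inside autoconf itself this would more naturally be done with 
Perl's utime, with no external tools at all):

8<--
# If configure's mtime has an even seconds value, bump it by one second
# so that the seconds value is odd.
am_secs=`date -r configure +%s`
if test `expr $am_secs % 2` -eq 0; then
  touch -d "@`expr $am_secs + 1`" configure
fi
8<--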


If stat from GNU coreutils is not available, fall back to the current 
sleep(1)-based test and just eat the delay in the name of portability.  
The test checks only for "coreutils" because very old versions did not 
say GNU.  A better, functional test for stat(1) is probably also possible.



-- Jacob




Re: follow-up on backdoor CPU usage (was: libsystemd dependencies)

2024-04-26 Thread Jacob Bachmeyer

Jacob Bachmeyer wrote:
[...]  The preliminary reports that it was an RCE backdoor that would 
pass commands smuggled in public key material in SSH certificates to 
system(3) (as root of course, since that is sshd's context at that 
stage) are inconsistent with the slowdown that caused the backdoor to 
be discovered.  I doubt that SSH logins were using that code path, and 
the SSH scanning botnets almost certainly are not presenting 
certificates, yet it apparently (reports have been unclear on this 
point) was the botnet scanning traffic that led to the discovery of 
sshd wasting considerable CPU time in liblzma...


I am waiting for the proverbial other shoe to drop on that one.


I have been given 
(<https://www.openwall.com/lists/oss-security/2024/04/18/1>) a 
satisfactory explanation for the inconsistency:  OpenSSH sshd uses 
exec(2) to reshuffle ASLR before accepting each connection, and the 
backdoor blob's tampering with the dynamic linking process greatly 
reduces the efficiency of ld.so on top of its own processing.  The 
observable wasted CPU time was the backdoor's excessively-complex 
initialization, rather than any direct effect on sshd connection processing.



-- Jacob




Re: GCC reporting piped input as a security feature

2024-04-12 Thread Jacob Bachmeyer

Zack Weinberg wrote:

On Tue, Apr 9, 2024, at 11:35 PM, Jacob Bachmeyer wrote:
  

Jan Engelhardt wrote:


On Tuesday 2024-04-09 05:37, Jacob Bachmeyer wrote:

  

In principle it could be possible to output something different to
describe this strange situation explicitly.  For instance, output
"via stdin" as a comment, or output `stdin/../filename' as the file
name. (Programs that optimize the file name by deleting XXX/.../
are likely not to check whether XXX is a real directory.)
  

...

How about `/dev/stdin/-` if no filename has been specified with #line or
whatever, and `/dev/stdin/[filename]` if one has, where [filename] is
the specified filename with all leading dots and slashes stripped,
falling back to `-` if empty? /dev/stdin can be relied on to either not
exist or not be a directory, so these shouldn't ever be openable.


I like that idea, but would suggest expanding on it as 
"/dev/stdin/[working directory]//-" or "/dev/stdin/[working 
directory]//[full specified filename]".  The double slash allows tools 
that care to parse out the specified filename, while the working 
directory preceding it provides a hint where to find that file if the 
specified filename is relative, but the kernel will collapse it to a 
single slash if a tool just passes the "[working directory]//[specified 
filename]" to open(2).  Since the working directory should itself be an 
absolute name, there would typically be a double slash after the 
"/dev/stdin" prefix.  Something like 
"/dev/stdin//var/cache/build/foopkg-1.0.0///usr/src/foopkg-1.0.0/special.c.m4" 
as an artificial example.



-- Jacob




Re: GCC reporting piped input as a security feature

2024-04-09 Thread Jacob Bachmeyer

Jan Engelhardt wrote:

On Tuesday 2024-04-09 05:37, Jacob Bachmeyer wrote:
  

In principle it could be possible to output something different to
describe this strange situation explicitly.  For instance, output "via
stdin" as a comment, or output `stdin/../filename' as the file name.
(Programs that optimize the file name by deleting XXX/.../ are likely
not to check whether XXX is a real directory.)
  

With the small difference that I believe the special marker should be '<stdin>'
(with the angle brackets, as it is now), this could be another good idea.
Example output:  "[working directory][specified filename]" or
"[specified filename]///<>/[working directory]/".  GDB could be modified
[...]



This will likely backfire. Assuming you have a userspace program
which does not care about any particular substring being present, the
fullpath is passed as-is to the OS kernel, which *will* resolve it
component by component, and in doing so, stumble over the XXX/ part.
  


And upon so stumbling, return ENOENT or ENOTDIR.  Where is the harm 
there?  Input read from a pipe does not exist in the filesystem.



Better introduce a new DW_AT_ field for a stdin flag.
  


That would mean that older tools could be confused.  How about a new 
field for "source-specified filename" when that differs from the actual 
file being read?  That way, existing tools would still see "[working 
directory]/<stdin>" and avoid confusion, which could be a security risk 
here.



-- Jacob




Re: GCC reporting piped input as a security feature

2024-04-09 Thread Jacob Bachmeyer

Alan D. Salewski wrote:

On 2024-04-08 22:37:50, Jacob Bachmeyer  spake thus:

Richard Stallman wrote:

[...]

In principle it could be possible to output something different to
describe this strange situation explicitly.  For instance, output "via
stdin" as a comment, or output `stdin/../filename' as the file name.
(Programs that optimize the file name by deleting XXX/.../ are likely
not to check whether XXX is a real directory.)



With the small difference that I believe the special marker should be
'<stdin>' (with the angle brackets, as it is now), this could be another
good idea.  Example output:  "[working directory][specified
filename]" or "[specified filename]///<>/[working directory]/".
GDB could be modified to recognize either form and read the specified
file (presumably some form of augmented C) but report that the sources
were transformed prior to compilation.  The use of triple-slash ensures
that these combined strings cannot be confused with valid POSIX
filenames, although I suspect that uses of these strings would have to
be a GNU extension to the debugging info format.


I do not think that the use of triple-slash (or any-N-slash) would
entirely prevent potential confusion with valid POSIX filenames, as
POSIX treats multiple slashes as equivalent to a single slash
(except at the beginning of a path, where two slash characters
may have a different, implementation-defined meaning).

Since a pathname component name can basically contain any bytes
except <slash> and <NUL>, any token value chosen will likely have
some non-zero potential for confusion with a valid POSIX pathname.


Yes, this is the downside of the extreme flexibility of POSIX filename 
semantics.  Any C string is potentially a valid filename.



From SUSv4 2018[0] (update from 2020-04-30, which is what I happen
to have handy):


3.271 Pathname

A string that is used to identify a file. In the context of
POSIX.1-2017, a pathname may be limited to {PATH_MAX} bytes,
including the terminating null byte. It has optional beginning
<slash> characters, followed by zero or more filenames separated
by <slash> characters. A pathname can optionally contain one or
more trailing <slash> characters. Multiple successive <slash>
characters are considered to be the same as one <slash>, except
for the case of exactly two leading <slash> characters.



Rats, I had forgotten that detail.  Emacs treats typing a second slash 
as effectively invalidating everything to the left; I remembered that 
some systems (and many URL schemes) use double-slash to indicate a 
network host; and I expected that three slashes would mean starting over 
at the root if that were ever presented to the kernel's filename 
resolution service.


On the other hand, we could use multiple slashes as a delimiter if GCC 
normalizes such sequences in input filename strings to single slash, 
which POSIX allows, according to the quote above.


The simplest solution would be to simply document and preserve the 
current behavior, which appears to be ignoring directives and recording 
the working directory and "<stdin>" in the case of reading from a pipe, 
and making sure that no normal build procedure for any GNU package pipes 
source into the compiler.



-- Jacob




Re: GCC reporting piped input as a security feature

2024-04-08 Thread Jacob Bachmeyer

Richard Stallman wrote:

[[[ To any NSA and FBI agents reading my email: please consider]]]
[[[ whether defending the US Constitution against all enemies, ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

  > While it does not /prevent/ cracks, there is something we can ensure 
  > that we *keep* doing:  GCC, when reading from a pipe, records the input 
  > file as "<stdin>" in debug info *even* if a "#" directive to set the 
  > filename has been included.  This was noticed by Adrien Nader (who 
  > posted it to oss-security; 
  > <https://www.openwall.com/lists/oss-security/2024/04/03/2> and 
  > <https://marc.info/?l=oss-security&m=171214932201156&w=2>; those are 
  > the same post at different public archives) and should provide a 
  > "smoking gun" test to detect this type of backdoor dropping technique in 
  > the future.  This GCC behavior should be documented as a security 
  > feature, because most program sources are not read from pipes.


Are you suggesting fixing GCC to put the specified file into those
linenumbers, or are you suggesting we keep this behavior
to help with analysis?
  


I am suggesting that we keep this behavior (and document it as an 
explicit security feature) to help with detection of any future similar 
cracks, and add provisions to the GNU Coding Standards to avoid false 
positives by requiring generated sources to appear in the filesystem 
instead of being piped to the compiler.



In principle it could be possible to output something different to
describe this strange situation explicitly.  For instance, output "via
stdin" as a comment, or output `stdin/../filename' as the file name.
(Programs that optimize the file name by deleting XXX/.../ are likely
not to check whether XXX is a real directory.)
  


With the small difference that I believe the special marker should be 
'<stdin>' (with the angle brackets, as it is now), this could be another 
good idea.  Example output:  "[working directory][specified 
filename]" or "[specified filename]///<>/[working directory]/".  
GDB could be modified to recognize either form and read the specified 
file (presumably some form of augmented C) but report that the sources 
were transformed prior to compilation.  The use of triple-slash ensures 
that these combined strings cannot be confused with valid POSIX 
filenames, although I suspect that uses of these strings would have to 
be a GNU extension to the debugging info format.  (If GNU-extended 
debugging information is inhibited, I think it is more important to 
declare that the input came from a pipe than to carry the specified 
filename.)  This might actually be a good idea in general if a directive 
specifies a filename with the same suffix but not the file being read.


As another layer against similar attacks, distribution packaging tools 
could grep the debug symbols for '<stdin>' and raise alarms if matches 
are found.  Forbidding piping source to the compiler in the GNU Coding 
Standards would eliminate false positives.
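
For instance, a packaging tool might do something as simple as this (a 
sketch only; binutils readelf assumed, and the object path is purely 
illustrative):

8<--
for obj in ./usr/lib/liblzma.so.5.*; do          # illustrative path
  if readelf --debug-dump=info "$obj" 2>/dev/null | grep -F '<stdin>' >/dev/null
  then
    echo "warning: debug info in $obj references <stdin>" >&2
  fi
done
8<--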



-- Jacob



Re: detecting modified m4 files (was: GNU Coding Standards, automake, and the recent xz-utils backdoor)

2024-04-07 Thread Jacob Bachmeyer

Bruno Haible wrote:

Richard Stallman commented on Jacob Bachmeyer's idea:
  
  > > Another related check that /would/ have caught this attempt would be 
  > > comparing the aclocal m4 files in a release against their (meta)upstream 
  > > sources before building a package.  This is something distribution 
  > > maintainers could do without cooperation from upstream.  If 
  > > m4/build-to-host.m4 had been recognized as coming from gnulib and 
  > > compared to the copy in gnulib, the nonempty diff would have been 
  > > suspicious.


I have a hunch that some effort is needed to do that comparison, but
that it is feasible to write a script to do it, which could make it easy.
Is that so?



Yes, the technical side of such a comparison is relatively easy to
implement:
  - There are less than about 2000 or 5000 *.m4 files that are shared
between projects. Downloading and storing all historical versions
of these files will take ca. 0.1 to 1 GB.
  - They would be stored in a content-based index, i.e. indexed by
sha256 hash code.
  - A distribution could then quickly test whether a *.m4 file found
in a distrib tarball is "known".

The recurrently time-consuming part is, whenever an "unknown" *.m4 file
appears, to
  - manually review it,
  - update the list of upstream git repositories (e.g. when a project
has been forked) or the list of releases to consider (e.g. snapshots
of GNU Autoconf or GNU libtool, or distribution-specific modifications).

I agree with Jacob that a distro can put this in place, without needing
to bother upstream developers.


I have since thought of a simple solution that /would/ have caught this 
backdoor campaign in its tracks:  an "autopoint --check" command that 
simply compares the files already present in m4/ (and possibly other 
directories?) against the copies autopoint would install if m4/ were 
empty, and reports any differences.  A newer serial in the package tree 
than the system m4 library produces a minor complaint; a file with the 
same serial and different contents produces a major complaint.  An older 
serial in the package tree should be reported, but is likely to be of no 
consequence if a distribution's packaging routine will copy in the 
known-good newer version before rebuilding configure.  Any m4/ files 
local to the package are simply reported, but those are also in the 
package's Git repository.
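
A rough sketch of the comparison such a command would perform (locations 
are illustrative; autopoint actually keeps its master copies in a 
versioned archive rather than in /usr/share/aclocal):

8<--
for am_f in m4/*.m4; do
  am_base=`basename "$am_f"`
  am_master="/usr/share/aclocal/$am_base"     # stand-in for the real archive
  if test -f "$am_master"; then
    cmp -s "$am_f" "$am_master" ||
      echo "differs from system copy (compare serial numbers): $am_f"
  else
    echo "package-local macro (no system copy known): $am_f"
  fi
done
8<--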


Distribution package maintainers would run "autopoint --check" and pass 
any suspicious files to upstream maintainers for evaluation.  (The 
distribution's own packaging system can trace an m4 file in the system 
library back to the upstream package it came from.)  The modified 
build-to-host.m4 would have been very /unlikely/ to slip past the 
gnulib/gettext/Automake/Autoconf maintainers, although few distribution 
packagers would have had suspicions.  The gnulib maintainers would know 
that gl_BUILD_TO_HOST should not be checking /anything/ itself, and the 
crackers would have been caught.


This should be effective in closing off a large swath of possible 
attacks:  a backdoor concealed in binary test data (or documentation) 
requires some visible means to unpack it, which means the unpacker must 
appear in source somewhere.  While the average package maintainer might 
not be able to make sense of a novel m4 file, the maintainers of GNU's 
version of that file /will/ be able to recognize such chicanery, and the 
"red herrings" the cracker added for obfuscation would become a 
liability.  Without them, the effect of the new code is more obvious, so 
the crackers lose either way.



-- Jacob




Re: GCC reporting piped input as a security feature (was: GNU Coding Standards, automake, and the recent xz-utils backdoor)

2024-04-05 Thread Jacob Bachmeyer

Richard Stallman wrote:

[[[ To any NSA and FBI agents reading my email: please consider]]]
[[[ whether defending the US Constitution against all enemies, ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

[...]

When considering any such change, we still should consider the question:
will this actually prevent cracks, or will it rather give crackers
an additional way to check that their activities can't be detected.
  


While it does not /prevent/ cracks, there is something we can ensure 
that we *keep* doing:  GCC, when reading from a pipe, records the input 
file as "<stdin>" in debug info *even* if a "#" directive to set the 
filename has been included.  This was noticed by Adrien Nader (who 
posted it to oss-security; 
<https://www.openwall.com/lists/oss-security/2024/04/03/2> and 
<https://marc.info/?l=oss-security&m=171214932201156&w=2>; those are 
the same post at different public archives) and should provide a 
"smoking gun" test to detect this type of backdoor dropping technique in 
the future.  This GCC behavior should be documented as a security 
feature, because most program sources are not read from pipes.


The xz backdoor dropper took great pains to minimize its use of the 
filesystem; only the binary blob ever touches the disk, and that 
presumably because there is no other way to feed it into the linker.  If 
debug info is regularly checked for symbols obtained from "<stdin>" and 
the presence of such symbols reliably indicates funny business, then we 
force crackers to risk leaving more direct traces in the filesystem, 
instead of being able to patch the code "in memory" and feed an 
ephemeral stream to the compiler.  The "Jia Tan" crackers seem to have 
put a lot of work into minimizing the "footprint" of their dropper, so 
we can assume that this must have been important to them.


To avoid false positives if this test is used, we might want to add a 
rule to the GNU Coding Standards (probably in the "Makefile Conventions" 
section) that code generated with other utilities MUST always be 
materialized in the filesystem and MUST NOT be piped into the compiler.
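
As a sketch of the kind of rule such a convention would require, reusing 
the artificial special.c.m4 name from elsewhere in these messages (file 
names hypothetical):

8<--
# Generated source is written to disk and compiled from there...
special.c: special.c.m4
	m4 special.c.m4 > special.c

special.o: special.c
	$(CC) $(CFLAGS) -c -o special.o special.c

# ...rather than the pattern the rule above replaces:
#	m4 special.c.m4 | $(CC) $(CFLAGS) -x c -c -o special.o -
8<--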



-- Jacob



Re: compressed release distribution formats (was: GNU Coding Standards, automake, and the recent xz-utils backdoor)

2024-04-02 Thread Jacob Bachmeyer

Eric Blake wrote:

[adding in coreutils, for some history]

[...]

At any rate, it is now obvious (in hindsight) that zstd has a much
larger development team than xz, which may alter the ability of zstd
being backdoored in the same way that xz was, merely by social
engineering of a lone maintainer.
  


That just means that a cracker group needs to plant a mole in a larger 
team, which was effectively the goal of the sockpuppet campaign against 
the xz-utils maintainer, except that the cracker's sockpuppet was the 
second member of a two-member team.  I see no real difference here.


I would argue that GNU software should be consistently available in at 
least one format that can be unpacked using only tools that are also 
provided by the GNU project.  I believe that currently means "gzip", 
unfortunately.  We should probably look to adopt another one; perhaps 
the lzip maintainer might be interested?



It is also obvious that having GNU distributions available through
only a SINGLE compression format, when that format may be vulnerable,
  
The xz format is not vulnerable, or at least has not been shown to be so 
in the sense of security risks, and only xz-utils was backdoored.  Nor 
is there only one implementation:  7-zip can also handle xz files.

is a dis-service to users when it is not much harder to provide
tarballs in multiple formats.  Having multiple tarballs as the
recommendation can at least let us automate that each of the tarballs
has the same contents,

Agreed.  In fact, if multiple formats can be produced concurrently, we 
could validate that the compressed tarballs are actually identical.  
(Generate using `tar -cf - [...] | tee >(compress1 >[...].tar.comp1) | 
tee >(compress2 >[...].tar.comp2) | gzip -9 >[...].tar.gz` if you do not 
want to actually write the uncompressed tarball to the disk.)  But if 
tarlz is to be used to write the lzipped tarball, you probably want to 
settle for "same file contents", since tarlz only supports pax format 
and we may want to allow older tar programs to unpack GNU releases.
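
Spelled out, such a pipeline might look like this (bash process 
substitution assumed; foopkg-1.0.0 is just a placeholder name):

8<--
tar -cf - foopkg-1.0.0 \
  | tee >(xz -9 > foopkg-1.0.0.tar.xz) \
        >(lzip -9 > foopkg-1.0.0.tar.lz) \
  | gzip -9 > foopkg-1.0.0.tar.gz
8<--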

 although it won't make it any more obvious
whether those contents match what was in git (which was how the xz
backdoor got past so many people in the first place).

This is another widespread misunderstanding---almost all of the xz 
backdoor was hidden in plain sight (admittedly, compressed and/or 
encrypted) *in* the Git repository.  The only piece of the backdoor not 
found in Git was the modified build-to-host.m4.  The xz-utils project's 
standard practice had been to /not/ commit imported m4 files, but to 
bring them in when preparing release tarballs.  The cracker simply 
rolled the "key" to the dropper into the release tarball.  I still have 
not seen whether the configure script in the release tarball was built 
with the modified build-to-host.m4 or if the crackers were depending on 
distribution packagers to regenerate configure.


Again, everything present in both Git and the release tarball /was/ 
/identical/.  There were no mismatches, only files added to the release 
that are not in the repository, and that are /expected/ to be added to a 
release.



-- Jacob



Re: GNU Coding Standards, automake, and the recent xz-utils backdoor

2024-04-02 Thread Jacob Bachmeyer

Richard Stallman wrote:

[[[ To any NSA and FBI agents reading my email: please consider]]]
[[[ whether defending the US Constitution against all enemies, ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

  > My first thought was that Autoconf is a relatively trivial attack vector 
  > since it is so complex and the syntax used for some parts (e.g. m4 and 
  > shell scripts) is so arcane.  In particular, it is common for Autotools 
  > stuff to be installed on a computer (e.g. by installing a package from 
  > an OS package manager) and then used while building.  For example, there 
  > are large collections of ".m4" files installed.  If one of the m4 files 
  > consumed has been modified, then the resulting configure script has been 
  > modified.


Can anyone think of a feasible way to prevent this sort of attack?
  


There have been some possibilities suggested on other branches of the 
discussion.  I have changed the subject of one of those to "checking 
aclocal m4 files" to highlight it.  There is progress being made, but 
the solutions appear to be outside the direct scope of the GNU build 
system packages.



Someone suggested that configure should not use m4 files that are
lying around, but rather should fetch them from standard release points,
WDYT of that idea?
  


Autoconf configure scripts do not use nearby m4 files and do not require 
m4 at all; aclocal collects the files in question into aclocal.m4 (I 
think) and then autoconf uses that (and other inputs) to /produce/ 
configure.  (This may seem like a trivial point, but exact derivations 
and their timing were critical to how the backdoor dropper worked.)  
Other tools (at least autopoint from GNU gettext, possibly others) are 
used to automatically scan a larger set of m4 files stored on the system 
and copy those needed into the m4/ directory of a package source tree, 
in a process conceptually similar to how the linker pulls only needed 
members from static libraries when building an executable.  All of this 
is done on the maintainer's machine, so that the finished configure 
script is included in the release tarball.
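
For concreteness, the maintainer-side flow described above is roughly the 
following (untested sketch; real packages usually wrap these steps in an 
autogen.sh script or simply run "autoreconf -i"):

    autopoint                # copy the gettext-related m4 files into m4/
    aclocal -I m4            # collect needed macros into aclocal.m4
    autoconf                 # expand configure.ac + aclocal.m4 into configure
    automake --add-missing   # generate Makefile.in from Makefile.am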


There have been past incidents where malicious code was directly added 
to autoconf-generated configure scripts, so (as I understand) 
distribution packagers often regenerate configure before building a 
package.  In /this/ case, the crackers (likely) modified the version of 
build-to-host.m4 on /their/ computer, so the modified file would be 
copied into the xz-utils/m4 directory in the release tarball and used 
when distribution packagers regenerate configure before building the 
package.


Fetching these files from standard release points would require an index 
of those standard release points, and packages are allowed to have their 
own package-specific macros as well.  The entire system dates from well 
before ubiquitous network connectivity could be assumed (anywhere---and 
that is still a bad assumption in the less prosperous parts of the 
world), so release tarballs are meant to be self-contained, including 
copies of "standard" macros needed for configure but not supplied by 
autoconf/automake/etc.



-- Jacob



Re: checking aclocal m4 files (was: GNU Coding Standards, automake, and the recent xz-utils backdoor)

2024-04-02 Thread Jacob Bachmeyer

Bruno Haible wrote:

Jacob Bachmeyer wrote:
  
Another related check that /would/ have caught this attempt would be 
comparing the aclocal m4 files in a release against their (meta)upstream 
sources before building a package.  This is something distribution 
maintainers could do without cooperation from upstream.  If 
m4/build-to-host.m4 had been recognized as coming from gnulib and 
compared to the copy in gnulib, the nonempty diff would have been 
suspicious.



True.

Note, however, that there would be some false positives:


True; all of these are Free Software, so a non-empty diff would still 
require manual review.
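
Roughly, such a comparison could look like this (untested sketch; the gnulib 
path and package name are placeholders):

    # Compare the m4 files shipped in a release against the copies in a
    # gnulib checkout; files with no gnulib counterpart are flagged for review.
    gnulib=/path/to/gnulib
    for f in xz-5.6.0/m4/*.m4; do
      base=$(basename "$f")
      if test -f "$gnulib/m4/$base"; then
        diff -u "$gnulib/m4/$base" "$f" || echo "REVIEW: $base differs from gnulib"
      else
        echo "REVIEW: $base has no counterpart in gnulib"
      fi
    done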



 libtool.m4 is often shipped modified,
  a) if the maintainer happens to use /usr/bin/libtoolize and
 is using a distro that has modified libtool.m4 (such as Gentoo), or
  


Distribution libtool patches could be accumulated into the set of "known 
sources".



  b) if the maintainer intentionally improved the support of specific
 platforms, such as Solaris 11.3.
  


In this case, the distribution maintainer should ideally take up pushing 
those improvements back to upstream libtool, if they are suitably general.



Also, for pkg.m4 there is no single upstream source. They distribute
a pkg.m4.in, from which pkg.m4 is generated on the developer's machine.
  


This would be a special case, but could be treated as a package-specific 
m4 file anyway, since the developer must generate it.  The developer 
could also write their own m4 macros to use with autoconf.



But for macros from Gnulib or the Autoconf macros archive, this is a
reasonable check to make.


This type of check could also allow "sweeping" improvements upstream, in 
the case of a package maintainer that may be unsure of how to upstream 
their changes.  (Of course, upstream needs to be careful about blindly 
collecting improvements, lest some of those "improvements" turn out to 
have come from cracker sockpuppets...)



-- Jacob




Re: binary data in source trees (was: GNU Coding Standards, automake, and the recent xz-utils backdoor)

2024-04-02 Thread Jacob Bachmeyer

Richard Stallman wrote:

[[[ To any NSA and FBI agents reading my email: please consider]]]
[[[ whether defending the US Constitution against all enemies, ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

  > The issue seems to be releases containing binary data for unit tests, 
  > instead of source or scripts to generate that data.  In this case, that 
  > binary data was used to smuggle in heavily obfuscated object code.


If this is the crucial point, we could put in the coding standards
(or the maintainers' guide) not to do this.


On another branch of this discussion, Zack Weinberg noted that binary 
test data may be unavoidable in some cases.  (A base64 blob or hex dump 
may as well be a binary blob.)  Further, manuals often contain images, 
some of which may be in binary formats, such as PNG.  To take this all 
the way, we would have to require that all documentation graphics be 
generated from readable sources.  I know TikZ exists but am unsure how 
well it could be integrated into Texinfo, for example.



-- Jacob



Re: reproducible dists and builds (was: GNU Coding Standards, automake, and the recent xz-utils backdoor)

2024-04-02 Thread Jacob Bachmeyer

Richard Stallman wrote:

[[[ To any NSA and FBI agents reading my email: please consider]]]
[[[ whether defending the US Constitution against all enemies, ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

  > What would be helpful is if `make dist' would guarantee to produce the same
  > tarball (bit-to-bit) each time it is run, assuming the tooling is the same
  > version.  Currently I believe that is not the case (at least due to 
timestamps)

Isn't this a description of "reproducible compilation"?
  


No, but it is closely related.  Compilation produces binary executables, 
while `make dist` produces a freestanding /source/ archive.



We want to make that standard, but progress is inevitably slow
because many packages need to be changed.


I am not sure that is actually a good idea.  (Well, it is 
mostly a good idea except for one issue.)  If compilation is strictly 
deterministic, then everyone ends up with identical binaries, which 
means an exploit that cracks one will crack all.  Varied binaries make 
life harder for crackers developing exploits, and may even make "one 
exploit to crack them all" impossible.  This is one of the reasons that 
exploits have long hit Windows (where all the systems are identical) so 
much harder than the various GNU/Linux distributions (where the binaries 
are likely different even before distribution-specific patches are 
considered).


Ultimately, this probably means that we should have both an /ability/ 
for deterministic compilation and either a compiler mode or 
post-processing pass (a linker option?) to intentionally shuffle the 
final executable.



-- Jacob




Re: GNU Coding Standards, automake, and the recent xz-utils backdoor

2024-04-01 Thread Jacob Bachmeyer

Eric Gallager wrote:

On Tue, Apr 2, 2024 at 12:04 AM Jacob Bachmeyer  wrote:
  

Russ Allbery wrote:


[...] I think one useful principle that's
emerged that doesn't disrupt the world *too* much is that the release
tarball should differ from the Git tag only in the form of added files.
  

 From what I understand, the xz backdoor would have passed this check.

[...]


[...] In other
words, even if a proposal wouldn't have stopped this particular
attack, I don't think that's a reason not to try it.


I agree that there may be dumber crackers who /would/ get caught by such 
a check, but I want to ensure that we do not end up thinking that we 
have a solution and the problem is solved and everyone is happy ... and 
then we get caught out when it happens again.


I should clarify also that I think that this proposal *is* a good idea, 
but we should remain aware that it would not have prevented this incident.


Maneuvering back to the topic: aclocal m4 files are fairly small, so 
perhaps always carrying all of them that a package uses in the 
repository should be considered a good practice?  (In other words, 
autogen.sh should *not* run autopoint---the files autopoint adds should 
be in the repository.)  If such a practice were followed, that would 
have made checking for altered files between repository and release 
effective, or it would have forced the cracker to target the backdoor 
more widely and place the altered build-to-host.m4 in the repository, 
increasing the probability of discovery.
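
Such a repository-versus-release check could be sketched as follows 
(untested; the repository and tarball names are placeholders):

    # List what a release tarball adds relative to its Git tag.  The --prefix
    # and sed keep the two listings comparable despite the tarball's leading
    # directory component.
    git -C xz.git archive --format=tar --prefix=rel/ v5.6.0 | tar -tf - | sort > git.lst
    tar -tzf xz-5.6.0.tar.gz | sed 's,^[^/]*/,rel/,' | sort > rel.lst
    comm -13 git.lst rel.lst   # entries present only in the release tarball
    # Files appearing in both listings can then be unpacked and diffed.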


Wording that as a policy:  "All data inputs used to construct the build 
scripts for a package shall be stored in the package's repository."


Another related check that /would/ have caught this attempt would be 
comparing the aclocal m4 files in a release against their (meta)upstream 
sources before building a package.  This is something distribution 
maintainers could do without cooperation from upstream.  If 
m4/build-to-host.m4 had been recognized as coming from gnulib and 
compared to the copy in gnulib, the nonempty diff would have been 
suspicious.



-- Jacob




Re: GNU Coding Standards, automake, and the recent xz-utils backdoor

2024-04-01 Thread Jacob Bachmeyer

Richard Stallman wrote:

[[[ To any NSA and FBI agents reading my email: please consider]]]
[[[ whether defending the US Constitution against all enemies, ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

  > `distcheck` target's prominence to recommend it in the "Standard
  > Targets for All Users" section of the GCS? 


  > Replying as an Automake developer, I have nothing against it in
  > principle, but it's clearly up to the GNU coding standards
  > maintainers. As far as I know, that's still rms (for anything
  > substantive)

To make a change in the coding standards calls for a clear and
specific proposal.  If people think a change is desirable, I suggest
making one or more such proposals.

Now for a bit of speculation.  I speculate that a cracker was careless
and failed to adjust certain details of a bogus tar ball to be fully
consistent, and that `make distcheck' enabled somene to notice those
errors.

I don't have any real info about whether that is so.  If my
speculation is mistaken, please say so.


I believe it is completely mistaken.  As I understand, the crocked 
tarballs would have passed `make distcheck` with flying colors.  The 
rest of your questions about it therefore have no answer.


On a side note, thanks for Emacs:  when I finally extracted a copy of 
the second shell script in the backdoor dropper, Emacs made short work 
(M-< M-> C-M-\) of properly indenting it and making the control flow 
obvious.  Misunderstandings of that control flow have been fairly 
common.  (I too had it wrong before I finally had a nicely indented copy.)


The backdoor was actually discovered in operation on machines running 
testing package versions.  It caused sshd to consume an inordinate 
amount of CPU time, with profiling reporting that sshd was spending most 
of its time in liblzma, a library not even linked in sshd.  (The "rogue" 
library had been loaded as a dependency of libsystemd, which the 
affected distributions had patched sshd to use for startup notification.)


I will send a more detailed reply on the other thread, since its subject 
is more appropriate.



-- Jacob




Re: role of GNU build system in recent xz-utils backdoor

2024-04-01 Thread Jacob Bachmeyer

Richard Stallman wrote:

[[[ To any NSA and FBI agents reading my email: please consider]]]
[[[ whether defending the US Constitution against all enemies, ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

  > I was recently reading about the backdoor announced in xz-utils the
  > other day, and one of the things that caught my attention was how
  > (ab)use of the GNU build system played a role in allowing the backdoor
  > to go unnoticed: https://openwall.com/lists/oss-security/2024/03/29/4

[...]

I don't want to get involved in fixing the bug, but I want to
make sure the GNU Project is working on it.
  


I believe the GNU Project is blameless and uninvolved in this matter.  I 
am aware of three elements from the GNU Project that could be implicated in 
the attack; two of them are innocent improvements abused by the cracker, and 
the third was modified by the cracker.


The backdoor is in two major segments:  a binary blob hidden in a 
testsuite data file and two shell scripts that drop it, also hidden in 
testsuite data files.  A modified version of build-to-host.m4 from 
gnulib is used to insert code into configure to initially extract the 
dropper and run it (via pipeline---the dropper shell scripts never touch 
the filesystem).


If several conditions are met (building a shared library on 
'x86_64-*-linux-gnu', HAVE_FUNC_ATTRIBUTE_IFUNC, using the GNU toolchain 
(or at least a linker that claims to be "GNU ld" and a compiler invoked 
as "gcc"), and building under either dpkg or rpm), the dropper extracts 
a binary blob and links it with a legitimate object, which is patched 
and recompiled (using sed in a pipeline; the modified C source never 
touches the filesystem) to call a function exported from the blob.


The aclocal m4 files involved in the attack were never committed to the 
xz Git repository, instead being added to each release tarball using 
autopoint.  (This was the package's standard practice before the 
attack.)  The offending build-to-host.m4 was modified by the cracker, 
either directly in the release tree or at the location where autopoint 
will find it.  Some of the modifications "sound like" the cracker may 
have used a language model trained on other GNU sources---they are very 
plausible at first glance.


The elements from the GNU Project potentially implicated are 
build-to-host.m4, gettext.m4, and the ifunc feature in glibc.  All of 
these turn out to be innocent.


The initial "key" that activated the backdoor dropper was a modified 
version of the gl_BUILD_TO_HOST macro from gnulib.  The dropper also 
checks m4/gettext.m4 for the text "dnl Convert it to C string syntax." 
and fails to extract the blob if found.  It turns out that 
gl_BUILD_TO_HOST is used only as a dependency of gettext.m4, and that 
comment was removed in the same commit that factored gl_BUILD_TO_HOST out 
to gnulib.  (Commit 3adaddd73c8edcceaed059e859bd5262df65fc5a in the GNU 
gettext repository is by Bruno Haible; his involvement in the backdoor 
campaign is *extremely* unlikely in my view.)


The "ifunc" feature merely allows the programmer to store function 
pointers in the PLT instead of the data segment when alternate 
implementations of a function are involved.  Theoretically, it should 
actually be a security improvement, as the PLT can be made read-only 
after all links are resolved, while the data segment must remain 
generally writable.


The backdoor will not be dropped if the use of ifunc is disabled or if 
the feature is unavailable.  I currently believe that the cracker used 
ifunc support as a covert flag to disable the backdoor when the oss-fuzz 
project was scanning the package.  I also suspect that the cracker's 
claim that ifuncs cause segfaults under -fsanitize=address (in the 
message for commit ee44863ae88e377a5df10db007ba9bfadde3d314 in the xz 
Git repository) may have been less than honest; that commit also gives 
credit to another of the cracker's sockpuppets for the original patch 
and was committed by the cracker's main "Jia Tan" sockpuppet, so the 
involvement of the primary maintainer (who is listed as the author of 
the commit in Git) is uncertain.  (In other words, the xz Git repository 
likely contains blatant lies put there by the cracker.)


Looking into this a little more, I now know what the dropper's C source 
patch does:  the blob's initialization entrypoint is named _get_cpuid 
(note only one leading underscore) and is called from an inserted static 
inline function that crc{32,64}_resolve (the ifunc resolvers that choose 
CRC implementations) are patched to call.  The dropper also ensures (by 
modifying liblzma_la_LDFLAGS in src/liblzma/Makefile) that liblzma.so 
will be linked with -Wl,-z,now so that ifuncs are resolved as the shared 
object is loaded.  That is how the backdoor blob initially gains control 
at a time during early process initialization when the PLT is still 
writable despite other hardening.



-- Jacob



Re: GNU Coding Standards, automake, and the recent xz-utils backdoor

2024-04-01 Thread Jacob Bachmeyer

Zack Weinberg wrote:

On Mon, Apr 1, 2024, at 2:04 PM, Russ Allbery wrote:
  

"Zack Weinberg"  writes:


It might indeed be worth thinking about ways to minimize the
difference between the tarball "make dist" produces and the tarball
"git archive" produces, starting from the same clean git checkout,
and also ways to identify and audit those differences.
  

There is extensive ongoing discussion of this on debian-devel. There's
no real consensus in that discussion, but I think one useful principle
that's emerged that doesn't disrupt the world *too* much is that the
release tarball should differ from the Git tag only in the form of
added files. Any files that are present in both Git and in the release
tarball should be byte-for-byte identical.



That dovetails nicely with something I was thinking about myself.
Obviously the result of "make dist" should be reproducible except for
signatures; to the extent it isn't already, those are bugs in automake.
But also, what if "make dist" produced *two* disjoint tarballs? One of
which is guaranteed to be byte-for-byte identical to an archive of the
VCS at the release tag (in some clearly documented fashion; AIUI, "git
archive" does *not* do what we want).  The other contains all the files
that "autoreconf -i" or "./bootstrap.sh" or whatever would create, but
nothing else.  Diffs could be provided for both tarballs, or only for
the VCS-archive tarball, whichever turns out to be more compact (I can
imagine the diff for the generated-files tarball turning out to be
comparable in size to the generated-files tarball itself).


The way to do that is to detect that "make dist" is being run in a VCS 
checkout, ask the VCS which files are in version control, and assume the 
others were somehow "brought in" by autogen.sh or whatever.  The problem 
is that now Automake needs to start growing support for varying version 
control systems, unless we /really/ want to say that this feature only 
works with Git.


Another problem is that the disjoint tarballs would both need to be unpacked 
in the same directory to build the package, and once that is done, how 
does "make dist" rebuild the distribution it was run from?  The file 
lists would need to be stored in the generated-files tarball.


The other problem is that this really needs to be an option.  DejaGnu, 
for example, stores the Autotools-generated files in Git and releases 
are just snapshots of the working tree.  (DejaGnu can also now *run* 
from a Git checkout without actually installing it, but that is a 
convenience limited to interpreted languages.)


Lastly, publishing a modified (third-party) distribution derived from a 
release instead of VCS *is* permitted.  (I believe this is a case of 
freedom 3.)  How would this feature interact with that?



-- Jacob




Re: GNU Coding Standards, automake, and the recent xz-utils backdoor

2024-04-01 Thread Jacob Bachmeyer

Russ Allbery wrote:

[...]

There is extensive ongoing discussion of this on debian-devel.  There's no
real consensus in that discussion, but I think one useful principle that's
emerged that doesn't disrupt the world *too* much is that the release
tarball should differ from the Git tag only in the form of added files.
  


From what I understand, the xz backdoor would have passed this check.  
The backdoor dropper was hidden in test data files that /were/ in the 
repository, and required code in the modified build-to-host.m4 to 
activate it.  The m4 files were not checked into the repository, instead 
being added (presumably by running autogen.sh with a rigged local m4 
file collection) while preparing the release.


Someone with a copy of a crocked release tarball should check if 
configure even had the backdoor "as released" or if the attacker was 
/depending/ on distributions to regenerate configure before packaging xz.



-- Jacob




Re: GNU Coding Standards, automake, and the recent xz-utils backdoor

2024-04-01 Thread Jacob Bachmeyer

Zack Weinberg wrote:

[...] but I do think there's a valid point here: the malicious xz
maintainer *might* have been caught earlier if they had committed the
build-to-host.m4 modification to xz's VCS.


That would require someone to notice that xz.git has a build-to-host.m4 
that does not exist anywhere in the history of gnulib.git.  That is a 
fairly complex scan, although it does look straightforward to 
implement.  That said, the m4 files in Gnulib *are* Free Software, so 
having a modified version cannot itself raise too many concerns.
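
A rough sketch of that scan (untested; the repository paths are placeholders, 
and it assumes the modified file had been committed to xz's VCS as described):

    # Take build-to-host.m4 as found in the xz tree and ask whether that
    # exact blob ever existed anywhere in gnulib's history.
    blob=$(git hash-object xz/m4/build-to-host.m4)
    if git -C /path/to/gnulib cat-file -e "$blob" 2>/dev/null; then
      echo "blob $blob exists in gnulib.git:"
      git -C /path/to/gnulib log --all --oneline --find-object="$blob"
    else
      echo "blob $blob never appeared in gnulib.git -- worth a closer look"
    fi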



  (Or they might not have!
Witness the three (and counting) malicious patches that they barefacedly
submitted to *other* software and got accepted because the malice was
subtle enough to pass through code review.)
  


Exactly.  :-/

That said, the whole thing looks to me like the attackers were trying to 
/not/ hit the more (what is the best word?) "advanced" users---the 
backdoor would only be inserted if building distribution packages, and 
then only under dpkg or rpm, not other systems like Gentoo's Portage or 
in an unpackaged "./configure && make && sudo make install" build.  This 
would, of course, hit the most widely used systems, including (reports 
are that the sock farm tried very hard to get Ubuntu to ship the crocked 
version in their upcoming release, but the freeze was upheld) the 
systems most commonly used by less technically-skilled users, but 
pointedly exclude systems that require greater skill to use---and whose 
users would be more likely to notice anything amiss and start tearing 
the system apart with the debugger.  Unfortunately for Mr. Sockmaster, 
it turns out that some highly-skilled users *do* use the widely-used 
systems and the backdoor caused sshd to misbehave enough to draw 
suspicion.  (Profiling reports that sshd is spending most of its time in 
liblzma---a library it has no reason to use---will tend to raise a few 
eyebrows.  :-)  )



[...]
  
Maybe the best revision to the GNU Coding Standards would be that 
releases should, if at all possible, contain only text?  Any binary 
files needed for testing can be generated during "make check" if 
necessary



I don't think this is a good idea.  It's only a speed bump for someone
trying to smuggle malicious data into a package (think "base64 -d") and
it makes life substantially harder for honest authors of programs that
work with binary data, and authors of material whose "source code"
(as GPLv3 uses that term) *is* binary data.  Consider pngsuite, for
instance (http://www.schaik.com/pngsuite/) -- it would be a *ton* of
work to convert each of these test PNG files into GNU Poke scripts,
and probably the result would be *less* ergonomic for purposes of
improving the test suite.
  


That is a bad example because SNG (https://sng.sourceforge.net/) 
exists precisely to provide a text representation of PNG binary 
structures.  (Admittedly, if I recall correctly, the contents of IDAT 
are simply a hexdump.)


While we are on the topic, this leaves the other obvious place to hide 
binary data:  images used as part of the manual.  There is a reason that 
I added the "if at all possible" caveat, and I am not certain that it is 
always possible.



I would like to suggest that a more useful policy would be "files
written to $prefix by 'make install' should not have any data
dependency on files labeled as part of the package's testsuite".
That doesn't constrain honest authors and it seems within the
scope of what the reproducible builds people could test for.
(Build the package, install to nonce prefix 1, unpack the tarball
again, delete the test suite, build again, install to prefix 2, compare.)
Of course a sufficiently determined malicious coder could detect
the reproducible-build test environment, but unlike "no binary data"
this is a substantial difficulty increment.


This could be a good idea.  Another way to check this even without 
reproducible builds would be to ensure that the access timestamps on 
testsuite files do not change while "make" is processing the main 
sources.  Checking this is slightly more invasive, since you would need 
to run a hook between processing top-level directories during the main 
build, but for packages using recursive Automake, you could simply run 
"make -C src" (or wherever the main sources are) and make sure that the 
testsuite files still have the same atime afterwards.  I admit that this 
is harder to automate in general, but distribution packaging processes 
already have other metadata that is manually maintained, so identifying 
the source subtrees that yield the installable artifacts should not be 
difficult.
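
A rough sketch of that check (untested; it assumes GNU find, main sources 
under src/, test data under tests/, and a filesystem that actually updates 
atimes, i.e. mounted strictatime):

    # Verify that building the main sources never reads the testsuite data.
    find tests -type f -printf '%A@ %p\n' | sort > atimes.before
    make -C src
    find tests -type f -printf '%A@ %p\n' | sort > atimes.after
    cmp -s atimes.before atimes.after ||
      { echo "WARNING: building src/ read files under tests/:" >&2;
        diff atimes.before atimes.after >&2; }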


Now that I think about it, I suggest tightening that policy a bit 
further:  "files produced by make in the source subtree (typically src/) 
shall have no data dependency on files outside of that tree"


I doubt anyone ever thought that recursive make could end up as 
security/verifiability feature.  8-|



-- Jacob



Re: automated release building service

2024-04-01 Thread Jacob Bachmeyer

Bruno Haible wrote:

Jacob Bachmeyer wrote:
  

Essentially, this would be an automated release building service:  upon
request, make a Git checkout, run autogen.sh or equivalent, make dist,
and publish or hash the result.  The problem is that an attacker who
manages to gain commit access to a repository may be able to launch
attacks on the release building service, since "make dist" can run
scripts.  The service could probably mount the working filesystem noexec
since preparing source releases should not require running (non-system)
binaries and scripts can be run by directly feeding them into their
interpreters even if the filesystem is mounted noexec, but this still
leaves all available interpreters and system tools potentially available.



Well, it'd at least make things more difficult for the attacker, even
if it wouldn't stop them completely.
  
  
Actually, no, it would open a *new* target for attackers---the release 
building service itself.  Mounting the scratchpad noexec would help to 
complicate attacks on that service, but right now there is *no* central 
point for an attacker to hit to compromise releases.  If a central 
release building service were set up, it would be a target, and an 
attacker able to arrange a persistent compromise of the service could 
then tamper with later releases as they are built.  This should be 
fairly easy to catch, if an honest maintainer has a secure environment, 
("Why the  does the central release service tarball not match mine?  
And what the  is the extra code in this diff between its tarball 
and mine!?") but there is a risk that, especially for large projects, 
maintainers start relying on the central release service instead of 
building their own tarballs.


The problem here was not a maintainer with a compromised system---it 
seems that "Jia Tan" was a malefactor's sock puppet from the start.



There are several problems that such an automated release building service
would create. Here are a couple of them:

* First of all, it's a statement of mistrust towards the developer/maintainer,
  if developers get pressured into using an automated release building
  service rather than producing the tarballs on their own.
  This demotivates and turns off developers, and it does not fix the
  original problem: If a developer is in fact a malefactor, they can
  also commit malicious code; they don't need to own the release process
  in order to do evil things.
  


Limiting trust also limits the value of an attack, thus protecting the 
developers/maintainers from at least sane attackers in some ways.  I also 
think that this point misunderstands the original proposal (or I have 
misunderstood it).  To some extent, projects using Automake already have 
that automated release building service; we call it "make dist" and it is 
a distributed service running on each maintainer's machine, including the 
machines of distribution package maintainers who regenerate the Autotools 
files.  A compromise of a developer's machine is thus valuable because it 
allows tampering with releases, but the risk is managed somewhat by each 
developer building only their own releases.


A central service as a "second opinion" would be a risk, but would also 
make those compromises even more difficult---now the attacker must hit 
both the central service *and* the dev box, *and* coordinate them so that 
the central service tampers only with packages whose maintainer's own 
machine is also cracked, lest the whole thing be detected.  This is even 
harder on the attacker, which is a good thing, of course.


The more dangerous risk is that the central service becomes overly 
trusted and ceases to be merely the "second opinion" on a release.  If 
that occurs, not only would we be right back to no real check on the 
process, but now *all* the releases come from one place.  A compromise 
of the central release service would then allow *all* releases to be 
tampered, which is considerably more valuable to an attacker.



* Such an automated release building service is a piece of SaaSS. I can
  hardly imagine how we at GNU tell people "SaaSS is as bad as, or worse
  than, proprietary software" and at the same time advocate the use of
  such a service.
  


As long as it runs on published Free Software and anyone is free to set 
up their own instance, I would disagree here.  I think we need to work 
out where the line between "hosting" and "SaaSS" actually is, and I am 
not sure that it has a clear technical description, since SaaSS is 
ultimately an ethical issue.



* Like Jacob mentioned, such a service quickly becomes a target for
  attackers. So, instead of trusting a developer, you now need to trust
  the technical architecture and the maintainers of such a service.
  


I think I may know an example of something similar:  

Re: libsystemd dependencies

2024-04-01 Thread Jacob Bachmeyer

Bruno Haible wrote:

Jacob Bachmeyer wrote:
  
some of the blame for this needs to fall on the 
systemd maintainers and their "katamari" architecture.  There is no good 
reason for notifications of daemon startup to pull in liblzma, but using 
libsystemd for that purpose does exactly that, and ended up getting 
xz-utils targeted as a means of getting to sshd without the OpenSSH 
maintainers noticing.



The systemd people are working on reducing the libsystemd dependencies:
https://github.com/systemd/systemd/issues/32028

However, the question remains unanswered why it needs 3 different
compression libraries (liblzma, libzstd, and liblz4). Why would one
not suffice?
  


From reading other discussions, the only reason libsystemd pulls in 
compression libraries at all is its "katamari" architecture:  the 
systemd journal can be optionally compressed with any of those 
algorithms, and the support for reading the journal (which libsystemd 
also provides) therefore requires support for all of them.  No, sshd 
(even with the distribution patches at issue) does /not/ use that 
support whatsoever.


Better design would split libsystemd into separate libraries:  
libsystemd-notify, libsystemd-journal, etc.  I suspect that there are 
more logically distinct modules that have been "katamaried" into one 
excessively large library.  The C runtime library has an excuse for 
being such an agglomeration, but also note that libc has *zero* hard 
external dependencies.  You can ridicule NSS if you like, but NSS 
modules are only loaded if NSS is used.  (To be fair, sshd almost 
certainly /does/ use functions provided by NSS.)  The systemd developers 
do not have that excuse, and their library *has* external dependencies.


I believe the systemd developers cite convenience as justification for 
the practice, because apparently figuring out which libraries (out of a 
set partitioned based on functionality) you need to link is "too hard" 
for developers these days.  (Perhaps that is the real reason they want 
to replace X11?)  That "convenience" nearly got every Internet-facing server 
running a major distribution backdoored with a critical-severity hole, and 
we do not yet know exactly what the backdoor blob did.  The 
preliminary reports that it was an RCE backdoor that would pass commands 
smuggled in public key material in SSH certificates to system(3) (as 
root of course, since that is sshd's context at that stage) are 
inconsistent with the slowdown that caused the backdoor to be 
discovered.  I doubt that SSH logins were using that code path, and the 
SSH scanning botnets almost certainly are not presenting certificates, 
yet it apparently (reports have been unclear on this point) was the 
botnet scanning traffic that led to the discovery of sshd wasting 
considerable CPU time in liblzma...


I am waiting for the proverbial other shoe to drop on that one.


-- Jacob



Re: GNU Coding Standards, automake, and the recent xz-utils backdoor

2024-04-01 Thread Jacob Bachmeyer

Jose E. Marchesi wrote:

Jose E. Marchesi wrote:


[...]



I agree that distcheck is good but not a cure all.  Any static
system can be attacked when there is motive, and unit tests are
easily gamed.
  
  

The issue seems to be releases containing binary data for unit tests,
instead of source or scripts to generate that data.  In this case,
that binary data was used to smuggle in heavily obfuscated object
code.



As a side note, GNU poke (https://jemarch.net/poke) is good for
generating arbitrarily complex binary data from clear textual
descriptions.
  

While it is suitable for that use, at last check poke is itself very
complex, complete with its own JIT-capable VM.  This is good for
interactive use, but I get nervous about complexity in testsuites,
where simplicity can greatly aid debugging, and it /might/ be possible
to hide a backdoor similarly in a poke pickle.  (This seems to be a
general problem with powerful interactive editors.)



Yes, I agree simplicity it is very desirable, in testsuites and actually
everywhere else.  I also am not fond of dragging in dependencies.
  


Exactly---I am sure that poke is great for interactive use, but a 
self-contained solution is probably better for a testsuite.



But I suppose we also agree that it is not possible to assemble
non-trivial binary data structures in a simple way, without somehow
moving the complexity of the encoding into some sort of generator, which
will not be simple.  The GDB testsuite, for example, ships with a DWARF
assembler written in around 3000 lines of Tcl.  Sure, it is simpler than
poke and doesn't drag in additional dependencies.  But it has to be
carefully maintained and kept up to date, and the complexity is there.
  


The problem for a compression tool testsuite is that compression formats 
are (I believe) defined as byte-streams or bit-streams.  Further, the 
generator(s) must be able to produce /incorrect/ output as well, in 
order to test error handling.



Further, GNU poke defines its own specialized programming language for
manipulating binary data.  Supplying generator programs in C (or C++)
for binary test data in a package that itself uses C (or C++) ensures
that every developer with the skills to improve or debug the package
can also understand the testcase generators.



Here we will have to disagree.

IMO it is precisely the many and tricky details on properly marshaling
binary data in general-purpose programming languages that would have
greater odds to lead to difficult to understand, difficult to maintain
and possibly buggy or malicious encoders.  The domain specific language
is here an advantage, not a liability.

This you need to do in C to encode and generate test data for a single
signed 32-bit NUMBER in an output file in a _more or less_ portable way:

  void generate_testdata (off_t offset, int endian, int number)
  {
int bin_flag = 0, fd;

  #ifdef _WIN32
int bin_flag = O_BINARY;
  #endif
fd = open ("testdata.bin", bin_flag, S_IWUSR);
if (fd == -1)
  fatal ("error generating data.");

if (endian == BIG)

  {
b[0] = (number >> 24) & 0xff;
b[1] = (number >> 16) & 0xff;
b[2] = (number >> 8) & 0xff;
b[3] = number & 0xff;
  }
else
  {
b[3] = (number >> 24) & 0xff;
b[2] = (number >> 16) & 0xff;
b[1] = (number >> 8) & 0xff;
b[0] = number & 0xff;
  }

lseek (fd, offset, SEEK_SET);
for (i = 0; i < 4; ++i)
  write (fd, &b[i], 1);
close (fd);
  }
  


While that is a nice general solution, (aside from neglecting the 
declaration "uint8_t b[4];"; with "int b[4];", the code would only work 
on a little-endian processor; with no declaration, the compiler will 
reject it) a compression format would be expected to define the 
endianess of stored values, so the major branch in that function would 
collapse to just one of its alternatives.  Compression formats are 
generally defined as streams, so a different decomposition of the 
problem would likely make more sense:  (example untested)


   /* Writes VALUE to OUT as four little-endian bytes.  Assumes <stdio.h>
      is included and fatal() is defined elsewhere in the generator.  */
   void emit_int32le (FILE * out, int value)
   {
     unsigned int R, i;

     for (R = (unsigned int) value, i = 0; i < 4; R = R >> 8, i++)
       if (fputc (R & 0xff, out) == EOF)
         fatal ("error writing int32le");
   }
 

Other code handles opening OUT, or OUT is actually stdout and we are 
writing down a pipe or the shell handled opening the file.  (The main 
function can easily check that stdout is not a terminal and bail out if 
it is.)  Remember that I am suggesting test generator programs, which do 
not need to be as general as ordinary code, nor do they need the same 
level of user-friendliness, since they are expected to be run from 
scripts that encode the precise knowledge of how to call them.  (That 
this version is also probably more efficient by avoiding a syscall for 
every byte written is irrelevant for its intended use.)



This i

Re: GNU Coding Standards, automake, and the recent xz-utils backdoor

2024-03-31 Thread Jacob Bachmeyer

Tomas Volf wrote:

On 2024-03-31 14:50:47 -0400, Eric Gallager wrote:
  

With a reproducible build system, multiple maintainers can "make dist"
and compare the output to cross-check for erroneous / malicious dist
environments.  Multiple signatures should be harder to compromise,
assuming each is independent and generally trustworthy.


This can only work if a package /has/ multiple active maintainers.
  

Well, other people besides the maintainers can also run `make dist`
and `make distcheck`. My idea was to get end-users in the habit of
running `make distcheck` themselves before installing stuff. And if
that's too much to ask of end users, I'd also point out that there are
multiple kinds of maintainer: besides the upstream maintainer, there
are also usually separate distro maintainers. Even if there's only 1
upstream maintainer, as was the case here, I still think that it would
be good to get distro maintainers in the habit of including `make
distcheck` as part of their own release process, before they accept
updates from upstream.



What would be helpful is if `make dist' would guarantee to produce the same
tarball (bit-to-bit) each time it is run, assuming the tooling is the same
version.  Currently I believe that is not the case (at least due to timestamps).
  


A "tardiff" tool that ignores timestamps would be a solution to that 
problem, but not to this backdoor.
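
A minimal sketch of such a tool (untested; the tarball names are 
placeholders, and it assumes GNU tar and sha256sum):

    # Unpack both tarballs and compare only file names and contents,
    # ignoring timestamps, ownership, and compression differences.
    mkdir a b
    tar -xf release-maintainer.tar.gz -C a --strip-components=1
    tar -xf release-rebuilt.tar.gz    -C b --strip-components=1
    ( cd a && find . -type f -exec sha256sum {} + | sort -k 2 ) > a.sums
    ( cd b && find . -type f -exec sha256sum {} + | sort -k 2 ) > b.sums
    diff -u a.sums b.sums && echo "contents are identical"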



Combined with GNU Guix that would allow simple way to verify that `make dist'
was used, and the resulting artifact not tampered with, even without any central
signing.


The Guix "challenge" operation would not have detected this backdoor 
because *it* *was* *in* *the* *upstream* *release*.  The build service 
works from that release tarball and you build from that same release 
tarball.  GNU Guix ensures an equivalent build environment and your 
results *will* match---either the backdoor was not inserted or it was 
inserted in both builds.



The flow of the attack as I understand it was:

   (0)  (speculation on motivation) The attacker wanted a "Golden Key" 
to SSH and started looking for ways to backdoor sshd.
   (1)  The attacker starts a sockpuppet campaign and manages to get 
one of his sockpuppets appointed co-maintainer of xz-utils.
   (2)  [2023-06-27] The sockpuppet merges a pull request believed to 
be from another sockpuppet in commit 
ee44863ae88e377a5df10db007ba9bfadde3d314.
   (3)  [2024-02-15] The sockpuppet "updates m4/.gitignore" to add 
build-to-host.m4 to the list in commit 
4323bc3e0c1e1d2037d5e670a3bf6633e8a3031e.
   (4)  [2024-02-23] The sockpuppet adds 5 files to the xz-utils 
testsuite in commit cf44e4b7f5dfdbf8c78aef377c10f71e274f63c0.
   (5)  [2024-03-08] To cover tracks, the sockpuppet finally adds a 
test using bad-3-corrupt_lzma2.xz in commit 
a3a29bbd5d86183fc7eae8f0182dace374e778d8.
   (6)  [2024-03-08] The sockpuppet revises two of those files with a 
lame excuse in commit a3a29bbd5d86183fc7eae8f0182dace374e778d8.


The quick analysis of the Git history supporting steps 2 - 6 above has 
turned up another interesting detail:  no version of configure.ac 
actually committed ever used the gl_BUILD_TO_HOST macro.  An analysis 
found on pastebin noted that build-to-host.m4 is a dependency of 
gettext.m4.  Following up finds commit 
3adaddd73c8edcceaed059e859bd5262df65fc5a of 2023-02-18 in the GNU 
gettext repository introduced the use of gl_BUILD_TO_HOST, apparently as 
part of moving some existing path translation logic to gnulib and 
generalizing it for use elsewhere.  This commit is innocent (it is 
*extremely* unlikely that Bruno Haible was involved in the backdoor 
campaign) and also explains why the backdoor was checking for "dnl 
Convert it to C string syntax." in m4/gettext.m4:  that comment was 
removed in the same commit that switched to using gl_BUILD_TO_HOST.  The 
change to gettext also occurred about a year before the sockpuppet began 
to take advantage of it.


It almost "feels like" the attacker was waiting for an opportunity to 
make plausible changes to autoconf macros and finally got one when 
updating the m4/ files for the 5.6.0 release.  Could someone with the 
release tarballs confirm that m4/gettext.m4 was updated between 
v5.5.2beta and v5.6.0?  I doubt the entire backdoor was developed in the 
week between those two commits.  In fact, the timing around introducing 
ifuncs suggests to me that the binary blob was at least well into 
development by mid-2023.


The commit message at step 2 claims that using ifuncs with 
-fsanitize=address causes segfaults.  If this is true generally, the 
glibc team should probably reconsider whether the abuse potential is 
worth the benefit of the feature and possibly investigate how the 
feature was introduced to glibc.  If this was an excuse, it provided a 
clever way to prevent oss-fuzz from finding the backdoor, as disabling 
ifuncs provides a conveniently hidden flag to disable the backdoor.


While double-checking the above, I stumb

Re: GNU Coding Standards, automake, and the recent xz-utils backdoor

2024-03-31 Thread Jacob Bachmeyer

Eric Gallager wrote:

On Sun, Mar 31, 2024 at 3:20 AM Jacob Bachmeyer  wrote:
  

dherr...@tentpost.com wrote:


[...]

The issue seems to be releases containing binary data for unit tests,
instead of source or scripts to generate that data.  In this case, that
binary data was used to smuggle in heavily obfuscated object code.

[...]



Maybe this is something that the GNU project could start making
stronger recommendations about.
  


The key issue seems to be generating binary test data during `make` or 
`make check`, using GNU poke, GNU Awk, Perl, Tcl, small C programs, or 
something else, instead of packaging it in the release.  The xz-utils 
backdoor was smuggled into the repository wrapped in compressed test data.



With a reproducible build system, multiple maintainers can "make dist"
and compare the output to cross-check for erroneous / malicious dist
environments.  Multiple signatures should be harder to compromise,
assuming each is independent and generally trustworthy.
  

This can only work if a package /has/ multiple active maintainers.



Well, other people besides the maintainers can also run `make dist`
and `make distcheck`. My idea was to get end-users in the habit of
running `make distcheck` themselves before installing stuff. And if
that's too much to ask of end users, I'd also point out that there are
multiple kinds of maintainer: besides the upstream maintainer, there
are also usually separate distro maintainers. Even if there's only 1
upstream maintainer, as was the case here, I still think that it would
be good to get distro maintainers in the habit of including `make
distcheck` as part of their own release process, before they accept
updates from upstream.
  


The problem with that is that `make distcheck` only verifies that the 
working tree can produce a reasonable release tarball.  The backdoored 
xz-utils releases *would* *have* *passed* *this* *test* as far as I can 
determine.  It catches errors like omitting files from the lists in 
Makefile.am.  It will *not* catch a modified m4 file or questionable 
test data that has been properly listed as part of the release.



Maybe GNU should establish a cross-verification signing standard and
"dist verification service" that automates this process?  Point it to
a repo and tag, request a signed hash of the dist package...  Then
downstream projects could check package signatures from both the
maintainer and such third-party verifiers to check that nothing was
inserted outside of version control.
  

Essentially, this would be an automated release building service:  upon
request, make a Git checkout, run autogen.sh or equivalent, make dist,
and publish or hash the result.  The problem is that an attacker who
manages to gain commit access to a repository may be able to launch
attacks on the release building service, since "make dist" can run
scripts.  The service could probably mount the working filesystem noexec
since preparing source releases should not require running (non-system)
binaries and scripts can be run by directly feeding them into their
interpreters even if the filesystem is mounted noexec, but this still
leaves all available interpreters and system tools potentially available.



Well, it'd at least make things more difficult for the attacker, even
if it wouldn't stop them completely.
  


Actually, no, it would open a *new* target for attackers---the release 
building service itself.  Mounting the scratchpad noexec would help to 
complicate attacks on that service, but right now there is *no* central 
point for an attacker to hit to compromise releases.  If a central 
release building service were set up, it would be a target, and an 
attacker able to arrange a persistent compromise of the service could 
then tamper with later releases as they are built.  This should be 
fairly easy to catch, if an honest maintainer has a secure environment, 
("Why the  does the central release service tarball not match mine?  
And what the  is the extra code in this diff between its tarball 
and mine!?") but there is a risk that, especially for large projects, 
maintainers start relying on the central release service instead of 
building their own tarballs.


The problem here was not a maintainer with a compromised system---it 
seems that "Jia Tan" was a malefactor's sock puppet from the start.



-- Jacob



Re: GNU Coding Standards, automake, and the recent xz-utils backdoor

2024-03-31 Thread Jacob Bachmeyer

Jose E. Marchesi wrote:

[...]


I agree that distcheck is good but not a cure all.  Any static
system can be attacked when there is motive, and unit tests are
easily gamed.
  

The issue seems to be releases containing binary data for unit tests,
instead of source or scripts to generate that data.  In this case,
that binary data was used to smuggle in heavily obfuscated object
code.



As a side note, GNU poke (https://jemarch.net/poke) is good for
generating arbitrarily complex binary data from clear textual
descriptions.


While it is suitable for that use, at last check poke is itself very 
complex, complete with its own JIT-capable VM.  This is good for 
interactive use, but I get nervous about complexity in testsuites, where 
simplicity can greatly aid debugging, and it /might/ be possible to hide 
a backdoor similarly in a poke pickle.  (This seems to be a general 
problem with powerful interactive editors.)


Further, GNU poke defines its own specialized programming language for 
manipulating binary data.  Supplying generator programs in C (or C++) 
for binary test data in a package that itself uses C (or C++) ensures 
that every developer with the skills to improve or debug the package can 
also understand the testcase generators.



-- Jacob




Re: GNU Coding Standards, automake, and the recent xz-utils backdoor

2024-03-31 Thread Jacob Bachmeyer

dherr...@tentpost.com wrote:

On 2024-03-30 18:25, Bruno Haible wrote:

Eric Gallager wrote:


Hm, so should automake's `distcheck` target be updated to perform
these checks as well, then?


The first mentioned check can not be automated. ...

The second mentioned check could be done by the maintainer, ...



I agree that distcheck is good but not a cure all.  Any static system 
can be attacked when there is motive, and unit tests are easily gamed.


The issue seems to be releases containing binary data for unit tests, 
instead of source or scripts to generate that data.  In this case, that 
binary data was used to smuggle in heavily obfuscated object code.


The best analysis in one place that I have found so far is 
<https://gynvael.coldwind.pl/?lang=en&id=782>.  In brief, grep is 
used to locate the main backdoor files by searching for marker strings.  
After running tests/files/bad-3-corrupt_lzma2.xz through tr(1), it 
becomes a /valid/ xz file that decompresses to a shell script that 
extracts a second shell script from part of the compressed data in 
tests/files/good-large_compressed.lzma and pipes it to a shell.  That 
second script has two major functions:  first, it searches the test 
files for four six-byte markers, and it then extracts and decrypts 
(using a simple RC4-alike implemented in Awk) the binary backdoor also 
found in tests/files/good-large_compressed.lzma.  The six-byte markers 
mark beginning and end of raw LZMA2 streams obfuscated with a simple 
substitution cipher.  Any such streams found would be decompressed and 
read by the shell, but neither of the known crocked releases had any 
files containing those markers.  The binary backdoor is an x86-64 object 
that gets unpacked into liblzma_la-crc64-fast.o, unless m4/gettext.m4 
contains "dnl Convert it to C string syntax." which is a clever flag 
because about no one actually checks that those m4 files in release 
tarballs actually match what the GNU project distributes.  The object 
itself is just the backdoor and presumably provides the symbol 
_get_cpuid as its entrypoint, since the unpacker script patches the 
src/liblzma/check/crc{64,32}_fast.c files in a pipeline to add calls to 
that function and drops the compiled objects in .libs/.  Running make 
will then skip building those objects, since they are already 
up-to-date, and the backdoored objects get linked into the final binary.


Commit 6e636819e8f070330d835fce46289a3ff72a7b89 
(<https://git.tukaani.org/?p=xz.git;a=commitdiff;h=6e636819e8f070330d835fce46289a3ff72a7b89>) 
was an update to the backdoor.  The commit message is suspicious, 
claiming the use of "a constant seed" to generate reproducible test 
files, but /not/ declaring how the files were produced, which of course 
prevents reproducibility.


With a reproducible build system, multiple maintainers can "make dist" 
and compare the output to cross-check for erroneous / malicious dist 
environments.  Multiple signatures should be harder to compromise, 
assuming each is independent and generally trustworthy.


This can only work if a package /has/ multiple active maintainers.

You also have a small misunderstanding here:  "make dist" prepares a 
(source) release tarball, not a binary build, so this is a 
closely-related issue but actually distinct from reproducible builds.  
Also easier to solve, since we only have to make the source tarball 
reproducible.
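
For what it is worth, GNU tar and gzip can already be coaxed into producing 
a bit-for-bit reproducible tarball (untested sketch; the mtime value and 
package name are placeholders, and "make dist" would have to be taught to do 
the equivalent):

    tar --sort=name --owner=0 --group=0 --numeric-owner \
        --mtime='2024-03-31 00:00:00 UTC' \
        -cf - mypkg-1.0 | gzip -9 -n > mypkg-1.0.tar.gz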


Maybe GNU should establish a cross-verification signing standard and 
"dist verification service" that automates this process?  Point it to 
a repo and tag, request a signed hash of the dist package...  Then 
downstream projects could check package signatures from both the 
maintainer and such third-party verifiers to check that nothing was 
inserted outside of version control.


Essentially, this would be an automated release building service:  upon 
request, make a Git checkout, run autogen.sh or equivalent, make dist, 
and publish or hash the result.  The problem is that an attacker who 
manages to gain commit access to a repository may be able to launch 
attacks on the release building service, since "make dist" can run 
scripts.  The service could probably mount the working filesystem noexec 
since preparing source releases should not require running (non-system) 
binaries and scripts can be run by directly feeding them into their 
interpreters even if the filesystem is mounted noexec, but this still 
leaves all available interpreters and system tools potentially available.



-- Jacob



Re: GNU Coding Standards, automake, and the recent xz-utils backdoor

2024-03-31 Thread Jacob Bachmeyer

Eric Gallager wrote:


Specifically, what caught my attention was how the release tarball
containing the backdoor didn't match the history of the project in its
git repository. That made me think about automake's `distcheck`
target, whose entire purpose is to make it easier to verify that a
distribution tarball can be rebuilt from itself and contains all the
things it ought to contain.


The problem is that a release tarball is a freestanding object, with no 
dependency on the repository from which it was produced.  In this case, 
the attacker added a bogus "update" of build-to-host.m4 from gnulib to 
the release tarball, but that file is not stored in the Git repository.  
This would not have tripped "make distcheck" because the crocked tarball 
can indeed be used to rebuild another crocked tarball.


As Alexandre Oliva mentioned in his reply, there is not really any good 
way to prevent this, since the attacker could also patch the generated 
configure script more directly.  (I seem to remember past incidents 
where tampered release tarballs had configure scripts that would 
download and run shell scripts.  If you ran configure as root, well...)  
The *user* could catch issues like this backdoor, since the backdoor 
appears (based on what I have read so far) to materialize certain object 
files while configure is running, while `find . -iname '*.o'` /should/ 
return nothing before make is run.  This also suggests that running 
"make clean" after configure would kill at least this backdoor.  A 
*very* observant (unreasonably so) user might notice that "make" did not 
build the objects that the backdoor provided.
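
An untested sketch of that user-side check:

    # Immediately after configure, no object files should exist yet, and
    # "make clean" costs little.
    ./configure
    find . -name '*.o' -print | grep -q . &&
      echo "suspicious: object files exist before make has ever run" >&2
    make clean   # per the above, would also have removed this backdoor's objects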


Of course, an attacker could sneak around this as well by moving the 
process for unpacking the backdoor object to a Makefile rule, but that 
is more likely to "stick out" to an observant user, as well as being an 
easy target for automated analysis ("Which files have 'special' rules?") 
since you cannot obfuscate those from make(1) and expect them to still 
work.  In this case, the backdoor was ultimately discovered when it 
caused performance problems in sshd, which should not be using liblzma 
at all, but gets linked with it courtesy of libsystemd on major 
GNU/Linux distributions.  Yes, this means that systemd is a contributing 
factor to this incident, and that is aggravated by its unnecessary use 
of excessive dependencies.  (Sending a notification that a daemon is 
ready should /not/ require compression support of any type.  The 
"katamari" architecture model used in systemd had the effect here of 
broadening the supply-chain attack surface for OpenSSH sshd to include 
xz-utils, which is insane.)


The bulk of the attack payload seems to have been stored in the Git 
repository, disguised as binary test data in files 
tests/files/{bad-3-corrupt_lzma2.xz,good-large_compressed.lzma}.  The 
modified build-to-host.m4 merely added code to configure to start the 
process of unpacking the backdoor.  In a build from Git, the legitimate 
build-to-host.m4 would get copied in from gnulib and the backdoor would 
remain hidden.


Maybe the best revision to the GNU Coding Standards would be that 
releases should, if at all possible, contain only text?  Any binary 
files needed for testing can be generated during "make check" if 
necessary, with generator programs packaged (as source or scripts) in 
the release.
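
As a sketch of what I mean (the names here are made up; the point is that
the generator travels as text and the binary is produced on the user's
machine):

# Makefile.am fragment: build binary test inputs at "make check" time
check_DATA = tests/sample.bin
EXTRA_DIST = tests/gen-testdata

tests/sample.bin: $(srcdir)/tests/gen-testdata
	$(MKDIR_P) tests
	$(SHELL) $(srcdir)/tests/gen-testdata >$@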



-- Jacob



Re: [RFC PATCH]: autom4te: report subsecond timestamp support in --version

2023-12-05 Thread Jacob Bachmeyer

Zack Weinberg wrote:

On Mon, Dec 4, 2023, at 7:26 PM, Jacob Bachmeyer wrote:
  

Now that I have seen the actual patch, yes, this test should be
accurate.  The test in the main autom4te script will also work, even
if there is a mismatch between the script and its library



Good.

  

This appears to be misaligned with the GNU Coding Standards, which
states:  "The first line is meant to be easy for a program to parse;
the version number proper starts after the last space."

Perhaps the best option would be to conditionally add a line "This
autom4te supports subsecond timestamps." after the license notice?



I don't like putting anything after the license notice because it's
convenient to be able to pipe --version output to sed '/Copyright/,$d'
without losing anything relevant for troubleshooting.  So how about

$ autom4te --version
autom4te (GNU Autoconf) 2.71
Features: subsecond-timestamps

Copyright (C) 2021 Free Software Foundation, Inc.
License GPLv3+/Autoconf: GNU GPL version 3 or later
<https://gnu.org/licenses/gpl.html>, <https://gnu.org/licenses/exceptions.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Written by Akim Demaille.

This preserves the effectiveness of sed '/Copyright/,$d' and also
leaves room for future additions to the "Features:" line.


That looks like a good idea to me, although the GNU Coding Standards do 
say (section 4.8.1, "--version") that the copyright and license notices 
"should" immediately follow the version numbers.  The presence or 
absence of this feature is effectively determined by something similar 
to a library version (the availability of the Perl Time::HiRes module), 
and it is expected to be important for debugging, which is the criterion 
the standards give for listing library versions.  Further, "should" does 
not express an absolute requirement, and no rationale that would make it 
absolute (such as a rule for automated parsing) is given here, unlike 
for the version number on the first line.
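
If that form is adopted, the probe on the consuming side stays trivial;
something along these lines (a sketch, keyed to the feature name proposed
above):

if autom4te --version 2>/dev/null \
     | grep '^Features:.*subsecond-timestamps' >/dev/null; then
  : # this autom4te reports subsecond timestamp support
fi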



-- Jacob



Re: rhel8 test failure confirmation?

2023-12-05 Thread Jacob Bachmeyer

Zack Weinberg wrote:

On Mon, Dec 4, 2023, at 7:14 PM, Jacob Bachmeyer wrote:
  

Zack Weinberg wrote:


[snip everything addressed in the other thread]
  

Yes, there was a bit of confusion here; not only is the FileUtils
module synchronized between autom4te and automake



Thanks for reminding me that I need to make sure all those files are
actually in sync before I cut the final 2.72 release.

  

  require Time::HiRes;
  import Time::HiRes qw(stat);
  

I believe that the import is not actually necessary



The previous line is a "require", not a "use", so I believe it _is_
necessary.  Have I misunderstood?

  

... should do no harm as long as any use of stat in the code
is prepared to handle floating-point timestamps.



There's only one use, in 'sub mtime', and that's the place
where we actively want the floating-point timestamps.


Yes, before seeing your actual patch, I had the mistaken impression that 
this code was in autom4te itself, not the FileUtils module.  The import 
is needed in the FileUtils module, so the patch is correct.



-- Jacob




Re: [RFC PATCH]: autom4te: report subsecond timestamp support in --version

2023-12-04 Thread Jacob Bachmeyer

Zack Weinberg wrote:

The Automake test suite wants this in order to know if it’s safe to
reduce the length of various delays for the purpose of ensuring files
in autom4te.cache are newer than the corresponding source files.

* lib/Autom4te/FileUtils.pm: Provide (but do not export) a flag
  $subsecond_mtime, indicating whether the ‘mtime’ function reports
  modification time with precision greater than one second.
  Reorganize commentary and import logic for clarity.  Add
  configuration for emacs’ perl-mode to the bottom of the file.
  


Now that I have seen the actual patch, yes, this test should be 
accurate.  The test in the main autom4te script will also work, even if 
there is a mismatch between the script and its library, since Perl 
accepts a fully-qualified variable name even if that variable has never 
been declared; its value is undef, which is falsish in Boolean context.
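
(A minimal demonstration, for anyone who wants to see it in isolation;
nothing here loads the module, it only shows that the undeclared
fully-qualified name compiles and is undef even under strict:

perl -Mstrict -le 'print "undef" unless defined $Autom4te::FileUtils::subsecond_mtime'

prints "undef" rather than dying.)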



* bin/autom4te.in ($version): If $Autom4te::FileUtils::subsecond_mtime
  is true, add the text “ (subsecond timestamps supported)” to the
  first line of --version output.
  


This appears to be misaligned with the GNU Coding Standards, which 
states:  "The first line is meant to be easy for a program to parse; the 
version number proper starts after the last space."


Perhaps the best option would be to conditionally add a line "This 
autom4te supports subsecond timestamps." after the license notice?



-- Jacob



Re: rhel8 test failure confirmation?

2023-12-04 Thread Jacob Bachmeyer

Zack Weinberg wrote:

On Sun, Dec 3, 2023, at 4:49 PM, Karl Berry wrote:
  
There would not need to be much parsing, just "automake --version | grep 
  
> HiRes" in that case, or "case `automake --version` in *HiRes*) ...;; 
> esac" to avoid running grep if you want.


I specifically want to hear what Karl thinks.

I lean towards Jacob's view that automake --version | grep HiRes will
suffice. Not having a new option seems simpler/better in terms of later
understanding, too. --thanks, karl.



Did I misunderstand which program's --version output we are talking about?
I thought we were talking about automake's testsuite probing the behavior
of *autom4te*, but all the quoted text seems to be imagining a probe of
*automake* instead.


Yes, there was a bit of confusion here; not only is the FileUtils module 
synchronized between autom4te and automake, but those two are near 
"sound-alikes" as I read them.  Oops.


The issue here seems to be determining if a fix that (I think) 
originated in automake has been applied to the active autom4te.



[...]

I'm not using the identifier "HiRes" because the use of Time::HiRes is an
implementation detail that could change.  For instance, if there's a third
party CPAN module that lets us get at nanosecond-resolution timestamps
*without* loss of precision due to conversion to an NV (aka IEEE double)
we could, in the future, look for that first.
  


That is fine, but "[HiRes]" or "[HiResTime]" is much shorter and we 
could use it as the name of the feature regardless of the underlying 
implementation.  Characters in the first line of `autom4te --version` 
are a finite resource if we want it to fit on a standard 80-column 
terminal without wrapping.  If we need to distinguish, "[HiRes] 
[HiRes-ns]" could be used to indicate your hypothetical integer 
nanosecond-resolution timestamp support, which would indicate also 
having sub-second timestamp support.


I also suggest changing the tag, since the GNU Coding Standards call for 
the version number to be indicated by the last space, but parenthesized 
text between the name and version is supposed to be the package, so this 
would lead to:


$ ./tests/autom4te --version
autom4te [HiResTime] (GNU Autoconf) 2.72d.6-49ab3-dirty
Copyright (C) 2023 Free Software Foundation, Inc.
[...]


Is this workable all the way around, everyone?  Or should the feature be 
indicated with another line after the license notice?  ("This autom4te 
has subsecond timestamp resolution.")  My apologies for neglecting to 
check this before suggesting a tag in the --version output.



The implementation is just

BEGIN
{
  our $subsecond_timestamps = 0;
  eval
    {
      require Time::HiRes;
      import Time::HiRes qw(stat);
      $subsecond_timestamps = 1;
    }
}

Jacob, can you confirm that's an accurate test, given all the things you
said earlier about ways that grepping the source code might get it wrong?


That will determine if (a) Time::HiRes could be loaded and (b) 
Time::HiRes::stat could be imported.  This is the same test that 
Autom{ak,4t}e::FileUtils effectively uses to use Time::HiRes::stat.  I 
believe that the import is not actually necessary (i.e. Time::HiRes 
always exported Time::HiRes::stat) but it should do no harm as long as 
any use of stat in the code is prepared to handle floating-point 
timestamps.  As long as the autom4te script and its library are 
consistent (which is the distribution's problem if they screw that up), 
this test should be accurate.



-- Jacob



Re: rhel8 test failure confirmation?

2023-12-03 Thread Jacob Bachmeyer

Karl Berry wrote:
> There would not need to be much parsing, just "automake --version | grep 
> HiRes" in that case, or "case `automake --version` in *HiRes*) ...;; 
> esac" to avoid running grep if you want.


I specifically want to hear what Karl thinks.

I lean towards Jacob's view that automake --version | grep HiRes will
suffice. Not having a new option seems simpler/better in terms of later
understanding, too. --thanks, karl.

P.S. As for case vs. grep, personally I find a simple if...grep easier
to comprehend/test/debug than a case statement. (Especially the
macro-ized AS_CASE, which just makes me have to look up its syntax every
time I see it.) Also fewer lines of source. Granted calling the external
grep is less efficient, but that seems insignificant to me. I understand
Paul and others may disagree ...


I agree that if...grep is more direct.  I suggested the case alternative 
because it stands out in my memory after I needed it once, but I do not 
recall exactly why that contortion was needed.
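
Spelled out, the two forms under discussion are simply:

if automake --version | grep 'HiRes' >/dev/null 2>&1; then
  : # feature present
fi

case `automake --version` in
  *HiRes*) : ;; # feature present
esac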


In configure, the efficiency difference is trivial because configure 
already runs many, many, many subprocesses.  One more grep will not make 
a difference on any reasonable platform.



-- Jacob




Re: rhel8 test failure confirmation?

2023-12-02 Thread Jacob Bachmeyer

Mike Frysinger wrote:

On 02 Dec 2023 17:07, Jacob Bachmeyer wrote:
  

Mike Frysinger wrote:


On 06 Apr 2023 21:29, Jacob Bachmeyer wrote:
  

Karl Berry wrote:


jb> a more thorough test would locate the autom4te script and grep it
for the perllibdir that was substituted when autoconf was
configured.

I guess that would work.
  
  

Challenge accepted.  Here's a refined version:  (lines \-folded for email)

if $PERL -I${autom4te_perllibdir:-$(sed -n \
  '/autom4te_perllibdir/{s/^.*|| //;s/;$//;s/^.//;s/.$//;p;q}' \
<$(command -v autom4te))} -MAutom4te::FileUtils \
 -e 'exit defined $INC{q[Time/HiRes.pm]} ? 0 : 1'; then
   # autom4te uses Time::HiRes
else
   # autom4te does not use Time::HiRes
fi



this doesn't work on systems that wrap `autom4te`.  [...]

[...]

so i don't know why we would need to set/export autom4te_perllibdir in our
wrapper.  we've been doing this for over 20 years without ever setting that
var (or any other internal autoconf/automake var), so i'm pretty confident
our approach is OK.
  


No, not in the wrapper---in the automake ebuild script that runs 
configure, so that it matches the autom4te that the wrapper will actually 
run.  The test I proposed checks for autom4te_perllibdir in the 
environment before extracting it from autom4te precisely so that 
distributions like Gentoo have a knob to turn if their packaging breaks 
the test.


That said, it turns out that this whole line of discussion is now a red 
herring; see below.



[...]
i'm not aware of anything loading the Autom4te perl modules outside of the
autoconf projects.  does that actually happen ?  i don't think we want to
have automake start loading autoconf perl modules directly.  going through
the CLI interface seems better at this point.


Autoconf and Automake are very closely associated; there is even some 
shared code that is synchronized between them.  Autom4te::FileUtils is 
also Automake::FileUtils, for example.


The test we are discussing here was intended for Automake's configure 
script to use to check if the installed Autoconf has high-resolution 
timestamp support.  It turned out that the test I wrote can give a false 
positive, as some versions of other dependencies of Autom4te::FileUtils 
/also/ use Time::HiRes, causing the test to correctly report that 
Time::HiRes was loaded, but Autom4te::FileUtils nonetheless does not 
actually use it.  The test could probably be improved to fix the false 
positives, but that would be getting into deep magic in Perl that might 
not be fully reliable across Perl versions.  (It would be necessary to 
determine if (a) Time::HiRes::stat exists and (b) 
Autom4te::FileUtils::stat is an alias to it.  Having configure build a 
special XSUB just to check this is well into "ridiculous" territory.)
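
To illustrate what that deeper check would look like (a sketch only, and
exactly the sort of magic I would rather not rely on; the -I handling for
autom4te_perllibdir is omitted here):

$PERL -MAutom4te::FileUtils -e \
  'exit((defined &Time::HiRes::stat
         && \&Autom4te::FileUtils::stat == \&Time::HiRes::stat) ? 0 : 1)'

This exits 0 only if Time::HiRes::stat exists and Autom4te::FileUtils
actually imported it, which is the distinction the grep-based test cannot
make.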


As such, the Automake maintainers replaced this particular test with a 
simpler test that just locates Autom4te/FileUtils.pm and greps it for 
"Time::HiRes", thus the error message you received, which initially had 
me confused because the test I proposed cannot produce that message as 
it does not use grep.


An important bit of context to keep in mind is that I am not certain 
that timestamp resolution is still a problem outside of the Automake and 
Autoconf testsuites, since Autoconf and Automake now require cache files 
to actually be newer than their sources and consider the cache files 
stale if the timestamps are equal.  This is a problem for the testsuite 
because the testsuite is supposed to actually exercise the caching 
mechanism, and higher-resolution timestamps can significantly reduce the 
amount of time required to run the tests by reducing the delays needed 
to ensure the caches will be valid.



-- Jacob



Re: rhel8 test failure confirmation?

2023-12-02 Thread Jacob Bachmeyer

Zack Weinberg wrote:

On Sat, Dec 2, 2023, at 7:33 PM, Jacob Bachmeyer wrote:
  

Zack Weinberg wrote:


Would it help if we added a command line option to autom4te that made
it report whether it thought it could use high resolution timestamps?
Versions of autom4te that didn't recognize this option should be
conservatively assumed not to support them.
  

Why not just add that information to the --version message?  Add a
"(HiRes)" tag somewhere if Time::HiRes is available?



Either way is no problem from my end, but it would be more work
for automake (parsing --version output, instead of just checking the
exit status of autom4te --assert-high-resolution-timestamp-support)
Karl, do you have a preference here?  I can make whatever you decide
on happen, in the next couple of days.
  


There would not need to be much parsing, just "automake --version | grep 
HiRes" in that case, or "case `automake --version` in *HiRes*) ...;; 
esac" to avoid running grep if you want.


-- Jacob



Re: rhel8 test failure confirmation?

2023-12-02 Thread Jacob Bachmeyer

Zack Weinberg wrote:

On Sat, Dec 2, 2023, at 6:37 PM, Karl Berry wrote:
  
The best way to check if high-resolution 
timestamps are available to autom4te is to have perl load 
Autom4te::FileUtils and check if that also loaded Time::HiRes.


The problem with that turned out to be that Time::HiRes got loaded from
other system modules, resulting in the test thinking that autom4te used
it when that wasn't actually the case. That's what happened in practice
with your patch.



Would it help if we added a command line option to autom4te that made it report 
whether it thought it could use high resolution timestamps? Versions of 
autom4te that didn't recognize this option should be conservatively assumed not 
to support them.
  


Why not just add that information to the --version message?  Add a 
"(HiRes)" tag somewhere if Time::HiRes is available?  All versions that 
know to check if Time::HiRes is loaded will also know how to use it, 
unlike the earlier test.



(Of course there's the additional wrinkle that whether high resolution 
timestamps *work* depends on what filesystem autom4te.cache is stored in, but 
that's even harder to probe... one problem at a time?)


Yes; even standard-resolution timestamps might not be "all there" with 
FAT and its infamous 2-second timestamp resolution.


Is this actually still a problem (other than for ensuring the cache is 
used in the testsuite) after Bogdan's patches to require that cache 
files be strictly newer than their source files?



-- Jacob



Re: rhel8 test failure confirmation?

2023-12-02 Thread Jacob Bachmeyer

Mike Frysinger wrote:

On 06 Apr 2023 21:29, Jacob Bachmeyer wrote:
  

Karl Berry wrote:


jb> a more thorough test would locate the autom4te script and grep it
for the perllibdir that was substituted when autoconf was
configured.

I guess that would work.
  

Challenge accepted.  Here's a refined version:  (lines \-folded for email)

if $PERL -I${autom4te_perllibdir:-$(sed -n \
  '/autom4te_perllibdir/{s/^.*|| //;s/;$//;s/^.//;s/.$//;p;q}' \
<$(command -v autom4te))} -MAutom4te::FileUtils \
 -e 'exit defined $INC{q[Time/HiRes.pm]} ? 0 : 1'; then
   # autom4te uses Time::HiRes
else
   # autom4te does not use Time::HiRes
fi



this doesn't work on systems that wrap `autom4te`.  Gentoo for example wraps
all autoconf & automake scripts to support parallel installs of different
versions.  this way we can easily have access to every autoconf version.  we
got this idea from Mandrake, so we aren't the only ones ;).
  


If you install a wrapper script (instead of, for example, making 
autom4te, etc. easily-repointable symlinks), then you must also set 
autom4te_perllibdir in the environment to the appropriate directory when 
building autoconf/automake.  This (with the Gentoo-specific knowledge of 
where the active autom4te is actually located) should be easy to add to 
the ebuild.


If autom4te_perllibdir is set in the environment, its value will be used 
instead of extracting that information from the autom4te script.



[...]

seems like the only reliable option is to invoke autom4te.
am_autom4te_ver=`$AUTOM4TE --version | sed -n '1{s:.*) ::;p}'`
AS_CASE([$am_autom4te_ver],
... do the matching ...

what is the first autoconf release that has the fix ?
  


The problem with testing autoconf versions for this is that Time::HiRes 
is an *optional* module in Perl.  It was available from CPAN before it 
was bundled with Perl, and distributions technically can *unbundle* it 
from later Perl releases if they want.  The only reliable way to know if 
Time::HiRes is available (without effectively reimplementing Perl's 
module search) is to try to load it.  Autom4te now (correctly) uses 
Time::HiRes if it is available and falls back to Perl builtins if not, 
for any version of Perl.  The best way to check if high-resolution 
timestamps are available to autom4te is to have perl load 
Autom4te::FileUtils and check if that also loaded Time::HiRes.



-- Jacob




Re: Getting long SOURCES lines with subdirs shorter

2023-07-16 Thread Jacob Bachmeyer

Jan Engelhardt wrote:

Given

a_SOURCES = aprog/main.c aprog/foo.c aprog/bar.c aprog/baz.c ...

The more source files there are to be listed, the longer that line gets, 
the bigger the Makefile.am fragment becomes, etc. I am thinking about 
how to cut that repetition down. Current automake likely won't have 
anything in store already, so I'm thinking of editing automake and 
targeting a future automake release.
  


While this does not reduce the repetition, Automake allows 
backslash-continuation on these lines.  DejaGnu uses it to list files 
one per line in some places; see 
<http://git.savannah.gnu.org/cgit/dejagnu.git/tree/Makefile.am>.
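
Applied to the a_SOURCES example above, that looks like:

a_SOURCES = \
	aprog/main.c \
	aprog/foo.c \
	aprog/bar.c \
	aprog/baz.c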


-- Jacob



Re: rhel8 test failure confirmation?

2023-04-07 Thread Jacob Bachmeyer

Karl Berry wrote:

Hi Jacob,

The guess was the two most probable locations:  /usr/share/autoconf and 
/usr/local/share/autoconf.


Wouldn't have worked on my own system :).

Challenge accepted.  


Thanks!

if $PERL -I${autom4te_perllibdir:-$(sed -n \
  '/autom4te_perllibdir/{s/^.*|| //;s/;$//;s/^.//;s/.$//;p;q}' \
<$(command -v autom4te))} -MAutom4te::FileUtils \
 -e 'exit defined $INC{q[Time/HiRes.pm]} ? 0 : 1'; then
   # autom4te uses Time::HiRes

unfortunately we are highly restricted in what we can use in basic
automake/conf shell code (as opposed to in the tests).  Neither the
"command" command nor $(...) syntax can be used.
  


Are you sure about that?  I got a fair bit of pushback on removing 
$(...) from config.guess (where it actually is a problem because 
config.guess is supposed to identify a variety of pre-POSIX systems and 
can be run independently of configure) on the grounds that Autoconf 
locates a POSIX shell and uses it for the bulk of configure (and the 
auxiliary scripts like config.guess).  Of course, Autoconf's "find a 
POSIX shell" logic does not help DejaGnu, which installs a copy of 
config.guess and runs it with /bin/sh according to its #! line...



For the former, I think there's an autoconf/make macro to look up a
program name along PATH?


From a quick glance at the manual, that would be 
AC_PATH_PROG([AUTOM4TE], [autom4te]).



[...]
  
Would you be up for tweaking the check to use such

least-common-denominator shell stuff?
  


Let's try:

AC_PATH_PROG([AUTOM4TE], [autom4te])
if test x$autom4te_perllibdir = x; then
 autom4te_perllibdir=`sed -n \
   '/autom4te_perllibdir/{s/^.*|| //;s/;$//;s/^.//;s/.$//;p;q}' <$AUTOM4TE`
fi
if $PERL -I$autom4te_perllibdir -MAutom4te::FileUtils \
-e 'exit defined $INC{q[Time/HiRes.pm]} ? 0 : 1'; then
 ... 



The backslash-newline in the sed command was added as a precaution 
against line-wrap in email; the line could be combined.



Ordinarily Perl could not be used either, but since Automake is written
in Perl, I don't see a problem with doing so here. (If the system
doesn't have Perl, Automake won't get far.)


If the system lacks Perl, autom4te will not work either.  The proposed 
test uses Perl to determine a characteristic of a program that is 
written in Perl.  :-)



Not sure if $PERL is already
defined by the time at which this would be run, but it should be
possible to arrange with an ac prerequisite if needed.
  


That should be easy enough to rearrange, since this check must come 
/after/ the autoconf version check---the pattern is only valid since 
autoconf-2.52f, but Automake requires autoconf-2.65 or later.



-- Jacob



Re: rhel8 test failure confirmation?

2023-04-06 Thread Jacob Bachmeyer

Karl Berry wrote:
jb> The test also guesses the location of autoconf's Perl libraries; 


I'm skeptical that any "guessing" of library locations would be reliable
enough.
  


The guess was the two most probable locations:  /usr/share/autoconf and 
/usr/local/share/autoconf.



jb> a more thorough test would locate the autom4te script and grep it
for the perllibdir that was substituted when autoconf was
configured.

I guess that would work.


Challenge accepted.  Here's a refined version:  (lines \-folded for email)

if $PERL -I${autom4te_perllibdir:-$(sed -n \
 '/autom4te_perllibdir/{s/^.*|| //;s/;$//;s/^.//;s/.$//;p;q}' \
   <$(command -v autom4te))} -MAutom4te::FileUtils \
-e 'exit defined $INC{q[Time/HiRes.pm]} ? 0 : 1'; then
  # autom4te uses Time::HiRes
else
  # autom4te does not use Time::HiRes
fi


This version matches a pattern that was introduced in commit 
c737451f8c17afdb477ad0fe72f534ea837e001e on 2001-09-13 preceding 
autoconf-2.52f, and Automake currently requires autoconf-2.65 or later, 
so this should work.


Getting the single quotes away from the value without directly 
mentioning them is the purpose of the "s/^.//;s/.$//;" part of the sed 
command.  Wrapping it as "$(eval echo $(sed ...))" would have been 
another option to have the shell strip the single quotes.
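
For anyone following along, the line being mined looks roughly like this
in an installed autom4te (the exact text can vary between Autoconf
versions, so treat it as illustrative):

my $perllibdir = $ENV{'autom4te_perllibdir'} || '/usr/share/autoconf';

and the sed expression then peels it apart:

s/^.*|| /      ->  '/usr/share/autoconf';
s/;$//         ->  '/usr/share/autoconf'
s/^.//;s/.$//  ->  /usr/share/autoconf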



Automake and autoconf are not two independent tools. Automake completely
relies on autoconf.

It's not for me to hand down any final pronouncements, but personally I
feel strongly that the tests should not paper over this problem by
changing the way tests work in general. With rm -rf of the cache, or
autoconf -f, etc. That is not what users do, so that's not what the
tests should do, either. Such global changes could have all kinds of
other unknown/undesirable effects on the tests.

In contrast to setting the sleep value "as appropriate", which is what
is/should be already done, so changing the conditions under which it is
set is unlikely to cause any unforeseen additional problems.
  


While potentially compromising the real-world validity of the testsuite 
is a legitimate concern, the fact that Automake depends on Autoconf does 
not preclude the Automake testsuite from working around Autoconf 
limitations in order to accurately test /Automake/.



-- Jacob




Re: rhel8 test failure confirmation?

2023-04-04 Thread Jacob Bachmeyer

Bogdan wrote:
Jacob Bachmeyer , Mon Apr 03 2023 06:16:53 
GMT+0200 (Central European Summer Time)

Karl Berry wrote:

[...]
   What can we do about this?

As for automake: can we (you :) somehow modify the computation of the
sleep value to determine if autom4te can handle the HiRes testing or 
not

(i.e., has the patch installed)? And then use the longer sleep in
automake testing if needed.


If you can locate Autom4te::FileUtils, grepping it for "Time::HiRes" 
will tell you if autom4te supports sub-second timestamps, but then 
you need more checks to validate that the filesystem actually has 
sub-second timestamps.


A simple check:

if $PERL -I${autom4te_perllibdir:-/usr/share/autoconf} 
-I/usr/local/share/autoconf \
-MAutom4te::FileUtils -e 'exit defined $INC{q[Time/HiRes.pm]} 
? 0 : 1'; then

# autom4te uses Time::HiRes
else
# autom4te does not use Time::HiRes
fi

This method also has the advantage of implicitly also checking that 
$PERL has Time::HiRes installed by determining if loading 
Autom4te::FileUtils causes Time::HiRes to be loaded.  (In other 
words, this will give the correct answer on Perl 5.6 if Time::HiRes 
was installed from CPAN or on later Perls if a distribution packager 
has unbundled Time::HiRes and the user has not installed its package.)



 Nice. The 0 and 1 may not be portable to each OS in the Universe (see 
EXIT_SUCCESS and EXIT_FAILURE in exit(3)), but should be good/portable 
enough for our goals. Or maybe some other simple solution.


Generally, "exit 0" reports success to the shell and any other exit 
value is taken as false.  I am unsure if POSIX actually requires that, 
however.


 As I understand, this could even be used to actually call the sub 
which checks the timestamps, so we'd have a ready-to-use test. Only a 
matter of where to put it... Is there some code that runs *before* all 
tests that could set some environment variable passed to the tests, 
create a file, or whatever?


The intended implication was that that test would go in configure.

Verifying that the filesystem actually /has/ subsecond timestamps is a 
separate issue; that test only detects whether autom4te will use 
subsecond timestamps /if/ they are available.


The test also guesses the location of autoconf's Perl libraries; a more 
thorough test would locate the autom4te script and grep it for the 
perllibdir that was substituted when autoconf was configured.



-- Jacob




Re: rhel8 test failure confirmation?

2023-04-02 Thread Jacob Bachmeyer

Karl Berry wrote:

[...]
   What can we do about this?

As for automake: can we (you :) somehow modify the computation of the
sleep value to determine if autom4te can handle the HiRes testing or not
(i.e., has the patch installed)? And then use the longer sleep in
automake testing if needed.
  


If you can locate Autom4te::FileUtils, grepping it for "Time::HiRes" 
will tell you if autom4te supports sub-second timestamps, but then you 
need more checks to validate that the filesystem actually has sub-second 
timestamps.


A simple check:

if $PERL -I${autom4te_perllibdir:-/usr/share/autoconf} 
-I/usr/local/share/autoconf \
   -MAutom4te::FileUtils -e 'exit defined $INC{q[Time/HiRes.pm]} ? 
0 : 1'; then

   # autom4te uses Time::HiRes
else
   # autom4te does not use Time::HiRes
fi

This method also has the advantage of implicitly also checking that 
$PERL has Time::HiRes installed by determining if loading 
Autom4te::FileUtils causes Time::HiRes to be loaded.  (In other words, 
this will give the correct answer on Perl 5.6 if Time::HiRes was 
installed from CPAN or on later Perls if a distribution packager has 
unbundled Time::HiRes and the user has not installed its package.)



[...]
It seems to me that using autoconf -f or similar is papering over the
problem, so that the tests are no longer testing the normal behavior.
Which does not sound desirable.


The Automake testsuite is supposed to test Automake, not Autoconf, so 
working around Autoconf issues is appropriate.  In this case, if always 
using "autoconf -f" allows us to eliminate the sleeps entirely (and does 
not expand the running time of Autoconf too much), we should do that, at 
least in my view.



-- Jacob




Re: rhel8 test failure confirmation? [PATCH for problem affecting Automake testsuite]

2023-03-31 Thread Jacob Bachmeyer

A quick introduction to the situation for the Autoconf list:

The Automake maintainers have encountered a bizarre issue with sporadic 
random test failures, seemingly due to "disk writes not taking effect" 
(as Karl Berry mentioned when starting the thread).  Bogdan appears to 
have traced the issue to autom4te caching and offered a patch.  I have 
attached a copy of Bogdan's patch.


Bogdan's patch is a subtle change:  the cache is now considered stale 
unless it is /newer/ than the source files, rather than being considered 
stale only if the source files are newer.  In short, this patch causes 
the cache to be considered stale if its timestamp /matches/ the source 
file, while it is currently considered valid if the timestamps match.  I 
am forwarding the patch to the Autoconf list now because I concur with 
the change, noting that Time::HiRes is also limited by the underlying 
filesystem and therefore is not a "magic bullet" solution.  Assuming the 
cache files are stale unless proven otherwise is therefore correct.


Note again that this is _Bogdan's_ patch I am forwarding unchanged.  I 
did not write it (but I agree with it).


[further comments inline below]

Bogdan wrote:
Bogdan , Sun Mar 05 2023 22:31:55 GMT+0100 (Central 
European Standard Time)
Karl Berry , Sat Mar 04 2023 00:00:56 GMT+0100 
(Central European Standard Time)
 Note that 'config.h' is older (4 seconds) than './configure', 
which

 shouldn't be the case as it should get updated with new values.

Indeed. That is the same sort of thing as I was observing with nodef.
But what (at any level) could be causing that to happen?
Files just aren't getting updated as they should be.

I haven't yet tried older releases of automake to see if their tests
succeed on the systems that are failing now. That's next on my list.


[...]


  Another tip, maybe: cache again. When I compare which files are 
newer than the only trace file I get in the failing 'backcompat2' 
test ('autom4te.cache/traces.0'), I see that 'configure.ac' is older 
than this file in the succeeding run, but it's newer in the failing 
run. This could explain why 'configure' doesn't get updated to put 
new values in config.h (in my case) - 'autom4te' thinks it's up-to-date.

  The root cause may be in 'autom4te', sub 'up_to_date':

   # The youngest of the cache files must be older than the oldest of
   # the dependencies.
   # FIXME: These timestamps have only 1-second resolution.
   # Time::HiRes fixes this, but assumes Perl 5.8 or later.

(lines 913-916 in my version).


This comment Bogdan cites is not correct:  Time::HiRes could be 
installed from CPAN on Perls older than 5.8, and might be missing from a 
5.8 or later installation if the distribution packager separated it into 
another package.  Nor is Time::HiRes guaranteed to fix the issue; the 
infamous example is the FAT filesystem, where timestamps only have 
2-second resolution.  Either way, Time::HiRes is now used if available, 
so this "FIXME" is fixed now.  :-)


  Perhaps 'configure.ac' in the case that fails is created "not late 
enough" (still within 1 second) when compared to the cache, and the 
cached values are taken, generating the old version of 'configure' 
which, in turn, generates old versions of the output files.


  Still a guess, but maybe a bit more probable now.

  Does it work when you add '-f' to '$AUTOCONF'? It does for me - 
again, about 20 sequential runs of the same set of tests and about 5 
parallel with 4 threads. Zero failures.
  I'd probably get the same result if I did a 'rm -fr autom4te.cache' 
before each '$AUTOCONF' invocation.

[...]

  More input (or noise):

1) The t/backcompat2.sh test (the only test which fails for me) is a 
test which modifies configure.ac and calls $AUTOCONF several times.


2) Autom4te (part of Autoconf) has a 1-second resolution in checking 
if the input files are newer than the cache.


Maybe.  That comment could be wrong; the actual "sub mtime" is in 
Autom4te::FileUtils.  Does your version of that module use Time::HiRes?  
Git indicates that use of Time::HiRes was added to Autoconf at commit 
3a9802d60156809c139e9b4620bf04917e143ee2 which is between the 2.72a and 
2.72c snapshot tags.


3) Thus, a sequence: 'autoconf' + quickly modify configure.ac + 
quickly run 'autoconf' may cause autom4te to use the old values from 
the cache instead of processing the new configure.ac. "Quickly" means 
within the same second.


It might be broader than that if your version is already using 
Time::HiRes.  If so, what filesystems are involved?  I could see a 
possible bug where multiple writes get the same mtime if they get 
flushed to disk together.  Time::HiRes will not help if this happens; 
your patch will work around such a bug.


4) I ran the provided list of tests (t/backcompat2.sh, 
t/backcompat3.sh, t/get-sysconf.sh, t/lex-depend.sh, t/nodef.sh, 
t/remake-aclocal-version-mismatch.sh, t/subdir-add2-pr46.sh, 
t/testsuite-summary-reference-log.sh) in batches of 2

Re: if vs. ifdef in Makefile.am

2023-03-02 Thread Jacob Bachmeyer

Bogdan wrote:

[...]
 Probably Nick's suggestion (a new option to ./configure or the 
AC_HEADER_ASSERT macro) would be the most future-proof, but it 
requires running ./configure each time you wish to change the build 
type (which maybe is not a bad idea, it depends).


That would probably be a very good idea, to avoid mixing files built for 
one mode with files built for another.  Even easier:  use separate build 
directories for each type, from a common source directory, like so:


$ : ... starting one directory above the source tree in ./src/ ...
$ (mkdir test-build; cd ./test-build && ../src/configure --enable-assert ...)
$ (mkdir release-build; cd ./release-build && ../src/configure --disable-assert ...)


Now you avoid conflating modules for test and release builds and ending 
up with an executable that you cannot reliably replicate.  A simple flag 
to make is unlikely to be properly recognized as a dependency for all 
objects built.



-- Jacob



Re: Generating missing depfiles by an automake based makefile

2023-02-09 Thread Jacob Bachmeyer

Dmitry Goncharov wrote:

On Thursday, February 9, 2023, Tom Tromey  wrote:
  

It's been a long time since I worked on automake, but the dependency
tracking in automake is designed not to need to rebuild or pre-build dep
files.  Doing that means invoking the compiler twice, which is slow.
Instead, automake computes dependencies as a side effect of compilation.


The hello.Po example presented above computes depfiles as a side effect of
compilation. Moreover, when hello.Po is absent that makefile compiles
hello.o as a side effect of hello.Po computation. In total there is only
one compilation.
  

What is the scenario where you both end up with an empty depfile and a
compilation that isn't out of date for some other reason?  That seems
like it shouldn't be possible.


When a depfile is missing (for any reason) the current automake makefile
creates a dummy depfile. From that point on the user has to notice that
make is no longer tracking dependencies and their build is incorrect.

I am asking if automake can be enhanced to do something similar to hello.Po
example above, in those cases when make supports that.


If I understand correctly, the problem here is that the depfile is both 
empty and current.  If Automake could set the dummy depfile's mtime to 
some appropriate past timestamp (maybe the Makefile itself?), it would 
appear out-of-date immediately and therefore be remade, also rebuilding 
the corresponding object.


A quick check of the POSIX manual finds that touch(1) accepts the '-r' 
option to name a reference file and can create a file.  Could we simply 
use "touch -r Makefile $DEPFILE" to create depfiles when we need dummies?



-- Jacob




Re: man_MANS install locations

2022-08-31 Thread Jacob Bachmeyer

Karl Berry wrote:

Hi Jan,

As for GNU/Linux, what was the rationale to only permit [0-9ln]?

No idea. Maybe just didn't think about "m", or maybe it didn't exist at
that time? Jim, Paul, anyone?

Should automake be relaxed? 


I see no harm in allowing more (any) letters, if that's what you mean.

When running automake on Solaris, placing svcadm.1m into man1 rather
than man1m seems outright wrong.

But is Automake's purpose to reproduce platform-specific behavior, or to
have consistent behavior across platforms?  I think the latter.
  


This would be adapting to platform-specific requirements.  I suspect 
that Solaris man(1) will not look for svcadm.1m in man1 at all but only 
in man1m.



I guess a new option to install *.1m in man1m/, etc., would be ok, if
you want it. If you or anyone can provide a patch, that would be
great. Unfortunately I doubt it's anything I will ever implement myself.
  


Maybe the best answer is to install into an existing directory if one is 
found and otherwise trim the suffix to the "standard" set?



Should the rpmlint check be adjusted to cater to the GNU FHS?

I guess that's a question for the rpmlint people, whoever they are.
I don't see that Automake's default behavior is going to change.

Also, GNU (as an organization) never had anything to do with the FHS,
so far as I know. I don't think the GNU coding standards/maintainer
information have anything to say about this topic ...
  


I seem to remember reading somewhere that /usr is supposed to be a 
symlink to / on the GNU system, so no, GNU is not intended to follow FHS.



-- Jacob



Re: Old .Po file references old directory, how to start fresh?

2022-08-04 Thread Jacob Bachmeyer

Travis Pressler via Discussion list for automake wrote:

Hi,

I'm learning how to make an autotools project and have created a test project 
to work with. I ran make with a directory `nested` and then deleted it and 
deleted the reference to it in my `Makefile.am`.

Now I'm running ./configure && make and I get the following:

*** No rule to make target 'nested/main.c', needed by 'main.o'. Stop.

How can I run `make` so that it doesn't reference this old nested directory?

I was curious if I could find where this reference is, so I did a grep -r 
nested . I think the only relevant hit is:

./src/.deps/main.Po:main.o nested/main.c /usr/include/stdc-predef.h 
/usr/include/stdio.h \


Have you rerun automake to regenerate Makefile.in since changing 
Makefile.am?



-- Jacob




Re: type errors, command length limits, and Awk

2022-02-15 Thread Jacob Bachmeyer

Mike Frysinger wrote:

On 15 Feb 2022 21:17, Jacob Bachmeyer wrote:
  

Mike Frysinger wrote:


context: https://bugs.gnu.org/53340
  
  

Looking at the highlighted line in the context:



thanks for getting into the weeds with me
  


You are welcome.


  echo "$$py_files" | $(am__pep3147_tweak) | $(am__base_list) | \

It seems that the problem is that am__base_list expects ListOf/File (and 
produces ChunkedListOf/File) but am__pep3147_tweak emits ListOf/Glob.  
This works in the usual case because the shell implicitly converts Glob 
-> ListOf/File and implicitly flattens argument lists, but results in 
the overall command line being longer than expected if the globs expand 
to more filenames than expected, as described there.


It seems that the proper solution to the problem at hand is to have 
am__pep3147_tweak expand globs itself somehow and thus provide 
ListOf/File as am__base_list expects.


Do I misunderstand?  Is there some other use for xargs?



if i did not care about double expansion, this might work.  the pipeline
quoted here handles the arguments correctly (other than whitespace splitting
on the initial input, but that's a much bigger task) before passing them to
the rest of the pipeline.  so the full context:

  echo "$$py_files" | $(am__pep3147_tweak) | $(am__base_list) | \
  while read files; do \
$(am__uninstall_files_from_dir) || st=$$?; \
  done || exit $$?; \
...
am__uninstall_files_from_dir = { \
  test -z "$$files" \
|| { test ! -d "$$dir" && test ! -f "$$dir" && test ! -r "$$dir"; } \
|| { echo " ( cd '$$dir' && rm -f" $$files ")"; \
 $(am__cd) "$$dir" && rm -f $$files; }; \
  }

leveraging xargs would allow me to maintain a single shell expansion.
the pathological situation being:
  bar.py
  __pycache__/
bar.pyc
bar*.pyc
bar**.pyc

py_files="bar.py" which turns into "__pycache__/bar*.pyc" by the pipeline,
and then am__uninstall_files_from_dir will expand it when calling `rm -f`.

if the pipeline expanded the glob, it would be:
  __pycache__/bar.pyc __pycache__/bar*.pyc __pycache__/bar**.pyc
and then when calling rm, those would expand a 2nd time.
  


If we know that there will be _exactly_ one additional shell expansion, 
why not simply filter the glob results through `sed 's/[?*]/\\&/g'` to 
escape potential glob metacharacters before emitting them from 
am__pep3147_tweak?  (Or is that not portable sed?)


Back to the pseudo-type model I used earlier, the difference between 
File and Glob is that Glob contains unescaped glob metacharacters, so 
escaping them should solve the problem, no?  (Or is there another thorn 
nearby?)
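
For the pathological names in your example, that escaping step would
behave like this (the character class could be widened to cover '[' as
well):

printf '%s\n' '__pycache__/bar.pyc' '__pycache__/bar*.pyc' '__pycache__/bar**.pyc' \
  | sed 's/[?*]/\\&/g'
# __pycache__/bar.pyc
# __pycache__/bar\*.pyc
# __pycache__/bar\*\*.pyc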



[...]

which at this point i've written `xargs -n40`, but not as fast :p.
  


Not as fast, yes, but certainly portable!  :p

The real question would be if it is faster than simply running rm once 
per file.  I would guess probably _so_ on MinGW (bash on Windows, where 
that logic would use shell builtins but running a new process is 
extremely slow) and probably _not_ on an archaic Unix system where 
"test" is not a shell builtin so saving the overhead and just running rm 
once per file would be faster.



automake jumps through some hoops to try and limit the length of generated
command lines, like deleting output objects in a non-recursive build.  it's
not perfect -- it breaks arguments up into 40 at a time (akin to xargs -n40)
and assumes that it won't have 40 paths with long enough names to exceed the
command line length.  it also has some logic where it's deleting paths by
globs, but the process to partition the file list into groups of 40 happens
before the glob is expanded, so there are cases where it's 40 globs that can
expand into many many more files and then exceed the command line length.
  
First, I thought that GNU-ish systems were not supposed to have such 
arbitrary limits,



one person's "arbitrary limits" is another person's "too small limit" :).
i'm most familiar with Linux, so i'll focus on that.

[...]

plus, backing up, Automake can't assume Linux.  so i think we have to
proceed as if there is a command line limit we need to respect.
  


So then the answer to my next question is that it is still an issue, 
even if the GNU system were to allow arguments up to available memory.


and this issue (the context) originated from Gentoo 
GNU/Linux.  Is this a more fundamental bug in Gentoo or still an issue 
because Automake build scripts are supposed to be portable to foreign 
system that do have those limits?



to be clear, what's failing is an Automake test.  it sets the `rm` limit to
an articially low one.  [...]

Gentoo happened to find this error before Automake because Gentoo also found
and fixe

type errors, command length limits, and Awk (was: portability of xargs)

2022-02-15 Thread Jacob Bachmeyer

Mike Frysinger wrote:

context: https://bugs.gnu.org/53340
  

Looking at the highlighted line in the context:

>   echo "$$py_files" | $(am__pep3147_tweak) | $(am__base_list) | \
It seems that the problem is that am__base_list expects ListOf/File (and 
produces ChunkedListOf/File) but am__pep3147_tweak emits ListOf/Glob.  
This works in the usual case because the shell implicitly converts Glob 
-> ListOf/File and implicitly flattens argument lists, but results in 
the overall command line being longer than expected if the globs expand 
to more filenames than expected, as described there.


It seems that the proper solution to the problem at hand is to have 
am__pep3147_tweak expand globs itself somehow and thus provide 
ListOf/File as am__base_list expects.


Do I misunderstand?  Is there some other use for xargs?

I note that the current version of standards.texi also allows configure 
and make rules to use awk(1); could that be useful here instead? (see below)



[...]

automake jumps through some hoops to try and limit the length of generated
command lines, like deleting output objects in a non-recursive build.  it's
not perfect -- it breaks arguments up into 40 at a time (akin to xargs -n40)
and assumes that it won't have 40 paths with long enough names to exceed the
command line length.  it also has some logic where it's deleting paths by
globs, but the process to partition the file list into groups of 40 happens
before the glob is expanded, so there are cases where it's 40 globs that can
expand into many many more files and then exceed the command line length.
  


First, I thought that GNU-ish systems were not supposed to have such 
arbitrary limits, and this issue (the context) originated from Gentoo 
GNU/Linux.  Is this a more fundamental bug in Gentoo or still an issue 
because Automake build scripts are supposed to be portable to foreign 
system that do have those limits?


Second, counting files in the list, as you note, does not necessarily 
conform to the actual system limits, while Awk can track both the number 
of elements in the list and the length of the list as a string, allowing 
it to break the list to meet both command-tail length limits (on Windows, 
or the total size of the block transferred to execve on POSIX) and 
argument count limits (the length of argv acceptable to execve on POSIX).
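
A sketch of what I have in mind (this is not Automake's existing
am__base_list; it assumes one file name per input line, and the limits of
40 names and roughly 4000 characters are arbitrary placeholders):

awk '
  n && (n == 40 || len + 1 + length($0) > 4000) {
    print line; line = ""; n = 0; len = 0
  }
  {
    line = (n ? line " " : "") $0
    len += length($0) + (n ? 1 : 0)
    n++
  }
  END { if (n) print line }'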


POSIX Awk should be fairly widely available, although at least Solaris 
10 has a non-POSIX awk in /usr/bin and a POSIX awk in /usr/xpg4/bin; I 
found this while working on DejaGnu.  I ended up using this test to 
ensure that "awk" is suitable:


8<--
# The non-POSIX awk in /usr/bin on Solaris 10 fails this test
if echo | "$awkbin" '1 && 1 {exit 0}' > /dev/null 2>&1 ; then
   have_awk=true
else
   have_awk=false
fi
8<--


Another "gotcha" with Solaris 10 /usr/bin/awk is that it will accept 
"--version" as a valid Awk program, so if you use that to test whether 
"awk" is GNU Awk, you must redirect input from /dev/null or it will hang.


Automake may want to do more extensive testing to find a suitable Awk; 
the above went into a script that remains generic when installed and so 
must run its tests every time the user invokes it, so "quick" was a high 
priority.



-- Jacob



Re: portability of xargs

2022-02-15 Thread Jacob Bachmeyer

Dan Kegel wrote:

Meson is a candidate for such a next-gen config system.  It is in python,
which does not quite qualify as usable during early uplift/bootstrap, but
there are C ports in progress, see e.g. https://sr.ht/~lattis/muon/
  


*Please* do not introduce a dependency on Python; they do not worry much 
about backwards compatibility.  If there is ever a Python 4 with a 3->4 
transition anything like the 2->3 transition, you could end up with 
every past release relying on current Python becoming unbuildable.


Having complex dependencies for creating the build scripts is one thing, 
but needing major packages (like Python) to *use* the build scripts is a 
serious problem for anything below the "user application" tier, 
especially the "base system" tier.



-- Jacob




Re: Automake for RISC-V

2021-11-20 Thread Jacob Bachmeyer

Billa Surendra wrote:

On Sun, 21 Nov, 2021, 2:28 am Nick Bowler,  wrote:
  

On 20/11/2021, Billa Surendra  wrote:


I have RISC-V native compiler on target image, but when I am compiling
automake on target image it needs automake on target. This is the main
problem.
  

Automake should not be required to install automake if you are using
a released version and have not modified the build system

Could you please explain more, What is the released version ? . Modified
build system means ?
  


Automake should only be needed if you have changed a "Makefile.am" file 
somewhere.


Are you using some kind of packaging system that likes to regenerate 
build files as a matter of course?  The normal "/path/to/src/configure 
&& make && make install" procedure should not require Automake to be 
installed.



-- Jacob



Re: Automake for RISC-V

2021-11-18 Thread Jacob Bachmeyer

Billa Surendra wrote:

Thanks for your reply. I have installed perl on target system but target
image and build system perl version were different. And second, thing I
have noticed  that in aclocal script very first line is #! /bin/perl
  


A simple workaround is to find perl on the target system image (probably 
/usr/bin/perl, but it could have been installed somewhere else) and make 
a symlink at /bin/perl to the real interpreter.  It is possible that 
your build system has /bin as a symlink to /usr/bin, as a certain 
widely-loathed developer has been rather forcefully advocating the past 
few years...
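
That is, on the target image (assuming perl really is at /usr/bin/perl
there):

test -x /bin/perl || ln -s /usr/bin/perl /bin/perl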




-- Jacob



Re: Automake testsuite misuses DejaGnu [PATCH v0]

2021-07-12 Thread Jacob Bachmeyer

Jim Meyering wrote:

[...]
Even a sample fix for one of the currently-failing tests would be helpful.
  


This is the first draft; this patch breaks 1.6.1 because versions of 
DejaGnu prior to 1.6.3 require srcdir to point exactly to the testsuite, 
while 1.6.3 allows the testsuite to be in ${srcdir}/testsuite.


8<--
diff -urN -x '*~' automake-1.16.3-original/t/check12.sh 
automake-1.16.3/t/check12.sh
--- automake-1.16.3-original/t/check12.sh   2020-11-18 19:21:03.0 
-0600
+++ automake-1.16.3/t/check12.sh2021-06-29 01:47:21.669276386 -0500
@@ -60,8 +60,8 @@
DEJATOOL = hammer spanner
AM_RUNTESTFLAGS = HAMMER=$(srcdir)/hammer SPANNER=$(srcdir)/spanner
EXTRA_DIST += $(DEJATOOL)
-EXTRA_DIST += hammer.test/hammer.exp
-EXTRA_DIST += spanner.test/spanner.exp
+EXTRA_DIST += testsuite/hammer.test/hammer.exp
+EXTRA_DIST += testsuite/spanner.test/spanner.exp
END

cat > hammer << 'END'
@@ -77,9 +77,10 @@
END
chmod +x hammer spanner

-mkdir hammer.test spanner.test
+mkdir testsuite
+mkdir testsuite/hammer.test testsuite/spanner.test

-cat > hammer.test/hammer.exp << 'END'
+cat > testsuite/hammer.test/hammer.exp << 'END'
set test test_hammer
spawn $HAMMER
expect {
@@ -88,7 +89,7 @@
}
END

-cat > spanner.test/spanner.exp << 'END'
+cat > testsuite/spanner.test/spanner.exp << 'END'
set test test_spanner
spawn $SPANNER
expect {
diff -urN -x '*~' automake-1.16.3-original/t/dejagnu3.sh 
automake-1.16.3/t/dejagnu3.sh
--- automake-1.16.3-original/t/dejagnu3.sh  2020-11-18 19:21:03.0 
-0600
+++ automake-1.16.3/t/dejagnu3.sh   2021-06-29 01:19:19.161147525 -0500
@@ -34,12 +34,13 @@
AUTOMAKE_OPTIONS = dejagnu
DEJATOOL = hammer
AM_RUNTESTFLAGS = HAMMER=$(srcdir)/hammer
-EXTRA_DIST = hammer hammer.test/hammer.exp
+EXTRA_DIST = hammer testsuite/hammer.test/hammer.exp
END

-mkdir hammer.test
+mkdir testsuite
+mkdir testsuite/hammer.test

-cat > hammer.test/hammer.exp << 'END'
+cat > testsuite/hammer.test/hammer.exp << 'END'
set test test
spawn $HAMMER
expect {
diff -urN -x '*~' automake-1.16.3-original/t/dejagnu4.sh 
automake-1.16.3/t/dejagnu4.sh
--- automake-1.16.3-original/t/dejagnu4.sh  2020-11-18 19:21:03.0 
-0600
+++ automake-1.16.3/t/dejagnu4.sh   2021-06-29 01:25:08.309780437 -0500
@@ -49,13 +49,14 @@

AM_RUNTESTFLAGS = HAMMER=$(srcdir)/hammer SPANNER=$(srcdir)/spanner

-EXTRA_DIST  = hammer  hammer.test/hammer.exp
-EXTRA_DIST += spanner spanner.test/spanner.exp
+EXTRA_DIST  = hammer  testsuite/hammer.test/hammer.exp
+EXTRA_DIST += spanner testsuite/spanner.test/spanner.exp
END

-mkdir hammer.test spanner.test
+mkdir testsuite
+mkdir testsuite/hammer.test testsuite/spanner.test

-cat > hammer.test/hammer.exp << 'END'
+cat > testsuite/hammer.test/hammer.exp << 'END'
set test test
spawn $HAMMER
expect {
@@ -64,7 +65,7 @@
}
END

-cat > spanner.test/spanner.exp << 'END'
+cat > testsuite/spanner.test/spanner.exp << 'END'
set test test
spawn $SPANNER
expect {
diff -urN -x '*~' automake-1.16.3-original/t/dejagnu5.sh 
automake-1.16.3/t/dejagnu5.sh
--- automake-1.16.3-original/t/dejagnu5.sh  2020-11-18 19:21:03.0 
-0600
+++ automake-1.16.3/t/dejagnu5.sh   2021-06-29 01:26:36.511645792 -0500
@@ -34,12 +34,13 @@

cat > Makefile.am << END
AUTOMAKE_OPTIONS = dejagnu
-EXTRA_DIST = $package $package.test/$package.exp
+EXTRA_DIST = $package testsuite/$package.test/$package.exp
AM_RUNTESTFLAGS = PACKAGE=\$(srcdir)/$package
END

-mkdir $package.test
-cat > $package.test/$package.exp << 'END'
+mkdir testsuite
+mkdir testsuite/$package.test
+cat > testsuite/$package.test/$package.exp << 'END'
set test "a_dejagnu_test"
spawn $PACKAGE
expect {
diff -urN -x '*~' automake-1.16.3-original/t/dejagnu6.sh 
automake-1.16.3/t/dejagnu6.sh
--- automake-1.16.3-original/t/dejagnu6.sh  2020-11-18 19:21:03.0 
-0600
+++ automake-1.16.3/t/dejagnu6.sh   2021-06-29 01:28:07.151396859 -0500
@@ -35,8 +35,9 @@
AM_RUNTESTFLAGS = FAILDEJA=$(srcdir)/faildeja
END

-mkdir faildeja.test
-cat > faildeja.test/faildeja.exp << 'END'
+mkdir testsuite
+mkdir testsuite/faildeja.test
+cat > testsuite/faildeja.test/faildeja.exp << 'END'
set test failing_deja_test
spawn $FAILDEJA
expect {
diff -urN -x '*~' automake-1.16.3-original/t/dejagnu7.sh 
automake-1.16.3/t/dejagnu7.sh
--- automake-1.16.3-original/t/dejagnu7.sh  2020-11-18 19:21:03.0 
-0600
+++ automake-1.16.3/t/dejagnu7.sh   2021-06-29 01:29:38.877097021 -0500
@@ -39,8 +39,9 @@
AM_RUNTESTFLAGS = --status FAILTCL=$(srcdir)/failtcl
END

-mkdir failtcl.test
-cat > failtcl.test/failtcl.exp << 'END'
+mkdir testsuite
+mkdir testsuite/failtcl.test
+cat > testsuite/failtcl.test/failtcl.exp << 'END'
set test test
spawn $FAILTCL
expect {
diff -urN -x '*~' automake-1.16.3-original/t/dejagnu-absolute-builddir.sh 
automake-1.16.3/t/dejagnu-absolute-builddir.sh
--- automake-1.16.3-original/t/dejagnu-absolute-builddir.sh 2020-11-18 
19:21:03.0 -0600
+++ automake-1.16.3/t/dejagnu-absolute-builddir.sh  2021-06-29 
01:36:15.6

Re: Automake testsuite misuses DejaGnu

2021-07-12 Thread Jacob Bachmeyer

Daniel Herring wrote:
It seems fragile for DejaGnu to probe for a testsuite directory and 
change its behavior as you describe.  For example, I could have a 
project without the testsuite dir, invoke the tester, and have it find 
and run some unrelated files in the parent directory.  Unexpected 
behavior (chaos) may ensue.


This already happens and this is the behavior that is deprecated and 
even more fragile.  Without a testsuite/ directory, DejaGnu will end up 
searching the tree for *.exp files and running them all.  Eventually, if 
$srcdir neither is nor contains "testsuite", DejaGnu will throw an error 
and abort.  The testsuite/ directory is a long-documented requirement.


Is there an explicit command-line argument that could be added to the 
Automake invocation?


Not easily; the probing is done specifically to allow for two different 
ways of using DejaGnu:  recursive Makefiles, which invoke DejaGnu with 
the testsuite/ directory as the current directory, and non-recursive 
Makefiles, which with Automake invoke DejaGnu from the top-level 
directory, presumably containing the "testsuite" directory.  Both of 
these cases must be supported:  the toolchain packages use the former, 
and Automake's basic DejaGnu support will use the latter if a 
non-recursive layout is desired.


Both of these use the same command line argument --srcdir and site.exp 
variable srcdir; the difference is that srcdir has acquired two 
different meanings.



-- Jacob



Re: Automake testsuite misuses DejaGnu

2021-07-12 Thread Jacob Bachmeyer

Karl Berry wrote:
DejaGnu has always required a DejaGnu testsuite to be rooted at a 
"testsuite" directory


If something was stated in the documentation, but not enforced by the
code, hardly surprising that "non-conformance" is widespread.
  


It is not widespread -- all of the toolchain packages correctly place 
their testsuites in testsuite/ directories.  As far as I know, the 
Automake tests are the only outlier.



Anyway, it seems like an unfriendly requirement for users. And even more
to incompatibly enforce something now that has not been enforced for
previous decades. Why? (Just wondering.) -k


Previous versions of DejaGnu did not properly handle non-recursive make 
with Automake-produced makefiles.  Beginning with 1.6.3, the testsuite 
is allowed to be in ${srcdir}/testsuite instead of ${srcdir} exactly.  
Enforcing the long-documented (and mostly followed) requirement that 
there be a directory named "testsuite" containing the testsuite allows 
DejaGnu to resolve the ambiguity and determine if it has been invoked at 
package top-level or in the testsuite/ directory directly.
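
As a rough illustration (the paths and tool name here are made up), the 
difference comes down to which of these invocations DejaGnu will accept:

# older DejaGnu: --srcdir must be the testsuite root itself
runtest --tool mytool --srcdir /path/to/pkg/testsuite

# DejaGnu 1.6.3: the directory containing testsuite/ is also accepted,
# which is what Automake passes in a non-recursive layout
runtest --tool mytool --srcdir /path/to/pkg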


Even in 1.6.3, the intent was to continue to allow the broken cases to 
work with a warning, but I made the conditional for that case too narrow 
(oops!), and some of the Automake test cases fail as a result.  Fixing 
the tests now is appropriate because, given the way the Automake tests 
are run, no one would ever see the future deprecation warnings anyway.



-- Jacob



Re: Automake testsuite misuses DejaGnu

2021-07-12 Thread Jacob Bachmeyer

Jim Meyering wrote:

On Sun, Jul 11, 2021 at 9:03 PM Jacob Bachmeyer  wrote:
  

[...]

The affected tests are:  check12, dejagnu3, dejagnu4, dejagnu5,
dejagnu6, dejagnu7, dejagnu-absolute-builddir, dejagnu-relative-srcdir,
dejgnu-siteexp-extend, dejagnu-siteexp-useredit.

[...]



Thank you for the analysis and heads-up.
I see that Fedora 34 currently has only dejagnu-1.6.1.
If this is something you can help with now, I can certainly wait a few days.

Even a sample fix for one of the currently-failing tests would be helpful.
  


That is part of the problem:  I have a patch, but applying it will cause 
the tests to fail with DejaGnu 1.6.1.  Older versions of DejaGnu require 
$srcdir to be exactly the root of the testsuite, while 1.6.3 accepts a 
testsuite in $srcdir or ${srcdir}/testsuite; the latter is needed to 
allow Automake to invoke DejaGnu from the top level of the tree.


I expect to have time to try a recursive make solution later tonight or 
tomorrow.  Do I understand correctly that I will need to add "SUBDIRS = 
testsuite" to the top-level TEST_CASE/Makefile.am in the test case and 
move the "AUTOMAKE_OPTIONS = dejagnu" and "DEJATOOL" definitions to 
TEST_CASE/testsuite/Makefile.am to get Automake to invoke DejaGnu in the 
testsuite subdirectory instead of top-level?
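
In other words, something along these lines for each affected test 
case, sketched here with a generic tool name standing in for the 
per-test package name (and whether $(srcdir) must become $(top_srcdir) 
in AM_RUNTESTFLAGS is exactly the sort of detail I am unsure about):

## TEST_CASE/Makefile.am (top level)
SUBDIRS = testsuite
EXTRA_DIST = mytool

## TEST_CASE/testsuite/Makefile.am
AUTOMAKE_OPTIONS = dejagnu
DEJATOOL = mytool
EXTRA_DIST = mytool.test/mytool.exp
AM_RUNTESTFLAGS = PACKAGE=$(top_srcdir)/mytool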



-- Jacob



Automake testsuite misuses DejaGnu

2021-07-11 Thread Jacob Bachmeyer
I was planning to find a solution with a complete patch before 
mentioning this, but since a release is imminent I will just state the 
problem:  several tests in the Automake testsuite misuse DejaGnu and 
fail with the 1.6.3 DejaGnu release as a result.


DejaGnu has always required a DejaGnu testsuite to be rooted at a 
"testsuite" directory and this has long been documented in the manual.  
However, prior to 1.6.3, DejaGnu did not actually depend on this 
requirement being met.  Changes made during development to properly 
support non-recursive Automake makefiles now rely on that rule to 
resolve the ambiguity between recursive and non-recursive usage.  
Several tests in the Automake testsuite do not
meet this requirement and fail if run with DejaGnu 1.6.3.


The simple change of updating the tests to use a testsuite/ directory 
causes the tests to fail with older versions of DejaGnu, due to lack of 
support for non-recursive "make check" in those versions.  I have not 
yet tried a patch that also switches the tests to use recursive make, 
but I believe that is probably the only way for the tests to pass with 
old and new DejaGnu.


Note that, according to the original author, Rob Savoye, DejaGnu has 
always been intended to require that testsuites be rooted at a 
"testsuite" directory and the behavior that Automake's test cases rely 
on was never supported.


The affected tests are:  check12, dejagnu3, dejagnu4, dejagnu5, 
dejagnu6, dejagnu7, dejagnu-absolute-builddir, dejagnu-relative-srcdir, 
dejgnu-siteexp-extend, dejagnu-siteexp-useredit.


Note that these tests do not all fail with the 1.6.3 release, but all 
of them will fail with some future release, once the undocumented 
support for a testsuite not rooted at "testsuite" is eventually removed.



-- Jacob



Re: parallel build issues

2021-06-23 Thread Jacob Bachmeyer

Bob Friesenhahn wrote:
It is possible to insert additional dependency lines in Makefile.am so 
software is always built in the desired order, but this approach might 
only work if you always build using the top level Makefile.


This should actually work here:  the problem is that a target in doc/ 
also depends on a target in frontend/ and uses recursive make to build 
that target.  When the top-level Makefile is run in parallel mode, 
sub-makes run concurrently in both doc/ and frontend/, but the doc/ 
sub-make then invokes another make in frontend/, leading to a race and 
a failed build.


If only doc/Makefile is used, it will spawn a sub-make in frontend/ that 
will be the only make running there and will succeed.  If only 
frontend/Makefile is used, everything works similarly.  Since the 
problem can only occur when building with the top-level Makefile, adding 
a dependency in the top-level Makefile should prevent it.
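
A minimal sketch of that idea (the project's real Makefiles and target 
names are not shown in this thread, so these are placeholders for a 
hand-written top-level Makefile):

all: frontend-all doc-all

# doc/ recursively rebuilds something in frontend/, so make doc/ wait;
# under "make -j" this keeps the doc/ sub-make from entering frontend/
# while the sub-make started for frontend/ is still running there.
doc-all: frontend-all

frontend-all:
	cd frontend && $(MAKE) all

doc-all:
	cd doc && $(MAKE) all

.PHONY: all frontend-all doc-all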



-- Jacob