Re: RFC: Add public macros AC_LOG_CMD and AC_LOG_FILE.

2024-06-26 Thread Zack Weinberg
On Sun, Jun 23, 2024, at 10:23 PM, Zack Weinberg wrote:
> I'm thinking of making AC_RUN_LOG, which has existed forever but is
> undocumented, an official documented macro under the new name
> AC_LOG_CMD.  I'm renaming it because I also want to add AC_LOG_FILE, a
> generalization of _AC_MSG_LOG_CONFTEST.

FYI, this seemed straightforward to implement but has exposed a whole
bunch of latent problems, so it'll be a week or two more before I have
patches ready to go, but there's no insurmountable obstacles.  You can
all follow along at
.
(This branch will definitely be rebased at least once before things are
ready to merge.)

zw



Re: RFC: Add public macros AC_LOG_CMD and AC_LOG_FILE.

2024-06-26 Thread Karl Berry
Subject: RFC: Add public macros AC_LOG_CMD and AC_LOG_FILE.

FWIW, it sounds good to me. To my mind, logging is one of the most
important features of autoconf, so I'm all for macros to support it
further. --thanks, karl.



Re: RFC: Add public macros AC_LOG_CMD and AC_LOG_FILE.

2024-06-24 Thread Nick Bowler
On 2024-06-24 10:04, Zack Weinberg wrote:
> On Mon, Jun 24, 2024, at 2:56 AM, Nick Bowler wrote:
>> I think at the same time it would be worth documenting the AS_LINENO
>> functionality, which is the main internal functionality of these
>> macros that (unless you just go ahead and use it) Autoconf users
>> can't really replicate in their own logging.
> 
> I believe what you mean is you want _AS_ECHO_LOG to be promoted to a
> documented and supported macro, and for AS_LINENO_PUSH and AS_LINENO_POP
> also to be documented for external use.  Is this correct?  Did I miss
> any other internal macros that ought to be available for external
> use?

> I don't think we should tell people to use $as_lineno directly; is
> there some use case for it that isn't covered by existing macros?

On reflection, I think I may have had a mistaken understanding of the
purpose of as_lineno when I last looked at these macros.  I assumed it
was related to supporting shells without LINENO support, but it seems
that is not the case, so maybe nothing is actually needed.

Perhaps a link from the (very short) description of AS_LINENO_PREPARE[1]
to the description of LINENO[2] might have helped.

That being said, something like AS_ECHO_LOG (with or without
AS_LINENO_PUSH/POP) looks generally useful, although I don't have an
immediate use case offhand besides implementing an AC_RUN_LOG workalike.

[1] https://www.gnu.org/savannah-checkouts/gnu/autoconf/manual/autoconf-2.72/autoconf.html#index-AS_005fLINENO_005fPREPARE
[2] https://www.gnu.org/savannah-checkouts/gnu/autoconf/manual/autoconf-2.72/autoconf.html#index-LINENO-1

[...]
> Will do.  The main point of the macro is that it does something a little
> fancier than "cat file", so it's unambiguous where normal log output
> resumes. Like the existing _AC_MSG_LOG_CONFTEST does:
> 
> configure: failed program was:
> | /* confdefs.h */
> | #define PACKAGE_NAME "lexlib-probe"
> | #define PACKAGE_TARNAME "lexlib-probe"
> | #define PACKAGE_VERSION "1"
> | ... etc ...
> configure: result: no
> 
> The "label" will go where _AC_MSG_LOG_CONFTEST prints "failed program was".

Looks great, this example output definitely helps to understand when one
might want to use this macro.

Cheers,
  Nick



Re: RFC: Add public macros AC_LOG_CMD and AC_LOG_FILE.

2024-06-24 Thread Zack Weinberg
On Mon, Jun 24, 2024, at 2:56 AM, Nick Bowler wrote:
> On 2024-06-23 22:23, Zack Weinberg wrote:
>> I'm thinking of making AC_RUN_LOG, which has existed forever but is
>> undocumented, an official documented macro ...
>
> Yes, please!
>
> I will note that Autoconf has a lot of "run and log a command" internal
> macros with various comments of the form "doesn't work well" suggesting
> that this is a hard feature to get right.

... Wow, this is a bigger mess than I thought last night.  Up to bad
quotation in third party macros, however, I *think* almost all of it
is obsolete and can be scrapped.  Stay tuned.

> I think at the same time it would be worth documenting the AS_LINENO
> functionality, which is the main internal functionality of these
> macros that (unless you just go ahead and use it) Autoconf users
> can't really replicate in their own logging.

I believe what you mean is you want _AS_ECHO_LOG to be promoted to a
documented and supported macro, and for AS_LINENO_PUSH and AS_LINENO_POP
also to be documented for external use.  Is this correct?  Did I miss
any other internal macros that ought to be available for external
use?  I don't think we should tell people to use $as_lineno directly; is
there some use case for it that isn't covered by existing macros?

> If you implement this, please explain in the manual what "labeled with
> /label/" really means, otherwise I'm left wondering why this macro
> exists when we can almost as easily write something like:
>
>   { echo label; cat file; } >&AS_MESSAGE_LOG_FD
>
> Including example logfile output together with the example program
> might be sufficient.

Will do.  The main point of the macro is that it does something a little
fancier than "cat file", so it's unambiguous where normal log output
resumes. Like the existing _AC_MSG_LOG_CONFTEST does:

configure: failed program was:
| /* confdefs.h */
| #define PACKAGE_NAME "lexlib-probe"
| #define PACKAGE_TARNAME "lexlib-probe"
| #define PACKAGE_VERSION "1"
| ... etc ...
configure: result: no

The "label" will go where _AC_MSG_LOG_CONFTEST prints "failed program was".

zw



Re: RFC: Add public macros AC_LOG_CMD and AC_LOG_FILE.

2024-06-23 Thread Nick Bowler
On 2024-06-23 22:23, Zack Weinberg wrote:
> I'm thinking of making AC_RUN_LOG, which has existed forever but is
> undocumented, an official documented macro ...

Yes, please!

I will note that Autoconf has a lot of "run and log a command" internal
macros with various comments of the form "doesn't work well" suggesting
that this is a hard feature to get right.

I think at the same time it would be worth documenting the AS_LINENO
functionality, which is the main internal functionality of these
macros that (unless you just go ahead and use it) Autoconf users
can't really replicate in their own logging.

> +@anchor{AC_LOG_FILE}
> +@defmac AC_LOG_FILE (@var{file}, @var{label})
> +Record the contents of @var{file} in @file{config.log}, labeled with
> +@var{label}.
> +@end defmac

If you implement this, please explain in the manual what "labeled with
/label/" really means, otherwise I'm left wondering why this macro
exists when we can almost as easily write something like:

  { echo label; cat file; } >&AS_MESSAGE_LOG_FD

Including example logfile output together with the example program
might be sufficient.

Cheers,
  Nick



RFC: Add public macros AC_LOG_CMD and AC_LOG_FILE.

2024-06-23 Thread Zack Weinberg
I'm thinking of making AC_RUN_LOG, which has existed forever but is
undocumented, an official documented macro under the new name
AC_LOG_CMD.  I'm renaming it because I also want to add AC_LOG_FILE, a
generalization of _AC_MSG_LOG_CONFTEST.

These are handy any time you want to record details of why some test got
the result it did in config.log; AC_COMPILE_IFELSE and friends have been
able to do this forever, but hand-written tests of the shell environment
or of interpreters can't without reaching into autoconf's guts.
Automake has been carrying around its own copy of AC_LOG_CMD (under the
name AM_LOG_CMD) forever as well...

I haven't *implemented* this yet, but here's the proposed documentation.
What do you think?

zw

+@node Logging
+@section Logging Details of Tests
+
+It is helpful to record details of @emph{why} each test got the result
+that it did, in @file{config.log}.  The macros that compile test
+programs (@code{AC_COMPILE_IFELSE} etc.; @pxref{Writing Tests}) do this
+automatically, but if you write a test that only involves M4sh and basic
+shell commands, you will need to do it yourself, using the following macros.
+
+@anchor{AC_LOG_CMD}
+@defmac AC_LOG_CMD (@var{shell-command})
+Execute @var{shell-command}.
+Record @var{shell-command} in @file{config.log}, along with any error
+messages it printed (specifically, everything it wrote to its standard
+error) and its exit code.
+
+This macro may be used within a command substitution, or as the test
+argument of @code{AS_IF} or a regular shell @code{if} statement.
+@end defmac
+
+@anchor{AC_LOG_FILE}
+@defmac AC_LOG_FILE (@var{file}, @var{label})
+Record the contents of @var{file} in @file{config.log}, labeled with
+@var{label}.
+@end defmac
+
+Here is an example of how to use these macros to test for a feature of
+the system `awk'.
+
+@smallexample
+AC_PROG_AWK
+AC_MSG_CHECKING([whether $AWK supports 'asort'])
+cat > conftest.awk <<\EOF
+[@{ lines[NR] = $0 @}
+END @{
+  ORS=" "
+  asort(lines)
+  for (i in lines) @{
+    print lines[i]
+  @}
+@}]
+EOF
+
+AS_IF([result=`AC_LOG_CMD([printf 'z\ny\n' | $AWK -f conftest.awk])` &&
+   test x"$result" = x"y z"],
+  [AWK_HAS_ASORT=yes],
+  [AWK_HAS_ASORT=no
+   AC_LOG_FILE([conftest.awk], [test awk script])])
+AC_MSG_RESULT([$AWK_HAS_ASORT])
+AC_SUBST([AWK_HAS_ASORT])
+rm -f conftest.awk
+@end smallexample
 
 @c == Programming in M4.



Re: GNU Coding Standards, automake, and the recent xz-utils backdoor

2024-04-08 Thread Richard Stallman
[[[ To any NSA and FBI agents reading my email: please consider]]]
[[[ whether defending the US Constitution against all enemies, ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

  > I'm currently looking at adding support for this to
  > https://github.com/hlein/distro-backdoor-scanner.

Are you one of the important developers of that?  I hope so.

There is a grave problem with github.com -- almost all access (even
read-only) requires running nonfree JS code.  We shouldn't link to it
from GNU packages or gnu.org.  (It is possible to do git clone,
if the user knows how to figure out the URL to use, but a user
can't get that URL from the github.com home page.)

So we are starting to remove links to github.com.

If you have influence in that project, could you ask them to create
a mirror on a repo that doesn't have that problem?  One we could
post links to?

Even better, to move it to a more freedom-respecting repository.

-- 
Dr Richard Stallman (https://stallman.org)
Chief GNUisance of the GNU Project (https://gnu.org)
Founder, Free Software Foundation (https://fsf.org)
Internet Hall-of-Famer (https://internethalloffame.org)





Re: GCC reporting piped input as a security feature (was: GNU Coding Standards, automake, and the recent xz-utils backdoor)

2024-04-08 Thread Richard Stallman
[[[ To any NSA and FBI agents reading my email: please consider]]]
[[[ whether defending the US Constitution against all enemies, ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

  > While it does not /prevent/ cracks, there is something we can ensure 
  > that we *keep* doing:  GCC, when reading from a pipe, records the input 
  > file as "<stdin>" in debug info *even* if a "#" directive to set the
  > filename has been included.  This was noticed by Adrien Nader (who 
  > posted it to oss-security; 
  > <https://www.openwall.com/lists/oss-security/2024/04/03/2> and
  > <https://marc.info/?l=oss-security&m=171214932201156&w=2>; those are
  > the same post at different public archives) and should provide a 
  > "smoking gun" test to detect this type of backdoor dropping technique in 
  > the future.  This GCC behavior should be documented as a security 
  > feature, because most program sources are not read from pipes.

Are you suggesting fixing GCC to put the specified file into those
line numbers, or are you suggesting we keep this behavior
to help with analysis?

In principle it could be possible to output something different to
describe this strange situation explicitly.  For instance, output "via
stdin" as a comment, or output `stdin/../filename' as the file name.
(Programs that optimize the file name by deleting XXX/../ are likely
not to check whether XXX is a real directory.)

Are the GCC developers discussing these questions?  If not, please
send them a bug report about this so they start doing so.


-- 
Dr Richard Stallman (https://stallman.org)
Chief GNUisance of the GNU Project (https://gnu.org)
Founder, Free Software Foundation (https://fsf.org)
Internet Hall-of-Famer (https://internethalloffame.org)





Re: GCC reporting piped input as a security feature (was: GNU Coding Standards, automake, and the recent xz-utils backdoor)

2024-04-08 Thread Richard Stallman
[[[ To any NSA and FBI agents reading my email: please consider]]]
[[[ whether defending the US Constitution against all enemies, ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

  > To avoid false positives if this test is used, we might want to add a 
  > rule to the GNU Coding Standards (probably in the "Makefile Conventions" 
  > section) that code generated with other utilities MUST always be 
  > materialized in the filesystem and MUST NOT be piped into the compiler.

That sounds like a good idea.  I expect this will not interfere with
anything useful.  What do others think of this question?

-- 
Dr Richard Stallman (https://stallman.org)
Chief GNUisance of the GNU Project (https://gnu.org)
Founder, Free Software Foundation (https://fsf.org)
Internet Hall-of-Famer (https://internethalloffame.org)





Re: detecting modified m4 files (was: GNU Coding Standards, automake, and the recent xz-utils backdoor)

2024-04-07 Thread Jacob Bachmeyer

Bruno Haible wrote:

Richard Stallman commented on Jacob Bachmeyer's idea:
  
  > > Another related check that /would/ have caught this attempt would be 
  > > comparing the aclocal m4 files in a release against their (meta)upstream 
  > > sources before building a package.  This is something distribution 
  > > maintainers could do without cooperation from upstream.  If 
  > > m4/build-to-host.m4 had been recognized as coming from gnulib and 
  > > compared to the copy in gnulib, the nonempty diff would have been 
  > > suspicious.


I have a hunch that some effort is needed to do that comparison, but
that it should be feasible to write a script to do it, which could make
it easy.  Is that so?



Yes, the technical side of such a comparison is relatively easy to
implement:
  - There are less than about 2000 or 5000 *.m4 files that are shared
between projects. Downloading and storing all historical versions
of these files will take ca. 0.1 to 1 GB.
  - They would be stored in a content-based index, i.e. indexed by
sha256 hash code.
  - A distribution could then quickly test whether a *.m4 file found
in a distrib tarball is "known".

The recurrently time-consuming part is, whenever an "unknown" *.m4 file
appears, to
  - manually review it,
  - update the list of upstream git repositories (e.g. when a project
has been forked) or the list of releases to consider (e.g. snapshots
of GNU Autoconf or GNU libtool, or distribution-specific modifications).

I agree with Jacob that a distro can put this in place, without needing
to bother upstream developers.


I have since thought of a simple solution that /would/ have caught this 
backdoor campaign in its tracks:  an "autopoint --check" command that 
simply compares the m4/ files (and possibly others?) already in the 
package tree against the files that autopoint would copy in if m4/ were 
empty, and reports any differences.  A newer serial in the package tree than 
the system m4 library produces a minor complaint; a file with the same 
serial and different contents produces a major complaint.  An older 
serial in the package tree should be reported, but is likely to be of no 
consequence if a distribution's packaging routine will copy in the 
known-good newer version before rebuilding configure.  Any m4/ files 
local to the package are simply reported, but those are also in the 
package's Git repository.
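
A rough approximation of the check is already possible with existing
tools, by letting autopoint overwrite whatever it manages in a scratch
copy of the tree and diffing the result (the package name here is only
illustrative, and this does not do the serial-number comparison
described above):

  cp -a xz-5.6.0 xz-5.6.0.check
  (cd xz-5.6.0.check && autopoint --force)   # overwrite the gettext-managed m4 files
  diff -ru xz-5.6.0/m4 xz-5.6.0.check/m4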


Distribution package maintainers would run "autopoint --check" and pass 
any suspicious files to upstream maintainers for evaluation.  (The 
distribution's own packaging system can trace an m4 file in the system 
library back to its upstream package.)  The modified build-to-host.m4 
would have been very /unlikely/ to slip past the 
gnulib/gettext/Automake/Autoconf maintainers, although few distribution 
packagers would have had suspicions.  The gnulib maintainers would know 
that gl_BUILD_TO_HOST should not be checking /anything/ itself and the 
crackers would have been caught.


This should be effective in closing off a large swath of possible 
attacks:  a backdoor concealed in binary test data (or documentation) 
requires some visible means to unpack it, which means the unpacker must 
appear in source somewhere.  While the average package maintainer might 
not be able to make sense of a novel m4 file, the maintainers of GNU's 
version of that file /will/ be able to recognize such chicanery, and the 
"red herrings" the cracker added for obfuscation would become a 
liability.  Without them, the effect of the new code is more obvious, so 
the crackers lose either way.



-- Jacob




Re: GCC reporting piped input as a security feature (was: GNU Coding Standards, automake, and the recent xz-utils backdoor)

2024-04-05 Thread Jacob Bachmeyer

Richard Stallman wrote:

[[[ To any NSA and FBI agents reading my email: please consider]]]
[[[ whether defending the US Constitution against all enemies, ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

[...]

> When considering any such change, we still should consider the question:
> will this actually prevent cracks, or will it rather give crackers
> an additional way to check that their activities can't be detected.


While it does not /prevent/ cracks, there is something we can ensure 
that we *keep* doing:  GCC, when reading from a pipe, records the input 
file as "<stdin>" in debug info *even* if a "#" directive to set the 
filename has been included.  This was noticed by Adrien Nader (who 
posted it to oss-security; 
<https://www.openwall.com/lists/oss-security/2024/04/03/2> and 
<https://marc.info/?l=oss-security&m=171214932201156&w=2>; those are 
the same post at different public archives) and should provide a 
"smoking gun" test to detect this type of backdoor dropping technique in 
the future.  This GCC behavior should be documented as a security 
feature, because most program sources are not read from pipes.
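
As a rough illustration of the kind of check this enables (assuming the
object was built with debug info and that binutils readelf is available;
the library name is only an example):

  # flag compilation units whose recorded source file is a pipe
  readelf --debug-dump=info liblzma.so.5 | grep -n '<stdin>'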


The xz backdoor dropper took great pains to minimize its use of the 
filesystem; only the binary blob ever touches the disk, and that 
presumably because there is no other way to feed it into the linker.  If 
debug info is regularly checked for symbols obtained from "<stdin>" and 
the presence of such symbols reliably indicates funny business, then we 
force crackers to risk leaving more direct traces in the filesystem, 
instead of being able to patch the code "in memory" and feed an 
ephemeral stream to the compiler.  The "Jia Tan" crackers seem to have 
put a lot of work into minimizing the "footprint" of their dropper, so 
we can assume that this must have been important to them.


To avoid false positives if this test is used, we might want to add a 
rule to the GNU Coding Standards (probably in the "Makefile Conventions" 
section) that code generated with other utilities MUST always be 
materialized in the filesystem and MUST NOT be piped into the compiler.



-- Jacob



Re: GNU Coding Standards, automake, and the recent xz-utils backdoor

2024-04-05 Thread Richard Stallman
[[[ To any NSA and FBI agents reading my email: please consider]]]
[[[ whether defending the US Constitution against all enemies, ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

  > > Can anyone think of a feasible way to prevent this sort of attack?
  > A common way would be to use PGP signing to bless a set of files. 
  > Perhaps a manifest which specifies the file names/paths and their sha256 
  > would be sufficient.  But there needs to be a way to augment this in 
  > case there are multiple collections of blessed files, including those 
  > blessed by the user.

Could you make that last part more precise and clear?

  > > What is an "OS package manager"?

  > A popular OS package manager is Debian 'apt'

Thanks, now I know what you meant.

-- 
Dr Richard Stallman (https://stallman.org)
Chief GNUisance of the GNU Project (https://gnu.org)
Founder, Free Software Foundation (https://fsf.org)
Internet Hall-of-Famer (https://internethalloffame.org)





Re: GNU Coding Standards, automake, and the recent xz-utils backdoor

2024-04-05 Thread Sam James
Bruno Haible  writes:

> Richard Stallman commented on Jacob Bachmeyer's idea:
>>   > > Another related check that /would/ have caught this attempt would be 
>>   > > comparing the aclocal m4 files in a release against their 
>> (meta)upstream 
>>   > > sources before building a package.  This is something distribution 
>>   > > maintainers could do without cooperation from upstream.  If 
>>   > > m4/build-to-host.m4 had been recognized as coming from gnulib and 
>>   > > compared to the copy in gnulib, the nonempty diff would have been 
>>   > > suspicious.
>> 
>> I have a hunch that some effort is needed to do that comparison, but
>> that it should be feasible to write a script to do it, which could make
>> it easy.  Is that so?
>
> Yes, the technical side of such a comparison is relatively easy to
> implement:
>   - There are less than about 2000 or 5000 *.m4 files that are shared
> between projects. Downloading and storing all historical versions
> of these files will take ca. 0.1 to 1 GB.
>   - They would be stored in a content-based index, i.e. indexed by
> sha256 hash code.
>   - A distribution could then quickly test whether a *.m4 file found
> in a distrib tarball is "known".
>
> The recurrently time-consuming part is, whenever an "unknown" *.m4 file
> appears, to
>   - manually review it,
>   - update the list of upstream git repositories (e.g. when a project
> has been forked) or the list of releases to consider (e.g. snapshots
> of GNU Autoconf or GNU libtool, or distribution-specific modifications).
>
> I agree with Jacob that a distro can put this in place, without needing
> to bother upstream developers.

I'm currently looking at adding support for this to
https://github.com/hlein/distro-backdoor-scanner. It was brought up at
https://openwall.com/lists/oss-security/2024/04/02/5.

>
> Bruno

best,
sam



Re: GNU Coding Standards, automake, and the recent xz-utils backdoor

2024-04-04 Thread Bruno Haible
Richard Stallman commented on Jacob Bachmeyer's idea:
>   > > Another related check that /would/ have caught this attempt would be 
>   > > comparing the aclocal m4 files in a release against their 
> (meta)upstream 
>   > > sources before building a package.  This is something distribution 
>   > > maintainers could do without cooperation from upstream.  If 
>   > > m4/build-to-host.m4 had been recognized as coming from gnulib and 
>   > > compared to the copy in gnulib, the nonempty diff would have been 
>   > > suspicious.
> 
> I have a hunch that some effort is needed to do that comparison, but
> that it should be feasible to write a script to do it, which could make
> it easy.  Is that so?

Yes, the technical side of such a comparison is relatively easy to
implement:
  - There are less than about 2000 or 5000 *.m4 files that are shared
between projects. Downloading and storing all historical versions
of these files will take ca. 0.1 to 1 GB.
  - They would be stored in a content-based index, i.e. indexed by
sha256 hash code.
  - A distribution could then quickly test whether a *.m4 file found
in a distrib tarball is "known".
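
A minimal sketch of the tarball-side lookup, assuming such an index has
already been built as a flat file of known sha256 hashes, one per line
(the file name here is only illustrative):

  find . -name '*.m4' -print | while read -r f; do
    h=`sha256sum "$f" | awk '{print $1}'`
    grep -qx "$h" known-m4-hashes.txt || echo "unknown m4 file: $f"
  done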

The recurrently time-consuming part is, whenever an "unknown" *.m4 file
appears, to
  - manually review it,
  - update the list of upstream git repositories (e.g. when a project
has been forked) or the list of releases to consider (e.g. snapshots
of GNU Autoconf or GNU libtool, or distribution-specific modifications).

I agree with Jacob that a distro can put this in place, without needing
to bother upstream developers.

Bruno






Re: GNU Coding Standards, automake, and the recent xz-utils backdoor

2024-04-04 Thread Richard Stallman
[[[ To any NSA and FBI agents reading my email: please consider]]]
[[[ whether defending the US Constitution against all enemies, ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

  > > Now for a bit of speculation.  I speculate that a cracker was careless
  > > and failed to adjust certain details of a bogus tar ball to be fully
  > > consistent, and that `make distcheck' enabled somene to notice those
  > > errors.
  > >
  > > I don't have any real info about whether that is so.  If my
  > > speculation is mistaken, please say so.

  > I believe it is completely mistaken.  As I understand, the crocked 
  > tarballs would have passed `make distcheck` with flying colors.  The 
  > rest of your questions about it therefore have no answer.

Thanks for correcting me on that point.  However, people have proposed
changes in make distcheck and may propose changes in our coding
standards.

When considering any such change, we still should consider the question:
will this actually prevent cracks, or will it rather give crackers
an additional way to check that their activities can't be detected.

-- 
Dr Richard Stallman (https://stallman.org)
Chief GNUisance of the GNU Project (https://gnu.org)
Founder, Free Software Foundation (https://fsf.org)
Internet Hall-of-Famer (https://internethalloffame.org)





Re: GNU Coding Standards, automake, and the recent xz-utils backdoor

2024-04-04 Thread Richard Stallman
[[[ To any NSA and FBI agents reading my email: please consider]]]
[[[ whether defending the US Constitution against all enemies, ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

  > I would like to clarify that my purpose in starting this thread wasn't
  > so much to ask, "How could the xz backdoor specifically have been
  > prevented?" (which seems pretty clearly impossible) but rather, "How
  > can we use this incident as inspiration for general-purpose
  > improvements to the GNU Coding Standards and related tools?" In other
  > words, even if a proposal wouldn't have stopped this particular
  > attack, I don't think that's a reason not to try it.

I agree -- you are posing the important question.

However, people have proposed ideas here that (it seems)
could have made the XZ crack harder to do, or increased
the likelihood of spotting it.  For instance, checking m4
files against standard sources, and maybe some others.

So let's not discard completely the idea of preventing
the XZ crack.


-- 
Dr Richard Stallman (https://stallman.org)
Chief GNUisance of the GNU Project (https://gnu.org)
Founder, Free Software Foundation (https://fsf.org)
Internet Hall-of-Famer (https://internethalloffame.org)





Re: GNU Coding Standards, automake, and the recent xz-utils backdoor

2024-04-04 Thread Richard Stallman
[[[ To any NSA and FBI agents reading my email: please consider]]]
[[[ whether defending the US Constitution against all enemies, ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

  > > Another related check that /would/ have caught this attempt would be 
  > > comparing the aclocal m4 files in a release against their (meta)upstream 
  > > sources before building a package.  This is something distribution 
  > > maintainers could do without cooperation from upstream.  If 
  > > m4/build-to-host.m4 had been recognized as coming from gnulib and 
  > > compared to the copy in gnulib, the nonempty diff would have been 
  > > suspicious.

I have a hunch that some effort is needed to do that comparison, but
that it should be feasible to write a script to do it, which could make
it easy.  Is that so?

-- 
Dr Richard Stallman (https://stallman.org)
Chief GNUisance of the GNU Project (https://gnu.org)
Founder, Free Software Foundation (https://fsf.org)
Internet Hall-of-Famer (https://internethalloffame.org)





Re: GNU Coding Standards, automake, and the recent xz-utils backdoor

2024-04-02 Thread Alfred M. Szmidt


   [[[ To any NSA and FBI agents reading my email: please consider]]]
   [[[ whether defending the US Constitution against all enemies, ]]]
   [[[ foreign or domestic, requires you to follow Snowden's example. ]]]

 > My first thought was that Autoconf is a relatively trivial attack vector 
 > since it is so complex and the syntax used for some parts (e.g. m4 and 
 > shell scripts) is so arcane.  In particular, it is common for Autotools 
 > stuff to be installed on a computer (e.g. by installing a package from 
 > an OS package manager) and then used while building.  For example, there
 > are large collections of ".m4" files installed.  If one of the m4 files 
 > consumed has been modified, then the resulting configure script has been 
 > modified.

   Can anyone think of a feasible way to prevent this sort of attack?

One cannot prevent it, only make it a bit harder -- possibly with the
drawback of making it harder to find such attacks in the future,
but that is hypothetical.

   Someone suggested that configure should not use m4 files that are
   lying around, but rather should fetch them from standard release points,
   WDYT of that idea?

It would be trivial to modify things after they have been fetched, make
the release, and you're back at square one.  One would also need to
keep a list of which m4 files to fetch; people write them for their
own packages as well.

Requiring network access to build or develop a GNU package is also
just a non-starter ... and if it is not required, well, you can just
not use it and you're again back at square one.


The idea of signing official Autoconf M4 files is problematic: why
would you trust such a signature?  The attack on xz was performed by
the maintainer; given any rogue maintainer you're shit out of luck.




Re: compressed release distribution formats (was: GNU Coding Standards, automake, and the recent xz-utils backdoor)

2024-04-02 Thread Jacob Bachmeyer

Eric Blake wrote:

[adding in coreutils, for some history]

[...]

> At any rate, it is now obvious (in hindsight) that zstd has a much
> larger development team than xz, which may alter the ability of zstd
> being backdoored in the same way that xz was, merely by social
> engineering of a lone maintainer.


That just means that a cracker group needs to plant a mole in a larger 
team, which was effectively the goal of the sockpuppet campaign against 
the xz-utils maintainer, except that the cracker's sockpuppet was the 
second member of a two-member team.  I see no real difference here.


I would argue that GNU software should be consistently available in at 
least one format that can be unpacked using only tools that are also 
provided by the GNU project.  I believe that currently means "gzip", 
unfortunately.  We should probably look to adopt another one; perhaps 
the lzip maintainer might be interested?



> It is also obvious that having GNU distributions available through
> only a SINGLE compression format, when that format may be vulnerable,
The xz format is not vulnerable, or at least has not been shown to be so 
in the sense of security risks, and only xz-utils was backdoored.  Nor 
is there only one implementation:  7-zip can also handle xz files.

> is a dis-service to users when it is not much harder to provide
> tarballs in multiple formats.  Having multiple tarballs as the
> recommendation can at least let us automate that each of the tarballs
> has the same contents,

Agreed.  In fact, if multiple formats can be produced concurrently, we 
could validate that the compressed tarballs are actually identical.  
(Generate using `tar -cf - [...] | tee >(compress1 >[...].tar.comp1) | 
tee >(compress2 >[...].tar.comp2) | gzip -9 >[...].tar.gz` if you do not 
want to actually write the uncompressed tarball to the disk.)  But if 
tarlz is to be used to write the lzipped tarball, you probably want to 
settle for "same file contents", since tarlz only supports pax format 
and we may want to allow older tar programs to unpack GNU releases.
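
For instance (bash process substitution; the package name and
compression levels here are only illustrative):

  tar -cf - mypkg-1.0 \
    | tee >(xz -9e > mypkg-1.0.tar.xz) \
          >(zstd -19 > mypkg-1.0.tar.zst) \
    | gzip -9 -n > mypkg-1.0.tar.gz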

> although it won't make it any more obvious
> whether those contents match what was in git (which was how the xz
> backdoor got past so many people in the first place).
This is another widespread misunderstanding---almost all of the xz 
backdoor was hidden in plain sight (admittedly, compressed and/or 
encrypted) *in* the Git repository.  The only piece of the backdoor not 
found in Git was the modified build-to-host.m4.  The xz-utils project's 
standard practice had been to /not/ commit imported m4 files, but to 
bring them in when preparing release tarballs.  The cracker simply 
rolled the "key" to the dropper into the release tarball.  I still have 
not seen whether the configure script in the release tarball was built 
with the modified build-to-host.m4 or if the crackers were depending on 
distribution packagers to regenerate configure.


Again, everything present in both Git and the release tarball /was/ 
/identical/.  There were no mismatches, only files added to the release 
that are not in the repository, and that are /expected/ to be added to a 
release.



-- Jacob



Re: GNU Coding Standards, automake, and the recent xz-utils backdoor

2024-04-02 Thread Jacob Bachmeyer

Richard Stallman wrote:

[[[ To any NSA and FBI agents reading my email: please consider]]]
[[[ whether defending the US Constitution against all enemies, ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

  > My first thought was that Autoconf is a relatively trivial attack vector 
  > since it is so complex and the syntax used for some parts (e.g. m4 and 
  > shell scripts) is so arcane.  In particular, it is common for Autotools 
  > stuff to be installed on a computer (e.g. by installing a package from 
  > an OS package manager) and then used while building.  For example, there 
  > are large collections of ".m4" files installed.  If one of the m4 files 
  > consumed has been modified, then the resulting configure script has been 
  > modified.


> Can anyone think of a feasible way to prevent this sort of attack?


There have been some possibilities suggested on other branches of the 
discussion.  I have changed the subject of one of those to "checking 
aclocal m4 files" to highlight it.  There is progress being made, but 
the solutions appear to be outside the direct scope of the GNU build 
system packages.



> Someone suggested that configure should not use m4 files that are
> lying around, but rather should fetch them from standard release points,
> WDYT of that idea?


Autoconf configure scripts do not use nearby m4 files and do not require 
m4 at all; aclocal collects the files in question into aclocal.m4 (I 
think) and then autoconf uses that (and other inputs) to /produce/ 
configure.  (This may seem like a trivial point, but exact derivations 
and their timing were critical to how the backdoor dropper worked.)  
Other tools (at least autopoint from GNU gettext, possibly others) are 
used to automatically scan a larger set of m4 files stored on the system 
and copy those needed into the m4/ directory of a package source tree, 
in a process conceptually similar to how the linker pulls only needed 
members from static libraries when building an executable.  All of this 
is done on the maintainer's machine, so that the finished configure 
script is included in the release tarball.
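
For reference, for a package that uses gettext the maintainer-side
sequence is roughly the following (autoreconf -i performs approximately
these steps, in this order):

  autopoint                 # copy the needed gettext-related m4 files into m4/
  aclocal -I m4             # gather needed macros into aclocal.m4
  autoconf                  # produce configure from configure.ac and aclocal.m4
  automake --add-missing    # produce Makefile.in files and helper scripts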


There have been past incidents where malicious code was directly added 
to autoconf-generated configure scripts, so (as I understand) 
distribution packagers often regenerate configure before building a 
package.  In /this/ case, the crackers (likely) modified the version of 
build-to-host.m4 on /their/ computer, so the modified file would be 
copied into the xz-utils/m4 directory in the release tarball and used 
when distribution packagers regenerate configure before building the 
package.


Fetching these files from standard release points would require an index 
of those standard release points, and packages are allowed to have their 
own package-specific macros as well.  The entire system dates from well 
before ubiquitous network connectivity could be assumed (anywhere---and 
that is still a bad assumption in the less prosperous parts of the 
world), so release tarballs are meant to be self-contained, including 
copies of "standard" macros needed for configure but not supplied by 
autoconf/automake/etc.



-- Jacob



Re: checking aclocal m4 files (was: GNU Coding Standards, automake, and the recent xz-utils backdoor)

2024-04-02 Thread Jacob Bachmeyer

Bruno Haible wrote:

> Jacob Bachmeyer wrote:
>> Another related check that /would/ have caught this attempt would be
>> comparing the aclocal m4 files in a release against their (meta)upstream
>> sources before building a package.  This is something distribution
>> maintainers could do without cooperation from upstream.  If
>> m4/build-to-host.m4 had been recognized as coming from gnulib and
>> compared to the copy in gnulib, the nonempty diff would have been
>> suspicious.
>
> True.
>
> Note, however, that there would be some false positives:


True; all of these are Free Software, so a non-empty diff would still 
require manual review.



>  libtool.m4 is often shipped modified,
>   a) if the maintainer happens to use /usr/bin/libtoolize and
>      is using a distro that has modified libtool.m4 (such as Gentoo), or
  


Distribution libtool patches could be accumulated into the set of "known 
sources".



>   b) if the maintainer intentionally improved the support of specific
>      platforms, such as Solaris 11.3.
  


In this case, the distribution maintainer should ideally take up pushing 
those improvements back to upstream libtool, if they are suitably general.



> Also, for pkg.m4 there is no single upstream source. They distribute
> a pkg.m4.in, from which pkg.m4 is generated on the developer's machine.
  


This would be a special case, but could be treated as a package-specific 
m4 file anyway, since the developer must generate it.  The developer 
could also write their own m4 macros to use with autoconf.



> But for macros from Gnulib or the Autoconf macros archive, this is a
> reasonable check to make.


This type of check could also allow "sweeping" improvements upstream, in 
the case of a package maintainer that may be unsure of how to upstream 
their changes.  (Of course, upstream needs to be careful about blindly 
collecting improvements, lest some of those "improvements" turn out to 
have come from cracker sockpuppets...)



-- Jacob




Re: binary data in source trees (was: GNU Coding Standards, automake, and the recent xz-utils backdoor)

2024-04-02 Thread Jacob Bachmeyer

Richard Stallman wrote:

[[[ To any NSA and FBI agents reading my email: please consider]]]
[[[ whether defending the US Constitution against all enemies, ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

  > The issue seems to be releases containing binary data for unit tests, 
  > instead of source or scripts to generate that data.  In this case, that 
  > binary data was used to smuggle in heavily obfuscated object code.


> If this is the crucial point, we could put in the coding standards
> (or the maintainers' guide) not to do this.


On another branch of this discussion, Zack Weinberg noted that binary 
test data may be unavoidable in some cases.  (A base64 blob or hex dump 
may as well be a binary blob.)  Further, manuals often contain images, 
some of which may be in binary formats, such as PNG.  To take this all 
the way, we would have to require that all documentation graphics be 
generated from readable sources.  I know TikZ exists but am unsure how 
well it could be integrated into Texinfo, for example.



-- Jacob



Re: reproducible dists and builds (was: GNU Coding Standards, automake, and the recent xz-utils backdoor)

2024-04-02 Thread Jacob Bachmeyer

Richard Stallman wrote:

[[[ To any NSA and FBI agents reading my email: please consider]]]
[[[ whether defending the US Constitution against all enemies, ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

  > What would be helpful is if `make dist' would guarantee to produce the same
  > tarball (bit-to-bit) each time it is run, assuming the tooling is the same
  > version.  Currently I believe that is not the case (at least due to
  > timestamps)

> Isn't this a description of "reproducible compilation"?


No, but it is closely related.  Compilation produces binary executables, 
while `make dist` produces a freestanding /source/ archive.
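
For reference, GNU tar and gzip already have the options needed to make
the archiving step deterministic; a sketch, assuming GNU tar 1.28 or
newer, a SOURCE_DATE_EPOCH environment variable, and an illustrative
package name:

  tar --sort=name --owner=0 --group=0 --numeric-owner \
      --mtime="@$SOURCE_DATE_EPOCH" \
      -cf mypkg-1.0.tar mypkg-1.0
  gzip -9 -n mypkg-1.0.tar    # -n keeps name/timestamp out of the gzip header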



> We want to make that standard, but progress is inevitably slow
> because many packages need to be changed.


I am not actually sure that that is a good idea.  (Well, it is 
mostly a good idea except for one issue.)  If compilation is strictly 
deterministic, then everyone ends up with identical binaries, which 
means an exploit that cracks one will crack all.  Varied binaries make 
life harder for crackers developing exploits, and may even make "one 
exploit to crack them all" impossible.  This is one of the reasons that 
exploits have long hit Windows (where all the systems are identical) so 
much harder than the various GNU/Linux distributions (where the binaries 
are likely different even before distribution-specific patches are 
considered).


Ultimately, this probably means that we should have both an /ability/ 
for deterministic compilation and either a compiler mode or 
post-processing pass (a linker option?) to intentionally shuffle the 
final executable.



-- Jacob




Re: GNU Coding Standards, automake, and the recent xz-utils backdoor

2024-04-02 Thread Jeffrey Walton
On Tue, Apr 2, 2024 at 6:05 PM Karl Berry  wrote:
>
> I'm also wondering whether the GNU system should recommend using zstd
> instead of or in addition to xz for compression purposes.
>
> I'm not sure GNU explicitly recommends anything. Although the tarball
> examples in standards.texi and maintain.texi all use gz, I don't think
> even gz is explicitly recommended. (Which seems ok to me.)

As an extra datapoint, Debian does xz in some areas. From
<https://wiki.debian.org/DebianRepository/Format#Compression_of_indices>:

    Clients must support xz compression, and
    must support gzip and bzip2 ...

    Servers should offer only xz compressed files,
    except for the special cases listed above.

> Personally, I would support lz4 over zstd simply because more GNU
> packages already use lz4.(*) Both lz4 and zstd are quite a bit less
> resource-hungry than xz, especially for compression. I don't know if
> there is a technical reason to prefer zstd.
>
> In general, I think it can continue to be left up to individual
> maintainers, vs. making any decrees. Automake supports them all
> (among others). --best, karl.
>
> (*) Looking at a listing of ftp.gnu.org, I see only gmp using zst, and
> perhaps a dozen or so packages using lz. Basically always in addition to
> another format.

Jeff



Re: GNU Coding Standards, automake, and the recent xz-utils backdoor

2024-04-02 Thread Bob Friesenhahn

On 4/2/24 16:42, Richard Stallman wrote:


[[[ To any NSA and FBI agents reading my email: please consider]]]
[[[ whether defending the US Constitution against all enemies, ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

   > My first thought was that Autoconf is a relatively trivial attack vector
   > since it is so complex and the syntax used for some parts (e.g. m4 and
   > shell scripts) is so arcane.  In particular, it is common for Autotools
   > stuff to be installed on a computer (e.g. by installing a package from
   > an OS package manager) and then used while building.  For example, there
   > are large collections of ".m4" files installed.  If one of the m4 files
   > consumed has been modified, then the resulting configure script has been
   > modified.

> Can anyone think of a feasible way to prevent this sort of attack?

A common way would be to use PGP signing to bless a set of files. 
Perhaps a manifest which specifies the file names/paths and their sha256 
would be sufficient.  But there needs to be a way to augment this in 
case there are multiple collections of blessed files, including those 
blessed by the user.
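
A minimal sketch of that manifest approach, with illustrative file names
(and no claim about how Autotools would actually consume it):

  sha256sum m4/*.m4 > m4-manifest.sha256
  gpg --armor --detach-sign m4-manifest.sha256   # produces m4-manifest.sha256.asc
  # later, to verify:
  gpg --verify m4-manifest.sha256.asc m4-manifest.sha256 && sha256sum -c m4-manifest.sha256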

   > It may be that an OS package manager

> What is an "OS package manager"?


A popular OS package manager is Debian 'apt'. Well designed ones provide 
a way to test if installed files on the system have been modified.


But I only use this as an example since I don't think that any GNU build 
system should depend on something specific to an operating system.



> Could you say concretely what this would do?  Which files do you have
> in mind?  The m4 files discussed above?


M4 files, scripts, templates, and any other standard files which may be 
assimilated as part of the build process.



   > If installed files were themselves independently signed (or sha256s of
   > the files are contained in a signed manifest), and Autotools was able to
   > validate them while copying into a project ("bootstrapping"), then at
   > least there is some assurance that the many files which were consumed
   > have not been subverted.

> Is this a proposal to deal with the problem described above?  I think
> maybe it is, but things are not concrete enough for me to tell for
> certain.


I do not think that it would solve the specific issues which lead to the 
xz-utils backdoor, but it may solve a large class of issues which have 
been ignored up until now.  People preparing operating system 
distributions solve such issues via the extensive (and repeatable) 
processes that they use.


GNU software developers are less likely (or able) to solve issues via 
extensive processes.  They expect that 'make distcheck' will prepare a 
clean distribution tarball.


Bob

--

Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
Public Key, http://www.simplesystems.org/users/bfriesen/public-key.txt




Re: GNU Coding Standards, automake, and the recent xz-utils backdoor

2024-04-02 Thread Karl Berry
I'm also wondering whether the GNU system should recommend using zstd
instead of or in addition to xz for compression purposes.

I'm not sure GNU explicitly recommends anything. Although the tarball
examples in standards.texi and maintain.texi all use gz, I don't think
even gz is explicitly recommended. (Which seems ok to me.)

Personally, I would support lz4 over zstd simply because more GNU
packages already use lz4.(*) Both lz4 and zstd are quite a bit less
resource-hungry than xz, especially for compression. I don't know if
there is a technical reason to prefer zstd.

In general, I think it can continue to be left up to individual
maintainers, vs. making any decrees. Automake supports them all
(among others). --best, karl.

(*) Looking at a listing of ftp.gnu.org, I see only gmp using zst, and
perhaps a dozen or so packages using lz. Basically always in addition to
another format.



Re: GNU Coding Standards, automake, and the recent xz-utils backdoor

2024-04-02 Thread Bob Friesenhahn

> I'm also wondering whether the GNU system should recommend using zstd
> instead of or in addition to xz for compression purposes.  Automake
> gained support for dist-zstd back in 2019 [1], but I'm not sure how
> many projects are using it yet.
>
> [1] https://git.savannah.gnu.org/cgit/automake.git/commit/?id=5c466eaf


For several years, GraphicsMagick distributed a tarball compressed to 
zstd format.  This started before Automake offered support for it.


I used these rules:

# Rules to build a .tar.zst tarball (zstd compression)
dist-zstd: distdir
    tardir=$(distdir) && $(am__tar) | ZSTD_CLEVEL=$${ZSTD_CLEVEL-22} zstd --ultra -c >$(distdir).tar.zst
    $(am__post_remove_distdir)

With these options, the zst tarball came within a hair's breadth of the 
xz compressed file size.  I did not find any drawbacks.


I also had good experience with 'lzip', which has the benefit of a very 
small implementation and more compact coding than xz uses.


I stopped distributing anything but xz format since that is what almost 
everyone was choosing to download.


Bob

--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
Public Key, http://www.simplesystems.org/users/bfriesen/public-key.txt




Re: GNU Coding Standards, automake, and the recent xz-utils backdoor

2024-04-02 Thread Richard Stallman
[[[ To any NSA and FBI agents reading my email: please consider]]]
[[[ whether defending the US Constitution against all enemies, ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

  > The issue seems to be releases containing binary data for unit tests, 
  > instead of source or scripts to generate that data.  In this case, that 
  > binary data was used to smuggle in heavily obfuscated object code.

If this is the crucial point, we could put in the coding standards
(or the maintainers' guide) not to do this.

-- 
Dr Richard Stallman (https://stallman.org)
Chief GNUisance of the GNU Project (https://gnu.org)
Founder, Free Software Foundation (https://fsf.org)
Internet Hall-of-Famer (https://internethalloffame.org)





Re: GNU Coding Standards, automake, and the recent xz-utils backdoor

2024-04-02 Thread Richard Stallman
[[[ To any NSA and FBI agents reading my email: please consider]]]
[[[ whether defending the US Constitution against all enemies, ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

  > My first thought was that Autoconf is a relatively trivial attack vector 
  > since it is so complex and the syntax used for some parts (e.g. m4 and 
  > shell scripts) is so arcane.  In particular, it is common for Autotools 
  > stuff to be installed on a computer (e.g. by installing a package from 
  > an OS package manager) and then used while building.  For example, there 
  > are large collections of ".m4" files installed.  If one of the m4 files 
  > consumed has been modified, then the resulting configure script has been 
  > modified.

Can anyone think of a feasible way to prevent this sort of attack?

Someone suggested that configure should not use m4 files that are
lying around, but rather should fetch them from standard release points,
WDYT of that idea?

  > It may be that an OS package manager

What is an "OS package manager"?

  > has the ability to validate already
  > installed files,

Could you say concretely what this would do?  Which files do you have
in mind?  The m4 files discussed above?

  > If installed files were themselves independently signed (or sha256s of 
  > the files are contained in a signed manifest), and Autotools was able to 
  > validate them while copying into a project ("bootstrapping"), then at 
  > least there is some assurance that the many files which were consumed 
  > have not been subverted. 

Is this a proposal to deal with the problem described above?  I think
maybe it is, but things are not concrete enough for me to tell for
certain.

Let's please not talk about files "consumed" unless they are used up
in the process.

  > It seems common for OS distributions to modify some of the files 
  > (especially libtool related) so they differ from the original GNU versions.

The packager would need to specify another key and use that to sign
the files perse modifies.  Or maybe, to sign all the files in the
distribution.




-- 
Dr Richard Stallman (https://stallman.org)
Chief GNUisance of the GNU Project (https://gnu.org)
Founder, Free Software Foundation (https://fsf.org)
Internet Hall-of-Famer (https://internethalloffame.org)





Re: GNU Coding Standards, automake, and the recent xz-utils backdoor

2024-04-02 Thread Richard Stallman
[[[ To any NSA and FBI agents reading my email: please consider]]]
[[[ whether defending the US Constitution against all enemies, ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

  > There is not much one can do when a maintainer with signing/release 
  > power does something intentionally wrong.

That is clearly true.  I don't think we should propose changes in tools
with the idea of preventing such sabotage outright.

We should also point out that free software is still far safer
than nonfree software.  With nonfree software, intentional sabotage
is normal practice (see https://gnu.org/malware/) and unintentional
gross security failures are not unusual.

However, this case could suggest improvements in practices or tools
that would catch more mistakes, and some instances of sabotage too.
It can't hurt to think about possibilities for that.

Just as long as we don't insist on perfect or nothing.
Because, as you said, no change in tools could protect perfectly
against this sort of devious sabotage.

-- 
Dr Richard Stallman (https://stallman.org)
Chief GNUisance of the GNU Project (https://gnu.org)
Founder, Free Software Foundation (https://fsf.org)
Internet Hall-of-Famer (https://internethalloffame.org)





Re: GNU Coding Standards, automake, and the recent xz-utils backdoor

2024-04-02 Thread Richard Stallman
[[[ To any NSA and FBI agents reading my email: please consider]]]
[[[ whether defending the US Constitution against all enemies, ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

  > There is not much one can do when a maintainer with signing/release 
  > power does something intentionally wrong.

That is clearly true.  I don't think we should propose changes in
tools with the idea of outright preventing insider sabotage.

We should also point out that free software is still far safer than
nonfree software.  With nonfree software, intentional sabotage and
back doors are normal practice (see https://gnu.org/malware/), and
unintentional gross security failures are not unusual.

However, this case could suggest improvements in practices or tools
that would catch more mistakes, and some instances of sabotage too.
It can't hurt to think about possibilities for that.

That's useful to think about, as long as we don't insist that the
target is perfection.  Because, as you said, no change in tools could
protect _perfectly_ against devious sabotage by maintainers.

We may need more maintainers on some of the tools in question.
Aside from autoconf and automake, what tools are involved here?


-- 
Dr Richard Stallman (https://stallman.org)
Chief GNUisance of the GNU Project (https://gnu.org)
Founder, Free Software Foundation (https://fsf.org)
Internet Hall-of-Famer (https://internethalloffame.org)





Re: GNU Coding Standards, automake, and the recent xz-utils backdoor

2024-04-02 Thread Richard Stallman
[[[ To any NSA and FBI agents reading my email: please consider]]]
[[[ whether defending the US Constitution against all enemies, ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

  > What would be helpful is if `make dist' would guarantee to produce the same
  > tarball (bit-to-bit) each time it is run, assuming the tooling is the same
  > version.  Currently I believe that is not the case (at least due to
  > timestamps)

Isn't this a description of "reproducible compilation"?
We want to make that standard, but progress is inevitably slow
because many packages need to be changed.

-- 
Dr Richard Stallman (https://stallman.org)
Chief GNUisance of the GNU Project (https://gnu.org)
Founder, Free Software Foundation (https://fsf.org)
Internet Hall-of-Famer (https://internethalloffame.org)





Re: GNU Coding Standards, automake, and the recent xz-utils backdoor

2024-04-02 Thread Eric Blake
[adding in coreutils, for some history]

On Sat, Mar 30, 2024 at 12:55:35PM -0400, Eric Gallager wrote:
> I was recently reading about the backdoor announced in xz-utils the
> other day, and one of the things that caught my attention was how
> (ab)use of the GNU build system played a role in allowing the backdoor
> to go unnoticed: https://openwall.com/lists/oss-security/2024/03/29/4

I'm also wondering whether the GNU system should recommend using zstd
instead of or in addition to xz for compression purposes.  Automake
gained support for dist-zstd back in 2019 [1], but I'm not sure how
many projects are using it yet.

[1] https://git.savannah.gnu.org/cgit/automake.git/commit/?id=5c466eaf

Furthermore, I was around when GNU Coreutils kicked off the initial
push to support dist-xz (initially named dist-lzma, before a change in
upstream [2]) because of its benefits over dist-bz2 (compresses
smaller, decompresses faster) [3][4], and even when it ditched
dist-gzip leaving dist-xz as its ONLY release option [5][6], before
needing to be reinstated for bootstrapping Guix [7][8].

[2] https://git.savannah.gnu.org/cgit/coreutils.git/commit/?id=b52a8860
[3] https://git.savannah.gnu.org/cgit/coreutils.git/commit/?id=b75e3b85
[4] https://lists.gnu.org/r/bug-coreutils/2007-10/msg00165.html
[5] https://git.savannah.gnu.org/cgit/coreutils.git/commit/?id=e1c589ec
[6] https://lists.gnu.org/r/coreutils/2011-10/msg0.html
[7] https://git.savannah.gnu.org/cgit/coreutils.git/commit/?id=516cdf38
[8] https://lists.gnu.org/r/coreutils/2020-02/msg00042.html

At any rate, it is now obvious (in hindsight) that zstd has a much
larger development team than xz, which may reduce the risk of zstd
being backdoored in the same way that xz was, merely by social
engineering of a lone maintainer.

It is also obvious that having GNU distributions available through
only a SINGLE compression format, when that format may be vulnerable,
is a disservice to users when it is not much harder to provide
tarballs in multiple formats.  Having multiple tarballs as the
recommendation can at least let us automate that each of the tarballs
has the same contents, although it won't make it any more obvious
whether those contents match what was in git (which was how the xz
backdoor got past so many people in the first place).
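
A rough sketch of how that cross-format check could be automated, in plain
shell, assuming GNU tar built with xz and zstd support; the release name is
only a placeholder:

  #!/bin/sh
  # Check that the .xz and .zst tarballs of one release unpack to
  # identical trees.  "foo-1.0" is a placeholder release name.
  set -e
  mkdir from-xz from-zst
  tar -xJf foo-1.0.tar.xz -C from-xz
  tar --zstd -xf foo-1.0.tar.zst -C from-zst
  diff -r from-xz from-zst && echo "tarball contents are identical"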

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.
Virtualization:  qemu.org | libguestfs.org




Re: GNU Coding Standards, automake, and the recent xz-utils backdoor

2024-04-02 Thread Bruno Haible
Jacob Bachmeyer wrote:
> Another related check that /would/ have caught this attempt would be 
> comparing the aclocal m4 files in a release against their (meta)upstream 
> sources before building a package.  This is something distribution 
> maintainers could do without cooperation from upstream.  If 
> m4/build-to-host.m4 had been recognized as coming from gnulib and 
> compared to the copy in gnulib, the nonempty diff would have been 
> suspicious.

True.

Note, however, that there would be some false positives: libtool.m4
is often shipped modified,
  a) if the maintainer happens to use /usr/bin/libtoolize and
 is using a distro that has modified libtool.m4 (such as Gentoo), or
  b) if the maintainer intentionally improved the support of specific
 platforms, such as Solaris 11.3.

Also, for pkg.m4 there is no single upstream source. They distribute
a pkg.m4.in, from which pkg.m4 is generated on the developer's machine.

But for macros from Gnulib or the Autoconf macros archive, this is a
reasonable check to make.

Bruno






Re: GNU Coding Standards, automake, and the recent xz-utils backdoor

2024-04-02 Thread Jose E. Marchesi


> Jose E. Marchesi wrote:
>>> Jose E. Marchesi wrote:
>>> 
>>>>> [...]
>>>>> 
>>>>>> I agree that distcheck is good but not a cure all.  Any static
>>>>>> system can be attacked when there is motive, and unit tests are
>>>>>> easily gamed.
>>>>>> 
>>>>> The issue seems to be releases containing binary data for unit tests,
>>>>> instead of source or scripts to generate that data.  In this case,
>>>>> that binary data was used to smuggle in heavily obfuscated object
>>>>> code.
>>>>> 
>>>> As a side note, GNU poke (https://jemarch.net/poke) is good for
>>>> generating arbitrarily complex binary data from clear textual
>>>> descriptions.
>>>>   
>>> While it is suitable for that use, at last check poke is itself very
>>> complex, complete with its own JIT-capable VM.  This is good for
>>> interactive use, but I get nervous about complexity in testsuites,
>>> where simplicity can greatly aid debugging, and it /might/ be possible
>>> to hide a backdoor similarly in a poke pickle.  (This seems to be a
>>> general problem with powerful interactive editors.)
>>> 
>>
>> Yes, I agree simplicity is very desirable, in testsuites and actually
>> everywhere else.  I also am not fond of dragging in dependencies.
>>   
>
> Exactly---I am sure that poke is great for interactive use, but a
> self-contained solution is probably better for a testsuite.
>
>> But I suppose we also agree that it is not possible to assemble
>> non-trivial binary data structures in a simple way, without somehow
>> moving the complexity of the encoding into some sort of generator, which
>> will not be simple.  The GDB testsuite, for example, ships with a DWARF
>> assembler written in around 3000 lines of Tcl.  Sure, it is simpler than
>> poke and doesn't drag in additional dependencies.  But it has to be
>> carefully maintained and kept up to date, and the complexity is there.
>>   
>
> The problem for a compression tool testsuite is that compression
> formats are (I believe) defined as byte-streams or bit-streams.
> Further, the generator(s) must be able to produce /incorrect/ output
> as well, in order to test error handling.
>>> Further, GNU poke defines its own specialized programming language for
>>> manipulating binary data.  Supplying generator programs in C (or C++)
>>> for binary test data in a package that itself uses C (or C++) ensures
>>> that every developer with the skills to improve or debug the package
>>> can also understand the testcase generators.
>>> 
>>
>> Here we will have to disagree.
>>
>> IMO it is precisely the many and tricky details on properly marshaling
>> binary data in general-purpose programming languages that would have
>> greater odds to lead to difficult to understand, difficult to maintain
>> and possibly buggy or malicious encoders.  The domain specific language
>> is here an advantage, not a liability.
>>
>> This you need to do in C to encode and generate test data for a single
>> signed 32-bit NUMBER in an output file in a _more or less_ portable way:
>>
>>   void generate_testdata (off_t offset, int endian, int number)
>>   {
>> int bin_flag = 0, fd;
>>
>>   #ifdef _WIN32
>> int bin_flag = O_BINARY;
>>   #endif
>> fd = open ("testdata.bin", bin_flag, S_IWUSR);
>> if (fd == -1)
>>   fatal ("error generating data.");
>> if (endian == BIG)
>>   {
>> b[0] = (number >> 24) & 0xff;
>> b[1] = (number >> 16) & 0xff;
>> b[2] = (number >> 8) & 0xff;
>> b[3] = number & 0xff;
>>   }
>> else
>>   {
>> b[3] = (number >> 24) & 0xff;
>> b[2] = (number >> 16) & 0xff;
>> b[1] = (number >> 8) & 0xff;
>> b[0] = number & 0xff;
>>   }
>>
>> lseek (fd, offset, SEEK_SET);
>> for (i = 0; i < 4; ++i)
>>   write (fd, &b[i], 1);
>> close (fd);
>>   }
>>   
>
> While that is a nice general solution (aside from neglecting the
> declaration "uint8_t b[4];"; with "int b[4];", the code would only
> work on a little-endian processor; with no declaration, the compiler
> will reject it), a compression format would

Re: GNU Coding Standards, automake, and the recent xz-utils backdoor

2024-04-01 Thread Jacob Bachmeyer

Eric Gallager wrote:

On Tue, Apr 2, 2024 at 12:04 AM Jacob Bachmeyer  wrote:
  

Russ Allbery wrote:


[...] I think one useful principle that's
emerged that doesn't disrupt the world *too* much is that the release
tarball should differ from the Git tag only in the form of added files.
  

 From what I understand, the xz backdoor would have passed this check.

[...]


[...] In other
words, even if a proposal wouldn't have stopped this particular
attack, I don't think that's a reason not to try it.


I agree that there may be dumber crackers who /would/ get caught by such 
a check, but I want to ensure that we do not end up thinking that we 
have a solution and the problem is solved and everyone is happy ... and 
then we get caught out when it happens again.


I should clarify also that I think that this proposal *is* a good idea, 
but we should remain aware that it would not have prevented this incident.


Maneuvering around back to topic, aclocal m4 files are fairly small, 
perhaps always carrying all of them that a package uses in the 
repository should be considered a good practice?  (In other words, 
autogen.sh should *not* run autopoint---the files autopoint adds should 
be in the repository.)  If such a practice were followed, that would 
have made checking for altered files between repository and release 
effective, or it would have forced the cracker to target the backdoor 
more widely and place the altered build-to-host.m4 in the repository, 
increasing the probability of discovery.


Wording that as a policy:  "All data inputs used to construct the build 
scripts for a package shall be stored in the package's repository."


Another related check that /would/ have caught this attempt would be 
comparing the aclocal m4 files in a release against their (meta)upstream 
sources before building a package.  This is something distribution 
maintainers could do without cooperation from upstream.  If 
m4/build-to-host.m4 had been recognized as coming from gnulib and 
compared to the copy in gnulib, the nonempty diff would have been 
suspicious.
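
A sketch of such a comparison in plain shell, assuming a local gnulib
checkout; macros whose upstream is not gnulib (pkg.m4, libtool.m4, ...)
would still need manual review:

  #!/bin/sh
  # Diff each m4/*.m4 in an unpacked release against the copy in a
  # gnulib checkout.  $GNULIB_SRCDIR is an assumed location.
  GNULIB_SRCDIR=${GNULIB_SRCDIR:-$HOME/src/gnulib}
  for f in m4/*.m4; do
    base=$(basename "$f")
    if [ -f "$GNULIB_SRCDIR/m4/$base" ]; then
      diff -u "$GNULIB_SRCDIR/m4/$base" "$f" ||
        echo "WARNING: $base differs from the gnulib copy"
    else
      echo "NOTE: $base is not from gnulib; check its upstream by hand"
    fi
  done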



-- Jacob




Re: GNU Coding Standards, automake, and the recent xz-utils backdoor

2024-04-01 Thread Jacob Bachmeyer

Richard Stallman wrote:

[[[ To any NSA and FBI agents reading my email: please consider]]]
[[[ whether defending the US Constitution against all enemies, ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

  > `distcheck` target's prominence to recommend it in the "Standard
  > Targets for All Users" section of the GCS? 


  > Replying as an Automake developer, I have nothing against it in
  > principle, but it's clearly up to the GNU coding standards
  > maintainers. As far as I know, that's still rms (for anything
  > substantive)

To make a change in the coding standards calls for a clear and
specific proposal.  If people think a change is desirable, I suggest
making one or more such proposals.

Now for a bit of speculation.  I speculate that a cracker was careless
and failed to adjust certain details of a bogus tar ball to be fully
consistent, and that `make distcheck' enabled someone to notice those
errors.

I don't have any real info about whether that is so.  If my
speculation is mistaken, please say so.


I believe it is completely mistaken.  As I understand, the crocked 
tarballs would have passed `make distcheck` with flying colors.  The 
rest of your questions about it therefore have no answer.


On a side note, thanks for Emacs:  when I finally extracted a copy of 
the second shell script in the backdoor dropper, Emacs made short work 
(M-< M-> C-M-\) of properly indenting it and making the control flow 
obvious.  Misunderstandings of that control flow have been fairly 
common.  (I too had it wrong before I finally had a nicely indented copy.)


The backdoor was actually discovered in operation on machines running 
testing package versions.  It caused sshd to consume an inordinate 
amount of CPU time, with profiling reporting that sshd was spending most 
of its time in liblzma, a library not even linked in sshd.  (The "rogue" 
library had been loaded as a dependency of libsystemd, which the 
affected distributions had patched sshd to use for startup notification.)


I will send a more detailed reply on the other thread, since its subject 
is more appropriate.



-- Jacob




Re: GNU Coding Standards, automake, and the recent xz-utils backdoor

2024-04-01 Thread Eric Gallager
On Tue, Apr 2, 2024 at 12:04 AM Jacob Bachmeyer  wrote:
>
> Russ Allbery wrote:
> > [...]
> >
> > There is extensive ongoing discussion of this on debian-devel.  There's no
> > real consensus in that discussion, but I think one useful principle that's
> > emerged that doesn't disrupt the world *too* much is that the release
> > tarball should differ from the Git tag only in the form of added files.
> >
>
>  From what I understand, the xz backdoor would have passed this check.
> The backdoor dropper was hidden in test data files that /were/ in the
> repository, and required code in the modified build-to-host.m4 to
> activate it.  The m4 files were not checked into the repository, instead
> being added (presumably by running autogen.sh with a rigged local m4
> file collection) while preparing the release.
>
> Someone with a copy of a crocked release tarball should check if
> configure even had the backdoor "as released" or if the attacker was
> /depending/ on distributions to regenerate configure before packaging xz.
>
>
> -- Jacob
>

I would like to clarify that my purpose in starting this thread wasn't
so much to ask, "How could the xz backdoor specifically have been
prevented?" (which seems pretty clearly impossible) but rather, "How
can we use this incident as inspiration for general-purpose
improvements to the GNU Coding Standards and related tools?" In other
words, even if a proposal wouldn't have stopped this particular
attack, I don't think that's a reason not to try it.



Re: GNU Coding Standards, automake, and the recent xz-utils backdoor

2024-04-01 Thread Russ Allbery
Jacob Bachmeyer  writes:

> The m4 files were not checked into the repository, instead being added
> (presumably by running autogen.sh with a rigged local m4 file
> collection) while preparing the release.

Ah, yes, I think you are correct.  For some reason I thought the
legitimate build-to-host.m4 had been checked into the repository, but this
is indeed not the case.

-- 
Russ Allbery (ea...@eyrie.org) 



Re: GNU Coding Standards, automake, and the recent xz-utils backdoor

2024-04-01 Thread Jacob Bachmeyer

Zack Weinberg wrote:

On Mon, Apr 1, 2024, at 2:04 PM, Russ Allbery wrote:
  

"Zack Weinberg"  writes:


It might indeed be worth thinking about ways to minimize the
difference between the tarball "make dist" produces and the tarball
"git archive" produces, starting from the same clean git checkout,
and also ways to identify and audit those differences.
  

There is extensive ongoing discussion of this on debian-devel. There's
no real consensus in that discussion, but I think one useful principle
that's emerged that doesn't disrupt the world *too* much is that the
release tarball should differ from the Git tag only in the form of
added files. Any files that are present in both Git and in the release
tarball should be byte-for-byte identical.



That dovetails nicely with something I was thinking about myself.
Obviously the result of "make dist" should be reproducible except for
signatures; to the extent it isn't already, those are bugs in automake.
But also, what if "make dist" produced *two* disjoint tarballs? One of
which is guaranteed to be byte-for-byte identical to an archive of the
VCS at the release tag (in some clearly documented fashion; AIUI, "git
archive" does *not* do what we want).  The other contains all the files
that "autoreconf -i" or "./bootstrap.sh" or whatever would create, but
nothing else.  Diffs could be provided for both tarballs, or only for
the VCS-archive tarball, whichever turns out to be more compact (I can
imagine the diff for the generated-files tarball turning out to be
comparable in size to the generated-files tarball itself).


The way to do that is to detect that "make dist" is being run in a VCS 
checkout, ask the VCS which files are in version control, and assume the 
others were somehow "brought in" by autogen.sh or whatever.  The problem 
is that now Automake needs to start growing support for varying version 
control systems, unless we /really/ want to say that this feature only 
works with Git.


The problem is that now the disjoint tarballs both need to be unpacked 
in the same directory to build the package and once that is done, how 
does "make dist" rebuild the distribution it was run from?  The file 
lists would need to be stored in the generated-files tarball.


The other problem is that this really needs to be an option.  DejaGnu, 
for example, stores the Autotools-generated files in Git and releases 
are just snapshots of the working tree.  (DejaGnu can also now *run* 
from a Git checkout without actually installing it, but that is a 
convenience limited to interpreted languages.)


Lastly, publishing a modified (third-party) distribution derived from a 
release instead of VCS *is* permitted.  (I believe this is a case of 
freedom 3.)  How would this feature interact with that?



-- Jacob




Re: GNU Coding Standards, automake, and the recent xz-utils backdoor

2024-04-01 Thread Jacob Bachmeyer

Russ Allbery wrote:

[...]

There is extensive ongoing discussion of this on debian-devel.  There's no
real consensus in that discussion, but I think one useful principle that's
emerged that doesn't disrupt the world *too* much is that the release
tarball should differ from the Git tag only in the form of added files.
  


From what I understand, the xz backdoor would have passed this check.  
The backdoor dropper was hidden in test data files that /were/ in the 
repository, and required code in the modified build-to-host.m4 to 
activate it.  The m4 files were not checked into the repository, instead 
being added (presumably by running autogen.sh with a rigged local m4 
file collection) while preparing the release.


Someone with a copy of a crocked release tarball should check if 
configure even had the backdoor "as released" or if the attacker was 
/depending/ on distributions to regenerate configure before packaging xz.



-- Jacob




Re: GNU Coding Standards, automake, and the recent xz-utils backdoor

2024-04-01 Thread Jacob Bachmeyer

Zack Weinberg wrote:

[...] but I do think there's a valid point here: the malicious xz
maintainer *might* have been caught earlier if they had committed the
build-to-host.m4 modification to xz's VCS.


That would require someone to notice that xz.git has a build-to-host.m4 
that does not exist anywhere in the history of gnulib.git.  That is a 
fairly complex scan, although it does look straightforward to 
implement.  That said, the m4 files in Gnulib *are* Free Software, so 
having a modified version cannot itself raise too many concerns.
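
One way such a scan might be implemented (plain shell plus git >= 2.16 for
--find-object; the repository paths are placeholders):

  #!/bin/sh
  # Does the exact m4/build-to-host.m4 shipped in xz appear anywhere in
  # gnulib's history?  "xz" and "gnulib" are assumed local clones.
  blob=$(git -C xz hash-object m4/build-to-host.m4)
  if git -C gnulib log --all --oneline --find-object="$blob" | grep -q .; then
    echo "an identical copy exists somewhere in gnulib's history"
  else
    echo "no gnulib commit ever contained this exact file -- worth a closer look"
  fi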



  (Or they might not have!
Witness the three (and counting) malicious patches that they barefacedly
submitted to *other* software and got accepted because the malice was
subtle enough to pass through code review.)
  


Exactly.  :-/

That said, the whole thing looks to me like the attackers were trying to 
/not/ hit the more (what is the best word?) "advanced" users---the 
backdoor would only be inserted if building distribution packages, and 
then only under dpkg or rpm, not other systems like Gentoo's Portage or 
in an unpackaged "./configure && make && sudo make install" build.  This 
would, of course, hit the most widely used systems, including (reports 
are that the sock farm tried very hard to get Ubuntu to ship the crocked 
version in their upcoming release, but the freeze was upheld) the 
systems most commonly used by less technically-skilled users, but 
pointedly exclude systems that require greater skill to use---and whose 
users would be more likely to notice anything amiss and start tearing 
the system apart with the debugger.  Unfortunately for Mr. Sockmaster, 
it turns out that some highly-skilled users *do* use the widely-used 
systems and the backdoor caused sshd to misbehave enough to draw 
suspicion.  (Profiling reports that sshd is spending most of its time in 
liblzma---a library it has no reason to use---will tend to raise a few 
eyebrows.  :-)  )



[...]
  
Maybe the best revision to the GNU Coding Standards would be that 
releases should, if at all possible, contain only text?  Any binary 
files needed for testing can be generated during "make check" if 
necessary



I don't think this is a good idea.  It's only a speed bump for someone
trying to smuggle malicious data into a package (think "base64 -d") and
it makes life substantially harder for honest authors of programs that
work with binary data, and authors of material whose "source code"
(as GPLv3 uses that term) *is* binary data.  Consider pngsuite, for
instance (http://www.schaik.com/pngsuite/) -- it would be a *ton* of
work to convert each of these test PNG files into GNU Poke scripts,
and probably the result would be *less* ergonomic for purposes of
improving the test suite.
  


That is a bad example because SNG (https://sng.sourceforge.net/) 
exists precisely to provide a text representation of PNG binary 
structures.  (Admittedly, if I recall correctly, the contents of IDAT 
are simply a hexdump.)


While we are on the topic, this leaves the other obvious place to hide 
binary data:  images used as part of the manual.  There is a reason that 
I added the "if at all possible" caveat, and I am not certain that it is 
always possible.



I would like to suggest that a more useful policy would be "files
written to $prefix by 'make install' should not have any data
dependency on files labeled as part of the package's testsuite".
That doesn't constrain honest authors and it seems within the
scope of what the reproducible builds people could test for.
(Build the package, install to nonce prefix 1, unpack the tarball
again, delete the test suite, build again, install to prefix 2, compare.)
Of course a sufficiently determined malicious coder could detect
the reproducible-build test environment, but unlike "no binary data"
this is a substantial difficulty increment.


This could be a good idea.  Another way to check this even without 
reproducible builds would be to ensure that the access timestamps on 
testsuite files do not change while "make" is processing the main 
sources.  Checking this is slightly more invasive, since you would need 
to run a hook between processing top-level directories during the main 
build, but for packages using recursive Automake, you could simply run 
"make -C src" (or wherever the main sources are) and make sure that the 
testsuite files still have the same atime afterwards.  I admit that this 
is harder to automate in general, but distribution packaging processes 
already have other metadata that is manually maintained, so identifying 
the source subtrees that yield the installable artifacts should not be 
difficult.
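
A very rough sketch of that check, assuming GNU find/stat, a testsuite
under tests/, and a filesystem that actually records access times (i.e.
not mounted noatime):

  #!/bin/sh
  # Record testsuite access times, build only the main sources, and
  # verify nothing under tests/ was read.  Paths are placeholders.
  find tests -type f -exec stat -c '%X %n' {} + | sort > atimes.before
  make -C src
  find tests -type f -exec stat -c '%X %n' {} + | sort > atimes.after
  diff -u atimes.before atimes.after ||
    echo "WARNING: building src/ read files from the testsuite"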


Now that I think about it, I suggest tightening that policy a bit 
further:  "files produced by make in the source subtree (typically src/) 
shall have no data dependency on files outside of that tree"


I doubt anyone ever thought that recursive make could end up as 
security/verifiability feature.  8-|



-- Jacob



Re: GNU Coding Standards, automake, and the recent xz-utils backdoor

2024-04-01 Thread Jacob Bachmeyer

Jose E. Marchesi wrote:

Jose E. Marchesi wrote:


[...]



I agree that distcheck is good but not a cure all.  Any static
system can be attacked when there is motive, and unit tests are
easily gamed.
  
  

The issue seems to be releases containing binary data for unit tests,
instead of source or scripts to generate that data.  In this case,
that binary data was used to smuggle in heavily obfuscated object
code.



As a side note, GNU poke (https://jemarch.net/poke) is good for
generating arbitrarily complex binary data from clear textual
descriptions.
  

While it is suitable for that use, at last check poke is itself very
complex, complete with its own JIT-capable VM.  This is good for
interactive use, but I get nervous about complexity in testsuites,
where simplicity can greatly aid debugging, and it /might/ be possible
to hide a backdoor similarly in a poke pickle.  (This seems to be a
general problem with powerful interactive editors.)



Yes, I agree simplicity is very desirable, in testsuites and actually
everywhere else.  I also am not fond of dragging in dependencies.
  


Exactly---I am sure that poke is great for interactive use, but a 
self-contained solution is probably better for a testsuite.



But I suppose we also agree that it is not possible to assemble
non-trivial binary data structures in a simple way, without somehow
moving the complexity of the encoding into some sort of generator, which
will not be simple.  The GDB testsuite, for example, ships with a DWARF
assembler written in around 3000 lines of Tcl.  Sure, it is simpler than
poke and doesn't drag in additional dependencies.  But it has to be
carefully maintained and kept up to date, and the complexity is there.
  


The problem for a compression tool testsuite is that compression formats 
are (I believe) defined as byte-streams or bit-streams.  Further, the 
generator(s) must be able to produce /incorrect/ output as well, in 
order to test error handling.



Further, GNU poke defines its own specialized programming language for
manipulating binary data.  Supplying generator programs in C (or C++)
for binary test data in a package that itself uses C (or C++) ensures
that every developer with the skills to improve or debug the package
can also understand the testcase generators.



Here we will have to disagree.

IMO it is precisely the many and tricky details on properly marshaling
binary data in general-purpose programming languages that would have
greater odds to lead to difficult to understand, difficult to maintain
and possibly buggy or malicious encoders.  The domain specific language
is here an advantage, not a liability.

This you need to do in C to encode and generate test data for a single
signed 32-bit NUMBER in an output file in a _more or less_ portable way:

  void generate_testdata (off_t offset, int endian, int number)
  {
int bin_flag = 0, fd;

  #ifdef _WIN32
int bin_flag = O_BINARY;
  #endif
fd = open ("testdata.bin", bin_flag, S_IWUSR);
if (fd == -1)
  fatal ("error generating data.");

if (endian == BIG)
  {
b[0] = (number >> 24) & 0xff;
b[1] = (number >> 16) & 0xff;
b[2] = (number >> 8) & 0xff;
b[3] = number & 0xff;
  }
else
  {
b[3] = (number >> 24) & 0xff;
b[2] = (number >> 16) & 0xff;
b[1] = (number >> 8) & 0xff;
b[0] = number & 0xff;
  }

lseek (fd, offset, SEEK_SET);
for (i = 0; i < 4; ++i)
  write (fd, &b[i], 1);
close (fd);
  }
  


While that is a nice general solution (aside from neglecting the 
declaration "uint8_t b[4];"; with "int b[4];", the code would only work 
on a little-endian processor; with no declaration, the compiler will 
reject it), a compression format would be expected to define the 
endianness of stored values, so the major branch in that function would 
collapse to just one of its alternatives.  Compression formats are 
generally defined as streams, so a different decomposition of the 
problem would likely make more sense:  (example untested)


   void emit_int32le (FILE * out, int value)
   {
 unsigned int R, i;

 for (R = (unsigned int)value, i = 0; i < 4; R = R >> 8, i++)
   if (fputc(R & 0xff, out) == EOF)
 fatal("error writing int32le");
   }
 

Other code handles opening OUT, or OUT is actually stdout and we are 
writing down a pipe or the shell handled opening the file.  (The main 
function can easily check that stdout is not a terminal and bail out if 
it is.)  Remember that I am suggesting test generator programs, which do 
not need to be as general as ordinary code, nor do they need the same 
level of user-friendliness, since they are expected to be run from 
scripts that encode the precise knowledge of how to call them. 

Re: GNU Coding Standards, automake, and the recent xz-utils backdoor

2024-04-01 Thread Richard Stallman
[[[ To any NSA and FBI agents reading my email: please consider]]]
[[[ whether defending the US Constitution against all enemies, ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

  > `distcheck` target's prominence to recommend it in the "Standard
  > Targets for All Users" section of the GCS? 

  > Replying as an Automake developer, I have nothing against it in
  > principle, but it's clearly up to the GNU coding standards
  > maintainers. As far as I know, that's still rms (for anything
  > substantive)

To make a change in the coding standards calls for a clear and
specific proposal.  If people think a change is desirable, I suggest
making one or more such proposals.

Now for a bit of speculation.  I speculate that a cracker was careless
and failed to adjust certain details of a bogus tar ball to be fully
consistent, and that `make distcheck' enabled someone to notice those
errors.

I don't have any real info about whether that is so.  If my
speculation is mistaken, please say so.  But supposing it is correct:

If we had publicized `make distcheck' more, would that have been
likely to help people detect the bogus tar ball sooner?  Or would it
have been likely to help the cracker be more careful about avoiding
such signs?  Would they balance out?


-- 
Dr Richard Stallman (https://stallman.org)
Chief GNUisance of the GNU Project (https://gnu.org)
Founder, Free Software Foundation (https://fsf.org)
Internet Hall-of-Famer (https://internethalloffame.org)





Re: GNU Coding Standards, automake, and the recent xz-utils backdoor

2024-04-01 Thread Bruno Haible
Eric Gallager wrote:
> What about a 3rd one of these prefixes: "novcs", to teach automake
> about which files belong in VCS or not? i.e. then you might have a
> variable name like:
> dist_novcs_DATA = foo bar baz
> ...which would indicate that foo, bar, and baz are data files that
> ought to be distributed in the release tarball, but not in the
> VCS-based one?

The maintainer already decides which files to put under version control,
on a per-file basis ('git add' vs. 'git rm'). Why should a maintainer
duplicate this information in a Makefile.am? The lists can then diverge,
leading to hassle.

> Or would it be easier to just teach automake to read
> .gitignore files and the like so that it can get that information from
> there?

Of course, if you want to have a Makefile target that needs the
information whether some file is in VCS, it should use 'git' commands
(such as 'git status') to determine this information. Whether it
additionally should read .gitignore files, can be debated on a case-by-case
basis.
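
For reference, the plumbing needed for that is short; a sketch of what a
hypothetical rule wanting the list of files outside version control might
run:

  # List working-tree files that are not tracked by Git.
  git ls-files --others --exclude-standard   # honours .gitignore
  git ls-files --others                      # includes ignored files too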

Bruno






Re: GNU Coding Standards, automake, and the recent xz-utils backdoor

2024-04-01 Thread Eric Gallager
On Mon, Apr 1, 2024 at 2:26 PM Zack Weinberg  wrote:
>
> On Mon, Apr 1, 2024, at 2:04 PM, Russ Allbery wrote:
> > "Zack Weinberg"  writes:
> >> It might indeed be worth thinking about ways to minimize the
> >> difference between the tarball "make dist" produces and the tarball
> >> "git archive" produces, starting from the same clean git checkout,
> >> and also ways to identify and audit those differences.
> >
> > There is extensive ongoing discussion of this on debian-devel. There's
> > no real consensus in that discussion, but I think one useful principle
> > that's emerged that doesn't disrupt the world *too* much is that the
> > release tarball should differ from the Git tag only in the form of
> > added files. Any files that are present in both Git and in the release
> > tarball should be byte-for-byte identical.
>
> That dovetails nicely with something I was thinking about myself.
> Obviously the result of "make dist" should be reproducible except for
> signatures; to the extent it isn't already, those are bugs in automake.
> But also, what if "make dist" produced *two* disjoint tarballs? One of
> which is guaranteed to be byte-for-byte identical to an archive of the
> VCS at the release tag (in some clearly documented fashion; AIUI, "git
> archive" does *not* do what we want).

Thinking about how to implement this: so, currently automake variables
have (at least) 2 special prefixes (that I can think of at the moment)
that control various automake behaviors: "dist" or "nodist" to control
inclusion in the distribution, and "noinst" to prevent installation.
What about a 3rd one of these prefixes: "novcs", to teach automake
about which files belong in VCS or not? i.e. then you might have a
variable name like:
dist_novcs_DATA = foo bar baz
...which would indicate that foo, bar, and baz are data files that
ought to be distributed in the release tarball, but not in the
VCS-based one? Or would it be easier to just teach automake to read
.gitignore files and the like so that it can get that information from
there?

> The other contains all the files that "autoreconf -i" or "./bootstrap.sh"
> or whatever would create, but nothing else.  Diffs could be provided
> for both tarballs, or only for the VCS-archive tarball, whichever turns
> out to be more compact (I can imagine the diff for the generated-files
> tarball turning out to be comparable in size to the generated-files
> tarball itself).
>
> This should make it much easier to find, and therefore audit, the pre-
> generated files, and to validate that there's no overlap. It would add
> an extra step for people who want to build from tarball, without having
> to install autoconf (or whatever) first -- but an easier extra step
> than, y'know, installing autoconf. :)  Conversely, people who want to
> build from tarballs but *not* use the pre-generated configure, etc,
> could now download the 'bare' tarball only.
>
> ("Couldn't those people just build from a git checkout?"  Not if they
> don't have the tooling for it, not during early stages of a distribution
> bootstrap, etc.  Also, the act of publishing a tarball that's a golden
> copy of the VCS at the release tag is valuable for archival purposes.)
>

Agreed on these points.

> zw



Should the GNU Coding Standards make a recommendation about aclocal's `--install` flag? (was: "Re: GNU Coding Standards, automake, and the recent xz-utils backdoor")

2024-04-01 Thread Eric Gallager
On Sun, Mar 31, 2024 at 6:19 PM Peter Johansson  wrote:
>
>
> On 1/4/24 06:00, Eric Gallager wrote:
>
> So, `aclocal` has a flag to control this behavior: specifically, its
> `--install` flag. Right now I don't see `aclocal` mentioned in the GNU
> Coding Standards at all. Should they be updated to include a
> recommendation as to whether it's better to put `--install` in
> `ACLOCAL_AMFLAGS` or not? Or would such a recommendation be a better
> fit for the `automake` manual (since that's where `aclocal` comes
> from)?
>
> A common scenario is that the embedded M4 files are not the latest version 
> and that the code in configure.ac is not compatible with newer versions that 
> might be installed. Setting the --install flag and making every developer 
> bootstrap with 'aclocal --install', or having anyone try to bootstrap an old 
> version of the project, would be very fragile. Also, 'aclocal --install' only 
> overwrites the embedded copy if the serial numbers in the files suggest the 
> installed file is a newer version than the embedded M4 file.

Note that there's some discussion ongoing on the bug-autoconf and
bug-gnulib mailing lists (which I'm not subscribed to, but will read
via the archives occasionally) regarding whether aclocal's current
handling of serial numbers is the correct way to behave or not, see
for example starting here:
https://lists.gnu.org/archive/html/bug-autoconf/2024-04/msg3.html

>
> Peter



Re: GNU Coding Standards, automake, and the recent xz-utils backdoor

2024-04-01 Thread Zack Weinberg
On Mon, Apr 1, 2024, at 2:04 PM, Russ Allbery wrote:
> "Zack Weinberg"  writes:
>> It might indeed be worth thinking about ways to minimize the
>> difference between the tarball "make dist" produces and the tarball
>> "git archive" produces, starting from the same clean git checkout,
>> and also ways to identify and audit those differences.
>
> There is extensive ongoing discussion of this on debian-devel. There's
> no real consensus in that discussion, but I think one useful principle
> that's emerged that doesn't disrupt the world *too* much is that the
> release tarball should differ from the Git tag only in the form of
> added files. Any files that are present in both Git and in the release
> tarball should be byte-for-byte identical.

That dovetails nicely with something I was thinking about myself.
Obviously the result of "make dist" should be reproducible except for
signatures; to the extent it isn't already, those are bugs in automake.
But also, what if "make dist" produced *two* disjoint tarballs? One of
which is guaranteed to be byte-for-byte identical to an archive of the
VCS at the release tag (in some clearly documented fashion; AIUI, "git
archive" does *not* do what we want).  The other contains all the files
that "autoreconf -i" or "./bootstrap.sh" or whatever would create, but
nothing else.  Diffs could be provided for both tarballs, or only for
the VCS-archive tarball, whichever turns out to be more compact (I can
imagine the diff for the generated-files tarball turning out to be
comparable in size to the generated-files tarball itself).

This should make it much easier to find, and therefore audit, the pre-
generated files, and to validate that there's no overlap. It would add
an extra step for people who want to build from tarball, without having
to install autoconf (or whatever) first -- but an easier extra step
than, y'know, installing autoconf. :)  Conversely, people who want to
build from tarballs but *not* use the pre-generated configure, etc,
could now download the 'bare' tarball only.

("Couldn't those people just build from a git checkout?"  Not if they
don't have the tooling for it, not during early stages of a distribution
bootstrap, etc.  Also, the act of publishing a tarball that's a golden
copy of the VCS at the release tag is valuable for archival purposes.)

zw



Re: GNU Coding Standards, automake, and the recent xz-utils backdoor

2024-04-01 Thread Russ Allbery
"Zack Weinberg"  writes:

> I have been thinking about this incident and this thread all weekend and
> have seen a lot of people saying things like "this is more proof that
> tarballs are a thing of the past and everyone should just build straight
> from git".  There are a bunch of reasons why one might disagree with
> this as a blanket statement, but I do think there's a valid point here:
> the malicious xz maintainer *might* have been caught earlier if they had
> committed the build-to-host.m4 modification to xz's VCS.  (Or they might
> not have!  Witness the three (and counting) malicious patches that they
> barefacedly submitted to *other* software and got accepted because the
> malice was subtle enough to pass through code review.)

> It might indeed be worth thinking about ways to minimize the difference
> between the tarball "make dist" produces and the tarball "git archive"
> produces, starting from the same clean git checkout, and also ways to
> identify and audit those differences.

There is extensive ongoing discussion of this on debian-devel.  There's no
real consensus in that discussion, but I think one useful principle that's
emerged that doesn't disrupt the world *too* much is that the release
tarball should differ from the Git tag only in the form of added files.
Any files that are present in both Git and in the release tarball should
be byte-for-byte identical.  That, in turn, allows distro tooling to
either use the Git tag and regenerate all the generated files, or start
from the release tarball, remove all the added files, and do the same.
But it still preserves an augmented release tarball for people building
from scratch who may not have all of the necessary tools available.
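
A sketch of how distro tooling might verify that principle (plain shell;
the tag, tarball and directory names are placeholders):

  #!/bin/sh
  # Every file present in both the Git tag and the release tarball must
  # be byte-for-byte identical; the tarball may only add files.
  set -e
  mkdir from-git
  git -C foo archive v1.0 | tar -x -C from-git
  tar -xzf foo-1.0.tar.gz              # unpacks into foo-1.0/
  (cd from-git && find . -type f) | while read -r f; do
    cmp "from-git/$f" "foo-1.0/$f" || echo "MISMATCH or MISSING: $f"
  done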

It's not a panacea (there are no panaceas), but it's less aggressive and
disruptive than some other ideas that have been proposed, and I think it's
mostly best practice already.

-- 
Russ Allbery (ea...@eyrie.org) <https://www.eyrie.org/~eagle/>



Re: GNU Coding Standards, automake, and the recent xz-utils backdoor

2024-04-01 Thread Zack Weinberg
On Sun, Mar 31, 2024, at 3:17 AM, Jacob Bachmeyer wrote:
> Eric Gallager wrote:
>> Specifically, what caught my attention was how the release tarball
>> containing the backdoor didn't match the history of the project in its
>> git repository. That made me think about automake's `distcheck`
>> target, whose entire purpose is to make it easier to verify that a
>> distribution tarball can be rebuilt from itself and contains all the
>> things it ought to contain.
>
> The problem is that a release tarball is a freestanding object, with no 
> dependency on the repository from which it was produced.  In this case, 
> the attacker added a bogus "update" of build-to-host.m4 from gnulib to 
> the release tarball, but that file is not stored in the Git repository.  
> This would not have tripped "make distcheck" because the crocked tarball 
> can indeed be used to rebuild another crocked tarball.
>
> As Alexandre Oliva mentioned in his reply, there is not really any good 
> way to prevent this, since the attacker could also patch the generated 
> configure script more directly.

I have been thinking about this incident and this thread all weekend and
have seen a lot of people saying things like "this is more proof that tarballs
are a thing of the past and everyone should just build straight from git".
There are a bunch of reasons why one might disagree with this as a blanket
statement, but I do think there's a valid point here: the malicious xz
maintainer *might* have been caught earlier if they had committed the
build-to-host.m4 modification to xz's VCS.  (Or they might not have!
Witness the three (and counting) malicious patches that they barefacedly
submitted to *other* software and got accepted because the malice was
subtle enough to pass through code review.)

It might indeed be worth thinking about ways to minimize the difference
between the tarball "make dist" produces and the tarball "git archive"
produces, starting from the same clean git checkout, and also ways to
identify and audit those differences.

...
> Maybe the best revision to the GNU Coding Standards would be that 
> releases should, if at all possible, contain only text?  Any binary 
> files needed for testing can be generated during "make check" if 
> necessary

I don't think this is a good idea.  It's only a speed bump for someone
trying to smuggle malicious data into a package (think "base64 -d") and
it makes life substantially harder for honest authors of programs that
work with binary data, and authors of material whose "source code"
(as GPLv3 uses that term) *is* binary data.  Consider pngsuite, for
instance (http://www.schaik.com/pngsuite/) -- it would be a *ton* of
work to convert each of these test PNG files into GNU Poke scripts,
and probably the result would be *less* ergonomic for purposes of
improving the test suite.

I would like to suggest that a more useful policy would be "files
written to $prefix by 'make install' should not have any data
dependency on files labeled as part of the package's testsuite".
That doesn't constrain honest authors and it seems within the
scope of what the reproducible builds people could test for.
(Build the package, install to nonce prefix 1, unpack the tarball
again, delete the test suite, build again, install to prefix 2, compare.)
Of course a sufficiently determined malicious coder could detect
the reproducible-build test environment, but unlike "no binary data"
this is a substantial difficulty increment.
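
A rough sketch of that procedure in plain shell; it assumes the testsuite
lives under tests/ and glosses over real-world wrinkles such as the prefix
being embedded in installed files or tests/ being wired into configure:

  #!/bin/sh
  # Install once with the testsuite present and once with it deleted,
  # then compare the two installations.  Names and paths are placeholders.
  set -e
  tar -xzf foo-1.0.tar.gz && mv foo-1.0 build1
  tar -xzf foo-1.0.tar.gz && mv foo-1.0 build2
  rm -rf build2/tests
  (cd build1 && ./configure --prefix=/tmp/prefix1 && make && make install)
  (cd build2 && ./configure --prefix=/tmp/prefix2 && make && make install)
  diff -r /tmp/prefix1 /tmp/prefix2 &&
    echo "installed files do not depend on the testsuite"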

zw



Re: GNU Coding Standards, automake, and the recent xz-utils backdoor

2024-04-01 Thread Jose E. Marchesi


> Jose E. Marchesi wrote:
>>> [...]
>>> 
>>>> I agree that distcheck is good but not a cure all.  Any static
>>>> system can be attacked when there is motive, and unit tests are
>>>> easily gamed.
>>>>   
>>> The issue seems to be releases containing binary data for unit tests,
>>> instead of source or scripts to generate that data.  In this case,
>>> that binary data was used to smuggle in heavily obfuscated object
>>> code.
>>> 
>>
>> As a side note, GNU poke (https://jemarch.net/poke) is good for
>> generating arbitrarily complex binary data from clear textual
>> descriptions.
>
> While it is suitable for that use, at last check poke is itself very
> complex, complete with its own JIT-capable VM.  This is good for
> interactive use, but I get nervous about complexity in testsuites,
> where simplicity can greatly aid debugging, and it /might/ be possible
> to hide a backdoor similarly in a poke pickle.  (This seems to be a
> general problem with powerful interactive editors.)

Yes, I agree simplicity is very desirable, in testsuites and actually
everywhere else.  I also am not fond of dragging in dependencies.

But I suppose we also agree that it is not possible to assemble
non-trivial binary data structures in a simple way, without somehow
moving the complexity of the encoding into some sort of generator, which
will not be simple.  The GDB testsuite, for example, ships with a DWARF
assembler written in around 3000 lines of Tcl.  Sure, it is simpler than
poke and doesn't drag in additional dependencies.  But it has to be
carefully maintained and kept up to date, and the complexity is there.

> Further, GNU poke defines its own specialized programming language for
> manipulating binary data.  Supplying generator programs in C (or C++)
> for binary test data in a package that itself uses C (or C++) ensures
> that every developer with the skills to improve or debug the package
> can also understand the testcase generators.

Here we will have to disagree.

IMO it is precisely the many and tricky details on properly marshaling
binary data in general-purpose programming languages that would have
greater odds to lead to difficult to understand, difficult to maintain
and possibly buggy or malicious encoders.  The domain specific language
is here an advantage, not a liability.

This you need to do in C to encode and generate test data for a single
signed 32-bit NUMBER in an output file in a _more or less_ portable way:

  void generate_testdata (off_t offset, int endian, int number)
  {
int bin_flag = 0, fd;

  #ifdef _WIN32
int bin_flag = O_BINARY;
  #endif
fd = open ("testdata.bin", bin_flag, S_IWUSR);
if (fd == -1)
  fatal ("error generating data.");

if (endian == BIG)
  {
b[0] = (number >> 24) & 0xff;
b[1] = (number >> 16) & 0xff;
b[2] = (number >> 8) & 0xff;
b[3] = number & 0xff;
  }
else
  {
b[3] = (number >> 24) & 0xff;
b[2] = (number >> 16) & 0xff;
b[1] = (number >> 8) & 0xff;
b[0] = number & 0xff;
  }

lseek (fd, offset, SEEK_SET);
for (i = 0; i < 4; ++i)
  write (fd, &b[i], 1);
close (fd);
  }

This is the Poke equivalent:

  fun generate_testdata = (offset<uint<64>,B> off, int<32> endian, int<32> number) void:
  {
    var fd = open ("testdata.bin");
    set_endian (endian);
    int<32> @ fd : off = number;
    close (fd);
  }

And thanks to the DSL, this scales nicely to more complex structures,
such as an ELF64 relocation instead of a signed 32-bit integer:

  fun generate_testdata = (offset<uint<64>,B> off, int<32> endian, int<32> number) void:
  {
    type Elf64_RelInfo =
      struct Elf64_Xword
      {
        uint<32> r_sym;
        uint<32> r_type;
      };

    type Elf64_Rela =
      struct
      {
        offset<uint<64>,B> r_offset;
        Elf64_RelInfo r_info;
        offset<uint<64>,B> r_addend;
      };

    var fd = open ("got32reloc.bin");
    set_endian (endian);
    Elf64_Rela @ 0#B
      = Elf64_Rela { r_info = Elf64_RelInfo { r_sym = 0xff00, r_type = 3 } };
    close (fd);
  }



Re: GNU Coding Standards, automake, and the recent xz-utils backdoor

2024-03-31 Thread Jacob Bachmeyer

Tomas Volf wrote:

On 2024-03-31 14:50:47 -0400, Eric Gallager wrote:
  

With a reproducible build system, multiple maintainers can "make dist"
and compare the output to cross-check for erroneous / malicious dist
environments.  Multiple signatures should be harder to compromise,
assuming each is independent and generally trustworthy.


This can only work if a package /has/ multiple active maintainers.
  

Well, other people besides the maintainers can also run `make dist`
and `make distcheck`. My idea was to get end-users in the habit of
running `make distcheck` themselves before installing stuff. And if
that's too much to ask of end users, I'd also point out that there are
multiple kinds of maintainer: besides the upstream maintainer, there
are also usually separate distro maintainers. Even if there's only 1
upstream maintainer, as was the case here, I still think that it would
be good to get distro maintainers in the habit of including `make
distcheck` as part of their own release process, before they accept
updates from upstream.



What would be helpful is if `make dist' would guarantee to produce the same
tarball (bit-to-bit) each time it is run, assuming the tooling is the same
version.  Currently I believe that is not the case (at least due to timestamps).
  


A "tardiff" tool that ignores timestamps would be a solution to that 
problem, but not to this backdoor.



Combined with GNU Guix that would allow a simple way to verify that `make dist'
was used, and the resulting artifact not tampered with, even without any central
signing.


The Guix "challenge" operation would not have detected this backdoor 
because *it* *was* *in* *the* *upstream* *release*.  The build service 
works from that release tarball and you build from that same release 
tarball.  GNU Guix ensures an equivalent build environment and your 
results *will* match---either the backdoor was not inserted or it was 
inserted in both builds.



The flow of the attack as I understand it was:

   (0)  (speculation on motivation) The attacker wanted a "Golden Key" 
to SSH and started looking for ways to backdoor sshd.
   (1)  The attacker starts a sockpuppet campaign and manages to get 
one of his sockpuppets appointed co-maintainer of xz-utils.
   (2)  [2023-06-27] The sockpuppet merges a pull request believed to 
be from another sockpuppet in commit 
ee44863ae88e377a5df10db007ba9bfadde3d314.
   (3)  [2024-02-15] The sockpuppet "updates m4/.gitignore" to add 
build-to-host.m4 to the list in commit 
4323bc3e0c1e1d2037d5e670a3bf6633e8a3031e.
   (4)  [2024-02-23] The sockpuppet adds 5 files to the xz-utils 
testsuite in commit cf44e4b7f5dfdbf8c78aef377c10f71e274f63c0.
   (5)  [2024-03-08] To cover tracks, the sockpuppet finally adds a 
test using bad-3-corrupt_lzma2.xz in commit 
a3a29bbd5d86183fc7eae8f0182dace374e778d8.
   (6)  [2024-03-08] The sockpuppet revises two of those files with a 
lame excuse in commit a3a29bbd5d86183fc7eae8f0182dace374e778d8.


The quick analysis of the Git history supporting steps 2 - 6 above has 
turned up another interesting detail:  no version of configure.ac 
actually committed ever used the gl_BUILD_TO_HOST macro.  An analysis 
found on pastebin noted that build-to-host.m4 is a dependency of 
gettext.m4.  Following up finds commit 
3adaddd73c8edcceaed059e859bd5262df65fc5a of 2023-02-18 in the GNU 
gettext repository introduced the use of gl_BUILD_TO_HOST, apparently as 
part of moving some existing path translation logic to gnulib and 
generalizing it for use elsewhere.  This commit is innocent (it is 
*extremely* unlikely that Bruno Haible was involved in the backdoor 
campaign) and also explains why the backdoor was checking for "dnl 
Convert it to C string syntax." in m4/gettext.m4:  that comment was 
removed in the same commit that switched to using gl_BUILD_TO_HOST.  The 
change to gettext also occurred about a year before the sockpuppet began 
to take advantage of it.


It almost "feels like" the attacker was waiting for an opportunity to 
make plausible changes to autoconf macros and finally got one when 
updating the m4/ files for the 5.6.0 release.  Could someone with the 
release tarballs confirm that m4/gettext.m4 was updated between 
v5.5.2beta and v5.6.0?  I doubt the entire backdoor was developed in the 
week between those two commits.  In fact, the timing around introducing 
ifuncs suggests to me that the binary blob was at least well into 
development by mid-2023.


The commit message at step 2 claims that using ifuncs with 
-fsanitize=address causes segfaults.  If this is true generally, the 
glibc team should probably reconsider whether the abuse potential is 
worth the benefit of the feature and possibly investigate how the 
feature was introduced to glibc.  If this was an excuse, it provided a 
clever way to prevent oss-fuzz from finding the backdoor, as disabling 
ifuncs pr

Re: GNU Coding Standards, automake, and the recent xz-utils backdoor

2024-03-31 Thread Jacob Bachmeyer

Eric Gallager wrote:

On Sun, Mar 31, 2024 at 3:20 AM Jacob Bachmeyer  wrote:
  

dherr...@tentpost.com wrote:


[...]

The issue seems to be releases containing binary data for unit tests,
instead of source or scripts to generate that data.  In this case, that
binary data was used to smuggle in heavily obfuscated object code.

[...]



Maybe this is something that the GNU project could start making
stronger recommendations about.
  


The key issue seems to be generating binary test data during `make` or 
`make check`, using GNU poke, GNU Awk, Perl, Tcl, small C programs, or 
something else, instead of packaging it in the release.  The xz-utils 
backdoor was smuggled into the repository wrapped in compressed test data.



With a reproducible build system, multiple maintainers can "make dist"
and compare the output to cross-check for erroneous / malicious dist
environments.  Multiple signatures should be harder to compromise,
assuming each is independent and generally trustworthy.
  

This can only work if a package /has/ multiple active maintainers.



Well, other people besides the maintainers can also run `make dist`
and `make distcheck`. My idea was to get end-users in the habit of
running `make distcheck` themselves before installing stuff. And if
that's too much to ask of end users, I'd also point out that there are
multiple kinds of maintainer: besides the upstream maintainer, there
are also usually separate distro maintainers. Even if there's only 1
upstream maintainer, as was the case here, I still think that it would
be good to get distro maintainers in the habit of including `make
distcheck` as part of their own release process, before they accept
updates from upstream.
  


The problem with that is that `make distcheck` only verifies that the 
working tree can produce a reasonable release tarball.  The backdoored 
xz-utils releases *would* *have* *passed* *this* *test* as far as I can 
determine.  It catches errors like omitting files from the lists in 
Makefile.am.  It will *not* catch a modified m4 file or questionable 
test data that has been properly listed as part of the release.



Maybe GNU should establish a cross-verification signing standard and
"dist verification service" that automates this process?  Point it to
a repo and tag, request a signed hash of the dist package...  Then
downstream projects could check package signatures from both the
maintainer and such third-party verifiers to check that nothing was
inserted outside of version control.
  

Essentially, this would be an automated release building service:  upon
request, make a Git checkout, run autogen.sh or equivalent, make dist,
and publish or hash the result.  The problem is that an attacker who
manages to gain commit access to a repository may be able to launch
attacks on the release building service, since "make dist" can run
scripts.  The service could probably mount the working filesystem noexec
since preparing source releases should not require running (non-system)
binaries and scripts can be run by directly feeding them into their
interpreters even if the filesystem is mounted noexec, but this still
leaves all available interpreters and system tools potentially available.



Well, it'd at least make things more difficult for the attacker, even
if it wouldn't stop them completely.
  


Actually, no, it would open a *new* target for attackers---the release 
building service itself.  Mounting the scratchpad noexec would help to 
complicate attacks on that service, but right now there is *no* central 
point for an attacker to hit to compromise releases.  If a central 
release building service were set up, it would be a target, and an 
attacker able to arrange a persistent compromise of the service could 
then tamper with later releases as they are built.  This should be 
fairly easy to catch, if an honest maintainer has a secure environment, 
("Why the  does the central release service tarball not match mine?  
And what the **** is the extra code in this diff between its tarball 
and mine!?") but there is a risk that, especially for large projects, 
maintainers start relying on the central release service instead of 
building their own tarballs.


The problem here was not a maintainer with a compromised system---it 
seems that "Jia Tan" was a malefactor's sock puppet from the start.



-- Jacob



Re: GNU Coding Standards, automake, and the recent xz-utils backdoor

2024-03-31 Thread Jacob Bachmeyer

Jose E. Marchesi wrote:

[...]


I agree that distcheck is good but not a cure all.  Any static
system can be attacked when there is motive, and unit tests are
easily gamed.
  

The issue seems to be releases containing binary data for unit tests,
instead of source or scripts to generate that data.  In this case,
that binary data was used to smuggle in heavily obfuscated object
code.



As a side note, GNU poke (https://jemarch.net/poke) is good for
generating arbitrarily complex binary data from clear textual
descriptions.


While it is suitable for that use, at last check poke is itself very 
complex, complete with its own JIT-capable VM.  This is good for 
interactive use, but I get nervous about complexity in testsuites, where 
simplicity can greatly aid debugging, and it /might/ be possible to hide 
a backdoor similarly in a poke pickle.  (This seems to be a general 
problem with powerful interactive editors.)


Further, GNU poke defines its own specialized programming language for 
manipulating binary data.  Supplying generator programs in C (or C++) 
for binary test data in a package that itself uses C (or C++) ensures 
that every developer with the skills to improve or debug the package can 
also understand the testcase generators.



-- Jacob




Re: GNU Coding Standards, automake, and the recent xz-utils backdoor

2024-03-31 Thread Peter Johansson



On 1/4/24 06:00, Eric Gallager wrote:

So, `aclocal` has a flag to control this behavior: specifically, its
`--install` flag. Right now I don't see `aclocal` mentioned in the GNU
Coding Standards at all. Should they be updated to include a
recommendation as to whether it's better to put `--install` in
`ACLOCAL_AMFLAGS` or not? Or would such a recommendation be a better
fit for the `automake` manual (since that's where `aclocal` comes
from)?

A common scenario is that the embedded M4 files are not the latest 
version and that the code in configure.ac is not compatible with newer 
versions that might be installed. Setting the --install flag and making 
every developer bootstrap with 'aclocal --install' would be very fragile, 
as would anyone trying to bootstrap an old version of the project. Also, 
'aclocal --install' only overwrites the embedded copy if the serial 
numbers in the files suggest the installed file is a newer version than 
the embedded M4 file.


Peter


Re: GNU Coding Standards, automake, and the recent xz-utils backdoor

2024-03-31 Thread Tomas Volf
On 2024-03-31 14:50:47 -0400, Eric Gallager wrote:

> > > With a reproducible build system, multiple maintainers can "make dist"
> > > and compare the output to cross-check for erroneous / malicious dist
> > > environments.  Multiple signatures should be harder to compromise,
> > > assuming each is independent and generally trustworthy.
> >
> > This can only work if a package /has/ multiple active maintainers.
>
> Well, other people besides the maintainers can also run `make dist`
> and `make distcheck`. My idea was to get end-users in the habit of
> running `make distcheck` themselves before installing stuff. And if
> that's too much to ask of end users, I'd also point out that there are
> multiple kinds of maintainer: besides the upstream maintainer, there
> are also usually separate distro maintainers. Even if there's only 1
> upstream maintainer, as was the case here, I still think that it would
> be good to get distro maintainers in the habit of including `make
> distcheck` as part of their own release process, before they accept
> updates from upstream.

What would be helpful is if `make dist' would guarantee to produce the same
tarball (bit-to-bit) each time it is run, assuming the tooling is the same
version.  Currently I believe that is not the case (at least due to timestamps).
Combined with GNU Guix, that would allow a simple way to verify that `make dist'
was used and the resulting artifact not tampered with, even without any central
signing.

Maybe a new `dist-reproducible' automake option could do two things:

1. Try to make things under its control reproducible (e.g.: set timestamps to 0)
2. `make distcheck' would build the archive twice (sequentially), checking that
   the hash matches.
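
As a rough sketch of check 2 (the package name and version are only
placeholders, and this assumes point 1 already makes the tarball
contents deterministic), the double build could amount to:

  $ ./configure && make dist
  $ sha256sum package-1.0.tar.gz > dist.sha256
  $ rm package-1.0.tar.gz
  $ make dist
  $ sha256sum -c dist.sha256 || echo "make dist is not reproducible"

Today the final check would usually fail because of the timestamp (and
similar metadata) differences mentioned above, which is exactly what
point 1 would have to address.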

Have a nice day,
Tomas Volf

--
There are only two hard things in Computer Science:
cache invalidation, naming things and off-by-one errors.




Re: GNU Coding Standards, automake, and the recent xz-utils backdoor

2024-03-31 Thread Eric Gallager
On Sun, Mar 31, 2024 at 3:54 PM Russ Allbery  wrote:
>
> Eric Gallager  writes:
>
> > Well, other people besides the maintainers can also run `make dist` and
> > `make distcheck`. My idea was to get end-users in the habit of running
> > `make distcheck` themselves before installing stuff. And if that's too
> > much to ask of end users, I'd also point out that there are multiple
> > kinds of maintainer: besides the upstream maintainer, there are also
> > usually separate distro maintainers. Even if there's only 1 upstream
> > maintainer, as was the case here, I still think that it would be good to
> > get distro maintainers in the habit of including `make distcheck` as
> > part of their own release process, before they accept updates from
> > upstream.
>
> Surely the distro maintainer should just delete all of those files and
> regenerate them from already-packaged tools, though?  This is already
> partially done by, e.g., Debian and addresses the case of malicious code
> embedded in the configure script.
>
> Here it wouldn't have helped because, knowing that the configure script
> would be regenerated, the malicious code was embedded in M4 files, but M4
> files that come from known external sources could be retrieved from those
> sources rather than using the copies inside the package (this is a whole
> can of worms, I realize).

So, `aclocal` has a flag to control this behavior: specifically, its
`--install` flag. Right now I don't see `aclocal` mentioned in the GNU
Coding Standards at all. Should they be updated to include a
recommendation as to whether it's better to put `--install` in
`ACLOCAL_AMFLAGS` or not? Or would such a recommendation be a better
fit for the `automake` manual (since that's where `aclocal` comes
from)?

> And, more relevantly to this specific attack, distro maintainers can verify
> that all files in the release tarball are either missing from Git or exactly
> match the file in Git with the appropriate tag.
>
> If there is an upstream Git repository, distro maintainers should probably
> just package the signed Git tag, not the release tarball, because it
> avoids a whole class of problems like this and ensures that the artifact
> that's packaged at least has a Git history and doesn't have changes
> injected without version control into the release artifact.
>
> I think the distro problem is in some sense easier.  The problem of the
> individual downloader who may not have the tools required to bootstrap
> from Git available is much harder.  (But also there aren't the same
> advantages to the attacker in compromising those folks, since there isn't
> the same magnification of scale as compromising the distro packages.)
>
> > Well, it'd at least make things more difficult for the attacker, even if
> > it wouldn't stop them completely.
>
> This is the whole field of security.  Nothing stops attackers completely;
> more difficult is the best that one can do.
>
> --
> Russ Allbery (ea...@eyrie.org) <https://www.eyrie.org/~eagle/>



Re: GNU Coding Standards, automake, and the recent xz-utils backdoor

2024-03-31 Thread Russ Allbery
Eric Gallager  writes:

> Well, other people besides the maintainers can also run `make dist` and
> `make distcheck`. My idea was to get end-users in the habit of running
> `make distcheck` themselves before installing stuff. And if that's too
> much to ask of end users, I'd also point out that there are multiple
> kinds of maintainer: besides the upstream maintainer, there are also
> usually separate distro maintainers. Even if there's only 1 upstream
> maintainer, as was the case here, I still think that it would be good to
> get distro maintainers in the habit of including `make distcheck` as
> part of their own release process, before they accept updates from
> upstream.

Surely the distro maintainer should just delete all of those files and
regenerate them from already-packaged tools, though?  This is already
partially done by, e.g., Debian and addresses the case of malicious code
embedded in the configure script.

Here it wouldn't have helped because, knowing that the configure script
would be regenerated, the malicious code was embedded in M4 files, but M4
files that come from known external sources could be retrieved from those
sources rather than using the copies inside the package (this is a whole
can of worms, I realize).  And, more relevantly to this specific attack,
distro maintainers can verify that all files in the release tarball are
either missing from Git or exactly match the file in Git with the
appropriate tag.
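
A rough sketch of that check (the URL, package name, and tag are
placeholders, and the .git directory is excluded from the comparison):

  $ tar xf package-1.0.tar.gz
  $ git clone --branch v1.0 --depth 1 https://example.org/package.git package-git
  $ diff -r -q -x .git package-git package-1.0

"Files ... differ" lines are the red flags; "Only in package-1.0" lines
list the generated files (configure, aclocal.m4, bundled .m4 macros, and
so on) that still have to be reviewed or regenerated separately.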

If there is an upstream Git repository, distro maintainers should probably
just package the signed Git tag, not the release tarball, because it
avoids a whole class of problems like this and ensures that the artifact
that's packaged at least has a Git history and doesn't have changes
injected without version control into the release artifact.

I think the distro problem is in some sense easier.  The problem of the
individual downloader who may not have the tools required to bootstrap
from Git available is much harder.  (But also there aren't the same
advantages to the attacker in compromising those folks, since there isn't
the same magnification of scale as compromising the distro packages.)

> Well, it'd at least make things more difficult for the attacker, even if
> it wouldn't stop them completely.

This is the whole field of security.  Nothing stops attackers completely;
more difficult is the best that one can do.

-- 
Russ Allbery (ea...@eyrie.org) <https://www.eyrie.org/~eagle/>



Re: GNU Coding Standards, automake, and the recent xz-utils backdoor

2024-03-31 Thread Eric Gallager
On Sun, Mar 31, 2024 at 3:20 AM Jacob Bachmeyer  wrote:
>
> dherr...@tentpost.com wrote:
> > On 2024-03-30 18:25, Bruno Haible wrote:
> >> Eric Gallager wrote:
> >>>
> >>> Hm, so should automake's `distcheck` target be updated to perform
> >>> these checks as well, then?
> >>
> >> The first mentioned check can not be automated. ...
> >>
> >> The second mentioned check could be done by the maintainer, ...
> >
> >
> > I agree that distcheck is good but not a cure all.  Any static system
> > can be attacked when there is motive, and unit tests are easily gamed.
>
> The issue seems to be releases containing binary data for unit tests,
> instead of source or scripts to generate that data.  In this case, that
> binary data was used to smuggle in heavily obfuscated object code.
>
> The best analysis in one place that I have found so far is
> <https://gynvael.coldwind.pl/?lang=en&id=782>.  In brief, grep is
> used to locate the main backdoor files by searching for marker strings.
> After running tests/files/bad-3-corrupt_lzma2.xz through tr(1), it
> becomes a /valid/ xz file that decompresses to a shell script that
> extracts a second shell script from part of the compressed data in
> tests/files/good-large_compressed.lzma and pipes it to a shell.  That
> second script has two major functions:  first, it searches the test
> files for four six-byte markers, and it then extracts and decrypts
> (using a simple RC4-alike implemented in Awk) the binary backdoor also
> found in tests/files/good-large_compressed.lzma.  The six-byte markers
> mark beginning and end of raw LZMA2 streams obfuscated with a simple
> substitution cipher.  Any such streams found would be decompressed and
> read by the shell, but neither of the known crocked releases had any
> files containing those markers.  The binary backdoor is an x86-64 object
> that gets unpacked into liblzma_la-crc64-fast.o, unless m4/gettext.m4
> contains "dnl Convert it to C string syntax." which is a clever flag
> because about no one actually checks that those m4 files in release
> tarballs actually match what the GNU project distributes.

Maybe this is something that the GNU project could start making
stronger recommendations about.

> The object itself is just the backdoor and presumably provides the
> symbol _get_cpuid as its entrypoint, since the unpacker script patches
> the src/liblzma/check/crc{64,32}_fast.c files in a pipeline to add calls to
> that function and drops the compiled objects in .libs/.  Running make
> will then skip building those objects, since they are already
> up-to-date, and the backdoored objects get linked into the final binary.
>
> Commit 6e636819e8f070330d835fce46289a3ff72a7b89
> (<https://git.tukaani.org/?p=xz.git;a=commitdiff;h=6e636819e8f070330d835fce46289a3ff72a7b89>)
> was an update to the backdoor.  The commit message is suspicious,
> claiming the use of "a constant seed" to generate reproducible test
> files, but /not/ declaring how the files were produced, which of course
> prevents reproducibility.
>
> > With a reproducible build system, multiple maintainers can "make dist"
> > and compare the output to cross-check for erroneous / malicious dist
> > environments.  Multiple signatures should be harder to compromise,
> > assuming each is independent and generally trustworthy.
>
> This can only work if a package /has/ multiple active maintainers.

Well, other people besides the maintainers can also run `make dist`
and `make distcheck`. My idea was to get end-users in the habit of
running `make distcheck` themselves before installing stuff. And if
that's too much to ask of end users, I'd also point out that there are
multiple kinds of maintainer: besides the upstream maintainer, there
are also usually separate distro maintainers. Even if there's only 1
upstream maintainer, as was the case here, I still think that it would
be good to get distro maintainers in the habit of including `make
distcheck` as part of their own release process, before they accept
updates from upstream.

>
> You also have a small misunderstanding here:  "make dist" prepares a
> (source) release tarball, not a binary build, so this is a
> closely-related issue but actually distinct from reproducible builds.
> Also easier to solve, since we only have to make the source tarball
> reproducible.
>
> > Maybe GNU should establish a cross-verification signing standard and
> > "dist verification service" that automates this process?  Point it to
> > a repo and tag, request a signed hash of the dist package...  Then
> > downstream projects could check package signatures from both the
> &

Re: GNU Coding Standards, automake, and the recent xz-utils backdoor

2024-03-31 Thread Jose E. Marchesi


> [...]
>> I agree that distcheck is good but not a cure all.  Any static
>> system can be attacked when there is motive, and unit tests are
>> easily gamed.
>
> The issue seems to be releases containing binary data for unit tests,
> instead of source or scripts to generate that data.  In this case,
> that binary data was used to smuggle in heavily obfuscated object
> code.

As a side note, GNU poke (https://jemarch.net/poke) is good for
generating arbitrarily complex binary data from clear textual
descriptions.



Re: GNU Coding Standards, automake, and the recent xz-utils backdoor

2024-03-31 Thread Alfred M. Szmidt
   Bluntly, I don't think it would help with security.  The attacker would
   just have to disable or adjust the distcheck target to seemingly pass.

Yeah, it should be noted that the way the backdoor got into the code
was through the _co-maintainer_ -- distcheck or not would not have
mattered; automake or not would not have mattered.  The individual
could have sneaked code changes into the release tarball just as
well -- GitHub presented two sets of files one could download (direct
from git, and "release").

The deviousness of this backdoor should not be understated; it was a
long game, over two years in the making, and technological improvements
will simply not mitigate it.

   Relying on something in a code repository to tell whether the repository
   is secure is akin to tying a dog with sausage.

   For security proper, the verification code needs to be held elsewhere,
   not compromisable along with the thing it's supposed to verify.

   Analogously, you don't run a rootkit checker on the system that's
   potentially compromised, because the rootkit may hide itself; you boot
   off secure media and then use the tools in it to look for the rootkit in
   the potentially-compromised system, *without* handing control over to
   it.



Re: GNU Coding Standards, automake, and the recent xz-utils backdoor

2024-03-31 Thread Alfred M. Szmidt
   > It is not yet clear if the 
   > maintainer intentionally did this, or if the changes were introduced via 
   > a compromise of his computer.

   I think it is pretty clear by now. [1][2][3]

There is a bit more to it all than just this -- the maintainer
(Lasse Collin) wasn't responsible; the co-maintainer, JiaT75 (or
whatever you might call the person), was, from the looks of it.

   [1] https://boehs.org/node/everything-i-know-about-the-xz-backdoor
   [2] https://news.ycombinator.com/item?id=39865810
   [3] https://www.youtube.com/watch?v=Kw8MCN5uJPg









Re: GNU Coding Standards, automake, and the recent xz-utils backdoor

2024-03-31 Thread Bob Friesenhahn

I think it is pretty clear by now. [1][2][3]

[1] https://boehs.org/node/everything-i-know-about-the-xz-backdoor
[2] https://news.ycombinator.com/item?id=39865810
[3] https://www.youtube.com/watch?v=Kw8MCN5uJPg


There is not much one can do when a maintainer with signing/release 
power does something intentionally wrong.


My GraphicsMagick oss-fuzz builds include xz and are still working (but 
with a few security issues open due to problems in xz). The URL used is 
https://github.com/xz-mirror/xz. When I visit that URL, I see this 
message "This repository has been archived by the owner on Aug 28, 2023. 
It is now read-only.", so it seems that this is a stale repository.  The 
upstream repository to it has been disabled.


Regardless, how can Autotools-based projects be more assured of 
security, given how they are selectively assembled from "parts"? I have 
already been concerned about using any Autotools packages provided by 
the operating system, since they are likely dated, but may also have 
been modified by the distribution package maintainers.


Besides GNU Autoconf, Automake, and libtool, there are also several 
popular Autoconf macro archives. Sometimes components are automatically 
downloaded via build scripts. This is not at all a "safe" situation. 
There is quite a lot of trust, which may be unwarranted.


Should the GNU project itself perform an independent file verification 
of included Autotools files (Autoconf .m4 files, scripts, libtool, etc.) 
for all of the packages it distributes? Besides verifying the original 
files which are re-distributed, it might be necessary to verify that 
generated files are correct, and are in fact based on the files which 
are re-distributed.


Bob

--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
Public Key, http://www.simplesystems.org/users/bfriesen/public-key.txt




Re: GNU Coding Standards, automake, and the recent xz-utils backdoor

2024-03-31 Thread Bruno Haible
Bob Friesenhahn wrote:
> It is not yet clear if the 
> maintainer intentionally did this, or if the changes were introduced via 
> a compromise of his computer.

I think it is pretty clear by now. [1][2][3]

[1] https://boehs.org/node/everything-i-know-about-the-xz-backdoor
[2] https://news.ycombinator.com/item?id=39865810
[3] https://www.youtube.com/watch?v=Kw8MCN5uJPg






Re: GNU Coding Standards, automake, and the recent xz-utils backdoor

2024-03-31 Thread Bob Friesenhahn

On 3/30/24 19:00, Alexandre Oliva wrote:


Bluntly, I don't think it would help with security.  The attacker would
just have to disable or adjust the distcheck target to seemingly pass.

Relying on something in a code repository to tell whether the repository
is secure is akin to tying a dog with sausage.

For security proper, the verification code needs to be held elsewhere,
not compromisable along with the thing it's supposed to verify.

Analogously, you don't run a rootkit checker on the system that's
potentially compromised, because the rootkit may hide itself; you boot
off secure media and then use the tools in it to look for the rootkit in
the potentially-compromised system, *without* handing control over to
it.


I am on the oss-security mailing list where this issue was perhaps first 
publicly reported, and has been discussed/analyzed furiously.


My first thought was that Autoconf is a relatively trivial attack vector 
since it is so complex and the syntax used for some parts (e.g. m4 and 
shell scripts) is so arcane.  In particular, it is common for Autotools 
stuff to be installed on a computer (e.g. by installing a package from 
an OS package manager) and then used while building.  For example, there 
are large collections of ".m4" files installed.  If one of the m4 files 
consumed has been modified, then the resulting configure script has been 
modified.


It may be that an OS package manager has the ability to validate already 
installed files, but this is not likely to be used.


If installed files were themselves independently signed (or sha256s of 
the files are contained in a signed manifest), and Autotools was able to 
validate them while copying into a project ("bootstrapping"), then at 
least there is some assurance that the many files which were consumed 
have not been subverted.  The same signed data could be used to detect 
if the files are modified after the initial bootstrap.
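
A very rough sketch of that idea, with invented paths and assuming an
already-trusted signing key (who signs the manifest, and where the
verification hooks into aclocal/automake, are the open questions):

  # on the signing side, e.g. the distributor of the m4 files:
  $ find /usr/share/aclocal -name '*.m4' | sort | xargs sha256sum > m4.manifest
  $ gpg --detach-sign m4.manifest

  # later, before (and again after) bootstrapping a project:
  $ gpg --verify m4.manifest.sig m4.manifest
  $ sha256sum -c --quiet m4.manifest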


It seems common for OS distributions to modify some of the files 
(especially libtool related) so they differ from the original GNU versions.


The problem which happened with the xz utils software is that the 
maintainer signed a release package with his PGP key, but there were 
subtle changes in the released product.  It is not yet clear if the 
maintainer intentionally did this, or if the changes were introduced via 
a compromise of his computer.


Bob

--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
Public Key, http://www.simplesystems.org/users/bfriesen/public-key.txt




Re: GNU Coding Standards, automake, and the recent xz-utils backdoor

2024-03-31 Thread Jacob Bachmeyer

dherr...@tentpost.com wrote:

On 2024-03-30 18:25, Bruno Haible wrote:

Eric Gallager wrote:


Hm, so should automake's `distcheck` target be updated to perform
these checks as well, then?


The first mentioned check can not be automated. ...

The second mentioned check could be done by the maintainer, ...



I agree that distcheck is good but not a cure all.  Any static system 
can be attacked when there is motive, and unit tests are easily gamed.


The issue seems to be releases containing binary data for unit tests, 
instead of source or scripts to generate that data.  In this case, that 
binary data was used to smuggle in heavily obfuscated object code.


The best analysis in one place that I have found so far is 
<https://gynvael.coldwind.pl/?lang=en&id=782>.  In brief, grep is 
used to locate the main backdoor files by searching for marker strings.  
After running tests/files/bad-3-corrupt_lzma2.xz through tr(1), it 
becomes a /valid/ xz file that decompresses to a shell script that 
extracts a second shell script from part of the compressed data in 
tests/files/good-large_compressed.lzma and pipes it to a shell.  That 
second script has two major functions:  first, it searches the test 
files for four six-byte markers, and it then extracts and decrypts 
(using a simple RC4-alike implemented in Awk) the binary backdoor also 
found in tests/files/good-large_compressed.lzma.  The six-byte markers 
mark beginning and end of raw LZMA2 streams obfuscated with a simple 
substitution cipher.  Any such streams found would be decompressed and 
read by the shell, but neither of the known crocked releases had any 
files containing those markers.  The binary backdoor is an x86-64 object 
that gets unpacked into liblzma_la-crc64-fast.o, unless m4/gettext.m4 
contains "dnl Convert it to C string syntax." which is a clever flag 
because about no one actually checks that those m4 files in release 
tarballs actually match what the GNU project distributes.  The object 
itself is just the backdoor and presumably provides the symbol 
_get_cpuid as its entrypoint, since the unpacker script patches the 
src/liblzma/check/crc{64,32}_fast.c files in a pipeline to add calls to 
that function and drops the compiled objects in .libs/.  Running make 
will then skip building those objects, since they are already 
up-to-date, and the backdoored objects get linked into the final binary.


Commit 6e636819e8f070330d835fce46289a3ff72a7b89 
(<https://git.tukaani.org/?p=xz.git;a=commitdiff;h=6e636819e8f070330d835fce46289a3ff72a7b89>) 
was an update to the backdoor.  The commit message is suspicious, 
claiming the use of "a constant seed" to generate reproducible test 
files, but /not/ declaring how the files were produced, which of course 
prevents reproducibility.


With a reproducible build system, multiple maintainers can "make dist" 
and compare the output to cross-check for erroneous / malicious dist 
environments.  Multiple signatures should be harder to compromise, 
assuming each is independent and generally trustworthy.


This can only work if a package /has/ multiple active maintainers.

You also have a small misunderstanding here:  "make dist" prepares a 
(source) release tarball, not a binary build, so this is a 
closely-related issue but actually distinct from reproducible builds.  
Also easier to solve, since we only have to make the source tarball 
reproducible.


Maybe GNU should establish a cross-verification signing standard and 
"dist verification service" that automates this process?  Point it to 
a repo and tag, request a signed hash of the dist package...  Then 
downstream projects could check package signatures from both the 
maintainer and such third-party verifiers to check that nothing was 
inserted outside of version control.


Essentially, this would be an automated release building service:  upon 
request, make a Git checkout, run autogen.sh or equivalent, run "make 
dist", and publish or hash the result.  The problem is that an attacker 
who manages to gain commit access to a repository may be able to launch 
attacks on the release building service, since "make dist" can run 
scripts.  The service could probably mount its working filesystem 
noexec, since preparing source releases should not require running 
(non-system) binaries; but scripts can still be run by feeding them 
directly into their interpreters even when the filesystem is mounted 
noexec, so this still leaves all available interpreters and system 
tools potentially usable by an attacker.
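
(For what it is worth, the noexec scratchpad itself is simple to set up; a
sketch, with the mount point and size invented:

  $ mount -t tmpfs -o noexec,nosuid,nodev,size=2g tmpfs /srv/dist-build

The limitation just described -- interpreters and system tools staying
available -- is what such a mount cannot fix.)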



-- Jacob



Re: GNU Coding Standards, automake, and the recent xz-utils backdoor

2024-03-31 Thread Jacob Bachmeyer

Eric Gallager wrote:


Specifically, what caught my attention was how the release tarball
containing the backdoor didn't match the history of the project in its
git repository. That made me think about automake's `distcheck`
target, whose entire purpose is to make it easier to verify that a
distribution tarball can be rebuilt from itself and contains all the
things it ought to contain.


The problem is that a release tarball is a freestanding object, with no 
dependency on the repository from which it was produced.  In this case, 
the attacker added a bogus "update" of build-to-host.m4 from gnulib to 
the release tarball, but that file is not stored in the Git repository.  
This would not have tripped "make distcheck" because the crocked tarball 
can indeed be used to rebuild another crocked tarball.


As Alexandre Oliva mentioned in his reply, there is not really any good 
way to prevent this, since the attacker could also patch the generated 
configure script more directly.  (I seem to remember past incidents 
where tampered release tarballs had configure scripts that would 
download and run shell scripts.  If you ran configure as root, well...)  
The *user* could catch issues like this backdoor, since the backdoor 
appears (based on what I have read so far) to materialize certain object 
files while configure is running, while `find . -iname '*.o'` /should/ 
return nothing before make is run.  This also suggests that running 
"make clean" after configure would kill at least this backdoor.  A 
*very* observant (unreasonably so) user might notice that "make" did not 
build the objects that the backdoor provided.
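
In concrete terms, that user-side check is as simple as:

  $ ./configure
  $ find . -iname '*.o'    # should print nothing before make has ever run
  $ make clean             # as noted, would also have removed the planted objects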


Of course, an attacker could sneak around this as well by moving the 
process for unpacking the backdoor object to a Makefile rule, but that 
is more likely to "stick out" to an observant user, as well as being an 
easy target for automated analysis ("Which files have 'special' rules?") 
since you cannot obfuscate those from make(1) and expect them to still 
work.  In this case, the backdoor was ultimately discovered when it 
caused performance problems in sshd, which should not be using liblzma 
at all, but gets linked with it courtesy of libsystemd on major 
GNU/Linux distributions.  Yes, this means that systemd is a contributing 
factor to this incident, and that is aggravated by its unnecessary use 
of excessive dependencies.  (Sending a notification that a daemon is 
ready should /not/ require compression support of any type.  The 
"katamari" architecture model used in systemd had the effect here of 
broadening the supply-chain attack surface for OpenSSH sshd to include 
xz-utils, which is insane.)


The bulk of the attack payload seems to have been stored in the Git 
repository, disguised as binary test data in files 
tests/files/{bad-3-corrupt_lzma2.xz,good-large_compressed.lzma}.  The 
modified build-to-host.m4 merely added code to configure to start the 
process of unpacking the backdoor.  In a build from Git, the legitimate 
build-to-host.m4 would get copied in from gnulib and the backdoor would 
remain hidden.


Maybe the best revision to the GNU Coding Standards would be that 
releases should, if at all possible, contain only text?  Any binary 
files needed for testing can be generated during "make check" if 
necessary, with generator programs packaged (as source or scripts) in 
the release.
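
As an illustrative sketch (the file name and contents here are invented),
a release could ship a few lines of script instead of an opaque binary
blob, and the testsuite could run it at "make check" time:

  #!/bin/sh
  # tests/gen-ramp.sh: write 256 bytes with values 0x00..0xff to the
  # named file; reviewers audit these lines instead of a binary blob.
  set -e
  out=${1:-tests/files/ramp.bin}
  : > "$out"
  i=0
  while [ "$i" -lt 256 ]; do
    printf "$(printf '\\%03o' "$i")" >> "$out"
    i=$((i + 1))
  done

Anything elaborate enough to need real tooling could still ship its
generator as source, per the C/C++ suggestion elsewhere in this thread.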



-- Jacob



Re: GNU Coding Standards, automake, and the recent xz-utils backdoor

2024-03-30 Thread Alexandre Oliva
On Mar 30, 2024, Eric Gallager  wrote:

> automake's `distcheck` target, whose entire purpose is to make it
> easier to verify that a distribution tarball can be rebuilt from
> itself and contains all the things it ought to contain.

> Recommending the `distcheck` target to a wider variety of users would
> help more projects catch mismatches between things a distribution
> tarball is supposed to contain, and things that it isn't. This would
> be a win for security and could help make it easier to catch future
> possible bad actors trying to pull a similar trick. What do people
> think?

Bluntly, I don't think it would help with security.  The attacker would
just have to disable or adjust the distcheck target to seemingly pass.

Relying on something in a code repository to tell whether the repository
is secure is akin to tying a dog with sausage.

For security proper, the verification code needs to be held elsewhere,
not compromisable along with the thing it's supposed to verify.

Analogously, you don't run a rootkit checker on the system that's
potentially compromised, because the rootkit may hide itself; you boot
off secure media and then use the tools in it to look for the rootkit in
the potentially-compromised system, *without* handing control over to
it.

-- 
Alexandre Oliva, happy hackerhttps://FSFLA.org/blogs/lxo/
   Free Software Activist   GNU Toolchain Engineer
Disinformation flourishes because many people care deeply about injustice but
very few check the facts.  Think Assange & Stallman.  The empires strike back



Re: GNU Coding Standards, automake, and the recent xz-utils backdoor

2024-03-30 Thread dherring

On 2024-03-30 18:25, Bruno Haible wrote:

Eric Gallager wrote:


Hm, so should automake's `distcheck` target be updated to perform
these checks as well, then?


The first mentioned check can not be automated. ...

The second mentioned check could be done by the maintainer, ...



I agree that distcheck is good but not a cure all.  Any static system 
can be attacked when there is motive, and unit tests are easily gamed.


With a reproducible build system, multiple maintainers can "make dist" 
and compare the output to cross-check for erroneous / malicious dist 
environments.  Multiple signatures should be harder to compromise, 
assuming each is independent and generally trustworthy.


Maybe GNU should establish a cross-verification signing standard and 
"dist verification service" that automates this process?  Point it to a 
repo and tag, request a signed hash of the dist package...  Then 
downstream projects could check package signatures from both the 
maintainer and such third-party verifiers to check that nothing was 
inserted outside of version control.


-- Daniel



Re: GNU Coding Standards, automake, and the recent xz-utils backdoor

2024-03-30 Thread Bruno Haible
Eric Gallager wrote:
> >   * In order to detect that a tarball contains too many files, that is,
> > some files that the release manager did not intend to include,
> > the best way is to compare the file list of the current tarball
> > with the previous version:
> >   $ diff -r -q package-prev_version/ package-curr_version/
> >
> >   * In order to detect whether the packaged file list is consistent
> > with the .gitignore file, one can use
> >   $ git status -u
> 
> Hm, so should automake's `distcheck` target be updated to perform
> these checks as well, then?

The first mentioned check can not be automated. It can only be done by the
maintainer / release manager, reviewing the list of added files and matching
them against the list of added features or tests since the last release.

The second mentioned check could be done by the maintainer, if they add
a 'distcheck-hook' [1] for this purpose. I personally find this quite
hairy, because mixing the GNU build system (which is about *generating files*)
with *version control* topics has been a recipe for trouble along the years.

Bruno

[1] 
https://www.gnu.org/software/automake/manual/html_node/Checking-the-Distribution.html






Re: GNU Coding Standards, automake, and the recent xz-utils backdoor

2024-03-30 Thread Eric Gallager
On Sat, Mar 30, 2024 at 5:41 PM Bruno Haible  wrote:
>
> Eric Gallager wrote:
> > Recommending the `distcheck` target to a wider variety of users would
> > help more projects catch mismatches between things a distribution
> > tarball is supposed to contain, and things that it isn't.
>
> While 'make distcheck' detects some of these mismatches, it does not
> detect them all. In particular:
>
>   * In order to detect that a tarball contains too many files, that is,
> some files that the release manager did not intend to include,
> the best way is to compare the file list of the current tarball
> with the previous version:
>   $ diff -r -q package-prev_version/ package-curr_version/
>
>   * In order to detect whether the packaged file list is consistent
> with the .gitignore file, one can use
>   $ git status -u
>
> Bruno
>

Hm, so should automake's `distcheck` target be updated to perform
these checks as well, then?



Re: GNU Coding Standards, automake, and the recent xz-utils backdoor

2024-03-30 Thread Bruno Haible
Eric Gallager wrote:
> Recommending the `distcheck` target to a wider variety of users would
> help more projects catch mismatches between things a distribution
> tarball is supposed to contain, and things that it isn't.

While 'make distcheck' detects some of these mismatches, it does not
detect them all. In particular:

  * In order to detect that a tarball contains too many files, that is,
some files that the release manager did not intend to include,
the best way is to compare the file list of the current tarball
with the previous version:
  $ diff -r -q package-prev_version/ package-curr_version/

  * In order to detect whether the packaged file list is consistent
with the .gitignore file, one can use
  $ git status -u

Bruno







Re: GNU Coding Standards, automake, and the recent xz-utils backdoor

2024-03-30 Thread Karl Berry
`distcheck` target's prominence to recommend it in the "Standard
Targets for All Users" section of the GCS? 

Replying as an Automake developer, I have nothing against it in
principle, but it's clearly up to the GNU coding standards
maintainers. As far as I know, that's still rms (for anything
substantive)

FWIW, I expect that few users would actually run make distcheck,
regardless of anything in the GCS.  And of those that do, I suspect
there would be many failures because make distcheck is a complex target
that is not, so far as I understand it, intended to be as perfectly
portable or prerequisite-free as other targets. No doubt it could be
improved in that regard. Not a stopper, just my thoughts.

Thanks for the suggestion,
Karl



GNU Coding Standards, automake, and the recent xz-utils backdoor

2024-03-30 Thread Eric Gallager
I was recently reading about the backdoor announced in xz-utils the
other day, and one of the things that caught my attention was how
(ab)use of the GNU build system played a role in allowing the backdoor
to go unnoticed: https://openwall.com/lists/oss-security/2024/03/29/4
Specifically, what caught my attention was how the release tarball
containing the backdoor didn't match the history of the project in its
git repository. That made me think about automake's `distcheck`
target, whose entire purpose is to make it easier to verify that a
distribution tarball can be rebuilt from itself and contains all the
things it ought to contain. However, as I check the GNU Coding
Standards now, I notice that it doesn't say anything about this
target. I'm wondering if it might be worthwhile to upgrade the
`distcheck` target's prominence to recommend it in the "Standard
Targets for All Users" section of the GCS? Specifically here:
https://www.gnu.org/prep/standards/html_node/Standard-Targets.html#Standard-Targets
Recommending the `distcheck` target to a wider variety of users would
help more projects catch mismatches between things a distribution
tarball is supposed to contain, and things that it isn't. This would
be a win for security and could help make it easier to catch future
possible bad actors trying to pull a similar trick. What do people
think?
Eric Gallager



Re: Typo in NEWS section "New in 1.17" and "New in 1.15"

2023-12-26 Thread Karl Berry
While reading the announcement for 1.16i, I found a typo in the
NEWS file "New in 1.17" section.

I have also accidentally found a typo in the "New in 1.15"
section, 

Thanks x 2, Hans. Applied.

which might need a line rewrap after the fix.

Nah, it's ok.

I have not systematically looked for typos in any of the
"New in ..." sections older than 1.17.

I ran a spell checker and nothing obvious showed up. Not that that's
conclusive, but it will have to do :). --thanks, karl.



Typo in NEWS section "New in 1.17" and "New in 1.15"

2023-12-25 Thread Hans Ulrich Niedermann
While reading the announcement for 1.16i, I found a typo in the
NEWS file "New in 1.17" section.

I have also accidentally found a typo in the "New in 1.15"
section, which might need a line rewrap after the fix.

I have not systematically looked for typos in any of the
"New in ..." sections older than 1.17.

diff --git a/NEWS b/NEWS
index 5b56a7283..384be5e94 100644
--- a/NEWS
+++ b/NEWS
@@ -51,7 +51,7 @@ New in 1.17:
 filesystem that supports sub-second resolution; otherwise, we fall
 back to one-second granularity as before. When everything is
 supported, a new line `Features: subsecond-mtime' is printed by
-automake --version (and autom4mte --version). (bug#64756, bug#67670)
+automake --version (and autom4te --version). (bug#64756, bug#67670)
 
   - The default value of $ARFLAGS is now "cr" instead of "cru", to better
 support deterministic builds. (bug#20082)
@@ -386,7 +386,7 @@ New in 1.15:
 include $(srcdir)/fragment.am
     ...
 
-If the use forgot to add data.txt and/or preproc.sh in the distribution
+If the user forgot to add data.txt and/or preproc.sh in the distribution
 tarball, "make distcheck" would have erroneously succeeded!  This issue
 is now fixed.
 



Re: Setting libXXX_la_CPPFLAGS and libXXX_la_CFLAGS erases AM_CPPFLAGS and AM_CFLAGS

2022-11-19 Thread Jan Engelhardt
On Saturday 2022-11-19 09:11, madmurphy wrote:

>I guess it does make sense. But then what might be missing to Automake are
>libXXX_la_AM_CFLAGS, libXXX_la_AM_CPPFLAGS and libXXX_la_AM_LDFLAGS
>variables, in which the global AM_CFLAGS, AM_CPPFLAGS and AM_LDFLAGS are
>automatically pasted (whereas the corresponding versions without the AM_
>prefix erase everything)…

I guess it's doable. But probably not worth the development effort
or the codebase impact.



Re: Setting libXXX_la_CPPFLAGS and libXXX_la_CFLAGS erases AM_CPPFLAGS and AM_CFLAGS

2022-11-19 Thread madmurphy
I guess it does make sense. But then what might be missing to Automake are
libXXX_la_AM_CFLAGS, libXXX_la_AM_CPPFLAGS and libXXX_la_AM_LDFLAGS
variables, in which the global AM_CFLAGS, AM_CPPFLAGS and AM_LDFLAGS are
automatically pasted (whereas the corresponding versions without the AM_
prefix erase everything)…

--madmurphy

On Fri, Nov 18, 2022 at 11:42 PM Jan Engelhardt  wrote:

>
> On Friday 2022-11-18 22:57, Russ Allbery wrote:
> >madmurphy  writes:
> >
> >> However, if at the same time I set also the libfoo_la_CPPFLAGS variable
> (no
> >> matter the content), as in the following example,
> >
> >> AM_CPPFLAGS = \
> >>   "-DLIBFOO_BUILD_MESSAGE=\"correctly defined via AM_CPPFLAGS\""
> >> libfoo_la_CPPFLAGS = \
> >>   "-DLIBFOO_DUMMY=\"This is just a dummy text\""
> >
> >> the AM_CPPFLAGS variable will be completely overwritten
>
> It makes sense though.
>
> It's better that pertarget_CPPFLAGS overwrites, because otherwise... there
> would be no chance to dump (get rid of) the AM_CPPFLAGS portion in the
> command-line - short of never setting AM_CPPFLAGS at all, which is not very
> economical if all you want to change is one target...
>


Re: Setting libXXX_la_CPPFLAGS and libXXX_la_CFLAGS erases AM_CPPFLAGS and AM_CFLAGS

2022-11-18 Thread Jan Engelhardt


On Friday 2022-11-18 22:57, Russ Allbery wrote:
>madmurphy  writes:
>
>> However, if at the same time I set also the libfoo_la_CPPFLAGS variable (no
>> matter the content), as in the following example,
>
>> AM_CPPFLAGS = \
>>   "-DLIBFOO_BUILD_MESSAGE=\"correctly defined via AM_CPPFLAGS\""
>> libfoo_la_CPPFLAGS = \
>>   "-DLIBFOO_DUMMY=\"This is just a dummy text\""
>
>> the AM_CPPFLAGS variable will be completely overwritten

It makes sense though.

It's better that pertarget_CPPFLAGS overwrites, because otherwise... there
would be no chance to dump (get rid of) the AM_CPPFLAGS portion in the
command-line - short of never setting AM_CPPFLAGS at all, which is not very
economical if all you want to change is one target...



Re: Setting libXXX_la_CPPFLAGS and libXXX_la_CFLAGS erases AM_CPPFLAGS and AM_CFLAGS

2022-11-18 Thread Russ Allbery
madmurphy  writes:

> However, if at the same time I set also the libfoo_la_CPPFLAGS variable (no
> matter the content), as in the following example,

> AM_CPPFLAGS = \
>"-DLIBFOO_BUILD_MESSAGE=\"correctly defined via AM_CPPFLAGS\""

> ...

> libfoo_la_CPPFLAGS = \
>"-DLIBFOO_DUMMY=\"This is just a dummy text\""

> the AM_CPPFLAGS variable will be completely overwritten by the
> libfoo_la_CPPFLAGS variable, and invoking libfoo_func() will print

> Message from the build system: undefined

While this is often confusing, this is the documented behavior of
Automake.  See:

https://www.gnu.org/software/automake/manual/automake.html#Program-and-Library-Variables

In compilations with per-target flags, the ordinary ‘AM_’ form of the
flags variable is not automatically included in the compilation
(however, the user form of the variable is included). So for instance,
if you want the hypothetical maude compilations to also use the value
of AM_CFLAGS, you would need to write:

maude_CFLAGS = … your flags … $(AM_CFLAGS)

See Flag Variables Ordering, for more discussion about the interaction
between user variables, ‘AM_’ shadow variables, and per-target
variables.

and

https://www.gnu.org/software/automake/manual/automake.html#Flag-Variables-Ordering

-- 
Russ Allbery (ea...@eyrie.org) <https://www.eyrie.org/~eagle/>



Setting libXXX_la_CPPFLAGS and libXXX_la_CFLAGS erases AM_CPPFLAGS and AM_CFLAGS

2022-11-18 Thread madmurphy
Hi,

If I create a library named libfoo containing the following code (example
attached),

#include "libfoo.h"

#ifndef LIBFOO_BUILD_MESSAGE
#define LIBFOO_BUILD_MESSAGE "undefined"
#endif


int libfoo_func() {
printf("Message from the build system: " LIBFOO_BUILD_MESSAGE "\n");
return 0;
}

and then I set the LIBFOO_BUILD_MESSAGE preprocessor macro via Makefile.am,

AM_CPPFLAGS = \
 "-DLIBFOO_BUILD_MESSAGE=\"correctly defined via AM_CPPFLAGS\""

invoking libfoo_func() from a linked program will correctly print the
following text.

Message from the build system: correctly defined via AM_CPPFLAGS

However, if at the same time I set also the libfoo_la_CPPFLAGS variable (no
matter the content), as in the following example,

AM_CPPFLAGS = \
 "-DLIBFOO_BUILD_MESSAGE=\"correctly defined via AM_CPPFLAGS\""

...

libfoo_la_CPPFLAGS = \
 "-DLIBFOO_DUMMY=\"This is just a dummy text\""

the AM_CPPFLAGS variable will be completely overwritten by the
libfoo_la_CPPFLAGS variable, and invoking libfoo_func() will print

Message from the build system: undefined

If I decide to use the *_CFLAGS class of variables instead of *_CPPFLAGS,

AM_CFLAGS = \
 -Wall \
 -Wextra \
 -g \
 "-DLIBFOO_BUILD_MESSAGE=\"correctly defined via AM_CPPFLAGS\""

...

libfoo_la_CFLAGS = \
 "-DLIBFOO_DUMMY=\"This is just a dummy text\""

the result will be the same.

Message from the build system: undefined

To restore AM_CPPFLAGS (or AM_CFLAGS) I need to mention it explicitly in
libfoo_la_CPPFLAGS.

AM_CPPFLAGS = \
 "-DLIBFOO_BUILD_MESSAGE=\"correctly defined via AM_CPPFLAGS\""

...

libfoo_la_CPPFLAGS = \
$(AM_CPPFLAGS) \
 "-DLIBFOO_DUMMY=\"This is just a dummy text\""

In this case libfoo_func() will correctly print

Message from the build system: correctly defined via AM_CPPFLAGS

Is this a wanted behavior? Isn't the sense of AM_* variables that of being
applied to every single library in a project?

--madmurphy


libfoo-1.0.0.tar.xz
Description: application/xz


Re: Wrong order of preprocessor and compiler flags

2022-03-28 Thread Evgeny Grin

Hello Alex,

On 28.03.2022 4:55, Alex Ameen wrote:
This is a message I meant to send to "all"; I'm sending it again for the wider 
discussion.


Please let me know if my understanding of include order is incorrect. 
Essentially I'm more concerned about relative placement of `AM_CPPFLAGS' 
and `CPPFLAGS' in any future changes.


Moving CPPFLAGS to the end of the line prevents users from overriding 
include paths.


Currently flags are used in automake as
AM_CPPFLAGS CPPFLAGS AM_CFLAGS CFLAGS

Actually, there is *no way* to override an include path defined in 
AM_CPPFLAGS (unless AM_CPPFLAGS is redefined on the command line, which is 
not the right way). Any include search paths defined in CPPFLAGS and in 
CFLAGS will be added to the end of the search list.


On the other hand, this is the right thing. As defined in the same section 
of the GNU Coding Standards [1], there are two types of flags. The first 
type is flags required for proper compilation.
For example: when building a library, I need the root of the build path for 
the lib's "config.h" and the lib's include directory. They must be the 
first items in the include search path, like

AM_CPPFLAGS="-I../../builddir -I../lib/include"

If the user overrides (prepends) with "-I/usr/include 
-I/usr/include/polly/Config" it would break the compilation, because the 
lib's installed header would be used instead of the header in the source 
dir and polly's config.h would be found first instead of the lib's config.h.


I believe it's current placement is intended to provide an avenue for 
overrides in the same way that CFLAGS being at the end allows users to 
override the C standards and spec flags.


Really what I care about is the relative order of `CPPFLAGS 
AM_CPPFLAGS', and `AM_CFLAGS CFLAGS' - whether these groups are ordered 
before or after the other group is less important though. For example 
I'm content with either `CPPFLAGS AM_CPPFLAGS AM_CFLAGS CFLAGS' or 
`AM_CFLAGS CFLAGS CPPFLAGS AM_CPPFLAGS'.


My suggestion of "AM_CFLAGS AM_CPPFLAGS CFLAGS CPPFLAGS" gives the user 
the same level of freedom to override flags as the current "AM_CPPFLAGS 
CPPFLAGS AM_CFLAGS CFLAGS": CFLAGS still overrides any AM_* flags, and 
the order of the flags is the same as the one tested by "configure".



[1] https://www.gnu.org/prep/standards/standards.html#Command-Variables

--
Evgeny

On Sun, Mar 27, 2022, 5:00 PM Jan Engelhardt <mailto:jeng...@inai.de>> wrote:



On Sunday 2022-03-27 23:22, Karl Berry wrote:

 >It seems the basic inconsistency is whether CPPFLAGS is considered a
 >"user variable" or not. In earlier eras, it wasn't [...]

In earlier eras of what exactly?

As for make, it never made a distinction between user variables or
otherwise,
at least that's the way make comes across. Some software will just
break on `make CFLAGS=-O3` and others will work to compile.

As for automake, AM_CPPFLAGS was introduced at the same time as
AM_CFLAGS as
per the git log. So CPPFLAGS always was a user variable.

 >[more on CFLAGS<->CPPFLAGS order]

I went to the GNU make git repo to check on CPPFLAGS; it appeared
first in
documentation rather than source (which seems like a history import
mishap),
but even back then in '94, the documentation was inconsistent, sometimes
providing example descriptions where CPPFLAGS comes after
CFLAGS/FFLAGS/etc.,
and sometimes reversed.





Re: Wrong order of preprocessor and compiler flags

2022-03-28 Thread Evgeny Grin

Hello Karl,


On 28.03.2022 0:22, Karl Berry wrote:

It seems the basic inconsistency is whether CPPFLAGS is considered a
"user variable" or not. In earlier eras, it wasn't, but from your msg,
I gather it is now.

The GNU standards node about it, that you mentioned,
   https://www.gnu.org/prep/standards/standards.html#Command-Variables
does not clearly state it one way or another. But its example shows
CFLAGS after CPPFLAGS.


The same example puts "-I." into ALL_CFLAGS, which makes it even harder to 
guess the right way to use CPPFLAGS.



Thus I think a prior step is to write bug-standa...@gnu.org and suggest
clarifying the status of CPPFLAGS, its relationship to CFLAGS, etc.


Definitely makes sense.
I'll write to the standards list.


We could consider changing automake to follow autoconf even if the GCS
is not changed, but it would be better if the GCS were clear.

 2. Use AM_CFLAGS CFLAGS AM_CPPFLAGS CPPFLAGS. This is more aligned with
 current flags grouping, but CFLAGS will not override definitions in
 AM_CPPFLAGS (less aligned with GNU Standards).

It seems wrong (and disastrously backwards-incompatible) to me for
CFLAGS not to override AM_everything. Your option 1:

 1. Use AM_CFLAGS AM_CPPFLAGS CFLAGS CPPFLAGS. I think this is the best
 option. As required by GNU Standards, CFLAGS still override all
 upstream-defined flags.

seems like the best option to me too. --thanks, karl.


--
Evgeny



Re: Wrong order of preprocessor and compiler flags

2022-03-27 Thread Alex Ameen
This is a message I meant to send to "all"; I'm sending it again for the wider
discussion.

Please let me know if my understanding of include order is incorrect.
Essentially I'm more concerned about relative placement of `AM_CPPFLAGS'
and `CPPFLAGS' in any future changes.

Moving CPPFLAGS to the end of the line prevents users from overriding
include paths.

I believe it's current placement is intended to provide an avenue for
overrides in the same way that CFLAGS being at the end allows users to
override the C standards and spec flags.

Really what I care about is the relative order of `CPPFLAGS AM_CPPFLAGS',
and `AM_CFLAGS CFLAGS' - whether these groups are ordered before or after
the other group is less important though. For example I'm content with
either `CPPFLAGS AM_CPPFLAGS AM_CFLAGS CFLAGS' or `AM_CFLAGS CFLAGS
CPPFLAGS AM_CPPFLAGS'.


On Sun, Mar 27, 2022, 5:00 PM Jan Engelhardt  wrote:

>
> On Sunday 2022-03-27 23:22, Karl Berry wrote:
>
> >It seems the basic inconsistency is whether CPPFLAGS is considered a
> >"user variable" or not. In earlier eras, it wasn't [...]
>
> In earlier eras of what exactly?
>
> As for make, it never made a distinction between user variables or
> otherwise,
> at least that's the way make comes across. Some software will just
> break on `make CFLAGS=-O3` and others will work to compile.
>
> As for automake, AM_CPPFLAGS was introduced at the same time as AM_CFLAGS
> as
> per the git log. So CPPFLAGS always was a user variable.
>
> >[more on CFLAGS<->CPPFLAGS order]
>
> I went to the GNU make git repo to check on CPPFLAGS; it appeared first in
> documentation rather than source (which seems like a history import
> mishap),
> but even back then in '94, the documentation was inconsistent, sometimes
> providing example descriptions where CPPFLAGS comes after
> CFLAGS/FFLAGS/etc.,
> and sometimes reversed.
>


Re: Wrong order of preprocessor and compiler flags

2022-03-27 Thread Bob Friesenhahn

On Mon, 28 Mar 2022, Jan Engelhardt wrote:


I went to the GNU make git repo to check on CPPFLAGS; it appeared first in
documentation rather than source (which seems like a history import mishap),
but even back then in '94, the documentation was inconsistent, sometimes
providing example descriptions where CPPFLAGS comes after CFLAGS/FFLAGS/etc.,
and sometimes reversed.


I think that this is because it was always assumed that the order does 
not matter.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
Public Key, http://www.simplesystems.org/users/bfriesen/public-key.txt



Re: Wrong order of preprocessor and compiler flags

2022-03-27 Thread Jan Engelhardt


On Sunday 2022-03-27 23:22, Karl Berry wrote:

>It seems the basic inconsistency is whether CPPFLAGS is considered a
>"user variable" or not. In earlier eras, it wasn't [...]

In earlier eras of what exactly?

As for make, it never made a distinction between user variables or otherwise,
at least that's the way make comes across. Some software will just
break on `make CFLAGS=-O3` and others will work to compile.

As for automake, AM_CPPFLAGS was introduced at the same time as AM_CFLAGS as
per the git log. So CPPFLAGS always was a user variable.

>[more on CFLAGS<->CPPFLAGS order]

I went to the GNU make git repo to check on CPPFLAGS; it appeared first in
documentation rather than source (which seems like a history import mishap),
but even back then in '94, the documentation was inconsistent, sometimes
providing example descriptions where CPPFLAGS comes after CFLAGS/FFLAGS/etc.,
and sometimes reversed.



Re: Wrong order of preprocessor and compiler flags

2022-03-27 Thread Karl Berry
It seems the basic inconsistency is whether CPPFLAGS is considered a
"user variable" or not. In earlier eras, it wasn't, but from your msg,
I gather it is now.

The GNU standards node about it, that you mentioned,
  https://www.gnu.org/prep/standards/standards.html#Command-Variables
does not clearly state it one way or another. But its example shows
CFLAGS after CPPFLAGS.

Thus I think a prior step is to write bug-standa...@gnu.org and suggest
clarifying the status of CPPFLAGS, its relationship to CFLAGS, etc.

We could consider changing automake to follow autoconf even if the GCS
is not changed, but it would be better if the GCS were clear.

2. Use AM_CFLAGS CFLAGS AM_CPPFLAGS CPPFLAGS. This is more aligned with 
current flags grouping, but CFLAGS will not override definitions in 
AM_CPPFLAGS (less aligned with GNU Standards).

It seems wrong (and disastrously backwards-incompatible) to me for
CFLAGS not to override AM_everything. Your option 1:

1. Use AM_CFLAGS AM_CPPFLAGS CFLAGS CPPFLAGS. I think this is the best 
option. As required by GNU Standards, CFLAGS still override all 
upstream-defined flags.

seems like the best option to me too. --thanks, karl.



Wrong order of preprocessor and compiler flags

2022-03-27 Thread Evgeny Grin

Hello,

This discussion was started initially in the autoconf list. [1]
Automake and autoconf use compiler and preprocessor flags in different 
order.

Within 'configure' scripts, compile checks/tests are performed as [2]:
$CC -c $CFLAGS $CPPFLAGS conftest.$ac_ext >&AS_MESSAGE_LOG_FD
but resulting flags are used in another order in automake makefiles:
$(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) 
$(AM_CFLAGS) $(CFLAGS)


Automake uses CPPFLAGS before CFLAGS [3].

In the following research I found that no 'make' implementation uses 
CPPFLAGS before CFLAGS [4]. Almost all 'make' implementations (GNU, 
OpenBSD, NetBSD, Solaris) put CPPFLAGS after CFLAGS; the only exception 
is the FreeBSD version, which doesn't use CPPFLAGS at all.


glibc uses CFLAGS before CPPFLAGS [5][6].

While using CPPFLAGS before CFLAGS looks more logical, it seems that the 
majority of software tools use the flags in the other order.


The GNU Coding Standards recommend putting CFLAGS at the end of the command 
line [7] to give the user the ability to override upstream-supplied flags.


I think that automake should be aligned with other tools for several 
reasons:
* automated build systems use the same CFLAGS and CPPFLAGS for packages 
built with autotools and for packages built with plain Makefiles. Currently 
this may give different results.
* 'configure' results are based on CFLAGS CPPFLAGS, but automake's 
makefiles use CPPFLAGS CFLAGS, which may produce a result different from 
the expected one (see the illustration below).
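For illustration (the paths and flag values are invented, not from this 
thread): if both variables carry -I options pointing at different copies of 
a header, the configure probe and the automake compile rule search the 
directories in opposite orders and can pick up different headers.

  CPPFLAGS='-I/opt/foo-2.0/include'
  CFLAGS='-I/opt/foo-1.0/include -O2'

  # configure test (CFLAGS first) -- the foo-1.0 headers win:
  cc -c -I/opt/foo-1.0/include -O2 -I/opt/foo-2.0/include conftest.c
  # automake compile rule (CPPFLAGS first, other variables omitted) -- the
  # foo-2.0 headers win:
  cc -I/opt/foo-2.0/include -I/opt/foo-1.0/include -O2 -c foo.c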


I see several ways to implement it in automake:
1. Use AM_CFLAGS AM_CPPFLAGS CFLAGS CPPFLAGS. I think this is the best 
option. As required by the GNU Standards, CFLAGS still overrides all 
upstream-defined flags (see the sketch just after this list).
2. Use AM_CFLAGS CFLAGS AM_CPPFLAGS CPPFLAGS. This is more aligned with the 
current flags grouping, but CFLAGS will not override definitions in 
AM_CPPFLAGS (less aligned with the GNU Standards).

3. Use AM_CFLAGS AM_CPPFLAGS CPPFLAGS CFLAGS.
4. Use AM_CPPFLAGS AM_CFLAGS CFLAGS CPPFLAGS.
Although I can find arguments for the last two options, I don't think 
they make any real sense.
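A small worked sketch of the difference, with made-up flags: suppose a 
package sets AM_CPPFLAGS = -DNDEBUG and the user runs `make CFLAGS='-g -UNDEBUG'`.

  # option 1 ordering: ... $(AM_CFLAGS) $(AM_CPPFLAGS) $(CFLAGS) $(CPPFLAGS) ...
  #   compiler sees: -DNDEBUG -g -UNDEBUG   (the user's -UNDEBUG wins)
  # option 2 ordering: ... $(AM_CFLAGS) $(CFLAGS) $(AM_CPPFLAGS) $(CPPFLAGS) ...
  #   compiler sees: -g -UNDEBUG -DNDEBUG   (the upstream -DNDEBUG wins again)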


To avoid any possible breakage of existing packages, a new automake 
option could be introduced, with a name like "right-flags-order", together 
with a warning that this ordering will become the default soon.


I can work on patches if my proposal is accepted.


[1] https://lists.gnu.org/archive/html/autoconf/2022-03/msg4.html
[2] 
https://git.savannah.gnu.org/gitweb/?p=autoconf.git;a=blob;f=lib/autoconf/c.m4;hb=00358457d09c19ff6b5ec7ed98708540d1994a5f#l64
[3] 
https://git.savannah.gnu.org/cgit/automake.git/tree/bin/automake.in?id=fee9a828bcc968656edfc89e38b157c28d6335f0#n700

[4] https://lists.gnu.org/archive/html/autoconf/2022-03/msg00010.html
[5] 
https://sourceware.org/git/?p=glibc.git;a=blob;f=Makefile;hb=305769b2a15c2e96f9e1b5195d3c4e0d6f0f4b68#l528

[6] https://lists.gnu.org/archive/html/autoconf/2022-03/msg8.html
[7] https://www.gnu.org/prep/standards/standards.html#Command-Variables

--
Evgeny




Re: type errors, command length limits, and Awk

2022-02-15 Thread Jacob Bachmeyer

Mike Frysinger wrote:

> On 15 Feb 2022 21:17, Jacob Bachmeyer wrote:
>> Mike Frysinger wrote:
>>> context: https://bugs.gnu.org/53340
>> Looking at the highlighted line in the context:
> thanks for getting into the weeds with me

You are welcome.


  echo "$$py_files" | $(am__pep3147_tweak) | $(am__base_list) | \

It seems that the problem is that am__base_list expects ListOf/File (and 
produces ChunkedListOf/File) but am__pep3147_tweak emits ListOf/Glob.  
This works in the usual case because the shell implicitly converts Glob 
-> ListOf/File and implicitly flattens argument lists, but results in 
the overall command line being longer than expected if the globs expand 
to more filenames than expected, as described there.


It seems that the proper solution to the problem at hand is to have 
am__pep3147_tweak expand globs itself somehow and thus provide 
ListOf/File as am__base_list expects.


Do I misunderstand?  Is there some other use for xargs?



if i did not care about double expansion, this might work.  the pipeline
quoted here handles the arguments correctly (other than whitespace splitting
on the initial input, but that's a much bigger task) before passing them to
the rest of the pipeline.  so the full context:

  echo "$$py_files" | $(am__pep3147_tweak) | $(am__base_list) | \
  while read files; do \
$(am__uninstall_files_from_dir) || st=$$?; \
  done || exit $$?; \
...
am__uninstall_files_from_dir = { \
  test -z "$$files" \
|| { test ! -d "$$dir" && test ! -f "$$dir" && test ! -r "$$dir"; } \
|| { echo " ( cd '$$dir' && rm -f" $$files ")"; \
 $(am__cd) "$$dir" && rm -f $$files; }; \
  }

leveraging xargs would allow me to maintain a single shell expansion.
the pathological situation being:
  bar.py
  __pycache__/
bar.pyc
bar*.pyc
bar**.pyc

py_files="bar.py" which turns into "__pycache__/bar*.pyc" by the pipeline,
and then am__uninstall_files_from_dir will expand it when calling `rm -f`.

if the pipeline expanded the glob, it would be:
  __pycache__/bar.pyc __pycache__/bar*.pyc __pycache__/bar**.pyc
and then when calling rm, those would expand a 2nd time.
  


If we know that there will be _exactly_ one additional shell expansion, 
why not simply filter the glob results through `sed 's/[?*]/\\&/g'` to 
escape potential glob metacharacters before emitting them from 
am__pep3147_tweak?  (Or is that not portable sed?)


Back to the pseudo-type model I used earlier, the difference between 
File and Glob is that Glob contains unescaped glob metacharacters, so 
escaping them should solve the problem, no?  (Or is there another thorn 
nearby?)
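A rough sketch of that escaping step, assuming the pipeline emits one name 
per line (whether the backslashes then survive the later expansion cleanly 
is exactly the open question above):

  # escape ? and * so the second expansion treats them literally;
  # the input names here are only illustrative
  printf '%s\n' '__pycache__/bar.pyc' '__pycache__/bar*.pyc' \
    | sed 's/[?*]/\\&/g'
  # prints:
  #   __pycache__/bar.pyc
  #   __pycache__/bar\*.pyc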



[...]

which at this point i've written `xargs -n40`, but not as fast :p.
  


Not as fast, yes, but certainly portable!  :p

The real question would be if it is faster than simply running rm once 
per file.  I would guess probably _so_ on MinGW (bash on Windows, where 
that logic would use shell builtins but running a new process is 
extremely slow) and probably _not_ on an archaic Unix system where 
"test" is not a shell builtin so saving the overhead and just running rm 
once per file would be faster.



automake jumps through some hoops to try and limit the length of generated
command lines, like deleting output objects in a non-recursive build.  it's
not perfect -- it breaks arguments up into 40 at a time (akin to xargs -n40)
and assumes that it won't have 40 paths with long enough names to exceed the
command line length.  it also has some logic where it's deleting paths by
globs, but the process to partition the file list into groups of 40 happens
before the glob is expanded, so there are cases where it's 40 globs that can
expand into many many more files and then exceed the command line length.
  
First, I thought that GNU-ish systems were not supposed to have such 
arbitrary limits,



one person's "arbitrary limits" is another person's "too small limit" :).
i'm most familiar with Linux, so i'll focus on that.

[...]

plus, backing up, Automake can't assume Linux.  so i think we have to
proceed as if there is a command line limit we need to respect.
  


So then the answer to my next question is that it is still an issue, 
even if the GNU system were to allow arguments up to available memory.


and this issue (the context) originated from Gentoo 
GNU/Linux.  Is this a more fundamental bug in Gentoo or still an issue 
because Automake build scripts are supposed to be portable to foreign 
systems that do have those limits?



to be clear, what's failing is an Automake test.  it sets the `rm` limit to
an artificially low one.  [...]

Gentoo happened to find this error before Automake because Gentoo also found
and fixe

Re: type errors, command length limits, and Awk (was: portability of xargs)

2022-02-15 Thread Dan Kegel
FWIW, commandline length limits are a real thing, I've run into them
with Make, CMake, and Meson.
I did some work to help address them in Meson, see e.g.
https://github.com/mesonbuild/meson/issues/7212

And just for fun, here's a vaguely related changelog entry from long
ago, back when things were much worse:

Tue Jun  8 15:24:14 1993  Paul Eggert  (egg...@twinsun.com)
* inp.c (plan_a): Check that RCS and working files are not the
same.  This check is needed on hosts that do not report file
name length limits and have short limits.

:-)



Re: type errors, command length limits, and Awk (was: portability of xargs)

2022-02-15 Thread Mike Frysinger
On 15 Feb 2022 21:17, Jacob Bachmeyer wrote:
> Mike Frysinger wrote:
> > context: https://bugs.gnu.org/53340
> >   
> Looking at the highlighted line in the context:

thanks for getting into the weeds with me

> > >   echo "$$py_files" | $(am__pep3147_tweak) | $(am__base_list) | \
> It seems that the problem is that am__base_list expects ListOf/File (and 
> produces ChunkedListOf/File) but am__pep3147_tweak emits ListOf/Glob.  
> This works in the usual case because the shell implicitly converts Glob 
> -> ListOf/File and implicitly flattens argument lists, but results in 
> the overall command line being longer than expected if the globs expand 
> to more filenames than expected, as described there.
> 
> It seems that the proper solution to the problem at hand is to have 
> am__pep3147_tweak expand globs itself somehow and thus provide 
> ListOf/File as am__base_list expects.
> 
> Do I misunderstand?  Is there some other use for xargs?

if i did not care about double expansion, this might work.  the pipeline
quoted here handles the arguments correctly (other than whitespace splitting
on the initial input, but that's a much bigger task) before passing them to
the rest of the pipeline.  so the full context:

  echo "$$py_files" | $(am__pep3147_tweak) | $(am__base_list) | \
  while read files; do \
$(am__uninstall_files_from_dir) || st=$$?; \
  done || exit $$?; \
...
am__uninstall_files_from_dir = { \
  test -z "$$files" \
|| { test ! -d "$$dir" && test ! -f "$$dir" && test ! -r "$$dir"; } \
|| { echo " ( cd '$$dir' && rm -f" $$files ")"; \
 $(am__cd) "$$dir" && rm -f $$files; }; \
  }

leveraging xargs would allow me to maintain a single shell expansion.
the pathological situation being:
  bar.py
  __pycache__/
bar.pyc
bar*.pyc
bar**.pyc

py_files="bar.py" which turns into "__pycache__/bar*.pyc" by the pipeline,
and then am__uninstall_files_from_dir will expand it when calling `rm -f`.

if the pipeline expanded the glob, it would be:
  __pycache__/bar.pyc __pycache__/bar*.pyc __pycache__/bar**.pyc
and then when calling rm, those would expand a 2nd time.

i would have to change how the pipeline outputs the list of files such that
the final subshell could safely consume & expand.  since this is portable
shell, i don't have access to arrays & fancy things like readarray.  if the
pipeline switched to newline delimiting, and i dropped $(am__base_list), i
could use positionals to construct an array and safely expand that.  but i
strongly suspect that it's not going to be as performant, and i might as
well just run `rm` once per file :x.

  echo "$$py_files" | $(am__pep3147_tweak) | \
  ( set --
while read file; do
  set -- "$@" "$file"
  if test $# -ge 40; then
rm -f "$@"
set --
  fi
done
if test $# -gt 0; then
  rm -f "$@"
fi
  )

which at this point i've written `xargs -n40`, but not as fast :p.

> > automake jumps through some hoops to try and limit the length of generated
> > command lines, like deleting output objects in a non-recursive build.  it's
> > not perfect -- it breaks arguments up into 40 at a time (akin to xargs -n40)
> > and assumes that it won't have 40 paths with long enough names to exceed the
> > command line length.  it also has some logic where it's deleting paths by
> > globs, but the process to partition the file list into groups of 40 happens
> > before the glob is expanded, so there are cases where it's 40 globs that can
> > expand into many many more files and then exceed the command line length.
> 
> First, I thought that GNU-ish systems were not supposed to have such 
> arbitrary limits,

one person's "arbitrary limits" is another person's "too small limit" :).
i'm most familiar with Linux, so i'll focus on that.

xargs --show-limits on my Linux-5.15 system says:
Your environment variables take up 5934 bytes
POSIX upper limit on argument length (this system): 2089170
POSIX smallest allowable upper limit on argument length (all systems): 4096
Maximum length of command we could actually use: 2083236

2MB ain't too shabby.  but if we consult execve(2), it has more details:
https://man7.org/linux/man-pages/man2/execve.2.html
   On Linux prior to kernel 2.6.23, the memory used to store the
   environment and argument strings was limited to 32 pages (defined
   by the kernel constant MAX_ARG_PAGES).  On architectures with a
   4-kB page size, this yields a maximum size of 128 kB.

i've def seen "Argument list too long" errors in Gentoo from a variety of
packages du

type errors, command length limits, and Awk (was: portability of xargs)

2022-02-15 Thread Jacob Bachmeyer

Mike Frysinger wrote:
> context: https://bugs.gnu.org/53340

Looking at the highlighted line in the context:

>   echo "$$py_files" | $(am__pep3147_tweak) | $(am__base_list) | \
It seems that the problem is that am__base_list expects ListOf/File (and 
produces ChunkedListOf/File) but am__pep3147_tweak emits ListOf/Glob.  
This works in the usual case because the shell implicitly converts Glob 
-> ListOf/File and implicitly flattens argument lists, but results in 
the overall command line being longer than expected if the globs expand 
to more filenames than expected, as described there.


It seems that the proper solution to the problem at hand is to have 
am__pep3147_tweak expand globs itself somehow and thus provide 
ListOf/File as am__base_list expects.


Do I misunderstand?  Is there some other use for xargs?

I note that the current version of standards.texi also allows configure 
and make rules to use awk(1); could that be useful here instead? (see below)



[...]

> automake jumps through some hoops to try and limit the length of generated
> command lines, like deleting output objects in a non-recursive build.  it's
> not perfect -- it breaks arguments up into 40 at a time (akin to xargs -n40)
> and assumes that it won't have 40 paths with long enough names to exceed the
> command line length.  it also has some logic where it's deleting paths by
> globs, but the process to partition the file list into groups of 40 happens
> before the glob is expanded, so there are cases where it's 40 globs that can
> expand into many many more files and then exceed the command line length.


First, I thought that GNU-ish systems were not supposed to have such 
arbitrary limits, and this issue (the context) originated from Gentoo 
GNU/Linux.  Is this a more fundamental bug in Gentoo or still an issue 
because Automake build scripts are supposed to be portable to foreign 
systems that do have those limits?


Second, counting files in the list, as you note, does not necessarily 
conform to the system limits, while Awk can track both the number of 
elements in the list and the length of the list as a string, allowing it 
to break the list to meet both command-tail length limits (on Windows, or 
the total size of the block to transfer with execve on POSIX) and argument 
count limits (the length of argv acceptable to execve on POSIX).
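An untested sketch of that idea in (I believe) POSIX awk, with made-up 
limits of 40 arguments and 4096 bytes, reading one file name per line and 
printing one command line's worth of names per output line (names containing 
whitespace would need more care):

  awk -v maxargs=40 -v maxlen=4096 '
    {
      # flush the current group if adding this name would exceed either limit
      if (n >= maxargs || len + length($0) + 1 > maxlen) {
        print line; line = ""; n = 0; len = 0
      }
      line = (n ? line " " : "") $0
      n++; len += length($0) + 1
    }
    END { if (n) print line }
  ' < file-list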


POSIX Awk should be fairly widely available, although at least Solaris 
10 has a non-POSIX awk in /usr/bin and a POSIX awk in /usr/xpg4/bin; I 
found this while working on DejaGnu.  I ended up using this test to 
ensure that "awk" is suitable:


8<--
# The non-POSIX awk in /usr/bin on Solaris 10 fails this test
if echo | "$awkbin" '1 && 1 {exit 0}' > /dev/null 2>&1 ; then
   have_awk=true
else
   have_awk=false
fi
8<--


Another "gotcha" with Solaris 10 /usr/bin/awk is that it will accept 
"--version" as a valid Awk program, so if you use that to test whether 
"awk" is GNU Awk, you must redirect input from /dev/null or it will hang.


Automake may want to do more extensive testing to find a suitable Awk; 
the above went into a script that remains generic when installed and so 
must run its tests every time the user invokes it, so "quick" was a high 
priority.



-- Jacob



Re: automake 1.16.4 and new PYTHON_PREFIX

2021-09-23 Thread Karl Berry
If anyone who weighed in on the Python prefix stuff (or didn't, for that
matter) has a chance to try the new pretest at

  https://meyering.net/automake/automake-1.16g.tar.xz

that'd be great. It'd be nice not to have to do another patch-up release.

Thanks,
Karl



Re: automake 1.16.4 and new PYTHON_PREFIX

2021-09-19 Thread Karl Berry
Regarding PYTHON_PREFIX setting in automake, I've pushed a change that
(I hope) reverts to the previous behavior of using the usual GNU prefix
variables by default. It's attached.

The new configure option --with-python-sys-prefix yields the
1.16.4 behavior of using the sys.* Python values.

The --with-python_[exec_]prefix options are still present and
unchanged, setting the prefixes explicitly.
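For example (the directory is only an example, and I am assuming the explicit
options take the prefix as their argument):

  ./configure --with-python-sys-prefix
  ./configure --with-python_prefix=/opt/example --with-python_exec_prefix=/opt/example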

It would be really fantastic if there could be some testing of this by
other people before we push out another problematic release.

Jim, could you roll a test release please? --thanks, karl.

P.S. Oops, I see the brief description in the change is only supposed to
be one line. Well, too bad, not going to adjust now.

P.P.S. Although the diff shows nearly every line being changed, in fact
most of the changes are about indentation. Unfortunately. But separating
the formatting changes from the real changes proved too problematic and
time-consuming, and I wanted to end up with a correctly-formatted source
(as best I could manage). Sorry.



ampy.diff
Description: Binary data


Re: automake 1.16.4 and new PYTHON_PREFIX

2021-08-26 Thread Karl Berry
I think we need an easy way to set a default for this behaviour
from within configure.ac, similar to AC_PREFIX_DEFAULT(), so that
the end-user doesn't have to pass a bunch of options to configure
just to get the build to work sensibly.

I have nothing against the idea, but my immediate goal is more basic:
restore the previous behavior without losing the new configure benefits
entirely.

The current workaround I use is described below.

Thanks for sending that code. --best, karl.



Re: automake 1.16.4 and new PYTHON_PREFIX

2021-08-26 Thread Luke Mewburn
On 21-08-25 10:00, Karl Berry wrote:
  | yf> Would keeping PYTHON_PREFIX but changing its default to the
  | "classical" value be a working solution for this ?
  | 
  | Yes, I think we should. And I think I should have been smart enough to
  | realize that changing the default behavior was too risky in the first
  | place. Apologies for that.
  | 
  | My thought now is to add yet one more option, like
  | --python-prefix-from-python, to get the new 1.16.4 behavior of using the
  | "computed" sys.* values. Else go back to the previous $prefix-based 
behavior.
  | 
  | Does that sound sensible? A better name for the option?
  | 
  | Joshua (or anyone), would you be willing to work on something like by
  | any chance? Would be greatly, greatly, appreciated. I am way
  | overcommitted right now (like all of us, I know ...).
  | 
  | please keep the --with-python_prefix
  | 
  | Definitely. --thanks, karl.


Currently, overriding the python module path so that modules install into
the default python module path, in a manner that works for DESTDIR and
distcheck, is a bit tricky, especially when using a different prefix.

I think we need an easy way to set a default for this behaviour
from within configure.ac, similar to AC_PREFIX_DEFAULT(), so that
the end-user doesn't have to pass a bunch of options to configure
just to get the build to work sensibly.

The current workaround I use is described below. If there was a
cleaner/more standardized mechanism built into automake / autoconf,
I'd use that.


Change the default prefix in configure.ac:

AC_PREFIX_DEFAULT([/opt/something])


Set defaults for make variables in configure.ac:

AC_SUBST([PY_OVERRIDE_BASE],
  [$($PYTHON -c 'from distutils import sysconfig; 
print(sysconfig.PREFIX)')])
dnl Note: Makefile sets PY_OVERRIDE_PREFIX from $(prefix) or 
$(PY_OVERRIDE_BASE)
AC_SUBST([PY_OVERRIDE_EXTDIR],
  [$($PYTHON -c 'from distutils import sysconfig; 
print(sysconfig.get_python_lib(plat_specific=1,prefix="${PY_OVERRIDE_PREFIX}"))')])


Use a (non-portable) GNU make snippet in Makefile.am:

# Install python module to $(prefix) if it's not /opt/something,
# otherwise the default python prefix in $(PY_OVERRIDE_BASE).
#
PY_OVERRIDE_PREFIX := $(patsubst 
/opt/something,$(PY_OVERRIDE_BASE),$(prefix))
pkgpyexecdir = $(PY_OVERRIDE_EXTDIR)/mymodule


This latter requires in configure.ac:

AM_INIT_AUTOMAKE([-Wno-portability])



Luke.




Re: automake 1.16.4 and new PYTHON_PREFIX

2021-08-25 Thread Karl Berry
yf> Would keeping PYTHON_PREFIX but changing its default to the
"classical" value be a working solution for this ?

Yes, I think we should. And I think I should have been smart enough to
realize that changing the default behavior was too risky in the first
place. Apologies for that.

My thought now is to add yet one more option, like
--python-prefix-from-python, to get the new 1.16.4 behavior of using the
"computed" sys.* values. Else go back to the previous $prefix-based behavior.

Does that sound sensible? A better name for the option?

Joshua (or anyone), would you be willing to work on something like this by
any chance? It would be greatly, greatly appreciated. I am way
overcommitted right now (like all of us, I know ...).

please keep the --with-python_prefix

Definitely. --thanks, karl.



Re: automake 1.16.4 and new PYTHON_PREFIX

2021-08-25 Thread Joshua Root

On 2021-8-25 10:14 , Karl Berry wrote:

Ok, I guess we'll have to revert the Python change and make another
release. I was worried about the change. But I am not sure of how best
to deal with the intended benefits.

Joshua, can you please take a look at these reports and advise?
https://lists.gnu.org/archive/html/automake/2021-08/msg7.html
https://lists.gnu.org/archive/html/automake/2021-08/msg6.html


I guess the fundamental question is: Why are we asking python where it 
wants modules installed? If it's so we can install modules in a place 
where python will find them, it makes sense that $PYTHON_PREFIX may be 
distinct from $prefix. If it's so that we can install modules in a 
consistent subdir relative to the prefix, then we run into the problem 
that what python gives you is not always relative to the prefix (and 
isn't necessarily consistent). It seems like using a fixed subdir would 
solve that problem better than asking python.
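To make the contrast concrete (the variable name and layout below are only 
illustrative, not automake's actual internals), the two approaches amount to 
roughly:

  # asking the interpreter where it wants modules (may not be under $prefix):
  python -c 'import sys; print(sys.prefix)'
  # versus a fixed subdir relative to the configured prefix:
  pythondir='${prefix}/lib/python${PYTHON_VERSION}/site-packages'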


If reliance on the previous behaviour is widespread enough that the 
change is unacceptable, some alternatives might be to have different 
behaviour depending on whether the discovered python is in the 
configured prefix or not, or depending on whether or not python is a 
framework build. The downside is more special cases and more potential 
for confusion.


Whatever else you decide to do, please keep the --with-python_prefix 
option. Having that makes it at least possible to install in the right 
place for framework builds, even if the defaults go back to being 
incorrect for that case.


- Josh



Re: automake 1.16.4 and new PYTHON_PREFIX

2021-08-24 Thread FOURNIER Yvan via Discussion list for automake
Hello,

Would keeping PYTHON_PREFIX but changing its default to the "classical" value 
be a working solution for this ?

Best regards,

Yvan

On 25 August 2021 at 07:08, j...@macports.org wrote:
On 2021-8-25 10:14 , Karl Berry wrote:
> Ok, I guess we'll have to revert the Python change and make another
> release. I was worried about the change. But I am not sure of how best
> to deal with the intended benefits.
>
> Joshua, can you please take a look at these reports and advise?
> https://lists.gnu.org/archive/html/automake/2021-08/msg7.html
> https://lists.gnu.org/archive/html/automake/2021-08/msg6.html

I guess the fundamental question is: Why are we asking python where it
wants modules installed? If it's so we can install modules in a place
where python will find them, it makes sense that $PYTHON_PREFIX may be
distinct from $prefix. If it's so that we can install modules in a
consistent subdir relative to the prefix, then we run into the problem
that what python gives you is not always relative to the prefix (and
isn't necessarily consistent). It seems like using a fixed subdir would
solve that problem better than asking python.

If reliance on the previous behaviour is widespread enough that the
change is unacceptable, some alternatives might be to have different
behaviour depending on whether the discovered python is in the
configured prefix or not, or depending on whether or not python is a
framework build. The downside is more special cases and more potential
for confusion.

Whatever else you decide to do, please keep the --with-python_prefix
option. Having that makes it at least possible to install in the right
place for framework builds, even if the defaults go back to being
incorrect for that case.

- Josh





RE: automake 1.16.4 and new PYTHON_PREFIX

2021-08-24 Thread Karl Berry
Ok, I guess we'll have to revert the Python change and make another
release. I was worried about the change. But I am not sure of how best
to deal with the intended benefits.

Joshua, can you please take a look at these reports and advise?
https://lists.gnu.org/archive/html/automake/2021-08/msg7.html
https://lists.gnu.org/archive/html/automake/2021-08/msg6.html

Thanks,
Karl


