Re: Sourceware mitigating and preventing the next xz-backdoor

2024-04-03 Thread Florian Weimer via Gcc
* Guinevere Larsen via Overseers:

> Beyond that, we (GDB) are already experimenting with approved-by, and
> I think glibc was doing the same.

The glibc project uses Reviewed-by:, but it's completely unrelated to
this.  Everyone still pushes their own patches, and there are no
technical countermeasures in place to ensure that the pushed version is
the reviewed version.

Thanks,
Florian



Patches submission policy change

2024-04-03 Thread Christophe Lyon via Gcc
Dear release managers and developers,

TL;DR: For the sake of improving precommit CI coverage and simplifying
workflows, I’d like to request a patch submission policy change, so
that we now include regenerated files. This was discussed during the
last GNU toolchain office hours meeting [1] (2024-03-28).

Benefits of this change include:
- Increased compatibility with precommit CI
- No need to manually edit patches before submitting, thus the “git
send-email” workflow is simplified
- Patch reviewers can be confident that the committed patch will be
exactly what they approved
- Precommit CI can test exactly what has been submitted

Any concerns/objections?

As discussed on the lists and during the meeting, we have been facing
issues with testing a class of patches: those which imply regenerating
some files. Indeed, for binutils and gcc, the current patch submission
policy is to *not* include the regenerated files (IIUC the policy is
different for gdb [2]).

This means that precommit CI receives an incomplete patch, leading to
wrong and misleading regression reports, and complaints/frustration.
(our notifications now include a warning, making it clear that we do
not regenerate files ;-) )

I thought the solution was as easy as adding --enable-maintainer-mode
to the configure arguments but this has proven to be broken (random
failures with highly parallel builds).  I tried to work around that by
adding new “regenerate” rules in the makefiles, that we could build at
-j1 before running the real build with a higher parallelism level, but
this is not ideal, not to mention that in the case of gcc, configuring
target libraries requires having built all-gcc before, which is quite
slow at -j1.

Another approach used in binutils and gdb buildbots is a dedicated
script (aka autoregen.py) which calls autotools as appropriate. It
could be extended to update other types of files, but this can be a
bit tricky (e.g. some opcodes files require building a generator first,
some doc fragments also use non-trivial build sequences), and it
creates a maintenance issue: the build recipe is duplicated in the
script and in the makefiles.  Such a script has proven to be useful
though in postcommit CI, to catch regeneration errors.

Having said that, it seems that for the sake of improving usefulness
of precommit CI, the simplest way forward is to change the policy to
include regenerated files.  This does not seem to be a burden for
developers, since they have to regenerate the files before pushing
their patches anyway, and it also enables reviewers to make sure the
generated files are correct.

Said differently, if you want to increase the chances of having your
patches tested by precommit CI, make sure to include all the
regenerated files, otherwise you might receive failure notifications.

Any concerns/objections?

Thanks,

Christophe

[1] https://gcc.gnu.org/wiki/OfficeHours#Meeting:_2024-03-28_.40_1100h_EST5EDT
[2] https://inbox.sourceware.org/gdb/cc0a5c86-a041-429a-9890-efd393760...@simark.ca/


Re: Patches submission policy change

2024-04-03 Thread Jan Beulich via Gcc
On 03.04.2024 10:22, Christophe Lyon wrote:
> Dear release managers and developers,
> 
> TL;DR: For the sake of improving precommit CI coverage and simplifying
> workflows, I’d like to request a patch submission policy change, so
> that we now include regenerated files. This was discussed during the
> last GNU toolchain office hours meeting [1] (2024-03-28).
> 
> Benefits or this change include:
> - Increased compatibility with precommit CI
> - No need to manually edit patches before submitting, thus the “git
> send-email” workflow is simplified
> - Patch reviewers can be confident that the committed patch will be
> exactly what they approved
> - Precommit CI can test exactly what has been submitted
> 
> Any concerns/objections?

Yes: Patch size. And no, not sending patches inline is bad practice.
Even assuming sending patches bi-modal (inline and as attachment) works
(please indicate whether that's the case), it would mean extra work on
the sending side.

Jan


Re: Patches submission policy change

2024-04-03 Thread Jakub Jelinek via Gcc
On Wed, Apr 03, 2024 at 10:22:24AM +0200, Christophe Lyon wrote:
> Any concerns/objections?

I'm all for it, in fact I've been sending it like that myself for years
even when the policy said not to.  In most cases, the diff for the
regenerated files is very small and it helps even in patch review to
actually check if the configure.ac/m4 etc. changes result just in the
expected changes and not some unrelated ones (e.g. caused by user using
wrong version of autoconf/automake etc.).
There can be exceptions, e.g. when in GCC we update from a new version
of Unicode, the regenerated ucnid.h diff can be large and
uname2c.h can be huge, such that it can trigger the mailing list size
limits even when the patch is compressed, see e.g.
https://gcc.gnu.org/pipermail/gcc-patches/2023-November/636427.html
https://gcc.gnu.org/pipermail/gcc-patches/2023-November/636426.html
But I think most configure or Makefile changes should be pretty small,
usual changes shouldn't rewrite everything in those files.

Jakub



Re: Patches submission policy change

2024-04-03 Thread Jan Beulich via Gcc
On 03.04.2024 10:45, Jakub Jelinek wrote:
> On Wed, Apr 03, 2024 at 10:22:24AM +0200, Christophe Lyon wrote:
>> Any concerns/objections?
> 
> I'm all for it, in fact I've been sending it like that myself for years
> even when the policy said not to.  In most cases, the diff for the
> regenerated files is very small and it helps even in patch review to
> actually check if the configure.ac/m4 etc. changes result just in the
> expected changes and not some unrelated ones (e.g. caused by user using
> wrong version of autoconf/automake etc.).
> There can be exceptions, e.g. when in GCC we update from a new version
> of Unicode, the regenerated ucnid.h diff can be large and
> uname2c.h can be huge, such that it can trigger the mailing list size
> limits even when the patch is compressed, see e.g.
> https://gcc.gnu.org/pipermail/gcc-patches/2023-November/636427.html
> https://gcc.gnu.org/pipermail/gcc-patches/2023-November/636426.html
> But I think most configure or Makefile changes should be pretty small,
> usual changes shouldn't rewrite everything in those files.

Which may then call for a policy saying "include generate script diff-s,
but don't include generated data file ones"? At least on the binutils
side, dealing (for CI) with what e.g. opcodes/*-gen produce ought to be
possible by having something along the lines of "maintainer mode light".

Jan


Re: Patches submission policy change

2024-04-03 Thread Richard Biener via Gcc
On Wed, 3 Apr 2024, Jan Beulich wrote:

> On 03.04.2024 10:45, Jakub Jelinek wrote:
> > On Wed, Apr 03, 2024 at 10:22:24AM +0200, Christophe Lyon wrote:
> >> Any concerns/objections?
> > 
> > I'm all for it, in fact I've been sending it like that myself for years
> > even when the policy said not to.  In most cases, the diff for the
> > regenerated files is very small and it helps even in patch review to
> > actually check if the configure.ac/m4 etc. changes result just in the
> > expected changes and not some unrelated ones (e.g. caused by user using
> > wrong version of autoconf/automake etc.).
> > There can be exceptions, e.g. when in GCC we update from a new version
> > of Unicode, the regenerated ucnid.h diff can be large and
> > uname2c.h can be huge, such that it can trigger the mailing list size
> > limits even when the patch is compressed, see e.g.
> > https://gcc.gnu.org/pipermail/gcc-patches/2023-November/636427.html
> > https://gcc.gnu.org/pipermail/gcc-patches/2023-November/636426.html
> > But I think most configure or Makefile changes should be pretty small,
> > usual changes shouldn't rewrite everything in those files.
> 
> Which may then call for a policy saying "include generate script diff-s,
> but don't include generated data file ones"? At least on the binutils
> side, dealing (for CI) with what e.g. opcodes/*-gen produce ought to be
> possible by having something along the lines of "maintainer mode light".

I'd say we should send generated files when it fits the mailing list
limits (and possibly simply lift those limits?).  As a last resort
do a series splitting the re-generation out (but I guess this would
confuse the CI as well and of course for the push you want to squash
again).

Richard.

> Jan
> 


Re: Patches submission policy change

2024-04-03 Thread Jonathan Wakely via Gcc
On Wed, 3 Apr 2024 at 09:46, Jakub Jelinek via Gcc  wrote:
>
> On Wed, Apr 03, 2024 at 10:22:24AM +0200, Christophe Lyon wrote:
> > Any concerns/objections?
>
> I'm all for it, in fact I've been sending it like that myself for years
> even when the policy said not to.  In most cases, the diff for the
> regenerated files is very small and it helps even in patch review to
> actually check if the configure.ac/m4 etc. changes result just in the
> expected changes and not some unrelated ones (e.g. caused by user using
> wrong version of autoconf/automake etc.).
> There can be exceptions, e.g. when in GCC we update from a new version
> of Unicode, the regenerated ucnid.h diff can be large and
> uname2c.h can be huge, such that it can trigger the mailing list size
> limits even when the patch is compressed, see e.g.
> https://gcc.gnu.org/pipermail/gcc-patches/2023-November/636427.html
> https://gcc.gnu.org/pipermail/gcc-patches/2023-November/636426.html
> But I think most configure or Makefile changes should be pretty small,
> usual changes shouldn't rewrite everything in those files.

For libstdc++ we've had two "large" changes to regenerated files in
the past year, but they're not common:
https://gcc.gnu.org/r14-5424-gdb50aea6259545
https://gcc.gnu.org/r14-5342-g898fd81b831c10

We were getting large, useless diffs for
libstdc++-v3/include/bits/version.h too (e.g.
r14-7220-gac1a399bf61b04) but I've fixed that now.

In ye olde days I used filterdiff to strip the generated files from
patch submissions, but with git send-email I no longer use filterdiff,
so as Christophe said, the suggested policy would avoid manually
editing emails before sending.

I don't feel strongly either way, but I have no objection to the change.


Re: Patches submission policy change

2024-04-03 Thread Jan Beulich via Gcc
On 03.04.2024 10:57, Richard Biener wrote:
> On Wed, 3 Apr 2024, Jan Beulich wrote:
>> On 03.04.2024 10:45, Jakub Jelinek wrote:
>>> On Wed, Apr 03, 2024 at 10:22:24AM +0200, Christophe Lyon wrote:
 Any concerns/objections?
>>>
>>> I'm all for it, in fact I've been sending it like that myself for years
>>> even when the policy said not to.  In most cases, the diff for the
>>> regenerated files is very small and it helps even in patch review to
>>> actually check if the configure.ac/m4 etc. changes result just in the
>>> expected changes and not some unrelated ones (e.g. caused by user using
>>> wrong version of autoconf/automake etc.).
>>> There can be exceptions, e.g. when in GCC we update from a new version
>>> of Unicode, the regenerated ucnid.h diff can be large and
>>> uname2c.h can be huge, such that it can trigger the mailing list size
>>> limits even when the patch is compressed, see e.g.
>>> https://gcc.gnu.org/pipermail/gcc-patches/2023-November/636427.html
>>> https://gcc.gnu.org/pipermail/gcc-patches/2023-November/636426.html
>>> But I think most configure or Makefile changes should be pretty small,
>>> usual changes shouldn't rewrite everything in those files.
>>
>> Which may then call for a policy saying "include generate script diff-s,
>> but don't include generated data file ones"? At least on the binutils
>> side, dealing (for CI) with what e.g. opcodes/*-gen produce ought to be
>> possible by having something along the lines of "maintainer mode light".
> 
> I'd say we should send generated files when it fits the mailing list
> limits (and possibly simply lift those limits?).

Well, that would allow patches making it through, but it would still
severely increase overall size. I'm afraid more people than not also
fail to cut down reply context, so we'd further see (needlessly) huge
replies to patches as well.

Additionally - how does one up front determine "fits the mailing list
limits"? My mail UI (Thunderbird) doesn't show me the size of a message
until I've actually sent it.

>  As a last resort
> do a series splitting the re-generation out (but I guess this would
> confuse the CI as well and of course for the push you want to squash
> again).

Yeah, unless the CI would only ever test full series, this wouldn't help.
It's also imo even more cumbersome than simply stripping the generated
file parts from emails.

Jan


Re: Patches submission policy change

2024-04-03 Thread Joel Sherrill
Another possible issue, which may be less of a problem now than in years
past, is that the versions of autoconf/automake required often had to be
installed by hand. I think newlib has gotten better but before the
rework on its Makefile/configure, I had a special install of autotools
which precisely matched what it required.

And that led to very few people being able to successfully regenerate.

Is that avoidable?

OTOH the set of people touching these files may be small enough that
pain isn't an issue.

--joel

On Wed, Apr 3, 2024 at 5:22 AM Jan Beulich via Gcc  wrote:

> On 03.04.2024 10:57, Richard Biener wrote:
> > On Wed, 3 Apr 2024, Jan Beulich wrote:
> >> On 03.04.2024 10:45, Jakub Jelinek wrote:
> >>> On Wed, Apr 03, 2024 at 10:22:24AM +0200, Christophe Lyon wrote:
>  Any concerns/objections?
> >>>
> >>> I'm all for it, in fact I've been sending it like that myself for years
> >>> even when the policy said not to.  In most cases, the diff for the
> >>> regenerated files is very small and it helps even in patch review to
> >>> actually check if the configure.ac/m4 etc. changes result just in the
> >>> expected changes and not some unrelated ones (e.g. caused by user using
> >>> wrong version of autoconf/automake etc.).
> >>> There can be exceptions, e.g. when in GCC we update from a new version
> >>> of Unicode, the regenerated ucnid.h diff can be large and
> >>> uname2c.h can be huge, such that it can trigger the mailing list size
> >>> limits even when the patch is compressed, see e.g.
> >>> https://gcc.gnu.org/pipermail/gcc-patches/2023-November/636427.html
> >>> https://gcc.gnu.org/pipermail/gcc-patches/2023-November/636426.html
> >>> But I think most configure or Makefile changes should be pretty small,
> >>> usual changes shouldn't rewrite everything in those files.
> >>
> >> Which may then call for a policy saying "include generate script diff-s,
> >> but don't include generated data file ones"? At least on the binutils
> >> side, dealing (for CI) with what e.g. opcodes/*-gen produce ought to be
> >> possible by having something along the lines of "maintainer mode light".
> >
> > I'd say we should send generated files when it fits the mailing list
> > limits (and possibly simply lift those limits?).
>
> Well, that would allow patches making it through, but it would still
> severely increase overall size. I'm afraid more people than not also
> fail to cut down reply context, so we'd further see (needlessly) huge
> replies to patches as well.
>
> Additionally - how does one up front determine "fits the mailing list
> limits"? My mail UI (Thunderbird) doesn't show me the size of a message
> until I've actually sent it.
>
> >  As a last resort
> > do a series splitting the re-generation out (but I guess this would
> > confuse the CI as well and of course for the push you want to squash
> > again).
>
> Yeah, unless the CI would only ever test full series, this wouldn't help.
> It's also imo even more cumbersome than simply stripping the generated
> file parts from emails.
>
> Jan
>


Re: Patches submission policy change

2024-04-03 Thread Jason Merrill via Gcc
On Wed, Apr 3, 2024 at 6:23 AM Jan Beulich via Gcc  wrote:

> On 03.04.2024 10:57, Richard Biener wrote:
> > On Wed, 3 Apr 2024, Jan Beulich wrote:
> >> On 03.04.2024 10:45, Jakub Jelinek wrote:
> >>> On Wed, Apr 03, 2024 at 10:22:24AM +0200, Christophe Lyon wrote:
>  Any concerns/objections?
> >>>
> >>> I'm all for it, in fact I've been sending it like that myself for years
> >>> even when the policy said not to.  In most cases, the diff for the
> >>> regenerated files is very small and it helps even in patch review to
> >>> actually check if the configure.ac/m4 etc. changes result just in the
> >>> expected changes and not some unrelated ones (e.g. caused by user using
> >>> wrong version of autoconf/automake etc.).
> >>> There can be exceptions, e.g. when in GCC we update from a new version
> >>> of Unicode, the regenerated ucnid.h diff can be large and
> >>> uname2c.h can be huge, such that it can trigger the mailing list size
> >>> limits even when the patch is compressed, see e.g.
> >>> https://gcc.gnu.org/pipermail/gcc-patches/2023-November/636427.html
> >>> https://gcc.gnu.org/pipermail/gcc-patches/2023-November/636426.html
> >>> But I think most configure or Makefile changes should be pretty small,
> >>> usual changes shouldn't rewrite everything in those files.
> >>
> >> Which may then call for a policy saying "include generate script diff-s,
> >> but don't include generated data file ones"? At least on the binutils
> >> side, dealing (for CI) with what e.g. opcodes/*-gen produce ought to be
> >> possible by having something along the lines of "maintainer mode light".
> >
> > I'd say we should send generated files when it fits the mailing list
> > limits (and possibly simply lift those limits?).
>
> Well, that would allow patches making it through, but it would still
> severely increase overall size. I'm afraid more people than not also
> fail to cut down reply context, so we'd further see (needlessly) huge
> replies to patches as well.
>
> Additionally - how does one up front determine "fits the mailing list
> limits"? My mail UI (Thunderbird) doesn't show me the size of a message
> until I've actually sent it.
>
> >  As a last resort
> > do a series splitting the re-generation out (but I guess this would
> > confuse the CI as well and of course for the push you want to squash
> > again).
>
> Yeah, unless the CI would only ever test full series, this wouldn't help.
> It's also imo even more cumbersome than simply stripping the generated
> file parts from emails.
>

Massive patches need special handling either way; I wouldn't lift the
mailing list limits because of this change.

I'm in favor of this change for typical patches.

Jason


Re: Patches submission policy change

2024-04-03 Thread Christophe Lyon via Gcc
On Wed, 3 Apr 2024 at 10:30, Jan Beulich  wrote:
>
> On 03.04.2024 10:22, Christophe Lyon wrote:
> > Dear release managers and developers,
> >
> > TL;DR: For the sake of improving precommit CI coverage and simplifying
> > workflows, I’d like to request a patch submission policy change, so
> > that we now include regenerated files. This was discussed during the
> > last GNU toolchain office hours meeting [1] (2024-03-28).
> >
> > Benefits or this change include:
> > - Increased compatibility with precommit CI
> > - No need to manually edit patches before submitting, thus the “git
> > send-email” workflow is simplified
> > - Patch reviewers can be confident that the committed patch will be
> > exactly what they approved
> > - Precommit CI can test exactly what has been submitted
> >
> > Any concerns/objections?
>
> Yes: Patch size. And no, not sending patches inline is bad practice.
Not sure what you mean? Do you mean sending patches as attachments is
bad practice?

> Even assuming sending patches bi-modal (inline and as attachment) works
> (please indicate whether that's the case), it would mean extra work on
> the sending side.
>
From the CI perspective, we use what patchwork is able to detect as patches.
Looking at recent patches submissions, it seems patchwork is able to
cope with the output of git format-patch/git send-email, as well as
attachments.
There are cases where patchwork is not able to detect the patch, but I
don't know patchwork's exact specifications.

Thanks,

Christophe

> Jan


Re: Patches submission policy change

2024-04-03 Thread Christophe Lyon via Gcc
On Wed, 3 Apr 2024 at 12:21, Jan Beulich  wrote:
>
> On 03.04.2024 10:57, Richard Biener wrote:
> > On Wed, 3 Apr 2024, Jan Beulich wrote:
> >> On 03.04.2024 10:45, Jakub Jelinek wrote:
> >>> On Wed, Apr 03, 2024 at 10:22:24AM +0200, Christophe Lyon wrote:
>  Any concerns/objections?
> >>>
> >>> I'm all for it, in fact I've been sending it like that myself for years
> >>> even when the policy said not to.  In most cases, the diff for the
> >>> regenerated files is very small and it helps even in patch review to
> >>> actually check if the configure.ac/m4 etc. changes result just in the
> >>> expected changes and not some unrelated ones (e.g. caused by user using
> >>> wrong version of autoconf/automake etc.).
> >>> There can be exceptions, e.g. when in GCC we update from a new version
> >>> of Unicode, the regenerated ucnid.h diff can be large and
> >>> uname2c.h can be huge, such that it can trigger the mailing list size
> >>> limits even when the patch is compressed, see e.g.
> >>> https://gcc.gnu.org/pipermail/gcc-patches/2023-November/636427.html
> >>> https://gcc.gnu.org/pipermail/gcc-patches/2023-November/636426.html
> >>> But I think most configure or Makefile changes should be pretty small,
> >>> usual changes shouldn't rewrite everything in those files.
> >>
> >> Which may then call for a policy saying "include generate script diff-s,
> >> but don't include generated data file ones"? At least on the binutils
> >> side, dealing (for CI) with what e.g. opcodes/*-gen produce ought to be
> >> possible by having something along the lines of "maintainer mode light".
> >
> > I'd say we should send generated files when it fits the mailing list
> > limits (and possibly simply lift those limits?).
>
> Well, that would allow patches making it through, but it would still
> severely increase overall size. I'm afraid more people than not also
> fail to cut down reply context, so we'd further see (needlessly) huge
> replies to patches as well.
>
> Additionally - how does one up front determine "fits the mailing list
> limits"? My mail UI (Thunderbird) doesn't show me the size of a message
> until I've actually sent it.
>
> >  As a last resort
> > do a series splitting the re-generation out (but I guess this would
> > confuse the CI as well and of course for the push you want to squash
> > again).
>
> Yeah, unless the CI would only ever test full series, this wouldn't help.
> It's also imo even more cumbersome than simply stripping the generated
> file parts from emails.
>

Our CI starts by testing the full series, then iterates by dropping
the top-most patches one by one, to make sure no patch breaks
something that is fixed in a later patch.
This is meant to be additional information for patch reviewers, if
they use patchwork to assist them.

Other CIs may behave differently though.

Thanks,

Christophe

> Jan


Re: Patches submission policy change

2024-04-03 Thread Christophe Lyon via Gcc
On Wed, 3 Apr 2024 at 14:59, Joel Sherrill  wrote:
>
> Another possible issue which may be better now than in years past
> is that the versions of autoconf/automake required often had to be
> installed by hand. I think newlib has gotten better but before the
> rework on its Makefile/configure, I had a special install of autotools
> which precisely matched what it required.
>
> And that led to very few people being able to successfully regenerate.
>
> Is that avoidable?
>
> OTOH the set of people touching these files may be small enough that
> pain isn't an issue.
>

For binutils/gcc/gdb we still have to use specific versions which are
generally not the distro's ones.
So indeed, the number of people having to update autotools-related
files is relatively small, but there are other files which are
regenerated during the build process when maintainer-mode is enabled
(for instance parts of the gcc documentation, or opcodes tables in
binutils), and these are modified by more people.

Thanks,

Christophe

> --joel
>
> On Wed, Apr 3, 2024 at 5:22 AM Jan Beulich via Gcc  wrote:
>>
>> On 03.04.2024 10:57, Richard Biener wrote:
>> > On Wed, 3 Apr 2024, Jan Beulich wrote:
>> >> On 03.04.2024 10:45, Jakub Jelinek wrote:
>> >>> On Wed, Apr 03, 2024 at 10:22:24AM +0200, Christophe Lyon wrote:
>>  Any concerns/objections?
>> >>>
>> >>> I'm all for it, in fact I've been sending it like that myself for years
>> >>> even when the policy said not to.  In most cases, the diff for the
>> >>> regenerated files is very small and it helps even in patch review to
>> >>> actually check if the configure.ac/m4 etc. changes result just in the
>> >>> expected changes and not some unrelated ones (e.g. caused by user using
>> >>> wrong version of autoconf/automake etc.).
>> >>> There can be exceptions, e.g. when in GCC we update from a new version
>> >>> of Unicode, the regenerated ucnid.h diff can be large and
>> >>> uname2c.h can be huge, such that it can trigger the mailing list size
>> >>> limits even when the patch is compressed, see e.g.
>> >>> https://gcc.gnu.org/pipermail/gcc-patches/2023-November/636427.html
>> >>> https://gcc.gnu.org/pipermail/gcc-patches/2023-November/636426.html
>> >>> But I think most configure or Makefile changes should be pretty small,
>> >>> usual changes shouldn't rewrite everything in those files.
>> >>
>> >> Which may then call for a policy saying "include generate script diff-s,
>> >> but don't include generated data file ones"? At least on the binutils
>> >> side, dealing (for CI) with what e.g. opcodes/*-gen produce ought to be
>> >> possible by having something along the lines of "maintainer mode light".
>> >
>> > I'd say we should send generated files when it fits the mailing list
>> > limits (and possibly simply lift those limits?).
>>
>> Well, that would allow patches making it through, but it would still
>> severely increase overall size. I'm afraid more people than not also
>> fail to cut down reply context, so we'd further see (needlessly) huge
>> replies to patches as well.
>>
>> Additionally - how does one up front determine "fits the mailing list
>> limits"? My mail UI (Thunderbird) doesn't show me the size of a message
>> until I've actually sent it.
>>
>> >  As a last resort
>> > do a series splitting the re-generation out (but I guess this would
>> > confuse the CI as well and of course for the push you want to squash
>> > again).
>>
>> Yeah, unless the CI would only ever test full series, this wouldn't help.
>> It's also imo even more cumbersome than simply stripping the generated
>> file parts from emails.
>>
>> Jan


How to get the default values for parameters?

2024-04-03 Thread Hanke Zhang via Gcc
Hi,
I'm trying to get the default values for parameters of some functions
in my GIMPLE pass. The example code is here:

enum {
  edefault1 = 1,
  edefault2 = 2,
  edefault3 = 3
};

void func(int p0, int p1 = edefault1, int p2 = edefault2, int p3 = edefault3) {
  // do other things
}

I want to get the value of `p1` here. But I didn't find a way to get it.

And I noticed that the macro `DECL_INITIAL` cannot be applied on
PARM_DECL. And the comments say that the default values for parameters
are encoded in the type of the function. But I can't find it.

If you know how to get the value, I'll be so grateful! :)

Thanks
Hanke Zhang


Re: Sourceware mitigating and preventing the next xz-backdoor

2024-04-03 Thread Joel Sherrill
On Wed, Apr 3, 2024, 3:09 AM Florian Weimer via Gdb 
wrote:

> * Guinevere Larsen via Overseers:
>
> > Beyond that, we (GDB) are already experimenting with approved-by, and
> > I think glibc was doing the same.
>
> The glibc project uses Reviewed-by:, but it's completely unrelated to
> this.  Everyone still pushes their own patches, and there are no
> technical countermeasures in place to ensure that the pushed version is
> the reviewed version.
>

Or that there isn't "collusion" between a malicious author and reviewer.
Tagging it approved or reviewed-by just gives you two people to blame.
It is not a perfect solution either.

But double checking and checklists are good practices. They are not
foolproof if some bad actor is determined enough.

--joel



> Thanks,
> Florian
>
>


Re: Sourceware mitigating and preventing the next xz-backdoor

2024-04-03 Thread Michael Matz via Gcc
Hello,

On Wed, 3 Apr 2024, Martin Uecker via Gcc wrote:

> > > Seems reasonable, but note that it wouldn't make any difference to
> > > this attack.  The liblzma library was modified to corrupt the sshd
> > > binary, when sshd was linked against liblzma.  The actual attack
> > > occurred via a connection to a corrupt sshd.  If sshd was running as
> > > root, as is normal, the attacker had root access to the machine.  None
> > > of the attacking steps had anything to do with having root access
> > > while building or installing the program.
> 
> There does not seem a single good solution against something like this.
> 
> My takeaway is that software needs to become less complex. Do 
> we really still need complex build systems such as autoconf?

Do we really need complex languages like C++ to write our software in?  
SCNR :)  Complexity lies in the eye of the beholder, but to be honest in 
the software that we're dealing with here, the build system or autoconf 
does _not_ come to mind first when thinking about complexity.

(And, FWIW, testing for features isn't "complex".  And have you looked at 
other build systems?  I have, and none of them are less complex, just 
opaque in different ways from make+autotools).


Ciao,
Michael.


Re: Sourceware mitigating and preventing the next xz-backdoor

2024-04-03 Thread Tom Tromey
> "Florian" == Florian Weimer  writes:

Florian> Everyone still pushes their own patches, and there are no
Florian> technical countermeasures in place to ensure that the pushed version is
Florian> the reviewed version.

This is a problem for gdb as well.

Probably we should switch to some kind of pull-request model, where
patches can only be landed via the UI, after sufficient review; and
where all generated files are regenerated by the robot before checkin.
(Or alternatively some CI runs and rejects patches where they don't
match.)

Tom


Re: Sourceware mitigating and preventing the next xz-backdoor

2024-04-03 Thread Paul Koning via Gcc



> On Apr 3, 2024, at 10:00 AM, Michael Matz  wrote:
> 
> Hello,
> 
> On Wed, 3 Apr 2024, Martin Uecker via Gcc wrote:
> 
 Seems reasonable, but note that it wouldn't make any difference to
 this attack.  The liblzma library was modified to corrupt the sshd
 binary, when sshd was linked against liblzma.  The actual attack
 occurred via a connection to a corrupt sshd.  If sshd was running as
 root, as is normal, the attacker had root access to the machine.  None
 of the attacking steps had anything to do with having root access
 while building or installing the program.
>> 
>> There does not seem a single good solution against something like this.
>> 
>> My takeaway is that software needs to become less complex. Do 
>> we really still need complex build systems such as autoconf?
> 
> Do we really need complex languages like C++ to write our software in?  
> SCNR :)  Complexity lies in the eye of the beholder, but to be honest in 
> the software that we're dealing with here, the build system or autoconf 
> does _not_ come to mind first when thinking about complexity.
> 
> (And, FWIW, testing for features isn't "complex".  And have you looked at 
> other build systems?  I have, and none of them are less complex, just 
> opaque in different ways from make+autotools).
> 
> Ciao,
> Michael.

I would tend to agree with that even given my limited exposure to alternatives.

One aspect of the present attack that needs to be cured is that -- as I 
understand it -- the open source repository was fine but the kit as distributed 
had been subverted.  In other words, the standard assumption that the 
repository actually corresponds to the released code was not valid.  And 
furthermore, that it wasn't unusual for the kit to contain different or 
additional elements, just that it wasn't supposed to differ in malicious ways.

One possible answer is for all elements of kits to be made explicitly visible, 
though generated files probably don't want to be held in a normal source 
control system.  Another possible answer is for consumers of kits to treat kits 
as suspect, and have them unpacked and examined -- including any elements not 
source controlled -- before acceptance.  I think the first option is better 
because it exposes these additional elements to ongoing scrutiny from the 
entire community, rather than only one-time inspection by release managers who 
are probably quite pressed for time.

Either way, the reasons for these extra files to exist and the manner in which 
they are supposed to be generated would need to be both well documented and 
readily reproducible by outside parties.

paul



Re: Sourceware mitigating and preventing the next xz-backdoor

2024-04-03 Thread Martin Uecker via Gcc
Am Mittwoch, dem 03.04.2024 um 16:00 +0200 schrieb Michael Matz:
> Hello,
> 
> On Wed, 3 Apr 2024, Martin Uecker via Gcc wrote:
> 
> > > > Seems reasonable, but note that it wouldn't make any difference to
> > > > this attack.  The liblzma library was modified to corrupt the sshd
> > > > binary, when sshd was linked against liblzma.  The actual attack
> > > > occurred via a connection to a corrupt sshd.  If sshd was running as
> > > > root, as is normal, the attacker had root access to the machine.  None
> > > > of the attacking steps had anything to do with having root access
> > > > while building or installing the program.
> > 
> > There does not seem a single good solution against something like this.
> > 
> > My takeaway is that software needs to become less complex. Do 
> > we really still need complex build systems such as autoconf?
> 
> Do we really need complex languages like C++ to write our software in?  
> SCNR :)  

Probably not.

> Complexity lies in the eye of the beholder, but to be honest in 
> the software that we're dealing with here, the build system or autoconf 
> does _not_ come to mind first when thinking about complexity.

The backdoor was hidden in a complicated autoconf script...

> 
> (And, FWIW, testing for features isn't "complex".  And have you looked at 
> other build systems?  I have, and none of them are less complex, just 
> opaque in different ways from make+autotools).

I ask a very specific question: To what extent is testing
for features instead of semantic versions and/or supported
standards still necessary?  This seems like a problematic approach
that may have been necessary decades ago, but it seems it may be
time to move on.


Martin




Re: Sourceware mitigating and preventing the next xz-backdoor

2024-04-03 Thread Jeff Law via Gcc




On 4/3/24 8:04 AM, Tom Tromey wrote:

"Florian" == Florian Weimer  writes:


Florian> Everyone still pushes their own patches, and there are no
Florian> technical countermeasures in place to ensure that the pushed version is
Florian> the reviewed version.

This is a problem for gdb as well.

Probably we should switch to some kind of pull-request model, where
patches can only be landed via the UI, after sufficient review; and
where all generated files are regenerated by the robot before checkin.
(Or alternatively some CI runs and rejects patches where they don't
match.)

I'd very much prefer to move to a pull-request model.

jeff


Re: Sourceware mitigating and preventing the next xz-backdoor

2024-04-03 Thread Jeffrey Walton via Gcc
On Wed, Apr 3, 2024 at 10:33 AM Martin Uecker via Gdb
 wrote:
>
> Am Mittwoch, dem 03.04.2024 um 16:00 +0200 schrieb Michael Matz:
> > [...]
> > (And, FWIW, testing for features isn't "complex".  And have you looked at
> > other build systems?  I have, and none of them are less complex, just
> > opaque in different ways from make+autotools).
>
> I ask a very specific question: To what extend is testing
> for features instead of semantic versions and/or supported
> standards still necessary?  This seems like a problematic approach
> that  may have been necessary decades ago, but it seems it may be
> time to move on.

I think it is still needed. As a first example, Musl does not define
preprocessor macros to identify itself. The project feels features
should be probed at build time with a tool like Autoconf. As a second
example, activating code paths, like AVX2 and AVX512, requires
ensuring the compiler actually supports the ISA. Building a program on
an older distribution with an older compiler could otherwise run afoul of this.
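
As an illustration of that second point, a minimal sketch of the usual
two-level guard: the preprocessor check keeps older compilers building,
while the runtime check keeps older CPUs working. It assumes GCC or
Clang on x86; the names are only for the example.

#include <cstdio>

// Compile-time guard: this branch only exists when the compiler itself
// was invoked with the ISA enabled (e.g. -mavx2 on x86).
#if defined(__AVX2__)
static const char *compiled_path () { return "AVX2 code path"; }
#else
static const char *compiled_path () { return "portable code path"; }
#endif

int main ()
{
#if defined(__x86_64__) || defined(__i386__)
  // Run-time guard: the machine the binary eventually runs on may be
  // older than the machine (and compiler) it was built with.
  bool cpu_ok = __builtin_cpu_supports ("avx2") != 0;
#else
  bool cpu_ok = false;
#endif
  std::printf ("compiled for: %s, cpu supports avx2: %d\n",
               compiled_path (), cpu_ok ? 1 : 0);
  return 0;
}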

Jeff


Re: Patches submission policy change

2024-04-03 Thread Simon Marchi via Gcc
On 4/3/24 4:22 AM, Christophe Lyon via Gdb wrote:
> Dear release managers and developers,
> 
> TL;DR: For the sake of improving precommit CI coverage and simplifying
> workflows, I’d like to request a patch submission policy change, so
> that we now include regenerated files. This was discussed during the
> last GNU toolchain office hours meeting [1] (2024-03-28).
> 
> Benefits or this change include:
> - Increased compatibility with precommit CI
> - No need to manually edit patches before submitting, thus the “git
> send-email” workflow is simplified
> - Patch reviewers can be confident that the committed patch will be
> exactly what they approved
> - Precommit CI can test exactly what has been submitted
> 
> Any concerns/objections?
> 
> As discussed on the lists and during the meeting, we have been facing
> issues with testing a class of patches: those which imply regenerating
> some files. Indeed, for binutils and gcc, the current patch submission
> policy is to *not* include the regenerated files (IIUC the policy is
> different for gdb [2]).
> 
> This means that precommit CI receives an incomplete patch, leading to
> wrong and misleading regression reports, and complaints/frustration.
> (our notifications now include a warning, making it clear that we do
> not regenerate files ;-) )
> 
> I thought the solution was as easy as adding –enable-maintainer-mode
> to the configure arguments but this has proven to be broken (random
> failures with highly parallel builds).  I tried to workaround that by
> adding new “regenerate” rules in the makefiles, that we could build at
> -j1 before running the real build with a higher parallelism level, but
> this is not ideal, not to mention that in the case of gcc, configuring
> target libraries requires having built all-gcc before, which is quite
> slow at -j1.
> 
> Another approach used in binutils and gdb builtbots is a dedicated
> script (aka autoregen.py) which calls autotools as appropriate. It
> could be extended to update other types of files, but this can be a
> bit tricky (eg. some opcodes files require to build a generator first,
> some doc fragments also use non-trivial build sequences), and it
> creates a maintenance issue: the build recipe is duplicated in the
> script and in the makefiles.  Such a script has proven to be useful
> though in postcommit CI, to catch regeneration errors.
> 
> Having said that, it seems that for the sake of improving usefulness
> of precommit CI, the simplest way forward is to change the policy to
> include regenerated files.  This does not seem to be a burden for
> developers, since they have to regenerate the files before pushing
> their patches anyway, and it also enables reviewers to make sure the
> generated files are correct.
> 
> Said differently, if you want to increase the chances of having your
> patches tested by precommit CI, make sure to include all the
> regenerated files, otherwise you might receive failure notifications.
> 
> Any concerns/objections?

We already do that for GDB (include generated files), and it works fine.
I sometimes have a patch that exceeds the mailing list limit (400k?),
but it seems like the only consequence is that someone needs to approve
it (thanks to whoever does that).

Simon


Re: How to get the default values for parameters?

2024-04-03 Thread Jason Merrill via Gcc
On Wed, Apr 3, 2024 at 9:35 AM Hanke Zhang via Gcc  wrote:
>
> Hi,
> I'm trying to get the default values for parameters of some functions
> in my GIMPLE-PASS. The example code is here:
>
> enum {
>   edefault1 = 1,
>   edefault2 = 2,
>   edefault3 = 3
> }
>
> void func(int p0, int p1 = edefault1, int p2 = edefault2, int p3 = edefault3) 
> {
>   // do other things
> }
>
> I want to get the value of `p1` here. But I didn't find a way to get it.
>
> And I noticed that the marco `DECL_INITIAL` cannot be applied on
> PARM_DECL. And the comments say that the default values for parameters
> are encoded in the type of the function. But I can't find it.

The TYPE_ARG_TYPES of a FUNCTION_TYPE is a TREE_LIST where the
TREE_PURPOSE is the default argument.
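
For illustration, a minimal sketch of walking that list from a pass or
plugin, assuming `fndecl` is the FUNCTION_DECL of interest. The helper
name and the debug output are only for the example, and whether the C++
front end's default arguments are still present at the point a given
GIMPLE pass runs is something to verify.

// Typical includes for a plugin; an in-tree pass uses config.h/system.h/
// coretypes.h/tree.h instead.
#include "gcc-plugin.h"
#include "tree.h"
#include "print-tree.h"   // debug_tree

static void
dump_default_args (tree fndecl)
{
  tree fntype = TREE_TYPE (fndecl);  // the FUNCTION_TYPE

  for (tree arg = TYPE_ARG_TYPES (fntype); arg; arg = TREE_CHAIN (arg))
    {
      tree parm_type = TREE_VALUE (arg);      // this parameter's type
      tree default_arg = TREE_PURPOSE (arg);  // default value, or NULL_TREE

      if (parm_type == void_type_node)        // end-of-list marker
        break;
      if (default_arg)
        debug_tree (default_arg);             // e.g. an INTEGER_CST for edefault1
    }
}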

Jason



Re: Sourceware mitigating and preventing the next xz-backdoor

2024-04-03 Thread Michael Matz via Gcc
Hello,

On Wed, 3 Apr 2024, Martin Uecker wrote:

> The backdoor was hidden in a complicated autoconf script...

Which itself had multiple layers and could just as well have been a 
complicated cmake function.

> > (And, FWIW, testing for features isn't "complex".  And have you looked at 
> > other build systems?  I have, and none of them are less complex, just 
> > opaque in different ways from make+autotools).
> 
> I ask a very specific question: To what extend is testing 
> for features instead of semantic versions and/or supported
> standards still necessary?

I can't answer this with absolute certainty, but points to consider: the 
semantic versions need to be maintained just as well, in some magic way.  
Because ultimately software depends on features of dependencies, not on 
arbitrary numbers given to them.  The numbers encode these features, in 
the best case, when there are no errors.  So, no, version numbers are not 
a replacement for feature tests, they are a proxy.  One that is manually 
maintained, and hence prone to errors.

Now, supported standards: which one? ;-)  Or more in earnest: while on 
this mailing list here we could choose a certain set, POSIX, some 
languages, Windows, MacOS (versions so-and-so).  What about other 
software relying on other 3rdparty feature providers (libraries or system 
services)?  Without standards?

So, without absolute certainty, but with a little bit of it: yes, feature 
tests are required in general.  That doesn't mean that we could not 
do away with quite some of them for (e.g.) GCC, those that hold true on 
any platform we support.  But we can't get rid of the infrastructure for 
that, and can't get rid of certain classes of tests.

> This seems like a problematic approach that may have been necessary 
> decades ago, but it seems it may be time to move on.

I don't see that.  Many aspects of systems remain non-standardized.


Ciao,
Michael.


Re: Sourceware mitigating and preventing the next xz-backdoor

2024-04-03 Thread Joel Sherrill
On Wed, Apr 3, 2024 at 11:03 AM Michael Matz via Gdb 
wrote:

> Hello,
>
> On Wed, 3 Apr 2024, Martin Uecker wrote:
>
> > The backdoor was hidden in a complicated autoconf script...
>
> Which itself had multiple layers and could just as well have been a
> complicated cmake function.
>
> > > (And, FWIW, testing for features isn't "complex".  And have you looked
> at
> > > other build systems?  I have, and none of them are less complex, just
> > > opaque in different ways from make+autotools).
> >
> > I ask a very specific question: To what extend is testing
> > for features instead of semantic versions and/or supported
> > standards still necessary?
>
> I can't answer this with absolute certainty, but points to consider: the
> semantic versions need to be maintained just as well, in some magic way.
> Because ultimately software depend on features of dependencies not on
> arbitrary numbers given to them.  The numbers encode these features, in
> the best case, when there are no errors.  So, no, version numbers are not
> a replacement for feature tests, they are a proxy.  One that is manually
> maintained, and hence prone to errors.
>
> Now, supported standards: which one? ;-)  Or more in earnest: while on
> this mailing list here we could chose a certain set, POSIX, some
> languages, Windows, MacOS (versions so-and-so).  What about other
> software relying on other 3rdparty feature providers (libraries or system
> services)?  Without standards?
>
> So, without absolute certainty, but with a little bit of it: yes, feature
> tests are required in general.  That doesn't mean that we could not
> do away with quite some of them for (e.g.) GCC, those that hold true on
> any platform we support.  But we can't get rid of the infrastructure for
> that, and can't get rid of certain classes of tests.
>

Even assuming a set of standards still leaves issues like supporting a
wide range of OS versions which may evolve to support different versions
of the same standard. And then there is the case of standards having
deliberately implementation defined behavior. Last week, I had to double
check what malloc(0) and free(NULL) are supposed to do per POSIX.
Guess what -- implementation defined behavior options.

And worse, POSIX doesn't define the default attributes for pthreads and
the associated synchronization objects. These vary widely across OSes.

Then there are the broken things autoconf checks for. Somehow the
"broken sed from Solaris" check sticks in my head. Or needing to use
GNU sed instead of FreeBSD sed to build a mips embedded target.

And this ignores things like word size: 32 vs 64 bits for one, or weird
CPUs where nothing is byte addressable (TI C3x/4x is one example).


> > This seems like a problematic approach that may have been necessary
> > decades ago, but it seems it may be time to move on.
>
> I don't see that.  Many aspects of systems remain non-standardized.
>

Again from pthreads, manipulating affinity on multi-core systems and
having names for pthreads are non-standard but commonly available
with varying APIs.

And the standards have plenty of wiggle room in them. Undefined areas
or deliberately implementation defined.

--joel

>
>
> Ciao,
> Michael.
>


Re: Sourceware mitigating and preventing the next xz-backdoor

2024-04-03 Thread Martin Uecker via Gcc
Am Mittwoch, dem 03.04.2024 um 18:02 +0200 schrieb Michael Matz:
> Hello,
> 
> On Wed, 3 Apr 2024, Martin Uecker wrote:
> 
> > The backdoor was hidden in a complicated autoconf script...
> 
> Which itself had multiple layers and could just as well have been a 
> complicated cmake function.

Don't get me wrong, cmake is in no way better. Another problem was 
actually hidden in a cmake test in upstream git in plain
sight:

https://git.tukaani.org/?p=xz.git;a=commitdiff;h=f9cf4c05edd14dedfe63833f8ccbe41b55823b00;hp=af071ef7702debef4f1d324616a0137a5001c14c

> 
> > > (And, FWIW, testing for features isn't "complex".  And have you looked at 
> > > other build systems?  I have, and none of them are less complex, just 
> > > opaque in different ways from make+autotools).
> > 
> > I ask a very specific question: To what extend is testing 
> > for features instead of semantic versions and/or supported
> > standards still necessary?
> 
> I can't answer this with absolute certainty, but points to consider: the 
> semantic versions need to be maintained just as well, in some magic way.  

It would certainly need to be maintained, just as the autoconf
configuration needs to be maintained now.

> Because ultimately software depend on features of dependencies not on 
> arbitrary numbers given to them.  The numbers encode these features, in 
> the best case, when there are no errors.  So, no, version numbers are not 
> a replacement for feature tests, they are a proxy.  One that is manually 
> maintained, and hence prone to errors.

Tests are also prone to errors and - as the example above shows -
very fragile and susceptible to manipulation.

> 
> Now, supported standards: which one? ;-)  Or more in earnest: while on 
> this mailing list here we could chose a certain set, POSIX, some 
> languages, Windows, MacOS (versions so-and-so).  What about other 
> software relying on other 3rdparty feature providers (libraries or system 
> services)?  Without standards?
> 
> So, without absolute certainty, but with a little bit of it: yes, feature 
> tests are required in general.  That doesn't mean that we could not 
> do away with quite some of them for (e.g.) GCC, those that hold true on 
> any platform we support.  But we can't get rid of the infrastructure for 
> that, and can't get rid of certain classes of tests.
> 
> > This seems like a problematic approach that may have been necessary 
> > decades ago, but it seems it may be time to move on.
> 
> I don't see that.  Many aspects of systems remain non-standardized.

This is just part of the problem.

Martin

> 
> 
> Ciao,
> Michael.



Re: Sourceware mitigating and preventing the next xz-backdoor

2024-04-03 Thread Andreas Schwab
On Apr 03 2024, Martin Uecker wrote:

> I ask a very specific question: To what extend is testing 
> for features instead of semantic versions and/or supported
> standards still necessary?  This seems like a problematic approach
> that  may have been necessary decades ago, but it seems it may be
> time to move on.

CMake is doing mostly the same feature tests under the hood.  Because
they have proven convenient and effective.

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."


Re: Sourceware mitigating and preventing the next xz-backdoor

2024-04-03 Thread Jonathan Wakely via Gcc
On Wed, 3 Apr 2024 at 15:34, Martin Uecker via Gcc  wrote:
> I ask a very specific question: To what extend is testing
> for features instead of semantic versions and/or supported
> standards still necessary?  This seems like a problematic approach
> that  may have been necessary decades ago, but it seems it may be
> time to move on.


What standard or version should I check for to detect nl_langinfo_l
support? It's in POSIX 2017, but macOS supports it despite not being
POSIX 2017 compliant.

What about _get_osfhandle? Or fwrite_unlocked?

What about whether the platform ABI allows aligning global objects to
the cacheline size?

Those are just a few of the most recent autoconf checks I've written
for libstdc++ in the past few months, not decades ago.
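
For context, a check like the nl_langinfo_l one ultimately boils down to
whether a tiny test program compiles and links on the target; roughly
the sketch below. This is an illustration of the idea, not the actual
libstdc++ configure test.

#include <langinfo.h>
#include <locale.h>

int main ()
{
  // If this links, the target C library provides nl_langinfo_l,
  // regardless of which POSIX version it claims to implement.
  locale_t loc = newlocale (LC_ALL_MASK, "C", (locale_t) 0);
  const char *s = loc ? nl_langinfo_l (CODESET, loc) : 0;
  if (loc)
    freelocale (loc);
  return s == 0;
}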


Re: Sourceware mitigating and preventing the next xz-backdoor

2024-04-03 Thread Paul Floyd via Gcc




On 03-04-24 14:32, Martin Uecker via Gcc wrote:



The backdoor was hidden in a complicated autoconf script...


How many uncomplicated autoconf scripts exist in the real world?

A+
Paul



Re: Sourceware mitigating and preventing the next xz-backdoor

2024-04-03 Thread Toon Moene

On 4/1/24 17:06, Mark Wielaard wrote:


A big thanks to everybody working this long Easter weekend who helped
analyze the xz-backdoor and making sure the impact on Sourceware and
the hosted projects was minimal.


Thanks for those efforts !

Now, I have seen two more days of thinking about this vulnerability ... 
but no one seem to address the following issues:


A hack was made in liblzma which became effective when the code was 
executed by a daemon that, by virtue of its function, *has* to be run 
as root.


Two questions arise (as far as I am concerned):

1. Do daemons like sshd *have* to be linked with shared libraries ?
   Or could it be left to the security minded of the downstream
   (binary) distributions to link it statically with known & proven
   correct libraries ?

2. Is it a limitation of the Unix / Linux daemon concept that, once
   such a process needs root access, it has to have root access
   *always* - even when performing trivial tasks like compressing
   data ?

I recall quite well (vis-a-vis question 2) that the VMS equivalent would 
drop all privileges at the start of the code, and request only those 
relevant when actually needed (e.g., to open a file for reading that was 
owned by [the equivalent on VMS] of root - or perform other functions 
that only root could do), and then drop them immediately afterwards again.


Kind regards,

--
Toon Moene - e-mail: t...@moene.org - phone: +31 346 214290
Saturnushof 14, 3738 XG  Maartensdijk, The Netherlands



Re: Sourceware mitigating and preventing the next xz-backdoor

2024-04-03 Thread Paul Koning via Gcc



> On Apr 3, 2024, at 2:04 PM, Toon Moene  wrote:
> 
> On 4/1/24 17:06, Mark Wielaard wrote:
> 
>> A big thanks to everybody working this long Easter weekend who helped
>> analyze the xz-backdoor and making sure the impact on Sourceware and
>> the hosted projects was minimal.
> 
> Thanks for those efforts !
> 
> Now, I have seen two more days of thinking about this vulnerability ... but 
> no one seem to address the following issues:
> 
> A hack was made in liblzma, which, when the code was executed by a daemon 
> that by virtue of its function, *has* to be run as root, was effective.
> 
> Two questions arise (as far as I am concerned):
> 
> 1. Do daemons like sshd *have* to be linked with shared libraries ?
>   Or could it be left to the security minded of the downstream
>   (binary) distributions to link it statically with known & proven
>   correct libraries ?

I would add: should IFUNC be deleted?  Or alternatively, should it be strictly 
limited only to non-security-sensitive applications when not running as root?

> 2. Is it a limitation of the Unix / Linux daemon concept that, once
>   such a process needs root access, it has to have root access
>   *always* - even when performing trivial tasks like compressing
>   data ?

Clearly not, given the existence of the "seteuid" syscall.

> I recall quite well (vis-a-vis question 2) that the VMS equivalent would drop 
> all privileges at the start of the code, and request only those relevant when 
> actually needed (e.g., to open a file for reading that was owned by [the 
> equivalent on VMS] of root - or perform other functions that only root could 
> do), and then drop them immediately afterwards again.

Yes, and with additional effort all "root" type applications could be written 
that way.
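
As a hedged sketch of that drop/regain pattern on POSIX (the particular
UID and the placement of the privileged operation are assumptions made
for the example, not a recipe):

#include <unistd.h>
#include <cstdio>
#include <cstdlib>

int main ()
{
  // A daemon started as root keeps 0 only as its saved UID and runs
  // unprivileged the rest of the time.
  uid_t unprivileged = 65534;   // e.g. "nobody"

  if (seteuid (unprivileged) != 0)              // drop privileges early
    { std::perror ("seteuid(drop)"); return EXIT_FAILURE; }

  // ... do untrusted work (parsing, decompression, ...) unprivileged ...

  if (seteuid (0) != 0)                         // regain root briefly
    { std::perror ("seteuid(raise)"); return EXIT_FAILURE; }
  // ... perform the one operation that genuinely needs root ...
  if (seteuid (unprivileged) != 0)              // and drop again
    { std::perror ("seteuid(drop)"); return EXIT_FAILURE; }

  return EXIT_SUCCESS;
}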

paul



Re: Sourceware mitigating and preventing the next xz-backdoor

2024-04-03 Thread Ian Lance Taylor via Gcc
On Wed, Apr 3, 2024 at 11:05 AM Toon Moene  wrote:
>
> Two questions arise (as far as I am concerned):
>
> 1. Do daemons like sshd *have* to be linked with shared libraries ?
> Or could it be left to the security minded of the downstream
> (binary) distributions to link it statically with known & proven
> correct libraries ?

I like static linking personally, but it seems like glibc has made a
decision that shared linking is much preferred over static.  That said
my guess is that this kind of attack would have been effective on any
executable built as PIE.  It relied on using an IFUNC hook to adjust
the PLT entry for a different function.  And, of course, most
executables are built as PIE these days, because that is more secure
against different kinds of attacks.
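
For readers unfamiliar with the mechanism, a minimal sketch of a benign
IFUNC, assuming GCC on an ELF/glibc target; this illustrates the hook,
not the exploit itself:

#include <cstdio>

static int add_impl (int a, int b) { return a + b; }

// The resolver runs during dynamic relocation -- before main(), and
// before most of the program's own code -- and returns the address the
// GOT/PLT slot for add() is patched to point at.  That early,
// library-chosen redirection is what made IFUNC useful to the attacker.
extern "C" int (*resolve_add (void)) (int, int)
{
  // A benign resolver typically selects between implementations (say,
  // by CPU feature); a malicious one can redirect any symbol it reaches.
  return add_impl;
}

extern "C" int add (int, int) __attribute__ ((ifunc ("resolve_add")));

int main ()
{
  std::printf ("%d\n", add (1, 2));
  return 0;
}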

> 2. Is it a limitation of the Unix / Linux daemon concept that, once
> such a process needs root access, it has to have root access
> *always* - even when performing trivial tasks like compressing
> data ?
>
> I recall quite well (vis-a-vis question 2) that the VMS equivalent would
> drop all privileges at the start of the code, and request only those
> relevant when actually needed (e.g., to open a file for reading that was
> owned by [the equivalent on VMS] of root - or perform other functions
> that only root could do), and then drop them immediately afterwards again.

Note that the attack really didn't have anything to do with
compressing data.  The library used an IFUNC to change the PLT of a
different function, so it effectively took control of the code that
verified the cryptographic key.  The only part of the attack that
involved compression was the fact that it happened to live in a
compression library.  And it wouldn't matter whether the code that
verified the cryptographic key was run as root either; the effect of
the attack was to say that the key was OK, and that sshd should
execute the command, and of course that execution must be done on
behalf of the requesting user, which (as I understand it) could be
root.

Ian


Re: Sourceware mitigating and preventing the next xz-backdoor

2024-04-03 Thread Toon Moene

On 4/3/24 20:25, Ian Lance Taylor wrote:


Note that the attack really didn't have anything to do with
compressing data.  The library used an IFUNC to change the PLT of a
different function, so it effectively took control of the code that
verified the cryptographic key.  The only part of the attack that
involved compression was the fact that it happened to live in a
compression library.  And it wouldn't matter whether the code that
verified the cryptographic key was run as root either; the effect of
the attack was to say that the key was OK, and that sshd should
execute the command, and of course that execution must be done on
behalf of the requesting user, which (as I understand it) could be
root.


Ah, OK - that's what I missed.

Does your explanation mean that - if, as I do in my sshd config file - 
you *forbid* root access via sshd in *any* way, you wouldn't be vulnerable ?


Thanks,

--
Toon Moene - e-mail: t...@moene.org - phone: +31 346 214290
Saturnushof 14, 3738 XG  Maartensdijk, The Netherlands



Re: Sourceware mitigating and preventing the next xz-backdoor

2024-04-03 Thread Jonathon Anderson via Gcc
Hello all,

On Wed, 2024-04-03 at 16:00 +0200, Michael Matz wrote:
> > My take a way is that software needs to become less complex. Do 
> > we really still need complex build systems such as autoconf?
>
> (And, FWIW, testing for features isn't "complex".  And have you looked at 
> other build systems?  I have, and none of them are less complex, just 
> opaque in different ways from make+autotools).

Some brief opinions from a humble end-user:

I think the key difference here is that Autotools allows arbitrarily generated 
code to be executed at any time. More modern build systems require the use of 
specific commands/files to run arbitrary code, e.g. CMake (IIRC 
[`execute_process()`][2] and [`ExternalProject`][3]), Meson 
([`run_command()`][1]), Cargo ([`build.rs`][4]).\
IMHO there are similarities here to the memory "safety" of Rust: Rust code can 
have memory errors, but they can only come from Rust code declared as `unsafe` 
(or bugs in the compiler itself). The scope is limited and those scopes can be 
audited with more powerful microscopes... and removed if/when the build system 
gains first-class support upstream.

There are other features in the newest build systems listed here (Meson and 
Cargo) that make this particular attack vector harder. These build systems 
don't have release tarballs with auxiliary files (e.g. [Meson's is very close 
to `git archive`][5]), nor do their DSLs allow writing files to the source 
tree. One could imagine a build/CI worker where the source tree is a read-only 
bind-mount of a `git archive` extract, that might help defend against attacks 
of this specific design.

It's also worth noting that Meson and Cargo use non-Turing-complete declarative 
DSLs for their build configuration. This implies there is an upper bound on the 
[cyclomatic complexity][6]-per-line of the build script DSL itself. That 
doesn't mean you can't write complex Meson code (I have), but it ends up being 
suspiciously long, making it clear that something complex and out of the 
ordinary is being done.

Of course, this doesn't make the build system any less complex, but projects 
using newer build systems seem easier to secure and audit than those using 
overly flexible build systems like Autotools and maybe even CMake. IMHO using a 
late-model build system is a relatively low technical hurdle to overcome for 
the benefits noted above; switching should be considered, and in a positive 
light.

(For context: my team recently switched our main C/C++ project from Autotools 
to Meson. The one-time refactor itself was an effort, but after that we got 
back up to speed quickly and we haven't looked back. Other projects may have an 
easier time using an unofficial port in the [Meson WrapDB][7] as a starting 
point.)

-Jonathon

[1]: https://mesonbuild.com/External-commands.html
[2]: 
https://cmake.org/cmake/help/latest/command/execute_process.html#execute-process
[3]: https://cmake.org/cmake/help/latest/module/ExternalProject.html
[4]: https://doc.rust-lang.org/cargo/reference/build-scripts.html
[5]: https://mesonbuild.com/Creating-releases.html
[6]: https://en.wikipedia.org/wiki/Cyclomatic_complexity
[7]: https://mesonbuild.com/Wrapdb-projects.html


Re: Sourceware mitigating and preventing the next xz-backdoor

2024-04-03 Thread Andreas Schwab
On Apr 03 2024, Paul Floyd via Gcc wrote:

> On 03-04-24 14:32, Martin Uecker via Gcc wrote:
>
>> The backdoor was hidden in a complicated autoconf script...
>
> How many uncomplicated autoconf scripts exist in the real world?

Probably the same amount as in any other build system.  Building
(portable) software requires a certain level of complexity.

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."


Re: Sourceware mitigating and preventing the next xz-backdoor

2024-04-03 Thread Jonathan Wakely via Gcc
On Wed, 3 Apr 2024 at 19:36, Toon Moene  wrote:
>
> On 4/3/24 20:25, Ian Lance Taylor wrote:
>
> > Note that the attack really didn't have anything to do with
> > compressing data.  The library used an IFUNC to change the PLT of a
> > different function, so it effectively took control of the code that
> > verified the cryptographic key.  The only part of the attack that
> > involved compression was the fact that it happened to live in a
> > compression library.  And it wouldn't matter whether the code that
> > verified the cryptographic key was run as root either; the effect of
> > the attack was to say that the key was OK, and that sshd should
> > execute the command, and of course that execution must be done on
> > behalf of the requesting user, which (as I understand it) could be
> > root.
>
> Ah, OK - that's what I missed.
>
> Does your explanation mean that - if, as I do in my sshd config file -
> you *forbid* root access via sshd in *any* way, you wouldn't be vulnerable ?


No, sshd is still running as root.


Re: Sourceware mitigating and preventing the next xz-backdoor

2024-04-03 Thread Martin Uecker via Gcc
On Wednesday, 03.04.2024 at 13:46 -0500, Jonathon Anderson via Gcc wrote:
> Hello all,
> 
> On Wed, 2024-04-03 at 16:00 +0200, Michael Matz wrote:
> > > My take a way is that software needs to become less complex. Do 
> > > we really still need complex build systems such as autoconf?
> > 
> > (And, FWIW, testing for features isn't "complex".  And have you looked at 
> > other build systems?  I have, and none of them are less complex, just 
> > opaque in different ways from make+autotools).
> 
> Some brief opinions from a humble end-user:
> 
> I think the key difference here is that Autotools allows arbitrarily 
> generated code to be executed at any time. More modern build systems require 
> the use of specific commands/files to run arbitrary code, e.g. CMake (IIRC 
> [`execute_process()`][2] and [`ExternalProject`][3]), Meson 
> ([`run_command()`][1]), Cargo ([`build.rs`][4]).\

To me it seems that Cargo is the absolute worst case with respect to
supply chain attacks.

It pulls in dependencies recursively from a relatively uncurated
list of projects, puts the source of all those dependencies into a 
hidden directory in home, and runs build.rs automatically with
user permissions.

Martin





> IMHO there are similarities here to the memory "safety" of Rust: Rust code 
> can have memory errors, but it can only come from Rust code declared as 
> `unsafe` (or bugs in the compiler itself). The scope is limited and those 
> scopes can be audited with more powerful microscopes... and removed if/when 
> the build system gains first-class support upstream.
> 
> There are other features in the newest build systems listed here (Meson and 
> Cargo) that make this particular attack vector harder. These build systems 
> don't have release tarballs with auxiliary files (e.g. [Meson's is very close 
> to `git archive`][5]), nor do their DSLs allow writing files to the source 
> tree. One could imagine a build/CI worker where the source tree is a 
> read-only bind-mount of a `git archive` extract, that might help defend 
> against attacks of this specific design.
> 
> It's also worth noting that Meson and Cargo use non-Turing-complete 
> declarative DSLs for their build configuration. This implies there is an 
> upper bound on the [cyclomatic complexity][6]-per-line of the build script 
> DSL itself. That doesn't mean you can't write complex Meson code (I have), 
> but it ends up being suspiciously long and thus clear something complex and 
> out of the ordinary is being done.
> 
> Of course, this doesn't make the build system any less complex, but projects 
> using newer build systems seem easier to secure and audit than those using 
> overly flexible build systems like Autotools and maybe even CMake. IMHO using 
> a late-model build system is a relatively low technical hurdle to overcome 
> for the benefits noted above, switching should be considered and in a 
> positive light.
> 
> (For context: my team recently switched our main C/C++ project from Autotools 
> to Meson. The one-time refactor itself was an effort, but after that we got 
> back up to speed quickly and we haven't looked back. Other projects may have 
> an easier time using an unofficial port in the [Meson WrapDB][7] as a 
> starting point.)
> 
> -Jonathon
> 
> [1]: https://mesonbuild.com/External-commands.html
> [2]: 
> https://cmake.org/cmake/help/latest/command/execute_process.html#execute-process
> [3]: https://cmake.org/cmake/help/latest/module/ExternalProject.html
> [4]: https://doc.rust-lang.org/cargo/reference/build-scripts.html
> [5]: https://mesonbuild.com/Creating-releases.html
> [6]: https://en.wikipedia.org/wiki/Cyclomatic_complexity
> [7]: https://mesonbuild.com/Wrapdb-projects.html



Re: GSoC Timeline Review

2024-04-03 Thread Eric Feng via Gcc
Hi Nada,

Apologies for not being able to reply earlier as well. I’m glad to hear
you’re interested in continuing this project! There is still a lot of work
to be done — my work from last summer is in a very prototype stage. As
David mentioned, familiarizing myself with the analyzer took some time, but
I'm confident you’ll be able to make quicker progress if you're more
experienced. If you’re interested in continuing to work on the reference
count checking aspect of the project in a similar direction to what I was
doing, this bugzilla post might be helpful:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107646. Specifically, David
and I discussed the challenge of scaling the implementation effectively,
particularly when tracking the behavior of every API entrypoint through
subclassing of known functions, and how we might reduce some of this effort
with function attributes. Moreover, you might also be interested in
exploring the work presented by Xutong Ma et al. at ASE 2023 late last
summer, titled “Detecting Memory Errors in Python Native Code by Tracking
Object Lifecycle with Reference Count,” as well as the works it was
compared against, if you’re interested in exploring alternative approaches.
Overall, I hope you find the experience as enriching as I did should you
work on this project! Please feel free to reach out with any questions and
I’ll be more than happy to help where I can.
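
(To make the reference-count checking concrete, here is a hypothetical
example of the kind of bug such a checker is meant to flag.  The CPython
C API calls are real; the module and function names are made up purely
for illustration.)

    #include <Python.h>

    static PyObject *
    make_pair(PyObject *self, PyObject *args)
    {
        (void)self; (void)args;                  /* unused in this sketch */

        PyObject *first  = PyLong_FromLong(1);   /* new reference */
        PyObject *second = PyLong_FromLong(2);   /* new reference */
        if (first == NULL || second == NULL) {
            /* Bug a refcount checker should flag: on this error path the
               non-NULL object is leaked; Py_XDECREF of both would fix it. */
            return NULL;
        }

        PyObject *pair = PyTuple_Pack(2, first, second); /* new reference */
        Py_DECREF(first);                        /* tuple holds its own refs */
        Py_DECREF(second);
        return pair;                             /* may be NULL on failure */
    }

    static PyMethodDef methods[] = {
        { "make_pair", make_pair, METH_NOARGS, "Return (1, 2)." },
        { NULL, NULL, 0, NULL }
    };

    static struct PyModuleDef refcount_demo_module = {
        PyModuleDef_HEAD_INIT, "refcount_demo", NULL, -1, methods
    };

    PyMODINIT_FUNC
    PyInit_refcount_demo(void)
    {
        return PyModule_Create(&refcount_demo_module);
    }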

Best,
Eric


On Tue, Apr 2, 2024 at 10:35 AM Martin Jambor  wrote:

> Hello,
>
> On Sat, Mar 30 2024, Nada Elsayed via Gcc wrote:
> > I think that I didn't fully understand the project, so I read more and
> > updated the Timeline Suggestion.
>
> Sorry for not being able to respond sooner; Easter got in the way in an
> unfortunate way.  I do not know much about Cython or static
> analysis (so I would not be able to give much advice about improvements
> to the time-line), but can confirm we have received your application.
>
> Thanks a lot.
>
> Martin
>
> >
> > Suggested Timeline:
> >
> > - May 1-26:
> >   - Explore Cython modules and try more realistic codes to see how it
> >     translates Python to c/c++.
> >   - Know more about entry-points that Cython uses.
> >   - Understand common bugs that happen when converting Python to c/c++.
> > - Explore static analysis tool for CPython Extension code -which is
> >   written in Python- and try this analyzer to understand the bugs in
> >   translated Python code fully.
> > - Know how we can emit warnings and errors.
> > - Debug the codebase to grasp its structure and potential areas for
> >   improvement.
> > - Weeks 1-2:
> >   - Understand more about reference counting verification.
> >   - Develop verifying reference counts for PyObjects passed as parameters.
> > - Weeks 3-4:
> >   - Begin to investigate Error Handling Checking.
> >   - Understand how the Static Analysis tool does Error Handling checking.
> >   - Implement these checks in the plugin.
> > - Weeks 5-7:
> >   - Begin to investigate Exception Handling Checking.
> >   - Understand how the Static Analysis tool does Exception Handling checking.
> >   - Implement these checks in the plugin.
> > - Weeks 8-11:
> >   - Begin to investigate Format String Checking.
> >   - Understand how the Static Analysis tool does Format String Checking.
> >   - Implement these checks in the plugin.
> > - Week 12:
> >   - Writing the GSoC wrapping-up document.
> >
> >
> > On Wed, Mar 27, 2024 at 2:31 AM Nada Elsayed <nadaelsayed...@gmail.com> wrote:
> >
> >> Greetings All,
> >> Hope this email finds you well.
> >> I am interested in "Extend the plugin to add checking for usage of the
> >> CPython API" project. First of all, I built the library, and now I am
> >> trying to debug it. Then, I also used CPython in 3 demos to understand how
> >> it works. Finally, I read the uploaded patch comments to understand the
> >> codebase and file structure.
> >>
> >> I was wondering if you could review my suggested timeline?
> >> suggested Timeline:
> >>
> >> - May 1-26:
> >>   - Explore Cython modules, emphasizing entry-points and bug
> >>     identification.
> >>   - Study analyzers, particularly cpy-analyzer, to enhance understanding.
> >>   - Debug the codebase to grasp its structure and potential areas for
> >>     improvement.
> >>   - Focus investigation on "errors in GIL handling" and "tp_traverse
> >>     errors".
> >> - Weeks 1-6:
> >>