Re: 4.3.90 release candidate segfaults on linux and solaris

2022-09-26 Thread Denis Excoffier



> On 2022-09-25 16:25, Dmitry Goncharov wrote:
> 
> On Sun, Sep 25, 2022 at 2:09 AM Martin Dorey
>  wrote:
>> And vfork is where that happens.  If I’ve followed the thicket of #ifdef 
>> correctly and understood the vfork man page, then this is illegal when using 
>> vfork:
>> 
>> https://github.com/mirror/make/blob/master/src/job.c#L2556
> 
> 
> Thanks, Martin.
> This is indeed the culprit.
> I was able to find a linux where this reproduces. However, the this
> does this reproduce on my sun. Denis, can you please try this patch on
> your sun?
> 
> regards, Dmitry
> 
> Here is a patch
> 
> index d12a9138..1ed71f0a 100644
> --- a/src/job.c
> +++ b/src/job.c
> @@ -2286,6 +2286,8 @@ child_execute_job (struct childbase *child, int
> good_stdin, char **argv)
>   posix_spawnattr_t attr;
>   posix_spawn_file_actions_t fa;
>   short flags = 0;
> +#else
> +  char **parent_environ = environ;
> #endif
> 
>   /* Divert child output if we want to capture it.  */
> @@ -2301,7 +2303,12 @@ child_execute_job (struct childbase *child, int
> good_stdin, char **argv)
> 
>   pid = vfork();
>   if (pid != 0)
> +{
> +  /* The child sets environ to child->environment before calling execvp.
> +   * Restore it here.  */
> +  environ = parent_environ;
>   return pid;
> +}
> 
>   /* We are the child.  */
>   unblock_all_sigs ();
> @@ -2552,7 +2559,8 @@ exec_command (char **argv, char **envp)
> errno = ENOEXEC;
> 
> # else
> -  /* Run the program.  */
> +  /* Run the program.
> +   * The parent has to restore environ.  */
>   environ = envp;
>   execvp (argv[0], argv);

Hello,
This patch works perfectly, both on my 32bit linux and my 32bit solaris 10.
Thank you.
Regards,
Denis Excoffier.




Re: 4.3.90 release candidate segfaults on linux and solaris

2022-09-25 Thread Paul Smith
On Sun, 2022-09-25 at 13:03 -0400, Paul Smith wrote:
> The odd thing is that make ALREADY preserves the environment pointer
> and restores it, but it does so after child_execute_job() is called
> (after removing some ifdefs):

Oh, well, clearly child_execute_job() is called from other places, that
don't do this.  Doh!

OK will fix, thanks for the assistance all.



Re: 4.3.90 release candidate segfaults on linux and solaris

2022-09-25 Thread Paul Smith
On Sun, 2022-09-25 at 10:25 -0400, Dmitry Goncharov wrote:
> However, the this does this reproduce on my sun.

Sorry Dmitry but I didn't understand that sentence...?

The odd thing is that make ALREADY preserves the environment pointer
and restores it, but it does so after child_execute_job() is called
(after removing some ifdefs):

{
  /* Fork the child process.  */

  char **parent_environ;

run_local:
  block_sigs ();

  child->remote = 0;

  parent_environ = environ;

  jobserver_pre_child (flags & COMMANDS_RECURSE);

  child->pid = child_execute_job ((struct childbase *)child,
  child->good_stdin, argv);

  environ = parent_environ; /* Restore value child may have clobbered.  */
  jobserver_post_child (flags & COMMANDS_RECURSE);
}

I suppose this may not be sufficient to fix the problem?



Re: 4.3.90 release candidate segfaults on linux and solaris

2022-09-25 Thread Martin Dorey
https://docs.oracle.com/cd/E88353_01/html/E37841/execvpe-2.html alleges to 
document Oracle Solaris 11.4 and appears to support execvpe.  Yes, it's listed 
in Linux pages as a GNU extension and it's not on the Open Group page for the 
exec family, but symmetry makes it quite an obvious extension.  Perhaps that 
could be detected by a configure test.

The Linux page for vfork, but not the Open Group one, says that vfork suspend 
the caller until the exec is done, so the caller could restore environ after.  
I don't immediately see that it could hurt and should fix it for Linux.

Or there's copying the environment somewhere else.  Huh, even just the environ 
pointer.


From: Bug-make  on behalf of 
Martin Dorey 
Sent: Sunday, September 25, 2022 07:09
To: psm...@gnu.org ; Denis Excoffier 

Cc: bug-make@gnu.org 
Subject: Re: 4.3.90 release candidate segfaults on linux and solaris

execvpe does indeed fix it for me.  Undoing the fix again, doing make clean -k 
and coming out of the chroot to configure --disable-posix-spawn for my 64 bit 
root environment (with libc 2.24 from Debian Stretch on Linux 4.9.0-19-amd64), 
I can reproduce the problem again with this makefile.  I'd be surprised if that 
doesn't crash for Paul too because I suggest that the kernel version isn't 
actually important either.


From: Paul Smith 
Sent: Sunday, September 25, 2022 06:23
To: Martin Dorey ; Denis Excoffier 

Cc: bug-make@gnu.org 
Subject: Re: 4.3.90 release candidate segfaults on linux and solaris

* EXTERNAL EMAIL *

On Sun, 2022-09-25 at 06:02 +, Martin Dorey wrote:
> And vfork is where that happens.  If I’ve followed the thicket of
> #ifdef correctly and understood the vfork man page, then this is
> illegal when using vfork:
>
> https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fmirror%2Fmake%2Fblob%2Fmaster%2Fsrc%2Fjob.c%23L2556data=05%7C01%7CMartin.Dorey%40hitachivantara.com%7C82c25851389d4e1c12cb08da9ef91def%7C18791e1761594f52a8d4de814ca8284a%7C0%7C0%7C637997090043171790%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7Csdata=v9hASQYmLkyN49Vpvk4DGrT0roD482%2FC5uIyc3V2DLQ%3Dreserved=0

It's long past time to rewrite this entire part of job.c but I don't
want to do that before the release.

I did try this with a 32bit Ubuntu 14.04 docker container image which
uses a similar libc version, but contrary to my previous assertion
(sorry Denis!) if it's an issue with vfork then it IS related to the
kernel version and so my docker containers won't be of any use.

This is not an actual fix (won't work on Solaris obviously since
execvpe() is a GNU extension) but can you try this change to see if it
works on your system Martin?  At least we can verify that this is the
issue:

diff --git a/src/job.c b/src/job.c
index d12a9138..0a4ddae8 100644
--- a/src/job.c
+++ b/src/job.c
@@ -2553,8 +2553,7 @@ exec_command (char **argv, char **envp)

 # else
   /* Run the program.  */
-  environ = envp;
-  execvp (argv[0], argv);
+  execvpe (argv[0], argv, envp);

 # endif /* !__EMX__ */




Re: 4.3.90 release candidate segfaults on linux and solaris

2022-09-25 Thread Dmitry Goncharov
On Sun, Sep 25, 2022 at 2:09 AM Martin Dorey
 wrote:
> And vfork is where that happens.  If I’ve followed the thicket of #ifdef 
> correctly and understood the vfork man page, then this is illegal when using 
> vfork:
>
> https://github.com/mirror/make/blob/master/src/job.c#L2556


Thanks, Martin.
This is indeed the culprit.
I was able to find a linux where this reproduces. However, the this
does this reproduce on my sun. Denis, can you please try this patch on
your sun?

regards, Dmitry

Here is a patch

index d12a9138..1ed71f0a 100644
--- a/src/job.c
+++ b/src/job.c
@@ -2286,6 +2286,8 @@ child_execute_job (struct childbase *child, int
good_stdin, char **argv)
   posix_spawnattr_t attr;
   posix_spawn_file_actions_t fa;
   short flags = 0;
+#else
+  char **parent_environ = environ;
 #endif

   /* Divert child output if we want to capture it.  */
@@ -2301,7 +2303,12 @@ child_execute_job (struct childbase *child, int
good_stdin, char **argv)

   pid = vfork();
   if (pid != 0)
+{
+  /* The child sets environ to child->environment before calling execvp.
+   * Restore it here.  */
+  environ = parent_environ;
   return pid;
+}

   /* We are the child.  */
   unblock_all_sigs ();
@@ -2552,7 +2559,8 @@ exec_command (char **argv, char **envp)
 errno = ENOEXEC;

 # else
-  /* Run the program.  */
+  /* Run the program.
+   * The parent has to restore environ.  */
   environ = envp;
   execvp (argv[0], argv);



Re: 4.3.90 release candidate segfaults on linux and solaris

2022-09-25 Thread Martin Dorey
execvpe does indeed fix it for me.  Undoing the fix again, doing make clean -k 
and coming out of the chroot to configure --disable-posix-spawn for my 64 bit 
root environment (with libc 2.24 from Debian Stretch on Linux 4.9.0-19-amd64), 
I can reproduce the problem again with this makefile.  I'd be surprised if that 
doesn't crash for Paul too because I suggest that the kernel version isn't 
actually important either.


From: Paul Smith 
Sent: Sunday, September 25, 2022 06:23
To: Martin Dorey ; Denis Excoffier 

Cc: bug-make@gnu.org 
Subject: Re: 4.3.90 release candidate segfaults on linux and solaris

* EXTERNAL EMAIL *

On Sun, 2022-09-25 at 06:02 +, Martin Dorey wrote:
> And vfork is where that happens.  If I’ve followed the thicket of
> #ifdef correctly and understood the vfork man page, then this is
> illegal when using vfork:
>
> https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fmirror%2Fmake%2Fblob%2Fmaster%2Fsrc%2Fjob.c%23L2556data=05%7C01%7CMartin.Dorey%40hitachivantara.com%7C82c25851389d4e1c12cb08da9ef91def%7C18791e1761594f52a8d4de814ca8284a%7C0%7C0%7C637997090043171790%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7Csdata=v9hASQYmLkyN49Vpvk4DGrT0roD482%2FC5uIyc3V2DLQ%3Dreserved=0

It's long past time to rewrite this entire part of job.c but I don't
want to do that before the release.

I did try this with a 32bit Ubuntu 14.04 docker container image which
uses a similar libc version, but contrary to my previous assertion
(sorry Denis!) if it's an issue with vfork then it IS related to the
kernel version and so my docker containers won't be of any use.

This is not an actual fix (won't work on Solaris obviously since
execvpe() is a GNU extension) but can you try this change to see if it
works on your system Martin?  At least we can verify that this is the
issue:

diff --git a/src/job.c b/src/job.c
index d12a9138..0a4ddae8 100644
--- a/src/job.c
+++ b/src/job.c
@@ -2553,8 +2553,7 @@ exec_command (char **argv, char **envp)

 # else
   /* Run the program.  */
-  environ = envp;
-  execvp (argv[0], argv);
+  execvpe (argv[0], argv, envp);

 # endif /* !__EMX__ */




Re: 4.3.90 release candidate segfaults on linux and solaris

2022-09-25 Thread Paul Smith
On Sun, 2022-09-25 at 06:02 +, Martin Dorey wrote:
> And vfork is where that happens.  If I’ve followed the thicket of
> #ifdef correctly and understood the vfork man page, then this is
> illegal when using vfork:
> 
> https://github.com/mirror/make/blob/master/src/job.c#L2556

It's long past time to rewrite this entire part of job.c but I don't
want to do that before the release.

I did try this with a 32bit Ubuntu 14.04 docker container image which
uses a similar libc version, but contrary to my previous assertion
(sorry Denis!) if it's an issue with vfork then it IS related to the
kernel version and so my docker containers won't be of any use.

This is not an actual fix (won't work on Solaris obviously since
execvpe() is a GNU extension) but can you try this change to see if it
works on your system Martin?  At least we can verify that this is the
issue:

diff --git a/src/job.c b/src/job.c
index d12a9138..0a4ddae8 100644
--- a/src/job.c
+++ b/src/job.c
@@ -2553,8 +2553,7 @@ exec_command (char **argv, char **envp)

 # else
   /* Run the program.  */
-  environ = envp;
-  execvp (argv[0], argv);
+  execvpe (argv[0], argv, envp);

 # endif /* !__EMX__ */





Re: 4.3.90 release candidate segfaults on linux and solaris

2022-09-25 Thread Martin Dorey
I have a 32 bit Debian Squeeze (so 2011 era, with gcc-4.4 - too early for asan 
- and libc 2.11, so two Debian releases earlier than Denis’s) chroot lying 
around.  Unsurprisingly, his reproducer crashes for me too, configured with no 
options and make run with no arguments.  gdb’s reporting line 119 rather than 
118 in expand.c.  valgrind annoyingly stops it crashing.

I had trouble compiling rule.c because johnny-come-lately mempcpy on this 
vintage setup seems to be provided as a macro and that doesn't play well with 
the STRING_SIZE_TUPLE macro, because, until the latter is expanded, mempcpy 
appears to only have two arguments, when it requires three.  #undef mempcpy in 
rule.c got me over that.  I know we depend on C99 now, seemingly without a 
storm of protest, but this is failing to compile on a setup a decade newer than 
that.

For me, it’s the variable known in the make source as “environ”, which isn't 
what my gdb calls environ but what it calls ‘environ@@GLIBC_2.0' which has been 
corrupted, by the second call to recursively_expand_for_file, not, far as I 
sampled, the environment variables it points to.  gdb alleged to let me set a 
hardware watch point on it but it didn't fire.  It seems to have been corrupted 
to point into the middle of an environment variable.  So what should be an 
array of character pointers is an array of characters.  Interpreting those as 
character pointers promptly doesn't go so well.  It’s already changed, for me, 
by the call to free_childbase.

…

And vfork is where that happens.  If I’ve followed the thicket of #ifdef 
correctly and understood the vfork man page, then this is illegal when using 
vfork:

https://github.com/mirror/make/blob/master/src/job.c#L2556

(I’d use the savannah git web thing but I don’t see line number links.)

The corruption happening in another process would explain the hardware watch 
point not catching it.  The “blame“ says that dodgy assignment is 28 years old 
but, as was recently topical again, we have stopped using posix_spawn instead 
of vfork with old libc versions and this code for copying the environment 
definitions of recursively expanded variables for $(shell) invocations is even 
newer.  Perhaps Make didn’t otherwise use the environ variable that it’s 
illegally corrupted for the last three decades.


From: Bug-make  on behalf of 
Paul Smith 
Sent: Saturday, September 24, 2022 14:17
To: Denis Excoffier 
Cc: bug-make@gnu.org 
Subject: Re: 4.3.90 release candidate segfaults on linux and solaris

* EXTERNAL EMAIL *

On Sat, 2022-09-24 at 13:06 -0400, Paul Smith wrote:
> > The crashes in solaris and linux are so similar, and go away also
> > so similarly, that i would primarily think about size of types.
> > Under cygwin and MacOS all sizes (pointers, long int, size_t,
> > SIZE_MAX, time_t) are 8, while under (this) solaris and (this)
> > linux all sizes are 4. Only size of int is 4 on all.
>
> Aha!  You're using a 32bit Linux.  That is indeed helpful
> information, thanks.  I will check it.

I installed a 32bit docker container and tried with that but still no
failure.  I will keep poking at it; maybe I can find a much older
container.



Re: 4.3.90 release candidate segfaults on linux and solaris

2022-09-24 Thread Paul Smith
On Sat, 2022-09-24 at 13:06 -0400, Paul Smith wrote:
> > The crashes in solaris and linux are so similar, and go away also
> > so similarly, that i would primarily think about size of types.
> > Under cygwin and MacOS all sizes (pointers, long int, size_t,
> > SIZE_MAX, time_t) are 8, while under (this) solaris and (this)
> > linux all sizes are 4. Only size of int is 4 on all.
> 
> Aha!  You're using a 32bit Linux.  That is indeed helpful
> information, thanks.  I will check it.

I installed a 32bit docker container and tried with that but still no
failure.  I will keep poking at it; maybe I can find a much older
container.



Re: 4.3.90 release candidate segfaults on linux and solaris

2022-09-24 Thread Paul Smith
On Sat, 2022-09-24 at 18:23 +0200, Denis Excoffier wrote:
> The segfault also occurs without ‘env -i’.

Thanks for helping me avoid that red herring.

> The crashes in solaris and linux are so similar, and go away also so
> similarly, that i would primarily think about size of types. Under
> cygwin and MacOS all sizes (pointers, long int, size_t, SIZE_MAX,
> time_t) are 8, while under (this) solaris and (this) linux all sizes
> are 4. Only size of int is 4 on all.

Aha!  You're using a 32bit Linux.  That is indeed helpful information,
thanks.  I will check it.



Re: 4.3.90 release candidate segfaults on linux and solaris

2022-09-24 Thread Denis Excoffier


> On 2022-09-24 17:19, Paul Smith wrote:
> 
> On Sat, 2022-09-24 at 09:36 +0200, Denis Excoffier wrote:
>> In my specific configuration (under linux, with --disable-nls,
>> --disable-load, without using -j, using 'env -i make -d -n'), a
>> segfault always occurs around line 118 of src/expand.c:
>> 
>> My linux is old (2.6.32),
> 
> The kernel version is not very interesting for a userspace program like
> GNU make.  But, it would be interesting if you could provide the
> version of libc you are using; on my system I can use:
> 
>  ~$ /lib/x86_64-linux-gnu/libc.so.6 --version | head -n1
> 
> You can find the library by running "ldd make | grep libc'
> 
> It's also interesting you're running with "env -i" that seems like it
> might be related.  Do you get the crash if you run without that?
> 
> A quick check on my system was not able to show any issues using the
> same setup as you, even building make with ASAN.  I will review the
> code especially around memory handling in the child structures to see
> if I can find an error by inspection.

% ldd make
linux-gate.so.1 (0xb76e4000)
libc.so.6 => /lib/i386-linux-gnu/libc.so.6 (0xb7565000)
/lib/ld-linux.so.2 (0xb76e5000)
% /lib/i386-linux-gnu/libc.so.6 --version | head -2
GNU C Library (Debian GLIBC 2.19-18+deb8u10) stable release version 2.19, by 
Roland McGrath et al.
Copyright (C) 2014 Free Software Foundation, Inc.
%

The segfault also occurs without ‘env -i’. I was just trying to minimize the 
number of
entries in the environment. In any case make seems to always add a few ones
by default (like MAKEFLAGS etc.).

The crashes in solaris and linux are so similar, and go away also so similarly,
that i would primarily think about size of types. Under cygwin and
MacOS all sizes (pointers, long int, size_t, SIZE_MAX, time_t) are 8, while
under (this) solaris and (this) linux all sizes are 4. Only size of int is 4 on 
all.

Regards,

Denis Excoffier.




Re: 4.3.90 release candidate segfaults on linux and solaris

2022-09-24 Thread Paul Smith
On Sat, 2022-09-24 at 09:36 +0200, Denis Excoffier wrote:
> In my specific configuration (under linux, with --disable-nls,
> --disable-load, without using -j, using 'env -i make -d -n'), a
> segfault always occurs around line 118 of src/expand.c:
> 
> My linux is old (2.6.32),

The kernel version is not very interesting for a userspace program like
GNU make.  But, it would be interesting if you could provide the
version of libc you are using; on my system I can use:

  ~$ /lib/x86_64-linux-gnu/libc.so.6 --version | head -n1

You can find the library by running "ldd make | grep libc'

It's also interesting you're running with "env -i" that seems like it
might be related.  Do you get the crash if you run without that?

A quick check on my system was not able to show any issues using the
same setup as you, even building make with ASAN.  I will review the
code especially around memory handling in the child structures to see
if I can find an error by inspection.