I've applied your patch, but gcc 15.1.0 has decided that a 'struct
winsize *' not being a 'struct winsize *' is an error now.

Configuration Information [Automatically generated, do not change]:
Machine: x86_64
OS: cygwin
Compiler: gcc
Compilation CFLAGS: -Wno-error=incompatible-pointer-types
uname output: MSYS_NT-10.0-26100 Zack2021HPPavilion
3.6.3-ab81aae6.x86_64 2025-07-01 18:20 UTC x86_64 Msys
Machine Type: x86_64-pc-cygwin

Bash Version: 5.3
Patch Level: 0
Release Status: maint

Would be nice if bashbug were generated even if bash itself failed to build.

>From before I changed CFLAGS, obviously:

winsize.c: In function 'get_new_window_size':
winsize.c:98:39: error: passing argument 2 of 'tcgetwinsize' from
incompatible pointer type [-Wincompatible-pointer-types]
   98 |   if (tty >= 0 && (tcgetwinsize (tty, &win) == 0) &&
win.ws_row > 0 && win.ws_col > 0)
      |                                       ^~~~
      |                                       |
      |                                       struct winsize *
In file included from /usr/include/sys/ioctl.h:15,
                 from winsize.c:31:
/usr/include/sys/termios.h:304:42: note: expected 'struct winsize *'
but argument is of type 'struct winsize *'
  304 | int tcgetwinsize(int fd, struct winsize *winsz);
      |                          ~~~~~~~~~~~~~~~~^~~~~
make[1]: *** [Makefile:84: winsize.o] Error 1
make[1]: Leaving directory '/home/zsant/repos/bash/lib/sh'
make: *** [Makefile:763: lib/sh/libsh.a] Error 1

I've got it built now.

On Tue, Jul 15, 2025 at 4:33 PM Chet Ramey <chet.ra...@case.edu> wrote:
>
> On 7/15/25 1:26 PM, Zachary Santer wrote:
> > On Tue, Jul 15, 2025 at 11:39 AM Chet Ramey <chet.ra...@case.edu> wrote:
> >>
> >> On 7/15/25 11:35 AM, Zachary Santer wrote:
> >>> On Tue, Jul 15, 2025 at 10:29 AM Chet Ramey <chet.ra...@case.edu> wrote:
> >>>>
> >>>> With respect to not consulting the list of saved pids if given a list
> >>>> of pid arguments, solving this problem.
> >>>
> >>> I'd like to think that's just a stopgap, but yeah, I can play around with 
> >>> that.
> >>
> >> Why do you think it would be a stopgap?
> >
> > We had that super-long discussion, and then you just throw away a
> > pretty big part of those changes immediately after there's any kind of
> > issue?
>
> No.
>
> > Maybe I was being optimistic.
>
> OK, if you haven't tested or read the patch, let's see if I can
> describe the issue precisely enough and describe the solution well
> enough so you won't have to.

I can't build bash at work, but yes, I should've read the patch
instead of jumping to conclusions.

> I defined the problem like this:
>
> "The short story is that `wait -n' now returns the status of any process
> that's completed and hasn't been waited for yet, just like `wait'. It's
> not restricted to processes that terminates after it's invoked. It does
> this even when it's given a list of pid or job arguments, which is
> where the problem here arises."
>
> This means that in posix mode, even when it's given a list of pids to
> operate on, bash-5.3 can return the pid of a process from the terminated
> background pids list (the `bgpids list') that is not one of the pids
> supplied as an argument.
>
> It will do this if it doesn't find one of the pids it's looking for in the
> bgpids list, the list of terminated procsubs (before they get moved to the
> bgpids list), or a terminated job. This is what it would do if it were not
> supplied any pid arguments, and, as several people noted, is a real
> problem.

Other people who also misunderstood you as meaning that had been intended.

> In this case, `wait' is returning the status of the process substitution,
> which is saved on the bgpids list, even though its pid is not one of the
> ones it should look for.
>
> The fix is to pick and return a terminated pid from the bgpids list only if
> we are *not* restricting action to a list of pids.
>
> If we have such a list, we check as above, then wait for a process to
> terminate if all the processes of interest are still running, then do the
> checks again, repeating until we find a terminated process we're interested 
> in.
>
> That's why the patch is so short -- it adds a missing test.
>
> The workaround I suggested works for the example script because the extra
> `wait' discards the contents of the bgpids list, leaving nothing to return
> even if the code checks for it. It's something you can put in a script
> immediately, if it's appropriate, while you wait for me to release a patch
> the distros will accept.
>
> > It doesn't seem to me like consulting the list of saved pids or not
> > would even be particularly relevant to the issue here.
>
> This is a bad assumption. `wait' consults the bgpids list to see if it
> has one of the pids its interested in. The issue here is that it's
> returning one that's not one of those pids.
>
> > 'wait -n' is
> > mistakenly returning the termination status of a procsub, which
> > wouldn't show up in that list to begin with, if I remember correctly.
>
> You don't.
>
> There is a list of active process substitutions, and some time after a
> procsub terminates, its pid gets saved on the bgpids list. Process
> substitutions set $!, which means you can potentially save that pid and
> wait for it later, and saving its pid to the bgpids list is how you make
> that happen. In this case, the procsub terminated when it finished
> writing the jobs output and was reaped when mapfile finished and the shell
> went back to read the next command. At that point, it got moved to the
> bgpids list.
>
> > This instead of waiting for and returning the termination status of
> > the single regular background process whose pid was given to it
> > explicitly, even though 'wait -n' was called before that background
> > process had terminated, and thus it also wouldn't show up in that
> > list.
>
> Pretty much, as detailed above.
>
> > If we *are* looking at a race condition thing, your change could have
> > affected timing.
>
> We're not. The patch was already there to verify that when you replied.
>
> > Not sure if I can realistically dive into and understand the source
> > code myself right away.
>
> OK. If it's going to be a black box -- not everyone wants to get into the
> code -- you can apply the patch, rebuild, and observe the changed behavior.
> It's easier if you make all of the asynchronous processes have distinct
> exit statuses and inspect $? after `wait' returns so you can tell which
> one it grabbed.

"JWAIT_WAITING"?

I failed to turn on posix mode in much of my earlier testing, so good job me.

Thanks for the detailed explanation. Sorry for wasting your time.

Reply via email to