I've applied your patch, but gcc 15.1.0 has decided that a 'struct winsize *' not being a 'struct winsize *' is an error now.
Configuration Information [Automatically generated, do not change]: Machine: x86_64 OS: cygwin Compiler: gcc Compilation CFLAGS: -Wno-error=incompatible-pointer-types uname output: MSYS_NT-10.0-26100 Zack2021HPPavilion 3.6.3-ab81aae6.x86_64 2025-07-01 18:20 UTC x86_64 Msys Machine Type: x86_64-pc-cygwin Bash Version: 5.3 Patch Level: 0 Release Status: maint Would be nice if bashbug were generated even if bash itself failed to build. >From before I changed CFLAGS, obviously: winsize.c: In function 'get_new_window_size': winsize.c:98:39: error: passing argument 2 of 'tcgetwinsize' from incompatible pointer type [-Wincompatible-pointer-types] 98 | if (tty >= 0 && (tcgetwinsize (tty, &win) == 0) && win.ws_row > 0 && win.ws_col > 0) | ^~~~ | | | struct winsize * In file included from /usr/include/sys/ioctl.h:15, from winsize.c:31: /usr/include/sys/termios.h:304:42: note: expected 'struct winsize *' but argument is of type 'struct winsize *' 304 | int tcgetwinsize(int fd, struct winsize *winsz); | ~~~~~~~~~~~~~~~~^~~~~ make[1]: *** [Makefile:84: winsize.o] Error 1 make[1]: Leaving directory '/home/zsant/repos/bash/lib/sh' make: *** [Makefile:763: lib/sh/libsh.a] Error 1 I've got it built now. On Tue, Jul 15, 2025 at 4:33 PM Chet Ramey <chet.ra...@case.edu> wrote: > > On 7/15/25 1:26 PM, Zachary Santer wrote: > > On Tue, Jul 15, 2025 at 11:39 AM Chet Ramey <chet.ra...@case.edu> wrote: > >> > >> On 7/15/25 11:35 AM, Zachary Santer wrote: > >>> On Tue, Jul 15, 2025 at 10:29 AM Chet Ramey <chet.ra...@case.edu> wrote: > >>>> > >>>> With respect to not consulting the list of saved pids if given a list > >>>> of pid arguments, solving this problem. > >>> > >>> I'd like to think that's just a stopgap, but yeah, I can play around with > >>> that. > >> > >> Why do you think it would be a stopgap? > > > > We had that super-long discussion, and then you just throw away a > > pretty big part of those changes immediately after there's any kind of > > issue? > > No. > > > Maybe I was being optimistic. > > OK, if you haven't tested or read the patch, let's see if I can > describe the issue precisely enough and describe the solution well > enough so you won't have to. I can't build bash at work, but yes, I should've read the patch instead of jumping to conclusions. > I defined the problem like this: > > "The short story is that `wait -n' now returns the status of any process > that's completed and hasn't been waited for yet, just like `wait'. It's > not restricted to processes that terminates after it's invoked. It does > this even when it's given a list of pid or job arguments, which is > where the problem here arises." > > This means that in posix mode, even when it's given a list of pids to > operate on, bash-5.3 can return the pid of a process from the terminated > background pids list (the `bgpids list') that is not one of the pids > supplied as an argument. > > It will do this if it doesn't find one of the pids it's looking for in the > bgpids list, the list of terminated procsubs (before they get moved to the > bgpids list), or a terminated job. This is what it would do if it were not > supplied any pid arguments, and, as several people noted, is a real > problem. Other people who also misunderstood you as meaning that had been intended. > In this case, `wait' is returning the status of the process substitution, > which is saved on the bgpids list, even though its pid is not one of the > ones it should look for. > > The fix is to pick and return a terminated pid from the bgpids list only if > we are *not* restricting action to a list of pids. > > If we have such a list, we check as above, then wait for a process to > terminate if all the processes of interest are still running, then do the > checks again, repeating until we find a terminated process we're interested > in. > > That's why the patch is so short -- it adds a missing test. > > The workaround I suggested works for the example script because the extra > `wait' discards the contents of the bgpids list, leaving nothing to return > even if the code checks for it. It's something you can put in a script > immediately, if it's appropriate, while you wait for me to release a patch > the distros will accept. > > > It doesn't seem to me like consulting the list of saved pids or not > > would even be particularly relevant to the issue here. > > This is a bad assumption. `wait' consults the bgpids list to see if it > has one of the pids its interested in. The issue here is that it's > returning one that's not one of those pids. > > > 'wait -n' is > > mistakenly returning the termination status of a procsub, which > > wouldn't show up in that list to begin with, if I remember correctly. > > You don't. > > There is a list of active process substitutions, and some time after a > procsub terminates, its pid gets saved on the bgpids list. Process > substitutions set $!, which means you can potentially save that pid and > wait for it later, and saving its pid to the bgpids list is how you make > that happen. In this case, the procsub terminated when it finished > writing the jobs output and was reaped when mapfile finished and the shell > went back to read the next command. At that point, it got moved to the > bgpids list. > > > This instead of waiting for and returning the termination status of > > the single regular background process whose pid was given to it > > explicitly, even though 'wait -n' was called before that background > > process had terminated, and thus it also wouldn't show up in that > > list. > > Pretty much, as detailed above. > > > If we *are* looking at a race condition thing, your change could have > > affected timing. > > We're not. The patch was already there to verify that when you replied. > > > Not sure if I can realistically dive into and understand the source > > code myself right away. > > OK. If it's going to be a black box -- not everyone wants to get into the > code -- you can apply the patch, rebuild, and observe the changed behavior. > It's easier if you make all of the asynchronous processes have distinct > exit statuses and inspect $? after `wait' returns so you can tell which > one it grabbed. "JWAIT_WAITING"? I failed to turn on posix mode in much of my earlier testing, so good job me. Thanks for the detailed explanation. Sorry for wasting your time.