On Mon, Feb 8, 2021 at 2:38 PM Paul Smith <psm...@gnu.org> wrote:
>
> On Mon, 2021-02-08 at 10:43 +0000, Edward Welbourne wrote:
> > Sounds to me like that's a bug: when the descriptors are closed, the
> > part of MAKEFLAGS that claims they're make's jobserver file
> > descriptors should be removed, since that's when the claim stops
> > being true.
>
> I believe there have been other similar issues reported recently.
>
> Certainly fixing MAKEFLAGS when we run without jobserver available is
> something that could be done.
>
> There is a loss of debugging information if we make this change: today
> make can detect if it was invoked in a way that _should_ expect to
> receive a jobserver context, but _didn't_ receive that context. That
> is, if make sees that jobserver-auth is set but it can't open the
> jobserver pipes it can warn the user that most likely there's a problem
> in their environment or with the setup of their makefiles.
>
> Without this warning there's no way to know when this situation occurs.
> It's easy to create a situation where every sub-make will create its
> own completely unique jobserver domain. So you start the top make with
> -j4 and run 4 sub-makes; if you do it wrong then each of 4 sub-makes
> could create a new jobserver domain, and now you're running 16 jobs in
> parallel instead of 4... there's no way for make to warn you about this
> situation.

One thought occurred to me. When make executes what it believes to be
something other than a recursive invocation of $(MAKE), and closes the
job server pipe file descriptors for it, it can also:

1) Add an additional parameter to MAKEFLAGS, let's call it
   "--no-jobserver", and perhaps remove the --jobserver-auth parameter
   completely. It might be easier just to append something there than
   to surgically remove the existing parameter.

2) Make checks for --no-jobserver in MAKEFLAGS when it starts. If it's
   there, make does NOT attempt to validate the file descriptors given
   in --jobserver-auth (if this parameter is preserved). It's a given
   that they're not there:

       if (!FD_OK (job_fds[0]) || !FD_OK (job_fds[1]) || make_job_rfd () < 0)

   Don't even do that.

What happens right now is that a warning message gets printed and make
runs without a job server. This change should have the same result:
print the warning, but skip the FD_OK tests. That should avoid
triggering the bug that I found.

However, that might cause a minor regression in LTO linking. I think
that this prevents the LTO linker's internal invocation of make from
finding that it can attach to the original make process's job server.
From sifting through strace dumps, I see that a linker-invoked make
gets its own -j flag. It appears that the linker is courteous enough to
count how many CPUs it has and use that to construct its own -j flag.

How about this safe approach: once --no-jobserver is there, it stays
there and gets propagated to all recursively invoked makes. If an
invoked make finds that it has both --no-jobserver and a -j flag, it'll
warn and refuse to create its own job server, and then proceed to
execute one command at a time. This prevents an arithmetic
proliferation of job worker processes if the original job server's
file descriptors get lost.

Currently, recursively invoked makes will find, and attach themselves
to, an existing job server.
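
To make the proposed startup decision concrete, here is a rough sketch
in C. It is not a patch against make. "--no-jobserver" is the
hypothetical flag from this proposal and does not exist in GNU make
today. The enum values and the functions has_word_with_prefix() and
choose_job_mode() are names I made up for illustration. The MAKEFLAGS
parsing is deliberately simplified to whitespace-separated words and
only recognizes the short "-j" spelling. Treating a lone
--no-jobserver (no -j, no --jobserver-auth) as a quiet serial run is my
own reading of the idea.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical outcomes of the startup decision.  These names are
   illustrative only; they do not exist in GNU make.  */
enum job_mode
{
  JOBS_SERIAL,       /* no parallelism requested: one job at a time */
  JOBS_CREATE,       /* -j given, nothing inherited: create a job server */
  JOBS_ATTACH,       /* --jobserver-auth given: validate fds and attach */
  JOBS_SERIAL_WARN   /* --no-jobserver seen: warn, then one job at a time */
};

/* Return true if FLAGS contains a whitespace-separated word that starts
   with PREFIX.  This is a simplification, not make's real parser.  */
static bool
has_word_with_prefix (const char *flags, const char *prefix)
{
  size_t plen = strlen (prefix);
  const char *p = flags;

  while (*p != '\0')
    {
      /* Skip separators, then test the word that starts here.  */
      while (*p == ' ' || *p == '\t')
        ++p;
      if (strncmp (p, prefix, plen) == 0)
        return true;
      /* No match: skip to the end of this word.  */
      while (*p != '\0' && *p != ' ' && *p != '\t')
        ++p;
    }
  return false;
}

/* Decide how a freshly started make should run its jobs, given the
   MAKEFLAGS string it inherited.  */
enum job_mode
choose_job_mode (const char *makeflags)
{
  bool no_js = has_word_with_prefix (makeflags, "--no-jobserver");
  bool auth = has_word_with_prefix (makeflags, "--jobserver-auth=");
  bool jflag = has_word_with_prefix (makeflags, "-j");

  /* The sticky flag wins: never run FD_OK on the advertised
     descriptors and never create a new job server.  */
  if (no_js)
    return (auth || jflag) ? JOBS_SERIAL_WARN : JOBS_SERIAL;
  if (auth)
    return JOBS_ATTACH;
  if (jflag)
    return JOBS_CREATE;
  return JOBS_SERIAL;
}
```

With this sketch, choose_job_mode ("-j4 --jobserver-auth=3,4") yields
JOBS_ATTACH, while appending " --no-jobserver" to the same string
yields JOBS_SERIAL_WARN, so a make started by a non-recursive command
never touches descriptors 3 and 4.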

Attaching to an existing job server is nice, but it is vulnerable to an
edge case that I think I'm hitting: a false positive involving a leaked
file descriptor. The proposed change encourages fixing whatever is
causing make to fail to detect a recursive invocation.

_______________________________________________
ccache mailing list
ccache@lists.samba.org
https://lists.samba.org/mailman/listinfo/ccache