Hi,

On 2022-07-02 09:52:33 +1200, Thomas Munro wrote:
> On Sat, Jul 2, 2022 at 9:06 AM Andres Freund <and...@anarazel.de> wrote:
> > On 2022-07-01 13:29:44 -0700, Andres Freund wrote:
> > Chris, do you have any additional details about the machine that led to
> > this change? OS version, whether it might have been swapping, etc?
> >
> > I wonder if what happened is that posix_fallocate() used glibc's fallback
> > implementation because the kernel was old enough to not support fallocate()
> > for tmpfs. Looks like support for fallocate() for tmpfs was added in 3.5
> > ([1]). So e.g. a rhel 6 wouldn't have had that.
>
> With a quick test program on my Linux 5.10 kernel I see that an
> SA_RESTART signal handler definitely causes posix_fallocate() to
> return EINTR (can post trivial program).
>
> A drive-by look at the current/modern kernel source supports this:
> shmem_fallocate returns -EINTR directly (not -ERESTARTSYS, which seems
> to be the Linux-y way to say you want EINTR or restart as
> appropriate?), and it also undoes all partial progress too (not too
> surprising), which would explain why a perfectly timed machine gun
> stream of signals from our recovery conflict system can make a
> fallocate retry loop never terminate, for large enough sizes.

Yea :(

And even if we fix recovery to not douse other processes in signals quite that
badly, there are plenty of other sources of signals that can arrive at a steady
clip. So I think we need to do something to defuse this another way.

Ideas:

1) do the fallocate in smaller chunks, thereby making it much more likely to
   complete between two signal deliveries
2) block signals while calling posix_fallocate(). That won't work for
   everything (e.g. rapid SIGSTOP/SIGCONT), but that's not something we'd send
   ourselves, so whatever.
3) 1+2
4) ?

Greetings,

Andres Freund