On Sat, Oct 12, 2019 at 08:41:12AM +1300, Thomas Munro wrote:
> On Sat, Oct 12, 2019 at 7:56 AM Tom Lane <t...@sss.pgh.pa.us> wrote:
> > This matches up with the intermittent infinite_recurse failures
> > we've been seeing in the buildfarm. Those are happening across
> > a range of systems, but they're (almost) all Linux-based ppc64,
> > suggesting that there's a longstanding arch-specific kernel bug
> > involved. For reference, I scraped the attached list of such
> > failures in the last three months. I wonder whether we can get
> > the attention of any kernel hackers about that.
>
> Yeah, I don't know anything about this stuff, but I was also beginning
> to wonder if something is busted in the arch-specific fault.c code
> that checks if stack expansion is valid[1], in a way that fails with a
> rapidly growing stack, well timed incoming signals, and perhaps
> Docker/LXC (that's on Mark's systems IIUC, not sure about the ARM
> boxes that failed or if it could be relevant here). Perhaps the
> arbitrary tolerances mentioned in that comment are relevant.

This specific one (wobbegon) is OpenStack/KVM[2], for what it's worth...

"... cluster is an OpenStack based cluster offering POWER8 & POWER9 LE
instances running on KVM ..."

But to keep you on your toes, some of my ppc animals are Docker within
other OpenStack/KVM instance...

Regards,
Mark

[1] https://github.com/torvalds/linux/blob/master/arch/powerpc/mm/fault.c#L244
[2] https://osuosl.org/services/powerdev/

-- 
Mark Wong
2ndQuadrant - PostgreSQL Solutions for the Enterprise
https://www.2ndQuadrant.com/