On Fri, May 10, 2019 at 05:26:43PM -0400, Andrew Dunstan wrote:
>
> On 5/10/19 3:35 PM, Tom Lane wrote:
> > Andres Freund <and...@anarazel.de> writes:
> >> On 2019-05-10 11:38:57 -0400, Tom Lane wrote:
> >>> I am wondering if, somehow, the stack depth limit seen by the postmaster
> >>> sometimes doesn't apply to its children. That would be pretty wacko
> >>> kernel behavior, especially if it's only intermittently true.
> >>> But we're running out of other explanations.
> >> I wonder if this is a SIGSEGV that actually signals an OOM
> >> situation. Linux, if it can't actually extend the stack on-demand due to
> >> OOM, sends a SIGSEGV. The signal has that information, but
> >> unfortunately the buildfarm code doesn't print it. p $_siginfo would
> >> show us some of that...
> >> Mark, how tight is the memory on that machine? Does dmesg have any other
> >> information (often segfaults are logged by the kernel with the code
> >> IIRC).
> > It does sort of smell like a resource exhaustion problem, especially
> > if all these buildfarm animals are VMs running on the same underlying
> > platform. But why would that manifest as "you can't have a measly two
> > megabytes of stack" and not as any other sort of OOM symptom?
> >
> > Mark, if you don't mind modding your local copies of the buildfarm
> > script, I think what Andres is asking for is a pretty trivial addition
> > in PGBuild/Utils.pm's sub get_stack_trace:
> >
> >     my $cmdfile = "./gdbcmd";
> >     my $handle;
> >     open($handle, '>', $cmdfile) || die "opening $cmdfile: $!";
> >     print $handle "bt\n";
> > +   print $handle "p $_siginfo\n";
> >     close($handle);
> >
>
> I think we'll need to write that as:
>
>     print $handle 'p $_siginfo',"\n";
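
For anyone else patching a local copy: the quoting matters because Perl interpolates variables inside double-quoted strings, so "p $_siginfo\n" would refer to a Perl variable named $_siginfo (a compile-time error under "use strict", and an empty string otherwise) instead of passing gdb's convenience variable through. A minimal sketch of the difference follows; the helper name gdb_commands is made up for illustration and is not part of PGBuild/Utils.pm.

```perl
use strict;
use warnings;

# Hypothetical helper (not in the buildfarm code): returns the text
# that would be written to the gdb command file.
sub gdb_commands {
    # Single quotes stop Perl from interpolating $_siginfo, so gdb
    # receives the literal text "p $_siginfo".
    my $single_quoted = 'p $_siginfo' . "\n";

    # Escaping the dollar sign inside double quotes yields the same string.
    my $escaped = "p \$_siginfo\n";

    die "quoting forms disagree" unless $single_quoted eq $escaped;

    return "bt\n" . $single_quoted;
}

print gdb_commands();
```

Either form should leave gdb with a "bt" line followed by a "p $_siginfo" line in the command file.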

Ok, I have this added to everyone now.

I think I also have caught up on this thread, but let me know if I missed
anything.

Regards,
Mark

-- 
Mark Wong
2ndQuadrant - PostgreSQL Solutions for the Enterprise
https://www.2ndQuadrant.com/