Re: [HACKERS] 9.4 beta1 crash on Debian sid/i386

Tom Lane Sat, 17 May 2014 21:01:28 -0700

Christoph Berg <[email protected]> writes:
> Re: Tom Lane 2014-05-14 <[email protected]>
>> It would appear that something is wrong with check_stack_depth(),
>> and/or getrlimit(RLIMIT_STACK) is lying to us about the available stack.


> ulimit -s is 8192 (kB); max_stack_depth is 2MB.

> check_stack_depth looks right, max_stack_depth_bytes there is 2097152
> and I can see stack_base_ptr - &stack_top_loc grow over repeated
> invocations of the function (stack_depth itself is optimized out).
> Still, it never enters "if (stack_depth > max_stack_depth_bytes...)".

Hm.  Did you check that stack_base_ptr is non-NULL?  If it were somehow
not getting set, that would disable the error report.  But on most
architectures that would also result in silly values for the pointer
difference, so I doubt this is the issue.
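
For readers following along, here is a minimal sketch of the kind of test
being discussed. It is not the backend's actual code. The function name
depth_exceeds_limit and its parameters are made up for illustration, and
addresses are passed as integers so the arithmetic stays well defined. It
assumes, as described above, that an unset (zero) base pointer disables the
check.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper, not taken from the PostgreSQL source.
 *
 * base        - address recorded near the bottom of the stack at startup
 *               (0 means it was never set)
 * top         - address of a local variable in the current frame
 * limit_bytes - configured limit, e.g. 2097152 for max_stack_depth = 2MB
 */
static bool
depth_exceeds_limit(uintptr_t base, uintptr_t top, uintptr_t limit_bytes)
{
    uintptr_t depth;

    /* Assumption: with no recorded base there is nothing to measure
     * against, so the check never fires. */
    if (base == 0)
        return false;

    /* The stack may grow in either direction; use the absolute distance. */
    depth = (base > top) ? (base - top) : (top - base);

    return depth > limit_bytes;
}
```

With a base of 0x40000000 and a 2MB limit, a frame located 2097153 bytes
below the base trips the check, while one exactly 2097152 bytes below does
not. If the check is never entered even though the difference keeps growing,
the thing to look at is whether that difference really gets past the limit
before the crash.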

> Interestingly, the Debian buildd managed to run the testsuite for
> i386, while I could reproduce the problem on the pgapt build machine
> and on my notebook, so there must be some system difference. Possibly
> the reason is these two machines are running a 64bit kernel and I'm
> building in a 32bit chroot, though that hasn't been a problem before.

I'm suspicious that something has changed in your build environment,
because that stack-checking logic hasn't changed since these commits:

Author: Heikki Linnakangas <[email protected]>
Branch: master Release: REL9_2_BR [ef3883d13] 2012-04-08 19:07:55 +0300
Branch: REL9_1_STABLE Release: REL9_1_4 [ef29bb1f7] 2012-04-08 19:08:13 +0300
Branch: REL9_0_STABLE Release: REL9_0_8 [77dc2b0a4] 2012-04-08 19:09:12 +0300
Branch: REL8_4_STABLE Release: REL8_4_12 [89da5dc6d] 2012-04-08 19:09:26 +0300
Branch: REL8_3_STABLE Release: REL8_3_19 [ddeac5dec] 2012-04-08 19:09:37 +0300

    Do stack-depth checking in all postmaster children.
    
    We used to only initialize the stack base pointer when starting up a regular
    backend, not in other processes. In particular, autovacuum workers can run
    arbitrary user code, and without stack-depth checking, infinite recursion
    in e.g. an index expression will bring down the whole cluster.

The lack of reports from the buildfarm or other users is also evidence
against there being a widespread issue here.

A different thought: I have heard of environments in which the available
stack depth is much less than what ulimit would suggest because the ulimit
space gets split up for multiple per-thread stacks.  That should not be
happening in a Postgres backend, since we don't do threading, but I'm
running out of ideas to investigate ...
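
One way to compare what the process itself sees against "ulimit -s" is to
ask getrlimit() directly from inside the affected environment. The sketch
below is only an illustration. The helper name soft_stack_limit_bytes is
invented here, and it reports the soft limit only, which says nothing about
how much of that space is actually usable.

```c
#include <sys/resource.h>

/*
 * Hypothetical helper: return the soft RLIMIT_STACK in bytes, or -1 if
 * the call fails or the limit is unlimited (RLIM_INFINITY).
 */
static long long
soft_stack_limit_bytes(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_STACK, &rl) != 0)
        return -1;              /* call failed; errno says why */

    if (rl.rlim_cur == RLIM_INFINITY)
        return -1;              /* no finite soft limit */

    return (long long) rl.rlim_cur;
}
```

With "ulimit -s" at 8192 (kB), one would expect this to return 8388608 when
run in the same shell environment. A different value inside the 32-bit
chroot would point at the environment rather than at the checking code.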

                        regards, tom lane

