Hi,

just a small update. I've configured the OS for taking crash dumps on Ubuntu 16.04 as follows (maybe somebody will find it helpful):

- Added LimitCORE=infinity to /lib/systemd/system/postgresql@.service under the [Service] section.
- Reloaded the service configuration:
      sudo systemctl daemon-reload
- Changed the core pattern (sudo has to apply to tee, since that is the command writing to /proc):
      echo /var/lib/postgresql/core.%p.sig%s.%ts | sudo tee -a /proc/sys/kernel/core_pattern

I tested it with kill -ABRT <pid of backend> and it behaved correctly: a crash dump was written.
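To double-check that the settings are in effect, something like the following could be used. This is only a minimal sketch and not part of the steps above: the helper names core_soft_limit and current_core_pattern are made up for illustration, and it assumes a Linux /proc filesystem.

```shell
# Print the soft "Max core file size" limit of a running process,
# as recorded by the kernel in /proc/<pid>/limits (Linux only).
# Hypothetical helper name, used here for illustration.
core_soft_limit() {
    awk '/^Max core file size/ { print $5 }' "/proc/$1/limits"
}

# Print the kernel's current core file name pattern.
# Hypothetical helper name, used here for illustration.
current_core_pattern() {
    cat /proc/sys/kernel/core_pattern
}

# Example: inspect the current shell itself.
# For PostgreSQL, pass the PID of a backend instead of $$.
core_soft_limit "$$"
current_core_pattern
```

Run against a backend PID after the service has been restarted, the first helper should report "unlimited" if LimitCORE=infinity was picked up, and the second should show the pattern that was written to /proc/sys/kernel/core_pattern.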
Over the last few days of monitoring, no segfault occurred, but the dsa_allocate error did. I'm starting to doubt whether the segfault I found in dmesg was actually related.

I've grepped the postgres log for dsa_allocate. Why do the messages sometimes occur as FATAL and sometimes as ERROR?

2018-11-29 07:59:06 CET::@:[20584]: FATAL: dsa_allocate could not find 7 free pages
2018-11-29 07:59:06 CET:127.0.0.1(40846):user@db:[19507]: ERROR: dsa_allocate could not find 7 free pages
2018-11-30 09:04:13 CET::@:[27341]: FATAL: dsa_allocate could not find 13 free pages
2018-11-30 09:04:13 CET:127.0.0.1(41782):user@db:[25417]: ERROR: dsa_allocate could not find 13 free pages
2018-11-30 09:28:38 CET::@:[30215]: FATAL: dsa_allocate could not find 4 free pages
2018-11-30 09:28:38 CET:127.0.0.1(45980):user@db:[29924]: ERROR: dsa_allocate could not find 4 free pages
2018-11-30 16:37:16 CET::@:[14385]: FATAL: dsa_allocate could not find 7 free pages
2018-11-30 16:37:16 CET::@:[14375]: FATAL: dsa_allocate could not find 7 free pages
2018-11-30 16:37:16 CET:212.186.105.45(55004):user@db:[14386]: FATAL: dsa_allocate could not find 7 free pages
2018-11-30 16:37:16 CET:212.186.105.45(54964):user@db:[14379]: ERROR: dsa_allocate could not find 7 free pages
2018-11-30 16:37:16 CET:212.186.105.45(54916):user@db:[14370]: ERROR: dsa_allocate could not find 7 free pages
2018-11-30 16:45:11 CET:212.186.105.45(55356):user@db:[14555]: FATAL: dsa_allocate could not find 7 free pages
2018-11-30 16:49:13 CET::@:[15359]: FATAL: dsa_allocate could not find 7 free pages
2018-11-30 16:49:13 CET::@:[15363]: FATAL: dsa_allocate could not find 7 free pages
2018-11-30 16:49:13 CET:212.186.105.45(54964):user@db:[14379]: FATAL: dsa_allocate could not find 7 free pages
2018-11-30 16:49:13 CET:212.186.105.45(54916):user@db:[14370]: ERROR: dsa_allocate could not find 7 free pages
2018-11-30 16:49:13 CET:212.186.105.45(55842):user@db:[14815]: ERROR: dsa_allocate could not find 7 free pages
2018-11-30 16:56:11 CET:212.186.105.45(57076):user@db:[15638]: FATAL: dsa_allocate could not find 7 free pages

There are quite a few errors from today, but I was launching the problematic query in parallel from 2-3 sessions. Sometimes it broke, sometimes it didn't. I couldn't find any pattern.

The workload on this db is not really constant; it comes in bursts.

--
regards,
Jakub Glapa

On Tue, Nov 27, 2018 at 9:03 AM Thomas Munro <thomas.mu...@enterprisedb.com> wrote:
> On Tue, Nov 27, 2018 at 4:00 PM Thomas Munro
> <thomas.mu...@enterprisedb.com> wrote:
> > Hmm. I will see if I can come up with a many-partition torture test
> > reproducer for this.
>
> No luck. I suppose one theory that could link both failure modes
> would be a buffer overrun, where in the non-shared case it trashes a
> pointer that is later dereferenced, and in the shared case it writes
> past the end of allocated 4KB pages and corrupts the intrusive btree
> that lives in spare pages to track available space.
>
> --
> Thomas Munro
> http://www.enterprisedb.com