Hi, just a small update.
I've configured the OS for taking crash dumps on Ubuntu 16.04 with the
following (maybe somebody will find it helpful):
I've added LimitCORE=infinity to /lib/systemd/system/postgresql@.service
under [Service] section
I've reloaded the service config with sudo systemctl daemon-reload
Changed the core pattern to: sudo echo
/var/lib/postgresql/core.%p.sig%s.%ts | tee -a /proc/sys/kernel/core_pattern
I had tested it with kill -ABRT pidofbackend and it behaved correctly. A
crash dump was written.

In the last days I've been monitoring no segfault occurred but the
das_allocation did.
I'm starting to doubt if the segfault I've found in dmesg was actually
related.

I've grepped the postgres log for dsa_allocated:
Why do the messages occur sometimes as FATAL and sometimes as ERROR?

2018-11-29 07:59:06 CET::@:[20584]: FATAL:  dsa_allocate could not find 7
free pages
2018-11-29 07:59:06 CET:127.0.0.1(40846):user@db:[19507]: ERROR:
dsa_allocate could not find 7 free pages
2018-11-30 09:04:13 CET::@:[27341]: FATAL:  dsa_allocate could not find 13
free pages
2018-11-30 09:04:13 CET:127.0.0.1(41782):user@db:[25417]: ERROR:
dsa_allocate could not find 13 free pages
2018-11-30 09:28:38 CET::@:[30215]: FATAL:  dsa_allocate could not find 4
free pages
2018-11-30 09:28:38 CET:127.0.0.1(45980):user@db:[29924]: ERROR:
dsa_allocate could not find 4 free pages
2018-11-30 16:37:16 CET::@:[14385]: FATAL:  dsa_allocate could not find 7
free pages
2018-11-30 16:37:16 CET::@:[14375]: FATAL:  dsa_allocate could not find 7
free pages
2018-11-30 16:37:16 CET:212.186.105.45(55004):user@db:[14386]: FATAL:
dsa_allocate could not find 7 free pages
2018-11-30 16:37:16 CET:212.186.105.45(54964):user@db:[14379]: ERROR:
dsa_allocate could not find 7 free pages
2018-11-30 16:37:16 CET:212.186.105.45(54916):user@db:[14370]: ERROR:
dsa_allocate could not find 7 free pages
2018-11-30 16:45:11 CET:212.186.105.45(55356):user@db:[14555]: FATAL:
dsa_allocate could not find 7 free pages
2018-11-30 16:49:13 CET::@:[15359]: FATAL:  dsa_allocate could not find 7
free pages
2018-11-30 16:49:13 CET::@:[15363]: FATAL:  dsa_allocate could not find 7
free pages
2018-11-30 16:49:13 CET:212.186.105.45(54964):user@db:[14379]: FATAL:
dsa_allocate could not find 7 free pages
2018-11-30 16:49:13 CET:212.186.105.45(54916):user@db:[14370]: ERROR:
dsa_allocate could not find 7 free pages
2018-11-30 16:49:13 CET:212.186.105.45(55842):user@db:[14815]: ERROR:
dsa_allocate could not find 7 free pages
2018-11-30 16:56:11 CET:212.186.105.45(57076):user@db:[15638]: FATAL:
dsa_allocate could not find 7 free pages


There's quite a bit errors from today but I was launching the problematic
query in parallel from 2-3 sessions.
Sometimes it was breaking sometimes not.
Couldn't find any pattern.
The workload on this db is not really constant, rather bursting.

--
regards,
Jakub Glapa


On Tue, Nov 27, 2018 at 9:03 AM Thomas Munro <thomas.mu...@enterprisedb.com>
wrote:

> On Tue, Nov 27, 2018 at 4:00 PM Thomas Munro
> <thomas.mu...@enterprisedb.com> wrote:
> > Hmm.  I will see if I can come up with a many-partition torture test
> > reproducer for this.
>
> No luck.  I suppose one theory that could link both failure modes
> would a buffer overrun, where in the non-shared case it trashes a
> pointer that is later dereferenced, and in the shared case it writes
> past the end of allocated 4KB pages and corrupts the intrusive btree
> that lives in spare pages to track available space.
>
> --
> Thomas Munro
> http://www.enterprisedb.com
>

Reply via email to