Re: dsa_allocate() faliure
> Do you have query logging enabled ? If not, could you consider it on at least one of those servers ? I'm interested to know what ELSE is running at the time that query failed. Ok, I have configured that and will enable in the time window when the errors usually occur. I'll report as soon as I have something. -- regards, pozdrawiam, Jakub Glapa On Thu, Feb 7, 2019 at 12:21 AM Justin Pryzby wrote: > Moving to -hackers, hopefully it doesn't confuse the list scripts too much. > > On Mon, Feb 04, 2019 at 08:52:17AM +0100, Jakub Glapa wrote: > > I see the error showing up every night on 2 different servers. But it's a > > bit of a heisenbug because If I go there now it won't be reproducible. > > Do you have query logging enabled ? If not, could you consider it on at > least > one of those servers ? I'm interested to know what ELSE is running at the > time > that query failed. > > Perhaps you could enable query logging JUST for the interval of time that > the > server usually errors ? The CSV logs can be imported to postgres for > analysis. > You might do something like SELECT > left(message,99),COUNT(1),max(session_id) FROM postgres_log WHERE log_time > BETWEEN .. AND .. GROUP BY 1 ORDER BY 2; > And just maybe there'd be a query there that only runs once per day which > would > allow reproducing the error at will. Or utility command like vacuum.. > > I think ideally you'd set: > > log_statement= all > log_min_messages = info > log_destination = 'stderr,csvlog' > # stderr isn't important for this purpose, but I keep it set to capture > crash messages, too > > You should set these to something that works well at your site: > > log_rotation_age= '2min' > log_rotation_size = '32MB' > > I would normally set these, and I don't see any reason why you wouldn't set > them too: > > log_checkpoints = on > log_lock_waits = on > log_temp_files = on > log_min_error_statement = notice > log_temp_files = 0 > log_min_duration_statement = '9sec' > log_autovacuum_min_duration = '999sec' > > And I would set these too but maybe you'd prefer to do something else: > > log_directory = /var/log/postgresql > log_file_mode = 0640 > log_filename= postgresql-%Y-%m-%d_%H%M%S.log > > Justin >
Re: pg11.1: dsa_area could not attach to segment
> > It might be interesting to have CPU info, too. model name: Intel(R) Xeon(R) CPU E5-2637 v4 @ 3.50GHz (virtualized vmware) and model name: Intel(R) Xeon(R) CPU E5-2667 v3 @ 3.20GHz (bare metal) -- regards, pozdrawiam, Jakub Glapa On Wed, Feb 6, 2019 at 7:52 PM Justin Pryzby wrote: > On Wed, Feb 06, 2019 at 06:37:16PM +0100, Jakub Glapa wrote: > > I'm seeing dsa_allocate on two different servers. > > One is virtualized with VMWare the other is bare metal. > > Thanks. So it's not limited to vmware or VM at all. > > FYI here we've seen DSA errors on (and only on) two vmware VMs. > > It might be interesting to have CPU info, too. > > For us the affected servers are: > > Intel(R) Xeon(R) CPU E5-2470 0 @ 2.30GHz stepping 02 > microcode: CPU0 sig=0x206d2, pf=0x1, revision=0xb2e > > Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz stepping 02 > microcode: CPU0 sig=0x206d2, pf=0x1, revision=0x710 >
Re: pg11.1: dsa_area could not attach to segment
Hi Justin I'm seeing dsa_allocate on two different servers. One is virtualized with VMWare the other is bare metal. ubuntu@db1:~$ grep dsa_allocate /var/log/postgresql/postgresql-11-main.log 2019-02-03 17:03:03 CET:192.168.10.83(48336):foo@bar:[27979]: FATAL: dsa_allocate could not find 7 free pages 2019-02-05 17:05:12 CET:192.168.10.83(38138):foo@bar:[2725]: FATAL: dsa_allocate could not find 49 free pages 2019-02-06 09:04:18 CET::@:[22120]: FATAL: dsa_allocate could not find 13 free pages 2019-02-06 09:04:18 CET:192.168.10.83(55740):foo@bar:[21725]: ERROR: dsa_allocate could not find 13 free pages ubuntu@db1:~$ sudo dmidecode -s system-product-name VMware Virtual Platform -- ubuntu@db2:~$ grep dsa_allocate /var/log/postgresql/postgresql-11-main2.log 2019-02-03 07:45:45 CET::@:[28592]: FATAL: dsa_allocate could not find 25 free pages 2019-02-03 07:45:45 CET:127.0.0.1(41920):foo1@bar:[27320]: ERROR: dsa_allocate could not find 25 free pages 2019-02-03 07:46:03 CET:127.0.0.1(41920):foo1@bar:[27320]: FATAL: dsa_allocate could not find 25 free pages 2019-02-04 11:56:28 CET::@:[31713]: FATAL: dsa_allocate could not find 7 free pages 2019-02-04 11:56:28 CET:127.0.0.1(41950):foo1@bar:[30465]: ERROR: dsa_allocate could not find 7 free pages 2019-02-04 11:57:59 CET::@:[31899]: FATAL: dsa_allocate could not find 7 free pages 2019-02-04 11:57:59 CET:127.0.0.1(44096):foo1@bar:[31839]: ERROR: dsa_allocate could not find 7 free pages ubuntu@db2:~$ sudo dmidecode -s system-product-name ProLiant DL380 Gen9 -- regards, pozdrawiam, Jakub Glapa On Wed, Feb 6, 2019 at 6:19 PM Justin Pryzby wrote: > On Wed, Feb 06, 2019 at 04:22:12PM +1100, Thomas Munro wrote: > > Can anyone else who has hit this comment on any virtualisation they > > might be using? > > I don't think most of these people are on -hackers (one of the original > reports > was on -performance) so I'm copying them now. > > Could you let us know which dsa_* error you were seeing, whether or not you > were running postgres under virtual environment, and (if so) which VM > hypervisor? > > Thanks, > Justin > > > https://www.postgresql.org/message-id/flat/CAMAYy4%2Bw3NTBM5JLWFi8twhWK4%3Dk_5L4nV5%2BbYDSPu8r4b97Zg%40mail.gmail.com > > https://www.postgresql.org/message-id/flat/CAEepm%3D0aPq2yEy39gEqVK2m_Qi6jJdy96ysHGJ6VSHOZFz%2Bxbg%40mail.gmail.com#e02bee0220b422fe91a3383916107504 > > https://www.postgresql.org/message-id/20181231221734.GB25379%40telsasoft.com >
Re: dsa_allocate() faliure
Hi, just a small update. I've configured the OS for taking crash dumps on Ubuntu 16.04 with the following (maybe somebody will find it helpful): I've added LimitCORE=infinity to /lib/systemd/system/postgresql@.service under [Service] section I've reloaded the service config with sudo systemctl daemon-reload Changed the core pattern to: sudo echo /var/lib/postgresql/core.%p.sig%s.%ts | tee -a /proc/sys/kernel/core_pattern I had tested it with kill -ABRT pidofbackend and it behaved correctly. A crash dump was written. In the last days I've been monitoring no segfault occurred but the das_allocation did. I'm starting to doubt if the segfault I've found in dmesg was actually related. I've grepped the postgres log for dsa_allocated: Why do the messages occur sometimes as FATAL and sometimes as ERROR? 2018-11-29 07:59:06 CET::@:[20584]: FATAL: dsa_allocate could not find 7 free pages 2018-11-29 07:59:06 CET:127.0.0.1(40846):user@db:[19507]: ERROR: dsa_allocate could not find 7 free pages 2018-11-30 09:04:13 CET::@:[27341]: FATAL: dsa_allocate could not find 13 free pages 2018-11-30 09:04:13 CET:127.0.0.1(41782):user@db:[25417]: ERROR: dsa_allocate could not find 13 free pages 2018-11-30 09:28:38 CET::@:[30215]: FATAL: dsa_allocate could not find 4 free pages 2018-11-30 09:28:38 CET:127.0.0.1(45980):user@db:[29924]: ERROR: dsa_allocate could not find 4 free pages 2018-11-30 16:37:16 CET::@:[14385]: FATAL: dsa_allocate could not find 7 free pages 2018-11-30 16:37:16 CET::@:[14375]: FATAL: dsa_allocate could not find 7 free pages 2018-11-30 16:37:16 CET:212.186.105.45(55004):user@db:[14386]: FATAL: dsa_allocate could not find 7 free pages 2018-11-30 16:37:16 CET:212.186.105.45(54964):user@db:[14379]: ERROR: dsa_allocate could not find 7 free pages 2018-11-30 16:37:16 CET:212.186.105.45(54916):user@db:[14370]: ERROR: dsa_allocate could not find 7 free pages 2018-11-30 16:45:11 CET:212.186.105.45(55356):user@db:[14555]: FATAL: dsa_allocate could not find 7 free pages 2018-11-30 16:49:13 CET::@:[15359]: FATAL: dsa_allocate could not find 7 free pages 2018-11-30 16:49:13 CET::@:[15363]: FATAL: dsa_allocate could not find 7 free pages 2018-11-30 16:49:13 CET:212.186.105.45(54964):user@db:[14379]: FATAL: dsa_allocate could not find 7 free pages 2018-11-30 16:49:13 CET:212.186.105.45(54916):user@db:[14370]: ERROR: dsa_allocate could not find 7 free pages 2018-11-30 16:49:13 CET:212.186.105.45(55842):user@db:[14815]: ERROR: dsa_allocate could not find 7 free pages 2018-11-30 16:56:11 CET:212.186.105.45(57076):user@db:[15638]: FATAL: dsa_allocate could not find 7 free pages There's quite a bit errors from today but I was launching the problematic query in parallel from 2-3 sessions. Sometimes it was breaking sometimes not. Couldn't find any pattern. The workload on this db is not really constant, rather bursting. -- regards, Jakub Glapa On Tue, Nov 27, 2018 at 9:03 AM Thomas Munro wrote: > On Tue, Nov 27, 2018 at 4:00 PM Thomas Munro > wrote: > > Hmm. I will see if I can come up with a many-partition torture test > > reproducer for this. > > No luck. I suppose one theory that could link both failure modes > would a buffer overrun, where in the non-shared case it trashes a > pointer that is later dereferenced, and in the shared case it writes > past the end of allocated 4KB pages and corrupts the intrusive btree > that lives in spare pages to track available space. > > -- > Thomas Munro > http://www.enterprisedb.com >