Re: dsa_allocate() faliure

2019-02-07 Thread Jakub Glapa
> Do you have query logging enabled ?  If not, could you consider it on at
least
one of those servers ?  I'm interested to know what ELSE is running at the
time
that query failed.

Ok, I have configured that and will enable in the time window when the
errors usually occur. I'll report as soon as I have something.


--
regards,
pozdrawiam,
Jakub Glapa


On Thu, Feb 7, 2019 at 12:21 AM Justin Pryzby  wrote:

> Moving to -hackers, hopefully it doesn't confuse the list scripts too much.
>
> On Mon, Feb 04, 2019 at 08:52:17AM +0100, Jakub Glapa wrote:
> > I see the error showing up every night on 2 different servers. But it's a
> > bit of a heisenbug because If I go there now it won't be reproducible.
>
> Do you have query logging enabled ?  If not, could you consider it on at
> least
> one of those servers ?  I'm interested to know what ELSE is running at the
> time
> that query failed.
>
> Perhaps you could enable query logging JUST for the interval of time that
> the
> server usually errors ?  The CSV logs can be imported to postgres for
> analysis.
> You might do something like SELECT
> left(message,99),COUNT(1),max(session_id) FROM postgres_log WHERE log_time
> BETWEEN .. AND .. GROUP BY 1 ORDER BY 2;
> And just maybe there'd be a query there that only runs once per day which
> would
> allow reproducing the error at will.  Or utility command like vacuum..
>
> I think ideally you'd set:
>
> log_statement= all
> log_min_messages = info
> log_destination  = 'stderr,csvlog'
> # stderr isn't important for this purpose, but I keep it set to capture
> crash messages, too
>
> You should set these to something that works well at your site:
>
> log_rotation_age= '2min'
> log_rotation_size   = '32MB'
>
> I would normally set these, and I don't see any reason why you wouldn't set
> them too:
>
> log_checkpoints = on
> log_lock_waits  = on
> log_temp_files  = on
> log_min_error_statement = notice
> log_temp_files  = 0
> log_min_duration_statement  = '9sec'
> log_autovacuum_min_duration = '999sec'
>
> And I would set these too but maybe you'd prefer to do something else:
>
> log_directory   = /var/log/postgresql
> log_file_mode   = 0640
> log_filename= postgresql-%Y-%m-%d_%H%M%S.log
>
> Justin
>


Re: pg11.1: dsa_area could not attach to segment

2019-02-06 Thread Jakub Glapa
>
> It might be interesting to have CPU info, too.


model name: Intel(R) Xeon(R) CPU E5-2637 v4 @ 3.50GHz (virtualized
vmware)
and
model name: Intel(R) Xeon(R) CPU E5-2667 v3 @ 3.20GHz (bare metal)


--
regards,
pozdrawiam,
Jakub Glapa


On Wed, Feb 6, 2019 at 7:52 PM Justin Pryzby  wrote:

> On Wed, Feb 06, 2019 at 06:37:16PM +0100, Jakub Glapa wrote:
> > I'm seeing dsa_allocate on two different servers.
> > One is virtualized with VMWare the other is bare metal.
>
> Thanks.  So it's not limited to vmware or VM at all.
>
> FYI here we've seen DSA errors on (and only on) two vmware VMs.
>
> It might be interesting to have CPU info, too.
>
> For us the affected servers are:
>
> Intel(R) Xeon(R) CPU E5-2470 0 @ 2.30GHz stepping 02
> microcode: CPU0 sig=0x206d2, pf=0x1, revision=0xb2e
>
> Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz stepping 02
> microcode: CPU0 sig=0x206d2, pf=0x1, revision=0x710
>


Re: pg11.1: dsa_area could not attach to segment

2019-02-06 Thread Jakub Glapa
Hi Justin
I'm seeing dsa_allocate on two different servers.
One is virtualized with VMWare the other is bare metal.

ubuntu@db1:~$ grep dsa_allocate /var/log/postgresql/postgresql-11-main.log
2019-02-03 17:03:03 CET:192.168.10.83(48336):foo@bar:[27979]: FATAL:
dsa_allocate could not find 7 free pages
2019-02-05 17:05:12 CET:192.168.10.83(38138):foo@bar:[2725]: FATAL:
dsa_allocate could not find 49 free pages
2019-02-06 09:04:18 CET::@:[22120]: FATAL:  dsa_allocate could not find 13
free pages
2019-02-06 09:04:18 CET:192.168.10.83(55740):foo@bar:[21725]: ERROR:
dsa_allocate could not find 13 free pages
ubuntu@db1:~$ sudo dmidecode -s system-product-name
VMware Virtual Platform

--
ubuntu@db2:~$ grep dsa_allocate /var/log/postgresql/postgresql-11-main2.log
2019-02-03 07:45:45 CET::@:[28592]: FATAL:  dsa_allocate could not find 25
free pages
2019-02-03 07:45:45 CET:127.0.0.1(41920):foo1@bar:[27320]: ERROR:
dsa_allocate could not find 25 free pages
2019-02-03 07:46:03 CET:127.0.0.1(41920):foo1@bar:[27320]: FATAL:
dsa_allocate could not find 25 free pages
2019-02-04 11:56:28 CET::@:[31713]: FATAL:  dsa_allocate could not find 7
free pages
2019-02-04 11:56:28 CET:127.0.0.1(41950):foo1@bar:[30465]: ERROR:
dsa_allocate could not find 7 free pages
2019-02-04 11:57:59 CET::@:[31899]: FATAL:  dsa_allocate could not find 7
free pages
2019-02-04 11:57:59 CET:127.0.0.1(44096):foo1@bar:[31839]: ERROR:
dsa_allocate could not find 7 free pages
ubuntu@db2:~$ sudo dmidecode -s system-product-name
ProLiant DL380 Gen9






--
regards,
pozdrawiam,
Jakub Glapa


On Wed, Feb 6, 2019 at 6:19 PM Justin Pryzby  wrote:

> On Wed, Feb 06, 2019 at 04:22:12PM +1100, Thomas Munro wrote:
> > Can anyone else who has hit this comment on any virtualisation they
> > might be using?
>
> I don't think most of these people are on -hackers (one of the original
> reports
> was on -performance) so I'm copying them now.
>
> Could you let us know which dsa_* error you were seeing, whether or not you
> were running postgres under virtual environment, and (if so) which VM
> hypervisor?
>
> Thanks,
> Justin
>
>
> https://www.postgresql.org/message-id/flat/CAMAYy4%2Bw3NTBM5JLWFi8twhWK4%3Dk_5L4nV5%2BbYDSPu8r4b97Zg%40mail.gmail.com
>
> https://www.postgresql.org/message-id/flat/CAEepm%3D0aPq2yEy39gEqVK2m_Qi6jJdy96ysHGJ6VSHOZFz%2Bxbg%40mail.gmail.com#e02bee0220b422fe91a3383916107504
>
> https://www.postgresql.org/message-id/20181231221734.GB25379%40telsasoft.com
>


Re: dsa_allocate() faliure

2018-11-30 Thread Jakub Glapa
Hi, just a small update.
I've configured the OS for taking crash dumps on Ubuntu 16.04 with the
following (maybe somebody will find it helpful):
I've added LimitCORE=infinity to /lib/systemd/system/postgresql@.service
under [Service] section
I've reloaded the service config with sudo systemctl daemon-reload
Changed the core pattern to: sudo echo
/var/lib/postgresql/core.%p.sig%s.%ts | tee -a /proc/sys/kernel/core_pattern
I had tested it with kill -ABRT pidofbackend and it behaved correctly. A
crash dump was written.

In the last days I've been monitoring no segfault occurred but the
das_allocation did.
I'm starting to doubt if the segfault I've found in dmesg was actually
related.

I've grepped the postgres log for dsa_allocated:
Why do the messages occur sometimes as FATAL and sometimes as ERROR?

2018-11-29 07:59:06 CET::@:[20584]: FATAL:  dsa_allocate could not find 7
free pages
2018-11-29 07:59:06 CET:127.0.0.1(40846):user@db:[19507]: ERROR:
dsa_allocate could not find 7 free pages
2018-11-30 09:04:13 CET::@:[27341]: FATAL:  dsa_allocate could not find 13
free pages
2018-11-30 09:04:13 CET:127.0.0.1(41782):user@db:[25417]: ERROR:
dsa_allocate could not find 13 free pages
2018-11-30 09:28:38 CET::@:[30215]: FATAL:  dsa_allocate could not find 4
free pages
2018-11-30 09:28:38 CET:127.0.0.1(45980):user@db:[29924]: ERROR:
dsa_allocate could not find 4 free pages
2018-11-30 16:37:16 CET::@:[14385]: FATAL:  dsa_allocate could not find 7
free pages
2018-11-30 16:37:16 CET::@:[14375]: FATAL:  dsa_allocate could not find 7
free pages
2018-11-30 16:37:16 CET:212.186.105.45(55004):user@db:[14386]: FATAL:
dsa_allocate could not find 7 free pages
2018-11-30 16:37:16 CET:212.186.105.45(54964):user@db:[14379]: ERROR:
dsa_allocate could not find 7 free pages
2018-11-30 16:37:16 CET:212.186.105.45(54916):user@db:[14370]: ERROR:
dsa_allocate could not find 7 free pages
2018-11-30 16:45:11 CET:212.186.105.45(55356):user@db:[14555]: FATAL:
dsa_allocate could not find 7 free pages
2018-11-30 16:49:13 CET::@:[15359]: FATAL:  dsa_allocate could not find 7
free pages
2018-11-30 16:49:13 CET::@:[15363]: FATAL:  dsa_allocate could not find 7
free pages
2018-11-30 16:49:13 CET:212.186.105.45(54964):user@db:[14379]: FATAL:
dsa_allocate could not find 7 free pages
2018-11-30 16:49:13 CET:212.186.105.45(54916):user@db:[14370]: ERROR:
dsa_allocate could not find 7 free pages
2018-11-30 16:49:13 CET:212.186.105.45(55842):user@db:[14815]: ERROR:
dsa_allocate could not find 7 free pages
2018-11-30 16:56:11 CET:212.186.105.45(57076):user@db:[15638]: FATAL:
dsa_allocate could not find 7 free pages


There's quite a bit errors from today but I was launching the problematic
query in parallel from 2-3 sessions.
Sometimes it was breaking sometimes not.
Couldn't find any pattern.
The workload on this db is not really constant, rather bursting.

--
regards,
Jakub Glapa


On Tue, Nov 27, 2018 at 9:03 AM Thomas Munro 
wrote:

> On Tue, Nov 27, 2018 at 4:00 PM Thomas Munro
>  wrote:
> > Hmm.  I will see if I can come up with a many-partition torture test
> > reproducer for this.
>
> No luck.  I suppose one theory that could link both failure modes
> would a buffer overrun, where in the non-shared case it trashes a
> pointer that is later dereferenced, and in the shared case it writes
> past the end of allocated 4KB pages and corrupts the intrusive btree
> that lives in spare pages to track available space.
>
> --
> Thomas Munro
> http://www.enterprisedb.com
>