pg_dump from v13 is slow

2020-08-13 Thread Justin Pryzby
I ran into this while running pg_upgrade from beta2 to beta3:
$ time sudo -u postgres sh -c 'cd /var/lib/pgsql; /usr/pgsql-13/bin/pg_upgrade -b /usr/pgsql-13b2/bin/ -d ./13b2/data -D ./13/data --link'
real    94m18.335s

This instance has many table partitions, and the production instance uses
tablespaces.  Some of our tables are wide.  The VM is not idle, but that does
not account for it being 20x slower.

pg_dump -v --section=pre-data ts |wc
1846659 4697507 59575253
real    39m8.524s

Compare v12 and v13:

|$ /usr/pgsql-12/bin/initdb -D 12
|$ /usr/pgsql-12/bin/postgres -D 12 -c shared_buffers=256MB -c max_locks_per_transaction=128 -c port=5678 -c unix_socket_directories=/tmp &
|$ psql -h /tmp -p 5678 postgres
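
A sketch of the matching v13 side of the comparison, assuming the same flags
(the exact commands aren't quoted above, and the port here is arbitrary; the
timings further down use port 5612 for the v12 server and 5613 for the v13
server):

$ /usr/pgsql-13/bin/initdb -D 13
$ /usr/pgsql-13/bin/postgres -D 13 -c shared_buffers=256MB -c max_locks_per_transaction=128 -c port=5679 -c unix_socket_directories=/tmp &
$ psql -h /tmp -p 5679 postgres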

Re: pg_dump from v13 is slow

2020-08-13 Thread Alvaro Herrera
On 2020-Aug-13, Justin Pryzby wrote:

> I'm trying to narrow this down, but I'd be very happy for suggestions.

Maybe you can time "pg_dump --binary-upgrade" to see if the slowness
comes from there.
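
One way to do that, as a sketch with placeholder connection parameters
(pg_upgrade itself runs pg_dump with --schema-only --binary-upgrade, so this
isolates the dump step from the rest of the upgrade):

$ time /usr/pgsql-13/bin/pg_dump --binary-upgrade --schema-only -h /tmp -p 5613 pryzbyj >/dev/null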

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: pg_dump from v13 is slow

2020-08-13 Thread Andres Freund
Hi,

On 2020-08-13 17:48:23 -0500, Justin Pryzby wrote:
> I'm trying to narrow this down, but I'd be very happy for suggestions.

Would be worth knowing how much of the time pgbench is 100% CPU
utilized, and how much of the time it is basically waiting for server
side queries and largely idle.

If it's close to 100% busy for a significant part of that time, it'd be
useful to get a perf profile. If it's largely queries to the server that
are the issue, logging those would be relevant.
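
A sketch of how one might check both; the pgrep pattern is illustrative and
the port/database match the commands used elsewhere in the thread:

# how busy is the client process?
$ top -b -d 1 -p "$(pgrep -f 'pg_dump --section=pre-data')"

# if it is CPU-bound, grab a profile
$ perf record -g -p "$(pgrep -f 'pg_dump --section=pre-data')" -- sleep 30
$ perf report

# if it is mostly waiting, log every server-side query with its duration
$ psql -h /tmp -p 5613 postgres -c "ALTER SYSTEM SET log_min_duration_statement = 0" -c "SELECT pg_reload_conf()"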

Greetings,

Andres Freund




Re: pg_dump from v13 is slow

2020-08-13 Thread Justin Pryzby
I can reproduce the issue with generated data:

pryzbyj=# SELECT format('create table t%s(i int)', i) FROM generate_series(1,)i;
\set ECHO errors
\set QUIET
\gexec
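
If replaying this from scratch, a minimal self-contained form of the same
repro (the table count above is elided; the 10000 here is purely an arbitrary
stand-in, chosen only to make the pre-data section large):

pryzbyj=# SELECT format('create table t%s(i int)', i) FROM generate_series(1,10000) i
pryzbyj-# \gexec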

$ time pg_dump --section=pre-data -d pryzbyj -h /tmp -p 5613 |wc   
 110015  240049 1577087
real    0m50.445s

$ time pg_dump --section=pre-data -d pryzbyj -h /tmp -p 5612 |wc
 110015  240049 1577084
real    0m11.203s

On Thu, Aug 13, 2020 at 04:30:14PM -0700, Andres Freund wrote:
> Would be worth knowing how much of the time pgbench is 100% CPU
> utilized, and how much of the time it is basically waiting for server
> side queries and largely idle.

Good question - I guess you mean pg_dump.

$ command time -v pg_dump --section=pre-data -d pryzbyj -h /tmp -p 5612 |wc   
Command being timed: "pg_dump --section=pre-data -d pryzbyj -h /tmp -p 5612"
User time (seconds): 0.65
System time (seconds): 0.52
Percent of CPU this job got: 9%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:11.85

$ command time -v pg_dump --section=pre-data -d pryzbyj -h /tmp -p 5613 |wc
Command being timed: "pg_dump --section=pre-data -d pryzbyj -h /tmp -p 5613"
User time (seconds): 0.79
System time (seconds): 0.49
Percent of CPU this job got: 2%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:48.51

So v13 was 4.5x slower and it seems to be all on the server side.

I looked at the queries like this:
$ time strace -ts999 -e sendto pg_dump --section=pre-data -d pryzbyj -h /tmp -p 5613 2>strace-13-3 |wc
$ cut -c1-66 strace-13-3 |sort |uniq |less
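
To put rough numbers on that, one can sum the gap following each sendto,
keyed by a prefix of the message (a sketch; it assumes strace's HH:MM:SS
timestamps as above and a run that doesn't cross midnight):

$ awk '/ sendto\(/ {
        split($1, t, ":"); now = t[1]*3600 + t[2]*60 + t[3]
        if (prev != "") gap[q] += now - prev      # time spent after the previous query
        prev = now; q = substr($0, 10, 66)        # key on the query prefix
  } END { for (k in gap) printf "%6ds  %s\n", gap[k], k }' strace-13-3 | sort -rn | head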

Most of the time is spent on these three queries:

|12:58:11 sendto(3, "Q\0\0\3\215SELECT\na.attnum,\na.attname,\na.atttypmod,\na.attstattarget,\na.attstorage,\nt.typstorage,\na.attn
|...
|12:58:30 sendto(3, "Q\0\0\3\215SELECT\na.attnum,\na.attname,\na.atttypmod,\na.attstattarget,\na.attstorage,\nt.typstorage,\na.attn

|12:58:32 sendto(3, "Q\0\0\1\314SELECT oid, tableoid, pol.polname, pol.polcmd, pol.polpermissive, CASE WHEN pol.polroles = '{0}' TH
|...
|12:58:47 sendto(3, "Q\0\0\1\314SELECT oid, tableoid, pol.polname, pol.polcmd, pol.polpermissive, CASE WHEN pol.polroles = '{0}' TH

|12:58:49 sendto(3, "Q\0\0\0\213SELECT pr.tableoid, pr.oid, p.pubname FROM pg_publication_rel pr, pg_publication p WHERE pr.prrelid
|...
|12:59:01 sendto(3, "Q\0\0\0\213SELECT pr.tableoid, pr.oid, p.pubname FROM pg_publication_rel pr, pg_publication p WHERE pr.prrelid

Compare with v12:

|12:57:58 sendto(3, "Q\0\0\3\215SELECT\na.attnum,\na.attname,\na.atttypmod,\na.attstattarget,\na.attstorage,\nt.typstorage,\na.attn
|...
|12:58:03 sendto(3, "Q\0\0\3\215SELECT\na.attnum,\na.attname,\na.atttypmod,\na.attstattarget,\na.attstorage,\nt.typstorage,\na.attn

|12:58:05 sendto(3, "Q\0\0\1\314SELECT oid, tableoid, pol.polname, pol.polcmd, pol.polpermissive, CASE WHEN pol.polroles = '{0}' TH
|...
|12:58:07 sendto(3, "Q\0\0\1\314SELECT oid, tableoid, pol.polname, pol.polcmd, pol.polpermissive, CASE WHEN pol.polroles = '{0}' TH

|12:58:09 sendto(3, "Q\0\0\0\213SELECT pr.tableoid, pr.oid, p.pubname FROM pg_publication_rel pr, pg_publication p WHERE pr.prrelid
|...
|12:58:11 sendto(3, "Q\0\0\0\213SELECT pr.tableoid, pr.oid, p.pubname FROM pg_publication_rel pr, pg_publication p WHERE pr.prrelid

The first query regressed the worst.

$ psql -h /tmp -Ap 5612 pryzbyj
psql (13beta3, server 12.4)
pryzbyj=# explain analyze
SELECT a.attnum, a.attname, a.atttypmod, a.attstattarget, a.attstorage, t.typstorage,
  a.attnotnull, a.atthasdef, a.attisdropped, a.attlen, a.attalign, a.attislocal,
  pg_catalog.format_type(t.oid, a.atttypmod) AS atttypname,
  a.attgenerated,
  CASE WHEN a.atthasmissing AND NOT a.attisdropped THEN a.attmissingval ELSE null END AS attmissingval,
  a.attidentity,
  pg_catalog.array_to_string(ARRAY(SELECT pg_catalog.quote_ident(option_name) || ' ' || pg_catalog.quote_literal(option_value) FROM pg_catalog.pg_options_to_table(attfdwoptions) ORDER BY option_name), E',') AS attfdwoptions,
  CASE WHEN a.attcollation <> t.typcollation THEN a.attcollation ELSE 0 END AS attcollation,
  array_to_string(a.attoptions, ', ') AS attoptions
FROM pg_catalog.pg_attribute a
LEFT JOIN pg_catalog.pg_type t ON a.atttypid = t.oid
WHERE a.attrelid = '191444'::pg_catalog.oid AND a.attnum > 0::pg_catalog.int2
ORDER BY a.attnum;
QUERY PLAN
Nested Loop Left Join  (cost=0.58..16.72 rows=1 width=217) (actual time=0.205..0.209 rows=1 loops=1)
  ->  Index Scan using pg_attribute_relid_attnum_index on pg_attribute a  (cost=0.29..8.31 rows=1 width=189) (actual time=0.030..0.032 rows=1 loops=1)
        Index Cond: ((attrelid = '191444'::oid) AND (attnum > '0'::smallint))
  ->  Index Scan using pg_type_oid_index on pg_type t  (cost=0.29..8.30 rows=1 width=9) (actual time=0.011..0.011 rows=1 loops=1)
        Index Cond: (oid = a.atttypid)
  SubPlan 1
    ->  Sort  (cost=0.09..0.09 rows=3 width=64) (actual time=0.119..0.119 rows=0 loops=1)
          Sort Key: pg_options_to_table.option_name
          Sort Method: quick

Re: pg_dump from v13 is slow

2020-08-13 Thread Alvaro Herrera
Hmm, I wonder if you're comparing an assert-enabled pg13 build to a
non-assert-enabled pg12 build, or something like that.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: pg_dump from v13 is slow

2020-08-13 Thread Justin Pryzby
On Thu, Aug 13, 2020 at 08:53:46PM -0400, Alvaro Herrera wrote:
> Hmm, I wonder if you're comparing an assert-enabled pg13 build to a
> non-assert-enabled pg12 build, or something like that.

Great question - I thought of it myself but then forgot to look..

$ rpm -q postgresql1{2,3}-server
postgresql12-server-12.4-1PGDG.rhel7.x86_64
postgresql13-server-13-beta3_1PGDG.rhel7.x86_64

$ /usr/pgsql-12/bin/pg_config |grep -o cassert || echo not found
not found
$ /usr/pgsql-13/bin/pg_config |grep -o cassert || echo not found
cassert

It looks like the beta packages are compiled with cassert, which makes sense.
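
A runtime cross-check that doesn't depend on packaging (a sketch, reusing the
test ports from earlier): an assert-enabled server reports it via the
read-only debug_assertions setting.

$ psql -h /tmp -p 5613 -Atc 'SHOW debug_assertions' postgres
$ psql -h /tmp -p 5612 -Atc 'SHOW debug_assertions' postgres

This should print "on" for the assert-enabled 13beta3 build and "off" for the
v12 build.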

Thanks, and sorry for the noise.

-- 
Justin




Re: pg_dump from v13 is slow

2020-08-13 Thread Tom Lane
Justin Pryzby  writes:
> I can reproduce the issue with generated data:
> pryzbyj=# SELECT format('create table t%s(i int)', i) FROM generate_series(1,)i;

Hm, I tried this case and didn't really detect much runtime difference
between v12 and HEAD.

> I don't know if it's any issue, but I found that pg12 can process "null statements" almost 2x as fast:

Now I'm suspicious that you're comparing an assert-enabled v13 build
to a non-assert-enabled v12 build.  Check the output of
"pg_config --configure" from each installation to see if they're
configured alike.
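
For instance (a sketch using the package paths quoted earlier in the thread):

$ /usr/pgsql-12/bin/pg_config --configure | grep -o -- --enable-cassert
$ /usr/pgsql-13/bin/pg_config --configure | grep -o -- --enable-cassert

Only an assert-enabled build prints --enable-cassert; diffing the two full
outputs also catches any other configure-option differences.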

regards, tom lane