Re: [PERFORM] Query Plan Performance on Partitioned Table
I tried to add index on partition key and it didn't help. we have autovacuum running. The updates and inserts are very frequent on these tables. The server kernel version is 3.5.0-22-generic. It has 376G memory. max_connections = 2500# (change requires restart) shared_buffers = 32GB# min 128kB work_mem = 8MB# min 64kB maintenance_work_mem = 20GB# min 1MB We usually have around 400 active connections on the db. Most of them are idle. There are about 100 connections are in active status and I can see most of the time they are in 'BIND' status in ps command. We have heavy IO load on the disk of the default tablespace where I believe table statistics tables are in. Will that impact the query planing greatly? 2015-08-12 15:00 GMT+08:00 Pietro Pugni pietro.pu...@gmail.com: You can give it a try only on that partition just to see if your query plan gets better. I prefer defining partitioning over ranging attributes like, for example: cid between 123 and 456. It makes more sense, especially when there are attributes which value strictly depends on the check attribute. Btw, dozens of millions is not a problem on modern systems. I remember of reading about a recommended 20 millions per partition but I usually work with 60 millions per partition without any problem. Do you autovacuum? How frequently do the updates and insert operations occur? Give us your configuration about work_mem, shared_buffers, max_connections etc. Kernel version? If possible avoid 3.2 and 3.8-3.13. Also think to upgrade your OS version. From today I'm on vacancy, so others could help :) Pietro Pugni Il 12/ago/2015 03:49, Rural Hunter ruralhun...@gmail.com ha scritto: article_729 has about 0.8 million rows. The rows of the children tables are variance from several thousands to dozens of millions. How can it help to create index on the partition key? 2015-08-12 1:03 GMT+08:00 Pietro Pugni pietro.pu...@gmail.com: Hi Rural Hunter, Try to create an index on cid attribute. How many rows has article_729? Pietro Pugni Il 11/ago/2015 16:51, Rural Hunter ruralhun...@gmail.com ha scritto: yes i'm very sure. from what i observed, it has something to do with the concurrent query planing. if i disconnect other connections, the plan is very quick. 2015-08-11 22:42 GMT+08:00 Maxim Boguk maxim.bo...@gmail.com: Check constraints: article_729_cid_check CHECK (cid = 729) Used partition schema looks very simple and straightforward, and should have no issues with 80 partitions. Are you sure that you have only 80 partitions but not (lets say) 800? Are every other partition of the article table use the same general idea of partition check (cid=something)? Maxim Boguk Senior Postgresql DBA http://www.postgresql-consulting.ru/ http://www.postgresql-consulting.com/ Phone RU: +7 910 405 4718 Phone AU: +61 45 218 5678 LinkedIn: http://www.linkedin.com/pub/maksym-boguk/80/b99/b1b Skype: maxim.boguk Jabber: maxim.bo...@gmail.com МойКруг: http://mboguk.moikrug.ru/ People problems are solved with people. If people cannot solve the problem, try technology. People will then wish they'd listened at the first stage.
[PERFORM] Query Plan Performance on Partitioned Table
Platform: pg 9.2.9 on Ubuntu 12.04.4 LTS. I have a table which is partitioned to about 80 children. There are usualy several dozens of connections accessing these tables concurrently. I found sometimes the query planing time is very long if I query against the parent table with partition key. The connections are shown with status 'BIND' by ps command. In normal condition, the plan time of the query is about several hundred of million seconds while the same query accessing child table directly is less than 1 million seconds: # explain select 1 from article where cid=729 and url_hash='6851f596f55a994b2df417b53523fe45'; QUERY PLAN Result (cost=0.00..8.68 rows=2 width=0) - Append (cost=0.00..8.68 rows=2 width=0) - Seq Scan on article (cost=0.00..0.00 rows=1 width=0) Filter: ((cid = 729) AND (url_hash = '6851f596f55a994b2df417b53523fe45'::bpchar)) - Index Scan using article_729_url_hash on article_729 article (cost=0.00..8.68 rows=1 width=0) Index Cond: (url_hash = '6851f596f55a994b2df417b53523fe45'::bpchar) Filter: (cid = 729) (7 rows) Time: 361.401 ms # explain select 1 from article_729 where url_hash='6851f596f55a994b2df417b53523fe45'; QUERY PLAN - Index Only Scan using article_729_url_hash on article_729 (cost=0.00..8.67 rows=1 width=0) Index Cond: (url_hash = '6851f596f55a994b2df417b53523fe45'::bpchar) (2 rows) Time: 0.898 ms This is only in normal condition. In extreme condition, the planing time could take several minutes. There seems some locking issue in query planing. How can I increase the plan performance? Or is it bad to partition table to 80 children in PostgreSQL?
Re: [PERFORM] Query Plan Performance on Partitioned Table
# \d article_729 Table public.article_729 Column|Type | Modifiers --+-+--- aid | bigint | not null default nextval('article_aid_seq'::regclass) style| smallint| not null default 0 oaid | bigint | default 0 fid | integer | bid | integer | default 0 cid | integer | tid | integer | url | text| default NULL::bpchar tm_post | timestamp without time zone | tm_last_rply | timestamp without time zone | author | character varying(100) | default NULL::bpchar title| character varying(255) | default NULL::bpchar content | text| ab_content | text| rply_cnt | integer | read_cnt | integer | url_hash | character(32) | not null hash_plain | text| default NULL::bpchar title_hash | character(32) | default NULL::bpchar guid | character(32) | default NULL::bpchar neg_pos | smallint| not null default 0 match_code | character(32) | default NULL::bpchar tm_spider| timestamp without time zone | tm_update| timestamp without time zone | stage| smallint| not null default 0 rply_cut | integer | not null default 0 read_cut | integer | not null default 0 src | integer | default 0 rfid | integer | labels | integer[] | kwds | integer[] | like_cnt | integer | Indexes: article_729_pkey PRIMARY KEY, btree (aid), tablespace indextbs article_729_url_hash UNIQUE CONSTRAINT, btree (url_hash), tablespace indextbs article_729_bid_titlehash_idx btree (bid, title_hash), tablespace indextbs article_729_fid_idx btree (fid), tablespace indextbs article_729_guid_idx btree (guid), tablespace indextbs article_729_labels_idx gin (labels), tablespace data1tbs article_729_mtcode_idx btree (match_code), tablespace indextbs article_729_rfid_author_idx btree (rfid, author), tablespace indextbs article_729_stage_idx btree (stage), tablespace data1tbs article_729_time_style_idx btree (tm_post DESC, style), tablespace data1tbs article_729_tm_spider_idx btree (tm_spider), tablespace indextbs article_729_tm_update_idx btree (tm_update), tablespace data1tbs Check constraints: article_729_cid_check CHECK (cid = 729) Foreign-key constraints: article_729_cid_fk FOREIGN KEY (cid) REFERENCES company(cid) ON DELETE CASCADE Triggers: trg_article_729_delete AFTER DELETE ON article_729 FOR EACH ROW EXECUTE PROCEDURE fn_article_delete() trg_article_729_insert AFTER INSERT ON article_729 FOR EACH ROW EXECUTE PROCEDURE fn_article_insert() trg_article_729_update AFTER UPDATE ON article_729 FOR EACH ROW EXECUTE PROCEDURE fn_article_update() Inherits: article 2015-08-11 21:53 GMT+08:00 Maxim Boguk maxim.bo...@gmail.com: On Tue, Aug 11, 2015 at 11:44 PM, Rural Hunter ruralhun...@gmail.com wrote: # \dt+ article_729 List of relations Schema |Name | Type | Owner | Size | Description +-+---+++- public | article_729 | table | omuser1 | 655 MB | (1 row) The problem exists on not only this specific child table, but with all of them. Oops sorry, оf course I mean \d+ article_729 (to see criteria used for partitioning). -- Maxim Boguk Senior Postgresql DBA http://www.postgresql-consulting.ru/ http://www.postgresql-consulting.com/
Re: [PERFORM] Query Plan Performance on Partitioned Table
), tablespace data1tbs Check constraints: article_729_cid_check CHECK (cid = 729) Foreign-key constraints: article_729_cid_fk FOREIGN KEY (cid) REFERENCES company(cid) ON DELETE CASCADE Triggers: trg_article_729_delete AFTER DELETE ON article_729 FOR EACH ROW EXECUTE PROCEDURE fn_article_delete() trg_article_729_insert AFTER INSERT ON article_729 FOR EACH ROW EXECUTE PROCEDURE fn_article_insert() trg_article_729_update AFTER UPDATE ON article_729 FOR EACH ROW EXECUTE PROCEDURE fn_article_update() Inherits: article Has OIDs: no 2015-08-11 22:00 GMT+08:00 Rural Hunter ruralhun...@gmail.com: # \d article_729 Table public.article_729 Column|Type | Modifiers --+-+--- aid | bigint | not null default nextval('article_aid_seq'::regclass) style| smallint| not null default 0 oaid | bigint | default 0 fid | integer | bid | integer | default 0 cid | integer | tid | integer | url | text| default NULL::bpchar tm_post | timestamp without time zone | tm_last_rply | timestamp without time zone | author | character varying(100) | default NULL::bpchar title| character varying(255) | default NULL::bpchar content | text| ab_content | text| rply_cnt | integer | read_cnt | integer | url_hash | character(32) | not null hash_plain | text| default NULL::bpchar title_hash | character(32) | default NULL::bpchar guid | character(32) | default NULL::bpchar neg_pos | smallint| not null default 0 match_code | character(32) | default NULL::bpchar tm_spider| timestamp without time zone | tm_update| timestamp without time zone | stage| smallint| not null default 0 rply_cut | integer | not null default 0 read_cut | integer | not null default 0 src | integer | default 0 rfid | integer | labels | integer[] | kwds | integer[] | like_cnt | integer | Indexes: article_729_pkey PRIMARY KEY, btree (aid), tablespace indextbs article_729_url_hash UNIQUE CONSTRAINT, btree (url_hash), tablespace indextbs article_729_bid_titlehash_idx btree (bid, title_hash), tablespace indextbs article_729_fid_idx btree (fid), tablespace indextbs article_729_guid_idx btree (guid), tablespace indextbs article_729_labels_idx gin (labels), tablespace data1tbs article_729_mtcode_idx btree (match_code), tablespace indextbs article_729_rfid_author_idx btree (rfid, author), tablespace indextbs article_729_stage_idx btree (stage), tablespace data1tbs article_729_time_style_idx btree (tm_post DESC, style), tablespace data1tbs article_729_tm_spider_idx btree (tm_spider), tablespace indextbs article_729_tm_update_idx btree (tm_update), tablespace data1tbs Check constraints: article_729_cid_check CHECK (cid = 729) Foreign-key constraints: article_729_cid_fk FOREIGN KEY (cid) REFERENCES company(cid) ON DELETE CASCADE Triggers: trg_article_729_delete AFTER DELETE ON article_729 FOR EACH ROW EXECUTE PROCEDURE fn_article_delete() trg_article_729_insert AFTER INSERT ON article_729 FOR EACH ROW EXECUTE PROCEDURE fn_article_insert() trg_article_729_update AFTER UPDATE ON article_729 FOR EACH ROW EXECUTE PROCEDURE fn_article_update() Inherits: article 2015-08-11 21:53 GMT+08:00 Maxim Boguk maxim.bo...@gmail.com: On Tue, Aug 11, 2015 at 11:44 PM, Rural Hunter ruralhun...@gmail.com wrote: # \dt+ article_729 List of relations Schema |Name | Type | Owner | Size | Description +-+---+++- public | article_729 | table | omuser1 | 655 MB | (1 row) The problem exists on not only this specific child table, but with all of them. Oops sorry, оf course I mean \d+ article_729 (to see criteria used for partitioning). -- Maxim Boguk Senior Postgresql DBA http://www.postgresql-consulting.ru/ http://www.postgresql-consulting.com/
Re: [PERFORM] Query Plan Performance on Partitioned Table
yes i'm very sure. from what i observed, it has something to do with the concurrent query planing. if i disconnect other connections, the plan is very quick. 2015-08-11 22:42 GMT+08:00 Maxim Boguk maxim.bo...@gmail.com: Check constraints: article_729_cid_check CHECK (cid = 729) Used partition schema looks very simple and straightforward, and should have no issues with 80 partitions. Are you sure that you have only 80 partitions but not (lets say) 800? Are every other partition of the article table use the same general idea of partition check (cid=something)? Maxim Boguk Senior Postgresql DBA http://www.postgresql-consulting.ru/ http://www.postgresql-consulting.com/ Phone RU: +7 910 405 4718 Phone AU: +61 45 218 5678 LinkedIn: http://www.linkedin.com/pub/maksym-boguk/80/b99/b1b Skype: maxim.boguk Jabber: maxim.bo...@gmail.com МойКруг: http://mboguk.moikrug.ru/ People problems are solved with people. If people cannot solve the problem, try technology. People will then wish they'd listened at the first stage.
Re: [PERFORM] Query Plan Performance on Partitioned Table
# \dt+ article_729 List of relations Schema |Name | Type | Owner | Size | Description +-+---+++- public | article_729 | table | omuser1 | 655 MB | (1 row) The problem exists on not only this specific child table, but with all of them. 2015-08-11 19:43 GMT+08:00 Maxim Boguk maxim.bo...@gmail.com: On Tue, Aug 11, 2015 at 6:46 PM, Rural Hunter ruralhun...@gmail.com wrote: Platform: pg 9.2.9 on Ubuntu 12.04.4 LTS. I have a table which is partitioned to about 80 children. There are usualy several dozens of connections accessing these tables concurrently. I found sometimes the query planing time is very long if I query against the parent table with partition key. The connections are shown with status 'BIND' by ps command. In normal condition, the plan time of the query is about several hundred of million seconds while the same query accessing child table directly is less than 1 million seconds: # explain select 1 from article where cid=729 and url_hash='6851f596f55a994b2df417b53523fe45'; QUERY PLAN Result (cost=0.00..8.68 rows=2 width=0) - Append (cost=0.00..8.68 rows=2 width=0) - Seq Scan on article (cost=0.00..0.00 rows=1 width=0) Filter: ((cid = 729) AND (url_hash = '6851f596f55a994b2df417b53523fe45'::bpchar)) - Index Scan using article_729_url_hash on article_729 article (cost=0.00..8.68 rows=1 width=0) Index Cond: (url_hash = '6851f596f55a994b2df417b53523fe45'::bpchar) Filter: (cid = 729) (7 rows) Time: 361.401 ms # explain select 1 from article_729 where url_hash='6851f596f55a994b2df417b53523fe45'; QUERY PLAN - Index Only Scan using article_729_url_hash on article_729 (cost=0.00..8.67 rows=1 width=0) Index Cond: (url_hash = '6851f596f55a994b2df417b53523fe45'::bpchar) (2 rows) Time: 0.898 ms This is only in normal condition. In extreme condition, the planing time could take several minutes. There seems some locking issue in query planing. How can I increase the plan performance? Or is it bad to partition table to 80 children in PostgreSQL? Hi, Could you provide full definition of article_729 table (\dt+ article_729)? 80 partitions is adequate amount of partitions for the PostgreSQL, so there are going something unusual (I suspect it may be related to used partitioning schema). -- Maxim Boguk Senior Postgresql DBA http://www.postgresql-consulting.ru/ http://www.postgresql-consulting.com/
Re: [PERFORM] Query Plan Performance on Partitioned Table
article_729 has about 0.8 million rows. The rows of the children tables are variance from several thousands to dozens of millions. How can it help to create index on the partition key? 2015-08-12 1:03 GMT+08:00 Pietro Pugni pietro.pu...@gmail.com: Hi Rural Hunter, Try to create an index on cid attribute. How many rows has article_729? Pietro Pugni Il 11/ago/2015 16:51, Rural Hunter ruralhun...@gmail.com ha scritto: yes i'm very sure. from what i observed, it has something to do with the concurrent query planing. if i disconnect other connections, the plan is very quick. 2015-08-11 22:42 GMT+08:00 Maxim Boguk maxim.bo...@gmail.com: Check constraints: article_729_cid_check CHECK (cid = 729) Used partition schema looks very simple and straightforward, and should have no issues with 80 partitions. Are you sure that you have only 80 partitions but not (lets say) 800? Are every other partition of the article table use the same general idea of partition check (cid=something)? Maxim Boguk Senior Postgresql DBA http://www.postgresql-consulting.ru/ http://www.postgresql-consulting.com/ Phone RU: +7 910 405 4718 Phone AU: +61 45 218 5678 LinkedIn: http://www.linkedin.com/pub/maksym-boguk/80/b99/b1b Skype: maxim.boguk Jabber: maxim.bo...@gmail.com МойКруг: http://mboguk.moikrug.ru/ People problems are solved with people. If people cannot solve the problem, try technology. People will then wish they'd listened at the first stage.
Re: [PERFORM] MusicBrainz postgres performance issues
pls check this if it helps: http://ubuntuforums.org/showthread.php?t=2258734 在 2015/3/15 18:54, Robert Kaye 写道: Hi! We at MusicBrainz have been having trouble with our Postgres install for the past few days. I’ve collected all the relevant information here: http://blog.musicbrainz.org/2015/03/15/postgres-troubles/ If anyone could provide tips, suggestions or other relevant advice for what to poke at next, we would love it. Thanks! -- --ruaok Robert Kaye -- r...@musicbrainz.org -- http://musicbrainz.org
Re: [PERFORM] Very slow planning performance on partition table
Hi Tom, Could my problem be a victim of this issue? http://postgresql.1045698.n5.nabble.com/Planner-performance-extremely-affected-by-an-hanging-transaction-20-30-times-td5771686.html is the patch mentioned in that thread applied in 9.2.9? -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Very slow planning performance on partition table
在 2014/7/30 18:03, Rural Hunter 写道: I think I understand what happened now. I have another monitor script runs periodically and calls pg_cancel_backend and pg_terminate_backend for those hanging update sqls. However for some unkown reason the cancle and termiante command doesn't work at pgsql side for those update sqls. With the log of the monitorkill scipt, I can confirm that the CLOSE_WAIT is not caused by it. I logged the netstat before actually doing the kill and found the CLOSE_WAIT connections were already there. So it must be something else caused the CLOSE_WAIT connections. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Very slow planning performance on partition table
I think I understand what happened now. I have another monitor script runs periodically and calls pg_cancel_backend and pg_terminate_backend for those hanging update sqls. However for some unkown reason the cancle and termiante command doesn't work at pgsql side for those update sqls. But I think pgbouncer side was notified by cancel or terminate command. It then drops old connections and creates new ones while those old connections still hang at pgsql side. That's why the connection status shows CLOST_WAIT and there are more processes at pgsql side than pgbouncer defined . So the root cause is still at pgsql side. It shouldn't hang there. What the hanging process was doing is in my previous posts. There many same concurrent sql which updates a partitioned table witouth partition key specified in conditions. The gdb trace shows this: (gdb) bt #0 0x7f8cea310db7 in semop () from /lib/x86_64-linux-gnu/libc.so.6 #1 0x005f97d3 in PGSemaphoreLock () #2 0x00638153 in LWLockAcquire () #3 0x004a90d0 in ginTraverseLock () #4 0x004a9d0b in ginFindLeafPage () #5 0x004a8377 in ginInsertItemPointers () #6 0x004a4548 in ginEntryInsert () #7 0x004ae687 in ginInsertCleanup () #8 0x004af3d6 in ginHeapTupleFastInsert () #9 0x004a4ab1 in gininsert () #10 0x00709b15 in FunctionCall6Coll () #11 0x0047b6b7 in index_insert () #12 0x0057f475 in ExecInsertIndexTuples () #13 0x0058bf07 in ExecModifyTable () #14 0x005766e3 in ExecProcNode () #15 0x00575ad4 in standard_ExecutorRun () #16 0x0064718f in ProcessQuery () #17 0x006473b7 in PortalRunMulti () #18 0x00647e8a in PortalRun () #19 0x00645160 in PostgresMain () #20 0x0060459e in ServerLoop () #21 0x006053bc in PostmasterMain () #22 0x005a686b in main () (gdb) q It will just hangs there forever and finally blocks all other update sqls if I don't stop pgbouncer. When this happens, all the cpus will be utilized by those hanging processes and the server load is very very high. It keeps at serveral hundreds comparing with about 20 normally which causes the performance problem for all tasks on the server. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Very slow planning performance on partition table
More information found. After the hang connection appears, I noticed there were several hundreds of connections of the same user. Since I use pgbouncer and I only set the pool size to 50 for each user, this is very strange. I checked the pgbouncer side, 'show pools' showed the active server connection count is less than 50(only 35 actually). I also checked the client port which is shown in pg process list. It is not used at pgbouncer side when I did the check. So I stopped pgbouncer then the connection count from the user drops slowly. Finally all those connections disappeared. After that I restarted pgbouncer and it looks good again. With this solution, I at least don't have to kill pg when the problem happens. But anyone has a clue why this happens? What I need to check for the root cause? One thing I forgot to check is the network status of those orphan connections at pg side. I will check it next time and see if they are in abnormal status. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Very slow planning performance on partition table
在 2014/7/30 1:27, Jeff Janes 写道: It sounds like someone is bypassing your pgbouncer and connecting directly to your database. Maybe they tried to create their own parallelization and have a master connection going through pgbouncer and create many auxiliary connections that go directly to the database (probably because pgbouncer wouldn't let them create as many connections as they wanted through it). That would explain why the connections slowly drain away once pgbouncer is shut down. Can you change your pg_hba.conf file so that it only allows connections from pgbouncer's IP address? This should flush out the culprit pretty quickly. Cheers, Jeff I suspected that first. But after I checked a few things, I am quite sure this is not someone bypassing the pgbouncer. 1. The connections were all from the host of pgbouncer. 2. The id is an application id and no human has access to it. There was no other suspect applications running on the host of pgbouncer when the problem happened. 3. When I found the problem and checked the connections on the host of pgbouncer, those network connection actually didn't exist on the client side while they were still hanging at pg server side. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Very slow planning performance on partition table
This happened again. This time I got the connection status(between pgbouncer host to pgsql host) at postgresql side. When the problem happens, the connection status is this: ESTABLISHED: 188 CLOSE_WAIT: 116 The count of connections in CLOSE_WAIT is abnormal. Comparing with normal situation, there is usually no close_wait connection. The connection status sample is like this: ESTABLISHED: 117 CLOSE_WAIT: 0 I have 4 users configured in pgbouncer and the pool_size is 50. So the max number of connections from pgbouncer should be less than 200. The connection spike happens very quickly. I created a script to check the connections from pgbouncer. The script checks the connections from pgbouncer every 5 mins. This is the log: 10:55:01 CST pgbouncer is healthy. connection count: 73 11:00:02 CST pgbouncer is healthy. connection count: 77 11:05:01 CST pgbouncer is healthy. connection count: 118 11:10:01 CST pgbouncer is healthy. connection count: 115 11:15:01 CST pgbouncer is healthy. connection count: 75 11:20:01 CST pgbouncer is healthy. connection count: 73 11:25:02 CST pgbouncer is healthy. connection count: 75 11:30:01 CST pgbouncer is healthy. connection count: 77 11:35:01 CST pgbouncer is healthy. connection count: 84 11:40:10 CST Problematic connection count: 292, will restart pgbouncer... Now I suspect there is some network problem between the hosts of pgbouncer and pgsql. Will check more. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Very slow planning performance on partition table
This was no error in the log of pgbouncer, but there is a sudden drop of request count when the problem happened: 2014-07-30 11:36:51.919 25369 LOG Stats: 2394 req/s, in 339478 b/s, out 1422425 b/s,query 3792 us 2014-07-30 11:37:51.919 25369 LOG Stats: 2207 req/s, in 314570 b/s, out 2291440 b/s,query 5344 us 2014-07-30 11:38:51.919 25369 LOG Stats: 2151 req/s, in 288565 b/s, out 1945795 b/s,query 10016 us [=problem happens=] 2014-07-30 11:39:51.919 25369 LOG Stats: 1061 req/s, in 140077 b/s, out 2652730 b/s,query 515753 us [=pgbouncer restart=] 2014-07-30 11:40:52.780 10640 LOG File descriptor limit: 65535 (H:65535), max_client_conn: 5500, max fds possible: 6560 2014-07-30 11:40:52.781 10640 LOG Stale pidfile, removing 2014-07-30 11:40:52.782 10642 LOG listening on 0.0.0.0: 2014-07-30 11:40:52.782 10642 WARNING Cannot listen on ::/: bind(): Address already in use 2014-07-30 11:40:52.782 10642 LOG listening on unix:/tmp/.s.PGSQL. 2014-07-30 11:40:52.782 10642 LOG process up: pgbouncer 1.5.4, libevent 1.4.13-stable (epoll), adns: libc-2.11 2014-07-30 11:41:52.781 10642 LOG Stats: 2309 req/s, in 331097 b/s, out 3806033 b/s,query 4671 us 2014-07-30 11:42:52.782 10642 LOG Stats: 2044 req/s, in 285153 b/s, out 2932543 b/s,query 4789 us 2014-07-30 11:43:52.782 10642 LOG Stats: 1969 req/s, in 282697 b/s, out 560439 b/s,query 4607 us 2014-07-30 11:44:52.782 10642 LOG Stats: 2551 req/s, in 351589 b/s, out 3223438 b/s,query 4364 us -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Very slow planning performance on partition table
This is the vmstat output when the high load peak happens: # vmstat 3 procs ---memory-- ---swap-- -io -system-- cpu r b swpd free buff cache si sobibo in cs us sy id wa 54 1 756868 1047128 264572 34157347200 243 2721 2 51 7 39 2 53 1 756888 924452 264508 34156608007 31379 3623 53110 184808 29 55 15 1 70 1 756892 992416 264408 34153088003 14483 9455 53010 183758 23 61 15 1 93 1 756900 954704 264160 34151420803 20280 3391 66607 304526 23 59 17 1 65 2 756916 998524 263696 34142752005 23295 2084 53748 213259 26 60 12 1 46 0 756924 969036 263636 34142108803 23508 1447 51134 200739 22 59 19 1 123 1 756932 977336 263568 34142601603 21444 2747 48044 174390 27 59 13 1 71 2 756932 975932 263580 34148352000 19328 89629 54321 234718 25 59 14 2 47 5 756932 967004 263676 34150224000 19509 52652 56792 236648 21 60 15 4 70 0 756944 1038464 263660 34146880004 21349 3584 51937 179806 25 59 15 1 70 0 756940 923800 263532 34147571200 15135 1524 58201 236794 21 59 19 1 40 1 756940 1022420 263560 34150656000 9163 4889 34702 130106 19 61 19 1 59 0 756944 939380 263500 34151814401 22809 4024 46398 224644 21 60 19 1 56 1 756956 954656 263464 34146944004 22927 4477 53705 175386 28 57 14 1 39 0 756976 968204 263372 34137657607 24612 2556 61900 262784 30 51 18 1 109 1 756984 1015260 263332 34132377603 16636 4039 29271 83699 7 85 7 1 76 6 756992 980044 263312 34130812803 6949 1848 27496 130478 6 90 2 2 103 0 756992 963540 263308 34135206400 22125 2493 20526 61133 4 88 6 2 Seems most of the cpu is used by sys part. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Very slow planning performance on partition table
I am now seeing another phenominom of hanging connections. They are showing 'UPDATE' status in process list. (gdb) bt #0 0x7f783f79d4f7 in semop () from /lib/x86_64-linux-gnu/libc.so.6 #1 0x005f97d3 in PGSemaphoreLock () #2 0x00638153 in LWLockAcquire () #3 0x004a9239 in ginStepRight () #4 0x004a9c61 in ginFindLeafPage () #5 0x004a8377 in ginInsertItemPointers () #6 0x004a4548 in ginEntryInsert () #7 0x004ae687 in ginInsertCleanup () #8 0x004af3d6 in ginHeapTupleFastInsert () #9 0x004a4ab1 in gininsert () #10 0x00709b15 in FunctionCall6Coll () #11 0x0047b6b7 in index_insert () #12 0x0057f475 in ExecInsertIndexTuples () #13 0x0058bf07 in ExecModifyTable () #14 0x005766e3 in ExecProcNode () #15 0x00575ad4 in standard_ExecutorRun () #16 0x0064718f in ProcessQuery () #17 0x006473b7 in PortalRunMulti () #18 0x00647e8a in PortalRun () #19 0x00645160 in PostgresMain () #20 0x0060459e in ServerLoop () #21 0x006053bc in PostmasterMain () #22 0x005a686b in main () (gdb) q This connection can not be killed by pg_cancel_backend nor pg_terminate_backend. It just hangs there and does not respond to normal kill command. I had to kill -9 the process to terminate whole postgresql instance. What happened there and how can I kill these connections safely? -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Very slow planning performance on partition table
在 2014/7/29 1:29, Jeff Janes 写道: If it were waiting on a pg_locks lock, the semop should be coming from ProcSleep, not from LWLockAcquire, shouldn't it? I'm guessing he has a lot of connections, and each connection is locking each partition in shared mode in rapid fire, generating spin-lock or cache-line contention. Cheers, Jeff Yes. I have a lot of connections and they maybe coming together and doing the same update statement without partition key on the partition table.
Re: [PERFORM] Very slow planning performance on partition table
Anyone? I can see many pg processes are in BIND status with htop. Some of them could be hanging like 30 mins. I tried gdb on the same process many times and the trace shows same as my previous post. This happened after I partitioned my main tables to 60 children tables. And also, I'm experiecing a cpu peak around 30-60 mins every 1-2 days. During the peak, all my cpus(32 cores) are full utilized while there is no special load and the memory and io are fine. Sometimes I had to kill the db process and restart the db to escape the situation. I tried to upgrade to the latest 9.2.9 but it didn't help. ?? 2014/7/25 22:23, Rural Hunter : I run dbg on the backend process and got this: (gdb) bt #0 0x7fc4a1b6cdb7 in semop () from /lib/x86_64-linux-gnu/libc.so.6 #1 0x005f8703 in PGSemaphoreLock () #2 0x00636703 in LWLockAcquire () #3 0x00632eb3 in LockAcquireExtended () #4 0x0062fdfb in LockRelationOid () #5 0x00474e55 in relation_open () #6 0x0047b39b in index_open () #7 0x005f3c22 in get_relation_info () #8 0x005f6590 in build_simple_rel () #9 0x005f65db in build_simple_rel () #10 0x005de8c0 in add_base_rels_to_query () #11 0x005df352 in query_planner () #12 0x005e0d51 in grouping_planner () #13 0x005e2bbe in subquery_planner () #14 0x005e2ef9 in standard_planner () #15 0x006426e1 in pg_plan_query () #16 0x0064279e in pg_plan_queries () #17 0x006f4b7a in BuildCachedPlan () #18 0x006f4e1e in GetCachedPlan () #19 0x00642259 in exec_bind_message () #20 0x00643561 in PostgresMain () #21 0x0060347f in ServerLoop () #22 0x00604121 in PostmasterMain () #23 0x005a5ade in main () Does that indicate something? seems it's waiting for some lock. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Very slow planning performance on partition table
Yes I checked. The connection I inspected is the longest running one. There was no other connections blocking it. And I also see all locks are granted for it. Does the planning phase require some internal locks? ?? 2014/7/28 0:28, Tom Lane : Yeah, that's what the stack trace suggests. Have you looked into pg_locks and pg_stat_activity to see which lock it wants and what's holding said lock? regards, tom lane -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Very slow planning performance on partition table
?? 2014/7/25 9:53, Tom Lane : [ shrug... ] Insufficient data. When I try a simple test case based on what you've told us, I get planning times of a couple of milliseconds. I can think of contributing factors that would increase that, but not by four orders of magnitude. So there's something very significant that you've left out. Can you construct a self-contained test case that's this slow? regards, tom lane I run dbg on the backend process and got this: (gdb) bt #0 0x7fc4a1b6cdb7 in semop () from /lib/x86_64-linux-gnu/libc.so.6 #1 0x005f8703 in PGSemaphoreLock () #2 0x00636703 in LWLockAcquire () #3 0x00632eb3 in LockAcquireExtended () #4 0x0062fdfb in LockRelationOid () #5 0x00474e55 in relation_open () #6 0x0047b39b in index_open () #7 0x005f3c22 in get_relation_info () #8 0x005f6590 in build_simple_rel () #9 0x005f65db in build_simple_rel () #10 0x005de8c0 in add_base_rels_to_query () #11 0x005df352 in query_planner () #12 0x005e0d51 in grouping_planner () #13 0x005e2bbe in subquery_planner () #14 0x005e2ef9 in standard_planner () #15 0x006426e1 in pg_plan_query () #16 0x0064279e in pg_plan_queries () #17 0x006f4b7a in BuildCachedPlan () #18 0x006f4e1e in GetCachedPlan () #19 0x00642259 in exec_bind_message () #20 0x00643561 in PostgresMain () #21 0x0060347f in ServerLoop () #22 0x00604121 in PostmasterMain () #23 0x005a5ade in main () Does that indicate something? seems it's waiting for some lock. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Very slow planning performance on partition table
2014/7/25 9:53, Tom Lane wrote: Rural [ shrug... ] Insufficient data. When I try a simple test case based on what you've told us, I get planning times of a couple of milliseconds. I can think of contributing factors that would increase that, but not by four orders of magnitude. So there's something very significant that you've left out. Can you construct a self-contained test case that's this slow? regards, tom lane No I cann't. I exported the db schema(without data) to another server and there is no problem. Is the planning time related to data volume? Anything else can I check? I already checked the default statistics target and it's the default value. I did change some statistics target on one column of the table, but the column is not involved in the slow planning query. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
[PERFORM] Very slow planning performance on partition table
Hi, I have a table partitioned with about 60 children tables. Now I found the planning time of simple query with partition key are very slow. # explain analyze select count(*) as cnt from article where pid=88 and hash_code='2ca3ff8b17b163f0212c2ba01b80a064'; QUERY PLAN - Aggregate (cost=16.55..16.56 rows=1 width=0) (actual time=0.259..0.259 rows=1 loops=1) - Append (cost=0.00..16.55 rows=2 width=0) (actual time=0.248..0.250 rows=1 loops=1) - Seq Scan on article (cost=0.00..0.00 rows=1 width=0) (actual time=0.002..0.002 rows=0 loops=1) Filter: ((pid = 88) AND (hash_code = '2ca3ff8b17b163f0212c2ba01b80a064'::bpchar)) - Index Scan using article_88_hash_idx on article_88 article (cost=0.00..16.55 rows=1 width=0) (actual time=0.246..0.248 rows=1 loops=1) Index Cond: (hash_code = '2ca3ff8b17b163f0212c2ba01b80a064'::bpchar) Filter: (pid = 88) Total runtime: 3.816 ms (8 rows) Time: 30999.986 ms You can see the timing output that the actual run time of the 'explain analyze' is 30 seconds while the select sql itself takes only 3 ms. My partition key is on article.pid and the constraint is simple like this: CONSTRAINT article_88_pid_check CHECK (pid = 88). What's wrong and how can I improve the planning performance? -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Very slow planning performance on partition table
It's the default value(partition): # grep exclusion postgresql.conf #constraint_exclusion = partition?0?2?0?2?0?2 # on, off, or partition btw, I'm on postgresql 9.2.4 ?? 2014/7/24 1:35, Douglas J Hunley : On Wed, Jul 23, 2014 at 6:21 AM, Rural Hunter ruralhun...@gmail.com wrote: What's wrong and how can I improve the planning performance? What is constraint exclusion set to? -- Douglas J Hunley (doug.hun...@gmail.com)
[PERFORM] How to investiage slow insert problem
Hi, I'm on 9.2.4 with Ubuntu server. There are usually hundereds of connections doing the same insert with different data from different networks every minute, through pgbouncer in the same network of the database server. The database has been running for about one year without problem. Yesterday I got a problem that the connection count limit of the database server is reached. I checked the connections and found that there are many inserts hanging there. I checked the load(cpu,memory,io) of the db server but seems everything is fine. I also checked pg log and I only found there are one incomplete message from client error message every several minute. The I recycled pgbouncer and kept monitoring the connections. I found the majority of the inserts finish quickly but every minute there are several inserts left and seems hanging there . So after a while, the connection limit is reached again. Besides those inserts, there are no other long run queries and auto vacuums. I also checked the locks of the inserts and found they were all granted. The insert statement itself is very simple and it only inserts one row but there are some triggers involved. They might impact the performance but I have never experience any since the majority of the inserts are fine. The problem persisted about 1-2 hours. I didn't do anything except recycling pgbouncer a few times. After that period, everything goes back to normal. It's has been 24 hours and it didn't happen again. From the error message in pg log, I supect it might be the network problem from some clients. Could anyone point out if there are other possible causes? I'm also wondering what those inserts are doing actually when they are hanging there, such as if they are in the trigger or not. Anything I can get similar with the connection snapshots in db2? -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] How to investiage slow insert problem
于 2013/8/20 10:38, Sergey Konoplev 写道: On Mon, Aug 19, 2013 at 6:44 PM, Rural Hunter ruralhun...@gmail.com wrote: What do you mean by recycling pgbouncer? I mean restarting pgbouncer. Haven't you noticed what was in the state column of the pg_state_activity view? In 9.2 the query column in this view shows the last statement that was executed in this connection, and it does not mean that this statement is working at the moment of monitoring. If the state is active, than it was working, however, my assumption is that it was IDLE in transaction. No, they are alll with 'active' state. You mentioned the incomplete message from client error, so it might somehow be a network problem that led to a hunging connection to pgbouncer, that made pgbouncer kept a connection to postgres after transaction was started. pgbouncer and the db server are in the same local network and there shouldn't be any network problem between them. I also ran ping from pgbouncer server to the db server and there was no problem. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] How to investiage slow insert problem
2013/8/20 12:34, Jeff Janes : On Monday, August 19, 2013, Rural Hunter wrote: I think that this should generally not happen at the server if you are using pgbouncer, as you should configure it so that pgbouncer has a lower limit than postgresql itself does. What pooling method (session, transaction, statement) are you using? statement. Currently, I set the limit of pgbouncer connection to same as db connection. But I also have a few connections connecting to db server directly. Can you provide some example numbers for the io load? I get some when the connection limit is reached(The database related storage is on sdb/sdd/sde/sdf): root@ubtserver:~# iostat -xm 3 Linux 3.5.0-22-generic (ubuntu) 20130819 _x86_64_ (32 CPU) avg-cpu: %user %nice %system %iowait %steal %idle 14.71 0.00 2.86 0.48 0.00 81.96 Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await r_await w_await svctm %util sda 0.00 0.26 0.04 0.36 0.00 0.00 24.71 0.00 0.55 3.01 0.30 0.29 0.01 sdb 0.00 0.26 0.18 2.32 0.02 0.38 329.50 0.01 5.36 1.26 5.69 0.21 0.05 sdc 0.01 4.59 10.13 45.75 0.30 0.92 44.65 0.05 5.14 7.49 4.62 0.63 3.50 dm-0 0.00 0.00 0.00 0.01 0.00 0.00 8.00 0.00 6.37 6.38 6.36 3.62 0.00 sdd 0.00 0.42 0.02 42.87 0.00 0.46 22.12 0.03 0.78 14.09 0.77 0.49 2.10 sde 0.00 3.68 10.23 156.41 0.19 1.45 20.06 0.03 1.59 21.34 0.29 0.51 8.55 sdf 0.00 2.56 6.29 66.00 0.29 0.71 28.42 0.04 0.56 4.52 0.19 0.37 2.71 avg-cpu: %user %nice %system %iowait %steal %idle 13.99 0.00 1.91 1.04 0.00 83.06 Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await r_await w_await svctm %util sda 0.00 0.00 0.33 0.00 0.00 0.00 16.00 0.00 4.00 4.00 0.00 4.00 0.13 sdb 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 sdc 0.00 15.33 5.33 14.33 0.13 0.21 34.98 0.03 1.63 6.00 0.00 1.02 2.00 dm-0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 sdd 0.00 0.00 0.00 31.33 0.00 0.26 17.19 0.01 0.34 0.00 0.34 0.34 1.07 sde 0.00 0.00 43.00 163.67 0.59 1.29 18.55 2.56 21.34 72.06 8.01 1.69 34.93 sdf 0.00 0.00 6.00 62.00 0.17 0.55 21.88 0.49 7.16 5.56 7.31 0.27 1.87 avg-cpu: %user %nice %system %iowait %steal %idle 15.84 0.00 2.63 1.70 0.00 79.83 Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await r_await w_await svctm %util sda 0.00 1.67 0.00 2.00 0.00 0.01 14.67 0.07 33.33 0.00 33.33 25.33 5.07 sdb 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 sdc 0.00 0.00 4.67 0.00 0.06 0.00 26.29 0.13 6.29 6.29 0.00 25.14 11.73 dm-0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 sdd 0.00 0.33 0.00 49.00 0.00 0.39 16.49 0.02 0.35 0.00 0.35 0.35 1.73 sde 0.00 11.00 30.67 81.33 0.38 0.71 19.98 36.46 143.19 43.91 180.62 2.69 30.13 sdf 0.00 9.33 3.00 326.00 0.09 2.75 17.69 3.51 10.66 5.33 10.71 0.11 3.60 avg-cpu: %user %nice %system %iowait %steal %idle 14.99 0.00 2.39 4.89 0.00 77.74 Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await r_await w_await svctm %util sda 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 sdb 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 sdc 0.00 19.67 7.33 29.00 0.09 0.60 38.61 1.18 35.41 175.45 0.00 15.93 57.87 dm-0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 sdd 0.00 0.33 0.00 39.33 0.00 0.31 15.93 0.01 0.37 0.00 0.37 0.37 1.47 sde 0.00 11.33 29.67 312.67 0.39 2.51 17.34 87.15 314.23 108.13 333.78 2.84 97.20 sdf 0.00 0.00 8.33 0.00 0.17 0.00 42.24 0.05 6.56 6.56 0.00 2.40 2.00 avg-cpu: %user %nice %system %iowait %steal %idle 14.98 0.00 2.23 5.45 0.00 77.34 Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await r_await w_await svctm %util sda 0.00 0.00 0.00 0.67 0.00 0.01 20.00 0.00 0.00 0.00 0.00 0.00 0.00 sdb 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 sdc 0.00 9.67 10.00 6.00 0.12 0.10 27.83 0.08 5.08 8.13 0.00 1.42 2.27 dm-0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 sdd 0.00 0.00 0.00 44.33 0.00 0.35 16.00 0.03 0.72 0.00 0.72 0.72 3.20 sde 0.00 0.00 47.33 0.00 0.58 0.00 25.18 5.26 111.04 111.04 0.00 19.10 90.40 sdf 0.00 11.00 3.33 683.33 0.12 7.38 22.37 12.05 17.54 244.00 16.44 0.49 33.33 avg-cpu: %user %nice %system %iowait %steal %idle 15.21 0.00 2.54 0.56 0.00 81.69 Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await r_await w_await svctm %util sda 0.00 2.00 0.00 1.00 0.00 0.01 24.00 0.00 0.00 0.00 0.00 0.00 0.00 sdb 0.00 0.00 0.00 0.00 0.00
Re: [PERFORM] Geoserver-PostGIS performance problems
Why not just use simple Statement instead of PreparedStatement and construct the SQL with concated string or StringBuilder? like this: int col1=xxx; String col2=; String sql=select * from table where col1=+col+ and col2='+col2+'; 于 2012/7/26 3:59, Vinicius Abrahao 写道: On Wed, Jul 25, 2012 at 4:26 PM, Merlin Moncure mmonc...@gmail.com wrote: On Wed, Jul 25, 2012 at 2:17 PM, Vinicius Abrahao vinnix@gmail.com wrote: On Wed, Jul 25, 2012 at 3:45 PM, Merlin Moncure mmonc...@gmail.com wrote: Note that it seems the preparing/planning interaction was not the poster's actual problem, but it may have been yours. As Tom Lane notes in that thread, this should get better in 9.2. jdbc should get some blame too -- it's really aggressive about preparing queries. indeed! Is there any reason for that? IMNSHO it's an oversight in the core JDBC design dating back to the beginning: you have two basic choices for executing SQL. The unparameterized Statement or the parameterized PreparedStatement. There should have been a 'ParamaterizedStatement' that gave the expectation of paramaterization without setting up and permanent server side structures to handle the query; libpq makes this distinction and it works very well. Of course, there are various ways to work around this but the point stands. That is true, I was observing the same, days ago: Running queries and statments in jdbc: https://github.com/vinnix/JavaLab/blob/master/Scrollable.java And running queries with libpq: https://github.com/vinnix/testLibPQ/blob/master/testlibpq.c Is this possible to change something (I really don't know what or where) in the jdbc driver to get more direct aproach? (if that's make any sense to you guys...) Best regards, vinnix -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] how to change the index chosen in plan?
于 2012/6/11 20:07, Kevin Grittner 写道: Rural Hunter wrote: 于 2012/6/9 22:39, Kevin Grittner 写道: You neglected to mention the LIMIT clause in your earlier presentation of the problem. A LIMIT can have a big impact on plan choice. Is the LIMIT 10 part of the actual query you want to optimize? Either way it would be helpful to see the EXPLAIN ANALYZE output for the the query without the LIMIT clause. Yes, sorry for that. I do need the limit clause in the query to show only part of the results to the user(common multi-pages view). Without the limit clause, I got the plan as I wanted: http://explain.depesz.com/s/Qdu So looks either I remove the order-by or limit clause, I can get what I wanted. But I do need the both in the query... Well, we're still doing diagnostic steps. What this one shows is that your statistics are leading the planner to believe that there will be 20846 rows with lid = 3072, while there are really only 62. If it knew the actual number I doubt it would choose the slower plan. The next thing I would try is: ALTER TABLE article_label ALTER COLUMN lid SET STATISTICS = 5000; ANALYZE article_label; Then try the query without LIMIT and see if you get something on the right order of magnitude comparing the estimated rows to actual on that index scan. You can try different STATISTICS values until you get the lowest value that puts the estimate in the right neighborhood. Higher settings will increase plan time; lower settings may lead to bad plans. Once you've got a decent estimate, try with the ORDER BY and LIMIT again. I set statistics to 5000 and got estimated row count 559. Set statistics to 8000 and got estimated row count 393. At this step, I run the query with both order-by and limit clause and got the expected result. Kevin, Thank you very much for your patience and step-by-step guidance! I learnt a lot from this case! If you have a hard time getting a good estimate even with a high statistics target, you should investigate whether you have extreme table bloat. -Kevin -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] how to change the index chosen in plan?
于 2012/6/9 22:39, Kevin Grittner 写道: Rural Hunter wrote: 于 2012/6/9 0:39, Kevin Grittner 写道: name | current_setting full_page_writes | off There may be exceptions on some file systems, but generally turning this off leaves you vulnerable to possible database corruption if you OS or hardware crashes. Yes, I understand. My situation is, the io utiliztion of my system is quite high so I turned this off to reduce the io utilization. We have a replication server to serve as the hot standby if there is any issue on the primary. So currently I think it's acceptable option to me. max_connections | 2500 Yikes! You may want to look in to a connection pooler which can take 2500 client connections and funnel them into a much smaller number of database connections. https://wiki.postgresql.org/wiki/Number_Of_Database_Connections shared_buffers | 60GB You might want to compare your performance with this setting against a smaller setting. Many benchmarks have shown settings about a certain point (like 8MB to 12 MB) to be counter-productive, although a few have shown increased performance going past that. It really seems to depend on your hardware and workload, so you have to test to find the sweet spot for your environment. work_mem | 8MB With so many connections, I can understand being this low. One of the advantages of using connection pooling to funnel your user connections into fewer database conncections is that you can boost this, which might help considerably with some types of queries. None of the above, however, really gets to your immediate problem. What is most significant about your settings with regard to the problem query is what's *not* in that list. You appear to have a heavily cached active data set, based on the row counts and timings in EXPLAIN ANALYZE output, and you have not adjusted your cost factors, which assume less caching. Thanks for the advices. As of now we don't see overall performance issue on the db. I will adjust these settings based on your advices if we begin to see overall performance degrade. Try setting these on a connection and then running your queries on that connection. set seq_page_cost = 0.1; set random_page_cost = 0.1; set cpu_tuple_cost = 0.03; I tried these settings but don't see noticeable improvement. The plan is not changed. Ok, I get out a simple version of the actualy query. Here is the explain anaylze without order-by, which is I wanted: http://explain.depesz.com/s/p1p Another with the order-by which I want to avoid: http://explain.depesz.com/s/ujU You neglected to mention the LIMIT clause in your earlier presentation of the problem. A LIMIT can have a big impact on plan choice. Is the LIMIT 10 part of the actual query you want to optimize? Either way it would be helpful to see the EXPLAIN ANALYZE output for the the query without the LIMIT clause. Yes, sorry for that. I do need the limit clause in the query to show only part of the results to the user(common multi-pages view). Without the limit clause, I got the plan as I wanted: http://explain.depesz.com/s/Qdu So looks either I remove the order-by or limit clause, I can get what I wanted. But I do need the both in the query... -Kevin -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
[PERFORM] how to change the index chosen in plan?
I have a query like this: select a.* from a inner join b on a.aid=b.aid where a.col1=33 a.col2=44 and b.bid=8 postgresql selected the index on a.col1 then selected the index on b.bid. But in my situation, I know that the query will be faster if it chose the index on b.bid first since there are only a few rows with value 8. So I re-wrote the query as below: select a.* from a where a.aid in (select aid from b where bid=8) and a.col1=33 a.col2=44 But surprisingly, postgresql didn't change the plan. it still chose to index scan on a.col1. How can I re-wirte the query so postgresql will scan on b.bid first? -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] how to change the index chosen in plan?
No, it's not the analyze problem. For some other values on b.bid such as 9, 10, the plan is fine since there a a lot of rows in table b for them. But for some specific values such as 8 I want the plan changed. 于2012年6月8日 22:10:58,Tom Lane写到: Rural Hunter ruralhun...@gmail.com writes: I have a query like this: select a.* from a inner join b on a.aid=b.aid where a.col1=33 a.col2=44 and b.bid=8 postgresql selected the index on a.col1 then selected the index on b.bid. But in my situation, I know that the query will be faster if it chose the index on b.bid first since there are only a few rows with value 8. If you know that and the planner doesn't, maybe ANALYZE is called for. regards, tom lane -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] how to change the index chosen in plan?
Hi Kevin, Thanks for your detailed explanation. 于 2012/6/8 22:37, Kevin Grittner 写道: Rural Hunter ruralhun...@gmail.com wrote: 于2012年6月8日 22:10:58,Tom Lane写到: Rural Hunter ruralhun...@gmail.com writes: I have a query like this: select a.* from a inner join b on a.aid=b.aid where a.col1=33 a.col2=44 and b.bid=8 postgresql selected the index on a.col1 then selected the index on b.bid. But in my situation, I know that the query will be faster if it chose the index on b.bid first since there are only a few rows with value 8. If you know that and the planner doesn't, maybe ANALYZE is called for. No, it's not the analyze problem. So you ran ANALYZE and retried? If not, please do. Yes, I did. For some other values on b.bid such as 9, 10, the plan is fine since there a a lot of rows in table b for them. So it uses the same plan regardless of the number of rows in table b for the value? yes. That sure *sounds* like you need to run ANALYZE, possibly after adjusting the statistics target for a column or two. How can adjust the statistics target? But for some specific values such as 8 I want the plan changed. If you approach it from that line of thought, you will be unlikely to reach a good long-term solution. PostgreSQL has a costing model to determine which plan is expected to be cheapest (fastest). This is based on statistics gathered during ANALYZE and on costing factors. Generally, if it's not choosing the fastest plan, you aren't running ANALYZE frequently enough or with a fine-grained enough statistics target _or_ you need to adjust your costing factors to better model your actual costs. You haven't given us a lot of clues about which it is that you need to do, but there is *some* suggestion that you need to ANALYZE. If you *try* that and it doesn't solve your problem, please read this page and provide more information: http://wiki.postgresql.org/wiki/SlowQueryQuestions Sorry the actual tables and query are very complicated so I just simplified the problem with my understanding. I rechecked the query and found it should be simplified like this: select a.* from a inner join b on a.aid=b.aid where a.col1=33 and a.col2=44 and a.timenow() and b.bid=8 order by a.time limit 10 There is an index on (a.col1,a.col2,a.time). If I remove the order-by clause, I can get the plan as I expected. I think that's why postgresql selected that index. But still I want the index on b.bid selected first for value 8 since there are only several rows with bid 8. though for other normal values there might be several kilo to million rows. -Kevin -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] how to change the index chosen in plan?
于 2012/6/9 0:39, Kevin Grittner 写道: Rural Hunter ruralhun...@gmail.com wrote: How can adjust the statistics target? default_statistics_target http://www.postgresql.org/docs/current/interactive/runtime-config-query.html#RUNTIME-CONFIG-QUERY-OTHER or ALTER TABLE x ALTER COLUMN y SET STATISTICS n http://www.postgresql.org/docs/current/interactive/sql-altertable.html Thanks, I will check detail. Sorry the actual tables and query are very complicated so I just simplified the problem with my understanding. I rechecked the query and found it should be simplified like this: select a.* from a inner join b on a.aid=b.aid where a.col1=33 and a.col2=44 and a.timenow() and b.bid=8 order by a.time limit 10 There is an index on (a.col1,a.col2,a.time). If I remove the order-by clause, I can get the plan as I expected. I think that's why postgresql selected that index. Sounds like it expects the sort to be expensive, which means it probably expects a large number of rows. An EXPLAIN ANALYZE of the query with and without the ORDER BY might be instructive. It would also help to know what version of PostgreSQL you have and how it is configured, all of which shows up in the results of the query on this page: http://wiki.postgresql.org/wiki/Server_Configuration Here is the output: name | current_setting -+--- version | PostgreSQL 9.1.3 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-46), 64-bit archive_command | test ! -f /dbbk/postgres/logarch/%f.gz gzip -c %p /dbbk/postgres/logarch/%f.gz archive_mode | on autovacuum | on autovacuum_freeze_max_age | 20 checkpoint_segments | 20 client_encoding | UTF8 effective_cache_size | 150GB full_page_writes | off lc_collate | zh_CN.utf8 lc_ctype | zh_CN.utf8 listen_addresses | * log_autovacuum_min_duration | 30min log_destination | stderr log_line_prefix | %t [%u@%h] log_min_duration_statement | 10s log_statement | ddl logging_collector | on maintenance_work_mem | 10GB max_connections | 2500 max_stack_depth | 2MB max_wal_senders | 1 port | 3500 server_encoding | UTF8 shared_buffers | 60GB synchronous_commit | off TimeZone | PRC track_activities | on track_counts | on vacuum_freeze_table_age | 10 wal_buffers | 16MB wal_level | hot_standby work_mem | 8MB (33 rows) But still I want the index on b.bid selected first for value 8 since there are only several rows with bid 8. though for other normal values there might be several kilo to million rows. An EXPLAIN ANALYZE of one where you think the plan is a good choice might also help. Ok, I get out a simple version of the actualy query. Here is the explain anaylze without order-by, which is I wanted: http://explain.depesz.com/s/p1p Another with the order-by which I want to avoid: http://explain.depesz.com/s/ujU This is the count of rows in article_label with value 3072(which I referred as table b in previous mail): # select count(*) from article_label where lid=3072; count --- 56 (1 row) Oh, and just to be sure -- are you actually running queries with the literals like you show, or are you using prepared statements with placeholders and plugging the values in after the statement is prepared? Sample code, if possible, might help point to or eliminate issues with a cached plan. If you're running through a cached plan, there is no way for it to behave differently based on the value plugged into the query -- the plan has already been set before you get to that point. Yes, I ran the query directly wih psql. -Kevin -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Query got slow from 9.0 to 9.1 upgrade
This is very similar with my problem: http://postgresql.1045698.n5.nabble.com/index-choosing-problem-td5567320.html -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
[PERFORM] index choosing problem
I have a table with serveral million records. they are divided into about one hundred catagory(column cid). I created index includes the cid as the first column. I had a problem with some cids they only have few records comparing with other cids. Some of them only have serveral thousand rows. Some queries are not using index on the cids. I got the explain for the queries. Note: article_others_cid_time_style_idx is the index contains cid as the first column article_others_pkey is the primary key on an auto incremented column aid. # select count(*) from article_others; count - 6888459 (1 row) # select count(*) from article_others where cid=74; count --- 4199 (1 row) 1. # explain select count(*) from article_others where cid=74; QUERY PLAN -- Aggregate (cost=32941.95..32941.96 rows=1 width=0) - Index Scan using article_others_cid_time_style_idx on article_others (cost=0.00..32909.34 rows=13047 width=0) Index Cond: (cid = 74) (3 rows) 2. # explain select aid from article_others where cid=74 limit 10; QUERY PLAN -- Limit (cost=0.00..25.22 rows=10 width=8) - Index Scan using article_others_cid_time_style_idx on article_others (cost=0.00..32909.34 rows=13047 width=8) Index Cond: (cid = 74) (3 rows) 3. # explain select aid from article_others where cid=74 order by aid desc limit 10; QUERY PLAN --- Limit (cost=0.00..1034.00 rows=10 width=8) - Index Scan Backward using article_others_pkey on article_others (cost=0.00..1349056.65 rows=13047 width=8) Filter: (cid = 74) (3 rows) 4. # explain select aid from article_others where cid=74 order by aid desc limit 1; QUERY PLAN --- Limit (cost=0.00..103.40 rows=1 width=8) - Index Scan Backward using article_others_pkey on article_others (cost=0.00..1349060.65 rows=13047 width=8) Filter: (cid = 74) (3 rows) 5. # explain select max(aid) from article_others where cid=74; QUERY PLAN --- Result (cost=104.70..104.71 rows=1 width=0) InitPlan 1 (returns $0) - Limit (cost=0.00..104.70 rows=1 width=8) - Index Scan Backward using article_others_pkey on article_others (cost=0.00..1365988.55 rows=13047 width=8) Index Cond: (aid IS NOT NULL) Filter: (cid = 74) (6 rows) Now the query 3-5 using article_others_pkey are quite slow. The rows for cid 74 are very old and seldom get updated. I think pg needs to scan quite a lot on article_others_pkey before it gets the rows for cid 74. The same query for other cids with new and majority of rows runs very fast. for example: # explain select max(aid) from article_others where cid=258; QUERY PLAN Result (cost=1.54..1.55 rows=1 width=0) InitPlan 1 (returns $0) - Limit (cost=0.00..1.54 rows=1 width=8) - Index Scan Backward using article_others_pkey on article_others (cost=0.00..1366260.55 rows=889520 width=8) Index Cond: (aid IS NOT NULL) Filter: (cid = 258) So I think if pg chooses to use index article_others_cid_time_style_idx the performance would be much better. or any other solution I can take to improve the query performance for those cids like 74? Another question, why the plan shows rows=13047 for cid=74 while actually it only has 4199 rows? There is almost no data changes for cid 74 and I just vacuum/analyzed the table this morning. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
[PERFORM] Gin index insert performance issue
I'm using gin index for my full text search engine in production. These days the gin index size grows to 20-30G and the system started to suffer with periodical insert hanging. This is same as described in the 2 posts: http://postgresql.1045698.n5.nabble.com/Random-penalties-on-GIN-index-updates-td2073848.html http://postgresql.1045698.n5.nabble.com/Periodically-slow-inserts-td3230434.html The gin index is on a dedicated raid 10 SAS disk and the performance should be enough for normal db operation. But I always see almost 100% disk utiliztion on the disk when the inserts hang. The utiliztion for other data(such as the full text table data) on another disk(same setup as the gin index disk: SAS raid 10) is quite low comparing with the gin index disk. From my observation, looks too much data is written to the disk when the pending list of gin index is flushed to the disk. Below is the outupt of 'iostat -xm 3' on the disk when inserts hang: Device: rrqm/s wrqm/s r/s w/srMB/swMB/s avgrq-sz avgqu-sz await svctm %util sde 0.00 0.000.67 2614.00 0.0822.94 18.0332.94 12.61 0.38 100.00 sde 0.00 0.001.67 2377.33 0.1720.43 17.7332.00 13.44 0.42 100.00 sde 0.00 0.00 15.67 2320.33 0.2320.13 17.8531.99 13.73 0.43 100.00 sde 0.00 0.007.33 1525.00 0.1214.02 18.9032.00 20.83 0.65 100.00 sde 0.00 0.00 14.33 1664.67 0.1215.54 19.1032.00 19.06 0.60 100.00 sde 0.00 0.005.33 1654.33 0.0412.07 14.9432.00 19.22 0.60 100.00 I tried to increase work_mem but the inserts hang more time each time with less frequency. So it makes almost no difference for the total hanging time. Frequent vacuum is not a choice since the hang happens very 3-5 mins. is there any improvement I can make with pg for such data volumn(still increasing) or it's time to turn to other full text search solution such as lucene etc? -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
[PERFORM] Is it possible to use index on column for regexp match operator '~'?
for example, the where condition is: where '' ~ col1. I created a normal index on col1 but seems it is not used. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Is it possible to use index on column for regexp match operator '~'?
actually I stored the pattern in col1. I want to get the row whose col1 pattern matches one string 'aaa'. 于2011年12月15日 4:43:37,Marti Raudsepp写到: 2011/12/14 Rural Hunterruralhun...@gmail.com: for example, the where condition is: where '' ~ col1. I created a normal index on col1 but seems it is not used. I assume you want to search values that match one particular pattern, that would be col1 ~ '' The answer is, only very simple patterns that start with '^'. Note that you MUST use the text_pattern_ops index opclass: # create table words (word text); # copy words from '/usr/share/dict/words'; # create index on words (word text_pattern_ops); # explain select * from words where word ~ '^post'; Index Scan using words_word_idx on words (cost=0.00..8.28 rows=10 width=9) Index Cond: ((word ~=~ 'post'::text) AND (word ~~ 'posu'::text)) Filter: (word ~ '^post'::text) If you just want to search for arbitrary strings, in PostgreSQL 9.1+ you can use pg_trgm extension with a LIKE expression: # create extension pg_trgm; # create index on words using gist (word gist_trgm_ops); # explain select * from words where word like '%post%'; Bitmap Heap Scan on words (cost=4.36..40.23 rows=10 width=9) Recheck Cond: (word ~~ '%post%'::text) - Bitmap Index Scan on words_word_idx1 (cost=0.00..4.36 rows=10 width=0) Index Cond: (word ~~ '%post%'::text) There's also the wildspeed external module which is somewhat faster at this: http://www.sai.msu.su/~megera/wiki/wildspeed And someone is working to get pg_trgm support for arbitrary regular expression searches. This *may* become part of the next major PostgreSQL release (9.2) http://archives.postgresql.org/message-id/CAPpHfduD6EGNise5codBz0KcdDahp7--MhFz_JDD_FRPC7-i=a...@mail.gmail.com Regards, Marti -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance